One of the features I find useful in Excel is the ability to create “pivot” tables. Essentially pivot tables let you summarize big tables of data in different ways, using different variables to “pivot” your data around (hence the name, I guess). Pivot tables are most easily understood through an example, so here’s one done using Excel 2007, and the sort-of-equivalent “Data-Pilot” in OpenOffice.org Calc (OO.o Calc).
Below is the data we’ll be working with. As you can see, it’s a long spreadsheet with eight columns (Representative, Region, Month, Publisher, Subject, Sales, Margin, and Quantity) and over 300 rows. Certain calculations, like total sum of sales, are easy–you just select the sales column and use Excel’s or OO.o Calc’s sum function. But what if you wanted total sum of sales, but organized first by region, then by representative? That’s where pivot tables come into play, so let’s get started!
Download the CSV file and open it up in Excel. Select all the data, jump over to the “insert” menu, and click on “PivotTable”. This will open a dialog box similar to the following.
I usually select the option to insert the pivot table in a new sheet, and this brings us to the following screen.Pretty plain looking, right?
The important part is the “PivotTable Field List” menu to the right of the screen. In the top half, you have a list of the variables in your data. The bottom half is where the “pivoting” gets set up simply by dragging the variables into the relevant areas in the bottom. In this example, I started by dragging “Region” to be the primary way to summarize the rows, and I dragged “Representative” below that to indicate that the rows should be further sorted by the sales representatives, and finally I dragged “Sales”, “Margin”, and “Quantity” to the sum value box.
By default, Excel assumes you want the sum, but you can also do different data summaries by right-clicking on the variables that you’ve dragged to the “values” corner. Whatever you drag into the filters area will create additional filter options for your data.
Experiment a little bit–drag things around a bit and see what types of consolidated results you end up with. Whatever you change in the PivotTable Field List dialog area is immediately reflected in the main spreadsheet area (which is one huge advantage that Excel 2007 has over OO.o Calc).
By dragging some variables around, here’s what we can quickly end up with:
The process in OpenOffice.org Calc is pretty similar. The option can be found under the “Data” menu under “DataPilot”. The screencap below shows the DataPilot options window with some variables dragged into the relevant areas to create a simple pivot table in OO.o Calc.
The “Page Fields” region in OO.o Calc is the equivalent of the filters area in Excel 2007. The rest is pretty much the same, just not as pretty. The output (in the screencap below) isn’t as pretty either, but it serves its function just fine.
So, now that you know all about pivot tables in Excel and OO.o Calc, you can have a data sorting and summarizing party. Once you get bored with that, you can sit around impatiently and wait for me to write about how you can do this sort of thing with R.