In a previous how to video, we learned how to clean and tidy data in Excel using functions like VLOOKUP. In this how to, we will look at how to summarise that data to answer specific questions using the PivotTable utility in Excel.
In that previous how to, we were working with a set of data released by the UK government on all flights taken by staff of the UK Department for International Development in 2011. The original data released under a Freedom of Information request didn't have the results in a very standard (and therefore, useful) format. But we tidied them by linking the data to an international standard for airport codes, IATA. By linking those two datasets, we now have information about what countries were flown to as well as exact geographic latitude and longitude for the airports. The tidied spreadsheet is available as a Google Spreadsheet for download.
Now that we have the raw data, we might have a number of questions, like:
- What country did DFID staff fly to most frequently in 2011?
- What percentage of flights are on British Airways?
- How much did the average economy class flight between London and Dubai cost DFID staff in 2011 compared to the cost of the average business class flight?
For simple datasets, we could maybe count or use the sort feature to quickly sum up our data. But this dataset contains 8,860 rows – too much for simple methods like that. Luckily, PivotTables can help us to answer these questions quickly. Find out how in the video and tutorial below.
Step 1: Insert a PivotTable
First of all, select the whole dataset. This can be done by pressing the square in the top-left corner of the spreadsheet, or by selecting each of the columns of data while holding down shift.
Then create a PivotTable. For a Mac, this option is in the 'Data' tab. On a PC, this option is available under the 'Insert' tab.
This will create a new sheet with a blank PivotTable.
Step 2: Use PivotTable creator to summarise data
Our first question is pretty straightforward: What country did DFID staff fly to most frequently in 2011? To find the answer to this question, we need to create a table that has all of the 'destination countries' in one column, and the count of all the visits in another.
To create this table, we use the PivotTable builder tool. If this hasn't automatically popped up, you can open in under the PivotTable tab.
In this case, we want to drag the 'Destination country' label to the 'ROW LABELS' box in the bottom-left corner. A list of all the countries DFID staff flew to in 2011 now appears.
But we also want to know the count of how many times DFID went to each of these countries. So, again, we draft the 'Destination country' label, but this time we place it in the 'VALUE' box in the bottom-right corner.
The counts of the number of times each of these countries appears in the dataset are now listed.
Our dataset is in alphabetical order, so we need to select the value count for the first row (in this case, '52' in the Afghanistan and click the 'sort Z-A' button (as demonstrated in the previous how to).
We discover that the country that DFID staff flew to most frequently in 2011 was the United Kingdom itself, nearly 4000 times!
At first glance, this might seem counter-intuitive. Shouldn't DFID staff be flying out of the country? Well, DFID has a good number of staff working across the world, and it makes sense that they might occasionally need to come back to the UK for meetings, etc.
Step 3: Change what the value displays
Our second question is a little bit trickier: what percentage of flights are on British Airways?
The setup for this table is similar to the last question. We clear the table, and then drag 'Airline' into the 'ROW LABELS' box. And then we drag the same label into the 'VALUE' box. Automatically a PivotTable will be created with the number of flights on each airline.
To fully answer our question, we want to change that count into a percent. In the VALUE box, you may notice that there is a little 'i' button on the right of 'Airline' label. Click it and a new box should appear.
When it does, you'll note that we can select count, average, sum etc. We still want the count, but we want the percentage that count is for each row of the total count. To do this we need to click the options button.
In the dropdown menu 'Show data as:' select '% of total' and then select ok.
We now have a PivotTable of all airlines. If we scroll down to British Airways, we see that 32% of all DFID staff flights in 2011 were on British Airways.
Step 4: Change row, column and value displays
Our final question is on the tough side, if only because there are so many elements to it: How much did the average economy class flight between London and Dubai cost DFID staff in 2011 compared to the cost of the average business class flight?
To do this we start by dragging the 'Departure city' label to the 'ROW LABEL' box. This lists out all of the departure cities, but we only want one: London. By clicking the box on the table that says 'Row labels', we can filter it and select only London.
We can then add in a destination airport, ALSO as a 'ROW LABEL', but this time under the 'pill' for 'Departure city'. This will result in a list of all the destinations flown from London.
Part of the question asks about class of service. We'll drag that into the 'COLUMN LABEL' box in the top-right corner. Various columns like 'Business', 'Economy' and 'First' now appear.
The question looks at average cost. We can drag in the 'Cost of booking excluding VAT' label into the VALUES box. And similar to last time, we press the 'i' button, and select 'average' instead of 'count' and select OK.
We now have a table with all the information we need. If we scroll down to Dubai, we see that the average business class flight in 2011 between London and Dubai cost DFID £1458.66, while the average economy class flight cost £444.19, and while we're here, the average economy plus flight cost £889.80.
On its own that doesn't tell us too much (we would expect business class to be more expensive), but it might be helpful when comparing year on year to see how prices are changing.