As part of the 2014-15 On Think Tanks Data Visualisation Competition, we're producing a number of 'how tos' to better support think tanks to develop their visualisations. The plan is to have the how tos cover the range of skills and techniques required to effectively visualise data: from data collection, to data cleaning, and finally to data visualisation.
This how to is part of section on data collection.
We will look at how to use Google Drive Spreadsheets to import live data from the web. I should note that this function is available in newer versions of Excel as well, though it operates slightly differently.
But why bother?
Well, the sky is the limit here, but a few reasons to import data in this way are:
- It automatically updates when you re-open the spreadsheet. That makes it particularly good for information that is in flux, evolving or frequently changed.
- The live data can also automatically update the results of functions, equations and visuals.
Don't believe me? Check out this example that uses a similar function, IMPORTRSS, to map recent website updates to social shares.
With justifications out of the way, here's how to actually do it!
Step 1: Find a website with a relevant table or list
Many government and international organisations have websites full of data. And while a lot of it is locked away in PDFs (don't worry -- that's a how to we've got coming up!), some still make the data available in HTML, either as a table or as a list.
Because of its desire to be as accessible as possible around the globe, Wikipedia articles often still use HTML tables, and might be a good jumping off point in a data search.
In the example above, we've gone to the Centers for Disease Control and Prevention (CDC) webpage tracking outbreaks of the Ebola virus since its discovery. I've chosen this because this function is more relevant when using data that will be updated with some frequency. Given the close scrutiny of the recent Ebola outbreak in Western Africa, we can expect that these numbers will be changed as new information becomes available.
Step 2: Create a new Google Drive Spreadsheet
A Google account is required to use Google Drive (formerly Google Docs), but they are available free of monetary charge for most users.
Once an account is created, simply select the red button on the top-left corner of the screen that says new, and in the dropdown menu, select 'Spreadsheet'.
Step 3: Enter a new formula using IMPORTHTML
In a new cell, enter the following generic formula:
=IMPORTHTML("[URL]","[table or list]", [number of table on webpage])
In the example in the video with the CDC website, the actual formula therefore becomes:
After entering the formula and hitting the return or enter key, Google Drive will likely give a note that it's loading. And within a few moments the table will appear.
Tips and tricks
This is a very easy tutorial to begin with. But there can still be some trip ups in executing it properly. Here are a few tips and tricks:
- Although we're pulling in a table using the IMPORTHTML function, it's worth noting that there are a number of similar functions that might also be useful, in particular IMPORTXML and IMPORTRSS can help to pull information from a website's news feed.
- Be sure to put the URL between quotation marks. Same goes for the word "table" or the word "list" after the first comma.
- The example we use pulls in a table, but Google Drive Spreadsheets can equally pull in a list, whether it is unordered (i.e. bullets) or ordered (i.e. numbered).
- The number after the second comma indicates which table to pull from the website. Unfortunately, there is no way of knowing for sure which table that might be without a bit of trial and error. Even something that appears to be the only table on the page might not be -- in some websites navigation menus are still done through tables. So start with the number 1, and if it's not the desired table, try 2, etc.