
A tutorial for the Census Data API

I’ve been exploring APIs ever since attending the hackathon at Gluecon last year.  Since I work for a real estate firm, I’m interested in APIs with a geographic component.  I had heard that USA Today had APIs for census data.  They do.  I also wondered if the Census Bureau had any APIs available.  They do.  However, the USA Today APIs are much, much friendlier.

But the USA Today APIs have some limitations.  Also, the data sets of these two APIs intersect, but neither is a subset of the other.  So, I thought it’d be useful to write a quick start guide for the Census Data API.

  1. First, register for an API key.  This only requires an email address and a name.  The key got sent to my spam folder, so check there if you don’t see the email within about five minutes of submitting.
  2. Decide what type of information you want to find.  There are two sets of information: the 2010 Census summary file and the American Community Survey (ACS), which covers the last five or six years.  Since the ACS is not available in the USA Today API, that’s what I’ll explore, but the methodology is similar for the summary file.
  3. The first place to look is the variables XML file, which shows you what data is available.  Here is the ACS variables file for the most recent five years, and here is the Census summary variables file.  For a high-level overview of what you can find, focus on the ‘concept’ tags.  You’ll see names like “C24070.  Industry by Class of Worker for the Civilian Employed Population 16 Years and Over” and “B24123.  Detailed Occupation by Median Earnings for the Full-Time, Year-Round Civilian Employed Female Population 16 Yrs and Over”.  While the Census Bureau has documents explaining what each of these means, this website seems to have aggregated the metadata that will let you explore the types of data available.
  4. OK, you’ve found your interesting data.  I’m going to look at the concept “B08126.  MEANS OF TRANSPORTATION TO WORK BY INDUSTRY” for the rest of this example.  You need to pick variables from the concept, so I’ll pick B08126_003E/Construction and the construction counts for three means of transportation: B08126_018E/Car, truck, or van – drove alone, B08126_033E/Car, truck, or van – carpooled, and B08126_048E/Public transportation (excluding taxicab).  As you’ll see, you can request multiple variables in a single call.
  5. Now you need to find the area that you want to look at.  The Census developer site has examples, which I found massively confusing when I first looked at them.  What you need to do is decide what level of geography you want, and then find the appropriate code.  Here are state codes.  For anything more detailed, you’ll probably want a FIPS code, which can identify anything from a whole state down to the smallest unit the Census gathers data for.  Here’s a list of valid geographic delineations.  If you are interested in a county, you may already have the code, but if all you have is a lat/lng (as if, say, you are looking at a property listing), you can look up the FIPS code.  If you need a lat/lng for a city, I’d suggest Google.
  6. I’m going to look up Lafayette, CO, which Google tells me is at 39.9936, -105.0892.  So, the FIPS request to the FCC’s block lookup API looks like http://data.fcc.gov/api/block/find?latitude=39.9936&longitude=-105.0892&showall=false&format=json and reveals that the FIPS code for this lat/lng is 080130608005010.
  7. A FIPS code is broken into sections.  The first two digits are the state, so 08 is Colorado’s code.  The next three are the county, so 013 represents Boulder County.  The next six are the tract code, so 060800 is the tract that we are looking at (tracts “provide a stable set of geographic units for the presentation of decennial census data”).  You can see tracts on maps, and our specific tract [pdf].  After the tract comes the four-digit block, an even smaller subsection of homes; here the block is 5010.  Apparently FIPS codes are on their way out in favor of GNIS codes.  I couldn’t find a lat/lng to GNIS code service, however.
  8. Now you know what area you are looking at.  Rather than the county, I want to examine census data based on the tract I’ve found (good old tract 060800).
  9. The next step is to build the URL for the API request.  Here’s where the examples come in handy.  For tract-level ACS data, look at this example to see the format of the call: http://api.census.gov/data/2011/acs5?get=[the variable codes you found above]&for=tract:[tract code or *]&in=state:[state code]+county:[county code]&key=[your key]
  10. Therefore, my API request looks like http://api.census.gov/data/2011/acs5?get=B08126_003E,B08126_033E,B08126_018E,B08126_048E&for=tract:060800&in=state:08+county:013&key=… and I get something like this: [["B08126_003E","B08126_033E","B08126_018E","B08126_048E","state","county","tract"],["309","57","188","0","08","013","060800"]].  That says 309 construction workers in this tract answered the transportation question: 188 drove alone, 57 carpooled, and no one took public transportation.
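The lookup and request steps above can be sketched in a few lines of Python.  This is a minimal sketch, not an official client: `split_fips`, `build_acs_url`, and `parse_census_response` are hypothetical helper names, `YOUR_KEY` stands in for your own key, and the URL simply follows the tract-level pattern from step 9 (including the 2011 ACS 5-year endpoint).  It assumes the response is the JSON array of arrays shown in step 10, with the header row first.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode


def split_fips(fips: str) -> dict[str, str]:
    """Split a 15-digit block FIPS code into state, county, tract and block."""
    if len(fips) != 15 or not fips.isdigit():
        raise ValueError("expected a 15-digit block FIPS code")
    return {
        "state": fips[0:2],    # e.g. 08 = Colorado
        "county": fips[2:5],   # e.g. 013 = Boulder County
        "tract": fips[5:11],   # six-digit census tract
        "block": fips[11:15],  # four-digit block
    }


def build_acs_url(variables: list[str], state: str, county: str, tract: str,
                  key: str, year: int = 2011) -> str:
    """Build a tract-level ACS 5-year request following the pattern in step 9."""
    params = {
        "get": ",".join(variables),
        "for": f"tract:{tract}",
        "in": f"state:{state} county:{county}",  # the space is encoded as '+'
        "key": key,
    }
    # Keep ',' and ':' literal so the URL matches the documented format.
    return f"http://api.census.gov/data/{year}/acs5?" + urlencode(params, safe=",:")


def parse_census_response(body: str) -> list[dict[str, str]]:
    """Turn the JSON array-of-arrays response into one dict per data row."""
    rows = json.loads(body)
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


if __name__ == "__main__":
    parts = split_fips("080130608005010")
    url = build_acs_url(
        ["B08126_003E", "B08126_033E", "B08126_018E", "B08126_048E"],
        parts["state"], parts["county"], parts["tract"], key="YOUR_KEY",
    )
    print(url)

    # The sample response quoted in step 10; no network call is made here.
    sample = ('[["B08126_003E","B08126_033E","B08126_018E","B08126_048E",'
              '"state","county","tract"],'
              '["309","57","188","0","08","013","060800"]]')
    row = parse_census_response(sample)[0]
    print(int(row["B08126_018E"]), "drove alone;",
          int(row["B08126_033E"]), "carpooled")
```

To make a live call, you could pass the URL to `urllib.request.urlopen` and decode the body before handing it to `parse_census_response`.  The values come back as strings, so convert them with `int()` before doing any arithmetic.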

Note that I ignored the margin of error fields, but they are there.  Now that you have the data, you can use something like d3.js to display it.
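If you do want the margins of error, ACS variable names come in pairs: the estimate ends in `E` and its margin of error ends in `M` (so `B08126_018E` pairs with `B08126_018M`), and ACS margins are published at the 90 percent confidence level.  The sketch below uses hypothetical helper names to expand a list of estimate variables into estimate/margin pairs for the `get=` parameter, and to turn a parsed row into low/high bounds.  It assumes every value comes back as an integer string; real responses can contain nulls or special sentinel codes that you would need to handle.

```python
from __future__ import annotations


def with_margins(estimate_vars: list[str]) -> list[str]:
    """Return each estimate variable followed by its margin-of-error twin."""
    out = []
    for var in estimate_vars:
        if not var.endswith("E"):
            raise ValueError(f"{var} does not look like an estimate variable")
        out.extend([var, var[:-1] + "M"])
    return out


def bounds(row: dict[str, str], estimate_var: str) -> tuple[int, int, int]:
    """Return (estimate, low, high) from the published margin of error.

    The low bound is not clipped at zero, even though counts cannot be negative.
    """
    estimate = int(row[estimate_var])
    moe = int(row[estimate_var[:-1] + "M"])
    return estimate, estimate - moe, estimate + moe
```

For example, `",".join(with_margins(["B08126_003E", "B08126_018E"]))` produces `B08126_003E,B08126_003M,B08126_018E,B08126_018M`, which can go straight into the `get=` parameter.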

Note that the terms of service for the Census APIs don’t mention explicit rate limits, but if you are planning to make a lot of calls, consider downloading the full dataset and using that instead.

Finally, you can sign up for the Census API developer forums, though they seem to be more of a feature-request forum and less of a technical help arena.  And there doesn’t appear to be anonymous browsing of the forums; you have to sign in to view anything.

Update 1/22: From the developer forums comes the comment that the forums are meant to be both technical support and feature requests.

RSS email campaigns

Email is one of the best ways to keep users coming back to your site.  I’d lay out the arguments, but Patrick McKenzie already has.  Go read that article if you doubt my statement.

If you generate a lot of content via your blog, then an email campaign that pulls from your RSS feeds is a great way to generate newsletter content.  If you have two sources of content–a content blog and a link blog, for example–you can have the newsletters pull from both RSS feeds and have them feed different sections of your newsletter.
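The two-feed idea is simple enough to sketch.  This is a minimal Python illustration, not MailChimp’s implementation: it assumes each feed is plain RSS 2.0 whose XML you have already downloaded, and `feed_items`, `build_newsletter`, and the section headings are hypothetical names.  Each feed fills its own section of a plain-text newsletter.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def feed_items(rss_xml: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (title, link) pairs for the first `limit` items of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in list(root.iter("item"))[:limit]:
        title = (item.findtext("title") or "(untitled)").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items


def build_newsletter(sections: dict[str, str], limit: int = 5) -> str:
    """Render one plain-text section per feed, in the order the dict gives them."""
    lines = []
    for heading, rss_xml in sections.items():
        lines.append(heading)
        lines.append("-" * len(heading))  # underline the section heading
        for title, link in feed_items(rss_xml, limit):
            lines.append(f"* {title}: {link}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"
```

A call might look like `build_newsletter({"From the blog": blog_xml, "Interesting links": links_xml})`, where `blog_xml` and `links_xml` are hypothetical strings holding the two feeds.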

I am a big fan of MailChimp as an email delivery service (they have a great free plan I’ve used multiple times), and was ready to roll up my sleeves and use their API to read from a set of RSS feeds and generate a newsletter from them.  But, luckily, they’ve already implemented RSS email campaigns.  If you want a monthly newsletter and are confident there will be new articles in your RSS feeds, you can schedule the newsletters out for a year and know that you’ll be sending out topical, fresh content every month.  (Note that if you don’t have new articles, the newsletter will still be sent unless you intervene.  And I can’t think of anything that would make an internet company look dumber than sending the same content twice in an email newsletter.)
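If you schedule sends yourself, one way to avoid that embarrassment is to remember which items have already gone out and skip an issue when nothing is new.  This is a hypothetical sketch, not a MailChimp feature: it identifies items by `<guid>`, falling back to `<link>`, and leaves it to you to persist the `already_sent` set (in a file or database table, say) between runs.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def item_ids(rss_xml: str) -> list[str]:
    """Identify each RSS item by its <guid>, or by its <link> if there is no guid."""
    root = ET.fromstring(rss_xml)
    ids = []
    for item in root.iter("item"):
        ident = (item.findtext("guid") or item.findtext("link") or "").strip()
        if ident:  # items with neither field are ignored
            ids.append(ident)
    return ids


def unsent_items(feeds: list[str], already_sent: set[str]) -> list[str]:
    """Return ids from all feeds not sent before, deduplicated, in feed order."""
    fresh = (i for feed in feeds for i in item_ids(feed) if i not in already_sent)
    return list(dict.fromkeys(fresh))


def next_send(feeds: list[str], already_sent: set[str]) -> list[str] | None:
    """Return the items to send and record them, or None to skip this issue."""
    fresh = unsent_items(feeds, already_sent)
    if not fresh:
        return None  # nothing new: don't mail the same content twice
    already_sent.update(fresh)
    return fresh
```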

I’m aware of solutions like paper.li that let you aggregate multiple sources of content in a pretty package, but MailChimp seems to give you more control and can be used for non-RSS email campaigns as well.

Take a look!