I recently built out a read only JSON API with a Google spreadsheet as the back end data source.

Why do such a thing? I didn’t need to modify the back end, but wanted to make the data available to other software. The people who maintain this data are very comfortable using Google spreadsheets. While I could have written a custom CRUD app, this didn’t seem like a good use of time when Google spreadsheets had served us well in the past.

How did I do this? I first of all created a sanitized spreadsheet, using importrange and regexreplace. This level of indirection assures me that if the source data changes, I can adjust fairly easily. If the user managing the spreadsheet wants to rearrange columns, I can adjust my sanitized spreadsheet easily.

Then I created a Google apps script and used doGet to respond to requests and the spreadsheet service to retrieve the data. The content service lets you serve JSON with the appropriate mime type. I used qunit for Google apps script (invaluable when you are working in the loosely typed javascript world and relying on cloud resources). I also worked with the parameters to build the querying needed for our application.

Then, in order to make it look like a normal API call, I fronted the script with Varnish and did some regsub magic in my VCL file (as well as some light authentication).

This approach has the benefit of keeping everything in Google’s cloud, and allowing you to access the spreadsheet data easily.

This approach has significant limitations, however.

  • Google apps scripts calls are slow, especially when accessing spreadsheet data. Using the cache service can help.
  • You cannot return anything other than a 200 response code. None of the other response codes are available.
  • The actual content is served after a redirect, so caching it at the Varnish level is difficult (though possible), and clients must be able to follow redirects.
  • Google changes the ip of the server running the script. This is not such a big problem, unless your version of Varnish only takes IP addresses in the VCL file, not hostnames. Like ours.

Was this a good idea? Well, it let me build out the API relatively quickly without affecting the users managing the data or finding any other place to put it. But we’ll probably move away from this due to the limitations listed above. One we’ve found particularly painful is the IP address switching, which usually only shows up in our automated testing.

We’ll probably start pushing the data daily (it doesn’t change all that often) to a local JDBC database using the JDBC service and use either RestSQL or DropWizard to generate an API for it. (RestSQL is quicker, but DropWizard lets us maintain format compatibility.)

2 thoughts on “Google Spreadsheets as REST API sources

  1. Tom Malaher says:

    Thanks, Dan I’ll be looking into a bunch of those APIs you name dropped

  2. moore says:

    Tom, restsql is great for quick APIs based on a database table or tables. All you need is a servlet container and some SQL. However, is lis limited in the format of the json data in can return and is not quite a 1.0 in terms if features. Have had no problems with stability, though.

    Dropwizard, on th other hand, requires no servlet container, is extremely flexible in json format, and has other nice things like built in health checks. The downside is that you have to have a java project and it is slower to get going as a developer.

    Please let me know how either of these work for you!

Leave a Reply

Your email address will not be published. Required fields are marked *


− five = 1

 


© Moore Consulting, 2003-2010 +