I recently built out a read-only JSON API with a Google spreadsheet as the back-end data source.
Why do such a thing? I didn’t need to modify the back end, but wanted to make the data available to other software. The people who maintain this data are very comfortable using Google spreadsheets. While I could have written a custom CRUD app, this didn’t seem like a good use of time when Google spreadsheets had served us well in the past.
How did I do this? First, I created a sanitized spreadsheet using importrange and regexreplace. This level of indirection means that if the source data changes, I can adjust fairly easily: if the user managing the spreadsheet wants to rearrange columns, I only have to update my sanitized spreadsheet.
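Each cell in the sanitized sheet is some variation on this pattern (the spreadsheet key, range, and regex here are placeholders; the real cleanup depends on the data):

```
=REGEXREPLACE(IMPORTRANGE("SOURCE_SPREADSHEET_KEY", "Sheet1!B2"), "\s+", " ")
```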
Then I created a Google Apps Script, using doGet to respond to requests and the Spreadsheet service to retrieve the data. The Content service lets you serve JSON with the appropriate MIME type. I used QUnit for Google Apps Script (invaluable when you are working in the loosely typed JavaScript world and relying on cloud resources). I also used the request parameters to build the querying needed for our application.
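Here’s a rough sketch of the shape of that script; the spreadsheet ID, sheet name, and the id parameter are placeholders rather than what we actually use:

```javascript
// Minimal read-only endpoint: doGet reads the sanitized sheet and returns JSON.
function doGet(e) {
  // Spreadsheet service: pull everything from the sanitized sheet.
  var sheet = SpreadsheetApp.openById('SANITIZED_SPREADSHEET_ID').getSheetByName('Data');
  var rows = sheet.getDataRange().getValues();
  var header = rows.shift();

  // Turn each row into an object keyed by the header row.
  var records = rows.map(function (row) {
    var record = {};
    header.forEach(function (name, i) { record[name] = row[i]; });
    return record;
  });

  // Query support via the request parameters, e.g. ?id=42.
  if (e.parameter.id) {
    records = records.filter(function (r) { return String(r.id) === e.parameter.id; });
  }

  // Content service: serve the JSON with the right mime type.
  return ContentService
    .createTextOutput(JSON.stringify(records))
    .setMimeType(ContentService.MimeType.JSON);
}
```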
Then, in order to make it look like a normal API call, I fronted the script with Varnish, did some regsub magic in my VCL file, and added some light authentication.
This approach has the benefit of keeping everything in Google’s cloud and making the spreadsheet data easy to access. It has significant limitations, however:
- Google Apps Script calls are slow, especially when accessing spreadsheet data. Using the cache service can help (there’s a sketch of that after this list).
- You cannot return anything other than a 200 response code; none of the other response codes are available.
- The actual content is served after a redirect, so caching it at the Varnish level is difficult (though possible), and clients must be able to follow redirects.
- Google changes the IP address of the server running the script. This is not such a big problem, unless your version of Varnish only takes IP addresses in the VCL file, not hostnames. Like ours.
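On the first point, the caching that helps is nothing fancy. Something along these lines, where the cache key, the 300-second lifetime, and the loadRecordsFromSheet helper are all just illustrative:

```javascript
// Wrap the slow SpreadsheetApp read in the script cache.
function getPayload() {
  var cache = CacheService.getScriptCache();
  var cached = cache.get('api-payload');
  if (cached) {
    return cached; // serve the cached JSON string and skip the spreadsheet entirely
  }
  var payload = JSON.stringify(loadRecordsFromSheet()); // hypothetical helper that reads the sheet
  cache.put('api-payload', payload, 300); // entries expire after 300 seconds (values are capped around 100KB)
  return payload;
}
```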
Was this a good idea? Well, it let me build out the API relatively quickly without affecting the users managing the data or having to find another place to put it. But we’ll probably move away from this approach due to the limitations listed above. One we’ve found particularly painful is the IP address switching, which usually only shows up in our automated testing.
We’ll probably start pushing the data daily (it doesn’t change all that often) to a local relational database using the JDBC service, and then use either RestSQL or DropWizard to generate an API for it. (RestSQL is quicker, but DropWizard lets us maintain format compatibility.)
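The push itself should be short. Here’s a sketch of what I have in mind, assuming a MySQL target and a daily time-driven trigger; the connection string, credentials, table, and columns are all made up:

```javascript
// Daily push: copy the sanitized sheet into a relational table via the JDBC service.
function pushSnapshot() {
  var rows = SpreadsheetApp.openById('SANITIZED_SPREADSHEET_ID') // same hypothetical sheet as above
    .getSheetByName('Data')
    .getDataRange()
    .getValues();

  var conn = Jdbc.getConnection('jdbc:mysql://db.example.com:3306/api', 'apiuser', 'secret');
  conn.setAutoCommit(false);

  // REPLACE INTO avoids duplicate-key errors on re-runs (MySQL-specific).
  var stmt = conn.prepareStatement('REPLACE INTO records (id, name, value) VALUES (?, ?, ?)');
  for (var i = 1; i < rows.length; i++) { // row 0 is the header
    stmt.setString(1, String(rows[i][0]));
    stmt.setString(2, String(rows[i][1]));
    stmt.setString(3, String(rows[i][2]));
    stmt.addBatch();
  }
  stmt.executeBatch();
  conn.commit();
  conn.close();
}
```

That keeps the spreadsheet as the editing interface while the API reads from something faster, with real response codes.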