A few years ago, the company for which I worked went through the monumental task of defining neighborhoods for a number of cities in the area where it had real estate agents. Neighborhood data is hard to get, and this task required a lot of back and forth between the person responsible for the mapping and the people who knew the neighborhoods. The maps were captured in Google’s My Maps feature and exported as KML to a vendor, who would then build neighborhood pages and maps with the data. Much of each neighborhood page would be driven by data entered in an admin back-end system (essentially a custom CMS).
Almost as an afterthought, I asked the vendor to provide an API for the neighborhoods, including the polygon data. I wrote up an API spec, had it reviewed by my team, and obtained approval for the vendor to build it. If I recall correctly, the cost was in the neighborhood of a couple thousand dollars, and the vendor had never been asked to build something like this before.
This one API allowed the company to apply dearly won neighborhood information in so many ways:
- generate statistics by neighborhood against any lat/lng coded data
- tag any geocoded content with neighborhood metadata
- find new and sold listings by neighborhood
- identify the top listing agents in each neighborhood
- create internal BI tools
- write internal recruiting tools
- pull other geocoded data by neighborhood
- tag transactions with neighborhood metadata
Many of these were accomplished with a plugin to the data processing tool (Pentaho Kettle) that used the Java Topology Suite (JTS). Creating JTS geometries is expensive, so the plugin caches them in a simple HashMap. The plugin's Java objects are discarded and fully garbage collected after each data load run, so the cache cannot grow without bound across runs, and this simple solution is more appropriate than a more complex LRU cache.
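A minimal sketch of that kind of cache follows. It is not the actual plugin code: the class and method names are hypothetical, and `java.awt.geom.Path2D` stands in for a JTS geometry so the example runs with only the JDK. The idea is the same, though: build each polygon once per run and reuse it for every point lookup.

```java
import java.awt.geom.Path2D;
import java.util.HashMap;
import java.util.Map;

/** Caches one parsed polygon per neighborhood id for the life of a load run. */
class NeighborhoodGeometryCache {
    // Plain HashMap with no eviction: acceptable here only because the
    // whole object is assumed to be discarded at the end of each run.
    private final Map<String, Path2D> cache = new HashMap<>();
    private int builds = 0; // counts expensive constructions, for illustration

    /** Returns the cached polygon, building it from {lng, lat} pairs on a miss. */
    Path2D get(String neighborhoodId, double[][] ring) {
        return cache.computeIfAbsent(neighborhoodId, id -> build(ring));
    }

    int buildCount() {
        return builds;
    }

    // Stand-in for the expensive geometry construction step.
    private Path2D build(double[][] ring) {
        builds++;
        Path2D.Double path = new Path2D.Double();
        path.moveTo(ring[0][0], ring[0][1]);
        for (int i = 1; i < ring.length; i++) {
            path.lineTo(ring[i][0], ring[i][1]);
        }
        path.closePath();
        return path;
    }
}

class CacheDemo {
    public static void main(String[] args) {
        double[][] square = {{0, 0}, {0, 1}, {1, 1}, {1, 0}};
        NeighborhoodGeometryCache cache = new NeighborhoodGeometryCache();
        Path2D first = cache.get("downtown", square);
        Path2D second = cache.get("downtown", square); // served from the cache
        System.out.println(first == second);           // same instance
        System.out.println(first.contains(0.5, 0.5));  // point-in-polygon test
        System.out.println(cache.buildCount());        // polygon built only once
    }
}
```

If the cache had to outlive a single run, an eviction policy such as LRU would start to matter; within one run, the set of neighborhoods is small and fixed, so a plain map is enough.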
However, this solution isn’t perfect. If a property was on a boundary, the JTS code would often put it in the wrong neighborhood. Neighborhood boundaries are sometimes incorrect, or they overlap. Points are sometimes incorrect because geocoding isn’t perfect. Human review is still required.
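One way to narrow down what a human has to look at is to flag only the ambiguous points. The sketch below is an illustration, not something the plugin did: it flags a point when it falls in zero or several polygons, or when it lies within a tolerance of any boundary. All names are hypothetical, `Path2D` again stands in for JTS, and distances are measured in raw coordinate units (degrees), which is only a crude approximation of distance on the ground.

```java
import java.awt.geom.Line2D;
import java.awt.geom.Path2D;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class BoundaryReview {
    /** Shortest distance from (x, y) to any edge of a closed ring of {lng, lat} pairs. */
    static double distanceToBoundary(double[][] ring, double x, double y) {
        double best = Double.MAX_VALUE;
        for (int i = 0; i < ring.length; i++) {
            double[] a = ring[i];
            double[] b = ring[(i + 1) % ring.length]; // wrap around to close the ring
            best = Math.min(best, Line2D.ptSegDist(a[0], a[1], b[0], b[1], x, y));
        }
        return best;
    }

    /** Point-in-polygon test for a single ring. */
    static boolean contains(double[][] ring, double x, double y) {
        Path2D.Double path = new Path2D.Double();
        path.moveTo(ring[0][0], ring[0][1]);
        for (int i = 1; i < ring.length; i++) {
            path.lineTo(ring[i][0], ring[i][1]);
        }
        path.closePath();
        return path.contains(x, y);
    }

    /** Ids of every neighborhood whose polygon contains the point. */
    static List<String> matches(Map<String, double[][]> neighborhoods, double x, double y) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, double[][]> e : neighborhoods.entrySet()) {
            if (contains(e.getValue(), x, y)) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    /** True when the automatic assignment should not be trusted without a human look. */
    static boolean needsReview(Map<String, double[][]> neighborhoods,
                               double x, double y, double tolerance) {
        if (matches(neighborhoods, x, y).size() != 1) {
            return true; // no neighborhood at all, or overlapping polygons
        }
        for (double[][] ring : neighborhoods.values()) {
            if (distanceToBoundary(ring, x, y) <= tolerance) {
                return true; // close enough to an edge that geocoding error matters
            }
        }
        return false;
    }
}
```

The tolerance would have to be tuned against the accuracy of the geocoder in use; a value that is too large sends every listing to review, and one that is too small hides the boundary cases this check is meant to catch.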
But the very fact that the neighborhood data was so accessible meant that the company could ask questions (How many homes are in each neighborhood? What are the three newest listings in this neighborhood?) that simply couldn’t have been asked without an API. Having an internal API that exposed hard-won business knowledge within the company was beneficial, even if it will never be exposed or monetized outside the company.
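Once listings carry a neighborhood tag, those two questions become a few lines of code. The sketch below assumes a hypothetical `Listing` class with an address, a neighborhood name, and a list date; none of these names come from the real system.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical listing that has already been tagged with a neighborhood. */
class Listing {
    final String address;
    final String neighborhood;
    final LocalDate listedOn;

    Listing(String address, String neighborhood, LocalDate listedOn) {
        this.address = address;
        this.neighborhood = neighborhood;
        this.listedOn = listedOn;
    }
}

class NeighborhoodQuestions {
    /** How many homes are in each neighborhood? Keys are sorted by name. */
    static Map<String, Long> countByNeighborhood(List<Listing> listings) {
        return listings.stream()
                .collect(Collectors.groupingBy(
                        l -> l.neighborhood, TreeMap::new, Collectors.counting()));
    }

    /** What are the newest listings in one neighborhood? Newest first. */
    static List<Listing> newest(List<Listing> listings, String neighborhood, int limit) {
        return listings.stream()
                .filter(l -> l.neighborhood.equals(neighborhood))
                .sorted(Comparator.comparing((Listing l) -> l.listedOn).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Calling `NeighborhoodQuestions.newest(listings, "Downtown", 3)` would answer the second question for a neighborhood named "Downtown"; the hard part, as described above, is getting the neighborhood tag onto each listing in the first place.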