{"id":714,"date":"2011-04-11T17:58:37","date_gmt":"2011-04-11T23:58:37","guid":{"rendered":"http:\/\/www.mooreds.com\/wordpress\/?p=714"},"modified":"2011-04-09T18:16:26","modified_gmt":"2011-04-10T00:16:26","slug":"parsing-street-addresses-in-a-java-application","status":"publish","type":"post","link":"https:\/\/www.mooreds.com\/wordpress\/archives\/714","title":{"rendered":"Parsing street addresses in a Java application"},"content":{"rendered":"<p>I recently had to find a way to parse a street address into its component parts, and thought I&#8217;d share my adventure.<\/p>\n<p>The idea is to take a string like &#8220;123 S Main Street&#8221; and break it apart into the street number (123), the street direction (S), the street name (Main) and the street type (Street).<\/p>\n<p>At first, I thought that regular expressions would work, but the sheer variety of legal postal street addresses quickly dissuaded me, as did my boss&#8217;s misgivings.<\/p>\n<p>Stack Overflow has <a href=\"http:\/\/stackoverflow.com\/questions\/16413\/parse-usable-street-address-city-state-zip-from-a-string\">a nice discussion of the problem<\/a>, which gave me some additional pointers.\u00a0 There&#8217;s a <a href=\"http:\/\/www.address-parser.com\/\">commercial solution<\/a>, which is available as a COM component or a web service, though I didn&#8217;t try it.\u00a0 There is also a free <a href=\"https:\/\/webgis.usc.edu\/Services\/AddressNormalization\/WebService\/DeterministicNormalizationWebService.aspx\">web service<\/a>, provided by a university, that requires an application and attribution and did a great job (thanks, California taxpayers).\u00a0 This solution is also available in a for-pay variant.<\/p>\n<p>Neither of these was desirable because we needed to parse a lot of addresses quickly, and calling out over the web can be slow.\u00a0 Some more digging turned up this <a href=\"http:\/\/stackoverflow.com\/questions\/877742\/java-postal-address-parser\">Stack Overflow question<\/a> and\u00a0 <a 
href=\"http:\/\/jgeocoder.sourceforge.net\/parser.html\">JGeocoder<\/a>, which has a fairly robust address parser.\u00a0 It&#8217;s not perfect, but it is free and open source.\u00a0 I am not sure if it is still in development (the author didn&#8217;t respond to my email) but it does what we need it to do.<\/p>\n<p>As an added bonus, we&#8217;re using <a href=\"http:\/\/www.pentaho.com\/\">Pentaho<\/a> for the data processing, and you can call <a href=\"http:\/\/type-exit.org\/adventures-with-open-source-bi\/2010\/06\/using-java-in-pentaho-kettle\/\">Java classes directly<\/a> from your <a href=\"http:\/\/wiki.pentaho.com\/display\/EAI\/Pentaho+Data+Integration+Steps\">data processing steps<\/a>, so I didn&#8217;t even have to wrap the Java call in a shell script or anything.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently had to find a way to parse a street address into its component parts, and thought I&#8217;d share my adventure. The idea is to take a string like [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,33],"tags":[],"class_list":["post-714","post","type-post","status-publish","format-standard","hentry","category-java","category-useful-tools"],"_links":{"self":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/714","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/comments?post=714"}],"version-history":[{"count":2,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/714\/revisions"}],"predecessor-version":[{"id":716,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/714\/revisions\/716"}],"wp:attachment":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/media?parent=714"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/categories?post=714"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/tags?post=714"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}