Skip to content

Metabase: An Easy Query Builder For Your RDBMS

If you have a relational database as the primary datastore for your application, and you want to easily allow non technical folks to build reports against data in that database, I highly recommend evaluating metabase. This open source query builder is dead simple to install (takes about one minute on a free heroku dyno). You can create individual users, and they all get access to an easy to use interface so they can look at data in your tables. All the nomenclature is non technical, nary a select, group by or from clause to be seen. It’s written in java, but you don’t have to know java to run it.

We haven’t used this for long, so I can’t talk much about the warts, but I was referred to this by someone else who had great success in using it in their organization. The only downside that I’ve seen so far is that joins are not supported, so you end up having to create views if your database is normalized. More on that decision.

Serverless Framework

I had coffee with an acquaintance who is doing a lot of event driven data processing. Whereas ten years ago to tackle this problem you might use an ETL tool like Pentaho or Talend, now his process runs entirely on AWS Lambda functions. He is leveraging the Serverless framework to manage and deploy these applications. As I understand it there is a thin shim layer between the business logic and the lambda event handler, but the business logic is isolated and knows nothing about its environment. That makes the business logic very testable.

His description of the Serverless framework intrigued me. As he described it, the framework is driven by a simple yaml file and takes care of, among other tasks, the complicated infrastructure set up to tie Lambda functions to a variety of AWS events. I haven’t done it myself, but I’ve heard that setting up a lambda to API Gateway link is a real bear. Doing so allows a lambda function respond to a web requests without any AWS authentication, and is a key use case.

You can write and deploy lambda functions in any language that AWS Lambda supports (unfortunately, not java 9 at the moment). Here’s a java/maven/serverless tutorial. It also supports multiple cloud providers, though I haven’t done much beyond note that the documentation exists.

However, using Serverless does require writing code. If evaluating a a complicated ETL process which non developers needed to be able to understand and support, Serverless would not be a good fit. I’m not aware of any abstraction layers on top of it, though I guess you could run, for example, Pentaho Kettle jobs within lambda. There’s also an issue around cold start times–when your code hasn’t been invoked for a while, it can take longer to start up when a request or event occurs. Apparently there are partial solutions, but your lambdas still get cycled every few hours regardless.

I worked through some of the tutorials and was impressed at just how easy it was to get started. If I had a simple API or data processing pipeline to build, Serverless would definitely be on my short list of possible implementation options. It is very inexpensive, scales easily and encourages encapsulation.

Incidentally, my acquaintance’s company is hosting a lunch and learn on this technology at the end of the month. More details here.

Publish java artifacts to s3 using maven

If you don’t want to run a maven repository server like Nessus, you can use AWS S3 for your maven repository for your java artifacts.  You can make the repository public in the same way as you’d make a website public.  However, it’s more likely that you’ll want to make it private and authenticate using AWS IAM.  Here are some step by step instructions that appear useful.

(Note there’s another S3 maven wagon, but it does not appear to support authentication.)

Why do this?  It’s one more piece of infrastructure that you don’t have to maintain, update, and run.

 

Step back and review the problem: SSL Edition

I spent the better half of a week recently getting an understanding of how SSL works and how to get client certificates, self signed certificates and java all playing nicely.  However, in the end, taking a big step back and examining the problem led to an entirely different solution.

The problem in question is access to a secure system from a heroku application.  The secure system is protected by an IP based firewall, basic authentication over SSL, and client certificates that were signed by a non standard certificate authority (the owner of the secure system).

We are using HttpClient from Apache for all HTTP communication, and there’s a nice example (but, depending on your version of the client, run loadKeyMaterial too, as that is what ends up sending the client cert).  When I ran it from my local machine, after setting the IP to be one of the static IPs, I ended up seeing a wide variety of errors.  Oh, there are many possible errors!  But finally, after making sure that the I could import the private key into the store, was using the correct cipher/java version combo, writing a stripped down client that didn’t interface with any other parts of the system, learning about the javax.net.debug=ssl switch, making sure the public certificates were all in the store, and that I had IP based access to the secure system, I was able to watch the system go through a number of the steps outlined here:

SSL Message flow
Image from the JSSE Guide

But I kept seeing this exception: error: javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure. I couldn’t figure out how to get past that. From searching there were a number of issues that all manifested with this error message, and “guess and check” wasn’t working. Interestingly, I saw this error only when running on my local machine with the allowed static IP. When I ran on a different computer that was going through a proxy which provided a static IP, the client certificate wasn’t being presented at all.

After consulting with the other organization and talking with members of the team about this roadblock, I took a step back and validated the basics. I read the JSSE Reference guide. I used the openssl tools to verify that the chain of certificates was valid. (One trick here is that if you have intermediate certs in between your CA and the final cert, you can add only one -untrusted switch. So if you have multiple certs in between, you need to combine the PEM files into one file.) I validated that the final certificate in the keystore matched the CSR and the private key. Turns out I had the wrong key, and that was the source of the handshake issue. Doh!

After I had done that, I took a look at the larger problem. Heroku doesn’t guarantee IP addresses, not even a range. There are add on solutions (proximo, quotaguard, fixie) that do provide a static IP address, and that was our initial plan. However, all of these are proxy based solutions. A quick search turns up the unpleasant reality that proxies can’t pass client certificates. The post talks about a reverse proxy, but it applies as well to regular proxies. From the post:

Yes, because a client can only send its certificate by using encrypted and
SIGNED connection, and only the client can sign the certifikate (sic) so server
can trust it. The proxy does not know the clients private key, otherwise the
connection would not be secure (or not in the way most people know that).

SSL is made up to avoid man-in-the-middle attack, and the reverse proxy IS
the man-in-the-middls. Either you trust it (and accept what it sends) or
don’t use it.

All my work on having the java code create the client certificate was a waste. Well, not a total waste because now I understood the problem space in a way I hadn’t before, so I could perform far better searches.

I opened a support request with our proxy provider, but it was pretty clear from the internet, the support staff and the docs that this was a niche case. I don’t know if any static IP proxy providers support client certificates, but I wasn’t able to find one.

Instead, we were able to use AWS elastic IP and nginx to set up our own proxy. Since we controlled it, we could install the client certificate and key on it, and have the heroku instance connect to that proxy. (Tips for those instructions–make sure you download the openssl source, as nginx wants to compile it into the web server. And use at least version 1.9 of the community software.)

So, I made some mistakes in this process, and in a personal retro, I thought it’d be worth talking about them.

  • I jumped into searching too quickly. SSL and private certs is a complicated system that has a lot of moving pieces, and it would have been worth my time looking at an overview.
  • While I was focusing on accessing the system from the java code, there were hints about the proxy issue. I didn’t consider the larger picture and was too focused on solving the immediate issue.

When you encounter a PropertyReferenceException using SpringData and MongoDb

If you are using spring data and mongodb, you can use these magic methods called “derived queries”, which make writing simple queries very easy.

However, you may run into a PropertyReferenceException: No property module found for <type> message, with an exception similar to the one below.

This means you have a typo in your derived query (miscapitalized word, misspelled word, etc), so take a long look at that Repository interface.


[mongod output] 10:09:58.421 [main] WARN o.s.c.a.AnnotationConfigApplicationContext [AbstractApplicationContext.java:487] - Exception encountered during context initialization - cancelling refresh attempt 
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'tripDAO': Invocation of init method failed; nested exception is org.springframework.data.mapping.PropertyReferenceException: No property module found for type Trip!
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1574) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:539) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:476) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:303) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:299) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:736) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:757) ~[spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:480) ~[spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.context.annotation.AnnotationConfigApplicationContext.(AnnotationConfigApplicationContext.java:84) [spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at com.katasi.decision_engine.AbstractMongoDbTest.createApplicationContext(AbstractMongoDbTest.java:49) [test-classes/:na] 
at org.apache.camel.testng.CamelSpringTestSupport.doPreSetup(CamelSpringTestSupport.java:71) [camel-testng-2.15.2.jar:2.15.2]
at com.katasi.decision_engine.processor.CreateTripTest.doPreSetup(CreateTripTest.java:66) [test-classes/:na]
at org.apache.camel.testng.CamelTestSupport.setUp(CamelTestSupport.java:215) [camel-testng-2.15.2.jar:2.15.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_31]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_31]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_31]
at java.lang.reflect.Method.invoke(Method.java:483) ~[na:1.8.0_31] 
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeConfigurationMethod(Invoker.java:552) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeConfigurations(Invoker.java:215) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeMethod(Invoker.java:636) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:882) [testng-6.8.21.jar:na]
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1189) [testng-6.8.21.jar:na]
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:124) [testng-6.8.21.jar:na]
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108) [testng-6.8.21.jar:na]
at org.testng.TestRunner.privateRun(TestRunner.java:767) [testng-6.8.21.jar:na]
at org.testng.TestRunner.run(TestRunner.java:617) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.runTest(SuiteRunner.java:348) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:343) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:305) [testng-6.8.21.jar:na]
at org.testng.SuiteRunner.run(SuiteRunner.java:254) [testng-6.8.21.jar:na]
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52) [testng-6.8.21.jar:na]
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86) [testng-6.8.21.jar:na]
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1224) [testng-6.8.21.jar:na]
at org.testng.TestNG.runSuitesLocally(TestNG.java:1149) [testng-6.8.21.jar:na]
at org.testng.TestNG.run(TestNG.java:1057) [testng-6.8.21.jar:na]
at org.apache.maven.surefire.testng.TestNGExecutor.run(TestNGExecutor.java:77) [surefire-testng-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.executeMulti(TestNGDirectoryTestSuite.java:159) [surefire-testng-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.testng.TestNGDirectoryTestSuite.execute(TestNGDirectoryTestSuite.java:99) [surefire-testng-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.testng.TestNGProvider.invoke(TestNGProvider.java:106) [surefire-testng-2.12.4.jar:2.12.4]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_31]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_31] 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_31]
at java.lang.reflect.Method.invoke(Method.java:483) ~[na:1.8.0_31] 
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) [surefire-api-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) [surefire-booter-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) [surefire-booter-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) [surefire-booter-2.12.4.jar:2.12.4]
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) [surefire-booter-2.12.4.jar:2.12.4]
Caused by: org.springframework.data.mapping.PropertyReferenceException: No property module found for type Trip!
at org.springframework.data.mapping.PropertyPath.(PropertyPath.java:75) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:327) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.create(PropertyPath.java:307) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:270) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mapping.PropertyPath.from(PropertyPath.java:241) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.Part.(Part.java:76) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree$OrPart.(PartTree.java:235) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree$Predicate.buildTree(PartTree.java:373) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree$Predicate.(PartTree.java:353) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.query.parser.PartTree.(PartTree.java:87) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mongodb.repository.query.PartTreeMongoQuery.(PartTreeMongoQuery.java:53) ~[spring-data-mongodb-1.7.2.RELEASE.jar:na]
at org.springframework.data.mongodb.repository.support.MongoRepositoryFactory$MongoQueryLookupStrategy.resolveQuery(MongoRepositoryFactory.java:128) ~[spring-data-mongodb-1.7.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactorySupport$QueryExecutorMethodInterceptor.(RepositoryFactorySupport.java:367) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:190) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.initAndReturn(RepositoryFactoryBeanSupport.java:239) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:225) ~[spring-data-commons-1.10.2.RELEASE.jar:na]
at org.springframework.data.mongodb.repository.support.MongoRepositoryFactoryBean.afterPropertiesSet(MongoRepositoryFactoryBean.java:108) ~[spring-data-mongodb-1.7.2.RELEASE.jar:na]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1633) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1570) ~[spring-beans-4.1.7.RELEASE.jar:4.1.7.RELEASE]
... 50 common frames omitted

Using basic authentication and Jetty realms to protect Apache Camel REST routes

So, just finished up looking at authentication mechanisms for a REST service that runs on Camel for a client.  I was using Camel 2.15.2 and the default Jetty component, which is Jetty 8.  I was ready to use Oauth2 until I read this post which makes the very valid point that unless you have difficulty exchanging a shared secret with your users, SSL + basic auth may be enough.  Since these APIs will be used by apps that are built for the same client, and there is no role based access, it seemed like SSL _ basic auth would be good enough.

This example is different from a lot of other Camel security posts, which are all about securing routes within camel.  This example secures the external facing REST routes defined using the REST DSL. I’m pretty sure the spring security example could be modified to protect REST DSL routes as well.

So, easy enough.  I looked at the Camel Jetty security configuration docs, and locking things down with basic auth seemed straightforward.  But, they were out of date.  Here’s an updated version.  (I’ve applied for the ability to edit the Camel website and will update the docs when I can.)

First off, secure the rest configuration in your camel context:

<camelContext ...>
   <restConfiguration component="jetty" ...>
      <endpointProperty key="handlers" value="securityHandler"></endpointProperty>
   </restConfiguration>
...
</camelContext>

Also, make sure you have the camel-jetty module imported your pom.xml (or build.gradle, etc).

Now, create the security context. The below is done in spring XML, but I’m sure it can be converted to Spring annotation configuration easily. This example uses a HashLoginService, which requires a properties file. I didn’t end up using it because this was POC code, but here’s how to configure the HashLoginService.

<!-- Jetty Security handling -->
<bean id="constraint" class="org.eclipse.jetty.util.security.Constraint"> <!-- this class's package changed -->
   <property name="name" value="BASIC"/>
   <property name="roles" value="XXXrolenameXXX"/>
   <property name="authenticate" value="true"/>
</bean>

<bean id="constraintMapping" class="org.eclipse.jetty.security.ConstraintMapping">
   <property name="constraint" ref="constraint"/>
   <property name="pathSpec" value="/*"/>
</bean>

<bean id="securityHandler" class="org.eclipse.jetty.security.ConstraintSecurityHandler">
   <property name="loginService">
      <bean class="org.eclipse.jetty.security.HashLoginService" />
   </property>
   <property name="authenticator">
      <bean class="org.eclipse.jetty.security.authentication.BasicAuthenticator"/>
   </property>
   <property name="constraintMappings">
      <list>
         <ref bean="constraintMapping"/>
      </list>
   </property>
</bean>

As mentioned above, I ended up writing a POC authentication class which had hardcoded values. This post was very helpful to me, as was looking through the jetty8 code on using grepcode.

public class HardcodedLoginService implements LoginService {
  
        // matches what is in the constraint object in the spring config
        private final String[] ACCESS_ROLE = new String[] { "rolename" };
        ...

	@Override
	public UserIdentity login(String username, Object creds) {
		
        UserIdentity user = null;
        
        
        // HERE IS THE HARDCODING
		boolean validUser = "ralph".equals(username) && "s3cr3t".equals(creds);
		if (validUser) {
			Credential credential = (creds instanceof Credential)?(Credential)creds:Credential.getCredential(creds.toString());

		    Principal userPrincipal = new MappedLoginService.KnownUser(username,credential);
		    Subject subject = new Subject();
		    subject.getPrincipals().add(userPrincipal);
		    subject.getPrivateCredentials().add(creds);
		    subject.setReadOnly();
		    user=identityService.newUserIdentity(subject,userPrincipal, ACCESS_ROLE);
		    users.put(user.getUserPrincipal().getName(), true);
		}

	    return (user != null) ? user : null;
	}

	...
}

How, when you retrieve your rest routes, you need to pass your username and password or the request will fail with a 401 error: curl -v --user ralph:s3cr3t http://localhost:8080/restroute

Note that the ACCESS_ROLE variable in the login class must match the roles property of the constraint object, or you’ll get this message:

Problem accessing /restroute. Reason:
!role

You can find a working example on github (with the hardcoded login provider).

Thanks to my client Katasi for letting me publish this work.

How to call shell scripts from java properly

shell photo
Photo by dd21207

This has caused me some grief over the past few weeks, so here’s some tips on calling bash scripts from java correctly.

Use ProcessBuilder. Please.

Make sure you set the redirectErrorStream to true, or make other provisions for handling stderr.

I’ve found that inheritIO is useful. Not calling this caused a failure in a bash script that called other java programs.

Make sure your shell scripts have are executable, or that you call them with ‘bash scriptname’. Tarballs preserve permissions, zipfiles do not.

Read the output of the shell script, even if you do nothing with it (but you probably want to log it, if nothing else). If you don’t, buffers fill up and the process will hang:

final InputStreamReader isr = new InputStreamReader(p.getInputStream());
final BufferedReader br = new BufferedReader(isr);
String line = null;
StringBuffer sb = new StringBuffer();
while ((line = br.readLine()) != null) {
sb.append(line).append("\n");
}

Check your exit values, and wait for your shell script to finish: (If looking for speed, or more complex process flow, write it in java.)


int exitValue = p.waitFor();
if (exitValue != 0) {
// error log
}

Fun with Apache Spark And Census Data

spark photo
Photo by Enlightment Photography

I recently downloaded Apache Spark.  After working with HBase for a bit on my last project, it was a joy, even though I know little Scala.  I also downloaded some data from the Census Bureau (the 2010 Business Patterns, a pipe delimited file containing information on the business activity of the USA). I used the prepackaged data sources, so I didn’t have to use the Census API, which I have written about previously.

I was able to quickly start up Apache Spark on my workstation (thanks, quickstart!), and then ask some interesting questions of the data. I did all this within the Spark shell using Scala.  The dataset I downloaded has approximately 3M rows, so it is large enough to be interesting, but not large enough to need to actually use Spark.

So, what kinds of questions can you ask?  Well, given I downloaded the economic activity survey from 2010, I was interesting in knowing about different kinds of professions.  I looked at primarily at MSAs (which are “geographical region[s] with a relatively high population density at [their] core and close economic ties throughout the area”).  I did this because it was easy to filter them out with a string match, and therefore I didn’t have to look at any kind of code mapping table which I would have to dig into smaller geographic regions.

First, how many different professions are there in the Boulder Colorado MSA? This code:

val datfile = sc.textFile("../data/CB1000A1.dat")
val split_lines = datfile.map(_.split("\\|"))
val boulder = split_lines.filter(arr => arr[7].contains("Boulder, CO Metropolitan"))
val boulder_jobs = boulder.map(arr => arr(10));
boulder_jobs.count();

tells me in 2010 there were 1079 different types of jobs in the Boulder MSA.

Then I wanted to know which MSAs had the most jobs. Thanks to this SO post and the word count example, I was able to put together this query:

val countsbymsa = split_lines.map(arr => arr(7))
.filter(location => location.contains("Metropolitan Statistical Area"))
.map(location => (location,1)).reduceByKey(_+_,1).map(item => item.swap)
.sortByKey(true, 1).map(item => item.swap);
countsbymsa.saveAsTextFile("../data/msacounts");

And find out that the Los Angeles-Long Beach-Santa Ana, CA MSA has the most different jobs, at 2084 (nosing ahead of NYC by 14 jobs), and the Hinesville-Fort Stewart, GA MSA had the fewest at 700 (at least in 2010).

I didn’t end up using the XML utilities I found here, but found the wiki full of useful tips.

Don’t forget to deactivate inactive items in easyrec

delete photo
Photo by Ervins Strauhmanis

I wrote before about easyrec, a recommendation system with an easy to integrate javascript API, but just recently realized that I was still showing inactive items as ‘recommended’.  This is because I was marking items inactive in my database, but not in my easyrec system.

Luckily, there’s an API call to mark items inactive.  You could of course manually login and mark them as inactive, but using the API and a bit of SQL lets me run this check for all the items.  Right now I’m just doing this manually, but will probably put it in a cron job to make sure all inactive items are marked so in easyrec.

Here’s the SQL (escaped so it doesn’t wrap):

select concat('wget "http://hostname/api/1.0/json/setitemactive?apikey=apikey&tenantid=tenantid&\
active=false&itemtype=ITEM&itemid=',id,'"; sleep 5;')
from itemtable where enabled = 0;

I have an item table that looks like this:

CREATE TABLE `itemtable` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
...
`enabled` tinyint(1) NOT NULL DEFAULT '1',
...

Where enabled is set to 0 for disabled items and 1 for enabled items. The id column is numeric and also happens to be the easyrec item id number, in a happy coincidence.

The end output is a series of wget and sleep commands that I can run in the shell. I added in the sleep commands because I’m on the demo host of easyrec and didn’t want to overwhelm their server with updates.

 

Dropwizard vs Spring Boot

wizard photo
Photo by seanmcgrath

I just rolled off a project where I chose to use Spring Boot to create a number of microservices.  I have also written a number of Dropwizard services, and wanted to compare the two while they were fresh in my mind.

They have a number of similarities, of course.  Both Spring Boot and Dropwizard create standalone jarfiles that can be deployed without needing a container.  Both favor convention over configuration and want to help you do the right thing out of the box.  Both are java based.  Both have monitoring, health checks, logging and other production nicitiies buit-in.  Both are opinionated–making a lot of choices for the developer, rather than forcing the developer to choose.  Both make it easy to leverage existing libraries.  Both have a focus on performance.

However, there were a number of reasons I choose Spring Boot over Dropwizard for the recent project, and these highlight the differences.  The first is that dependency injection is built into Spring Boot in a way that it simply isn’t with Dropwizard.  Of course, there are third party solutions for bolting DI onto Dropwizard, but we also needed a java DI framework that would handle lifecycle events, which pretty much means Spring.  Finally, this project wasn’t all about REST services, and while Dropwizard has some support for other types of services, it really is designed as a performant HTTP/REST layer, and certainly almost all the questions about Dropwizard online are about REST and APIs.  Spring Boot, on the other hand, aims to provide support for a plethora of different types of services.  Plus, you can leverage the rest of the large Spring codebase.

There are some other differences as well.  Dropwizard uses shading to build fat jars while Spring Boot uses nested jars.  As far as support, Spring, as usual, wins on the documentation front, with loads of accurate docs.  But Dropwizard definitely has a larger community around it (compare the activity of the DW google group to the Spring Boot forums).

If you are writing a REST API in Java, Dropwizard is a great choice (here’s a review of other options I did a few months ago).  If you want to build microservices that integrate with other types of components (queues, nosql, etc), Spring Boot is what I’d recommend.

Update 12/8: Per this tweet, the spring forums aren’t used because of spam, but you can find plenty of support on StackOverflow with questions tagged ‘spring-boot’.