
Annotation driven Spring configuration vs XML driven Spring configuration

Photo by Internet Archive Book Images

First off, I have to admit that I am not an XML hater.  Sure, it’s not the prettiest structured data language, but it seems to do the job just fine, especially in environments where you aren’t concerned about byte size (like, say, when you are configuring a software system).  Plus, it allows comments, which some alternatives ([cough] JSON) don’t.

But, I come to contrast two types of Spring configuration, not to bury XML or @Annotations.  I’ve been working on a project that uses Spring annotations.  I evaluated Guice, but it didn’t have lifecycle management, and we needed that.  I also evaluated PicoContainer (I know, blast from the past, right?), but it didn’t seem fully functional and the web presence was a mess.  In the past, I’ve used XML based configuration of Spring.  I thought it would be useful to capture a few of the differences.

First of all though, they are fairly similar.  Both use the standard Spring workflow of “create a number of components, pass them to other components, pass those to other components, occasionally use some of the extensive spring library in a way that appears magical at times, create a few more components, and string them all together to build a system that does what you want”, whether “what you want” is responding to web requests, modifying databases, pulling data off of a queue for processing, or something entirely different.

For annotation based configuration, you write the configuration in classes tagged with the @Configuration annotation. These configuration classes look for components in a couple of places: either below them in the classpath or in the packages specified by the @ComponentScan annotation. This means that you get all the benefits of your IDE:

  • You get code completion.
  • The compiler checks your types, so you don’t get runtime exceptions if you pass a Foo bean to a class that expects a Foobar bean.
  • You can do logic in the configuration class to load different components based on anything (a build-time constant, a file, a database, an external service, etc.)
  • You can refactor bean definitions and know that they’re changed everywhere.

All this power comes with the risk of complexity. If you read from a database table to know what kind of component implements a certain interface, that can be, shall we say, less than obvious.
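To make that concrete, here is a minimal sketch of what an annotation driven configuration class might look like. The package names, the PaymentService interface and the use.mock.payments system property are all hypothetical, purely for illustration.

// A minimal sketch of an annotation-driven configuration class.
package com.example.config;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;

@Configuration
@ComponentScan("com.example.services") // also pick up @Component classes in this package
public class AppConfig {

    // Hypothetical service interface and implementations, defined here only
    // to keep the sketch self-contained.
    public interface PaymentService { void charge(int cents); }
    public static class MockPaymentService implements PaymentService {
        public void charge(int cents) { /* no-op for tests */ }
    }
    public static class LivePaymentService implements PaymentService {
        public void charge(int cents) { /* call out to the payment gateway */ }
    }

    // Because this is plain Java, you can branch on anything at configuration
    // time: a system property, a file, a database lookup, etc.
    @Bean
    public PaymentService paymentService() {
        if (Boolean.getBoolean("use.mock.payments")) {
            return new MockPaymentService();
        }
        return new LivePaymentService();
    }
}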

XML based configuration, in my experience, is simpler. You can still have multiple layers of XML files, but you can’t have any logic, at least without using XSLT to generate your Spring configuration files (if you are doing that, annotation based configuration is going to be far simpler!). Components can live anywhere, because they are specified by fully qualified class name.  XML configuration also removes the temptation to invoke java code as part of your component configuration. On the flip side, you do run the risk of runtime errors from typos, and you must deal with XML.
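For comparison, here is a minimal sketch of an equivalent XML configuration, again with hypothetical bean names and classes. Everything is wired by fully qualified class name, and a typo in a class name only surfaces at runtime.

<!-- A minimal sketch of the XML equivalent; bean ids and classes are hypothetical. -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <bean id="paymentService" class="com.example.services.LivePaymentService"/>

    <bean id="checkoutController" class="com.example.web.CheckoutController">
        <constructor-arg ref="paymentService"/>
    </bean>
</beans>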

Both of these configuration options are well supported. I find that there’s more documentation online about the XML based configuration, but they are isomorphic, so I’d recommend picking the option that suits your needs best. If you need complex configuration or want the blanket of type safety, then annotation based configuration is the best option. If you have a simpler project, especially if everything can be contained in one XML file, XML based configuration is the better option.

My Favorite Eclipse Shortcuts

Photo by NASA Goddard Photo and Video

Learning your tools can make you far more efficient, whether those tools are Photoshop, Sublime Text, or Microsoft Word.  I do a fair bit of java development, so I use an IDE called Eclipse.  Here are some of my favorite shortcuts. (Note that you can personalize eclipse shortcuts via the menu outlined here.)

  • control-shift-t: look up a particular type/class by name
  • control-shift-r: look up a particular file by name
  • alt-shift-c (custom): check in the file I’m working on
  • control-f5 (custom): re-run the last thing I ran (typically a unit test)
  • control-h: search through any files for a given string (can set up patterns for the file names if desired)
  • control-f8: shift perspective (from java to debug, for example) without my hands leaving the keyboard

Also, I always install vrapper, which lets me move around my editor screen via the vi movement keys.

What shortcuts do you use to get around eclipse?

Java REST API Framework Options

Photo by shioshvili

I’ve been working with a couple of REST API solutions that exist in the Java tech stack.  I haven’t seen any great analysis of REST API solutions (though Matt Raible does mention some in this exhaustive slide deck about Java frameworks [pdf]), so I wanted to share my on-the-ground experience.

First up is restSQL.  This framework makes it easy to get data from a database to a JSON or XML REST API and back.  If you have a servlet container available, you write two configuration files, one with a SQL query and one with db connection information, and you have a RESTful API.  For prototyping and database access, it is hard to beat.

Pros:

  • Quick to set up
  • Only SQL knowledge is required
  • No programming required
  • Allows simple mapping of db table to resource, but can include one-to-one and one-to-many mappings
  • Supports all four REST operations out of the box
  • Supports XML as well as JSON
  • Is an embeddable java library as well as a standalone framework
  • Project maintainer is engaged and the project is moving forward

Cons:

  • Requires a servlet engine, and you have to restart it for changes to your configuration to be picked up
  • Output format has limited customization
  • Only works with mysql and postgresql databases (though there is some experimental support for Oracle and MS SQL)
  • Doesn’t work with views
  • The security model, while fine-grained, isn’t modern/OAuth (this can be solved with an API gateway (like 3scale, Tyk or ApiAxle) or a proxy)

The next framework I have experience with is Dropwizard.  This is a powerful framework that creates uberjars that you can run on any port as a standalone service.  It’s not limited to providing a JSON representation of database tables–if you can create a Java object, Dropwizard can serve it up as a JSON resource.

Pros:

  • Community support
  • Extreme output formatting flexibility, though be prepared to write a custom deserializer if you want to handle anything other than reads of custom-formatted objects
  • Supports any database that hibernate supports
  • Built in testing support
  • Brings together ‘best of breed’ tools like Jersey, Jackson and Hibernate, so you don’t have to do the integration yourself
  • Great documentation

Cons:

  • Have to roll your own deployment solution (tarball, chef, puppet)
  • No service startup script provided
  • Shading can slow down development
  • Not yet at 1.0 release
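To give a flavor of the Dropwizard programming model, here is a minimal sketch of a resource: a plain JAX-RS class whose return value Jackson serializes to JSON. The ReportResource and BrokerReport names and the /reports path are made up for illustration; in a real service you would register the resource in your Dropwizard application and back it with a DAO rather than a hard-coded list.

// A minimal sketch of a Dropwizard-style JAX-RS resource.
package com.example.api;

import java.util.Arrays;
import java.util.List;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/reports")
@Produces(MediaType.APPLICATION_JSON)
public class ReportResource {

    // Simple POJO; Jackson turns its getters into JSON fields.
    public static class BrokerReport {
        private final String broker;
        private final double totalSales;

        public BrokerReport(String broker, double totalSales) {
            this.broker = broker;
            this.totalSales = totalSales;
        }
        public String getBroker() { return broker; }
        public double getTotalSales() { return totalSales; }
    }

    @GET
    public List<BrokerReport> list() {
        // Hypothetical data; a real service would pull this from a backing store.
        return Arrays.asList(new BrokerReport("Alice", 125000.0));
    }
}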

The last framework, Sparkjava, I don’t have direct experience with, but a colleague used it in the past.  It is a lightweight framework that fits well when you have an existing Java library with functionality you want to expose.  I’m not competent to write pros/cons for it, but wanted to mention it.

The gorilla in the room that I haven’t had experience with (in terms of writing RESTful web services) is Spring.  I would definitely include this in any greenfield solutions review.

Observations on Writing a Custom Report with Java, Quickbooks, Jasper Reports, Google Spreadsheets and Google Drive

Image courtesy of Sean MacEntee: https://flic.kr/p/9P4CwC

A recently released project uses java and spring to pull data from quickbooks, a mysql database and google spreadsheets, munge the data in various ways, and then use jasper reports and jfreechart to generate a good looking report and a CSV of transactions that give our brokers weekly updates on how they are doing compared to their goals for the year. I then upload the files to Google Drive and send an email notifying each realtor that they have a new file.  It’s always nice to release useful software–feels like a new day dawns.

A few observations from this project:

  • The tech was interesting, but it was actually more interesting to see how the needs of tech drove the business to ‘tighten up’ their processes. Whether that was making sure there was one place to look for user type data, or defining exactly what constituted achieving a goal, or making sure that any new realtors who joined created business goals (and committed them to writing), the binary nature of software forced the business (or, more accurately, people in the business) to make decisions. Just another example of business process crystallization. This also meant deferring some software development. Where the business couldn’t answer a question definitively, rather than force them to do so, we chose to defer development.
  • I’m glad that jasper reports makes it so easy to generate PDFs–you basically create an XML file (I was unable to find a spec, but there were plentiful examples) and put in tokens for dynamic content. Then you compile the XML file, give it a map of said tokens and values (which can be text, numbers, dates or images), and export the object to PDF (a sketch of this flow appears after this list). Note that I was not using Jasper in a typical way–reporting from large amounts of similar data via a data connection–because I wanted different PDFs for each user. Perhaps there was a way to do this elegantly, but I was just trying to get stuff done. Creating a grid in jasper was interesting, though.
  • JFreechart had a very weird issue where on stage the graph labels were bolded and italicized, but not on production. Since we make every effort to keep these two environments in sync and they were running exactly the same code, this was a mystery. Finally solved it when we discovered java was different (same version, different vendors: openjdk vs sun java). Had been running this way for years. Oops.
  • Interacting with google spreadsheets is great for the business but a pain in the butt for developers. It’s great for our business because it is extremely easy for someone who is not a programmer to create a ‘database’ that is versioned, backed up, almost always accessible and helps them ‘get stuff done’ in a measured way. It’s a pain for developers because it is a ‘database’ and not a database–no referential integrity or data typing. Also, google provides cell based access and row based access, forcing you to choose. And the java libraries are old. Beats excel though–at least it is accessible from a server. We ended up writing a library to wrap Google’s Java SDK to make some common operations easier.
  • Pushing to google drive is interesting–I alluded to this in my last post, but you have to be ready for failure. I ended up using a BlockingQueue for this–I put files (a data structure defining the file, actually) to be uploaded on the queue, then consumers executing in a different thread each take one off, try to upload it, and if the upload fails, put it back on (see the second sketch after this list). I considered using a third party durable queue like IronMQ, but thought it was overkill.
  • Using the Quickbooks SDK, with all the accounting data exposed, lets you build some pretty powerful graphs that are useful to the business. But the docs are jumbled, with a lot of them aimed at developers who are building integrations to sell to Quickbooks users. Support is OK for standard operations, but for things like renewing your token, you have to drop down to the REST API (see my SO question). This article does a good job of outlining the various projects, but as a dev you’ll have to ignore certain sets of information–never fun when getting up to speed.
  • We do a lot of backend processing, and spring and maven and a custom assembler that generates a tarball during ‘maven install’ have been great. I also finally figured out how to work with maven to use ‘release:prepare’ and ‘release:perform’ for releasing libraries, as opposed to going my own way, and that has made things much much easier. Learn your tools, folks!
  • I’m once again astounded by the richness of the java library ecosystem. There doesn’t seem to be very much that I can think of doing that doesn’t have at least one, and probably three, java implementations.
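Here is a minimal sketch of the JasperReports flow mentioned above: compile the JRXML template, fill in the parameter tokens, and export to PDF. The template file name and parameter keys are hypothetical.

// A minimal sketch: compile a JRXML template, fill its tokens, export to PDF.
package com.example.reports;

import java.util.HashMap;
import java.util.Map;

import net.sf.jasperreports.engine.JREmptyDataSource;
import net.sf.jasperreports.engine.JRException;
import net.sf.jasperreports.engine.JasperCompileManager;
import net.sf.jasperreports.engine.JasperExportManager;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;
import net.sf.jasperreports.engine.JasperReport;

public class ReportGenerator {

    public void generate(String brokerName) throws JRException {
        // Compile the XML template (the .jrxml file) into a report object.
        JasperReport report = JasperCompileManager.compileReport("weekly_report.jrxml");

        // Map of token name -> value; values can be text, numbers, dates or images.
        Map<String, Object> params = new HashMap<String, Object>();
        params.put("brokerName", brokerName);

        // No data connection here: each PDF is built purely from parameters.
        JasperPrint filled = JasperFillManager.fillReport(report, params, new JREmptyDataSource());

        JasperExportManager.exportReportToPdfFile(filled, brokerName + "-weekly.pdf");
    }
}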
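And here is a minimal sketch of the retry queue for Google Drive uploads, using only the JDK’s java.util.concurrent classes. The UploadTask fields and the tryUpload method are hypothetical stand-ins for the real Drive call.

// A minimal sketch of the retry queue: take a task, attempt it, re-queue on failure.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DriveUploadQueue {

    public static class UploadTask {
        final String localPath;
        final String driveFolderId;
        UploadTask(String localPath, String driveFolderId) {
            this.localPath = localPath;
            this.driveFolderId = driveFolderId;
        }
    }

    private final BlockingQueue<UploadTask> queue = new LinkedBlockingQueue<UploadTask>();

    public void enqueue(UploadTask task) {
        queue.add(task);
    }

    // Run one of these in each consumer thread.
    public Runnable consumer() {
        return new Runnable() {
            public void run() {
                try {
                    while (true) {
                        UploadTask task = queue.take(); // blocks until work is available
                        if (!tryUpload(task)) {
                            queue.put(task); // failed: put it back for another attempt
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
    }

    private boolean tryUpload(UploadTask task) {
        // Hypothetical placeholder for the real Google Drive upload call.
        return true;
    }
}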

Multi threaded push to google drive with Java Executors

I’ve never really written threaded code in java. Sure, I took a java certification class long ago, and reviewed threading and probably wrote some toy code.

But it had always been one of the areas I was aware of but avoided. And it was easy to avoid–after all, I was often writing for a servlet container, and in that case you are supposed to rely on the platform (Tomcat, etc) to provide multi threading magic–you just have to stay in your servlet box. Other times, I was writing glue code where performance wasn’t an issue.

Recently, I was writing integration code that read from google spreadsheets (using the api) and quickbooks online (using the api) and did some calculations on the gathered data and created PDFs (using jfreechart and jasper reports). These PDFs then get pushed to certain folders on google drive.

I had run through some testing in my development and stage environments, and was ready to do a run in production. I started off the run and, after a few minutes, noticed that performance was not very good. About one report every 80 seconds or so.

Now, I’d run this process before I added google drive integration, and it had completed in a normal amount of time. So I suspected the google drive calls were the issue. When pushing the PDFs to google drive in my dev environment, I’d run into some connectivity issues, so, as recommended, I’d added an exponential backoff to my requests.

This meant that a lot of clock time was spent just waiting for the google services to recover.

So, an easily parallelizable task that spent a lot of time waiting? Seemed like a great place for multiple threads. I googled multithreading in java and found the Executor framework, which made running such tasks in parallel trivial. It even had futures and everything, and had been a standard part of the library since java 1.5.

I set up a thread pool of 15 threads, spent an hour or so rewriting and testing my file push code to fit the framework, and performance is much better. The push to google drive still takes about 20 seconds per report, but that’s a fourfold increase in speed.
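Here is a minimal sketch of that approach, using only the standard Executor framework; the pushToDrive method is a hypothetical stand-in for the real upload code (including the exponential backoff).

// A minimal sketch: a fixed pool of 15 threads, each pushing one file.
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDrivePush {

    public static void pushAll(List<File> reports) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(15);
        List<Future<Boolean>> results = new ArrayList<Future<Boolean>>();

        for (final File report : reports) {
            results.add(pool.submit(new Callable<Boolean>() {
                public Boolean call() throws Exception {
                    // Hypothetical wrapper around the Drive API call plus backoff.
                    return pushToDrive(report);
                }
            }));
        }

        // Block until every upload has finished (or failed).
        for (Future<Boolean> result : results) {
            result.get();
        }
        pool.shutdown();
    }

    private static boolean pushToDrive(File report) {
        // Hypothetical placeholder for the real Google Drive upload.
        return true;
    }
}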

Next steps–use the spring abstraction layer to make the code more maintainable and clear.

Google Spreadsheets as REST API sources

I recently built out a read only JSON API with a Google spreadsheet as the back end data source.

Why do such a thing? I didn’t need to modify the back end, but wanted to make the data available to other software. The people who maintain this data are very comfortable using Google spreadsheets. While I could have written a custom CRUD app, this didn’t seem like a good use of time when Google spreadsheets had served us well in the past.

How did I do this? I first of all created a sanitized spreadsheet, using importrange and regexreplace. This level of indirection assures me that if the source data changes, I can adjust fairly easily. If the user managing the spreadsheet wants to rearrange columns, I can adjust my sanitized spreadsheet easily.

Then I created a Google apps script and used doGet to respond to requests and the spreadsheet service to retrieve the data. The content service lets you serve JSON with the appropriate mime type. I used qunit for Google apps script (invaluable when you are working in the loosely typed javascript world and relying on cloud resources). I also worked with the parameters to build the querying needed for our application.

Then, in order to make it look like a normal API call, I fronted the script with Varnish and did some regsub magic in my VCL file (as well as some light authentication).

This approach has the benefit of keeping everything in Google’s cloud, and allowing you to access the spreadsheet data easily.

This approach has significant limitations, however.

  • Google apps scripts calls are slow, especially when accessing spreadsheet data. Using the cache service can help.
  • You cannot return anything other than a 200 response code. None of the other response codes are available.
  • The actual content is served after a redirect, so caching it at the Varnish level is difficult (though possible), and clients must be able to follow redirects.
  • Google changes the ip of the server running the script. This is not such a big problem, unless your version of Varnish only takes IP addresses in the VCL file, not hostnames. Like ours.

Was this a good idea? Well, it let me build out the API relatively quickly without affecting the users managing the data or finding any other place to put it. But we’ll probably move away from this due to the limitations listed above. One we’ve found particularly painful is the IP address switching, which usually only shows up in our automated testing.

We’ll probably start pushing the data daily (it doesn’t change all that often) to a local JDBC database using the JDBC service and use either RestSQL or DropWizard to generate an API for it. (RestSQL is quicker, but DropWizard lets us maintain format compatibility.)

Easyrec: a recommendation engine worth looking at

I love recommendation engines.  This is the software that Amazon uses all over its site to show “users who bought this also bought” recommendations.

I love them because they are an easy way to leverage the wisdom of the crowd to help users.  They also get better the more data you feed into them, so once you set one up, it just makes your site better and better.

For a while, I’ve wanted to explore mahout as a recommendation engine solution, but felt intimidated by how much work integration would be.  Luckily, I did a bit of searching and turned up this stackoverflow question about java recommendation engines.

Looking at some of the alternatives, I dug up easyrec, an open source recommendation engine.  Rather than solving a couple of different machine learning problems like mahout does, easyrec focuses on recommendations.

It also has a javascript API (for both sending information and displaying recommendations) and a demo installation you can use on your site, so it is trivial to integrate into a website to see if it works for you.  I did run into an issue with the demo server, but a post to the forums got it resolved in a few days.

Easyrec has support for generating recommendations for more than one kind of item (so if you want to display different recommendations within specific categories of an ecommerce site, that is possible) and is self hostable in any java container (which is recommended if you are going to use it in any commercial capacity).  You can also build the recommendations off of the following actions: views, rating, or purchase.

You can also customize easyrec with java plugins, though mahout definitely offers far more options for configuration.

I haven’t noticed any speed changes to my site with the javascript installed, though I’m sure adding some more remote javascript code didn’t speed up page rendering.  I noticed an uptick in time on site after I installed it (small, on the order of 5%).

If you have a set of items that are viewed together, using easyrec can leverage the wisdom of the crowds with not much effort on your part.  It’s not as powerful or configurable as alternatives, but it is drop dead simple to get started with.  It’s worth a look.

JFreeChart for the win


I wanted to make some charts showing linear regressions, but they needed to be images (so libraries like d3.js were out).  My first inclination was to use Google’s Image Charts API, but it has been deprecated.  I looked around and couldn’t find anything like it.

So, it looked like I’d need to code something up.  Some searching around turned up some great open source projects.  Using JFreeChart and Apache Commons Math, and some blog posts (this one about regression, this one about drawing dashed lines) and a post on StackOverflow about scatter plot basics, I was able to put together the above regression line chart in about half a day.
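Here is a minimal sketch of the kind of code involved, assuming JFreeChart 1.0.x (ChartUtilities) and Commons Math 3 (SimpleRegression); the data points, labels and output file name are made up for illustration.

// A minimal sketch: scatter plot of the data with a regression line overlaid.
import java.io.File;
import java.io.IOException;

import org.apache.commons.math3.stat.regression.SimpleRegression;
import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.chart.plot.XYPlot;
import org.jfree.chart.renderer.xy.XYLineAndShapeRenderer;
import org.jfree.data.xy.XYSeries;
import org.jfree.data.xy.XYSeriesCollection;

public class RegressionChart {

    public static void main(String[] args) throws IOException {
        double[][] points = { { 1, 2.1 }, { 2, 3.9 }, { 3, 6.2 }, { 4, 8.1 }, { 5, 9.8 } };

        // Scatter series for the raw data; regression fit from Commons Math.
        XYSeries scatter = new XYSeries("observations");
        SimpleRegression regression = new SimpleRegression();
        for (double[] p : points) {
            scatter.add(p[0], p[1]);
            regression.addData(p[0], p[1]);
        }

        JFreeChart chart = ChartFactory.createScatterPlot(
                "Sales vs goal", "week", "sales", new XYSeriesCollection(scatter),
                PlotOrientation.VERTICAL, false, false, false);

        // Overlay the regression line as a second dataset with a line-only renderer.
        XYSeries fit = new XYSeries("fit");
        fit.add(1.0, regression.predict(1.0));
        fit.add(5.0, regression.predict(5.0));
        XYPlot plot = chart.getXYPlot();
        plot.setDataset(1, new XYSeriesCollection(fit));
        plot.setRenderer(1, new XYLineAndShapeRenderer(true, false));

        // Write the chart out as a PNG image.
        ChartUtilities.saveChartAsPNG(new File("regression.png"), chart, 600, 400);
    }
}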

Java address parsing gets an upgrade

I wrote last year about address parsing solutions in java, and how an open source project called JGeocoder had worked out well for us.  I wanted to announce that my company, 8z, has significantly improved the address parsing capability of JGeocoder based on data from a number of property listings.

My colleague, Karamjeet Khalsa, added this functionality as well as more than fifty unit tests.  An address that previously would have failed but now is parsed correctly is: 25266 Road 38.1 Dolores CO. Or try this one: 10 Black Bear Gypsum CO.  This code focuses not just on parsing address, city and state, but also on breaking the address apart into components like street number, unit number, etc (for US style addresses).

Working with the current maintainer of the project, Karamjeet uploaded the new jar yesterday, and it is now ready to download.  This is the first release in four years, so if you need address parsing, go take a look!

Pentaho Data Integration is damn cool

I have worked on two small projects with Pentaho Data Integration.  If you’re looking for a business intelligence tool that lets you manipulate large amounts of data in a performant way, you definitely want to take a look at this.  The version I’m working with is a couple of revisions back, but the online support is pretty good.  It’s way more developer-efficient than writing java, though debugging is more difficult.

Why is it so cool?  It lets you focus on your problem–validating and transforming your data–rather than the mechanics of it (where do the CSV files live?  what fields did I just add?  how do I parse this fixed width file?).  You can also call out to Java if you need to.

There is a bit of a learning curve, especially around the difference between transformations and jobs.  I bought my first tech book of 2011, Pentaho Kettle Solutions.  These projects weren’t even using Pentaho for its sweet spot, ETLing to a data warehouse, but I have found this to be an invaluable tool for moving data from text files to databases while cleaning up and processing it.