I have worked on two small projects with Pentaho Data Integration. If you’re looking for a business intelligence tool that lets you manipulate large amounts of data in a performant way, you definitely want to take a look at it. The version I’m working with is a couple of revisions back, but the online support is pretty good. It’s way more developer-efficient than writing Java, though debugging is more difficult.
Why is it so cool? It lets you focus on your problem (validating and transforming your data) rather than the mechanics of it: where do the CSV files live? What fields did I just add? How do I parse this fixed-width file? You can also call out to Java if you need to.
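To make "the mechanics" concrete, here is a minimal plain-Java sketch of what hand-parsing and validating a fixed-width file looks like, which is the kind of plumbing PDI's file input steps take off your plate. The column layout (id, name, amount), the class name, and the validation rule are all hypothetical, chosen only for illustration; this is not how PDI works internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled fixed-width parsing and validation (requires Java 16+ for records).
// The layout below is HYPOTHETICAL: id in columns 0-4, name in 5-24, amount in 25-34.
public class FixedWidthExample {

    // One parsed, trimmed, validated row.
    record Customer(int id, String name, double amount) {}

    // Column boundaries: start inclusive, end exclusive.
    static final int[][] LAYOUT = { {0, 5}, {5, 25}, {25, 35} };

    // Extract one field, tolerating lines shorter than the full layout.
    static String field(String line, int index) {
        int start = LAYOUT[index][0];
        if (start >= line.length()) {
            return "";
        }
        int end = Math.min(LAYOUT[index][1], line.length());
        return line.substring(start, end).trim();
    }

    // Parse one line and apply a (hypothetical) business rule.
    static Customer parse(String line) {
        int id = Integer.parseInt(field(line, 0));
        String name = field(line, 1);
        double amount = Double.parseDouble(field(line, 2));
        if (amount < 0) {
            throw new IllegalArgumentException("negative amount: " + amount);
        }
        return new Customer(id, name, amount);
    }

    public static void main(String[] args) {
        // Build sample lines with padding so the columns line up exactly.
        List<String> lines = List.of(
            String.format("%05d%-20s%10s", 42, "Jane Doe", "123.45"),
            String.format("%05d%-20s%10s", 43, "John Smith", "99.00"));
        List<Customer> customers = new ArrayList<>();
        for (String line : lines) {
            customers.add(parse(line));
        }
        customers.forEach(System.out::println);
    }
}
```

Every new file format means another layout table, more substring arithmetic, and more error handling; in PDI the equivalent is configured in a step dialog instead.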
There is a bit of a learning curve, especially around the difference between transformations and jobs. I bought my first tech book of 2011, Pentaho Kettle Solutions. These projects weren’t even using Pentaho for its sweet spot (ETL into a data warehouse), but I have found it to be an invaluable tool for moving data from text files to databases while cleaning it up and processing it along the way.
Thanks for your feedback. We agree, Pentaho Data Integration is damn cool!
Check out another cool thing it can do!
Director of Enterprise Solutions
Wow, that looks pretty cool. Thanks for passing it along.