Skip to content

Slides from my ‘Transforming Data with Kettle’ talk

Here is a PDF of my slides from the ‘Transforming Data with Kettle’ talk I did last night at the Boulder Java Users’ Group.

I really enjoyed some of the discussion, which centered around topics like how to version control output of graphical code generation tools, error handling and transactions, and who had interacted with the EDI data format. As far as my audience survey questions, I’d say around 40% of people had written scripts to munge data from one datasource to another, and about 15% had used java code and about 20% had used an ETL tool.

Thanks to everyone who attended.

List of Front Range Software Networking Events and Conferences

Updated March 21: crossed out ‘conferences’ because I don’t do a good job of listing those.
Boulder, Colorado, has a great tech scene, that I’ve been a peripheral member of for a while now.  I thought I’d share a few of the places I go to network.  And by “network”, I mean learn about cool new technologies, get a feel for the state of the scene (are companies hiring?  Firing?  What technologies are in high demand?) and chat with interesting people.  All of the events below focus on software, except where noted.

NB: I have not found work through any of these events.  But if I needed work, these communities are the second place I’d look.  (The first place would be my personal network.)

Boulder Denver New Tech Meetup

  • 5 minute presentions.  Two times a month.  Audience varies wildly from hard core developers to marketing folks to graphic designers to upper level execs.  Focus is on new technologies and companies.  Arrive early, because once the presentations start, it’s hard to talk to people.
  • Good for: energy, free food, broad overviews, regular meetings, reminding you of the glory days in 1999.
  • Bad for: diving deep into a subject, expanding your technical knowledge

User groups: Boulder Java Users Group, Boulder Linux Users Group, Rocky Mountain Adobe Users Group, Denver/Boulder Drupal Users Group, Denver Java Users Group others updated 11/12 8:51: added Denver JUG

  • Typically one or two presentations each meeting, for an hour or two.  Tend to focus on a specific technology, as indicated by the names.  Sometimes food is provided.
  • Good for: diving deep into a technology, networking amongst fellow nerds, regular meetings
  • Bad for: anyone not interested in what they’re presenting that night, non technical folks

Meetups (of which BDNT, covered above, is one)

  • There’s a meetup for everything under the sun.  Well, almost.  If you’re looking to focus on a particular subject, consider starting one (not free) or joining one–typically free.
  • Good for: breadth of possibility–you want to talk about Google?  How about SecondLife?
  • Bad for: many are kind of small

Startup Drinks

  • Get together in a bar and mingle. Talk about your startups dreams or realities.
  • Good: have a beer, talk tech–what’s not to like?, takes place after working hours, casual
  • Bad: hard to target who to talk to, intermittent, takes place after working hours.


  • Originally started, I believe, in response to FooCamp, this is an unconference. On Friday attendees get together and assemble an interim conference schedule.  On Saturday, they present, in about an hour or so.  Some slots are group activities (“let’s talk about technology X”) rather than presentations.  Very free form.
  • Good: for meeting people interested in technologies, can be relatively deep introduction to a technology
  • Bad: if you need lots of structure, if you want a goodie bag from a conference, presentations can be uneven in quality, hasn’t been one in a while around here (that I know of)


  • Presentations on a variety of topics, some geeky, some not.  Presentations determined by vote.  Presentations are 20 slide and 5 minutes total.  Costs something (~$10).
  • Good: happens in several cities (Denver, Boulder, Fort Collins) so gives you chance to meet folks in your community, presentations tend to be funny, wide range of audience
  • Bad: skim surface of topic, presentation quality can vary significantly, not a lot of time to talk to people as you’re mostly watching presentations

CU Computer Science colloquia

  • Run by the CU CS department, these are technical presentations.  Usually given by a visiting PhD.
  • Good: Good to see what is coming down the pike, deep exposure to topics you might never think about (“Effective and Ubiquitous Access for Blind People”, “Optimal-Rate Routing in Adversarial Networks”)
  • Bad: The ones I’ve been to had no professionals there that I could see, happen during the middle of the work day, deep exposure to topics you might not care about


  • Cooperative work environments, hosted at a coffee shop or location.
  • Good: informal, could be plenty of time to talk to peers
  • Bad: not sure I’ve ever heard of one happening on the front range, not that different from going to your local coffee shop

Boulder Open Coffee Club

  • From the website: it “encourage entrepreneurs, developers and investors to organize real-world informal meetups”.  I don’t have enough data to give you good/bad points.

Startup Weekend

  • BarCamp with a focus–build a startup company.  With whoever shows up.
  • Good: focus, interesting people, you know they’re entrepeneurial to give a up a weekend to attend, broad cross section of skills
  • Bad: you give up a weekend to attend

Refresh Denver

  • Another group that leverages, these folks are in Denver.  Focus on web developers and designers.  Again, I don’t have enough to give good/bad points.

Except for Ignite, everything above is free or donation-based.  The paid conferences around Colorado that I know about, I’ll cover in a future post.

What am I missing?  I know the list is skewed towards Boulder–I haven’t really been to conferences more than an hours drive from Boulder.

Do you use these events as a chance to network?  Catch up with friends?  Learn about new technologies, processes and companies?

Best Consulting Sales Pitch Ever

I heard this at a presentation about Mule (an enterprise service bus) at BJUG tonight.  The presenter, Rich Remington, at the end of the talk, put up a slide detailing a contest.  Anyone who emailed him with a use case that Mule might be able to help with won a chance at a free lunch and 1-2 hours of free consulting about the use case. You have to email him within 48 hours of the talk.
What a great idea!  Not only does he get to pick a use case that Mule can help with, he also gets a chance to pitch his services, and show his value to the client.  And he gets a list of possible clients–people who know about something about Mule and think they might have a problem Mule can solve.

The contest winner gets more than a free lunch out of it too–the chance to pick an expert’s brain for free for a couple of hours can be worth quite a bit.

Win-win, for sure.  What a great idea for anyone consulting in a specialized field!


Using Excel to ease Java i18n processes

Ah, the perils of reading Bloglines before going to sleep. (I’ll just catch up on one more blog…)
I gave a BJUG talk 18 months ago about large websites and internationalization (i18n). (Links and powerpoint here.) The talk was based on my experiences of a smooth operation, created and executed by Zia Consulting.

A crucial part of this i18n process was creating and moving around Excel files containing keys and translations. In addition, there was an Access database and a VB script that converted the keys and translations to properties files. The reason to do this is that the typical Java developer wants to use ResourceBundles for i18n, which typically involves properties files. And the typical translator is much more comfortable with Excel. So, Zia built a process which bridged that gap.

It looks like someone else solved the same problem with translators and Excel files, and open sourced the solution.

Via Rickard.

Updated 2/16, fixed typo and a small formatting change.


Jini and JavaSpaces at BJUG

I went to BJUG last week to see a presentation about Jini by someone from GigaSpaces. It was an intensely interesting presentation for a number of reasons. First off, I knew the presenter, Owen Taylor. About 6 years ago, I took a class from him, along with a few other people. The class covered BEA Weblogic and EJBs. I’ve attended (and given a couple) technical presentations in my time including some conferences. I don’t think I’ve ever met someone who was more energetic and practiced at conveying hard concepts than Owen Taylor. Owen! Start blogging!

Another reason it was interesting is that Brian Pontarelli, an old friend, really likes Jini and has told me about some of his experiences. I actually looked into it when Bill DeHora published his entry two classic hardbacks. I downloaded Jini and JavaSpaces (Jini is the framework, JavaSpaces is the tuple space repository.) and started playing with it. The final reason that it was an interesting presentation is that JavaSpaces is something that I have never had a chance to use, and didn’t foresee using in the future. By the end of the presentation, I was convinced that this concept deserved more research, if nothing else.

What follows are my scribbled notes from that meeting, along with a smattering of other comments and thoughts regarding these technologies. More information is here, however no presentation artifacts are available, unfortunately.

The problem with distributed systems is that they move data around a lot. What you really want is for the processing and the data to be at most one step apart. Stored procedures do this, but you can’t change the logic easily.

Jini was originally developed for pervasive computing, but the focus of the presentation was on the enterprise applications that can be built based on that spec. This class of applications has some amazing features, including low latency, extremely high throughput and ‘100%’ uptime capability.

For that reason, many large institutions are looking at replacing or augmenting JEE (nee J2EE) applications with JavaSpace applications. He mentioned that GigaSpaces recruited him with the notion that a laptop could run 3 million events an hour. This kind of blew his mind.

JavaSpaces is the command pattern–code and data are distributed, based on Linda. Orbitz uses the technology and talks about 100% uptime. Anyplace where you are batching, you can now do it in real time. The key is to keep everytihing in memory and use replication for persistance, rather than disk. (Eventually you want to push it to disk, for reporting and auditing purposes, but you can do that asynchronously.) Databases tend to be used as a bus between in memory processes right now, and you can replace that with a JavaSpace.

Jini is composed of discrete objects that can run anywhere; more to the point, they don’t care where they run. It also expects failure, as opposed to many other technologies that simply assume that things will run correctly. Jini is a LAN based technology, though Owen mentioned that there are ways to turn it into a WAN technology and cited several examples. I am not competent to give a general overview of Jini–please check out this tutorial for more information.

One thing that really struck me is that all of the complexity that EJB and other JEE technologies hide (clustering, transaction management, thread management, lifecycle), JavaSpaces revels in. Owen actually mentioned that JavaSpaces brings skills that JEE developers currently rarely need to use, like threading and classloading, back into the toolbox, rather than depending on a vendor. That can be a plus and a minus, right? The whole point of not trusting servlet threading to a business developer is that it allows them to focus on the business logic. The problem with much of JEE is that it hasn’t done a very good job of doing this. Do you remember the ‘deployer’ role?

Jini has only interfaces; the named implementations are shipped around transparently. Ha ha, just like EJB remote calls are transparent. However, one very nice aspect of Jini is that when you register an implementation of an interface, you say how long the implementation is going to be available (the lease length). As a service provider, you can keep track of that lease and re-register yourself when it is near to up. Of course, if the service is no longer available, for whatever reason, it is not provided to clients–there’s no need for the JVM to garbage collect. The clients do need to be a bit smart about things though.

As for licensing, version 2.0 has been released under an apache license, as opposed to the Sun Community Source License, which was the previous license. This should grow the community significantly.

Configuration of Jini takes place with a java syntax, which can be a bit confusing, since you don’t compile and execute it. The names of the services (reggie, webster) are a bit cutesy. Webster is the web server which serves implementation classes, but shouldn’t be used in a production environment. Use Tomcat.

Spring and JavaSpaces are complementary; work is in progress to integrate them and completion is expected in the next few months. GigaSpaces has scaled implementations (linearly!) to 2000 cpus on 500 machines….

At this point Owen began talking about various architectural patterns that could be used with Jini; he also covered some war stories. However, I didn’t take any notes–you’ll have to see him talk sometime.

Issues include (so my friend says) versioning. Owen mentioned that debugging isn’t a strong suit. And I did some parallel computing for my senior thesis so I know that splitting up problems so they can be parallelized is not always as easy as you’d like. However, the web paradigm is actually rather suited to parallelization, since you do have the request/response model. The problem is, as it so often is, state.

BJUG last night: breaking it down

OK, here’s a grabbag of links and info regarding my talk last night (links and powerpoint available here). I was interested to know how many folks had actually worked on an internationalized application. (I18n is one of those APIs that you ignore until you need it, and then you absolutely have to have it.) Here are some rough numbers, based on a quick survey of hands. Out of a crowd of 25, about 5 people had used i18n. Of those, 3 had used it with web applications, and one with a J2SE application (I think the last person had used i18n with C). 3 of them had supported European languages and three had supported Asian languages.

I was emphasizing how important it was to pull out all the strings to be displayed to properties files, so they could be localized. A few folks in the audience mentioned that eclipse has support for this–out of the box for java classes, and with a plugin for jsps. (If you are reading this and know the name of the plugin, please comment; I did some googling and couldn’t find any jsp plugins that claimed an ‘Extract Strings’ capability.)

One person asked if we encountered issues with translating plurals, like house/houses. I hadn’t seen that problem with the application I worked upon, but was pretty sure that the java class libraries supported a solution. That solution is ChoiceFormat.

When I mentioned that we used Excel spreadsheets as our transfer format for slinging around translated strings, one of the other folks, who spent five years supporting a java application that was internationalized (he had some war stories), mentioned the XLIFF, an XML variant that is used in several translation editors. If you’re planning to do a lot of translation, you might want to take a look.

The talk after mine was an interesting discussion of SOA by Glen Coward. It was a rather discombobulating mix of an extremely high level look at the SOA landscape, complete with TLAs galore, and an extremely low level example of a rich client in Flex talking to two web services, one built with Axis on JBoss, and the other built with MS.NET deployed on Mono. The demo went poorly, as demos tend to do, but it was enough to whet my appetite to look a bit more closely at SOA.

In addition, as you’d expect from an employee of Novell, he was interested in delineating the differences between open source software and commercial software in a corporate environment, and where each made sense. His point was that in quickly changing areas, corporate software made sense, because of productivity gains that commercial IDE support makes available. In addition, deep integration (read, anything to do with backend legacy systems) is probably going to require commercial software. He didn’t specify why, but I’d guess that specialized backend integration is primarily going to continue to be commercial software for a number of reasons: it’s not sexy, it requires the backend system to test (not often available for the average hacker), and there are intellectual property issues. What Glen didn’t make explicit, but I will, is where that leaves open source–smack dab in the middle, gluing together all the applications. You know, like Apache and JBoss and Tomcat (among many others) already do.

As usual, I learned something at BJUG. And you can’t beat free pizza.

BJUG Localization talk

I’m giving a talk tomorrow (Thursday) in Boulder at BJUG at 6:00. The topic will be “Internationalization and Localization in the Real World”:

This is not another rehash of the Internationalization Trail fromthe Java Tutorial website. Rather, Dan examines one website that is supporting a large number of locales in the real world and looks at how i18n and l10n were implemented in the real world. This includes the nuts and bolts of loading multi character data, framework tools that helped, how users were associated with locales, what parts of the i18n API were not used, and issues to be aware of.

Afterwards, there will be pizza and pop, then a talk about Services Oriented Architecture. Hope to see you there.

JMS at the most recent BJUG

I went to BJUG last Thursday, and enjoyed the informative talk about JMS by Chris Huston. It started out as a bit of a tutorial, with the typical “here’s a messaging system, here are the six types of messages, etc.” When he was doing the tutorial bit, I thought it was a bit simple for a main talk, but it got better as the the speaker continued. It was clear from the speaker’s comments that he was deeply knowledgeable in the subject, or, if not that, at least has been enmeshed in JMS for a while. This wasn’t just a “I downloaded an open source JMS server and ran through the Sun tutorial talk” and I appreciated that.

I had a couple of take aways. One is that managing messaging with transactions is something that you’re always going to want to do, but this is fraught with difficulty, since you’ll then have two transactional systems. And we all know what that means; you’ll have to buy this book. It also means that, in a real system, you’ll never want to use local transactions, as you’ll want the transactions to be managed by a global transaction service, typically your application server.

Recovery of such a transactional messaging service was touched upon. If you have two different transactional systems, and failure occurs, recovering can be a real issue. Chris recommended, if at all possible, having the JMS provider and your data layer live in the same database, as then you can use the backup tools and ensure the two systems are in a consistent state.

One of the most interesting parts of the evening was a question asked by the audience. A fellow asked what scenarios JMS was useful for, and Chris said it was typically used in two ways:

1. Clustering/failover. You can set up a large number of machines and since all they are getting is messages with no context, it’s much easier to fail over to another machine. There’s no state to transfer.

I’ve seen this in the Jetspeed 1.5 project, where messaging is used to allow clustering.

2. Handling a large amount of data while increasing the responsiveness of the system. Since messages can be placed into queues, with no need for immediate response, it’s possible for a message source to create a tremendous number of messages very quickly. These messages may take quite a bit of time to process, and this rules out a synchronous solution. JMS (and messaging solutions in general) allow hysteresis.

I’ve seen this in a client’s system, where they send out a tremendous number of emails and want to ensure they can track the status of each one. It’s too slow to write the status to the database for each email, but sending a message to a queue is quick enough. On the receiving end, there’s some processing and status is written to the database. The performance is acceptable and as long as the JMS provider doesn’t crash or run out of memory, no messages are lost.

The only scenario that I thought of that Chris didn’t mention is one that I haven’t seen. But I’ve heard that many legacy systems have some kind of messaging interface, and so JMS might be an easy way (again, no context required) to integrate such a system.

It was an interesting talk, and reminded me why I need to go to more BJUGs.

Kris Thompson’s review of my talk

Kris Thompson attended my BJUG kick start talk on J2ME development. I wanted to comment on his post.

1. I wouldn’t say that J2ME development has scarred me. But J2ME is definitely a technology (well, a set of technologies, really) that is still being defined. This leads to pain points and opportunities, just like any other new technology. Lots of ground to be broken.

2. Caching–you can do it, but just like in any other situation, caching in J2ME introduces additional complexities. Can it be worth it, if it saves the user time and effort? Yes. Is it worth it for the application I was working on? Not yet.

3. PNG–it’s easy to convert images from GIF/JPEG format to PNG. Check out the extremely obtuse JAI.create() method, and make sure you check out the archives of the jai-interest mailing list as well.

4. Re: Shared actions between MIDP and web portions of the application, I guess I wasn’t very clear on this–the prime reason that we don’t have shared action classes between these two portions was because, other than in one place (authentication) they don’t have any feature overlap. What you can do on the web is entirely separate from what you can do with the phone (though they can influence each other, to be sure).

Anyway, thanks Kris for the kind words.

As a last note, I would reiterate what Kris mentions: “Find out which phones have the features you want/need” and perhaps add “and make sure your service provider supports those features as well.” Unlike in the server side world, where everyone pretty much targets IE, J2ME clients really do have different capabilities and scouting those out is a fundamental part of J2ME development.