Open source portal search

I’ve been looking at some open source portals. My client has an existing java application, written in Expresso that has some reasonably complex logic embedded in it. Additionally, it’s massively internationalized, with dynamic international content coming from a database, and static content coming from a set of resource bundles. There’s an existing process around updating both of these sets of data. And when we’re talking internationalization, we’re talking Asian character sets as well as the European character sets.

So, the criteria for the portal were:

1. Support for multi-byte character sets and easy localization.

2. Ability to integrate with Expresso’s authentication and authorization systems.

3. Support for normal portal features–adding/moving/removing portlets, minimize/maximize portlets.

4. Documentation.

I looked at a fair number of portals, including jcorporate’s own ePortal, eXo, Liferay, Jetspeed 1, Jetspeed 2, and Pluto (a last alternative, to be avoided if possible, is to roll our own portal-like application). First, I looked at ePortal, but that’s a dead project, with no releases. Then, I installed Pluto, which seemed like it would be a perfect fit to integrate into Expresso. However, integrating Pluto looked complex, and after installing it (there are fantastic instructions for installing Pluto here), I realized that Pluto did not have a layout manager that would allow for the addition, rearrangement or removal of portlets.

I then battled with Jetspeed 2, which involved installing a subversion client and building from source. This looked to be pretty cool, but the sheer lack of documentation, and the fact that there have been no releases, caused me to shy away. This is no failure of Jetspeed 2–this is simply what projects in development are like; I think it will be a fine project when done, but my client just doesn’t need to be on the bleeding edge. I also took a quick look at Liferay, which seems to be a much more full-featured portal application than we needed. After reading this blog on portals, I decided to take a closer look at eXo. However, the documentation wasn’t fantastic, and it wasn’t readily apparent how to plug in authentication.

I also downloaded and installed Jetspeed 1; if you download the src distribution, you get the helpful tutorial. While Jetspeed 1 is not a standards-based solution (I expect most of the portlets will be custom developed anyway), the user community is fairly active, as indicated by the mailing list, and I’ve found the documentation to be extensive. In addition, it meets the localization requirements and provides the pluggable authentication and authorization systems we need.

I’m less than thrilled about having to use maven for builds. Others have said it better than I, but it’s just too much for my needs. However, I was able to get an independent directory tree for my project by copying maven.xml, project.properties, and project.xml from the tutorial directory to an empty directory. Then I tweaked the project.* files, ran maven jetspeed:genapp, tweaked a few settings in TurbineResources.properties to make sure the localization settings were correct, and, voila, I had a working project tree that, using the Jetspeed maven plugin, is one command away from a deployable war file.
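
For the record, the relevant locale settings amount to a few lines in TurbineResources.properties. These property names are from my memory of the Turbine docs, so verify them against your version:

    # default locale, plus an encoding that survives Asian character sets
    # (property names as I recall them; check your TurbineResources.properties)
    locale.default.language=en
    locale.default.country=US
    locale.default.charset=UTF-8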


Book Review: Java Transaction Processing

Since many financial institutions have standardized on it, I hear Java is the new COBOL. Whether or not this is true, if Java is to become the business language of choice, transaction support is crucial. (By ‘transaction,’ I mean ‘allowing two or more decisions to be made under ACID constraints: atomically, consistently, (as) in isolation and durably’.) Over the last five years, the Java platform has grown by leaps and bounds, not least in this area.

Java Transaction Processing by Mark Little, Jon Maron and Greg Pavlik, explores transactions and their relationship with the Java language and libraries. Starting with basic concepts of transactions, both local and distributed, including the roles of participant and coordinator, and the idea of transaction context, the book covers much old but useful ground. Then, by covering the Java Transaction API (JTA) as well as OTS, the OMG’s transaction API which is JTA’s foundation, this book provides a solid understanding of the complexities of transactions for Java programmers who haven’t dealt with anything more complex than a single RDBMS. I’d say these complexities could be summed up simply: failures happen; how can you deal with them reliably and quickly?
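
To give a flavor of what the JTA material covers, here’s a minimal sketch of programmatic transaction demarcation. It assumes a container (or standalone transaction manager) has bound a UserTransaction at the standard JNDI location; the class and method names are mine, not the book’s.

    import javax.naming.InitialContext;
    import javax.transaction.UserTransaction;

    public class TransferSketch {
        public void transfer() throws Exception {
            // the standard JNDI location for the container's UserTransaction
            UserTransaction utx = (UserTransaction)
                new InitialContext().lookup("java:comp/UserTransaction");
            utx.begin();
            try {
                // do work against one or more enlisted resources here
                utx.commit();   // the coordinator drives two-phase commit
            } catch (Exception e) {
                utx.rollback(); // on any failure, all participants are undone
                throw e;
            }
        }
    }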

The book then goes on to examine transactions and the part they play in major J2EE APIs: Java Database Connectivity (JDBC), Java Message Service (JMS), Enterprise Java Beans (EJB) and J2EE Connector Architecture (JCA). These chapters were interesting overviews of these technologies, and would be sufficient to begin programming in them. However, they are complex, and a single chapter certainly can’t do justice to any of the APIs. If you’re new to them, expect to buy another book.
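
By way of contrast with JTA’s global transactions, the JDBC chapter’s bread and butter is the local transaction, confined to a single connection. A sketch (standard JDBC; the account table and SQL are invented):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class LocalTxSketch {
        public void transfer(String url, String user, String pass) throws Exception {
            Connection conn = DriverManager.getConnection(url, user, pass);
            try {
                conn.setAutoCommit(false); // one transaction, not one per statement
                Statement st = conn.createStatement();
                st.executeUpdate("UPDATE account SET balance = balance - 100 WHERE id = 1");
                st.executeUpdate("UPDATE account SET balance = balance + 100 WHERE id = 2");
                conn.commit();   // both updates succeed together
            } catch (Exception e) {
                conn.rollback(); // or neither does
                throw e;
            } finally {
                conn.close();
            }
        }
    }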

In the last section, the authors discuss the future of transactions, especially long running activities (the Java Activity Service) and web services. This was the most interesting section to me, but also the most likely to age poorly. These technologies are all still under development; the basic concepts, however, seem likely to remain useful for some time. And, if you need to decide on a web service transaction API yesterday, don’t build your own; read chapter 10.

There were some things I didn’t like about Java Transaction Processing. Some of the editing was sloppy—periods or words missing. This wasn’t too big a problem for me, since the publisher provided me a free copy for review, but if I were paying list price ($50) I’d be a bit miffed. A larger annoyance was incorrect UML and Java code snippets. Again, the meaning can be figured out from the text, but it’s a bit frustrating. Finally, while the authors raise some very valid points about trusting, or not, the transaction system software provider, I felt the constant trumpeting of HP and Arjuna technologies was a bit tedious. Perhaps these companies are on the forefront of Java transactions (possible); perhaps the authors are most familiar with the products of these companies (looking at the biographies, this is likely). The warnings—find out who is writing the transaction software, which is probably at the heart of your business, and how often they’ve written such software before—were useful, if a bit repetitive.

That said, this book was still a good read, if a bit long (~360 pages). I think that Java Transaction Processing would be especially useful for an enterprise architect looking to leverage existing (expensive) transactional systems with more modern technology, and trying to see how Java and its myriad APIs fit into the mix. (This is what I imagine, because I’m not an enterprise architect.) I also think this book would be useful to DBAs; knowing about the Java APIs and how they deal with transactions would definitely help a DBA discuss software issues with a typical Java developer.

To me, an average Java developer, the first section of the book was the most useful. While transactions are fairly simple to explain (consider the canonical bank account example), this section illuminated complexities I’d not even thought of—optimizations, heuristic outcomes, failure recovery. These issues occur even in fairly simple setups—I’m working at a client who wants to update two databases with different views of the same information, but make sure that both are updated or neither; this seems to be a typical distributed transaction. The easiest way to deal with this is to pretend that such updates will always be successful, and then accept small discrepancies. That’s fine with click-throughs—money is a different matter.
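
For what it’s worth, that both-or-neither requirement is exactly what JTA plus XA-capable drivers are for. A sketch of the shape of it, with hypothetical JNDI names, assuming a container that manages both datasources and enlists each connection in the global transaction (error handling elided):

    import java.sql.Connection;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;
    import javax.transaction.UserTransaction;

    public class TwoDatabaseSketch {
        public void updateBoth() throws Exception {
            InitialContext ctx = new InitialContext();
            UserTransaction utx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
            // hypothetical names; both must be container-managed XA datasources
            DataSource dsA = (DataSource) ctx.lookup("java:comp/env/jdbc/viewA");
            DataSource dsB = (DataSource) ctx.lookup("java:comp/env/jdbc/viewB");
            utx.begin();
            try {
                Connection a = dsA.getConnection();
                Connection b = dsB.getConnection();
                a.createStatement().executeUpdate(
                    "UPDATE summary SET status = 'CURRENT' WHERE id = 1");
                b.createStatement().executeUpdate(
                    "UPDATE detail SET status = 'CURRENT' WHERE id = 1");
                a.close();      // returns the connections to the pool; the
                b.close();      // global transaction still owns the work
                utx.commit();   // both databases commit, or
            } catch (Exception e) {
                utx.rollback(); // both roll back
                throw e;
            }
        }
    }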

However, if you are a typical web developer, I’m not sure this book is worth the price. I would borrow it from your company’s enterprise architect, as reading it will make you a better programmer (as well as giving you a sense of history—transactions have been around for a long time). But, after digesting fundamental distributed transaction concepts, I won’t be referencing this book anytime soon, since the scenarios simply don’t happen that often (and when they do, they’re often ignored, as outlined above).


Decreasing the size of a midlet jar

The J2ME application I have been working on has been ready for testing for quite some time, but I didn’t want to get a new AT&T phone. For J2ME, you really need a GSM phone–I don’t think any of the older TDMA models support it. But GSM network coverage doesn’t match the coverage of the TDMA network, especially out west. So I put off buying a phone until my summer road tripping was done.

I’ve had a Nokia 6160 for almost 4 years. Even though friends mocked the size of it, it was a great phone–durable, good talk time. I thought I’d try another Nokia, and got one of the lower end GSM phones, the 6200. This supported J2ME, and weighed maybe half as much. I was all stoked to try the application on my brand new phone.

I started downloading the jad file, and was getting ‘File Too Large’ errors. A couple of searches later, I found Nokia’s developer device matrix, which is much more useful than the User Guide or the customer-facing descriptions of the phones. Whoops. Most of the Series 40 (read: affordable) Nokia devices only supported J2ME applications which were, when jarred up, less than 64K in size.

Our application, however, was about 78K. This highlights one of the differences between J2ME and J2SE/J2EE. When coding in the latter world, I was never concerned about code size–getting the job done quickly was paramount, and if I needed to use 13 libraries which bloated the final size of my application, I did. On a cell phone, however, there’s no appealing to added memory or a changed JVM configuration to optimize memory use. Since the Nokia phone only accepts jars of 64K or less, I had three options:

1. Write off the Nokia Series 40 platform. Ugh–I like Nokias, and other folks do too.

2. Do some kind of magic URL classloading. This seemed complicated and I wasn’t sure how to do it.

3. Decrease the size of the jar file.

Now, the 78K jar had already been run through an obfuscator. I wasn’t going to get any quick and easy gains from automated software. I posted a question on the JavaRanch J2ME forum and received some useful replies. Here’s the sequence I went through:

1. Original size of the application: 79884 bytes.

2. Removal of extra, unused classes: 79881. You can see that the obfuscator did a good job of winnowing out unused classes without my help.

3. Changed all the data objects (5-6 classes), which had been written in classic J2SE style with getters and setters for their properties, to have public variables instead: 79465

4. Combined 3 of the data object classes into one generic class: 78868

5. Combined 5 networking classes into 2: 74543

6. Removed all the logging statements: 66044. (Perl to the rescue–$ perl -p -i -e 's!Log\.!//Log.!' `find . -name "*.java" -print |xargs grep -l 'Log\.'`)

7. Next, I played around with the jode obfuscator which Michael Yuan recommended. I was able to radically decrease the size of the jar file, but, unfortunately, that jar file didn’t work on the phone. I also got a ton of exceptions:

java.util.NoSuchElementException
        at jode.bytecode.BytecodeInfo$1.next(BytecodeInfo.java:123)
        at jode.obfuscator.modules.LocalOptimizer.calcLocalInfo(LocalOptimizer.java:370)
        at jode.obfuscator.modules.LocalOptimizer.transformCode(LocalOptimizer.java:916)
        at jode.obfuscator.MethodIdentifier.doTransformations(MethodIdentifier.java:175)
        at jode.obfuscator.ClassIdentifier.doTransformations(ClassIdentifier.java:659)
        at jode.obfuscator.PackageIdentifier.doTransformations(PackageIdentifier.java:320)
        at jode.obfuscator.PackageIdentifier.doTransformations(PackageIdentifier.java:322)
        at jode.obfuscator.PackageIdentifier.doTransformations(PackageIdentifier.java:322)
        at jode.obfuscator.PackageIdentifier.doTransformations(PackageIdentifier.java:322)
        at jode.obfuscator.ClassBundle.doTransformations(ClassBundle.java:421)
        at jode.obfuscator.ClassBundle.run(ClassBundle.java:526)
        at jode.obfuscator.Main.main(Main.java:189)

I'm sure I just didn't use it right, but the jar file size was so close to the limit that I abandoned jode.

8. Instead, I put all the classes in one file (perl to the rescue, again) and compiled that: 64057 bytes. The jar now downloads and works on my Nokia 6200 phone.

When I have to do this again, I'll definitely focus on condensing classes, basically replacing polymorphism with if statements. After removing extraneous Strings and concatenating all your classes into one .java file (both of which are one-time shots), condensing classes is the biggest bang for your buck.
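
To make steps 3 and 4 concrete, here's the flavor of the change, with hypothetical class names rather than the actual application code. Every accessor pair adds methods and constant-pool entries to the class file, and every extra class adds per-class overhead to the jar, so flattening both out saves real bytes:

    // Before: classic J2SE style, one class per data type, full encapsulation
    public class Account {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // After: public fields, and several data types folded into one generic
    // class, with a type code replacing polymorphism
    public class DataObject {
        public static final int ACCOUNT = 0;
        public static final int ORDER   = 1;
        public int type;   // which kind of data this instance holds
        public String name;
        public int amount; // unused for ACCOUNT; wasted space is cheaper than a class
    }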


XML for data transmission

I was using XML as a file format for a recent project. Not a standardized dialect, just an ad hoc, custom flavor whipped up to hold the data of particular interest. Now, I’ve used many file formats and always thought XML was a bit hyped up. After all, it’s bulky and angle brackets can be a bit tedious to wade through. In addition, parsing it is hard (creating it is difficult too, if you want to do so by creating a DOM tree). Not too hard, you say. Well, compare the joy of a StringTokenizer parsing a pipe delimited line to the pain of traversing around a DOM tree (to say nothing of the “if” hell of a SAX handler).

However, XML does have strong points. Storing hierarchical data in XML is easier. And, as far as I’m concerned, XML’s killer feature as a file transport format is its self-documenting nature. Sure, you can put headers at the top of a pipe-delimited file, but it’s easy enough to forget them. Omitting the XML tags isn’t really possible–you can choose obscure names for the tags that cloud the meaning of the data, but that’s about the worst you can do. Using a custom flavor of XML as a data transmission format can be a really good idea; it just means you’ll have a helper class to do the node-traversing contortions every time you want to read it. (I’m purposefully ignoring technologies like JAXB because I haven’t used them.)
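
For what it’s worth, that helper class amounts to something like this minimal sketch, using the standard JAXP/DOM APIs (the element names are invented):

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class XmlHelper {
        // return the text inside the first element with the given tag under
        // 'parent', or null if the element is missing or empty
        public static String childText(Element parent, String tag) {
            NodeList list = parent.getElementsByTagName(tag);
            if (list.getLength() == 0) return null;
            Node text = list.item(0).getFirstChild();
            return (text == null) ? null : text.getNodeValue();
        }

        public static void main(String[] args) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(args[0]));
            Element root = doc.getDocumentElement();
            System.out.println(childText(root, "customerName"));
        }
    }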



java memory management, oh my!

How well do you understand basic Java? Every day I find some part of this language that I’m not aware of, or don’t understand. Some days it’s cool APIs (like JAI), but today it’s concurrency. Now, language-managed memory is a feature that’s been present in every language I’ve programmed in since I started. I’ve looked at C and C++, but taking a job coding in them seems to me like taking a job with a long commute–both have obstacles keeping you from getting real work done. (I’m not alone in feeling this way.) But this thread of comments on Cameron Purdy’s blog drove home my ignorance. However, the commenters do point out several interesting articles (in particular, this article about double-checked locking was useful and made my head hurt at the same time) to alleviate that. I took a class with Tom Cargill a few years back, which included his threading module; that helped a bit.
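
For the curious, the idiom that article dismantles looks innocuous enough. A sketch (not code from the comment thread):

    public class Singleton {
        private static Singleton instance;

        private Singleton() {}

        // BROKEN under the Java memory model: another thread can observe a
        // non-null 'instance' before its constructor has finished running
        public static Singleton getInstance() {
            if (instance == null) {                // unsynchronized read
                synchronized (Singleton.class) {
                    if (instance == null) {
                        instance = new Singleton();
                    }
                }
            }
            return instance;
        }

        // the boring fix: synchronize the whole accessor and pay the cost
        public static synchronized Singleton getInstanceSafely() {
            if (instance == null) {
                instance = new Singleton();
            }
            return instance;
        }
    }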

However, all these complexities are why servlets (and EJBs) are so powerful. As long as you’re careful to only use local variables, why, you shouldn’t have to worry about threading at all. That’s what you use the container for, right? And we all know that containers are bug free, right? And you’d never have to go back and find some isolated thread related defect that affected your code a maddeningly miniscule amount of time, right?
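
To make that concrete, here’s a hypothetical servlet (not from any real application) showing why the standard advice is to stick to local variables: the container runs every concurrent request through a single servlet instance, so instance fields are shared across threads while locals are not.

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class CounterServlet extends HttpServlet {
        private int hits; // shared by every request thread: a data race

        public void doGet(HttpServletRequest req, HttpServletResponse res)
                throws IOException {
            hits++;             // unsafe: unsynchronized read-modify-write
            int thisHit = hits; // locals like thisHit are per-thread, and safe
            res.getWriter().println("hit number " + thisHit);
        }
    }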


PL/SQL

I recently wrote a basic data transformation program using Java and PL/SQL. I hadn’t used PL/SQL (which is an Oracle-specific procedural language for stored procedures) since writing a basic data layer for my first professional project (a Yahoo!-like application written in PL/SQL, perl and Story Server–don’t ask). Anyway, revisiting PL/SQL reminded me of some of the things I liked and disliked about that language.

I like:

Invalidation of dependencies. In PL/SQL, if package A (packages are simply arbitrary, hopefully logical, groups of procedures and functions) calls package B, A depends on B. If the signatures of B are recompiled (you can separate the signatures from the implementations) package A simply won’t run until you recompile it. This is something I really wish other languages would pick up, because it at least lets you know when something you depend on has changed out from under you.

I dislike:

The BEGIN and END blocks, which mark the boundaries of loops and if statements, are semantically no different from the { and } I’ve grown to love in perl and Java. But for some reason, they take me back to my Pascal days and leave a bit of a bad taste in my mouth.

I’m unsure of:

The idea of putting business logic in the database. Of course, schemas are intimately tied to the business layer (ask anyone trying to move to a different one), and anyone who pretends that switching databases in a Java application is a simple matter of changing a configuration file is smoking crack. But putting chunks of business logic in the data layer introduces a few problems. Every additional language you use increases the complexity of a project–and to debug problems at the interface between two languages, you need someone who knows both. Also, stored procedures don’t fit very well into any of the object-relational mapping tools, and pretty much force you to use JDBC.
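
That JDBC code looks like the sketch below; the package, procedure and parameters are invented, but the call escape syntax is standard JDBC:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Types;

    public class CallProcSketch {
        public String transform(String url, String user, String pass) throws Exception {
            Connection conn = DriverManager.getConnection(url, user, pass);
            try {
                // hypothetical procedure with one IN and one OUT parameter
                CallableStatement cs =
                    conn.prepareCall("{ call pkg_transform.process(?, ?) }");
                cs.setInt(1, 42);                          // IN: row id to process
                cs.registerOutParameter(2, Types.VARCHAR); // OUT: status message
                cs.execute();
                return cs.getString(2);
            } finally {
                conn.close();
            }
        }
    }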


Lessons from a data migration

I’ve been working on a data migration project for the last couple of months. There are two schemas, each used by a number of client applications implemented in a number of technologies, and I just wanted to share some of the lessons I’ve learned. Most of the clients are doing simple CRUD but there is some business logic going on as well. I’m sure most of these points will elicit ‘no-duhs’ from many of you.

1. Domain knowledge is crucial. There were many times when I made dumb mistakes because I didn’t understand how one column mapped to another, or how two tables were being consolidated. This would have been easier if I’d had an understanding of the problem space (networking at level 3 and below of the OSI burrito).

2. Parallel efforts end up wasting a lot of time, and doing things in the correct order is important. For instance, much of the client code was refactored before the data layer had settled down. Result? We had to revisit the client layer again. It was hard to split up the data layer work in any meaningful manner, because of the interdependencies of the various tables (though doing this made more sense than updating client code). Multiple users working on DDL and DML in the same database leads to my next conclusion:

3. Multiple databases are required for effective parallel efforts. Seems like a no-brainer, but the maintenance nightmare of managing multiple developer databases often leads to developers sharing one database. This is workable on a project where most of the development is happening on top of a stable database schema, but when the schema and data are what is being changed, issues arise. Toes are stepped on.

4. Rippling changes through to clients presents you with a design choice. For large changes, like tables losing columns or being consolidated, you really don’t have a choice–you need to reflect those changes all the way through your application. But when it’s a small change, like the renaming of a column, you can either reflect that change in your value objects, or you can hide the changes, either in the DAO (misnamed properties) or database layer (views). The latter choice will lead to confusion down the line, but is less work. However, point #5 is an alternative to both:

5. Code generation is a good idea in this case. Rather than having static objects that are maintained in version control, if the value objects and DAOs had some degree of flexibility in terms of looking at the database to determine their properties, then adding, deleting and renaming columns would have been much, much easier–freeing up more time to fix the GUI and business layer problems that such changes would cause.
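
One way to get that flexibility, sketched with plain JDBC metadata (a real generator would more likely emit .java source files; the class name here is hypothetical):

    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    import java.sql.SQLException;
    import java.util.HashMap;
    import java.util.Map;

    // A value object that discovers its properties from the query itself,
    // so an added or renamed column shows up without editing any class.
    public class DynamicValueObject {
        public static Map load(ResultSet rs) throws SQLException {
            Map row = new HashMap();
            ResultSetMetaData md = rs.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                row.put(md.getColumnName(i), rs.getObject(i));
            }
            return row;
        }
    }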


Understanding the nuts and bolts

I remember back when EJBs first came out and there were all these tools bundled with the application server to build the XML deployment descriptors. Yet, the team I was on built a (perl) application which could generate those same descriptors. Why? Was it a case of ‘Not Invented Here’ syndrome? Someone with more time than sense? Well, perhaps, but it also ensured the team had a portable way of developing deployment descriptors and made sure that someone had a deep knowledge of said files.

Now, I feel the same way about web applications in general and JSF in particular. If you want to really understand the applications you create, you want to build them from the ground up. But, rather than regurgitate the arguments expounded so clearly in The Law of Leaky Abstractions and Beware Evil Wizards, I’d like to talk about where tools are good. This touches on some of the articles I’ve written before, including ease of programming.

Tools tend to be fundamental parts of large systems that have a lot of people involved. Specialized knowledge (or lack of same) can lead to tools being built to help or insulate users from certain grungy parts of a system–hence the EJB roles, which split the deployer and programmer roles (among others) apart. That works fine with a large team.

But another interesting aspect of tools is the abstraction. Joel postulates that eventually the abstraction breaks down, and I’ve seen it happen. But, then again, I don’t expect to understand all of the socket handling that Tomcat does, or the TCP stack of the operating system on which Tomcat runs. I might have to delve into them if there are issues and it’s a high-performance site, but in the normal course of events, that is simply not required. To link to another pundit, situations arise where such scalability just isn’t in the nature of the application. I’d also submit the tons and tons of VBA apps built on top of Outlook, and the large, complex spreadsheets built on Excel, as examples of applications where software design, let alone a deep understanding of the fundamental building blocks of the language, is not required.

Sometimes, you just want to get the job done, and understanding the nuts and bolts isn’t necessary. In fact, it can get in the way. I was talking to an acquaintance today who used to code. When asked why he didn’t anymore, he pointed back to one factor–he wanted to be able to service the customer more quickly. At a higher level of abstraction, you can do that. You give up control, since the implementation of the service is usually in other hands (freeing you to go on to service another customer); in the end, it all needs to be coded somehow. Tools, like Rave and Visual Studio.NET, make that trade-off as well.


Arrogance

Ah, the arrogance of software developers. (I’m a software developer myself, so I figure I have carte blanche to take aim at the foibles of my profession.) Why, just the other day, I reviewed a legal document, and pointed out several places where I thought it could be improved (wording, some incorrect references, and whatnot). Now, why do I think that I have any business looking over a legal document (a real lawyer will check it over too)? Well, why shouldn’t I? I think that most developers have a couple of the characteristics/behaviors listed below, and that these can lead to such arrogance.

1. Asking questions

Many developers have no fear, even take pride, in asking good, difficult questions about technical topics. Asking such questions can become a habit. A developer may ask a question, and feel comfortable about it, when he/she is entirely out of his/her depth.

2. Attention to detail

Developers tend to be capable of focusing on one thing to the exclusion of all else. This often means that, whatever the idea that comes along, a developer will focus on it exclusively. Such focus may turn up issues that were missed by the less attentive, or it may just be nit picking. (Finding small issues isn’t nitpicking when one is developing–it’s pre-emptive bug fixing.)

3. Curiosity and the desire to learn

Most developers are curious. In part because computers are so new, and in part because software technologies change so rapidly, hackers have to be curious, or they’re left behind, coding COBOL (not that there’s anything wrong with that!). This sometimes spills out into other portions of their lives, tweaking their bodies or the mechanics of an IPO.

4. Know something about something difficult

Yeah, yeah, most developers are not on the bleeding edge of software. But telling most people what they do usually causes some kind of ‘ooh’ or raised eyebrows conveying some level of expectation of the difficulty of software development. (Though this reaction is a lot less universal than it was during the dotcom boom–nowadays, it could just as easily be an ‘ooh’ of sympathy to an out-of-work developer.) Because developers are aware that what they do often isn’t that difficult (it’s just being curious, asking questions, and being attentive), it’s easy to assume that other professions usually thought difficult are similarly overblown.

Now, this arrogance surfaces in other realms; for example, business plans. I am realizing just how far I fall short in that arena. I’ve had a few business plans, but they often fall prey to the problem that the gnomes had in South Park: no way to get from action to profit. I’m certainly not alone in this either.

In the same vein of arrogance, I used to make fun of marketing people, because everything they do is so vague and ill-defined. I always want things nailed down. But, guess what, the real world is vague and ill-defined. (Just try finding out something simple, like how many people are driving Fords, how women use the internet, or how many people truly, truly love Ritchie Valens. You appear to be reduced to interviewing segments of the population and extrapolating.) And if you ask people what they want, they’ll lie to you. Not because they want to lie, but because they don’t really know what they want.

I guess this is a confession of arrogance on the part of one software developer and an apology to all the marketroids I’ve snickered at over the years (oops, I just did it again :). (I promise to keep myself on a shorter leash in the future.) Thanks for going out into the real world and delivering back desires, which I can try to refine into something I can really build. It’s harder than it looks.



© Moore Consulting, 2003-2017