
How to evaluate an open source project

There are a huge number of open source projects out there, on SourceForge, Apache, and elsewhere. This is fantastic, because you can often leverage work that other folks have done, as well as the knowledge and mistakes they’ve accumulated along the way. However, it can be extremely difficult to accurately evaluate how much such a project can help you, especially if you’ve never used it before, or if it’s obscure. In addition, you probably don’t have a lot of time to choose a solution–clients that go with open source solutions tend to have budget constraints. Here is a quick checklist that I use to evaluate projects I’m going to depend upon:

1. Know your drop-dead features. These are the features that a software package must have in order to be considered. Be careful not to make this list too long. Its primary purpose is to let you quickly exclude packages that you know won’t work for you, and if you make it too long, you might be left with no options.

2. Look at the documentation attached to the project. This is the first place to start ruling a project out–if it doesn’t promise the features you need, move on. Also, look at a demo or screenshots, if possible. This lets you see how the package works; compare its behavior against your list of needed features.

3. Install the software. If you have difficulty installing it, that’s not an insurmountable issue–open source projects often aren’t the smoothest installations. However, installing the software and spending a few hours playing with something that is going to be a significant part of your project lets you check whether the impressions you received from the demo and documentation are correct–is it going to be easy enough to tweak, deploy, and extend this package?

4. In the world of open source support, the mailing list is king. Does the project have a mailing list? Is it archived? Is it indexed by Google? How active is it? If there’s no mailing list, is there a set of forums? The mailing list (or forum) is where you’re going to go when you have a smart question to ask–and you will–so you want this resource to be as strong as possible.

5. Look at the documentation again. The first time you did so, you were just looking to exclude a project based on its feature set. This time, you want to see how much the documentation can help you. Is there a tutorial? Are the advanced topics that concern you covered? For Java projects, is there javadoc? Is it more than just the automatically generated methods and arguments? What version of the software does the documentation cover?

Of course, the more I depend on a piece of software, the more time I want to spend on evaluation. On the other hand, the process laid out above is entirely technical in nature, and, as we know, there may be other reasons for choosing, or not choosing, software. Installed base, existing team experience and knowledge, project timeline, or the fact that the CEO’s brother is on the board of a company with a rival offering–all of these can influence the choice of software package. There are many factors to be considered, but this list is how I start off.

Expresso authentication and authorization

I’ve only worked briefly with Expresso, but I’ve heard good things about it. However, one ‘feature’ is really chapping my hide at the moment. Apparently, the only way to authenticate someone is to call the attemptLogin method on a ‘Controller’ object (a subclass of a Struts Action), which is protected and takes, among other things, the HTTP request and response. There’s no way I can find to just pass in a username and password and authenticate. The authorization system isn’t broken out either: in OO fashion, you ask an object whether a user can access it, and the object knows enough to reply.

I’m not trying to rag on the Expresso developers. After all, they are giving away a fine, full-featured Java web framework for free. But this just drove home to me how important it is in web applications for the classes that talk HTTP to be nothing more than a thin translating layer around business classes. For instance, all a Struts action should do is convert HTTP forms to domain-specific value objects, and then call business methods on business objects.
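Here’s roughly what I mean by a thin action (a minimal sketch, not Expresso’s actual code; the AuthenticationService class and the forward names are hypothetical):

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

// Hypothetical business class; it knows nothing about HTTP, so a portal,
// a web service, or a unit test could call it just as easily as an action.
class AuthenticationService {
    boolean authenticate(String username, String password) {
        // ... look up the user, check the password ...
        return false; // stub
    }
}

// The action is nothing more than a thin translating layer: pull values
// out of the request, call the business object, map the result to a forward.
public class LoginAction extends Action {
    public ActionForward execute(ActionMapping mapping, ActionForm form,
            HttpServletRequest request, HttpServletResponse response)
            throws Exception {
        String username = request.getParameter("username");
        String password = request.getParameter("password");

        boolean loggedIn =
            new AuthenticationService().authenticate(username, password);
        return mapping.findForward(loggedIn ? "success" : "failure");
    }
}

With that split, anything that can make a method call gets authentication for free–no HTTP request or response to fake up.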

If this were the case in Expresso, it’d be trivial for me to leverage Expresso’s existing authentication model–I’d just have to call the methods on the business object, perhaps after creating a domain-specific value object. Now, however, I’ll probably have to monkey around with the HTTP request and response, decode exactly what parameters it wants, and fake those up.

Open source portal search

I’ve been looking at some open source portals. My client has an existing Java application, written in Expresso, that has some reasonably complex logic embedded in it. Additionally, it’s massively internationalized, with dynamic international content coming from a database and static content coming from a set of resource bundles. There’s an existing process around updating both of these sets of data. And when we’re talking internationalization, we’re talking Asian character sets as well as European ones.
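For flavor, the resource-bundle half of that setup works along these lines (a toy sketch; the bundle name Messages and the key welcome.message are invented for illustration):

import java.util.Locale;
import java.util.ResourceBundle;

// Static content lives in per-locale property files on the classpath
// (Messages.properties, Messages_ja.properties, Messages_de.properties, ...).
// The JDK picks the best match for the requested locale.
public class Greeter {
    public static void main(String[] args) {
        ResourceBundle bundle =
            ResourceBundle.getBundle("Messages", Locale.JAPANESE);
        System.out.println(bundle.getString("welcome.message"));
    }
}

The multi-byte wrinkle is that property files are read as ISO-8859-1, so Asian characters have to be stored as escaped Unicode–the JDK’s native2ascii tool does that conversion.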

So, the criteria for the portal were:

1. Support for multi-byte character sets and easy localization.

2. Ability to integrate with Expresso’s authentication and authorization systems.

3. Support for normal portal features–adding, moving, and removing portlets; minimizing and maximizing portlets.

4. Documentation.

I looked at a fair number of portals, including jcorporate’s own ePortal, eXo, Liferay, Jetspeed 1, Jetspeed 2, and Pluto (a last alternative, to be avoided if possible, is to roll our own portal-like application). First, I looked at ePortal, but that’s a dead project, with no releases. Then, I installed Pluto, which seemed like it would be a perfect fit to integrate into Expresso. However, integrating Pluto looked complex, and after installing it (fantastic instructions for installing Pluto here), I realized that it did not have a layout manager that would allow for adding, rearranging, or removing portlets.

I then battled with Jetspeed 2, which involved installing a Subversion client and building from source. This looked to be pretty cool, but the sheer lack of documentation, and the fact that there have been no releases, caused me to shy away. This is no failure of Jetspeed 2–this is what projects in development are like; I think it will be a fine project when done, but my client just doesn’t need to be on the bleeding edge. I also took a quick look at Liferay, which seems to be a much more full-featured portal application than we needed. After reading this blog post on portals, I decided to take a closer look at eXo. However, the documentation wasn’t fantastic, and it wasn’t readily apparent how to plug in authentication.

I also downloaded and installed Jetspeed 1; if you download the src distribution, you get the helpful tutorial. While Jetspeed 1 is not a standards-based solution (I expect most of the portlets will be custom developed anyway), the user community is fairly active, as indicated by the mailing list, and I’ve found the documentation to be extensive. In addition, it meets the localization requirements and has pluggable authentication and authorization systems.

I’m less than thrilled about having to use Maven for builds. Others have said it better than I, but it’s just too much for my needs. However, I was able to get an independent directory tree for my project by copying the maven.xml, project.properties, and project.xml files from the tutorial directory to an empty directory. Then I tweaked the project.* files, ran maven jetspeed:genapp, adjusted a few settings in TurbineResources.properties to make sure the localization settings were correct, and, voila, I had a working project tree that, using the Jetspeed Maven plugin, is one command away from a deployable war file.

Relearning the joys of DocBook

I remember the first time I looked at Simplified DocBook. I have always enjoyed compiling my writing–I wrote my senior thesis using LaTeX. When I found DocBook, I was hooked–it was easier to use and understand than any of the TeX derivatives, and the Simplified grammar had just what I needed for technical documentation. I used it to write my JAAS article.

But, I remember it being a huge hassle to set up. You had to download OpenJade, compile it on some systems, set up some environment variables, point to certain configuration files, and in general do quite a bit of fiddling. I grew so exasperated that I didn’t even set up the XML to PDF conversion, just the XML to HTML.

Well, I went back a few weeks ago, and found things had improved greatly. With the help of this document explaining how to set DocBook up on Windows (updated 12/2/2006 to fix a broken link), I was able to generate PDF and HTML files quickly. In fact, with the DocBook XSL transformations and the power of FOP, turning a Simplified DocBook article into a snazzy-looking PDF file is as simple as this one command (stolen from here):


java -cp "C:\Programs\java\fop.jar;C:\Programs\java\batik.jar;C:\Programs\java\jimi-1.0.jar;C:\Programs\java\xalan.jar;C:\Programs\java\xerces.jar;C:\Programs\java\logkit-1.0b4.jar;C:\Programs\java\avalon-framework-4.0.jar" org.apache.fop.apps.Fop -xsl "C:\user\default\xml\stylesheets\docbook-xsl-1.45\fo\docbook.xsl" -xml test.xml -pdf test.pdf

Wrap that up in a shell script, and you have a javac for documents.

Abstractions, Climbing and Coding

I vividly remember a conversation I had in the late 1990s with a friend in college. He was an old school traditional rock climber; he was born and raised in Grand Teton National Park. We were discussing technology and the changes it wreaks on activities, particularly climbing. He was talking about sport climbing. (For those of you not in the know, there are several different types of outdoor rock climbing. The two I’ll be referring to today are sport climbing and traditional, or trad, climbing. Sport climbers clip existing protection to ensure their safety; traditional climbers insert their own protection gear into cracks.) He was not bagging on sport climbing, but was explaining to me how it opened up the sport of climbing. A rock climber did not need to spend as much money acquiring equipment nor as much time learning to use protection safely. Instead, with sport climbing, one could focus on the act of climbing.

At that moment it struck me that what he was saying was applicable to HTML generation tools (among many, many other things). During that period, I was just becoming aware of some of the WYSIWYG tools available for generating HTML (remember, in the late 1990s, the web was still gaining momentum; I’m not even sure MS Word had ‘Save As HTML’ until Word 97). Just like trad versus sport, there was an obvious trade-off to be made between hand-coding HTML and using a tool to generate it. The tool saved the user time, but acted as an abstraction layer, clouding the user’s understanding of what was actually happening. In other words, when I coded HTML by hand, I understood everything that was going on. On the other hand, when I used a tool, I was able to make snazzier pages, but didn’t understand what was happening. Let’s just repeat that–I was able to do something and have it work, all without understanding why it worked! How powerful is that?

This trend towards making complicated things easier happens all the time. After all, the first cars were difficult to start, requiring hand cranking, but now I just get in the car and turn the key. This abstraction process is well and good, as long as we realize it is happening and are willing to accept the costs. For there are costs, in climbing, but also in software. Joel has something to say on this topic. I saw an example of this cost myself a few months ago, when Tomcat was not behaving as I expected, and I had to work around an abstraction that had failed. I also saw a benefit of this process of abstraction when I was right out of school. In 1999, there was not the body of frameworks and best practices that exists today. There was a lot of invention from scratch. I saw a shopping cart get built, and wrote a user authentication and authorization system myself. These were good experiences, and it was much easier to support this software, since it was understood from the ground up by the authors. But it was hugely expensive as well.

In climbing terms, I saw this trade-off recently when I took a friend (a much better climber than I) trad climbing. She led a pitch far below her climbing level, and yet was twigged out by the need to place her own protection. I imagine that’s exactly how I would feel were I required to fix my brakes or debug a compiler. Dropping down to a lower abstraction level takes energy, time, and sometimes money. Since you only have a finite amount of time, you need to decide at what abstraction level you want to sit. Of course, this varies depending on the context; when you’re working, the abstraction level of Visual Basic may be just fine, because you just need to get a small application written (though you shouldn’t expect such an application to scale to multiple users). When you’re climbing, you may decide that you need to drop down to the trad level of abstraction in order to go the places you want to go.

I recently read an interview with Richard Rossiter, who has written some of the canonical guidebooks for front range area climbing. When asked where he thought “climbing was going” Rossiter replied: “My guess is that rock climbing will go toward safety and predictability as more and more people get involved. In other words, sport climbing is here to stay and will only get bigger….” A wise prediction; analogous to my prediction that sometimes understanding the nuts and bolts of an application simply isn’t necessary. I sympathize. I wouldn’t have wanted to go climbing with hobnail boots and manila ropes, as they did in the old days; nor would I have wanted to have to write my own compiler, as many did in the 1960s. And, as my college friend pointed out, sport climbing does make climbing in general safer and more accessible; you don’t have to invest a ton of time learning how to fiddle with equipment that will save your life. At the same time, unless you are one of the few who places bolts, you are trusting someone else’s ability to place equipment that will save your life. Just like I’ve trusted DreamWeaver to create HTML that’s readable by browsers—if it does not, and I don’t know HTML, I have few options.

Note, though, that it is silly for folks who sit at one level of abstraction to denigrate folks at another. After all, what is the real difference between someone using a compiler and someone using DreamWeaver? They’re both trying to get something done, using something that they probably don’t understand. (And if you understand compilers, do you understand chip design? How about photo-lithography? Quantum mechanics? Everyone uses things they don’t understand at some level.)

It is important, however, to realize that even if you are using a higher abstraction level, there’s a certain richness and joy that can’t be achieved unless you’re at the lower level. (The opposite is true as well—I’d hate to deal with strings instead of classes all the time; sport climbing frees me to enjoy movement on the rock.) Lower levels tend to be more complicated (that’s what abstraction does—hides complex ‘stuff’ behind a veneer of simplicity), so fewer folks enjoy the benefits of, say, trad climbing or compiler design. Again, depending on context, it may be well worth your while to dip down and see whether an activity like climbing or coding can be made more fulfilling by attacking it at a lower level. You’ll possibly learn a new skill, which, in the computer world can be a career helper, and in the climbing world may save your life at some time. You’ll also probably appreciate the higher level activities if and when you head back to that level, because you’ll have an understanding of the mental and temporal savings that the abstraction provides.

Passwords and authentication

Passwords are omnipresent, but they just don’t work the way they should. A password should be a private string that only the user knows. It should be easy to remember, but at the same time hard to guess. It should be changed regularly, and passed only over a secure connection (SSL, ssh). At least, that’s what the password policies I’ve seen say. People, however, get in the way.

I have a friend who always has the same password: ‘lemmein’. She is non-technical. Whenever she tries to sign in to a system, she has invariably forgotten her password. She tries different incarnations, and eventually becomes so frustrated, she just types ‘lemmein’ and, voila, she is logged in.

I have another friend who is a computer security professional (or was). He has the same issue with forgotten passwords, but rather than have one insecure password, he keeps all his passwords in a file on a machine that he controls, protected by one master password. In this way, he only has to remember the one password, yet machines aren’t at risk.
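The idea behind his setup is simple enough to sketch with the JCE (an illustrative sketch, not his actual tool; the class name, the salt, and the choice of PBEWithMD5AndDES are my own assumptions):

import java.security.spec.KeySpec;

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

// Encrypt a blob of passwords under a single master password, using
// password-based encryption from the JCE.
public class PasswordFile {
    // The salt would normally be generated randomly and stored
    // alongside the encrypted file; hardcoded here for brevity.
    private static final byte[] SALT = {
        (byte) 0xc7, (byte) 0x73, (byte) 0x21, (byte) 0x8c,
        (byte) 0x7e, (byte) 0xc8, (byte) 0xee, (byte) 0x99
    };
    private static final int ITERATIONS = 1000;

    public static byte[] encrypt(char[] masterPassword, byte[] plaintext)
            throws Exception {
        // Derive a secret key from the master password.
        KeySpec keySpec = new PBEKeySpec(masterPassword);
        SecretKey key = SecretKeyFactory.getInstance("PBEWithMD5AndDES")
                .generateSecret(keySpec);

        // Encrypt the password list with the derived key.
        Cipher cipher = Cipher.getInstance("PBEWithMD5AndDES");
        cipher.init(Cipher.ENCRYPT_MODE, key,
                new PBEParameterSpec(SALT, ITERATIONS));
        return cipher.doFinal(plaintext);
    }
}

Decryption is the same code with Cipher.DECRYPT_MODE, so forgetting the master password means losing everything–which is rather the point.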

I sympathize with both of my friends, since, off the top of my head, I can easily think of ten different passwords that I currently use for various systems and applications. In fact, the growth of web applications (since the address bar is the new command line) has exploded the number of passwords I have to remember.

I’m not as blasé about security as my first buddy, nor as together as my second friend, so I just rely on my memory. That works, sometimes. If I seldom visit a site that requires a password, I’ll often just use the ‘mail me my password’ functionality that most such sites have, and won’t even bother to try to remember the password.

Sometimes, password changes are imposed on you. I’ve been at places where your password had to be changed every three weeks, and had to be different from your previous three passwords. I was only there for a short period of time, but I’m sure there are some folks who are cycling passwords (‘oh, it’s one of these four, I know it’).

On the other hand, I worked at one place for three years; I had access to a number of web servers, often with sudo, yet I changed my passwords twice. It was just such a tremendous hassle to bring all my passwords into sync. (Yes, yes, we should have had an LDAP server responsible for all those passwords; that would have made changing them easier. There are some technical solutions that can ease password pain, at least within one organization.)

Passwords are even used in the ‘real world’ now. Leaving aside the obvious example of ATM PINs, my bank won’t let me do anything serious to my account over the phone unless I know my password.

Passwords do have tremendous advantages. They let me authenticate myself without being physically present. They’re easy to carry with you. Computers don’t need special hardware or software to authenticate a user via a password. Everyone understands the concept. But passwords are really the least of the evils when it comes to authenticating remote users (or entities). They’re easy to pass around, or steal, since they aren’t physical. And passwords are either easy to forget or easy to crack.

I guess my solution has been to break up my passwords into levels. For simple things like logging into web applications, I have one or two very easy to remember passwords, or I use the ‘mail me my password’ functionality mentioned above. For more sensitive accounts that I use regularly, computer logins where I’m an administrator of some kind, my email, or web applications where my credit card details are viewable, I’ll have some more complicated password, which may or may not be shared among similar systems. And for other systems where I need a good password but don’t use it regularly, I’ll write it down and store it in a safe place.

Passwords are certainly better than using an SSN, a zip code, or some other arbitrary single token that could be stolen. But they certainly aren’t the optimal solution. I actually used a userid/biometric solution at a client’s office (for the office door), and it rejected me only a very small percentage of the time. The overhead of adding me to the system was apparently fairly substantial, since it took weeks to happen. For situations where the hardware is available and deployed, biometric solutions seem like a good fit.

No one, however, is going to add finger/eye/palm scanners to every machine that I want to access, to say nothing of various interesting remote applications (I want my travelocity!). Some scheme where you login to a single computer that then generates a certificate that uniquely identifies you (something like xauth) may be the best type of solution for general purpose non-physical authentication. But, as a software guy, my mind boggles at the infrastructure needed to support such a solution. Looks like passwords are here to stay for a while.

Slackware to the rescue

I bought a new Windows laptop about nine months ago, to replace the Linux desktop I purchased in 2000. Yesterday, I needed to check whether I had a file or two on the old desktop, but I hadn’t logged in for eight months and had no idea what my password was. Now, I should have a root/boot disk set, even though floppy disks are going the way of cursive. But I didn’t. Instead, I had the Slackware installation disks from my first venture into Linux: an IBM PS/2, with 60 meg of hard drive space, in 1997. I was able to use those disks to load a working, if spartan, Linux system into RAM. Then, I mounted the boot partition and used sed (vi being unavailable) to edit the shadow file:

# Blank out root's password hash, leaving root with an empty password.
sed 's/root:[^:]*:/root::/' shadow > shadow.new
mv shadow.new shadow

Unmount the partition, reboot, pop the floppy out, and I’m in to find that pesky file. As far as I know, those Slackware install disks are the oldest bit of software I own that is still useful.

New approach to comment spam

Well, after ignoring my blog for a week, and dealing with 100+ comment spams, I’m taking a new tack. I’m not going to rename my comments.cgi script anymore, as that seems to have become less effective.

Instead, I’m closing all comments on any older entry that doesn’t have at least 2 comments. When I go through and delete any comment spam, I just close the entry. This seems to have worked, as I’ve dealt with 2-3 comment spams in the last week, rather than 10+.

I’ve also considered writing a bit of Perl to browse through Movable Type’s DBM database to ease the removal of ‘tramadol’ entries (rather than clicking my way to carpal tunnel). We’ll see.

(I don’t even know what’s involved in using MT-Blacklist. Not sure if the return would be worth the effort for my single blog installation.)

Back to google

So, the fundamental browser feature I use the most is this set of keystrokes:
* ctrl-T–open a new tab
* g search term–to search for “search term”
(I set up g as a keyword bookmark that expands and points to a search engine.)

Periodically, I’ll hear of a new search engine–a Google killer. And I’ll switch my bookmark so that ‘g’ points to the new search engine. I’ve tried AltaVista, Teoma and, lately, IceRocket. Yet I always return to Google. The others have some nice features–IceRocket shows you images of the pages–and the search results are similar enough. What keeps me coming back to Google is the speed of the result set delivery. I guess my attention span has just plain withered.

Anyone else have a Google killer I should try?