July 20, 2006

Video on new Web 2.0 Framework

No, not Rails or Shale or Seam, this is something entirely new! So it must be great! Heh.

If you recognize half the acronyms in this funny sketch about the Hak.5 Microshaft Web 2.0 Framework, you're a geek. But it's so 'used car salesman' that even if you don't, you'll probably enjoy it. Have a laugh on me.

Posted by moore at 06:43 PM

June 26, 2006

Using wget to verify on the fly page compression

I've written about wget before, but I just found a very cool use for it. I'm looking at ways to speed up a site by stripping out whitespace. I found one servlet filter that does this by googling around. However, that code has a license restriction for commercial use (you just have to pay for it).

A bit more looking and I found this fantastic article: Two Servlet Filters Every Web Application Should Have. Give it a read if you do java web development. The article provides a free set of filters, one of which compresses a servlet response if the browser can handle it. (I've touched on using gzip for such purposes before as well.)
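The core idea of such a filter is simple: check the Accept-Encoding header, and if the browser advertises gzip support, wrap the response so everything written to it passes through a GZIPOutputStream. Here's a minimal sketch of that idea--the class and helper names are mine, not the article's, and the article's filters also handle getWriter(), content length and other edge cases that this sketch skips:

    import java.io.IOException;
    import java.util.zip.GZIPOutputStream;

    import javax.servlet.*;
    import javax.servlet.http.*;

    // Minimal sketch of a compressing filter (hypothetical names); the filters
    // from the article above are more complete and battle-tested.
    public class GzipSketchFilter implements Filter {

        public void init(FilterConfig config) { }
        public void destroy() { }

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;
            String accept = request.getHeader("Accept-Encoding");
            if (accept != null && accept.indexOf("gzip") != -1) {
                // The browser can handle gzip: compress the body.
                response.setHeader("Content-Encoding", "gzip");
                GzipResponseWrapper wrapped = new GzipResponseWrapper(response);
                chain.doFilter(request, wrapped);
                wrapped.finish();
            } else {
                // Older client: pass the response through untouched.
                chain.doFilter(request, response);
            }
        }

        // Routes everything written to the response through a GZIPOutputStream.
        private static class GzipResponseWrapper extends HttpServletResponseWrapper {
            private GZIPOutputStream gzip;

            GzipResponseWrapper(HttpServletResponse response) {
                super(response);
            }

            public ServletOutputStream getOutputStream() throws IOException {
                if (gzip == null) {
                    gzip = new GZIPOutputStream(getResponse().getOutputStream());
                }
                return new ServletOutputStream() {
                    public void write(int b) throws IOException {
                        gzip.write(b);
                    }
                };
            }

            void finish() throws IOException {
                if (gzip != null) {
                    gzip.finish();
                }
            }
        }
    }

You'd map it in web.xml in front of your JSPs and servlets like any other filter.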

I was seeing some fantastic decreases in page size. (The article states that you can't tell, but FireFox's Page Info command [ Tools / Page Info ] seemed to reflect the differences.) Basically, an 85-90% decrease in size: 50K to 5K, 130K to 20K. Pretty cool. Note that if your application is CPU bound, adding this filter will not help performance. But if you're bandwidth bound, decreasing your average page size will help.

I couldn't believe those numbers. I also wanted to make sure that clients who didn't understand gzip could still see the pages. Out comes trusty wget. wget url pulls down the standard sized file. wget --header="accept-encoding: gzip" url pulls down a gzipped version that I can even ungzip and verify that nothing has changed.

The only issue I saw was that ' c ' is apparently rendered as the copyright symbol in uncompressed pages. For compressed pages, you need to use the standard: ©. Other than that, the compression was transparent. Way cool.

Posted by moore at 09:43 PM

June 21, 2006

GWT Mortgage Calculator Conclusion

This past week has been a whirlwind, including a whole lot of learning on my part and three releases of the Colorado HomeFinder site. Just to rehash, I built three versions of the software, described in the step-by-step posts below.

In general, from a java developer perspective, this is my take on the strengths and weaknesses of GWT.

Strengths:

  • Can use normal java dev/debug environment--in particular, being able to set breakpoints in what will be javascript was useful
  • Hides javascript cross browser messiness
  • Allows rich UIs to be built for the web using the same paradigms as other windowing toolkits (event listeners, layouts, panels, etc). As some folks have said, it is Swing for the web.
  • Ability to create widget libraries--using module 'inheritance' you can easily leverage other folks' work. See this list for a collection of GWT widgets.

Weaknesses:

  • Build integration--I really don't understand why they haven't wrapped it in an ant task, though others have done it for them, and there is an -ant switch on projectCreator.cmd which generates a stub build.xml
  • Documentation--again, I've found that the Google-provided documentation could use some brushing up.
  • Some cross browser weirdnesses--the only serious one I saw was IE not responding to click events on one page I built. This was due to a table of width 100% next to the div where the GWT widgets were dynamically creating DOM elements. But it only showed up on IE.
  • Any ui built by GWT is not indexable by search engines. This necessarily limits what you can do with it--you can build web applications but not web sites.

Posted by moore at 04:52 PM

Step #3: Add a tabbed interface and other enhancements

Previously, I described how to integrate a network call into the GWT mortgage calculator. (See Step #2, Step #2a and Step #2b.) Now I'm going to add another, more complex mortgage calculator panel. This will add in more real world costs. Both the simple and advanced calculators will be available at the same time, using a tabbed interface. Additionally, I've added a servlet which will serve up correct JSON in hosted mode, which means that anyone who wants to download and play with this code will be able to do so without any software beyond the source, the GWT, and a modern version of Eclipse. See the new tabbed interface in action and download the source. (Please note that the interface online is not exactly the same as the interface you can build via the download--my client had me make a couple of changes.) I'll also talk about how I integrated this calculator onto a property specific page.

Adding a tabbed interface was actually quite easy: I used a TabPanel. The only hiccup was that you cannot reuse widgets between tabs. I thought it'd be nice if the loan amount, for example, stayed the same between the tabs. However, if you place a widget in two different containers, the last one wins--the widget simply isn't in the first container. I tested this with the VerticalPanel and the FlexTable. Instead, I had to create new widgets. I'm sure that I could have kept the widgets synchronized with a KeyboardListener, but the payoff didn't seem worth the hassle. Other than that, this is a minor revision. I did some refactoring and pushed validation up to the button widgets.
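For reference, the tab wiring itself boils down to a few lines. This is only a rough sketch--the panel names and the 'slot1' element id are mine, and each panel holds its own copies of the widgets, per the one-container rule above:

    // Rough sketch: simplePanel and advancedPanel stand in for the real
    // calculator panels, each built with its own copies of the widgets;
    // 'slot1' is a placeholder for whatever element id the page uses.
    TabPanel tabs = new TabPanel();
    tabs.add(simplePanel, "Simple");
    tabs.add(advancedPanel, "Advanced");
    tabs.selectTab(0); // show the simple calculator by default
    RootPanel.get("slot1").add(tabs);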

The JSONServlet shows how to keep server and client source in the same tree, and also provides a simple way to toy with JSON client-server communication. I tested this, and you should be able to simply untar the source, import the project into Eclipse, do a bit of GWT judo, and run the project. Steps (only tested on Windows, sorry) to reproduce the final working product:

  • You will have to download the GWT since I can't legally redistribute it.
  • Unzip it into, for example, c:\gwt
  • Download and untar the mortgage calculator source, say into c:\mtgcalc
  • Copy gwt-user.jar and gwt-dev-windows.jar into the lib directory, for example c:\mtgcalc\lib. (This lets you run the MortgageCalc-compile.cmd batch file to compile the app from the command line. If you just want to see the javascript in action, you can stop there and copy the resulting .html/.xml/.gif files to a host someplace and view in a browser.)
  • To set up the debugging, start Eclipse.
  • Import the project via the File / Import menu item.
  • When you're ready to run the browser in hosted mode, edit the launcher in Eclipse. Choose the Run menu, then the Run... option.
  • Select the MortgageCalc application (under Java Application)
  • Choose the Classpath tab. Remove the gwt-user.jar and gwt-dev-windows.jar files.
  • Click under 'User Entries'
  • Click the 'Add External Jars' button.
  • Navigate to where you unzipped GWT (for example, c:\gwt) and select both gwt-user.jar and gwt-dev-windows.jar.
  • Click 'Open'
  • You should see the two jars listed with the external jar icon next to them. If they show up with the plain jar icon instead, that means you added them as jars, not as external jars, and you won't be able to run in hosted mode. (Updated 6/21: You'll get the error described here: [ERROR] The browser widget class could not be instantiated java.lang.UnsatisfiedLinkError: Unable to load required native library 'gwt-ll')
  • Click 'Apply' and then 'Run'.

The next step is to integrate the calculator into an existing JSP and pull some information from that JSP page. Since this is tightly integrated into the Colorado HomeFinder site, code would probably not be useful, so it's not provided. However, if you'd like to see the results, visit Colorado HomeFinder and search for a home, click through to a property and then click the 'Mortgage Payment' link. (Sorry this process is not streamlined, but any link I put to a specific property page on that site will expire in a few months.)

The ideal calculator gets listing price and other information from the server, and prepopulates the appropriate fields. However, the calculator needs to know what property it is being shown for; in other words, it needs to pick property specific data out of the page. There are a couple of possible methods: I could write a javascript variable to the page and use JSNI to turn it into a data structure that my compiled gwt code could read. Or I could put text into a hidden div and then use the Google DOM classes to get the data.

Since what I needed to retrieve was a few simple numeric values, I went with the DOM option. If I had a more complex data structure to communicate, JSNI might have been a better option. I used getElementById to find the appropriate div and getInnerText to get the contents of that div. With those pieces, I could query back to the server and get the price, taxes, etc., which I would use to fill in the appropriate text boxes on the mortgage calculator.
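In code, pulling the data out of that hidden div looks something like the following. This is a hypothetical sketch--the div id, its contents and the loanAmount text box are made-up names for illustration:

    // Hypothetical sketch: assume the JSP writes something like
    //   <div id="propertyData" style="display:none">335000</div>
    // and the widget reads the listing price out of it at startup.
    Element dataDiv = DOM.getElementById("propertyData");
    if (dataDiv != null) {
        String listingPrice = DOM.getInnerText(dataDiv);
        loanAmount.setText(listingPrice); // prepopulate the loan amount box
    }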

And I believe that's all for now. There are, of course, many places where I could expand this calculator, but this is functional enough, and we now have an understanding of the strengths and weaknesses of GWT. And that's the subject of my next post.

Posted by moore at 08:15 AM

June 14, 2006

Step #2b: Updating the GWT Client to communicate with the Server

The next step is to update the client to communicate with the server side components. Again, if you're not very patient, you can see the live client code at Colorado HomeFinder, and download the source.

I need to add a drop down, which will offer users the option of a 5/25 ARM rate, or a 30 year fixed rate. I need to have some way of decoding the JSON response from the server, and I need to update the interest rate text box with the results from the server.

Adding the drop down is fairly easy: Google provides a ListBox. The list box is an instance variable rather than a local one, and I also changed the interest rate text box from a local variable to an instance variable. The reason: the change listener, which is a non-static member class, has easier access to instance variables than to local ones. Here's the relevant code:

    rateChoiceBox.addItem("5/25 ARM");
    rateChoiceBox.addItem("30 Year Fixed");
    rateChoiceBox.setVisibleItemCount(1);
    rateChoiceBox.setWidth(width);
    final RatesResponseHandler rrh = new RatesResponseHandler();
    HTTPRequest.asyncGet(RATES_URL, rrh);
    
    rateChoiceBox.addChangeListener(new ChangeListener() {
        public void onChange(Widget sender) {
            HTTPRequest.asyncGet(RATES_URL, rrh);
        }
    });
...
private class RatesResponseHandler implements ResponseTextHandler {
	  
	public void onCompletion(String responseText) {
		boolean keepGoing = true;
		JSONObject jso = null;
		try {
			jso = JSONParser.parse(responseText);
		} catch (JSONException je) {
			keepGoing = false;
			// not sure what to do
		}
		String rate = "";
		// ... get the data, put it into the rate variable.
		MortgageCalc.this.interest.setText(rate);
	}
  }

In the RatesResponseHandler, I try to parse the JSON response that has been asynchronously downloaded from the server, and if I get meaningful text, I set the interest rate in the textbox: MortgageCalc.this.interest.setText(rate); If the server is not available, we still have a sane default. One interesting item: when you run the code with the server off in development, GWT pops you into the MS script debugger, which at least lets you know something is wrong.

The JSON processing code had already been written, so it made sense to leverage the JSON example that Google kindly provided. To do this, I made a few changes. Luckily, all the example code is made available under the Apache 2.0 license. You can find json.jar in the download, should you want to use it for a project of your own. First, I wanted to make sure that none of the behavior of the EntryPoint JSON class occurred--I wanted the subsidiary libraries, but not the potato demonstration. Therefore, I eviscerated the JSON class and gave it an empty onModuleLoad method. On reading the module documentation, it appears this wasn't needed--I could have just eliminated the <entry-point> entry. Then, I started building the JSON code library. To pull in a set of external widgets, you need to do a couple of things.

  1. Build the library. I did this by compiling the JSON source, from the samples directory: mkdir builddir && javac -d builddir -classpath `cygpath -wp $PWD/bin/:.:../../../gwt-windows-1.0.21/gwt-user.jar` src/com/google/gwt/sample/json/client/JSON*.java. (The cygpath stuff is there because I'm developing on cygwin.) One thing to remember--you need to copy the java files into that same build directory: cd builddir/com/google/gwt/sample/json/client && cp ../../../../../../src/com/google/gwt/sample/json/client/*.java .. (Apparently you need the .class files for Eclipse or any other IDE, and you need the .java files so the Google compiler can compile the java to javascript. More here.) Also, don't forget the JSON.gwt.xml, which should go in builddir/com/google/gwt/sample/json.
  2. Jar up the code, and place it in the classpaths. To compile from the command line, I added it to the Mortgage-compile.cmd batch file. For Eclipse development, I added it to the project buildpath (instructions for doing so). And I also had to add the json.jar file to the MortgageCalc application, so I could debug in Eclipse. To do this, choose 'Run' from the top menu, choose the 'Run...' option, select 'MortgageCalc', click the 'Classpath' tab, click 'User Entries' and add the jar using the Add button.
  3. Inherit from the JSON module. As far as I can tell, this isn't inheritance in the usual object oriented sense, it's more like importing packages in java. I did this by adding this line to the MortgageCalc.gwt.xml: <inherits name='com.google.gwt.sample.json.JSON'/> (I did try to use a source path to refer to the JSON classes, rather than inherit from them, but got this error:
    [WARN] Non-canonical source package: ../../../google/gwt/sample/json/client/ ... [ERROR] Unable to find type 'com.cohomefinder.gwt.mortgagecalculator.client.MortgageCalc' Hint: Check the inheritance chain from your module; it may not be inheriting a required module or a module may not be adding its source path entries properly
    )

After going through those steps, I could use the JSONParser and other JSON objects in my widget. Here's the balance of the RatesResponseHandler:

        if (keepGoing && jso != null) {
            int selected = MortgageCalc.this.rateChoiceBox.getSelectedIndex();
            String[] keys = jso.getKeys();
            if (keys.length == 2) { // expected
                JSONValue jsvrate = jso.get(keys[selected]);
                JSONString jsstr = null;
                if ((jsstr = jsvrate.isString()) != null) {
                    String rate = "";
                    if ((rate = jsstr.toString()) != null) {
                        //Window.alert("rate: " + rate);
                        MortgageCalc.this.interest.setText(rate);
                    }
                }
            }
        }

The JSON parsing code is rather verbose. I also wasn't a fan of the return values from the is* methods; I was always told that any method prefixed with 'is' should return a boolean. But then I peeked into the JSONParser class and decided that I'd just use the free code and stop complaining.
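If the verbosity bothers you, most of it can be tucked into a small helper. Here's a hypothetical sketch (I'm assuming the sample JSONObject returns null for a missing key; check that before relying on it):

    // Hypothetical helper around the sample JSON classes: pull a string value
    // out of a JSONObject, falling back to a default.
    private static String getString(JSONObject jso, String key, String defaultValue) {
        JSONValue value = jso.get(key); // assumed to return null when the key is absent
        if (value != null) {
            JSONString str = value.isString(); // null if the value isn't a string
            if (str != null) {
                return str.toString();
            }
        }
        return defaultValue;
    }

Final comments: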

  • The module inheritance system is a quite powerful, if slightly misnamed, method of reusing code. In fact, there is already at least one blog about GWT widgets.
  • Packaging up code for a module is tedious, and perfectly suited to an ant task.
  • HTTPRequest is entirely adequate for pulling down data.
  • JSON, combined with HTTPRequest, gives you some typing, without tying you into a custom servlet. If you want typing stricter or more granular than javascript provides and are using a java backend, then Google's services might be a better choice.
Posted by moore at 08:02 PM

June 13, 2006

Step #2a: Creating the Server side components

After choosing the means for transmitting the data, the next step is to build the server side code. For Expresso, that means creating a new controller and a new state. A controller is similar to a Struts action; one controller may have many states. Please be aware that if you want to play with the new client code, you can use almost any dynamic web aware language, or even a static HTML page, to return the appropriate JSON, which will look similar to this: {"fivearm":"5.00","thirtyfixed":"6.50"}.

Adding a new state to handle a request for mortgage interest data in Expresso is not too complicated, though it could definitely be easier. I needed to:

  1. Add the values to the database. Expresso has a setup table for generic configuration information that I'll use.
  2. Create a class that extends DBController, and have this class access the database and create the JSON.
  3. Update the configuration file to map a URL to a class. This version of Expresso is based on Struts.
  4. Add a JSP for output.
  5. Update the security tables so that this state can be accessed by everyone.

Adding the values to the database is just an insert statement. For the thirty year fixed interest rate, the SQL looks like this: insert into SETUP values ('com.cohomefinder.CoHomeFinderSchema', 'MTG_30_Year_Fixed_Rate','30 Year Fixed Mortgage Interest Rate','6.50');. One of the benefits of using the setup table is that Expresso ships with an Administrative Web Interface which lets non technical users change 'setup values' with a browser.

The next step is to create a class to respond to our request. That class is the MortgageCalculator controller. This is a fairly simple class, which uses the JSON.simple Java library to create a correctly formatted JSON object. Note that the rates are sent back as Strings even though JSON can handle converting to numbers. The reason for this is that I wanted some kind of formatting control; if the class sent back '6.5' for the thirty year fixed interest rate, I might want it formatted as '6.50'. Formatting is simpler on the server, where I have the entire Java API to use, including NumberFormat.
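The JSON-building part of that class boils down to something like this sketch. The class and method names here are hypothetical, and the rates are passed in as arguments; the real controller pulls them from the setup table:

    import java.text.DecimalFormat;

    import org.json.simple.JSONObject;

    // Hypothetical sketch of building the response the client expects; the real
    // controller reads the rates from the Expresso setup table rather than
    // taking them as arguments.
    public class RatesJsonSketch {
        public static String buildRatesJson(double fiveArm, double thirtyFixed) {
            DecimalFormat twoPlaces = new DecimalFormat("0.00"); // '6.5' becomes '6.50'
            JSONObject rates = new JSONObject();
            rates.put("fivearm", twoPlaces.format(fiveArm));
            rates.put("thirtyfixed", twoPlaces.format(thirtyFixed));
            return rates.toJSONString(); // e.g. {"fivearm":"5.00","thirtyfixed":"6.50"}
        }
    }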

After that, I needed to create an entry in the configuration file to map some URL to this class.

<action path="/MtgCalc" type="com.cohomefinder.controller.MortgageCalculator" name="default" scope="request" validate="false">
<forward name="getRatesAsJSON" path="/expresso/components/registration/jsp/register/mtgcalc.jsp"/>
</action>

As you can see, this is very similar to configuring a struts-config.xml entry.

The JSP was extremely simple. It imports the expresso tag libraries in the standard manner, and then writes the property that the MortgageCalculator class puts in the response (on the last line of the Java file).

<expresso:IfElementExists name="JSON" type="Output"><bean:write property="JSON"/></expresso:IfElementExists>

The last thing to do is update the controller table to allow anyone to access this class. Again, it's a simple SQL statement: insert into controllersecurity values ('com.cohomefinder.controller.MortgageCalculator','Everybody','*');. If I wanted to, I could restrict this information to certain classes of users, but in this case, everyone should have access to it.

After these steps, I can hit http://localhost:8080/MtgCalc.do?state=getRatesAsJSON and get back valid JSON containing values stored in the database.

The next step is to update the client to access the server data using HTTPRequest and to change the GUI accordingly.

Posted by moore at 08:21 AM

June 12, 2006

Step #2: A Calculator which retrieves data from a Java server process

(Updated 6/14 with links to the two parts of this step)

The next step in building a real world mortgage calculator using GWT is to retrieve something simple from a back end. (See the problem's introduction and Step #1, as well as the client code and the server code portions of the step described below.) It will be easiest to start with something simple. In this example, the client will pull two different mortgage interest rates (a 30 year fixed rate and a 5/25 ARM rate) from the Expresso back end. Since I am accessing a Java back end, there are two options.

  • Google's services infrastructure. The benefit of this is fairly painless and transparent marshaling of Java value objects--that's according to the documentation; I haven't used it. The main downside is that I can't use Expresso's default servlet (with its authentication, logging, and caching) to handle the service request, because you must always extend Google's RemoteServiceServlet.
  • Use the HTTPRequest class. This class essentially wraps the XMLHttpRequest object in a cross browser way, so if you're familiar with that Javascript construct (which is at the heart of AJAX), the API shouldn't be too shocking. This class limits the types of response available; no XML/DOM tree is passed back from a request, just text (in a String). The benefit of this method is that it's very familiar to folks who've used XMLHttpRequest and is relatively simple. The main downside is that you're limited to Strings as return values. There are, however, ways around that limit.

Based on the current requirements, it made more sense to use HTTPRequest than Google services. I could see using the Google services layer if I were doing some green field development in Java, or in a situation where such transparent marshaling saved a significant amount of work.

Of course, when you're sending back text, you have a couple of options for encoding the data. I considered a custom encoding, but that's not very scalable to large datasets, escaping and unescaping can be non trivial, and there are well known text transfer formats out there. Of those I know, XML and JSON are the primary ones. I went with JSON because Google kindly (almost) provides a JSON parsing library in their sample code. This parsing library is nice because it allows me to send back values as Strings, Arrays, Booleans, Numbers or Objects and converts the JSON to the appropriate type. I say almost because I had to make a few changes to their code.

In my next post, I'll look at the server side changes I made, including integrating the JSON.Simple Java library into Expresso.

Posted by moore at 04:33 PM

June 10, 2006

Step #1: A Calculator with No Server Interaction

(Update 6/26: Here's step #2, step #3 and the conclusion, with more source code.)

The first step is to build a simple javascript mortgage calculator, like many others out there, but using GWT. It seems that GWT is a great way to build complicated JavaScript UIs without much knowledge of JavaScript. (As an aside, I know this first solution is vastly overengineered, but eventually the problem set will require more of GWT.)

If you're in a hurry, the GWT Mortgage Calculator is live and here is the source (for eclipse, for Windows; you'll need to download the GWT and place the libraries in the lib subdirectory).

First, I found this article about calculating loan payments, which seemed straightforward enough. Then I downloaded the GWT and played around with some of the examples. I then created a project for eclipse, using the instructions in the GWT getting started guide.

A mortgage calculator is a pretty simple problem, really. Five labels and text boxes cover the needed inputs and outputs: interest rate, loan amount, term, number of payments and payment amount. Also needed is a 'calculate' button and some way of conveying an error, should the user give invalid input.
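For reference, the payment calculation itself is just the standard amortization formula. Here's a hedged sketch in Java (method and variable names are mine, not the article's):

    // Standard amortization formula: monthlyRate is the annual rate divided by
    // 12 (e.g. 0.065 / 12 for 6.5%), numberOfPayments is the term in months.
    public static double monthlyPayment(double principal, double monthlyRate,
            int numberOfPayments) {
        if (monthlyRate == 0) {
            return principal / numberOfPayments; // zero-interest edge case
        }
        return principal * monthlyRate
                / (1 - Math.pow(1 + monthlyRate, -numberOfPayments));
    }

For a $200,000 loan at 6.5% over 360 payments, that works out to roughly $1,264 a month, which is the kind of number the 'calculate' button needs to produce.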

I looked at using HorizontalPanel and VerticalPanel, but they looked horrible. I ended up using a FlexTable, which worked just fine.

I wanted to do some inline validation on the text boxes to make sure no one was entering anything other than numbers and a decimal point. Luckily, there is sample code for doing just this. This code worked just fine when I was running in my hosted environment, but when I deployed to a server and looked at the calculator in FireFox, I couldn't type anything. The Google Web Toolkit group had the answer:

The problem is that GWT does not handle the way that Mozilla/Firefox uses keyCode. For a key press event, evt.keyCode is only set for the non-character codes (Function keys, PageUp etc). For character keys the evt.charCode field is set. For key up/down events evt.charCode is never set, only evt.keyCode.
After disabling this input validation, the calculator worked on FireFox.

I started out using a popup for error conditions, like non numeric user input. This example is wrong--rather than calling setWidget() as the last line of the constructor, you need to call add(); otherwise you end up seeing an error like this:

[ERROR] Uncaught exception escaped
java.lang.RuntimeException: JavaScript method '@
com.google.gwt.user.client.ui.impl.PopupImplIE6::fixup
(Lcom/google/gwt/user/client/Element;)'
threw an exception
at com.google.gwt.dev.shell.ie.ModuleSpaceIE6.invokeNative(
ModuleSpaceIE6.java:394)
at com.google.gwt.dev.shell.ie.ModuleSpaceIE6.invokeNativeVoid(
ModuleSpaceIE6.java:283)
at com.google.gwt.dev.shell.JavaScriptHost.invokeNativeVoid(
JavaScriptHost.java:127)
at com.google.gwt.user.client.ui.impl.PopupImplIE6.fixup(
PopupImplIE6.java:43)
at com.google.gwt.user.client.ui.PopupPanel.show(PopupPanel.java:211)

(More information here.) I ended up using a dialog box instead, because I felt it looked better.

Now I had a working copy of the MortgageCalculator, and I just needed to integrate it into my client's website. For speed, we decided I'd just check the derivative javascript/html/xml into the site's source tree. Obviously, for the long term this is not a viable solution. A better way to do this would be to store the java files (and libraries, etc) in CVS and use some of the ant integration out there. Eventually we'll get there.

For ease of integration, I added a paragraph element to an existing JSP with a unique ID that the java class will reference and into which the java class will push the generated html (updated 6/13 to fix a broken url). This is just like the sample application.

In order to access the javascript when the including file and the generated javascript are not in the same directory, you need to not only modify the src attribute from <script language="javascript" src="gwt.js"></script> to <script language="javascript" src="/absolute/path/to/gwt.js"></script> but also the meta tag. This tag changes from <meta name='gwt:module' content='com.cohomefinder.gwt.mortgagecalculator.MortgageCalc'> to <meta name='gwt:module' content='/absolute/path/to/com.cohomefinder.gwt.mortgagecalculator.MortgageCalc'>. This tag can actually be placed in the body rather than in the head.

I ran into another issue when integrating this code into the existing website. For some reason, in FireFox 1.5.0.4 the error dialog box doesn't show up where it should, and instead is placed at the bottom of the screen. Here is the relevant section of code:

    if (retVal != 0) {
        ErrorDialog err = new ErrorDialog("v1 Please enter all data in numeric format.");
        Widget slot1 = RootPanel.get("slot1");
        int left = slot1.getAbsoluteLeft() + 10;
        int top = slot1.getAbsoluteTop() + 10;
        err.setPopupPosition(left, top);
        err.setStyleName("error-Popup");
        err.show();
    }

and here's the error message from the Javascript Console:

Error: Error in parsing value for property 'left'.  Declaration dropped.
Source File: http://localhost:8080/Colorado-mortgage.htm Line: 0
Error: Error in parsing value for property 'top'.  Declaration dropped.
Source File: http://localhost:8080/Colorado-mortgage.htm Line: 0

This seemed to occur no matter which widget I was trying to get the absolute position of--the button, the enclosing table, or the enclosing paragraph element. This only occurred when integrating the calculator into a JSP page (not in the generic generated html), so it's likely it was a site specific bug. To work around it, I used a label rather than a dialog box.

Other random tidbits:

  • In general, I was less than impressed with the Google provided documentation, but very impressed with the GWT group.
  • It was very easy to quickly develop using Eclipse and the hosted web browser.
  • The HTML file you're developing against lives under the src directory (all the way down in the public folder).
  • All the javascript and html are compiled into the www directory.

Next, we're going to integrate with Expresso and pull some values from the database.

Posted by moore at 02:31 PM

June 09, 2006

Step by Step: A Mortgage Calculator using GWT

The following couple of posts are a first for my blog. I have blogged about client projects before, but never have I been paid to do so. But, I currently have a client who enjoys my blog. He also is a big fan of the Google Web Toolkit, and is interested in exploring the usefulness of this toolkit to his website, Colorado HomeFinder (Updated 6/10 with the name of the client, per his request). (Disclaimer: everything I say on this blog is my fault and mine alone.)

So, in the interest of exploration, I'm going to be building a simple mortgage calculator. We'll be using GWT 1.0.21, developing on Windows XPSP2 and integrating with an existing backend built using Expresso 5.3, a heavyweight open source framework. (I've used Expresso before, and written about it, and while the framework is not without its warts, it can be very useful.) I will also be using Eclipse 3.1 to build the code, and be deploying to Linux. FireFox 1.5.0.4 and IE 6.0 (both on Windows XPSP2) will be used to test the application.

I'll be documenting my missteps and lessons learned as I go. In addition, the client has kindly offered to let me distribute the source on my website, so I'll be providing a download each step of the way.

Posted by moore at 02:30 PM

May 20, 2006

Verifying the state of an image download in a javascript event

Well, I was going to write a rant, explaining how as far as I could tell, there was no way to make sure an image was downloaded, or degrade gracefully if it wasn't--within an event like onclick. But, it all boils down to the fact that there is no Thread.sleep() equivalent in javascript. See this for a fine explication or read on for an overview of what I tried that failed.

The problem is that the only real way to do it in javascript is to use setTimeout (Mozilla docs, IE docs). The problem with setTimeout is that after calling it, your event handling code merrily continues to execute, and that your setTimeout callback will probably not finish before the event code is finished.

The other way I thought of was to loop waiting for a specified number of seconds (like this). Unfortunately, in my tests, the javascript engine in IE6 doesn't appear to be multithreaded, and while this wait code executes, the image is not being downloaded.

I did not try the modal window approach, or the java applet (which seems a bit like using a sledgehammer to hit a mosquito) outlined here, but I'm not sure that either of those is really production ready (I'm not alone).

Posted by moore at 11:31 AM

May 10, 2006

Familiarity Breeds Content

With tools, at least.

James Governor raises an interesting question: Is Smalltalk Set for a Renaissance? He discusses some of the new things that are being built on this old language.

However, the most interesting thing to me is his comment, the title of this post, that 'familiarity breeds content' for tools. I touched on this two years ago, when I wrote why I thought struts would be around for a good while. Incidentally, history has borne me out--currently there are 1958 hits on dice for 'struts', compared to 58 for 'webwork' and 29 for 'ruby on rails'. (Past performance is no guarantee of future results...)

Of course, that's no judgement on the benefits of the tools; badly written cgi scripts are still around too. In fact, part of a developer's job, I believe, is to at least play around with new tools and options that may make them more productive. The important takeaway is that, just as many users are reluctant to change office suites, even to upgrade, many developers have enough on their plates without learning new tools.

Posted by moore at 12:12 PM

April 10, 2006

JSVC and large log files

jsvc, which is used for daemonizing Tomcat and other java applications on unix, takes filenames for stdout and stderr as arguments. One thing to be aware of is that when either of these files reaches a size of just over 2 gigabytes, jsvc simply fails. No error message. If you restart the application, it will note that it can't write to the file and proceed to write to the console. I saw this behavior using tomcat 5 on fedora core 4 with jsvc 1.0.1 (described here).

I am not sure exactly what the problem is, but when I started tomcat via the normal shell script, it was able to write to that file. The user that jsvc runs as had no limits on file size:

-bash-3.00$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 1024
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 32764
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Therefore, it might be an issue with jsvc. Do note that there are nightly snapshots of jsvc, which might solve the issue. The solution I found is to use the copytruncate option of logrotate.

Posted by moore at 08:42 AM

April 08, 2006

Best Software Essays of 2005

Update, 4/19: the entries are closed and the link to Joel's site below is null and void.

As he did last year, Joel Spolsky is again collecting the best software essays of the past year. Fantastic essays include Why Big Software Projects Fail and Does Visual Studio Rot The Mind?. Always interesting to see what other folks consider the best essays.

Posted by moore at 12:10 PM | Comments (0)

March 29, 2006

Code behaving badly

You know what, coding is hard. You're balancing shifting requirements, a constant stream of new technologies and a hot cup of coffee, all while trying to keep up with your flood of email. When you actually get a minute to code, you sometimes don't know where to start. Testing is important, but often gets shoved to the bottom of the priority list. This makes for some ... interesting code. I love having my code reviewed, but it's always a humbling experience.

That's why The Daily WTF resonates with me. You know what? I'm self taught, so I probably have written code that bad. I just hope it wasn't code someone paid me for. But even in my professional life, I'm sure I've written code that caused someone maintaining it later to yell 'WTF'. Never Vector Oriented Programming, but I have been accused of using HashMap Oriented Programming.

Apart from mistakes anyone beginning could make, there are also folks (we've all met them) who just shouldn't be coding. That's where this site truly shines: gobs of examples. The Brillant Paula is one great one, among others.

This site is humbling and astonishing. It reminds me of the old 'if your computer was a car' chestnut: software 'engineering' has a long way to go before it truly becomes engineering.

Posted by moore at 05:31 PM | Comments (0)

March 22, 2006

Ant techniques

The JavaRanch Journal has a new newsletter out; one of the articles is an interesting look at some of the new, advanced 'enterprise' features of Ant. This is just part one of the series by Ajith Kallambella; I'll be keeping my eyes out for the next parts.

Posted by moore at 08:29 AM | Comments (0)

March 16, 2006

Authentication without cookies

A cool piece of webhackery:

Cookieless HTTP Authentication.

Via Stefan Tilkov.

Posted by moore at 06:32 PM | Comments (0)

March 10, 2006

Cross country skiing page mapified

As I mentioned I might do in passing here, I've updated my Boulder Cross Country Ski information page with a map of trailheads. Incidentally, the above post was also mentioned in a blog roundup on Google Maps Mania a few weeks back.

Posted by moore at 09:35 PM | Comments (0)

March 07, 2006

Wiki practices for requirements documentation

I have been heads down for the last couple of weeks helping write requirements and design documents. The team I'm on is building them using a wiki. (I discussed the wiki selection process a few weeks ago.)

I just wanted to outline a few practices (I hesitate to call them best practices) for using a wiki to document business and technical requirements.

  • Having a wiki allows anyone to edit the requirements. This doesn't mean that everyone will or should. Documents should still have an owner.
  • Require folks to identify themselves. Require Author met our needs, as it requires that an editing user enter some identifier. A history function without tracking who made which edit is fairly useless. Note that our solution works for a small team. A larger team may want to authenticate every user.
  • Make sure you lock down the wiki. We have ours behind the firewall, which means that we don't have to require a user to remember yet another password, or even login at all (beyond providing some kind of identifier once, as mentioned above).
  • PDF generation allows you to generate decent looking print documents. I found PmWiki2PDF to be adequate.
  • Think carefully about document structure. We broke out the requirements into sections, and had each section on its own wiki page; more than that, we have pages for each section for each type of requirements (business, technical) or design document. These three section pages are pulled into a page for that one component, via the page include directive, which should describe everything known about a particular component. This kind of page seems useful at present, but we haven't begun coding.
  • However, if I had to do it over again, I'd build each main document as one wiki page, and then pull the component info out of that. This allows a user to view the overall history of the document, as opposed to the above setup, where, to see what has changed in the requirements, you have to visit as many pages as there are sections. (You can also look at the RecentChanges page for a group, but that has only page level granularity, as opposed to the line level granularity of the page history.)
  • Choose page names carefully. While it's easy to move content from one page to another, realize that you lose all the history when you do that. Well, actually, you might be able to move the file on the filesystem and retain the history, but for normal users, moving a page (that is, changing a page name) causes history loss.
  • Keep requirements, whether in sections or in one document, in a different group than the design documents. This allows you to lock down the requirements group via a password, while letting other documents, like design, continue to evolve.
  • Cross reference extensively. Don't cut and paste; link or include instead.
  • Use pictures. The support for uploading pictures in PmWiki is alright, though the support for removing them isn't great. Regardless, don't shy away from diagrams and other graphics in the wiki.

I'm going to be interested to see how the process continues to evolve as we get further into development. But so far, I think that a wiki has everything you really need to generate requirements documentation for a small team of developers.

Posted by moore at 10:42 PM | Comments (0)

February 23, 2006

MySQL performance and doing calculations on varchar columns

MySQL, in keeping with other features designed to make it easy to use, tries to do the right thing regarding strings. When you perform a math calculation on a column or columns that are of type varchar, MySQL automatically converts that string to a number (empty strings are treated as zero):

To cast a string to a numeric value in numeric context, you normally do not have to do anything other than to use the string value as though it were a number[.]

However, this translation, convenient as it may be, is not free. A client of mine had a query that was running calculations against two such columns. After indexing and trying to simplify the query, we were still seeing query execution times of 2+ seconds (all times are quoted for MySQL 4.1, on my relatively slow personal laptop).

The solution appears to be to change the type of the columns to double, using the alter table syntax. After doing so and running analyze table mytable, I was seeing query execution times of 0.2 seconds for the same query on the same box. Fantastic.

I am not sure if this result was due to not having to do several string conversions for each row returned by the query, or the fact that:

In some cases, a query can be optimized to retrieve values without consulting the data rows. If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed[.]

Regardless of the cause, if you're doing some complicated calculations on columns, consider making them numbers.

Posted by moore at 09:45 AM | Comments (0)

February 15, 2006

mod_alias vs mod_proxy for XMLHttpRequest proxying

If you're going to use an apache proxy to fix some of the issues with XMLHttpRequest, be aware that mod_alias sends a redirect to the browser. That is, it sends one of the 3XX HTTP status codes to the XMLHttpRequest object. The XMLHttpRequest object then issues the GET itself (well, some do; check out these XMLHttpRequest tests for more). That's fine if you're doing a GET, but if you're doing a POST, then some redirects will require user interaction. I found that mod_alias, which was sending a 301 (redirect permanent), just turned the POST request into a GET. (mod_alias doesn't always come to the rescue, apparently.)

The solution? Well, since you are really proxying the XMLHttpRequest's request, use mod_proxy.

Posted by moore at 09:37 AM | Comments (0)

February 07, 2006

Google Maps Gotchas

I've done some recent work with Google Maps for a client and thought I'd share some lessons learned. (It seems I've been posting a lot about Google lately--I don't know why.)

First off, like many other folks on the web, I think Google Maps are great. I've been a long time MapQuest user, and the fact is that Google's image panes just produce a better, slicker user experience than MapQuest's dynamic image generation. Not to mention the fact that Google's map API is free (as in beer, not in speech). Now, I haven't had a chance to compare Yahoo's map offering (as Michael Yuan did), though I have played around with Yahoo! MapMaker for Excel, but more mapping options can only be better. On to the issues I had with Google Maps.

* Geocoding is not provided with Google Maps, which means that you need to set up your own geocoding engine. However, the Tiger/Line dataset has some holes in it. Especially for rural regions, I was seeing many addresses that could not be geocoded. Even for an urban area (like Boulder, CO) around ten percent of the addresses were not geocodable. As for accuracy of the geocoding itself, I don't even know how to test it on a large scale, but my client said he saw at least some instances where an address was geocoded incorrectly (a small number, but still significant). Well, what can you say? If you want precision, much less accuracy, pay for it. I investigated using Yahoo's geocoding service, which is free and based on higher quality commercial data. Since my client is a commercial site (even though the maps are available for free) Yahoo said that they would require Yahoo maps on the site if it were to use their geocoding service. Fair enough. (As an aside, this was an interesting podcast of a speech by an executive of Navteq outlining some of the issues surrounding procuring good geodata.)

* PNGs are the default image type for the map pinpoints on Google Maps. These images let you mark certain locations. However, if you display them in a list alongside the map, you'll quickly find that transparent PNGs don't work with Internet Explorer. They show up, but are surrounded by a black text box (Update Feb 8: transparent PNGs are outlined by a box in Internet Explorer. I've seen black boxes and blue boxes; regardless of the color, it looks bad). Luckily, the transparent PNG/Internet Explorer problem has already been solved.

* Each pinpoint/marker is added using an overlay. If you have significant numbers of overlays, map rendering becomes quite slow. Now, for Firefox users, it's not as big an issue, because Firefox renders the rest of the page before the map. But for IE users, the table containing both the list and the map is not rendered until the map is displayable. On a reasonably fast box, I was seeing times of 80 seconds until the page finished rendering. Clearly, this was unacceptable. Even on Firefox, some of the rendering times were just too slow. I searched the Google Maps Discussion Group and most everyone was saying that if you have a large number of markers, you should cluster them into a few representative markers until the user has zoomed sufficiently. (If I recall correctly, the maximum number of markers most folks recommended was around 20.) Such clustering can happen on the server side, or on the client side.

* Data retrieval architecture affects performance. For the first revision of the mapping application, I sent the data for each pinpoint at the same time as the map and the listing. This was the wrong way to do it--this method makes perceived rendering time much longer. The correct way to go is documented in 'Hacking Maps with the Google Maps API' XML.com article (linked below), where you use an XMLHttpRequest to pull in the pinpoint data asynchronously. I am in the midst of developing revision two of the interface and have noticed an appreciable speed up in rendering, so I'd recommend heading down that path at the start.

Finally, some resources. The first article you should read is Hacking Maps with the Google Maps API. This tutorial steps you through everything you need to know. The API documentation is adequate. I found some interesting discussions happening on the API discussion group. And, finally, the GoogleMapki was useful in looking at performance concerns. I haven't read Scott Davis' Google Maps API article, but keep meaning to spend the $8.50 and do so.

All in all, I enjoyed learning this new technology. The API is simple and easy to use, even for someone who's no javascript expert. I look forward to adding maps to some of my other pages; my cross country skiing resources page in particular would benefit. Google has kickstarted a whole new area of web development and I think there's plenty more to do.

Posted by moore at 09:04 PM | Comments (0)

January 14, 2006

Perl to the rescue

I am using Apache JMeter to load test a web application. JMeter has an XML file format for storing load test configuration information. I wanted to system test as well, and needed to generate a large number of unique URL hits. Rather than using the clunky UI to add them, and getting carpal tunnel from it, I analyzed the XML file format and split it up. Then I put tokens (XXXTIMETOWAITXXX) in the appropriate places, and used an Excel generated CSV file to drive perl to assemble the pieces of text into a valid JMeter config file.

Well, what happened next? I needed some way to generate a larger number of URLs than Excel could comfortably handle. Again, perl came to the rescue, making it easy to generate umpteen lines of correctly formatted CSV.

Posted by moore at 07:51 PM | Comments (0)

Performance testing, complexity of

Performance testing is a bit like visiting your girlfriend's father. You're never quite sure what you're accomplishing, it can be alternately frustrating and satisfying, and you have to do it. Right now I'm in the midst of performance testing a web based application for my new company. I've been in such testing tangles before, though always as a consultant on a fixed bid project. I'd have to say that performance testing as an employee is less stressful than that.

The reasons why performance testing, especially of web applications, is such a rat's nest are many:

complexity of platforms
Most modern web applications are built on a lot of code. In our case, it's a servlet and logging framework on top of tomcat on top of the JVM on top of the operating system. Four levels on the web server, not counting the back end or the load balancer or any interaction with the browser! And this is a relatively simple system. I've seen portal applications that had 6 or more levels in the web server. Each level of the software stack interacts in (sometimes unforeseen) ways with the others, which means that changing parameters can have unpredictable effects. You simply must test every change you make.

realistic hardware
Unless you're working with Scrooge McDuck or an application that has yet to be deployed, you're probably not going to be able to test on production hardware. Very few companies I've dealt with are willing to buy a duplicate of their production hardware for testing purposes, so you'll probably be testing on a scaled down version of the production system. That means that you'll have to make assumptions about what the smaller system will tell you about the bigger system. One usually safe conclusion is that the smaller system sets a performance minimum for the larger system.

amount of time required
Each performance test takes a significant amount of time--minutes at least, unlike unit testing, where you want the tests to run quickly. Such slow turnarounds mean that performance testing just can't be done quickly.

difficulty of understanding real user behavior
The more complicated your application is, the harder it is to understand how people are going to use it. Will they move quickly through the application? Will they leave sessions open for a long time? How many states will they go through? Anyone can come up with a reasonable guess as to the answers for these questions, but the only way to know for sure is to a) user experience test it, or b) unleash the application.

ambiguous or arbitrary goals
Unless you really understand how your userbase is going to use the application, it's hard to come up with reasonable goals. 'Make it run faster' doesn't cut it. Nor does picking an arbitrary number: 'we want to service 10,000 hits a second' may seem like a good goal, but if that number just was plucked from the air, a lot of misery can result. Especially if you're on a fixed bid project, and every hour you spend is eating into your margin. (It's OK for performance testing to make a tech person miserable, as long as there's business benefit—and an arbitrary number is likely to under- or overshoot the optimum for business.)

difficulty of reproducing real user behavior
I've not had a lot of experience with for pay tools, but have used a variety of free (as in beer) tools. I've written before about my experiences with The Grinder, and am currently using JMeter. I've also used apachebench. And all of these tools were great at hitting URLs repeatedly and rapidly, but it was hard to really reproduce user (and browser) behavior because they're simple programs. An example is that some versions of IE can call a servlet multiple times. You can't possibly hope to replicate all the quirks of browsers when testing, but sometimes those quirks can have performance impacts.

These dimensions of complexity feed on each other. Because it takes so long to performance test an application, you are tempted to change more than one level of the application at once. Because you think you understand user behavior, you come up with an erroneous performance target.

Is it hopeless? Nope, and it can be a very good exercise—it can turn up areas of real weakness in your application. Just remember to document your assumptions, make the tradeoffs abundantly clear to non technical folks and realize that you're going to miss something important. Your results won't be worth as much as you think they will be. Oh yeah, and don't sign any fixed bid performance testing contracts unless you know what you're doing.

Posted by moore at 07:31 PM | Comments (0)

December 30, 2005

On 'The Perils of Java Schools'

Joel has another interesting article, The Perils of Java Schools, where he laments the fact that many CS degrees are focusing on Java. His main points are that if you don't focus on the harder parts of CS (recursion and pointers) then you don't weed out inadequate programmers, and that Java doesn't allow for adequate examination of those harder parts. Weeding is needed, even though the harder parts aren't--for most jobs. (His example of the need for pointers is working on operating systems--how many programmers really need to do that? Darn few. [Of course, for those who are interested in working on operating systems, I'd recommend avoiding a Java based CS degree.]) In addition to weeding out those who don't have a talent for programming, recursion and pointers are great interview topics (see his Guerrilla Guide to Interviewing) for finding smart people to hire.

When I read his article, I thought of two related responses to his lament. The first is that coding isn't the most important thing for many 'programming jobs' anymore. For a large number of them, the ability to relate to business problems and solve business needs is much more important. See this article for a related discussion on how to avoid being outsourced. A pure coder is more likely to be outsourced than a coder who also knows the business. I'd argue that at many organizations, a brilliant pure coder who can't relate to the business folks is less effective than a decent coder who can extract requirements.

I don't have a CS degree. It has definitely hurt me at times: I'm not as comfortable with some of the lower level constructs (parse trees, pointers) as other colleagues with a traditional CS degree. However, my liberal arts education has benefited me, because the writing and oral communication skills that I honed at college help me pinpoint what non-technical folks want to build. In fact, while building a system is fun, the real challenge and reward of software engineering is finding out what needs to be built, and figuring out how to build it. Both types of skills are necessary.

But, the other point of his lament remains. How do you find intelligent software engineers, and how do you distinguish those who talk a good game from those who can actually play it? I was sitting around the poker table a few days ago and some friends were discussing MVC and n-tier architectures. It's so easy to toss around those high falutin' words--it's another to understand the nitty gritty of building them--I don't. But I don't think anyone who hasn't worked on those large scale systems really does--CS degree or not.

I don't know any one way to distinguish good workers. The closest thing to a methodology I have is to ask them questions about real world situations that stonkered me, and see if their answers make sense. Joel still makes sense in the Guerrilla Guide when he says you want people who are "smart and get things done". But I believe that the focus of development has changed enough that the lack of C knowledge is not a loss. Just as the lack of punch card skills is not a loss.

(Note that I've used software engineers, developers, and programmers synonymously above, which may or may not be a justifiable abuse of the English language.)

Posted by moore at 04:39 PM

December 18, 2005

On contracting

I recently (within the last couple of months) took a full time job as a software developer. After two years of contract software development, I have found it to be quite a change. I think that both contracting and full time employment have good things to offer; if you can manage it economically, contracting is well worth doing for a while. Why? A number of reasons:

When you contract, you're responsible for finding your own work and making sure you get paid. This alone is worthwhile, as it gives you a fantastic appreciation for sales and accounting. The networking to find your next gig is great experience for moving up the food chain in a company.

You also get exposed to different technologies and businesses. In my two years, I did work in 6 different programming languages, a number of different frameworks (both free and expensive) and build environments, and 3 or 4 operating systems. That was great--I grew technically and learned how to tell if another techie was competent (and what they were competent at!) relatively quickly.

But more important was the variety of business situations I worked in. I did work alone, with one other technical person, on a team with no dedicated QA, and on a team with dedicated QA and build staff. I did work with small companies, medium sized companies and enormous companies. I worked with startups and with established firms. Some of the organizations had excellent management structures--in others the inmates were running the asylum. And I paid much more attention than I would have if I had been an employee, because my paycheck rested on making sure that whoever was in charge was happy. This breadth of business experience is something that you cannot come by if you're a full time employee.

Beyond that experience, there's also the time and money factors. If you want to control your time, contracting is great--I regularly took multiple week vacations, because I was willing to sacrifice earnings for them. On the flip side, if money is more important than time, you can certainly earn significant amounts of money when contracting, as you get paid by the hour.

Contracting also carries a lot less stress than an employee-employer relationship. In my opinion, in a proper employee-employer relationship, the employee is loyal to the goals of the company (which implies understanding them--itself a difficult matter) and helps achieve those goals. In addition, if an employee makes a technology decision, they are living with that decision for the foreseeable future. These factors combine to make work as a full time employee more stressful; more rewarding, but more stressful. In the contractor-client relationship, you still want to do a great job. But the permanence of technology decisions isn't there--either you're doing work that fits into an existing technology stack, or you're making technology decisions but will be moving on in some finite period of time. Your stress is limited to whether your invoices will get paid!

There are downsides to contracting, too. For one, you're not part of a team. You may work in a team, and they may make an effort to include you, but you're not really part of that team--you're just a hired gun who will do the work and then head off. When the inevitable bug comes up 5 months down the road, they can call you, but doing so incurs a higher transaction cost than if they just had to grab a fellow employee for a minute or two. The flexibility of money and time that you have can cause resentment too.

While the breadth of technologies and business methods can be great to experience, it can also be difficult to process. To hop in and be productive on the first day or two of a new contract can be hard, because you want to make sure you're fitting into the existing processes.

All of the above comments are based solely on my experience. But I'd say that if you're considering contracting, do it! It's a great experience. Make sure you have a buffer of 6 months of pay, and then jump in.

Posted by moore at 03:59 PM | Comments (0)

November 14, 2005

The Ghost of Missing Requirements

I read OK/Cancel sporadically, but the Halloween cartoon was just too good to not call attention to:

I think we've all been on such haunted projects.

Posted by moore at 01:59 PM | Comments (0)

November 12, 2005

unescaping a string with PL/SQL

I've written about PL/SQL before, but I've recently started working on a project that uses it heavily. Given the amount of code written for Oracle databases, I'm rather surprised that there's no PL/SQL Cookbook along the lines of the Perl Cookbook and the Java Cookbook (more cookbooks from O'Reilly are listed here). There is an Oracle Cookbook, but based on a quick scan of Amazon, it's focused, as you'd expect, more on database design than on PL/SQL programming. (Interestingly, there is an Oracle+PHP cookbook, and a PL/SQL sample code page, but neither of those is quite what I'm looking for.)

The reason I'd like a PL/SQL cookbook is that there are large sets of problems that routinely need to be solved in PL/SQL, but the language is so low level (though they just added some regex support in 10g; bravo!) that doing these routine tasks, and making sure they're correctly implemented, can be difficult and tedious. This is especially true for a programmer coming from a different language who's used to higher levels of abstraction (like, for example, those the good folks who author CPAN modules provide)--it'd be well worth my $70 to make sure that I never had to deal with a problem like, say, unescaping a string.

For that's the problem I recently had to solve. Essentially, we have a string that looks like this: yellow,apple. This string represents two values, which need to be put in different places by splitting them up into 'yellow' and 'apple'. All well and good until the possibility of embedded commas arises, for it's possible that the desired end values are 'yellow,blue' and 'apple,banana'. The answer, of course, is to escape the commas on the way in (turning the second input into something like this: yellow:,blue,apple:,banana) and, when processing, to unescape those special characters (both the comma and the escape character, which in the example is the colon). That's what these three functions do. They take a string like the above examples and parse it into a table, to be iterated over at your leisure.
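
Before the PL/SQL, here's a hypothetical sketch of the other half of the problem--how a client might escape the values before they're concatenated and handed to the database. The class and method names are mine; only the escaping scheme (comma as the delimiter, colon as the escape character) is taken from the functions below.

public class DelimiterEscaper {
    private static final char DELIMITER = ',';
    private static final char ESCAPE = ':';

    // escape the escape character first, then the delimiter, so the
    // PL/SQL unescape function can safely reverse the process
    static String escape(String value) {
        return value.replace("" + ESCAPE, "" + ESCAPE + ESCAPE)
                    .replace("" + DELIMITER, "" + ESCAPE + DELIMITER);
    }

    public static void main(String[] args) {
        String packed = escape("yellow,blue") + DELIMITER + escape("apple,banana");
        // prints yellow:,blue,apple:,banana
        System.out.println(packed);
    }
}

The PL/SQL functions that unescape and split such a string follow.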


/* ------------------- function splitit ------------------*/
FUNCTION splitit(p_str VARCHAR2, p_del VARCHAR2 := ',', p_idx PLS_INTEGER, p_esc VARCHAR2 := ':')
RETURN INTEGER
IS
    l_idx       PLS_INTEGER;
    l_chars_before      VARCHAR2(32767);
    l_escape_char       VARCHAR2(1) := p_esc;
    l_chars_before_count        PLS_INTEGER := 0;
BEGIN
    <<outer>>
    LOOP
        l_idx := instr(p_str,p_del, p_idx);
        IF l_idx > 0 then
            WHILE substr(p_str, l_idx-l_chars_before_count-1, 1) = l_escape_char LOOP
                   l_chars_before_count := l_chars_before_count +1;
            END LOOP;


            IF mod(l_chars_before_count, 2) = 0 THEN
                -- if chars_before_count is even, then we're at a segment boundary
                RETURN l_idx;
            ELSE
                -- if odd, then we're at an escaped delimiter, want to move past
                RETURN splitit(p_str, p_del, l_idx+1, p_esc);
            END IF;
            l_chars_before_count := 0;
        ELSE
            RETURN l_idx;
            EXIT outer;
        END IF;
    END LOOP;
END splitit;
/* ------------------- function splitit ------------------*/

/* ------------------- function unescape ------------------*/

FUNCTION unescape(p_str VARCHAR2, p_del VARCHAR2 := ',', p_esc VARCHAR2 := ':')
RETURN VARCHAR2
IS
   l_str VARCHAR2(32767);
BEGIN
   l_str := replace(p_str, p_esc||p_del, p_del);
   l_str := replace(l_str, p_esc||p_esc, p_esc);
   RETURN l_str;
END unescape;
/* ------------------- function unescape ------------------*/

/* ------------------- function split ------------------*/

FUNCTION split(p_list VARCHAR2, p_del VARCHAR2 := ',')
RETURN split_tbl
IS
    l_idx       PLS_INTEGER;
    split_idx   PLS_INTEGER     := 0;
    l_list      VARCHAR2(32767) := p_list;
    l_chars_before      VARCHAR2(32767);
    l_escape_char       VARCHAR2(1) := ':';
    l_array split_tbl := split_tbl('','','','','','','','','','');
BEGIN
    l_list := p_list;
    LOOP
        split_idx := split_idx + 1;
        IF split_idx > 10 then
            EXIT;
        END IF;

        l_idx := splitit(l_list, p_del, 1, l_escape_char);
        IF l_idx > 0 then
            l_array(split_idx) := unescape(substr(l_list,1,l_idx-1), p_del, l_escape_char);
            l_list := substr(l_list,l_idx+length(p_del));
        ELSE
            l_array(split_idx) := l_list;
            EXIT;
        END IF;
    END LOOP;
    RETURN l_array;
END split;
/* ------------------- function split ------------------*/

/* in the header file, split_tbl is defined */
  TYPE split_tbl IS TABLE OF VARCHAR2(32767);

Not all of this code is mine--I built on a solution from a colleague. But I hope this saves one other person from the afternoon I just endured. And if you are a PL/SQL expert and care to critique this solution, please feel free.

Posted by moore at 09:46 AM | Comments (0)

October 28, 2005

Oracle and regular expressions

I cut my teeth on perl and really enjoyed the power of regular expressions. Looks like Oracle has added regular expressions to Oracle 10g. Now if they'd just give me tab completion in sql*plus.

Posted by moore at 03:43 PM | Comments (0)

October 27, 2005

Outsourcing observations

Guess what? It's hard to find talented engineers anywhere, even in India. An interesting read from someone who has apparently been there and seen the issues. Via Lasse's weblog.

Posted by moore at 08:11 PM | Comments (0)

Cross browser javascript/css development issues

I'm working on an application that needs to be supported on a wide variety of browsers, and unfortunately includes some interesting javascript and css. There are three problems we've encountered so far.

1. Finding Browser share

When you want to support most users, you have to try to figure out what they're using. There are at least three or four different sites which give you their browser share, but I think you have to pay if you want really accurate, detailed information; here's one source, here's another, and here's one last site. Update, 11/3: here are stats for the www.bbc.co.uk homepage.

2. Javascript specifications

Perhaps it's just me, but I've had a devil of a time finding a list of javascript events supported by various browsers. I'll give it to Microsoft, they have some documentation on supported events; I couldn't find a similar list of events anywhere on the mozilla site. Here's the Mozilla Javascript page but I don't see anything resembling an API there. (All I want is a javascript javadoc!) Here is the best comparison of event support on modern browsers that I found. Update 10/31: here is a list of events that Gecko recognizes.

3. Getting ahold of old browsers and older operating systems, so you can test

Luckily, this is fairly easy to solve. VMWare (which I've written about previously) takes care of the various operating systems we need to test under (well, that and a mac mini). And a simple google search turned up a fantastic archive of old browsers: browsers.evolt.org, which has many different browsers going all the way back to NCSA Mosaic.

Posted by moore at 08:09 PM | Comments (0)

October 23, 2005

A quick survey of online map generation options

I have a client who wants to put some maps on his commercial website. I've done a bit of looking around, and it's not clear to me what the best way to do it is. There are really two types of mapping services out there. One depends on URL creation, like MapQuest, MapBlast, Yahoo and Google--you don't register or do much coding at all, you just create a GET string with the address encoded in it. The other is a web service where you register for a key and use an API to generate a map, like Yahoo, Google and MapPoint. You'll note that Yahoo and Google appear on both of those lists--that's because they provide both a URL interface and a more formal API.
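
As a concrete (and entirely hypothetical) example of the URL style, here's a sketch in java. The base URL and parameter names are invented--substitute whatever the provider you pick actually documents--but the point is that it's just string building plus URL encoding.

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MapLinkBuilder {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // hypothetical provider and parameter names; check the service's
        // documentation for the real base URL and query parameters
        String baseUrl = "http://maps.example.com/map";
        String address = "1600 Pennsylvania Ave NW, Washington, DC";
        String zoom = "5";

        String url = baseUrl
                + "?address=" + URLEncoder.encode(address, "UTF-8")
                + "&zoom=" + URLEncoder.encode(zoom, "UTF-8");

        // drop the resulting link into an anchor or img tag, as the
        // provider's terms of service allow
        System.out.println(url);
    }
}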

Now, even though I am not a lawyer, it seems to me, via looking around at the various Terms Of Service (TOS), that commercial use of any of the URL interfaces is not an option. The Yahoo Maps TOS says

The data included in Yahoo! Maps, including but not limited to maps, routes, and/or directions ("Data"), is provided for your personal use only...
while the MapQuest TOS says
...MapQuest grants you a nonexclusive, non-transferable license to view and print the Materials solely for your own personal non-commercial use.
Google Maps, which has been extensively mashed up with other sorts of data, appears to abide by the general Google TOS which say
The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales.
To be fair, you may contact Google about commercial services: [i]f you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Updated, 10/30: apparently the maps API is governed by a different TOS, which apparently allows commercial use "as long your site is generally accessible to consumers without charge". My apologies. I didn't look at MapBlast too carefully, because it's built on the MapPoint web service, which has a noncommercial provision.

Luckily, at least for those of us in the United States of America, there are two services provided by the Census Bureau (see, those taxes you're paying are worth something!) which provide mapping data. As far as I could find, these services have no limits on commercial or non commercial use, but they are a bit hokier than the ones I laid out above. (Here are the Tiger TOS and the general Census position of resale.) The Tiger site was the preferable of the two, because it actually gives you a marker for your location. Of course, you have to geocode your address to find your lat/long, but Geocoder.us makes this easy, and even provides instructions on making your own service. The data for Tiger is from 1998, however. If you're looking for more recent data, FactFinder is worth looking at. It didn't work for my client because it provided no way to pinpoint a particular address, though it did allow you to recenter on one without geocoding it.

Neither of these provide directions, as far as I could see, so if you're looking for that, as well as if you want the cooler interfaces of the private sector, you need to look to the web services.

MapPoint, which is a Microsoft service, explicitly denies external commercial use in its TOS:

MapPoint Web Service is for your individual use, solely for internal use by you for your business, or for your own personal use.
Yahoo and Google, however, take a bit more flexible position. For each of these services, according to their TOS, you need to contact them to use the web service they provide in a commercial context. (Yahoo Maps Web Service TOS, and the Google TOS for the web service which is the same as that for the URL interface service.) I have no idea what kind of licensing agreement will emerge from talks with these companies, but, from reading their TOSes, it appears to me that if you want to use their data in a commercial manner, you need to have that conversation.

I've covered all the services that provide maps that I know of. Please let me know if there are any that I've missed or anything I've misinterpreted.

Posted by moore at 10:52 AM | Comments (0)

October 12, 2005

Set your priorities

Joel has a new article out, on the topic of setting your priorities for new development. I am mostly in the consultingware business, and love his characterization of it. Most companies I've worked for want to get in the shrinkware business, but for various reasons it's hard to do (mostly because you have to invest time and money up front--I have seen several attempts to build a product on top of work done for a customer, but never a success)--whereas selling your labor is easy. However, I think that his fast and loose prioritization scheme would work well for custom software development too.

Posted by moore at 11:15 AM | Comments (0)

September 26, 2005

CU Talk: Supporting the Software Revolution

Last Thursday, I went to a talk at the University of Colorado (one of the CU CS Colloquia), "Supporting the Software Revolution", about software and the problems it faces today. Amer Diwan spoke about some of his research and how it deals with modern software programs, which are becoming larger and larger, with the following ramifications:

They are:
1. harder to write--more collaborative
2. harder to understand
3. more resource intensive

He talked about some of his research in the educational sphere, where he was working against the traditional engineering bias against collaboration by training students to work in teams. Amer also mentioned his research into tools to help discover documentation to increase understanding of large programs. But the meat of his research, as well as the focus of this talk, was on technologies to improve performance, including hardware aware software and visualization.

Amer primarily discussed vertical profiling. The premise is that, because of the multilayered nature of today's applications (application on top of framework on top of virtual machine on top of hardware), it is not enough to simply profile the application and the hardware, since each level can interact with every other level in non intuitive ways.

The answer is vertical profiling, where you instrument each layer appropriately. (Some layers, such as the Jikes JVM, are pre-instrumented.) Find a target metric, like instructions per cycle. Instrument all the other metrics of interest (for example, new object allocations is one thing that could be instrumented at the virtual machine level). Then, align all these metrics against one common metric to combat nondeterministic behavior.

(This is where I get a bit fuzzy--I believe they used an algorithm to match up the instructions per cycle with the other interesting metrics; he mentioned it had previously been used for speech recognition. I'm not sure how to align three variables [time, instructions per cycle, and one other interesting metric] on one chart.)

Then, after all the metrics have been aligned in time, look for interesting and disturbing patterns--this is where the humans come in and rank the graphs by similarity. Then see if one metric depends on another--you can then discard the dependent metric, since you're looking for the root issue. After you think you have found the root issue, make a change to that, and profile the stack again. If the pattern of interest is gone, you are validated and have addressed the root issue--otherwise back to the drawing board.

This was interesting because of the alignment in time of different metrics (which is, unfortunately, the piece I understand and remember the least). Other than that, it was pretty much a well thought out and methodical explication of common knowledge of profiling. Change one thing at a time, look for dependencies, validate your suppositions, be aware of interactions between the layers of your application. It is nice to see someone trying to turn the black art of performance tuning a bit more into a science.

So, if you're ever on a project that has the time and budget to deal with performance issues in a methodical manner, vertical profiling is worth a look. More here:

Performance explorer: understanding java application behavior

One of Diwan's student's research pages

Posted by moore at 11:17 AM | Comments (0)

September 22, 2005

InstallAnywhere Impressions

I helped write a java program a few months ago, a product designed to run on mail servers. Now, we had the program packaged up as a zip file (for windows) and a tarball (for unix). That wasn't my decision, but it makes sense to me--if you are deploying a program on a mail server, you should know how to unzip files and edit configuration files.

But, that's not what the client wanted. They came back recently with a few changes, and a desire to install via InstallAnywhere. I am no expert at InstallAnywhere, but the client didn't have the engineering cycles to repackage the program, so they paid me to do it. What follows is my overall impression of InstallAnywhere, and a few tips and tricks.

Overall, I like InstallAnywhere. This program makes it easy to build java program installers for a variety of different platforms (most of the unices, Macs and Windows), execute sundry actions pre and post install, and grab user input while installing. It supports both GUI and console installation procedures. In particular, the Windows installer was a snap, and I didn't have to learn the first thing about the registry--InstallAnywhere took care of that, even to the point of having the program show up on the 'Add/Remove Programs' Control Panel.

On the downside, there is a bevy of options, and the help file wasn't exactly the best. They have a free trial version, but it complains every time you run an installer built with the trial version; such installers also stop working around 10 days after you build them--and the trial version doesn't warn you about that future failure.

A few tips:

* It's possible to keep the install configuration file in CVS, except for the fact that it hardcodes paths to resources that it includes in the install file. I was able to work around that by using ant's replace task.

* When you start up the program (on unix), you can't kill it normally, via either ctrl-c or backgrounding it and running the kill command on the process. I believe this is because the default behavior of a launcher is to listen to the console for stdin. You can change this easily enough, but the default really should be correct.

* The installer builder doesn't default to generating a log, even though many of the default log messages point you to the install log file. You have to click a checkbox on the Project Info Pane in the Advance Installer.

* The console installer insisted that there were errors in the installation process even though the program, post install, worked fine and there were no errors written in the installer log. I think this is due to the fact that I'm using the trial version, but am not sure.

* There doesn't seem to be any way in the InstallAnywhere GUI to specify that if the DISPLAY variable is set (on unix systems), the GUI installer should run, otherwise the console installer should run. If you want, you can edit the generated install.bin installer script--search for 'FAILSAFE' and use a modern editor capable of long lines--but I couldn't figure out a way to automate this change. This is my biggest gripe, since this is a very typical demand. If you don't run install.bin -i console to specify a console installation, you get a lovely exception:

Stack Trace:
java.awt.HeadlessException:
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
        at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:159)
        at java.awt.Window.<init>(Window.java:317)
        at java.awt.Frame.<init>(Frame.java:419)
        at java.awt.Frame.<init>(Frame.java:384)
        at javax.swing.JFrame.<init>(JFrame.java:150)
        at com.zerog.ia.installer.LifeCycleManager.f(DashoA8113)
        at com.zerog.ia.installer.LifeCycleManager.g(DashoA8113)
        at com.zerog.ia.installer.LifeCycleManager.a(DashoA8113)
        at com.zerog.ia.installer.Main.main(DashoA8113)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at com.zerog.lax.LAX.launch(DashoA8113)
        at com.zerog.lax.LAX.main(DashoA8113)
This Application has Unexpectedly Quit: Invocation of this Java Application has caused an InvocationTargetException. This application will now exit. (LAX)

Overall, I'd say if you're delivering a java based product to typical users, InstallAnywhere is a good choice. I don't know how much it costs, but the experience for the users is as seamless as you want it to be, the width and breadth of installer customization is impressive, and it didn't take too long to get up to speed.

Posted by moore at 11:59 AM | Comments (0)

July 28, 2005

OWASP Guide 2.0.1 released

If you write web applications, you should peruse the latest release of the Open Web Application Security Project Guide, currently at 2.0.1. Its 293 pages of tips, tricks and techniques, for multiple languages, should be useful to any developer.

Posted by moore at 08:09 AM | Comments (0)

May 06, 2005

How Software Patents Actually Work

Here is a 4 minute non technical film explaining software patents.

Via the EFFector mail list.

Posted by moore at 10:42 AM | Comments (0)

May 05, 2005

Evaluating CMSes: cmsmatrix.org

One of the hardest decisions every developer faces is build vs buy. In general, build takes more money and time, but can deliver a program closer to the users' needs with greater flexibility. Buy, on the other hand, limits extension of the software--only in ways that the creators have intended can you typically extend bought software--but delivers it quickly and for a known cost (unless you're buying Oracle in which case, I hear, the price is negotiable :).

One of the harder components of deciding to buy is comparing features. This usually involves rummaging around websites, downloading evaluation copies and installing them. I've done a few of these (for open source portals, open source CMSes and bug tracking tools) and it's interesting as well as daunting. There's just a lot out there, and with limited time, I end up making decisions based on less than full fledged implementation. You can't afford to entirely implement the solution using the proposed software, and every solution will cause you pain (including, for that matter, custom built solutions).

Regardless, a friend sent me a website that takes some of the tediousness out of evaluating CMSes. Sure, it's not a replacement for downloading the software and trying it out, but it does give you a central starting point and makes it easier to quickly rule out possible solutions. I was also impressed by the inclusion of blogging tools, the ease of use of the site, and the breadth of features compared.

Posted by moore at 11:15 AM | Comments (0)

May 03, 2005

Using XSLT to grab only certain RSS entries

So, as I've mentioned before, RSS can help you find a job. However, many jobs in my area are posted to a yahoo group (rmiug-jobs). I'm usually interested in seeing new contracts, even if it's just to see how the market is doing. However, subscribing to this email list presents you with four choices:

1. Have your inbox flooded with job postings, most of which don't apply to you. The benefit of this method is that when you do see one that applies, you can respond. Every single response I've received off of this list was in reply to a mail I sent minutes after seeing the job post; I'm guessing that almost 8000 members means that any job posters are flooded with resumes.

2. Create a filter so that all the mail messages (or even just the ones with interesting subject lines) are pushed to one folder in your email client. This means that your inbox isn't flooded, but that you have to read that folder regularly. I didn't do that often enough for it to be worthwhile. In fact, as the messages piled up in that folder, I felt less and less able to read it. In addition, you may have issues if your filtering rules are complex (I want A and B but not C), though not if you use procmail.

3. Get the daily digest and miss out on timely job postings. I did this for a few months and found that I almost never read the large digest. I just felt guilty at the bandwidth wastage.

4. Use the search functionality to periodically check for postings of relevance to you. This helps with research, but doesn't deal with the time issue. And, you have to remember to check periodically.

However, now there's a fifth solution. Yahoo provides an RSS feed for that group. (Not all groups seem to have rss provided for them, and I couldn't figure out how to turn it on for a group that I moderate.)

With the magic of XSLT, I was able to write a stylesheet which only grabs entries with interesting keywords in the title, thus avoiding the flooding problem. RSS is not real time, but it can be close (as close as I want, or am allowed, to poll the feed). Additionally, I'm a lot more likely to scan it than I would any of the email solutions.

Here's the relevant XSLT:

<xsl:template match="item">
        <xsl:variable name="item_link" select="link"/>
        <xsl:variable name="item_desc" select="description"/>
        <xsl:variable name="item_title" select="title"/>
        <xsl:variable name="uc_item_title" select="translate($item_title,'boulderjava','BOULDERJAVA')"/>
        <xsl:choose>
           <xsl:when test="contains($uc_item_title, 'JAVA')">
              <li><a href="{$item_link}" title="{$item_desc}"><xsl:value-of select="title"/></a></li>
           </xsl:when>
           <xsl:when test="contains($uc_item_title, 'BOULDER')">
              <li><a href="{$item_link}" title="{$item_desc}"><xsl:value-of select="title"/></a></li>
           </xsl:when>
           <xsl:otherwise>
           </xsl:otherwise>
        </xsl:choose>

</xsl:template>

The reason for the translate cheesiness is that the version of the perl RSS module I'm using does not support the upper-case function (here's a useful list of XSLT functions).

Posted by moore at 07:39 PM | Comments (1)

March 26, 2005

Metafor: Using English to create program scaffolding

Continuing the evolution of easier-to-use computer programming (a lineage which includes tools ranging from assembly language to the spreadsheet), Metafor is a way to build "the scaffolding for a program." This doesn't mean that programmers will be out of work, but such software sketching might help to bridge the gap between programmers and non-programmers, in the same way that VBA helped bridge that gap. (I believe that naked objects attacks a similar problem from a different angle.) This obviously has implications for novices and folks who don't understand formal problems as well. Via Roland Piquepaille's Technology Trends, which also has links to some interesting PDFs regarding the language.

However, as most business programmers know, the complicated part of developing software is not in writing the code, but in defining the problem. Depending on how intelligent the Metafor parser is, such tools may help non-technical users prototype their problems by writing sets of stories outlining what they want to achieve. This would have two benefits. In one case, there may be users who have tasks that should be automated by software, but who cannot afford a developer. While definitely not available at the present time, perhaps such story based software could create simple, yet sufficient, applications. In addition, software sketching, especially if a crude program was the result, could help the focus of larger software, making it easier (and cheaper!) to prototype a complicated business problem. In this way, when a developer meets with the business user, they aren't just discussing bullet points and static images, but an actual running program, however crude.

Posted by moore at 02:00 PM | Comments (0)

March 18, 2005

The label tag

An HTML tip for you: the label tag is a boon to usability. Associate a label with a form control (for example, <input type="checkbox" id="remember"/> <label for="remember">Remember me</label>) and clicking the label text should take you to--or, for a checkbox, toggle--the control itself.

Small touches like this make it much easier to build forms that are forgiving of user input--you can make clicking a checkbox easy for the mouse impaired.

Doing this kind of control extension used to take a bit of javascript, but now the label tag makes it easy. Looks like the label tag is supported in modern browsers (Moz 5, IE 5).

Posted by moore at 05:53 PM | Comments (0)

February 25, 2005

Reliable HTTP Draft

Here's an interesting 'pre-draft' of HTTPLR, an 'application protocol for reliable transmission of messages using HTTP' (via the author's blog). It doesn't require throwing away already built out infrastructure. There are, however, a few wrinkles:

This draft does require support of the PUT method, which is not available out of the box on Apache, as well as the DELETE method, which again requires webdav to work with Apache.

It uses the DELETE method (rather than the POST) to communicate client knowledge of message transfer (section 8.3, 9.3). I'm not sure how I feel about that, as it seems to be mis-using that method.

Other than that, I like the idea. It seems really well thought out, but it would be nice to see some sample code.

Posted by moore at 01:54 PM | Comments (1)

Setting the content encoding for HTML message parts with Javamail

I spent an hour chasing down the solution to this issue, so I figured I'd post it (or at least what worked for me). Basically, I have a multi-part message that can have different content encodings for each text part. I want to send this message via javamail. Now, there's support for setting content as type 'text/plain' with a different character set, but if you want to add a part that is a different subtype of text to your message, there is no convenience method. However, this mail message had an example of how to specify html content and a character set:

MimeBodyPart htmltext = new MimeBodyPart();
htmltext.setContent(someDanishChars, "text/html; charset=\"ISO-8859-1\"");

(The author had some issues with this method in different app servers; it works fine for me in a stand alone java program.)
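
To put that snippet in context, here's a minimal sketch of a complete multipart message with a plain text part and an html part, each declaring ISO-8859-1. The addresses and content are placeholders, and Transport.send assumes an SMTP host has been configured in the session properties.

import java.util.Properties;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

public class MultipartCharsetExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("mail.smtp.host", "localhost"); // placeholder SMTP host
        Session session = Session.getInstance(props);

        MimeMessage message = new MimeMessage(session);
        message.setFrom(new InternetAddress("sender@example.com"));
        message.setRecipients(Message.RecipientType.TO, "recipient@example.com");
        message.setSubject("charset test");

        // plain text part: the convenience method takes a charset directly
        MimeBodyPart plaintext = new MimeBodyPart();
        plaintext.setText("plain text danish chars here", "ISO-8859-1");

        // html part: no convenience method, so the charset rides along in the content type
        MimeBodyPart htmltext = new MimeBodyPart();
        htmltext.setContent("<html><body>html danish chars here</body></html>",
                "text/html; charset=\"ISO-8859-1\"");

        MimeMultipart multipart = new MimeMultipart("alternative");
        multipart.addBodyPart(plaintext);
        multipart.addBodyPart(htmltext);
        message.setContent(multipart);

        Transport.send(message);
    }
}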

These additional parameters to the 'Content-Type' header are laid out, for text documents, in section 4.1 of RFC 2046. Here's a collection of helpful email related RFCs. Additionally, section 5.1 of RFC 2045 outlines the way to add parameters and gives examples of the charset parameters.

Posted by moore at 01:18 PM | Comments (0)

February 18, 2005

RIFLE: User Centric Information Flow Security

I went to a talk yesterday about RIFLE: An Architectural Framework for User-Centric Information-Flow Security, one of a series of University of Colorado CS Colloquia. "User-Centric Information-Flow Security" (UCIFS) is a different way of enforcing security than almost anything I've encountered before. Basically, instead of assigning permissions to users and actions, a la JAAS, you assign levels of security to data. This security level is then tracked throughout the application, and at various endpoints (I/O activity, network transmission) a policy is enforced. Therefore, you could tag an SSN with a high security level, and any variables and decisions based on the SSN would be tagged similarly, since security levels propagate. Then, when some piece of malware tries to send your SSN (or anything related to it) off to Russia, the system intervenes.

I say UCIFS is a "different way of enforcing security than almost anything I've encountered" above because there's one thing that I've seen that does assign a security level to some kinds of data: perl's taint mode. I've used taint mode in perl cgi scripts before, and it's a good way to make sure that untrusted data is not used in dangerous situations without the programmer's explicit knowledge.

However, UCIFS aims a bit higher. An ideal system tracks data and its levels through all algorithms, doesn't leak data, requires no effort from a programmer and enforces policies dynamically. According to the presenter, it turns out that no system can have zero data leakage. You can always signal the state of a variable in some way, even if it's as crude as ceasing the operation of the program--these are called 'covert channels'. RIFLE meets the other criteria, apparently, and does so by operating on binaries and tracking the data via extra registers (I'm on thin ice here, since I'm by no means a systems programmer).

It was an interesting talk because tracking security based on data, and giving users choices for data security, sure seems a better way of dealing with security issues than the program level trust that firewalls and ACLs now provide. Not a whole lot of real world applicability just yet (creating policies was barely touched upon, for one thing), but perhaps in the future. For more, please check out the Liberty Research Group's website.

Posted by moore at 10:00 AM | Comments (0)

February 16, 2005

"cvs checkout: failed to create lock directory" solution

For those of us still using CVS, rather than the highly acclaimed subversion, I wanted to outline a solution to a common problem I've often seen:

One user creates a cvs module (named, for example, project) and checks in a number of files and directories. Then another developer tries to check out the module and sees this error. (Here's another explanation of the solution.)

: cvs checkout: failed to create lock directory for
`/usr/local/cvsrepo/project'
(/usr/local/cvsrepo/project/#cvs.lock): Permission denied
: cvs checkout: failed to obtain dir lock in repository
`/usr/local/cvsrepo/project'
: cvs [checkout aborted]: read lock failed - giving up

If you go to /usr/local/cvsrepo/project, and run an ls -l, you'll see that the permissions look like:

...
drwxrwxr-x 2 user group 4096 Feb 16 09:40 bin
...

This error message comes from the fact that the second user is not a member of group group. The best way to solve this is to create a second group, perhaps called cvs, and assign both users to that group.

Then, you want to make sure that all the existing files and directories are owned by the cvs group:
chown -R :cvs /usr/local/cvsrepo/project

And, you want to give the group write permission and set the setgid bit, so that any new directories (and files) added use the cvs group, rather than the group group:
chmod -R g+ws /usr/local/cvsrepo/project

Your final permissions should look like:
...
drwxrwsr-x 2 user cvs 4096 Feb 16 09:40 bin
...

Now the second user and any other developers should be able to check out the code so safely stored in cvs.

Posted by moore at 10:01 AM | Comments (0)

January 21, 2005

Concurrency, object orientation, and getting software done

The Free Lunch Is Over, via Random Thoughts, is a fascinating look at where CPUs are headed, and what effect that has on software development. The subtitle: "The biggest sea change in software development since the OO revolution is knocking at the door, and its name is Concurrency" drives home the fact that the author believes that concurrency will be the next big thing in software development.

I was struggling to write a relevant post about this topic, because I feel like, at least in the companies I've been with, there just wasn't that much object oriented software being written. I'm working on a project right now that has a minimum of object orientation, even though it is written in java. I'm definitely more familiar with small scale projects and web applications, but I know there are plenty of applications out there that are written and working well without the benefits of objects.

Or, should I say, that are written and working well without the benefits of objects directly. Servers, operating systems and general purpose platforms are a different beast and require a different skill set. And by building on top of such platforms, normal programmers don't have to understand the intricacies of object oriented development--they can benefit without that investment. Of course, they'd probably benefit more if they understood things and there may come a time in their development that they'll have to. However, the short term gain of being able to continue on their productive plateau may be worth postponing the learning process (which will take them to a higher plateau at a short term cost).

In the same way, I think that multi-threading won't be required of normal business developers. I was struggling with this until the latest NTK came out, with this to say:

CPUs aren't getting faster! Multi-core is the future! Which means we'll all need to learn concurrent, multi-threaded programming, or else our software is never going to get faster again! That's what Herb Sutter's future shock article in Dr. Dobbs says (below). But before you start re-learning APL, here's a daring thought: maybe programmers are just too *stupid* to write multi-threaded software (not you of course: that guy behind you). And maybe instead we'll see more *background* processes springing up - filling our spare CPUs with their own weird, low i/o calculations. Guessing wildly, we think background - or remote - processes are going to be the new foreground.

From the Jan 21 edition, which should be online in a day or so. Those Brits certainly have a way with words.

If you're a typical programmer, let the brilliant programmers who are responsible for operating systems, virtual machines and application servers figure out how to best use the new speed of concurrent processor execution, and focus on process, on understanding business needs, and on making sure they're met by your software. Or, if you have a need for speed, look at precalculation rather than multi-threading.
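
To illustrate that last suggestion, here's a hypothetical sketch of precalculation in java: do the expensive work once, up front, on a single thread, and serve later requests from the precomputed results. The class name and the "rate" calculation are invented for the example.

import java.util.HashMap;
import java.util.Map;

public class PrecalculatedRates {

    private final Map rates = new HashMap();

    // do the expensive work once, at startup, on a single thread
    public PrecalculatedRates() {
        for (int months = 1; months <= 360; months++) {
            rates.put(new Integer(months), new Double(expensiveCalculation(months)));
        }
    }

    // later requests are just lookups--no threads, no locks
    public double rateFor(int months) {
        return ((Double) rates.get(new Integer(months))).doubleValue();
    }

    // stand-in for whatever calculation is actually slow
    private double expensiveCalculation(int months) {
        return Math.pow(1.0035, months) - 1;
    }

    public static void main(String[] args) {
        System.out.println(new PrecalculatedRates().rateFor(12));
    }
}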

Posted by moore at 11:54 AM | Comments (2)

January 20, 2005

Expresso and dbobjects and ampersands

If you're ever pulling a url from a database via an Expresso dbobject (Expresso's O-R layer) and you find yourself with mysterious &amp; characters being inserted, you may want to visit this thread and the FilterManager javadoc. Long story short, add this line:

setStringFilter("fieldname", FilterManager.RAW_FILTER);

to any fields of the dbobject that you don't want 'made safe' by the default filter (which screens out dangerous HTML characters). Tested on Expresso 5.5.

(I'm omitting the rant about changing data pulled from the database without making it loud and clear that default behavior is to filter certain characters. But it's a Bad Idea.)

Posted by moore at 08:23 PM | Comments (0)

January 17, 2005

PL/SQL redux

I've written about PL/SQL before but recently have spent a significant amount of time writing stored procedures. Unlike some of my previous experiences, this time PL/SQL seemed like a great fit for the problem set, which was two fold.

In the first case, some of the stored procedures push data from stage tables, which are loaded via ODBC or SQL*Loader, into tables which the application accesses. PL/SQL is great for this type of task because cursors, especially when used with parameters, make row driven data transformations a pleasure, and fast as well. Handling deltas via updates instead of inserts was alright, and the fact is that PL/SQL code that manipulates data can be positively terse when compared to JDBC PreparedStatements and at least as fast. In addition, these stored procedures can be easily called over an ODBC connection, giving the client the capability to load new data to the stage tables and then call the stored procedure to update or insert the data as needed. (You could definitely do the same thing with a servlet and have the client hit a URL, but that's a bit less self-contained.)
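
For illustration, here's roughly what such a call looks like from a java client over JDBC (the clients above use ODBC, but the idea is the same). The connect string and the procedure name stage_pkg.push_stage_data are made up for the example; yours will differ.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class StageLoadCaller {
    public static void main(String[] args) throws Exception {
        // driver class and connect string depend on your Oracle JDBC jar and instance
        Class.forName("oracle.jdbc.OracleDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:orcl", "scott", "tiger");
        try {
            // load the stage tables first (sqlldr, ODBC, etc.), then fire the procedure
            CallableStatement cs = conn.prepareCall("{call stage_pkg.push_stage_data}");
            cs.execute();
            cs.close();
        } finally {
            conn.close();
        }
    }
}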

PL/SQL was also used to implement complex logic that was likely to change. Why do that in PL/SQL in the database rather than in java in the application server? Well, changes to PL/SQL programs don't require a server restart, which can be quite an issue when a server needs high levels of uptime. Instead, you just recompile the PL/SQL. Sure, you can use the reloadable attribute of the context to achieve the same thing (if you're using Tomcat) but recompiling PL/SQL doesn't have the same performance hit as monitoring class files for changes.

Use the right tool for the job. Even if PL/SQL ties your application to Oracle, a judicious use of this language can have significant benefits.

Posted by moore at 11:09 PM | Comments (0)

January 15, 2005

Under Pressure

In almost every software project of any length that I've participated in, the last few weeks before a release are tense and pressure filled. (Please note that I write custom business software; that's what these conclusions are drawn from.) Being in the middle of a project release myself, I thought I'd muse on the causes of this pressure. Why are the last few weeks before the deadline so tense? Because software is, above all else, about the details. Joel puts it well in his interview with Salon.com:

The fundamental problem that you're trying to solve here is that humans think of things in vague, mushy terms. In order to visualize something, they don't have to actually visualize every part of it. Whereas the programmer, in order to actually implement that thing, to create it, needs to have every part specified.

What happens on projects over a certain level of complexity is this specification is pushed off, often until a decision must be made, or even past that point. This occurs for a number of reasons: programmers want to start coding, the client doesn't have the information at the moment the issue is raised and it is never revisited, the answers to certain questions (or the questions themselves) are dependent on answers to other questions. In the beginning of a project, big questions are decided, but the small niggling details, which the compiler most certainly needs to know about, are, perhaps noted, but not dealt with.

Why not specify how the system will work before building any of it, to every exacting detail? Some software processes try to do this, but in general, unless the problem is very well understood (in which case the client will almost always be better served by off-the-shelf software), the requirements will change as the project progresses. (Incidentally, if they don't, the project is a great candidate for offshoring.) The client will better understand the problem and technology and the software team will likewise better understand the problem and domain space. So specifying the entire system up front will likely leave the customers unhappy or the system unused.

Because business software is actually business process crystallization, it matters very much that things are correct. Because business software is implemented by a group of people with specialized skills and a different focus from the users, at best, or no understanding of the business, at worst, software delivery is unlike other deadline driven industries in that changes are expensive and mysterious. I think every software engineer has an example of a simple change request that turned out to have massive implications throughout the system, and this effect is mysterious to normal users.

What matters is not why the details crop up, but that they do. So, the last few weeks of every project consists of mentally running around and nailing down every detail. I expect this is true of every job with fixed deadlines (ever been around a retail store the day before Thanksgiving?). Every issue should be resolved or acknowledged when the software is released, and while some facets are less important than others, no detail is unimportant.

Posted by moore at 08:56 PM | Comments (0)

January 10, 2005

sqlldr

I've been writing SQL*Loader scripts to load a fair bit of data into Oracle. I have a set of load tables with minimal constraints on them, into which SQL*Loader pushes the rows. Then I have written some PL/SQL which pulls from the load tables to the real database.

This architecture was chosen because the PL/SQL procedures can be written to allow incremental as well as full data loads. In the incremental case, it's conceivable that there'd be a different way of pushing data over to the load tables (via ODBC or JMS, for example). In addition, the load tables can be denormalized, and you can put enough intelligence in the PL/SQL to turn your data structures into something at which a DBA won't cringe.

Anyway, I thought I'd share a few tips, gleaned through the process. I'm definitely no SQL*Loader guru, but here are some useful links: the sqlldr FAQ, full of good information and recently updated, the Oracle Utilities page which does a great job of explaining all the options of SQL*Loader, and this case study which outlines internationalization with sqlldr. All very useful.

Two other tips: If you are loading delimited character data that is longer than 255 characters, you need to specify the length in your control file (for example, declaring it in the control file as char(4000)), or else you'll get an aggravating error message warning that the data you're loading is longer than the column into which you're trying to load it. I spent some time looking very carefully at the load table, trying to see what I was missing, before I googled and found out that char fields do have default sizes in sqlldr control files.

And the bindsize and rows parameters are related, in terms of the amount of data that sqlldr can push into a table before it commits. You can make rows very, very big, but if bindsize is too small (it defaults to 64k, apparently) the commits will happen sooner than they need to. For more explanation and other performance tips, see this page.

Overall, I've been very happy with how easy it is to load a fair bit of data, quickly (both in terms of load time and in development time) using sqlldr.

Posted by moore at 10:55 PM | Comments (0) | TrackBack

javascript and checkboxes

Ran into an interesting problem while I was using javascript today. I had a (dynamically generated) group of checkboxes that I wanted to be able to check and uncheck as a group. This was the code I had originally, which I had cribbed from one of the many fine javascript sites on the web:

function checkAll(field) {
   for (i = 0; i < field.length; i++) field[i].checked = true ;
}

This method was called by a link like this:

<a href="javascript:checkAll(document.form.checkboxes);">Check All</a>

All well and good, as long as the field that is passed into the function is an array of checkboxes. However, since javascript is a typeless language, you can call any method on an object, and depending on how egregious the error is, the user might never see an error message. In this case, when the dynamically generated group of checkboxes has only one element, document.form.checkboxes is not an array of checkboxes, and its length attribute doesn't return anything. The for loop is not executed at all, and the box is never checked.

The solution is simple enough, just check the type of object passed in:

function checkAll(field) {
    if (field.type != 'checkbox') {
        for (i = 0; i < field.length; i++) field[i].checked = true ;
    } else {
        field.checked = true;
    }
}

It makes a bit of sense why one checkbox wouldn't be an array of size one, but the switch caught me a bit off guard. I'm trying to think of an analogous situation in the other dynamic languages I've used, but in most cases, you're either controlling both the calling and receiving code, or, in the case of libraries, the API is published. Perhaps the javascript API documenting this behavior is published--a quick google did not turn anything up for me.

Posted by moore at 10:24 PM | Comments (0)

December 29, 2004

Useful tools: the catch all email address

When working on a web application that requires authentication, email address is often chosen as a username. It's guaranteed to be unique, it's something that the user knows rather than another username they have to remember, and communication to the user is built in--if they're having trouble, just send them an email.

However, when developing the initial registration portion of a site that depends on email address for the username, you often run through many email addresses as you tackle development and bugs. Now, it is certainly easy enough to get more email addresses through Yahoo or hotmail. But that's a tedious process, and you're probably violating their terms of service.

Two other alternatives arise: you can delete the emails you want to reuse from the web application's database. This is unsavory for a number of reasons. One is that mucking around in a database when you're in the middle of testing registration is likely to distract you. Of course, if you have the deletes scripted, it's less of an issue. You'll also need to spend some time ensuring you've reset the state back to a truly pure place; I've spent time debugging issues that arose from anomalous user state that could never be achieved without access to the back end.

Which is why I favor the other option. If you own your own domain name and have the catch all key set, all email for your domain that does not have a specified user goes to the catch all account. (I wasn't able to find out much about how this is set up, other than this offhanded reference to the /etc/mail/virtusertable file.)

I find having this available tremendously useful. You have an infinite number (well, perhaps not infinite, but very large) of addresses to throw away. At times, the hardest part is remembering which address I actually used, which is why having a system of some kind is useful. For example, for my dev database on my current project, I start all users with foo and then a number. For the stage database, I start all users with bar and then a number.

In addition to helping with development, it's useful to have these throwaway email addresses when you are signing up for other web applications or posting on the web. For example, my jaas@mooreds.com account, which was posted on my JAAS and Struts paper, is hopelessly spammed. If I had put my real address on that paper, I would have much more spam than I do now, as jaas@mooreds.com simply goes to /dev/null courtesy of procmail. I've also used distinctive email addresses for blog comments and for subscribing to various mailing lists; this way I can find out if everyone really keeps their data as private as they say they will. Of course, there are free services out there that let you have throwaway email addresses, but having your own domain gives you a bit more security and longevity.

All in all, I find that having a catch all email address set up for a given domain is a very useful development tool, as well as a useful general web browsing technique.

Posted by moore at 08:46 PM | Comments (0)

December 24, 2004

New vs old technologies

Compare the truths outlined here: "...for many businesses, sticking with what they have is the cheapest choice and best ROI" with Rands' comments on tool cruft.

Of course, engineers aren't businesses. But they operate under some of the same constraints--deadlines, limited money, etc. Despite what Rands says, there's a balance to be struck between the new and the old. Of course, most folks, including myself, tend to lean towards the old and the known because it feels safer. But the known is (often) safer. Dion talks about it here and likewise doesn't come to any conclusions.

I don't want to sound like an old fogey, but I've been burned in the past by short deadlines, new technologies and inexperienced users (of which I was one). I'm looking at Spring, having heard it praised to the sky, and want to use it on my next project. (Spring, incidentally, reminds me of a supercharged version of ATG's Nucleus; what's old is new again.) New tech is great, but not because it's new. And old tech is safe, but not because it's old. Each is appropriate when it's the right tool for the job, but it's hard to divorce that choice from my knee-jerk reactions and emotions--that's what methods like ROI analysis and research are designed to do.

Posted by moore at 09:31 AM | Comments (0)

December 20, 2004

Precision and Accuracy in Software

Back in college, when I took first year physics lab, there was a section of the course that focused on teaching the difference between precision and accuracy in measurement. This distinction was crucial in experimental physics, since measurement is the bedrock of such experimentation. Basically, precision is how many digits of a measurement actually mean something. If I measure the length of a room with my stride (and find it to be 30 feet long), the precision is less than if I measure the length of the room with a tape measure (and find it to be 33 feet, 6 and ¾ inches long). However, it's possible that the stride measurement is more accurate than the length found with the tape measure, that is, that it better reflects how long the room actually is. (Perhaps there's clothing on the floor which adds to the tape measurement, but which I stride over.)

These concepts aren't just valid in physics; I think they're also useful in software. When building a piece of software, I am precise if I build what I say I am going to build, and I am accurate if what I build actually meets the client's business needs, that is, it solves the business problem. Almost every development tool either makes development more precise or more accurate.

The concept of precision lends itself easily to automation. For example, unit testing is rapidly gaining credence as a useful software technique. With unit testing, a developer writes test cases for each part of their code (often at the method level). Running these tests ensures that the code is actually doing what the developer thinks it is doing. I like writing unit tests; it gives me comfort to know that corner cases are taken care of and that changes to code can be fairly easily regression tested. (A minimal example appears after the list below.) Other techniques besides unit testing that help ensure precision include:

Round tripping: using a tool like TogetherJ, I can ensure that the model (often described in UML) and the code are in sync. This makes it easier for me to verify my mental model against the code.

Specification writing: The more precise a spec is, the easier it is to translate into code.

Compilers: the checking that occurs at compilation time can be very helpful in ensuring that the code is doing what I think it is doing--at a very low level. Obviously, this technique depends on the language used.
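
Here is the minimal unit test sketch mentioned above, using JUnit; the Discounter class and its discount rule are invented purely for illustration.

import junit.framework.TestCase;

public class DiscounterTest extends TestCase {

    // the "code under test"--in a real project this would live elsewhere
    static class Discounter {
        static double discountedPrice(double price, int quantity) {
            // orders of 10 or more get 10% off
            return quantity >= 10 ? price * quantity * 0.9 : price * quantity;
        }
    }

    public void testNoDiscountBelowThreshold() {
        assertEquals(20.0, Discounter.discountedPrice(4.0, 5), 0.0001);
    }

    public void testDiscountAtThreshold() {
        assertEquals(36.0, Discounter.discountedPrice(4.0, 10), 0.0001);
    }
}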

Now, precision is needed, because if I am not confident that I understand what the code is doing, then I'm in real trouble. However, accuracy is much more important. Having a customer onsite is a great example of a technique to ensure accuracy: you have a business domain expert available all the time for developers' questions. In this situation, when a developer stumbles across a part of the business problem that they don't quite understand, they don't do what developers normally do (in order of decreasing accuracy):

1. Ask another developer, which works great if the target audience is developers, but not so well otherwise.
2. Best approximation (read: guess at the correct answer).
3. Ignore the issue. ('I've got a lot more code to write before I can go home today, and we're shipping in two weeks. We'll just let the customer discover it and deal with it as a bug.')

Instead, they have a real live business person, to whom this software really matters (hopefully), who they can ask. Doing this makes it much more likely that the final solution will actually solve the business problem. Other techniques to help improve accuracy include:

Issue tracking software (I use Bugzilla): Having a place where questions and conversations are recorded is truly helpful in making sure the mental model of the business user and the programmer are in sync. Using a web based tool means that non-technical users can participate and contribute.

Specification writing: A well written spec allows both the business user and developer to have a sense of what is being built, which means that the business user can correct invalid notions at an early stage. However, if a spec is too detailed, it can be used to justify precision at the cost of accuracy ('hey, the code does exactly what's specified' is the excuse you'll hear).

Spring and other dependency injection tools, as well as IDEs: These tools help accuracy by decreasing the costs of changing code.

Precision and accuracy are both important in software engineering. Perhaps the best way to characterize the two concepts is that precision is the mapping of the programmer's model of the problem to the computer's model, whereas accuracy is the mapping of the business' needs to the programmer's model. However, though both are needed, accuracy is much harder to obtain. Knowing that I'm building precisely what I think I'm building is beneficial only insofar as what I think I'm building is actually what the customer needs.

Posted by moore at 12:24 AM | Comments (0) | TrackBack

December 11, 2004

Why did the multithreaded chicken cross the road?

to To other side. get the

(from Some Assembly Required.)

Posted by moore at 03:40 PM | Comments (0)

December 10, 2004

Useful tools: javap

javap lets you examine java class files and jar files in a number of ways. See this web page for more information. For me, it's an API reference. I use it in two ways:

1. When I'm coding, and I need to know the exact syntax of a method, I shell out: javap java.util.StringTokenizer. (Yes, I know that any modern IDE will do this for you without shelling out, but javap will work anywhere java is installed and with any editing tool. You trade convenience for portability.) One large catch is that inherited methods are not shown:

$ javap java.io.BufferedReader
Compiled from "BufferedReader.java"
public class java.io.BufferedReader extends java.io.Reader{
    public int read();
       throws java/io/IOException
    static {};
    public void close();
       throws java/io/IOException
    public void reset();
       throws java/io/IOException
    public boolean markSupported();
    public boolean ready();
       throws java/io/IOException
    public void mark(int);
       throws java/io/IOException
    public long skip(long);
       throws java/io/IOException
    public int read(char[],int,int);
       throws java/io/IOException
    public java.io.BufferedReader(java.io.Reader);
    public java.io.BufferedReader(java.io.Reader,int);
    public java.lang.String readLine();
       throws java/io/IOException
    java.lang.String readLine(boolean);
       throws java/io/IOException
}

Running javap on java.io.BufferedReader does not show the method read(char[]), inherited from java.io.Reader. (This example is from the J2SE 1.4 libraries.)

2. Sometimes, the javadoc is too up-to-date (or your jar files are too old) to answer questions about an API. For example, I'm working on a project with Jetspeed which depends on Turbine version 2.2. Unfortunately, this is an extremely old version of Turbine (release 16-Aug-2003), and the javadoc doesn't appear to be available. (Updated Dec 11: It looks like the Turbine 2.2 javadoc is indeed online. Whoops.) Generating the javadoc with ant is certainly a possibility, and if I found myself going time and again to verify the API of Turbine 2.2, I'd do that. But for a quick one- or two-off question about an API that no web search turns up, javap can be very handy.

In short, if you have a quick question about an API, javap can help you out.

Posted by moore at 06:45 PM | Comments (0)

November 23, 2004

Useful tools: p6spy

This entry kicks off a series of entries where I'll examine some of my favorite tools for development. Some of them will be long, some short, but all of them will highlight software I use to make my life a bit easier.

A large, large chunk of the development I do is taking data from a relational database to an HTML screen, and back again. Often there are business rules for transforming the data, or validation rules, but making sure the data is stored safely and consistently is a high priority, and that means a relational database.

However, I do much of my work in java, which means that the relational-OO impedance mismatch is a common problem. One common way to deal with it is to use an OR tool--something like OJB or JDO. These tools provide object models of your database tables, usually with some help from you. You then have the freedom to pretend like your database doesn't exist, and use these objects in your application. The OR framework takes care of the dirty work like SQL updates and caching.

Every convenience has its price, however, and OR mapping tools are no exception. The same abstraction that lets you pretend that you're simply dealing with objects means that you cannot easily examine the SQL that is generated. In addition, the way that you're using the objects may cause performance issues, because you're treating the data as objects, rather than rows.

It's much the same issue as calling methods over the network via RMI or accessing files via NFS: the abstraction is great and means that programmers don't have to think about the consequences of remote access. But the failure of the abstraction can be catastrophic, all the more so because the programmer was not expecting to have to deal with the grotty details under the abstraction (that's the whole point, right?).

OR tools do not fail often, or have many catastrophic failure modes, but they sure can be slow. With open source software, you can dig around and see how SQL is being generated, but that's tedious and time consuming. With commercial products, you don't even have that option. (Some OR tools may have their own 'Show me the SQL' switch--I haven't run into them.)

Enter p6spy. p6spy can be used in place of any JDBC driver. You point it to the real driver and it passes on any configuration or SQL calls to that driver. But p6spy logs every SQL statement passed to it and every result set passed back. (A fine non object oriented example of the Decorator pattern.)

It took me about 15 minutes to figure out how to use p6spy, the software is open source with decent documentation, the latest version has data source support, and it scratches an itch that most, if not all, java developers will have at some time. With p6spy, you can find out what that OR tool is doing under the covers--it's an easy way to peel back some of the abstraction if needed.
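For the record, here's roughly what hooking it up looks like--a sketch from memory, so check the p6spy documentation for the exact class and property names; the JDBC URL and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;

public class SpyExample {
    public static void main(String[] args) throws Exception {
        // Load p6spy's driver instead of the real one; the JDBC URL stays the same.
        Class.forName("com.p6spy.engine.spy.P6SpyDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:ORCL", "scott", "tiger");
        // spy.properties (on the classpath) points p6spy at the real driver, e.g.:
        //   realdriver=oracle.jdbc.driver.OracleDriver
        //   logfile=spy.log
        conn.close();
    }
}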

Posted by moore at 11:36 PM | Comments (0) | TrackBack

November 18, 2004

Koders.com--search source code

Koders.com has apparently indexed many open source software projects. (Link via Dion.) I played around with it a bit and I think it's a very slick application. I'm of two minds about this, though.

The good:

Code reuse is good. A co-worker of mine once called it 'editor inheritance'--in a world where people's time is expensive and disk space is cheap, it can make sense (not always) to just copy code rather than figure out how to make a piece of code re-usable. Koders lets you do this in a more effective way.

It also lets coders easily compare and contrast styles between real live projects. And I can only imagine that soon some researcher will sink his teeth into all the code and publish on First Monday.

The bad:

As the linux-SCO lawsuits have shown, it's technically awfully easy to cut and paste code, but the results can end up being illegal. I can only see this repository, even though it differentiates by license, exacerbating this problem. And mixing and matching code from different licenses becomes all the easier as they show up side by side in a search engine. If I were a company concerned with legal ramifications, I'd tread softly around this tool.

The possibilities:

Regardless, I have to say it's a very cool application. I'll be interested to find out how much people will use it. What would be really cool is further analysis--after all, Google gets its power from the links between websites--what would we learn by examining the links between code? For one, you'd have a better idea how useful and stable a project is, if you could know how many other projects used it. Having a plugin for a UML modelling tool would be pretty slick too.

Posted by moore at 04:41 PM | Comments (0)

November 12, 2004

Testing Korean content

I'm currently working on a site that needs to be truly localized for a large number of languages (tens of them). This is accomplished with large numbers of ResourceBundles, the MessageFormat class when variable text layout is needed, an Oracle backend which understands and doesn't muck with UTF-8, an Access database which generates said bundles, and a crack team of translators.
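As a rough sketch of how the ResourceBundle and MessageFormat pieces fit together (the bundle base name 'Messages' and the key 'welcome' are made up for illustration):

import java.text.MessageFormat;
import java.util.Locale;
import java.util.ResourceBundle;

public class GreetingExample {
    public static void main(String[] args) {
        Locale locale = Locale.KOREAN;
        // Looks for Messages_ko.properties, falling back to Messages.properties
        ResourceBundle bundle = ResourceBundle.getBundle("Messages", locale);
        // The pattern lives in the bundle, so translators control word order
        String pattern = bundle.getString("welcome");
        System.out.println(MessageFormat.format(pattern, new Object[] { "Dan" }));
    }
}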

However, how to test? Luckily, it's fairly easy to have IE use a different language: clear instructions live here. One issue with the instructions is that they don't tell you how to actually install a language pack. But this is easy too: I only had to right click on a page, choose the encoding menu, then choose more, and then the encoding I wanted (Korean, because I want to test double byte characters). I was then prompted to install a language pack. I accepted, and Windows downloaded a bunch of DLLs and other files. From then on I could view Korean characters (the encoding menu says I'm viewing 'Unicode (UTF-8)'). Here's a random site about mining that you can use to test your Korean language pack.

Don't forget to test both the input and output of your application--saving user input, and being able to redisplay it, is at least as important as being able to display what you draw from your ResourceBundle initially. As a bonus, the Korean character set that I installed via IE was made available to Firefox. This was done on the fly: not only did I not need to restart Windows, I didn't even need to restart Firefox; I just needed to reload the page.

Posted by moore at 08:28 AM | Comments (0)

October 23, 2004

Extending attributes of entities in relational databases

When you are data modeling, entities have attributes. During the early part of a project, it's possible to map out many of these attributes, but often you miss some--the requirements change, the customer changes their mind, or the architects missed something at the beginning. Depending on how you've modeled those attributes, the pain of adding, modifying or removing them can be mellow or intense. In addition, often the values stored in these attributes need to be queried, or modified themselves.

Suppose you have an employee table (I'm trying to do this with SQL 92 syntax but I am not a DBA):
create table emp (emp_id numeric, first_name varchar(100), last_name varchar(100), dept varchar(50));
and you suddenly need to add middle name and salary to this table. There are three ways to create an extensible schema in a relational database. Each of these has their pluses and minuses.

1. The DDL method of extension
alter table emp add middle_name varchar(100), salary numeric;
Here, you simply add another column. For querying and clarity, this method is fantastic. These columns can be indexed and it's clear that employees now have two additional attributes. However, this also means that any mapping you have between your model objects and your database needs to be updated; probably code needs to be regenerated to add these two attributes to your model objects. It also means that you have to version your database--since code that expects a middle_name attribute on employee will probably die a horrible death if that column is missing. In addition, depending on the size of the system, you might need to get a DBA involved.

2. The DML method of extension
create table emp_attributes (emp_id numeric references emp(emp_id), name varchar(100), value varchar(100));
insert into emp_attributes values (1, 'middle_name', 'Sam');
insert into emp_attributes values (1, 'salary', '100000');

In this case, you can add attributes without getting a DBA involved--you simply add rows to this table. However, there is no referential integrity on the name of the attribute (is middle_name the same as mid_name the same as MIDDLE_NAME?--though, to be fair, you can put constraints on the values of the name column). Also, the value column is not typed; though almost any data type can be stored as a string, you can lose precision and waste time converting from string to the actual type you want. Finally, querying based on these attributes is tedious:
select first_name from emp e, emp_attributes ea where e.emp_id = ea.emp_id and ea.name = 'middle_name' and ea.value = 'Sam';
If you want to get all employees paid more than Sam, you need to resort to database specific functions to convert that string to a number.

3. The stored object method
alter table emp add objectdata varbinary;
With this method, you create an object or data structure in memory and serialize it to a stream of bytes, which you then store in the objectdata column. This is great because you can add whatever attributes you like and the database structure doesn't need to change at all. However, the data is unreadable by normal SQL tools and other programming languages. Querying on this data also becomes very difficult and slow, as you end up having to recreate each employee's data object and test conditions in the programming language--you're not taking advantage of the database.
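A rough sketch of what the stored object method looks like in java (EmpData is a hypothetical Serializable class holding whatever attributes come along; error handling omitted):

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class StoredObjectExample {
    public static void save(Connection conn, int empId, EmpData data) throws Exception {
        // Serialize the data holder to bytes...
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(data); // middle name, salary, whatever else shows up later
        out.close();

        // ...and stash the bytes in the objectdata column.
        PreparedStatement ps = conn.prepareStatement(
                "update emp set objectdata = ? where emp_id = ?");
        ps.setBytes(1, bytes.toByteArray());
        ps.setInt(2, empId);
        ps.executeUpdate();
        ps.close();
    }
}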

There are a couple of questions that can help you determine which method you should use: How often will attributes be added? How hard is the process for that? How difficult is it to regenerate your data layer? Will you want to use SQL tools?

In general, the DDL method is the best option. It's just the cleanest, easiest to understand and query against. The DML method is the second best, as you can still use most of the SQL toolset, even if it's more complicated. The stored object method for extending attributes in a relational database should be used carefully, when there are a large number of attributes which can change often and will never be queried upon.

Posted by moore at 02:52 PM

October 22, 2004

Best software essays of 2004

Joel is soliciting suggestions for best software essays of 2004. Very worthwhile browsing.

Posted by moore at 09:50 AM | Comments (0) | TrackBack

October 19, 2004

Problem solving via email composition

I'm currently struggling with an open source project, working to make the UI look like the one the designer showed the client. It's a bit frustrating, because it's sleuth work, and the clues are all in the code. Code I've not worked with, but with which I am rapidly gaining familiarity.

I cannot imagine how frustrating this would be if it were a closed source tool. Check that, I can. I worked on a project a few years ago with a closed source portal product, and it simply wouldn't do what we asked it to do. Well, when you're a small company, you do what you need to do in order to get the job done. This included some decompilation I'm sure voided our warranty, but delivered the client what was promised. It's awful when you're put in that position.

Regardless, I've been able to tweak this open source project to make it do what I need. When I find myself facing an issue that is not covered in the on line documentation, I've found that the best way to solve the problem is to start writing an email to the user list.

Now, because it's written communication to a group of peers, some of whom are quite knowledgeable about the subject, I'm fairly careful about the content. I outline the problem carefully, explain the searches of the mailing list and web that I've already undertaken, and detail any avenues of solution that I've already explored. This rather rigorous analysis of the problem often leads to other glimmers of solutions, log files I have forgotten to check, and more hypotheses to prove or disprove. As I build the email, the problem becomes more and more manageable.

In my efforts to not appear a fool to peers, I reckon I have not sent about seventy percent of the emails I've composed--either I found another solution, I had missed some crucial bit before I started the email, or, after doing due diligence on the problem, I discovered the answer on the mailing list. It's a bit weird, but crafting an excellent 'can you help me' email to a user list helps me even if the email isn't sent.

Posted by moore at 03:44 PM | Comments (0)

October 11, 2004

Coding Standards

We've all been caught up in the religious war of coding standards. Whether it's 'tab vs spaces' or '3 spaces vs 4 spaces' or 'curly brace on same line as if clause or curly brace on line below if clause', these arguments can take up a significant amount of time on a project. Ken Arnold has an interesting post where he recommends that coding standards be integrated into the language specification, and enforced by the compiler. His reasons:

1. There's no real productivity difference between one coding style and another.

2. There's real productivity lost in setting up pretty printers, re-formatting code manually, and arguing about the better style.

3. If programmers have latitude, they'll use it. (See perl.)

So, take away the freedom of programmers to format their code in a non standard manner. Make coding style part of the compiler--not as a warning though (as Bruce Eckel says "when you discover you can ignore them [warnings], you do")--actually have the compiler error out if the formatting is incorrect.

Sure, I'd wail along with the rest of the coding community, for a while. But then I'd adjust, especially if the tools were there. And I'd probably be more productive. Talk about tough love.

Posted by moore at 08:36 AM | Comments (0) | TrackBack

October 06, 2004

Right tool, right job, part II

Here's a nice explication of 'choose the right tool for the job,' by a pundit. The entire J2EE stack is often overkill, and I've seen shops that would have been much better served by a series of PHP applications, perhaps with java on the backend for heavier duty processing and legacy code reuse.

My only quarrel with his analysis is that he assumes we know how long an application is going to live. How many applications that started as prototypes get rolled to production and are the cause of endless headaches five years down the road?

As in any complex situation, when designing an application, and choosing how complicated to make its architecture, you need to balance between two principles:

1. It's easier to do things right the first time. You're in the code, you have an understanding of the requirements, there's a green field, you have a budget--therefore making sure the application works is easier when you're first writing it.

2. You aren't gonna need it. This slogan, from the Agile folks, says that you have no way of knowing what you're going to need five years in the future, so why try to design for that?

Taken too far, each principle is disastrous. If you follow principle number one fully, you'll end up in analysis paralysis. If you take point two to heart, you'll miss out on big chunks of functionality that have to be retro-fitted on later, to the detriment of a schedule and possibly the design.

That's why they (architects, not pundits) get paid the big bucks.

Posted by moore at 09:03 AM | Comments (0) | TrackBack

October 05, 2004

How to evaluate an open source project

There are a fantastic number of open source projects out there, on SourceForge, apache, and elsewhere. This is fantastic because you can often leverage other work that folks have done, as well as knowledge and mistakes they've made. However, it can be extremely difficult to evaluate accurately how much such projects can help you, especially if you've not used them before, or if they are obscure. In addition, you probably don't have a lot of time to choose a solution--clients that go with open source solutions tend to have budget constraints. I present here a quick checklist that I use to evaluate projects that I'm going to depend upon:

1. Know your drop dead features. These are features that the software package must have in order to be considered. Be careful not to make this too long. The primary purpose of this list is to allow you to quickly exclude packages that you know won't work for you, and if you make it too long, you might be left with no options.

2. Look at the documentation attached to the project. This is the first place to start ruling a project out--if it doesn't promise the features you need, move on. Also, look at a demo or screen shots, if possible. This lets you see how the package works. Compare behavior with the list of needed features.

3. Install the software. If you have difficulty installing it, that's not an insurmountable issue--often open source projects aren't the smoothest installations. However, installing it and spending a few hours playing around with this software that is going to be a significant part of your project can let you know if the impressions you received from the demo and documentation are correct--is it going to be easy enough to tweak/deploy/extend this software package?

4. In the world of open source support, the mailing list is king. Does this project have a mailing list? Is it archived? Is it googled? How active is it? If there's no mailing list, is there a set of forums? The mailing list (or forum) is where you're going to go when you have a smart question to ask, and you will, so you want this resource to be as strong as possible.

5. Look at the documentation again. The first time you did so, you were just looking to exclude a project based on feature set. This time, you want to see how much the documentation can help you. Is there a tutorial? Are the advanced topics that concern you covered? For java projects, is there javadoc? Is it more than just the methods and arguments that are automatically generated? What version of the software is the documentation for?

Of course, the more I depend on a piece of software, the more time I want to spend on evaluation. On the other hand, the process laid out above is entirely technical in nature, and, as we know, there may be other reasons for choosing, or not choosing, software. Installed base, existing team experience and knowledge, project timeline, or the fact that the CEO's brother is on the board of a company with a rival offering all can influence software package choice. There are many factors to be considered, but this list is how I start off.

Posted by moore at 09:32 AM | Comments (0)

Expresso authentication and authorization

I've only briefly worked with Expresso. But I've heard good things about it. However, one 'feature' is really chapping my hide at the moment. Apparently, the only way to authenticate someone is to call the attemptLogin method on a 'Controller' object (a subclass of a Struts Action), which is protected and takes, among other things, the http request and response. There's no way I can find to just pass in a username/password and authenticate. In addition, the authorization system is not broken out either. In OO fashion, you ask an object if a user can access it, and the object knows enough to reply.

I'm not trying to rag on the Expresso developers. After all, they are giving away a fine, full featured java web framework for free. But this just drove home to me how important it is in web applications to have the classes that talk http be nothing more than a thin translating layer around business classes. For instance, all a struts action should do is convert http forms to domain specific value objects, and then call business methods on business objects.
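Here's a sketch of what I mean, in Struts 1.x terms--UserForm, UserValue and UserService are invented names; the point is just the shape of the code:

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class SaveUserAction extends Action {
    public ActionForward execute(ActionMapping mapping, ActionForm form,
            HttpServletRequest request, HttpServletResponse response) throws Exception {
        UserForm userForm = (UserForm) form;
        // Convert the http form into a domain specific value object...
        UserValue value = new UserValue(userForm.getUsername(), userForm.getPassword());
        // ...and call a business method on a business object that knows nothing of http.
        new UserService().save(value);
        return mapping.findForward("success");
    }
}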

If this was the case in Expresso, it'd be trivial for me to leverage Expresso's existing authentication model--I'd just have to call the methods on the business object, perhaps after creating a domain specific value object. Now, however, I'll probably have to monkey around with the http request and response, and decode exactly what parameters it wants, and fake those up.

Posted by moore at 09:02 AM | Comments (1)

Open source portal search

I've been looking at some open source portals. My client has an existing java application, written in Expresso that has some reasonably complex logic embedded in it. Additionally, it's massively internationalized, with dynamic international content coming from a database, and static content coming from a set of resource bundles. There's an existing process around updating both of these sets of data. And when we're talking internationalization, we're talking Asian character sets as well as the European character sets.

So, the criteria for the portal were:

1. Support for multi-byte character sets and easy localization.

2. Ability to integrate with Expresso's authentication and authorization systems.

3. Support for normal portal features--adding/moving/removing portlets, minimize/maximize portlets.

4. Documentation.

I looked at a fair number of portals, including jcorporate's own ePortal, eXo, Liferay, Jetspeed 1, Jetspeed 2, and Pluto (a last alternative, to be avoided if possible, is to roll our own portal-like application). First, I looked at ePortal, but that's a dead project, with no releases. Then, I installed pluto, which seemed like it would be a perfect fit to be integrated into Expresso. However, integrating pluto looked complex, and after installing it (fantastic instructions for installing pluto here), I realized that pluto did not have a layout manager that would allow for the addition, rearranging or moving of portlets.

I then battled with Jetspeed 2, which involved installing a subversion client and building from source. This looked to be pretty cool, but the sheer lack of documentation, and the fact that there have been no releases, caused me to shy off. This is no failure of Jetspeed 2--this is what projects in development are like; I think it will be a fine project when done but my client just doesn't need to be on the bleeding edge. I also took a quick look at Liferay, which seems to be a much more full featured portal application than we needed. After reading this blog on portals I decided to take a closer look at eXo. However, the documentation wasn't fantastic, and it wasn't readily apparent how to plug in authentication.

I also downloaded and installed Jetspeed 1; if you download the src distribution, you get the helpful tutorial. While Jetspeed 1 is not a standards based solution (I expect most of the portlets will be custom developed anyway), the user community is fairly active, as indicated by the mailing list, and I've found the documentation to be extensive. In addition, it meets the requirement for pluggable authentication and authorization systems.

I'm less than thrilled about having to use maven for builds. Others have said it better than I, but it's just too much for my needs. However, I was able to get an independent directory tree for my project by copying over the maven.xml, project.properties, and project.xml from the tutorial directory to an empty directory. Then I tweaked the project.* files, ran maven jetspeed:genapp, tweaked a few settings in TurbineResources.properties to make sure the localization settings were correct, and, voila, I have a working project tree that, using the Jetspeed maven plugin, is one command away from a deployable war file.

Posted by moore at 08:25 AM | Comments (0)

August 15, 2004

Book Review: Java Transaction Processing

Since many financial institutions have standardized on it, I hear Java is the new COBOL. Whether or not this is true, if Java is to become the business language of choice, transaction support is crucial. (By 'transaction,' I mean 'allowing two or more decisions to be made under ACID constraints: atomically, consistently, (as) in isolation and durably'.) Over the last five years, the Java platform has grown by leaps and bounds, not least in this area.

Java Transaction Processing by Mark Little, Jon Maron and Greg Pavlik, explores transactions and their relationship with the Java language and libraries. Starting with basic concepts of transactions, both local and distributed, including the roles of participant and coordinator, and the idea of transaction context, the book covers much old but useful ground. Then, by covering the Java Transaction API (JTA) as well as OTS, the OMG's transaction API which is JTA's foundation, this book provides a solid understanding of the complexities of transactions for Java programmers who haven't dealt with anything more complex than a single RDBMS. I'd say these complexities could be summed up simply: failures happen; how can you deal with them reliably and quickly?

The book then goes on to examine transactions and the part they play in major J2EE APIs: Java Database Connectivity (JDBC), Java Message Service (JMS), Enterprise Java Beans (EJB) and J2EE Connector Architecture (JCA). These chapters were interesting overviews of these technologies, and would be sufficient to begin programming in them. However, they are complex, and a single chapter certainly can't do justice to any of the APIs. If you're new to them, expect to buy another book.

In the last section, the authors discuss the future of transactions, especially long running activities (the Java Activity Service) and web services. This was the most interesting section to me, but also is the most likely to age poorly. These technologies are all still under development; the basic concepts, however, seem likely to remain useful for some time. And, if you need to decide on a web service transaction API yesterday, don't build your own, read chapter 10.

There were some things I didn't like about Java Transaction Processing. Some of the editing was sloppy—periods or words missing. This wasn't too big a problem for me, since the publisher provided me a free copy for review, but if I were paying list price ($50) I'd be a bit miffed. A larger annoyance was incorrect UML and Java code snippets. Again, the meaning can be figured out from the text, but it's a bit frustrating. Finally, while the authors raise some very valid points about trusting, or not, the transaction system software provider, I felt the constant trumpeting of HP and Arjuna technologies was a bit tedious. Perhaps these companies are on the forefront of Java transactions (possible); perhaps the authors are most familiar with the products of these companies (looking at the biographies, this is likely). The warnings—find out who is writing the transaction software, which is probably at the heart of your business, and how often they've written such software before—were useful, if a bit repetitive.

That said, this book was still a good read, if a bit long (~360 pages). I think that Java Transaction Processing would be especially useful for an enterprise architect looking to leverage existing (expensive) transactional systems with more modern technology, and trying to see how Java and its myriad APIs fit into the mix. (This is what I imagine, because I'm not an enterprise architect.) I also think this book would be useful to DBAs; knowing about the Java APIs and how they deal with transactions would definitely help a DBA discuss software issues with a typical Java developer.

To me, an average Java developer, the first section of the book was the most useful. While transactions are fairly simple to explain (consider the canonical bank account example), this section illuminated complexities I'd not even thought of—optimizations, heuristic outcomes, failure recovery. These issues occur even in fairly simple setups—I'm working at a client who wants to update two databases with different views of the same information, but make sure that both are updated or neither; this seems to be a typical distributed transaction. The easiest way to deal with this is to pretend that such updates will always be successful, and then accept small discrepancies. That's fine with click-throughs—money is a different matter.

However, if you are a typical web developer, I'm not sure this book is worth the price. I would borrow it from your company's enterprise architect, as reading it will make you a better programmer (as well as giving you a sense of history—transactions have been around for a long time). But, after digesting fundamental distributed transaction concepts, I won't be referencing this book anytime soon, since the scenarios simply don't happen that often (and when they do, they're often ignored, as outlined above).

Posted by moore at 03:03 PM | Comments (1)

August 08, 2004

Decreasing the size of a midlet jar

The J2ME application I have been working on has been ready for testing for quite some time, but I didn't want to get a new AT&T phone. For J2ME, you really need a GSM phone--I don't think any of the older TDMA models support it. But the GSM network coverage doesn't match the coverage of the TDMA network--especially out west (aside: isn't that magnifying glass pretty cool?). So I put off buying a phone until my summer road tripping was done.

I've had a Nokia 6160 for almost 4 years. Even though friends mocked the size of it, it was a great phone--durable, good talk time. I thought I'd try another Nokia, and got one of the lower end GSM phones, the 6200. This supported J2ME, and weighed maybe half as much. I was all stoked to try the application on my brand new phone.

I started downloading the jad file, and was getting 'File Too Large' errors. A couple of searches later, I found Nokia's developer device matrix which is much more useful than the User Guide or the customer facing description of phones. Whoops. Most of the Series 40 (read: affordable) Nokia devices only supported J2ME applications which were, when jarred up, less than 64K in size.

Our application, however, was about 78K. This highlights one of the differences between J2ME and J2SE/J2EE. When coding in the latter world, I was never concerned about code size--getting the job done quickly was paramount, and if I needed to use 13 libraries which bloated the final size of my application, I did. On a cell phone, however, there's no appeal to adding memory or changing the JVM configuration to optimize memory use. If the Nokia phone only accepts jars of 64K or less, I had three options:

1. Write off the Nokia Series 40 platform. Ugh--I like Nokias, and other folks do too.

2. Do some kind of magic URL classloading. This seemed complicated and I wasn't sure how to do it.

3. Decrease the size of the jar file.

Now, the 78K jar had already been run through an obfuscator. I wasn't going to get any quick and easy gains from automated software. I posted a question on the JavaRanch J2ME forum and received some useful replies. Here's the sequence I went through:

1. Original size of the application: 79884 bytes.

2. Removal of extra, unused classes: 79881. You can see that the obfuscator did a good job of winnowing out unused classes without my help.

3. Changed all the data objects (5-6 classes), which had been written in classic J2SE style with getters and setters for their properties, to have public variables instead: 79465

4. Combined 3 of the data object classes into one generic class: 78868

5. Combined 5 networking classes into 2: 74543

6. Removed all the logging statements: 66044. (Perl to the rescue--$ perl -p -i -e 's!Log\.!//Log.!' `find . -name "*.java" -print |xargs grep -l 'Log\.'`)

7. Next, I played around with the jode obfuscator which Michael Yuan recommended. I was able to radically decrease the size of the jar file, but, unfortunately, that jar file didn't work on the phone. I also got a ton of exceptions:

java.util.NoSuchElementException
        at jode.bytecode.BytecodeInfo$1.next(BytecodeInfo.java:123)
        at jode.obfuscator.modules.LocalOptimizer.calcLocalInfo(LocalOptimizer.java:370)
        at jode.obfuscator.modules.LocalOptimizer.transformCode(LocalOptimizer.java:916)
        at jode.obfuscator.MethodIdentifier.doTransformations(MethodIdentifier.java:175)
        at jode.obfuscator.ClassIdentifier.doTransformations(ClassIdentifier.java:659)
        at jode.obfuscator.PackageIdentifier.doTransformations(PackageIdentifier.java:320)
        at jode.obfuscator.PackageIdentifier.doTransformations(PackageIdentifier.java:322)
        at jode.obfuscator.PackageIdentifier.doTransformations(PackageIdentifier.java:322)
        at jode.obfuscator.PackageIdentifier.doTransformations(PackageIdentifier.java:322)
        at jode.obfuscator.ClassBundle.doTransformations(ClassBundle.java:421)
        at jode.obfuscator.ClassBundle.run(ClassBundle.java:526)
        at jode.obfuscator.Main.main(Main.java:189)

I'm sure I just didn't use it right, but the jar file size was so close to the limit that I abandoned jode.

8. Instead, I put all the classes in one file (perl to the rescue, again) and compiled that: 64057 bytes. The jar now downloads and works on my Nokia 6200 phone.
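For what it's worth, the change in step 3 amounted to something like this (Position is an invented example of one of those data objects):

// Before: classic J2SE style, with accessors that cost bytecode
class Position {
    private int x;
    private int y;
    public int getX() { return x; }
    public void setX(int x) { this.x = x; }
    public int getY() { return y; }
    public void setY(int y) { this.y = y; }
}

// After: MIDP-friendly, smaller class file
class PositionSlim {
    public int x;
    public int y;
}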

When I have to do this again, I'll definitely focus on condensing classes, basically replacing polymorphism with if statements. After removing extraneous Strings and concatenating all your classes into one .java file (both of which are one time shots), condensing classes is the biggest bang for your buck.

Posted by moore at 09:15 AM | Comments (0) | TrackBack

August 04, 2004

XML for data transmission

I was using XML as a file format for a recent project. Not a standardized dialect, just an ad hoc, custom flavor whipped up to hold the data of particular interest. Now, I've used many file formats and always thought XML was a bit hyped up. After all, it's bulky and angle brackets can be a bit tedious to wade through. In addition, parsing it is hard (creating it is difficult too, if you want to do so by creating a DOM tree). Not too hard, you say. Well, compare the joy of a StringTokenizer parsing a pipe delimited line to the pain of traversing around a DOM tree (to say nothing of the "if" hell of a SAX handler).
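Here's a small sketch of the difference I'm talking about--the data and element names are made up:

import java.io.StringReader;
import java.util.StringTokenizer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ParseComparison {
    public static void main(String[] args) throws Exception {
        // Pipe-delimited: more or less one line of code per field.
        StringTokenizer st = new StringTokenizer("Smith|Dan|100000", "|");
        String lastName = st.nextToken();

        // The same data as ad hoc XML: build a parser, parse, then walk the tree.
        String xml = "<emp><last>Smith</last><first>Dan</first><salary>100000</salary></emp>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        String lastFromXml = root.getElementsByTagName("last").item(0).getFirstChild().getNodeValue();

        System.out.println(lastName + " / " + lastFromXml);
    }
}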

However, XML does have strong points. Storing hierarchical data in XML is easier. And, as far as I'm concerned, XML's killer feature as a file transport format is its self-documenting nature. Sure, you can put headers at the top of a pipe delimited file, but it's easy enough to forget them. Omitting the XML tags isn't really possible--you can choose obscure names for the tags that cloud the meaning of the data, but that's about the worst you can do. Using a custom flavor of XML as a data transmission format can be a really good idea; it just means you'll have a helper class to do the node traversing contortions every time you want to read it. (I'm purposefully ignoring technologies like JAX because I haven't used them.)

Posted by moore at 01:27 PM | Comments (0) | TrackBack

July 21, 2004

History of SQL

Most of the work I do involves SQL in one form or another, so I found this history of SQL to be quite interesting. (Picked it up from the DBI pod pages.)

Posted by moore at 11:50 AM | Comments (0) | TrackBack

June 16, 2004

java memory management, oh my!

How much do you understand basic java? Every day I find some part of this language that I'm not aware of, or don't understand. Some days it's cool APIs (like JAI) but today it's concurrency. Now, language managed memory is a feature that's been present in the languages in which I've been programming since I started. I've looked at C and C++, but taking a job coding in those languages seems to me like taking a job with a long commute--both have obstacles keeping you from getting real work done. (I'm not alone in feeling this way.) But this thread of comments on Cameron Purdy's blog drove home my ignorance. However, the commenters do point out several interesting articles (in particular, this article about double checked locking was useful and made my head hurt at the same time) to alleviate that. I took a class with Tom Cargill a few years back, which included his threading module, and that helped a bit.
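For reference, the double checked locking idiom those articles pick apart looks roughly like this (ConnectionManager is an invented example; the idiom is broken under the pre-Java 5 memory model, which is the whole point):

public class ConnectionManager {
    private static ConnectionManager instance;

    public static ConnectionManager getInstance() {
        if (instance == null) {                      // first check, no lock
            synchronized (ConnectionManager.class) {
                if (instance == null) {              // second check, under the lock
                    // another thread may see 'instance' non-null before the
                    // constructor has finished--hence the head hurting
                    instance = new ConnectionManager();
                }
            }
        }
        return instance;
    }

    private ConnectionManager() {}
}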

However, all these complexities are why servlets (and EJBs) are so powerful. As long as you're careful to only use local variables, why, you shouldn't have to worry about threading at all. That's what you use the container for, right? And we all know that containers are bug free, right? And you'd never have to go back and find some isolated thread related defect that affected your code a maddeningly miniscule amount of time, right?

Posted by moore at 11:00 AM | Comments (5) | TrackBack

June 09, 2004

PL/SQL

I recently wrote a basic data transformation program using Java and PL/SQL. I hadn't used PL/SQL (which is an Oracle-specific procedural language for stored procedures) since writing a basic data layer for my first professional project (a Yahoo! like application written in PL/SQL, perl and Story Server--don't ask). Anyway, revisiting PL/SQL reminded me of some of the things I liked and disliked about that language.

I like:

Invalidation of dependencies. In PL/SQL, if package A (packages are simply arbitrary, hopefully logical, groups of procedures and functions) calls package B, A depends on B. If the signatures of B are recompiled (you can separate the signatures from the implementations) package A simply won't run until you recompile it. This is something I really wish other languages would pick up, because it at least lets you know when something you depend on has changed out from under you.

I dislike:

The BEGIN and END blocks, which indicate boundaries for loops and if statements, are semantically no different than the { and } which I've grown to love in perl and Java. But for some reason, it takes me back to my pascal days and leaves a bit of a bad taste in my mouth.

I'm unsure of:

The idea of putting business logic in a database. Of course, schemas are intimately tied to the business layer (ask anyone trying to move to a different one) and anyone who pretends that switching databases in a java application is a simple matter of changing a configuration file is smoking crack, but putting chunks of business logic in the data layer introduces a few problems. Every different language that you use increases the complexity of a project--and to debug problems with the interface between them, you need to have someone who knows both. Also, stored procedures don't fit very well into any of the object relational mapping tools and pretty much force you to use jdbc.

Posted by moore at 12:47 PM | Comments (1) | TrackBack

May 26, 2004

Lessons from a data migration

I've been working on a data migration project for the last couple of months. There are two schemas, each used by a number of client applications implemented in a number of technologies, and I just wanted to share some of the lessons I've learned. Most of the clients are doing simple CRUD but there is some business logic going on as well. I'm sure most of these points will elicit 'no-duhs' from many of you.

1. Domain knowledge is crucial. There were many times where I made dumb mistakes because I didn't understand how one column mapped to another, or how two tables were being consolidated. This would have been easier if I'd had an understanding of the problem space (networking at level 3 and below of the OSI burrito).

2. Parallel efforts end up wasting a lot of time, and doing things in the correct order is important. For instance, much of the client code was refactored before the data layer had settled down. Result? We had to revisit the client layer again. It was hard to split up the data layer work in any meaningful manner, because of the interdependencies of the various tables (though doing this made more sense than updating client code). Multiple users working on DDL and DML in the same database leads to my next conclusion:

3. Multiple databases are required for effective parallel efforts. Seems like a no-brainer, but the maintenance nightmare of managing multiple developer databases often leads to developers sharing one database. This is workable on a project where most of the development is happening on top of a stable database schema, but when the schema and data are what is being changed, issues arise. Toes are stepped on.

4. Rippling changes through to clients presents you with a design choice. For large changes, like tables losing columns or being consolidated, you really don't have a choice--you need to reflect those changes all the way through your application. But when it's a small change, like the renaming of a column, you can either reflect that change in your value objects, or you can hide the changes, either in the DAO (misnamed properties) or database layer (views). The latter choice will lead to confusion down the line, but is less work. However, point #5 is an alternative to both:

5. Code generation is a good idea in this case. Rather than having static objects that are maintained in version control, if the value objects and DAOs had some degree of flexibility in terms of looking at the database to determine their properties, adding, deleting and renaming columns would have been much, much easier--freeing up more time to fix the GUI and business layer problems that such changes would cause.
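A minimal sketch of what 'looking at the database to determine their properties' might mean--reading column metadata at runtime via JDBC; the connection details and table name are placeholders:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ColumnLister {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@localhost:1521:ORCL", "user", "pass");
        DatabaseMetaData meta = conn.getMetaData();
        ResultSet cols = meta.getColumns(null, null, "EMP", "%");
        while (cols.next()) {
            // A generated value object could expose one property per row returned here.
            System.out.println(cols.getString("COLUMN_NAME") + " " + cols.getString("TYPE_NAME"));
        }
        conn.close();
    }
}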

Posted by moore at 02:29 PM | Comments (0)

May 25, 2004

Understanding the nuts and bolts

I remember back when EJBs first came out and there were all these tools bundled with the application server to build the XML deployment descriptors. Yet, the team I was on built a (perl) application which could generate those same descriptors. Why? Was it a case of 'Not Invented Here' syndrome? Someone with more time than sense? Well, perhaps, but it also ensured the team had a portable way of developing deployment descriptors and made sure that someone had a deep knowledge of said files.

Now, I feel the same way about web applications in general and JSF in particular. If you want to really understand the applications you create, you want to build them from the ground up. But, rather than regurgitate the arguments expounded so clearly in The Law of Leaky Abstractions and Beware Evil Wizards, I'd like to talk about where tools are good. This touches on some of the articles I've written before, including ease of programming.

Tools tend to be a fundamental part of large systems that have a lot of people involved. Specialized knowledge (or lack of same) can lead to tools being built to help or insulate the users from certain grungy parts of a system--hence the EJB roles which split the deployer and programmer roles (among others) apart. That works fine with a large team.

But another interesting aspect of tools is the abstraction. Joel postulates that eventually the abstraction breaks down, and I've seen it happen. But, then again, I don't expect to understand all of the socket handling that Tomcat does, or the TCP stack of the operating system on which Tomcat runs. I might have to delve into it if there are issues and it's a high performance site, but in the normal course of events, that is simply not required. To link to another pundit, situations arise where such scalability just isn't in the nature of the application. I'd also submit the tons and tons of VBA apps built on top of Outlook and the large complex spreadsheets built on Excel as examples of applications where software design, let alone a deep understanding of the fundamental building blocks of the language, is not required.

Sometimes, you just want to get the job done, and understanding the nuts and bolts isn't necessary. In fact, it can be an inhibition. I was talking to an acquaintance today who used to code. When asked why he didn't anymore, he pointed back to one factor--he wanted to be able to service the customer more quickly. At a higher level of abstraction, you can do that. You give up control, because the implementation of the service is usually in other hands (allowing you to go on to service another customer), because in the end, it all needs to be coded somehow. Tools, like Rave and Visual Studio.NET, make that trade off as well.

Posted by moore at 04:05 PM | Comments (0)

May 07, 2004

Arrogance

Ah, the arrogance of software developers. (I'm a software developer myself, so I figure I have carte blanche to take aim at the foibles of my profession.) Why, just the other day, I reviewed a legal document, and pointed out several places where I thought it could be improved (wording, some incorrect references, and whatnot). Now, why do I think that I have any business looking over a legal document (a real lawyer will check it over too)? Well, why shouldn't I? I think that most developers have a couple of the characteristics/behaviors listed below, and that these can lead to such arrogance.

1. Asking questions

Many developers have no fear, even take pride, in asking good, difficult questions about technical topics. Asking such questions can become a habit. A developer may ask a question, and feel comfortable about it, when he/she is entirely out of his/her depth.

2. Attention to detail

Developers tend to be capable of focusing on one thing to the exclusion of all else. This often means that, whatever the idea that comes along, a developer will focus on it exclusively. Such focus may turn up issues that were missed by the less attentive, or it may just be nit picking. (Finding small issues isn't nitpicking when one is developing--it's pre-emptive bug fixing.)

3. Curiosity and the desire to learn

Most developers are curious. In part because computers are so new, and in part because software technologies change so rapidly, hackers have to be curious, or they're left behind, coding Cobol (not that there's anything wrong with that!). This sometimes spills out into other portions of their lives, tweaking their bodies or the mechanics of an IPO.

4. Know something about something difficult

Yeah, yeah, most developers are not on the bleeding edge of software. But telling most people what they do usually causes some kind of 'ooh' or raised eyebrows conveying some level of expectation of the difficulty of software development. (Though this reaction is a lot less universal than it was during the dotcom boom--nowadays, it could just as easily be an 'ooh' of sympathy to an out of work developer.) Because developers are aware that what they do often isn't that difficult (it's just being curious, asking questions, and being attentive), it's easy to assume that other professions usually thought difficult are similarly overblown.

Now, this arrogance surfaces in other realms; for example, business plans. I am realizing just how far I fall short in that arena. I've had a few business plans, but they often fall prey to the problem that the gnomes had in South Park: no way to get from action to profit. I'm certainly not alone in this either.

In the same vein of arrogance, I used to make fun of marketing people, because everything they do is so vague and ill-defined. I always want things nailed down. But, guess what, the real world is vague and ill-defined. (Just try finding out something simple, like how many people are driving Fords, how women use the internet, or how many people truly, truly love Richie Valens. You appear to be reduced to interviewing segments of the population and extrapolating.) And if you ask people what they want, they'll lie to you. Not because they want to lie, but because they don't really know what they want.

I guess this is a confession of arrogance on the part of one software developer and an apology to all the marketroids I've snickered at over the years (oops, I just did it again :). (I promise to keep myself on a shorter leash in the future.) Thanks for going out into the real world and delivering back desires, which I can try to refine into something I can really build. It's harder than it looks.

Posted by moore at 05:02 PM | Comments (2) | TrackBack

May 03, 2004

WAP vs J2ME

When I gave my talk about J2ME to BJUG a few weeks ago, one of the points I tried to address was 'Why use J2ME rather than WAP.' This is a crucial point, because WAP is more widely distributed. I believe the user interface is better, there is less network traffic, and there are possibilities for application extension that just don't exist in WAP. (Though, to be fair, Michael Yuan makes a good point regarding issues with the optional packages standards process.)

I defended the choice of using MIDP 1.0 because we needed wide coverage and don't do many complicated things with the data, but WAP is much more widely supported than J2ME, by almost any measure. If you don't have an archaic phone like my Nokia 6160, chances are you have a web browser. And WAP 2.0 supports images and XHTML, giving the application almost everything it needs without learning an entirely new markup language like WML.

So, we've decided to support XHTML and thus the vast majority of existing clients (one reason being that Verizon doesn't support J2ME--at all). I've gotten a quick education in WAP development recently, and I just found a quote that sums it up:

"As you can see, this is what Web programmers were doing back in 1994. The form renders effectively the same on the Openwave Browser as it does on a traditional web browser, albeit with more scrolling."

This quote is from Openwave, a company that specializes in mobile development, so I reckon they know what they're talking about. A couple of comments:

1. WAP browsers are where the web was in 1994. (I was going to put in a link from 1994, courtesy of the Way Back Machine, but it only goes back to 1996.) I don't know about you, but I don't really want to go back! I like Flash, DHTML and onClick, even though they can be used for some truly annoying purposes.

2. "...albeit with more scrolling" reinforces, to me, the idea that presenting information on a screen of 100x100 pixels is a fundamentally different proposition than a screen where you can expect, at a minimum, 640x480. (And who codes for that anymore?) On the desktop, you have roughly 30 times as much screen real estate (plus a relatively rich language for manipulating the interface on the client). It's no surprise that I'm frustrated with I browse with WAP, since I'm used to browsing in far superior environments.

3. Just like traditional browsers, every time you want to do something complicated, you have to go to the server. You have to do this with XHTML (but not with WML, I believe. WML has its own issues, like supporting only bitmap pictures). That's not bad when you're dealing with fat pipes, but mobile networks are slow.

4. Fitting in with the carrier is an issue with WAP. Since the browser is provided, you have no control over some important issues. For example, one carrier we're investigating requires you to navigate through pages and pages of carrier imposed links before you can get to your own bookmarks. It's the whole gated community mindset; since the UI sucks, it's harder to get around than it would be with Firefox.

In short, use WAP 2.0 if you must, but think seriously about richer clients (J2ME, BREW, or even the .Net compact framework). Even though they'll be harder to implement and roll out, such clients will be easier to use, and thus more likely to become a part of your customers' lives.

Posted by moore at 01:31 PM | Comments (0)

April 29, 2004

What use is certification?

What good are certifications like the Sun Certified Java Programmer (SCJP) and the Microsoft Certified Systems Engineer programs? Unlike the Cisco certifications, you don't have to renew these every couple of years (at least the Java certifications--in fact, everything I mention below applies only to the Java certifications, as those are the only ones of which I have more than a passing knowledge). I am a SCJP for Java2, and I have an acquaintance who is a certified programmer for Java1.1; a Java1.1 cert isn't very useful unless you're targeting .Net, or writing applets that need to run on most every browser. Yet my colleague and myself can continue to call ourselves 'Java Certified Programmers.' I realize that there's an upgrade exam, but I've never met a soul who's taken it; and I don't believe I'm prohibited from heading down the Java Certification path and handing Sun more money because I am not an SCJP for the most recent version of Java. In fact, I'm studying right now for the Sun Certified Web Component Developer (SCWCD) and plan to take the exam sometime this summer. Even though these certifications may be slightly diluted by not requiring renewal, I think there are a number of reasons why they are a good thing:

1. Proof for employers.

Especially when you deal with technologies that are moving fast (granted, changes to Java have slowed down in the past few years, but it's still moving faster than, for example, C++ or SQL), employers may not have the skill set to judge your competence. Oh, in any sane environment you will probably interview with folks who are up to date on technology, but who hasn't been screened out by HR because of a lack of appropriate keywords? Having a certification is certainly no substitute for proper experience, but it serves as a baseline that employers can trust. In addition, a certification is a concrete example of professional development: always a good thing.

2. Breadth of understanding.

I've been doing server side Java development for web environments for 3 years now, in a variety of business domains and application servers. Now, that's not a long time in programming years, but in web years, that's a fair stint. But, studying for the SCWCD, I'm learning about some aspects of web application development that I hadn't had a chance to examine before. For example, I'm learning about writing tag libraries. (Can you believe that the latest documentation I could find on sun.com about tag libraries was written in 2000?) I was aware of tag libraries, and I'd certainly used them, the struts tags among others, but learning how to implement one has really given me an appreciation for the technology. Ditto for container managed security. Studying for a certification definitely helps increase the breadth of my Java knowledge.
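
To make the tag library bit concrete, here's roughly what a minimal JSP 1.2 custom tag looks like (the class and attribute names are mine, purely for illustration; you'd still need the corresponding .tld entry and a taglib directive in the JSP):

import java.io.IOException;
import javax.servlet.jsp.JspException;
import javax.servlet.jsp.tagext.TagSupport;

public class HelloTag extends TagSupport {
    private String name = "world"; // set by the container from the tag's 'name' attribute

    public void setName(String name) {
        this.name = name;
    }

    public int doStartTag() throws JspException {
        try {
            // pageContext is inherited from TagSupport and hands us the JspWriter
            pageContext.getOut().print("Hello, " + name);
        } catch (IOException e) {
            throw new JspException("could not write tag output: " + e.getMessage());
        }
        return SKIP_BODY; // this tag has no body to evaluate
    }
}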

3. Depth of understanding.

Another aspect is an increased depth of understanding: actually reading the JSP specification, or finding out what the difference is between overriding and overloading (and how one of them cares about the type of the object, whereas the other cares only about the type of the reference), or learning in what order static blocks get initialized. (My all time favorite bit of know-how picked up from the SCJP was how to create anonymous arrays.) The knowledge you gain from certification isn't likely to be used all the time, but it may save you when you've got a weird bug in your code. In addition, knowing some of the methods on the core classes saves you from running to the API every time (though, whenever I'm coding, the javadoc is inevitably open). Yeah, yeah, tools can help, but knowing core methods can be quicker (and your brain will always be there, unlike your IDE).
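
Here's a contrived sketch of the sort of thing I mean (the class and method names are made up):

public class CertTrivia {
    static class Base {
        void greet() { System.out.println("base"); }
    }

    static class Child extends Base {
        void greet() { System.out.println("child"); }                  // overrides Base.greet()
        void greet(String name) { System.out.println("hi " + name); }  // overloads it
    }

    static {
        // static initializer blocks run once, when the class is first loaded
        System.out.println("loading CertTrivia");
    }

    public static void main(String[] args) {
        Base b = new Child();
        b.greet();         // prints "child": overriding is resolved by the type of the object
        // b.greet("Dan"); // won't compile: overloading is resolved by the type of the reference (Base)

        // and an anonymous array, created and passed without ever naming a variable
        printAll(new String[] { "a", "b", "c" });
    }

    static void printAll(String[] values) {
        for (int i = 0; i < values.length; i++) {
            System.out.println(values[i]);
        }
    }
}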

4. A goal can be an incentive.

Personally, I'm goal oriented, and having a certification to achieve gives me a useful framework for expenditure of effort. I know what I'm aiming for and I'm aware of the concrete series of steps to achieve that goal. I can learn quite a bit just browsing around, but for serious understanding, you can't beat a defined end point. I'd prefer it to be a real-world project, but a certification can be a useful stand in. (Yes, open source projects are good options too--but they may not cover as much ground and certainly, except for a few, are not as widely known as certifications.)

I've met plenty of fine programmers who weren't certified (just as I've met plenty of fine programmers who weren't CS majors). However, I think that certifications can be a useful complement to real world experience, giving job seekers some legitimacy while also increasing the depth and breadth of their understanding of a language or technology.

Posted by moore at 11:26 PM | Comments (0)

April 16, 2004

Software archeology

I presented this evening on J2ME for the kickstart meeting at BJUG, where Grady Booch was the featured speaker. After unknowingly knocking UML in his presence, I enjoyed a fine talk on software archeology. This discipline involves looking at larger, historical patterns of software development. Essentially, when we build software, we are building artifacts. And, just as the plans and meetings of the slave foremen who built the pyramids are not recorded, so there are aspects of present day software development that are simply lost when the project ends or the programmers die. One of Booch's projects is to capture as much of that data as possible, because these architectures are full of valuable knowledge that many folks have sweated for. It needs to happen soon, because, in his words, "time is not on our side" when it comes to collecting this kind of data. Man, I could handle that kind of job.

Speaking of architecture, I stumbled on "Effective Enterprise Java" which looks to be a set of rules for enterprise java development. I really enjoy "Effective Java", by Joshua Bloch, so I hope that Ted Neward's book lives up to its name. And I certainly hope this project doesn't get stranded like "Interface Design" apparently did.

Posted by moore at 12:19 AM | Comments (2)

April 12, 2004

Is transparent access control worth unintelligible error messages?

Partly egged on by Rob and Brian, I just took a long overdue look at container managed security for web applications.

My conclusion: it's nice, but there is one major flaw that dooms the whole premise. Users expect informative error messages when they 'sign in' and there's no way to do that with container managed security.

I was using Tomcat 4.1, which is to say, I was examining the servlet 2.3 specification. (I just looked at the 2.4 specification and can see no amelioration of the above issue.) I also focused on the FORM method of authentication, as that's the most customizable. (I imagine, for an intranet app obsessed with security, client certificates would be a worthwhile avenue of investigation.) I found the servlet specs to be very helpful in this process.

With the FORM method of authentication, you can customize the appearance of your login and error pages, to some extent. This is a huge win.

I really liked the automatic access control--no checking at the beginning of every ActionForm or JSP for any specific attribute. Additionally, you can protect different URL patterns easily, and for most of the applications I write, this is enough. If you need to protect buttons on a page, you can always resort to isUserInRole.
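
For example, something like this (a rough sketch; the servlet and the 'admin' role are made up, and the role would have to be declared in web.xml):

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ReportServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<a href=\"view.jsp\">View report</a>");
        // the container knows the roles; only render the delete button for admins
        if (request.isUserInRole("admin")) {
            out.println("<form action=\"delete\" method=\"post\">"
                    + "<input type=\"submit\" value=\"Delete report\"/></form>");
        }
    }
}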

Also, you can place the login and error pages, which should never be accessed directly, in a separate /safe directory to which you prohibit all access.

For the times when the user is denied access to a resource, you can create a custom 403 error page, using the error-page directive in web.xml. Unfortunately, you only seem to get three attributes: javax.servlet.error.message, javax.servlet.error.request_uri and javax.servlet.error.status_code, which limits the nature of your response. (These were what Tomcat gave me--I don't think they're part of the spec.) Regardless, IE, with default settings, doesn't display any custom error messages, which makes this a rather moot point for general webapps.

Creating a logout page is fairly easy, just call session.invalidate() (though there seem to be some non standard methods of doing it as well).
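
A minimal logout sketch (the servlet and page names are mine):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class LogoutServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        HttpSession session = request.getSession(false); // don't create a session just to kill it
        if (session != null) {
            session.invalidate(); // discards the authenticated principal along with everything else
        }
        response.sendRedirect("index.jsp"); // back to an unprotected page
    }
}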

However, as mentioned above, I just don't think that users will accept the generic login error messages that you are forced to give. For instance, you can't tell whether a user didn't enter a password, or entered an incorrect password. You can't redirect them back to a login page with helpful error messages around the incorrect box. These are fundamental issues with authentication--no serious webapp simply throws up its hands when a user doesn't log in correctly the *first* time.

Separate from user experience, but still related to authentication behavior, you can't 'lock out' users who've attempted to log in too many times. Sure, you can keep track of how many times they've tried to log in, but the authentication process is out of your hands.

Additionally, the fact that you're tied to a particular implementation for user/role definition means that writing custom authentication code that just accesses an RDBMS is actually more portable.

The answer, to the question posed in the title of this post: "is transparent access control worth unintelligible error messages?", is almost always "no." And folks accuse developers of not having any sense of user interface!

Posted by moore at 04:43 PM | Comments (2) | TrackBack

April 03, 2004

Scripting languages and productivity

Bruce Eckel has some things to say about different languages and productivity. One quote in particular stood out:

"I didn't have to look that up, or to even think about it [reading the contents of a file using python], because it's so natural. I always have to look up the way to open files and read lines in Java. I suppose you could argue that Java wasn't intended to do text processing and I'd agree with you, but unfortunately it seems like Java is mostly used on servers where a very common task is to process text."

I agree entirely. I come from a perl background (it's the language I cut my teeth on, which, I suppose, dates me), and unlike some, I'm unabashedly in favor of it. I've looked at python briefly, and it does seem to have perl's flexibility and agility with less ambiguity. When you have to grab a file from the filesystem (or parse a file and stuff it into a database) there's simply no comparison, and anyone who reaches for Java to solve such problems hasn't experienced the freedom of scripting languages.
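
For comparison, here's the boilerplate I always end up looking up on the Java side (a sketch; the file name is just an example):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PrintFile {
    public static void main(String[] args) throws IOException {
        // the one-liner you'd write in perl or python becomes all of this
        BufferedReader reader = new BufferedReader(new FileReader("foo.txt"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            reader.close();
        }
    }
}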

The problem with such free form languages arises when you start doing large scale systems. Java, for all its faults and complexity, forces choices about implementation to be done at a high level--which framework do we want to use, how do we architect this solution. Perl (note that I'm not talking about python, since I'm a python newbie), on the other hand, is more flexible, and hence allows more latitude. It requires more discipline to code OO perl, or, for that matter, readable perl, than it does to code readable java. (There are different ways to implement objects in perl--see Object Oriented Perl for more information.) By limiting some of the latitude of the developer, you gain some maintainability.

I was trying to think of trivial examples that illustrate this point, but I couldn't. Perhaps it's because I've been out of touch with perl's evolving core libraries for so long, or perhaps it's because all the perl I've ever had to maintain has been intensely idiomatic, whereas all the java I've had to maintain has been, though at times obtuse, fairly easy to read. Either way, I just feel that perl is a harder language to maintain than java.

Now, how does this apply to Eckel's statements? Well, he uses python as his example--stating that you just plain can get more done with python than you can with java. It's hard to argue with that.... But the majority of code expense and lifecycle is not in the creation but the maintenance. How do the scripting languages stack up for large scale systems? My experience (which, granted, is primarily applicable to small to medium size systems) indicates that the very flexibility which allows Bruce such amazing productivity hampers further enhancements and bug fixing on the code he writes.

Posted by moore at 08:20 AM | Comments (1)

March 21, 2004

The Grinder

I did some performance testing against a web application that I helped write this weekend. I used The Grinder and was quite happy with the beta version. This lets you write scripts in Jython and uses the quite nice HTTPClient library. The Grinder, like most other performance tools, has an admin interface (the console) and a set of distributed agents that are given tasks and communicate results back via the network to the console. I used the HTTP client, but it looks like you can 'grind' anything you can talk to via java, from databases to email servers.

I did run into a few problems. I'm using cygwin on WinXP, and had some difficulties running java from the command line. The fix was to use the cygpath command, like so:

#!/bin/sh
# to start the agent
JAVA=/cygdrive/c/j2sdk1.4.2_03/bin/java
# cygpath -p -w converts the colon-separated unix-style path list into the
# windows-style path that the JVM expects
CLASSPATH=`cygpath -p -w /home/Owner/work/grinder/grinder-3.0-beta20/lib/grinder.jar:\
/home/Owner/work/grinder/grinder-3.0-beta20/lib/jakarta-oro-2.0.6.jar:\
/home/Owner/work/grinder/grinder-3.0-beta20/lib/jython.jar`
$JAVA -cp $CLASSPATH net.grinder.Grinder

The client application that I was testing doesn't use cookies (it's a J2ME application, and the MIDP spec doesn't support cookies out of the box). Or rather, it uses cookies, but only to grab the first one that the server sends, store it off, and then pass it back as a query parameter. This type of configuration isn't The Grinder's sweet spot, and I had to do a bit of hacking to make sure the appropriate cookie value was sent with the appropriate client request. It would have been nice to use contexts, but The Grinder wraps the HTTPConnection in its own class. Apparently, if you are simulating use by a browser, cookies are handled correctly. One gripe--there's no javadoc for the main classes available on The Grinder's website, so you have to grab the source if you want to see interactions between pieces (for example, how net.grinder.plugin.http.HTTPRequest interacts with HTTPClient.HTTPConnection).

I also mucked with some of the properties, primarily initialSleepTime. You'll want to make sure that you read about these properties--I blithely uncommented what was in the sample grinder.properties and ended up with an obscene value for grinder.sleepTimeFactor.

After all the setup, I was able to hammer our server. I discovered two useful things: an error in our logout code, which threw exceptions around 10% of the time, and also discovered that our connection timeout between Apache and Tomcat was set incorrectly. Changing this from 0 to 1000 fixed the dreaded SEVERE: All threads are busy, waiting. Please increase maxThreads or check the servlet status error that I was getting. In addition to these two useful bugs, by making some assumptions about how the application will be used, I was able to gimmick up some interesting numbers about supportable users.

I like The Grinder a fair bit. It's got a nice GUI. It's still under active development. I'm a bit leery of using beta software (especially open source beta software), but a poll on the homepage convinced me to try the beta. By using this, I was also able to pick up snatches of python which is a new language to me (finally got to consult my long unused copy of Learning Python). I considered looking at JMeter, but The Grinder appears to be a bit more recently maintained. It's no LoadRunner, but then again, it doesn't pretend to be. All in all, if you're in the market for a cheap, quick performance tool, The Grinder is worth a look.

Posted by moore at 12:57 PM | Comments (11)

March 16, 2004

Miswanting and web application frameworks

I've wanted to respond to this post by Kris Thompson where he predicts that "Struts will continue to lose industry acceptance as the MVC leader in the J2EE space" in 2004 for some time now. I believe this is happening already; if you read the blogging community or some of the industry rags, it seems like other alternatives to Struts are being promoted (not least of which is JSF). But there are still tons of Struts applications out there being built every day. There have been over 2000 messages on the struts mailing list for the past year (granted this number is declining--perhaps because folks are GFTFA [googling for the fcuking answer]).

This article explains why I continue to develop in struts: "A wider range of slightly inferior options, then, can make it harder to settle on one you're happy with." There is definitely a wide range of J2EE frameworks. In my case, these alternatives to struts are not inferior because of any technical shortfall, but rather because I have to learn them.

(An aside: I have programmers' optimism as much as anyone else. But after a few years in the industry, I've realized that while I feel I can learn a new technology as quickly as need be, in order to really understand the pitfalls of a framework or API, I have to "build one to throw away." I really do.)

Please don't take these statements as a whiny "I don't want to learn anything new." That's not true. But my time is finite, and the project deadlines are always creeping up. And struts works well enough for my problem set.

And that's why struts is going to be around for a while yet.

Posted by moore at 02:49 PM | Comments (0)

March 04, 2004

IPTraf

Hey, I like to work at the higher levels of the 7 Layer Burrito, the Application, Presentation and Session layers. But every so often, you have to dig a bit deeper. Currently, I'm troubleshooting a ColdFusion application that was converted from a local mysql database to a remote postgresql database. There are quite a few docs about optimizing postgresql, but they focus on query and local database optimization, and I think the issue was network traffic (based on the load average of both the local and remote boxes). Anyway, I found this neat tool called IPTraf which gives you real time monitoring of ip traffic. Pretty nice, but avoid the US mirror of the binary build, since it's not complete.

Posted by moore at 11:34 AM | Comments (0) | TrackBack

February 19, 2004

Book Review: Hackers

Hackers, by Steven Levy, should be required reading for anyone who programs computers for a living. It covers the period from the late 1950s, when the first hackers wrote code for the TX-0 and every instruction counted, to the early 1980s, when computers fully entered the consumer mainstream and it was marketing rather than hacking that mattered. Levy divides this time into three eras: that of the 'True Hackers,' who lived in the AI lab at MIT and spent most of their time on the PDP series; the 'Hardware Hackers,' mostly situated in Silicon Valley and responsible for enhancing the Altair and creating the Apple; and the 'Game Hackers,' who were also centered in California--expert at getting the most out of computer hardware, they were also the first to make gobs and gobs of money hacking.

The reason everyone who codes should read this book is to gain a sense of history. Because the field changes so quickly, it's easy to forget that there is a history, and, as Santayana said, "Those who do not remember the past are doomed to repeat it." It's also very humbling, at least for me, to see what kind of shenanigans were undertaken to get the last bit of performance from a piece of hardware that was amazing for its time, but now would be junked without a thought. And a third takeaway was the transformation that the game industry went through in the early 80s: first you needed technical brilliance, because the hardware was slow and new techniques needed to be discovered. However, at some point, the hard work was all done, and the business types took over. To me, this corresponds to the 1997-2001 time period, with the web rather than games being the focus.

That's one of my beefs--the version I read was written in 1983, and republished, with a new afterword in 1993. So, there's no mention of the new '4th generation' of hackers, who didn't have the close knit communities of the Homebrew Computer Club or the AI lab, but did have a far flung, global fellowship via email and newsgroups. It would be a fascinating read.

Beyond the dated nature of the book, Levy omits several developments that I think were fundamental to the development of the hacker mindset. There's only one mention of Unix in the entire book, and no mention of C. In fact, the only languages he mentions are lisp, basic and assembly. No smalltalk, and no C. I also feel that he overemphasizes 'hacking' as a way that folks viewed and interacted with the world, without defining it. For instance, he talks about Ken Williams, founder of Sierra Online, 'hacking' the company, when it looked to me like it was simple mismanagement.

For all that, it was a fantastic read. The more you identify with the geeky, single males who were in tune with the computer, the easier and more fun a read it will be, but I still think that everyone who uses a computer could benefit from reading Hackers, because of the increased understanding of the folks that we all depend on to create great software.

Posted by moore at 11:05 AM | Comments (0)

February 06, 2004

Dimensions of Application Configuration

Tom Malaher has written an excellent rant about the state of installing and configuring third party software. Since most programmers are decidedly not at the bleeding edge of technology ("we need you to build another order entry system"), we all use third party software and understand some of his frustration. After all, it would be nice to be able to configure such software in any way we deemed fit, rather than having to deal with the dictates of the vendor.

Alas, such flexibility is not often found. Even among open source software, you can find rigidity. Of course, if you take the time, you can fix the problems, but the entire point of third party software is that you can use it 'out of the box,' thus saving time.

Tom gave a masterful analysis of the structural components of third party software. Though he repeatedly asks for comments and suggestions, I don't have any to make regarding his 'types of data' delineation. However, I thought it would be worthwhile to examine configuration data more closely. (Eric S Raymond also covers configuration in general here.) In fact, I think there are a number of interesting facets that tie into making configuration data easy to version, store, and separate from other types of data.

1. App specific vs universal format

You can either have one configuration file (or one set of files) shared by every application (a la config.sys and win.ini) or you can have application specific configuration files for every substantial installed application (a la sendmail.conf and /etc/*).

One set of files makes it easy for the user to know where the application they just installed is configured. It also ensures that all applications use roughly the same type of configuration: the same comment character, the same sectioning logic, the same naming conventions. It also means that you can use the operating system to manage the configuration files, rather than each application having to write its own code to create and manage its configuration.

Having each application manage its own configuration files ensures that the configuration will be tailored to the application's needs. Some applications might need a hierarchical configuration file, where some sections inherit from others. Others can get by with a simple text file with name value pairs. Another advantage of having separate configuration files is that, well, they are separate. This makes it easier to version them, as well as making it easier to tweak the configuration files, possibly to run multiple instances of one application.

2. User vs system

This is closely related to the first differentiation. However it is distinct, as it's possible to have a system format for configuration that has specific areas for users, and to have an app specific format that excludes any other application running on a given system. The crucial question is whether each user can have an independent installation of a given application.

It's hard to argue against allowing each user to have an individual configuration, but in certain situations, it may make sense. If, for example, there are parameters whose change may drastically affect the performance of a system (the size of a TCP packet), or which may govern specific limited resources (the allocation of ports), then it may make sense to limit user specific configuration. You may notice that my examples are all drawn from the operating system, and this may be one application where user specific configuration may not be a good idea, since the OS underlies all the other applications.

3. Binary vs text

There are two possible formats in which to store configuration information. One is eminently computer readable, minimizes disk usage, and increases the speed of the application. The other one is superior.

Binary configuration formats are quicker for the computer to read and take up less space on disk. However, they are prone to rot, as only the application that wrote it can read and manipulate the file. No one else can, and this unfortunately includes the poor programmer who needs to modify some behavior of the application years after it was written.

Text configuration files, on the other hand, parse slower and are bulkier. However, they can also be self describing (check out this sample sendmail configuration file for a counter example). This in itself is a win, because it gives a human being a chance to understand the file. In addition, such configuration files can also be manipulated by the bevy of tools that can transmogrify the configuration files into something else (a bit of perl, anyone?). They can also be easily version controlled, and diffed. Pragmatic programmers like text files (section 3.14) for many of the above reasons.
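
As a small illustration of the text file win, java's own java.util.Properties will read a plain name=value file that anyone can inspect, diff, and version (the file and key names below are made up):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class ConfigReader {
    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        // a plain text file: '#' comments and name=value pairs, readable by humans and by a bit of perl
        FileInputStream in = new FileInputStream("app.properties");
        try {
            config.load(in);
        } finally {
            in.close();
        }
        System.out.println(config.getProperty("listen.port", "8080")); // falls back to a default if the key is absent
    }
}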

It's clear that there are several different options when it comes to configuring any one particular application. Some of these are related, and some are orthogonal, but all of them deserve consideration when designing any application.

Posted by moore at 10:08 PM | Comments (0)

January 31, 2004

Checking the status of your files, using CVS

When I used CVS a few years ago, I remember a colleague writing a tremendous perl script that you could run from anywhere in the CVS source tree. It would let you know whether you had files that weren't in CVS, needed to be updated, or were going to be merged. It was quite a nice piece of perl code, which essentially parsed the output of cvs status, and the information it output was quite useful at the end of a long bug fixing or coding session ("hey, what files did I change again?"). However, it also needed to be maintained and documented, as well as explained to users.

The other day, I stumbled on something which works almost as well, but is part of CVS already: cvs -qn up. The q option tells CVS to be quiet, and not chat about all the directories that it sees. The n option tells CVS not to make any changes on the filesystem, but just tell you what changes it would have made. Here's some sample output:

[moore@localhost guide]$ cvs -qn up
? securityTechniques/NewStuff.rtf
M securityTechniques/InputValidation.rtf
M securityTechniques/SessionManagement.rtf
U securityTechniques/AuthenticationWorkingDraft.doc

M means that the file has been changed locally. ? means that the file exists locally, but not in the repository. U means that the file has changed in the repository, but has not yet been updated locally. For more information on the output of update, look here.

Use this command and never lose track of the files in your CVS tree again.

Posted by moore at 11:13 AM | Comments (1)

January 11, 2004

Jalopy

I like javadoc. Heck, I like documentation. But I hate adding javadoc to my code. It's tedious, and I can never remember all the tags. I don't use an IDE so the formatting gets to me.

After attending a presentation at BJUG about software tools, I investigated jalopy and I liked what I found. Now, jalopy is more than just a javadoc comment inserter, but javadoc insertion was my primary use of the tool. It may be piss poor for code formatting and whatnot, but it was pretty good at inserting javadoc. I was using the ant plug-in and the instructions were simple and straightforward. It didn't blow away any existing comments, and it didn't munge any files, once I configured it correctly. And there are, make no mistake, lots of configuration options.

Jalopy has a slick Swing interface to set all these configuration options, and you can export your configuration to an XML file which can be referenced by others. This, along with the ant integration, make it a good choice for making sure that all code checked in by a team has similar code formatting.

However, I do have a few minor quibbles with this tool.

1. The default configuration of javadoc is busted. When you run it, it javadocs methods and classes just fine, but any fields are marked with "DOCUMENT ME!" when they should be commented out: "/** DOCUMENT ME! */". This means that, with the default configuration, you can't even run the formatter twice, since jalopy itself chokes on the uncommented "DOCUMENT ME!". (See the sketch after this list for roughly what I mean.)

2. The configuration file is not documented anywhere that I could find. I looked long and hard on the Internet, and only found one example of a jalopy configuration file here. And this is apparently just the default options exported to a file. I've put up a sample configuration file here which fixes problem #1. (This configuration is only for javadoc; it accepts all other defaults.)

3. The zip file that you download isn't in its own directory. This means that when you unassumingly unzip it, it spews all over your current directory.
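
To illustrate the first quibble: with the fixed configuration, jalopy emits fields that look roughly like the sketch below (a reconstruction with a made-up class, not actual jalopy output); with the default configuration, the "DOCUMENT ME!" on a field shows up bare, outside any comment, which isn't legal java and breaks the next formatting pass.

public class Example {
    /** DOCUMENT ME! */
    private String name;

    /** DOCUMENT ME! */
    public String getName() {
        return name;
    }
}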

None of these are show stoppers, that's for sure. If you're looking for a free, open source java code formatting tool, jalopy is worth a close look.

Posted by moore at 08:23 PM | Comments (2)

December 24, 2003

Webhacking with CVS

In the latest edition of 2600, there's an article about webhacking with CVS. The basic premise of this article is that if you do a cvs checkout of your static html site to your webroot, you let folks with inquisitive minds and an understanding of CVS know more than you intended about your IT infrastructure. Read the article for more information.

However, it's easy enough to defeat. The answer is to use the cvs export command, which generates exactly the same files as a checkout except without the CVS directories. Rolling out updates to a site via this command means, of course, that any changes you make to files in the web directory will be blown away. But, it could be argued that it's a good thing to force everything to go through CVS. It also means that you can't make incremental updates as easily. It's still possible, but you just have to check out the source to some other place and copy the file over manually. Another option, which lets you do updates more easily, is rsync --cvs-exclude, which does the same thing.

Using either of these solutions makes it a bit tougher to move content to the website. But it makes things a whole lot more secure.

Posted by moore at 12:30 AM | Comments (0)

December 11, 2003

Coding Standards

I went to the BJUG meeting tonight, and the topic was automatic code standardization tools. Tom Marrs gave a good presentation which covered 4 open source tools that integrate with ant:

Checkstyle checks that code fits existing guidelines. It comes configured to check against Code Conventions for the Java Programming Language. pmd is lint for java; it actually has a page where you can see it run against itself. It also finds generic exceptions and complains. Both of these tools show you where problems exist in your code, usually by generating a nice HTML report, but don't modify the source.

The next two tools actually modify your .java files. cleanImports fixes erroneous import statements, and cleans up com.foo.* imports. It's smart enough, supposedly, to only import the actual classes that are used in a particular file. Jalopy is a bit more ambitious, and attempts to fix missing javadoc, whitespace problems, brace placement and some other problems.

Now, you need a combination of these tools. The style checkers can be very strict, since they don't have to be smart enough to fix the problems they find. The code beautifiers, on the other hand, actually fix the problems that they find. Tom made some good points that these programs can generate a lot of output, and it makes sense to prioritize in deciding which problems to fix. Especially when you aren't starting with a blank slate, it makes a lot of sense to ignore some of the lesser evils (who cares about whitespace when you have a constant that isn't static final?).
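
To make the prioritization point concrete, here's a contrived class (the names are mine, and this isn't actual tool output) with both a nit and a real problem in it:

public class OrderProcessor {
    public int maxRetries = 3;  // a "constant" that isn't static final: worth fixing, eventually

    public void process(String orderId)
    {                           // brace placement: a whitespace-level nit I'd happily ignore at first
        try {
            submit(orderId);
        } catch (Exception e) { // catching (and silently swallowing) the generic Exception:
            // the kind of real problem these reports surface, and the one to fix first
        }
    }

    private void submit(String orderId) {
        // stand-in for the real work
    }
}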

A member of the audience brought up a good point, which is that using these kind of tools is at least as much a political problem as it is a software problem. Few folks are going to argue that having a consistent coding standard makes maintenance easier, but I think that few folks are going to argue that it's the most important factor. But, as I see it, there are a couple of different things you can do to enforce coding standards. I list these below in increasing order of intrusiveness.

1. Make the tools available

If you make the tools available on the project, folks will probably use them. After all, who likes writing crappy code? All these tools integrate with ant, and some integrate with popular IDEs. Make developers aware of the tools, add the targets to your standard build files, and encourage folks to use them.

2. Get buy in from the team

If you're on a team, it may make sense to have a 'tools meeting' at the beginning of a project (or in the middle, for that matter). Decide on basic standards (and remember, the location of braces isn't really that important), after explaining how it makes folks' lives easier. Build a consensus that using one or two of these tools is a good thing to do, and should be done before code is checked in.

3. Have senior staff dictate usage: 'thou shalt use pmd'

If the senior members of a team feel strongly about this, they can make a preemptive decision that the tools must be used. I've been on a few projects where this happened, and I can't say that it was a huge issue. After all, the senior staff make lots of arbitrary decisions (well, they look arbitrary) about architecture, team membership, etc. One more won't hurt too much.

4. Make running the tools required before check in

You can put wrapper scripts around CVS. I've seen it done on the client side, but this can be circumvented by just running the cvs command. You can also do it on the server side. I'm not sure what the best option is, but this is a large hammer to wield: it ensures that the code meets a standard, but also displays distrust that the coder can and will do the right thing on their own. Not exactly the kind of attitude you want to convey to folks you're paying to think for you.

I think that these automatic tools are great. Code inspection, especially of a large number of classes, is something that programs are well suited for--there's a clear set of rules, it's a repetitive, boring task. But make sure that you don't forget the human element. What happens to the reported problems? No matter how much the code is automagically fixed, you need and want the programmer to look at the output of the tools, and strive to improve his or her code.

Posted by moore at 10:28 PM | Comments (0)

December 10, 2003

"Choose the right tool for the job!"

When you're writing a program to perform some business function, there are usually many different options. Whether it's the particular language, the database, the platform, or the hardware, you have to make some decisions. Like a carpenter, who chooses screws when he needs to attach two planks and a saw when he needs to shorten a dowel, programmers are supposed to choose the correct tool for the task. However, since programming is so new, changes so much, and is so abstract, it's a bit more complex than that.

There are many things that affect the right tool, and some of the considerations aren't directly technical:

Strategic change is one criterion. When I was working at a consultancy in 2000, there was a grand switch in language choice. perl was out, java was in. This decision was not made at a technical level, but rather at a strategic one. No matter that we'd have to retrain folks, and throw away a significant portion of our code base. No matter that for the sites we were doing, perl and java were a wash, except for the times that java was overkill. What did matter is that the future was seen to belong to java, and management didn't want to be left behind.

Cost of the solution is another important factor. TCO is a buzzword today, but it's true that you need to look at more than the initial cost of any piece of technology to get an idea of the true price. Linux has an initial cost of $0, but its TCO certainly isn't zero. There's the cost of maintaining it, the cost of paying for administrators, the upgrade cost, the security patch cost, the retraining cost, and the lock in cost. Windows is the same way--and though it's hard to put a number on it, it's clear that the future cost of a windows server is not going to be minimal, as you'll eventually be forced to upgrade or provide support yourself.

The type of problem is another reason to prefer one technology over another. Slashdot is a database backed website. They needed speed (because of the vast number of hits they receive daily), but they didn't need transactions. Hence, mysql was a perfect datastore, because it didn't (at the time) support transactions, but was very fast.

The skill sets of folks available for implementation also should affect the choice. I recently worked at a company with a large number of perl applications that were integral to its operations. But they are slowly replacing all of them, because most of the folks working there don't know perl. And it's not just the skill set of the existing workers, but also the pool of available talent. I've heard great things about Lisp and how efficient Lisp programmers can be, but I'd never implement a business function in Lisp, because it'd be very hard to find someone else to maintain it.

The existing environment is a related influence. If everything in your organization is Windows, then a unix solution, no matter how elegant it may be to one particular problem, is going to be a poor choice. If all your previous applications were written in perl, your first java application is probably going to use perlish data structures and program flow, and is probably going to be a poor java program. I know my first server side java fell into this pit.

Time is also a factor, in a couple of different senses. How quickly are you trying to churn this code out? Do you have time to do some research into existing solutions and best practices, or to build a prototype and then throw it away? If not, then you should probably use a tool/solution that you're familiar with, even if it's not the best solution. Some tools add to productivity and some languages are made for quick prototyping (perl!). How long will the code be around? The answer to that is almost always 'longer than you think,' although in some of the projects I worked on, it was 'only as long as the dot com boom lasts.' You need to think about the supportability of the platform. I'm working with a Paradox client server application right now. As much as I dislike the MS monopoly, I wish it were Access, because there's simply more information out there about Access.

There are many factors to consider when you choose a technology, and the best way to choose is not obviously clear, at least to me. Every single consideration outlined above could be crucial to a given project. Or it might be a no brainer. You can't really know if you've chosen the correct technology until you've built the project out, and then, unless you have a forgiving boss or client, it's probably too late to correct the worst of the mistakes. No wonder so many software projects fail.

Posted by moore at 03:36 PM | Comments (0)

December 07, 2003

Why I hate IDEs

I'm working on a project with Websphere Device Developer, and it constantly reminds me of why I hate integrated development environments (IDEs).

Why do I hate IDEs? Let me count the ways.

1. It's a whole new interface that you have to learn. How do I save files? How do I save a project? How do I move around in an editor? All these questions need to be answered when I move to a new IDE; but they all lead to a more fundamental question: why should I have to relearn how to use my keyboard every time I get a new IDE?

2. One way to do things. Most IDEs have one favored way of doing anything. They may support other means, but only haphazardly. For instance, WSDD supports me editing files on the filesystem, rather than through their editor, but freaks out if I use CVS (it has problems if I use most anything other than commit and update). But sometimes you aren't even allowed the alternate method. I'm trying to get a project that was developed with one CVS repository to move to another CVS repository. WSDD lets you change repository information, but only if the project is still talking to the *same* host with the *same* cvs root. Thanks a lot guys.

3. IDEs are big pieces of code and as the size of a piece of code increases, the stability tends to decrease. In addition, they are being updated a lot more with new features (gotta give the companies some reason to buy, right?). This means that you have to be aware of the environment (how often do I have to save, what work arounds do I have to use) and hence are less focused on what you're really trying to do, which is write code.

4. My biggest gripe with IDEs, however, is that they do stuff I don't understand when I write or compile code. Now, I don't think I should have to understand everything--I have no idea how GCC works in anything other than the most abstract sense. But an IDE is something that I interact with every day, and it's my main interface to the code. When one does something I don't understand to the product of my time, that scares me. I'm not just talking about code generation, although that is scary enough (just because something else generated the code, that doesn't mean that it is right or that you won't have to wade in and maintain it--and maintaining stuff that I write from the ground up is hard enough for me without bringing in machine generated code). I'm also talking about all the meta state that IDEs like to maintain. What files are included where, external libraries, etc. Now, of course, something has to maintain that state, but does it have to be a monolithic program with (probably) poor documentation on that process?

Some people will say that IDEs aren't all that bad and can lead to huge productivity increases. This may be the case during the coding phase, but how much of that is lost when you have to learn a new IDE's set of behaviors or spend time figuring out why your environment is broken by the IDE? Give me simple, reliable, old fashioned and well understood tools any day over the slick, expensive, tools that I don't know and will have to learn each time I use a new one.

Posted by moore at 01:57 PM | Comments (1)

October 24, 2003

Documentation

I love documentation. I like writing it, and when it's well written, I love reading it. There are many types of documentation, and they aren't all the same. Some serves to illustrate what you can do with a product (think the little product manuals that everyone throws away). Some serves to nail down exactly what will be done (in software, between two business parties, etc). But what I'm writing about today is software documentation, especially programmer to programmer documentation.

I love it for a number of reasons. Good documentation cuts down on communication between software engineers, hence increasing scalability. At the company where I used to work, each developer had their own instance of the application server to which we were developing (whether it was ATG Dynamo, Weblogic, or Tomcat). So, every time a new developer rolled on to the project, they had to be set up. Either the programmer had to do it, or someone else did. On a couple of the projects, I was involved in setting up the first one or two, but I quickly tired of that. So, I wrote a step by step document that enabled the incoming programmer to do the setup themselves. This was good for me, because it saved me time, good for the programmer as it gave them a greater understanding of the platform on which they were developing, and good for the project, as if I got hit by a bus, the knowledge of how to set up a server wasn't lost.

Good documentation also has come to my rescue more than once, by saving information that I struggled to find at one time, but didn't use every day. For example, I imported a project I'm working on into Eclipse. It wasn't strenuous, but it wasn't a cakewalk either. So, for other programmers on the project, I wrote down how I did it. Now, a few months later, I couldn't tell you how I did it. Not at all--that knowledge has been forced out of my brain by other more important stuff--like when my parents' birthdays are, what I'm going to bring to my potluck tonight, the name of that game where you roll plastic pigs around and score points based on their position--you know, important stuff. But, should I have a need to do another import, I can! I know the knowledge is stored somewhere safe (in CVS, but that's a different entry).

There are two complaints about programmer to programmer documentation that I'd like to address. One is that it quickly becomes outdated. This is true. It takes an effort to maintain documentation. When I change the procedure or meaning of something, I try to remind myself of the two benefits above. If I can convince myself that I will save more time in the long run by documenting (through not having to explain the changes to others or myself), then I do it. I'm not always successful, I'll admit. And you can see this with product documentation (both closed and open source). Out of date documentation can be very frustrating, and I'm not sure whether it's better dealt with by tossing the documentation or by keeping it and marking it 'OUT OF DATE.'

The other issue is what I call the 'protecting your job' excuse for avoiding documentation. If you don't document what you've done, you probably will have a secure job--especially if it's an important piece of work. But that security is also a chain that binds. In addition to being a subtle gesture of distrust towards your management (always a good idea to torque off your management in this time of uncertainty), it means that when a different, and possibly better, opportunity comes along, you won't be able to take it. Since no one else knows how to do your job (because teaching someone also is a form of documenting) you're stuck in the same position. Not exactly good for your personal growth, eh?

In short, documentation that gets used is good documentation, and well worth the effort to write.

Posted by moore at 10:36 AM | Comments (0)
© Moore Consulting, 2003-2006