

Is transparent access control worth unintelligible error messages?

Partly egged on by Rob and Brian, I just took a long overdue look at container managed security for web applications.

My conclusion: it’s nice, but there is one major flaw that dooms the whole premise. Users expect informative error messages when they ‘sign in,’ and there’s no way to provide them with container managed security.

I was using Tomcat 4.1, which is to say, I was examining the servlet 2.3 specification. (I just looked at the 2.4 specification and can see no amelioration of the above issue.) I also focused on the FORM method of authentication, as that’s the most customizable. (I imagine, for an intranet app obsessed with security, client certificates would be a worthwhile avenue of investigation.) I found the servlet specs to be very helpful in this process.

With the FORM method of authentication, you can customize the appearance of your login and error pages, to some extent. This is a huge win.
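For concreteness, here’s roughly what that looks like in web.xml under the 2.3 spec (the /safe paths are just placeholders, matching the layout I describe below):

<login-config>
  <auth-method>FORM</auth-method>
  <form-login-config>
    <form-login-page>/safe/login.jsp</form-login-page>
    <form-error-page>/safe/error.jsp</form-error-page>
  </form-login-config>
</login-config>

The login page itself must POST to the magic j_security_check action with fields named j_username and j_password; that much is fixed by the spec, and it’s exactly the ceiling on customization that I run into below.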

I really liked the automatic access control–no checking at the beginning of every ActionForm or JSP for any specific attribute. Additionally, you can protect different URL patterns easily, and for most of the applications I write, this is enough. If you need to protect buttons on a page, you can always resort to isUserInRole.
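Protecting a URL pattern is a declarative stanza in web.xml; a sketch (the pattern and role name are made up):

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Admin pages</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
</security-constraint>

And in a JSP or Action, request.isUserInRole("admin") covers the button-level cases.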

Also, you can put the login and error pages, which should never be accessed directly, in a separate /safe directory to which you prohibit all access.

For the times when the user is denied access to a resource, you can create a custom 403 error page, using the error-page directive in web.xml. Unfortunately, you only seem to get three attributes: javax.servlet.error.message, javax.servlet.error.request_uri and javax.servlet.error.status_code, which limits the nature of your response. These were what Tomcat gave me–I don’t think the exact set is part of the spec. Regardless, IE, with default settings, doesn’t display any custom error messages, which makes this a rather moot point for general webapps.
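For reference, the error-page stanza is tiny (the location is whatever page you like):

<error-page>
  <error-code>403</error-code>
  <location>/accessDenied.jsp</location>
</error-page>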

Creating a logout page is fairly easy: just call session.invalidate() (though there seem to be some non-standard methods of doing it as well).
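A minimal logout servlet, as a sketch (the redirect target is arbitrary):

import java.io.IOException;
import javax.servlet.http.*;

public class LogoutServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        HttpSession session = request.getSession(false); // don't create a session just to kill it
        if (session != null) {
            session.invalidate(); // throws away the authenticated session
        }
        response.sendRedirect(request.getContextPath() + "/"); // back to the public pages
    }
}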

However, as mentioned above, I just don’t think users will accept the generic login error messages that you are forced to give. For instance, you can’t tell whether a user didn’t enter a password or entered an incorrect one. You can’t redirect them back to a login page with helpful error messages next to the offending box. These are fundamental issues with authentication–no serious webapp simply throws up its hands when a user doesn’t log in correctly the *first* time.

Separate from user experience, but still related to authentication behavior, you can’t ‘lock out’ users who’ve attempted to log in too many times. Sure, you can keep track of how many times they’ve tried to log in, but the authentication process itself is out of your hands.

Additionally, the fact that you’re tied to a particular implementation for user/role definition means that writing custom authentication code that just accesses an RDBMS is actually more portable.
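To make that concrete, here’s the skeleton of the roll-your-own alternative: a servlet 2.3 Filter guarding your pages, with the actual credential check (against your database, with lockout counting and friendly error messages) living in your own login action. All the names here are hypothetical:

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AuthFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        // the 'user' attribute is set by our own login action after checking the RDBMS,
        // which is also where lockout counts and specific error messages live
        if (request.getSession().getAttribute("user") == null) {
            ((HttpServletResponse) res).sendRedirect(
                request.getContextPath() + "/login.jsp?error=pleaseLogIn");
            return;
        }
        chain.doFilter(req, res);
    }
}

Wire it up with filter and filter-mapping stanzas in web.xml and you get the same transparent protection, plus full control over the login conversation.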

The answer to the question posed in the title of this post, “is transparent access control worth unintelligible error messages?”, is almost always “no.” And folks accuse developers of not having any sense of user interface!

Scripting languages and productivity

Bruce Eckel has some things to say about different languages and productivity. One quote in particular stood out:

“I didn’t have to look that up, or to even think about it [reading the contents of a file using python], because it’s so natural. I always have to look up the way to open files and read lines in Java. I suppose you could argue that Java wasn’t intended to do text processing and I’d agree with you, but unfortunately it seems like Java is mostly used on servers where a very common task is to process text.”

I agree entirely. I come from a perl background (it’s the language I cut my teeth on, which, I suppose, dates me), and unlike some, I’m unabashedly in favor of it. I’ve looked at python briefly, and it does seem to have perl’s flexibility and agility with less ambiguity. When you have to grab a file from the filesystem (or parse a file and stuff it into a database) there’s simply no comparison, and anyone who reaches for Java to solve such problems simply hasn’t experienced the joy of the freedom of scripting languages.
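For comparison, here’s what ‘read the lines of a file’ looks like in java 1.4, the part I have to look up every time; in perl the whole thing is a bare while (<>) loop:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadLines {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("data.txt"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // 'process' each line
        }
        in.close();
    }
}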

The problem with such free form languages arises when you start doing large scale systems. Java, for all its faults and complexity, forces implementation choices to be made at a high level: which framework do we want to use, how do we architect this solution. Perl (note that I’m not talking about python, since I’m a python newbie), on the other hand, is more flexible, and hence allows more latitude. It requires more discipline to code OO perl, or, for that matter, readable perl, than it does to code readable java. (There are different ways to implement objects in perl–see Object Oriented Perl for more information.) By limiting some of the developer’s latitude, you gain some maintainability.

I was trying to think of trivial examples to illustrate this point, but I couldn’t. Perhaps it’s because I’ve been out of touch with perl’s evolving core libraries for so long, or perhaps it’s because all the perl I’ve ever had to maintain has been intensely idiomatic, while all the java I’ve had to maintain has been, though at times obtuse, fairly easy to read. Either way, I just feel that perl is a harder language to maintain than java.

Now, how does this apply to Eckel’s statements? Well, he uses python as his example–stating that you just plain can get more done with python than you can with java. It’s hard to argue with that. But the majority of the expense over a codebase’s lifecycle is not in creation but in maintenance. How do the scripting languages stack up for large scale systems? My experience (which, granted, is primarily applicable to small to medium size systems) indicates that the very flexibility which allows Bruce such amazing productivity hampers further enhancements and bug fixing on the code he writes.

The Grinder

I did some performance testing against a web application that I helped write this weekend. I used The Grinder and was quite happy with the beta version. It lets you write scripts in Jython and uses the quite nice HTTPClient library. The Grinder, like most other performance tools, has an admin interface (the console) and a set of distributed agents that are given tasks and communicate results back to the console via the network. I used the HTTP client, but it looks like you can ‘grind’ anything you can talk to via java, from databases to email servers.

I did run into a few problems. I’m using cygwin on WinXP, and had some difficulties running java from the command line. The fix was to use the cygpath command, like so:

#!/bin/sh
# to start the agent
JAVA=/cygdrive/c/j2sdk1.4.2_03/bin/java
CLASSPATH=`cygpath -p -w /home/Owner/work/grinder/grinder-3.0-beta20/lib/grinder.jar:\
/home/Owner/work/grinder/grinder-3.0-beta20/lib/jakarta-oro-2.0.6.jar:\
/home/Owner/work/grinder/grinder-3.0-beta20/lib/jython.jar`
$JAVA -cp $CLASSPATH net.grinder.Grinder

The client application that I was testing doesn’t use cookies (it’s a J2ME application, and the MIDP spec doesn’t support cookies out of the box). Or rather, it uses cookies, but only to grab the first one that the server sends, store it off, and then pass it back as a query parameter. This type of configuration isn’t The Grinder’s sweet spot, and I had to do a bit of hacking to make sure the appropriate cookie value was sent with the appropriate client request. It would have been nice to use contexts, but The Grinder wraps the HTTPConnection in its own class. If you are simulating use by a browser, though, cookies are apparently handled correctly. One gripe–there’s no javadoc for the main classes available on The Grinder’s website, so you have to grab the source if you want to see how the pieces interact (for example, how net.grinder.plugin.http.HTTPRequest interacts with HTTPClient.HTTPConnection).
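Stripped of the Grinder specifics, the hack amounts to something like this (plain java.net rather than HTTPClient, and the URLs, parameter name, and cookie format are all assumptions):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CookieAsParam {
    public static void main(String[] args) throws Exception {
        // first request: capture the session cookie the server sets,
        // assuming a "JSESSIONID=abc123; Path=/" style header
        HttpURLConnection first =
            (HttpURLConnection) new URL("http://localhost:8080/app/start").openConnection();
        String setCookie = first.getHeaderField("Set-Cookie");
        String sessionId = setCookie.substring(setCookie.indexOf('=') + 1, setCookie.indexOf(';'));
        drain(first.getInputStream());

        // every later request: send the value back as a query parameter,
        // since the MIDP client can't send a Cookie header
        URL next = new URL("http://localhost:8080/app/data?session=" + sessionId);
        drain(next.openConnection().getInputStream());
    }

    private static void drain(InputStream in) throws Exception {
        byte[] buffer = new byte[4096];
        while (in.read(buffer) != -1) { /* discard the body */ }
        in.close();
    }
}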

I also mucked with some of the properties, primarily initialSleepTime. You’ll want to make sure that you read about these properties–I blithely uncommented what was in the sample grinder.properties and ended up with an obscene value for grinder.sleepTimeFactor.
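For the record, here are the two properties in question with sane values (the names are as they appeared in my sample grinder.properties; double-check them against your version):

# milliseconds each worker thread waits before starting
grinder.initialSleepTime=500
# multiplier applied to every recorded sleep time; 1 means 'as recorded'
grinder.sleepTimeFactor=1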

After all the setup, I was able to hammer our server. I discovered two useful things: an error in our logout code, which threw exceptions around 10% of the time, and a misconfigured connection timeout between Apache and Tomcat. Changing the timeout from 0 to 1000 fixed the dreaded SEVERE: All threads are busy, waiting. Please increase maxThreads or check the servlet status error that I was getting. In addition to finding these two bugs, by making some assumptions about how the application will be used, I was able to gimmick up some interesting numbers about supportable users.

I like The Grinder a fair bit. It’s got a nice GUI. It’s still under active development. I’m a bit leery of using beta software (especially open source beta software), but a poll on the homepage convinced me to try the beta. By using this, I was also able to pick up snatches of python, which is a new language to me (I finally got to consult my long unused copy of Learning Python). I considered looking at JMeter, but The Grinder appears to be a bit more recently maintained. It’s no LoadRunner, but then again, it doesn’t pretend to be. All in all, if you’re in the market for a cheap, quick performance tool, The Grinder is worth a look.

Miswanting and web application frameworks

For some time now, I’ve wanted to respond to this post by Kris Thompson, in which he predicts that “Struts will continue to lose industry acceptance as the MVC leader in the J2EE space” in 2004. I believe this is happening already; if you read the blogging community or some of the industry rags, it seems like other alternatives to Struts are being promoted (not least of which is JSF). But there are still tons of Struts applications out there being built every day. There have been over 2000 messages on the struts mailing list for the past year (granted, this number is declining–perhaps because folks are GFTFA [googling for the fcuking answer]).

This article explains why I continue to develop in struts: “A wider range of slightly inferior options, then, can make it harder to settle on one you’re happy with.” There is definitely a wide range of J2EE frameworks. In my case, these alternatives to struts are not inferior because of any technical shortfall, but rather because I have to learn them.

(An aside: I have programmers’ optimism as much as anyone else. But after a few years in the industry, I’ve realized that while I feel I can learn a new technology as quickly as need be, in order to really understand the pitfalls of a framework or API, I have to “build one to throw away.” I really do.)

Please don’t take these statements as a whiny “I don’t want to learn anything new.” That’s not true. But my time is finite, and the project deadlines are always creeping up. And struts works well enough for my problem set.

And that’s why struts is going to be around for a while yet.

IPTraf

Hey, I like to work at the higher levels of the 7 Layer Burrito, the Application, Presentation and Session layers. But every so often, you have to dig a bit deeper. Currently, I’m troubleshooting a ColdFusion application that was converted from a local mysql database to a remote postgresql database. There are quite a few docs about optimizing postgresql, but they focus on query and local database optimization, and I think the issue here was network traffic (based on the load averages of both the local and remote boxes). Anyway, I found this neat tool called IPTraf which gives you real-time monitoring of IP traffic. Pretty nice, but avoid the US mirror of the binary build, since it’s not complete.

Book Review: Hackers

Hackers, by Steven Levy, should be required reading for anyone who programs computers for a living. It covers the period from the late 1950s, when the first hackers wrote code for the TX-0 and every instruction counted, to the early 1980s, when computers fully entered the consumer mainstream and it was marketing rather than hacking that mattered. Levy divides this time into three eras: that of the ‘True Hackers,’ who lived in the AI lab at MIT and spent most of their time on the PDP series; the ‘Hardware Hackers,’ mostly situated in Silicon Valley and responsible for enhancing the Altair and creating the Apple; and the ‘Game Hackers,’ who were also centered in California–expert at getting the most out of computer hardware, they were also the first to make gobs and gobs of money hacking.

The reason everyone who codes should read this book is to gain a sense of history. Because the field changes so quickly, it’s easy to forget that there is a history, and, as Santayana said, “Those who cannot remember the past are condemned to repeat it.” It’s also very humbling, at least for me, to see what kind of shenanigans were undertaken to wring the last bit of performance from hardware that was amazing for its time but would now be junked without a thought. A third takeaway was the transformation that the game industry went through in the early 80s: at first you needed technical brilliance, because the hardware was slow and new techniques needed to be discovered; at some point, though, the hard work was all done, and the business types took over. To me, this corresponds to the 1997-2001 time period, with the web rather than games being the focus.

That leads to one of my beefs: the version I read was written in 1983 and republished, with a new afterword, in 1993. So there’s no mention of the new ‘4th generation’ of hackers, who didn’t have the close-knit communities of the Homebrew Computer Club or the AI lab, but did have a far-flung, global fellowship via email and newsgroups. An updated edition covering them would be a fascinating read.

Beyond the dated nature of the book, Levy omits several developments that I think were fundamental to the hacker mindset. There’s only one mention of Unix in the entire book, and no mention of C; in fact, the only languages he mentions are lisp, basic and assembly. No smalltalk, either. I also feel that he overemphasizes ‘hacking’ as a way that folks viewed and interacted with the world without ever defining it. For instance, he talks about Ken Williams, founder of Sierra Online, ‘hacking’ the company, when it looked to me like simple mismanagement.

For all that, it was a fantastic read. The more you identify with the geeky, single males who were in tune with the computer, the easier and more fun a read it will be, but I still think that everyone who uses a computer could benefit from reading Hackers, because of the increased understanding of the folks that we all depend on to create great software.

Dimensions of Application Configuration

Tom Malaher has written an excellent rant about the state of installing and configuring third party software. Since most programmers are definitively not at the bleeding edge of technology (“we need you to build another order entry system”), we all use third party software and understand some of his frustration. After all, it would be nice to be able to configure such software in any way we deemed fit, rather than having to deal with the dictates of the vendor.

Alas, such flexibility is not often found. Even among open source software, you can find rigidity. Of course, if you take the time, you can fix the problems, but the entire point of third party software is that you can use it ‘out of the box,’ thus saving time.

Tom gave a masterful analysis of the structural components of third party software. Though he repeatedly asks for comments and suggestions, I don’t have any to make regarding his ‘types of data’ delineation. However, I thought it would be worthwhile to examine configuration data more closely. (Eric S Raymond also covers configuration in general here.) In fact, I think there are a number of interesting facets that tie into making configuration data easy to version, store, and separate from other types of data.

1. App specific vs universal format

You can either have one configuration file (or one set of files) shared by every application (a la config.sys and win.ini) or you can have an application specific configuration file for every substantial installed application (a la sendmail.cf and /etc/*).

One set of files makes it easy for the user to know where the application they just installed is configured. It also ensures that all applications use roughly the same type of configuration: the same comment character, the same sectioning logic, the same naming conventions. And it means that the operating system can manage the configuration files, rather than each application having to write its own code to create and manage its configuration.

Having each application manage its own configuration files ensures that the configuration will be tailored to the application’s needs. Some applications might need a hierarchical configuration file, where some sections inherit from others; others can get by with a simple text file of name-value pairs. Another advantage of separate configuration files is that, well, they are separate. This makes it easier to version them, as well as to tweak them–say, to run multiple instances of one application.

2. User vs system

This is closely related to the first differentiation. However, it is distinct, as it’s possible to have a system-wide format for configuration that has specific areas for users, and to have an app specific format that excludes any other application running on a given system. The crucial question is whether each user can have an independent installation of a given application.

It’s hard to argue against allowing each user to have an individual configuration, but in certain situations it may make sense. If, for example, there are parameters whose change may drastically affect the performance of a system (the size of a TCP packet), or which may govern specific limited resources (the allocation of ports), then it may make sense to limit user specific configuration. You may notice that my examples are all drawn from the operating system; the OS may be one application where user specific configuration is not a good idea, since the OS underlies all the other applications.

3. Binary vs text

There are two possible formats in which to store configuration information. One is eminently computer readable, minimizes disk usage, and increases the speed of the application. The other one is superior.

Binary configuration formats are quicker for the computer to read and take up less space on disk. However, they are prone to rot, as only the application that wrote a given file can read and manipulate it. No one else can, and this unfortunately includes the poor programmer who needs to modify some behavior of the application years after it was written.

Text configuration files, on the other hand, parse more slowly and are bulkier. However, they can also be self describing (check out this sample sendmail configuration file for a counterexample). This in itself is a win, because it gives a human being a chance to understand the file. In addition, such configuration files can be manipulated by the bevy of tools that can transmogrify them into something else (a bit of perl, anyone?). They can also be easily version controlled and diffed. Pragmatic programmers like text files (section 3.14) for many of the above reasons.
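As a tiny illustration of the text file camp, java’s built-in java.util.Properties reads the simple name-value flavor directly (the file name and keys here are made up):

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

public class AppConfig {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        FileInputStream in = new FileInputStream("app.properties");
        props.load(in); // plain text: name=value lines, '#' starts a comment
        in.close();
        // the second argument is a default, used when the key is absent
        System.out.println(props.getProperty("db.url", "jdbc:postgresql://localhost/app"));
    }
}

The same file versions, diffs, and greps like any other text.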

It’s clear that there are several different options when it comes to configuring any one particular application. Some of these are related, and some are orthogonal, but all of them deserve consideration when designing any application.

Checking the status of your files, using CVS

When I used CVS a few years ago, I remember a colleague writing a tremendous perl script that you could run from anywhere in the CVS source tree. It would let you know whether you had files that weren’t in CVS, needed to be updated, or were going to be merged. It was quite a nice piece of perl code, which essentially parsed the output of cvs status, and the information it output was quite useful at the end of a long bug fixing or coding session (“hey, what files did I change again?”). However, it also needed to be maintained and documented, as well as explained to users.

The other day, I stumbled on something which works almost as well, but is part of CVS already: cvs -qn up. The q option tells CVS to be quiet, and not chat about all the directories that it sees. The n option tells CVS not to make any changes on the filesystem, but just tell you what changes it would have made. Here’s some sample output:

[moore@localhost guide]$ cvs -qn up
? securityTechniques/NewStuff.rtf
M securityTechniques/InputValidation.rtf
M securityTechniques/SessionManagement.rtf
U securityTechniques/AuthenticationWorkingDraft.doc

M means that the file has been changed locally. ? means that the file exists locally, but not in the repository. U means that the file has changed in the repository, but has not yet been updated locally. For more information on the output of update, look here.

Use this command and never lose track of the files in your CVS tree again.

Jalopy

I like javadoc. Heck, I like documentation. But I hate adding javadoc to my code. It’s tedious, and I can never remember all the tags. I don’t use an IDE, so the formatting gets to me.

After attending a presentation at BJUG about software tools, I investigated jalopy and I liked what I found. Now, jalopy is more than just a javadoc comment inserter, but javadoc insertion was my primary use of the tool. It may be piss poor for code formatting and whatnot, but it was pretty good at inserting javadoc. I was using the ant plug-in, and the instructions were simple and straightforward. It didn’t blow away any existing comments, and it didn’t munge any files, once I configured it correctly. And there are, make no mistake, lots of configuration options.

Jalopy has a slick Swing interface for setting all these configuration options, and you can export your configuration to an XML file which can be referenced by others. This, along with the ant integration, makes it a good choice for making sure that all code checked in by a team has similar code formatting.
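The ant integration boils down to a taskdef plus a fileset; something like the following, though I’m writing the task classname and the convention attribute from memory, so treat it as a sketch and check the jalopy docs:

<taskdef name="jalopy"
         classname="de.hunsicker.jalopy.plugin.ant.AntPlugin"
         classpathref="jalopy.classpath"/>

<target name="format">
  <jalopy convention="team-convention.xml">
    <fileset dir="src" includes="**/*.java"/>
  </jalopy>
</target>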

However, I do have a few minor quibbles with this tool.

1. The default configuration of javadoc is busted. When you run it, it javadocs methods and classes just fine, but any fields are marked with “DOCUMENT ME!” when they should be commented out: “/** DOCUMENT ME! */”. This means that, with the default configuration, you can’t even run the formatter twice, since jalopy itself chokes on the uncommented “DOCUMENT ME!”.

2. The configuration file is not documented anywhere that I could find. I looked long and hard on the Internet, and only found one example of a jalopy configuration file here. And this is apparently just the default options exported to a file. I’ve put up a sample configuration file here which fixes problem #1. (This configuration is only for javadoc; it accepts all other defaults.)

3. The zip file that you download isn’t in its own directory. This means that when you unsuspectingly unzip it, it spews all over your current directory.

None of these are show stoppers, that’s for sure. If you’re looking for a free, open source java code formatting tool, jalopy is worth a close look.

Webhacking with CVS

In the latest edition of 2600, there’s an article about webhacking with CVS. The basic premise of this article is that if you do a cvs checkout of your static html site to your webroot, you let folks with inquisitive minds and an understanding of CVS know more than you intended about your IT infrastructure. Read the article for more information.

However, it’s easy enough to defeat. The answer is to use the cvs export command, which generates exactly the same files as a checkout, except without the CVS directories. Rolling out updates to a site via this command means, of course, that any changes you make to files in the web directory will be blown away–but it could be argued that it’s a good thing to force everything to go through CVS. It also means that you can’t make incremental updates as easily; it’s still possible, but you have to check out the source to some other place and copy the files over manually. Another option, which lets you do updates more easily, is rsync --cvs-exclude, which copies a tree while skipping the CVS metadata.
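Both options in one place (the paths and module name are invented):

#!/bin/sh
# export a clean copy (no CVS/ directories) of the site as of this moment
cvs -d /usr/local/cvsroot export -D now -d /var/www/html site

# or mirror an existing working copy, skipping the CVS metadata
rsync -a --cvs-exclude /home/build/site/ /var/www/html/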

Using either of these solutions makes it a bit tougher to move content to the website. But it makes things a whole lot more secure.