Skip to content

The label tag

An HTML tip for you: the label tag is a boon to usability. If you it should take you to

Small touches like this make it much easier to build forms that are forgiving in terms of user input–you can make clicking a checkboxes easy for the mouse impaired.

Doing this kind of control extension used to take a bit of javascript, but now the label tag makes it easy. Looks like <tag> is supported in modern browsers (Moz 5, IE 5).

Reliable HTTP Draft

Here’s an interesting ‘pre-draft’ of HTTPLR, an ‘application protocol for reliable transmission of messages using HTTP’ (via the author’s blog). It doesn’t require throwing away already built out infrastructure. There are, however, a few wrinkes:

This draft does require support of the PUT method, which is not available out of the box on Apache, as well as the DELETE method, which again requires webdav to work with Apache.

It uses the DELETE method (rather than the POST) to communicate client knowledge of message transfer (section 8.3, 9.3). I’m not sure how I feel about that, as it seems to be mis-using that method.

Other than that, I like the idea. It seems really well thought out, but it would be nice to see some sample code.

Setting the content encoding for HTML message parts with Javamail

I spent an hour chasing down the solution to this issue, so I figured I’d post it (or at least what worked for me). Basically, I have a multi-part message that can have different content encodings for each text part. I want to send this message via javamail. Now, there’s support for setting content as type ‘text/plain’ with a different character set, but if you want to add a part that is a different subtype of text to your message, there is no convenience method. However, this mail message had an example of how to specify html content and a character set:

MimeBodyPart htmltext = new MimeBodyPart();
htmltext.setContent(someDanishChars, "text/html; charset=\"ISO-8859-1\"");

(The author had some issues with this method in different app servers; it works fine for me in a stand alone java program.)

These additional parameters to the ‘Content-Type’ header are laid out, for text documents, in section 4.1 of RFC 2046. Here’s a collection of helpful email related RFCs. Additionally, section 5.1 of RFC 2045 outlines the way to add parameters and gives examples of the charset parameters.

RIFLE: User Centric Information Flow Security

I went to a talk yesterday about RIFLE: An Architectural Framework for User-Centric Information-Flow Security, one of a series of University of Colorado CS Colloquia. “User-Centric Information-Flow Security” (UCIFS) is a different way of enforcing security than almost anything I’ve encountered before. Basically, instead of assigning permissions to users and actions, a la JAAS, you assign levels of security to data. This security level is then tracked throughout the application, and at various endpoints (I/O activity, network transmission) a policy is enforced. Therefore, you could tag a SSN with a high security level, and any variables and decisions based on the SSN would be tagged similarly, since security levels propagate. Then, when some piece of malware tries to send your SSN (or anything related to it) off to Russia, the system intervenes.

I say UCIFS is a “different way of enforcing security than almost anything I’ve encountered” above because there’s one thing that I’ve seen that does assign a security level to some kinds of data: perl’s taint mode. I’ve used taint mode in perl cgi scripts before, and it’s a good way to make sure that untrusted data is not used in dangerous situations without the programmer’s explicit knowledge.

However, UCIFS aims a bit higher. An ideal system tracks data and its levels through all algorithms, doesn’t leak data, requires no effort from a programmer and enforces policies dynamically. According to the presenter, it turns out that no system can have zero data leakage. You can always signal the state of a variable in some way, even if it’s as crude as ceasing the operation of the program–these are called ‘covert channels’. RIFLE meets the other criteria, apparently, and does so by operating on binaries and tracking the data via extra registers (I’m on thin ice here, since I’m by no means a systems programmer).

It was an interesting talk because tracking security based on data, and giving users choices for data security, sure seems a better way of dealing with security issues than the program level trust that firewalls and ACLs now provide. Not a whole lot of real world applicability just yet (creating policies was barely touched upon, for one thing), but perhaps in the future. For more, please check out the Liberty Research Group’s website.

“cvs checkout: failed to create lock directory” solution

For those of us still using CVS, rather than the highly acclaimed subversion, I wanted to outline a solution to a common problem I’ve often seen:

One user creates a cvs module (named, for example, project) and checks in a number of files and directories. Then another developer tries to check out the module and sees this error. (Here’s another explanation of the solution.)

: cvs checkout: failed to create lock directory for
`/usr/local/cvsrepo/project'
(/usr/local/cvsrepo/project/#cvs.lock): Permission denied
: cvs checkout: failed to obtain dir lock in repository
`/usr/local/cvsrepo/project'
: cvs [checkout aborted]: read lock failed - giving up

If you go to /usr/local/cvsrepo/project, and run an ls -l, you’ll see that the permissions look like:

...
drwxrwxr-x 2 user group 4096 Feb 16 09:40 bin
...

This error message comes from the fact that the second user is not a member of group group. The best way to solve this is to create a second group, perhaps called cvs, and assign both users to that group.

Then, you want to make sure that all the files have the correct group bit set:
chown -R :cvs /usr/local/cvsrepo/project

And, you want to make sure that any new directories (and files) added use the cvs group, rather than the group group:
chmod -R g+ws /usr/local/cvsrepo/project

Your final permissions should look like:
...
drwxrwsr-x 2 user cvs 4096 Feb 16 09:40 bin
...

Now the second user and any other developers should be able to check out the code so safely stored in cvs.

Concurrency, object orientation, and getting software done

The Free Lunch Is Over, via Random Thoughts, is a fascinating look at where CPUs are headed, and what effect that has on software development. The subtitle: “The biggest sea change in software development since the OO revolution is knocking at the door, and its name is Concurrency” drives home the fact that the author believes that concurrency will be the next big thing in software development.

I was struggling to write a relevant post about this topic, becuase I feel like, at least in the companies I’ve been with, there just wasn’t that much object oriented software being written. I’m working on a project right now that has a minimum of object orientation, even though it is written in java. I’m definitely more familiar with small scale projects and web applications, but I know there are plenty of applications out there that are written and working well without the benefits of objects.

Or, should I say, that are written and working well without the benefits of objects directly. Servers, operating systems and general purpose platforms are a different beast and require a different skill set. And by building on top of such platforms, normal programmers don’t have to understand the intricacies of object oriented development–they can benefit without that investment. Of course, they’d probably benefit more if they understood things and there may come a time in their development that they’ll have to. However, the short term gain of being able to continue on their productive plateau may be worth postponing the learning process (which will take them to a higher plateau at a short term cost).

In the same way, I think that multi-threading won’t be required of normal busines developers. I was struggling with this until the latest NTK came out, with this to say:

CPUs aren’t getting faster! Multi-core is the future! Which means we’ll all need to learn concurrent, multi-threaded programming, or else our software is never going to get faster again! That’s what Herb Sutter’s future shock article in Dr. Dobbs says (below). But before you start re-learning APL, here’s a daring thought: maybe programmers are just too *stupid* to write multi-threaded software (not you of course: that guy behind you). And maybe instead we’ll see more *background* processes springing up – filling our spare CPUs with their own weird, low i/o calculations. Guessing wildly, we think background – or remote – processes are going to be the new foreground.

From the Jan 21 edition, which should be online in a day or so. Those Brits certainly have a way with words.

If you’re a typical programmer, let the brilliant programmers who are responsible for operating systems, virtual machines and application servers figure out how to best use the new speed of concurrent processor execution, and focus on process and understanding business needsand making sure they’re met by your software. Or, if you have a need for speed, look at precalculation rather than multi threading.

Expresso and dbobjects and ampersands

If you’re ever pulling a url from a database via an Expresso dbobject (Expresso’s O-R layer) and you find yourself with mysterious &amp; characters being inserted, you may want to visit this thread and the FilterManager javadoc. Long story short, add this line:

setStringFilter("fieldname", FilterManager.RAW_FILTER);

to any fields of the dbobject that you don’t want ‘made safe’ by the default filter (which screens out dangerous HTML characters). Tested on Expresso 5.5.

(I’m omitting the rant about changing data pulled from the database without making it loud and clear that default behavior is to filter certain characters. But it’s a Bad Idea.)

PL/SQL redux

I’ve written about PL/SQL before but recently have spent a significant amount of time writing stored procedures. Unlike some of my previous experiences, this time PL/SQL seemed like a great fit for the problem set, which was two fold.

In the first case, some of the stored procedures push data from stage tables, which are loaded via ODBC or SQL*Loader, into tables which the application accesses. PL/SQL is great for this type of task because cursors, especially when used with parameters, make row driven data transformations a pleasure, and fast as well. Handling deltas via updates instead of inserts was alright, and the fact is that PL/SQL code that manipulates data can be positively terse when compared to JDBC PreparedStatements and at least as fast. In addition, these stored procedures can be easily called over an ODBC connection, giving the client the capability to load new data to the stage tables and then call the stored procedure to update or insert the data as needed. (You could definitely do the same thing with a servlet and have the client hit a URL, but that’s a bit less self-contained.)

PL/SQL was also used to implement complex logic that was likely to change. Why do that in PL/SQL in the database rather than in java in the application server? Well, changes to PL/SQL programs don’t require a server restart, which can be quite an issue when a server needs high levels of uptime. Instead, you just recompile the PL/SQL. Sure, you can use the reloadable attribute of the context to achieve the same thing (if you’re using Tomcat) but recompiling PL/SQL doesn’t have the same performance hit as monitoring class files for changes.

Use the right tool for the job. Even if PL/SQL ties your application to Oracle, a judicious use of this language can have significant benefits.

Under Pressure

In almost every software project of any length that I’ve participated in, the last few weeks before a release are tense and pressure filled. (Please note that I write custom business software; that’s what these conclusions are drawn from.) Being in the middle of a project release myself, I thought I’d muse on the causes of this pressure. Why are the last few weeks before the deadline so tense? Because software is, above all else, about the details. Joel puts it well in his interview with Salon.com:

The fundamental problem that you’re trying to solve here is that humans think of things in vague, mushy terms. In order to visualize something, they don’t have to actually visualize every part of it. Whereas the programmer, in order to actually implement that thing, to create it, needs to have every part specified.

What happens on projects over a certain level of complexity is this specification is pushed off, often until a decision must be made, or even past that point. This occurs for a number of reasons: programmers want to start coding, the client doesn’t have the information at the moment the issue is raised and it is never revisited, the answers to certain questions (or the questions themselves) are dependent on answers to other questions. In the beginning of a project, big questions are decided, but the small niggling details, which the compiler most certainly needs to know about, are, perhaps noted, but not dealt with.

Why not specify how the system will work before building any of it, to every exacting detail? Some software processes try to do this, but in general, unless the problem is very well understood (in which case the client will almost always be better served by off-the-shelf software), the requirements will change as the project progresses. (Incidentally, if they don’t, the project is a great candidate for offshoring.) The client will better understand the problem and technology and the software team will likewise better understand the problem and domain space. So specifying the entire system up front will likely leave the customers unhappy or the system unused.

Because business software is actually business process crystallization, it matters very much that things are correct. Because business software is implemented by a group of people with specialized skills and a different focus from the users, at best, or no understanding of the business, at worst, software delivery is unlike other deadline driven industries in that changes are expensive and mysterious. I think every software engineer has an example of a simple change request that turned out to have massive implications throughout the system, and this effect is mysterious to normal users.

What matters is not why the details crop up, but that they do. So, the last few weeks of every project consists of mentally running around and nailing down every detail. I expect this is true of every job with fixed deadlines (ever been around a retail store the day before Thanksgiving?). Every issue should be resolved or acknowledged when the software is released, and while some facets are less important than others, no detail is unimportant.

sqlldr

I’ve been writing SQL*Loader scripts to load a fair bit of data into Oracle. I have a set of load tables with minimal constraints on them, into which SQL*Loader pushes the rows. Then I have written some PL/SQL which pulls from the load tables to the real database.

This architecture was chosen because the PL/SQL procedures can be written to allow incremental as well as full data loads. In the incremental case, it’s conceivable there there’d be a different way of pushing data over to the load tables (via ODBC or JMS, for example). In addition, the load tables can be denormalized, and you can put enough intelligence in the PL/SQL to turn your data structures into something at which a DBA won’t cringe.

Anyway, I thought I’d share a few tips, gleaned through the process. I’m definitely no SQL*Loader guru, but here are some useful links: the sqlldr FAQ, full of good information and recently updated, the Oracle Utilities page which does a great job of explaining all the options of SQL*Loader, and this case study which outlines internationalization with sqlldr. All very useful.

Two other tips: If you are loading delimited character data that is longer that 255 characters, you need to specify the length in your control file (for example, declaring it in the control file as char(4000)), or else you’ll get an aggravating error message warning that the data you’re loading is longer than the column in which you’re trying to load it. I spent some time looking very carefully at the load table trying to see what I was missing before I googled and found out that char fields do have default sizes in sqlldr control files.

And the bindsize and rows parameters are related, in terms of the amount of data that sqlldr can push into a table before it commits. You can make rows very very big, but if bindsize is too small (it defaults to 64k, apparently) the commits will happen sooner than they need to. For more explanation and other perforamance tips, see this page.

Overall, I’ve been very happy with how easy it is to load a fair bit of data, quickly (both in terms of load time and in development time) using sqlldr.