Two Expresso Good Practices

I’ve been working with Expresso 5.5 for the last couple of months. Two things I’ve learned, though they certainly don’t qualify as ‘best practices’:

1. Expresso provides a nice way to manipulate the model by setting certain criteria and then retrieving all the rows that match those criteria. However, this can be abused.

For example, here we look up all the Indo-European languages:

MyLangDBObject lookup = new MyLangDBObject();
lookup.setField("FAMILY", "Indo-European");
for (Iterator i = lookup.searchAndRetrieveList().iterator(); i.hasNext(); ) {
   MyLangDBObject instance = (MyLangDBObject) i.next();
   // process instance
}

(For more, see Chapter 6 of the Expresso Developer’s Guide.)

This is all well and good, as long as there are not a significant number of objects matching your criteria. Because each object is retrieved from the database and plunked into an ArrayList, this method can use a tremendous amount of memory. A more memory-efficient way of retrieving and processing a large number of rows is:

MyLangDBObject lookup = new MyLangDBObject();
lookup.setField("FAMILY", "Indo-European");
lookup.search();
Object[] keys = lookup.getFoundKeys();
MyLangDBObject instance = new MyLangDBObject();
for (int i = 0; i < keys.length; i++) {
   // set the key field (the field name here is schema specific),
   // then retrieve and process one row at a time
   instance.clear();
   instance.setField("LANG_ID", keys[i].toString());
   instance.retrieve();
   // process instance
}

The above code still creates a large list, but each entry in that list is much smaller. I'm not sure how objects with multi-valued keys are handled. I just looked in the Expresso 5.5 DBObject class, and it appears that multiple key values are concatenated with '/' and returned as a single string; beware, as that's not documented anywhere and I haven't tested it.

2. When you're doing complicated filtering, DBObjects let you add a number of 'and' clauses. For example, this code finds all the dead Indo-European languages from Asia:

MyLangDBObject lookup = new MyLangDBObject();
lookup.setField("FAMILY", "Indo-European");
lookup.setField("GEOGRAPHIC_AREA", "Asia");lookup.setField("TYPE", "dead");
for (Iterator i = lookup.searchAndRetrieveList().iterator(); i.hasNext(); ) {
   MyLangDBObject instance = (MyLangDBObject) i.next();
   // process instance
}

This approach works well for quite a number of cases. However, if you want to do anything more complicated, such as date ranges or 'or' rather than 'and' clauses, you have three options.

* You can call setCustomWhereClause(). This allows you to escape the abstraction and essentially drops you down to SQL. All well and good; this should probably be your primary means of doing more complicated filtering. (Unfortunately, in Expresso 5.5, JoinedDataObjects, an Expresso construct which joins multiple tables together and presents a unified view thereof, do not support the setCustomWhereClause method. Apparently Expresso 5.6 has added such support.) This code finds all the languages that are dead or are Indo-European:

MyLangDBObject lookup = new MyLangDBObject();
lookup.setCustomWhereClause(
   "FAMILY = 'Indo-European' OR TYPE = 'dead'");
for (Iterator i = lookup.searchAndRetrieveList().iterator(); i.hasNext(); ) {
   MyLangDBObject instance = (MyLangDBObject) i.next();
   // process instance
}

* You can pull back the data and filter it in the middleware server. This is a bad idea: not only are you using Java where SQL would serve better, you're also pulling back unneeded data. However, it's an option that will always work, though it may be slow. For example, if setCustomWhereClause were not available, you could replicate the above query with this code:

MyLangDBObject lookup = new MyLangDBObject();
for (Iterator i = lookup.searchAndRetrieveList().iterator(); i.hasNext(); ) {
   MyLangDBObject instance = (MyLangDBObject) i.next();
   if (! ("Indo-European".equals(instance.getField("FAMILY"))||
      "dead".equals(instance.getField("TYPE"))
      ) ) {
         continue;
   }
   // process instance
}

* You can create a view and point the database object at the view instead of at the underlying tables. This is probably the cleanest, fastest method for a complicated where clause with read only data, since no unneeded data is returned by the database. This works for JoinedDataObjects as well. If you are making updates, however, views may or may not work.
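For example, the 'dead or Indo-European' query could be baked into a view along these lines (a sketch: the table name LANGUAGE and the view name are made up, and DDL details vary by database). You would then point the DBObject at DEAD_OR_INDO_EUROPEAN instead of the underlying table:

CREATE VIEW DEAD_OR_INDO_EUROPEAN AS
SELECT * FROM LANGUAGE
WHERE FAMILY = 'Indo-European' OR TYPE = 'dead';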

Reliable HTTP Draft

Here’s an interesting ‘pre-draft’ of HTTPLR, an ‘application protocol for reliable transmission of messages using HTTP’ (via the author’s blog). It doesn’t require throwing away already-built-out infrastructure. There are, however, a few wrinkles:

This draft requires support for the PUT method, which is not available out of the box on Apache, as well as the DELETE method; both need WebDAV (or equivalent configuration) to work with Apache.
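For what it's worth, enabling those methods with mod_dav looks roughly like this in Apache 2's configuration (a sketch; it assumes mod_dav and mod_dav_fs are loaded, and the path is made up):

DavLockDB /var/lib/dav/lockdb
<Location /httplr>
   Dav On
</Location>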

It uses the DELETE method (rather than POST) to communicate client knowledge of message transfer (sections 8.3, 9.3). I’m not sure how I feel about that, as it seems to be misusing that method.

Other than that, I like the idea. It seems really well thought out, but it would be nice to see some sample code.

Setting the content encoding for HTML message parts with JavaMail

I spent an hour chasing down the solution to this issue, so I figured I’d post it (or at least what worked for me). Basically, I have a multipart message that can have a different content encoding for each text part, and I want to send this message via JavaMail. There’s support for setting content of type ‘text/plain’ with a different character set, but if you want to add a part that is a different subtype of text to your message, there is no convenience method. However, this mail message had an example of how to specify HTML content and a character set:

MimeBodyPart htmltext = new MimeBodyPart();
htmltext.setContent(someDanishChars, "text/html; charset=\"ISO-8859-1\"");

(The author had some issues with this method in different app servers; it works fine for me in a stand alone java program.)
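To put that line in context, here's a minimal sketch of a complete multipart message. The addresses, SMTP host, and content are placeholders, and I'm assuming a mail server listening on localhost:

import java.util.Properties;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

public class HtmlCharsetMail {
   public static void main(String[] args) throws Exception {
      Properties props = new Properties();
      props.put("mail.smtp.host", "localhost");
      Session session = Session.getDefaultInstance(props, null);

      MimeMessage msg = new MimeMessage(session);
      msg.setFrom(new InternetAddress("sender@example.com"));
      msg.setRecipient(Message.RecipientType.TO,
         new InternetAddress("recipient@example.com"));
      msg.setSubject("charset test");

      // plain text part, default charset
      MimeBodyPart plaintext = new MimeBodyPart();
      plaintext.setText("plain text version");

      // HTML part with an explicit charset, set via the full Content-Type string
      MimeBodyPart htmltext = new MimeBodyPart();
      String someDanishChars = "<html><body>\u00e6\u00f8\u00e5</body></html>"; // ae, o-slash, aa
      htmltext.setContent(someDanishChars, "text/html; charset=\"ISO-8859-1\"");

      // 'alternative' tells mail clients the parts are renderings of the same content
      MimeMultipart multipart = new MimeMultipart("alternative");
      multipart.addBodyPart(plaintext);
      multipart.addBodyPart(htmltext);
      msg.setContent(multipart);

      Transport.send(msg);
   }
}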

These additional parameters to the ‘Content-Type’ header are laid out, for text documents, in section 4.1 of RFC 2046. Here’s a collection of helpful email-related RFCs. Additionally, section 5.1 of RFC 2045 outlines the way to add parameters and gives examples of the charset parameter.

Runtime log4j configuration

So, I’ve spent the last day or so trying to track down how to configure log4j at runtime (log4j 1.2.8). Some things are easy: setting the level of the root logger is as simple as LogManager.getRootLogger().setLevel(Level.DEBUG). However, if you want to do more complicated things at runtime based on inputs other than the log4j.{properties,xml} file, things get a bit kludgy. For example, I wanted to set up a set of appenders with sane defaults. Then, if values were present in a configuration file, I wanted to update those appenders with different configuration values and change the root logger’s behavior.

The easiest way I could find was to load the properties file and modify the resulting Properties object, as shown below:

package test;

import org.apache.log4j.LogManager;
import org.apache.log4j.PropertyConfigurator;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import java.util.Properties;
import java.io.IOException;
import java.io.InputStream;

public class Test {
   private Log log = LogFactory.getLog(Test.class);

   Test() {
      log.debug("test1");
      switchAppenders();
      log.debug("test2");
   }

   public static void main(String args[]) {
      Test t = new Test();
   }

   private void switchAppenders() {
      // start from the log4j.properties on the classpath...
      Properties props = new Properties();
      try {
         InputStream configStream = getClass().getResourceAsStream("/log4j.properties");
         props.load(configStream);
         configStream.close();
      } catch (IOException e) {
         System.out.println("Error: cannot load configuration file");
      }
      // ...then override individual properties at runtime
      props.setProperty("log4j.rootLogger", "DEBUG, file");
      props.setProperty("log4j.appender.file.File", "out.log");
      // throw away the current configuration and apply the modified one
      LogManager.resetConfiguration();
      PropertyConfigurator.configure(props);
   }
}

This code is executed via this command, making sure that log4j.properties is present in the classpath:
java -classpath .:log4j-1.2.8.jar:commons-logging.jar test.Test
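For reference, here's the sort of baseline log4j.properties the code above expects on the classpath. The appender name 'file' must match the properties being overridden; the layout is just a sane default I'd suggest:

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=default.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d %-5p %c - %m%n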

This is quite a kludge, but I couldn’t find anything better out there. It has the obvious drawbacks that the changes you make to the log4j configuration aren’t persisted, they can’t easily happen in more than one place, and any change to the appender names breaks a lot of things, but at least it works.

RIFLE: User Centric Information Flow Security

I went to a talk yesterday about RIFLE: An Architectural Framework for User-Centric Information-Flow Security, one of a series of University of Colorado CS Colloquia. “User-Centric Information-Flow Security” (UCIFS) is a different way of enforcing security than almost anything I’ve encountered before. Basically, instead of assigning permissions to users and actions, a la JAAS, you assign levels of security to data. This security level is then tracked throughout the application, and at various endpoints (I/O activity, network transmission) a policy is enforced. Therefore, you could tag an SSN with a high security level, and any variables and decisions based on the SSN would be tagged similarly, since security levels propagate. Then, when some piece of malware tries to send your SSN (or anything derived from it) off to Russia, the system intervenes.

I say UCIFS is a “different way of enforcing security than almost anything I’ve encountered” above because there’s one thing I’ve seen that does assign a security level to some kinds of data: Perl’s taint mode. I’ve used taint mode in Perl CGI scripts before, and it’s a good way to make sure that untrusted data is not used in dangerous situations without the programmer’s explicit knowledge.

However, UCIFS aims a bit higher. An ideal system tracks data and its levels through all algorithms, doesn’t leak data, requires no effort from the programmer, and enforces policies dynamically. According to the presenter, it turns out that no system can have zero data leakage: you can always signal the state of a variable in some way, even if it’s as crude as ceasing the operation of the program; such signals are called ‘covert channels’. RIFLE meets the other criteria, apparently, and does so by operating on binaries and tracking the data via extra registers (I’m on thin ice here, since I’m by no means a systems programmer).

It was an interesting talk; tracking security based on data, and giving users choices about data security, seems like a better way of dealing with security issues than the program-level trust that firewalls and ACLs now provide. There’s not a whole lot of real-world applicability just yet (creating policies was barely touched upon, for one thing), but perhaps in the future. For more, please check out the Liberty Research Group’s website.

“cvs checkout: failed to create lock directory” solution

For those of us still using CVS, rather than the highly acclaimed Subversion, I wanted to outline a solution to a common problem I’ve often seen:

One user creates a cvs module (named, for example, project) and checks in a number of files and directories. Then another developer tries to check out the module and sees this error. (Here’s another explanation of the solution.)

cvs checkout: failed to create lock directory for
`/usr/local/cvsrepo/project'
(/usr/local/cvsrepo/project/#cvs.lock): Permission denied
cvs checkout: failed to obtain dir lock in repository
`/usr/local/cvsrepo/project'
cvs [checkout aborted]: read lock failed - giving up

If you go to /usr/local/cvsrepo/project and run ls -l, you’ll see permissions that look like this:

...
drwxrwxr-x 2 user group 4096 Feb 16 09:40 bin
...

This error arises because the second user is not a member of the group named 'group'. The best way to solve it is to create a second group, perhaps called 'cvs', and assign both users to that group.
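On most Linux systems, creating the group and adding both users looks something like this (the user names are examples):

groupadd cvs
gpasswd -a user cvs
gpasswd -a seconduser cvs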

Then, you want to make sure that all the files are owned by the new group:
chown -R :cvs /usr/local/cvsrepo/project

And, you want to make sure that any new directories (and files) are created with the cvs group rather than the group group; setting the setgid bit (along with group write) on the directories does that:
chmod -R g+ws /usr/local/cvsrepo/project

Your final permissions should look like:
...
drwxrwsr-x 2 user cvs 4096 Feb 16 09:40 bin
...

Now the second user and any other developers should be able to check out the code so safely stored in cvs.

Article on XmlHttpRequest

XmlHttpRequest popped up on my radar a few months ago when Matt covered it. Back then, everyone and their brother was talking about Google Suggest. I haven’t found time to play with it yet, but I like the idea of asynchronous URL requests. There’s lots of power there, not least the ability to make pull-down lists dynamic without shipping everything to the browser or submitting a form via JavaScript.

I found a great tutorial on XmlHttpRequest by Drew McLellan, who also has an interesting blog. Browser-based apps are getting better and better UIs, as Rands notices.

The Economist on Blogging

That bastion of free-trade economics and pithy British humor has an article about corporate blogging: Face Value. It focuses on Scoble and Microsoft, but also mentions other bloggers, including Jonathan Schwartz.

There’s definitely a fine line between blogging and revealing company secrets. Mark Jen certainly found that out. The quick, informal, personal nature of blogging, combined with its worldwide reach and Google’s cache, means that it poses a new challenge to corporations that want to be ‘on message’.

It also exposes a new risk for employees and contractors. I blog about all kinds of technologies, including some that I’m paid to use. At what point does the knowledge I gain from a client’s project become mine, so that I can post about it? Or does it ever? (Obviously, trade secrets are off limits, but if I discover a better way to use Spring or a solution for a common struts exception, where’s the line?) Those required NDAs can be quite chilling to freedom of expression and I have at least one friend who has essentially stopped blogging due to the precarious nature of his work.

JMS at the most recent BJUG

I went to BJUG last Thursday and enjoyed the informative talk about JMS by Chris Huston. It started out as a bit of a tutorial, with the typical “here’s a messaging system, here are the six types of messages”, etc. When he was doing the tutorial bit, I thought it was a bit simple for a main talk, but it got better as the speaker continued. It was clear from his comments that he was deeply knowledgeable about the subject or, at the least, had been enmeshed in JMS for a while. This wasn’t just an “I downloaded an open source JMS server and ran through the Sun tutorial” talk, and I appreciated that.

I had a couple of takeaways. One is that managing messaging with transactions is something you’re always going to want to do, but it’s fraught with difficulty, since you’ll then have two transactional systems. And we all know what that means: you’ll have to buy this book. It also means that, in a real system, you’ll never want to use local transactions; you’ll want the transactions to be managed by a global transaction service, typically your application server.

Recovery of such a transactional messaging service was touched upon. If you have two different transactional systems, and failure occurs, recovering can be a real issue. Chris recommended, if at all possible, having the JMS provider and your data layer live in the same database, as then you can use the backup tools and ensure the two systems are in a consistent state.

One of the most interesting parts of the evening was a question asked by the audience. A fellow asked what scenarios JMS was useful for, and Chris said it was typically used in two ways:

1. Clustering/failover. You can set up a large number of machines, and since all they’re receiving is messages with no context, it’s much easier to fail over to another machine. There’s no state to transfer.

I’ve seen this in the Jetspeed 1.5 project, where messaging is used to allow clustering.

2. Handling a large amount of data while increasing the responsiveness of the system. Since messages can be placed into queues with no need for an immediate response, a message source can create a tremendous number of messages very quickly. These messages may take quite a bit of time to process, which rules out a synchronous solution. JMS (and messaging solutions in general) allow for this kind of buffering, or hysteresis.

I’ve seen this in a client’s system, where they send out a tremendous number of emails and want to ensure they can track the status of each one. It’s too slow to write the status to the database for each email, but sending a message to a queue is quick enough. On the receiving end, there’s some processing and status is written to the database. The performance is acceptable and as long as the JMS provider doesn’t crash or run out of memory, no messages are lost.
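As a rough sketch of that pattern (not the client's actual code), here's what enqueueing a status message looks like with the JMS 1.1 API. The JNDI names and the message text are placeholders; they depend entirely on how your provider is configured:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class StatusSender {
   public static void main(String[] args) throws Exception {
      // look up the provider-specific factory and queue (placeholder JNDI names)
      InitialContext ctx = new InitialContext();
      ConnectionFactory factory =
         (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
      Queue queue = (Queue) ctx.lookup("jms/EmailStatusQueue");

      Connection conn = factory.createConnection();
      try {
         // non-transacted session with automatic acknowledgement
         Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
         MessageProducer producer = session.createProducer(queue);

         // enqueueing is fast; a separate consumer dequeues these
         // messages and writes the status to the database
         TextMessage msg = session.createTextMessage("message-id=12345 status=sent");
         producer.send(msg);
      } finally {
         conn.close();
      }
   }
}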

The only scenario I thought of that Chris didn’t mention is one I haven’t seen myself: I’ve heard that many legacy systems have some kind of messaging interface, so JMS might be an easy way (again, no context required) to integrate with such a system.

It was an interesting talk, and reminded me why I need to go to more BJUGs.