Skip to content

Quartz and java job scheduling

I’m working on a stand alone java application that needs some fairly sophisticated scheduling capabilities, more than java.util.Timer can provide. Normally, I’d reach for trusty old cron, but in this case, it’s a java program that needs to be run on both unix and windows with a minimum of fuss.

Quartz to the rescue. This open source java package lets you schedule a myriad of executable objects in many different ways. There are many different ways to use Quartz; there’s a nice tutorial here, and the Quartz javadoc is pretty up to date.

All I’m using it for is a cross-platform cron replacement, but it does seem to have large number of other features. The one that I’m not using that seems the most useful is the ability to differentiate between activities (a Job) to be scheduled, and the events, be they time or otherwise related, that should cause those activities to be executed (a Trigger). Nice orthogonality, but for my purposes, overkill. However, I can see the usefulness of this feature.

The coolest feature that I am using is JobExecutionContext.getScheduler() which allows any job running to access the scheduler in which they are running. They can delete themselves, verify that other jobs are working, and even shutdown the scheduler.

If you have to do scheduling in java, you should take a look at Quartz. (Here is a survey of job scheduling options in Java.)

Two Expresso Good Practices

I’ve been working with Expresso 5.5 for the last couple of months. Two things I’ve learned, though they certainly don’t qualify as ‘best practices’:

1. Expresso provides a nice way to manipulate the model by setting certain criteria and then retrieving all the rows that match such a criteria. However, this can be abused.

For example, here we look up all the Indo-European languages:

MyLangDBObject lookup = new MyLangDBObject(); lookup.setField("FAMILY", "Indo-European");
for (Iterator i = lookup.searchAndRetrieveList().iterator(); i.hasNext(); ) {
   MyLangDBObject instance = (MyLangDBObject) i.next();
   // process instance
}

(For more, see Chapter 6 of the Expresso Developer’s Guide.)

This is all well and good, as long as there are not a significant number of objects that match your criteria. Because each object is retrieved from the database and plunked into an ArrayList, this method can use a tremendous amount of memory. A more memory efficient method of retrieving and processing a large number of rows is:

MyLangDBObject lookup = new MyLangDBObject();
lookup.setField("FAMILY", "Indo-European");
lookup.search();
Object[] keys = lookup.getFoundKeys();
MyLangDBObject instance = new MyLangDBObject();
for (int i = 0; i

The above code still creates a large List, but each entry in that list is much smaller. I'm not sure how to treat objects with multi valued keys. I just looked in the Expresso 5.5 DBObject class, and it looks like multiple keys are concatenated with '/' and returned as a single string; beware as that's not documented anywhere and I haven't tested it.

2. When you're doing complicated filtering, DBObjects let you add a number of 'and' clauses. For example, this code finds all the dead Indo-European languages from Asia:

MyLangDBObject lookup = new MyLangDBObject();
lookup.setField("FAMILY", "Indo-European");
lookup.setField("GEOGRAPHIC_AREA", "Asia");lookup.setField("TYPE", "dead");
for (Iterator i = lookup.searchAndRetrieveList().iterator(); i.hasNext(); ) {
   MyLangDBObject instance = (MyLangDBObject) i.next();
   // process instance
}

This approach works well for quite a number of cases. However, if you want to do anything more complicated, such as date ranges or 'or' rather than 'and' clauses, you have three options.

* You can call setCustomWhereClause(). This allows you to escape the abstraction and essentially drops you down to SQL. All well and good; this should probably be your primary means of doing more complicated filtering. (Unfortunately, in Expresso 5.5, JoinedDataObjects, an Expresso construct which joins multiple tables together and presents a unified view thereof, do not support the setCustomWhereClause method. Apparently Expresso 5.6 has added such support.) This code finds all the languages that are dead or are Indo-European:

MyLangDBObject lookup = new MyLangDBObject();
lookup.setCustomWhereClause(
   "FAMILY = \"Indo-European\" OR TYPE = \"dead\"");
for (Iterator i = lookup.searchAndRetrieveList().iterator(); i.hasNext(); ) {
   MyLangDBObject instance = (MyLangDBObject) i.next();
   // process instance
}

* You can pull back the data and filter on it in the middleware server. This is a bad idea, since you're not only using java where SQL would be better used, you're also pulling back unneeded data. However, it is an option that will always work, though it may be slow. For example, if the setCustomWhereClause did not work, you could replicate the above example via this code:

MyLangDBObject lookup = new MyLangDBObject();
for (Iterator i = lookup.searchAndRetrieveList().iterator(); i.hasNext(); ) {
   MyLangDBObject instance = (MyLangDBObject) i.next();
   if (! ("Indo-European".equals(instance.getField("FAMILY"))||
      "dead".equals(instance.getField("TYPE"))
      ) ) {
         continue;
   }
   // process instance
}

* You can create a view and point the database object at the view instead of at the underlying tables. This is probably the cleanest, fastest method for a complicated where clause with read only data, since no unneeded data is returned by the database. This works for JoinedDataObjects as well. If you are making updates, however, views may or may not work.

Setting the content encoding for HTML message parts with Javamail

I spent an hour chasing down the solution to this issue, so I figured I’d post it (or at least what worked for me). Basically, I have a multi-part message that can have different content encodings for each text part. I want to send this message via javamail. Now, there’s support for setting content as type ‘text/plain’ with a different character set, but if you want to add a part that is a different subtype of text to your message, there is no convenience method. However, this mail message had an example of how to specify html content and a character set:

MimeBodyPart htmltext = new MimeBodyPart();
htmltext.setContent(someDanishChars, "text/html; charset=\"ISO-8859-1\"");

(The author had some issues with this method in different app servers; it works fine for me in a stand alone java program.)

These additional parameters to the ‘Content-Type’ header are laid out, for text documents, in section 4.1 of RFC 2046. Here’s a collection of helpful email related RFCs. Additionally, section 5.1 of RFC 2045 outlines the way to add parameters and gives examples of the charset parameters.

Runtime log4j configuration

So, I’ve spent the last day or so trying to track down how to configure log4j at runtime (log4j 1.2.8). Now, there are some things that are easy: setting the level of the root logger is as easy as: LogManager.getRootLogger().setLevel((Level) Level.DEBUG). However, if you want to do more complicated things at runtime based on other inputs than the log4j.{properties,xml} file, things begin to get a bit kludgy. For example, I wanted to set up a set of appenders with sane defaults. Then, if values were present in a configuration file, I wanted to update those appenders with different configuration values and change the root logger’s behavior.

The easiest way I could find was to manipulate the properties file, as shown below:

package test;

import org.apache.log4j.*;
import org.apache.log4j.net.SMTPAppender;
import org.apache.log4j.net.SyslogAppender;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import java.util.Properties;
import java.io.*;

public class Test {
   private Log log = LogFactory.getLog(Test.class);
   Test() {
      log.debug("test1");
      switchAppenders();
      log.debug("test2");
   }
   public static void main(String args[]) {
      Test t = new Test();
   }
   private void switchAppenders() {
      Properties props = new Properties();
      try {
           InputStream configStream = getClass().getResourceAsStream("/log4j.properties");
           props.load(configStream);
           configStream.close();
      } catch(IOException e) {
          System.out.println("Error: Cannot laod configuration file ");
      }
      props.setProperty("log4j.rootLogger","DEBUG, file");
      props.setProperty("log4j.appender.file.File","out.log");
      LogManager.resetConfiguration();
      PropertyConfigurator.configure(props);
     }
}

This code is executed via this command, making sure that log4j.properties is present in the classpath:
java -classpath .:log4j-1.2.8.jar:commons-logging.jar test.Test

This is quite a kludge, but I couldn’t find anything better out there. It has the obvious setback that the changes you make to the log4j aren’t persisted, nor can they easily happen in more than one place, and any changes to appender names break a log of things, but at least it works.

JMS at the most recent BJUG

I went to BJUG last Thursday, and enjoyed the informative talk about JMS by Chris Huston. It started out as a bit of a tutorial, with the typical “here’s a messaging system, here are the six types of messages, etc.” When he was doing the tutorial bit, I thought it was a bit simple for a main talk, but it got better as the the speaker continued. It was clear from the speaker’s comments that he was deeply knowledgeable in the subject, or, if not that, at least has been enmeshed in JMS for a while. This wasn’t just a “I downloaded an open source JMS server and ran through the Sun tutorial talk” and I appreciated that.

I had a couple of take aways. One is that managing messaging with transactions is something that you’re always going to want to do, but this is fraught with difficulty, since you’ll then have two transactional systems. And we all know what that means; you’ll have to buy this book. It also means that, in a real system, you’ll never want to use local transactions, as you’ll want the transactions to be managed by a global transaction service, typically your application server.

Recovery of such a transactional messaging service was touched upon. If you have two different transactional systems, and failure occurs, recovering can be a real issue. Chris recommended, if at all possible, having the JMS provider and your data layer live in the same database, as then you can use the backup tools and ensure the two systems are in a consistent state.

One of the most interesting parts of the evening was a question asked by the audience. A fellow asked what scenarios JMS was useful for, and Chris said it was typically used in two ways:

1. Clustering/failover. You can set up a large number of machines and since all they are getting is messages with no context, it’s much easier to fail over to another machine. There’s no state to transfer.

I’ve seen this in the Jetspeed 1.5 project, where messaging is used to allow clustering.

2. Handling a large amount of data while increasing the responsiveness of the system. Since messages can be placed into queues, with no need for immediate response, it’s possible for a message source to create a tremendous number of messages very quickly. These messages may take quite a bit of time to process, and this rules out a synchronous solution. JMS (and messaging solutions in general) allow hysteresis.

I’ve seen this in a client’s system, where they send out a tremendous number of emails and want to ensure they can track the status of each one. It’s too slow to write the status to the database for each email, but sending a message to a queue is quick enough. On the receiving end, there’s some processing and status is written to the database. The performance is acceptable and as long as the JMS provider doesn’t crash or run out of memory, no messages are lost.

The only scenario that I thought of that Chris didn’t mention is one that I haven’t seen. But I’ve heard that many legacy systems have some kind of messaging interface, and so JMS might be an easy way (again, no context required) to integrate such a system.

It was an interesting talk, and reminded me why I need to go to more BJUGs.

Options for connecting Tomcat and Apache

Many of the java web applications I’ve worked on run in the Tomcat servlet engine, fronted by an Apache web server. Valid reasons for wanting to run Apache in front of Tomcat are numerous and include increased clickstream statistics, Apache’s ability to quickly and efficiently serve static content such as images, the ability to host other dynamic solutions like mod_perl and PHP, and Apache’s support for SSL certificates. This last is especially important–any site with sensitive data (credit card information, for example) will usually have that data encrypted in transit, and SSL is the default manner in which to do so.

There are a number of different ways to deal with the Tomcat-Apache connection, in light of the concerns mentioned above:

Don’t deal with the connection at all. Run Tomcat alone, responding on the typical http and https ports. This has some benefits; configuration is simpler and fewer software interfaces tends to mean fewer bugs. However, while the documentation on setting up Tomcat to respond to SSL traffic is adequate, Apache handling SSL is, in my experience, far more common. For better or worse, Apache is seen as faster, especially when when confronted with numeric challenges like encryption. Also, as of Jan 2005, Apache serves 70% of websites while Tomcat does not serve an appreciable amount of http traffic. If you’re willing to pay, Netcraft has an SSL survey which might better illuminate the differences in SSL servers.

If, on the other hand, you choose to run some version of the Apache/Tomcat architecture, there are a few different options. mod_proxy, mod_proxy with mod_rewrite, and mod_jk all give you a way to manage the Tomcat-Apache connection.

mod_proxy, as its name suggests, proxies http traffic back and forth between Apache and Tomcat. It’s easy to install, set up and understand. However, if you use this method, Apache will decrypt all SSL data and proxy it over http to Tomcat. (there may be a way to proxy SSL traffic to a different Tomcat port using mod_proxy–if so, I was unable to find the method.) That’s fine if they’re both running on the same box or in the same DMZ, the typical scenario. A byproduct of this method is that Tomcat has no means of knowing whether a particular request came in via secure or insecure means. If using a tool like the Struts SSL Extension, this can be an issue, since Tomcat needs such information to decide whether redirection is required. In addition, if any of the dynamic generation in Tomcat creates absolute links, issues may arise: Tomcat receives requests for localhost or some other hidden hostname (via request.getServerName()), rather than the request for the public host, whichApache has proxied, and may generate incorrect links.

Updated 1/16: You can pass through secure connections by placing the proxy directives in certain virtual hosts:

<VirtualHost _default_:80>
ProxyPass /tomcatapp http://localhost:8000/tomcatapp
ProxyPassReverse /tomcatapp http://localhost:8000/tomcatapp
</VirtualHost>

<VirtualHost _default_:443>

SSLProxyEngine On
ProxyPass /tomcatapp https://localhost:8443/tomcatapp
ProxyPassReverse /tomcatapp https://localhost:8443/tomcatapp
</VirtualHost>

This doesn’t, however, address the getServerName issue.

Updated 1/17:

Looks like the Tomcat Proxy Howto can help you deal with the getServerName issue as well.

Another option is to run mod_proxy with mod_rewrite. Especially if the secure and insecure parts of the dynamic application are easily separable (for example, if the application was split into /secure/ and /normal/ chunks), mod_rewrite can be used to rewrite the links. If a user visits this url: https://www.example.com/application/secure and traverses a link to /application/normal, mod_rewrite can send them to http://www.example.com/application/normal/, thus sparing the server from the strain of serving pages needlessly encrypted.

mod_jk is the usual way to connect Apache and Tomcat. In this case, Tomcat listens on a different port and a piece of software known as a connector enables Apache to send the requests to Tomcat with more information than is possible with a simple proxy. For instance, certain variables are sent via the connector when Apache receives an SSL request. This allows Tomcat full knowledge of the state of the request, and makes using a tool like the aforementioned Struts SSL Extension possible. The documentation is good. However using mod_jk is not always the best choice; I’ve seen some performance issues with some versions of the software. You almost always have to build it yourself: binary releases of mod_jk are few and far between, I’ve rarely found the appropriate version for my version of Apache, and building mod_jk is confusing. (Even though mod_jk 1.2.8 provides an ant script, I ended up using the old ‘configure/make/make install’ process because I couldn’t make the ant script work.)

In short, there are plenty of options for connecting Tomcat and Apache. In general, I’d start out using mod_jk, simply because that’s the option that was built specifically to connect the two; mod_proxy doesn’t provide quite the same level of integration.

Useful tools: javap

javap lets you examine java class files and jar files in a number of ways. See this web page for more information. For me, it’s an API reference. I use it in two ways:

1. When I’m coding, and I need to know the exact syntax of a method, I shell out: javap java.util.StringTokenizer. (Yes, I know that any modern IDE will do this for you without shelling out, but javap will work anywhere java is installed and with any editing tool. You trade portability for convenience.) One large catch is that inherited methods are not shown:

$ javap java.io.BufferedReader
Compiled from "BufferedReader.java"
public class java.io.BufferedReader extends java.io.Reader{
public int read();
throws java/io/IOException
static {};
public void close();
throws java/io/IOException
public void reset();
throws java/io/IOException
public boolean markSupported();
public boolean ready();
throws java/io/IOException
public void mark(int);
throws java/io/IOException
public long skip(long);
throws java/io/IOException
public int read(char[],int,int);
throws java/io/IOException
public java.io.BufferedReader(java.io.Reader);
public java.io.BufferedReader(java.io.Reader,int);
public java.lang.String readLine();
throws java/io/IOException
java.lang.String readLine(boolean);
throws java/io/IOException
}

Running javap on java.io.BufferedReader does not show the method read(char[]), inherited from java.io.Reader. (This example is from the J2SE 1.4 libraries.)

2. Sometimes, the javadoc is too up-to-date (or your jar files are too old) to answer questions about an API. For example, I’m working on a project with Jetspeed which depends on Turbine version 2.2. Unfortunately, this is an extremely old version of Turbine (release 16-Aug-2003), and the javadoc doesn’t appear to be available. (Updated Dec 11: It looks like the Turbine 2.2 javadoc is indeed online. Whoops.) Generating the javadoc with ant is certainly an possibility, and if I found myself going time and again to verify the API of Turbine 2.2, I’d do that. But for a quick one- or two-off question about an API that no web search turns up, javap can be very handy.

In short, if you have a quick question about an API, javap can help you out.

Jetspeed 1 and 2

Here’s an article about Jetspeed 2. I’m actually very excited about this, though I feel that compliance to the spec is overrated. Perhaps I’m not really an enterprise developer, but I haven’t seen very many situations where I wanted a portlet (or an EJB) that had been developed on one portal server and deployed to another. But some of the Spring oriented features and ability to deploy struts application war files as portlets seem pretty neat and useful.

I’m currently working on a portal application, and we’re using Jetspeed 1–version 1.5, which has extensive documentation and has actually been released. JS1 is built on the Turbine framework. I hadn’t done much looking at this (other than a glance when I was looking at Torque for an OR layer). But so far, I’ve been really really impressed with Jetspeed and Turbine.

What has most impressed me is the configurability of these frameworks. There are a set of properties files that specify services, things like logging, persistence, localization, session validation, and authentication. The developers have done a great job of breaking these up into well defined chunks and letting me subclass services and plug in my own implementations. One example: our users do not login to the portal. Rather, they are authenticated by another application, which sets a cookie. I was able to disable Jetspeed’s own authentication system (which looks to a database) and plug in mine without making any modifications to source code. With properties overriding, I didn’t even have to modify the default properties files. Fantastic.

Jetspeed supports both JSP and Velocity as templating languages for the view. Velocity is used throughout the default portal, and I decided to use it as well. It’s an interesting language which has a lot of the benefits I’ve read about Groovy–there’s no types, and methods and properties look the same. It does look a bit perlish, I’ll admit–lots of $ and #, but it’s been fun to learn a different view language.

JS1 also provides an easy framework for developing portlets–an MVC framework, in fact. I don’t want to repeat what the relevant section of the portlet tutorial says, so I’ll just mention that developing dynamic content is a breeze.

I don’t want to say that Jetspeed has been a pure joy, however. There were a few days a couple of weeks ago that I wondered whether it had been a good choice at all. We have an existing base of users (hence the alternate authentication) and were noticing that the portal was loading slow when running against a user table of around 100K users. Then I ran some tests with The Grinder and noticed it was running really slow. Like 30 seconds to render a page with 5 portlets. Luckily, some sleuthing around indicated that there was much extraneous database access going on, and when that was eliminated, performance became acceptable.

In addition, the default build process is mavenized. I had an extremely bad experience and ended up writing a simple ant build script to do what I want. It’d be nice if both were options–since lots of people come to JS1 looking to slap a portal together for cheap, rather than thrash around with a complex build process. (I’d say at least forty percent of the messages to the mailing list are build related–not a good sign.) I just went to a BJUG talk about Maven, and Thompson was persuasive, so I might give maven another chance.

All in all, though, I’ve been very happy with Jetspeed. I am looking forward to JS2; I wish the timing had been better so I could have used it on the current project.

Useful tools: p6spy

This entry kicks off a series of entries where I’ll examine some of my favorite tools for development. Some of them will be long, some short, but all of them will highlight software I use to make my life a bit easier.

A large, large chunk of the development I do is taking data from a relational database to a an HTML screen, and back again. Often there are business rules for transforming the data, or validation rules, but making sure the data is stored safely and consistently is a high priority, and that means a relational database.

However, I do much of my work in java, which means that the relational-OO impedance mismatch is a common problem. One common way to deal with it is to use an OR tool–something like OJB or JDO. These tools provide object models of your database tables, usually with some help from you. You then have the freedom to pretend like your database doesn’t exist, and use these objects in your application. The OR framework takes care of the dirty work like SQL updates and caching.

Every convenience has its price, however, and OR mapping tools are no exception. The same abstraction that lets you pretend that you’re simply dealing with objects means that you cannot easily examine the SQL that is generated. In addition, the way that you’re using the objects may cause performance issues, because you’re treating the data as objects, rather than rows.

It’s much the same issue as calling methods over the network via RMI or accesing files via NFS: the abstraction is great and means that programmers don’t have to think about the consequences of remote access. But the failure of the abstraction can be catastrophic, all the more so because the programmer was not expecting to have to deal with the grotty details under the abstraction (that’s the whole point, right?).

OR tools do not fail often, or have many catastrophic failure modes, but they sure can be slow. With open source software, you can dig around and see how SQL is being generated, but that’s tedious and time consuming. With commercial products, you don’t even have that option. (Some OR tools may have their own ‘Show me the SQL’ switch–I haven’t run into them.)

Enter p6spy. p6spy can be used in place of any JDBC driver. You point it to the the real driver and it passes on any configuration or SQL calls to that driver. But p6spy logs every SQL statement passed to it and every result set passed back. (A fine non object oriented example of the Decorator pattern.)

It took me about 15 minutes to figure out how to use p6spy, the software is open source with decent documentation, the latest version has data source support, and it scratches an itch that most, if not all, java developers will have at some time. With p6spy, you can find out what that OR tool is doing under the covers–it’s an easy way to peel back some of the abstraction if needed.