Skip to content

Destroying robot generated Tomcat sessions

A large effort goes into creating sites that are crawlable by robots, such as Google, Yahoo! and other search engines. However, these programs can create a large number of sessions, if the site is based on servlet technology. Per the servlet spec (the 2.3 specification, page 50), if a client never joins a session, new sessions will be created for each request.

A session is considered new when it is only a prospective session and has not been established. Because HTTP is a request-response based protocol, an HTTP session is considered to be new until a client joins it. A client joins a session when session tracking information has been returned to the server indicating that a session has been established. Until the client joins a session, it cannot be assumed that the next request from the client will be recognized as part of a session.The session is considered to be new if either of the following is true:

  • The client does not yet know about the session
  • The client chooses not to join a session.

These conditions define the situation where the servlet container has no mechanism by which to associate a request with a previous request.

Since all these extra sessions take up memory, and are long lived, a client asked me to look into a way to invalidate them. (I’m not the first person to run into this problem.) The easiest way to do that was to build a filter that examined the User-Agent HTTP header; here’s a nice list of User-Agent values. If the client was any of the robots, we could safely invalidate the session. For some reason, in with Tomcat 4.1, I needed to run session.isNew(); before running session.invalidate();, otherwise the session wasn’t destroyed. The filter was placed at the end of the request chain, as outlined in this article, by calling chain.doFilter(request, response); before the invalidation filter looked at the request or response.

I haven’t seen any performance problems with creating a session and then throwing it away, probably because java is so good at garbage collecting short lived objects. If I did, conditionally disabling session participation in a JSP might be an option to pursue.