I recently placed squid in front of an Apache/Tomcat based web application to serve as a web accelerator. We could have used Apache’s mod_proxy, but squid has the ability to federate and that was considered valuable for future growth. (Plus, Wikipedia uses squid, and it has worked out pretty good for them so far.) I didn’t find a whole lot of other options–Varnish looks good, but wasn’t quite documentation and feature rich enough.

However, when the application generates a page for a user who is logged in, the content can be different than if the exact same URL is visited by a robot or a user who is not signed in. It’s easy to tell if a user is signed in, because they send cookies. What was not intuitive was how to tell Squid that pages for logged in users (matching a certain header, or a certain URL pattern) should always be referred to Tomcat. In fact, I asked about this on the mailing list, and it doesn’t seem as if it is possible at all. Squid caches objects at the page level, and can’t cache just pieces of a page (like I believe, among others, OSCache can).

I compromised by deleting the cached object (a page, for example) whenever a logged in user visits it. This forces squid to go back to the origin server, guaranteeing that the logged in user gets the correct version. Then, if a non logged in user requests the page, squid again goes back to the origin server (since it doesn’t have anything in its cache). If a second logged in user requests the same page, squid serves it out of cache. It’s not the best solution, but it works. And non logged in users are such a high proportion of the traffic that squid is able to serve a fair number of pages without touching the application.

Overall I found Squid to be pretty good–even with the above workaround it was able to take a substantial amount of traffic off the main application. Note that Squid was designed to be a forward proxy (for example, a proxy at an ISP caching commonly requested pages for that ISPs users), so if you want to use it as a web accelerator (in front of your website, to increase the speed of pages you create), you have to ignore a lot of the documentation. The FAQ is a must read, especially the section on reverse proxying and the logs section.

Technorati Tags: , ,


© Moore Consulting, 2003-2019