Comments on Yahoo! Mail storage and APIs

Check it out, I’m quoted in a piece about Yahoo! Mail in InfoWorld, saying:

As far as I can tell, I’ll never need to delete another e-mail message, but if I did run into a limit, it’d be very easy for me to pull down the messages via the POP interface and store them on a personal hard drive

I stand by my words. I haven’t deleted a message for years, and doubt I ever will again. I’m not really sure what that means in terms of access–I’d be very interested to look at how many messages older than, say, a year, have been accessed. It does mean that I no longer need to decide whether or not to spend time filing/deleting/organizing mail.

I have been a premium user of Yahoo! Mail for quite some time. I like the ‘beta’ JS intensive interface, though before I bought a new computer, it taxed my PC.

Personally, I think the bigger announcement is the opening of the Yahoo! Mail APIs. Granted, only premium users get the full API, but I can imagine all sorts of neat applications built on that API. (The standard API doesn’t look that useful.) For example, you could build an external program to live out of your Yahoo! Mail, like this fellow, but with offline capacity, or a simple blogging client where you leverage Yahoo’s superb rich client interface and platform to generate content, and then push it elsewhere.

Out of the box complete open source WAMP stack

If you’re looking for an out of the box WAMP stack, I had good luck with Apache2Triad. A big download, but you’re getting Apache, PostgreSQL, MySQL, a mail server and an FTP server, plus various admin tools. I had things running in about 15 minutes–very cool, given that I’d spent a fair bit of time just trying to get PostgreSQL installed alone. The only hiccup was that my spyware tool thought SlimFTPd was a virus.

Beware, though, if you have an existing MySQL installation. Apache2Triad won’t blow away the data, but it will usurp the Services entry. And then, if you remove Apache2Triad, your existing MySQL instance is orphaned. I was able to get mine up and running again, from the command line. But to get it running again as a service required a complete uninstall/reinstall. No fun.

So, if you have no dev environment and need a quick start, take a look at this package. If you, on the other hand, have existing development tools installed, be more careful than I was.


Review of MODx

Update 7/2010: Here’s a more recent review of MODx.  I’m bummed, but I have not yet had another opportunity to use MODx.

A couple of weeks ago, I mentioned I’d be reviewing MODx in the near future. I recently used it and was quite impressed by this project. It is a web CMS written in PHP with MySQL as the datastore. I say ‘web CMS’ because MODx is designed to manage web content, as opposed to a more enterprisey CMS like StoryServer, which can manage all different types of content with complex workflows. You could use MODx to manage, say, printed brochures, but that would take some finagling. Web content is the sweet spot of this framework. Currently at version 0.9.5, it is fairly mature and ready for use.

I wanted to review MODx because I believe it deserves more attention, and as an example of how I’d evaluate an open source project at the back end of a project, rather than at the front end.

The good:

full featured

I was able to find everything I needed in MODx, or in the extensions repository. This included a thumbnail generator, email forms, integrated rich text editor, SEO friendly links, database updating forms, and a content redistributor (that syndicated content internally within the site).

admin interface

I think the administrative interface is excellent for non technical users. It’s responsive, and intuitive.

user access/authentication system

There’s a very well thought out access system. You can assign users to roles, which lets them access certain functions in the admin interface, and groups, which define groups of documents that a user may modify. Here is more documentation on this feature.

defined development architecture

When you start working with MODx and you want to do something more than a cookie cutter website, you start hearing jargon, like snippets, plugins, chunks and template variables. All of these are MODx specific concepts, and it takes a while to wrap your head around them. But when you do, you appreciate the thoughtfulness of the architecture. In particular, you rarely have to modify existing source–there are hooks and easy ways to tie in custom code. (Most of these hooks are for the user side–to modify the admin interface, I had to hack some existing PHP.)


community

The MODx forums are the heart of the community. There are quite a few active members. I found the community to be very responsive and friendly to any questions I had, no matter how dumb.

a growing set of extensions

The MODx repository has a number of useful extensions. I especially liked that each entry in the repository is labeled with the version of MODx it supports.

active development

The project has gone from start up to 0.9.5 in less than two years.

open source

The license is GPL version 2.


caching

You can turn on a simple form of caching, which will serialize a generated page to disk. Unfortunately, there’s no way to expire that cache. You can delete it, on a site wide or page by page basis, but you can’t say ‘expire the cached version of this page in one month’. Still, for many pages, this is an appropriate form of caching and can noticeably speed up the site.
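To make the missing feature concrete, here is a minimal sketch of a disk cache that supports both a manual delete and a per-page lifetime. It is written in Python purely for illustration and is not how MODx implements its cache; the PageCache class, its method names and the file naming are all hypothetical.

```python
import os
import tempfile
import time


class PageCache:
    """Hypothetical disk cache for rendered pages with an optional lifetime."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, page_id):
        return os.path.join(self.directory, f"page_{page_id}.html")

    def put(self, page_id, html):
        with open(self._path(page_id), "w", encoding="utf-8") as handle:
            handle.write(html)

    def get(self, page_id, max_age_seconds=None, now=None):
        """Return cached HTML, or None if missing or older than max_age_seconds."""
        path = self._path(page_id)
        if not os.path.exists(path):
            return None
        now = time.time() if now is None else now
        if max_age_seconds is not None and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)  # stale: drop it so the next request regenerates the page
            return None
        with open(path, encoding="utf-8") as handle:
            return handle.read()

    def delete(self, page_id):
        """Manual page by page delete."""
        path = self._path(page_id)
        if os.path.exists(path):
            os.remove(path)


ONE_MONTH = 30 * 24 * 3600

cache = PageCache(tempfile.mkdtemp())
cache.put(42, "<h1>Hello</h1>")
fresh = cache.get(42, max_age_seconds=ONE_MONTH)
# Pretend 31 days have passed by supplying a later "now".
stale = cache.get(42, max_age_seconds=ONE_MONTH, now=time.time() + 31 * 24 * 3600)
```

The expiry check happens on read, so no background job is needed: a page older than its lifetime is simply treated as a cache miss.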

The bad:


documentation

There is a ton of documentation for MODx, even a wiki. But I always felt like I was missing something–either it was hard to find what you wanted, or when you did, it wasn’t enough. An example is the API documentation. Here’s a sample function call that you’d make on the $modx object. No explanation of the returned data structure is available, and no actual example of how to call this function. I became very friendly with var_export($var,TRUE); and print statements to navigate these returned structures.

dependency on the database

MODx is very tightly bound to MySQL. No problem there–MySQL is a great database. But I mean, it’s really tied to MySQL. By default, all code you write (see ‘defined development architecture’ above) is stored in the database. That’s not the place for code! Luckily, you can avoid that by using an include:

include($modx->config['filemanager_path'].'/assets/libs/thumbs/thumbplugin.php');

This way, the code is on the file system, and can be versioned, etc. Also, since MODx depends on the database for so much functionality, make doubly sure you back up the database.

error messages in development

PHP syntax errors can be hard to track down. I ended up using a lot of command line debugging: running php -l -f foo.php and looking in the error log to see messages.

5000 document limit

This is a big one. Because of the caching mechanism, you can’t have more than 5000 documents in a MODx website. However, this is acknowledged as a shortcoming, and the team is working on it.

no search in specific forum

The forums are great, and are divided up into various sections. However, there was no way (that I could find), to search within a particular forum, or even within just the forums. This meant that when you were searching, you ended up with a lot of extraneous results.

Sure, MODx isn’t right for every site. But if you have a PHP savvy developer, a non technical userbase, requirements more complex than brochureware, and want to get a site up and running quickly, MODx is worth a look. As I’ve said before, use the right tool for the job.

Many thanks to the developers of MODx for putting together a great generic web CMS development platform!

Update 11/2009: HostColor offers MODx hosting for a reasonable price.  If you’re looking, check ’em out (click the CMS Hosting link).  Disclaimer: I make a bit of money if you visit them and/or sign up.


TWiki, apache authentication and denying view access to anyone not authenticated

I just spent half an hour chasing my tail trying to get TWiki to deny view privileges to anonymous users (who are assigned the TWikiGuest userid). I have a client that is going to be using TWiki as a document repository/portal, and wanted to make sure that we weren’t depending on ‘security through obscurity’.

We’re using Apache Authentication and it worked just fine for editing–you had to login before you could edit anything. We only want to limit access to certain Webs (it would be nice if people who knew about it could self register, which requires access to the TWiki web). I tried to edit the WebPreferences for the web to be protected and set DENYWEBVIEW = MainWeb.TWikiGuest. This denied view of the Dev web whether or not I was logged in.

Using the %WIKIUSERNAME% variable and this post on a similar problem led me to conclude that the REMOTE_USER environment variable wasn’t being carried across invocations. On every view, TWiki thought I was the TWikiGuest, until I explicitly logged in. Then, as long as I was editing, it was fine, but viewing was still denied.

That led me to this FAQ: Why is the environment variable REMOTE_USER var not set? which states that the REMOTE_USER variable isn’t sent on every request, but only for protected resources.

Protecting my view.cgi script did the trick. I did so by adding these lines to my Apache config (in the twiki/bin directory entry) and restarting it:

<FilesMatch "(view|attach|edit|manage|rename|save|upload|mail|logon|.*auth).*">
require valid-user
</FilesMatch>
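A quick way to see which script names that pattern would cover is to try it outside Apache. The sketch below uses Python’s re module and assumes Apache applies the pattern as an unanchored search against the file name; the is_protected helper and the list of script names are only illustrative.

```python
import re

# Same pattern as in the FilesMatch line above.
PROTECTED = re.compile(
    r"(view|attach|edit|manage|rename|save|upload|mail|logon|.*auth).*"
)


def is_protected(script_name):
    # Assumption: the pattern may match anywhere in the name (search, not full match).
    return PROTECTED.search(script_name) is not None


for name in ["view", "viewauth", "edit", "rdiffauth", "register", "search"]:
    print(name, is_protected(name))
```

Under that assumption, any name that merely contains one of the listed words is covered too, so it is worth checking the actual contents of the bin directory against the pattern.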

Now, the only problem is that self registration doesn’t work. But that’s minor, as I can create a guest user and have folks login to register with that user (and deny it access to the protected webs too, to force everyone to register).


Eclipse Remote File Synchronization Plugins

I was talking to a friend about Eclipse and he was saying that one of the things keeping him from using Eclipse was the lack of an ssh synchronization plugin (so that he could edit locally and deploy to a remote server, a typical web application setup). I typically use CVS for that purpose, but sometimes it’s overkill.

I took it upon myself to find one, because I think it’d be useful too. I found a few (all open source):

Sftp Plugin: Not updated since 2003, didn’t work in Eclipse 3.2.1. CPL Licensed. Seemed like it had the nicest interface.

Deployer: Works, but only deployed one file at a time (that I could see). Not released since 2005. LGPL.

DeployerFTP: Looked to be FTP only (according to the documentation), released in 2006.

And finally, one that worked for me: Esftp. Last released in 2006, but works with Eclipse 3.2.1. LGPL 2.1. Weird install procedure (but if you read the README.txt in the distribution, it makes sense) so make sure to read the installation instructions.

Some of these were listed in the Eclipse Plugins directory, but others weren’t. Caveat emptor.


Initialize your GWT widgets

I’m a big fan of using GWT to increase web application usability in an incremental fashion. It may be fine to use GWT to build a full-blown application, but I’ve never done that. When you go the widget approach, often you want to configure the widget, perhaps based on the page it is on. Kevin Jansz talks about how to give a GWT module init params (very much like init-param elements in web.xml). He suggests using the Dictionary class, which is in the i18n module. For a sweet example (that is not even related to i18n), read the Dictionary doc linked to above. There are some caveats. From the aforementioned documentation:

…the Dictionary class is fully dynamic. As a result, a variety of error conditions (particularly those involving key mismatches) cannot be caught until runtime. Similarly, the GWT compiler is unable to discard unused dictionary values since the structure cannot be statically analyzed.

To me, using a Dictionary is a better way of getting configuration information from a host page than what I’ve done in the past: write a value to a hidden span and use the DOM GWT class to access it. Much clearer and no unneeded DOM elements. In fact, if you wanted to get fancy, you could generate the javascript object properties dynamically (this is conjecture, I’ve not tested this).

Nice find Kevin!

Webapp performance tuning tool list

Here’s a great article about performance tuning web applications. In short, have a goal, and measure, measure, measure. Otherwise, you’re just shooting in the dark at a pin in a haystack. Or something like that.

I’ve touched on the complexity of performance testing web applications before, but this article goes me one better by outlining various tools that can be used to actually test different pieces of the stack.

I did notice one missing piece, though. The SitePen folks outline tools to test from the browser to the web server, and then the database server. But they don’t mention any app server or web server profilers. I wonder whether that’s an unintentional oversight, or whether they haven’t needed to tune dynamic business logic, either in the app server or web server layer.

I don’t have any business logic layer performance tuning tools to suggest, either. Looks like has a number of profilers–anyone have experience using one?


XHR Data Caching and the Dojo Offline Toolkit

I’ve found caching JSON to be very useful when writing GWT components. Basically, the XMLHttpRequest will go to the browser cache first, and if the data is (relatively) static, you can direct the browser to cache it. More about caching in this post. If the browser hasn’t seen that url before, it will get it from the server. Using some kind of rewrite tool to make sure the data page looks like a normal HTML page tends to be good form.
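Directing the browser to cache a response comes down to the headers the server sends with it. As a rough sketch (in Python, with a hypothetical cache_headers helper and an arbitrary one hour lifetime), the server side could build them like this:

```python
from email.utils import formatdate


def cache_headers(max_age_seconds, now):
    """Build response headers that let a browser reuse a JSON response."""
    return {
        "Content-Type": "application/json",
        # Relative lifetime, understood by HTTP/1.1 caches.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Absolute expiry date in GMT, for older HTTP/1.0 caches.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# "now" is seconds since the epoch; 0 is used here only to keep the example fixed.
headers = cache_headers(3600, now=0)
```

Until the lifetime runs out, a repeated XMLHttpRequest for the same url can be answered from the browser cache without touching the server.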

But the Dojo Offline Toolkit promises to take things to a whole new level. It’ll be interesting if he succeeds. I took a look at his milestone list, and there are a number of ‘Figure Out’ steps, as you’d expect with something this ambitious and new. Well worth looking at if you write web applications of any kind.


Options for connecting Tomcat and Apache

Many of the java web applications I’ve worked on run in the Tomcat servlet engine, fronted by an Apache web server. Valid reasons for wanting to run Apache in front of Tomcat are numerous and include increased clickstream statistics, Apache’s ability to quickly and efficiently serve static content such as images, the ability to host other dynamic solutions like mod_perl and PHP, and Apache’s support for SSL certificates. This last is especially important–any site with sensitive data (credit card information, for example) will usually have that data encrypted in transit, and SSL is the default manner in which to do so.

There are a number of different ways to deal with the Tomcat-Apache connection, in light of the concerns mentioned above:

Don’t deal with the connection at all. Run Tomcat alone, responding on the typical http and https ports. This has some benefits; configuration is simpler and fewer software interfaces tend to mean fewer bugs. However, while the documentation on setting up Tomcat to respond to SSL traffic is adequate, Apache handling SSL is, in my experience, far more common. For better or worse, Apache is seen as faster, especially when confronted with numeric challenges like encryption. Also, as of Jan 2005, Apache serves 70% of websites while Tomcat does not serve an appreciable amount of http traffic. If you’re willing to pay, Netcraft has an SSL survey which might better illuminate the differences in SSL servers.

If, on the other hand, you choose to run some version of the Apache/Tomcat architecture, there are a few different options. mod_proxy, mod_proxy with mod_rewrite, and mod_jk all give you a way to manage the Tomcat-Apache connection.

mod_proxy, as its name suggests, proxies http traffic back and forth between Apache and Tomcat. It’s easy to install, set up and understand. However, if you use this method, Apache will decrypt all SSL data and proxy it over http to Tomcat. (There may be a way to proxy SSL traffic to a different Tomcat port using mod_proxy–if so, I was unable to find the method.) That’s fine if they’re both running on the same box or in the same DMZ, the typical scenario. A byproduct of this method is that Tomcat has no means of knowing whether a particular request came in via secure or insecure means. If using a tool like the Struts SSL Extension, this can be an issue, since Tomcat needs such information to decide whether redirection is required. In addition, if any of the dynamic generation in Tomcat creates absolute links, issues may arise: Tomcat receives requests for localhost or some other hidden hostname (via request.getServerName()), rather than the request for the public host, which Apache has proxied, and may generate incorrect links.

Updated 1/16: You can pass through secure connections by placing the proxy directives in certain virtual hosts:

<VirtualHost _default_:80>
ProxyPass /tomcatapp http://localhost:8000/tomcatapp
ProxyPassReverse /tomcatapp http://localhost:8000/tomcatapp
</VirtualHost>

<VirtualHost _default_:443>
SSLProxyEngine On
ProxyPass /tomcatapp https://localhost:8443/tomcatapp
ProxyPassReverse /tomcatapp https://localhost:8443/tomcatapp
</VirtualHost>

This doesn’t, however, address the getServerName issue.

Updated 1/17:

Looks like the Tomcat Proxy Howto can help you deal with the getServerName issue as well.
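To make the getServerName issue concrete, here is a small Python sketch of how an application behind a proxy could choose the host for absolute links. It is a generic header-based workaround, not the approach from the Howto. It assumes the proxy passes the original host along in an X-Forwarded-Host header; the public_base_url function and the example hostnames are hypothetical.

```python
def public_base_url(headers, backend_host, backend_port, secure):
    """Pick the host a visitor actually used, not the hidden backend host."""
    # The header may hold a comma-separated chain; the first entry is the original host.
    forwarded = headers.get("X-Forwarded-Host", "")
    host = forwarded.split(",")[0].strip()
    if not host:
        # No proxy header: fall back to what the backend itself sees.
        default_port = 443 if secure else 80
        if backend_port == default_port:
            host = backend_host
        else:
            host = f"{backend_host}:{backend_port}"
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}"


# Request that came through the proxy (hypothetical public hostname).
proxied = public_base_url({"X-Forwarded-Host": "www.example.com"}, "localhost", 8000, False)
# Request made straight to the backend.
direct = public_base_url({}, "localhost", 8000, False)
```

Links built from the first value point at the public site, while links built from the second leak the hidden hostname and port, which is exactly the problem described above.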

Another option is to run mod_proxy with mod_rewrite. Especially if the secure and insecure parts of the dynamic application are easily separable (for example, if the application was split into /secure/ and /normal/ chunks), mod_rewrite can be used to rewrite the links. If a user visits this url: and traverses a link to /application/normal, mod_rewrite can send them to, thus sparing the server from the strain of serving pages needlessly encrypted.
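The decision such rewrite rules encode is simple enough to sketch in a few lines of Python. This is only an illustration of the logic, not mod_rewrite syntax; the redirect_target function, the /application/secure prefix and the example hostname are assumptions.

```python
def redirect_target(scheme, host, path):
    """Return the URL to redirect to, or None if the scheme already fits the path."""
    if path.startswith("/application/secure") and scheme != "https":
        # Sensitive pages must be encrypted.
        return f"https://{host}{path}"
    if path.startswith("/application/normal") and scheme != "http":
        # Ordinary pages do not need the cost of encryption.
        return f"http://{host}{path}"
    return None


# A visitor on https follows a link into the normal part of the application.
target = redirect_target("https", "www.example.com", "/application/normal/index.jsp")
```

Requests that are already on the right scheme return None and are served as they are.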

mod_jk is the usual way to connect Apache and Tomcat. In this case, Tomcat listens on a different port and a piece of software known as a connector enables Apache to send the requests to Tomcat with more information than is possible with a simple proxy. For instance, certain variables are sent via the connector when Apache receives an SSL request. This allows Tomcat full knowledge of the state of the request, and makes using a tool like the aforementioned Struts SSL Extension possible. The documentation is good. However, using mod_jk is not always the best choice; I’ve seen some performance issues with some versions of the software. You almost always have to build it yourself: binary releases of mod_jk are few and far between, I’ve rarely found the appropriate version for my version of Apache, and building mod_jk is confusing. (Even though mod_jk 1.2.8 provides an ant script, I ended up using the old ‘configure/make/make install’ process because I couldn’t make the ant script work.)

In short, there are plenty of options for connecting Tomcat and Apache. In general, I’d start out using mod_jk, simply because that’s the option that was built specifically to connect the two; mod_proxy doesn’t provide quite the same level of integration.

© Moore Consulting, 2003-2020