Skip to content

Checking the status of your files, using CVS

When I used CVS a few years ago, I remember a colleague writing a tremendous perl script that you could run from anywhere in the CVS source tree. It would let you know whether you had files that weren’t in CVS, needed to be updated, or were going to be merged. It was quite a nice piece of perl code, which essentially parsed the output of cvs status, and the information it output was quite useful at the end of a long bug fixing or coding session (“hey, what files did I change again?”). However, it also needed to be maintained and documented, as well as explained to users.

The other day, I stumbled on something which works almost as well, but is part of CVS already: cvs -qn up. The q option tells CVS to be quiet, and not chat about all the directories that it sees. The n option tells CVS not to make any changes on the filesystem, but just tell you what changes it would have made. Here’s some sample output:

[moore@localhost guide]$ cvs -qn up
? securityTechniques/NewStuff.rtf
M securityTechniques/InputValidation.rtf
M securityTechniques/SessionManagement.rtf
U securityTechniques/AuthenticationWorkingDraft.doc

M means that the file has been changed locally. ? means that the file exists on the locally, but not in the repository. U means that the file has changed in the repository, but not yet been updated locally. For more information on the output of update, look here.

Use this command and never lose track of the files in your CVS tree again.

Book Review: Enterprise J2ME

Update 2/25/07: added Amazon link.

I go to Java Users Groups (yes, I’m struggling to get in touch with my inner geek) once every two or three months. Sometimes there’s an engaging speaker, but most of the time the fellow up front looks like he’s just swallowed a hot pepper, speaks like he has a permanent stutter, and answers questions like I’m speaking Greek. (I’m not making fun; I had a hard time when I was in front of a JUG too.) Regardless of the quality of the speaker, I gain something just by watching the presentation–he points out interesting technologies and usually has a list of resources at the end that I can use for further research.

I think Michael Yuan would be a great speaker at a JUG, as he seems to have a masterful understanding of Java 2 Platform, Micro Edition (J2ME). However, the true value of his book, Enterprise J2ME, was in its introduction of new ideas and concepts, and the extensive resource listings. This book is a survey of the current state of the art in mobile java technology. Whatever your topic is, except for gaming development, you’ll find some coverage here. Securing information on the device or network, XML parsing strategies, messaging architectures, and data synchronization issues are all some of the topics that Yuan covers.

My favorite chapter was Chapter 7, ‘End to End Best Practices.’ Here, Yuan covers some of the things he’s learned in developing his own enterprise applications, and offers some solutions to five issues that differ between the J2ME world and the worlds familiar to most Java developers: J2EE and J2SE. He offers capsule solutions to the issues of “limited device hardware, slow unreliable networks, pervasive devices, ubiquitous integration [and] the impatient user.” Later in the book, he explores various architectures to expand on some of these capsules.

However, the strength of this book, exposing the reader to a number of different mobile technologies, is also its weakness. JUG speakers very rarely dive into a technology to the point that I feel comfortable using it without additional research; I usually have to go home, download whatever package was presented, and play with it a bit to get a real feel for its usefulness. This book was much the same. Some of the chapters, like chapters 12 and 13, where issues with databases on mobile devices (CDC devices, not CLDC devices) weren’t applicable to my kind of development, but you can hardly fault Yuan for that. Some of the later chapters felt like a series of ‘hello world’ applications for various vendors. This is especially true of chapter 12, and also of chapter 20, which is a collection of recipes for encryption on the device.

Additionally, I feel like some of the points he raised in Chapter 7 are never fully dealt with. An example of this is section 7.3.3, “Optimize for many devices.” The project I’m on is struggling with this right now, but I had trouble finding any further advice on this important topic beyond this one paragraph section. However, these small issues don’t take away from the overall usefulness of the book–if you are developing enterprise software, you’ll learn enough from this book to make its purchase worthwhile.

However, I wouldn’t buy the book if you’re trying to learn J2ME. Yuan gives a small tutorial on basic J2ME development in Appendix A, but you really need an entire book to learn the various packages, processes and UI concerns of J2ME, whether or not you have previously programmed in Java. Additionally, if you’re trying to program a standalone game, this book isn’t going to have a lot to offer you, since Yuan doesn’t spend a lot of time focused on UI concerns and phone compatibility issues. Some of the best practices about limited hardware may be worth reading, and if it’s a networked game, however, you may gain from his discussions in Chapter 6, “Advanced HTTP Techniques.” In general though, I’m not sure there’s enough to make it worth a game developer’s while.

I bought this book because I’m working on a networked J2ME application, and it stands alone in its discussion of the complex architectural issues that such applications face. It covers more than that, and isn’t perfect, but it is well worth the money, should you be facing the kind of problems I am. Indeed, I wish I had had this book months ago, as I’m sure it would have improved the my current application.

Link to book on Amazon.

An IP address is to DNS as a URL is to Google

I just read this post from Mike Clark. Now, I agree with some of what he says. It’s true that it is a whole lot easier to remember terms you were searching for than a URL. Words and concepts are just plain easier to remember than strings where the slightest mistype will give you a 404 error. That’s why we use DNS rather than just typing in IP addresses everywhere. However, IP addresses work almost all the time, even when the DNS server is down or misconfigured. If I know the IP address of a mail server, then I can still check my email even when I can’t resolve its domain name.

This is true of the search engine/URL dichotomy as well. Have you noticed the size of the uproar when Google changes PageRank? Every time a search engine changes its ranking algorithms, it will throw into havoc any sites you’ve memorized via search terms. And search engines change their systems more often than DNS goes down. But cool URIs [URLs] don’t change.

Another issue is that when it’s so easy to search vast amounts of information, you don’t end up looking anywhere else. This rant, which circulated a few months ago, highlights that issue. It’s almost like, if you can’t find something online, you can’t be bothered to find out about it. I do it myself. Even results of search engine queries don’t get fully explored. How often have you viewed anything other than the first page at google?

I understand the power and love of search engines, but folks, including myself, need to be sure to understand the implications of using them as shorthand for permanent links and/or shortcuts for true research.

How can you keep a website out of a search engine?

It’s an interesting problem. Usually, you want your site to be found, but there are cases where you’d rather not have your website show up in a search engine. There are many reasons for this: perhaps because google never forgets, or perhaps because what is on the website is truly private information: personal photos or business documents. There are several ways to prevent indexing of your site by a search engine. However, the only sure fire method is to password protect your site.

If you require some kind of username and password to access your website, it won’t be indexed from by any search engine robots. Even if a search engine finds it, the robot doing the indexing won’t be able to move past the login page, as they won’t have a username and password. Use a .htaccess if you have no other method of authenticating, since even simple text authentication will stop search engine robots. Intranets and group weblogs will find this kind of block useful. However, if it’s truly private information, make sure that you use SSL because .htaccess access control sends passwords in clear text. You’ll be defended from search engines, but not from people snooping for interesting passwords.

What if you don’t want people to be forced to remember a username and password? Suppose you want to share pictures of baby with Grandma and Grandpa, but don’t want to either force them to remember anything, nor allow the entire world to see your child dressed in a pumpkin suit. In this case, it’s helpful to understand how search engines work.

Most search engines start out with a given set of URLs, often submitted to them, and then follow all the links in a relentless search for more content (for more on this, see this excellent tutorial). Following the links means that submitters do not have to give the search engine each and every page of a site, as well as implying that any page linked to by a submitted site will eventually be indexed as well. Therefore, if you don’t want your site to be searched, don’t put the web sites URL any place it could be picked up. This includes archived email lists, Usenet news groups, and other websites. Make sure you make this requirement crystal clear to any other users who will be visiting this site, since all it takes is one person posting a link somewhere on the web, or submitting the URL to a search engine, for your site to be found and indexed. I’m not sure whether search engines look at domain names from whois and try to visit those addresses; I suspect not, simply because of the vast number of domains that are parked, along with the fact that robots have plenty of submitted and linked sites to visit and index.

It’s conceivable that you’d have content that you didn’t want searched, but you did want public. For example, if the information is changing rapidly: a forum or bulletin board, where the content rapidly gets out of date, or you’re EBay. You still want people to come to the web site, but you don’t want any deep links. (Such ‘deep linking’ has been an issue for a while, from 1999 to 2004.) Dynamic content (that is, content generated by a web server, usually from a relational database) is indexable when linked from elsewhere, so that’s no protection.

There are, however, two ways to tell a search engine, “please, don’t index these pages.” Both of these are documented here. You can put this meta tag: <meta name=”robots” content=”none”> in the <head> section of your HTML document. This lets you exclude certain documents easily. You can also create a robots.txt file, which allows you to disallow indexing of documents on a directory level. It also is sophisticated enough to do user-agent matching, which means that you can have different rules for different search engines.

Both of these latter approaches depend on the robot being polite and following conventions, whereas the first two solutions guarantee that search engines won’t find your site, and hence that strangers will have a more difficult time as well. Again, if you truly want your information private, password protect it and only allow logins over SSL.

imap proxy and horde

I’m implementing an intranet using the Horde suite of tools. This is written in PHP, and provides an amazing amount of out of the box, easily configured functionality. The most robust pieces are the mail client (incidentally, used by WestHost for webmail, and very scalable), the calendar, and the address book. There are a ton of other projects using the Horde framework, but most of them are in beta, and haven’t been officially released. Apparently these applications are pretty solid (at least, that’s what I get from reading the mail list) but I wanted to shy away from unreleased code. I am, however, anxiously awaiting the day that the new version is ready; as you can see from the demo site that it’s pretty sharp.

Anyway, I was very happy with the Horde framework. The only issue I had was that the mail application was very slow. I was connecting to a remote imap server, and PHP has no way to cache imap connections. Also, for some reason, the mail application reconnects to the imap server every time. However, someone on that same thread suggested using UP IMAP Proxy. This very slick C program was simple to compile and install on a BSD box, and sped up the connections noticeably. For instance, the authentication to the imap server (the only part of the application that I instrumented) went from 10 milliseconds to 1. It apparently caches the user name and password (as an MD5 hash) and only communicates with the imap server when it doesn’t have the information needed (for example, when you first visit, or when you’re requesting messages from your inbox). It does have some security concerns (look here and search for P_NEWLOG), but you can handle these at the network level. All in all, I’m very impressed with UP IMAP Proxy.

And, for that matter, I’m happy with Horde. I ended up having to write a small horde module, and while the framework doesn’t give you some things that I’m used to in the java world (no database pooling, no MVC pattern) it does give you a lot of other stuff (an object architecture to follow, single sign-on, logging). And I’m not aware of any framework in the java world that comes with so many applications ready to roll out. It’s all LGPL and, as I implied above, the released modules have a very coherent structure that makes it easy to add and subtract needed functionality.

Bravo Horde developers! Bravo imap proxy maintainer!

mod_alias to the rescue

Have you ever wanted to push all the traffic from one host to another? If you’re using apache, it’s easy. I needed to have all traffic from a http://www.foo.com site go to https://secure.foo.com. Originally, I was thinking of having a meta header redirect on the index.html page, and creating a custom 404 page that would also do a redirect.

Luckily, some folks corrected me, and showed me an easier way. mod_alias (ver 2.0) can do this easily, and as far as I can tell, transparently. I just put this line in the virtual server section for www.foo.com:

Redirect permanent / https://secure.foo.com/

Now, every request for any file from www.foo.com gets routed to secure.foo.com. And since they share the same docroot, this is exactly what I wanted.

To do this, make sure you have mod_alias available. It should be either compiled in (you can tell with httpd -l) or a shared library (on unix, usually called mod_alias.so). You have to make sure to load the shared library; see LoadModule and AddModule for more information.

My most popular posting

I don’t know why, but my post on yahoo mail problems is my most popular post thus far. I suspect it got picked up in google, or some other search engine, and is now serving as a place for folks to gripe about the free Yahoo! mail service. (Incidentally, I’m the second “Dan Moore” in google now! Meri has some interesting things to say about this intersection between the internet and real life.) This is interesting (and a bit amusing) to me for several reasons:

For one, there’s no helpful content on that posting for these folks problems. In fact, I don’t even use the free service from Yahoo (I pay extra for storage). And the posting concerns the short term problems a client of mine had with the new Yahoo mail interface, and how outsourcing exposes you to those types of risks. The comments are not germane to the posting.

Or should I say that the posting is not germane to the comments? As is ever the case on internet forums, this posting has been hijacked by people who want to complain and share possible fixes to a very real problem–they can’t get to their email (I’m cranky when I can’t to my email, after all). I don’t begrudge them the use of my site; this just reinforces what Clay Shirky wrote about social software–people will twist software until it does what they need it to do, and fighting that is a lost cause (and has been for 20 years).

And that’s not just true for software, but for technology in general. After all, I doubt anyone working on radar thought it would someday be used for re-heating leftovers, and I’m sure that Daguerre (the inventor of photography) would be shocked at some of the pictures I’ve taken at house parties.

IM Everywhere

The last two companies I worked at used instant messaging (IM) extensively in their corporate environment. They sent meeting notifications over IM, they used IM to indicate their availability for interactions, and they used it for the quick questions that IM is so good at handling (“hey John, can you bounce the server?”). IM has no spam, and is good at generating immediate responses.

I’m a latecomer to IM. I’ve used talk in college, but in a work environment, email was usually enough. And, I have to confess, when I’m programming, it’s not easy for me to task switch. And IM demands that, in the same way that a phone call does. How many times have you been deep in a conversation with someone, only to have their phone ring? You know what happens next: they say “can I get that?” and reach for the phone. Whatever flow and connection you had is disrupted.

Now, obviously you can configure IM to be less intrusive than a phone call, and the first thing I did was switch off all sound notifications in my yahoo IM client. However, the entire point of IM is to disrupt what you’re doing–whether it’s by playing a sound or blinking or popping up a window, the attraction of IM is that it is immediate.

I’ve found that ninety percent of people would rather talk to a person than look something up for themselves. (I am one of those ninety percent.) There are a number of reasons. It’s easier to ask unstructured questions. People are more responsive, and can come up with answers that you wouldn’t think to find on your own. And it’s just plain reassuring to find out what someone else thinks–you can have a mini discussion about the issue. This last is especially important if you aren’t even sure what you’re trying to find.

IM is a great help for this kind of ad-hoc discussion. However, it’s another distraction at work. The real question is, do we need more distractions at work? Jakob Nielsen doesn’t think so (see number 6) and I agree.

However, IM is becoming ingrained in these corporations, and I don’t see anything standing in the way of further adoption. The original impetus to write this essay was the astonishment I felt at these two facts:

1. the widespread corporate use of IM

2. the paucity of corporate level control over IM

In all the time I was working at these companies, I saw many many IMs sent. But I only heard one mention of setting up a corporate IM server (someone mentioned that one of the infrastructure projects, long postponed, was to set up a jabber server). Now, I don’t pretend that any corporate secrets were being exchanged, at least no more than are sent every day via unencrypted email. But every corporation of a decent size has control over its email infrastructure. I was astonished that no similar move had taken place yet for IM. Perhaps because IM is a young technology, perhaps because it is being rolled out from the bottom up, perhaps because it’s not (always) a permanent medium.

For whatever reason, I think that you’re going to see more and more IM servers (even MS has an offering) being deployed as businesses (well, IT departments) realize that IM is being heavily used and is not being monitored at all. Perhaps this is analogous to the explosion of departments static HTML intranets that happened in the late 1990s, which only came to an end when the IT department realized what was happening, and moved to standardize what became an important business information resource.

Jalopy

I like javadoc. Heck, I like documentation. But I hate adding javadoc to my code. It’s tedious, and I can never remember all the tags. I don’t use an IDE so the formatting gets to me.

After attending a presentation at BJUG about software tools, I investigated jalopy and I liked what I found. Now, jalopy is more than just a javadoc comment inserter, but javadoc insertion was my primary use of the tool. It may be piss poor for code formatting and whatnot, but it was pretty good at inserting javadoc. I was using the ant plug-in and the instructions were simple and straight forward. It didn’t blow away any existing comments, and it didn’t munge any files, once I configured it correctly. And there are, make no mistake, lots of configuration options.

Jalopy has a slick Swing interface to set all these configuration options, and you can export your configuration to an XML file which can be referenced by others. This, along with the ant integration, make it a good choice for making sure that all code checked in by a team has similar code formatting.

However, I do have a few minor quibbles with this tool.

1. The default configuration of javadoc is busted. When you run it, it javadocs methods and classes just fine, but any fields are marked with “DOCUMENT ME!” when they should be commented out: “/** DOCUMENT ME! */”. This means that, with the default configuration, you can’t even run the formatter twice, since jalopy itself chokes on the uncommented “DOCUMENT ME!”.

2. The configuration file is not documented anywhere that I could find. I looked long and hard on the Internet, and only found one example of a jalopy configuration file here. And this is apparently just the default options exported to a file. I’ve put up a sample configuration file here which fixes problem #1. (This configuration is only for javadoc; it accepts all other defaults.)

3. The zip file that you download isn’t in its own directory. This means that when you unassumingly unzip it, it spews all over your current directory.

None of these are show stoppers, that’s for sure. If you’re looking for a free, open source java code formatting tool, jalopy is worth a close look.