Checking the status of your files, using CVS

When I used CVS a few years ago, I remember a colleague writing a tremendous perl script that you could run from anywhere in the CVS source tree. It would let you know whether you had files that weren’t in CVS, needed to be updated, or were going to be merged. It was quite a nice piece of perl code, which essentially parsed the output of cvs status, and the information it output was quite useful at the end of a long bug fixing or coding session (“hey, what files did I change again?”). However, it also needed to be maintained and documented, as well as explained to users.

The other day, I stumbled on something which works almost as well, but is part of CVS already: cvs -qn up. The q option tells CVS to be quiet, and not chat about all the directories that it sees. The n option tells CVS not to make any changes on the filesystem, but just tell you what changes it would have made. Here’s some sample output:

[moore@localhost guide]$ cvs -qn up
? securityTechniques/NewStuff.rtf
M securityTechniques/InputValidation.rtf
M securityTechniques/SessionManagement.rtf
U securityTechniques/AuthenticationWorkingDraft.doc

M means that the file has been changed locally. ? means that the file exists locally, but not in the repository. U means that the file has changed in the repository, but has not yet been updated locally. For more information on the output of update, look here.
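If you only care about one of these flags, the output is easy to filter. Here's a quick sketch; the sample text stands in for real output (it reuses the file names from above), but in a working tree you would pipe `cvs -qn up` straight into awk:

```shell
# Sample `cvs -qn up` output; in practice, replace the printf with
# the cvs command itself: cvs -qn up | awk '$1 == "M" { print $2 }'
sample='? securityTechniques/NewStuff.rtf
M securityTechniques/InputValidation.rtf
M securityTechniques/SessionManagement.rtf
U securityTechniques/AuthenticationWorkingDraft.doc'

# Print only the files whose status flag is M (modified locally).
printf '%s\n' "$sample" | awk '$1 == "M" { print $2 }'
```

Swap `"M"` for `"?"` or `"U"` to list unknown or out-of-date files instead.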

Use this command and never lose track of the files in your CVS tree again.


Book Review: Enterprise J2ME

Update 2/25/07: added Amazon link.

I go to Java Users Groups (yes, I’m struggling to get in touch with my inner geek) once every two or three months. Sometimes there’s an engaging speaker, but most of the time the fellow up front looks like he’s just swallowed a hot pepper, speaks like he has a permanent stutter, and answers questions like I’m speaking Greek. (I’m not making fun; I had a hard time when I was in front of a JUG too.) Regardless of the quality of the speaker, I gain something just by watching the presentation–he points out interesting technologies and usually has a list of resources at the end that I can use for further research.

I think Michael Yuan would be a great speaker at a JUG, as he seems to have a masterful understanding of Java 2 Platform, Micro Edition (J2ME). However, the true value of his book, Enterprise J2ME, was in its introduction of new ideas and concepts, and the extensive resource listings. This book is a survey of the current state of the art in mobile java technology. Whatever your topic, with the exception of game development, you’ll find some coverage here. Securing information on the device or network, XML parsing strategies, messaging architectures, and data synchronization issues are just some of the topics that Yuan covers.

My favorite chapter was Chapter 7, ‘End to End Best Practices.’ Here, Yuan covers some of the things he’s learned in developing his own enterprise applications, and offers some solutions to five issues that differ between the J2ME world and the worlds familiar to most Java developers: J2EE and J2SE. He offers capsule solutions to the issues of “limited device hardware, slow unreliable networks, pervasive devices, ubiquitous integration [and] the impatient user.” Later in the book, he explores various architectures to expand on some of these capsules.

However, the strength of this book, exposing the reader to a number of different mobile technologies, is also its weakness. JUG speakers very rarely dive into a technology to the point that I feel comfortable using it without additional research; I usually have to go home, download whatever package was presented, and play with it a bit to get a real feel for its usefulness. This book was much the same. Some of the chapters, like chapters 12 and 13, which cover issues with databases on mobile devices (CDC devices, not CLDC devices), weren’t applicable to my kind of development, but you can hardly fault Yuan for that. Some of the later chapters felt like a series of ‘hello world’ applications for various vendors. This is especially true of chapter 12, and also of chapter 20, which is a collection of recipes for encryption on the device.

Additionally, I feel like some of the points he raised in Chapter 7 are never fully dealt with. An example of this is section 7.3.3, “Optimize for many devices.” The project I’m on is struggling with this right now, but I had trouble finding any further advice on this important topic beyond this one-paragraph section. However, these small issues don’t take away from the overall usefulness of the book–if you are developing enterprise software, you’ll learn enough from this book to make its purchase worthwhile.

However, I wouldn’t buy the book if you’re trying to learn J2ME. Yuan gives a small tutorial on basic J2ME development in Appendix A, but you really need an entire book to learn the various packages, processes and UI concerns of J2ME, whether or not you have previously programmed in Java. Additionally, if you’re trying to program a standalone game, this book isn’t going to have a lot to offer you, since Yuan doesn’t spend a lot of time on UI concerns and phone compatibility issues. Some of the best practices about limited hardware may be worth reading, and if it’s a networked game, you may gain from his discussions in Chapter 6, “Advanced HTTP Techniques.” In general, though, I’m not sure there’s enough to make it worth a game developer’s while.

I bought this book because I’m working on a networked J2ME application, and it stands alone in its discussion of the complex architectural issues that such applications face. It covers more than that, and isn’t perfect, but it is well worth the money, should you be facing the kind of problems I am. Indeed, I wish I had had this book months ago, as I’m sure it would have improved my current application.

Link to book on Amazon.


An IP address is to DNS as a URL is to Google

I just read this post from Mike Clark. Now, I agree with some of what he says. It’s true that it is a whole lot easier to remember terms you were searching for than a URL. Words and concepts are just plain easier to remember than strings where the slightest mistype will give you a 404 error. That’s why we use DNS rather than just typing in IP addresses everywhere. However, IP addresses work almost all the time, even when the DNS server is down or misconfigured. If I know the IP address of a mail server, then I can still check my email even when I can’t resolve its domain name.

This is true of the search engine/URL dichotomy as well. Have you noticed the size of the uproar when Google changes PageRank? Every time a search engine changes its ranking algorithms, it wreaks havoc on any sites you’ve memorized via search terms. And search engines change their systems more often than DNS goes down. But cool URIs [URLs] don’t change.

Another issue is that when it’s so easy to search vast amounts of information, you don’t end up looking anywhere else. This rant, which circulated a few months ago, highlights that issue. It’s almost like, if you can’t find something online, you can’t be bothered to find out about it. I do it myself. Even results of search engine queries don’t get fully explored. How often have you viewed anything other than the first page of results at Google?

I understand the power and love of search engines, but folks, including myself, need to be sure to understand the implications of using them as shorthand for permanent links and/or shortcuts for true research.


How can you keep a website out of a search engine?

It’s an interesting problem. Usually, you want your site to be found, but there are cases where you’d rather not have your website show up in a search engine. There are many reasons for this: perhaps because Google never forgets, or perhaps because what is on the website is truly private information: personal photos or business documents. There are several ways to prevent indexing of your site by a search engine. However, the only surefire method is to password protect your site.

If you require some kind of username and password to access your website, it won’t be indexed by any search engine robots. Even if a search engine finds it, the robot doing the indexing won’t be able to move past the login page, as it won’t have a username and password. Use a .htaccess file if you have no other method of authenticating, since even simple text authentication will stop search engine robots. Intranets and group weblogs will find this kind of block useful. However, if it’s truly private information, make sure that you use SSL, because .htaccess access control sends passwords in clear text. You’ll be defended from search engines, but not from people snooping for interesting passwords.
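As a sketch, the relevant .htaccess directives look something like the following; the realm name and password file path here are invented, so substitute your own:

```apache
# Basic authentication for everything in this directory and below.
# Realm name and AuthUserFile path are examples only.
AuthType Basic
AuthName "Private Area"
AuthUserFile /home/foo/.htpasswd
Require valid-user
```

You create the password file with Apache’s htpasswd utility (e.g., `htpasswd -c /home/foo/.htpasswd grandma`), and remember: without SSL, both this scheme and the passwords it guards travel in clear text.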

What if you don’t want people to be forced to remember a username and password? Suppose you want to share pictures of your baby with Grandma and Grandpa, but don’t want to force them to remember a login, nor allow the entire world to see your child dressed in a pumpkin suit. In this case, it’s helpful to understand how search engines work.

Most search engines start out with a given set of URLs, often submitted to them, and then follow all the links in a relentless search for more content (for more on this, see this excellent tutorial). Following the links means that submitters do not have to give the search engine each and every page of a site, as well as implying that any page linked to by a submitted site will eventually be indexed as well. Therefore, if you don’t want your site to be searched, don’t put the web site’s URL any place it could be picked up. This includes archived email lists, Usenet news groups, and other websites. Make sure you make this requirement crystal clear to any other users who will be visiting this site, since all it takes is one person posting a link somewhere on the web, or submitting the URL to a search engine, for your site to be found and indexed. I’m not sure whether search engines look at domain names from whois and try to visit those addresses; I suspect not, simply because of the vast number of domains that are parked, along with the fact that robots have plenty of submitted and linked sites to visit and index.

It’s conceivable that you’d have content that you didn’t want searched, but you did want public. For example, if the information is changing rapidly: a forum or bulletin board, where the content rapidly gets out of date, or you’re eBay. You still want people to come to the web site, but you don’t want any deep links. (Such ‘deep linking’ has been an issue for a while, from 1999 to 2004.) Dynamic content (that is, content generated by a web server, usually from a relational database) is indexable when linked from elsewhere, so that’s no protection.

There are, however, two ways to tell a search engine, “please, don’t index these pages.” Both of these are documented here. You can put this meta tag: <meta name="robots" content="none"> in the <head> section of your HTML document. This lets you exclude certain documents easily. You can also create a robots.txt file, which allows you to disallow indexing of documents on a directory level. It also is sophisticated enough to do user-agent matching, which means that you can have different rules for different search engines.
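For illustration, a robots.txt at the site root that blocks one directory for all robots, and blocks a particular robot entirely, might look like this (the paths are made up):

```
# Keep all robots out of /photos/.
User-agent: *
Disallow: /photos/

# Keep this one robot out of the whole site.
User-agent: Googlebot
Disallow: /
```

Note that a blank line separates each User-agent record, and the rules within a record apply only to that agent.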

Both of these latter approaches depend on the robot being polite and following conventions, whereas the first two solutions guarantee that search engines won’t find your site, and hence that strangers will have a more difficult time as well. Again, if you truly want your information private, password protect it and only allow logins over SSL.


imap proxy and horde

I’m implementing an intranet using the Horde suite of tools. This is written in PHP, and provides an amazing amount of out of the box, easily configured functionality. The most robust pieces are the mail client (incidentally, used by WestHost for webmail, and very scalable), the calendar, and the address book. There are a ton of other projects using the Horde framework, but most of them are in beta, and haven’t been officially released. Apparently these applications are pretty solid (at least, that’s what I get from reading the mail list) but I wanted to shy away from unreleased code. I am, however, anxiously awaiting the day that the new version is ready; as you can see from the demo site, it’s pretty sharp.

Anyway, I was very happy with the Horde framework. The only issue I had was that the mail application was very slow. I was connecting to a remote imap server, and PHP has no way to cache imap connections. Also, for some reason, the mail application reconnects to the imap server every time. However, someone on that same thread suggested using UP IMAP Proxy. This very slick C program was simple to compile and install on a BSD box, and sped up the connections noticeably. For instance, the authentication to the imap server (the only part of the application that I instrumented) went from 10 milliseconds to 1. It apparently caches the user name and password (as an MD5 hash) and only communicates with the imap server when it doesn’t have the information needed (for example, when you first visit, or when you’re requesting messages from your inbox). It does have some security concerns (look here and search for P_NEWLOG), but you can handle these at the network level. All in all, I’m very impressed with UP IMAP Proxy.

And, for that matter, I’m happy with Horde. I ended up having to write a small horde module, and while the framework doesn’t give you some things that I’m used to in the java world (no database pooling, no MVC pattern) it does give you a lot of other stuff (an object architecture to follow, single sign-on, logging). And I’m not aware of any framework in the java world that comes with so many applications ready to roll out. It’s all LGPL and, as I implied above, the released modules have a very coherent structure that makes it easy to add and subtract needed functionality.

Bravo Horde developers! Bravo imap proxy maintainer!


mod_alias to the rescue

Have you ever wanted to push all the traffic from one host to another? If you’re using apache, it’s easy. I needed to have all traffic for http://www.foo.com go to https://secure.foo.com. Originally, I was thinking of having a meta header redirect on the index.html page, and creating a custom 404 page that would also do a redirect.

Luckily, some folks corrected me, and showed me an easier way. mod_alias (ver 2.0) can do this easily, and as far as I can tell, transparently. I just put this line in the virtual server section for www.foo.com:

Redirect permanent / https://secure.foo.com/

Now, every request for any file from www.foo.com gets routed to secure.foo.com. And since they share the same docroot, this is exactly what I wanted.

To do this, make sure you have mod_alias available. It should be either compiled in (you can tell with httpd -l) or a shared library (on unix, usually called mod_alias.so). You have to make sure to load the shared library; see LoadModule and AddModule for more information.
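In context, the virtual host block might look something like this; the server names match the example above, but the rest is a sketch rather than my exact configuration:

```apache
<VirtualHost *:80>
    ServerName www.foo.com
    # mod_alias: send every request, for any path, to the secure host,
    # preserving the rest of the URL (so /foo/bar.html follows along).
    Redirect permanent / https://secure.foo.com/
</VirtualHost>
```

The `permanent` keyword sends a 301 status code, so well-behaved clients and search engines will update their links to point at secure.foo.com.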


My most popular posting

I don’t know why, but my post on yahoo mail problems is my most popular post thus far. I suspect it got picked up in google, or some other search engine, and is now serving as a place for folks to gripe about the free Yahoo! mail service. (Incidentally, I’m the second “Dan Moore” in google now! Meri has some interesting things to say about this intersection between the internet and real life.) This is interesting (and a bit amusing) to me for several reasons:

For one, there’s no helpful content in that posting for these folks’ problems. In fact, I don’t even use the free service from Yahoo (I pay extra for storage). And the posting concerns the short term problems a client of mine had with the new Yahoo mail interface, and how outsourcing exposes you to those types of risks. The comments are not germane to the posting.

Or should I say that the posting is not germane to the comments? As is ever the case on internet forums, this posting has been hijacked by people who want to complain and share possible fixes to a very real problem–they can’t get to their email (I’m cranky when I can’t get to my email, after all). I don’t begrudge them the use of my site; this just reinforces what Clay Shirky wrote about social software–people will twist software until it does what they need it to do, and fighting that is a lost cause (and has been for 20 years).

And that’s not just true for software, but for technology in general. After all, I doubt anyone working on radar thought it would someday be used for re-heating leftovers, and I’m sure that Daguerre (the inventor of photography) would be shocked at some of the pictures I’ve taken at house parties.


IM Everywhere

The last two companies I worked at used instant messaging (IM) extensively in their corporate environment. They sent meeting notifications over IM, they used IM to indicate their availability for interactions, and they used it for the quick questions that IM is so good at handling (“hey John, can you bounce the server?”). IM has no spam, and is good at generating immediate responses.

I’m a latecomer to IM. I used talk in college, but in a work environment, email was usually enough. And, I have to confess, when I’m programming, it’s not easy for me to task switch. And IM demands that, in the same way that a phone call does. How many times have you been deep in a conversation with someone, only to have their phone ring? You know what happens next: they say “can I get that?” and reach for the phone. Whatever flow and connection you had is disrupted.

Now, obviously you can configure IM to be less intrusive than a phone call, and the first thing I did was switch off all sound notifications in my yahoo IM client. However, the entire point of IM is to disrupt what you’re doing–whether it’s by playing a sound or blinking or popping up a window, the attraction of IM is that it is immediate.

I’ve found that ninety percent of people would rather talk to a person than look something up for themselves. (I am one of those ninety percent.) There are a number of reasons. It’s easier to ask unstructured questions. People are more responsive, and can come up with answers that you wouldn’t think to find on your own. And it’s just plain reassuring to find out what someone else thinks–you can have a mini discussion about the issue. This last is especially important if you aren’t even sure what you’re trying to find.

IM is a great help for this kind of ad-hoc discussion. However, it’s another distraction at work. The real question is, do we need more distractions at work? Jakob Nielsen doesn’t think so (see number 6) and I agree.

However, IM is becoming ingrained in these corporations, and I don’t see anything standing in the way of further adoption. The original impetus to write this essay was the astonishment I felt at these two facts:

1. the widespread corporate use of IM

2. the paucity of corporate level control over IM

In all the time I was working at these companies, I saw many many IMs sent. But I only heard one mention of setting up a corporate IM server (someone mentioned that one of the infrastructure projects, long postponed, was to set up a jabber server). Now, I don’t pretend that any corporate secrets were being exchanged, at least no more than are sent every day via unencrypted email. But every corporation of a decent size has control over its email infrastructure. I was astonished that no similar move had taken place yet for IM. Perhaps because IM is a young technology, perhaps because it is being rolled out from the bottom up, perhaps because it’s not (always) a permanent medium.

For whatever reason, I think that you’re going to see more and more IM servers (even MS has an offering) being deployed as businesses (well, IT departments) realize that IM is being heavily used and is not being monitored at all. Perhaps this is analogous to the explosion of departments’ static HTML intranets that happened in the late 1990s, which only came to an end when the IT department realized what was happening, and moved to standardize what became an important business information resource.


Jalopy

I like javadoc. Heck, I like documentation. But I hate adding javadoc to my code. It’s tedious, and I can never remember all the tags. I don’t use an IDE so the formatting gets to me.

After attending a presentation at BJUG about software tools, I investigated jalopy and I liked what I found. Now, jalopy is more than just a javadoc comment inserter, but javadoc insertion was my primary use of the tool. It may be piss poor for code formatting and whatnot, but it was pretty good at inserting javadoc. I was using the ant plug-in and the instructions were simple and straightforward. It didn’t blow away any existing comments, and it didn’t munge any files, once I configured it correctly. And there are, make no mistake, lots of configuration options.

Jalopy has a slick Swing interface to set all these configuration options, and you can export your configuration to an XML file which can be referenced by others. This, along with the ant integration, make it a good choice for making sure that all code checked in by a team has similar code formatting.
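For what it’s worth, the ant integration can be sketched roughly as follows; the jar location, convention file name, and source directory here are examples, and will vary with your version and layout:

```xml
<!-- Register the Jalopy ant task. The jar path is hypothetical;
     point it at wherever the Jalopy ant plug-in jar lives. -->
<taskdef name="jalopy"
         classname="de.hunsicker.jalopy.plugin.ant.AntPlugin">
  <classpath>
    <fileset dir="lib" includes="jalopy-ant-*.jar"/>
  </classpath>
</taskdef>

<!-- Format (and javadoc) every java file under src, using the
     configuration exported from the Swing preferences tool. -->
<target name="format">
  <jalopy convention="jalopy-convention.xml">
    <fileset dir="src" includes="**/*.java"/>
  </jalopy>
</target>
```

Checking the exported convention file into source control alongside the build file is what makes the team-wide consistency work.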

However, I do have a few minor quibbles with this tool.

1. The default configuration of javadoc is busted. When you run it, it javadocs methods and classes just fine, but any fields are marked with “DOCUMENT ME!” when they should be commented out: “/** DOCUMENT ME! */”. This means that, with the default configuration, you can’t even run the formatter twice, since jalopy itself chokes on the uncommented “DOCUMENT ME!”.

2. The configuration file is not documented anywhere that I could find. I looked long and hard on the Internet, and only found one example of a jalopy configuration file here. And this is apparently just the default options exported to a file. I’ve put up a sample configuration file here which fixes problem #1. (This configuration is only for javadoc; it accepts all other defaults.)

3. The zip file that you download isn’t in its own directory. This means that when you unsuspectingly unzip it, it spews files all over your current directory.

None of these are show stoppers, that’s for sure. If you’re looking for a free, open source java code formatting tool, jalopy is worth a close look.


Is the American tech worker obsolete?

I’ve been doing some thinking about offshoring recently. For those of you not in the IT industry, ‘offshoring’ is jargon for outsourcing of white collar jobs to developing countries like India and China. There’s been a lot of talk about this phenomenon: Salon (1 and 2), The Economist [sorry, it’s a pay article], InfoWorld, and CNET News.com have all commented recently. In addition to reading these articles and others like them, I’ve also talked to folks in the software industry, as well as friends who work in the aerospace and packaged goods manufacturing industries. And I’ve come to the conclusion that a certain amount of offshoring is inevitable, since the labor cost differential is so great.

However, like any other business fad (or any fad, for that matter), there are external costs that simply are not fully understood. Among these are

1. Loss of customers
The constant mantra of folks who are trying to tell IT workers how to adjust to losing their jobs is ‘retrain, retrain’. But retrain for what? What job isn’t offshorable? Nursing is the only one that comes to mind. If all virtualizable (i.e. can be done without face time) service jobs where there is a price differential get pushed overseas, that’s millions of jobs lost. The American economy has been the engine for most of the world’s growth over the last 10 years, sucking up exports from other countries to the tune of billions of dollars. What happens when the consumers of the USA don’t spend money, either because they are out of a job, or afraid of losing their job soon? It takes a visionary like Henry Ford to realize that if you grow the amount of folks who can buy your products, you grow your business.

2. Loss of future business leaders
If you export big chunks of your IT department, perhaps keeping only senior folks who can help manage external projects, then you win in the short run. However, eventually those folks will retire (ask NASA). And if you haven’t brought any entry level talent in, where will you go to replace these needed folks?

3. Difficulty of managing far flung teams from different cultures
This is difficult in two ways. One is the logistical aspect. If you want a teleconference, you have to adjust either your schedule or theirs. And if the country is on the opposite side of the globe, possibly both folks need to be in the office at an awkward time. The second difficulty is related to quality. Just as the Japanese products in the 50s and 60s were thought cheap and low quality, some of the work done overseas today is a lower standard than expected. This is true of all software of course, but it’s harder to control the quality when you write a spec and throw it over the wall.

In short, offshoring is in its infancy. More of it is coming down the line, but, as the hidden costs are discovered, the benefits of local teams will come back into focus. I can do much of my consulting work from my house, yet I still find that most companies want me to come into their office. Why? To some extent it’s control (should be careful not to blog my way out of a job here), but it also is because communication, in each direction, is easier with someone who’s on site. There’s extra effort expended in any kind of virtual communication. And by meeting me, the customer and I build a relationship, which is perhaps more important for trust in business than anything else.

I also want to address Michael Yuan’s comments:

“3. Coding is a dying profession in the long run. If the jobs have not been outsourced to developing countries, the new generation of model-focused automatic code generation tools will eliminate the need for basic coders anyway. The jobs that have a future are system designing and architecting (the real engineering jobs). I think the ability to design end-to-end systems using whatever tools available is an important skill for the future.”

I’ve commented enough, I hope, on the idea that all coding work will or should be outsourced to developing countries. But I think that the idea that ‘the need for basic coders’ will be eliminated due to improvements in ‘automatic code generation tools’ is foolish for a number of reasons.

1. If you don’t understand basic coding on a system, then when the automatic tool doesn’t do what you need it to, you’re screwed. Basic coding is not something you can pick up at school–you need to be out in the real world, working on crufty systems that are too expensive to change and older than dirt, to really appreciate what automatic tools can and cannot do, and what systems can and cannot do. In addition, easier code generation means more code out there, not less. And the easier the code is to generate, a la Visual Basic, the more likely the person doing the generating won’t get it right, or won’t document it, or won’t understand why certain behavior is exhibited. In that case, someone who has a deep, visceral understanding of the system and language will be called in.

2. Yuan talks about architecture and system design being the ‘real engineering jobs.’ I’ll agree that those are more stimulating and challenging than coding, and also have a higher value add. But again, how many folks are system architects right out of school? I certainly wouldn’t want to work on a project that had been designed by someone with exactly 0 years of real world experience. If you abolish entry level positions, you eat your seed corn, as I mention in my comments about offshoring above.

3. There is no silver bullet. The hard part isn’t writing the software, it’s determining what software needs to be written.

4. Someone’s got to write the tools. I remember an old science fiction story, by Isaac Asimov I think, about a world of taped learning. You took a test at 18 that determined what your skills were, and then you listened to a set of tapes that taught you all you needed to know. The protagonist was distraught because the test just didn’t seem to fit him–he was outcast and ridiculed for not knowing his profession. Of course it turned out that the folks that couldn’t be tape educated were exactly the folks who were able to write the tapes. Someone’s got to write the tools. (I don’t pretend, however, that anywhere near the same number of folks are needed to write tools as to use them.)

Michael and others are certainly correct that things are going to change in the software world. It’s not going to be possible to just know a language–you have to be involved in the business and maintain relationships. But change is nothing new for IT workers, right?

In short, it appears that Neal Stephenson was correct. I leave you with an excerpt from Snow Crash:

“When it gets down to it–talking trade balances here–once we’ve brain-drained all our technology into other countries, once things have evened out, they’re making cars in Bolivia and microwave ovens in Tadzhikistan and selling them here–once our edge in natural resources has been made irrelevant by giant Hong Kong ships and dirigibles that can ship North Dakota all the way to New Zealand for a nickel–once the Invisible Hand has taken all those historical inequities and smeared them out into a broad global layer of what a Pakistani brickmaker would consider to be prosperity–y’know what? There’s only four things we [America] do better than anyone else

music
movies
microcode (software)
high-speed pizza delivery”

And none of the four above are guaranteed (except perhaps the high-speed pizza delivery).

Happy New Year.



© Moore Consulting, 2003-2017