Webhacking with CVS

In the latest edition of 2600, there’s an article about webhacking with CVS. The basic premise of this article is that if you do a cvs checkout of your static html site to your webroot, you let folks with inquisitive minds and an understanding of CVS know more than you intended about your IT infrastructure. Read the article for more information.
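The article's point rests on the CVS subdirectory that a checkout leaves in every directory. That subdirectory holds files such as Root, which records where the repository lives, and Entries, which lists the files under version control. Below is a minimal, hypothetical audit sketch, not something taken from the article. It assumes you pass in the path of your webroot, walks the tree, and reports each CVS directory along with the repository location it would reveal.

```python
import os
from pathlib import Path


def find_cvs_metadata(webroot):
    """Return (cvs_dir, repository) pairs for every CVS directory under webroot.

    repository is the first line of CVS/Root, or None if that file is
    missing, unreadable, or empty.
    """
    findings = []
    for dirpath, dirnames, _files in os.walk(webroot):
        if "CVS" not in dirnames:
            continue
        cvs_dir = Path(dirpath) / "CVS"
        try:
            repository = (cvs_dir / "Root").read_text().splitlines()[0]
        except (OSError, IndexError):
            repository = None
        findings.append((str(cvs_dir), repository))
        # Pruning dirnames in place stops os.walk from descending into CVS itself.
        dirnames.remove("CVS")
    return findings
```

Calling find_cvs_metadata("/var/www/html") (a hypothetical webroot path) should return an empty list on a site that was deployed cleanly.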

However, it’s easy enough to defeat. The answer is to use the cvs export command, which generates exactly the same files as a checkout except without the CVS directories. Rolling out updates to a site via this command means, of course, that any changes you make to files in the web directory will be blown away. But it could be argued that it’s a good thing to force everything to go through CVS. It also means that you can’t make incremental updates as easily. It’s still possible; you just have to check out the source to some other place and copy the changed file over manually. Another option, which lets you do updates more easily, is rsync --cvs-exclude, which copies a checked-out tree to the webroot while skipping the CVS directories (along with the other files CVS would normally ignore).
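If you go the "check out somewhere else and copy" route, you can script the copy step. Here is a minimal Python sketch of that idea. The function name and paths are my own and hypothetical. It relies on shutil.copytree's ignore hook to leave out every directory named CVS. Unlike cvs export, it won't remove files that were deleted from the repository. Unlike rsync --cvs-exclude, it skips only the CVS directories and not the other patterns CVS ignores.

```python
import shutil


def deploy_without_cvs(checkout_dir, webroot):
    """Copy a CVS working copy into webroot, leaving out CVS metadata directories."""
    # ignore_patterns("CVS") matches any file or directory named CVS at every
    # level, so CVS/Root, CVS/Entries and CVS/Repository never reach the webroot.
    shutil.copytree(
        checkout_dir,
        webroot,
        ignore=shutil.ignore_patterns("CVS"),
        dirs_exist_ok=True,  # Python 3.8+: update an existing webroot in place
    )
```

You would run something like deploy_without_cvs("/home/me/site-checkout", "/var/www/html") (hypothetical paths) after each cvs update of the private checkout.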

Using either of these solutions makes it a bit tougher to move content to the website. But it makes things a whole lot more secure.

Business Process Crystallization

I’m in the process of helping a small business migrate an application that they use from a Paradox back end to a PostgreSQL back end. The front end will remain written in Paradox. (There are a number of reasons for this: they’d like to have a more robust database, capable of handling more users. Also, Paradox is on the way out. A Google search doesn’t turn up any pages from corel.com in the top 10. Ominous?)

I wrote this application a few years ago. Suffice it to say that I’ve learned a lot since then, and wish I could rectify a few mistakes. But that’s another post. What I’d really like to talk about now is how computer programs crystallize business processes.

Business processes are ‘how things get done.’ For instance, I write software and sell it. If I have a program to write, I specify the requirements, get the client to sign off on them (perhaps with some negotiation), design the program, code the program, test it, deploy it, make changes that the client wants, and maintain it. This is a business process, but it’s pretty fluid. If I need to gather additional requirements after design, I can do that. Most business processes are fluid, with a few constraints. These constraints can be positive: I need to get client sign off (otherwise I won’t get paid). Or they can be negative: I can’t program .NET because I don’t have Visual Studio.NET, or I can’t program .NET because I don’t want to learn it.

Computerizing tasks can make processes much, much easier. Learning how to mail merge can save time when invoicing, or sending out those ever impressive holiday gift cards. But everything has its cost, and computerizing processes is no different. Processes become harder to change after a program has been written or installed to ‘help’ with them. For small businesses, such process engineering is doubly calcifying, because few folks have time to think about how to make things better (they’re running just as fast as they can to stay in place) and also because computer expertise is at a premium. (Realizing that this is a fact, and that folks will take a less technically excellent solution if it’s maintainable by normal people, is what has helped Microsoft make so much money. The good is the enemy of the best, and if you can have a pretty good solution for one quarter of the price of a perfect solution, most folks will choose the first.)

So, what happens? People, being more flexible than computers, adjust themselves to the process, which, in a matter of months or years, may become obsolete. It may not do what they need it to do, or it may require them to do extra labor. However, because it is a known constraint and it isn’t worth the investment to change, it remains. I’ve seen cruft in computer programs (which one friend of mine once declared was nothing but condensed business knowledge), but I’m also starting to realize that cruft exists in businesses too. Of course, sweeping away business process cruft carries the same risks as getting rid of code cruft. There are costs to getting rid of the unneeded processes, and the cost of finding the problems, fixing them, documenting the new processes, and training employees on them may exceed the cost of just putting up with the old ones.

“A computer lets you make more mistakes faster than any invention in human history – with the possible exceptions of handguns and tequila.” -Mitch Ratcliffe, Technology Review, April 1992

A computer, for all its ability to instantaneously recall and process vast amounts of data, also crystallizes business processes, making them harder to change. In addition to letting you make mistakes quickly, it also forces you to let mistakes stand uncorrected.

Book Review: The Alchemist

This book, by Paulo Coelho, is, like all fables, written on many levels. It is ostensibly the story of a shepherd in Spain who, unlike so many people, follows his dreams. He does get a little help from the supernatural, but many of the story’s most interesting thoughts come from his musings on nature. His travels take him across the Mediterranean into Africa, where he meets several archetypal characters (the Man Afraid of Change, the Waiting Woman, the Wise Shaman, the Warrior Chief, the Cynical Fool), learns about himself and his dreams, and finds his destiny.

An interesting way to look at this story is to ask the question: who is the title character? Alchemy is such a potent idea–the changing of one element into another has had a grasp on the human mind for as long as we have known about elements. But, of course, alchemy has secondary meanings–an alchemist transforms. Is the boy an alchemist, for transforming himself and the lives of those around him? Is God the alchemist, for transforming the destinies of humanity? Is the reader the alchemist, for taking the fable and transforming its words into something personally meaningful?

My favorite part about this book was its gritty reality. I like epics, but there were no sweeping vistas and no übermensch heroes in this book. Everything the boy does (and we never learn his name) is something you and I could do. I guess that’s the point of the book.

Update: As ihath commented, you do learn the boy’s name. It’s revealed on the first page. But, as I remember, it’s not used much throughout the book, maintaining the everyman nature of the story.

“The Alchemist” at Amazon.

Book Review: The Long Dark Tea Time of the Soul

Updated 2/25/2007: Added Amazon link.

Douglas Adams is amazingly whimsical. If the Hitchhiker’s Guide to the Galaxy didn’t convince you of that, the Dirk Gently novels will. Gently is a detective, but no Sherlock Holmes. No, rather than ruling out the impossible to leave only the improbable, Gently prefers to believe the impossible, because it makes so much more sense than the improbable. He solves his case through ingenuity, luck, and a belief in the interconnectedness of all things.

A highlight for me is Dirk’s method of finding directions. He just follows someone who looks like they know where they are going. This, he says, doesn’t always get him to where he wanted to go, but almost always gets him to where he needs to be. If only we all had such faith!

This book is the second of two about the private eye. I don’t want to give away too much of the story, as it is definitely a mystery, but it covers some of the same ground as American Gods in a much less sinister manner. Everything has a reason and a rhyme in this book, even if at first encounter an event makes no sense, either to the characters or to the reader. While the ending is a bit abrupt for my taste, if you like whimsy, you’ll get an ample helping with this book.

Link to this book on Amazon.

Coding Standards

I went to the BJUG meeting tonight, and the topic was automatic code standardization tools. Tom Marrs gave a good presentation which covered four open source tools that integrate with ant:

Checkstyle checks that code fits existing guidelines. It comes configured to check against Code Conventions for the Java Programming Language. pmd is lint for Java; it actually has a page where you can see it run against itself. It also complains when it finds generic exceptions. Both of these tools show you where problems exist in your code, usually by generating a nice HTML report, but don’t modify the source.

The next two tools actually modify your .java files. cleanImports fixes erroneous import statements, and cleans up com.foo.* imports. It’s smart enough, supposedly, to only import the actual classes that are used in a particular file. Jalopy is a bit more ambitious, and attempts to fix missing javadoc, whitespace problems, brace placement and some other problems.

Now, you need a combination of these tools. The style checkers can be very strict, since they don’t have to be smart enough to fix the problems they find. The code beautifiers, on the other hand, actually fix the problems that they find. Tom made the good point that these programs can generate a lot of output, so it makes sense to prioritize which problems to fix. Especially when you aren’t starting with a blank slate, it makes a lot of sense to ignore some of the lesser evils (who cares about whitespace when you have a constant that isn’t static final?).
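Tom's point about prioritizing can be made concrete with a small triage step over the tools' output. This sketch assumes the findings have already been parsed into simple records. The Finding class, the three-level severity scale, and the rule names are all hypothetical, since Checkstyle and pmd each report problems in their own formats.

```python
from dataclasses import dataclass

# Hypothetical severity scale: higher numbers are worse.
SEVERITY = {"style": 1, "warning": 2, "error": 3}


@dataclass
class Finding:
    file: str
    line: int
    rule: str      # hypothetical rule name
    severity: str  # one of the keys in SEVERITY


def triage(findings, minimum="warning"):
    """Drop findings below `minimum`; sort the rest worst first, then by file and line."""
    floor = SEVERITY[minimum]
    kept = [f for f in findings if SEVERITY[f.severity] >= floor]
    return sorted(kept, key=lambda f: (-SEVERITY[f.severity], f.file, f.line))


report = [
    Finding("Order.java", 12, "TrailingWhitespace", "style"),
    Finding("Order.java", 3, "ConstantNotStaticFinal", "warning"),
    Finding("Invoice.java", 40, "EmptyCatchBlock", "error"),
]
for f in triage(report):
    print(f"{f.severity}: {f.file}:{f.line} {f.rule}")
```

With the default threshold, the whitespace nit drops out, and the non-final constant is listed after the more serious error. Lowering the threshold to "style" brings the nits back once the bigger problems are fixed.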

A member of the audience brought up a good point, which is that using these kinds of tools is at least as much a political problem as it is a software problem. Few folks are going to argue that a consistent coding standard doesn’t make maintenance easier, but I think that few folks are going to argue that it’s the most important factor, either. As I see it, there are a couple of different things you can do to enforce coding standards. I list these below in increasing order of intrusiveness.

1. Make the tools available

If you make the tools available on the project, folks will probably use them. After all, who likes writing crappy code? All these tools integrate with ant, and some integrate with popular IDEs. Make developers aware of the tools, add the targets to your standard build files, and encourage folks to use them.

2. Get buy in from the team

If you’re on a team, it may make sense to have a ‘tools meeting’ at the beginning of a project (or in the middle, for that matter). Decide on basic standards (and remember, the location of braces isn’t really that important), after explaining how they make folks’ lives easier. Build a consensus that using one or two of these tools is a good thing to do, and should be done before code is checked in.

3. Have senior staff dictate usage: ‘thou shalt use pmd’

If the senior members of a team feel strongly about this, they can make a preemptive decision that the tools must be used. I’ve been on a few projects where this happened, and I can’t say that it was a huge issue. After all, the senior staff make lots of arbitrary decisions (well, they look arbitrary) about architecture, team membership, etc. One more won’t hurt too much.

4. Make running the tools required before check in

You can put wrapper scripts around CVS. I’ve seen it done on the client side, but this can be circumvented by just running the cvs command directly. You can also do it on the server side. I’m not sure what the best option is, but this is a large hammer to wield: it ensures that the code meets a standard, but it also displays distrust that the coder can and will do the right thing on their own. Not exactly the kind of attitude you want to convey to folks you’re paying to think for you.
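Whichever side it runs on, such a wrapper boils down to a script that looks at the files being committed and exits non-zero to stop the commit. On the server side, CVS's commitinfo file can run a program before each commit, and the commit is aborted if that program fails. The sketch below is a hypothetical stand-in for the real tools. Rather than invoking Checkstyle or pmd, it only flags Java wildcard imports like com.foo.*, the kind of thing cleanImports tidies up.

```python
import re
import sys

# Matches Java wildcard imports such as "import com.foo.*;" (static ones too).
WILDCARD_IMPORT = re.compile(r"^\s*import\s+(static\s+)?[\w.]+\.\*\s*;")


def wildcard_imports(path):
    """Return (line_number, text) for each wildcard import in a Java source file."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as src:
        for number, line in enumerate(src, start=1):
            if WILDCARD_IMPORT.match(line):
                hits.append((number, line.strip()))
    return hits


def main(paths):
    """Report violations on stderr; return 1 (refuse the commit) if any were found."""
    failed = False
    for path in paths:
        if not path.endswith(".java"):
            continue  # skip non-Java arguments, e.g. a repository directory
        for number, text in wildcard_imports(path):
            print(f"{path}:{number}: wildcard import: {text}", file=sys.stderr)
            failed = True
    return 1 if failed else 0
```

To use it from a hook or client-side wrapper, end the script with sys.exit(main(sys.argv[1:])). Arguments that aren't .java files are simply skipped. That covers the repository directory, which commitinfo passes ahead of the file names.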

I think that these automatic tools are great. Code inspection, especially of a large number of classes, is something that programs are well suited for: there’s a clear set of rules, and it’s a repetitive, boring task. But make sure that you don’t forget the human element. What happens to the reported problems? No matter how much the code is automagically fixed, you need and want the programmer to look at the output of the tools, and strive to improve his or her code.

“Choose the right tool for the job!”

When you’re writing a program to perform some business function, there are usually many different options. Whether it’s the particular language, the database, the platform, or the hardware, you have to make some decisions. Like a carpenter, who chooses screws when he needs to attach two planks and a saw when he needs to shorten a dowel, programmers are supposed to choose the correct tool for the task. However, since programming is so new, changes so much, and is so abstract, it’s a bit more complex than that.

There are many things that affect the right tool, and some of the considerations aren’t directly technical:

Strategic change is one criterion. When I was working at a consultancy in 2000, there was a grand switch in language choice. perl was out, java was in. This decision was not made at a technical level, but rather at a strategic one. No matter that we’d have to retrain folks, and throw away a significant portion of our code base. No matter that for the sites we were doing, perl and java were a wash, except for the times that java was overkill. What did matter is that the future was seen to belong to java, and management didn’t want to be left behind.

Cost of the solution is another important factor. TCO (total cost of ownership) is a buzzword today, but it’s true that you need to look at more than the initial cost of any piece of technology to get an idea of the true price. Linux has an initial cost of $0, but its TCO certainly isn’t zero. There’s the cost of maintaining it, the cost of paying for administrators, the upgrade cost, the security patch cost, the retraining cost, and the lock-in cost. Windows is the same way. Though it’s hard to put a number on it, it’s clear that the future cost of a Windows server is not going to be minimal, as you’ll eventually be forced to upgrade or provide support yourself.
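Why doesn't a $0 purchase price mean a $0 TCO? It comes down to simple arithmetic: the recurring costs add up year after year. Here is a back-of-the-envelope sketch. Every figure in it is made up for illustration. It also deliberately ignores one-off upgrades and the time value of money.

```python
def total_cost_of_ownership(initial, yearly_costs, years):
    """Initial outlay plus every recurring cost, summed over `years` years."""
    return initial + years * sum(yearly_costs.values())


# Hypothetical annual costs for a server whose license costs $0.
free_os_yearly = {
    "administration": 5000,
    "security patches": 1000,
    "retraining": 2000,
}
print(total_cost_of_ownership(0, free_os_yearly, years=3))  # 24000
```

Even with a zero initial cost, three years of (invented) running costs come to $24,000. Comparing technologies honestly means filling in the same categories for each option.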

The type of problem is another reason to prefer one technology over another. Slashdot is a database-backed website. They needed speed (because of the vast number of hits they receive daily), but they didn’t need transactions. Hence, mysql was a perfect datastore: it didn’t (at the time) support transactions, which they didn’t need, but it was very fast.

The skill sets of folks available for implementation also should affect the choice. I recently worked at a company with a large number of perl applications that were integral to the company’s operations. But they are slowly replacing all of them, because most of the folks working there don’t know perl. And it’s not just the skill set of the existing workers, but also the pool of available talent. I’ve heard great things about Lisp and how efficient Lisp programmers can be, but I’d never implement a business function in Lisp, because it’d be very hard to find someone else to maintain it.

The existing environment is a related influence. If everything in your organization is Windows, then a unix solution, no matter how elegant it may be to one particular problem, is going to be a poor choice. If all your previous applications were written in perl, your first java application is probably going to use perlish data structures and program flow, and is probably going to be a poor java program. I know my first server side java fell into this pit.

Time is also a factor, in a couple of different senses. How quickly are you trying to churn this code out? Do you have time to do some research into existing solutions and best practices, or to build a prototype and then throw it away? If not, then you should probably use a tool/solution that you’re familiar with, even if it’s not the best solution. Some tools add to productivity and some languages are made for quick prototyping (perl!). How long will the code be around? The answer to that is almost always ‘longer than you think,’ although in some of the projects I worked on, it was ‘only as long as the dot com boom lasts.’ You need to think about the supportability of the platform. I’m working with a Paradox client server application right now. As much as I dislike the MS monopoly, I wish it were Access, because there’s simply more information out there about Access.

There are many factors to consider when you choose a technology, and the best way to choose is not obvious, at least to me. Every single consideration outlined above could be crucial to a given project. Or it might be a no brainer. You can’t really know if you’ve chosen the correct technology until you’ve built the project out, and then, unless you have a forgiving boss or client, it’s probably too late to correct the worst of the mistakes. No wonder so many software projects fail.

Why I hate IDEs

I’m working on a project with WebSphere Studio Device Developer (WSDD), and it constantly reminds me of why I hate integrated development environments (IDEs).

Why do I hate IDEs? Let me count the ways.

1. It’s a whole new interface that you have to learn. How do I save files? How do I save a project? How do I move around in an editor? All these questions need to be answered when I move to a new IDE, but they all lead to a more fundamental question: why should I have to relearn how to use my keyboard every time I get a new IDE?

2. One way to do things. Most IDEs have one favored way of doing anything. They may support other means, but only haphazardly. For instance, WSDD supports my editing files directly on the filesystem, rather than through its editor, but freaks out if I use CVS outside of it (it has problems if I use most anything other than commit and update). But sometimes you aren’t even allowed the alternate method. I’m trying to move a project that was developed against one CVS repository to another CVS repository. WSDD lets you change repository information, but only if the project is still talking to the *same* host with the *same* cvs root. Thanks a lot, guys.

3. IDEs are big pieces of code, and as the size of a piece of code increases, its stability tends to decrease. In addition, they are updated a lot with new features (gotta give the companies some reason to buy, right?). This means that you have to stay aware of the environment (how often do I have to save, what workarounds do I have to use) and hence are less focused on what you’re really trying to do, which is write code.

4. My biggest gripe with IDEs, however, is that they do stuff I don’t understand when I write or compile code. Now, I don’t think I should have to understand everything: I have no idea how GCC works in anything other than the most abstract sense. But an IDE is something that I interact with every day, and it’s my main interface to the code. When one does something I don’t understand to the product of my time, that scares me. I’m not just talking about code generation, although that is scary enough (just because something else generated the code, that doesn’t mean that it is right or that you won’t have to wade in and maintain it; and maintaining stuff that I write from the ground up is hard enough for me without bringing in machine generated code). I’m also talking about all the meta state that IDEs like to maintain: what files are included where, external libraries, etc. Now, of course, something has to maintain that state, but does it have to be a monolithic program with (probably) poor documentation on that process?

Some people will say that IDEs aren’t all that bad and can lead to huge productivity increases. This may be the case during the coding phase, but how much of that is lost when you have to learn a new IDE’s set of behaviors or spend time figuring out why your environment is broken by the IDE? Give me simple, reliable, old-fashioned and well understood tools any day over slick, expensive tools that I don’t know and will have to learn each time I use a new one.

Will you be my friendster?

Friendster is an interesting phenomenon. The premise of the site is that it’s easier to meet and become friends with folks if you are somehow connected to them. This is common sense and much validated by my experience: one of the things that made meeting folks in hostels easy when traveling was that you knew you had at least one thing in common with them, namely an interest in travel. And this is true of other clubs and special interest groups. The Elks, adult sports teams, volunteer organizations, book discussion groups: all these are venues for adults to hang out with other people, knowing they have a common interest (which is whatever the purpose of the organization is).

Friendster takes this to a new level by automating the social connections from which we have all benefited. Instead of having to introduce all my college friends to all my friends in Boulder, I can just invite both groups to Friendster, and let them check each other out. Of course, this is a pale imitation of true networking, but it’s a start. And, as many folks can attest, something that starts out as a simple online friendship can become as deep and real as any other.

What’s interesting to me is that the level of effort to ‘get to know’ someone is very much reduced. You just look at their profile and you see what’s important to them. It’s almost as though there’s another level of friendship being created–you know more about these people than strangers or acquaintances, but less than real friends. I’ve had people email me, asking me to be their ‘friendster.’ This level of familiarity is disintermediated (I can operate entirely virtually) and permanent (unless I delete my profile, it’s going to be there as long as Friendster is around) and public (anyone connected to me can see my profile–family, friends, enemies). This means that the level of intimacy and sharing on Friendster is drastically less than you’d find at other ‘meeting places,’ including a house party.

Another interesting topic is how the heck Friendster is going to survive. They’ve obviously put a lot of time and effort into their software. (For that matter, the members of Friendster have also put in a substantial time and data commitment.) How can the website make money (at least enough to make the site a wee bit faster)? I can see four ways:

1. Selling user information. Not very palatable, and I think this would drastically affect the quality of information that folks would be willing to give them. I’m not a big fan of giving corporations something valuable of mine to sell, and my connections definitely are valuable to me. In terms of selling information generated by users, by posting anything to Friendster, I grant them “an irrevocable, perpetual, non-exclusive, fully paid, worldwide license to use, copy, perform, display, and distribute such information and content and to prepare derivative works of, or incorporate into other works, such information and content, and to grant and authorize sublicenses of the foregoing.” In terms of selling information about users, their privacy policy doesn’t mention the possibility, other than specifying that if they change their use of personal information, they’ll email us.

2. Advertising–they already have some on the site, but we’ve all seen how profitable advertising funded websites are. Even if you’re getting a tremendous number of hits a day, the advertising has to be very focused to be successful.

3. Selling subscriptions. This is definitely coming down the pike. It will be interesting to see how many folks bail. Personally, the content on Friendster just isn’t compelling enough to pay for. If I wanted to stay in contact with old friends, especially in the age of free long distance on the weekends, I’d just call them.

4. Affiliation with product vendors. This would be easy to implement (after all, Friendster is already capturing book, movie and music info about users), wouldn’t impinge on current usage, and would offer a valuable service to users. Frankly, I’m surprised they haven’t done it already.

I like Friendster, and I like the idea of a new set of folks to ask questions of, interact with, and send email to. But I’m just not sure how long it’s going to survive. Enjoy it while it’s here.

Software as commodity

So, I was perusing the Joel On Software archives last night, and came upon Strategy Letter V in which Joel expounds on the economics of software. In particular, he mentions that commodifying hardware is easier than commodifying software. This is because finding or building substitutes for software is hard.

Substitutes for any item need to have the same attributes and behavior. The new hard drive that I install in my computer might be slower or faster, larger or smaller, but it will definitely save files and be accessible to my operating system. There are two different types of attributes of any substitute. There are required attributes (can the hard drive save files?) and ancillary attributes (how much larger is the hard drive?). A potential substitute can have all the ancillary features in the world, but it isn’t a substitute until it has all the required features. The catch to building a substitute is knowing which attributes are required and which are ancillary. Spending too much time on ancillary ones can lead to the perfect being the enemy of the good, but spending too little means that you can’t compete on features (because, by definition, all your viable competitors will have all the required features). (For an interesting discussion of feature selection, check out this article.)

Software substitutes are difficult because people don’t like change (not in applications, not in URLs, not in business). And software is how the user interacts with the computer, so the user determines the primary attributes of any substitute. And those are different with every user, since every user uses their software in a different manner.

But, you can create substitutes for software, especially if

  1. The users are technically apt (because such users tend to resent learning new things less).
  2. You take care to mimic user interfaces as much as you can, to minimize the new things a user has to learn.
  3. It’s a well understood problem, which means the solutions are probably pretty well understood also (open standards can help with this as well).

Bug tracking software is an example of this. Now, I’m not talking about huge defect tracking systems like Rational’s ClearQuest that can, if you believe the marketing, track a bug throughout the software life cycle, up into requirements and out into deployment. I’m talking about tools that allow small teams to write better code by making sure nothing slips between the cracks. I’ve worked with a number of these tools, including Joel’s own FogBUGZ, TestTrack, Mozilla’s Bugzilla and PHPBT. I have to say that I think the open source solutions (Bugzilla and PHPBT) are going to eat the commercial solutions’ lunch for small projects, because they are a cheaper substitute with all the required attributes (bug states, email changes, users, web access).
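The required-versus-ancillary distinction comes down to a subset test. A candidate counts as a substitute only if it covers every required attribute, and extras don't count toward that. Here is a minimal sketch that uses the bug tracker attributes listed above as the required set. The two candidate feature sets are hypothetical and don't describe any real product.

```python
# Required attributes for a small-team bug tracker, as listed in the text.
REQUIRED = {"bug states", "email changes", "users", "web access"}


def missing_features(candidate, required=REQUIRED):
    """Required attributes the candidate lacks (ancillary extras are ignored)."""
    return set(required) - set(candidate)


def is_substitute(candidate, required=REQUIRED):
    """A candidate substitutes only if it has all the required attributes."""
    return not missing_features(candidate, required)


# Hypothetical candidates.
feature_rich = {"bug states", "users", "charts", "time tracking"}
bare_bones = {"bug states", "email changes", "users", "web access"}
print(is_substitute(feature_rich), sorted(missing_features(feature_rich)))
# False ['email changes', 'web access']
print(is_substitute(bare_bones))  # True
```

The feature-rich candidate loses despite its extras, while the bare-bones one qualifies. After that point, ancillary features are what it competes on.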

I used to like Bugzilla, but recently have become a fan of PHPBT because it’s even simpler to install. All you need is local access to sendmail, a mysql database and a web server (all of which WestHost provides for $100/year, or which you can get on a $50 Redhat CD and install on an old Intel box). It tracks everything that you’d need to know. It ain’t elegant, but it works.

I think that in general, the web has helped to commodify software, just because it imposes a certain uniformity of user interface. Everyone expects to use forms, select boxes, and the back button. However, as eBay knows and Yahoo! Auctions found out, there are other factors that prevent substitution of web applications.

Book Review: Afghanistan

Updated 2/25/2007: Added Amazon link.

Afghanistan: A Short History of Its People and Politics, by Martin Ewans, is a fantastic book. This fascinating account of this plucky country was chock full of facts that have immediate relevance. Covering the period from ancient times to 2002, this book provides a traditional history: no stories of the working classes or women. But it covers the byzantine regime changes of Afghanistan very well. It also does a fine job of explaining how the Afghan state was in constant tension between the local tribal powers and the more modern central authority of the king. The foreign situation was also an exercise in balance, with the Afghans depending on money, guns and expertise from British India to fend off the Russian Empire. However, the relationship with the Brits wasn’t entirely golden, as the three Anglo-Afghan wars suggest.

While the history was intensely interesting, the last chapters of the book, which cover the politics and battles of the last two decades which have left Afghanistan such a mess, were the most relevant for me. If you want to know how much the CIA spent supporting the Taliban, it’s in there. If you want to know which external nations supported which of the warring factions, it’s in there. If you want to know why Afghanistan grows the majority of the world’s opium, it’s in there.

I won’t say this book was easy to get through. The writing is quite dense. The frequent re-appearance of characters was at times confusing, but I fear that is more a feature of Afghan history than a shortcoming of the book. For a concise political history of a nation that we’re becoming more and more involved with, check it out.

Link to the book on Amazon.

© Moore Consulting, 2003-2017