Hey, I like to work at the higher levels of the 7 Layer Burrito, the Application, Presentation and Session layers. But every so often, you have to dig a bit deeper. Currently, I’m troubleshooting a ColdFusion application that was converted from a local mysql database to a remote postgresql database. There are quite a few docs about optimizing postgresql, but they focus on query and local database optimization, and I think the issue was the network traffic (based on the load average of both the local and remote boxes). Anyway, I found this neat tool called IPTraf which gives you real time monitoring of ip traffic. Pretty nice, but avoid the US mirror of the binary build, since it’s not complete.
Hackers, by Steven Levy, should be required reading for anyone who programs computers for a living. It runs from the late 1950s, when the first hackers wrote code for the TX-0 and every instruction counted, to the early 1980s, when computers fully entered the consumer mainstream and it was marketing rather than hacking which mattered. Levy divides this time into three eras: that of the ‘True Hackers,’ who lived in the AI lab at MIT and spent most of their time on the PDP series, the ‘Hardware Hackers,’ mostly situated in Silicon Valley and responsible for enhancing the Altair and creating the Apple, and the ‘Game Hackers,’ who were also centered in California; expert at getting the most out of computer hardware, they were also the first to make gobs and gobs of money hacking.
The reason everyone who codes should read this book is to gain a sense of history. Because the field changes so quickly, it’s easy to forget that there is a history, and, as Santayana said, “Those who cannot remember the past are condemned to repeat it.” It’s also very humbling, at least for me, to see what kind of shenanigans were undertaken to get the last bit of performance from a piece of hardware that was amazing for its time, but now would be junked without a thought. And a third takeaway was the transformation that the game industry went through in the early 80s: first you needed technical brilliance, because the hardware was slow and new techniques needed to be discovered. However, at some point, the hard work was all done, and the business types took over. To me, this corresponds to the 1997-2001 time period, with the web rather than games being the focus.
That’s one of my beefs–the version I read was written in 1983, and republished, with a new afterword in 1993. So, there’s no mention of the new ‘4th generation’ of hackers, who didn’t have the close knit communities of the Homebrew Computer Club or the AI lab, but did have a far flung, global fellowship via email and newsgroups. It would be a fascinating read.
Beyond the dated nature of the book, Levy omits several developments that I think were fundamental to the development of the hacker mindset. There’s only one mention of Unix in the entire book, and no mention of C. In fact, the only languages he mentions are lisp, basic and assembly. No smalltalk, and no C. I also feel that he overemphasizes ‘hacking’ as a way that folks viewed and interacted with the world, without defining it. For instance, he talks about Ken Williams, founder of Sierra Online, ‘hacking’ the company, when it looked to me like it was simple mismanagement.
For all that, it was a fantastic read. The more you identify with the geeky, single males who were in tune with the computer, the easier and more fun a read it will be, but I still think that everyone who uses a computer could benefit from reading Hackers, because of the increased understanding of the folks that we all depend on to create great software.
Tom Malaher has written an excellent rant about the state of installing and configuring third party software. Since most programmers are definitively not at the bleeding edge of technology (“we need you to build another order entry system”), we all use third party software and understand some of his frustration. After all, it would be nice to be able to configure such software in any way we deemed fit, rather than having to deal with the dictates of the vendor.
Alas, such flexibility is not often found. Even among open source software, you can find rigidity. Of course, if you take the time, you can fix the problems, but the entire point of third party software is that you can use it ‘out of the box,’ thus saving time.
Tom gave a masterful analysis of the structural components of third party software. Though he repeatedly asks for comments and suggestions, I don’t have any to make regarding his ‘types of data’ delineation. However, I thought it would be worthwhile to examine configuration data more closely. (Eric S Raymond also covers configuration in general here.) In fact, I think there are a number of interesting facets that tie into making configuration data easy to version, store, and separate from other types of data.
1. App specific vs universal format
You can either have one configuration file (or one set of files) shared by every application (a la config.sys and win.ini) or you can have application specific configuration files for every substantial installed application (a la sendmail.cf and /etc/*).
One set of files makes it easy for the user to know where the application they just installed is configured. It also ensures that all applications use roughly the same type of configuration: the same comment character, the same sectioning logic, the same naming conventions. It also means that you can use the operating system to manage the configuration files, rather than having each application have to write their own code to create and manage their configuration.
Having each application manage their own configuration files ensures that the configuration will be tailored to the application’s needs. Some applications might need a hierarchical configuration file, where some sections inherit from others. Others can get by with a simple text file with name value pairs. Another advantage of having separate configuration files is that, well, they are separate. This makes it easier to version them, as well as making it easier to tweak the configuration files, possibly to run multiple instances of one application.
2. User vs system
This is closely related to the first differentiation. However, it is distinct, as it’s possible to have a system format for configuration that has specific areas for users, and to have an app specific format that excludes any other application running on a given system. The crucial question is whether each user can have an independent installation of a given application.
It’s hard to argue against allowing each user to have an individual configuration, but in certain situations, it may make sense. If, for example, there are parameters whose change may drastically affect the performance of a system (the size of a TCP packet), or which may govern specific limited resources (the allocation of ports), then it may make sense to limit user specific configuration. You may notice that my examples are all drawn from the operating system, and this may be one application where user specific configuration may not be a good idea, since the OS underlies all the other applications.
3. Binary vs text
There are two possible formats in which to store configuration information. One is eminently computer readable, minimizes disk usage, and increases the speed of the application. The other one is superior.
Binary configuration formats are quicker for the computer to read and take up less space on disk. However, they are prone to rot, as only the application that wrote it can read and manipulate the file. No one else can, and this unfortunately includes the poor programmer who needs to modify some behavior of the application years after it was written.
Text configuration files, on the other hand, parse slower and are bulkier. However, they can also be self describing (check out this sample sendmail configuration file for a counter example). This in itself is a win, because it gives a human being a chance to understand the file. In addition, such configuration files can also be manipulated by the bevy of tools that can transmogrify the configuration files into something else (a bit of perl, anyone?). They can also be easily version controlled, and diffed. Pragmatic programmers like text files (section 3.14) for many of the above reasons.
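To make the text-file argument a bit more concrete, here is a minimal sketch in Python using the standard configparser module. The section and key names are invented for illustration. It shows two of the points made above: a section can inherit shared values and override only what differs, and a change to a text file shows up as an ordinary line diff.

```python
import configparser
import difflib

# Hypothetical configuration text; the section and key names are invented.
BASE_CONFIG = """\
# Settings shared by every instance
[DEFAULT]
log_level = warn
port = 8080

# The second instance overrides only what differs
[instance-a]

[instance-b]
port = 8081
"""


def load(text):
    """Parse configuration text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def settings(parser, section):
    """Return the effective name/value pairs for one section."""
    return dict(parser[section])


config = load(BASE_CONFIG)
print(settings(config, "instance-a"))
print(settings(config, "instance-b"))

# Because the file is text, a change shows up as an ordinary line diff.
edited = BASE_CONFIG.replace("log_level = warn", "log_level = debug")
diff_lines = [
    line
    for line in difflib.unified_diff(
        BASE_CONFIG.splitlines(), edited.splitlines(), lineterm="", n=0
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(diff_lines)
```

The comment lines inside the file are what make it self describing; a binary format would need a separate tool just to show the same information.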
It’s clear that there are several different options when it comes to configuring any one particular application. Some of these are related, and some are orthogonal, but all of them deserve consideration when designing any application.
When I used CVS a few years ago, I remember a colleague writing a tremendous perl script that you could run from anywhere in the CVS source tree. It would let you know whether you had files that weren’t in CVS, needed to be updated, or were going to be merged. It was quite a nice piece of perl code, which essentially parsed the output of cvs status, and the information it output was quite useful at the end of a long bug fixing or coding session (“hey, what files did I change again?”). However, it also needed to be maintained and documented, as well as explained to users.
The other day, I stumbled on something which works almost as well, but is part of CVS already: cvs -qn up. The q option tells CVS to be quiet, and not chat about all the directories that it sees. The n option tells CVS not to make any changes on the filesystem, but just tell you what changes it would have made. Here’s some sample output:
[moore@localhost guide]$ cvs -qn up
M means that the file has been changed locally.
? means that the file exists locally, but not in the repository.
U means that the file has changed in the repository, but not yet been updated locally. For more information on the output of update, look here.
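If you want a grouped summary rather than the raw list, the output is simple enough to post-process. Here is a rough sketch in Python that handles only the three status letters described above; the sample output and file names are made up for illustration, and any other lines are ignored.

```python
from collections import defaultdict

# Meanings of the status letters described above.
STATUS_MEANINGS = {
    "M": "modified locally",
    "?": "not in the repository",
    "U": "changed in the repository",
}


def summarize_update(output):
    """Group file names from `cvs -qn up` output by their status letter."""
    grouped = defaultdict(list)
    for line in output.splitlines():
        status, _, path = line.partition(" ")
        # Skip anything that is not one of the three known status letters.
        if status in STATUS_MEANINGS and path:
            grouped[status].append(path.strip())
    return dict(grouped)


# Made-up output for illustration; the file names are hypothetical.
SAMPLE = """\
M chapter1.xml
? notes.txt
U chapter2.xml
M index.xml
"""

for status, paths in summarize_update(SAMPLE).items():
    print(f"{STATUS_MEANINGS[status]}: {', '.join(paths)}")
```

To feed it real output, you could capture the command with something like subprocess.run(["cvs", "-qn", "up"], capture_output=True, text=True) and pass the stdout attribute to summarize_update.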
Use this command and never lose track of the files in your CVS tree again.
After attending a presentation at BJUG about software tools, I investigated jalopy and I liked what I found. Now, jalopy is more than just a javadoc comment inserter, but javadoc insertion was my primary use of the tool. It may be piss poor for code formatting and whatnot, but it was pretty good at inserting javadoc. I was using the ant plug-in and the instructions were simple and straight forward. It didn’t blow away any existing comments, and it didn’t munge any files, once I configured it correctly. And there are, make no mistake, lots of configuration options.
Jalopy has a slick Swing interface to set all these configuration options, and you can export your configuration to an XML file which can be referenced by others. This, along with the ant integration, makes it a good choice for making sure that all code checked in by a team has similar code formatting.
However, I do have a few minor quibbles with this tool.
1. The default configuration of javadoc is busted. When you run it, it javadocs methods and classes just fine, but any fields are marked with “DOCUMENT ME!” when they should be commented out: “/** DOCUMENT ME! */”. This means that, with the default configuration, you can’t even run the formatter twice, since jalopy itself chokes on the uncommented “DOCUMENT ME!”.
2. The configuration file is not documented anywhere that I could find. I looked long and hard on the Internet, and only found one example of a jalopy configuration file here. And this is apparently just the default options exported to a file. I’ve put up a sample configuration file here which fixes problem #1. (This configuration is only for javadoc; it accepts all other defaults.)
3. The zip file that you download isn’t in its own directory. This means that when you unassumingly unzip it, it spews all over your current directory.
None of these are show stoppers, that’s for sure. If you’re looking for a free, open source java code formatting tool, jalopy is worth a close look.
In the latest edition of 2600, there’s an article about webhacking with CVS. The basic premise of this article is that if you do a cvs checkout of your static html site to your webroot, you let folks with inquisitive minds and an understanding of CVS know more than you intended about your IT infrastructure. Read the article for more information.
However, it’s easy enough to defeat. The answer is to use the cvs export command, which generates exactly the same files as a checkout except without the CVS directories. Rolling out updates to a site via this command means, of course, that any changes you make to files in the web directory will be blown away. But, it could be argued that it’s a good thing to force everything to go through CVS. It also means that you can’t make incremental updates as easily. It’s still possible, but you just have to check out the source to some other place and copy the file over manually. Another option, which lets you do updates more easily, is rsync --cvs-exclude, which does the same thing.
Using either of these solutions makes it a bit tougher to move content to the website. But it makes things a whole lot more secure.
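For anyone without cvs export or rsync handy, the same idea can be sketched in a few lines of Python: copy a checked out tree to the webroot and leave every CVS directory behind. This is only an illustration of the principle, not a replacement for either command; the paths and file names in the demo are hypothetical, and unlike a fresh export it won’t remove files that have been deleted from the repository.

```python
import shutil
import tempfile
from pathlib import Path


def publish(working_copy, webroot):
    """Copy a checked-out tree to webroot, leaving out every CVS directory."""
    shutil.copytree(
        working_copy,
        webroot,
        ignore=shutil.ignore_patterns("CVS"),
        dirs_exist_ok=True,  # requires Python 3.8 or later
    )


def demo():
    """Build a tiny fake checkout, publish it, and list what was copied."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "checkout"
        webroot = Path(tmp) / "webroot"
        (checkout / "CVS").mkdir(parents=True)
        # Hypothetical repository location, the kind of detail you don't
        # want visible under your webroot.
        (checkout / "CVS" / "Root").write_text(":ext:user@cvs.example.com:/cvsroot\n")
        (checkout / "images" / "CVS").mkdir(parents=True)
        (checkout / "index.html").write_text("<html></html>\n")
        publish(checkout, webroot)
        return sorted(p.relative_to(webroot).as_posix() for p in webroot.rglob("*"))


print(demo())
```

Note that rsync --cvs-exclude skips more than just the CVS directories, since it ignores files the same way CVS does, so the two aren’t identical.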
I went to a BJUG meeting tonight, and the topic was automatic code standardization tools. Tom Marrs gave a good presentation which covered 4 open source tools that integrate with ant:
Checkstyle checks that code fits existing guidelines. It comes configured to check against Code Conventions for the Java Programming Language. pmd is lint for java; it actually has a page where you can see it run against itself. It also finds generic exceptions and complains. Both of these tools show you where problems exist in your code, usually by generating a nice HTML report, but don’t modify the source.
The next two tools actually modify your .java files. cleanImports fixes erroneous import statements, and cleans up com.foo.* imports. It’s smart enough, supposedly, to only import the actual classes that are used in a particular file. Jalopy is a bit more ambitious, and attempts to fix missing javadoc, whitespace problems, brace placement and some other problems.
Now, you need a combination of these tools. The style checkers can be very strict, since they don’t have to be smart enough to fix the problems they find. The code beautifiers, on the other hand, actually fix the problems that they find. Tom made some good points that these programs can generate a lot of output, and it makes sense to prioritize in deciding which problems to fix. Especially when you aren’t starting with a blank slate, it makes a lot of sense to ignore some of the lesser evils (who cares about whitespace when you have a constant that isn’t
A member of the audience brought up a good point, which is that using these kinds of tools is at least as much a political problem as it is a software problem. Few folks are going to dispute that having a consistent coding standard makes maintenance easier, but I think that few folks are going to argue that it’s the most important factor. But, as I see it, there are a couple of different things you can do to enforce coding standards. I list these below in increasing order of intrusiveness.
1. Make the tools available
If you make the tools available on the project, folks will probably use them. After all, who likes writing crappy code? All these tools integrate with ant, and some integrate with popular IDEs. Make developers aware of the tools and add the targets to your standard build files, and encourage folks to use them.
2. Get buy in from the team
If you’re on a team, it may make sense to have a ‘tools meeting’ at the beginning of a project (or in the middle, for that matter). Decide on basic standards (and remember, the location of braces isn’t really that important), after explaining how it makes folks’ lives easier. Build a consensus that using one or two of these tools is a good thing to do, and should be done before code is checked in.
3. Have senior staff dictate usage: ‘thou shalt use pmd’
If the senior members of a team feel strongly about this, they can make a preemptive decision that the tools must be used. I’ve been on a few projects where this happened, and I can’t say that it was a huge issue. After all, the senior staff make lots of arbitrary decisions (well, they look arbitrary) about architecture, team membership, etc. One more won’t hurt too much.
4. Make running the tools required before check in
You can put wrapper scripts around CVS. I’ve seen it done on the client side, but this can be circumvented by just running the cvs command. You can also do it on the server side. I’m not sure what the best option is, but this is a large hammer to wield: it ensures that the code meets a standard, but also displays distrust that the coder can and will do the right thing on their own. Not exactly the kind of attitude you want to convey to folks you’re paying to think for you.
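A client side wrapper of the sort described in option 4 doesn’t need to be much. Here is a minimal sketch in Python, assuming the check and the commit are each a single command: run the check, and only run the commit if the check exits cleanly. The ant target name in the comment is hypothetical, and the demo uses stand-in commands so that it runs on a machine without ant or cvs.

```python
import subprocess
import sys


def guarded_commit(check_command, commit_command):
    """Run the check; run the commit only if the check exits with status 0.

    Returns the exit status of whichever command ran last.
    """
    check = subprocess.run(check_command)
    if check.returncode != 0:
        print("checks failed; commit skipped", file=sys.stderr)
        return check.returncode
    return subprocess.run(commit_command).returncode


# In real use the commands might look like this (the "checkstyle" target
# name is an assumption about your build file):
#   guarded_commit(["ant", "checkstyle"], ["cvs", "commit"])

# Stand-in commands so the sketch runs anywhere: one check that fails
# and one that passes.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
passing = [sys.executable, "-c", "pass"]
print(guarded_commit(failing, passing))
print(guarded_commit(passing, passing))
```

As noted above, nothing stops a developer from skipping a wrapper like this and running cvs directly, which is why it only enforces anything if people agree to use it.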
I think that these automatic tools are great. Code inspection, especially of a large number of classes, is something that programs are well suited for–there’s a clear set of rules, it’s a repetitive, boring task. But make sure that you don’t forget the human element. What happens to the reported problems? No matter how much the code is automagically fixed, you need and want the programmer to look at the output of the tools, and strive to improve his or her code.
When you’re writing a program to perform some business function, there are usually many different options. Whether it’s the particular language, the database, the platform, or the hardware, you have to make some decisions. Like a carpenter, who chooses screws when he needs to attach two planks and a saw when he needs to shorten a dowel, programmers are supposed to choose the correct tool for the task. However, since programming is so new, changes so much, and is so abstract, it’s a bit more complex than that.
There are many things that affect the right tool, and some of the considerations aren’t directly technical:
Strategic change is one criterion. When I was working at a consultancy in 2000, there was a grand switch in language choice. perl was out, java was in. This decision was not made at a technical level, but rather at a strategic one. No matter that we’d have to retrain folks, and throw away a significant portion of our code base. No matter that for the sites we were doing, perl and java were a wash, except for the times that java was overkill. What did matter is that the future was seen to belong to java, and management didn’t want to be left behind.
Cost of the solution is another important factor. TCO is a buzzword today, but it’s true that you need to look at more than the initial cost of any piece of technology to get an idea of the true price. Linux has an initial cost of $0, but the TCO certainly isn’t. There’s the cost of maintaining it, the cost of paying for administrators, the upgrade cost, the security patch cost, the retraining cost, and the lock in cost. Windows is the same way–and though it’s hard to put a number on it, it’s clear that the future cost of a windows server is not going to be minimal, as you’ll eventually be forced to upgrade or provide support yourself.
The type of problem is another reason to preference one technology over the other. Slashdot is a database backed website. They needed speed (because of the vast number of hits they receive daily), but they didn’t need transactions. Hence, mysql was a perfect datastore, because it didn’t (at the time) support transactions, but was very fast.
The skill sets of folks available for implementation also should affect the choice. I recently worked at a company with a large number of perl applications that were integral to how the company worked. But they are slowly replacing all of them, because most of the folks working there don’t know perl. And it’s not just the skill set of the existing workers, but also the pool of available talent. I’ve heard great things about Lisp and how efficient Lisp programmers can be, but I’d never implement a business function in Lisp, because it’d be very hard to find someone else to maintain it.
The existing environment is a related influence. If everything in your organization is Windows, then a unix solution, no matter how elegant it may be to one particular problem, is going to be a poor choice. If all your previous applications were written in perl, your first java application is probably going to use perlish data structures and program flow, and is probably going to be a poor java program. I know my first server side java fell into this pit.
Time is also a factor, in a couple of different senses. How quickly are you trying to churn this code out? Do you have time to do some research into existing solutions and best practices, or to build a prototype and then throw it away? If not, then you should probably use a tool/solution that you’re familiar with, even if it’s not the best solution. Some tools add to productivity and some languages are made for quick prototyping (perl!). How long will the code be around? The answer to that is almost always ‘longer than you think,’ although in some of the projects I worked on, it was ‘only as long as the dot com boom lasts.’ You need to think about the supportability of the platform. I’m working with a Paradox client server application right now. As much as I dislike the MS monopoly, I wish it were Access, because there’s simply more information out there about Access.
There are many factors to consider when you choose a technology, and the best way to choose is not obvious, at least to me. Every single consideration outlined above could be crucial to a given project. Or it might be a no brainer. You can’t really know if you’ve chosen the correct technology until you’ve built the project out, and then, unless you have a forgiving boss or client, it’s probably too late to correct the worst of the mistakes. No wonder so many software projects fail.
I’m working on a project with Websphere Device Developer, and it constantly reminds me of why I hate integrated development environments (IDEs).
Why do I hate IDEs? Let me count the ways.
1. It’s a whole new interface that you have to learn. How do I save files? How do I save a project? How do I move around in an editor? All these questions need to be answered when I move to a new IDE; but they all lead to a more fundamental question: why should I have to relearn how to use my keyboard every time I get a new IDE?
2. One way to do things. Most IDEs have one favored way of doing anything. They may support other means, but only haphazardly. For instance, WSDD supports me editing files on the filesystem, rather than through their editor, but freaks out if I use CVS (it has problems if I use most anything other than commit and update). But sometimes you aren’t even allowed the alternate method. I’m trying to get a project that was developed with one CVS repository to move to another CVS repository. WSDD lets you change repository information, but only if the project is still talking to the *same* host with the *same* cvs root. Thanks a lot guys.
3. IDEs are big pieces of code and as the size of a piece of code increases, the stability tends to decrease. In addition, they are being updated a lot more with new features (gotta give the companies some reason to buy, right?). This means that you have to be aware of the environment (how often do I have to save, what work arounds do I have to use) and hence less focused on what you’re really trying to do, which is write code.
4. My biggest gripe with IDEs, however, is that they do stuff I don’t understand when I write or compile code. Now, I don’t think I should have to understand everything–I have no idea how GCC works in anything other than the most abstract sense. But an IDE is something that I interact with every day, and it’s my main interface to the code. When one does something I don’t understand to the product of my time, that scares me. I’m not just talking about code generation, although that is scary enough (just because something else generated the code, that doesn’t mean that it is right or that you won’t have to wade in and maintain it–and maintaining stuff that I write from the ground up is hard enough for me without bringing in machine generated code). I’m also talking about all the meta state that IDEs like to maintain. What files are included where, external libraries, etc. Now, of course, something has to maintain that state, but does it have to be a monolithic program with (probably) poor documentation on that process?
Some people will say that IDEs aren’t all that bad and can lead to huge productivity increases. This may be the case during the coding phase, but how much of that is lost when you have to learn a new IDE’s set of behaviors or spend time figuring out why your environment is broken by the IDE? Give me simple, reliable, old fashioned and well understood tools any day over the slick, expensive, tools that I don’t know and will have to learn each time I use a new one.
I love documentation. I like writing it, and when it’s well written, I love reading it. There are many types of documentation, and they aren’t all the same. Some serves to illustrate what you can do with a product (think the little product manuals that everyone throws away). Some serves to nail down exactly what will be done (in software, between two business parties, etc). But what I’m writing about today is software documentation, especially programmer to programmer documentation.
I love it for a number of reasons. Good documentation cuts down on communication between software engineers, hence increasing scalability. At the company where I used to work, each developer had their own instance of the application server to which we were developing (whether it was ATG Dynamo, Weblogic, or Tomcat). So, every time a new developer rolled on to the project, they had to be set up. Either the programmer had to do it, or someone else did. On a couple of the projects, I was involved in setting up the first one or two, but I quickly tired of that. So, I wrote a step by step document that enabled the incoming programmer to do the setup themselves. This was good for me, because it saved me time, good for the programmer as it gave them a greater understanding of the platform on which they were developing, and good for the project, as if I got hit by a bus, the knowledge of how to set up a server wasn’t lost.
Good documentation also has come to my rescue more than once, by saving information that I struggled to find at one time, but didn’t use every day. For example, I imported a project I’m working on into Eclipse. It wasn’t strenuous, but it wasn’t a cakewalk either. So, for other programmers on the project, I wrote down how I did it. Now, a few months later, I couldn’t tell you how I did it. Not at all–that knowledge has been forced out of my brain by other more important stuff–like when my parents’ birthdays are, what I’m going to bring to my potluck tonight, the name of that game where you roll plastic pigs around and score points based on their position–you know, important stuff. But, should I have a need to do another import, I can! I know the knowledge is stored somewhere safe (in CVS, but that’s a different entry).
There are two complaints about programmer to programmer documentation that I’d like to address. One is that it quickly becomes outdated. This is true. It takes an effort to maintain documentation. When I change the procedure or meaning of something, I try to remind myself of the two benefits above. If I can convince myself that I will save more time in the long run by documenting (through not having to explain the changes to others or myself), then I do it. I’m not always successful, I’ll admit. And you can see this with product documentation (both closed and open source). Out of date documentation can be very frustrating, and I’m not sure whether it’s better dealt with by tossing the documentation or by keeping it and marking it ‘OUT OF DATE.’
The other issue is what I call the ‘protecting your job’ excuse for avoiding documentation. If you don’t document what you’ve done, you probably will have a secure job–especially if it’s an important piece of work. But that security is also a chain that binds. In addition to being a subtle gesture of distrust towards your management (always a good idea to torque off your management in this time of uncertainty), it means that when a different, and possibly better, opportunity comes along, you won’t be able to take it. Since no one else knows how to do your job (because teaching someone else is also a form of documenting) you’re stuck in the same position. Not exactly good for your personal growth, eh?
In short, documentation that gets used is good documentation, and well worth the effort to write.