jad

If you’ve never used jad then you’re missing out on a great tool. Jad lets you easily decompile java class files. It may be shady legally, depending on what contracts you’ve signed, but it’s definitely useful in debugging and understanding behavior of java applications. It couldn’t be simpler to use. Just run

jad classfile.class

from the command line, and you get a java file (named classfile.java) in the same directory. The names of the variables aren’t fantastic (s1, s2…) but it sure beats reading the bytecode output of javap -c.

Note, it’s free for noncommercial use, but if you want to use it commercially, contact the author for terms. And if you get a chance to download it from the above tripod.com link, grab it and store it someplace else, because the page often is unavailable due to its exceeding bandwidth limits.

Windows frustrations

I’m reading Hackers by Steven Levy right now. This book is about the first people to really program computers with enthusiasm and an eye towards some of the anarchic possibilities of the machine. And the obstacles they overcame were tremendous. Writing entire video games in assembly language, re-implementing FORTRAN for different platforms (heck, writing anything in FORTRAN at all is a trial), working with computers the size of entire building floors, dealing with the existing IBM priesthood… There were plenty of obstacles to getting work done with a computer back then.

And, there still are, I have to say. I’m currently writing this from my first laptop ever. I love it. The mobility, the freedom, especially when combined with a wireless network card. This computer came with Windows XP and I plan to leave windows on this box, primarily so that I can do more J2ME development.

Now, the first thing any Unix user learns is that you shouldn’t log in as root any more than you absolutely have to. The reasons for this are many: you can delete system files unintentionally, there’s no log file to recreate disaster scenarios, and in general, you just don’t need to do this. The first thing I do every time I’m on a new desktop Unix box is download a copy of sudo and install it. Then I change the root password to something long and forgettable, preferably with unpronounceable characters. I do this so that there’s never any chance of me logging in as the super user again. I will say that this has caused me to have to boot from a root disk a time or two, but, on the other hand, I’ve never deleted a device file unintentionally.

Anyway, the purpose of that aside was to explain why I feel that you should always run your day to day tasks as a less privileged user. Even more so on Windows than on Unix, given the wider spread of Windows viruses and, to be honest, my lack of experience administering Windows. So, the first thing I did when I got this new computer was to create a non administrative user. Of course, for the first couple of days, I spent most of my time logged in as the administrative user, installing OpenOffice, Vim and other software. I also got my wireless card to work, which was simple. Plug in the card, have it find the SSID, enter the WEP key and I was in business.

That is, until I tried to access the Internet via my wireless card when logged in as the limited user. The network bounces up and down, up and down, and there doesn’t seem to be anything I can do about it. Every second, the network changed status. To be honest, I haven’t looked in google because I can’t even think of how to describe the phenomenon. But, when I’m logged in as the administrator, it’s smooth sailing. There are some things I plan to try, like creating another administrator and seeing if that account has similar problems. If that’s the case, it’s probably not the fact that my limited privilege account has limited privileges, but rather that the network software hasn’t been made accessible to it. However, this situation is especially frustrating because the time when I least want to be logged in as an administrative user is when I’m most vulnerable to worms, viruses and rogue email attachments–that is to say, when I’m connected to the Internet.

I remember fighting this battle 3 years ago, when I was using Windows NT on a team of software developers. I was the only one (so far as I know) to create and use regularly a non privileged account. Eventually, I just said ‘screw it’ and did everything as the administrative user, much as I’ll do now after a few more attempts to make the unprivileged user work. Windows just doesn’t seem to be built for this deep division between administrators and users, and that doesn’t seem to have changed.

Dimensions of Application Configuration

Tom Malaher has written an excellent rant about the state of installing and configuring third party software. Since most programmers are definitively not at the bleeding edge of technology (“we need you to build another order entry system”), we all use third party software and understand some of his frustration. After all, it would be nice to be able to configure such software in any way we deemed fit, rather than having to deal with the dictates of the vendor.

Alas, such flexibility is not often found. Even among open source software, you can find rigidity. Of course, if you take the time, you can fix the problems, but the entire point of third party software is that you can use it ‘out of the box,’ thus saving time.

Tom gave a masterful analysis of the structural components of third party software. Though he repeatedly asks for comments and suggestions, I don’t have any to make regarding his ‘types of data’ delineation. However, I thought it would be worthwhile to examine configuration data more closely. (Eric S Raymond also covers configuration in general here.) In fact, I think there are a number of interesting facets that tie into making configuration data easy to version, store, and separate from other types of data.

1. App specific vs universal format

You can either have one configuration file (or one set of files) shared by every application (a la config.sys and win.ini) or you can have application specific configuration files for every substantial installed application (a la sendmail.cf and /etc/*).

One set of files makes it easy for the user to know where the application they just installed is configured. It also ensures that all applications use roughly the same type of configuration: the same comment character, the same sectioning logic, the same naming conventions. It also means that you can use the operating system to manage the configuration files, rather than having each application have to write their own code to create and manage their configuration.

Having each application manage their own configuration files ensures that the configuration will be tailored to the application’s needs. Some applications might need a hierarchical configuration file, where some sections inherit from others. Others can get by with a simple text file with name value pairs. Another advantage of having separate configuration files is that, well, they are separate. This makes it easier to version them, as well as making it easier to tweak the configuration files, possibly to run multiple instances of one application.
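As a rough illustration of the two shapes mentioned above, here is a sketch using Python’s standard configparser module. The file contents, section names and keys are all invented for the example, and the [DEFAULT] section stands in for only a very simple form of inheritance; a real application might need something richer.

```python
import configparser

# Hypothetical application config: name/value pairs grouped into sections.
# Values in [DEFAULT] are inherited by every other section unless overridden.
SAMPLE = """
[DEFAULT]
log_level = info
port = 8080

[instance_a]
port = 8081

[instance_b]
log_level = debug
"""

def load_config(text):
    """Parse config text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser

config = load_config(SAMPLE)
for name in config.sections():
    section = config[name]
    print(name, section["port"], section["log_level"])
```

Each section here could just as easily be its own file, which is what makes running multiple instances of one application from slightly different configurations straightforward.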

2. User vs system

This is closely related to the first differentiation. However, it is distinct, as it’s possible to have a system format for configuration that has specific areas for users, and to have an app specific format that excludes any other application running on a given system. The crucial question is whether each user can have an independent installation of a given application.

It’s hard to argue against allowing each user to have an individual configuration, but in certain situations, it may make sense. If, for example, there are parameters whose change may drastically affect the performance of a system (the size of a TCP packet), or which may govern specific limited resources (the allocation of ports), then it may make sense to limit user specific configuration. You may notice that my examples are all drawn from the operating system, and this may be one application where user specific configuration may not be a good idea, since the OS underlies all the other applications.
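One way to picture a middle ground is to layer user settings over system defaults while refusing overrides for a few sensitive keys. The sketch below is only that, a sketch: the setting names, the values and the idea of a "locked" set are assumptions made up for illustration, not a description of how any particular operating system does it.

```python
from collections import ChainMap

# Hypothetical settings; the names and values are invented for this sketch.
SYSTEM_DEFAULTS = {"editor": "vi", "tcp_packet_size": 1500, "theme": "plain"}

# Keys the administrator does not allow individual users to override.
LOCKED_KEYS = {"tcp_packet_size"}

def effective_config(user_settings):
    """Layer user settings over system defaults, ignoring locked keys."""
    allowed = {k: v for k, v in user_settings.items() if k not in LOCKED_KEYS}
    # ChainMap looks in the first mapping, then falls back to the second.
    return ChainMap(allowed, SYSTEM_DEFAULTS)

config = effective_config({"editor": "vim", "tcp_packet_size": 9000})
print(config["editor"], config["tcp_packet_size"], config["theme"])
```

The user’s editor choice wins, but the attempt to change the packet size is quietly dropped in favor of the system value.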

3. Binary vs text

There are two possible formats in which to store configuration information. One is eminently computer readable, minimizes disk usage, and increases the speed of the application. The other one is superior.

Binary configuration formats are quicker for the computer to read and take up less space on disk. However, they are prone to rot, as only the application that wrote it can read and manipulate the file. No one else can, and this unfortunately includes the poor programmer who needs to modify some behavior of the application years after it was written.

Text configuration files, on the other hand, parse slower and are bulkier. However, they can also be self describing (check out this sample sendmail configuration file for a counter example). This in itself is a win, because it gives a human being a chance to understand the file. In addition, such configuration files can also be manipulated by the bevy of tools that can transmogrify the configuration files into something else (a bit of perl, anyone?). They can also be easily version controlled and diffed. Pragmatic programmers like text files (section 3.14) for many of the above reasons.
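To make the "easily diffed" point concrete, here is a small sketch that compares two revisions of a text configuration file with Python’s standard difflib module. The file name and its contents are made up; the same thing is not possible with an opaque binary format without help from the application that wrote it.

```python
import difflib

# Two hypothetical revisions of a plain text configuration file.
OLD = """# mail relay settings
relay_host = mail.example.com
max_connections = 10
"""
NEW = """# mail relay settings
relay_host = mail.example.com
max_connections = 25
"""

def config_diff(old, new):
    """Return a unified diff of two config revisions as a list of lines."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="app.conf (old)", tofile="app.conf (new)", lineterm=""))

diff_lines = config_diff(OLD, NEW)
print("\n".join(diff_lines))
```

The one changed setting shows up as a removed line and an added line, which is exactly what a version control system would show as well.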

It’s clear that there are several different options when it comes to configuring any one particular application. Some of these are related, and some are orthogonal, but all of them deserve consideration when designing any application.

Checking the status of your files, using CVS

When I used CVS a few years ago, I remember a colleague writing a tremendous perl script that you could run from anywhere in the CVS source tree. It would let you know whether you had files that weren’t in CVS, needed to be updated, or were going to be merged. It was quite a nice piece of perl code, which essentially parsed the output of cvs status, and the information it output was quite useful at the end of a long bug fixing or coding session (“hey, what files did I change again?”). However, it also needed to be maintained and documented, as well as explained to users.

The other day, I stumbled on something which works almost as well, but is part of CVS already: cvs -qn up. The q option tells CVS to be quiet, and not chat about all the directories that it sees. The n option tells CVS not to make any changes on the filesystem, but just tell you what changes it would have made. Here’s some sample output:

[moore@localhost guide]$ cvs -qn up
? securityTechniques/NewStuff.rtf
M securityTechniques/InputValidation.rtf
M securityTechniques/SessionManagement.rtf
U securityTechniques/AuthenticationWorkingDraft.doc

M means that the file has been changed locally. ? means that the file exists locally, but not in the repository. U means that the file has changed in the repository, but has not yet been updated locally. For more information on the output of update, look here.
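If you want something closer to my colleague’s script, the output is simple enough to group by status code. The following is a minimal sketch, not a reconstruction of that perl script: it handles only the three codes explained above (cvs can print others), and it works on the sample output pasted in as a string rather than running cvs itself.

```python
from collections import defaultdict

# Sample output copied from the session above.
SAMPLE_OUTPUT = """\
? securityTechniques/NewStuff.rtf
M securityTechniques/InputValidation.rtf
M securityTechniques/SessionManagement.rtf
U securityTechniques/AuthenticationWorkingDraft.doc
"""

# Only the three status codes explained above; other codes are skipped.
LABELS = {"?": "not in repository", "M": "modified locally", "U": "needs update"}

def group_by_status(output):
    """Group file paths from `cvs -qn up` output by their status code."""
    groups = defaultdict(list)
    for line in output.splitlines():
        code, _, path = line.partition(" ")
        if code in LABELS and path:
            groups[code].append(path)
    return dict(groups)

groups = group_by_status(SAMPLE_OUTPUT)
for code, paths in groups.items():
    print(LABELS[code] + ":")
    for path in paths:
        print("  " + path)
```

To run it against a real working copy, you could feed it the output of the command instead, for example via subprocess.run(["cvs", "-qn", "up"], capture_output=True, text=True).stdout.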

Use this command and never lose track of the files in your CVS tree again.

Book Review: Enterprise J2ME

Update 2/25/07: added Amazon link.

I go to Java Users Groups (yes, I’m struggling to get in touch with my inner geek) once every two or three months. Sometimes there’s an engaging speaker, but most of the time the fellow up front looks like he’s just swallowed a hot pepper, speaks like he has a permanent stutter, and answers questions like I’m speaking Greek. (I’m not making fun; I had a hard time when I was in front of a JUG too.) Regardless of the quality of the speaker, I gain something just by watching the presentation–he points out interesting technologies and usually has a list of resources at the end that I can use for further research.

I think Michael Yuan would be a great speaker at a JUG, as he seems to have a masterful understanding of Java 2 Platform, Micro Edition (J2ME). However, the true value of his book, Enterprise J2ME, was in its introduction of new ideas and concepts, and the extensive resource listings. This book is a survey of the current state of the art in mobile java technology. Whatever your topic is, except for gaming development, you’ll find some coverage here. Securing information on the device or network, XML parsing strategies, messaging architectures, and data synchronization issues are all some of the topics that Yuan covers.

My favorite chapter was Chapter 7, ‘End to End Best Practices.’ Here, Yuan covers some of the things he’s learned in developing his own enterprise applications, and offers some solutions to five issues that differ between the J2ME world and the worlds familiar to most Java developers: J2EE and J2SE. He offers capsule solutions to the issues of “limited device hardware, slow unreliable networks, pervasive devices, ubiquitous integration [and] the impatient user.” Later in the book, he explores various architectures to expand on some of these capsules.

However, the strength of this book, exposing the reader to a number of different mobile technologies, is also its weakness. JUG speakers very rarely dive into a technology to the point that I feel comfortable using it without additional research; I usually have to go home, download whatever package was presented, and play with it a bit to get a real feel for its usefulness. This book was much the same. Some of the chapters, like chapters 12 and 13, which cover issues with databases on mobile devices (CDC devices, not CLDC devices), weren’t applicable to my kind of development, but you can hardly fault Yuan for that. Some of the later chapters felt like a series of ‘hello world’ applications for various vendors. This is especially true of chapter 12, and also of chapter 20, which is a collection of recipes for encryption on the device.

Additionally, I feel like some of the points he raised in Chapter 7 are never fully dealt with. An example of this is section 7.3.3, “Optimize for many devices.” The project I’m on is struggling with this right now, but I had trouble finding any further advice on this important topic beyond this one paragraph section. However, these small issues don’t take away from the overall usefulness of the book–if you are developing enterprise software, you’ll learn enough from this book to make its purchase worthwhile.

However, I wouldn’t buy the book if you’re trying to learn J2ME. Yuan gives a small tutorial on basic J2ME development in Appendix A, but you really need an entire book to learn the various packages, processes and UI concerns of J2ME, whether or not you have previously programmed in Java. Additionally, if you’re trying to program a standalone game, this book isn’t going to have a lot to offer you, since Yuan doesn’t spend a lot of time focused on UI concerns and phone compatibility issues. Some of the best practices about limited hardware may be worth reading, and if it’s a networked game, you may gain from his discussions in Chapter 6, “Advanced HTTP Techniques.” In general though, I’m not sure there’s enough to make it worth a game developer’s while.

I bought this book because I’m working on a networked J2ME application, and it stands alone in its discussion of the complex architectural issues that such applications face. It covers more than that, and isn’t perfect, but it is well worth the money, should you be facing the kind of problems I am. Indeed, I wish I had had this book months ago, as I’m sure it would have improved my current application.

Link to book on Amazon.

An IP address is to DNS as a URL is to Google

I just read this post from Mike Clark. Now, I agree with some of what he says. It’s true that it is a whole lot easier to remember terms you were searching for than a URL. Words and concepts are just plain easier to remember than strings where the slightest mistype will give you a 404 error. That’s why we use DNS rather than just typing in IP addresses everywhere. However, IP addresses work almost all the time, even when the DNS server is down or misconfigured. If I know the IP address of a mail server, then I can still check my email even when I can’t resolve its domain name.

This is true of the search engine/URL dichotomy as well. Have you noticed the size of the uproar when Google changes PageRank? Every time a search engine changes its ranking algorithms, it will throw into havoc any sites you’ve memorized via search terms. And search engines change their systems more often than DNS goes down. But cool URIs [URLs] don’t change.

Another issue is that when it’s so easy to search vast amounts of information, you don’t end up looking anywhere else. This rant, which circulated a few months ago, highlights that issue. It’s almost like, if you can’t find something online, you can’t be bothered to find out about it. I do it myself. Even results of search engine queries don’t get fully explored. How often have you viewed anything other than the first page at google?

I understand the power and love of search engines, but folks, including myself, need to be sure to understand the implications of using them as shorthand for permanent links and/or shortcuts for true research.

How can you keep a website out of a search engine?

It’s an interesting problem. Usually, you want your site to be found, but there are cases where you’d rather not have your website show up in a search engine. There are many reasons for this: perhaps because google never forgets, or perhaps because what is on the website is truly private information: personal photos or business documents. There are several ways to prevent indexing of your site by a search engine. However, the only sure fire method is to password protect your site.

If you require some kind of username and password to access your website, it won’t be indexed by any search engine robots. Even if a search engine finds it, the robot doing the indexing won’t be able to move past the login page, as it won’t have a username and password. Use a .htaccess if you have no other method of authenticating, since even simple text authentication will stop search engine robots. Intranets and group weblogs will find this kind of block useful. However, if it’s truly private information, make sure that you use SSL because .htaccess access control sends passwords in clear text. You’ll be defended from search engines, but not from people snooping for interesting passwords.
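To see why "clear text" is a fair description, here is a small sketch, assuming the .htaccess setup uses HTTP Basic authentication (the common case, though not the only scheme). The username and password are made up. The header the browser sends is just the credentials run through base64, which anyone who captures the request can reverse.

```python
import base64

def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def decode_basic_auth(header_value):
    """Recover the username and password from a captured header value."""
    _scheme, _, token = header_value.partition(" ")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header("grandma", "pumpkin")  # made-up credentials
print(header)
print(decode_basic_auth(header))
```

No key or secret is needed for the decoding step, which is why SSL is the only thing standing between the password and a snooper.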

What if you don’t want people to be forced to remember a username and password? Suppose you want to share pictures of baby with Grandma and Grandpa, but don’t want to either force them to remember anything or allow the entire world to see your child dressed in a pumpkin suit. In this case, it’s helpful to understand how search engines work.

Most search engines start out with a given set of URLs, often submitted to them, and then follow all the links in a relentless search for more content (for more on this, see this excellent tutorial). Following the links means that submitters do not have to give the search engine each and every page of a site, as well as implying that any page linked to by a submitted site will eventually be indexed as well. Therefore, if you don’t want your site to be searched, don’t put the web site’s URL any place it could be picked up. This includes archived email lists, Usenet news groups, and other websites. Make sure you make this requirement crystal clear to any other users who will be visiting this site, since all it takes is one person posting a link somewhere on the web, or submitting the URL to a search engine, for your site to be found and indexed. I’m not sure whether search engines look at domain names from whois and try to visit those addresses; I suspect not, simply because of the vast number of domains that are parked, along with the fact that robots have plenty of submitted and linked sites to visit and index.

It’s conceivable that you’d have content that you didn’t want searched, but you did want public. For example, the information may change rapidly, as on a forum or bulletin board where the content quickly gets out of date, or on a site like eBay. You still want people to come to the web site, but you don’t want any deep links. (Such ‘deep linking’ has been an issue for a while, from 1999 to 2004.) Dynamic content (that is, content generated by a web server, usually from a relational database) is indexable when linked from elsewhere, so that’s no protection.

There are, however, two ways to tell a search engine, “please, don’t index these pages.” Both of these are documented here. You can put this meta tag: <meta name="robots" content="none"> in the <head> section of your HTML document. This lets you exclude certain documents easily. You can also create a robots.txt file, which allows you to disallow indexing of documents on a directory level. It also is sophisticated enough to do user-agent matching, which means that you can have different rules for different search engines.
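Here is a short sketch of directory level rules and user-agent matching, checked with Python’s standard urllib.robotparser module. The robots.txt contents, the robot names and the paths are all hypothetical; a real file would live at the root of your site.

```python
from urllib import robotparser

# Hypothetical robots.txt: one rule set for a robot named "ExampleBot",
# and a different one for every other robot.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /photos/
"""

def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = robotparser.RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = build_parser(ROBOTS_TXT)
print(rules.can_fetch("ExampleBot", "/index.html"))
print(rules.can_fetch("OtherBot", "/index.html"))
print(rules.can_fetch("OtherBot", "/photos/baby.jpg"))
```

With these rules, ExampleBot is asked to stay away from the whole site, while every other robot is only asked to stay out of /photos/. A polite robot runs this same kind of check before fetching a page.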

Both of these latter approaches depend on the robot being polite and following conventions, whereas the first two solutions guarantee that search engines won’t find your site, and hence that strangers will have a more difficult time as well. Again, if you truly want your information private, password protect it and only allow logins over SSL.

imap proxy and horde

I’m implementing an intranet using the Horde suite of tools. This is written in PHP, and provides an amazing amount of out of the box, easily configured functionality. The most robust pieces are the mail client (incidentally, used by WestHost for webmail, and very scalable), the calendar, and the address book. There are a ton of other projects using the Horde framework, but most of them are in beta, and haven’t been officially released. Apparently these applications are pretty solid (at least, that’s what I get from reading the mail list) but I wanted to shy away from unreleased code. I am, however, anxiously awaiting the day that the new version is ready; as you can see from the demo site, it’s pretty sharp.

Anyway, I was very happy with the Horde framework. The only issue I had was that the mail application was very slow. I was connecting to a remote imap server, and PHP has no way to cache imap connections. Also, for some reason, the mail application reconnects to the imap server every time. However, someone on that same thread suggested using UP IMAP Proxy. This very slick C program was simple to compile and install on a BSD box, and sped up the connections noticeably. For instance, the authentication to the imap server (the only part of the application that I instrumented) went from 10 milliseconds to 1. It apparently caches the user name and password (as an MD5 hash) and only communicates with the imap server when it doesn’t have the information needed (for example, when you first visit, or when you’re requesting messages from your inbox). It does have some security concerns (look here and search for P_NEWLOG), but you can handle these at the network level. All in all, I’m very impressed with UP IMAP Proxy.

And, for that matter, I’m happy with Horde. I ended up having to write a small horde module, and while the framework doesn’t give you some things that I’m used to in the java world (no database pooling, no MVC pattern) it does give you a lot of other stuff (an object architecture to follow, single sign-on, logging). And I’m not aware of any framework in the java world that comes with so many applications ready to roll out. It’s all LGPL and, as I implied above, the released modules have a very coherent structure that makes it easy to add and subtract needed functionality.

Bravo Horde developers! Bravo imap proxy maintainer!

mod_alias to the rescue

Have you ever wanted to push all the traffic from one host to another? If you’re using apache, it’s easy. I needed to have all traffic from a http://www.foo.com site go to https://secure.foo.com. Originally, I was thinking of having a meta header redirect on the index.html page, and creating a custom 404 page that would also do a redirect.

Luckily, some folks corrected me, and showed me an easier way. mod_alias (ver 2.0) can do this easily, and as far as I can tell, transparently. I just put this line in the virtual server section for www.foo.com:

Redirect permanent / https://secure.foo.com/

Now, every request for any file from www.foo.com gets routed to secure.foo.com. And since they share the same docroot, this is exactly what I wanted.
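As I understand the mod_alias behavior, Redirect matches on the start of the requested path and appends whatever follows the matched prefix to the target URL. The sketch below only mimics that mapping in Python to show what ends up in the redirect; it is not how Apache implements it, and the request paths are made up.

```python
# Values taken from the Redirect line above.
URL_PATH_PREFIX = "/"
TARGET = "https://secure.foo.com/"

def redirect_location(request_path):
    """Return the Location a prefix redirect would send, or None if no match."""
    if not request_path.startswith(URL_PATH_PREFIX):
        return None
    # Whatever follows the matched prefix is appended to the target URL.
    remainder = request_path[len(URL_PATH_PREFIX):]
    return TARGET + remainder

print(redirect_location("/index.html"))
print(redirect_location("/docs/faq.html"))
```

Because the prefix is /, every path on www.foo.com matches, and each request lands on the same path on secure.foo.com.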

To do this, make sure you have mod_alias available. It should be either compiled in (you can tell with httpd -l) or a shared library (on unix, usually called mod_alias.so). You have to make sure to load the shared library; see LoadModule and AddModule for more information.

My most popular posting

I don’t know why, but my post on yahoo mail problems is my most popular post thus far. I suspect it got picked up in google, or some other search engine, and is now serving as a place for folks to gripe about the free Yahoo! mail service. (Incidentally, I’m the second “Dan Moore” in google now! Meri has some interesting things to say about this intersection between the internet and real life.) This is interesting (and a bit amusing) to me for several reasons:

For one, there’s no helpful content on that posting for these folks’ problems. In fact, I don’t even use the free service from Yahoo (I pay extra for storage). And the posting concerns the short term problems a client of mine had with the new Yahoo mail interface, and how outsourcing exposes you to those types of risks. The comments are not germane to the posting.

Or should I say that the posting is not germane to the comments? As is ever the case on internet forums, this posting has been hijacked by people who want to complain and share possible fixes to a very real problem: they can’t get to their email (I’m cranky when I can’t get to my email, after all). I don’t begrudge them the use of my site; this just reinforces what Clay Shirky wrote about social software: people will twist software until it does what they need it to do, and fighting that is a lost cause (and has been for 20 years).

And that’s not just true for software, but for technology in general. After all, I doubt anyone working on radar thought it would someday be used for re-heating leftovers, and I’m sure that Daguerre (the inventor of photography) would be shocked at some of the pictures I’ve taken at house parties.