
Trust, but verify

As I’ve mentioned previously, the web lets smaller players get into the publishing arena, and we all know there are some amazing websites chock full of interesting and useful information. If you’re tired of hearing the hyperbolic claims of either presidential candidate and want to see them debunked, check out factcheck.org. Non-partisan, detailed examinations of ads can only help voters make an informed choice. Now, if only they had an RSS feed!

Java memory management, oh my!

How well do you understand basic Java? Every day I find some part of this language that I’m not aware of, or don’t understand. Some days it’s cool APIs (like JAI), but today it’s concurrency. Now, language-managed memory is a feature that’s been present in the languages I’ve programmed in since I started. I’ve looked at C and C++, but taking a job coding in them seems to me like taking a job with a long commute–both have obstacles that keep you from getting real work done. (I’m not alone in feeling this way.) But this thread of comments on Cameron Purdy’s blog drove home my ignorance. However, the commenters do point out several interesting articles to alleviate that (in particular, this article about double-checked locking was useful and made my head hurt at the same time). I took a class with Tom Cargill a few years back that included his threading module, which helped a bit.
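For the curious, the idiom those articles dissect looks roughly like the sketch below (my own illustration, not code from the thread; the class names are invented). The “clever” version is the one that breaks under the pre-Java-5 memory model:

```java
// A lazily initialized helper, the usual setting for the double-checked locking idiom.
public class HelperHolder {

    private Helper helper;

    // Broken double-checked locking: the unsynchronized first check can see a
    // non-null reference to a Helper whose fields haven't been fully written yet.
    public Helper getHelperBroken() {
        if (helper == null) {                // first check, no lock
            synchronized (this) {
                if (helper == null) {        // second check, under the lock
                    helper = new Helper();   // reference may become visible before construction finishes
                }
            }
        }
        return helper;
    }

    // The boring, correct alternative: synchronize the whole accessor and pay the cost.
    public synchronized Helper getHelperSafe() {
        if (helper == null) {
            helper = new Helper();
        }
        return helper;
    }

    static class Helper { /* something expensive to create */ }
}
```

(Under the revised memory model arriving with Java 5, declaring the field volatile repairs the idiom, but the head-hurting part is understanding why it’s broken in the first place.)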

However, all these complexities are why servlets (and EJBs) are so powerful. As long as you’re careful to use only local variables, why, you shouldn’t have to worry about threading at all. That’s what you use the container for, right? And we all know that containers are bug free, right? And you’d never have to go back and find some isolated thread-related defect that affected your code a maddeningly minuscule fraction of the time, right?
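To make that concrete, here’s a contrived servlet sketch (the names are mine) showing why “only use local variables” is the rule of thumb–one servlet instance serves many request threads at once:

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GreetingServlet extends HttpServlet {

    // DANGER: an instance field is shared by every request thread, so two
    // simultaneous requests can overwrite each other's value.
    private String sharedName;

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        sharedName = request.getParameter("name");        // unsafe: shared mutable state

        String localName = request.getParameter("name");  // safe: lives on this thread's stack

        PrintWriter out = response.getWriter();
        out.println("Hello, " + localName);
    }
}
```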

Symlinks and shortcuts and Apache

So, I’m helping install Apache on a friend’s computer. He’s running Windows XP SP1, and Apache has a very nice page describing how to install on Windows. A few issues did arise, however.

1. I encountered the following error message on the initial startup of the web server:

[Tue Jun 15 23:09:11 2004] [error] (OS 10038)An operation was attempted on something that is not a socket. : Child 4672: Encountered too many errors accepting client connections. Possible causes: dynamic address renewal, or incompatible VPN or firewall software. Try using the Win32DisableAcceptEx directive.

I read a few posts online suggesting I could just follow the instructions in the message–so I did, adding the Win32DisableAcceptEx directive to the bottom of the httpd.conf file. A restart, and now localhost shows up in a web browser.

2. Configuration issues: My friend also has a firewall on his computer (good idea). I had to configure the firewall to allow Apache to receive packets and respond to them. I also had to configure the gateway (my friend shares one fast internet connection among a few computers) to forward the port on which external clients request information to the computer running Apache. Voila, now I can view the default index.html page using his IP address.

3. However, the biggest hurdle is yet to come. My friend wants to serve some files off one of his hard drives (a different drive from the one Apache is installed on). No problem on Unix: just create a symlink. On Windows, I can use a shortcut, right? Just like a symlink, they “…can point to a file on your computer or a file on a network server.”

Well, not quite. Shortcuts have a .lnk extension, and Apache doesn’t know how to deal with that, other than to serve it up as a file. I did a fair bit of searching, but the only thing I found on dealing with this issue was this link, which basically says you should just reconfigure Apache so that its DocumentRoot is the directory containing whatever files you’d like to serve up. Ugh.

However, the best solution is to create an Alias (which has helped me in the past) to the directories you’re interested in serving up. And now my friend has Apache, installed properly as a service, to play around with as well.
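For reference, the additions to httpd.conf end up looking something like this (the paths and alias name are invented; the directives are from the Apache 2.0.x we were installing):

```apache
# Work around the Win32 accept() errors mentioned above.
Win32DisableAcceptEx

# Serve files from the second hard drive without moving DocumentRoot.
Alias /music "D:/shared/music"
<Directory "D:/shared/music">
    Order allow,deny
    Allow from all
</Directory>
```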

PL/SQL

I recently wrote a basic data transformation program using Java and PL/SQL. I hadn’t used PL/SQL (an Oracle-specific procedural language for stored procedures) since writing a basic data layer for my first professional project (a Yahoo!-like application written in PL/SQL, Perl and Story Server–don’t ask). Anyway, revisiting PL/SQL reminded me of some of the things I liked and disliked about that language.

I like:

Invalidation of dependencies. In PL/SQL, if package A (packages are simply arbitrary, hopefully logical, groups of procedures and functions) calls package B, then A depends on B. If the signatures of B are recompiled (you can separate the signatures from the implementations), package A simply won’t run until you recompile it. This is something I really wish other languages would pick up, because it at least lets you know when something you depend on has changed out from under you.

I dislike:

The BEGIN and END blocks, which mark the boundaries of loops and if statements, are semantically no different from the { and } I’ve grown to love in Perl and Java. But for some reason they take me back to my Pascal days and leave a bit of a bad taste in my mouth.

I’m unsure of:

The idea of putting business logic in the database. Of course, schemas are intimately tied to the business layer (ask anyone trying to move to a different one), and anyone who pretends that switching databases in a Java application is a simple matter of changing a configuration file is smoking crack, but putting chunks of business logic in the data layer introduces a few problems. Every additional language you use increases the complexity of a project–and to debug problems at the interface between them, you need someone who knows both. Also, stored procedures don’t fit very well into any of the object-relational mapping tools and pretty much force you to use JDBC.
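To illustrate that last point: calling a PL/SQL procedure from Java pretty much means raw JDBC and a CallableStatement, something like the sketch below (the package, procedure and parameters are invented for illustration):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Types;

public class TransformCaller {

    // Calls a hypothetical PL/SQL procedure: transform_pkg.move_account(p_account_id IN, p_status OUT).
    public static String moveAccount(Connection conn, long accountId) throws SQLException {
        // JDBC escape syntax for stored procedure calls; the O/R mapping tools
        // of the day offer little help here.
        CallableStatement call = conn.prepareCall("{call transform_pkg.move_account(?, ?)}");
        try {
            call.setLong(1, accountId);
            call.registerOutParameter(2, Types.VARCHAR);
            call.execute();
            return call.getString(2);   // status message from the OUT parameter
        } finally {
            call.close();
        }
    }
}
```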

Death marches and Don Quixote

I just finished watching ‘Lost In La Mancha’ which chronicles Terry Gilliam’s attempt to film his version of the story of Don Quixote, ‘The Man Who Killed Don Quixote’. (More reviews here.) The attempt failed, though there was a postscript that indicated that Gilliam was trying again. (An aside: not the best date movie.)

It was interesting to watch the team’s perspective start out upbeat and slowly descend into despair. There were many reasons the filming failed, but what was most fascinating is that it was a death march project that just happened to take place in the sphere of film.

Of course there were certain ‘acts of God’ that contributed to the failure, but there always are difficulties beyond anyone’s control. What’s more interesting to me are the disasters that could have been planned for. Read through some of the aspects of ‘Lost In La Mancha’ and see if you recognize any (plenty of spoilers, so don’t read on if you want to watch the movie):

1. Gilliam tried to create a $60 million film on a $32.1 million budget. He actually smiles while saying words to this effect!

2. Not all key players were present during planning. In pre-production, none of the actors are able to schedule time to rehearse, partly because they all took pay cuts to make this movie (see point 1), and partly because they were all busy.

3. Tight timelines. Due to money and scheduling, every day of filming was very carefully planned out; any problems on early days required changes to the entire schedule.

4. A visionary architect wasn’t willing to compromise. Gilliam is well known for his mind-blowing films (Twelve Monkeys, Brazil) and had been working on this movie in his mind for decades. This led to perfectionism, which, given the tight timelines and lack of money, wasn’t always the best use of resources. Additionally, Gilliam had a lackadaisical methodology: he mentions several times that his philosophy is ‘just shoot film and it will work out.’ That sounds freakishly similar to ‘just start coding and everything will be fine.’

5. Project history worked against success. This is one of the most interesting points–there were really two kinds of project history present. Film versions of ‘Don Quixote’ have a checkered past–Orson Welles tried for years to make a version, even continuing to film after his Don Quixote died. And Gilliam has had at least one bomb–The Adventures of Baron Munchausen, a box office failure that haunted him for years. In both cases, this history cast a shadow over the present, affecting the team’s morale.

6. When problems arose, the producers didn’t trust the technical staff (the directors). In particular, when bad weather struck, the directors wanted to let the team regroup, whereas the producers, because of points 1 and 3, wanted to keep filming. Strife at the top never helps a project.

7. The equipment and setting were not optimal. Due to, I’m guessing, point 1, the outdoor scenes are set in a location next to a NATO air base, where jets fly overhead (‘only for an hour a day’ according to the first assistant director). The last available sound stage in Madrid is reserved–it turns out to be a simple warehouse with awful acoustics.

And then there were some factors that simply were out of the blue. These included some bad weather and the illness of the actor playing Don Quixote. These were what pushed the film over the edge–but it wouldn’t have been on the edge if not for the other factors above. And you can also see that factors snowball on each other–timelines are tight because actors aren’t around; trust between team members is lost because of money and time issues.

It was like watching a train wreck in slow motion, but it was also illuminating to see that the lessons of project management are ignored not only in software development but also in film. Misery loves company.

Lessons from a data migration

I’ve been working on a data migration project for the last couple of months. There are two schemas, each used by a number of client applications implemented in a number of technologies, and I just wanted to share some of the lessons I’ve learned. Most of the clients are doing simple CRUD, but there is some business logic going on as well. I’m sure most of these points will elicit ‘no-duhs’ from many of you.

1. Domain knowledge is crucial. There were many times when I made dumb mistakes because I didn’t understand how one column mapped to another, or how two tables were being consolidated. This would have been easier if I’d had an understanding of the problem space (networking at layer 3 and below of the OSI burrito).

2. Parallel efforts end up wasting a lot of time, and doing things in the correct order is important. For instance, much of the client code was refactored before the data layer had settled down. Result? We had to revisit the client layer again. It was hard to split up the data layer work in any meaningful manner because of the interdependencies of the various tables (though doing this made more sense than updating client code). Multiple users working on DDL and DML in the same database leads to my next conclusion:

3. Multiple databases are required for effective parallel efforts. Seems like a no-brainer, but the maintenance nightmare of managing multiple developer databases often leads to developers sharing one database. This is workable on a project where most of the development is happening on top of a stable database schema, but when the schema and data are what is being changed, issues arise. Toes are stepped on.

4. Rippling changes through to clients presents you with a design choice. For large changes, like tables losing columns or being consolidated, you really don’t have a choice–you need to reflect those changes all the way through your application. But when it’s a small change, like the renaming of a column, you can either reflect that change in your value objects, or you can hide the changes, either in the DAO (misnamed properties) or database layer (views). The latter choice will lead to confusion down the line, but is less work. However, point #5 is an alternative to both:

5. Code generation would have been a good idea in this case. Rather than having static objects maintained in version control, if the value objects and DAOs had some degree of flexibility–looking at the database to determine their properties–then adding, deleting and renaming columns would have been much, much easier, freeing up more time to fix the GUI and business layer problems that such changes would cause.
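Here’s a sketch of what I mean by point 5–not the code from this project, just an illustration. A value object that asks ResultSetMetaData for its columns picks up additions, deletions and renames without anyone editing a class:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// A "dynamic" value object: the column list comes from the database at runtime
// instead of being hard-coded into a hand-maintained class.
public class DynamicRow {

    private final Map values = new HashMap();

    public static DynamicRow load(Connection conn, String table, long id) throws SQLException {
        // The table name comes from trusted code, not user input.
        PreparedStatement ps = conn.prepareStatement("select * from " + table + " where id = ?");
        try {
            ps.setLong(1, id);
            ResultSet rs = ps.executeQuery();
            if (!rs.next()) {
                return null;
            }
            ResultSetMetaData meta = rs.getMetaData();
            DynamicRow row = new DynamicRow();
            // Ask the database what the columns are, rather than hard-coding them.
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                row.values.put(meta.getColumnName(i).toLowerCase(), rs.getObject(i));
            }
            return row;
        } finally {
            ps.close();
        }
    }

    public Object get(String column) {
        return values.get(column.toLowerCase());
    }
}
```

The trade-off is that you give up compile-time checking of property names, which is part of why regenerating static classes from the schema can be the happier medium.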

Understanding the nuts and bolts

I remember back when EJBs first came out and there were all these tools bundled with the application server to build the XML deployment descriptors. Yet the team I was on built a (Perl) application which could generate those same descriptors. Why? Was it a case of ‘Not Invented Here’ syndrome? Someone with more time than sense? Well, perhaps, but it also ensured the team had a portable way of developing deployment descriptors and made sure that someone had a deep knowledge of said files.

Now, I feel the same way about web applications in general and JSF in particular. If you want to really understand the applications you create, you want to build them from the ground up. But, rather than regurgitate the arguments expounded so clearly in The Law of Leaky Abstractions and Beware Evil Wizards, I’d like to talk about where tools are good. This touches on some of the articles I’ve written before, including ease of programming.

Tools tend to be a fundamental part of large systems that have a lot of people involved. Specialized knowledge (or lack of same) can lead to tools being built to help or insulate users from certain grungy parts of a system–hence the EJB roles, which split the deployer and programmer roles (among others) apart. That works fine with a large team.

But another interesting aspect of tools is the abstraction. Joel postulates that eventually the abstraction breaks down, and I’ve seen it happen. But, then again, I don’t expect to understand all of the socket handling that Tomcat does, or the TCP stack of the operating system on which Tomcat runs. I might have to delve into it if there are issues and it’s a high-performance site, but in the normal course of events, that is simply not required. To link to another pundit, situations arise where such scalability just isn’t in the nature of the application. I’d also submit the tons and tons of VBA apps built on top of Outlook, and the large complex spreadsheets built on Excel, as examples of applications where software design, let alone a deep understanding of the fundamental building blocks of the language, is not required.

Sometimes, you just want to get the job done, and understanding the nuts and bolts isn’t necessary. In fact, it can get in the way. I was talking to an acquaintance today who used to code. When asked why he didn’t anymore, he pointed to one factor–he wanted to be able to service the customer more quickly. At a higher level of abstraction, you can do that. You give up control, because the implementation of the service is usually in other hands (allowing you to go on to service another customer)–in the end, it all needs to be coded somehow. Tools, like Rave and Visual Studio .NET, make that trade-off as well.

Denver No Fluff Just Stuff

Well, I just got done with two days of the Denver No Fluff Just Stuff conference. First off, unlike the previous NFJS conferences, this one wasn’t held in the DTC. You forget how far that is from Boulder, until you drive there and back twice on a weekend.

Anyway, I thought I’d share a few choice thoughts and tidbits regarding some of the sessions I attended. These are by no means full summaries of the talks.

Mock objects–Dave Thomas

Mock objects are objects that emulate the behavior of external entities that make testing difficult. (I’ve worked with a few Englishmen in my life, and Dave Thomas had the same acerbic sense of humor.) Dave illustrated how to choose when to implement a mock object, as opposed to using the real object. He also touched on the true difficulty of mock objects, which is figuring out how to choose which object your class uses (a factory, passing the correct object into the constructor, AOP, class loaders).
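As a sketch of the ‘pass the correct object into the constructor’ option (my example, not Dave’s), the class under test asks for its collaborator, so a test can hand it a hand-rolled mock:

```java
// The external entity that makes testing difficult (a payment gateway, say).
interface PaymentGateway {
    boolean charge(String account, double amount);
}

// The class under test receives its collaborator instead of constructing it.
class OrderService {
    private final PaymentGateway gateway;

    OrderService(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    boolean placeOrder(String account, double amount) {
        return gateway.charge(account, amount);
    }
}

// A hand-rolled mock: records what it was asked to do and returns a canned answer.
class MockPaymentGateway implements PaymentGateway {
    String chargedAccount;
    double chargedAmount;

    public boolean charge(String account, double amount) {
        this.chargedAccount = account;
        this.chargedAmount = amount;
        return true;   // pretend the charge always succeeds
    }
}
```

A test then constructs OrderService with the mock and asserts on chargedAccount and chargedAmount, without ever touching the real external system.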

JSF (both beginning and advanced)–David Geary

JSF is the new standard for web frameworks. David compared it to Swing and Struts meeting in a particle accelerator. Thompson’s fussed about tools for JSF, but I don’t think they’ll be needed for all JSF development, just as tools for Struts help but aren’t required. I think the most important bit about JSF is that it really tries to treat HTML widgets as full-featured GUI components, which is a bit of an adjustment for me. I’m really, really used to thinking of HTML interfaces as generated strings, but this higher-level approach (which has been used in the rich client world for a long time) is interesting.

There was an expert panel, consisting of several of the speakers. One hot topic was whether EJB 3.0 had been hijacked by Gavin King; everyone seemed to have an opinion on that. However, the choicest statement to emerge was Bruce Tate saying Java’s “type safety is an illusion” because everyone who uses a collection casts whenever they take anything out.
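Bruce’s point in code form–in the Java of the day (pre-generics), every read from a collection is a cast the compiler can’t check, so the mistake only surfaces at runtime (my example, not his):

```java
import java.util.ArrayList;
import java.util.List;

public class CastIllusion {
    public static void main(String[] args) {
        List names = new ArrayList();        // pre-generics: the list will hold anything
        names.add("Bruce");
        names.add(new Integer(42));          // compiles fine; nothing stops this

        String first = (String) names.get(0);   // works
        String second = (String) names.get(1);  // ClassCastException at runtime
        System.out.println(first + ", " + second);
    }
}
```

(Generics, then on the way in Java 5, move that check to compile time.)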

Herding Racehorses, Racing Sheep–Dave Thomas

This was a non-technical talk discussing how to improve programming as a profession. He referenced the Dreyfus Model of Skill Acquisition (novices learn differently from experts) and Patricia Benner’s study of nurses in the 1970s, and how their situation was analogous to the current situation of developers. A great quote was “Training is how you give people knowledge; time is how you give people experience.” He also talked about how to move up the skill ladder, and how that will make it more difficult to outsource. However, he didn’t talk about how the relative dearth of novices would create a future shortage of experts, other than to acknowledge that everyone, anywhere, can move up the skill ladder and we need to prepare for that. Prepare by having a plan; this makes sense, as what you’re really doing is choosing where to invest your most precious commodity–your time.

TDD in the web tier–Rick Hightower

Rick covered the basics of Test Driven Development and seemed a bit surprised that everyone wasn’t practicing it; he said it’s helped his code quite a bit. He went over a few tools that make testing (not just unit testing) easier today. A key point was the differentiation between TDD and Continuous Integration: tests that run for TDD need to be fast, since you’re running them multiple times a day, whereas CI tests can be slower. He also made the offhand comment that you could have JMeter proxy requests from a QA tester (in a web browser) and use Canoo (a web application testing tool) to automate those tests. Wouldn’t that be cool? Cheaper than LoadRunner, that’s for sure.

Another expert panel. Someone asked, “What are you folks going to be looking at in the next 6 months?” and I was struck by the lack of diversity in the responses. Groovy, Hibernate and Tapestry came up again and again. Where do the new ideas come from? And where does deep knowledge come from, if everyone is running to a new cool tool every 6-12 months?

An offhand comment that someone made when we were talking about why so many apps had extraneous EJBs: “Yup, that was design by resume.”

Appfuse–Rick Hightower

AppFuse is a way to kick-start your Struts applications. It provides a large chunk of best practices in one place, along with a few pages that everyone needs (user creation, user authentication). Its license is liberal enough that you can use the code in your own project. I was struck by how many times Rick mentioned ripping stuff out, but I’m sure that I would learn quite a bit by poking around in it. It was also clear to me that AppFuse is great for starting new applications, but I’m not sure it’s a good thing (other than as a learning tool) for retrofitting best practices onto existing applications. Also, Rick mentioned multiple times that he wouldn’t use Struts for a new application; given that AppFuse is primarily a Struts starter kit, I was a bit confused by this statement.

GIS–Scott Davis

This was a 1,000-foot overview of (primarily Java) GIS applications. There are quite a few tools out there for displaying GIS data, which comes in several standardized formats (both those formally blessed by a standards organization and those informal standards that grow out of network effects). There isn’t a collection of open source data sets, but you can get a ton of GIS data from government websites. The satellite that Scott’s company owns takes photos that are 15GB of data each, and it takes 500 such photos a day. Talk about storage needs. Also, anyone who wants to find out a bit more about satellite imaging would do well to read “Private eyes in the sky”, an article from the May 4th, 2000 edition of the Economist, which is a good overview of the business.

Again, apologies for the jerky nature of my comments above. (Hey, at least I’m not talking about tugging any unmentionables.) Hangovers are not conducive to good note taking, but even if I had been rested, I still couldn’t do justice to 90 minutes of expert dialog in a paragraph on my blog. But it’s well worth going to one of these conferences, especially if you’re doing Java web development.

Social issues of online gaming

Via the Mobile Community Design weblog comes an interesting presentation on some of the social issues of online gaming (unfortunately, the slide show is IE only). There’s a basic overview of some graph theory, with heavy emphasis on human social networks as graphs and on how you can exploit and support those networks in your game.

Some fascinating slides in the presentation, chock full of information: “In 1974 Granovetter’s ‘Getting a Job’ found that you get most jobs from weak ties, not strong ones, because weak ties inhabit other clusters and therefore have different information”; the relative sizes of US cities have been constant since the 1900s; and the actual degree of separation, from a 1967 experiment, is 5.5, not 6.

I wish I could have gone to the presentation, since I agree with Mike Clark: “bulletware isn’t the content of [the] presentation”, and I’m sure the speaker had plenty of interesting explication of his slides. If nothing else, though, the five pages of bibliography should provide plenty of future reading material.

(Also, check out the Internet timeline for images of the Internet’s growth and more neat graphs.)

vi keybindings for Word

Well, someone’s finally done it. William Tan has put together a set of vi key bindings for Microsoft Word. (Thanks for the pointer, NTK!) I just downloaded and installed it, and thought I’d mention a few things.

1. The author mentions the instability (“alpha” nature) of the code. I haven’t run it long, but I get quite a few “Error 5346” and “Error 4198” messages. I’m no VB expert (nor even a newbie) so I have no idea what those mean. It didn’t seem to affect the document I was editing.

2. Installing the .dot file exposed some weirdness. The default location where you’re supposed to put these files (on WinXP, with Word 2003) is c:\Documents And Settings\Username\Application Data\Microsoft\Word\Startup\. Both the Application Data and Microsoft directories in the above path were hidden from Windows Explorer and the dir command in the shell, but you can cd to them.

The easiest way to install the .dot file is to open up Word, navigate via menus: Tools / Options / File Locations / Startup. Click the modify button, which brings up a file dialog box. Then drag the .dot file to that dialog box.

All in all, I’m glad someone has done this. Now, if only they’d do it for an IDE editor. Errm, I mean a free IDE–I know Visual SlickEdit has a killer vi emulation mode. Yes, I know about Vimulator for jEdit, but the author’s language (“This plugin is in the early stages of implementation and does not yet provide a consistent or reliable VI-style interface.”), along with the fact that it was last released in 2002, scared me away. Actually, it looks like there is one available for Eclipse: viPlugin.

Regardless, a very cool hack. Thanks, William.