Book Review: How to Lie with Statistics

How to Lie with Statistics, by Darrel Huff, should be required reading for everyone. The cachet of numbers are used all the time in modern society. Usually to end arguments–after all, who can argue with “facts”? Huff shows how the same set of numbers can be tweaked to show three different outcomes, depending on where you start and what you use. The fundamental lesson I learned from this book is that mathematical calculation involves a whole set of conditions, and any number derived from such a calculation is meaningless without understanding those conditions.

He also mentions that colleagues have told him that the flurry of meaningless statistics is due to incompetence–he dispatches this argument with a simple query: “Why, then, do the numbers almost always favor the person quoting them?” Huff also provides five questions (not unlike the five d’s of dodgeball) for readers to ask, when confronted with a statistic:

1. Who says so?

2. How does he know?

3. What’s missing?

4. Did somebody change the subject?

5. Does it make sense?

All this is wrapped up in a book with simple examples (no math beyond arithmetic, really) and quaint 1950s prose. In addition humor runs from the beginning (the dedication is “To my wife with good reason”) to the end (on page 135, Huff says “Almost anybody can claim to be first in something if he is not too particular what it is”). This book is well worth a couple hours of your time.

“How To Lie With Statistics” at Amazon.


More on the difficulty of tuning the JVM

Here’s a great overview of the JVM memory model (for all of Sun’s JVMs, including the latest changes). I find it intensely interesting that he brushes over what, for me, is the most complicated part of any web application tuning–testing. He outlines a process for initially sizing the various compartments of the JVM heap (for versions 1.4 and below), and then says: ‘The [resizing] process continues by “testing and tweaking” until things look good.’

Wow. Talk about waving your hands. I did some testing of a web application a few months ago, but when I presented the results, I was very clear that they were guidelines only. I didn’t have the resources or ingenuity to replicate the behavior of real users on real clients. A few years ago, I was part of a project that ran aground on this same rock, costing the company I was working for plenty of money. Using software to imitate user behavior is hard. A short list of the differences between software and users:

1. Users get distracted–by popups, their kids, etc. Software, not so much.
2. Users are connecting via a variety of methods, with a wide range of quality levels–modems, broadband.
3. Users don’t use applications in the way developers intended. The testing software, on the other hand, is programmed by developers, who naturally have it use the application in the way they intended

The adaptive memory model for the next JDK (wow, now it’s J2SE 5–Sun pulled another Solaris reversioning trick) that the author outlines might make the “tweaking” portion of his hand waving, err, I mean tuning, easier but leaves the “testing” as difficult as ever.


An open letter to Climbing magazine

Here’s a letter to Climbing magazine. I’m posting it here because I think that the lessons Climbing is learning, especially regarding the Internet, are relevant to every print magazine.

——————–
I just wanted to address some of the issues raised in the Climbing July 2004 Editorial, where you mention that you’ve cut back on advertising as well as touching on the threat to Climbing from website forums. First off, I wanted to congratulate you on adding more content. If you’re in the business of delivering readers to advertisers you want to make sure that the readers are there. It doesn’t matter how pretty the ads are–Climbing is read for the content. I’m sure it’s a delicate balance between (expensive) content that readers love and (paid) advertisements which readers don’t love; I wish you the best in finding that balance.

I also wanted to address forums, and the Internet in general. I believe that websites and email lists are fantastic resources for finding beta, discussing local issues, and distributing breaking news. Perhaps climbing magazines fulfilled that need years ago, but the cost efficiencies of the Internet, especially when amateurs provide free content, can be hard to beat. But, guess what? I don’t read Climbing for beta, local issues, or breaking news. I read Climbing for the deliberate, beautiful articles and images. This level of reporting, in-depth and up-close, is difficult to find on the web. Climbing should continue to play to the strengths of a printed magazine–quality, thoughtful, deliberate articles and images; don’t ignore breaking news, but realize that’s not the primary reason subscribers read it. I don’t see how any magazine can compete with the interactivity of the Internet, so if Climbing wants to foster community, perhaps it should run a mailing list, or monitor rec.climbing (and perhaps print some of the choice comments). I see you do run a message board on climbing.com–there doesn’t look to be much activity–perhaps you should promote it in the magazine?

Now for some concrete suggestions for improvement. One of my favorite sections in Climbing is ‘Tech Tips.’ I’ve noticed this section on the website–that’s great. But, since this information is timeless, and I’ve only been a subscriber for 3 years, I was wondering if you could reprint older Tech Tips, to add cheap, useful content to Climbing. Also, I understand the heavy emphasis on the modern top climbers–these are folks that have interesting, compelling stories to tell, which are interesting around the world. Still, it’d be nice to see ‘normal’ climbers profiled as well, since most of us will never make a living climbing nor establish 5.15 routes, but all climbers have stories to share. And a final suggestion: target content based on who reads your magazine. Don’t use just a web survey, as that will be heavily tilted in favor of the folks who visit your website (sometimes no data is better than skewed data). Instead find out what kind of climbers read your magazine in a number of ways: a web survey, a small survey on subscription cards, paper surveys at events where Climbing has presence, etc. This demographic data will let you know if you should focus on the latest sick highball problem, the latest sick gritstone headpoint or the latest sick alpine ascent.

Finally, thanks for printing a magazine worth caring about.
——————–


Friendster re-written in PHP

Friendster is still alive and kicking, and according to Salon, it’s adding 200,000 users every week. In the past, I’ve commented about their business model and I still don’t see any resolution of those problems (lest we forget, taking VC money is not a business model!). But, I’m not here to write about the business model of Friendster today.

I check in, periodically, to Friendster to see if anyone new has joined, or added a new picture, or come up with a new catchy slogan for themselves. When I joined, it was daily, now it’s monthly. One of the things that detracted from the experience was the speed of the site. It was sloooow. Well, they’ve dealt with that–it’s now a peppy site (at least on Saturday morning). And it appears that one of the ways they did this was to switch from JSP to PHP. Wow. (Some folks noticed a while ago.) I wasn’t able to find any references comparing the relative speed of PHP and JSP, but I certainly appreciate Friendster’s new responsiveness.


Book review: The Great Divide

The Great Divide, by Stephen Pern, explores one man’s trip from Mexico to Canada along the Continental Divide. Now, this book explores the backbone of the USA, but the author is definitely (perhaps defiantly) English–and in many ways, from his frequent stops for tea to his sardonic wit to his idioms (biro, peg), it adds to the charm of the book. From New Mexico to Montana, Pern relates the obstacles, emotional, physical and personal, which confront him during his journey. Typically tongue-in-cheek in his prose, he also strikes true notes, especially when commenting on life in America. He lays out a succinct contrast between the New World and the Old: when confronting the lack of historic artifacts on his jounry, he muses “Life [in America] was first established, then lived. Back home [in Europe], it was the other way around.”

The logistics of supplying his 2500 mile journey were worth the read alone–his description of peanut butter rationing chimes with anyone who has backpacked with luxury foods. He also includes an appendix with much information, including suggested maps, useful equipment and obstacles encountered. should you wish to follow in his footsteps. In 1986, when he wrote the book, there was no Continental Divide Trail, although it looks like Congess designated a (still incomplete) route in 1978. Pern is also very clear when he diverges from the Divide, providing maps with small comments and textual explanations of his detours. Many of these are for good reasons–bad terrain, a hot shower, a resupply mission.

But the most interesting sections of this book was not the physical exertion nor the beauty that he described (though a picture section would have been a fantastic addition). No, in the tradition of Least Heat Moon’s ‘Blue Highways’ and Bryson’s ‘In A Sunburned Country,’ it is his interactions that really lend depth and meaning to his book. Whether it’s the innumerable breakfasts fixed for him, a surly shopkeeper in Montana, or a Navajo shepherd who can’t speak English and doesn’t understand the lifestyle of her grandchildren, Pern takes each encounter and uses it to reflect a bit of the American psyche.

All in all, this book was inspiring and well worth a read.


Trust, but verify

As I’ve mentioned previously the web lets smaller players get into the publishing arena, and we all know there are some amazing websites chock full of interesting and useful information. If you’re tired of hearing the hyperbolic claims of either presidential candidate, and want to see them debunked, check out factcheck.org. Non-partisan and detailed examinations of ads can only help voters make an informed choice. Now, if only they had an RSS feed!


java memory management, oh my!

How much do you understand basic java? Every day I find some part of this language that I’m not aware of, or don’t understand. Some days it’s cool APIS (like JAI) but today it’s concurrency. Now, language managed memory is a feature that’s been present in the languages in which I’ve been programming since I started. I’ve looked at C and C++, but taking a job coding in those seems to me it’d be like a job with a long commute–both have obstacles keeping you from getting real work done. (I’m not alone in feeling this way.) But this thread of comments on Cameron Purdy’s blog drove home my ignorance. However, the commenters do point out several interesting articles (in particular, this article about double checked locking was useful and made my head hurt at the same time) to alleviate that. I took a class with Tom Cargill a few years back, which included his threading module, that helped a bit.

However, all these complexities are why servlets (and EJBs) are so powerful. As long as you’re careful to only use local variables, why, you shouldn’t have to worry about threading at all. That’s what you use the container for, right? And we all know that containers are bug free, right? And you’d never have to go back and find some isolated thread related defect that affected your code a maddeningly miniscule amount of time, right?


Symlinks and shortcuts and Apache

So, I’m helping install Apache on a friend’s computer. He’s running Windows XP SP1, and Apache has a very nice page describing how to install on Windows. A few issues did arise, however.

1. I encountered the following error message on the initial startup of the web server:

[Tue Jun 15 23:09:11 2004] [error] (OS 10038)An operation was attempted on something that is not a socket. : Child 4672: Encountered too many errors accepting client connections. Possible causes: dynamic address renewal, or incompatible VPN or firewall software. Try using the Win32DisableAcceptEx directive.

I read a few posts online that suggested I could just follow the instructions–I did and just added the Win32DisableAcceptEx directive to the bottom of the httpd.conf file. A restart, and now localhost shows up in a web browser.

2. Configuration issues: My friend also has a firewall on his computer (good idea). I had to configure the firewall to allow Apache to receive packets, and respond to them. Also, I had to configure the gateway (my friend shares a few computers behind one fast internet connection) to forward the port that external clients can request information from to the computer on which Apache was running. Voila, now I can view the default index.html page using his IP address.

3. However, the biggest hurdle is yet to come. My friend wants to server some files off one of his hard drives (a different one than Apache is installed upon). No problem on unix, just create a symlink. On windows, I can use a shortcut, right? Just like a symlink, they “…can point to a file on your computer or a file on a network server.”

Well, not quite. Shortcuts have a .lnk extension, and Apache doesn’t know how to deal with that, other than to serve it up as a file. I did a fair bit of searching, but the only thing I found on dealing with this issue was this link which basically says you should just reconfigure Apache to have its DocRoot be the directory which contains whatever files you’d like to serve up. Ugh.

However, the best solution is to create an Alias (which has helped me in the past) to the directories you’re interested in serving up. And now my friend has Apache, installed properly as a service, to play around with as well.


PL/SQL

I recently wrote a basic data transformation program using Java and PL/SQL. I hadn’t used PL/SQL (which is an Oracle-specific procedural language for stored procedures) since writing a basic data layer for my first professional project (a Yahoo! like application written in PL/SQL, perl and Story Server–don’t ask). Anyway, revisiting PL/SQL reminded me of some of the things I liked and disliked about that language.

I like:

Invalidation of dependencies. In PL/SQL, if package A (packages are simply arbitrary, hopefully logical, groups of procedures and functions) calls package B, A depends on B. If the signatures of B are recompiled (you can separate the signatures from the implementations) package A simply won’t run until you recompile it. This is something I really wish other languages would pick up, because it at least lets you know when something you depend on has changed out from under you.

I dislike:

The BEGIN and END blocks, which indicate boundaries for loops and if statements, are semantically no different than the { and } which I’ve grown to love in perl and Java. But for some reason, it takes me back to my pascal days and leaves a bit of a bad taste in my mouth.

I’m unsure of:

The idea of putting business logic in a database. Of course, schemas are intimately tied to the business layer (ask anyone trying to move to a different one) and anyone who pretends that switching databases in a java applications is a simple matter of changing a configuration file is smoking crack, but the putting chunks of business logic in the data layer introduces a few problems. Every different language that you use increases the complexity of a project–and to debug problems with the interface between them, you need to have someone who knows both. Also, stored procedures don’t fit very well into any of the object relational mapping tools and pretty much force you to use jdbc.


Death marchs and Don Quixote

I just finished watching ‘Lost In La Mancha’ which chronicles Terry Gilliam’s attempt to film his version of the story of Don Quixote, ‘The Man Who Killed Don Quixote’. (More reviews here.) The attempt failed, though there was a postscript that indicated that Gilliam was trying again. (An aside: not the best date movie.)

It was interesting to watch the perspective of the team start upbeat and slowly descend into despair. There were many reasons the filming failed, but what was most fascinating is that it was a death march project that just happened to take place in the sphere of film.

Of course there were certain ‘acts of God’ that contributed to the failure, but there always are difficulties beyond control. What’s more interesting to me is the disasters that could have been planned for. Read through some of the aspects of ‘Lost In La Mancha’ and see if you recognize any (plenty of spoilers, so don’t read if you want to watch the movie):

1. Gilliam tried to create a $60 million film on a $32.1 million dollar budget. He actually smiles while saying words to this effect!

2. Not all key players present during planning. In pre-production, none of the actors are able to schedule time to rehearse, partly because they all took pay cuts to make this movie (see point 1), partly because they were all busy.

3. Tight timelines. Due to money and scheduling, every day of filming was very carefully planned out; any problems on early days required changes to the entire schedule.

4. A visionary architect wasn’t willing to compromise. Gilliam is well known for his mind-blowing films (Twelve Monkeys, Brazil) and had been working on this movie in his mind for decades. This led to perfectionism, which, given the tight timelines and lack of money, wasn’t always the right use of resources. Addtitionally, Gilliam had a lackadaisical methodology: he mentions several times that his philosophy is ‘just shoot film and it will work out.’ That sounds freakishly similar to ‘just start coding and everything will be fine.’

5. Project history worked against success. This is one of the most interesting points–there were really two kinds of project history present. Film versions of ‘Don Quixote’ have a checkered past–Orson Welles tried for years to make a version, even continuing to film beyond his Don Quixote dying. And Gilliam has had at least one bomb–The Adventures of Baron Munchausen, a box office failure which haunted him for years. In both of these cases, there past actions cast a shadow over the present, affecting morale of the team.

6. When problems arose, the producers didn’t trust the technical staff (the directors). In particular, when weather struck, the directors wanted to allow the team to regroup, whereas the producers, because of points 1 and 3, wanted to film. Strife at the top never helps a project.

7. The equipment and setting was not optimal. Due to, I’m guessing, point 1, the outside scenes are set in a location next to a NATO air base, where jets will be flying overhead (‘only for an hour a day’ according to the first assistant director). The last sound stage in Madrid is reserved–it turns out to be a simple warehouse with awful acoustics.

And then there were some factors that simply were out of the blue. These included some bad weather and the illness of the actor playing Don Quixote. These were what pushed the film over the edge–but it wouldn’t have been on the edge if not for the other factors above. And you can also see that factors snowball on each other–timelines are tight because actors aren’t around; trust between team members is lost because of money and time issues.

It was like watching a train wreck in slow motion, but it was also illuminating to see that the lessons of project management not only are ignored in the software development, but also in film. Misery loves company.



© Moore Consulting, 2003-2017 +