
Book Review: How to Lie with Statistics

How to Lie with Statistics, by Darrell Huff, should be required reading for everyone. The cachet of numbers is invoked all the time in modern society, usually to end arguments–after all, who can argue with “facts”? Huff shows how the same set of numbers can be tweaked to show three different outcomes, depending on where you start and what you use. The fundamental lesson I learned from this book is that mathematical calculation involves a whole set of conditions, and any number derived from such a calculation is meaningless without understanding those conditions.

He also mentions that colleagues have told him that the flurry of meaningless statistics is due to incompetence–he dispatches this argument with a simple query: “Why, then, do the numbers almost always favor the person quoting them?” Huff also provides five questions (not unlike the five d’s of dodgeball) for readers to ask when confronted with a statistic:

1. Who says so?

2. How does he know?

3. What’s missing?

4. Did somebody change the subject?

5. Does it make sense?

All this is wrapped up in a book with simple examples (no math beyond arithmetic, really) and quaint 1950s prose. In addition, humor runs from the beginning (the dedication is “To my wife with good reason”) to the end (on page 135, Huff says “Almost anybody can claim to be first in something if he is not too particular what it is”). This book is well worth a couple hours of your time.

“How To Lie With Statistics” at Amazon.

Trust, but verify

As I’ve mentioned previously, the web lets smaller players get into the publishing arena, and we all know there are some amazing websites chock full of interesting and useful information. If you’re tired of hearing the hyperbolic claims of either presidential candidate, and want to see them debunked, check out factcheck.org. Non-partisan, detailed examinations of ads can only help voters make an informed choice. Now, if only they had an RSS feed!

Social issues of online gaming

Via the Mobile Community Design weblog comes an interesting presentation on some of the social issues of online gaming (unfortunately, the slide show is IE only). There’s a basic overview of some graph theory, a heavy emphasis on human social networks as graphs, and a discussion of how you can exploit and support said networks for your game.

Some fascinating slides in the presentation, chock full of information: “In 1974 Granovetter’s ‘Getting a Job’ found that you get most jobs from weak ties, not strong ones, because weak ties inhabit other clusters and therefore have different information”; the relative sizes of US cities have been constant since the 1900s; and the actual degree of separation, from a 1967 experiment, is 5.5, not 6.
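The weak-ties result is easy to see with a toy graph. Here is a minimal Python sketch (the names and friendships are invented for illustration, not taken from the presentation): two tight clusters joined by a single weak tie, which is the only route new information can travel between them.

```python
# Toy illustration of Granovetter's point: within a cluster everyone already
# shares the same information, so the lone "weak tie" between clusters is the
# only path novel information (say, a job lead) can take.
friends = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},    # carol-dave is the weak tie
    "dave":  {"carol", "erin", "frank"},
    "erin":  {"dave", "frank"},
    "frank": {"dave", "erin"},
}

def reachable(start, graph):
    """Everyone whose information can eventually reach `start`."""
    seen, stack = set(), [start]
    while stack:
        person = stack.pop()
        if person not in seen:
            seen.add(person)
            stack.extend(graph[person])
    return seen

print(sorted(reachable("alice", friends)))  # all six people: the weak tie bridges the clusters

# Cut the weak tie and alice's cluster is cut off from the other cluster's leads.
friends["carol"].discard("dave")
friends["dave"].discard("carol")
print(sorted(reachable("alice", friends)))  # only alice, bob, carol
```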

I wish I could have gone to the presentation, since I agree with Mike Clark: “bulletware isn’t the content of [the] presentation”, and I’m sure the speaker had plenty of interesting explication of his slides. If nothing else, though, the five pages of bibliography should provide plenty of future reading material.

(Also, check out the Internet timeline for images of the Internet’s growth and more neat graphs.)

Book Review: The Social Life of Information

I just finished reading The Social Life of Information, by John Seely Brown and Paul Duguid. This was not the quickest read; it’s a business book, with the dense vocabulary that implies. However, if you’re a computer person with any desire to see your work in a larger context, this is a book you should read. In it, they examine eight separate areas in which computers, and the internet in particular, have supposedly changed our lives (this is typically called ‘hype’, though the authors don’t use the word) in the latter years of the 20th century. (This book is copyright 2000.) You probably remember some of these claims: the death of the corporation, of the university, of paper documents, of the corporate office. In each chapter, they review one claim, show how the claim’s proponents over-simplify the issue, and look at the (new and old) responses of people and institutions to the problem that the claim was trying to solve. They also examine, in detail, the ways in which humans process information, and how the software that is often touted as a replacement simply isn’t.

I really enjoy ‘ah-ha’ moments; these are times when I look back at my experiences in a new light, thanks to a theory that justifies or explains something that I didn’t understand. For example, when I started my first professional job, right out of college, I thought the whole point of work was to, well, work. So I sat in my cube and worked 8 solid hours a day. After a few months, when I still didn’t know anyone at the office but had to ask someone how to modify a script I was working on, I learned the value of social interaction at the office. (Actually, I was so clueless, I had to ask someone to find the appropriate someone to ask.) While examining the concept of the home office, the authors state “[t]he office social system plays a major part in keeping tools (and people) up and running.” It’s not just work that happens at the office–there’s collaboration and informal learning.

I’ve worked remotely in the past year for the first time, and anyone who’s worked remotely has experienced a moment of frustration when trying to explain something and wished they were just “there,” to show rather than tell–the authors refer to this process as ‘huddling.’ When someone is changing a software configuration that I’m not intimately familiar with, it’s much easier to judge the correct options and settings if I’m there. The authors explain that “[huddling] is often a way of getting things done through collaboration. At home with frail and fickle technologies and unlimited configurations, people paradoxically may need to huddle even more, but can’t.” This collaboration is even more important between peers.

Reading about the home office and its lack of informal networks (which do form around the corporate office) really drove home the social nature of work. After a few years at my company, I had cross-departmental relationships (often struck up over beer Friday) that truly eased some of my pain. Often, knowing whom to ask a question is more important than knowing the answer. It’s not impossible to build those relationships when you’re working remotely, but it’s much more difficult.

Another enjoyable moment of clarity arose when the authors discussed the nature of documents. I think of a document as a Word file, or perhaps a set of printed out pages. The explicit information (words, diagrams, etc) that I can get from the document is the focus (and this is certainly the case in document management systems sales pitches). But there’s a lot more to a document. How do I know how much to trust the information? Well, if it’s on a website somewhere, that’s a fair bit sketchier than if it’s in the newspaper, which is in turn less trustworthy than if I’ve experienced the information myself. Documents validate information–we’ve all picked up a book, hefted it, examined it, and judged it based on its cover. The authors say “readers look beyond the information in documents. … The investment evident in a document’s material content is often a good indicator of the investment in its informational content.” Just as if someone says “trust me” you should probably run the other way, information alone can’t attest to its own veracity. The authors also look at aspects to documents (like history, like feel, like layout) that simply aren’t captured when you treat them as streams of bits.

And there are many other examples of ‘hype’ that are deflated in this book, and a few other ‘ah-ha’ moments as well. As I stated above, this is a great read for anyone who thinks there is a technical answer to any problem (or even most problems). By taking apart various claims, and examining the truth and untruth of those claims in a real world context, these two authors give technology credit where it’s due, while at the same time explaining why some of the older institutions and important factors in our lives will remain around. Reading this book was hard work, but understanding what the authors say gives me yet another way to relate to non-technical people, as well as fend off the zealots who claim, in a knee-jerk fashion, that more software solves problems. I majored in physics in college, but minored in politics. It always seemed that the people problems, though more squishy, were more interesting. This book is confirmation of that fact.

People and automation

I read this article with interest. I’ve noticed the creep of automated services in the last ten years. Who goes into gas stations anymore, unless you need a candy bar? Given that these machines are a fixed cost investment (as opposed to an ongoing expense, like labor), I expect to see more and more of them. When reading this article, the question every employee has to ask themselves is ‘Am I an elevator attendant or a bank teller?’

I remember reading a story in Analog, years ago, about a general purpose robot and the societal dysfunction it caused. These robots could do everything a human being could, but 24 hours a day rather than 8. Of course, this caused riots among workers, afraid of their jobs being turned over to the robots. Luckily, workers’ organizations and employers were able to come to a compromise–businesses couldn’t own these robots, only people could. Businesses would rent them from individuals, who would thus be able to earn a living.

That’s science fiction for you: both the problems and solutions are outlined in black and white. What we see nowadays is greyer–more and more ATMs are installed, yet tellers are being hired. Robots aren’t general purpose (and humanoid)–they’re slipping into the mainstream industry by industry. People aren’t rioting in protest of robots–they’re traveling extra distance to use them.

But the issues raised are still the same. Every machine that replaces a person (or two and a half people) has a very real impact on that employee’s bottom line. At the same time, if a business can cut its labor costs, it will need to do so (especially if its competitors are also heading down the automation path). These changes revisit the old labor vs. capital divide (wouldn’t Marx and Engels be proud?), and the answers aren’t simple (or completely known, for that matter).

(The same issues arise in offshoring, and Bob Lewis comments on them here (sorry, you have to register to read the article). He states that labor and capital have long been coupled within national economies, but are now being decoupled. He doesn’t have any answers, either.)

Technology has been automating away jobs since the Industrial Revolution, if not before. Things have worked out fine in the past, but it hasn’t always been pleasant to live through.

I don’t see any substantive debate on the nature of labor disempowerment. Perhaps this is because “we’ve seen this before, and it has always worked out” or because it’s an uncomfortable issue (especially in an election year) or because “we don’t have any real leaders anymore” or because we’re all vegetated by the modern opiate of the masses? I don’t know whether labor will riot, but brushing the issue under the rug certainly isn’t going to help.

Computer Security

Computer security has been on people’s minds quite a bit lately. What with all the different viruses, worms, and new schemes to get information through firewalls, I can see why. These problems cause downtime, which costs money. I recently shared a conversation over a beer with an acquaintance who works for a network security company. He’d given a presentation to a local business leaders conference about security. Did he talk about the latest and greatest in countermeasures and self-healing networks? Nope. He talked about three things average users can do to make their computers safer:

1. Antivirus software, frequently updated.
2. A firewall, especially if you have an always-on connection.
3. Windows Update.

Computer security isn’t a question of imperviousness–not unless you’re a bank or the military. In most cases, making it hard to break in is good enough to stop the automated programs as well as send the less determined criminals on their way. (This is part of the reason Linux and Mac systems aren’t as plagued by viruses–they’re less common, and that makes breaking in just hard enough.) To frame it in car terms, keep your CDs under your seat–if someone wants in badly enough, they’ll get in, but the average crook will find another mark.

What it comes down to, really, is that users need to take responsibility for security too. Just like automobiles, where active, aware, and sober drivers combine with seat belts, air bags and anti-lock brakes to make for a safe driving experience, you can’t expect technology to solve the problem of computer security. After all, as Mike points out, social engineering is a huge security problem, and that’s something no program can deal with.

I think that science and technology have solved so many problems for modern society that it’s a knee-jerk reaction nowadays to look to them for solutions, even when they’re not appropriate (the V-chip, the DMCA, Olean), rather than try to change human behavior.

Update (May 10):

I just can’t resist linking to The Tragedy of the Commons, which does a much more eloquent job of describing what I attempted to delineate above:

“An implicit and almost universal assumption of discussions published in professional and semipopular scientific journals is that the problem under discussion has a technical solution. A technical solution may be defined as one that requires a change only in the techniques of the natural sciences, demanding little or nothing in the way of change in human values or ideas of morality.

In our day (though not in earlier times) technical solutions are always welcome. Because of previous failures in prophecy, it takes courage to assert that a desired technical solution is not possible.”

First Monday

First Monday is a collection of peer-reviewed papers regarding technology, “solely devoted to the Internet”. If you want a feel for how Internet technology is affecting society, presented in a clear, reasoned format, this is one of the places to go. Topics range from “what’s wrong with open source” (which got slashdotted recently) to “how online education affects the balance of power at universities” to “how can we best keep track of information”. Fascinating stuff, and the academic nature of the discourse means that it’s got a solid foundation (as opposed to most weblogs, which are just some opinionated person rambling). You can even sign up for an email list to be notified when they post new articles (ah, listserv). Too bad they don’t have an RSS feed.

Ease of programming

Much has been written about ease of use in software, but I think that ease of programming has an even bigger effect. Clay Shirky has written an interesting post about situated software. Situated software is apparently social software written without certain ‘Web Software’ characteristics, and it has some other unique traits. These include:
1. not being as technically rigorous
2. capitalizing on ‘real world’ group knowledge without including that knowledge in software
3. lack of generality
4. planned small number of users
5. accepted physicality
6. short lifespan
7. lack of scalability

His post simply acknowledges that social software (that is, software intended to be used by and relying on the strengths of groups) is becoming, like much other software, easier and easier to write. This is due to a variety of factors:

1. Increasing awareness of computers. The PC has been around for 20 years and is featured in more and more facets of life. This means that even folks who aren’t computer geeks have a basic understanding of how applications work and can be expected to use any application that interests them.

2. Open source and costless software reduce the cost structure. If you have to spend thousands of dollars (or hundreds of hours of building) on a crucial infrastructure component (for example, a database, or a web server, or a set of client GUI libraries), it’s hard to justify if you’re just whipping something together for a small group. But if you have MySQL, Apache, and IE already provided free of charge, it’s a lot easier to build something interesting on top of these components. This also applies to technical knowledge: I’m on a mailing list for computer book authors and have seen quite a few lamentations about technical content being available for free on the web and cutting into book sales.

3. Programmers are expensive. Methodologies are expensive. Repeatable process is expensive. And all these are unneeded, if it’s going to be a small application used by a known and finite number of people.

4. Increasing ease of use. Tools like Perl, MS Office, VB, and PHP are made for throwing together quick applications (see the sketch below). Sure, you can build large-scale applications with these tools if you want, but that takes rigor and discipline. The reason it takes discipline is that these languages were designed from inception to make ‘easy things easy’, even for non-programmers. Microsoft deserves plaudits for realizing this and developing their applications with the idea of a non-programmer building applications in mind. (Have you seen some of the wicked Excel spreadsheets your accounting department has?)
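To make the ‘quick application’ point concrete, here is a minimal sketch in Python (the file name and columns are invented for illustration, not taken from Shirky’s post): a dozen lines, no methodology, no infrastructure beyond the standard library, good enough for a known group of a dozen people.

```python
# Situated-software sketch: turn a hand-maintained sign-up sheet into a web
# page for a small, known group. File name and columns are hypothetical.
import csv
import html

# potluck.csv has a header row: name,dish
with open("potluck.csv", newline="") as f:
    signups = list(csv.DictReader(f))

with open("potluck.html", "w") as out:
    out.write("<html><body><h1>Friday potluck</h1><ul>\n")
    for row in signups:
        out.write(f"<li>{html.escape(row['name'])}: {html.escape(row['dish'])}</li>\n")
    out.write("</ul></body></html>\n")
```

Nothing about this scales or generalizes, and that’s the point: for a known, finite set of users, the rigor a ‘Web Software’ project demands is pure overhead.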

This trend is nothing new. In the 1960s, you had to control your video display system in your program; now you just call on MFC or Swing to handle the guts of the GUI. In the 1990s, you had to build your own state machine for each web application; now you just download one of the many frameworks out there and you get a state machine for free.

My question is, what does it do to society when everyone has some kind of understanding of software? To lean on the analogy with cars, I think you’ll end up with a similar division: a highly skilled, specialized, small workforce that builds software that’s easy to use, and a large class of users, who have varying degrees of understanding of the software, but use it in ways that the designers can’t imagine (how did you dent that?) in all facets of their lives.

Do you know where your sensitive files are? Google does.

Googling Up Passwords points out that Google’s spiders crawl web server error messages and other misconfigurations just as easily as they crawl real content. For simple sites, like mine, there’s not really an issue. Static HTML doesn’t yield much of interest. For complex sites, like Amazon and eBay, there is a phalanx of security experts waiting to pounce upon and patch the latest bug (perhaps not an entire phalanx, but those sites can and must afford security experts). But for the small workgroup web server, probably running MS products (for ease of use, convenience, and training reasons), having such a detailed examination of their web server available by keyword search is a disaster.

I often think of computers and cars in the same light. Automobiles were difficult to operate, prone to breaking down, and expensive during the early years of the 20th century. However, eventually, the technology standardized, the industry consolidated, and the car became a fundamental part of (American) life. Computers have only been accessible to common folk since the 1950s, so it’s not fair to demand the same level of reliability. Yet, how much more protean is the computer than the automobile? It took decades to get air bags installed and seat belts worn. How long will it take before folks have the same level of visceral, unconscious understanding of the perils of the computer?

With enough eyeballs…

I referred to Project Gutenberg obliquely here, but browsing their site I found that they’ve implemented distributed proofreading. This is a very good thing. I did one book, Hiram, the Young Farmer, for PG a few years ago, when I was in college and time wasn’t so precious. The OCR went quickly, but the proofreading was slow going and error prone; the story wasn’t exactly riveting, but it was in the public domain. (In fact, I just took a look at Hiram and found at least two mistakes. Doh!)

But Distributed Proofreaders solves the proofreading problem by making both the scanned image and the OCRed text available to me in a web browser. Now I can proofread one page at a time, easily take a break, and even switch between books if I’d like. Also, they’ve implemented a two-phase review, much like Mozilla’s review and super-review process. Hopefully this will prevent mistakes, since these are going to be the authoritative electronic versions of these documents for some time. Linus’ law probably holds for text conversion even more than for software development.

Now, it wasn’t apparent to me from the website, but I certainly hope the creators of this project have licensed it out to businesses–I can see this application being a huge help for medical transcription (work from home!) and any other kind of paper-to-electronic conversion.

Update:
It looks like there is a bit of a distributed.net-style competition among the PGDP proofreaders.