With enough eyeballs…

I referred to Project Gutenberg obliquely here, but browsing their site I found that they’ve implemented distributed proofreading. This is a very good thing. I did one book, Hiram, the Young Farmer, for PG a few years ago, when I was in college and time wasn’t so precious. The OCR went quickly, but the proofreading was slow going and error prone; the story wasn’t exactly riveting, but it was in the public domain. (In fact, I just took a look at Hiram and found at least two mistakes. Doh!)

But Distributed Proofreaders solves the proofreading problem by making both the scanned image and the OCRed text available to me in a web browser. Now I can proofread one page at a time, easily take a break, and even switch between books if I’d like. Also, they’ve implemented a two-phase review, much like Mozilla’s review and super review process. Hopefully this will catch mistakes before they’re published, since these are going to be the authoritative electronic versions of these documents for some time. Linus’ law probably holds for text conversion even more than for software development.

Now, it wasn’t apparent to me from the website, but I certainly hope the creators of this project have licensed it out to businesses–I can see this application being a huge help for medical transcriptions (work from home!) and any other kind of paper to electronic form conversion.

Update:
It looks like there is a bit of a distributed.net type competition among the PGDP proofreaders.

Long running queries in servlets

The stateless nature of the web presents some user interface issues. Not least of these is how to handle long running processes most efficiently. Do you keep the user waiting, do you poll, etc? Remember, even if everything is going dandy, normal users like to see something.

This JavaRanch article is a good explication of how to use message driven beans and asynchronous access to data in the web tier to deal with these problems. It leans a bit heavily on WebSphere, but does seem to address some of Dion’s issues about there not being enough use of messaging systems. And it even throws a couple of design patterns in as well.
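To make the polling option concrete, here is a minimal sketch. It is not the message driven bean approach from the article; I've swapped in a plain in-process thread pool, in Python, just to show the shape of "hand back a job id right away, then let the page poll for status." The names submit_job, poll_job, and slow_query are hypothetical, and the in-memory registry assumes a single server process.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory job registry. Assumes one server process;
# a real deployment would need shared or persistent storage.
_executor = ThreadPoolExecutor(max_workers=4)
_jobs = {}
_jobs_lock = threading.Lock()


def submit_job(func, *args):
    """Start func in the background and return a job id immediately."""
    job_id = uuid.uuid4().hex
    future = _executor.submit(func, *args)
    with _jobs_lock:
        _jobs[job_id] = future
    return job_id


def poll_job(job_id):
    """Return a small status dict that the web tier could render on each poll."""
    with _jobs_lock:
        future = _jobs.get(job_id)
    if future is None:
        return {"state": "unknown"}
    if not future.done():
        return {"state": "running"}
    error = future.exception()
    if error is not None:
        return {"state": "failed", "error": str(error)}
    return {"state": "done", "result": future.result()}


def slow_query(n):
    """Stand-in for a long running query."""
    time.sleep(0.2)
    return n * 2


if __name__ == "__main__":
    job_id = submit_job(slow_query, 21)
    # The browser would re-request a status page; here we just loop.
    while poll_job(job_id)["state"] == "running":
        time.sleep(0.05)
    print(poll_job(job_id))
```

The request that starts the work returns at once, so the user sees something ("still running...") on every poll instead of a hung browser. A messaging system buys you the same decoupling plus durability, which this sketch does not attempt.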

The people’s voice

Tim Bray points out Radio Vox Populi which is a really cool idea:

weblogs + web crawler + text-to-speech + mp3 streaming = talk radio for everyone.

Of course it could do with some filtering, or categorization, but it’s a cool idea. It actually jibes with an idea I’ve had for a long time, which is to use text-to-speech, perhaps Festival, to burn CDs of Project Gutenberg to create cheap books on CD (oh, should I listen to Boy Scouts on Motorcycles, by G. Harvey Ralphson or Armenian Literature, by Anonymous today?). That’d be cool, if you can handle listening to a robot voice.

Comments on “‘Real Throwbacks’ comment response”

Well, I was going to trackback this post, but Nancy doesn’t have that enabled, so I’ll just comment here. Much anger in this one.

The problem with raging about radio is that it’s a *free* service. What do you pay for the time you listen to the radio? Now, of course ClearChannel is pop pap and there’s a lot of consolidation happening in the radio business, with generally negative impacts on quality. Don’t blame CC–they’re just reacting to the mandates of the market. (Your media can be free, diverse, or equal, pick any two.)

If you don’t like that-which-was-KTCL, blame the government for taking a public good and whoring it out without thinking about the consequences or having any more justification than ‘the market always does right.’ If there’s one thing we should have learned from the last couple of centuries, it’s that while capitalism may be the least of the evils, it’s still evil. Of course, this isn’t a new thing.

Harpers

I was at a friend’s house a few months ago and ran across a copy of Harper’s magazine. I’d read it before, mostly in dentists’ offices and such, but I read this one cover to cover. There was an especially hilarious piece, Beware of Dogg by Dr. Ninjaforkian, in the Readings section (which has apparently been posted on /. and MetaFilter). Since then, there’ve been bits on ClearChannel, the food chain, Korean sayings, and the coming election. Eclectic, no?

I just found out that one of my favorite sections is online: Harper’s Index displays fascinating facts and gives you the source for every one. Just what you need at parties!

“Percentage of Chinese exports to the U.S. accounted for by merchandise sold at Wal-Mart : 10 [Wal-Mart (Bentonville, Ark.)/Department of Commerce (Washington) ]

Number of factory jobs that China has lost since 1995 : 25,000,000 [Alliance Capital Management Corporation (N.Y.C.) ]”
from Feb 2004

“Number of Canadian prison inmates who overdosed in March on fellow prisoners’ methadone-laced vomit: 2 [Saskatchewan Department of Corrections (Regina, Canada)]

Number of inmates charged with drug trafficking for providing the vomit: 3 [Saskatchewan Department of Corrections (Regina, Canada)]”
from Sep 2003

I didn’t see the sources online, but they’re there in the HTML source, and hence in the cut-and-paste above (I don’t really understand why they weren’t showing up; neither Mozilla nor IE displayed them). Go ahead, read them all.

IPTraf

Hey, I like to work at the higher levels of the 7 Layer Burrito, the Application, Presentation and Session layers. But every so often, you have to dig a bit deeper. Currently, I’m troubleshooting a ColdFusion application that was converted from a local MySQL database to a remote PostgreSQL database. There are quite a few docs about optimizing PostgreSQL, but they focus on query and local database optimization, and I think the issue is the network traffic (based on the load average of both the local and remote boxes). Anyway, I found this neat tool called IPTraf which gives you real time monitoring of IP traffic. Pretty nice, but avoid the US mirror of the binary build, since it’s not complete.

What are EJBs good for?

Dion had a good post about what EJBs are good for. I’ve seldom used EJBs (and then only peripherally), but it’s my understanding, from reading the literature, that EJBs are appropriately named–that is, good for enterprise situations. In that case, what on earth are these folks thinking? They demonstrate using an EJB in a JSP. What?

Publishing power

You have to give the web credit for making information distribution a lot cheaper. Whether it’s a small business distributing forms via the web or BlockBuster distributing rental coupons via email, it’s just plain simpler to get information distributed over the internet.

A friend just forwarded me the expected US budgets for the next 5 years. And then he forwarded me budgets going back to 1996. An invaluable resource, to be certain. What other countries allow you to look at their budget on the web? The UK, New Zealand, Canada, Australia, India, Fiji….

Wow. And all this was found with half an hour of searching. Wonderful!

An IM application server

I’ve written before about IM in the workplace. It’s becoming more and more prevalent, and other people have noticed this as well. IM is something that’s easy to use, and gives you the immediate response of the phone without being nearly as intrusive.

Now, in the past, using IRC, it was relatively easy to have a program, or bot, that would listen to conversations, or that you could ask questions of. They were dumb, but they worked. In the world of IM, I wasn’t aware of any easy way to do this. However, browsing freshmeat yesterday I discovered an easy way to write IM applications.

It’s called the SDBA Revolution Instant Messaging Application Server and building IM applications is fantastically easy if you use this Perl framework. I was able to download it and build a simple application in about 30 minutes. And that includes signing up for the usernames from AOL. It uses a Perlish syntax and doesn’t support extremely complicated applications, but does offer enough to be useful. If you can code a PHP website, you can build an IM application. The author even provides six or so sample applications, including a database interface (scary!). The only issues I found with the IM app server were:

1. It doesn’t support Yahoo! That’s because the Yahoo! IM Perl module has been unmaintained since the last Yahoo! protocol update.

2. I’m not sure of the legality of using a bot on a public service like AIM, MSN, or Yahoo!. Violations of these license agreements happen all the time, but, if you’re a stickler for those darn license agreements, this application server appears to work with Jabber.

Just goes to show you that 30 minutes a week browsing freshmeat or SourceForge will almost never be wasted. A bit of slack to do this will probably pay off in the long run.

PowerPoint and presentations

I went to an ACM meeting last Tuesday at NREL. The topic was “The Role of Computational Science in Energy Efficiency and Renewable Energy Research” by Dr. Steve Hammond. It was an interesting talk–NREL is doing some neat stuff with alternative energy sources (one thing that Dr. Hammond mentioned was an algae that produces hydrogen gas–a possible clean, renewable, easily scalable source of that element).

Now, I definitely don’t want to single out Dr. Hammond. He did a good job explaining the value of computing to energy research, as well as fielding questions that were outside his expertise from nitpicking engineers (are there any other kind?). However, his presentation just drove home to me how easy it is to let PowerPoint drive a presentation, and how much doing that detracts from the speaker’s points. I’m certainly not the first person to mention this. But I just wanted to point out this very very good article about speaking during a presentation, rather than just reading from slides.

Hey buddy, I can probably read those slides faster than you can say them, and it’s a lot less boring for me. Instead, explain the slides to me in a way that makes the talk more of a conversation. Don’t let the technology drive the presentation; it may be easier to read the slides, but it makes for a much poorer presentation.