
All posts by moore - 83. page

A survey of geocoding options

I wrote a while back about building your own geocoding engine. The Tiger/Line dataset has some flaws as a geocoding service, most notably that once you get out of urban areas many addresses simply cannot be geocoded.

Recently, I was sent a presentation outlining other options (pdf), which seems to be a great place to start an evaluation. The focus is on Louisiana–I’m not sure how the conclusions would apply to other states’ data.

Full content feeds and Yahoo ads

I changed the Movable Type template to include full content on feeds. Sorry for the disruption (it may have made the last fifteen entries appear new).

I think sending full content in the feeds (both RSS1 and RSS2) goes nicely with the Yahoo Ads I added a few months ago. Folks who actually subscribe to what I say shouldn’t have to endure ads, while those who find the entries via a search engine can endure some advertising. (Russell Beattie had a much slicker implementation of this idea a few years ago.)

More about the ads: I think that they’re not great, but I think that’s due to my relative lack of traffic–because of the low number of pageviews, Yahoo just doesn’t (can’t) deliver ads that are very targeted. (I’ve seen a lot of ‘Find Dan Moore’). It’s also a beta service (ha, ha). Oh well–it has paid me enough to go to lunch (but I’ll have to wait because they mail a check only when you hit $100).

As long as we’re doing public service announcements, I’ve decided to turn off comments (from the initial post, rather than only on the old ones). Maybe it’s because I’m posting to thin air, or because I’m not posting on inflammatory topics, or because comment spam is so prevalent, but I’m not getting any comments anymore (I think 5 in the last 6 months). So, no more comments.

And that’s the last blog about this blog you’ll see, hopefully for another year.

New blog on information technology and public policy

Here is a new blog on information technology and public policy that I’ve started reading. It’s an interesting concept–each student is required to write once a week, guests are welcome to chime in (you can even follow along with the reading list, should you choose to do so), and the posts seem to be well thought out. I found the post titled File-sharing, Market Impact and Consumer Welfare to be particularly interesting.

This is a graduate course at the Woodrow Wilson School of Public and International Affairs, graduates of which include Samuel Alito, Bill Frist, Eliot Spitzer and others. So the folks writing these opinions have, at the least, an opportunity to become movers and shakers.

Via Freedom to Tinker.

Exceptions in API design

Here’s an old but fantastic post about API (mis)design.

To handle an exception, you have four choices, one of which is:

Log it! We bought big hard disks for those servers, let’s use them! Log the exception toString() and print its stack trace, but only if you expect the exception to be thrown over 1,000,000 times each day. Alternatively, if you think that the exception would only occur rarely and that it could indicate a problem worth looking at, just print the exception class name … since stack traces just confuse people!

Hilarious. And, it looks like it’s part 4 of a 7 part series. Via Dejan Bosanac.

Wiki practices for requirements documentation

I have been heads down for the last couple of weeks helping write requirements and design documents. The team I’m on is building them using a wiki. (I discussed the wiki selection process a few weeks ago.)

I just wanted to outline a few practices (I hesitate to call them best practices) for using a wiki to document business and technical requirements.

  • Having a wiki allows anyone to edit the requirements. This doesn’t mean that everyone will or should. Documents should still have an owner.
  • Require folks to identify themselves. Require Author met our needs, as it requires an editing user to enter some identifier. A history function, without tracking who made which edit, is fairly useless. Note that our solution works for a small team. A larger team may want to authenticate every user.
  • Make sure you lock down the wiki. We have ours behind the firewall, which means that we don’t have to require a user to remember yet another password, or even log in at all (beyond providing some kind of identifier once, as mentioned above).
  • PDF generation allows you to generate decent looking print documents. I found PmWiki2PDF to be adequate.
  • Think carefully about document structure. We broke out the requirements into sections, and had each section on its own wiki page; more than that, we have pages for each section for each type of requirements (business, technical) or design document. These three section pages are pulled into a page for that one component, via the page include directive, which should describe everything known about a particular component. This kind of page seems useful at present, but we haven’t begun coding.
  • However, if I had to do it over again, I’d build each main document as one wiki page, and then pull the component info out of that. This allows a user to view the overall history of the document, as opposed to the above setup, where, to see what has changed in the requirements, you have to visit as many pages as there are sections. (You can also look at the RecentChanges page for a group, but that has only page level granularity, as opposed to the line level granularity of the page history.)
  • Choose page names carefully. While it’s easy to move content from one page to another, realize that you lose all the history when you do that. Well, actually, you might be able to move the file on the filesystem and retain the history, but for normal users, moving a page (that is, changing a page name) causes history loss.
  • Keep requirements, whether in sections or in one document, in a different group than the design document. This allows you to lock down the requirements group via a password, while letting other documents, like design, continue to evolve.
  • Cross reference extensively. Don’t cut and paste; link or include.
  • Use pictures. The support for uploading pictures in PmWiki is alright, though the support for removing them isn’t great. Regardless, don’t shy away from diagrams and other graphics in the wiki.
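The per-component page described in the list above can be sketched as a small generator. This is only an illustration under stated assumptions: the group names (BusinessRequirements, TechnicalRequirements, Design) and the component name are hypothetical, not the ones our team used, and the output relies on PmWiki's (:include ...:) directive and "!" heading markup.

```python
from typing import Sequence

# Hypothetical group names, one per document type. Keeping requirements and
# design in separate groups lets the requirements groups be locked separately.
DOCUMENT_GROUPS = ("BusinessRequirements", "TechnicalRequirements", "Design")


def component_page(component: str,
                   groups: Sequence[str] = DOCUMENT_GROUPS) -> str:
    """Return wiki markup for a page that pulls together every section
    page written about one component, using include directives."""
    lines = [f"!! {component}"]
    for group in groups:
        lines.append(f"!!! {group}")
        # Each section lives on its own page, named Group.Component.
        lines.append(f"(:include {group}.{component}:)")
    return "\n".join(lines)


if __name__ == "__main__":
    # "Checkout" is a made-up component name for illustration.
    print(component_page("Checkout"))
```

Because the component page only includes the section pages, its own history stays nearly empty; the edits are recorded on the included pages, which is the tradeoff noted in the list above.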

I’m going to be interested to see how the process continues to evolve as we get further into development. But so far, I think that a wiki has everything you really need to generate requirements documentation for a small team of developers.

MySQL performance and doing calculations on varchar columns

MySQL, along with other features designed to make it easy to use, tries to do the right thing regarding strings. When you perform a math calculation on a column or columns of type varchar, MySQL automatically converts each string to a number (empty strings are treated as zero):

To cast a string to a numeric value in numeric context, you normally do not have to do anything other than to use the string value as though it were a number[.]

However, this translation, convenient as it may be, is not free. A client of mine had a query that was running calculations against two such columns. After indexing and trying to simplify the query, we were still seeing query execution times of 2+ seconds (all times are quoted for MySQL 4.1, on my relatively slow personal laptop).

The solution appears to be to change the type of the columns using the alter table syntax to type double. After doing so and running analyze table mytable, I was seeing query execution times of 0.2 seconds for the same query on the same box. Fantastic.

I am not sure if this result was due to not having to do several string conversions for each row returned by the query, or the fact that:

In some cases, a query can be optimized to retrieve values without consulting the data rows. If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed[.]

Regardless of the cause, if you’re doing some complicated calculations on columns, consider making them numbers.
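For reference, here is a minimal sketch of the column change described above, written as a Python helper that only builds the SQL text. The column names price and quantity are hypothetical (the table name mytable is the one used above), and the helper is my own illustration, not part of any MySQL client library. Try the statements on a copy of the table first, since values that are not clean numbers will not survive the conversion intact.

```python
import re
from typing import List, Sequence

# Accept only plain identifiers so the generated SQL cannot be malformed.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_identifier(name: str) -> str:
    """Backtick-quote a simple MySQL identifier, rejecting anything unusual."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return f"`{name}`"


def varchar_to_double_statements(table: str,
                                 columns: Sequence[str]) -> List[str]:
    """Build the statements that change the given columns to DOUBLE and
    then refresh the table's index statistics."""
    if not columns:
        raise ValueError("at least one column is required")
    # MODIFY restates the whole column definition, so append NOT NULL or
    # DEFAULT clauses by hand if the original columns had them.
    modifications = ", ".join(
        f"MODIFY COLUMN {quote_identifier(column)} DOUBLE"
        for column in columns
    )
    return [
        f"ALTER TABLE {quote_identifier(table)} {modifications}",
        f"ANALYZE TABLE {quote_identifier(table)}",
    ]


if __name__ == "__main__":
    # Hypothetical column names; substitute the varchar columns in question.
    for statement in varchar_to_double_statements("mytable",
                                                  ["price", "quantity"]):
        print(statement + ";")
```

Timing the query before and after running the two statements is the only way to tell whether the change helps for a given table.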