MySQL performance and doing calculations on varchar columns

MySQL, along with other features designed to make it easy to use, tries to do the right thing regarding strings. When you perform a math calculation on a column or columns of type varchar, MySQL automatically converts that string to a number (empty strings are treated as zero):

To cast a string to a numeric value in numeric context, you normally do not have to do anything other than to use the string value as though it were a number[.]
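For instance, a quick sketch of the coercion the manual describes (run in any MySQL client):

```sql
-- Strings are coerced to numbers in numeric context;
-- an empty string coerces to zero.
SELECT '42' + 1;   -- 43
SELECT '' + 1;     -- 1 (empty string treated as 0)
```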

However, this translation, convenient as it may be, is not free. A client of mine had a query that was running calculations against two such columns. After indexing and trying to simplify the query, we were still seeing query execution times of 2+ seconds (all times are quoted for MySQL 4.1, on my relatively slow personal laptop).

The solution appears to be changing the columns to type double using the alter table syntax. After doing so and running analyze table mytable, I was seeing query execution times of 0.2 seconds for the same query on the same box. Fantastic.
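The change itself is straightforward. A sketch (the column names here are invented for illustration; mytable is from the text above):

```sql
-- Convert the varchar columns to a numeric type, then refresh
-- the table's key distribution statistics for the optimizer.
ALTER TABLE mytable
  MODIFY price DOUBLE,
  MODIFY quantity DOUBLE;
ANALYZE TABLE mytable;
```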

I am not sure if this result was due to not having to do several string conversions for each row returned by the query, or the fact that:

In some cases, a query can be optimized to retrieve values without consulting the data rows. If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed[.]

Regardless of the cause, if you’re doing some complicated calculations on columns, consider making them numbers.


Choosing a wiki

I am setting up a wiki at work. These are our requirements:

  • File-based storage–so I don’t have to deal with installing a database on the wiki server
  • Authentication of some kind–so that we can know who made changes to what document
  • Versioning–so we can roll back changes if need be
  • PHP-based–our website already runs PHP and I don’t want to deal with alternate technologies if I don’t have to
  • Handles binary uploads–in case someone has a legacy doc they want to share
  • Publish to PDF–so we can use this wiki for more formal documents. We won’t publish the entire site, but being able to do this on a per-document basis is required

I see this wiki both as a repository for company muscle memory (technical decisions, R&D results) and a place for more formal documents (requirements) that might be published to PDF. Basically, I’m trying to get rid of .doc files as a technical document format. I read about this on some XP mailing lists of which I used to be a part and am interested to see how effective it might be.

I started with PHPWiki but its support for PDF was lacking. (At least, I couldn’t figure it out, even though it was documented.)

After following the wizard process at WikiMatrix (similar to CMSMatrix, which I’ve touched on before), I found PmWiki, which has support for all of our requirements. It also seems to have a nice extensible architecture.

Installation was a snap and after monkeying around with authentication and PDF installation (documented here), I’ve added the following lines to my local/config.php:

include_once('cookbook/pmwiki2pdf/pmwiki2pdf.php'); // enable the PDF publishing recipe
$EnablePostAuthorRequired = 1; // require an author name before edits can be saved
$EnableUpload = 1;             // allow binary file uploads
$UploadDir = "/path/to/wiki/uploads";               // filesystem path for uploaded files
$UploadUrlFmt = "http://www.myco.com/wiki/uploads"; // public URL for uploaded files
$UploadMaxSize = 100000000; // 100M


PR: another industry being transformed by the internet

No, this entry isn’t about blogs. If you want to hear how blogs are transforming PR, I suggest you visit Micro Persuasion.

Via Dave Taylor, I just found out about PRLeads, a service that lets you field requests for information from journalists. When you see a request about a topic on which you consider yourself an ‘expert’, you can correspond with the journalist. You provide information and context to the journalist and if things work out, you get publicity and gain authority by being quoted in a story.

A friend of mine who has worked in PR for a long time tells me this service is rather revolutionary. Ten years ago, the journalist would have looked to friends or in-shop files for an expert; now they have access to everyone who knows about the service and is willing to pay the fee ($99/month–hardly backbreaking).

This is good for everyone. Journalists get access to experts they might not find otherwise, as well as the chance to write a more accurate story (since they’ll be exposed to more viewpoints). Experts get publicity and a chance to shape public opinion. And the public is exposed to a wider range of views than they’d otherwise see. Win-win-win.

PRLeads is a perfect example of an internet company, by the way. It has network effects–journalists will go where the most experts are, and experts will go where the most journalists are. It’s a service that just couldn’t be efficiently run without the internet. And the main commodity is information.

Update: Here’s a blog entry about results from PRLeads.com.


mod_alias vs mod_proxy for XMLHttpRequest proxying

If you’re going to use an Apache proxy to fix some of the issues with XMLHttpRequest, be aware that mod_alias sends a redirect to the browser. That is, it sends one of the 3XX HTTP status codes to the XMLHttpRequest object. The XMLHttpRequest object then issues the GET itself (well, some do; check out these XMLHttpRequest tests for more). That’s fine if you’re doing a GET, but if you’re doing a POST, some redirects will require user interaction. I found that mod_alias, which was sending a 301 (redirect permanent), simply turned the POST request into a GET. (mod_alias doesn’t always come to the rescue, apparently.)

The solution? Well, since you are really proxying the XMLHttpRequest’s request, use mod_proxy.
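For the record, the two configurations look like this (a sketch: the paths and backend hostname are placeholders, and the ProxyPass lines assume mod_proxy and mod_proxy_http are loaded):

```apache
# mod_alias sends the browser a 3xx redirect, which breaks POSTs:
#   Redirect permanent /api http://backend.example.com/api

# mod_proxy forwards the request server-side, so the browser
# (and the XMLHttpRequest object) never sees a redirect:
ProxyPass        /api/ http://backend.example.com/api/
ProxyPassReverse /api/ http://backend.example.com/api/
```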


Google Maps Gotchas

I’ve done some recent work with Google Maps for a client and thought I’d share some lessons learned. (It seems I’ve been posting a lot about Google lately–I don’t know why.)

First off, like many other folks on the web, I think Google Maps is great. I’ve been a long-time MapQuest user, and the fact is that Google’s image panes just produce a better, slicker user experience than MapQuest’s dynamic image generation. Not to mention that Google’s map API is free (as in beer, not in speech). Now, I haven’t had a chance to compare Yahoo’s map offering (as Michael Yuan did), though I have played around with Yahoo! MapMaker for Excel, but more mapping options can only be better. On to the issues I had with Google Maps.

* Geocoding is not provided with Google Maps, which means that you need to set up your own geocoding engine. However, the TIGER/Line dataset has some holes in it. Especially for rural regions, I was seeing many addresses that could not be geocoded. Even for an urban area (like Boulder, CO), around ten percent of the addresses were not geocodable. As for the accuracy of the geocoding itself, I don’t even know how to test it on a large scale, but my client said he saw at least some instances where an address was geocoded incorrectly (a small number, but still significant). Well, what can you say? If you want precision, much less accuracy, pay for it. I investigated using Yahoo’s geocoding service, which is free and based on higher-quality commercial data. Since my client is a commercial site (even though the maps are available for free), Yahoo said that they would require Yahoo maps on the site if it were to use their geocoding service. Fair enough. (As an aside, this was an interesting podcast of a speech by an executive of Navteq outlining some of the issues surrounding procuring good geodata.)

* PNGs are the default image type for the map pinpoints on Google Maps. These images let you mark certain locations. However, if you display them in a list alongside the map, you’ll quickly find that transparent PNGs don’t work with Internet Explorer. They show up, but are surrounded by a black text box (Update Feb 8: transparent PNGs are outlined by a box in Internet Explorer. I’ve seen black boxes and blue boxes; regardless of the color, it looks bad). Luckily, the transparent PNG/Internet Explorer problem has already been solved.

* Each pinpoint/marker is added using an overlay. If you have significant numbers of overlays, map rendering becomes quite slow. Now, for Firefox users, it’s not as big an issue, because Firefox renders the rest of the page before the map. But for IE users, the table containing both the list and the map is not rendered until the map is displayable. On a reasonably fast box, I was seeing times of 80 seconds before the page rendered. Clearly, this was unacceptable. Even on Firefox, some of the rendering times were just too slow. I searched the Google Maps Discussion Group, and most everyone was saying that if you have a large number of markers, you should cluster them into a few representative markers until the user has zoomed sufficiently. (If I recall correctly, the maximum number of markers most folks recommended was around 20.) Such clustering can happen on the server side, or on the client side.

* Data retrieval architecture affects performance. For the first revision of the mapping application, I sent the data for each pinpoint at the same time as the map and the listing. This was the wrong way to do it–this method makes perceived rendering time much longer. The correct way is documented in the ‘Hacking Maps with the Google Maps API’ XML.com article (linked below), where you use an XMLHttpRequest to pull in the pinpoint data asynchronously. I am in the midst of developing revision two of the interface and have noticed an appreciable speedup in rendering, so I’d recommend heading down that path at the start.
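The marker clustering mentioned above can be done on the client with plain JavaScript before any overlays are created. A minimal grid-based sketch (the {lat, lng} marker format and the bucketing scheme are my own assumptions for illustration, not part of the Maps API):

```javascript
// Group markers into grid cells so only one representative marker
// per cell needs to be added to the map until the user zooms in.
// markers is an array of {lat, lng} objects; cellSize is in degrees.
function clusterMarkers(markers, cellSize) {
  var cells = {};
  for (var i = 0; i < markers.length; i++) {
    var m = markers[i];
    // Bucket each marker by which grid cell it falls into.
    var key = Math.floor(m.lat / cellSize) + ':' +
              Math.floor(m.lng / cellSize);
    if (!cells[key]) {
      cells[key] = { lat: 0, lng: 0, count: 0 };
    }
    cells[key].lat += m.lat;
    cells[key].lng += m.lng;
    cells[key].count++;
  }
  // Place each cluster marker at the centroid of its members.
  var clusters = [];
  for (var key in cells) {
    var c = cells[key];
    clusters.push({
      lat: c.lat / c.count,
      lng: c.lng / c.count,
      count: c.count
    });
  }
  return clusters;
}
```

Once zoomed in past a threshold, you'd skip the clustering step and add the raw markers directly.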

Finally, some resources. The first article you should read is Hacking Maps with the Google Maps API. This tutorial steps you through everything you need to know. The API documentation is adequate. I found some interesting discussions happening on the API discussion group. And, finally, the GoogleMapki was useful in looking at performance concerns. I haven’t read Scott Davis’ Google Maps API article, but keep meaning to spend the $8.50 and do so.

All in all, I enjoyed learning this new technology. The API is simple and easy to use, even for someone who’s no JavaScript expert. I look forward to adding maps to some of my other pages; my cross country skiing resources page in particular would benefit. Google has kickstarted a whole new area of web development, and I think there’s plenty more to do.






© Moore Consulting, 2003-2017