Skip to content

Installing the median user defined function on MySQL

I just re-read “How To Lie With Statistics”, which is so good I think it should be required reading in every middle school. In it, the author makes the point that there are three kinds of ‘averages’: arithmetic mean, median and mode (here I am, contributing to Wikipedia’s dominance, due to my laziness in looking up alternative definitions of statistical concepts). In general, the median is the most informative average, because it’s not skewed by a small number of outliers.But mysql (and other databases I’ve worked on) don’t natively supprt the medan, whereas I believe most support average (by which they mean ‘arithmetic mean’). Sure, you can use a stored procedure (as suggested here for PostgreSQL. However, I’m working with MySQL 4, which does not support stored procs. However, there is another solution: user defined functions. These seem like stored procedures, except you have to write them in C (or C++).

Now, I’m not a C programmer. Luckily, someone has written and released a set of mysql user defined functions that include median (as well as many other statistical manipulations). The bad news is that it hasn’t been updated for years. The good news is that with a bit of luck and many downloads, I was able to get the median function working on mysql, both on windows as a dll, and on linux as a shared library. To repeat, I am not a C programmer, so if you see any head thumping errors below, please let me know and I’ll update this document.

First off, I was working with these versions of mysql: c:\Program Files\MySQL\MySQL Server 4.1\bin\mysql.exe Ver 14.7 Distrib 4.1.10a, for Win95/Win98 (i32) and mysql Ver 14.7 Distrib 4.1.7, for pc-linux (i686)

To get median working on windows, you need to:

  1. Download the mysql-udf tarball.
  2. patch the files if you’re running a version of mysql greater than 4.1.1. patch available here, or the patched tarball is here.
  3. Download and install Visual C++ Express. (If you have a C compiler on Windows, you can skip this step and the next. Oh, and the ones following that will probably be different. (Here’s a blog post about creating a UDF using Visual Studio C++ 2003.)
  4. Download and install the platform SDK; I only followed through step 3. If you don’t, you’ll get ‘windows.h’ errors when you try to compile the UDF.
  5. Untar the myslq-udf tarball. Patch if needed.
  6. Install the mysql header files. I was able to do this via the Windows Installer, which let me modify my existing mysql installation; I had to add the ‘C Include Files / Lib Files’ feature.
  7. Create a new directory. Copy udf_median.cc from the untarred directory to this new directory.
  8. Create a new file in that directory called udf_median.def. This file contains all the methods the UDF is exporting. Or you can just download the file I used here.
  9. Open Visual C++ Express
  10. Choose File / New / Project From Existing Code. Hit Next. Browse to the directory you just created. Create a name for the project. Hit Finish
  11. Edit the udf_median.cc file and comment out the #ifdef HAVE_DLOPEN line as well as the corresponding #endif. If I didn’t do this, I kept getting link errors, as I guess everything between those preprocessor directives was not being compiled.
  12. Add the mysql include files: right click on the project and choose properties. Expand ‘Configuration Properties’ then ‘C/C++’ and click ‘General’. On the right, add an include directory. Navigate to the Mysql include directory and add that.
  13. Add the module definition file: right click on the project and choose properties. Expand ‘Configuration Properties’ then ‘Linker’ and click ‘Input’. Add ‘udf_median.def’ to the key ‘Module Definition File’.
  14. Make sure VC knows this is a DLL: right click on the project and choose properties. Expand ‘Configuration Properties’ and click ‘General’. Choose ‘Dynamic Library (.dll)’ for Configuration Type. If you don’t do this, you’ll get errors like: error LNK2019: unresolved external symbol _WinMain because the compiler thinks you’re trying to build an application.
  15. Right click on the project and choose ‘Build’. This gives you a DLL in the Debug directory.
  16. Copy the DLL to the bin directory of your mysql installation.
  17. Create the function by logging in to mysql and running this command: CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.dll';. (The user you log in as will need to have the ability to insert rows into the mysql tables.)
  18. Test and enjoy.

Deploying the UDF to linux is much simpler, mostly because you don’t have to install a compiler, linker, etc. I used ‘gcc (GCC) 3.3.4’.

  1. Download the mysql-udf tarball.
  2. patch the files if you’re running a version of mysql greater than 4.1.1. patch available here, or the patched tarball is here.
  3. Untar the myslq-udf tarball. Patch if needed.
  4. Edit the udf_median.cc file and comment out the #ifdef HAVE_DLOPEN line as well as the corresponding #endif.
  5. Compile and link the code. Do not use the instructions on the mysql-udf homepage. If you compile with those flags, you’ll get this error when you try to add the function: mysql> CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';
    ERROR 1126 (HY000): Can't open shared library 'udf_median.so' (errno: 22 /usr/lib/udf_median.so: undefined symbol: _Znwj)
    . Rather, use the instructions in this bug report: gcc -shared -lstdc++ -I /usr/include -I /usr/local/include -I /usr/local/mysql/include/ -o udf_median.so udf_median.cc'
  6. Copy the shared library to a directory where mysql will see it. I put it in /usr/lib.
  7. Create the function by logging in to mysql and running this command: CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';. (The user you log in as will need to have the ability to insert rows into the mysql tables.)
  8. Test and enjoy.

We have pushed this UDF to production with replicated servers and haven’t seen any issues with it yet.

I want to extend my thanks to:

[tags]median, mysql, user defined functions[/tags]

Comments on upgrading to version two of Google Maps

I recently upgraded a simple simple Google Map that I built last spring to display some of the cross country skiing around Boulder. You can see the original version here. I built this based on this XML.com article, using XMLHttpRequest to retrieve the data from the server and Gmarker.openInfoWindoXSLT() with this XSL stylesheet to present the data.

I decided to upgrade this map last week to version two. Since openInfoWindowXSLT is no longer supported on every browser, I feared that the upgrade would take significant effort, even though the map very simple. However, the upgrade ended up being easier than I thought it would be. To get started, I read the Google Upgrade Guide–this document explains just what changes were made in the API. The changes that affected my map included:

  • A few method name changes–centerAndZoom becomes setCenter
  • GPoint is no longer used to indicate a latitude and longitude location on a map, and its replacement, GLatLng, reverses the order of the constructor’s arguments.
  • Zoom levels are flipped around, with larger numbers now signifying higher resolutions
  • The biggest effort was modifing the code not to use the XSLT process for generating infoWindows. However, this was easier than I thought it would be. I simply wrote a javascript method that mimicked what the XSL had previously done. Sure, accessing the DOM elements was a bit of a hassle that required some debugging (that’s the win of XSL–declarative DOM access), but the alternatives were either ignore browsers that don’t have built-in XSLT support (Safari) or integrate AJAXSLT, a Google sponsored project to provide cross browser XSLT support. If this were a larger project that depended on more XSLT, I probably would have done the latter.

Upgrading my (admittedly very simple map) took about 1.5 hours. Visit the new map and take a look at the code.

[tags]google maps upgrade[/tags]

Book Review: Google Maps API V2

Seven months ago, I wrote about Google Maps Gotchas. I mentioned Scott Davis’ Google Maps API Pragmatic Friday article, published by the Pragmatic Programmer folks. Well, a few things have happened since then. In April, Google released version two of their maps API (though they still haven’t set a date when version one will no longer be supported), Scott revised his article and I spent a tax deductible $8.50 to give it a read. What you’ll find below is my take on his article.

The good: first, the ordering was easy, and I received my custom PDF (complete with “Prepared Exclusively for Daniel Scott Moore” as a footer on every page) in less than 20 minutes. Scott explains in a very easy to understand fashion how to create a map. He also covers each of the API’s javascript objects and how to use them. In particular, I thought the list of events and objects that fire them (in the ‘Events’ chapter) was a good reference. Now, Google provides a class reference, but Scott’s are a bit easier to understand here’s a comparison, for the Gmarker class:

Google API:

A GMarker marks a position on the map. It implements the GOverlay interface andthus is added to the map using the GMap2.addOverlay() method.A marker object has a point, which is the geographical position where the marker is anchored on the map, and an icon. If the icon is not set in the constructor, the default icon G_DEFAULT_ICON is used.

After it is added to a map, the info window of that map can be opened through the marker. The marker object will fire mouse events and infowindow events.

Davis’ Book:

In the Core Objects section, we introduced the GLatLng. A GLatLng stores a Latitude / Longitude coordinate, but it doesn’t offer you a way to visualize it on a map. A GMarker is the way to add GLatLngs GMarker to the map for display purposes. The GMarker constructor takes a GLatLng as the only required argument.Once we have the marker, we need to tell the map to display it; map.addOverlay(myMarker) should do the trick. (Objects that you superimpose over the map are called Overlays.) You can remove the Overlays marker using map.removeOverlay(myMarker). To remove all overlays, use map.clearOverlays( ).

var myPoint = new GLatLng(38.898748, -77.037684);
var myMarker = new GMarker(myPoint);
map.addOverlay(myMarker);

Theoretically a map can support an unlimited number of markers, but anecdotal evidence suggests that performance starts to slow down significantly after a hundred or so markers. (File under, “Doc, it hurts when I do this.”)

I liked the real world examples–the fact that you could click through and see the code Scott was writing about in action on his website is a real plus. In addition, he builds a decently complex example in Chapter 7 where the user can add and delete cities. He also gives a good warning about examples that use Gmap, rather than Gmap2.

However, there were some issues. Scott’s coverage of the upgrade to version two of the API is, unfortunately, rather spotty. In his blog, the June release of that feature, and the April revision of the book). He also doesn’t cover GDownloadURL, a convenience method for XMLHttpRequest processing, or the GUnload methods. I’ll freely admit that the maps API is a moving target, and some of the omissions above may be due to that.

However, there are other problems. Though billed as a beginner book, he omits what I consider to be one of the fundamental challenges of Google Maps development–the performance obstacles large numbers of database driven markers (other than the comment mentioned above in the GMarker reference). In addition, he doesn’t cover design options, nor cross browser issues (like the transparent PNG in IE issue).

In the last chapter, he mentions good examples of mapping websites, but Scott omits references to useful websites–something that even dead tree books do. In particular, he doesn’t mention mapki.com (a wiki full of useful user provided data) nor the Google Maps group (which some users consider a primary differentiator between Google and Yahoo Maps).

One final gripe is that the 75 pages of content that I expected were really only 45–text only filled about 60% of the column width. I expect that in articles I read for free on the web, but in books that I pay for, I like a bit higher content to page ratio.

In short, this ebook is a good choice for the first time Google Maps builder. This is due to the tutorial nature of much of the book, the examples, and the explanation of typical good javascript code, such as using anonymous functions for the event handlers. It is not entirely adequate in covering version 2 of the API, possibly due to API changes, and it ignored some of the more complex aspects of the API.

If you’re looking for a folksy introduction to Google Maps api, it’s worth the $8.50 to have a coherent guide. If you’ve muddled through one google maps project, piecing together things from the API docs and various blogs, it becomes less worthwhile. But if you want some kind of discussion about complex Google Maps issues this document is not the right place to look.

[tags]Scott Davis, Google Maps, Pragmatic Fridays[/tags]

11 Tips for Managing Virtual Employees

Via WebWorkerDaily, here’s 11 tips for managing virtual workersUpdate, 6/2009: this link is dead, but here’s the wayback machine’s archived version.
I have been working virtually on and off for over three years (mostly as a contractor). I think that most of Scott’s comments are spot on, except two of them.

#1 (It’s not cheaper) says that you’ll need hardward like webcams and wireless keyboards. I’ve done prefectly well with a laptop and cellphone. In fact, one of the great things about working virtually is that everyone can choose their equipment to suit their needs.

#4 (Metrics, Metrics, Metrics) is one I’d amend, rather than totally discount. I think what Scott is really saying is that you need some means of verifying that work is getting done. And if it is not, you want to know sooner rather than later, so the more incremental feedback you can get, the better. In software, this can be accomplished by metrics, but frequent releases can also serve the same purpose.

That said, all of his post is worth reading.

[tags]working virtually,virtual software development[/tags]

On recruiters and job boards

Joel, of Joel On Software fame, has created a jobs board with some rather different rules for posting: no hidden company names (no ‘anonymous jobs’) and a money back guarentee. The main goal is to make a niche job board where folks from the JoS community can find jobs at good companies. Requiring company names also screens out many recruiters.

I think intermediaries in the hiring process can provide high value (I love reading the insights of Nick Corcodilos’ “Ask The Headhunter”). I’ve gotten contracts through recruiters before. Such recruiters can add value by screening candidates and pulling from a wider pool than a company might have available. It’s outsourced HR.

However, I have a resume on Monster. Often, I’ll get an email from a recruiter about a job that is obviously a wrong fit. For example, I have some ATG Dynamo experience and some StoryServer experience, both from several years ago. I’ve received emails detailing jobs where extensive experience with the latest version of ATG Dynamo or StoryServer is required. If the emailer had bothered to read my resume after the keyword match, they’d know I was not a good fit. It is job spam. I’ve hidden my resume on Monster, which helps, but before I did so, my email was sucked into the recruiter databases.

Now, you may be saying “Boo-hoo, Dan! You’re being offered jobs that aren’t a good fit. Just delete the emails!” It’s true, I can treat those emails like any other spam. And the costs are the same as any other spam–my attention. (Occasionally it’s useful to see what positions and technologies are being recruited for, which keeps me from sending all the job spam directly to the trash). The business model for these recruiters is also the same–send out enormous quantities of (nearly costless) email and play the numbers. Employees do this kind of thing all the time when they send out massive numbers of resumes, and it’s broken.

Back to Joel’s jobs board. After a few days, someone violated the no ‘anonymous jobs’ rule, and Joel’s request for suggsions on how to deal with it sparked this discussion, which is pretty lively. Pretty much everyone fell into two categories: don’t allow ‘anonymous jobs’ lest you become another Monster.com, or allow the recruiters, but make anyone submitting an anonymous job check a box, and let users hide such entries. Some folks also suggested charging recruiters more (sometimes much more) than companies, or creating a job board just for recruiters, or creating some kind of rating system which will let ‘good’ employers (whatever that means) float to the top of listings. It does look like Joel has decided to keep the ‘no anonymous jobs’ rule. The JoS job board is now an intermediary, even though it provides nothing more than aggregation and a bit of screening.

I said before that intermediaries in hiring process can have value. I believe this is true of any intermediary in any process, whether it be job board, recruiter, real estate agent, car salesman, travel agent or anything else. But (cue Jaws Theme) disintermediation due to decreased information distribution costs means intermediaries can no longer add value just by having access to information (whether it be MLS listings or phone numbers for hotels in Australia). Now they need to provide something beyond what the internet can provide, whether that be deep experience in a particular city’s real estate or a good relationship with a hiring manager or aggregation of interesting eyeballs.

(“And that’s something that job spammers simply cannot provide.” Well, that was my last sentence until I re-read my post. But job spammers do actually aggregate interesting eyeballs; they just do it inefficiently.)

[tags]joel on software, job boards, recruiters, disintermediation[/tags]