{"id":376,"date":"2006-09-27T11:57:18","date_gmt":"2006-09-27T17:57:18","guid":{"rendered":"http:\/\/www.mooreds.com\/wordpress\/?p=376"},"modified":"2007-04-11T16:20:07","modified_gmt":"2007-04-11T22:20:07","slug":"installing-the-median-user-defined-function-on-mysql","status":"publish","type":"post","link":"https:\/\/www.mooreds.com\/wordpress\/archives\/376","title":{"rendered":"Installing the median user defined function on MySQL"},"content":{"rendered":"<p>I just re-read <a href=\"http:\/\/www.mooreds.com\/wordpress\/?p=158\">&#8220;How To Lie With Statistics&#8221;<\/a>, which is so good I think it should be required reading in every middle school.  In it, the author makes the point that there are three kinds of &#8216;averages&#8217;: <a href=\"http:\/\/en.wikipedia.org\/wiki\/Mean\">arithmetic mean<\/a>, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Medin\">median<\/a> and <a href=\"http:\/\/en.wikipedia.org\/wiki\/Mode_%28statistics%29\">mode<\/a> (here I am, <a href=\"http:\/\/www.tbray.org\/ongoing\/When\/200x\/2006\/09\/15\/Wikipedia\">contributing to Wikipedia&#8217;s dominance<\/a>, due to my laziness in looking up alternative definitions of statistical concepts).  In general, the median is the most informative average, because it&#8217;s not skewed by a small number of outliers.But mysql (and other databases I&#8217;ve worked on) don&#8217;t natively supprt the medan, whereas I believe most support average (by which they mean &#8216;arithmetic mean&#8217;).  Sure, you can use a stored procedure (as suggested <a href=\"http:\/\/www.joeconway.com\/plr\/doc\/plr-aggregate-funcs.html\">here for PostgreSQL<\/a>.  However, I&#8217;m working with MySQL 4, which does not support stored procs.  However, there is another solution: <a href=\"http:\/\/dev.mysql.com\/doc\/refman\/4.1\/en\/adding-udf.html\">user defined functions<\/a>.  These seem like stored procedures, except you have to write them in C (or C++).<\/p>\n<p>Now, I&#8217;m not a C programmer.  Luckily, someone has written and released a set of <a href=\"http:\/\/mysql-udf.sourceforge.net\/\">mysql user defined functions that include median<\/a> (as well as many other statistical manipulations).  The bad news is that it hasn&#8217;t been updated for years.  The good news is that with a bit of luck and many downloads, I was able to get the median function working on mysql, both on windows as a dll, and on linux as a shared library.  To repeat, I am not a C programmer, so if you see any head thumping errors below, please let me know and I&#8217;ll update this document.<\/p>\n<p>First off, I was working with these versions of mysql: <code>c:\\Program Files\\MySQL\\MySQL Server 4.1\\bin\\mysql.exe  Ver 14.7 Distrib 4.1.10a,  for Win95\/Win98 (i32)<\/code> and <code>mysql  Ver 14.7 Distrib 4.1.7, for pc-linux (i686)<\/code><\/p>\n<p>To get median working on windows, you need to:<\/p>\n<ol>\n<li><a href=\"http:\/\/prdownloads.sourceforge.net\/mysql-udf\/mysql-udf-0.3.tar.gz?download\">Download the mysql-udf tarball<\/a>.<\/li>\n<li>patch the files if you&#8217;re running a version of mysql greater than 4.1.1.  <a href=\"http:\/\/sourceforge.net\/tracker\/?group_id=114041&#038;atid=667000\">patch available here<\/a>, or the <a href=\"http:\/\/www.mooreds.com\/files\/mysql-udf-patched.tar.gz\">patched tarball is here<\/a>.<\/li>\n<li>Download and install <a href=\"http:\/\/msdn.microsoft.com\/vstudio\/express\/visualC\/default.aspx\">Visual C++ Express<\/a>.  (If you have a C compiler on Windows, you can skip this step and the next.  Oh, and the ones following that will probably be different. (<a href=\"http:\/\/rpbouman.blogspot.com\/2005\/11\/using-udf-to-raise-errors-from-inside.html\">Here&#8217;s a blog post about creating a UDF using Visual Studio C++ 2003<\/a>.)<\/li>\n<li>Download and install the <a href=\"http:\/\/msdn.microsoft.com\/vstudio\/express\/visualc\/usingpsdk\/\">platform SDK<\/a>; I only followed through step 3.  If you don&#8217;t, you&#8217;ll get &#8216;windows.h&#8217; errors when you try to compile the UDF.<\/li>\n<li>Untar the myslq-udf tarball.  Patch if needed.<\/li>\n<li>Install the mysql header files.  I was able to do this via the Windows Installer, which let me modify my existing mysql installation; I had to add the &#8216;C Include Files \/ Lib Files&#8217; feature.<\/li>\n<li>Create a new directory.  Copy <code>udf_median.cc<\/code> from the untarred directory to this new directory.<\/li>\n<li>Create a new file in that directory called <code>udf_median.def<\/code>.  This file contains all the methods the UDF is exporting.  Or you can just download <a href=\"http:\/\/www.mooreds.com\/files\/udf_median.def\">the file I used here<\/a>.<\/li>\n<li>Open Visual C++ Express<\/li>\n<li>Choose File \/ New \/ Project From Existing Code.  Hit Next.  Browse to the directory you just created.  Create a name for the project.  Hit Finish<\/li>\n<li>Edit the <code>udf_median.cc<\/code> file and comment out the <code>#ifdef HAVE_DLOPEN<\/code> line as well as the corresponding <code>#endif<\/code>.  If I didn&#8217;t do this, I kept getting link errors, as I guess everything between those preprocessor directives was not being compiled.<\/li>\n<li>Add the mysql include files: right click on the project and choose properties.  Expand &#8216;Configuration Properties&#8217; then &#8216;C\/C++&#8217; and click &#8216;General&#8217;.  On the right, add an include directory.  Navigate to the Mysql include directory and add that.<\/li>\n<li>Add the module definition file: right click on the project and choose properties.  Expand &#8216;Configuration Properties&#8217; then &#8216;Linker&#8217; and click &#8216;Input&#8217;.  Add &#8216;udf_median.def&#8217; to the key &#8216;Module Definition File&#8217;.<\/li>\n<li>Make sure VC knows this is a DLL: right click on the project and choose properties.  Expand &#8216;Configuration Properties&#8217; and click &#8216;General&#8217;.  Choose &#8216;Dynamic Library (.dll)&#8217; for Configuration Type.  If you don&#8217;t do this, you&#8217;ll get errors like: <code>error LNK2019: unresolved external symbol _WinMain<\/code> because the compiler thinks you&#8217;re trying to build an application.<\/li>\n<li>Right click on the project and choose &#8216;Build&#8217;.  This gives you a DLL in the Debug directory.<\/li>\n<li>Copy the DLL to the bin directory of your mysql installation.<\/li>\n<li>Create the function by logging in to mysql and running this command: <code>CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.dll';<\/code>.  (The user you log in as will need to have the ability to insert rows into the mysql tables.)<\/li>\n<li>Test and enjoy.<\/li>\n<\/ol>\n<p>Deploying the UDF to linux is much simpler, mostly because you don&#8217;t have to install a compiler, linker, etc.  I used &#8216;gcc (GCC) 3.3.4&#8217;.<\/p>\n<ol>\n<li><a href=\"http:\/\/prdownloads.sourceforge.net\/mysql-udf\/mysql-udf-0.3.tar.gz?download\">Download the mysql-udf tarball<\/a>.<\/li>\n<li>patch the files if you&#8217;re running a version of mysql greater than 4.1.1.  <a href=\"http:\/\/sourceforge.net\/tracker\/?group_id=114041&#038;atid=667000\">patch available here<\/a>, or the <a href=\"http:\/\/www.mooreds.com\/files\/mysql-udf-patched.tar.gz\">patched tarball is here<\/a>.<\/li>\n<li>Untar the myslq-udf tarball.  Patch if needed.<\/li>\n<li>Edit the <code>udf_median.cc<\/code> file and comment out the <code>#ifdef HAVE_DLOPEN <\/code> line as well as the corresponding <code>#endif<\/code>.<\/li>\n<li>Compile and link the code.  Do not use the instructions on the <a href=\"http:\/\/mysql-udf.sourceforge.net\/\">mysql-udf<\/a> homepage.  If you compile with those flags, you&#8217;ll get this error when you try to add the function: <code>mysql> CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';<br \/>\nERROR 1126 (HY000): Can't open shared library 'udf_median.so' (errno: 22 \/usr\/lib\/udf_median.so: undefined symbol: _Znwj)<\/code>.  Rather, use the instructions in <a href=\"http:\/\/sourceforge.net\/tracker\/index.php?func=detail&#038;aid=1007884&#038;group_id=114041&#038;atid=666998\">this bug report<\/a>: <code>gcc -shared -lstdc++ -I \/usr\/include -I \/usr\/local\/include -I \/usr\/local\/mysql\/include\/ -o udf_median.so udf_median.cc' <\/code><\/li>\n<li>Copy the shared library to a directory where mysql will see it.  I put it in <code>\/usr\/lib<\/code>.<\/li>\n<li>Create the function by logging in to mysql and running this command: <code>CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';<\/code>.  (The user you log in as will need to have the ability to insert rows into the mysql tables.)<\/li>\n<li>Test and enjoy.<\/li>\n<\/ol>\n<p>We have pushed this UDF to production with replicated servers and haven&#8217;t seen any issues with it yet.<\/p>\n<p>I want to extend my thanks to:<\/p>\n<ul>\n<li>Jan Steemann, the creator of the mysql-udf project<\/li>\n<li>Redy Rodriguez and sergivs for the patch and the compiler flag help, respectively<\/li>\n<li>Roland Bouman, for a <a href=\"http:\/\/rpbouman.blogspot.com\/2005\/11\/using-udf-to-raise-errors-from-inside.html\">great tutorial on building UDFs for Windows<\/a><\/li>\n<\/ul>\n<p>[tags]median, mysql, user defined functions[\/tags]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I just re-read &#8220;How To Lie With Statistics&#8221;, which is so good I think it should be required reading in every middle school. In it, the author makes the point [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,28,6],"tags":[],"class_list":["post-376","post","type-post","status-publish","format-standard","hentry","category-databases","category-mysql","category-programming"],"_links":{"self":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/376","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/comments?post=376"}],"version-history":[{"count":0,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/376\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/media?parent=376"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/categories?post=376"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/tags?post=376"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}