Fri, 04 Jan 08

BatchUpdateException using Hibernate and MySQL5

Posted in Java, Databases, MySQL at 7:59 pm by moore

I ran into a crazy error last week. One of my clients was upgrading from MySQL4 to MySQL5. The application in question was using Hibernate 3.2. Here’s the table structure, and the hibernate bean definition. See if you can spot the issue:

mysql> desc stat;
+--------------+-------------+------+-----+---------------------+-------+
| Field        | Type        | Null | Key | Default             | Extra |
+--------------+-------------+------+-----+---------------------+-------+
| stat_date    | date        |      | PRI | 0000-00-00          |       |
| stat_type    | varchar(50) |      | PRI |                     |       |
| stat_count   | int(11)     | YES  |     | NULL                |       |
| last_updated | datetime    |      |     | 0000-00-00 00:00:00 |       |
+--------------+-------------+------+-----+---------------------+-------+
4 rows in set (0.00 sec)

<class name="com.foo.common.data.Statistic" table="stat" lazy="false">
<cache usage="read-write"/>
<composite-id name="statisticId" class="com.foo.common.data.StatisticId">
<key-property name="date" type="java.util.Date" column="stat_date"/>
<key-property name="type" column="stat_type"/>
</composite-id>
<property name="count" column="stat_count"/>
<property name="lastUpdated" type="java.util.Date" column="last_updated" />
</class>

The exception stack trace I was seeing was something like this:

2007-12-31 14:15:09,888 ERROR [Thread-14] def.AbstractFlushingEventListener (AbstractFlushingEventListener.java:301)

- Could not synchronize database state with session
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:71)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:249)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:235)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:139)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
....
Caused by: java.sql.BatchUpdateException: Duplicate key or integrity constraint
violation message from server: "Duplicate entry '2007-12-31-stattype' for key 1"
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1492)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:1723)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:242)
... 57 more

I ended up turning on the mysql logging (the log setting in the my.ini file, which logs all sql statements mysql makes) to see what was happening.

Basically, I was looking to see if an entry in the stat table existed; if it did, increment and update, if it did not, insert. And the insert was always happening, so the entry was not found–it did exist because mysql threw the ‘integrity constraint’ exception.

The cause of the issue was the date type of stat_date and the fact that I incorrectly mapped it to java.util.Date. It really should have been mapped to java.sql.Date. How this worked in mysql4 is beyond me, but it did. Changing the hibernate dialect to mysql5 had no impact.

Technorati Tags: ,

Wed, 11 Apr 07

MySQL tuning

Posted in Databases, MySQL at 4:18 pm by moore

If you’re trying to tune MySQL, make sure you measure before and after. mysqlreport is a great way to measure a number of aspects of mysql (make sure you check out the guide). If you want graphing and long term tracking for mysqlreport and just about any other measure you care to track, I recommend cacti, which works with rrdtool.

Technorati Tags: ,

Tue, 20 Mar 07

Out of the box complete open source WAMP stack

Posted in Databases, Web Applications at 8:12 pm by moore

If you’re looking for an out of the box WAMP stack, I had good luck with Apache2Triad. A big download, but you’re getting Apache, PostGreSQL, MySQL, a mail server and an FTP server, plus various admin tools. I had things running in about 15 minutes–very cool, given that I’d spent a fair bit of time just trying to get PostGreSQL installed alone. The only hiccup was that my spyware tool thought SlimFTPd was a virus.

Beware, though, if you have an existing MySQL installation. Apache2Triad won’t blow away the data, but it will usurp the Services entry. And then, if you remove Apache2Triad, your existing MySQL instance is orphaned. I was able to get mine up and running again, from the command line. But to get it running again as a service required a complete uninstall/reinstall. No fun.

So, if you have no dev environment and need a quick start, take a look at this package. If you, on the other hand, have existing development tools installed, be more careful than I was.

Technorati Tags:

Thu, 11 Jan 07

Large varchar columns can lead to huge ESRI exports when using ogr2ogr

Posted in Programming, Databases at 10:02 am by moore

I was recently using ogr2ogr to convert, on the fly, some data in a PostGIS to other standard formats (ESRI and MapInfo). The ESRI export in particular had some problems–it took about 4 minutes for the export of an table with 11K rows and 37 columns, and it generated a 700M dbf file. This file was then zipped (with the other config files), and in around 6 minutes was compressed to a 7M zip file, that was sent to the browser. Now, you can imagine how thrilled a user would be to wait 10 minutes for an export. Apache was timing out (the default timeout is 5 min) and I was at a loss as to how to address the performance issue.

I mentioned this to a colleague who has significantly more experience with GIS tools, and he pointed out that in the source table there were several varchar(4000) fields. Now, in PostgreSQL, [i]f the string to be stored is shorter than the declared length … values of type character varying [varchar] will simply store the shorter string. But the ESRI export does not do that–each varchar(4000) field was padded to a length of 4000, even though none of the fields approached that length.

The solution? A few simple select max(length(colname)) from table and alter table statements, and the varchar(4000)columns were decreased in size. The dbf file decreased to a 50M file, uncompressed, and the entire zip file decreased to 5M. As you can guess, the download time was slashed.

Update 2/16: The kind members of the GDAL mailing list pointed me to a document listing all the limitations of the ESRI driver for ogr2ogr.  Check out the “Creation Issues Section”.

Technorati Tags: , ,

Wed, 27 Sep 06

Installing the median user defined function on MySQL

Posted in Programming, Databases, MySQL at 11:57 am by moore

I just re-read “How To Lie With Statistics”, which is so good I think it should be required reading in every middle school. In it, the author makes the point that there are three kinds of ‘averages’: arithmetic mean, median and mode (here I am, contributing to Wikipedia’s dominance, due to my laziness in looking up alternative definitions of statistical concepts). In general, the median is the most informative average, because it’s not skewed by a small number of outliers.But mysql (and other databases I’ve worked on) don’t natively supprt the medan, whereas I believe most support average (by which they mean ‘arithmetic mean’). Sure, you can use a stored procedure (as suggested here for PostgreSQL. However, I’m working with MySQL 4, which does not support stored procs. However, there is another solution: user defined functions. These seem like stored procedures, except you have to write them in C (or C++).

Now, I’m not a C programmer. Luckily, someone has written and released a set of mysql user defined functions that include median (as well as many other statistical manipulations). The bad news is that it hasn’t been updated for years. The good news is that with a bit of luck and many downloads, I was able to get the median function working on mysql, both on windows as a dll, and on linux as a shared library. To repeat, I am not a C programmer, so if you see any head thumping errors below, please let me know and I’ll update this document.

First off, I was working with these versions of mysql: c:\Program Files\MySQL\MySQL Server 4.1\bin\mysql.exe Ver 14.7 Distrib 4.1.10a, for Win95/Win98 (i32) and mysql Ver 14.7 Distrib 4.1.7, for pc-linux (i686)

To get median working on windows, you need to:

  1. Download the mysql-udf tarball.
  2. patch the files if you’re running a version of mysql greater than 4.1.1. patch available here, or the patched tarball is here.
  3. Download and install Visual C++ Express. (If you have a C compiler on Windows, you can skip this step and the next. Oh, and the ones following that will probably be different. (Here’s a blog post about creating a UDF using Visual Studio C++ 2003.)
  4. Download and install the platform SDK; I only followed through step 3. If you don’t, you’ll get ‘windows.h’ errors when you try to compile the UDF.
  5. Untar the myslq-udf tarball. Patch if needed.
  6. Install the mysql header files. I was able to do this via the Windows Installer, which let me modify my existing mysql installation; I had to add the ‘C Include Files / Lib Files’ feature.
  7. Create a new directory. Copy udf_median.cc from the untarred directory to this new directory.
  8. Create a new file in that directory called udf_median.def. This file contains all the methods the UDF is exporting. Or you can just download the file I used here.
  9. Open Visual C++ Express
  10. Choose File / New / Project From Existing Code. Hit Next. Browse to the directory you just created. Create a name for the project. Hit Finish
  11. Edit the udf_median.cc file and comment out the #ifdef HAVE_DLOPEN line as well as the corresponding #endif. If I didn’t do this, I kept getting link errors, as I guess everything between those preprocessor directives was not being compiled.
  12. Add the mysql include files: right click on the project and choose properties. Expand ‘Configuration Properties’ then ‘C/C++’ and click ‘General’. On the right, add an include directory. Navigate to the Mysql include directory and add that.
  13. Add the module definition file: right click on the project and choose properties. Expand ‘Configuration Properties’ then ‘Linker’ and click ‘Input’. Add ‘udf_median.def’ to the key ‘Module Definition File’.
  14. Make sure VC knows this is a DLL: right click on the project and choose properties. Expand ‘Configuration Properties’ and click ‘General’. Choose ‘Dynamic Library (.dll)’ for Configuration Type. If you don’t do this, you’ll get errors like: error LNK2019: unresolved external symbol _WinMain because the compiler thinks you’re trying to build an application.
  15. Right click on the project and choose ‘Build’. This gives you a DLL in the Debug directory.
  16. Copy the DLL to the bin directory of your mysql installation.
  17. Create the function by logging in to mysql and running this command: CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.dll';. (The user you log in as will need to have the ability to insert rows into the mysql tables.)
  18. Test and enjoy.

Deploying the UDF to linux is much simpler, mostly because you don’t have to install a compiler, linker, etc. I used ‘gcc (GCC) 3.3.4′.

  1. Download the mysql-udf tarball.
  2. patch the files if you’re running a version of mysql greater than 4.1.1. patch available here, or the patched tarball is here.
  3. Untar the myslq-udf tarball. Patch if needed.
  4. Edit the udf_median.cc file and comment out the #ifdef HAVE_DLOPEN line as well as the corresponding #endif.
  5. Compile and link the code. Do not use the instructions on the mysql-udf homepage. If you compile with those flags, you’ll get this error when you try to add the function: mysql> CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';
    ERROR 1126 (HY000): Can't open shared library 'udf_median.so' (errno: 22 /usr/lib/udf_median.so: undefined symbol: _Znwj)
    . Rather, use the instructions in this bug report: gcc -shared -lstdc++ -I /usr/include -I /usr/local/include -I /usr/local/mysql/include/ -o udf_median.so udf_median.cc'
  6. Copy the shared library to a directory where mysql will see it. I put it in /usr/lib.
  7. Create the function by logging in to mysql and running this command: CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';. (The user you log in as will need to have the ability to insert rows into the mysql tables.)
  8. Test and enjoy.

We have pushed this UDF to production with replicated servers and haven’t seen any issues with it yet.

I want to extend my thanks to:

Technorati Tags: , ,

Fri, 08 Sep 06

Every scripting language evolves until it can access an RDBMS

Posted in Programming, Dynamic Languages, Databases at 10:18 am by moore

Here’s my new theory about scripting languages: everybody loves SQL (or, more likely, the data SQL makes accessible), so every scripting language evolves until it can access an RDBMS, whether it’s JavaScript, or Movable Type’s template language or server side includes.

(See also the history of SQL.)

Technorati Tags: , ,

Thu, 23 Feb 06

MySQL performance and doing calculations on varchar columns

Posted in Programming, Databases at 9:45 am by moore

MySQL, along with other features designed to make it easy to use, tries to do the right thing regarding strings. When you perform a math calculation on a column or columns that are of type varchar, MySQL automatically conversts that string to a number (empty strings are treated as zero.):

To cast a string to a numeric value in numeric context, you normally do not have to do anything other than to use the string value as though it were a number[.]

However, this translation, convenient as it may be, is not free. A client of mine had a query that was running calculations against two such columns. After indexing and trying to simplify the query, we were still seeing query execution times of 2+ seconds (all times are quoted for MySQL 4.1, on my relativly slow personal laptop).

The solution appears to be to change the type of the columns using the alter table syntax to type double. After doing so and running analyze table mytable, I was seeing query execution times of 0.2 seconds for the same query on the same box. Fantastic.

I am not sure if this result was due to not having to do several string conversions for each row returned by the query, or the fact that:

In some cases, a query can be optimized to retrieve values without consulting the data rows. If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed[.]

Regardless of the cause, if you’re doing some complicated calculations on columns, consider making them numbers.

Sat, 12 Nov 05

unescaping a string with PL/SQL

Posted in Programming, Databases, Oracle at 9:46 am by moore

I’ve written about PL/SQL before, but I’ve recently started working on a project that uses it heavily. Given the amount of code written for Oracle databases, I’m rather suprised that there’s not a PL/SQL Cookbook, where, like the Perl Cookbook and the Java Cookbook (more cookbooks from O’Reilly are listed here). There is an Oracle Cookbook, but based on a quick scan of Amazon, it’s is focused, as you’d expect, more on the database design than on PL/SQL programming. (Interestingly, there is a Oracle+PHP cookbook, and a PL/SQL sample code page but neither of those is quite what I’m looking for.)

The reason that I’d like a PL/SQL cookbook is that there are large sets of problems that routinely need to be solved in PL/SQL, but the language is so low level (though they just added some regex support in 10g; bravo!) that doing these routine tasks and making sure they’re correctly implemented can be difficult and tedious. This is especially true when it’s a programmer from a different language who’s used to higher levels of abstraction (like, for example, the good folks who author CPAN modules provide)–it’d be well worth my $70 to make sure that I never had to deal with a problem like, say, unescaping a string.

For that’s the problem I recently had to solve. Essentially, we have a string that looks like this: yellow,apple. This string represents two values, which need to be put in different places by splitting them up into ‘yellow’ and ‘apple’. All well and good until the possiblity of embedded commas arises, for it’s possible that the desired end values were ‘yellow,blue’ and ‘apple,banana’. The answer, of course, is to escape the commas on the way in (turning the second input into something like this: yellow:,blue,apple:,banana, and when processing to unescape those special characters (both the comma and the escape character, which in the example is the colon). That’s what these three functions do. They take a string like the above examples and parse it into a table, to be iterated over at your leisure.

/* ------------------- function splitit ------------------*/
FUNCTION splitit(p_str VARCHAR2, p_del VARCHAR2  := ',',p_idx PLS_INTEGER, p_esc VARCHAR2

:= ':')
RETURN INTEGER
IS
l_idx       PLS_INTEGER;
l_chars_before      VARCHAR2(32767);
l_escape_char       VARCHAR2(1) := p_esc;
l_chars_before_count        PLS_INTEGER := 0;
BEGIN
>
LOOP
l_idx := instr(p_str,p_del, p_idx);
IF l_idx > 0 then
WHILE substr(p_str, l_idx-l_chars_before_count-1, 1) = l_escape_char LOOP
l_chars_before_count := l_chars_before_count +1;
END LOOP;

IF mod(l_chars_before_count, 2) = 0 THEN
-- if chars_before_count is even, then we're at a segment boundary
RETURN l_idx;
ELSE
-- if odd, then we're at an escaped delimiter, want to move past
RETURN splitit(p_str, p_del, l_idx+1, p_esc);
END IF;
l_chars_before_count := 0;
ELSE
RETURN l_idx;
EXIT outer;
END IF;
END LOOP;
END splitit;
/* ------------------- function splitit ------------------*/

/* ------------------- function unescape ------------------*/

FUNCTION unescape(p_str VARCHAR2, p_del VARCHAR2 := ',', p_esc VARCHAR2 := ':')
RETURN VARCHAR2
IS
l_str VARCHAR2(32767);
BEGIN
l_str := replace(p_str, p_esc||p_del, p_del);
l_str := replace(l_str, p_esc||p_esc, p_esc);
RETURN l_str;
END unescape;
/* ------------------- function unescape ------------------*/

/* ------------------- function split ------------------*/

FUNCTION split(p_list VARCHAR2, p_del VARCHAR2 := ',')
RETURN split_tbl
IS
l_idx       PLS_INTEGER;
split_idx   PLS_INTEGER     := 0;
l_list      VARCHAR2(32767) := p_list;
l_chars_before      VARCHAR2(32767);
l_escape_char       VARCHAR2(1) := ':';
l_array split_tbl := split_tbl('','','','','','','','','','');
BEGIN
l_list := p_list;
LOOP
split_idx := split_idx + 1;
IF split_idx > 10 then
EXIT;
END IF;

l_idx := splitit(l_list, p_del, 1, l_escape_char);
IF l_idx > 0 then
l_array(split_idx) := unescape(substr(l_list,1,l_idx-1), p_del,

l_escape_char);
l_list := substr(l_list,l_idx+length(p_del));
ELSE
l_array(split_idx) := l_list;
EXIT;
END IF;
END LOOP;
RETURN l_array;
END split;
/* ------------------- function split ------------------*/

/* in the header file, split_tbl is defined */
TYPE split_tbl IS TABLE of varchar2(32767)

Not all of this code is mine–I built on a solution from a colleague. But I hope this saves one other person from the afternoon I just endured. And if you are a PL/SQL expert and care to critique this solution, please feel free.

Fri, 28 Oct 05

Oracle and regular expressions

Posted in Programming, Databases at 3:43 pm by moore

I cut my teeth on perl and really enjoyed the power of regular expressions. Looks like Oracle has added regular expressions to Oracle 10g. Now if they’d just give me tab completion in sql*plus.

Mon, 10 Jan 05

sqlldr

Posted in Programming, Databases, Oracle at 10:55 pm by moore

I’ve been writing SQL*Loader scripts to load a fair bit of data into Oracle. I have a set of load tables with minimal constraints on them, into which SQL*Loader pushes the rows. Then I have written some PL/SQL which pulls from the load tables to the real database.

This architecture was chosen because the PL/SQL procedures can be written to allow incremental as well as full data loads. In the incremental case, it’s conceivable there there’d be a different way of pushing data over to the load tables (via ODBC or JMS, for example). In addition, the load tables can be denormalized, and you can put enough intelligence in the PL/SQL to turn your data structures into something at which a DBA won’t cringe.

Anyway, I thought I’d share a few tips, gleaned through the process. I’m definitely no SQL*Loader guru, but here are some useful links: the sqlldr FAQ, full of good information and recently updated, the Oracle Utilities page which does a great job of explaining all the options of SQL*Loader, and this case study which outlines internationalization with sqlldr. All very useful.

Two other tips: If you are loading delimited character data that is longer that 255 characters, you need to specify the length in your control file (for example, declaring it in the control file as char(4000)), or else you’ll get an aggravating error message warning that the data you’re loading is longer than the column in which you’re trying to load it. I spent some time looking very carefully at the load table trying to see what I was missing before I googled and found out that char fields do have default sizes in sqlldr control files.

And the bindsize and rows parameters are related, in terms of the amount of data that sqlldr can push into a table before it commits. You can make rows very very big, but if bindsize is too small (it defaults to 64k, apparently) the commits will happen sooner than they need to. For more explanation and other perforamance tips, see this page.

Overall, I’ve been very happy with how easy it is to load a fair bit of data, quickly (both in terms of load time and in development time) using sqlldr.

« Previous entries ·

© Moore Consulting, 2003-2008