
Java REST API Framework Options

Photo by shioshvili

I’ve been working with a couple of REST API solutions that exist in the Java tech stack.  I haven’t seen any great analysis of REST API solutions (though Matt Raible does mention some in this exhaustive slide deck about Java frameworks [pdf]), so I wanted to share my on-the-ground experience.

First up is restSQL.  This framework makes it easy to get data from a database to a JSON or XML REST API and back.  If you have a servlet container available, you write two configuration files, one with a SQL query and one with db connection information, and you have a RESTful API.  For prototyping and database access, it is hard to beat.

Pros:

  • Quick to set up
  • Only SQL knowledge is required
  • No programming required
  • Allows simple mapping of a db table to a resource, but can also handle one-to-one and one-to-many mappings
  • Supports all four CRUD operations (GET, POST, PUT, DELETE) out of the box
  • Supports XML as well as JSON
  • Is an embeddable java library as well as a standalone framework
  • Project maintainer is engaged and the project is moving forward

Cons:

  • Requires a servlet engine, and you have to restart it for changes to your configuration to be picked up
  • Output format has limited customization
  • Only works with mysql and postgresql databases (though there is some experimental support for Oracle and MS SQL)
  • Doesn’t work with views
  • The security model, while fine-grained, isn’t modern (no OAuth); this can be solved with an API gateway (like 3scale, Tyk or ApiAxle) or a proxy

The next framework I have experience with is Dropwizard.  This is a powerful framework that creates uberjars that you can run on any port as a standalone service.  It’s not limited to providing a JSON representation of database tables–if you can create a Java object, Dropwizard can serve it up as a JSON resource.
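To give a flavor, here’s a minimal sketch of what a Dropwizard resource can look like. The annotations are standard JAX-RS (Dropwizard brings in Jersey and Jackson for you); the resource, path, and bean names are hypothetical:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/status")
@Produces(MediaType.APPLICATION_JSON)
public class StatusResource {

    // Any bean Jackson can serialize will do; this one goes over
    // the wire as {"name":"api","healthy":true}.
    public static class Status {
        public String name = "api";
        public boolean healthy = true;
    }

    @GET
    public Status getStatus() {
        return new Status();
    }
}

You register an instance of the resource with the Dropwizard environment in your service’s run method (the exact call varies by version), and the framework takes care of the rest.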

Pros:

  • Community support
  • Extreme output formatting flexibility (though be prepared to write a custom deserializer if you want to handle anything other than reads of custom-formatted objects)
  • Supports any database that Hibernate supports
  • Built in testing support
  • Brings together ‘best of breed’ tools like Jersey, Jackson and Hibernate, so you don’t have to do the integration yourself
  • Great documentation

Cons:

  • Have to roll your own deployment solution (tarball, chef, puppet)
  • No service startup script provided
  • Shading can slow down development
  • Not yet at 1.0 release

The last one I don’t have personal familiarity with, but a colleague used it in the past: Sparkjava.  This is a lightweight framework that fits when you have an existing Java library with functionality you want to expose.  I’m not competent to write pros/cons for this framework, but I wanted to mention it.
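To give a sense of how lightweight it is, here’s a rough hello-world sketch (this uses the lambda route syntax of later Spark versions; the class name is hypothetical):

import static spark.Spark.get;

public class HelloApi {
    public static void main(String[] args) {
        // Spins up an embedded Jetty server (port 4567 by default)
        // with a single GET route.
        get("/hello", (request, response) -> "Hello World");
    }
}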

The gorilla in the room that I haven’t had experience with (in terms of writing RESTful web services) is Spring.  I would definitely include this in any greenfield solutions review.

Small scale data migrations

So, I’ve recently been involved in another data migration, the second one in three years.  These are small migrations, with thousands of records.  One person could take care of a migration of this size with some effort, but the amount of data is still large enough that manual re-entry isn’t really an option–the error rate, the cost, and the management difficulty all mean that a software solution is the better choice.

Here are some lessons I learned from these data migrations.

Learn as much as you can about the data models, both the old and the new.  This includes, in preferred order, talking to any people familiar with the old system, talking to any people familiar with the new system, looking at the databases via a sql client, reading documentation (if any is written), and looking at code.  I spent a while thrashing around in the old system’s code.  Then I asked the developer for a tour, and learned more in that hour than I had in the previous day of looking at code.

Map entities and concepts as early as you can.  Take special note of any that are in the old and not in the new (and what you are planning to do with them).  Those that are in the new and not in the old aren’t as big of an issue.  Also, attributes of entities are as important as entities, so note discrepancies there.  Early on I noticed that one of the two primary entities in the old system did not exist in the new system.  This led to some interesting conversations with the business users that saved me work.

As above, talk to people who are going to be using the new system, and who use the old system, throughout the migration process.  An entity or attribute that will be a royal pain to migrate may not be used anymore!  Or, the business person might have some good ideas on how to map something in the old system into the new system.  Someone who uses the software you are migrating has more domain expertise than you.  Let them try the new system with migrated data as soon as some data is moved.  Make sure to guide their experience so they don’t spin their wheels looking in corners of the system that are not yet migrated.

Start a spreadsheet of tasks.  Doing so means that every time you uncover something that needs doing while you are in the middle of something else, you can note it and stay on your original task.  My spreadsheets are simple; three columns are enough: task name, completed (an X for done, blank for still open) and notes (possible implementation solutions, people to talk to, relevant URLs, or any other text that will help me complete the task).

Document all the migration steps, preferably to the point you can cut and paste commands.  Include any discrepancies discovered, special commands to run, access to all needed systems, names of relevant people, areas that need further investigation, and basically anything else you would want handed to you if you were starting on this project.  This helps immensely if you need to pass off the project, or come back to it later (even just a few days), and provides documentation of entities on the old and new system.

Write scripts wherever possible, but don’t try to script the whole process. Access to different servers can be hard to automate.  Use whatever language you feel is best for these scripts.  I’ve used bash, sql, perl, and awk/sed, but I don’t shy away from a compiled language like java, especially if a library exists that can save me time.  Make sure to put these scripts into version control, and document the purpose with comments at the top and a good name.  I wouldn’t worry too much about unit testing or refactoring this software, because chances are it will be seldom used once the migration occurs.

Get familiar with the concatenate function of your database.  Using queries to write SQL statements for the new system, based on data from the old system, can save you writing a script in an imperative language.  When migrating from Expression Engine to WordPress, I used a statement like

select concat('update wp_comments set comment_author_email = ''', email,
''' where comment_author = ''', name, ''';')
from exp_comments where name in (select distinct(name) from exp_comments);

to generate an update statement for WordPress for each comment author in the EE database.

Think about data types and representations, especially if you are moving from one database to another.  When I was moving from MSSQL to MySQL, date fields were particularly thorny.

Realize that these types of projects are typically difficult slogs.  There were moments where I despaired of ever getting through the migration in a timely fashion.  To do it right, you need a fantastic attention to detail, an understanding of the business needs, and an ability to drive things through to the finish.  All of this can be pretty draining–I find it far more draining than bug fixing or building new features.

Control the old and new systems–try to not have new capabilities added during the migration.  If you can’t guarantee that, can the migration wait until the new and old systems stabilize?  If not, checkpoint the migration against the new capabilities during the process, and realize that you are introducing a lot of extra work and complexity into an already complex process.

Have a staging system where you can practice your migrations without affecting anyone.  Plan to go through at least two or three of these staging systems so that you can get the migration steps solid before you touch production.  Start from a clean slate each time so no time is spent chasing phantom bugs from a previous migration that didn’t finish or wasn’t entirely correct.  This is what makes the migration documentation you write so important.  Be aware that the new staging system and the new production system will not necessarily be the same.

Lastly, avoid committing to a schedule if at all possible.  And if you must, pad it and only commit after you’ve done a thorough analysis.  Because there are so many hidey holes and areas of the old system that you won’t understand, there is a high probability that you’ll be discovering new issues and data you need to migrate halfway through the project.  (This is a special case of the requirements nightmare known as ‘build system B that acts exactly like system A’.)  Communicate progress to the business.

While this is not my favorite type of project, when done well it can have tremendous business value.  Combining newer, more flexible systems with rich older data, without re-keying the data, can make system users much happier.  In some cases, if there is no migration, the newer system simply can’t be used.

Moving data from MS-SQL to mysql

I recently worked on a project where I needed to port data from a MS-SQL database to a mysql database.  There are programs, both payware and freeware, that will help with this process, but I didn’t have ODBC access to the MS-SQL database, which these programs require.  All I had was a bunch of insert statements that looked like this:

INSERT INTO
[sample].[dbo].[users_info]([ID],[registration_day],[registration_month],
[registration_year],[registration_time],[first_name],[last_name],
[username],[userpwd],[repeat_pwd],[birth_date],[birth_day],[birth_year],
[street],[state],[city],[zip_code],[email_address],[membership_startdate],
[online],[gender]) VALUES(463,N'15',N'7',N'2009',N'9:13:52 AM',N'Homer',
N'Simpson',N'homerrocks',N'beerbeer',N'beerbeer',N'January',N'1',N'1999',
N'234 Main',N'Alaska',N'Springfield',NULL,N'homer@thesimpsons.com',
CAST(0x9AE90000 AS SmallDateTime),
CAST(0x00009CD700000000 AS DateTime),N'Man');

I wrote a perl script to turn that dialect of SQL into a mysql-friendly dialect (feel free to download it).

The most interesting parts were those CAST statements.  This forum post and this blog post helped me turn those casts into real dates.  (After loading the inserts into mysql, I did some post processing, using the helpful case statement and str_to_date function to rationalize some of the data.)
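If you’d rather decode those binary values in code rather than SQL, the layout (as I understand it from those posts) is: DATETIME is four bytes of days since 1900-01-01 followed by four bytes of 1/300-second ticks since midnight, while SMALLDATETIME is two bytes of days followed by two bytes of minutes.  A rough Java sketch, decoding the two example values from the insert above:

import java.time.LocalDate;
import java.time.LocalDateTime;

public class MssqlDates {

    private static final LocalDate SQLSERVER_EPOCH = LocalDate.of(1900, 1, 1);

    // SMALLDATETIME: high two bytes are days since 1900-01-01,
    // low two bytes are minutes since midnight.
    static LocalDateTime fromSmallDateTime(int raw) {
        int days = raw >>> 16;
        int minutes = raw & 0xFFFF;
        return SQLSERVER_EPOCH.plusDays(days).atStartOfDay().plusMinutes(minutes);
    }

    // DATETIME: high four bytes are days since 1900-01-01,
    // low four bytes are ticks of 1/300 second since midnight.
    static LocalDateTime fromDateTime(long raw) {
        long days = raw >>> 32;
        long ticks = raw & 0xFFFFFFFFL;
        return SQLSERVER_EPOCH.plusDays(days).atStartOfDay()
                .plusNanos(ticks * 1_000_000_000L / 300);
    }

    public static void main(String[] args) {
        System.out.println(fromSmallDateTime(0x9AE90000));      // membership_startdate
        System.out.println(fromDateTime(0x00009CD700000000L));  // online
    }
}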


Hibernate Boolean/Integer ClassCastException

I ran into an issue the other day with Hibernate configuration.

I have a bean that maps to a table in the database.  It has a column, featured, that only has values of 0 or 1.  For legacy reasons, we map that to an integer (turning it into a boolean is on the List Of Things To Do).

I ran into an issue, and was getting this error in the logs:

Caused by: java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.Integer
at org.hibernate.type.IntegerType.set(IntegerType.java:41)
at org.hibernate.type.NullableType.nullSafeSet(NullableType.java:136)
at org.hibernate.type.NullableType.nullSafeSet(NullableType.java:116)
at org.hibernate.loader.Loader.bindPositionalParameters(Loader.java:1698)
at org.hibernate.loader.Loader.bindParameterValues(Loader.java:1669)
at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1554)
at org.hibernate.loader.Loader.doQuery(Loader.java:661)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:224)
at org.hibernate.loader.Loader.doList(Loader.java:2211)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2095)
at org.hibernate.loader.Loader.list(Loader.java:2090)
at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:94)
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1569)
at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:283)

Here's the relevant section of the mapping file:

<property name="featured" type="integer" column="featured" />

I tried explicitly laying out the type of the SQL column:

<property name="featured" type="integer">
<column sql-type="INTEGER" name="featured" />
</property>

But neither of these worked; I received the same error.

I took a step back and looked at the bean.

Integer featured;

public Integer getFeatured() {
return featured;
}
public void setFeatured(Integer featured) {
this.featured = featured;
}

...

and realized that I had added a convenience method that was confusing Hibernate–by JavaBeans convention, isFeatured() looks like the getter for a boolean property named featured, which clashed with the Integer mapping:

public boolean isFeatured() {
return ...
}

When I changed the signature of the convenience method to:

public boolean isFeaturedListing() {
return ...
}

the exception went away.  Thought I'd share for anyone searching for this stacktrace.


Tips: Deploying a web application to the cloud

I am wrapping up helping a client with a build-out of a drupal site on EC2. The site itself is a pretty standard CMS implementation–custom content types, etc. The site is an extension to an existing brand, and exists to collect email addresses and send out email newsletters. It was a team of three technical people (there were some designers and other folks involved, but I was pretty much insulated from them by my client), and I was lucky enough to do a lot of the infrastructure work, which is where a lot of the challenge, exploration and experimentation was.

The biggest attraction of the cloud was the ability to spin up and spin down extra servers as the expected traffic on the site increased or decreased. We choose Amazon’s EC2 for hosting. They seem a bit like the IBM of the cloud–no one ever got fired, etc. They have a rich set of offerings and great documentation.

Below are some lessons I learned from this project about EC2. While it was a drupal project, I believe many of these lessons are applicable to anyone who is building a similar system in the cloud. If you are building a video-processing supercomputer, maybe not so much.

Fork your AMI

Amazon EC2 running instances are instantiations of a machine image (AMI). Anyone can create a machine image and make it available for others to use. If you start an instance off an image, and then the owner of the image deletes the image (or otherwise removes it), your instance continues to run happily, but, if you ever need to spin up a second instance off the same AMI, you can’t. In this case, we were leveraging some of the work done by Chapter Three called Project Mercury. This was an evolving project that released several times while we were developing with it. Each time, there was a bit of suspense to see if what we’d done on top of it worked with the new release.

This was suboptimal, of course, but the solution is easy. Once you find an AMI that works, you can start up an instance, and then create your own AMI from the running instance. Then, you use that AMI as a foundation for all your instances. You can control your upgrade cycle. Unless you are running against a very generic AMI that is unlikely to go away, forking is highly recommended.

Use Capistrano

For remote deployment, I haven’t seen or heard of anything that compares to Capistrano. Even if you do have to learn a new scripting language (Ruby), the power you get from ‘cap’ is fantastic. There’s pretty good EC2 integration, though you’ll want to have the EC2 response XML documentation close by when you’re trying to parse responses. There’s also some hassle involved in getting cap to run on EC2. Mostly it involves making sure the right set of ssh keys is in the correct place. But once you’ve got it up and running, you’ll be happy. Trust me.

There’s also a direct capistrano/EC2 integration project, but I didn’t use that. It might be worth a look too.

Use EBS

If you are doing any kind of database driven website, there’s really no substitute for persistent storage. Amazon’s Elastic Block Storage (EBS) is relatively cheap. Here’s an article explaining setting up MySQL on EBS. I do have a friend who is using EC2 in a different, very write-intensive manner, and who is having some performance issues with his database on EBS; but for a write-seldom, read-often website like this one, EBS seems plenty fast.

EC2 Persistence

Some of the reasons to use Capistrano are that it forces you to script everything, and makes it easy to keep everything in version control. The primary reason to do that is that EC2 instances aren’t guaranteed to be persistent. While there is an SLA around overall EC2 availability, individual instances don’t have any such assurances. That’s why you should use EBS. But, surprisingly, the EC2 instances that we are using for the website haven’t bounced at all. I’m not sure what I was expecting, but they (between three and eight instances) have been up and running for over 30 days, and we haven’t seen a single failure.

Use ElasticFox

This is a Firefox extension that lets you do every workaday task, and almost every conceivable operation, on your EC2 instances. Don’t delay, use this today.

Consider CloudFront

For distributing images, CloudFront is a natural fit. Each instance can then reference the image, without you needing to sync files across instances. You could use this for other static files as well.

Use Internal Network Addressing where possible

When you start an EC2 instance, Amazon assigns it two DNS names–an external one that can be used to access it from the internet, and an internal one. For most contexts, the external name is more useful, but when you are communicating within the cloud (pushing files around, or a database connection), prefer the internal name. It looks like there are some performance benefits, but there are definitely pricing benefits. “Always use the internal address when you are communicating between Amazon EC2 instances. This ensures that your network traffic follows the highest bandwidth, lowest cost, and lowest latency path through our network.” We actually used the internal DNS name, though it arguably makes more sense to use the internal IP address, since you don’t get any abstraction benefits from a DNS name you don’t control–that took a bit of mental adjustment for me.

Consider reserved instances

If you are planning to use Amazon for hosting, make sure you explore reserved instance pricing. For an upfront cost, you get significant savings on your runtime costs.

On Flexibility

You have a lot of flexibility with EC2–AMIs are essentially yours to customize as you want, starting up another node takes about 5 minutes, you control your own DNS, etc. However, there are some things that are set at startup time. Make sure you spend some time thinking about security groups (built in firewall rules)–they fall into this category. Switching between AMIs requires starting up a new instance. Right now we’re using DNS round robin to distribute load across multiple nodes, but we are planning to use elastic IPs, which allow you to remap a routable IP address to a new instance without waiting for DNS timeouts. EBS volumes and the instances they attach to must be in the same availability zone. None of this is groundbreaking news; it’s really just a matter of reading all the documentation, especially the FAQs.

Documentation

Be aware that there is a ton of documentation–one set for each API release–for EC2 and the other web services that Amazon provides. Rather than starting with Google, which often leads you to an outdated version of the documentation, you should probably start at the AWS documentation center. This is especially true if you’re working with any of the newer systems, where the API is perhaps not as stable.

In the end

Remember that, apart from new tools and a few catches, using EC2 is not that different from using a managed server where you don’t have access to the hardware. The best document I found on deploying drupal to EC2 doesn’t talk about EC2 at all–it focuses on the architecture of drupal (drupal 5 at that) and how to best scale that with additional servers.


Optimizing a distance calculation in a mysql query

If you have a query that sorts by a derived field, and then takes a limited number of the results, it can be a real dog.  Here’s how I optimized a situation like this.  Imagine this table.

create table office_building (
id int primary key,
latitude float not null,
longitude float not null,
rent int,
address varchar(20),
picture_url varchar(255)
);

If you want to find the nearest 100 office buildings to a point on a map, you run a query something like this (plug your lat/lng into the question marks):

select *, round( sqrt( ( ( (latitude - ?) * (latitude - ?) ) *  69.1 * 69.1) +
((longitude - ?) * (longitude - ?) * 53 * 53 ) ) ) as distance
from office_building order by distance limit 100

(See here for an explanation of the 69.1 and 53 constants–basically they convert degrees of latitude and longitude, roughly, to miles.) Unfortunately, you are ordering by a derived field, so mysql can no longer use an index to satisfy the order by.

This means that you’ll be doing a filesort (which does not actually have anything to do with the filesystem, but is just a sort not on an index).  And this, in turn means that your performance will suck if you have any large number of rows returned.

You can help things out a bit by limiting your office building query to a box of a certain size around the point.  Here’s the query with a 5 mile box:

select *, round( sqrt( ( ( (latitude - ?) * (latitude - ?) ) *  69.1 * 69.1 ) +
( (longitude - ?) * (longitude - ?) * 53 * 53 ) ) ) as distance
from office_building
where latitude < ?  + (1/69.1)*5 and latitude > ? - (1/69.1)*5 and longitude < ? + (1/53)*5 and longitude > ? - (1/53)*5
order by distance limit 100

But if you still have too many results, the sorting on distance will be slow.  Also, even if you have an index on latitude and longitude (such as create index idx_nearby on office_building (latitude,longitude)), only the first column will be used, because you are not using equality comparisons.

This is worth repeating, because it took me a while to understand.  If you have an index: create index idx on tbl (col1,col2,col3,col4,col5) and you run a query like select count(*) from tbl where col1 = 1 and col2 > 2 and col3 < 3 and col4 > 4 only col1 and col2 will be used from the index.  Mysql goes to the table data files for col3 and beyond (assuming no other indices on the table).  This makes sense when you think about how indices are created and stored, but I didn’t really understand it until I’d been beaten over the head with it.

As stated here: “[mysql] will use the fields [in the index], from left to right, as long as the WHERE clause has “=”. Once it hits a ‘range’ (>, IN, BETWEEN, …), it stops with that field.”  I don’t know why it is not in the mysql index documentation–maybe it is obvious?

The solution I found was to separate what I wanted to find in the select clause from how I find it, in the where and order by clause:

select select_clause.*,
round( sqrt( ( ( where_clause.latitude - ?) * (where_clause.latitude - ? ) *  69.1 * 69.1 ) +
( (where_clause.longitude - ? ) *(where_clause.longitude - ? ) * 53 * 53 ) ) ) as distance
from office_building where_clause, office_building select_clause
where where_clause.latitude < ? + (1/69.1)*5 and where_clause.latitude > ? - (1/69.1)*5
and where_clause.longitude < ? + (1/53)*5 and where_clause.longitude > ? - (1/53)*5
and where_clause.id = select_clause.id
order by distance
limit 100

You also need to add an index:

create index idx_nearby on office_building (latitude,longitude,id);

Then, when you run the query, you still have the filesort, but you also see the magic ‘Using index’ in your explain plan.  You never have to go to the table to do the sort!  You also have a join now, but it’s on the primary key, and you only need to go to the table for the 100 rows that you know you want.

On one live system, using this query sped things up by one to two orders of magnitude, depending on the query.  This works not only for distance queries, but anytime you want to order by a calculated value.

More useful links: geo search suggestions, index explanation


Article about using hibernate with GWT

I just read this article about the Google Web Toolkit and hibernate, and I’m thrilled that someone wrote this. A few years ago, when I was just starting to use GWT and hibernate, the ORM tool, I thought about writing something similar myself. I could never get over the hump of writing about setting up all the infrastructure necessary, something which the author does quite nicely.

I think this article gives a great overview of some of the complexities of using hibernate with the GWT client. The author essentially talks about three possible solutions to the primary problem when using hibernate objects in a GWT system: hibernate enhances your POJO code, and thus you cannot send objects returned from hibernate queries down the wire to the JavaScript client.  The JRE emulation simply can’t handle it.

I especially enjoyed the explanations of how to use some of the tools to make mapping between GWT-capable objects and hibernate objects easier. I’d heard of hibernate4gwt, now Gilead, but never used it. For most of my RPC calls, I end up using the first approach the author explores, custom DTO creation. Oftentimes, I won’t create a special DTO object, but rather reuse the POJO that represents the domain object. This way, you can scrub subsidiary objects (though you lose lazy loading when you do this) and send those down as well.  As long as the POJO doesn’t have too many extraneous members, this seems to work fine, and removes the need for an extra class.
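Here’s a minimal sketch of that scrubbing approach, with hypothetical types; the point is simply to replace Hibernate’s persistent collections with plain JRE collections before the object crosses the wire:

import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

// Hypothetical domain object doing double duty as a GWT RPC DTO.
public class Account implements Serializable {

    public static class Cd implements Serializable {
        public String title;
    }

    private String name;
    private Set<Cd> cds = new HashSet<Cd>();

    // Called on the server before the object is returned via RPC.
    // Copying into a plain HashSet means the GWT serializer never sees
    // Hibernate's PersistentSet--but it also forces initialization,
    // which is where you lose lazy loading.
    public void scrub() {
        this.cds = new HashSet<Cd>(this.cds);
    }
}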

I was a bit frustrated, however, that the author ignored the delete case. This seems like a situation where tools like Gilead might really shine. I have often run into issues where I have to add a ‘deleted’ boolean flag to the hibernate object.  I do this because when an object gets deleted from a collection on the GWT side, my server-side code has no way of knowing this, without some additional complexity (rerunning the query and doing a comparison of results). Adding such a ‘deleted’ boolean flag solves one set of problems, but raises additional complexity, because you end up having to check whether or not an object exists before you try to insert it into the database.

For example, imagine you have a user with a set of CDs, which you display in a grid.  If you want to allow the user to correct the name of one of the CDs and send it back, the server side has the modified record, hopefully with an ID, and can simply save it.  But if you delete one of the CDs from the collection, the server side does not have the deleted object, and so has to figure out which one to remove.  Gilead, with its knowledge of the object graph, seems at first glance like it could solve this problem elegantly (though a quick search on the Gilead site turned up nothing that I could see).
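For what it’s worth, the ‘rerun the query and compare results’ approach mentioned above isn’t complicated, just extra work. A sketch, with hypothetical types:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CdReconciler {

    // IDs the client sent back vs. IDs currently in the database
    // (the latter from re-running the query). Anything in the database
    // but not in the submitted set was deleted on the GWT side.
    Set<Long> findDeletedIds(List<Long> submittedIds, List<Long> databaseIds) {
        Set<Long> deleted = new HashSet<Long>(databaseIds);
        deleted.removeAll(submittedIds);
        return deleted; // delete these via your Hibernate session
    }
}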

Also note that while RPC is fantastic for GWT applications, if you are thinking about using GWT for widgets, I would suggest using something that gives you a bit more flexibility, like JSONP. Because GWT RPC depends on XMLHTTPRequest, it is fundamentally limited to sites where the JavaScript and the RPC services are on the same host.  Obviously, since JSONP means serializing objects to strings yourself, none of these tools apply there.  (See my survey of Google Web Toolkit client-server communication strategies for more.)

All that said, if you’re thinking about using hibernate and GWT in the same project, reading this article and running through the examples will be a worthwhile use of your time.


Data.gov–freely accessible data in standard formats from the USA federal government

Have you ever wondered where the world’s copper smelters are?  Or pondered reservoir storage data for the Colorado river?  Had questions about the residential energy use of US households?

Now you can find the answers to these questions, using data.gov, the stated purpose of which is “to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.”  I’ve been writing for a while about the publishing power of the internet, but data.gov takes this to a whole new level.

It’s definitely a starting point, not an end, as there are only 47 raw datasets that you can access.  They cover a wide range of data and agencies, and were apparently chosen to kick things off because they “already enjoy a high degree of consensus around definitions, are in formats that are readily usable, include the availability of metadata, and provide support for machine-to-machine data transfer.”  The four main formats for data provided by data.gov are XML, CSV, KML and ESRI.  (There are also a number of widgets and tools you can use, including the census factfinder.)

More datasets can be requested, and I’m hoping that they will be rolled out soon.  What a playground! Go take a look!

Update, 4:55: Here’s a great article on the whole process and problem.

BatchUpdateException using Hibernate and MySQL5

I ran into a crazy error last week. One of my clients was upgrading from MySQL4 to MySQL5. The application in question was using Hibernate 3.2. Here’s the table structure and the hibernate mapping. See if you can spot the issue:

mysql> desc stat;
+--------------+-------------+------+-----+---------------------+-------+
| Field        | Type        | Null | Key | Default             | Extra |
+--------------+-------------+------+-----+---------------------+-------+
| stat_date    | date        |      | PRI | 0000-00-00          |       |
| stat_type    | varchar(50) |      | PRI |                     |       |
| stat_count   | int(11)     | YES  |     | NULL                |       |
| last_updated | datetime    |      |     | 0000-00-00 00:00:00 |       |
+--------------+-------------+------+-----+---------------------+-------+
4 rows in set (0.00 sec)

<class name="com.foo.common.data.Statistic" table="stat" lazy="false">
<cache usage="read-write"/>
<composite-id name="statisticId" class="com.foo.common.data.StatisticId">
<key-property name="date" type="java.util.Date" column="stat_date"/>
<key-property name="type" column="stat_type"/>
</composite-id>
<property name="count" column="stat_count"/>
<property name="lastUpdated" type="java.util.Date" column="last_updated" />
</class>

The exception stack trace I was seeing was something like this:

2007-12-31 14:15:09,888 ERROR [Thread-14] def.AbstractFlushingEventListener (AbstractFlushingEventListener.java:301)

- Could not synchronize database state with session
org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:71)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:249)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:235)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:139)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
....
Caused by: java.sql.BatchUpdateException: Duplicate key or integrity constraint
violation message from server: "Duplicate entry '2007-12-31-stattype' for key 1"
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1492)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:1723)
at org.hibernate.jdbc.BatchingBatcher.doExecuteBatch(BatchingBatcher.java:48)
at org.hibernate.jdbc.AbstractBatcher.executeBatch(AbstractBatcher.java:242)
... 57 more

I ended up turning on mysql logging (the log setting in the my.ini file, which logs every sql statement mysql receives) to see what was happening.

Basically, the code was checking whether an entry in the stat table existed; if it did, increment and update, and if it did not, insert. The insert was always happening, which meant the select never found the entry--yet the entry did exist, because mysql threw the 'integrity constraint' exception on the insert.

The cause of the issue was the date type of stat_date and the fact that I incorrectly mapped it to java.util.Date. It really should have been mapped to java.sql.Date. How this worked in mysql4 is beyond me, but it did. Changing the hibernate dialect to mysql5 had no impact.
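For reference, the fix was a one-line change to the mapping (plus the corresponding field type in the StatisticId class); something like:

<key-property name="date" type="java.sql.Date" column="stat_date"/>

Hibernate's date type maps to java.sql.Date, which is what a mysql date column wants; java.util.Date gets treated as a timestamp, so the lookup never matched the stored row.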


MySQL tuning

If you’re trying to tune MySQL, make sure you measure before and after. mysqlreport is a great way to measure a number of aspects of mysql (make sure you check out the guide). If you want graphing and long term tracking for mysqlreport and just about any other measure you care to track, I recommend cacti, which works with rrdtool.
