Skip to content

Lessons from a data migration

I’ve been working on a data migration project for the last couple of months. There are two schemas, each used by a number of client applications implemented in a number of technologies, and I just wanted to share some of the lessons I’ve learned. Most of the clients are doing simple CRUD but there is some business logic going on as well. I’m sure most of these points will elicit ‘no-duhs’ from many of you.

1. Domain knowledge is crucial. There were many times where I made dumb mistakes because I didn’t understand how one column mapped to another, or how two tables were being consolidated. This would have been easier if I’d had an understanding of the problem space (networking at level 3 and below of the OSI burrito).

2. Parallel efforts end up wasting a lot of time, and doing things in the correct order is important. For instance, much of the client code was refactored before the data layer had settled down. Result? We had to revisit the client layer again. It was hard to split up the data layer work in any meaningful manner, because of the interdependencies of the various tables (thought doing this made more sense than updating client code). Multiple users working on DDL and DML in the same database leads to my next conclusion:

3. Multiple databases are required for effective parallel efforts. Seems like a no-brainer, but the maintenance nightmare of managing multiple developer databases often leads to developers sharing one database. This is workable on a project where most of the development is happening on top of a stable database schema, but when the schema and data are what is being changed, issues arise. Toes are stepped on.

4. Rippling changes through to clients presents you with a design choice. For large changes, like tables losing columns or being consolidated, you really don’t have a choice–you need to reflect those changes all the way through your application. But when it’s a small change, like the renaming of a column, you can either reflect that change in your value objects, or you can hide the changes, either in the DAO (misnamed properties) or database layer (views). The latter choice will lead to confusion down the line, but is less work. However, point #5 is an alternative to both:

5. Code generation a good idea in this case. Rather than having static objects that are maintained in version control, if the value objects and DAOs had some degree of flexibility in terms of looking at the database to determine their properties, adding, deleting and renaming columns would have been much much easier–freeing up more time to fix the GUI and business layer problems that such changes would cause.