{"id":876,"date":"2012-08-03T21:16:18","date_gmt":"2012-08-04T03:16:18","guid":{"rendered":"http:\/\/www.mooreds.com\/wordpress\/?p=876"},"modified":"2023-11-16T22:28:09","modified_gmt":"2023-11-17T04:28:09","slug":"small-scale-data-migrations","status":"publish","type":"post","link":"https:\/\/www.mooreds.com\/wordpress\/archives\/876","title":{"rendered":"Small scale data migrations"},"content":{"rendered":"<p>So, I&#8217;ve recently been involved in another data migration, the second one in three years.\u00a0 These are small migrations, with thousands of records. One person could take care of this size of data migration with effort, but the amount of data is still large enough that manual data re-entry isn&#8217;t really an option&#8211;the error rate and the cost and the management difficulty mean that a software solution is the better option.<\/p>\n<p>Here are some lessons I learned from these data migrations.<\/p>\n<p><strong>Learn as much as you can about the data models&#8211;both the old and the new&#8211;as you can.\u00a0<\/strong> This includes, in preferred order, talking to any people familiar with the old system, talking to any people familiar with the new system, looking at the databases via a sql client, reading documentation (if any is written), and looking at code.\u00a0 I spent some time thrashing around in old system code for a while.\u00a0 Then I asked the developer for a tour, and learned more in that hour than I had in the previous day of looking at code.<\/p>\n<p><strong>Map entities and concepts as early as you can.\u00a0<\/strong> Take special note of any that are in the old and not in the new (and what you are planning to do with them).\u00a0 Those that are in the new and not in the old aren&#8217;t as big of an issue.\u00a0 Also, attributes of entities are as important as entities, so note discrepancies there.\u00a0 Early on I noticed that one of the two primary entities in the old system did not exist in the new system.\u00a0 This led to some interesting conversations with the business users that saved me work.<\/p>\n<p><strong>As above, talk to people who are going to be using the new system, and who use the old system, throughout the migration process.\u00a0<\/strong> An entity or attribute that will be a royal pain to migrate may not be used anymore!\u00a0 Or, the business person might have some good ideas on how to map something in the old system into the new system.\u00a0 Someone who uses the software you are migrating has more domain expertise than you.\u00a0 Let them try the new system with migrated data as soon as some data is moved. Make sure to guide their experience so they don&#8217;t spin their wheels looking in corners of the system that not yet migrated.<\/p>\n<p><strong>Start a spreadsheet of tasks.<\/strong> Doing so means that every time you uncover something that needs to be done while you are in the process of doing something else, you can note it and keep on your original task.\u00a0 My spreadsheets are simple; three columns are enough: task name, completed (with an X for completion, blank for still open) and notes (for possible implementation solutions, people to talk to, relevant URLs, or any other text that will help me complete the task).<\/p>\n<p><strong>Document all the migration steps, preferably to the point you can cut and paste commands.\u00a0<\/strong> Include any discrepancies discovered, special commands to run, access to all needed systems, names of relevant people, areas that need further investigation, and basically anything else you would want handed to you if you were starting on this project.\u00a0 This helps immensely if you need to pass off the project, or come back to it later (even just a few days), and provides documentation of entities on the old and new system.<\/p>\n<p><strong>Write scripts wherever possible, but don&#8217;t try to script the whole process.<\/strong> Access to different servers can be hard to automate.\u00a0 Use whatever language you feel is best for these scripts.\u00a0 I&#8217;ve used bash, sql, perl, and awk\/sed, but I don&#8217;t shy away from a compiled language like java, especially if a library exists that can save me time.\u00a0 Make sure to put these scripts into version control, and document the purpose with comments at the top and a good name.\u00a0 I wouldn&#8217;t worry too much about unit testing or refactoring this software, because chances are it will be seldom used once the migration occurs.<\/p>\n<p><strong>Get familiar with the concatenate function of your database.\u00a0<\/strong> Using queries to write DDL for the new system based on data from the old system can save you writing a script in an imperative language.\u00a0 When migrating from Expression Engine to WordPress, I used a statement like &lt;code&gt;select concat(&#8216;update wp_comments set comment_author_email = &#8221;&#8217;,email,&#8221;&#8217; where comment_author = &#8221;&#8217;,name,&#8221;&#8217;;&#8217;) from exp_comments where name in (select distinct(name) from exp_comments);&lt;\/code&gt; to generate an update statement for WordPress for each comment author in the EE database.<\/p>\n<p><strong>Think about data types and representations.<\/strong>\u00a0 Especially if you are moving from one database to another.\u00a0 When I was moving from MSSQL to MySQL, date fields were particularly thorny.<\/p>\n<p><strong>Realize that these types of projects are typically difficult slogs.\u00a0<\/strong> There were moments where I despaired of ever getting through the migration in a timely fashion.\u00a0 To do it right, you need a fantastic attention to detail, an understanding of the business needs, and an ability to drive things through to the finish.\u00a0 All of this can be pretty draining&#8211;I find it far more draining than bug fixing or building new features.<\/p>\n<p><strong>Control the old and new systems&#8211;try to not have new capabilities added during the migration.\u00a0<\/strong> If you can&#8217;t guarantee that, can the migration wait until the new and old systems stabilize?\u00a0 If not, checkpoint the migration against the new capabilities during the process, and realize that you are introducing a lot of extra work and complexity into an already complex process.<\/p>\n<p><strong>Have a staging system where you can practice your migrations without affecting anyone.\u00a0<\/strong> Plan to go through at least two or three of these new staging systems so that you can get the migration steps solid before you touch production.\u00a0 Start from a clean slate each time so no time is spent chasing phantom bugs from a previous migration that didn&#8217;t finish or wasn&#8217;t entirely correct.\u00a0 This is what makes the migration documentation you write so important.\u00a0 Be aware that the new stage system and the new production system will not necessarily be the same.<\/p>\n<p><strong>Lastly, avoid committing to a schedule if at all possible.\u00a0<\/strong> And if you must, pad it and only commit after you&#8217;ve done a thorough analysis.\u00a0 Because there are so many hidey holes and areas of the old system that you won&#8217;t understand, there is a high probability that you&#8217;ll be discovering new issues and data you need to migrate halfway through the project.\u00a0 (This is a special case of the requirements nightmare known as &#8216;build system B that acts exactly like system A&#8217;.)\u00a0 Communicate progress to the business.<\/p>\n<p>While this is not my favorite type of project, when done well it can have tremendous business value.\u00a0 Combining newer, more flexible systems with rich older data, without re keying the data, can make system users much happier.\u00a0 In some cases, if there is no migration, the newer system simply can&#8217;t be used.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So, I&#8217;ve recently been involved in another data migration, the second one in three years.\u00a0 These are small migrations, with thousands of records. One person could take care of this [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,6],"tags":[],"class_list":["post-876","post","type-post","status-publish","format-standard","hentry","category-databases","category-programming"],"_links":{"self":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/876","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/comments?post=876"}],"version-history":[{"count":6,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/876\/revisions"}],"predecessor-version":[{"id":3636,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/876\/revisions\/3636"}],"wp:attachment":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/media?parent=876"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/categories?post=876"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/tags?post=876"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}