{"id":1845,"date":"2014-11-24T09:26:07","date_gmt":"2014-11-24T15:26:07","guid":{"rendered":"http:\/\/www.mooreds.com\/wordpress\/?p=1845"},"modified":"2014-11-17T09:58:26","modified_gmt":"2014-11-17T15:58:26","slug":"why-use-an-etl-tool","status":"publish","type":"post","link":"https:\/\/www.mooreds.com\/wordpress\/archives\/1845","title":{"rendered":"Why Use an ETL Tool?"},"content":{"rendered":"<figure style=\"width: 160px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" title=\"2012 ~ transformation of consciousness by AlicePopkorn\" src=\"http:\/\/www.mooreds.com\/wordpress\/wp-content\/uploads\/2014\/11\/8292816990_1b1c8e25b6_m_transformation.jpg\" alt=\"transformation photo\" width=\"160\" height=\"240\" \/><figcaption class=\"wp-caption-text\"><small>Photo by <a href=\"http:\/\/www.flickr.com\/photos\/14111752@N07\/8292816990\" target=\"_blank\">AlicePopkorn<\/a> <a title=\"Attribution-NoDerivs License\" href=\"http:\/\/creativecommons.org\/licenses\/by-nd\/2.0\/\" target=\"_blank\" rel=\"nofollow\"><img decoding=\"async\" src=\"http:\/\/www.mooreds.com\/wordpress\/wp-content\/plugins\/wp-inject\/images\/cc.png\" alt=\"\" \/><\/a><\/small><\/figcaption><\/figure>\n<p>I&#8217;m a big fan of ETL tools.\u00a0 The one with which I am most familiar is <a href=\"http:\/\/community.pentaho.com\/projects\/data-integration\/\">Kettle<\/a>, aka Pentaho Data Integration.\u00a0 When I was working for 8z, we used it heavily to pull data from other systems, process it, and update our databases.\u00a0 While ETL systems are not without their flaws, I think their strengths are such that everyone who is moving data around should consider them.\u00a0 This is more true now than in the past because there is a lot more data flowing everywhere, and there are several viable open source ETL tools, so you don&#8217;t have to spend thousands or tens of thousands of dollars to get started.<\/p>\n<p>What are the benefits of ETL tools?<\/p>\n<ul>\n<li>There are 
pre-built components for common data tasks (connecting to a database, parsing a flat file) that have been tested and debugged by many, many people.\u00a0 It&#8217;s hard to overemphasize how much time this can save, allowing you to focus on business logic.<\/li>\n<li>You operate at a higher level of abstraction.<\/li>\n<li>There is support for performance features, like parallel jobs, that you can configure.<\/li>\n<li>The GUI makes data flow obvious.<\/li>\n<li>You can write your own components that leverage existing libraries.<\/li>\n<\/ul>\n<p>What are the detriments?<\/p>\n<ul>\n<li>Possible to version control, impossible to merge.<\/li>\n<li>Limits of components mean you sometimes have to contort your data flows, or drop down to write your own component.<\/li>\n<li>Some components (at least for Kettle) are not open source.<\/li>\n<li>You have to roll your own testing framework.\u00a0 <a href=\"http:\/\/www.mooreds.com\/wordpress\/pentaho-kettle-testing\">I did<\/a>.<\/li>\n<li>You have to learn another tool.<\/li>\n<\/ul>\n<p>Don&#8217;t <a href=\"http:\/\/en.wikipedia.org\/wiki\/Reinventing_the_wheel\">re-invent the wheel<\/a>!\u00a0 Your data movement problem may very well be a super special snowflake, but chances are it isn&#8217;t.\u00a0 Every line of code you write is another you have to maintain.\u00a0 When you are confronted with a data movement problem, take a look at an ETL tool like Kettle and see if you can stand on the shoulders of giants.\u00a0 Here&#8217;s a <a href=\"http:\/\/butleranalytics.com\/5-free-open-source-etl-tools\/\">list of open source ETL tools<\/a> to evaluate.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m a big fan of ETL tools.\u00a0 The one with which I am most familiar is Kettle, aka Pentaho Data Integration.\u00a0 When I was working for 8z, we used it 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,63],"tags":[],"class_list":["post-1845","post","type-post","status-publish","format-standard","hentry","category-databases","category-pentaho-data-integration"],"_links":{"self":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/1845","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/comments?post=1845"}],"version-history":[{"count":4,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/1845\/revisions"}],"predecessor-version":[{"id":1858,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/1845\/revisions\/1858"}],"wp:attachment":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/media?parent=1845"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/categories?post=1845"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/tags?post=1845"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}