{"id":1026,"date":"2013-04-26T11:23:26","date_gmt":"2013-04-26T17:23:26","guid":{"rendered":"http:\/\/www.mooreds.com\/wordpress\/?p=1026"},"modified":"2014-09-13T08:40:32","modified_gmt":"2014-09-13T14:40:32","slug":"testing-with-pentaho-kettle-current-options","status":"publish","type":"post","link":"https:\/\/www.mooreds.com\/wordpress\/archives\/1026","title":{"rendered":"Testing with Pentaho Kettle &#8211; current options"},"content":{"rendered":"<p>Before we dive into writing a custom test suite harness, it behooves us to look around and see if anyone else has solved the problem in a more general fashion.\u00a0 This <a href=\"http:\/\/forums.pentaho.com\/showthread.php?66948-ETL-Testing\">question has been asked in the Kettle forums<\/a> before as well.<\/p>\n<p>This article is part of a series.\u00a0 Here&#8217;s the first part, explaining the\u00a0<a href=\"\/wordpress\/archives\/995\">benefits of automated testing for ETL jobs<\/a>, and the second, talking about <a href=\"\/wordpress\/archives\/1031\">what parts of ETL processes to test<\/a>.<\/p>\n<p>Below are the options I was able to find.\u00a0 (If you know of any others, let me know and I&#8217;ll update this list.)<\/p>\n<ul>\n<li>\nIn chapter 11, <a href=\"http:\/\/www.wiley.com\/WileyCDA\/WileyTitle\/productCd-0470635177.html\">Pentaho Kettle Solutions<\/a> gives an overview of testing and debugging ETL transformations.\n<\/li>\n<li><a href=\"http:\/\/code.google.com\/p\/testkitchen\/\">TestKitchen<\/a>, a framework that combines some other tools with PDI to help test.\u00a0 This hasn&#8217;t been updated since 2010.\u00a0 I have not had a chance to download this and play around with it, but it is probably worth a look.<\/li>\n<li><a href=\"http:\/\/wiki.pentaho.com\/display\/EAI\/Black+Box+Testing\">PDI Black Box Testing<\/a> is an article from 2007 talking about a framework for PDI testing, but has no code.\u00a0 Here&#8217;s a <a 
href=\"http:\/\/blog.xebia.com\/2009\/09\/30\/pentaho-kettle-and-integration-testing\/\">blog post with some comments<\/a> about this framework.<\/li>\n<li>The <a href=\"http:\/\/wiki.pentaho.com\/display\/EAI\/Data+Grid\">data grid<\/a> step lets you enter reference or test data, so it could play a part in a test.<\/li>\n<li>Here is a blog post describing building a <a href=\"http:\/\/devno.blogspot.ch\/2013\/03\/pentaho-kettle-etl-regression-with.html\">test harness around ETL transformations using Hibernate<\/a>.<\/li>\n<\/ul>\n<p>Other options outlined on <a href=\"http:\/\/stackoverflow.com\/questions\/9993611\/pentaho-kettle-how-to-set-up-tests-for-transformations-jobs\/15837725\">a StackOverflow question<\/a> include using <a href=\"http:\/\/www.dbunit.org\/\">DBUnit<\/a> to populate databases.<\/p>\n<p>A general purpose framework for testing ETL transformations suffers from a few hindrances:<\/p>\n<ul>\n<li>it is easy to have side effects in a transform, and in general transformations are a higher level of abstraction than Java classes (which is why we can be more productive using them)<\/li>\n<li>inputs and outputs differ for every transform<\/li>\n<li>correctness is a larger question than a set of assert statements that unit testing frameworks provide<\/li>\n<\/ul>\n<p>As we build out a custom framework for testing, we&#8217;ll follow these principles:<\/p>\n<ul>\n<li>mock up outside data sources as CSV files<\/li>\n<li>break apart the ETL process into a load and a transform process<\/li>\n<li>use golden data that we know to be correct as our &#8220;assert&#8221; statements<\/li>\n<\/ul>\n<p>As a reminder, I&#8217;ll be publishing another installment in a couple of days.\u00a0 But if you can&#8217;t wait, <a href=\"https:\/\/github.com\/mooreds\/pentaho-kettle-testing\/\">the full code is on github<\/a>.<\/p>\n<p>Sign up for my <a href=\"http:\/\/eepurl.com\/x2Biz\">infrequent emails about Pentaho 
testing<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Before we dive into writing a custom test suite harness, it behooves us to look around and see if anyone else has solved the problem in a more general fashion.\u00a0 [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[63,58],"tags":[],"class_list":["post-1026","post","type-post","status-publish","format-standard","hentry","category-pentaho-data-integration","category-testing"],"_links":{"self":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/1026","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/comments?post=1026"}],"version-history":[{"count":16,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/1026\/revisions"}],"predecessor-version":[{"id":1754,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/posts\/1026\/revisions\/1754"}],"wp:attachment":[{"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/media?parent=1026"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/categories?post=1026"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mooreds.com\/wordpress\/wp-json\/wp\/v2\/tags?post=1026"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}