Skip to content

Testing time dependent kettle transformations

Testing transformations that depend on the date will often be required when you only want to process new data, or if you want to treat events that happened in the past differently depending on how long ago they occurred.

I have handled the time dimension in one of two ways.

The first is to have a SQL statement that is pulled in via a ‘Get Variables’ step.  This statement is then executed.  For the production job, the statement simply pulls the current date from the database: ‘select curdate()‘ for mysql.  For testing, the statement returns some known date: ‘select str_to_date(‘2012-05-27′,’%Y-%m-%d’)‘ for mysql.

The benefit to this is that you can make this SQL call in your transformation, and everything stays tidily in there.  The disadvantage is that you’re making another database call and mostly just for testing purposes.

The second is just to have a variable that is set previously in the job and is passed in to a transformation as a named parameter.  This date can be pulled from a file (for test), or using the ‘Get System Info’ step, or a database lookup (for production).  The benefit to this is that you aren’t necessarily making another database call and it is more understandable.  I can’t think of any downside, so this is my recommended method.

After this setup is done, you can pivot your test data around the hardcoded test date.  For example, if your data should change state one year after insertion, you can set the date in your input data rows to 364, 365 and 366 days from your test date.  This kind of condition testing ensures that when the logic changes (you should change state two years after insertion), your test will fail, and you will know about the issue before your users do.

This is content from my email newsletter about Pentaho Kettle Testing. To receive similar emails in your inbox, sign up below.

Signup for my infrequent emails about pentaho testing.