Now that we have our business logic, we need to build a test case that can exercise that logic.
FYI, this article is part of a series. Previous posts covered:
- the benefits of automated testing for ETL jobs
- what parts of ETL processes to test
- current options and frameworks for testing Kettle
- writing testable business logic
First, build out a job that looks almost like our regular job, but has a few extra steps. Below I’ll show you screen captures from Spoon as we build out the business logic, but you can view the complete code on GitHub.
The TestCaseRunner job sets some variables for the input, output and expected files. You can see below that we also set a base.job.dir variable, which is used as a convenience elsewhere in the TestCaseRunner (for pulling in sample data, for example).
The job also creates a temp directory for output files, then calls the two transformations that are at the heart of our business logic. After that, the TestCaseRunner compares the output and expected files, and signals either success or failure.
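To make the flow concrete outside of Spoon, here is a minimal Python sketch of the same sequence: create a temp directory, run the logic under test, then compare the output with the expected file. It is only an analogy for the Kettle job, not a copy of it. The `greet` function is a hypothetical stand-in for the two transformations, and the `output.txt` file name is an assumption made for illustration.

```python
import filecmp
import tempfile
from pathlib import Path
from typing import Callable


def run_test_case(input_file: Path, expected_file: Path,
                  transform: Callable[[Path, Path], None]) -> bool:
    """Run the logic under test into a temp dir and compare with the expected file."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Output goes to a throwaway directory, like the job's temp directory.
        output_file = Path(tmp_dir) / "output.txt"  # hypothetical file name
        transform(input_file, output_file)
        # shallow=False compares file contents rather than only os.stat() data.
        return filecmp.cmp(expected_file, output_file, shallow=False)


def greet(input_file: Path, output_file: Path) -> None:
    """Hypothetical stand-in for the business logic transformations."""
    names = input_file.read_text().splitlines()
    output_file.write_text("".join(f"Hello, {name}!\n" for name in names))
```

The return value plays the role of the success or failure signal at the end of the job: `True` when the output matches the expected file, `False` otherwise.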
To make the business logic transformations testable, we have to be able to inject test files for processing. At the same time, the main job in production obviously has to process real data. The answer is to modify the transformations to read the name of the file to process from named parameters. We do this on both the job entry screen and the transformation settings screen.
We also need to make sure to change the main GreetFolks job to pass the needed parameters into the updated transformations.
Once these parameters are passed to the transformations, you need to modify the steps inside to use the parameters rather than hardcoded values. Below we show the modified Text File Input step in the Load People To Greet transformation.
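Kettle fields refer to parameters and variables with the `${NAME}` syntax. As a rough illustration of what replacing a hardcoded file name with a parameter buys us, here is a small Python sketch of that kind of substitution. The `resolve` helper, the `input.file` parameter name and the paths are all hypothetical; only `base.job.dir` comes from the job described above. Leaving unknown names untouched is a choice made for this sketch.

```python
import re
from typing import Mapping

# Matches ${name} placeholders, where name is anything up to the closing brace.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(field_value: str, params: Mapping[str, str]) -> str:
    """Replace each ${name} in field_value with its value from params."""
    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        # Names that are not defined are left as-is in this sketch.
        return params.get(name, match.group(0))
    return _PLACEHOLDER.sub(lookup, field_value)


# Hypothetical values: the test job and the main job pass different files
# into the same step field.
field = "${input.file}"
test_params = {"input.file": "${base.job.dir}/src/test/data/input.txt"}
prod_params = {"input.file": "/data/incoming/people.txt"}
```

The same field value resolves to a test file or a real file depending on which job supplies the parameters, so the step itself never has to change.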
The input and expected files are added to our project in the src/test/data directory and are placed under source control. These are the data sets we vary to test interesting conditions in our code. The output file is sent to a temporary directory.
So, now we can run this single test case in Spoon and see if our expected values match the output values. You can see from the log file below that this particular run was successful.
The compare step at the end is our ‘assert’ statement. In most cases, it compares two files: the expected output file (also called the ‘golden’ file) and the output of the transformation. The File Compare job entry works well if you are testing a single file. If the comparison is between two database tables, you can use a Merge Rows step and fail the test if any rows aren’t identical.
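For the table case, the idea is to flag each row by comparing the expected data with the actual data on a key, then fail if any flag is something other than "identical". The Python sketch below illustrates that logic under simplifying assumptions: rows are held in memory as dictionaries indexed by key, and the `diff_rows` and `all_identical` helpers are hypothetical names, not part of Kettle.

```python
from typing import Dict, Hashable, Mapping

Row = Mapping[str, object]


def diff_rows(reference: Mapping[Hashable, Row],
              compare: Mapping[Hashable, Row]) -> Dict[Hashable, str]:
    """Flag every key as identical, changed, deleted or new."""
    flags: Dict[Hashable, str] = {}
    for key, ref_row in reference.items():
        if key not in compare:
            flags[key] = "deleted"    # expected row is missing from the output
        elif compare[key] == ref_row:
            flags[key] = "identical"
        else:
            flags[key] = "changed"    # same key, different values
    for key in compare:
        if key not in reference:
            flags[key] = "new"        # output row that was not expected
    return flags


def all_identical(flags: Mapping[Hashable, str]) -> bool:
    """The 'assert': pass only when every row is flagged identical."""
    return all(flag == "identical" for flag in flags.values())
```

A test would call `diff_rows(expected_rows, actual_rows)` and treat a `False` from `all_identical` as a failure, which also gives you the list of offending keys for the log.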
You can run the TestCaseRunner job in Spoon by hitting the play button or F9.
Next time we will look at how to run multiple tests via one job.
Thanks a lot. Great article. I would like to know if we are able to write the outputs in the pass or fail column of excel sheet of our testcase. That way we do not have to keep creating deleting output folder.
Hi dorjay, I don’t see why you can’t do that. Since each test writes its own file response, you may need to change the code around, but I can’t think of any reason it would not work.