Managing kettle job configuration
Over time I've grown a habit of making a configuration file for my Kettle jobs. This is especially useful if you have a reusable job, where the same work has to be done but under different conditions. A simple example where I found this useful is when you have separate development, testing and production environments: when you're done developing your job, you transfer the .kjb file (and its dependencies) to the testing environment. This is the easy part. But the job still has to run within the new environment, against different database connections, web service URLs and file system paths.

Variables

In the past, much has been written about using Kettle variables, parameters and arguments. Variables are the basic feature that provides the mechanism to configure transformation steps and job entries: instead of using literal configuration values, you use a variable reference. This way, you can initialize all variables to whatever values are appropriate at that time, and for that environment. Today, I don't want to discuss variables and variable references. Instead, I'm focusing on how to manage the configuration once you have already used variable references inside your jobs and transformations.

Managing configuration

To manage the configuration, I typically start the main job with a set-variables.ktr transformation. This transformation reads configuration data from a config.properties file and assigns it to the variables, so any subsequent jobs and transformations can access the configuration data through variable references. The main job has one parameter called ${CONFIG_DIR}, which has to be set by the caller so the set-variables.ktr transformation knows where to look for its config.properties file.

Reading configuration properties

The config.properties file is just a list of key/value pairs, where each key represents a variable name, and the value is the value to assign to that variable.
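To make the file format concrete, here is a minimal sketch in plain JavaScript. The property names and values in `sampleProperties` are invented for illustration and do not come from the original post. The `readProperties` function is a hypothetical, simplified stand-in that turns such a file into one key/value row per property; it is not Kettle code and it ignores parts of the .properties format such as escapes and line continuations.

```javascript
// Hypothetical config.properties content; keys and values are made up for illustration.
const sampleProperties = [
  "# development environment",
  "DB_HOST=localhost",
  "DB_PORT=3306",
  "WEBSERVICE_URL=http://localhost:8080/service",
  "OUTPUT_DIR=/tmp/kettle/out",
].join("\n");

// Simplified stand-in for reading a properties file: one {Key, Value} row per property.
// Real .properties parsing handles more cases (escapes, continuations, ':' separators).
function readProperties(text) {
  const rows = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comment lines.
    if (line === "" || line.startsWith("#") || line.startsWith("!")) continue;
    // Split on the first '=' only, so values may themselves contain '='.
    const eq = line.indexOf("=");
    if (eq < 0) continue;
    rows.push({ Key: line.slice(0, eq).trim(), Value: line.slice(eq + 1).trim() });
  }
  return rows;
}

const rows = readProperties(sampleProperties);
```

Each entry in `rows` corresponds to one variable that the job would need for a given environment; moving to another environment means swapping the file, not editing the job.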
The set-variables.ktr transformation reads this file using a "Property Input" step, and this yields a stream of key/value pairs.

Pivoting key/value pairs to use the "Set variables" step

In the past, I used to set the variables using the "Set variables" step. This step works by creating a variable for each selected field in the incoming stream and assigning the field value to it. This means that you can't just feed the stream of key/value pairs from the "Property Input" step into the "Set variables" step: the stream coming out of the "Property Input" step contains multiple rows with just two fields called "Key" and "Value". Feeding it directly into the "Set variables" step would just create two variables called Key and Value, and they would be reassigned for every key/value pair in the stream. So in order to assign the variables meaningfully, I used to pivot the stream of key/value pairs into a single row having one field for each key in the stream, using the "Row Denormaliser" step.

Drawbacks

There are two important drawbacks to this approach:

Solution: JavaScript

As it turns out, there is in fact a very simple solution that solves all of these problems: don't use the "Set variables" step for this kind of problem! We still need to set the variables of course, but the new set-variables.ktr transformation conveniently does this using a JavaScript step.

The actual variable assignment is done with Kettle's built-in setVariable(key, value, scope) function. The key and value from the incoming stream are passed as the key and value arguments of setVariable(). The third argument is a string that identifies the scope of the variable, and must have one of the following values:
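To the best of my knowledge, the scope strings Kettle accepts here are "s" (system), "r" (root job), "p" (parent job) and "g" (grandparent job); treat that as an assumption and check it against your Kettle version. The sketch below imitates what the JavaScript step does for each incoming row. The `setVariable` stub, the `assigned` object and the sample rows are hypothetical and exist only so the sketch runs outside Kettle; the choice of "r" as scope is likewise an assumption.

```javascript
// Stand-in for Kettle's built-in setVariable(key, value, scope). It only records
// the calls so this sketch can run outside Kettle. Inside a real JavaScript step,
// leave this stub out: Kettle provides the function itself.
const assigned = {};
function setVariable(key, value, scope) {
  assigned[key] = { value: value, scope: scope };
}

// Hypothetical rows, shaped like the key/value stream coming from "Property Input".
const incomingRows = [
  { Key: "DB_HOST", Value: "localhost" },
  { Key: "OUTPUT_DIR", Value: "/tmp/kettle/out" },
];

// The JavaScript step runs its script once per incoming row; this loop imitates that.
for (const row of incomingRows) {
  // Inside Kettle the whole script would be a single call along the lines of:
  //   setVariable(Key, Value, "r");
  // where Key and Value are the field names of the incoming stream.
  setVariable(row.Key, row.Value, "r");
}
```

Because the script never names an individual property, adding a key to config.properties needs no change to the transformation.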

The bonus is that this set-variables.ktr is less complex than the previous one and is now completely independent of the content of the configuration. It has become a reusable transformation that you can use over and over.

More from Roland Bouman's Blog