| |
| ||||||
Substituting variables in Kettle Parameter valuesThis is a discussion on Substituting variables in Kettle Parameter values within the DWH Tip Feeds forums, part of the Data Warehousing Tips and Techniques category; Kettle (a.k.a. Pentaho Data Integration) jobs and transformations offers support for named parameters (as of version 3.2). Named parameters form a special class of ordinary kettle variables and are intended ... |
![]() |
| | LinkBack | Thread Tools | Search this Thread | Display Modes |
| | #1 |
| News Bot Join Date: Nov 2007
Posts: 15,067
![]() | Kettle (a.k.a. Pentaho Data Integration) jobs and transformations offers support for named parameters (as of version 3.2). Named parameters form a special class of ordinary kettle variables and are intended to clearly and explicitly define for which variables the caller should supply a value. One of my pet projects, the pentaho auto-documentation solution kettle-cookbook, uses two named parameters called INPUT_DIR and OUTPUT_DIR. These allow you to specify the directory that contains the BI content that is to be documented (such as kettle transformation and job files, action sequence files and mondrian schema files), and the directory to store the generated documentation. Several kettle-cookbook users ran into problems attempting to use variable references in the values they supplied for the INPUT_DIR and OUTPUT_DIR variables. In this case, the variables referenced in the supplied parameter values would be set by adding entries in kettle.properties file. I just committed revision 64 of kettle-cookbook which should fix this problem. In this article I briefly discuss the solution, as I think it may be useful to other kettle users. Substituting Kettle Variable References Kettle doesn't automatically substitute variable references in parameter values (nor in ordinary variable values). So, if you need to support variable references inside parameter values, you have to apply a little trick. For kettle-cookbook, I added a single transformation called substitute-variables-in-parameters.ktr as the first transformation of the main job. ![]() The substitute-variables-in-parameters.ktr transformation uses a "Get Variables" step to read the values of the INPUT_DIR and OUTPUT_DIR parameters. The values are then processed by a javascript function which substitutes all variable references with their values. Finally, a "Set Variables" step overwrites the original value of the variables with their replaced value. The code for the JavaScript step is shown below: function replace_variables(text){ var re = /\$\{([^\}]+)\}|%%([^%]+)%%/g, match, from = 0, variable_name, variable_value, replaced_text = "" ; while ((match = re.exec(text)) !== null) { variable_name = match[1] ? match[1] : match[2]; variable_value = getVariable(variable_name, ""); replaced_text += text.substring(from, match.index); replaced_text += variable_value; from = from + match.index + match[0].length; } replaced_text += text.substring(from, text.length); return replaced_text; } var replaced_input_dir = replace_variable(input_dir); var replaced_output_dir = replace_variable(output_dir); The script first defines function replace_variables(text) which accepts the parameter value, and returns the substituted value. Then it calls the function, applying it to the input_dir and output_dir fields from the incoming stream. These fields originate in the preceding "Get variables" step which assigns them the value of the INPUT_DIR and OUTPUT_DIR variables. The output of the replace_variables() function is assigned to the replaced_input_dir and replaced_output_dir javascript variables, which leave the JavaScript step as fields of the outgoing stream. In the final "Set variables" step, the replaced_input_dir and replaced_output_dir fields are used to overwrite the original value of the INPUT_DIR and OUTPUT_DIR values. The replace_variables() function Let's take a closer look at the replace_variables() function. The heart of the function is formed by a while loop that executes a javascript regular expression called re that matches variable references. The regular expression itself is defined in the top of the function: var re = /\$\{([^\}]+)\}|%%([^%]+)%%/g, It's intention is to recognize variable references of the form ${NAME} and %%NAME%%. The part of the pattern for the name is enclosed in parenthesis to form a capturing group. As we shall see later on, this allows us to extract the actual name of the referenced variable. The trailing g indicates that the pattern should be matched anywhere in the string. This is necessary because we want to replace all variable references in the input text, not just the first one. The regular expression object is used to drive the while loop by calling its exec() method. In case of a match, the exec() returns an array that describes the text matched by the regular expression. If there's no match, exec() returns null while ((match = re.exec(text)) !== null) { ... } If there is a match, we first extract the variable name: variable_name = match[1] ? match[1] : match[2]; The first entry of the match array (at index 0) is the text that was matched by the pattern as a whole. The array contains subsequent elements for each capturing group in the regular expression. Because our regular expression re has 2 capturing groups, the match array contains two more elements. If the variable is of the form ${NAME}, the element at index=1 contains the variable name. If it's of the form %%NAME%%, it will be contained in the element at index=2. Once we have the variable name, we can use the getVariable() javascript function to obtain its value: variable_value = getVariable(variable_name, ""); The getVariable() is not a standard javascript function, but supplied by the kettle javascript step. To perform the actual substitution, we take the substring of the original text up to the location where the variable reference was matched. This location is conveniently supplied by the match array: replaced_text += text.substring(from, match.index); Right after that location, we need to put the variable value instead of its name: replaced_text += variable_value; The last action in the loop is to remember to location right behind the last replaced variable reference, so we can pick up at the right location in the original value the next time we match a variable: from = from + match.index + match[0].length; Right after the loop, we need to copy the final piece of original text occurring right behind the last variable reference to yield the complete replaced text: replaced_text += text.substring(from, text.length); More from Roland Bouman's Blog ... |
| | |
![]() |
| Bookmarks |
| Thread Tools | Search this Thread |
| Display Modes | |
| |
Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Parameterizing SQL statements in the Kettle Table Input step: Variables vs Parameters | Latest News Headlines | DWH Tip Feeds | 0 | 9th December 2010 10:04 PM |
| Telstra values its NBN deal at $16bn | Latest News Headlines | Latest News | 0 | 1st November 2010 11:21 AM |
| Report Variables and Group Variables in Reporting Services 2008 | Latest News Headlines | Microsoft News and Views | 0 | 25th November 2009 01:49 PM |
| Bendigo values loyal shareholders, to a degree | Latest News Headlines | 2009 Q3 News Headlines | 0 | 11th August 2009 09:21 AM |
| Commercial property values plunge | Latest News Headlines | 2009 Q3 News Headlines | 0 | 5th August 2009 01:20 PM |
| | |
| | |