Go Back   CORTEX Forums > Best Practices > Tips and Techniques > Data Warehousing Tips and Techniques > DWH Tip Feeds
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

Parameterizing SQL statements in the Kettle Table Input step: Variables vs Parameters

This is a discussion on Parameterizing SQL statements in the Kettle Table Input step: Variables vs Parameters within the DWH Tip Feeds forums, part of the Data Warehousing Tips and Techniques category; I get this question on a regular basis, so I figured I might as well blog it, in the hope it will be useful for others. Here goes: Let's say ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 9th December 2010, 10:04 PM   #1
News Bot
 
Join Date: Nov 2007
Posts: 15,067
Latest News Headlines is on a distinguished road
Post Parameterizing SQL statements in the Kettle Table Input step: Variables vs Parameters

I get this question on a regular basis, so I figured I might as well blog it, in the hope it will be useful for others. Here goes:
Let's say that I want to delete all records that match an id from a set of tables. The table names come in as rows into the Execute SQL Script step (check execute for every row). Next I write:
DELETE FROM {table_name} WHERE id = {identifier}
as the SQL to execute. In the parameters grid at the bottom right, I have two fields: table_name and identifier. What is the syntax for substituting the table_name and identifier parameters in the sql script?
(Although this particular question focuses on the "Execute SQL Script" step, it also applies to the "Table Input" step, and probably a few more steps I can't recall right now.)

The parameters grid can be used for prepared statement value placeholders. In the SQL statement these placeholders are denoted as questionmarks (?). These are positional parameters: they get their value from those fields in the incoming stream that are entered in the parameters grid, in order.
Here's an example of the correct usage of these placeholders:
DELETE FROM myTable WHERE id = ?
Here, the ? in the WHERE clause will be bound to the value of the first field from the incoming stream entered in the parameters grid. Because there is only one such placeholder, there can be only one field in the parameters grid.

An important thing to realize is that these parameters can only be used to parameterize value expressions. So, this kind of parameter does not work for identifiers, nor do they work for structural elements of the SQL statement, such as keywords. So this kind of parameter cannot be used to parameterize the table name which seems to be the intention in the original example posed in the question.

There is a way to parameterize the structural elements of the SQL statement as well as the parameters. You can apply variable substitution to the SQL statetment.

Kettle Variables can be defined by a Set Variables step, or by specifying parameters at the transformation level. They get their value from "the environment": for example, parameters get their value initially when the transformation is started, and regular variables are typically set somewhere in the job that is calling your transformation.

In text fields, including the SQL textarea of the Table input step or the Execute SQL Script step, you denote those variables with this syntax: ${VARIABLE_NAME}. So to parameterize the table name we could use something like this:
DELETE FROM ${TABLE_NAME}

In order to force kettle to apply variable substitution to the SQL statement, you have to check the "variable substitution" checkbox. If this checkbox is checked, then all variables are simply substituted with their (string)value during transformation initialization. This is a lot like the way macro's are substituted by the pre-processor in C/C++ code.

When comparing variables with parameters, two important things should be mentioned here:
  • Unlike value placeholders, variables can be used to manipulate any aspect of the SQL statement, not just value expressions. The variable value will simply become the text that makes up the SQL statement, it is your responsibility it results in a syntactically valid and correct SQL statement.
  • Variables are evaluated once during transformation initalization. So if you want to vary the variable value, you'll have to call the transformation again for the change to take effect. For the same reasons, you cannot set the value of a variable and read it within the same transformation: setting the variable value occurs at runtime, but evaluating it occurs at initialization time.

Finally, here's a screenshot that summarizes these different ways to parameterize SQL statements in kettle:

If you want to read more about this topic, it's covered in both our books Pentaho Solutions and Pentaho Kettle Solutions. Another title you might be interested in is Maria Roldan's Pentaho 3.2 Data Integration: Beginner's Guide.


More from Roland Bouman's Blog ...
Latest News Headlines is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
10,000 observations and 100,000 parameters: what to do? admin Analytic News Feeds 0 12th September 2010 07:12 PM
Google Analytics on site targeting based on custom variables admin Presentation News Feeds 0 8th June 2010 12:23 PM
Reform of input taxes too hard for now Latest News Headlines 2010 Q2 News Headlines 0 3rd May 2010 12:12 PM
Report Variables and Group Variables in Reporting Services 2008 Latest News Headlines Microsoft News and Views 0 25th November 2009 01:49 PM
ShortRead: a bioconductor package for input, quality assessment and exploration of hi admin Analytic News Feeds 0 18th November 2009 11:38 AM


All times are GMT +11. The time now is 06:44 PM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO