Go Back   CORTEX Forums > Local Happenings > CORTEX Blogs > BI Monkey
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

The Pivot Transformation

This is a discussion on The Pivot Transformation within the BI Monkey forums, part of the CORTEX Blogs category; In this post I will be covering the Pivot Transformation. Due to the lack of a GUI or wizard for this component, it’s difficult to configure properly and painful to ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 23rd June 2009, 09:34 PM   #1
Guru
 
Join Date: Jun 2009
Posts: 122
James Beresford is on a distinguished road
Thumbs up The Pivot Transformation

In this post I will be covering the Pivot Transformation. Due to the lack of a GUI or wizard for this component, it’s difficult to configure properly and painful to use, so I will try to walk through it as thoroughly as possible. I found the MSDN documentation on this component a bit confusing as well, so hopefully this will prove to be clearer.

The sample package can be found*here for 2005 and guidelines on use are here.

What does the Pivot Transformation do?

This bit is fairly straightforward - it takes row values and converts them to columns, moving - but not aggregating -* a value as it does so. So if your data looked like this:

Fig 1: Pre Pivot Data


Pivoting on “Food Type” using “Cost” as a value would turn it into this:

Fig 2: Post Pivot Data


How to configure the Pivot Transformation

In the sample package Data Flow 1 I start with a query which delivers Line Totals and Order Quantities summarised by Product and Order Year. What I want to do is Pivot so I have Order Year on the rows, the values from Product Category as the Columns and the sum of Line Totals as the Value, as laid out below:

Fig 3: The Desired Pivot Results


Step 1: Configure the Inputs

The first step is to open the Pivot Transformation and on the Input Columns tab select the columns that are going to be used in the Pivot. Once that is done, switch to the Input and Output Properties tab. Here we are going to set how each column is used in the Pivot operation. This is done by setting the PivotUsage property of the Input Columns. This can be found by expanding Pivot Default Input > Input Columns and selecting the relevant Input Column. Down at the bottom of the list under the Custom header is the property PivotUsage. This can be set to one of four values:
  • 0 - the column is passed through unaffected
  • 1 - the column values become the rows of the pivot (aka the Set Key)
  • 2 - the columnvalues become the column names of the pivot (aka the Pivot Column)
  • 3 - the column values that are pivoted in the pivot
So, in my example, I set OrderYear to 1, Product to 2 and Line Totals to 3.

Fig 4: Configuring the Pivot Transformation Inputs


Now something you have to configure upstream from this is that whatever column will be the rows of the pivot (PivotKeyUsage = 1) needs to be sorted for the component to work properly. The Pivot component iteslf doesn’t force this (as it does in a Merge Join) so you need to go back and make sure that your column is sorted by the time it hits the pivot. Setting the IsSorted and SortKeyPosition properties of the Set Key column upstream will make no difference as the Pivot Transformation won’t actually pay any attention to these settings, but it may cause unexpected results if the data itself isn’t sorted. You can see how this affects the output by running Data Flow 1a, which is the same as Data Flow 1 but not sorted properly.

Step 2: Configure the Outputs

Despite everything you’ve done so far, this component will have exactly zero outputs - each has to be manually defined. So you will need an output for your Rows (In my example, OrderYear) and an output for each of your Pivoted column’s values that you want to output. So in my example I will need to create an output for each Product Category that is in the source data - you cannot ignore a value as the component will return an error. For example, if I forgot to add a column for Components, I would get the error “The pivot key value “Components” is not valid” when I tried to execute.

To create an output, on the Inputs and Outputs tab expand the Pivot Default Output > OutputColumns and start clicking the Add Column button until you have enough columns. Then for each column you need to set the following properties:
  • Name - the name for the output column - so if you were pivoting a value of Bike, you could rename it to Bicycle in the output
  • PivotKeyValue - the value in the pivoted column that will go into this output
  • SourceColumn - the Lineage ID of the column that is to be used in the pivot. What this means is you need to go back to your input columns, and get the value of the LineageID property of the column that you set the PivotUsage of to 3 for pivoted columns, and the LineageID of the Column you set to PivotUsage of 1 for the Row Column.
Fig 5: Configuring the Pivot Transformation Outputs


Having fun yet? Unlikely, but fortunately you are done.

Multiple Pivots and the T-SQL Alternative

In the sample package Data Flow 2, I demonstrate a configuration of the Pivot Transformation that actually pivots two data values. This is achieved by adding a duplicate set of columns for the Pivot Column to receive the pivoted values in, and can be achieved mostly by duplicating the actions in Step 2 to create a new set of Pivot Columns.

I also demonstrate the Pivot T-SQL command in Data Flow 3 - this generates the same output as in Data Flow 1 but the aggregation and pivoting is done in a single operation with none of the awkward configuration required by the Pivot Transformation in SSIS. I won’t be going into the syntax for these queries here but lkinks to the MSDN documentation are at the end of this post.

Where to use the Pivot Transformation?

Because of the pain involved in configuring it, I would suggest you only use it when you absolutely have to and can’t push the operation back to the source. This is not least because the pivot - unlike Excel Pivot Tables or a T-SQL pivot - cannot aggregate, it needs aggregation first on the pivoted column. In my example, if I didn’t have an aggregate operation in the query, I would get the error “Duplicate pivot key value “Bikes”” as it tried to pivot another row which had a Product Category of “Bike”. The T-SQL Pivot query in SQL Server is far more powerful with its variety of aggregation options - even though it will initially be a bit awkward to write those queries to a non-expert. Additionally using T-SQL will allow you to lock down the number of columns - if you get a new value in the Pivot Column it won’t kill your package.

All in all the Pivot Transformation is badly implemented from a UI perspective, and weak from a data processing viewpoint due to it’s lack of aggregation capability.

MSDN Documentation for the Pivot Transformation can be found here for 2008 and here for 2005.

MSDN Documentation for the Pivot T-SQL command can be found here for 2008 and here for 2005.



More...
James Beresford is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
The Conditional Split Transformation James Beresford BI Monkey 0 23rd June 2009 09:34 PM
The Data Conversion Transformation James Beresford BI Monkey 0 23rd June 2009 09:34 PM
The Fuzzy Lookup Transformation James Beresford BI Monkey 0 23rd June 2009 09:34 PM


All times are GMT +11. The time now is 09:39 AM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO