
The Row Sampling Transformation

Posted 7th June 2010, 01:26 AM by James Beresford

Fig 1: The Row Sampling Transformation


It’s been a long time since I did one of these! In this post I will be covering the Row Sampling Transformation. The sample package can be found here for 2005 and guidelines on use are here.

What does the Row Sampling Transformation do?

The Row Sampling Transformation takes a fixed number of rows from a source data set. It works in a similar manner to the Percentage Sampling Transformation, except that it takes a fixed number of rows rather than a proportion of your data. It splits your data set into two sets, the Sample and Unselected outputs, as below where 10 rows of a 100 row data set have been sampled:

Fig 2: The Row Sampling Transformation outputs


The assigning of rows to an output is nominally random, but given the same data set and random seed (explained below), the same rows will always be selected each time you run the package.
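To make the split and its repeatability concrete, here is a minimal Python sketch of the idea. It is only an analogy: the function name row_sample and its parameters are made up for illustration, and Python's random module is not the algorithm SSIS uses internally.

```python
import random

def row_sample(rows, number_of_rows, seed):
    """Split rows into (sample, unselected), keeping source order in both.

    Hypothetical helper for illustration; not the SSIS implementation.
    """
    rng = random.Random(seed)  # a seeded generator gives repeatable picks
    count = min(number_of_rows, len(rows))
    # Choose the positions of the rows that go to the Sample output
    chosen = set(rng.sample(range(len(rows)), count))
    sample = [row for i, row in enumerate(rows) if i in chosen]
    unselected = [row for i, row in enumerate(rows) if i not in chosen]
    return sample, unselected

source = list(range(1, 101))  # a 100 row data set
sample, unselected = row_sample(source, 10, seed=42)
print(len(sample), len(unselected))  # 10 90

# Same data set and same seed: the same rows are selected again
again, _ = row_sample(source, 10, seed=42)
print(sample == again)  # True
```

Every source row lands in exactly one of the two lists, which mirrors the way the transformation routes each row to one output or the other.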

Configuring the Row Sampling Transformation

There are two important properties to configure on the transformation. The first is the Number of rows, which determines how many rows will fall into the Sample output. The second is the random seed, which tells the random selection algorithm which rows to choose. If you understand a little about randomisation in computing, you will know that randomness is a relative concept to a computer: if you fix the seed, you will get consistent results. If you leave the random seed checkbox unselected, the package will pick a seed based on the operating system's tick count, so results will appear to change from run to run.
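The effect of the seed setting can be sketched in a few lines of Python. This is an illustration under assumptions, not the component's code: pick_seed and sample_positions are hypothetical names, and time.monotonic_ns() merely stands in for the operating system's tick count.

```python
import random
import time

def pick_seed(use_fixed_seed, fixed_seed=0):
    """Return the configured seed, or a clock-based one if none is fixed."""
    if use_fixed_seed:
        return fixed_seed
    # Stand-in for the OS tick count: differs from one run to the next
    return time.monotonic_ns()

def sample_positions(total_rows, number_of_rows, seed):
    """Return the sorted positions of the rows a seeded generator selects."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_rows), number_of_rows))

# Fixed seed: two runs select exactly the same row positions
fixed_a = sample_positions(100, 10, pick_seed(True, 1234))
fixed_b = sample_positions(100, 10, pick_seed(True, 1234))
print(fixed_a == fixed_b)  # True

# No fixed seed: still 10 rows, but which ones will usually vary per run
clock_run = sample_positions(100, 10, pick_seed(False))
print(len(clock_run))  # 10
```

Fixing the seed is the choice to make when a test or comparison needs the same sample on every execution.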

You can also name your Sample and Unselected outputs, should you wish. It’s worth noting that you aren’t obliged to actually use either output downstream of the component, so you can use this component to select a fixed number of rows from your source – or ignore a fixed number of rows from your source, by only using the Unselected output.

Fig 3: Configuring the Row Sampling Transformation


Where should you use the Row Sampling Transformation?

The main use for this would be to select a fixed-size subset of data. This subset could be used for Data Mining test sets, or for limiting your data set size when testing packages. For example, if you are running against a multimillion-row data source, you could run the package with just 100 rows to see if your processes work.
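Outside SSIS, the same "take 100 rows from a huge source" idea is often done with reservoir sampling, which draws a fixed-size sample in a single pass without loading the whole source into memory. The sketch below is a different technique from whatever the transformation does internally and is offered only as a comparison; reservoir_sample and the generated rows are hypothetical.

```python
import random

def reservoir_sample(row_iter, number_of_rows, seed=None):
    """Draw a fixed-size random sample from an iterator in one pass.

    Classic reservoir sampling (Algorithm R); illustrative only.
    """
    rng = random.Random(seed)
    reservoir = []
    for index, row in enumerate(row_iter):
        if index < number_of_rows:
            # Fill the reservoir with the first rows seen
            reservoir.append(row)
        else:
            # Replace an existing entry with decreasing probability
            slot = rng.randint(0, index)
            if slot < number_of_rows:
                reservoir[slot] = row
    return reservoir

# Stand-in for a very large source: rows are generated lazily
big_source = (f"row-{i}" for i in range(200_000))
test_rows = reservoir_sample(big_source, 100, seed=7)
print(len(test_rows))  # 100
```

If the source holds fewer rows than requested, the function simply returns all of them.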

MSDN Documentation for the Row Sampling Transformation can be found here for 2008 and here for 2005.

If you need specific help or advice, or have suggestions on the post, please leave a comment and I will do my best to help you.


