Go Back   CORTEX Forums > Local Happenings > CORTEX Blogs > BI Monkey
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

SQL Server Data Quality Services in SQL2012 RC0 ? Part 1

This is a discussion on SQL Server Data Quality Services in SQL2012 RC0 ? Part 1 within the BI Monkey forums, part of the CORTEX Blogs category; So the key news – in case you missed it – is that SQL2012 RC0 has been made available for download . After a few battles with the Installer – ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 14th December 2011, 11:11 AM   #1
Guru
 
Join Date: Jun 2009
Posts: 135
James Beresford is on a distinguished road
Thumbs up SQL Server Data Quality Services in SQL2012 RC0 ? Part 1

So the key news – in case you missed it – is that SQL2012 RC0 has been made available for download. After a few battles with the Installer – first the known issue with the Distributed Replay users – then some things requiring manual installs of KB’s to get the installer to run through – I have a VM set up with it.

The DQS team have posted about the improvements made in the DQS blog – and the one I really wanted to focus on was performance via SSIS as the CTP3 offering was not viable for large data sets. So this Part 1 post is all about the performance of DQS via SSIS in RC0.

So, I set up a Knowledge Base in the same way as I did for testing CTP3, with 5 duplicate domains – just evaluating an Integer with a single rule saying that integer had to be greater than a value to be valid. Then I ran two sets of values (5k & 10k rows) through the KB via SSIS, evaluating 1,2,3,4 and 5 fields.

So how does DQS Perform?

Here’s the results- the value in the grid is Seconds taken to process.

DQS Performance in SSIS




So – have we moved on from CTP3? A bit. But not much, and enough to be accounted for by a different VM setup (as a reminder CTP3 processing 5k rows took from 20 to 45 seconds for 1-5 columns). I accept a VM may be slower than a properly configured server, but even if it was twice as quick it would still not be a viable option for industrial use.

Looking at execution time changes by number of columns / rows processed, the time taken seems to be pretty linear as rows and columns increase, so it appears DQS performance can be evaluated pretty much as:
DQS Execution Time = Spin Up Time + (Columns * (Rows * Row Process Time))

Where:

Spin Up Time = time taken for DQS engine to start (Constant)

Columns = number of columns being evaluated (Variable)

Rows = number of rows being processed (Variable)

Row Process Time = time taken to process a single row (Constant)

On my VM, Spin Up Time seems to be 7 seconds, and Rows Process Time = 0.0014 seconds.

So, if we had to validate 10 columns on 1,000,000 rows of data (not too crazy) -
DQS Execution Time = Spin Up Time + (Columns * (Rows * Row Process Time))

DQS Execution Time = 7 + (10 * (1,000,000 * 0.0014))

DQS Execution Time = 7 + (10 * (1,000,000 * 0.0014))

DQS Execution Time = 14007 seconds = 233 minutes = 3.9 hours

Which effectively rules it out as a viable production process. Note of course that my formula doesn’t make any allowance for rule complexity.

Is DQS Production ready?

As per anything, the answer is – It depends. For validating small data sets it’s in the realms of slow, but probably acceptable. For big data sets, I’d have to say no – I couldn’t use it in a production environment to validate large sets of data. I’ve added a Connect suggestion to get this on the teams radar.



Get More from the original blog...
James Beresford is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
SQL Server Data Quality Services & SSIS James Beresford BI Monkey 2 8th February 2012 03:26 PM
SQL Server Data Quality Services & SSIS ? Performance James Beresford BI Monkey 0 14th September 2011 04:05 PM
SQL Server Data Quality Services ? Composite Domains James Beresford BI Monkey 0 23rd August 2011 03:34 PM
SQL Server Data Quality Services ? Domain Management James Beresford BI Monkey 0 3rd August 2011 05:12 PM
SQL Server Data Quality Services ? First Look James Beresford BI Monkey 0 28th July 2011 10:57 AM


All times are GMT +11. The time now is 11:01 PM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO