SQL Server Data Quality Services in SQL2012 RC0 - Part 1
So the key news, in case you missed it, is that SQL2012 RC0 has been made available for download. After a few battles with the installer (first the known issue with the Distributed Replay users, then some KBs that had to be installed manually before the installer would run through) I have a VM set up with it. The DQS team have posted about the improvements made on the DQS blog, and the one I really wanted to focus on was performance via SSIS, as the CTP3 offering was not viable for large data sets. So this Part 1 post is all about the performance of DQS via SSIS in RC0.

I set up a Knowledge Base in the same way as I did for testing CTP3, with 5 duplicate domains, each just evaluating an integer against a single rule saying that the integer had to be greater than a given value to be valid. Then I ran two sets of values (5k and 10k rows) through the KB via SSIS, evaluating 1, 2, 3, 4 and 5 fields.

So how does DQS perform?

Here are the results; the value in the grid is seconds taken to process.

[Figure: DQS Performance in SSIS]

So, have we moved on from CTP3? A bit, but not much: the difference is small enough to be accounted for by a different VM setup (as a reminder, CTP3 took from 20 to 45 seconds to process 5k rows for 1-5 columns). I accept a VM may be slower than a properly configured server, but even if it was twice as quick it would still not be a viable option for industrial use.

Looking at how execution time changes with the number of columns and rows processed, the time taken seems to be pretty linear as rows and columns increase, so it appears DQS performance can be evaluated pretty much as:

DQS Execution Time = Spin Up Time + (Columns * (Rows * Row Process Time))

On my VM, Spin Up Time seems to be 7 seconds, and Row Process Time = 0.0014 seconds. So, if we had to validate 10 columns on 1,000,000 rows of data (not too crazy):

DQS Execution Time = Spin Up Time + (Columns * (Rows * Row Process Time)) = 7 + (10 * (1,000,000 * 0.0014)) = 14,007 seconds (roughly 3.9 hours)

Which effectively rules it out as a viable production process.
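To make the arithmetic easy to repeat for other data volumes, here is a minimal Python sketch of the linear model above. The function name `estimate_dqs_seconds` and its parameter names are purely illustrative (they are not part of DQS or SSIS), and the default constants are just the figures measured on my VM, so treat the output as a rough estimate rather than a benchmark.

```python
def estimate_dqs_seconds(rows, columns, spin_up=7.0, row_process_time=0.0014):
    """Estimate DQS-via-SSIS run time using the linear model from the post.

    Execution Time = Spin Up Time + (Columns * (Rows * Row Process Time))

    The defaults (7 s spin up, 0.0014 s per row per column) are the values
    observed on one VM; they are assumptions, not general constants.
    """
    if rows < 0 or columns < 0:
        raise ValueError("rows and columns must be non-negative")
    return spin_up + columns * (rows * row_process_time)


# The test sizes used in the post: 5k and 10k rows, 1 to 5 columns.
for rows in (5_000, 10_000):
    for columns in range(1, 6):
        seconds = estimate_dqs_seconds(rows, columns)
        print(f"{rows:>6} rows x {columns} columns: {seconds:5.0f} s")

# The "not too crazy" production case: 10 columns on 1,000,000 rows.
big = estimate_dqs_seconds(1_000_000, 10)
print(f"1,000,000 rows x 10 columns: {big:,.0f} s (~{big / 3600:.1f} hours)")
```

Swapping in spin up and per-row timings measured on your own hardware is the only change needed to reuse this. As with the formula itself, it makes no allowance for rule complexity.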
Note, of course, that my formula doesn't make any allowance for rule complexity.

Is DQS production ready?

As with anything, the answer is: it depends. For validating small data sets it's in the realms of slow, but probably acceptable. For big data sets, I'd have to say no: I couldn't use it in a production environment to validate large sets of data. I've added a Connect suggestion to get this on the team's radar.