Go Back   CORTEX Forums > Local Happenings > CORTEX Blogs > BI Monkey
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

SQL Server Data Quality Services – Creating a Knowledge Base

This is a discussion on SQL Server Data Quality Services – Creating a Knowledge Base within the BI Monkey forums, part of the CORTEX Blogs category; So far we’ve opened up the client and taken a look at the areas for working with. In this post I’ll look at setting up a Knowledge Base from scratch ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 2nd August 2011, 10:41 AM   #1
Guru
 
Join Date: Jun 2009
Posts: 135
James Beresford is on a distinguished road
Thumbs up SQL Server Data Quality Services – Creating a Knowledge Base

So far we’ve opened up the client and taken a look at the areas for working with. In this post I’ll look at setting up a Knowledge Base from scratch using some sample data I’ve mocked up based on some Netflix catalogue data.

Note all of the images are of near full screen so I’ve shrunk them in the post, just click on the image to see the full screen.

Creating a Knowledge Base

To create a new KB just select “New Knowledge Base” from the client front screen, and give it a name and description. You can either clone an existing one on already the server or import, but in this case I’m just going to start from scratch. There’s three possible modes in which you can create a new KB:
  • Domain Management – Creating from scratch with no guidance
  • Knowledge Discovery – Using a sample data set to guide building your KB with a view to using it for data cleansing rules
  • Matching Policy – Using a sample data set to guide building your KB with a view to using its record matching capabilities
SSDQS Client: Create new Knowledge Base


As I’m not attempting to do any matching, I’m using the Knowledge Discovery approach. (Note: because I’m using an Excel source I need Excel installed on the machine).

Mapping Columns to Domains

Once I’ve picked my data source, I need to map columns:

Data Quality Client - Field Mapping


At this stage, there is no facility to automatically create domains, so before I map the fields to a Domain, I need to create each of them:

Data Quality Client - New Domain


Now a (known) defect appears to be fields from Excel are automatically treated as text – so if they contain numbers you cannot map them to a Numeric domain. So I’ll skip over evaluating Year for now.

Analysing and Managing the results

Once all the mapping is done, click next and you can upload the file for analysis. Now my sample file is 5,000 records, and it took DQS a few minutes just to upload this – so big sets may take a while. Anyway, the output of this is an analysis of the data, displayed in a profiler screen:

Data Quality Client - Discovery Output


This breaks down the records by the following criteria:
  • New records – in this case, all of them as it’s a first pass
  • Unique - how many unique records
  • Valid - how many valid records – again, as it’s a first pass, everything is valid
So, we skip on to the next stage – to manage the results of the findings. This is the first time we start to see what DQS can offer us in terms of cleansing:

Data Quality Client - Managing Domain Values


Here we are reviewing the values found for Country, and can manage the values that come through, flagging them as Correct, Error or Invalid – and assign a corrected value to incorrect ones.

I’m trying to see if there is a functional difference between Error and Invalid. As per documentation:
The status of the value, as determined by the discovery process. You can change the type by clicking the down arrow and selecting a different type. A green check indicates that the value is correct or corrected; a red cross indicates that the value is in error; and an orange triangle with an exclamation point indicates that the value is not valid.
A value that is not valid does not conform to the data requirements for the domain. A value that is in error can be valid, but is not the correct value for data reasons.

Skipping over the semantics issues, what we see here is a list of the values that the DQS Client has found in the Knowledge Discovery analysis of the data. We can then flag these values to Invalid or Error as we see fit – or leave them at their default value of Correct. Once we have flagged them as not correct, it is then possible to enter the Correct value in the “Correct To” column. Handily, the client then groups your corrected values under the correct value in the list.

The final thing is to click Next and Publish the Knowledge Base (i.e. store the results back on the DQS Server)

Summary

So in this post we have quickly reviewed the creation of* Knowledge Base through the Knowledge Discovery Mode. This has allowed us to create a set of values in our Knowledge Base using some sample data and then apply some corrections to those values, using a simple GUI to manage the results.

In the next post I will look at working in more depth with this created Knowledge Base using the “Domain Management” mode.



Get More from the original blog...
James Beresford is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
SQL Server Data Quality Services ? First Look James Beresford BI Monkey 0 28th July 2011 10:57 AM
Installing SQL Server Data Quality Services (Denali CTP3) James Beresford BI Monkey 0 26th July 2011 06:12 PM
Organising Knowledge>> Blog: Data, Knowledge Organisation and Scientific Knowledge Straits Knowledge Green Chameleon 0 20th April 2011 03:48 PM
New Data Visualizations in SQL Server Reporting Services 2008 R2 Latest News Headlines Microsoft News and Views 0 10th March 2010 09:47 AM
Starter Kit for PerformancePoint Planning Server SQL Server Integration Services (SSI Latest News Headlines Microsoft News and Views 0 25th November 2009 01:49 PM


All times are GMT +11. The time now is 10:51 PM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO