Data Profiling

Posted 26th January 2010, 11:06 AM (GMT +11) by admin (Administrator)

Data profiling is the systematic, up-front analysis of the content of a data source. It ranges from counting bytes and checking cardinalities all the way up to the most thoughtful diagnosis of whether the data can meet the high-level goals of the data warehouse. Put simply, data profiling is the technical analysis of data to describe its content, consistency and structure.
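As a rough illustration of the low end of that range, the Python sketch below computes a per-column profile (row count, null count, distinct-value count or cardinality, byte size, longest value and most frequent values) for a table held in memory as a list of dictionaries. The function names and the in-memory table representation are assumptions made for the example; a profiling tool would typically run equivalent queries directly against the source system.

```python
from collections import Counter
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarize one column; None is treated as null."""
    non_null = [v for v in values if v is not None]
    as_text = [str(v) for v in non_null]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),  # cardinality of the non-null values
        "bytes": sum(len(t.encode("utf-8")) for t in as_text),  # UTF-8 size of the text form
        "max_len": max((len(t) for t in as_text), default=0),
        "top_values": counts.most_common(3),
    }


def profile_table(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Profile every column that appears in any row; a missing key counts as null."""
    columns = sorted({col for row in rows for col in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


if __name__ == "__main__":
    sample = [
        {"customer_id": 1, "state": "NY"},
        {"customer_id": 2, "state": "NY"},
        {"customer_id": 3, "state": None},
        {"customer_id": 4, "state": "CA"},
    ]
    for column, stats in profile_table(sample).items():
        print(column, stats)
```

Comparing `distinct` with `rows` is a quick first signal: a column whose cardinality equals its row count is a candidate key, while a very low cardinality suggests a code or flag field.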

Data profiling practitioners divide this analysis into a series of tests, starting with individual fields and ending with whole suites of tables that make up extended databases. Individual fields are checked to confirm that their contents agree with their basic data definitions and domain declarations. It is especially valuable to see how many rows have null values or contain values that violate the domain definition.
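A minimal sketch of field-level testing, assuming each domain declaration can be expressed as a Python predicate that returns True for a conforming non-null value. The `DOMAINS` dictionary, its field names and its ranges are hypothetical and only illustrate the shape of such checks; nulls and domain violations are counted separately, as the paragraph above suggests.

```python
import re
from typing import Any, Callable

# A domain is a predicate that returns True when a non-null value conforms.
Domain = Callable[[Any], bool]

# Hypothetical domain declarations, for illustration only.
DOMAINS: dict[str, Domain] = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 130,
    "state": lambda v: isinstance(v, str) and re.fullmatch(r"[A-Z]{2}", v) is not None,
}


def check_field(values: list[Any], domain: Domain) -> dict[str, int]:
    """Count nulls and domain violations for one field."""
    nulls = sum(1 for v in values if v is None)
    violations = sum(1 for v in values if v is not None and not domain(v))
    return {"rows": len(values), "nulls": nulls, "violations": violations}


def check_table(rows: list[dict[str, Any]], domains: dict[str, Domain]) -> dict[str, dict[str, int]]:
    """Apply each field's domain to its column; a missing field counts as null."""
    return {
        field: check_field([row.get(field) for row in rows], domain)
        for field, domain in domains.items()
    }


if __name__ == "__main__":
    rows = [
        {"age": 34, "state": "NY"},
        {"age": -2, "state": "ny"},
        {"age": None, "state": "CA"},
    ]
    print(check_table(rows, DOMAINS))
```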

For example, if the domain definition is “telephone number”, then alphanumeric entries clearly represent a problem. The best data profiling tools count, sort, and display the entries that violate data definitions and domain declarations.
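The sketch below counts, sorts and displays the entries that violate an assumed “telephone number” domain. The rule used here (digits plus optional `+`, spaces, hyphens, dots and parentheses, with 7 to 15 digits in total) is an illustrative assumption, not a standard definition; a real domain declaration would come from the source's own data definitions. Values are assumed to be strings or None.

```python
import re
from collections import Counter
from typing import Optional

# Assumed domain: digits plus optional "+", spaces, hyphens, dots and parentheses.
_PHONE_CHARS = re.compile(r"[0-9+\-(). ]+")


def is_phone(value: str) -> bool:
    """True if value uses only allowed characters and has 7 to 15 digits."""
    if not _PHONE_CHARS.fullmatch(value):
        return False
    digits = sum(ch.isdigit() for ch in value)
    return 7 <= digits <= 15


def phone_violations(values: list[Optional[str]]) -> list[tuple[str, int]]:
    """Distinct violating entries with their counts, most frequent first."""
    bad = Counter(v for v in values if v is not None and not is_phone(v))
    return sorted(bad.items(), key=lambda item: (-item[1], item[0]))


def show_violations(values: list[Optional[str]]) -> None:
    """Print each violating entry with its count."""
    for value, count in phone_violations(values):
        print(f"{count:>6}  {value!r}")


if __name__ == "__main__":
    show_violations(["02 9876 5432", "N/A", "unknown", "N/A", "(02) 9876-5432", None])
```

Sorting by frequency puts placeholder values such as "N/A" at the top, which is usually where the most actionable cleansing rules come from.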

Moving beyond single fields, data profiling then describes the relationships discovered between fields in the same table. Fields that implement a key to the table can be displayed, together with higher-level many-to-one relationships that implement hierarchies. Checking what should be the key of a table is especially helpful, because violations (duplicate instances of the key value) either are serious errors or reflect a business rule that has not been incorporated into the ETL design.
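One way to sketch both within-table checks in Python: `duplicate_keys` counts how often each value of a candidate key occurs and reports the ones that repeat, and `many_to_one_violations` tests whether a child field (for example a product category) always rolls up to a single parent value (for example a department), which is what a hierarchy requires. The function names and the product fields in the usage are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any


def duplicate_keys(rows: list[dict[str, Any]], key_fields: list[str]) -> dict[tuple, int]:
    """Candidate-key values that occur more than once, with their counts."""
    counts = Counter(tuple(row.get(f) for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def many_to_one_violations(rows: list[dict[str, Any]], child: str, parent: str) -> dict[Any, set]:
    """Child values linked to more than one parent value.

    An empty result means child -> parent behaves like a many-to-one
    relationship, as one level of a hierarchy rolling up to the next should.
    """
    parents = defaultdict(set)
    for row in rows:
        parents[row.get(child)].add(row.get(parent))
    return {value: found for value, found in parents.items() if len(found) > 1}


if __name__ == "__main__":
    products = [
        {"product_id": 1, "category": "Tools", "department": "Hardware"},
        {"product_id": 2, "category": "Tools", "department": "Hardware"},
        {"product_id": 2, "category": "Paint", "department": "Hardware"},
        {"product_id": 3, "category": "Paint", "department": "Decor"},
    ]
    print(duplicate_keys(products, ["product_id"]))
    print(many_to_one_violations(products, "category", "department"))
```

Each reported duplicate or split roll-up then needs a decision: fix the data, or capture the underlying business rule in the ETL design.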

Relationships between tables are also checked in the data profiling step, including assumed foreign-key-to-primary-key relationships and the presence of parents without children.
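A minimal sketch of the two cross-table checks, assuming both tables fit in memory as lists of dictionaries: `orphan_children` finds child rows whose assumed foreign key has no matching primary key, and `childless_parents` finds parent rows that no child references. Rows with a null foreign key are deliberately left out of the orphan check, on the assumption that nulls are profiled separately at the field level. The customer and order tables are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def orphan_children(children: list[Row], fk: str, parents: list[Row], pk: str) -> list[Row]:
    """Child rows whose non-null foreign key matches no parent primary key."""
    parent_keys = {p[pk] for p in parents}
    return [c for c in children if c.get(fk) is not None and c[fk] not in parent_keys]


def childless_parents(children: list[Row], fk: str, parents: list[Row], pk: str) -> list[Row]:
    """Parent rows that no child row references."""
    child_keys = {c.get(fk) for c in children if c.get(fk) is not None}
    return [p for p in parents if p[pk] not in child_keys]


if __name__ == "__main__":
    customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
    orders = [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 1},
        {"order_id": 12, "customer_id": 9},
        {"order_id": 13, "customer_id": None},
    ]
    print(orphan_children(orders, "customer_id", customers, "customer_id"))
    print(childless_parents(orders, "customer_id", customers, "customer_id"))
```

Orphans usually indicate a broken or unenforced foreign key, while childless parents may be perfectly legitimate (a customer with no orders yet); profiling reports both and leaves the judgment to the analyst.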

Finally, data profiling can be custom-programmed to check complex business rules unique to a business, such as verifying that all the preconditions have been met before approval is granted for a major funding initiative.
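Custom rules are specific to each business, so the following is only a hypothetical sketch: each precondition is a named predicate over a record, and the profile reports which records fail which preconditions. The field names (`budget_approved`, `risk_review_date`, `amount`, `delegated_limit`) and the rules themselves are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict[str, Any]], bool]


# Hypothetical funding-approval preconditions; fields and logic are illustrative only.
FUNDING_RULES = [
    Rule("budget approved", lambda r: r.get("budget_approved") is True),
    Rule("risk review signed off", lambda r: r.get("risk_review_date") is not None),
    Rule(
        "amount within delegated limit",
        lambda r: r.get("amount") is not None and r["amount"] <= r.get("delegated_limit", 0),
    ),
]


def unmet_preconditions(record: dict[str, Any], rules: list[Rule]) -> list[str]:
    """Names of the rules this record fails, in rule order."""
    return [rule.name for rule in rules if not rule.check(record)]


def profile_rules(records: list[dict[str, Any]], rules: list[Rule], id_field: str) -> dict[Any, list[str]]:
    """Map each failing record's id to the preconditions it does not meet."""
    report = {}
    for record in records:
        failed = unmet_preconditions(record, rules)
        if failed:
            report[record[id_field]] = failed
    return report


if __name__ == "__main__":
    initiatives = [
        {"id": "F-1", "budget_approved": True, "risk_review_date": "2010-01-20",
         "amount": 500, "delegated_limit": 1000},
        {"id": "F-2", "budget_approved": False, "risk_review_date": None,
         "amount": 5000, "delegated_limit": 1000},
    ]
    print(profile_rules(initiatives, FUNDING_RULES, "id"))
```

Keeping each rule as a named object makes the report readable by business users and lets new preconditions be added without touching the profiling loop.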

© The Business Intelligence Group
