The Data Audit Imperative

Posted by admin (Administrator), 6th December 2009, 02:55 PM   #1

by Malcolm Chisholm, BeyeNETWORK, 2 DECEMBER 2009

Data is now widely accepted as being something of value. In this it is analogous to money, and it seems to share another property of money in that it can "flow." Modern bookkeeping recognizes the ability of money to flow by identifying a credit and a debit for each side of a financial transaction, and treating the transaction as a flow of funds from the debit side to the credit. Should we do something similar for data? I think that we should, but I think that what we need to do with flows of data also bears a resemblance to another aspect of financial management – the need to audit.

The word "audit" comes from the word "auditor" which originally meant "one who listens." Once upon a time, auditors really did listen. For instance, it was quite common in the Spanish empire for the monarch to send auditors to the colonies to gather intelligence on what was happening in the colonial administrations. The auditors would report directly back to Madrid. In this way, the Emperor or Empress could be assured that the colonies were being run in compliance with his or her wishes. The idea of the auditor was gradually developed from this early arrangement into what we have today.

The Need to Audit

Today, compliance includes the idea that we are not simply expected to do something, but that we may also be expected to prove that we did it. Whatever mechanism we use to perform a task, whether manual, automated, or both, that mechanism cannot by itself prove that the task was performed in the way that was expected. Suppose I write a function to move data from Table A to Table B, and this function outputs the number of records read from Table A and the number of records written to Table B. This is a very good feature, but it is not an audit. Perhaps the records were not really written to Table B in the Production environment, but were written to Table B in the Development environment because somebody forgot to change the connection string. The function might produce perfect record counts, but the process would still have failed. Of course, this is not to say that processes should not have their own internal controls like record counts. They should. However, there is also a need to independently verify that the process has functioned in the way it should have. There is a need to audit it.
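
The difference between an internal control and an audit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table names table_a and table_b, the helper names copy_rows, audit_row_counts and make_db, and the use of in-memory SQLite databases to stand in for the Production and Development environments are all hypothetical and are not taken from any particular tool.

```python
import sqlite3


def copy_rows(src, dst):
    """Copy every row of table_a in src into table_b in dst.

    Returns (rows_read, rows_written): the process's own internal control.
    """
    rows = src.execute("SELECT id, payload FROM table_a").fetchall()
    dst.executemany("INSERT INTO table_b (id, payload) VALUES (?, ?)", rows)
    dst.commit()
    return len(rows), len(rows)


def audit_row_counts(src, intended_target):
    """Independent check: count rows where they are *supposed* to be."""
    expected = src.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
    actual = intended_target.execute("SELECT COUNT(*) FROM table_b").fetchone()[0]
    return expected == actual, expected, actual


def make_db(with_rows=False):
    """Build an in-memory database holding table_a and table_b (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, payload TEXT)")
    if with_rows:
        conn.executemany(
            "INSERT INTO table_a VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")]
        )
        conn.commit()
    return conn


source = make_db(with_rows=True)
production = make_db()
development = make_db()

# The misconfiguration from the text: the job writes to Development.
read, written = copy_rows(source, development)
print(read, written)                         # the job's own counts agree
print(audit_row_counts(source, production))  # the audit looks at Production instead
```

The copy function reports matching counts because it only knows about the connection it was handed. The audit is given the intended target separately, so it is not fooled by the wrong connection string.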

The Flow of Data

Data movement is very common in modern IT environments. We expect that transaction applications will produce data, but that different informational applications will analyze it. The data must be moved from the transactional applications to the informational applications. But data movement is even more pervasive. Data is moved among transaction applications, in both real time and batch modes. Data is sent to external parties, such as regulators, and received from others, such as data vendors. We are all aware that myriads of data flows happen every day in the enterprises we work in, just as financial flows do. However, data flows are different. There is no real "debit" from the source of data. Data is nearly always copied, rather than moved. That is, the records which are written to the target do not result in the elimination of the corresponding records in the source. This makes data difficult to deal with. There is no single place in which a given record is located – it may have been copied to many places.

Given that data flows are now so common, it is worth considering whether these flows should be audited. It would be nice to have independent assurance that the data we think we moved actually was moved, that it came from where it was supposed to come from, and that it went where it was supposed to go.

What Can Go Wrong?

Actually, this is not really a "nice to have" feature for a modern enterprise. It is essential. Data flows can go wrong in all kinds of ways. Consider orchestration of data movement. We may have a nightly flow from a table in Transaction Application A to Staging Table B in Data Warehouse C, and a second flow from Staging Table B to Fact Table D in the warehouse. Suppose that the flow from A to B is scheduled to run at 1:30 a.m. every day, and the flow from B to D at 4:00 a.m. every day. Now suppose that the first flow is delayed and does not happen until after the B to D flow has completed. We obviously will have a problem: the B to D flow will have loaded Fact Table D without that night's data from A.
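
A check for this kind of ordering failure might look like the following sketch. It assumes that some external scheduler or log collector records a start and finish time for each flow; the run_log structure, the flow names A_to_B and B_to_D, and the function ordering_violations are hypothetical names introduced only for illustration.

```python
from datetime import datetime


def ordering_violations(run_log, dependencies):
    """Return (upstream, downstream) pairs where downstream started too early.

    run_log maps flow name -> (started_at, finished_at) for one night.
    dependencies is a list of (upstream, downstream) flow names.
    """
    violations = []
    for upstream, downstream in dependencies:
        if upstream not in run_log or downstream not in run_log:
            continue  # a flow that never ran needs a separate check
        upstream_finished = run_log[upstream][1]
        downstream_started = run_log[downstream][0]
        if downstream_started < upstream_finished:
            violations.append((upstream, downstream))
    return violations


night = datetime(2009, 12, 2)
run_log = {
    # A -> B was scheduled for 01:30 but was delayed until 04:45.
    "A_to_B": (night.replace(hour=4, minute=45), night.replace(hour=5, minute=0)),
    # B -> D ran on schedule at 04:00, before its input had arrived.
    "B_to_D": (night.replace(hour=4, minute=0), night.replace(hour=4, minute=20)),
}
on_time_log = {
    "A_to_B": (night.replace(hour=1, minute=30), night.replace(hour=1, minute=50)),
    "B_to_D": (night.replace(hour=4, minute=0), night.replace(hour=4, minute=20)),
}

print(ordering_violations(run_log, [("A_to_B", "B_to_D")]))
print(ordering_violations(on_time_log, [("A_to_B", "B_to_D")]))
```

The dependency list is kept outside the flows themselves, so the check does not rely on either flow knowing about the other.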

A single isolated example like this seems easy to comprehend and might not seem to really require an audit process to detect exceptions. Perhaps this could be done within the process itself. But when we have hundreds or thousands of data flows per day, figuring out everything that could go wrong and specifically coding it into the data movement processes is not scalable. Also, what happens if a data movement process – for whatever reason – simply is not run? It cannot detect its own failure. We are back to the need for independent verification – for auditing.
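
The "process that never ran" case is one that only an outside observer can catch. A minimal sketch, assuming a list of flows expected each night and a status log gathered independently of the flows (the names expected_flows, run_log, vendor_prices_load and flows_not_completed are all hypothetical):

```python
def flows_not_completed(expected_flows, run_log):
    """Return the expected flows that never reported a successful run.

    run_log maps flow name -> status string, as recorded by some external
    scheduler or log collector (an assumption of this sketch). A flow that
    did not run at all simply has no entry.
    """
    return sorted(
        flow for flow in set(expected_flows) if run_log.get(flow) != "succeeded"
    )


expected_flows = ["A_to_B", "B_to_D", "vendor_prices_load"]
run_log = {"A_to_B": "succeeded", "B_to_D": "failed"}

# B_to_D failed and vendor_prices_load never ran; neither could report itself.
print(flows_not_completed(expected_flows, run_log))
```

The check starts from what was expected rather than from what was logged, which is what lets it notice an absence.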

What is Data Auditing?

We undoubtedly still have a lot of theoretical and practical work to do in the realm of data auditing, but it is possible to see the outlines of what it should consist of.

A data auditing tool should allow us to identify a source and a target. Data is going to flow from the source to the target. We should then be able to identify the records expected to have been moved in the source and the records expected to have arrived from the source in the target. This could be simple, or it could be complex. It can certainly involve identifying subsets of records in the source and the target. If this is the case, we will inevitably need a business rules approach. Logic will be needed to identify the subsets of records in the source and target. Perhaps this will be based on SQL queries. This logic will require metadata, such as a description of what the logic is trying to do, who set it up, and how it corresponds to some kind of business reality. Governance processes will need to be overlain on all of this. Thus, we can quickly appreciate that a simplistic programming approach will not be sufficient.
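
One way to picture such a rule is as a small record that pairs a source query with a target query and carries its own metadata. The sketch below is an assumption about how this might be modelled, not a description of any existing product: the AuditRule class, its fields, the run_rule function, and the orders and stg_orders tables are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRule:
    name: str
    description: str   # what the logic is trying to verify
    owner: str         # who set the rule up
    source_query: str  # selects keys of records expected to have been moved
    target_query: str  # selects keys of records expected to have arrived


def run_rule(rule, source, target):
    """Compare the key sets selected by the rule's two queries."""
    expected = {row[0] for row in source.execute(rule.source_query)}
    arrived = {row[0] for row in target.execute(rule.target_query)}
    return {
        "rule": rule.name,
        "missing_in_target": sorted(expected - arrived),
        "unexpected_in_target": sorted(arrived - expected),
    }


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, "shipped"), (2, "open"), (3, "shipped")]
)

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE stg_orders (id INTEGER)")
target.execute("INSERT INTO stg_orders VALUES (1)")

rule = AuditRule(
    name="shipped_orders_to_staging",
    description="Every shipped order should appear in the staging table.",
    owner="data.steward@example.com",
    source_query="SELECT id FROM orders WHERE status = 'shipped'",
    target_query="SELECT id FROM stg_orders",
)
result = run_rule(rule, source, target)
print(result)  # order 3 is shipped in the source but absent from the target
```

Keeping the queries as data rather than as code is what would let a governance process review, approve, and change them without touching the data movement jobs.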

There must be other components in the architecture that supports data auditing. The results of the audit runs must be stored in a database. A notification service will be needed to send messages to stakeholders if exceptions are detected. This, in turn, requires elements for stakeholder management. Then there is orchestration. The audit processes have to run in the correct time windows and observe the proper dependencies. And then there are the governance processes to configure, monitor, and evaluate the auditing.
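
Two of these components, the results store and the notification service, can be sketched together. Everything named here is an assumption made for illustration: the audit_results table layout, the stakeholders mapping, and the send callback that stands in for a real mail or messaging service.

```python
import sqlite3
from datetime import datetime, timezone


def record_result(results_db, rule_name, passed, detail):
    """Append one audit outcome to the audit_results table."""
    results_db.execute(
        "INSERT INTO audit_results (run_at, rule_name, passed, detail) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), rule_name, int(passed), detail),
    )
    results_db.commit()


def notify_on_exception(rule_name, passed, detail, stakeholders, send):
    """Call send(recipient, message) for each stakeholder of a failed rule.

    Returns the number of notifications sent.
    """
    if passed:
        return 0
    recipients = stakeholders.get(rule_name, [])
    for recipient in recipients:
        send(recipient, f"Audit '{rule_name}' failed: {detail}")
    return len(recipients)


results_db = sqlite3.connect(":memory:")
results_db.execute(
    "CREATE TABLE audit_results "
    "(run_at TEXT, rule_name TEXT, passed INTEGER, detail TEXT)"
)
stakeholders = {"shipped_orders_to_staging": ["warehouse-team@example.com"]}
outbox = []  # stands in for a real mail or messaging service

record_result(
    results_db, "shipped_orders_to_staging", False, "1 record missing in target"
)
sent = notify_on_exception(
    "shipped_orders_to_staging",
    False,
    "1 record missing in target",
    stakeholders,
    lambda to, msg: outbox.append((to, msg)),
)
print(sent, outbox)
```

Storing every outcome, not only the failures, is a deliberate choice in this sketch: a stored pass is what lets someone later show that the check was actually run.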

None of this is ultimately easy. However, it needs to be addressed to stop the "data mess" from spiraling ever further out of control in the enterprises we work in. Automated tools are now making their appearance in this area. They will be part of any solution, but all data managers need to begin thinking in earnest about data auditing.