Go Back   CORTEX Forums > Local Happenings > CORTEX Blogs > BI Monkey
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

Managing your history data

This is a discussion on Managing your history data within the BI Monkey forums, part of the CORTEX Blogs category; This post is to an extent a small rant about some design decisions I have been constrained by on my current project. These decisions were made predominantly for*one fairly bad ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 11th May 2010, 03:05 PM   #1
Guru
 
Join Date: Jun 2009
Posts: 122
James Beresford is on a distinguished road
Thumbs up Managing your history data

This post is to an extent a small rant about some design decisions I have been constrained by on my current project. These decisions were made predominantly for*one fairly bad reason: it made the architect’s life easier (apologies to the architects if they are reading – but these were bad choices!)

The design choices in question are around the managing of history data. In one component of the system it relates to Database storage design, the other relates to Cube storage design. In both cases the history data is stored in a separate location to the “current” data.

Databases: Why separate history*tables are*a bad idea

The first – and most compelling – reason for not storing your history data in*separate tables to your current*tables is that it increases complexity for users. Instead of having one location to look for data, your users now have to use two.

The second compelling reason is that there is no point to doing this from a storage point of view. SQL 2005 & 2008 (Enterprise*editions only, admittedly) provide partitioning. This enables the contents of an individual*table to be stored in separate locations on different filegroups. This means that you can store your current days data in one location and your history in a different one. The reason for doing this*is the same as splitting it into separate tables – that querying the current section will be faster than the historic section. *In theory*queries against partitioned tables*should in fact be faster as the current data is now no longer in the same filegroup as the history data.

Now, there is an overhead associated with designing and maintaining partitions but I don’t see that it is significantly larger than that required to*deal with the process required to archive data into separate tables*on a daily basis. Additionally when maintaining separate history tables, you need to separate out every single table, whether it gets 10 rows a day or 10 million. With partitioning you can just target the large tables that need that focus.

There are*other downsides to maintaining separate*tables. If you make a*change to a table design, you need to do it in 2 places. *You also need to remember to update your history processes. If your history process fails, you can end up with users getting unexpected query results or ETL process failures when the system loads the next day’s data into the current tables – *and untangling it becomes a real mess. If your partition processes fail to run, you just have too much data in one filegroup for a while – unlikely to be*fatal.

So if you have large tables you need to split out for performance purposes – do it at the back end, using the power of the database – which is designed to store data efficiently. Keep it away from the users – they neither need to know or care about your need to keep the data separate. If you want to give them a single object to query with the current day’s data, just use views.

Cubes: Why a separate history*Cube is a bad idea

Much of the above applies here – SSAS also has partitions – so you can again store your historic and current data in separate physical locations with the users being totally unaware of this. Again there is overhead in maintenance, but this will also balance out with the maintenance and risks associated with maintaining two identical cubes that only differ in terms of data source.

Use your storage options!

So without banging on about the same things any further, please consider the following two points whenever considering managing your history data:
  1. How does what i’m planning affect my users?
  2. How does what i’m planning leverage the platforms capabilities?

Get More from the original blog...
James Beresford is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Cartographies of Time: A History of the Timeline admin Analytic News Feeds 0 20th April 2010 04:29 AM
Christian Bartens, Managing Director of Datalicious, new chair of ADMA Data & Analyti admin Presentation News Feeds 0 14th March 2010 06:00 PM
IBM Announces New Software for Managing Data Centers Latest News Headlines IBM and Cognos Forum 0 20th October 2009 04:06 PM
Interactive Data Promotes Liz Duggan to Managing Director, Evaluations Latest News Headlines 2009 Q4 News Headlines 0 6th October 2009 06:57 AM
Excellent video on the History of BI James Beresford BI Monkey 3 1st July 2009 12:58 PM


All times are GMT +11. The time now is 10:23 AM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO