Go Back   CORTEX Forums > Local Happenings > CORTEX Blogs > Innovations in Data Management
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

The Problem with the Relational Database (Part 2) – Predictability

This is a discussion on The Problem with the Relational Database (Part 2) – Predictability within the Innovations in Data Management forums, part of the CORTEX Blogs category; Introduction The Problem with the Relational Database (Part 1 ) –The Deployment Model I met with a friend of mine in New York recently who runs a credit card processing ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 26th June 2009, 11:27 AM   #1
Senior Member
 
Join Date: Jun 2009
Posts: 71
Tony Bain is on a distinguished road
Default The Problem with the Relational Database (Part 2) – Predictability

I met with a friend of mine in New York recently who runs a credit card processing end-point company.* They specifically built their business around a non-relational database platform and feel they would have major issues had they chosen to build their business around a traditional relational database.

In this business they process hundreds of transactions a second, this means that hundreds of people every second are standing somewhere in the world at a checkout making a purchase.* And if the transaction is delayed, that is a lot of people are standing and waiting.* Worst still if the transaction is delayed too long the transaction fails and the customer not only has to retry their transaction, they are likely embarrassed and probably somewhat annoyed.

The relational database certainly has the capability to scale to many hundreds of transactions a second. But the reasons for their move away from the relational database aren’t specifically scale related but instead come down to predictability and consistency.

Relational databases are not predictable or reliable in terms of consistent performance for a number of reasons.

Firstly, every query uses a different amount of resources.* A query could consume 1 or 2 I/O’s or 1 or 2 million I/O’s depending on how the query is written, what data is selected and factors such as how the database is indexed.* Performance is further varied by how the database is maintained (fragmentation).* What makes matters more complex is that different predicate values for a query can have vastly given data distributions.* The same query executed with different constants can have vastly different resource requirements.

Because every query has a different “footprint” running a query in isolation does not provide indicative*statistics on how that query will perform under concurrent load.* In fact is become impossible to predict the exact execution duration of a relational database query as its performance will be dependent on what else is being executed at the exact moment it is.


Cost Based Optimization


Because queries can have complex execution steps behind them with multiple paths to achieving the result, one neat thing that was added to the relational database was cost based optimization.* This takes the control of “how” a query is resolved away from the developer and instead the relational database server works this out at runtime using a complex series of algorithms.* This was hugely important for simplifying the use of relational databases.* But it is bad for predictability.



Cost based optimization can get it wrong.* It is much less common now than 10 years ago but every DBA has still come across cases when the optimizer chooses the wrong “query plan” and the query execution takes a long time for seemingly for no reason.* But worse for predictability is cost based optimization is dynamic and looks at a lot of factors at run time to determine which plan is best.* These factors can change meaning that query performance can change (often without any external schema/code modifications).* This dynamic ability can cause headaches to DBAs who are often tasked with resolving performance issues when suddenly the database is “going slow”.


Example:

Ok let’s take a real world look at this problem.* I have gathered some stats from a real, production (SQL Server) relational database.* This database is around 500GB in size and processes ~700 transactions per second 24 x 7.* This data is just a couple of minutes of captured activity.



The first graph I am showing here is the # of queries based on resource usage.* To simplify things resource usage in my example = CPU TIME*+ # READS + # WRITES.*


Because queries in relational databases can have vastly different resource usage profiles, we get a very wide distribution of resource usage by query (note the logarithmic scale).
*

We can see that most of the queries executing on this database use
Tony Bain is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
HP Oracle Database Machine RolandBeck BI Appliances 3 22nd January 2012 08:06 PM
The problem with the RDBMS (Part 3) – Let's Get Real Tony Bain Innovations in Data Management 0 26th June 2009 11:27 AM
Database Failure and New Beginnings Graham Durant-Law Knowledge matters 0 26th June 2009 11:27 AM


All times are GMT +11. The time now is 03:57 AM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO