| |
| ||||||
The Problem with the Relational Database (Part 2) – PredictabilityThis is a discussion on The Problem with the Relational Database (Part 2) – Predictability within the Innovations in Data Management forums, part of the CORTEX Blogs category; Introduction The Problem with the Relational Database (Part 1 ) –The Deployment Model I met with a friend of mine in New York recently who runs a credit card processing ... |
![]() |
| | LinkBack | Thread Tools | Search this Thread | Display Modes |
| | #1 |
| Senior Member Join Date: Jun 2009
Posts: 71
![]() | I met with a friend of mine in New York recently who runs a credit card processing end-point company.* They specifically built their business around a non-relational database platform and feel they would have major issues had they chosen to build their business around a traditional relational database. In this business they process hundreds of transactions a second, this means that hundreds of people every second are standing somewhere in the world at a checkout making a purchase.* And if the transaction is delayed, that is a lot of people are standing and waiting.* Worst still if the transaction is delayed too long the transaction fails and the customer not only has to retry their transaction, they are likely embarrassed and probably somewhat annoyed. The relational database certainly has the capability to scale to many hundreds of transactions a second. But the reasons for their move away from the relational database aren’t specifically scale related but instead come down to predictability and consistency. Relational databases are not predictable or reliable in terms of consistent performance for a number of reasons. Firstly, every query uses a different amount of resources.* A query could consume 1 or 2 I/O’s or 1 or 2 million I/O’s depending on how the query is written, what data is selected and factors such as how the database is indexed.* Performance is further varied by how the database is maintained (fragmentation).* What makes matters more complex is that different predicate values for a query can have vastly given data distributions.* The same query executed with different constants can have vastly different resource requirements. Because every query has a different “footprint” running a query in isolation does not provide indicative*statistics on how that query will perform under concurrent load.* In fact is become impossible to predict the exact execution duration of a relational database query as its performance will be dependent on what else is being executed at the exact moment it is. Cost Based Optimization Because queries can have complex execution steps behind them with multiple paths to achieving the result, one neat thing that was added to the relational database was cost based optimization.* This takes the control of “how” a query is resolved away from the developer and instead the relational database server works this out at runtime using a complex series of algorithms.* This was hugely important for simplifying the use of relational databases.* But it is bad for predictability. Cost based optimization can get it wrong.* It is much less common now than 10 years ago but every DBA has still come across cases when the optimizer chooses the wrong “query plan” and the query execution takes a long time for seemingly for no reason.* But worse for predictability is cost based optimization is dynamic and looks at a lot of factors at run time to determine which plan is best.* These factors can change meaning that query performance can change (often without any external schema/code modifications).* This dynamic ability can cause headaches to DBAs who are often tasked with resolving performance issues when suddenly the database is “going slow”. Example: Ok let’s take a real world look at this problem.* I have gathered some stats from a real, production (SQL Server) relational database.* This database is around 500GB in size and processes ~700 transactions per second 24 x 7.* This data is just a couple of minutes of captured activity. The first graph I am showing here is the # of queries based on resource usage.* To simplify things resource usage in my example = CPU TIME*+ # READS + # WRITES.* Because queries in relational databases can have vastly different resource usage profiles, we get a very wide distribution of resource usage by query (note the logarithmic scale). We can see that most of the queries executing on this database use |
| | |
![]() |
| Bookmarks |
| Thread Tools | Search this Thread |
| Display Modes | |
| |
Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| HP Oracle Database Machine | RolandBeck | BI Appliances | 3 | 22nd January 2012 08:06 PM |
| The problem with the RDBMS (Part 3) – Let's Get Real | Tony Bain | Innovations in Data Management | 0 | 26th June 2009 11:27 AM |
| Database Failure and New Beginnings | Graham Durant-Law | Knowledge matters | 0 | 26th June 2009 11:27 AM |
| | |
| | |