| |
| ||||||
Why to avoid DISTINCT and GROUP BY to get unique recordsThis is a discussion on Why to avoid DISTINCT and GROUP BY to get unique records within the BI Monkey forums, part of the CORTEX Blogs category; This is a quick and dirty post on the use of DISTINCT or GROUP BY to get unique records, based on something I helped a developer with over the last ... |
![]() |
| | LinkBack | Thread Tools | Search this Thread | Display Modes |
| | #1 |
| Guru Join Date: Jun 2009
Posts: 135
![]() | This is a quick and dirty post on the use of DISTINCT or GROUP BY to get unique records, based on something I helped a developer with over the last couple of weeks. Their thought process was that because they were getting duplicate records, the easiest way to get rid of them was to slap a DISTINCT at the start of the query to get unique results. Which, in a sense, is OK – because it worked (sort of).However there’s two very good reasons why this is not always a good approach. #1: Your query is wrong If you are getting back duplicate records, what it probably means is that you are really doing your query wrong. The below example is an admittedly imperfect example of this – as the first query returns far more than intended – but was close to what I was dealing with: USE AdventureWorksWhat I’m trying to illustrate with the example above is that if you consider more carefully what records you are bringing back in your joins, you are less likely to end up with duplicates. By making sure you are only joining to tables in such a way as to bring back the data you need is going to reduce the risk of other errors creeping in. #2: Performance If you are getting duplicate records, you are bringing back more data than you need. On top of this the DISTINCT or GROUP BY operations are having to go over the whole returned data set to identify unique records. This can get pretty expensive pretty quicky in terms of database operations. From my perspective badly performing queries are a lesser sin than incorrect ones. I doubt many business users will be making decisions based on query length, but they will on the data you serve up to them. Summing up All I want to do in this post is make you pause and think before doing a DISTINCT or GROUP BY purely to eliminate duplicates – especially if you don’t really understand why you are getting them. A better designed and more accurate query can often get rid of the dupes and cut off the risk of bad data in the future. Get More from the original blog... |
| | |
![]() |
| Bookmarks |
| Thread Tools | Search this Thread |
| Display Modes | |
| |
Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Mining Engineer OP Planning - to $155K - Distinct lack of bureaucracy here! | admin | 2010 Job Archive | 0 | 19th March 2010 06:30 PM |
| SSAS: Can I make my Distinct Count totals subtotal correctly? | Darren Gosbell | Random Procrastination | 0 | 16th February 2010 03:16 AM |
| A couple more unique business cards | admin | Prediction Markets News Feeds | 0 | 30th January 2010 04:22 AM |
| A Speeding Ticket to Avoid | admin | Prediction Markets News Feeds | 0 | 13th January 2010 01:04 AM |
| Is your organisation really unique? | Robert Hillard | Navigating the Information Management maze | 0 | 28th October 2009 05:01 PM |
| | |
| | |