Go Back   CORTEX Forums > Local Happenings > CORTEX Blogs > BI Monkey
Register Blogs FAQ Members List Calendar Search Today's Posts Mark Forums Read

Why to avoid DISTINCT and GROUP BY to get unique records

This is a discussion on Why to avoid DISTINCT and GROUP BY to get unique records within the BI Monkey forums, part of the CORTEX Blogs category; This is a quick and dirty post on the use of DISTINCT or GROUP BY to get unique records, based on something I helped a developer with over the last ...


Reply
 
LinkBack Thread Tools Search this Thread Display Modes
Old 30th January 2011, 03:30 AM   #1
Guru
 
Join Date: Jun 2009
Posts: 135
James Beresford is on a distinguished road
Thumbs up Why to avoid DISTINCT and GROUP BY to get unique records

This is a quick and dirty post on the use of DISTINCT or GROUP BY to get unique records, based on something I helped a developer with over the last couple of weeks.

Their thought process was that because they were getting duplicate records, the easiest way to get rid of them was to slap a DISTINCT at the start of the query to get unique results. Which, in a sense, is OK – because it worked (sort of).However there’s two very good reasons why this is not always a good approach.

#1: Your query is wrong

If you are getting back duplicate records, what it probably means is that you are really doing your query wrong. The below example is an admittedly imperfect example of this – as the first query returns far more than intended – but was close to what I was dealing with:
USE AdventureWorks

/* Query 1: Using DISTINCT to try to eliminate duplicates */

select** *DISTINCT
s.Name,
CASE
WHEN sc.ContactTypeID = 11 THEN ‘Y’
ELSE ‘N’
END** *AS** *’OwnerContact’
from** *Sales.Store s
left join Sales.StoreContact sc
ON s.CustomerID = sc.CustomerID

/* Query 2: Using a properly formed WHERE clause */

select** *s.Name,
‘Y’ AS** *’OwnerContact’
from** *Sales.Store s
left join Sales.StoreContact sc
ON s.CustomerID = sc.CustomerID
WHERE sc.ContactTypeID = 11

What I’m trying to illustrate with the example above is that if you consider more carefully what records you are bringing back in your joins, you are less likely to end up with duplicates. By making sure you are only joining to tables in such a way as to bring back the data you need is going to reduce the risk of other errors creeping in.

#2: Performance

If you are getting duplicate records, you are bringing back more data than you need. On top of this the DISTINCT or GROUP BY operations are having to go over the whole returned data set to identify unique records. This can get pretty expensive pretty quicky in terms of database operations.

From my perspective badly performing queries are a lesser sin than incorrect ones. I doubt many business users will be making decisions based on query length, but they will on the data you serve up to them.

Summing up

All I want to do in this post is make you pause and think before doing a DISTINCT or GROUP BY purely to eliminate duplicates – especially if you don’t really understand why you are getting them. A better designed and more accurate query can often get rid of the dupes and cut off the risk of bad data in the future.



Get More from the original blog...
James Beresford is offline  
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiTweet this Post!
Reply With Quote
Reply

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is On
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Mining Engineer OP Planning - to $155K - Distinct lack of bureaucracy here! admin 2010 Job Archive 0 19th March 2010 06:30 PM
SSAS: Can I make my Distinct Count totals subtotal correctly? Darren Gosbell Random Procrastination 0 16th February 2010 03:16 AM
A couple more unique business cards admin Prediction Markets News Feeds 0 30th January 2010 04:22 AM
A Speeding Ticket to Avoid admin Prediction Markets News Feeds 0 13th January 2010 01:04 AM
Is your organisation really unique? Robert Hillard Navigating the Information Management maze 0 28th October 2009 05:01 PM


All times are GMT +11. The time now is 10:40 PM.

© The Business Intelligence Group

Search Engine Optimization by vBSEO