DISTINCT keyword and query performance in SQL Server

0UniqueDISTINCT keyword and query performance in SQL Server

DISTINCT is really useful and handy whenever you wanted to have distinct value in your result set but nothing comes free and DISTINCT keyword is not an exception. If your database is properly normalized, you face less issue to use DISTINCT. DISTINCT and performance of SELECT query is long debated topic and today I come up with my findings and experience with DISTINCT.

I have also seen some myth hovering around DISTINCT. I have seen few very popular blog post which suggest to use GROUP BY or sub query or EXISTS over the DISTINCT keyword to gain performance over DISTINCT keyword but personally I don’t believe it. SQL Server is very smart and it always try to find best suited execution plan for your query. BTW, I will not give name of those blog post and author, however if you are eager then you can search google with terms something like “speed up DISTINCT query” or “performance of DISTINCT in SQL Server” or anything else which has “DISTINCT” and “Performance” together.

Anyway, let us move on and find the effect of DISTINCT keyword on SELECT query. I have prepared demonstration in AdventureWorks2012 database so either download it from CODEPLEX website or use your own database by changing table name in the queries.

Here are two simple SELECT query with and without DISTINCT. You can see that query with DISTINCT keyword uses more resource.

[sourcecode language=”sql”]
–It is HIGHLY recommended that, DON’T use following two DBCC
–commnad on production environment. this is just for testing
–purpose in developement/test environent
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT DISTINCT SalesOrderID FROM Sales.SalesOrderDetail
GO

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT SalesOrderID FROM Sales.SalesOrderDetail
GO
[/sourcecode]

Here is the screen capture of execution plan for both queries we have executed above:

1SimpleSelectwithDistinct

Query with DISTINCT keyword is taking 77% of total batch execution time whereas without SELECT returning more row though it takes only 23%. I have tried to get unique SalesOrderID from SalesOrderDetail table, I should have get it from SalesOrderHeader table as there would be less records with unique SalesOrderID. This query could be considered as bad designed query.

Now let us see whether there is any performance enhancement of GROUP BY or Sub Query or Exists or IN / NOT IN over DISTINCT key words.

Here are some different queries which returns same data. I have used DBCC command before each query so that previously created execution plan won’t be used.

[sourcecode language=”sql”]
–It is HIGHLY recommended that, DON’T use following two DBCC
–commnad on production environment. this is just for testing
–purpose in developement/test environent
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT DISTINCT SO.SalesOrderID,SH.DueDate,SH.TotalDue FROM
Sales.SalesOrderHeader AS SH INNER JOIN
Sales.SalesOrderDetail AS SO
ON SH.SalesOrderID=so.SalesOrderID
WHERE SO.ProductID=776
GO

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT SH.SalesOrderID,SH.DueDate,SH.TotalDue FROM
Sales.SalesOrderHeader AS SH
WHERE EXISTS
(SELECT 1 FROM Sales.SalesOrderDetail AS SO WHERE SO.SalesOrderID=SH.SalesOrderID AND SO.ProductID=776)
GO

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT SH.SalesOrderID,SH.DueDate,SH.TotalDue FROM
Sales.SalesOrderHeader AS SH
WHERE SH.SalesOrderID IN
(SELECT SO.SalesOrderID FROM Sales.SalesOrderDetail AS SO WHERE SO.ProductID=776)
GO

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT SO.SalesOrderID,SH.DueDate,SH.TotalDue FROM
Sales.SalesOrderHeader AS SH INNER JOIN
Sales.SalesOrderDetail AS SO
ON SH.SalesOrderID=so.SalesOrderID
WHERE SO.ProductID=776
GROUP BY SO.SalesOrderID, SH.DueDate,SH.TotalDue
GO
[/sourcecode]

You can see each of the query above will return same number of row with exact same data, moreover there is no performance enhancement. No matter what condition or keyword you use, SQL Server will choose best suited plan for executing your query. You can confirm this by looking at following screen capture of execution plan.

2SELECTwithDISTINCTandOtherOption

If you like this article, do like “Extreme-Advicepage in Facebook.

Reference: Ritesh Shah

http://Extreme-Advice.com

http://www.sqlhub.com

Note: Microsoft Books online is a default reference of all articles.

Author: Riteshshah

Ritesh Shah is a database professional with having 16+ years of experience in various domain of IT field. He is a author of many technical articles on Microsoft Technology and authored a book of SQL Server 2012 Performance Tuning cookbook. He loves work with different Microsoft Products.

2 thoughts on “DISTINCT keyword and query performance in SQL Server”

Comments are closed.