Comparing UNION and UNION ALL operator in SQL Server
It happens many-time that we need a single result set by the combination of two or more result set in SQL Server. UNION operator comes as a handy tool in this situation.
I have often seen that people tend to use UNION though there UNION ALL can work in that situation. Do you think twice before making a choice between UNION and UNION ALL? Or use UNION always, just to save some (three “ALL”) key stroke???
UNION and UNION ALL has different purpose and that is why those have to be used wisely because it affect performance of the overall query.
Before we go further, let us see what UNION and UNION ALL can do for us.
UNION: This operator can combine two or more resultset into one single resultset and remove duplicate records to show distinct result.
UNION ALL: This operator can combine two or more resultset into one single resultset but don’t remove duplicate records to show distinct result and hence if there is any duplication in result sets, you will get more number rows.
There are some rules to follow in order to use either of these operator.
1.) COMPUTE and ORDER BY clause can be applied for over all resultset, not for individual SELECT statement part of UNION operator
2.) GROUP BY and HAVING clause can work with individual SELECT statement, not with overall resultset
3.) Each SELECT statement participated in UNION (ALL) operator suppose to have same number of column with compatible data types.
4.) Column name or alias given in the first SELECT statement would be used as a column heading in final resultset.
Well, after having an idea of UNION and UNION ALL, let us now look at the TSQL script to know how we can practically use it.
Create two sample tables with following script:
USE AdventureWorks2012 GO CREATE TABLE UnionTest1 ( ID INT IDENTITY(1,1) ,Name VARCHAR(25) ) GO CREATE TABLE UnionTest2 ( ID INT IDENTITY(1,1) ,Name VARCHAR(25) ) GO INSERT INTO UnionTest1 VALUES ('Ritesh Shah'), ('Rajan Shah') GO INSERT INTO UnionTest2 VALUES ('Ritesh Shah'), ('Bihag Thakar'), ('Bhushan Shah') GO
Now execute following two SELECT query (keep your execution plan on query) out of which first query is using UNION and second query is using UNION ALL. Please note that we have total 4 unique name in both table.
USE AdventureWorks2012 GO SELECT Name FROM UnionTest1 UNION SELECT Name FROM UnionTest2 GO SELECT Name FROM UnionTest1 UNION ALL SELECT Name FROM UnionTest2 GO
Here is the resulset I have received by executing above query:
UNION did the DISTINCT internally and hence return only 4 unique name in first resultset whereas UNION ALL return all 5 names.
Now, let us have a look at execution plan to find query cost.
If you look at the query cost, UNION has taken 73% cost of total batch execution whereas UNION ALL has taken remaining 27% only.
Thus it is proved that UNION is adding overhead and taking more time to execute. UNION should be used as a last resort. If database is properly normalized, you don’t need UNION. If there is a situation that your result set returning duplicate record, you should find a way to eliminate it, if possible, rather then taking a easy solution of using UNION.
Reference: Ritesh Shah
Note: Microsoft Books online is a default reference of all articles.