Many times you may have heard this “Use UNION ALL over UNION whenever possible.” The question arises - why? To answer this question in one statement - UNION ALL performs faster compare to UNION.
Then again question arises - Why UNION ALL performs faster? Also - Why whenever possible, why not always?
Let me answer the 2nd question 1st – Though both UNION and UNION ALL combines the results of two or more queries into a single result set, there is fundamental difference between these two. UNION returns only DISTINCT result set, while UNION ALL returns basically all rows available, this includes duplicates.
Lets see the following example:
-- create 2 tables A and B. -- insert data into table B -- test sample data |
Here is how the data of the table A and B looks like :
Note that id=2 and names=Ram is there in both the tables. That will help us in understanding the difference between UNION and UNION ALL. Now lets execute the following 2 queries, 1st is with UNION and 2nd is with UNION ALL. –
-- with UNION |
The result:
Observations :
1. 1st query with UNION returns 5 rows, and UNION ALL query returns 6 rows.
2. Row for ID=2(for Ram) appears twice in UNION ALL result set.
3. Result set for UNION is sorted on ID column. For UNION ALL all the rows of table A appeared 1st followed by rows of table B(no sort).
As you can see, UNION eliminates any duplicate rows from final result set while UNION ALL returns basically all rows available including duplicates. That is the cause of UNION being slow. For each row UNION operator checks whether the entire row exists in previous rows or not. And for making this validation UNION by default 1st sort the result set on the 1st available column of the result set. In our example UNION has sorted the result set on ID column even though I haven’t specified any ORDER BY clause. If you see Name “Sham” (which is in table A) appeared last in the UNION result because it has the highest id 5 while it appeared on 2nd row of UNION ALL result. A look at the query execution plan can help you visualizing it better :
As you can see cost of the UNION query is 73% compare to 27% for UNION ALL. And measure reason being the “Distinct Sort” that UNION operator performs to sort and to eliminate the duplicate rows. While UNION ALL doesn’t really bother about sort and duplicates. And that is why UNION is slow compare to UNION ALL.
So again going back to question – why not use UNION ALL always? And one more question to be added - when to use which one?
- You should use UNION when you don’t want the duplicates in your final result set, and you are not sure (or may be you are sure) that duplicate records exists in the different queries involved in the UNION.
- You should be using UNION ALL when :
1. You are not bothered about the duplicate rows in the result.
2. You are sure there are no duplicates in different queries involved in UNION. e.g. if you are combining results from 2 or more different years(sales orders) with each query reruns result for individual year with some unique id for each row. Or combining result for 2 or more different departments.
All this long I’m talking about UNION and UNION ALL as if they are 2 different things all together. Are they? Not exactly. Reason I’m saying this because, when one of my friend asked me about UNION ALL and I advised him to look into the Books online, and he came back complaining me that “books online doesn’t say anything about UNION ALL”. Reason – he was thinking that Books online must be having some separate section dedicated to UNION ALL, as if it is different from UNION.
Actually, the ALL is just an optional argument in the UNION syntax. For more on UNION you can refer the books online - http://msdn.microsoft.com/en-us/library/ms180026.aspx.
Mangal
No comments:
Post a Comment