Wednesday, July 22, 2009

UNION Vs UNION ALL

Many times you may have heard this “Use UNION ALL over UNION whenever possible.”  The question arises - why?  To answer this question in one statement  - UNION ALL performs faster compare to UNION. 
Then again question arises - Why UNION ALL performs faster?  Also - Why whenever possible, why not always?

Let me answer the 2nd question 1st – Though both UNION and UNION ALL combines the results of two or more queries into a single result set, there is fundamental difference between these two.  UNION returns only DISTINCT result set, while UNION ALL returns basically all rows available, this includes duplicates.

Lets see the following example:

-- create 2 tables A and B.
CREATE TABLE A
(
ID INT,
Names VARCHAR(10)
)
GO
CREATE TABLE B
(
ID INT,
Names VARCHAR(10)
)
GO
-- insert data into table A
INSERT INTO A VALUES(1,'Mangal');
INSERT INTO A VALUES(5,'Sham');
INSERT INTO A VALUES(2,'Ram');

-- insert data into table B
INSERT INTO B VALUES(2,'Ram');
INSERT INTO B VALUES(3,'Shiv');
INSERT INTO B VALUES(4,'John');

-- test sample data
SELECT id, Names
FROM A
GO
SELECT id, Names
FROM B
GO

Here is how the data of the table A and B looks like :

samle

Note that id=2 and names=Ram is there in both the tables.  That will help us in understanding the difference between UNION and UNION ALL.  Now lets execute the following 2 queries, 1st is with UNION and 2nd is with UNION ALL.

-- with UNION
SELECT id, Names
FROM A
UNION
SELECT id, Names
FROM B
GO

-- with UNION ALL
SELECT id, Names
FROM A
UNION ALL
SELECT
id, Names
FROM B
GO

The result: 

Result

Observations :
1. 1st query with UNION returns 5 rows, and UNION ALL query returns 6 rows.
2.  Row for ID=2(for Ram) appears twice in UNION ALL result set.
3. Result set for UNION is sorted on ID column.  For UNION ALL all the rows of table A appeared 1st followed by rows of table B(no sort).

As you can see, UNION eliminates any duplicate rows from final result set while UNION ALL returns basically all rows available including duplicates.  That is the cause of UNION being slow.  For each row UNION operator checks whether the entire row exists in previous rows or not.  And for making this validation UNION by default 1st sort the result set on the 1st available column of the result set.  In our example UNION has sorted the result set on ID column even though I haven’t specified any ORDER BY clause.  If you see Name “Sham” (which is in table A) appeared last in the UNION result because it has the highest id 5 while it appeared on 2nd row of UNION ALL result.  A look at the query execution plan can help you visualizing it better :

plan 

As you can see cost of the UNION query is 73% compare to 27% for UNION ALL.  And measure reason being the “Distinct Sort” that UNION operator performs to sort and to eliminate the duplicate rows.  While UNION ALL doesn’t really bother about sort and duplicates.  And that is why UNION is slow compare to UNION ALL. 

So again going back to question – why not use UNION ALL always?  And one more question to be added - when to use which one?

- You should  use UNION when you don’t want the duplicates in your final result set, and you are not sure (or may be you are sure) that duplicate records exists in the different queries involved in the UNION.

- You should be using UNION ALL when :
1.  You are not bothered about the duplicate rows in the result.
2.  You are sure there are no duplicates in different queries involved in UNION.  e.g. if you are combining results from 2 or more different years(sales orders) with each query reruns result for individual year with some unique id for each row.  Or combining result for 2 or more different departments.

All this long I’m talking about UNION and UNION ALL as if they are 2 different things all together.  Are they?  Not exactly.  Reason I’m saying this because, when one of my friend asked me about UNION ALL and I advised him to look into the Books online, and he came back complaining me that “books online doesn’t say anything about UNION ALL”.  Reason – he was thinking that Books online must be having some separate section dedicated to UNION ALL, as if it is different from UNION.

Actually, the ALL is just an optional argument in the UNION syntax.  For more on UNION you can refer the books online - http://msdn.microsoft.com/en-us/library/ms180026.aspx.  

Mangal

Friday, July 3, 2009

DELETE Vs TRUNCATE

I know this has been done many times, but still here is something from my side. What is the difference between DELETE and TRUNCATE?
Well one reason I wanted to write this post was, on so many of the blogs, forum threads I keep seeing statements like “TRUNCATE cannot be rolled back”. This is also one of the most frequently asked questions in Interviews.  And since my target audience is people who are just started learning SQL (and NOT SQL Experts), I thought I should write this one.
Instead of just focusing on ROLLBCAK I will try to cover all the differences between DELETE and TRUNCATE.
  • Remove Data  : First thing first, both can remove the data from a table. 
    But a DELETE can be used, to remove the rows not only from a Table but also from a VIEW or the result of an OPENROWSET or OPENQUERY subject to provider capabilities.

  • FROM Clause : With DELETE you can also delete rows from one table/view/rowset_function_limited based on rows from another table by using another FROM clause.  In that FROM clause you can also write normal JOIN conditions.  Actually you can create a DELETE statement from a SELECT statement that doesn’t contain any aggregate functions by replacing SELECT with DELETE and removing column names. 
    With TRUNCATE you can’t do that.


  • WHERE : A TRUNCATE cannot have WHERE Conditions, but a DELETE can.  That means with TRUNCATE you can’t delete a specific row or specific group of rows.
    TRUNCATE TABLE is similar to the DELETE statement with no WHERE clause.

  • Performance : TRUNCATE TABLE is faster and uses fewer system and transaction log resources.
    And one of the reason is locks used by either statements. The DELETE statement is executed using a row lock, each row in the table is locked for deletion. TRUNCATE TABLE always locks the table and page but not each row.

  • Transaction log : DELETE statement removes rows one at a time and makes individual entries in the transaction log for each row. 
    TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log.

  • Pages : After a DELETE statement is executed, the table can still contain empty pages.
    TRUNCATE removes the data by deallocating the data pages used to store the table data.

  • Trigger : TRUNCATE does not activate the delete triggers on the table.  So you must be very careful while using TRUNCATE.  One should never use a TRUNCATE if delete Trigger is defined on the table to do some automatic cleanup or logging action when rows are deleted.
  • Identity Column : With TRUNCATE if the table contains an identity column, the counter for that column is reset to the seed value defined for the column.  If no seed was defined, the default value 1 is used.
    DELETE doesn’t reset the identity counter.  So if you want to retain the identity counter, use DELETE instead.

  • Replication : DELETE can be used against table used in transactional replication or merge replication. 
    While TRUNCATE cannot be used against the tables involved in transactional replication or merge replication.

  • Rollback : DELETE statement can be rolled back. 
    TRUNCATE can also be rolled back provided it is enclosed in a TRANSACTION block and session is not closed. 
    Once session is closed you won't be able to Rollback TRUNCATE.
  • Restrictions : The DELETE statement may fail if it violates a trigger or tries to remove a row referenced by data in another table with a FOREIGN KEY constraint. If the DELETE removes multiple rows, and any one of the removed rows violates a trigger or constraint, the statement is canceled, an error is returned, and no rows are removed. 
    And if DELETE is used against View, that View must be an Updatable view.

    TRUNCATE cannot be used against the table used in Indexed view. 

    TRUNCATE cannot be used against the table referenced by a FOREIGN KEY constraint, unless a table that has a foreign key that references itself.

Source :
DELETE -
http://msdn.microsoft.com/en-us/library/ms189835.aspx 
TRUCNATE -
http://msdn.microsoft.com/en-us/library/ms177570.aspx

I hope you liked my this post on the topic: the difference between DELETE and TRUNCATE in SQL Server.