Sunday, November 14, 2010

ORDER BY Enhancements in SQL Server

Today I am going to talk about one new feature that is been introduced in the SQL Server Denali.   And that is:  one interesting enhancement in ORDER BY clause.  Now with ORDER BY clause you can also specify the OFFSET and FETCH options. 

From Books Online of Denali:
OFFSET: Specifies the number of rows to skip before it starts to return rows from the query expression. The value can be an integer constant or expression that is greater than or equal to zero.

FETCH: Specifies the number of rows to return after the OFFSET clause has been processed. The value can be an integer constant or expression that is greater than or equal to one.

This is how the new syntax of ORDER BY clause looks like in SQL Server Denali:
------------------------------------------------------------------------------------------------------
ORDER BY order_by_expression
    [ COLLATE collation_name ]
    [ ASC | DESC ]
    [ ,...n ]
[ <offset_fetch> ]
<offset_fetch> ::=
{
    OFFSET { integer_constant | offset_row_count_expression } { ROW | ROWS }
    [
      FETCH { FIRST | NEXT } {integer_constant | fetch_row_count_expression } { ROW | ROWS } ONLY
    ]
}

------------------------------------------------------------------------------------------------------

Take a look at following example:

------------------------------------------------------------------------------------------------------
USE AdventureWorks
GO
SELECT
VendorID
    ,Name
FROM Purchasing.Vendor
ORDER BY VendorID OFFSET 5 ROWS
GO

------------------------------------------------------------------------------------------------------

Output:

Offset

In the example I wrote 5 as OFFSET.  You can see in the output 1st five rows are skipped and we got VendorIds starting from 6.  Here SQL Server 1st orders the data on the Column specified in ORDER BY clause(i.e. VendorID).  The next query uses the clause OFFSET 5 ROWS to skip the first 5 rows and return all remaining rows.

Now lets see the example of FETCH:

------------------------------------------------------------------------------------------------------
USE AdventureWorks
GO
SELECT
VendorID
    ,Name
FROM Purchasing.Vendor
ORDER BY VendorID OFFSET 5 ROWS
    FETCH NEXT
3 ROWS ONLY
GO

------------------------------------------------------------------------------------------------------

Output:

FETCH

As you can SQL Server has fetched only 3 rows and that also after skipping 1st five rows.  That is because I specified 5 as OFFSET and 3 as FETCH NEXT.  Here FETCH NEXT 3 ROWS ONLY to limit the rows returned to 3 rows from the sorted result set.

Also:
1. offset_row_count_expression can be a variable, parameter, or constant scalar subquery. When a subquery is used, it cannot reference any columns defined in the outer query scope. That is, it cannot be correlated with the outer query.

2. ROW and ROWS are synonyms and are provided for ANSI compatibility.

3. In query execution plans, the offset row count value is displayed in the Offset attribute of the TOP query operator.

You can find all this information and more on: ORDER BY Clause (Transact-SQL)

Thanks
Mangal

Wednesday, September 1, 2010

Some Things About VIEWS

Everybody has different views about VIEWS, that’s what make them interesting topic to discuss.  Other thing is if you make some assumptions about VIEWS, they can lead you to problems.  As my target audience is SQL beginners, today I’ll talk about few things about VIEWS so some of the obvious mistakes can be avoided.   Actually there are so many things we can talk about VIEWS, but I leave them for future posts.

Today I’ll show what happens when you create a view using “SELECT * FROM TableName” and then ALTER the underlying table used in the VIEW.   The normal assumption most of the SQL beginners make is:  If you create a VIEW using “SELECT * FROM TableName”, all the changes made in the underlying table will automatically reflect in the VIEW as well.

Case 1: You add a column to the table. 
Now here is a question for you:  If you have a table named Employees and a VIEW created  top of it with simple query “SELECT * FROM Employees”.  Now if I add one column to Employees table, will that column appear in result if I execute query “SELECT * FROM View”?

I had ask this question many times in interviews; 80% of time I heard a thumping “Yes”.  And most of the candidate were having experience of well over 3 years.  Answer to above question is BIG NO.  Let me explain this with actual example, 1st lets create some sample data.

------------------------------------------------------------------------------------------------------

-- Create the table with 4 columns
CREATE TABLE
Employees
(
EmpID INT,
FirstName VARCHAR(6),
LastName VARCHAR(20),
ManagerID INT

)
GO

-- Populate the table with sample data
INSERT INTO Employees SELECT
1, 'Mangal', 'Pardeshi', 0 UNION ALL SELECT
2, 'Rahul', 'Sharma', 1 UNION ALL SELECT
3, 'Ajay', 'Varma',2
GO

-- Create a View with wild card *
CREATE VIEW Emp AS
SELECT *
FROM Employees
GO

-- Verify the data in the table and view
SELECT
*
FROM Employees
GO
SELECT *
FROM Emp
GO

------------------------------------------------------------------------------------------------------

Data in the table and view:
test data

As you can see the data  in the Table and View is identical, so are the number of columns.  Now lets add one column HireDate in the table and see what happens.

------------------------------------------------------------------------------------------------------
-- Add HireDate Column in Employees Table
ALTER TABLE Employees ADD HireDate DATETIME
GO

-- Update the HireDate column with some date
UPDATE Employees
SET HireDate = '20100827'
GO

-- Again Verify the data in table and View
SELECT *
FROM Employees
GO
SELECT *
FROM Emp
GO
------------------------------------------------------------------------------------------------------

add column

As you see HireDate column didn’t appear in VIEW.  So question arises: Why a VIEW doesn't get refresh when I add a column to the table? 
Short answer is when you create the VIEW, the column information/definition of VIEW(metadata of VIEW) gets stored in system tables at the time of creation of VIEW.  And that metadata doesn’t get refresh when you alter the underlying table.  You have to explicitly refresh the metadata of VIEW. 
So next question is: How to refresh the VIEW once you modified the underlying table?
There are 2 ways to refresh the VIEW:
1.  Using the system stored procedure sp_refreshview
From Books Online - sp_refreshview: Updates the metadata for the specified non-schema-bound view. Persistent metadata for a view can become outdated because of changes to the underlying objects upon which the view depends.
Syntax: EXECUTE sp_refreshview ‘viewname’

2. Or by executing ALTER VIEW statement.  When you ALTER the VIEW, SQL Server will pick the latest column definition from underlying table and will update the VIEW metadata.

So in our case we can refresh the VIEW Emp by:

------------------------------------------------------------------------------------------------------
-- 1. using system stored procedure sp_refreshview
EXECUTE sp_refreshview 'Emp'
GO
-- 2. Or by executing the ALTER statement
ALTER VIEW Emp
AS
SELECT
*
FROM Employees
GO

-- Verify the data in View
SELECT *
FROM Emp
GO
------------------------------------------------------------------------------------------------------

Note you don’t need to execute both the queries, either of the above can do the trick for you.  Now you can see the HireDate column in Emp view as well:
view

Case 2: You drop a column from the table.
Similarly when you drop a column from the table, the VIEW definition doesn’t get updated even though you have used wild card “*” in VIEW definition.  Now lets drop the column HireDate from the table and see what happens:
Note: In previous step I refreshed the View after adding the HireDate, so now HireDate is part of Emp view as well.

------------------------------------------------------------------------------------------------------
-- First Verify the data before dropping column
SELECT *
FROM Employees
GO
SELECT *
FROM Emp

-- Drop HireDate column from Table
ALTER TABLE Employees DROP COLUMN Hiredate
GO

-- Verify the data after dropping the column.
SELECT *
FROM Employees
GO
SELECT *
FROM Emp
GO

------------------------------------------------------------------------------------------------------

Now in this case HireDate column will get removed from table but the metadata of the VIEW still have its information stored so you will get the following error on selecting data from VIEW.
Msg 4502, Level 16, State 1, Line 1
View or function 'Emp' has more column names specified than columns defined.

The solution is, again refresh the VIEW definition by executing:

------------------------------------------------------------------------------------------------------
EXECUTE sp_refreshview 'Emp'
GO

------------------------------------------------------------------------------------------------------

Now you won’t get the same error.

Case 3: You drop one or more columns and add equal or more number columns to the table.
This case is even more dangerous as User selecting data from the VIEW can get wrong data under wrong columns and can create confusion.  I won’t explain this in detail, just execute following queries and you will realize what I’m saying.

------------------------------------------------------------------------------------------------------
-- Drop column LastName from table
ALTER TABLE Employees DROP COLUMN LastName
GO
-- Add Column DeptName with default value 'IT'
ALTER TABLE Employees ADD DeptName VARCHAR(10) NOT NULL DEFAULT 'IT'
GO

-- Verify the data
SELECT *
FROM Employees
GO
SELECT *
FROM Emp
GO
------------------------------------------------------------------------------------------------------

Final

Now if you see, the LastName column is not present in the table and you can also see the DeptName column in the table.  And interesting observation with VIEW is, though the data is exactly matching with the table but columns names are not correct.  We again need to refresh the VIEW to correct it.

------------------------------------------------------------------------------------------------------
EXECUTE sp_refreshview 'Emp'
GO

-- Verify the data
SELECT *
FROM Emp
GO
------------------------------------------------------------------------------------------------------

Back to normal now:
Last

What is the solution? 
Obvious prevention is don’t use wild card “*” while creating VIEWS.  But even listing out columns is just a prevention or I’d say it is a good practice.   Because even after listing out the columns, if you drop a column from the table that has been used in any VIEW you will still face problems.

The solution is creating the view using “WITH SCHEMABINDING” option.
From Books Online:  Binds the view to the schema of the underlying table or tables. When SCHEMABINDING is specified, the base table or tables cannot be modified in a way that would affect the view definition. The view definition itself must first be modified or dropped to remove dependencies on the table that is to be modified. When you use SCHEMABINDING, the select_statement must include the two-part names (schema.object) of tables, views, or user-defined functions that are referenced. All referenced objects must be in the same database.

Views or tables that participate in a view created with the SCHEMABINDING clause cannot be dropped unless that view is dropped or changed so that it no longer has schema binding. Otherwise, the Database Engine raises an error. Also, executing ALTER TABLE statements on tables that participate in views that have schema binding fail when these statements affect the view definition.

Let me show you one example with SCHEMABINDING:

------------------------------------------------------------------------------------------------------
-- Create View with SCHEMABINDING
CREATE VIEW NewEmp
WITH SCHEMABINDING
AS
SELECT
EmpID,
    FirstName
FROM dbo.Employees
GO

-- Verify the data
SELECT *
FROM NewEmp
GO

-- Now try to alter the table
ALTER TABLE Employees DROP COLUMN EmpID
GO
------------------------------------------------------------------------------------------------------

Now you will get the error:
Msg 5074, Level 16, State 1, Line 2
The object 'NewEmp' is dependent on column 'EmpID'.
Msg 4922, Level 16, State 9, Line 2
ALTER TABLE DROP COLUMN EmpID failed because one or more objects access this column.

Basically  WITH SCHEMABINDING has prevented the change that would affect the view definition.

Additional Information: 
A] If you want see the all the dependant objects on particular table you can use following script:

------------------------------------------------------------------------------------------------------
SELECT *

FROM sys.objects AS so
INNER JOIN sys.sql_expression_dependencies AS sed
    ON so.object_id = sed.referencing_id
WHERE so.type = 'V' AND sed.referenced_id = OBJECT_ID('dbo.employees')

GO
------------------------------------------------------------------------------------------------------

B] The following example creates a script that refreshes the metadata for all views that have a dependency on table dbo.employees:

------------------------------------------------------------------------------------------------------
SELECT DISTINCT 'EXEC sp_refreshview ''' + name + ''''
FROM sys.objects AS so
INNER JOIN sys.sql_expression_dependencies AS sed
    ON so.object_id = sed.referencing_id
WHERE so.type = 'V' AND sed.referenced_id = OBJECT_ID('dbo.employees')
GO

------------------------------------------------------------------------------------------------------

Source:
CREATE VIEW (Transact-SQL)
sp_refreshview (Transact-SQL)

Namaste!
- Mangal

Tuesday, August 24, 2010

Convert Seconds to HH:MM:SS

Today 2 of my colleagues from reporting team had this requirement. They had one table where Seconds were stored as INT and in report they wanted to convert the seconds to HH:MM:SS format.  They already had 1 solution ready with them.   It was something like:

---------------------------------------------------------------------------------------------------------------

DECLARE @Seconds INT
SET @Seconds = 3800

SELECT CONVERT(VARCHAR(10),@Seconds/3600) 
    +':'
    + RIGHT('00'+CONVERT(VARCHAR(2),(@Seconds%3600)/60),2)
    +':'
    + RIGHT('00'+CONVERT(VARCHAR(2),@Seconds%60),2) AS [HH:MM:SS]
  GO

----------------------------------------------------------------------------------------------------------------- The output--
01:03:20

It was working fine, but my colleagues were looking for something different, something elegant.  And I jumped on to help them.  Now a days I hardly get any chance to write SQL, so i don’t let such opportunities go.  I had one solution in mind using CONVERT and Style 108:

--------------------------------------------------------------------------------------------------------------- DECLARE @Seconds INT
SET @Seconds = 3800
SELECT CONVERT(VARCHAR(8), DATEADD(SS,@Seconds,0),108) AS [HH:MM:SS]
GO

---------------------------------------------------------------------------------------------------------------

But problem with the above query is, it fails when number of seconds are more than 86399(there are total 86400 seconds in a day).  So if number of seconds are 86400; above query will show 00 hours instead of showing 24 hours.  See following example:

--------------------------------------------------------------------------------------------------------------- DECLARE @Seconds INT
SET @Seconds = 86500
SELECT CONVERT(VARCHAR(8), DATEADD(SS,@Seconds,0),108)  AS [HH:MM:SS]
GO

---------------------------------------------------------------------------------------------------------------

-- The output--
00:01:40

Here the expected output was 24:01:40.  So I modified the above query a little and I came up with this:

---------------------------------------------------------------------------------------------------------------DECLARE @Seconds INT,@Hour VARCHAR(10)
SET @Seconds = 86500
SET @Hour = DATEDIFF(HH, 0, DATEADD(SS,@Seconds,0))
SELECT STUFF(CONVERT(VARCHAR(8), DATEADD(SS,@Seconds,0),108),1,2,@Hour)  AS [HH:MM:SS]
GO

---------------------------------------------------------------------------------------------------------------

-- The output--
24:01:40

Just in case you want to run above query on table:

--------------------------------------------------------------------------------------------------------------- SELECT STUFF(CONVERT(VARCHAR(8), DATEADD(SS, Seconds,0),108),1,2, DATEDIFF(HH, 0, DATEADD(SS, Seconds,0)))  AS [HH:MM:SS]
FROM TableName
GO

---------------------------------------------------------------------------------------------------------------

I don’t know which approach is better, rather you tell me which one you liked.  Looking at them I think both the queries will give almost identical performance, just that 1st query looks like a Mathematical solution while my approach looks like a SQL solution.  If you know any other approach please feel free to share.

Namaste!
- Mangal

Friday, August 20, 2010

SQL Myth: Primary Key and Clustered Index

After remaining quiet for almost a year I’m back with what I enjoy the most, talking about SQL and sharing whatever little knowledge I have.   Many SQL Developers have this misconception:  “Primary key => Clustered Index: Only a Clustered Index can exist on a Primary key column”.     On numerous occasions I had tough times explaining that this is not the case every time, you can create a Non-Clustered Index on a primary key column.  But if this hot discussion is going on across a coffee table and I’m away from Computer I get helpless.  So finally I decided to write about this.

You can create a Non-Clustered Index on primary key column.   Or if I try to put this in Myth Buster words “A primary key column can exist/survive without a Clustered Index”  Yes it is a fact that PRIMARY KEY constraints default to CLUSTERED Index.   But it doesn’t mean that you CAN’T create a non-clustered index on a Primary Key Column.  And also you can create a Clustered Index on a non-primary key column.  Let me show you this with some simple examples. 

Case 1:  1st lets see what happens when you specify only PRIMARY KEY and nothing else.  In this case YES, by default Clustered Index will be created on Primary Key Column.

USE tempdb
GO

CREATE TABLE MyTable1
(
Id INT PRIMARY KEY,
Dates DATETIME
)
GO

You can see as expected a clustered Index got created on ID column.1

Case 2:  But by just adding a NONCLUSTERED word in front of primary key you can tell SQL Server to create a Non Clustered Index instead of a default Clustered one.

Here you go:

USE tempdb
GO

CREATE TABLE MyTable2
(
Id INT PRIMARY KEY NONCLUSTERED,
Dates DATETIME
)
GO

As you can clearly see in the image a non-clustered index got created on ID column which is also a primary key.

2

Case 3:  Now here is the small trick, you can force SQL Server to create a non-clustered index on a primary key column even without writing NONCLUSTERED in front of it.  Yes, there is exception to the rule   “PRIMARY KEY constraints default to CLUSTERED Index” even if you don’t specify NONCLUSTERED.   Question is, How?  Well by simply creating a Clustered Index on another column while creating the table.

USE tempdb
GO

CREATE TABLE MyTable3
(
Id INT PRIMARY KEY,
Dates DATETIME UNIQUE CLUSTERED
)
GO

See the Image, a Non Clustered got created on ID column and a Clustered on Dates column. 3

The obvious question will be, why SQL Server didn’t create the Clustered on Id column this time?  Answer is very simple, if you know the basic rule “You can have only one Clustered Index on a table”.  And since in the CREATE statement you forced SQL Server to create a clustered index on dates columns SQL Server had no choice but to create a non clustered on Id column.

What we learned today?
Honestly speaking, I didn’t tell anything new.  Experts/people with good knowledge about SQL Server already knew this.  But interesting thing we can learn here – yes there are some DEFAULTs set by SQL Server, but that doesn’t stop you from telling SQL Server “Boss enough of your DEFAULTs, now let me take the control”.  Actually in early days of learning SQL we all get into  this habit of relying on DEFAULTs set by SQL Server.  And we get so used to them that we start considering them as RULES that can’t be broken.

Actually I think there is no harm in taking little bit extra effort and writing some extra keywords and telling SQL Server this is what I want or this is what is expected.

You also learned how to create a Clustered Index on a column of your choice.  This can be very useful when you don’t want a Clustered Index on a primary key especially in cases like where you are using GUID as a primary key(I hate them) and you want Clustered Index to be created on some other column.

Namaste!
- Mangal.