Friday, May 15, 2009

GROUPING Function SQL Server

If you are going to use ROLLUP or CUBE operators in future or used it in past, then this is the post you must read.

Well GROUPING is one of the least used function in T-SQL, even in cases where it should be used. And so many times instead of using GROUPING function we follow some wrong/bad practice by using ISNULL only.  I’ll just give you a small example of it in this post.

First lets understand what the GROUPING function is – From Books on line

GROUPING (Transact-SQL) : Indicates whether a specified column expression in a GROUP BY list is aggregated or not. GROUPING returns 1 for aggregated or 0 for not aggregated in the result set. GROUPING can be used only in the SELECT <select> list, HAVING, and ORDER BY clauses when GROUP BY is specified.

Syntax : GROUPING ( <column_expression> )

Arguments <column_expression> : Is a column or an expression that contains a column in a GROUP BY clause.

To put it in simple words : GROUPING is an aggregate function that causes an additional column to be output with a value of 1 when the row is added by either the CUBE or ROLLUP operator, or 0 when the row is not the result of CUBE or ROLLUP.

Grouping is allowed only in the select list associated with a GROUP BY clause that contains either the CUBE or ROLLUP operator.

So after reading it from the books on line “what the GROUPING function is”, you might like to ask me where it can be useful? And what is wrong practice I was talking about at the start of this post?

Well answer to 1st question is, it is very useful while using ROLLUP or CUBE operator. Especially when you have some NULL values in any of the columns used in GROUP BY clause. Basically it helps us to identify which row is added by CUBE or ROLLUP operator. Need of this function arises because when CUBE or ROLLUP operator adds the subtotals and totals rows to result SQL Server returns NULL in related GROUP columns. And in such cases it is difficult to differentiate between actual NULL values from the column and the NULL returned by CUBE or ROLLUP operator.

Now to understand how it works and the wrong practice I was talking about, lets see following example.

For example I'll take a simplified table of sales representatives and and their Sales in respective Zones.

-- create sample table Sales

CREATE TABLE Sales
(
ID INT,
FName VARCHAR(30),
Zone VARCHAR(30),
Sale INT
)
GO

-- Load sample data

INSERT INTO Sales SELECT
1, 'Mangal', 'East', 20 UNION ALL SELECT
2, 'Mangal', 'East', 150 UNION ALL SELECT
3, 'Mangal', 'West', 50 UNION ALL SELECT
4, 'Ram', 'East', 45 UNION ALL SELECT
5, 'Ram', NULL, 80 UNION ALL SELECT
6, 'Ram', NULL, 40 UNION ALL SELECT
7, 'Sachin', 'West', 50 UNION ALL SELECT
8, 'Sachin', 'West', 40
GO

-- Test sample data

SELECT Id, FName, Zone, Sale
FROM Sales
GO

The sample data :

ID FName Zone Sale
1 Mangal East 20
2 Mangal East 150
3 Mangal West 50
4 Ram East 45
5 Ram NULL 80
6 Ram NULL 40
7 Sachin West 50
8 Sachin West 40

Now here is what we normally do when somebody asks us to summarize the sale column on FName with subtotals for each Zone : We write a SELECT query with GROUP BY FName, Zone WITH ROLLUP. And when we see NULLS coming in Zone/FName column because of ROLLUP we add a ISNULL function on Zone column (in this example we'll add ISNULL only on Zone column, as I have intentionally added some NULLS in Zone column's sample data ). And that is where we go wrong, the Wrong Practice I was talking about.

SELECT Fname, ISNULL(Zone, 'All Zone') as Zone, SUM(Sale) AS Total
FROM Sales
GROUP BY Fname, Zone WITH ROLLUP
ORDER BY
FName

The output :

FName Zone Total
NULL All Zone 475
Mangal East 170
Mangal West 50
Mangal All Zone 220
Ram All Zone 120
Ram East 45
Ram All Zone 165
Sachin West 90
Sachin All Zone 90

As you can see in the result, for Mangal and Sachin everything looks fine, but for Ram there are two “All Zone” in Zone column. Well 1st 'All Zone' is because of actual NULL value in Zone column and 2nd All Zone is sub total returned by ROLLUP operator for all Zones for Ram. But ISNULL is not capable of differentiating between these two NULLS and blindly replaces them with 'All Zone'. And that's when GROUPING function comes in for our help. Now 1st just see what GROUPING returns, execute the following query -

SELECT FName,Zone, SUM(Sale) AS Total, GROUPING(Zone) as 'Grouping'
FROM Sales
GROUP BY FName, Zone WITH ROLLUP
ORDER BY
  FName

And the result :

FName Zone Sale Grouping
NULL NULL 475 1
Mangal East 170 0
Mangal West 50 0
Mangal NULL 220 1
Ram NULL 120 0
Ram East 45 0
Ram NULL 165 1
Sachin West 90 0
Sachin NULL 90 1

In above result wherever you see 1 in Grouping column, that means that row is added by ROLLUP operator.  And wherever you see 0 in Grouping column, that means that row is not returned by ROLLUP.  So now identifying between NULL by ROLLUP and NULL from column becomes easy for us.  Now we can use this GROUPING indicator in our query, to identify the NULL rows added by ROLLUP operator, and NULLS from the column.

So our final query will be :

SELECT Fname,CASE GROUPING(Zone)
                WHEN 1 THEN 'All Zone'
                ELSE ISNULL(Zone, 'Unknown') END as Zone,
        SUM(Sale) AS Total
FROM Sales
GROUP BY Fname, Zone WITH ROLLUP
ORDER BY  FName

And the output:

FName Zone Total
NULL All Zone 475
Mangal East 170
Mangal West 50
Mangal All Zone 220
Ram Unknown 120
Ram East 45
Ram All Zone 165
Sachin West 90
Sachin All Zone 90

So now, all NULLS from column are replaced by 'Unknown' and all subtotals added by ROLLUP comes with 'All Zone' in Zone column.

The GROUPING function becomes even more useful while using CUBE operator.

Mangal Pardeshi

Wednesday, April 29, 2009

Query to fetch the Yesterday's Data

Seems to be easy question. And many of you may be already knew 2-3 ways of doing it. Here is the query I use normally to get the all the data from the previous day(Yesterday).

SELECT Id, OrderId, CustomerId
FROM Orders
WHERE Date >= DateAdd(DD, DateDiff(DD, 0, GETDATE())-1, 0)
AND Date < DateAdd(DD, DateDiff(DD, 0, GETDATE()), 0)

Make it more readable and scalable using variable

DECLARE @YesterDay DATETIME, @Today DATETIME
SET @YesterDay = DateAdd(DD, DateDiff(DD, 0, GETDATE())-1, 0)
SET @Today = DateAdd(DD, DateDiff(DD, 0, GETDATE()), 0)

SELECT
Id, OrderId, CustomerId
FROM Orders
WHERE  Date >= @YesterDay
AND Date < @Today

Logic : Basically above WHERE conditions is like - Greater than Equal to Yesterday and Less than Today.

This DateAdd(DD, DateDiff(DD, 0, GETDATE())-1, 0) gives Yesterday's date with time as 00:00:00.

And this DateAdd(DD, DateDiff(DD, 0, GETDATE()), 0)  gives today's date with time as 00:00:00.

You can do a quick check by executing -

SELECT DateAdd(DD, DateDiff(DD, 0, GETDATE())-1, 0) AS YDay
,DateAdd(DD, DateDiff(DD, 0, GETDATE()), 0) as ToDay

Reason for doing this way instead of using some obvious ones – As you may heard this from many SQL experts, “Whenever possible never use CONVERT/CAST or any functions in WHERE clause on table's column.”

Question is Why?

Well whenever you use the any function on a column , SQL Server no longer uses the Index define on particular column. And use of a CAST or CONVERT functions slows down the query execution considerably.

You may say, I used the DateAdd and DateDiff function in the WHERE clause. Yes, I used them, but NOT on Date column from the table. I used them with the Getdate function which is more of a constant and NOT part of the table's column.

Query Optimization Tip :

Whenever possible never use CONVERT/CAST or any functions in WHERE clause on table's column.

In future I'll have some more posts on Date comparison in WHERE condition. Also some more simple tricks to optimized the everyday's queries.

 

Mangal Pardeshi
SQL MVP

Wednesday, April 1, 2009

UNPIVOT Multiple Columns

Unlike PIVOT which de-normalizes the data, UNPIVOT is more helpful in getting your data back to a more normalized form.

UNPIVOT performs the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.

For a single column UNPIVOT examples and for more on PIVOT and UNPIVOT you can read from Microsoft Books Online - Using PIVOT and UNPIVOT

In this article I’ll try to show you how to UNPIVOT multiple columns in a single query. There is simple trick to do that, if you are familiar with the UNPIVOT operator you will find it easy to understand.

First lets create some sample data :

-- Create sample table
CREATE TABLE Suppliers
(Id INT,
Product VARCHAR(500),
Supplier1 VARCHAR(500),
Supplier2 VARCHAR(500),
Supplier3 VARCHAR(500),
City1 VARCHAR(500),
City2 VARCHAR(500),
City3 VARCHAR(500)
)
GO

-- Load Sample data
INSERT INTO Suppliers SELECT
1, 'Car', 'Tata Motors', 'Bajaj', 'Mahindra', 'Jamshedpur','Pune','Mumbai'
UNION ALL SELECT
2, 'Bike', 'Bajaj', 'Hero Honda', 'Suzuki', 'Pune', 'New Delhi', 'Chandigarh'
UNION ALL SELECT
3, 'Cycle', 'Hercules', 'Hero', 'Atlas', 'Mumbai', 'Banglaore', 'Pune'
GO

-- Test sample data
SELECT Id, Product, Supplier1, Supplier2, Supplier3, City1, City2, City3
FROM Suppliers
GO

Here is how our sample table looks like :

Id Product Supplier1 Supplier2 Supplier3 City1 City2 City3
1 Car TataMotors Bajaj Mahindra Jamshedpur Pune Mumbai
2 Bike Bajaj Hero Honda Suzuki Pune New Delhi Chandigarh
3 Cycle Hercules Hero Atlas Mumbai Bangalore Pune

And as you can see here we have 2 columns which needs to be UNPIVOT, first is Supplier and second is there respective Cities.  If it was single column to be UNPIVOTed it could have been a straight forward task.  Thing is you can write a query with 2 UNPIVOT operators, but question is how to JOIN two UNPIVOT sub queries correctly so we will get the each supplier with its respective city next to it in second column?

As I said earlier there is simple trick to do that.  See the following query and its WHERE condition :

SELECT Id,
    Product,
    ROW_NUMBER()OVER(Partition By Id Order By Suppliers) as SuppId,
    SupplierName,
    CityName
FROM
(
SELECT Id, Product, Supplier1, Supplier2, Supplier3, City1, City2, City3
FROM Suppliers
) Main


UNPIVOT
(
SupplierName FOR Suppliers IN (Supplier1, Supplier2, Supplier3)
) Sup


UNPIVOT
(
CityName For Cities IN (City1, City2, City3)
) Ct


WHERE RIGHT(Suppliers,1) =  RIGHT(Cities,1)


And here is the Output of the query :
Id Product SuppId SupplierName CityName
1 Car 1 Tata Motors Jamshedpur
1 Car 2 Bajaj Pune
1 Car 3 Mahindra Mumbai
2 Bike 1 Bajaj Pune
2 Bike 2 Hero Honda New Delhi
2 Bike 3 Suzuki Chandigarh
3 Cycle 1 Hercules Mumbai
3 Cycle 2 Hero Bangalore
3 Cycle 3 Atlas Pune

So the trick was this where condition -

WHERE RIGHT(Suppliers,1) =  RIGHT(Cities,1)

There are other ways also to UNPIVOT the multiple columns, may be some other time I’ll post them too. 

Source of inspiration for this post is following thread on MSDN forums - http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/54626179-ee9f-4db7-a31e-c02104bdd6beAt 

At the end of it, OP also come up with his own solution he got from SQLTeam’s forums.

Mangal Pardeshi

Saturday, March 21, 2009

RunningValue Function (Reporting Services) – Calculating Running Totals

In one of my previous posts I talked about Calculating Running Totals in T-SQL.  Now I will show you how you can calculate the running totals in Sql Server Reporting Services(SSRS).  Yes, it is very much possible with RunningValue function in SSRS and also very easy compare to doing it in SQL.  

The syntax for RunningValue function goes like this -

RunningValue(expression, function, scope)
  • expression : The expression on which to perform the aggregation, for example, [Quantity].
  • function : The name of the aggregate function to apply to the expression, for example- Sum.
  • scope :The name of a dataset, data region, group or Nothing.

See the table below and RunningTotal column which is Cumulative SUM of Quantity column. 

ShipId OrderId ShipDate Quantity RunningTotal
1 1 2009-02-01 12:06:16.820 10 10
2 1 2009-02-02 16:16:16.820 15 25
3 2 2009-02-01 13:26:16.820 20 45
4 2 2009-02-03 17:40:16.820 12 57
5 2 2009-02-04 13:05:16.820 15 72
6 3 2009-02-03 14:25:16.820 25 97
7 3 2009-02-04 15:50:16.820 50 147
8 3 2009-02-05 19:40:16.820 30 177
9 4 2009-02-06 15:30:16.820 20 197
10 4 2009-02-07 16:20:16.820 10 207
11 5 2009-02-05 17:05:16.820 35 242

For achieving that in SSRS, add extra column in your Report’s Layout Page and put following expression in it-

=RunningValue(Fields!Quantity.Value, Sum, Nothing)

Yes, it is easy and simple, isn’t it?  But what about when you want the RunningTotals for group of records?  Like for above example you want the Running Totals for each orderId separately.  See the table below :

ShipId OrderId ShipDate Quantity RunningTotal
1 1 2009-02-01 12:06:16.820 10 10
2 1 2009-02-02 16:16:16.820 15 25
3 2 2009-02-01 13:26:16.820 20 20
4 2 2009-02-03 17:40:16.820 12 32
5 2 2009-02-04 13:05:16.820 15 47
6 3 2009-02-03 14:25:16.820 25 25
7 3 2009-02-04 15:50:16.820 50 75
8 3 2009-02-05 19:40:16.820 30 105
9 4 2009-02-06 15:30:16.820 20 20
10 4 2009-02-07 16:20:16.820 10 30
11 5 2009-02-05 17:05:16.820 35 35

Now for achieving that you need do followings :

  • First go to Report Layout page and add a new Group.   Give some name to the group.  If you don’t gave any name the default name would be something like “table1_Group1”.  Just note that.  And add OrderId to “Group On” expression.
  • Now change the expression at RunningTotal column as
    =RunningValue(Fields!Quantity.Value,SUM,"table1_Group1")

        Now if you have noticed I have replaced the scope Nothing to Group name “table1_Group1” .

Well after adding the extra group in the report you might see blank spaces after every  OrderId change. If you don’t want those blanks and want continuous report, then go to the property page of the new group added and set Visible to “false”.

Mangal Pardeshi

SQL MVP

Monday, March 16, 2009

Script All the Stored Procedures in The Database

In Sql Server 2005 and 2008 you can script the stored procedure in Management Studio by right clicking on Store Procedure name and clicking on “Script Store Procedure as” and then “Create To”.

But if you want to script all the Stored Procedures in the database programmatically, then here is the simple T-SQL query for it -

To script All the Stored Procedures in the Database :

SELECT    O.Name as ProcName
        ,M.Definition as CreateScript
        ,O.Create_Date
        ,O.Modify_Date
FROM sys.sql_modules as M INNER JOIN sys.objects as O
ON M.object_id = O.object_id
WHERE O.type = 'P'

If the Stored Procedure is created with ENCRYPTION option then you will get the NULL in the definition column.

Similarly,

To script All the Views in the Database :

SELECT    O.Name as ProcName
        ,M.Definition as CreateScript
        ,O.Create_Date
        ,O.Modify_Date
FROM sys.sql_modules as M INNER JOIN sys.objects as O
ON M.object_id = O.object_id
WHERE O.type = 'V'

To script All the Functions in the Database :

SELECT    O.Name as ProcName
        ,M.Definition as CreateScript
        ,O.Create_Date
        ,O.Modify_Date
FROM sys.sql_modules as M INNER JOIN sys.objects as O
ON M.object_id = O.object_id
WHERE O.type = 'FN'

For scripting all Triggers small modification is required, instead of sys.objects I joined the sys.triggers with sys.sql_modules.

To script All the Triggers in the Database :

SELECT    O.Name as ProcName
        ,M.Definition as CreateScript
        ,O.Create_Date
        ,O.Modify_Date
FROM sys.sql_modules as M INNER JOIN sys.triggers as O
ON M.object_id = O.object_id

Mangal Pardeshi
SQL MVP