Friday, May 15, 2009

GROUPING Function SQL Server

If you are going to use ROLLUP or CUBE operators in future or used it in past, then this is the post you must read.

Well GROUPING is one of the least used function in T-SQL, even in cases where it should be used. And so many times instead of using GROUPING function we follow some wrong/bad practice by using ISNULL only.  I’ll just give you a small example of it in this post.

First lets understand what the GROUPING function is – From Books on line

GROUPING (Transact-SQL) : Indicates whether a specified column expression in a GROUP BY list is aggregated or not. GROUPING returns 1 for aggregated or 0 for not aggregated in the result set. GROUPING can be used only in the SELECT <select> list, HAVING, and ORDER BY clauses when GROUP BY is specified.

Syntax : GROUPING ( <column_expression> )

Arguments <column_expression> : Is a column or an expression that contains a column in a GROUP BY clause.

To put it in simple words : GROUPING is an aggregate function that causes an additional column to be output with a value of 1 when the row is added by either the CUBE or ROLLUP operator, or 0 when the row is not the result of CUBE or ROLLUP.

Grouping is allowed only in the select list associated with a GROUP BY clause that contains either the CUBE or ROLLUP operator.

So after reading it from the books on line “what the GROUPING function is”, you might like to ask me where it can be useful? And what is wrong practice I was talking about at the start of this post?

Well answer to 1st question is, it is very useful while using ROLLUP or CUBE operator. Especially when you have some NULL values in any of the columns used in GROUP BY clause. Basically it helps us to identify which row is added by CUBE or ROLLUP operator. Need of this function arises because when CUBE or ROLLUP operator adds the subtotals and totals rows to result SQL Server returns NULL in related GROUP columns. And in such cases it is difficult to differentiate between actual NULL values from the column and the NULL returned by CUBE or ROLLUP operator.

Now to understand how it works and the wrong practice I was talking about, lets see following example.

For example I'll take a simplified table of sales representatives and and their Sales in respective Zones.

-- create sample table Sales

CREATE TABLE Sales
(
ID INT,
FName VARCHAR(30),
Zone VARCHAR(30),
Sale INT
)
GO

-- Load sample data

INSERT INTO Sales SELECT
1, 'Mangal', 'East', 20 UNION ALL SELECT
2, 'Mangal', 'East', 150 UNION ALL SELECT
3, 'Mangal', 'West', 50 UNION ALL SELECT
4, 'Ram', 'East', 45 UNION ALL SELECT
5, 'Ram', NULL, 80 UNION ALL SELECT
6, 'Ram', NULL, 40 UNION ALL SELECT
7, 'Sachin', 'West', 50 UNION ALL SELECT
8, 'Sachin', 'West', 40
GO

-- Test sample data

SELECT Id, FName, Zone, Sale
FROM Sales
GO

The sample data :

ID FName Zone Sale
1 Mangal East 20
2 Mangal East 150
3 Mangal West 50
4 Ram East 45
5 Ram NULL 80
6 Ram NULL 40
7 Sachin West 50
8 Sachin West 40

Now here is what we normally do when somebody asks us to summarize the sale column on FName with subtotals for each Zone : We write a SELECT query with GROUP BY FName, Zone WITH ROLLUP. And when we see NULLS coming in Zone/FName column because of ROLLUP we add a ISNULL function on Zone column (in this example we'll add ISNULL only on Zone column, as I have intentionally added some NULLS in Zone column's sample data ). And that is where we go wrong, the Wrong Practice I was talking about.

SELECT Fname, ISNULL(Zone, 'All Zone') as Zone, SUM(Sale) AS Total
FROM Sales
GROUP BY Fname, Zone WITH ROLLUP
ORDER BY
FName

The output :

FName Zone Total
NULL All Zone 475
Mangal East 170
Mangal West 50
Mangal All Zone 220
Ram All Zone 120
Ram East 45
Ram All Zone 165
Sachin West 90
Sachin All Zone 90

As you can see in the result, for Mangal and Sachin everything looks fine, but for Ram there are two “All Zone” in Zone column. Well 1st 'All Zone' is because of actual NULL value in Zone column and 2nd All Zone is sub total returned by ROLLUP operator for all Zones for Ram. But ISNULL is not capable of differentiating between these two NULLS and blindly replaces them with 'All Zone'. And that's when GROUPING function comes in for our help. Now 1st just see what GROUPING returns, execute the following query -

SELECT FName,Zone, SUM(Sale) AS Total, GROUPING(Zone) as 'Grouping'
FROM Sales
GROUP BY FName, Zone WITH ROLLUP
ORDER BY
  FName

And the result :

FName Zone Sale Grouping
NULL NULL 475 1
Mangal East 170 0
Mangal West 50 0
Mangal NULL 220 1
Ram NULL 120 0
Ram East 45 0
Ram NULL 165 1
Sachin West 90 0
Sachin NULL 90 1

In above result wherever you see 1 in Grouping column, that means that row is added by ROLLUP operator.  And wherever you see 0 in Grouping column, that means that row is not returned by ROLLUP.  So now identifying between NULL by ROLLUP and NULL from column becomes easy for us.  Now we can use this GROUPING indicator in our query, to identify the NULL rows added by ROLLUP operator, and NULLS from the column.

So our final query will be :

SELECT Fname,CASE GROUPING(Zone)
                WHEN 1 THEN 'All Zone'
                ELSE ISNULL(Zone, 'Unknown') END as Zone,
        SUM(Sale) AS Total
FROM Sales
GROUP BY Fname, Zone WITH ROLLUP
ORDER BY  FName

And the output:

FName Zone Total
NULL All Zone 475
Mangal East 170
Mangal West 50
Mangal All Zone 220
Ram Unknown 120
Ram East 45
Ram All Zone 165
Sachin West 90
Sachin All Zone 90

So now, all NULLS from column are replaced by 'Unknown' and all subtotals added by ROLLUP comes with 'All Zone' in Zone column.

The GROUPING function becomes even more useful while using CUBE operator.

Mangal Pardeshi

3 comments:

  1. thank u plz post more articles

    ReplyDelete
  2. thank u plz post more article

    ReplyDelete
  3. Great explanation and example! I will be definitely be back for more!!!

    ReplyDelete