If you are going to use ROLLUP or CUBE operators in future or used it in past, then this is the post you must read.
Well GROUPING is one of the least used function in T-SQL, even in cases where it should be used. And so many times instead of using GROUPING function we follow some wrong/bad practice by using ISNULL only. I’ll just give you a small example of it in this post.
First lets understand what the GROUPING function is – From Books on line
GROUPING (Transact-SQL) : Indicates whether a specified column expression in a GROUP BY list is aggregated or not. GROUPING returns 1 for aggregated or 0 for not aggregated in the result set. GROUPING can be used only in the SELECT <select> list, HAVING, and ORDER BY clauses when GROUP BY is specified.
Syntax : GROUPING ( <column_expression> )
Arguments <column_expression> : Is a column or an expression that contains a column in a GROUP BY clause.
To put it in simple words : GROUPING is an aggregate function that causes an additional column to be output with a value of 1 when the row is added by either the CUBE or ROLLUP operator, or 0 when the row is not the result of CUBE or ROLLUP.
Grouping is allowed only in the select list associated with a GROUP BY clause that contains either the CUBE or ROLLUP operator.
So after reading it from the books on line “what the GROUPING function is”, you might like to ask me where it can be useful? And what is wrong practice I was talking about at the start of this post?
Well answer to 1st question is, it is very useful while using ROLLUP or CUBE operator. Especially when you have some NULL values in any of the columns used in GROUP BY clause. Basically it helps us to identify which row is added by CUBE or ROLLUP operator. Need of this function arises because when CUBE or ROLLUP operator adds the subtotals and totals rows to result SQL Server returns NULL in related GROUP columns. And in such cases it is difficult to differentiate between actual NULL values from the column and the NULL returned by CUBE or ROLLUP operator.
Now to understand how it works and the wrong practice I was talking about, lets see following example.
For example I'll take a simplified table of sales representatives and and their Sales in respective Zones.
| |
-- create sample table Sales CREATE TABLE Sales ( ID INT, FName VARCHAR(30), Zone VARCHAR(30), Sale INT ) GO -- Load sample data INSERT INTO Sales SELECT 1, 'Mangal', 'East', 20 UNION ALL SELECT 2, 'Mangal', 'East', 150 UNION ALL SELECT 3, 'Mangal', 'West', 50 UNION ALL SELECT 4, 'Ram', 'East', 45 UNION ALL SELECT 5, 'Ram', NULL, 80 UNION ALL SELECT 6, 'Ram', NULL, 40 UNION ALL SELECT 7, 'Sachin', 'West', 50 UNION ALL SELECT 8, 'Sachin', 'West', 40 GO -- Test sample data SELECT Id, FName, Zone, Sale FROM Sales GO |
The sample data :
ID | FName | Zone | Sale |
1 | Mangal | East | 20 |
2 | Mangal | East | 150 |
3 | Mangal | West | 50 |
4 | Ram | East | 45 |
5 | Ram | NULL | 80 |
6 | Ram | NULL | 40 |
7 | Sachin | West | 50 |
8 | Sachin | West | 40 |
Now here is what we normally do when somebody asks us to summarize the sale column on FName with subtotals for each Zone : We write a SELECT query with GROUP BY FName, Zone WITH ROLLUP. And when we see NULLS coming in Zone/FName column because of ROLLUP we add a ISNULL function on Zone column (in this example we'll add ISNULL only on Zone column, as I have intentionally added some NULLS in Zone column's sample data ). And that is where we go wrong, the Wrong Practice I was talking about.
| |
SELECT Fname, ISNULL(Zone, 'All Zone') as Zone, SUM(Sale) AS Total FROM Sales GROUP BY Fname, Zone WITH ROLLUP ORDER BY FName |
The output :
FName | Zone | Total |
NULL | All Zone | 475 |
Mangal | East | 170 |
Mangal | West | 50 |
Mangal | All Zone | 220 |
Ram | All Zone | 120 |
Ram | East | 45 |
Ram | All Zone | 165 |
Sachin | West | 90 |
Sachin | All Zone | 90 |
As you can see in the result, for Mangal and Sachin everything looks fine, but for Ram there are two “All Zone” in Zone column. Well 1st 'All Zone' is because of actual NULL value in Zone column and 2nd All Zone is sub total returned by ROLLUP operator for all Zones for Ram. But ISNULL is not capable of differentiating between these two NULLS and blindly replaces them with 'All Zone'. And that's when GROUPING function comes in for our help. Now 1st just see what GROUPING returns, execute the following query -
| |
SELECT FName,Zone, SUM(Sale) AS Total, GROUPING(Zone) as 'Grouping' FROM Sales GROUP BY FName, Zone WITH ROLLUP ORDER BY FName |
And the result :
FName | Zone | Sale | Grouping |
NULL | NULL | 475 | 1 |
Mangal | East | 170 | 0 |
Mangal | West | 50 | 0 |
Mangal | NULL | 220 | 1 |
Ram | NULL | 120 | 0 |
Ram | East | 45 | 0 |
Ram | NULL | 165 | 1 |
Sachin | West | 90 | 0 |
Sachin | NULL | 90 | 1 |
In above result wherever you see 1 in Grouping column, that means that row is added by ROLLUP operator. And wherever you see 0 in Grouping column, that means that row is not returned by ROLLUP. So now identifying between NULL by ROLLUP and NULL from column becomes easy for us. Now we can use this GROUPING indicator in our query, to identify the NULL rows added by ROLLUP operator, and NULLS from the column.
So our final query will be :
| |
SELECT Fname,CASE GROUPING(Zone) WHEN 1 THEN 'All Zone' ELSE ISNULL(Zone, 'Unknown') END as Zone, SUM(Sale) AS Total FROM Sales GROUP BY Fname, Zone WITH ROLLUP ORDER BY FName |
And the output:
FName | Zone | Total |
NULL | All Zone | 475 |
Mangal | East | 170 |
Mangal | West | 50 |
Mangal | All Zone | 220 |
Ram | Unknown | 120 |
Ram | East | 45 |
Ram | All Zone | 165 |
Sachin | West | 90 |
Sachin | All Zone | 90 |
So now, all NULLS from column are replaced by 'Unknown' and all subtotals added by ROLLUP comes with 'All Zone' in Zone column.
The GROUPING function becomes even more useful while using CUBE operator.
Mangal Pardeshi