Thursday, June 4, 2009

Multiplying Column Values

Well we all know how to add the values in a column using SUM and GROUP BY.  But what about multiplying the column values?  And also how to calculate running multiplication on a column similar to Running Totals?

Sometime back I came across this interesting question on MSDN SQL Server forums - How to multiply all values in a table column?

I don't know the reason of such requirement, but very interesting problem though.  Of course there is no such built-in function in SQL Server to do it for you.  But after thinking on it for few minutes, I realize that if you still remember the basics of Mathematics, it is not that difficult to do in T-SQL.

If you remember
A * B = ANTILOG ( LOG (A) + LOG(B) )

So looking at it again, it is very straight forward using LOG, ANTILOG and SUM in SQL Server. (For doing ANTILOG you have EXP function in SQL Server.) Let me show you how :

Lets create some sample data first.

-- Create sample table
CREATE TABLE Scores
(
ID INT IDENTITY,
Category VARCHAR(1),
)
GO
-- Load sample data into table
INSERT INTO Scores(Category, Score)
SELECT 'A', 4 UNION ALL
SELECT
'A', 5 UNION ALL
SELECT 'A', 2 UNION ALL
SELECT
'B', 5 UNION ALL
SELECT
'B', 5 UNION ALL
SELECT
'B', 2 UNION ALL
SELECT
'B', 3
GO
-- test the sample data
SELECT Id, Category, Score
FROM Scores
GO

The sample data :
ID CATEGORY SCORE
1 A 4
2 A 5
3 A 2
4 B 5
5 B 5
6 B 2
7 B 3

Multiplying Column Values:  And here is the query to calculate the Total Product of Score Column for each Category.

SELECT Category, SUM(Score) as TotalSum,EXP(SUM(LOG(Score))) as TotalProduct
FROM Scores
GROUP BY Category

Result :

Category

TotalSum

TotalProduct

A

11

40

B

15

150

Second column you have normal total SUM for the Score column and in 3rd column you have the total product of the Score column for each Category.

Calculating The Running Multiplication:  After multiplying the column values, next question is how to calculate the Running Product similar to Running totals? See the following table -

ID

Category

Score

RunningProduct

1

A

4

4

2

A

5

20

3

A

2

40

4

B

5

5

5

B

5

25

6

B

2

50

7

B

3

150

The logic is similar to what I just mentioned for Multiplying Rows and adding the Running Total logic to query.

SELECT O.Id, O.Category, O.Score
        ,(  SELECT EXP ( SUM ( LOG (I.Score) ) )
             FROM Scores I
             WHERE I.Category = O.Category
             AND I.Id <= O.Id
         ) as RunningProduct
FROM Scores O

But remember this solution only works for positive numbers, if column contains negative values then you will need to modify the query using ABS function and taking into account number of negative values.  Well but at least you will get a start from this post.

Friday, May 22, 2009

Expiration Date of SQL Server Evaluation Edition

If you have installed the Evaluation / Trial Version of SQL Server, you may like to find out what is the expiry date of the Trial Version.  The Trial Edition or Evaluation Edition of SQL Server will expire exactly 180 days after the install date.  Using it after the 180 days period is violation of MICROSOFT evaluation license terms.  Anyways after 180 days when an install of Evaluation Edition Expires, the SQL Server services stop functioning.  So you must purchase the license for SQL Server and upgrade the Evaluation copy before the expiry date of the Evaluation.

But then how to find out the exact expiry date of Trial Version?  There are 2 ways to find out

  • From installation logs
  • From Management Studio
  • From installation logs
    A summary.txt file is get created during the installation of SQL Server.  So this summary.txt can give you the exact date and time of the SQL Server installation.  And the trial software will automatically expire after six months.  You can find summary.txt file under
    <Drive>\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Summary.txt

    At the end of the file you can see something like

    Setup succeeded with the installation, inspect the log file completely for status on all the components.

    Time : Tue Dec 30 15:59:55 2008

    So after 180 days counting from 30 Dec 2008 i.e. 28 Jun 2009 the SQL Server Evaluation Edition will be expired.

  • From Management Studio
    Now this is very simple. Open the Management Studio, click on the “Help” of Menu Bar and then on “About”.  A new window will appear, where you will see some thing like:



    Microsoft SQL Server Management Studio (expires in 102 days)


    See the Image : untitled12



  • Wednesday, May 20, 2009

    Calculating Running Totals

    Edit: If you are using SQL Server 2012 or 2014 you may like to see my latest post on the topic How to Calculate Running Totals in SQL Server

    In one of my previous posts I wrote about Calculating Running Totals using a SELF JOIN.  But after realizing that, SELF JOIN is not efficient way of calculating Running Totals I thought of writing this post, with another technique without using Cursor.

    In this post I'll show you how to Calculate Running Totals without using a cursor.  On the same line with my previous post, the different scenarios I’ll be covering in this post are -

    • Running Total for all records.
    • Running Total on each Group of Records.
    • Running Total on each Date.

    Here is our sample table :

    ShipId OrderId ShipDate Quantity
    1 1 2009-02-01 12:06:16.820 10
    2 1 2009-02-02 16:16:16.820 15
    3 2 2009-02-01 13:26:16.820 20
    4 2 2009-02-03 17:40:16.820 12
    5 2 2009-02-04 13:05:16.820 15
    6 3 2009-02-03 14:25:16.820 25
    7 3 2009-02-04 15:50:16.820 50
    8 3 2009-02-05 19:40:16.820 30
    9 4 2009-02-06 15:30:16.820 20
    10 4 2009-02-07 16:20:16.820 10
    11 5 2009-02-05 17:05:16.820 35

    Create Sample Data:

    -- Create Table
    CREATE TABLE Sales
    ( ShipId INT,
    OrderId INT,
    ShipDate DateTime,
    Quantity INT )
    GO
    -- Load Sample Data

    INSERT INTO Sales SELECT
    1,1, 'Feb 01 2009 12:06:16:820PM',10 UNION ALL SELECT
    2,1, 'Feb 02 2009 04:16:16:820PM',15 UNION ALL SELECT
    3,2, 'Feb 01 2009 01:26:16:820PM',20 UNION ALL SELECT
    4,2, 'Feb 03 2009 02:40:16:820PM',12 UNION ALL SELECT
    5,2, 'Feb 04 2009 01:05:16:820PM',15 UNION ALL SELECT
    6,3, 'Feb 03 2009 05:25:16:820PM',25 UNION ALL SELECT
    7,3, 'Feb 04 2009 03:50:16:820PM',50 UNION ALL SELECT
    8,3, 'Feb 05 2009 07:40:16:820PM',30 UNION ALL SELECT
    9,4, 'Feb 06 2009 03:30:16:820PM',20 UNION ALL SELECT
    10,4, 'Feb 07 2009 04:20:16:820PM',10 UNION ALL SELECT
    11,5, 'Feb 05 2009 05:05:16:820PM',35


    Case I : Running Total For all records.
    Here I will calculate the Running Totals for each records. For that I'll be using a correlated sub query to calculate the Running Totals for the all the records. So the query will be :
    SELECT O.ShipId, O.OrderId, O.ShipDate, O.Quantity
            ,(SELECT SUM(I.Quantity)
                FROM Sales I
                WHERE I.ShipId <= O.ShipId
             ) as RunningTotal
    FROM Sales O
    ORDER BY O.ShipId

    Here is the result returned by above query :
    ShipId OrderId ShipDate Quantity RunningTotal
    1 1 2009-02-01 12:06:16.820 10 10
    2 1 2009-02-02 16:16:16.820 15 25
    3 2 2009-02-01 13:26:16.820 20 45
    4 2 2009-02-03 17:40:16.820 12 57
    5 2 2009-02-04 13:05:16.820 15 72
    6 3 2009-02-03 14:25:16.820 25 97
    7 3 2009-02-04 15:50:16.820 50 147
    8 3 2009-02-05 19:40:16.820 30 177
    9 4 2009-02-06 15:30:16.820 20 197
    10 4 2009-02-07 16:20:16.820 10 207
    11 5 2009-02-05 17:05:16.820 35 242

    Case II: Running Total on each Group of Records.
    Now instead of calculating the running total for each record we will calculate the running total for each OrderId. For that small change in our above query will be addition of one condition in WHERE clause of , that will be - I.OrderId = O.OrderId. So the final query will be:

    SELECT O.ShipId, O.OrderId, O.ShipDate, O.Quantity
            ,(SELECT SUM(I.Quantity)
                FROM Sales I
                WHERE I.OrderId = O.OrderId
                AND I.ShipId <= O.ShipId
             ) as RunningTotal
    FROM Sales O
    ORDER BY OrderId

    And the Output :
    ShipId OrderId ShipDate Quantity RunningTotal
    1 1 2009-02-01 12:06:16.820 10 10
    2 1 2009-02-02 16:16:16.820 15 25
    3 2 2009-02-01 13:26:16.820 20 20
    4 2 2009-02-03 17:40:16.820 12 32
    5 2 2009-02-04 13:05:16.820 15 47
    6 3 2009-02-03 14:25:16.820 25 25
    7 3 2009-02-04 15:50:16.820 50 75
    8 3 2009-02-05 19:40:16.820 30 105
    9 4 2009-02-06 15:30:16.820 20 20
    10 4 2009-02-07 16:20:16.820 10 30
    11 5 2009-02-05 17:05:16.820 35 35

    Case III: Running Total on each Date.
    Now we will see how to calculate the Running Total for each day. For that some small modifications will required in our WHERE clause, observe them carefully and try to understand what I’m trying to do :

    SELECT O.ShipId, O.OrderId, O.ShipDate, O.Quantity
            ,(SELECT SUM(I.Quantity)
                FROM Sales I
                WHERE I.shipDate <= O.Shipdate
                AND DateAdd(dd, DateDiff (dd, 0, I.Shipdate), 0)
                = DateAdd(dd, DateDiff (dd, 0, O.Shipdate) ,0)
              ) as RunningTotal
    FROM Sales O
    ORDER BY O.ShipDate

    And the output :

    ShipId OrderId ShipDate Quantity RunningTotal
    1 1 2009-02-01 12:06:16.820 10 10
    3 2 2009-02-01 13:26:16.820 20 30
    2 1 2009-02-02 16:16:16.820 15 15
    6 3 2009-02-03 14:25:16.820 25 25
    4 2 2009-02-03 17:40:16.820 12 37
    5 2 2009-02-04 13:05:16.820 15 15
    7 3 2009-02-04 15:50:16.820 50 65
    11 5 2009-02-05 17:05:16.820 35 35
    8 3 2009-02-05 19:40:16.820 30 65
    9 4 2009-02-06 15:30:16.820 20 20
    10 4 2009-02-07 16:20:16.820 10 10

    Mangal Pardeshi

    Friday, May 15, 2009

    GROUPING Function SQL Server

    If you are going to use ROLLUP or CUBE operators in future or used it in past, then this is the post you must read.

    Well GROUPING is one of the least used function in T-SQL, even in cases where it should be used. And so many times instead of using GROUPING function we follow some wrong/bad practice by using ISNULL only.  I’ll just give you a small example of it in this post.

    First lets understand what the GROUPING function is – From Books on line

    GROUPING (Transact-SQL) : Indicates whether a specified column expression in a GROUP BY list is aggregated or not. GROUPING returns 1 for aggregated or 0 for not aggregated in the result set. GROUPING can be used only in the SELECT <select> list, HAVING, and ORDER BY clauses when GROUP BY is specified.

    Syntax : GROUPING ( <column_expression> )

    Arguments <column_expression> : Is a column or an expression that contains a column in a GROUP BY clause.

    To put it in simple words : GROUPING is an aggregate function that causes an additional column to be output with a value of 1 when the row is added by either the CUBE or ROLLUP operator, or 0 when the row is not the result of CUBE or ROLLUP.

    Grouping is allowed only in the select list associated with a GROUP BY clause that contains either the CUBE or ROLLUP operator.

    So after reading it from the books on line “what the GROUPING function is”, you might like to ask me where it can be useful? And what is wrong practice I was talking about at the start of this post?

    Well answer to 1st question is, it is very useful while using ROLLUP or CUBE operator. Especially when you have some NULL values in any of the columns used in GROUP BY clause. Basically it helps us to identify which row is added by CUBE or ROLLUP operator. Need of this function arises because when CUBE or ROLLUP operator adds the subtotals and totals rows to result SQL Server returns NULL in related GROUP columns. And in such cases it is difficult to differentiate between actual NULL values from the column and the NULL returned by CUBE or ROLLUP operator.

    Now to understand how it works and the wrong practice I was talking about, lets see following example.

    For example I'll take a simplified table of sales representatives and and their Sales in respective Zones.

    -- create sample table Sales

    CREATE TABLE Sales
    (
    ID INT,
    FName VARCHAR(30),
    Zone VARCHAR(30),
    Sale INT
    )
    GO

    -- Load sample data

    INSERT INTO Sales SELECT
    1, 'Mangal', 'East', 20 UNION ALL SELECT
    2, 'Mangal', 'East', 150 UNION ALL SELECT
    3, 'Mangal', 'West', 50 UNION ALL SELECT
    4, 'Ram', 'East', 45 UNION ALL SELECT
    5, 'Ram', NULL, 80 UNION ALL SELECT
    6, 'Ram', NULL, 40 UNION ALL SELECT
    7, 'Sachin', 'West', 50 UNION ALL SELECT
    8, 'Sachin', 'West', 40
    GO

    -- Test sample data

    SELECT Id, FName, Zone, Sale
    FROM Sales
    GO

    The sample data :

    ID FName Zone Sale
    1 Mangal East 20
    2 Mangal East 150
    3 Mangal West 50
    4 Ram East 45
    5 Ram NULL 80
    6 Ram NULL 40
    7 Sachin West 50
    8 Sachin West 40

    Now here is what we normally do when somebody asks us to summarize the sale column on FName with subtotals for each Zone : We write a SELECT query with GROUP BY FName, Zone WITH ROLLUP. And when we see NULLS coming in Zone/FName column because of ROLLUP we add a ISNULL function on Zone column (in this example we'll add ISNULL only on Zone column, as I have intentionally added some NULLS in Zone column's sample data ). And that is where we go wrong, the Wrong Practice I was talking about.

    SELECT Fname, ISNULL(Zone, 'All Zone') as Zone, SUM(Sale) AS Total
    FROM Sales
    GROUP BY Fname, Zone WITH ROLLUP
    ORDER BY
    FName

    The output :

    FName Zone Total
    NULL All Zone 475
    Mangal East 170
    Mangal West 50
    Mangal All Zone 220
    Ram All Zone 120
    Ram East 45
    Ram All Zone 165
    Sachin West 90
    Sachin All Zone 90

    As you can see in the result, for Mangal and Sachin everything looks fine, but for Ram there are two “All Zone” in Zone column. Well 1st 'All Zone' is because of actual NULL value in Zone column and 2nd All Zone is sub total returned by ROLLUP operator for all Zones for Ram. But ISNULL is not capable of differentiating between these two NULLS and blindly replaces them with 'All Zone'. And that's when GROUPING function comes in for our help. Now 1st just see what GROUPING returns, execute the following query -

    SELECT FName,Zone, SUM(Sale) AS Total, GROUPING(Zone) as 'Grouping'
    FROM Sales
    GROUP BY FName, Zone WITH ROLLUP
    ORDER BY
      FName

    And the result :

    FName Zone Sale Grouping
    NULL NULL 475 1
    Mangal East 170 0
    Mangal West 50 0
    Mangal NULL 220 1
    Ram NULL 120 0
    Ram East 45 0
    Ram NULL 165 1
    Sachin West 90 0
    Sachin NULL 90 1

    In above result wherever you see 1 in Grouping column, that means that row is added by ROLLUP operator.  And wherever you see 0 in Grouping column, that means that row is not returned by ROLLUP.  So now identifying between NULL by ROLLUP and NULL from column becomes easy for us.  Now we can use this GROUPING indicator in our query, to identify the NULL rows added by ROLLUP operator, and NULLS from the column.

    So our final query will be :

    SELECT Fname,CASE GROUPING(Zone)
                    WHEN 1 THEN 'All Zone'
                    ELSE ISNULL(Zone, 'Unknown') END as Zone,
            SUM(Sale) AS Total
    FROM Sales
    GROUP BY Fname, Zone WITH ROLLUP
    ORDER BY  FName

    And the output:

    FName Zone Total
    NULL All Zone 475
    Mangal East 170
    Mangal West 50
    Mangal All Zone 220
    Ram Unknown 120
    Ram East 45
    Ram All Zone 165
    Sachin West 90
    Sachin All Zone 90

    So now, all NULLS from column are replaced by 'Unknown' and all subtotals added by ROLLUP comes with 'All Zone' in Zone column.

    The GROUPING function becomes even more useful while using CUBE operator.

    Mangal Pardeshi

    Wednesday, April 29, 2009

    Query to fetch the Yesterday's Data

    Seems to be easy question. And many of you may be already knew 2-3 ways of doing it. Here is the query I use normally to get the all the data from the previous day(Yesterday).

    SELECT Id, OrderId, CustomerId
    FROM Orders
    WHERE Date >= DateAdd(DD, DateDiff(DD, 0, GETDATE())-1, 0)
    AND Date < DateAdd(DD, DateDiff(DD, 0, GETDATE()), 0)

    Make it more readable and scalable using variable

    DECLARE @YesterDay DATETIME, @Today DATETIME
    SET @YesterDay = DateAdd(DD, DateDiff(DD, 0, GETDATE())-1, 0)
    SET @Today = DateAdd(DD, DateDiff(DD, 0, GETDATE()), 0)

    SELECT
    Id, OrderId, CustomerId
    FROM Orders
    WHERE  Date >= @YesterDay
    AND Date < @Today

    Logic : Basically above WHERE conditions is like - Greater than Equal to Yesterday and Less than Today.

    This DateAdd(DD, DateDiff(DD, 0, GETDATE())-1, 0) gives Yesterday's date with time as 00:00:00.

    And this DateAdd(DD, DateDiff(DD, 0, GETDATE()), 0)  gives today's date with time as 00:00:00.

    You can do a quick check by executing -

    SELECT DateAdd(DD, DateDiff(DD, 0, GETDATE())-1, 0) AS YDay
    ,DateAdd(DD, DateDiff(DD, 0, GETDATE()), 0) as ToDay

    Reason for doing this way instead of using some obvious ones – As you may heard this from many SQL experts, “Whenever possible never use CONVERT/CAST or any functions in WHERE clause on table's column.”

    Question is Why?

    Well whenever you use the any function on a column , SQL Server no longer uses the Index define on particular column. And use of a CAST or CONVERT functions slows down the query execution considerably.

    You may say, I used the DateAdd and DateDiff function in the WHERE clause. Yes, I used them, but NOT on Date column from the table. I used them with the Getdate function which is more of a constant and NOT part of the table's column.

    Query Optimization Tip :

    Whenever possible never use CONVERT/CAST or any functions in WHERE clause on table's column.

    In future I'll have some more posts on Date comparison in WHERE condition. Also some more simple tricks to optimized the everyday's queries.

     

    Mangal Pardeshi
    SQL MVP