Tuesday, November 17, 2015

SQL query to get the job error message from SSISDB

In SQL Server 2012 Microsoft made a lot of enhancements to SSIS, especially by introducing the “Project Deployment Model”.  Many things got better in SSIS after this.  But while there are many good things about this change, one thing started to irritate me: the error in the job history.  With this change we stopped getting the exact error messages in the job history.  Instead, when we open the job history to check the reason for a failure, we see this message:
To view the details for the execution, right-click on the Integration Services Catalog, and open the [All Executions] report.
To get the exact job failure reason we need to go to the Integration Services Catalog and right-click on the project name >> Reports >> Standard Reports >> All Executions.  These reports are very nice and detailed, but going to them every time and hunting for the error is very irritating.  Sometimes too much information is not that good; I have seen people make mistakes while reading these execution reports.  Most of the time we are interested only in the error messages and not in the other execution events.  The worst part is that we can’t even copy the error message to quickly Google it for solutions.  So I thought of doing something about it; I’m not going to open the execution reports every time just to see the error.
A quick SQL query to get the exact error would be nice.  And as I expected, all the package execution logs and error messages are stored in the SSISDB internal tables.  I created the SQL script below to query SSISDB and get the exact error messages:
--------------------------------------------------------------------------------------------------------
DECLARE @DATE DATE = GETDATE() - 7 -- This is to restrict the data for last 7 days, used in ON condition
SELECT O.Operation_Id -- Not much of use
,E.Folder_Name AS Project_Name
,E.Project_name AS SSIS_Project_Name
,EM.Package_Name
,CONVERT(DATETIME, O.start_time) AS Start_Time
,CONVERT(DATETIME, O.end_time) AS End_Time
,OM.message as [Error_Message]
,EM.Event_Name
,EM.Message_Source_Name AS Component_Name
,EM.Subcomponent_Name AS Sub_Component_Name
,E.Environment_Name
,CASE E.Use32BitRunTime
WHEN 1
THEN 'Yes'
ELSE 'NO'
END Use32BitRunTime
,EM.Package_Path
,E.Executed_as_name AS Executed_By

FROM [SSISDB].[internal].[operations] AS O
INNER JOIN [SSISDB].[internal].[event_messages] AS EM
ON o.start_time >= @date -- Restrict data by date
AND EM.operation_id = O.operation_id

INNER JOIN [SSISDB].[internal].[operation_messages] AS OM
ON EM.operation_id = OM.operation_id
-- Match each event to its own message; joining on operation_id alone
-- pairs every error event with every error message of the same operation
AND EM.event_message_id = OM.operation_message_id

INNER JOIN [SSISDB].[internal].[executions] AS E
ON OM.Operation_id = E.EXECUTION_ID

WHERE OM.Message_Type = 120 -- 120 means Error
AND EM.event_name = 'OnError'
-- This is something i'm not sure right now but SSIS.Pipeline just adding duplicates so I'm removing it.
AND ISNULL(EM.subcomponent_name, '') <> 'SSIS.Pipeline'
ORDER BY EM.operation_id DESC

--------------------------------------------------------------------------------------------------------
This is a simple script to get the errors from SSISDB instead of going to the Integration Services Catalog execution reports.  The query is still a work in progress; there could be scope for improving its performance or adding more information.  I’m still getting to know the tables in SSISDB, and I’ll keep posting updates, if any.
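Since the internal tables are undocumented, the same idea can probably also be expressed against the documented catalog views.  The following is only a sketch and not the script from this post: it assumes the standard SSISDB views catalog.event_messages and catalog.executions, and it keeps the 7 day window and the message type 120 filter used above.  The variable name @Since is made up for this example.

```sql
-- Sketch: error messages of the last 7 days, read from the SSISDB catalog views
DECLARE @Since DATETIMEOFFSET = DATEADD(DAY, -7, SYSDATETIMEOFFSET()) -- hypothetical variable name

SELECT EM.operation_id
,E.folder_name
,E.project_name
,EM.package_name
,EM.message_time
,EM.message AS [Error_Message]
,EM.message_source_name AS Component_Name
,E.executed_as_name AS Executed_By
FROM [SSISDB].[catalog].[event_messages] AS EM
INNER JOIN [SSISDB].[catalog].[executions] AS E
ON E.execution_id = EM.operation_id
WHERE EM.message_type = 120 -- 120 means Error
AND E.start_time >= @Since
ORDER BY EM.operation_id DESC, EM.message_time DESC
```

Keep in mind that the catalog views only return the executions the current login is allowed to see, so the result may differ from one login to another.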
Mangal Pardeshi

Saturday, November 14, 2015

Dynamic PIVOT in SQL

Today I’m going to show you how to write a PIVOT query when the column names are dynamic.  A dynamic pivot is necessary when you don’t know the column names in advance, or when the column names can change over time or for any other reason.  I’m not going to discuss the pros and cons of dynamic SQL or even of PIVOT itself (in my opinion pivoting of data should be handled at the reporting level).  So let’s see how to write a dynamic pivot query with a very simplified example.

First let’s create some sample data:
--------------------------------------------------------------------------------------------------------
-- create table
CREATE TABLE Orders
(
OrderID INT,
ProductName VARCHAR(50),
Quantity INT,
OrderDate DATE
);
-- insert some sample data
INSERT INTO Orders
VALUES
(1, 'Pen', 100, GETDATE()-1),
(2, 'Pen', 200, GETDATE()-2),
(3, 'Pen', 300, GETDATE()-3),
(4, 'Pencil', 500, GETDATE()-1),
(5, 'Pencil', 600, GETDATE()-2),
(6, 'Pencil', 400, GETDATE()-3),
(7, 'Eraser', 30, GETDATE()-1),
(8, 'Eraser', 20, GETDATE()-2),
(9, 'Eraser', 10, GETDATE()-3),
(10, 'Pen', 100, GETDATE()-4),
(11, 'Pencil', 500, GETDATE()-4),
(12, 'Eraser', 30, GETDATE()-4);
-- verify the data
SELECT *
FROM Orders

--------------------------------------------------------------------------------------------------------
This is how our data looks:
[Image: Data]
Requirement:  Now I want to pivot this data on the column OrderDate.  I want to show the last 3 OrderDates as columns, with the respective quantity of each product under each date.  Here is the expected output, though remember that you will get different column names (dates), because my sample data is itself dynamic:
[Image: Expected]
Now let’s first understand the challenges.  Our requirement is to show the last 3 days of OrderDate.  I’m writing this post today, which is why the last 3 days are 11 Nov – 13 Nov, but tomorrow I’ll want my query to show 12 Nov – 14 Nov, and it will keep changing.  Changing these dates in our query every time would be painful.  And that’s where dynamic SQL comes into the picture.  The main challenge is to get these ever-changing column names and pass them, without hard-coding, into both the PIVOT block and the SELECT clause.

Let’s do one thing at a time.  First we will get the last 3 OrderDates and concatenate them in the desired order and format.  For that I’m going to use XML PATH.  Note that there are other methods to concatenate the column names into a single variable.  Here is my code to get the column names dynamically into a variable:
--------------------------------------------------------------------------------------------------------
DECLARE @ColumnNames VARCHAR(1000)

SELECT @ColumnNames = 
           STUFF(( SELECT DISTINCT TOP 100 
           ', ' + QUOTENAME(CONVERT(VARCHAR(10), OrderDate, 120)) 
           FROM Orders 
           WHERE OrderDate > DATEADD(DD, -4, GETDATE()) 
           ORDER BY ', ' + QUOTENAME(CONVERT(VARCHAR(10), OrderDate, 120)) DESC
           FOR XML PATH('') 
           ), 1, 2, '')
-- Just to check how we are getting the column names
PRINT @ColumnNames

--------------------------------------------------------------------------------------------------------
A few notes on the above query:
1. DISTINCT is needed because the same OrderDate appears multiple times in the table.
2. TOP 100 goes together with the ORDER BY that sorts the dates in descending order.  SQL Server allows ORDER BY in a sub-query only when TOP, OFFSET or FOR XML is also specified; since this sub-query already uses FOR XML, TOP is not strictly required here, but it also caps the number of columns.  I put the number 100 just randomly; I could simply have used TOP 3, because I want only the top 3 dates.  When you are not sure about the number of columns you can use either a relatively large number like 100 or TOP 100 PERCENT.
3. In the WHERE clause I'm restricting the dates to the last 3 dates counted back from GETDATE(), i.e. from "TODAY".
4. I'm also converting OrderDate to VARCHAR with style 120 to get the dates in the desired format.  In my case I want the dates as yyyy-mm-dd, hence style 120.
This is how the column names come out of the above query: [2015-11-13], [2015-11-12], [2015-11-11]
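As an example of one of the other concatenation methods, here is a sketch using STRING_AGG.  It assumes SQL Server 2017 or later (STRING_AGG does not exist in the versions this post was written for) and the same Orders table; the derived table alias D is just a name picked for this example.

```sql
DECLARE @ColumnNames VARCHAR(1000)

-- DISTINCT is applied in the derived table, because STRING_AGG itself
-- has no DISTINCT option; WITHIN GROUP sorts the dates in descending order
SELECT @ColumnNames =
           STRING_AGG(QUOTENAME(CONVERT(VARCHAR(10), D.OrderDate, 120)), ', ')
           WITHIN GROUP (ORDER BY D.OrderDate DESC)
FROM ( SELECT DISTINCT OrderDate
       FROM Orders
       WHERE OrderDate > DATEADD(DD, -4, GETDATE())
     ) AS D

-- Just to check how we are getting the column names
PRINT @ColumnNames
```

With this approach neither the TOP nor the STUFF trick is needed, since the separator is only placed between the values.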

Now here is our PIVOT query using the dynamic column names created above:
--------------------------------------------------------------------------------------------------------
DECLARE @ColumnNames VARCHAR(1000)
DECLARE @Query VARCHAR(4000)

-- This part is already explained above
SELECT @ColumnNames = 
           STUFF(( SELECT DISTINCT TOP 100 
           ', ' + QUOTENAME(CONVERT(VARCHAR(10), OrderDate, 120)) 
           FROM Orders 
           WHERE OrderDate > DATEADD(DD, -4, GETDATE()) 
           ORDER BY ', ' + QUOTENAME(CONVERT(VARCHAR(10), OrderDate, 120)) DESC
           FOR XML PATH('') 
           ), 1, 2, '')

-- Preparing our PIVOT query
SET @Query = 'SELECT ProductName, ' +
@ColumnNames + '
FROM
        ( SELECT ProductName
                     ,Quantity
                     ,OrderDate
          FROM Orders
        ) AS M
PIVOT
       (
         SUM( Quantity )
         FOR OrderDate IN ( '
+ @ColumnNames + ' ) 
       ) AS P
'
--Executing @Query to get the result
EXECUTE(@Query)

--------------------------------------------------------------------------------------------------------
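To see what the dynamic string turns into, it helps to PRINT @Query instead of executing it.  For the column names from the sample run ([2015-11-13], [2015-11-12], [2015-11-11]) the generated statement should be equivalent, whitespace aside, to the static query below.  The dates are only the ones from that run; on any other day they will differ.

```sql
-- Static equivalent of the generated query for the sample run's dates
SELECT ProductName, [2015-11-13], [2015-11-12], [2015-11-11]
FROM
        ( SELECT ProductName
                     ,Quantity
                     ,OrderDate
          FROM Orders
        ) AS M
PIVOT
       (
         SUM( Quantity )
         FOR OrderDate IN ( [2015-11-13], [2015-11-12], [2015-11-11] )
       ) AS P
```

Comparing the two makes it clear that the only dynamic parts are the two column lists; everything else in the string is fixed text.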

You may also like to see my other post on Pivot: PIVOT Multiple Columns in SQL Server

Mangal Pardeshi

Thursday, November 12, 2015

Fastest way to find the row count of all the tables in SQL

Let me show you 2 simple queries to find the row count of all the tables in a database.

Some time back I shared a trick to quickly find the row count of a table in SQL Server.  Now let’s take it to the next level and get the row count of all the tables in a database.  There are 3-4 ways to do that, but I’m going to show you only 2 very similar methods.

1. Using Catalog View sys.partitions:

Here is the script to find the row count of all the tables using sys.partitions.  If you are a normal developer who doesn’t have a high level of permissions on production servers, this is the best method for you to find the row count of all the tables quickly.
--------------------------------------------------------------------------------------------------------
SELECT T.NAME AS TableName 
           ,SUM(P.ROWS) AS [RowCount]
FROM sys.partitions AS P
INNER JOIN sys.tables AS T
ON P.OBJECT_ID = T.OBJECT_ID
WHERE P.INDEX_ID < 2
GROUP BY T.NAME
ORDER BY [RowCount] DESC

--------------------------------------------------------------------------------------------------------
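One thing to watch: the query above groups by the table name only, so two tables with the same name in different schemas would be added together.  If that can happen in your database, a schema-aware variant could look like the sketch below; the SchemaName column is an addition for this example and not part of the script above.

```sql
SELECT SCHEMA_NAME(T.schema_id) AS SchemaName
           ,T.name AS TableName
           ,SUM(P.rows) AS [RowCount]
FROM sys.partitions AS P
INNER JOIN sys.tables AS T
ON P.object_id = T.object_id
WHERE P.index_id < 2 -- heap (0) or clustered index (1) only
GROUP BY T.schema_id, T.name
ORDER BY [RowCount] DESC
```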

2. Using DMV sys.dm_db_partition_stats:

--------------------------------------------------------------------------------------------------------
SELECT T.NAME AS TableName 
           ,SUM(S.ROW_COUNT) AS [RowCount]
FROM sys.dm_db_partition_stats AS S
INNER JOIN sys.tables AS T
ON S.OBJECT_ID = T.OBJECT_ID
WHERE S.INDEX_ID < 2
GROUP BY T.NAME
ORDER BY [RowCount] DESC

--------------------------------------------------------------------------------------------------------

Actually there is nothing to choose between the 2 methods; they are nearly identical, and performance-wise they are also very much the same.  The only difference between them is the permissions needed to execute them.  The catalog view sys.partitions requires only membership in the public role, so effectively even if you have only “read only” access on a database, you can still use the 1st method.  On the other hand, querying the sys.dm_db_partition_stats dynamic management view requires the VIEW DATABASE STATE permission.  So if you are a normal developer, “sometimes” it is difficult to get that level of permissions.
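If you are not sure which of the two methods you are allowed to use, you can ask SQL Server.  A small sketch using the built-in HAS_PERMS_BY_NAME function (the column alias is just a name chosen for this example):

```sql
-- Returns 1 if the current login has VIEW DATABASE STATE
-- on the current database, otherwise 0
SELECT HAS_PERMS_BY_NAME(DB_NAME(), 'DATABASE', 'VIEW DATABASE STATE') AS CanViewDatabaseState
```

When this returns 0, the sys.partitions query is the one to go with.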

And yes, as I mentioned previously, there is a chance that you will get only an approximate row count if statistics are not updated.  This applies to both methods.

To know more about the views see:
1. sys.partitions
2. sys.dm_db_partition_stats

See my couple of other posts on similar topics:
1. Fastest way to find the row count of a table in SQL Server
2. Find the row count of temporary tables in SQL Server

Thanks
Mangal Pardeshi

Friday, November 6, 2015

Disk space monitoring of SQL Server

Recently I was working on a custom dashboard to monitor the health of all the database servers.  One thing I wanted to add to the dashboard was the “Available Disk Space” on the data drives of the SQL Server instances.  I thought I was not going to find this information in any system table or DMV and that I would need to get the disk space information via some Windows commands or something.  Then I came across the dynamic management function sys.dm_os_volume_stats, which provided exactly the information I was looking for.

sys.dm_os_volume_stats:

Returns information about the operating system volume (directory) on which the specified databases and files are stored in SQL Server. Use this dynamic management function to check the attributes of the physical disk drive or return available free space information about the directory.

Here is the script that gives you the total and available size of the data volumes attached to a SQL Server instance.
--------------------------------------------------------------------------------------------------------
SELECT DISTINCT d.logical_volume_name AS LogicalName
,d.volume_mount_point AS DriveName
,CONVERT(NUMERIC(32,2),d.available_bytes *1.00/(1048576.0 *1024 ))AS FreeSpaceInGB
,CONVERT(NUMERIC(32,2),d.Total_Bytes *1.00/(1048576.0 *1024 )) AS TotalSizeInGB
FROM sys.master_files f
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.FILE_ID) d
ORDER BY FreeSpaceInGB

--------------------------------------------------------------------------------------------------------

This script is very useful for monitoring the data volumes, tracking the growth of the data, detecting low disk space on the data drives of SQL Server, and so on.  I have implemented a low disk space alert on my servers using this script.
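A low disk space check built on the same function could be sketched as follows.  This is not the alert mentioned above, only an illustration: the 10 percent threshold and the variable name @ThresholdPct are assumptions made for the example.

```sql
DECLARE @ThresholdPct DECIMAL(5,2) = 10.0 -- hypothetical threshold: alert below 10% free

-- Lists only the volumes whose free space is below the threshold
SELECT DISTINCT d.volume_mount_point AS DriveName
,CONVERT(NUMERIC(5,2), d.available_bytes * 100.0 / d.total_bytes) AS FreeSpacePct
FROM sys.master_files AS f
CROSS APPLY sys.dm_os_volume_stats(f.database_id, f.file_id) AS d
WHERE d.available_bytes * 100.0 / d.total_bytes < @ThresholdPct
ORDER BY FreeSpacePct
```

An empty result means no volume is below the threshold, so a scheduled job would only need to send a notification when the query returns rows.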

Thanks
Mangal Pardeshi

Wednesday, November 4, 2015

Row Count of Temporary Tables

In this post I’ll show you a quick trick to find the row count of temporary tables in SQL Server.
Imagine a situation where you are executing a stored procedure that has many SQL queries.  The stored procedure uses a few temporary tables: it creates them, inserts rows into them and then performs some actions.  While this stored procedure is executing you want to track its progress.  To be more specific, you want to know how many rows have been inserted into a particular temporary table.  Knowing the row count of a temporary table that belongs to another session seems pretty impossible, right?  Actually it is very easy, using the script below:
--------------------------------------------------------------------------------------------------------
SELECT T.NAME AS TABLE_NAME
           ,S.ROW_COUNT 
FROM TEMPDB.sys.dm_db_partition_stats AS S
INNER JOIN TEMPDB.sys.tables AS T
ON S.OBJECT_ID = T.OBJECT_ID
WHERE 
S.INDEX_ID < 2
--------------------------------------------------------------------------------------------------------
Note:  To keep it simple I’m assuming you don’t have partitioned tables; anyway, it is pointless to create partitions in a temporary table.
By executing the above script you can get the row count of all the temporary tables created on the server.  In my last post, Fastest way to find the row count of a table, I mentioned the DMV sys.dm_db_partition_stats maintained by SQL Server.  sys.dm_db_partition_stats returns page and row-count information for every partition in the current database.  We also know that all temporary tables are created in the tempdb database.  So if we query sys.dm_db_partition_stats in tempdb, we get the row count of the tables in the tempdb database, including the temporary tables created by users.
BUT, there is one thing you should be aware of.  When we create a temporary table we know that it gets created in the tempdb database, but not with exactly the same name.  The SQL Server engine adds some extra characters to the name, mostly lots of underscores followed by some number.  See the image below; I created one temporary table #T and immediately executed the above script:
[Image: TempTable]
You just need to be a little smart when naming temporary tables.  By habit many users create temporary tables with the same name (typically #temp or #t), and then it is difficult to know which table was created by which user.  So if you give your temporary tables meaningful names (e.g. #OrdersLastMonth), the above script for finding the row count of temporary tables can really help you.
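With a meaningful name in place, the script can be narrowed down to just that table.  A minimal sketch, assuming the temporary table was created as #OrdersLastMonth (the example name from above):

```sql
SELECT T.name AS TABLE_NAME
           ,SUM(S.row_count) AS [Row_Count]
FROM tempdb.sys.dm_db_partition_stats AS S
INNER JOIN tempdb.sys.tables AS T
ON S.object_id = T.object_id
WHERE S.index_id < 2
-- the engine appends extra characters to the name, hence the trailing %
AND T.name LIKE '#OrdersLastMonth%'
GROUP BY T.name
```

Note that the pattern also matches any other temporary table whose name merely starts with #OrdersLastMonth, and that two sessions using the same name will show up as two separate rows.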
I hope this script for finding the row count of temporary tables in SQL Server will be useful to you.  Do let me know your feedback.
Thanks
Mangal Pardeshi