Welcome to VSTSC's Community Server | | Help

SSIS Design Pattern - ETL Instrumentation, Part 1

Introduction
 
SSIS is a fantastic ETL engine. When I build and ETL solution in SSIS, I like to collect runtime metrics. Why? I use the data initially to determine a baseline for performance and to determine, in some cases, if I'm loading within a defined window.
 
I refer to this process of collecting performance and runtime metrics as ETL Instrumentation and I collect this data for every enterprise ETL solution I design.
 
The types of things I want to know are:
  • Did the process start when I wanted it to start?
  • Did it succeed?
    • If not, is it still running?
    • Did it fail?
  • How long did it execute?
  • How does recent performance compare to past performance?

In this mini-series entitled SSIS Design Pattern - ETL Instrumentation, I will demonstrate how I capture, report, and measure these data using SSIS, SSRS, SQL Agent Jobs, and Transact-SQL.

In Part 1, we build an elementary system. In subsequent parts, we will improve upon the concepts demonstrated here.

To Instrument Or Not To Instrument; That Is The Question
 
My thoughts on this are best summed up thus:
 
There are two types of enterprise data warehouse developers:
1) Those who collect ETL instrumentation data; and
2) Those who will.
 
Why do I make this statement? When you build ETL to load your enterprise data warehouse, you are building the source for the reporting system. Should anything unfortunate happen to the data between the time you read it from the System Of Record and the time it lands in your Reporting Server Database, you are basically guilty until proven innocent. ETL Instrumentation will provide evidence for (and sometimes against) you as you defend your ETL design and implementation.
 
Building The Database
 
Before we instrument an ETL process, we need a database solution for the data. We'll start by building the database itself, then add objects.
 
Let's build the database. We'll call it SSISRunTimeMetrics. Open SQL Server Management Studio (SSMS) and execute the following Transact-SQL:
 
 
use master
go

if not exists(select name
              from master.sys.databases
              where name = 'SSISRunTimeMetrics')
 Create Database SSISRunTimeMetrics
go

 
   This script creates the SSISRunTimeMetrics database. Next, let's create a schema and table to hold our metrics:
 
 
use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 
end

-- create RunTimeMetrics table...
if not exists(select s.name + '.' + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = 'RunTimeMetrics'
               and s.name = 'ssis')
 Create Table ssis.RunTimeMetrics
  (id int identity(1,1)
  ,packageStartDateTime datetime null
  ,packageEndDateTime datetime null
  ,packageName varchar(255) null
  ,packageVersion varchar(25) null
  ,packageStatus varchar(25) null)

Next, let's add a stored procedure to record a package start event:
 

use SSISRunTimeMetrics
go
 
if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordPackageStart'
           and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordPackageStart
 end
go
 
Create Procedure ssis.usp_RecordPackageStart
  @packageStartDateTime
datetime = null
 ,@packageName varchar(255) = null
 ,@packageVersion varchar(25) = null
 ,@packageStatus varchar(25) = null
As
 begin
  -- set @packageStartDateTime default...
  declare @Now datetime
  set @Now = GetDate()

  if (@packageStartDateTime Is Null)
   set @packageStartDateTime = @Now

  -- insert the run time metrics data...
  insert into ssis.RunTimeMetrics
   (packageStartDateTime
   ,packageName
   ,packageVersion
   ,packageStatus)
  values
  (@packageStartDateTime
  ,@packageName
  ,@packageVersion
  ,@packageStatus)

 end

go

 
That's all for now. We'll add more stuff later.
 
Building The SSIS Project
 
Open SQL Server Business Intelligence Development Studio (BIDS) and create a new project. When the New Project dialog displays, select Business Intelligence Projects from the Project types list and Integration Services Project from the Templates. Name the project SSISRunTimeMetrics and click the OK button: 
 
 
Rename the Package SSISRunTimeMetrics.dtsx:
 
 
Always click the Yes button when you see this dialog (trust me):
 
 
Right-click in the white-space on the Control Flow canvas and click Add Annotation to document your code:
 
 
I can't say enough about good documentation. You never know when someone else will have to open your source code to troubleshoot.
 
Right-click in the white-space of the Control Flow canvas and click Variables. Add a variable named sVersion. Make sure the Scope is the SSISRunTimeMetrics package and make the Data Type String:
 
 
Add a Sequence Container and change the name to "Step 1 - Do some stuff":
 
 
Sequence Containers are nice for several reasons:
  • They're aesthetic. You can break up chucks of functionality logically... how cool!
  • When troubleshooting, you can execute the contents of a container from the right-click context menu.
  • You can isolate chunks of functionality using transactions by container.
Next, add an Execute SQL Task and name it "Log Start of Package Execution":
 
 
Double-click the Execute SQL Task to edit it. Click Connection and select "":
 
 
When the Configure OLE DB Connection Manager dialog displays, click the "New..." button:
 
 
Enter or select your server in the Server Name dropdown. In the "Connect to a database" panel, select the "Select or enter a database name" option and enter or select the SSISRunTimeMetrics database you created earlier. Click the OK button to proceed:
 
 
Then click the OK button again to set the Connection property of the Execute SQL Task:
 
 
Select the SQL Statement property and click the ellipsis. When the "Enter SQL Query" dialog displays, enter the following parameterized Transact-SQL statement:
 
Exec ssis.usp_RecordPackageStart ?,?,?,'Started'
 
Click the OK button to proceed:
 
 
Click on the Parameter Mapping page and configure parameters as shown:
 
 
Click the OK button to proceed.
 
Next, let's do some work on the sVersion variable. Click on the sVersion variable name and press the F4 key to display variable properties. Change the EvaluateAsExpression property to True, then click on the Expression property's ellipsis:
 
 
When the Expression Builder displays, enter the following SSIS Expression Language text to build a version string in the format Major.Minor.Build:
 
(DT_WSTR, 3) @VersionMajor + "." +  (DT_WSTR, 3)@VersionMinor + "." + (DT_WSTR, 3)@VersionBuild
 
 
Click the "Evaluate Expression" button to check the expression. Then click the OK button to close the Expreesion Builder:
 
 
Add a Script Task and configure it to popup a message box displaying the value of sVersion:
 
Imports System
Imports
System.Data
Imports
System.Math
Imports
Microsoft.SqlServer.Dts.Runtime
 
Public Class ScriptMain

  Public Sub Main()
 
    Dim sVersion As String = Dts.Variables("sVersion").Value.ToString
    Dim sMsg As String
 
    sMsg = "Version: " & sVersion
    MsgBox(sMsg)
    Dts.TaskResult = Dts.Results.Success
 
  End Sub

End
Class
 
 
Execute the package. Then execute the following Transact-SQL query to check the results:

select * from ssis.RunTimeMetrics

 
It's time to add a procedure to log the end of package execution, but we have a problem: How do we identify the row we just added to the ssis.RunTimeMetrics table? Think about it. We just inserted a row, and since that's the only row in the table it's pretty simple to identify. Later, we will have lots of rows in the table in varying states. We need a method to positively identify the row we're working with.
 
We could use the last row inserted - apply Max(id) criteria - but what happens as our metrics grows to include multiple packages? One package could start while another was executing and we'd update the wrong row.
 
We require a means to positively identify the row when added to the table. Such a means exists. If we modify the ssis.usp_RecordPackageStart stored procedure we can have it return the value of the inserted ID. The OUTPUT clause gives us access to the inserted and deleted virtual tables. Modify the stored procedure by adding the following OUTPUT clause as shown:
 
output inserted.id as 'Inserted_ID'
 
Execute the Transact-SQL to drop and recreate the ssis.usp_RecordPackageStart stored procedure:
 
 
Add another Int32 Type variable named iPackageLoadID:
 
 
Back in Execute SQL Task, change the ResultSet property to "Single row":
 
 
Click the Result Set page and add a new resultset. Name it 0 (ordinals work well) and assign the results of this query to the iPackageLoadID variable you just created. This will push the results of the OUTPUT clause in the stored procedure (which will include the id of the newly inserted row) into the iPackageLoadID variable. You can then access it later in the package to update the precise row:
 
 
Alter the Script Task by adding iPackageLoadID to the list of ReadOnlyVariables:
 
 
Alter the VB.Net script to read:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
 
Public Class ScriptMain

  Public Sub Main()
 
    Dim sVersion As String = Dts.Variables("sVersion").Value.ToString
    Dim sPackageLoadID As String = Dts.Variables("iPackageLoadID").Value.ToString
    Dim sMsg As String
 
    sMsg = "Version: " & sversion & vbCrLf
    sMsg = sMsg &
"PackageLoadID: " & sPackageLoadID
    MsgBox(sMsg)
    Dts.TaskResult = Dts.Results.Success
 
  End Sub

End
Class
 
 
Execute the package to test the new script:
 
 
Next, add a new stored procedure named ssis.usp_RecordPackageEnd to the SSISRunTimeMetrics database using the following Transact-SQL script:

use
SSISRunTimeMetrics
go
 
if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordPackageEnd'
          and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordPackageEnd
 end
go

Create Procedure ssis.usp_RecordPackageEnd
  @id
int
 ,@packageEndDateTime datetime = null
 ,@packageStatus varchar(25) = null
As
 begin

  -- set @packageEndDateTime default...
  declare @Now datetime
  set @Now = GetDate()
  if (@packageEndDateTime Is Null)
   set @packageEndDateTime = @Now
 
  -- update the run time metrics data...
  update ssis.RunTimeMetrics
   Set packageEndDateTime = @packageEndDateTime
      ,packageStatus = @packageStatus
  where id = @id

 end

go
 
Now add a new Execute SQL Task named "Log End of Package Execution":
 
 
Set the Connection property to your SSISRunTimeMetrics database and the SQLStatement property to the following Transact-SQL statement:

declare @Now datetime
set
@Now = GetDate()
 
Exec ssis.usp_RecordPackageEnd ?,@Now,'Succeeded'
 
Click the Parameter Mapping page and configure the iPackageLoadID variable as an input variable of the Long data type, assigned to ordinal 0:
 
 
Click the OK button: 
 
 
Execute the package. Note that each time a change is saved, the VersionBuild property auto-increments, thus updating your sVersion variable value:
 
 
The package succeeds:
 
 
Examine the data stored in the ssis.RunTimeMetrics table. Note the latest row now contains a Start and End DateTime, and the packageStatus field now records "Succeeded":
 
 
We will work with these concepts more - fleshing them out by adding more robust code and additional functionality. For now, this is a good start. Congratulations!
 
Get the code!
 
:{> Andy
Published Sunday, July 15, 2007 3:51 PM by andy

Comments

Sunday, November 11, 2007 12:08 AM by Applied Business Intelligence

# SSIS Design Pattern - ETL Instrumentation, Part 2

Introduction This post is part of a series of posts on ETL Instrumentation. In Part 1 we built a database

Sunday, November 18, 2007 8:43 PM by Applied Business Intelligence

# SSIS Design Pattern - ETL Instrumentation, Part 3

Introduction This post is part of a series of posts on ETL Instrumentation. In Part 1 we built a database

Wednesday, December 26, 2007 12:48 AM by Applied Business Intelligence

# SSIS Design Pattern - ETL Instrumentation, Part 4

Introduction This post is part of a series of posts on ETL Instrumentation. In Part 1 we built a database

Monday, December 31, 2007 12:54 PM by TaffyLewis

# re: SSIS Design Pattern - ETL Instrumentation, Part 1

I thank you again for you article-it really helped me get up and going using SSIS.  I'm having a problem and I'm wondering if you've experienced it.  When you're creating your packages, have you had to use SQL Server Authentication while developing?  I have to do this and have been having problems with connecting.  I tried using XML file and inputing the password in note pad.  This seems to be a solution for deploying, but while testing.  Have you had to use SQL Server?

Taffy

Anonymous comments are disabled