Welcome to VSTSC's Community Server | | Help

SSIS Design Pattern - ETL Instrumentation, Part 2

Introduction

This post is part of a series of posts on ETL Instrumentation.

In Part 1 we built a database to hold collected SSIS run time metrics and an SSIS package to deomnstrate how and why we would load metrics into the database.

In Part 2 we will expand on our database and the SSIS package to annotate version metadata, manage error metrics capture, and task status reporting.

A Brief History Of Our ETL Instrumentation Project

To review, our existing database is named SSISRunTimeMetrics. It contains a schema named ssis. In this schema are three objects:
 - a table named ssis.RunTimeMetrics.
 - a stored procedure named ssis.usp_RecordPackageStart.
 - a stored procedure named ssis.usp_RecordPackageEnd.

At the beginning of SSIS package execution, we call ssis.usp_RecordPackageStart from an Execute SQL Task. We pass the package start date and time, the package name, and the package version. We also pass in a status of "Started". From this stored procedure we get the ID of the newly created row, which we then push into a Package Load ID variable (iPackageLoadID).

When the package completes execution, we call ssis.usp_RecordPackageEnd from an Execute SQL Task. We pass in the Package Load ID from the variable, the current date and time, and the status "Succeeded".

First Things First

Let's start by opening the SSIS package we built in Part 1. The first thing we want to change is the package metadata - the data in our SSIS package about our SSIS package. Why? We always want to know what changes were made, who made the changes, and when they were made.

Let's begin by updating the version. Click on the Control Flow tab and press the F4 key to open properties. Scroll to the Version Properties and change the VersionMinor from 0 to 1:

Also edit the Control Flow annotation that contains version information:

Update package description annotation. I like to include a Change Log summary inside each package:

In the Change Log, I include 

  • A version number in the format Major.Minor.Date.
  • The date of the change.
  • The name of the developer who made the change.
  • A brief description of the change.

Now I'm ready to actually make changes! To facilitate package execution on your machine, I want to edit the name and target SQL Server instance of the OLEDB Connection Manager. Double-click the connection manager to edit it and change the Server name property to "(local)".

Click the OK button to close the connection manager editor. For consistency, right-click the connection manager and click Rename. Change the name to "(local).SSISRunTimeMetrics":

There. Housekeeping accomplished!

A test run verifies I haven't broken anything:

Unit test successful - ready to continue developing.

Errors Aside

Our current solution works well as long as our SSIS package succeeds. And this is the way all good engineering is done; Use Case 1 should always be "Everything Succeeds". But what happens if the SSIS package fails? Shouldn't we record the failure and as much metadata about the failure as possible? I'm glad you asked: Yes, we should.

In SSIS, errors "bubble". The concept of bubbling is familiar to ASP developers - it simply means events follow a prescribed path through the scope hierarchy of the application. Ok, you can scratch the word "simply" out of the preceeding sentence.

Let's use the script task to generate an error. Right the Test Script Task and click Edit to open the editor. Click the Script page and the Design Script button to open the Visual Studio for Applications (VSA) script editor. I use Script Tasks for exception testing. The following code  - replacing the "Dts.TaskResult = Dts.Results.Success" line - will prompt you to have the task succeed or fail:

Dim iResponse As Integer
iResponse = MsgBox("Have task complete with Success?", MsgBoxStyle.YesNo)
If iResponse = vbYes Then
  Dts.TaskResult = Dts.Results.Success
Else
  Dts.TaskResult = Dts.Results.Failure
End If

The additional code causes a second messagebox:

 

If you click Yes in response, the task will succeed. Clicking No will cause the task to fail:

SSIS has some great logging and error-handling functionality built in. The first place to look when an error occurs is the Progress / Execution Results tab in the SSIS Designer GUI. The tab is named Progress while the package is executing and becomes Execution Results once execution completes. Click the Progress / Execution Results tab to view the error raised when you clicked the No button:

This isn't a very descriptive error message. Some SSIS error messages aren't very descriptive. Let's add some error logging capabilities to our ETL Instrumentation application.

To Err is Human... 

There are lots of ways to develop solutions with SSIS. Some of them are best practices, more are simply matters of coding style. For instance, here you could build the database table to hold error information first; or you could build the error-handling functionality in the SSIS package first. This is a style decision. I choose to build the table first.

My initial errors table will be named RunTimeErrors and will contain the following fields:

  • RunTimeMetricsID - to map the error to the execution instance. I want to know the error occurred when I executed the package at a certain time.
  • ErrorDateTime - when the error actually occurred.
  • SourceName - the task that raised the error.
  • ErrorNumber - the error number.
  • ErrorDescription - the error description.

I generate the table with the following T-SQL script:

use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
begin
 set @sql = 'Create Schema ssis'
 exec(@sql)
end

-- create RunTimeErrors table...
if not exists(select s.name + '.' + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = 'RunTimeErrors'
               and s.name = 'ssis')
 Create Table ssis.RunTimeErrors
 (RunTimeMetricsId int
 ,ErrorDateTime datetime null
 ,SourceName varchar(255) null
 ,ErrorNumber int null
 ,ErrorDescription varchar(2500) null)

Note this script is re-runnable. I encourage you to take the time to write re-runnable scripts when developing database solutions.

We could stop here and directly insert rows into our error table, but let's complete the development process by creating a stored procedure to do this work for us. Execute the following code in SSMS:

use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordPackageError'
           and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordPackageError
 end
go

Create Procedure ssis.usp_RecordPackageError
  @RunTimeMetricsID
int
 ,@SourceName varchar(255) = null
 ,@ErrorNumber int = null
 ,@ErrorDescription varchar(2500) = null
As

 begin

  -- set @Now...
  declare @Now datetime
  set @Now = GetDate()

  -- insert the run time errors data...
  insert into ssis.RunTimeErrors
  (RunTimeMetricsID
  ,ErrorDateTime
  ,SourceName
  ,ErrorNumber
  ,ErrorDescription)
  values
  (@RunTimeMetricsID
  ,@Now
  ,@SourceName
  ,@ErrorNumber
  ,@ErrorDescription)

 end

go

Again, re-runnable code. Now lets' return to Visual Studio to populate our table with error information.

Click on the Test Script Task on the Control Flow. Then click the Event Handlers tab. Because you clicked the Test Script Task first, the Event Handler Executable defaults to the Test Script Task. The Event Handler defaults to the OnError event.

 

To create the OnError Event Handler, click the link labeled "Click here to create an 'OnError' event handler for executable 'Test Script Task'". Let's do that now.

After you click the link you will note the canvas looks remarkably familiar: it's nearly identical to the Control Flow.

Our goal is to load data into our RunTimeErrors table. Let's use an Execute SQL Task to call the stored procedure. Drag an Execute SQL Task onto the OnError Event Handler canvas and double-click it to open the editor. Name the Execute SQL Task "Record Error". Set the connection to "(local).SSISRunTimeMetrics". Set the SQLStatement property to "exec ssis.usp_RecordPackageError ?,?,?,?":

Click the Parameter Mapping page. Parameterized queries use question marks (?) as parameter placeholders. There are a few ways to access the parameters (a subject for another blog post). We will use ordinal position to access them. Click the Add button to add a new parameter. Set the Variable Name property to "User::iPackageLoadID" and the Parameter Name property to "0". The default data type ("Long") works for SSIS integer data type variables. This maps the iPackageLoadID variable to the first question mark in our SQLStatement property, which is the @RunTimeMetricsID parameter for the stroed procedure. Got it? Good.

Add three more parameters:

Variable Name: System::SourceName
Data Type: VarChar
Parameter Name: 1

Variable Name: System::ErrorCode
Data Type: Long
Parameter Name: 2

Variable Name: System::ErrorDescription
Data Type: VarChar
Parameter Name: 3

Click the OK button to close the editor and execute the package. 

Click the OK button on the first messagebox, then click the No button on the messagebox dialog to fail the Test Script Task. When the task fails, the OnError event fires and calls our stored procedure:

 

Let's look at the table to see what was recorded. Return to SSMS and execute the following query:

select *
from ssis.RunTimeErrors
order by ErrorDateTime desc 

The results:

We now have basic error information from our failed task. This is great for our Test Script Task, but what about the other tasks on our SSIS package? Do we have to create OnError event handlers for them as well? That could be a lot of work.

Remember that bubbling thing we mentioned at the beginning of this post? We can employ the fact that errors bubble up through the object model. When an error occurs in the Test Script Task, an error bubbles up to the Sequence Container named "Step 1 - Do some stuff". You can see it in this earlier screenshot:

To demonstrate, let's add error handling to the Sequence Container. First, stop debugging the SSIS package if you have not already done so. Then right-click the "Record Error" Execute SQL Task and click Copy to copy it to the clipboard. Click the Executable dropdown and select the "Step 1 - Do some stuff" Sequence Container:

Select the OnError event handler and click the link to create the OnError event handler. Then paste the "Record Error" Execute SQL Task onto the new OnError event handler canvas. Re-execute the package observing the new OnError event handler. Click the No button to cause the Test Script Task to fail.

Note the new OnError event handler fires, in addition to the Test Script Task OnError event handler:

Re-execute the query in SSMS to verify:

Note the subsequent execution (RunTimeMetricsID 10, in my case) records the error twice. Hmmm... this is not the desired behavior. I only want the error to show up once per instance. However, also note the information recorded is nearly identical - only the date is different (and then only by 183 milliseconds). So the 20:17:58.803 occurrence of the error is from the Test Script Task and the 20:17:58.990 occurrence is the same error bubbling into the "Step 1 - Do some stuff" Sequence Container's context and firing an identicaly-configured OnError event handler.

Again, this is not the desired behavior but it gives me an idea: Since I know errors bubble up the object model, all errors should eventually make it to the Package level. If I delete my existing OnError event handlers and create a single OnError event handler at the package level, I should then be able to record any and all errors - no matter where they occur in the SSIS package.

Let's do that. First, copy the "Record Error" Execute SQL Task to the clipboard. Click the Delete button to delete the Sequence Container's OnError event handler, then navigate to the Test Script Task's OnError event handler and click the Delete button there. Click the Yes button on the confirmation dialog each time:

Navigate to the SSISRunTimeMetrics (package) event handlers:

Create the event handler by clicking the link and paste the Execute SQL Task onto the canvas. Voila, you now have error handling for then entire package. Here we must also add package status logging. The quickest way to accomplish this is to return to the Control Flow and copy the "Log End of Package Execution" Execute SQL Task to the clipboard. Navigate to the package OnError event handler and paste the Execute SQL Task onto the canvas. Connect it to the "Record Error" script task as shown:

Double-click the task to open the editor. Rename the task "Log Failed Package Execution" and change the SQLStatement property to:

declare @Now datetime
set
@Now = GetDate()
Exec ssis.usp_RecordPackageEnd ?,@Now,'Failed'

Click the OK button to close the editor.

Re-execute the package to test, answering No to the dialog and failing the Script Task. Re-execute the query in SSMS to verify a single instance of the error is recorded:

Note the SourceName column contains the name of the task where the error originates, even though the error has bubbled through two layers of object model. This also holds for the ErrorNumber and ErrorDescription fields.

Task Status, Anyone?

It's helpful to be able to track performance at a granular level. When instrumenting an ETL process, I like to record additional (and different) metadata at the Task level.

Since containers contain tasks and groups can represent linked processes or related tasks, task metrics must include parent-task relationships. Execute the following script to build our first TaskMetrics table:

use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 end

-- create RunTimeErrors table...
if not exists(select s.name + '.' + t.name
              from sys.tables t
              inner join sys.schemas s on s.schema_id = t.schema_id
              where t.name = 'TaskMetrics'
               and s.name = 'ssis')
 Create Table ssis.TaskMetrics
 (TaskMetricsID int identity(1,1)
 ,RunTimeMetricsId int not null
 ,ParentTaskMetricsID int null
 ,TaskStartDateTime datetime null
 ,TaskEndDateTime datetime null
 ,SourceName varchar(255) null
 ,TaskStatus varchar(25) null)

As before, let's create stored procedures to manage inserting start-of-task information into the TaskMetrics table and updating end-of-task information:

use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordTaskMetrics'
           and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordTaskMetrics
 end
go

Create Procedure ssis.usp_RecordTaskStart
  @RunTimeMetricsID
int
 ,@ParentTaskMetricsID int = null
 ,@TaskStartDateTime datetime = null
 ,@SourceName varchar(255) = null
 ,@TaskStatus varchar(25) = null
As

begin

 -- set @TaskStartDateTime default...
 declare @Now datetime
 set @Now = GetDate()
 if @TaskStartDateTime is null
  set @TaskStartDateTime = @Now

 -- insert the run time errors data...
insert into ssis.TaskMetrics
 (RunTimeMetricsID
 ,ParentTaskMetricsID
 ,TaskStartDateTime
 ,SourceName
 ,TaskStatus)
output inserted.TaskMetricsID as 'Inserted_Task_ID'
values
 (@RunTimeMetricsID
 ,@ParentTaskMetricsID
 ,@TaskStartDateTime
 ,@SourceName
 ,@TaskStatus)

end

go

Now for the end-of-task update:

use SSISRunTimeMetrics
go

if exists(select s.name + '.' + p.name
          from sys.procedures p
          inner join sys.schemas s on s.schema_id = p.schema_id
          where p.name = 'usp_RecordTaskEnd'
           and s.name = 'ssis')
 begin
  Drop Procedure ssis.usp_RecordTaskEnd
 end
go

Create Procedure ssis.usp_RecordTaskEnd
 @TaskMetricsID
int
,@TaskEndDateTime datetime = null
,@TaskStatus varchar(25) = null
As

begin

-- set @TaskEndDateTime default...
declare @Now datetime
set @Now = GetDate()

if
(@TaskEndDateTime Is Null)
 set @TaskEndDateTime = @Now

-- update the task metrics data...
update ssis.TaskMetrics
Set TaskEndDateTime = @TaskEndDateTime
   ,TaskStatus = @TaskStatus
where TaskMetricsid = @TaskMetricsid

end

go

Instrumenting the Task 

We will use the new procedures to instrument the task.

We need some variables for these parameters. At a bare minimum, we need a variable to hold the Task Metrics ID returned from the ssis.usp_RecordTaskStart stored procedure. Open variables and create a new package-scoped, Int32 type variable named iTaskLoadID:

 

Drag an Execute SQL Task into the Sequence Container an configure the General page as shown:

Set the SQLStatement property to the follow script:

declare @Now datetime
set
@Now = GetDate()
exec ssis.usp_RecordTaskStart ?,NULL,@Now,'Do some stuff','Started'

Click the Parameter Mapping page and add a parameter. Set the Variable Name to User::iPackageLoadID and Parameter Name to 0 - accept the defaults for the remaining parameter properties. Click the Result Set page and add a new result. Set the ResultName to 0 and the Variable Name to User::iTaskLoadID. This will hold the TaskMetricsID value returned from the ssis.usp_RecordTaskStart stored procedure:

Click the OK button to close the editor. Next, let's create a way to update TaskMetrics table after a successful execution.

Drag an Execute SQL Task into the "Step 1 - Do some stuff" Sequence Container and connect it as shown:

 

Double-click the Execute SQL Task to edit it. Name it "Log Successful End of Task". Set the SQLStatement to:

declare @Now datetime
set
@Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,'Completed'

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task succeed. Execute the following query to observe the results:

select *
from ssis.TaskMetrics
order by TaskMetricsID desc

Note TaskMetricsID 2 completed while TaskMetricsID 1 started and then stopped.

We now need to add a way to track and record task error status. Add another Execute SQL Task to the Control Flow canvas and connect a Failed Precedence Constraint from the Sequence Container to the new Execute SQL Task as shown:

 

 Name the new Execute SQL Task "Log Failed End of Task". Set the SQLStatement to:

declare @Now datetime
set
@Now = GetDate()
exec ssis.usp_RecordTaskEnd ?,@Now,'Failed'

Add one parameter mapped to User::iTaskLoadID and set to Parameter Name 0. Test the new tasks by executing the SSIS package. When prompted, have the script task fail. Execute the query to observe the results:

You now have enough information to correlate the three tables. For example, we can view the results of the last execution with the following script:

select
  m.packageName
 ,m.packageStartDateTime
 ,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as 'packageRunTime'
 ,m.packageStatus
 ,t.SourceName
 ,t.TaskStartDateTime
 ,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as 'taskRunTime'
 ,t.TaskStatus
 ,e.ErrorNumber
 ,e.ErrorDescription
from ssis.RunTimeMetrics m
inner join ssis.RunTimeErrors e on e.RunTimeMetricsID = m.id
inner join ssis.TaskMetrics t on t.RunTimeMetricsID = m.id
where m.packageStartDateTime = (select Max(packageStartDateTime)
                                from ssis.RunTimeMetrics)

Those results appear as shown here:

This makes a fine source query for a SQL Server Reporting Services solution!

Conclusion

In this second part of ETL Instrumentation, we modified version metadata to indicate we have made changes; added package error logging and error status logging; and added task status logging.

Get the code!

:{> Andy

 

Published Saturday, November 10, 2007 7:41 AM by andy

Comments

Sunday, November 18, 2007 8:43 PM by Applied Business Intelligence

# SSIS Design Pattern - ETL Instrumentation, Part 3

Introduction This post is part of a series of posts on ETL Instrumentation. In Part 1 we built a database

Wednesday, December 12, 2007 7:34 PM by TaffyLewis

# re: SSIS Design Pattern - ETL Instrumentation, Part 2

I really enjoyed this article, but I do have one question.  What if you have many task in the Sequence Container.  How would you do a log of start task, then log end task, without having to create a pair or sql execution tasks?

Wednesday, December 12, 2007 7:35 PM by TaffyLewis

# re: SSIS Design Pattern - ETL Instrumentation, Part 2

I meant without having to creat many pairs of slq execution tasks for each task!

Thanks!

Wednesday, December 12, 2007 8:03 PM by andy

# re: SSIS Design Pattern - ETL Instrumentation, Part 2

Hi Taffy,

  Thanks!

  There are a couple approaches to ETL Instrumentation - each with advantages and disadvantages. The methodology I chose to explore here is the most straightforward, in my opinion. That makes it great for introducing the science of ETL measurement, but it also forces a particular design. That design compels one to break each process into granular chunks and place each chunk in a container of some kind. And that's a disadvantage.

  Another approach is to use EventHandlers to record this information. One advantage of the event-driven method is its transparency on the Control Flow. Because events "bubble" in SSIS, you can configure the database inserts and updates in OnPreExecute and OnPostExecute events at the package level.

  Then any object added to the Control Flow will automatically be captured with no additional coding required. I will cover this method in a later post.

  But, in answer to your question (finally): Yes. You would need to add Execute SQL Tasks at each step to capture the flow.

:{> Andy

Wednesday, December 26, 2007 12:48 AM by Applied Business Intelligence

# SSIS Design Pattern - ETL Instrumentation, Part 4

Introduction This post is part of a series of posts on ETL Instrumentation. In Part 1 we built a database

Anonymous comments are disabled