Welcome to VSTSC's Community Server Sign in | Join | Help

SSIS Expression Language and Dynamic Property Expressions

Introduction

This post is part of a series titled An Introduction to the SSIS Expression Language.

In this post, I demonstrate the SSIS Expression Language and Dynamic Property Expressions.

To build the demo project described in this article you will need SSIS 2005 and the AdventureWorks sample database (which can be downloaded at http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=4004).

Build The Demo Project

If you have not already done so, create an SSIS project named ExpressionLanguageDemo.

Once the project is created, open Solution Explorer and rename Package.dtsx - change the name to DynamicProperties.dtsx. If you've already created the project, right-click the SSIS Packages folder in Solution Explorer and click New Package - and rename that package to DynamicProperties.dtsx. When prompted, rename the package object as well. (I think you should always do this - I always answer this message box by clicking the Yes button.)

A New Source

Before we begin connect to your instance of SQL Server - I'm using (local) - and execute the following T-SQL statement:

use master
go

if
not exists(select name
              from sys.databases
              where name='TestSource')
 Create Database TestSource
go

use
TestSource
go

if not exists(select name
              from sys.schemas
              where name = 'Person')
 begin
  declare @Sql varchar(100)
  set @Sql = 'create schema Person'
  exec(@Sql)
 end
go

if
not exists(select name
              from sys.tables
              where name='Contact')
 begin
  select top 1000
   ContactID
  ,Title
  ,FirstName
  ,
MiddleName
  ,LastName
  ,EmailAddress
  into
Person.Contact
  from AdventureWorks.Person.Contact
 end
else
 begin
  truncate table Person.Contact
  insert into Person.Contact
  select top 1000 
  
ContactID
  ,Title
  ,FirstName
  ,
MiddleName
  ,LastName
  ,EmailAddress
  from AdventureWorks.Person.Contact
 end

This script creates a new database with a table named dbo.Contact, which it populates from the AdventureWorks database. We'll use this other data source to demonstrate a cool use of dynamic property expressions.

Manage the Connection

Rename the (local).AdventureWorks Connection Manager to MySource:

Click MySource and press the F4 key to display Properties. Highlight the ConnectionString property value and copy it to the clipboard.

Right-click anywhere in the white-space of the Control Flow and click Variables. Create a new variable named ConStr. Check to make sure the Scope of the Varibale is DynamicProperties (the package) and set the Data Type to String. In the Value textbox, paste the contents of the clipboard (the connection string of the MySource connection manager).

Back To Our Regularly Scheduled Flow... 

Drag an Execute SQL Task onto the Control Flow and double-click the Execute SQL Task to open the editor. Drag an OLE DB Source onto the canvas. Double-click the OLE DB Source to open the editor. Set the Connection property to MySource. 

Set the Data Access Mode to SQL Command and enter the following T-SQL statement into the SQL Command Text textbox:

Select Count(*)
From Person.Contact

Set the ResultSet property to Single Row:

Click the Result Set page and then click the Add button. Set the Result Name to 0. Drop down the Variable Name column and click <New Variable>:

When the Add Variable form displays, set the Name to CountVal, Value Type to Int32, and the Value to 0:

Click OK to close the Add Variable form:

 

Click OK again to close the editor.

Follow the Script

Drag a Script Task onto the Control Flow and connect a precedence constraint (green arrow) from the Execute SQL Task to the Script Task. Double-click the Script Task to open the editor.Click the Script page and set the ReadOnlyVariables property to CountVal. Set the ReadWriteVariables property to ConStr: 

 

Click the Design Script button to open the Visual Studio for Applications (VSA) editor.

Replace the code with the following:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

 Public
Sub Main()

  Dim sConStr As String = Dts.Variables("ConStr").Value.ToString
  Dim iCountVal As Integer = Convert.ToInt32(Dts.Variables("CountVal").Value)
  Dim sMsg As String

  sMsg = "ConnectionString: " & sConStr
  sMsg = sMsg & vbCrLf &
"Count: " & iCountVal.ToString

  MsgBox(sMsg)

  Dts.Variables(
"ConStr").Value = Strings.Replace(sConStr, "AdventureWorks", "TestSource")

  Dts.TaskResult = Dts.Results.Success

 End
Sub

End
Class

This script reads the value of the ConStr and CountVal variables, then displays them in a message box. Finally, the script changes the value of the ConStr variable, replacing the AdventureWorks database with the TestSource database. We'll use this change in a bit. 

Close the VSA editor and click the OK button to close the Script Task editor.

Go Back, Jack, Do It Again...

Copy the Execute SQL Task and paste it in the Control Flow. Connect the Script Task to the pasted Execute SQL Task (Execute SQL Task 1).

Drag another Script Task onto the Control Flow. Connect a precedence constraint from Execute SQL Task 1 to Script Task 1 and double-click it to open the editor. As before, add ConStr to the ReadOnlyVariables property and CountVal to the ReadWriteVariables property. Click the Design Script button and paste the same code as before into this Script Task.

I can hear you thinking: "Andy, why not just copy and paste the Script Task?" That's an excellent question! Script Tasks are very fickle on the clipboard. On 32-bit systems, you get better results. On x64, it gets ugly. The task fails with the error:

Error: Precompiled script failed to load. The computer that is running the Script Task does not have a compiler capable of recompiling the script. To recompile the script, see the Microsoft Knowledge Base article, KB931846 (http://go.microsoft.com/fwlink/?LinkId=81885).

Ugly.

One Last Thing

Ok. We're counting the rows in the AdventureWorks.Person.Contact table and displaying that value in a message box. Then, we're altering the connection string contained in the ConStr variable to point from AdventureWorks to TestSource, and then re-executing the Count query. Then we're displaying the ConStr variable value and the Count value again.

We're almost done. Here's where we use dynamic property expressions.

Click the MySource Connection Manager and press the F4 key to display the properties. Click the the Expressions property, then the ellipsis (...) button in the value textbox. The Property Expressions Editor form displays. Click the Property dropdown and select the ConnectionString property. In the Expression textbox, type @ConStr.

This maps the value contained in the ConStr variable into the ConnectionString property of the MySource Connection Manager - dynamically. Change the variable value and the ConnectionString changes. It's that simple - and that cool.

Test It!

Execute the package in debug mode to view the results:

Conclusion

Changing a connection string dynamically is but one use of Dynamic Property Expressions, albeit a very powerful use.

:{> Andy

Posted by andy | 1 Comments

SSIS Expression Language and the Conditional Split Transformation

Introduction

This post is part of a series titled An Introduction to the SSIS Expression Language.

In this post, I demonstrate the SSIS Expression Language and the Conditional Split Transformation.

To build the demo project described in this article you will need SSIS 2005 and the AdventureWorks sample database (which can be downloaded at http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=4004).

Build The Demo Project

If you have not already done so, create an SSIS project named ExpressionLanguageDemo.

Once the project is created, open Solution Explorer and rename Package.dtsx - change the name to ConditionalSplit.dtsx. If you've already created the project, right-click the SSIS Packages folder in Solution Explorer and click New Package - and rename that package to ConditionalSplit.dtsx. When prompted, rename the package object as well. (I think you should always do this - I always answer this message box by clicking the Yes button.)

Drag a Data Flow Task onto the Control Flow and click the Data Flow tab to edit. Drag an OLE DB Source onto the canvas. Double-click the OLE DB Source to open the editor. As in the post entitled SSIS Expression Language and Variables, click the New button to create a new Connection Manager to the AdventureWorks database.

Set the Data Access Mode to SQL Command and enter the following T-SQL statement into the SQL Command Text textbox:

Select
  ContactID
 ,Title
 ,FirstName
 ,MiddleName
 ,LastName
 ,EmailAddress
From Person.Contact

Click OK to close the editor.

On One Condition...

Drag a Conditional Split transformation onto the Data Flow canvas and connect the OLE DB Source Adapter to it:

Double-click the Conditional Split to open the editor. Expand the NULL Functions folder in the Operations list (upper right). Drag an ISNULL function into the Condition column of the first row in the Conditions grid. Expand the Columns folder in the Values list (upper left). Click and drag the Title column from the list onto the <<expression>> placeholder in the ISNULL expression:

The default name for a Condition is "Case n" where n is the row number of the Condition in the Condition grid.

Next, drag and drop the ContactID column into the second row's Condition column. Complete the expression so that it reads: ContactID <= 5000. Rename the Outputs to NullTitles and SmallContactIDs respectively:

What we've done here is define a couple of outputs. One of the outputs will contain rows where the Title is NULL. The other will contain - this is important, pay attention - rows where the Title is not NULL and the ContactID is less than or equal to 5000. Why is this? It's because rows with NULL Titles are redirected to the NullTitles output first. The ContactID value of these rows is never evaluated to see if it's less than or equal to 5000. If neither condition applies to a row, that row is sent to the Conditional Split Default Output. This operates a lot like a Switch statement in C# or a Select Case statement in VB, with the Default Output acting like the Else branch. You can adjust the order of condition evaluation using the spinner buttons on the right:

Click OK to close the Conditional Split editor.

Terminate It! 

Drag a Multicast transformation onto the Data Flow task surface. We'll use the Multicast to terminate a Data Flow Path. Drag a Data Flow Path from the Conditional Split transformation to the Multicast. When you do this, you'll be prompted to select an output from the Conditional Split to connect to the Multicast input:

After selecting an Output, click Ok to close:

The title of the output you selected appears in the label.

Conclusion

 Expressions are used to branch data row flow inside the Data Flow Task with the Conditional Split transformation.

:{> Andy 

Posted by andy | 1 Comments

SSIS Expression Language and the Derived Column Transformation

Introduction

This post is part of a series titled An Introduction to the SSIS Expression Language.

In this post, I demonstrate the SSIS Expression Language and the Derived Column Transformation.

To build the demo project described in this article you will need SSIS 2005 and the AdventureWorks sample database (which can be downloaded at http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=4004).

Add To The Demo Project

If you have not already done so, create an SSIS project named ExpressionLanguageDemo.

Once the project is created, open Solution Explorer and rename Package.dtsx - change the name to DerivedColumn.dtsx. If you've already created the project, right-click the SSIS Packages folder in Solution Explorer and click New Package - and rename that package to DerivedColumn.dtsx. When prompted, rename the package object as well. (I think you should always do this - I always answer this message box by clicking the Yes button.)

Drag a Data Flow Task onto the Control Flow. Click the Data Flow tab to begin editing, and drag an OLE DB Source onto the canvas. Double-click the OLE DB Source to open the editor. As in the post entitled SSIS Expression Language and Variables, click the New button to create a new Connection Manager to the AdventureWorks database.

Set the Data Access Mode to SQL Command and enter the following T-SQL statement into the SQL Command Text textbox:

Select
  ContactID
 ,Title
 ,FirstName
 ,MiddleName
 ,LastName
 ,EmailAddress
From Person.Contact

Click OK to close the editor.

Learning To Derive

Drag a Derived Column Transformation onto the Data Flow canvas and connect the OLE DB Source Adapter to it using a Data Flow Task:

 

Double-click the Derived Column transformation to open the editor.

Let's start somewhat in the middle. The Derived Column column of the editor contains a list of all the columns flowing into the transformation, plus an entry for "<add as new column>": 

 

If you select one of the columns flowing into the transformation, you can use an Expression to modify the value in some way. Why would you do this? Let's look at an example.

The Replacement

From the Derived Column dropdown, select "Replace 'MiddleName'":

Note a few properties are automatically populated when you make the selection. The Derived Column Name is set to "MiddleName." This makes sense - your selection indicates you want to replace the current values in the MiddleName column. The Data Type column contains the equivalent SSIS data type for the MiddleName field in the Person.Contact table.

Data Type coercion in SSIS is interesting - there are quite a few moving parts. The MiddleName field is an NVARCHAR(50) data type in the Person.Contact table. The connection between SSIS and the AdventureWorks database is made using an OLE DB Connection Manager. There is some data type corecion in the database - OLE DB provider - SSIS connection layers. By the time the data type makes it into the SSIS data flow pipeline, it is represented as a DT_WSTR, or Unicode string. The Length column is also populated with 50.

The Expression column is a blank slate upon which you write the code that controls value. You can hard code a value, such as a space, by enclosing a space inside double-quotes:

" "

You can hard code a Null value by using a NULL function. Expand the NULL Functions folder in the upper right of the Derived Column transformation and drag the Null function for the DT_WSTR data type into the Expression textbox, replacing the <<length>> placeholder with the length of the column (50):

NULL(DT_WSTR, 50)

Or you can conditionally replace the MiddleName value. In this case, we'll replace Null middle names with a space. If not Null, we'll allow the existing value to continue in the data flow pipeline. To do this, use the ternary syntax:

[Test Condition] ? [True Operation] : [False Operation]

To replace Null MiddleName values with a space, use the following expression:

IsNull(MiddleName) ? " " : MiddleName

Add One

In the second row, let's add a new column. In the Derived Column dropdown, select "<add as new column>". Note the only column auto-populated is the Derived Column Name column, and it contains "Derived Column 1". Rename this column "FullName." Add the following expression:

FirstName + (IsNull(MiddleName) ? " " : MiddleName) + LastName

Note the Data Type DT_WSTR for both columns. Suppose we want to change the data type? We can cast the data type for added columns. Expand the Type Casts function folder and drag a DT_STR cast to the beginning of the Expressions textbox - replacing the <<length>> placeholder with 150 (the summed lengths of the columns) and the <<code_page>> placeholder with 1252 for en-us (for US English, feel free to substitute with your code page if different) - to convert the value from Unicode to an ASCII string. The FullName expression should now read:

(DT_STR, 150, 1252)(FirstName + (IsNull(MiddleName) ? " " : MiddleName) + LastName)

Note the Data Type column is now a String:

You can add a DT_STR Cast statement to the expression for the MiddleName, but it doesn't change the Data Type. Why can't we change the data type for existing columns in the Derived Column transformation? We're replacing the values, not changing the data type. Is it impossible to change the data type of an existing column in the Derived Column? Let's put it this way: It is not possible to convert the data type of a column when you are merely replacing the value. You can, however, accomplish the same goal by creating a new column in the Data Flow.

To demonstrate, configure the third row to add a new column. Name the new column StrMiddleName and set the expression to:

(DT_STR, 150, 1252)MiddleName

Other options: You can also use a Data Conversion Transformation to change the data type of a column in the data flow pipeline, but the Data Conversion also adds a column to the data flow pipeline. You can also cast the value in T-SQL from the source (in the OLE DB Source Adapter).

Conclusion

The Derived Column Transformation is very flexible and powerful. The mathematical and string parsing functions are more easily understood by folks with a developer background. The ETL-specific functions - such as those demonstrated in this post: NULL fields, type-casting, and ternary conditional operations - are not as commonly understood.

:{> Andy

Posted by andy | 1 Comments

SSIS Expression Language and Variables

Introduction

This post is part of a series titled An Introduction to the SSIS Expression Language.

In this post, I demonstrate the SSIS Expression Language and Variables.

To build the demo project described in this article you will need SSIS 2005 and the AdventureWorks sample database (which can be downloaded at http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=4004).

Build The Demo Project

If you have not already done so, create an SSIS project named ExpressionLanguageDemo.

Once the project is created, open Solution Explorer and rename Package.dtsx - change the name to Variables.dtsx. When prompted, rename the package object as well. (I think you should always do this - I always answer this message box by clicking the Yes button.)

Make Some Variables 

Click the SSIS dropdown menu and select Variables.

When the Variables window displays, click the Add Variable button (first button on the left) to create a new variable with the following properties:

  • Name: SQL
  • Scope: Variables
  • Data Type: String
  • Value: Select 1 As One 

 

I included a syntactically-correct Transact-SQL statement as the default value. Why? Later I plan to use this statement in an Execute SQL Task. I chose to include a valid default to avoid validation warnings and errors.

Validation: SSIS provides design-time and run-time validation of components and settings. In general, this is a good thing as it catches real and potential errors before the SSIS package is executed. SSIS also provides a means of ignoring design-time validation warnings and errors via the DelayValidation property.

Add Some Tasks

Drag a Script Task onto the Control Flow and double-click to open the editor. If you're using SSIS2008, set the ScriptLanguage property on the Script page to Microsoft Visual Basic 2008. (If you're using SSIS 2005 you have no other option.) Add SQL to the list of ReadOnlyVariables and click the Design Script (Edit Script in SSIS 2008) button to open the script engine editor.

Replace the code in Public Sub Main() with the following:

Public Sub Main()
  Dim sSQL As String = Dts.Variables("SQL").Value.ToString
  MsgBox(
"SQL: " & sSQL)
  Dts.TaskResult = Dts.Results.Success
End Sub

Close the Visual Studio for Applications editor and click OK to close the Script Task editor.

Execute the package and observe the result. You should see a message box displaying the value of the SQL variable:

Just the SQL, Ma'am

Next, add an Execute SQL Task to the Control Flow canvas and connect a Precedence Constraint from the Script Task to the Execute SQL Task. Double-click the Execute SQL Task to open the editor. Leave the ConnectionType property set to OLE DB. Click the dropdown for the Connection property and click "<New connection...>":

 

When the Configure OLE DB Connection Manager form displays, select a connection to the AdventureWorks database if one exists in your Data Connections list. If not, click the New button. Configure the connection to your server - I use (local) for my default local instance - and the AdventureWorks database:

Click OK until you return to the Execute SQL Task editor. You have three options for the SQLSourceType property: Direct input, File Connection, and Variable. There is a limit to the number of characters you can enter using direct input. I'm not sure but I think it's around 4,000 or so. I've hit this limit once - and in a big way: the client required 4 MB of dynamic SQL. I would add italics to that statement if it didn't make it look so funny. After unsuccessfully lobbying for a better approach, I managed the dynamic SQL in a script task that wrote it to a file, and used a file connection SQLSourceType to execute it.

For our example, set the SQLSourceType to Variable. Then set the SourceVariable property (which was hidden until you selected the SQLSourceType Variable) to User::SQL.

Click OK to close the editor and execute the package to test. Click OK when the message box displays. The Execute SQL Task should succeed:

 

Make it Dynamic

First, we'll add some more variables and build a dynamic SQL statement the old school way.

Create the following package-scoped String data type variables [with default values]:

  • SelectClause [SELECT Title, FirstName, LastName, EmailAddress]
  • FromClause [FROM Person.Contact]
  • WhereClause [WHERE LastName IN ('Smith','Jones')]

Open the Script Task editor and navigate to the Script page. Move the SQL variable from the ReadOnlyVariables property to the ReadWriteVariables property. Add SelectClause, FromClause, and WhereClause to the ReadOnlyVariables property:

Click the Design Script button and replace the code in Public Sub Main() with the following:

Public Sub Main()
 
Dim sSelect As String = Dts.Variables("SelectClause").Value.ToString
  Dim sFrom As String = Dts.Variables("FromClause").Value.ToString
  Dim sWhere As String = Dts.Variables("WhereClause").Value.ToString
  Dim sSQL As String = sSelect & " " & sFrom & " " & sWhere
  Dts.Variables(
"SQL").Value = ssql
  MsgBox(
"SQL: " & sSQL)
  Dts.TaskResult = Dts.Results.Success
End Sub

Close the VSA editor and click OK to close the Script Task editor. Execute the package to examine the results:

Express Yourself!

Let's look at another way to accomplish the same result, this time using expressions.

First, edit the Script Task Public Sub Main() code to read:

Public Sub Main()
  Dim sSQL As String = Dts.Variables("SQL").Value.ToString
  MsgBox(
"SQL: " & sSQL)
  Dts.TaskResult = Dts.Results.Success
End Sub

This returns the script task functionality to simply displaying the value of the SQL variable.

In the Variables window, click on the SQL variable and press the F4 key to display the properties of the SQL variable. Change the EvaluateAsExpression property to True and enter the following expression in the Expression property:

@SelectClause + " " + @FromClause + " " + @WhereClause

This changes the way the SQL variable works. It no longer contains the value specified in the Value column of the Variables window. Instead, the value of the SQL variable is determined by the expression, which contains the SelectClause, FromClause, and WhereClause variables.

Conclusion

The SSIS Expression Language can be used with variables to dynamically set the value of one variable from one or more other variables.

:{> Andy

Posted by andy | 2 Comments

An Introduction to the SSIS Expression Language

Introduction

The SSIS Expression Language is one of the steeper slopes on the SSIS learning curve. Many database professionals have never been exposed to the syntax. In this series, I provide examples that demonstrate common uses for the SSIS Expression Language.

The examples are built in SSIS 2005. Why? Not everyone has access to SSIS 2008 at this time, and the examples work in both platforms.

Syntax 

Books Online has some good information about the SSIS Expression Language.  This is an excellent place to start.

For starters, the syntax shares similarities with what I call "curly-brace" languages: C++, C#, Java, etc. It's obvious (to me at least) from the "=" operators:

  • == for comparison
  • = for assignment

Literals

Literals often prove challenging for folks unfamiliar with the syntax. Of particular interest is escape characters. The two escape sequences I'm most often asked about are double quotes and the backslash. As in curly-brace languages, the backslash is the escape indicator. This makes Windows paths loads of fun in SSIS.

For example, you think you're storing the path C:\results\new\andysdata.csv, but what you're telling SSIS is C: [Carriage Return] esults [New line] ew [Alert] ndysdata.csv. Yeah. That's gonna make for some interesting error messages! So how to address it? C:\\results\\new\\andysdata.csv works, as two backslashes together are the escape sequence for a single backslash.

If you want to include double-quotes in an expression, the escape sequence is backslash + " - \".

Casting

Casting is also unintuitive to the uninitiated. SSIS implicit conversions throw me every time. SSIS has its own set of data types, and the different data providers have their set of data types. Some interesting things happen where these data types meet.

To cast to a string (or SQL Server varchar) data type, preface the value with (DT_STR, [len], [code page]). For example, if you want to cast the integer 42 to a 2-character, en-us string; the expression is (DT_STR, 2, 1252) 42.

Conclusion

There's more to come in this series. I hope you enjoy it!

:{> Andy

Posted by andy | 2 Comments

My 2008 PASS Sumit Presentations

I am honored to be selected to present at the 2008 PASS Summit!

I'm presenting two sessions: Test-Driven Development For Database Professionals and SSIS Scripting. I've been working on the presentations recently and realize there will be a few minutes left in the SSIS Scripting talk unless I talk slow. (Some people say I talk slow enough already - and that I have a Southern accent. Can you believe it?)

Anyway, I open the floor to you good people - the readers of my blog: What cool (or useful, or challenging) SSIS Scripting demo would you like to see? We'll make it a contest. Leave a comment here (keep in mind I moderate comments so they will not appear automatically) and I'll pick one.

I'll have to come up with some cool gift to award the winner.

:{> Andy

Posted by andy | 1 Comments

SSIS Design Pattern - Collect Enterprise SQL Server Database Metadata With SSIS

Introduction

A few months back I received an email from a nice lady who was beginning to work with SSIS. She was trying to use SSIS to automate metadata collection in her enterprise and she had made a good start in the package she attached to the email. She had a few questions so I opened the package and began poking around some to answer them. I realized it would take me a lot longer to type out answers to her questions than to simply build the SSIS package for her (or at least make a good start). So I did. This post is about that package.

I've built similar stuff in the past to monitor the schemas of data warehouse sources. "Why would you do such a thing, Andy?" I know this will come as a shock to you: Sometimes developers and DBAs make changes to the schemas of data warehouse sources and don't tell the data warehouse people. Ok, I made that part up - that never happens. But think about how you could address the issue if it ever happened. Hypothetically.

First Things First

There are a couple approaches to collecting SQL Server database metadata. My preference is to automate the process to the point that I can drop it almost anywhere and get results with a minimal amount of tinkering and tweaking. For me, this means my development process is iterative because I never get things right the first time. Ever.

This Version

The current version of this package is striped 0.4. The version I write about here is version 0.1. It serves to introduce the principles and functionality in a less complex way - allowing you the Reader to take the project in any direction you deem cool. But that means there's stuff missing from this SSIS package that I would normally include. Stuff like error handling and logging, for example.

A Place For My Data 

Before I started I decided to store the data in a SQL Server database. You don't need the database for the SSIS part of the project described in this post. You can simply skip the tasks that write the metadata to the database. But if you skip them and I decide to write about version 0.2, you will come back and want the database scripts. You can download them here (free registration required).

In this post, I am not going to focus on the database I designed. Like the SSIS package, this is version 0.1 of the database. There is no attempt at referential integrity in this version. It is a simple, single-pass effort at designing a landing zone for this data.

A. The Servers

Servers? Raise Your Hands Please

Create a new SSIS project and name it EnterpriseDBMetadata. Name the package EnterpriseDBMetadata.dtsx.

Create the following package-scoped variables (in alphabetical order):

  • ColumnDataType (String)
  • ColumnName (String)
  • ColumnNames (Object)
  • ColumnObjectID (Int32)
  • DatabaseId (Int32)
  • DatabaseName (String)
  • DatabaseNames (Object)
  • DatabaseObjectId (Int32)
  • saSQLServers (Object)
  • SchemaId (Int32)
  • SchemaName (String)
  • SchemaNames (Object)
  • SchemaObjectId (Int32)
  • SQLServerId (Int32)
  • SQLServerInstanceName (String)
  • TablesAndViewsId (Int32)
  • TablesAndViewsName (String)
  • TablesAndViewsNames (Object)
  • TablesAndViewsObjectId (Int32)
  • TablesAndViewsType (String)

 

Add a Script Task to the Control Flow and rename it Enumerate SQL Server Instances. Editing the Script Task, add saSQLServers to the ReadWriteVariables property:

 

Click the Design Script button to open the Visual Studio for Applications (VSA) script designer and, in Project Explorer, right-click References to add references to the Microsoft.SqlServer.Smo, Microsoft.SqlServer.SmoEnum, and Microsoft.SqlServer.SqlEnum assemblies:

 

Next add the following VB.Net code to the script designer:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports Microsoft.SqlServer.Management.Smo
Imports Microsoft.SqlServer.Management.Common

Public Class ScriptMain

   Public Sub Main()

      Dim oSmo As New SmoApplication
      Dim dtSQLServers As DataTable = oSmo.EnumAvailableSqlServers()
      Dim iSQLServerCount As Integer = dtSQLServers.Rows.Count - 1
      Dim i As Integer = 0
      Dim saSQLServers() As String
      Dim sSQLServerInstanceName As String

      ReDim saSQLServers(iSQLServerCount)

      For Each row As DataRow In dtSQLServers.Rows
         sSQLServerInstanceName = row(
"Name").ToString
         'MsgBox(sSQLServerInstanceName)
         saSQLServers(i) = sSQLServerInstanceName
         i += 1
      Next

      Dts.Variables("saSQLServers").Value = saSQLServers

      Dts.TaskResult = Dts.Results.Success

   End Sub

End
Class

This is a handy piece of code. It basically creates a list of all the SQL Server instances it can locate on your network and shoves that list into an SSIS variable called saSQLServers.

Shredding The List

Steve Jones - my friend, fellow SQL Server MVP, and editor of SQL Server Central - was kind enough to publish a recent article I wrote about using the Foreach Loop Container to shred object variables. You can read more about that here. Shredding is just a fancy word (we call them "$3 words" here in Farmville) for reading individual items in a collection.

Next drop a Foreach Loop Container onto the Control Flow and connect the Script Task to it with a Success Precedence Constraint:

 

Double-click the Foreach Loop Container to open the editor. On the General page rename it ForEach SQL Server Instance. Select the Foreach From Variable Enumerator on the Collection page, and then select the saSQLServers variable:

 

On the Variable Mappings page, map the SQLServerInstanceName variable to Index 0. There's only one column of items in my saSQLServers variable - it's an SSIS object variable, but it contains a single-dimension String array:

 

Click the Ok button to close the Foreach Loop Container editor.

Let's take a moment to talk about what we did. We have a variable named saSQLServers that is populated by the Script Task with a list of all the SQL Server instances it can identify on the network. The Foreach Loop Container shreds this variable, walking through each value - each SQL Server instance name, in this case - one at a time. Each pass, it pushes the value of one of those SQL Server instance names into a variable called SQLServerInstanceName.

In and of itself, this is pretty cool.

The Destination Connection

Now we need to connect our SSIS package to SQL Server so we can store the name of the servers we can reach. Right-click inside the Connection Managers window at the bottom of the Control Flow and select New OLE DB Connection:

 

When the Configure OLE DB Connection Manager form displays, click the New button to display the Connection Manager editor. In the Server Name dropdown, enter the name of the SQL Server instance to which you deployed the EnterpriseDBMetadata scripts. In the "Select or enter a database name" dropdown, select EnterpriseDBMetadata:

I always recommend testing the connection. Click the Test Connection button to verify connectivity, then click the Ok button to close the Connection Manager editor. Then click the Ok button again to close the Configure OLE DB Connection Manager.

Rename this connection "EnterpriseDBMetadata.OLEDB".

The Source Connection

I can hear you thinking "Gosh Andy, you did that backwards! Most people start at the source and then talk about the destination." Yeah, I know. But I'm an engineer at heart and engineers are not normal people.

How many source Connection Managers do we need?

"That's a trick question Andy, I know it! But I think we need one for each SQL Server instance out there."

You are partially correct. It is a trick question - you're right about that. We do need one source Connection Manager for each SQL Server instance, but we are only going to be connected to one SQL Server instance at a time as we iterate through our Foreach Loop Container. So we can get away with one source Connection Manager if we can only find a way to aim it at the individual SQL Server instance we wish, when we wish it.

So let's do that.

Create another OLE DB Connection Manager as you did before. This time configure it to connect to a local or developement instance to which you can connect using Windows Authentication. Configure the database name as before and click the Ok buttons to end the Connection Manager creation.

Rename this Connection Manager "SQLServerInstance.OLEDB".

Right-click the SQLServerInstance.OLEDB Connection Manager and select Properties. Click on the Expressions Property and then click the ellipsis in the Value textbox for the Expressions Property:

 

When the Property Expressions form displays, select the ServerName property from the Property dropdown:

Click on the ellipsis on the Expression textbox to display the Expression Builder form: 

Expand the Variables list and drag the SQLServerInstanceName variable into the Expression textbox as shown above. I recommend always checking the value of an expression. To do so, click the Evaluate Expression button. Click the Ok button to close the Expression Builder, then click the Ok button again to close the Property Expressions editor.

You've just done something else that's pretty cool, so let's reflect: We have a variable named saSQLServers that is populated by the Script Task with a list of all the SQL Server instances it can identify on the network. The Foreach Loop Container shreds this variable, walking through each value - each SQL Server instance name, in this case - one at a time. Each pass, it pushes the value of one of those SQL Server instance names into a variable called SQLServerInstanceName. And each time it changes the value of the SQLServerInstanceName variable, the SQLServerInstance.OLEDB Connection Manager's connection properties are updated to now point to that instance of SQL Server.

How cool is that?

Show Me!

I love lamp. And I like Script Tasks. I use Script Tasks a lot (lamps too!). One cool use of them is to show me the value of a variable that's being changed by a Foreach Loop Container. Since we have a Foreach Loop Container that's changing a variable, allow me to demonstrate.

Drag a Script Task into the Foreach Loop Container and rename it "Show SQL Server Instance Name". Open the editor and add SQLServerInstanceName to the ReadOnlyVariables property:

 

Add these two lines of code to Sub Main():

Dim sSQLServerInstanceName As String = Dts.Variables("SQLServerInstanceName").Value.ToString
MsgBox(
"SQL Server Instance Name: " & sSQLServerInstanceName)

All this script does is read the value currently stored in SQLServerInstanceName and display it in a message box:

Close the VSA editor and click the Ok button to close the Script Task editor.

Test Run Time!

You can run the package now to see which servers it finds. I did. That's how I generated the last image.

Store It

Next, drag an Execute SQL Task into the ForEach SQL Server Instance Container and connect the Script Task to it using a Success precedence constraint. Name the Execute SQL Task "Add the SQL Server name" and double-click it to open the editor.

Select the EnterpriseDBMetadata.OLEDB connection in the Connection property dropdown. Click the SQLStatement property and then the ellipsis in the Value. When the Enter SQL Query form displays, paste this T-SQL into the textbox:

If
Not Exists(Select SQLServerInstanceName
              from dbo.SQLServerInstances
              where SQLServerInstanceName = ?)
 begin
  Insert Into dbo.SQLServerInstances
  (SQLServerInstanceName)
  Output inserted.Id
  Values(?)
 end
Else
 begin
  Select Id
  from dbo.SQLServerInstances
  where SQLServerInstanceName = ?
 end

See the question marks? This is a parameterized query. The question marks are placeholders for parameters. The collection of parameters comprises an array, and the array is zero-based. This means the first question mark - that one in the first If statement - maps to parameter 0. The second maps to parameter 1, the third to parameter 2. There's a pattern emerging here... You get the picture.

To map these parameters in the Execute SQL Task, click on the Parameter Mapping page:

 

Map parameters by clicking the Add button to add a parameter mapping. Since we have three question marks in our T-SQL, you need three parameters. The first parameter maps to the SQLServerInstanceName variable, so I select that variable from the Variable Name dropdown. I am supplying the variable to the query, so it's an Input (Direction). It's a String data type and Strings map to the VARCHAR Data Type in the OLEDB provider I'm using in the ConnectionType property. I supply the ordinal of the parameter to which I am mapping for Parameter Name. In other words, this is the first question mark, so the Parameter Name is 0. I leave the Parameter Size set to the default (-1).

Now it just so happens that all three of our question marks map to the same SSIS variable value: SQLServerInstanceName. That isn't always the case. But since it is here, I configure the remaining two parameters identically to the first, the only exception is the Parameter Name --> ordinal values.

The T-SQL in this Execute SQL Task returns a value. If the dbo.SQLServerInstances table does not contain a value for the current SQLServerInstanceName, the current SQLServerInstanceName is added. Did you catch the Output clause in the Insert statement? Output inserted.Id returns the value of the newly inserted Id, which is an Identity(1,1) column in the dbo.SQLServerInstances table. The Output clause rocks. It's new in T-SQL 2005 and allows us to write a single statement to perform the insert and return the inserted row's identity value. Note that if the SQLServerInstanceName exists in the dbo.SQLServerInstances table, the Id for that row is returned.

Either way, you get an Id value back from this statement. What do you do with it? Return to the General page of the Execute SQL Task editor and change the ResultSet Property value from "None" to "Single row". This tells the Execute SQL Task to expect a row back from the T-SQL statement.

We want to capture this Id value for later (patience! You'll see...). Now that we've told the Execute SQL Task to expect a single row from the statement's execution, click on the Result Set page to configure it.

Similar to the parameters collection, single-row resultsets are a zero-based array and use ordinals to map various columns in the row to SSIS variables. Click the Add button to add a new ResultSet. Change the Result Name to 0 and select the SQLServerId variable:

Let's review: We configured the Execute SQL Task to find or insert a SQLServerInstanceName into the dbo.SQLServerInstances table and return the Id of the row to us. We used a parameterized query for this. Why? Because it's inside a Foreach Loop Container and the SQLServerInstanceName value is going to change with each iteration (pass) through the loop. We want to add them all to the dbo.SQLServerInstances table, one at a time. Make sense? Good. We also want to store the Id value for some reason Andy won't tell us yet. Bad Andy.

The General page of the configured Execute SQL Task looks like this:

Click the Ok button to close the Execute SQL Task editor.

A Pattern Emerges...

What we've built so far is actually the template for everything else in this package. The package is a lot more complex than this, but it will help you immensely if you realize that this piece of functionality is merely repeated with minor tweaking to build the remainder of the package.

Now. You can copy and paste a lot here if you want to. And I don't mind if you do so long as you promise to build really good tests. You should build really good tests anyway but testing SSIS's a topic for another post. My point is simple: You can introduce a host of errors - some of them very interesting (which is another way of saying "hard to find") - cutting and pasting. So be careful.

The pattern looks like this:

 

That's it really. Looks simple when you look at it this way. That's because it is simple. It just takes work to implement this in SSIS. Or in any other language or platform. Surprise! We call this "software development". Smile

B. The Databases

Next drag another Execute SQL Task into the ForEach SQL Server Instance Loop Container. Connect the "Add the SQL Server name" Execute SQL Task to the new one using a Success precedence constraint and rename the new Execute SQL Task "Get DatabaseNames":

Double-click the new Execute SQL Task to open the editor. Set the Connection property to the SQLServerInstance.OLEDB Connection Manager. Set the SQLStatement property to the following T-SQL statement:

select name, Database_Id
from master.sys.databases
where name not in ('master','msdb','model','tempdb')

This statement queries the master.sys.databases table for a list of databases and their Database_Id values, explicitly excluding system databases by name. Now there are other ways to explicitly exclude system databases and you are welcome to add them to the comments for this post. I love learning new stuff - teach me something! (In Farmville, we'd say "Learn me something". We talk funny. I know.)

We do not use any parameters in this statement. It is dynamically aimed at the current instance of SQL Server by the dynamic expression we defined for the ServerName property or the SQLServerInstance.OLEDB Connection Manager. So this query will return a list of database on each SQL Server Instance identified by the "Enumerate SQL Server Instances" Script Task, as saSQLServers is shredded by the "ForEach SQL Server Instance" Loop Container, pushing new values into the SQLServerInstanceName variable, re-aiming the SQLServerInstance.OLEDB Connection Manager. Got it? Good.

What to do with this list of databases... I know - let's push them into a variable.

Unlike last time, we're returning a couple columns and several rows. The Single Row ResultSet simply will not do - we need a "Full result set" so set the ResultSet property of the Execute SQL Task to this value. On the Reuslt Set page, click the Add button and name the new Result 0. Select DatabaseNames from the Variable Name column:

So now, the list of database names and IDs are going to be pushed into the DatabaseNames object variable.

Object variables are cool. You can use them for anything. They are like Variants in old VB. They hold scalars or collections, or collections of collections. In this case we're pushing a full ADO.Net Dataset into the DatabaseNames object variable.

How cool is that?

Click the Ok button to close the Execute SQL Task editor.

A-Shreddin' We Will Go

Next drag a Foreach Loop Container into the "ForEach SQL Server Instance" Loop Container. Connect the Get DatabaseNames Execute SQL Task to the new Foreach Loop Container using a Success precedence constraint and rename the new Foreach Loop Container "Foreach Database".

Double-click the Foreach Database Loop Container and navigate to the Collection page. Select the Foreach ADO Enumerator and the DatabaseNames variable in the "ADO object source variable" dropdown. ADO.Net datasets can hold multpile tables but ours only has one table, so accept the default Enumeration mode of "Rows in the first table":

 

Click the Variable Mappings page and add a couple variable mappings.

Remember: the DatabaseNames object variable contains an ADO.Net dataset that has one DataTable. That table is populated with the results of the T-SQL query in the "Get DatabaseNames" Execute SQL Task. The T-SQL shaped the DataTable contained inside the DatabaseNames variable. That query had two columns in the Select clause. We map the data in the DatabaseNames variable to scalar SSIS variables using the ordinal of the columns in that T-SQL Select clause. Since it read select name, Database_Id, we map name to the DatabaseName SSIS variable using ordinal 0 and Database_Id to the DatabaseObjectId SSIS variable using ordinal 1. Make sense? Good.

Click the Ok button to close the Foreach Loop Container editor.

Let's look at what we have here. Once we've established a connection to a particular instance of SQL Server, we query the master database for a list of database names and Ids. We then shred that list in a nested Foreach Loop Container.

Before we move on, we need to add some more dynamic functionality to our SQLServerInstance.OLEDB Connection Manager. So right click the SQLServerInstance.OLEDB Connection Manager and select Properties. Click the Expressions property and then click the ellipsis in the Value textbox. When the Property Expressions Editor displays select InitialCatalog in the Property dropdown. Click the ellipsis in the Expression textbox:

 

In the Expression Builder, add the DatabaseName variable to the Expression:

This makes the SQLServerInstance.OLEDB Connection Manager even more dynamic. It now is aimed at individual databases as the Foreach Database Loope Container iterates.

Show Me The Databases!

Drag a Script Task into the Foreach Database Loop Container. Rename it "Show Database Name" and double-click to open the editor. On the Script page add the SQLServerInstanceName, DatabaseName, DatabaseObjectId variables to the ReadOnlyVariables property:

Click the Design Script button and add the following code to Sub Main():

Dim sSQLServerInstanceName As String = Dts.Variables("SQLServerInstanceName").Value.ToString
Dim sDatabaseName As String = Dts.Variables("DatabaseName").Value.ToString
Dim iDatabaseObjectId As Integer = Convert.ToInt32(Dts.Variables("DatabaseObjectId").Value)

Dim
sMsg As String = "SQL Server Instance Name: " & sSQLServerInstanceName & vbCrLf & _
             "Database Name: " & sDatabaseName & vbCrLf & _
             "Database Id: " & iDatabaseObjectId.ToString

MsgBox(sMsg)

Close the VSA editor and click the Ok button to close the Script Task editor.

This script builds a string containing the values of the SQLServerInstanceName, DatabaseName, and DatabaseObjectId variables; then displays them in a message box:

 

Store The Database Metadata

Drag an Execute SQL Task into the Foreach Database Loop Container. Connect the Show Database Name Script Task to it using a Success precedence constraint and rename the Execute SQL Task "Add the Database Name". Double-click it to open the editor. Set the following properties:

  • ResultSet: Single row 
  • Connection: EnterpriseDBMetadata.OLEDB
  • SQLStatement:

    If Not Exists(Select DatabaseName
                  from dbo.Databases
                  where DatabaseName = ?
                    and SQLServerInstancesTableId = ?)
     begin
      Insert Into dbo.Databases
      (DatabaseObjectId
      ,SQLServerInstancesTableId
      ,DatabaseName)
      output inserted.Id
      Values(?, ?, ?)
     end
    Else
     begin
      Select Id
      from dbo.Databases
      where DatabaseName = ?
      and SQLServerInstancesTableId = ?
     end

There are a few more parameters this time. Map them as shown in the following list and image:

  • DatabaseName, Input, VARCHAR, 0
  • SQLServerId, Input, LONG, 1
  • DatabaseObjectId, Input, LONG, 2
  • SQLServerId, Input, LONG, 3
  • DatabaseName, Input, VARCHAR, 4
  • DatabaseName, Input, VARCHAR, 5
  • SQLServerId, Input, LONG, 6

 

Click the Result Set page and add a new resultset. Set the Result Name to 0 and assign the result to the variable DatabaseId:

Click the Ok button to close the Execute SQL Task editor.

Is it starting to come together now? Can you see where we're headed? Do you smell what Andy's cooking? Wink

C. The Schemas 

Add another Execute SQL Task to the Foreach Database Loop Container and rename it "Get SchemaNames". Add a Success precedence constraint from the "Add the Database Name" Execute SQL Task to the Get SchemaNames Execute SQL Task, and double-click Get SchemaNames to open the editor. Set the following properties:

  • ResultSet: Full result set
  • Connection: SQLServerInstance.OLEDB
  • SQLStatement:

    select name, Schema_Id
    from sys.schemas
    where schema_id < 16384
    and name not in ('guest', 'INFORMATION_SCHEMA', 'sys')

This statement returns dbo and user schemas and their respective Ids.

Configure the Result Set page with the following resultset properties:

  • Result Name: 0
  • Variable Name: SchemaNames

Click the Ok button to close the Execute SQL Task editor.

Shred The Schemas

Drag a new Foreach Loop Container into the Foreach Database Loop Container and rename it "Foreach Schema". Connect the "Get SchemaNames" Execute SQL Task to it using a Success precedence constraint. Double-click Foreach Schema to open the editor and set the following properties:

  • Collection page, Enumerator: Foreach ADO Enumerator
  • Collection page, Ado object source variable: SchemaNames
  • Collection page, Enumeration mode: Rows in first table
  • Variable Mappings page
    • Variable: SchemaName; Index: 0
    • Variable: SchemaObjectId; Index: 1

Click the Ok button to close the Foreach Loop Container editor.

Show Me The Schemas!

Drag a Script Task into the Foreach Schema Loop Container and rename it "Show Schema Name". Double-click to open the editor and add the following SSIS variables to the ReadOnlyVariables property: SQLServerInstanceName, DatabaseName, DatabaseObjectId, SchemaName, SchemaObjectId. Click the Design Script button to open the VSA editor and add the following code to Sub Main():

Dim sSQLServerInstanceName As String=Dts.Variables("SQLServerInstanceName").Value.ToString
Dim sDatabaseName As String = Dts.Variables("DatabaseName").Value.ToString
Dim iDatabaseObjectId As Integer = Convert.ToInt32(Dts.Variables("DatabaseObjectId").Value)
Dim sSchemaName As String = Dts.Variables("SchemaName").Value.ToString
Dim iSchemaObjectId As Integer = Convert.ToInt32(Dts.Variables("SchemaObjectId").Value)

Dim sMsg As String = "SQL Server Instance Name: " & sSQLServerInstanceName & vbCrLf & _
                     "Database Name: " & sDatabaseName & vbCrLf & _
                     "Database Id: " & iDatabaseObjectId.ToString & vbCrLf & _
                     "Schema Name: " & sSchemaName & vbCrLf & _
                     "Schema Id: " & iSchemaObjectId.ToString

MsgBox(sMsg)

Close the VSA editor and click the Ok button to close the Script Task editor.

Store Schema Metadata

Drag an Execute SQL Task into the Foreach Schema Loop Container and connect the "Show Schema Name" Script Task to it using a Success precedence constraint. Rename the Execute SQL Task "Add the Schema Name" and double-click it to open the editor. Set the following properties:

  • ResultSet: Single row 
  • Connection: EnterpriseDBMetadata.OLEDB
  • SQLStatement:

    If Not Exists(Select SchemaName
                  from dbo.Schemas
                  where SchemaName = ?
                    and DatabasesTableId = ?)
     begin
      Insert Into dbo.Schemas
      (SchemaObjectId
      ,DatabasesTableId
      ,SchemaName)
      output inserted.Id
      Values(?, ?, ?)
     end
    Else
     begin
      Select Id
      from dbo.Schemas
      where SchemaName = ?
      and DatabasesTableId = ?
     end

Map the parameters as shown in the following list and image:

  • SchemaName, Input, VARCHAR, 0
  • DatabaseId, Input, LONG, 1
  • SchemaObjectId, Input, LONG, 2
  • DatabaseId, Input, LONG, 3
  • SchemaName, Input, VARCHAR, 4
  • SchemaName, Input, VARCHAR, 5
  • DatabaseId, Input, LONG, 6

Click the Result Set page and add a new resultset. Set the Result Name to 0 and assign the result to the variable SchemaId:

Click the Ok button to close the Execute SQL Task editor.

D. The Tables and Views

Add another Execute SQL Task to the Foreach Schema Loop Container and rename it "Get Table and View Names". Add a 
Success precedence constraint from the "Add the Schema Name" Execute SQL Task to the "Get Table and View Names" Execute SQL Task, and double-click Get SchemaNames to open the editor. Set the following properties:

  • ResultSet: Full result set
  • Connection: SQLServerInstance.OLEDB
  • SQLStatement:

    select name, Object_Id, type_desc
    from sys.tables
    where schema_id = ?
    union
    select name, Object_Id, type_desc
    from sys.views
    where schema_id = ?

This statement returns tables and views, their respective Ids, and type descriptions (table or view). 

Map the following parameters:

  • SchemaObjectId, Input, LONG, 0, -1
  • SchemaObjectId, Input, LONG, 1, -1

Configure the Result Set page with the following resultset properties:

  • Result Name: 0
  • Variable Name: TablesAndViewsNames

Click the Ok button to close the Execute SQL Task editor.

Shred The Tables and Views

Drag a new Foreach Loop Container into the Foreach Schema Loop Container and rename it "Foreach Table and View". Connect the "Get Table and View Names" Execute SQL Task to it using a Success precedence constraint. Double-click Foreach Table and View to open the editor and set the following properties:

  • Collection page, Enumerator: Foreach ADO Enumerator
  • Collection page, Ado object source variable: TableAndViewsNames
  • Collection page, Enumeration mode: Rows in first table
  • Variable Mappings page
    • Variable: TableAndViewsName; Index: 0
    • Variable: TableAndViewsObjectId; Index: 1
    • Variable: TableAndViewsType; Index: 2

Click the Ok button to close the Foreach Loop Container editor.

Show Me The Table And Views!

Drag a Script Task into the Foreach Table and View Loop Container and rename it "Show Table and View Name". Double-click to open the editor and add the following SSIS variables to the ReadOnlyVariables property: SQLServerInstanceName, DatabaseName, DatabaseObjectId, SchemaName, SchemaObjectId, TablesAndViewsName, TablesAndViewsObjectId, TablesAndViewsType. Click the Design Script button to open the VSA editor and add the following code to Sub Main():

Dim sSQLServerInstanceName As String = Dts.Variables("SQLServerInstanceName").Value.ToString
Dim sDatabaseName As String = Dts.Variables("DatabaseName").Value.ToString
Dim iDatabaseObjectId As Integer = Convert.ToInt32(Dts.Variables("DatabaseObjectId").Value)
Dim sSchemaName As String = Dts.Variables("SchemaName").Value.ToString
Dim iSchemaObjectId As Integer = Convert.ToInt32(Dts.Variables("SchemaObjectId").Value)
Dim sTableAndViewName As String = Dts.Variables("TablesAndViewsName").Value.ToString
Dim iTableAndViewObjectId As Integer = Convert.ToInt32(Dts.Variables("TablesAndViewsObjectId").Value)
Dim sTableAndViewType As String = Dts.Variables("TablesAndViewsType").Value.ToString

Dim sMsg As String = "SQL Server Instance Name: " & sSQLServerInstanceName & vbCrLf & _
"Database Name: " & sDatabaseName & vbCrLf & _
"Database Id: " & iDatabaseObjectId.ToString & vbCrLf & _
"Schema Name: " & sSchemaName & vbCrLf & _
"Schema Id: " & iSchemaObjectId.ToString & vbCrLf & _
"Table / View Name: " & sTableAndViewName & vbCrLf & _
"Table / View Id: " & iTableAndViewObjectId.ToString & vbCrLf & _
"Table / View Type: " & sTableAndViewType
MsgBox(sMsg)

Close the VSA editor and click the Ok button to close the Script Task editor.

Store Tables And Views Metadata

Drag an Execute SQL Task into the Foreach Tables and Views Loop Container and connect the "Show Table and View Name" Script Task to it using a Success precedence constraint. Rename the Execute SQL Task "Add the Table and View Name" and double-click it to open the editor. Set the following properties:

  • ResultSet: Single row 
  • Connection: EnterpriseDBMetadata.OLEDB
  • SQLStatement:

    If Not Exists(Select ObjectName
                  from dbo.TablesAndViews
                  where ObjectName = ?
                    and ObjectType = ?
                    and SchemasTableId = ?)
     begin
      Insert Into dbo.TablesAndViews
      (TablesAndViewsObjectId
      ,SchemasTableId
      ,ObjectName
      ,ObjectType
    )
      output inserted.Id
      Values(?, ?, ?, ?)
     end
    Else
     begin
      Select Id
      from dbo.TablesAndViews
      where ObjectName = ?
        and ObjectType = ?
        and SchemasTableId = ?
     end

Map the parameters as shown in the following list and image:

  • TablesAndViewsName, Input, VARCHAR, 0
  • TablesAndViewsType, Input, VARCHAR, 1
  • SchemaId, Input, LONG, 2
  • TablesAndViewsObjectId, Input, LONG, 3
  • SchemaId, Input, LONG, 4
  • TablesAndViewsName, Input, VARCHAR, 5
  • TablesAndViewsType, Input, VARCHAR, 6
  • TablesAndViewsName, Input, VARCHAR, 7
  • TablesAndViewsType, Input, VARCHAR, 8
  • DatabaseId, Input, LONG, 9

 

Click the Result Set page and add a new resultset. Set the Result Name to 0 and assign the result to the variable TablesAndViewsId:

Click the Ok button to close the Execute SQL Task editor.

E. The Tables and Views

Add another Execute SQL Task to the Foreach Table and View Loop Container and rename it "Get Column Names and Metadata". Add a Success precedence constraint from the "Add the Table and View Name" Execute SQL Task to the "Get Column Names and Metadata" Execute SQL Task, and double-click "Get Column Names and Metadata" to open the editor. Set the following properties:

  • ResultSet: Full result set
  • Connection: SQLServerInstance.OLEDB
  • SQLStatement:

    select distinct
     
    c
    .name
     ,c.column_id
     ,t.name
    from sys.columns c
    inner join sys.types t on t.user_type_id = c.user_type_id
    where c.object_id = ?

This statement returns columns, their respective Ids, and type names. 

Map the following parameters:

  • TablesAndViewsObjectId, Input, LONG, 0, -1

Configure the Result Set page with the following resultset properties:

  • Result Name: 0
  • Variable Name: ColumnNames

Click the Ok button to close the Execute SQL Task editor.

Shred The Columns

Drag a new Foreach Loop Container into the Foreach Table and View Loop Container and rename it "Foreach Column". Connect the "Get Column Names and Metadata" Execute SQL Task to it using a Success precedence constraint. Double-click Foreach Column to open the editor and set the following properties:

  • Collection page, Enumerator: Foreach ADO Enumerator
  • Collection page, Ado object source variable: ColumnNames
  • Collection page, Enumeration mode: Rows in first table
  • Variable Mappings page
    • Variable: ColumnName; Index: 0
    • Variable: ColumnObjectId; Index: 1
    • Variable: ColumnDataType; Index: 2

Click the Ok button to close the Foreach Loop Container editor.

Show Me The Columns!

Drag a Script Task into the Foreach Column Loop Container and rename it "Show Column Name". Double-click to open the editor and add the following SSIS variables to the ReadOnlyVariables property: SQLServerInstanceName, DatabaseName, DatabaseObjectId, SchemaName, SchemaObjectId, TablesAndViewsName, TablesAndViewsObjectId, TablesAndViewsType, ColumnName, ColumnObjectId, ColumnDataType. Click the Design Script button to open the VSA editor and add the following code to Sub Main():

Dim sSQLServerInstanceName As String = Dts.Variables("SQLServerInstanceName").Value.ToString
Dim sDatabaseName As String = Dts.Variables("DatabaseName").Value.ToString
Dim iDatabaseObjectId As Integer = Convert.ToInt32(Dts.Variables("DatabaseObjectId").Value)
Dim sSchemaName As String = Dts.Variables("SchemaName").Value.ToString
Dim iSchemaObjectId As Integer = Convert.ToInt32(Dts.Variables("SchemaObjectId").Value)
Dim sTableAndViewName As String = Dts.Variables("TablesAndViewsName").Value.ToString
Dim iTableAndViewObjectId As Integer = Convert.ToInt32(Dts.Variable
(
"TablesAndViewsObjectId").Value)
Dim sTableAndViewType As String = Dts.Variables("TablesAndViewsType").Value.ToString
Dim sColumnName As String = Dts.Variables("ColumnName").Value.ToString
Dim iColumnObjectId As Integer = Convert.ToInt32(Dts.Variables("ColumnObjectId").Value)
Dim sColumnDataType As String = Dts.Variables("ColumnDataType").Value.ToString

Dim
sMsg As String = "SQL Server Instance Name: " & sSQLServerInstanceName & vbCrLf & _
"Database Name: " & sDatabaseName & vbCrLf & _
"Database Id: " & iDatabaseObjectId.ToString & vbCrLf & _
"Schema Name: " & sSchemaName & vbCrLf & _
"Schema Id: " & iSchemaObjectId.ToString & vbCrLf & _
"Table / View Name: " & sTableAndViewName & vbCrLf & _
"Table / View Id: " & iTableAndViewObjectId.ToString & vbCrLf & _
"Table / View Type: " & sTableAndViewType & vbCrLf & _
"Column Name: " & sColumnName & vbCrLf & _
"Column Id: " & iColumnObjectId.ToString & vbCrLf & _
"Column DataType: " & sColumnDataType

MsgBox(sMsg)

Close the VSA editor and click the Ok button to close the Script Task editor.

Store Columns And Metadata

Drag an Execute SQL Task into the Foreach Column Loop Container and connect the "Show Column Name" Script Task to it using a Success precedence constraint. Rename the Execute SQL Task "Add the Column Name" and double-click it to open the editor. Set the following properties:

  • ResultSet: Single row 
  • Connection: EnterpriseDBMetadata.OLEDB
  • SQLStatement:

    If Not Exists(Select ColumnName
                  from dbo.Columns
                  where ColumnName = ?
                    and TablesAndViewsTableId = ?)
     begin
      Insert Into dbo.Columns
      (ColumnsObjectId
      ,TablesAndViewsTableId
      ,ColumnName
      ,ColumnDataType
    )
      Values(?, ?, ?, ?)
     end

Map the parameters as shown in the following list and image:

  • ColumnName, Input, VARCHAR, 0
  • TablesAndViewsId, Input, LONG, 1
  • ColumnObjectId, Input, LONG, 2
  • TablesAndViewsId, Input, LONG, 3
  • ColumnName, Input, VARCHAR, 4
  • ColumnDataType, Input, VARCHAR, 5

Click the Ok button to close the Execute SQL Task editor.

Conclusion

You're done!

When you're done your SSIS package should look something like this:

 

You can get the SSIS package here (free registration required).

:{> Andy

Posted by andy | 1 Comments

New SSIS 101 Article!

My latest SSIS 101 article - SSIS 101: Object Variables, ResultSets, and Foreach Loop Containers - is live at SQL Server Central today!

Enjoy!

:{> Andy

Posted by andy | 0 Comments
Filed under:

Installing Visual Studio 2008 SP1 Beta

Introduction

Visual Studio 2008 Service Pack 1 Beta is available. Installation is not straightforward - there is at least one "gotcha."

Full SP documentation can be found here.

Procedure

Download Visual Studio 2008 Service Pack 1 Beta. I am installing this on the same virtual machine I installed SQL Server 2008 RC0 - details in this post. I also plan to install the Visual Studio 2008 GDR June CTP on this virtual machine (which will be covered in this post) - hence the GDRsetup.exe included in the screenshot.

I execute the VS90sp1-KB945140-ENU.exe file with Administrator privileges by right-clicking it ans selecting "Run as administrator" :

 This starts the installation wizard. Components are listed in the "Affected products" listbox on the introductory splash page:

 Click Next to proceed to the License page:

If you read and accept the License terms, click Next to proceed to the Installation Progress page:

Oops 

When I install Visual Studio 2008 SP1 Beta - following the instruction - the installation fails. This is the "gotcha" I mentioned in the introduction:

If you click the link in the sentence "View the log file," the installation log is displayed. I copied the error message at the end of this file. The error message is "A later version of Microsoft SQL Server Compact is already installed" :

I copy this error to the clipboard and paste it into a search engine. I receive one link:

Browsing to that link and scrolling down, I find an answer (thanks Vicente!):

Following Vicente's instructions (and link), I download SSCERuntime-ENU.msi from the Microsoft SQL Server Compact 3.5 SP1 Beta website:

I save this file in my Apps directory:

 

I install SSCERuntime-ENU.msi from my Apps directory:

When the Microsoft SQL Server Compact 3.5 SP1 Beta installation wizard starts, click Next:

At the Program Maintenance page, select Remove and click Next:

Click the Remove button to remove the current version of the SQL Server CE Runtime:

When the SQL Server CE Runtime is removed, click Finish:

Return to your folder containing the downloaded SSCERuntime-ENU.msi file and copy that file to the clipboard:

Navigate to the temporary directory containing the Visual Studio 2008 SP1 files. For my virtual machine, this path is C:\Users\Administrator\AppData\Local\Temp\1\Microsoft Visual Studio 2008 SP1 (Beta). Paste SSCERuntime-ENU.msi, overwriting the existing file:

Again, overwrite the existing file:

Return to your folder containing the Visual Studio 2008 SP1 Beta file (VS90sp1-KB945140-ENU.exe) and execute this file with administrator permissions (again):

Click Next to start the installation (again):

Agree to the license (again):

Observe the Progress (again):

This time, the service pack is applied - without error!

I may be missing something - this is how the installation went for me. I do not believe the temporary file we overwrite in the corrective steps exist until the first installation attempt fails.

:{> Andy

 

Installing SQL Server RC0

Introduction

SQL Server 2008 RC0 has been out for a while. You can download it here.

This post is not a detailed examination of the setup options. There are much better articles and posts out there if you want more information. This post represents one way to get SQL Server 2008 RC0 up and running so you can start learning the new features!

Getting Started

I downloaded the software and installed it on a brand new virtual machine running Windows Server 2008 Enterprise (without Hyper-V). I used VMWare Workstation for this virtual machine because I need a 64-bit guest OS. I was also able to configure the machine to use both CPUs on my host system. This may be possible with Virtual Server or Virtual PC, but I don't know how to do it (if it is).

I loaded the ISO image for SQL Server 2008 RC0 64-bit into the virtual DVD drive and started the setup application. The following screen displays:

Now this looks a lot different from previous SQL Server installation applications. In general, you navigate between pages using the Page List on the left. To continue installing click the Installation link in the Page List:

Installing a Stand-Alone Default Instance 

To install a new instance of SQL Server 2008, click the "New SQL Server stand-alone installation or add features to an existing installation" link. The Setup Support Rules wizard executes and checks your system configuration:

 

Green checkmarks == good. When you click OK the Product Key screen displays. On this screen you can select any of the available evaluation editions or enter a product key:

Click Next to continue. The License Terms page is next. Accept the license terms and click Next to proceed:

The Setup Support Files page displays next. Click Install to begin application setup:

 

The first step in setting up the application is checking the Setup Support Rules:

 On this execution, the Setup Support Rules detected Windows Firewall and issued the following warning:

 

 I will address this warning after setup is complete. The next page is Feature Selection. Select the SQL Server 2008 features you wish to install and their respective directories, then click the Next button:

 The Instance Configuration page allows you to specify the SQL Server instance name and root directory:

Next, disk space is checked. Click Next to proceed:

Before configuring the Server, I created a SQL Server service account. Right-click My Computer and select Manage:

Navigate to Server Manager \ Configuration \ Local Users and Groups \ Users. Right-click in the Users pane and select New User:

I named my service account SQLDude:

Returning to the Server Configuration page of the SQL Server 2008 RC0 installation application, I click the "Use the same account for all SQL Server services" button and supply the credentials for SQLDude:

 

Click OK to assign the credentials to all services:

 

Click Next to proceed to the Database Engine Configuration page. On this page, configure administrator accounts, data directories, and the filestream options:

On the Data Directories tab, set a data root directory and edit any associated default paths:

I enabled filestream on the FILESTREAM tab. I haven't tested this yet. Configuring Filestream did not work in the CTPs (this was one the bugs I reported):

The next page is Analysis Services Configuration. Select administrators and data directories for SSAS 2008:

Reporting Services Configuration is the next page. I installed the native mode default configuration:

Click Next to set Error and Usage Reporting options on the next page:

Click Next to execute the Installation Rules. If there are no errors or warnings, click Next to proceed. Otherwise, stop here and address them:

Click Next to navigate to the Ready to Install page. Review the summary and click Back to make any changes. Click Install to install SQL Server 2008 RC0:

Once the installation completes the Next button will enable. Review the Status to make sure all items are marked Success and click Next to complete the installation:

Click Close on the Complete page to unload the SQL Server 2008 RC0 installation application:

That's it! You've successfully installed SQL Server 2008 RC0 - congratulations!

:{> Andy

 

 

Posted by andy | 1 Comments

MVP Renewed!

I received the news today: Microsoft has awarded me SQL Server MVP again this year. How cool!

Thanks Microsoft!

:{> Andy

Posted by andy | 0 Comments
Filed under:

Power Tools for Visual Studio Team System 2008 Database Edition

Gert and team announce the availability of Power Tools for Visual Studio Team System 2008 Database Edition!

Cool new stuff includes a couple new test conditions, file- and XML-based data generators, and (my favorite) SQL Static Code Analysis integrated into MSBuild / Team Build!

Download page:
http://www.microsoft.com/downloads/details.aspx?FamilyID=73ba5038-8e37-4c8e-812b-db14ede2c354&displaylang=en

Installer download:
http://download.microsoft.com/download/f/b/8/fb8d1c0d-c0c4-4004-ab86-12396b2a3ee3/VSTSDB2008PT.msi

Documentation download:
http://download.microsoft.com/download/f/b/8/fb8d1c0d-c0c4-4004-ab86-12396b2a3ee3/Power%20Tools%202008.doc

Great work Team!

:{> Andy

Posted by andy | 0 Comments

Introducing Change Data Capture, SSIS, and SQL Server 2008 CTP5 (Nov 2007)

Introduction 

On Thursday, 24 Jan 2008, I presented New Features In SSIS 2008 to the Richmond SQL Server Users Group.

Most of the presentation was dedicated to demonstrating Change Data Capture (CDC) interacting with SQL Server 2008 Integration Services. I started seriously working on this demo the first week of January, thinking I'd put 2 - 6 hours into it to get it running using the detailed instructions in Books Online. Things were going relatively well working through the demo until I hit calls from SSIS to table-valued functions created by CDC.

The TVFs didn't exist. Indifferent

Well, that's not exactly true. Their renamed stubs exist in CTP5, but they only return 0. Bummer. Or is it? This is what's cool about CTPs - you get a feel for where the development is and where it will likely go. And while working through some other stuff related to CDC I'd learned enough to cobble together a solution, so I did. What I built will demonstrate the principles of CDC.

<As_Is>

I assure you there will be differences in the RTM and would wager good money this post will be obsolete with CTP6. So if I get any questions / comments the first thing I will ask is "Which CTP are you using?" If it's not CTP5 this post is not appropriate.
</As_Is>

You can build your own virtual server or PC and play along, or you can download a prebuilt virtual hard drive (vhd) file here. I built my own. Once I got the SQL Server 2008 November CTP up and running I logged in and began tinkering. Here's what I did and what I learned:

Introducing CDC 101

The idea of Change Data Capture is pretty cool. The data is stored in a format similar to that used by some database engine transaction logs. In a transaction log changes are respresented in row-based before- and after-images Inserts have empty before-images and the inserted data in the after-image. This makes sense because there's no data there before, only after the insert. Deletes have deleted data in the before-image and empty after-images. Updates have the existing row (or affected columns in the row) in the before-image and the updated row (or affected columns in the row) in the after-image.

In SQL Server 2008's implementation of CDC, changes are stored in a table. The table is created in a special schema named "cdc" built in the database when you enable Change Data Capture. I had a cynical thought: I wonder if the Center for Disease Control uses SQL Server... Wink

A table in the cdc schema is created for each table for which Change Tracking is enabled. The table is named cdc.schema_table_CT and contains a column for each column in the Tracked table plus metadata columns. In my example I enabled Chage Tracking on a table named dbo.Contact. The table created in the cdc schema is cdc.dbo_Contact_CT.

One of the metadata columns is __$operation and it identifies the record (image) type. Inserts and Deletes have no empty images. Deletes are respresented in the row by an __$operation column value of 1; Inserts by an __$operation column value of 2. The before-image of an Update in the row by an __$operation column value of 3; the after-image by an __$operation column value of 4.

Setting It Up

I'm going to demonstrate using Change Data Capture to develop ETL with SSIS 2008. I started with the Improving Incremental Loads with Change Data Capture - accessible from Change Data Capture In Integration Services - topic in SQL Server 2008 CTP5 Books Online (ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10is_1devconc/html/c4aaba1b-73e5-4187-a97b-61c10069cc5a.htm).

I first create two databases: CDCSource and CDCTarget. I use the following script:

use master
go
-- create CDCSource database...
if not exists(select name
from sys.databases
where name = 'CDCSource')
Create Database CDCSource
go
use CDCSource
go

-- create CDCTarget database...
if not exists(select name
from sys.databases
where name = 'CDCTarget')
Create Database CDCTarget
go

Next I create a table in each database to serve as my ETL source and destination. I use a portion of the AdventureWorks Person.Contact table:

use CDCTarget
go

-- create and populate dbo.Contact table...
if exists(select name
from sys.tables
where name = 'Contact')
Drop Table dbo.Contact
go

select ContactID
,NameStyle
,Title
,FirstName
,MiddleName
,LastName
,Suffix
,EmailAddress
,EmailPromotion
,Phone
,ModifiedDate
into dbo.Contact
from AdventureWorks.Person.Contact
go

use CDCSource
go

-- create and populate dbo.Contact table...
if exists(select name
from sys.tables
where name = 'Contact')
Drop Table dbo.Contact
go

select ContactID
,NameStyle
,Title
,FirstName
,MiddleName
,LastName
,Suffix
,EmailAddress
,EmailPromotion
,Phone
,ModifiedDate
into dbo.Contact
from AdventureWorks.Person.Contact
go

Change Tracking

Change Tracking is enabled on the database and then on individual tables. To enable Change Tracking in CTP5 use the following T-SQL statement:

-- enable CDC on database...
if not exists(select name, is_cdc_enabled
              from master.sys.databases
              where name = 'CDCSource'
               and is_cdc_enabled = 1)
 exec sys.sp_cdc_enable_db_change_data_capture
go

Once Change Tracking is enabled on the database you can enable Change Data Capture on individual tables using a script similar to the following:

-- enable CDC on dbo.Contact table...
exec sys.sp_cdc_enable_table_change_data_capture
  @source_schema
= 'dbo'
, @source_name = 'Contact'
, @role_name = 'cdc_admin';
go

Making It Work

Books Online has us using a table-valued function to access the cdc.dbo_Contact_CT table, but that function is merely a stub in CTP5. Here's where I got creative to make it work.

Create a table-valued function dbo.uf_Contact:

-- build CDC table-valued function...
if exists(select name
          from sys.objects
          where name = 'uf_Contact')
 drop function cdc.uf_Contact
go

CREATE function cdc.uf_Contact (
@start_time datetime
,@end_time datetime
)
returns @Contact table (
ContactID int
,NameStyle bit
,Title nvarchar(8)
,FirstName nvarchar(50)
,MiddleName nvarchar(50)
,LastName nvarchar(50)
,Suffix nvarchar(10)
,EmailAddress nvarchar(50)
,EmailPromotion int
,Phone nvarchar(25)
,ModifiedDate datetime
,CDC_OPERATION varchar(1)
)
as
begin
declare @from_lsn binary(10), @to_lsn binary(10)
if (@start_time is null)
select @from_lsn = sys.fn_cdc_get_min_lsn('Contact')
else
select @from_lsn = sys.fn_cdc_increment_lsn(sys.fn_cdc_map_time_to_lsn('largest less than or equal',@start_time))
if (@end_time is null)
select @to_lsn = sys.fn_cdc_get_max_lsn()
else
select @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal',@end_time)
if (@from_lsn = sys.fn_cdc_increment_lsn(@to_lsn))
return
-- Query for change data
insert into @Contact
select ContactID
,NameStyle
,Title
,FirstName
,MiddleName
,LastName
,Suffix
,EmailAddress
,EmailPromotion
,Phone
,ModifiedDate
,case __$operation
 when 1 then 'D'
 when 2 then 'I'
 when 4 then 'U'
 else null
end as CDC_OPERATION
from cdc.dbo_Contact_CT
where __$start_lsn between @from_lsn and @to_lsn
 and __$operation != 3 --'all'
return
end

go

This is the function we will query from SSIS to get changes.

Change Tracking is now in place for CDCSource and dbo.Contact. Changes made to the dbo.Contact will be stored in the cdc.dbo_Contact_CT table.

The SSIS Package

Let's build an SSIS package to take advantage of Change Tracking. Create a new SSIS 2008 project and rename the default package ChangeDataCapture.dtsx. Again, I'm using the Improving Incremental Loads with Change Data Capture - accessible from Change Data Capture In Integration Services - topic in SQL Server 2008 CTP5 Books Online (ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10is_1devconc/html/c4aaba1b-73e5-4187-a97b-61c10069cc5a.htm).

Add the following package-scoped variables:

Name                     Data Type            Value
ExtractStartTime     DateTime             
ExtractEndTime      DateTime             
DataReady             Int32                    2
DelaySeconds        Int32                    10
IntervalID                Int32                    0
TimeoutCount         Int32                    0
TimeoutCeiling        Int32                    20
SQLDataQuery       String                  SELECT * FROM cdc.uf_Contact(null, '2008-01-16 01:00:00')

Right-click in the Connection Managers space at the bottom of the package and select" New ADO.NET Connection...". Connect to an instance of SQL Server 2008 CTP5. My instance is the default instance on a virtual server named VPCSQL2K8CTP5. In the "Select or enter a database name" textbox enter CDCSource:

Click the Test Connection button to make sure all is well with your connectivity. Correct any issues and click the OK button to close the editor.

You are free to leave the connection manager named as is, but I like to have the names make sense to me. Right-click the Connection Manager and rename it "CDCSource.ADO".

Create two more Connection Managers. Make them OLEDB Connection Managers. Set up one for CDCSource and name it CDCSource.OLEDB. Set up the other for CDCTarget and name it (wait for it...) CDCTarget.OLEDB. Wink

Drag an Execute SQL Task onto the Control Flow canvas and rename it Get Interval. Set the ConnectionType property to the OLEDB and the Connection property CDCSource.OLEDB. Set the SQLStatement property to:

SELECT DATEADD(hh,-2, GETDATE()) AS ExtractStartTime,
GetDate() AS ExtractEndTime

Set the ResultSet property to Single Row. On the Result Set page, add two results:

Result Name     Variable Name
0                      User::ExtractStartTime
1                      User::ExtractEndTime

Drag and drop a For Loop Container onto the Control Flow and connect the Get Interval Execute SQL Task to it using a Success precedence constraint. Double-click the For Loop to open the editor. Set the InitExpression property to @DataReady = 0 and the EvalExpression property to @DataReady == 0:

Drag an Execute SQL Task into the For Loop Container. Change the name to Check Change Data Status and double-click the Task to open the editor. Set the Connection to CDCSource.OLEDB and the SQLStatement property to the following script:

declare @DataReady int
declare
@TimeoutCount int
if
not exists (select tran_end_time
                  
from cdc.lsn_time_mapping
                   where tran_end_time > ? )
select @DataReady = 0
else
if ? = 0
select @DataReady = 3
else
if not exists (select tran_end_time
                  
from cdc.lsn_time_mapping
                   where tran_end_time <= ? )
select @DataReady = 1
else
select @DataReady = 2
select @TimeoutCount = ?
if (@DataReady = 0)
select @TimeoutCount = @TimeoutCount + 1
else
select @TimeoutCount = 0
if (@TimeoutCount > ?)
select @DataReady = 5
select @DataReady as DataReady, @TimeoutCount as TimeoutCount 

Set the Parameters as shown:

Set the ResultSet property to Single Row and the Result Set page as follows:

Click the OK button to close the editor. Drag a Script Task into the For Loop Container with the Execute SQL Task. Double-click it to open the editor. Set the Script Language to Microsoft Visual Basic 2005 (you can now also use Microsoft Visual C# 2005) and click the ellipsis in the ReadOnlyVariables property and select User::DelaySeconds:

I have to pause here and say "Kudos!" to the SSIS team for this interface. I usually fat-finger the variable names when typing them. This is much, much nicer. Thanks!

Click the OK button to close the Select Variables form. Click the Edit Script button to open the VSTA editor. Double-click the ScriptMain.vb class to open the script editor. Edit Public Sub Main() so it reads as follows:

Public Sub Main()
'
' Add your code here
'
System.Threading.Thread.Sleep(CType(Dts.Variables("DelaySeconds").Value, Integer) * 1000)
Dts.TaskResult = ScriptResults.Success
End Sub

Close the VSTA editor and click the OK button to close the Script Task editor.

Connect the Execute SQL Task to the Script Task with a Success Precedence Constraint. Double-click the constraint to open the editor. Dropdown the Evaluation operation and select "Expression and Constraint". Make sure the Value is set to Success. Enter the following expression into the Expression textbox: "@DataReady == 0 && @TimeoutCount <= @TimeoutCeiling" as shown:

Click the Test button to test the validity of the expression. Click the OK buttont to close the Precedence Constraint editor.

I like the visual effects available for precedence constraints. When you hover over the Fx box you get a tooltip that displays the Expression value:

But note: if you select the precedence constraint and view properties (a quick shortcut is to press the F4 key) you can set the Annotation property from AsNeeded to ConstraintOptions, which will display the Expression property value always - how cool:

Add a second Script Task inside the For Loop Container and double-click to open the editor. Select Microsoft Visual Basic 2005 as the language and select User::DataReady and User::ExtractStartTime as ReadOnlyVariables. Click the Edit Script button and open ScriptMain.vb. Replace the code in Public Sub Main() with the following VB.Net:

Public Sub Main()
'
' Add your code here
'
' User variables.
Dim dataReady As Integer = _
CType(Dts.Variables("DataReady").Value, Integer)
Dim extractStartTime As Date = _
CType(Dts.Variables("ExtractStartTime").Value, DateTime)
' System variables.
Dim packageName As String = _
Dts.Variables(
"PackageName").Value.ToString()
Dim executionStartTime As Date = _
CType(Dts.Variables("StartTime").Value, DateTime)
Dim eventMessage As New System.Text.StringBuilder()
If dataReady = 1 OrElse dataReady = 5 Then
If dataReady = 1 Then
eventMessage.AppendLine("Start Time Error")
Else
eventMessage.AppendLine("Timeout Error")
End If
With eventMessage
.Append(
"The package ")
.Append(packageName)
.Append(
" started at ")
.Append(executionStartTime.ToString())
.Append(
" and ended at ")
.AppendLine(DateTime.Now().ToString())
If dataReady = 1 Then
.Append("The specified ExtractStartTime was ")
.AppendLine(extractStartTime.ToString())
End If
End With
System.Windows.Forms.MessageBox.Show(eventMessage.ToString())
Dts.Log(eventMessage.ToString(), 0,
Nothing)
Dts.TaskResult = ScriptResults.Failure
  Else
    Dts.TaskResult = ScriptResults.Success
  End If
End Sub

Close the editor and click the OK button to close the Script Task editor.

Connect the Execute SQL Task to the new Script Task with a Success Precedence Constraint. Edit the Precedence Constraint, setting the Evaluation Operation to "Success and Constraint" and the Expression to "@DataReady == 1 || @DataReady == 5". Click the OK button to close the editor.

And Now For Something Completely Different

This next part isn't included in the Books Online walk-through. I needed to add it because I was stopping a lot during my presentations. It doesn't hurt to have this in your package. So here goes.

We're now done with the For Loop Container. Add a new Execute SQL Task to the Control Flow canvas below the For Loop Container, and connect the For Loop Container to it using a Success Precedence Constraint. Double-click it to open the editor. Change the Name property to Update EndTime and the ResultSet property to Single Row. Set the Connection property to CDCSource.OLEDB and add the following statement to the SQLStatement property:

select GETDATE() as ExtractEndTIme

Click the Result Set page and add a Result. Set the Name to 0 and the Variable Name property to User::ExtractEndTime. Click the OK button to close the editor.

Why did I do this? The scripts that retrieve the changed rows use time. You'll see when we walk through the demo portion, you can wait several minutes between the time you initially set the ExtractEndTime variable in the original Execute SQL Task and the time you actually update data in the Change-Tracked table. This task makes certain that the value of ExtractEndTime is current when you exit the loop.

Back To Our Regularly Scheduled Programming

Next, add a Script Task and double-click it to open the editor. This time select Microsoft Visual C# as the ScriptLanguage property. Add User::DataReady, User::ExtractEndTime, and User::ExtractStartTime in the ReadOnlyVariables property, and User:SQLDataQuery in the ReadWriteVariables property:

Click the Edit Script button to open the VSTA editor and double-click ScriptMain.cs to open the class. Replace the public void Main() method code with the following code:

public void Main()
  {
    //string sEST;
    //sEST = Dts.Variables["ExtractStartTime"].Value.ToString();
    //System.Windows.Forms.MessageBox.Show(sEST);
    int dataReady;
    System.
DateTime extractStartTime;
    System.
DateTime extractEndTime;
    string sqlDataQuery;
    dataReady = (
int)Dts.Variables["DataReady"].Value;
    extractStartTime = (System.
DateTime)Dts.Variables["ExtractStartTime"].Value;
    extractEndTime = (System.
DateTime)(Dts.Variables["ExtractEndTime"].Value);

    //string sExtractStart;
    //sExtractStart = extractStartTime.ToString();
    //System.Windows.Forms.MessageBox.Show(sExtractStart);
    if (dataReady == 2)
      {
        sqlDataQuery =
"SELECT * FROM cdc.uf_Contact('" + string.Format("{0:yyyy-MM-dd hh:mm:ss}",
extractStartTime) +
"', '" + string.Format("{0:yyyy-MM-dd hh:mm:ss}", extractEndTime) + "')";
      }
    else
      {
        sqlDataQuery =
"SELECT * FROM cdc.uf_Contact(null" + ", '" + string.Format("{0:yyyy-MM-dd hh:mm:ss}", extractEndTime) + "')";
      }
    Dts.Variables[
"SQLDataQuery"].Value = sqlDataQuery;
    //system.windows.forms.messagebox[sqlDataQuery];
    System.Windows.Forms.MessageBox.Show(sqlDataQuery);
    Dts.TaskResult = (
int)ScriptResults.Success;
  }

Close the VSTA editor and click the OK button to close the Script Task editor.

Add a Data Flow Task and connect the former Script Task to it via a Success Precedence Constraint. Double-click the Data Flow Task to open the editor. Drag an OLE DB Data Adapter onto the Data Flow canvas. Set the Connection Manager property to CDCSource.OLEDB, the Data Access Mode property to "SQL command from variable", and the Variable name property to User::SQLDataQuery:

 

Click the OK button to close the editor.

As in SSIS 2005, a Conditional Split transformation accepts a single input and allows you to define multiple outputs using the SSIS Expression Language. Rows at the input are redirected to one and only one of the outputs. If you are familiar with the switch function in C-ish languages or the Select Case function in VB-ish languages, the Conditional Split operates much the same way. The else from these programming functions is accomplished via the Default Output, which you can also label with a name.

Drag a Conditional Split Transformation onto the canvas. Add an output named Inserts with the Condition expression set to CDC_OPERATION == "I". Add another output named Updates with the Condition expression set to CDC_OPERATION == "U". Add a third output named Deletes with the Condition expression set to CDC_OPERATION == "D":

These three outputs will divide our data stream into rows to be inserted, updated, and deleted (respectively). For the Inserts, drag an OLEDB Destination Adapter onto the Data Flow canvas. For the Updates and Deletes, drag two OLEDB Command transformations onto the canvas. Rename the OLEDB Destination Adapter "Land Inserts". Rename the OLEDB Command transformations "Apply Updates" and "Apply Deletes" (respectively).

Connect a data flow from the Conditional Split to the Destination Adapter. Because there are multiple outputs available from the Conditional Split, you will be prompted to select one. Select the Inserts output:

 

Click the OK button to close the Input Output Selection form. 

Double-click the Destination Adapter to open the editor. Set the OLEDB Connection Manager property the to CDCTarget.OLEDB connection manager. Select the dbo.Contact table from the "Name of the table or view" dropdown:

Click the OK button to close the editor. 

Drag a second data flow from the Conditional Split to the Apply Updates OLEDB Command transformation and select Updates when prompted for an output. Double-click the Apply Updates OLEDB Command transformation to open the Advanced Editor for Apply Updates (OLEDB Command transformations do not have a standard editor - at least not in CTP5). Select the CDCTarget.OLEDB connection manager on the Connection Managers tab. On the Component Properties tab, set the SqlCommand property to the following T-SQL statement:

Update dbo.Contact
 set
 
 NameStyle = ?
 ,Title = ?
 ,FirstName = ?
 ,MiddleName = ?
 ,LastName = ?
 ,Suffix = ?
 ,EmailAddress = ?
 ,EmailPromotion = ?
 ,Phone = ?
 ,ModifiedDate = ?
where ContactID = ?

Click the Column Mappings tab and map Available Input Columns to Available Destination Columns in one of two ways: 

  1. On the top half of the editor drag individual Input Columns and drop them on the desired Destination Columns.
  2. On the lower half of the editor select Input Columns to match to Destination Columns.

The question marks in the SqlCommand statement map to Paramater values (Param_n) by ordinal. The first question mark maps to Param_0, the second to Param_1, and so on. Our first question mark assigns a value to the NameStyle column, so map the NameStyle Input to Param_0 in the Destination. Continue this, mapping the following:

Input                                      Destination
NameStyle                             Param_0
Title                                       Param_1
FirstName                              Param_2
MiddleName                           Param_3
LastName                              Param_4
Suffix                                     Param_5
EmailAddress                         Param_6
EmailPromotion                      Param_7
Phone                                    Param_8
ModifiedDate                          Param_9
ContactID                               Param_10

 

Click the OK button to close the Advanced Editor.

Drag another output from the Conditional Split to the Apply Deletes OLEDB Command transformation and select Deletes when prompted to select an output. Double-click the OLEDB Command to open its Advanced Editor and select CDCTarget.OLEDB for the Connection Manager. On the Component Properties tab, enter the following statement in the SqlCommand property:

delete dbo.Contact where ContactID = ?

On the Column Mappings tab, map the ContactID input to the Param_0 Destination and click the OK button to close the Advanced Editor.

The Data Flow is complete and should appear similar to the following:

Go, Go, Change Data Capture!

Execute the SSIS package and view the Control Flow. If you do nothing more, you will see the For Loop execute 20 times, each time incrementing the @TimeoutCount variable value and checking to see if it has reached the @TimeoutCeiling variable value:

 However, if you open SQL Server Management Studio (SSMS) and execute a query that changes the underlying data in the CDCSource database, the SSIS package will detect the change and break out of this loop.

Restart the SSIS package (to reset the @TimeoutCount variable value). Use the following query to execute changes in the CDCSource.dbo.Contact table:

-----------------------
-- Execute Changes --
-----------------------

-- Update --
update dbo.Contact
set EmailAddress = '_' + EmailAddress
where ContactID % 5 = 0

-- Insert --
insert into dbo.Contact
(NameStyle
,Title
,FirstName
,MiddleName
,LastName
,Suffix
,EmailAddress
,EmailPromotion
,Phone
,ModifiedDate)
values
(0
,
'Mr.'
,'Andy'
,'Mortimer'
,'Leonard'
,null
,'aleonard@solidq.com'
,0
,
'123-456-6789'
tdate()
)

-- Delete --
delete dbo.Contact
where ContactID % 27 = 0

Return to SSIS and note that on the next pass through the loop more code is executed:

If you didn't comment it out, the C# script should display a message box containing the query (stored in @User::SqlDataQuery) that will serve as the source of the OLEDB Source Adapter in the Data Flow:

Click the OK button to dismiss the non-modal message box.

Click the Data Flow tab to observe changes as they are applied to CDCTarget. When complete, the Data Flow should look similar to the following:

You can also look at the table containing the changes by executing the following statement against CDCSource in SSMS:

select
__$start_lsn
,__$end_lsn
,__$seqval
,__$operation
,__$update_mask
,ContactID
,NameStyle
,Title
,FirstName
,MiddleName
,LastName
,Suffix
,EmailAddress
,EmailPromotion
,Phone
,ModifiedDate
from cdc.dbo_Contact_CT

In my database the change tracking table appears as shown:

You will recall from (building the function) that the __$operation column is used to determine whether the change is an insert, update, or delete. I mentioned this earlier: The value in this column captures the type of change.

__$operation         Change Type
1                          Delete
2                          Insert
3                          Update (Before)
4                          Update (After)

The changes in the image above are all Updates.

Conclusion

This post represents one method to use SQL Server 2008 CTP5 (Nov 2007) Change Data Capture with SSIS 2008. CTP6 will be out soon and things will likely change. I will write an updated version at that time, and endeavor to keep my readers posted on other changes to this technology between now and SQL Server 2008 Release To Manufacturing (RTM).

:{> Andy

 

 

Posted by andy | 0 Comments

SSIS Design Pattern - ETL Instrumentation, Part 4

Introduction

This post is part of a series of posts on ETL Instrumentation.

In Part 1 we built a database to hold collected SSIS run time metrics and an SSIS package to deomnstrate how and why we would load metrics into the database.

In Part 2 we expanded on our database and the SSIS package to annotate version metadata, manage error metrics capture, and task status reporting.

In Part 3, we started using the ETL Instrumentation infrastructure we have built to measure some actual ETL. We started by counting rows.

In Part 4, we continue instrumenting by adding yet another ETL process.

A Brief History Of Our ETL Instrumentation Project

To review, our metrics database is named SSISRunTimeMetrics. It contains a schema named ssis. In this schema are eleven objects:
 - a table named ssis.RunTimeMetrics.
 - a table named ssis.RunTimeErrors.
 - a table named ssis.TaskMetrics.
 - a table named ssis.RowCounts.
 - a table named ssis.RowCountTypes.
 - a stored procedure named ssis.usp_RecordPackageStart.
 - a stored procedure named ssis.usp_RecordPackageEnd.
 - a stored procedure named ssis.usp_RecordPackageError.
 - a stored procedure named ssis.usp_RecordTaskStart.
 - a stored procedure named ssis.usp_RecordTaskEnd.
 - a stored procedure named ssis.usp_RecordRowCounts.

Our source database is AdventureWorks and our destination database is SSISRunTimeMetrics_Target. SSISRunTimeMetrics_Target contains one object:
 - a table named dbo.Contact.

We expanded the types of run-time data we are collecting. Part 1 introduced Status collection, in Part 2 we added Exception collection. We also introduced scope into both types of collection, recording Exception information on error and finalizing Status (reporting that an error occurred).

At the beginning of SSIS package execution, we call ssis.usp_RecordPackageStart from an Execute SQL Task. We pass the package start date and time, the package name, and the package version. We also pass in a status of "Started". From this stored procedure we get the ID of the newly created row, which we then push into a Package Load ID variable (iPackageLoadID).

At the beginning of a task or collection of tasks that define a process, we call ssis.usp_RecordTaskStart from an Execute SQL Task. We pass the task or process start date and time, the task (source) name, iPackageLoadID, and a status of "Started". From this stored procedure we get the ID of the newly created row, which we then push into a Task Load ID variable (iTaskLoadID).

We have a Data Flow Task to move rows from the AdventureWorks.Person.Contact table to a target database and table we created: SSISRunTimeMetrics_Target.dbo.Contact. We optimized the package for set-based updates and collect row count metrics which are inserted into SSISRunTimeMetrics.ssis.usp_RecordRowCounts.

When this task completes, we call ssis.usp_RecordTaskEnd from an Execute SQL Task. We pass in the Task Load ID from the iTaskLoadID variable, the current date and time, and the status "Succeeded".

On error, we capture Exception data and record an Error Status - both are crucial to knowing what happens when an exception is thrown.

When the package completes execution, we call ssis.usp_RecordPackageEnd from an Execute SQL Task. We pass in the Package Load ID from the variable, the current date and time, and the status "Succeeded".

Let's get started on the next step!

Version Control

First, update version information:

Remember to update Version properties:

 

Now we are ready to start developing.

Sell, Sell, Sell

Let's extract and load some Sales data.

Open the SSISRunTimeMetrics package you built previously. Delete the Success Precedence Constraint between the "Step 1 - Load Contact" Sequence Container and the "Log End of Package Execution" Execute SQL Task.

Drag a Sequence Container onto the Control Flow canvas. Move the "Log End of Package Execution" Execute SQL Task down some and position the new Sequence Container between the "Step 1 - Load Contact" Sequence Container and the "Log End of Package Execution" Execute SQL Task.

 Connect the "Step 1 - Load Contact" Sequence Container to the new Sequence Container with a Success Precedence Constraint, and the new Sequence Container to the "Log End of Package Execution" Execute SQL Task with a Success Precedence Constraint.

Rename the new Sequence Container "Step 2 - Load Sales". 

Good design is reuseable. Maybe not 100%, but most good designs are at least partially reuseable. Such is the case here - we have a good design in "Step 1 - Load Contact" - we will reuse lots of it in "Step 2 - Load Sales". Let's frame-out the flow, then fill in the details.

Drag two Execute SQL Tasks and a Data Flow Task into "Step 2 - Load Sales".

Name the first Execute SQL Task "Load Sales" and double-click it to open the editor. Set the ResultSet property to "Single row" and the Connection property to "(local).SSISRunTimeMetrics". Enter the following in the SQLStatement property:

declare @Now datetime
set
@Now = GetDate()
exec ssis.usp_RecordTaskStart ?,NULL,@Now,?,'Started'

On the Parameter Mappings page, add two input parameters. Set Parameter 0 to Long data type and supply the User::iPackageLoadID variable. Set Parameter 1 to VarChar data type and supply the System::TaskName variable:

On the Result Set page, add one Result named 0 aimed at the User::iTaskLoadID variable: 

Click the OK button to close the editor.

Before proceeding, note that this Execute SQL Task is also the product of good design. In fact, the only difference task and it's counterpart in "Step 1 - Load Contact" is the name of the task itself. Everything else is identical.

So why not copy and paste the task? Good question - we certainly could have! And we will copy and paste other tasks starting now.

Connect a Success Precedence Constraint from the "Load Sales" Execute SQL Task to the Data Flow Task. We need to do some cleanup here before proceeding. In the "Step 1 - Load Contact" Sequence Container there's a Data Flow Task named "Data Flow Task". We have one of those in our "Step 2 - Load Sales" Sequence Container as well. This is permissible because the objects are in different containers and have different scope.

It robs us of an important navigation facility - one we will likely need: the ability to use the Data Flow Task tab's dropdown box. Have a look:

To remedy this, let's rename the Data Flow Task in "Step 1 - Load Contact" "Load Contact Data". Similary, let's rename the "Step 2 - Load Sales" Data Flow Task "Load Sales Data".

There. Much better.

Connect the "Load Sales Data" Data Flow Task to the second Execute SQL Task with a Success Precedence Constraint and rename it (the second Execute SQL Task) "Apply Staged Updates". Double-click it to open the editor and set the Connection property to "(local).SSISRunTimeMetrics_Target". We will return to this task later - click the OK button to close the editor.

Copy the "Log Successful End of Task" Execute SQL Task from the "Step 1 - Load Contact" Sequence Container and paste it onto the Data Flow canvas. Then drag it into the the "Step 2 - Load Sales" Sequence Container.

Note: You can paste it directly into the "Step 2 - Load Sales" Sequence Container if you want to, but I recommend you not do this in SSIS 2005. The Sequence Container will expand to accomodate anything inside it, and the paste functionality in SSIS 2005 completely ignores the mouse pointer position (and eveything else, so far as I can tell) when you paste from the clipboard. Combined, these behaviors cause sequence containers to grow unpredictably large when you paste directly into them.

No modifications are required for the "Log Successful End of Task" Execute SQL Task to function as desired in the new sequence container - how cool is that?

Copy the "Log Failed End Of Task" Execute SQL Task and paste it onto the Control Flow canvas. The new task will show up named "Log Failed End Of Task 1". Again, a naming convention conflict. To resolve it, rename the original "Log Failed End Of Task" Execute SQL Task - connected to the "Step 1 - Load Contact" Sequence Container via a Failure Precedence Constraint - to "Log Failed End of Load Contact Task".

Rename the newly pasted "Log Failed End Of Task 1" Execute SQL Task to "Log Failed End Of Load Sales Task" and connect the "Step 2 - Load Sales" Sequence Container to "Log Failed End Of Load Sales Task" via a Failure Precedence Constraint.

Copy the "Record Row Count" Execute SQL Task from the "Step 1 - Load Contact" Sequence Container. Again, paste it onto the Control Flow canvas and then drag it into the "Step 2 - Load Sales" Sequence Container. Connect the "Log Successful End Of Task" Execute SQL Task to the "Record Row Count" Execute SQL Task with a Success Precedence Constraint and double-click the task to open the editor.

All is well with the General page, but the Parameter Mapping page reveals some poor variable-naming choices in the last exercise. We can fix this in the variable dropdown. Click the dropdown that currently contains the User::iContactCount variable and select <New variable...>:

When the Add Variable dialog displays, click the Container dropdown and select the package ("SSISRunTimeMetrics"). This determines the scope of the variable and we want a package-scoped variable.

Click the OK button to select the package scope. Set the Name of the variable to iSalesInputCount, the Value Type (data type) to Int32, and the Value to 0:

Click the OK button to close the Add Variable dialog.

Repeat the procedure above for the "Counts" variables. Name the remaining three Counts variables iSalesNewRowsCount, iSalesChangedRowsCount, and iSalesUnchangedRowsCount; respectively. When complete, the Parameter Mapping page should appear as shown:

Click the OK button to close the Execute SQL Task editor.

The flow is now framed-out. We are ready to begin our Sales-specific coding.

Building The LZ (Landing Zone)

We need a place for our Sales data to land in SSISRunTimeMetrics_Target database.

In this section I am going to walk through the first phase of the process of converting a well-designed OLTP schema into a denormalized schema.

We'll start with the AdventureWorks Sales schema. First, let's list all the tables in the Sales schema using the following query:

use AdventureWorks;
go

select s.name + '.' + t.name
from sys.tables t
inner join sys.schemas s on s.schema_id = t.schema_id
where s.name = 'Sales'

This gives us a list of tables in the Sales schema:

Sales.StoreContact
Sales.ContactCreditCard
Sales.CountryRegionCurrency
Sales.CreditCard
Sales.Currency
Sales.SalesOrderDetail
Sales.CurrencyRate
Sales.Customer
Sales.SalesOrderHeader
Sales.CustomerAddress
Sales.SalesOrderHeaderSalesReason
Sales.SalesPerson
Sales.SalesPersonQuotaHistory
Sales.SalesReason
Sales.Individual
Sales.SalesTaxRate
Sales.SalesTerritory
Sales.SalesTerritoryHistory
Sales.ShoppingCartItem
Sales.SpecialOffer
Sales.SpecialOfferProduct
Sales.Store

Let's select Sales.SalesOrderDetail as our base table... we have to start somewhere. Open SQL Server Management Studio and connect the Object Browser to your local (or development) instance of SQL Server 2005. Expand Databases, then AdventureWorks, then Tables:

 

Scroll down to Sales.SalesOrderDetail. Right-click the table object in Object Browser. Hover over "Script Table as", then "CREATE To", and click "New Query Editor Window":

 

This creates a nice CREATE script (and more) for the Sales.SalesOrderDetail table. I only need the CREATE TABLE portion so I remove the rest. I modify the script further - making the table part of the dbo schema. I discard the constraints, NOT NULLs, brackets, and extended properties and I'm left with:

CREATE TABLE dbo.SalesOrderDetail(
SalesOrderID int NULL,
SalesOrderDetailID int NULL,
CarrierTrackingNumber nvarchar(25) NULL,
OrderQty smallint NULL,
ProductID int NULL,
SpecialOfferID int NULL,
UnitPrice money NULL,
UnitPriceDiscount money NULL,
LineTotal money,
rowguid uniqueidentifier,
ModifiedDate datetime NULL)

Repeating the process for the Sale.SalesOrderHeader table yields:

CREATE TABLE dbo.SalesOrderHeader(
SalesOrderID int NULL,
RevisionNumber tinyint NULL,
OrderDate datetime NULL,
DueDate datetime NULL,
ShipDate datetime NULL,
Status tinyint NULL,
OnlineOrderFlag bit NULL,
SalesOrderNumber nvarchar(25) NULL,
PurchaseOrderNumber nvarchar(25) NULL,
AccountNumber nvarchar(15) NULL,
CustomerID int NULL,
ContactID int NULL,
SalesPersonID int NULL,
TerritoryID int NULL
BillToAddressID int NULL,
ShipToAddressID int NULL,
ShipMethodID int NULL,
CreditCardID int NULL,
CreditCardApprovalCode varchar(15) NULL,
CurrencyRateID int NULL,
SubTotal money NULL,
TaxAmt money NULL,
Freight money NULL,
TotalDue money NULL,
Comment nvarchar(128) NULL,
rowguid uniqueidentifier NULL,
ModifiedDate datetime NULL)

I can now combine these statements, removing the duplication, to create a destination table statement:

CREATE TABLE dbo.SalesOrderHeaderDetail(
SalesOrderID int NULL,
SalesOrderDetailID int NULL,
CarrierTrackingNumber nvarchar(25) NULL,
OrderQty smallint NULL,
ProductID int NULL,
SpecialOfferID int NULL,
UnitPrice money NULL,
UnitPriceDiscount money NULL,
LineTotal money,
SalesOrderDtatilrowguid uniqueidentifier,
SalesOrderDetailModifiedDate datetime NULL,

RevisionNumber tinyint NULL,
OrderDate datetime NULL,
DueDate datetime NULL,
ShipDate datetime NULL,
Status tinyint NULL,
OnlineOrderFlag bit NULL,
SalesOrderNumber nvarchar(25) NULL,
PurchaseOrderNumber nvarchar(25) NULL,
AccountNumber nvarchar(15) NULL,
CustomerID int NULL,
ContactID int NULL,
SalesPersonID int NULL,
TerritoryID int NULL,
BillToAddressID int NULL,
ShipToAddressID int NULL,
ShipMethodID int NULL,
CreditCardID int NULL,
CreditCardApprovalCode varchar(15) NULL,
CurrencyRateID int NULL,
SubTotal money NULL,
TaxAmt money NULL,
Freight money NULL,
TotalDue money NULL,
Comment nvarchar(128) NULL,
SalesOrderHeaderrowguid uniqueidentifier NULL,
SalesOrderHeaderModifiedDate datetime NULL)

Execute this statement against the SSISRunTimeMetrics_Target database to create our destination table.

Filling In The Blanks

Double-click the "Load Sales Data" Data Flow Task to switch to the Data Flow tab for editing. Drag an OLE DB Source Adapter onto the canvas and double-click it to open the editor. Select the (local).AdventureWorks connection manager. Change the Data access mode to Sql Command and enter the following SQL statement into the SQL Command Text textbox: 

SELECT SalesOrderID
,SalesOrderDetailID
,CarrierTrackingNumber
,OrderQty
,ProductID
,SpecialOfferID
,UnitPrice
,UnitPriceDiscount
,LineTotal
,rowguid
,ModifiedDate
FROM Sales.SalesOrderDetail

Click the OK button to close the editor. Right-click the Source Adapter and rename it "Sales Detail Source":

Drag a second OLE DB Source Adapter onto the Data Flow canvas and double-click it to open the editor. Select "(local).AdventureWorks" as the connection manager and SQL Command as the Data Access Mode. Enter the following statement into the SQL Command Text textbox:

SELECT SalesOrderID
,RevisionNumber
,OrderDate
,DueDate
,ShipDate
,Status
,OnlineOrderFlag
,SalesOrderNumber
,PurchaseOrderNumber
,AccountNumber
,CustomerID
,ContactID
,SalesPersonID
,TerritoryID
,BillToAddressID
,ShipToAddressID
,ShipMethodID
,CreditCardID
,CreditCardApprovalCode
,CurrencyRateID
,SubTotal
,TaxAmt
,Freight
,TotalDue
,Comment
,rowguid
,ModifiedDate
FROM Sales.SalesOrderHeader

 

Click the OK button to close the editor and rename the Source Adapter "Sales Header Source".

Drag a Merge Join onto the Data Flow canvas and connect a Data Flow Path (green arrow) from each Source Adapter to the Merge Join:

Note the Merge Join has an error - the Left Input is not sorted. (Neither is the Right Input, but validation fails on, and reports, the first error). To address this condition, right-click each Source Adapter and select Show Advanced Editor:

Click on the Input and Output Properties tab and expand the OLE DB Source Output object, then expand the Output Columns logical folder. Click on the OLE DB Source Output object (which represents the Output buffer) and change the IsSorted property to True:

In the Output Columns list, click on the SalesOrderID column and change the SortKeyPosition property to 1:

Click the OK button to close the Advanced Editor. Double-click the source adapter to open the editor and append an Order By clause to the SQL Command: "ORDER BY SalesOrderID". Close the editor and repeat this process for the other Source Adapter.

But wait - we still have an error:

 

Double-click the Merge Join to open the editor. Click every column from the Left input and every column except SalesOrderID from the Right Input. Two columns are named the same in both tables - rowguid and ModifiedDate. To differentiate, prepend each column's Output Alias with the table name:

 

Click the OK button to close the editor. The error clears.

Ok

We did all that to set this up. 

Count 'Em Up 

As with the Contacts data, we will count the rows entering the process. Depending on the nature of the process, source data, and desired measurement(s); you may choose to measure immediately after the Source Adapters or after the Merge Join - or both. We'll start with the same kind of counts measurements we built in the Contacts data flow.

Drag a Row Count transformation onto the Data Flow canvas and connect the output of the Merge Join to its input. Double-click the Row Count to open the editor and assign the User::iSalesInputCount to the VariableName property:

 

As with the Contacts data, our next steps are to correlate and filter the data, so drag a Lookup and Conditional Split transformation onto the data flow canvas and connect them (in respective order) to the Row Count transformation:

Double-click the Lookup to open the editor and assign the following properties:
OLE DB Connection Manager: (local).SSISRunTimeMetrics_Target
Table or View: dbo.SalesOrdeHeaderDetail 

Click the Columns tab, right-click in the white-space, and click Select All Mappings. Right-click again and select Delete Selected Mappings. Connect the SalesOrderID and SalesOrderDetailID columns. Select every column in the Available Lookup Columns list by checking each checkbox, then prepend each Output Alias with "Dest_":

Click the Configure Error Output button and change the Lookup Error from "Fail Component" to "Ignore Failure". Remember, this converts the default INNER JOIN functionality of the Lookup transformation into a LEFT OUTER JOIN. Click the OK button to close the Error Output Configuration, then click the OK button again to close the Lookup editor.

We are loading the pipeline with lookup data in our data flow that matches data - by SalesOrderID and SalesOrderDetailID - in the destination.

Connect the output data flow path of the Lookup of the Lookup transformation to the Conditional Split transformation and double-click the Conditional Split transformation to open the editor. Create a new output named "New Sales" with the Condition: "IsNull(Dest_SalesOrderDetailID)". If the LEFT OUTER JOIN functionality of the Lookup returns a NULL Dest_SalesOrderDetailID - and really every destination column will be NULL if there's no matching destination row, we could use any of them - then this is a new Sales data row.

Add a second condition named "Changed Sales" with the following condition expression:

(ISNULL(Dest_RevisionNumber) ? -1 : Dest_RevisionNumber) != (ISNULL(RevisionNumber) ? -1 : RevisionNumber) || (ISNULL(Dest_OrderDate) ? (DT_DBDate)0 : Dest_OrderDate) != (ISNULL(OrderDate) ? (DT_DBDate)0 : OrderDate) || (ISNULL(Dest_DueDate) ? (DT_DBDate)0 : Dest_DueDate) != (ISNULL(DueDate) ? (DT_DBDate)0 : DueDate) || (ISNULL(Dest_ShipDate) ? (DT_DBDate)0 : Dest_ShipDate) != (ISNULL(ShipDate) ? (DT_DBDate)0 : ShipDate) || (ISNULL(Dest_Status) ? 0 : Dest_Status) != (ISNULL(Status) ? 0 : Status) || (ISNULL(Dest_OnlineOrderFlag) ?  TRUE  : Dest_OnlineOrderFlag) != (ISNULL(OnlineOrderFlag) ?  TRUE  : OnlineOrderFlag) || (ISNULL(Dest_SalesOrderNumber) ? "NULL" : Dest_SalesOrderNumber) != (ISNULL(SalesOrderNumber) ? "NULL" : SalesOrderNumber) || (ISNULL(Dest_PurchaseOrderNumber) ? "NULL" : Dest_PurchaseOrderNumber) != (ISNULL(PurchaseOrderNumber) ? "NULL" : PurchaseOrderNumber) || (ISNULL(Dest_AccountNumber) ? "NULL" : Dest_AccountNumber) != (ISNULL(AccountNumber) ? "NULL" : AccountNumber) || (ISNULL(Dest_CustomerID) ? -1 : Dest_CustomerID) != (ISNULL(CustomerID) ? -1 : CustomerID) || (ISNULL(Dest_ContactID) ? -1 : Dest_ContactID) != (ISNULL(ContactID) ? -1 : ContactID) || (ISNULL(Dest_SalesPersonID) ? -1 : Dest_SalesPersonID) != (ISNULL(SalesPersonID) ? -1 : SalesPersonID) || (ISNULL(Dest_TerritoryID) ? -1 : Dest_TerritoryID) != (ISNULL(TerritoryID) ? -1 : TerritoryID) || (ISNULL(Dest_BillToAddressID) ? -1 : Dest_BillToAddressID) != (ISNULL(BillToAddressID) ? -1 : BillToAddressID) || (ISNULL(Dest_ShipToAddressID) ? -1 : Dest_ShipToAddressID) != (ISNULL(ShipToAddressID) ? -1 : ShipToAddressID) || (ISNULL(Dest_ShipMethodID) ? -1 : Dest_ShipMethodID) != (ISNULL(ShipMethodID) ? -1 : ShipMethodID) || (ISNULL(Dest_CreditCardID) ? -1 : Dest_CreditCardID) != (ISNULL(CreditCardID) ? -1 : CreditCardID) || (ISNULL(Dest_CreditCardApprovalCode) ? "NULL" : Dest_CreditCardApprovalCode) != (ISNULL(CreditCardApprovalCode) ? "NULL" : CreditCardApprovalCode) || (ISNULL(Dest_CurrencyRateID) ? -1 : Dest_CurrencyRateID) != (ISNULL(CurrencyRateID) ? -1 : CurrencyRateID) || (ISNULL(Dest_SubTotal) ? 0 : Dest_SubTotal) != (ISNULL(SubTotal) ? 0 : SubTotal) || (ISNULL(Dest_TaxAmt) ? 0 : Dest_TaxAmt) != (ISNULL(TaxAmt) ? 0 : TaxAmt) || (ISNULL(Dest_Freight) ? 0 : Dest_Freight) != (ISNULL(Freight) ? 0 : Freight) || (ISNULL(Dest_TotalDue) ? 0 : Dest_TotalDue) != (ISNULL(TotalDue) ? 0 : TotalDue) || (ISNULL(Dest_Comment) ? "NULL" : Dest_Comment) != (ISNULL(Comment) ? "NULL" : Comment) || (ISNULL(Dest_SalesOrderHeaderModifiedDate) ? (DT_DBDATE)0 : Dest_SalesOrderHeaderModifiedDate) != (ISNULL(SalesOrderHeaderModifiedDate) ? (DT_DBDATE)0 : SalesOrderHeaderModifiedDate) || (ISNULL(Dest_CarrierTrackingNumber) ? "NULL" : Dest_CarrierTrackingNumber) != (ISNULL(CarrierTrackingNumber) ? "NULL" : CarrierTrackingNumber) || (ISNULL(Dest_OrderQty) ? 0 : Dest_OrderQty) != (ISNULL(OrderQty) ? 0 : OrderQty) || (ISNULL(Dest_ProductID) ? -1 : Dest_ProductID) != (ISNULL(ProductID) ? -1 : ProductID) || (ISNULL(Dest_SpecialOfferID) ? -1 : Dest_SpecialOfferID) != (ISNULL(SpecialOfferID) ? -1 : SpecialOfferID)

Rename the default output "Unchanged Rows". Click the OK button to close the editor.

Drag an OLE DB Destination Adapter onto the Data Flow canvas and rename it "New Sales Destination". Connect an output of the Conditional Split to the new Destination Adapter. When prompted, select the "New Sales" output of the Conditional Split:

 

Double-click the Destination Adapter ot open the editor. Set the Connection Manager property to (local).SSISRunTimeMetrics_Target. Set the Data Access Mode property to "Table or View" and select the dbo.SalesOrderHeaderDetail table. Click on the Mappings page to automap the pipeline fields to the table columns:

 

Click the OK button to close the editor.

Drag another OLE DB Destination Adapter onto the Data Flow canvas and rename it "stgSalesChangedRows". Connect an output from the Conditional Split to the new Destination Adapter and select the "Changed Sales" output when prompted. Double-click the Destination Adapter to open the editor. Set the Connection Manager property to (local).SSISRunTimeMetrics_Target and set the Data Access Mode property to "Table or View". Click the New button next to the "Name of the Table or View" dropdown:

Click the OK button to create the stgSalesChangedRows table. Click the Mappings page to automap the columns, then click the OK button to close the editor.

We now have Sales ETL. Cool

To complete our counts logic, add three Row Count transformations to the Data Flow canvas. Name them "New Rows Count", "Changed Rows Count", and "Unchanged Rows Count". Position "New Rows Count" between the Conditional Split and the "New Sales Destination" Adapter. Double-click to open the editor and set the VariableName property to "User::iSalesNewRowsCount". Click the OK button to close the editor.

Position the "Changed Rows Count" between the Conditional Split and the stgSalesChangedRows Destination Adapter. Open its editor and set the VariableName property to "User::iSalesChangedRowsCount".

Open the editor for the "Unchanged Rows Count" transformation and set the VariableName property to "User::iSalesUnchangedRowsCount".

 Before we leave this section, let's complete the staged updates by returning to the Control Flow and updating the SQLStatement property of the "Apply Staged Updates" Execute SQL Task inside the "Step 2 - Load Sales" Sequence Container with the following statement:

UPDATE dest
SET dest.SalesOrderID = stage.SalesOrderID
,dest.CarrierTrackingNumber = stage.CarrierTrackingNumber
,dest.OrderQty = stage.OrderQty
,dest.ProductID = stage.ProductID
,dest.SpecialOfferID = stage.SpecialOfferID
,dest.UnitPrice = stage.UnitPrice
,dest.UnitPriceDiscount = stage.UnitPriceDiscount
,dest.LineTotal = stage.LineTotal
,dest.SalesOrderDtatilrowguid = stage.SalesOrderDtatilrowguid
,dest.SalesOrderDetailModifiedDate = stage.SalesOrderDetailModifiedDate
,dest.RevisionNumber = stage.RevisionNumber
,dest.OrderDate = stage.OrderDate
,dest.DueDate = stage.DueDate
,dest.ShipDate = stage.ShipDate
,dest.Status = stage.Status
,dest.OnlineOrderFlag = stage.OnlineOrderFlag
,dest.SalesOrderNumber = stage.SalesOrderNumber
,dest.PurchaseOrderNumber = stage.PurchaseOrderNumber
,dest.AccountNumber = stage.AccountNumber
,dest.CustomerID = stage.CustomerID
,dest.ContactID = stage.ContactID
,dest.SalesPersonID = stage.SalesPersonID
,dest.TerritoryID = stage.TerritoryID
,dest.BillToAddressID = stage.BillToAddressID
,dest.ShipToAddressID = stage.ShipToAddressID
,dest.ShipMethodID = stage.ShipMethodID
,dest.CreditCardID = stage.CreditCardID
,dest.CreditCardApprovalCode = stage.CreditCardApprovalCode
,dest.CurrencyRateID = stage.CurrencyRateID
,dest.SubTotal = stage.SubTotal
,dest.TaxAmt = stage.TaxAmt
,dest.Freight = stage.Freight
,dest.TotalDue = stage.TotalDue
,dest.Comment = stage.Comment
,dest.SalesOrderHeaderrowguid = stage.SalesOrderHeaderrowguid
,dest.SalesOrderHeaderModifiedDate = stage.SalesOrderHeaderModifiedDate
FROM dbo.SalesOrderHeaderDetail dest
INNER JOIN dbo.stgSalesChangedRows stage ON stage.SalesOrderDetailID = dest.SalesOrderDetailID

There. Done and done. 

If You Build It...

Up until now, we've basically followed the same template used for Contacts to construct the Sales ETL. We added some complexity (by design) to grow our understanding of ETL along with our knowledge of SSIS.

Our ETL measurement is record counts and record counts only. Let's expand on that some by also capturing a monetary sum. This will add even more confidence in our ETL, once we validate (Validation is Part 5).

Let's begin by creating a new destination table to hold our sums: SSISRunTimeMetrics.ssis.RowSums. Use the following script to create the table:

use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 end

-- create RunTimeErrors table...
if exists(select s.name + '.' + t.name
          from sys.tables t
          inner join sys.schemas s on s.schema_id = t.schema_id
          where t.name = 'RowSums'
           and s.name = 'ssis')
 drop table ssis.RowSums
go

Create Table ssis.RowSums
 (RowSumsID int identity(1,1)
 ,TaskMetricsID int null
 ,RunTimeMetricsId int not null
 ,ParentTaskMetricsID int null
 ,RowSum decimal(38,2) null
 ,RowSumColumnName varchar(255) null
 ,RowSumTypeID char(1) null)

This table is remarkably similar to the ssis.RowCounts table we created to hold Row Count data - and for good reason, the functions of these two tables are remarkably similar. As with the Row Counts data, we need to add a stored procedure to insert Sums data, and another table to hold Inserted Types... or do we? Instead of re-creating the functionality contained in the ssis.RowCountTypes table, let's rename - and expand the purpose of - the table.

Executing the following script accomplishes this nicely:

use SSISRunTimeMetrics
go

-- vars...
declare @sql varchar(255)

-- create ssis schema...
if not exists(select name
              from sys.schemas
              where name = 'ssis')
 begin
  set @sql = 'Create Schema ssis'
  exec(@sql)
 end

-- delete RowCountTypes table, if exists...
if exists(select s.name + '.' + t.name
          from sys.tables t
          inner join sys.schemas s on s.schema_id = t.schema_id
          where t.name = 'RowCountTypes'
           and s.name = 'ssis')
 drop table ssis.RowCountTypes
go

-- delete RowTypes table, if exists...
if exists(select s.name + '.' + t.name
          from sys.tables t
          inner join sys.schemas s on s.schema_id = t.schema_id
          where t.name = 'RowTypes'
           and s.name = 'ssis')
 drop table ssis.RowTypes
go

Create Table ssis.RowTypes
(RowTypeID char(1) not null
,RowTypeName varchar(25) null
,RowTypeDescription varchar(255) null)
go

if not exists(select RowTypeID
              from ssis.RowTypes
              where RowTypeID = 'I')
 insert into ssis.RowTypes
 (RowTypeID
 ,RowTypeName
 ,RowTypeDescription)
 values
 ('I'
 ,'Selected Input Rows'
 ,'Input rows selected from a source')

if not exists(select RowTypeID
              from ssis.RowTypes
              where RowTypeID = 'N')
 insert into ssis.RowTypes
 (RowTypeID
 ,RowTypeName
 ,RowTypeDescription)
 values
 ('N'
 ,'New Rows'
 ,'New rows')

if not exists(select RowTypeID
              from ssis.RowTypes
              where RowTypeID = 'C')
 
insert into ssis.RowTypes
 (RowTypeID
 ,RowTypeName
 ,RowTypeDescription)
 values
 ('C'
 ,'Changed Rows'
 ,'Changed rows')

if not exists(select RowTypeID
              from ssis.RowTypes
              where RowTypeID = 'U')
 
insert into ssis.RowTypes
 (RowTypeID
 ,RowTypeName
 ,RowTypeDescription)
 values
 ('U'
 ,'Unchanged Rows'
 ,'No changes detected in rows')
go

Our stored procedure to accomplish inserts:

use SSISRunTimeMetrics
go
if exists(select s.name + '.' + p.name
from sys.procedures p
inner join sys.schemas s on s.schema_id = p.schema_id
where p.name = 'usp_RecordRowSum'
and s.name = 'ssis')
begin
Drop Procedure ssis.usp_RecordRowSum
end
go
Create Procedure ssis.usp_RecordRowSum
@RunTimeMetricsID
int
,@TaskMetricsID int
,@RowSum decimal(38,2)
,@RowSumTypeID char(1)
,@RowSumColumnName varchar(255) = null
,@ParentTaskMetricsID int = null
As
begin
-- insert the run time errors data...
insert into ssis.RowSums
(TaskMetricsID
,RunTimeMetricsId
,ParentTaskMetricsID
,RowSum
,RowSumColumnName
,RowSumTypeID)
values
(@TaskMetricsID
,@RunTimeMetricsID
,@ParentTaskMetricsID
,@RowSum
,@RowSumColumnName
,@RowSumTypeID)
end

go

Now that our infrastructure is built we can start using it to load SSIS run time metrics.  

Add 'Em Up

We need variables to hold the sums we intend to collect. Right-click the Control Flow and click Variables. Click the New Variable button and add a package-scoped variable, data type Double, named iSalesInputAmount. Repeat the process for three other Double variables named iSalesNewAmount, iSalesChangedAmount, and iSalesUnchangedAmount.

There are other ways to load this type of data. The way I choose to demonstrate here is not the cleanest but it clearly exercises the principles of ETL Instrumentation.

Return to the "Load Sales Data" Data Flow Task and add one each Multicast and Aggregate transformations. Position the Mulitcast between Merge Join and Input Row Count transformations and connect them through it. Rename the Aggregate transformation "Input Line Total" and connect another output of the Multicast to it:

 

Double-click the Aggregate transformation to open the editor and check the LineTotal input column. Select Sum from the Operation column - this will add the total of all the LineTotal columns that pass between the Merge Join and Input Row Count transformations: 

Add a Script Component to the Data Flow. When prompted for Script Component Type, select Destination:

Rename the Script Component "Push InputLineTotal into Input Variable", connect the "Input Line Total" Aggregate transformation to it, and double-click the Script Component to open the editor.

On the Input Columns page, check the Line Total input. On the Script page, enter iSalesInputAmount in the ReadWriteVariables property and click the Design Script button. In the script editor, enter the following code:

Dim iAmount As Double

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
  '
  ' Add your code here
  '

  iAmount = Row.LineTotal

End Sub

Public
Overrides Sub PostExecute()

  MyBase
.PostExecute()
  Me.Variables.iSalesInputAmount = iAmount
End Sub

Close the Script Editor and click the OK button to clase the Script Component editor. This should load the aggregated value into the iSalesInputAmount variable.

Drag two each Multicast and three each Aggregate, and Script Component transformations onto the Data Flow canvas. Repeat the procedure outlined above for the New, Changed, and Unchanged Conditional Split outputs - for the iSalesNewAmount, iSalesChangedAmount, and iSalesUnchangedAmount variable values (respectively). Note you do not need a Multicase transformation for the Unchanged output. Sum the LineTotal fields for each output.

This is a lot of work and there is lots of room for error. Take your time. Double-check your work. Don't take shortcuts. When complete, the New section will look something like this:

Load 'Em Up

Return to the Control Flow - it's time to captue these metrics!

In the "Step 2 - Load Sales" Sequence Container, rename the "Record Row Count" Execute SQL Task "Record Metrics". Double-click it to open the editor. Click the ellipsis on the SQLStatement property and add the following script to the existing statement:

exec ssis.usp_RecordRowSum ?,?,?,'I','Sales.SalesOrderDetail.LineTotal'

exec ssis.usp_RecordRowSum ?,?,?,'N','Sales.SalesOrderDetail.LineTotal'

exec ssis.usp_RecordRowSum ?,?,?,'C','Sales.SalesOrderDetail.LineTotal'

exec ssis.usp_RecordRowSum ?,?,?,'U','Sales.SalesOrderDetail.LineTotal'

Click the Parameter Mapping page and add twelve parameters. With the existing twelve parameters, the new twelve are numbered 12 - 23:

The parameters follow the pattern iPackageLoadID (Input, Long, Incrementally Numbered), iTaskLoadID (Input, Long, Incrementally Numbered), iSales___Amount (Input, Double, Incrementally Numbered).

Click the Ok button to close the editor.

Testing, One, Two, Three...

Execute the package to test the Summing functionality. The following represents a better report query for our collected data:

use SSiSRunTimeMetrics
go

select
m
.packageName
,m.packageStartDateTime
,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as 'packageRunTime'
,m.packageStatus
,t.SourceName
,t.TaskStartDateTime
,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as 'taskRunTime'
,t.TaskStatus
,s.RowSum as 'Measurement'
,'Sum' as 'MeasurementType'
,st.RowTypeName
from ssis.TaskMetrics t
inner join ssis.RunTimeMetrics m on t.RunTimeMetricsID = m.id
inner join ssis.RowSums s on s.TaskMetricsID = t.TaskMetricsID
inner join ssis.RowTypes st on st.RowTypeID = s.RowSumTypeID
where m.id = (select Max(id)
from ssis.RunTimeMetrics)
and s.RowSum > 0

union

select

m
.packageName
,m.packageStartDateTime
,DateDiff(ss, m.packageStartDateTime, m.packageEndDateTime) as 'packageRunTime'
,m.packageStatus
,t.SourceName
,t.TaskStartDateTime
,DateDiff(ss, t.TaskStartDateTime, t.TaskEndDateTime) as 'taskRunTime'
,t.TaskStatus
,c.[RowCount] as 'Measurement'
,'Counts' as 'MeasurementType'
,ct.RowTypeName
from ssis.TaskMetrics t
inner join ssis.RunTimeMetrics m on t.RunTimeMetricsID = m.id
inner join ssis.RowCounts c on c.TaskMetricsID = t.TaskMetricsID
inner join ssis.RowTypes ct on ct.RowTypeID = c.RowCountTypeID
where m.id = (select Max(id)
from ssis.RunTimeMetrics)
and c.[RowCount] > 0

Conclusion

Again, these examples are intended to demonstrate the principles and characteristics of ETL Instrumentation. They are not complete and Production-ready. I make no claims that this is "the right way" or even a best practice to capture ETL Run Time Metrics data. I do maintain that such data is useful in many ways - especially for troubleshooting and certain performance predictive analytics.

Next: Validation - putting this data to work.

:{> Andy

Installing Virtual Server On Vista Ultimate 64-bit

Mea...

This may be a little off-topic.

I use Virtual Server and Virtual PC a lot for things like testing beta and CTP releases of SQL Server. I don't have time or the inclination to reformat physical hard drives, load an OS, or even fiddle with restoring from an enterprise backup solution like Ghost.

I blogged about configuring IIS7 for Reporting Services 2005, so this may have more relevance than I imagine.

Introduction 

If you're not using virtualized machines or servers, you should check them out. They're great for testing new stuff - relatively risk-free. I am aware of two flavors of virtual engines: VMWare and Microsoft products: Virtual PC and Virtual Server.

I have no experience with VMWare.

The Microsoft products are 100% free (as in beer).

I do not know which is better. I imagine that, like all things in life, one has advantages over the other and vice versa. The Microsoft stuff does what I need, so I've not ventured to search greener pastures (ever wonder why those pastures are greener? Think about it).

Get To The Configuration Already!

When I installed Virtual Server 2005 R2 SP1 on my newish Vista Ultimate 64-bit machine, I saw quirky stuff. For example, I could not change the port for admin - it defaulted to 80 and gave me an error message stating my server "didn't allow" installations on other ports.

Reeeeaaallly? Huh.

I proceeded with the installation anyway and nothing worked. Well, nothing in Virtual Server. I could not get to the administrative page and, in case you've never worked with Virtual Server, you can do nothing unless you can reach the administration page. (Note, you can do stuff if you like to type. I don't like to type.)

search ensued. I found a good article on why this was happening and how to fix it. It happens because IIS7 isn't configured (out of the box) to allow Virtual Server 2005 R2 to do its thing. To fix it, open Control Panel, Programs and Features (the snappy new Add/Remove Programs), then click "Turn Windows features on or off".

You want to navigate to Internet Information Services folder and make your Windows Features look like this:

 
(A Thousand Words)

The image above may be the first on the web (it was the only one I could find) of how to configure IIS7 on Vista Ultimate 64-bit for both SSRS 2005 and Virtual Server 2005 R2.

Note you still get the Configure Components screen displayed above - the one with port textbox disabled and set to 80 - at least in Vista Ultimate. But you also get these screens:

 

:{> Andy

Posted by andy | 2 Comments
Filed under: , ,
More Posts Next page »