SSIS Expression Language and the Derived Column Transformation
Introduction
This post is part of a series titled An Introduction to the SSIS Expression Language.
In this post, I demonstrate the SSIS Expression Language and the Derived Column Transformation.
To build the demo project described in this article you will need SSIS 2005 and the AdventureWorks sample database (which can be downloaded at http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=4004).
Add To The Demo Project
If you have not already done so, create an SSIS project named ExpressionLanguageDemo.
Once the project is created, open Solution Explorer and rename Package.dtsx - change the name to DerivedColumn.dtsx. If you've already created the project, right-click the SSIS Packages folder in Solution Explorer and click New Package - and rename that package to DerivedColumn.dtsx. When prompted, rename the package object as well. (I think you should always do this - I always answer this message box by clicking the Yes button.)
Drag a Data Flow Task onto the Control Flow. Click the Data Flow tab to begin editing, and drag an OLE DB Source onto the canvas. Double-click the OLE DB Source to open the editor. As in the post entitled SSIS Expression Language and Variables, click the New button to create a new Connection Manager to the AdventureWorks database.
Set the Data Access Mode to SQL Command and enter the following T-SQL statement into the SQL Command Text textbox:
Select
ContactID
,Title
,FirstName
,MiddleName
,LastName
,EmailAddress
From Person.Contact
Click OK to close the editor.
Learning To Derive
Drag a Derived Column Transformation onto the Data Flow canvas and connect the OLE DB Source Adapter to it using a Data Flow Task:
Double-click the Derived Column transformation to open the editor.
Let's start somewhat in the middle. The Derived Column column of the editor contains a list of all the columns flowing into the transformation, plus an entry for "":
If you select one of the columns flowing into the transformation, you can use an Expression to modify the value in some way. Why would you do this? Let's look at an example.
The Replacement
From the Derived Column dropdown, select "Replace 'MiddleName'":
Note a few properties are automatically populated when you make the selection. The Derived Column Name is set to "MiddleName." This makes sense - your selection indicates you want to replace the current values in the MiddleName column. The Data Type column contains the equivalent SSIS data type for the MiddleName field in the Person.Contact table.
Data Type coercion in SSIS is interesting - there are quite a few moving parts. The MiddleName field is an NVARCHAR(50) data type in the Person.Contact table. The connection between SSIS and the AdventureWorks database is made using an OLE DB Connection Manager. There is some data type corecion in the database - OLE DB provider - SSIS connection layers. By the time the data type makes it into the SSIS data flow pipeline, it is represented as a DT_WSTR, or Unicode string. The Length column is also populated with 50.
The Expression column is a blank slate upon which you write the code that controls value. You can hard code a value, such as a space, by enclosing a space inside double-quotes:
" "
You can hard code a Null value by using a NULL function. Expand the NULL Functions folder in the upper right of the Derived Column transformation and drag the Null function for the DT_WSTR data type into the Expression textbox, replacing the <> placeholder with the length of the column (50):
NULL(DT_WSTR, 50)
Or you can conditionally replace the MiddleName value. In this case, we'll replace Null middle names with a space. If not Null, we'll allow the existing value to continue in the data flow pipeline. To do this, use the ternary syntax:
[Test Condition] ? [True Operation] : [False Operation]
To replace Null MiddleName values with a space, use the following expression:
IsNull(MiddleName) ? " " : MiddleName
Add One
In the second row, let's add a new column. In the Derived Column dropdown, select "". Note the only column auto-populated is the Derived Column Name column, and it contains "Derived Column 1". Rename this column "FullName." Add the following expression:
FirstName + (IsNull(MiddleName) ? " " : MiddleName) + LastName
Note the Data Type DT_WSTR for both columns. Suppose we want to change the data type? We can cast the data type for added columns. Expand the Type Casts function folder and drag a DT_STR cast to the beginning of the Expressions textbox - replacing the <> placeholder with 150 (the summed lengths of the columns) and the <> placeholder with 1252 for en-us (for US English, feel free to substitute with your code page if different) - to convert the value from Unicode to an ASCII string. The FullName expression should now read:
(DT_STR, 150, 1252)(FirstName + (IsNull(MiddleName) ? " " : MiddleName) + LastName)
Note the Data Type column is now a String:
You can add a DT_STR Cast statement to the expression for the MiddleName, but it doesn't change the Data Type. Why can't we change the data type for existing columns in the Derived Column transformation? We're replacing the values, not changing the data type. Is it impossible to change the data type of an existing column in the Derived Column? Let's put it this way: It is not possible to convert the data type of a column when you are merely replacing the value. You can, however, accomplish the same goal by creating a new column in the Data Flow.
To demonstrate, configure the third row to add a new column. Name the new column StrMiddleName and set the expression to:
(DT_STR, 150, 1252)MiddleName
Other options: You can also use a Data Conversion Transformation to change the data type of a column in the data flow pipeline, but the Data Conversion also adds a column to the data flow pipeline. You can also cast the value in T-SQL from the source (in the OLE DB Source Adapter).
Conclusion
The Derived Column Transformation is very flexible and powerful. The mathematical and string parsing functions are more easily understood by folks with a developer background. The ETL-specific functions - such as those demonstrated in this post: NULL fields, type-casting, and ternary conditional operations - are not as commonly understood.
:{> Andy