Author:
Donald Farmer
ISBN:
1-932577-21-1
Audience:
Developer
Publication Date:
June 5, 2005
Description:
Get an early look at how to use the scripting features of SQL Server
2005 Integration Services (SSIS) from the source – Donald Farmer.
Donald is a Group Program Manager at Microsoft for SQL Server
2005 Integration Services. In less than 200 pages, this book covers the
concepts, architecture, scripting tasks and components. In addition,
Donald covers how to use .NET assemblies with your script tasks.
This book contains numerous examples, code, and scripts that are
available to download for free.
S
Q
L Server 2005
Integration
S
erv
i
ce
s
gg
Beta Prev
i
ew
SQL Server 2005
Integration
Services
Beta Preview
T
h
e Rationa
l
G
ui
d
e T
o
The Rational Guide To
Scriptin
g
wit
h
Scripting with
SAMPLE CHAPTER - COPYRIGHTED MATERIAL
Chapter 7
Your First Script Component
In Chapter 2, you learned about the three types of Data Flow components in SSIS:
sources, transformations and destinations. This chapter will describe these different types
and show you how to create your fi rst transformation component using VB.NET in the
Script Component. Once you have mastered the Script Component, you will fi nd that you
can readily integrate almost any functionality into the SSIS Data Flow. Custom scripted
data sources can handle otherwise unsupported fi le formats. Custom transformations
can call functions in managed assemblies, including the .NET Framework, and custom
scripted destinations enable SSIS to output data in very fl exible ways. In fact, it is quite
possible to write sophisticated Data Flows for ETL and data integration using only Script
Components!
Script Component Types
You will typically use script somewhat differently in each of the three component types:
Script Source Component
Source components have no input columns, but do have output columns. The purpose of
the Script Component in this case is to deliver data to the output columns. For example,
the component author could write a script using fi le and string handling functions to parse
a complex text fi le whose format can not be handled by the Flat File Connection Manager.
Text fi les like the one in Listing 7.1 are quite common:
SAMPLE CHAPTER - COPYRIGHTED MATERIAL
RECORD _ START
REC _ ID: 17804-4
BATCH _ ID: O2 _ 2004 _ 5
CUST _ LNAME: MACLEOD
RECORD _ END
RECORD _ START
REC _ ID: 17815-2
ALT _ REC _ ID: Temp2 _ 31
BATCH _ ID: H1 _ 2004 _ 2
CUST _ LNAME: MACLEAN
RECORD _ END
RECORD _ START
REC _ ID: 17222-1
BATCH _ ID: O1 _ 2004 _ 2
CUST _ LNAME: MCLENNAN
RECORD _ END
Listing 7.1:
A Good Candidate for a Script Source Component.
In this case, the text fi le has its data arranged in rows with labels rather than columns (as
expected by the Flat File Connection Manager), and it has irregular information in each
record (the second record contains an ALT_REC_ID item which is not in the fi rst) which
may need special handling. A script source component could readily handle this format,
using .NET Framework string functions to convert the records and their items into rows
and columns.
Script Transformation Component
Transformation components have input columns
and
output columns. In these components,
and output columns. In these components, and
the purpose of the script will typically be to transform the data in some way between
inputs and outputs. What these transformations are is up to you—that is why scripts are
so fl exible. Chapter 2 discussed two different patterns of transformation: synchronous and
asynchronous. Script Components can handle both patterns quite easily.
e Rational Guide To Scripting with SQL Server 2005 Integration Services Beta Preview
2
SAMPLE CHAPTER - COPYRIGHTED MATERIAL
A synchronous component, you will remember, is particularly useful for row by row
transformations. For example, I may have incoming data which includes customer last
names. Perhaps for easy cross-referencing with other customer records, I would like to
calculate a SOUNDEX value for each customer name. SOUNDEX is an algorithm invented
by the US Census for codifying names to take account of different spellings. T-SQL has
a SOUNDEX function, but SQL Server Integration Services does not. The component
author can code their own SOUNDEX function in VB.NET script and transform every
incoming customer name using that script, emitting the new SOUNDEX value at the
output.
Note:
Soundex is an algorithm for matching strings—principally personal names—phonetically.
Soundex has been used since the 1880 US Census to conform diff erent spellings of surnames
in census reports to a standard value. For example, Smith and Smyth have the same Soundex
value. Soundex helped researchers using Census data to match and fi nd surnames even
with diff erent spellings. The algorithm converts a name to a code where the fi rst letter of
the code is the fi rst letter of the name, and a sequence of numbers represents the other
syllables. Vowels are ignored, as are double consonsants and the letters Y, H and W. So, in
Soundex, Alan and Allen both become A45 . Alonso becomes A452 .
An asynchronous component is useful for performing operations which change the shape
of the data signifi cantly, or where incoming rows do not have related rows at the output.
A good use of an asynchronous script would be to aggregate text. The SSIS Aggregate
component is very powerful, but it can only perform
Min
and
Max
calculations against
numeric columns. If you need to be able to calculate the
Min
and
Max
values of a string
column, this can easily be achieved in VB.NET script, as shown in Chapter 14.
Script Destination Component
Listing 7.1 showed a text fi le which could not be parsed by the Flat File Connection Manager
but which could be handled by a script source component. In an enterprise where text fi les
like this are used by legacy applications, you may also need the ability to generate a fi le in
this format for the legacy application to read. The script destination component is useful
in these circumstances.
Chapter 7 - Your First Script Component
3
SAMPLE CHAPTER - COPYRIGHTED MATERIAL
You will have guessed by now that a script destination component has input columns,
but no output columns. Instead, it is the VB.NET script which handles the data, perhaps
using fi le and string handling routines from the .NET Framework to output text fi les in
the appropriate format.
Having discussed the various kinds of components which you can script, it is time to get
started with our fi rst Script Component.
Adding a Script Component to Your Package
The fi rst Script Component we are going to look at it is a transformation component.
In fact, we are going to build a simple SOUNDEX component to transform a column
containing a name to a codifi ed value representing the sound of the name, just as the US
Census would do.
As this fi rst component is a transformation, it requires some data to work with. To get us
started, we can quickly build a package containing a Data Flow and a source component
as follows.
Note:
This chapter assumes you have at least some familiarity with building a Data Flow with SSIS,
even if just the simplest examples from the documentation or samples that ship with the
product.
Preparing the package
Use the following steps to prepare the package:
1. Create a new SSIS Package.
2. In the designer, drag a
Data Flow Task
from the
Data Flow Task from the Data Flow Task
Control Flow Items
tab of
the Toolbox to the
Control Flow
design surface.
3. Double-click the
Data Flow Task
to open the
Data Flow Task to open the Data Flow Task
Data Flow
design surface.
4. Drag an
OLE DB Source
component from the
Data Flow Sources
tab of
the Toolbox.
e Rational Guide To Scripting with SQL Server 2005 Integration Services Beta Preview
4