Posts tagged ‘Script’

Naming Columns for the Script Component

Do you use whitespace or special characters in your column names? Most people don’t, because of the additional headaches it creates. You have to delimit the column names, come up with a work-around for tools that don’t support column names with special characters, etc. Underscores, however, are used pretty extensively in place of spaces. If you are using Script Components in SSIS, though, you may encounter an all-new headache with special characters or even underscores in your column names.

When you use a script component in SSIS, it generates some .NET code for you automatically, based on the metadata in the pipeline connected to the script component. However, when this code is generated, SSIS strips out any whitespace or special characters from the names of inputs, outputs, and columns. It only retains the letters and numbers (alphanumeric characters) in these names.

Here’s some examples of column name issues that I’ve run into with scripts (and while these specific items are made up, they represent real-world scenarios I’ve encountered – there’s some really horrible naming approaches out there):

Original Column Name Script Column Name
Account Account
Account# Account
Account Number AccountNumber
Account_Number AccountNumber
TI_TXN_ID TITXNID
TI_TXNID TITXNID

 

As you can see, once the alphanumeric characters have been stripped from these column names, they are no longer unique. That can pose a few problems in your script code. What’s worse, because this code is auto-generated by SSIS, you can’t fix it without changing the column names in the data flow, even though this is really purely a script thing (and not even a .NET limitation – underscores are perfectly valid in .NET naming). What’s even worse than that – you don’t get an error till the binary code is recompiled.

So, if you are working with script components, make sure all your column names are unique even when all non-alphanumeric characters have been stripped from them. The same thing applies to your output names – they must be unique based only on the alphanumeric characters.

Using OLE DB Connections from Script Tasks

I write scripts on a pretty regular basic, and often need to access database connections from them. It’s pretty easy to do this if you are using an ADO.NET connection. However, if you are using OLE DB, you have to go through a couple of additional steps to convert the connection to an ADO.NET version. Matt Masson posted a great code sample for doing this conversion.

I use a slightly altered version of this code pretty regularly. It’s been modified to support both OLE DB and ADO.NET connections, so that I can switch connections without having to change the script code.

To use it, you need to add a reference in your script project to Microsoft.SqlServer.DTSRuntimeWrap. Then, add the following to the usings section at the top of the script:

using System.Data.Common;
using Wrap = Microsoft.SqlServer.Dts.Runtime.Wrapper;

For the code to get the connection, use the following:

ConnectionManager cm = Dts.Connections["MyConnection"];
DbConnection conn = null;
if (cm.CreationName == "OLEDB")
{
	Wrap.IDTSConnectionManagerDatabaseParameters100 cmParams =
	cm.InnerObject as Wrap.IDTSConnectionManagerDatabaseParameters100;
	conn = cmParams.GetConnectionForSchema() as DbConnection;
}
else
{
	conn = cm.AcquireConnection(null) as DbConnection;
}

if (conn.State == ConnectionState.Closed)
{
	conn.Open();
}

// TODO: Add your code here

conn.Close();
Dts.TaskResult = (int)ScriptResults.Success;

You can use the “conn” object to perform actions against the connection. Since it’s using the common DBConnection interface, you can use it against any database connection that you have an ADO.NET provider for (which includes OLE DB providers).

Passing an Object from a Parent Package To a Child

Occasionally, you may run into the need to pass values between packages. In most cases, you can use a Parent Package Variable configuration to pass the value of a variable in the calling (or parent) package to the called (or child) package. However, Parent Package Variable configurations can’t be set up on variables of type Object. Fortunately, you can work around this pretty easily, thanks to the fact that the calling package variables are inherently accessible in the called packages. 

I’ve set up a sample parent and child package to illustrate this. The parent package is straightforward. It uses an Execute SQL task to populate an object variable named TestVar with a recordset object, and then calls the child package.

image image

The child package has a Foreach Loop Container to iterate over the recordset object. It has a Script task that is used to copy the parent package’s variable (TestVar) to a local variable named LocalVar. This is the variable that the Foreach Loop is configured to use. Why copy the value? If you don’t have a local variable to reference in the Foreach Loop, it won’t validate properly.

image image

The script in Copy Variable is pretty simple. It relies on the fact that you can reference parent package variables inherently, as they are included in the collection of variables accessible in the local package. The script just copies the value from one variable to the other, so that the Foreach Loop will have something to do.

public void Main()
{
    Variables vars = null;
    Dts.VariableDispenser.LockForWrite("User::LocalVar");
    Dts.VariableDispenser.LockForRead("User::TestVar");
    Dts.VariableDispenser.GetVariables(ref vars);

    vars["User::LocalVar"].Value = vars["User::TestVar"].Value;
    vars.Unlock();

    Dts.TaskResult = (int)ScriptResults.Success;
}

Please note that for this to work, you cannot have a variable in the child package with the same name as the variable in the parent package. If you do, the local variable will hide the parent variable. Outside of that, this works really well for passing object values between packages. The same technique can also be used in reverse to send values back to the parent package, if you have that need.

The sample has been uploaded to my Skydrive. Let me know if you have any questions.

Code Snippets for SSIS 2008

If you develop in Visual Studio using C# or VB, then you are probably familiar with code snippets. They are little chunks of code that you can insert into your application and save you some typing. For example, if you are writing a C# application, and you type “if” and hit TAB twice, the following will be inserted into your code:

image

You can enter in the value to be evaluated, and then type in the rest of your code. I like code snippets because they save me from typing the same things over and over again. However, in SSIS 2005 scripting, code snippets weren’t available, due to the VSA environment used by the script objects. Now that we have a real scripting environment in SSIS 2008, code snippets can be used, although there are a few things you need to know.

Two of the most common activities I do in SSIS scripts are reading from and writing to variables. So, I put together a few code snippets that do this. Here’s what the snippet for reading a value from a variable looks like:

image

To create a snippet, you define the code snippet in an XML format, and save it to a file with a .snippet extension. I’m not going to post the code for the snippet here, as it is fairly verbose, but you can look at the .snippet file in any text editor. Once you have the file, you need to put it in a specific location for the Visual Studio environment to recognize it. This is where it can get a little confusing. Normally, snippets go into the “C:\Users\<user>\Documents\Visual Studio 2008\Code Snippets\<language>\My Code Snippets” folder. However, this is the location used for the full Visual Studio environment. Since the script editors in SSIS use the Visual Studio Tools for Applications environments, they get their own folder. In fact, there are separate folders for the Script Task and the Script Component. Code snippets for the Script Task go into the “C:\Users\<user>\Documents\Integration Services Script Task\Code Snippets\<language>\My Code Snippets” folder, and snippets for the Script Component go into the “C:\Users\<user>\Documents\Integration Services Script Component\Code Snippets\<language>\My Code Snippets” folder.

This actually works out well, because the code for the snippets is slightly different between the Script Task and Script Component. This is due to the object references being slightly different between the two.

I’ve provided two code snippets for C# for reading from and writing to variables in the Script Task, and the same two in C#, but altered to work in the Script Component. To use them, you can simply copy them to the above folders, and then type “SSISReadOneVariable” or “SSISWriteOneVariable” as a shortcut, followed by two tabs to insert them. You can change the shortcut in the snippet file if you’d like to use something else. It also wouldn’t be difficult to convert them to VB, if that’s your preference. The snippets are located on my SkyDrive. If you have ideas for other useful snippets, post a comment. Enjoy!

Calculating a Running Total Part 2

To finish up the topic that I started in my previous post, I wanted to post a more complex example of running total calculations. This is based off a request on the MSDN forums, for a running total calculation that would “slide” along as additional rows were processed. In the poster’s example, the running total needed to reflect the total for the last 10 weeks.

To show how to accomplish this, I’m going to update the same package that was used in the previous example.

image

The script component has the same RunningTotal column defined on the output:

image

The primary difference is in the script itself. Since I want to keep a list of the previous 10 records, I am using a Queue object, which is part of the System.Collections.Generic namespace. This object lets you easily remove the oldest item from the queue, so it works very well for this example. In the script, if the queue has more than 10 items, the oldest item is removed prior to adding a new one. The running total is produced by iterating through all the items in the collection, and storing the result in the RunningTotal output column.

Public Class ScriptMain
    Inherits UserComponent
    Dim runningTotalCollection As Queue(Of Integer) = New Queue(Of Integer)
 
 
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        If runningTotalCollection.Count >= 10 Then
            runningTotalCollection.Dequeue()
        End If
 
        runningTotalCollection.Enqueue(Row.OrderQty)
 
        Dim x As Integer
        Dim sumX As Integer
        For Each x In runningTotalCollection
            sumX += x
        Next
 
        Row.RunningTotal = sumX
    End Sub
 
End Class

This is another example of the flexibility of the Script component. It also helps highlight that by using the Script component, you can store the values from previous rows, and then reference them later. The example package is posted on my Skydrive.

Calculating a Running Total In SSIS

Running totals are a common calculation for reporting, and are occasionally needed in ETL processes to populate aggregate tables. One way to accomplish this in SSIS, if you are retrieving data from a relational database, is to do the running total in the SELECT statement. There are a number of resources on the Internet that illustrate how to accomplish this. However, this may not work for you if you are retrieving information from a flat file. Also, the logic to implement a running total may not be straightforward in SQL, particularly if you need sliding windows for the totals (I’ll explain this later).

Fortunately, there are other ways to accomplish this in SSIS. A very flexible method is to use a script component, as this will let you customize the logic however you prefer. As an example, I’ll take a look at a couple of examples from AdventureWorks.

First, the more straightforward example. The AdventureWorks database has a Sales.SalesOrderDetail table that lists the itemized purchases for each order. Note – I’m only filtering the results for example purposes. The actual code in the package works with the whole table.

image

If I was to calculate a running total for the OrderQty for each SalesOrderID above, I’d expect to see:

SalesOrderID ProductID OrderQty RunningTotal
43685 765 3 3
43685 763 1 4
43685 754 1 5
43685 725 1 6
43686 758 3 3
43686 762 1 4
43686 770 1 5
43687 768 1 1
43687 765 2 3

Notice that the running total resets each time the SalesOrderID changes. To implement a script component that will calculate the running total, I’ve created a new script component and added an output column to it to hold the running total.

image

Inside the script component, I’ve added a variable to store the value of the control number (SalesOrderID), and another variable to capture the running total. When the control number changes, the running total is reset.

Public Class ScriptMain
    Inherits UserComponent
    Dim runningTotal As Integer
    Dim previousValue As Integer
 
 
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        If Not previousValue = Row.SalesOrderID Then
            runningTotal = 0
            previousValue = Row.SalesOrderID
        End If
 
        runningTotal += Row.OrderQty
 
        Row.RunningTotal = runningTotal
    End Sub
 
End Class

This is an easy method to calculate running totals in the data flow. One thing to note is that the data must be sorted by the control number (SalesOrderID in the example) in order for this to work properly.

This is in an example package on my SkyDrive here. I will follow up this post with another sample showing a more complex "running total" problem and how it can be solved in SSIS as well.

See if a File is Locked

A question that comes up occasionally on the forums is how to determine if a file is being written to by another process. It can cause issues if a file is partially written when a source component starts extracting data from it.

There is a way to check this through a Script Task. The script below attempts to lock the file for writing. If the file is be written to be another process, the attempt to lock it will fail. If the file is available, the attempt will succeed.

Public Sub Main()
    Dim vars As Variables
    Dim file As System.IO.FileStream
 
    Try
        file = System.IO.File.Open(Dts.Connections("FlatFile").ConnectionString, System.IO.FileMode.Open, System.IO.FileAccess.Write)
    Catch ex As Exception
        Dts.VariableDispenser.LockOneForRead("FileLocked", vars)
        vars("FileLocked").Value = True
        vars.Unlock()
    Finally
        If Not file Is Nothing Then
            file.Close()
        End If
    End Try
 
    Dts.TaskResult = Dts.Results.Success
End Sub

The package needs to have a connection manager that points to the file you want to check (named "FlatFile" in the script above), and a variable ("FileLocked") that is set to True if the file is locked. The variable should be set to False by default. The variable can be used in a precedence constraint to control whether other tasks execute.

Renaming the Script Project in SSIS 2008

I found something interesting in SSIS 2008 today, and thought I’d share it. I was working with David Darden on a package that included a script task. In 2008, the Visual Studio Tools for Applications (VSTA) environment is used to edit script tasks. It exposes a few more details about the script task, including the project name. By default, it creates a rather long project name, as you can see in the screen shot below. (Yes, that is C#. It’s my preferred language, so expect to see more of it here now that it’s available in 2008). The namespace is based on the project name, so you get the same long value for it.

image

While poking around in the script task’s expressions editor, we found the ScriptProjectName property. This property does not show up in the Properties window (F4), only in the expressions editor. By setting this, you can update the project name in the script. For example, setting this property to "NewScriptProjectName", as shown here:

image

results in this showing up in the script project:

image 

There are a few caveats to this. One, if you do this after initially editing the script, the project name will be updated, but the namespace will remain the same. Two, you must save the package after setting the expression, in order for the expression to be applied. If you add the expression, then edit the script without saving, it will not use the new project name.

So, if you want to name your script project with a friendly name, do the following:

  1. Add the Script Task.
  2. Add an expression to set the ScriptProjectName property to your desired value.
  3. Save the package.
  4. Now you can use the Edit Script button to edit the script project with the friendly name.

I do want to point out that the name of the project has no real impact on functionality. The script will not run any differently if you do this. It’s purely for aesthetic purposes. But if the long, random looking project names bother you, it’s nice to know you can fix them. There is also a MSDN forum post here that details the steps to edit the namespace after the script has been edited for the first time. It’s a little easier to set it this way, though, so that it is handled up front.

Sending an Alert if a Package Runs Too Long

A question that has come up in a few different situations recently is how to send someone an alert if your package is taking too long to execute. For example, you might have a package that loads a large number rows from a source system to a destination on a nightly basis. There are a number of other packages that depend on this one to complete before they can begin running. The initial package normally takes 15 minutes to run, but on occasion, the source system can be under heavy load, so the package can run 45 minutes to an hour. In this scenario, you’d like to alert people that the load will be delayed.

If you are using an scheduling tool, you should first check to see whether it has this capability. Ideally, you want this monitoring process outside of the package itself. If it is internal to the package, then there is potential for the alerts not to be sent, if the DTEXEC process has really gotten hung up. Some scheduling tools have the capability to monitor the duration of the execution. However, SQL Agent does not have this, so you may need to implement this in the package itself. Another reason you may need to do it in the package is to monitor the duration of a specific set of tasks, but not the entire package.

Fortunately, this isn’t too difficult to implement in SSIS. All it takes is a little scripting. I’ve set up a sample package that illustrates how to do this. The main package looks like this:

image

The Long Running Task script task simulates a long running task by running a specific number of seconds (the value of the TaskDelay variable below). The Mark Task Successfully Completed sets the TaskSuccessful variable to True, so that the package knows when the long running task completed. The Monitor Duration script task is the one that actually does the checking on execution time, based on the value in the TaskDuration variable. If the execution time has taken more seconds than the value of TaskDuration specifies, the script exits. There is a precedence constraint that checks that the value of the TaskSuccessful variable is still False, then the Send an Alert task is run. In the sample, the Send an Alert task is a script task, but in a real package it would likely be a Send Mail task.

image

As stated above, the real work happens in the script. Basically, it performs a check once a second to see if either:

  1. The TaskSuccessful variable is True, meaning that the long running task completed successfully, or
  2. The difference in seconds between the current time and the time when the script started running is greater than the value in the TaskDuration variable.

If either case is true, the script exits. If both cases are false, the script sleeps for a second, then checks again. Putting the thread to sleep helps make sure that CPU isn’t tied up checking the conditions too often.

Public Sub Main()
    Dim vars As Variables
    Dim maxDuration As Integer
    Dim startTime As DateTime = DateTime.Now
    Dim continueWaiting As Boolean = True
    Dim taskCompleted As Boolean = False
 
    Dts.VariableDispenser.LockOneForRead("TaskDuration", vars)
    maxDuration = System.Convert.ToInt32(vars("TaskDuration").Value)
    vars.Unlock()
 
    Do While continueWaiting
        System.Threading.Thread.Sleep(1000)
 
 
        Dts.VariableDispenser.LockOneForRead("TaskSuccessful", vars)
        taskCompleted = System.Convert.ToBoolean(vars("TaskSuccessful").Value)
        vars.Unlock()
 
        If taskCompleted Or _
            (DateDiff(DateInterval.Second, startTime, DateTime.Now) > maxDuration) Then
            continueWaiting = False
        End If
    Loop
 
    Dts.TaskResult = Dts.Results.Success
End Sub

 

There are few things to be aware of with this approach. It’s not foolproof by any means. Any process that ties up 100% of the CPU is likely going to make this fail. Also, this depends on parallel execution of the long running task and the monitoring task. If you set the MaxConcurrentExecutables property in your package to 1, this will not work. It needs to be run on a machine with enough power to start the task at roughly the same time. In my experience, this hasn’t been a problem with the hardware that I work with, but it is a potential failure point.

The sample file is posted on my Skydrive, as usual.

Importing Files Using SSIS

A topic that has come up a few times recently is the idea of loading a set of files into a database using SSIS. One of the data flow components in SSIS is the Import Column transform, which allows you to load files into a binary column. There is a great video by Brian Knight that shows how to use it, and I recommend viewing that. If you are looking for a quick overview of the component, and a different approach for loading the list of filenames to import, then read on.


The Import Column transform works in the data flow, and imports one file for each row that is passed through it. It expects a column that contains the file name to import as an input. It outputs a column of type DT_TEXT, DT_NTEXT, or DT_IMAGE, that contains the file contents.


I’ve included a sample package with this post that uses the Import Column transform. It has a single data flow that uses a script component to get the list of files to import.


pic1[4]


The package has two connection managers, one of which points to a SQL Server database where the files will be stored. The other connection manager is a File connection manager, that is pointed to a folder. This is the folder that we want to import the files from.


p1


The script component was created as a Source. A single output column of type DT_WSTR was added to contain the filenames.


1


On the connection managers page, the File connection manager is specified so that it can be accessed from the script.


1[4]


The script uses the Directory class from the System.IO namespace. By calling the GetFiles method, the code can iterate through all of the files in the directory, and output one row for each file.


Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Imports System.IO
 
Public Class ScriptMain
    Inherits UserComponent
 
    Public Overrides Sub CreateNewOutputRows()
        Dim fileName As String
 
        For Each fileName In Directory.GetFiles(Me.Connections.ImportFilesDir.AcquireConnection(Nothing).ToString())
            Output0Buffer.AddRow()
            Output0Buffer.Filename = fileName
        Next
 
        Output0Buffer.SetEndOfRowset()
    End Sub
 
End Class

The next component is the Import Columns component. Configuring it can be a little difficult. I found the Books Online documentation a little unclear on what had to be done. On the Input Columns tab, the column that contains the filename (including path) to import needs to be selected.


1[1]


On the Input and Output Properties tab, a new column was added to hold the binary contents of the file. When adding this column, make a note of the LineageID value, as it needs to be used in the next step.


1[1]


After adding the output column, the input column (that contains the filename, not the file contents), needs to be selected. The LinageID from the previous step needs to be put into the FileDataColumnID property. This tells the component which column to populate with the file contents.


1[3]


The OLE DB Destination is fairly straightforward, as it just maps the columns from the data flow to the database.


Hopefully this helps if you are working with the Import Column transform. The samples are located on my Skydrive.