You see things; and you say 'Why?' But I dream things that never were; and I say 'Why not?'

Tuesday, September 15, 2009

SSIS Design Pattern - Custom Error Handling

Error logging is crucial to troubleshooting and auditing efforts. If an error occurs in your SSIS package while it is executing production loads, transformations, or transfers, you will want to know as much as possible about it.

To demonstrate, create a new SSIS project named ErrorTest:

I encourage you to develop the practice of renaming packages as you create them. This will make your life simpler once you deploy:

I rename Package.dtsx to ErrorTestPkg.dtsx (don't change the extension):

When you rename a package the following dialog will appear:

Always answer this dialog "Yes".
Drag an Execute SQL Task onto the Control Flow:

Configure it (double-click, or right-click and select Edit, to open the editor) as shown below. Note that any connection will suffice, so long as the SQL Server is running. Also note that the SQL will intentionally generate an exception:

Click OK to close the editor. Right-click the Task and select Execute Task to test:

The task should fail (as planned):

Stop execution and click on the Event Handlers tab:

Check to be sure the Executable dropdown is set to the ErrorTestPkg (the package) and the Event handler dropdown is set to OnError. Click the link to create a new OnError Event Handler for the package. From the Toolbox, drag and drop a Data Flow Task onto the Event Handler:

Double-click the Data Flow Task to edit it. From the toolbox, drag a Script Component transformation onto the Data Flow:

The Script Component can be configured as a Source, Transformation, or Destination. Select Source:

Since a Source has only outputs (a destination has only inputs, and a transformation has both inputs and outputs), you will need to configure the Output. Rename the generic Output 0 to ErrorOutput:

Click on Output Columns and click Add Column:

Rename the Column ErrDescription:

Edit the ErrDescription column's properties - change the data type to String (DT_STR) and the length to 5000:

Add the ErrorDescription System variable to the ReadOnlyVariables property of the Script Component:

There are other variables you will add to your error-handling Script Component in practice. Remember that variable names are case-sensitive and the list is comma-separated with no spaces.
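For illustration, a ReadOnlyVariables value that also captures the error code and the name of the task that raised the error might look like the line below (ErrorCode and SourceName are just examples; include whichever System variables you need):

ErrorCode,ErrorDescription,SourceName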
Click the Design Script button and add the following code snippet in the CreateNewOutputRows subroutine as shown below:


With ErrorOutputBuffer
    ' Add a new, empty row to the output buffer.
    .AddRow()
    ' Populate the row with the error description supplied by the OnError event.
    .ErrDescription = Me.Variables.ErrorDescription
End With


Some explanation: The name of the ErrorOutputBuffer was determined by the name you gave the Script Component Output earlier. When you changed the Output name to ErrorOutput, the script component's output buffer was changed to ErrorOutputBuffer.

To populate the buffer, you must add rows to it. Think of this as starting a new row in a table. If you're familiar with tables in Microsoft Word documents, you know when you start a new row it is empty until you populate it. The same is true here. You are creating a new, empty row in the output buffer to populate with data.

The final step is to load the row. Again, in your production-ready error event handler you will populate more columns in the output buffer than this one - this is for demonstration purposes only:
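For reference, here is a hedged sketch of what a fuller CreateNewOutputRows might look like in a production handler. The extra output columns (ErrCode, ErrSource, PackageName) and the extra system variables are assumptions for this example; you would add the columns to the output and list ErrorCode, SourceName, and PackageName in ReadOnlyVariables exactly as you did for ErrorDescription.

Public Overrides Sub CreateNewOutputRows()
    With ErrorOutputBuffer
        .AddRow()
        ' ErrDescription is the column created earlier in this walkthrough.
        .ErrDescription = Me.Variables.ErrorDescription
        ' Hypothetical extra columns: ErrCode as DT_I4, ErrSource and
        ' PackageName as DT_STR, populated from System variables.
        .ErrCode = Me.Variables.ErrorCode
        .ErrSource = Me.Variables.SourceName
        .PackageName = Me.Variables.PackageName
    End With
End Sub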

Next, drag a flat file destination onto the Data Flow and connect it to the Script Component as shown:

Note: You will likely want to store error data in a database rather than a flat file.

When you configure a flat file destination, you are prompted for the type of file. Delimited files are fairly simple to read, especially with few fields:

Select a location and name for your flat file. I chose C:\ErrorOutput1.txt:

Click Columns to view the columns the file will contain. Note: the columns would be empty if you had not connected a data path from the Script Component prior to configuring the Flat File Destination:

Click OK to proceed. Click the Mappings page to assign mappings between fields in the pipeline and fields in the destination:

Since you built the destination from the metadata contained in the pipeline, auto-mapping should occur when you click Mappings:

Click OK. Your Event Handler should appear as shown:

Start debugging the package by pressing F5 or clicking the Debug Run arrow. Your Execute SQL Task should fail as before:

Click on the Event Handlers tab to determine the run status of your Package OnError Event Handler. Double-click the Data Flow Task and note one row was written to your error file (ErrorOutput1.txt):

Open ErrorOutput1.txt file to view the error stored there:

In this example, we built a package to demonstrate logging package errors to a data destination - albeit a flat file. This technique can be used to catch errors and store them for troubleshooting posterity.

Get the code!

Thursday, September 10, 2009

DTS vs SSIS

In today's blog post, we will give a basic introduction to SSIS (SQL Server Integration Services) and compare some of its features with DTS (Data Transformation Services), its predecessor.

SSIS is an ETL tool provided by Microsoft for extracting data from various sources, transforming it according to business requirements, and loading it into a destination. The source and the destination can be anything you can reach through a connection. One important thing to understand is that SQL Server Integration Services (SSIS) is a component of Microsoft SQL Server 2005, not a separate install. Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; and transformations for cleaning, aggregating, merging, and copying data.

In SQL Server 2000, DTS packages are created in Enterprise Manager. The packages are built in the DTS designer, and you have the option of saving them in SQL Server or to the file system (by specifying the location as "Structured Storage File"). In Integration Services, packages are created using Business Intelligence Development Studio (BIDS), which is essentially a modified version of Microsoft Visual Studio. When we create Integration Services packages in BIDS, they are saved on the client (local) machine, and we have to explicitly deploy them to the destination server.

The following steps show how you can launch BIDS:

Go to: Start -> All Programs -> Microsoft SQL Server 2005 -> Business Intelligence Development Studio
ssis_1.jpg

The basic organization concept in DTS as well as SSIS is the concept of a package. A package is essentially a collection of SSIS objects which includes:

a) Connections to the data sources,
b) Data Flows,
c) Control Flows, and
d) Event Handlers

We will discuss these shortly. Let’s first look at the DTS Package Designer and BIDS to get an idea of the IDE layout.

Package Design Pane:

The following images show the DTS and SSIS design panes respectively.
ssis_2.jpg
ssis_3.jpg

The most striking difference between the two is how the designer itself is laid out.

In DTS, the designer consists of a single pane, and to the right we see two groups of options: Connections and Workflow tasks. The connections are used to connect to a wide variety of data sources and destinations such as Excel, flat files, SQL Server, and Access; essentially anything we can reach through an ODBC, OLE DB, or Jet driver. The workflow tasks add functionality to the package. The thick line between the source and the destination in the example above is the Data Transformation Task; it transforms the data and loads it into the destination. One can write custom code for the transformations or simply use the built-in capabilities of the Data Transformation Task.

In SSIS, the designer is split into 4 design panes:
a) Control Flow,
b) Data Flow,
c) Event Handlers, and
d) Package Explorer.

The Control Flow pane is used to design the workflow inside the package. There is also a great improvement in the number of tasks, and additional functionality has been added to some of the existing tasks. The control flow works like a program flow. All the workflow tasks in DTS map to Control Flow tasks in SSIS, except the Data Transformation Task.

The Data Transformation Task from DTS has become the Data Flow Task. As the name suggests, the Data Flow Task handles the flow of data, and Microsoft has introduced a wide variety of data flow components in SSIS for building an entire ETL process. All the transformation of data from source to destination takes place inside the Data Flow Task. Examples of data flow components are the Multicast, OLE DB Command, Sort, Pivot, and Unpivot transformations.

Event handlers are new to SSIS. They respond to events raised in the package, such as OnError, OnWarning, OnPreExecute, and OnPostExecute. When an event is raised, control passes to the corresponding event handler, and the package executes whatever is defined there. Event handlers themselves contain Control Flow tasks, because the control flow determines the workflow of the package.

Package Explorer is also a new pane in SSIS. It gives the overall view of the package.

ssis_4.jpg

The image above shows the Package Explorer. As mentioned earlier, it gives an overall view of the entire package: executables, precedence constraints, connection managers, event handlers, and variables. Precedence constraints and connection managers are described later in this post.

The Solution Explorer:

The Solution Explorer is also a new concept in SSIS. It contains the SSIS packages, connections, and Data Source Views (DSVs). A solution contains a series of projects, and each project contains the objects used in that project. In DTS we could create only independent packages; there was no way to keep the related packages for a module in one place. The Solution Explorer allows us to do that: we can keep multiple packages related to one module in a single solution.

Data Source View (DSV)

This is also new to SSIS. A data source view is a document that describes the schema of an underlying data source. It provides a named, browseable, persisted selection of database objects that can be used to define sources, destinations, and lookup tables for SQL Server 2005 Integration Services (SSIS) tasks, transformations, data sources, and destinations. For example, in some ERPs such as SAP, and even in some databases, columns might be named A0012, A0013, AABE, and so on. Naming columns this way is typically done for obfuscation (especially in packaged applications), but it does not help the SSIS developer, for whom it is just another column on which transformations have to be applied. In this case we can create a data source view over the data source and give the columns meaningful names without renaming them in the underlying database. In effect we overlay the table with a transparent view that carries the friendly column names, so we see the names from the view but the data from the original table.

Connection Managers:

Connection managers are also new in SSIS. To understand them, consider the following screenshot of DTS.

ssis_5.jpg

In this package we see two connections: one to a database on the left and one to a text file on the right. What happens when we move this package from one environment to another, for example from development to QA? The database connections and file locations change, so we have to open each package, and then each connection, and reconfigure it to point to the intended environment. A typical DTS package might contain 5-10 connections, and when we are moving a large number of packages between environments this becomes cumbersome. Note that this can be overcome in DTS by using dynamic properties and disconnected edits, picking up the connection information at run time from a file or from a configuration table in a database, but it required extra coding and the configuration was not straightforward.

Now, let’s see how the connection manager helps in making this process easy in SSIS. Let us understand this with the help of an SSIS example shown in the image below:

ssis_6.jpg

In the package above we see two connections, one of which is a database connection. In SSIS, connection managers are a centralized place where we configure the connection to a particular data source or destination. All the connections inside the package use these connection managers to reach data sources and destinations. As in DTS, SSIS provides a wide variety of connection managers for connecting to different sources and destinations. So when moving from one environment to another, we open the SSIS package and make the change in only one place, the connection managers, as opposed to multiple places as in DTS. This saves time and is far less cumbersome. In a little while we will see how to alter the connections dynamically without even opening the packages; this is done with configurations.

Variables:

Variables are a powerful piece of the SSIS architecture; they allow you to dynamically control the package at runtime. There are two types of variables: system and user. System variables are ones that are built into SSIS, whereas user variables are created by the SSIS developer. Variables can also have varying scope, with the default scope being the entire package. They can also be set to be in scope of a container, task, or event handler inside the package. The addition of scope to variables is the main differentiating factor between SSIS variables and DTS global variables.
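As a quick, hedged illustration of how variables are used at run time, the Script Task sketch below (SSIS 2005, VB.NET) reads a system variable and updates a user variable. The name User::RowCount is an assumption for this example; the variable would have to exist in the package and be listed in the task's ReadWriteVariables property.

Public Sub Main()
    ' System variables are read the same way as user variables.
    Dim pkgName As String = CStr(Dts.Variables("System::PackageName").Value)

    ' Update a user variable (hypothetical User::RowCount); its scope
    ' determines where in the package the new value is visible.
    Dts.Variables("User::RowCount").Value = 100

    Dts.TaskResult = Dts.Results.Success
End Sub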

Configurations:

SSIS provides a way to alter connection information and variable values at run time. This is done using configuration files. The following screenshots show how to open the configuration editor in SSIS.

ssis_7.jpg
ssis_8.jpg

Right-click on the Control Flow pane and click the Configurations menu item. This opens the configuration wizard. This is also something that did not exist in DTS. We will explore package configurations and how to create configuration files in upcoming posts.

Transactions:

All Microsoft SQL Server 2005 Integration Services (SSIS) container types (packages; the For Loop, Foreach Loop, and Sequence containers; and the task hosts that encapsulate each task) can be configured to use transactions. Integration Services provides three options for configuring transactions: NotSupported, Supported, and Required.

Required indicates that the container starts a transaction, unless one is already started by its parent container. If a transaction already exists, the container joins the transaction.
Supported indicates that the container does not start a transaction, but joins any transaction started by its parent container. For example, if a package with four Execute SQL tasks starts a transaction and all four tasks use the Supported option, the database updates performed by the Execute SQL tasks are rolled back if any task fails. If the package does not start a transaction, the four Execute SQL tasks are not bound by a transaction, and no database updates except the ones performed by the failed task are rolled back.
NotSupported indicates that the container does not start a transaction or join an existing transaction. A transaction started by a parent container does not affect child containers that have been configured to not support transactions. For example, if a package is configured to start a transaction and a For Loop container in the package uses the NotSupported option, none of the tasks in the For Loop can roll back if they fail.
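These options surface as the TransactionOption property on packages, containers, and task hosts. As a hedged illustration (not something shown in the post itself), the VB.NET sketch below uses the Integration Services runtime API to load a package and set the option programmatically; the package path is only a placeholder.

Imports Microsoft.SqlServer.Dts.Runtime

Module TransactionOptionDemo
    Sub Main()
        Dim app As New Application()

        ' Load a package from the file system (placeholder path).
        Dim pkg As Package = app.LoadPackage("C:\Packages\MyPackage.dtsx", Nothing)

        ' Required: the package starts a transaction that child containers can join.
        pkg.TransactionOption = DTSTransactionOption.Required

        ' Save the modified package back to disk.
        app.SaveToXml("C:\Packages\MyPackage.dtsx", pkg, Nothing)
    End Sub
End Module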

In this post, we gave a brief introduction to SSIS, its important features, and how it differs from its predecessor, DTS. In future posts, we will discuss each task in detail, cover best practices, and look at using SSIS to build business intelligence applications, for example handling slowly changing dimensions.

Deploying Reports

Introduction

Tight integration with the .NET environment is an important reason for the popularity of Microsoft SQL Server 2005. As a result, many common tasks can be performed in more than one way, which allows developers and administrators to work with the options they are most comfortable with. For example, you can manage and administer the report server using either SQL Server Management Studio or the web-based Report Manager. In this article, we will look at the various ways of deploying reports to the report server. [It is assumed that the reports have already been created.]

How to deploy reports?

It is possible to deploy the reports in three ways:
  • Using the BI Development studio.
  • Using the SQL Server Management Studio.
  • Programmatically. [Using the RS.EXE utility]
Using BI Development studio:
  1. Open the Project Properties dialog box and set the TargetServerURL property. This is the URL of the report server on which the reports will be deployed. [Figure 1]

    [Figure 1]
  2. Right-click on the project and select Deploy [Figure 2]. This will deploy the reports and data sources defined in the project to the specified report server.

    [Figure 2]
Using SQL Server Management Studio:
  1. Connect to the Reporting Services server.
  2. Right-click on Home and select the desired option. [Figure 3]

    [Figure 3]

Select New Folder to create a new folder on the server. Select New Data Source to define a data source for the reports. Select Import File to add/deploy reports to the server.

Using the RS.EXE utility:

  • It is a command-line utility.
  • Used for performing deployment and administrative tasks programmatically.
  • Executes an RSS file. [Reporting Services Script file]
  • Command to execute:
    rs -i MyRSS.rss -s http://localhost/reportserver
  • Parameters can also be passed. These parameters are the global variables in the RSS script file. Multiple parameters can be passed.
    For example:
    rs -i MyRSS.rss -s http://localhost/reportserver -v param1="value1" -v param2="value2"
How to write the RSS file:
  • A VB.NET code file [with an .rss extension].
  • Can contain user-defined functions and global parameters. The values for the global parameters must be supplied from the command line.
  • It should have a Main() procedure.
  • By default, the namespaces System.IO, System.Xml, and System.Web.Services are included.
  • A reference variable rs for the Reporting Services web service is generated automatically when the script is executed using the RS.EXE utility. This variable gives access to all the functionality of the Reporting Services web service class library.

Sample code:

Extract the MyReport.zip file into the D:\MyReports folder.

The sample RSS file creates a data source that points to the Northwind database on the (local) SQL Server instance using Windows authentication. The function used is: CreateDataSource(name, extension, connectionstring)

Example:

CreateDataSource("MyDataSource", "SQL", "Data Source= (local);Initial Catalog=Northwind")

It also publishes a sample report using the CreateReport function.
CreateReport(NameofReport, ParentFolder, Overwrite, ReportDefinition, Properties)

Example:

CreateReport("MyReport", "/MyReports", False, definition, Nothing)

If the report contains any external images, they must also be deployed as resources. This is not required if an image is embedded in the report. The function used is: CreateResource(ImageName, ParentFolder, Overwrite, Definition, MIMEType, Properties)

Example:

CreateResource(ImageName, parentPath, False, definition, "image/jpeg", Nothing)
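Pulling these pieces together, a minimal RSS script might look like the sketch below. It only publishes a single report with the web service's CreateReport method; the file and folder names are placeholders, and it assumes the /MyReports folder already exists on the server.

Public Sub Main()
    ' Read the report definition (RDL) from disk; System.IO is available by default.
    Dim definition As [Byte]() = Nothing
    Dim stream As FileStream = File.OpenRead("MyReport.rdl")
    definition = New [Byte](stream.Length - 1) {}
    stream.Read(definition, 0, CInt(stream.Length))
    stream.Close()

    ' rs is the web service proxy that RS.EXE generates for the script.
    Dim warnings As Warning() = rs.CreateReport("MyReport", "/MyReports", False, definition, Nothing)

    ' Echo any warnings returned by the server.
    If Not (warnings Is Nothing) Then
        For Each w As Warning In warnings
            Console.WriteLine(w.Message)
        Next
    End If
End Sub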
To execute the .rss file, open a command prompt, change the directory to D:\MyReports, and run the following command:
rs -i MyRSS.rss -s http://localhost/ReportServer

Output will look like:

After successful execution, MyReports folder will be created on the report server. It can be viewed using the Report Manager.

Conclusion

The reports can be deployed using any of the three methods above. Reporting Services script files can be used to run any of the Report Server web service operations, such as creating schedules, subscriptions, and security settings. Replicating a report server's content is also easy using a Reporting Services script file.

Report Models in Reporting Services 2005

A report model, in technical terms, is a metadata description of a data source and the relationships that reside within that data source. What it is in practical terms is so much more. For a user it's their own little sandbox. But creating a Report Model is a multi-step process.

The first step is to analyze the needs of the users so you know what fields they are going to need in their reports. Providing too little information renders the report model useless, but providing too much could eventually cause performance issues as more users access reports. The second step is to ensure you have the appropriate referential integrity in place; a series of reports running on a poorly normalized database or a poorly designed Analysis Services cube will cause additional headaches. When the project is first created, go into the properties of the project and set the URL of the report server (the default setting is http://localhost/reportserver):

Setting the URL for the Report Server

After setting the URL of the report server, you must specify a data source. Right-click Data Sources and choose New Data Source, then choose the server you wish to connect to and the authentication method:

Creating a new data source in Solution Explorer

The next step is to create a Data Source View. Right-click Data Source Views and choose New Data Source View:

Creating a new Data Source View in Solution Explorer

As usual, you must select the data source you wish to use or, if one has not been created yet, create it inside the wizard:

Selecting the data source

The next screen in the wizard is one of interest, because here you can either choose from the entire list of tables and views, or narrow your selection based on the schema the object belongs to:

Selecting from all tables and views

Filtering objects based on schema

As stated before, you can choose either a view or one or more tables, although you should create a view if joining more than two tables is necessary. Once the choice of objects is made, click Next and provide a name for the data source view:

Naming the Data Source View

View of Solution Explorer with Data Source View

Now you have everything you need to create a report model. Right-click Report Models and choose Add New Report Model:

Creating a New Report Model

Choose the data source view you wish to create the report model for and click Next:

Choosing the Data Source View

Then select the rules you want for metadata generation:

Choosing the Rules for Metadata Generation

The wizard then gives you an option on whether you want to update data statistics for the data source view or use the existing ones (the default is to use the current ones):

Collection of model statistics on the Data Source View

Once all the information is collected for creating the report model, click Run to complete the wizard:

Building statistics on the Report Model

Now you have to deploy the Data Source and Report Model to the Report Server (there is no need to deploy the Data Source View). Right-click each of the objects and select Deploy. From there you can assign security permissions to ensure that only those who need to access the data are able to do so. Users can then use Report Builder to create ad-hoc reports and deploy them either to their My Reports folder or to another folder on the Report Server.
