SQL Server 2005

You see things; and you say 'Why?' But I dream things that never were; and I say 'Why not?'

Wednesday, November 4, 2009

Recovery Models and backup

Backups

Microsoft, in SQL Server Books Online, defines backups as:

A copy of data that is used to restore and recover data after a system failure

SQL Backups can be created a number of ways and can incorporate all or some of the data, as well as some part of the transaction log. While this article is focused on 2005 syntax, most of the concepts are applicable to 2000. This is a huge topic. At best, I'm going to scratch the surface and give you enough information so you won't start crying again. After reading this, you should be able to set up a reasonable set of backups for your system.

Recovery Models

Before you begin working on backups, the business needs to define a database recovery model. In essence, a recovery model defines what you're going to do with the transaction log data.

There are three recovery models: Full, Simple and Bulk Logged. These are pretty easy to define:

  • Simple – in simple recovery mode, the transaction log is not backed up, so you can only recover to the most recent full or differential backup.
  • Full – in full recovery mode you backup the database and the transaction log so that you can recover the database to any point in time.
  • Bulk Logged – in bulk logged mode, most transactions are stored in the transaction log, but some bulk operations such as bulk loads or index creation are not logged.

The two most commonly used modes are Simple and Full. Don't assume that you always need to use Full recovery to protect your data; it is a business decision. The business is going to tell you whether you need to recover to a point in time or whether you simply need the last full backup. It will also determine whether your data is recoverable by other means, such as manual entry, or whether you have to protect as much as possible as it comes across the wire. You use Simple recovery if you can afford to lose the data stored since the last full or differential backup and/or you just don't need recovery to a point in time. In Simple mode, you must restore all secondary read/write filegroups when you restore the primary. You use Simple mostly on secondary databases that are not an absolutely vital part of the enterprise, or on reporting systems with read-only access, where there isn't a transaction log to worry about anyway. You use Full if every bit of the data is vital, if you need to recover to a point in time or, usually in the case of very large databases (VLDB), if you need to restore individual files and filegroups independently of other files and filegroups.
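
If you're not sure which model a given database is currently using, a quick query against the sys.databases catalog view will tell you; a minimal sketch:

-- Show the current recovery model of every database on the instance
SELECT name, recovery_model_desc
FROM sys.databases
ORDER BY name;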

With both the Simple and Full recovery models, you can now run a copy-only backup, which copies the database to a backup file without affecting the log, the differential backup schedule, or recovery to a point in time. I'll try to drill down on as many of these topics as possible through the article, but not files and filegroups.

Working with Simple Recovery

Enough talk. Let's get down to running backups. Let's assume that we're in Simple recovery on a small to mid-sized database. I'm going to use AdventureWorks for all the sample scripts. To set it to simple recovery:

ALTER DATABASE AdventureWorks SET RECOVERY SIMPLE

Your simplest backup strategy is to run, at regular intervals, the following SQL Server backup command, which will perform a full backup of the database:

BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks.BAK'

What's with all the typing you ask? Don't we have GUI tools to handle the work for us? Yes, most simple backups can be performed using SQL Server Management Studio. However, if you want to learn and understand what Management Studio is doing for you, or if you want some fine grained control over what is backed up, how and where, then you're going to have to break out the keyboard and put away the mouse.

The above command performs a basic backup to disk. Most DBAs I know back up to a file and then scrape the files onto a tape or some other media. This is because files on disk are simple and quick to recover from, whereas media can sometimes be a bit of a pain. For example, we generally keep two to three days' worth of backups on our file systems for immediate recovery. We only go to the tape systems if we need to run restores for older backups.

What did that command do? It made a copy of all the committed data in the database. It also copied uncommitted log entries. These are used during recovery to either commit or rollback changes that occurred to the data during the backup process.

Copy-only backups

Normally, backing up a database affects other backup and restore processes. For example after running the previous command, any differential backups (a backup that only copies data changed since the last backup) would be using this as the starting point for data changes, not the backup you ran last night. As noted earlier, SQL 2005 introduces a new concept to backups, COPY_ONLY backups, which allow us to keep from interrupting the cycle:

BACKUP DATABASE AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks.bak'
WITH COPY_ONLY;

Already we've found one of those more granular moments when the Management Studio wouldn't help you. If you want a copy only backup, you have to use the command line.

Differential backups

Let's assume for a moment, that we're still in simple recovery, but we're dealing with a larger database, say something above 100 GB in size. Full backups can actually start to slow down the process a bit. Instead, after consultation with the business, we've decided to do a weekly full backup and daily differential backups. Differential backups only backup the data pages that have changed since the last full backup. Following is the SQL backup command to perform a differential backup:

BACKUP DATABASE AdventureWorks
TO DISK = 'C:\backups\AdventureWorks.bak'
WITH DIFFERENTIAL;

Now, if we had to restore this database, we'd first go to the last full backup, restore that, and then restore the differential backups in order (more on that later).

You might have noticed, if you're trying these commands out as you go along, that we have not been changing the backup file names each time. Yet we haven't run into any errors. When running backups as we've done, SQL Server treats the file like a backup device and simply keeps appending the backups to the file.

If you want to simply overwrite the existing file you'll need to modify your backup statements:

BACKUP DATABASE Adventureworks
TO DISK = 'C:\backups\AdventureWorks.bak'
WITH INIT;

There are a number of other backup options that I won't be detailing here. Read Books Online for details on BLOCKSIZE, EXPIREDATE, RETAINDAYS, PASSWORD, NAME, STATS, and so on.
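
As an illustration of a few of those options (the values here are arbitrary), a full backup might look something like this:

-- Illustrative values only; see Books Online for the full list of WITH options
BACKUP DATABASE AdventureWorks
TO DISK = 'C:\backups\AdventureWorks.bak'
WITH NAME = 'AdventureWorks full backup',
RETAINDAYS = 3, -- protect this backup set from being overwritten for three days
STATS = 10; -- report progress in 10 percent increments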

You can also run a statement that will check the integrity of a database backup. It doesn't check the integrity of the data within a backup, but it does verify that the backup is formatted correctly and accessible.

RESTORE VERIFYONLY
FROM DISK = 'C:\backups\Adventureworks.bak'

Full recovery and log backups

We've primarily been working on a database that was in Simple recovery mode (this used to be called Truncate Log on Checkpoint). In this mode, we do not backup the transaction logs for later recovery. Every backup under this mechanism is a database backup. Log backups are simply not possible.

However, you've only protected the data as of the last good backup, either full or differential. Let's change our assumptions. Now we're dealing with a large, mission critical application and database. We want to be able to recover this database up to the latest minute. This is a very important point. In theory, since the log entries are being stored and backed up, we're protected up to the point of any failure. However, some failures can cause corruption of the log, making recovery to a point in time impossible. So, we have to determine what the reasonable minimum time between log backups will be. In this case we can live with no more than 15 minutes worth of lost data.

So, let's start by putting our database in FULL recovery mode:

ALTER DATABASE AdventureWorks SET RECOVERY FULL

Then, on a scheduled basis, in this case every 15 minutes, we'll run the SQL backup command for the transaction log:

BACKUP LOG Adventureworks
TO DISK = 'C:\backups\AdventureWorks_Log.bak';

This command will back up committed transactions from the transaction log. It has markers in the file that show the start and stop time. When it completes successfully, it will truncate the log, cleaning out of the transaction log the committed transactions that have been written to the backup file. If necessary, you can use the WITH NO_TRUNCATE option to capture data from the transaction log regardless of the state of the database, assuming it's online and not in an EMERGENCY status. This is for emergencies only.
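
For completeness, a NO_TRUNCATE log backup looks like the following; the file name here is just an example:

-- Capture the log regardless of the state of the database; for emergencies only
BACKUP LOG AdventureWorks
TO DISK = 'C:\backups\AdventureWorks_Log_Tail.bak'
WITH NO_TRUNCATE;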

Note that we are not using the INIT statement in this case, but you can do so if you choose. When doing log backups, you've got options:

  1. Run all the backups to a single file, where they'll stack and all you have to do, on restore (covered later), is cycle through them.
  2. Name the backups uniquely, probably using date and time in the string.

In the latter case, safety suggests using INIT, because you're exercising maximum control over what gets backed up where, and you'll know exactly what a backup is, when it was taken and where it came from, based on the name. This is yet another place where operating backups from the command line gives you more control than the GUI. We've used both approaches in our systems for different reasons. You can decide what is best for your technology and business requirements.
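
If you go with uniquely named files, a sketch along these lines builds the date and time into the file name (the exact format is just one possibility):

-- Build a file name containing the date and time, then back up the log to it
DECLARE @FileName varchar(255);
SET @FileName = 'C:\backups\AdventureWorks_Log_'
+ CONVERT(varchar(8), GETDATE(), 112) -- yyyymmdd
+ '_' + REPLACE(CONVERT(varchar(8), GETDATE(), 108), ':', '') -- hhmmss
+ '.bak';

BACKUP LOG AdventureWorks
TO DISK = @FileName
WITH INIT;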

Most of the options available to the database backup are included in Log backup, including COPY_ONLY. This would allow you to capture a set of transaction data without affecting the log or the next scheduled log backup. This would be handy for taking production data to another system for troubleshooting etc.
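
For example, a copy-only log backup would look something like this (the file name is arbitrary):

BACKUP LOG AdventureWorks
TO DISK = 'C:\backups\AdventureWorks_Log_Copy.bak'
WITH COPY_ONLY;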

If you have your database set to FULL Recovery, you need to run log backups. Sometimes, people forget and the transaction log grows to the point that it fills up the disk drive. In this case, you can run:

BACKUP LOG Adventureworks WITH NO_LOG;

Adding NO_LOG to the log backup, without specifying a location for the log, causes the inactive part of the log to be removed, and it does this without logging the operation itself, which is how it gets around the full disk drive. This is absolutely not recommended because it breaks the log chain, the series of log backups from which you would recover your database to a point in time. Microsoft recommends running a full backup immediately after using this statement. Further, they warn that this statement may be deprecated in a future release.

Restoring Databases

As important as SQL Server backups are, and they are vital, they are useless without the ability to restore the database.

Restoring a full database backup

Restoring a full database backup is as simple as it was to create:

RESTORE DATABASE Adventureworks
FROM DISK = 'C:\Backup\AdventureWorks.bak';

It's really that simple – unless, as we did originally, we are backing up everything to a file as if it were a backup device. In that case, you'll need to specify which file within the "device" you're accessing. If you don't know which file, you'll need to generate a list:

RESTORE HEADERONLY
FROM DISK = 'C:\Backup\Adventureworks.bak';

This will list the backup sets contained in the file. So now, if we wanted to restore the second file in the group, the COPY_ONLY backup, you would issue the following command:

RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backup\Adventureworks.bak'
WITH FILE = 2;

Unfortunately, if you're following along, you may find that you just generated this error:

Msg 3159, Level 16, State 1, Line 1
The tail of the log for the database "AdventureWorks" has not been backed up.
Use BACKUP LOG WITH NORECOVERY to backup the log if it contains work you do
not want to lose. Use the WITH REPLACE or WITH STOPAT clause of the RESTORE
statement to just overwrite the contents of the log.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

What this means is that your database is in full recovery mode, but you haven't backed up the "tail of the log", meaning the transactions entered since the last time you ran a log backup. You can override this requirement by changing the previous syntax to:

RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2,
REPLACE;

That's the first time we've stacked the WITH clauses (WITH FILE=2 and WITH REPLACE is represented as WITH FILE=2, REPLACE), but it won't be the last. Read through Books Online; most of the WITH clause options can be used in combination with the others.
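
Alternatively, rather than discarding those transactions with REPLACE, you could do what the error message suggests and back up the tail of the log first; a minimal sketch, with a made-up file name:

-- Capture the tail of the log; this leaves the database in a restoring state
BACKUP LOG AdventureWorks
TO DISK = 'C:\Backups\AdventureWorks_Tail.bak'
WITH NORECOVERY;

With the tail safely backed up, the restore with FILE = 2 will then run without complaint.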

What happens if we want to restore to a different database than the original? For example, we want to make a copy of our database from a separate backup. Maybe we want to move it down to a production support server where we are going to do some work on it, separate from the production copy of the database. If we take the simple approach, well, try this:

RESTORE DATABASE AdventureWorks_2
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2;

In this case, you should see a whole series of errors relating to files not being overwritten. You really can create new databases from backups, but if you're doing it on a server with the existing database, you'll need to change the location of the physical files using the logical names. In order to know the logical names of the files for a given database, run this prior to attempting to move the files:

RESTORE FILELISTONLY
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2;

This can then be used to identify the appropriate logical names in order to generate this script:

RESTORE DATABASE AdventureWorks_2
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 2,
MOVE 'AdventureWorks_Data' TO 'C:\backups\aw2_data.mdf',
MOVE 'AdventureWorks_Log' TO 'C:\backups\aw2_log.ldf';

Restoring a differential backup

The last method is to apply the differential backup. This requires two steps. First, we'll restore the database, but with a twist and then we'll apply the differential backup:

RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 1,
NORECOVERY,
REPLACE;

RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\AdventureWorks.bak'
WITH FILE = 3;

Most of this is probably self-explanatory based on what we've already covered. The one wrinkle is the inclusion of the NORECOVERY keyword. Very simply, transactions may have started during the backup process; some of them complete and some don't. At the end of a normal restore, completed transactions are rolled forward into the database and incomplete transactions are rolled back. Specifying NORECOVERY skips that final rollback and leaves the database in a restoring state, so that the next backup in the sequence can be applied.

We're mainly dealing with simple backups and restores in this article, but a more advanced restore in 2005 allows secondary file groups to be restored while the database is online. Its primary file group must be online during the operation. This will be more helpful for very large database systems.

Restoring SQL Server databases to a point in time

Restoring logs is not much more difficult than the differential database restore that we just completed. There's just quite a bit more involved in restoring to a moment in time. Assuming you're backing up your logs to a single file or device:

RESTORE HEADERONLY
FROM DISK = 'C:\Backups\Adventureworks_log.bak';

Otherwise, you simply go and get the file names you need. First run the database restore, taking care to leave it in a non-recovered state. Follow this up with a series of log restores to a point in time.

RESTORE DATABASE AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks.bak'
WITH FILE = 1,
NORECOVERY,
REPLACE,
STOPAT = 'Oct 23, 2006 14:30:29.000';

RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 1,
NORECOVERY,
STOPAT = 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 2,
NORECOVERY,
STOPAT = 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 3,
NORECOVERY,
STOPAT = 'Oct 23, 2006 14:30:29.000';
RESTORE LOG AdventureWorks
FROM DISK = 'C:\Backups\Adventureworks_log.bak'
WITH FILE = 4,
STOPAT = 'Oct 23, 2006 14:30:29.000';

Now what we have is a database that is up to the exact, last committed transaction at 14:30:29 on the 23rd of October. Remember, during multi-step restores such as this, you have to leave the database in a recovering status. That means appending NORECOVERY to each statement until you've completed the restore process. If for some reason you've added NORECOVERY to all your statements, or you simply stop in the middle, and would like to bring the database back online, you can use this statement to complete the process:

RESTORE DATABASE Adventureworks
WITH RECOVERY;

Database snapshots

SQL Server 2005 introduced the concept of a snapshot, or a read-only, static view of a database. Snapshots are primarily created in order to supply a read-only version of a database for reporting purposes. However, they do function in a similar way to backups. The one primary difference is that all uncommitted transactions are rolled back. There is no option for rolling forward, capturing logs, etc., that backups provide, nor are very many SQL Server resources used at all. Rather, disk technology is used to create a copy of the data. Because of this they are much faster than backups both to create and restore.

NOTE:
For more details on SQL 2005 Snapshot, please refer to http://www.simple-talk.com/sql/database-administration/sql-server-2005-snapshots/.

A good use of snapshots, in addition to reporting, might be to create one prior to maintenance, after you've already removed all the active users (and their transactions) from the system. While snapshots don't offer the flexibility of regular backups, their speed and ease of recovery make them a great tool for quick recovery from a botched rollout. Snapshots are stored on the server, so you must make sure you've got adequate storage.

The syntax is different because you're not backing up a database; you're creating a new one:

CREATE DATABASE Adventureworks_ss1430
ON (NAME = AdventureWorks_Data,
FILENAME = 'C:\Backups\AdventureWorks_data_1430.ss')
AS SNAPSHOT OF AdventureWorks;

Now it will be accessible for read-only access. Since we're primarily concerned with using this as a backup mechanism, let's include the method for reverting a database to a database snapshot.

First, identify the snapshot you wish to use. If there is more than one on any database that you're going to revert, you'll need to delete all except the one you are using:

DROP DATABASE Adventureworks_ss1440;

Then you can revert the database by running a RESTORE statement (mixed metaphors, not good):

RESTORE DATABASE Adventureworks
FROM DATABASE_SNAPSHOT = Adventureworks_ss1430;

That's it. On my system, creating the database snapshot of AdventureWorks took 136 ms, while the full backup took 5,670 ms. The restore of the snapshot took 905 ms and the database restore took 13,382 ms. Incorporating this into a production rollout process could result in significant benefits.

Again, it's worth noting that there are some caveats to using snapshots. You have to have enough disk space for a second copy of the database. You need to be careful dealing with snapshots, since most of the syntax is similar to that used by databases themselves. Finally, while there are snapshots attached to a database, you cannot restore that database from a database backup.
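
If you need to check whether any snapshots are still attached to a database before attempting a restore, a quick query against sys.databases will show them:

-- List database snapshots and the databases they were created from
SELECT s.name AS snapshot_name, d.name AS source_database
FROM sys.databases AS s
JOIN sys.databases AS d ON s.source_database_id = d.database_id;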

Best practices

The manner in which you perform database backups should not be a technical decision. It should be dictated by the business. Small systems with low transaction rates and/or reporting systems that are loaded regularly will only ever need a full database backup. For medium and large systems, the type of data being managed determines what types of backup are required.

For a medium sized system, a daily backup with log backups during the day would probably answer most data requirements in a timely manner.

For a large database the best approach is to mix and match the backups to ensure maximum recoverability in minimum time. For example, run a weekly full backup. Twice a day during the week, run a differential backup. Every 10 minutes during the day, run a log backup. This gives you a large number of recovery mechanisms.

For very large databases, you'll need to get into running filegroup and file backups because doing a full backup or even a differential backup of the full database may not be possible. A number of additional functions are available to help out in this area, but I won't be going into them here.

You should take the time to develop some scripts for running your backups and restores. A naming convention that tells you which database, from which server, from which date, and what kind of backup and format it is will be very conducive to your sanity. A common location for backups, whether log, full or differential, should be defined. Everyone responsible should be trained in backup, recovery and troubleshooting of the same. There are many ways of doing this, but you can find a few suggestions in Pop backs up and Pop Restores.

The real test is to run your backup mechanisms and then run a restore. Then try a different type of restore, and another, and another. Be sure that, not only have you done due diligence in defining how to backup the system, but that you've done the extra step of ensuring that you can recover those backups. If you haven't practiced this and documented the practice and then tested the document, in effect, you're not ready for a disaster.



Tuesday, September 15, 2009

SSIS Design Pattern - Custom Error Handling

Error logging is crucial to troubleshooting and auditing efforts. If an error occurs in your SSIS package while it is executing production loads, transformations, or transfers, you will want to know as much as possible about it.

To demonstrate, create a new SSIS project named ErrorTest:

I encourage you to develop the practice of renaming packages as you create them. This will make your life simpler once you deploy:

I rename Package.dtsx to ErrorTestPkg.dtsx (don't change the extension):

When you rename a package the following dialog will appear:

Always answer this dialog "Yes".
Drag an Execute SQL Task onto the Control Flow:

Configure it (double-click or right-click and select Edit to open the editor) as shown below. Note any connection will suffice, so long as the SQL Server is running. Also note the SQL will generate an exception (intentionally):

Click OK to close the editor. Right-click the Task and select Execute Task to test:

The task should fail (as planned):

Stop execution and click on the Event Handlers tab:

Check to be sure the Executable dropdown is set to the ErrorTestPkg (the package) and the Event handler dropdown is set to OnError. Click the link to create a new OnError Event Handler for the package. From the Toolbox, drag and drop a Data Flow Task onto the Event Handler:

Double-click the Data Flow Task to edit it. From the toolbox, drag a Script Component transformation onto the Data Flow:

The Script Component can be configured as a Source, Transformation, or Destination. Select Source:

Since a Source has only outputs (a destination has only inputs, and a transformation has both inputs and outputs), you will need to configure the Output. Rename the generic Output 0 to ErrorOutput:

Click on Output Columns and click Add Column:

Rename the Column ErrDescription:

Edit the ErrDescription column's properties - change the datatype to String (DT_STR) and the length to 5000:

Add the ErrorDescription System variable to the ReadOnlyVariables property of the Script Component:

There are other variables you will add to your error handling script component. Remember variable names are case-sensitive, comma-separated, with no spaces in the list.
Click the Design Script button and add the following code snippet in the CreateNewOutputRows subroutine as shown below:


' Add a single row to the ErrorOutput buffer and populate it
' with the value of the ErrorDescription system variable
With ErrorOutputBuffer
    .AddRow()
    .ErrDescription = Me.Variables.ErrorDescription
End With


Some explanation: The name of the ErrorOutputBuffer was determined by the name you gave the Script Component Output earlier. When you changed the Output name to ErrorOutput, the script component's output buffer was changed to ErrorOutputBuffer.

To populate the buffer, you must add rows to it. Think of this as starting a new row in a table. If you're familiar with tables in Microsoft Word documents, you know when you start a new row it is empty until you populate it. The same is true here. You are creating a new, empty row in the output buffer to populate with data.

The final step is to load the row. Again, in your production-ready error event handler you will populate more columns in the output buffer than this one - this is for demonstration purposes only:

Next, drag a flat file destination onto the Data Flow and connect it to the Script Component as shown:

Note: You will likely want to store error data in a database rather than a flat file.
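
If you do log to a database, a simple table along these lines would serve as the destination. The table and column names here are hypothetical, apart from ErrDescription, which matches the column built in the Script Component:

-- Hypothetical error-logging table for the OnError event handler output
CREATE TABLE dbo.SSISErrorLog
(
    ErrorLogID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    PackageName varchar(200) NULL,
    TaskName varchar(200) NULL,
    ErrDescription varchar(5000) NULL,
    LoggedAt datetime NOT NULL DEFAULT (GETDATE())
);

For this example, though, we'll stick with the flat file.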

When you configure a flat file destination, you are prompted for which type of file. Delimited files are fairly simple to read - especially with few fields:

Select a location and name for your flat file. I chose C:\ErrorOutput1.txt:

Click Columns to view the columns the file will contain. Note: the columns would be empty if you had not connected a data path from the Script Component prior to configuring the Flat File Destination:

Click OK to proceed. Click the Mappings page to assign mappings between fields in the pipeline and fields in the destination:

Since you built the destination from the metadata contained in the pipeline, auto-mapping should occur when you click Mappings:

Click OK. Your Event Handler should appear as shown:

Start debugging the package by pressing F5 or clicking the Debug Run arrow. Your Execute SQL Task should fail as before:

Click on the Event Handlers tab to determine the run status of your Package OnError Event Handler. Double-click the Data Flow Task and note one row was written to your error file (ErrorOutput1.txt):

Open ErrorOutput1.txt file to view the error stored there:

In this example, we built a package to demonstrate logging package errors to a data destination - albeit a flat file. This technique can be used to catch errors and store them for troubleshooting posterity.


Thursday, September 10, 2009

DTS vs SSIS

In today's blog post, we will give a basic introduction to SSIS (SQL Server Integration Services) and compare some of its features with DTS (Data Transformation Services), its predecessor.

SSIS is an ETL tool provided by Microsoft for extracting data from various sources, transforming it according to business requirements and loading it into the destination. The source and the destination can be anything you can talk to through a connection. One important thing to understand is that SQL Server Integration Services (SSIS) is a component of Microsoft SQL Server 2005 and is not a separate install. Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, or sending e-mail messages; data sources and destinations for extracting and loading data; and transformations for cleaning, aggregating, merging, and copying data.

In SQL Server 2000, DTS packages are created in Enterprise Manager, using the DTS designer, and you have the option of saving them in SQL Server or saving them to the file system (by specifying the location as "Structured Storage File"). In Integration Services, packages are created using Business Intelligence Development Studio (BIDS), which is essentially a modified version of Microsoft Visual Studio. When we create Integration Services packages using BIDS, they are saved on the client (local) machine, and we have to explicitly deploy the packages onto the destination server.

The following steps show how you can launch BIDS:

Go to: Start –> All Programs –> Microsoft Sql Server 2005 –> Business Intelligence Development Studio
ssis_1.jpg

The basic organization concept in DTS as well as SSIS is the concept of a package. A package is essentially a collection of SSIS objects which includes:

a) Connections to the data sources,
b) Data Flows,
c) Control Flows, and
d) Event Handlers

We will discuss these shortly. Let’s first look at the DTS Package Designer and BIDS to get an idea of the IDE layout.

Package Design Pane:

The following images show the DTS and SSIS design panes respectively.
ssis_2.jpg
ssis_3.jpg

The most striking difference between the two designers is how the design surface is organized.

In DTS, the designer consists of a single pane, and to the right we see two toolboxes: Connections and Workflow tasks. The Connections are used to connect to a wide variety of data sources and destinations, such as Excel, flat files, SQL Server and Access; essentially anything we can reach through an ODBC, OLE DB or Jet driver. The Workflow tasks add the functionality to the package. The thick line between the source and the destination in the above example, the Data Transformation task, transforms the data and loads it into the destination. One can write custom code for the transformations or simply use the built-in capabilities of the Data Transformation task.

In SSIS, the designer is split into 4 design panes:
a) Control Flow,
b) Data Flow,
c) Event Handlers, and
d) Package Explorer.

The control flow pane is used to design the workflow inside the package. There is also a significant increase in the number of tasks, and additional functionality has been added to some of the existing ones. The control flow is like a program flow. All of the workflow tasks in DTS map to Control Flow tasks in SSIS, except for the Data Transformation task.

The Data Transformation task in DTS has become the Data Flow task. As the name suggests, the Data Flow task handles the flow of data. Microsoft has introduced a wide variety of data flow components in SSIS for building an entire ETL process. All of the data transformation from the source to the destination takes place in this Data Flow task. Examples of Data Flow components are the Multicast, OLE DB Command, Sort, Pivot and Unpivot transformations.

Event Handlers are something new to SSIS. Event handlers respond to various events raised in the package, such as OnError, OnWarning, OnPreExecute and OnPostExecute. When an event is raised, control goes to the Event Handlers pane, and the package does whatever is defined there. Event handlers themselves contain Control Flow tasks, because the control flow determines the workflow of the package.

Package Explorer is also a new pane in SSIS. It gives the overall view of the package.

ssis_4.jpg

The image shown above shows the Package Explorer. As mentioned earlier, the Package Explorer gives an overall view of the entire package: executables, precedence constraints, connection managers, event handlers and variables. Precedence constraints and connection managers are described later in the post.

The Solution Explorer:

The Solution Explorer is also a new concept in SSIS. The Solution Explorer contains the SSIS packages, connections and Data Source Views (DSVs). A solution contains a series of projects, and each project contains a series of objects used in that project. In DTS we could create only independent packages; there was no way to store the related packages of a module in a single place. The Solution Explorer allows us to do that: we can keep multiple packages related to one module in a single solution.

Data Source View (DSV)

This is also new to SSIS. A data source view is a document that describes the schema of an underlying data source. A data source view provides a named, browseable, persisted selection of database objects that can be used to define sources, destinations, and lookup tables for SQL Server 2005 Integration Services (SSIS) tasks, transformations, data sources, and destinations. For example, in some ERPs like SAP, or even in some databases, the columns might be named A0012, A0013, AABE and so on. Naming the columns this way is typically done by designers for obfuscation (especially in packaged applications), but it does not help the SSIS developer; as far as the SSIS developer is concerned, it is just another column on which data transformations have to be applied. In this case we can create a data source view over the data source and rename the columns in a meaningful way, without renaming the columns in the underlying source. In other words, we overlay the table with a transparent table that has different column names, so we see the column names of the transparent table and the data of the original table.
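
The effect is loosely analogous to putting a view with friendly column names over a cryptically named table. The following T-SQL sketch (hypothetical names, and not how a DSV is actually stored) illustrates the idea:

-- A source table with obfuscated column names, as described above
CREATE TABLE dbo.T_A00
(
    A0012 varchar(50),
    A0013 varchar(50),
    AABE decimal(10, 2)
);
GO

-- A view exposing friendlier names; a DSV provides a similar logical renaming
-- without changing the underlying table
CREATE VIEW dbo.CustomerOrders
AS
SELECT A0012 AS CustomerName,
    A0013 AS ProductName,
    AABE AS OrderAmount
FROM dbo.T_A00;
GO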

Connection Managers:

Connection Manager is also something new in SSIS. To understand this let us consider the following screen shot of DTS.

ssis_5.jpg

In this package we see two connections: the first one is to the database on the left and the other one is to the text file on the right. What happens when we move this package from one environment to another, for example from development to QA? The connections to the databases and the file locations change, so we have to open up each package, and then each connection, to point it at the intended environment. A typical DTS package might contain five to ten connections, and when we are moving a large number of packages from one environment to another this becomes cumbersome. Note that this can be overcome in DTS by using dynamic properties and disconnected edits, picking up the connection information at run time from a file or from a configuration table in a database, but you needed to code for it and the configuration wasn't that straightforward.

Now, let’s see how the connection manager helps in making this process easy in SSIS. Let us understand this with the help of an SSIS example shown in the image below:

ssis_6.jpg

In the above package we see two connections, one of which is a database connection. In SSIS, we have a centralized place where we configure the connections to particular data sources and destinations, called Connection Managers. All the connections inside the package make use of these connection managers to connect to the data sources and destinations. As in DTS, SSIS provides a wide variety of connection managers for connecting to a variety of sources and destinations. So when moving from one environment to another, we open up the SSIS package and make the change in only one place, the connection managers, as opposed to multiple places as in DTS. This saves time and is less cumbersome. In a little while we will see how to alter the connections dynamically without even opening up the packages; this is done with configurations.

Variables:

Variables are a powerful piece of the SSIS architecture; they allow you to dynamically control the package at runtime. There are two types of variables: system and user. System variables are ones that are built into SSIS, whereas user variables are created by the SSIS developer. Variables can also have varying scope, with the default scope being the entire package. They can also be set to be in scope of a container, task, or event handler inside the package. The addition of scope to variables is the main differentiating factor between SSIS variables and DTS global variables.

Configurations:

SSIS provides us with a way of altering the connection information and also the variable values at run time. This is done by using the Configuration Files. The following screen shot shows us how to open up the configuration editor in SSIS.

ssis_7.jpg
ssis_8.jpg

Right-click on the control flow pane and click on the Configurations menu item. This will open up the configuration wizard. This is also something that did not exist in DTS. We will explore package configurations and how to create configuration files in more detail in upcoming posts.

Transactions:

All Microsoft SQL Server 2005 Integration Services (SSIS) container types (packages; the For Loop, Foreach Loop, and Sequence containers; and the task hosts that encapsulate each task) can be configured to use transactions. Integration Services provides three options for configuring transactions: NotSupported, Supported, and Required.

  • Required – the container starts a transaction, unless one is already started by its parent container. If a transaction already exists, the container joins the transaction.
  • Supported – the container does not start a transaction, but joins any transaction started by its parent container. For example, if a package with four Execute SQL tasks starts a transaction and all four tasks use the Supported option, the database updates performed by the Execute SQL tasks are rolled back if any task fails. If the package does not start a transaction, the four Execute SQL tasks are not bound by a transaction, and no database updates except the ones performed by the failed task are rolled back.
  • NotSupported – the container does not start a transaction or join an existing transaction. A transaction started by a parent container does not affect child containers that have been configured not to support transactions. For example, if a package is configured to start a transaction and a For Loop container in the package uses the NotSupported option, none of the tasks in the For Loop can roll back if they fail.

In this post, we covered a brief introduction to SSIS, its important features and how SSIS differs from its predecessor, DTS. In future posts, we will discuss each task in detail, cover best practices, and go into the use of SSIS in building business intelligence applications, such as handling slowly changing dimensions.
