Introduction
In this blog, I will discuss SQL Server Integration Services (SSIS) and its components.
SQL Server Integration Services (SSIS) is an ETL tool (Extract, Transform and Load) which is used for building enterprise-level data integration and data transformation solutions. Integration services help in developing solutions for complex business problems, as listed below:
- Copying or downloading files
- Sending e-mail messages in response to events
- Updating DataWarehouses
- Cleaning and mining data
- Managing SQL server objects and data
Integration services can extract and transform data from a wide variety of sources such as XML data files, flat files, and relational data sources and then load data into one or more destinations. Integration services has rich set of built-in tasks and transformations, tools for constructing packages and the integration services service for running and managing packages.
Packages
A package is an organized collection of connections, control flow elements, data flow elements, event handlers, variables, parameters, and configurations which are assembled using either the graphical design tools that SQL Server Integration Services provides, or build programmatically. The package is the unit of work that is retrieved, executed and saved.
When we create a package it is an empty object that does nothing. The functionality is added to the package by adding control flow and one or more data flow to the package.
The following diagram shows a simple package that contains a control flow with a data flow task, which in turn contains a data flow.
After creating the package, the package functionality can be extended by adding advanced features like event handlers, logging, variables and configurations which will be explained in a while.
Contents of a Package
In this section we will discuss the various contents of the package.
Tasks and Containers (Control Flow)
A control flow consists of one or more tasks and containers that execute when the package runs. The execution of the tasks can be controlled by precedence constraints that connects the tasks and containers in a package. A subset of tasks can be grouped together as a unit and can be executed repeatedly within the package control flow.
Data Sources and Destination (Data Flow)
A data flow consists of the sources and destinations that extract and load data, transformations that modify and extend data and the paths that link sources, transformations and destinations.
To add a data flow to a package, the package control flow must include a data flow task. The data flow task is the executable within the SSIS package that creates, orders and runs the data flow.
Connection Managers (connections)
A package typically includes at least one connection manager. A connection manager is a link between package and data source that defines the connection strings for accessing the data that the tasks, transformations and event handlers in the package use. Integration Services includes connection types for data sources such as text and XML files, relational databases, and Analysis Services databases and projects.
Package Functionality Extension Objects
Event Handlers
An event handler is a work flow that runs in response to the events raised by a package, task or container. For example, if the package fails during execution, we can add a email task to send an email On Error event as seen in the below image.
Configurations
A configuration is a set of property-value pairs that defines the properties of the package and its tasks, containers, variables, connections and event handlers when the package runs. Using configuration the properties can be updated without modifying the package. When the package is run, the configuration information is loaded, updating the values of properties.
Logging and Log Providers
A log is a collection of information that is collected when the package runs. A log can provide start and finish time of the package run. Integration services has several built-in log providers for logging.
Variables
Integration services supports system variables and user-defined variables. The package-level variables include the pre-defined system variables available to a package and the user-defined variables with package scope.
In my next blog, I will explain in detail, the process to create a package, control flow, and its components.