Explore 1.5M+ audiobooks & ebooks free for days

Only €10,99/month after trial. Cancel anytime.

Professional Microsoft SQL Server 2014 Integration Services
Professional Microsoft SQL Server 2014 Integration Services
Professional Microsoft SQL Server 2014 Integration Services
Ebook1,885 pages15 hours

Professional Microsoft SQL Server 2014 Integration Services

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Fill the gap between planning and doing with SSIS 2014

The 2014 release of Microsoft's SQL Server Integration Services provides enhancements for managing extraction, transformation, and load operations, plus expanded in-memory capabilities, improved disaster recovery, increased scalability, and much more. The increased functionality will streamline your ETL processes and smooth out your workflow, but the catch is that your workflow must change. New tools come with new best practices, and Professional Microsoft SQL Server 2014 Integration Services will keep you ahead of the curve. SQL Server MVP Brian Knight is the most respected name in the business, and your ultimate guide to navigating the changes to use Microsoft SQL Server Integration Services 2014 to your utmost advantage.

  • Implement new best practices for effective use of SSIS
  • Work through tutorials for hands-on learning of complex techniques
  • Read case studies that illustrate the more advanced concepts
  • Learn directly from the foremost authority on SSIS

SQL Server Integration Services is a complex tool, but it's the lifeblood of your work. You need to know it inside out, and you must understand the full potential of its capabilities in order to use it effectively. You need to make sure the right architecture is in place. Professional Microsoft SQL Server 2014 Integration Services is your roadmap to understanding SSIS on a fundamental level, and setting yourself up for success.

LanguageEnglish
PublisherWiley
Release dateApr 17, 2014
ISBN9781118850855
Professional Microsoft SQL Server 2014 Integration Services

Read more from Devin Knight

Related to Professional Microsoft SQL Server 2014 Integration Services

Related ebooks

Databases For You

View More

Reviews for Professional Microsoft SQL Server 2014 Integration Services

Rating: 0 out of 5 stars
0 ratings

0 ratings0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    Professional Microsoft SQL Server 2014 Integration Services - Devin Knight

    Chapter 1

    Welcome to SQL Server Integration Services

    WHAT’S IN THIS CHAPTER?

    What’s new to this version of SSIS

    Exploring tools you’ll be using in SSIS

    Overview of the SSIS architecture

    Considering your licensing options around BI with SQL Server

    SQL Server Integration Services (SSIS) is the anchor in a growing suite of products that make up the Microsoft SQL Server Business Intelligence (BI) platform. What makes SSIS so important is without the data movement and cleansing features that SSIS brings to the table, the other SQL Server BI products can’t operate. What’s the point of a cube, for example, with bad or inconsistent data? In its simplest form, SSIS is an enterprise-level, in-memory ETL tool. However, SSIS is not just a fancy wrapper around an import wizard. In a drag-and-drop development environment, ETL developers can snap together intricate workflows and out-of-the-box data-cleansing flows that rival custom coding and expensive million-dollar, third-party tools. The best thing about SSIS is that you have already paid for it when you license SQL Server.

    When we put together the first edition of this book, we were blown away by the new architecture and capabilities of SSIS. SSIS was a big change from the Data Transformation Services (DTS) product that it replaced, and there was much to learn. Since the first edition of SSIS, we have collectively racked up many years of experience converting older DTS packages and mind-sets over to using it, and trust us when we say that no one who has made the change is asking to go back. We’ve learned some things, too.

    While SQL Server 2012 was a large jump forward for SSIS, SQL Server 2014 has some very small iterative changes. When we wrote this book, we dug deeply to mine the decades of cumulative experience working with this product, adding our collective knowledge back into these pages. We hope you will agree that the result makes your experience with SSIS a more productive one. This chapter starts from the beginning by providing an overview of SSIS, describing where it fits within the BI product platform and ETL development in general.

    SQL SERVER SSIS HISTORICAL OVERVIEW

    In SQL Server 7.0, Microsoft had a small team of developers work on a very understated feature of SQL Server called Data Transformation Services (DTS). DTS was the backbone of the Import/Export Wizard, and its primary purpose was to transform data from almost any OLE DB–compliant data source to almost any destination. It also had the ability to execute programs and run scripts, making workflow a minor feature.

    By the time that SQL Server 2000 was released, DTS had a strong following of DBAs and maybe a few developers. Microsoft included in the release new features like the Dynamic Properties Task that enabled you to alter the package dynamically at runtime. Even though DTS utilized extensive logging along with simple and complex multiphase data pumps, usability studies still showed that developers had to create elaborate scripts to extend DTS to get what they wanted done. A typical use case was enabling DTS to load data conditionally based on the existence of a file. To accomplish this in DTS, you had to use the ActiveX Script Task to code a solution using the file system object in VBScript. The problem with that was DTS lacked some of the common components needed to support typical ETL processes. Although it was powerful if you knew how to write scripting code, most DBAs didn’t have this type of scripting experience (or time).

    After five years, Microsoft released the much-touted SQL Server 2005 and SSIS, which was no longer an understated feature like DTS. With the SQL Server 2008 release, SSIS was given extra scalability features to help it appeal more to the enterprise. This is entirely appropriate because so much has been added to SSIS. Microsoft made a huge investment in usability, with simple enhancements to the toolbox that allow newer users to ramp up easier. The main focus of the newest release of SQL Server is on the management and deployment of SSIS.

    WHAT’S NEW IN SSIS

    The scope of the SQL Server 2014 release of SSIS resembles the scope of the SQL Server 2008 R2 release. With the last release of SQL Server 2008 R2, the Microsoft SSIS team did very incremental changes after a very large SQL Server 2008 release. In SQL Server 2012 release, Microsoft had focused on SSIS manageability, making it easier to deploy and execute. Also added in 2012 are robust new data cleansing components that help you standardize and detect data anomalies. Furthermore, improvements to the development tools will help make SSIS developers more productive and help new developers get up to speed more easily. The SQL Server 2014 release uses a newer version of Visual Studio but all in all, it will feel much like SQL Server 2012. You will find new components in SQL Server 2014 SSIS, but they will have to be downloaded from sites like CodePlex from the product team and will eventually be rolled into the core product at a future release.

    TOOLS OF THE TRADE

    Most of this book will assume that you know nothing about previous releases of SQL Server SSIS. Instead, it takes a fresh look at SQL Server SSIS. The learning curve can be considered steep at first, but once you figure out the basics, you’ll be creating complex packages in no time. To provide an idea of how easy SSIS is to use, the following section looks at a staple tool in the ETL world: the Import and Export Wizard.

    Import and Export Wizard

    If you need to move data quickly from almost any OLE DB–compliant data source or flat file to a destination, you can use the SSIS Import and Export Wizard (shown in Figure 1-1). In fact, many SSIS packages are born this way, but most packages you wish to keep in a BI solution should not be created with the wizard. The wizard provides a quick way to move data and perform very light transformations of data but does not create packages that use best practices. The wizard is available in all editions of SQL Server except the Local Database edition and Express. It enables you to persist the logic of the data movement into a package file. The basic concept of an import/export wizard has not changed substantially from the days of DTS. You still have the option to check all the tables you want to transfer. In addition, however, you can also encapsulate the entire transfer of data into a single transaction.

    FIGURE 1-1

    Where do you find the wizard? It depends. If you just need to perform a quick import or export, access the wizard directly from the Start menu by navigating to Start ⇒ Microsoft SQL Server 2014 ⇒ Import and Export Data. The other option is to open a project in the SSIS development environment and select Project ⇒ SSIS Import and Export Wizard. We cover this in detail in Chapter 2. Before we get into all the mechanics for that, see Figure 1-1 for an example of the wizard that has bulk loaded tables.

    The SQL Server Data Tools Experience

    The SQL Server Data Tools (SSDT) was previously called Business Intelligence Development Studio (BIDS) in SQL Server 2008, and it is the central environment in which you’ll spend most of your time as an SSIS developer. SSDT is just a specialized use of the familiar Visual Studio development environment. In SQL Server 2014, SSDT no longer installs when you install SQL Server. Instead, you’ll have to download and install the SQL Server Data Tools (Business Intelligence for Visual Studio) from the Microsoft website. At the time of this publication, SQL Server 2014 can use the Visual Studio 2012 and 2013 versions to design SSIS packages. Visual Studio can host many different project types, from Console applications to Class Libraries and Windows applications. Although you may see many project types when you create a project, SSDT actually contains project templates for only Analysis Services, Integration Services, Report Server, and variants thereof. SSIS in particular uses a BI project type called an Integration Services project (see Figure 1-2), which provides a development design surface with a completely ETL-based set of tools in the Toolbox window.

    FIGURE 1-2

    This development environment is similar to the legacy DTS Designer, but the approach is completely different. Most important, this is a collaborative development environment just like any Visual Studio development effort, with full source code management, version control, and multi-user project management. SSIS solutions are developed just like all other .NET development solutions, including being persisted to files — in this case, XML file structures with a .DSTX file extension. You can even develop within the SSDT environment without a connection to a SQL Server instance using the offline mode. Once your solution is complete, it can be built and deployed to one or multiple target SQL servers. These changes from DTS to SSIS are crucial to establishing the discipline and best practices of existing software development methodologies as you develop business intelligence solutions. We’ll discuss this SSDT development interface in more detail in Chapter 2.

    SSIS ARCHITECTURE

    Microsoft has truly established SSIS as a major player in the extraction, transformation, and loading (ETL) market. Not only is the SSIS technology a complete code rewrite from SQL Server 2000 DTS, it now rivals other third-party ETL tools that can cost hundreds of thousands of dollars depending on how you scale the software — and it is included free with the purchase of SQL Server 2014. Free always sounds great, but most free products can take you only so far if the feature set is minimal or the toolset has usability, scalability, or enterprise performance limitations. SSIS, however, is the real deal, satisfying typical ETL requirements with an architecture that has evolved dramatically from earlier incarnations. At the time of this publication, SSIS held the world speed record of loading more than 2 terabytes in a single hour.

    Packages

    A core component of SSIS is the notion of a package. A package best parallels an executable program that you can write that contains workflow and business logic. Essentially, a package is a collection of tasks snapped together to execute in an orderly fashion. A package is also a unit of execution and development, much like a .NET developer creates programs or DLL files. Precedence constraints are used to connect the tasks together and manage the order in which they execute, based on what happens in each task or based on rules defined by the package developer. The package is brought together into a .DTSX file that is actually an XML-structured file with collections of properties. Just like other .NET projects, the file-based code is marked up using the development environment and can then be saved and deployed to a SQL Server.

    Don’t worry; you won’t have to know how to write this type of XML to create a package. That’s what the designer is for. The point here is that the SSIS package is an XML-structured file, much like .RDL files are to Reporting Services. Of course, there is much more to packages than that, and you’ll explore the other elements of packages, such as event handlers, later in this chapter.

    Control Flow

    The brain of a package is its Control Flow, which orchestrates the order of execution for all its components. The components consist of tasks and containers and are controlled by precedence constraints, discussed later in this chapter. For example, Figure 1-3 shows three tasks that are tied together with two precedence constraints.

    FIGURE 1-3

    Tasks

    A task can best be described as an individual unit of work. Tasks provide functionality to your package, in much the same way that a method does in a programming language. However, in SSIS, you aren’t coding the methods; rather, you are dragging and dropping them onto a design surface and configuring them. You can develop your own tasks, but here are the current ETL tasks available to you out of the box:

    Analysis Services Execute DDL Task: Executes a DDL Task in Analysis Services. For example, this can create, drop, or alter a cube (Enterprise and Developer Editions only).

    Analysis Services Processing Task: This task processes a SQL Server Analysis Services cube, dimension, or mining model.

    Bulk Insert Task: Loads data into a table by using the BULK INSERT SQL command.

    CDC Control Task: Maintains and interacts with the change data capture (CDC) feature from SQL Server.

    Data Flow Task: This very specialized task loads and transforms data into an OLE DB and ADO.NET destination.

    Data Mining Query Task: Allows you to run predictive queries against your Analysis Services data-mining models.

    Data Profiling Task: This exciting task enables the examination of data; it replaces your ad hoc data profiling techniques.

    Execute Package Task: Allows you to execute a package from within a package, making your SSIS packages modular.

    Execute Process Task: Executes a program external to your package, such as one to split your extract file into many files before processing the individual files.

    Execute SQL Task: Executes a SQL statement or stored procedure.

    Expression Task: Sets a variable to an expression at runtime.

    File System Task: This task can handle directory operations such as creating, renaming, or deleting a directory. It can also manage file operations such as moving, copying, or deleting files.

    FTP Task: Sends or receives files from an FTP site.

    Message Queue Task: Sends or receives messages from a Microsoft Message Queue (MSMQ).

    Script Task: This task enables you to perform .NET-based scripting in the Visual Studio Tools for Applications programming environment.

    Send Mail Task: Sends a mail message through SMTP.

    Web Service Task: Executes a method on a web service.

    WMI Data Reader Task: This task can run WQL queries against the Windows Management Instrumentation. This enables you to read the event log, get a list of applications that are installed, or determine hardware that is installed, to name a few examples.

    WMI Event Watcher Task: This task empowers SSIS to wait for and respond to certain WMI events that occur in the operating system.

    XML Task: Parses or processes an XML file. It can merge, split, or reformat an XML file.

    Also included are a whole set of DBA-oriented tasks that enable you to create packages that can be used to maintain your SQL Server environment. These tasks perform functions such as transferring your SQL Server databases, backing up your database, or shrinking the database. Each of the available tasks is described in Chapter 3 in much more detail, and you will see them in other examples throughout the book.

    Tasks are extensible, and you can create your own custom tasks in .NET if you need a workflow item that doesn’t exist or if you have a common scripting function that can benefit from reuse in your package development. To learn more about this topic, see Chapter 19.

    NOTE There’s a thriving ecosystem of third-party components that are available for SSIS. If you are looking for a task or Data Flow component that doesn’t exist out of the box, be sure to first search online before creating your own. Some examples of these components include support for SFTP, SalesForce.com communication, SharePoint integration, and compression of files to name just a few.

    Precedence Constraints

    Precedence constraints are package components that direct tasks to execute in a given order. In fact, precedence constraints are the connectors that not only link tasks together but also define the workflow of your SSIS package. A constraint controls the execution of the two linked tasks by executing the destination task based upon the final state of the prior task and business rules that are defined using special expressions. The expression language embedded in SSIS essentially replaces the need to control workflow using script-based methodologies that enable and disable tasks, as was used in the DTS legacy solution. With expressions, you can direct the workflow of your SSIS package based on all manner of given conditions. You’ll look at many examples of using these constraints throughout this book.

    To set up a precedence constraint between two tasks, you must set the constraint value; optionally, you can set an expression. The following sections provide a brief overview of the differences between the two.

    Constraint values define how the package will react when the prior task of two linked tasks completes an execution. The options define whether the destination task of two linked tasks should execute based solely on how the prior task completes. Three constraint values are possible:

    Success: A task that’s chained to another task with this constraint will execute only if the prior task completes successfully. These precedence constraints are colored green.

    Completion: A task that’s chained to another task with this constraint will execute if the prior task completes, whether or not the prior task succeeds or fails. These precedence constraints are colored blue.

    Failure: A task that’s chained to another task with this constraint will execute only if the prior task fails to complete. This type of constraint is usually used to notify an operator of a failed event. These precedence constraints are colored red.

    You can also conditionally tie tasks together by writing logic on a precedence constraint. This is done by placing an SSIS expression language (resembles C#) on the precedence constraint. For example, you might specify that a task should run only at the end of each month. To do this, you would add an expression that evaluated the runtime of the package to determine if the next step should be run. Much more about writing expressions can be found in Chapter 5.

    Containers

    Containers are core units in the SSIS architecture for grouping tasks together logically into units of work. Besides providing visual consistency, containers enable you to define variables and event handlers (these are discussed in a moment) within the scope of the container, instead of the package. There are four types of containers in SSIS:

    Task Host Container: Not a visible element that you’ll find in the Toolbox, but rather an abstract concept like an interface.

    Sequence Container: Allows you to group tasks into logical subject areas. Within the development environment, you can then collapse or expand this container for usability.

    For Loop Container: Loops through a series of tasks until a condition is met.

    Foreach Loop Container: Loops through a series of files or records in a data set, and then executes the tasks in the container for each record in the collection.

    Because containers are so integral to SSIS development, Chapter 6 is devoted to them. As you read through the book, you’ll see many real-world examples that demonstrate how to use each of these container types for typical ETL development tasks.

    Data Flow

    The core strength of SSIS is its capability to extract data into the server’s memory, transform it, and write it out to an alternative destination. If the Control Flow is the brains of SSIS, then the Data Flow would be its heart. The in-memory architecture is what helps SSIS scale and what makes SSIS run faster than staging data and running stored procedures. Data sources are the conduit for these data pipelines, and they are represented by connections that can be used by sources or destinations once they’ve been defined. A data source uses connections that are OLE DB–compliant and ADO.NET data sources such as SQL Server, Oracle, DB2, or even nontraditional data sources, such as Analysis Services and Outlook. The data sources can be in scope to a single SSIS package or shared across multiple packages in a project.

    All the characteristics of the connection are defined in the Connection Manager. The Connection Manager dialog options vary according to the type of connection you’re trying to configure. Figure 1-4 shows you what a typical connection to SQL Server would look like.

    FIGURE 1-4

    Connection Managers are used to centralize connection strings to data sources and to abstract them from the SSIS packages themselves. They can be shared across multiple packages in a project or isolated to a single package. Connection Managers also allow you to externalize the configuration of them at runtime by your DBA with a configuration file or parameters (which we’ll describe in Chapter 22). SSIS will not use the connection until you begin to instantiate it in the package. This provides the ultimate in lightweight development portability for SSIS.

    You learned earlier that the Data Flow Task is simply another executable task in the package. The Data Flow Task is the pipeline mechanism that moves data from source to destination. However, in the case of SSIS, you have much more control of what happens from start to finish. In fact, you have a set of out-of-the-box transformation components that you snap together to clean and manipulate the data while it is in the data pipeline.

    One confusing thing for new SSIS developers is that once you drag and drop a Data Flow Task in the Control Flow, it spawns a new Data Flow design surface with its own new tab in the SSDT user interface. Each Data Flow Task has its own design surface that you can access by double-clicking the Data Flow Task or by clicking the Data Flow tab and selecting the name of the Data Flow Task from the drop-down list. Just as the Control Flow handles the main workflow of the package, the Data Flow handles the transformation of data in memory. Almost anything that manipulates data falls into the Data Flow category. As data moves through each step of the Data Flow, the data changes, based on what the transform does. For example, in Figure 1-5, a new column is derived using the Derived Column Transformation, and that new column is then available to subsequent transformations or to the destination.

    FIGURE 1-5

    In this section, each of the sources, destinations, and transformations are covered from an overview perspective. These areas are covered in much more detail in later chapters.

    Sources

    A source is a component that you add to the Data Flow design surface to specify the location of the source data that will send data to components downstream. Sources are configured to use Connection Managers in order to enable the reuse of connections throughout your package. SSIS provides eight out-of-the-box sources:

    OLE DB Source: Connects to nearly any OLE DB data source, such as SQL Server, Access, Oracle, or DB2, to name just a few.

    Excel Source: Specializes in receiving data from Excel spreadsheets. This source also makes it easy to run SQL queries against your Excel spreadsheet to narrow the scope of the data that you wish to pass through the flow.

    Flat File Source: Connects to a delimited or fixed-width file.

    Raw File Source: Produces a specialized binary file format for data that is in transit; it is especially quick to read by SSIS. This component is one of the only components that does not use a Connection Manager.

    Xml Source: Retrieves data from an XML document. This source does not use a Connection Manager to configure it.

    ADO.NET Source: This source is just like the OLE DB Source but only for ADO.NET-based sources. The internal implementation uses an ADO.NET DataReader as the source. The ADO.NET connection is much like the one you see in the .NET Framework when hand-coding a connection and retrieval from a database.

    CDC Source: Reads data out of a table that has change data capture (CDC) enabled. Used to retrieve only rows that have changed over a duration of time.

    ODBC Source: Reads data out of table by using an ODBC provider instead of OLE DB. When you are given the choice between OLE DB and ODBC, it is still recommended in SSIS packages that you use OLE DB.

    If the source components included in SSIS do not provide the functionality required for your solution, you can write code to connect to any data source that is accessible from a .NET application. One method is to use the Script Component to create a source stream using the existing .NET libraries. This method is more practical for single-use applications. If you need to reuse a custom source in multiple packages, you can develop one by using the SSIS .NET API and object model.

    Transformations

    Transformations are key components within the Data Flow that allow changes to the data within the data pipeline. You can use transformations to split, divert, and remerge data in the data pipeline. Data can also be validated, cleansed, and rejected using specific rules. For example, you may want your dimension data to be sorted and validated. This can be easily accomplished by dropping a Sort and a Lookup Transformation onto the Data Flow design surface and configuring them.

    Transformation components in the SSIS Data Flow affect data in the data pipe in memory. Because this process is done in memory, it can be much faster than loading the data into a staging environment and updating the staging system with stored procedures. Here’s a complete list of transformations and their purposes:

    Aggregate: Aggregates data from transformation or source.

    Audit: Exposes auditing information from the package to the data pipe, such as when the package was run and by whom.

    CDC Splitter: After data has been read out of a table with CDC enabled, this transform sends data that should be inserted, updated, and deleted down different paths.

    Character Map: Makes common string data changes for you, such as changing data from lowercase to uppercase.

    Conditional Split: Splits the data based on certain conditions being met. For example, this transformation could be instructed to send data down a different path if the State column is equal to Florida.

    Copy Column: Adds a copy of a column to the transformation output. You can later transform the copy, keeping the original for auditing purposes.

    Data Conversion: Converts a column’s data type to another data type.

    Data Mining Query: Performs a data-mining query against Analysis Services.

    Derived Column: Creates a new derived column calculated from an expression.

    DQS Cleansing: Performs advanced data cleansing using the Data Quality Services engine.

    Export Column: Exports a column from the Data Flow to the file system. For example, you can use this transformation to write a column that contains an image to a file.

    Fuzzy Grouping: Performs data cleansing by finding rows that are likely duplicates.

    Fuzzy Lookup: Matches and standardizes data based on fuzzy logic. For example, this can transform the name Jon to John.

    Import Column: Reads data from a file and adds it to a Data Flow.

    Lookup: Performs a lookup on data to be used later in a transformation. For example, you can use this transformation to look up a city based on the zip code.

    Merge: Merges two sorted data sets into a single data set in a Data Flow.

    Merge Join: Merges two data sets into a single data set using a join function.

    Multicast: Sends a copy of the data to an additional path in the workflow.

    OLE DB Command: Executes an OLE DB command for each row in the Data Flow.

    Percentage Sampling: Captures a sampling of the data from the Data Flow by using a percentage of the Data Flow’s total rows.

    Pivot: Pivots the data on a column into a more nonrelational form. Pivoting a table means that you can slice the data in multiple ways, much like in OLAP and Excel.

    Row Count: Stores the row count from the Data Flow into a variable.

    Row Sampling: Captures a sampling of the data from the Data Flow by using a row count of the Data Flow’s total rows.

    Script Component: Uses a script to transform the data. For example, you can use this to apply specialized business logic to your Data Flow.

    Slowly Changing Dimension: Coordinates the conditional insert or update of data in a slowly changing dimension.

    Sort: Sorts the data in the Data Flow by a given column.

    Term Extraction: Looks up a noun or adjective in text data.

    Term Lookup: Looks up terms extracted from text and references the value from a reference table.

    Union All: Merges multiple data sets into a single data set.

    Unpivot: Unpivots the data from a non-normalized format to a relational format.

    Destinations

    Inside the Data Flow, destinations consume the data after the data pipe leaves the last transformation components. The flexible architecture can send the data to nearly any OLE DB–compliant, flat file, or ADO.NET data source. Like sources, destinations are also managed through the Connection Manager. The following destinations are available to you in SSIS:

    ADO.NET Destination: Exposes data to other external processes, such as a .NET application.

    Data Mining Model Training: Trains an Analysis Services mining model by passing data from the Data Flow to the destination.

    Data Reader Destination: Allows the ADO.NET DataReader interface to consume data, similar to the ADO.NET Destination.

    Dimension Processing: Loads and processes an Analysis Services dimension. It can perform a full, update, or incremental refresh of the dimension.

    Excel Destination: Outputs data from the Data Flow to an Excel spreadsheet.

    Flat File Destination: Enables you to write data to a comma-delimited or fixed-width file.

    ODBC Destination: Outputs data to an ODBC data connection like SQL Server, DB2, or Oracle.

    OLE DB Destination: Outputs data to an OLE DB data connection like SQL Server, Oracle, or Access.

    Partition Processing: Enables you to perform incremental, full, or update processing of an Analysis Services partition.

    Raw File Destination: Outputs data in a binary format that can be used later as a Raw File Source. It’s usually used as an intermediate persistence mechanism.

    Recordset Destination: Writes the records to an ADO record set. Once written, to an object variable, it can be looped over a variety of ways in SSIS like a Script Task or a Foreach Loop Container.

    SQL Server Compact Edition Destination: Inserts data into a SQL Server running the Compact Edition of the product on a mobile device or PC.

    SQL Server Destination: The destination that you use to write data to SQL Server. This destination has many limitations, such as the ability to only write to the SQL Server where the SSIS package is executing. For example, if you’re running a package to copy data from Server 1 to Server 2, the package must run on Server 2. This destination is there largely for backwards compatibility and should not be used.

    Variables

    Variables are another vital component of the SSIS architecture. SSIS variables can be set to evaluate to an expression at runtime. You can also set variables to be set in the Control Flow with either a Script Task or an Expression Task. Variables in SSIS have become the method of exchange between many tasks and transformations, making the scoping of variables much more important. By default, SSIS variables exist within a package scope, but they can be scoped to different levels within a package as mentioned earlier in the Containers section.

    Parameters

    Parameters behave much like variables but with a few main exceptions. Parameters, like variables, can make a package dynamic. The largest difference between them is that parameters can be set outside the package easily and can be designated as values that must be passed in for the package to start, much like a stored procedure input parameter. Parameters replace the capabilities of Configurations in previous releases of SQL Server.

    Error Handling and Logging

    In SSIS, you can control error handling in several places, depending on whether you are handling task or Data Flow errors. For task errors, package events are exposed in the user interface, and each event can have its own event-handler design surface. This design surface is yet another area where you can define workflow, in addition to the Control Flow and Data Flow surfaces you’ve already learned about. Using the event-handler design surface in SSIS, you can specify a series of tasks to be performed if a given event happens for a task in the task flow.

    Some event handlers can help you develop packages that can self-fix problems. For example, the OnError error handler triggers an event whenever an error occurs anywhere in scope. The scope can be the entire package or an individual task or container. Event handlers are represented as a workflow, much like the Control Flow workflow in SSIS. An ideal use for an event handler would be to notify an operator if any component fails inside the package. (You will learn much more about event handlers in Chapter 18.) You can also use the precedence constraints directly on the task flow design surface to direct workflow when a task fails to complete or it evaluates to an expression that forces the workflow to change.

    Logging has also been improved in SSIS in this latest release. Logging is now enabled by default, and packages are easier to troubleshoot. More than a dozen events can be simply selected within each task or package for logging. You can also choose to enable partial logging for one task and enable much more detailed logging for another task, such as billing. Some of the examples of events that can be monitored are OnError, OnPostValidate, OnProgress, and OnWarning, to name just a few. The logs can be written to nearly any connection: SQL Profiler, text files, SQL Server, the Windows Event log, or an XML file. You’ll see some examples of this in Chapter 18.

    EDITIONS OF SQL SERVER

    The available features in SSIS and SQL Server vary according to what edition of SQL Server you’re using. Of course, the more high-end the edition of SQL Server, the more features are available. In order from high-end to low-end, the following is a partial list of SQL Server editions:

    SQL Server Enterprise Edition: This edition of SQL Server is for large enterprises that need high availability and more advanced features in SQL Server and business intelligence. For example, there is no limit on processors or RAM in this edition. You’re bound only by the number of processors and the amount of RAM that the OS can handle. Microsoft will also continue to support Developer Edition, which enables developers to create SQL Server solutions at a much reduced price. Ultimately, if you’re trying to scale your solution into terabytes of data storage in SQL Server then Enterprise Edition is the right choice for you.

    SQL Server Business Intelligence Edition: This edition includes all the features of Standard Edition and also includes additional data cleansing features like Data Quality Services, which helps you create business rules that SSIS consumes. It also has many SharePoint integration features outside of SSIS and some scalability features. You can scale all the BI features other than the database engine to the OS maximum of cores with this edition.

    SQL Server Standard Edition: This edition of SQL Server now offers even more value than before. For example, you can create a highly available system in Standard Edition by using clustering, database mirroring, and integrated 64-bit support. Like Enterprise Edition in SQL Server 2012, it also offers unlimited RAM. Thus, you can scale it as high as your physical hardware and OS will allow. However, there is a cap of four processors with this edition.

    As for SSIS, you’ll have to use at least the Standard Edition to receive the bulk of the SSIS features. In the Express Edition, only the Import and Export Wizard is available. BI Edition gives you access to items like Data Quality Services and the DQS Cleansing Transformation. You need to upgrade to the higher editions in order to see some features in SSIS. For example, the following advanced transformations are available only with Enterprise Edition:

    Analysis Services Partition Processing Destination

    Analysis Services Dimension Processing Destination

    CDC Source, Destination, Splitter Transformation, and CDC Control Task

    Data Mining Training Destination

    Data Mining Query Component

    Fuzzy Grouping

    Fuzzy Lookup

    Term Extraction

    Term Lookup

    Half of these transformations are used in servicing Analysis Services. Along those same lines, one task is available only in Enterprise Edition: the Data Mining Query Task.

    SUMMARY

    In this chapter, you were introduced to the historical legacy and the exciting capabilities of the SQL Server Integration Services (SSIS) platform. You looked at where SSIS fits into the business intelligence (BI) platform for SQL Server, and then dove into an overview of the SSIS architecture. Within the architecture, we stayed up at 20,000 feet to ensure that you have a good understanding of how SSIS works and the core parts of the architecture. You learned about the core components of tasks, Data Flows, transformations, event handlers, containers, and variables — all crucial concepts that you’ll be dealing with daily in SSIS. Packages are executable programs in SSIS that contain a collection of tasks. Tasks are individual units of work that are chained together with precedence constraints. Lastly, transformations are the Data Flow items that change the data to the form you request, such as sorting the data.

    The next chapter describes some of the tools and wizards you have at your disposal to expedite tasks in SSIS. Chapter 3 dives deeply into the various tasks in the SSIS Toolbox that you can use to create SSIS workflows. In Chapter 4, you’ll learn about Data Flow Task and examine the data components that are available for use within the Data Flow pipeline to perform the transformations in ETL.

    Chapter 2

    The SSIS Tools

    WHAT’S IN THIS CHAPTER?

    Working with the Import and Export Wizard

    Using the SQL Server Data Tools application

    Examining the windows used to create packages

    Utilizing Management Studio to administer your packages

    As with any Microsoft product, SQL Server ships with a myriad of wizards and tools to make your life easier and reduce your time to market. In this chapter you will learn about some of the tools of the trade that are available to you and how to create your first basic package. These wizards make transporting data and deploying your packages much easier and can save you hours of work in the long run, but they’re only a starting point in most cases. In the first part of this chapter, you’ll look at the Import and Export Wizard, which enables you to create a package for importing or exporting data quickly with minimal transformations. As a matter of fact, you may run this tool in your day-to-day work without even knowing that SSIS is the back end for the wizard. The latter part of this chapter explores other, more powerful, tools that are available to you, such as SQL Server Data Tools (SSDT). By the time this chapter is complete, you will have created your first SSIS package.

    IMPORT AND EXPORT WIZARD

    The Import and Export Wizard is the easiest method to move data from sources like Oracle, DB2, SQL Server, Excel, and text files to nearly any destination, and it is available across all versions of SQL Server — even those that don’t include SSIS. This wizard uses SSIS as a framework and can optionally save a package as its output prior to executing. The package it produces may not be the most elegant, but it can eliminate a lot of tedious package development work and it provides the building blocks that are necessary for building the remainder of the package. Oftentimes as an SSIS developer, you’ll want to relegate the grunt work and heavy lifting to the wizard when you want to just move data for a onetime load, and then do the more complex coding yourself.

    As with any of the SSIS tools, there are numerous ways to open the tool. To open the Import and Export Wizard, right-click the database you want to import data from or export data to in SQL Server Management Studio and select Tasks ⇒ Import Data (or Export Data based on what task you’re performing). You can also open the wizard by right-clicking the SSIS Packages folder in SSDT and selecting SSIS Import and Export Wizard. Another common way to open it is from the Start menu under SQL Server 2014, where it’s called Import and Export Data. The last way to open the wizard is by typing dtswizard.exe at the command line or Run prompt. Regardless of whether you need to import or export data, the first few screens in the wizard look very similar.

    When the wizard appears, you’ll see the typical Microsoft wizard welcome screen. Click Next to begin specifying the source connection. If you had opened the wizard from Management Studio by selecting Export Data, this screen is pre-populated. In this screen you specify where your data is coming from in the Source dropdown box. Once you select the source, the rest of the options on the dialog may vary based on the type of connection. The default source is .Net Framework Data Provider for Odbc. Out of the box, you have ODBC and OLE DB sources that connect to SQL Server, Oracle, and Access. You can also use text files, Excel files, and XML files. Traditionally, the SQL Native Client is the provider used in SSIS because it gives additional functionality during design time. Change the data source to use the SQL Server Native Client 11.0 provider.

    For SQL Server, you must enter the server name, as well as the user name and password you’d like to use. If you’re going to connect with your Windows account, simply select Use Windows Authentication. Windows Authentication will pass your Windows local or domain credentials into the data source. Lastly, choose a database to which you want to connect. For most of the examples in this book, you’ll use the AdventureWorksDW database or a variation of that DW database, shown in Figure 2-1. This database can be downloaded at www.wrox.com.

    FIGURE 2-1

    NOTE Additional sources such as Sybase and DB2 are also available if you install the vendor’s OLE DB providers. You can download the OLE DB provider for DB2 free if you’re using Enterprise Edition by going to the SQL Server Feature Pack on the Microsoft website. (As of this writing, the SQL Server 2014 Feature Pack has not be released. However, the SQL 2012 Feature Pack will work for 2014 as well.)

    NOTE In 2011, Microsoft released information regarding the appropriate provider types to use for new development. It was recommended that any new development should be done with ODBC providers rather than OLE DB. This rule should only be followed for home grown applications for now. SSIS developers should continue using OLE DB because ODBC does not have full feature parity yet to complete some ordinary tasks.

    After you click Next, you are taken to the next screen in the wizard, where you specify the destination for your data. The properties for this screen are identical to those for the previous screen with the exception of the database. Change the Destination provider to SQL Server Native Client 11.0, then select TempDB from the Database dropdown. This will create and load the tables into a temporary space, which will disappear once you restart your instance of SQL Server. Click Next again to be taken to the Specify Table Copy or Query screen (see Figure 2-2). Here, if you select Copy data from one or more tables or views, you can simply check the tables you want. If you select Write a query to specify the data to transfer, you can write an ad hoc query (after clicking Next) that defines where to select the data from, or what stored procedure to use to retrieve your data.

    FIGURE 2-2

    For the purpose of this example, select Copy data from one or more tables or views and click Next. This takes you to the Select Source Tables and Views screen, where you can check the tables or views that you want to transfer to the destination (see Figure 2-3). For this tutorial, check a couple of tables such as FactResellerSales and FactInternetSales in the AdventureWorksDW database.

    FIGURE 2-3

    If you wish, you can click the Edit buttons to access the Column Mappings dialog for each table (see Figure 2-4). Here you can change the mapping between each source and destination column. For example, if you want the ProductKey column to go to the ProductKey2 column on the destination, simply select the Destination cell for the ProductKey column and point it to the new column, or select to ignore the column altogether.

    FIGURE 2-4

    Note that because you’re moving the data to a new database that doesn’t already contain the FactInternetSales table, the Create destination table option is one of the few options enabled by default. This will create the table on the destination before populating it with data from the source. If the table already existed, the data will append existing records but you could specify that all rows in the destination table should be deleted before populating it.

    Finally, you can check the Enable identity insert option if the table into which you are moving data has an identity column. If the table does have an identity column, the wizard automatically enables this option. If you don’t have the option enabled and you try to move data into an identity column, the wizard will fail to execute.

    For the purpose of this example, don’t change any of the settings in this screen. Click OK to apply the settings from the Column Mappings dialog and then click Next to proceed.

    If no errors are found, you are taken to the Save and Run Package screen (Figure 2-5). Here you can specify whether you want the package to execute only once, or whether you’d like to save the package for later use. As shown earlier, it isn’t necessary to execute the package here. You can uncheck Run Immediately and just save the package for later modification and execution. For this example, check the options for Run Immediately, Save SSIS Package, and File System. This collection of options will execute the package and save it as a .dtsx file to your computer. You learn more about where to save your SSIS packages later in this chapter. Note that if you save the package to SQL Server or to the file system, you’re saving the package with the Package Deployment Model. We’ll discuss more about the package deployment model in Chapter 22.

    FIGURE 2-5

    In this screen, you’re also asked how you wish to protect the sensitive data in your package. SSIS packages are essentially large XML files behind the scenes, and encryption of sensitive data, such as passwords, is critical to ensuring that no one sees that information by opening the XML manually. Again, you learn more about this later in this chapter, so for now just change the Package Protection Level property to Encrypt sensitive data with password to protect your sensitive data with a password, and give the dialog a password (as shown in Figure 2-5).

    You are then taken to the Save SSIS Package screen, where you can type the name of the package and the location to which you want to save it (see Figure 2-6).

    FIGURE 2-6

    Click Next and confirm what tasks you wish the wizard to perform. The package will then execute when you click Finish, and you’ll see the page shown in Figure 2-7. Any errors are displayed in the Message column. You can also see how many rows were copied over in this column, and you can double-click an entry that failed to see why, in the event that there are errors during execution.

    FIGURE 2-7

    After the wizard executes, the package can be found in the location that you have specified, but the default is the My Documents directory. You can open the package that executed in SSDT by creating a project in SSDT and copying and pasting the package into the project or by right-clicking on Packages and selecting Add Existing Package.

    SQL SERVER DATA TOOLS

    The SQL Server Data Tools (SSDT) is where you’ll spend most of your time as an SSIS developer. It is where you create and deploy your SSIS projects. SSDT uses a subset of the full version of Visual Studio 2013. If you have the full version of Visual Studio 2013 and SQL Server 2014 installed, you can create business intelligence projects there as well as in the full interface, but as far as SSIS is concerned, there’s no added value in using the full version of Visual Studio. Either way, the user experience is the same. In SQL Server 2014, the SSIS development environment is detached from SQL Server, so you can develop your SSIS solution offline and then deploy it wherever you like in a single click.

    In prior versions of Integration Services, SSDT was part of the SQL Server installation, but with the release of SQL Server 2014, SSDT has been decoupled from SQL Server installer. This means to develop new SSIS packages you must go download SSDT from the Microsoft download site. The benefit of this change is that developers will now see more frequent enhancements to the development environment. Use a search engine with the term SQL Server Data Tools for Visual Studio 2013 to find the most recent release.

    After you download and install SSDT you’ll find SSDT in the root of the Microsoft SQL Server 2014 program group from the Start menu. Once you start SSDT, you are taken to the Start Page, an example of which is shown in Figure 2-8, before you open or create your first project. You can open more windows (you learn about these various windows in a moment) by clicking their corresponding icon in the upper-right corner of SSDT or under the View menu. Please note that some of the screenshots in this book, such as Figures 2-8, 2-9, and 2-10, were shot using Visual Studio 2012, which also works with SQL Server 2014. You may also choose to use Visual Studio 2013 if you prefer, and the screenshot may be slightly different.

    FIGURE 2-8

    FIGURE 2-9

    FIGURE 2-10

    The Start Page contains key information about your SSDT environment, such as the last few projects that you had open (under the Recent Projects section). You can also see the latest MSDN news under the Get Started section from the Latest News box. By clicking the Latest News box, you can also set the RSS feed that you’re consuming as well.

    The Visual Studio environment is that it gives you full access to the Visual Studio feature set, such as debugging, automatic integration with source code control systems, and integrated help. It is a familiar environment for developers and makes deployments easy.

    The starting point for SSIS is to create a solution and project.

    A solution is a container in Visual Studio that holds one or many projects.

    A project in SSIS is a container of one or many packages and related files. You can also create projects for Analysis Services, Reporting Services, C#, and so on. All of these projects can be bundled together with a single solution, so a C# developer is in the same environment as an SSIS developer. Make sure you put packages that belong together into a single project since the project is the unit of deployment in SQL Server 2014.

    To start a new SSIS project, you first need to open SSDT and select File ⇒ New ⇒ Project. Note a series of new templates (shown in Figure 2-9) in your template list now that you’ve installed SSDT for Visual Studio 2013. From the Installed Templates pane on the left, select Integration Services and then select Integration Services Project. Name your project and solution whatever you like (I named the solution ProSSISSolution and the project ProSSISProject). Also shown in Figure 2-9 is another type of SSIS project called the Integration Services Import Project Wizard, which is used to bring packages into a project from another deployed project. Click OK at this point to create the solution, the project, and your first package.

    Typically, you want to align your projects into solutions that fit the business requirement that you’re trying to meet. For example, you may be assigned to a business project for the creation of a data warehouse. That warehouse project would probably have ETL, an SSAS cube, and Reporting Services reports. You could place all of these into a single solution so you could manage them from a unified interface. Note that once you begin work in Visual Studio, if your solution contains only a single project, the solution will be hidden by default. If you want to always see the solution name, go to Tools ⇒ Options and check Always Show Solution from the Projects and Solutions group (shown in Figure 2-10). If you’re doing any type of source control, this option should always be turned on so you can check in the solution easily. Otherwise, once a second project is added to the solution, you’ll see the solution and both projects under the solution.

    THE SOLUTION EXPLORER WINDOW

    The Solution Explorer window is where you can find all your created SSIS packages, project connection managers, project parameters, and any other miscellaneous files needed for the project, such as installation documents. As mentioned earlier, a solution is a container that holds a series of projects. Each project holds a myriad of objects for whatever type of project you’re working on. Once you create a solution, you can store many projects inside of it. For example, you may have a solution that has your VB.NET application and all the SSIS packages that support that application. In this case, you would have two projects: one for VB and another for SSIS contained within the single solution.

    After creating a new project, your Solution Explorer window will contain a series of empty folders. Figure 2-11 shows a partially filled Solution Explorer. In this screenshot, there’s a solution named ProSSISSolution with two projects: SSASProject and ProSSISProject. Inside that project are two SSIS packages.

    FIGURE 2-11

    To create a new project inside an existing open solution, right-click the solution name in the Solution Explorer window and select Add ⇒ New Project. To add a new item to your project in the folder, right-click the folder that holds the type of item that you wish to add and select New Connection Manager or New SSIS Package. You can also drag or copy and paste files into the project if they are of a similar type, like .dtsx files.

    If you look in the directory that contains your solution and project files, you’ll see all the files that are represented in the Solution Explorer window. Some of the base files you may see will have the following extensions:

    .dtsx: An SSIS package, which uses its legacy extension from the early beta cycles of SQL Server 2005 when SSIS was still called DTS

    .conmgr: A connection manager that can be shared across any package in the entire project

    .sln: A solution file that contains one or more projects

    .dtproj: An SSIS project file

    .params: An SSIS project parameter file

    If you copy any file that does not match the .params, .conmgr, or .dtsx extension, it will be placed in the Miscellaneous folder. This folder is used to hold any files that describe the installation of the package, such as Word documents or requirements documents. You can put anything you like in that folder, and it can potentially all be checked into a source control system like Team Foundation Server (TFS) or SourceSafe with the code. You’ll learn more about source control systems in Chapter 17.

    The SSIS Toolbox

    The SSIS Toolbox contains all the items that you can use in the particular tab’s design pane at any given point in time. For example, the Control Flow tab has a list of tasks (a partial list is shown in Figure 2-12). This list may grow depending on what custom tasks are installed, and the list will be completely different when you’re in a different tab, such as the Data Flow tab. All the tasks shown in Figure 2-12 are covered in Chapter 3 in much more detail.

    FIGURE 2-12

    In the Control Flow tab, the Toolbox is organized into tabs such as Favorites, Common, Containers, and Other Tasks. These tabs can be collapsed and expanded for usability. As you use the Toolbox, you may want to customize your view by removing tasks or tabs from the default view. You can move or customize the list of items in your Toolbox by right-clicking on a given component (refer to Figure 2-12). You can also reset the entire Toolbox to its defaults by right-clicking and selecting Restore Toolbox Defaults. As you install third-party components, those tasks will now automatically appear in the Toolbox after you refresh the Toolbox or when you reopen SSDT.

    The Properties Windows

    You can use the Properties window (shown in Figure 2-13) to customize any item that you have selected in the Control Flow or Data Flow tabs. For example, if you select a task in the design pane of those tabs, you’ll be shown a list of properties to configure, such as the task’s name and what query it’s going to use. The view varies widely based on what item you have selected. Figure 2-13 shows the properties of the Execute Process task you created earlier in this chapter.

    FIGURE 2-13

    Most tasks can be configured through the user interface of the tasks or by going to the Properties pane when the task is selected. Note that the Properties pane may contain some advanced properties not shown in the user interface for the component. To edit the properties for the package, simply select the design pane in the background. If the Properties pane is closed, you can press F4 to reopen it or select the Properties Window button under View.

    THE SSIS PACKAGE DESIGNER

    The SSIS Package Designer contains the design panes that you use to create an SSIS package. This tool contains all the items you need to move data or create a workflow with minimal or no code. The great thing about SSIS is that it is like programming with building blocks. The Package Designer contains five tabs: Control Flow, Data Flow, Parameters, Event Handlers, and Package Explorer. One additional tab, Progress, also appears when you execute packages. This Progress tab is renamed to Execution

    Enjoying the preview?
    Page 1 of 1