
Integration Services Control Flow Engine 2014 Notes.pdf · • When you are prompted to confirm, click Yes, and then wait for the batch file to complete. • On the Start screen,



Integration Services

Creating an ETL Solution with SSIS

Module Overview

• Introduction to ETL with SSIS

• Implementing Data Flow

Lesson 1: Introduction to ETL with SSIS

• What Is SSIS?
• SSIS Projects and Packages
• The SSIS Design Environment
• Using the Import/Export Wizard

What Is SSIS?

• A platform for ETL operations
• Installed as a feature of SQL Server
• Control flow engine:
  • Runtime resources and operational support for data flow
• Data flow engine:
  • Pipeline architecture for buffer-oriented rowset processing

[Diagram: Control Flow Engine, Data Flow Engine, Pipeline]
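As a rough illustration of "buffer-oriented rowset processing", the following Python sketch groups rows into fixed-size buffers and passes each buffer through a transformation. It is only a conceptual analogy, not how the SSIS data flow engine is implemented; the function names, the `name` column, and the buffer size are assumptions made for the example.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def to_buffers(rows: Iterable[Row], buffer_rows: int) -> Iterator[List[Row]]:
    """Group an incoming row stream into buffers of at most buffer_rows rows."""
    iterator = iter(rows)
    while True:
        buffer = list(islice(iterator, buffer_rows))
        if not buffer:
            return
        yield buffer


def uppercase_name(buffers: Iterable[List[Row]]) -> Iterator[List[Row]]:
    """A row transformation applied one buffer at a time (hypothetical 'name' column)."""
    for buffer in buffers:
        for row in buffer:
            row["name"] = str(row["name"]).upper()
        yield buffer


def run_pipeline(rows: Iterable[Row], buffer_rows: int = 2) -> List[Row]:
    """Source -> transformation -> destination, where the destination is a plain list."""
    destination: List[Row] = []
    for buffer in uppercase_name(to_buffers(rows, buffer_rows)):
        destination.extend(buffer)
    return destination
```

With three input rows and a buffer size of two, the source yields one full buffer and one partial buffer, and the transformation touches each row exactly once.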


SSIS Projects and Packages

• Package Deployment Model
  • SSIS packages are deployed and managed individually
• Project Deployment Model
  • Multiple packages are deployed in a single project

[Diagram: a project with two packages, a project-level parameter, a project-level connection manager, and package-level parameters and connection managers for each package; the deployment targets shown are the SSIS Catalog and the package deployment model.]

The SSIS Design Environment

[Screenshot of the SSIS design environment with the following areas labeled:]
• Control Flow design surface
• Data Flow tab
• Package-level Parameters
• Event Handlers tab
• Package Explorer
• Solution Explorer
• Properties pane
• Connection Managers pane
• SSIS Toolbox pane
• Variables pane

The Variables and SSIS Toolbox buttons are at the upper right of the design surface.

Using the Import/Export Wizard

• Can be used to export data from a table or query in SQL Server
• The destination can be any of a wide variety of database systems or file types
• Can be used to import data into a SQL Server table
• The resulting package can be saved for reuse
• Supports only limited data type transformations

Demonstration: Exploring Source Data

In this demonstration, you will see how to:

• Extract Data with the Import and Export Data Wizard
• Explore Data in Microsoft Excel


Demonstration Steps

Extract Data with the Import and Export Data Wizard

• Ensure that the 20463C-MIA-DC and 20463C-MIA-SQL virtual machines are both running, and then log on to 20463C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
• In the D:\Demofiles\Mod04 folder, right-click Setup.cmd, and then click Run as administrator.
• When you are prompted to confirm, click Yes, and then wait for the batch file to complete.
• On the Start screen, type Import and Export, and then start the SQL Server 2014 Import and Export Data (64-bit) app.
• On the Welcome to SQL Server Import and Export Wizard page, click Next.
• On the Choose a Data Source page, set the following options, and then click Next:
  Data source: SQL Server Native Client 11.0
  Server name: localhost
  Authentication: Use Windows Authentication
  Database: ResellerSales
• On the Choose a Destination page, select the following options, and then click Next:
  Destination: Flat File Destination
  File name: D:\Demofiles\Mod04\Top 500 Resellers.csv
  Locale: English (United States)
  Unicode: Unselected
  Code page: 1252 (ANSI – Latin 1)
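The destination settings above (flat file, non-Unicode, code page 1252) can be pictured with a small Python sketch that writes a query result to a CSV file encoded as Windows-1252. This is not what the wizard generates: it uses an in-memory SQLite database as a stand-in for SQL Server, and the `Resellers` table, its columns, and the sample rows are invented for the example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_query_to_csv(conn: sqlite3.Connection, query: str, path: Path) -> int:
    """Write the result of a query to a CSV file using code page 1252."""
    cursor = conn.execute(query)
    headers = [column[0] for column in cursor.description]
    count = 0
    # newline="" lets the csv module control line endings (\r\n by default).
    with open(path, "w", newline="", encoding="cp1252") as handle:
        writer = csv.writer(handle)
        writer.writerow(headers)
        for row in cursor:
            writer.writerow(row)
            count += 1
    return count


def demo():
    """Build a hypothetical source table and export it to a temporary file."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Resellers (Name TEXT, Sales REAL)")
    conn.executemany(
        "INSERT INTO Resellers VALUES (?, ?)",
        [("Café Cycles", 1200.5), ("Bike World", 980.0)],
    )
    target = Path(tempfile.mkdtemp()) / "resellers.csv"
    exported = export_query_to_csv(
        conn, "SELECT Name, Sales FROM Resellers ORDER BY Sales DESC", target
    )
    return exported, target.read_bytes()
```

Because the file is written as code page 1252 rather than Unicode, an accented character such as "é" occupies a single byte in the output.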

Lesson 2: Implementing Data Flow

• Connection Managers
• The Data Flow Task
• Data Sources
• Data Destinations
• Data Transformations
• Optimizing Data Flow Performance
• Demonstration: Implementing a Data Flow

Connection Managers

• A connection to a data source or destination:
  • Provider (for example, ADO.NET, OLE DB, or flat file)
  • Connection string
  • Credentials
• Project or package level:
  • Project-level connection managers:
    • Can be shared across packages
    • Are listed in Solution Explorer and in the Connection Managers pane for packages in which they are used
  • Package-level connection managers:
    • Can be shared across objects in the package
    • Are listed only in the Connection Managers pane for packages in which they are used


The Data Flow Task

• The core control flow task in most SSIS packages
• It encapsulates a data flow pipeline
• You define the pipeline for the task on the Data Flow tab

Data Sources

• The source of data for a data flow:
  • Connection manager
  • Table, view, or query (where supported)
  • Columns that are included
• Many sources are supported:
  • Database (ADO.NET, OLE DB, CDC Source)
  • File (Excel, Flat File, XML, Raw File)
  • Custom

Data Destinations

• Endpoint for a data flow:
  • Connection manager
  • Table or view (where supported)
  • Column mapping
• Multiple destination types:
  • Database (ADO.NET, OLE DB, SQL Server, SQL Server Compact)
  • File (Excel, Flat File, Raw File)
  • SQL Server Analysis Services (data mining model training, dimension processing, partition processing)
  • Rowset (DataReader, Recordset)
  • Custom

Data Transformations

• Row Transformations
  • Character Map, Copy Column, Data Conversion, Derived Column, Export Column, Import Column, OLE DB Command
• Rowset Transformations
  • Aggregate, Sort, Percentage Sampling, Row Sampling, Pivot, Unpivot
• Split and Join Transformations
  • Conditional Split, Multicast, Union All, Merge, Merge Join, Lookup, Cache, CDC Splitter
• Auditing Transformations
  • Audit, Row Count
• BI Transformations
  • Slowly Changing Dimension, Fuzzy Grouping, Fuzzy Lookup, Term Extraction, Term Lookup, Data Mining Query, Data Cleansing
• Custom Transformations
  • Script, Custom Component
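To make three of the transformations listed above more concrete, here is a minimal Python sketch of the ideas behind Derived Column, Lookup, and Conditional Split. It models rows as dictionaries and is a conceptual analogy only; the function names, column names, and the "default" output label are assumptions for illustration, not SSIS APIs.

```python
def derived_column(rows, name, fn):
    """Add a new column computed from the existing columns of each row."""
    for row in rows:
        yield {**row, name: fn(row)}


def lookup(rows, reference, key, output):
    """Enrich rows from a reference mapping; rows without a match go to a second output."""
    matched, unmatched = [], []
    for row in rows:
        if row[key] in reference:
            matched.append({**row, output: reference[row[key]]})
        else:
            unmatched.append(row)
    return matched, unmatched


def conditional_split(rows, conditions):
    """Send each row to the first output whose condition is true, else to 'default'."""
    outputs = {name: [] for name in conditions}
    outputs["default"] = []
    for row in rows:
        for name, predicate in conditions.items():
            if predicate(row):
                outputs[name].append(row)
                break
        else:
            outputs["default"].append(row)
    return outputs


# Hypothetical sample data for the sketch.
PRODUCTS = [
    {"ProductID": 1, "Name": "Road Bike", "SubcategoryID": 10, "Price": 1500.0},
    {"ProductID": 2, "Name": "Helmet", "SubcategoryID": 20, "Price": 35.0},
    {"ProductID": 3, "Name": "Mystery", "SubcategoryID": 99, "Price": 5.0},
]
SUBCATEGORIES = {10: "Bikes", 20: "Accessories"}

labelled = list(derived_column(PRODUCTS, "Label", lambda r: f'{r["Name"]} ({r["ProductID"]})'))
matched, unmatched = lookup(labelled, SUBCATEGORIES, "SubcategoryID", "Subcategory")
split = conditional_split(matched, {"premium": lambda r: r["Price"] >= 1000})
```

In this run, the row with the unknown subcategory ends up in `unmatched`, and the remaining rows are divided between the `premium` and `default` outputs.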


Optimizing Data Flow Performance

• Optimize queries:
  • Select only the rows and columns that you need
• Avoid unnecessary sorting:
  • Use presorted data where possible
  • Set the IsSorted property where applicable
• Configure Data Flow task properties:
  • Buffer size
  • Temporary storage location
  • Parallelism
  • Optimized mode

For more information, see:
• Tuning Your SSIS Package Data Flow in the Enterprise (SQL Server Video): http://go.microsoft.com/fwlink/?LinkID=248854
• Understanding SSIS Data Flow Buffers (SQL Server Video): http://go.microsoft.com/fwlink/?LinkID=248858
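The advice to use presorted data can be illustrated outside SSIS with a short Python sketch: when two inputs are already ordered on the same key, they can be merged as a stream without sorting the combined data again. This is an analogy for why presorted inputs avoid a costly sort step, not a description of SSIS internals; the helper names and the `id` key are assumptions.

```python
import heapq
from operator import itemgetter


def merge_presorted(left, right, key):
    """Merge two inputs that are each already sorted on `key`, without re-sorting."""
    return heapq.merge(left, right, key=itemgetter(key))


def is_sorted(rows, key):
    """Check the precondition that rows are in ascending order on `key`."""
    rows = list(rows)
    return all(a[key] <= b[key] for a, b in zip(rows, rows[1:]))


def demo():
    left = [{"id": 1}, {"id": 4}]
    right = [{"id": 2}, {"id": 3}]
    return [row["id"] for row in merge_presorted(left, right, "id")]
```

The merge reads each input once and holds only the current row from each side, whereas sorting the concatenated inputs would need all rows in memory first.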

Demonstration: Implementing a Data Flow

In this demonstration, you will see how to:

• Configure a Data Source
• Use a Derived Column Transformation
• Use a Lookup Transformation
• Configure a Destination

Preparation Steps

Complete the previous demonstrations in this module.

Demonstration Steps

Configure a Data Source

• Ensure you have completed the previous demonstrations in this module.
• Start SQL Server Management Studio and connect to the MIA-SQL database engine instance using Windows authentication.
• In Object Explorer, expand Databases, expand Products, and expand Tables. Then right-click each of the following tables, click Select Top 1000 Rows, and view the data they contain:
  dbo.Product
  dbo.ProductCategory
  dbo.ProductSubcategory
• In Object Explorer, under Databases, expand DemoDW, and expand Tables. Then right-click dbo.DimProduct and click Select Top 1000 Rows to verify that this table is empty.
• Start Visual Studio and create a new Integration Services project named DataFlowDemo in the D:\Demofiles\Mod04 folder.
• If the Getting Started (SSIS) window is displayed, close it.
• In Solution Explorer, expand SSIS Packages, right-click Package.dtsx, and click Rename. Then change the package name to ExtractProducts.dtsx.
• In Solution Explorer, right-click Connection Managers and click New Connection Manager. Then add a new OLE DB connection manager with the following settings:
  Server name: localhost
  Log on to the server: Use Windows Authentication
  Select or enter a database name: Products


Lab: Implementing Data Flow in an SSIS Package

• Exercise 1: Exploring Source Data

• Exercise 2: Transferring Data by Using a Data Flow Task

• Exercise 3: Using Transformations in a Data Flow

Logon Information

Virtual machine: 20463C-MIA-SQL

User name: ADVENTUREWORKS\Student

Password: Pa$$w0rd

Estimated Time: 60 minutes


SSIS Control Flow

Implementing Control Flow in an SSIS Package

Module Overview

• Introduction to Control Flow
• Creating Dynamic Packages
• Using Containers

Lesson 1: Introduction to Control Flow

• Control Flow Tasks
• Precedence Constraints
• Grouping and Annotations
• Demonstration: Implementing Control Flow
• Using Multiple Packages

Control Flow Tasks

• Data Flow Tasks
• Database Tasks
• File and Internet Tasks
• Process Execution Tasks
• WMI Tasks
• Custom Logic Tasks
• Database Transfer Tasks
• Analysis Services Tasks
• SQL Server Maintenance Tasks


Precedence Constraints

• Connect sequences of tasks
• Three control flow conditions:
  • Success
  • Failure
  • Completion
• Multiple constraints:
  • Logical AND
  • Logical OR

[Diagram: a control flow of Task 1 through Task 10 connected by precedence constraints, with a legend for Success, Failure, and Completion constraints in their logical AND and logical OR forms.]

In the example control flow:
• The control flow starts with Task 1.
• If Task 1 succeeds, Task 2 is executed.
• If either Task 1 or Task 2 fails, Task 3 is executed.
• If Task 2 or Task 3 succeeds, Task 4 is executed.
• If Task 4 fails, Task 5 is executed, and if Task 4 succeeds, Task 6 is executed.
• If Task 5 or Task 6 completes, Task 7 is executed.
• If Task 7 completes, Task 8 is executed.
• If Task 7 and Task 8 succeed, Task 9 is executed.
• If Task 3 and Task 9 complete, Task 10 is executed.
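The ten-task example can be checked with a small Python simulation. This is a simplified model written for these notes, not the SSIS runtime: each task lists its incoming constraints as (predecessor, condition) pairs combined with AND or OR, a constraint on a task that never ran is treated as unsatisfied, and the `will_fail` argument is a hypothetical way to choose which tasks fail.

```python
def satisfied(result, condition):
    """A constraint holds only if the predecessor ran and its result matches."""
    if result is None:  # predecessor was never executed
        return False
    return condition == "completion" or result == condition


def run_control_flow(constraints, will_fail=frozenset()):
    """Return the tasks that execute, in order.

    `constraints` maps task -> (mode, [(predecessor, condition), ...]) and must be
    listed so that every predecessor appears before the tasks that depend on it.
    """
    results = {}
    for task, (mode, incoming) in constraints.items():
        checks = [satisfied(results.get(pred), cond) for pred, cond in incoming]
        combine = all if mode == "AND" else any
        if not incoming or combine(checks):
            results[task] = "failure" if task in will_fail else "success"
    return list(results)


# The example control flow from the bullets above.
EXAMPLE = {
    1: ("AND", []),
    2: ("AND", [(1, "success")]),
    3: ("OR", [(1, "failure"), (2, "failure")]),
    4: ("OR", [(2, "success"), (3, "success")]),
    5: ("AND", [(4, "failure")]),
    6: ("AND", [(4, "success")]),
    7: ("OR", [(5, "completion"), (6, "completion")]),
    8: ("AND", [(7, "completion")]),
    9: ("AND", [(7, "success"), (8, "success")]),
    10: ("AND", [(3, "completion"), (9, "completion")]),
}
```

Under this model, when every task succeeds, Task 3 never runs, so Task 10 (which needs both Task 3 and Task 9 to complete) is skipped; when Task 1 fails, Task 3 runs and Task 10 is eventually reached.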

Grouping and Annotations

• Group tasks to manage them as a unit at design time:
  • Show/Hide
  • Move
• Add annotations to provide documentation

[Diagram: Task 1 through Task 4 on the design surface, illustrating that grouped tasks can be managed as a unit and that annotations appear as notes on the design surface.]

Demonstration: Implementing Control Flow

In this demonstration, you will see how to:

• Add Tasks to a Control Flow
• Use Precedence Constraints to Define a Control Flow


Preparation Steps

Start the 20463C-MIA-DC and 20463C-MIA-SQL virtual machines.

Demonstration Steps

Add Tasks to a Control Flow

• Ensure that the 20463C-MIA-DC and 20463C-MIA-SQL virtual machines are both running, and then log on to 20463C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa$$w0rd.
• In the D:\Demofiles\Mod05 folder, run Setup.cmd as Administrator.
• Start Visual Studio and open ControlFlowDemo.sln from the D:\Demofiles\Mod05 folder.
• In Solution Explorer, double-click Control Flow.dtsx.
• If the SSIS Toolbox is not visible, on the SSIS menu, click SSIS Toolbox. Then, from the SSIS Toolbox, drag a File System Task to the control flow surface.
• Double-click the File System Task and configure the following settings:
  Name: Delete Files
  Operation: Delete directory content
  SourceConnection: A new connection with a Usage type of Create folder, and a Folder value of D:\Demofiles\Mod05\Demo.
• From the SSIS Toolbox, drag a second File System Task to the control flow surface. Then double-click the File System Task and configure the following settings:
  Name: Delete Folder
  Operation: Delete directory
  SourceConnection: Demo

Using Multiple Packages

• Create reusable units of workflow

• Run multiple control flows in parallel

• Separate ETL workflows to fit data acquisition windows

[Diagram: packages Pkg1, Pkg2, Pkg3, and Pkg4 linked by Execute Package tasks.]

Lesson 2: Creating Dynamic Packages

• Variables
• Parameters
• Expressions
• Demonstration: Using Variables and Parameters


Variables

• User variables:
  • Variables created by an SSIS developer to hold dynamic values
  • Defined in the User namespace by default
  • Defined at a specified scope
• System variables:
  • Built-in variables with dynamic system values
  • Defined in the System namespace

Examples:
• User::fName
  Name: fName
  Data Type: String
  Value: MyFile.csv
  Scope: Package
• System::StartTime
  Name: StartTime
  Data Type: DateTime
  Value: When the package started running

The fully qualified naming syntax for variables is namespace::variable_name.
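The namespace::variable_name syntax is easy to take apart programmatically. The Python sketch below splits a qualified name into its two parts; the `QualifiedName` type and the `parse_variable_name` helper are hypothetical names for this example, and treating an unqualified name as belonging to the User namespace is an assumption of the sketch.

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    namespace: str
    name: str


def parse_variable_name(text: str, default_namespace: str = "User") -> QualifiedName:
    """Split 'namespace::variable_name' into its parts.

    Unqualified names are assigned to `default_namespace` (an assumption here).
    """
    namespace, separator, name = text.partition("::")
    if not separator:
        namespace, name = default_namespace, text
    if not namespace or not name:
        raise ValueError(f"not a valid variable name: {text!r}")
    return QualifiedName(namespace, name)
```

For instance, `parse_variable_name("System::StartTime")` separates the System namespace from the StartTime variable name.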

Parameters

• Project parameters:
  • Accessible from any package in the project
• Package parameters:
  • Exist only at the package level

[Diagram: a project containing Package1 and Package2, with these example parameters:]
• Project::folderPath, default value "D:\MyFiles\"
• Package::dbConnStr, default value "Server=localhost…"
• Package::ftpSrvr, default value "ftpsrv01"

Expressions

• Used to set values dynamically:
  • Properties
  • Conditional split criteria
  • Derived column values
  • Precedence constraints
• Based on Integration Services expression syntax
• Can include variables and parameters
• Can be created graphically by using Expression Builder

Example: @[$Project::folderPath]+@[User::fName]
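The example expression concatenates a project parameter and a user variable. The Python sketch below shows the effect of that concatenation with the default values from the Variables and Parameters slides. It handles only terms of the form `@[Namespace::name]` joined by `+` and is in no way a parser for the full Integration Services expression language; the `evaluate_concatenation` helper and the `values` dictionary are assumptions made for the example.

```python
import re

# Matches references such as @[User::fName] or @[$Project::folderPath].
REFERENCE = re.compile(r"@\[(\$?\w+)::(\w+)\]")


def evaluate_concatenation(expression: str, values: dict) -> str:
    """Resolve each @[...] reference and join the results in order."""
    parts = []
    for term in expression.split("+"):
        match = REFERENCE.fullmatch(term.strip())
        if match is None:
            raise ValueError(f"unsupported term: {term!r}")
        key = f"{match.group(1)}::{match.group(2)}"
        parts.append(str(values[key]))
    return "".join(parts)


# Values taken from the examples in the Variables and Parameters slides.
VALUES = {
    "$Project::folderPath": "D:\\MyFiles\\",
    "User::fName": "MyFile.csv",
}

full_path = evaluate_concatenation("@[$Project::folderPath]+@[User::fName]", VALUES)
```

With those values, the expression yields the full path of the file inside the folder named by the project parameter.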

Demonstration: Using Variables and Parameters

In this demonstration, you will see how to:

• Create a Variable
• Create a Parameter
• Use Variables and Parameters in an Expression


Preparation Steps

Complete the previous demonstration in this module.

Demonstration Steps

Create a Variable

• Ensure you have completed the previous demonstration in this module.
• Start Visual Studio and open the VariablesAndParameters.sln solution in the D:\Demofiles\Mod05 folder.
• In Solution Explorer, double-click Control Flow.dtsx.
• On the View menu, click Other Windows, and click Variables.
• In the Variables pane, click the Add Variable button and add a variable with the following properties:
  Name: fName
  Scope: Control Flow
  Data type: String
  Value: Demo1.txt

Lesson 3: Using Containers

• Sequence Containers
• Demonstration: Using a Sequence Container
• For Loop Containers
• Demonstration: Using a For Loop Container
• Foreach Loop Containers
• Demonstration: Using a Foreach Loop Container

Sequence Containers

• Define a control flow subset
• Enable you to manage properties for multiple tasks
• Create a scope for variables, transactions, and precedence

[Diagram: Task 1, Task 2, Task 3, and Task 4 with a Sequence Container.]

Unlike a group, a sequence container exists at both design time and run time.

Demonstration: Using a Sequence Container

In this demonstration, you will see how to:

• Use a Sequence Container


Preparation Steps

Complete the previous demonstrations in this module.

Demonstration Steps

Use a Sequence Container

• Ensure you have completed the previous demonstrations in this module.
• Start Visual Studio and open the SequenceContainer.sln solution in the D:\Demofiles\Mod05 folder.
• In Solution Explorer, double-click Control Flow.dtsx.
• Right-click the Group indicator around the Delete Files and Delete Folder tasks and click Ungroup to remove it.
• Drag a Sequence Container from the SSIS Toolbox to the control flow design surface.
• Right-click the precedence constraint that connects Delete Files to Send Failure Notification, and click Delete. Then delete the precedence constraints connecting Delete Folder to Send Failure Notification and Create Folder.
• Click and drag around the Delete Files and Delete Folder tasks to select them both, and then drag them into the sequence container.
• Drag a precedence constraint from the sequence container to Create Folder. Then right-click the precedence constraint and click Completion.
• Drag a precedence constraint from the sequence container to Send Failure Notification. Then right-click the precedence constraint and click Failure.
• Run the package and view the results, and then stop debugging.
• Click the sequence container and press F4. Then in the Properties pane, set the Disable property to True.
• Run the package again and note that neither of the tasks in the sequence container is executed. Then stop debugging and close Visual Studio.

For Loop Containers

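• Implement iterative control flow
• Similar to a C# for loop:
  • Initialization expression: @Count = 0
  • Evaluation expression: @Count < 4
  • Iteration expression: @Count = @Count + 1

[Diagram: a For Loop container with an iterator variable (Count). @Count is set to 0; while the test @Count < 4 evaluates to Yes, the tasks in the loop run and @Count = @Count + 1 is applied; when the test evaluates to No, control passes to the next task.]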
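The three expressions of a For Loop container map directly onto the parts of a conventional loop. The Python sketch below spells that mapping out with the slide's values (start at 0, continue while less than 4, add 1 each time); the `run_for_loop` helper is a hypothetical name, and the body simply records the counter in place of real tasks.

```python
def run_for_loop(init, evaluate, iterate, body):
    """Run `body` repeatedly, driven by the three loop expressions."""
    state = init()                # initialization expression: @Count = 0
    while evaluate(state):        # evaluation expression:     @Count < 4
        body(state)               # the tasks inside the container
        state = iterate(state)    # iteration expression:      @Count = @Count + 1
    return state


def demo():
    seen = []
    final = run_for_loop(
        init=lambda: 0,
        evaluate=lambda count: count < 4,
        iterate=lambda count: count + 1,
        body=seen.append,
    )
    return seen, final
```

The body runs four times, with the counter taking the values 0 through 3, and the loop exits once the counter reaches 4.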

Demonstration: Using a For Loop Container

In this demonstration, you will see how to:

• Use a For Loop Container


Foreach Loop Containers

Iterate through an enumerated collection:
• ADO: rows in a recordset
• ADO.NET Schema Rowset: objects in a database schema
• File: files in a folder
• Variable: elements in an array variable
• Item: enumerated property values of an item
• Nodelist: nodes in an XML document
• SMO: SQL Server Management Objects

[Diagram: a Foreach Loop container holding a task, with an enumerator variable (for example, file name).]
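The File enumerator case, where each file name in a folder is assigned to a variable and the contained task runs once per file, can be sketched in Python as follows. This is an analogy rather than SSIS code: the `foreach_file` helper and the sample file names are hypothetical, and sorting the files is a choice made here for a predictable order, not a statement about the order SSIS uses.

```python
import tempfile
from pathlib import Path


def foreach_file(folder, body, pattern="*"):
    """Call `body` once per matching file, passing the file name and extension."""
    processed = []
    # Sorted for determinism in this sketch only.
    for path in sorted(Path(folder).glob(pattern)):
        if path.is_file():
            f_name = path.name  # plays the role of the enumerator variable
            body(f_name)
            processed.append(f_name)
    return processed


def demo():
    """Create two sample files and enumerate them."""
    with tempfile.TemporaryDirectory() as folder:
        for name in ("Demo2.txt", "Demo1.txt"):
            (Path(folder) / name).write_text("sample")
        handled = []
        foreach_file(folder, handled.append, "*.txt")
        return handled
```

Each iteration sees a different file name in the loop variable, which is what lets a single task inside the container process every file in the folder.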

Demonstration: Using a Foreach Loop Container

In this demonstration, you will see how to:

• Use a Foreach Loop Container

Preparation Steps

Complete the previous demonstrations in this module.

Demonstration Steps

Use a Foreach Loop Container

• Ensure you have completed the previous demonstrations in this module.
• Start Visual Studio and open the ForeachLoopContainer.sln solution in the D:\Demofiles\Mod05 folder.
• In Solution Explorer, double-click Control Flow.dtsx.
• From the SSIS Toolbox, drag a Foreach Loop Container to the control flow design surface. Then double-click the Foreach Loop Container to view the Foreach Loop Editor dialog box.
• On the Collection tab, in the Enumerator list, select Foreach File Enumerator, and in the Expressions box, click the ellipsis (…) button. Then in the Property Expressions Editor dialog box, in the Property list, select Directory, and in the Expression box, click the ellipsis (…) button.
• In the Expression Builder dialog box, expand the Variables and Parameters folder and drag the $Project::folderPath parameter to the Expression box to specify that the loop should iterate through files in the folder referenced by the folderPath project parameter. Then click OK to close the Expression Builder, and click OK again to close the Property Expressions Editor.
• In the Foreach Loop Editor dialog box, on the Collection tab, in the Retrieve file name section, select Name and extension to return the file name and extension for each file the loop finds in the folder.
• In the Foreach Loop Editor dialog box, on the Variable Mappings tab, in the Variable list, select User::fName, and in the Index column, select 0 to assign the file name of each file found in the folder to the fName variable. Then click OK.
• Remove the precedence constraints that are connected to and from the Copy File task, and then drag the Copy File task into the Foreach Loop Container.


Lab: Implementing Control Flow in an SSIS Package

• Exercise 1: Using Tasks and Precedence in a Control Flow

• Exercise 2: Using Variables and Parameters

• Exercise 3: Using Containers

Logon Information

Virtual machine: 20463C-MIA-SQL

User name: ADVENTUREWORKS\Student

Password: Pa$$w0rd

Estimated Time: 60 minutes


Deploying and Configuring SSIS Packages

Module Overview

• Overview of SSIS Deployment
• Deploying SSIS Projects

Lesson 1: Overview of SSIS Deployment

• SSIS Deployment Models
• Package Deployment Model
• Project Deployment Model
• Deployment Model Comparison

SSIS Deployment Models

Package Deployment Model

• SSIS Packages are deployed and managed individually

Project Deployment Model

• Multiple packages are deployed in a single project


Package Deployment Model

• Storage:
  • MSDB
  • File system
• Package configurations:
  • Property values to be set dynamically at run time
• Package deployment utility:
  • Generates all required files for easier deployment

Project Deployment Model

• The SSIS catalog:
  • Storage and management for SSIS projects on a SQL Server instance
• Folders:
  • A hierarchical structure for organizing and securing SSIS projects

Deployment Model Comparison

Feature               | Package Deployment                 | Project Deployment
Unit of deployment    | Package                            | Project
Storage               | File system or MSDB                | SSIS catalog
Dynamic configuration | Package configurations             | Environment variables mapped to project-level parameters and connection managers
Compiled format       | Multiple .dtsx files               | Single .ispac file
Troubleshooting       | Configure logging for each package | SSIS catalog includes built-in reports and views

Lesson 2: Deploying SSIS Projects

• Creating an SSIS Catalog
• Environments and Variables
• Deploying an SSIS Project
• Viewing Project Execution Information
• Demonstration: Deploying an SSIS Project


Creating an SSIS Catalog

• Prerequisites:
  • SQL Server 2012 or later
  • SQL CLR enabled
• Creating a catalog:
  • Use SQL Server Management Studio
  • One SSIS catalog per SQL Server instance
• Catalog security:
  • Folder security
  • Object security
  • Catalog encryption
  • Sensitive parameters

Environments and Variables

• Environments:
  • Execution contexts for projects
• Variables:
  • Environment-specific values that can be mapped to project parameters and connection manager properties at run time
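The relationship between environments, environment variables, and project parameters can be pictured as a lookup performed at run time. The Python sketch below models it with dictionaries; it does not call the SSIS catalog, and the environment names (Test, Production), the variable name `AccountsFolder`, the folder values, and the `resolve_parameters` helper are all invented for illustration.

```python
def resolve_parameters(defaults, mappings, environment):
    """Return parameter values for one execution.

    `defaults` holds design-time parameter values, `mappings` maps a parameter
    name to the environment variable that supplies it, and `environment` holds
    that environment's variable values. A mapped variable that is missing from
    the chosen environment is treated as an error in this sketch.
    """
    resolved = dict(defaults)
    for parameter, variable in mappings.items():
        if variable not in environment:
            raise KeyError(f"environment has no variable {variable!r}")
        resolved[parameter] = environment[variable]
    return resolved


# Hypothetical project configuration and environments.
DEFAULTS = {"folderPath": "D:\\MyFiles\\", "ftpSrvr": "ftpsrv01"}
MAPPINGS = {"folderPath": "AccountsFolder"}
ENVIRONMENTS = {
    "Test": {"AccountsFolder": "D:\\TestFiles\\"},
    "Production": {"AccountsFolder": "E:\\Accounts\\"},
}

test_run = resolve_parameters(DEFAULTS, MAPPINGS, ENVIRONMENTS["Test"])
prod_run = resolve_parameters(DEFAULTS, MAPPINGS, ENVIRONMENTS["Production"])
```

The same project runs against different folders depending on the environment selected, while parameters that are not mapped keep their default values.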

Deploying an SSIS Project

• Integration Services Deployment Wizard

• Visual Studio

• SQL Server Management Studio

Viewing Project Execution Information

• The Integration Services Dashboard provides built-in reports
• Additional sources of information:
  • Event handlers
  • Error outputs
  • Logging
  • Debug dump files


Demonstration: Deploying an SSIS Project

In this demonstration you will see how to:

• Configure the SSIS Environment
• Deploy an SSIS Project
• Create Environments and Variables
• Run an SSIS Package
• View Execution Information