31

2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

Embed Size (px)

Citation preview

Page 1: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand
Page 2: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

2

SQL Server 2008 ETL drilldownShane BartlePrincipal Consultant

BIN 309

Pat MartinANZ SQL Premier Field Engineer

Microsoft New Zealand

Page 3: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

3

What We Will Cover

Background to SSISSource Data Extraction – New ApproachesMonitoring EnhancementsDeveloper AdditionsData Profiling

Page 4: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

4

END USER TOOLS & PERFORMANCE MANAGEMENT APPS

ExcelPerformancePoint

Server

BI PLATFORM

SQL Server Reporting Services

SQL Server Analysis Services

SQL Server DBMS

SQL Server Integration Services

SharePoint Server

DELIVERY

Reports Dashboards Excel Workbooks

AnalyticViews Scorecards Plans

Integrated End-To-End BI Offering

Page 5: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

5

Integration todayIncreasing data volumesIncreasingly diverse sources

Requirements reached the tipping pointLow-impact source extractionEfficient transformationBulk loading techniques

SQL Server 2008 SSISBackground to SSIS

Page 6: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

6

GeoSpatial Data:Semi structured

Legacy data: binary files

Application database

Integration is a seamless, manageable operationSource, prepare, & load data in single, auditable processScale to handle heavy and complex data requirements

SQL Server Integration Services

GeoSpatialComponents

Customsource

Standardsources

Data-cleansingcomponents

Merges

Data miningcomponents

Warehouse

Reports

Mobiledata

Integration Services In Action

Cube

Page 7: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

7

Current SSIS Thread SchedulerThreads affinitised to dataflow subtreesThread starvation on highly-parallel designsSingle thread for each synchronous pathNon-linear scale-up (plateau)

SSIS Pipeline ParallelismRewrote the thread schedulerImproved performance and scaleThread pool shared across multiple components

BenefitsBetter performance (50%) in highly-parallel designsLess manual tuning during development (lower TCO)Better hardware utilisation (higher ROI)It just works!

Performance Improvements

Faster!

Page 8: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

8

Extracting data from the source is expensiveEfficient extraction is key to improving ETL performanceInvolves bulk loading data into staging areas or warehouse

Time consuming and resource intensiveTriggers (synchronous IO penalty)Timestamp columns (Schema changes)Complex queries (delayed IO penalty)Custom (ISV, mirror, snapshot, …)

Incremental data load is key to efficient extractionNeed to know what changed at source since a point in time

Expensive lookups to determine changed columnsProviding information up front about which columns changed

SQL Server 2008 SSISSource Data Extraction – New Approaches

Page 9: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

9

Change Data CaptureInformation about what changed at the source

Operation (Insert, Update, Delete)Update mask (which columns changed)

Changes captured from the log asynchronouslyMinimal impact on source systemLog reader can be scheduled to run during idle time

Enabled per tableHidden change tables store captured changesOne change table per source table that is trackedRetention-based cleanup jobs

CDC APIs provide access to change dataTable valued functions and scalar functions provide access to change data and CDC metadataTVF allows the changes to be gathered for specific intervals enabling incremental population of DW

Change Tables

OLTP

Data Warehouse

Page 10: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

10

Change Data Capture

Pat MartinANZ SQL Premier Field EngineerMicrosoft N.Z.

demo

Page 11: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

11

Merge StatementSingle statement can deal with Inserts, Updates, and Deletes

Microsoft extension to ANSI definition for DELETE semanticsPerformance goals:

20% fasterMinimal logging on inserts (2×)

Typical solution:Clean the source data, load it into Tbl_StagingIndex Tbl_StagingUPDATE Warehouse INNER JOIN Tbl_Staging ON…INSERT Warehouse LEFT JOIN Tbl_Staging ON…MERGE Warehouse FROM Tbl_Staging ON…

Page 12: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

12

Merge ExampleMERGE dbo.Departments AS dUSING dbo.Departments_delta AS ddON (d.DeptID = dd.DeptID)WHEN MATCHED AND d.Manager <>

dd.Manager OR d.DeptName <> dd.DeptName

THEN UPDATE SET d.Manager = dd.Manager, d.DeptName = dd.DeptName

WHEN NOT MATCHED THEN INSERT (DeptID, DeptName, Manager) VALUES (dd.DeptID, dd.DeptName,

dd.Manager)WHEN NOT MATCHED BY SOURCE THEN DELETE

Page 13: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

13

Merge Statement

Pat MartinANZ SQL Premier Field EngineerMicrosoft N.Z.

demo

Page 14: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

14

Lookup Transform EnhancementsScalable Cache Implementation

Cache-load is a separate operation to LookupCan be hydrated and dehydrated securely to the file systemCaches can be explicitly shared

Adaptable CachesCan be loaded from any source (SQL, Text, Mainframe…)Track cache hits and misses

Multiple ModesNo Cache (pass-through to DB)JIT – Just In Time (on miss, query database, and store result)Full-Cache (pre-load all rows)

Page 15: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

15

Lookup Transform 2008

Shane BartlePrincipal ConsultantMicrosoft N.Z.

demo

Page 16: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

16

Logging events to watch pipeline internalsPipelineExecutionPlan, PipelineExecutionTree, BufferSizeTuning

System Monitor to track I/O issuesBuffers In Use tracks how many buffers are presently being usedBuffers Spooled tracks how many 10Mb buffers have been spooled to disk

Superdump to resolve error scenariosNew support capability for reactive fix-up

SQL Server 2008 SSISMonitoring Enhancements

Page 17: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

17

Superdump

Provides visibility into the activity of a running packageCan be triggered without stopping a packageCan be scheduled (via registry key) to run on a

CrashSpecific Error Condition

Page 18: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

18

Superdump

Pat MartinPremier Field EngineerMicrosoft N.Z.

demo

Page 19: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

19

C# SupportADO.Net SupportImproved Import Export Wizard

SQL Server 2008 SSISDeveloper Additions

Page 20: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

20

2005 used VSA for Script Design and ExecutionLegacy component with Visual Basic onlyLimited set of “reference-able” assemblies

In 2008 SSIS uses Visual Studio Tools for ApplicationsVisual Studio Designer ShellC# (or VB.NET) as a languageCan reference all .net assembliesCan reference Web Services

C# Support

Page 21: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

21

ADO.Net Support

SSIS 2005 had a ADO.NET ‘DataReader’Limited to supplying a SqlCommand

SSIS 2008 has a full ADO.NET Data SourceMuch Enhanced User InterfaceODBC Support

Page 22: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

22

ADO.Net SupportData Type Conversions

New page shows mappings and possible issuesMay insert data convert transforms into dataflowDefault mappings are customisable (via Notepad)

New System for Scaling Up Number of TablesMake as sequence of dataflow tasks Each with 5 source/transform/destination chains

Import Export Wizard Additions

Page 23: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

23

Import/Export Wizard

Shane BartlePrincipal ConsultantMicrosoft N.Z.

demo

Page 24: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

24

Creates a Profile of Your SQL TablesExplore or maintain data qualityRun as a task in SSISProduces XML file outputHas a nice visual tool for working with profiles

Analyze a Set of Columns / Tables Looking ForCandidate keysColumn length distributionNull RatioPattern detectionValue distributions and statsFunctional dependenciesValue inclusion

SQL Server 2008 SSISData Profiling

Page 25: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

25

Data Profiling

Pat MartinANZ SQL Premier Field EngineerMicrosoft N.Z.

demo

Page 26: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

26

Session Summary

Performance Improvements for throughputNew delta extraction alternative approachesMonitoring and support featuresC# and ADO.Net extend coverageData Profiling for data quality considerations

Page 27: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

28

Q & A

Page 28: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

29

Resources

www.microsoft.com/teched Tech·Talks Tech·Ed BloggersLive Simulcasts Virtual Labs

http://microsoft.com/technet

Evaluation licenses, pre-released products, and MORE!

http://microsoft.com/msdn

Developer’s Kit, Licenses, and MORE!

Page 29: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

Related ContentDAT361 SQL Server 2008 Security Deep Dive

BIN309 SQL Server 2008 ETL drill down

BIN310 SQL Server 2008 Analysis Server (SSAS) enhancements

DAT362 SQL Server Spatial in the Spotlight

BIN352 Microsoft SQL Server 2008 Reporting Services: Architecture Overview

BIN311 Advanced Dashboard Creation with MOSS 2007

DAT364 End-to-End Troubleshooting for Microsoft SQL Server 2005/2008

BIN401 Optimising Query Performance in SQL Server 2008 Analysis Services

DAT355 Upgrading to Microsoft SQL Server 2008: Notes from Early Adopters

BIN402 Building and Deploying Advanced MOSS 2007 Planning Applications

Page 30: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

32

Please complete anevaluation

Page 31: 2 SQL Server 2008 ETL drilldown Shane Bartle Principal Consultant BIN 309 Pat Martin ANZ SQL Premier Field Engineer Microsoft New Zealand

33

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED

OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.