Upload
edwin-summers
View
220
Download
4
Tags:
Embed Size (px)
Citation preview
2
SQL Server 2008 ETL drilldownShane BartlePrincipal Consultant
BIN 309
Pat MartinANZ SQL Premier Field Engineer
Microsoft New Zealand
3
What We Will Cover
Background to SSISSource Data Extraction – New ApproachesMonitoring EnhancementsDeveloper AdditionsData Profiling
4
END USER TOOLS & PERFORMANCE MANAGEMENT APPS
ExcelPerformancePoint
Server
BI PLATFORM
SQL Server Reporting Services
SQL Server Analysis Services
SQL Server DBMS
SQL Server Integration Services
SharePoint Server
DELIVERY
Reports Dashboards Excel Workbooks
AnalyticViews Scorecards Plans
Integrated End-To-End BI Offering
5
Integration todayIncreasing data volumesIncreasingly diverse sources
Requirements reached the tipping pointLow-impact source extractionEfficient transformationBulk loading techniques
SQL Server 2008 SSISBackground to SSIS
6
GeoSpatial Data:Semi structured
Legacy data: binary files
Application database
Integration is a seamless, manageable operationSource, prepare, & load data in single, auditable processScale to handle heavy and complex data requirements
SQL Server Integration Services
GeoSpatialComponents
Customsource
Standardsources
Data-cleansingcomponents
Merges
Data miningcomponents
Warehouse
Reports
Mobiledata
Integration Services In Action
Cube
7
Current SSIS Thread SchedulerThreads affinitised to dataflow subtreesThread starvation on highly-parallel designsSingle thread for each synchronous pathNon-linear scale-up (plateau)
SSIS Pipeline ParallelismRewrote the thread schedulerImproved performance and scaleThread pool shared across multiple components
BenefitsBetter performance (50%) in highly-parallel designsLess manual tuning during development (lower TCO)Better hardware utilisation (higher ROI)It just works!
Performance Improvements
Faster!
8
Extracting data from the source is expensiveEfficient extraction is key to improving ETL performanceInvolves bulk loading data into staging areas or warehouse
Time consuming and resource intensiveTriggers (synchronous IO penalty)Timestamp columns (Schema changes)Complex queries (delayed IO penalty)Custom (ISV, mirror, snapshot, …)
Incremental data load is key to efficient extractionNeed to know what changed at source since a point in time
Expensive lookups to determine changed columnsProviding information up front about which columns changed
SQL Server 2008 SSISSource Data Extraction – New Approaches
9
Change Data CaptureInformation about what changed at the source
Operation (Insert, Update, Delete)Update mask (which columns changed)
Changes captured from the log asynchronouslyMinimal impact on source systemLog reader can be scheduled to run during idle time
Enabled per tableHidden change tables store captured changesOne change table per source table that is trackedRetention-based cleanup jobs
CDC APIs provide access to change dataTable valued functions and scalar functions provide access to change data and CDC metadataTVF allows the changes to be gathered for specific intervals enabling incremental population of DW
Change Tables
OLTP
Data Warehouse
10
Change Data Capture
Pat MartinANZ SQL Premier Field EngineerMicrosoft N.Z.
demo
11
Merge StatementSingle statement can deal with Inserts, Updates, and Deletes
Microsoft extension to ANSI definition for DELETE semanticsPerformance goals:
20% fasterMinimal logging on inserts (2×)
Typical solution:Clean the source data, load it into Tbl_StagingIndex Tbl_StagingUPDATE Warehouse INNER JOIN Tbl_Staging ON…INSERT Warehouse LEFT JOIN Tbl_Staging ON…MERGE Warehouse FROM Tbl_Staging ON…
12
Merge ExampleMERGE dbo.Departments AS dUSING dbo.Departments_delta AS ddON (d.DeptID = dd.DeptID)WHEN MATCHED AND d.Manager <>
dd.Manager OR d.DeptName <> dd.DeptName
THEN UPDATE SET d.Manager = dd.Manager, d.DeptName = dd.DeptName
WHEN NOT MATCHED THEN INSERT (DeptID, DeptName, Manager) VALUES (dd.DeptID, dd.DeptName,
dd.Manager)WHEN NOT MATCHED BY SOURCE THEN DELETE
13
Merge Statement
Pat MartinANZ SQL Premier Field EngineerMicrosoft N.Z.
demo
14
Lookup Transform EnhancementsScalable Cache Implementation
Cache-load is a separate operation to LookupCan be hydrated and dehydrated securely to the file systemCaches can be explicitly shared
Adaptable CachesCan be loaded from any source (SQL, Text, Mainframe…)Track cache hits and misses
Multiple ModesNo Cache (pass-through to DB)JIT – Just In Time (on miss, query database, and store result)Full-Cache (pre-load all rows)
15
Lookup Transform 2008
Shane BartlePrincipal ConsultantMicrosoft N.Z.
demo
16
Logging events to watch pipeline internalsPipelineExecutionPlan, PipelineExecutionTree, BufferSizeTuning
System Monitor to track I/O issuesBuffers In Use tracks how many buffers are presently being usedBuffers Spooled tracks how many 10Mb buffers have been spooled to disk
Superdump to resolve error scenariosNew support capability for reactive fix-up
SQL Server 2008 SSISMonitoring Enhancements
17
Superdump
Provides visibility into the activity of a running packageCan be triggered without stopping a packageCan be scheduled (via registry key) to run on a
CrashSpecific Error Condition
18
Superdump
Pat MartinPremier Field EngineerMicrosoft N.Z.
demo
19
C# SupportADO.Net SupportImproved Import Export Wizard
SQL Server 2008 SSISDeveloper Additions
20
2005 used VSA for Script Design and ExecutionLegacy component with Visual Basic onlyLimited set of “reference-able” assemblies
In 2008 SSIS uses Visual Studio Tools for ApplicationsVisual Studio Designer ShellC# (or VB.NET) as a languageCan reference all .net assembliesCan reference Web Services
C# Support
21
ADO.Net Support
SSIS 2005 had a ADO.NET ‘DataReader’Limited to supplying a SqlCommand
SSIS 2008 has a full ADO.NET Data SourceMuch Enhanced User InterfaceODBC Support
22
ADO.Net SupportData Type Conversions
New page shows mappings and possible issuesMay insert data convert transforms into dataflowDefault mappings are customisable (via Notepad)
New System for Scaling Up Number of TablesMake as sequence of dataflow tasks Each with 5 source/transform/destination chains
Import Export Wizard Additions
23
Import/Export Wizard
Shane BartlePrincipal ConsultantMicrosoft N.Z.
demo
24
Creates a Profile of Your SQL TablesExplore or maintain data qualityRun as a task in SSISProduces XML file outputHas a nice visual tool for working with profiles
Analyze a Set of Columns / Tables Looking ForCandidate keysColumn length distributionNull RatioPattern detectionValue distributions and statsFunctional dependenciesValue inclusion
SQL Server 2008 SSISData Profiling
25
Data Profiling
Pat MartinANZ SQL Premier Field EngineerMicrosoft N.Z.
demo
26
Session Summary
Performance Improvements for throughputNew delta extraction alternative approachesMonitoring and support featuresC# and ADO.Net extend coverageData Profiling for data quality considerations
28
Q & A
29
Resources
www.microsoft.com/teched Tech·Talks Tech·Ed BloggersLive Simulcasts Virtual Labs
http://microsoft.com/technet
Evaluation licenses, pre-released products, and MORE!
http://microsoft.com/msdn
Developer’s Kit, Licenses, and MORE!
Related ContentDAT361 SQL Server 2008 Security Deep Dive
BIN309 SQL Server 2008 ETL drill down
BIN310 SQL Server 2008 Analysis Server (SSAS) enhancements
DAT362 SQL Server Spatial in the Spotlight
BIN352 Microsoft SQL Server 2008 Reporting Services: Architecture Overview
BIN311 Advanced Dashboard Creation with MOSS 2007
DAT364 End-to-End Troubleshooting for Microsoft SQL Server 2005/2008
BIN401 Optimising Query Performance in SQL Server 2008 Analysis Services
DAT355 Upgrading to Microsoft SQL Server 2008: Notes from Early Adopters
BIN402 Building and Deploying Advanced MOSS 2007 Planning Applications
32
Please complete anevaluation
33
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED
OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.