BIE402: Implementing a Microsoft SQL Server Data Warehouse Fast Track

Brian Knight
Founder, Pragmatic Works
bknight@pragmaticworks.com
SESSION CODE: BIE402

6/10/2010 6:25 PM. © 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

About the Ugly Guy Speaking
  • SQL Server MVP
  • Founder of Pragmatic Works
  • Co-Founder of, and SQLShare.com
  • Written more than a dozen books on SQL Server

Today's Problems with Integration
  • Integration today
      • Increasing data volumes
      • Increasingly diverse sources
  • Requirements reached the Tipping Point
      • Low-impact source extraction
      • Efficient transformation
      • Bulk loading techniques

Agenda
  • SQL Instance-level Data load tuning
  • Fast Track maintenance

SQL Server Fast Track Data Warehouse
  • A method for designing a cost-effective, balanced system for Data Warehouse workloads
  • Reference hardware configurations developed in conjunction with hardware partners using this method
  • Best practices for data layout, loading and management
  • Solution to help customers and partners accelerate their data

Fast Track Data Warehouse Components
  • Software:
      • SQL Server 2008 Enterprise
      • Windows Server 2008
  • Hardware:
      • Tight specifications for servers, storage & networking
      • Per core building block
  • Configuration guidelines:
      • Physical table structures
      • Indexes
      • Compression
      • SQL Server settings
      • Windows Server settings
      • Loading

Fast Track Performance
  • HP ProLiant DL785 G6
  • (8) AMD Opteron CPUs, 6 core, 2.6 GHz
  • 48 total CPU cores
  • 24 TB optimized storage (48 TB max)
  • 9600 MB/s throughput
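The "per core building block" idea can be made concrete by dividing the quoted system throughput by the core count. The following is a minimal sketch using only the two figures from the Fast Track Performance slide; the function name `per_core_rate` is hypothetical and introduced here for illustration.

```python
# Figures quoted on the "Fast Track Performance" slide.
TOTAL_THROUGHPUT_MB_S = 9600   # MB/s for the whole system
TOTAL_CORES = 48               # (8) CPUs x 6 cores each


def per_core_rate(throughput_mb_s: float, cores: int) -> float:
    """Average throughput available to each core, in MB/s (hypothetical helper)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return throughput_mb_s / cores


rate = per_core_rate(TOTAL_THROUGHPUT_MB_S, TOTAL_CORES)
print(f"{rate:.0f} MB/s per core")  # prints: 200 MB/s per core
```

Read this way, the quoted configuration works out to roughly 200 MB/s of scan throughput for each core.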

Fast Track Data Warehouse Reference Configurations

Server                 | CPU                                         | CPU Cores | SAN              | Data Drive Count   | Initial Capacity* | Max Capacity**
HP ProLiant DL 385 G6  | (2) AMD Opteron Istanbul six core 2.6 GHz   | 12        | (3) HP MSA2312fc | (24) 300GB 15k SAS | 6TB               | 12TB
HP ProLiant DL 380 G6  | (2) Intel Xeon 5500 Series quad core        | 8         | (2) HP MSA2312   | (16) 300GB 15k SAS | 4TB               | 8TB
HP ProLiant DL 585 G6  | (4) AMD Opteron Istanbul six core 2.6 GHz   | 24        | (6) HP MSA2312fc | (48) 300GB 15k SAS | 12TB              | 24TB
HP ProLiant DL 580 G5  | (4) Intel Xeon 7400 Series six core         | 24        | (6) HP MSA2312   | (48) 300GB 15k SAS | 12TB              | 24TB
HP ProLiant DL 785 G6  | (8) AMD Opteron Istanbul six core 2.8 GHz   | 48        | (12) HP MSA2312  | (96) 300GB 15k SAS | 24TB              | 48TB
Dell PowerEdge R710    | (2) Intel Xeon Nehalem quad core 2.66 GHz   | 8         | (2) EMC AX4      | (16) 300GB 15k FC  | 4TB               | 8TB
Dell PowerEdge R900    | (4) Intel Xeon Dunnington six core 2.67 GHz | 24        | (6) EMC AX4      | (48) 300GB 15k FC  | 12TB              | 24TB
IBM X3650 M2           | (2) Intel Xeon Nehalem quad core 2.67 GHz   | 8         | (2) IBM DS3400   | (16) 200GB 15K FC  | 4TB               | 8TB
IBM X3850 M2           | (4) Intel Xeon Dunnington six core 2.67 GHz | 24        | (6) IBM DS3400   | (24) 300GB 15k FC  | 12TB              | 24TB
IBM X3950 M2           | (8) Intel Xeon Nehalem four core 2.13 GHz   | 32        | (8) IBM DS3400   | (32) 300GB 15k SAS | 16TB              | 32TB
Bull Novascale R460 E2 | (2) Intel Xeon Nehalem quad core 2.66 GHz   | 8         | (2) EMC AX4      | (16) 300GB 15k FC  | 4TB               | 8TB
Bull Novascale R480 E1 | (4) Intel Xeon Dunnington six core 2.67 GHz | 24        | (6) EMC AX4      | (48) 300GB 15k FC  | 12TB              | 24TB

* Core-balanced compressed capacity based on 300GB 15k SAS, not including hot spares and log drives. Assumes 25% (of raw disk space) allocated for TempDB.
** Represents storage array fully populated with 300GB 15k SAS and use of a 2.5:1 compression ratio. This includes the addition of one storage expansion tray per enclosure. 30% of this storage should be reserved for DBA operations.

Potential Performance Bottlenecks
[Diagram: I/O path from the server (CPU cores, Windows, SQL Server, cache) through two FC HBAs (ports A/B), the FC switch, and the storage controller (ports A/B, cache) to LUNs built from pairs of disks. Labeled rates: CPU Feed Rate, SQL Server Read Ahead Rate, HBA Port Rate, Switch Port Rate, SP Port Rate, LUN Read Rate, Disk Feed Rate.]

Fast Track SQL DW Architecture vs. Traditional DW

[Diagram labels: SQL 2008 Data Warehouse; 4 Processor 16 Core Server; OLTP Applications]

Traditional SQL DW Architecture: Shared Infrastructure
  • Shared Network Bandwidth
  • Enterprise Shared SAN Storage

Fast Track SQL DW Architecture: Dedicated DW Infrastructure
  • Dedicated Network Bandwidth
  • Architecture modeled after DW Appliances
  • 1TB to 48TB, Pre-Tested
  • Dedicated Low Cost SAN Arrays, 1 for every 4 CPU Cores (EMC AX4, HP MSA2312)

Benefits:
  • More system predictability, and thus a better user experience
  • Pretested configurations lower TCO
  • Balanced CPU to I/O channel, optimized for DW
  • Modular building block approach
  • Scale out or up within the limits of server and SAN

Case: Insurance Claims
  • High-volume loads in a short load window
  • Example: Load and enrich 50 GB of incremental data in less than 1 hour
  • Only possible with a highly parallel load design
  • Use partitioned destination table
      • # partitions = # cores
  • Parallel loading to staging table first
  • Separate filegroups per partition prevent interleaving during load
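The insurance claims case combines two ideas: a throughput target (50 GB in under an hour) and a load split into as many partitions as there are cores, each staged in parallel. The sketch below illustrates both under loud assumptions: the integer `claim_id` key, the modulo partitioning, and the function names are all hypothetical, and Python threads merely stand in for parallel bulk-load streams. It is not the presenter's load design.

```python
from concurrent.futures import ThreadPoolExecutor


def required_rate_mb_s(volume_gb: float, window_s: float) -> float:
    """Aggregate load rate needed to move volume_gb within window_s seconds."""
    return volume_gb * 1024 / window_s


def partition_rows(claim_ids, n_partitions):
    """Assign each row to one partition (assumption: modulo on an integer key)."""
    buckets = [[] for _ in range(n_partitions)]
    for claim_id in claim_ids:
        buckets[claim_id % n_partitions].append(claim_id)
    return buckets


def stage_partition(rows):
    """Stand-in for bulk loading one partition into its own staging table."""
    return sorted(rows)


def parallel_stage(claim_ids, n_cores):
    """One partition per core, each staged by its own worker."""
    buckets = partition_rows(claim_ids, n_cores)
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        return list(pool.map(stage_partition, buckets))


staged = parallel_stage(range(1000), n_cores=8)
print(len(staged), "partitions staged")
print(f"{required_rate_mb_s(50, 3600):.1f} MB/s aggregate load rate needed")
```

Because every worker writes only to its own partition, no two streams compete for the same destination, which is the property the slide attributes to per-partition filegroups.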

Results

                           | Existing Appliance                         | SQL Server Fast Track DW                  | Comparison
Loading Subject Area 1     | 5:10:21 total time                         | 51:31 total time                          | 6x faster
Loading Subject Area 2     | 4:36:08 total time                         | 1:50.01 total time                        | 2.5x faster
Query times Subject Area 1 | 3:03 avg query time (9 benchmark queries)  | 0:15 avg query time (9 benchmark queries) | 12x faster
Query times Subject Area 2 | 56:44 avg query time (4 benchmark queries) | 8:09 avg query time (4 benchmark queries) | 7x faster

Price per TB (8TB) Cal: $22K / TB
Price per TB (16TB) Cal: $13K / TB

Case Study
  • Replaced AS/400 DB2 with SQL Server
  • Replaced CICS with SSIS
  • Saved ~$50,000 a month
  • Took a 12-hour process down to 50 minutes
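The comparison column in the Results can be cross-checked from the quoted times. This is a small sketch, assuming the times are in h:mm:ss or m:ss form and that the Fast Track load time printed as 1:50.01 means 1 h 50 min 1 s (an assumption that the stated 2.5x figure supports). The helper names are hypothetical.

```python
def to_seconds(clock: str) -> int:
    """Convert 'h:mm:ss' or 'm:ss' to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def speedup(before: str, after: str) -> float:
    """How many times faster 'after' is than 'before'."""
    return to_seconds(before) / to_seconds(after)


# (existing appliance, Fast Track) pairs as quoted in the Results.
results = {
    "Loading Subject Area 1": ("5:10:21", "51:31"),
    "Loading Subject Area 2": ("4:36:08", "1:50:01"),  # assumed reading of 1:50.01
    "Query times Subject Area 1": ("3:03", "0:15"),
    "Query times Subject Area 2": ("56:44", "8:09"),
    "Case study process": ("12:00:00", "50:00"),
}

for name, (before, after) in results.items():
    print(f"{name}: {speedup(before, after):.1f}x faster")
```

The computed ratios come out at about 6.0, 2.5, 12.2 and 7.0 for the four benchmark rows, in line with the rounded figures on the slide, and about 14.4 for the 12-hour process cut to 50 minutes.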

DW Products Positioning
[Chart: products numbered 1 to 4 positioned along Scale and Complexity axes. Products shown: SQL Server 2008 ("Start here"); SQL Server 2008 with Fast Track Reference Architecture; PDW; PDW with Hub-and-spoke. Callouts: Incremental HW expansion; Fast parallel loading by default; HA by default; SW-HW integration.]

Fast Track Data Striping
  • Fast Track evenly spreads SQL data files across physical RAID-1 disk arrays
[Diagram: CREATE FILEGROUP DB1. An FT storage enclosure of RAID-1 arrays (disks 1 & 2 in each). Primary data volumes ARY01D1v01, ARY01D2v02, ARY02D1v03, ARY02D2v04, ARY03D1v05, ARY03D2v06, ARY04D1v07 and ARY04D2v08 hold data files DB1-1.ndf through DB1-8.ndf; log volume ARY05v09 holds DB1.ldf.]

Fast Track File Layout: SQL Server File System
  • Three layers of storage configuration
      • SAN file system
      • Logical storage allocation
          • Primary Data (user databases): (4) 2-disk RAID-1 arrays per enclosure
          • Log: (1) 2-disk RAID-1 array per enclosure
      • Database file creation
          • User databases
          • Tempdb
          • Transaction logs

Writing Sequential Data
  • Sequential scan performance starts with database creation and extent allocation
  • Recall that the -E startup option is used
      • Allocate 64 extents at a time (4MB)
  • Pre-allocation of user databases is recommended
  • Autogrow should be avoided if possible
      • If used, always use 4MB increments
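Two details from the striping and sequential-write slides lend themselves to a quick worked example: why 64 extents equal 4 MB, and how the volume labels in the striping diagram follow a regular pattern. The sketch below assumes the standard SQL Server page and extent sizes (8 KB pages, 8 pages per extent). The pairing of data file N with volume N and the helper names are assumptions for illustration, since the extracted diagram does not preserve the exact file-to-volume mapping.

```python
import math

PAGE_KB = 8                   # SQL Server page size
PAGES_PER_EXTENT = 8          # one extent = 8 pages = 64 KB
EXTENTS_PER_ALLOCATION = 64   # allocation unit quoted for the -E startup option


def allocation_mb() -> float:
    """Size of one 64-extent allocation in MB."""
    return EXTENTS_PER_ALLOCATION * PAGES_PER_EXTENT * PAGE_KB / 1024


def data_volumes(arrays: int = 4, luns_per_array: int = 2) -> list:
    """Volume labels in the ARYnnDnvnn pattern seen in the striping diagram."""
    names, volume = [], 1
    for array in range(1, arrays + 1):
        for lun in range(1, luns_per_array + 1):
            names.append(f"ARY{array:02d}D{lun}v{volume:02d}")
            volume += 1
    return names


def file_size_mb(total_gb: float, n_files: int, step_mb: int = 4) -> int:
    """Equal pre-allocated size per data file, rounded up to a 4 MB multiple."""
    per_file_mb = total_gb * 1024 / n_files
    return math.ceil(per_file_mb / step_mb) * step_mb


print(f"{allocation_mb():.0f} MB per allocation")
volumes = data_volumes()
size = file_size_mb(total_gb=500, n_files=len(volumes))
for number, volume in enumerate(volumes, start=1):
    # Assumption: file DB1-<n>.ndf is placed on the n-th data volume.
    print(f"{volume}: DB1-{number}.ndf, {size} MB")
```

Sizing every file identically and in 4 MB multiples keeps the files aligned with the allocation unit, which fits the slide's advice to pre-allocate and avoid autogrow where possible.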

Mounting the SAN File System: Creating LUNs
  • Mount points can be used to map LUNs to the Windows Server OS
  • Fast Track RA recommends using a naming scheme to identify the LUN to physical disk relationship
      • LUN, RAID, and physical disk number are used as components of the Windows volume name
  • The naming scheme enables targeted I/O validation of disk (LUN), array, and storage processor using a tool such as SQLIO
  • Primary Data arrays: 2 LUNs per array
  • Log array: 1 LUN

SQL Server Configuration
  • SQL Server Startup
      • -E: Allocate 64 extents at a time (4MB)
          • This is not a guarantee of a logically contiguous extent allocation
      • -T1117: Autogrow in even increments
      • -T610: Minimal logging during data loads
  • All databases should be sized to meet expected growth for the next 12-18 months
  • Autogrow for ALL databases should be set to 4 MB

SQL Server Files
  • Transaction Log
      • Create a single transaction log file per database and place it on a dedicated Log LUN
      • Enable auto-grow for log files
      • The transaction log size for each database should be at least twice the size of the largest DML operation

SQL Server Files (continued)

User Databases
  • Create at least one filegroup containing one data file per LUN
      • FT targets 1:1 LUN to CPU core affinity
  • Make all files the same size
      • Effectively stripes database files across data LUNs
  • Multiple filegroups may be advantageous
  • Disable auto-grow for the database
  • Transaction log is allocated to a Log LUN

Data Load in a Fast Track
  • Conventional data loads lead to fragmentation
      • Bulk inserts into a clustered index using a moderate batchsize parameter
      • Each batch is sorted independently
      • Overlapping batches lead to page splits
[Diagram: page IDs 1:31 through 1:40 shown out of sequence relative to the key order of the index.]

Techniques to Maximize Scan Throughput
  • Minimize use of nonclustered indexes on fact tables
  • Load techniques to avoid fragmentation
      • Load in clustered index order (e.g. date) when possible
  • Index creation always MAXDOP 1, SORT_IN_TEMPDB
  • Isolate volatile tables in a separate filegroup
  • Isolate staging tables in a separate filegroup or DB
  • Periodic maintenance
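The claim that independently sorted, overlapping batches cause page splits, while loading in clustered index order does not, can be illustrated with a toy model. The sketch below is a deliberate simplification and not SQL Server's storage engine: leaf pages are fixed-capacity sorted lists, a key beyond the end of the index gets a fresh page, and a key landing in a full page elsewhere forces a 50/50 split. The names `load_keys` and `PAGE_CAPACITY`, the batch size, and the key counts are all hypothetical.

```python
import bisect
import random

PAGE_CAPACITY = 8  # keys per leaf page in this toy model


def load_keys(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one; return (page_splits, page_count)."""
    pages = [[]]   # leaf pages in key order, each a sorted list
    splits = 0
    for key in keys:
        # Target page: first page whose highest key >= key, else the last page.
        idx = len(pages) - 1
        for i, page in enumerate(pages):
            if page and page[-1] >= key:
                idx = i
                break
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])          # ascending insert: new page at the end
        else:
            half = capacity // 2         # full page in mid-index: split it
            left, right = page[:half], page[half:]
            pages[idx:idx + 1] = [left, right]
            splits += 1
            bisect.insort(left if key <= left[-1] else right, key)
    return splits, len(pages)


keys = list(range(2000))

# Load in clustered index order.
ordered_splits, ordered_pages = load_keys(keys)

# Conventional load: batches of 200 rows, each sorted independently,
# with key ranges that overlap across batches.
rng = random.Random(0)
shuffled = keys[:]
rng.shuffle(shuffled)
batches = [sorted(shuffled[i:i + 200]) for i in range(0, len(shuffled), 200)]
batched_splits, batched_pages = load_keys([k for b in batches for k in b])

print("in key order:       ", ordered_splits, "splits,", ordered_pages, "pages")
print("overlapping batches:", batched_splits, "splits,", batched_pages, "pages")
```

In this model the ordered load fills each page completely and never splits, while the batched load repeatedly inserts into pages that an earlier batch already filled, leaving partly empty pages scattered through the index.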

Minimizing Extent Fragmentation
  • Extent fragmentation can be minimized through use of filegroups
      • Separate filegroups for volatile data
      • Separate filegroups for staging tables
  • Partition key tables across multiple filegroups
      • Useful if data volatility varies across partition ranges
  • Isolate data operations that generate significant fragmentation to dedicated filegroups or databases

Loading Data
  • Primary method used to create sequential data layout
  • Goals
      • Maximize sequential data layout
      • Minimize fragmentation
  • Key considerations
      • Concurrent load operations to the same file will induce fragmentation
      • DML change operations (Update/Delete) may induce fragmentation
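The point about concurrent loads to the same file can be sketched with a toy allocator. The model below is an illustration only and does not reproduce SQL Server's real allocation behavior: each file hands out extents strictly in arrival order, so two tables loading at the same time into one file end up with alternating extents. The table names, file names, and functions are hypothetical.

```python
def allocate(requests):
    """requests: (table, file) pairs in arrival order.
    Returns {table: [(file, extent_number), ...]}."""
    next_free = {}
    placement = {}
    for table, file in requests:
        extent = next_free.get(file, 0)
        placement.setdefault(table, []).append((file, extent))
        next_free[file] = extent + 1
    return placement


def contiguous_runs(extents):
    """Number of contiguous runs in a table's extent list (1 = fully sequential)."""
    if not extents:
        return 0
    runs = 1
    for (f1, e1), (f2, e2) in zip(extents, extents[1:]):
        if f1 != f2 or e2 != e1 + 1:
            runs += 1
    return runs


# Two loads arriving alternately, 8 extents each.
arrival = ["fact_a", "fact_b"] * 8

# Both tables share one file.
shared = allocate((table, "shared.ndf") for table in arrival)

# Each table has its own file (for example, one filegroup per table).
separate = allocate((table, f"{table}.ndf") for table in arrival)

print("shared file:   ", contiguous_runs(shared["fact_a"]), "runs for fact_a")
print("separate files:", contiguous_runs(separate["fact_a"]), "run for fact_a")
```

With a shared file every extent of `fact_a` is separated from the next by one of `fact_b`, so a sequential scan of either table has to skip across the file. With separate files each table's extents stay in one run, which is the effect the filegroup isolation advice aims for.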

Loading Data
  • Load recommendations for Fast Track
