High Performance Analytical Appliance
• MPP database server platform for high performance
• Prebuilt appliance with HW & SW included and optimally configured
• Shared-nothing architecture; in-memory columnstore engine
Value
• Lowest price per terabyte of high-end DW appliances on the market
• Up to 100x faster than legacy SMP database queries
• Up to 15x data compression
Scale & Support
• Scales from terabytes to petabytes; built with Big Data in mind
• Architected with redundancy throughout
• Integrated, single-call support
What is SQL Server 2012 PDW?
Shared Nothing Architecture
Fault tolerance
• All h/w components have redundancy
  • CPU
  • Disk
  • Network
  • Power
  • Storage processors
• Control Nodes: failover clustering
• Compute Nodes: part of a single cluster per rack
SMP (Symmetric Multiprocessing)
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers
• Mostly, the solution is housed on a shared SAN
• Increase compute capacity via scale-up design
MPP (Massively Parallel Processing)
• Uses separate CPUs running in parallel to execute a single query
• Each CPU has its own allocated memory
• High-speed communications between nodes
• Increase compute capacity via scale-out design
SMP vs. MPP Architecture
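To make the scale-out bullet concrete, here is a minimal sketch of one query being answered by several nodes at once: each node computes a partial aggregate over its own rows, and the partial results are merged at the end. The node contents and function names are hypothetical, and threads merely stand in for separate servers; this is not how the appliance is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node data: each inner list stands for the rows
# (here just sales amounts) stored on one compute node.
NODE_ROWS = [
    [10.0, 20.0, 5.0],
    [7.5, 2.5],
    [100.0, 1.0, 4.0, 5.0],
]


def partial_aggregate(rows):
    """Work done independently on one node: a local SUM and COUNT."""
    return sum(rows), len(rows)


def scatter_gather_avg(node_rows):
    """Run the partial aggregate on every node in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(node_rows)) as pool:
        partials = list(pool.map(partial_aggregate, node_rows))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count
```

Only the small (sum, count) pairs cross node boundaries, so adding nodes adds compute capacity without funnelling every row through shared memory or a shared SAN.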
[Diagram: rack layout with redundant Infiniband and Ethernet switches, a Management Control Node, and a Management Failover Node. A Base Unit (3 nodes) can be extended with an optional 2nd Scale Unit and 3rd Scale Unit (3 additional nodes each). Each unit consists of three Compute Servers with JBOD storage.]
Dell PDW Modular Design
• 3 – 54 Compute Nodes
• 1 – 6 Racks
• 1 TB, 2 TB or 3 TB Drives
• 22.65 TB – 1,223 TB Raw
• 113 TB – 6 PB User Data
• Linear scale
• Mixed workloads
• No downtime
• Start small & grow: 113 TB to 6 PB
Linear Scale Solution
Shared Nothing Architecture
I/O and CPU affinity within SMP nodes
• Eliminates contention per user query
• Utilizes full resources for each user query
• Multiple physical instances of tables
  • Distribute large tables
  • Replicate small tables
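The distribute-versus-replicate choice can be sketched in a few lines. The example below is a toy model under stated assumptions: three nodes, CRC32 as a stand-in hash (the slides do not describe the appliance's real hash function), and hypothetical function names.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # assumed node count for the illustration


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-column value to a node with a stable hash.

    CRC32 is only a stand-in; the real hash function is not given
    in the slides.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, column, num_nodes=NUM_NODES):
    """Split a large table so that each row lives on exactly one node."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(row[column], num_nodes)].append(row)
    return {node: placement.get(node, []) for node in range(num_nodes)}


def replicate(rows, num_nodes=NUM_NODES):
    """Copy a small table in full to every node."""
    return {node: list(rows) for node in range(num_nodes)}
```

Rows that share a distribution-column value always land on the same node, while a replicated table costs storage on every node in exchange for never having to be moved at query time.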
Sample Data Model Across Compute
• Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
• Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
• Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
• Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
• Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
Sample Data Model – Distributed Tables
[Diagram: the Sales Facts table is split into distributions (SF-1, SF-2, SF-3) spread across the SQL Server compute nodes, shown next to the Time, Store, Product, and Mktg Campaign dimension tables.]
Sample Data Model – Replicated Tables
[Diagram: every SQL Server compute node holds a full copy of the Time (TD), Product (PD), Store (SD), and Mktg Campaign (MD) dimension tables next to its own Sales Facts distributions (SF-1, SF-2, SF-3).]
Smaller dimension tables are replicated on every compute node.
Result: fact-dimension joins can be performed locally.
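A small sketch of why replication makes the join local: each node below holds a slice of the fact table plus a complete copy of one dimension, so the lookup never needs data from another node. The store names, amounts, and function names are invented for illustration.

```python
# Hypothetical layout: each node holds its own slice of Sales Facts plus a
# complete copy of the (small) Store dimension.
STORE_DIM = {1: "Downtown", 2: "Airport", 3: "Mall"}

NODES = [
    {"sales": [(1, 100.0), (3, 40.0)], "store_dim": dict(STORE_DIM)},
    {"sales": [(2, 75.0), (1, 25.0)], "store_dim": dict(STORE_DIM)},
    {"sales": [(3, 60.0)], "store_dim": dict(STORE_DIM)},
]


def local_join(node):
    """Join and pre-aggregate using only data stored on this node."""
    totals = {}
    for store_id, dollars in node["sales"]:
        name = node["store_dim"][store_id]  # lookup never leaves the node
        totals[name] = totals.get(name, 0.0) + dollars
    return totals


def dollars_by_store(nodes):
    """Merge the per-node results into the final answer."""
    merged = {}
    for node in nodes:
        for name, dollars in local_join(node).items():
            merged[name] = merged.get(name, 0.0) + dollars
    return merged
```

Only the per-store subtotals are exchanged between nodes; the fact rows themselves stay where they were loaded.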
[Diagram: the rack layout again, with Infiniband and Ethernet switches, the Management Control Node and Management Failover Node, and Compute Servers with JBOD storage.]
PDW Query Execution
• 27x faster: from 2.5 hours to 5.5 minutes
• 7x smaller: from 3.3 TB to 486 GB
• 37x smaller: from 3.3 TB to 90 GB
• 28x faster: from 43 min to 90 sec
• 16.17% impact: minimal concurrency impact
Real World Summary Results
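The headline multipliers follow directly from the before/after figures quoted on the slide. The sketch below recomputes them; treating 1 TB as 1000 GB is an assumption, and the constant names are made up for the example.

```python
def ratio(before, after):
    """How many times larger `before` is than `after` (same units)."""
    return before / after


# Figures quoted on the slide, converted to common units.
# 1 TB is taken as 1000 GB here, which is an assumption.
QUERY_1 = ratio(2.5 * 3600, 5.5 * 60)  # 2.5 hours vs 5.5 minutes, in seconds
QUERY_2 = ratio(43 * 60, 90)           # 43 minutes vs 90 seconds
SIZE_1 = ratio(3.3 * 1000, 486)        # 3.3 TB vs 486 GB
SIZE_2 = ratio(3.3 * 1000, 90)         # 3.3 TB vs 90 GB
```

Under that assumption the ratios come out close to the quoted 27x, 28x, 7x, and 37x, allowing for rounding on the slide.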
Why Distributed DW?
• Full SQL Server functionality
• Distributes the workload
• Allows existing and new data marts to be integrated into the EDW
• Better solution than consolidation
• Enables publishing
• Expand and add spokes without impacting users
Distributed Architecture
• Parallel Database Export (PDE) technology enables rapid data movement and consistency between distributed SQL Servers
• Supports different SLAs and user groups:
  • High-performance loading and queries
  • Guaranteed server resources
  • Data concurrency
  • Customized workloads
Hub & Spoke Design
[Diagram: a PDWv2 Hub connected over 56 Gb/s Infiniband to the surrounding servers:]
• Landing Zone/SSIS Server: DW Loader push of flat files to the hub
• Backup Node: backup push from the hub
• Spokes fed by Remote Table Copy: SQL Server (QSDW 2000) and SMP SQL Server (e.g. Fast Track Reference Architecture, Quick Start Data Warehouse)
• SSAS Processing Server
Create Remote Table (CRTAS)
• Enables the high-speed PDE feature
• Selects data from a PDW appliance and copies that data to a new table in a SQL Server SMP database
• Sample transfer rate to a 4-socket, 24-core server: 120 GB/hour plus compression
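For planning purposes, the sample transfer rate can be turned into a rough duration estimate. This is a back-of-the-envelope sketch: the function name is hypothetical, the 120 GB/hour default is the slide's sample figure for one specific server, and the effect of compression is ignored.

```python
SAMPLE_RATE_GB_PER_HOUR = 120.0  # sample rate quoted on the slide


def hours_to_transfer(size_gb, rate_gb_per_hour=SAMPLE_RATE_GB_PER_HOUR):
    """Estimate the hours needed to copy `size_gb` at a constant rate."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if rate_gb_per_hour <= 0:
        raise ValueError("rate_gb_per_hour must be positive")
    return size_gb / rate_gb_per_hour
```

At the sample rate, a 486 GB table would take a little over four hours; actual throughput depends on the destination server and should be measured.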
CRTAS Prerequisites
Must be co-located and on the same Infiniband® network:
• Requires Infiniband® HCA card in the remote SQL Server SMP
• Requires physical server placement within ~100 meters of the PDW appliance
• Recommend externally facing network to be firewalled
  • Exception for the SQL Server admin/management ports
• PDW to SQL Server SMP is the only supported configuration
• Target table(s) must not already exist
CRTAS Example
CREATE REMOTE TABLE OrderReporting.dbo.Orders
AT ( 'Data Source = SQL_Sales, 1433; User ID = Madrid; Password = TechEd2013;' )
AS SELECT * FROM [2010Q4].dbo.Orders;
CRTAS Monitoring Performance
• Performance counters on the destination SMP SQL Server:
  • Databases: Bulk Copy Rows/Sec
  • Databases: Bulk Copy Throughput/Sec (KB)
• On the PDW appliance, use the following DMV-based query to view the data export status:
SELECT * FROM sys.dm_pdw_dms_workers WHERE type = 'PARALLEL_COPY_READER';
Enhancing Fact Table Performance
• Partition tables where appropriate
• Common key is date (or integer surrogate)
• Similar guidelines to SMP SQL Server partitioning
• Use partition switch for large inserts/updates
Partition a Replicated Table
CREATE TABLE Customers
(id integer NOT NULL, lastName varchar(20), postalCode varchar(10))
WITH (PARTITION (id RANGE LEFT FOR VALUES (10, 20, 30, 40, 50)));
Partition a Distributed Table
CREATE TABLE Orders
(id integer NOT NULL, lastName varchar(20), shipdate datetime)
WITH (DISTRIBUTION = HASH(id),
PARTITION (shipdate RANGE RIGHT FOR VALUES ('1992-01-01', '1992-02-01', '1992-03-01'..)));
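The difference between RANGE LEFT and RANGE RIGHT is which side a boundary value itself falls on. The sketch below models that rule with Python's `bisect` module; the function name is hypothetical, and the date list contains only the three boundaries actually shown in the example above.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def partition_number(value, boundaries, range_side):
    """Return the 1-based partition number that `value` falls into.

    RANGE LEFT:  a boundary value is the last value of the partition
                 on its left.
    RANGE RIGHT: a boundary value is the first value of the partition
                 on its right.
    `boundaries` must be sorted in ascending order.
    """
    if range_side == "LEFT":
        return bisect_left(boundaries, value) + 1
    if range_side == "RIGHT":
        return bisect_right(boundaries, value) + 1
    raise ValueError("range_side must be 'LEFT' or 'RIGHT'")


CUSTOMER_BOUNDARIES = [10, 20, 30, 40, 50]
ORDER_BOUNDARIES = [date(1992, 1, 1), date(1992, 2, 1), date(1992, 3, 1)]
```

With the Customers boundaries, `id = 10` stays in partition 1 under RANGE LEFT, whereas with the Orders boundaries a ship date of 1992-01-01 opens partition 2 under RANGE RIGHT. N boundary values always yield N + 1 partitions, and RANGE RIGHT is the natural fit for date keys because each partition then starts on the first day of its period.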
Other Performance Considerations
• Consider de-normalizing tables (traditional Master / Detail)
• Use CTAS as a Swiss Army knife
CTAS (CREATE TABLE AS SELECT)
• Creates a new table based upon a SELECT
• Executes in parallel, minimal logging
• Copy tables (or subsets) for querying
• Change a replicated table to distributed
• Change the distribution column
• Use to periodically defrag tables
• Reduce the overhead of a DELETE
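The last bullet, replacing a large DELETE with a copy of the rows worth keeping, can be sketched with SQLite standing in for the appliance. This shows the pattern only: SQLite's `CREATE TABLE ... AS SELECT` is a different engine from PDW, whose CTAS statement takes additional options that are not reproduced here, and the helper name and sample rows are invented.

```python
import sqlite3


def delete_via_ctas(conn, table, keep_predicate):
    """Replace `table` with a copy holding only rows matching keep_predicate.

    `table` and `keep_predicate` are interpolated into SQL text, so they
    must be trusted strings; this is an illustration, not production code.
    """
    tmp = f"{table}_ctas"
    # 1. Build the new table from the rows to keep.
    conn.execute(
        f"CREATE TABLE {tmp} AS SELECT * FROM {table} WHERE {keep_predicate}"
    )
    # 2. Drop the old table and give the copy the original name.
    conn.execute(f"DROP TABLE {table}")
    conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
    conn.commit()


def demo():
    """Purge orders shipped before 1992 without issuing a DELETE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (id INTEGER, shipdate TEXT)")
    conn.executemany(
        "INSERT INTO Orders VALUES (?, ?)",
        [(1, "1991-12-15"), (2, "1992-01-10"), (3, "1992-02-20")],
    )
    delete_via_ctas(conn, "Orders", "shipdate >= '1992-01-01'")
    rows = conn.execute("SELECT id FROM Orders ORDER BY id").fetchall()
    conn.close()
    return rows
```

The same copy-and-swap shape covers the other bullets too: changing the distribution column or defragmenting a table is a matter of what the SELECT and the new table's options say, not of a different workflow.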
DWLoader
• Achieve data load speeds of up to 1.7 TB per hour
• Accommodate multiple and concurrent incremental loads
• Provides transactional protection and configurable batch size (10,000)
• Supports direct load of compressed files
SQL Server Integration Services (SSIS)
• SQL Server Parallel Data Warehouse Connection Manager
• SQL Server Parallel Data Warehouse Destination
High Speed Data Loads
Data Loading with SSIS
• SQL Server PDW Destination Component
• Loads occur in parallel, both within a package and among multiple packages concurrently
• SSIS can run on the Loading Server or another server outside of the PDW appliance
• Leverages DMS for parallel operations
SSIS – Management vs. Performance
• Row-level locking
• PDW connections and queries are costly
• Data type conversion in the destination adapter is expensive. Match destination data types (strings, decimals)
• Consider ELT rather than ETL
• Consider using SSIS control flow to instantiate DWLoader
Resources
• Learning: Microsoft Certification & Training Resources, www.microsoft.com/learning
• msdn: Resources for Developers, http://microsoft.com/msdn
• TechNet: Resources for IT Professionals, http://microsoft.com/technet
• Sessions on Demand: http://channel9.msdn.com/Events/TechEd
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.