Anthony Howcroft, Microsoft, Data Warehouse Category Manager - [email protected]
©2009 Microsoft Corporation
Each element meets a business need
Grown over time (organic + acquisition)
Generally reliable and (mostly) trusted
Expensive to maintain
Only privileged users have access
Slow to implement changes
CIO Viewpoint
SLAs getting harder to hit/keep
Large number of skillsets required
Systems scaled for peak times
Increasing integration issues
Increasing data quality issues
End users 'doing it themselves'
[Chart: Plan to Use (vertical axis, 0% to 100%, narrow commitment to broad commitment) against Anticipated Growth in the next 3 Years (horizontal axis, -50% to 100%, decreasing usage to increasing usage)]
Microsoft Confidential - Preliminary Information Subject to Change
Technologies plotted on the chart:
DBMS Built for Transactions
SMP
Centralized EDW
Analytics within EDW
Analytics Outside EDW
Blades in Racks
DBMS Built for DW
Server Virtualization
DW Bundles
Security
DW Appliance
Mixed Workloads
Data Federation
Columnar DBMS
Streaming Data
SOA
Low-Power Hardware
In-Memory DBMS
SaaS
Open Source OS
Open Source Reporting
Open Source Data Integration
Software Appliance
Public Cloud
Open Source DBMS
Advanced Analytics
Data Quality
HA for DW
Web Services
MPP
64-bit
MDM
Real-time DW
Source: TDWI
Declining usage despite commitment
Flat growth, good/moderate commitment
Good growth, good commitment
Good growth, moderate commitment
Good growth, small commitment
Areas of strategic investment for Microsoft
Reasonably fixed in the short-term
Aspirational goal
Foundation for Vendor Product Roadmaps
Drives Continuous Innovation
Never looks quite how we imagined….
Massive Scalability at Low Cost
Improved Business Agility and Alignment
Business Intelligence For Everyone
Hardware Choice
Make SQL Server the gold standard for data warehousing offering customers
SSIS
Microsoft & Partner
Services
Some
Big SAN
Big 64-core Server
Connected together
What’s wrong with this picture?
This server can consume 16 GB/Sec of IO, but the SAN can only deliver 2 GB/Sec
Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn't
Lots of disks for random IOPS, BUT
Limited controllers, so limited IO bandwidth
System is typically IO bound
Queries are slow
Result: significant investment, not delivering performance
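The mismatch above can be put into numbers with a minimal sketch. It uses only the two figures quoted on the slide (16 GB/s of CPU demand, 2 GB/s of SAN delivery); the function name and the simple "slowest component wins" model are illustrative assumptions, not part of the original deck.

```python
def effective_scan_rate(cpu_gb_per_s: float, storage_gb_per_s: float) -> float:
    """A scan runs no faster than the slower of the two components."""
    return min(cpu_gb_per_s, storage_gb_per_s)


cpu_rate = 16.0  # GB/s the server can consume (from the slide)
san_rate = 2.0   # GB/s the SAN can deliver (from the slide)

rate = effective_scan_rate(cpu_rate, san_rate)
cpu_utilisation = rate / cpu_rate  # share of CPU scan capacity actually fed

print(f"Effective scan rate: {rate} GB/s")
print(f"CPU scan capacity in use: {cpu_utilisation:.1%}")
```

Under this simplified model the CPUs are fed at one eighth of the rate they could consume, which is why the system is IO bound.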
Design a server + storage configuration that can deliver all the IO bandwidth that CPUs can consume when executing a SQL Relational DW workload
Avoid sharing storage devices among servers
Avoid overinvesting in disk drives
Focus on scan performance, not IOPS
Layout and manage data to maximize range scan performance and minimize fragmentation
A method for designing a cost-effective, balanced system for Data Warehouse workloads
Reference hardware configurations developed in conjunction with hardware partners using this method
Best practices for data layout, loading and management
Relational Database Only – Not SSAS, IS, RS
Solution to help customers and partners accelerate their data warehouse deployments
Determine your data consumption rate, per CPU core, for your particular query mix.
E.g. assume TPC-H query 2 is your average query
Run the query on a test server with data fully cached in memory
Execute parallel query using MAXDOP 4
Observe 100% CPU on 4 cores
Time the query and observe # pages read
Per Core Consumption = (# Logical Reads * 8K) / (CPU Time)
Queries performing complex calculations, format conversions, multi-dimension hash joins, etc. will be more CPU-intensive, i.e. complex queries will consume data at a slower per-core rate than simpler queries
Therefore: measure per-core data consumption for a variety of queries, and take the weighted average
We've measured a mix of TPC-H queries that reflect a 'prototype' Data Warehouse workload
Concluded that SQL Server 2008 on current x64 cores consumes ~200 MB/Sec per core on average for this workload
We use this as a basis for the published reference architectures
Your mileage will vary!
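The per-core consumption formula and the weighted average described above can be sketched as follows. The measurements are made-up placeholders, not benchmark results, and the sketch assumes that "CPU Time" is the total CPU seconds summed across all cores used by the query; the function names are hypothetical.

```python
PAGE_BYTES = 8 * 1024  # the "8K" page size in the slide formula


def per_core_rate_mb_s(logical_reads: int, cpu_seconds: float) -> float:
    """(# logical reads * 8K) / CPU time, expressed in MB/s per core."""
    return logical_reads * PAGE_BYTES / cpu_seconds / (1024 * 1024)


def weighted_average(rates_and_weights):
    """Average the per-query rates, weighted by each query's workload share."""
    total_weight = sum(weight for _, weight in rates_and_weights)
    return sum(rate * weight for rate, weight in rates_and_weights) / total_weight


# Hypothetical measurements: (logical reads, total CPU seconds, workload share)
measurements = [
    (2_560_000, 80.0, 0.7),   # simple query
    (2_560_000, 160.0, 0.2),  # average query
    (2_560_000, 320.0, 0.1),  # complex query
]

rates = [(per_core_rate_mb_s(reads, cpu), share) for reads, cpu, share in measurements]
workload_rate = weighted_average(rates)

for (rate, share) in rates:
    print(f"{rate:7.1f} MB/s per core at {share:.0%} of the workload")
print(f"Weighted per-core consumption: {workload_rate:.2f} MB/s")
```

The more CPU-intensive the query, the lower its per-core rate, so the weighted figure depends heavily on the mix you assume.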
2 Processor Configuration
Server: HP ProLiant DL385 G6 with 2 6-core AMD Opteron CPUs
Storage server: MSA Storage
Scalability: 4 – 12 TB
4 Processor Configuration
Server: HP ProLiant DL585 G6 with 4 6-core AMD Opteron CPUs
Storage server: MSA Storage
Scalability: 12 – 24 TB
8 Processor Configuration
Server: HP ProLiant DL785 G6 with 8 6-core AMD Opteron CPUs
Storage server: MSA Storage
Scalability: 24 – 48 TB
2 Processor Configuration
Server: HP ProLiant DL380 G6 with 2 4-core Intel Xeon® 5500 Series CPUs
Storage server: MSA Storage
Scalability: 4 – 8 TB
4 Processor Configuration
Server: HP ProLiant DL580 G5 with 4 6-core Intel Xeon® 7400 Series CPUs
Storage server: MSA Storage
Scalability: 12 – 24 TB
2 Processor Configuration
Server: Dell PowerEdge R710 with 2 Quad-core Intel Xeon processors
8 CPU Cores
32 GB Memory
Storage server: EMC CLARiiON AX4
Scalability: 4 – 8 TB
4 Processor Configuration
Server: Dell PowerEdge R900 with 4 6-core Intel Xeon processors
24 CPU Cores
96 GB Memory
Storage server: EMC CLARiiON AX4
Scalability: 12 – 24 TB
2 Processor Configuration
Server: Bull Novascale R460 E2 with 2 Quad-core Intel Xeon processors
Storage server: EMC CLARiiON AX4
Scalability: 4 – 8 TB
4 Processor Configuration
Server: Bull Novascale R480 E1 with 4 6-core Intel Xeon processors
Storage server: EMC CLARiiON AX4
Scalability: 12 – 24 TB
Also included in the rack:
SQL Server Analysis Services
SQL Server Reporting Services
SQL Server Integration Services
HA Server
Admin Server (with Management Studio, Backup Server)
2 Processor Configuration
Server: IBM System x3650 M2 with 2 Quad-core Intel Xeon CPUs
Storage server: IBM System Storage DS3400
Scalability: 4 – 8 TB
4 Processor Configuration
Server: IBM System x3850 M2 with 4 6-core Intel Xeon CPUs
Storage server: IBM System Storage DS3400
Scalability: 12 – 24 TB
8 Processor Configuration
Server: IBM System x3950 M2 with 8 Quad-core Intel Xeon CPUs
Storage server: IBM System Storage DS3400
Scalability: 16 – 32 TB
HP Fast Track Bull Fast Track
Appliances make it faster to get value… less chance of mistakes… more consistent performance… DBAs working as DBAs…
Microsoft is not ‘just software’!
Workload Affinity by Attribute: Data Warehouse vs. OLTP

Use Case Description
Data Warehouse:
• Read-mostly (90%-10%)
• Updates generally limited to data quality requirements
• High-volume bulk inserts
• Medium to low overall query concurrency; peak concurrent query requests ranging from 10-30
• Concurrent query throughput characterized by analysis and reporting needs
• Large range scans and/or aggregations
• Complex queries (filter, join, group-by, aggregation)
OLTP:
• Balanced read-update ratio (60%-40%)
• Concurrent query throughput characterized by operational needs
• Fine-grained inserts and updates
• High transaction throughput (for example, 10s K/sec)
• Medium-to-high overall user concurrency; peak concurrent query requests ranging from 50-100 or more
• Usually very short transactions (for example, discrete minimal row lookups)

Data Model
Data Warehouse:
• Highly normalized centralized data warehouse model
• Denormalization in support of reporting requirements often serviced from BI applications such as SQL Server Analysis Services
• Dimensional data structures hosted on the database with relatively low concurrency, high volume analytical requests
• Large range scans are common
• Ad-hoc analytical use cases
OLTP:
• Highly normalized operational data model
• Frequent denormalization for decision support; high concurrency, low latency discrete lookups
• Historical retention of data is limited
• Denormalized data models extracted from other source systems in support of operational event decision making

Data Architecture
Data Warehouse:
• Significant use of heap table structures
• Large partitioned tables with clustered indexes supporting range restricted scans
• Very large fact tables (for example, hundreds of gigabytes to multiple terabytes)
• Very large data sizes (for example, hundreds of terabytes to a petabyte)
OLTP:
• Minimal use of heap table structures
• Clustered index table structures support detailed record lookups (1 to few rows per request)
• Smaller fact tables (for example, less than 100 GB)
• Relatively small data sizes (for example, a few terabytes)

Database Optimization
Data Warehouse:
• Minimal use of secondary indexes (described earlier as index-light)
• Partitioning is common
OLTP:
• Heavy utilization of secondary index optimization
Fast Track
Scan-intensive
Non-volatile
Index-light
Partition aligned
Fast Track SMP RA for SQL Server 2008 CPU Core Calculator v2.4
Updated 10/09/2009 - uw
This spreadsheet can be used to estimate the number of cores required to support a user workload and workload mix.
Enter your factors into the green fields and the results will be calculated in the pink cells.
The spreadsheet uses a weighted average to determine the number of cores required based on your inputs.

User Variable Input
Anticipated total number of users expected on the system: 3,000 users
Estimated percent of actual query concurrency: 1% concurrency
Fast Track DW CPU max core consumption rate (MCR) in MB/s of page compressed data per core: 200 MB/s
Estimated compression ratio (default = 2.5:1): 2.5:1
Estimated drive serial throughput speed in compressed MB/s: 100 MB/s
Number of data drives in single storage array: 8 drives
Usable capacity per drive: 272 GB
Space reserved for TempDB: 26%

Adjust for workload mix
Workload | Estimated % of workload | Estimated % data found in SQL Server cache | Estimated query data scan volume MB (uncompressed) | Desired query response time (seconds, under load) | Estimated disk scan volume MB (uncompressed)
Simple | 70% | 10% | 8,000 | 25 | 7,200
Average | 20% | 0% | 75,000 | 180 | 75,000
Complex | 10% | 0% | 450,000 | 1,200 | 450,000
Total | 100%

Calculations and Results
Workload | % of core consumption rate achieved | Expected per CPU core consumption rate (MB/s) | Calculated single query scan volume in MB (compressed) | Calculated target concurrent queries | Estimated target queries per hour | Required IO throughput in MB/s | Estimated number of cores required | Estimated single query run time (seconds)
Simple | 100% | 200 | 2,880 | 21 | 3,024 | 2,419 | 12.10 | 0.5
Average | 50% | 100 | 30,000 | 6 | 120 | 1,000 | 10.00 | 9.4
Complex | 25% | 50 | 180,000 | 3 | 9 | 450 | 9.00 | 112.5
Total | | | | 30 | 3,153 | 3,869 | 32.00 |

Arrays required based on throughput | Single array throughput in MB/s | Throughput in MB/s for all required arrays
5 | 800 | 4,000

Suggested Fast Track RA Server Requirements
No. of CPU cores | Number of arrays | Total compressed data capacity (TB) | Max achievable IO throughput in MB/s | Max achievable CPU consumption in MB/s | Required IO throughput in MB/s
32 | 8 | 16 | 6,400 | 6,400 | 3,869
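The spreadsheet figures above can be reproduced with a short sketch. The formulas are inferred from the numbers shown, not taken from the spreadsheet itself, so treat them as an assumption; all names are illustrative. The sketch does not attempt the capacity figure or the rounding that turns the raw core total into the suggested 32 cores.

```python
import math

MCR = 200.0            # MB/s of compressed data per core
COMPRESSION = 2.5      # compression ratio
USERS = 3000
CONCURRENCY = 0.01     # share of users running a query at once
DRIVE_MB_S = 100.0     # serial throughput per drive
DRIVES_PER_ARRAY = 8

# name: (workload share, cache hit share, scan MB uncompressed,
#        target response seconds, share of MCR achieved)
WORKLOADS = {
    "Simple": (0.70, 0.10, 8_000, 25, 1.00),
    "Average": (0.20, 0.00, 75_000, 180, 0.50),
    "Complex": (0.10, 0.00, 450_000, 1_200, 0.25),
}


def size_workload(share, cached, scan_mb, response_s, mcr_share):
    """Inferred sizing formulas for one workload class."""
    compressed_mb = scan_mb * (1 - cached) / COMPRESSION
    concurrent = USERS * CONCURRENCY * share
    io_mb_s = concurrent * compressed_mb / response_s
    return {
        "compressed_mb": compressed_mb,
        "concurrent": concurrent,
        "queries_per_hour": concurrent * 3600 / response_s,
        "io_mb_s": io_mb_s,
        "cores": io_mb_s / (MCR * mcr_share),
    }


results = {name: size_workload(*inputs) for name, inputs in WORKLOADS.items()}
total_io = sum(r["io_mb_s"] for r in results.values())
total_cores = sum(r["cores"] for r in results.values())
array_mb_s = DRIVE_MB_S * DRIVES_PER_ARRAY
arrays_needed = math.ceil(total_io / array_mb_s)

for name, r in results.items():
    print(f"{name:8s} IO {r['io_mb_s']:8.1f} MB/s  cores {r['cores']:6.2f}")
print(f"Total IO {total_io:.1f} MB/s, raw cores {total_cores:.2f}, arrays {arrays_needed}")
```

The raw core total comes out just above 31, which is consistent with the 32 shown in the results once it is rounded up to a purchasable configuration.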
Current Environment
Teradata 4-node (5450 model) with 6TB of user data
BI: Business Objects
ETL: Informatica and BTEQ scripts
Proposed Microsoft Platform
SQL Server Fast Track Data Warehouse
HP DL580 Server - 4 quad-core processors (16 cores total)
256 GB Memory
SAN Storage: MSA 2000 (Qty 4) – 8TB User Data Capacity
BI: Business Objects
ETL: SQL Server and SSIS
Test | Teradata | SQL Server Fast Track DW | Comparison
Loading Subject Area 1 | 5:10:21 total time | 0:51:31 total time | 6x faster
Loading Subject Area 2 | 4:36:08 total time | 1:50:01 total time | 2.5x faster
Query times Subject Area 1 | 3:03 avg query time (using 9 benchmark queries) | 0:15 avg query time (using 9 benchmark queries) | 12x faster
Query times Subject Area 2 | 56:44 avg query time (using 4 benchmark queries) | 8:09 avg query time (using 4 benchmark queries) | 7x faster
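The "x faster" figures can be checked against the quoted times with a small sketch. It assumes the times are clock strings (h:mm:ss for loads, m:ss for queries) and reads the Subject Area 2 load on Fast Track as 1 hour 50 minutes 1 second; the helper name is illustrative.

```python
def to_seconds(clock: str) -> int:
    """Convert 'h:mm:ss' or 'm:ss' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# (Teradata time, Fast Track time) as quoted on the slide
TIMINGS = {
    "Loading Subject Area 1": ("5:10:21", "0:51:31"),
    "Loading Subject Area 2": ("4:36:08", "1:50:01"),
    "Query times Subject Area 1": ("3:03", "0:15"),
    "Query times Subject Area 2": ("56:44", "8:09"),
}

speedups = {
    name: to_seconds(teradata) / to_seconds(fast_track)
    for name, (teradata, fast_track) in TIMINGS.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
```

The computed ratios agree with the rounded factors on the slide.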
Fast Track Pricing* (at List)
Hardware (8TB capacity) $152,500
SQL Server – 2 options
Server CAL (100) License $26,119
Total SW & HW* $178,619
Price per TB (8TB) – CAL $22,327
Expand to 16 TB
Additional Hardware* $37,016
Total Price w/CAL license $215,635
Price per TB (16TB) – CAL $13,477
*NOTE: The above calculation is based on Microsoft estimated retail price for
SQL Server 2008 Enterprise, Windows Server 2003, and published hardware
prices available through participating resellers as of May 2009. Actual reseller
prices may vary.
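The price-per-TB arithmetic behind these figures is simple enough to sketch. The inputs are the list prices quoted above; the variable names are illustrative.

```python
hardware_8tb = 152_500     # hardware for 8 TB capacity
sql_cal_license = 26_119   # Server CAL (100) license
expansion_hw = 37_016      # additional hardware to reach 16 TB

total_8tb = hardware_8tb + sql_cal_license
total_16tb = total_8tb + expansion_hw

per_tb_8 = total_8tb / 8
per_tb_16 = total_16tb / 16

print(f"8 TB:  total ${total_8tb:,}  per TB ${per_tb_8:,.0f}")
print(f"16 TB: total ${total_16tb:,}  per TB ${per_tb_16:,.0f}")
```

Doubling capacity adds only the incremental hardware, which is why the per-TB price falls as the system expands.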
Fast Track Data Warehouse 2.0
New Reference Architectures from IBM
Updated Configurations from HP, Dell and Bull
EMC as a Service Partner for Fast Track
2008    2009    2010    Beyond
Enterprise ETL Services
Star Join Query Optimizations
Data Compression
Partitioned table parallelism
Test Harness for Partners
Microsoft to create Test Harness for validation of new Fast Track configurations
NEC to validate new Reference Architectures
DW Reference Architectures
Predictable performance at low cost
Faster time to solution
Fast Track Data Warehouse
Fast Track vNext
Future partners to create new Validated Reference Architectures with Test Harness
Incorporates SQL vNext
? ? ?
Reduces DBA effort; fewer indexes, much higher level of sequential I/O
Dell, HP, Bull, EMC and IBM – more in future
Commodity Hardware and value pricing;
Lower storage costs
New reference architectures scale up to 48TB (assuming 2.5x compression)
Validated by Microsoft; better choice of hardware; application of Best Practice
Formerly known as Project “Madison”
Scale-Out of SQL Server: 10s TB ►100s TB ►PB
Reference Architectures from HP, Bull, EMC, Dell, IBM
Low cost of ownership
Simplified deployment and maintenance via appliance model
Integration with existing SQL Server 2008 data warehouses via Hub &
Spoke Architecture
Available 1HCY10
Preview program running
2008    2009    2010    Beyond
Parallel Data Warehouse
MTP Program Launched
Circa 10 Customers Provided with early Madison Benchmark
Madison Named as SQL Server Parallel DW
List Price at $55K per proc
Microsoft Announces Intention to Acquire DATAllegro (July)
Acquisition Closes (Sept)
150TB demo of DATAllegro on SQL Server run at BI Conference (Oct)
Hardware Architectures Identified
Early whitepapers / guidance
Launch date estimated Summer 2010
Project “Madison” MTP 2 Program to Launch (fully functional, fully performant)
TAP Program (on client site)
RTM in Summer 2010
Parallel Data Warehouse
PDW vNext
Focus on continually lowering the costs of high end DW, while increasing performance
Additional Hardware Partners
Additional functionality
Further integration with MS stack
?
Database Servers
Dual Infiniband
Control Nodes
Active / Passive
Spare Database Server
Dual Fiber Channel
Date Dim
D_DATE_SK
D_DATE_ID
D_DATE
D_MONTH
…
Store Sales
Ss_sold_date_sk
Ss_item_sk
Ss_customer_sk
Ss_cdemo_sk
Ss_store_sk
Ss_promo_sk
Ss_quantity
…
Promotion
P_PROMO_SK
P_PROMO_ID
P_START_DATE_SK
P_END_DATE_SK
…
Customer
C_CUSTOMER_SK
C_CUSTOMER_ID
C_CURRENT_ADDR
…
Item
I_ITEM_SK
I_ITEM_ID
I_REC_START_DATE
I_ITEM_DESC
…
Store
S_STORE_SK
S_STORE_ID
S_REC_START_DATE
S_REC_END_DATE
S_STORE_NAME
…
Customer
Demographics
CD_DEMO_SK
CD_GENDER
CD_MARITAL_STATUS
CD_EDUCATION
…
Table row counts shown on the diagram:
1 Trillion Rows
100 Million
73,049
1.92 Million
1,902
2,500
502,000
Date and time restrictions
4 hours
~50M rows selected
Aggregates
HAVING filter
18 s
Restrictions
Date
1 year
Item
Aggregates
HAVING filter
13 s
Existing Environment
Hardware
16 CPU HP 8620 Itanium
Hitachi Storage 27TB Raw
SATA 21 LUNS
Software
Windows 2003 SP2
SQL Server 2008
SSIS/SSRS
Data Warehouse
18 Terabytes
Star Schema
80 Fact Tables
500+ Dimensions
Current Challenges
Data Load Speeds
Analytic Capacity
Analytic Speed
Mixed Workload
Total Cost of Ownership
Madison Highlights
Improved by 300%
30TB/160 Cores
Query Speeds 70X Improvement
Concurrency
Mixed Workload
TCO Lowered by 50%
EDW provides “single version of truth” but makes it difficult to support mixed workloads and multiple user groups, each requiring SLAs
Departmental data marts enable mixed workloads, but make it difficult to consolidate information across the enterprise
A Hub and Spoke solution gives you the flexibility to add/change diverse workloads/user groups, while maintaining data consistency across the enterprise
Parallel database copy technology enables rapid data integration and consistency between hub and spokes
Create SQL Server 2008, Fast Track Data Warehouse, and SQL Server Analysis Services spokes
Support user groups with very different SLAs; hot, warm and cold data; different requirements on data loading, etc.
• Seamless and secure connections
• Rich and natural expressions
• Precise and anticipative insights
“The vision is not an attempt to predict the future, but an attempt to articulate the kinds of software experiences we want to be able to deliver to our customers in the future.”
• Real-time language translation
• Low-cost, multi-touch displays
• E-Ink
• Natural user interfaces
• Dynamic data visualizations
• Semantic meta-data
• Location-based services
• Sensor networks
• Contextual information retrieval
• Augmented reality
http://www.microsoft.com/video/en/us/details/e7728af1-3fe4-4e25-a907-3dbf689fe11a
Visit
www.microsoft.com/fasttrack
www.microsoft.com/madison
Visit the SQL Server DW Portal on TechNet
http://technet.microsoft.com/en-gb/sqlserver/dd421879.aspx
Local contacts
Rasmus Johansson
[email protected] 050-4999 589
Pekka Pykäläinen
[email protected] 040-551 8478
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.