DBTest2008 1
Testing Challenges for Extending SQL Server's Query Processor: A Case Study
Torsten Grabs, Steve Herbert, Xin (Shin) Zhang {torsteng; stevhe; xinzh}@microsoft.com
DBTest2008 2
Agenda
Motivation
Background
• Relational Data Warehousing (DW)
• SQL Server 2008 Starjoin improvement
Testing Challenge
• Extending Enterprise-class Commercial DBMS
Solution
• Iterative development process
• Multi-dimensional testing
Case Study Results
Conclusions
DBTest2008 3
Motivation
Data warehouses are huge
• Billions of rows in fact tables
• Multi-terabyte database
Query response time requirements are strict
• Interactive response times desired: <5 sec
• Ideally: speed-of-thought response time
Plan choice is CRUCIAL for good performance
User requirements are challenging
• Large input space
• Zero administration overhead
• Do not break existing customer base
DBTest2008 4
Background: Relational DW
Fact Table
• Sales (Date_Key, Product_Key, Qty_Sold, Dollars)
Dimension Tables
• Period (Date_Key, Quarter_Number, Year)
• Product (Product_Key, Product_ID, Product_Name, Category)
Business Question:
Give me the total sales of SQL Server 2005 in the second quarter of 2006.
Example Star Query:
SELECT SUM(Dollars)
FROM Sales S
JOIN Product P ON P.Product_Key = S.Product_Key
JOIN Period Pe ON Pe.Date_Key = S.Date_Key
WHERE Product_Name = 'SQL Server 2005'
  AND Quarter_Number = 2
  AND Year = 2006
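To try the star query on a toy instance of this schema, the sketch below loads the three tables into an in-memory SQLite database and runs the same join. The sample rows, the helper names `build_demo_db` and `total_sales`, and the use of SQLite instead of SQL Server are all assumptions made for illustration; only the table and column names come from the slide.

```python
import sqlite3


def build_demo_db():
    """Create the slide's star schema in memory with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Period (
            Date_Key INTEGER PRIMARY KEY, Quarter_Number INTEGER, Year INTEGER);
        CREATE TABLE Product (
            Product_Key INTEGER PRIMARY KEY, Product_ID TEXT,
            Product_Name TEXT, Category TEXT);
        CREATE TABLE Sales (
            Date_Key INTEGER REFERENCES Period(Date_Key),
            Product_Key INTEGER REFERENCES Product(Product_Key),
            Qty_Sold INTEGER, Dollars REAL);
        """
    )
    # Hypothetical sample data, not taken from the presentation.
    conn.executemany(
        "INSERT INTO Period VALUES (?, ?, ?)",
        [(1, 1, 2006), (2, 2, 2006), (3, 2, 2007)],
    )
    conn.executemany(
        "INSERT INTO Product VALUES (?, ?, ?, ?)",
        [(10, "P-10", "SQL Server 2005", "Database"),
         (11, "P-11", "Windows Vista", "OS")],
    )
    conn.executemany(
        "INSERT INTO Sales VALUES (?, ?, ?, ?)",
        [(2, 10, 3, 300.0),   # qualifies
         (2, 10, 1, 100.0),   # qualifies
         (1, 10, 2, 200.0),   # wrong quarter
         (3, 10, 5, 500.0),   # wrong year
         (2, 11, 4, 400.0)],  # wrong product
    )
    return conn


def total_sales(conn, product_name, quarter, year):
    """Run the star query: fact table joined to both dimension tables."""
    row = conn.execute(
        """
        SELECT SUM(S.Dollars)
        FROM Sales S
        JOIN Product P ON P.Product_Key = S.Product_Key
        JOIN Period Pe ON Pe.Date_Key = S.Date_Key
        WHERE P.Product_Name = ?
          AND Pe.Quarter_Number = ?
          AND Pe.Year = ?
        """,
        (product_name, quarter, year),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    print(total_sales(build_demo_db(), "SQL Server 2005", 2, 2006))
```

Only two of the five sample fact rows satisfy both dimension filters, so the query sums just those rows.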
DBTest2008 5
Background: New Feature
Fact selectivity matters for plan choice
SQL 2008 improves medium selectivity queries
Fact table selectivity (from 0% of fact rows qualify to 100% of fact rows qualify)
• High selectivity queries: seek-based plans with nested loop joins
• Medium selectivity queries: scan-based plans with bitmap hash joins
• Low selectivity queries: scan-based plans with regular hash joins
DW-specific Extensions of the SQL Query Processor
[Diagram: SQL Server Query Optimizer]
Standard optimization
• Standard (join) query optimizations
• Alternative query plans
• Cost-based plan choice
• Final query plan
Optimization extension for DW
• Schema detection
• Star query detection
• Selectivity analysis
• Star query plans
Bitmap-based semi-join reduction
[Diagram labels: Hash Join; Hash Join; Filter over Product dimension table; Filter over Store dimension table; Fact Table Scan; Join Reduction Info 1; Join Reduction Info 2; Join Reduction Processing]
Join reduction info
• SK_D1 = D1_05: surrogate key values of rows qualifying the filter over the product dimension
• SK_D2 = D2_03: surrogate key values of rows qualifying the filter over the store dimension
Rowset before join reduction
SK_D1   SK_D2   Meas1   Meas2
D1_05   D2_01   1       11
D1_01   D2_03   2       11
D1_05   D2_03   3       11
D1_05   D2_03   4       11
D1_07   D2_04   5       11
Rowset after join reduction
SK_D1   SK_D2   Meas1   Meas2
D1_05   D2_03   3       11
D1_05   D2_03   4       11
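The effect of join reduction on the example rowset can be reproduced with a short sketch. It is a simplification and not SQL Server's implementation: an exact Python set stands in for each bitmap, and the function name `join_reduction` is hypothetical. The rows and qualifying keys are the ones shown on the slide.

```python
def join_reduction(fact_rows, qualifying_keys):
    """Keep fact rows whose surrogate keys all appear in the per-dimension key sets.

    qualifying_keys maps a fact-table key column to the set of surrogate keys
    that survived the filter over the corresponding dimension table.
    """
    return [
        row
        for row in fact_rows
        if all(row[column] in keys for column, keys in qualifying_keys.items())
    ]


# Rowset before join reduction, as shown on the slide.
FACT_ROWS = [
    {"SK_D1": "D1_05", "SK_D2": "D2_01", "Meas1": 1, "Meas2": 11},
    {"SK_D1": "D1_01", "SK_D2": "D2_03", "Meas1": 2, "Meas2": 11},
    {"SK_D1": "D1_05", "SK_D2": "D2_03", "Meas1": 3, "Meas2": 11},
    {"SK_D1": "D1_05", "SK_D2": "D2_03", "Meas1": 4, "Meas2": 11},
    {"SK_D1": "D1_07", "SK_D2": "D2_04", "Meas1": 5, "Meas2": 11},
]

# Join reduction info: keys qualifying the product and store dimension filters.
QUALIFYING_KEYS = {"SK_D1": {"D1_05"}, "SK_D2": {"D2_03"}}

if __name__ == "__main__":
    for row in join_reduction(FACT_ROWS, QUALIFYING_KEYS):
        print(row)
```

Applying both key sets during the fact table scan leaves only the rows with Meas1 = 3 and Meas1 = 4, which matches the rowset after join reduction, so the hash joins above the scan see two rows instead of five.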
DBTest2008 8
Testing Challenge
Large input space of queries
• Full range of selectivity
• Mixed ad-hoc and parameterized queries
• Complex schema and workloads
Automatic feature
• Correct cost-based plan choice
• Smart plan pattern detection
• Accurate join selectivity estimation
• No knobs – no application changes required
Happy existing customers
• Significant improvements
• Negligible regressions
DBTest2008 9
Agenda
Motivation
Background
• Relational Data Warehousing (DW)
• SQL Server 2008 Starjoin improvement
Testing Challenge
• Extending Enterprise-class Commercial Server
Solution
• Iterative development process
• Multi-dimensional testing
Case Study Results
Conclusions
DBTest2008 10
Iterative Development Process
In-cycle validation of assumptions
Mitigates risk of major end-of-cycle issues
• Especially performance problems
Maximal parallelization of testing and development efforts
Quality
DBTest2008 11
Multi-Dimensional Testing
Functional testing
• Target testing to ensure core functionality
• Model-based testing to ensure coverage
Performance testing
• Component
• Benchmark
• Customer Workloads
DBTest2008 12
Functional: Target Testing
Query Results
Functional Correctness
Bitmap Filtering
DBTest2008 13
Functional: Model-Based Testing
Large number of test dimensions
• 10+ test dimensions
• Assuming 3 variations each, this generates about 60K combinations (3^10 = 59,049)
Two abstract models covering key requirements
• Schema model: database schema and data
• Query model: star-join queries built on top of the schema model
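The 60K figure follows from the cross product of the test dimensions. The sketch below counts and enumerates such a space; the dimension names and values in the usage part are invented placeholders, since the slide does not list the actual 10+ dimensions.

```python
from itertools import product


def count_combinations(num_dimensions, variations_per_dimension):
    """Size of the full cross product when every dimension has the same number of variations."""
    return variations_per_dimension ** num_dimensions


def enumerate_combinations(dimensions):
    """Yield one dict per combination of values across all test dimensions."""
    names = list(dimensions)
    for values in product(*(dimensions[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    # 10 dimensions with 3 variations each, as on the slide.
    print(count_combinations(10, 3))

    # Hypothetical dimension names and values, for illustration only.
    demo = {
        "num_dimension_tables": [1, 3, 5],
        "data_distribution": ["uniform", "skewed"],
    }
    for combination in enumerate_combinations(demo):
        print(combination)
```

Ten dimensions with three variations each give 59,049 combinations, which is the roughly 60K figure that motivates using abstract models to generate the tests.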
DBTest2008 14
Functional: Schema Model
Schema Model
Number and Classification of Tables
Relationships Between Tables
Cardinalities
Data Distributions
DBTest2008 15
Functional: Query Model
Query Model
Number of Facts
Number of Dimensions
Dimension Selectivity
Fact Aggregations
Nested Subqueries
Fact Selectivity
DBTest2008 16
Model-based Test Example
Test scenario
• Testing selectivity estimation of single fact star schema
Schema model
• Number and classification of tables: fact 1, dimension 5
• Relationships between tables: star schema
• Cardinality: fact 100K rows, dimension 10 rows each
• Data distribution: uniform
Query model
• Number of facts: 1
• Number of dimensions: 10
• Dimension selectivity: 0.4~0.8 (5 choices)
• Fact aggregation: 1 aggregation (12 possible types)
• Nested subqueries: none
• Fact selectivity: 0.1~1.0
Single test covers 5^5*12 (37,836) test cases
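A model-based generator for this scenario might look like the sketch below. It rests on assumptions that the slide does not confirm: that the five selectivity choices are 0.4, 0.5, 0.6, 0.7 and 0.8, that each of the five dimension tables independently takes one of them, and that aggregation types can be represented by an index because the slide gives only their count. Under this reading the cross product has 5^5 × 12 = 37,500 cases, so the slide's figure of 37,836 presumably includes further variations that cannot be recovered here.

```python
from itertools import product

# Assumption: "0.4~0.8 (5 choices)" means these five evenly spaced values.
DIMENSION_SELECTIVITIES = (0.4, 0.5, 0.6, 0.7, 0.8)
# Dimension tables in the schema model of the example.
NUM_DIMENSIONS = 5
# The slide gives only the number of aggregation types, not their names.
NUM_AGGREGATION_TYPES = 12


def generate_query_cases():
    """Yield one test case per (selectivity assignment, aggregation type) pair."""
    for selectivities in product(DIMENSION_SELECTIVITIES, repeat=NUM_DIMENSIONS):
        for aggregation_type in range(NUM_AGGREGATION_TYPES):
            yield {
                "dimension_selectivities": selectivities,
                "aggregation_type": aggregation_type,
            }


def count_cases():
    """Count the generated cases without materializing them."""
    return sum(1 for _ in generate_query_cases())


if __name__ == "__main__":
    print(count_cases())
    print(next(generate_query_cases()))
```

Each generated case would then be turned into a concrete star-join query over the schema model and checked for correct results and plausible selectivity estimates.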
DBTest2008 17
Performance Testing
Component
• Micro-benchmark
• Targeted Test
Workloads
• Microsoft Sales
• Retail Business
• …
Benchmarks
• TPC-H
• Decision Support
DBTest2008 18
Case Study Results
~10 different workloads
3 representative results
• Decision support workload results
• Microsoft sales data warehouse results
• Retail workload results
DBTest2008 19
Results: Decision Support Workload
Limited performance benefit for initial design
Lots of regressions initially
Good convergence over several iterations
100GB data
70+ queries
Typical DSS scenario
• Schema
• Queries
[Chart over Run 1 to Run 7. Series: % of regressed queries in workload (comparing baseline vs. star join optimization); SQL Server 2008 with star join optimization. Axes: % of regressed queries (0% to 16%); geomean query response time ratio (90% to 150%).]
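The two metrics plotted for each run can be computed from per-query response times as sketched below. Several details are assumptions, because the slides do not define them: the ratio is taken here as new time divided by baseline time (so values below 1.0 mean the new build is faster), a query counts as regressed when it is more than a hypothetical 5% slower, and the function names are invented.

```python
import math


def geomean_ratio(baseline, candidate):
    """Geometric mean of per-query candidate/baseline response time ratios."""
    ratios = [candidate[query] / baseline[query] for query in baseline]
    return math.exp(sum(math.log(ratio) for ratio in ratios) / len(ratios))


def regressed_fraction(baseline, candidate, tolerance=0.05):
    """Fraction of queries that got slower by more than the tolerance.

    The 5% default tolerance is an assumption, not a value from the slides.
    """
    regressed = [
        query
        for query in baseline
        if candidate[query] > baseline[query] * (1 + tolerance)
    ]
    return len(regressed) / len(baseline)


if __name__ == "__main__":
    # Made-up timings in seconds for three queries.
    baseline = {"q1": 10.0, "q2": 20.0, "q3": 40.0}
    candidate = {"q1": 5.0, "q2": 20.0, "q3": 80.0}
    print(geomean_ratio(baseline, candidate))
    print(regressed_fraction(baseline, candidate))
```

The geometric mean is a common choice for this kind of summary because a query that halves its time and one that doubles it cancel out, whereas an arithmetic mean of ratios would weight the slowdown more heavily.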
DBTest2008 20
Results: Microsoft Sales DW
Started with good design for performance
But: too many regressions with initial design
Converge to good result over several iterations
750GB data
50 queries
Complex queries
• > 20 joins
[Chart over Run 1 to Run 7. Series: % of regressed queries in workload (comparing baseline vs. star join optimization); SQL Server 2008 with star join optimization. Axes: % of regressed queries (0% to 16%); geomean query response time (90% to 150%).]
DBTest2008 21
Results: Retail Workload
Several iterations to establish the “winning” design
Significant improvements after several iterations
Regressions limited to “2 wrongs make 1 right” (see Giakoumakis/Galindo-Legaria TKDE 2008)
100GB data
30 queries
Complex physical design
• Indexes
• Partitioning
[Chart over Run 3 to Run 7; the first two positions are marked "No Run". Series: % of regressed queries in workload (comparing baseline vs. star join optimization); SQL Server 2008 with star join optimization. Axes: % of regressed queries (0% to 35%); geomean query response time (90% to 125%).]
DBTest2008 22
Conclusions
Extension of the SQL Server in relational DW
• New feature with zero administration overhead
• Widely deployed system
Identified testing challenges
• Balance performance improvement and regression risk
Solution
• Iterative development and testing cycles
• Multi-dimensional testing (functional, performance)
Iterative development and testing insights
• Supports learning and adjustment during development
• Delivers well-understood results
• Leads to high-quality features
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
DBTest2008 24
Q&A
http://www.microsoft.com/sql/2008
Torsten torsteng@microsoft.com
Steve stevhe@microsoft.com
Shin shin.zhang@microsoft.com