The Query Mesh Project: A Powerful Multi-Route Query Processing Paradigm
New England Database Summit 2010
Elke A. Rundensteiner, Worcester Polytechnic Institute
rundenst@cs.wpi.edu
Elisa Bertino, Purdue University
bertino@cs.purdue.edu
Rimma V. Nehme, Microsoft Jim Gray Systems Lab
rimman@microsoft.com
Thanks go to NSF grant 0917017 for partial support of this project.
Motivation
A variety of modern applications face data with non-uniform characteristics: ubiquitous healthcare, location-based services, financial tickers, network monitoring…
[Figure: data sources feed a database engine. For a query (SELECT * FROM …), the query optimizer uses overall statistics and plan costs to pick a query execution plan, which the query executor runs over the data to produce query results.]
“I want my results quickly. I don’t care how exactly they are computed.”
Typically ONE execution plan for ALL DATA.
Concrete Example: Network Monitoring
[Figure: network packets arrive as data streams at a DSMS. The query optimizer turns a continuous query (SELECT * FROM …) into a query execution plan that produces query results. Single-plan query processing is contrasted with multi-plan (multi-route) query processing using Plan 1, Plan 2, and Plan 3.]
Opportunity for improvement: it may be more efficient to use different plans for different subsets of data.
• This example uses streaming data
• Similar examples can be found with static data
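To make the opportunity concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the packet data, the two predicates `is_udp` and `is_large`, and the cost measure (one unit per predicate evaluation, with early exit). It compares the best single filter order for all data with the best order chosen separately for each subset, and it ignores the cost of deciding which subset a packet belongs to.

```python
from itertools import permutations

# Hypothetical packets as (protocol, size_bytes); the data is made up for illustration.
large = [("tcp", 1500)] * 80 + [("udp", 1500)] * 20
small = [("udp", 60)] * 100

# Two hypothetical selection predicates, each costing one unit per evaluation.
PREDICATES = {
    "is_udp": lambda p: p[0] == "udp",
    "is_large": lambda p: p[1] > 1000,
}

def plan_cost(plan, tuples):
    """Count predicate evaluations when filters run in `plan` order with early exit."""
    evaluations = 0
    for t in tuples:
        for name in plan:
            evaluations += 1
            if not PREDICATES[name](t):
                break  # tuple is dropped, later filters are skipped
    return evaluations

def best_plan(tuples):
    """Pick the cheapest filter order for the given tuples by trying every order."""
    return min(permutations(PREDICATES), key=lambda plan: plan_cost(plan, tuples))

all_data = large + small
single_plan = best_plan(all_data)                 # one plan for all data
single_cost = plan_cost(single_plan, all_data)
# One plan per subset; the cost of classifying packets into subsets is ignored here.
multi_cost = sum(plan_cost(best_plan(subset), subset) for subset in (large, small))

print(single_plan, single_cost, multi_cost)
```

On this toy data the per-subset plans need fewer predicate evaluations than the best single plan, because each subset favors a different filter order.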
Outline
• Introduction & Motivation
• Background: Query Mesh
   - Model
   - Optimization
   - Execution
• Dynamic Re-Optimization with Query Mesh
   - Challenges
   - Architecture
   - Details
   - Experimental Evaluation
• Ongoing and Future Work
• Conclusion
Multi-Plan Query Processing Using Query Mesh
(Here, route = execution plan.)
Query Mesh provides a middle ground between systems with a single pre-computed route and systems with multiple runtime routes.
• Traditional query optimization: single “route-oriented” solution; coarse optimization, small overhead.
• Eddies and its descendants: multi “route-less” solution; fine-granularity optimization, significant overhead.
• Query Mesh (a classifier plus multiple routes): multi “route-oriented” solution; fine-granularity optimization, less overhead.
[Figure: Physical architecture of the Query Mesh framework]
Query Mesh Search Space
Search space: the set of all possible solutions.
The set of training tuples {1,2,3,4}* has cardinality n = 4.
Query Mesh lattice-shaped search space:
1234 (one plan for all data)
14/23   1/234   124/3   13/24   123/4   134/2   12/34
1/23/4   14/2/3   1/24/3   13/2/4   12/3/4   1/2/34
1/2/3/4 (each subset has an individual route)
* We denote {{1},{2,3}} as “1/23” for brevity.
Search Space Complexity
The size of the search space is the Bell number B_n, the sum over k of the Stirling numbers of the second kind S(n, k).
The Stirling number of the second kind S(n, k) is the number of ways to partition a set of cardinality n into exactly k nonempty subsets.
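As a quick check of these counts, the following Python sketch enumerates every partition of the four training tuples and groups them by number of subsets. The function names `set_partitions` and `label` are illustrative, not part of Query Mesh. For n = 4 it gives S(4,1) = 1, S(4,2) = 7, S(4,3) = 6, and S(4,4) = 1, so B_4 = 15, matching the 15 nodes of the lattice.

```python
from collections import Counter

def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition

def label(partition):
    """Render a partition in the slide's shorthand, e.g. {{1},{2,3}} becomes "1/23"."""
    blocks = sorted("".join(str(x) for x in sorted(block)) for block in partition)
    return "/".join(blocks)

partitions = list(set_partitions([1, 2, 3, 4]))
stirling = Counter(len(p) for p in partitions)   # maps k to S(4, k)
bell_4 = len(partitions)                         # B_4
labels = sorted(label(p) for p in partitions)

print(bell_4, dict(stirling))
```

The count grows quickly with n, which is why exhaustive enumeration only works for very small training sets.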
Query Mesh Optimization Problem
Query Mesh Cost Model (main idea)
Cost(QM) = Cost of Classifier + Cost of routes + Multi-route overhead
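The slide gives only the three-term sum. The sketch below shows one way such a sum could be evaluated. How each term is estimated is an assumption made here for illustration: a per-tuple classification cost, a per-tuple cost for each route, and a fixed overhead per route. The `Route` class and all parameter names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    cost_per_tuple: float   # hypothetical: estimated cost of one tuple on this route
    expected_tuples: int    # hypothetical: tuples expected to be sent to this route

def query_mesh_cost(routes, classify_cost_per_tuple, overhead_per_route):
    """Cost(QM) = cost of classifier + cost of routes + multi-route overhead."""
    total_tuples = sum(r.expected_tuples for r in routes)
    classifier_cost = classify_cost_per_tuple * total_tuples       # assumed linear in tuples
    routes_cost = sum(r.cost_per_tuple * r.expected_tuples for r in routes)
    multi_route_overhead = overhead_per_route * len(routes)        # assumed fixed per route
    return classifier_cost + routes_cost + multi_route_overhead

routes = [Route("r1", 2.0, 100), Route("r2", 1.0, 50)]
total = query_mesh_cost(routes, classify_cost_per_tuple=0.5, overhead_per_route=5.0)
print(total)
```

Under these assumptions, adding routes lowers the route term only if the specialized plans are cheap enough to pay for the extra classification and per-route overhead.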
Query Mesh Search Algorithms
Optimal Query Mesh Search (Opt-QM)
Main idea:
(1) Form all possible subsets of the training tuples (the powerset)
(2) Form partitions out of the above sets
Too expensive! Need heuristics!
Query Mesh Search Heuristics
[Figure: a search path from a start solution to a final solution; marked points are explored solutions]
Three components of search heuristics:
(1) Start solution
   - 5 different approaches: extreme-1, extreme-N, random, content-driven, route-driven
   - Experimentally evaluated
(2) Search strategy
   - Randomized algorithms: iterative improvement, simulated annealing
(3) Stop condition
   - Largely depends on the search strategy employed
   - K-iterations, plateau, time-bounded, resource-bounded
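The three heuristic components can be sketched in Python as follows. This is a generic iterative-improvement loop under stated assumptions, not the authors' implementation. It reads "extreme-1" as the single-group start solution, uses a K-iterations stop condition, and accepts only moves that lower the cost. The cost function `toy_cost` is invented for the example and stands in for the Query Mesh cost model.

```python
import random

def toy_cost(assignment, values=(1, 1, 1, 9, 9, 9), per_group_overhead=2.0):
    """Invented cost: within-group spread of tuple values plus a fixed cost per group."""
    groups = {}
    for value, group_id in zip(values, assignment):
        groups.setdefault(group_id, []).append(value)
    spread = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        spread += sum((v - mean) ** 2 for v in members)
    return spread + per_group_overhead * len(groups)

def iterative_improvement(n_tuples, cost, k_iterations=200, seed=0):
    """Randomized downhill search over partitions, encoded as tuple-to-group assignments."""
    rng = random.Random(seed)
    current = [0] * n_tuples                 # start solution: all tuples in one group
    current_cost = cost(current)
    for _ in range(k_iterations):            # stop condition: K iterations
        candidate = current[:]
        i = rng.randrange(n_tuples)
        candidate[i] = rng.randrange(n_tuples)   # move one tuple to some group id
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:        # keep only improving moves
            current, current_cost = candidate, candidate_cost
    return current, current_cost

start_cost = toy_cost([0] * 6)
assignment, final_cost = iterative_improvement(6, toy_cost)
print(start_cost, assignment, final_cost)
```

A downhill-only search can stop in a local minimum. Simulated annealing, the other strategy named above, sometimes accepts worse moves to escape one.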
Query Mesh Optimization Overview
[Figure: QM Optimizer and QM Executor over a data stream t1, t2, …, t12, …. Labels in the figure: sample of tuples (training dataset), taken repeatedly from the stream; compute routes (i.e., plans); induce classifier; the resulting Query Mesh with routes r1, r2, r3, r4.]
[NWRB09] R. Nehme, K. Works, E. Rundensteiner and E. Bertino, Query Mesh: Multi-Route Query Processing Technology (Demo), In VLDB 2009.
Query Mesh Execution Overview
[Figure: QM Optimizer and QM Executor over a data stream t1, t2, …, t12, …]
Classification window (tumbling window): t1 to t10
After classification (groups, routes, and r-tokens in the order shown in the figure):
• route r1, r-token <1,4,3,2>: t1, t3, t4, t5
• route r2, r-token <2,4,3,1>: t2, t6, t9
• route r3, r-token <3,4,1,2>: t7, t8, t10
An r-token together with its data tuples forms a ruster. Rusters are sent to the Self-Routing Fabric.
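The classification step can be sketched in Python as follows, using the window contents from the figure. Several things here are assumptions: the classifier is a stand-in that reads a precomputed `route_hint` field, the r-tokens are read as operator orderings, and the hints for t11 and t12 are made up because the figure does not show them.

```python
from collections import defaultdict

# Assumed reading: each r-token lists operator ids in the order a ruster visits them.
ROUTES = {"r1": (1, 4, 3, 2), "r2": (2, 4, 3, 1), "r3": (3, 4, 1, 2)}

def classify(tuple_):
    """Stand-in classifier (an assumption): the route is stored on the tuple."""
    return tuple_["route_hint"]

def tumbling_windows(stream, size):
    """Cut the stream into consecutive, non-overlapping windows of `size` tuples."""
    window = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield window
            window = []
    if window:
        yield window  # final, possibly shorter window

def build_rusters(window):
    """Group a window's tuples by route and attach the route's r-token to each group."""
    groups = defaultdict(list)
    for t in window:
        groups[classify(t)].append(t["id"])
    return [{"r_token": ROUTES[r], "tuples": ids} for r, ids in sorted(groups.items())]

# Route hints for t1..t10 follow the figure; t11 and t12 are made up.
hints = ["r1", "r2", "r1", "r1", "r1", "r2", "r3", "r3", "r2", "r3", "r1", "r2"]
stream = [{"id": f"t{i + 1}", "route_hint": h} for i, h in enumerate(hints)]

windows = list(tumbling_windows(stream, size=10))
first_rusters = build_rusters(windows[0])
print(first_rusters)
```

Grouping tuples this way means one r-token is shared by all tuples in the ruster instead of being attached to each tuple.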
Dynamic Re-optimization with Query Mesh
Can we have an execution strategy that
• is plan-based,
• supports different plans for distinct subsets of data, and
• is as adaptive as Eddies?
Self-Tuning Query Mesh (ST-QM)
[NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing, In EDBT 2009.
Outline
• Introduction & Motivation
• Background: Query Mesh
   - Model
   - Optimization
   - Execution
• Dynamic Re-Optimization with Query Mesh
   - Challenges
   - Architecture
   - Details
• Conclusion
• Current and Future Work
Challenges
[Figure: a Query Mesh (classifier plus multiple routes) becomes a Self-Tuning Query Mesh]
1. What should be monitored to determine whether the current QM solution is no longer adequate?
2. How to determine if the current QM solution should be adapted?
3. How to efficiently execute the physical migration from the current QM to a new QM solution while the query is being executed?
Contributions
1. Data and statistics monitoring
2. Concept drift analysis, QM cost model, improvement measure
3. Single lightweight operation to physically adapt QM
ST-QM Architecture
[Figure: the static QM framework (query optimizer and query executor) next to the adaptive QM framework, in which ST-QM is added alongside the query optimizer and query executor.]
ST-QM Components
• ST-QM Monitor continuously samples data and execution statistics that are used to determine whether a concept drift has occurred (i.e., whether the QM needs to be adapted).
• ST-QM Analyzer determines whether a concept drift has actually occurred and recommends if and how the QM solution should be adapted.
• ST-QM Actuator takes these recommendations and physically adapts the QM solution.
[Figure: ST-QM samples the running Query Mesh. The Monitor passes measurements to the Analyzer, the Analyzer passes recommendations to the Actuator, and actuation produces the new Query Mesh.]
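The interplay of the three components can be sketched in Python as follows. This is only an illustration of the monitor, analyzer, actuator loop. The drift test (comparing observed per-route tuple shares with expected shares against a threshold), the version counters in the mesh, and the choice of recommendation are all assumptions invented for the example, not the ST-QM algorithms.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    new_classifier: bool
    new_routes: bool

class Monitor:
    """Samples which route each tuple took (a stand-in for data and execution statistics)."""
    def __init__(self):
        self.counts = {}

    def sample(self, route):
        self.counts[route] = self.counts.get(route, 0) + 1

    def measurements(self):
        total = sum(self.counts.values())
        return {route: n / total for route, n in self.counts.items()} if total else {}

class Analyzer:
    """Hypothetical drift test: flag drift when a route's share moves past a threshold."""
    def __init__(self, expected_shares, threshold):
        self.expected_shares = expected_shares
        self.threshold = threshold

    def recommend(self, measurements):
        routes = set(self.expected_shares) | set(measurements)
        drift = max(
            (abs(measurements.get(r, 0.0) - self.expected_shares.get(r, 0.0)) for r in routes),
            default=0.0,
        )
        if drift <= self.threshold:
            return None  # no adaptation needed
        # Which parts to replace is chosen arbitrarily for this sketch.
        return Recommendation(new_classifier=True, new_routes=False)

class Actuator:
    """Applies a recommendation by bumping hypothetical version counters."""
    def apply(self, mesh, recommendation):
        adapted = dict(mesh)
        if recommendation.new_classifier:
            adapted["classifier_version"] += 1
        if recommendation.new_routes:
            adapted["routes_version"] += 1
        return adapted

monitor = Monitor()
for route in ["r1"] * 9 + ["r2"]:
    monitor.sample(route)

analyzer = Analyzer(expected_shares={"r1": 0.5, "r2": 0.5}, threshold=0.2)
recommendation = analyzer.recommend(monitor.measurements())
mesh = {"classifier_version": 1, "routes_version": 1}
new_mesh = Actuator().apply(mesh, recommendation) if recommendation else mesh
print(recommendation, new_mesh)
```

Keeping the analyzer's output as a small recommendation object keeps the actuator independent of how drift was detected.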
ST-QM Actuator: Physical Query Mesh Adaptation
All possible recommendations:
• Case 1: Virtual Concept Drift Recommendation
• Case 2: Real Concept Drift Recommendation
• Case 3: Hybrid Concept Drift Recommendation
Adapted Query Meshes shown in the figure:
• R1: New Classifier + Old Routes
• R2: Old Classifier + New Routes
• R3: New Classifier + New Routes
Classifier Modification
[Figure: data enters the Online Classifier, where the current classifier is replaced by the new classifier. Rusters for routes r1, r2, r3 flow into the Self-Routing Fabric (OI-array and op-modules), which emits query results.]
The beauty of the proposed design!
Experimental Evaluation
• ST-QM was implemented inside CAPE, a Java-based continuous query engine.
• We compared the performance of adaptive QM against competitor systems:
   - Static (non-adaptive) QM
   - Adaptive “plan-less” Eddies
   - Adaptive “plan-less” Eddies with a CBR-based routing policy
• Results can be found in EDBT 2010.
Summary of ST-QM Experimental Results
• ST-QM gave up to 44% improvement in execution time and output rate compared to non-adaptive QM, Eddy, and the single-plan execution approach.
• The runtime overhead of ST-QM relative to query execution is small (on average 2%).
• The actuation cost of physical adaptivity is nearly negligible, amounting to 0.02% of the total execution cost.
• Even if no adaptivity is needed, ST-QM is in the worst case at most 2-3% slower than static QM.
Conclusion
• Query Mesh is a practical query optimization approach
   - Eliminates the single-plan assumption
   - Feasibility shown
   - Has low overhead & high potential benefit
   - Easily implemented and integrated with existing systems
• Query Mesh leads to novel solutions
   - Usage of machine learning in query optimization and query processing
   - Usage of network-inspired techniques in query optimization and query processing
Next Steps in QM Project
• Consider state caching and indexing in QM stream context
• Work with alternate classification methods for route decisions
• Design customized query optimization and processing strategies
• Study multi-query processing and optimization
• Scale by applying distributed processing technologies
• Do QM principles also apply in a static DB context?