Dynamic Plan Migration for Continuous Query over Data Streams
Yali Zhu, Elke Rundensteiner and George Heineman
Database System Research Group, WPI, Massachusetts, USA
SIGMOD 2004
*Research partly supported by the RDC grant 2003-04 on “On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond.”


Page 1: Dynamic Plan Migration for Continuous Query over Data Streams


Page 2: Dynamic Plan Migration for Continuous Query over Data Streams

SIGMOD 2004 2

Stream Query Optimization

Differences from traditional query optimization?

Page 3: Dynamic Plan Migration for Continuous Query over Data Streams

Stream Query Optimization

- New classes of operators (windows) may mean new rewrites
- New execution modes (continuous/pipelining)
- More dynamic fluctuations in statistics
- Compile-time optimization not possible
- Global optimization not practical, as query networks are huge
- Adaptive optimization
- Other cost models taking memory into account
- Query optimization and load shedding

Page 4: Dynamic Plan Migration for Continuous Query over Data Streams

Motivation of ‘Query Migration’

- Continuous query over streams
  - Statistics unknown before start
  - Statistics changing during execution: stream rates, arrival pattern, distribution, etc.
- Need for dynamic adaptation
  - Plan re-optimization: change the shape of the query plan tree

Page 5: Dynamic Plan Migration for Continuous Query over Data Streams

Run-time Plan Re-Optimization

- Step 1: Decide when to optimize (Statistics Monitoring)
- Step 2: Generate new query plan (Query Optimization)
- Step 3: Replace current plan by new plan (Plan Migration)

Page 6: Dynamic Plan Migration for Continuous Query over Data Streams

Naïve Plan Migration Strategy

Migration steps
- Pause execution of old plan
- Drain out all tuples inside old plan
- Replace old plan by new plan
- Resume execution of new plan

[Figure: two query plans (old and new), each with join operators AB and BC over input streams A, B, and C]

Problem: Works for stateless operators only

Page 7: Dynamic Plan Migration for Continuous Query over Data Streams

Stateful Operator in CQ

Why stateful?
- Need non-blocking operators in CQ
- Operator needs to output partial results
- State data structure keeps received tuples

[Figure: join operator AB over inputs A and B, with State A holding ax and State B holding b1 to b5; outputs ax b2 and ax b3]

Key Observation: The purge of tuples in states relies on processing of new tuples.

Example: Symmetric NL join w/ window constraints
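The key observation can be made concrete with a minimal sketch of a symmetric nested-loop window join. It assumes tuples arrive with non-decreasing timestamps and that a stored tuple expires once it is at least `window` time units older than the arriving tuple. The class, method, and variable names are illustrative and are not taken from CAPE.

```python
from collections import deque


class SymmetricWindowJoin:
    """Symmetric nested-loop join of inputs A and B under a time window.

    Hypothetical sketch: the names and the exact expiry rule are assumptions.
    """

    def __init__(self, window, predicate):
        self.window = window
        self.predicate = predicate  # predicate(a_value, b_value) -> bool
        self.states = {"A": deque(), "B": deque()}  # (timestamp, value) pairs

    def process(self, side, ts, value):
        other = "B" if side == "A" else "A"
        opposite = self.states[other]
        # Purge: only the arrival of a new tuple expires tuples in the opposite state.
        while opposite and ts - opposite[0][0] >= self.window:
            opposite.popleft()
        # Probe: join the new tuple with every surviving tuple in the opposite state.
        results = []
        for _, other_value in opposite:
            a, b = (value, other_value) if side == "A" else (other_value, value)
            if self.predicate(a, b):
                results.append((a, b))
        # Insert: keep the new tuple in its own state for future probes.
        self.states[side].append((ts, value))
        return results


join = SymmetricWindowJoin(window=10, predicate=lambda a, b: a[0] == b[0])
for ts, b in [(1, ("x", "b1")), (2, ("y", "b2")), (3, ("x", "b3"))]:
    join.process("B", ts, b)
matches = join.process("A", 4, ("x", "ax"))  # ax joins with b1 and b3
join.process("A", 50, ("z", "az"))           # purges State B, not State A
```

After the last call, State B is empty, while State A still holds the stale tuple ax because nothing has arrived on B to trigger its purge. A paused plan therefore cannot empty its states by waiting.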

Page 8: Dynamic Plan Migration for Continuous Query over Data Streams

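Naïve Migration Strategy Revisited

Steps
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan

[Figure: old plan with join operators AB and BC over inputs A, B, C, annotated with (2) all tuples drained, (3) old replaced by new, (4) processing resumed]

Problem: Deadlock waiting. Draining the states in step (2) relies on processing new tuples, but execution was paused in step (1).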

Page 9: Dynamic Plan Migration for Continuous Query over Data Streams

Problem Definition: Dynamic Plan Migration

Input (two migration boxes)
- One contains old plan
- One contains new plan
- Have same input and output queues

Result
- Old box is replaced by new box

Valid Migration
- No missing tuples
- No duplicates

[Figure: old box and new box, each with join operators AB, BC, and CD over input queues QA, QB, QC, QD and output queue QABCD. States in the old box: SA, SB, SAB, SC, SABC, SD. States in the new box: SB, SC, SBC, SD, SA, SBCD.]

Key points:
- Involved plans contain stateful operators
- Need to migrate yet still retain useful states and discard useless states

Page 10: Dynamic Plan Migration for Continuous Query over Data Streams

State of the Art

- “Efficient mid-query re-optimization of sub-optimal query execution plans” [Kabra, DeWitt 1998]
  - Only migrates unprocessed portion
- Query plan competing model [Ioannidis, Ng, et al. 1992] [Graefe, Cole 1994]
  - Generate several candidate query plans before start
  - Execute all, choose one after a while

Page 11: Dynamic Plan Migration for Continuous Query over Data Streams

Outline

- Problem Motivation and Definition
- Dynamic Migration Strategies
  - Moving State Strategy
  - Parallel Track Strategy
- Experimental Results

Page 12: Dynamic Plan Migration for Continuous Query over Data Streams

Moving State Strategy

Basic idea
- Share common states between two migration boxes

Key steps
- State Matching: match states based on IDs
- State Moving: create new pointers for matched states in new box

What’s left?
- Unmatched states in new box

[Figure: Old Box and New Box, each over input queues QA, QB, QC, QD with output queue QABCD. Old Box: join AB (states SA, SB), join BC (states SAB, SC), join CD (states SABC, SD). New Box: join BC (states SB, SC), join CD (states SBC, SD), join AB (states SA, SBCD).]
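State matching and state moving can be sketched as follows. This is an illustration under assumptions: state IDs are modelled as plain strings such as "SA", each box is a dictionary from ID to state object, and the function name is hypothetical.

```python
def match_and_move(old_states, new_state_ids):
    """Share matched states with the new box and report the unmatched ones.

    old_states: dict mapping state ID -> state object in the old box.
    new_state_ids: IDs of the states that the new plan needs.
    """
    new_states = {}
    unmatched = []
    for state_id in new_state_ids:
        if state_id in old_states:
            # State moving: the new box points at the same object, no copy is made.
            new_states[state_id] = old_states[state_id]
        else:
            unmatched.append(state_id)
    return new_states, unmatched


# State IDs of the two boxes in the slide's example.
old_box = {sid: [] for sid in ("SA", "SB", "SAB", "SC", "SABC", "SD")}
new_box, unmatched = match_and_move(
    old_box, ["SB", "SC", "SBC", "SD", "SA", "SBCD"]
)
```

With these IDs, SA, SB, SC, and SD are shared by reference, and `unmatched` contains SBC and SBCD.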

Page 13: Dynamic Plan Migration for Continuous Query over Data Streams

Unmatched States

State Recomputing
- Recursively recompute unmatched SBC and SBCD from bottom up

Why always possible?
- Old and new boxes have same input queues
- The states associated with input queues always match

Why necessary?

[Figure: New Box with join BC (states SB, SC), join CD (states SBC, SD), and join AB (states SA, SBCD) over input queues QA, QB, QC, QD and output queue QABCD]
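A bottom-up recomputation of the unmatched states might look like the sketch below. It assumes a nested-loop join, ignores window constraints for brevity, and uses a toy equality predicate on the numeric suffix of each sub-tuple. The `children` mapping, `same_key`, and the sample data are all hypothetical.

```python
def recompute_state(state_id, children, states, predicate):
    """Return the contents of state_id, recomputing it bottom-up if missing.

    children: dict mapping an intermediate state ID -> (left ID, right ID).
    states: dict of states already available; filled in as a side effect.
    """
    if state_id in states:
        return states[state_id]
    left_id, right_id = children[state_id]
    left = recompute_state(left_id, children, states, predicate)
    right = recompute_state(right_id, children, states, predicate)
    # Nested-loop join of the two child states (an assumed join algorithm).
    states[state_id] = [
        left_tuple + right_tuple
        for left_tuple in left
        for right_tuple in right
        if predicate(left_tuple, right_tuple)
    ]
    return states[state_id]


def same_key(left_tuple, right_tuple):
    # Toy predicate: the adjacent sub-tuples carry the same numeric suffix.
    return left_tuple[-1][1:] == right_tuple[0][1:]


children = {"SBC": ("SB", "SC"), "SBCD": ("SBC", "SD")}
states = {
    "SA": [("a1",)],
    "SB": [("b1",), ("b2",)],
    "SC": [("c1",)],
    "SD": [("d1",), ("d3",)],
}
sbcd = recompute_state("SBCD", children, states, same_key)
```

The recursion bottoms out at SB, SC, and SD, which are states of the input queues and therefore always matched.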

Page 14: Dynamic Plan Migration for Continuous Query over Data Streams

Terms on Tuples

New/Old tuples
- Old: tuples already in old box when migration starts
- New: tuples that do not exist in old box when migration starts

Sub-tuples
- Tuple ABCD is the result of joining tuples A, B, C and D
- Tuples A, B, C and D are sub-tuples of tuple ABCD
- Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples

[Figure: old box with join operators AB, BC, and CD over inputs A, B, C, D (queues QA, QB, QC, QD), states SA, SB, SAB, SC, SABC, SD, and output queue QABCD]
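The 16 combinations can be enumerated directly. The snippet below is only an illustration of the counting argument; the variable names are arbitrary.

```python
from itertools import product

streams = ("A", "B", "C", "D")
# Each sub-tuple of ABCD is independently old or new: 2 * 2 * 2 * 2 cases.
combinations = [
    dict(zip(streams, labels)) for labels in product(("old", "new"), repeat=4)
]
all_old = combinations[0]
all_new = combinations[-1]
```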

Page 15: Dynamic Plan Migration for Continuous Query over Data Streams

Why Recompute Unmatched States

To get the complete results of ABCD, we need all 16 old/new combinations.

[Figure: new box with join operators BC, CD, and AB over input queues QA, QB, QC, QD, with states SA, SB, SC, SD, SBC, SBCD; a legend distinguishes old tuples from new tuples in example ABCD results]

If SBC is not recomputed, we will miss results with both B and C as OLD.

Page 16: Dynamic Plan Migration for Continuous Query over Data Streams

Cost Estimation of MS Migration

Cost of MS consists of
- Cost of state matching: ID comparison (negligible)
- Cost of state moving: create pointers (negligible)
- Cost of state recomputing: majority of cost

Affecting parameters
- Operator selectivities
- # of tuples in states, estimated as (input rate x window size)

See paper for detailed cost models.

One cost model conclusion: Cost of MS has a polynomial relation to window size.
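To see where a polynomial relation could come from, here is a back-of-the-envelope sketch. It is not the paper's cost model. It assumes each input state holds (input rate x window size) tuples, that the unmatched states SBC and SBCD are recomputed by nested-loop joins, and that cost is counted in tuple comparisons. The function name and the single shared selectivity are hypothetical.

```python
def ms_recompute_comparisons(rates, window, selectivity):
    """Rough comparison count for recomputing SBC and then SBCD."""
    # Estimated number of tuples in each input state: rate x window.
    size = {stream: rate * window for stream, rate in rates.items()}
    cost_sbc = size["B"] * size["C"]    # nested loop over SB and SC
    size_sbc = selectivity * cost_sbc   # tuples that end up in SBC
    cost_sbcd = size_sbc * size["D"]    # nested loop over SBC and SD
    return cost_sbc + cost_sbcd


rates = {"B": 1.0, "C": 1.0, "D": 1.0}
small = ms_recompute_comparisons(rates, window=2, selectivity=0.5)  # 8.0
large = ms_recompute_comparisons(rates, window=4, selectivity=0.5)  # 48.0
```

Under these assumptions the count has a term proportional to W^2 and one proportional to W^3, so doubling the window from 2 to 4 raises the estimate from 8 to 48 comparisons.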

Page 17: Dynamic Plan Migration for Continuous Query over Data Streams

MS Migration Pros and Cons

Pros
- Fast when # of tuples in states is small: low input rates, low selectivity or small window

Cons
- Output silence during entire migration stage

Can the query produce output even during migration? This is the motivation for the Parallel Track Strategy.

Page 18: Dynamic Plan Migration for Continuous Query over Data Streams

Parallel Track Strategy

Basic idea
- Execute both plans in parallel and gradually “push” old tuples out of old box by purging

Key steps
- Connect boxes
- Execute in parallel until old box “expired” (no old tuple or sub-tuple)
- Disconnect old box
- Start executing new box only

[Figure: old box and new box connected to the same input queues QA, QB, QC, QD and output queue QABCD. Old box: join AB (states SA, SB), join BC (states SAB, SC), join CD (states SABC, SD). New box: join BC (states SB, SC), join CD (states SBC, SD), join AB (states SA, SBCD).]
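The expiry condition can be sketched as a simple check over the states of the old box. The representation is an assumption: each stored tuple is a tuple of (name, arrival timestamp) sub-tuples, and a sub-tuple counts as new if it arrived at or after the migration start time. The function names are hypothetical.

```python
def is_all_new(tup, migration_start):
    """True if every sub-tuple arrived at or after the migration start."""
    return all(ts >= migration_start for _, ts in tup)


def old_box_expired(states, migration_start):
    """True once no state of the old box holds an old tuple or sub-tuple."""
    return all(
        is_all_new(tup, migration_start)
        for state in states.values()
        for tup in state
    )


migration_start = 100
during = {
    "SA": [(("a1", 95),), (("a2", 120),)],
    "SAB": [(("a1", 95), ("b1", 110))],
}
after = {
    "SA": [(("a2", 120),)],
    "SAB": [(("a2", 120), ("b2", 130))],
}
still_running = not old_box_expired(during, migration_start)
can_disconnect = old_box_expired(after, migration_start)
```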

Page 19: Dynamic Plan Migration for Continuous Query over Data Streams

Potential Duplicates

Tuple ABCD
- 2^4 = 16 possible old/new sub-tuple combinations
- The same case must not be generated by both boxes; otherwise we may have duplicates

In new box
- All states start empty
- Only generates ABCD as (new,new,new,new)

In old box
- May generate all 16 cases
- Duplicates the case of (new,new,new,new)

[Figure: old box with join operators AB, BC, and CD over input queues QA, QB, QC, QD, states SA, SB, SAB, SC, SABC, SD, and output queue QABCD]

Duplicate Prevention
- At root op in old box: if both to-be-joined tuples have all-new sub-tuples, don’t join.
- Other ops in old box: proceed as normal
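The duplicate-prevention rule at the root operator of the old box can be sketched as a guard in front of the join. As an assumption, tuples are modelled as tuples of (name, arrival timestamp) sub-tuples, and a sub-tuple is new if it arrived at or after the migration start time. The function names are hypothetical.

```python
def is_all_new(tup, migration_start):
    """True if every sub-tuple arrived at or after the migration start."""
    return all(ts >= migration_start for _, ts in tup)


def old_box_root_join(left, right, migration_start, predicate):
    """Root join of the old box while both boxes run in parallel."""
    if is_all_new(left, migration_start) and is_all_new(right, migration_start):
        # The new box produces the (new, new, new, new) case, so skip it here.
        return None
    if predicate(left, right):
        return left + right
    return None


def always(left, right):
    # Placeholder join predicate that accepts every pair.
    return True


start = 100
abc_with_old_a = (("a1", 90), ("b1", 105), ("c1", 110))
abc_all_new = (("a2", 101), ("b1", 105), ("c1", 110))
d_new = (("d1", 120),)
kept = old_box_root_join(abc_with_old_a, d_new, start, always)
skipped = old_box_root_join(abc_all_new, d_new, start, always)
```

Only the root operator applies the guard. Lower operators in the old box still join all-new tuples, because their results may later combine with an old sub-tuple.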

Page 20: Dynamic Plan Migration for Continuous Query over Data Streams

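Estimation of PT Migration

Estimation formula: T_PT ≈ 2W

[Figure: timeline T from T_M-start to T_M-end covering a 1st W and a 2nd W, with old and new tuples marked, next to the old box (window W) with join operators AB, BC, and CD over input queues QA, QB, QC, QD and states SA, SB, SAB, SC, SABC, SD]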

Page 21: Dynamic Plan Migration for Continuous Query over Data Streams

PT Migration Duration

Given enough system computing resources
- New tuples processed right away
- PT migration duration ≈ 2W

If not enough system resources
- New tuples accumulated in queues
- PT migration duration > 2W

Page 22: Dynamic Plan Migration for Continuous Query over Data Streams

Cost Estimation of PT Migration

Cost of PT = cost of processing the tuples arriving during 2W in old box + cost of processing the tuples arriving during 2W in new box

Parameters: input rates, window size, selectivity (similar to MS strategy)

Page 23: Dynamic Plan Migration for Continuous Query over Data Streams

PT Migration Pros and Cons

Pros
- Keeps producing results even during migration (MS produces no results during migration)

Cons
- Migration duration is at least 2W
- MS may be faster, depending on # of tuples in states

Page 24: Dynamic Plan Migration for Continuous Query over Data Streams

Outline

- Problem Definition and Motivation
- Dynamic Migration Strategies
  - Moving State Strategy
  - Parallel Track Strategy
- Experimental Results

Page 25: Dynamic Plan Migration for Continuous Query over Data Streams

Experimental Setup

Embedded in the CAPE system
- CAPE = Continuous Adaptive Processing Engine
- A streaming query engine developed at DSRG, WPI
- VLDB’04 demo

Layers of adaptations
- Punctuation exploring
- Adaptive scheduling
- Query migration
- Dynamic distribution

Input streams
- Generated by the stream generator of CAPE
- Poisson arrival pattern

Experiments on migration duration
- Vary window size

[Figure: CAPE Runtime Engine architecture. Components: Operator Configurator, QoS Inspector, Operator Scheduler, Plan Migrator, Execution Engine, Storage Manager, Stream Receiver, Distribution Manager, Query Plan Generator, Stream / Query Registration GUI, Stream Provider. Labels: Queries, Results.]

Page 26: Dynamic Plan Migration for Continuous Query over Data Streams

Migration Duration vs. Window Size

[Figure: three plots of migration duration (ms) against window size (ms).
 Plot 1: Measured T_PT and Estimated T_PT vs. Global Window Size W.
 Plot 2: Measured T_MS with a polynomial fit, Poly. (Measured T_MS), vs. Global Window Size W.
 Plot 3: T_MS and T_PT vs. Window Size.]

Page 27: Dynamic Plan Migration for Continuous Query over Data Streams

Conclusions

- Identify problem of migration for stateful operators
- First solutions for continuous query migration
  - Moving state strategy
  - Parallel track strategy
- Embed both strategies into stream system
- Cost model and experimental evaluation
  - Cost model confirmed by experiments
  - Identify performance trade-off of the two strategies

Page 28: Dynamic Plan Migration for Continuous Query over Data Streams


Thank You

For more information, see the CAPE website:

http://davis.wpi.edu/~dsrg/CAPE/