PSoup Kevin Menard CS 561 4/11/2005



PSoup

Kevin Menard
CS 561
4/11/2005

Streaming Queries over Streaming Data

Sirish Chandrasekaran, with Michael J. Franklin
UC Berkeley
August 20, 2002
VLDB 2002

Slides are modified versions of the following original presentation:

Sirish Chandrasekaran

PSoup Insight #1: Queries and data are duals

Store new queries, apply to data that arrived earlier

Store new data, apply to queries that arrived earlier

Multiquery processing = "join" of queries and data
– Supports all three types of queries: queries over the past, continuous queries (landmark and sliding window), and hybrid queries

[Figure: Data and Queries, each with an Index, are combined to produce the Result]


Motivation?

Why another model for continuous queries?

What is wrong with how Aurora and STREAM supply responses?


Motivation: Disconnected Operation

Previous solutions stream out answers immediately
– Not feasible or suitable for all applications

Intermittent connectivity: e.g., applications on hand-held devices (as in this morning's keynote address)

Even if connected: not always interested in streaming answers

PSoup Insight #2: Separate computation from delivery

Query answers are continuously generated in the background
– Apply windows on demand to transmit "current" results

Efficient support for disconnected operation
– Low response time
– Shared computation and storage across invocations

[Figure: a Data table (DataID, R.a, R.b), a Query table (QueryID, Predicate), and a Results Structure of T/F entries indexed by data and queries; labeled operations: Register and Invoke]

PSoup Query Model

SELECT select_list
FROM from_list
WHERE where_clause
BEGIN begin_time
END end_time

WHERE clause: conjunction of boolean factors

BEGIN-END clause: system clock or sequence numbers

(begin_time, end_time):
(constant, constant) – snapshot query
(constant, variable) – landmark window query
(variable, variable) – sliding window query
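The three window forms can be told apart purely by which bounds are variable. The following is a minimal sketch of that classification, not PSoup's implementation; the `Now` marker, `window_kind`, and `resolve` are hypothetical names introduced here, and a variable bound is assumed to mean "current time minus a fixed offset".

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Now:
    """Hypothetical marker for a variable bound: the current time minus an offset."""
    offset: int = 0


Bound = Union[int, Now]


def window_kind(begin: Bound, end: Bound) -> str:
    """Classify a BEGIN-END clause by which of its bounds are variable."""
    kinds = {
        (False, False): "snapshot",
        (False, True): "landmark",
        (True, True): "sliding",
    }
    key = (isinstance(begin, Now), isinstance(end, Now))
    if key not in kinds:
        raise ValueError("(variable, constant) is not one of the three listed forms")
    return kinds[key]


def resolve(begin: Bound, end: Bound, now: int) -> Tuple[int, int]:
    """Turn the clause into concrete bounds at invocation time `now`."""
    def value(bound: Bound) -> int:
        return now - bound.offset if isinstance(bound, Now) else bound
    return value(begin), value(end)


print(window_kind(10, 20))           # snapshot
print(window_kind(10, Now()))        # landmark
print(window_kind(Now(5), Now()))    # sliding
print(resolve(Now(5), Now(), 100))   # (95, 100)
```

The bounds here are plain integers, which fits either reading of the BEGIN-END clause (system clock ticks or sequence numbers).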

Query Registration

SELECT select_list
FROM from_list
WHERE where_clause
} Standing Query Clause (SQC): to the Symmetric Join

BEGIN begin_time
END end_time
} to the Windows_Table

QueryID: handle for future query invocations

Selections over Single Stream: Arrival of New Query Specification

(a) Initial State

Data Store
ID   R.a  R.b
48   4    3
49   7    3
50   3    8
51   0    0
52   8    4

Query Store
ID   Predicate
20   0<R.a<=5
21   R.a>4 and R.b=3
22   0>R.b>4
23   R.a=4 and R.b=3

(b) Arrival of new Query

New query: Select * From R Where R.a<=4 and R.b>=3

(c) Building Query Store

BUILD: the new query is inserted into the Query Store as ID 24, with predicate R.a<=4 and R.b>=3.

(d) Probing Data Store

PROBE: query 24 probes the Data Store. Tuples 48 (R.a=4, R.b=3) and 50 (R.a=3, R.b=8) match.

(e) Inserting Results

The probe fills the column for query 24 in the Results Structure (rows: data, columns: queries):

Data ID   Query 24
48        T
49        F
50        T
51        F
52        F
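The build-then-probe steps for a new query can be sketched in a few lines. This is an illustrative sketch only: it scans the Data Store linearly, whereas the slides show indexed stores, and the names `data_store`, `query_store`, `results`, and `register_query` are assumptions made for the example. It uses the slide's data tuples and a subset of the stored queries (20, 21, and 23).

```python
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[int, int], bool]  # takes (R.a, R.b)

# Data Store from the slides: ID -> (R.a, R.b)
data_store: Dict[int, Tuple[int, int]] = {
    48: (4, 3), 49: (7, 3), 50: (3, 8), 51: (0, 0), 52: (8, 4),
}

# Query Store: ID -> predicate
query_store: Dict[int, Predicate] = {
    20: lambda a, b: 0 < a <= 5,
    21: lambda a, b: a > 4 and b == 3,
    23: lambda a, b: a == 4 and b == 3,
}

# Results Structure: (data ID, query ID) -> T/F
results: Dict[Tuple[int, int], bool] = {}


def register_query(query_id: int, predicate: Predicate) -> List[int]:
    """BUILD into the Query Store, then PROBE the Data Store with the new query."""
    query_store[query_id] = predicate                 # BUILD
    for data_id, (a, b) in data_store.items():        # PROBE
        results[(data_id, query_id)] = predicate(a, b)
    return [d for d in data_store if results[(d, query_id)]]


matches = register_query(24, lambda a, b: a <= 4 and b >= 3)
print(matches)  # [48, 50]
```

Every probe outcome is recorded, not just the matches, so the Results Structure has no holes for query 24.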

Selections over Single Stream: Arrival of New Data

(a) Initial State

Data Store
ID   R.a  R.b
48   4    3
49   7    3
50   3    8
51   0    0
52   8    4

Query Store
ID   Predicate
20   0<R.a<=5
21   R.a>4 and R.b=3
22   0>R.b>4
23   R.a=4 and R.b=3
24   R.a<=4 and R.b>=3

(b) Arrival of new Data

New data: ID 53, R.a=3, R.b=6

(c) Building Data Store

BUILD: tuple 53 is inserted into the Data Store.

(d) Probing Query Store

PROBE: tuple 53 probes the Query Store. Queries 20 (0<R.a<=5) and 24 (R.a<=4 and R.b>=3) match.

(e) Inserting Results

The probe fills the row for tuple 53 in the Results Structure (rows: data, columns: queries):

Data ID   20   21   22   23   24
53        T    F    F    F    T
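The arrival of new data is the mirror image of query registration: the tuple is built into the Data Store and then probes the Query Store. The sketch below assumes a linear scan over the stored queries (the slides show an indexed Query Store), and `insert_data` and the other names are hypothetical. It uses queries 20, 21, 23, and 24 from the slides.

```python
from typing import Callable, Dict, Tuple

Predicate = Callable[[int, int], bool]  # takes (R.a, R.b)

data_store: Dict[int, Tuple[int, int]] = {
    48: (4, 3), 49: (7, 3), 50: (3, 8), 51: (0, 0), 52: (8, 4),
}

query_store: Dict[int, Predicate] = {
    20: lambda a, b: 0 < a <= 5,
    21: lambda a, b: a > 4 and b == 3,
    23: lambda a, b: a == 4 and b == 3,
    24: lambda a, b: a <= 4 and b >= 3,
}

# Results Structure: (data ID, query ID) -> T/F
results: Dict[Tuple[int, int], bool] = {}


def insert_data(data_id: int, a: int, b: int) -> Dict[int, bool]:
    """BUILD into the Data Store, then PROBE the Query Store with the new tuple."""
    data_store[data_id] = (a, b)                      # BUILD
    row = {}
    for query_id, predicate in query_store.items():   # PROBE
        row[query_id] = predicate(a, b)
        results[(data_id, query_id)] = row[query_id]
    return row


row = insert_data(53, 3, 6)
print(row)  # {20: True, 21: False, 23: False, 24: True}
```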

Query Invocation

Results Structure (rows: data, columns: queries); entries shown on the slide:

Column for query 24 (data 48-52): T F T F F
Row for data 53 (queries 20-24): T F F F T

Current Window: the portion of the Results Structure selected by BEGIN begin_time and END end_time

System returns the results corresponding to the current value of the BEGIN-END clause
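Invocation only has to apply the current window to entries that were already computed in the background. A minimal sketch, assuming data IDs double as sequence numbers and that each window bound is stored as a function of the invocation time; `windows_table`, `register_window`, and `invoke` are hypothetical names, and the entries are the query 24 values from the slides.

```python
from typing import Callable, Dict, List, Tuple

# Results Structure entries for query 24: (data ID, query ID) -> T/F
results: Dict[Tuple[int, int], bool] = {
    (48, 24): True, (49, 24): False, (50, 24): True,
    (51, 24): False, (52, 24): False, (53, 24): True,
}

# Each bound maps the invocation time to a concrete sequence number.
Window = Tuple[Callable[[int], int], Callable[[int], int]]
windows_table: Dict[int, Window] = {}


def register_window(query_id: int, begin: Callable[[int], int],
                    end: Callable[[int], int]) -> None:
    """Store the BEGIN-END clause separately from the query predicate."""
    windows_table[query_id] = (begin, end)


def invoke(query_id: int, now: int) -> List[int]:
    """Return the matching data IDs that fall inside the current window."""
    begin_fn, end_fn = windows_table[query_id]
    begin, end = begin_fn(now), end_fn(now)
    return sorted(
        data_id
        for (data_id, q), hit in results.items()
        if q == query_id and hit and begin <= data_id <= end
    )


# Sliding window over the three most recent tuples (an assumed example window).
register_window(24, begin=lambda now: now - 2, end=lambda now: now)
print(invoke(24, now=52))  # [50]
print(invoke(24, now=53))  # [53]
```

No predicate is evaluated at invocation time; the call only filters stored T/F entries, which is what keeps response time low across repeated invocations.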

Joins over R and S: Arrival of New Query Specification

(a) Initial State

Query Store
ID   Predicate
20   R.a=5 and R.b<S.b
21   R.a>4 and R.b<S.b and S.a<10
22   R.b=4 and R.a+5>S.a and S.b>2

R-Data Store
ID   R.a  R.b
10   2    5
14   3    3
31   4    1
48   9    7

S-Data Store
ID   S.a  S.b
21   2    2
25   3    3
36   4    4
49   5    5

(b) Arrival of new Query

New query: ID 23, predicate R.a<5 and R.a>S.a and S.b>1

(c) Building Query Store

BUILD: query 23 is inserted into the Query Store.

(d) Probing R-Data Store

PROBE: query 23 probes the R-Data Store. Tuples 10, 14, and 31 match (R.a<5).

(e) Constructing Hybrid Structs

Each matching R tuple is paired with the remainder of the predicate, with its R values substituted:

Hybrid Structs
R.ID  Q.ID  Q.Predicate
10    23    2>S.a and S.b>1
14    23    3>S.a and S.b>1
31    23    4>S.a and S.b>1

(f) Probing S-Data Store

PROBE: the hybrid structs probe the S-Data Store.

Results (R, S, Q)
14, 21, 23
31, 21, 23
31, 25, 23
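The hybrid-struct step amounts to partially evaluating the join predicate on R and carrying the remainder over to S. The following sketch reproduces the slide's example under simplifying assumptions: linear scans instead of indexed probes, and a query supplied as two hand-split parts (`r_only` and `bind_r`, both hypothetical names) rather than parsed from a predicate string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

SPredicate = Callable[[int, int], bool]  # takes (S.a, S.b)

# Stores from the slides: ID -> (a, b)
r_store: Dict[int, Tuple[int, int]] = {10: (2, 5), 14: (3, 3), 31: (4, 1), 48: (9, 7)}
s_store: Dict[int, Tuple[int, int]] = {21: (2, 2), 25: (3, 3), 36: (4, 4), 49: (5, 5)}


@dataclass
class HybridStruct:
    r_id: int
    q_id: int
    residual: SPredicate  # what is left of the predicate once R is bound


def register_join_query(
    q_id: int,
    r_only: Callable[[int, int], bool],
    bind_r: Callable[[int, int], SPredicate],
) -> List[Tuple[int, int, int]]:
    """Probe R, build hybrid structs, then probe S with each residual predicate."""
    hybrids = [
        HybridStruct(r_id, q_id, bind_r(ra, rb))
        for r_id, (ra, rb) in r_store.items()
        if r_only(ra, rb)
    ]
    return [
        (h.r_id, s_id, h.q_id)
        for h in hybrids
        for s_id, (sa, sb) in s_store.items()
        if h.residual(sa, sb)
    ]


# Query 23: R.a<5 and R.a>S.a and S.b>1
join_results = register_join_query(
    23,
    r_only=lambda ra, rb: ra < 5,
    bind_r=lambda ra, rb: (lambda sa, sb: ra > sa and sb > 1),
)
print(join_results)  # [(14, 21, 23), (31, 21, 23), (31, 25, 23)]
```

Tuple 10 produces a hybrid struct (2>S.a and S.b>1) but no result, because no S tuple has S.a below 2.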

Joins over R and S: Arrival of New Data

(a) Initial State

Query Store
ID   Predicate
20   R.a=5 and R.b<S.b
21   R.a>4 and R.b<S.b and S.a<10
22   R.b=4 and R.a+5>S.a and S.b>2
23   R.a<4 and R.b<S.b

R-Data Store
ID   R.a  R.b
47   4    3
50   5    3
51   3    8

S-Data Store
ID   S.a  S.b
48   4    4
49   5    3
52   3    2

(b) Arrival of new Data

New data for R: ID 53, R.a=5, R.b=4

(c) Building R-Data Store

BUILD: tuple 53 is inserted into the R-Data Store.

(d) Probing Query Store

PROBE: tuple 53 probes the Query Store. Queries 20, 21, and 22 match on their R factors; query 23 does not, because R.a<4 fails.

(e) Constructing Hybrid Structs

Hybrid Structs
R.ID  Q.ID  Q.Predicate
53    20    4<S.b
53    21    4<S.b and S.a<10
53    22    10>S.a and S.b>2

(f) Probing S-Data Store

PROBE: the hybrid structs probe the S-Data Store.

Results (R, S, Q)
53, 48, 22
53, 49, 22
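For a new R tuple the roles are swapped: the tuple probes the Query Store first, and each matching query yields a hybrid struct that then probes the S-Data Store. The sketch below follows the slide's example under assumed simplifications: each stored query is hand-split into an R-only test and a function that binds the R values into a residual predicate over S, and all probes are linear scans. `insert_r` is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple

SPredicate = Callable[[int, int], bool]          # takes (S.a, S.b)
ROnly = Callable[[int, int], bool]               # takes (R.a, R.b)
BindR = Callable[[int, int], SPredicate]         # binds (R.a, R.b) into a residual

# S-Data Store from the slides: ID -> (S.a, S.b)
s_store: Dict[int, Tuple[int, int]] = {48: (4, 4), 49: (5, 3), 52: (3, 2)}

# Query Store: ID -> (R-only factors, residual builder)
query_store: Dict[int, Tuple[ROnly, BindR]] = {
    20: (lambda ra, rb: ra == 5, lambda ra, rb: lambda sa, sb: rb < sb),
    21: (lambda ra, rb: ra > 4, lambda ra, rb: lambda sa, sb: rb < sb and sa < 10),
    22: (lambda ra, rb: rb == 4, lambda ra, rb: lambda sa, sb: ra + 5 > sa and sb > 2),
    23: (lambda ra, rb: ra < 4, lambda ra, rb: lambda sa, sb: rb < sb),
}


def insert_r(r_id: int, ra: int, rb: int) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Probe the Query Store, build hybrid structs, then probe the S-Data Store."""
    hybrids = [
        (r_id, q_id, bind(ra, rb))
        for q_id, (r_only, bind) in query_store.items()
        if r_only(ra, rb)
    ]
    matched_queries = [q_id for _, q_id, _ in hybrids]
    joined = [
        (r, s_id, q_id)
        for r, q_id, residual in hybrids
        for s_id, (sa, sb) in s_store.items()
        if residual(sa, sb)
    ]
    return matched_queries, joined


matched_queries, joined = insert_r(53, 5, 4)
print(matched_queries)  # [20, 21, 22]
print(joined)           # [(53, 48, 22), (53, 49, 22)]
```

Queries 20 and 21 reduce to residuals that require 4<S.b, which no stored S tuple satisfies, so only query 22 contributes results.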

Other Queries

N-way Joins
– Similar to 2-way joins
– Probe, generate hybrid structs, repeat
– Can be executed without intermediate tables

Aggregations
– Performed at query invocation
– Uses n-ary ranked tree, clustered on time

Telegraph Background: CACQ

CACQ [MSHR02]
– Shared execution of multiple queries with one Eddy
– Tuple lineage
– Query indices

Queries and data are treated very differently

Only landmark continuous queries

No support for disconnected operation

PSoup in Telegraph

Leverage SteMs to store and index queries

Changes to Eddies

Encode queries as tuples
– Break the WHERE clause into individual boolean factors (BF)
– Encode each BF as: R.a relop [R.b|S.b] [+|-] constant

Stream Prefix Consistency
– A new query or data tuple is completely processed before any other tuple: no holes in the Results Structure

Results Structure: buffers the results
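Encoding each boolean factor in the form "attribute relop [attribute] [+|-] constant" makes a query a small, uniform record that can be stored like a tuple. The sketch below is one possible encoding, not the Telegraph data layout: `BooleanFactor`, `RELOPS`, and `matches` are names assumed for the example, any column is allowed on the left-hand side, and the factor R.a+5>S.a from query 22 is rewritten as S.a < R.a + 5 to fit the form.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional

RELOPS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    ">=": operator.ge, ">": operator.gt,
}


@dataclass(frozen=True)
class BooleanFactor:
    left: str                    # attribute on the left, e.g. "R.a"
    relop: str                   # one of the keys of RELOPS
    right: Optional[str] = None  # optional attribute on the right, e.g. "S.b"
    constant: int = 0            # signed constant added to the right-hand side

    def evaluate(self, row: Dict[str, int]) -> bool:
        rhs = (row[self.right] if self.right is not None else 0) + self.constant
        return RELOPS[self.relop](row[self.left], rhs)


def matches(factors: List[BooleanFactor], row: Dict[str, int]) -> bool:
    """A WHERE clause is a conjunction, so every factor must hold."""
    return all(factor.evaluate(row) for factor in factors)


# Query 22 from the slides: R.b=4 and R.a+5>S.a and S.b>2
query_22 = [
    BooleanFactor("R.b", "=", constant=4),
    BooleanFactor("S.a", "<", right="R.a", constant=5),
    BooleanFactor("S.b", ">", constant=2),
]

print(matches(query_22, {"R.a": 5, "R.b": 4, "S.a": 4, "S.b": 4}))  # True
print(matches(query_22, {"R.a": 5, "R.b": 4, "S.a": 3, "S.b": 2}))  # False
```

The two rows correspond to R tuple 53 joined with S tuples 48 and 52 from the join example.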

Experiments and Results

Alternatives
– NoMat: no background processing
– PSoup-Partial: background processing; apply the current window on invocation
– PSoup-Complete: current windows are also continuously applied in the background

Experimental Parameters
– Unloaded server with two Intel Pentium III 666 MHz processors and 768 MB RAM
– Data arrives as fast as possible, with values in the domain [0,255]
– Queries of the form R.a relop C, where C is in [0,255]
– Join queries of the form R.a relop S.b +/- C

Experiments: Response Time vs. Window Size

[Figure: interval predicates, selection queries]

Experiments: Response Time vs. Window Size

[Figure: equality predicates, selection queries]

Experiments: Max data arrival rate vs. #SQCs

[Figure: window size = 1000 tuples]

PSoup in a traditional query processor

PSoup = SQL query over data and client query streams?

Joins = expression evaluators

Notes
– Conventional QPs do not have tuple lineage
– Conventional QPs always use intermediate tables

Conclusions

Treating queries and data the same
– Combines approaches for previously studied queries: queries over the past and continuous queries
– Allows new functionality: hybrid queries

Separating result generation and delivery
– Makes disconnected operation feasible
– Efficient support for repeated query invocations