PIPES: A Resource Adaptive Data Stream Management System Bernhard Seeger Philipps-University Marburg, Germany Research supported by the German Research Society (DFG) grant Se


PIPES: A Resource Adaptive Data Stream Management System

Bernhard Seeger, Philipps-University Marburg, Germany

Research supported by the German Research Society (DFG) grant Se 553/4-2


Information Landscape

[Figure: diagram with boxes labeled DBMS and File System, a DSMS, and Input and Output labels.]


Outline

Motivation and problem definition

Sliding Windows

Query Processing in PIPES

Data Stream Model

Logical Operators

Algebraic Query Optimization

Physical Operators

Runtime Environment

Dynamic Plan Migration

Conclusions


Example Application

Traffic monitoring

Data format: HighwayStream( lane, speed, length, timestamp )

Continuous dataflow streams: variable stream rates, time + location dependence

Queries: continuous, long-running

“At which measuring stations of the highway has the average speed of vehicles been below 15 m/s over the last 15 minutes?”


Data Streams

Continuously arriving sequence of records: time as an integral component

Autonomous data sources: sensors, mobile devices, software agents, …

Important type of data: miniaturization of hardware, ubiquitous networks


Requirements

Declarative query language: expressive like (temporal) SQL

Join of data streams according to time; combination of data streams with persistent databases

Assigns meaning to data

Query results as a data stream

Publish/subscribe paradigm. Subscribe: users register new queries. Publish: continuous report of results

Quality of Service (QoS): e.g. at least one record per second

Scalability: number of data sources, number of subscribed queries


Stream Query Processing

Similar to traditional DBMS

1. Queries expressed in CQL: SQL-like query language

2. Logical query plan: algebra with "relational" operators

3. Query optimization: algebraic rules; simple, but accurate cost model

4. Physical query plan: select physical operators

5. Processing of the query


What is special about PIPES?

PIPES provides an infrastructure for DSMS

DSMS = Data Stream Management System

PIPES = Public Infrastructure for Processing and Exploring Data Streams

Differences to DBMS

Semantics is borrowed from temporal databases: expressiveness, query optimization

Data-driven query processing: publish/subscribe

Adaptive runtime environment: dynamic assignment of resources at runtime (scalability, QoS)

Continuous optimization of queries: plan migration (scalability, QoS)


2. Sliding Windows

Requirement of users: no impact of outdated data on the result; integration of different streams according to time

Moving temporal windows: finite subsequence of an infinite stream; query processing is restricted to the most recent data

Important for expressive and efficient query processing

Options

Count-based windows: FIFO queue of size w

Time-based windows: t is the timestamp of an element; t + w + 1 is the end of the validity of an element
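The two window options can be sketched in a few lines of Python. This is only an illustration, not PIPES code: the function names are hypothetical, and the half-open interval convention is taken from the physical stream slide.

```python
from collections import deque

def count_window(stream, w):
    """Yield the content of a count-based window (FIFO queue of size w) after each arrival."""
    fifo = deque(maxlen=w)  # the oldest element is evicted automatically
    for element in stream:
        fifo.append(element)
        yield list(fifo)

def time_window_validity(t, w):
    """Return the half-open validity interval [t, t + w + 1) of an element stamped t."""
    return (t, t + w + 1)
```

With w = 2, `count_window` always holds the two most recent elements, whereas `time_window_validity` depends only on the timestamp of the element itself.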


Problem: Determinism

Data-driven processing

Count-based windows, w = 2

Non-determinism: the result of a query depends on scheduling

Example: symmetric join

[Figure: animation of a symmetric join over inputs a1, a2, a3 and b1, b2, b3. One schedule produces a3b1, a3b2, a2b3, a3b3; another produces a1b3, a2b3, a3b2, a3b3.]
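The scheduling dependence can be reproduced with a small Python sketch. It makes two assumptions that are not stated on the slide: the join predicate always holds, and both windows already contain a1, a2 and b1, b2 before a3 and b3 are processed. The function name is hypothetical.

```python
from collections import deque

def symmetric_join(schedule, w=2):
    """Process the arrivals in `schedule` with count-based windows of size w."""
    # Assumed starting state: each window already holds the first two elements.
    windows = {"a": deque(["a1", "a2"], maxlen=w),
               "b": deque(["b1", "b2"], maxlen=w)}
    results = []
    for element in schedule:
        own = element[0]
        other = "b" if own == "a" else "a"
        windows[own].append(element)          # insert, evicting the oldest entry
        for partner in windows[other]:         # probe the opposite window
            left, right = (element, partner) if own == "a" else (partner, element)
            results.append(left + right)       # predicate assumed to be always true
    return results

print(symmetric_join(["a3", "b3"]))  # a3 is scheduled before b3
print(symmetric_join(["b3", "a3"]))  # b3 is scheduled before a3
```

The two calls see the same input elements but return different result lists, because the count-based window evicts a1 or b1 at different moments depending on which arrival is processed first.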


Temporal Windows in CQL

SELECT sectionID
FROM ( SELECT AVG(speed) AS avgSpeed, 1 AS sectionID
       FROM HighwayStream1 [Range 15 minutes]
       UNION ALL
       …
       UNION ALL
       SELECT AVG(speed) AS avgSpeed, 20 AS sectionID
       FROM HighwayStream20 [Range 15 minutes] )
WHERE avgSpeed < 15;

“At which measuring stations of the highway has the average speed of vehicles been below 15 m/s over the last 15 minutes?”


3. Query Processing in PIPES

Data stream model

Input streams: autonomous sources

Logical streams: semantics

Physical streams: implementation of the semantics, but more expressive


Input Streams

Sequence of records: arbitrary, but fixed schema; no limitation to the relational model

Records with timestamps: temporally ordered

Schema: HighwayStream( short lane, float speed, float length, Timestamp timestamp )

Input stream:
(5; 18.28; 5.27; 5:00:08)
(2; 21.33; 4.62; 5:01:32)
(4; 19.69; 9.97; 5:02:16)


Physical Stream

PIPES: time intervals instead of points

Validity of an element e: processing of e is restricted to its time interval; removal of invalid records

Sequence of tuples (e, [tS, tE)), ordered by tS and tE

Transformation: input stream → physical stream

((5; 18.28; 5.27; 5:00:08), [5:00:08, 5:00:09))
((2; 21.33; 4.62; 5:01:32), [5:01:32, 5:01:33))
((4; 19.69; 9.97; 5:02:16), [5:02:16, 5:02:17))
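A minimal Python sketch of this transformation follows. It assumes timestamps are integers in seconds since midnight (so 5:00:08 becomes 18008) and a granularity of one time unit; `to_physical` and `timestamp_of` are hypothetical names, not part of PIPES.

```python
def to_physical(input_stream, timestamp_of):
    """Attach the validity interval [t, t + 1) to every record of an input stream."""
    for record in input_stream:
        t = timestamp_of(record)          # timestamp carried by the record
        yield (record, (t, t + 1))        # half-open interval of one time unit

# Records follow the HighwayStream schema (lane, speed, length, timestamp).
highway = [(5, 18.28, 5.27, 18008), (2, 21.33, 4.62, 18092)]
for element in to_physical(highway, lambda record: record[3]):
    print(element)
```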


Data Stream Operators

Window Operator

Relational operators: "relational" algebra on data streams

projection

selection

Cartesian product

union

difference

temporal extension of operators


Window Operator

Purpose: extension of the validity of an element by w time units

Overlap of windows of elements: elements need to be processed together

Sliding window: w = 15 minutes

(e1, [5:00:08, 5:15:09))
(e2, [5:01:32, 5:16:33))
(e3, [5:02:16, 5:17:17))

[Figure: an element's validity runs from tS to tS + 1 + w, an interval of length w + 1.]
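The window operator can be sketched as a one-line transformation on physical stream elements. The sketch below assumes integer timestamps in seconds since midnight and uses the hypothetical name `window`; it is an illustration, not the PIPES implementation.

```python
def window(physical_stream, w):
    """Extend the validity of every element by w time units."""
    for element, (t_start, t_end) in physical_stream:
        yield (element, (t_start, t_end + w))

# e1 is valid on [5:00:08, 5:00:09), i.e. [18008, 18009) in seconds.
for item in window([("e1", (18008, 18009))], w=15 * 60):
    print(item)
```

For w = 900 seconds the end of the interval moves to 18909, which is 5:15:09 as on the slide.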


Relational Stream Operators

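Snapshot-reducibility

Snapshot: mapping of a physical stream to a non-temporal relation; the relation comprises all elements valid at point t

[Figure: commuting diagram. Snapshots at t map the streams S1, …, Sn to relations R1, …, Rn. The relational stream operator maps S1, …, Sn to Sout, the relational operator maps R1, …, Rn to Rout, and the snapshot of Sout at t equals Rout.]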
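Snapshot-reducibility can be checked on a small example. The sketch below uses a selection as the operator and represents a physical stream as a list of (element, (tS, tE)) pairs; all function names are hypothetical and chosen for illustration only.

```python
def snapshot(physical_stream, t):
    """Relation of all elements valid at time t, i.e. with t_start <= t < t_end."""
    return [e for e, (t_start, t_end) in physical_stream if t_start <= t < t_end]

def stream_filter(physical_stream, predicate):
    """Selection on a physical stream: validity intervals are left untouched."""
    return [(e, interval) for e, interval in physical_stream if predicate(e)]

def relational_filter(relation, predicate):
    """Selection on a non-temporal relation."""
    return [e for e in relation if predicate(e)]

stream = [(10, (0, 5)), (20, (3, 8)), (30, (6, 9))]
is_large = lambda value: value >= 20

# Both paths through the diagram agree at every point in time.
for t in range(10):
    assert snapshot(stream_filter(stream, is_large), t) == \
           relational_filter(snapshot(stream, t), is_large)
```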


Query Optimization

Application of well-known rules from temporal databases

Slivinskas, Jensen, Snodgrass (ICDE 2000): Query Plans for Conventional and Temporal Queries Involving Duplicates and Ordering

Many rules are directly applicable to streams: conventional + temporal rules

Basis for effective query optimization


Steps

1) Query
2) Logical query plan
3) Query optimization
4) Physical query plan

“At which measuring stations of the highway has the average speed of vehicles been below 15 m/s over the last 15 minutes?”

SELECT sectionID
FROM ( SELECT AVG(speed) AS avgSpeed, 1 AS sectionID
       FROM HighwayStream1 [Range 15 minutes]
       UNION ALL
       …
       UNION ALL
       SELECT AVG(speed) AS avgSpeed, 20 AS sectionID
       FROM HighwayStream20 [Range 15 minutes] )
WHERE avgSpeed < 15;

Operators of the plan, listed from the output down to the inputs:

Map: projection on sectionID

Filter: avgSpeed < 15

Union: merge of data streams

Aggregation: average speed (avgSpeed)

Map: projection on speed, assigning sectionID

Window: w = 15 minutes


Physical Operators

Stateless operators: processing of an element is independent of the previous ones

Examples: filter, map

Stateful operators: processing of an element depends on previous elements

Restricted to elements in the sliding window

Explicit management of status

Examples: join, aggregation


Data-driven Joins

Input streams A and B, sliding window of size w, join predicate P

Output records ((a,b), [tS,tE)) where P(a,b) holds and the intervals of a and b overlap

[Figure: intervals of a and b; the result (a,b) is valid on their overlap [tS, tE).]


Methodology

Adaptation of the sweepline technique

tA = start time of the last element of A

tB = start time of the last element of B

Status for each input

Status of A: elements of A with end time ≥ tB

Status of B: elements of B with end time ≥ tA

Continuous processing: insertion, probing & reorganisation

[Figure: inputs A and B with their status structures StatusA and StatusB.]
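The methodology can be sketched as a small Python function. It is a simplified illustration, not the PIPES join: it assumes that arrivals from both inputs come in one sequence ordered by start time, keeps each status in a plain list, and uses the hypothetical name `sweepline_join`.

```python
def sweepline_join(arrivals, predicate):
    """Join two physical streams; arrivals are (source, element, (t_start, t_end))."""
    status = {"A": [], "B": []}
    results = []
    for source, element, (ts, te) in arrivals:
        other = "B" if source == "A" else "A"
        # Reorganisation: keep opposite elements whose end time is >= the new start time.
        status[other] = [entry for entry in status[other] if entry[1][1] >= ts]
        # Insertion into the status of the element's own input.
        status[source].append((element, (ts, te)))
        # Probing: emit matches with the intersection of both validity intervals.
        for partner, (ps, pe) in status[other]:
            a, b = (element, partner) if source == "A" else (partner, element)
            start, end = max(ts, ps), min(te, pe)
            if start < end and predicate(a, b):
                results.append(((a, b), (start, end)))
    return results

arrivals = [("A", "a1", (0, 5)), ("B", "b1", (2, 4)),
            ("A", "a2", (6, 9)), ("B", "b2", (7, 8))]
print(sweepline_join(arrivals, lambda a, b: True))
```

The explicit `start < end` check is needed because the retention rule keeps elements whose end time equals the new start time, and such elements no longer overlap under half-open intervals.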


Runtime Environment of PIPES

[Figure: the PIPES query graph between sources and sinks.]


4. Plan Migration

Re-optimization of query plans at runtime: identification of poorly performing subgraphs in the query graph

Plan migration: substitution of the old plan by a new one

Requirements

Preservation of snapshot-reducibility

Continuous production of results

Short migration time


Example

[Figure: query graph with sources R, S, T, U and sinks C1, C2.]


Semantics Problems

Duplicates: parallel insertion of new elements into both plans

Loss of results: exclusive insertion of new elements into the new plan


Approach in PIPES

Assumptions: streams A and B, window of length w, equivalent query plans Pold and Pnew

Earliest split time: tsplit = max {tA, tB} + w

Splitting of the input at split time tsplit


Approach in PIPES

Production of results

Acceptance of all results received from the old plan Pold

Selection of results received from the new plan Pnew: acceptance only if start time > tsplit

[Figure: inputs A and B each pass through a Split operator that feeds Pold and Pnew.]
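The two rules stated on the slides, the earliest split time and the acceptance test for results of the new plan, can be written down directly. The sketch below covers only these two rules; how the Split operator routes input elements is not modelled, and both function names are hypothetical.

```python
def earliest_split_time(t_a, t_b, w):
    """t_split = max{tA, tB} + w, with tA and tB the latest start times of A and B."""
    return max(t_a, t_b) + w

def merge_results(old_results, new_results, t_split):
    """Accept every result of the old plan, and new-plan results starting after t_split."""
    accepted = list(old_results)
    accepted += [r for r in new_results if r[1][0] > t_split]  # r = (payload, (tS, tE))
    return accepted

t_split = earliest_split_time(t_a=10, t_b=12, w=5)
old = [("x", (3, 8))]
new = [("y", (17, 20)), ("z", (18, 21))]
print(t_split, merge_results(old, new, t_split))
```

In this example the new-plan result "y" starts exactly at the split time and is therefore rejected, while "z" is accepted.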


Properties

Method is broadly applicable: arbitrary plans, many data streams, different window sizes

Migration time: worst case w time units


5. Conclusions

Applications: traffic management, alarm systems, observation of production lines

Basic ideas of stream processing in PIPES: temporal databases, data-driven query processing, adaptivity at runtime, continuous optimization at runtime

Dynamic plan migration: broadly applicable approach


Current Work

Problems

Cost models for optimization: new techniques

Strategies for adaptation: memory, CPU, QoS

Runtime environment: realtime applications

Real applications for DSMS: observation of patients in hospitals, processing of sensor data

Coupling of PIPES and commercial products


Related Work

Abadi, Carney, Cetintemel et al.: Aurora: A new model and architecture for data stream management. The VLDB Journal, 12(2):120-139, 2003.

Arasu, Babu, and Widom: The CQL continuous query language: Semantic foundations and query execution. Technical Report 2003-67, Stanford University, 2003.

Tucker, Maier, Sheard, and Fegaras: Exploiting punctuation semantics in continuous data streams. IEEE Trans. Knowledge and Data Eng., 15(3):555-568, 2003.

Law, Wang, and Zaniolo: Query languages and data models for database sequences and data streams. In VLDB, pages 492-503, 2004.


Papers on PIPES/XXL

Michael Cammert, Jürgen Krämer, Bernhard Seeger, Sonny Vaupel: An Approach to Adaptive Memory Management in Data Stream Systems. To appear in Proc. ICDE 2006.

Michael Cammert, Christoph Heinz, Jürgen Krämer, Bernhard Seeger: Sortierbasierte Joins über Datenströmen. BTW 2005, Karlsruhe, Germany, March 2-4.

Björn Blohsfeld, Christoph Heinz, Bernhard Seeger: Maintaining Nonparametric Estimators over Data Streams. BTW 2005, Karlsruhe, Germany, March 2-4.

Christoph Heinz, Bernhard Seeger: Wavelet Density Estimators over Data Streams (Extended Abstract). ACM Symposium on Applied Computing, Santa Fe, New Mexico, 2005.

Michael Cammert, Christoph Heinz, Jürgen Krämer, Bernhard Seeger: Anfrageverarbeitung auf Datenströmen. Datenbank-Spektrum 11: 5-13 (2004).

Jürgen Krämer, Bernhard Seeger: PIPES - A Public Infrastructure for Processing and Exploring Data Streams. Proc. SIGMOD 2004 (Demo).

Jochen Van den Bercken, Björn Blohsfeld, Jens-Peter Dittrich, Jürgen Krämer, Tobias Schäfer, Martin Schneider, Bernhard Seeger: XXL - A Library Approach to Supporting Efficient Implementations of Advanced Database Queries. In Proc. of the Conf. on Very Large Databases (VLDB), 39-48, September 2001.


Future Work

Query optimization

Adequate cost model: not only stream rates; runtime statistics (delays, memory usage, etc.)

Static query optimization: multi-query optimization, subquery sharing

Dynamic query optimization: detection of suitable subgraphs, plan migration at runtime

Temporal aspects: coalesce


Thank you!

Any questions?

For more information check our website:

http://dbs.mathematik.uni-marburg.de/Home/Research/Projects/PIPES


Reorganization

Restriction of memory usage

Which elements can be discarded in internal data structures?

All elements where $t_E \le \min_j t_{S_j}$, with $t_{S_j}$ the latest start timestamp of input stream j

Why?

Ordering invariant: no temporal overlap with future stream elements


Aggregation

Incremental computation

Efficient implementation: aggregation segment-tree

Amortized logarithmic costs per element

[Figure: example for Sum. A new element is inserted into the current state (aggregates) over time T, followed by reorganization.]
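The idea of incremental computation can be illustrated with a running sum. The sketch below is deliberately simpler than the aggregation segment-tree named on the slide and is not a substitute for it: it assumes a single window size, so that elements expire in arrival order, and the class name `SlidingSum` is hypothetical.

```python
from collections import deque

class SlidingSum:
    """Running sum over all elements that are still valid (simplified sketch)."""

    def __init__(self):
        self.valid = deque()   # (value, t_end) pairs in arrival order
        self.total = 0

    def insert(self, value, t_start, t_end):
        # Reorganization: drop elements whose validity ended at or before t_start.
        while self.valid and self.valid[0][1] <= t_start:
            expired, _ = self.valid.popleft()
            self.total -= expired
        # Insertion: update the aggregate instead of recomputing it.
        self.valid.append((value, t_end))
        self.total += value
        return self.total

s = SlidingSum()
print(s.insert(4, 0, 3), s.insert(5, 1, 4), s.insert(3, 3, 6))
```

Each insertion touches only the new element and the expired ones, so the aggregate is never recomputed from scratch.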


Outline

Motivation and problem definition

Query formulation

Our temporal approach: stream types, logical query plans, query optimization, physical query plans, query execution

Exploration of Data Streams

Conclusions


Exploration of Data Streams

Example: estimation of selectivity during runtime of continuous range queries:

select * from Stream S
where S.measure between min and max

Our approach: exploit the density p of the distribution

Represents all information about the distribution

Suitable for estimating the selectivity of multiple queries

Selectivity of the range query: $\int_{\mathit{min}}^{\mathit{max}} p(x)\,dx$


Requirement

Problem: density is unknown

Adaptation of a non-parametric density estimation technique: kernels, wavelets, sampling and CDF

Requirements

Low resource consumption (memory and CPU)

Memory and CPU adaptive: increasing memory size → higher accuracy

Valid estimation at each point in time

Adapt to a changing distribution


Reservoir Sampling

CDF is built on top of the iid samples

Disadvantages: estimation relies on a few elements; no advantage from increasing memory

Advantage: low processing overhead

[Figure: a data stream (positions 0 to j) containing … 34 … 5 … 12 4 … 27 …, from which the samples 12, 5, 27, 34, 4 are held in main memory.]
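A short Python sketch shows how a reservoir sample can answer the range-selectivity question. It uses the classic one-pass reservoir algorithm and the empirical fraction of samples inside the range as the estimate; the function names are hypothetical and the slides do not say which sampling variant PIPES uses.

```python
import random

def reservoir_sample(stream, size, rng):
    """Keep a uniform random sample of `size` elements from a stream of unknown length."""
    sample = []
    for j, value in enumerate(stream):
        if j < size:
            sample.append(value)          # fill the reservoir first
        else:
            slot = rng.randint(0, j)      # both bounds are inclusive
            if slot < size:
                sample[slot] = value      # replace with probability size / (j + 1)
    return sample

def estimate_selectivity(sample, lo, hi):
    """Fraction of sampled values in [lo, hi], i.e. the empirical CDF difference."""
    if not sample:
        return 0.0
    return sum(lo <= v <= hi for v in sample) / len(sample)

sample = reservoir_sample(range(1000), 50, random.Random(0))
print(estimate_selectivity(sample, 100, 300))
print(estimate_selectivity([12, 5, 27, 34, 4], 5, 27))
```

The second call uses the five sample values from the figure; three of them fall into the range, which illustrates why an estimate built on so few elements is coarse.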


Blockwise Estimation

Stream is transformed into blocks. For simplicity: blocks are of the same size

Idea: an estimation for the first k blocks is available; compute the estimation for k+1 blocks iteratively

Example (average): $\mathit{avg}_{k+1} = \frac{k}{k+1}\,\mathit{avg}_k + \frac{1}{k+1}\,\mathit{avg}_{act}$, where $\mathit{avg}_{act}$ is the average of the current block

Generalization for density functions: straightforward extension

Problem: violates the requirement of limited memory
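The iterative update for the average can be sketched as follows, assuming blocks of equal size as on the slide. The function name `blockwise_average` is hypothetical.

```python
def blockwise_average(blocks):
    """Combine per-block averages: avg_{k+1} = k/(k+1) * avg_k + 1/(k+1) * avg_act."""
    avg = 0.0
    for k, block in enumerate(blocks):        # k = number of blocks seen so far
        avg_act = sum(block) / len(block)     # estimate of the current block
        avg = (k / (k + 1)) * avg + (1 / (k + 1)) * avg_act
    return avg

print(blockwise_average([[1, 2], [3, 4], [5, 6]]))
```

For equally sized blocks the result agrees with the average over all elements, up to floating-point rounding.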


Cumulative-Compressed Estimation

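Compression: cubic splines

Weighting strategies

Amortized cost for updates: O(log M)

$\hat{f}_{k+1}(x) = \mathrm{compress}\big((1-\omega_{k+1})\,\hat{f}_k(x) + \omega_{k+1}\,\hat{s}_{k+1}(x)\big)$, with the weight $\omega_{k+1}$ given by the weighting strategy

[Figure: the current estimator at time k is combined with the estimator of the sample 12, 5, 27, 34, 4 held in main memory to form the estimator at time k+1.]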


Experimental Comparison

Streaming data from a real traffic data set

Arithmetic weights

Memory size: 5000