Chair for Computer Science 6 (Data Management) Friedrich-Alexander-University of Erlangen-Nuremberg Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Niko Pollner, Klaus Meyer-Wegener 2011-09-23 IDEAS 2011 Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems


Page 1: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Chair for Computer Science 6 (Data Management)

Friedrich-Alexander-University of Erlangen-Nuremberg

Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Niko Pollner, Klaus Meyer-Wegener

2011-09-23

IDEAS 2011

Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Page 2: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Agenda

Problem Statement

Calibration of Cost Models

Function Approximation

Estimating the Costs of Single Operators

Evaluation

Summary

Perspective: Cost Estimation for Federated DSMS

Page 3: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Problem Statement

DSAM: heterogeneous distributed data stream processing

Automatic cost-based query distribution

Problem: hardware- and DSMS-specific cost models are needed

Page 4: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Things we know a priori


Operator graph

Topology

Data rates

Selectivity

Distribution of certain values

For some operators: cost model → calibration of cost models

Stream characteristics

Page 5: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Things we do not know a priori

Hardware- and DSMS-specific parameters of cost models

System costs

For some operators: cost model → function approximation

Page 6: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Calibration of Cost Models - Parameter Estimation

Cost model consists of

Stream- and operator-dependent parameters

Constant values

Hardware/System/Implementation dependent values

Test queries and input streams

Different values for the stream- and operator-dependent parameters

Cost Measurements

Least squares

Outlier detection (e.g. RANSAC)
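The calibration steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the linear cost model `cost = c0 + c1*rate + c2*rate*selectivity`, the function names, the threshold, and the synthetic measurements are all assumptions made for the example. It fits the hardware-dependent constants by ordinary least squares and wraps the fit in a basic RANSAC loop so that outlying measurements are ignored.

```python
import numpy as np

def design_matrix(rate, selectivity):
    # Assumed (hypothetical) cost model: cost = c0 + c1*rate + c2*rate*selectivity
    return np.column_stack([np.ones_like(rate), rate, rate * selectivity])

def fit_least_squares(X, y):
    # Ordinary least-squares estimate of the unknown constants
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def fit_ransac(X, y, n_iter=200, threshold=1.0, seed=0):
    # Basic RANSAC: fit on minimal random samples, keep the largest consensus set,
    # then refit on all inliers of that set.
    rng = np.random.default_rng(seed)
    n, k = X.shape
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(n, size=k, replace=False)
        coeffs = fit_least_squares(X[sample], y[sample])
        inliers = np.abs(X @ coeffs - y) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_least_squares(X[best_inliers], y[best_inliers]), best_inliers

# Synthetic "measurements" (illustrative values only)
rng = np.random.default_rng(42)
rate = rng.uniform(100, 1000, size=60)
selectivity = rng.uniform(0.1, 0.9, size=60)
true_coeffs = np.array([5.0, 0.02, 0.05])
X = design_matrix(rate, selectivity)
cost = X @ true_coeffs + rng.normal(0, 0.1, size=60)
cost[:5] += 50.0  # five gross outliers, e.g. measurement peaks

plain = fit_least_squares(X, cost)       # distorted by the outliers
robust, inliers = fit_ransac(X, cost)    # outliers excluded from the final fit
print("plain :", plain)
print("robust:", robust, "inliers:", int(inliers.sum()))
```

The RANSAC threshold has to be chosen relative to the measurement noise; here it is simply set to ten times the synthetic noise level.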

Page 7: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Function Approximation – Nonparametric Models

No appropriate cost model

Operator without existing cost model

Existing cost models could not be fitted to a specific system

Solution: function approximation

Radial Basis Function Network (RBFN)

Function approximation instead of interpolation

Fewer centers than input points

Least-squares solution via the Moore-Penrose pseudoinverse

Improving the function approximation

Iterative approach

1. Naive function approximation

2. Improving areas of interest (e.g. discontinuities, high gradient)
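A radial basis function network with fewer centers than input points can be sketched as below. The Gaussian basis, the evenly spaced centers, the width, and the step-shaped synthetic cost curve are assumptions chosen for illustration; only the use of the Moore-Penrose pseudoinverse for the weights is taken from the slide.

```python
import numpy as np

def rbf_features(x, centers, width):
    # Gaussian basis functions (assumed basis) plus a constant bias column
    d = x[:, None] - centers[None, :]
    phi = np.exp(-(d ** 2) / (2.0 * width ** 2))
    return np.column_stack([phi, np.ones(len(x))])

def fit_rbfn(x, y, centers, width):
    # Least-squares weights via the Moore-Penrose pseudoinverse
    return np.linalg.pinv(rbf_features(x, centers, width)) @ y

def predict_rbfn(x, centers, width, weights):
    return rbf_features(x, centers, width) @ weights

# Synthetic cost curve: linear trend with a step (illustrative only)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 + 0.5 * x + 3.0 * (x > 6.0)

centers = np.linspace(0.0, 10.0, 25)   # 25 centers for 200 input points
width = 0.5
weights = fit_rbfn(x, y, centers, width)
approx = predict_rbfn(x, centers, width, weights)
rmse = np.sqrt(np.mean((approx - y) ** 2))
print("RMSE of the approximation:", rmse)
```

For the iterative improvement step, one plausible refinement is to add centers (and measurements) where the residual or the gradient of the approximation is large, such as around the step in this example.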

Page 8: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Estimating the Costs of Single Operators

Assumptions

Only the system costs can be measured

The costs of a single operator are independent of other operators → additivity

System costs depend linearly on the number of operators

Parallel instances of the same operator

Latency

Latency of parallel operators does not depend on the number of operators

Operators have to be connected in series

Page 9: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Evaluation

Coral8

Test setting

Synthetic input streams with constant properties (rate, attribute value distribution)

Every test query running for two minutes

The test data collected in the first minute is discarded
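Discarding the warm-up minute before averaging a measured value could look like the sketch below. The function name, the one-sample-per-second spacing, and the synthetic CPU trace are assumptions for illustration.

```python
import numpy as np

def steady_state_mean(timestamps_s, values, warmup_s=60.0):
    # Drop every sample taken during the warm-up period, then average the rest
    t = np.asarray(timestamps_s, dtype=float)
    v = np.asarray(values, dtype=float)
    keep = (t - t[0]) >= warmup_s
    return v[keep].mean()

# Two-minute run sampled once per second; the first minute is a start-up transient
t = np.arange(0, 120)
cpu = np.where(t < 60, 80.0, 20.0)
print(steady_state_mean(t, cpu))  # prints 20.0
```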

Measured values

Latency

Memory consumption (resident set size)

CPU usage

Coral8 status stream

Input and output rate

Query latency

Application Memory

Page 10: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Coral8 Measurements

Filter operator

Application memory

CPU usage

Unexpected behavior: steps and peaks

Page 11: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Costs of Single Operators

CPU usage depends linearly on the number of operators

Slope equals the costs of a single operator
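Reading the per-operator cost off the slope can be sketched with a straight-line fit. The measurement values below are synthetic and noise-free, and interpreting the intercept as the base cost of the system is an assumption of this example.

```python
import numpy as np

# Number of parallel operator instances and measured system CPU usage (synthetic)
n_ops = np.array([10, 20, 40, 60, 80, 100], dtype=float)
cpu = 3.0 + 0.25 * n_ops

# Degree-1 polynomial fit: np.polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(n_ops, cpu, deg=1)
print("cost per operator:", slope)
print("base system cost :", intercept)
```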

[Figures: two plots, each with the number of operators on the x-axis]

Page 12: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Model Calibration and RBFN

Application memory of the aggregate operator

Left side: calibrated cost model

Linear cost model

Right side: function approximation

Adapts to the steps

Page 13: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Cost Estimation for Operator Graphs

Operator graph consisting of 100 parallel filter operators

Cost estimation using function approximation
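Under the additivity assumption, the cost of an operator graph is the base system cost plus the sum of the single-operator costs. The sketch below illustrates this for 100 parallel filter operators; `filter_cpu_cost` is a hypothetical stand-in for an approximated per-operator cost function, and all numbers are invented.

```python
from typing import Callable, Sequence

def estimate_graph_cost(
    base_cost: float,
    operator_costs: Sequence[Callable[[float], float]],
    input_rates: Sequence[float],
) -> float:
    # Additivity assumption: operators do not influence each other's costs
    return base_cost + sum(f(r) for f, r in zip(operator_costs, input_rates))

def filter_cpu_cost(rate: float) -> float:
    # Hypothetical placeholder for a calibrated or approximated cost function
    return 0.002 * rate

total = estimate_graph_cost(3.0, [filter_cpu_cost] * 100, [100.0] * 100)
print("estimated CPU usage:", total)  # roughly 3.0 + 100 * 0.2
```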

Page 14: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Summary

Cost estimation for black-box systems without cost estimators

Calibration of a cost model

Default cost model

System-specific cost model

Function approximation

Calibration of a cost model for unknown systems

Behavior conforming to the cost model is required

Nonconforming behavior can be detected (automatically) after some measurements

Evaluation

CPU usage and memory consumption can be estimated

Latency: Queuing theory

Page 15: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Application: Cost Estimation for Federated DSMS

Cost formulas as metadata

Cost formulas containing constants, variables and parameters

Cost estimation

Hardware-dependent and system-dependent parameters loaded from metadata catalog

Operator-dependent variables by a metadata provider

Stream-dependent variables by a monitoring component or an estimator

Interpreter to calculate costs

Advantages

Both default and system-specific cost formulas possible

Cost models interchangeable at runtime
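The idea of storing cost formulas as metadata and evaluating them with an interpreter can be sketched as follows. The formula syntax (plain arithmetic over named symbols), the catalog layout, and all names and values are assumptions for illustration; the talk does not specify them.

```python
import ast
import operator

_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(formula: str, bindings: dict) -> float:
    # Tiny interpreter: numbers, named symbols, + - * / and unary minus only
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return float(node.value)
        if isinstance(node, ast.Name):
            return float(bindings[node.id])
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported element in cost formula: {ast.dump(node)}")
    return walk(ast.parse(formula, mode="eval"))

# Hypothetical sources of the symbol values
catalog = {"c_sys": 3.0, "c_tuple": 0.002}            # hardware/system parameters
operator_vars = {"window_size": 50.0}                 # from a metadata provider
stream_vars = {"rate": 400.0, "selectivity": 0.25}    # from monitoring or an estimator

# The formula itself is data, so it can be swapped at runtime
formula = "c_sys + c_tuple * rate * selectivity + 0.01 * window_size"
cost = evaluate(formula, {**catalog, **operator_vars, **stream_vars})
print("estimated cost:", cost)  # approximately 3.7
```

Because the formula is an ordinary string in the catalog, replacing a default formula with a system-specific one requires no code change in the estimator.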

Page 16: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Any questions?

Page 17: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Generating Test Data and Test Queries

Identifying parameters

Cost model based

Identifying query- or stream-dependent parameters

Generating a set of test data for the parameters

Mapping the parameters to the query language and stream properties

Operator or query language based

No existing cost model

Function approximation

Identifying important parameters based on the query language and possible stream properties

Generating a set of test data
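Generating a set of test cases and mapping them to queries can be sketched as a grid over the identified parameters. The parameter names, the value grid, and the SQL-like query template are hypothetical; a real DSMS query language would need its own mapping.

```python
from itertools import product

def generate_test_cases(parameter_values: dict) -> list:
    # Full grid: one test case per combination of parameter values
    names = list(parameter_values)
    return [dict(zip(names, combo)) for combo in product(*parameter_values.values())]

def to_query(case: dict) -> str:
    # Hypothetical SQL-like template; the threshold controls the filter selectivity
    return f"SELECT * FROM input WHERE value < {case['threshold']}"

cases = generate_test_cases({"rate": [100, 500, 1000], "threshold": [10, 50, 90]})
for case in cases[:3]:
    print(case["rate"], "tuples/s:", to_query(case))
```

If the attribute values of the synthetic stream are uniformly distributed over a known range, each threshold corresponds to a known selectivity, which links the query-level parameter back to the cost model parameter.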

Page 18: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Klaus Meyer-Wegener

Problem statement

[Figure: Global Query Graph and Distributed Query Processing. Operators Op1 to Op6 process Stream1 and Stream2 and are distributed over Node 1, Node 2 and Node 3, producing the output Out. Data rate, density and statistics are annotated in two places; the relevant metadata about inner streams is unknown (marked "???").]

SSDBM 2010

Page 19: Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems

Propagation of Densities

Propagation of input streams’ statistics

Propagation of statistics for inner streams between operators

Propagation of statistics for output streams

Statistical objective: Attribute Value Distribution (Density)

Analytic Operator Model

Accurate formulas

Numerical Operator Model

Discrete mappings

Training of mapping relation

[Figure: an operator between an input stream and an output stream. An operator model (analytic or numerical) maps the data rate, density and statistics of the input stream to the data rate, density and statistics of the output stream.]
