
Peter R. Pietzuch prp@doc.ic.ac.uk

THEMIS: Fairness in Data Stream Processing under Overload

Evangelia Kalyvianaki, City University London, UK

Dagstuhl Seminar 15041: Model-driven Algorithms and Architectures for Self-Aware Computing Systems, Dagstuhl 2015

Marco Fiscato, Imperial College London, UK

Theodoros Salonidis, IBM Research, USA

Peter Pietzuch, Imperial College London, UK


The Puzzle of Big Data Real-Time Processing Engines in Data Centres

2

Queries overload data centre resources. How can we allocate resources efficiently across clusters and engines?


3

A well-known technique to handle transient overload conditions is to discard data.

Data Shedding

[Figure: diagram with two components labelled "overloaded"]

How to measure shedding across queries?


How much data should we shed from queries?

How to implement shedding in this distributed setup?
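As a rough illustration of the "how much to shed" question, the sketch below computes the fraction of tuples a single node would need to discard so that its admitted rate matches its processing capacity, and then applies that fraction by random dropping. This is a generic load-shedding baseline under assumed names (`shed_fraction`, `random_shed`) and an assumed unit of tuples per second; it is not the THEMIS policy, which decides what to shed using SIC values.

```python
import random


def shed_fraction(arrival_rate: float, capacity: float) -> float:
    """Fraction of incoming tuples to discard so that the admitted rate
    does not exceed the node's processing capacity.

    ASSUMPTION: both rates are measured in tuples per second."""
    if arrival_rate <= capacity:
        return 0.0  # not overloaded: keep every tuple
    return 1.0 - capacity / arrival_rate


def random_shed(batch, fraction, rng=None):
    """Generic random-drop baseline: keep each tuple independently with
    probability 1 - fraction, so on average `fraction` of the batch is shed."""
    if rng is None:
        rng = random.Random()
    return [t for t in batch if rng.random() >= fraction]


# Example: 1,000 tuples/s arrive but the node can only handle 750 tuples/s.
frac = shed_fraction(1000, 750)  # 0.25
kept = random_shed(list(range(1000)), frac, random.Random(42))
print(f"shed {frac:.0%}, kept {len(kept)} of 1000 tuples")
```

Random dropping treats every tuple as equally valuable; the questions on this slide arise precisely because, across many queries, that assumption does not hold.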


4

Shedding data reduces correctness, which degrades performance.

Dropping different data leads to different degrees of degradation.

Source Information Content (SIC) metric

Measures the contribution of data from sources to results.

[Figure: SIC under perfect processing vs. degraded processing]

How to measure shedding across queries?

SIC is a data-stream-processing-aware metric.

But can we have a metric that is operator- or query-aware?
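The slides do not spell out how SIC values are computed. The sketch below assumes one plausible scheme: within a time window, a total SIC of 1 is split equally across the sources, and each source's share is split equally across the tuples it emitted; a query's SIC is then the sum of the values of the source tuples that were actually processed. Under this assumption perfect processing yields 1 and any shedding yields less. The function names are hypothetical.

```python
from fractions import Fraction


def source_tuple_sic(tuples_per_source: dict[str, int]) -> dict[str, Fraction]:
    """ASSUMPTION: a window's total SIC of 1 is split equally across sources,
    and each source's share equally across the tuples it emitted.
    Assumes every source emitted at least one tuple in the window."""
    num_sources = len(tuples_per_source)
    return {src: Fraction(1, num_sources * n)
            for src, n in tuples_per_source.items()}


def query_sic(tuples_per_source: dict[str, int],
              processed_per_source: dict[str, int]) -> Fraction:
    """SIC of a query result: sum of the SIC values of the source tuples
    that were processed (i.e. not shed) in this window."""
    per_tuple = source_tuple_sic(tuples_per_source)
    return sum((per_tuple[src] * processed_per_source.get(src, 0)
                for src in tuples_per_source), Fraction(0))


emitted = {"a": 3, "b": 2}            # tuples emitted per source in one window
print(query_sic(emitted, emitted))    # perfect processing -> 1
print(query_sic(emitted, {"a": 1}))   # degraded processing -> 1/6
```

Exact fractions are used so that "perfect processing" compares equal to 1 without floating-point rounding.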


5

Fair Shedding for Equalising SIC values

Each local shedder equalises the SIC values of its own queries.

Global coordination is achieved through informed local shedding.
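One way to picture a local shedder that equalises SIC values is a greedy water-filling heuristic: while the node still has processing capacity, give the next slot to the query whose accumulated SIC is currently lowest. The sketch below is an illustrative assumption, not necessarily the algorithm used in THEMIS; the input format (per-query lists of queued tuples' SIC values) and the `fair_shed` name are hypothetical.

```python
import heapq


def fair_shed(pending: dict[str, list[float]], capacity: int) -> dict[str, int]:
    """Decide how many queued tuples to keep per query so that the queries'
    accumulated SIC values end up as equal as possible.

    pending maps a (hypothetical) query id to the SIC values of its queued
    tuples; capacity is how many tuples the node can process this interval.
    Returns the number of tuples kept per query; the rest are shed."""
    # Highest-SIC tuples first within each query, so each kept tuple
    # raises that query's SIC as much as possible.
    queues = {q: sorted(v, reverse=True) for q, v in pending.items()}
    kept = {q: 0 for q in pending}
    # Min-heap of (accumulated SIC, query id): always serve the worst-off query.
    heap = [(0.0, q) for q, v in queues.items() if v]
    heapq.heapify(heap)
    for _ in range(capacity):
        if not heap:
            break  # every queued tuple fits: nothing needs to be shed
        sic, q = heapq.heappop(heap)
        sic += queues[q][kept[q]]
        kept[q] += 1
        if kept[q] < len(queues[q]):  # query still has tuples to offer
            heapq.heappush(heap, (sic, q))
    return kept


print(fair_shed({"q1": [0.25] * 4, "q2": [0.5, 0.5]}, capacity=3))
# -> {'q1': 2, 'q2': 1}: both queries end with an SIC of 0.5
```

Because each node only compares the SIC values of its own queries, the decision needs no global state, which matches the slide's point that global coordination emerges from informed local shedding.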


6

SIC Fair Shedder

To address node heterogeneity and workload variations, an online cost model estimates the time to process an average tuple.

Could we build the system to be goal-aware?
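A minimal sketch of such an online cost model, assuming an exponentially weighted moving average (EWMA) of the measured per-tuple processing time; the class name, the smoothing factor `alpha` and the initial estimate are hypothetical. The estimate converts directly into how many tuples the node can process in the next shedding interval, which is the budget a local shedder has to work within.

```python
class TupleCostModel:
    """Online estimate of the average time to process one tuple on this node.

    ASSUMPTION: an EWMA of the measured per-tuple cost; alpha is a
    hypothetical smoothing factor (higher = reacts faster to changes)."""

    def __init__(self, alpha: float = 0.2, initial_cost_s: float = 1e-3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.avg_cost_s = initial_cost_s

    def observe(self, busy_time_s: float, tuples_processed: int) -> None:
        """Fold one measurement interval into the running average."""
        if tuples_processed == 0:
            return  # nothing was processed, so there is no sample
        sample = busy_time_s / tuples_processed
        self.avg_cost_s = self.alpha * sample + (1 - self.alpha) * self.avg_cost_s

    def capacity(self, interval_s: float) -> int:
        """Number of tuples the node is expected to process in the next interval."""
        return int(interval_s / self.avg_cost_s)


model = TupleCostModel(alpha=0.5, initial_cost_s=2**-8)
model.observe(busy_time_s=0.5, tuples_processed=256)  # measured 2**-9 s per tuple
print(model.capacity(interval_s=3.0))                  # -> 1024
```

Because the average tracks recent measurements, the same code yields a smaller budget on a slower or busier node, which is how a cost model of this kind can absorb node heterogeneity and workload variation.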


7

A self-aware autonomic system for data processing in real-time

Systems already have (some) adaptation and (some) self-awareness, but could we extend them to (full) self-awareness?

For example, can we build a self-aware system that performs fair data shedding under overload for data stream processing, databases and filesystems?

Thank you! Questions?

http://www.staff.city.ac.uk/~sbbj913/