fast, accurate, automatic scaling decisions for ... · Three steps is all you need fast, accurate, automatic scaling decisions for distributed streaming dataﬂows Vasiliki Kalavri†,

Three steps is all you need fast, accurate, automatic scaling decisions

for distributed streaming dataflows

Vasiliki Kalavri†, John Liagouris†, Moritz Hoffmann†,Desislava Dimitrova†, Matthew Forshaw††, Timothy Roscoe†

Support:

†

†Systems Group, Department of Computer Science, ETH Zürich, [email protected]‡

††Newcastle University, [email protected]

mailto:[email protected]

Any streaming job will inevitably become over- or under-provisioned in the future

2


even

ts/s

time

load shedding

: input rate : throughput

2

even

ts/s

time

idle resources


even

ts/s

time

load shedding


2

even

ts/s

time

idle resources

even

ts/s

time

SLO violation


even

ts/s

time

load shedding


2

3

input streams

parallel dataflow

output stream

CONFIGURING PARALLELISM FOR A STREAMING JOB

3

input streams

parallel dataflow

output stream

1. monitor event rates

2. configure parallelism

3. deploy and test performance

until the target throughput is met

CONFIGURING PARALLELISM FOR A STREAMING JOB

THE SCALING PROBLEM

4

Given a logical dataflow with sources S1, S2, … Sn and rates λ1, λ2, … λn identify the minimum parallelism πi per operator i, such that the physical

dataflow can sustain all source rates.

S1

S2

λ1

λ2

S1

S2

π=2

π=3

logical dataflow physical dataflow

5

scaling controllermetrics

policy

scaling action

AUTOMATIC SCALING OVERVIEW

5


policy

scaling action

detect symptoms


5


policy

scaling action

detect symptoms

decide whether to scale


5


policy

scaling action

detect symptoms

decide whether to scale

decide how much to scale


6

policy scaling actionBorealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…

Existing approachesmetrics

6

policy scaling action

CPU utilizationbacklog, tuples/s

backpressure signal

Borealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…


6



backpressure signal

threshold and rule-basedif CPU > 80% => scale

Borealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…


6



backpressure signal


small changes,one operator at a time

Borealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…


6



backpressure signal



problematic due to interference, multitenancy

Borealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…


6



backpressure signal




sensitive to noise, manual, hard to tune

Borealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…


6



backpressure signal





non-predictive, speculative steps

Borealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…


6



backpressure signal






Borealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…


oscillations

6



backpressure signal






Borealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…


oscillationstemporary over- and under-provisioning

6



backpressure signal






Borealis

StreamCloud

Seep

IBM Streams

Spark Streaming

Google Dataflow

Dhalion

…


oscillations slow convergence

temporary over- and under-provisioning

7

effect of Dhalion’s scaling actions in an initially under-provisioned wordcount dataflow

7

effect of Dhalion’s scaling actions in an initially under-provisioned wordcount dataflow

12

3 654

8

externally observed

threshold-based

non-predictive, single-operator

policy

scaling action

metrics

true rates through instrumentation

dataflow dependency model

predictive, dataflow-wide

actions

externally observed

threshold-based


policy

scaling action

metrics

8

OUR APPROACH: DS2

no oscillations

fast convergence

true rates as bounds to avoid

over/under-shoot

true rates through instrumentation

dataflow dependency model

predictive, dataflow-wide

actions

externally observed

threshold-based


policy

scaling action

metrics

8

OUR APPROACH: DS2

o1src o2

10 rec/s 100 rec/s

backpressuretarget: 40 rec/s

9

o1src o2

10 rec/s 100 rec/s


9

observed view

Which operator is the bottleneck?

What if we scale ο1 x 4?

How much to scale ο2?

10

o1src o2

10 rec/s 100 rec/s


observed view

10

o1src o2

10 rec/s 100 rec/s


observed view

src

o1

o2

Time (s)

10 10 10

1 2 3 4

100 100: busy: waiting

0.5s

instrumentation

DS2 view

10

o1src o2

10 rec/s 100 rec/s


observed view

src

o1

o2

Time (s)

10 10 10

1 2 3 4


0.5s

instrumentation

DS2 view

10

o1src o2

10 rec/s 100 rec/s


observed view

ο1 is the bottleneck

src

o1

o2

Time (s)

10 10 10

1 2 3 4


0.5s

instrumentation

DS2 view

10

o1src o2

10 rec/s 100 rec/s


observed view

ο1 is the bottleneck

true rate = 200 rec/s

2 ο2 instances can keep up with the rate

of 4 ο1 instances

THE DS2 MODEL

11

THE DS2 MODELUseful time: The time spent by an operator instance in deserialization, processing, and serialization activities.

11


11

True processing (resp. output) rate: The number of records an operator instance can process (resp. output) per unit of useful time.


11

True processing (resp. output) rate: The number of records an operator instance can process (resp. output) per unit of useful time.

Optimal parallelism for oi

aggregated true output rate of upstream opsaverage true processing rate of oi

:

CONVERGENCE STEPS

12

parallelism

initial rate

target• no overshoot when scaling up: ideal rate is an upper bound

• no undershoot when scaling down: ideal rate is a lower bound

if the actual scaling is linear, convergence takes one step

p0

x

CONVERGENCE STEPS

12

parallelism

initial rate

targetpre

dictio

n

• no overshoot when scaling up: ideal rate is an upper bound



p0

x

CONVERGENCE STEPS

12

parallelism

initial rate

targetpre

dictio

n




p0 p1

x

x

CONVERGENCE STEPS

12

parallelism

initial rate

targetpre

dictio

n



actu

alif the actual scaling is linear,

convergence takes one step

p0 p1

x

x

13

In practice, rates are commonly sub-linear due to other overheads (e.g. worker coordination).

parallelism

initial rate

target

x

predic

tion

x

p0 p1

CONVERGENCE STEPS

13


parallelism

initial rate

target

x

predic

tion

xactual

p0 p1

CONVERGENCE STEPS

13


parallelism

initial rate

target

x

predic

tion

xactual

x error

when the actual scaling is sub-linear, convergence takes more than one steps p0 p1

CONVERGENCE STEPS

14


parallelism

p0

initial rate

target

p1

predic

tion

actual

xx

p2

In our experiments, DS2 took up to three steps to converge

for complex queries.

x

p3

CONVERGENCE STEPS

15

DS2 operates online in a reactive setting

Instrumented stream processor

Scaling Manager Scaling Policy

Metrics Repository

invoke

re-scale job

report metrics

monitor

pull metrics

decision

Timely dataflow

Apache Flink

EVALUATION

DS2 VS. DHALION ON HERON

17

wordcount

Target rate: 16.700 rec/s


17

wordcount

DS2 converges in a single step for both operators



17

wordcount


DS2 converges in 60s, i.e. as soon as it receives the Heron

metrics



17

wordcount


Dhalion scales one operator at a time, resulting to a total

of six steps


metrics



17

wordcount



of six steps


metrics

Dhalion converges in 2000s



17

wordcount



of six steps


metrics

Dhalion converges in 2000s

+10 counts

+12 mappers


DS2 ON APACHE FLINK

18

wordcount

Target rate: 2.000.000 rec/s

DS2 ON APACHE FLINK

18

wordcount

Apache Flink savepoint and

reconfiguration takes ~30s


DS2 ON APACHE FLINK

18

wordcount



DS2 converges in two steps for both operators Target rate: 2.000.000 rec/s

DS2 ON APACHE FLINK

18

wordcount



DS2 converges in two steps for both operators

DS2 reacts 3s after the target

rate has changed


CONVERGENCE - NEXMARK

initial parallelism

Q1: flatmap

Q2: filter

Q3: incremental

join

Q5: tumbling window

join

Q8: sliding

window

Q11: session window

8 => 12 => 16 11 => 13 => 14 16 => 20 14 => 15 => 16 10 12 => 22 => 28

12 => 16 14 18 => 20 16 10 22 => 28

16 => 16 12 => 14 20 16 8 => 10 26 => 28

20 => 16 13 => 14 20 14 => 16 8 => 10 28

24 => 16 14 20 14 => 16 8 => 10 28

28 => 16 14 20 13 => 16 8 => 10 28

=> : scaling action

19


initial parallelism

Q1: flatmap

Q2: filter

Q3: incremental

join

Q5: tumbling window

join

Q8: sliding

window

Q11: session window

8 => 12 => 16 11 => 13 => 14 16 => 20 14 => 15 => 16 10 12 => 22 => 28

12 => 16 14 18 => 20 16 10 22 => 28

16 => 16 12 => 14 20 16 8 => 10 26 => 28

20 => 16 13 => 14 20 14 => 16 8 => 10 28

24 => 16 14 20 14 => 16 8 => 10 28

28 => 16 14 20 13 => 16 8 => 10 28

=> : scaling action

scale-up

19


initial parallelism

Q1: flatmap

Q2: filter

Q3: incremental

join

Q5: tumbling window

join

Q8: sliding

window

Q11: session window

8 => 12 => 16 11 => 13 => 14 16 => 20 14 => 15 => 16 10 12 => 22 => 28

12 => 16 14 18 => 20 16 10 22 => 28

16 => 16 12 => 14 20 16 8 => 10 26 => 28

20 => 16 13 => 14 20 14 => 16 8 => 10 28

24 => 16 14 20 14 => 16 8 => 10 28

28 => 16 14 20 13 => 16 8 => 10 28

=> : scaling action

scale-up

scale-down

19


initial parallelism

Q1: flatmap

Q2: filter

Q3: incremental

join

Q5: tumbling window

join

Q8: sliding

window

Q11: session window

8 => 12 => 16 11 => 13 => 14 16 => 20 14 => 15 => 16 10 12 => 22 => 28

12 => 16 14 18 => 20 16 10 22 => 28

16 => 16 12 => 14 20 16 8 => 10 26 => 28

20 => 16 13 => 14 20 14 => 16 8 => 10 28

24 => 16 14 20 14 => 16 8 => 10 28

28 => 16 14 20 13 => 16 8 => 10 28

a single step for many queries and initial configurations

=> : scaling action

scale-up

scale-down

19


initial parallelism

Q1: flatmap

Q2: filter

Q3: incremental

join

Q5: tumbling window

join

Q8: sliding

window

Q11: session window

8 => 12 => 16 11 => 13 => 14 16 => 20 14 => 15 => 16 10 12 => 22 => 28

12 => 16 14 18 => 20 16 10 22 => 28

16 => 16 12 => 14 20 16 8 => 10 26 => 28

20 => 16 13 => 14 20 14 => 16 8 => 10 28

24 => 16 14 20 14 => 16 8 => 10 28

28 => 16 14 20 13 => 16 8 => 10 28

at most 3 steps

a single step for many queries and initial configurations

=> : scaling action

scale-up

scale-down

19

RECAP

20

Observed metrics threshold-based policies can lead to

oscillations, misconfiguration and slow convergence.

DS2 uses instrumentation to measure true processing and

output rates and estimate parallelism for all operators at once.

Stream processor

Scaling Manager Scaling Policy

Metrics Repositor

invoke

re-scale job

report metrics

monitor

pull metrics

decision

Timely dataflow

DS2 makes fast and accurate scaling decisions and converges in up to three steps even for non-

linear, complex dataflows.

https://github.com/strymon-system/ds2

Three steps is all you need fast, accurate, automatic scaling decisions

for distributed streaming dataflows

Vasiliki Kalavri†, John Liagouris†, Moritz Hoffmann†,Desislava Dimitrova†, Matthew Forshaw††, Timothy Roscoe†

Support:

†

†Systems Group, Department of Computer Science, ETH Zürich, [email protected]‡

††Newcastle University, [email protected]

mailto:[email protected]

Documents

fast, accurate, automatic scaling decisions for ... · Three steps is all you need fast, accurate, automatic scaling decisions for distributed streaming dataﬂows Vasiliki Kalavri†,