CICADA - USENIX · 1 vm 2 vm 3 vm 4 vm 5vm 6 vm 7 vm 8 vm 9 vm 2 vm 3 vm 4 vm 5 vm 6 vm 7 vm 8 vm 9 vm 1 rigid application (similar to VOC [1]) vm 1 vm 2 vm 3 vm 4 vm 5vm 6 vm 7 vm

CICADAPREDICTIVE GUARANTEES FOR CLOUD NETWORKS

Katrina LaCurts Jeff Mogul

Hari Balakrishnan Yoshio Turner

MIT CSAIL, Google Inc., HP Labs

lacurts@mit|2

BANDWIDTH GUARANTEES

`

lacurts@mit|2

BANDWIDTH GUARANTEES10 machines

`

lacurts@mit|2


`

lacurts@mit|2


`

lacurts@mit|2

BANDWIDTH GUARANTEES

bandwidth guarantee: a promise from the provider to the customer that its VMs will be able to

communicate with each other at a particular rate (informal definition)

10 machines

`

lacurts@mit|3

WHY GUARANTEES?

lacurts@mit|3

WHY GUARANTEES?

is there really that much data?

lacurts@mit|3

WHY GUARANTEES?applications can send

terabytes or petabytes of data internally [1]

is there really that much data?

[1] Zaharia, et al. “Improving MapReduce Performance in Heterogeneous Environments”. OSDI 2008

lacurts@mit|3


terabytes or petabytes of data internally [1]

aren’t cloud networks homogeneous and well-provisioned?


lacurts@mit|3


terabytes or petabytes of data internally [1]application traffic is

affected by other users

aren’t cloud networks homogeneous and well-provisioned?


lacurts@mit|3




what customers care about network performance?


lacurts@mit|3




enterprise customers need to satisfy SLAs with their own customers

what customers care about network performance?


lacurts@mit|4

POSSIBLE ARCHITECTURE

lacurts@mit|4


customer requests machines and guarantee

10 machines, 10Gb/s between each pair

lacurts@mit|4



provider places customer’s VMs,

enforces guarantee


lacurts@mit|4




enforces guarantee


what if some pairs of VMs use fewer than 10Gb/s?

lacurts@mit|4




enforces guarantee



over-provisioned network increased cost to customer,

wasted bandwidth within network

lacurts@mit|4




enforces guarantee



what if some pairs of VMs need more than 10Gb/s?



lacurts@mit|4




enforces guarantee






under-provisioned network poor performance for

customer’s application

lacurts@mit|4




enforces guarantee






under-provisioned network poor performance for

customer’s application

problem: how do customers know what guarantee they need?

lacurts@mit|5

CICADA’S ARCHITECTUREcicada makes predictions about an application’s

traffic to automatically generate a guarantee

lacurts@mit|5

CICADA’S ARCHITECTURE

customer makes initial request for machines

cicada makes predictions about an application’s traffic to automatically generate a guarantee

lacurts@mit|5


customer makes initial request for machines provider places

tenant VMs


lacurts@mit|5


hypervisors send measurements to cicada controller


tenant VMs


lacurts@mit|5




tenant VMs

cicada predicts bandwidth for each tenant


lacurts@mit|5




provider offers guarantee based on cicada’s prediction

provider places tenant VMs



lacurts@mit|5







provider updates placement based on cicada’s prediction


lacurts@mit|6









lacurts@mit|6








problem: is application traffic actually predictable? if so, is it captured by existing models?

lacurts@mit|7

APPLICATION TRAFFICto design cicada’s prediction algorithm, we need to understand how

cloud applications send traffic

lacurts@mit|7

APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic

lacurts@mit|7


vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

rigid application (similar to VOC [1])

more data

less data

[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.


lacurts@mit|7


vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1


more data

less datano data



lacurts@mit|7


vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1


vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1

variable application

more data

less datano data



lacurts@mit|7


vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1



vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1


spatial variability: different pairs of VMs transfer different amounts of data

more data

less datano data



lacurts@mit|7


vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1



vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1



more data

less datano data



lacurts@mit|7


vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1



vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1



more data

less datano data



lacurts@mit|7


vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1



vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1


spatial variability: different pairs of VMs transfer different amounts of datatemporal variability: pairs of VMs transfer different amounts of data at different times

more data

less datano data



lacurts@mit|

DATASET

... ... ......

HP Cloud Serviceshttp://hpcloud.com

8

lacurts@mit|

DATASET

... ... ......

observed sflow data (could also use tcpdump, etc.)


8

lacurts@mit|

DATASET

... ... ......



8

(in this talk) collected one traffic matrix per hour, for

each application

lacurts@mit|

DATASET

... ... ......


M1 M2 Mn

...

observed traffic


8

each entry represents the number of bytes

transferred between i and j


each application

lacurts@mit|

DATASET

... ... ......


M1 M2 Mn

...

observed traffic


8

each entry represents the number of bytes

transferred between i and j


each application

goal: use this dataset to quantify spatial and temporal variability

lacurts@mit|

high spatial variabilityvm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

no spatial variability

9

SPATIAL VARIABILITYhow do we quantify spatial variability?

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1

some spatial variability

more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9


1. let Fij = fraction of tenant traffic sent from VMi to VMj

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9



Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9



all Fij values are equal

Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0

more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12

more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12

two distinct Fij values

more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20 Fij < 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12


more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20 Fij < 1/20 Fij > 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12


more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9



all Fij values are equal many distinct Fij values

Fij = 1/20 Fij < 1/20 Fij > 1/20

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12


more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20 Fij < 1/20 Fij > 1/20

2. calculate the coefficient of variation (σ/µ) of the Fij values

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12


more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20 Fij < 1/20 Fij > 1/20


⇒ σ(Fij)/µ(Fij) = 0

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12


more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20 Fij < 1/20 Fij > 1/20


⇒ σ(Fij)/µ(Fij) = 0

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12


⇒ σ(Fij)/µ(Fij) = 1*

*see paper for this result

more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20 Fij < 1/20 Fij > 1/20


⇒ σ(Fij)/µ(Fij) = 0 ⇒ σ(Fij)/µ(Fij) > 1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12




more data

less datano data

lacurts@mit|


vm2

vm3

vm4

vm5

vm1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


9




Fij = 1/20 Fij < 1/20 Fij > 1/20


⇒ σ(Fij)/µ(Fij) = 0 ⇒ σ(Fij)/µ(Fij) > 1

vm1 vm2 vm3 vm4 vm5

vm2

vm3

vm4

vm5

vm1


Fij = 0 Fij = 1/12




the higher the cov, the higher the spatial variation

more data

less datano data

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F

Spatial Variation(cov of Fij values)

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F


applications exhibit high spatial variability

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F



cov values as large as 100

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F




(the same result holds for temporal variability, see paper)

lacurts@mit|10

SPATIAL VARIABILITY

0

0.2

0.4

0.6

0.8

1

0.01 0.1 1 10 100

CD

F




(the same result holds for temporal variability, see paper)

customers cannot model such variability themselves, and existing models do not capture it

lacurts@mit|11

CICADA’S ALGORITHMM1 M2 Mn

...observed traffic

M̂n+1

prediction for next hour

Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.

lacurts@mit|11

CICADA’S ALGORITHM

M̂n+1

M1 M2 Mn

...observed traffic

M̂n+1


= f(M1,M2, . . . ,Mn)


lacurts@mit|11


= w1 ·M1 + w2 ·M2 + . . .+ wn ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1



lacurts@mit|11


= w1 ·M1 + w2 ·M2 + . . .+ wn ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1


instead of setting the weights beforehand, cicada learns the weights online; they are updated with every

prediction based on errors made in previous predictions


lacurts@mit|11


= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1





lacurts@mit|11


= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1




wi(n+ 1) = wi(n)


lacurts@mit|11


= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1




· e�L(i,n)loss function

wi(n+ 1) = wi(n)


lacurts@mit|11


= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1




L(i, n) =k ~Mi � ~Mnk

k ~Mnk· e�L(i,n)

loss function

wi(n+ 1) = wi(n)


lacurts@mit|11


= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1

M1 M2 Mn

...observed traffic

M̂n+1




L(i, n) =k ~Mi � ~Mnk

k ~Mnk· e�L(i,n)

loss function

normalization

· 1

Zn+1

wi(n+ 1) = wi(n)


lacurts@mit|12

EVALUATIONto evaluate cicada, we compared its predictions to ones made by

a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9

vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data


lacurts@mit|12



vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data

VOC requires customers to input parameters

(group sizes, amount of bandwidth)


lacurts@mit|12



vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data



we used an oracle method to pick the best possible

parameters, and also allowed for temporal variability


lacurts@mit|12



vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data





L2-norm errorE =

kM̂ �MkkM̂k

relative error

E =

Pi=n2

i=1 M̂ [i]�M [i]Pi=n2

i=1 M [i]


lacurts@mit|12



vm2

vm3

vm4

vm5

vm6

vm7

vm8

vm9

vm1 more data

less datano data





L2-norm errorE =

kM̂ �MkkM̂k

relative error

E =

Pi=n2

i=1 M̂ [i]�M [i]Pi=n2

i=1 M [i]

differentiates between over- and under-prediction


lacurts@mit|13

RESULTS PREDICTING AVERAGE DEMAND

does not differentiate between over- and under-prediction


0

0.2

0.4

0.6

0.8

1

0 1 2 3 4 5

CD

F

Relative L2-norm Error (fraction)

CicadaVOC (Oracle)

0

0.2

0.4

0.6

0.8

1

-1 0 1 2 3 4 5

CD

F

Relative Error (fraction)

CicadaVOC (Oracle)

lacurts@mit|13

RESULTS PREDICTING AVERAGE DEMAND

cicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and

require no customer input(71% for L2 error)



0

0.2

0.4

0.6

0.8

1

0 1 2 3 4 5

CD

F


CicadaVOC (Oracle)

0

0.2

0.4

0.6

0.8

1

-1 0 1 2 3 4 5

CD

F


CicadaVOC (Oracle)

lacurts@mit|13

RESULTS

cicada occasionally under-predicts; VOC does not because we selected its parameters to

avoid under-prediction

PREDICTING AVERAGE DEMAND





0

0.2

0.4

0.6

0.8

1

0 1 2 3 4 5

CD

F


CicadaVOC (Oracle)

0

0.2

0.4

0.6

0.8

1

-1 0 1 2 3 4 5

CD

F


CicadaVOC (Oracle)

lacurts@mit|13

RESULTS

cicada occasionally under-predicts; VOC does not because we selected its parameters to

avoid under-prediction

PREDICTING AVERAGE DEMAND





cicada’s predictions typically require only 1-2 hours of

application history

0

0.2

0.4

0.6

0.8

1

0 1 2 3 4 5

CD

F


CicadaVOC (Oracle)

0

0.2

0.4

0.6

0.8

1

-1 0 1 2 3 4 5

CD

F


CicadaVOC (Oracle)

lacurts@mit|14

PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair

lacurts@mit|14


network resources must be available

lacurts@mit|14


network resources must be available customers can add a buffer to the offered guarantees

accept cicada’s guarantee, but add 5% more bandwidth

lacurts@mit|14



the provider may choose to augment predictions that cicada is not confident about

typically not confident for “small” applications

customers can add a buffer to the offered guarantees


lacurts@mit|14



the provider may choose to augment predictions that cicada is not confident about

typically not confident for “small” applications

customers can add a buffer to the offered guarantees


cicada can detect when a guarantee is too low, and offer a new one

observe packet drops

lacurts@mit|15

NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?

lacurts@mit|15


wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application

lacurts@mit|15



provider’s goal: minimize wasted bandwidth

lacurts@mit|15




intra-rack bandwidth is usually cheap

(sometimes free)

lacurts@mit|15





(sometimes free)

inter-rack bandwidth is more expensive

lacurts@mit|15



provider’s goal: minimize wasted inter-rack bandwidth


(sometimes free)

inter-rack bandwidth is more expensive

lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck

Ba

nd

wid

th A

va

ila

ble

(GB

yte

/ho

ur)

Oversubscription Factor


cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)

lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck B

an

dw

idth

Availab

le(G

Byte

/ho

ur)


bw factor 1x = VOC’s placement [1]



lacurts@mit|16

NETWORK UTILIZATION

= VOC’s placement [1]= cicada’s placement

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck

Ba

nd

wid

th A

va

ila

ble

(GB

yte

/ho

ur)


bw factor 1x



lacurts@mit|16

NETWORK UTILIZATION


0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck

Ba

nd

wid

th A

va

ila

ble

(GB

yte

/ho

ur)


bw factor 1xbw factor 25x



lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck B

an

dw

idth

Availab

le(G

Byte

/ho

ur)



bw factor 250x




lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck B

an

dw

idth

Availab

le(G

Byte

/ho

ur)



bw factor 250x


as applications use more bandwidth, VOC’s placement wastes more inter-rack bandwidth



lacurts@mit|16

NETWORK UTILIZATION

0

100000

200000

300000

400000

500000

600000

1 1.4142 1.5 1.666 1.75 2 4 8 16

Inte

r-ra

ck B

an

dw

idth

Availab

le(G

Byte

/ho

ur)



bw factor 250x


in some scenarios, cicada’s placement more than doubles the amount

of available inter-rack bandwidth

as applications use more bandwidth, VOC’s placement wastes more inter-rack bandwidth



lacurts@mit|17

SUMMARY

accurate

cloud applications exhibit variability that existing models don’t capture

cicada captures this variability, and provides guarantees that are

calculated quickly*

require little history

increase network utilization


lacurts@mit|17

SUMMARY

accurate



calculated quickly*




improve relative error by 90%

lacurts@mit|17

SUMMARY

accurate



calculated quickly*





< 10milliseconds per application

lacurts@mit|17

SUMMARY

accurate



calculated quickly*






1--2 hours for most applications

lacurts@mit|17

SUMMARY

accurate



calculated quickly*






1--2 hours for most applications

in some cases, inter-rack utilization is doubled

Documents

CICADA - USENIX · 1 vm 2 vm 3 vm 4 vm 5vm 6 vm 7 vm 8 vm 9 vm 2 vm 3 vm 4 vm 5 vm 6 vm 7 vm 8 vm 9 vm 1 rigid application (similar to VOC [1]) vm 1 vm 2 vm 3 vm 4 vm 5vm 6 vm 7 vm