Upload
others
View
93
Download
0
Embed Size (px)
Citation preview
CICADAPREDICTIVE GUARANTEES FOR CLOUD NETWORKS
Katrina LaCurts Jeff Mogul
Hari Balakrishnan Yoshio Turner
MIT CSAIL, Google Inc., HP Labs
lacurts@mit|2
BANDWIDTH GUARANTEES
`
lacurts@mit|2
BANDWIDTH GUARANTEES10 machines
`
lacurts@mit|2
BANDWIDTH GUARANTEES10 machines
`
lacurts@mit|2
BANDWIDTH GUARANTEES10 machines
`
lacurts@mit|2
BANDWIDTH GUARANTEES
bandwidth guarantee: a promise from the provider to the customer that its VMs will be able to
communicate with each other at a particular rate (informal definition)
10 machines
`
lacurts@mit|3
WHY GUARANTEES?
lacurts@mit|3
WHY GUARANTEES?
is there really that much data?
lacurts@mit|3
WHY GUARANTEES?applications can send
terabytes or petabytes of data internally [1]
is there really that much data?
[1] Zaharia, et al. “Improving MapReduce Performance in Heterogeneous Environments”. OSDI 2008
lacurts@mit|3
WHY GUARANTEES?applications can send
terabytes or petabytes of data internally [1]
aren’t cloud networks homogeneous and well-provisioned?
[1] Zaharia, et al. “Improving MapReduce Performance in Heterogeneous Environments”. OSDI 2008
lacurts@mit|3
WHY GUARANTEES?applications can send
terabytes or petabytes of data internally [1]application traffic is
affected by other users
aren’t cloud networks homogeneous and well-provisioned?
[1] Zaharia, et al. “Improving MapReduce Performance in Heterogeneous Environments”. OSDI 2008
lacurts@mit|3
WHY GUARANTEES?applications can send
terabytes or petabytes of data internally [1]application traffic is
affected by other users
what customers care about network performance?
[1] Zaharia, et al. “Improving MapReduce Performance in Heterogeneous Environments”. OSDI 2008
lacurts@mit|3
WHY GUARANTEES?applications can send
terabytes or petabytes of data internally [1]application traffic is
affected by other users
enterprise customers need to satisfy SLAs with their own customers
what customers care about network performance?
[1] Zaharia, et al. “Improving MapReduce Performance in Heterogeneous Environments”. OSDI 2008
lacurts@mit|4
POSSIBLE ARCHITECTURE
lacurts@mit|4
POSSIBLE ARCHITECTURE
customer requests machines and guarantee
10 machines, 10Gb/s between each pair
lacurts@mit|4
POSSIBLE ARCHITECTURE
customer requests machines and guarantee
provider places customer’s VMs,
enforces guarantee
10 machines, 10Gb/s between each pair
lacurts@mit|4
POSSIBLE ARCHITECTURE
customer requests machines and guarantee
provider places customer’s VMs,
enforces guarantee
10 machines, 10Gb/s between each pair
what if some pairs of VMs use fewer than 10Gb/s?
lacurts@mit|4
POSSIBLE ARCHITECTURE
customer requests machines and guarantee
provider places customer’s VMs,
enforces guarantee
10 machines, 10Gb/s between each pair
what if some pairs of VMs use fewer than 10Gb/s?
over-provisioned network increased cost to customer,
wasted bandwidth within network
lacurts@mit|4
POSSIBLE ARCHITECTURE
customer requests machines and guarantee
provider places customer’s VMs,
enforces guarantee
10 machines, 10Gb/s between each pair
what if some pairs of VMs use fewer than 10Gb/s?
what if some pairs of VMs need more than 10Gb/s?
over-provisioned network increased cost to customer,
wasted bandwidth within network
lacurts@mit|4
POSSIBLE ARCHITECTURE
customer requests machines and guarantee
provider places customer’s VMs,
enforces guarantee
10 machines, 10Gb/s between each pair
what if some pairs of VMs use fewer than 10Gb/s?
what if some pairs of VMs need more than 10Gb/s?
over-provisioned network increased cost to customer,
wasted bandwidth within network
under-provisioned network poor performance for
customer’s application
lacurts@mit|4
POSSIBLE ARCHITECTURE
customer requests machines and guarantee
provider places customer’s VMs,
enforces guarantee
10 machines, 10Gb/s between each pair
what if some pairs of VMs use fewer than 10Gb/s?
what if some pairs of VMs need more than 10Gb/s?
over-provisioned network increased cost to customer,
wasted bandwidth within network
under-provisioned network poor performance for
customer’s application
problem: how do customers know what guarantee they need?
lacurts@mit|5
CICADA’S ARCHITECTUREcicada makes predictions about an application’s
traffic to automatically generate a guarantee
lacurts@mit|5
CICADA’S ARCHITECTURE
customer makes initial request for machines
cicada makes predictions about an application’s traffic to automatically generate a guarantee
lacurts@mit|5
CICADA’S ARCHITECTURE
customer makes initial request for machines provider places
tenant VMs
cicada makes predictions about an application’s traffic to automatically generate a guarantee
lacurts@mit|5
CICADA’S ARCHITECTURE
hypervisors send measurements to cicada controller
customer makes initial request for machines provider places
tenant VMs
cicada makes predictions about an application’s traffic to automatically generate a guarantee
lacurts@mit|5
CICADA’S ARCHITECTURE
hypervisors send measurements to cicada controller
customer makes initial request for machines provider places
tenant VMs
cicada predicts bandwidth for each tenant
cicada makes predictions about an application’s traffic to automatically generate a guarantee
lacurts@mit|5
CICADA’S ARCHITECTURE
hypervisors send measurements to cicada controller
customer makes initial request for machines
provider offers guarantee based on cicada’s prediction
provider places tenant VMs
cicada predicts bandwidth for each tenant
cicada makes predictions about an application’s traffic to automatically generate a guarantee
lacurts@mit|5
CICADA’S ARCHITECTURE
hypervisors send measurements to cicada controller
customer makes initial request for machines
provider offers guarantee based on cicada’s prediction
provider places tenant VMs
cicada predicts bandwidth for each tenant
provider updates placement based on cicada’s prediction
cicada makes predictions about an application’s traffic to automatically generate a guarantee
lacurts@mit|6
CICADA’S ARCHITECTURE
hypervisors send measurements to cicada controller
customer makes initial request for machines
provider offers guarantee based on cicada’s prediction
provider places tenant VMs
cicada predicts bandwidth for each tenant
provider updates placement based on cicada’s prediction
cicada makes predictions about an application’s traffic to automatically generate a guarantee
lacurts@mit|6
CICADA’S ARCHITECTURE
hypervisors send measurements to cicada controller
customer makes initial request for machines
provider offers guarantee based on cicada’s prediction
provider places tenant VMs
cicada predicts bandwidth for each tenant
provider updates placement based on cicada’s prediction
problem: is application traffic actually predictable? if so, is it captured by existing models?
lacurts@mit|7
APPLICATION TRAFFICto design cicada’s prediction algorithm, we need to understand how
cloud applications send traffic
lacurts@mit|7
APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic
lacurts@mit|7
APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
rigid application (similar to VOC [1])
more data
less data
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic
lacurts@mit|7
APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
rigid application (similar to VOC [1])
more data
less datano data
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic
lacurts@mit|7
APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
rigid application (similar to VOC [1])
vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
variable application
more data
less datano data
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic
lacurts@mit|7
APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
rigid application (similar to VOC [1])
vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
variable application
spatial variability: different pairs of VMs transfer different amounts of data
more data
less datano data
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic
lacurts@mit|7
APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
rigid application (similar to VOC [1])
vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
variable application
spatial variability: different pairs of VMs transfer different amounts of data
more data
less datano data
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic
lacurts@mit|7
APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
rigid application (similar to VOC [1])
vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
variable application
spatial variability: different pairs of VMs transfer different amounts of data
more data
less datano data
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic
lacurts@mit|7
APPLICATION TRAFFICvm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
rigid application (similar to VOC [1])
vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1
variable application
spatial variability: different pairs of VMs transfer different amounts of datatemporal variability: pairs of VMs transfer different amounts of data at different times
more data
less datano data
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
to design cicada’s prediction algorithm, we need to understand how cloud applications send traffic
lacurts@mit|
DATASET
... ... ......
HP Cloud Serviceshttp://hpcloud.com
8
lacurts@mit|
DATASET
... ... ......
observed sflow data (could also use tcpdump, etc.)
HP Cloud Serviceshttp://hpcloud.com
8
lacurts@mit|
DATASET
... ... ......
observed sflow data (could also use tcpdump, etc.)
HP Cloud Serviceshttp://hpcloud.com
8
(in this talk) collected one traffic matrix per hour, for
each application
lacurts@mit|
DATASET
... ... ......
observed sflow data (could also use tcpdump, etc.)
M1 M2 Mn
...
observed traffic
HP Cloud Serviceshttp://hpcloud.com
8
each entry represents the number of bytes
transferred between i and j
(in this talk) collected one traffic matrix per hour, for
each application
lacurts@mit|
DATASET
... ... ......
observed sflow data (could also use tcpdump, etc.)
M1 M2 Mn
...
observed traffic
HP Cloud Serviceshttp://hpcloud.com
8
each entry represents the number of bytes
transferred between i and j
(in this talk) collected one traffic matrix per hour, for
each application
goal: use this dataset to quantify spatial and temporal variability
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
Fij = 1/20
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal
Fij = 1/20
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal
Fij = 1/20
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal
Fij = 1/20
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal
Fij = 1/20
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
two distinct Fij values
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal
Fij = 1/20 Fij < 1/20
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
two distinct Fij values
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal
Fij = 1/20 Fij < 1/20 Fij > 1/20
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
two distinct Fij values
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal many distinct Fij values
Fij = 1/20 Fij < 1/20 Fij > 1/20
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
two distinct Fij values
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal many distinct Fij values
Fij = 1/20 Fij < 1/20 Fij > 1/20
2. calculate the coefficient of variation (σ/µ) of the Fij values
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
two distinct Fij values
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal many distinct Fij values
Fij = 1/20 Fij < 1/20 Fij > 1/20
2. calculate the coefficient of variation (σ/µ) of the Fij values
⇒ σ(Fij)/µ(Fij) = 0
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
two distinct Fij values
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal many distinct Fij values
Fij = 1/20 Fij < 1/20 Fij > 1/20
2. calculate the coefficient of variation (σ/µ) of the Fij values
⇒ σ(Fij)/µ(Fij) = 0
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
two distinct Fij values
⇒ σ(Fij)/µ(Fij) = 1*
*see paper for this result
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal many distinct Fij values
Fij = 1/20 Fij < 1/20 Fij > 1/20
2. calculate the coefficient of variation (σ/µ) of the Fij values
⇒ σ(Fij)/µ(Fij) = 0 ⇒ σ(Fij)/µ(Fij) > 1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
two distinct Fij values
⇒ σ(Fij)/µ(Fij) = 1*
*see paper for this result
more data
less datano data
lacurts@mit|
high spatial variabilityvm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
no spatial variability
9
SPATIAL VARIABILITYhow do we quantify spatial variability?
1. let Fij = fraction of tenant traffic sent from VMi to VMj
all Fij values are equal many distinct Fij values
Fij = 1/20 Fij < 1/20 Fij > 1/20
2. calculate the coefficient of variation (σ/µ) of the Fij values
⇒ σ(Fij)/µ(Fij) = 0 ⇒ σ(Fij)/µ(Fij) > 1
vm1 vm2 vm3 vm4 vm5
vm2
vm3
vm4
vm5
vm1
some spatial variability
Fij = 0 Fij = 1/12
two distinct Fij values
⇒ σ(Fij)/µ(Fij) = 1*
*see paper for this result
the higher the cov, the higher the spatial variation
more data
less datano data
lacurts@mit|10
SPATIAL VARIABILITY
0
0.2
0.4
0.6
0.8
1
0.01 0.1 1 10 100
CD
F
Spatial Variation(cov of Fij values)
lacurts@mit|10
SPATIAL VARIABILITY
0
0.2
0.4
0.6
0.8
1
0.01 0.1 1 10 100
CD
F
Spatial Variation(cov of Fij values)
applications exhibit high spatial variability
lacurts@mit|10
SPATIAL VARIABILITY
0
0.2
0.4
0.6
0.8
1
0.01 0.1 1 10 100
CD
F
Spatial Variation(cov of Fij values)
applications exhibit high spatial variability
cov values as large as 100
lacurts@mit|10
SPATIAL VARIABILITY
0
0.2
0.4
0.6
0.8
1
0.01 0.1 1 10 100
CD
F
Spatial Variation(cov of Fij values)
applications exhibit high spatial variability
cov values as large as 100
(the same result holds for temporal variability, see paper)
lacurts@mit|10
SPATIAL VARIABILITY
0
0.2
0.4
0.6
0.8
1
0.01 0.1 1 10 100
CD
F
Spatial Variation(cov of Fij values)
applications exhibit high spatial variability
cov values as large as 100
(the same result holds for temporal variability, see paper)
customers cannot model such variability themselves, and existing models do not capture it
lacurts@mit|11
CICADA’S ALGORITHMM1 M2 Mn
...observed traffic
M̂n+1
prediction for next hour
Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.
lacurts@mit|11
CICADA’S ALGORITHM
M̂n+1
M1 M2 Mn
...observed traffic
M̂n+1
prediction for next hour
= f(M1,M2, . . . ,Mn)
Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.
lacurts@mit|11
CICADA’S ALGORITHM
= w1 ·M1 + w2 ·M2 + . . .+ wn ·MnM̂n+1
M1 M2 Mn
...observed traffic
M̂n+1
prediction for next hour
Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.
lacurts@mit|11
CICADA’S ALGORITHM
= w1 ·M1 + w2 ·M2 + . . .+ wn ·MnM̂n+1
M1 M2 Mn
...observed traffic
M̂n+1
prediction for next hour
instead of setting the weights beforehand, cicada learns the weights online; they are updated with every
prediction based on errors made in previous predictions
Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.
lacurts@mit|11
CICADA’S ALGORITHM
= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1
M1 M2 Mn
...observed traffic
M̂n+1
prediction for next hour
instead of setting the weights beforehand, cicada learns the weights online; they are updated with every
prediction based on errors made in previous predictions
Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.
lacurts@mit|11
CICADA’S ALGORITHM
= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1
M1 M2 Mn
...observed traffic
M̂n+1
prediction for next hour
instead of setting the weights beforehand, cicada learns the weights online; they are updated with every
prediction based on errors made in previous predictions
wi(n+ 1) = wi(n)
Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.
lacurts@mit|11
CICADA’S ALGORITHM
= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1
M1 M2 Mn
...observed traffic
M̂n+1
prediction for next hour
instead of setting the weights beforehand, cicada learns the weights online; they are updated with every
prediction based on errors made in previous predictions
· e�L(i,n)loss function
wi(n+ 1) = wi(n)
Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.
lacurts@mit|11
CICADA’S ALGORITHM
= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1
M1 M2 Mn
...observed traffic
M̂n+1
prediction for next hour
instead of setting the weights beforehand, cicada learns the weights online; they are updated with every
prediction based on errors made in previous predictions
L(i, n) =k ~Mi � ~Mnk
k ~Mnk· e�L(i,n)
loss function
wi(n+ 1) = wi(n)
Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.
lacurts@mit|11
CICADA’S ALGORITHM
= w1(n) ·M1 + w2(n) ·M2 + . . .+ wn(n) ·MnM̂n+1
M1 M2 Mn
...observed traffic
M̂n+1
prediction for next hour
instead of setting the weights beforehand, cicada learns the weights online; they are updated with every
prediction based on errors made in previous predictions
L(i, n) =k ~Mi � ~Mnk
k ~Mnk· e�L(i,n)
loss function
normalization
· 1
Zn+1
wi(n+ 1) = wi(n)
Cicada’s prediction algorithm draws inspiration from the following works: [1] Wang, et al. “COPE: Traffic Engineering in Dynamic Networks”. SIGCOMM 2006. [2] Herbster and Warmuth. “Tracking the Best Expert”. Machine Learning Vol 32, 1998.
lacurts@mit|12
EVALUATIONto evaluate cicada, we compared its predictions to ones made by
a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1 more data
less datano data
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
lacurts@mit|12
EVALUATIONto evaluate cicada, we compared its predictions to ones made by
a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1 more data
less datano data
VOC requires customers to input parameters
(group sizes, amount of bandwidth)
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
lacurts@mit|12
EVALUATIONto evaluate cicada, we compared its predictions to ones made by
a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1 more data
less datano data
VOC requires customers to input parameters
(group sizes, amount of bandwidth)
we used an oracle method to pick the best possible
parameters, and also allowed for temporal variability
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
lacurts@mit|12
EVALUATIONto evaluate cicada, we compared its predictions to ones made by
a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1 more data
less datano data
VOC requires customers to input parameters
(group sizes, amount of bandwidth)
we used an oracle method to pick the best possible
parameters, and also allowed for temporal variability
L2-norm errorE =
kM̂ �MkkM̂k
relative error
E =
Pi=n2
i=1 M̂ [i]�M [i]Pi=n2
i=1 M [i]
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
lacurts@mit|12
EVALUATIONto evaluate cicada, we compared its predictions to ones made by
a VOC-style system [1]vm1 vm2 vm3 vm4 vm5vm6 vm7 vm8 vm9
vm2
vm3
vm4
vm5
vm6
vm7
vm8
vm9
vm1 more data
less datano data
VOC requires customers to input parameters
(group sizes, amount of bandwidth)
we used an oracle method to pick the best possible
parameters, and also allowed for temporal variability
L2-norm errorE =
kM̂ �MkkM̂k
relative error
E =
Pi=n2
i=1 M̂ [i]�M [i]Pi=n2
i=1 M [i]
differentiates between over- and under-prediction
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
lacurts@mit|13
RESULTS PREDICTING AVERAGE DEMAND
does not differentiate between over- and under-prediction
differentiates between over- and under-prediction
0
0.2
0.4
0.6
0.8
1
0 1 2 3 4 5
CD
F
Relative L2-norm Error (fraction)
CicadaVOC (Oracle)
0
0.2
0.4
0.6
0.8
1
-1 0 1 2 3 4 5
CD
F
Relative Error (fraction)
CicadaVOC (Oracle)
lacurts@mit|13
RESULTS PREDICTING AVERAGE DEMAND
cicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and
require no customer input(71% for L2 error)
does not differentiate between over- and under-prediction
differentiates between over- and under-prediction
0
0.2
0.4
0.6
0.8
1
0 1 2 3 4 5
CD
F
Relative L2-norm Error (fraction)
CicadaVOC (Oracle)
0
0.2
0.4
0.6
0.8
1
-1 0 1 2 3 4 5
CD
F
Relative Error (fraction)
CicadaVOC (Oracle)
lacurts@mit|13
RESULTS
cicada occasionally under-predicts; VOC does not because we selected its parameters to
avoid under-prediction
PREDICTING AVERAGE DEMAND
cicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and
require no customer input(71% for L2 error)
does not differentiate between over- and under-prediction
differentiates between over- and under-prediction
0
0.2
0.4
0.6
0.8
1
0 1 2 3 4 5
CD
F
Relative L2-norm Error (fraction)
CicadaVOC (Oracle)
0
0.2
0.4
0.6
0.8
1
-1 0 1 2 3 4 5
CD
F
Relative Error (fraction)
CicadaVOC (Oracle)
lacurts@mit|13
RESULTS
cicada occasionally under-predicts; VOC does not because we selected its parameters to
avoid under-prediction
PREDICTING AVERAGE DEMAND
cicada’s predictions outperform VOC-style predictions (median error decreases by 90%) and
require no customer input(71% for L2 error)
does not differentiate between over- and under-prediction
differentiates between over- and under-prediction
cicada’s predictions typically require only 1-2 hours of
application history
0
0.2
0.4
0.6
0.8
1
0 1 2 3 4 5
CD
F
Relative L2-norm Error (fraction)
CicadaVOC (Oracle)
0
0.2
0.4
0.6
0.8
1
-1 0 1 2 3 4 5
CD
F
Relative Error (fraction)
CicadaVOC (Oracle)
lacurts@mit|14
PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair
lacurts@mit|14
PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair
network resources must be available
lacurts@mit|14
PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair
network resources must be available customers can add a buffer to the offered guarantees
accept cicada’s guarantee, but add 5% more bandwidth
lacurts@mit|14
PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair
network resources must be available
the provider may choose to augment predictions that cicada is not confident about
typically not confident for “small” applications
customers can add a buffer to the offered guarantees
accept cicada’s guarantee, but add 5% more bandwidth
lacurts@mit|14
PREDICTIONS→GUARANTEEScicada turns its predictions into pipe-model guarantees; a separate guarantee for each (source, destination) pair
network resources must be available
the provider may choose to augment predictions that cicada is not confident about
typically not confident for “small” applications
customers can add a buffer to the offered guarantees
accept cicada’s guarantee, but add 5% more bandwidth
cicada can detect when a guarantee is too low, and offer a new one
observe packet drops
lacurts@mit|15
NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?
lacurts@mit|15
NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?
wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application
lacurts@mit|15
NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?
wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application
provider’s goal: minimize wasted bandwidth
lacurts@mit|15
NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?
wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application
provider’s goal: minimize wasted bandwidth
intra-rack bandwidth is usually cheap
(sometimes free)
lacurts@mit|15
NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?
wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application
provider’s goal: minimize wasted bandwidth
intra-rack bandwidth is usually cheap
(sometimes free)
inter-rack bandwidth is more expensive
lacurts@mit|15
NETWORK UTILIZATIONdo cicada’s bandwidth guarantees improve network utilization?
wasted bandwidth: bandwidth that is guaranteed for an application, but not used by the application
provider’s goal: minimize wasted inter-rack bandwidth
intra-rack bandwidth is usually cheap
(sometimes free)
inter-rack bandwidth is more expensive
lacurts@mit|16
NETWORK UTILIZATION
0
100000
200000
300000
400000
500000
600000
1 1.4142 1.5 1.666 1.75 2 4 8 16
Inte
r-ra
ck
Ba
nd
wid
th A
va
ila
ble
(GB
yte
/ho
ur)
Oversubscription Factor
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)
lacurts@mit|16
NETWORK UTILIZATION
0
100000
200000
300000
400000
500000
600000
1 1.4142 1.5 1.666 1.75 2 4 8 16
Inte
r-ra
ck B
an
dw
idth
Availab
le(G
Byte
/ho
ur)
Oversubscription Factor
bw factor 1x = VOC’s placement [1]
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)
lacurts@mit|16
NETWORK UTILIZATION
= VOC’s placement [1]= cicada’s placement
0
100000
200000
300000
400000
500000
600000
1 1.4142 1.5 1.666 1.75 2 4 8 16
Inte
r-ra
ck
Ba
nd
wid
th A
va
ila
ble
(GB
yte
/ho
ur)
Oversubscription Factor
bw factor 1x
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)
lacurts@mit|16
NETWORK UTILIZATION
= VOC’s placement [1]= cicada’s placement
0
100000
200000
300000
400000
500000
600000
1 1.4142 1.5 1.666 1.75 2 4 8 16
Inte
r-ra
ck
Ba
nd
wid
th A
va
ila
ble
(GB
yte
/ho
ur)
Oversubscription Factor
bw factor 1xbw factor 25x
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)
lacurts@mit|16
NETWORK UTILIZATION
0
100000
200000
300000
400000
500000
600000
1 1.4142 1.5 1.666 1.75 2 4 8 16
Inte
r-ra
ck B
an
dw
idth
Availab
le(G
Byte
/ho
ur)
Oversubscription Factor
bw factor 1xbw factor 25x
bw factor 250x
= VOC’s placement [1]= cicada’s placement
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)
lacurts@mit|16
NETWORK UTILIZATION
0
100000
200000
300000
400000
500000
600000
1 1.4142 1.5 1.666 1.75 2 4 8 16
Inte
r-ra
ck B
an
dw
idth
Availab
le(G
Byte
/ho
ur)
Oversubscription Factor
bw factor 1xbw factor 25x
bw factor 250x
= VOC’s placement [1]= cicada’s placement
as applications use more bandwidth, VOC’s placement wastes more inter-rack bandwidth
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)
lacurts@mit|16
NETWORK UTILIZATION
0
100000
200000
300000
400000
500000
600000
1 1.4142 1.5 1.666 1.75 2 4 8 16
Inte
r-ra
ck B
an
dw
idth
Availab
le(G
Byte
/ho
ur)
Oversubscription Factor
bw factor 1xbw factor 25x
bw factor 250x
= VOC’s placement [1]= cicada’s placement
in some scenarios, cicada’s placement more than doubles the amount
of available inter-rack bandwidth
as applications use more bandwidth, VOC’s placement wastes more inter-rack bandwidth
[1] Ballani, et al. “Towards Predictable Datacenter Networks”. SIGCOMM 2011.
cicada’s greedy placement heuristic: place the pairs of VMs that need the highest guarantees on the fastest paths (see paper for details)
lacurts@mit|17
SUMMARY
accurate
cloud applications exhibit variability that existing models don’t capture
cicada captures this variability, and provides guarantees that are
calculated quickly*
require little history
increase network utilization
*see paper for this result
lacurts@mit|17
SUMMARY
accurate
cloud applications exhibit variability that existing models don’t capture
cicada captures this variability, and provides guarantees that are
calculated quickly*
require little history
increase network utilization
*see paper for this result
improve relative error by 90%
lacurts@mit|17
SUMMARY
accurate
cloud applications exhibit variability that existing models don’t capture
cicada captures this variability, and provides guarantees that are
calculated quickly*
require little history
increase network utilization
*see paper for this result
improve relative error by 90%
< 10milliseconds per application
lacurts@mit|17
SUMMARY
accurate
cloud applications exhibit variability that existing models don’t capture
cicada captures this variability, and provides guarantees that are
calculated quickly*
require little history
increase network utilization
*see paper for this result
improve relative error by 90%
< 10milliseconds per application
1--2 hours for most applications
lacurts@mit|17
SUMMARY
accurate
cloud applications exhibit variability that existing models don’t capture
cicada captures this variability, and provides guarantees that are
calculated quickly*
require little history
increase network utilization
*see paper for this result
improve relative error by 90%
< 10milliseconds per application
1--2 hours for most applications
in some cases, inter-rack utilization is doubled