View
1
Download
0
Category
Preview:
Citation preview
SMORE: Semi-Oblivious Traffic Engineering
Praveen Kumar* Yang Yuan* Chris Yu‡ Nate Foster* Robert Kleinberg* Petr Lapukhov# Chiun Lin Lim# Robert Soulé §
* Cornell ‡ CMU # Facebook § USI Lugano
WAN Traffic Engineering
WAN Traffic EngineeringObjectives Challenges
Gbps
Performance Robustness
Latency Operational simplicity
WAN Traffic EngineeringObjectives Challenges
Gbps
Performance Robustness
Latency Operational simplicity
Unstructured topology
Unexpected failures
Misprediction & Traffic Bursts
Heterogeneous capacity
Update overheads
Device limitations
TE ApproachesTraditional Distributed
SDN-Based Centralized
1
1
100
1
1
1
1
1
1
1
TE ApproachesTraditional Distributed
SDN-Based Centralized
1
1
100
1
1
1
1
1
1
1
TE ApproachesTraditional Distributed
SDN-Based Centralized
1
1
100
1
1
1
1
1
1
1100
TE ApproachesTraditional Distributed
SDN-Based Centralized
1
1
100
1
1
1
1
1
1
1100
TE ApproachesTraditional Distributed
SDN-Based Centralized
Optimal TE?(MCF)
1
1
100
1
1
1
1
1
1
1100
Operational Cost of OptimalitySolver Time
Tim
e (s
econ
ds)
Traffic Matrix
Operational Cost of OptimalityPath Churn
Chur
n (#
pat
hs)
Traffic Matrix
Towards a Practical ModelTopology
(+ demands)
Path Selection
Rate Adaptation
Paths
Splitting Ratio
Demands
1
2
Towards a Practical ModelTopology
(+ demands)
Path Selection
Rate Adaptation
Paths
Splitting Ratio
Demands
Computing and updating
paths is typically expensive and
slow.
But updating splitting ratios is cheap and fast!
1
2
Towards a Practical ModelTopology
(+ demands)
Path Selection
Rate Adaptation
Paths
Splitting Ratio
Demands
Computing and updating
paths is typically expensive and
slow.
But updating splitting ratios is cheap and fast!
Static
Dynamic
1
2
Path Selection Challenges
• Selecting a good set of paths is tricky!
• Route the demands (ideally, with competitive latency)
• React to changes in demands (diurnal changes, traffic bursts, etc.)
• Be robust under mis-prediction of demands
• Have sufficient extra capacity to route demands in presence of failures
• and more …
ApproachA static set of cleverly-constructed paths can
provide near-optimal performance and robustness!
Desired path properties:
• Low stretch for minimizing latency
• High diversity for ensuring robustness
• Good load balancing for performance • Capacity aware
• Globally optimized{
Path Properties: Capacity Aware
• Traditional approaches to routing based on shortest paths (e.g., ECMP, KSP) are generally not capacity aware
C
B
A
G E
F
D
100 Gbps10 Gbps
Path Properties: Capacity Aware
• Traditional approaches to routing based on shortest paths (e.g., ECMP, KSP) are generally not capacity aware
C
B
A
G E
F
DA
C
B
100 Gbps10 Gbps
Path Properties: Capacity Aware
• Traditional approaches to routing based on shortest paths (e.g., ECMP, KSP) are generally not capacity aware
C
B
A
G E
F
DA
C
B
100 Gbps10 Gbps
❌
Path Properties: Globally OptimalOther approaches based on greedy algorithms are
capacity aware, but are still not globally optimal
C
B
A
G E
F
D
Globally optimalCSPF
Path Properties: Globally OptimalOther approaches based on greedy algorithms are
capacity aware, but are still not globally optimal
C
B
A
G E
F
DA
Globally optimalCSPF
Path Properties: Globally OptimalOther approaches based on greedy algorithms are
capacity aware, but are still not globally optimal
C
B
A
G E
F
DA
B
Globally optimalCSPF
Path Properties: Globally OptimalOther approaches based on greedy algorithms are
capacity aware, but are still not globally optimal
C
B
A
G E
F
DA
C
B
Globally optimalCSPF
Path Properties: Globally OptimalOther approaches based on greedy algorithms are
capacity aware, but are still not globally optimal
C
B
A
G E
F
DA
C
B
C
B
A
G E
F
DA
C
B
Globally optimalCSPF
Path Selection
AlgorithmLoad balanced
Diverse Low-stretchCapacity aware
Globally Optimized
SPF / ECMP ❌ ❌ ❌ ✔
CSPF ✔ ❌ ❌ ✔
k-shortest paths ❌ ❌ ? ✔
Edge-disjoint KSP ❌ ❌ ✔ ✔
MCF ✔ ✔ ❌ ❌
VLB ❌ ❌ ✔ ❌
B4 ✔ ✔ ❌ ?
? - Difficult to generalize
Path Selection
AlgorithmLoad balanced
Diverse Low-stretchCapacity aware
Globally Optimized
SPF / ECMP ❌ ❌ ❌ ✔
CSPF ✔ ❌ ❌ ✔
k-shortest paths ❌ ❌ ? ✔
Edge-disjoint KSP ❌ ❌ ✔ ✔
MCF ✔ ✔ ❌ ❌
VLB ❌ ❌ ✔ ❌
B4 ✔ ✔ ❌ ?
? - Difficult to generalize
Path Selection
AlgorithmLoad balanced
Diverse Low-stretchCapacity aware
Globally Optimized
SPF / ECMP ❌ ❌ ❌ ✔
CSPF ✔ ❌ ❌ ✔
k-shortest paths ❌ ❌ ? ✔
Edge-disjoint KSP ❌ ❌ ✔ ✔
MCF ✔ ✔ ❌ ❌
VLB ❌ ❌ ✔ ❌
B4 ✔ ✔ ❌ ?
? - Difficult to generalize
Path Selection
AlgorithmLoad balanced
Diverse Low-stretchCapacity aware
Globally Optimized
SPF / ECMP ❌ ❌ ❌ ✔
CSPF ✔ ❌ ❌ ✔
k-shortest paths ❌ ❌ ? ✔
Edge-disjoint KSP ❌ ❌ ✔ ✔
MCF ✔ ✔ ❌ ❌
VLB ❌ ❌ ✔ ❌
B4 ✔ ✔ ❌ ?
? - Difficult to generalize
Path Selection
AlgorithmLoad balanced
Diverse Low-stretchCapacity aware
Globally Optimized
SPF / ECMP ❌ ❌ ❌ ✔
CSPF ✔ ❌ ❌ ✔
k-shortest paths ❌ ❌ ? ✔
Edge-disjoint KSP ❌ ❌ ✔ ✔
MCF ✔ ✔ ❌ ❌
VLB ❌ ❌ ✔ ❌
B4 ✔ ✔ ❌ ?
? - Difficult to generalize
Path Selection
AlgorithmLoad balanced
Diverse Low-stretchCapacity aware
Globally Optimized
SPF / ECMP ❌ ❌ ❌ ✔
CSPF ✔ ❌ ❌ ✔
k-shortest paths ❌ ❌ ? ✔
Edge-disjoint KSP ❌ ❌ ✔ ✔
MCF ✔ ✔ ❌ ❌
VLB ❌ ❌ ✔ ❌
B4 ✔ ✔ ❌ ?
? - Difficult to generalize
Oblivious Routing
VLB
• Route through random intermediate node
• Works well for mesh topologies
• WANs are not mesh-like
• Good resilience
• Poor performance & latency
Mesh
3
21
…
N
4
VLB
• Route through random intermediate node
• Works well for mesh topologies
• WANs are not mesh-like
• Good resilience
• Poor performance & latency
Mesh
3
21
…
N
4
Not MeshVLB
• Route through random intermediate node
• Works well for mesh topologies
• WANs are not mesh-like
• Good resilience
• Poor performance & latency
Not MeshVLB
• Route through random intermediate node
• Works well for mesh topologies
• WANs are not mesh-like
• Good resilience
• Poor performance & latency
Oblivious [Räcke ‘08]
• Generalizes VLB to non-mesh
• Distribution over routing trees
• Approximation algorithm for low-stretch trees [FRT ’04]
• Penalize links based on usage
• O(log n) competitive
Not Mesh
Low-stretch routing trees
Prob
abili
ty
Oblivious [Räcke ‘08]
• Generalizes VLB to non-mesh
• Distribution over routing trees
• Approximation algorithm for low-stretch trees [FRT ’04]
• Penalize links based on usage
• O(log n) competitive
Not Mesh
Low-stretch routing trees
Prob
abili
ty
Oblivious [Räcke ‘08]
• Generalizes VLB to non-mesh
• Distribution over routing trees
• Approximation algorithm for low-stretch trees [FRT ’04]
• Penalize links based on usage
• O(log n) competitive
Not Mesh
Low-stretch routing trees
Prob
abili
ty
Path Selection
AlgorithmLoad balanced
Diverse Low-stretchCapacity aware
Globally Optimized
SPF / ECMP ❌ ❌ ❌ ✔
CSPF ✔ ❌ ❌ ✔
k-shortest paths ❌ ❌ ? ✔
Edge-disjoint KSP ❌ ❌ ✔ ✔
MCF ✔ ✔ ❌ ❌
VLB ❌ ❌ ✔ ❌
B4 ✔ ✔ ❌ ?
SMORE / Oblivious ✔ ✔ ✔ ✔
SMORE: Semi-Oblivious Routing
Oblivious Routing computes a set of paths which are low-stretch, robust and have good load balancing properties
LP Optimizer balances load by dynamically adjusting splitting ratios used to map incoming traffic flows to paths
Path Selection
Rate Adaptation
Semi-Oblivious Traffic Engineering: The Road Not Taken [NSDI ’18]
Semi-Oblivious Routing in Practice?
• ▼ Previous work [Hajiaghayi et al.] established a worst-case competitive ratio that is not much better than oblivious routing: Ω(log(n)/log (log(n)))
• But the real-world does not typically exhibit worst-case scenarios
• Implicit correlation between demands and link capacities Question: How well does semi-oblivious routing perform in practice?
Evaluation
YATES
Facebook’s Backbone Network
Source: https://research.fb.com/robust-and-efficient-traffic-engineering-with-oblivious-routing/
YATES: Rapid Prototyping for Traffic Engineering Systems [SOSR ’18]
PerformanceM
etric
Time
Throughput Congestion Drop Max. Link Utilization
PerformanceM
etric
Time
Throughput Congestion Drop Max. Link Utilization
RobustnessM
etric
Time
Throughput Congestion Drop Max. Link Utilization Failure Drop
Robustness
Path budget = 4
Met
ric
Time
Throughput Congestion Drop Max. Link Utilization Failure Drop
Operational Constraints - Path Budget
4-8xOptimal
SMOREMCF KSP+MCFR-MCF
Path Budget
Max
. Lin
k U
tiliza
tion
Large Scale Simulations
• Conducted larger set of simulations on Internet Topology Zoo
• 30 topologies from ISPs and content providers
• Multiple traffic matrices (gravity model), failure models and operational conditions
Do these results generalize?Yes*
Throughput SLA
SMORE
Prob
abili
ty o
f ach
ievi
ng S
LANormalized Capacity
KSP+MCFSMORE
R-MCFFFC
Oblivious
ECMP
Takeaways• Path selection plays an outsized role in the performance of TE systems
• Semi-oblivious TE meets the competing objectives of performance and robustness in modern networks
• Oblivious routing for path selection + Dynamic load-balancing
• Ongoing and future-work:
• Apply to other networks (e.g. non-Clos DC topologies)
• SR-based implementations and deployments
Thank You!
Bobby Kleinberg Cornell
Robert Soule Lugano
Nate Foster Cornell
Petr Lapukhov Facebook
Chiun Lin Lim Facebook
Chris Yu CMU
Yang Yuan Cornell
Code: github.com/cornell-netlab/yates
SMORE: Oblivious routing + Dynamic rate adaptation
Learn more: www.cs.cornell.edu/~praveenk/smore/NSDI ’18
Recommended