DARD: Distributed Adaptive Routing for Datacenter Networks Xin Wu, Xiaowei Yang


Page 1

DARD: Distributed Adaptive Routing for Datacenter Networks

Xin Wu, Xiaowei Yang

Page 2

Multiple equal-cost paths in DCN

• Scale-out topology -> Horizontal expansion -> More paths

[Figure: tree topology with core, Agg, and ToR switch layers grouped into pods, with hosts src and dst]

Page 3

Suboptimal scheduling -> hot spot

[Figure: hosts src1, src2, dst1, dst2]

Unavoidable intra-datacenter traffic
• Common services: DNS, search, storage
• Auto-scaling: dynamic application instances

Page 4

To prevent hot spots

• Distributed
  – ECMP & VL2: flow-level hashing in switches

• Centralized
  – Hedera: compute optimal scheduling in ONE server

Design Space
• Centralized: Efficient but Not Robust
• Distributed: Robust but Not Efficient

Page 5

Goal: practical, efficient, robust

• Practical
  – Using well-proven technologies

• Efficient
  – Close to optimal traffic scheduling

• Robust
  – No single point of failure

Design Space
• Centralized: Efficient but Not Robust
• Distributed: Robust but Not Efficient
• Distributed: Robust and Efficient

Page 6

Contributions

• Explore the possibility of distributed yet close-to-optimal flow scheduling in DCNs.

• A working implementation on a testbed.

• A proven upper bound on convergence.

Page 7

Intuition: minimize the maximum number of flows via a link

[Figure: hosts src1, src2, src3 and dst1, dst2, dst3]

Step 0: maximum # of flows via a link = 3

Page 8

Intuition: minimize the maximum number of flows via a link

[Figure: hosts src1, src2, src3 and dst1, dst2, dst3]

Step 1: maximum # of flows via a link = 2

Page 9

Intuition: minimize the maximum number of flows via a link

[Figure: hosts src1, src2, src3 and dst1, dst2, dst3]

Step 2: maximum # of flows via a link = 1
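
The quantity tracked across Steps 0 to 2 can be computed directly from a flow-to-path assignment. The sketch below is illustrative only: the link names and the three assignments are hypothetical stand-ins for the figure, chosen so the maximum falls from 3 to 2 to 1 as in the slides.

```python
from collections import Counter

def max_flows_per_link(flow_paths):
    """Return the largest number of flows sharing any single link.

    flow_paths maps a flow id to the list of links on its current path.
    """
    load = Counter(link for path in flow_paths.values() for link in path)
    return max(load.values(), default=0)

# Hypothetical assignments: all three flows start on the same uplink.
step0 = {
    "f1": [("agg1", "core1")],
    "f2": [("agg1", "core1")],
    "f3": [("agg1", "core1")],
}
step1 = dict(step0, f2=[("agg1", "core2")])  # f2 moves to another core
step2 = dict(step1, f3=[("agg2", "core3")])  # f3 moves as well

for name, assignment in (("step0", step0), ("step1", step1), ("step2", step2)):
    print(name, max_flows_per_link(assignment))
```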

Page 10

Architecture

Monitor network states

Compute next scheduling

Change flow’s path

• Control loop runs on every server independently

Page 11

Monitor network states

• src asks switches for the #_of_flows and bandwidth of each link to dst.

src dst

• src assembles the link states to identify the most and least congested paths to dst.
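
A minimal sketch of the assembly step, under assumptions the slide does not state: a path is scored by the per-flow bandwidth share (bandwidth divided by number of flows) of its most loaded link, and a lower share means more congestion. The function names, link names, and numbers are hypothetical.

```python
def path_share(path_links, link_state):
    """Per-flow bandwidth share on the path's bottleneck link.

    link_state maps a link to (number_of_flows, bandwidth).
    Assumption: an idle link offers its full bandwidth to a new flow.
    """
    shares = []
    for link in path_links:
        num_flows, bandwidth = link_state[link]
        shares.append(bandwidth / max(num_flows, 1))
    return min(shares)

def busiest_and_freest(paths, link_state):
    """Return (most congested path, least congested path) by bottleneck share."""
    share = {name: path_share(links, link_state) for name, links in paths.items()}
    p_busy = min(share, key=share.get)
    p_free = max(share, key=share.get)
    return p_busy, p_free

# Hypothetical link states as reported by switches: (flows, bandwidth in Mbps).
link_state = {
    ("tor1", "agg1"): (2, 1000.0),
    ("agg1", "core1"): (4, 1000.0),
    ("agg1", "core2"): (1, 1000.0),
}
paths = {
    "via_core1": [("tor1", "agg1"), ("agg1", "core1")],
    "via_core2": [("tor1", "agg1"), ("agg1", "core2")],
}
print(busiest_and_freest(paths, link_state))
```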

Page 12

Distributed computation

• Runs on every server

for each dst {
    Pbusy: the most congested path from src to dst;
    Pfree: the least congested path from src to dst;
    if (moving one flow from Pbusy to Pfree won’t
        cause a more congested path than Pbusy)
        move one flow from Pbusy to Pfree;
}

• The number of steps to convergence is bounded
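
One round of the loop above can be sketched for a single destination as follows. This is a simplified model, not the authors' implementation: congestion is measured by flow count per path, paths are treated as not sharing links, and the move condition is read as a strict comparison so that every move lowers the load on Pbusy without simply swapping the two paths. The name `rebalance_once` is hypothetical.

```python
def rebalance_once(path_flows):
    """Try to move one flow from the busiest path to the freest path.

    path_flows maps a path id to its current number of flows and is
    updated in place. Returns (p_busy, p_free) if a flow was moved,
    otherwise None.
    """
    p_busy = max(path_flows, key=path_flows.get)
    p_free = min(path_flows, key=path_flows.get)
    # Move only if p_free stays strictly less loaded than p_busy was.
    if path_flows[p_free] + 1 < path_flows[p_busy]:
        path_flows[p_busy] -= 1
        path_flows[p_free] += 1
        return p_busy, p_free
    return None

# Three flows start on one path; two other paths are idle.
flows = {"p1": 3, "p2": 0, "p3": 0}
moves = 0
while rebalance_once(flows) is not None:
    moves += 1
print(flows, moves)
```

In this toy case the loop stops after two moves with one flow per path, mirroring the 3, 2, 1 progression in the intuition slides.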

Page 13

Change path: using different src-dst pair

• src-dst address pair uniquely encodes a path

• Static forwarding table

[Figure: address assignment in the topology; other labels: agg2, tor2]
Core prefixes:  core1 1.0.0.0/8, core2 2.0.0.0/8, core3 3.0.0.0/8, core4 4.0.0.0/8
agg1 prefixes:  1.1.0.0/16, 2.1.0.0/16
tor1 prefixes:  1.1.1.0/24, 2.1.1.0/24, 3.1.1.0/24, 4.1.1.0/24
src addresses:  1.1.1.2, 2.1.1.2, 3.1.1.2, 4.1.1.2
dst addresses:  1.2.1.2, 2.2.1.2, 3.2.1.2, 4.2.1.2

agg1’s down-hill table
dst           next hop
1.1.1.0/24    tor1
1.1.2.0/24    tor2
2.1.1.0/24    tor1
2.1.2.0/24    tor2

agg1’s up-hill table
src           next hop
1.0.0.0/8     core1
2.0.0.0/8     core2

Page 14

Forwarding example: E2->E1

[Figure: core1; agg1, agg2; tor1, tor2; E1 (1.1.1.2), E2 (1.2.1.2); prefixes 1.0.0.0/8, 2.0.0.0/8]

Packet header: src: 1.2.1.2, dst: 1.1.1.2

agg1’s down-hill table
dst           next hop
1.1.1.0/24    tor1
1.1.2.0/24    tor2
2.1.1.0/24    tor1
2.1.2.0/24    tor2

agg1’s up-hill table
src           next hop
1.0.0.0/8     core1
2.0.0.0/8     core2

Page 15

Forwarding example: E1->E2

[Figure: core1; agg1, agg2; tor1, tor2; E1 (1.1.1.2), E2 (1.2.1.2); prefixes 1.0.0.0/8, 2.0.0.0/8]

Packet header: src: 1.1.1.2, dst: 1.2.1.2

agg1’s down-hill table
dst           next hop
1.1.1.0/24    tor1
1.1.2.0/24    tor2
2.1.1.0/24    tor1
2.1.2.0/24    tor2

agg1’s up-hill table
src           next hop
1.0.0.0/8     core1
2.0.0.0/8     core2
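
The two forwarding examples can be reproduced with a small lookup sketch over agg1's tables. The lookup order is an assumption not spelled out on the slides: the down-hill table is matched on the destination address first, and the up-hill table is matched on the source address only when no down-hill entry matches. The function names are hypothetical; the table entries are copied from the slides.

```python
import ipaddress

# agg1's tables as shown on the slides.
DOWNHILL = {  # matched on destination address
    "1.1.1.0/24": "tor1",
    "1.1.2.0/24": "tor2",
    "2.1.1.0/24": "tor1",
    "2.1.2.0/24": "tor2",
}
UPHILL = {  # matched on source address
    "1.0.0.0/8": "core1",
    "2.0.0.0/8": "core2",
}

def longest_prefix_match(table, addr):
    """Return the next hop of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, hop)
    return best[1] if best else None

def agg1_next_hop(src, dst):
    """Assumed order: down-hill on dst first, then up-hill on src."""
    return longest_prefix_match(DOWNHILL, dst) or longest_prefix_match(UPHILL, src)

print(agg1_next_hop("1.2.1.2", "1.1.1.2"))  # E2 -> E1
print(agg1_next_hop("1.1.1.2", "1.2.1.2"))  # E1 -> E2
```

Under these assumptions the E2->E1 packet goes down to tor1 and the E1->E2 packet goes up to core1; picking the source alias 2.1.1.2 with destination 2.2.1.2 instead would send the flow up through core2, which is how changing the address pair changes the path.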

Page 16

Randomness: prevent path oscillation

• Add a random time interval to the control cycle
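
A minimal sketch of such a randomized control cycle. The slide does not say how the random interval is drawn, so the uniform distribution and the 5-second jitter range here are assumptions; the 10-second base follows the control-cycle length quoted for the simulations. The function name is hypothetical.

```python
import random

def next_cycle_delay(base=10.0, max_jitter=5.0, rng=random):
    """Seconds to wait before the next control cycle.

    Adding an independent random offset on each server makes it less
    likely that many servers move flows at the same moment.
    """
    return base + rng.uniform(0.0, max_jitter)

# Each server would draw its own delay before re-running the control loop.
rng = random.Random(0)
delays = [next_cycle_delay(rng=rng) for _ in range(3)]
print(delays)
```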

Page 17

Implementation

• DeterLab testbed
  – 16-end-host fattree
  – Monitoring: OpenFlow API
  – Computation: daemon on end hosts
  – One NIC multiple addresses: IP alias
  – Static routes: OpenFlow forwarding table
  – Multipath: IP-in-IP encapsulation

• ns-2 simulator
  – For different & larger topologies

Page 18

DARD fully utilizes the bisection bandwidth

[Figure: bisection bandwidth (Gbps, axis ticks 600 to 800) of pVLB, ECMP, DARD, and Hedera under intra-pod dominant, random, and inter-pod dominant traffic patterns]

• Simulation, 1024-end-host fattree
• pVLB: periodic flow-level VLB

Page 19

DARD improves large file transfer time

[Figure: DARD vs. ECMP improvement against # of new files per second, for inter-pod dominant, intra-pod dominant, and random traffic]

• Testbed, 16-end-host fattree

Page 20

DARD converges in 2~3 control cycles

[Figure: convergence time (seconds) for inter-pod dominant, random, and intra-pod dominant traffic]

• Simulation, 1024-end-host fattree, static traffic patterns
• One control cycle ≈ 10 seconds

Page 21

Randomness prevents path oscillation

[Figure: times a flow switches its paths, for inter-pod dominant, random, and intra-pod dominant traffic]

• Simulation, 128-end-host fattree

Page 22

DARD’s control overhead is bounded by the topology

• control_traffic = #_of_servers x #_of_switches.

• Simulation, 128-end-host fattree

[Figure: control traffic (MB/s) vs. # of simultaneous flows, for DARD and Hedera]
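
To get a feel for the bound above, the product can be evaluated for the topology sizes used in the talk. This sketch assumes each "fattree" is the standard k-ary fat-tree, which has k^3/4 end hosts and 5k^2/4 switches; the slides do not state k, and the product is counted here in server-switch pairs rather than in bytes. The function names are hypothetical.

```python
def fat_tree_size(k):
    """Return (hosts, switches) of a standard k-ary fat-tree (k even)."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    hosts = k ** 3 // 4
    switches = 5 * k ** 2 // 4  # k^2/4 core + k^2/2 aggregation + k^2/2 ToR
    return hosts, switches

def control_pairs(k):
    """#_of_servers x #_of_switches: the topology-dependent bound."""
    hosts, switches = fat_tree_size(k)
    return hosts * switches

for k in (4, 8, 16):
    hosts, switches = fat_tree_size(k)
    print(k, hosts, switches, control_pairs(k))
```

With k = 4, 8, and 16 this yields 16, 128, and 1024 end hosts, matching the testbed and simulation sizes quoted in the slides. The bound depends only on the topology, not on the number of simultaneous flows.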

Page 23

Conclusion

• DARD: Distributed Adaptive Routing for Datacenters
  – Practical: well-proven end-host-based technologies
  – Efficient: close to optimal traffic scheduling
  – Robust: no single point of failure

Monitor network states

Compute next scheduling

Change flow’s path

Page 24

Thank You!

Questions and comments: [email protected]