Scalable Label Assignment in Data Center Networks

Meg Walraed-Sullivan, University of California, San Diego

With: Radhika Niranjan Mysore, Malveeka Tewari, Ying Zhang (Ericsson Research), Keith Marzullo, Amin Vahdat


Page 2: Labeling in Distributed Networks

Group of entities that want to communicate
◦ Need a way to refer to one another
◦ E.g. laptop has two labels (MAC address, IP address)

Historically, a common problem
◦ Phone system
◦ Snail mail
◦ Internet
◦ Wireless networks

Labeling in data center networks is unique

Page 3: Data Center Network Size

Interconnect of switches connecting hosts
Massive in scale: 10k switches, 100k hosts, millions of VMs

Page 4: Data Center Network Structure

Designed with regular, symmetric structure
◦ Often multi-rooted trees (e.g. fat tree)

Reality doesn’t always match the blueprint
◦ Components and partitions are added/removed
◦ Links/switches/hosts fail and recover
◦ Cables are connected incorrectly

Page 5: Labels in Data Center Networks

What gets labeled in a data center network?
◦ Switch ports
◦ Host NICs
◦ Virtual machines at hosts
◦ Etc.

Page 6: Data Center Labeling Techniques

Flat Addressing
◦ E.g. MAC Addresses (Layer 2)

✓ Unique
✓ Automatic
✗ Scalability: Switches have limited forwarding entries (say, 10k)
   # Labels in forwarding tables = # Nodes

Page 7: Data Center Labeling Techniques

Hierarchical Addressing
◦ E.g. IP Addresses (Layer 3) with DHCP

✓ Scalable forwarding state
   # Labels in forwarding tables < # Nodes
✗ Relies on manual configuration: Unrealistic at scale

Page 8: Combining L2 and L3 Benefits

PortLand’s LDP: Location Discovery Protocol
DAC: Data center Address Configuration

Manual configuration via blueprints
Rely on centralized control
◦ Cannot directly connect controller to all nodes
◦ Requires separate out-of-band control network or flooding techniques

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric. Niranjan Mysore et al. SIGCOMM 2009

Generic and Automatic Address Configuration for Data Center Networks. Chen et al. SIGCOMM 2010

Page 9: Scalability vs. Management

[Figure: y-axis "Label Assignment Management Overhead", x-axis "Network Size". Plotted points: Ethernet, IP, Target location. Annotations: "Hardware Limit: Need Labels < Nodes", "Flat Labels", "Structured Labels", "Automation".]

Page 10: Cost of Automation

Less management means more automation
Structured labels encode topology
∴ Labels change with topology dynamics

[Figure: y-axis "Management Overhead", x-axis "Network Size". Plotted points: Ethernet, IP, Target.]

Page 11: ALIAS Overview

ALIAS: topology discovery and label assignment in hierarchical networks

Approach: Automatic, decentralized assignment of hierarchical labels

Benefits:
◦ Scalability (structured labels, shared label prefixes)
◦ Low management overhead (automation)
◦ No out-of-band control network (decentralized)

Page 12: ALIAS Evolution

ALIAS: topology discovery and label assignment in hierarchical networks

Systems (Implementation/Evaluation)
ALIAS: Scalable, Decentralized Label Assignment for Data Centers. M. Walraed-Sullivan, R. Niranjan Mysore, M. Tewari, Y. Zhang, K. Marzullo, A. Vahdat. SOCC 2011

Theory (Proof/Protocol Derivation)
Brief Announcement: A Randomized Algorithm for Label Assignment in Dynamic Networks. M. Walraed-Sullivan, R. Niranjan Mysore, K. Marzullo, A. Vahdat. DISC 2011

Page 13: Data Center Network Topologies

Multi-rooted trees
◦ Multi-stage switch fabric connecting hosts
◦ Indirect hierarchy
◦ May allow peer links

Labels ultimately used for communication
◦ Multiple paths between nodes

Page 14: ALIAS Labels

Switches and hosts have labels
◦ Labels encode (shortest physical) paths from the root of the hierarchy to a switch/host
◦ Each switch/host may have multiple labels
◦ Labels encode location and expose path multiplicity

h’s Labels: a d g h, b e g h, b f g h, c f g h

g’s Labels: a d g, b e g, b f g, c f g

[Figure: example topology with nodes a, b, c, d, e, f, g, h]
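
The label lists on this slide can be reproduced by enumerating root-to-node paths. The following is a minimal sketch, not the ALIAS implementation: the edge list is an assumption inferred from the labels (the figure did not survive extraction), and each letter is treated as both a node name and its coordinate.

```python
# Downward links (parent -> children). ASSUMPTION: inferred from the
# labels listed on the slide, since the original figure is missing.
CHILDREN = {
    "a": ["d"],
    "b": ["e", "f"],
    "c": ["f"],
    "d": ["g"],
    "e": ["g"],
    "f": ["g"],
    "g": ["h"],
    "h": [],
}
ROOTS = ["a", "b", "c"]


def labels_for(target, children=CHILDREN, roots=ROOTS):
    """Return every root-to-target path as a tuple of node names."""
    found = []

    def walk(node, path):
        path = path + (node,)
        if node == target:
            found.append(path)
            return
        for child in children[node]:
            walk(child, path)

    for root in roots:
        walk(root, ())
    return found


for label in labels_for("h"):
    print(" ".join(label))
```

Calling `labels_for("g")` yields the same four paths without the trailing `h`, which is why every label of `h` has a label of `g` as its prefix.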

Page 15: Communication over ALIAS Labels

Hierarchical routing leverages this info
◦ Push packets upward, downward path is explicit

h’s Labels: a d g h, b e g h, b f g h, c f g h

g’s Labels: a d g, b e g, b f g, c f g

[Figure: example topology with nodes a, b, c, d, e, f, g, h]
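
The "push upward, then follow the label down" idea can be sketched as follows. This is an illustration under assumptions, not the forwarding logic of ALIAS: the topology is inferred from the labels above, switch `i` and host `j` are hypothetical additions so that there is a source outside `g`'s subtree, and the upward choice is simply the first parent.

```python
# ASSUMPTION: edges inferred from the slide's labels; "i" and "j" are
# hypothetical nodes added so a packet has somewhere to start from.
CHILDREN = {
    "a": ["d"], "b": ["e", "f"], "c": ["f"],
    "d": ["g"], "e": ["g"], "f": ["g", "i"],
    "g": ["h"], "i": ["j"], "h": [], "j": [],
}
H_LABELS = [
    ("a", "d", "g", "h"), ("b", "e", "g", "h"),
    ("b", "f", "g", "h"), ("c", "f", "g", "h"),
]


def parents_of(node):
    return [p for p, kids in CHILDREN.items() if node in kids]


def route(src, dest_labels):
    """Return the hop sequence from src to the labeled destination."""
    dest = dest_labels[0][-1]
    node, path = src, [src]
    while node != dest:
        # Downward: if this node lies on one of the destination's labels,
        # the label itself names the next hop.
        down = next(
            (lab[lab.index(node) + 1] for lab in dest_labels if node in lab),
            None,
        )
        if down is not None:
            node = down
        else:
            # Upward: any parent will do; this sketch takes the first one.
            ups = parents_of(node)
            if not ups:
                raise ValueError("reached a root that cannot reach the destination")
            node = ups[0]
        path.append(node)
    return path


print(route("j", H_LABELS))
```

A real deployment would presumably spread upward traffic across parents; always picking the first parent is only there to keep the sketch deterministic.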

Page 16: Distributed Protocol Overview

Continuously
1. Overlay appropriate hierarchy on network fabric
2. Group sets of related switches into hypernodes
3. Assign coordinates to switches
4. Combine coordinates to form labels

Periodic state exchange between immediate neighbors

Page 17: Step 1. Overlay Hierarchy

Switches are at levels 1 through n
Hosts are at level 0

Only requires 1 host to begin

[Figure: example topology annotated with Level 0 through Level 3]
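
One way to picture level assignment is a breadth-first sweep outward from the hosts. The sketch below assumes a switch's level is its hop distance from the nearest known host; the slide does not spell out the rule, so this is an assumption, and the sketch ignores cases such as edge switches with no host attached. The topology and node names are hypothetical.

```python
from collections import deque


def assign_levels(neighbors, hosts):
    """Multi-source BFS: hosts are level 0, each hop away adds one level."""
    level = {h: 0 for h in hosts}
    queue = deque(hosts)
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in level:
                level[nb] = level[node] + 1
                queue.append(nb)
    return level


# Hypothetical topology with a single host "h".
NEIGHBORS = {
    "h": ["s1"],
    "s1": ["h", "t1", "t2"],
    "t1": ["s1", "u1"],
    "t2": ["s1", "u1"],
    "u1": ["t1", "t2"],
}

LEVELS = assign_levels(NEIGHBORS, ["h"])
print(LEVELS)
```

In a running network the same information would spread through the periodic neighbor exchanges rather than a central traversal; the BFS only shows the resulting assignment.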

Page 18: Distributed Protocol Overview

Continuously
1. Overlay appropriate hierarchy on network fabric
2. Group sets of related switches into hypernodes
3. Assign coordinates to switches
4. Combine coordinates to form labels

Page 19: Step 2. Discover Hypernodes

Labels encode paths from a root to a host
◦ Multiple paths lead to multiple labels per host

Aggregate for label compaction
◦ Locate switches that reach same hosts

[Figure: example topology with Level 1 through Level 4 (hosts omitted for space)]

Page 20: Step 2. Discover Hypernodes

Hypernode (HN): Maximal set of switches that connect to same HNs below (via any member)

Base Case: Each Level 1 switch is in its own hypernode

Hypernode members are indistinguishable on downward path from root

[Figure: example topology with Level 1 through Level 4]
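
The recursive definition above can be sketched bottom-up: start with singleton hypernodes at level 1, then at each higher level group the switches that reach the same set of hypernodes one level down. This is a centralized simplification for illustration (the protocol itself is distributed), and the topology and switch names are hypothetical.

```python
from collections import defaultdict


def find_hypernodes(levels, below):
    """Map each switch to its hypernode (a frozenset of member switches).

    levels: list of switch lists; levels[0] is level 1, levels[1] is level 2, ...
    below:  switch -> set of directly connected switches one level down
    """
    hn_of = {sw: frozenset([sw]) for sw in levels[0]}  # base case
    for switches in levels[1:]:
        groups = defaultdict(list)
        for sw in switches:
            # The hypernodes this switch reaches one level down.
            reached = frozenset(hn_of[child] for child in below[sw])
            groups[reached].append(sw)
        for members in groups.values():
            hn = frozenset(members)
            for sw in members:
                hn_of[sw] = hn
    return hn_of


# Hypothetical 3-level example.
LEVEL_LISTS = [["s1", "s2", "s3", "s4"], ["t1", "t2", "t3", "t4"], ["u1", "u2"]]
BELOW = {
    "t1": {"s1", "s2"}, "t2": {"s1", "s2"},
    "t3": {"s3", "s4"}, "t4": {"s3", "s4"},
    "u1": {"t1", "t3"}, "u2": {"t2", "t4"},
}

HN_OF = find_hypernodes(LEVEL_LISTS, BELOW)
print(sorted(HN_OF["u1"]))
```

Note that `u1` and `u2` share no physical neighbor below them, yet land in the same hypernode because each reaches the hypernodes `{t1, t2}` and `{t3, t4}` through some member, which is the "via any member" clause.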

Page 21: Distributed Protocol Overview

Continuously
1. Overlay appropriate hierarchy on network fabric
2. Group sets of related switches into hypernodes
3. Assign coordinates to switches
4. Combine coordinates to form labels

Page 22: Step 3. Assign Coordinates

Coordinates combine to make up labels
Labels used to route downwards

Switches in a HN share a coordinate

HNs with a parent in common need distinct coordinates

Page 23: Step 3. Assign Coordinates

Switches in a HN share a coordinate

HNs with a parent in common need distinct coordinates

Can we make this problem simpler?

[Figure: example topology with choosers and deciders marked]

Page 24: Step 3. Assign Coordinates

To assign coordinates to hypernodes:
a. Define abstraction (choosers/deciders)
b. Design solution for abstraction
c. Apply solution throughout multi-rooted tree

[Figure: example topology with choosers and deciders marked]

Page 25: Step 3. Assign Coordinates
a. Decider/Chooser abstraction

Label Selection Problem (LSP)
◦ Chooser processes connected to Decider processes
◦ In a bipartite graph

[Figure: bipartite graph of choosers c1 to c6 (hypernodes) and deciders d1 to d4 (parent switches)]

Page 26: Step 3. Assign Coordinates
a. Decider/Chooser abstraction

Label Selection Problem Goals:
◦ All choosers eventually select coordinates
◦ Choosers sharing a decider have distinct coordinates

Multiple instances of LSP
Per-instance coordinates

[Figure: bipartite graph of choosers c1 to c6 and deciders d1 to d4, with coordinate annotations x, y, z, q]
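
The second goal is easy to state as a check over a finished assignment. The sketch below is a hypothetical helper for a single LSP instance; the graph and the coordinate values are made up for illustration and are not the ones in the slide's figure.

```python
from itertools import combinations


def lsp_violations(links, coord):
    """List pairs of choosers that share a decider and a coordinate.

    links: chooser -> set of deciders it is connected to
    coord: chooser -> selected coordinate
    """
    bad = []
    for c1, c2 in combinations(sorted(links), 2):
        shared = links[c1] & links[c2]
        if shared and coord[c1] == coord[c2]:
            bad.append((c1, c2, sorted(shared)))
    return bad


# Hypothetical instance: c1 and c3 never meet at a decider.
LINKS = {"c1": {"d1"}, "c2": {"d1", "d2"}, "c3": {"d2"}}

print(lsp_violations(LINKS, {"c1": "x", "c2": "y", "c3": "x"}))  # no conflicts
print(lsp_violations(LINKS, {"c1": "x", "c2": "x", "c3": "y"}))  # c1 and c2 clash at d1
```

The first assignment reuses `x` for `c1` and `c3`, which is allowed because they share no decider; coordinates only need to be distinct among choosers that meet at a common decider.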

Page 27: Step 3. Assign Coordinates
a. Decider/Chooser abstraction

Label Selection Problem (LSP)
◦ Difficulty: connections can change over time

[Figure: bipartite graph of choosers c1 to c6 and deciders d1 to d4, with coordinate annotations x, y, z, q, r]

Page 28: Step 3. Assign Coordinates
b. Design Solution for Abstraction

Decider/Chooser Protocol (DCP)
◦ Distributed algorithm that implements LSP
◦ Las-Vegas style randomized algorithm
   Probabilistically fast, guaranteed to be correct
◦ Practical: Low message overhead, quick convergence
◦ Reacts quickly and locally to topology dynamics
   Transient startup conditions
   Miswirings
   Failure/recovery, connectivity changes

Page 29: Step 3. Assign Coordinates
b. Design Solution for Abstraction

Algorithm:
◦ Choosers select coordinates randomly and send to deciders
◦ Deciders reply with [yes] or [no+hints]
◦ One no → reselect, all yeses → finished

[Figure: choosers c1 and c2 connected to deciders d1 and d2. c1 proposes x and c2 proposes y ("c1:x?", "c2:y?"); each decider records c1: x, c2: y and replies yes; the choosers finish with Coord: x and Coord: y.]
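
The three bullets can be turned into a small round-based simulation. This is a sketch under simplifying assumptions, not the protocol from the papers: rounds are synchronous, the topology is static, a decider rejects a coordinate that any other connected chooser holds or is requesting in the same round, and "hints" are the coordinates already held at that decider. The names `run_dcp`, `LINKS`, and `DOMAIN` are hypothetical.

```python
import random


def run_dcp(links, domain, rng, max_rounds=1000):
    """Assign a coordinate to every chooser; returns (assignment, rounds_used).

    links:  chooser -> set of deciders it is connected to
    domain: list of candidate coordinates
    """
    settled = {}                        # chooser -> accepted coordinate
    hints = {c: set() for c in links}   # coordinates a chooser was told are taken

    for round_no in range(1, max_rounds + 1):
        # Unsettled choosers pick at random, avoiding hinted coordinates.
        proposals = {}
        for c in links:
            if c not in settled:
                free = [x for x in domain if x not in hints[c]]
                if not free:
                    raise RuntimeError("coordinate domain too small")
                proposals[c] = rng.choice(free)

        accepted = {}
        for c, x in proposals.items():
            all_yes = True
            for d in links[c]:
                # Coordinates that other choosers at decider d hold or request.
                held = {settled[o] for o in settled if d in links[o]}
                asked = {proposals[o] for o in proposals
                         if o != c and d in links[o]}
                if x in held or x in asked:
                    all_yes = False      # this decider says no...
                    hints[c] |= held     # ...and hints at what is taken
            if all_yes:
                accepted[c] = x          # all yeses: finished
        settled.update(accepted)

        if len(settled) == len(links):
            return settled, round_no
    raise RuntimeError("did not converge within max_rounds")


# The two-chooser, two-decider shape from the slide's figure.
LINKS = {"c1": {"d1", "d2"}, "c2": {"d1", "d2"}}
DOMAIN = ["x", "y", "z"]

assignment, rounds = run_dcp(LINKS, DOMAIN, random.Random(0))
print(assignment, "after", rounds, "round(s)")
```

The number of rounds depends on the random draws, but any returned assignment is conflict-free, which mirrors the Las Vegas property of being probabilistically fast yet always correct.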

Page 30: Step 3. Assign Coordinates
c. Apply DCP through Hierarchy

Hypernodes are choosers for their coordinates

Switches are deciders for neighbors below

[Figure: example topology partitioned into chooser/decider groups: 2 choosers with 3 deciders, 2 choosers with 1 decider, 3 choosers with 3 deciders]

Page 31: Step 3. Assign Coordinates
c. Apply DCP through Hierarchy

DCP assigns level 1 coordinates

[Figure: 3 choosers, 3 deciders]

Page 32: Step 3. Assign Coordinates
c. Apply DCP through Hierarchy

DCP for upper levels:
◦ HN switches cooperate (per-parent restrictions)
◦ Not directly connected

Communicate via shared L1 switch

“Distributed-Chooser DCP”

[Figure: 2 choosers, 3 deciders]

Page 33: Distributed Protocol Overview

Continuously
1. Overlay appropriate hierarchy on network fabric
2. Group related switches into hypernodes
3. Assign per-hypernode coordinates
4. Combine coordinates to form labels

Page 34: Step 4. Assign Labels

Concatenate coordinates from root downward

(For clarity, assume labels same across instances of LSP)
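
Concatenation becomes more interesting once hypernode members share a coordinate, because several physical paths then collapse into one label. The sketch below assumes one coordinate per node (the slide's own simplification); the topology, the node names, and the coordinate values are all hypothetical.

```python
def host_labels(host, parents, coord):
    """All distinct coordinate sequences from a root down to host."""

    def up(node):
        # Set of coordinate tuples for every root-to-node path.
        if not parents[node]:
            return {(coord[node],)}
        return {
            prefix + (coord[node],)
            for parent in parents[node]
            for prefix in up(parent)
        }

    return sorted(up(host))


# Hypothetical topology: t1 and t2 form one hypernode, so they share "5".
PARENTS = {
    "h": ["s1"],
    "s1": ["t1", "t2"],
    "t1": ["r1"],
    "t2": ["r1", "r2"],
    "r1": [],
    "r2": [],
}
COORD = {"r1": "a", "r2": "b", "t1": "5", "t2": "5", "s1": "2", "h": "0"}

for label in host_labels("h", PARENTS, COORD):
    print(".".join(label))
```

There are three physical paths from a root to `h` here, but only two labels, because the paths through `t1` and `t2` under root `r1` produce the same coordinate sequence.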

Page 35: Step 4. Assign Labels

Hypernodes create clusters of hosts that share label prefixes

Page 36: Relabeling

Topology changes may cause paths to change
◦ Which causes labels to change

Evaluation:
◦ Quick convergence
◦ Localized effects

Page 37: Using ALIAS labels

Many overlying communication protocols
◦ Hierarchical-style forwarding makes most sense

E.g. MAC address rewriting
◦ At sender’s ingress switch: dest. MAC → ALIAS label
◦ At recipient’s egress switch: ALIAS label → dest. MAC
◦ Up*/down* forwarding (AutoNet, SOSP91)
◦ Proxy ARP for resolution

E.g. encapsulation, tunneling
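
The two rewriting steps amount to a lookup at each edge of the fabric. The sketch below is purely illustrative: the `Frame` type, the MAC address, the dotted label string, and the lookup tables are hypothetical, and how a label is actually encoded into a header field is not something the slide specifies.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    dst: str       # destination MAC, or an ALIAS label while inside the fabric
    payload: str


# Hypothetical resolution table, e.g. what proxy ARP would have learned.
MAC_TO_LABEL = {"00:11:22:33:44:55": "b.f.g.h"}
LABEL_TO_MAC = {label: mac for mac, label in MAC_TO_LABEL.items()}


def ingress_rewrite(frame):
    """Sender's ingress switch: dest. MAC -> ALIAS label."""
    return replace(frame, dst=MAC_TO_LABEL[frame.dst])


def egress_rewrite(frame):
    """Recipient's egress switch: ALIAS label -> dest. MAC."""
    return replace(frame, dst=LABEL_TO_MAC[frame.dst])


original = Frame(dst="00:11:22:33:44:55", payload="hello")
in_fabric = ingress_rewrite(original)
delivered = egress_rewrite(in_fabric)
print(in_fabric.dst, "->", delivered.dst)
```

Because the rewrite is undone at the egress switch, end hosts keep seeing ordinary MAC addresses while the fabric forwards on the hierarchical label.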

Page 38: Evaluation Methodology

“Standard” systems approach
◦ Implementation, experimentation, deployment

Theoretical approach
◦ Proof, formalization, verification via model checking

Goal:
◦ Verify correctness, feasibility
◦ Assess scalability

Page 39: Evaluation: Correctness

Does ALIAS assign labels correctly? Do labels enable scalable communication?

✓ Implemented in Mace (www.macesystems.org)
✓ Used Mace Model Checker to verify
   Label assignment: levels, hypernodes, coordinates
   Sample overlying communication: pairs of nodes can communicate when physically connected
✓ Ported to small testbed with existing communication protocol for realistic evaluation

Page 40: Evaluation: Correctness

Does DCP solve the Label Selection Problem?

✓ Proof that DCP implements LSP
✓ Implemented in Mace and model checked all versions of DCP

Is LSP a reasonable abstraction?

✓ Formal protocol derivation from basic DCP → ALIAS

Page 41: Evaluation: Feasibility

Is overhead (storage, control) acceptable?

✓ Resource requirements of algorithm
   Memory: ~KBs for 10k host network
   Control overhead: agility/overhead tradeoff
✓ Memory usage on testbed deployment (<150B)

Ports/Switch   Hosts   Cycle (ms)   Control Overhead (Mbps, % of 10G link)
64             65k     100          31.5 (0.3%)
                       500          6.29 (0.06%)
128            524k    1000         25.16 (0.25%)
                       2000         12.58 (0.12%)
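
The percentages in the table follow directly from dividing by the link rate, and the agility/overhead tradeoff shows up as a roughly constant amount of control data per cycle. The sketch below only re-derives numbers from the table; the helper names are hypothetical, and taking a 10G link as 10,000 Mbps is an assumption.

```python
LINK_MBPS = 10_000  # ASSUMPTION: a 10G link taken as 10,000 Mbps

# (ports per switch, cycle in ms, control overhead in Mbps), from the table;
# the port count is repeated for rows where the table leaves it blank.
ROWS = [(64, 100, 31.5), (64, 500, 6.29), (128, 1000, 25.16), (128, 2000, 12.58)]


def percent_of_link(mbps, link_mbps=LINK_MBPS):
    """Control overhead as a percentage of link capacity."""
    return 100.0 * mbps / link_mbps


def megabits_per_cycle(mbps, cycle_ms):
    """Control data exchanged in one cycle, in megabits."""
    return mbps * cycle_ms / 1000.0


for ports, cycle_ms, mbps in ROWS:
    print(
        f"{ports:>3} ports, {cycle_ms:>4} ms cycle: "
        f"{percent_of_link(mbps):.3f}% of link, "
        f"{megabits_per_cycle(mbps, cycle_ms):.2f} Mb per cycle"
    )
```

Within each port count the per-cycle volume is essentially unchanged, so lengthening the cycle lowers the bandwidth cost in proportion, at the price of slower reaction to changes.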

Page 42: Evaluation: Feasibility

Is the protocol practical in convergence time?

✓ DCP: Used Mace simulator to verify that “probabilistically fast” is quite fast in practice
✓ Measured convergence on testbed deployment
   On startup
   After failure (speed and locality)
✓ Used Mace model checker to verify locality of failure reactions for larger networks

Page 43: Evaluation: Scalability

Does ALIAS scale to data center sizes?

✓ Used Mace model checker to verify labels and communication for larger networks than testbed
✓ Wrote simulation code to analyze network behavior for enormous networks

Page 44: Result: Small Forwarding State

Topology                                            ALIAS Forwarding
Levels   Ports   % Fully Provisioned   Servers      Table Entries
3        32      100                   8,192        45
                 80                                 262
                 50                                 173
                 20                                 86
3        64      100                   65,536       90
                 80                                 1028
                 50                                 653
                 20                                 291
4        32      100                   131,072      46
                 80                                 1278
                 50                                 2079
                 20                                 2415
5        16      100                   65,536       23
                 80                                 492
                 50                                 886
                 20                                 1108

Annotations: e.g. MAC; e.g. IP, LDP/DAC
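
The server counts can be cross-checked against the standard fat-tree size formula, 2·(k/2)^n hosts for n levels of k-port switches. Using that formula here is an assumption about how the topologies were built; it reproduces the table's 8,192 and 131,072 exactly and gives 65,536 for the other two configurations. The entry counts below are the 100% fully provisioned values as read from the table.

```python
def fat_tree_servers(levels, ports):
    """Hosts in a fully provisioned fat tree: 2 * (ports / 2) ** levels."""
    return 2 * (ports // 2) ** levels


# (levels, ports, ALIAS forwarding entries at 100% fully provisioned)
CONFIGS = [(3, 32, 45), (3, 64, 90), (4, 32, 46), (5, 16, 23)]

for levels, ports, entries in CONFIGS:
    servers = fat_tree_servers(levels, ports)
    share = 100.0 * entries / servers
    print(
        f"{levels} levels, {ports} ports: {servers:,} servers, "
        f"{entries} entries ({share:.3f}% of a flat table)"
    )
```

A flat scheme needs on the order of one entry per server, so the last column gives a rough sense of how much smaller the structured forwarding state is in the fully provisioned case.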

Page 45: Conclusion

Scale and complexity of data center networks make labeling problem unique

ALIAS enables scalable data center communication by:
◦ Using a distributed approach
◦ Leveraging hierarchy to form topologically significant labels
◦ Eliminating manual configuration

Page 46: Convergence of DCP

Page 47: Convergence vs. Coord. Domain

[Figure: P(k, 4, d) (y-axis, 0 to 1) vs. k (x-axis, 0 to 4) for m=4, with one curve each for d=4, 8, 16, 32, 64, 128]

Page 48: Convergence vs. Coord. Domain

[Figure: P(k, 8, d) (y-axis, 0 to 0.8) vs. k (x-axis, 0 to 8) for m=8, with one curve each for d=8, 16, 32, 64, 128]

Page 49: Convergence vs. Coord. Domain

[Figure: P(k, 16, d) (y-axis, 0 to 0.4) vs. k (x-axis, 0 to 16) for m=16, with one curve each for d=16, 32, 64, 128]

Page 50: Convergence vs. Coord. Domain