© Copyright 2010 Hewlett-Packard Development Company, L.P.

Jayaram Mudigonda, HP Labs
Praveen Yalagandula, HP Labs
Mohammad Al-Fares, UCSD
Jeff Mogul, HP Labs

SPAIN: High BW Data-Center Ethernet with Unmodified Switches

Traditional Datacenter
[Figure: Internet, Datacenter Fabric]
Internet-facing applications: E-Mail, Web Servers, etc.

DC Trends
Information Explosion
Application Consolidation
Virtualization
HPC Applications

DC Trends
[Figure: Internet, Datacenter Fabric; nodes labeled M and R]
Shuffle phase of Map-Reduce

DC Trends
[Figure: Internet, Datacenter Fabric; nodes labeled M and R]
Shuffle phase of Map-Reduce
High bisection bandwidth

DC Trends
[Figure: Internet, Datacenter Fabric]
Flat Network

DC Fabric Goals
High bisection BW
Flat network
Low-cost

Ethernet: a good choice
Commodity
Inexpensive
Speeds: 10G is here, 40G/100G soon
Flat-addressing
Self-configuring

But wait…

Spanning Tree Protocol (STP) makes Ethernet hard to scale!

Spanning Tree Protocol (STP)
[Figure: Root; Bandwidth bottleneck; Unused links]
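To make the "unused links" point concrete, here is a minimal sketch. It is not real STP: a breadth-first tree over equal-cost links stands in for the protocol's root election and path-cost rules. The four-switch full mesh and the names `spanning_tree_links`, `links` and `S1`..`S4` are made up for illustration.

```python
from collections import deque

def spanning_tree_links(links, root):
    """Return the set of links kept by a breadth-first tree rooted at `root`.

    Stand-in for STP under the assumption that all links have equal cost.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    visited = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors[node]):
            if nxt not in visited:
                visited.add(nxt)
                tree.add(frozenset((node, nxt)))
                queue.append(nxt)
    return tree

# Hypothetical topology: four switches in a full mesh (6 links).
links = [("S1", "S2"), ("S1", "S3"), ("S1", "S4"),
         ("S2", "S3"), ("S2", "S4"), ("S3", "S4")]
tree = spanning_tree_links(links, "S1")
unused = [link for link in links if frozenset(link) not in tree]
print(len(tree), "links carry traffic;", len(unused), "links sit idle")
```

In this toy mesh every tree link touches the root, which is the bandwidth bottleneck the slide points at, and half of the installed links carry nothing.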

Proposal 1: High-port core switch

A common current approach

Expensive Core Switch
High BW or Multiple Links

Proposal 2: L3
IP Subnetting
VL2 [SIGCOMM’09]

L3 routers
Expensive
No non-IP protocols (FCoE)

Proposal 3: Modify switches (HW/SW)

TRILL [IETF]

SEATTLE [SIGCOMM’08]

PortLand [SIGCOMM’09]

Not deployable today!

SPAIN

Unmodified L2 switches

Multi-pathing

Arbitrary topologies

SPAIN Approach
Multi-pathing via VLANs
+ End-host driver to spread load

Multi-pathing via VLANs
[Figure: nodes A, B, C, D; Default VLAN]

SPAIN
Unmodified L2 switches
Multi-pathing via VLANs
Arbitrary topologies
Minor End-host modifications
Low-cost High-BW DC Fabric Today!

Outline
Introduction
SPAIN Components
  Offline computation
  End-host driver
Evaluation
Summary

Offline Computation
Steps:
1. Discover topology
2. Compute paths
3. Lay out paths as VLANs

Discover topology

SNMP Queries

SPAIN

Compute paths
Goal: leverage redundancy; improve reliability
Challenges: large graphs; more paths mean more resources

Compute paths
Only consider paths between edge switches
Modified Dijkstra’s; prefer edge-disjoint paths
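The slide does not say how Dijkstra's algorithm is modified. One plausible reading, sketched below as an assumption, is to run ordinary Dijkstra repeatedly and add a large penalty to every link a chosen path used, so that later paths avoid those links when they can. The function names, the `penalty` value and the two-core example topology are illustrative, not taken from the talk.

```python
import heapq

def shortest_path(graph, weights, src, dst):
    """Plain Dijkstra over an undirected graph; returns a node list or None."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v in graph[u]:
            nd = d + weights[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def k_paths(graph, src, dst, k, penalty=1000):
    """Find up to k paths, penalizing already-used links (assumed heuristic)."""
    weights = {frozenset((u, v)): 1 for u in graph for v in graph[u]}
    paths = []
    for _ in range(k):
        path = shortest_path(graph, weights, src, dst)
        if path is None or path in paths:
            break  # no further distinct path exists
        paths.append(path)
        for u, v in zip(path, path[1:]):
            weights[frozenset((u, v))] += penalty
    return paths

# Hypothetical topology: edge switches E1 and E2 joined through cores C1 and C2.
graph = {"E1": ["C1", "C2"], "C1": ["E1", "E2"],
         "C2": ["E1", "E2"], "E2": ["C1", "C2"]}
paths = k_paths(graph, "E1", "E2", k=2)
print(paths)
```

A penalty only expresses a preference: when no edge-disjoint alternative exists, the search still returns a path that shares links with an earlier one, provided it is not an exact repeat.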

VLAN Layout
Simple scheme: each path as a VLAN

But…
IEEE 802.1Q: VLAN ID = 12 bits, so at most 4096 VLANs!
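A back-of-the-envelope calculation shows how quickly a one-VLAN-per-path scheme runs out of IDs. The sketch assumes one VLAN per path, a fixed number of paths per pair of edge switches, and that a path serves both directions; these simplifications and the function names are mine, not the talk's. It also subtracts the two VLAN ID values (0 and 4095) that 802.1Q reserves.

```python
VLAN_ID_BITS = 12
TOTAL_IDS = 2 ** VLAN_ID_BITS   # 4096 possible values
USABLE_IDS = TOTAL_IDS - 2      # VID 0 and VID 4095 are reserved by 802.1Q

def vlans_needed(edge_switches, paths_per_pair):
    """VLANs consumed if every path between every switch pair gets its own VLAN."""
    pairs = edge_switches * (edge_switches - 1) // 2
    return pairs * paths_per_pair

def max_edge_switches(paths_per_pair, budget=USABLE_IDS):
    """Largest number of edge switches whose paths still fit in the ID budget."""
    n = 2
    while vlans_needed(n + 1, paths_per_pair) <= budget:
        n += 1
    return n

for k in (1, 2, 4):
    print(k, "path(s) per pair ->", max_edge_switches(k), "edge switches")
```

Under these assumptions, even a single path per pair tops out below a hundred edge switches.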

VLAN Layout
Simple scheme: each path as a VLAN
Scales to only a few switches

VLAN Layout
Our approach: 1 VLAN for a set of paths

Challenge: Minimize VLANs

NP-Hard for arbitrary topologies

VLAN Layout
Heuristics:
1. Greedy path packing
2. Parallel graph-coloring
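The first heuristic can be sketched roughly as follows. The sketch assumes that a set of paths may share a VLAN only while the union of their links stays loop-free; the slides do not state this rule. Each path goes into the first existing VLAN that remains a forest after the path is added, and a new VLAN is opened otherwise. The function names and the example paths are illustrative.

```python
def is_forest(edges):
    """True if the undirected edge set contains no cycle (union-find check)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a loop
        parent[root_u] = root_v
    return True

def path_edges(path):
    """Undirected links of a path, each as a sorted tuple."""
    return {tuple(sorted(edge)) for edge in zip(path, path[1:])}

def greedy_pack(paths):
    """Assign each path to the first VLAN it fits in; return (vlans, assignment)."""
    vlans = []        # each VLAN is a set of undirected links
    assignment = []   # assignment[i] is the VLAN index of paths[i]
    for path in paths:
        edges = path_edges(path)
        for vid, vlan in enumerate(vlans):
            if is_forest(vlan | edges):
                vlan |= edges
                assignment.append(vid)
                break
        else:
            vlans.append(set(edges))
            assignment.append(len(vlans) - 1)
    return vlans, assignment

# Hypothetical paths between edge switches E1..E3 through cores C1 and C2.
example_paths = [["E1", "C1", "E2"], ["E1", "C2", "E2"], ["E2", "C1", "E3"]]
vlans, assignment = greedy_pack(example_paths)
print(assignment)
```

In this example the second path would close the loop E1-C1-E2-C2 and therefore opens a new VLAN, while the third path reuses the first VLAN. Three paths fit into two VLANs.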

VLAN Layout
# VLANs = 4

Outline
Introduction
SPAIN Components
  Offline computation
  End-host driver
Evaluation
Summary

SPAIN End-host Driver
[Figure: A and B; SPAIN; Topology & VLANs; flows 1 and 2]
Flow Table:
A→B, 1 : RED
A→B, 2 : BLUE
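The flow table on this slide maps a (destination, flow) pair to a VLAN. A minimal sketch of that idea follows. The slide does not state the selection policy, so the round-robin choice among the VLANs that reach a destination is an assumption, and the class and method names are hypothetical.

```python
class FlowTable:
    """Pins each (destination, flow) pair to one VLAN.

    The round-robin policy is an assumption made for illustration only.
    """

    def __init__(self, vlans_for_dst):
        self.vlans_for_dst = vlans_for_dst  # destination -> list of usable VLANs
        self.entries = {}                   # (destination, flow_id) -> VLAN
        self.next_index = {}                # destination -> next round-robin slot

    def vlan_for(self, dst, flow_id):
        key = (dst, flow_id)
        if key not in self.entries:
            choices = self.vlans_for_dst[dst]
            slot = self.next_index.get(dst, 0)
            self.entries[key] = choices[slot % len(choices)]
            self.next_index[dst] = slot + 1
        return self.entries[key]

# Host A's table, with two VLANs assumed to reach B.
table = FlowTable({"B": ["RED", "BLUE"]})
print(table.vlan_for("B", 1), table.vlan_for("B", 2))
```

Keeping every packet of a flow on one VLAN avoids reordering within the flow, while different flows to the same destination can still use different paths.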

Challenges
Link & switch failures
Pathological flooding
Interoperability
Host mobility
Load-balance
End-host state

Failures
[Figure: A and B]
Flow Table: A→B : RED
Flow Table: (no entries shown)

Pathological Flooding
[Figure: A and B]
Flow Table: A→B : RED
Flow Table: B→A : GREEN
Does not know the location of B

Solution:

Chirping

Chirping
[Figure: A, B and C]
Flow Table: A→B : RED
Flow Table: B→A : GREEN
Does not know the location of B
Knows the location of B

Chirping
[Figure: A and B]
Flow Table: A→B : RED BLUE
Flow Table: B→A : GREEN
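The flooding problem and the chirping fix can be illustrated with a toy learning switch. The sketch assumes that switches learn source addresses separately for each VLAN, and that a chirp is simply a frame a host sends on a VLAN so that switches learn where it is. The slides name the mechanism without giving these details, and the class, port numbers and broadcast placeholder are hypothetical.

```python
class LearningSwitch:
    """Toy Ethernet switch that learns (VLAN, source address) -> port."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # (vlan, address) -> port

    def receive(self, in_port, vlan, src, dst):
        """Learn the sender's port, then return the list of output ports."""
        self.table[(vlan, src)] = in_port
        out_port = self.table.get((vlan, dst))
        if out_port is None:
            return sorted(self.ports - {in_port})  # unknown destination: flood
        return [out_port]

# A on port 1, B on port 2, C on port 3.
switch = LearningSwitch([1, 2, 3])
print(switch.receive(1, "RED", "A", "B"))    # B unknown on RED: flooded
print(switch.receive(2, "GREEN", "B", "A"))  # B replies on GREEN only
print(switch.receive(1, "RED", "A", "B"))    # still flooded on RED
switch.receive(2, "RED", "B", "BROADCAST")   # B chirps on RED
print(switch.receive(1, "RED", "A", "B"))    # now delivered to port 2 only
```

Because B only ever transmits on GREEN, the RED table never learns B, so every A→B frame on RED is flooded until B sends something on RED.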

Outline
Introduction
SPAIN Components
  Offline computation
  End-host driver
Evaluation
Summary

Evaluation

Simulations

Real testbed

Simulations
Topologies:
CiscoDC: core switches; aggregation modules, m = 2; access switches per module, a = 2
Fat-Tree [Al-Fares et al. SIGCOMM’08]: #ports/switch, p = 4
HyperX [Ahn et al. SC’09]: 2D HyperX, k = 4
B-Cube [Guo et al. SIGCOMM’09]: #ports/switch (p) = 2; levels (l) = 2
Metrics: #VLANs, Link-Coverage, Reliability, Throughput

Num. of VLANs

Topology          #switches   #VLANs
CiscoDC (8,8)           146       38
Fat-Tree (48)          2880      576
HyperX (16)             256      971
B-Cube (48,2)          2048     2048
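Two of the switch counts can be cross-checked from the topology parameters. A fat-tree built from p-port switches has p pods of p switches each plus (p/2)^2 core switches, for 5p^2/4 in total. For HyperX the sketch assumes that "HyperX (16)" denotes a 2D layout with 16 switches per dimension, which the slide does not spell out. The function names are illustrative.

```python
def fat_tree_switches(p):
    """Switch count of a fat-tree built from p-port switches (p even)."""
    pod_switches = p * p          # p pods, each with p/2 edge + p/2 aggregation
    core_switches = (p // 2) ** 2
    return pod_switches + core_switches

def hyperx_2d_switches(k):
    """Switch count of a 2D HyperX with k switches per dimension (assumed layout)."""
    return k * k

print(fat_tree_switches(48))   # 2880
print(hyperx_2d_switches(16))  # 256
```

Both values match the 2880 and 256 reported for those topologies.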

Throughput: improvement over STP

Topology    Improvement
CiscoDC     2x
Fat-Tree    24x
HyperX      10.5x
B-Cube      1.6x

OpenCirrus Experiments

OpenCirrus Testbed
[Figure: 1G and 10G links; RACK SWITCH (RS); CORE SWITCH (CS); 80 blades]
[Figure: CS, S1, S2, S3]
10G links that we added
4 VLANs

Shuffle-like experiment

Every server to all other servers

500MB data transfer

Spanning Tree Protocol (STP)
[Figure: CS, S1, S2, S3]

[Figure: link utilization in each direction over time, from 0% to 100%]

Spanning Tree Protocol (STP)
[Figure: CS, S1, S2, S3; links marked Overloaded and Unused]

SPAIN
[Figure: CS, S1, S2, S3]
No bottlenecks

Completion times
STP: 832 s
SPAIN: 431 s
~50% reduction

Aggregate Goodput (Gbps)
STP: 35.6
SPAIN: 66.7
87% improvement

Incremental Deployability
Aggregate Goodput (Gbps)

% SPAIN hosts   Goodput (Gbps)
0%              35.6
20%             37.0
50%             44.7
70%             56.0
100%            66.7

Single Shortest Path (SSP): SEATTLE/TRILL
[Figure: CS, S1, S2, S3; all flows on RED; all flows on GREEN; all flows on GRAY]
SEATTLE/TRILL on unmodified switches with SPAIN

Comparison with SSP

                      SSP     SPAIN
Goodput (Gbps)        62.3    66.7     7% better
Completion Time (s)   513     431      16% better
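The percentages quoted in the testbed results follow directly from the raw numbers on the slides. The short check below recomputes them; the helper names are illustrative.

```python
def pct_reduction(old, new):
    """Percentage by which `new` is lower than `old`."""
    return 100 * (old - new) / old

def pct_increase(old, new):
    """Percentage by which `new` is higher than `old`."""
    return 100 * (new - old) / old

# SPAIN vs. STP
print(round(pct_reduction(832, 431)))   # completion time: 48, quoted as ~50%
print(round(pct_increase(35.6, 66.7)))  # goodput: 87

# SPAIN vs. SSP
print(round(pct_reduction(513, 431)))   # completion time: 16
print(round(pct_increase(62.3, 66.7)))  # goodput: 7
```

The arithmetic also settles which figure belongs to which metric in the SSP comparison: the 16% is the completion-time reduction and the 7% is the goodput gain.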

SPAIN Take-away
Unmodified L2 switches
Multi-pathing via VLANs
Arbitrary topologies
Minor End-host modifications
Low-cost High-BW DC Fabric Today!

Q&A
