© Copyright 2010 Hewlett-Packard Development Company, L.P.
Jayaram Mudigonda, HP Labs
Praveen Yalagandula, HP Labs
Mohammad Al-Fares, UCSD
Jeff Mogul, HP Labs
SPAIN:
High BW Data-Center Ethernet
with Unmodified Switches
Datacenter Fabric
Traditional Datacenter
Internet
Internet-facing applications: E-Mail, Web Servers, etc.
DC Trends
Information Explosion
Application Consolidation
Virtualization
HPC Applications
DC Trends
[Figure: datacenter fabric connected to the Internet; mapper (M) and reducer (R) nodes exchanging data]
Shuffle phase of MapReduce
High bisection bandwidth
Flat Network
DC Fabric Goals
High bisection BW
Flat network
Low-cost
Ethernet: a good choice
Commodity
Inexpensive
Speeds: 10G is here; 40G/100G soon
Flat addressing
Self-configuring
But wait…
Spanning Tree Protocol (STP) makes Ethernet hard to scale!
Spanning Tree Protocol (STP)
Root
Bandwidth bottleneck
Unused links
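To make the "unused links" point concrete, here is a minimal sketch that builds a breadth-first tree over a small, hypothetical full-mesh topology and counts the links left idle. Real STP elects the root and chooses links by bridge ID and path cost; the BFS tree here only stands in for that, and the switch names are illustrative.

```python
from collections import deque

def spanning_tree_edges(adj, root):
    """Return the set of links kept by a BFS tree rooted at `root`."""
    seen = {root}
    tree = set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                tree.add(frozenset((u, v)))
                queue.append(v)
    return tree

# Hypothetical 4-switch full mesh: 6 physical links.
switches = ["s1", "s2", "s3", "s4"]
adj = {s: {t for t in switches if t != s} for s in switches}
all_links = {frozenset((u, v)) for u in adj for v in adj[u]}

tree = spanning_tree_edges(adj, root="s1")
unused = all_links - tree
print(len(all_links), len(tree), len(unused))  # 6 3 3
```

Any tree over n switches keeps exactly n - 1 links, so every extra link bought for redundancy carries no traffic, and all traffic between subtrees funnels through the root.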
Proposal 1: High-port core switch
A common current approach
Expensive core switch
High BW or multiple links
Proposal 2: L3
IP subnetting
VL2 [SIGCOMM’09]
L3 routers
Expensive
No non-IP protocols (FCoE)
Proposal 3: Modify switches (HW/SW)
TRILL [IETF]
SEATTLE [SIGCOMM’08]
PortLand [SIGCOMM’09]
Not deployable today!
SPAIN
Unmodified L2 switches
Multi-pathing
Arbitrary topologies
SPAIN Approach
Multi-pathing via VLANs
+ End-host driver to spread load
Multi-pathing via VLANs
[Figure: hosts A, B, C, D; label: Default VLAN]
SPAIN
Unmodified L2 switches
Multi-pathing via VLANs
Arbitrary topologies
Minor end-host modifications
Low-cost, high-BW DC fabric today!
Outline
Introduction
SPAIN Components
  Offline computation
  End-host driver
Evaluation
Summary
Offline Computation
Steps:
1. Discover topology
2. Compute paths
3. Layout paths as VLANs
Discover topology
SNMP Queries
SPAIN
Compute paths
Goal: leverage redundancy; improve reliability
Challenges: large graphs; more paths → more resources
Compute paths
Only consider paths between edge switches
Modified Dijkstra’s; prefer edge-disjoint paths
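The slide does not spell out how Dijkstra's algorithm is modified. One plausible reading, sketched below as an assumption rather than the authors' algorithm, is to run plain Dijkstra repeatedly and add a penalty to every link already used, so later runs prefer edge-disjoint routes. The `penalty` value, the function names, and the toy topology (edge switches `e1`, `e2`; core switches `c1`, `c2`) are all hypothetical.

```python
import heapq

def shortest_path(adj, weight, src, dst):
    """Plain Dijkstra; `weight` maps frozenset({u, v}) to a link cost."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + weight[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def k_paths(adj, src, dst, k, penalty=100):
    """Up to k paths; links already used get costlier (assumed heuristic)."""
    weight = {frozenset((u, v)): 1 for u in adj for v in adj[u]}
    paths = []
    for _ in range(k):
        p = shortest_path(adj, weight, src, dst)
        if p is None or p in paths:
            break
        paths.append(p)
        for u, v in zip(p, p[1:]):
            weight[frozenset((u, v))] += penalty
    return paths

# Two edge switches, each wired to both core switches.
adj = {"e1": ["c1", "c2"], "e2": ["c1", "c2"],
       "c1": ["e1", "e2"], "c2": ["e1", "e2"]}
paths = k_paths(adj, "e1", "e2", k=2)
print(paths)  # [['e1', 'c1', 'e2'], ['e1', 'c2', 'e2']]
```

A penalty only discourages link reuse; it does not forbid it, which matches the slide's word "prefer".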
VLAN Layout
Simple scheme: each path as a VLAN
But…
IEEE 802.1Q: VLAN ID = 12 bits → 4096 VLANs!
VLAN Layout
Simple scheme: each path as a VLAN
Scales to only a few switches
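A rough back-of-the-envelope check of why one VLAN per path runs out quickly. The sketch assumes one VLAN per path, a fixed number of paths for every unordered pair of edge switches, and the slide's 4096-ID budget; the function names are illustrative.

```python
VLAN_ID_BITS = 12
MAX_VLANS = 2 ** VLAN_ID_BITS  # 4096, as on the slide
# Note: 802.1Q reserves VIDs 0 and 4095, so slightly fewer are usable.

def vlans_needed(edge_switches, paths_per_pair):
    """VLANs used if every path between every switch pair gets its own ID."""
    pairs = edge_switches * (edge_switches - 1) // 2
    return pairs * paths_per_pair

def max_edge_switches(paths_per_pair, budget=MAX_VLANS):
    """Largest number of edge switches that still fits in the budget."""
    n = 1
    while vlans_needed(n + 1, paths_per_pair) <= budget:
        n += 1
    return n

for k in (1, 2, 4):
    print(k, max_edge_switches(k))
# 1 91
# 2 64
# 4 45
```

Under these assumptions the ID space is exhausted at well under a hundred edge switches, and each additional path per pair shrinks that limit further.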
VLAN Layout
Our approach: 1 VLAN for a set of paths
Challenge: Minimize VLANs
NP-Hard for arbitrary topologies
VLAN Layout
Heuristics:
1. Greedy path packing
2. Parallel graph-coloring
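The slides name the greedy path-packing heuristic without giving details. The sketch below is one simple interpretation, not the authors' algorithm: place each path in the first VLAN whose links stay loop-free after the path is added, and open a new VLAN otherwise. The topology and all function names are hypothetical.

```python
def is_forest(edges):
    """True if the undirected edge set contains no cycle (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True

def path_edges(path):
    """Undirected links of a path, each as a sorted tuple."""
    return {tuple(sorted(e)) for e in zip(path, path[1:])}

def greedy_pack(paths):
    """Put each path into the first VLAN that stays loop-free."""
    vlans = []  # each VLAN is a set of undirected links
    for p in paths:
        edges = path_edges(p)
        for vlan in vlans:
            if is_forest(vlan | edges):
                vlan |= edges
                break
        else:
            vlans.append(set(edges))
    return vlans

# Three edge switches (e1..e3), each wired to two cores (c1, c2).
paths = [["e1", "c1", "e2"], ["e1", "c2", "e2"],
         ["e1", "c1", "e3"], ["e2", "c2", "e3"]]
vlans = greedy_pack(paths)
print(len(vlans))  # 2
```

Keeping each VLAN loop-free matters because a tree has exactly one route between any two switches, so a path packed into a VLAN remains the route that VLAN actually provides.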
VLAN Layout
# VLANs = 4
SPAIN End-host Driver
[Figure: hosts A and B; SPAIN; label: Topology & VLANs]
SPAIN End-host Driver
[Figure: hosts A and B, each with a flow table; flows 1 and 2]
Flow Table:
A→B, 1 : RED
A→B, 2 : BLUE
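The flow table on this slide can be pictured as a map from (destination, flow) to a VLAN. The sketch below assumes round-robin assignment over the VLANs that reach a destination, with each flow pinned to its VLAN once chosen. The slides do not state the selection policy, so the class, its method names, and the policy are illustrative.

```python
import itertools

class FlowTable:
    """Toy per-host table: pins each (dst, flow) to one VLAN."""

    def __init__(self, vlans_for_dst):
        # Round-robin iterator over the usable VLANs of each destination.
        self._next = {dst: itertools.cycle(vlans)
                      for dst, vlans in vlans_for_dst.items()}
        self._pinned = {}

    def vlan_for(self, dst, flow_id):
        key = (dst, flow_id)
        if key not in self._pinned:
            self._pinned[key] = next(self._next[dst])
        return self._pinned[key]

# Host A's view: destination B is reachable on VLANs RED and BLUE.
table = FlowTable({"B": ["RED", "BLUE"]})
print(table.vlan_for("B", 1))  # RED
print(table.vlan_for("B", 2))  # BLUE
print(table.vlan_for("B", 1))  # RED (flow 1 stays pinned)
```

Pinning a flow to one VLAN keeps its packets on a single path, which avoids reordering within the flow while still spreading different flows across paths.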
Challenges
Link & switch failures
Pathological flooding
Interoperability
Host mobility
Load-balance
End-host state
Failures
[Figure: hosts A and B, each with a flow table]
Flow Table: A→B : RED
Pathological Flooding
[Figure: hosts A and B, each with a flow table; label: Does not know the location of B]
Flow Table: A→B : RED
Flow Table: B→A : GREEN
Solution: Chirping
Chirping
[Figure: A, B, C; labels: Does not know the location of B; Knows the location of B]
Flow Table: A→B : RED
Flow Table: B→A : GREEN
Chirping
[Figure: hosts A and B, each with a flow table]
Flow Table: A→B : RED BLUE
Flow Table: B→A : GREEN
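A toy model of why flooding becomes pathological when A sends to B on RED but B only ever replies on GREEN. It assumes the switch learns MAC locations independently per VLAN, and it treats a chirp as nothing more than a broadcast frame that B sends on the RED VLAN. The slides do not describe the chirp format or timing, so this is an illustration of the idea, not SPAIN's mechanism. Class and port numbers are hypothetical.

```python
class LearningSwitch:
    """Toy switch that learns (VLAN, MAC) -> port, separately per VLAN."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}

    def receive(self, in_port, vlan, src, dst):
        """Learn the sender, then return the list of output ports."""
        self.table[(vlan, src)] = in_port
        out = self.table.get((vlan, dst))
        if out is None:
            return sorted(self.ports - {in_port})  # unknown: flood
        return [out]

# Ports: 1 -> host A, 2 -> host B, 3 -> host C.
sw = LearningSwitch([1, 2, 3])
print(sw.receive(1, "RED", "A", "B"))    # [2, 3] flooded
print(sw.receive(2, "GREEN", "B", "A"))  # [1, 3] flooded
print(sw.receive(1, "RED", "A", "B"))    # [2, 3] still flooded

# Assumed chirp: B sends one broadcast frame on RED.
sw.receive(2, "RED", "B", "broadcast")
print(sw.receive(1, "RED", "A", "B"))    # [2] unicast
```

Without the chirp, B never appears as a source on RED, so every A→B frame is flooded no matter how long the flow runs.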
Evaluation
Simulations
Real testbed
Simulations
Topologies:
CiscoDC: core switches; aggregation modules m = 2; access switches per module a = 2
Fat-Tree [Al-Fares et al. SIGCOMM’08]: #ports/switch p = 4
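For reference, the size of a standard three-level fat-tree follows directly from the port count p: p pods with p/2 edge and p/2 aggregation switches each, plus (p/2)² core switches. The helper below is a small sketch of that arithmetic; its name and return layout are illustrative.

```python
def fat_tree_size(p):
    """Switch and host counts for a 3-level fat-tree of p-port switches."""
    if p <= 0 or p % 2:
        raise ValueError("p must be a positive even number")
    half = p // 2
    edge = p * half          # p pods, p/2 edge switches each
    aggregation = p * half   # p pods, p/2 aggregation switches each
    core = half * half
    return {
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,  # 5 * p**2 / 4
        "hosts": edge * half,                   # p**3 / 4
    }

print(fat_tree_size(4)["switches"], fat_tree_size(4)["hosts"])  # 20 16
print(fat_tree_size(48)["switches"])                            # 2880
```

The p = 4 case is the small example drawn on this slide; p = 48 gives the 2880 switches listed for Fat-Tree (48) in the results.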
HyperX [Ahn et al. SC’09]: 2D HyperX, k = 4
B-Cube [Guo et al. SIGCOMM’09]: #ports/switch p = 2, levels l = 2
Metrics: #VLANs, link coverage, reliability, throughput
Num. of VLANs
Topology         #switches   #VLANs
CiscoDC (8,8)    146         38
Fat-Tree (48)    2880        576
HyperX (16)      256         971
B-Cube (48,2)    2048        2048
Throughput
Topology    Improvement over STP
CiscoDC     2x
Fat-Tree    24x
HyperX      10.5x
B-Cube      1.6x
OpenCirrus Experiments
OpenCirrus Testbed
[Figure: 80 blades; rack switches (RS) and core switch (CS); 1G and 10G links; switches CS, S1, S2, S3]
10G links that we added
4 VLANs
Shuffle-like experiment
Every server to all other servers
500MB data transfer
Spanning Tree Protocol (STP)
[Figure: switches CS, S1, S2, S3; link utilization in each direction over time, 0% to 100%]
Link labels: Overloaded; Unused
SPAIN
[Figure: switches CS, S1, S2, S3]
No bottlenecks
Completion times
STP: 832 s
SPAIN: 431 s
~50% reduction
Aggregate Goodput (Gbps)
STP: 35.6
SPAIN: 66.7
87% improvement
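The two headline percentages can be checked directly from the reported numbers. The helper names below are illustrative.

```python
def reduction_pct(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100 * (before - after) / before

def improvement_pct(before, after):
    """Percentage increase from `before` to `after`."""
    return 100 * (after - before) / before

# Completion time: STP 832 s, SPAIN 431 s.
print(round(reduction_pct(832, 431), 1))      # 48.2
# Aggregate goodput: STP 35.6 Gbps, SPAIN 66.7 Gbps.
print(round(improvement_pct(35.6, 66.7), 1))  # 87.4
```

So the "~50% reduction" is 48.2% before rounding, and the 87% goodput improvement matches the reported figures.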
Incremental Deployability
% SPAIN hosts   Aggregate Goodput (Gbps)
0%              35.6
20%             37.0
50%             44.7
70%             56.0
100%            66.7
Single Shortest Path (SSP)
SEATTLE/TRILL
[Figure: switches CS, S1, S2, S3; labels: All flows on RED; All flows on GREEN; All flows on GRAY]
SEATTLE/TRILL on unmodified switches with SPAIN
Comparison with SSP
                      SSP     SPAIN
Goodput (Gbps)        62.3    66.7    7% better
Completion Time (s)   513     431     16% better
SPAIN Take-away
Unmodified L2 switches
Multi-pathing via VLANs
Arbitrary topologies
Minor end-host modifications
Low-cost, high-BW DC fabric today!
Q&A