Symbiotic Routing in Future Data Centers
Hussam Abu-Libdeh, Paolo Costa, Antony Rowstron, Greg O’Shea, Austin Donnelly
Cornell University, Microsoft Research Cambridge

Data center networking
• Network principles evolved from Internet systems
  • Multiple administrative domains
  • Heterogeneous environment
• But data centers are different
  • Single administrative domain
  • Total control over all operational aspects
• Re-examine the network in this new setting

Rethinking DC networks
• New proposals for data center network architectures
  • DCell, BCube, Fat-tree, VL2, PortLand, …
• Network interface has not changed!
[Figure: Network Interface, with the labels Performance Isolation, Bandwidth, Fault Tolerance, Graceful Degradation, Scalability, TCO, Commodity Components, ..., Modular Design]

Challenge
• The network is a black box to applications
  • Must infer network properties
    • Locality, congestion, failure, etc.
  • Little or no control over routing
• Applications are a black box to the network
  • Must infer flow properties
    • E.g. traffic engineering (Hedera)
• In consequence
  • Today’s data centers and proposals use a single protocol
  • Routing trade-offs made in an application-agnostic way
    • E.g. latency, throughput, etc.

CamCube
• A new data center design
  • Nodes are commodity x86 servers with local storage
  • Container-based model: 1,500-2,500 servers
• Direct-connect 3D torus topology
  • Six Ethernet ports per server
  • Servers have (x,y,z) coordinates
    • Defines coordinate space
• Simple 1-hop API
  • Send/receive packets to/from 1-hop neighbours
  • Not using TCP/IP
• Everything is a service
  • Run on all servers
• Multi-hop routing is a service
  • Simple link state protocol
  • Route packets along shortest paths from source to destination
[Figure: 3D torus with x, y, and z axes; one server labeled with coordinate (0,2,0)]
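
The coordinate space and 1-hop neighbourhood described on the CamCube slide can be sketched in a few lines of C#. This is a minimal illustration, not CamCube's actual API: the `TorusCoord` type, its `Neighbours` and `DistanceTo` methods, and the side length `K` are hypothetical names chosen for this example. It assumes a cubic k x k x k torus in which every axis wraps around, so that each server has six 1-hop neighbours and the shortest-path hop count is the sum of the per-axis wraparound distances.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper type for illustration; not part of the CamCube API.
public readonly struct TorusCoord
{
    public readonly int X, Y, Z;
    public readonly int K; // servers per axis (assumes a cubic k x k x k torus)

    public TorusCoord(int x, int y, int z, int k)
    {
        // Normalise so that every coordinate falls in [0, k).
        X = Mod(x, k);
        Y = Mod(y, k);
        Z = Mod(z, k);
        K = k;
    }

    private static int Mod(int a, int k) => ((a % k) + k) % k;

    // The six 1-hop neighbours: +1 and -1 along each axis, with wraparound.
    public List<TorusCoord> Neighbours() => new List<TorusCoord>
    {
        new TorusCoord(X + 1, Y, Z, K), new TorusCoord(X - 1, Y, Z, K),
        new TorusCoord(X, Y + 1, Z, K), new TorusCoord(X, Y - 1, Z, K),
        new TorusCoord(X, Y, Z + 1, K), new TorusCoord(X, Y, Z - 1, K),
    };

    // Distance along one axis, taking the shorter way around the ring.
    private static int AxisDistance(int a, int b, int k)
    {
        int d = Math.Abs(a - b);
        return Math.Min(d, k - d);
    }

    // Shortest-path hop count between two servers in the torus.
    public int DistanceTo(TorusCoord other) =>
        AxisDistance(X, other.X, K) + AxisDistance(Y, other.Y, K) + AxisDistance(Z, other.Z, K);
}
```

With `K = 20`, as in a 20x20x20 torus of 8,000 servers, no axis contributes more than 10 hops, so the longest shortest path under this model is 30 hops.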

Development experience
• Built many data center services on CamCube
• E.g.
  • High-throughput transport service
    • Desired property: high throughput
  • Large-file multicast service
    • Desired property: low link load
  • Aggregation service
    • Desired property: distribute computation load over servers
  • Distributed object cache service
    • Desired property: per-key caches, low path stretch

Per-service routing protocols
• Higher flexibility
  • Services optimize for different objectives
• High-throughput transport → disjoint paths
  • Increases throughput
• File multicast → non-disjoint paths
  • Decreases network load

What is the benefit?
• Prototype testbed
  • 27 servers, 3x3x3 CamCube
  • Quad core, 4 GB RAM, six 1 Gbps Ethernet ports
• Large-scale packet-level discrete event simulator
  • 8,000 servers, 20x20x20 CamCube
  • 1 Gbps links
• Service code runs unmodified on cluster and simulator

Service-level benefits
• High-throughput transport service
  • 1 sender, 2,000 receivers
    • Sequential iteration
  • 10,000 packets/flow
  • 1,500 bytes/packet
• Metric: throughput
  • Shown: custom/base ratio
[Figure: CDF of flows (y-axis, 0 to 1) against custom/base throughput ratio (x-axis, 0 to 5)]

Service-level benefits
• Large-file multicast service
  • 8,000-server network
  • 1 multicast group
  • Group size: 0% to 100% of servers
• Metric: number of links in multicast tree
  • Shown: custom/base ratio
[Figure: links reduction (y-axis, 0 to 0.4) against number of servers in the group (x-axis, 0% to 100%)]

Service-level benefits
• Distributed object cache service
  • 8,000-server network
  • 8,000,000 key-object pairs
    • Evenly distributed among servers
  • 800,000 lookups
    • 100 lookups per server
    • Keys picked by Zipf distribution
  • 1 primary + 8 replicas per key
    • Replicas unpopulated initially
• Metric: path length to nearest hit
[Figure: CDF of lookups (y-axis, 0 to 1) against path length (x-axis, 0 to 30), for Custom Routing and Base Routing]

Network impact
• Ran all services simultaneously
  • No correlation in link usage
  • Reduction in link utilization
• Take-away: custom routing reduced network load and increased service-level performance
[Figure: fraction of links (y-axis, 0 to 0.6) by number of services per link (0 to 4 services)]
[Figure: change in link utilization, shown as custom/base packet ratio (y-axis, 0 to 1) for the Key-value Cache, Multicast, Fixed Path, Aggregation, and High-Throughput Transport services]

Symbiotic routing relations
• Multiple routing protocols running concurrently
  • Routing state shared with base routing protocol
• Services
  • Use one or more routing protocols
  • Use base protocol to simplify their custom protocols
• Network failures
  • Handled by base protocol
  • Services route for common case
[Figure: layers labeled Network, Base Routing Protocol, Routing Protocols 1 to 3, and Services A to C]

Building a routing framework
• Simplify building custom routing protocols
• Routing:
  • Build routes from set of intermediate points
    • Coordinates in the coordinate space
  • Services provide forwarding function ‘F’
  • Framework routes between intermediate points
    • Use base routing service
  • Consistently remap coordinate space on node failure
• Queuing:
  • Services manage packet queues per link
  • Fair queuing between services per link
[Figure: F takes a packet and the local coordinate and returns the next coordinate]
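
The queuing bullets can be made concrete with a small sketch. The slides only say that services keep packet queues per link and that queuing between services on a link is fair; they do not say which fairness discipline is used. The example below assumes plain round-robin, and the `LinkScheduler` class, its method names, and the string service identifiers are all hypothetical. For brevity the scheduler owns the per-service queues, whereas in the framework the services manage them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-link scheduler; round-robin is an assumption of this sketch.
public class LinkScheduler<TPacket>
{
    private readonly Dictionary<string, Queue<TPacket>> queues =
        new Dictionary<string, Queue<TPacket>>();
    private readonly List<string> order = new List<string>(); // round-robin order
    private int next = 0;                                      // service to try first

    // Adds a packet to the queue of the given service on this link.
    public void Enqueue(string service, TPacket packet)
    {
        if (!queues.TryGetValue(service, out var q))
        {
            q = new Queue<TPacket>();
            queues[service] = q;
            order.Add(service);
        }
        q.Enqueue(packet);
    }

    // Takes one packet from the next non-empty service queue, visiting
    // services in turn so that no single service can monopolise the link.
    public bool TryDequeue(out TPacket packet)
    {
        for (int i = 0; i < order.Count; i++)
        {
            var q = queues[order[(next + i) % order.Count]];
            if (q.Count > 0)
            {
                next = (next + i + 1) % order.Count;
                packet = q.Dequeue();
                return true;
            }
        }
        packet = default(TPacket);
        return false;
    }
}
```

Round-robin treats every packet as equal cost; a byte-based discipline such as deficit round-robin would be a natural alternative when services send packets of very different sizes.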

Example: cache service
• Distributed key-object caching
  • Key-space mapped onto CamCube coordinate space
• Per-key caches
  • Evenly distributed across coordinate space
  • Cache coordinates easily computable based on key
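
One way cache coordinates could be made "easily computable based on key" is to hash the key together with a cache index. The slides do not give the actual mapping, so everything in this sketch is an assumption: the `CachePlacement` class, the `CoordFor` method, the use of the SplitMix64 finaliser as a bit mixer, and the cubic k x k x k coordinate space. It only illustrates the idea that any server can derive the same coordinates from the key alone, without a lookup.

```csharp
using System;

// Hypothetical key-to-coordinate mapping; the real CamCube mapping is not shown in the slides.
public static class CachePlacement
{
    // SplitMix64 finaliser, used here only as a deterministic bit mixer.
    private static ulong Mix(ulong z)
    {
        unchecked
        {
            z += 0x9E3779B97F4A7C15UL;
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }

    // Coordinate of the index-th copy of an item in a k x k x k space.
    // By convention in this sketch, index 0 is the primary and 1..n are caches.
    public static (int X, int Y, int Z) CoordFor(ulong itemKey, int index, int k)
    {
        ulong h = Mix(itemKey ^ Mix((ulong)index));
        ulong side = (ulong)k;
        int x = (int)(h % side);
        int y = (int)((h / side) % side);
        int z = (int)((h / (side * side)) % side);
        return (x, y, z);
    }
}
```

Because the mapping is a pure function of the key and index, every server computes the same placement. The sketch does not prevent two copies of the same item from landing on the same coordinate; a real scheme would need to handle that case.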

Cache service routing
• Routing
  • Source → nearest cache or primary
  • On cache miss: cache → primary
    • Populate cache: primary → cache
• F function computed at
  • Source
  • Cache
  • Primary
• Different packets can use different links
  • Accommodate network conditions
    • E.g. congestion
[Figure: F evaluated at the source/querier, the nearest cache, and the primary server]

Handling failures
• On link failure
  • Base protocol routes around failure
• On replica server failure
  • Key space consistently remapped by framework
• F function does not change
  • Developer only targets common case
  • Framework handles corner cases
[Figure: failure scenario with the source/querier, the nearest cache, and the primary server]

Cache service F function

    protected override List<ulong> F(int neighborIndex, ulong currentDestinationKey, Packet packet)
    {
        List<ulong> nextKeys = new List<ulong>();
        ulong itemKey = LookupPacket.GetItemKey(packet);
        ulong sourceKey = LookupPacket.GetSourceKey(packet);

        if (currentDestinationKey == sourceKey) // am I the source?
        {
            // get the list of caches (using KeyValueStore static method)
            ulong[] cachesKey = ServiceKeyValueStore.GetCaches(itemKey);

            // iterate over all cache nodes and keep the closest ones
            int minDistance = int.MaxValue;
            foreach (ulong cacheKey in cachesKey)
            {
                int distance = node.nodeid.DistanceTo(LongKeyToKeyCoord(cacheKey));
                if (distance < minDistance)
                {
                    nextKeys.Clear();
                    nextKeys.Add(cacheKey);
                    minDistance = distance;
                }
                else if (distance == minDistance)
                    nextKeys.Add(cacheKey);
            }
        }
        else if (currentDestinationKey != itemKey) // am I the cache?
            nextKeys.Add(itemKey);

        return nextKeys;
    }

Slide callouts:
• Extract packet details
• If at source, route to nearest cache or primary
• If cache miss, route to primary
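
To show how a forwarding function of this shape might be driven, the sketch below calls an F-like delegate at each intermediate point until it returns no further keys. The `RouteTracer` class, the simplified one-argument delegate, and the toy keys are assumptions made for illustration. The real framework also routes across multiple hops between intermediate points, queues packets, and handles failures, none of which is modelled here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical driver loop; not the CamCube framework itself.
public static class RouteTracer
{
    // f(currentKey) returns candidate next intermediate keys;
    // an empty list means the packet has reached its final destination.
    public static List<ulong> Trace(ulong sourceKey, Func<ulong, List<ulong>> f, int maxSteps = 16)
    {
        var visited = new List<ulong> { sourceKey };
        ulong current = sourceKey;
        for (int step = 0; step < maxSteps; step++)
        {
            List<ulong> next = f(current);
            if (next.Count == 0) break; // nothing further to visit
            current = next[0];          // a real framework could choose by link load
            visited.Add(current);
        }
        return visited;
    }

    // Toy version of the cache-service logic with a single cache per item.
    public static Func<ulong, List<ulong>> ToyCacheF(ulong sourceKey, ulong cacheKey, ulong itemKey)
    {
        return current =>
        {
            if (current == sourceKey) return new List<ulong> { cacheKey }; // source -> cache
            if (current != itemKey) return new List<ulong> { itemKey };    // cache miss -> primary
            return new List<ulong>();                                      // at the primary
        };
    }
}
```

Calling `RouteTracer.Trace(1, RouteTracer.ToyCacheF(1, 5, 9))` visits the keys 1, 5, and 9 in turn, which mirrors the source, cache, primary sequence on a cache miss.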

Framework overhead
• Benchmark performance
  • Single server in testbed
  • Communicate with all six 1-hop neighbors (Tx + Rx)
  • Sustained 11.8 Gbps throughput
    • Out of upper bound of 12 Gbps
• User-space routing overhead
[Figure: CPU utilization (%, 0 to 100) for Baseline versus Framework]
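
The 12 Gbps upper bound quoted in the framework-overhead benchmark follows from the testbed hardware, assuming each of the six 1 Gbps ports is driven at line rate in both directions at once. The sustained figure then corresponds to roughly 98% of that bound:

```latex
\[
6 \ \text{links} \times 1\,\text{Gbps} \times 2 \ (\text{Tx} + \text{Rx}) = 12\,\text{Gbps},
\qquad
\frac{11.8\,\text{Gbps}}{12\,\text{Gbps}} \approx 98.3\%
\]
```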

What have we done
• Services only specify a routing “skeleton”
  • Framework fills in the details
• Control messages and failures handled by framework
  • Reduce routing complexity for services
• Opt-in basis
  • Services define custom protocols only if they need to

Network requirements
• Per-service routing not limited to CamCube
• Network need only provide:
  • Path diversity
    • Providing routing options
  • Topology awareness
    • Expose server locality and connectivity
  • Programmable components
    • Allow per-service routing logic

Conclusions
• Data center networking from the developer’s perspective
• Custom routing protocols to optimize for application-level performance requirements
• Presented a framework for custom routing protocols
  • Applications specify a forwarding function (F) and queuing hints
  • Framework manages network state, control messages, and remapping on failure
• Multiple routing protocols running concurrently
  • Increase application-level performance
  • Decrease network load

Thank you! [email protected]

Cache service: insert throughput
[Figure: insert throughput (Gbps, 0 to 4) against concurrent insert requests (0 to 140), for F=3 with disk, F=27 with disk, F=3 without disk, and F=27 without disk; annotations: "Disk I/O bounded" and "Ingress bandwidth bounded (3 front-ends)"]

Cache service: lookup requests/second
[Figure: lookup rate (reqs/s, 0 to 140,000) against concurrent lookup requests (0 to 140), for F=3 and F=27; annotation: "Ingress bandwidth bounded"]

Cache service: CPU utilization on FEs
[Figure: CPU utilization (%, 0 to 100) against concurrent requests (0 to 140), for lookup (F=3), insert (F=3, no disk), insert (F=27, no disk), and lookup (F=27); annotations: "3 front-ends" and "27 front-ends"]

CamCube link latency
[Figure: round-trip time (microseconds, 0 to 1000) for 1,500-byte and 9,000-byte packets, comparing UDP (x-cable), CamCube (1 hop), UDP (switch), TCP (x-cable), and TCP (switch)]