A Scalable, Commodity Data Center Network Architecture
Mohammad Al-Fares, Alexander Loukissas, Amin Vahdat
Presented by Gregory Peaker and Tyler Maclean
Overview
• Structure and properties of a data center
• Desired properties in a DC architecture
• Fat-tree based solution
• Evaluation of the fat-tree approach
Common data center topology
Problems with the common DC topology
• Single point of failure
• Oversubscription of links higher up in the topology
– Typical oversubscription is between 2.5:1 and 8:1
– Trade-off between cost and provisioning
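To make the ratios above concrete, here is a minimal sketch (the 40-host / 16 Gbps numbers are hypothetical, chosen only to reproduce a 2.5:1 ratio):

```python
def oversubscription(num_hosts, host_gbps, uplink_gbps):
    """Ratio of worst-case aggregate host demand to provisioned uplink capacity."""
    return (num_hosts * host_gbps) / uplink_gbps

# Hypothetical edge switch: 40 hosts at 1 Gbps sharing 2 x 8 Gbps uplinks
print(f"{oversubscription(40, 1, 2 * 8)}:1")  # 2.5:1
```

An oversubscription of 1:1 (the goal stated later in this talk) means the uplinks can carry the full aggregate host bandwidth.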
Properties of solutions
• Compatible with Ethernet and TCP/IP
• Cost effective
– Low power consumption & heat emission
– Cheap infrastructure
– Commodity hardware
• Allows host communication at line speed
– Oversubscription of 1:1
Cost of maintaining switches
Review of Layer 2 & Layer 3
• Layer 2 – Data Link Layer
– Ethernet
– MAC addresses
– One spanning tree for the entire network
• Prevents looping
• Ignores alternate paths
• Layer 3 – Network Layer
– IP
– Shortest-path routing between source and destination
– Best-effort delivery
Fat-Tree Based Solution
• Connect end hosts together using a fat-tree topology
– Infrastructure consists of cheap devices
• Every port is the same speed
– All devices can transmit at line speed if packets are distributed along existing paths
– A k-port fat tree can support k³/4 hosts
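The k³/4 figure follows from the topology: each of k pods has k/2 edge switches, each serving k/2 hosts. A small sketch, assuming the standard k-ary fat-tree construction from the paper:

```python
def fat_tree_size(k):
    """Host and switch counts for a fat tree built from k-port switches (k even)."""
    assert k % 2 == 0
    core = (k // 2) ** 2                    # core switches
    pod_switches = k * (k // 2 + k // 2)    # k pods, k/2 edge + k/2 aggregation each
    hosts = (k ** 3) // 4                   # k pods * (k/2 edge) * (k/2 hosts each)
    return hosts, core + pod_switches

print(fat_tree_size(48))  # (27648, 2880): 48-port switches support 27,648 hosts
```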
Fat-Tree Topology
Problems with a vanilla Fat-tree
• Layer 3 will only use one of the existing equal-cost paths
• Packet re-ordering occurs if Layer 3 blindly takes advantage of path diversity
– Creates overhead at the host, as TCP must re-order the packets
Fat-Tree Modified
• Enforce a special addressing scheme in the DC
– Allows hosts attached to the same switch to route only through that switch
– Allows intra-pod traffic to stay within the pod
– Addresses of the form unused.PodNumber.SwitchNumber.EndHost
• Use two-level lookups to distribute traffic and maintain packet ordering
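The addressing scheme can be sketched as follows; the paper reserves the private 10.0.0.0/8 block for the "unused" leading octet and gives hosts IDs 2 .. k/2 + 1 on their edge switch, and the function below encodes that convention:

```python
def host_address(pod, switch, host_id, k):
    """Fat-tree host address of the form unused.PodNumber.SwitchNumber.EndHost.

    Follows the paper's convention: 10/8 prefix, pods 0..k-1,
    edge switches 0..k/2-1 within a pod, host IDs 2..k/2+1."""
    assert 0 <= pod < k and 0 <= switch < k // 2
    assert 2 <= host_id <= k // 2 + 1
    return f"10.{pod}.{switch}.{host_id}"

print(host_address(pod=4, switch=1, host_id=2, k=8))  # 10.4.1.2
```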
Two-Level Lookups
• First level is a prefix lookup
– Used to route down the topology to the end host
• Second level is a suffix lookup
– Used to route up toward the core
– Diffuses and spreads out traffic
– Maintains packet ordering by using the same ports for the same end host
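A minimal Python sketch of the two-level table; the port numbers and table entries here are illustrative, and the paper implements the table in hardware rather than software:

```python
import ipaddress

# First-level entries: prefix -> output port, used to route *down* toward hosts.
PREFIXES = [("10.2.0.0/24", 0), ("10.2.1.0/24", 1)]
# Second-level entries: host-ID suffix (last octet) -> output port, used to
# route *up* toward the core when no prefix matches.
SUFFIXES = {2: 2, 3: 3}

def lookup(dst: str) -> int:
    addr = ipaddress.ip_address(dst)
    for net, port in PREFIXES:            # level 1: prefix match
        if addr in ipaddress.ip_network(net):
            return port
    host_id = int(dst.rsplit(".", 1)[1])  # level 2: suffix on the host ID octet
    return SUFFIXES[host_id]

print(lookup("10.2.1.5"))  # 1: matches prefix 10.2.1.0/24, routed down
print(lookup("10.7.0.3"))  # 3: no prefix match, suffix 3 routes up toward core
```

Because the upward port depends only on the host ID, all packets for a given end host take the same path up, which preserves ordering while spreading traffic to different hosts across ports.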
Diffusion Optimizations
• Flow classification
– Eliminates local congestion
– Assigns traffic to ports on a per-flow basis instead of a per-host basis
• Flow scheduling
– Eliminates global congestion
– Prevents long-lived flows from sharing the same links
– Assigns long-lived flows to different links
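Per-flow assignment can be sketched with a hash of the flow tuple: every packet of a flow maps to the same uplink (preserving TCP ordering), while distinct flows spread across uplinks. This is a static simplification; the paper's classifier also periodically rebalances flows across ports.

```python
import hashlib

def classify_flow(src_ip, dst_ip, src_port, dst_port, num_uplinks):
    """Hash a flow's 4-tuple to one of num_uplinks output ports."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_uplinks

port = classify_flow("10.0.1.2", "10.2.0.3", 5000, 80, num_uplinks=4)
# Same flow always maps to the same port, so packets stay in order:
assert port == classify_flow("10.0.1.2", "10.2.0.3", 5000, 80, num_uplinks=4)
```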
Results: Heat & Power Consumption
Implementation
• NetFPGA
– 4 Gigabit Ethernet ports, 36 Mb SRAM
– 64 MB DDR2, 3 Gb/s SATA port
• Implemented elements in Click router software
– Two-level table
• Initialized with preconfigured information
– Flow classifier
• Distributes output evenly across local ports
– Flow reporter + flow scheduler
• Communicates with a central scheduler
Evaluation
• Purpose: measure bisection bandwidth
• Fat tree: 10 machines connected to a 48-port switch
• Hierarchical: 8 machines connected to a 48-port switch
Results
Related Work
• Myrinet – popular for cluster-based supercomputers
– Benefit: low latency
– Cost: proprietary; host is responsible for load balancing
• InfiniBand – used in high-performance computing environments
– Benefit: proven to scale, with high bandwidth
– Cost: imposes its own layer 1–4 protocols
– Uses a fat-tree topology
• Many massively parallel computers, such as Thinking Machines and SGI systems, use fat trees
Conclusion
• The Good: cost per gigabit and energy per gigabit are going down
• The Bad: data centers are growing faster than commodity Ethernet devices
• Our fat-tree solution
– Is better: a 27k-node cluster is technically infeasible with a hierarchical 10 GigE design, which would cost $690M; the fat tree builds it from commodity parts
– Is faster: equal or better bandwidth in tests
– Increases fault tolerance
– Is cheaper: 20k hosts cost $37M for the hierarchical design vs. $8.67M for the fat tree (1 GigE)
– KO's the competing data centers