31
A Scalable, Commodity Data Center Network Architecture Mohammad AI-Fares , Alexander Loukissas , Amin Vahdat

FATTREE: A scalable Commodity Data Center Network Architecture

Embed Size (px)

DESCRIPTION

FATTREE is a special case of Clos topology. It is the foremost architecture from which the current DC networking architectures have evolved.

Citation preview

Page 1: FATTREE: A scalable Commodity Data Center Network Architecture

A Scalable, Commodity Data Center Network

ArchitectureMohammad AI-Fares , Alexander Loukissas , Amin Vahdat

Page 2: FATTREE: A scalable Commodity Data Center Network Architecture

Overview

• Background of Current DCN Architectures• Desired properties in a DC Architecture• Fat tree based solution• Evaluation• Conclusion

Page 3: FATTREE: A scalable Commodity Data Center Network Architecture

Common DC TopologyInternet

Servers

Layer-2 switchAccess

Data Center

Layer-2/3 switchAggregation

Layer-3 routerCore

Page 4: FATTREE: A scalable Commodity Data Center Network Architecture

Background Cont. Oversubscription:

Ratio of the worst-case achievable aggregate bandwidth among the end hosts to the total bisection bandwidth of a particular communication topology(1:1 indicates that all hosts may communicate with any arbitrary host at full BW. Ex) 1Gb/s for commodity Ethernet)

Lower the total cost of the designTypical designs: factor of 2.5:1 (400 Mbps)to 8:1(125 Mbps)

Cost:Edge: $7,000 for each 48-port GigE switchAggregation and core: $700,000 for 128-port 10GigE switchesCabling costs are not considered!

Page 5: FATTREE: A scalable Commodity Data Center Network Architecture

Current DC Network Architectures

2High level choices for building comm fabric for large scale clusters: Leverages specialized hardware and communication protocols,

such as InfiniBand, Myrinet. – These solutions can scale to clusters of thousands of nodes with high

bandwidth

– Expensive infrastructure, incompatible with TCP/IP applications Leverages commodity Ethernet switches and routers to

interconnect cluster machines – Backwards compatible with existing infrastructures, low-cost– Aggregate cluster bandwidth scales poorly with cluster size, and achieving

the highest levels of bandwidth incurs non-linear cost increase with cluster size

Page 6: FATTREE: A scalable Commodity Data Center Network Architecture

Problem With common DC topology

• Single point of failure• Over subscript of links higher up in the

topology

Page 7: FATTREE: A scalable Commodity Data Center Network Architecture

Properties of the solution(FAT)

• Backwards compatible with existing infrastructure– No changes in applications.– Support of layer 2 (Ethernet)

• Cost effective– Low power consumption & heat emission– Cheap infrastructure

• Allows host communication at line speed.

Page 8: FATTREE: A scalable Commodity Data Center Network Architecture

Clos Networks/Fat-Trees

• Adopt a special instance of a Clos topology:“K-ary FAT tree”

• Similar trends in telephone switches led to designing a topology with high bandwidth by interconnecting smaller commodity switches.

Page 9: FATTREE: A scalable Commodity Data Center Network Architecture

Fat-Tree Based DC Architecture • Inter-connect racks (of servers) using a fat-tree topology

K-ary fat tree: three-layer topology (edge, aggregation and core)– each pod consists of (k/2)2 servers & 2 layers of k/2 k-port switches– each edge switch connects to k/2 servers & k/2 aggr. switches – each aggr. switch connects to k/2 edge & k/2 core switches– (k/2)2 core switches: each connects to k pods

Fat-tree with K=4

Page 10: FATTREE: A scalable Commodity Data Center Network Architecture

Fat-Tree Based Topology • Why Fat-Tree?

– Fat tree has identical bandwidth at any bisections– Each layer has the same aggregated bandwidth

• Can be built using cheap devices with uniform capacity– Each port supports same speed as end host– All devices can transmit at line speed if packets are distributed uniform along

available paths • Great scalability: k-port switch supports k3/4 servers/hosts:

(k/2 hosts/switch * k/2 switches/pod * k pods)

Fat tree network with K = 6 supporting 54 hosts

Page 11: FATTREE: A scalable Commodity Data Center Network Architecture

FATtree DC meets the following goals:

• 1. Scalable interconnection bandwidth:• Achieve the full bisetion bandwidth of clusters consisting

of tens of thousands of nodes.

• 2. Backward compatibility. No changes to end hosts.

• 3. Cost saving.

Page 12: FATTREE: A scalable Commodity Data Center Network Architecture

• Enforce a special (IP) addressing scheme in DC– unused.PodNumber.switchnumber.Endhost– Allows host attached to same switch to route

only through that switch– Allows inter-pod traffic to stay within pod

• Exising routing protocals such as OSPF, OSPF-ECMP are unavailable.– Concentrate on traffic.– Avoid congestion– Fast Switch must be able to recognize!

FAT-Tree (Addressing)

Page 13: FATTREE: A scalable Commodity Data Center Network Architecture

FAT-Tree (2-level Look-ups)• Use two level look-ups to distribute traffic

and maintain packet ordering:– First level is prefix lookup

• used to route down the topology to hosts.

– Second level is a suffix lookup• used to route up towards core• maintain packet ordering by using same ports for

same server.• Diffuses and spreads out traffic.

Page 14: FATTREE: A scalable Commodity Data Center Network Architecture

More on Fat-Tree DC Architecture

Diffusion Optimizations (optional dynamic routing techniques)

1. Flow classification, Define a flow as a sequence of packets; pod switches forward subsequent packets of the same flow to same outgoing port. And periodically reassign a minimal number of output ports

– Eliminates local congestion– Assign traffic to ports on a per-flow basis

instead of a per-host basis, Ensure fair distribution on flows

Page 15: FATTREE: A scalable Commodity Data Center Network Architecture

More on Fat-Tree DC Architecture

2. Flow scheduling, Pay attention to routing large flows, edge switches detect any outgoing flow whose size grows above a predefined threshold, and then send notification to a central scheduler. The central scheduler tries to assign non-conflicting paths for these large flows.

– Eliminates global congestion– Prevent long lived flows from sharing the same

links– Assign long lived flows to different links

Page 16: FATTREE: A scalable Commodity Data Center Network Architecture

Fault-ToleranceIn this scheme, each switch in the network maintains a BFD (Bidirectional Forwarding Detection) session with each of its neighbors to determine when a link or neighboring switch fails. 2 kinds of failures:

Failure between upper layer and core switches Outgoing inter-pod traffic: local routing table marks

the affected link as unavailable and chooses another core switch

Incoming inter-pod traffic, core switch broadcasts a tag to upper switches directly connected signifying its inability to carry traffic to that entire pod, then upper switches avoid that core switch when assigning flows destined to that pod

Page 17: FATTREE: A scalable Commodity Data Center Network Architecture

Fault-Tolerance• Failure b/w lower and upper layer switches:

– Outgoing inter- and intra pod traffic from lower-layer:– the local flow classifier sets the cost to infinity and

does not assign it any new flows, chooses another upper layer switch

– Intra-pod traffic using upper layer switch as intermediary:– Switch broadcasts a tag notifying all lower level

switches, these would check when assigning new flows and avoid it

– Inter-pod traffic coming into upper layer switch:– Tag to all its core switches signifying its ability to carry

traffic, core switches mirror this tag to all upper layer switches, then upper switches avoid affected core switch when assigning new flaws

Page 18: FATTREE: A scalable Commodity Data Center Network Architecture

PackingDrawback: No of cables needed.Proposed packaging solution:• Incremental modeling• Simplifies cluster

mgmt.• Reduces cable size• Reduces cost.

Page 19: FATTREE: A scalable Commodity Data Center Network Architecture

Evaluation

• Benchmark suite of communication mappings to evaluate the performance of the 4-port fat-tree using the TwoLevelTable switches, FlowClassifier ans the FlowScheduler and compare to hirarchical tree with 3.6:1 oversubscription ratio

Page 20: FATTREE: A scalable Commodity Data Center Network Architecture

Results: Network Utilization

Page 21: FATTREE: A scalable Commodity Data Center Network Architecture

Results: Heat & Power Consumption

Page 22: FATTREE: A scalable Commodity Data Center Network Architecture

Conclusion

Bandwidth is the scalability bottleneck in large scale clusters

Existing solutions are expensive and limit cluster size Fat-tree topology with scalable routing and backward

compatibility with TCP/IP and Ethernet Large number of commodity switches have the

potential of displacing high end switches in DC the same way clusters of commodity PCs have displaced supercomputers for high end computing environments

Page 23: FATTREE: A scalable Commodity Data Center Network Architecture

04/08/2023 COMP6611A In-class Presentation 23

Critical Analysis of Fat-Tree Topology

Page 24: FATTREE: A scalable Commodity Data Center Network Architecture

04/08/2023 COMP6611A In-class Presentation 24

Draw Backs

• Scalability• Unable to support different traffic types

efficiently• Simple load balancing• Specialized hardware• No inherent support for VLan traffic• Ignored connectivity to the Internet• Waste of address space

Page 25: FATTREE: A scalable Commodity Data Center Network Architecture

04/08/2023 COMP6611A In-class Presentation 25

Scalability

• The size of the data center is fixed. – For commonly available 24, 36, 48 and 64 ports

commodity switches, such a Fat-tree structure is limited to sizes 3456, 8192, 27648 and 65536, respectively.

• The cost of incremental deployment is significant.

Page 26: FATTREE: A scalable Commodity Data Center Network Architecture

04/08/2023 COMP6611A In-class Presentation 26

Unable to support different traffic types efficiently

• The proposed structure is switch-oriented, which might not be sufficient to support 1-to-x traffic.

• A flow can be setup between 2 any pairs of servers with the help of the switching fabric, but it is difficult to setup 1-to-many and 1-to-all flows efficiently using the proposed algorithms.

Page 27: FATTREE: A scalable Commodity Data Center Network Architecture

04/08/2023 COMP6611A In-class Presentation 27

Specialized hardware

• This architecture still relies on a customized routing primitive that does not exist in commodity switches.

• To implement the 2-table lookup, the lookup routine in a router have to be modified. – The authors achieves this by modifying a NetFPGA

router. – This modification is yet to be incorporated into

current commodity switches.

Page 28: FATTREE: A scalable Commodity Data Center Network Architecture

04/08/2023 COMP6611A In-class Presentation 28

No inherent support for VLan traffic

• Agility– The capacity to assign any server to any services

• Performance Isolation– Services should not affected each othe

Page 29: FATTREE: A scalable Commodity Data Center Network Architecture

04/08/2023 COMP6611A In-class Presentation 29

Waste of address space

• Semantic information in the IP address of servers and switches.– e.g. 10.pod.switch.0/24, port – A large portion of the IP address is wasted.

• NAT is required at the border.

Page 30: FATTREE: A scalable Commodity Data Center Network Architecture

04/08/2023 COMP6611A In-class Presentation 30

Possible Future Research

• Flexible incremental expandability in data center networks.

• Support Valiant Load Balancing or other load balancing techniques in the network.

• Reduce cabling, or increase tolerance to mis-wiring

Page 31: FATTREE: A scalable Commodity Data Center Network Architecture

References:

• A scalable Commodity DCN Archi, dept. of CSE- Univ of California

• pages.cs.wisc.edu/~akella/CS740/F08/DataCenters.ppt

• http://networks.cs.northwestern.edu/EECS495-s11/prez/DataCenters-defence.ppt

• University of Hong Kong:http://www.cse.ust.hk/~kaichen/courses/spring2013/comp6611/