Stardust
Divide and Conquer in the Data Center Network

Golan Schzukin & Gabi Bracha, Broadcom
Noa Zilberman, University of Cambridge

February 2019

© N. Zilberman, G. Bracha, G. Schzukin 2019



Network switches

Switch silicon

Switch box

Switch chassis

Scale: 12.8Tbps, 32×400GE


Network switch systems


Scale: Petabit / second


Data center networks
Connecting 10Ks to 100Ks of servers


Do data center networks scale?

[Figure labels: Network Fabric, Link Bundle]


Do data center networks scale?

• Example: Building a DC with 100K servers (2500 ToR switches)
• Option 1 – Link bundle of 1 (L=1):
  – 6.4Tbps Fabric Switch, 256×25G
  – Requires 2 tiers
  – #fabric-switches = 1172
• Option 2 – Link bundle of 4 (L=4):
  – 6.4Tbps Fabric Switch, 64×100G
  – Requires 3 tiers
  – #fabric-switches = 1954 (×1.66 more)

In a network of $n$ tiers, scale is $O(L^{-n})$
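The two switch counts in this example can be reproduced with a short calculation. The sketch below is a back-of-the-envelope model, not the authors' sizing tool. It assumes a folded-Clos fabric, 40 uplink lanes of 25G per ToR (100K servers over 2500 ToRs at 25G each, no oversubscription), and 256 lanes of 25G per 6.4Tbps fabric switch. The function name `fabric_size` and its parameters are hypothetical.

```python
import math

def fabric_size(num_tors, uplink_lanes_per_tor, lanes_per_switch, bundle):
    """Return (fabric tiers, fabric switches) for a folded-Clos fabric.

    Assumptions (not stated on the slide): every fabric switch below the
    top tier uses half its ports down and half up, and the top tier uses
    all of its ports downwards.
    """
    radix = lanes_per_switch // bundle                   # ports per fabric switch
    links = num_tors * uplink_lanes_per_tor // bundle    # ToR uplinks, counted as bundles

    # Smallest number of tiers that can reach every ToR: an n-tier
    # fabric of radix k offers 2 * (k/2)^n down ports per plane.
    tiers = 1
    while 2 * (radix // 2) ** tiers < num_tors:
        tiers += 1

    # Lower tiers terminate `links` on half their ports, the top tier on all ports.
    switches = (tiers - 1) * links / (radix / 2) + links / radix
    return tiers, math.ceil(switches)

for bundle in (1, 4):
    tiers, switches = fabric_size(2500, 40, 256, bundle)
    print(f"L={bundle}: {tiers} tiers, {switches} fabric switches")
```

Under these assumptions the model gives 2 tiers and 1172 switches for L=1, and 3 tiers and 1954 switches for L=4, matching the figures above. Bundling divides the effective radix by L, which is what forces the extra tier.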


Observation: A link bundle of one enables an optimum build of the network (i.e., fewer tiers, fewer switches, …)

Do data center networks scale?


Designing new network devices

• A decade ago: “Can we implement this feature?”

• Today: “Is this feature worth implementing, given the design constraints?”


The resource wall

• Network silicon die > 7 billion transistors (Tomahawk, 2014)

• Limited by:
  • Power density
  • Die size
  • Manufacturing feasibility


Data center network

[Figure: a packet (PKT) in the data center network]


Switch system

[Figure: a packet (PKT) crossing a switch system built from a line card, three fabric cards, and a line card]


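Why waste resources?

In an n-tier network with full packet processing at every tier:

O(n×(Switching + 2×I/O + 2×NIF) + n×(Ingress Processing + Egress Processing + Queueing))

With packet processing performed only once:

O(n×(Switching + 2×I/O + 2×NIF) + 1×(Ingress Processing + Egress Processing + Queueing))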
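The difference between the two expressions is (n − 1) × (Ingress Processing + Egress Processing + Queueing). A minimal sketch of that arithmetic follows. The unit costs are hypothetical placeholders, chosen only to make the comparison visible, and `fabric_cost` is an illustrative name.

```python
def fabric_cost(n_tiers, switching, io, nif, ingress, egress, queueing,
                processing_tiers):
    """Relative resource cost following the two expressions on the slide."""
    forwarding = n_tiers * (switching + 2 * io + 2 * nif)
    processing = processing_tiers * (ingress + egress + queueing)
    return forwarding + processing

# Hypothetical unit costs (not measured values).
units = dict(switching=1.0, io=1.0, nif=1.0,
             ingress=2.0, egress=2.0, queueing=2.0)

n = 3
per_tier = fabric_cost(n, processing_tiers=n, **units)    # processing at every tier
edge_only = fabric_cost(n, processing_tiers=1, **units)   # processing done once
saving = per_tier - edge_only   # (n - 1) * (ingress + egress + queueing)
print(per_tier, edge_only, saving)
```

The saving grows linearly with the number of tiers, so it matters most in the deepest networks.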


Observation: Significant resources can be saved by simplifying the data center network

Why waste resources?


12.8Tbps Switches!

Let's convert to packet rate requirements:

• 5800 Mpps @ 256B (100GE → 38.7 Mpps)

• 19200 Mpps @ 64B (100GE → 150 Mpps)

But the clock rate is only ~1GHz…

The single-pipeline switch
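The aggregate packet rates can be approximated from the line rate and packet size. The sketch below assumes 20 bytes of per-packet Ethernet overhead on the wire (8B preamble plus 12B inter-packet gap). That overhead figure is an assumption of this example, not something stated on the slide, and the function names are illustrative.

```python
ETH_OVERHEAD_BYTES = 20  # assumption: 8B preamble + 12B inter-packet gap

def packets_per_second(line_rate_bps, packet_bytes):
    """Maximum packet rate at full line rate for a given packet size."""
    return line_rate_bps / ((packet_bytes + ETH_OVERHEAD_BYTES) * 8)

def required_parallelism(line_rate_bps, packet_bytes, clock_hz=1e9):
    """Packets that must be handled per clock cycle to keep up."""
    return packets_per_second(line_rate_bps, packet_bytes) / clock_hz

for size in (64, 256):
    mpps = packets_per_second(12.8e12, size) / 1e6
    par = required_parallelism(12.8e12, size)
    print(f"{size}B: {mpps:.0f} Mpps, {par:.1f} packets per clock at 1GHz")
```

With this overhead the model gives roughly 5800 Mpps at 256B and roughly 19000 Mpps at 64B. The 19200 Mpps figure corresponds to rounding a single 100GE port up to 150 Mpps and multiplying by 128 ports. Either way, a ~1GHz pipeline would have to handle many packets per clock cycle.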


Observation: To support full line rate for all packet sizes, network devices need to process multiple packets each and every clock cycle.

The age of multi core has reached switching…

The single-pipeline switch


The switch pipeline

The common depiction:

[Figure: a packet (PKT) passing through the switch pipeline]


The switch pipeline

Actual implementation: Throughput = clock frequency × bus width

[Figure: a 512B packet on a data path of width 256B (example value) fills two clock cycles: 256B in cycle 1 and 256B in cycle 2]


The switch pipeline

Actual implementation: Throughput ≠ clock frequency × bus width

[Figure: a 257B packet on a data path of width 256B (example value) still takes two clock cycles: 256B in cycle 1 and only 1B in cycle 2]
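The 512B and 257B cases can be compared numerically. This is a minimal sketch assuming each clock cycle carries data from one packet only. The 256B width is the slide's example value, and the function names are illustrative.

```python
import math

def cycles_on_bus(packet_bytes, bus_bytes=256):
    """Clock cycles a packet occupies when a cycle carries one packet's data."""
    return math.ceil(packet_bytes / bus_bytes)

def bus_utilization(packet_bytes, bus_bytes=256):
    """Fraction of the data-path width that carries useful bytes."""
    return packet_bytes / (cycles_on_bus(packet_bytes, bus_bytes) * bus_bytes)

for size in (512, 257):
    print(size, cycles_on_bus(size), round(bus_utilization(size), 3))
```

Both packets take two cycles, but the 257B packet uses only about half of the available width, so effective throughput falls well below clock frequency × bus width.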


The single-pipeline switch

12.8Tbps switches!

Let's convert to packet rate requirements:

• 5800 Mpps @ 256B (100GE → 38.7 Mpps)

• 19200 Mpps @ 64B (100GE → 150 Mpps)

But the clock rate is only ~1GHz…

[Plot: Required Parallelism (y-axis, 0 to 20) versus Packet Size [B] (x-axis, 64 to 1600)]

But if we pack data optimally…

[Plot: Required Parallelism (y-axis, 0 to 20) versus Packet Size [B] (x-axis, 64 to 1600), with optimal packing]


Observation: To support full line rate for all packet sizes, network devices need to process multiple packets each and every clock cycle.

Observation: For best switch utilization, use fixed-size data units (cells)

The age of multi core has reached networking…

The single-pipeline switch
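The benefit of fixed-size cells with packing can be illustrated by counting cells for a stream of awkwardly sized packets. This is a simplified sketch: the 256B cell size is a hypothetical value, cell headers are ignored, and all packets are assumed to share a destination so they can be packed back to back.

```python
import math

def cells_without_packing(packet_sizes, cell_bytes=256):
    """Each packet starts a new cell, so its last cell may be mostly empty."""
    return sum(math.ceil(size / cell_bytes) for size in packet_sizes)

def cells_with_packing(packet_sizes, cell_bytes=256):
    """Packets are laid back to back, so only the final cell can be partly empty."""
    return math.ceil(sum(packet_sizes) / cell_bytes)

packets = [257] * 100
print(cells_without_packing(packets), cells_with_packing(packets))
```

For one hundred 257B packets, the unpacked case needs 200 cells while the packed case needs 101, which is why packing brings utilization close to the ideal regardless of packet size.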


• A link bundle of one enables an optimum build of the network (i.e., fewer tiers, fewer switches, …)

• Significant resources can be saved by simplifying the network fabric

• To support full line rate for all packet sizes, network devices need to process multiple packets each and every clock cycle.

• For best switch utilization, use fixed-size data units (cells)

Observations


Introducing Stardust
From switch-system to data-center scale


Introducing Stardust

• Complex edge, simple network fabric

• Fabric Element - Fabric device

• A simple cell switch

• Fabric Adapter – Edge device

• A packet switch

• Quite similar to a ToR

• Chops packets to cells

[Figure labels: 7th generation; 5th generation; Widely used in switch-systems]


A Stardust based network

No Link Bundles


Dynamic cell routing

[Figure: a non-blocking fabric with nodes numbered 1 to 9 on each side. Example flows: Input 1 → Output 1, Input 7 → Output 1, Input 8 → Output 2, Input 9 → Output 7. Links carry traffic shares labeled 1/3, 1/3, 1/3]
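The even 1/3 shares can be illustrated by spraying cells across all links that reach the destination. The sketch below uses plain round-robin selection, which is an assumption of this example. The slide does not say which selection policy the fabric actually uses, and the link names are hypothetical.

```python
from collections import Counter
from itertools import cycle

def spray_cells(num_cells, links):
    """Assign consecutive cells to links in round-robin order."""
    chooser = cycle(links)
    return [next(chooser) for _ in range(num_cells)]

assignment = spray_cells(9000, links=["link-1", "link-2", "link-3"])
shares = {link: count / len(assignment)
          for link, count in Counter(assignment).items()}
print(shares)
```

Because balancing happens per cell rather than per flow, each of the three links ends up with one third of the traffic regardless of how the flows are distributed.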


Reachability table

• Need to know only the destination Fabric Adapter

• 1M virtual machines → 100K end hosts → 2500 Fabric Adapters

• Entries indicate “reachable through these links”

• “You can get to Fabric Adapter 1 using links 1,5,8,14,36”

• Bitmap of size “switch radix”

• Automatically constructed and updated

• Using reachability messages
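A reachability entry of this kind can be sketched as an integer bitmap with one bit per link. The radix of 64 and the function names are hypothetical, and the link-down handler is only an illustration of how an entry might be updated, not the actual reachability-message protocol.

```python
RADIX = 64  # hypothetical switch radix: one bit per link

def links_to_bitmap(links):
    """Encode a set of 1-based link numbers as an integer bitmap."""
    bitmap = 0
    for link in links:
        bitmap |= 1 << (link - 1)
    return bitmap

def bitmap_to_links(bitmap):
    """Decode the bitmap back into a sorted list of link numbers."""
    return [i + 1 for i in range(RADIX) if (bitmap >> i) & 1]

# "You can get to Fabric Adapter 1 using links 1,5,8,14,36"
reachability = {1: links_to_bitmap([1, 5, 8, 14, 36])}

def on_link_down(table, link):
    """Clear a failed link from every entry (illustrative update step)."""
    mask = ~(1 << (link - 1))
    for adapter in table:
        table[adapter] &= mask

on_link_down(reachability, 8)
print(bitmap_to_links(reachability[1]))  # links still usable for Fabric Adapter 1

table_bytes = 2500 * RADIX // 8  # 2500 Fabric Adapters, RADIX bits each
```

Keying the table by Fabric Adapter instead of by end host or virtual machine keeps it small: at this assumed radix, 2500 entries occupy 20,000 bytes.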


Buffering and scheduling

• Packet buffering at the edge

• Using virtual output queues (VOQ) at the ingress Fabric Adapter

• A distributed scheduled fabric

• A Fabric Adapter generates credits (e.g. 4KB) to all non-empty associated VOQs
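The credit mechanism can be sketched as a toy scheduling round. This example reads the slide as the destination side granting one 4KB credit to every non-empty VOQ that targets it. That reading, the class and function names, and the rule that a queue may overshoot its credit by part of one packet are all assumptions of this sketch.

```python
from collections import deque

CREDIT_BYTES = 4096  # credit size from the slide's example (4KB)

class VOQ:
    """Virtual output queue at an ingress Fabric Adapter."""
    def __init__(self):
        self.packets = deque()  # queued packet sizes in bytes
        self.balance = 0        # bytes this queue may still send

    def grant(self, credit=CREDIT_BYTES):
        self.balance += credit

    def drain(self):
        """Send packets while the balance is positive; return bytes sent."""
        sent = 0
        while self.packets and self.balance > 0:
            size = self.packets.popleft()
            self.balance -= size   # may go negative; repaid by the next credit
            sent += size
        if not self.packets:
            self.balance = 0       # an empty queue does not keep unused credit
        return sent

def schedule_round(voqs):
    """Grant one credit to every non-empty VOQ and let it send."""
    sent = {}
    for name, voq in voqs.items():
        if voq.packets:
            voq.grant()
            sent[name] = voq.drain()
    return sent

voqs = {"ingress-A": VOQ(), "ingress-B": VOQ(), "ingress-C": VOQ()}
voqs["ingress-A"].packets.extend([1500] * 10)
voqs["ingress-B"].packets.extend([1500] * 2)
print(schedule_round(voqs))
```

Traffic enters the fabric only after a credit arrives, so packets wait in the ingress VOQs rather than inside the fabric, and empty queues consume no credits.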

[Figure: simulation of a 432-node Fat-Tree, 90KB flows]


Packet packing


Packet packing

[Figure: image not recoverable; surviving labels: "+", "NetFPGA SUME"]


Properties

Each property is listed with the mechanisms given for it:

• Protocol and traffic pattern agnosticism: cell switching & packing, dynamic routing, fabric scheduling

• Improved resilience and self healing: reachability messages, link bundling, dynamic routing

• Fewer network tiers, better scalability: link bundling, reachability messages, dynamic routing

• Optimal load balancing: dynamic routing, cell switching & packing, fabric scheduling

• Lossless transmission: fabric scheduling, dynamic routing, cell switching, reachability messages

• Incast absorption: fabric scheduling, dynamic routing, cell switching, reachability messages

• Pull fabric and port fairness: fabric scheduling, dynamic routing, cell switching, link bundling


Power and cost – entire network

• Fewer network tiers → fewer devices

• Less power & area (cost) per device
  − Fabric Element saves 35% of power
  − Fabric Element saves 33.3% of silicon area

• Save 87% of header processing area

• Save 70% of network interface area


What about the future?

• Scalability of ToR / Fabric Adapter is the bottleneck

• Let us replace the ToR with a Fabric Element

• Let us turn the NIC into a Fabric Adapter

• Lighter MAC

• Smaller tables

• Limited VOQs

• Fabric adapters already support DMA

[Figure: block diagram with labels Ports, Light MAC, VoQ, Reachability Table, DMA Engine, SoC, PCIe]


Stardust - summary

From switch-system to data center scale:

• Simple network fabric

• Push complexity to the edge

• Combines:
  • Cell switching and packet packing
  • Load balancing
  • Scheduled fabric
  • Reduced network tiers

• Better performance

• Lower power, lower cost


Acknowledgements
