David G. Andersen CMU Guohui Wang, T. S. Eugene Ng Rice Michael Kaminsky, Dina Papagiannaki, Michael A. Kozuch, Michael Ryan Intel Labs Pittsburgh 1 c-Through: Part-time Optics in Data Centers

Page 1

c-Through: Part-time Optics in Data Centers

David G. Andersen (CMU)
Guohui Wang, T. S. Eugene Ng (Rice)
Michael Kaminsky, Dina Papagiannaki, Michael A. Kozuch, Michael Ryan (Intel Labs Pittsburgh)

Page 2

Current solutions for increasing data center network bandwidth

FatTree, BCube

1. Hard to construct
2. Hard to expand

Page 3

An alternative: hybrid packet/circuit switched data center network

Goal of this work:
– Feasibility: software design that enables efficient use of optical circuits
– Applicability: application performance over a hybrid network

Page 4

Optical circuit switching vs. Electrical packet switching

| | Electrical packet switching | Optical circuit switching (e.g. MEMS optical switch) |
| Switching technology | Store and forward | Circuit switching |
| Switching capacity | 16x40 Gbps at high end, e.g. Cisco CRS-1 | 320x100 Gbps on market, e.g. Calient FiberConnect |
| Switching time | Packet granularity | Less than 10 ms |

Page 5

Optical circuit switching is promising despite slow switching time

Full bisection bandwidth at packet granularity may not be necessary

[WREN09]: “…we find that traffic at the five edge switches exhibit an ON/OFF pattern… ”

[IMC09][HotNets09]: “Only a few ToRs are hot and most their traffic goes to a few other ToRs. …”

Page 6

Hybrid packet/circuit switched network architecture

Optical circuit-switched network for high capacity transfer

Electrical packet-switched network for low latency delivery

Optical paths are provisioned rack-to-rack
– A simple and cost-effective choice
– Aggregate traffic on a per-rack basis to better utilize optical circuits

Page 7

Design requirements

Control plane:
– Traffic demand estimation
– Optical circuit configuration

Data plane:
– Dynamic traffic de-multiplexing
– Optimizing circuit utilization (optional)

[Figure label: Traffic demands]

Page 8

c-Through (a specific design)

No modification to applications and switches

Leverage end-hosts for traffic management

Centralized control for circuit configuration

Page 9

c-Through - traffic demand estimation and traffic batching

Accomplish two requirements:
– Traffic demand estimation
– Pre-batch data to improve optical circuit utilization

1. Transparent to applications.
2. Packets are buffered per-flow to avoid HOL blocking.

[Figure labels: Applications, Socket buffers, Per-rack traffic demand vector]
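A minimal sketch of how a host might build the per-rack traffic demand vector named on this slide. It assumes demand is read from the bytes queued in socket buffers; the function name, the `host_to_rack` lookup table and the sample host names are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def per_rack_demand(socket_backlog, host_to_rack, local_rack):
    """Sum queued bytes per destination rack.

    socket_backlog: dict of destination host -> bytes waiting in socket buffers
    host_to_rack:   dict of host -> rack id (hypothetical lookup table)
    local_rack:     rack id of this host; intra-rack traffic is skipped
    """
    demand = defaultdict(int)
    for dst_host, queued_bytes in socket_backlog.items():
        dst_rack = host_to_rack[dst_host]
        if dst_rack != local_rack:
            demand[dst_rack] += queued_bytes
    return dict(demand)


# Illustrative numbers only: bytes queued toward four destination hosts.
backlog = {"h5": 40_000_000, "h6": 10_000_000, "h9": 2_000_000, "h2": 500_000}
racks = {"h2": "A", "h5": "B", "h6": "B", "h9": "C"}
vector = per_rack_demand(backlog, racks, local_rack="A")
print(vector)
```

Summing per rack rather than per host matches the rack-to-rack provisioning of optical paths: the controller only needs to know how much data each rack pair is waiting to exchange.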

Page 10

c-Through - optical circuit configuration

Use Edmonds’ algorithm to compute optimal configuration

Many ways to reduce the control traffic overhead

[Figure labels: Traffic demand, Controller, configuration]
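The configuration step can be read as a maximum-weight matching problem: pick disjoint rack pairs so that the total demand served by circuits is largest. The sketch below assumes each rack can hold one circuit at a time, and it uses exhaustive search as a stand-in for Edmonds' algorithm, so it only suits a handful of racks. The function name and the demand values are illustrative.

```python
def best_matching(racks, demand):
    """Return (total_demand, pairs) for the heaviest set of disjoint rack pairs.

    demand maps frozenset({a, b}) -> bytes queued between racks a and b.
    Exhaustive recursion: exponential in the number of racks, unlike
    Edmonds' algorithm, which runs in polynomial time.
    """
    if len(racks) < 2:
        return 0, []
    first, rest = racks[0], racks[1:]
    # Option 1: leave `first` without a circuit.
    best_total, best_pairs = best_matching(rest, demand)
    # Option 2: pair `first` with each remaining rack in turn.
    for i, partner in enumerate(rest):
        weight = demand.get(frozenset((first, partner)), 0)
        sub_total, sub_pairs = best_matching(rest[:i] + rest[i + 1:], demand)
        if weight + sub_total > best_total:
            best_total = weight + sub_total
            best_pairs = [(first, partner)] + sub_pairs
    return best_total, best_pairs


# Illustrative inter-rack demand (arbitrary units).
demand = {
    frozenset(("A", "B")): 50,
    frozenset(("A", "C")): 60,
    frozenset(("B", "D")): 70,
    frozenset(("C", "D")): 30,
    frozenset(("B", "C")): 10,
}
total, circuits = best_matching(["A", "B", "C", "D"], demand)
print(total, circuits)
```

With these numbers the search prefers the pairs A-C and B-D (130 units) over A-B and C-D (80 units), even though no single pair in the winning set is the one a greedy choice would make in isolation for rack A alone.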

Page 11

c-Through - traffic de-multiplexing

VLAN-based network isolation:
– No need to modify switches
– Avoid the instability caused by circuit reconfiguration

Traffic control on hosts:
– Controller informs hosts about the circuit configuration
– End-hosts tag packets accordingly

[Figure labels: Traffic de-multiplexer, VLAN #1, VLAN #2, circuit configuration, traffic]
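The host-side tagging decision could look roughly like the sketch below. The slide names VLAN #1 and VLAN #2 but does not say which one carries optical traffic, so the mapping here is an assumption, as are the class and method names.

```python
ELECTRICAL_VLAN = 1  # assumed mapping; the slide only names VLAN #1 and VLAN #2
OPTICAL_VLAN = 2


class Demultiplexer:
    """Chooses a VLAN tag per packet from the latest circuit configuration."""

    def __init__(self, local_rack):
        self.local_rack = local_rack
        self.optical_peer = None  # rack currently reachable over a circuit

    def update_configuration(self, circuits):
        """Record the controller's latest list of (rack, rack) circuits."""
        self.optical_peer = None
        for a, b in circuits:
            if a == self.local_rack:
                self.optical_peer = b
            elif b == self.local_rack:
                self.optical_peer = a

    def vlan_for(self, dst_rack):
        """Pick the VLAN tag for a packet headed to dst_rack."""
        if dst_rack == self.optical_peer:
            return OPTICAL_VLAN
        return ELECTRICAL_VLAN


demux = Demultiplexer("A")
demux.update_configuration([("A", "C"), ("B", "D")])
tags = [demux.vlan_for(rack) for rack in ("B", "C", "D")]
print(tags)
```

Keeping the decision on the host means the switches only see two static VLANs; a circuit change alters which tag the hosts apply, not the switch configuration.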

Page 12

Testbed setup

16 servers with 1Gbps NICs

Emulate a hybrid network on a 48-port Ethernet switch

Optical circuit emulation
– Optical paths are available only when hosts are notified
– During reconfiguration, no host can use optical paths
– 10 ms reconfiguration delay

[Figure labels: Ethernet switch, Emulated optical circuit switch, 4Gbps links, 100Mbps links]
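As a rough worked example of what the 10 ms reconfiguration delay costs, the sketch below computes the share of each reconfiguration interval during which optical paths are usable. It assumes one 10 ms blackout per interval; the intervals are the ones swept in the reconfiguration-interval experiment later in the deck, and the function name is illustrative.

```python
RECONFIG_DELAY_S = 0.010  # 10 ms reconfiguration delay from the testbed slide


def usable_fraction(interval_s, delay_s=RECONFIG_DELAY_S):
    """Fraction of each reconfiguration interval with optical paths available.

    Assumes exactly one blackout of delay_s seconds per interval.
    """
    if interval_s <= delay_s:
        return 0.0
    return (interval_s - delay_s) / interval_s


for interval in (0.3, 0.5, 1.0, 3.0, 5.0):
    print(f"{interval:.1f} s interval: {usable_fraction(interval):.1%} usable")
```

Under this assumption a 1 second interval leaves about 99% of the time usable, so the blackout itself is a small cost at these timescales.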

Page 13

Evaluation

Basic system performance:
– Can TCP exploit dynamic bandwidth quickly? Yes
– Does traffic control on servers bring significant overhead? No
– Does buffering unfairly increase delay of small flows? No

Application performance:
– Bulk transfer (VM migration)? Yes
– Loosely synchronized all-to-all communication (MapReduce)? Yes
– Tightly synchronized all-to-all communication (MPI-FFT)? Yes

Page 14

TCP can exploit dynamic bandwidth quickly

Throughput ramps up within 10 ms

Throughput stabilizes within 100 ms

Page 15

MapReduce Overview

[Figure: MapReduce data flow. Input file (Split 0, Split 1, Split 2), load, three mappers, local write, data shuffling, three reducers, write, three output files]

Concentrated traffic in 64MB blocks

Independent transfers: amenable to batching

Page 16

MapReduce sort 10GB random data

c-Through varying socket buffer size limit (reconfiguration interval: 1 sec)

[Figure: bar chart of completion time (s), axis 0 to 1000, for the electrical network, c-Through with socket buffer size limits of 128 KB, 50 MB, 100 MB, 300 MB and 500 MB, and full bisection bandwidth; labeled values: 153s, 135s]

Page 17

MapReduce sort 10GB random data

c-Through varying reconfiguration interval (socket buffer size limit: 100MB)

[Figure: bar chart of completion time (s), axis 0 to 800, for the electrical network, c-Through with reconfiguration intervals of 0.3, 0.5, 1.0, 3.0 and 5.0 sec, and full bisection bandwidth; labeled values: 168s, 135s]

Page 18

Yahoo Gridmix benchmark

– 3 runs of 100 mixed jobs such as web query, web scan and sorting
– 200GB of uncompressed data, 50GB of compressed data

Page 19

Summary

– Hybrid packet/circuit switched data center network
– c-Through demonstrates its feasibility
– Good performance even for applications with all-to-all traffic

Future directions to explore:
– The scaling property of hybrid data center networks
– Making applications circuit aware
– Power efficient data centers with optical circuits

Picture from Internet websites.