
Silicon Nanophotonic Network-On-Chip Using TDM Arbitration

Gilbert Hendry – Columbia University

Johnnie Chan, Shoaib Kamil, Lenny Oliker,

John Shalf, Luca P. Carloni, Keren Bergman


Why Photonics?

Photonics changes the rules for bandwidth, energy, and distance.

ELECTRONICS:
Buffer, receive, and retransmit at every router.
Each bus lane routed independently (P ∝ N_LANES).
Off-chip BW is pin-limited and power hungry.

OPTICS:
Modulate/receive high-bandwidth data stream once per communication event.
Broadband switch routes entire multi-wavelength stream.
Off-chip BW = on-chip BW for nearly the same power.


Silicon Photonic Integration

[Figure: device images credited to Cornell (2005), Ghent (2007), Sandia (2008), Columbia (2008), and Cornell (2009)]


Photonic Networks-on-Chip

Corona [U. of Wisconsin, HP]

Photonic Clos [MIT]

Photonic Torus [Columbia]


Ring Resonators

[Figure: ring resonator panels labeled "Modulator/filter" (wavelength λ) and "Broadband"]


Circuit-switched P-NoCs

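[Figure: ring resonator control. Electronic control: 0 V / 1 V applied across the p-region and n-region. Thermal control: 0 V / 1 V applied to an ohmic heater. Plot: transmission vs. injected wavelengths, showing the off-resonance and on-resonance profiles. Other labels: S, D.]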


Circuit-switched P-NoCs

Pros:
Energy-efficient end-to-end transmission
High bandwidth through WDM
Electronic network still available for small control messages*
Network-level support for secure regions

Cons:
Path setup latency
Path setup contention (no fairness)
Longer paths block more
Head-of-line blocking at gateways

* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]


Head of Line Blocking

External Concentration*

[Figure: gateway with external concentration. Labels: Core (x4), Tx/Rx, Network IF, Bidirectional Waveguide, Bidirectional Electronic Channel, Control Router, Electronic Crossbar, 5-port photonic switch, To/From Control plane, To/From Data plane, Serialization, Drivers, Deserialization, Receivers]

* [P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]


TDM Arbitration

[Figure: TDM frame divided into Time slot 0, Time slot 1, ..., Time slot T, with clock ticks t0, t1, t2, t3, t4, ..., tC-3, tC-2, tC-1]


Synchronous Gateway/Control

Time slot ~ 10 ns

TDM sync clock ~ 100 MHz
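As a rough illustration of these numbers: a 100 MHz clock has a 10 ns period, so one sync-clock tick per time slot would match the slide. The sketch below maps a time stamp to a slot index under that assumption. The helper names `slot_at` and `frame_length_ns` and the integer-nanosecond time base are hypothetical and do not come from the slides.

```python
SYNC_CLOCK_HZ = 100_000_000  # ~100 MHz TDM sync clock (from the slide)
# Assumption: exactly one clock period per time slot, which gives 10 ns.
SLOT_NS = 1_000_000_000 // SYNC_CLOCK_HZ


def slot_at(time_ns: int, num_slots: int) -> int:
    """Index of the time slot active at time_ns (hypothetical helper)."""
    return (time_ns // SLOT_NS) % num_slots


def frame_length_ns(num_slots: int) -> int:
    """Duration of one full TDM frame containing num_slots slots."""
    return num_slots * SLOT_NS
```

Under these assumptions, a frame of T slots repeats every T x 10 ns, so the worst-case wait for a particular slot grows linearly with the number of slots.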


Nonblocking Network Scheduling

Time slot 0

Time slot 1

Time slot 2

Required time slots = N-1
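One way to see why N-1 slots can suffice on a nonblocking network is a rotation pattern: in slot k, node i sends to node (i + k + 1) mod N. The sketch below builds that pattern. The rotation scheme and the function name `rotation_schedule` are assumptions chosen for illustration; the slide does not say which assignment it uses.

```python
def rotation_schedule(n: int) -> list[list[tuple[int, int]]]:
    """Build n-1 slots of (source, destination) pairs.

    In slot k every node i sends to (i + k + 1) mod n, so each slot is a
    permutation with no shared source or destination.
    """
    return [
        [(src, (src + k + 1) % n) for src in range(n)]
        for k in range(n - 1)
    ]


if __name__ == "__main__":
    for k, slot in enumerate(rotation_schedule(4)):
        print(f"slot {k}: {slot}")
```

Every ordered pair of distinct nodes appears in exactly one slot, which is the full-coverage property a static schedule needs.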


However…

[Figure: insertion loss (dB) vs. topology size (nodes) for the nonblocking torus topology. Data labels: 18.7, 25.3, 31.5, 38.0, 44.1, 50.6, 56.8, 63.2 dB]

[M. Petracca et al. IEEE Micro, 2008]

Nonblocking topology difficult to implement because of Insertion Loss

* [J. Chan et al. Architectural Exploration of Chip-Scale Photonic Interconnection Network Designs Using Physical-Layer Analysis. JLT, May 2010]


Scheduling Time Slots

Problem:
Blocking network
Full coverage
Minimize time slots (most comm. per slot)

Constraints:
Source contention
Destination contention
Topology contention
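The three constraints can be read as a pairwise compatibility test between communications that share a slot. The sketch below is one possible formulation, assuming a 2D mesh, X-then-Y routing, and directed links as the contended resource. The `Comm` type and the function names are hypothetical; the slides do not define how topology contention is modeled.

```python
from typing import NamedTuple


class Comm(NamedTuple):
    src: tuple[int, int]  # (x, y) coordinates of the source node
    dst: tuple[int, int]  # (x, y) coordinates of the destination node


def xy_links(comm: Comm) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Directed links used by X-then-Y routing on a mesh (assumed topology)."""
    (x, y), (dx, dy) = comm.src, comm.dst
    links = []
    while x != dx:  # travel along X first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:  # then along Y
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def conflicts(a: Comm, b: Comm) -> bool:
    """True if a and b cannot share a time slot."""
    if a.src == b.src:  # source contention
        return True
    if a.dst == b.dst:  # destination contention
        return True
    return bool(set(xy_links(a)) & set(xy_links(b)))  # topology contention


def slot_is_valid(slot: list[Comm]) -> bool:
    """True if no pair of communications in the slot conflicts."""
    return not any(
        conflicts(a, b) for i, a in enumerate(slot) for b in slot[i + 1:]
    )
```

Treating links as directed means opposite-direction traffic on the same waveguide does not conflict here; a model with contention inside the switches would need a stricter test.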


Solution: Genetic Search

Initialization: each solution S starts with one communication per slot
Slot 0: c0
Slot 1: c1
…
Slot N^2: cN^2

Population (size P) -> Selection (down to size ps x P) -> Reproduction (back to P) -> Mutation (still P)

Example solution S:
Slot 0: c0, c5, c7, c8
Slot 1: c23, c6, c58
…
Slot T: c42, c65, c1

Fitness = 1/(number of time slots)
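The initialization, fitness, and selection steps can be sketched as follows. This is a minimal illustration, not the authors' code: truncation selection (keep the fittest ps x P schedules) is an assumption, and the function names are hypothetical. Reproduction and mutation are left out of this sketch.

```python
Schedule = list[list[str]]  # a schedule is a list of slots; a slot lists communication ids


def initial_schedule(comms: list[str]) -> Schedule:
    """One communication per slot, as in the initialization step."""
    return [[c] for c in comms]


def fitness(schedule: Schedule) -> float:
    """Fitness = 1 / (number of time slots); fewer slots is better."""
    return 1.0 / len(schedule)


def select(population: list[Schedule], ps: float) -> list[Schedule]:
    """Keep the fittest ps * P schedules (truncation selection is assumed)."""
    keep = max(1, int(ps * len(population)))
    return sorted(population, key=fitness, reverse=True)[:keep]
```

The initial schedule is trivially valid because no slot holds more than one communication, so the search only has to merge slots without introducing contention.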


Reproduction: Birds and Bees

Parent S0:
c0, c3, c60, c19
c27, c4
c100, c71, c9
c1, c17, c23

Parent S1:
c12, c2, c1, c60
c100, c82, c9
c0
c89, c56, c16, c63

Child C:
c0, c3, c60, c19
c12, c2, c1, c60


Mutation: Secret of the Ooze

S before mutation:
c0, c3, c60, c19
c27, c4
c100, c71, c9
c1, c17, c23

Slot removed, communications to redistribute: c100, c71, c9

S after mutation:
c0, c3, c60, c19, c9
c27, c4, c100
c1, c17, c23, c71
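The mutation example can be read as: dissolve one slot and move each of its communications into another slot. The sketch below follows that reading. The first-fit placement, the `compatible` callback, and the fallback of opening a new slot when nothing fits are assumptions added for illustration.

```python
import random


def mutate(schedule, compatible, rng=random):
    """Dissolve one random slot and move its communications into other slots.

    schedule:   list of slots, each a list of communication ids
    compatible: hypothetical predicate (a, b) -> True if a and b may share a slot
    rng:        source of randomness with a randrange method
    """
    if len(schedule) < 2:
        return [list(slot) for slot in schedule]
    victim = rng.randrange(len(schedule))
    result = [list(slot) for i, slot in enumerate(schedule) if i != victim]
    for comm in schedule[victim]:
        for slot in result:
            if all(compatible(comm, other) for other in slot):
                slot.append(comm)  # first slot without contention takes it
                break
        else:
            result.append([comm])  # nothing fits, so open a new slot
    return result
```

When every displaced communication finds a home, the schedule shrinks by one slot, which raises its fitness.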


Schedule Results

Pop size = 50

Mutation prob = 0.8

[Figure: execution time (s) and solution (number of slots) vs. network size, log-scale vertical axes, for 16-node, 36-node, and 64-node networks]


Implementation: Photonic Switch

200 µm rings

Total switch size = 1.4 mm x 1.4 mm

No S->W, S->E, N->W, N->E turns (X-then-Y routing)
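To see why X-then-Y routing never needs those four turns, the sketch below enumerates every (input port, output port) pair used at intermediate switches on a small mesh. It assumes that y increases toward north and that a message moving east enters the next switch through its W port. The function names are hypothetical.

```python
from itertools import product

OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}


def xy_moves(src, dst):
    """Unit moves for X-then-Y routing: all E/W moves first, then N/S moves."""
    (x, y), (dx, dy) = src, dst
    moves = ["E" if dx > x else "W"] * abs(dx - x)
    moves += ["N" if dy > y else "S"] * abs(dy - y)
    return moves


def turns_used(size):
    """Set of (input port, output port) pairs seen on a size x size mesh."""
    used = set()
    nodes = list(product(range(size), repeat=2))
    for src, dst in product(nodes, nodes):
        moves = xy_moves(src, dst)
        for prev, nxt in zip(moves, moves[1:]):
            # A message moving in direction prev arrives on the opposite port.
            used.add((OPPOSITE[prev], nxt))
    return used
```

Because every path finishes its X moves before starting Y moves, a message that arrives on the S or N port only ever continues straight, so the switch can omit the rings for those four turns.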


Implementation: Switch Control

Width of LUT = 12 (number of rings)

Length of LUT = T (number of time slots)
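A lookup table of this shape can be pictured as T words of 12 bits, one bit per ring, indexed by the current time slot. The sketch below is a software illustration only. The bit ordering, the wrap-around indexing, and the function names are assumptions, and the slides do not describe the hardware encoding.

```python
NUM_RINGS = 12  # LUT width: one control bit per ring (from the slide)


def build_lut(active_rings_per_slot):
    """Pack, for each time slot, the rings to switch on into a 12-bit word."""
    lut = []
    for rings in active_rings_per_slot:
        word = 0
        for r in rings:
            if not 0 <= r < NUM_RINGS:
                raise ValueError(f"ring index {r} out of range")
            word |= 1 << r  # assumed encoding: bit r drives ring r
        lut.append(word)
    return lut


def ring_states(lut, slot):
    """On/off state of each ring for a slot index; wraps every T = len(lut) slots."""
    word = lut[slot % len(lut)]
    return [bool((word >> r) & 1) for r in range(NUM_RINGS)]
```

Because the schedule is static, each switch only needs its own table and the shared slot counter; no path setup messages are required at run time.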


Implementation: Network Gateway

1. Send request

2. Grant, set x-bar and transmit to serializer

3. Receive, deserialize

4. Store in temp buffer, request to core


Simulation Setup

PhoenixSim* – Photonic and Electronic network simulator

64 cores

E-mesh, P-mesh, P-TDM

Traffic:
Random – 32B, 1kB, 32kB messages
Scientific application traces

* [Chan et al. PhoenixSim: A Simulator for Physical-Layer Analysis of Chip-Scale Photonic Interconnection Networks. In DATE 2010]


Results – Random Traffic

[Figure: avg. latency (µs) vs. measured bandwidth (GB/s), log-log axes, for E-Mesh, P-Mesh, and P-TDM with 32B messages]

Results – Random Traffic

[Figure: avg. latency (µs) vs. measured bandwidth (GB/s), log-log axes, with 32B and 1kB messages]

Results – Random Traffic

[Figure: avg. latency (µs) vs. measured bandwidth (GB/s), log-log axes, with 32B, 1kB, and 32kB messages]


Results – Scientific Applications

[Figure: execution time (s), log scale, for Cactus, GTC, MADbench, and PARATEC on E-Mesh, P-Mesh, and P-TDM]

[Figure: energy (J), log scale, for Cactus, GTC, MADbench, and PARATEC on E-Mesh, P-Mesh, and P-TDM]

Benchmark   Num Phases   Num Messages   Total Size (MB)   Avg Msg Size (B)
Cactus      2            285            7.3               25600
GTC         2            63             8.1               129796
MADbench    195          15414          86.5              5613
PARATEC     34           126059         5.4               43.3


Conclusion

TDM implements fairness
TDM improves network utilization
Genetic search useful for finding full-coverage static schedule

Future Work:
Scaling gracefully*
Reducing time slots*
Dynamic scheduling

Contact: [email protected]

* [Hendry et al. Time-Division-Multiplexed Arbitration in Silicon Nanophotonic Networks-on-Chip for High Perf. CMPs. In JPDC, Jan 2011]