George Michelogiannakis, James Balfour, William J. Dally Computer Systems Laboratory Stanford University Elastic-Buffer Flow-Control for On-Chip Networks


Page 1

George Michelogiannakis, James Balfour, William J. Dally

Computer Systems Laboratory

Stanford University

Elastic-Buffer Flow-Control for On-Chip Networks

Page 2

Introduction

Elastic-buffer (EB) flow-control uses the channels as distributed FIFOs

• Input buffers at routers are not needed

Can provide 12% more throughput per unit power

• Equal zero-load latency

Reduces router cycle time by 18%

• Compared to VC routers

Page 3

Outline

Building elastic-buffered channels

• By using what is already there

Router microarchitecture

Deadlock avoidance

Load-sensing for adaptive routing

Evaluation

Page 4

The Idea

Use the network channels as distributed FIFOs

Use that storage instead of input buffers at routers

• To remove input buffer area and power costs

[Figure: pipelined channel vs. channel as FIFO]

Page 5

Building an Elastic Buffer

To build an EB in a pipelined channel with master-slave flip-flops (FFs):

Use latches for storage by driving their enables independently

[Figure: master-slave FF vs. elastic buffer]

Page 6

How Elastic Buffer Channels Work

Ready/valid handshake between elastic buffers

• Ready: At least one free storage slot

• Valid: Non-empty (driving valid data)
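
The handshake above can be sketched as a small cycle-level model. This is an illustrative simulation, not the authors' circuit: it assumes each EB holds up to two flits (one per latch), and the names `ElasticBuffer` and `step` are hypothetical.

```python
from collections import deque


class ElasticBuffer:
    """Two-slot FIFO standing in for the two latches of one EB (assumed capacity)."""

    CAPACITY = 2

    def __init__(self):
        self.slots = deque()

    @property
    def ready(self):
        # Ready: at least one free storage slot.
        return len(self.slots) < self.CAPACITY

    @property
    def valid(self):
        # Valid: non-empty, so it is driving valid data.
        return len(self.slots) > 0


def step(channel, incoming=None, sink_ready=True):
    """Advance the channel by one cycle.

    Returns (flit delivered to the sink or None, whether `incoming` was accepted).
    """
    n = len(channel)
    # Sample every ready/valid pair at the start of the cycle.
    moves = []
    for i in range(n):
        downstream_ready = channel[i + 1].ready if i + 1 < n else sink_ready
        moves.append(channel[i].valid and downstream_ready)
    accepted = incoming is not None and channel[0].ready

    delivered = None
    # A flit advances only where valid and ready were both asserted.
    for i in reversed(range(n)):
        if moves[i]:
            flit = channel[i].slots.popleft()
            if i + 1 < n:
                channel[i + 1].slots.append(flit)
            else:
                delivered = flit
    if accepted:
        channel[0].slots.append(incoming)
    return delivered, accepted
```

With a blocked sink, a channel of two such EBs absorbs four flits and then deasserts ready toward the sender; once the sink unblocks, the flits drain in order. That is the distributed-FIFO behavior the slides rely on.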

[Figure: handshake example, cycles 1 to 6]

Page 7

Control Logic Area Overhead

Control logic is implemented as a four-state FSM with 10 gates and 2 FFs

• Cost is amortized over channel width

Example: control logic increases area of a 64-bit channel by 5%
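
As a rough illustration of the amortization, the sketch below scales the 5% figure for a 64-bit channel to other widths. It assumes the control logic has a fixed cost and that the rest of the channel area grows linearly with width; neither assumption is stated on the slide, and `control_overhead_percent` is a hypothetical helper.

```python
REFERENCE_WIDTH_BITS = 64          # from the slide
REFERENCE_OVERHEAD_PERCENT = 5.0   # from the slide


def control_overhead_percent(width_bits):
    """Estimated control-logic area overhead, assuming fixed control cost
    and channel area linear in width (both are assumptions)."""
    return REFERENCE_OVERHEAD_PERCENT * REFERENCE_WIDTH_BITS / width_bits


for width in (32, 64, 128):
    print(width, control_overhead_percent(width))
```

Under these assumptions the overhead halves each time the channel width doubles.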

Page 8

Outline

Building elastic-buffered channels

Router microarchitecture

• Use EB flow-control through the router

Deadlock avoidance

Load-sensing for adaptive routing

Evaluation

Page 9

Use EB Flow-Control Through the Router

[Figure: VC input-buffered router vs. EB router]

Input buffer replaced by input EB.

VC & SW allocators removed. Per-output arbiters instead.

Three-slot output EB to cover for arbitration done one cycle in advance.

Look-ahead (LA) routing also applicable to EB networks.

Page 10

Outline

Building elastic-buffered channels

Router microarchitecture

Deadlock avoidance

• How to provide isolation without VCs

Load-sensing for adaptive routing

Evaluation

Page 11

Deadlock Avoidance: Duplicate Channels

No input buffers, so no virtual channels

Three types of possible deadlocks:

1. Protocol deadlock

2. Cyclic flit dependency in network

Solution: Duplicate physical channels

Page 12

Deadlock Avoidance: No Interleaving

3. Interleaving deadlock

• New head flits require destination registers

• Occupied destination registers depend on tail flits

• Tail flits cannot bypass the new head flit

Solution: Disallow packet interleaving
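
One way to disallow interleaving is for each per-output arbiter to lock onto an input from its head flit until its tail flit. The sketch below is an assumption-laden illustration, not the authors' design: the round-robin selection, the class name `PacketLockArbiter`, and the requirement that head and tail be distinct flits are all choices made here.

```python
class PacketLockArbiter:
    """Per-output arbiter that keeps granting one input until its tail flit."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.owner = None      # input whose packet currently holds the output
        self.next_start = 0    # round-robin pointer (assumed policy)

    def grant(self, requests):
        """`requests` maps input index -> 'head', 'body' or 'tail' for inputs
        that have a flit for this output. Returns the granted input or None."""
        if self.owner is None:
            # Output is free: only a head flit may claim it.
            for offset in range(self.num_inputs):
                candidate = (self.next_start + offset) % self.num_inputs
                if requests.get(candidate) == "head":
                    self.owner = candidate
                    self.next_start = (candidate + 1) % self.num_inputs
                    break
            else:
                return None
        winner = self.owner
        kind = requests.get(winner)
        if kind is None:
            # Owner has no flit this cycle; the output idles rather than
            # letting another packet interleave.
            return None
        if kind == "tail":
            self.owner = None  # release the output for the next packet
        return winner
```

Because a new head flit is never granted while another packet holds the output, a tail flit is never stuck behind a head flit of a different packet.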

Page 13

Duplicating Channels Between Routers

Duplicate channels with neckdown

• Small improvement (still one switch port), large cost

Duplicate channels with duplicate switch ports

• Excessive cost (switch quadratic cost)

Page 14

Dividing Into Sub-Networks More Efficient

Divide into sub-networks

• Double bandwidth, double the cost

• However, more beneficial once the datapath is narrowed to normalize for throughput or power

• Again, due to switch quadratic cost
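
The quadratic-cost argument can be made concrete with a toy model. The sketch assumes crossbar area scales as (ports × width)², which is a common first-order model but is not given on the slides; the port count of 5 and the 64-bit width are illustrative values, and `crossbar_cost` is a hypothetical helper.

```python
def crossbar_cost(ports, width_bits):
    """Relative crossbar area under the assumed (ports * width)^2 model."""
    return (ports * width_bits) ** 2


ports, width = 5, 64  # illustrative values
baseline = crossbar_cost(ports, width)

# Duplicate channels with duplicate switch ports: one crossbar, twice the ports.
duplicate_ports = crossbar_cost(2 * ports, width) / baseline         # 4.0

# Two sub-networks at full width: two crossbars, double the cost.
two_subnets_full = 2 * crossbar_cost(ports, width) / baseline        # 2.0

# Two sub-networks with the datapath narrowed to half width each.
two_subnets_half = 2 * crossbar_cost(ports, width // 2) / baseline   # 0.5

print(duplicate_ports, two_subnets_full, two_subnets_half)
```

Under this model, two half-width sub-networks carry the same aggregate channel bandwidth as the baseline with half the crossbar area, whereas doubling the switch ports quadruples it.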

Page 15

Outline

Building elastic-buffered channels

Router microarchitecture

Deadlock avoidance

Load-sensing for adaptive routing

• Propose a load metric for EB networks

Evaluation

Page 16

Output Channel Occupancy Load Metric

Flit-buffered networks use credit count

EB networks measure output channel occupancy

• At a certain segment of the output channel (shown in red)

• Occupancy decremented when flits leave that segment

• Incremented by a packet’s length when the routing decision is made, so packets see other decisions made in the same cycle
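
The counter behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, and choosing the candidate output with the lowest count is one plausible use of the metric rather than the routing function from the slides.

```python
class OutputOccupancy:
    """Per-output occupancy counters used as a load estimate."""

    def __init__(self, num_outputs):
        self.count = [0] * num_outputs

    def on_route(self, output, packet_flits):
        # Incremented by the packet's length as soon as the routing decision
        # is made, so later decisions in the same cycle already see it.
        self.count[output] += packet_flits

    def on_flit_leaves_segment(self, output):
        # Decremented when a flit leaves the monitored channel segment.
        self.count[output] -= 1

    def least_loaded(self, candidates):
        # Assumed policy: pick the candidate output with the lowest count.
        return min(candidates, key=lambda o: self.count[o])


occ = OutputOccupancy(num_outputs=2)
first = occ.least_loaded([0, 1])
occ.on_route(first, packet_flits=8)
second = occ.least_loaded([0, 1])   # sees the first decision immediately
occ.on_route(second, packet_flits=8)
```

Updating the counter at decision time, rather than when flits arrive, keeps two packets routed in the same cycle from both picking the same lightly loaded output.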

Page 17

Outline

Building elastic-buffered channels

Router microarchitecture

Deadlock avoidance

Load-sensing for adaptive routing

Evaluation

• Compare throughput, power, area, latency, cycle time

Page 18

Evaluation Methodology

Used a modified version of BookSim

Area/power estimations from a 65nm library

• Input buffers modeled as SRAM cells

• Throughput/power optimal # of VCs and buffer depth

• Two sub-networks: request and reply

Averaged over a set of 6 traffic patterns

Constant packet size (512 bits)

Swept channel width from 28 to 192 bits

Low-swing channels: 0.3 of the full-swing repeated wire traversal power
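
With a constant 512-bit packet, sweeping the channel width changes how many flits each packet needs. The sketch below assumes one flit per channel-width chunk with no extra header bits, which the slides do not specify; `flits_per_packet` is a hypothetical helper.

```python
import math

PACKET_BITS = 512  # constant packet size from the slide


def flits_per_packet(channel_width_bits):
    """Flits needed to carry one packet, assuming one flit per channel width
    and no additional header overhead (an assumption)."""
    return math.ceil(PACKET_BITS / channel_width_bits)


for width in (28, 64, 192):
    print(width, flits_per_packet(width))
```

Under this assumption a packet takes 19 flits at the narrowest width in the sweep (28 bits) and 3 flits at the widest (192 bits).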

Page 19

Throughput-Power Gains in 2D Mesh


EB network improvement:

Same power: 10% increased throughput

Same throughput: 12% reduced power

[Figure: throughput-power plot for the 2D mesh, annotated with the throughput gain]

Page 20

Throughput-Area Gains in 2D Mesh

2% improvement for EB networks

Page 21

Latency-Throughput in 2D Mesh

Zero-load latency equal

Page 22

Power Breakdown: No Input Buffer Power

[Figure: Mesh low-swing power breakdown (2% packet injection rate), in W, axis from 0 to 0.8, for VC-Buff and EBN. Components: output clock, output FF, crossbar control, crossbar power, input buffer write, input buffer read, channel FF, channel clock, channel traversal.]

Page 23

Area Breakdown: No Input Buffer Area

[Figure: Low-swing mesh area breakdown, in mm², axis from 0.0 to 1.2, for VC-Buff and EBN. Components: channel, switch, input, output.]

Page 24

Router RTL Implementation

No buffers, VCs, allocators, credits

• VC router had look-ahead routing

Buffers: FF arrays. 2 VCs, 8 slots each

Aspect        VC router   EB router   Savings

Area (μm²)    63,515      14,730      77%

Clock (ns)    3.3         2.7         18%

Power (mW)    2.59        0.12        95%
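
The savings column follows directly from the two router columns. The short check below recomputes it from the table values; `savings_percent` is a hypothetical helper, and rounding to the nearest whole percent is assumed.

```python
def savings_percent(vc_value, eb_value):
    """Relative reduction of the EB router versus the VC router, in percent."""
    return round(100 * (vc_value - eb_value) / vc_value)


table = {
    "Area (um^2)": (63515, 14730),
    "Clock (ns)": (3.3, 2.7),
    "Power (mW)": (2.59, 0.12),
}

for aspect, (vc, eb) in table.items():
    print(aspect, savings_percent(vc, eb))
```

The recomputed values are 77, 18 and 95 percent, matching the table.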

45nm, LP-CMOS, worst-case. Mesh 5x5 routers. DOR. 64-bit datapath

Page 25

Conclusions

EB flow-control uses channels as distributed FIFOs

• Removes input buffers from routers

• Uses duplicate physical channels instead of VCs

Increases throughput per unit power up to 12% for low-swing

• Depends on what fraction of the overall cost input buffers constitute

Reduces router cycle time by 18%

Flow-control choice depends on design parameters and priorities

Page 26

Questions?

Thanks for your attention