George Michelogiannakis, James Balfour, William J. Dally Computer Systems Laboratory Stanford...

George Michelogiannakis, James Balfour, William J. Dally

Computer Systems Laboratory

Stanford University

Elastic-Buffer Flow-Control for On-Chip Networks

Introduction

Elastic-buffer (EB) flow-control uses the channels as distributed FIFOs

• Input buffers at routers are not needed

Can provide 12% more throughput per unit power

• Equal zero-load latency

Reduces router cycle time by 18%

• Compared to VC routers

Outline

Building elastic-buffered channels

• By using what is already there

Router microarchitecture

Deadlock avoidance

Load-sensing for adaptive routing

Evaluation

The Idea

Use the network channels as distributed FIFOs

Use that storage instead of input buffers at routers

• To remove input buffer area and power costs

Pipelined channel

Channel as FIFO

Building an Elastic Buffer

To build an EB in a pipelined channel with master-slave flip-flops (FFs):

Use latches for storage by driving their enables independently

Master-slave FF

Elastic buffer

How Elastic Buffer Channels Work

Ready/valid handshake between elastic buffers

• Ready: At least one free storage slot

• Valid: Non-empty (driving valid data)

[Figure: elastic-buffer channel state shown for Cycle 1 through Cycle 6]
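The handshake above can be sketched in a few lines of Python. This is only an illustrative behavioral model, not the circuit from the slides: the names `ElasticBuffer`, `step`, and `run` are hypothetical, and the sketch assumes each EB holds two flits (one per latch) and that all ready/valid signals are sampled at the start of a cycle.

```python
from collections import deque


class ElasticBuffer:
    """Two-slot FIFO standing in for the two latches of one elastic buffer."""

    CAPACITY = 2  # assumption: one slot per latch (master and slave)

    def __init__(self):
        self.slots = deque()

    @property
    def ready(self):
        # Ready: at least one free storage slot.
        return len(self.slots) < self.CAPACITY

    @property
    def valid(self):
        # Valid: non-empty, so it is driving valid data.
        return len(self.slots) > 0


def step(channel, source, sink_ready):
    """Advance every EB by one cycle; return the flit handed to the sink, or None."""
    # Sample all handshake signals first so every transfer in this cycle
    # is decided on start-of-cycle state.
    ready = [eb.ready for eb in channel]
    valid = [eb.valid for eb in channel]

    delivered = None
    if valid[-1] and sink_ready:
        delivered = channel[-1].slots.popleft()
    for i in range(len(channel) - 1):
        if valid[i] and ready[i + 1]:
            channel[i + 1].slots.append(channel[i].slots.popleft())
    if source and ready[0]:
        channel[0].slots.append(source.popleft())
    return delivered


def run(num_flits, stall_cycles, num_ebs=2, max_cycles=50):
    """Stall the sink for stall_cycles, then drain; return (delivered, peak occupancy)."""
    channel = [ElasticBuffer() for _ in range(num_ebs)]
    source = deque(range(num_flits))
    delivered, peak = [], 0
    for cycle in range(max_cycles):
        flit = step(channel, source, sink_ready=cycle >= stall_cycles)
        if flit is not None:
            delivered.append(flit)
        peak = max(peak, sum(len(eb.slots) for eb in channel))
    return delivered, peak
```

Calling `run(5, 4)` stalls the sink for four cycles: the two EBs fill up, the source is back-pressured, and the flits still leave in order once the sink becomes ready.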

Control Logic Area Overhead

Control logic is implemented as a four-state FSM with 10 gates and 2 FFs

• Cost is amortized over channel width

Example: control logic increases area of a 64-bit channel by 5%
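As a rough worked example of the amortization, assume (this is not stated on the slide) that the control logic has a fixed area $A_c$ independent of width and that the datapath storage area grows linearly with the channel width $w$, at $a$ per bit. The relative overhead is then inversely proportional to $w$, anchored at the 5% figure for 64 bits:

```latex
\[
\text{overhead}(w) = \frac{A_c}{a\,w},
\qquad
\text{overhead}(64) = 5\%
\;\Rightarrow\;
\text{overhead}(w) \approx 5\% \times \frac{64}{w}
\]
\[
\text{overhead}(128) \approx 2.5\%,
\qquad
\text{overhead}(32) \approx 10\%
\]
```

The values for 128 and 32 bits are extrapolations under this linear model, not measured results.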

Outline

Building elastic-buffered channels

Router microarchitecture

• Use EB flow-control through the router

Deadlock avoidance

Evaluation

Use EB Flow-Control Through the Router

VC input-buffered router

EB router

Input buffer replaced by input EB

VC & SW allocators removed. Per-output arbiters instead.

Three-slot output EB to cover for arbitration done one cycle in advance.

Look-ahead (LA) routing also applicable to EB networks.

Outline

Deadlock avoidance

• How to provide isolation without VCs

Evaluation

Deadlock Avoidance: Duplicate Channels

No input buffers, so no virtual channels

Three types of possible deadlocks:

1. Protocol deadlock

2. Cyclic flit dependency in network

Solution: Duplicate physical channels

Deadlock Avoidance: No Interleaving

3. Interleaving deadlock

• New head flits require destination registers

• Occupied destination registers depend on tail flits

• Tail flits cannot bypass the new head flit

Solution: Disallow packet interleaving
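One way to disallow interleaving is for each per-output arbiter to hold its grant until the tail flit has passed. The sketch below is a hypothetical illustration of that idea, not the arbiter from the talk: `PacketLockedArbiter` and `drain` are made-up names, the fallback policy is plain round-robin, and flits are modeled as `(packet_id, is_tail)` tuples.

```python
from collections import deque


class PacketLockedArbiter:
    """Round-robin arbiter for one output that keeps its grant until a tail flit."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.owner = None    # input whose packet currently owns the output
        self.next_start = 0  # round-robin pointer used when the output is free

    def grant(self, requests):
        """requests[i] is None or the (packet_id, is_tail) flit at the head of input i."""
        if self.owner is not None:
            # A packet is in flight: only its input may send, others must wait.
            winner = self.owner if requests[self.owner] is not None else None
        else:
            winner = None
            for offset in range(self.num_inputs):
                i = (self.next_start + offset) % self.num_inputs
                if requests[i] is not None:
                    winner = i
                    self.next_start = (i + 1) % self.num_inputs
                    break
        if winner is not None:
            is_tail = requests[winner][1]
            self.owner = None if is_tail else winner  # release on the tail flit
        return winner


def drain(queues):
    """Send every queued flit through one output; return packet ids in output order."""
    arbiter = PacketLockedArbiter(len(queues))
    order = []
    # Assumes every packet ends with a tail flit, so one flit is granted per cycle.
    for _ in range(sum(len(q) for q in queues)):
        requests = [q[0] if q else None for q in queues]
        winner = arbiter.grant(requests)
        if winner is not None:
            order.append(queues[winner].popleft()[0])
    return order
```

With packets A (three flits) on input 0 and B (two flits) on input 1, `drain` emits all of A before any of B, so a new head flit never lands between the flits of another packet.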

Duplicating Channels Between Routers

Duplicate channels with neckdown

• Small improvement (still one switch port), large cost

Duplicate channels with duplicate switch ports

• Excessive cost (switch quadratic cost)

Dividing Into Sub-Networks More Efficient

Divide into sub-networks

• Double bandwidth, double the cost

• However, more beneficial once the datapath is narrowed down to normalize for throughput or power

• Again, due to switch quadratic cost
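The quadratic switch cost can be made concrete with a simple model. Assume, as a first-order approximation not taken from the slides, that crossbar area scales with the square of (ports × datapath width), and write $P$ for the number of switch ports and $w$ for the datapath width:

```latex
\[
C_{\text{xbar}}(P, w) \propto (P\,w)^2
\]
\begin{align*}
\text{single network:} \quad & (P\,w)^2 \\
\text{duplicate switch ports:} \quad & (2P\,w)^2 = 4\,(P\,w)^2 \\
\text{two sub-networks, width } w\text{:} \quad & 2\,(P\,w)^2 \\
\text{two sub-networks, width } w/2\text{:} \quad & 2\left(P\,\tfrac{w}{2}\right)^2 = \tfrac{1}{2}\,(P\,w)^2
\end{align*}
```

Under this model, two half-width sub-networks offer the same aggregate channel bandwidth as the single network at half the crossbar area, while duplicating switch ports quadruples it. Channel cost, which grows roughly linearly with width, is ignored here.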

Outline

Deadlock avoidance

Load-sensing for adaptive routing

• Propose a load metric for EB networks

Evaluation

Output Channel Occupancy Load Metric

Flit-buffered networks use credit count

EB networks measure output channel occupancy

• At a certain segment of the output channel (shown in red)

• Occupancy decremented when flits leave that segment

• Incremented by a packet’s length when its routing decision is made. Packets see other decisions made in the same cycle
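A minimal sketch of this bookkeeping, under stated assumptions: `OutputOccupancy` and its method names are hypothetical, the adaptive choice is simply "lowest counter wins", and packets routed in the same cycle are processed one after another so that each sees the increments made by earlier ones.

```python
class OutputOccupancy:
    """Per-output occupancy counters used as the load metric for adaptive routing."""

    def __init__(self, num_outputs):
        self.count = [0] * num_outputs

    def flit_left_segment(self, output):
        # Decrement when a flit leaves the monitored segment of the output channel.
        if self.count[output] > 0:
            self.count[output] -= 1

    def route(self, candidates, packet_length):
        # Pick the least-occupied candidate output (ties go to the first listed),
        # then charge the whole packet length to it immediately so that later
        # decisions in the same cycle see this one.
        choice = min(candidates, key=lambda out: self.count[out])
        self.count[choice] += packet_length
        return choice
```

For example, two 4-flit packets that can both use outputs 0 and 1 are spread across them: the first takes output 0 and the second, seeing the updated counter, takes output 1.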

Outline

Deadlock avoidance

Evaluation

• Compare throughput, power, area, latency, cycle time

Evaluation Methodology

Used a modified version of BookSim

Area/power estimations from a 65nm library

• Input buffers modeled as SRAM cells

• Throughput/power optimal # of VCs and buffer depth

• Two sub-networks: request and reply

Averaged over a set of 6 traffic patterns

Constant packet size (512 bits)

Swept channel width from 28 to 192 bits

Low-swing channels: 0.3 of the full-swing repeated wire traversal power
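Because the packet size is held constant while the channel width is swept, the number of flits per packet changes with width. The helper below (a hypothetical name, assuming one flit is exactly one channel width and the last flit is padded) shows the range this sweep covers:

```python
import math


def flits_per_packet(packet_bits, channel_width_bits):
    """Number of flits needed to carry one packet, rounding the last flit up."""
    return math.ceil(packet_bits / channel_width_bits)


# Endpoints of the sweep plus the 64-bit datapath used in the RTL comparison.
sweep = {width: flits_per_packet(512, width) for width in (28, 64, 192)}
```

Under these assumptions a 512-bit packet takes 19 flits at 28 bits, 8 flits at 64 bits, and 3 flits at 192 bits.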

Throughput-Power Gains in 2D Mesh

EB network improvement:

Same power: 10% increased throughput

Same throughput: 12% reduced power

Throughput gain

Throughput-Area Gains in 2D Mesh

2% improvement for EB networks

Latency-Throughput in 2D Mesh

Zero-load latency equal

Power Breakdown: No Input Buffer Power

[Figure: Mesh low-swing power breakdown (2% packet injection rate), shown for VC-Buff. Components: output clock, output FF, crossbar control, crossbar power, input buffer write, input buffer read, channel FF, channel clock, channel traversal.]

Area Breakdown: No Input Buffer Area

[Figure: Low-swing mesh area breakdown (mm²) for VC-Buff and EBN. Components: channel, switch, input, output.]

Router RTL Implementation

No buffers, VCs, allocators, credits

• VC router had look-ahead routing

Buffers: FF arrays. 2 VCs, 8 slots each

Aspect        VC router   EB router   Savings
Area (μm²)    63,515      14,730      77%
Clock (ns)    3.3         2.7         18%
Power (mW)    2.59        0.12        95%

45nm, LP-CMOS, worst-case

Mesh 5x5 routers. DOR. 64-bit datapath
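The savings column follows directly from the two router columns as 1 − EB/VC. A short check, using only the figures quoted above (the function name `savings_percent` is just for illustration):

```python
def savings_percent(vc_value, eb_value):
    """Relative reduction of the EB router versus the VC router, in whole percent."""
    return round(100 * (1 - eb_value / vc_value))


area_savings = savings_percent(63515, 14730)   # area in square micrometers
clock_savings = savings_percent(3.3, 2.7)      # clock period in ns
power_savings = savings_percent(2.59, 0.12)    # power in mW
```

These evaluate to 77, 18, and 95, matching the reported savings.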

Conclusions

EB flow-control uses channels as distributed FIFOs

• Removes input buffers from routers

• Uses duplicate physical channels instead of VCs

Increases throughput per unit power by up to 12% for low-swing channels

• Depends on what fraction of the overall cost input buffers constitute

Reduces router cycle time by 18%

Flow-control choice depends on design parameters and priorities

Questions?

Thanks for your attention
