Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally Stanford University Concurrent VLSI Architecture Group ICCD 2012, 9/30/12–10/3/12, Montreal, Canada

Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks

Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally

Stanford University

Concurrent VLSI Architecture Group

ICCD 2012, 9/30/12–10/3/12, Montreal, Canada

Becker, Jiang, Michelogiannakis, Dally: Adaptive Backpressure

Overview

• Input buffer sharing is attractive in NoCs
• Improves area and power efficiency
• But facilitates spread of congestion

• Adaptive Backpressure mitigates performance degradation by avoiding unproductive use of buffer space in the presence of congestion

• Avoid downsides of buffer sharing while maintaining benefits in benign case

10/3/12

Dynamic Buffer Management

• Buffer space is expensive resource in NoCs
– 30-35% network power (MIT RAW, UT TRIPS)

• Dynamic management increases utilization by sharing buffer space among multiple VCs
– Optimize use of expensive buffer resources
– Decrease incremental cost of VCs

⇒ Improved area and power efficiency
⇒ 25% more throughput or 34% less power [Nicopoulos’06]

Buffer Monopolization

• Blocked flits from congested VC accumulate in buffer

⇒ Effective buffer size reduced for other VCs
⇒ Performance degradation (latency / throughput)
⇒ Congestion spreads across VCs (flows / apps / VMs / …)

[Figure: shared buffer; labels: VC 0, VC 1]
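The monopolization effect can be reproduced with a toy model. The sketch below is an illustration only, not the authors' simulator: the function name, the arrival pattern (each VC offers one flit per cycle), the service order, and the optional `cap_vc0` parameter (a stand-in for a per-VC quota) are all assumptions.

```python
def simulate_shared_buffer(slots, cycles, cap_vc0=None):
    """Toy model: two VCs share `slots` buffer entries.

    VC 0 is blocked downstream and never drains; VC 1 drains one flit
    per cycle. `cap_vc0` is a hypothetical occupancy limit for VC 0.
    Returns (final occupancy per VC, flits accepted for VC 1).
    """
    occupancy = [0, 0]
    accepted_vc1 = 0
    for _ in range(cycles):
        # VC 1 is unblocked: one of its buffered flits departs per cycle.
        if occupancy[1] > 0:
            occupancy[1] -= 1
        # Each VC offers one new flit per cycle; VC 0 is served first
        # (an arbitrary choice made for this illustration).
        for vc in (0, 1):
            if sum(occupancy) == slots:
                continue  # shared buffer is full, flit is not accepted
            if vc == 0 and cap_vc0 is not None and occupancy[0] >= cap_vc0:
                continue  # VC 0 has reached its assumed cap
            occupancy[vc] += 1
            if vc == 1:
                accepted_vc1 += 1
    return occupancy, accepted_vc1


unrestricted = simulate_shared_buffer(slots=16, cycles=100)
capped = simulate_shared_buffer(slots=16, cycles=100, cap_vc0=1)
print(unrestricted)  # ([16, 0], 15): VC 0 fills the buffer, VC 1 starves
print(capped)        # ([1, 1], 100): VC 1 is accepted every cycle
```

In this model the blocked VC ends up holding all 16 slots after 16 cycles, after which VC 1 makes no further progress even though its own path is free.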

Adaptive Backpressure

Goal:
• Avoid unproductive use of buffer space
• But allow sharing when beneficial

Approach:
• Match arrival and departure rate for each VC by regulating credit availability (backpressure)
• Derive quota from credit round trip times

Quota Motivation (1)

Without congestion, full throughput requires Tcrt,0 credits

Insufficient credit supply causes idle cycle downstream

[Figure: two timing diagrams between Router 0 and Router 1 (time axis); labels: Tcrt,0, credit stall, idle cycle]
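The relation between credit supply and throughput follows from standard credit-based flow control: with a credit round trip of Tcrt,0 cycles, at most one flit per credit can be sent per round trip. A minimal sketch, where the function names and the example numbers are assumptions chosen for illustration:

```python
def link_utilization(credits: int, t_crt: int) -> float:
    """Upper bound on the fraction of cycles in which a flit can be sent.

    credits: buffer slots (credits) available to the VC
    t_crt:   uncongested credit round-trip time in cycles (Tcrt,0)
    """
    if credits < 0 or t_crt <= 0:
        raise ValueError("credits must be >= 0 and t_crt must be > 0")
    return min(1.0, credits / t_crt)


def idle_cycles_per_round_trip(credits: int, t_crt: int) -> int:
    """Cycles per round trip in which the sender waits for a credit."""
    return max(0, t_crt - credits)


# Hypothetical link with a 6-cycle credit round trip.
print(link_utilization(6, 6))            # 1.0
print(idle_cycles_per_round_trip(4, 6))  # 2
```

With fewer than Tcrt,0 credits, each missing credit costs one credit stall per round trip, which shows up as an idle cycle downstream.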

Quota Motivation (2)

Congestion stall causes unproductive buffer occupancy

Matching stalls avoids unproductive buffer occupancy

[Figure: two timing diagrams between Router 0 and Router 1 (time axis); labels: congestion stall, queuing stall, credit stall, excess flits, excess drained, Tcrt,0+Tstall]

Quota Heuristic

• Track credit RTT for each output VC

• RTT = RTTmin ⇒ set quota to RTTmin
– No downstream congestion
⇒ Allow one flit in each cycle of RTT interval

• RTT > RTTmin ⇒ subtract difference from RTTmin
– Each congestion and queuing stall adds to RTT
⇒ Allow one credit stall per downstream stall
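The heuristic reduces to a small calculation. The sketch below follows the two cases on the slide; the function name and the `floor` parameter are assumptions. In particular, the slide does not say what happens when the RTT excess is larger than RTTmin, so clamping at a minimum of one is a guess made here to keep the VC from stalling permanently.

```python
def compute_quota(rtt: int, rtt_min: int, floor: int = 1) -> int:
    """Quota for one output VC from its measured credit round-trip time.

    rtt:     measured credit RTT in cycles
    rtt_min: uncongested RTT for the link (RTTmin)
    floor:   assumed lower bound on the quota (not stated on the slide)
    """
    if rtt < rtt_min:
        raise ValueError("measured RTT cannot be below RTTmin")
    excess = rtt - rtt_min  # cycles added by congestion and queuing stalls
    return max(floor, rtt_min - excess)


print(compute_quota(rtt=8, rtt_min=8))   # 8: no congestion, quota = RTTmin
print(compute_quota(rtt=11, rtt_min=8))  # 5: three stall cycles subtracted
print(compute_quota(rtt=30, rtt_min=8))  # 1: clamped at the assumed floor
```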

Implementation

• Network design determines RTTmin for each link
• Track RTT for single in-flight credit per VC
• Update quota value upon return
• Switch allocator masks all VCs that exceed quota

⇒ Simple extension to existing flow control logic
⇒ No additional signaling required
⇒ < 5% overhead for 16x64b buffer with 4 VCs
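The four steps above can be sketched as a small behavioral model of the per-VC state. This is not the authors' hardware design: the class and method names are hypothetical, credits are assumed to return in send order within a VC, the quota is assumed to be clamped at a minimum of one, and a VC is treated as masked once its outstanding flits reach the quota.

```python
class VcQuotaTracker:
    """Per-output-VC quota state (illustrative model, not the authors' RTL)."""

    def __init__(self, rtt_min: int, floor: int = 1) -> None:
        self.rtt_min = rtt_min      # fixed by the network design
        self.floor = floor          # assumed lower bound on the quota
        self.quota = rtt_min        # start as if uncongested
        self.outstanding = 0        # flits sent whose credit has not returned
        self.timed_sent_at = None   # send cycle of the single flit being timed
        self.credits_ahead = 0      # credits due back before the timed one

    def masked(self) -> bool:
        """True if the switch allocator should skip this VC."""
        return self.outstanding >= self.quota

    def send_flit(self, now: int) -> None:
        if self.masked():
            raise RuntimeError("VC is masked; the allocator would skip it")
        if self.timed_sent_at is None:
            # Start timing this flit's credit; earlier credits return first.
            self.timed_sent_at = now
            self.credits_ahead = self.outstanding
        self.outstanding += 1

    def receive_credit(self, now: int) -> None:
        if self.outstanding == 0:
            raise RuntimeError("credit returned with no flit outstanding")
        self.outstanding -= 1
        if self.timed_sent_at is None:
            return                  # no measurement in progress
        if self.credits_ahead > 0:
            self.credits_ahead -= 1  # credit belongs to an earlier flit
            return
        # The timed credit is back: update the quota from the measured RTT.
        rtt = now - self.timed_sent_at
        excess = max(0, rtt - self.rtt_min)
        self.quota = max(self.floor, self.rtt_min - excess)
        self.timed_sent_at = None


vc = VcQuotaTracker(rtt_min=4)
vc.send_flit(now=0)
vc.receive_credit(now=7)   # RTT of 7 is three cycles above RTTmin
print(vc.quota)            # 1
```

Only one timestamp and one small counter are kept per VC in this model, in line with the slide's point that a single in-flight credit is tracked and no extra signaling is needed.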

Evaluation Methodology

• BookSim 2.0
• 8x8 2D mesh, 64-bit channels, DOR
• 16-slot input buffers, 4 VCs
• Combined VC and switch allocation
• Synthetic traffic and application benchmarks
• Compare ABP to unrestricted sharing

Network Stability (1)

• For adversarial traffic, throughput in mesh is unstable at high load
– Traffic merging causes starvation
– Tree saturation causes widespread congestion

• ABP improves stability
– Throttles sources that inject at very high rate
– Efficient buffer use reduces tree saturation

⇒ Faster recovery from transient congestion

Network Stability (2)

[Figure: tornado traffic; annotation: 6.3x]

Network Stability (3)

[Figure: foreground traffic at 50% injection rate; annotations: 3.3x, -13% saturation rate]

Performance Isolation (1)

• Inject two classes of traffic into network
– Shared buffer space, separate VCs

⇒ Sharing causes interference between classes

• ABP reduces interference
– Contains effects of congestion within a class

⇒ Better isolation between workloads, VMs, …

Performance Isolation (2)

[Figure: uniform random foreground traffic; panels: hotspot background traffic, uniform random background traffic; annotations: -33%, -38%]

Performance Isolation (3)

[Figure: 50% uniform random background traffic; annotations: -31%, w/o background]

Application Performance (1)

• Array of stream processors
• Streaming data to memory
• Modeled as hotspot traffic

• In-order general purpose core
• Running at 4x network frequency
• Executing PARSEC benchmarks
• Modeled using Netrace [Hestness’11]

• Common network
• Disjoint VC ranges
• Shared buffer space

• 8 interleaved memory controllers
• Heterogeneous network nodes

Application Performance (2)

[Figure: 12.5% injection rate for streaming traffic; annotations: -31%, w/o background]

Conclusions

• Sharing improves buffer utilization, but can lead to undesired interference effects

• Adaptive Backpressure regulates credit flow to avoid unproductive use of shared buffer space

• Mitigates performance degradation in presence of adversarial traffic

• But maintains key benefits of buffer sharing under benign conditions

THE END

Thank you for your attention!
