Adaptive Backpressure: Efficient Buffer Management for On-Chip Networks
Authors: Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally
Stanford University
Presenter: Han Liu, University of California, San Diego
Han Liu 2
Background
• NoCs are becoming huge
  – Hundreds of cores on a single die
• Current designs use input-queued routers
  – Input buffers consume a significant share of resources
• Input buffer sharing is attractive in NoCs
  – Pros: improves area and power efficiency
  – Cons: facilitates the spread of congestion
04/29/13
Overview
• Adaptive Backpressure mitigates performance degradation by avoiding unproductive use of buffer space in the presence of congestion
• Avoids the downsides of buffer sharing while maintaining its benefits in the benign case
Motivation
• Assumption: buffers are good
  – More flexible routing
  – Traffic can wait closer to its destination
• Is this always true?
  – Energy and area efficiency
  – Implementation difficulty
Train Example
[Figure: train trip from San Diego (source) via Denver (buffer) to Boston (destination)]
Buffers are good
Motivation
• Static vs. dynamic buffer management
[Figure: static and dynamic allocation of buffer space between VC1 and VC2; static partitioning leaves wasted buffer space]
Dynamic Buffer Management
• Buffer space is an expensive resource in NoCs
  – 30-35% of network power (MIT RAW, UT TRIPS)
• Dynamic management increases utilization by sharing buffer space among multiple VCs
  – Optimizes use of expensive buffer resources
  – Decreases the incremental cost of VCs
⇒ Improved area and power efficiency
⇒ 25% more throughput or 34% less power
[Nicopoulos’06]
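A minimal Python sketch of the difference between static partitioning and a shared (dynamic) pool, assuming the 16-slot, 4-VC input buffer from the evaluation setup. `InputBuffer`, `can_accept`, and `fill` are hypothetical names, and the model deliberately ignores departures and credit timing; it only asks how many flits a single active VC can buffer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputBuffer:
    """Toy input-port buffer; hypothetical model for illustration only."""
    capacity: int                 # total flit slots at the port
    num_vcs: int                  # VCs that share the port
    shared: bool                  # True: dynamic shared pool, False: static partition
    occupancy: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.occupancy = [0] * self.num_vcs

    def can_accept(self, vc: int) -> bool:
        if self.shared:
            # Dynamic: a flit of any VC may take any free slot.
            return sum(self.occupancy) < self.capacity
        # Static: each VC owns capacity // num_vcs slots; unused slots are wasted.
        return self.occupancy[vc] < self.capacity // self.num_vcs

    def accept(self, vc: int) -> bool:
        if not self.can_accept(vc):
            return False
        self.occupancy[vc] += 1
        return True


def fill(buf: InputBuffer, vc: int) -> int:
    """Offer flits on one VC until the buffer refuses; return how many were stored."""
    stored = 0
    while buf.accept(vc):
        stored += 1
    return stored


print(fill(InputBuffer(16, 4, shared=False), vc=0))  # 4
print(fill(InputBuffer(16, 4, shared=True), vc=0))   # 16
```

With one active VC, sharing lets that VC use the whole buffer instead of a quarter of it. The same property is what allows a congested VC to monopolize the buffer.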
Sharing
• Pros
  – Economical
  – Efficient
• Cons
  – Inconvenient
  – Troublesome
Border Example
[Figure: Mexico/US border crossing with highways HWY 5 and HWY 805]
Buffer Monopolization
• Blocked flits from a congested VC accumulate in the buffer
⇒ Effective buffer size is reduced for other VCs
⇒ Performance degradation (latency / throughput)
⇒ Congestion spreads across VCs (flows / apps / VMs / …)
[Figure: shared buffer occupied by flits of VC 0 and VC 1]
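The monopolization effect can be reproduced with a toy model of one shared 16-slot buffer and two VCs. Assumptions that are not from the slides: the upstream router always has a flit for each VC and sees free space immediately (no credit delay), arrivals are served round-robin, VC 0 drains one flit per cycle, and VC 1 is permanently blocked downstream. `simulate_shared_buffer` is a hypothetical helper.

```python
from typing import List, Tuple


def simulate_shared_buffer(capacity: int, cycles: int) -> Tuple[List[int], int]:
    """Toy model: VC 0 drains every cycle, VC 1 is blocked downstream forever."""
    occ = [0, 0]            # buffered flits for VC 0 and VC 1
    delivered_vc0 = 0
    for cycle in range(cycles):
        # Departure: VC 0 forwards one flit per cycle; VC 1 never leaves.
        if occ[0] > 0:
            occ[0] -= 1
            delivered_vc0 += 1
        # Arrivals: one candidate flit per VC, round-robin priority, and a
        # flit is accepted only while the shared pool has a free slot.
        order = (0, 1) if cycle % 2 == 0 else (1, 0)
        for vc in order:
            if sum(occ) < capacity:
                occ[vc] += 1
    return occ, delivered_vc0


print(simulate_shared_buffer(capacity=16, cycles=100))  # ([0, 16], 15)
```

VC 0 delivers 15 flits while the buffer fills; after that, the blocked VC 1 holds every slot and VC 0 is starved. A permanently blocked VC is an extreme case, chosen here only to make the effect obvious.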
Adaptive Backpressure
Goal:
• Avoid unproductive use of buffer space in dynamic buffer management
• But allow sharing when beneficial
Approach:
• Match arrival and departure rates for each VC by regulating credit availability (backpressure)
• Derive quotas from credit round-trip times
Buffer Monopolization
[Figure: shared buffer occupied by flits of VC 0 and VC 1]
• Need a way to regulate the otherwise unlimited supply of credits to congested VC 1
  – Give VC 0 more credits and buffer space
Quota Motivation (1)
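[Figure: credit timing between Router 0 and Router 1 over time, showing the credit round trip Tcrt,0, a credit stall, and the resulting idle cycle]
• Without congestion, full throughput requires Tcrt,0 credits (Tcrt,0: credit round-trip time without congestion)
• Insufficient credit supply causes a credit stall, i.e. an idle cycle downstream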
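The first bullet follows from counting credits: each credit can be reused only once per round trip, so with C credits and a round-trip time of T cycles, a link that carries at most one flit per cycle sustains at most min(1, C/T) flits per cycle. The Python sketch below checks this cycle by cycle under simplifying assumptions (fixed round trip, no downstream congestion, at most one flit per cycle); `credit_limited_throughput` is a hypothetical name.

```python
from collections import deque


def credit_limited_throughput(credits: int, rtt: int, cycles: int) -> float:
    """Fraction of cycles in which the upstream router can send a flit."""
    available = credits
    returns = deque()   # cycle at which each outstanding credit comes back (FIFO)
    sent = 0
    for cycle in range(cycles):
        # Credits whose round trip has completed can be used again.
        while returns and returns[0] <= cycle:
            returns.popleft()
            available += 1
        if available > 0:
            available -= 1
            sent += 1
            returns.append(cycle + rtt)
        # else: credit stall, so the link (and the downstream router) idles this cycle
    return sent / cycles


for c in (4, 8, 16):
    print(c, credit_limited_throughput(credits=c, rtt=8, cycles=800))
# 4 0.5
# 8 1.0
# 16 1.0
```

With a round trip of 8 cycles, 4 credits leave the link idle half the time, while 8 or more credits sustain one flit per cycle, matching the statement that full throughput needs Tcrt,0 credits.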
Quota Motivation (2)
[Figure: two timing diagrams of Router 0 and Router 1 over time, with congestion stalls, queuing stalls, and credit stalls; the credit round trip grows to Tcrt,0 + Tstall]
• A congestion stall causes unproductive buffer occupancy (excess flits)
• Matching credit stalls to congestion stalls avoids unproductive buffer occupancy (excess flits are drained)
Quota Algorithm
• A VC’s quota value = throughput * RTTmin
  – The throughput of the upstream router is hard to measure
  ⇒ Compute quota values based on the observed RTT of individual credits
Quota Heuristic
• Track the credit RTT for each output VC
• RTT = RTTmin ⇒ set quota to RTTmin
  – No downstream congestion ⇒ allow one flit in each cycle of the RTT interval
• RTT > RTTmin ⇒ subtract the difference from RTTmin
  – Each congestion and queuing stall adds to the RTT ⇒ allow one credit stall per downstream stall
Quota Equation
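• Q = max(Tcrt,base - (Tcrt,obs - Tcrt,base), 1) = max(2 * Tcrt,base - Tcrt,obs, 1)
  – Tcrt,base = RTTmin, Tcrt,obs = observed credit RTT
  – When Tcrt,obs is large, Q is small
  – Qmin = 1 guarantees that quota values can continue to be updated (the VC can always send at least one flit, which yields a new RTT sample)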
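The equation (and the heuristic on the previous slide) reduces to a one-line function. This Python sketch uses hypothetical names; rejecting an observed round trip shorter than the base value is an assumption of the sketch, since RTTmin is the minimum by construction.

```python
def abp_quota(t_crt_base: int, t_crt_obs: int) -> int:
    """Q = max(Tcrt,base - (Tcrt,obs - Tcrt,base), 1) = max(2 * Tcrt,base - Tcrt,obs, 1)."""
    if t_crt_obs < t_crt_base:
        # Assumption of this sketch: Tcrt,base is the minimum possible round trip.
        raise ValueError("observed round trip is shorter than the base round trip")
    stall_cycles = t_crt_obs - t_crt_base   # congestion + queuing stalls downstream
    # One credit is withheld per downstream stall cycle; never drop below one.
    return max(t_crt_base - stall_cycles, 1)


for obs in (8, 11, 30):
    print(obs, abp_quota(t_crt_base=8, t_crt_obs=obs))
# 8 8
# 11 5
# 30 1
```

With Tcrt,base = 8, three stall cycles (observed round trip 11) remove three credits, and a heavily congested VC falls to the floor value of one credit.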
Implementation
• Network design determines RTTmin for each link
• Track the RTT of a single in-flight credit per VC
• Update the quota value when that credit returns
• Switch allocator masks all VCs that exceed their quota
⇒ Simple extension to existing flow control logic
⇒ No additional signaling required
⇒ < 5% overhead for a 16x64b buffer with 4 VCs
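The implementation bullets can be sketched as per-VC bookkeeping in Python. This is an illustration under assumptions, not the authors' hardware: we interpret the quota as a cap on the credits a VC may have outstanding, assume credits for one VC return in the order their flits were sent (so the timed credit can be identified by counting), and use hypothetical names (`VcQuota`, `eligible`, `on_flit_sent`, `on_credit_returned`).

```python
from typing import Optional


class VcQuota:
    """Hypothetical per-output-VC state for Adaptive Backpressure (sketch)."""

    def __init__(self, rtt_min: int) -> None:
        self.rtt_min = rtt_min          # base credit round trip, fixed by network design
        self.quota = rtt_min            # start unrestricted: one flit per cycle of RTTmin
        self.outstanding = 0            # flits sent whose credits have not returned yet
        self.probe_sent_at: Optional[int] = None  # send cycle of the one timed flit
        self.credits_before_probe = 0   # credits that will return before the timed one

    def eligible(self) -> bool:
        """Switch allocator mask: False once the VC has used up its quota."""
        return self.outstanding < self.quota

    def on_flit_sent(self, cycle: int) -> None:
        if self.probe_sent_at is None:
            # Time this flit's credit; credits already outstanding return first.
            self.probe_sent_at = cycle
            self.credits_before_probe = self.outstanding
        self.outstanding += 1

    def on_credit_returned(self, cycle: int) -> None:
        self.outstanding -= 1
        if self.probe_sent_at is None:
            return
        if self.credits_before_probe > 0:
            self.credits_before_probe -= 1
            return
        # The timed credit is back: apply Q = max(2 * RTTmin - RTTobs, 1).
        rtt_obs = cycle - self.probe_sent_at
        self.quota = max(2 * self.rtt_min - rtt_obs, 1)
        self.probe_sent_at = None


vc = VcQuota(rtt_min=4)
for cycle in range(4):
    vc.on_flit_sent(cycle)      # the flit sent at cycle 0 is the timed one
print(vc.eligible())            # False: 4 credits outstanding, quota 4
vc.on_credit_returned(9)        # timed credit took 9 cycles -> quota max(8 - 9, 1) = 1
print(vc.quota)                 # 1
```

The floor of one credit matters here: even a VC with quota 1 can send a flit whenever it has nothing outstanding, so a new RTT sample is taken and the quota can recover once congestion clears.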
Evaluation Methodology
• BookSim 2.0
• 8x8 2D mesh, 64-bit channels, dimension-order routing (DOR)
• 16-slot input buffers, 4 VCs
• Combined VC and switch allocation
• Synthetic traffic and application benchmarks
• Compare ABP to unrestricted sharing
Network Stability (1)
• For adversarial traffic, throughput in a mesh is unstable at high load
  – Traffic merging causes starvation
  – Tree saturation causes widespread congestion
• ABP improves stability
  – Throttles sources that inject at a very high rate
  – Efficient buffer use reduces tree saturation
⇒ Faster recovery from transient congestion
Network Stability (2) [tornado traffic]
[Figure: results plot; annotation: 6.3x]
Network Stability (3) [foreground traffic at 50% injection rate]
[Figure: results plot; annotations: 3.3x, -13% saturation rate]
Performance Isolation (1)
• Inject two classes of traffic into the network
  – Shared buffer space, separate VCs
⇒ Sharing causes interference between classes (higher latency)
• ABP reduces interference
  – Contains the effects of congestion within a class
⇒ Better isolation between workloads, VMs, …
Performance Isolation (2) [uniform random foreground traffic]
[Figure: panels for hotspot background traffic and uniform random background traffic; annotations: -33%, -38%]
Performance Isolation (3) [50% uniform random background traffic]
[Figure: results plot with a "w/o background" reference; annotation: -31%]
Application Performance
[12.5% injection rate for streaming traffic]
[Figure: results plot with a "w/o background" reference; annotation: -31%]
Conclusions
• Sharing improves buffer utilization, but can lead to undesired interference effects
• Adaptive Backpressure regulates credit flow to avoid unproductive use of shared buffer space
• Mitigates performance degradation in the presence of adversarial traffic
• But maintains key benefits of buffer sharing under benign conditions
THE END
Thank you for your attention!
Questions?