Technische Universität München, Institute for Integrated Systems
Comparison of Deadlock Recovery and Avoidance Mechanisms to Approach Message
Dependent Deadlocks in on-Chip Networks
Andreas Lankes¹, Soeren Sonntag², Helmut Reinig³, Thomas Wild¹, Andreas Herkersdorf¹
¹ Technische Universität München, Institute for Integrated Systems
² Lantiq GmbH Deutschland
³ Infineon Technologies AG, Intellectual Property Reuse
NOCS 2010, Andreas Lankes
Networks-on-Chip & Deadlocks
• Packet-switched NoCs susceptible to deadlocks
– Especially wormhole forwarding
• Routing cycles in channel dependency diagram
[Figure: routing cycle between sources (S) and destinations (D)]
Deadlock Prevention
• Removal of routing cycles
– Implementation of virtual channels and adaptation of routing function
– Restriction of routing function
[Figure: allowed turns and forbidden turns]
Message Dependent Deadlocks
• Network itself free of routing cycles
• Communication contains message dependencies
– Memory access: read request -> read response
– DMA transaction
– ...
• N-way protocol
– N dependent messages or message types
[Figure: a CPU sends a request packet to a memory, which returns a response packet. The message dependency between request and response packet creates a forbidden turn!]
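The forbidden turn caused by a request/response dependency can be illustrated with a minimal sketch. It assumes dimension-ordered XY routing on a 2D mesh (as used later in the comparison); the tile coordinates and the helper names `xy_route` and `has_y_to_x_turn` are hypothetical and only serve the illustration.

```python
def xy_route(src, dst):
    """Return the hop directions of dimension-ordered XY routing."""
    (sx, sy), (dx, dy) = src, dst
    hops = []
    # Travel along X first ...
    hops += ["E" if dx > sx else "W"] * abs(dx - sx)
    # ... then along Y.
    hops += ["N" if dy > sy else "S"] * abs(dy - sy)
    return hops


def has_y_to_x_turn(hops):
    """True if a Y hop is directly followed by an X hop (not allowed by XY routing)."""
    return any(a in "NS" and b in "EW" for a, b in zip(hops, hops[1:]))


cpu, mem = (0, 0), (2, 2)        # hypothetical tile coordinates
request = xy_route(cpu, mem)     # X hops, then Y hops
response = xy_route(mem, cpu)    # X hops, then Y hops
chain = request + response       # the memory couples both packets

print(has_y_to_x_turn(request), has_y_to_x_turn(response), has_y_to_x_turn(chain))
```

Each packet on its own respects the routing restriction. Only the chain of request and response, coupled at the memory, contains a Y-to-X turn, which is how a cycle can close although the network itself is free of routing cycles.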
Message Dependent Deadlock Avoidance
• Buffer sizing
– Destination tile guarantees reception of all packets -> huge input buffers
• End-to-end flow control
– Limitation of sender quota, e.g. credit based
• Strict ordering
– Separation of message types in different networks
– E.g. virtual channels: buffer size rises with number of dependent messages
[Figure: switch and link with 2 virtual channels]
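Credit-based end-to-end flow control, named above as one way to limit the sender quota, might look like the following sketch. The class `CreditSender` and the rule "one credit equals one reserved buffer slot at the destination" are assumptions made for illustration, not details taken from the slides.

```python
class CreditSender:
    """Sender whose number of outstanding packets is bounded by credits."""

    def __init__(self, credits):
        # Assumption: one credit = one reserved buffer slot at the destination.
        self.credits = credits

    def try_send(self):
        """Send one packet if a credit is available."""
        if self.credits == 0:
            return False  # quota exhausted: packet stays in the source tile
        self.credits -= 1
        return True

    def on_credit_return(self):
        """Destination consumed a packet and freed a slot."""
        self.credits += 1


sender = CreditSender(credits=2)
sent = [sender.try_send() for _ in range(3)]
sender.on_credit_return()
sent_after_return = sender.try_send()
print(sent, sent_after_return)
```

Because a packet is only injected when the destination has room for it, packets never wait inside the network for a full destination buffer.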
Table of Contents
• Introduction
• Message Dependent Deadlock Recovery for NoCs
• Comparison of Deadlock Recovery and Deadlock Avoidance
• Conclusion
Deadlock Avoidance
Strict ordering with virtual channels
• Additional buffer queues per port (number of message types!)
[Figure: router with input buffers (I0, I1) and output buffers (O0, O1), each built from virtual channel queues; 2 virtual channel queues PER port]
Deadlock Recovery in HPC
Additional channel in the network reserved for deadlocked packets
• In all routers and network interfaces
• Central to the router
• Redirection from input- and output buffers
[Figure: router with input buffers (I0, I1), output buffers (O0, O1) and a central deadlock recovery control unit, showing the normal path of a packet and the redirection of a packet. Callouts: timer based deadlock detection (Tin, Tout); reserved deadlock channel; reserved deadlock channel as virtual channel]
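Timer based deadlock detection can be sketched as a counter of cycles in which the head flit of a queue makes no progress. The class `DeadlockTimer` and its interface are hypothetical; the slides only state that detection is timer based and use a detection threshold in cycles.

```python
class DeadlockTimer:
    """Counts consecutive cycles in which the head flit of a queue could not advance."""

    def __init__(self, threshold):
        self.threshold = threshold  # detection threshold in cycles
        self.blocked_cycles = 0

    def tick(self, head_advanced):
        """Call once per cycle; returns True when the packet is presumed deadlocked."""
        if head_advanced:
            self.blocked_cycles = 0
        else:
            self.blocked_cycles += 1
        return self.blocked_cycles >= self.threshold


timer = DeadlockTimer(threshold=3)
blocked = [timer.tick(head_advanced=False) for _ in range(3)]
after_progress = timer.tick(head_advanced=True)
print(blocked, after_progress)
```

A timer only observes missing progress, so it cannot tell a true deadlock from a packet that is merely stuck in a congested network.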
Deadlock Recovery for NoCs
Avoid deadlocks in reserved deadlock channel
• Strict ordering in deadlock recovery channel
• Exclusive access to deadlock virtual channels
[Figure: router with input buffers (I0, I1), output buffers (O0, O1) and a central deadlock recovery control unit (Tin, Tout), showing the normal path of a packet and the redirection of a packet. Callout: reserved channel with nested deadlock virtual channels]
Access Regulation Scheme
Exclusive access to each deadlock virtual channel by token based access scheme
• Tokens circulate through the token distribution ring network
• On redirection:
– Token travels with redirected packets
– Released on reception in the destination
[Figure: tiles, each with a router and a network interface, connected by the token distribution ring network]
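The token based access scheme can be sketched for a single deadlock virtual channel as follows. The class `TokenRing`, the tile numbering, and the choice to let the token re-enter the ring at the destination tile are assumptions; the slides only state that the token travels with the redirected packet and is released on reception in the destination.

```python
class TokenRing:
    """One token for one deadlock virtual channel, circulating over the tiles."""

    def __init__(self, num_tiles):
        self.num_tiles = num_tiles
        self.position = 0     # tile currently visited by the free token
        self.in_use = False   # True while the token travels with a redirected packet

    def step(self):
        """Advance the free token to the next tile on the ring."""
        if not self.in_use:
            self.position = (self.position + 1) % self.num_tiles

    def try_acquire(self, tile):
        """A tile may redirect a packet only while the free token is at that tile."""
        if not self.in_use and self.position == tile:
            self.in_use = True
            return True
        return False

    def release(self, destination_tile):
        """Release on reception; assumed to re-enter the ring at the destination."""
        self.in_use = False
        self.position = destination_tile


ring = TokenRing(num_tiles=4)
too_early = ring.try_acquire(tile=2)   # token is still at tile 0
ring.step()
ring.step()
granted = ring.try_acquire(tile=2)     # token has reached tile 2
second = ring.try_acquire(tile=2)      # channel already in exclusive use
ring.release(destination_tile=3)
print(too_early, granted, second, ring.position)
```

Since only the token holder may use the deadlock virtual channel, at most one redirected packet occupies it at any time.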
Enable Redirection of Packets
• Problems:
– Buffers implemented as FIFO queues
– Wormhole forwarding
• Header flits always at first position in queues
– Restrict switching function
– Restrict flow control function
• Reduction of effective buffer size -> reduced throughput
[Figure: router with input buffers (I0, I1), output buffers (O0, O1) and deadlock recovery control unit (Tin, Tout), showing the normal path of a packet and the redirection of a packet. Callout: packet 1 must not be switched]
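One way to keep header flits at the first position of a FIFO queue is sketched below. It assumes the restricted flow control admits a header flit only into an empty queue; this rule and the class `HeaderFirstQueue` are an interpretation for illustration, not a mechanism confirmed by the slides.

```python
from collections import deque


class HeaderFirstQueue:
    """FIFO queue in which a header flit can only ever sit at the first position."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.flits = deque()

    def can_accept(self, is_header):
        if len(self.flits) >= self.capacity:
            return False
        # Assumed restriction: a new packet may only enter an empty queue,
        # so its header flit ends up at the head of the FIFO.
        if is_header and self.flits:
            return False
        return True

    def push(self, flit, is_header):
        if not self.can_accept(is_header):
            return False
        self.flits.append(flit)
        return True

    def pop(self):
        return self.flits.popleft()


queue = HeaderFirstQueue(capacity=4)
accepted = [queue.push("H1", True), queue.push("D1", False), queue.push("H2", True)]
queue.pop()
queue.pop()
accepted_when_empty = queue.push("H2", True)
print(accepted, accepted_when_empty)
```

The second header is refused although two slots are free, which shows how such a restriction reduces the effective buffer size.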
Back-off Mechanism
• Timer based deadlock detection: congested network
• Back-off mechanism
– Back-off token in token ring network
– Forced sending stop for tiles
[Figure: tiles with router and network interface on the token distribution ring network; the deadlock recovery unit disables sending]
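The forced sending stop can be sketched as a tile that suppresses injection for a fixed number of cycles after it sees the back-off token. The class `Tile` and its methods are hypothetical names, and the fixed back-off period in cycles is the only timing assumed.

```python
class Tile:
    """Tile whose network interface stops injecting flits while backed off."""

    def __init__(self):
        self.backoff_remaining = 0

    def on_backoff_token(self, backoff_period):
        """Back-off token received: disable sending for backoff_period cycles."""
        self.backoff_remaining = backoff_period

    def cycle(self, wants_to_send):
        """Return True if a flit is injected in this cycle."""
        if self.backoff_remaining > 0:
            self.backoff_remaining -= 1
            return False
        return wants_to_send


tile = Tile()
before = tile.cycle(wants_to_send=True)
tile.on_backoff_token(backoff_period=2)
during = [tile.cycle(wants_to_send=True) for _ in range(3)]
print(before, during)
```

Pausing injection gives a congested network time to drain before the detection timers fire again.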
Table of Contents
• Introduction
• Message Dependent Aware Deadlock Recovery for NoCs
• Comparison of Deadlock Recovery and Deadlock Avoidance
• Conclusion
Comparison of Deadlock Avoidance & Recovery
• Common system architecture
– 8x8 2D mesh architecture
– XY routing, wormhole forwarding
• Applied traffic
– Inter-processor traffic (uniform distribution, rate constant)
– Memory access traffic (uniform or varying localization, rate iterated)
• Deadlock recovery (MeshDr)
• Deadlock avoidance: strict ordering using virtual channels (Mesh)
CPU CPU MEM CPU CPU
CPU CPU CPU CPU CPU
MEM CPU CPU CPU MEM
CPU CPU CPU CPU CPU
CPU CPU MEM CPU CPU
Buffer Size Comparison
• Deadlock recovery saves almost 50% of total buffer space
– For 2 dependent messages
[Figure: Buffer Space of Networks. Bar chart of buffer space [flits] (axis 0 to 12000) for Mesh, MeshDr, MeshExtBuf, MeshDrExtBuf, MeshExtBuf2 and MeshDrExtBuf2; length of routers' buffer queues: 2 flits, 4 flits, 8 flits]
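Why the savings stay just below one half for 2 dependent messages can be seen from a rough parametric model. All parameters below (5 ports per router, input and output buffer per port, a reserved channel of `recovery_len` flits per nested deadlock virtual channel, one such channel per message type, network interface buffers ignored) are assumptions for illustration. The model does not reproduce the measured values of the figure.

```python
def avoidance_buffer_flits(routers, ports, queue_len, message_types):
    """Strict ordering: one virtual channel queue per message type in every buffer."""
    return routers * ports * 2 * queue_len * message_types  # 2 = input + output buffer


def recovery_buffer_flits(routers, ports, queue_len, message_types, recovery_len):
    """Recovery: a single queue per buffer plus one central reserved channel per router."""
    normal = routers * ports * 2 * queue_len
    reserved = routers * message_types * recovery_len  # nested deadlock virtual channels
    return normal + reserved


def savings(message_types, routers=64, ports=5, queue_len=4, recovery_len=1):
    """Fraction of buffer space saved by recovery compared to avoidance."""
    avoidance = avoidance_buffer_flits(routers, ports, queue_len, message_types)
    recovery = recovery_buffer_flits(routers, ports, queue_len, message_types, recovery_len)
    return 1 - recovery / avoidance


print(savings(2), savings(3))
```

In this model the avoidance scheme multiplies every port buffer by the number of message types, while the recovery scheme adds only a small central channel, so the savings approach 1 - 1/N for N dependent messages and grow with N.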
Memory Throughput
• Deadlock avoidance outperforms deadlock recovery
• Throughput of deadlock recovery depends on timings
[Figure: Memory Throughput. X axis: request flit generation rate of one processor (0.002 to 0.020). Y axis: send rate of response flits of a memory (0.0 to 0.6). Curves: Mesh, MeshDrT1, MeshDrT2, MeshDrT3, MeshDrT4]

Name of timings   Deadlock detection threshold [cycles]   Back-off period [cycles]
T1                100                                     100
T2                100                                     150
T3                150                                     150
T4                50                                      50
Localization of Memory Access Traffic
• Processors prefer nearer memories
• Deadlock recovery profits from localization
[Figure: Memory Throughput. X axis: request flit generation rate of one processor [flits/cycle] (0.002 to 0.020). Y axis: send rate of response flits of a memory (0.1 to 0.6). Curves: Mesh, MeshLoc0, MeshLoc1, T1, T1Loc0, T1Loc1. Annotation: increasing localization]
CPU CPU MEM CPU CPU
CPU CPU CPU CPU CPU
MEM CPU CPU CPU MEM
CPU CPU CPU CPU CPU
CPU CPU MEM CPU CPU
Comparison of Networks with equal Buffer Space
• Higher throughput for recovery scheme with equal buffer space (for localized memory access traffic)
[Figure: Memory Throughput. X axis: request flit generation rate of one processor [flits/cycle] (0.002 to 0.020). Y axis: send rate of response flits of a memory (0.1 to 0.7). Curves: MeshLoc1, MeshLoc1ExtBuf, MeshLoc1ExtBuf2, T1Loc1, T1Loc1ExtBuf, T1Loc1ExtBuf2]
[Figure: Buffer Space of Networks. Bar chart of buffer space [flits] (axis 0 to 12000) for Mesh, MeshDr, MeshExtBuf, MeshDrExtBuf, MeshExtBuf2 and MeshDrExtBuf2; length of routers' buffer queues: 2 flits, 4 flits, 8 flits. Annotation: approx. equal buffer space]
Table of Contents
• Introduction
• Message Dependent Aware Deadlock Recovery for NoCs
• Comparison of Deadlock Recovery and Deadlock Avoidance
• Conclusion
Conclusion
• Significant savings in buffer space
– For 2 dependent messages almost 50%
– Savings increase with number of dependent messages
• Comparable buffer space leads to throughput advantage (for localized memory traffic)
• Future work
– Deadlock detection
– Random access to buffer queues
– ...
Thank You!
Any Questions?
Effects of Restricted Switching & Flow Control
• Reduction of effective buffer size
• Reduction of throughput
[Figure: Transfer Latency of Uniform Traffic. X axis: flit generation rate [flits/cycle] (0 to 0.3). Y axis: latency [ns] (40 to 200). Curves: Mesh:pl=3, MeshDr:pl=3, Mesh:pl=10, MeshDr:pl=10]