Upload
olympe
View
41
Download
6
Tags:
Embed Size (px)
DESCRIPTION
Déjà Vu Switching for Multiplane NoCs. Ahmed Abousamra. Rami Melhem. Alex Jones. University of Pittsburgh. Motivation. Power efficiency has become a primary concern in the design of CMPs. The NoC of Intel’s TeraFLOPS processor consumes more than 28% of the chip’s power. - PowerPoint PPT Presentation
Citation preview
Déjà Vu Switching for Multiplane NoCs NOCS’12
Déjà Vu Switching for Multiplane NoCs
University of Pittsburgh
Ahmed Abousamra Rami Melhem Alex Jones
Déjà Vu Switching for Multiplane NoCs NOCS’12
Motivation
Power efficiency has become a primary concern in the design of CMPs.The NoC of Intel’s TeraFLOPS processor consumes more than 28% of the chip’s power.Network messages can be classified into critical and non-criticalIt may be possible to send non-critical messages on a slower plane without hurting performance.
Déjà Vu Switching for Multiplane NoCs NOCS’12
SettingBaseline Single
Plane NoC
Control Plane Data Plane
• Carries control and coherence messages: data requests, invalidates, acknowledgments, …
• Operates at baseline’s voltage & frequency
• Carries data messages
• Operates at a lower voltage & frequency to save power
Déjà Vu Switching for Multiplane NoCs NOCS’12
Outline
Motivation & Related workImportance of data messages to performanceThe Optimization Problem Proposed Solution
Déjà Vu SwitchingAnalysis of the acceptable data plane speed
EvaluationSummary
Déjà Vu Switching for Multiplane NoCs NOCS’12
Importance of data messages to performance
1 2 3 4 5 6 7 8Cache line having 8 data words
Data Message
5 1 2 3 4 6 7 8Header
Critical Word
Subsequent miss for another word in the line Delayed cache hit
Critical WordNon-Critical Words
Déjà Vu Switching for Multiplane NoCs NOCS’12
Importance of data messages to performance
If delayed cache hits are overly delayed, performance can suffer.
Percentage of L1 misses that are delayed cache hits
barn
es
blac
ksch
oles
body
trac
k
fluid
anim
ate
lu c
ontig
lu n
onco
ntig
ocea
n co
ntig
radi
osity
radi
x
rayt
race
spec
jbb
wat
er n
squa
red
wat
er sp
atial
Geom
etric
mea
n
0%5%
10%15%20%25%30%35%40%45%50%
% o
f L1
miss
es t
hat a
re d
elay
ed h
its
Déjà Vu Switching for Multiplane NoCs NOCS’12
Outline
Motivation & Related workImportance of data messages to performanceThe Optimization ProblemProposed Solution
Déjà Vu SwitchingAnalysis of the acceptable data plane speed
EvaluationSummary
Déjà Vu Switching for Multiplane NoCs NOCS’12
The Optimization Problem
How slow can we operate the data plane to maximize energy savings while not impacting performance?
Déjà Vu Switching for Multiplane NoCs NOCS’12
Proposed Solution
Split NoC physically into 2 planes: control + dataOn data plane:
Use circuit-switching to speed-up communication.Reduce voltage & frequency.
Send a control message to establish the circuit once a cache hit is detected.Do not block circuit establishment message: Déjà Vu SwitchingAnalyze acceptable slow down of the data plane to minimize energy while maintaining performance.
Déjà Vu Switching for Multiplane NoCs NOCS’12
Déjà Vu Switching – Circuit SwitchingRequest Packet Reservation Packet Data Packet
Control Plane
Data Plane
1) Upon a cache miss, a data request is sent
Déjà Vu Switching for Multiplane NoCs NOCS’12
Déjà Vu Switching – Circuit SwitchingRequest Packet Reservation Packet Data Packet
Control Plane
Data Plane
2) Upon a data hit in the next cache level, the circuit reservation packet is sent
Déjà Vu Switching for Multiplane NoCs NOCS’12
Déjà Vu Switching – Circuit SwitchingRequest Packet Reservation Packet Data Packet
Control Plane
Data Plane
3) Finally, the data packet is sent to the requester on the reserved circuit
Déjà Vu Switching for Multiplane NoCs NOCS’12
Déjà Vu Switching – Conflicting Reservations
23 1
23 1
Control Plane
Data Plane Reserved Circuits
Reservation Packets
Data Packets
East-NorthWest-NorthWest-East
Router
Déjà Vu Switching for Multiplane NoCs NOCS’12
Reservation Queue of West Input Port
Reservation Queue of East Output Port
Reserving the West-East Connection: E W
Déjà Vu Switching - Reservation Scheme
Head of theQueues
Port Names: North West South East Local
Déjà Vu Switching for Multiplane NoCs NOCS’12
S S
Reservation Queues of the Input Ports
Reservation Queues of the Output Ports
Déjà Vu Switching – Realizing Connections
EN
NN N
S W
North
West
South
East
Local
E
W N
Head of theQueues
North
West
South
East
Local
Port Names: North West South East Local
Déjà Vu Switching for Multiplane NoCs NOCS’12
Reservation Queues of the Input Ports
Reservation Queues of the Output Ports
Déjà Vu Switching – Realizing Connections
N
S
North
West
South
East
Local
E
W N
Head of theQueues
North
West
South
East
Local
N
Port Names: North West South East Local
Déjà Vu Switching for Multiplane NoCs NOCS’12
Reservation Queues of the Input Ports
Reservation Queues of the Output Ports
Déjà Vu Switching – Realizing Connections
N
S
North
West
South
East
Local
E
W N
Port Names: North West South East Local
Head of theQueues
North
West
South
East
Local
N
Déjà Vu Switching for Multiplane NoCs NOCS’12
Déjà Vu Switching – Ensuring Correct Routing
Each input and output port must independently track the reserved circuits it is part of.Any two reservation packets that share part of their paths, must traverse all the shared links in the same orderData packets must be injected onto the data plane in the same order their reservation packets are injected onto the control plane.
Déjà Vu Switching for Multiplane NoCs NOCS’12
Outline
Motivation & Related workImportance of data messages to performanceThe Optimization ProblemProposed Solution
Déjà Vu SwitchingAnalysis of the acceptable data plane speed
EvaluationSummary
Déjà Vu Switching for Multiplane NoCs NOCS’12
Reservations should always be ahead of data packets
82345679121011113Time
Reservation Packet injected early at cycle 1Data Packet injected at cycle 3
Déjà Vu Switching for Multiplane NoCs NOCS’12
Reservations should always be ahead of data packets
82345679121011113Time
Reservation Packet injected early at cycle 1Data Packet injected at cycle 3
Déjà Vu Switching for Multiplane NoCs NOCS’12
Acceptable data plane slowdown - Message Latency
Déjà Vu Switching for Multiplane NoCs NOCS’12
Outline
Motivation & Related workImportance of data messages to performanceThe Optimization ProblemProposed Solution
Déjà Vu SwitchingAnalysis of the acceptable data plane speed
EvaluationSummary
Déjà Vu Switching for Multiplane NoCs NOCS’12
Evaluation Environment
We use the functional simulator, Simics, to simulate cache coherent CMPs of 16 and 64 cores.We use Orion2 to get power numbers for the interconnect routersWe evaluate with
Synthetic traces: allows varying the network loadExecution driven simulation of parallel benchmarks from the SPLASH-2 and PARSEC suits
Interconnect Topology: Mesh
Déjà Vu Switching for Multiplane NoCs NOCS’12
Evaluation Environment
Baseline NoC: Single plane, 16 byte links, packet switched with 3 cycles router pipeline, clocked at 4 GHz
Evaluated NoC:Control plane: 6 byte links, packet switched, 4GHzData plane: 10 byte links, circuit switched.
Control and Coherence Packets: 1-flitData Packets:
Baseline NoC: 5 flitsData Plane: 7 flits
Déjà Vu Switching for Multiplane NoCs NOCS’12
Evaluation – Synthetic Traces
Normalized NoC energy and completion timeof synthetic traces on a 64-core CMP
0.01 0.03 0.050%
20%
40%
60%
80%
100%
120%4 GHz 4/4 GHz 4/3 GHz 4/2.66 GHz 4/2 GHz
Norm
alize
d E
nerg
y Co
nsum
ption
Traffic Injection Rate (request / cycle / node) 0.01 0.03 0.05
0%
20%
40%
60%
80%
100%
120%4 GHz 4/4 GHz 4/3 GHz 4/2.66 GHz 4/2 GHz
Traffic Injection Rate (request / cycle / node)
Norm
alize
d C
ompl
etion
Tim
e
*Each request receives a reply data packet
Déjà Vu Switching for Multiplane NoCs NOCS’12
barn
es
blac
ksch
oles
body
trac
k
fluid
anim
ate
lu c
ontig
lu n
onco
ntig
ocea
n co
ntig
radi
osity
radi
x
rayt
race
spec
jbb
wat
er n
squ.
..
wat
er s
patia
l
Geo
met
ric...
0%
20%
40%
60%
80%
100%
120%4 GHz 4/4 GHz 4/3 GHz 4/2.66 GHz 4/2 GHz
Nor
mal
ized
Exe
cutio
n
Tim
eEvaluation – Execution Driven Simulation
Normalized execution time on a 16-core CMP
Déjà Vu Switching for Multiplane NoCs NOCS’12
barn
es
blac
ksch
oles
body
trac
k
fluid
anim
ate
lu c
ontig
lu n
onco
ntig
ocea
n co
ntig
radi
osity
radi
x
rayt
race
spec
jbb
wat
er n
squ.
..
wat
er s
patia
l
Geo
met
ric...
0%10%20%30%40%50%60%70%80%90%
100%
4 GHz 4/4 GHz 4/3 GHz 4/2.66 GHz 4/2 GHz
Nor
mal
ized
Ene
rgy
Con-
sum
ption
Evaluation – Execution Driven Simulation
Normalized NoC energy on a 16-core CMP
Déjà Vu Switching for Multiplane NoCs NOCS’12
barn
es
blac
ksch
oles
body
trac
k
fluid
anim
ate
lu co
ntig
lu n
onco
ntig
ocea
n co
ntig
radi
osity
radi
x
spec
jbb
wat
er n
squa
red
wat
er sp
atial
Geom
etric
mea
n
0%20%40%60%80%
100%120%140%
4 GHz 4/4 GHz 4/3 GHz 4/2.66 GHz 4/2 GHz
Nor
mal
ized
Exe
cutio
n T
ime
barn
es
blac
ksch
oles
body
trac
k
fluid
anim
ate
lu co
ntig
lu n
onco
ntig
ocea
n co
ntig
radi
osity
radi
x
spec
jbb
wat
er n
squa
red
wat
er sp
atial
Geom
etric
mea
n
0%10%20%30%40%50%60%70%80%90%
100%4 GHz 4/4 GHz 4/3 GHz 4/2.66 GHz 4/2 GHz
Nor
mal
ized
Ene
rgy
Cons
umpti
on
Evaluation – Execution Driven Simulation
Normalized NoC energy and execution timeon a 64-core CMP
Déjà Vu Switching for Multiplane NoCs NOCS’12
barn
es
blac
ksch
oles
body
trac
k
fluid
anim
ate
lu c
ontig
lu n
onco
ntig
ocea
n co
ntig
radi
osity
radi
x
rayt
race
spec
jbb
wat
er n
squa
red
wat
er s
patia
l
Geo
met
ric m
ean
90%
95%
100%
105%
110%
PS 4/4 DV 4/4 PS 4/2.66 PS+CW 4/2.66 DV 4/2.66
Perf
orm
ance
Rel
ative
to S
ingl
e Pl
ane
Base
line
Evaluation – Isolating Effect of Déjà Vu Switching
Performance relative to CMP with a single plane NoCon a 16-core CMP
Déjà Vu Switching for Multiplane NoCs NOCS’12
Related Work
L. Cheng et al. (ISCA'06) and A. Flores et al. (IEEE Trans. Computers'10): Heterogeneous NoC using wires of different latency and power characteristics to improve performance and reduce NoC energy. Proposal requires wide links (75 bytes), but performance degrades with narrow links.Our work differs in:
Tying latency of messages to performanceUsing Déjà Vu Switching to Compensate for slower data plane
Déjà Vu Switching for Multiplane NoCs NOCS’12
SummaryProblem: Saving power in the NoC by reducing the data plane’s power consumption without impacting performance.Delayed cache hits are important to performance.Operating data plane in circuit-switched mode allows it to operate at reduced frequency.Déjà Vu Switching allows reservation to proceed when resources are not currently available.The constraints governing the speed of the data plane can be estimated analytically.
Déjà Vu Switching for Multiplane NoCs NOCS’12
Thank you!