Explicit Modeling of Control and Data for Improved NoC Router Estimation
Andrew B. Kahng+*, Bill Lin* and Siddhartha Nath+
UCSD CSE+ and ECE* Departments
{abk, billlin, sinath}@eng.ucsd.edu
Outline
• Motivation
• Our work: Overview
• Methodology
• Flit-level power estimation
• Summary
NoC Modeling So Far… (ORION)
[Figure: router block diagram with input buffers (BUF I, BUF E, BUF W, BUF N, BUF S), an arbiter, and a crossbar (XBAR), plus Link segments, SRC, and SINK]
ORION1.0 (2002)
6NOR + 2INV + DFF
ORION2.0 (2009)
6NOR + 2INV + DFF
Leakage power
Clock power
What Is The Problem?
• RTL code mismatch
• Logic transformation and technology mapping mismatch
[Figure: router block diagram (arbiter, XBAR, input buffers BUF I/E/W/N/S, Link segments, SRC, SINK), annotated with the assumed template 6NOR + 2INV + DFF]
How Bad Is It?
• Router RTL generators:
– Netmaker (Cambridge, UK)
– Stanford NoC (Stanford)
[Figure: instance count vs. # ports (5, 6, 8, 10) for ORION2.0, NetMaker, and Stanford; annotated 460%]
[Figure: instance count vs. flit-width in bits (16, 24, 32, 64) for ORION2.0, NetMaker, and Stanford; annotated 89%]
Why such large errors?
• Assumed logic template inaccurate
• Control logic not modeled
• Implementation details missing
We Propose: Step 1
• Derive router component block parametric models from post-synthesis netlists
P = # ports, V = # VCs, B = # BUFs, F = flit-width

P    V   B   F    # Instances
10   2   8   32   3300
8    2   8   32   2112
5    2   8   32   825
Trend: # instances ~ $P^2$

P    V   B   F    # Instances
5    2   8   16   400
5    2   8   32   825
5    2   8   64   1673
Trend: # instances ~ $F$

Combined: XBAR ~ $P^2 F$

Key idea: No assumed logic template. Component models are derived from actual RTL synthesized with cell libraries.
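The scaling trends in the two tables can be checked numerically: on a log-log scale, the slope between two data points estimates the exponent. The following is a minimal sketch for illustration only. The helper name `scaling_exponent` is hypothetical, and the instance counts are copied from the tables on this slide.

```python
import math

# Instance counts copied from the slide's tables (V=2, B=8 in every row).
by_ports = {5: 825, 8: 2112, 10: 3300}        # F = 32, P varies
by_flit_width = {16: 400, 32: 825, 64: 1673}  # P = 5, F varies

def scaling_exponent(x0, y0, x1, y1):
    """Exponent k such that y ~ x**k between two points (log-log slope)."""
    return math.log(y1 / y0) / math.log(x1 / x0)

# Hypothetical helper applied to the smallest and largest setting of each sweep.
exp_ports = scaling_exponent(5, by_ports[5], 10, by_ports[10])
exp_flit = scaling_exponent(16, by_flit_width[16], 64, by_flit_width[64])

print(f"exponent in P: {exp_ports:.2f}")
print(f"exponent in F: {exp_flit:.2f}")
```

An exponent close to 2 in P and close to 1 in F is consistent with the XBAR ~ P^2 F trend stated on the slide.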
We Propose: Step 2
• Automatic fitting of models with post-P&R power and area
XBAR ~ $P^2 F$

P   V   B   F    Area
5   2   8   16   1439.9
5   2   8   32   2916.0
5   2   8   64   5867.4
8   2   8   32   7465.1

LSQR fit: $\mathrm{XBAR}_{area} = a_1 \cdot P^2 F + a_0$
Key idea: Capture implementation details using automatic regression fit
Characterization performed only once and usable for multiple design space explorations
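The fit on this slide can be sketched with a closed-form ordinary least-squares line through the four post-P&R data points. This is an illustration under assumptions: the slides do not say which solver the LSQR step uses, and the names `fit_line` and `xbar_area` are hypothetical.

```python
# (P, F, post-P&R XBAR area) copied from the slide's table; V=2, B=8 in every row.
samples = [(5, 16, 1439.9), (5, 32, 2916.0), (5, 64, 5867.4), (8, 32, 7465.1)]

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a1 * x + a0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    return a1, mean_y - a1 * mean_x

xs = [p * p * f for p, f, _ in samples]  # model term P^2 * F
ys = [area for _, _, area in samples]
a1, a0 = fit_line(xs, ys)

def xbar_area(p, f):
    """Predicted XBAR area for a new configuration, using the fitted a1 and a0."""
    return a1 * p * p * f + a0

print(f"a1 = {a1:.3f}, a0 = {a0:.1f}")
print(f"predicted area at P=10, F=32: {xbar_area(10, 32):.0f}")
```

Once `a1` and `a0` are fitted, evaluating `xbar_area` for other (P, F) points needs no further synthesis or P&R runs, which is the sense in which the characterization is a one-time cost.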
Model Development
• Two RTL generators:
– Netmaker (Cambridge, UK)
– Stanford NoC
• SP&R tools:
– Cadence RC & Synopsys DC for hierarchical synthesis to analyze each block
– Cadence SOC Encounter for P&R
[Flow diagram: NoC router RTL generators, with impl params (clock frequency) and µArch params (P, V, B, F) → synthesis and P&R (DC/RC, SOCE) → analysis of blocks (XBAR, SW & VC arbiter, input & output buffers) → new models for each component block]
Component   Model
XBAR        $P^2 F$
SWVC        $9(P^2 V^2 + P^2 + PV - P)$
InBUF       $180PV + 2PVBF + 2P^2 VB + 3PVB + 5P^2 B + P^2 + PF + 15P$
OutBUF      $25P + 80PV$
CLKCTRL     $0.02(\mathrm{SWVC} + \mathrm{InBUF} + \mathrm{OutBUF})$
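The component models in the table translate directly into code. The sketch below assumes each expression is a pre-regression instance-count estimate for that block. The function names and the dictionary layout are illustrative, not taken from the authors' tool.

```python
def xbar(p, v, b, f):
    return p**2 * f

def swvc(p, v, b, f):
    return 9 * (p**2 * v**2 + p**2 + p * v - p)

def inbuf(p, v, b, f):
    return (180 * p * v + 2 * p * v * b * f + 2 * p**2 * v * b
            + 3 * p * v * b + 5 * p**2 * b + p**2 + p * f + 15 * p)

def outbuf(p, v, b, f):
    return 25 * p + 80 * p * v

def clkctrl(p, v, b, f):
    # Per the table, XBAR is not part of the CLKCTRL term.
    return 0.02 * (swvc(p, v, b, f) + inbuf(p, v, b, f) + outbuf(p, v, b, f))

def router_model(p, v, b, f):
    """Per-component model values for P ports, V VCs, B buffers, flit-width F."""
    blocks = {"XBAR": xbar, "SWVC": swvc, "InBUF": inbuf,
              "OutBUF": outbuf, "CLKCTRL": clkctrl}
    return {name: fn(p, v, b, f) for name, fn in blocks.items()}

estimate = router_model(p=5, v=2, b=8, f=32)
for name, value in estimate.items():
    print(f"{name:8s} {value:10.1f}")
```

For P=5, V=2, B=8, F=32 the XBAR term evaluates to 800, close to the 825 instances reported for the same configuration in the Step 1 table.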
Overall Methodology
• Manual
– Quick and easy
– Misses implementation details
• LSQR
– Accurate (captures implementation details)
– One-time overhead (generation of P&R training data points)
[Flow diagram: ORION_NEW models feed two paths, Manual and LSQR. Manual path: estimates for gate count, combined with the technology library (cell area, cell leakage, pin cap., internal energy). LSQR path: basic regression fit against post-P&R data per block (std. cell count & area, leakage power, internal power, switching power). Outputs: area; power (leakage, internal, switching)]
Results: Area And Power
[Figure: POWER estimation error (Avg, Max, Min; axis 0% to 100%), NEW vs. ORION2.0 at 45nm and 65nm, for Stanford NoC and NetMaker; annotated "6.5x reduction"]
[Figure: AREA estimation error (Avg, Max, Min; axis 0% to 100%), NEW vs. ORION2.0 at 45nm and 65nm, for Stanford NoC and NetMaker; annotated "4x reduction"]
Methodology scales across technologies and router RTL generators
Flit-level Power Estimation
• Dynamic power estimation using flit-level bit encodings
• Have integrated with full-system NoC simulator (GARNET)
[Flow diagram: post-P&R router netlist + testbench → gate-level simulation → VCD → power analysis → power report → regression fit (with ORION_NEW models) → flit-level power model → GARNET / gem5 → flit-level power estimates]
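The slides do not spell out which properties of the flit bit encoding feed the regression. One plausible input, assumed here purely for illustration, is the number of bits that toggle between consecutive flits, since switching activity drives dynamic power. The function names below are hypothetical.

```python
def bit_toggles(prev_flit: int, cur_flit: int, flit_width: int) -> int:
    """Number of bit positions that differ between two flits (Hamming distance)."""
    mask = (1 << flit_width) - 1
    return bin((prev_flit ^ cur_flit) & mask).count("1")

def toggle_trace(flits, flit_width):
    """Toggle count for each consecutive pair of flits in a sequence."""
    return [bit_toggles(a, b, flit_width) for a, b in zip(flits, flits[1:])]

# Four 4-bit flits; the last two are identical, so the final count is zero.
trace = toggle_trace([0b0000, 0b1010, 0b0110, 0b0110], flit_width=4)
print(trace)
```

A simulator could log such per-flit activity and pass it to a fitted power model. How the authors' GARNET integration actually does this is not described in the slides.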
Results: Flit-level Power
• Accurate estimation of flit-level dynamic power
[Figure: flit-level dynamic power estimation error (Avg, Max, Min; axis 0% to 80%), Flit vs. NEW vs. ORION2.0, for Stanford NoC and NetMaker; annotated "3.6x reduction"]
Summary
• New hybrid modeling methodology: relax the template mindset
– Explicitly models control and data signals
– Captures RTL and implementation details
• Using proposed parametric regression methodology, worst-case estimation errors reduced by a factor of
– 6.5x from ORION2.0 for power
– 4x from ORION2.0 for area
• We propose an application of our methodology for flit-level dynamic power modeling and integration with GARNET
– 3.6x worst-case error reduction in dynamic power estimation
• Ongoing: Non-parametric modeling of post-P&R power and area
Thank You!
Back up
Regression analysis approach
• Multi-step regression fit
– Step 1: Fit instances of each router component with post-layout instance counts
$a_1 \cdot \mathrm{Insts}^{model}_{\langle component \rangle} + a_0 = \mathrm{Insts}^{tool}_{\langle component \rangle}$
– Step 2a: Fit area of each router component with post-layout area
$b_1 \cdot \mathrm{InstsR}^{model}_{\langle component \rangle} + b_0 = \mathrm{Area}^{tool}_{\langle component \rangle}$
$\mathrm{InstsR}^{model}_{\langle component \rangle} = a_1 \cdot \mathrm{Insts}^{model}_{\langle component \rangle} + a_0$
– Step 2b: Fit power of each router component with post-layout power (leakage, internal, switching separately)
$\{c_5, d_5, e_5\} \cdot \mathrm{InstsR}^{model}_{XBAR} + \{c_4, d_4, e_4\} \cdot \mathrm{InstsR}^{model}_{SWVC} + \{c_3, d_3, e_3\} \cdot \mathrm{InstsR}^{model}_{InBUF} + \{c_2, d_2, e_2\} \cdot \mathrm{InstsR}^{model}_{OutBUF} + \{c_1, d_1, e_1\} \cdot \mathrm{InstsR}^{model}_{CLKCTRL} + \{c_0, d_0, e_0\} = \{P^{tool}_{leak}, P^{tool}_{int}, P^{tool}_{SW}\}$
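Steps 1 and 2a chain two linear fits: the first refines the model's instance count against tool data, and the second maps the refined count to area. The sketch below illustrates that chaining for the XBAR only, under stated assumptions. It reuses the instance counts and areas shown earlier in the deck for four matching configurations as stand-in training data (those counts are post-synthesis rather than post-layout), and the names `fit_line` and `predict_area` are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = k1 * x + k0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k1 = sxy / sxx
    return k1, mean_y - k1 * mean_x

# (P, F) configurations with V=2, B=8; stand-in data copied from earlier slides.
configs = [(5, 16), (5, 32), (5, 64), (8, 32)]
insts_model = [p * p * f for p, f in configs]  # XBAR model P^2 * F
insts_tool = [400, 825, 1673, 2112]            # tool-reported instance counts
area_tool = [1439.9, 2916.0, 5867.4, 7465.1]   # tool-reported areas

# Step 1: refine the model instance count against the tool's count.
a1, a0 = fit_line(insts_model, insts_tool)
insts_refined = [a1 * x + a0 for x in insts_model]

# Step 2a: fit area against the refined instance count.
b1, b0 = fit_line(insts_refined, area_tool)

def predict_area(p, f):
    """Area estimate from the chained fits for a (P, F) configuration."""
    refined = a1 * (p * p * f) + a0
    return b1 * refined + b0

print(f"a1 = {a1:.3f}, a0 = {a0:.1f}, b1 = {b1:.3f}, b0 = {b0:.1f}")
print(f"predicted area at P=5, F=32: {predict_area(5, 32):.1f}")
```

Step 2b would follow the same pattern with a multi-variable fit over the five refined component counts, one fit each for leakage, internal, and switching power.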
Related work
• Architecture templates
– ORION2.0
• Gate-level analytical models
• Parametric regression
– Pre- and post-layout power estimation
– RTL simulations
• Non-parametric regression
– MARS
[Diagram: taxonomy of NoC modeling. Regression model: parametric, non-parametric. Circuit model: arch templates, analytical. This work: ORION_NEW + regression; flit-level. Additional labels: Control, Tool]
Significant Departure: Relax the “template” mindset
Results
[Figure: instance count vs. # ports (5, 6, 8, 10) for ORION2.0, NetMaker, and Stanford NoC]
[Figure: instance count vs. # ports (5, 6, 8, 10) for NEW, NetMaker, and Stanford NoC]
[Figure: instance count vs. flit-width in bits (16, 24, 32, 64) for ORION2.0, NetMaker, and Stanford NoC]
[Figure: instance count vs. flit-width in bits (16, 24, 32, 64) for NEW, NetMaker, and Stanford NoC]
• Avg. estimation error in # instances reduced from 109.5% to 8.8%
– Avg. estimation error in area reduced to 9.8%
– Avg. estimation error in power reduced to 4.58%