The Scaling Challenge: Can Correct-by-Construction Design Help?
Prashant Saxena, Noel Menezes, Pasquale Cocchini, Desmond Kirkpatrick
Intel Labs (CAD Research), Hillsboro, OR
International Symposium on Physical Design (ISPD’03), Monterey, CA, Apr 16, 2003

Repeaters, which are already a full-chip headache, will become critical at the block level also

Outline
Some scaling experiments
– Spice simulations
Implications for post-RTL design
Correct-by-Construction (CbC) design
– What’s the promise? What’s missing?

A Scaling Primer
Process scaling:
– Devices shrink 0.7x, delay 0.7x
– Wires shrink 0.7x
– R per unit length increases 2x, C per unit length unchanged
– So, delay per unit length of a scaled wire increases 1.4x
Block area often stays same
– # cells, # nets doubles
– Wiring histogram shape invariant

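The primer's factors can be reproduced with a few lines of arithmetic. The following is a minimal first-order sketch, not the authors' simulation setup. It assumes an ideal shrink factor s = 0.7, wire resistance per unit length inversely proportional to the wire cross-section, and unchanged capacitance per unit length. It reads the 1.4x figure as the unrepeated RC delay per unit length (r·c·L) of a wire whose length also shrinks by s. The function and key names are illustrative only.

```python
def scaling_primer(s: float = 0.7) -> dict:
    """First-order scaling factors for one process generation (ideal shrink s)."""
    r_per_len = 1.0 / (s * s)   # wire width and thickness both shrink by s
    c_per_len = 1.0             # capacitance per unit length assumed unchanged
    wire_len = s                # a scaled wire is s times as long
    # Unrepeated RC delay ~ r*c*L^2, so delay per unit length ~ r*c*L
    delay_per_len = r_per_len * c_per_len * wire_len
    return {
        "gate_delay": s,
        "r_per_len": r_per_len,
        "c_per_len": c_per_len,
        "delay_per_len_scaled_wire": delay_per_len,
        "cells_in_same_area": 1.0 / (s * s),
    }


factors = scaling_primer()
for name, value in factors.items():
    print(f"{name}: {value:.2f}x")
```

With s = 0.7 the resistance factor and the cell-count factor both come out near 2x, and the delay-per-length factor near 1.4x, matching the slide's rounded numbers.
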
Critical Repeater Lengths
Optimally sized uniformly for min delay
– Min distance at which inserting a repeater speeds up the line
“Ideally shrunk” circuit requires additional repeaters (0.7x vs 0.57x)
In line with scaling theory: s√s = 0.586 (for s = 0.7)
[Figure: relative critical repeater length for M3 and M6 at 90nm, 65nm, 45nm, 32nm; annotation: 0.57x]

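The 0.586 value quoted as "in line with scaling theory" equals s√s for s = 0.7. A sketch of where that comes from, assuming the usual first-order model in which the critical repeater length varies as the square root of τ/(rc). Here τ is the repeater's intrinsic delay and r, c are wire resistance and capacitance per unit length; these symbols are introduced for illustration and do not appear on the slides.

```latex
\ell_{\mathrm{crit}} \propto \sqrt{\frac{\tau}{r\,c}}, \qquad
\tau \to s\,\tau,\quad r \to \frac{r}{s^{2}},\quad c \to c
\;\;\Longrightarrow\;\;
\ell_{\mathrm{crit}} \to \sqrt{\frac{s\,\tau}{(r/s^{2})\,c}}
 = s\sqrt{s}\;\ell_{\mathrm{crit}} \approx 0.586\,\ell_{\mathrm{crit}}
 \quad (s = 0.7).
```

An ideally shrunk wire is 0.7x as long, while the critical length falls to roughly 0.57x to 0.59x, so a wire that needed no repeater before the shrink can need one after it.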
Critical Sequential Lengths
Optimized for max distance in one clock period
Assumes:
– 2x frequency scaling, 5GHz on 90nm
– Ignores setup, hold, skew
“Ideally shrunk” circuit:
– Requires much new wire pipelining (0.7x vs 0.43x)
– Ratio of regular to clocked repeaters decreasing
[Figure: relative critical sequential length for M3 and M6 at 90nm, 65nm, 45nm, 32nm; annotation: 0.43x]
[Figure: # repeaters between FFs at 90nm, 65nm, 45nm, 32nm; annotation: 0.75x]

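The 0.43x and 0.75x annotations can be sanity-checked against a first-order model. The sketch below is an estimate under stated assumptions, not the authors' method. It assumes an optimally repeated wire has delay per unit length proportional to sqrt(r·c·τ), that the clock period halves each generation (the 2x frequency scaling above), and that s = 0.7. All names are illustrative.

```python
import math


def sequential_scaling(s: float = 0.7, freq_scale: float = 2.0) -> dict:
    """First-order per-generation scaling of repeated-wire reach per clock."""
    r = 1.0 / (s * s)           # wire resistance per unit length
    c = 1.0                     # wire capacitance per unit length (unchanged)
    tau = s                     # repeater intrinsic delay
    # Optimally repeated wire: delay per unit length ~ sqrt(r * c * tau)
    delay_per_len = math.sqrt(r * c * tau)
    period = 1.0 / freq_scale   # clock period shrinks as frequency grows
    seq_len = period / delay_per_len       # distance covered in one period
    rep_len = math.sqrt(tau / (r * c))     # critical repeater length
    return {
        "critical_seq_length": seq_len,
        "critical_rep_length": rep_len,
        "repeaters_between_ffs": seq_len / rep_len,
    }


est = sequential_scaling()
for name, value in est.items():
    print(f"{name}: {value:.2f}x")
```

The estimate gives roughly 0.42x for the critical sequential length and 0.71x for the number of repeaters between flip-flops, close to but not identical to the 0.43x and 0.75x shown on the slide.
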
Block Wiring Histogram and Critical Repeater Lengths
Critical lengths migrating rapidly to the left… (zoomed view coming up)
[Figure: # wires (90nm), log scale from 1 to 100000, vs. normalized wirelength (0.25 to 90); critical repeater lengths marked for M3 and M6 at 90nm, 65nm, 45nm, 32nm]

Block Wiring Histogram: Zoomed View
Increasingly steep slope of curve (log scale) => # impacted nets exploding!
[Figure: # wires (90nm), log scale from 1 to 100000, vs. normalized wirelength (0 to 19); critical repeater lengths marked for M3 and M6 at 90nm, 65nm, 45nm, 32nm]

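The "exploding" effect comes from sliding a threshold leftward along a steeply falling histogram. The sketch below illustrates it with a made-up toy histogram and a made-up 90nm critical length of 20 normalized units; neither is data from the talk. The only number taken from the slides is the 0.57x per-generation shrink of the critical repeater length. The sketch also ignores the doubling of net count per generation.

```python
# Hypothetical histogram: normalized wirelength -> number of nets (NOT real data).
TOY_HISTOGRAM = {1: 50000, 2: 20000, 4: 5000, 8: 1000, 16: 100, 32: 10}
NODES = ["90nm", "65nm", "45nm", "32nm"]


def nets_at_or_above(histogram: dict, critical_length: float) -> int:
    """Count nets whose length is at least the critical repeater length."""
    return sum(count for length, count in histogram.items()
               if length >= critical_length)


def impacted_by_node(histogram: dict, crit_90nm: float, shrink: float = 0.57) -> dict:
    """Slide the critical length left by `shrink` per node and recount."""
    result = {}
    crit = crit_90nm
    for node in NODES:
        result[node] = nets_at_or_above(histogram, crit)
        crit *= shrink
    return result


impacted = impacted_by_node(TOY_HISTOGRAM, crit_90nm=20.0)
for node, count in impacted.items():
    print(node, count)
```

On this toy input the impacted count rises from 10 nets at 90nm to 6110 at 32nm, even though the threshold only moves by a constant factor each generation.
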
Block Wiring Histogram and Critical Sequential Lengths
# pipelined nets growing from negligible (90nm) to substantial (32nm)
[Figure: “PSC/bus1p Wiring Histogram, Critical Sequential Distances”: # wires (90nm), log scale from 1 to 100000, vs. normalized wirelength (0.25 to 90); critical sequential distances marked for M3 and M6 at 90nm, 65nm, 45nm, 32nm]

Repeated Block-level Nets
Ever-increasing percentage of block-level nets requires repeaters
Even the rate of growth is accelerating!
…especially for clocked repeaters
[Figure: % repeated nets (0 to 35) for M3 and M6 at 90nm, 65nm, 45nm, 32nm]
[Figure: % nets with clk-rep (0 to 14) for M3 and M6 at 90nm, 65nm, 45nm, 32nm]

Total Repeater Count
Ever-increasing fractions of total cell count will be repeaters
– 70% in 32nm (and this omits FC repeaters within block!)
Total repeater count is independent of frequency scaling assumptions
[Figure: % cells used to repeat block-level nets (0 to 80) at 90nm, 65nm, 45nm, 32nm; series: clk-rep, rep, tot-rep]

So, what’s changing?
Interconnects scaling worse than devices
– …in spite of optimal (re-)buffering
# repeaters increasing exponentially
Interconnect repeaters will comprise significant fraction of cells in block
Even block-level nets will need to be pipelined

Implications on Synthesis
Literal/Gate count and fanout metrics misleading
– Major delay contribution from communication
– Fanouts often isolated by repeaters
– Area often wire-limited
Sizing often determined by (predictable) repeater load
– Pre-layout sizing wasted

Implications on Synthesis
Less logic per pipeline stage
Combinational synthesis: max benefit shrinking
Synthesis across sequential boundaries
Methodological support for retiming

Implications on Synthesis
Bandwidth ceiling
– Hard to move data around for computation
Logic replication
– Encourage low fans
Dense encodings
Distribution of computation across channel

Implications on Layout
Routing
– Must understand repeater insertion
– Fine power grid => templated routing?
Placement with repeaters
– Intra-block nets: # repeaters depends on routing
– OTH routes: fixed obstructions
– Add buffering into placement core
  … as opposed to ECO postprocessing

Implications on Layout
Latency-constrained placement
– arch sub-optimality
– Hard constraint per stage (unlike delay)
OR
Post-RTL latency optimization
– Methodological nightmare
– Delay insensitive design?

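One way to see why latency becomes a hard per-stage constraint: once a net is longer than the critical sequential length, it must span a whole number of clock cycles, and placement can no longer trade a little slack for a little distance. The sketch below assumes one pipeline stage per critical sequential length and uses a hypothetical 90nm critical sequential length of 10 normalized units; only the 0.43x per-generation factor is taken from the slides. Function names are illustrative.

```python
import math


def stages_for_net(length: float, critical_seq_length: float) -> int:
    """Pipeline stages a repeated wire spans (at least one)."""
    return max(1, math.ceil(length / critical_seq_length))


def violates_latency(length: float, critical_seq_length: float,
                     allowed_stages: int) -> bool:
    """True if the placed net needs more cycles than the architecture allows."""
    return stages_for_net(length, critical_seq_length) > allowed_stages


CRIT_90NM = 10.0                      # hypothetical value, normalized units
CRIT_32NM = CRIT_90NM * 0.43 ** 3     # three generations at 0.43x each

net_length = 3.0
print(stages_for_net(net_length, CRIT_90NM))       # single cycle at 90nm
print(stages_for_net(net_length, CRIT_32NM))       # several cycles at 32nm
print(violates_latency(net_length, CRIT_32NM, allowed_stages=1))
```

A net that fits comfortably in one cycle under the 90nm assumption spans four stages under the 32nm assumption, so either the architecture budgets those cycles up front or placement must shorten the net.
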
Implications on FC Assembly
What if we reduce block area to avoid wire effects?
Many of the new physical synthesis problems go away
BUT
# blocks triples! (and block assembly is the hardest part of chip design!)
Flat assembly (Fragmentation of paths across blocks)
OR
Increased hierarchy (Lack of visibility across hierarchy levels)

The CbC Link
Process scaling => worsening predictability
Predictability => CbC design
But current CbC approaches too rigid
Can we still apply them?

Principles of CbC Design
More predictability
– Reduced estimation error improves high-level optimizations
Break the design-verification loop
– Sequence of small, guaranteed-correct transformations
– No unexpected deterioration of secondary metrics
Avoid micro-engineering
– Design productivity gap

Abstract Fabrics
Structural fabrics: too resource-intensive
– e.g. DWF: 50% routing tracks
Use algorithmic fabrics instead
– Prune to subspace with desirable CbC properties
  e.g. Non-uniform power grid using “min power pitch” (ISPD’02)
  Guaranteed throughput bus design (ICCAD’02)
– CbC rules-of-thumb
  e.g. Bound on max adjacent runs of signals
Performance with predictability

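A rule-of-thumb such as a bound on maximum adjacent runs of signals is easy to state as a check. The sketch below is purely illustrative: the slide names the rule but gives no data model, so the `Segment` record, the track-index notion of adjacency, and the `max_run` parameter are all assumptions made here.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    track: int     # routing track index; tracks t and t+1 are neighbors
    start: float   # start coordinate along the track
    end: float     # end coordinate along the track
    net: str       # net name


def adjacent_run_violations(segments: list, max_run: float) -> list:
    """Return (net_a, net_b, overlap) for neighbor runs longer than max_run."""
    violations = []
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            # Only different nets on directly neighboring tracks can couple here.
            if abs(a.track - b.track) != 1 or a.net == b.net:
                continue
            overlap = min(a.end, b.end) - max(a.start, b.start)
            if overlap > max_run:
                violations.append((a.net, b.net, overlap))
    return violations


layout = [
    Segment(0, 0, 100, "a"),
    Segment(1, 20, 90, "b"),
    Segment(2, 0, 30, "c"),
    Segment(3, 0, 100, "d"),
]
violations = adjacent_run_violations(layout, max_run=50)
print(violations)
```

In this toy layout only nets a and b run side by side for longer than the bound. A router that never creates such a pair satisfies the rule by construction and needs no later noise fix-up pass for it.
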
CbC Block Construction
“Vertical” partitioning and successive refinement
– Coarse layout of unsynthesized design
– Successive refinement of “vertical” partitions
– Critical partitions first
– Different partitions exist at different level of refinement
– Hierarchical engines
– Enables early repeater prediction
[Figure: flow from RTL to synth/mapped netlist to placed/buffered netlist to GR/track-assigned layout]

CbC Full Chip Assembly
Latency prediction for full-chip interconnects
– Preferential routing for performance-critical nets
– Flip-flop staging on non-critical nets
– Performance prediction with cycle latency ranges
Block area mis-prediction tolerance
– Move blocks without re-implementation
– Global communication grids

Summing Up
Repeaters becoming critical at the block level
Most post-RTL design problems changing fundamentally
Combination of algorithmic and methodological advances required
CbC approaches viable, but at the abstract level
– Current structural fabrics too resource intensive
– Achieve predictability through algorithmic fabrics

Backup Slides

PIE (Process Independent Exploration) Models
To provide an easier way to study interconnect structures and their trends in future CMOS processes
To be used in place of fudged process files
Analytical models directly correlating to device and interconnect physics
– Device models based on BSIM3 equations including major 2nd order effects
– Accurate mobility and velocity saturation models, DIBL and channel length modulation approximation
– Continuous from weak to strong inversion
– Interconnect models with 2D fringe capacitance approximation
– Scattering not accounted for
Entire process expressed by small set of physically meaningful process parameters (e.g. Tox, Vth, kild, etc.) in PEF (Process Exploration File) files
– 16 for devices
– 6 each metal layer
Test cases simulated as SPICE netlists
PIE models implemented as behavioral sources
Calibrated against existing process files