Methodology from Chaos in IC Implementation
Kwangok Jeong* and Andrew B. Kahng*,**
* ECE Dept., UC San Diego   ** CSE Dept., UC San Diego
UCSD VLSI CAD Laboratory - ISQED 2010, March 24, 2010
Outline
• Motivation
• Assessment of “Chaos”
• Exploitation of “Chaotic” behavior
• Conclusion
Motivation
• Chip implementation flow is a "Chaos Machine" (Ward Vercruysse, Sun Microsystems, ISPD-97 talk)
  • Hard to predict behavior of back-end implementation
• "Inherent noise" (Kahng/Mantik, ISQED-2001)
  • Equivalent inputs to tools result in different outputs
• Algorithms and EDA tools are not deterministic or predictable
  • Most design optimization problems are NP-hard → heuristic-based approaches
  • Physical phenomena are too complex → simplified models
• How to exploit "chaotic" behavior?
Scope of This Work
• We assess "chaotic" behavior in the design process
  • When it occurs in design processes
    • Post-synthesis vs. post-routing
    • Place-and-route tools' view vs. signoff tools' view
  • Which user inputs affect it most
    • Input parameter sensitivity of synthesis tools
    • Input parameter sensitivity of place-and-route tools
• We propose a practical method to exploit "chaos" in EDA tools, based on empirical analyses
  • Sensitivity of input parameters to outcomes → find safe/easy knobs that do not change netlists/libraries
  • Best-of-k: multi-start, multi-run methodologies
Outline
• Motivation
• Assessment of “Chaos”
• Exploitation of “Chaotic” behavior
• Conclusion
Analysis 1: Synthesis vs. Place-and-Route
• How strongly correlated are post-synthesis netlist quality and post-routing design quality?
  • Timing quality of synthesized netlists vs. timing quality after placement and routing, and at signoff

  Clock @ synthesis (ns) | WNS (ns) after synthesis, 2ns clock | Clock @ P&R (ns) | WNS (ns) seen @ P&R | WNS (ns) @ signoff
  1.60 |  0.400 | 2.0 | 0.171 | -0.249
  1.80 |  0.200 | 2.0 | 0.088 | -0.196
  1.90 |  0.101 | 2.0 | 0.112 | -0.195
  1.95 |  0.051 | 2.0 | 0.074 | -0.449
  2.00 |  0.001 | 2.0 | 0.088 | -0.252
  2.10 | -0.097 | 2.0 | 0.088 | -0.214
  2.20 | -0.196 | 2.0 | 0.120 | -0.281
  2.40 | -0.395 | 2.0 | 0.162 | -0.081

• Worst quality netlist can result in best quality!  (*AES design)
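The weak coupling in the table above can be quantified: the rank correlation between the post-synthesis WNS column and the signoff WNS column is near zero. A minimal pure-Python check (Spearman, no ties) over the eight AES rows — this computation is mine, not from the talk, but it uses only the table's data:

```python
# Rank correlation between post-synthesis WNS and signoff WNS,
# using the eight AES rows from the table above.

def spearman(x, y):
    """Spearman rank correlation (assumes no tied values)."""
    n = len(x)
    order = lambda v: sorted(range(n), key=lambda i: v[i])
    rx, ry = [0] * n, [0] * n
    for r, i in enumerate(order(x)):
        rx[i] = r + 1
    for r, i in enumerate(order(y)):
        ry[i] = r + 1
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

wns_synth   = [0.400, 0.200, 0.101, 0.051, 0.001, -0.097, -0.196, -0.395]
wns_signoff = [-0.249, -0.196, -0.195, -0.449, -0.252, -0.214, -0.281, -0.081]

rho = spearman(wns_synth, wns_signoff)
print(f"Spearman rho = {rho:.3f}")  # close to zero
```

A coefficient this close to zero supports the slide's point: a better (or worse) synthesis-time WNS does not predict a better signoff WNS.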
Analysis 2: Implementation vs. Signoff
• How strongly correlated are P&R and signoff?
• Timing miscorrelation
  • Delay calculation
  • RC parasitics

[Scatter plot: WNS at implementation (SOCE, Astro) vs. WNS at signoff (PrimeTime), both axes -0.350 to 0.050 ns; Astro vs. PrimeTime and SOCE vs. PrimeTime; ~200ps underestimation by the implementation tools]

Worst negative slack comparison from 29 testcases:

  Design | WNS (ns) Imp. | WNS (ns) Signoff
  AES    |  0.144 |  0.146
  JPEG   |  0.129 |  0.095
  LSU    | -0.002 | -0.005
  EXU    | -0.171 | -0.183
Beyond Miscorrelation Issues
• Kahng and Mantik 2001 – "Noise"
  • Equivalent inputs result in different outputs
    • Changing seeds of random number generators
    • Changing cell/net ordering
    • Renaming cell instances
    • Perturbing design hierarchy
  • Injecting "noise" is practically difficult
• Our focus – "Chaos"
  • Negligible change of inputs → large change in outputs
  • E.g., 0.1ps changes affect design quality significantly

  Clock cycle time (ns) | Worst negative slack, WNS (ns)
  1.9998 | -0.011
  1.9999 | -0.068
  2.0000 | -0.093
  2.0001 | -0.010
  2.0002 | -0.004
  (*JPEG design; up to 89ps difference in WNS)
What Inputs Can Be Perturbed?
• Tool-specific options: command options to turn on/off
  • Not our concern, since these are tool-dependent
• Design-specific constraints:
  • These knobs do not change design signatures → easy and safe knobs to perturb

  Timing-related constraints:
    Clock cycle time (T):   -3 / -2 / -1 / 0 / 1 / 2 / 3 ps
    Clock uncertainty (S):  -3 / -2 / -1 / 0 / 1 / 2 / 3 ps
    Input/output delay (B): -3 / -2 / -1 / 0 / 1 / 2 / 3 ps
  Floorplan-related constraints:
    Utilization (U):        -3 / -2 / -1 / 0 / 1 / 2 / 3 %
    Aspect ratio (A):       -0.03 / -0.02 / -0.01 / 0 / 0.01 / 0.02 / 0.03
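The knob table above maps directly to a small sweep generator. A hedged sketch — the knob names are mine, and actually writing the perturbed SDC/floorplan files is tool-specific and omitted:

```python
# Enumerate the negligible constraint perturbations from the table above:
# +/-3ps steps for timing knobs, +/-3% utilization, +/-0.03 aspect ratio.

PS = 1e-3  # 1 ps expressed in ns

KNOBS = {
    "clock_cycle_T":  [i * PS for i in range(-3, 4)],        # ns
    "clock_uncert_S": [i * PS for i in range(-3, 4)],        # ns
    "io_delay_B":     [i * PS for i in range(-3, 4)],        # ns
    "utilization_U":  list(range(-3, 4)),                    # percent
    "aspect_ratio_A": [round(i * 0.01, 2) for i in range(-3, 4)],
}

def perturbations(knobs=KNOBS):
    """Yield (knob, delta) pairs, perturbing one knob at a time,
    as in the one-dimensional sweeps shown on the following slides."""
    for knob, deltas in knobs.items():
        for d in deltas:
            yield knob, d

runs = list(perturbations())
print(len(runs))  # 5 knobs x 7 values = 35 runs
```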
Testbed
• Designs: implemented with TSMC 65nm GPLUS library

  Block | Source      | Clock (ns) | #Cells | Area (um2)
  AES   | Opencores   | 1.7 | 22438 |  48957
  JPEG  | Opencores   | 2.2 | 69845 | 178696
  LSU   | OpenSparcT1 | 1.2 | 24945 | 113479
  EXU   | OpenSparcT1 | 1.2 | 20382 |  69780

• Tools

  Tool            | Vendor   | Purpose
  Design Compiler | Synopsys | Logic synthesis
  RTL Compiler    | Cadence  | Logic synthesis
  SOC Encounter   | Cadence  | Place-and-route
  Astro           | Synopsys | Place-and-route
  STAR-RCXT       | Synopsys | Signoff RC extraction
  PrimeTime-SI    | Synopsys | Signoff STA
Analysis 3: Noise in Synthesis – Timing
• What chaotic behavior is associated with input parameters of vendor synthesis tools?
• Ideally, results should not vary significantly → however, worst negative slack can change by up to 52ps

[Bar chart (Design Compiler): WNS (ns), 0 to -0.3, under -3ps to +3ps perturbations of clock cycle, clock uncertainty, and IO delay, for AES, JPEG, LSU, and EXU]
Analysis 3: Noise in Synthesis – Area
• What chaotic behavior is associated with input parameters of vendor synthesis tools?
• Synthesized area can change by up to 6%

[Bar chart (Design Compiler): normalized area (%), -3 to 7, under -3ps to +3ps perturbations of clock cycle, clock uncertainty, and IO delay, for AES, JPEG, LSU, and EXU]
Analysis 4: Noise in P&R Tools (Astro)
• What chaotic behavior is associated with input parameters of vendor place-and-route tools?
• Noise at the place-and-route stage is even worse!
  • WNS and TNS can change by up to 165ps and 46ns

[Bar chart (Astro): WNS (ns), 0 to -0.35, under perturbations of clock cycle, clock uncertainty, and IO delay (-3ps to +3ps), aspect ratio (-0.03 to +0.03), and placement utilization (-3% to +3%), for AES, JPEG, LSU, and EXU]
Analysis 4: Noise in P&R Tools (SOC Encounter)
• What chaotic behavior is associated with input parameters of vendor place-and-route tools?
• Noise at the place-and-route stage is even worse!
  • WNS and TNS can change by up to 190ps and 69ns
  • Area can change by up to 16.4%

[Bar chart (SOC Encounter): WNS (ns), 0 to -0.35, under perturbations of clock cycle, clock uncertainty, and IO delay (-3ps to +3ps), aspect ratio (-0.03 to +0.03), and placement utilization (-3% to +3%), for AES, JPEG, LSU, and EXU]
Outline
• Motivation
• Assessment of “Chaos”
• Exploitation of “Chaotic” behavior
• Conclusion
Exploiting Noise in Design Flow
• Multi-start and multi-run
  • When there are idle machines in the compute farm → multi-start: after running on k distinct machines with negligible perturbations of inputs, choose the best of the k different solutions
  • When there is remaining time-to-market → multi-run: after running k sequential jobs with negligible perturbations of inputs, choose the best of the k different solutions
• Best-of-k method
  • Find the best solution from many trials
  • Larger k → better best solution
  • How to determine a k that produces predictably good results?
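The multi-start idea above can be sketched as a simple driver: launch k runs with distinct negligible perturbations and keep the best WNS. `run_flow` is a hypothetical stand-in for one scripted synthesis/P&R run; it is stubbed here with a random noise model purely so the selection logic is runnable:

```python
import random

def run_flow(perturbation_ps, seed):
    """Hypothetical stand-in for one synthesis/P&R run with a clock
    constraint perturbed by `perturbation_ps`. A real flow would invoke
    the tool and parse WNS; here the chaotic outcome is modeled as
    noise around a nominal WNS (purely illustrative)."""
    rng = random.Random(seed * 1000 + perturbation_ps)
    return -0.100 + rng.gauss(0, 0.030)  # WNS in ns; less negative is better

def multi_start(k, seed=0):
    """Run k perturbed copies of the flow and keep the best-of-k WNS."""
    perturbs = range(-(k // 2), k - k // 2)  # e.g. k=3 -> -1, 0, +1 ps
    results = {p: run_flow(p, seed) for p in perturbs}
    best_p = max(results, key=results.get)
    return best_p, results[best_p]

p, wns = multi_start(k=5)
print(f"best perturbation {p:+d} ps, WNS {wns:.3f} ns")
```

On a compute farm the k `run_flow` calls would be dispatched to idle machines (multi-start); executed sequentially, the same loop is the multi-run variant.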
Best-of-k Using Sampling
• Which k results in a consistent, reasonably good solution?
• To obtain statistics, the "set of k trials" should be performed a large number (N) of times, for each value of k

  Naive procedure:
    for each k:
      perform k trials, N times
      s_k ← average of the N best solutions

• For large N, s_k is the expected (average) best solution when we perform k trials
• Example: k = {1, 2, 3, 4, 5, 10}, N = 100 → 2,500 separate runs → many runs are required!

  Best-of-k sampling procedure:
    // find "virtual" solution space
    perform N' trials (N' < N)
    record solutions → set of solutions S
    // best-of-k sampling
    for each k:
      sample k solutions out of S, N different times
      s_k ← average of the N best solutions

• Example: N' = 50 (sampling from S does not add cost)
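The sampling procedure above costs no extra tool runs once the N' solutions are recorded. A minimal sketch (WNS convention: less negative is better, so "best" = max; the example population is made up, not measured data):

```python
import random

def best_of_k_curve(solutions, ks, n_resamples=100, seed=0):
    """Estimate s_k, the expected best-of-k WNS, by resampling the
    N' recorded solutions instead of launching new tool runs."""
    rng = random.Random(seed)
    curve = {}
    for k in ks:
        # sample k solutions out of S (without replacement), N times
        bests = [max(rng.sample(solutions, k)) for _ in range(n_resamples)]
        curve[k] = sum(bests) / n_resamples  # s_k
    return curve

# Made-up WNS population (ns) standing in for N' = 50 recorded trials.
pop_rng = random.Random(42)
S = [round(pop_rng.uniform(-0.30, -0.05), 3) for _ in range(50)]

curve = best_of_k_curve(S, ks=[1, 2, 3, 4, 5, 10])
for k, sk in curve.items():
    print(f"s_{k} = {sk:.3f} ns")
```

As expected, s_k increases toward the best recorded solution as k grows, which is exactly the curve the next slides plot against k.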
Application of Best-of-k Sampling (1)
• Find the best input parameters to perturb, using best-of-k sampling
  • k = 1, 2, 3, …, 10 and N = 100 → 5,500 experiments in the naive procedure
  • S = 7 solutions from each of the T, S, B, A, U perturbations
• Quality rank of input parameters in P&R
  • E.g., AES: clock cycle (T) or input/output delay (B) perturbations result in the best solution quality

[Plots (*AES design, *EXU design): quality rank order (0–6) vs. k (1–10), for clock cycle, clock uncertainty, input/output delay, aspect ratio, and utilization]
Application of Best-of-k Sampling (2)
• Solution quality versus number of trials k (with N = 100)
  • Average solution quality approaches the best solution as k increases
  • Average solution quality is significantly better than the worst possible solution quality → best-of-k can avoid bad luck
  • Best-of-3 already shows reasonably good solutions

[Plots (AES, JPEG, LSU, EXU): WNS (ns) vs. k (# of different runs, 1–10), each showing worst, average, and best of 100 trials]
Conclusion and Ongoing Work
• Experimental assessment of "chaotic" behavior in commercial EDA tools
  • Miscorrelation issues between design stages are well-known
  • Exploiting chaos: intentional, negligible input perturbations can significantly change outputs
• Proposed a methodology to exploit the chaotic tool behavior
  • "Best-of-k": multi-start / multi-run methodology
  • Efficient sampling method to determine the best number of trials
  • We also find the best input parameters to perturb, using best-of-k sampling
• Ongoing work
  • Analysis of potential advantages of "chaos" in advanced physical synthesis tools to reduce miscorrelation-related issues
  • Evaluation of the benefits of chaos in more advanced signoff methodologies (signal-integrity-enabled, path-based STA)
Potential Cause 1
• Miscorrelation between synthesis and place-and-route
  • Rank correlation of timing-critical paths between synthesis and placement: 0.421
  • Some paths are not critical at synthesis, but critical at placement
(*AES design)
Potential Cause 2: Parasitic Miscorrelation
• Miscorrelation in delay calculation
  • With the same RC parasitic file (.spef)
  • May not be a major problem: a few tens of picoseconds difference

  Design | WNS (ns) Imp. | WNS (ns) Signoff
  AES    |  0.144 |  0.146
  JPEG   |  0.129 |  0.095
  LSU    | -0.002 | -0.005
  EXU    | -0.171 | -0.183

• Miscorrelation in RC extraction
  • The implementation tool can underestimate capacitance by 18.6%

[Scatter plot: implementation vs. signoff]
Inherent Noise: Detailed Results
• Noise is really random! → difficult to predict
• (Red text in the original slide marks the best result in each group)
• Each tool column reports the tool's own WNS and PrimeTime (PT) WNS/TNS, all in ns

AES, tight clock (original 2.2ns):
  Clock (ns) | SOCE: WNS / WNS(PT) / TNS(PT) | Astro: WNS / WNS(PT) / TNS(PT) | BlastFusion: WNS / WNS(PT) / TNS(PT)
  2.1998 | -0.407 / -0.430 /  -81.124 | -0.241 / -0.487 /  -94.822 | -0.077 / -0.391 / -60.156
  2.1999 | -0.392 / -0.420 /  -73.533 | -0.218 / -0.512 /  -89.316 | -0.067 / -0.397 / -58.728
  2.2000 | -0.399 / -0.457 /  -85.641 | -0.255 / -0.569 / -100.956 | -0.081 / -0.331 / -59.985
  2.2001 | -0.436 / -0.439 /  -82.053 | -0.280 / -0.535 / -110.341 | -0.074 / -0.442 / -61.048
  2.2002 | -0.406 / -0.441 /  -82.576 | -0.246 / -0.490 /  -92.196 | -0.067 / -0.384 / -51.980

AES, loose clock (original 3.0ns):
  2.9998 | -0.026 / -0.119 / -1.965 | 0.040 / -0.280 / -35.482 |  0.000 / -0.342 / -44.778
  2.9999 | -0.091 / -0.095 / -2.137 | 0.064 / -0.325 / -34.699 |  0.001 / -0.469 / -46.154
  3.0000 | -0.046 / -0.096 / -3.499 | 0.049 / -0.346 / -36.565 | -0.001 / -0.448 / -48.369
  3.0001 | -0.049 / -0.112 / -1.972 | 0.083 / -0.239 / -23.040 | -0.008 / -0.373 / -44.683
  3.0002 | -0.061 / -0.078 / -1.718 | 0.057 / -0.287 / -31.985 |  0.000 / -0.421 / -48.042

JPEG, tight clock (original 1.3ns):
  1.2998 | -0.294 / -0.315 / -625.434 | -0.265 / -0.352 / -744.637 | -0.228 / -0.324 / -501.295
  1.2999 | -0.263 / -0.281 / -566.317 | -0.240 / -0.418 / -701.361 | -0.166 / -0.266 / -410.594
  1.3000 | -0.257 / -0.258 / -537.580 | -0.256 / -0.395 / -733.841 | -0.244 / -0.338 / -567.228
  1.3001 | -0.249 / -0.303 / -561.013 | -0.239 / -0.321 / -719.196 | -0.202 / -0.304 / -475.253
  1.3002 | -0.298 / -0.514 / -757.272 | -0.229 / -0.346 / -731.566 | -0.197 / -0.277 / -471.392

JPEG, loose clock (original 2.0ns):
  1.9998 | -0.005 / -0.011 / -0.011 | 0.101 / -0.140 / -0.520 |  0.000 / -0.216 / -11.407
  1.9999 |  0.008 / -0.068 / -0.068 | 0.101 / -0.140 / -0.520 |  0.000 / -0.167 / -12.021
  2.0000 | -0.007 / -0.093 / -0.137 | 0.101 / -0.131 / -1.240 | -0.002 / -0.196 / -15.189
  2.0001 | -0.001 / -0.010 / -0.010 | 0.096 / -0.098 / -0.449 |  0.001 / -0.181 / -16.782
  2.0002 |  0.008 / -0.004 / -0.006 | 0.099 / -0.066 / -0.279 | -0.006 / -0.178 / -12.220
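The "noise is random" claim is visible directly in the detailed results: which 0.1ps-apart clock wins differs per group. A quick check over the SOC Encounter signoff WNS values, copied from the table above:

```python
# SOCE signoff WNS (PT, ns) per 0.1ps-apart clock, copied from the table above.
groups = {
    "AES tight":  {"2.1998": -0.430, "2.1999": -0.420, "2.2000": -0.457,
                   "2.2001": -0.439, "2.2002": -0.441},
    "AES loose":  {"2.9998": -0.119, "2.9999": -0.095, "3.0000": -0.096,
                   "3.0001": -0.112, "3.0002": -0.078},
    "JPEG tight": {"1.2998": -0.315, "1.2999": -0.281, "1.3000": -0.258,
                   "1.3001": -0.303, "1.3002": -0.514},
    "JPEG loose": {"1.9998": -0.011, "1.9999": -0.068, "2.0000": -0.093,
                   "2.0001": -0.010, "2.0002": -0.004},
}

# Which clock value yields the best (least negative) WNS in each group?
winners = {g: max(vals, key=vals.get) for g, vals in groups.items()}
print(winners)  # the winning 0.1ps offset differs per group -> unpredictable
```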
AES ASTRO
[Plot: WNS (ns) vs. k (# of different runs, 1–10), y-axis -0.108 to -0.096; worst, average, and best of 100 trials]
AES SOCE
[Plot: WNS (ns) vs. k (# of different runs, 1–10), y-axis -0.090 to -0.020; worst, average, and best of 100 trials]
JPEG ASTRO
[Plot: WNS (ns) vs. k (# of different runs, 1–10), y-axis -0.270 to -0.150; worst, average, and best of 100 trials]
JPEG SOCE
[Plot: WNS (ns) vs. k (# of different runs, 1–10), y-axis -0.210 to -0.175; worst, average, and best of 100 trials]
LSU ASTRO
[Plot: WNS (ns) vs. k (# of different runs, 1–10), y-axis -0.150 to -0.070; worst, average, and best of 100 trials]
LSU SOCE
[Plot: WNS (ns) vs. k (# of different runs, 1–10), y-axis -0.190 to -0.120; worst, average, and best of 100 trials]
EXU ASTRO
[Plot: WNS (ns) vs. k (# of different runs, 1–10), y-axis -0.330 to -0.150; worst, average, and best of 100 trials]