-1-
UC San Diego / VLSI CAD Laboratory
A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction
Kwangsoo Han, Andrew B. Kahng, Jongpil Lee, Jiajia Li and Siddhartha Nath
VLSI CAD LABORATORY, UC San Diego
-2-
Outline
Motivation
Related Work
Our Optimization Framework
Experimental Setup and Results
Conclusions
-3-
Motivation
Many signoff PVT corners in modern SoCs
Clock skew varies across corners
“Ping-pong” effect: fixing timing issues at one corner leads to timing violations at others
Our goal: minimize clock skew variation
[Figure: launch path and capture path clocking a datapath, annotated with the clock latencies at each corner]
Corner             Launch latency   Capture latency   Skew
SS, 0.7V, -25°C    1.0              1.1               -0.1
FF, 1.1V, -25°C    0.9              0.7               +0.2
Low voltage: gate delay dominates
High voltage: wire delay dominates
Skew reversal
Power/area overheads
-5-
Related Work
Skew minimization at multiple corners
– [Cho05] perform temperature-aware skew reduction based on an improved DME
– [Lung10] minimize the worst clock skew across corners with delay correlation factors
Skew variation minimization across corners
– [Restle01] propose a two-level non-tree structure, in which a mesh is applied at the bottom level
– [Su01] use a mesh for the top level of the clock network
– [Rajaram04] insert crosslinks in a clock tree to minimize skew variation
Our work: a systematic optimization framework for minimizing clock skew variation in a clock tree
-6-
Skew Variation Reduction Problem
Clock skew between sink pair (i, j) at corner C: difference between the delays from root r to sinks i and j at corner C
Skew variation between corner pair (C, C’)
Maximum skew variation for sink pair (i, j)
Skew variation reduction problem: given a routed clock tree, minimize the sum over all sink pairs of the maximum skew variation
[Figure: clock tree with root r and sinks i, j, showing Skew_{i,j}^C at corner C and Skew_{i,j}^{C’} at corner C’; the objective takes the max over corner pairs (C, C’), (C, C’’), (C’, C’’) and sums (∑) over sink pairs]
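The definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the skew variation between two corners is the plain absolute difference of the two skews, with no per-corner normalization, and the function names and data layout are hypothetical. The latencies are the ones from the Motivation example.

```python
from itertools import combinations

def skew(latency, i, j):
    """Skew between sinks i and j at one corner: latency[i] - latency[j]."""
    return latency[i] - latency[j]

def max_skew_variation(latencies, i, j):
    """Largest skew variation for sink pair (i, j) over all corner pairs.

    Assumption: variation between corners C and C' is |skew^C - skew^C'|.
    """
    return max(
        abs(skew(latencies[c], i, j) - skew(latencies[c2], i, j))
        for c, c2 in combinations(sorted(latencies), 2)
    )

def sum_of_max_skew_variation(latencies):
    """Objective: sum over all sink pairs of the maximum skew variation."""
    sinks = sorted(next(iter(latencies.values())))
    return sum(max_skew_variation(latencies, i, j)
               for i, j in combinations(sinks, 2))

# Clock latencies per corner from the Motivation example (two sinks only).
latencies = {
    "SS": {"launch": 1.0, "capture": 1.1},
    "FF": {"launch": 0.9, "capture": 0.7},
}
total = sum_of_max_skew_variation(latencies)
print(round(total, 3))  # skew goes from -0.1 to +0.2, so the variation is 0.3
```

At least two corners are required, since the maximum is taken over corner pairs.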
-8-
Our Optimization Framework
Incremental optimization of a CTS solution
Perform both global and local optimization
Global optimization uses LP to determine delta delays on arcs
Local optimization performs iterative local moves
Flow: Routed clock tree database → Global Optimization (buffer insertion/removal, routing detour) → Local Optimization (local moves, e.g., sizing/displacement) → Optimized database
[Figure: the original routed clock tree, the tree after global optimization, and the tree after local optimization, with labels root, last-stage buffer, sinks and target buffer]
-9-
Global Optimization: LP
Formulate a linear program to minimize skew variation
Determine the delta delay on each arc at each corner
Insert/remove buffers and detour wires based on LUTs
Discreteness of buffer delays
ECO feasibility is important
LP formulation
(1) Objective: minimize over the delta delays (delta delay of arc k at corner C); this minimizes the number of ECO changes
(2) Constraint on the maximum skew variation; sweep the bound U for the solution with minimum skew variation
(3) Constraint ensuring no skew degradation
(4) Maximum clock latency constraint (clock latency to sink i at corner C)
(5) Arc delay no smaller than the minimum delay without wire detour
(6) Range of delay ratio from LUTs
Items (1), (5) and (6) improve ECO feasibility
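One possible way to write a linear program with the structure of items (1) to (6) is sketched below. It is a guess at the shape of the formulation, not the authors' equations, and every symbol is an assumption introduced here: \Delta_k^C is the delta delay of arc k at corner C, D_k^C the original arc delay, D_{k,\min}^C the minimum arc delay without wire detour, s_{i,j}^C and \hat{s}_{i,j}^C the skew of sink pair (i, j) before and after optimization, V_{i,j} the maximum skew variation of that pair, \hat{L}_i^C the optimized clock latency to sink i, L_{\max}^C a latency limit, and \beta_{\min}, \beta_{\max} the delay-ratio range taken from the LUTs. Whether U bounds the sum or each individual V_{i,j} is also an assumption.

```latex
\begin{align}
\text{minimize}\quad
  & \sum_{k}\sum_{C} \left|\Delta_k^{C}\right| \tag{1}\\
\text{subject to}\quad
  & \sum_{(i,j)} V_{i,j} \le U, \qquad
    V_{i,j} \ge \left|\hat{s}_{i,j}^{C} - \hat{s}_{i,j}^{C'}\right|
    \quad \forall\, (C, C') \tag{2}\\
  & \left|\hat{s}_{i,j}^{C}\right| \le \left|s_{i,j}^{C}\right| \tag{3}\\
  & \hat{L}_i^{C} \le L_{\max}^{C} \tag{4}\\
  & D_k^{C} + \Delta_k^{C} \ge D_{k,\min}^{C} \tag{5}\\
  & \beta_{\min}\left(D_k^{C'} + \Delta_k^{C'}\right)
    \le D_k^{C} + \Delta_k^{C}
    \le \beta_{\max}\left(D_k^{C'} + \Delta_k^{C'}\right) \tag{6}
\end{align}
```

Here \hat{L}_i^C is the sum of D_k^C + \Delta_k^C over the arcs on the root-to-sink path, and \hat{s}_{i,j}^C = \hat{L}_i^C - \hat{L}_j^C, so every constraint is linear in the \Delta variables. The absolute values in (1) and (2) can be linearized in the usual way with auxiliary variables, and (3) becomes two inequalities because its right-hand side is a constant.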
-11-
Local Optimization: Moves
Iterative local moves to minimize skew variation
Three types of local moves
1. Displacement {N, S, E, W, NE, NW, SE, SW} by 10μm x one-step sizing
2. Displacement by 10μm x one-step sizing on child buffer
3. Reassign to a new driver (i) at the same level, (ii) within a bounding box of 50μm x 50μm
[Figure: illustrations of move types (1), (2) and (3)]
Each move is expensive (= legalization, ECO routing, RC extraction, STA)
Each buffer has ~100 candidate moves
Which move is the best? Our solution: learning-based model
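To make the size of the candidate set concrete, the sketch below enumerates type-1 moves and filters type-3 drivers. Several details are assumptions not stated on the slide: a diagonal move shifts 10μm along both axes, "one-step sizing" means one size down, unchanged, or one size up, the 50μm x 50μm box is centred on the node being reassigned, and the dictionary fields (`name`, `level`, `x`, `y`) are hypothetical.

```python
from itertools import product

STEP_UM = 10.0  # displacement step from the slide
BOX_UM = 50.0   # bounding-box side for driver reassignment
DIRECTIONS = {
    "N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0),
    "NE": (1, 1), "NW": (-1, 1), "SE": (1, -1), "SW": (-1, -1),
}
SIZE_STEPS = (-1, 0, 1)  # assumption: one size down, unchanged, one size up

def type1_moves(x, y, size_index, num_sizes):
    """Type-1 candidates: one displacement step combined with a sizing step."""
    moves = []
    for (name, (dx, dy)), ds in product(DIRECTIONS.items(), SIZE_STEPS):
        new_size = size_index + ds
        if 0 <= new_size < num_sizes:  # skip sizes outside the library
            moves.append((name, x + dx * STEP_UM, y + dy * STEP_UM, new_size))
    return moves

def type3_drivers(node_xy, current_driver, drivers, box_um=BOX_UM):
    """Type-3 candidates: other drivers at the current driver's level that
    lie inside a box centred on the node (centring is an assumption)."""
    nx, ny = node_xy
    half = box_um / 2
    return [
        d["name"] for d in drivers
        if d is not current_driver
        and d["level"] == current_driver["level"]
        and abs(d["x"] - nx) <= half
        and abs(d["y"] - ny) <= half
    ]

# Hypothetical example: a mid-size buffer at the origin, three available sizes.
candidates = type1_moves(0.0, 0.0, size_index=1, num_sizes=3)
print(len(candidates))  # 8 directions x 3 sizing options = 24
```

Because each candidate would otherwise need legalization, ECO routing, RC extraction and STA to evaluate, even a few dozen candidates per buffer are costly to try exhaustively.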
-12-
Machine Learning-Based Model
Predict driver-to-fanout latency change due to local moves
Local move → Analytical models → Delta delays → Learning-based model → Delta delays
Analytical models
– Routing: FLUTE, STST
– Cell delay: Liberty LUTs
– Wire delay: Elmore, D2M
[Figure: % of buffers identified to have the best move (0% to 100%) vs. #attempts (0 to 12), for Flute+ED, Flute+D2M, STST+ED, STST+D2M and Model]
Each attempt is a local move
114 buffers, 45 candidate moves for each buffer
Learning-based model identifies the best moves for more buffers with fewer attempts
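The metric in the plot above can be sketched as follows: rank each buffer's candidate moves by the predicted gain, then count how many attempts are needed before the truly best move is tried. The function names and the toy numbers are hypothetical; only the metric itself comes from the slide.

```python
def attempts_to_best(predicted_gain, actual_gain):
    """Number of attempts (1-based) until the best actual move is tried,
    when moves are tried in decreasing order of predicted gain."""
    order = sorted(range(len(predicted_gain)),
                   key=lambda m: predicted_gain[m], reverse=True)
    best = max(range(len(actual_gain)), key=lambda m: actual_gain[m])
    return order.index(best) + 1

def identified_fraction(buffers, max_attempts):
    """Fraction of buffers whose best move is found within k attempts,
    for k = 1 .. max_attempts."""
    needed = [attempts_to_best(pred, act) for pred, act in buffers]
    return [sum(n <= k for n in needed) / len(needed)
            for k in range(1, max_attempts + 1)]

# Hypothetical data: (predicted gains, actual gains) per buffer.
buffers = [
    ([0.9, 0.2, 0.5], [3.0, 1.0, 2.0]),  # best move ranked first: 1 attempt
    ([0.1, 0.8, 0.4], [5.0, 1.0, 2.0]),  # best move ranked last: 3 attempts
]
curve = identified_fraction(buffers, 3)
print(curve)  # [0.5, 0.5, 1.0]
```

A better predictor pushes this curve toward 100% at a small number of attempts, which is the comparison the plot makes between the analytical models and the learning-based model.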
-14-
Experimental Setup
Technology: foundry 28nm LP
Initial clock tree from Synopsys IC Compiler
Testcases: (a) high-speed application processor, (b) memory controller
[Figure: layouts of the testcases with clock ports marked; clock nets/cells and sinks are shown in yellow]
Corners
Corner   Process   Voltage   Temperature   BEOL   Testcase
C0       SS        0.90V     -25°C         Cmax   (a), (b)
C1       SS        0.75V     -25°C         Cmax   (a), (b)
C2       FF        1.10V     125°C         Cmin   (b)
C3       FF        1.32V     125°C         Cmin   (a)
-15-
Experimental Results (1)
Up to 22% reduction in the sum of skew variation over all sink pairs
No skew degradation at any corner
Negligible area and power overhead
Testcase   Flow           Variation (ns)   Skew C0 (ps)   Skew C1 (ps)   Skew C2/C3 (ps)   #Cells   Power (mW)   Area (μm²)
(a)        Original       512              214            530            226               2515     0.355        3615
(a)        Global-local   399              175            387            188               2553     0.356        3706
(b)        Original       972              179            192            282               5568     0.865        8556
(b)        Global-local   841              176            192            232               5574     0.866        8557
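The headline percentage can be checked directly from the table. The short script below recomputes the reduction in the sum of skew variation and the area overhead for both testcases; the helper name and dictionary layout are illustrative.

```python
def change_pct(original, optimized):
    """Relative reduction from original to optimized, in percent."""
    return 100.0 * (original - optimized) / original

# Values copied from the results table.
table = {
    "(a)": {"variation": (512, 399), "area": (3615, 3706)},
    "(b)": {"variation": (972, 841), "area": (8556, 8557)},
}

for name, row in table.items():
    reduction = change_pct(*row["variation"])
    area_overhead = -change_pct(*row["area"])  # area grows, so flip the sign
    print(f"{name}: variation -{reduction:.1f}%, area +{area_overhead:.2f}%")
# (a): variation -22.1%, area +2.52%
# (b): variation -13.5%, area +0.01%
```

Testcase (a) gives the 22% figure quoted on the slide, while testcase (b) improves by about 13.5%.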
-16-
Experimental Results (2)
Figure shows comparison of skew variation on testcase (a)
Our optimization significantly reduces the large skew variation between corner pairs
[Figure: original skew variation (ns) vs. optimized skew variation (ns), one panel each for corner pairs (C0, C3) and (C0, C1)]
-18-
Conclusion and Future Work
First framework to minimize the sum of skew variation over all sink pairs in a clock tree
Up to 22% reduction of the sum of skew variation
Future work
– Study resultant power and area benefits
– Model to predict a buffer location for minimum skew over a continuous range of possible locations
Thank You!
-19-
Backup Slides
-20-
Experimental Results (3)
Figure shows distribution of skew ratios between C0 and C1
Our optimization significantly reduces the variation of skew ratios between corner pairs
[Figure: histograms of #sink pairs vs. ratio (= skew at C1 / skew at C0) for the Original and Global-local clock trees, annotated μ = 1.34, σ² = 3.21 and μ = 2.26, σ² = 2.26]