Upload
eustacia-hines
View
222
Download
0
Tags:
Embed Size (px)
Citation preview
1
Placement
2
Placement
• Problem Given a netlist, and fixed-shape cells (small,
standard cell), find the exact location of the cells to minimize area and wire-length
Consistent with the standard-cell design methodology
o Row-based, no hard-macros
Modules:o Usually fixed, equal height (exception: double height
cells)o Some fixed (I/O pads)o Connected by edges or hyperedges
• Objectives Cost components: area, wire length
o Additional cost components: timing, congestion[Bazargan]
3
Problem formulation
• Input: Blocks (standard cells and macros) B1, ... , Bn
Shapes and Pin Positions for each block Bi
Nets N1, ... , Nm
• Output: Coordinates (xi , yi ) for block Bi. No overlaps between blocks The total wire length is minimized The area of the resulting block is minimized or given a
fixed die
• Other considerations: timing, routability, clock, buffering and interaction with physical synthesis
4
Wire Length Objective
5
Placement Cost Components• Area
Would like to pack all the modules very tightly• Wire length (half-perimeter of the hnet bbox)
Minimize average wire length Would result in tight packing of modules with high
connectivity• Overlap
Could be prohibited by the moves, or used as penalty
Keep the cells from overlapping (moves cells apart)• Timing
Not a 1-1 correspondence with wire length minimization, but consistent on average
• Congestion Measure of routability Tends to move cells apart
[Bazargan]
6
Importance of Placement• Placement: fundamental problem in physical design • Glue of the physical synthesis• Became very active again in recent years:
9 new academic placers for WL min. since 2000 Many other publications to handle timing, routability, etc.
• Reasons: Serious interconnect issues (delay, routability, noise) in
deep-submicron designo Placement determines interconnect to the first order o Need placement information even in early design stages (e.g.,
logic synthesis)o Need to have a good placement solution
Placement problem becomes significantly larger Cong et al. [ASPDAC-03, ISPD-03, ICCAD-03] point out
that existing placers are far from optimal, not scalable, and not stable; subsequent rejoinder from other studies
[© He][Bazargan]
7
Placement can Make A Difference
• MCNC Benchmark circuit e64 (contains 230 4-LUT). Placed to a FPGA.
Random InitialPlacement
FinalPlacement
After DetailedRouting
[© He][Bazargan]
8
Design Types
• ASICs Lots of fixed I/Os, few macros, millions of standard cells Placement densities : 40-80% (IBM) Flat and hierarchical designs
• SoCs Many more macro blocks, cores Datapaths + control logic Can have very low placement densities : < 20%
• Micro-Processor (P) Random Logic Macros(RLM) Hierarchical partitions are placement instances (5-30K) High placement densities : 80%-98% (low whitespace) Many fixed I/Os, relatively few standard cells Recall “Partitioning w Terminals” DAC`99, ISPD `99,
ASPDAC`00
[© He][Bazargan]
9
Requirements for Placers
• Must handle 4-10M cells, 1000s macros 64 bits + near-linear asymptotic complexity Scalable/compact design database (OpenAccess)
• Accept fixed ports/pads/pins + fixed cells• Place macros, esp. with var. aspect ratios
Non-trivial heights and widths(e.g., height=2rows)
• Honor targets and limits for net length• Respect floorplan constraints• Handle a wide range of placement densities
(from <25% to 100% occupied), ICCAD `02
[© He][Bazargan]
10
A B C
Optimal Relative Order:
11
A B C
To spread ...
12
A B C
.. or not to spread
13
A B C
Place to the left
14
A B C
… or to the right
15
A B C
Optimal Relative Order:
Without “free” space the problem is dominated by order
16
Placement Footprints:
Standard Cell:
Data Path:
IP - Floorplanning
[Bazargan]
17
Core
ControlIO
Reserved areas
Mixed Data Path & sea of gates:
Placement Footprints:
[Bazargan]
18
Perimeter IO
Area IO
Placement Footprints:
[Bazargan]
19
UnconstrainedPlacement
[Bazargan]
20
Floor plannedPlacement
[Bazargan]
21
VLSI Global Placement Examples
bad placement
good placement
[Bazargan]
22
Placement Algorithms• Top-Down
Partitioning-based placement Recursive bi-partitioning or
quadrisectiono Cut direction?o Partition vs. physical location
• Iterative Simulated annealing
OR: Force directed/analytic Start with an initial placement,
iteratively improve wire-length / area
• Constructive Start with a few cells in the
center, and place highly connected adjacent modules around them
12 B
A
D B
FA
C
G
L H
[Bazargan]
23
Analytic Placement
24
Analytic Placement
• Also referred to as force-directed or quadratic placement
• Model Wires simulated as springs
Forceij = Weightij x distanceij.
Cell sizes as repellent forces
• Algorithm Solve a set of linear equations to find an
intermediate solution (module locations) Repeat the process until equilibrium
[Bazargan]
25
Force-Directed Placement (cont.)
• Model (details): Cell distances: either
OR:
Forces:
Objective: find x,y coordinates for all cells such that total force exerted on each cell is zero.
|||| jiijjiij yyyxxx
22 )()( jijiij yyxxd
)()(11
ij
n
jij
iyij
n
jij
ix ykFxkF
[Bazargan]
26
Force-Directed Placement (cont.)• Avoiding overlaps or collapsing in one point?
Use fixed boundary I/O cells Use repelling force between cells that are not
connected by a net Do not allow a move that results in overlap Use repelling “field” forces from congested areas
to sparse ones [Eisenmann, DAC’98]
• Problems with force directed: Overlap still might occur (cell sizes model
artificially) Flat design, not hierarchy
),()(1
iixij
n
jij
ix yxExkF
[Bazargan]
27
Analytic Placement (Contd.)
• Write down the placement problem as an analytical mathematical problem
• Solve the placement problem directly• Example:
Sum of squared wire length is quadratic in the cell coordinates.
So the wirelength minimization problem can be formulated as a quadratic program.
It can be proved that the quadratic program is convex, hence polynomial time solvable
28
x2x1
x=100
x=200Toy
Example:Cost x1 1002 x1 x2
2 x2 2002
x1Cost 2x1 1002x1 x2
x2Cost 2x1 x2 2x2 200
setting the partial derivatives = 0 we solve for the minimum Cost:
Ax + B = 0
= 04 22 4
x1
x2 200
400
= 02 11 2
x1
x2 100
200
x1=400/3 x2=500/3
29
setting the partial derivatives = 0 we solve for the minimum Cost:
Ax + B = 0
= 04 2 2 4
x1x2
200 400
= 02 1 1 2
x1x2
100 200
x1=400/3 x2=500/3
x2x1
x=100
x=200
Interpretation of matrices A and B:
The diagonal values A[i,i] correspond to the number of connections to xiThe off diagonal values A[i,j] are -1 if object i is connected to object j, 0 otherwiseThe values B[i] correspond to the sum of the locations of fixed objects connected to object i
Example:
30
Why formulate the problem this way?
• Because we can• Because it is trivial to solve• Because there is only one solution• Because the solution is a global optimum• Because the solution conveys “relative
order” information• Because the solution conveys “global
position” information
31
Gordian: A Quadratic Placement Approach
Global optimization: solves a sequence of quadratic programming problems
Partitioning: enforces the non-overlap constraints
Ref. 1: Gordian: VLSI Placement by Quadratic Programming and slicing Optimization, by J. M. Kleinhans, G.Sigl, F.M. Johannes, K.J. Antreich IEEE Trans. On CAD, March 1991. pp 356-365
Ref. 2: Analytical Placement: A Linear or a Quadratic Objective Function? By G. Sigl, K. Doll, F.M. Johannes, DAC’91 pp 427-423
32
Quadratic Placement Formulation
• Quadratic Placement Framework:repeat
Solve the convex quadratic programSpread the cells
until the cells are evenly distributed
ectorsSolution v,
cell and cellbetween net theofWeight
cell ofcenter theof sCoordinateLet
yx
jiw
i),y(x
ij
ii
ectorsSolution v,
cell and cellbetween net theofWeight
cell ofcenter theof sCoordinateLet
yx
jiw
i),y(x
ij
ii
22 )()(2
1 cell and cellbetween net theofCost
jijiij yyxxw
ji
22 )()(2
1 cell and cellbetween net theofCost
jijiij yyxxw
ji
const2
1
2
1cost Total yyyxxx T
yTT
xT dQdQ const
2
1
2
1cost Total yyyxxx T
yTT
xT dQdQ
33
Solution of the Original QP
34
Partitioning
• Find a good cut direction and position.
• Improve the cut value using FM.
35
• Before every level of partitioning, do the Global Optimization again with additional constraints that the center of gravities should be in the center of regions.
• Always solve a single QP (i.e., global).
Applying the Idea Recursively
Center of Gravities
36
Center of Gravity Constraints
Mathematical formulation
37
38
Process of Gordian
(a) Global placement with 1 region (b) Global placement with 4 region (c) Final placements
39
Star Model for Wire Length
40
Other Details
• Sometimes, there is too much overlapping between modules (indicating a bad partitioning). In that case, it is necessary to merge 2 sub-partitions and repartition them.
• When the number of cells in each partition is small enough, can stop partitioning. Then do a final placement of the sub-circuits according to the design style.
41
Repartitioning
42
Quadratic TechniquesPros:- mathematically well behaved- efficient solution techniques find global optimum- great quality
Cons:- solution of Ax + B = 0 is not a legal placement, so
generallysome additional partitioning techniques are required.
- solution of Ax + B = 0 is that of the "mapped" problem, i.e., nets are represented as cliques, and the solution minimizes wire length squared, not linear wire length unless additionalmethods are deployed
- fixed IOs are required for these techniques to work well
43
Differences between linear and quadratic objective function
A B Cfixed movable fixed
A B Cfixed fixedmovable
a) Quadratic objective function
b) Linear objective function
Linear vs. Quadratic Objective Function
44
Linear vs. Quadratic Objective Function (Cont’d)
Quadratic objective function tends to make very long nets shorter than linear objective function does, and lets short nets become slightly longer
row1
row2
row3
row4
row1
row2
row3
row4
A
B
A
B
Linear objective function Quadratic objective function
45
Kraftwerk Placement Tool
Hans Eisenmann and F. Johannes, “Generic Global Placement and Floorplanning”,
DAC-98, p.269 - 274
46
Approach
• Iteratively solve the quadratic formulation:
• Spread cells by additional forces:
• Density-based force proposed Push cells away from dense region to sparse
region
0
)(Min 21
dCp
constpdCpppf TT
0 fdCp
)','(' and ),( where
'''
')','(
2),( 2
yxryxr
dydxrr
rryxD
kyxf
// equivalent to spring force // equilibrium
47
Some Details
• Let fi be the additional force applied to cell i
• The proportional constant k is chosen so that the maximum of all fi is the same as the force of a net with length k(W+H)
• k is a user-defined parameter k=0.2 for standard operation k=1.0 for fast operation
• Can be extended to handle timing, mixed block placement and floorplanning, congestion, heat-driven placement, incremental changes, etc.
48
Some Potential Problems of Kraftwerk
1. Convergence is difficult to control• Large k oscillation• Small k slow convergence
• Example: Layout of a multiplier
2. Density-based force is expensive to compute
'''
')','(
2),( 2 dydx
rr
rryxD
kyxf
49
A “discrete” penalty function
• Used in Aplace, Kahng et al., ISPD05 Proposed by Naylor (U.S. Pat. 6301693 ) Divide placement area into grids Equalize cell area over all grids Straightforward overlap function
Bell-shaped function is continuous and differentiable
50
Natarajan Viswanathan and Chris ChuISPD-04
FastPlace: FastPlace: Efficient Analytical Placement Efficient Analytical Placement
using Cell Shifting, using Cell Shifting, Iterative Local Refinement Iterative Local Refinement
and a Hybrid Net Modeland a Hybrid Net Model
51
FastPlace Approach• FastPlace Framework (roughly):
repeatSolve the convex quadratic program Reduce wirelength by iterative heuristic Spread the cells
until the cells are evenly distributed
• Special features of FastPlace: Cell Shifting
o Easy-to-compute technique o Enable fast convergence
Hybrid Net Modelo Speed up solving of convex QP
Iterative Local Refinemento Minimize wirelength based on linear objective
52
Cell Shifting
Uniform Bin Structure Non-uniform Bin Structure
1.1. Shifting of bin boundary Shifting of bin boundary
2.2. Shifting of cells linearly within each Shifting of cells linearly within each binbin
• Apply to all rows and all columns independently
53
Cell Shifting – Animation …
NBi
Bini
Bini+1
OBiOBi-1 OBi+1
Ui Ui+1
j
k
l
Bini
Bini+1
OBiOBi-1 OBi+1
j
k
l
NBi
54
Iterative Local Refinement
• Iteratively go through all the cells one by one• For each cell, consider moving it in four directions
for a certain distance• Compute a score for each direction based on
Half-perimeter wirelength (HPWL) reduction Cell density at the source and destination regions
• Move to the direction with highest positive score (Not move if no positive score)
• Distance moved (H or V) is decreasing over iterations
• Detailed placement is handledby the same heuristic
H H
V
V
55
Pseudo pin and Pseudo net
Pseudo net
Additional Force
Pseudo pin
Target Position
Original Position
• Need to add forces to prevent cells from collapse back
• By adding pseudo pins and pseudo nets
• Only diagonal and linear terms of the quadratic system need to be updated
• Horizontal and vertical problems have the same connectivity matrix Q
56
Effect of Net Model on Runtime
• Need to replace each multi-pin net by 2-pin nets• Then the placement problem (even with pseudo nets) can be
formulated as a convex QP:
• Solved by any convex QP algorithms Use Incomplete Cholesky Conjugate Gradient (ICCG)
• Runtime is proportional to # of non-zero entries in Q • Each non-zero entry in Q corresponds to one 2-pin net• Traditionally, placers model each multi-pin net by a clique• High-degree nets will generate a lot of 2-pin nets• Slow down convex QP algorithms significantly
const2
1
2
1cost Total yyyxxx T
yTT
xT dQdQ const
2
1
2
1cost Total yyyxxx T
yTT
xT dQdQ
57
Clique, Star and Hybrid Net Models
Star Node
Clique Model Star Model Hybrid Model
# pins Net Model2 Clique3 Clique4 Star5 Star6 Star… …
• Star model is introduced by Mo et al. [ICCAD-00] for macro placement
• Introduces a star node even for 2-pin nets
58
Comparsion
• FastPlace is fast: Compared to Capo 8.8:
o 13x faster o 1% longer in wirelength
Compared to Dragon 2.2.3o 97.4x fastero 1.6% longer in wirelength
Compared to Kraftwerk (based on published data)o 20-25x fastero 10% better in wirelength
59
Detailed Placement
60
Objectives
• Major: Legalization Make placement feasible with as little
perturbation as possible
• Minor: Further Improvement Wirelength Timing Routability
61
Representative Approaches
• Constructive methods Network flow [K. Doll et al, TCAD’94] Linear placement [A. Kahng et al, ASPDAC’99] Window interleaving [S. Hur et al, ICCAD’00’]
• Iterative method Simulated annealing [W. Sun & C. Sechen, ICCAD’93]
• Hybrid method Network flow + Dynamic programming + Linear
programming [J. Vygen, DATE’98] [Brenner et al, ISPD’04]
• Placement migration/spreading Diffusion-based [Ren et al, DAC’05] Delaunay [Luo et al, ICCAD’05]
62
Constructive Method[K. Doll et al, TCAD’94]
• Overlapped initial placement
• Improve placement• Divide chip into
overlapping regions• Assign cells within
each region• Iterated if necessary
63
Constructive Method: Network Flow
• Network transformation for regions
C1
C2
C3
Region
Final Result
Subcells Subregion
(capacity, cost)
64
Constructive Method: Network Flow
• Within each region Chop cells into uniform sized subcells
o All subcells from the same cell have same cost to each candidate location• All subcells tend to be pulled towards the cheapest location. As a
result, tend to lie side by side• In case of separation, move subcells to the column holding the
majorityo Determine y coordinate using gravity center of subcells
Transform into min-cost-maximum-flowo Introduce source Q and sink So Capacity from Q to any cell µ is sµ=# subcells in µ o Capacity from candidate location to S is 1o One edge from each cell to each candidate location with
capacity o One edge from each candidate location to sink D with
capacity 1 Solve by flow-augmentation algorithm
65
Constructive Method[K. Doll et al, TCAD’94]
Pseudo-code for the algorithm
66
Constructive Method [A. Kahng et al, ASPDAC’99]
• Determine the order of cells on a row from global placement
• Fix cells outside the row, and place the cells within the row optimally according to cell order
• Feasible location of cells are discrete, e.g., grid points
• Consider white space on the row, i.e., unused space between cells
• Three algorithms are given to exploit different properties of cost function: details omitted
• Key point: concept of optimal regions
67
Constructive Method: Linear Placement
• Input A single row with dimension with n movable cells ci, i=1,2,…
m Fixed cells in other rows Feasible locations S1, S2, Sk
m nets, N1, N2, … Nm between the movable and fixed cells Discrete feasible location for cells
• Output Overlapping free placement of ci within the row with
minimum
• Constraint (fixed order) x(c1) x(c2) … x(cn)
n
iiNHPWL
1
)(
68
Constructive Method: Linear Placement
fixed cells
net N
span (N)
fl(N) fr(N)
ml(N) mr(N)
fixed_span (N)
minimize
69
Constructive Method: Linear Placement
• Consider contribution of cell ci
• Piece-wise linear and convex Increasing (decreasing) when fr(N) to
the left of corresponding li(N) is less (more) than fl(N) to the right of corresponding ri(N)
)(
)(
}0),()(max{
}0),()(max{)(cos
NLCill
rrNRCi
i
NmNf
NfNmxt
fr(1) fl(2) fl(3) fr(3) fl(4) fr(2) fr(5) fl(6) fl(7)
Minimum Segment
1 23 4
5
6
7
70
Optimal region for detailed placement• Generalization of Kahng, ASPDAC99 (key idea proposed
earlier by Goto in 1981); used in Pan, ICCAD05• Optimal region = target region for moving a cell in detailed
placement• Given by the x and y medians of all nets
Given nets j=1…n, with bounding box coordinates xj(l), xj(r), yj(l), yj(r), order all x’s and find the median; order all y’s and find the median
71
Constructive Legalization in Mongrel [Hur and Lillis, ICCAD’00]
• Start from an overlap-free detailed placement and further improve the wirelength
• Pick an arbitrarily sized interval (window) on a row Fix the cells outside the window, and
interleave the cells optimally Repeat by sliding the window right across
each row
72
1. Wirelength improvement
• All movements monotonic from a source bin S with violated capacity to a destination bin T with available capacity
• Gain = wire length reduction by moving a single cell to a neighbor
• Create a gain graph (DAG), and find maximum gain path
73
2. Window Interleaving
• Input An arbitrarily sized interval W within a row Arbitrary ordered subsequence of W as A={a1, a2, … am}. , and B = W-
A = {b1, b2, … bn} Fixed cells outside W n nets, N1, N2, … Nn between cells in W and fixed cells
• Output Overlapping free placement within W with minimum
• Constraint Cell order of A and B is preserved Wirelength of nets connecting cells within W is minimized
n
iiNHPWL
1
)(
74
Ordered sequence A and B are chosen arbitrarilyThe cell order of A and B is preserved after interleaving
2. Window Interleaving
75
• Dynamic programming Subproblem
o Denote Sij as the interleaving if {a1, a2,… ai } and {b1, b2,… bj}, with C(Si,j) minimized, C(Si,j) denotes the cost of Sij ,
Recursion
Complexity O(nm+p(n+m)), p is the number of pins on incident nets, n and m is the cells in A and B respectively
otherwise ,
)()( if ,,
0)(
0
1,
1,,11
0,0
0,0
jji
jjiijiiji
ij bS
bSCaSCaSS
SC
S
2. Window Interleaving
76
• Dynamic programming Key point: the optimal interleaving of a
prefix Si,j is independent of the ordering of subsequent cells in the window
Separability => Dynamic Programming
• Points to ponder Is it optimal? What is the difference compared to the DP
in Linear Placement [Kahng et al]?
2. Window Interleaving
Other approaches to legalization
• Not discussed in class – for your reference FengShui
o Dynamic programming [ICCAD’03]o Greedy Algorithm [ISPD’04]
Tetris - Dwight Hill (US Patent 6,370,673)o Also used in Kahng/Wang APlace paper [ISPD04]
77
Simulated annealing
78
79
Simulated Annealing Placement
• Cost Area (usually fixed # of rows, variable row width) Wirelength (Euclidian or Manhattan) Cell overlap (penalty increases with
temperature)
• Moves Exchange two cells within a radius R
(R temperature dependent?) Displace a cell within a row Flip a cell horizontally
• Low vs. High temperature If used as a post processing, start with low-temp
• Post-processing? Might be needed if there are still overlaps
[Bazargan]
80
Case Study: TimberWolf• “The Timberwolf Placement and Routing Package”, Sechen, Sangiovanni;
IEEE Journal of Solid-State Circuits, vol SC-20, No. 2(1985) 510-522• “Timber wolf 3.2: A New Standard Cell Placement and Global Routing
Package” Sechen, Sangiovanni, 23rd DAC, 1986, 432-439
Timber wolf
Stage 1 Modules are moved between different rows as well as within the
same row Modules overlaps are allowed When the temperature is reduced below a certain value, stage 2
begins
Stage 2 Remove overlaps Annealing process continues, but only interchanges adjacent
modules within the same row
[© He][Bazargan]
81
Solution Space
All possible arrangements of modules into rows possibly with overlaps
overlaps
[© He][Bazargan]
82
Neighboring Solutions
M3: Change the orientation of a module1 2
3 4
2 1
3 4
1 2
3 4
Axis of reflections
M1: Displace a module to a new location
M2: Interchange two modules
..
Three types of moves:
[© He][Bazargan]
83
Move Selection
• Timber wolf first tries to select a move between M1 and M2
o Prob(M1)=4/5o Prob(M2)=1/5
• If a move of type M1 is chosen (for certain module) and it is rejected, then a move of type M3 (for the same module) will be chosen with probability 1/10
• Restriction on: How far a module can be displaced What pairs of modules can be interchanged
M1: Displacement
M2: Interchange
M3: Reflection
[© He][Bazargan]
84
Move Restriction
• Range Limiter At the beginning, R is very large, big enough to
contain the whole chip Window size shrinks slowly as the temperature
decreases. In fact, height and width of R log(T) Stage 2 begins when window size are so small that
no inter-row modules interchanges are possible
Rectangular window R
[© He][Bazargan]
85
Cost Function
• Cost = C1+C2+C3
C1 = (aiwi + bihi)
ai, bi are horizontal and vertical weights, respectively
ai =1, bi =1 1/2 perimeter of bounding box
Critical nets: Increase both ai and bi
Double metal technology: Over-the-cell routing is possible. Fewer feed through cells are needed
If vertical wirings are “cheaper” than horizontal wiringsl use smaller vertical weights i.e. bi< ai
net i
hi
wi
[© He][Bazargan]
86
Cost Function (Cont’d)
C2: Penalty function for module overlaps
O(i,j) = amount of overlaps in the X-dimension between modules i and j
— offset parameter to ensure C2 0 when T 0
ji
jiOC2
2 ),(
C3: Penalty function that controls the row lengths
Desired row length = d( r )
l( r ) = sum of the widths of the modules in row r
r
rdrlC )()(3
[© He][Bazargan]
87
Annealing Schedule
Tk = r(k)•Tk-1 k= 1, 2, 3, …. r(k) increase from 0.8 to max value 0.94 and
then decrease to 0.1 At each temperature, a total number of K•n
attempts is made n= number of modules K= user specified constant
[© He][Bazargan]
88
Dragon2000: Standard-Cell Placement Tool for Large
Industry Circuits
M. Wang, X. Yang, and M. Sarrafzadeh,ICCAD-2000
pages 260-263
89
Main Idea
• Simulated annealing based 1.9x faster than iTools 1.4.0 (commerical version of
TimberWolf) Comparable wirelength to iTools (i.e., very good) Performs better for larger circuits Still very slow compared with than other approaches Also shown to have good routability
• Top-down hierarchical approach hMetis to recursively quadrisect into 4h bins at level h Swapping of bins at each level by SA to minimize WL Terminates when each bin contains < 7 cells Then swap single cells locally to further minimize WL
• Detailed placement is done by greedy algorithm
Partitioning
90
91
Partitioning-based Approach
• Try to group closely connected modules together.
• Repetitively divide a circuit into sub-circuits such that the cut value is minimized.
• Also, the placement region is partitioned (by cutlines) accordingly.
• Each sub-circuit is assigned to one partition of the placement region.
Note: Also called min-cut placement approach.
92
An Example
Circuit
Placement
Cutline
93
Variations
• There are many variations in the partitioning-based approach. They are different in: The objective function used. The partitioning algorithm used. The selection of cutlines.
94
Partitioning:
Objective:
Given a set of interconnected blocks, produce two sets thatare of equal size, and such that the number of nets connecting the two sets is minimized.
95
FM Partitioning:
Initial Random Placement
After Cut 1
After Cut 2
list_of_sets = entire_chip;while(any_set_has_2_or_more_objects(list_of_sets)){
for_each_set_in(list_of_sets){
partition_it();}/* each time through this loop the number of *//* sets in the list doubles. */
}
96
Partitioning-based Placement• Simultaneously perform:
Circuit partitioning Chip area partitioning Assign circuit partitions to chip slots
• Problem: Circuit partitioning unaware of the physical
location
Solution: Terminal propagation (add dummy terminals)
A B
A
B
A B
[She99] p.239
A B
[Bazargan]
97
Terminal Propagation Algorithm by Dunlop and Kernighan
“A Procedure for Placement of Standard-Cell VLSI Circuits”,TCAD, 4(1):92-98, Jan. 1985.
98
Problem of Partitioning Subcircuits
A B
A B
A
B
Cost of these 2 partitionings are not the same.
99
Terminal Propagation
• Need to consider nets connecting to external terminals or other modules as well.
• Do partitioning in a breath-first manner (i.e., finish all higher-level partitioning first).
A BA
B
Dummy Terminal
A B
The Dummy Terminal will try to pull B to the top partition.
100
Terminal Propagation
101
Partitioning-based Placement• More problems:
Direction of the cut? [Yildiz, DAC’01]
How to handle fixed blocks? (area assigned to a partition might not be enough)
How to correct a bad decision made at a higher level?
• Advantages: Hierarchical, scalable Inherently apt for congestion minimization, easily
extendable to timing optimization
1
2 34 5
6 7
12 3
4 5
1234
56789
(a) (b) (c)
1 2
3
(d)
[Bazargan]
102
Can Recursive Bisection Alone Produce Routable Placement?
(Name of placer: Capo)
Andrew Caldwell, Andrew Kahng, and Igor Markov
DAC-2000
103
Capo Overview
• Standard cell placement, Fixed-die context• Pure recursive bisectioning placer
Several minor techniques to produce good bisections
• Produce good results mainly because: Improvement in mincut bisection using multi-
level idea in the past few years Pay attention to details in implementation
• Implementation with good interface (LEF/DEF and GSRC bookshelf) available on web
104
Capo Approach
• Recursive bisection framework: Multi-level FM for instances with >200 cells Flat FM for instances with 35-200 cells Branch-and-bound for instances with <35 cells
• Careful handling partitioning tolerance: Uncorking: Prevent large cells from blocking smaller
cells to move Repartitioning: Several FM calls with decreasing
tolerance Block splitting heuristics: Higher tolerance for vertical
cut Hierarchical tolerance computation: Instance with more
whitespace can have a bigger partitioning tolerance
105
Handling Macrocells
• Macrocells: large blocks Shred into smaller blocks of regular height [Doll,
1994] Add a number of “fake”nets between these to
keep them together (or equivalently, use a larger net weight)
Use the centroid of the placed blocks as the centroid of the macroblock after placement
• Can be used for “floorplacement” (floorplanning+placement)
106
Partitioning with CAPO
Pros: fast scales nearly linearly with problem size
Cons: non-trivial to implement very directed algorithm, but this limits the
ability to deal with miscellaneous constraints Not stable (if there is minor change)
o Concept of placement stability: if the input netlist experiences a small change, the placement should not change substantially
107
Summary for Partition Based Placement
• Improvement in mincut partitioning are conducive to better wirelength and congestion
• Routable placements can be produced in most cases without explicit congestion management Explicit congestion control may still be useful in
some cases
• Better weighted wirelength often implies better routed wirelength, but not always
108
Other Placement Metrics
109
PO1
PO2
PO3
PI1
PI2
PI3
1
3
1
4
6
44
6
6
5
5
7
4
netlist with delay for each gate
PO1
PO2
PO3
PI1
PI2
PI3
1
3
1
4
6
44
6
6
5
5
7
4
arrival times
0
0
0
1
3
1
7
9
7
7
13
15
14
18
22
18
Timing Analysis
110
PO1
PO2
PO3
PI1
PI2
PI3
1
3
1
4
6
44
6
6
5
5
7
4
arrival time/required time
0/4
0/0
0/8
1/5
3/3
1/9
7/9
9/9
7/15
7/13
13/15
15/15
14/18
18/22
22/22
18/22
PO1
PO2
PO3
PI1
PI2
PI3
1
3
1
4
6
44
6
6
5
5
7
4
slack = required time - arrival time
4
0
8
4
0
8
2
0
8
6
2
0
4
4
0
4
Timing Analysis
Another example with interconnect delay – Same Timing Analysis
5 5 5
4 4 4
2
LATCH
LATCH
3 2 1 1
2 1 3 2
1
2222
1919
112
Timing Driven Placement Approaches
• Path-based Most accurate information Very slow
• Budgeting Inaccurate information Hard to budget Fast
• Net-based approach Net-weighting
113
Net-Weighting• Basic approach
For more timing critical nets (i.e., smaller slack), assign higher net weights
Minimize
where
),(_ ilengthnetwi
i
ii S
w1
114
Congestion Minimization
• Traditional placement problem is to minimize interconnection length (wirelength)
• A valid placement has to be routable• Congestion is important because it
represents routability (lower congestion implies better routability)
• There is not yet enough research work on the congestion minimization problem
115
Definition of Congestion
Routing demand = 3Assume routing supply is 1,overflow = 3 - 1 = 2 on this edge.
Overflow = overflowall edges
Overflow on each edge =
Routing Demand - Routing Supply (if Routing Demand > Routing Supply)0 (otherwise)
116
Correlation between Wirelength and Congestion
Total Wirelength = Total Routing Demand
117
Wirelength Congestion
A congestion minimized placement A wirelength minimized placement
118
Congestion Map of a Wirelength Minimized Placement
Congested Spots
119
Congestion Reduction Postprocessing
Reduce congestion globally by minimizing the traditional wirelength
Post process the wirelength optimized placement using the congestion objective
120
An Effective Congestion Driven Placement Framework
André RoheUniversity of Bonn, Germany
joint work with Ulrich Brenner
ISPD 2002 (Best Paper)
121
A Dense Placement
• good wirelength• impossible to
route
122
Possible Solution
• easy to route
• bad wirelength/timing
123
Congestion Driven Placement
• easy to route + good wirelength almost no extra computation effort !
124
Overall Algorithm: Bonn Place
• Partitioning based approach• Solves QP in each level, followed by
partitioning • Partitioning is done by quadrisection:
circuits are partitioned with minimum movement (Vygen)
125
Methods used for congestion driven placement
• Very fast congestion calculation
• Inflate circuits in congested regions
• Spreading inflated cells
126
Congestion calculation
• Calculate Steiner Tree for each net• Probability estimation for each 2-point
connection (similar to Hung & Flynn, Lou et al.)
127
Inflation of circuits(used previously by Hou et al.)
• Initial inflation (based on pin density) • Given a circuit c in Region R, c is inflated by
up to 100% • The inflation is based on the congestion in
R and the surrounding regions & the pin density in R
• Deflation is possible if the circuit is no longer critical.
128
Placement Step 0
129
Placement Step 1
130
Placement Step 2
131
Placement Step 3
132
Placement Step 4
133
Placement Step 5
134
Placement Step 6
135
Placement Step 7
136
Spreading inflated cells
• Repartitioning considers 2x2 windows in placement grid to optimize netlength
• Use extra repartitioning step to move cells away from overloaded regions
137
Placement Migration
• From an initial placement (either legal or illegal), migrate to another placement solution
• It happens all the time during physical synthesis for multi-objective design closure
• Post/detailed placement Legalization due to buffer insertion, gate sizing,
Engineering Change Order (ECO), or decoupling capacitor insertion.
Congestion/noise mitigation Thermal hot spots removal …
• Global placement Cell spreading is a KEY step in any analytical engine
138
Prior Works
• There are many previous works in legalization/spreading Greedy approach [Viswanathan & Chu, ISPD’04]] Force-directed [Eisenmann & Johannes, DAC’98] Network flow [Brenner et al, ISPD’04] Dynamic Programming [Kahng et al, ASPDAC’99; Hur &
Lillis ICCAD’00] Global re-placement from a late cut [Ren et al,
ICCAD’04] ……However, none of them attempted to
PRESERVE the relative geometry orderin a continuous manner – important for design closure!
139
Diffusion Based Placement Migration[Ren et al, DAC’05]
Initial placement(illegal)
Legal placement from greedy alg.
Legal placement from DiffusionBased Placement[Ren+, DAC’05]
140
What is Diffusion?
• Diffusion equation
• Physical diffusion is a natural process to diffuse material from high density to low density while maintaining the relative order
)()(
,2, tdD
t
tdyx
yx
141
Diffusion Flow
• The velocity of the diffusion flow is proportional to the local density gradient.
• Its final location can be computed by the velocity integral.
d
dV
VdtX
A
B
142
Diffusion-Based Placement
Initialize density
Compute cell velocity
Compute next cell location
Update density
Max(density) < 1No
Yes
143
Initialize Bin Density
0.5 1.6 0.6 0.4
1.4 1.0 0.4 0.8
1.2 0.4 0.8 0.6
0.4 1.0 0.2 0.6
144
Compute Horizontal Bin Velocity
50012
414011 .
.
..vΗ
,
1.4 1.0 0.4
145
Compute Vertical Bin Velocity
6.00.12
6.14.01,1
Vv
1.6
1.0
0.4
146
Compute Bin Velocity
1,1vVv 1,1
Hv 1,1
147
Velocity Interpolation
1,2v1,1v
2,2v2,1v
148
Move Cell
n 1n
tvnyny
tvnxnxV
H
)()1(
)()1(
149
Update Bin Density
)(1,1 nd
)(3,1 nd
)(2,0 nd )(2,2 nd
)](2)()([2
)](2)()([2
)()1(
2,13,11,1
2,12,22,02,12,1
ndndndt
ndndndt
ndnd
)(2,1 nd
150
Cell Movement Example
x(0),y(0)
x(9),y(9)
151
Computational Geometry Based Placement [Luo+, ICCAD’05]
• Another view for stable placement migration to maintain the “relative” geometry order Bin-based Spreading
o Iterative bin stretchingo Cell interpolation
Delaunay triangulation based legalizationo Delaunay triangulation of the placement regiono Fine-grain density distribution and overlapping
removing
Geometric placement stability metrics
152
Bin Stretching
• Compute the bin density Dk,l(n) in iteration n
• Compute εxk,l and εx
k,l, the stretching/shrinking amount in Horizontal and Vertical directions. W
D
nD
ett
lkxlk )1
)((
arg
,,
HD
nD
ett
lkylk )1
)((
arg
,,
Dk,l(n)
Dtarget
εxk,l
εyk,l
W
H
153
Bin Stretching Corner Stretching
• Stretch each bin individually will generate bin overlaps stretch bin corners px
k,l, pyk,l instead
If a corner is on a fixed boundary, take the 0.5 factor off δ is for controlling the smoothness of spreading process
)(5.0)()1( ,1,,11,1,,x
lkx
lkx
lkx
lkx
lkx
lk npnp
)(5.0)()1( ,1,,11,1,,y
lky
lky
lky
lky
lky
lk npnp
D3,4 D4,4
D3,3 D4,3
p4,4(n)
154
Cell Location Interpolation
• Linearly interpolate cell coordinates x(n+1), y(n+1) from x(n), y(n) inside the new stretched bin
β *(pk,l(n+1)-pk,l-1(n+1))
x(n+1),y(n+1)
pk-1,l-1(n+1)
Pk-1,l(n+1)
pk,l(n+1)
pk,l-1(n+1)α*(pk,l-1(n+1)-p’k-1,l-1(n+1))
x(n),y(n)
pk-1,l-1(n)
Pk-1,l(n) pk,l(n)
Pk,l-1(n)
α*(pk,l-1(n)-pk-1,l-1(n))
β *(pk-1,l(n)-pk-1,l-1(n))
155
Hierarchical Iterative Spreading
• In every iteration The bin boundary is restored after cells are moved to new
locations Re-compute the bin density and repeat previous steps Stops once all bin densities are under Dtarget.
• Multilevel spread Use large bin in the beginning Divide large bin into smaller bins, selectively do regional
spreading.
…
156
Delaunay Triangulation
• What is Delaunay triangulation? The dual of the Voronoi diagram Given a set of point set S, the
Voronoi diagram of S is the partition of space such that each point has a closet subspace.
• Delaunay triangulation captures the geometric ordering Any point is connected with all
nearest neighbors by Delaunay edges.
157
Delaunay Triangulate a Placement
158
Delaunay Triangulation Based Legalizer
• Delaunay-triangulate the placement region• Delaunay-based cell spreading• Snap cells to closest rows• Row balancing (multiple iterations)
Triangulate the region
• Final legalization (multiple iterations)
159
Delaunay Spreading Order
• Cell spreading ordering Pick a root cell in the center of congestion area BFS walking through Delaunay edges to add
cells into the tree Spreading stops at low congestion area
[For details, see the paper]
…
160
Multilevel placement
(Note: this is a full placer, not a detailed placement method!)
Motivation and Related WorkMultilevel Methods in Scientific Computation
• Originally developed to solve boundary-value partial differential equation (PDE) problems on continuous field
• Discretized elliptic PDE is a structured, positive-definite system of linear equations Geometric/algebraic multi-grid methods
• Successfully applied to solve hypergraph partitioning problem: Hmetis [ G. Karypis 1998]
161
162
Multigrid methods
• Structure of analysis problem is similar to solution of partial differential equations (PDE’s)
• Borrow the idea of multigrid solution techniques
[Taken from U. Meier Yang, ATCS Toolkits Workshop
2001]
163
Outline of the approach• Original grid Reduced (coarsened) grid
• Solve coarsened grid• Map solution back to original fine grid
164
Illustration of a multilevel algorithm (for graph drawing)
Coarse solution
Finer solution (in various iterations)
Final solution
Flat (single-level) solution
[Walshaw]
165
Main Components in Multilevel Framework
1. Create coarser problems [aggregation/coarsening/clustering]
2. Optimize coarser problems [relaxation/smoothing]
3. Transform coarser problem solution to finer level [Interpolation/declustering ]
166
Initial Fine-Grain Problem
mPL: Multilevel Placement Framework
Intermediate Level
Intermediate Level
Find a Good Coarse-Grain Problem Solution
Aggregate
Aggregate
Intermediate LevelRelaxation (Optimization)
Intermediate LevelRelaxation (Optimization)
Final Fine-Grain Problem.Thorough Relaxation and
Detailed Placement
Aggregate Interpolate
InterpolateAggregate
etc. etc. Interpolate
Interpolate
167
Solving the Subproblem
• Problem formulation (horizontal case):
• Iterative solve the weighted quadratic minimization problem, using the current solution to determine the weight (Gordian-L).
number. small is , )(||
1 where
|)(|
))((min
)()(
2
eve
Ee evk
ek
e
vxe
x
xvx
xvx
M
168
AMG-based Linear Interpolation [A. Brandt 1986]
interpolation
constantAMG
jijvF
jijvC
i vavavjj points points
clusterNext finer level cells
Within each cluster, select the one with
maximum degree as C-point; others are
considered as F-points
C-point
169
AMG-based Interpolation
• Use the clique-model graph to define connectivity weights (connectivity matrix)
• Within each cluster, select the one with maximum degree as a C-point
• Each C-point is placed at the cluster’s position.• Each F-point is placed at the weighted average of the C-
points and F-points to which it is strongly connected• The F-points’ positions can be iteratively improved.
170
Iterated Multilevel Flow
Make use of placement solution from 1st V-cycle
First Choice (FC)clustering
Geometric basedFC clustering
171
Comparison on Standard Cell Designs(Caveat: this is outdated)
Capo9.01.09, 2.29
Dragon3.011.03, 12.38
FastPlace1.01.08, 0.18
Fengshui5.01.06, 2.03mPL5
1,1mPL5-fast1.07, 0.30
123456789
10111213
0.98 1 1.02 1.04 1.06 1.08 1.1
Scaled wirelength
Sca
led
ru
nti
me
Experiments carried out on ISPD2004 FastPlace IBM benchmarks.
172
Placement: Summary
173
Global Placement
Partitioning(Capo)
SimulatedAnnealing
(TimberWolf,Dragon)
Multilevel(mPL)Analytic/
Quadratic/Force-directed
Partitioningbased
(Gordian)
Densitybased
(Kraftwerk,Aplace)
various classes of algorithms
Cell shifting(Fastplace)
types of spreading forces
Global Placement
HPWL(conventional)
Timing(net weighting)
Congestion(cell inflation)
Metric: linear vs.quadratic wirelength
(net weighting)
objectives
174
Detailed Placement
Legalization Improving wirelength, timing, congestion
objectives
Detailed Placement
techniques
Network flows(Domino)
Optimal regions(Kahng-ASPDAC99;
Fastplace:Pan-ICCAD05)Restricted
network flows by longest paths
(Hur/Lillis-Mongrel)Greedy
Dynamic programming(Kahng-ASPDAC99;Hur/Lillis-Mongrel,
FengShui)
Diffusion analogy(Ren-DAC05)
Grid distortion/Delaunay triangulations
(Luo-ICCAD05)