1 Placement. 2 Problem Given a netlist, and fixed-shape cells (small, standard cell), find the exact location of the cells to minimize area and wire-length

1

Placement

2

Placement

• Problem Given a netlist, and fixed-shape cells (small,

standard cell), find the exact location of the cells to minimize area and wire-length

Consistent with the standard-cell design methodology

o Row-based, no hard-macros

Modules:o Usually fixed, equal height (exception: double height

cells)o Some fixed (I/O pads)o Connected by edges or hyperedges

• Objectives Cost components: area, wire length

o Additional cost components: timing, congestion[Bazargan]

3

Problem formulation

• Input: Blocks (standard cells and macros) B1, ... , Bn

Shapes and Pin Positions for each block Bi

Nets N1, ... , Nm

• Output: Coordinates (xi , yi ) for block Bi. No overlaps between blocks The total wire length is minimized The area of the resulting block is minimized or given a

fixed die

• Other considerations: timing, routability, clock, buffering and interaction with physical synthesis

4

Wire Length Objective

5

Placement Cost Components• Area

Would like to pack all the modules very tightly• Wire length (half-perimeter of the hnet bbox)

Minimize average wire length Would result in tight packing of modules with high

connectivity• Overlap

Could be prohibited by the moves, or used as penalty

Keep the cells from overlapping (moves cells apart)• Timing

Not a 1-1 correspondence with wire length minimization, but consistent on average

• Congestion Measure of routability Tends to move cells apart

[Bazargan]

6

Importance of Placement• Placement: fundamental problem in physical design • Glue of the physical synthesis• Became very active again in recent years:

9 new academic placers for WL min. since 2000 Many other publications to handle timing, routability, etc.

• Reasons: Serious interconnect issues (delay, routability, noise) in

deep-submicron designo Placement determines interconnect to the first order o Need placement information even in early design stages (e.g.,

logic synthesis)o Need to have a good placement solution

Placement problem becomes significantly larger Cong et al. [ASPDAC-03, ISPD-03, ICCAD-03] point out

that existing placers are far from optimal, not scalable, and not stable; subsequent rejoinder from other studies

[© He][Bazargan]

7

Placement can Make A Difference

• MCNC Benchmark circuit e64 (contains 230 4-LUT). Placed to a FPGA.

Random InitialPlacement

FinalPlacement

After DetailedRouting

[© He][Bazargan]

8

Design Types

• ASICs Lots of fixed I/Os, few macros, millions of standard cells Placement densities : 40-80% (IBM) Flat and hierarchical designs

• SoCs Many more macro blocks, cores Datapaths + control logic Can have very low placement densities : < 20%

• Micro-Processor (P) Random Logic Macros(RLM) Hierarchical partitions are placement instances (5-30K) High placement densities : 80%-98% (low whitespace) Many fixed I/Os, relatively few standard cells Recall “Partitioning w Terminals” DAC`99, ISPD `99,

ASPDAC`00

[© He][Bazargan]

9

Requirements for Placers

• Must handle 4-10M cells, 1000s macros 64 bits + near-linear asymptotic complexity Scalable/compact design database (OpenAccess)

• Accept fixed ports/pads/pins + fixed cells• Place macros, esp. with var. aspect ratios

Non-trivial heights and widths(e.g., height=2rows)

• Honor targets and limits for net length• Respect floorplan constraints• Handle a wide range of placement densities

(from <25% to 100% occupied), ICCAD `02

[© He][Bazargan]

10

A B C

Optimal Relative Order:

11

A B C

To spread ...

12

A B C

.. or not to spread

13

A B C

Place to the left

14

A B C

… or to the right

15

A B C

Optimal Relative Order:

Without “free” space the problem is dominated by order

16

Placement Footprints:

Standard Cell:

Data Path:

IP - Floorplanning

[Bazargan]

17

Core

ControlIO

Reserved areas

Mixed Data Path & sea of gates:


[Bazargan]

18

Perimeter IO

Area IO


[Bazargan]

19

UnconstrainedPlacement

[Bazargan]

20

Floor plannedPlacement

[Bazargan]

21

VLSI Global Placement Examples

bad placement

good placement

[Bazargan]

22

Placement Algorithms• Top-Down

Partitioning-based placement Recursive bi-partitioning or

quadrisectiono Cut direction?o Partition vs. physical location

• Iterative Simulated annealing

OR: Force directed/analytic Start with an initial placement,

iteratively improve wire-length / area

• Constructive Start with a few cells in the

center, and place highly connected adjacent modules around them

12 B

A

D B

FA

C

G

L H

[Bazargan]

23

Analytic Placement

24

Analytic Placement

• Also referred to as force-directed or quadratic placement

• Model Wires simulated as springs

Forceij = Weightij x distanceij.

Cell sizes as repellent forces

• Algorithm Solve a set of linear equations to find an

intermediate solution (module locations) Repeat the process until equilibrium

[Bazargan]

25

Force-Directed Placement (cont.)

• Model (details): Cell distances: either

OR:

Forces:

Objective: find x,y coordinates for all cells such that total force exerted on each cell is zero.

|||| jiijjiij yyyxxx

22 )()( jijiij yyxxd

)()(11

ij

n

jij

iyij

n

jij

ix ykFxkF

[Bazargan]

26

Force-Directed Placement (cont.)• Avoiding overlaps or collapsing in one point?

Use fixed boundary I/O cells Use repelling force between cells that are not

connected by a net Do not allow a move that results in overlap Use repelling “field” forces from congested areas

to sparse ones [Eisenmann, DAC’98]

• Problems with force directed: Overlap still might occur (cell sizes model

artificially) Flat design, not hierarchy

),()(1

iixij

n

jij

ix yxExkF

[Bazargan]

27

Analytic Placement (Contd.)

• Write down the placement problem as an analytical mathematical problem

• Solve the placement problem directly• Example:

Sum of squared wire length is quadratic in the cell coordinates.

So the wirelength minimization problem can be formulated as a quadratic program.

It can be proved that the quadratic program is convex, hence polynomial time solvable

28

x2x1

x=100

x=200Toy

Example:Cost x1 1002 x1 x2

2 x2 2002

x1Cost 2x1 1002x1 x2

x2Cost 2x1 x2 2x2 200

setting the partial derivatives = 0 we solve for the minimum Cost:

Ax + B = 0

= 04 22 4

x1

x2 200

400

= 02 11 2

x1

x2 100

200

x1=400/3 x2=500/3

29

setting the partial derivatives = 0 we solve for the minimum Cost:

Ax + B = 0

= 04 2 2 4

x1x2

200 400

= 02 1 1 2

x1x2

100 200

x1=400/3 x2=500/3

x2x1

x=100

x=200

Interpretation of matrices A and B:

The diagonal values A[i,i] correspond to the number of connections to xiThe off diagonal values A[i,j] are -1 if object i is connected to object j, 0 otherwiseThe values B[i] correspond to the sum of the locations of fixed objects connected to object i

Example:

30

Why formulate the problem this way?

• Because we can• Because it is trivial to solve• Because there is only one solution• Because the solution is a global optimum• Because the solution conveys “relative

order” information• Because the solution conveys “global

position” information

31

Gordian: A Quadratic Placement Approach

Global optimization: solves a sequence of quadratic programming problems

Partitioning: enforces the non-overlap constraints

Ref. 1: Gordian: VLSI Placement by Quadratic Programming and slicing Optimization, by J. M. Kleinhans, G.Sigl, F.M. Johannes, K.J. Antreich IEEE Trans. On CAD, March 1991. pp 356-365

Ref. 2: Analytical Placement: A Linear or a Quadratic Objective Function? By G. Sigl, K. Doll, F.M. Johannes, DAC’91 pp 427-423

32

Quadratic Placement Formulation

• Quadratic Placement Framework:repeat

Solve the convex quadratic programSpread the cells

until the cells are evenly distributed

ectorsSolution v,

cell and cellbetween net theofWeight

cell ofcenter theof sCoordinateLet

yx

jiw

i),y(x

ij

ii

ectorsSolution v,

cell and cellbetween net theofWeight

cell ofcenter theof sCoordinateLet

yx

jiw

i),y(x

ij

ii

22 )()(2

1 cell and cellbetween net theofCost

jijiij yyxxw

ji

22 )()(2

1 cell and cellbetween net theofCost

jijiij yyxxw

ji

const2

1

2

1cost Total yyyxxx T

yTT

xT dQdQ const

2

1

2


yTT

xT dQdQ

33

Solution of the Original QP

34

Partitioning

• Find a good cut direction and position.

• Improve the cut value using FM.

35

• Before every level of partitioning, do the Global Optimization again with additional constraints that the center of gravities should be in the center of regions.

• Always solve a single QP (i.e., global).

Applying the Idea Recursively

Center of Gravities

36

Center of Gravity Constraints

Mathematical formulation

37

38

Process of Gordian

(a) Global placement with 1 region (b) Global placement with 4 region (c) Final placements

39

Star Model for Wire Length

40

Other Details

• Sometimes, there is too much overlapping between modules (indicating a bad partitioning). In that case, it is necessary to merge 2 sub-partitions and repartition them.

• When the number of cells in each partition is small enough, can stop partitioning. Then do a final placement of the sub-circuits according to the design style.

41

Repartitioning

42

Quadratic TechniquesPros:- mathematically well behaved- efficient solution techniques find global optimum- great quality

Cons:- solution of Ax + B = 0 is not a legal placement, so

generallysome additional partitioning techniques are required.

- solution of Ax + B = 0 is that of the "mapped" problem, i.e., nets are represented as cliques, and the solution minimizes wire length squared, not linear wire length unless additionalmethods are deployed

- fixed IOs are required for these techniques to work well

43

Differences between linear and quadratic objective function

A B Cfixed movable fixed

A B Cfixed fixedmovable

a) Quadratic objective function

b) Linear objective function

Linear vs. Quadratic Objective Function

44

Linear vs. Quadratic Objective Function (Cont’d)

Quadratic objective function tends to make very long nets shorter than linear objective function does, and lets short nets become slightly longer

row1

row2

row3

row4

row1

row2

row3

row4

A

B

A

B

Linear objective function Quadratic objective function

45

Kraftwerk Placement Tool

Hans Eisenmann and F. Johannes, “Generic Global Placement and Floorplanning”,

DAC-98, p.269 - 274

46

Approach

• Iteratively solve the quadratic formulation:

• Spread cells by additional forces:

• Density-based force proposed Push cells away from dense region to sparse

region

0

)(Min 21

dCp

constpdCpppf TT

0 fdCp

)','(' and ),( where

'''

')','(

2),( 2

yxryxr

dydxrr

rryxD

kyxf

// equivalent to spring force // equilibrium

47

Some Details

• Let fi be the additional force applied to cell i

• The proportional constant k is chosen so that the maximum of all fi is the same as the force of a net with length k(W+H)

• k is a user-defined parameter k=0.2 for standard operation k=1.0 for fast operation

• Can be extended to handle timing, mixed block placement and floorplanning, congestion, heat-driven placement, incremental changes, etc.

48

Some Potential Problems of Kraftwerk

1. Convergence is difficult to control• Large k oscillation• Small k slow convergence

• Example: Layout of a multiplier

2. Density-based force is expensive to compute

'''

')','(

2),( 2 dydx

rr

rryxD

kyxf

49

A “discrete” penalty function

• Used in Aplace, Kahng et al., ISPD05 Proposed by Naylor (U.S. Pat. 6301693 ) Divide placement area into grids Equalize cell area over all grids Straightforward overlap function

Bell-shaped function is continuous and differentiable

50

Natarajan Viswanathan and Chris ChuISPD-04

FastPlace: FastPlace: Efficient Analytical Placement Efficient Analytical Placement

using Cell Shifting, using Cell Shifting, Iterative Local Refinement Iterative Local Refinement

and a Hybrid Net Modeland a Hybrid Net Model

51

FastPlace Approach• FastPlace Framework (roughly):

repeatSolve the convex quadratic program Reduce wirelength by iterative heuristic Spread the cells

until the cells are evenly distributed

• Special features of FastPlace: Cell Shifting

o Easy-to-compute technique o Enable fast convergence

Hybrid Net Modelo Speed up solving of convex QP

Iterative Local Refinemento Minimize wirelength based on linear objective

52

Cell Shifting

Uniform Bin Structure Non-uniform Bin Structure

1.1. Shifting of bin boundary Shifting of bin boundary

2.2. Shifting of cells linearly within each Shifting of cells linearly within each binbin

• Apply to all rows and all columns independently

53

Cell Shifting – Animation …

NBi

Bini

Bini+1

OBiOBi-1 OBi+1

Ui Ui+1

j

k

l

Bini

Bini+1

OBiOBi-1 OBi+1

j

k

l

NBi

54

Iterative Local Refinement

• Iteratively go through all the cells one by one• For each cell, consider moving it in four directions

for a certain distance• Compute a score for each direction based on

Half-perimeter wirelength (HPWL) reduction Cell density at the source and destination regions

• Move to the direction with highest positive score (Not move if no positive score)

• Distance moved (H or V) is decreasing over iterations

• Detailed placement is handledby the same heuristic

H H

V

V

55

Pseudo pin and Pseudo net

Pseudo net

Additional Force

Pseudo pin

Target Position

Original Position

• Need to add forces to prevent cells from collapse back

• By adding pseudo pins and pseudo nets

• Only diagonal and linear terms of the quadratic system need to be updated

• Horizontal and vertical problems have the same connectivity matrix Q

56

Effect of Net Model on Runtime

• Need to replace each multi-pin net by 2-pin nets• Then the placement problem (even with pseudo nets) can be

formulated as a convex QP:

• Solved by any convex QP algorithms Use Incomplete Cholesky Conjugate Gradient (ICCG)

• Runtime is proportional to # of non-zero entries in Q • Each non-zero entry in Q corresponds to one 2-pin net• Traditionally, placers model each multi-pin net by a clique• High-degree nets will generate a lot of 2-pin nets• Slow down convex QP algorithms significantly

const2

1

2


yTT

xT dQdQ const

2

1

2


yTT

xT dQdQ

57

Clique, Star and Hybrid Net Models

Star Node

Clique Model Star Model Hybrid Model

# pins Net Model2 Clique3 Clique4 Star5 Star6 Star… …

• Star model is introduced by Mo et al. [ICCAD-00] for macro placement

• Introduces a star node even for 2-pin nets

58

Comparsion

• FastPlace is fast: Compared to Capo 8.8:

o 13x faster o 1% longer in wirelength

Compared to Dragon 2.2.3o 97.4x fastero 1.6% longer in wirelength

Compared to Kraftwerk (based on published data)o 20-25x fastero 10% better in wirelength

59

Detailed Placement

60

Objectives

• Major: Legalization Make placement feasible with as little

perturbation as possible

• Minor: Further Improvement Wirelength Timing Routability

61

Representative Approaches

• Constructive methods Network flow [K. Doll et al, TCAD’94] Linear placement [A. Kahng et al, ASPDAC’99] Window interleaving [S. Hur et al, ICCAD’00’]

• Iterative method Simulated annealing [W. Sun & C. Sechen, ICCAD’93]

• Hybrid method Network flow + Dynamic programming + Linear

programming [J. Vygen, DATE’98] [Brenner et al, ISPD’04]

• Placement migration/spreading Diffusion-based [Ren et al, DAC’05] Delaunay [Luo et al, ICCAD’05]

62

Constructive Method[K. Doll et al, TCAD’94]

• Overlapped initial placement

• Improve placement• Divide chip into

overlapping regions• Assign cells within

each region• Iterated if necessary

63

Constructive Method: Network Flow

• Network transformation for regions

C1

C2

C3

Region

Final Result

Subcells Subregion

(capacity, cost)

64

Constructive Method: Network Flow

• Within each region Chop cells into uniform sized subcells

o All subcells from the same cell have same cost to each candidate location• All subcells tend to be pulled towards the cheapest location. As a

result, tend to lie side by side• In case of separation, move subcells to the column holding the

majorityo Determine y coordinate using gravity center of subcells

Transform into min-cost-maximum-flowo Introduce source Q and sink So Capacity from Q to any cell µ is sµ=# subcells in µ o Capacity from candidate location to S is 1o One edge from each cell to each candidate location with

capacity o One edge from each candidate location to sink D with

capacity 1 Solve by flow-augmentation algorithm

65

Constructive Method[K. Doll et al, TCAD’94]

Pseudo-code for the algorithm

66

Constructive Method [A. Kahng et al, ASPDAC’99]

• Determine the order of cells on a row from global placement

• Fix cells outside the row, and place the cells within the row optimally according to cell order

• Feasible location of cells are discrete, e.g., grid points

• Consider white space on the row, i.e., unused space between cells

• Three algorithms are given to exploit different properties of cost function: details omitted

• Key point: concept of optimal regions

67

Constructive Method: Linear Placement

• Input A single row with dimension with n movable cells ci, i=1,2,…

m Fixed cells in other rows Feasible locations S1, S2, Sk

m nets, N1, N2, … Nm between the movable and fixed cells Discrete feasible location for cells

• Output Overlapping free placement of ci within the row with

minimum

• Constraint (fixed order) x(c1) x(c2) … x(cn)

n

iiNHPWL

1

)(

68


fixed cells

net N

span (N)

fl(N) fr(N)

ml(N) mr(N)

fixed_span (N)

minimize

69


• Consider contribution of cell ci

• Piece-wise linear and convex Increasing (decreasing) when fr(N) to

the left of corresponding li(N) is less (more) than fl(N) to the right of corresponding ri(N)

)(

)(

}0),()(max{

}0),()(max{)(cos

NLCill

rrNRCi

i

NmNf

NfNmxt

fr(1) fl(2) fl(3) fr(3) fl(4) fr(2) fr(5) fl(6) fl(7)

Minimum Segment

1 23 4

5

6

7

70

Optimal region for detailed placement• Generalization of Kahng, ASPDAC99 (key idea proposed

earlier by Goto in 1981); used in Pan, ICCAD05• Optimal region = target region for moving a cell in detailed

placement• Given by the x and y medians of all nets

Given nets j=1…n, with bounding box coordinates xj(l), xj(r), yj(l), yj(r), order all x’s and find the median; order all y’s and find the median

71

Constructive Legalization in Mongrel [Hur and Lillis, ICCAD’00]

• Start from an overlap-free detailed placement and further improve the wirelength

• Pick an arbitrarily sized interval (window) on a row Fix the cells outside the window, and

interleave the cells optimally Repeat by sliding the window right across

each row

72

1. Wirelength improvement

• All movements monotonic from a source bin S with violated capacity to a destination bin T with available capacity

• Gain = wire length reduction by moving a single cell to a neighbor

• Create a gain graph (DAG), and find maximum gain path

73

2. Window Interleaving

• Input An arbitrarily sized interval W within a row Arbitrary ordered subsequence of W as A={a1, a2, … am}. , and B = W-

A = {b1, b2, … bn} Fixed cells outside W n nets, N1, N2, … Nn between cells in W and fixed cells

• Output Overlapping free placement within W with minimum

• Constraint Cell order of A and B is preserved Wirelength of nets connecting cells within W is minimized

n

iiNHPWL

1

)(

74

Ordered sequence A and B are chosen arbitrarilyThe cell order of A and B is preserved after interleaving


75

• Dynamic programming Subproblem

o Denote Sij as the interleaving if {a1, a2,… ai } and {b1, b2,… bj}, with C(Si,j) minimized, C(Si,j) denotes the cost of Sij ,

Recursion

Complexity O(nm+p(n+m)), p is the number of pins on incident nets, n and m is the cells in A and B respectively

otherwise ,

)()( if ,,

0)(

0

1,

1,,11

0,0

0,0

jji

jjiijiiji

ij bS

bSCaSCaSS

SC

S


76

• Dynamic programming Key point: the optimal interleaving of a

prefix Si,j is independent of the ordering of subsequent cells in the window

Separability => Dynamic Programming

• Points to ponder Is it optimal? What is the difference compared to the DP

in Linear Placement [Kahng et al]?


Other approaches to legalization

• Not discussed in class – for your reference FengShui

o Dynamic programming [ICCAD’03]o Greedy Algorithm [ISPD’04]

Tetris - Dwight Hill (US Patent 6,370,673)o Also used in Kahng/Wang APlace paper [ISPD04]

77

Simulated annealing

78

79

Simulated Annealing Placement

• Cost Area (usually fixed # of rows, variable row width) Wirelength (Euclidian or Manhattan) Cell overlap (penalty increases with

temperature)

• Moves Exchange two cells within a radius R

(R temperature dependent?) Displace a cell within a row Flip a cell horizontally

• Low vs. High temperature If used as a post processing, start with low-temp

• Post-processing? Might be needed if there are still overlaps

[Bazargan]

80

Case Study: TimberWolf• “The Timberwolf Placement and Routing Package”, Sechen, Sangiovanni;

IEEE Journal of Solid-State Circuits, vol SC-20, No. 2(1985) 510-522• “Timber wolf 3.2: A New Standard Cell Placement and Global Routing

Package” Sechen, Sangiovanni, 23rd DAC, 1986, 432-439

Timber wolf

Stage 1 Modules are moved between different rows as well as within the

same row Modules overlaps are allowed When the temperature is reduced below a certain value, stage 2

begins

Stage 2 Remove overlaps Annealing process continues, but only interchanges adjacent

modules within the same row

[© He][Bazargan]

81

Solution Space

All possible arrangements of modules into rows possibly with overlaps

overlaps

[© He][Bazargan]

82

Neighboring Solutions

M3: Change the orientation of a module1 2

3 4

2 1

3 4

1 2

3 4

Axis of reflections

M1: Displace a module to a new location

M2: Interchange two modules

..

Three types of moves:

[© He][Bazargan]

83

Move Selection

• Timber wolf first tries to select a move between M1 and M2

o Prob(M1)=4/5o Prob(M2)=1/5

• If a move of type M1 is chosen (for certain module) and it is rejected, then a move of type M3 (for the same module) will be chosen with probability 1/10

• Restriction on: How far a module can be displaced What pairs of modules can be interchanged

M1: Displacement

M2: Interchange

M3: Reflection

[© He][Bazargan]

84

Move Restriction

• Range Limiter At the beginning, R is very large, big enough to

contain the whole chip Window size shrinks slowly as the temperature

decreases. In fact, height and width of R log(T) Stage 2 begins when window size are so small that

no inter-row modules interchanges are possible

Rectangular window R

[© He][Bazargan]

85

Cost Function

• Cost = C1+C2+C3

C1 = (aiwi + bihi)

ai, bi are horizontal and vertical weights, respectively

ai =1, bi =1 1/2 perimeter of bounding box

Critical nets: Increase both ai and bi

Double metal technology: Over-the-cell routing is possible. Fewer feed through cells are needed

If vertical wirings are “cheaper” than horizontal wiringsl use smaller vertical weights i.e. bi< ai

net i

hi

wi

[© He][Bazargan]

86

Cost Function (Cont’d)

C2: Penalty function for module overlaps

O(i,j) = amount of overlaps in the X-dimension between modules i and j

— offset parameter to ensure C2 0 when T 0

ji

jiOC2

2 ),(

C3: Penalty function that controls the row lengths

Desired row length = d( r )

l( r ) = sum of the widths of the modules in row r

r

rdrlC )()(3

[© He][Bazargan]

87

Annealing Schedule

Tk = r(k)•Tk-1 k= 1, 2, 3, …. r(k) increase from 0.8 to max value 0.94 and

then decrease to 0.1 At each temperature, a total number of K•n

attempts is made n= number of modules K= user specified constant

[© He][Bazargan]

88

Dragon2000: Standard-Cell Placement Tool for Large

Industry Circuits

M. Wang, X. Yang, and M. Sarrafzadeh,ICCAD-2000

pages 260-263

89

Main Idea

• Simulated annealing based 1.9x faster than iTools 1.4.0 (commerical version of

TimberWolf) Comparable wirelength to iTools (i.e., very good) Performs better for larger circuits Still very slow compared with than other approaches Also shown to have good routability

• Top-down hierarchical approach hMetis to recursively quadrisect into 4h bins at level h Swapping of bins at each level by SA to minimize WL Terminates when each bin contains < 7 cells Then swap single cells locally to further minimize WL

• Detailed placement is done by greedy algorithm

Partitioning

90

91

Partitioning-based Approach

• Try to group closely connected modules together.

• Repetitively divide a circuit into sub-circuits such that the cut value is minimized.

• Also, the placement region is partitioned (by cutlines) accordingly.

• Each sub-circuit is assigned to one partition of the placement region.

Note: Also called min-cut placement approach.

92

An Example

Circuit

Placement

Cutline

93

Variations

• There are many variations in the partitioning-based approach. They are different in: The objective function used. The partitioning algorithm used. The selection of cutlines.

94

Partitioning:

Objective:

Given a set of interconnected blocks, produce two sets thatare of equal size, and such that the number of nets connecting the two sets is minimized.

95

FM Partitioning:

Initial Random Placement

After Cut 1

After Cut 2

list_of_sets = entire_chip;while(any_set_has_2_or_more_objects(list_of_sets)){

for_each_set_in(list_of_sets){

partition_it();}/* each time through this loop the number of *//* sets in the list doubles. */

}

96

Partitioning-based Placement• Simultaneously perform:

Circuit partitioning Chip area partitioning Assign circuit partitions to chip slots

• Problem: Circuit partitioning unaware of the physical

location

Solution: Terminal propagation (add dummy terminals)

A B

A

B

A B

[She99] p.239

A B

[Bazargan]

97

Terminal Propagation Algorithm by Dunlop and Kernighan

“A Procedure for Placement of Standard-Cell VLSI Circuits”,TCAD, 4(1):92-98, Jan. 1985.

98

Problem of Partitioning Subcircuits

A B

A B

A

B

Cost of these 2 partitionings are not the same.

99

Terminal Propagation

• Need to consider nets connecting to external terminals or other modules as well.

• Do partitioning in a breath-first manner (i.e., finish all higher-level partitioning first).

A BA

B

Dummy Terminal

A B

The Dummy Terminal will try to pull B to the top partition.

100

Terminal Propagation

101

Partitioning-based Placement• More problems:

Direction of the cut? [Yildiz, DAC’01]

How to handle fixed blocks? (area assigned to a partition might not be enough)

How to correct a bad decision made at a higher level?

• Advantages: Hierarchical, scalable Inherently apt for congestion minimization, easily

extendable to timing optimization

1

2 34 5

6 7

12 3

4 5

1234

56789

(a) (b) (c)

1 2

3

(d)

[Bazargan]

102

Can Recursive Bisection Alone Produce Routable Placement?

(Name of placer: Capo)

Andrew Caldwell, Andrew Kahng, and Igor Markov

DAC-2000

103

Capo Overview

• Standard cell placement, Fixed-die context• Pure recursive bisectioning placer

Several minor techniques to produce good bisections

• Produce good results mainly because: Improvement in mincut bisection using multi-

level idea in the past few years Pay attention to details in implementation

• Implementation with good interface (LEF/DEF and GSRC bookshelf) available on web

104

Capo Approach

• Recursive bisection framework: Multi-level FM for instances with >200 cells Flat FM for instances with 35-200 cells Branch-and-bound for instances with <35 cells

• Careful handling partitioning tolerance: Uncorking: Prevent large cells from blocking smaller

cells to move Repartitioning: Several FM calls with decreasing

tolerance Block splitting heuristics: Higher tolerance for vertical

cut Hierarchical tolerance computation: Instance with more

whitespace can have a bigger partitioning tolerance

105

Handling Macrocells

• Macrocells: large blocks Shred into smaller blocks of regular height [Doll,

1994] Add a number of “fake”nets between these to

keep them together (or equivalently, use a larger net weight)

Use the centroid of the placed blocks as the centroid of the macroblock after placement

• Can be used for “floorplacement” (floorplanning+placement)

106

Partitioning with CAPO

Pros: fast scales nearly linearly with problem size

Cons: non-trivial to implement very directed algorithm, but this limits the

ability to deal with miscellaneous constraints Not stable (if there is minor change)

o Concept of placement stability: if the input netlist experiences a small change, the placement should not change substantially

107

Summary for Partition Based Placement

• Improvement in mincut partitioning are conducive to better wirelength and congestion

• Routable placements can be produced in most cases without explicit congestion management Explicit congestion control may still be useful in

some cases

• Better weighted wirelength often implies better routed wirelength, but not always

108

Other Placement Metrics

109

PO1

PO2

PO3

PI1

PI2

PI3

1

3

1

4

6

44

6

6

5

5

7

4

netlist with delay for each gate

PO1

PO2

PO3

PI1

PI2

PI3

1

3

1

4

6

44

6

6

5

5

7

4

arrival times

0

0

0

1

3

1

7

9

7

7

13

15

14

18

22

18

Timing Analysis

110

PO1

PO2

PO3

PI1

PI2

PI3

1

3

1

4

6

44

6

6

5

5

7

4

arrival time/required time

0/4

0/0

0/8

1/5

3/3

1/9

7/9

9/9

7/15

7/13

13/15

15/15

14/18

18/22

22/22

18/22

PO1

PO2

PO3

PI1

PI2

PI3

1

3

1

4

6

44

6

6

5

5

7

4

slack = required time - arrival time

4

0

8

4

0

8

2

0

8

6

2

0

4

4

0

4

Timing Analysis

Another example with interconnect delay – Same Timing Analysis

5 5 5

4 4 4

2

LATCH

LATCH

3 2 1 1

2 1 3 2

1

2222

1919

112

Timing Driven Placement Approaches

• Path-based Most accurate information Very slow

• Budgeting Inaccurate information Hard to budget Fast

• Net-based approach Net-weighting

113

Net-Weighting• Basic approach

For more timing critical nets (i.e., smaller slack), assign higher net weights

Minimize

where

),(_ ilengthnetwi

i

ii S

w1

114

Congestion Minimization

• Traditional placement problem is to minimize interconnection length (wirelength)

• A valid placement has to be routable• Congestion is important because it

represents routability (lower congestion implies better routability)

• There is not yet enough research work on the congestion minimization problem

115

Definition of Congestion

Routing demand = 3Assume routing supply is 1,overflow = 3 - 1 = 2 on this edge.

Overflow = overflowall edges

Overflow on each edge =

Routing Demand - Routing Supply (if Routing Demand > Routing Supply)0 (otherwise)

116

Correlation between Wirelength and Congestion

Total Wirelength = Total Routing Demand

117

Wirelength Congestion

A congestion minimized placement A wirelength minimized placement

118

Congestion Map of a Wirelength Minimized Placement

Congested Spots

119

Congestion Reduction Postprocessing

Reduce congestion globally by minimizing the traditional wirelength

Post process the wirelength optimized placement using the congestion objective

120

An Effective Congestion Driven Placement Framework

André RoheUniversity of Bonn, Germany

joint work with Ulrich Brenner

ISPD 2002 (Best Paper)

121

A Dense Placement

• good wirelength• impossible to

route

122

Possible Solution

• easy to route

• bad wirelength/timing

123

Congestion Driven Placement

• easy to route + good wirelength almost no extra computation effort !

124

Overall Algorithm: Bonn Place

• Partitioning based approach• Solves QP in each level, followed by

partitioning • Partitioning is done by quadrisection:

circuits are partitioned with minimum movement (Vygen)

125

Methods used for congestion driven placement

• Very fast congestion calculation

• Inflate circuits in congested regions

• Spreading inflated cells

126

Congestion calculation

• Calculate Steiner Tree for each net• Probability estimation for each 2-point

connection (similar to Hung & Flynn, Lou et al.)

127

Inflation of circuits(used previously by Hou et al.)

• Initial inflation (based on pin density) • Given a circuit c in Region R, c is inflated by

up to 100% • The inflation is based on the congestion in

R and the surrounding regions & the pin density in R

• Deflation is possible if the circuit is no longer critical.

128

Placement Step 0

129

Placement Step 1

130

Placement Step 2

131

Placement Step 3

132

Placement Step 4

133

Placement Step 5

134

Placement Step 6

135

Placement Step 7

136

Spreading inflated cells

• Repartitioning considers 2x2 windows in placement grid to optimize netlength

• Use extra repartitioning step to move cells away from overloaded regions

137

Placement Migration

• From an initial placement (either legal or illegal), migrate to another placement solution

• It happens all the time during physical synthesis for multi-objective design closure

• Post/detailed placement Legalization due to buffer insertion, gate sizing,

Engineering Change Order (ECO), or decoupling capacitor insertion.

Congestion/noise mitigation Thermal hot spots removal …

• Global placement Cell spreading is a KEY step in any analytical engine

138

Prior Works

• There are many previous works in legalization/spreading Greedy approach [Viswanathan & Chu, ISPD’04]] Force-directed [Eisenmann & Johannes, DAC’98] Network flow [Brenner et al, ISPD’04] Dynamic Programming [Kahng et al, ASPDAC’99; Hur &

Lillis ICCAD’00] Global re-placement from a late cut [Ren et al,

ICCAD’04] ……However, none of them attempted to

PRESERVE the relative geometry orderin a continuous manner – important for design closure!

139

Diffusion Based Placement Migration[Ren et al, DAC’05]

Initial placement(illegal)

Legal placement from greedy alg.

Legal placement from DiffusionBased Placement[Ren+, DAC’05]

140

What is Diffusion?

• Diffusion equation

• Physical diffusion is a natural process to diffuse material from high density to low density while maintaining the relative order

)()(

,2, tdD

t

tdyx

yx

141

Diffusion Flow

• The velocity of the diffusion flow is proportional to the local density gradient.

• Its final location can be computed by the velocity integral.

d

dV

VdtX

A

B

142

Diffusion-Based Placement

Initialize density

Compute cell velocity

Compute next cell location

Update density

Max(density) < 1No

Yes

143

Initialize Bin Density

0.5 1.6 0.6 0.4

1.4 1.0 0.4 0.8

1.2 0.4 0.8 0.6

0.4 1.0 0.2 0.6

144

Compute Horizontal Bin Velocity

50012

414011 .

.

..vΗ

,

1.4 1.0 0.4

145

Compute Vertical Bin Velocity

6.00.12

6.14.01,1

Vv

1.6

1.0

0.4

146

Compute Bin Velocity

1,1vVv 1,1

Hv 1,1

147

Velocity Interpolation

1,2v1,1v

2,2v2,1v

148

Move Cell

n 1n

tvnyny

tvnxnxV

H

)()1(

)()1(

149

Update Bin Density

)(1,1 nd

)(3,1 nd

)(2,0 nd )(2,2 nd

)](2)()([2

)](2)()([2

)()1(

2,13,11,1

2,12,22,02,12,1

ndndndt

ndndndt

ndnd

)(2,1 nd

150

Cell Movement Example

x(0),y(0)

x(9),y(9)

151

Computational Geometry Based Placement [Luo+, ICCAD’05]

• Another view for stable placement migration to maintain the “relative” geometry order Bin-based Spreading

o Iterative bin stretchingo Cell interpolation

Delaunay triangulation based legalizationo Delaunay triangulation of the placement regiono Fine-grain density distribution and overlapping

removing

Geometric placement stability metrics

152

Bin Stretching

• Compute the bin density Dk,l(n) in iteration n

• Compute εxk,l and εx

k,l, the stretching/shrinking amount in Horizontal and Vertical directions. W

D

nD

ett

lkxlk )1

)((

arg

,,

HD

nD

ett

lkylk )1

)((

arg

,,

Dk,l(n)

Dtarget

εxk,l

εyk,l

W

H

153

Bin Stretching Corner Stretching

• Stretch each bin individually will generate bin overlaps stretch bin corners px

k,l, pyk,l instead

If a corner is on a fixed boundary, take the 0.5 factor off δ is for controlling the smoothness of spreading process

)(5.0)()1( ,1,,11,1,,x

lkx

lkx

lkx

lkx

lkx

lk npnp

)(5.0)()1( ,1,,11,1,,y

lky

lky

lky

lky

lky

lk npnp

D3,4 D4,4

D3,3 D4,3

p4,4(n)

154

Cell Location Interpolation

• Linearly interpolate cell coordinates x(n+1), y(n+1) from x(n), y(n) inside the new stretched bin

β *(pk,l(n+1)-pk,l-1(n+1))

x(n+1),y(n+1)

pk-1,l-1(n+1)

Pk-1,l(n+1)

pk,l(n+1)

pk,l-1(n+1)α*(pk,l-1(n+1)-p’k-1,l-1(n+1))

x(n),y(n)

pk-1,l-1(n)

Pk-1,l(n) pk,l(n)

Pk,l-1(n)

α*(pk,l-1(n)-pk-1,l-1(n))

β *(pk-1,l(n)-pk-1,l-1(n))

155

Hierarchical Iterative Spreading

• In every iteration The bin boundary is restored after cells are moved to new

locations Re-compute the bin density and repeat previous steps Stops once all bin densities are under Dtarget.

• Multilevel spread Use large bin in the beginning Divide large bin into smaller bins, selectively do regional

spreading.

…

156

Delaunay Triangulation

• What is Delaunay triangulation? The dual of the Voronoi diagram Given a set of point set S, the

Voronoi diagram of S is the partition of space such that each point has a closet subspace.

• Delaunay triangulation captures the geometric ordering Any point is connected with all

nearest neighbors by Delaunay edges.

157

Delaunay Triangulate a Placement

158

Delaunay Triangulation Based Legalizer

• Delaunay-triangulate the placement region• Delaunay-based cell spreading• Snap cells to closest rows• Row balancing (multiple iterations)

Triangulate the region

• Final legalization (multiple iterations)

159

Delaunay Spreading Order

• Cell spreading ordering Pick a root cell in the center of congestion area BFS walking through Delaunay edges to add

cells into the tree Spreading stops at low congestion area

[For details, see the paper]

…

160

Multilevel placement

(Note: this is a full placer, not a detailed placement method!)

Motivation and Related WorkMultilevel Methods in Scientific Computation

• Originally developed to solve boundary-value partial differential equation (PDE) problems on continuous field

• Discretized elliptic PDE is a structured, positive-definite system of linear equations Geometric/algebraic multi-grid methods

• Successfully applied to solve hypergraph partitioning problem: Hmetis [ G. Karypis 1998]

161

162

Multigrid methods

• Structure of analysis problem is similar to solution of partial differential equations (PDE’s)

• Borrow the idea of multigrid solution techniques

[Taken from U. Meier Yang, ATCS Toolkits Workshop

2001]

163

Outline of the approach• Original grid Reduced (coarsened) grid

• Solve coarsened grid• Map solution back to original fine grid

164

Illustration of a multilevel algorithm (for graph drawing)

Coarse solution

Finer solution (in various iterations)

Final solution

Flat (single-level) solution

[Walshaw]

165

Main Components in Multilevel Framework

1. Create coarser problems [aggregation/coarsening/clustering]

2. Optimize coarser problems [relaxation/smoothing]

3. Transform coarser problem solution to finer level [Interpolation/declustering ]

166

Initial Fine-Grain Problem

mPL: Multilevel Placement Framework

Intermediate Level

Intermediate Level

Find a Good Coarse-Grain Problem Solution

Aggregate

Aggregate

Intermediate LevelRelaxation (Optimization)

Intermediate LevelRelaxation (Optimization)

Final Fine-Grain Problem.Thorough Relaxation and

Detailed Placement

Aggregate Interpolate

InterpolateAggregate

etc. etc. Interpolate

Interpolate

167

Solving the Subproblem

• Problem formulation (horizontal case):

• Iterative solve the weighted quadratic minimization problem, using the current solution to determine the weight (Gordian-L).

number. small is , )(||

1 where

|)(|

))((min

)()(

2

eve

Ee evk

ek

e

vxe

x

xvx

xvx

M

168

AMG-based Linear Interpolation [A. Brandt 1986]

interpolation

constantAMG

jijvF

jijvC

i vavavjj points points

clusterNext finer level cells

Within each cluster, select the one with

maximum degree as C-point; others are

considered as F-points

C-point

169

AMG-based Interpolation

• Use the clique-model graph to define connectivity weights (connectivity matrix)

• Within each cluster, select the one with maximum degree as a C-point

• Each C-point is placed at the cluster’s position.• Each F-point is placed at the weighted average of the C-

points and F-points to which it is strongly connected• The F-points’ positions can be iteratively improved.

170

Iterated Multilevel Flow

Make use of placement solution from 1st V-cycle

First Choice (FC)clustering

Geometric basedFC clustering

171

Comparison on Standard Cell Designs(Caveat: this is outdated)

Capo9.01.09, 2.29

Dragon3.011.03, 12.38

FastPlace1.01.08, 0.18

Fengshui5.01.06, 2.03mPL5

1,1mPL5-fast1.07, 0.30

123456789

10111213

0.98 1 1.02 1.04 1.06 1.08 1.1

Scaled wirelength

Sca

led

ru

nti

me

Experiments carried out on ISPD2004 FastPlace IBM benchmarks.

172

Placement: Summary

173

Global Placement

Partitioning(Capo)

SimulatedAnnealing

(TimberWolf,Dragon)

Multilevel(mPL)Analytic/

Quadratic/Force-directed

Partitioningbased

(Gordian)

Densitybased

(Kraftwerk,Aplace)

various classes of algorithms

Cell shifting(Fastplace)

types of spreading forces

Global Placement

HPWL(conventional)

Timing(net weighting)

Congestion(cell inflation)

Metric: linear vs.quadratic wirelength

(net weighting)

objectives

174

Detailed Placement

Legalization Improving wirelength, timing, congestion

objectives

Detailed Placement

techniques

Network flows(Domino)

Optimal regions(Kahng-ASPDAC99;

Fastplace:Pan-ICCAD05)Restricted

network flows by longest paths

(Hur/Lillis-Mongrel)Greedy

Dynamic programming(Kahng-ASPDAC99;Hur/Lillis-Mongrel,

FengShui)

Diffusion analogy(Ren-DAC05)

Grid distortion/Delaunay triangulations

(Luo-ICCAD05)

Documents

1 Placement. 2 Problem Given a netlist, and fixed-shape cells (small, standard cell), find the exact location of the cells to minimize area and wire-length