Optimality, Scalability and Stability study of Partitioning and Placement Algorithms
Jason Cong, Michail Romesis, Min Xie
UCLA Computer Science Department
This work is partially supported by Semiconductor Research Corporation and National Science Foundation
2
Overview
Motivation and related work
Our contribution
  Construction of Partitioning Examples with Known Upper bound
  Construction of Placement Examples with Known Upper bound
  Optimality, Scalability and Stability study
Conclusions and future work
4
Motivation Partitioning
[Chart: results for FM (1982), PANZA (1995), CLIP (1996), LSR (1997) and hMetis (1997) on the MCNC and ISPD benchmarks; vertical axis from 0 to 120]
Significant progress in partitioning during the mid-to-late 90’s
No significant improvement in the last 5 years
Have we reached a plateau?
5
Motivation Placement
Lack of significant progress in wirelength reduction
  Rate of reduction is about 5-10% every 2-3 years
Latest developments in placement differ mainly in runtime
  Capo [A. Caldwell et al, 2000]
  Dragon [M. Wang et al, 2000]
  Mongrel [S. Hur et al, 2000]
  mPL [T. Chan et al, 2000]
  mPG [C. Chang et al, 2002]
How much room is there for further improvement?
6
Motivation
Most work compares only with known heuristics
  Use real design based benchmarks
    ISPD98 [C. Alpert 1998]
    WSI [D. Ghosh et al, 1997]
  Use synthetic benchmarks
    circ and gen [M. D. Hutton et al, 1998]
    gnl [D. Stroobandt et al, 2000]
Little understanding about the divergence from the optimal
7
Related Work
Quantified Suboptimality of VLSI Layout Heuristics [L. Hagen et al, 1995]
  Construct scaled instance with known upper bound from an initial problem
  [Figure: construction of a scaled instance from an initial problem]
  Over 10% area suboptimality in TimberWolf
  Notable wirelength suboptimality in GORDIAN-L
  Significant improvement was possible for placement and partitioning
But test cases are small: the largest netlist is less than 40K
8
Related Work
Optimality and Scalability of Existing Placement Algorithms [C. Chang et al, 2003]
  Construct instances with known optimal using the characteristic of the original problem
  Existing placement algorithms can be 70% to 150% away from the optimal
  Average solution quality deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10
All the connections are local, no global connections
9
Overview
Motivation and related work
Our contribution
  Construction of Partitioning Examples with Known Upper bound
  Construction of Placement Examples with Known Upper bound
  Optimality, Scalability and Stability study
Conclusions and future work
10
BEKU Construction Example
Input: t = 16, D = {12, 8}, B = 5
Create two partitions of size 8
Generate 9 2-pin nets that do not cross the partition line
Generate 3 2-pin nets that cross the partition line
Generate 6 3-pin nets that do not cross the partition line
Generate 2 3-pin nets that cross the partition line
Cutsize = 5
Cutsize improved to 4 after FM
[Figure: example netlist split into partitions P1 and P2, with labels A, B, C and D]
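The construction steps of this example can be sketched in Python. This is a minimal illustration, not the authors' generator: the function names, the proportional split of the B crossing nets across net degrees, and the random pin selection are all assumptions made here. The FM improvement step is not included.

```python
import random

def build_beku(t, degree_counts, bound, seed=0):
    """Build a bipartitioning instance whose optimal cutsize is at most `bound`.

    t: number of nodes; degree_counts: {net degree: number of nets};
    bound: number of nets that cross the partition line.
    """
    rng = random.Random(seed)
    half = t // 2
    p1, p2 = list(range(half)), list(range(half, t))
    total_nets = sum(degree_counts.values())
    degrees = sorted(degree_counts)
    nets, remaining = [], bound
    for pos, degree in enumerate(degrees):
        count = degree_counts[degree]
        # Assumption: crossing nets are shared out in proportion to each
        # degree's net count; the last degree takes whatever is left.
        if pos == len(degrees) - 1:
            crossing = remaining
        else:
            crossing = min(remaining, round(bound * count / total_nets))
        if crossing > count:
            raise ValueError("bound too large for the given net counts")
        remaining -= crossing
        for k in range(count):
            if k < crossing:
                # Crossing net: at least one pin on each side.
                in_p1 = rng.randint(1, degree - 1)
                pins = rng.sample(p1, in_p1) + rng.sample(p2, degree - in_p1)
            else:
                # Non-crossing net: all pins inside one partition.
                pins = rng.sample(rng.choice((p1, p2)), degree)
            nets.append(tuple(pins))
    return p1, p2, nets

def cutsize(nets, p1):
    """Count nets that have pins on both sides of the partition line."""
    left = set(p1)
    return sum(1 for net in nets if 0 < sum(v in left for v in net) < len(net))

p1, p2, nets = build_beku(16, {2: 12, 3: 8}, 5)
print(cutsize(nets, p1))  # prints 5
```

With t = 16, D = {12, 8} and B = 5 the proportional split gives 3 crossing 2-pin nets and 2 crossing 3-pin nets, the same counts as in the example.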
11
Construction of Multiway Partitioning Examples with Known Upper Bounds (MEKU)
Divide the nodes into m partitions of equal size
Create B nets that cross at least two partitions. The remaining nets stay in one partition
Improve by multiway FM
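The multiway construction can be sketched the same way. This is an illustrative sketch under assumptions: the names `build_meku` and `spanning_nets` are hypothetical, the first B nets in the input list are the ones made to cross, and each crossing net touches exactly two partitions (which satisfies "at least two"). The multiway FM step is not included.

```python
import random

def build_meku(t, m, net_degrees, bound, seed=0):
    """Build an m-way instance in which exactly `bound` nets span partitions.

    t: number of nodes; m: number of partitions;
    net_degrees: one degree per net; bound: number of spanning nets.
    """
    rng = random.Random(seed)
    size = t // m
    parts = [list(range(k * size, (k + 1) * size)) for k in range(m)]
    nets = []
    for index, degree in enumerate(net_degrees):
        if index < bound:
            # Assumption: a spanning net places its pins in two partitions.
            a, b = rng.sample(range(m), 2)
            in_a = rng.randint(1, degree - 1)
            pins = rng.sample(parts[a], in_a) + rng.sample(parts[b], degree - in_a)
        else:
            # Remaining nets stay inside a single partition.
            pins = rng.sample(rng.choice(parts), degree)
        nets.append(tuple(pins))
    return parts, nets

def spanning_nets(parts, nets):
    """Count nets whose pins fall in more than one partition."""
    owner = {node: k for k, part in enumerate(parts) for node in part}
    return sum(1 for net in nets if len({owner[v] for v in net}) > 1)

parts, nets = build_meku(32, 4, [2] * 12 + [3] * 8, 6)
print(spanning_nets(parts, nets))  # prints 6
```

The count of spanning nets in the constructed solution is the known upper bound on the optimal cut for that instance.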
12
BEKU and MEKU Suite
2-way partitions occupy 45-55% of the total area
8-way partitions occupy 11.8-13.3% of the total area
# of nodes   # of nets   # of parts   Upper bound
500,000      530,705     2            92,343
500,000      530,705     2            111,873
1,000,000    1,061,410   2            184,714
1,000,000    1,061,410   2            223,520
1,500,000    1,592,114   2            276,670
1,500,000    1,592,114   2            335,242
2,000,000    2,122,819   2            369,526
2,000,000    2,122,819   2            447,781
500,000      530,705     8            139,943
500,000      530,705     8            160,163
1,000,000    1,061,410   8            279,975
1,000,000    1,061,410   8            320,457
1,500,000    1,592,114   8            420,279
1,500,000    1,592,114   8            479,971
2,000,000    2,122,819   8            560,275
2,000,000    2,122,819   8            640,459
URL : http://cadlab.cs.ucla.edu/~pubbench/partitioning/
13
Tested three State-of-the-Art Partitioning Tools
hMetis [G. Karypis et al, 1997]
  Based on multilevel framework
  MHEC and FC clustering algorithms
  Variations of FM for refinement at each level
MLPart [A. Caldwell et al, 2000]
  Based on multilevel framework
  Different algorithms for coarsening (PinEC) and refinement (VRW)
Flare [J. Cong et al, 2000]
  Two-level hierarchy created by the ESC clustering algorithm
  Based on the LR bipartitioning engine and the PM multiway partitioning framework
14
Experimental Results on BEKU
MLPart produces the best results (very close to our estimated upper bound), and Flare the worst
The value of the bound (as a percentage of nets) influences the quality of hMetis and Flare
[Chart: quality ratio (axis from 0.9 to 1.4) versus bound as % of nets (15% to 25%) for MLPart, hMetis and Flare]
15
Experimental Results on BEKU
The runtimes scale well (almost linearly)
Flare runs out of memory when problem size exceeds 1M nodes
[Chart: runtime in minutes (axis from 0 to 40) versus circuit size (500,000 to 2,000,000) for hMETIS, MLPart and Flare]
16
Experimental Results on MEKU
hMetis is worse by only 2% when the initial bound is 30%, but the gap increases to 18% for a bound of 35%
MLPart does not support multiway partitioning
[Chart: quality ratio (axis from 0 to 2) for hMetis and Flare at bounds of 30% and 35% of nets]
17
Placement Examples with Global Connections
circuit   height   width   WL of longest net   WL contribution of longest 10%
ibm01     8158     4530    7148                51%
ibm02     8158     6430    14224               46%
ibm03     8158     6740    10624               58%
ibm04     8158     9140    15171               53%
ibm05     8158     11055   19064               47%
ibm06     8158     8715    13966               61%
ibm07     8158     14605   14051               51%
ibm08     8158     15895   16142               60%
ibm09     8158     16395   13780               55%
ibm10     8158     27890   30755               53%
ibm11     16350    10925   19234               59%
ibm12     16350    15545   26748               52%
ibm13     16350    12230   19539               59%
ibm14     16350    25475   26370               61%
ibm15     16350    23785   27284               63%
ibm16     16350    34015   42860               59%
ibm17     16283    38895   45686               56%
ibm18     16350    37065   52846               64%
Produced by Dragon on ISPD98
The wirelength contribution from global connections can be significant!
Need to consider the impact of global connections
18
Placement Examples with Global Connections only
Each net connects either a row or a column
Obvious upper bound
  Sum the length of each row and column
Similar to datapath examples
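The "obvious upper bound" can be written out as a short calculation. The sketch below rests on assumptions that the slides do not state: a full grid of equal-size cells, wirelength measured as the half-perimeter span between cell centers, and hypothetical parameter names. It is not expected to reproduce the UB values of the published suite.

```python
def gpeku_upper_bound(rows, cols, cell_w=1.0, cell_h=1.0):
    """Wirelength of the regular grid placement for row and column nets.

    One net connects all cells of each row and one net connects all cells
    of each column, so there are rows + cols nets in total.
    """
    row_net_length = (cols - 1) * cell_w  # span between first and last cell centers
    col_net_length = (rows - 1) * cell_h
    return rows * row_net_length + cols * col_net_length

# A 3-row by 4-column grid of unit cells: 3 row nets of length 3
# plus 4 column nets of length 2.
print(gpeku_upper_bound(3, 4))  # prints 17.0
```

Any placer's wirelength on such an instance can then be compared against this value, since the grid placement is one feasible solution.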
19
Placement Examples with Non-local Connections
Extend PEKO [C. Chang et al, 2003] by introducing non-local nets to mimic global connections
All the modules are of equal size, and there is no space between rows and adjacent modules
For nets of degree i, a given fraction of the d_i nets is generated by randomly connecting i modules; the rest are generated optimally as in PEKO
20
Placement Examples with Non-local Connections
Input: t = 64, D = {d2=34, d3=20, d4=7, d5=4, d6=2, d7=1}, fraction of non-local nets = 0.2
Total WL = 160
Generate 28 2-pin optimally
Generate 6 2-pin randomly
Generate 16 3-pin optimally
Generate 4 3-pin randomly
Generate 6 4-pin optimally
Generate 1 4-pin randomly
Generate 4 5-pin optimally
Generate 2 6-pin optimally
Generate 1 7-pin optimally
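The split between optimally and randomly generated nets can be checked with a few lines of Python. This is a sketch under one assumption: the number of random nets per degree is the non-local fraction times d_i, rounded down. That rounding rule is inferred from the 2-pin and 3-pin counts in the example (28/6 and 16/4), not stated on the slides, and the function name is hypothetical.

```python
import math

def split_net_counts(degree_counts, nonlocal_fraction):
    """Return {degree: (optimal nets, random nets)} for each net degree."""
    split = {}
    for degree, count in sorted(degree_counts.items()):
        # Assumption: round down; the small epsilon guards against
        # floating-point results such as 3.9999999 for an exact product.
        random_nets = math.floor(nonlocal_fraction * count + 1e-9)
        split[degree] = (count - random_nets, random_nets)
    return split

example = split_net_counts({2: 34, 3: 20, 4: 7, 5: 4, 6: 2, 7: 1}, 0.2)
for degree, (optimal, rand) in example.items():
    print(f"{degree}-pin: {optimal} optimally, {rand} randomly")
```

Under this rule the degrees with few nets (5-pin and above in the example) receive no random nets at all, so the non-local connections are concentrated in the low-degree nets.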
21
G-PEKU Suite
circuit   #cell    #net   #row   UB
GPeku01   12506    224    113    7.93E+05
GPeku05   28146    336    169    1.79E+06
GPeku10   68685    525    263    4.38E+06
GPeku15   161187   803    402    1.03E+07
GPeku18   210341   918    460    1.34E+07
Module number extracted from ISPD98
URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm
22
PEKU Suite
Module number t and NDVs extracted from ISPD98
  Remove connections with pads
Vary the percentage of non-local nets from 0 to 10%
15% white space by expanding one dimension of the chip
PEKU Suite
% non-local nets   circuit   #cell    #net     #row   Row utilization   LB         UB
0                  Peku01    12506    14111    113    85%               8.14E+05   8.14E+05
0                  Peku05    28146    28446    169    85%               1.91E+06   1.91E+06
0                  Peku10    68685    75196    263    85%               4.73E+06   4.73E+06
0                  Peku15    161187   186608   402    85%               1.15E+07   1.15E+07
0                  Peku18    210341   201920   460    85%               1.32E+07   1.32E+07
0.25%              Peku01    12506    14111    113    85%               8.14E+05   9.23E+05
0.25%              Peku05    28146    28446    169    85%               1.91E+06   2.24E+06
0.25%              Peku10    68685    75196    263    85%               4.73E+06   6.17E+06
0.25%              Peku15    161187   186608   402    85%               1.15E+07   1.71E+07
0.25%              Peku18    210341   201920   460    85%               1.32E+07   2.01E+07
0.50%              Peku01    12506    14111    113    85%               8.14E+05   1.02E+06
0.50%              Peku05    28146    28446    169    85%               1.91E+06   2.63E+06
0.50%              Peku10    68685    75196    263    85%               4.73E+06   7.52E+06
0.50%              Peku15    161187   186608   402    85%               1.15E+07   2.30E+07
0.50%              Peku18    210341   201920   460    85%               1.32E+07   2.75E+07
…
Up to 10%
URL: http://cadlab.cs.ucla.edu/~pubbench/peku.htm
24
Tested four State-of-the-Art Placers
Capo [A. Caldwell et al, 2000]
  Based on multilevel partitioner
  Aims to enhance the routability
Dragon [M. Wang et al, 2000]
  Uses hMetis for initial partition
  SA with bin-based swapping
mPL [T. Chan et al, 2000]
  Nonlinear programming on the coarsest level
  Goto based relaxation
mPG [C. Chang et al, 2002]
  Uses FC clustering and hierarchical density control
  Incremental A-tree for routability
25
Experimental Results on G-PEKU
The gap between their solutions and the upper bound varies between 79% and 102% in the worst case
Another validation that there is significant room for improvement for the placement problem
circuit   Dragon v.2.20 QR   Capo v.8.5 QR   mPG v.1.0 QR   mPL v.2.0 QR
GPeku01   1.98               1.56            1.91           1.69
GPeku05   2.01               1.69            1.97           1.83
GPeku10   2.02               1.72            1.98           1.94
GPeku15   1.99               1.79            1.97           1.97
GPeku18   2.02               1.78            1.98           1.98
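The 79% to 102% range follows from the QR columns if one assumes that QR is the placer's wirelength divided by the upper bound, so that the gap is QR minus 1. The slides do not define QR explicitly, so that reading is an assumption; the values below are copied from the table in header order (Dragon, Capo, mPG, mPL).

```python
# Quality ratios per placer for GPeku01, 05, 10, 15 and 18.
quality_ratios = {
    "Dragon v.2.20": [1.98, 2.01, 2.02, 1.99, 2.02],
    "Capo v.8.5": [1.56, 1.69, 1.72, 1.79, 1.78],
    "mPG v.1.0": [1.91, 1.97, 1.98, 1.97, 1.98],
    "mPL v.2.0": [1.69, 1.83, 1.94, 1.97, 1.98],
}

def worst_case_gap_percent(ratios):
    """Largest gap to the upper bound, as a whole percentage (assumes gap = QR - 1)."""
    return round((max(ratios) - 1.0) * 100)

gaps = {placer: worst_case_gap_percent(qr) for placer, qr in quality_ratios.items()}
for placer, gap in gaps.items():
    print(f"{placer}: worst-case gap {gap}%")
```

Under this reading Capo has the smallest worst-case gap (79%) and Dragon the largest (102%), which are the two ends of the range quoted above.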
26
Experimental Results on PEKU
[Chart: quality ratio (axis from 1 to 2.2) versus % of non-local nets (0.00%, 0.25%, 0.50%, 0.75%, 1.00%, 2.00%, 5.00%, 10.00%) for Capo v.8.5, Dragon v.2.20, mPG v.1.0 and mPL v.2.0]
mPL’s QR increases when the percentage of non-local nets is increased from 0 to 0.75%, while for the other three placers, QRs are steadily decreasing
Absolute value of the QRs may not be meaningful, but it helps to identify the technique that works best under each scenario
27
Overview
Motivation and related work
Our contribution
  Partitioning Examples with Known Upper bound
  Placement Examples with Known Upper bound
  Optimality, Scalability and Stability study
Conclusions and future work
28
Conclusions
Bipartitioning techniques seem fairly mature
  The best available algorithms perform and scale very well on examples by our construction
The best available multiway partitioning algorithms do not perform equally well
  The worst divergence from upper bound is 18% by hMetis
There is still significant room for improvement in circuit placement
  Existing placement algorithms may produce solutions far away from the optimal (or upper bound)
  Their effectiveness depends much on the characteristics of circuits
29
Future Work
Construction of more synthetic examples
  Measure routability optimality
  Measure timing optimality
Understand the deficiencies of existing algorithms using these examples
Guide the development of new VLSI CAD algorithms
30
Acknowledgement
Prof. I. Markov for providing Capo’s latest version
Prof. S. Lim for providing Flare’s latest version
X. Yuan for providing the data of mPG
J. Shinnerl and K. Sze for providing the experimental data of mPL
31
THE END
THANK YOU