An Analytic Placer for Mixed-Size Placement and Timing-Driven Placement
Andrew B. Kahng and Qinke Wang
UCSD CSE Department
{abk, qiwang}@cs.ucsd.edu
Work partially supported by the MARCO Gigascale Systems Research Center, NSF MIP-9987678 and the Semiconductor Research Corporation.
Motivation
• Mixed-size placement
  – design productivity increasingly requires IP reuse
    • processing / interface cores, embedded memories, etc.
  – “boulders and dust” challenge: sizes of placeable objects can vary by factors of 10,000 or more
  – placement is particularly complex in fixed-die context
• Timing-driven placement
  – more critical with device and interconnect scaling
Our Work
• APlace [Kahng/Wang ISPD04]: an analytic placer for wirelength-driven standard-cell placement
  – [Naylor et al., US Patent 6301693, 2001]
  – superior wirelength quality compared to Cadence QPlace, Dragon and Capo
  – strong extensibility: congestion-directed placement, I/O-core co-placement, constraint handling for mixed-signal, etc.
  – poor scalability: on average 13.2× slower than Capo
• This work: extend APlace to address mixed-size placement and timing-driven placement
Outline
• APlace Background
• Extension to Mixed-Size Placement
• Extension to Timing-Driven Placement
• Conclusions and Ongoing Work
Outline
• APlace Background
  – Formulations
    • wirelength minimization
    • cell spreading = density control
  – Implementation
• Extension to Mixed-Size Placement
• Extension to Timing-Driven Placement
• Conclusion and Ongoing Work
Wirelength Formulation
• Placement objective: HPWL
• Smooth approximation [Naylor et al., US Patent 6301693, 2001]
  – log-sum-exp formula: picks the most dominant terms among pin coordinates
  – α: smoothing parameter
  – closer to HPWL when α → 0
  – precise
  – strictly convex
  – continuously differentiable
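The slide's formula image is not available here, so the following is only a minimal sketch of a log-sum-exp wirelength, assuming the common form α·ln Σ exp(x/α) + α·ln Σ exp(−x/α) per axis. The function names `lse_extent` and `smooth_wirelength` are hypothetical. It illustrates how the smooth value approaches the exact HPWL as α shrinks.

```python
import math

def hpwl(pins):
    """Exact half-perimeter wirelength of one net; pins is a list of (x, y)."""
    xs = [p[0] for p in pins]
    ys = [p[1] for p in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def lse_extent(coords, alpha):
    """Smooth approximation of max(coords) - min(coords) (assumed form)."""
    hi, lo = max(coords), min(coords)
    # Shift by the max / min before exponentiating so small alpha cannot overflow.
    smooth_max = hi + alpha * math.log(sum(math.exp((c - hi) / alpha) for c in coords))
    smooth_min = lo - alpha * math.log(sum(math.exp((lo - c) / alpha) for c in coords))
    return smooth_max - smooth_min

def smooth_wirelength(pins, alpha):
    """Sum of the smooth extents along x and y."""
    return (lse_extent([p[0] for p in pins], alpha)
            + lse_extent([p[1] for p in pins], alpha))

net = [(10.0, 5.0), (35.0, 20.0), (50.0, 8.0)]
for alpha in (20.0, 5.0, 0.5):
    print(alpha, smooth_wirelength(net, alpha), "exact:", hpwl(net))
```

The smooth value always overestimates the exact extent, and the gap narrows as α decreases, which matches the "closer to HPWL when α → 0" bullet.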
Density Control
• Common strategy
  – divide the placement area into grids
  – equalize the total cell area in each grid
• Penalty of an uneven cell distribution
  – not smooth or differentiable
  – difficult to optimize
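As an illustration of why such a penalty is hard to optimize, here is a small sketch. It assumes the penalty is the sum over grids of the squared deviation of total cell area from the average, and that each cell is counted entirely in the grid holding its center; both are assumptions, since the slide's formula is not reproduced in this text. The name `density_penalty` is hypothetical.

```python
def density_penalty(cells, width, height, nx, ny):
    """cells: list of (x, y, area), with (x, y) the cell center.

    Assumed penalty: sum over grids of (grid area total - average)^2.
    """
    grid_area = [[0.0] * nx for _ in range(ny)]
    for x, y, area in cells:
        col = min(int(x / width * nx), nx - 1)
        row = min(int(y / height * ny), ny - 1)
        grid_area[row][col] += area
    average = sum(area for _, _, area in cells) / (nx * ny)
    return sum((a - average) ** 2 for row in grid_area for a in row)

clustered = [(1.0, 1.0, 1.0), (1.5, 1.0, 1.0), (1.0, 1.5, 1.0), (1.2, 1.2, 1.0)]
spread = [(1.0, 1.0, 1.0), (3.0, 1.0, 1.0), (1.0, 3.0, 1.0), (3.0, 3.0, 1.0)]
print(density_penalty(clustered, 4.0, 4.0, 2, 2))
print(density_penalty(spread, 4.0, 4.0, 2, 2))
```

Moving a cell inside its grid leaves the value unchanged, and crossing a grid boundary makes it jump, so a gradient-based optimizer gets no useful slope from this form.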
Cell Potential Function
• Bell-shaped cell potential function [Naylor et al., US Patent 6301693, 2001]
• Cell c has potential(c, g) with respect to grid g
• Cell c at (x, y) has area A
• Grid point g = (x', y')
• p(d): bell-shaped function
• r: the radius of cells' potential
• C: a proportionality factor, s.t. [constraint equation not recovered]
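[Figure: plot of p(d) against d, with axis marks at r/2 and r]

$$p(d) = \begin{cases} 1 - 2d^2/r^2, & 0 \le d \le r/2 \\ 2(r-d)^2/r^2, & r/2 \le d \le r \end{cases}$$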
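A minimal sketch of the bell-shaped function, using the two branches shown on the plot. Two things here are assumptions rather than statements from the slide: that potential(c, g) is the product C · p(|x − x'|) · p(|y − y'|), and that C is chosen so a cell's potentials sum to its area A. The names `bell` and `cell_potentials` are hypothetical.

```python
def bell(d, r):
    """Bell-shaped p(d): 1 at d = 0, falling smoothly to 0 at d = r."""
    d = abs(d)
    if d <= r / 2:
        return 1 - 2 * d * d / (r * r)
    if d <= r:
        return 2 * (r - d) ** 2 / (r * r)
    return 0.0  # assumed: no potential beyond the radius

def cell_potentials(x, y, area, grid_points, r):
    """Potential of one cell at every grid point (assumed product form)."""
    raw = {g: bell(x - g[0], r) * bell(y - g[1], r) for g in grid_points}
    total = sum(raw.values())
    # Assumed normalization: C scales the potentials so they sum to the area.
    c = area / total if total > 0 else 0.0
    return {g: c * v for g, v in raw.items()}

grid = [(i, j) for i in range(5) for j in range(5)]
potentials = cell_potentials(2.3, 1.7, 6.0, grid, r=2.0)
print(sum(potentials.values()))
```

Both branches give 0.5 at d = r/2 with the same slope, which is what makes the resulting density term continuously differentiable.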
Implementation
• Cells are spread by minimizing the smooth density penalty function
• APlace combines the above two objectives and optimizes the following function using a Conjugate Gradient optimizer:
  – Density term drives cell spreading
  – Wirelength term draws connected components back toward each other
Wirelength vs. Density Objectives
• Density weight: fixed
  – a larger value spreads cells out hastily, without good wirelength
• Wirelength weight: variable
  – a larger value contracts cells together and prevents them from spreading out
  – initially set to be large
  – repeat until all cells are spread out evenly:
    • execute conjugate-gradient solver until convergence
    • reduce the weight by half
Objective: WLWeight · Wirelength + DensityWeight · DensityPenalty
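The outer loop described on this slide can be sketched as follows. The solver and the evenness test are passed in as callables; `weight_schedule`, `toy_solve` and the starting weight of 1024 are hypothetical, and the toy solver is a closed-form stand-in for a conjugate-gradient run, not the placer's real objective.

```python
def weight_schedule(solve, evenly_spread, placement,
                    wl_weight=1024.0, density_weight=1.0, max_rounds=50):
    """Solve to convergence, then halve the wirelength weight, until spread."""
    history = []
    for _ in range(max_rounds):
        placement = solve(placement, wl_weight, density_weight)
        history.append(wl_weight)
        if evenly_spread(placement):
            break
        wl_weight /= 2.0  # relax the pull of the wirelength term
    return placement, history

def toy_solve(separation, wl_weight, density_weight):
    # Toy stand-in: exact minimizer over the separation s of two cells for
    # wl_weight * s**2 + density_weight * (1 - s)**2.
    return density_weight / (wl_weight + density_weight)

final_separation, weights_used = weight_schedule(
    toy_solve, lambda s: s >= 0.9, placement=0.0)
print(final_separation, len(weights_used))
```

Starting with a large wirelength weight keeps connected cells together early on; each halving lets the fixed density term win a little more, so spreading happens gradually rather than all at once.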
Outline
• APlace Background
• Extension to Mixed-Size Placement
  – Density control for macros
  – Legalization
  – Experimental results
• Extension to Timing-Driven Placement
• Conclusion and Ongoing Work
Previous Works
• Capo flow: a three-stage placement-floorplanning-placement flow that uses Capo [Adya et al., ISPD02, ICCAD03]
• mPG-MS: a simulated-annealing-based multi-level placer [Chang et al., ASPDAC03]
• Feng Shui: a recursive-bisection-based placement tool using fractional cuts [Khatkhate et al., ISPD04]
Potential Function for Macros (I)
• Each module has a potential or influence with respect to nearby grids
• APlace seeks to equalize the total module potential at each grid
• rm is the radius of a module's potential
• Standard-cell placement: rm is a constant r
• Mixed-size placement: rm changes according to the module's dimension
• A larger block will have potential with respect to more nearby grids
Potential Function for Macros (II)
• p(d): potential function; d: distance from module to grid
• Radius rm = w/2 + r for a block with width w
• Convex curve: d < w/2 + r/2
• Concave curve: w/2 + r/2 < d < w/2 + r
• p(d) is smooth at d = w/2 + r/2
[Figure: plot of p(d) against d, with curve labels $1 - a d^2$ and $b (r - d)^2$ and axis marks at w/2 + r/2 and w/2 + r]
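The slide does not give the coefficients a and b. One way to obtain them, sketched below, is to require that the two curves meet with equal value and equal slope at d = w/2 + r/2. This sketch also assumes the outer curve is measured from the outer radius, b·(rm − d)² with rm = w/2 + r, and that the potential is zero beyond rm. The name `macro_bell` is hypothetical.

```python
def macro_bell(d, w, r):
    """Potential profile for a block of width w (assumed coefficients)."""
    d = abs(d)
    knee = w / 2 + r / 2    # where the two curves join
    radius = w / 2 + r      # rm on the slide
    # Matching value and slope at the knee gives these two coefficients.
    a = 1.0 / (knee * radius)
    b = 2.0 / (r * radius)
    if d <= knee:
        return 1 - a * d * d
    if d <= radius:
        return b * (radius - d) ** 2
    return 0.0

for width in (0.0, 4.0):
    print(width, [round(macro_bell(d, width, 2.0), 3) for d in (0, 1, 2, 3, 4)])
```

With w = 0 the coefficients reduce to a = b = 2/r², the standard-cell case. A wider block keeps a nonzero potential at larger distances, in line with the statement that a larger block influences more nearby grids.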
Legalization
• Simplified Tetris algorithm [Hill, US Patent 6370673, 2002]
  – sort modules based on a linear combination of vertical coordinate and width
  – search the current nearest available position for each module
• Pros and cons
  – fast
  – larger blocks are fixed at a position ahead of nearby small cells
  – best applied when modules are distributed evenly
  – may fail if the global placement has many overlaps among macros
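The two steps listed above can be sketched on a small site grid. This is a brute-force illustration under stated assumptions, not the patented algorithm or the placer's code: the sort key `y - k * width`, the factor `k`, Manhattan displacement as the distance measure, and the name `legalize` are all hypothetical.

```python
def legalize(modules, num_rows, num_sites, k=1.0):
    """modules: name -> (x, y, width, height), lower-left corner, in site/row units.

    Returns name -> (col, row) with no overlaps; raises if a module cannot fit.
    """
    occupied = [[False] * num_sites for _ in range(num_rows)]
    # Assumed key: vertical coordinate minus k * width, so wide blocks go first.
    order = sorted(modules, key=lambda n: modules[n][1] - k * modules[n][2])
    placed = {}
    for name in order:
        x, y, w, h = modules[name]
        best = None
        for row in range(num_rows - h + 1):
            for col in range(num_sites - w + 1):
                free = all(not occupied[rr][cc]
                           for rr in range(row, row + h)
                           for cc in range(col, col + w))
                if free:
                    cost = abs(col - x) + abs(row - y)
                    if best is None or cost < best[0]:
                        best = (cost, col, row)
        if best is None:
            raise ValueError(f"no legal position left for {name}")
        _, col, row = best
        for rr in range(row, row + h):
            for cc in range(col, col + w):
                occupied[rr][cc] = True
        placed[name] = (col, row)
    return placed

demo = {"macro": (1.2, 0.4, 3, 2), "a": (1.0, 0.0, 1, 1), "b": (2.1, 1.0, 1, 1)}
print(legalize(demo, num_rows=3, num_sites=6))
```

Because the macro is placed first, the small cells that overlapped it are pushed to the nearest free sites around it. The `ValueError` path corresponds to the failure case noted in the cons: heavily overlapping macros can leave no legal position.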
APlace-MS Results
• Ten ISPD02 Mixed-Size Benchmarks (10K-70K cells)
• Average wirelength increase after legalization: 6.5%
• Detailed placement by Feng Shui: 3.5% avg. WL improvement

           APlace-MS                       detailed placement
circuit    WL     WL_l   inc. (%)   CPU    WL_dp   impr. (%)   CPU
ibm01      0.20   0.24   18.5       15     0.23    5.7         1
ibm02      0.51   0.52   0.7        45     0.50    2.5         3
ibm03      0.70   0.74   6.2        56     0.72    3.5         3
ibm04      0.81   0.85   4.8        48     0.83    2.8         4
ibm05      1.01   1.00   -0.5       15     0.98    2.0         5
ibm06      0.65   0.71   9.6        76     0.68    4.4         5
ibm07      1.03   1.09   5.8        98     1.05    3.7         8
ibm08      1.49   1.50   0.6        128    1.46    2.7         8
ibm09      1.25   1.45   15.7       113    1.38    5.2         9
ibm10      2.97   3.07   3.3        206    3.00    2.2         11
HPWL Comparison
• Average HPWL improvement of our placer (range in parentheses) over:
  – Capo flow [ICCAD03]: 26.0% (11.5% ~ 34.0%)
  – mPG-MS [ASPDAC03]: 24.7% (9.9% ~ 40.1%)
  – Feng Shui [ISPD04]: 4.0% (-7.3% ~ 20.0%)
• Runtime
  – Xeon server (2.4GHz CPU, double-threaded)
  – much slower than Feng Shui

Comparison of our results with the Capo flow, mPG-MS and Feng Shui

           Capo           mPG-MS         Feng Shui      our placer
circuit    HPWL   CPU     HPWL   CPU     HPWL   CPU     HPWL   CPU
ibm01      0.31   20      0.30   18      0.24   3       0.23   16
ibm02      0.68   11      0.74   32      0.53   5       0.50   48
ibm03      1.04   59      1.20   32      0.75   6       0.72   59
ibm04      1.01   15      1.05   42      0.80   7       0.83   52
ibm05      1.11   5       1.09   36      1.01   8       0.98   20
ibm06      0.99   18      0.92   45      0.68   10      0.68   81
ibm07      1.53   25      1.37   68      1.17   13      1.05   106
ibm08      1.79   29      1.64   82      1.36   16      1.46   136
ibm09      1.99   29      1.86   84      1.38   15      1.38   122
ibm10      4.55   116     4.36   172     3.75   22      3.00   217
Placements Before and After Legalization
Outline
• APlace Background
• Extension to Mixed-Size Placement
• Extension to Timing-Driven Placement
  – Slack-derived edge weights
  – Timing-driven placement flow
  – Experimental results
• Conclusion and Ongoing Work
Timing-Driven Approaches
• Path based methods
  – consider all or a subset of paths directly
  – maintain an accurate timing view during optimization
  – complexity is relatively high
• Net based methods
  – transform timing constraints or requirements into either net weight or net length (or delay) constraints
Net Based Methods
• Delay budgeting
  – distribute slacks from the end-points to constituent nets along the path
  – may severely over-constrain the problem without consideration of physical feasibility
• Net weighting
  – assign weights to nets based on timing criticality
  – low complexity, strong flexibility and easy implementation
  – more attractive as circuit sizes increase and timing constraints become more complex
Slack-Derived Edge Weights
• Net weighting in TD-APlace
  – β: timing criticality exponent
  – slack(π): the slack of path π
  – T: longest path delay
• Heavy net weights are assigned to:
  – timing critical nets: exponential function [Marquardt et al. 2000]
  – nets included in many critical paths [Kong ICCAD02]
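The weighting formula itself is not reproduced in this text. The sketch below assumes one form that fits the symbols and the two bullets above: each path contributes (1 − slack(π)/T)^β to every net it passes through, and a net's weight is the sum of those contributions. That form, the clamping at zero, and the name `net_weights` are assumptions.

```python
def net_weights(paths, longest_delay, beta):
    """paths: list of (slack, nets_on_path). Returns net -> weight.

    Assumed form: weight(net) = sum over paths through the net of
    (1 - slack / longest_delay) ** beta.
    """
    weights = {}
    for slack, nets in paths:
        # Clamp at 0 so a slack above the longest delay cannot go negative.
        criticality = max(0.0, 1.0 - slack / longest_delay) ** beta
        for net in nets:
            weights[net] = weights.get(net, 0.0) + criticality
    return weights

paths = [
    (0.0, ["n1", "n2"]),   # zero slack: the most critical path
    (5.0, ["n2", "n3"]),   # half of the longest delay in slack
]
print(net_weights(paths, longest_delay=10.0, beta=2))
print(net_weights(paths, longest_delay=10.0, beta=0))
```

A larger β suppresses the contribution of paths with more slack, and a net shared by several critical paths accumulates weight from each of them. With β = 0 every path counts equally, so the weight reduces to the number of listed paths through the net.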
Timing-Driven Placement Flow
• Final placement stage
• TrialRoute (SoC Encounter v3.2): fast global and detailed routing
• Extract RC
• Pearl (SE v5.4): static timing analysis (STA)
• Import critical path delays to decide net weights
• Minimize weighted WL objective
[Figure: flow diagram with boxes APlace-TD, LEF/DEF/GCF/SDC, TrialRoute, Extract RC, Pearl, and Critical Paths / Min Cycle]
Timing Results: Indust1 Testcase
• Indust1: ~7k cells
• Xeon 2.4GHz CPU, double-threaded
• Minimum cycle time
  – measures quality of TD placements
  – initially decreases with criticality exponent
  – gradually deteriorates as criticality exponent continues to increase

Results with varying criticality exponents (β)

        Placement       TrialRoute   STA
beta    WL       CPU    WL           min cycle   impr. (%)
0       0.4468   11     0.5853       14.30       0.00
3       0.4468   12     0.5845       13.86       3.08
5       0.4469   12     0.5857       13.76       3.78
7       0.4470   12     0.5860       13.86       3.08
9       0.4469   12     0.5873       13.62       4.76
11      0.4473   12     0.5873       13.66       4.48
13      0.4477   12     0.5869       13.57       5.10
15      0.4480   12     0.5852       13.84       3.22
17      0.4480   12     0.5875       13.57       5.10
19      0.4485   11     0.5881       13.58       5.03
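The "impr. (%)" values are consistent with a relative reduction in minimum cycle time against the β = 0 row. The short check below recomputes them from the min-cycle figures in the results; the helper name `improvement` is hypothetical.

```python
def improvement(baseline, value):
    """Relative reduction, in percent, of value against baseline."""
    return 100.0 * (baseline - value) / baseline

baseline_cycle = 14.30  # min cycle at beta = 0
min_cycle = {3: 13.86, 5: 13.76, 7: 13.86, 9: 13.62, 11: 13.66,
             13: 13.57, 15: 13.84, 17: 13.57, 19: 13.58}

for beta, cycle in min_cycle.items():
    print(beta, round(improvement(baseline_cycle, cycle), 2))

# Smallest exponent that reaches the best (lowest) min cycle in this sweep.
best_beta = min(min_cycle, key=lambda b: (min_cycle[b], b))
print("best beta:", best_beta)
```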
Comparison vs. Industry Placers (I)
• Two industry placers
  – QPlace (SE v5.4)
  – amoebaPlace (SoC Encounter v3.2)
• Six industry circuits
  – 7k ~ 40k cells
  – two from the ISPD 2001 Circuit Benchmarks
• Experimental flow
  – TD or non-TD placements
  – WarpRoute (SoC Encounter v3.2): timing-driven routing
  – Extract RC
  – Pearl (SE v5.4): static timing analysis (STA)
[Figure: flow diagram with boxes TD-Place / nonTD-Place, LEF/DEF/GCF/SDC, TD-WarpRoute, Extract RC, Pearl, and Min Cycle]
Comparison vs. Industry Placers (II)
• Comparison to TD-QPlace and TD-amoebaPlace
• Final routed WL
  – TD-QPlace: 7.2% (-1.2% ~ 7.1%)
  – TD-amoebaPlace: 6.5% (-11.1% ~ 23.2%)
• Min Cycle
  – TD-QPlace: 9.6% (-1.2% ~ 14.8%)
  – TD-amoebaPlace: 8.5% (-0.8% ~ 28.5%)
  – APlace: 2% (0.1% ~ 3.8%)

                                Place           Route    STA
ckts      cells   placer        HPWL    CPU     WL       min cycle
indust1   7077    TD-QPlace     0.58    21      0.73     15.04
                  TD-Amoeba     0.61    1       0.88     14.91
                  APlace        0.51    14      0.67     14.28
                  TD-APlace     0.51    11      0.68     13.83
indust2   20094   TD-QPlace     1.29    80      2.31     38.87
                  TD-Amoeba     1.40    5       2.11     46.98
                  APlace        1.31    58      2.32     34.92
                  TD-APlace     1.31    55      2.34     33.60
indust3   40447   TD-QPlace     0.34    37      0.41     27.20
                  TD-Amoeba     0.36    5       0.42     27.31
                  APlace        0.35    119     0.43     27.65
                  TD-APlace     0.34    112     0.41     27.53
indust4   35272   TD-QPlace     fail in TD-QPlace
                  TD-Amoeba     15.08   6       16.74    402.09
                  APlace        12.84   65      15.33    401.49
                  TD-APlace     12.80   80      15.30    401.28
mac1      5937    TD-QPlace     0.33    6       0.52     4.66
                  TD-Amoeba     0.36    1       0.47     4.46
                  APlace        0.28    6       0.40     4.13
                  TD-APlace     0.28    9       0.40     4.06
mac2      21491   TD-QPlace     1.29    22      2.27     7.25
                  TD-Amoeba     1.48    3       2.18     6.64
                  APlace        1.15    37      2.13     6.26
                  TD-APlace     1.14    38      2.11     6.18
Conclusions
• APlace analytic placement framework extended to address mixed-size and timing-driven placement
• Mixed-size placement
  – HPWL outperforms mPG-MS, Feng Shui and the Capo flow by 24.7%, 4.0% and 26.0% on average, respectively
• Timing-driven placement
  – Minimum cycle time outperforms that of TD-QPlace and TD-amoebaPlace by 9.6% and 8.5%, respectively
  – Routed WL outperforms that of TD-QPlace and TD-amoebaPlace by 7.2% and 6.5%, respectively
Ongoing Work
• Scalability issue
  – APlace currently does not scale to large instances
  – control scheme for larger circuits
  – Augmented Lagrangian method for constrained nonlinear optimization
  – multigrid algorithm
• Extension to low power or IR drop directed placement
• Extension to 3D or thermal-aware placement
Acknowledgments
• We thank Brent Gregory, Will Naylor and Synopsys, Inc. for a research and educational license pertaining to U.S. Patents 6282693, 6662348, 6301693, 6671859 and 6665851.
Thank You !