Power-Aware Placement
Yongseok Cheon, Pei-Hsin Ho
Advanced Technology Group, Synopsys, Inc.
{cheon,pho}@synopsys.com
Andrew B. Kahng, Sherief Reda and Qinke Wang
UCSD CSE Department
{abk,sreda,qiwang}@cs.ucsd.edu
3 IC Power Consumption
- Switching power
  - largest source of power dissipation
  - usually accounts for 40% to 80% of total power
  - switching power of a net is proportional to the product of net capacitance and signal switching rate
- Short-circuit power
  - power dissipation due to the short-circuit current that flows briefly during the switching of a CMOS gate
- Leakage power
  - power dissipation due to spurious currents in the non-conducting state of a transistor
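The proportionality between switching power, net capacitance and switching rate can be made concrete with a small sketch. It uses the standard CMOS relation P = 0.5 * C * Vdd^2 * (transitions per second); the function name, the supply voltage default and the example values are assumptions for illustration and do not come from the slides.

```python
def net_switching_power(capacitance_f, switching_rate_hz, vdd=1.0):
    """Switching power of one net in watts.

    capacitance_f: net capacitance in farads.
    switching_rate_hz: signal transitions per second on the net.
    vdd: supply voltage in volts (hypothetical default).
    """
    return 0.5 * capacitance_f * vdd ** 2 * switching_rate_hz


# Hypothetical nets: a clock net with large capacitance and high activity,
# and a signal net with small capacitance and lower activity.
clock_power = net_switching_power(2.0e-12, 1.0e9)
signal_power = net_switching_power(50.0e-15, 1.0e8)
print(clock_power > signal_power)
```

Halving either the capacitance or the switching rate halves the power of that net, which is why the method targets the capacitance of the most active nets.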
4 Clock Power Consumption
- Clock net: a major contributor to dynamic power
  - much larger capacitance than most signal nets
  - highest switching activity
  - typically consumes up to 40% of total dynamic power across a variety of design types
- Traditional placement methodologies
  - treat registers no differently than combinational cells
  - lead to sub-optimal placements in terms of power
5 Power-Aware Placement Method
- Activity-based register clustering
  - reduce capacitance of clock nets, hence clock power
- Activity-based net weighting
  - reduce capacitance of high-activity signal nets, hence total net switching power
7 Large Weight for Clock Net?
- Not a good idea
  - may only affect registers close to boundaries
  - introduces hot spots and highly congested areas
8 Distribution of Clock Tree Capacitance
- Observation: most of the clock tree capacitance (e.g., 80%) is at the leaf level
9 Register Clustering
- Goal: reduce capacitance of a clock net
- Method: clump the registers within the same leaf cluster of the clock tree into a smaller area
- Result: reduced leaf-level clock tree capacitance and potentially reduced clock skew
10 Flow of Register Clustering
1. Quick CTS algorithm: group registers into clusters such that each cluster can become a leaf cluster of the actual clock tree
2. Group Bounds: constrain the placement of a cluster of registers within a smaller bounding box
11 Quick Clock-Tree Synthesis Algorithm
- Decide a range of target cluster sizes heuristically, based on
  - size of the clock net
  - design rule constraints: max fanout and max load
  - user configuration
- Perform clustering for each direction (from left, right, top and bottom) and for each target cluster size
- Select the clustering with the best CTS objective, e.g., minimum clock skew, minimum clock delay, minimum number of clock buffers, etc.
12 Quick CTS Algorithm (contd)
- Start with the leftmost (rightmost, highest or lowest) un-clustered clock pin
- Add the clock pin with the shortest Manhattan distance to the capacitance-weighted centroid of the current cluster
- Grow until the target cluster size is reached
- Repeat growing clusters until all clock pins are clustered
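The growing step above can be sketched in a few lines of Python. This is a minimal illustration for the left-to-right direction only, not the tool's implementation: the `ClockPin` type, the function names, and the use of a simple pin count as the target cluster size are assumptions (the slides also mention max fanout and max load constraints, which are not modeled here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockPin:
    name: str
    x: float
    y: float
    cap: float  # pin capacitance (> 0), used as the centroid weight


def weighted_centroid(cluster):
    """Capacitance-weighted centroid of the pins in a cluster."""
    total = sum(p.cap for p in cluster)
    cx = sum(p.x * p.cap for p in cluster) / total
    cy = sum(p.y * p.cap for p in cluster) / total
    return cx, cy


def grow_clusters(pins, target_size):
    """Greedy clustering that seeds each cluster with the leftmost free pin."""
    unclustered = sorted(pins, key=lambda p: (p.x, p.y))
    clusters = []
    while unclustered:
        cluster = [unclustered.pop(0)]  # leftmost un-clustered pin
        while unclustered and len(cluster) < target_size:
            cx, cy = weighted_centroid(cluster)
            # Pick the pin with the shortest Manhattan distance to the centroid.
            nearest = min(unclustered,
                          key=lambda p: abs(p.x - cx) + abs(p.y - cy))
            unclustered.remove(nearest)
            cluster.append(nearest)
        clusters.append(cluster)
    return clusters


pins = [ClockPin("a", 0, 0, 1.0), ClockPin("b", 1, 0, 1.0),
        ClockPin("c", 10, 0, 1.0), ClockPin("d", 11, 1, 1.0)]
result = [[p.name for p in c] for c in grow_clusters(pins, target_size=2)]
print(result)
```

Running the same routine from the other three directions and for several target sizes, then keeping the clustering with the best CTS objective, would correspond to the selection step on the previous slide.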
13 Group Bounds
- Control the bounding box of a cluster and reduce it while still fitting the registers
  - Compute the current bounding box of the registers
  - Shrink the bounding box proportionally
- Shrink ratio p is determined by
  - the specified shrinking factor p_0
  - the switching rate of the clock net, SR, and the max switching rate, MSR
14 Aspect Ratio of Bounding Box
- Close to the original bounding box aspect ratio AR_old when shrink ratio p is close to 1
  - avoids a serious increase in signal net length
- Close to square when shrink ratio p is close to 0
  - reduced clock skew
- Linear function of original aspect ratio AR_old and shrink ratio p
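One linear function that satisfies both end conditions is AR_new = p * AR_old + (1 - p), which gives AR_old at p = 1 and a square (aspect ratio 1) at p = 0. The sketch below uses that form as an assumption; the slides do not give the exact expression. It also assumes, purely for illustration, that p scales the bounding box area, that the box stays centered, and that a hypothetical `min_area` keeps the box large enough for the registers.

```python
import math


def new_aspect_ratio(ar_old, p):
    """Assumed linear blend: p = 1 keeps ar_old, p = 0 yields a square."""
    return p * ar_old + (1.0 - p)


def shrink_bounding_box(x_min, y_min, x_max, y_max, p, min_area=0.0):
    """Shrink a box around its center; expects positive width and height."""
    width, height = x_max - x_min, y_max - y_min
    # Assumption: the shrink ratio p scales the area, floored at min_area
    # (e.g., the total register area) so the registers still fit.
    area = max(p * width * height, min_area)
    ar_new = new_aspect_ratio(width / height, p)  # aspect ratio = width / height
    new_w = math.sqrt(area * ar_new)
    new_h = math.sqrt(area / ar_new)
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    return (cx - new_w / 2.0, cy - new_h / 2.0,
            cx + new_w / 2.0, cy + new_h / 2.0)


box = shrink_bounding_box(0.0, 0.0, 40.0, 10.0, p=0.5)
print(box)
```

With p = 0.5 the 40 x 10 box keeps half its area and its aspect ratio moves from 4 to 2.5, so the box becomes both smaller and closer to square.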
16 Pros and Cons of Register Clustering
- Effectively reduces capacitance of the leaf-level clock tree
- Increases the length of some signal nets
  - can cancel out the clock power reduction
17 Activity-Based Net Weighting
- Goal: reduce capacitance of signal nets
- Assign larger weights to signal nets with higher switching rates
- Combining register clustering and activity-based net weighting further reduces the total net switching power
18 Activity-Based Net Weighting
- Assign larger weights to nets with higher switching rates
  - T: threshold for selecting high-activity nets
  - MSSR: maximum signal net switching rate
  - W: controls the scope of power weights
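The weighting formula itself is not reproduced in the text, so the following is only a sketch of one function consistent with the three parameters: nets at or below the threshold T keep weight 1, and the weight rises linearly to 1 + W at the maximum signal net switching rate MSSR. The linear ramp and the function name are assumptions, not the authors' formula.

```python
def power_weight(switching_rate, threshold, mssr, w):
    """Assumed power weight for one signal net.

    switching_rate: switching rate of the net.
    threshold: T, nets at or below it are not treated as high activity.
    mssr: maximum signal net switching rate over all nets.
    w: W, the largest extra weight a net can receive.
    """
    if switching_rate <= threshold or mssr <= threshold:
        return 1.0
    return 1.0 + w * (switching_rate - threshold) / (mssr - threshold)


# Hypothetical switching rates for a low-, mid- and high-activity net.
weights = [power_weight(sr, threshold=0.2, mssr=1.0, w=4.0)
           for sr in (0.1, 0.6, 1.0)]
print(weights)
```

Any monotonically increasing function above T would follow the same idea; the linear ramp is simply the easiest one to reason about.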
19 Compatibility with Timing Weights
- Linear combination of power and timing net weighting
- Power ratio: 0 ~ 1
  - controls the ratio of power weight
  - knob for the trade-off between timing and power
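A linear combination controlled by a single ratio can be sketched as a convex blend of the two weights. The exact expression is not given in the text, so the form below, where the power ratio multiplies the power weight and its complement multiplies the timing weight, is an assumption, as are the function and parameter names.

```python
def combined_net_weight(power_w, timing_w, power_ratio):
    """Blend power and timing net weights; power_ratio must lie in [0, 1]."""
    if not 0.0 <= power_ratio <= 1.0:
        raise ValueError("power_ratio must be between 0 and 1")
    return power_ratio * power_w + (1.0 - power_ratio) * timing_w


# Hypothetical weights for one net: power weight 5.0, timing weight 2.0.
timing_only = combined_net_weight(5.0, 2.0, power_ratio=0.0)
power_only = combined_net_weight(5.0, 2.0, power_ratio=1.0)
blended = combined_net_weight(5.0, 2.0, power_ratio=0.8)
print(timing_only, power_only, blended)
```

A ratio of 0 reproduces a purely timing-driven placement and a ratio of 1 a purely power-driven one, which matches the description of the ratio as a trade-off knob.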
21 Experimental Setup
- Implemented on Synopsys IC Compiler
- Eight industry circuits
  - #cells: 20k ~ 186k
  - #registers: 2.3k ~ 44.2k
  - clock power: 32% of total power
  - net switching power: 39% of total power
- Power-aware placement
  - shrink ratio and power ratio around 0.8
22 Experimental Flow
- Commercial IC implementation flow
- Power analysis: IC Compiler
  - specified switching rates of primary inputs
  - net switching rates estimated by probabilistic simulation
- Flow: Place -> CTS -> Route -> Extract RC -> STA -> Power Analysis
23 Clock Net Switching Power 11.2%
24 Total Net Switching Power 25.4%
25 Results
26 Summary
- Reduction
  - clock net switching power: 11.3% (1.6% ~ 34.5%)
  - total net switching power: 25.3% (10.5% ~ 47.1%)
  - total power: 11.4% (6.5% ~ 18.8%)
  - clock WL: 10.1%
  - clock skew: random
- Impact
  - WNS (worst negative slack): 2.0%
  - total cell area: 1.2%
  - runtime: 11.5%
27 Power-Timing Trade-Off with Power Ratio
28 Power-Timing Trade-Off with Shrink Ratio
29 Conclusions
- We have presented a power-aware placement method that performs activity-based net weighting and register clustering to reduce the capacitance of high-activity signal and clock nets
- We have evaluated the method on eight real designs through a complete industrial physical design flow
- Our approach achieved average reductions of 25.3% in net switching power and 11.4% in total power, with 2.0% timing, 1.2% total cell area and 11.5% runtime degradation