An O(bn^2) Time Algorithm for Optimal Buffer Insertion with b Buffer Types
Authors: Zhuo Li and Weiping Shi
Presenter: Sunil Khatri
Department of Electrical Engineering
Texas A&M University
College Station, Texas 77845, USA
DATE 2005, MUNICH, 03/10/2005
Outline
• Introduction
• O(b^2 n^2) Algorithm
• New O(bn^2) Algorithm
• Experimental Results
• Extension
• Conclusion
Introduction
• Buffer insertion and sizing is one of the most effective methods for reducing interconnect delay.
[Figure: two plots versus technology node (90nm, 65nm, 45nm, 32nm). One shows % cells used to repeat block-level nets (series clk-rep, rep, tot-rep); the other shows % repeated nets (series M3, M6). Source: Saxena, et al. [TCAD 2004].]
Introduction (cont.)
• Modern libraries contain hundreds of different buffers with different characteristics: polarity, input capacitance, driving resistance, intrinsic delay, noise margin, power, area, etc.
• Buffer library size has a quadratic effect on running time in traditional algorithms.
• With such a large number of buffers and buffer types, fast algorithms for buffer insertion are crucial for timing closure.
Problem Formulation
• Given: A routing tree, n possible buffer positions, sink capacitances and required arrival times (RAT), a buffer library, wire resistance and capacitance.
• Delay model: Elmore delay for interconnect and linear delay model for buffers.
[Figure: example routing tree with source s0, sinks s1 to s4, the possible buffer positions, and a buffer library.]
Maximum Slack Problem
• Find: Where to insert buffers so that the slack at the source Q(s0) is maximized.
$Q(s_0) = \min_{i>0} \{ RAT(s_i) - delay(s_0, s_i) \}$
[Figure: example tree (s0 to s4) without buffers, Q(s0) = -50 ps.]
[Figure: the same example tree with 2 buffers inserted, Q(s0) = 100 ps.]
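The slack definition can be illustrated with a few lines of Python. This is only a sketch: the function name `source_slack` and all numbers are hypothetical and are not taken from the slides; the delays are assumed to be already computed by some delay model.

```python
def source_slack(rat, delay):
    """Return Q(s0) = min over sinks s_i of RAT(s_i) - delay(s0, s_i)."""
    return min(rat[s] - delay[s] for s in rat)

# Hypothetical required arrival times and source-to-sink delays, in ps.
rat = {"s1": 500.0, "s2": 450.0, "s3": 600.0, "s4": 550.0}
delay = {"s1": 520.0, "s2": 500.0, "s3": 610.0, "s4": 540.0}

print(source_slack(rat, delay))  # -50.0, set by the most critical sink s2
```

Inserting buffers changes the delays, and the optimization looks for the insertion that makes this minimum as large as possible.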
Previous Research
• Maximum Slack
  • van Ginneken [ISCAS 90]: O(n^2) time and space, where n is the number of buffer positions.
  • Lillis, Cheng and Lin [TCAS 96]: O(b^2 n^2) time and space for b buffer types.
  • Shi and Li [DAC 03]: O(n log n) time for 2-pin nets, O(n log^2 n) time for multi-pin nets. O(n log n) space.
• Minimum Buffer Cost (Area, Power, etc.)
  • Lillis, Cheng and Lin [TCAS 96]: pseudo-polynomial time algorithm.
  • Shi, Li and Alpert [ASPDAC 04]: buffer cost minimization is NP-hard if b is a variable.
Dynamic Programming
• Each candidate solution of a sub-tree is represented by a (Q, C) pair, where Q is slack and C is downstream capacitance.
• For any two candidates A1 and A2 of the same sub-tree, if Q(A1) ≤ Q(A2) and C(A1) ≥ C(A2), then A1 is redundant.
• O(b^2 n^2) time dynamic programming algorithm (Lillis-Cheng-Lin)
  • For b buffer types, the number of candidates is at most bn+1
  • For a wire, update (Q, C) value for every candidate in O(bn) time
  • For a buffer position, add b new candidates in O(b^2 n) time
  • For a branch point, merge two sets of candidates in O(bn1+bn2) time
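A minimal sketch of two of these steps, redundancy pruning and the wire update, is given below. The function names are hypothetical, and the wire update uses the standard Elmore formula for a wire of resistance r and capacitance c (Q decreases by r·(c/2 + C), C increases by c), which the slides do not spell out.

```python
def add_wire(cands, r, c):
    """Propagate every (Q, C) candidate across a wire (assumed Elmore model)."""
    return [(q - r * (c / 2.0 + cap), cap + c) for q, cap in cands]

def prune(cands):
    """Drop redundant candidates.

    A1 is redundant if some A2 has Q(A2) >= Q(A1) and C(A2) <= C(A1).
    Returns the survivors in decreasing Q and decreasing C order.
    """
    # Visit candidates by increasing C; among equal C, the best Q comes first.
    ordered = sorted(cands, key=lambda qc: (qc[1], -qc[0]))
    kept = []
    for q, cap in ordered:
        # Keep a candidate only if it strictly improves on the best Q so far.
        if not kept or q > kept[-1][0]:
            kept.append((q, cap))
    kept.reverse()
    return kept

raw = [(21, 5), (10, 4), (19, 4), (15, 3), (7, 2), (6, 1), (6, 3)]
print(prune(raw))  # [(21, 5), (19, 4), (15, 3), (7, 2), (6, 1)]
print(add_wire([(21, 5)], r=1.0, c=2.0))  # [(15.0, 7.0)]
```

The sort here is only for convenience on an unsorted input; the dynamic program keeps its lists sorted, so the same scan can run in linear time.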
• O(bn^2) time dynamic programming algorithm (This paper)
  • For b buffer types, the number of candidates is at most bn+1
  • For a wire, update (Q, C) value for every candidate in O(bn) time
  • For a buffer position, add b new candidates in O(bn) time
  • For a branch point, merge two sets of candidates in O(bn1+bn2) time
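The branch-point step can be sketched as a two-pointer merge. This is the usual van Ginneken-style merge, which the slides only cite by its cost; the name `merge_branches` and the sample lists are hypothetical. A merged candidate takes the smaller slack and the sum of the capacitances.

```python
def merge_branches(left, right):
    """Merge the candidate lists of two sub-trees that meet at a branch point.

    Both inputs are non-redundant lists in decreasing Q and decreasing C order.
    Runs in time linear in len(left) + len(right).
    """
    i, j = len(left) - 1, len(right) - 1  # start at the smallest Q of each list
    merged = []
    while i >= 0 and j >= 0:
        (ql, cl), (qr, cr) = left[i], right[j]
        merged.append((min(ql, qr), cl + cr))
        # Advance the side that limits the slack: pairing it with larger-C
        # candidates of the other side would add capacitance without gaining Q.
        if ql <= qr:
            i -= 1
        if qr <= ql:
            j -= 1
    merged.reverse()  # back to decreasing Q and decreasing C order
    return merged

print(merge_branches([(10, 4), (6, 1)], [(9, 3), (8, 2)]))
# [(9, 7), (8, 6), (6, 3)]
```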
Data Structure: Linked List
• Use linked list to store non-redundant candidates
  • Sorted in decreasing Q and decreasing C order
  • Each entry also contains the list of buffer positions
[Figure: linked list (Q1, C1) → (Q2, C2) → (Q3, C3), annotated "Less Capacitance" and "Better Slack".]
Best Candidates
• For each buffer Bi, R(Bi) is the buffer driver resistance, C(Bi) is the buffer input capacitance, and t(Bi) is the buffer intrinsic delay. Label buffers in non-decreasing order of resistance: R(B1) ≤ R(B2) ≤ … ≤ R(Bb).
• For each buffer type Bi
  • Define the best candidate αi as the candidate that maximizes slack among all candidates after Bi is inserted.
  • The new slack is Q(αi) - R(Bi)·C(αi) - t(Bi).
  • Define the new candidate βi as the candidate formed by αi with buffer type Bi.
• How to find all best candidates quickly is the key question addressed in this paper.
Example
• Three buffer types
  • R(B1)=1, C(B1), t(B1)
  • R(B2)=3, C(B2), t(B2)
  • R(B3)=5, C(B3), t(B3)
• Candidates (Q, C): (21, 5), (19, 4), (15, 3), (7, 2), (6, 1)
• Insert B1: (16-t(B1), C(B1)), (15-t(B1), C(B1)), (12-t(B1), C(B1)), (5-t(B1), C(B1)), (5-t(B1), C(B1))
  • Best candidate for B1 is α1 = (21, 5), and the new candidate is β1 = (16-t(B1), C(B1))
• Insert B2: (6-t(B2), C(B2)), (7-t(B2), C(B2)), (6-t(B2), C(B2)), (1-t(B2), C(B2)), (3-t(B2), C(B2))
  • Best candidate for B2 is α2 = (19, 4), and the new candidate is β2 = (7-t(B2), C(B2))
• Insert B3: (-4-t(B3), C(B3)), (-1-t(B3), C(B3)), (0-t(B3), C(B3)), (-3-t(B3), C(B3)), (1-t(B3), C(B3))
  • Best candidate for B3 is α3 = (6, 1), and the new candidate is β3 = (1-t(B3), C(B3))
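The numbers in this example can be reproduced with a brute-force scan, which is how each buffer type is handled when no structure of the candidate list is exploited. The sketch below uses hypothetical function names and leaves the intrinsic delay t(Bi) out, as the slide keeps it symbolic.

```python
candidates = [(21, 5), (19, 4), (15, 3), (7, 2), (6, 1)]  # (Q, C) from the example
resistances = {"B1": 1, "B2": 3, "B3": 5}                 # R(Bi) from the example

def slacks_before_intrinsic_delay(cands, r):
    """Q - R*C for every candidate (t(Bi) is subtracted from all of them alike)."""
    return [q - r * c for q, c in cands]

def best_candidate(cands, r):
    """Brute-force scan for the candidate maximizing Q - R*C."""
    return max(cands, key=lambda qc: qc[0] - r * qc[1])

for name, r in resistances.items():
    print(name, slacks_before_intrinsic_delay(candidates, r), best_candidate(candidates, r))
# B1 [16, 15, 12, 5, 5] (21, 5)
# B2 [6, 7, 6, 1, 3] (19, 4)
# B3 [-4, -1, 0, -3, 1] (6, 1)
```

Each scan touches every candidate, so b buffer types over a list of up to bn+1 candidates cost O(b^2 n) per buffer position.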
(Q, C) Plane
• Non-redundant (Q, C) list is a monotonically decreasing sequence
• As resistance is added, Q values change
[Figure: candidates plotted with C on the horizontal axis and Q on the vertical axis: A1 (21, 5), A2 (19, 4), A3 (15, 3), A4 (7, 2), A5 (6, 1).]
R(B1) = 1, Q = Q - R(B1)*C
[Figure: (Q, C) plane after subtracting R(B1)*C; for example, A1 moves to (21-5, 5).]
R(B2) = 3, Q = Q - R(B2)*C
[Figure: (Q, C) plane after subtracting R(B2)*C; for example, A1 moves to (21-15, 5).]
R(B3) = 5, Q = Q - R(B3)*C
[Figure: (Q, C) plane after subtracting R(B3)*C; for example, A1 moves to (21-25, 5).]
As R Increases, Q Decreases
[Figure: Q versus C (C = 1 to 5) for R = 0, R = 1, R = 3 and R = 5.]
Best Q Values Move to Left
[Figure: the same curves with the best Q for each R marked.]
Best Candidates are in Decreasing Order of C
• Lemma 1: C(α1) ≥ C(α2) ≥ … ≥ C(αb)
[Figure: (Q, C) plane with the best candidates α1, α2, α3 marked.]
• Lemma 1 alone is not enough for an O(bn) algorithm to find all best candidates.
• A global search is still needed.
Convex Pruning
• Convex pruning prunes candidates like A4
[Figure: (Q, C) plane with candidates A1 to A5; A4 is marked "Pruned".]
Before Convex Pruning
[Figure: Q versus C (C = 1 to 5) for R = 0, R = 1, R = 3 and R = 5; the curves are marked "Non-Convex".]
After Convex Pruning
[Figure: the same plot after convex pruning, with three points labeled 1, 2 and 3.]
Convex Hull
• After convex pruning, the remaining list is a convex hull
• Lemma 3: Best candidates must be on the convex hull
  • A candidate is on the convex hull if and only if there exists a resistance R such that when R is added, this candidate gives maximum Q
• Lemma 4: On the convex hull, if Ai gives maximum Q among neighboring candidates, Ai gives maximum Q among all candidates
  • The slope (Qi - Qj)/(Ci - Cj) between candidates Ai and Aj (i>j) is the resistance at which both give equal slack; with any larger resistance, the candidate with the smaller C gives the better slack
  • On the convex hull, slopes are in sorted order
  • Local optimal ⇒ global optimal
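To make the role of the slopes concrete, the sketch below computes them for the example hull and uses their sorted order to pick the best candidate for a given resistance. The binary search is only an illustration of why a local test suffices on a convex hull; it is not the procedure of the slides, and the function names are hypothetical.

```python
from bisect import bisect_left

def neighbour_slopes(hull):
    """Slopes (Q1 - Q2)/(C1 - C2) between neighbours, hull in decreasing C order."""
    return [(q1 - q2) / (c1 - c2) for (q1, c1), (q2, c2) in zip(hull, hull[1:])]

def best_on_hull(hull, r):
    """Candidate maximizing Q - r*C, found from the sorted slopes alone."""
    slopes = neighbour_slopes(hull)
    # Every slope below r is a break-even point already passed, so the best
    # candidate sits right after the last such slope.
    return hull[bisect_left(slopes, r)]

hull = [(21, 5), (19, 4), (15, 3), (6, 1)]  # example candidates without (7, 2)
print(neighbour_slopes(hull))   # [2.0, 4.0, 4.5]
print(best_on_hull(hull, 1))    # (21, 5)
print(best_on_hull(hull, 3))    # (19, 4)
print(best_on_hull(hull, 5))    # (6, 1)
```

For R = 3, which lies between the slopes 2 and 4, candidate (19, 4) beats both of its neighbours and is therefore the best overall.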
Local Optimal ⇒ Global Optimal
• For any R(Bi), if A2 gives better slack than A1 and A3, then A2 is the best candidate for Bi.
[Figure: convex hull in the (Q, C) plane with candidates A1, A2, A3 and A5.]
Find Convex Hull: Graham’s Scan
• Since the points are sorted, Graham’s scan can perform convex pruning in linear time
[Figure: Graham’s scan applied step by step to the candidates in the (Q, C) plane.]
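A minimal sketch of convex pruning as a Graham-style scan over an already sorted candidate list follows. The function name `convex_prune` is hypothetical; the input is assumed to be non-redundant and in decreasing C order, as in the linked list described earlier.

```python
def convex_prune(cands):
    """Keep only the candidates on the upper convex hull of the (C, Q) points.

    cands: non-redundant (Q, C) pairs in decreasing C (and decreasing Q) order.
    Each candidate is pushed once and popped at most once, so the scan is linear.
    """
    hull = []
    for q3, c3 in cands:
        while len(hull) >= 2:
            (q1, c1), (q2, c2) = hull[-2], hull[-1]
            # The middle point stays only if slope(1,2) < slope(2,3).
            # Cross-multiplied form; valid because c1 > c2 > c3.
            if (q1 - q2) * (c2 - c3) < (q2 - q3) * (c1 - c2):
                break
            hull.pop()
        hull.append((q3, c3))
    return hull

print(convex_prune([(21, 5), (19, 4), (15, 3), (7, 2), (6, 1)]))
# [(21, 5), (19, 4), (15, 3), (6, 1)]
```

On the example list, (7, 2) is removed, matching the candidate A4 that the slides mark as pruned.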
New Subroutine for Adding Buffer
• At each buffer position, given the (Q, C) list N in decreasing C order and the buffer library, where R(B1) ≤ R(B2) ≤ … ≤ R(Bb).
• Generate new (Q, C) list A1, A2, …, Ak with convex pruning: O(bn)
• Generate new candidates β1, β2, …, βb with the following loop: O(bn)
  Initialize j = 1, then for i = 1 to b do
    While j < k and Aj+1 gives better slack than Aj for Bi do
      j = j + 1
    Generate new candidate βi for buffer Bi
      Q(βi) = Q(Aj) - R(Bi)·C(Aj) - t(Bi)
      C(βi) = C(Bi)
• Sort the βi in non-increasing C order: O(b log b)
• Insert the βi into the original list N: O(bn)
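The subroutine can be sketched end to end as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the tuple layout (R, C_in, t) for a buffer, and the input capacitance and intrinsic delay values are all hypothetical. The final insertion of the new candidates into the original list is left out.

```python
def convex_prune(cands):
    """Upper convex hull of (Q, C) pairs given in decreasing C order."""
    hull = []
    for q3, c3 in cands:
        while len(hull) >= 2:
            (q1, c1), (q2, c2) = hull[-2], hull[-1]
            if (q1 - q2) * (c2 - c3) < (q2 - q3) * (c1 - c2):
                break
            hull.pop()
        hull.append((q3, c3))
    return hull

def new_buffer_candidates(cands, library):
    """Return one new candidate per buffer type.

    cands:   non-redundant (Q, C) list in decreasing C order.
    library: (R, C_in, t) tuples sorted by non-decreasing R.
    """
    hull = convex_prune(cands)
    betas = []
    j = 0  # never moves back, because best candidates have non-increasing C
    for r, c_in, t in library:
        # Local test on the hull: step while the next neighbour is better.
        while (j + 1 < len(hull)
               and hull[j + 1][0] - r * hull[j + 1][1] > hull[j][0] - r * hull[j][1]):
            j += 1
        q, c = hull[j]
        betas.append((q - r * c - t, c_in))
    # Sort in non-increasing C order, ready to be inserted into the list.
    betas.sort(key=lambda qc: -qc[1])
    return betas

cands = [(21, 5), (19, 4), (15, 3), (7, 2), (6, 1)]
library = [(1, 0.5, 1.0), (3, 0.4, 2.0), (5, 0.3, 3.0)]  # hypothetical C_in and t
print(new_buffer_candidates(cands, library))
# [(15.0, 0.5), (5.0, 0.4), (-2.0, 0.3)]
```

The pointer j and the loop index together advance at most len(hull) + b times, which is what brings the cost of this step down from O(b^2 n) to O(bn).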
O(bn^2) Algorithm
• Dynamic programming
  • For b buffer types, the number of candidates is at most bn+1
  • For a wire, update (Q, C) value for every candidate in O(bn) time
  • For a buffer position, add b new candidates in O(bn) time
  • For a branch point, merge two sets of candidates in O(bn1+bn2) time
• Total complexity is O(bn^2).
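The total follows from summing the per-position cost over the n buffer positions. The short derivation below is a sketch that assumes the buffer-position step dominates, with wires and branch points contributing at most O(bn) work each over O(n) tree nodes.

```latex
\begin{aligned}
T_{\text{new}} &= \sum_{k=1}^{n} O(bn) = O(bn^2), \\
T_{\text{Lillis-Cheng-Lin}} &= \sum_{k=1}^{n} O(b^2 n) = O(b^2 n^2).
\end{aligned}
```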
Speedup over O(b^2 n^2) Algorithm
[Figure: running time speedup (axis 0 to 12) versus buffer library size (8, 16, 32, 64) for net1, net2 and net3.]
• net1: 337 sinks
• net2: 1944 sinks
• net3: 2676 sinks
Speedup vs. Buffer Positions
[Figure: running time speedup (axis 0 to 12) versus normalized buffer positions (1x, 10x, 20x) for net1, net2 and net3. Buffer library size: 64.]
Extension to Min Buffer Cost
• Buffer cost is associated with area and power
• Find a solution that satisfies the slack requirement and has minimum buffer cost
• Each candidate solution is represented by a (Q, C, W) triple, where Q is slack, C is capacitance, and W is buffer cost
• Worst-case NP-hard
• Our algorithm can reduce the operation of adding a buffer from O(bN) to O(N), where N is the number of non-redundant candidates
Conclusion
• New O(bn^2) algorithm for optimal buffer insertion with b buffer types
  • Best candidates must be in decreasing order of C
  • Best candidates must be on the convex hull
  • Local optimal ⇒ global optimal
• Applicable to cost minimization and inverting buffer types
Thank You!