F.F. Dragan (Kent State)
A.B. Kahng (UCSD)
I. Mandoiu (UCLA/UCSD)
S. Muddu (Sanera Systems)
A. Zelikovsky (Georgia State)
Practical Approximation Algorithms for Separable Packing LPs
Outline
• VLSI design motivation
– Global routing via buffer-blocks
– Separable packing ILP formulations
• PTAS for separable packing LPs
• Analysis
• Experimental results
VLSI Global Routing
VLSI Global Routing (Buffered)
[Figure: buffered global routing through buffer blocks]
Problem Formulation
Global Routing via Buffer-Blocks (GRBB) Problem
Given:
• BB locations and capacities
• List of multi-pin nets
– Upper bound on the #buffers for each source-sink path
• L/U (lower/upper) bounds on the wirelength between consecutive buffers/pins
Find:
• Buffered routing of a maximum number of nets subject to the given constraints
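Integer Program Formulation
• Routing graph $G = (V, E)$, with $V = \text{terminals} \cup \text{buffer blocks}$ and $E = \{(u,v) : \mathrm{dist}(u,v) \in [L, U]\}$
• $\mathrm{cap}(v) = 1$ if $v$ is a terminal, and the BB capacity otherwise
• $\mu(T,v) = 1$ if $v \in T$, and 0 otherwise
$$\max \sum_T f(T) \quad \text{s.t.} \quad \sum_T \mu(T,v)\, f(T) \le \mathrm{cap}(v) \;\; \forall v \in V, \qquad f(T) \in \{0,1\}$$
where $T$ ranges over feasible Steiner trees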
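A minimal sketch of the two ingredients of the integer program: building the edge set $E$ from vertex locations and checking the capacity constraints for an integral solution. It assumes rectilinear (Manhattan) distance for dist(u,v) and uses a hypothetical three-vertex instance; `build_routing_graph` and `respects_capacities` are illustrative names, not part of the original work.

```python
from collections import Counter
from itertools import combinations


def build_routing_graph(points, lo, hi):
    """Edge set E = {(u, v) : lo <= dist(u, v) <= hi}.

    points maps each vertex (terminal or buffer block) to an (x, y) location.
    dist is ASSUMED to be rectilinear (Manhattan) distance.
    Undirected edges are stored as (u, v) with u < v.
    """
    edges = set()
    for u, v in combinations(sorted(points), 2):
        (xu, yu), (xv, yv) = points[u], points[v]
        if lo <= abs(xu - xv) + abs(yu - yv) <= hi:
            edges.add((u, v))
    return edges


def respects_capacities(routed_trees, cap):
    """Check sum_T mu(T, v) f(T) <= cap(v) for an integral solution.

    routed_trees lists the vertex sets of the trees with f(T) = 1;
    mu(T, v) = 1 exactly when v belongs to T.
    """
    load = Counter(v for tree in routed_trees for v in set(tree))
    return all(load[v] <= cap[v] for v in load)


# Hypothetical instance (microns): source s, buffer block b1, sink t.
points = {"s": (0, 0), "b1": (1000, 0), "t": (2000, 0)}
edges = build_routing_graph(points, lo=500, hi=1200)
cap = {"s": 1, "t": 1, "b1": 1}   # terminals always have capacity 1
```

Here s and t are 2,000 apart, beyond the upper bound, so the only way to connect them is through b1.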
Enforcing Parity Constraints
• Inverting buffers change the polarity of the signal
• Each sink has a given polarity requirement
• Parity constraints on the #buffers on each routed source-sink path
• A path may use two buffers in the same buffer block
Integer program changes:
• Split each BB vertex r of G into two copies, r' and r''
• Impose a capacity constraint on each set of vertices {r', r''}:
$$\sum_T \left[\mu(T,r') + \mu(T,r'')\right] f(T) \le \mathrm{cap}(r)$$
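A small sketch of what the split-vertex encoding buys: a path's polarity is determined by the parity of its buffer count, and both copies r' and r'' draw on the single capacity cap(r). The tuple encoding `(r, 1)` / `(r, 2)` for r' / r'' and the helper names are hypothetical choices for illustration.

```python
from collections import Counter


def parity_ok(path, needs_inversion):
    """A routed source-sink path meets its polarity requirement iff the
    number of inverting buffers on it has the required parity.

    BB copies are encoded (hypothetically) as (r, 1) for r' and (r, 2)
    for r''; every BB copy on the path holds one inverting buffer.
    """
    n_buffers = sum(1 for v in path if isinstance(v, tuple))
    return n_buffers % 2 == (1 if needs_inversion else 0)


def bb_load(paths):
    """Buffers drawn from each BB r, counting both copies r' and r''
    (this total is what must stay <= cap(r))."""
    return Counter(v[0] for p in paths for v in p if isinstance(v, tuple))


# Two buffers in the same block r1 restore the original polarity.
path = ["s", ("r1", 1), ("r1", 2), "t"]
```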
Combining with compaction
Set capacity constraints: $\mathrm{cap}(BB1) + \mathrm{cap}(BB2) \le \text{const}$.
GRBB with Buffer Library
• Discrete buffer library: different buffer sizes/driving strengths
• Need to allocate BB capacity between different buffer types
Integer program changes:
• Replace each BB vertex r of G by a set X(r) of vertices (one for each buffer type)
• Modify the edge set of G to take into account non-uniform driving strengths
• Impose a capacity constraint on each set of vertices X(r):
$$\sum_T \sum_{r' \in X(r)} \mathrm{size}(r')\, \mu(T,r')\, f(T) \le \mathrm{cap}(r)$$
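To make the buffer-library constraint concrete, the sketch below evaluates its left-hand side for every BB given fractional tree flows. The `(r, buffer_type)` tuple encoding of the vertices in X(r), the two buffer types, and their sizes are hypothetical.

```python
from collections import defaultdict


def bb_area_usage(flows, size):
    """Left-hand side of the buffer-library constraint for every BB r:
    sum_T sum_{r' in X(r)} size(r') * mu(T, r') * f(T).

    flows: list of (tree_vertices, f_T) pairs.  A vertex r' in X(r) is
    encoded (hypothetically) as the tuple (r, buffer_type); terminals are
    plain strings.  size: dict buffer_type -> area units.
    """
    usage = defaultdict(float)
    for tree, f_T in flows:
        for v in set(tree):            # mu(T, r') = 1 iff r' is in T
            if isinstance(v, tuple):
                r, btype = v
                usage[r] += size[btype] * f_T
    return dict(usage)


# Hypothetical library: a "large" buffer takes 2.5x the area of a "small" one.
size = {"small": 1.0, "large": 2.5}
flows = [({"s1", ("bb1", "large"), "t1"}, 1.0),
         ({"s2", ("bb1", "small"), "t2"}, 0.5)]
usage = bb_area_usage(flows, size)   # compare each entry with cap(r)
```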
“Relax+Round” Approach to GRBB
1. Solve the fractional relaxation
– Exact linear programming algorithms are impractical for large instances
– KEY IDEA: use an approximation algorithm
• allows fine-tuning the tradeoff between runtime and solution quality
2. Round to integer solution
– Provably good rounding [RT87]
– Practical runtime (random-walk based)
Separable Packing LP
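• Routing graph $G = (V, E)$
• Size function $\mathrm{size}: V \to \mathbb{R}_+$ such that $\mathrm{size}(v) = 1$ for every terminal $v$
• Capacity function $\mathrm{cap}: 2^V \to \mathbb{Z}_+$ such that $\mathrm{cap}(\{v\}) = 1$ for every terminal $v$
$$\max \sum_T f(T) \quad \text{s.t.} \quad \sum_T \mu(T,X)\, f(T) \le \mathrm{cap}(X) \;\; \forall X, \qquad f(T) \ge 0$$
$$\text{where} \quad \mu(T,X) = \sum_{v \in X \cap T} \mathrm{size}(v)$$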
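The following sketch checks a candidate fractional solution against the separable packing LP and reports its objective, computing $\mu(T,X)$ exactly as defined above. Trees and capacitated sets are represented as frozensets of vertex names; the instance (two nets competing for one buffer block) is hypothetical.

```python
def mu(tree, X, size):
    """mu(T, X): total size of the vertices of T that lie in X."""
    return sum(size[v] for v in tree & X)


def evaluate(f, cap, size, tol=1e-9):
    """Return (objective, feasible) for a fractional solution f.

    f:   dict frozenset(tree vertices) -> f(T)
    cap: dict frozenset(X) -> cap(X), one entry per capacitated set X
    """
    feasible = all(v >= 0 for v in f.values())
    for X, c in cap.items():
        if sum(mu(T, X, size) * f_T for T, f_T in f.items()) > c + tol:
            feasible = False
    return sum(f.values()), feasible


# Hypothetical instance: nets {a, b} and {c, d} both need buffer block r.
size = {v: 1 for v in "abcdr"}
cap = {frozenset(v): 1 for v in "abcd"}   # terminals: cap({v}) = 1
cap[frozenset("r")] = 1                    # BB r holds one buffer
T1, T2 = frozenset("arb"), frozenset("crd")
```

Splitting r evenly (f = 0.5 per tree) is feasible with objective 1; routing both trees fully overloads r.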
Previous Work
• MCF and packing/covering LP approximation: [FGK73,SM90, PST91,G92,GK94,KPST94,LMPSTT95,R95,Y95,GK98,F00,…]
• Exponential length function to model flow congestion [SM90]
• Shortest-path augmentation + final scaling [Y95]
• Modified routing increment [GK98]
• Fewer shortest-path augmentations [F00]
• We extend speed-up idea of [F00] to separable packing LPs
Separable Packing LP Algorithm
w(X) ← δ, f ← 0, α ← δ
For i = 1 to N do
    For k = 1, …, #nets do
        Find min-weight feasible Steiner tree T for net k
        While weight(T) < min{1, α(1+ε)} do
            f(T) ← f(T) + 1
            For every X do w(X) ← (1 + ε μ(T,X)/cap(X)) · w(X)
            Find min-weight feasible Steiner tree T for net k
        End While
    End For
    α ← α(1+ε)
End For
Output f/N
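A runnable sketch of the loop above, under one simplifying assumption: each net comes with an explicit list of candidate feasible trees, and "find min-weight feasible Steiner tree" is replaced by picking the cheapest tree from that list. The weight of a tree is $\sum_X \mu(T,X)\,w(X)$, matching the dual constraint. The instance and the helper name `approx_packing` are hypothetical; δ and N follow the theorem and runtime slides.

```python
import math
from collections import Counter


def approx_packing(nets, cap, size, eps, delta, n_iter):
    """Multiplicative-weights loop from the slide (sketch).

    nets:  one list of candidate feasible trees (frozensets) per net; taking
           the cheapest tree from this list stands in for the min-weight
           feasible Steiner tree oracle (an assumption).
    cap:   dict frozenset X -> cap(X);  size: dict vertex -> size(v).
    """
    def mu(T, X):                        # total size of T's vertices in X
        return sum(size[v] for v in T & X)

    w = {X: delta for X in cap}          # w(X) <- delta
    f = Counter()                        # f <- 0
    alpha = delta

    def weight(T):                       # sum_X mu(T, X) * w(X)
        return sum(mu(T, X) * w[X] for X in cap)

    for _ in range(n_iter):
        for candidates in nets:
            T = min(candidates, key=weight)
            while weight(T) < min(1.0, alpha * (1 + eps)):
                f[T] += 1
                for X in cap:
                    w[X] *= 1 + eps * mu(T, X) / cap[X]
                T = min(candidates, key=weight)
        alpha *= 1 + eps
    return {T: v / n_iter for T, v in f.items()}      # output f / N


# Hypothetical instance: two 2-pin nets sharing buffer block r (cap 2).
eps, L = 0.1, 3                          # L = max #vertices in a tree
delta = (1 + eps) * ((1 + eps) * L) ** (-1 / eps)
N = math.ceil(math.log(1 / delta) / math.log(1 + eps))
size = {v: 1 for v in "abcdr"}
cap = {frozenset(v): 1 for v in "abcd"}
cap[frozenset("r")] = 2
flows = approx_packing([[frozenset("arb")], [frozenset("crd")]],
                       cap, size, eps, delta, N)
total = sum(flows.values())              # LP optimum here is 2
```

On this toy instance the returned flow comes within the 1/(1+4ε) guarantee of the optimum; a real implementation would replace the candidate lists with a Steiner tree routine on the routing graph.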
Runtime
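Dual LP:
$$\min \sum_X \mathrm{cap}(X)\, w(X) \quad \text{s.t.} \quad \sum_X \mu(T,X)\, w(X) \ge 1 \;\; \forall T, \qquad w(X) \ge 0$$
• Choose the #iterations N such that all feasible trees have weight ≥ 1 after N iterations (i.e., $\delta(1+\varepsilon)^N \ge 1$)
• The tree weight lower bound is δ initially, and is multiplied by (1+ε) in each iteration
$$N = \log_{1+\varepsilon} \frac{1}{\delta}$$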
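Approximation Guarantee
Theorem: For every ε < .15, the algorithm finds a factor 1/(1+4ε) approximation by choosing
$$\delta = (1+\varepsilon)\big((1+\varepsilon)L\big)^{-1/\varepsilon},$$
where L is the maximum number of vertices in a feasible Steiner tree. For this value of δ, the running time is
$$O\!\left(\varepsilon^{-2}\, (\#\text{nets})\, T_{\text{tree}} \log L\right),$$
where $T_{\text{tree}}$ is the time needed to find a minimum-weight feasible Steiner tree.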
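As a worked check of the theorem's parameters, this sketch computes δ from ε and L and the resulting number of iterations $N = \lceil \log_{1+\varepsilon}(1/\delta) \rceil$. Since $\log(1/\delta) \approx \varepsilon^{-1}\ln((1+\varepsilon)L)$ and $\ln(1+\varepsilon) \approx \varepsilon$, N grows roughly like $\varepsilon^{-2}\log L$. The helper names and the choice L = 3 are illustrative.

```python
import math


def theorem_delta(eps, L):
    """delta = (1 + eps) * ((1 + eps) * L) ** (-1 / eps), for 0 < eps < 0.15."""
    if not 0 < eps < 0.15:
        raise ValueError("the theorem assumes 0 < eps < 0.15")
    return (1 + eps) * ((1 + eps) * L) ** (-1 / eps)


def phases(eps, L):
    """N = ceil(log_{1+eps}(1 / delta)) for the theorem's delta."""
    delta = theorem_delta(eps, L)
    return math.ceil(math.log(1 / delta) / math.log(1 + eps))


# Halving eps roughly quadruples N, in line with the eps^-2 factor.
n_by_eps = {eps: phases(eps, 3) for eps in (0.1, 0.05)}
```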
Implementation choices
• Decomposition
– 2-pin: star, minimum spanning tree
– 3,4-pin: matching, 3-restricted Steiner tree
– Multi-pin: not needed
• Min-weight DRST
– 2-pin: shortest path (exact)
– 3,4-pin: try all Steiner points + shortest paths (exact)
– Multi-pin: very hard! (heuristics)
• Rounding
– 2-pin: random walk
– 3,4-pin and multi-pin: backward random walks
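For 2-pin nets the min-weight tree is a shortest path. When all capacitated sets are single vertices of unit size (an assumption made here for simplicity), the tree weight is just a sum of vertex weights, and Dijkstra's algorithm applies with the weight charged on entering each vertex. The sketch ignores the parity and L/U constraints, which would be encoded in the routing graph itself; the graph and weights are hypothetical.

```python
import heapq
import math


def min_weight_path(adj, w, s, t):
    """Exact min-weight s-t path when weights sit on vertices
    (path weight = sum of w(v) over all vertices on the path)."""
    dist, prev, done = {s: w[s]}, {}, set()
    heap = [(w[s], s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == t:
            break
        for v in adj[u]:
            nd = d + w[v]                # entering v costs w(v) >= 0
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if t not in done:
        return None, math.inf
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1], dist[t]


# Hypothetical routing graph: two candidate buffer blocks between s and t.
adj = {"s": ["b1", "b2"], "b1": ["s", "t"], "b2": ["s", "t"], "t": ["b1", "b2"]}
w = {"s": 1, "t": 1, "b1": 5, "b2": 2}
```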
Provably Good Rounding
1. Store fractional flows f(T) for every feasible Steiner tree T
2. Scale down each f(T) by a factor of 1-ε, for small ε
3. Route each net k with probability f(k) = Σ{f(T) | T feasible for k}
– Number of routed nets ≈ (1-ε)OPT
4. To route net k, choose tree T with probability f(T)/f(k)
– With high probability, no BB capacity is exceeded
Problem: impractical to store all trees with non-zero flow
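Steps 2 to 4 can be sketched as follows, using Python's `random.Random` so runs are reproducible. Net names, tree labels, and flow values are hypothetical; the sketch assumes the stored f(T) values per net sum to at most 1.

```python
import random


def round_flows(flows_by_net, eps, rng):
    """Randomized rounding of per-net tree flows.

    flows_by_net: dict net -> dict tree -> f(T).  Each f(T) is first scaled
    by (1 - eps); net k is routed with probability f(k) = sum of its scaled
    f(T), and then tree T is picked with probability f(T) / f(k).
    """
    routing = {}
    for k, trees in flows_by_net.items():
        scaled = {T: (1 - eps) * f_T for T, f_T in trees.items()}
        f_k = sum(scaled.values())
        if rng.random() < f_k:
            routing[k] = rng.choices(list(scaled), weights=list(scaled.values()))[0]
    return routing


rng = random.Random(2024)
trials = 10_000
routed = sum("n" in round_flows({"n": {"A": 0.3, "B": 0.5}}, 0.1, rng)
             for _ in range(trials))
rate = routed / trials   # expected (1 - 0.1) * 0.8 = 0.72
```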
Random-Walk 2-TMCF Rounding
1. Store fractional flows f(T) for every valid routing tree T
2. Scale down each f(T) by a factor of 1-ε, for small ε
3. Route each net k with probability f(k) = Σ{f(T) | T routing tree for k}
– Number of routed nets ≈ (1-ε)OPT
4. To route net k, choose tree T with probability f(T)/f(k): use a random walk from source to sink
– With high probability, no BB capacity is exceeded
Practical: the random walk requires storing only the flows on edges
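The random walk in step 4 can be sketched as below: only per-edge flows of one net are stored, and at each vertex the walk leaves along an outgoing edge with probability proportional to its flow. This assumes the net's flow is acyclic and conserved; the edge flows are hypothetical.

```python
import random


def random_walk(edge_flow, source, sink, rng):
    """Pick a source-sink path using only the edge flows of one net.

    At each vertex, leave along an outgoing edge chosen with probability
    proportional to its flow.  ASSUMES the flow is acyclic and conserved.
    """
    out = {}
    for (u, v), x in edge_flow.items():
        if x > 0:
            out.setdefault(u, []).append((v, x))
    path = [source]
    while path[-1] != sink:
        heads, flows = zip(*out[path[-1]])
        path.append(rng.choices(heads, weights=flows)[0])
    return path


# Hypothetical net: 0.6 units of flow through b1 and 0.2 through b2.
edge_flow = {("s", "b1"): 0.6, ("b1", "t"): 0.6, ("s", "b2"): 0.2, ("b2", "t"): 0.2}
rng = random.Random(7)
paths = [random_walk(edge_flow, "s", "t", rng) for _ in range(4000)]
share_b1 = sum(p[1] == "b1" for p in paths) / len(paths)   # about 0.6 / 0.8
```

Each path is chosen with probability equal to its share of the net's flow, which matches choosing T with probability f(T)/f(k) without ever storing the trees.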
Random-Walk MTMCF Rounding
[Figure: random walks between source S and sinks T1, T2, T3]
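One plausible reading of the "backward random walks" used for multi-pin nets, sketched under stated assumptions: from each sink in turn, walk backward along incoming edges (chosen in proportion to flow) until the walk hits the source or a vertex already in the tree, then add that walk to the tree. The acyclic-flow assumption, the function name, and the instance are hypothetical.

```python
import random


def backward_walk_tree(edge_flow, source, sinks, rng):
    """Build one routing tree for a multi-pin net from its edge flows.

    For each sink in turn, walk backward along incoming edges (chosen with
    probability proportional to flow) until reaching the source or a vertex
    already in the tree, then add the walk.  ASSUMES acyclic flow.
    """
    into = {}
    for (u, v), x in edge_flow.items():
        if x > 0:
            into.setdefault(v, []).append((u, x))
    vertices, edges = {source}, set()
    for t in sinks:
        walk, v = [], t
        while v not in vertices:
            tails, flows = zip(*into[v])
            u = rng.choices(tails, weights=flows)[0]
            walk.append((u, v))
            v = u
        for u, v in walk:
            edges.add((u, v))
            vertices.add(v)
    return vertices, edges


# Hypothetical net: sinks t1 and t2 are both fed through buffer block b.
flow = {("s", "b"): 1.0, ("b", "t1"): 1.0, ("b", "t2"): 1.0}
tree = backward_walk_tree(flow, "s", ["t1", "t2"], random.Random(1))
```

The second walk stops as soon as it reaches b, so the two sinks share the buffer instead of claiming it twice.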
The MTMCF Rounding Heuristic
1. Round each net k with probability f(k), using backward random walks
– No scaling down: the approximate MTMCF value is already < OPT
2. Resolve capacity violations by greedily deleting routed paths
– Few violations
3. Greedily route the remaining nets using unused BB capacity
– Further routing is still possible
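Step 2 could look like the sketch below. The slides only say routed paths are deleted greedily; the victim rule here (drop the net touching the most over-capacity BBs, ties broken by name) and the one-buffer-per-BB accounting are hypothetical choices.

```python
from collections import Counter


def repair(routed, cap):
    """Greedily delete routed nets until no BB exceeds its capacity.

    routed: dict net -> set of BBs used by its tree (one buffer per BB, an
    assumption).  Victim rule (hypothetical): drop the net that touches the
    most over-capacity BBs, breaking ties by the smallest net name.
    """
    routed = dict(routed)
    while True:
        load = Counter(b for bbs in routed.values() for b in bbs)
        over = {b for b, n in load.items() if n > cap[b]}
        if not over:
            return routed
        victim = max(sorted(routed), key=lambda k: len(routed[k] & over))
        del routed[victim]


cap = {"b1": 1, "b2": 1}
routed = {"n1": {"b1"}, "n2": {"b1", "b2"}, "n3": {"b2"}}
kept = repair(routed, cap)
```

Deleting the single net n2 clears both violations and keeps two nets routed.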
Implemented Heuristics
• Greedy buffered routing:
1. For each net, route sinks sequentially along shortest paths to the source or to a node already connected to the source
2. After routing a net, remove fully used BBs
• Generalized MCF approximation + randomized rounding
– G2TMCF
– G3TMCF (3-pin decomposition)
– G4TMCF (4-pin decomposition)
– GMTMCF (no decomposition, approximate DRST)
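A minimal sketch of the greedy buffered routing heuristic. It assumes "shortest" means fewest hops (fewest buffers) and that a net uses one buffer in every BB its tree passes through; intermediate vertices must be BBs with free capacity. Function names and the instance are hypothetical.

```python
from collections import deque


def path_to_tree(adj, sink, tree, usable):
    """Fewest-hop path from a sink to any vertex already in the tree,
    passing only through buffer blocks that still have free capacity."""
    prev = {sink: None}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in prev:
                continue
            prev[v] = u
            if v in tree:                # reached the partial tree
                path = [v]
                while prev[path[-1]] is not None:
                    path.append(prev[path[-1]])
                return path
            if v in usable:              # only BBs with capacity left
                queue.append(v)
    return None


def greedy_route(adj, nets, cap):
    """nets: dict name -> (source, [sinks]); cap: dict BB -> #buffers."""
    remaining = dict(cap)
    routed = {}
    for name, (src, sinks) in nets.items():
        usable = {b for b, c in remaining.items() if c > 0}  # drop full BBs
        tree = {src}
        for t in sinks:                  # route sinks sequentially
            path = path_to_tree(adj, t, tree, usable)
            if path is not None:
                tree.update(path)
        for b in tree & usable:          # one buffer per BB used by the net
            remaining[b] -= 1
        routed[name] = tree
    return routed


adj = {"s1": ["b1"], "t1": ["b1"], "s2": ["b1"], "t2": ["b1"],
       "b1": ["s1", "t1", "s2", "t2"]}
routed = greedy_route(adj, {"n1": ("s1", ["t1"]), "n2": ("s2", ["t2"])}, {"b1": 1})
```

Because n1 fills b1, the later net n2 cannot connect its sink, which illustrates why net ordering matters for the greedy baseline.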
Experimental Setup
• Test instances extracted from a next-generation SGI microprocessor
• Up to 5,000 nets, ~6,000 sinks
• U = 4,000 μm, L = 500-2,000 μm
• 50 buffer blocks
• 200-400 buffers/BB
% Routed Nets vs. Runtime
[Plot: % routed nets (93-99%) vs. CPU seconds (0.1 to 100,000, log scale) for MT-Greed, G2TMCF, G3TMCF, G4TMCF, GMTMCF]
Conclusions and Ongoing Work
• Provably good algorithms and practical heuristics based on separable packing LP approximation
– Higher completion rates than previous algorithms
• Extensions:
– Combine global buffering with BB planning
– Buffer “site” methodology: tile graph
– Routing congestion (channel capacity constraints)
– Simultaneous pin assignment
% Sinks Connected
#sinks/#nets   Greed    G2TMCF            G3TMCF            G4TMCF            GMTMCF
                        ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04
2958/2396      92.2     93.8     95.5     96.2     97.8     96.6     98.3     96.7     97.4
3077/2438      92.3     93.9     96.5     96.4     98.5     96.9     98.8     97.6     99.3
3099/2784      92.1     93.6     95.5     96.4     98.0     96.6     98.1     97.3     98.7
6038/4764      93.5     94.8     96.8     95.7     97.6     96.5     98.4     96.3     97.7
6296/4925      93.6     96.2     97.6     97.0     98.6     97.7     99.1     97.7     98.4
6321/4938      93.3     96.2     97.5     96.8     98.4     97.7     98.9     97.7     98.2
Runtime (sec.)
#sinks/#nets   Greed    G2TMCF            G3TMCF            G4TMCF            GMTMCF
                        ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04
2958/2396      .30      1.63     357      9.16     2,090    98.91    29,190   2.33     947
3077/2438      .33      2.35     350      11.10    2,356    128.38   37,970   2.87     846
3099/2784      .33      1.80     392      12.56    2,364    132.81   38,341   2.86     877
6038/4764      .53      2.84     600      16.57    3,166    182.55   60,450   4.98     1,866
6296/4925      .55      4.35     690      19.5     3,721    265.78   77,671   5.38     1,828
6321/4938      .54      3.37     730      18.99    3,813    255.37   79,123   5.43     1,833
Resource Usage
                    Greed    G2TMCF            G3TMCF            G4TMCF            GMTMCF
                             ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04
# Conn. Sinks       5,645    5,725    5,842    5,779    5,896    5,827    5,942    5,813    5,897
% Conn. Sinks       93.5     94.8     96.8     95.7     97.6     96.5     98.4     96.3     97.7
WL (meters)         42.22    45.18    47.80    44.48    47.66    44.18    47.49    45.33    47.51
WL/sink (microns)   7,479    7,891    8,182    7,697    8,083    7,582    7,992    7,798    8,057
#Buff               9,037    9,860    10,676   9,591    10,610   9,497    10,507   9,860    10,647
#Buff/sink          1.60     1.72     1.83     1.66     1.80     1.63     1.77     1.70     1.81
#nets = 4,764, #sinks = 6,038, 400 buffers/BB
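The derived rows of this table follow from the raw ones: % connected = 100 · #conn. sinks / 6,038, WL/sink = WL (converted to microns) / #conn. sinks, and #Buff/sink = #Buff / #conn. sinks. The short worked check below recomputes them from the table's own numbers; because WL is reported only to 0.01 m, a recomputed WL/sink can differ from the table by about one micron.

```python
# Columns: Greed, then eps=.64 / eps=.04 for G2TMCF, G3TMCF, G4TMCF, GMTMCF.
conn_sinks = [5645, 5725, 5842, 5779, 5896, 5827, 5942, 5813, 5897]
wl_m = [42.22, 45.18, 47.80, 44.48, 47.66, 44.18, 47.49, 45.33, 47.51]
n_buf = [9037, 9860, 10676, 9591, 10610, 9497, 10507, 9860, 10647]
total_sinks = 6038

pct_conn = [round(100 * s / total_sinks, 1) for s in conn_sinks]
wl_per_sink_um = [round(w * 1e6 / s) for w, s in zip(wl_m, conn_sinks)]
buf_per_sink = [round(b / s, 2) for b, s in zip(n_buf, conn_sinks)]
```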
Resource Usage for 100% Completion
                    Greed         G4TMCF, ε=.04
#buffers/BB         1,000 or INF  500      600      1,000    INF
WL (meters)         47.89         49.46    49.58    49.98    51.40
WL/sink (microns)   7,931         8,191    8,212    8,278    8,513
#Buff               10,330        11,079   11,115   11,373   11,803
#Buff/sink          1.71          1.83     1.84     1.88     1.95
#nets = 4,764, #sinks = 6,038
MTMCF wastes routing resources!