Provably Good Global Buffering by Multiterminal Multicommodity Flow Approximation

F.F. Dragan (Kent State)

A.B. Kahng (UCSD)

I. Mandoiu (UCLA)

S. Muddu (Sanera Systems)

A. Zelikovsky (Georgia State)


2

Outline

• Buffer-block methodology for global buffering

• Global routing via buffer-blocks problem

• Integer node-capacitated multiterminal multicommodity flow (MTMCF) formulation

• Provably good approximation of fractional MTMCF

• Provably good rounding of fractional MTMCF

• Key implementation choices

• Experimental results

• Extensions & conclusions

3

Motivation

• VDSM requires buffer / inverter insertion for all global nets
– 50nm technology: >1,000,000 buffers

• Solution: insert buffers only in Buffer-Blocks (BBs)

Simplified design: isolates buffer insertion from circuit block implementations

Efficient utilization of routing/area resources (RAR) RAR(cap. 2k buffer-block) = RAR(cap. k buffer-block)

For high-end designs, 1.6

4

Buffer-Block Methodology

1. Buffer-block planning [Cong+99] [TangW00]
– given placement of circuit blocks + netlist
– find shape and location of BBs within available free space so as to maximize the number of routable nets

2. Global buffering via given BBs (this paper)
– given nets + BB locations and capacities
– find buffered routing for each net, subject to timing-driven and buffer-parity constraints

5

Buffer-Block Methodology

6

Problem Formulation

Global Routing via Buffer-Blocks (GRBB) Problem

Given:

• BB locations and capacities

• list of multi-pin nets, each net has

• upper-bound + parity requirement on #buffers for each source-sink path

• [non-negative weight (criticality coefficient)]

• lower/upper bounds (L/U) on wirelength between consecutive buffers/pins

Find:

• buffered routing of a maximum [weighted] number of nets subject to the given constraints

[Dragan+00]: 2-pin nets

This paper: multi-pin nets
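The per-path constraints listed above (upper bound and parity on the number of buffers, L/U bounds on the wirelength between consecutive buffers/pins) can be checked with a few lines of code. This is only a minimal sketch: the function name `path_is_valid`, the coordinate representation, and the use of rectilinear (Manhattan) distance are assumptions made for illustration and are not taken from the slides.

```python
def path_is_valid(points, lower, upper, max_buffers, parity):
    """Check one source-to-sink path against the GRBB path constraints.

    points: [source pin, buffer-block locations..., sink pin] as (x, y) tuples.
    parity: required value of (#buffers mod 2), i.e. 0 or 1.
    """
    n_buffers = len(points) - 2  # every point between the two pins is a buffer
    if n_buffers < 0 or n_buffers > max_buffers:
        return False
    if n_buffers % 2 != parity:
        return False
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        # Assumption: wirelength is the rectilinear (Manhattan) distance.
        dist = abs(x1 - x2) + abs(y1 - y2)
        if not (lower <= dist <= upper):
            return False
    return True
```

For example, a path with two buffers spaced 1,000 units apart satisfies an even-parity requirement with L = 500 and U = 4,000, while the same path fails an odd-parity requirement.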

7

Our Contributions

• Integer node-capacitated MTMCF formulation

• Approximation algorithm for fractional MTMCF

– Extends [GargK98,Fleischer99,Albrecht00,Dragan+00] to node-capacitated + multiterminal case

• Provably good fractional MTMCF rounding algorithms, giving a provably good algorithm for the GRBB Problem

• Practical rounding heuristics based on random-walks

• Computational study comparing alternative implementations

8

Integer Program Formulation

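Graph $G = (V, E)$, with $V = \{\text{pins}\} \cup \{\text{buffer blocks}\}$ and $E = \{(u,v) : \mathrm{dist}(u,v) \in [L, U]\}$

$$
\begin{aligned}
\max\ & \sum_{T} f(T) \\
\text{s.t.}\ & \sum_{T} \pi(T,v)\, f(T) \le \mathrm{cap}(v) && \text{for every } v \in V \\
& f(T) \in \{0,1\} && \text{for every } T
\end{aligned}
$$

$\pi(T,v) \in \{0,1,2\}$ = # times tree $T$ passes through $v$

For every pin $v$, $\mathrm{cap}(v)$ is set to 1, so each net is routed at most once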
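To make the capacity constraint concrete, the sketch below checks whether a set of chosen routing trees is feasible for the integer program. It assumes each tree is stored as a dictionary mapping a node to the number of times the tree passes through it (the multiplicity in {1, 2} from the formulation); the names `node_loads` and `is_feasible` are hypothetical.

```python
from collections import Counter

def node_loads(chosen_trees):
    """Sum, for every node v, the number of times the chosen trees pass through v."""
    load = Counter()
    for tree in chosen_trees:          # tree: dict node -> multiplicity (1 or 2)
        for v, times in tree.items():
            load[v] += times
    return load

def is_feasible(chosen_trees, cap):
    """True if no node (pin or buffer block) is used beyond its capacity."""
    load = node_loads(chosen_trees)
    return all(load[v] <= cap.get(v, 0) for v in load)
```

Because every pin has capacity 1, selecting two trees for the same net is automatically infeasible, which is how the formulation enforces "each net routed at most once".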

9

“Relax+Round” Approach

1. Solve the fractional relaxation

– Relaxation = node-capacitated multiterminal multicommodity flow

– Exact linear programming algorithms are impractical for large instances

– KEY IDEA: use approximation algorithm

• can approximate optimum within a factor of (1-ε) for any ε>0

• allows continuous tradeoff between runtime and solution quality

2. Round to integer solution

– Provably good rounding using [RaghavanT87]

– Practical rounding using random-walks

10

The ε-MTMCF Algorithm

w(v) = δ, f = 0
For i = 1 to N do
    For k = 1, …, #nets do
        Find min weight valid routing tree T for net k
        While w(T) < min{ 1, δ(1+2ε)^i } do
            f(T) = f(T) + 1
            For every v ∈ T do
                w(v) ← ( 1 + ε π(T,v)/cap(v) ) * w(v)
            End For
            Find min weight valid routing tree T for net k
        End While
    End For
End For
Output f/N
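The loop structure of the algorithm can be sketched in Python as follows. Several things here are assumptions, not statements from the slides: each net comes with an explicit list of candidate routing trees (so the "find min weight tree" step is a plain minimum over that list rather than a DRST computation), a tree is a dictionary from node to the number of times it passes through that node, the weight of a tree is the multiplicity-weighted sum of its node weights, and `delta` and `N` are supplied by the caller because the slides do not state how they are chosen.

```python
def eps_mtmcf(candidates, cap, eps, delta, N):
    """Sketch of the multiplicative-weights loop.

    candidates: list with one entry per net; each entry is a non-empty list of
                trees, and each tree is a dict node -> multiplicity.
    cap:        dict node -> capacity (pins included, with capacity 1).
    Returns a dict (net index, tree index) -> fractional flow f/N.
    """
    if eps <= 0:
        raise ValueError("eps must be positive, otherwise weights never grow")
    w = {v: delta for v in cap}
    flow = {}

    def tree_weight(tree):
        # Assumed definition: sum of node weights, counted with multiplicity.
        return sum(times * w[v] for v, times in tree.items())

    def min_tree(k):
        trees = candidates[k]
        return min(range(len(trees)), key=lambda j: tree_weight(trees[j]))

    for i in range(1, N + 1):
        threshold = min(1.0, delta * (1 + 2 * eps) ** i)
        for k in range(len(candidates)):
            j = min_tree(k)
            while tree_weight(candidates[k][j]) < threshold:
                flow[(k, j)] = flow.get((k, j), 0.0) + 1.0
                for v, times in candidates[k][j].items():
                    w[v] *= 1 + eps * times / cap[v]
                j = min_tree(k)
    return {key: value / N for key, value in flow.items()}
```

The sketch makes no feasibility claim for arbitrary `delta` and `N`: the capacity guarantee depends on choosing them as in the analysis, so the returned flow should be checked against the node capacities before it is rounded.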

11

Runtime of ε-MTMCF Algorithm

Main step of ε-MTMCF algorithm: computing min node-weight valid routing tree for a net = min node-weight directed rooted Steiner tree (DRST) in a directed acyclic graph

Runtime: O( … (#nets) · M ), where M = time to compute min node-weight DRST

M = O( (#BBs)^… ) for k-pin nets, k = 2, 3, 4
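For 2-pin nets the min node-weight DRST is a shortest path, and the buffer bound and parity requirement can be handled by a layer-by-layer dynamic program, which also makes the search graph acyclic. The sketch below is one possible way to do this and is not claimed to be the authors' implementation; the function name, the adjacency-list representation, and the convention that only buffer blocks carry weights are assumptions.

```python
def min_weight_buffered_path(adj, w, source, sink, max_buffers, parity):
    """Cheapest source-to-sink path using at most max_buffers buffers
    whose buffer count has the required parity (0 or 1).

    adj: dict node -> iterable of successor nodes (edges already satisfy L/U).
    w:   dict buffer block -> current node weight; pins are not in w.
    Returns (cost, path), or (inf, None) if no valid path exists.
    """
    inf = float("inf")
    best = (inf, None)
    # layer[v] = cheapest way to reach v using exactly `used` buffers
    layer = {source: (0.0, [source])}
    for used in range(max_buffers + 1):
        if used % 2 == parity:
            for v, (cost, path) in layer.items():
                if sink in adj.get(v, ()) and cost < best[0]:
                    best = (cost, path + [sink])
        if used == max_buffers:
            break
        nxt = {}
        for v, (cost, path) in layer.items():
            for u in adj.get(v, ()):
                if u not in w:          # skip pins; only buffer blocks extend a path
                    continue
                c = cost + w[u]
                if c < nxt.get(u, (inf, None))[0]:
                    nxt[u] = (c, path + [u])
        layer = nxt
    return best
```

A buffer block may appear more than once on the returned path, which matches the formulation's allowance for a tree to pass through a node up to twice.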

12

Implementation choices

                  2-pin                         3,4-pin                                       Multi-pin
Decomposition     Star, minimum spanning tree   Matching, 3-restricted Steiner tree           Not needed
Min-weight DRST   Shortest path (exact)         Try all Steiner pts + shortest paths (exact)  Very hard! Heuristics
Rounding          Random-walk                   Backward random-walks                         Backward random-walks
Source            [Dragan+00]                   This paper                                    This paper

13

Provably Good Rounding

1. Store fractional flows f(T) for every valid routing tree T
2. Scale down each f(T) by 1-ε for small ε
3. Each net k routed with prob. f(k) = Σ { f(T) | T routing for k }
   Number of routed nets ≥ (1-ε) OPT
4. To route net k, choose tree T with probability = f(T) / f(k)
   With high probability, no BB capacity is exceeded

Problem: Impractical to store all non-zero flow trees
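The four rounding steps can be sketched directly, assuming the fractional flows are available as an explicit list of (tree, flow) pairs per net and that the total flow of each net is at most 1. The function name `round_flows` and the data layout are illustrative assumptions.

```python
import random

def round_flows(flows, eps, rng):
    """Randomized rounding of stored tree flows.

    flows: dict net -> list of (tree, f(T)) pairs, with total flow per net <= 1.
    rng:   a random.Random instance, passed in so runs are reproducible.
    Returns dict net -> chosen tree for the nets that get routed.
    """
    routed = {}
    for net, trees in flows.items():
        # Step 2: scale every flow down by (1 - eps).
        scaled = [(tree, (1.0 - eps) * f) for tree, f in trees if f > 0]
        f_k = sum(f for _, f in scaled)
        # Step 3: route net k with probability f(k).
        if f_k <= 0 or rng.random() >= f_k:
            continue
        # Step 4: pick tree T with probability f(T) / f(k).
        options = [tree for tree, _ in scaled]
        weights = [f for _, f in scaled]
        routed[net] = rng.choices(options, weights=weights)[0]
    return routed
```

This version needs every non-zero flow tree in memory, which is exactly the practical problem noted on the slide.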

14

Random-Walk 2-TMCF Rounding

1. Store fractional flows f(T) for every valid routing tree T
2. Scale down each f(T) by 1-ε for small ε
3. Each net k routed with prob. f(k) = Σ { f(T) | T routing for k }
   Number of routed nets ≥ (1-ε) OPT
4. To route net k, choose tree T with probability = f(T) / f(k): use random walk from source to sink
   With high probability, no BB capacity is exceeded

Practical: random walk requires storing only flows on edges
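A random walk over edge flows can be sketched as follows for a single 2-pin net. The assumptions, none of which come from the slides, are that the net's flow is stored as a dictionary from directed edge to flow value, that the positive-flow edges form an acyclic graph, and that the next node is chosen with probability proportional to the flow on each outgoing edge.

```python
import random

def random_walk_path(edge_flow, source, sink, rng):
    """Walk from source to sink, picking each outgoing edge with probability
    proportional to its flow. Returns the node list, or None if the walk
    reaches a node with no outgoing flow.

    edge_flow: dict (u, v) -> flow of this net on edge (u, v); assumed acyclic.
    rng:       a random.Random instance.
    """
    path = [source]
    v = source
    while v != sink:
        out = [(head, f) for (tail, head), f in edge_flow.items()
               if tail == v and f > 0]
        if not out:
            return None
        nodes, weights = zip(*out)
        v = rng.choices(nodes, weights=weights)[0]
        path.append(v)
    return path
```

Only per-edge flow values are needed, so there is no need to keep a separate record for every routing path that received flow.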

15

Random-Walk MTMCF Rounding

[Figure: source S and sinks T1, T2, T3]


17

The MTMCF Rounding Heuristic

1. Round each net k with probability f(k), using backward random walks

– No scaling-down, approximate MTMCF < OPT

2. Resolve capacity violations by greedily deleting routed paths

– Few violations

3. Greedily route remaining nets using unused BB capacity

– Further routing still possible
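Step 2 of the heuristic can be sketched as below. The slide only says that violations are resolved by greedily deleting routed paths; the particular greedy rule used here (repeatedly delete the route that touches the most overloaded buffer blocks) is an assumption chosen for illustration, as are the function name and the data layout.

```python
from collections import Counter

def resolve_violations(routes, cap):
    """Delete routes until no buffer block exceeds its capacity.

    routes: dict net -> list of buffer blocks used by that net's route
            (a block listed twice is used twice).
    cap:    dict buffer block -> capacity.
    Returns a new dict with the surviving routes.
    """
    routes = dict(routes)
    load = Counter(b for r in routes.values() for b in r)

    def overloaded():
        return {b for b in load if load[b] > cap.get(b, 0)}

    over = overloaded()
    while over:
        # Assumed greedy rule: drop the route with the most uses of overloaded blocks.
        worst = max(routes, key=lambda n: sum(1 for b in routes[n] if b in over))
        for b in routes.pop(worst):
            load[b] -= 1
        over = overloaded()
    return routes
```

The nets removed here, together with nets that were never rounded, are the candidates for the greedy re-routing in step 3.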

18

Implemented Heuristics

• Greedy buffered routing
1. For each net, route sinks sequentially along shortest path to source or node already connected to source
2. After routing a net, remove fully used BBs

• MTMCF approximation + randomized rounding
– 2TMCF [Dragan+00]
– 3TMCF (3-pin decomposition + ε-MTMCF + rounding)
– 4TMCF (4-pin decomposition + ε-MTMCF + rounding)
– MTMCF (ε-MTMCF w/ approximate DRST + rounding)

19

Experimental Setup

• Test instances extracted from next-generation SGI microprocessor

• Up to 5,000 nets, ~6,000 sinks
• U=4,000 μm, L=500-2,000 μm
• 50 buffer blocks
• 200-400 buffers / BB

20

% Sinks Connected

#sinks/#nets   Greed   2TMCF           3TMCF           4TMCF           MTMCF
                       ε=.64   ε=.04   ε=.64   ε=.04   ε=.64   ε=.04   ε=.64   ε=.04
2958/2396      92.2    93.8    95.5    96.2    97.8    96.6    98.3    96.7    97.4
3077/2438      92.3    93.9    96.5    96.4    98.5    96.9    98.8    97.6    99.3
3099/2784      92.1    93.6    95.5    96.4    98.0    96.6    98.1    97.3    98.7
6038/4764      93.5    94.8    96.8    95.7    97.6    96.5    98.4    96.3    97.7
6296/4925      93.6    96.2    97.6    97.0    98.6    97.7    99.1    97.7    98.4
6321/4938      93.3    96.2    97.5    96.8    98.4    97.7    98.9    97.7    98.2

21

Runtime (sec.)

#sinks/#nets   Greed   2TMCF           3TMCF           4TMCF            MTMCF
                       ε=.64   ε=.04   ε=.64   ε=.04   ε=.64   ε=.04    ε=.64   ε=.04
2958/2396      .30     1.63    357     9.16    2,090   98.91   29,190   2.33    947
3077/2438      .33     2.35    350     11.10   2,356   128.38  37,970   2.87    846
3099/2784      .33     1.80    392     12.56   2,364   132.81  38,341   2.86    877
6038/4764      .53     2.84    600     16.57   3,166   182.55  60,450   4.98    1,866
6296/4925      .55     4.35    690     19.5    3,721   265.78  77,671   5.38    1,828
6321/4938      .54     3.37    730     18.99   3,813   255.37  79,123   5.43    1,833

22

% Routed Nets vs. Runtime

[Plot: % routed nets (y-axis, 93 to 99) vs. CPU seconds (x-axis, log scale, 0.1 to 100,000) for Greed, 2TMCF, 3TMCF, 4TMCF, MTMCF]

23

Resource Usage

                      Greed    2TMCF             3TMCF             4TMCF             MTMCF
                               ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04
# Conn. Sinks         5,645    5,725    5,842    5,779    5,896    5,827    5,942    5,813    5,897
% Conn. Sinks         93.5     94.8     96.8     95.7     97.6     96.5     98.4     96.3     97.7
Wirelength (meters)   42.22    45.18    47.80    44.48    47.66    44.18    47.49    45.33    47.51
WL/sink (microns)     7,479    7,891    8,182    7,697    8,083    7,582    7,992    7,798    8,057
#Buffers              9,037    9,860    10,676   9,591    10,610   9,497    10,507   9,860    10,647
#Buff/sink            1.60     1.72     1.83     1.66     1.80     1.63     1.77     1.70     1.81

#nets = 4,764, #sinks = 6,038, 400 buffers/BB
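The derived rows of the table follow from the raw counts by simple arithmetic. The short sketch below recomputes them for the Greed column as a sanity check; the function name `derived_metrics` is hypothetical, and the only inputs are numbers that appear in the table.

```python
def derived_metrics(connected_sinks, total_sinks, wirelength_m, buffers):
    """Recompute the derived rows of the resource-usage table."""
    pct_connected = 100.0 * connected_sinks / total_sinks
    wl_per_sink_um = wirelength_m * 1e6 / connected_sinks   # meters -> microns
    buffers_per_sink = buffers / connected_sinks
    return pct_connected, wl_per_sink_um, buffers_per_sink

# Greed column: 5,645 of 6,038 sinks, 42.22 m of wire, 9,037 buffers
pct, wl_um, buf = derived_metrics(5645, 6038, 42.22, 9037)
print(round(pct, 1), round(wl_um), round(buf, 2))
```

Both per-sink figures are normalized by the number of connected sinks, not by the total number of sinks.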

24

WL and #Buffers for 100% Completion

#nets = 4,764, #sinks = 6,038

Flow-rounding wastes routing resources!

BB Cap.              Greed    2TMCF             3TMCF             4TMCF             MTMCF
                              ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04    ε=.64    ε=.04
500     WL           ---      ---      51.28    ---      50.07    ---      49.45    ---      49.54
        #Buffers     ---      ---      11,738   ---      11,312   ---      11,079   ---      11,161
600     WL           ---      50.46    51.13    48.93    49.95    48.02    49.58    48.34    49.27
        #Buffers     ---      11,330   11,688   10,802   11,267   10,512   11,115   10,631   11,075
1000    WL           47.89    50.59    50.76    49.05    49.93    48.01    49.98    48.28    48.27
        #Buffers     10,330   11,334   11,558   10,802   11,284   10,512   11,373   10,619   10,783
8000    WL           47.89    50.62    50.28    48.97    51.28    48.07    51.40    48.33    48.44
        #Buffers     10,330   11,334   11,340   10,794   11,788   10,503   11,803   10,619   10,625

25

Conclusions and Ongoing Work

• Provably good algorithms and practical heuristics based on node-capacitated MTMCF approximation
– Higher completion rates than previous algorithms

• Extensions:
– Combine global buffering with BB planning
• combine with compaction

26

Combining with compaction

27


28


• Sum-capacity constraints: cap(BB1) + cap(BB2) ≤ const.

29

Conclusions and Ongoing Work

• Provably good algorithms and practical heuristics based on node-capacitated MTMCF approximation
– Higher completion rates than previous algorithms

• Extensions:
– Combine global buffering with BB planning
• combine with compaction
– Enforce channel capacity constraints
– Improved resource usage
• smart release of resources
