A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Shiyan Hu*, Zhuo Li**, Charles Alpert**
*Dept of Electrical and Computer Engineering, Michigan Technological University
**IBM Austin Research Lab, Austin, TX

Page 1: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Shiyan Hu*, Zhuo Li**, Charles Alpert**

*Dept of Electrical and Computer Engineering, Michigan Technological University

**IBM Austin Research Lab, Austin, TX

Page 2: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Outline

Page 3: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Interconnect Delay Dominates

[Figure: delay (psec) versus technology generation (µm), comparing transistor/gate delay with interconnect delay.]

Page 4: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Timing Driven Buffer Insertion

Page 5: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

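Buffers Reduce RC Wire Delay

[Figure: a wire of length x between driver resistance R and load C, unbuffered and with one buffer at x/2; each half-wire is modeled as resistance rx/2 with capacitance cx/4 at both ends. Plot of ∆t versus x.]

∆t = t_buf − t_unbuf = RC + t_b − rcx²/4

Delay grows linearly with interconnect length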

Page 6: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

25% Gates are Buffers

[Figure: % buffered nets (M3, M6) versus technology node, and % cells that are buffers (clocked, unclocked, total) versus technology node.]

Saxena, et al. [TCAD 2004]

Page 7: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Problem Formulation

1. Steiner tree
2. n candidate buffer locations

[Figure: buffered routing tree with timing constraint T.]

Minimal cost (area/power) solution

Page 8: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Characterization

To model the effect on downstream, a candidate solution is associated with

• v: a node
• C: downstream capacitance
• Q: required arrival time
• W: cumulative buffer cost

Page 9: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Dynamic Programming (DP)

Start from sinks

Candidate solutions are generated

Candidate solutions are propagated toward the source

Three operations
– Add Wire
– Insert Buffer
– Merge

Solution Pruning

Page 10: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Generating Candidates

[Figure: candidate solutions (1), (2) and (3).]

Page 11: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Pruning Candidates

[Figure: candidate solutions (3) and (4), with alternatives (a) and (b).]

Both (a) and (b) look the same to the source. Remove the one with the worse slack and cost.

Page 12: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Merging Branches

[Figure: left candidates and right candidates combined at a branch point.]

O(n₁n₂) solutions after each branch merge. Worst-case O((n/m)^m) solutions.

Page 13: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

DP Properties

Given two solutions (Q1, C1, W1) and (Q2, C2, W2), the first is inferior/dominated if C1 ≥ C2, W1 ≥ W2 and Q1 ≤ Q2

• Non-dominated solutions are maintained: for the same Q and W, pick min C
• # solutions depends on # of distinct W and Q, but not their values
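The dominance rule on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `Candidate`, `dominates` and `prune` are hypothetical, and the quadratic scan is chosen for readability, not because the talk uses it.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    q: float  # required arrival time Q (larger is better)
    c: float  # downstream capacitance C (smaller is better)
    w: int    # cumulative buffer cost W (smaller is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b in Q, C and W."""
    return a.q >= b.q and a.c <= b.c and a.w <= b.w


def prune(cands: List[Candidate]) -> List[Candidate]:
    """Keep only non-dominated candidates (simple quadratic scan)."""
    unique = list(dict.fromkeys(cands))  # drop exact duplicates, keep order
    return [b for b in unique
            if not any(dominates(a, b) for a in unique if a != b)]
```

For example, a candidate with (Q, C, W) = (9, 3, 1) is dropped when (10, 2, 1) is present, while (12, 5, 1) survives because its larger capacitance is traded for a better arrival time.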

Page 14: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Previous Works

[Timeline: 1990, 1991, …, 1996, …, 2003, 2004, …, 2008, 2009]

• van Ginneken's algorithm
• Lillis' algorithm
• Shi and Li's algorithm
• Chen and Zhou's algorithm
• NP-hardness proof

Page 15: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Bridging The Gap

We are bridging the gap!

A Fully Polynomial Time Approximation Scheme (FPTAS)

• Provably good
• Within (1+ɛ) optimal cost for any ɛ>0
• Runs in time polynomial in n (nodes), b (buffer types) and 1/ɛ
• Best solution for an NP-hard problem in theory
• Highly practical

Page 16: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

The Rough Picture

W*: the cost of optimal solution

[Flowchart: Make guess on W* → Check it → if Good (close to W*), return the solution; if Not Good, guess again.]

Key 1: Efficient checking
Key 2: Smart guess

Page 17: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Key 1: Efficient Checking

Benefit of guess
• Only maintain the solutions with cost no greater than the guessed cost
• Accelerate DP

Page 18: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

The Oracle

Oracle (x): the checker, able to decide whether x>W* or not
– Without knowing W*
– Answer efficiently

[Flowchart: Setup upper and lower bounds of cost W* → Guess x within the bounds → Oracle (x) → Update the bounds → guess again.]

Page 19: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Construction of Oracle(x)

Scale and round each buffer cost: w → ⌊w / (xɛ/n)⌋

Only interested in whether there is a solution with cost up to x satisfying timing constraint

Dynamic Programming: perform DP on the scaled problem with cost upper bound n/ɛ. Runtime polynomial in n/ɛ

Page 20: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Scaling and Rounding

[Figure: buffer cost axis with grid points 0, xɛ/n, 2xɛ/n, 3xɛ/n, 4xɛ/n.]

Buffer costs are integers due to rounding and are bounded by n/ɛ.

Rounding error at each buffer ≤ xɛ/n, total rounding error ≤ xɛ.
• Larger x: larger error, fewer distinct costs and faster
• Smaller x: smaller error, more distinct costs and slower
• Rounding is the reason for the acceleration
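A small Python sketch may make the scaling step concrete. It assumes costs are rounded down to multiples of xɛ/n (the rounding direction is an assumption of this sketch), and the function names `scale_costs` and `cost_after_rounding_back` are hypothetical.

```python
import math
from typing import List


def scale_costs(costs: List[float], x: float, eps: float, n: int) -> List[int]:
    """Map each buffer cost w to the integer floor(w / (x*eps/n))."""
    unit = x * eps / n
    return [math.floor(w / unit) for w in costs]


def cost_after_rounding_back(scaled_total: int, x: float, eps: float, n: int) -> float:
    """Upper bound on the true cost of a solution with the given scaled cost.

    Each of at most n buffers loses less than one unit x*eps/n to rounding,
    so the total rounding error is at most x*eps.
    """
    unit = x * eps / n
    return scaled_total * unit + n * unit
```

With x = 20, ɛ = 0.5 and n = 5 the unit is 2, so costs 3, 7.5 and 10 become 1, 3 and 5, and a scaled budget of n/ɛ = 10 maps back to at most (1+ɛ)x = 30.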

Page 21: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

DP Results

DP result w/ all w integers ≤ n/ɛ:

• Yes, there is a solution satisfying timing constraint. With cost rounding back, the solution has cost at most n/ɛ · xɛ/n + xɛ = (1+ɛ)x > W*
• No, no such solution. With cost rounding back, the solution has cost at least n/ɛ · xɛ/n = x ≤ W*

Page 22: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Rounding on Q

# solutions bounded by # distinct W and Q

# W = O(n/ɛ₁)
– Rounding before DP

# Q
– Round up Q to nearest value in {0, ɛ₂T/m, 2ɛ₂T/m, 3ɛ₂T/m, …, T}, in branch merge (m is # sinks)
– Rounding during DP
– # Q = O(m/ɛ₂)

# non-dominated solutions is O(mn/(ɛ₁ɛ₂))

[Figure: Q axis with grid points 0, ɛ₂T/m, 2ɛ₂T/m, 3ɛ₂T/m, 4ɛ₂T/m.]
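The Q rounding rule can be written as a short Python helper. This is a sketch that assumes 0 ≤ Q ≤ T; the name `round_up_q` is hypothetical.

```python
import math


def round_up_q(q: float, T: float, eps2: float, m: int) -> float:
    """Round q up to the nearest value in {0, eps2*T/m, 2*eps2*T/m, ..., T}."""
    step = eps2 * T / m
    k = math.ceil(q / step)   # index of the first grid point at or above q
    return min(k * step, T)   # never exceed the largest grid value T
```

With T = 100, ɛ₂ = 0.5 and m = 5 the grid step is 10, so there are m/ɛ₂ + 1 = 11 possible Q values and Q = 33 is rounded up to 40.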

Page 23: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Q-W Rounding Before Branch Merge

[Figure: solutions placed on a W-Q grid. W axis: integer bins 0, 1, 2, 3, 4, …, n/ɛ₁. Q axis: bins ɛ₂T/m, 2ɛ₂T/m, 3ɛ₂T/m, 4ɛ₂T/m, …, T.]

Page 24: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Propagation: Add Wire

[Figure: a wire of length x from (v1, c1, w1, q1) up to (v2, c2, w2, q2).]

• c2 = c1 + cx
• q2 = q1 − (rcx²/2 + rxc1)
• r: wire resistance per unit length
• c: wire capacitance per unit length
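The wire update is simple enough to state as code. The following Python sketch uses hypothetical names (`Candidate`, `add_wire`) and carries the cost w through unchanged, since a wire adds no buffer cost.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    w: int    # cumulative buffer cost
    q: float  # required arrival time


def add_wire(s: Candidate, x: float, r: float, c: float) -> Candidate:
    """Propagate s across a wire of length x (r, c are per unit length)."""
    wire_delay = r * c * x * x / 2 + r * x * s.c
    return Candidate(c=s.c + c * x, w=s.w, q=s.q - wire_delay)
```

For instance, with c1 = 2, q1 = 100, x = 4, r = 0.5 and c = 0.25, the wire delay is 1 + 4 = 5, giving c2 = 3 and q2 = 95.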

Page 25: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Propagation: Insert Buffer

[Figure: a buffer inserted at v1 turns (v1, c1, w1, q1) into (v1, c1b, w1b, q1b).]

• q1b = q1 − d(b)
• c1b = C(b)
• w1b = w1 + w(b)
• d(b): buffer delay
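A Python sketch of the buffer update follows. The slide only names d(b) as the buffer delay; the linear model d(b) = k + r·c1 used here (intrinsic delay plus driving resistance times load) is an assumption of this sketch, as are the names `Buffer`, `Candidate` and `insert_buffer`.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    w: int    # cumulative buffer cost
    q: float  # required arrival time


class Buffer(NamedTuple):
    cin: float  # input capacitance C(b)
    cost: int   # buffer cost w(b)
    k: float    # intrinsic delay (assumed delay model)
    r: float    # driving resistance (assumed delay model)


def insert_buffer(s: Candidate, b: Buffer) -> Candidate:
    """Place buffer b in front of candidate s."""
    d = b.k + b.r * s.c  # assumed linear form of the buffer delay d(b)
    return Candidate(c=b.cin, w=s.w + b.cost, q=s.q - d)
```

The buffer hides the downstream load: upstream stages see only C(b), at the price of the delay d(b) and the cost w(b).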

Page 26: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Buffer Insertion Runtime

• O(n/ɛ₁) W-bins
• A buffer insertion introduces O(nb/ɛ₁) non-dominated solutions with b buffer types
• O(mn/(ɛ₁ɛ₂)) solutions after a branch merge
• At most O(mn/(ɛ₁ɛ₂) + n²b/ɛ₁) non-dominated solutions in a single branch
• O(mnb/(ɛ₁ɛ₂) + n²b²/ɛ₁) time for each node. No cross W-bin pruning.

Page 27: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Propagation: Merge

[Figure: left candidate (v, cl, wl, ql) and right candidate (v, cr, wr, qr) merged at node v.]

• Round q in both branches
• c_merge = cl + cr
• w_merge = wl + wr
• q_merge = min(ql, qr)
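The merge rule can be sketched in Python as follows. The names `Candidate`, `merge` and `merge_branches` are hypothetical, and the all-pairs loop is the naive form; the Q rounding step that the slide applies first is left out of this sketch.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    c: float  # downstream capacitance
    w: int    # cumulative buffer cost
    q: float  # required arrival time


def merge(left: Candidate, right: Candidate) -> Candidate:
    """Combine one candidate from each branch at their common node."""
    return Candidate(c=left.c + right.c,
                     w=left.w + right.w,
                     q=min(left.q, right.q))


def merge_branches(lefts: List[Candidate], rights: List[Candidate]) -> List[Candidate]:
    """Naive merge: every left candidate paired with every right candidate."""
    return [merge(l, r) for l in lefts for r in rights]
```

Capacitance and cost add because both subtrees hang off the same node, while the required arrival time is set by the more critical branch.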

Page 28: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Branch Merge Runtime - 1

Target Q = 0

Page 29: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Branch Merge Runtime - 2

Target Q = ɛ₂T/m

Page 30: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Branch Merge Runtime - 3

Target Q = 2ɛ₂T/m

Page 31: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Branch Merge Runtime - 4

• O(mn/(ɛ₁ɛ₂)) solutions after a branch merge
• For merged W = a, try all W_l + W_r = a, where each takes O(m/ɛ₂) time
• For a = 0, 1, …, n/ɛ₁, total runtime is O(mn²/(ɛ₁²ɛ₂))
• Including time for putting solutions into bins, it is [Equation: O(·) bound in m, n, b, ɛ₁ and ɛ₂]

Page 32: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Timing-Cost Approximate DP

Lemma: a buffering solution with cost at most (1+ɛ₁)W* and with timing at most (1+ɛ₂)T can be computed in time

[Equation: O(·) runtime bound in m, n, b, ɛ₁ and ɛ₂.]

Page 33: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Key 2: Geometric Sequence Based Guess

• U (L): upper (lower) bound on W*
• Naive binary search style approach
• Runtime (# iterations) depends on the initial bounds U and L

[Flowchart: Set U and L on W* → x=(U+L)/2 → Oracle (x) → if W*<(1+ɛ)x, set U=(1+ɛ)x; if W*≥x, set L=x → repeat.]

Page 34: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Adapt ɛ₁

• Rounding factor xɛ₁/n for W
• Larger ɛ₁: faster with rough estimation
• Smaller ɛ₁: slower with accurate estimation
• Adapt ɛ₁ according to U and L

Page 35: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

U/L Related Scale and Round

[Figure: buffer cost axis starting at 0, with the rounding grid xɛ/n tied to U/L.]

Page 36: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Conceptually

Begin with large ɛ₁ and progressively reduce it (towards ɛ) according to U/L as x approaches W*

Fix ɛ₂=ɛ in rounding Q for limiting timing violation

• Set ɛ₁ as a geometric sequence of …, 8, 4, 2, 1, 1/2, …, ɛ
• One run of DP takes about O(n/ɛ₁) time. Total runtime is bounded by the last run as O(… + n/8 + n/4 + n/2 + … + n/ɛ) = O(n/ɛ), independent of # iterations
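The geometric-sum argument can be checked numerically. The Python sketch below uses hypothetical names (`eps1_schedule`, `total_work`), an assumed starting value of 8, and the simplified per-run cost n/ɛ₁ quoted on the slide.

```python
from typing import List


def eps1_schedule(eps: float, start: float = 8.0) -> List[float]:
    """Halve eps1 from `start` while it stays above eps, then finish at eps."""
    values = []
    e = start
    while e > eps:
        values.append(e)
        e /= 2
    values.append(eps)
    return values


def total_work(n: int, eps: float) -> float:
    """Sum of the per-run costs n/eps1 over the whole schedule."""
    return sum(n / e for e in eps1_schedule(eps))
```

Because the terms before the last one form a geometric series that sums to less than 2n/ɛ, the total stays below 3n/ɛ no matter how many runs the schedule contains.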

Page 37: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Oracle Query Till U/L<2

With bounds L_i ≤ W* ≤ U_i at iteration i:

\[ \epsilon'_i = \sqrt{U_i/L_i} - 1, \qquad x_i = \sqrt{\frac{L_i\,U_i}{1+\epsilon'_i}} \]

\[ \frac{U_{i+1}}{L_{i+1}} = \left(\frac{U_i}{L_i}\right)^{3/4} \]

[Equation: total runtime of the oracle queries over iterations i = 0, …, t, bounded through a geometric sum driven by the 3/4 exponent.]

Page 38: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Mathematically

Page 39: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

The Algorithmic Flow

[Flowchart: Set U and L of W* → Adapting ɛ₁ = (U/L)^(1/2) − 1 → Set x = [UL/(1+ɛ₁)]^(1/2) → Oracle (x) → Update U or L → if U/L<2, compute final solution; otherwise adapt ɛ₁ again.]
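The loop in this flow can be sketched in Python. Several things here are assumptions: ɛ₁ is read as (U/L)^(1/2) − 1, which is the choice that shrinks U/L to its 3/4 power on every query; the names `search_bounds` and `make_toy_oracle` are hypothetical; and the toy oracle simply compares against a known W*, standing in for the DP-based Oracle(x) of the talk.

```python
import math
from typing import Callable, Tuple


def search_bounds(lower: float, upper: float,
                  oracle: Callable[[float, float], bool]) -> Tuple[float, float]:
    """Shrink [lower, upper] around W* until upper/lower < 2."""
    while upper / lower >= 2:
        eps1 = math.sqrt(upper / lower) - 1        # coarse while bounds are far apart
        x = math.sqrt(lower * upper / (1 + eps1))  # guess for this query
        if oracle(x, eps1):
            upper = (1 + eps1) * x  # oracle reports W* <= (1 + eps1) * x
        else:
            lower = x               # oracle reports W* > x
    return lower, upper


def make_toy_oracle(w_star: float) -> Callable[[float, float], bool]:
    """Stand-in oracle: an exact test against a known optimum (illustration only)."""
    return lambda x, eps1: w_star <= x
```

Either branch leaves the new ratio equal to (U/L)^(3/4), so the number of queries depends only on the initial ratio, and the early queries use a large ɛ₁ and are therefore cheap.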

Page 40: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

When U/L<2

[Flowchart: Scale and round each cost by Lɛ/n → Run DP with W=2n/ɛ → Pick min cost solution satisfying timing at driver.]

• At least one feasible solution, otherwise no solution with cost 2n/ɛ · Lɛ/n = 2L > U
• A single DP runtime

Page 41: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Main Theorem

Theorem: a (1+ɛ) approximation to the timing constrained minimum cost buffering problem can be computed in O(m²n²b/ɛ³ + n³b²/ɛ) time for 0<ɛ<1 and in O(m²n²b/ɛ + mn²b + n³b) time for ɛ≥1

Page 42: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Experiments

Experimental Setup
– 1000 industrial nets
– 48 buffer types including non-inverting buffers and inverting buffers

Compared to Dynamic Programming

Page 43: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Cost Ratio Compared to DP

[Figure: buffer cost ratio of FPTAS relative to DP versus approximation ratio ɛ.]

Page 44: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Speedup Compared to DP

[Figure: speedup of FPTAS over DP versus approximation ratio ɛ (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5).]

Page 45: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Timing Violations (% nets)

[Figure: % nets with timing violations for FPTAS versus approximation ratio ɛ (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5).]

Page 46: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Cost Ratio w/ Timing Recovery

[Figure: buffer cost ratio versus approximation ratio ɛ (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5) for FPTAS and FPTAS w/ Recovery.]

Page 47: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Speedup w/ Timing Recovery

[Figure: speedup versus approximation ratio ɛ (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5) for FPTAS and FPTAS w/ Recovery.]

Page 48: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Observations

Without timing recovery
– FPTAS always achieves the theoretical guarantee
– Larger ɛ leads to more speedup
– On average about 5x faster than dynamic programming
– Can run 4.6x faster with 0.57% solution degradation
– <5% nets with timing violations

With timing recovery
– FPTAS well approximates the optimal solutions
– Can still have >4x speedup

Page 49: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

[Figure: "Our Bridge" spanning the gap between "NP-Hardness Complexity" and "Exponential Time Algorithm".]

Page 50: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Conclusion

Propose a (1+ɛ) approximation for timing constrained minimum cost buffering for any ɛ > 0
– Runs in O(m²n²b/ɛ³ + n³b²/ɛ) time
– Timing-cost approximate dynamic programming
– Double-ɛ geometric sequence based oracle search
– 5x speedup in experiments
– Few percent additional buffers as guaranteed theoretically

The first provably good approximation algorithm on this problem

Page 51: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Summary on Buffer Insertion and Layer Assignment

[Figure: delay (psec) versus technology generation (µm), comparing transistor/gate delay with interconnect delay.]

Source: Gordon Moore, Chairman Emeritus, Intel Corp.

This is why Moore’s law does not hold anymore.

Page 52: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Interconnect Delay Scaling

Scaling factor s=0.7 per generation

Elmore delay of a wire of length l:
τ_int = (rl)(cl)/2 = rcl²/2 (first order)

Local interconnects: τ_int: (r/s²)(c)(ls)²/2 = rcl²/2
– Local interconnect delay roughly unchanged

Global interconnects: τ_int: (r/s²)(c)(l)²/2 = (rcl²)/2s²
– Global interconnect delay doubles, which is unsustainable

Interconnect delay increasingly more dominant
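The two scaling cases can be verified with a few lines of Python. The function names `elmore_delay` and `scaled_delay` are hypothetical, and the unit values used for r, c and l are arbitrary.

```python
def elmore_delay(r: float, c: float, length: float) -> float:
    """First-order Elmore delay r*c*l^2/2 of a uniform wire."""
    return r * c * length ** 2 / 2


def scaled_delay(r: float, c: float, length: float, s: float, local: bool) -> float:
    """Delay after one generation: resistance per unit length grows to r/s^2.

    Local wires shrink with the technology (l -> l*s); global wires keep
    their length.
    """
    new_length = length * s if local else length
    return elmore_delay(r / s ** 2, c, new_length)
```

With s = 0.7 the global-wire delay grows by 1/s² ≈ 2.04 per generation, which is the "doubles" on the slide, while the local-wire delay is unchanged.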

Page 53: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Interconnect Optimization

Page 54: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Analogy

Advancing technology = period of city expansion
More transistors = larger city
Buffers = gas stations
Interconnects = streets
– Lower layer = local street
– Higher layer = highways
Signal delay (timing) = time to cross the city
Highway is fast but its power has not been well explored
– Traditional wire sizing = make lane wider
– Layer assignment = highway overpasses

Page 56: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

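Detailed Analysis

The delay of a wire of length L is T = rcL²/2

Assume N identical buffers with equal inter-buffer length l (L = Nl). To minimize delay:

\[ T = N\left[R_d(C_g + cl) + rl\left(C_g + \frac{cl}{2}\right)\right] = L\left[\frac{rcl}{2} + (rC_g + R_d c) + \frac{R_d C_g}{l}\right] \]

\[ \frac{dT}{dl} = 0 \;\Rightarrow\; L\left[\frac{rc}{2} - \frac{R_d C_g}{l_{opt}^2}\right] = 0 \;\Rightarrow\; l_{opt} = \sqrt{\frac{2R_dC_g}{rc}} \]

r, c – Resistance, cap. per unit length
R_d – On resistance of inverter
C_g – Gate input capacitance

[Figure: wire of total length L split by buffers into segments of length l.]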

Page 57: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Quadratic Delay -> Linear Delay

Substituting l_opt back into the interconnect delay expression:

\[ T_{opt} = L\left[\frac{rc\,l_{opt}}{2} + (rC_g + R_d c) + \frac{R_d C_g}{l_{opt}}\right], \qquad l_{opt} = \sqrt{\frac{2R_dC_g}{rc}} \]

\[ T_{opt} = L\left(\sqrt{2R_dC_g\,rc} + rC_g + R_d c\right) \]

Delay grows linearly with L instead of quadratically
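The closed forms for the optimal spacing and the resulting delay can be checked numerically. This Python sketch uses hypothetical function names, treats the number of buffers N = L/l as continuous, and uses arbitrary parameter values in the example.

```python
import math


def buffered_delay(L: float, l: float, r: float, c: float, Rd: float, Cg: float) -> float:
    """Delay of a length-L wire buffered every l units."""
    return L * (r * c * l / 2 + (r * Cg + Rd * c) + Rd * Cg / l)


def optimal_spacing(r: float, c: float, Rd: float, Cg: float) -> float:
    """Spacing that zeroes dT/dl: sqrt(2*Rd*Cg / (r*c))."""
    return math.sqrt(2 * Rd * Cg / (r * c))


def optimal_delay(L: float, r: float, c: float, Rd: float, Cg: float) -> float:
    """Delay at the optimal spacing, linear in L."""
    return L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)
```

With r = 0.1, c = 0.2, R_d = 100 and C_g = 1, the optimal spacing is 100 length units; evaluating the delay at that spacing matches the closed form, and doubling L doubles the optimal delay.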

Page 58: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

25% Gates are Buffers

[Figure: % cells that are buffers (clocked, unclocked, total) versus technology node (90nm, 65nm, 45nm, 32nm).]

Saxena, et al. [TCAD 2004]

Page 64: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Solution Pruning

Needs solution pruning for acceleration

Two candidate solutions
– (v, c1, q1, w1)
– (v, c2, q2, w2)

Solution 1 is inferior to Solution 2 if
– c1 ≥ c2: larger load
– and q1 ≤ q2: tighter timing
– and w1 ≥ w2: larger cost

Page 65: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Car Race - Speed

[Figure: cars racing toward a finish line marked END.]

Car Speed <=> RAT

Page 66: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Car Race - Load

Load <=> Load Capacitance

Page 67: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Faster & Smaller Load

Faster & smaller load (larger RAT, smaller capacitance): Good

Slower & larger load (smaller RAT, larger capacitance): Inferior

Page 68: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Faster & Larger Load: Result 1

Page 69: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Faster & Larger Load: Result 2

Who will be the winner? Cannot tell at this moment, so keep both of them.

Page 82: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Extension For Layer Assignment

Theorem: a (1+ɛ) approximation to the timing constrained minimum cost layer assignment problem can be computed in O(mn²/ɛ) time for any ɛ>0.

Oracle Lemma: given a tree with n wire segments and m layers, the optimal layer assignment subject to cost budget W=n/ɛ can be computed in O(mnW)=O(mn²/ɛ) time.

Page 83: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Conclusion

A (1+ɛ) approximation for timing constrained minimum cost buffering for any ɛ > 0 (DAC’09)
– Runs in O(m²n²b/ɛ³ + n³b²/ɛ) time
– Timing-cost approximate dynamic programming
– Double-ɛ geometric sequence based oracle search
– 5x speedup in experiments
– Few percent additional buffers as guaranteed theoretically

The first provably good approximation algorithm on this problem

A similar algorithm for layer assignment problem (ICCAD’08)

Page 84: A Fully Polynomial Time Approximation Scheme for Timing Driven Minimum Cost Buffer Insertion

Thanks