The Associative-Skew Clock Routing Problem

In memory of Mr. Patrick Catapano, Jr. of Motorola Corporation and his contributions to modern chip implementation methodology

Yu Chen (UCLA)

Andrew B. Kahng (UCLA)

Gang Qu (UCLA)

Alexander Zelikovsky (Georgia State)

Supported in part by grants from Cadence Design Systems, Inc. and the MARCO (SRC/DARPA) Gigascale Silicon Research Center, and by a GSU research initiation grant.

Introduction
• Zero-Skew clock routing
• Associative-Skew Problem
• Example: potential gain
• 3 composing methods:
  – optimal RSMT
  – optimal joining
  – optimal merging
• improves over GDME
• Testbed: random sink sets
• Experiments
• Conclusion

Model
• Set of sinks S = {s1, …, sn} and source s0
• Sink delay = t(s0, si)
• Skew between two sinks: skew(si, sj) = |t(s0, si) − t(s0, sj)|
• Skew of a tree T = maximum skew between sinks
• Objective: minimize wirelength, such that skews between certain (all) pairs of sinks are small
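The skew definitions above can be sketched in a few lines of Python. This is only an illustration: the delay values are hypothetical, and the names `pair_skew` and `tree_skew` are not from the slides.

```python
from itertools import combinations

def pair_skew(delay, si, sj):
    """skew(si, sj) = |t(s0, si) - t(s0, sj)|."""
    return abs(delay[si] - delay[sj])

def tree_skew(delay):
    """Skew of a tree: the maximum skew over all pairs of sinks."""
    return max(
        (pair_skew(delay, a, b) for a, b in combinations(delay, 2)),
        default=0.0,
    )

# Hypothetical source-to-sink delays t(s0, si)
delay = {"s1": 4.0, "s2": 4.0, "s3": 5.5}
print(pair_skew(delay, "s1", "s2"), tree_skew(delay))
```

Because the maximum pairwise skew equals the largest delay minus the smallest, `max(delay.values()) - min(delay.values())` would give the same result for a non-empty set of sinks.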

Relevant Clock Routing
• Zero skew over ALL sinks
  – Zero-Skew Tree (ZST) literature
  – Deferred-Merge Embedding (DME)
    • optimal with the given topology
    • Greedy-DME: finds high-quality topology
• Almost zero skew over ALL sinks
  – Bounded-Skew Tree (BST) literature
  – solvable with DME-like approaches
• Zero skew over SUBSETS of sinks
  – zero skew inside each subset
  – no skew constraints between subsets
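To make the zero-skew merge at the heart of DME-style methods concrete, here is a minimal sketch. It assumes a linear (path-length) delay model, which the slides do not specify; the function name `zero_skew_split` and its parameters are illustrative.

```python
def zero_skew_split(t_a, t_b, dist):
    """Return (e_a, e_b): wire lengths from the merge point to subtrees A and B.

    Assumption: linear delay, i.e. the delay of a wire equals its length.
    t_a, t_b: root-to-sink delays of the two zero-skew subtrees.
    dist: Manhattan distance between the two subtree roots.
    The result satisfies t_a + e_a == t_b + e_b.
    """
    if abs(t_a - t_b) <= dist:
        # Balance point lies on a shortest connection: e_a + e_b == dist.
        e_a = (dist + t_b - t_a) / 2.0
        return e_a, dist - e_a
    # One subtree is too slow to balance on a shortest connection:
    # merge at its root and lengthen (detour) the other wire.
    if t_a > t_b:
        return 0.0, float(t_a - t_b)
    return float(t_b - t_a), 0.0

print(zero_skew_split(3, 1, 6))   # balanced split
print(zero_skew_split(10, 1, 4))  # detour case
```

In the detour case the wire to the faster subtree is longer than the distance between the roots, which is the price paid in wirelength for forcing zero skew.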

Associative-Skew Problem
• Given: Set of sinks S partitioned into disjoint subsets S = S1 ∪ … ∪ Sk
• Construct: a Steiner tree connecting all sinks with a root s0 to achieve
  – zero skew within each subset Si
  – unconstrained skew between subsets
• Real-world: bounded global skew
  – skew in subsets < 1 gate delay
    • latch-to-latch “0-level” datapaths
    • 0-level paths quite small
  – skew between subsets < 12 gate delays
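The associative-skew constraint can be checked mechanically once sink delays are known. The sketch below is an illustration only: the function name, the tolerance parameter, and the sample delays are hypothetical.

```python
def satisfies_associative_skew(delay, groups, tol=1e-9):
    """True if every subset Si has (near-)zero internal skew.

    delay: dict mapping sink name -> delay from the root.
    groups: iterable of sink subsets S1, ..., Sk.
    Skew between different subsets is deliberately not checked.
    """
    for group in groups:
        d = [delay[s] for s in group]
        if d and max(d) - min(d) > tol:
            return False
    return True

# Hypothetical delays: two subsets, each internally balanced,
# but with a global skew of 4 between them.
delay = {"a": 3.0, "b": 3.0, "c": 7.0, "d": 7.0}
print(satisfies_associative_skew(delay, [["a", "b"], ["c", "d"]]))
print(satisfies_associative_skew(delay, [["a", "c"], ["b", "d"]]))
```

The same delays pass or fail depending only on how the sinks are grouped, which is what distinguishes this problem from zero skew over all sinks.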

Potential Wirelength Gain
• Potential gain is logarithmic:
  – each subset has a single sink, |Si| = 1; all sinks on a segment
  – standard Steiner minimum tree may be log k times shorter than zero-skew tree

[Figure: example with all sinks on a segment]

Heuristic H0

[Figure: subtrees A and B joined by H0 (left) and by H1 (right)]

• All our heuristics combine GDME solutions for each sink subset into a single tree with a common root
• H0 = k-Greedy DME:
  – Greedy DME for each subset Si
  – join the roots with any rectilinear Steiner minimum tree heuristic
• Can do better: connect root of B to any point in A
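The root-joining step of H0 can be sketched as follows. The slides allow any rectilinear Steiner minimum tree heuristic; this sketch swaps in a plain rectilinear minimum spanning tree (Prim's algorithm with Manhattan distances) as the simplest stand-in, and assumes the subtree roots are already known. All names and coordinates are illustrative.

```python
def manhattan(p, q):
    """Rectilinear (L1) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def rectilinear_mst(points):
    """Prim's algorithm on the complete graph with Manhattan edge costs.

    Returns (total_length, edges), where edges are pairs of point indices.
    """
    n = len(points)
    if n == 0:
        return 0, []
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0
    total, edges = 0, []
    for _ in range(n):
        # Pick the cheapest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = manhattan(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return total, edges

# Hypothetical roots of four per-subset zero-skew trees
roots = [(0, 0), (4, 0), (0, 3), (4, 3)]
length, edges = rectilinear_mst(roots)
print(length, edges)
```

Joining whole subtrees at their roots adds the same delay to every sink of a subset, so the zero skew inside each subset is preserved.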

Heuristic H1
• H1 = H0, but a root may be connected to any internal point of another tree
  – For each ordered pair of ZSTs A = ZST(Si), B = ZST(Sj), find the optimal joining = shortest edge from a point in A to root(B)
  – Full directed graph G: nodes = ZST(Si), arcs = optimal joinings
  – Find optimal branching (directed MST) in G
    • O(k^2): Tarjan (1977), Camerini-Fratta-Maffioli (1979)

[Figure: directed graph G with optimal branching]
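The "optimal joining" arc cost of H1 can be sketched as below. The sketch assumes each embedded tree is given as a list of axis-parallel wire segments (a representation not stated in the slides), and it only computes arc costs; the optimal-branching step is not shown. All names are illustrative.

```python
def dist_to_segment(p, seg):
    """Manhattan distance from point p to an axis-parallel segment.

    Returns (distance, closest_point). Only valid for horizontal or
    vertical segments, where clamping p to the segment's extent is exact.
    """
    (x1, y1), (x2, y2) = seg
    cx = min(max(p[0], min(x1, x2)), max(x1, x2))
    cy = min(max(p[1], min(y1, y2)), max(y1, y2))
    return abs(p[0] - cx) + abs(p[1] - cy), (cx, cy)

def joining_cost(tree_segments, root_b):
    """Shortest edge from any point of tree A to root(B)."""
    return min(dist_to_segment(root_b, seg) for seg in tree_segments)

def joining_arcs(trees, roots):
    """Arc costs of the full directed graph G: (i, j) -> cost of joining root j to tree i."""
    return {
        (i, j): joining_cost(trees[i], roots[j])[0]
        for i in range(len(trees))
        for j in range(len(trees))
        if i != j
    }

# Hypothetical tree A: a horizontal trunk with one vertical branch
tree_a = [((0, 0), (4, 0)), ((2, 0), (2, 3))]
print(joining_cost(tree_a, (3, 2)))
```

Attaching root(B) to an interior point of A instead of root(A) can only shorten the connecting edge, which is where H1 gains over H0.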

Better Merging
• Given: two ZSTs A and B and insertion delay offset w = skew between sinks in A and B from root(A)
• Find: min-cost tree T(A,B,w)
  – rooted at root(A)
  – containing A
  – offset w between path delays of sinks in A and sinks in B

[Figure: trees A and B before and after merging]

• Potential gain: up to the total wirelength of B
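A rough sketch of the offset constraint when root(B) is attached to a point p of A. It rests on two assumptions not stated in the slides: a linear (path-length) delay model, and the sign convention w = (delay of sinks in B) − (delay of sinks in A), both measured from root(A). The function and parameter names are hypothetical.

```python
def offset_join_length(t_a, d_p, t_b, w, dist):
    """Wire length needed from point p in A to root(B), or None if infeasible.

    Assumptions (not from the slides): linear delay, and
    w = delay(sinks of B) - delay(sinks of A), measured from root(A).
    t_a:  delay from root(A) to the sinks of A
    d_p:  path length from root(A) to the attachment point p
    t_b:  delay from root(B) to the sinks of B
    dist: Manhattan distance between p and root(B)
    """
    # Sinks of B see d_p + length + t_b; this must equal t_a + w.
    need = t_a + w - d_p - t_b
    # A wire can be lengthened by detouring, but never made shorter than dist.
    return need if need >= dist else None

print(offset_join_length(10, 4, 3, 0, 3))   # exact fit, no detour
print(offset_join_length(10, 4, 3, 2, 3))   # two units of detour
print(offset_join_length(10, 4, 3, -1, 3))  # p cannot realize this offset
```

Under these assumptions, a larger offset w makes more attachment points feasible, since any surplus length can be absorbed by a detour.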

Optimal Slice Merging Algorithm

[Figure: tree T(A,B,w) showing r(A), A, B, r(B), node v, points u and u`, and sinks s, s1, s2]

• For each v in B
  – Find merge(v) = merging cost of v to A with offset w
  – Find adj(v) = adjusted cost of root(B)-v-path
  – gain(v) = adj(v) − merge(v)
• Sum of gains over slice = gain of slice
• Find optimal slice by preorder/postorder traversal

Figure legend:
  merge(v) = cost of thick edge
  adj(v) = cost of dashed edges divided by 2L, L = edge level in B
  slice = roots of subtrees of B joined to points of A
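The slice selection can be sketched as a postorder dynamic program. This is a sketch under stated assumptions, not the authors' implementation: it takes gain(v) as precomputed input, and it assumes a slice is a set of nodes of B such that every sink of B lies below exactly one chosen node. The `Node` class and `best_slice` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    name: str
    gain: float                      # gain(v) = adj(v) - merge(v), given as input
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def best_slice(v: Node) -> Tuple[float, List[str]]:
    """Postorder traversal: return (gain, node names) of the best slice below v.

    At each node, either v itself is the slice for its subtree,
    or the slice is the union of the best slices of its children.
    """
    children = [c for c in (v.left, v.right) if c is not None]
    if not children:
        return v.gain, [v.name]
    total, names = 0.0, []
    for c in children:
        g, s = best_slice(c)
        total += g
        names += s
    if v.gain >= total:
        return v.gain, [v.name]
    return total, names

# Hypothetical gains on a small tree B
u = Node("u", 3.0, Node("s1", 1.0), Node("s2", 1.5))
root_b = Node("r(B)", 1.0, u, Node("s", 0.5))
print(best_slice(root_b))
```

Choosing only root(B) as the slice corresponds to joining B as a whole; deeper slices trade the cost of the root(B)-v paths against the cost of merging each subtree separately.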

Wirelength Gain over GDME

[Figure: two panels, “Greedy DME with offset” and “Optimal Slice Merging”, each with root (0,0) and sinks at (±2,±2) and (±1,±1)]

Gain for this instance: 28%

Testbed: Random Sink Sets

Class 1: “Shift” Class 2: “Ring”

Experimental Results (“Shift”)

n    K  Shift  H0      H1      H2      GDME+O  Time1  Time2
125  2  0      28.90   28.81   24.05   19.80   1.8    2.5
250  2  0      40.13   40.09   34.59   28.19   9.9    22.0
500  2  0      56.69   56.65   48.92   40.61   69.5   189.8
125  2  0.25   28.92   28.32   26.50   22.50   1.8    2.6
250  2  0.25   39.76   39.25   38.15   33.15   10.2   20.6
500  2  0.25   57.06   56.61   55.11   46.47   71.4   198.1
125  4  0      59.85   59.80   45.26   28.03   7.0    24.6
250  4  0      80.57   80.52   62.38   40.97   34.7   183.1
500  4  0      113.0   112.98  92.69   57.84   213.3  1505.1
125  4  0.25   59.47   58.83   54.44   43.30   6.7    25.4
250  4  0.25   83.18   81.74   78.19   60.46   35.9   196.2
500  4  0.25   128.20  127.45  110.13  86.46   213.5  1602.5

Experimental data for n-point sets randomly generated in K shifted square regions. Time1 refers to H2 runtimes; Time2 refers to GDME runtimes.

• Runtime improvement (Time1 vs. Time2), since GDME runtime grows nonlinearly with the number of sinks
• H2 is losing to GDME with offsets (GDME+O) on wirelength
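The runtime observation for the “Shift” testbed can be checked directly from the Time1 (H2) and Time2 (GDME) columns. The short script below only copies those two columns from the table and computes their ratio; the function name `speedups` is illustrative.

```python
# (n, K, shift, Time1 = H2 runtime, Time2 = GDME runtime), copied from the "Shift" table
shift_times = [
    (125, 2, 0.0, 1.8, 2.5), (250, 2, 0.0, 9.9, 22.0), (500, 2, 0.0, 69.5, 189.8),
    (125, 2, 0.25, 1.8, 2.6), (250, 2, 0.25, 10.2, 20.6), (500, 2, 0.25, 71.4, 198.1),
    (125, 4, 0.0, 7.0, 24.6), (250, 4, 0.0, 34.7, 183.1), (500, 4, 0.0, 213.3, 1505.1),
    (125, 4, 0.25, 6.7, 25.4), (250, 4, 0.25, 35.9, 196.2), (500, 4, 0.25, 213.5, 1602.5),
]

def speedups(rows):
    """Map (n, K, shift) -> Time2 / Time1, i.e. how many times faster H2 ran."""
    return {(n, k, s): t2 / t1 for n, k, s, t1, t2 in rows}

for key, ratio in speedups(shift_times).items():
    print(key, round(ratio, 2))
```

In every row H2 is faster than GDME, and within each (K, Shift) group the ratio grows with n, which is consistent with GDME's runtime growing faster than linearly in the number of sinks.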

Experimental Results (“Ring”)

n    K  H0       H1       H2      GDME+O  Time1  Time2
125  2  48.04    47.63    45.75   49.26   2.0    2.8
250  2  84.81    84.72    82.22   84.88   2.8    23.6
500  2  123.15   123.12   119.07  123.28  11.4   191.2
125  4  156.77   154.31   123.11  147.67  6.4    23.9
250  4  204.24   202.19   168.46  195.34  42.6   207.1
500  4  270.81   268.23   236.38  247.61  214.9  1558.4
125  2  125.63   124.49   121.79  126.04  1.9    3.1
250  2  169.46   169.01   165.30  167.24  9.8    26.3
500  2  239.87   239.45   235.89  240.65  68.1   214.9
125  4  549.23   548.52   489.82  540.80  7.1    20.0
250  4  748.75   748.29   686.35  741.23  36.0   163.4
500  4  1018.76  1018.19  955.31  998.08  211.1  1300.6

• Runtime improvement (Time1 vs. Time2), since GDME runtime grows nonlinearly with the number of sinks
• Average wirelength improvement of H2 over GDME with offsets (GDME+O): 7%
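The average wirelength improvement on the “Ring” testbed can be recomputed from the H2 and GDME+O columns. The sketch below assumes the improvement is measured as the mean of GDME+O / H2 − 1 over the twelve rows; the slides do not state the exact formula, so this is one plausible reading.

```python
# (H2, GDME+O) wirelengths copied from the "Ring" table, in row order
ring = [
    (45.75, 49.26), (82.22, 84.88), (119.07, 123.28),
    (123.11, 147.67), (168.46, 195.34), (236.38, 247.61),
    (121.79, 126.04), (165.30, 167.24), (235.89, 240.65),
    (489.82, 540.80), (686.35, 741.23), (955.31, 998.08),
]

def mean_relative_excess(rows):
    """Average of GDME+O / H2 - 1: how much longer GDME+O is than H2."""
    return sum(gdme / h2 - 1.0 for h2, gdme in rows) / len(rows)

avg = mean_relative_excess(ring)
print(f"GDME+O is on average {100 * avg:.1f}% longer than H2")
```

Under this reading the average comes out at roughly 7%, with the largest gains in the first block of K = 4 rows.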

Conclusions & Future Work
• We introduced the Associative-Skew Problem
• We suggested a series of heuristics
  – outperform GDME for separated sink subsets
  – GDME is good otherwise
• Open issues:
  – find a consistently better heuristic
  – solve the ASP with limited-height (sub)trees
