Local Unidirectional Bias for Smooth Cutsize-delay Tradeoff in Performance-driven Partitioning
Andrew B. Kahng and Xu Xu, UCSD CSE and ECE Depts.
Work supported in part by MARCO GSRC



Page 2

Outline

• Motivation
• Performance-driven bipartition problem
• New bipartitioning algorithm
• Experimental results
• Conclusion and future work

Page 3

Partitioning and Performance

The hypergraph partitioning problem is to divide the nodes of a hypergraph into roughly equal parts; the traditional objective is to minimize cutsize.

In performance-driven partitioning, we also seek to minimize path delay on timing paths.

Page 4

Previous Work (I)

• [Cong et al. ISPD-2002]
– Global clustering-based algorithm with retiming
– Reduces delay by 16% while increasing cutsize by 17%
– Requires substantial gate replication

[Figure: algorithm flow with stages "Min-delay clustering w/ retiming", "De-clustering and refinement", and "Min-cutsize clustering"]

Page 5

Previous Work (II)

• [Ababei et al. ICCAD-2002]
– Reweighting-based method
– 14% reduction of delay with 10% increase in cutsize
– 139% increase in runtime compared with hMetis

[Figure: flow from Input through global timing analysis, finding critical paths, and net-based or path-based reweighting, to a cutsize-oriented partitioner such as hMetis or MLPart]

Page 6

Motivating Questions

• Can we avoid global timing analysis?
– Global timing analysis is extremely time-consuming
• Can we improve path delay without significantly degrading cutsize?
– Need a smooth tradeoff between delay and cutsize
• Can we reduce implementation overhead?
– Previous methods store thousands of critical paths and continuously update them

Page 7

Outline

• Motivation
• Performance-driven bipartition problem
• New bipartitioning algorithm
• Experimental results
• Conclusion and future work

Page 8

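Delay Model

Path delay = sum of hop_delay over all hops (cut crossings) on the path + sum of node_delay over all nodes on the path

[Figure: a timing path between FF nodes through combinational nodes in Part 0 and Part 1; each crossing of the cut is a hop]

• [Cong et al. ISPD-2002]: hop_delay = 5, node_delay = 1; a path with 3 hops and 5 nodes has Delay = 3×5 + 5×1 = 20
• [Ababei et al. ICCAD-2002]: hop_delay = Elmore delay, node_delay = constant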
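The uniform delay model can be made concrete with a small sketch. It is a hypothetical illustration, not the authors' code: it assumes a path is a list of node names, that a hop is any consecutive pair of nodes in different parts, and that every node on the path (FF endpoints included) contributes one node_delay; the defaults are the Cong et al. values.

```python
from typing import Dict, List


def hop_count(path: List[str], part: Dict[str, int]) -> int:
    """Count consecutive node pairs on the path that lie in different parts (cut crossings)."""
    return sum(1 for u, v in zip(path, path[1:]) if part[u] != part[v])


def path_delay(path: List[str], part: Dict[str, int],
               hop_delay: float = 5.0, node_delay: float = 1.0) -> float:
    """Uniform hop/node delay model; defaults are the Cong et al. values (5 and 1)."""
    return hop_count(path, part) * hop_delay + len(path) * node_delay


# Hypothetical 5-node path that crosses the cut 3 times.
path = ["n1", "n2", "n3", "n4", "n5"]
part = {"n1": 0, "n2": 1, "n3": 1, "n4": 0, "n5": 1}
print(hop_count(path, part), path_delay(path, part))  # 3 20.0
```

For the Ababei et al. model, hop_delay would vary per hop (Elmore delay), so the product would become a per-hop sum.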

Page 9

Performance Driven Bipartition Problem

Given:
• Hypergraph H = (V, E)
• Area balance tolerance s (0 < s < 1), a parameter that controls the allowable slack in the area constraint
• $\alpha$, a given parameter that captures the tradeoff between cutsize and path delay (hopcount)

Find: A bipartition (V0|V1) that satisfies the area balance constraint set by s and minimizes

$$\alpha \cdot \text{cutsize} + (1 - \alpha) \cdot \text{Max\_hopcount}$$
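A sketch of how a candidate bipartition could be scored against this formulation. The slide does not spell out the area constraint, so the form used here, (1 - s)/2 · A ≤ A0 ≤ (1 + s)/2 · A with A and A0 as in the Notation slide, is an assumption; Max_hopcount is passed in rather than computed.

```python
from typing import Dict, List


def cutsize(nets: List[List[str]], part: Dict[str, int]) -> int:
    """A net is cut when its pins do not all lie in the same part."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)


def is_balanced(area: Dict[str, float], part: Dict[str, int], s: float) -> bool:
    """ASSUMED constraint form: (1 - s)/2 * A <= A0 <= (1 + s)/2 * A."""
    total = sum(area.values())
    a0 = sum(a for v, a in area.items() if part[v] == 0)
    return (1 - s) / 2 * total <= a0 <= (1 + s) / 2 * total


def objective(cut: int, max_hopcount: int, alpha: float) -> float:
    """alpha * cutsize + (1 - alpha) * Max_hopcount."""
    return alpha * cut + (1 - alpha) * max_hopcount


nets = [["a", "b"], ["b", "c"], ["c", "d"]]
part = {"a": 0, "b": 0, "c": 1, "d": 1}
area = {v: 1.0 for v in part}
cut = cutsize(nets, part)
print(cut, is_balanced(area, part, s=0.1), objective(cut, max_hopcount=2, alpha=0.5))
# 1 True 1.5
```

Larger alpha weights cutsize more heavily; alpha = 1 recovers the traditional min-cut objective.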

Page 10

Outline

• Motivation
• Performance-driven bipartition problem
• New bipartitioning algorithm
• Experimental results
• Conclusion and future work

Page 11

Unidirectional Partition

• If the partition is unidirectional ("acyclic"), i.e., all cut edges point in the same direction, every path crosses the cut at most once, so path delay is minimized with hopcount = 1
• Problems:
– High cutsize
– A unidirectional solution may not exist
• Can we achieve a "locally unidirectional" partition?

[Figure: example partitions into Part 0 and Part 1, one with max hopcount = 5 and one with max hopcount = 3]
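A sketch of how the max hopcount and unidirectionality of a given bipartition could be checked. It assumes the timing graph is a directed acyclic graph of (driver, sink) edges (e.g., combinational logic with FFs as path endpoints) and uses Python's standard `graphlib` module (3.9+), which raises `CycleError` on a cycle. Each node's hopcount is the largest number of cut crossings on any path ending there, computed in topological order.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (driver, sink)


def max_hopcount(edges: List[Edge], part: Dict[str, int]) -> int:
    """Largest number of cut crossings over all paths in a DAG."""
    preds: Dict[str, List[str]] = {v: [] for v in part}
    for u, v in edges:
        preds[v].append(u)
    hops: Dict[str, int] = {}
    # static_order() yields every node after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        hops[v] = max((hops[u] + (part[u] != part[v]) for u in preds[v]), default=0)
    return max(hops.values(), default=0)


def is_unidirectional(edges: List[Edge], part: Dict[str, int]) -> bool:
    """True when every cut edge points the same way (all 0->1 or all 1->0)."""
    directions = {(part[u], part[v]) for u, v in edges if part[u] != part[v]}
    return len(directions) <= 1


edges = [("a", "b"), ("b", "c"), ("c", "d")]
zigzag = {"a": 0, "b": 1, "c": 0, "d": 1}
one_way = {"a": 0, "b": 0, "c": 1, "d": 1}
print(max_hopcount(edges, zigzag), is_unidirectional(edges, zigzag))    # 3 False
print(max_hopcount(edges, one_way), is_unidirectional(edges, one_way))  # 1 True
```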

Page 12

V-Shaped Nodes

A combinational node v is a V-shaped node if there exist nodes vj and vt in the other part and a path from vj to vt whose only intermediate node is v (i.e., vj → v → vt).

[Figure: path vj → v → vt, with v on the opposite side of the cut from vj and vt]
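A sketch of V-shaped node detection that follows the definition directly: v qualifies when it has a fanin from the other part and a fanout to the other part. It assumes the netlist is given as directed (driver, sink) edges and that the set of combinational nodes is known; the example reuses the a/b/c/d/e labels of the Key Idea slide with an assumed part assignment.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def v_shaped_nodes(edges: List[Tuple[str, str]], part: Dict[str, int],
                   combinational: Set[str]) -> Set[str]:
    """Combinational nodes v with a path vj -> v -> vt where vj and vt are in the other part."""
    preds: Dict[str, List[str]] = defaultdict(list)
    succs: Dict[str, List[str]] = defaultdict(list)
    for u, w in edges:
        succs[u].append(w)
        preds[w].append(u)
    result = set()
    for v in combinational:
        other = 1 - part[v]
        if any(part[u] == other for u in preds[v]) and any(part[w] == other for w in succs[v]):
            result.add(v)
    return result


edges = [("a", "b"), ("d", "b"), ("e", "b"), ("b", "c")]
part = {"a": 0, "c": 0, "b": 1, "d": 1, "e": 1}  # assumed assignment
print(v_shaped_nodes(edges, part, combinational=set(part)))  # {'b'}
```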

Page 13

V-Shaped Nodes in Critical Paths

Empirical observations from a study of partitioning solutions:
• There are V-shaped nodes in the partitioning solutions
• Every V-shaped node is included in many critical paths
• Every critical path contains several V-shaped nodes

For testcase 1:
• Number of nets: 16377
• Number of critical paths: 26772
• On average, one critical path contains 27.6 nodes
• On average, one critical path contains 3.4 V-nodes
• On average, one V-node belongs to 233.7 critical paths

Page 14

Key Idea: V-Shaped Node Elimination

Move V-shaped node "b" to reduce path hopcount.

[Figure: nodes a through f placed in Part 0 and Part 1, before and after moving b]

Before moving b:
PATH: a→b→c hopcount = 2
PATH: d→b→c hopcount = 1
PATH: e→b→c hopcount = 1

After moving b:
PATH: a→b→c hopcount = 0
PATH: d→b→c hopcount = 1
PATH: e→b→c hopcount = 1
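The hopcounts above can be reproduced with a few lines. The part assignment (a and c in Part 0; b, d, e in Part 1) is inferred from the listed hopcounts rather than read from the figure, and a hop is counted for each consecutive pair of nodes in different parts.

```python
from typing import Dict, List


def hop_count(path: List[str], part: Dict[str, int]) -> int:
    """Number of consecutive node pairs on the path that lie in different parts."""
    return sum(1 for u, v in zip(path, path[1:]) if part[u] != part[v])


paths = {"abc": ["a", "b", "c"], "dbc": ["d", "b", "c"], "ebc": ["e", "b", "c"]}
before = {"a": 0, "c": 0, "b": 1, "d": 1, "e": 1}  # assumed assignment
after = dict(before, b=0)  # move the V-shaped node b to Part 0

print({name: hop_count(p, before) for name, p in paths.items()})  # {'abc': 2, 'dbc': 1, 'ebc': 1}
print({name: hop_count(p, after) for name, p in paths.items()})   # {'abc': 0, 'dbc': 1, 'ebc': 1}
```

The move removes two hops from a→b→c while the hop on d→b→c and e→b→c only changes position, so no path gets worse.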

Page 15

Distance-k V-Shaped Node Elimination

k = 2: moving the V2 nodes "b" and "c" reduces path hopcount from 2 to 0

[Figure: nodes a, b, c, d split across Part 0 and Part 1, before and after moving b and c]

Problems with large k:
• Cutsize may be greatly increased
• The delay of one path is reduced while the delay of other paths increases

Page 16

New Gain Function

[Figure: node v before and after the move]

• g(v): traditional FM gain
• rj(v): reduction of Vj nodes after moving v

$$\text{Gain}(v) = \delta(0)\, g(v) + \sum_{j=1}^{k} \delta(j)\, r_j(v)$$
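A brute-force sketch of the combined gain for k = 1, Gain(v) = δ(0)·g(v) + δ(1)·r1(v). Assumptions: nets are (driver, sinks) pairs; g(v) is obtained by re-counting cut nets rather than by FM's incremental bookkeeping; r1(v) is 1 if v stops being V-shaped after the move, -1 if it becomes V-shaped, and 0 otherwise; the defaults δ(0)=1, δ(1)=10 are the V1 settings reported in the experiments.

```python
from typing import Dict, List, Tuple

Net = Tuple[str, List[str]]  # (driver, sinks): assumed representation
Part = Dict[str, int]


def cut_nets(nets: List[Net], part: Part) -> int:
    """Count nets whose driver and sinks do not all lie in the same part."""
    return sum(1 for d, sinks in nets if len({part[d], *(part[s] for s in sinks)}) > 1)


def is_v_shaped(v: str, nets: List[Net], part: Part) -> bool:
    """v has a driver in the other part and a sink in the other part."""
    other = 1 - part[v]
    fanin = any(part[d] == other for d, sinks in nets if v in sinks)
    fanout = any(part[s] == other for d, sinks in nets if d == v for s in sinks)
    return fanin and fanout


def gain(v: str, nets: List[Net], part: Part,
         delta0: float = 1.0, delta1: float = 10.0) -> float:
    moved = {**part, v: 1 - part[v]}
    g = cut_nets(nets, part) - cut_nets(nets, moved)  # traditional FM gain
    r1 = int(is_v_shaped(v, nets, part)) - int(is_v_shaped(v, nets, moved))  # V1 reduction
    return delta0 * g + delta1 * r1


nets = [("a", ["b"]), ("d", ["b"]), ("e", ["b"]), ("b", ["c"])]
part = {"a": 0, "c": 0, "b": 1, "d": 1, "e": 1}
print(gain("b", nets, part))  # 10.0
```

Moving b leaves the cut unchanged (g = 0), so a pure cutsize gain would not favor it; the V1 term is what makes the move attractive.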

Page 17

Distance-k Unidirectional Algorithm

Calculate initial gains for all nodes and store the gains
Select the node v with maximum gain
/* CLIP-like method: move the cluster that v belongs to */
Reset the gains of all nodes to zero
Move v and update the gains of v and its neighbors
While (some node has not been moved)
    Select the node v with the maximum updated gain
    Move v and update the related gains
Find the point in the move sequence at which the sum of gains is maximum; undo all moves after this point
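A compact sketch of one pass of this loop, under stated assumptions rather than as the authors' implementation: `gain_fn` stands for any per-node gain (plain cutsize reduction on two-pin nets here, though the Gain(v) of the New Gain Function slide would slot in), `can_move` is a hypothetical balance check, updated gains accumulate only the changes since the CLIP-like reset, and the actual gains of the moves are summed to find the best prefix.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Part = Dict[str, int]


def clip_like_pass(nodes: List[str], neighbors: Dict[str, Set[str]], part: Part,
                   gain_fn: Callable[[str, Part], int],
                   can_move: Callable[[str, Part], bool]) -> Tuple[Part, int]:
    """One pass: CLIP-like gain reset after the first move, lock each moved node,
    then roll back to the prefix of moves with the largest total gain."""
    part = dict(part)
    locked: Set[str] = set()
    score = {v: 0 for v in nodes}  # updated gains, all reset to zero
    movable = [v for v in nodes if can_move(v, part)]
    # The first move uses the initial (full) gains.
    v: Optional[str] = max(movable, key=lambda u: gain_fn(u, part)) if movable else None
    moves: List[Tuple[str, int]] = []
    while v is not None:
        g = gain_fn(v, part)  # actual gain of this move, kept for the rollback step
        old = {u: gain_fn(u, part) for u in neighbors[v] if u not in locked}
        part[v] = 1 - part[v]
        locked.add(v)
        moves.append((v, g))
        for u, before in old.items():  # accumulate only the change caused by this move
            score[u] += gain_fn(u, part) - before
        movable = [u for u in nodes if u not in locked and can_move(u, part)]
        v = max(movable, key=lambda u: score[u]) if movable else None
    best_k, best_sum, running = 0, 0, 0
    for i, (_, g) in enumerate(moves, start=1):
        running += g
        if running > best_sum:
            best_k, best_sum = i, running
    for u, _ in moves[best_k:]:  # undo all moves after the best point
        part[u] = 1 - part[u]
    return part, best_sum


nets = [("a", "b"), ("b", "c"), ("c", "d")]  # two-pin nets
nodes = ["a", "b", "c", "d"]
neighbors: Dict[str, Set[str]] = {v: set() for v in nodes}
for x, y in nets:
    neighbors[x].add(y)
    neighbors[y].add(x)


def cut(p: Part) -> int:
    return sum(1 for x, y in nets if p[x] != p[y])


def cut_gain(v: str, p: Part) -> int:
    return cut(p) - cut({**p, v: 1 - p[v]})


def can_move(v: str, p: Part) -> bool:
    """Hypothetical balance rule: Part 0 must keep between 1 and 3 of the 4 nodes."""
    size0 = sum(1 for u in p if p[u] == 0) + (1 if p[v] == 1 else -1)
    return 1 <= size0 <= 3


final, total = clip_like_pass(nodes, neighbors, {"a": 0, "b": 1, "c": 0, "d": 1},
                              cut_gain, can_move)
print(final, cut(final), total)  # {'a': 0, 'b': 0, 'c': 0, 'd': 1} 1 2
```

Because scores restart at zero after the first move, the nodes whose gains rise most as a result of that move (typically its neighbors) are favored next, which is how CLIP tends to drag a whole cluster across the cutline.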

Page 18

Outline

• Motivation
• Performance-driven bipartition problem
• New bipartitioning algorithm
• Experimental results
• Conclusion and future work

Page 19

Experimental Setup

• Four industry testcases obtained as LEF/DEF
• Model of Ababei et al. (ICCAD-2002) used to calculate delay
• Partitioning solutions compared to results of MLPart
– Strongest multilevel netlist partitioning code
– Website: http://nexus6.cs.ucla.edu/GSRC/bookshelf/Slots/Partitioning/MLPart
• All tests run on a 600 MHz Intel Pentium III Xeon

Page 20

Biasing against V1 Nodes vs. MLPart

δ(0)=1, δ(1)=10 (h: hopcount; time in seconds)

         | MLPart                          | MLPart + V-shaped node removal
Testcase | cutsize | h   | delay | time(s) | cutsize | h   | delay | time(s)
1        | 820.7   | 5.3 | 352.8 | 11.79   | 856.1   | 3.3 | 266.8 | 12.58
2        | 169.9   | 3.5 | 220.7 | 13.45   | 189.8   | 2.5 | 211.2 | 15.32
3        | 141.3   | 3   | 291.6 | 16.67   | 152.3   | 2.3 | 283.6 | 18.27
4        | 408.7   | 5.3 | 302.6 | 12.43   | 421.2   | 3.6 | 252.7 | 14.03

• Reduction of delay: 4.5%-24.4%, average 15.1%
• Increase of cutsize: 3.0%-10.0%, average 4.9%
• Increase of runtime: 6.3%-11.4%, average 9.7%
• Using the delay model in Cong et al. ISPD-2002: reduction of delay 4.3%-21.2%, average 14.7%

Page 21

Biasing against V2 Nodes vs. MLPart

δ(0)=1, δ(1)=30, δ(2)=3 (h: hopcount; time in seconds)

         | MLPart                          | MLPart + V2 (k = 2) node removal
Testcase | cutsize | h   | delay | time(s) | cutsize | h   | delay | time(s)
1        | 820.7   | 5.3 | 352.8 | 11.79   | 847.5   | 3   | 262.1 | 13.16
2        | 169.9   | 3.5 | 220.7 | 13.45   | 183.2   | 2   | 202.5 | 15.67
3        | 141.3   | 3   | 291.6 | 16.67   | 149.2   | 2   | 275.6 | 18.92
4        | 408.7   | 5.3 | 302.6 | 12.43   | 416.7   | 3.4 | 243.5 | 14.79

• Reduction of delay: 8.9%-30.0%, average 18.7%
• Increase of cutsize: 3.1%-7.2%, average 3.5%
• Increase of runtime: 11.9%-15.9%, average 13.1%
• Using the delay model in Cong et al. ISPD-2002: reduction of delay 8.3%-28.7%, average 17.3%

Page 22

Outline

• Motivation
• Performance-driven bipartition problem
• New bipartitioning algorithm
• Experimental results
• Conclusions and future work

Page 23

Conclusions

• Simple yet efficient timing-driven partitioning that does not require global timing analysis
• Negligible implementation and runtime overhead
• Significantly reduces path delay, with cutsize and runtime almost the same as leading-edge MLPart
• Similar improvements observed with different path delay metrics
• Future work
– Impact of new partitioner on placement
– Efficient methods for biasing δ(k), k > 2

Page 24

Thank you!

Page 25

Future Work

• Impact of new partitioner on placement
• Efficient methods for biasing δ(k), k > 2

Page 26

Why Performance-Driven Partitioning?

• Achieving timing closure becomes increasingly difficult in deep-submicron technologies due to non-ideal scaling of interconnect delay
• Routing alone can no longer solve the timing problem, even with aggressive optimizations (buffer insertion, buffer/wire sizing, ...)
• Timing needs to be addressed at all design stages
• Partitioning is a critical step in defining interconnect timing properties, but is traditionally driven by the cutsize objective

Page 27

Previous Work (I)

• With logic replication
– Retiming
– Replication graph
• Without logic replication
– Net-based reweighting
– Path-based reweighting

Page 28

FM Partitioning and Gain Function

Gain(v) = reduction of cutsize after moving v

[Figure: node v in Part 0 and Part 1 before and after the move; in this example Gain(v) = -1]

FM flow:
• Start with a random partition
• Move the node with the max gain and lock it
• Keep moving until all nodes are locked
• Find the best point in the move sequence
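FM does not need to recount the whole cut to obtain Gain(v). A sketch of the standard per-net rule, assuming nets are lists of distinct pins: a net containing v contributes +1 if v is its only pin on v's side (after the move all pins share a side) and -1 if all of its pins are on v's side (before the move all pins shared a side). The example is a hypothetical netlist in which Gain(v) = -1.

```python
from typing import Dict, List


def fm_gain(v: str, nets: List[List[str]], part: Dict[str, int]) -> int:
    """Reduction in cutsize if v moves to the other part (FM per-net rule)."""
    g = 0
    for net in nets:
        if v not in net:
            continue
        same_side = sum(1 for u in net if part[u] == part[v])
        if same_side == 1:         # +1: after the move, every pin of the net is on one side
            g += 1
        if same_side == len(net):  # -1: before the move, every pin of the net was on one side
            g -= 1
    return g


part = {"v": 0, "x": 0, "y": 1}
nets = [["v", "x"], ["v", "x", "y"]]
print(fm_gain("v", nets, part), fm_gain("y", nets, part))  # -1 1
```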

Page 29

Procedure to Calculate rj(v)

Delete all FF nodes and their related edges
In the remaining graph, BFS from v
For each level j from 1 to k:
    If v is a Vj node before moving, rj' = 1 (otherwise 0)
    If v is a Vj node after moving, rj'' = 1 (otherwise 0)
    rj = rj' - rj''
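One possible reading of this procedure, sketched under assumptions: edges are directed (driver, sink) pairs and FF nodes are dropped first; the level-by-level search from v runs backward and forward inside v's part, keeping a frontier per level (no visited set) so that every path length up to j - 1 in the acyclic combinational graph is seen; v lies on a Vj chain when a backward depth a and a forward depth b satisfy a + b + 1 = j. As in the procedure, only v itself is checked, and rj is taken as before minus after so that it matches the "reduction of Vj nodes" definition of rj(v). The netlist in the example is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (driver, sink)
Part = Dict[str, int]
Adj = Dict[str, List[str]]


def build_adjacency(edges: List[Edge], ffs: Set[str]) -> Tuple[Adj, Adj]:
    """Delete FF nodes and their edges, then index predecessors and successors."""
    preds: Adj = defaultdict(list)
    succs: Adj = defaultdict(list)
    for u, w in edges:
        if u in ffs or w in ffs:
            continue
        succs[u].append(w)
        preds[w].append(u)
    return preds, succs


def exit_levels(v: str, part: Part, step: Adj, max_level: int) -> Set[int]:
    """Levels d < max_level at which a same-part walk of d steps from v along `step`
    reaches a node that has a `step`-neighbor in the other part."""
    side = part[v]
    levels: Set[int] = set()
    frontier = {v}
    for d in range(max_level):
        if any(part[w] != side for u in frontier for w in step[u]):
            levels.add(d)
        frontier = {w for u in frontier for w in step[u] if part[w] == side}
        if not frontier:
            break
    return levels


def is_vj(v: str, part: Part, preds: Adj, succs: Adj, j: int) -> bool:
    """v lies on a chain of exactly j same-part nodes entered from and left to the other part."""
    back = exit_levels(v, part, preds, j)
    fwd = exit_levels(v, part, succs, j)
    return any(a + b + 1 == j for a in back for b in fwd)


def r_values(v: str, part: Part, preds: Adj, succs: Adj, k: int) -> Dict[int, int]:
    """r_j(v) = [v is a Vj node before the move] - [v is a Vj node after the move]."""
    moved = {**part, v: 1 - part[v]}
    return {j: int(is_vj(v, part, preds, succs, j)) - int(is_vj(v, moved, preds, succs, j))
            for j in range(1, k + 1)}


edges = [("a", "b"), ("b", "c"), ("c", "d")]
part = {"a": 0, "b": 1, "c": 1, "d": 0}  # hypothetical: b and c form a V2 chain
preds, succs = build_adjacency(edges, ffs=set())
print(r_values("b", part, preds, succs, k=2))  # {1: 0, 2: 1}
```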

Page 30

CLIP Algorithm


Reminiscent of CLIP (Deng et al. DAC 1996) in how it induces movement of clusters across the cutline.

Page 31

Distance-k V-Shaped Nodes

Distance-k V-shaped nodes (Vk nodes): if k combinational nodes vi,1, …, vi,k satisfy
• vi,1, …, vi,k are in the same part,
• vj and vt are in the other part, and
• there is a path from vj to vt that passes only through vi,1, …, vi,k,
then vi,1, …, vi,k are distance-k V-shaped nodes.

[Figure: path vj → vi,1 → … → vi,k → vt, with vi,1, …, vi,k on the opposite side of the cut from vj and vt]

Page 32

Notation

• H(V,E) = circuit hypergraph
• V = set of nodes representing components of the circuit
• E = set of signal nets
• A bipartition (V0|V1) of H(V,E) divides V into two disjoint subsets, called Part 0 and Part 1, such that V = V0 ∪ V1
• A = total area of all the nodes in V
• A0 = area of all the nodes in V0