96
FPGA Routing Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223

FPGA Routing

  • Upload
    kylene

  • View
    175

  • Download
    8

Embed Size (px)

DESCRIPTION

FPGA Routing. Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223. Routing Resource Graph (RRG). source. wire 3. wire 4. 2-LUT. out. out. wire 3. wire 4. in 2. in 1. wire 2. wire 1. in 2. in 1. wire 1. wire 2. sink. - PowerPoint PPT Presentation

Citation preview

Page 1: FPGA Routing

FPGA Routing

Dr. Philip BriskDepartment of Computer Science and Engineering

University of California, Riverside

CS 223

Page 2: FPGA Routing

2

Routing Resource Graph (RRG)

www.eecg.toronto.edu/~aling/ece1718/project/fang/route_rr_graph.png

2-LUT

in1 in2

out

wire1

wire2

wire3 wire4source

sink

out

wire4

wire2

in2in1

wire1

wire3

Page 3: FPGA Routing

3

FPGA Routing• Disjoint-path problem (on the RRG); NP-complete

• Input:– Graph G(V, E)– Set of sources S = {s1, s2, …, sm}

– Set of sets of sinks T = {T1, T2, …, Tn}, Ti = {ti1, ti

2, …, tik}

• Solution

– Finds paths from each source si to all sinks in Ti

– Paths emanating from different sinks must be disjoint (cannot shared any vertices or edges)

• Objective(s)– Minimize delay, wirelength, etc.

Page 4: FPGA Routing

4

Disjoint and Non-disjoint Paths

s1

s2

t21

t11

Page 5: FPGA Routing

5

Disjoint and Non-disjoint Paths

This route is illegal!

s1

s2

t21

t11

Two nets are routed on the same FPGA segment(remember, RRG vertices represent wires and CLB I/Os)

Page 6: FPGA Routing

6

Disjoint and Non-disjoint Paths

This route is legal!

s1

s2

t21

t11

Page 7: FPGA Routing

7

LUT Input Equivalence (1/3)

s1

s2

2-LUTin1

in2

s1

s2

2-LUTin1

in2

LUT inputs are interchangeable

Page 8: FPGA Routing

8

LUT Input Equivalence (2/3)

s1

s2

2-LUTin1

in2

s1

s2

2-LUTin1

in2

s1

s2 t21 = in2

t11 = in1 t2

1 = in1

t11 = in2

s1

s2

Overly restrictive disjoint-path problem formulation

Page 9: FPGA Routing

9

LUT Input Equivalence (3/3)

s1

s2

2-LUTin1

in2

s1

s2

2-LUTin1

in2

Represent all inputs of a LUT as one RRG sink, t

ts1

s2

Page 10: FPGA Routing

PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs

Larry McMurchie and Carl EbelingInternational Symposium on FPGAs, 1995

Page 11: FPGA Routing

Architecture and CAD for Deep Submicron FPGAs

(Section 2.2.3, pp. 25-30)

Vaughn Betz, Jonathan Rose, Sandy MarquardtSpringer, 1999

Page 12: FPGA Routing

Basic Plan

• Route nets one-by-one– Each net is routed by a priority-driven breadth-first

search– Multiple nets may use the same nodes and edges– Node/edge costs depend on current usage and past

usage history• Repeat for a fixed number of iterations (say 200)– Success: A disjoint routing solution for for all nets– Failure: No disjoint solution found after a fixed number

of iterations

Page 13: FPGA Routing

Routability Cost Function (per Node)

• C(n) = p(n) × (b(n) + h(n))

– b(n): base cost of routing through node n• Typically the intrinsic delay of the circuit element corresponding to

the node

– h(n): history cost of routing through node n, based on previous router iterations

– p(n): depends on the number of signals presently routed through node n in the current iteration

Page 14: FPGA Routing

Timing-Routability Cost Function

Page 15: FPGA Routing

First-Order Congestion

s1

s3

s2

At1

t3

t2

B

C

1

1

1

2

3

4

3

1

1

1

2

3

4

3

Page 16: FPGA Routing

First-Order Congestion

s1

s3

s2

At1

t3

t2

B

C

1

1

1

2

3

4

3

1

1

1

2

3

4

3

All paths route through B

• Increase the penalty cost of B per iteration due to sharing

• In a future iteration, it will be cheaper to route signal 1 through A, rather than B

• In a future iteration, it will be cheaper to route signal 2 through C, rather than B

• Requires gradual increase in cost of sharing nodes

Page 17: FPGA Routing

Second-order Congestion

s1

s3

s2

At1

t3

t2

B

C

1

2

2

1

1

2

1

2

1

1

Routing Order: 1, 2, 3

Page 18: FPGA Routing

Second-order Congestion

Routing Order: 1, 2, 3A

B

C

1

2

2

1

1

2

1

2

1

1

s1

s3

s2

t1

t3

t2

Page 19: FPGA Routing

Second-order Congestion

Routing Order: 1, 2, 3A

B

C

1

2

2

1

1

2

1

2

1

1

s1

s3

s2

t1

t3

t2

Page 20: FPGA Routing

Second-order Congestion

Routing Order: 1, 2, 3A

B

C

1

2

2

1

1

2

1

2

1

1

Two paths route through C

• Increasing the history cost through C will push signal 2 onto an alternative path through B

s1

s3

s2

t1

t3

t2

Page 21: FPGA Routing

Second-order Congestion

Routing Order: 1, 2, 3A

B

C

1

2

2

3

3

2

1

2

3

3

Two paths route through C

• Increasing the history cost through C will push signal 2 onto an alternative path through B

s1

s3

s2

t1

t3

t2

Page 22: FPGA Routing

Second-order Congestion

Routing Order: 1, 2, 3A

B

C

1

2

2

3

3

2

1

2

3

3

Two paths route through B

• Increasing the history cost through B will push signal 1 onto an alternative path through A

• Increasing the history cost through B will push signal 2 back onto the path through C

s1

s3

s2

t1

t3

t2

Page 23: FPGA Routing

Second-order Congestion

Routing Order: 1, 2, 3A

B

C

3

4

2

3

3

4

3

2

3

3

Two paths route through B

• Increasing the history cost through B will push signal 1 onto an alternative path through A

• Increasing the history cost through B will push signal 2 back onto the path through C

s1

s3

s2

t1

t3

t2

Page 24: FPGA Routing

Second-order Congestion

Routing Order: 1, 2, 3A

B

C

3

4

2

3

3

4

3

2

3

3

Two paths route through C

• Increasing the history cost through C will push signal 2 back through B

s1

s3

s2

t1

t3

t2

Page 25: FPGA Routing

Second-order Congestion

Routing Order: 1, 2, 3A

B

C

3

4

2

5

5

4

3

2

5

5

Two paths route through C

• Increasing the history cost through C will push signal 2 back through B

s1

s3

s2

t1

t3

t2

Page 26: FPGA Routing

Pseudocode

Priority driven BFS• Computed from the source

to every sink of net(i)

Connect the newly found path to the current sink to the partly-computed routing tree RT(i) for net I

Page 27: FPGA Routing

Timing-Routability Cost Function

Represents the estimated criticality of the path from source I to sink j on net(i)• net(i) may fan out to multiple sinks with varying criticality, depending on placement

Delay of critical path

Page 28: FPGA Routing

Negotiated A* Routing for FPGAs

Russell TessierFifth Canadian Workshop on Field Programmable

Devices, 1998

Page 29: FPGA Routing

Basic Idea

Page 30: FPGA Routing

A* Cost Function

• Cost at a node ni

fi = gi + di

gi = fi-1 + ci

Cost of routing from the source to ni

Estimated cost of routing from ni to the sink

Total cost of previous path

Cost of the next candidate node

Page 31: FPGA Routing

A* vs. BFS

A*: fi = gi + di = (fi-1 + ci) + di

BFS: fi = gi

Hybrid: fi = (1 - α)gi + αdi

= (1 - α)(fi-1 + ci) + αdi

Page 32: FPGA Routing

Good Ideas, But…

• Algorithm and implementation assumed disjoint switch block– i.e., disjoint global routing domains

• Key issue:– Which routing domains (reachable from the

source) are explored in what order?

Page 33: FPGA Routing

A Fast Routability-driven Router for FPGAs

Jordan S. Swartz, Vaughn Betz, Jonathan RoseInternational Symposium on FPGAs, 1998

Page 34: FPGA Routing

Architecture and CAD for Deep Submicron FPGAs

(Section 4.3, pp. 76-80)

Vaughn Betz, Jonathan Rose, Sandy MarquardtSpringer, 1999

Page 35: FPGA Routing

Speed Enhancements

• Directed depth-first search– Similar to A*, but arguably more aggressive– Key Issues:

• Cost function• Sink (target) ordering for multi-terminal nets

• Route classification– Low-stress– Difficult– Impossible

Page 36: FPGA Routing

Cost Function

Cost(n) = C(n) + Costprev + α(ΔD)– Costprev: cost of track segments used to reach the

current segment– C(n): (base) cost of using the track segment

• Increases more rapidly than the original PathFinder algorithm– (ΔD): estimated cost of routing from the current track

segment to the destination • Based on Manhattan (XY) distance)

– α: Direction factor• BFS: α = 0• Directed search: α > 0

Page 37: FPGA Routing

Updated Node (Base) Cost Function

• C(n) = p(n) × (b(n) + h(n))– b(n): base cost – h(n): history cost– p(n): usage cost

• C(n) = p(n) × b(n) × h(n) + Bendcost(n, m)– Multiplying b(n) and h(n) eliminates normalization– Penalize global routes that take a lot of turns• These routes are less likely to use long wires

Page 38: FPGA Routing

Penalty Cost Function

• Not specified in the original PathFinder paper

p(n) = 1 + max(0, [occupancy(n) – capacity(n)] × pfac)• occupancy(n): number of nets using resource n• capacity(n): maximum number of nets that can use

resource n (typically 1)• pfac: 0.5 for the first iteration; increase by 1.5x to 2x each

subsequent iteration

Page 39: FPGA Routing

History Cost Function

• Not specified in the original PathFinder paper

• occupancy(n): number of nets using resource n• capacity(n): maximum number of nets that can use

resource n (typically 1)• hfac: constant; any value between 0.2 and 1 works well

Page 40: FPGA Routing

Target Selection

Closest Sink First• Uses fewer track segments

Furthest Sink First• Uses more track segments

Page 41: FPGA Routing

Net Order

• Route nets in order of decreasing fan-out

• High fanout nets tend to span the whole FPGA– Easier to route when there is less congestion do to

other nets routed earlier

• Low fanout nets tend to be more localized– Relatively easy to route, even in the presence of

some congestion

Page 42: FPGA Routing

Binning

• Only the portions of the net close to the target sink should be expanded– E.g., only expand within

Bin 4 in this example

• Shown to be effective for sinks with more than 50 targets

Page 43: FPGA Routing

Bounding Box

• Define a bounding box around source and sinks• Restrict the route for each net to no more than 3

channels outside of the bounding box

Page 44: FPGA Routing

Routing High-fanout Nets

Page 45: FPGA Routing

Difficulty Prediction

• Since you can’t know Wmin first without routing, use an estimate based on a wirelength prediction model

Page 46: FPGA Routing

Routability Results

Page 47: FPGA Routing

Routing Times as W increases

Page 48: FPGA Routing

Routing Time vs. W (clma Benchmark)

Page 49: FPGA Routing

BFS vs. Directed Search (w/Binning)

Page 50: FPGA Routing

Difficulty Prediction

• Westimate took less than 1 second to compute

• Some prediction mistakes will happen

• Prediction accuracy was 84%

Page 51: FPGA Routing

Fuzzy Prediction

Page 52: FPGA Routing

Architecture and CAD for Deep Submicron FPGAs(Section 2.2.4, pp. 31-34,

Section 4.4, pp. 80-89, Section 4.6.4, pp. 99-102 )

Vaughn Betz, Jonathan Rose, Sandy MarquardtSpringer, 1999

VPR Timing-Driven Router

Page 53: FPGA Routing

Example Timing Graph

Page 54: FPGA Routing

Example Timing Graph

12ns

Page 55: FPGA Routing

Timing and Slack

Process vertices in forward topological order

Process vertices in reverse topological order(Required time for any output node is Dmax)

How much can I slow down this circuit element (node) without increasing the critical path?

Page 56: FPGA Routing

Elmore Delay of a Source-Sink Path

• Td,i: intrinsic delay of element i (buffer); 0 otherwise

• Ri: equivalent resistance of element i (Rwire, Rbuf, Rpass)

• C(subtreei): total downstream capacitance not isolated by buffers

Page 57: FPGA Routing

Equivalent Circuits

Page 58: FPGA Routing

Example: M Wire Segments connected by Buffers

Linear and Elmore Delay Models agree!

Page 59: FPGA Routing

Example: Chain of Pass Transistors

Linear

Quadratic

Page 60: FPGA Routing

Fanout

Page 61: FPGA Routing

Introduce Elmore Delay into VPR-PathFinder’s Node Cost Function

Parameters that control how slack affects congestion-delay tradeoff

Page 62: FPGA Routing

… And to Support Directed Search• Expected cost to route from a wire segment n to the target

sink j

– The connection will be completed using wires of the same type as node n • Length, connecting switch type, etc.

– There is no congestion along the shortest path to the target • p(m) and h(m) are both 1 for all nodes m on the path that completes the

connection.

Page 63: FPGA Routing

Why Optimize Elmore Delay?

Page 64: FPGA Routing

Loops

Page 65: FPGA Routing

Timing-Driven Router

Page 66: FPGA Routing

Timing-Driven Routing of High-Fanout Nets

Xun Chen, Jianwen Zhu, Minxuan ZhangInternational Conference on Field Programmable

Logic and Applications, 2011

Page 67: FPGA Routing

Motivation (for Binning, Earlier)

Page 68: FPGA Routing

Angle-based Pruning

Route the trunk first, then each leaf

NetDecomposition

Page 69: FPGA Routing

Results

Page 70: FPGA Routing

Routing Algorithms for FPGAs with Sparse Intra-Cluster Routing Crossbars

Yehdhih Moctar, Guy Lemieux, Philip BriskInternational Conference on Field Programmable

Logic and Applications, 2012

Page 71: FPGA Routing

71

Full Crossbar Intra-cluster Routing2-LUT BLE

in1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Page 72: FPGA Routing

72

RRG

Full Crossbar Intra-cluster Routing

RRG

Page 73: FPGA Routing

73

Full Crossbar RRG

Don’t model the RRG inside each CLB!

Page 74: FPGA Routing

74

Sparse Crossbar Intra-cluster Routing (1/6)

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• RRG must include the intra-cluster routing

Page 75: FPGA Routing

75

Sparse Crossbar Intra-cluster Routing

RRG

Page 76: FPGA Routing

76

Sparse Crossbar RRG

RRG encompasses the entire FPGA

Page 77: FPGA Routing

77

Why is this Hard?

• Key Issues relating to RRG size– May run out of memory • If you are not routing in the cloud

– Long router runtimes• Large search space leads to slow convergence• Thrashing

• Objective– Route without expanding the whole RRG

Page 78: FPGA Routing

78

Routing Algorithms• Extensions to PathFinder routing algorithm

– Routes one net at a time via wavefront expansion

• Baseline– Extend the RRG with intra-cluster routing at each CLB

• Selective RRG Expansion (SERRGE)– Route nets to CLB inputs– Dynamically expand the RRG to finish the route

• Partial Pre-Routing (PPR)– Pre-route all nets within CLBs– Complete global routes to CLB inputs

Page 79: FPGA Routing

79

Baseline Router2-LUT BLE

in1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

• Inputs of the same LUT are equivalent!

Global RRG

Page 80: FPGA Routing

80

SERRGE in Action (1/8)

Start with a global RRG

s

t

Page 81: FPGA Routing

81

SERRGE in Action (2/8)

s

t

Route from s’s CLB to t’s CLB

Page 82: FPGA Routing

82

SERRGE in Action (3/8)

s

t

Locally expand the RRG with part of t’s CLB

Page 83: FPGA Routing

83

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Maintain one copy of the intra-cluster topology

SERRGE in Action (4/8)

t

Page 84: FPGA Routing

84

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Expand the fanout of the CLB input being routed

SERRGE in Action (5/8)

t

Page 85: FPGA Routing

85

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Complete the route if possible; if not, try again

SERRGE in Action (6/8)

t

Page 86: FPGA Routing

86

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Track used CLB and BLE inputs; remember the route

SERRGE in Action (7/8)

t

Page 87: FPGA Routing

87

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Deallocate unused RRG resources

SERRGE in Action (8/8)

t

Page 88: FPGA Routing

88

SERRGE Summary• Global routing uses standard PathFinder

• Selectively expand portions of the RRG to complete routes from CLB inputs to BLE inputs

• Storage Requirement– Global RRG (standard PathFinder requirement)– One copy of the intra-cluster routing topology– All intra-cluster routes computed thus far

• Fairly challenging to implement!

Page 89: FPGA Routing

89

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Process CLBs one at a time

PPR in Action (1/4)

t1

t2

Page 90: FPGA Routing

90

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

Compute local routes for each BLE

PPR in Action (2/4)

t1

t2

Page 91: FPGA Routing

91

2-LUT BLEin1

in2

2-LUT BLEin1

in2

CLB inputs

Local feedbacks

CLB inputs now form equivalence classes.

PPR in Action (3/4)

t1

t2

Page 92: FPGA Routing

92

PPR in Action (4/4)

Perform global routing w/equivalence class constraints on CLB inputs

s

t

Page 93: FPGA Routing

93

PPR Summary

• Pre-route each CLB

• Propagate LUT equivalences to CLB inputs– A baseline router on the full-blown RRG would still need to

satisfy BLE input-pin equivalence constraints

• Route as normal

• Much easier implementation than SERRGE

Page 94: FPGA Routing

Critical Path Delayac

_ctr

l

aes_

core

des_

area

mem

_ctr

l

pci_

brid

ge32 sp

i

syst

emca

es

syst

emcd

es

usb_

func

t

wb_

conm

axgeommean

012345678

p = 40%

ns

ac_c

trl

aes_

core

des_

area

mem

_ctr

l

pci_

brid

ge32 sp

i

syst

emca

es

syst

emcd

es

usb_

func

t

wb_

conm

ax

geomean

012345678

p = 75%

ns

PPR SERRGE Baseline

Page 95: FPGA Routing

RRG Size (MB)ac

_ctr

l

aes_

core

des_

area

mem

_ctr

l

pci_

brid

ge32 sp

i

syst

emca

es

syst

emcd

es

usb_

func

t

wb_

conm

axgeommean

0

10

20

30

40

50

60

p = 40%

MB

ac_c

trl

aes_

core

des_

area

mem

_ctr

l

pci_

brid

ge32 sp

i

syst

emca

es

syst

emcd

es

usb_

func

t

wb_

conm

ax

geomean

0

10

20

30

40

50

60

p = 75%

MB

PPR SERRGE Baseline

Page 96: FPGA Routing

Router Runtime (s)ac

_ctr

l

aes_

core

des_

area

mem

_ctr

l

pci_

brid

ge32 sp

i

syst

emca

es

syst

emcd

es

usb_

func

t

wb_

conm

axgeomean

0

200

400

600

800

1000

1200

p = 40%

Seco

nds

ac_c

trl

aes_

core

des_

area

mem

_ctr

l

pci_

brid

ge32 sp

i

syst

emca

es

syst

emcd

es

usb_

func

t

wb_

conm

ax

geomean

0

200

400

600

800

1000

1200

Seco

nds

PPR SERRGE Baseline

p = 75%