Exploration of Pipelined FPGA Interconnect Structures

Scott Hauck
Akshay Sharma, Carl Ebeling
University of Washington

Katherine Compton
University of Wisconsin - Madison

2

PipeRoute

• FPGA’2003: Pipelining-aware Router for FPGAs

• Architecture-adaptive, based on Pathfinder

• Uses optimal 2-terminal, 1-delay router

• Greedy formulation for multi-delay, multi-terminal routing

[Figure: routing example with nodes S, T1, and T2]

3

RaPiD

• Coarse-grained, 1D, 16-bit, w/DSP Units

• Carl Ebeling @ UW-CSE

• Pipelined interconnect via Bus Connectors (BCs)

[Figure: RaPiD cell showing functional units in order: GPR, RAM, RAM, GPR, MULT, GPR, ALU, ALU, GPR, GPR, RAM, ALU, GPR]

4

Pipelined Routing Results

• Area expansion due to pipelining

• Normalized to unpipelined circuit area

[Figure: normalized area (0 to 3) versus % pipelined signals (0% to 70%)]

Ave: 75% cost

5

Contributions

• Optimized PipeRoute

• Support multiple delays per BC (greedy preprocessor)

• Timing-driven – Pathfinder’s, worst-case criticality across signal

• RouteCost = criticality * delay_cost + (1 - criticality) * area_cost

• Arch. Exploration of RaPiD Pipelined Interconnects

• Registered logic block (input/output/none)

• BC track length

• Delays per register/BC

• BC/non-BC routing mix

• Register-only logic blocks

• Goal: More efficient support of pipelined interconnects
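
The RouteCost blend listed under Optimized PipeRoute can be sketched in Python as follows. This is a minimal illustration, not PipeRoute's actual code. The function names `route_cost` and `signal_criticality` are hypothetical, criticality is assumed to lie in [0, 1], and "worst-case criticality across signal" is interpreted here as the maximum criticality over a signal's sinks.

```python
def route_cost(criticality: float, delay_cost: float, area_cost: float) -> float:
    """Blend delay and area cost: critical signals pay for delay, others for area."""
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality is assumed to lie in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * area_cost


def signal_criticality(sink_criticalities):
    """Assumed reading of 'worst-case criticality across signal': max over sinks."""
    return max(sink_criticalities)


# A fully critical signal sees only the delay cost, a non-critical one only area.
crit = signal_criticality([0.2, 0.9, 0.5])
cost = route_cost(crit, delay_cost=4.0, area_cost=2.0)
```

With these assumed inputs, a signal whose most critical sink has criticality 0.9 weights the delay cost at 90% and the area cost at 10%.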


6

Methodology

• Benchmarks

• Retimed, not C-slowed

• Graphs

• Increase arch to fit (cells, tracks/cell)

• Variation around local minima

[Figure: AREA, DELAY, and AREA*DELAY versus delays per BC/Reg (1 to 7)]

[Figure: % pipelined signals (0% to 80%) per netlist]

7

Registers in Logic Blocks

• Output Registers

• No Registers

• Input Registers

[Figure: AREA, DELAY, and AREA*DELAY versus registers in functional units (Out, None, In)]

[Figure: routing example with nodes S, T1, and T2]

5% 20% 23%

8

Delays per Register/BC

• 1 Delay/BC

• 2 Delays/BC

[Figure: AREA, DELAY, and AREA*DELAY versus delays per BC/Reg (1 to 7)]

15% 20% 30%

9

BC Track Length

• Length 16 BC wires

• Length 8 BC wires

[Figure: AREA, DELAY, and AREA*DELAY versus BC track length (32, 16, 8, 4)]

17% 64% 69%

10

Routing Resource Mix (BC vs. non-BC)

• 5/7

• 7/7

[Figure: AREA, DELAY, and AREA*DELAY versus proportion of BC tracks (7/7, 6/7, 5/7, 4/7, 3/7)]

19% 17% 18%

11

GPRs per Cell

• GPR roles:

• Registers from computation

• Passthrough for changing tracks

• 6 per cell

• 9 per cell

[Figure: AREA, DELAY, and AREA*DELAY versus GPRs per cell (5 to 10)]

6% 23% 22%

12

Overall – vs. RaPiD-I

• RaPiD-I

• 1 BC / cell (13 LBs long)

• 5/7 BC tracks

• 3 registers / BC

• 6 GPRs / cell

• registered outputs

• Post-Explore

• 1 BC / cell (16 LBs long)

• 5/7 BC tracks

• 3 registers / BC

• 9 GPRs / cell

• registered inputs

[Figure: ratio Post/RaPiD-I (0 to 1.4) of AREA, DELAY, and AREA*DELAY per benchmark: firtm, fft16, cascade, matmult4, sobel, imagerapid, firsymeven, sort_g, sort_rb]

Ave: 1% 18% 19%
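
As a rough consistency check on the three averages, the short Python sketch below combines a 1% area improvement and an 18% delay improvement into an area*delay improvement. It assumes the percentages are reductions relative to RaPiD-I and that they compose multiplicatively; an average of per-benchmark products need not equal the product of the averages, so this is only an approximation. The function name `area_delay_improvement` is hypothetical.

```python
def area_delay_improvement(area_gain: float, delay_gain: float) -> float:
    """Fractional reduction in area*delay given fractional reductions in each."""
    return 1.0 - (1.0 - area_gain) * (1.0 - delay_gain)


# 1% area and 18% delay improvement, taken from the averages on this slide.
combined = area_delay_improvement(0.01, 0.18)
print(f"{combined:.1%}")
```

Under these assumptions the combined figure is 1 - 0.99 * 0.82 = 0.1882, which rounds to the 19% reported for AREA*DELAY.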

13

Overall – Pipelining Cost

[Figure: normalized area (0 to 1.6) versus % pipelined signals (0% to 80%)]

Ave: 18% cost

14

Conclusions

• Router for arbitrary pipelined architectures

• Timing-driven

• Supports multiple delays at each register site

• Good quality: area within 18% of the pseudo-lower bound (non-pipelined area) on average

• Architecture Exploration of RaPiD

• Parameters:

• Registered inputs on functional units

• Length 16 wires

• 3 delays per BC/register

• 2/7 non-registered, 5/7 registered wires

• 9 GPRs/cell to improve flexibility

• Delay: spacing of registers CRITICAL, too close better than too far

• 19% area*delay improvement over RaPiD-I (primarily delay)

15

*** End of Talk Marker ***

16

1-Delay Two Terminal

• Can do optimal routing for 1-delay routes via BFS

[Figure: routing example with nodes S and T]
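
One way to picture a BFS-based 1-delay search is as a search over (node, phase) states, where phase counts the delays picked up so far. The Python sketch below is an illustration under stated assumptions, not PipeRoute's implementation: hops have unit cost (so plain BFS gives a fewest-hop route), the graph is a directed adjacency list, and a register node may be either used or bypassed. The function name `one_delay_route` and the example graph are hypothetical. The sketch does not check that the parts of the route before and after the register are node-disjoint.

```python
from collections import deque


def one_delay_route(graph, registers, source, target):
    """Fewest-hop path from source to target that uses exactly one register.

    Returns (path, register), or None if no such route exists.
    """
    start = (source, 0)              # phase 0: no delay picked up yet
    parent = {start: None}
    queue = deque([start])
    while queue:
        node, phase = queue.popleft()
        if node == target and phase == 1:
            states = []
            state = (node, phase)
            while state is not None:
                states.append(state)
                state = parent[state]
            states.reverse()
            path = [n for n, _ in states]
            # The register is the node at which the phase changes from 0 to 1.
            register = next(a[0] for a, b in zip(states, states[1:]) if a[1] != b[1])
            return path, register
        # Leaving a register in phase 0 may either bypass it or take its delay.
        phases = [0, 1] if phase == 0 and node in registers else [phase]
        for neighbor in graph.get(node, ()):
            for next_phase in phases:
                state = (neighbor, next_phase)
                if state not in parent:
                    parent[state] = (node, phase)
                    queue.append(state)
    return None


# Hypothetical graph: the shortest S-to-T path (via "a") has no register,
# so the 1-delay route must detour through register "r".
graph = {"S": ["a", "r"], "a": ["T"], "r": ["b"], "b": ["T"], "T": []}
result = one_delay_route(graph, {"r"}, "S", "T")
```

Each (node, phase) state is visited at most once, so the search touches at most twice as many states as an ordinary BFS on the same graph.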

25

N-Delay Two Terminal

• Greedy Approximation via 1-Delay Router

• Find 1-delay route

• While not enough delay on route

• Replace any 0-delay segment with cheapest 1-delay replacement

[Figure: routing example with nodes S and T]
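
The greedy steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than PipeRoute's code: hops have unit cost, each register supplies one delay, "cheapest" is taken to mean the smallest increase in hop count, and nodes already on the route are blocked when searching for a replacement. The names `one_delay_route`, `n_delay_route`, and the example graph are hypothetical.

```python
from collections import deque


def one_delay_route(graph, registers, source, target, blocked=frozenset()):
    """Fewest-hop source-to-target path using exactly one register, or None."""
    start = (source, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node, phase = queue.popleft()
        if node == target:
            if phase == 0:
                continue                      # never route through the target
            states, state = [], (node, phase)
            while state is not None:
                states.append(state)
                state = parent[state]
            states.reverse()
            register = next(a[0] for a, b in zip(states, states[1:]) if a[1] != b[1])
            return [n for n, _ in states], register
        # The segment's own start register is already used, so skip it here.
        use = phase == 0 and node in registers and node != source
        for neighbor in graph.get(node, ()):
            if neighbor in blocked or neighbor == source:
                continue
            for next_phase in ([0, 1] if use else [phase]):
                state = (neighbor, next_phase)
                if state not in parent:
                    parent[state] = (node, phase)
                    queue.append(state)
    return None


def n_delay_route(graph, registers, source, target, delays):
    """Greedy N-delay route: returns (path, registers_used) or None."""
    if delays < 1:
        raise ValueError("this sketch assumes at least one delay")
    found = one_delay_route(graph, registers, source, target)
    if found is None:
        return None
    path, reg = found
    cut = path.index(reg)
    segments = [path[:cut + 1], path[cut:]]   # 0-delay segments joined at registers
    while len(segments) - 1 < delays:
        used = {n for seg in segments for n in seg}
        best = None
        for i, seg in enumerate(segments):
            found = one_delay_route(graph, registers, seg[0], seg[-1], used - set(seg))
            if found is not None:
                growth = len(found[0]) - len(seg)
                if best is None or growth < best[0]:
                    best = (growth, i, found[0], found[1])
        if best is None:
            return None                       # no segment can absorb another delay
        _, i, new_path, new_reg = best
        cut = new_path.index(new_reg)
        segments[i:i + 1] = [new_path[:cut + 1], new_path[cut:]]
    route = segments[0] + [n for seg in segments[1:] for n in seg[1:]]
    return route, [seg[-1] for seg in segments[:-1]]


# Hypothetical graph with two registers, r1 and r2.
graph = {"S": ["r1"], "r1": ["a"], "a": ["r2", "T"], "r2": ["b"], "b": ["T"], "T": []}
two_delay = n_delay_route(graph, {"r1", "r2"}, "S", "T", 2)
```

In this example the 1-delay route S, r1, a, T is found first, and the segment r1, a, T is then replaced by the detour through r2 to pick up the second delay. Because each replacement is chosen locally, the result is an approximation and is not guaranteed to be the cheapest N-delay route.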
