Exploration of Pipelined FPGA Interconnect Structures Scott Hauck Akshay Sharma, Carl Ebeling University of Washington Katherine Compton University of Wisconsin - Madison


Exploration of Pipelined FPGA Interconnect Structures

Scott Hauck, Akshay Sharma, Carl Ebeling
University of Washington

Katherine Compton
University of Wisconsin - Madison


PipeRoute

• FPGA’2003: Pipelining-aware Router for FPGAs
  • Architecture-adaptive, based on Pathfinder
  • Uses optimal 2-terminal, 1-delay router
  • Greedy formulation for multi-delay, multi-terminal routing

[Figure: net with source S and terminals T1 and T2]


RaPiD

• Coarse-grained, 1D, 16-bit, w/DSP Units

• Carl Ebeling @ UW-CSE

• Pipelined interconnect via Bus Connectors (BCs)

[Figure: RaPiD cell with GPR, RAM, MULT, and ALU units]


Pipelined Routing Results

• Area expansion due to pipelining
  • Normalized to unpipelined circuit area

[Plot: normalized area (0 to 3) vs. % pipelined signals (0% to 70%)]

Ave: 75% cost


Contributions

• Optimized PipeRoute
  • Support multiple delays per BC (greedy preprocessor)
  • Timing-driven: uses Pathfinder’s worst-case criticality across the signal
  • RouteCost = criticality * delay_cost + (1 - criticality) * area_cost

• Arch. Exploration of RaPiD Pipelined Interconnects
  • Registered logic block (input/output/none)
  • BC track length
  • Delays per register/BC
  • BC/non-BC routing mix
  • Register-only logic blocks

• Goal: More efficient support of pipelined interconnects
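The RouteCost blend on this slide can be illustrated with a small sketch. The function name, argument names, and the numbers below are hypothetical; only the formula itself comes from the slide, and criticality is assumed to lie in [0, 1].

```python
def route_cost(criticality: float, delay_cost: float, area_cost: float) -> float:
    """Blend delay and area cost by criticality (assumed 0 = slack, 1 = critical)."""
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * area_cost


# Hypothetical costs, chosen only to show how the blend shifts.
area_only = route_cost(0.0, delay_cost=4.0, area_cost=10.0)   # non-critical signal
delay_only = route_cost(1.0, delay_cost=4.0, area_cost=10.0)  # fully critical signal
even_mix = route_cost(0.5, delay_cost=4.0, area_cost=10.0)
print(area_only, delay_only, even_mix)
```

A non-critical signal is charged purely for area, a fully critical one purely for delay, and everything in between is a linear mix.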


Methodology

• Benchmarks
  • Retimed, not C-slowed

• Graphs
  • Increase arch to fit (cells, tracks/cell)
  • Variation around local minima

[Plot: AREA, DELAY, and AREA*DELAY vs. delays per BC/register (1 to 7)]

[Plot: % pipelined signals (0% to 80%) per netlist]


Registers in Logic Blocks

• Output Registers

• No Registers

• Input Registers

[Plot: AREA, DELAY, and AREA*DELAY vs. registers in functional units (Out, None, In)]

5% 20% 23%


Delays per Register/BC

• 1 Delay/BC

• 2 Delays/BC

[Plot: AREA, DELAY, and AREA*DELAY vs. delays per BC/register (1 to 7)]

15% 20% 30%


BC Track Length

• Length 16 BC wires

• Length 8 BC wires

[Plot: AREA, DELAY, and AREA*DELAY vs. BC track length (32, 16, 8, 4)]

17% 64% 69%


Routing Resource Mix (BC vs. non-BC)

• 5/7

• 7/7

[Plot: AREA, DELAY, and AREA*DELAY vs. proportion of BC tracks (7/7, 6/7, 5/7, 4/7, 3/7)]

19% 17% 18%


GPRs per Cell

• GPR roles:
  • Registers from computation
  • Passthrough for changing tracks

• 6 per cell

• 9 per cell

[Plot: AREA, DELAY, and AREA*DELAY vs. GPRs per cell (5 to 10)]

6% 23% 22%


Overall – vs. RaPiD-I

• RaPiD-I
  • 1 BC / cell (13 LBs long)
  • 5/7 BC tracks
  • 3 registers / BC
  • 6 GPRs / cell
  • registered outputs

• Post-Explore
  • 1 BC / cell (16 LBs long)
  • 5/7 BC tracks
  • 3 registers / BC
  • 9 GPRs / cell
  • registered inputs

[Plot: ratio Post-Explore/RaPiD-I (0 to 1.4) of AREA, DELAY, and AREA*DELAY for benchmarks firtm, fft16, cascade, matmult4, sobel, imagerapid, firsymeven, sort_g, sort_rb]

Ave: 1% 18% 19%


Overall – Pipelining Cost

[Plot: normalized area (0 to 1.6) vs. % pipelined signals (0% to 80%)]

Ave: 18% cost


Conclusions

• Router for arbitrary pipelined architectures
  • Timing-driven
  • Supports multiple delays at each register site
  • Good quality: area within 18% of the pseudo-lower bound (non-pipelined area)

• Architecture Exploration of RaPiD
  • Parameters:
    • Registered inputs on functional units
    • Length 16 wires
    • 3 delays per BC/register
    • 2/7 non-registered, 5/7 registered wires
    • 9 GPRs/cell to improve flexibility
  • Delay: spacing of registers CRITICAL, too close better than too far
  • 19% area*delay improvement over RaPiD-I (primarily delay)


*** End of Talk Marker ***


1-Delay Two Terminal

• Can do optimal routing for 1-delay routes via BFS

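One way to picture a BFS-based 1-delay search is a breadth-first search over (node, registers latched) states. The sketch below is a simplified illustration, not PipeRoute itself. It assumes unit-cost hops, a directed graph given as a dictionary, and register sites that can either latch or pass a signal through. The name `one_delay_route` and the example graph are hypothetical. It also does not prevent a path from visiting the same node once before and once after the latch, which a real router would have to handle.

```python
from collections import deque


def one_delay_route(graph, registers, source, sink):
    """Fewest-hop path from source to sink that latches at exactly one register.

    Returns (path, register_node), or None if no such path exists.
    """
    start = (source, 0)              # state = (node, registers latched so far)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, used = state
        if node == sink and used == 1:
            path, reg = [], None
            while state is not None:
                prev = parent[state]
                if prev is not None and prev[1] == 0 and state[1] == 1:
                    reg = state[0]   # the hop where the delay was picked up
                path.append(state[0])
                state = prev
            return path[::-1], reg
        for nxt in graph.get(node, ()):
            options = [(nxt, used)]              # pass through without latching
            if used == 0 and nxt in registers:
                options.append((nxt, 1))         # latch at this register site
            for option in options:
                if option not in parent:
                    parent[option] = state
                    queue.append(option)
    return None


# Hypothetical graph: S-a-T is shorter, but only S-b-r-T crosses a register.
graph = {"S": ["a", "b"], "a": ["T"], "b": ["r"], "r": ["T"], "T": []}
result = one_delay_route(graph, {"r"}, "S", "T")
print(result)
```

In this example the search skips the shorter unregistered path and returns the route through `r`. With non-uniform (for example, congestion-based) costs, the queue would need to become a priority queue.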



N-Delay Two Terminal

• Greedy Approximation via 1-Delay Router
  • Find 1-delay route
  • While not enough delay on route
    • Replace any 0-delay segment with cheapest 1-delay replacement
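The greedy loop above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation. It assumes hop count as the cost and reads "cheapest" as the lowest-cost replacement. It also assumes a replacement must avoid nodes used by the rest of the route and that at least one delay is requested. All function names and the example graph are hypothetical, and the inner search shares the simplifications of a plain layered BFS.

```python
from collections import deque


def one_delay(graph, registers, src, dst, banned):
    """Fewest-hop src->dst path latching exactly one register, avoiding `banned`.

    Returns (hops, path, index_of_latched_register) or None.
    """
    start = (src, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, used = state
        if node == dst and used == 1:
            states = []
            while state is not None:
                states.append(state)
                state = parent[state]
            states.reverse()
            path = [s[0] for s in states]
            reg_index = next(i for i, s in enumerate(states) if s[1] == 1)
            return len(path) - 1, path, reg_index
        for nxt in graph.get(node, ()):
            if nxt in banned or nxt == src:
                continue
            options = [(nxt, used)]
            if used == 0 and nxt in registers and nxt != dst:
                options.append((nxt, 1))
            for option in options:
                if option not in parent:
                    parent[option] = state
                    queue.append(option)
    return None


def greedy_n_delay(graph, registers, source, sink, n):
    """Greedily build a route with n >= 1 latched registers, or return None."""
    first = one_delay(graph, registers, source, sink, banned=set())
    if first is None:
        return None
    _, path, r = first
    segments = [path[:r + 1], path[r:]]      # 0-delay pieces split at the latch
    while len(segments) - 1 < n:
        best = None
        for i, seg in enumerate(segments):
            others = {v for j, s in enumerate(segments) if j != i for v in s}
            others -= {seg[0], seg[-1]}
            cand = one_delay(graph, registers, seg[0], seg[-1], banned=others)
            if cand is not None and (best is None or cand[0] < best[1][0]):
                best = (i, cand)
        if best is None:
            return None                      # no segment can absorb another delay
        i, (_, p, r) = best
        segments[i:i + 1] = [p[:r + 1], p[r:]]
    route = segments[0] + [v for s in segments[1:] for v in s[1:]]
    latched = [s[-1] for s in segments[:-1]]
    return route, latched


# Hypothetical graph with two register sites, r1 and r2.
graph = {"S": ["a", "r1"], "a": ["T"], "r1": ["b"],
         "b": ["T", "r2"], "r2": ["T"], "T": []}
two_delay = greedy_n_delay(graph, {"r1", "r2"}, "S", "T", 2)
print(two_delay)
```

Here the first pass finds the 1-delay route through `r1`, and the loop then replaces the 0-delay segment from `r1` to `T` with one that latches at `r2`. Because the approach is greedy, it can fail or return a costlier route than an exhaustive search would.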
