Exploration of Pipelined FPGA Interconnect Structures
Scott Hauck, Akshay Sharma, Carl Ebeling, University of Washington
Katherine Compton, University of Wisconsin - Madison
PipeRoute
• FPGA’2003: Pipelining-aware Router for FPGAs
• Architecture-adaptive, based on Pathfinder
• Uses optimal 2-terminal, 1-delay router
• Greedy formulation for multi-delay, multi-terminal routing
[Figure: routing example with source S and terminals T1, T2]
RaPiD
• Coarse-grained, 1D, 16-bit, w/DSP Units
• Carl Ebeling @ UW-CSE
• Pipelined interconnect via Bus Connectors (BCs)
[Figure: RaPiD cell datapath built from GPR, RAM, MULT, and ALU units]
Pipelined Routing Results
• Area expansion due to pipelining
• Normalized to unpipelined circuit area
[Figure: normalized area (0 to 3) vs. % pipelined signals (0% to 70%)]
Ave: 75% cost
Contributions
• Optimized PipeRoute
• Support multiple delays per BC (greedy preprocessor)
• Timing-driven: Pathfinder’s formulation, using the worst-case criticality across the signal
• RouteCost = criticality * delay_cost + (1 - criticality) * area_cost
• Arch. Exploration of RaPiD Pipelined Interconnects
• Registered logic block (input/output/none)
• BC track length
• Delays per register/BC
• BC/non-BC routing mix
• Register-only logic blocks
• Goal: More efficient support of pipelined interconnects
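The RouteCost blend above can be sketched as a small function. This is only an illustration: the function names are hypothetical, and reading "worst-case criticality across signal" as the maximum over the signal's sinks is an assumption, not something the slides spell out.

```python
def signal_criticality(sink_criticalities):
    """Assumed reading of 'worst-case criticality across signal':
    the largest criticality among the signal's sinks."""
    return max(sink_criticalities)


def route_cost(criticality, delay_cost, area_cost):
    """RouteCost = criticality * delay_cost + (1 - criticality) * area_cost."""
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * area_cost


# A highly critical signal is priced mostly by delay,
# a non-critical one mostly by area.
crit = signal_criticality([0.2, 0.9, 0.5])
cost = route_cost(crit, delay_cost=4.0, area_cost=10.0)
```

With criticality near 1 the router pays for delay; near 0 it pays for area, which matches the trade-off the formula expresses.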
Methodology
• Benchmarks
• Retimed, not C-slowed
• Graphs
• Increase arch to fit (cells, tracks/cell)
• Variation around local minima
[Figure: AREA, DELAY, and AREA*DELAY vs. delays per BC/Reg (1 to 7)]
[Figure: % pipelined signals per netlist (0% to 80%)]
Registers in Logic Blocks
• Output Registers
• No Registers
• Input Registers
[Figure: AREA, DELAY, and AREA*DELAY vs. regs in functional units (Out, None, In)]
Annotated (AREA, DELAY, AREA*DELAY): 5%, 20%, 23%
Delays per Register/BC
• 1 Delay/BC
• 2 Delays/BC
[Figure: AREA, DELAY, and AREA*DELAY vs. delays per BC/Reg (1 to 7)]
Annotated (AREA, DELAY, AREA*DELAY): 15%, 20%, 30%
BC Track Length
• Length 16 BC wires
• Length 8 BC wires
[Figure: AREA, DELAY, and AREA*DELAY vs. BC track length (32, 16, 8, 4)]
Annotated (AREA, DELAY, AREA*DELAY): 17%, 64%, 69%
Routing Resource Mix (BC vs. non-BC)
• 5/7
• 7/7
[Figure: AREA, DELAY, and AREA*DELAY vs. proportion of BC tracks (7/7, 6/7, 5/7, 4/7, 3/7)]
Annotated (AREA, DELAY, AREA*DELAY): 19%, 17%, 18%
GPRs per Cell
• GPR roles:
• Registers from computation
• Passthrough for changing tracks
• 6 per cell
• 9 per cell
[Figure: AREA, DELAY, and AREA*DELAY vs. GPRs per cell (5 to 10)]
Annotated (AREA, DELAY, AREA*DELAY): 6%, 23%, 22%
Overall – vs. RaPiD-I
• RaPiD-I
• 1 BC / cell (13 LBs long)
• 5/7 BC tracks
• 3 registers / BC
• 6 GPRs / cell
• registered outputs
• Post-Explore
• 1 BC / cell (16 LBs long)
• 5/7 BC tracks
• 3 registers / BC
• 9 GPRs / cell
• registered inputs
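The two parameter sets above can be written down as plain records to make the differences explicit. This is a sketch only: the class and field names are hypothetical, and the values are copied from the two bullet lists.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InterconnectConfig:
    # Field names are illustrative, not taken from any RaPiD tool.
    bcs_per_cell: int
    bc_length_lbs: int      # BC track length, in logic blocks (LBs)
    bc_track_fraction: str  # share of tracks that have BCs
    registers_per_bc: int
    gprs_per_cell: int
    registered_side: str    # "outputs" or "inputs"


RAPID_I = InterconnectConfig(1, 13, "5/7", 3, 6, "outputs")
POST_EXPLORE = InterconnectConfig(1, 16, "5/7", 3, 9, "inputs")


def changed_fields(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    a, b = asdict(old), asdict(new)
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


diff = changed_fields(RAPID_I, POST_EXPLORE)
```

Only three fields differ between the two configurations: BC track length, GPRs per cell, and which side of the functional unit is registered.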
[Figure: ratio Post/RaPiD-I (0 to 1.4) of AREA, DELAY, and AREA*DELAY per benchmark: firtm, fft16, cascade, matmult4, sobel, imagerapid, firsymeven, sort_g, sort_rb]
Ave (AREA, DELAY, AREA*DELAY): 1%, 18%, 19%
Overall – Pipelining Cost
[Figure: normalized area (0 to 1.6) vs. % pipelined signals (0% to 80%)]
Ave: 18% cost
Conclusions
• Router for arbitrary pipelined architectures
• Timing-driven
• Supports multiple delays at each register site
• Good quality: area <18% above the pseudo-lower bound (non-pipelined area)
• Architecture Exploration of RaPiD
• Parameters:
• Registered inputs on functional units
• Length 16 wires
• 3 delays per BC/register
• 2/7 non-registered, 5/7 registered wires
• 9 GPRs/cell to improve flexibility
• Delay: spacing of registers CRITICAL, too close better than too far
• 19% area*delay improvement over RaPiD-I (primarily delay)
*** End of Talk Marker ***
1-Delay Two Terminal
• Can do optimal routing for 1-delay routes via BFS
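One way to picture a BFS-based 1-delay router is a search over (node, delays so far) states, sketched below. This is an illustration, not PipeRoute's actual implementation. It assumes unit-cost edges (so BFS gives the fewest hops), assumes the source and sink are not registers, and does not stop a node from being reused on both sides of the register. The function and argument names are hypothetical.

```python
from collections import deque


def one_delay_route(graph, registers, source, sink):
    """Fewest-hop source->sink path that passes through exactly one register.

    graph: dict mapping node -> list of successor nodes (unit-cost edges).
    registers: set of nodes that add one delay when traversed.
    Returns the path as a list of nodes, or None if no such path exists.
    """
    start = (source, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, delays = state
        if node == sink and delays == 1:
            path = []
            while state is not None:  # walk parents back to the source
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt in graph.get(node, ()):
            nxt_delays = delays + (1 if nxt in registers else 0)
            nxt_state = (nxt, nxt_delays)
            if nxt_delays <= 1 and nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


example_graph = {"S": ["a", "r1"], "a": ["T"], "r1": ["b"], "b": ["T"], "T": []}
route = one_delay_route(example_graph, {"r1"}, "S", "T")
```

In the example the shorter path S, a, T is rejected because it picks up no delay; the search returns the path through r1 instead. With weighted edges, Dijkstra over the same states would take the place of BFS.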
N-Delay Two Terminal
• Greedy Approximation via 1-Delay Router
• Find 1-delay route
• While not enough delay on route
• Replace any 0-delay segment with cheapest 1-delay replacement
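The greedy loop above can be sketched as follows. This is a simplified illustration rather than PipeRoute itself: edges have unit cost, "cheapest" means fewest added hops, the source and sink are assumed not to be registers, and a replacement may not use nodes already on the rest of the route. All names are hypothetical.

```python
from collections import deque


def one_delay_path(graph, registers, src, dst, blocked=frozenset()):
    """BFS over (node, delays) states: fewest-hop src->dst path whose
    interior contains exactly one register. Endpoints are not counted."""
    start = (src, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, delays = state
        if node == dst and delays == 1:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt in blocked:
                continue
            gained = nxt in registers and nxt != dst
            nxt_state = (nxt, delays + gained)
            if nxt_state[1] <= 1 and nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None


def greedy_n_delay_route(graph, registers, source, sink, n):
    """Start from a 1-delay route, then add one delay per iteration."""
    route = one_delay_path(graph, registers, source, sink)
    if route is None:
        return None
    while sum(node in registers for node in route) < n:
        # Source, registers, and sink split the route into 0-delay segments.
        regs = [i for i, v in enumerate(route) if v in registers]
        cuts = [0] + regs + [len(route) - 1]
        best = None
        for lo, hi in zip(cuts, cuts[1:]):
            if lo == hi:
                continue
            blocked = set(route[:lo]) | set(route[hi + 1:])
            cand = one_delay_path(graph, registers, route[lo], route[hi], blocked)
            if cand is None:
                continue
            added = len(cand) - (hi - lo + 1)  # extra hops vs. old segment
            if best is None or added < best[0]:
                best = (added, lo, hi, cand)
        if best is None:
            return None  # no segment can absorb another delay
        _, lo, hi, cand = best
        route = route[:lo] + cand + route[hi + 1:]
    return route


example_graph = {"S": ["r1"], "r1": ["a", "r2"], "a": ["T"],
                 "r2": ["b"], "b": ["T"], "T": []}
route = greedy_n_delay_route(example_graph, {"r1", "r2"}, "S", "T", 2)
```

Each iteration swaps one 0-delay segment for a 1-delay replacement, so the route gains exactly one delay per pass. Because the choice is greedy, the result is an approximation and may cost more than the best possible N-delay route.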