Bachelor of Applied Science Thesis Defense
An Analysis of Network-on-Chip Implementations on Field Programmable Gate Arrays
Kevan Thompson
Computer Engineering
School of Engineering Science, SFU
Overview
Introduction
Background
Methodology
Results
Conclusions and Future Work
Introduction
[Figure: Size of Xilinx FPGAs. Number of Logic Cells (0 to 2,500,000) versus Year (2000 to 2012); labelled devices: Virtex-2P, Virtex-4, Virtex-5, Virtex-6, Virtex-7]
ASIC Vs FPGA
ASIC: Completely Custom Design
Large Initial Investment
Need to carefully design interconnect between nodes
FPGA:
Reconfigurable
Low cost for small volume runs
Wires already placed on the FPGA
Objective
Identify improvements in the Xilinx tools that have significantly affected the performance of NoCs on FPGAs
Identify improvements in NoC performance on FPGAs that are possible using manual place and route (PAR)
Show that the Star and Fully Connected topologies do not fit into current models
NoC Terminology
[Figure: example topologies. Panels: Ring, Star, Mesh, Fully Connected]
• Topology
• Node
• Degree
• Average Node Degree (AND)
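The terms above can be made concrete with a short sketch. It assumes the usual graph-theoretic definitions (a node's degree is the number of links attached to it, and the AND is the mean degree over all nodes). The helper names and the 16-node, 4x4 sizes are illustrative and do not come from the slides.

```python
from itertools import combinations


def ring(n):
    """Each node links to its successor, wrapping around at the end."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def star(n):
    """Node 0 is the hub; every other node links only to the hub."""
    return {frozenset((0, i)) for i in range(1, n)}


def mesh(rows, cols):
    """2D grid without wrap-around links."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                links.add(frozenset((node, node + 1)))
            if r + 1 < rows:
                links.add(frozenset((node, node + cols)))
    return links


def fully_connected(n):
    """Every pair of nodes shares a link."""
    return {frozenset(pair) for pair in combinations(range(n), 2)}


def average_node_degree(links, n):
    # Every link adds one to the degree of each of its two endpoints.
    return 2 * len(links) / n


for name, links in [
    ("Ring", ring(16)),
    ("Star", star(16)),
    ("Mesh 4x4", mesh(4, 4)),
    ("Fully Connected", fully_connected(16)),
]:
    print(f"{name}: AND = {average_node_degree(links, 16)}")
```

Under these definitions the ring's AND stays at 2 for any size, while the fully connected network's AND grows as one less than the node count.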
Previous Work on NoCs on FPGAs
For Xilinx FPGAs:
Methodology
[Figure: node block diagram. Labels: Multiplier Node, Input FSL, Output FSL, Network Switch]
8-bit multiplier node
Two Fast Simplex Links (FSLs)
Network topology communication switch
FSLs: 16-word-deep queues, 24-bit width
Multiplier uses 981 flip-flops and 653 LUTs
FPGA: Xilinx Virtex-5 xc5vlx330
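For a rough sense of scale, the per-node figures above can be multiplied out for a given network size. This sketch counts only the multiplier's flip-flops and LUTs and the FSL queue storage; switch logic is left out because the slides do not give its cost. The device capacity is an assumption (51,840 slices with four LUTs and four flip-flops each, as commonly listed for the xc5vlx330) and does not appear in the slides.

```python
MULT_FFS = 981       # flip-flops per multiplier (from the slide)
MULT_LUTS = 653      # LUTs per multiplier (from the slide)
FSL_DEPTH = 16       # words per FSL queue (from the slide)
FSL_WIDTH = 24       # bits per word (from the slide)
FSLS_PER_NODE = 2    # one input FSL and one output FSL

# Assumed xc5vlx330 capacity: 51,840 slices x 4 LUTs and 4 flip-flops each.
DEVICE_LUTS = 51_840 * 4
DEVICE_FFS = 51_840 * 4


def multiplier_usage(num_nodes):
    """Return (flip-flops, LUTs, FSL queue bits) for multipliers and FSLs only."""
    ffs = num_nodes * MULT_FFS
    luts = num_nodes * MULT_LUTS
    fsl_bits = num_nodes * FSLS_PER_NODE * FSL_DEPTH * FSL_WIDTH
    return ffs, luts, fsl_bits


for n in (8, 32, 96):
    ffs, luts, bits = multiplier_usage(n)
    print(
        f"{n} nodes: {ffs} FFs ({100 * ffs / DEVICE_FFS:.1f}%), "
        f"{luts} LUTs ({100 * luts / DEVICE_LUTS:.1f}%), "
        f"{bits} FSL queue bits"
    )
```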
Results
10.1 Tools Vs 12.1 Tools
• Star, Ring, and Fully Connected Networks
Predicted Vs Measured Results
• Star and Fully Connected Networks
Manual Implementation
• Ring, Star, and Mesh Networks
10.1 Tools Vs 12.1 Tools for Star Networks
[Figure: Maximum Frequency (MHz, 0 to 250) versus Number of Nodes (8 to 64); series: 10.1 Tools, 12.1 Tools]
10.1 Tools Vs 12.1 Tools for Ring Networks
[Figure: Maximum Frequency (MHz, 192 to 210) versus Number of Nodes (8 to 64); series: 10.1 Tools, 12.1 Tools]
10.1 Tools Vs 12.1 Tools for Fully Connected Networks
[Figure: Maximum Frequency (MHz, 0 to 200) versus Number of Nodes (8 to 48); series: 10.1 Tools, 12.1 Tools]
Percent Improvement of 12.1 Tools Over 10.1 Tools
[Figure: Percent Increase (0% to 80%) versus Number of Nodes (8 to 64); series: Star, Ring, Fully Connected]
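A percent improvement like the one plotted here is commonly computed as the relative increase in maximum frequency. The slides do not state the exact formula, so the function below is an assumed definition, and the example frequencies are made-up placeholders rather than measured values.

```python
def percent_improvement(f_old_mhz, f_new_mhz):
    """Relative increase of the new maximum frequency over the old one, in percent."""
    return (f_new_mhz - f_old_mhz) / f_old_mhz * 100.0


# Hypothetical frequencies for illustration only (not thesis data).
example_old_mhz = 100.0
example_new_mhz = 150.0
print(f"{percent_improvement(example_old_mhz, example_new_mhz):.1f}%")
```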
Star Networks
[Figure: Predicted Vs Measured Results. Maximum Frequency (MHz, 0 to 250) versus Number of Nodes (8 to 96); series: Measured Results, Predicted Results]
y = -0.3090x + 203.8
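The line y = -0.3090x + 203.8 on this slide can be evaluated directly. The sketch below assumes that x is the number of nodes and y is the maximum frequency in MHz, which matches the plot axes but is not stated explicitly on the slide. The function name is illustrative.

```python
SLOPE_MHZ_PER_NODE = -0.3090   # coefficient of x, from the slide
INTERCEPT_MHZ = 203.8          # constant term, from the slide


def fitted_frequency_mhz(num_nodes):
    """Evaluate the slide's line, assuming x = number of nodes and y = MHz."""
    return SLOPE_MHZ_PER_NODE * num_nodes + INTERCEPT_MHZ


for n in (8, 16, 32, 48, 64, 80, 96):
    print(f"{n} nodes: {fitted_frequency_mhz(n):.1f} MHz")
```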
Results
Results for Adjusted Model
[Figure: Adjusted Predicted Vs Measured Result. Maximum Frequency (MHz, 0 to 250) versus Number of Nodes (8 to 96); series: Measured Results, Predicted Results]
Comparison of Models
[Figure: Percent Difference Between Predicted and Measured Results. Percent Difference (-20% to 60%) versus Number of Nodes (8 to 96); series: Original Model, Adjusted Model]
Prediction of Adjusted Model for Random Networks
[Figure: Percent Error for Random Networks. Percent Error (-70% to 0%) versus Average Node Degree (2 to 10); series: Random_16, Random_32, Random_48]
Fully Connected Networks
[Figure: Predicted Vs Measured Results for Fully Connected Networks. Maximum Frequency (MHz, -300 to 300) versus Number of Nodes (8 to 40); series: Measured, Predicted]
Results
[Figure: Interpolated Results for Fully Connected Networks. Maximum Frequency (MHz, 0 to 200) versus Number of Nodes (5 to 55)]
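The slides do not say which interpolation method produced this curve. As one possible reading, the sketch below uses piecewise-linear interpolation between measured points. The data points are hypothetical placeholders, not thesis measurements, and the function name is illustrative.

```python
from bisect import bisect_right


def interpolate(points, x):
    """Piecewise-linear interpolation over (x, y) points sorted by x."""
    xs = [p[0] for p in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the measured range")
    i = bisect_right(xs, x)
    if i == len(xs):
        # x coincides with the last measured point.
        return points[-1][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical (number of nodes, maximum frequency in MHz) pairs.
HYPOTHETICAL_POINTS = [(8, 180.0), (16, 160.0), (32, 100.0)]

for n in (8, 12, 24, 32):
    print(f"{n} nodes: {interpolate(HYPOTHETICAL_POINTS, n):.1f} MHz")
```

Interpolation of this kind only fills in values between measured sizes; the function deliberately refuses to extrapolate beyond them.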
CAD Tool Synthesis Steps
1. Behavioural-level Synthesis [14]: HDL is parsed for recognizable constructs
2. Technology Mapping [15]: Constructs are mapped to the specific FPGA's technology
3. Placement [16]: Components of the design are placed on the FPGA using Simulated Annealing
4. Routing [16]: Wires are connected between the components using an algorithm called Pathfinder
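The placement step can be illustrated with a toy simulated annealing loop. This is a minimal sketch and not the Xilinx implementation: it assumes a square grid of sites, two-pin nets, a Manhattan wirelength cost, and a fixed geometric cooling schedule. All names and parameter values are illustrative.

```python
import math
import random


def wirelength(placement, nets):
    """Total Manhattan distance over all two-pin nets."""
    total = 0
    for a, b in nets:
        (ax, ay), (bx, by) = placement[a], placement[b]
        total += abs(ax - bx) + abs(ay - by)
    return total


def anneal(num_blocks, nets, grid, seed=0, t_start=10.0, t_end=0.01,
           cooling=0.95, moves_per_temp=200):
    """Return (best placement, initial cost, best cost)."""
    rng = random.Random(seed)
    # sites[i] is the location of block i; entries past num_blocks are empty sites.
    sites = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(sites)
    cost = wirelength(sites, nets)
    initial_cost = cost
    best, best_cost = list(sites), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            i = rng.randrange(num_blocks)   # block to move
            j = rng.randrange(len(sites))   # other block or empty site
            if i == j:
                continue
            sites[i], sites[j] = sites[j], sites[i]
            new_cost = wirelength(sites, nets)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with probability exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(sites), cost
            else:
                sites[i], sites[j] = sites[j], sites[i]  # undo the swap
        t *= cooling
    return best[:num_blocks], initial_cost, best_cost


# A 16-node ring placed on a 4x4 grid.
ring_nets = [(i, (i + 1) % 16) for i in range(16)]
placement, start_cost, final_cost = anneal(16, ring_nets, grid=4)
print(f"wirelength: {start_cost} -> {final_cost}")
```

Accepting some cost-increasing moves at high temperature is what lets the search escape poor local arrangements; as the temperature falls, the loop behaves more and more like a greedy improvement pass.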
Automatic PAR of a 96-Node Ring Network
Manual PAR of a 96-Node Ring Network
Ring Network Pre and Post PlanAhead Results
[Figure: Ring Pre and Post PlanAhead Results. Maximum Frequency (MHz, 0 to 250) versus Number of Nodes (8 to 128); series: Pre-PlanAhead, Post-PlanAhead]
Star Network Pre and Post PlanAhead Results
[Figure: Star Pre and Post PlanAhead Results. Maximum Frequency (MHz, 0 to 250) versus Number of Nodes (8 to 96); series: Pre-PlanAhead, Post-PlanAhead]
Mesh Network Pre and Post PlanAhead Results
[Figure: Mesh Pre and Post PlanAhead Results. Maximum Frequency (MHz, 175 to 210) versus Number of Nodes (8 to 48); series: Pre-PlanAhead, Post-PlanAhead]
Conclusions
Xilinx 12.1 Tools offer significant improvements in the PAR of NoCs on FPGAs
The analytical model proposed by Lee et al. [1] does not accurately predict the performance of Star and Fully Connected Networks
Using manual PAR it is possible to improve the performance of NoCs on FPGAs
Future Work
Compare the performance of the Xilinx 10.1 tool suite and the Xilinx 12.1 tool suite for link widths of 16 and 32 bits
Build Star and Fully Connected networks with link widths of 16 and 32 bits
Create manual implementations for Torus and Hypercube topologies
Acknowledgements
Dr. Lesley Shannon
Dr. Ash Parameswaran
Michael Sjoerdsma
Viewers Like you!
References
[1] J. Lee. “An Analytical Model Describing The Performance Of Application-Specific Networks-On-Chip On Field-Programmable Gate Arrays”, M.A.Sc. thesis, Simon Fraser University, Canada, 2007.
[2] Xilinx. “Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet”. 2010. Available: http://www.xilinx.com/support/documentation/data_sheets/ds083.pdf
[3] Xilinx. “Virtex-4 Family Overview”. 2010. Available: http://www.xilinx.com/support/documentation/data_sheets/ds112.pdf
[4] Xilinx. “Virtex-5 Family Overview”. 2010. Available: http://www.xilinx.com/support/documentation/data_sheets/ds100.pdf
[5] Xilinx. “Virtex-6 Family Overview”. 2010. Available: http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf
[6] Xilinx. “Virtex-7 Product Table”. 2010. Available: http://www.xilinx.com/publications/prod_mktg/Virtex7-Product-Table.pdf
[7] Xilinx. “What's New in Xilinx ISE Design Suite 12”. 2010. Available: http://www.xilinx.com/support/documentation/sw_manuals/xilinx12_1/whatsnew.htm#121
[8] Cisco Systems Inc. “Fiber Distributed Data Interface”. 2010. Available: http://docwiki.cisco.com/wiki/Fiber_Distributed_Data_Interface
[9] Cisco Systems Inc. “Token Ring/IEEE 802.5”. 2010. Available: http://docwiki.cisco.com/wiki/Token_Ring/IEEE_802.5
[10] Cisco Systems Inc. “Ethernet Technologies”. 2010. Available: http://docwiki.cisco.com/wiki/Ethernet_Technologies
[11] Kompics. “Distributed System Launcher”. 2010. Available: http://kompics.sics.se/trac/wiki/DistributedSystemLauncher
[12] T. Kranenburg, R. van Leuken. “MB-LITE: A robust, light-weight soft-core implementation of the MicroBlaze architecture”, DATE, France, 2010.
[13] K. Eguro, S. Hauck, A. Sharma. “Architecture-Adaptive Range Limit Windowing for Simulated Annealing FPGA Placement”, DAC, United States, 2005.
[14] G. Grewal, M. O’Cleirigh, M. Wineberg. “An Evolutionary Approach to Behavioral-Level Synthesis”, CEC, Australia, 2003.
[15] C. Legl, B. Wurth, K. Eckl. “A Boolean Approach to Performance-Directed Technology Mapping for LUT-Based FPGA Designs”, DAC, United States, 1996.
[16] S. Chin, S. Wilton. “An Analytical Model Relating FPGA Architecture And Place And Route Runtime”, FPL, Czech Republic, 2009.
[17] R. Gindin, I. Cidon, I. Keidar. “NoC-Based FPGA: Architecture and Routing”, NOCS, United States, 2007.
Questions?