AN INTERACTIVE EMBEDDED SYSTEM & AI TO PLAY CONNECT-5
David Biancolin, Ritchie Zhao, Mohamad Kayed
CONTENTS
Overview
Outcome
    AI Performance
    Achieved Features vs Proposed Features
    Future Work
Project Schedule
System Description
    AI Algorithm
    Hardware Accelerator Description
        Interface Description
        Sub-block Description
    Embedded C Description
        File Structure and Description
        Embedded Code Flow and Function Description
Tips and Tricks
    Tips & Tricks: Accelerators
    Tips & Tricks: Vivado HLS
        Understanding Interfaces
        Understanding Pragmas and Directives
        Some Final Tricks…
OVERVIEW
Connect-5 is a well-known board game in which two players take turns placing stones on a board. The first player to form a line of five stones in a row wins the game. Competitive two-player board games are a popular topic in the FPGA research community, as evidenced by the appearance of games such as Connect-6 and Blokus in the 2012 and 2013 International Conferences on Field Programmable Technology (ICFPT). The project was motivated by the team members' desire to learn about high-level synthesis (HLS) tools such as Xilinx's Vivado HLS and how they can be used to generate hardware from C code.
Figure F3. Example board state in a Connect-5 game running on the final project
Two main goals were proposed for the project:
1. Build a standalone FPGA implementation of the Connect-5 board game using the touch screen peripheral.
The system should support matches between any combination of human and AI players.
2. Build an AI using hardware accelerators that is capable of beating human players and runs faster than the same AI on a modern PC.
Figure F4. Basic block diagram of the project showing all pcores used.
Figure F4 above gives an overview of the FPGA project. The IP cores used, along with their descriptions, are listed below:
IP Name            Description
MicroBlaze         Xilinx's proprietary soft-core embedded processor, clocked at 50 MHz.
HW Accelerator     Vivado HLS-generated accelerator capable of evaluating (scoring) all potential moves for a given board state.
TouchSensor        Interrupt-driven touch screen used to accept input for human game moves, to change the game state, and to reset the game.
TFT Display        Paired with the TouchSensor, the TFT renders the 11x11 game board and the control buttons; each rendered region maps to a touch button.
Memory Controller  Manages transactions to the on-board DDR2. The TFT reads its input images from this shared memory space.
DDR2               Managed by the memory controller; 128 MB total.
UART               Used to communicate with a PC; enables the FPGA to be pitted against the PC version.
Timer              Generates the interrupts required for software-intrusive profiling with mb-gprof.
Table T2. List of IP used in the project
OUTCOME
The two key objectives outlined in the Overview were met. First, an interactive game environment was built using the TFT touch sensor. The game uses a reduced board size (11 x 11), as larger boards would make each square too small on the touch screen. Each player can be switched between three modes:
1) Human: a move is played when a human player touches a valid square on the touch sensor's game board.
2) FPGA: the FPGA's native AI generates a move.
3) UART: the FPGA waits on an interrupt from a USB-connected host CPU.
Figure X. Final version of the UI (labelled regions: Black Player Mode Select, White Player Mode Select, 11x11 Gameboard, Reset Button).
AI PERFORMANCE
The second objective of this project was to implement an AI on the FPGA that outperformed its equivalent implementation running on a PC. A description of the AI algorithm can be found in the AI Algorithm section. With a speculation depth of 5 future moves and a breadth of the best 8 available moves per level, the FPGA implementation ran BX faster than the PC implementation.
Profiles of the PC and embedded variants follow in Figures X2 and X3 and show where each implementation spends most of its execution time.
Figure X2. Profile of the PC implementation:
     % cumulative     self               self    total
  time    seconds  seconds     calls   s/call   s/call  name
 27.20      15.21     5.21   2897546     0.00     0.00  count_vert
 27.04      10.38     5.18   2897546     0.00     0.00  count_horiz
 19.56      14.12     3.74   2897546     0.00     0.00  count_se
 18.46      17.66     3.53   2897546     0.00     0.00  count_ne
  5.28      18.67     1.01     22621     0.00     0.00  get_best_move_n
  1.67      18.99     0.32   2648713     0.00     0.00  copy_board
  0.52      19.09     0.10  11590184     0.00     0.00  add_counts
  0.16      19.12     0.03         1     0.03    19.14  tree_search
  0.10      19.14     0.02   2897545     0.00     0.00  board_count_score
Figure X3. Profile of Final FPGA Variant (Breadth of 36, Depth of 3):
     % cumulative     self               self    total
  time    seconds  seconds     calls   s/call   s/call  name
 55.37       6.34     6.34    116207     0.00     0.00  hardened_generate_board_count_score
 24.61       9.17     2.82      3321     0.00     0.00  hardened_get_best_move_n
  4.91       9.73     0.56    505538     0.00     0.00  XGenerate_board_count_score_IsDone
  3.38      10.11     0.39         3     0.13     3.67  tree_search
  2.08      10.35     0.24    268033     0.00     0.00  TFT_WriteToPixel
  1.68      10.55     0.19    239052     0.00     0.00  set_square
  1.56      10.73     0.18      3321     0.00     0.00  generate_unsorted_scores
  1.29      10.87     0.15    119528     0.00     0.00  XGenerate_board_count_score_Start
  1.14      11.00     0.13    119528     0.00     0.00  XGenerate_board_count_score_SetP
  1.10      11.13     0.13    116207     0.00     0.00  board_count_score
  1.08      11.25     0.12    119528     0.00     0.00  XGenerate_board_count_score_GetReturn
As the profile above shows, most of the system run time is spent in functions that interface directly with the hardware accelerator (hardened_generate_board_count_score alone accounts for about 55% of the run time), indicating that the accelerator is well utilized.
ACHIEVED FEATURES VS PROPOSED FEATURES
Both objectives set out for the project were met. In fact, the final speedup of the FPGA implementation exceeded our initial expectations. This speedup was achieved despite implementing less of the AI in hardware than was proposed. The proposed system would have performed the entire tree traversal in hardware, whereas the actual system performs the game tree traversal on the MicroBlaze and uses the accelerator to evaluate all of the potential moves for a given board state.
Going beyond the original proposal, a UART interface was created so that a PC can play against the FPGA AI, visibly demonstrating the speedup of the FPGA implementation over the original PC variant. This feature was added as further demonstration of the FPGA implementation's superior performance.
FUTURE WORK
Integration of UART functionality was intended to be a stepping stone towards playing the FPGA system against
competitive software Connect-5 AIs found online. A number of these AIs and a framework to compete against
them are available at gomocup.org, which organizes an annual world championship for Connect-5 (also known as
Gomoku).
By constraining the time these competitive AIs are allowed per move, the strength of our FPGA-implemented algorithm could be measured. This would enable data-driven exploration of our AI's parameter space, where we could trade off move breadth against depth, in addition to tuning the coefficients of the board threat evaluation. Such tuning would likely improve the strength of the current system considerably.
PROJECT SCHEDULE
Week    Milestones Proposed / Milestones Met
Feb 26  Proposed: Display game grid and identify square pressed; write function to evaluate a single board state
        Met: All met
Mar 5   Proposed: Display game state; create minimax search tree
        Met: All met
Mar 12  Proposed: AI finished; AI code ported to MicroBlaze; finish touchscreen GUI
        Met: AI is complete but buggy (makes mistakes while playing); no-tree AI ported; touchscreen GUI finished
Mar 19  Proposed: Create prototype accelerator; speed up AI compared to previous week
        Met: Prototype accelerator created; was not able to interface the accelerator with the MicroBlaze
Mar 26  Proposed: Move board evaluation function into accelerator; FPGA now faster than PC
        Met: Evaluation function moved into accelerator, but the resulting accelerator was unable to achieve speedup; was not able to achieve speedup over PC
Apr 2   Proposed: Move search tree into hardware; tune design to maximize search depth
        Met: Milestone deemed not possible; was not able to achieve speedup over PC; UART game interface created
Apr 9   Met: Achieved speedup using hardware accelerator; implemented ability to play games against a PC over UART
Figure F1. Comparison of milestones proposed with milestones achieved
As Figure F1 shows, although we met the project goals, we did not achieve a speedup over the PC until the final week. Two main unexpected challenges set the project back:
Creating a strong AI for Connect-5: While the board evaluation function and minimax search tree were written on schedule, the AI routinely underperformed after it was meant to be complete, due to the difficulty of tuning the numerical constants in the algorithm. The AI code should have been frozen on March 12 but saw many changes in mid-March, which affected work on hardware generation.
Defining interfaces for a custom block: While the processing elements were pushed through the HLS flow on schedule to create a hardware accelerator, defining interfaces for the pcore proved difficult. A number of different interfaces, such as streaming and FIFO, were tried, but they were either non-functional or too slow. As a result, the milestones for connecting the pcore to the MicroBlaze and using it to speed up the runtime were not met.
As mentioned before, the UART interface was implemented in the final project even though it was not part of the original proposal. It was added because setbacks, such as abandoning the plan to move the search tree into hardware, freed up manpower in the final weeks.
SYSTEM DESCRIPTION
AI ALGORITHM
The game algorithm consists of two main components: a board evaluation function and a minimax search tree. The board evaluation function assigns an integer score to a given board position, which corresponds to how “good” the position is for the current player. A winning position gets a high score, while a losing position gets a low (possibly negative) score. To decide on a move, an AI can play every possible next move, evaluate the resulting board position, and choose the best-scoring move. Such an AI performs a single-move look-ahead, and was the first version that was built.
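The single-move look-ahead can be sketched in C as follows. This is a minimal illustration, not the project's code: the function name one_move_lookahead, the eval_fn callback type, and the toy_evaluate scorer (which simply rewards stones near the centre) are hypothetical stand-ins, with the board assumed to be a 121-element array of 0/1/2 values as described later in this section.

```c
#include <assert.h>
#include <stdlib.h>

#define BOARD_DIM 11
#define NUM_SQUARES (BOARD_DIM * BOARD_DIM)
#define EMPTY 0

/* Scores a board from the point of view of `player` (1 or 2). */
typedef int (*eval_fn)(const unsigned char board[NUM_SQUARES], int player);

/* Toy evaluator for illustration only (NOT the threat-based function):
 * each of the player's stones is worth more the closer it is to the centre. */
static int toy_evaluate(const unsigned char board[NUM_SQUARES], int player)
{
    int score = 0;
    for (int sq = 0; sq < NUM_SQUARES; sq++)
        if (board[sq] == player)
            score += 10 - abs(sq / BOARD_DIM - 5) - abs(sq % BOARD_DIM - 5);
    return score;
}

/* Single-move look-ahead: play each empty square, score the result,
 * undo the move, and keep the best. Returns a square index, or -1 if full. */
int one_move_lookahead(unsigned char board[NUM_SQUARES], int player, eval_fn evaluate)
{
    assert(player == 1 || player == 2);
    int best_square = -1, best_score = 0;
    for (int sq = 0; sq < NUM_SQUARES; sq++) {
        if (board[sq] != EMPTY)
            continue;
        board[sq] = (unsigned char)player;      /* play the candidate move */
        int score = evaluate(board, player);
        board[sq] = EMPTY;                      /* undo it */
        if (best_square < 0 || score > best_score) {
            best_square = sq;
            best_score = score;
        }
    }
    return best_square;
}
```

With toy_evaluate on an empty board, the sketch picks square 60, the centre (row 5, column 5); the board is restored after every trial move.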
The board evaluation function for the game of Connect-5 uses the concept of threats. Consider a pattern of five consecutive squares in any direction (vertical, horizontal, or either diagonal). If all five squares are occupied by allied stones, the game is a victory for the current player. If any four squares are occupied (and the pattern contains no opposing stones), this constitutes a one-turn threat, as the opponent must play on the remaining square on its next turn to avoid a loss; two simultaneous four-stone threats therefore result in a win. If any three squares are occupied, the pattern is a two-turn threat. The board evaluation function counts all instances of two-, three-, and four-stone threats on the board for both the player and the opponent, multiplies the number of each type of threat by a scaling constant, and sums the products to obtain the final score. This is illustrated in the following code:
int score = C_P4*C.p4[p] - C_O3*C.p3[o] + C_P3*C.p3[p]
          - C_O2*C.p2[o] + C_P2*C.p2[p];
if      (C.p5[p] > 0) score =  MAX_SCORE;        // win
else if (C.p5[o] > 0) score = -MAX_SCORE;        // lose
else if (C.p4[o] > 0) score = -(MAX_SCORE - 2);  // lose in 1
else if (C.p4[p] > 1) score =  MAX_SCORE - 3;    // win in 1
where C_P4 and the like are the scaling constants, and C.p4[p] and C.p4[o] are the numbers of four-stone threats belonging to the player and the opponent, respectively (likewise for the other threat lengths). The if statements that follow ensure that the function returns an extreme result (effectively positive or negative infinity) when a win or loss is guaranteed.
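For experimentation, the scoring expression can be wrapped in a self-contained function. This is a sketch under stated assumptions: the weight values, the THREAT_COUNTS struct, the name score_from_counts, and the player/opponent indexing (o = 1 - p) are hypothetical, since the report does not list the tuned constants.

```c
#include <assert.h>

/* Hypothetical weights: the report does not list the tuned values. */
#define C_P4 1000
#define C_O3 500
#define C_P3 100
#define C_O2 20
#define C_P2 10
#define MAX_SCORE 1000000      /* stands in for "infinity" */

/* pN[i] = number of N-stone threats belonging to player i (0 or 1). */
typedef struct {
    int p2[2], p3[2], p4[2], p5[2];
} THREAT_COUNTS;

/* Weighted sum of threat counts, overridden by forced outcomes. */
int score_from_counts(const THREAT_COUNTS *c, int p)
{
    assert(p == 0 || p == 1);
    int o = 1 - p;                                   /* opponent id */
    int score = C_P4 * c->p4[p] - C_O3 * c->p3[o] + C_P3 * c->p3[p]
              - C_O2 * c->p2[o] + C_P2 * c->p2[p];
    if      (c->p5[p] > 0) score =  MAX_SCORE;       /* win */
    else if (c->p5[o] > 0) score = -MAX_SCORE;       /* lose */
    else if (c->p4[o] > 0) score = -(MAX_SCORE - 2); /* opponent wins next turn */
    else if (c->p4[p] > 1) score =  MAX_SCORE - 3;   /* two four-threats: win next turn */
    return score;
}
```

Note the ordering of the checks: if the opponent already has a four-stone threat, it completes five on its next turn, so a lose-in-1 result takes precedence over the player's own win-in-1.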
To count the number of threats on the board, the following method is used:
1) Define a 5 square “stencil” or mask.
2) Put the stencil on the board and find what threat, if any, is inside it. A stencil contains an N-stone threat if
there are N player stones and zero opposing stones on the squares in the stencil.
3) Sweep the stencil over the board, performing step 2 at each location.
4) Perform the above steps for horizontal, vertical, and both diagonal directions.
5) Sum up all of the threats detected.
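As an illustration of steps 1 to 3, a horizontal sweep might look like the following sketch. The name count_horizontal and the counts[stone][n] layout are assumptions for this example; the project's own count_horiz may be organized differently. The vertical and diagonal sweeps would differ only in the index step (BOARD_DIM for vertical, BOARD_DIM ± 1 for the diagonals) and in the ranges of the outer loops.

```c
#include <assert.h>

#define BOARD_DIM 11
#define NUM_SQUARES (BOARD_DIM * BOARD_DIM)
#define STENCIL 5

/* Horizontal stencil sweep. counts[s][n] accumulates the number of
 * n-stone threats for stone value s (1 or 2); the caller zeroes counts. */
void count_horizontal(const unsigned char board[NUM_SQUARES], int counts[3][STENCIL + 1])
{
    for (int r = 0; r < BOARD_DIM; r++) {
        for (int c = 0; c + STENCIL <= BOARD_DIM; c++) {   /* step 3: slide the stencil */
            int n[3] = {0, 0, 0};                          /* stones of each value inside it */
            for (int k = 0; k < STENCIL; k++) {            /* step 2: inspect the 5 squares */
                unsigned char v = board[r * BOARD_DIM + c + k];
                assert(v <= 2);
                n[v]++;
            }
            /* An N-stone threat: N own stones and no opposing stones. */
            if (n[1] > 0 && n[2] == 0)
                counts[1][n[1]]++;
            else if (n[2] > 0 && n[1] == 0)
                counts[2][n[2]]++;
        }
    }
}
```

Because counts is only ever incremented, the caller can zero it once and pass it to all four directional sweeps, which performs the summation of step 5.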
When implemented in hardware, the board evaluation function therefore consists of a large number of bit-based comparisons and shifts. To maximize speed, the board is stored as a 121-element array, where each element is a 2-bit value that can be 0 (unoccupied), 1 (player 1 stone), or 2 (player 2 stone).
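Since the accelerator's master argument is an unsigned int pointer read as a FIFO, one plausible way to ship the 2-bit squares is to pack 16 of them into each 32-bit word, so the 121 squares (242 bits) fit in 8 words. The packing order below (square i in bits 2*(i % 16) and 2*(i % 16) + 1 of word i / 16) and the names pack_board and get_square are hypothetical choices for illustration, not necessarily the project's layout.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SQUARES 121
#define SQUARES_PER_WORD 16                      /* 32 bits / 2 bits per square */
#define NUM_WORDS ((NUM_SQUARES + SQUARES_PER_WORD - 1) / SQUARES_PER_WORD)  /* 8 */

/* Pack squares (values 0, 1, 2) into NUM_WORDS 32-bit words. */
void pack_board(const uint8_t squares[NUM_SQUARES], uint32_t words[NUM_WORDS])
{
    for (int w = 0; w < NUM_WORDS; w++)
        words[w] = 0;
    for (int i = 0; i < NUM_SQUARES; i++) {
        assert(squares[i] <= 2);
        words[i / SQUARES_PER_WORD] |= (uint32_t)squares[i] << (2 * (i % SQUARES_PER_WORD));
    }
}

/* Read square i back out of the packed words. */
uint8_t get_square(const uint32_t words[NUM_WORDS], int i)
{
    return (uint8_t)((words[i / SQUARES_PER_WORD] >> (2 * (i % SQUARES_PER_WORD))) & 0x3u);
}
```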
The board evaluation function is defined in board.c, with count_horiz, count_vert, count_ne, count_se
implementing the directional stencil sweeps, and generate_board_count_score implementing the entire evaluation
function.
In order to look ahead two or more moves, an AI must take into account that the opponent will also be trying to maximize its board score. The most common way to do this is with a minimax game tree, illustrated in Figure F2. The algorithm builds a tree in which each node has N children, N being the search breadth. The layers of the tree alternate between max (player) and min (opponent) layers, starting with a max layer. Every node except the root has an associated move to explore and a score. Scores are calculated from the bottom of the tree upwards as follows:
- For each leaf node, the score is the board score of the board that results from playing all moves from the root to the leaf.
- For each non-leaf node in a max layer, the score is the maximum score of its children. This simulates the player choosing its best move.
- For each non-leaf node in a min layer, the score is the minimum score of its children. This simulates the opponent choosing its best move.
- The root node is always on a max layer. The move associated with the root's best-scoring child is the move to play.
Figure F2. First four layers of a minimax search tree with a breadth of 3.
In this project, the minimax search tree is implemented by the function tree_search in ai.c. This function performs a recursive depth-first traversal of the entire tree using only the stack. On a non-leaf node, tree_search calls get_best_move_n to obtain the N best moves and calls itself on each of them. On a leaf node, tree_search computes the board score. Because the tree resides only on the stack, a depth-first traversal was chosen: it limits the number of active recursive calls at any given time to the tree depth, saving memory.
(Figure F2 layer labels: Layers 0 and 2 are max layers, where each node generates 3 player moves; Layers 1 and 3 are min layers, where each node generates 3 opponent moves.)
One simple optimization is that if at any point a max node receives an infinite score from its child or a min node
receives a negative infinite score, the traversal is stopped immediately and the score returned to the previous
layer.
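A compact sketch of a depth-first minimax with breadth-limited children and the early exit described above follows. It is not the project's tree_search: choose_move, minimax, best_moves_n, BREADTH, WIN_SCORE, and the toy_eval scorer are all hypothetical, and scores are taken from the root player's point of view. Recursion depth, and hence stack usage, is bounded by the search depth.

```c
#include <assert.h>
#include <stdlib.h>

#define BOARD_DIM 11
#define NUM_SQUARES (BOARD_DIM * BOARD_DIM)
#define EMPTY 0
#define BREADTH 3          /* N: children explored per node (hypothetical value) */
#define WIN_SCORE 1000000  /* stands in for +infinity */

typedef int (*eval_fn)(const unsigned char *board, int player);

/* Toy evaluator for the example only: own stones near the centre count
 * for `player`, opponent stones near the centre count against. */
static int toy_eval(const unsigned char *board, int player)
{
    int score = 0;
    for (int sq = 0; sq < NUM_SQUARES; sq++) {
        int w = 10 - abs(sq / BOARD_DIM - 5) - abs(sq % BOARD_DIM - 5);
        if (board[sq] == player) score += w;
        else if (board[sq] != EMPTY) score -= w;
    }
    return score;
}

/* Store in best[] the (up to) BREADTH empty squares that score highest
 * for `mover`, best first. Returns how many were stored. */
static int best_moves_n(unsigned char *board, int mover, eval_fn eval, int *best)
{
    int best_score[BREADTH], count = 0;
    for (int sq = 0; sq < NUM_SQUARES; sq++) {
        if (board[sq] != EMPTY) continue;
        board[sq] = (unsigned char)mover;            /* try the move */
        int s = eval(board, mover);
        board[sq] = EMPTY;                           /* and undo it */
        if (count == BREADTH && s <= best_score[BREADTH - 1]) continue;
        int pos = (count < BREADTH) ? count++ : BREADTH - 1;
        while (pos > 0 && best_score[pos - 1] < s) { /* insert into sorted list */
            best_score[pos] = best_score[pos - 1];
            best[pos] = best[pos - 1];
            pos--;
        }
        best_score[pos] = s;
        best[pos] = sq;
    }
    return count;
}

/* Depth-first minimax; `mover` plays at this node. */
static int minimax(unsigned char *board, int root_player, int mover, int depth, eval_fn eval)
{
    int moves[BREADTH];
    int n = (depth > 0) ? best_moves_n(board, mover, eval, moves) : 0;
    if (n == 0)
        return eval(board, root_player);             /* leaf (or full board) */

    int maximizing = (mover == root_player);
    int best = maximizing ? -WIN_SCORE - 1 : WIN_SCORE + 1;
    for (int i = 0; i < n; i++) {
        board[moves[i]] = (unsigned char)mover;      /* play */
        int s = minimax(board, root_player, 3 - mover, depth - 1, eval);
        board[moves[i]] = EMPTY;                     /* undo */
        if (maximizing ? s > best : s < best) best = s;
        /* Early exit: nothing beats a forced win (max) or forced loss (min). */
        if (maximizing ? best >= WIN_SCORE : best <= -WIN_SCORE) break;
    }
    return best;
}

/* Root is a max layer: return the square whose subtree scores best, or -1. */
int choose_move(unsigned char *board, int player, int depth, eval_fn eval)
{
    assert(depth >= 1 && (player == 1 || player == 2));
    int moves[BREADTH], best_sq = -1, best = -WIN_SCORE - 1;
    int n = best_moves_n(board, player, eval, moves);
    for (int i = 0; i < n; i++) {
        board[moves[i]] = (unsigned char)player;
        int s = minimax(board, player, 3 - player, depth - 1, eval);
        board[moves[i]] = EMPTY;
        if (s > best) { best = s; best_sq = moves[i]; }
        if (best >= WIN_SCORE) break;
    }
    return best_sq;
}
```

With toy_eval on an empty board and a depth of 2, choose_move returns square 60 (the centre), since every other candidate lets the opponent take the centre.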
HARDWARE ACCELERATOR DESCRIPTION
The only custom IP built for the project was an accelerator that evaluates all possible speculative moves available from the current game state. In other words, this block returns the board scores for all (121 - (N - 1)) possible moves available to a player on turn N. As the MicroBlaze traverses down the game tree in tree_search, it explores the strongest speculative moves and passes each updated board state back to the accelerator.
The accelerated function is most similar to get_best_move_n, described in the previous section. However, get_best_move_n returns, as its name suggests, only the best n moves available, whereas the accelerator returns the scores for all moves and leaves the MicroBlaze to select among them.
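On the MicroBlaze side, picking the n best moves out of the accelerator's full list can be done with a small insertion into a sorted buffer, which avoids sorting all (up to 121) scores. The MOVE_SCORE record, select_best_n, and BREADTH_MAX are hypothetical names for this sketch; the actual FIFO record format is not specified here.

```c
#include <assert.h>

#define BREADTH_MAX 36

typedef struct { int score; int x; int y; } MOVE_SCORE;   /* hypothetical record */

/* Copy the n highest-scoring entries of all[0..count-1] into best[],
 * highest first. Returns the number selected (min(n, count)). */
int select_best_n(const MOVE_SCORE *all, int count, int n, MOVE_SCORE *best)
{
    assert(n >= 0 && n <= BREADTH_MAX);
    int kept = 0;
    for (int i = 0; i < count; i++) {
        /* Skip entries that cannot enter a full buffer. */
        if (kept == n && (n == 0 || all[i].score <= best[n - 1].score))
            continue;
        int pos = (kept < n) ? kept++ : n - 1;   /* new slot, or evict the worst */
        while (pos > 0 && best[pos - 1].score < all[i].score) {
            best[pos] = best[pos - 1];           /* shift lower scores down */
            pos--;
        }
        best[pos] = all[i];
    }
    return kept;
}
```

Ties keep their original (row-major) order, so among equally scored moves the one reported first is preferred.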
The accelerator is clocked with the 50 MHz system clock, and is reset with the peripheral reset (active high).
INTERFACE DESCRIPTION
The C signature of the function passed through the HLS flow was as follows:
int generate_board_count_score (unsigned int* master, int p, int* move_score)
Argument interfaces:
master (input): The board game state to be evaluated. It is read sequentially and is therefore implemented as an ap_fifo built on FSL.
p (input): The current player ID; implemented as ap_none and set through the AXI4Lite slave interface.
move_score (output): A buffer of all possible move scores, written as they are generated (in row-major order). This interface is implemented as an ap_fifo built on FSL.
Return interface:
The block returns one of two different integer values depending on the value of p. If p is 0 or 1, the number of speculative moves evaluated is returned. If p is 2 or 3, the score of the current board, without speculation, is returned. The latter case is needed to evaluate leaf nodes of the game tree, where no further speculation is required, and to check the win condition.
SUB-BLOCK DESCRIPTION
Since HLS defines sub-blocks according to the function call hierarchy, it is natural to describe the blocks within the
accelerator by referring to the functions that would be invoked in the C-implementation.
count_vert, count_horiz, count_ne, count_se:
As described in the AI Algorithm section, a complete board evaluation involves applying four different stencil operators (horizontal, vertical, and two diagonals) to every possible winning position on the board. These functions perform this operation for each of the four orientations. Each function consists of three nested for loops: the innermost loop counts the threats within the stencil, and the outer two loops sweep the stencil over the board. The HLS description of these functions applies the flatten directive to the innermost loop and pipelines the loop it is nested in.
As shown in Figure X2, these functions account for the bulk of the PC run time.
generate_board_count_score_internal:
This function takes the counts generated by the stencil functions and multiplies them by the coefficients for each threat length (2, 3, 4, 5). It also adds a small weight to the score to bias moves towards the centre of the board. This weight is omitted if there is a threat that would force a win or loss for the current player.
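The centre bias could be computed, for example, from the Chebyshev distance to the middle square. The following is only a sketch: centre_bias, biased_score, the forced flag, and the bias magnitude (0 to 5) are hypothetical, as the report does not give the actual weight.

```c
#include <assert.h>
#include <stdlib.h>

#define BOARD_DIM 11
#define CENTRE (BOARD_DIM / 2)     /* index 5 */

/* Hypothetical bias: CENTRE (5) for the middle square, decreasing by one
 * per ring (Chebyshev distance) down to 0 on the board edge. */
int centre_bias(int x, int y)
{
    assert(x >= 0 && x < BOARD_DIM && y >= 0 && y < BOARD_DIM);
    int dx = abs(x - CENTRE), dy = abs(y - CENTRE);
    return CENTRE - (dx > dy ? dx : dy);
}

/* Add the bias for the move at (x, y) unless the threat score already
 * reflects a forced win or loss, in which case it is left untouched. */
int biased_score(int threat_score, int forced, int x, int y)
{
    return forced ? threat_score : threat_score + centre_bias(x, y);
}
```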
generate_board_count_score:
This is the top-level function (whose arguments were described in the previous section). Here, two loops sweep through every possible speculative position and pass the resulting speculative game state to generate_board_count_score_internal for evaluation. Upon completion, the score and the move's position (x, y) are written to the output FIFO (move_score). The move is then unplayed, and the block proceeds to evaluate the next available move.
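To make the top-level behaviour concrete, the following plain-C software model mirrors the described loop: unpack the board from the input stream, play each empty square, score it, write the result, and unplay the move. It is a hypothetical model for testing, not the HLS source: the packing, the interpretation of bit 1 of p as "no speculation", the three-int (score, x, y) output record, and evaluate_internal (a toy stand-in for generate_board_count_score_internal) are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define BOARD_DIM 11
#define NUM_SQUARES (BOARD_DIM * BOARD_DIM)
#define SQUARES_PER_WORD 16                    /* 2 bits per square */
#define NUM_WORDS 8                            /* ceil(121 / 16) */

/* Stand-in for generate_board_count_score_internal: a toy score (own
 * stones near the centre minus the opponent's), NOT the threat evaluation. */
static int evaluate_internal(const unsigned char board[NUM_SQUARES], int stone)
{
    int score = 0;
    for (int sq = 0; sq < NUM_SQUARES; sq++) {
        int w = 10 - abs(sq / BOARD_DIM - 5) - abs(sq % BOARD_DIM - 5);
        if (board[sq] == stone) score += w;
        else if (board[sq] != 0) score -= w;
    }
    return score;
}

/* Software model of the top level (hypothetical encodings):
 *   master      NUM_WORDS packed words, square i in bits 2*(i%16).. of word i/16
 *   p & 1       player to move (0 -> stone value 1, 1 -> stone value 2)
 *   p & 2       no speculation: return the score of the current board
 *   move_score  three ints per empty square, row-major: score, x, y */
int generate_board_count_score_model(const unsigned int *master, int p, int *move_score)
{
    assert(p >= 0 && p <= 3);
    unsigned char board[NUM_SQUARES];
    int i = 0;
    for (int w = 0; w < NUM_WORDS; w++) {
        unsigned int word = master[w];         /* one sequential (FIFO) read per word */
        for (int k = 0; k < SQUARES_PER_WORD && i < NUM_SQUARES; k++, i++)
            board[i] = (unsigned char)((word >> (2 * k)) & 0x3u);
    }

    int stone = (p & 1) + 1;
    if (p & 2)
        return evaluate_internal(board, stone); /* leaf node / win check */

    int n = 0;
    for (int y = 0; y < BOARD_DIM; y++) {
        for (int x = 0; x < BOARD_DIM; x++) {
            int sq = y * BOARD_DIM + x;
            if (board[sq] != 0)
                continue;
            board[sq] = (unsigned char)stone;  /* play the speculative move */
            move_score[3 * n]     = evaluate_internal(board, stone);
            move_score[3 * n + 1] = x;
            move_score[3 * n + 2] = y;
            board[sq] = 0;                     /* unplay it */
            n++;
        }
    }
    return n;                                  /* number of moves evaluated */
}
```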
EMBEDDED C DESCRIPTION:
The embedded C code performs three main functions:
1) Execute the game flow
2) Interface with the hardware accelerator
3) Render and display the game user interface (UI)
To help the reader understand the embedded code, the following sections describe the file structure of the code and give an overview of the embedded code flow, with descriptions of the functions used and how they perform each of the three main functions outlined above.
FILE STRUCTURE AND DESCRIPTION:
main.c
This file contains the IP and game state initialization, the main execution loop as well as the implementation
for several helper functions.
ai.c/ai.h
These files contain the C component of the AI implementation. The AI leverages the hardware accelerator core
to generate board score counts.
board.c/board.h
These files contain the functions related to manipulating and evaluating the game board.
gameboard.c/gameboard.h
These files implement the UI side of the game board. In particular, they provide the functionality used to
render the UI control buttons as well as rendering and re-rendering the game board following a move.
common.h
This file contains the definition of several game structures that are used throughout the embedded code. The
most important of these structures are: BOARD, PLAYER and COORD. In addition, it contains many important
macro definitions such as board dimensions and search tree parameters.
touchsensor_buttons.c/touchsensor_buttons.h
These files provide a Touch Sensor Button Library API. This allows for the creation of buttons with custom
functionality managed through a provided button manager.
xgenerate_board_count_score_ctrl.h/ xgenerate_board_count_score.c/ xgenerate_board_count_score.h
These files make up the driver for the hardware accelerator core.
EMBEDDED CODE FLOW AND FUNCTION DESCRIPTION:
As described above, the program's main function is in main.c. It begins with a series of function calls that initialize the game state and the numerous IPs used:
Touch Sensor IP and Button setup:
- TouchSensor_Initialize: This function initializes the touch screen IP which is used to receive user input for
gameplay and mode selection. The function is part of the touch sensor driver.
- TouchSensorButtons_InitializeManager: This function initializes the button manager which is used to direct a
touch sensor interrupt upon a button press to the correct button handler function. This function is part of the
Touch Sensor Buttons API which will be described in more detail later.
Next, the following functions are called to set up the main buttons that make up the game UI:
- Button_SetGridDim/Button_SetRectDim: These functions set the dimensions of the buttons to be rendered on the TFT display. The first applies to grid-type buttons, and the second to rectangle-type buttons.
- Button_AssignHandler: This function assigns the button handler in the button data structure.
- TouchSensorButtons_RegisterButton: This function registers the button with the button manager. As described above, the button manager invokes the button's handler upon receiving a touch sensor interrupt that corresponds to the button.
- TouchSensorButtons_EnableButton: Finally, this function enables the button within the button manager. This functionality allows buttons to be disabled and enabled during program execution.
The above functions are also part of the Touch Sensor Buttons API. In total, the UI consists of four buttons: three grid-type buttons, corresponding to the main game board grid and the player mode selection for each of the black and white players, and one rectangle-type button that resets the game.
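The button manager's job of routing a touch interrupt to the right handler, and of honouring the enabled flag, can be illustrated with a minimal hit-test dispatcher. This is a hypothetical sketch (BUTTON, BUTTON_MANAGER, register_button, dispatch_touch, grid_handler, and the 20-pixel grid geometry are invented for the example), not the Touch Sensor Buttons API itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUTTONS 8

typedef void (*button_handler)(int x, int y);

/* A touch button: a screen rectangle [x0, x1) x [y0, y1) plus a handler. */
typedef struct {
    int x0, y0, x1, y1;
    button_handler handler;
    int enabled;
} BUTTON;

typedef struct {
    BUTTON *buttons[MAX_BUTTONS];
    int count;
} BUTTON_MANAGER;

int register_button(BUTTON_MANAGER *m, BUTTON *b)
{
    assert(b->handler != NULL);
    if (m->count == MAX_BUTTONS)
        return -1;                              /* manager full */
    m->buttons[m->count++] = b;
    return 0;
}

/* Called from the touch interrupt: route the touch point to the first
 * enabled button containing it, passing button-relative coordinates.
 * Returns 1 if a handler ran, 0 otherwise. */
int dispatch_touch(BUTTON_MANAGER *m, int x, int y)
{
    for (int i = 0; i < m->count; i++) {
        BUTTON *b = m->buttons[i];
        if (b->enabled && x >= b->x0 && x < b->x1 && y >= b->y0 && y < b->y1) {
            b->handler(x - b->x0, y - b->y0);
            return 1;
        }
    }
    return 0;
}

/* Example grid handler: 11x11 cells of 20x20 pixels (hypothetical geometry). */
static int last_cell = -1;
static void grid_handler(int x, int y) { last_cell = (y / 20) * 11 + x / 20; }
```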
Hardware Accelerator Setup:
- initialize_accelerator: This function initializes the hardware accelerator. It is part of the accelerator driver.
TFT Display Setup:
- blackScreen: This function is defined in main.c and initializes the memory region corresponding to the TFT pixel values to black.
The following functions are responsible for setting up the TFT display and are part of the TFT driver:
- TFT_Initialize: Initializes the TFT Display IP.
- TFT_SetImageAddress: Points the TFT IP to the address of the memory region where the pixel values are stored.
- TFT_SetBrightness: Sets the brightness of the TFT Display.
- TFT_TurnOn: Turns the TFT Display on.
UART Setup:
- XUartLite_Initialize: Initializes the UART IP.
- XUartLite_SetRecvHandler: Sets the handler function for the interrupt generated when data is received on the UART. This is set to RecvUartCommand, which is defined in “main.c”.
- XUartLite_ResetFifos: Resets the UART FIFOs.
- XUartLite_EnableInterrupts: Enables interrupts on the UART side.
Interrupt Controller Setup:
- XIntc_Initialize: This function initializes the interrupt controller.
- XIntc_Connect: This function connects an interrupt source to the interrupt controller and specifies the interrupt handler for that source. Two IPs are connected to the interrupt controller:
  1) UART, which is handled by XUartLite_InterruptHandler.
  2) Touch screen, which is handled by TouchSensorButtons_InterruptHandler.
- XIntc_Start: Starts the interrupt controller.
- XIntc_Enable: Enables the interrupts for each interrupt source connected to the controller.
The above functions are all part of the interrupt controller driver. In addition, microblaze_enable_interrupts is called to enable interrupts on the MicroBlaze side.
Gameboard Initialization and Rendering:
- Gameboard_Initialize: Initializes the game board.
- Gameboard_RenderBoard: Renders the game board.
- Gameboard_RenderAllCtrlButtons: Renders the UI control buttons.
The definitions for the above functions are all found in gameboard.c.
After the initial setup, the program enters the main execution loop, which is structured as follows:
- An outer while loop that iterates until a win has been achieved.
- A switch statement that selects what to do, based on the current player's mode, in anticipation of that player's move:
    o Human player: enable the game board grid button using TouchSensorButtons_EnableButton.
    o FPGA AI: disable the game board grid button using TouchSensorButtons_DisableButton, then get the AI move using HandleAiMove. HandleAiMove is defined in “main.c” and invokes the top-level AI function get_move_ai2, defined in “ai.c”.
    o UART: disable the game board grid button and send the last played move to the UART using SendUartCommand.
- An inner while loop that iterates until a new move has been played.
- A win condition check, which uses check_board_win (defined in “board.c”).
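The loop structure of the main execution loop can be sketched as follows. All helper functions here are hypothetical stubs standing in for the project's functions (noted in the comments), and check_win uses a toy condition; in the real system, human and UART moves arrive from interrupt handlers, so only an FPGA-versus-FPGA game terminates in this stand-alone sketch.

```c
#include <assert.h>

typedef enum { MODE_HUMAN, MODE_FPGA, MODE_UART } PLAYER_MODE;

/* Set when a move has been played: by the touch ISR (human), the UART ISR
 * (PC opponent), or directly by the AI path. */
static volatile int move_played = 0;
static int grid_enabled = 0;
static int moves_made = 0;

/* Hypothetical stand-ins for the project's functions. */
static void set_grid_button(int on)   { grid_enabled = on; }   /* TouchSensorButtons_Enable/DisableButton */
static void handle_ai_move(void)      { moves_made++; move_played = 1; }  /* HandleAiMove */
static void send_last_move_uart(void) { /* SendUartCommand would transmit here */ }
static int  check_win(void)           { return moves_made >= 9; }  /* toy stand-in for check_board_win */

/* Runs one game; black moves first. Returns the number of moves made. */
int run_game(PLAYER_MODE black, PLAYER_MODE white)
{
    PLAYER_MODE mode[2] = { black, white };
    int turn = 0, won = 0;
    while (!won) {                              /* outer loop: until a win */
        move_played = 0;
        switch (mode[turn]) {
        case MODE_HUMAN:                        /* wait for a touch on the grid */
            set_grid_button(1);
            break;
        case MODE_FPGA:                         /* compute the move on the FPGA */
            set_grid_button(0);
            handle_ai_move();
            break;
        case MODE_UART:                         /* ask the PC for its move */
            set_grid_button(0);
            send_last_move_uart();
            break;
        }
        while (!move_played) {
            /* inner loop: idle until an interrupt handler plays the move */
        }
        won = check_win();                      /* win condition check */
        turn = 1 - turn;
    }
    return moves_made;
}
```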
TIPS AND TRICKS
This project produced a number of interesting insights into two primary areas:
1) How to develop hardware accelerators in embedded environments
2) How to use the Vivado HLS flow effectively
TIPS & TRICKS: ACCELERATORS
Profiling is as essential for optimization in an embedded environment as it is on a PC or supercomputer. Fortunately, Xilinx provides a MicroBlaze-specific variant of gprof (mb-gprof) to help developers identify hot spots in their code. Using it helped us measure what fraction of the time the accelerator was kept active and how much overhead each invocation incurred. Xilinx UG448, “EDK Profiling User Guide”, is very helpful in describing how to use profiling from within SDK/EDK.
Setting up a MicroBlaze project for profiling is not trivial, however; we ran into a few key issues:
1) gprof uses timer-driven interrupts to sample the code during execution, so a timer must be provided in the XPS project and connected to the interrupt controller. This much was expected. However, when the other interrupt devices were initialized and connected in our embedded project, all subsequent samples were lost. To avoid this, we used #ifdef to exclude that initialization when a profiling flag was set.
2) To profile the system, the BSP must also be compiled with -pg enabled and with software-intrusive profiling enabled. Compiling the BSP takes a considerable amount of time, so it was more convenient to maintain a separate profile-only variant, which reduced our iteration time.
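The first workaround amounts to a compile-time switch around the interrupt setup. In this sketch, the PROFILE macro and the connect_* helpers are hypothetical; in the project, the guarded code would be the XIntc_Connect/XIntc_Enable calls for the UART and touch sensor.

```c
#include <assert.h>

/* Build the profile-only variant with -DPROFILE (hypothetical flag name). */

static int uart_irq_connected = 0;
static int touch_irq_connected = 0;

/* Hypothetical stand-ins for the XIntc_Connect/XIntc_Enable calls per device. */
static void connect_uart_interrupt(void)  { uart_irq_connected = 1; }
static void connect_touch_interrupt(void) { touch_irq_connected = 1; }

void setup_interrupts(void)
{
#ifndef PROFILE
    /* Normal build: wire up the UART and touch sensor interrupts. */
    connect_uart_interrupt();
    connect_touch_interrupt();
#else
    /* Profile build: skip the other interrupt devices, which in this project
     * caused mb-gprof's timer samples to be lost. */
#endif
}
```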
TIPS & TRICKS: VIVADO HLS
Vivado HLS is a powerful tool if the engineer understands how it performs its translations. At the time of writing, the best references for Vivado HLS are:
Xilinx UG902, “Vivado Design Suite User Guide: High Level Synthesis”
Xilinx UG817, “Vivado Design Suite Tutorial – High Level Synthesis”
UG817 walks new users through building a pcore with Vivado HLS, while UG902 is a comprehensive reference for the tool itself. There are a few key areas a new user must study to use the flow effectively.
Understanding interfaces:
The first insight is that arguments to the function being accelerated should not be thought of as arguments at all, but rather as interfaces.
If an argument or the return value is single-valued (scalar), a bundle of wires connected to a memory-mapped register is all that is required to capture it: use an ap_none interface and bundle it with the control interface (an AXI4Lite slave).
#pragma HLS INTERFACE ap_none port=p
#pragma HLS RESOURCE variable=p core=AXI4LiteS metadata="-bus_bundle ctrl"
#pragma HLS INTERFACE ap_ctrl_hs port=return
#pragma HLS RESOURCE variable=return core=AXI4LiteS metadata="-bus_bundle ctrl"
Vivado HLS conveniently generates driver code to write to and read from these slave registers. If an array is read or written sequentially, use an ap_fifo. ap_fifo interfaces can use either FSL or AXI streaming cores.
#pragma HLS INTERFACE ap_fifo port=master
#pragma HLS RESOURCE variable=master core=FSL
#pragma HLS INTERFACE ap_fifo port=move_score
#pragma HLS RESOURCE variable=move_score core=FSL
We had considerably less difficulty integrating with FSLs than with an AXI Streaming FIFO. This, coupled with their reduced call overhead, made FSL the clear winner for this project.
If random access to an array is required, a bus interface (ap_bus) is needed; these have considerable area overhead and increased read and write latency.
Understanding Pragmas and Directives:
UG902 is very important here. The final sections of the document expand on what transformations each pragma
performs.
1) Array optimizations: These tell Vivado HLS how to implement and partition storage elements in hardware.
2) Loop optimizations: These tell Vivado HLS how to extract loop-level parallelism (there are a few different varieties). It is also important to tell Vivado HLS whether inter-iteration dependencies exist and what kind they are.
3) Function optimizations: Functions are treated as blocks. Vivado HLS can be told to multiplex different invocations of the same function onto the same piece of hardware, or to instantiate multiple instances of it.
Some Final Tricks…
Use reduced bit-width values wherever possible, e.g. uint4 if the values are known to be bounded between 0 and 15. Vivado HLS may not be able to infer this from the C description, leading to completely unnecessary 32-bit counters.
But be aware of the pitfalls (which seem obvious after the fact):
1) Adding signed ints of different bit widths leads exactly to what you’d expect.
2) This is an infinite loop:
uint4 counter;  /* 4 bits wide: can only hold 0..15 */
for (counter = 0; counter < 16; counter++) {  /* always true: 15 + 1 wraps to 0 */
    printf("There ain't no escape!\n");
}
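The wraparound can be reproduced in portable C by masking a wider integer to 4 bits, which mimics the truncation that makes the loop above endless. In this sketch, add_u4, the two iteration-counting functions, and the iteration cap are hypothetical; the cap exists only so the demonstration terminates. Widening the counter to 5 bits (uint5 in Vivado HLS) lets it reach 16, so the loop exits after 16 iterations.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate 4-bit unsigned addition by masking the result to 0..15. */
static uint8_t add_u4(uint8_t a, uint8_t b) { return (uint8_t)((a + b) & 0xFu); }

/* Counts loop iterations with a 4-bit counter, giving up after `limit`
 * to show that the exit test (counter < 16) never fails. */
int iterations_with_u4_counter(int limit)
{
    int n = 0;
    for (uint8_t counter = 0; counter < 16; counter = add_u4(counter, 1)) {
        if (++n == limit)
            break;          /* counter wrapped 15 -> 0; without this the loop never exits */
    }
    return n;
}

/* A 5-bit counter (masked to 0..31) can hold 16, so the loop terminates. */
int iterations_with_u5_counter(void)
{
    int n = 0;
    for (uint8_t counter = 0; counter < 16; counter = (uint8_t)((counter + 1) & 0x1Fu))
        n++;
    return n;
}
```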