DFPM on FPGA -Bachelor Thesis Report

Självständigt arbete på grundnivå

Independent degree project first cycle

Electrical Engineering

DFPM on FPGA – A speed optimized implementation of the Dynamic Functional Particle Method on Spartan 3E

Taiyelolu Adeboye

DFPM on FPGA

Taiyelolu Adeboye

2015-09-25


MID SWEDEN UNIVERSITY Department of Electronics Design (EKS)

Examiner: Benny Thörnberg, [email protected]

Supervisor: Kent Bertilsson, [email protected]

Author: Taiyelolu O. Adeboye, [email protected]

Degree programme: International Bachelor’s Programme in Electronics, 180 credits

Main field of study: Electronics Engineering

Semester, year: Autumn, 2014

Abstract

This thesis focuses on the design of electronic circuitry that implements the Dynamic Functional Particle Method (DFPM). The design was done in VHDL and implemented on a Xilinx Spartan 3E FPGA. The work included a 33-bit digital ALU designed to solve differential equations with the DFPM algorithm, together with UART transceiver and controller circuits for data exchange between the FPGA and the PC. This report explains the design principles, process, tests and results of the work. It also compares the performance of the designed system with that of generic computational devices and examines the possibilities and limitations of operational concurrency in relation to the size of problem sets.

Keywords: MATLAB, VHDL, FPGA, DFPM, algorithm evaluation, CPU

clock cycles, particle method

Acknowledgements

I would like to express my appreciation to my supervisor, Associate

Professor Kent Bertilsson, for his guidance, mentorship and support in

the course of this project. His contribution was vital to the execution and

completion of this project work. I would also like to express my appreci-

ation to Associate Professor Sverker Edvardsson for being so approach-

able and for his great willingness to explain.

My various tutors and examiners in the course of this Bachelor’s pro-

gramme have proven themselves to be exceptional and unforgettable. In

no particular order, Professor Bengt Oelmann, Dr. Börje Norlin, Profes-

sor Kent Bertilsson, Professor Benny Thörnberg, Martin Kjellqvist,

Mikael Hasselmalm, Dr. Najeem Lawal, Mikael Bylund, Amir Yousaf,

Professor Cornelia Schiebold, Dr. Peng Cheng, Mazhar Hussein, Profes-

sor Engmont Porten, Stefan Haller, David Krapohl, Solange Hamrin and

Evelina Caffrey will remain entrenched in my memory.

Without mincing words, Anders Rådberg, Anders Molin, Sara Lodin,

Lars Malmbom, Tove Gullikson and the team at MIUN Innovation will

always remain dear to my heart. Thank you for your time, advice and

your effort!

Finally, I owe a huge debt of gratitude to the following: The divine, for

those moments when I was dry, Temitope Ruth, for being so under-

standing and special, Ire Peter, our bundle of joy, for being so sweet,

Kehinde, my wonderful twin, my family (Samuel, Dorcas, Ardex,

Adeyemi and Ope) for being such a pillar of support, and my friends in

Sweden and in Nigeria. Words will not be enough to express how much

I appreciate you!

Thank you for being part of this journey, muchas gracias! Greater things

are still to come!

Table of Contents

Abstract
Acknowledgements
1 Introduction
1.1 Background and problem motivation
1.2 Overall aim
1.3 Scope
1.4 Tools to be used
1.5 Concrete and verifiable goals
1.6 Outline
1.7 Contributions
2 Theory
2.1 Definition of terms and abbreviations
2.1.1 Terms
2.1.2 Abbreviations
2.2 DFPM algorithm
3 Methodology
3.1 Concurrence vs. sequentiality
3.2 Numerical representation
3.3 Modularity
4 Design
4.1 The DFPM algorithm
4.2 Project Top Module
4.2.1 The two top sub-modules
4.2.2 Data type conversion
4.3 Project defined Packages
4.4 Communication Top Module
4.4.1 UART
4.5 Iteration Control Top Module
4.6 Implementation Constraint
4.7 Parameters
4.8 Data exchange format
4.9 Signed numerical representation
4.10 Integer and fractional representation
4.11 Spartan 3E-1200 FG320 FPGA
4.12 Nexys2 FPGA demonstration board
4.13 Xilinx ISE
4.14 ISim Simulation software
4.15 Design verification
4.16 The complete design
5 Results
5.1 Simulation results
5.1.1 Element-wise vector multiplication
5.1.2 Element-wise vector subtraction
5.1.3 Evaluating new vector V
5.1.4 Evaluating new vector X
5.1.5 Convergence check
5.1.6 DFPM top module
5.2 Comparison
6 Discussion
6.1 FPGA resource utilization
6.2 Reduction in computation time
6.3 Larger problem sets
6.4 UART bottleneck
6.5 Precision
6.6 Communication input/output limitations
6.7 Cross platform comparison
6.8 Output comparison
6.9 Communication possibilities
6.10 Applications
6.11 Implications
7 Conclusions
7.1 Benchmark
7.2 Further work
References
Appendix A: Documentation of own developed program code
Design codes
New V operations
New X operations
One Iteration
DFPM top module
UART Core
UART Interface
Project Top module
Test code written in C++
Appendix B: Explanation of some basic mathematical concepts
Two’s complement
Euclidean norm
Appendix C: Project report summary
Appendix D: MATLAB codes
Code for problem specification and comparison
Appendix E: Table of standard ASCII symbols and their numerical representation

1 Introduction

DFPM on FPGA is a project that implements the Dynamic Functional Particle Method algorithm in silicon. The implementation was done on a Xilinx Spartan 3E FPGA and was designed for speed, measured as the number of clock cycles required for the computation.

The Dynamic Functional Particle Method (DFPM) is a numerical particle method that was developed at Mid Sweden University. Although the method is iterative, it consists of steps, some of which can be executed in parallel. An FPGA was therefore expected to offer advantages due to its parallel processing capabilities.

The FPGA implementation takes matrix elements as input parameters through

the UART and returns an output in the form of the solution vector relevant to

the parameter input received.

Figure 1.1: A simplified illustration of the project

1.1 Background and problem motivation

Systems of linear equations can be used to describe many observable natural phenomena and find application in many areas, including physics, mechanics and sensor fusion.

One approach to solving systems of linear equations applies matrix algebra. This approach treats the system as matrices or vectors whose elements represent the parameters of the system in question.

This approach often results in the classical A*X = B problem, where A is a matrix and X and B are vectors. A contains the coefficients of the system, X contains the unknown variables to be solved for, and B contains the constant terms on the right-hand side of the equations.

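For instance, if a system is defined as shown below,

3x – 2y + 4z = 10

5x + 1y – 2z = -2

10x – 5y + 3z = 4

then it can be represented in A*X = B form as

\[
\begin{bmatrix} 3 & -2 & 4 \\ 5 & 1 & -2 \\ 10 & -5 & 3 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
=
\begin{bmatrix} 10 \\ -2 \\ 4 \end{bmatrix}
\]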
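As a rough illustration of the A*X = B form, the example system can be written as nested lists and solved exactly on a PC. This is only a sketch: it assumes that the first term of the second and third equations is in x (so that A has rows 3, -2, 4; 5, 1, -2; and 10, -5, 3), it uses Gaussian elimination rather than DFPM, and the function name `solve_exact` is hypothetical.

```python
from fractions import Fraction


def solve_exact(A, B):
    """Solve A*X = B by Gauss-Jordan elimination with exact rational arithmetic.

    Assumes A is square and non-singular; a singular A makes the pivot
    search below fail with StopIteration.
    """
    n = len(B)
    # Build the augmented matrix [A | B] with Fraction entries.
    M = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(A, B)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


A = [[3, -2, 4], [5, 1, -2], [10, -5, 3]]
B = [10, -2, 4]
X = solve_exact(A, B)
# The residual A*X - B should be exactly zero for a correct solution.
residual = [sum(a * x for a, x in zip(row, X)) - b for row, b in zip(A, B)]
print(X, residual)
```

A direct solution of this kind is a convenient reference against which the output of an iterative method can be compared.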

As the number of variables in these systems increases, the size of the matrices increases proportionately, but the number of iterations required to solve the problem using an iterative numerical method increases geometrically, thus consuming significant CPU time.

This project aims to address this problem through the design of an Arithmetic and Logic Unit (ALU) that implements the DFPM algorithm in a system that combines sequential and parallel execution as a means of reducing the number of clock cycles required per iteration and, consequently, the computation time for the complete algorithm.

1.2 Overall aim

The overall aim of the project is the design of an ALU that implements the

Dynamic Functional Particle Method on an FPGA. The system will be capable of

receiving input in the form of parameters that represent the variables of the

system to be analysed and will give its output in the form of a matrix whose

elements represent the solution to the problem.

The designed system will be capable of communicating with a computer

through the USB port and the data is to be collected and displayed on the

computer screen using suitable software.

The output from the designed system should be correct and consistent in

comparison with values obtainable from a similar computation executed in

MATLAB or similar software on a PC.

Figure 1.2: An overview of the project concept

1.3 Scope

The designed system is expected to solve systems of linear equations expressed in the form A*X = B, where A is a 5x5 square matrix while X and B are 5x1 vectors. A and B will be given as input to the designed system, and the system gives an output that represents X, the solution vector of the system.

The input to the designed system should be in the form of positive 8-bit integers, while the output is expected to consist of whole numbers as well as fractions, which can be represented to a maximum precision of 8 binary bits.

Although limits have been imposed on the kind of input parameter expected

with the aim of easing the communication between the designed FPGA system

and PC software, it is expected that the ALU designed should be able to exe-

cute the DFPM algorithm on input data beyond these constraints.

1.4 Tools to be used

The following tools are expected to be used to carry out this project:

1. Xilinx Spartan 3E FPGA on Nexys2 demonstration board.

2. Xilinx ISE design suite.

3. Desktop terminal application software running on a PC.

4. MATLAB software running on a PC.

1.5 Concrete and verifiable goals

The goals of the project are as follows:

1. Design of a processor/ALU in VHDL. The unit should implement the

DFPM algorithm.

2. Implementation of parallel processing into the design of the DFPM

computational module, as much as optimal for the problem size.

3. Design of UART communication modules, in VHDL, for the transfer of

data from the PC/UART port to the DFPM computation module speci-

fied in the item number above.

4. Verification of the output from the FPGA. It should be consistently

equivalent to the output of the same algorithm run on a PC.

5. Investigation and suggestion of possible solutions and approaches to

scaling up the design for significantly larger problem sets.

1.6 Outline

Chapter 2 of this report briefly explains the theory behind the design and some related work pertinent to DFPM and the FPGA implementation, while Chapter 3 examines the design methodology and the principles behind design choices and approaches. Chapter 4 describes the design itself. Chapter 5 presents the tests carried out to verify the functionality of the designed modules and compares the results with those obtainable from other systems. In Chapter 6, the results are discussed and the possibilities and limitations examined, and Chapter 7 concludes the report.

1.7 Contributions

This design was wholly done by the author of this report with support and guidance from the supervisor (Associate Prof. Kent Bertilsson). The design was based on the Dynamic Functional Particle Method algorithm, which was developed by Prof. Sverker Edvardsson et al. [1].

Prof. Sverker Edvardsson supplied the author with information about DFPM and a sample application of the algorithm implemented in MATLAB. A UART core designed for the Nexys2 and made available by Digilent Inc. was adapted in designing the data exchange modules interfacing between the FPGA and the PC.

2 Theory

Systems of linear and differential equations are a well-established concept in mathematics and find application in solving theoretical numerical problems as well as real-world challenges in fields such as mechanics, biology, electronics and economics. Thus a lot of work has been done to develop approaches to solving these problems.

The Dynamic Functional Particle Method (DFPM) is an approach, recently developed by Sverker Edvardsson et al. [1] [2], which can be used to solve systems of linear and differential equations. The algorithm is simple, widely applicable and efficient, with significant comparative advantages in relation to some of the other established approaches [2].

DFPM implements a novel second order dynamical particle method which, though new, is related to some first order approaches in previous work done by Sincovec and Madsen [3], Pata and Squassina [4], and F. Alvarez [5].

There are a number of computational libraries and algorithms implementing various approaches to solving systems of linear and differential equations. Some of these include ARPACK and LAPACK, the Colt library (Java), and IML++ (C++), among others.

Since this report is not a mathematical treatise, the main focus is on design and

implementation of electronic hardware that is able to compute and present

solutions to problems presented as a system of differential equations received

as input.

The design and implementation done in this project, while novel, is also related to a previous work by Bruce Land entitled "Hybrid Computing on an FPGA" [6], in which a Digital Differential Analyzer (DDA) was designed and implemented on an Altera Cyclone II 2C35 FPGA on an Altera DE2 FPGA demonstration board. The design made use of an 18-bit numerical representation, of which 16 bits were set apart for the fractional part. Parallel computations were also used in order to reduce computation time.

Apart from Bruce Land’s design above, there is little or no known information

about the implementation of numerical or particle methods in FPGA, and this

work could lead to novel concepts and applications.

2.1 Definition of terms and abbreviations

2.1.1 Terms

Below are basic definitions and explanations of some important concepts used in this report.

1. Linear equations

A linear equation can simply be defined as an algebraic equation in which each term is either a constant or the product of a constant and a single variable raised to the first power.

2. Systems of linear equations

These are a set of simultaneous linear equations which are defined as a single

problem and meant to be treated as such. These are often encountered in real

life situations and observable physical phenomena.

3. Differential equations

These kinds of equations define relationships connecting certain functions or

physical properties with their differentials (i.e. derivatives) hence the name.

4. Systems of differential equations

These are simultaneous differential equations that together define a specific problem in terms of relationships between one or more unknown functions (dependent variables) and their derivatives with respect to the independent variables.

5. Numerical methods

These are approaches to solving mathematical problems with the use of various methods of numerical approximation. Numerical methods can be direct or iterative.

Direct numerical methods include algorithms that have a predefined number of steps for arriving at solutions. An example is the Gaussian elimination method. Iterative methods, however, require a number of iterations of computational steps that is not known in advance and can vary with each problem definition. Examples of iterative numerical methods are the Newton-Raphson method (also known as Newton’s method) and the Jacobi method.

6. Particle methods

Particle methods are algorithms used, primarily, for the simulation of interact-

ing particles of physical systems and their motion in nature. These algorithms

are, sometimes, applied to numerical treatment of theoretical mathematical

models. The dynamic functional particle method falls under this category.

7. Convergence

Convergence is the property of an iterative method whereby its successive iterates consistently approach some specific value. The value to which the method converges is taken to be the solution of the problem being solved with the iterative method.

8. The Dynamic Functional Particle method

This is an iterative particle method applied to general mathematical problems

by which mathematical problem models can be translated to particle models

and solved, as developed by Sverker Edvardsson et al [2].

The method is robust and widely applicable to problems of systems of linear

and differential equations, especially those defining nature and observable

physical phenomena.

9. Sequential processes

Sequential processes are processes consisting of operations which are carried

out one after the other. In these kinds of processes no two operations take

place simultaneously. All operations follow a definite sequence. Examples are

operations that take place in a single core CPU (Central Processing Unit).

10. Concurrent processes

Concurrent processes are processes consisting of more than one operation

being carried out in parallel. These kinds of processes can occur in multi-core

CPUs, FPGAs and other kinds of devices with parallel processing capabilities.

11. CPU time

This refers to the time spent by a processing unit while carrying out a certain

computational operation or set of operations. It is expressed in seconds.

12. Clock

This is a component in digital electronic systems by which the timing of operations and processes is controlled. It oscillates between a high and a low signal level.

13. Clock cycle

This is a single complete up and down oscillation of a clock.

14. Clock frequency

This refers to the number of cycles a clock completes in a second. It is ex-

pressed in Hertz.

15. Field Programmable Gate Array (FPGA)

These are integrated circuits that are factory manufactured to be configurable

by engineers and designers as the use case or application demands. They are

normally programmed in a hardware description language (HDL).

16. Universal Asynchronous Receiver Transmitter

This is a standard piece of hardware that facilitates serial data exchange between two electronic devices. A UART port must be connected to another UART port in order for them to exchange data.

Data exchange between UART hardware is serial, one bit at a time, and takes place between cross-connected receiver and transmitter pins. The received data is converted to parallel 8-bit format and exchanged between the UART hardware and the device controlling it.
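The serial-to-parallel conversion described above can be sketched in software. The sketch below assumes the common 8N1 frame layout (one start bit at logic 0, eight data bits sent least significant bit first, no parity, one stop bit at logic 1); this report section does not state the frame configuration used, and the function names are hypothetical.

```python
def frame_byte(byte):
    """Serialize one byte into an assumed 8N1 UART frame (list of bit values)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    # Data bits are sent least significant bit first.
    data_bits = [(byte >> i) & 1 for i in range(8)]
    # Start bit (0), eight data bits, stop bit (1).
    return [0] + data_bits + [1]


def unframe_byte(bits):
    """Recover the parallel 8-bit value from a received 10-bit frame."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


frame = frame_byte(0x41)  # ASCII 'A'
print(frame, hex(unframe_byte(frame)))
```

In hardware the same steps are typically carried out by shift registers clocked at the baud rate; the sketch only mirrors the bit ordering.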

Figure 2.1 Simplified illustration of the UART communication process

17. MATLAB

MATLAB is an interactive software platform and high-level programming

language which is often used in scientific and engineering computing due to its

simplicity, robustness and easy to use interactive environment and functions.

In this project, it was used for the initial execution of the DFPM algorithm and

comparison.

18. Terminal software application

This is a software application that enables its user to get access to one or more

input/output ports (e.g. USB) of a PC and which displays the data stream. In

this project, Br@y++ terminal was used to access a USB port and communicate

with the FPGA running the DFPM algorithm.

19. Two’s complement

Two’s complement is a method of representing signed numbers in binary. In an n-bit two’s complement number, the most significant bit carries a weight of -2^(n-1) while the remaining bits carry their usual positive weights. A number is negated by inverting all of its bits and adding one.

When the most significant bit of a number represented in two’s complement is “1”, the number is negative; when it is “0”, the number is zero or positive.

This is a standard way of representing numbers that is frequently applied in computing and electronics.
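A short sketch may make the encoding concrete. It converts between Python integers and two’s complement bit patterns of a chosen width; the default width of 33 bits matches the word size mentioned in this report, while the function names are hypothetical.

```python
def to_twos_complement(value, width=33):
    """Return the unsigned bit pattern that encodes `value` in `width` bits."""
    lowest = -(1 << (width - 1))
    highest = (1 << (width - 1)) - 1
    if not lowest <= value <= highest:
        raise OverflowError("value does not fit in the given width")
    # Masking maps negative values onto patterns with the top bit set.
    return value & ((1 << width) - 1)


def from_twos_complement(word, width=33):
    """Interpret an unsigned `width`-bit pattern as a signed integer."""
    if (word >> (width - 1)) & 1:
        # Most significant bit set: the pattern encodes a negative number.
        return word - (1 << width)
    return word


print(format(to_twos_complement(-5, 8), "08b"))   # bit pattern for -5 in 8 bits
print(from_twos_complement(0b11111011, 8))        # signed value of that pattern
```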

2.1.2 Abbreviations

The following abbreviations are used in this report:

ALU: Arithmetic and Logic Unit.

ASCII: American Standard Code for Information Interchange. This is the

standard used for the data exchanged between the PC and the FPGA.

ASIC: Application Specific Integrated Circuit. These are integrated circuits

that are designed or configured for a specific use case or application.

ARPACK: Arnoldi PACKage. This is a software library, coded in FORTRAN,

which can be used to solve eigenvalue problems.

BGA: Ball Grid Array.

CLB: Configurable Logic Blocks. These are logic elements on FPGAs used to

implement circuits.

CPLD: Complex Programmable Logic Device.

CPU: Central Processing Unit.

DE: Differential Equations.

DFPM: Dynamic Functional Particle Method.

FPGA: Field Programmable Gate Array.

FPU: Floating-Point Unit.

HDL: Hardware Description Language. These are languages with which hardware can be described and designed as text, typically in an ISE or IDE.

IDE: Integrated Development Environment.

IOB: Input Output Block. These are ports for input and output to and from the

FPGA.

ISE: Integrated Synthesis Environment. This is software for synthesizing

designs done in HDL. Xilinx ISE is an example.

LAPACK: Linear Algebra PACKage. This is a library written in FORTRAN

which can be used to solve problems in linear algebra.

LDE: Linear Differential Equations.

LSB: Least Significant Bit.

LUT: Look Up Table

MATLAB: This is a software platform and high-level language used for pro-

gramming and simulations.

MCU: Microcontroller.

MSB: Most Significant Bit.

N/A: Not Applicable.

RAM: Random Access Memory.

RX: Receive. This is a pin through which data is to be received on a transceiver

port.

TX: Transmit. This is a pin through which data is to be transmitted on a trans-

ceiver port.

UART: Universal Asynchronous Receiver Transmitter.

USB: Universal Serial Bus.

VGA: Video Graphics Array. This is a standard for image display.

VHDL: VHSIC Hardware Description Language. In this project, VHDL was

used for digital hardware design.

VHSIC: Very High Speed Integrated Circuit.

2.2 DFPM algorithm

The Dynamic Functional Particle Method (DFPM) is applicable to a wide range of problems that can be defined as systems of linear or differential equations. However, the focus of this project is on the application of DFPM to the classical A*X = B system of linear equations, as described in Chapter 1 of this report.

The algorithm is simply a two-step computation which is iterated until convergence (or a specified level of convergence) is reached. Convergence is checked by evaluating the Euclidean norm of the difference between vector B and the product of matrix A and vector X, and comparing it with a predetermined scalar value representing the acceptable tolerance of the computation.

The algorithm requires a number of inputs: three vectors of size n, namely vector B from the problem statement and vectors X and V, which are used in the algorithm, and an n x n matrix equivalent to matrix A in the problem statement. Three scalar inputs, Dt, mu and tolerance, are also expected; they represent the discretization step, the damping factor and the tolerance respectively.
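The two-step iteration and the convergence check can be sketched as follows. This is a minimal sketch, not the author's implementation: the update rule shown (a damped second order system integrated with one velocity update followed by one position update) is an assumption, since the exact formulas are not given in this section, and the function name, the starting vectors of zeros, the iteration cap and the example values are all hypothetical.

```python
import math


def dfpm_solve(A, B, dt, mu, tol, max_iter=10000):
    """Iterate an assumed DFPM-style update until ||B - A*X|| < tol."""
    n = len(B)
    X = [0.0] * n  # solution estimate (assumed to start at zero)
    V = [0.0] * n  # auxiliary "velocity" vector (assumed to start at zero)
    for iteration in range(max_iter):
        # Residual R = B - A*X, also used for the convergence check.
        R = [B[i] - sum(A[i][j] * X[j] for j in range(n)) for i in range(n)]
        if math.sqrt(sum(r * r for r in R)) < tol:
            return X, iteration
        # Step 1: new V from the residual and the damping term (assumed form).
        V = [V[i] + dt * (R[i] - mu * V[i]) for i in range(n)]
        # Step 2: new X from the updated V.
        X = [X[i] + dt * V[i] for i in range(n)]
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical 2x2 symmetric positive definite example.
X, iterations = dfpm_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0],
                           dt=0.1, mu=3.0, tol=1e-9)
print(X, iterations)
```

Within each step, every element of V (and of X) depends only on values from the previous step, which is what makes element-wise parallel evaluation possible in hardware.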

Figure 2.2 A flowchart of the DFPM algorithm

Sample MATLAB code implementing the algorithm in Figure 2.2 is included in Appendix D of this report.

3 Methodology

As stated in the introductory part of this report, one of the purposes of this project is the reduction of CPU time. Hence, significant attention was paid to the computational processes implemented in this design, as well as their impact on speed and resource use on the FPGA. This chapter describes the methodologies and considerations that influenced the design and implementation described in the following chapter.

The preference of an FPGA over traditional CPUs and other types of pro-

cessing units is a consequence of the advantages offered by operational con-

currency that is characteristic of FPGAs and CPLDs.

After having chosen a design concept, the next biggest challenge was the

design itself. The design in this project work was done in VHDL (VHSIC

Hardware Description Language). While there are other languages and ap-

proaches to similar hardware design, VHDL was chosen because of the ease

with which it can be used to manage large projects, as well as the author’s

familiarity with it.

3.1 Concurrence vs. sequentiality

A limitation that was encountered early in the course of the design was the limited number of dedicated multipliers available on the FPGA. This limits the number of multiplications that can be executed concurrently.

An important focus of this work is speed optimization, for which concurrency

is key in this implementation. However, a balance needed to be struck between

concurrency and sequentiality. Hence some operations were run in parallel

while others were sequential. Addition and subtraction operations were most-

ly concurrent while some multiplicative operations were sequential and others

parallel.
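The trade-off can be illustrated with a rough cycle count. The sketch below assumes that each available multiplier produces one product per clock cycle and ignores pipelining and operand width; the function name and the multiplier counts are hypothetical and are not figures from this design.

```python
import math


def multiplication_cycles(num_products, num_multipliers):
    """Clock cycles needed when at most num_multipliers products run per cycle."""
    if num_products < 0 or num_multipliers < 1:
        raise ValueError("invalid product or multiplier count")
    return math.ceil(num_products / num_multipliers)


# A 5x5 matrix times a 5x1 vector needs 25 element products.
n = 5
products = n * n
for multipliers in (1, 5, 25):
    print(multipliers, "multiplier(s):",
          multiplication_cycles(products, multipliers), "cycle(s)")
```

Fully sequential execution sits at one end of this range and fully parallel execution at the other; a mixed design falls in between.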

3.2 Numerical representation

The dynamic functional particle method involves an iterative process with a

number of multiplications, subtractions and additions at each stage. The algo-

rithm was implemented in MATLAB and run while the result of the computa-

tions at each stage of the iteration was output to the console and examined.

A cursory examination showed that the values obtained from the computations ranged across both the positive and negative parts of the number line, so a scheme was needed to represent negative and positive values distinctly. The values also contained integer as well as fractional parts, so fractions had to be representable as well.

3.3 Modularity

In order to simplify the design, the whole project was split into two major top modules. One of them implemented the DFPM algorithm and the necessary iterative computations, while the other implemented UART communication and data exchange between the UART hardware on the FPGA board and the port on the PC with which it communicates. This second module was also responsible for converting the 8-bit parallel data into 33-bit numbers in the format expected by the DFPM algorithm module.

Each of these top modules was subdivided into smaller modules which carried

out specific functions and communicated with other modules through signals

and inter-module data exchange.

The details of the design are discussed under design in Chapter 4.

4 Design

The digital hardware designed in VHDL consisted of combinatorial and synchronous circuits, which were coded as IO ports, modules, processes and signals. The functioning of the combinatorial circuit elements was instantaneous, while synchronous circuit activities took place at the edge of the clock.

The complete design was made up of several modules exchanging information

with the aid of signal input and output via their ports. Since the design is

reasonably complex and large, an attempt was made to give each module a

name that signified or helped to identify the purpose and function of the

modules.

The core of the design consisted of the modules that executed the DFPM algorithm. An overview of these core modules and their interaction is presented in Figure 4.1.

4.1 The DFPM algorithm

The dynamic functional particle method is widely applicable to many problem

models as stated in Chapter 2 of this report. However, in order to design a

circuit that specifically solves the A*X = B problem, one needs to understand

the step by step procedure of applying DFPM to the problem. Various imple-

mentations of DFPM in MATLAB, C++ and VHDL as applied in this thesis are

included in the appendix.

The procedure begins with access to the input matrix and vectors, whose elements make up the coefficients of the system of equations. The next step is the iterative computation, after which comes the output. Throughout the process, the values of vector B, matrix A, Dt and the damping factor (mu) remain fixed, while the values of vectors X and V may be modified at the end of each iteration.

Each stage of the iterative computation comprises two steps: the approximation calculation and the convergence check. The approximation calculation takes the form of matrix multiplication, subtraction and addition operations, while the convergence check requires a comparison of a predetermined tolerance value with the Euclidean norm of the vector V.
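The two steps of an iteration stage can be sketched in Python as follows. This is only an illustrative sketch and not the thesis code (the MATLAB, C++ and VHDL implementations are in the appendix). The update rule, new V = V + dt*(B - A*X - mu*V) followed by new X = X + dt*(new V), is an assumption built around the expression "B - A*X - mu*V" named in section 4.5. The convergence check here uses the squared norm of B - A*X, as described for the tolerance check module. The function name `dfpm_solve` and the 2 by 2 problem are hypothetical.

```python
def dfpm_solve(a, b, dt=1.0, mu=0.1, tol=2.0 ** -7, max_iter=2000):
    """Iterate until the squared norm of B - A*X drops below tol squared."""
    n = len(b)
    x = [1.0] * n  # initial X, all ones
    v = [1.0] * n  # initial V, all ones
    for k in range(max_iter):
        # Residual B - A*X (one row-by-vector product per row of A).
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Convergence check on squared quantities, so no square root is needed.
        if sum(ri * ri for ri in r) < tol * tol:
            return x, k, True
        # Assumed update: new V from B - A*X - mu*V, then new X from new V.
        v = [v[i] + dt * (r[i] - mu * v[i]) for i in range(n)]
        x = [x[i] + dt * v[i] for i in range(n)]
    return x, max_iter, False


# Hypothetical 2 by 2 problem whose exact solution is X = [0, 2].
A = [[2.0, 0.5], [0.5, 1.0]]
B = [1.0, 2.0]
X, iterations, converged = dfpm_solve(A, B)
print(converged, iterations, [round(xi, 3) for xi in X])
```

The defaults for dt, mu and the tolerance mirror the parameter values listed in section 4.7; whether the iteration converges for a given dt and mu depends on the matrix A.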

Figure 4.1. An overview of the core modules of the DFPM algorithm

4.2 Project top module

The topmost level container for the project HDL code was named

DFPM_ON_FPGA_TOP_MODULE. This module functioned as the overall top

module, containing all VHDL code relevant to the project design. It consisted

of two top modules which served two distinctly important functions. The

modules were named “UART_INTERFACE” and

“Signed_DFPM_Iteration_Control_Top_Module”. The complete VHDL code

for all the modules will be included as an appendix to this report.

4.2.1 The two top sub-modules

The communication top module was designed to handle communication with the PC through the UART port and the UART VHDL code that controlled it. Data received from the PC, which arrived 8 bits at a time, were converted to 33 bits in the format stated in section 3.2.2 of this report. The data were also accumulated in arrays internal to this module until all data relevant to the specific problem model had been received. The data would then be sent as output through the ports of this module.

The Signed DFPM Iteration control module receives a stream of 33-bit data in a

format specified in its design, which mathematically describes the problem

being solved. The data received would then be subjected to the DFPM algo-

rithm, after which a solution would be obtained and sent out as an output

through the ports of this module.

At the conclusion of the Signed DFPM Iteration Control module’s computa-

tion, the output signal would be returned to the Communication top module

which reconverts the solution by first translating the result into human reada-

ble decimal equivalent before serially shifting the values out in 8 bits through

the UART interface.

4.2.2 Data type conversion

The communication top module handles data as standard logic vectors and

standard logic signals while the Signed DFPM Iteration Control module han-

dles data as signed bit vectors for all vectors.

This necessitated the conversion of the data signal types from standard logic vectors to signed bit vectors and vice versa. This was done with the aid of the standard predefined conversion functions in VHDL. The conversion takes place in the project top module.

4.3 Project defined packages

The input data for each problem consisted of scalar data and many vectors and

some multi-dimensional matrices. Hence a specific format was designed for

easy recognition and handling of these vectors and matrices. Due to the fact

that these design-specific format vector data types were often handled and

shared between multiple modules in the project, it was considered advanta-

geous to create special packages to define these unique format vectors.

The specific formats designed are described below:

1. DFPM_VECTOR_5X32_BIT: A data type defining an array of 5 standard

logic vectors. Representative of a 5 by 1 vector of standard logic type

data.

2. DFPM_VECTOR_25X32_BIT: A data type defining an array of 5

DFPM_VECTOR_5X32_BIT. I.e. a multidimensional array equivalent to

a 5 by 5 matrix of standard logic vector type data.

3. DFPM_ARRAY_5X32_BIT: A data type defining an array of 5 signed bit vectors. It was used to represent 5 by 1 vectors containing signed data.

4. DFPM_ARRAY_25x32_BIT: A data type defining an array of 5

DFPM_ARRAY_5X32_BIT. This is equivalent to a 5 by 5 multidimen-

sional array of signed data.

These packages were used to ease the process of design and implementation

and also facilitated a unified standard between modules.

4.4 Communication top module

The communication top module comprised 8 sub-modules. The modules and their functionalities are briefly described below.

4.4.1 UART

These are the modules controlling the UART circuitry:

1. RS232RefComp: This module was released by Digilent Inc. as a sample

code for an implementation of a UART core for the Nexys2 board. It is

the only purely non-original code used in this project.

It is a simple implementation of UART designed in VHDL and it is re-

sponsible for 1 bit serial data transmission and reception, as well as the

conversion of 1-bit serial to 8-bit parallel data and transmission to the

on-board electronic hardware.

2. UART_INTERFACE: This module was used to control the RS232RefComp circuit. It determines when the UART core should transmit data, receive data or do neither.

This module is a simple four-state state machine. The states correspond to:

a. Receive state: The UART core is switched to receive data.

b. Waiting state: Both the UART interface and the UART core do nothing but wait for data from the DFPM module.

c. Send state: The UART module is switched to send 8 bits of data.

d. RepeatSend state: A transitional state that the module enters after sending each 8-bit datum and before sending the next. This helps to ensure that the data transmission between the UART INTERFACE and the UART core is hitch-free.

The control of the UART core from the UART INTERFACE and feed-

back from the UART core was facilitated with the aid of four signals namely

wrSig, rdSig, TBESig and RDASig. These signals and their effect on the UART

core are outlined in Table 4.1 below.

Table 4.1 Table of control signals and their effect on the state of the UART core

Signal   Value   Transmit status   Receive status
wrSig    0       Off               N/A
wrSig    1       On                N/A
rdSig    0       N/A               On
rdSig    1       N/A               Off

Feedback from the UART core was received through the TBE and RDA signals, which, when raised high, indicated that data had been transmitted or that new data had been read, respectively.
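The four-state controller and its control signals can be illustrated with a small Python model. This is a sketch under stated assumptions and not a translation of the VHDL: the report names the four states and the meaning of wrSig and rdSig, but not the transition conditions, so every condition in `next_state` (and the argument names `all_data_received`, `result_ready` and `bytes_left`) is hypothetical.

```python
from enum import Enum


class State(Enum):
    RECEIVE = "Receive"
    WAITING = "Waiting"
    SEND = "Send"
    REPEAT_SEND = "RepeatSend"


def next_state(state, all_data_received, result_ready, bytes_left):
    """Return the next state; every transition condition here is assumed."""
    if state is State.RECEIVE:
        return State.WAITING if all_data_received else State.RECEIVE
    if state is State.WAITING:
        return State.SEND if result_ready else State.WAITING
    if state is State.SEND:
        return State.REPEAT_SEND
    # REPEAT_SEND: go back to SEND while bytes remain, else listen again.
    return State.SEND if bytes_left > 0 else State.RECEIVE


def control_signals(state):
    """Map a state to (wrSig, rdSig) following Table 4.1."""
    wr_sig = 1 if state is State.SEND else 0  # wrSig = 1 turns transmit on
    rd_sig = 0 if state is State.RECEIVE else 1  # rdSig = 0 turns receive on
    return wr_sig, rd_sig


print(control_signals(State.RECEIVE), control_signals(State.SEND))
```

Only the signal polarities in `control_signals` are taken from Table 4.1; the rest is one plausible reading of the state descriptions.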

4.5 Iteration control top module

This module is made up of the circuitry that implements the DFPM algorithm.

The sub-modules were designed to carry out the various computations and

logical evaluation required in the DFPM method.

1. Signed_Vector_Vector_Mult_5By1: This module computes the element-wise product of two 5 by 1 vectors of 33-bit data. Its operation is concurrent and all computation results are immediately available at the output when the input values change.

2. Signed_Vector_Vector_5By1_Subtr: This module computes the element-wise difference between two vectors. It concurrently performs subtraction operations on two vectors containing five elements of 33-bit data type and immediately assigns the result to the output.

3. Signed_SubtrAndMult_Ops_Module: This module instantiates the

vector multiplication and the vector subtraction modules above and us-

es them in the computation “B – A*X – mu*V” for each iteration stage of

the DFPM algorithm.

In this module, computation of the product of matrix A and vector X was a combination of concurrent and sequential operations. The product of one row of matrix A and the vector X was computed concurrently, but since matrix A comprises 5 rows, the row products were pipelined in order of row sequence.

4. Signed_New_V_Ops: This module computed a new value for the vec-

tor V at each iteration stage of the DFPM algorithm. The value was

based on the result of the operations carried out in the subtraction and

multiplication operations module, described in number 3 above.

5. Signed_New_X_Ops: This module computed a new value for the vector

X in each iteration stage of the DFPM algorithm. The new value for vec-

tor X is always dependent on the new value of vector V above.

6. Signed_Tolerance_Check: This module receives the value of B-A*X as

input and should then compare the Euclidean norm of the vector re-

ceived with the pre-fixed tolerance value. However, computing square

roots in FPGA can be problematic and introduce significant errors.

Hence, the square of the tolerance value was compared with the square

of the Euclidean norm, which is equivalent to the sum of the squares of

the elements that make up the vector input.

After comparison, if the square of the norm was found to be less than the square of the tolerance level, a signal line would be raised and the algorithm would terminate. The squares of the two vectors were computed by multiplying each by itself with the aid of the Vector_Vector_Mult module described above.

When the condition checked by this module is found to be true, conver-

gence is said to have been reached.

7. Signed_DFPM_One_Iteration: This module instantiated the subtraction and multiplication module, the new V operation module, the new X operation module and the tolerance check module. It connected their inputs and outputs appropriately and thus contains all the operations that make up one iteration stage of the DFPM algorithm.

8. Signed_DFPM_Iteration_Control: This module instantiated the

Signed_DFPM_One_Iteration module. It feeds the new V and X vectors

back into the computational module and stops the iterations when con-

vergence is attained.
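The square-root-free comparison described for the Signed_Tolerance_Check module (item 6 above) can be sketched in Python as follows. The function name `has_converged` and the sample vectors are hypothetical, and floating-point numbers stand in for the 33-bit fixed-point words used on the FPGA.

```python
def has_converged(residual, tolerance):
    """Compare the squared Euclidean norm with the squared tolerance."""
    norm_squared = sum(r * r for r in residual)  # sum of squares, no sqrt
    return norm_squared < tolerance * tolerance


TOLERANCE = 2.0 ** -7  # default tolerance from Table 4.2
small = [0.001, -0.002, 0.0, 0.003, 0.001]  # hypothetical B - A*X near zero
large = [0.5, 0.0, 0.0, 0.0, 0.0]  # hypothetical B - A*X far from zero
print(has_converged(small, TOLERANCE), has_converged(large, TOLERANCE))
```

Because both sides of the inequality are non-negative, comparing the squares gives the same decision as comparing the norm with the tolerance directly.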

4.6 Implementation constraint

In order to translate, map and route the VHDL design to a device-specific circuit, an implementation constraints file named UCF_DFPM_TOP was used. The file links the input and output pins specified in the project top module with the intended pins on the FPGA chip and demonstration board.

4.7 Parameters

The design was intended to make room for some level of easy configurability.

Thus, the initial values of vectors v and x, and the scalar discretization coeffi-

cient (dt), the tolerance and the damping factor (mu) can be changed inside the

DFPM modules. The UART module parameters can also be easily modified.

The default values for these parameters are listed below:

Table 4.2 Table of parameters and corresponding values used

S/N   Parameter                               Value used
1.    Vector V                                [1 1 1 1 1]
2.    Vector X                                [1 1 1 1 1]
3.    Damping factor                          0.1
4.    Discretization coefficient              1.0
5.    Tolerance                               $2^{-7}$
6.    UART baud rate                          9600
7.    Number of data bits per transmission    8
8.    Parity                                  odd
9.    Number of stop bits                     1
10.   Handshaking                             None

4.8 Data exchange format

The exchange of data between the PC terminal and the FPGA system needed

to be standardized in order for the data to be stored in the correct structure

and also for it to be usable by the DFPM computation modules.

The MATLAB approach for specifying vectors and matrices was, hence,

adopted.

To specify a problem set in the format usable by the DFPM module, each problem set begins with an opening brace and ends with a closing brace. Between the braces, the elements of each row of the matrix are separated by whitespace and the rows are separated by semicolons. The solution output from the FPGA is transmitted using the same standard, except that the opening and closing braces are omitted.

An example of the utilization is shown in Figure 4.2 below.

Figure 4.2 Image showing the terminal being used for data exchange be-

tween the FPGA and the PC
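The exchange format described in this section can be sketched with a small PC-side parser in Python. This is an illustration under assumptions, not the code that runs on the FPGA: square brackets are assumed as the delimiters (as in MATLAB), integer elements are assumed because section 6.6 states that single digit decimal numbers are expected as input, and the function names `parse_matrix` and `format_solution` are hypothetical.

```python
def parse_matrix(text):
    """Parse a MATLAB-style literal such as "[1 2; 3 4]" into rows of ints."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected an opening and a closing bracket")
    rows = []
    for row_text in body[1:-1].split(";"):  # rows are separated by semicolons
        fields = row_text.split()  # elements are separated by whitespace
        if fields:
            rows.append([int(field) for field in fields])
    return rows


def format_solution(values):
    """Format a solution vector without brackets, four fractional digits."""
    return " ".join(f"{value:.4f}" for value in values)


print(parse_matrix("[1 2 3; 4 5 6]"))
print(format_solution([-0.2211, 0.0776]))
```

The four fractional digits in `format_solution` follow the output limit mentioned in section 6.6.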

4.9 Signed numerical representation

Since digital systems only deal with binary arithmetic for numerical computa-

tions and representation, the numbers handled in the DFPM algorithm were

represented by using signed bits. This decision helped to ensure that positive

and negative numbers were distinguished from one another.

The downside of this approach was that the bit being used for sign representa-

tion could not be used for numerical value representation. Therefore an extra

bit needed to be added to the number of bits representing each signed number

in order to make up for the shortfall.

4.10 Integer and fractional representation

Another important consideration in the design was the representation of

fractional values. It was decided that binary digits after the radix point will be

represented and treated like whole integers i.e. shifted to the left. At the end of

all computations, the result will also be shifted to the right by the appropriate

number of binary digits to make up for the left shift. This process is a simple

scheme that makes for the manipulation of fractions in a way that is similar to

whole numbers.

As a result, each number in the DFPM algorithm consisted of 33 bits. The MSB

indicated the sign of the number while the next 16 bits represented the integer

part of the value being handled. The fractional part of the number was then

represented by the least significant 16 bits.

Below is an image showing a sample numerical representation as used in the design. It can be seen that the MSB is “0”, therefore it is a positive number. The next 16 bits are equivalent to $9_{10}$ and the last 16 bits are equivalent to 0.628906 (i.e. $2^{-1} + 2^{-3} + 2^{-8}$). Hence the number represented in the image below is $+9.6289_{10}$.

Fig 4.3 Image showing the numerical representation scheme
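The 33-bit layout (1 sign bit, 16 integer bits, 16 fractional bits) can be reproduced with a short Python sketch. It assumes that negative values are stored in two's complement, as stated in Chapter 5; the helper names `encode` and `decode` are hypothetical and do not appear in the VHDL design.

```python
FRACTION_BITS = 16
WORD_BITS = 33  # 1 sign bit + 16 integer bits + 16 fractional bits
WORD_MASK = (1 << WORD_BITS) - 1


def encode(value):
    """Scale by 2**16 and wrap into a 33-bit two's complement word."""
    return round(value * (1 << FRACTION_BITS)) & WORD_MASK


def decode(word):
    """Interpret a 33-bit two's complement word as a real number."""
    if word >> (WORD_BITS - 1):  # sign bit (MSB) set
        word -= 1 << WORD_BITS
    return word / (1 << FRACTION_BITS)


word = encode(9.62890625)  # 9 + 2**-1 + 2**-3 + 2**-8
print(format(word, "033b"))
print(decode(word), decode(encode(-14.0)))
```

The scaling by 2**16 is the "shift to the left" described above, and the division in `decode` is the matching shift to the right.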

The multiplication of two numbers with n fractional binary digits results in a product with 2n fractional binary digits. This scheme, therefore, offers an advantage in multiplication operations since it ensures that multiplicative operations maintain a precision of $2^{-8}$ (in decimal) for each operation.
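The doubling of fractional digits can be seen in a short Python sketch of fixed-point multiplication. It is an assumption that the design rescales the product by discarding the 16 least significant fractional bits; the report does not show the exact truncation, and the helper names `to_fixed` and `fixed_multiply` are hypothetical.

```python
FRACTION_BITS = 16


def to_fixed(value):
    """Represent a real number as an integer scaled by 2**16."""
    return round(value * (1 << FRACTION_BITS))


def fixed_multiply(a, b):
    """Multiply two scaled integers and restore the 16-bit fraction."""
    product = a * b  # carries 2 * 16 = 32 fractional bits
    return product >> FRACTION_BITS  # drop the 16 extra fractional bits


result = fixed_multiply(to_fixed(1.5), to_fixed(2.25))
print(result / (1 << FRACTION_BITS))
```

Python integers are unbounded, so the sketch does not model the 33-bit word width; it only shows the rescaling step.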

4.11 Spartan 3E-1200 FG320 FPGA

The Spartan 3E-1200 FG320 FPGA is a standard performance FPGA chip in a 320-ball fine pitch ball grid array package, with 1.2 million gates, 136 K RAM, 28 dedicated multipliers and 250 user IO pins [7]. The chip is made up of five functional elements: the Digital Clock Managers (DCMs), the Input/Output Blocks (IOBs), the Configurable Logic Blocks (CLBs), the dedicated multipliers and the block RAMs.

The dedicated multipliers are able to directly compute 18-bit by 18-bit multiplications in two’s complement, while the IOBs can be used for data input and output to and from the FPGA. The 136 K RAM is equivalent to 139264 bits (136 * 1024 bits) of memory available for storage on the chip. The logic of the combinatorial and synchronous circuits resulting from the VHDL design is mainly implemented in the CLBs.

4.12 Nexys2 FPGA demonstration board

The Nexys2 FPGA demonstration board is a hardware platform, designed and

manufactured to accommodate and support the Spartan 3E FPGA, enable a

demonstration of its capabilities and provide some standard hardware periph-

eral access to the chip.

It can be powered via USB, battery or wall socket and runs on a 50 MHz oscil-

lator while featuring 16 MB SDRAM and flash and an impressive array of

standard hardware interfaces like VGA, USB, RS232 ports as well as switches,

buttons and a quad digit seven segment display [8].

Figure 4.4 Image showing a Nexys2 FPGA demonstration board

4.13 Xilinx ISE

Hardware design was done with Xilinx ISE (Integrated Synthesis Environment) and the generated design was then downloaded onto the FPGA. Xilinx ISE is free software developed by Xilinx for hardware design and for programming FPGAs.

There are a number of other design/synthesis environment applications for hardware design, e.g. Altera’s Quartus II design environment. However, Xilinx ISE was the obvious choice because it is offered by the vendor of the FPGA chip used, and because it provides out-of-the-box support for the FPGA chip and the board used.

4.14 ISim simulation software

ISim simulator software is a software application for the simulation of HDL

code which is bundled with the Xilinx ISE software suite. It is easy to use and

provides support for mixed languages, multi-threaded compilation, and dis-

plays the circuit behavior with the aid of waveforms on the screen.

ModelSim is also a simulation software that can be used but due to its usage

restrictions and the author’s familiarity with ISim, ISim was chosen over

ModelSim.

4.15 Design verification

For each module designed in this project, a test-bench was written for testing,

simulation and verification of its functionality and behavior. Test-benches, in

this context, refer to VHDL code written for the purpose of simulating opera-

tional circumstances of the designed module in question. The modules being

tested are normally referred to as unit under test (UUT).

4.16 The complete design

The complete system integrated these different modules and connected them

while doing type conversion in the top module where appropriate. The incom-

ing data from the UART were converted to signed bit vectors and stored in

memory on the FPGA until all the data necessary for each problem set were

received.

After this, a signal that activates the DFPM computation module is raised so

that computation can start. The complete design made use of 26 multipliers, 12

IOB pins and 3243 LUTs. While the utilization of multipliers was 92%, the

utilization of logical and IO blocks was much lower. A copy of the project

report summary is included in the appendix of this report.

Figure 4.5 The Nexys2 board FPGA connected to a PC and running the

DFPM algorithm.

5 Results

Every module designed in Chapter 4 of this report was tested with a test-bench written in VHDL. The test benches were written to simulate the expected conditions and functional environment for each module. The simulations were done in ISim and each module’s behavior was verified through visual inspection and calculations. The test benches are not included in the appendix of this report. The following are results of the tests carried out on the modules.

It is worth noting that since the values represented in this chapter are basically

binary, negative numbers were represented in two’s complement.

5.1 Simulation results

5.1.1 Element wise vector multiplication

The image below shows the result of the simulation of the vector multiplica-

tion module. Vectors 1 and 2 were input while vector_out was the output.

Fig 5.1 Test simulation for Signed_Vector_Vector_Mult module

Vector 1 = [5.0 3.0 2.0 4.0 7.0] and Vector 2 = [3.0 2.0 3.0 4.0 5.0]

The output vector was $1001110_2$ = 78.0

By calculation: (5*3) + (3*2) + (2*3) + (4*4) + (7*5) = 78

This supports the idea that the module worked fine.

5.1.2 Element-wise vector subtraction

Figure 5.2 Test simulation for Signed_Vector_Vector_5By1_Subtr module

Above is an image of the simulation waveform for the vector subtraction

module. The input vectors were named vectors 1 and 2 while the output was

named vector_out.

Vector 1 = [1.0 7.81e-3 11.72e-3 15.62e-3 19.53e-3]

Vector 2 = [15.0 3.91e-3 3.91e-3 3.91e-3 3.91e-3]

Vector out = [-14.0 3.91e-3 7.81e-3 11.72e-3 15.62e-3]

Simple calculation indicates that Vector 1 – vector 2 = vector out.

5.1.3 Evaluating new vector V

In the image below, the effect of operations pipelining can be seen as the

elements of vector_new_v assume new values one clock cycle after one anoth-

er. The iteration complete signal indicates the completion of the subtraction

and multiplication operations in each iteration stage.

Figure 5.3 Test simulation for Signed_New_V_Ops

5.1.4 Evaluating new vector X

Similar to the module in section 5.1.3 above, the effect of pipelining is seen in

the evaluation of vector_new_x. The signal new_v_ready signified that the

evaluation of the new value for vector V was complete and that the evaluation

process for vector x can start.

Figure 5.4 Test simulation for Signed_New_X_Ops

The signal new_X_ready is a signal line that indicated that the operation was

complete. The behavior was as expected.

5.1.5 Convergence check

The tolerance check module was simulated with two sets of values for vector

b_ax. The first set of values was set to be beyond the tolerance level while the

second set of values was set to be below the expected limit.

The signal “iteration complete” was raised at the end of each multiplication and subtraction operation of the iteration stage. The convergence check module completes its function in about seven clock cycles, after which the “iterate” signal is driven high or low depending on the result of the convergence check.

Figure 5.5 Test simulation for tolerance check module

It can be seen above that after the second set of values were received and

computed, the “iterate” signal was brought low. This is consistent with the

design concept.

5.1.6 DFPM top module

This simulation was done with the following input set:

Vector B

Matrix A

Vectors X and V

By visual inspection of the results from the simulation, the final value of vector

X on the output was calculated thus:

Vector X(0) is a negative number since the first bit is 1.

$111111111111111111100011101100101_2$ in two’s complement is equivalent to $-000000000000000000011100010011010_2$ in unsigned binary. A simplified approach to conversion of unsigned binary to and from two’s complement is outlined in the appendix.
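The manual conversion can also be checked with a few lines of Python. This is an illustrative helper and not the simplified approach from the appendix; the function name `decode_bits` is hypothetical and the example word is constructed in the code rather than taken from the simulation.

```python
def decode_bits(bits):
    """Convert a 33-character two's complement bit string to a real number."""
    if len(bits) != 33 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 33 binary digits")
    value = int(bits, 2)
    if bits[0] == "1":  # MSB set: negative number
        value -= 1 << 33
    return value / 2 ** 16  # the lowest 16 bits are fractional


# Hypothetical word: the 33-bit pattern of -(2**-3 + 2**-4 + 2**-5).
example = format((1 << 33) - (0b111 << 11), "033b")
print(example, decode_bits(example))
```

Words read off a simulation waveform can be passed to `decode_bits` as strings in the same way.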

Figure 5.6 Test simulation for DFPM top module

Hence it is correct to state that:

Vector X(0) = $-(0.0 + 2^{-3} + 2^{-4} + 2^{-5} + 2^{-9} + 2^{-12} + 2^{-13} + 2^{-15})$.

Vector X(0) = -0.2211

In the same manner Vector X(1) is a negative number.

$111111111111111111110010001110011_2$ in two’s complement is equivalent to $-000000000000000000001101110001100_2$ in unsigned binary. Hence,

Vector X(1) = $-(0.0 + 2^{-4} + 2^{-5} + 2^{-7} + 2^{-8} + 2^{-9} + 2^{-13} + 2^{-14})$

Vector X(1) = -0.1076

Vector X(2), Vector X(3) and Vector X(4) are positive numbers since their MSBs are 0. Therefore conversion from two’s complement is not required for them.

Vector X(2) = 000000000000000000001001111100000

Vector X(2) = $+0.0 + 2^{-4} + 2^{-7} + 2^{-8} + 2^{-9} + 2^{-10} + 2^{-11}$

Vector X(2) = +0.0776

Vector X(3) = 000000000000000000011111001011011

Vector X(3) = $+0.0 + 2^{-3} + 2^{-4} + 2^{-5} + 2^{-6} + 2^{-7} + 2^{-10} + 2^{-12} + 2^{-13} + 2^{-15} + 2^{-16}$

Vector X(3) = +0.2436

Vector X(4) = 000000000000000000101101000100000

Vector X(4) = $+0.0 + 2^{-2} + 2^{-4} + 2^{-5} + 2^{-7} + 2^{-11}$

Vector X(4) = +0.3520

Therefore the final value of the solution vector in this simulation was X = [-0.2211, -0.1076, 0.0776, 0.2436, 0.3520].

While the behavior seen above was consistent with design expectation, it was

considered that comparison with the output from a MATLAB implementation

would help to further verify the module’s behavior.

The values obtained from the MATLAB code and the VHDL simulations were

quite close as the MATLAB implementation produced vector X as shown

below:

X = [-0.2199, -0.1074, 0.0775, 0.2440, 0.3521]

5.2 Comparison

The circuit implemented on FPGA was tested by connecting the FPGA to a PC

and sending in numbers that represented problem sets while the FPGA re-

turned the solution to the problems. Since the accuracy was crucial, the results

obtained during these tests were noted and compared with values obtainable

from the same algorithm implemented in MATLAB on a PC. The comparison

showed that the values obtained by both systems, for each problem set inves-

tigated, were approximately equal. A table comparing the results obtained

during two of these tests is shown below.

Table 5.1 Table of a comparison of the results obtained from two runs of DFPM on different systems.

Row                                     1st test   2nd test
Problem set: Vector A
Problem set: Vector B
Solution vector (MATLAB/PC), binary     N/A        N/A
Solution vector (MATLAB/PC), decimal
Solution vector (FPGA), binary
Solution vector (FPGA), decimal

6 Discussion

Based on the tests carried out on the VHDL design modules, the behavior of the circuit was as expected. However, a number of implications need to be discussed.

6.1 FPGA resource utilization

Because FPGAs have limited resources, there is a limit to the number of multiplication operations that can be executed in parallel for problems of the 5x5 matrix dimension implemented in this design. As matrix dimensions get bigger, the number of concurrent operations possible is reduced proportionately.

In this design, a problem defined by an n by n matrix and n-element vectors needs n + 5 multipliers. This is because the row-vector multiplication in A*X was done concurrently for each row, while other multiplication operations were done sequentially. Another limitation is the data size expected by the dedicated multipliers.

The Spartan 3E multipliers are 18-bit multipliers by default and multiplication

operations involving data types bigger than 18 bits will consume even more

resources. As can be seen in the project report, the actual number of multipli-

ers used was 26 out of a total of 28.

6.2 Reduction in computation time

For every iteration stage of this design, the computation time for $(n-1)^2$ multiplication operations is saved. Thus, for a solution requiring m iterations, the time required for $(n-1)^2 \cdot m$ multiplication operations is saved per solution. For instance, a 5 by 5 design as implemented in this project work saves the computation time for 1600 multiplication operations for a solution requiring a hundred iterations.
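As a worked instance of this estimate, with n = 5 and m = 100 (the values used in the example above):

```latex
\[
(n-1)^2 \cdot m = (5-1)^2 \cdot 100 = 16 \cdot 100 = 1600
\]
```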

6.3 Larger problem sets

An approach to implementing this design for significantly larger problem sets

might be to section the complete data set into subsets containing small-sized

problem sets which the module is capable of handling. The solutions can then

be stored and reused as appropriate. At a point, this approach might encounter

limitations as well, due to the fact that the on-chip memory of FPGAs is also

limited. However, this was not the focus of this design.

6.4 UART bottleneck

Tests showed that each iteration stage of DFPM computation for a 5 by 5

dimensioned problem required 28 clock cycles. However, the data was being

received through a 9600 baud rate UART. The UART is, thus, slower than the

DFPM computations. In a case where large volumes of data may need to be

transmitted to the DFPM computation module, the UART may prove to be a

bottleneck. This problem might be mitigated with the use of a more parallel

communication mode and faster transmission rates.
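The size of the gap can be estimated with a short Python calculation. It is a rough sketch under assumptions: the DFPM logic is assumed to run directly from the 50 MHz board oscillator, and each UART frame is assumed to carry one start bit in addition to the 8 data bits, parity bit and stop bit listed in Table 4.2.

```python
CLOCK_HZ = 50_000_000  # Nexys2 on-board oscillator
CYCLES_PER_ITERATION = 28  # measured for the 5 by 5 problem
BAUD_RATE = 9600
# Assumed frame: 1 start bit + 8 data bits + 1 parity bit + 1 stop bit.
BITS_PER_FRAME = 1 + 8 + 1 + 1

iteration_time = CYCLES_PER_ITERATION / CLOCK_HZ  # seconds per iteration
frame_time = BITS_PER_FRAME / BAUD_RATE  # seconds per transmitted byte

print(f"one DFPM iteration: {iteration_time * 1e9:.0f} ns")
print(f"one UART byte:      {frame_time * 1e6:.0f} us")
print(f"iterations per UART byte: {frame_time / iteration_time:.0f}")
```

Under these assumptions, the time needed to move a single byte over the UART is on the order of two thousand iteration stages.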

6.5 Precision

Although a large number of bits (16) was assigned to fractional value representation, there might be some challenges when it comes to the accuracy of the exact values obtained from multiplication operations. This is because the result of the multiplication of two 33-bit values is a 66-bit value. When this product is stored back in a 32-bit data type container, some bits will be lost.

This problem will, most likely, not affect integer values in the DFPM computa-

tion but can result in some precision loss in the fractional representation.

6.6 Communication input/output limitations

Since the data received from the UART could not be used directly, modules

were written for the forward and reverse translation of the data transmitted to

and received from the DFPM computation module.

For instance, due to the translation done in the “UART_out_DFPM_in” mod-

ule, only single digit decimal numbers are expected as input data typifying the

problem set. Likewise, in order to reduce FPGA resource consumption, reverse

translation of the solution vector element sets was also limited to four fraction-

al digits.

6.7 Cross platform comparison

Since the goal of the project was to implement DFPM in a speed-optimized
FPGA design, the CPU time consumed by the algorithm was of central
importance. However, since different computational devices have varying
architectures, processing speeds and operating systems, a metric for the
computation time that is independent of these parameters was needed in order
to compare the performance of the FPGA design with other implementations.
The metric chosen was the number of clock cycles used by the processing unit
while executing the DFPM algorithm.

A comparison was therefore made between the DFPM computation on the FPGA
and the same algorithm coded in C++ and run on a PC with a 2.4 GHz CPU.
According to simulation, the FPGA implementation solved the sample problem
used for testing the DFPM top module in 57670 nanoseconds, which is
equivalent to 2883.5 clock cycles, while the time measured on the PC for the
same problem was 0.0156001 seconds.

The time used by the PC included the time spent on context switching and
kernel operations in the operating system, as well as process user time.
Provision for this was made in the C++ code, which both implemented the
algorithm and measured the time taken.

In the C++ code, arrays with a dimension of 1000 were created to store a
thousand copies of vectors A and B, and the DFPM algorithm was looped
through each copy of the same problem statement. Thus a thousand copies of
the same problem were solved with the same algorithm. This large number of
repetitions was needed because, when the algorithm was run only once, the
time spent by the CPU in kernel mode was sometimes too short to be measured
by the functions used to measure the CPU process times.

Running the algorithm a thousand times therefore generated measurable
process times. The time spent by the CPU while not running the actual
algorithm was deducted from these, and the result was divided by 1000 to
obtain the CPU time applicable to a single run of the DFPM algorithm.
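The measurement idea can be sketched as follows. This is not the C++ code from the appendices: it is a Python illustration that uses `time.process_time` as a stand-in for the process-time functions used in the thesis, and it assumes that the overhead to deduct can be estimated by timing an empty loop of the same length. The workload function is a hypothetical placeholder.

```python
# Sketch of timing by repetition: run the workload many times, subtract an
# overhead estimate, and divide by the number of runs.
# Assumption: the overhead is estimated with an empty loop.
import time


def cpu_time_per_run(work, repeats=1000):
    """Estimate the CPU time of a single call to `work`."""
    start = time.process_time()
    for _ in range(repeats):
        work()
    loop_time = time.process_time() - start

    # Time an empty loop of the same length as the overhead estimate.
    start = time.process_time()
    for _ in range(repeats):
        pass
    overhead = time.process_time() - start

    # Clamp at zero in case timer noise makes the difference negative.
    return max(loop_time - overhead, 0.0) / repeats


def sample_workload():
    """Hypothetical placeholder for one run of the algorithm under test."""
    return sum(i * i for i in range(100))


if __name__ == "__main__":
    per_run = cpu_time_per_run(sample_workload)
    print(f"Estimated CPU time per run: {per_run:.3e} s")
```

Repeating the workload spreads the coarse resolution of the process timer over many runs, which is the reason given above for looping a thousand times.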

Based on this test, and on the assumptions that the program was executed on
only one core of the CPU and that the CPU was not overclocked, the number of
clock cycles used by the PC for a single run was
$2.4 \times 10^{9} \times 0.0156001 / 1000 = 37440.24$.
This indicates that the FPGA implementation, at 2883.5 clock cycles, offers a
considerable advantage.
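The arithmetic behind this comparison can be reproduced with the short Python sketch below. The figures are those quoted in this section. The 50 MHz FPGA clock is not stated here but is inferred from 57670 ns corresponding to 2883.5 cycles (20 ns per cycle).

```python
# Clock-cycle comparison using the figures quoted in Section 6.7.
# Assumption: 50 MHz FPGA clock, inferred from 57670 ns = 2883.5 cycles.

PC_CLOCK_HZ = 2.4e9        # PC clock frequency
PC_TIME_S = 0.0156001      # measured time for the repeated runs
PC_RUNS = 1000             # number of repetitions
FPGA_TIME_NS = 57670       # simulated FPGA run time
FPGA_CLOCK_MHZ = 50        # inferred FPGA clock frequency


def pc_cycles_per_run():
    """Clock cycles used by the PC for a single run."""
    return PC_CLOCK_HZ * PC_TIME_S / PC_RUNS


def fpga_cycles():
    """Clock cycles used by the FPGA for the same problem."""
    return FPGA_TIME_NS * FPGA_CLOCK_MHZ / 1000


def cycle_ratio():
    """How many times more cycles the PC needed than the FPGA."""
    return pc_cycles_per_run() / fpga_cycles()


if __name__ == "__main__":
    print(f"PC cycles per run: {pc_cycles_per_run():.2f}")
    print(f"FPGA cycles      : {fpga_cycles():.1f}")
    print(f"Ratio PC / FPGA  : {cycle_ratio():.2f}")
```

With these figures the PC needs about 13 times as many clock cycles as the FPGA for one run.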

It is worth noting that if the CPU executed the program on multiple cores or
was overclocked while running the program, the PC may have ended up

using more cycles than stated above. Nonetheless, the calculations show that
in either case the FPGA implementation would still have been faster. A copy of
the C++ code is included in the appendices.

6.8 Output comparison

In order to ensure consistency of results and ease of operation, a MATLAB
script was written that communicates problem specifications to the FPGA and
receives its results. The script also runs the algorithm itself, and the two
outputs are printed to the screen and compared. The script is described
further in Appendix D, with the code included.
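For orientation, the iteration that both implementations carry out can be sketched in a few lines. The Python function below is not the MATLAB script from Appendix D. It is a minimal floating-point sketch built from the quantities named in the Appendix A listings (B − AX − μV, the time step DT and the squared tolerance check). The update of X from V, the parameter values and the test problem are assumptions made for illustration.

```python
# Minimal floating-point sketch of a DFPM-style iteration for A x = b.
# Assumptions: x is updated as x + dt * v after v has been updated; mu, dt,
# tol and the example problem are illustrative values, not thesis values.


def dfpm_solve(A, b, mu=2.0, dt=0.1, tol=1e-6, max_iter=10_000):
    """Iterate until the squared residual norm drops below tol squared."""
    n = len(b)
    x = [0.0] * n
    v = [0.0] * n
    for iteration in range(max_iter):
        # Residual b - A x
        residual = [
            b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)
        ]
        # Tolerance check on squared quantities, so no square root is needed
        if sum(r * r for r in residual) < tol * tol:
            return x, iteration
        # New V: v + dt * (b - A x - mu v)
        for i in range(n):
            v[i] += dt * (residual[i] - mu * v[i])
        # New X: x + dt * v (assumed form of the update)
        for i in range(n):
            x[i] += dt * v[i]
    raise RuntimeError("iteration did not converge")


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    solution, iterations = dfpm_solve(A, b)
    print(solution, iterations)
```

For the small example system shown, the exact solution is x = (1/11, 7/11), which the sketch approaches as the damped motion settles.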

Using this script, three different problem sets were formulated and fed to the
DFPM on FPGA design. The results obtained are shown below, together with
MATLAB plots of the values obtained during each test.

The plots have no units on the x and y axes, since they were only used to
indicate how close the two sets of results are. Each plot shows the location
of each of the results on the co-ordinate axes.

Figure 6.1 Plot of the values obtained during the first test

Table 6.1 Table of results obtained in tests with three different problem sets

Test      MATLAB implementation    FPGA implementation

Test 1    -2.4599e-01              -2.4715e-01
          -1.9253e-01              -1.9301e-01
          +5.8280e-03              +5.7221e-03
          +2.5866e-01              +2.5965e-01
          +5.0859e-01              +5.1057e-01

Test 2    -3.8910e-01              -3.9112e-01
          -1.5755e-01              -1.5810e-01
          +1.2061e-02              +1.1765e-02
          +2.6273e-01              +2.6343e-01
          +5.1339e-01              +5.1507e-01

Test 3    +6.5463e-01              +6.5653e-01
          +3.7920e-01              +3.7948e-01
          +3.1785e-01              +3.2008e-01
          +6.8058e-02              +6.8391e-02
          -1.8173e-01              -1.8323e-01

Figure 6.2 Plot of the values obtained during the second test

Figure 6.3 Plot of the values obtained during the third test

As can be seen in the figures and table above, in each of the three tests the
results of the MATLAB implementation and the FPGA implementation agreed so
closely that the points overlapped at each of the positions marked on the
plots, indicating that the differences in the values obtained are almost
negligible.
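The size of these differences can be read directly from Table 6.1. The Python sketch below simply recomputes the largest absolute difference per test from the tabulated values. The function and variable names are chosen here for illustration.

```python
# Largest absolute difference between the MATLAB and FPGA results,
# using the values listed in Table 6.1.

MATLAB_RESULTS = {
    "Test 1": [-2.4599e-01, -1.9253e-01, +5.8280e-03, +2.5866e-01, +5.0859e-01],
    "Test 2": [-3.8910e-01, -1.5755e-01, +1.2061e-02, +2.6273e-01, +5.1339e-01],
    "Test 3": [+6.5463e-01, +3.7920e-01, +3.1785e-01, +6.8058e-02, -1.8173e-01],
}

FPGA_RESULTS = {
    "Test 1": [-2.4715e-01, -1.9301e-01, +5.7221e-03, +2.5965e-01, +5.1057e-01],
    "Test 2": [-3.9112e-01, -1.5810e-01, +1.1765e-02, +2.6343e-01, +5.1507e-01],
    "Test 3": [+6.5653e-01, +3.7948e-01, +3.2008e-01, +6.8391e-02, -1.8323e-01],
}


def max_abs_difference(a, b):
    """Largest element-wise absolute difference between two result vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    for name in MATLAB_RESULTS:
        diff = max_abs_difference(MATLAB_RESULTS[name], FPGA_RESULTS[name])
        print(f"{name}: largest absolute difference = {diff:.5f}")
```

For all three tests the largest difference stays below 0.0025, with the maximum of about 0.0022 occurring in Test 3.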

However, it is worth noting that these tests used single-digit data as
coefficients in the matrices and vectors defining the problem sets. It is
believed that the DFPM implementation itself can handle other kinds of data,
but the communication modules were limited, by design intent, to single-digit
input.

While the MATLAB implementation produced results that are very close to those
of the FPGA, some variation can reasonably be expected with other
implementations and system architectures, owing to differences in hardware
and software design and in system optimization.

6.9 Communication possibilities

As indicated earlier in this discussion, the speed of the whole system was
limited by the UART bottleneck. However, considering that most communication
between electronic modules and components makes use of standard protocols,
of which UART is one, this design will still perform slightly better and
faster than most other designs that make use of sequential processing.

Nonetheless, there are faster protocols that can be exploited to speed up
data exchange, and parallel communication can also be considered, since the
FPGA has a substantial number of I/O (Input/Output) pins.

6.10 Applications

This design concept can find application in a large number of fields, ranging
from mathematical theory to real-world engineering design and systems. The
DFPM can be used to model systems in nature, for instance heat flow in a
space and fluid flow [10].

A great number of applications can also be found in electronics and
engineering in general. DFPM should prove very useful for solving least
squares and, possibly, weighted least squares problems in sensor fusion. This
is relevant to radar systems, telecommunications, multi-sensor networks, and
the sensing and localization problems often encountered in systems requiring
self-localization, e.g. mobile robots and sound-source detecting systems.

DFPM looks promising for image and signal processing, especially in problems
requiring singular value decomposition (SVD). DFPM should also be useful in
mechanics, where complex linear and non-linear systems may need to be
modeled.

Solutions of large matrix problems often require significant computation and
computational resources, so DFPM can be a very suitable and resource-efficient
approach to solving these problems. It will be even more useful when the
problem involves sparse matrices, which arise in the FEM-based simulations
used across engineering fields [9].

A DFPM algorithm based on a smaller matrix that functions as a sliding window
through the larger matrix can serve as a quick, efficient approach that
requires minimal computational resources.

6.11 Implications

While DFPM offers many advantages and development possibilities, there are
situations in which its efficiency could be exploited for negative purposes.

Certain aspects of data safety and integrity depend on hashing, and a
significant amount of computational resources and time is required to break
the hashes. However, the advent of simpler algorithms and dedicated devices
(e.g. FPGAs) with great computational power facilitates access to supposedly
secured data by criminals.

7 Conclusions

It was found that the design approach met expectations and offered significant
advantages over traditional computational devices and methods. It was also
found that implementing the DFPM algorithm in FPGA is an efficient approach
to reducing computation time and improving resource efficiency.

Since the DFPM algorithm is widely applicable to a number of other problems,

implementing the algorithm in a dedicated device that makes efficient use of

resources, while increasing the speed at which results are obtained, offers a lot

of advantages.

7.1 Benchmark

In order to base the conclusions of this project on platform-independent
criteria, the computation output and the number of clock cycles were used as
benchmarks.

In a test carried out with the C++ snippet in Appendix A on a laptop PC (Acer
Aspire 5750, dual-core CPU running at 2.4 GHz), the algorithm required 75754
clock cycles on the PC for a specific problem, while the same problem was
completed in 3192 clock cycles on the FPGA implementation.

Regardless of the significant difference in computation time and computational

architecture and resources, the results obtained from both computations were

close enough to be regarded as equivalent.

Hence, the initial goals of the design were achieved and the expectation of

superior performance and resource-efficiency was verified.

7.2 Further work

A lot can be improved in this design. Below is a list of possibilities:

1. Improving the forward translation modules so that they can handle
multi-digit decimal input in the problem set.

2. Modifying the module that reverse-translates the solution vector from
the DFPM top module so that it can handle the full range of bits
representing fractional values in the data type used in the design.

3. Designing the DFPM computational module to handle larger problem sets,
along with the possibility of handling multi-dimensional problem sets.

4. Increasing the UART baud rate and making it configurable in use. This
will reduce the difficulty that can be encountered while setting up a
connection between the UART on the FPGA and the terminal application
software.

5. Enhancing the design so that it can handle multiple problem sets, i.e.
receive a problem set, solve it and return to wait for the next problem.

References

[1] S. Edvardsson, M. Gulliksson, J. Persson, et al., “The Dynamic Functional
Particle Method: An Approach for Boundary Value Problems”, J. Appl. Mech.
79(2), 021012 (Feb 24, 2012).

[2] S. Edvardsson et al., Role of the dynamic functional particle method for
solving linear equations, Physical Review E: Statistical, Nonlinear, and Soft
Matter Physics.

[3] R. Sincovec, N. Madsen, Software for non-linear partial differential
equations, ACM Trans. Math. Softw. 1 (1975) 232-260.

[4] V. Pata, M. Squassina, On the strongly damped wave equation, Commun.
Math. Phys. 253 (2005) 511-533.

[5] F. Alvarez, On the minimization property of a second order dissipative
system in Hilbert spaces, SIAM J. Control Optim. 38 (2000) 1102-1119.

[6] B. Land, Hybrid Computing On an FPGA, Cornell University,
https://courses.cit.cornell.edu/ece576/DDA/FPGAhybridBRL.pdf, last retrieved
2014-09-25.

[7] Xilinx Inc., 2013, Spartan-3E FPGA family data sheet,
http://www.xilinx.com/support/documentation/data_sheets/ds312.pdf, last
retrieved 2014-09-25.

[8] Digilent Inc., 2011, Digilent Nexys2 Board Reference manual,
http://www.digilentinc.com/data/products/nexys2/nexys2_rm.pdf, last
retrieved 2014-09-25.

[9] Y. Saad, Iterative methods for sparse linear systems, 2nd ed., Society for
Industrial and Applied Mathematics, 2003.

[10] Ne-Zheng Sun, Applications of numerical methods to simulate the
movements of contaminants in groundwater, Environmental Health
Perspectives, Vol. 83 (Nov. 1989), pp. 97-115.

[11] ASCII Table, www.asciitable.com, last retrieved 2014-09-26.

Appendix A: Documentation of developed program code



Design codes

Vector multiplication

    --------------------------------------------------------------
    -- Company: Mid Sweden University
    -- Engineer: Taiyelolu Adeboye
    --
    -- Create Date: 10:42:33 01/07/2015
    -- Design Name:
    -- Module Name: Signed_Vector_Vector_Mult_5By1 - Behavioral
    -- Project Name: DFPM on FPGA
    -- Target Devices: Nexys2
    --------------------------------------------------------------
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.std_logic_signed.all;
    use work.DFPM_ARRAY_5X32_BIT.all;

    -- Uncomment the following library declaration if using
    -- arithmetic functions with Signed or Unsigned values
    use IEEE.NUMERIC_STD.ALL;

    -- Uncomment the following library declaration if instantiating
    -- any Xilinx primitives in this code.
    --library UNISIM;
    --use UNISIM.VComponents.all;

    entity Signed_Vector_Vector_Mult_5By1 is
        Port ( Vector_1   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
               Vector_2   : in  DFPM_SIGNED_VECTOR_5X32_BIT;
               CLK        : in  STD_LOGIC;
               RST        : in  STD_LOGIC;
               Vector_Out : out Signed (32 downto 0));
    end Signed_Vector_Vector_Mult_5By1;

    architecture Behavioral of Signed_Vector_Vector_Mult_5By1 is

        -- Full-width (66-bit) products of the five element pairs
        Signal Mult0, Mult1, Mult2,
               Mult3, Mult4 : Signed(65 downto 0) := (others => '0');

        -- Sum of the five products, widened by four bits
        Signal Sum : Signed(69 downto 0) := (others => '0');

    begin

        -- Element-wise products
        Mult0 <= Vector_1(0) * Vector_2(0);
        Mult1 <= Vector_1(1) * Vector_2(1);
        Mult2 <= Vector_1(2) * Vector_2(2);
        Mult3 <= Vector_1(3) * Vector_2(3);
        Mult4 <= Vector_1(4) * Vector_2(4);

        Sum <= "0000" & Mult0 + Mult1 + Mult2 + Mult3 + Mult4;

        -- Keep bits 48 downto 16 of the sum as the 33-bit result
        Vector_Out <= Sum(48 downto 16);

    end Behavioral;
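The arithmetic of the module above can be modelled in software to check expected outputs. The Python sketch below is an illustrative model, not part of the design. It assumes that each vector element is a 33-bit signed value with 16 fractional bits and that the result is bits 48 down to 16 of the summed products, as in the listing. The function names are hypothetical.

```python
# Software model of the 5-element fixed-point dot product.
# Assumption: 33-bit signed elements with 16 fractional bits; the output
# is bits 48..16 of the sum of the full-width products.

FRAC_BITS = 16
WORD_BITS = 33


def wrap_signed(value, bits=WORD_BITS):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def dot_product_fixed(vector_1, vector_2):
    """Model of Vector_Out: sum of products, then bits 48 downto 16."""
    if len(vector_1) != 5 or len(vector_2) != 5:
        raise ValueError("both vectors must have five elements")
    total = sum(a * b for a, b in zip(vector_1, vector_2))
    return wrap_signed(total >> FRAC_BITS)


if __name__ == "__main__":
    one = 1 << FRAC_BITS                    # 1.0 in fixed point
    print(dot_product_fixed([one] * 5, [one] * 5) / one)   # 1*1 summed 5 times
```

Because the slice is taken after the products are summed, the discarded low bits are dropped once per dot product in this model, not once per multiplication.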

Vector subtraction

1 --------------------------------------------------------------



4 --

5 -- Create Date: 10:42:33 01/07/2015

6 -- Design Name:




10 -------------------------------------------------------------

11

12 library IEEE;




16




20



23 --use IEEE.NUMERIC_STD.ALL;

24

25

29

30 entity Signed_Vector_Vector_5By1_Subtr is


32 vector_2 : in DFPM_SIGNED_VECTOR_5X32_BIT;



35 Vector_Out : out DFPM_SIGNED_VECTOR_5X32_BIT);

36 end Signed_Vector_Vector_5By1_Subtr;

37

38 architecture Behavioral of Signed_Vector_Vector_5By1_Subtr is

39

40 Signal Subtr0, Subtr1, Subtr2, Subtr3, Subtr4 : Signed(33 downto 0);

41

42 begin

43

44 Subtr0 <= '0' & Vector_1(0) - vector_2(0);





49

50 Vector_Out(0) <= Subtr0(32 downto 0);





55

56

57 end Behavioral;

Subtraction and multiplication operations

Subtr_Ops_Module.vhd Wed Feb 04 01:26:12 2015

1 --------------------------------------------------------------



4 --

5 -- Create Date: 10:42:33 01/07/2015

6 -- Design Name:




10 -------------------------------------------------------------

11

12 library IEEE;






18

19

20 entity Signed_SubtrAndMult_Ops_Module is

21 Port ( Vector_A : in DFPM_SIGNED_VECTOR_25X32_BIT;

22 Vector_B : in DFPM_SIGNED_VECTOR_5X32_BIT;

23 Vector_X : in DFPM_SIGNED_VECTOR_5X32_BIT;

24 Scalar_Mu : in SIGNED (32 downto 0);

25 Vector_V : in DFPM_SIGNED_VECTOR_5X32_BIT;

26



29 NEW_ITERATION : in STD_LOGIC := '0';

30 ITERATION_COMPLETE : out STD_LOGIC:= '0';

31

32 B_Minus_AX : out DFPM_SIGNED_VECTOR_5X32_BIT;

33 B_Minus_Ax_Minus_muV : out DFPM_SIGNED_VECTOR_5X32_BIT);

34 end Signed_SubtrAndMult_Ops_Module;

35

36 architecture Behavioral of Signed_SubtrAndMult_Ops_Module is

37

38 ------------------------------------------------

39

40

41 -- This component will be used to evaluate

42 -- The vector multiplication A*X

43 -- It takes two input of 5 by 1 vectors

44 COMPONENT Signed_Vector_Vector_Mult_5By1

45 PORT(

46 Vector_1 : IN DFPM_SIGNED_VECTOR_5X32_BIT;

47 Vector_2 : IN DFPM_SIGNED_VECTOR_5X32_BIT;

48 CLK : IN std_logic;

49 RST : IN std_logic;

50 Vector_Out : OUT Signed(32 downto 0)

51 );

52 END COMPONENT;

53

54 -- This component will be used to evaluate the subtraction in B - Ax

55 COMPONENT Signed_Vector_Vector_5By1_Subtr


57 vector_2 : in DFPM_SIGNED_VECTOR_5X32_BIT;



60 Vector_Out : out DFPM_SIGNED_VECTOR_5X32_BIT);

61 END COMPONENT;

62

63 ------------------------------------------------

64

65

66

67 ------------------------------------------------

68 -- Signals for storing the input values

69 Signal Sig_Vector_A : DFPM_SIGNED_VECTOR_25X32_BIT := ( ((Others =>

'0'), (Others

=> '0'), (Others => '0'), (Others => '0'), (Others => '0')),

70 ((Others => '0'), (Others







=> '0'), (Others => '0'), (Others => '0'), (Others => '0')));

74

75 Signal Sig_Vector_B : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others =>

'0'), (Others =>

'0'), (Others => '0'), (Others => '0'), (Others => '0'));

76 Signal Sig_Vector_X : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others =>

'0'), (Others =>


77 Signal Sig_Scalar_Mu: SIGNED (32 downto 0);

78 Signal Sig_Vector_V : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others =>

'0'), (Others =>


79

80

81 -- The two signals below are used to connect the signals at the Vector_vector_Mult_Module

82 -- to the corresponding vector indexes.

83 -- These were used to avoid assigning dynamically changing signals directly to a static line

84 Signal Sig_Vector_A_With_IndexPosition : DFPM_SIGNED_VECTOR_5X32_BIT

:= ((Others =>

'0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others =>

'0'));

85

86 Signal Sig_Vector_A_Mult_X_With_IndexPosition : SIGNED (32 downto

0);

87

88 -- These following two(2) signals will be used to store the products

of the

89 -- Multiplication of Vectors A and X

90 -- as well as Scalar mu and Vector V.

91 Signal Sig_Vector_A_Mult_X : DFPM_SIGNED_VECTOR_5X32_BIT := ((Others

=> '0'), (

Others => '0'), (Others => '0'), (Others => '0'), (Others => '0'));

92 Signal Sig_Vector_Mu_Mult_V : DFPM_SIGNED_VECTOR_5X32_BIT := ((Oth-

ers => '0'), (


93

94 -- The following two signals will be used to store the results

95 -- of the subtraction operations

96 Signal Sig_Vector_B_Minus_AX : DFPM_SIGNED_VECTOR_5X32_BIT := ((Oth-

ers => '0'), (


97 Signal Sig_Vector_B_Minus_AX_Minus_MuV : DFPM_SIGNED_VECTOR_5X32_BIT

:= ((Others =>

'0'), (Others => '0'), (Others => '0'), (Others => '0'), (Others =>

'0'));

98

99 -- This signal will only be raised for one clock cycle

100 -- when there is a new set of data available for computation

101 Signal DFPMCompute : STD_LOGIC := '0';

102

103 -- This signal is used to communicate with other modules "downstream" of this module

104 -- when the result of this module's computation is ready

105 Signal Sig_ITERATION_COMPLETE : STD_LOGIC := '0';

106

107 -- This signal will be used to represent the index position

108 -- that will be progressively incremented as a means of pipelining

109 -- data for multiplication in this module as well as input for the

110 -- Vector_Vector_Multiplication module

111 Signal MultplicationStageArrayPosition : integer := 0;

112

113 -- This signal will be used to signal when the index position

114 -- can be shifted and when data can be stored for output

115 Signal Shift_Array_Position : STD_LOGIC := '0';

116

117 -- This signal will be raised once when all the products of multi-

plication are

ready.

118 -- This is to enable the module to signal to other modules "down-

stream"

119 -- that the result of the computation is ready

120 Signal MultiplicationProductsReady : STD_LOGIC := '0';

121

122 Signal ReadyFlag : STD_LOGIC := '0';

123

124 -- This clock signal was created as a slowed down (half pace of

CLK)

125 -- And will be used for clocking the shifting of the index position

126 Signal Sig_Clk_For_Index_Shifting : STD_LOGIC := '0';

127

128

129 begin

130 -- For Vector - Vector multiplication

131 Vector_Vector_Mult: Signed_Vector_Vector_Mult_5By1 PORT MAP (

132 Vector_1 => Sig_Vector_A_With_IndexPosition,

133 Vector_2 => Sig_Vector_X,

134 CLK => CLK,

135 RST => RST,

136 Vector_Out => Sig_Vector_A_Mult_X_With_IndexPosition);

137

138 -- For Subtraction operations for B - AX

139 Doing_B_Minus_AX : Signed_Vector_Vector_5By1_Subtr PORT MAP (

140 Vector_1 => Sig_Vector_B,

141 vector_2 => Sig_Vector_A_Mult_X,

142 CLK => CLK,

143 RST => RST,

144 Vector_Out => Sig_Vector_B_Minus_AX);

145

146 -- For Subtraction operations for B - AX - muV

147 Doing_B_Minus_AX_Minus_MuV : Signed_Vector_Vector_5By1_Subtr PORT

MAP (

148 Vector_1 => Sig_Vector_B_Minus_AX,

149 vector_2 => Sig_Vector_Mu_Mult_V,

150 CLK => CLK,

151 RST => RST,

152 Vector_Out => Sig_Vector_B_Minus_AX_Minus_MuV);

153

154 -- This signal will be used to signal that the output of this module is ready to be read.

155 ITERATION_COMPLETE <= Sig_ITERATION_COMPLETE;

156

157

158

159

160

161 -- This process determines when each iteration of the DFPM algorithm is to be started

162 -- Computation will only be done if it's a new iteration and it has not been completed before

163 -- Therefore this process sets DFPMCompute to '1' only on the rising edge of NEW_ITERATION

164 -- and stores new values into the vectors only at the rising edge of NEW_ITERATION

165 process(CLK, RST, Sig_ITERATION_COMPLETE, NEW_ITERATION)

166 Variable NEW_ITERATION_Var : STD_LOGIC := '0';

167 begin

168 if rising_edge(CLK) then

169 if (RST = '1') then

170 DFPMCompute <= '0';

171 NEW_ITERATION_Var := '0';

172 elsif (Sig_ITERATION_COMPLETE = '1') then



175 -- This more or less senses for the rising edge of NEW_ITERATION

176 elsif (NEW_ITERATION = '1') and (NEW_ITERATION_Var = '0') then

177 --if rising_edge(NEW_ITERATION) then


179

180 Sig_Vector_A <= Vector_A;

181 Sig_Vector_B <= Vector_B;

182 Sig_Vector_X <= Vector_X;

183 Sig_Vector_V <= Vector_V;

184 Sig_Scalar_Mu <= Scalar_Mu;

185


187 elsif (NEW_ITERATION = '1') and (NEW_ITERATION_Var = '1') then



190 elsif (NEW_ITERATION = '0') then



193 end if;

194 end if;

195 end process;

196

197

198 -- This process determines the array positions to be multiplied together for A*X

199 process(RST, Sig_ITERATION_COMPLETE, DFPMCompute,

Shift_Array_Position,

NEW_ITERATION, CLK, Sig_Clk_For_Index_Shifting, MultplicationStageAr-

rayPosition,

Sig_Vector_A, Sig_Vector_A_Mult_X_With_IndexPosition, Sig_Scalar_Mu,

Sig_Vector_V)

200 Variable MultplicationStageArrayPosition_Var : integer := 0;

201

202 begin

203 if (RST = '1') then

204 MultplicationStageArrayPosition <= 0;

205 Shift_Array_Position <= '0';

206 MultiplicationProductsReady <= '0';

207

208 elsif (Sig_ITERATION_COMPLETE = '1') then



211

212 elsif (DFPMCompute = '1') then -- Checking for the rising edge of NEW iteration here




216

217 -- Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(0);

218 -- Sig_Vector_A_Mult_X(0) <=

Sig_Vector_A_Mult_X_With_IndexPosition;

219 -- productTempStore := Sig_Scalar_Mu * Sig_Vector_V(0);

220 -- Sig_Vector_Mu_Mult_V(MultplicationStageArrayPosition) <=

productTempStore(48 downto 16);

221

222 elsif (Shift_Array_Position = '1') then

223 if rising_edge(Sig_Clk_For_Index_Shifting) then

224 if (MultplicationStageArrayPosition = 5) then




228 else

229 MultplicationStageArrayPosition_Var :=

MultplicationStageArrayPosition;

230 MultplicationStageArrayPosition <=

MultplicationStageArrayPosition_Var + 1;

231 end if;

232 end if;

233 end if;

234 end process;

235

236 process(CLK, DFPMCompute, Shift_Array_Position, Multplication-

StageArrayPosition)

237 Variable productTempStore : Signed(65 downto 0);

238 begin


240 if (Shift_Array_Position = '1') and ( MultplicationStageArrayPosi-

tion < 5

) then

241 case MultplicationStageArrayPosition is

242 when 0 =>

243 Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(0);

244 Sig_Vector_A_Mult_X(0) <= Sig_Vector_A_Mult_X_With_IndexPosition;

245 productTempStore := Sig_Scalar_Mu * Sig_Vector_V(0);

246 when 1 =>




250 when 2 =>




254 when 3 =>




258 when 4 =>




262 when Others =>

263 NULL;

264 end case;

265 -- -- Setting the corresponding Vector_A element as the input to the Vector_Vector_Mult_Module

266 -- Sig_Vector_A_With_IndexPosition <= Sig_Vector_A(MultplicationStageArrayPosition);

267 -- -- Connecting the output of the Vector_Vector_Mult module to the corresponding A_Mult_X index

268 -- Sig_Vector_A_Mult_X(MultplicationStageArrayPosition) <= Sig_Vector_A_Mult_X_With_IndexPosition;

269 -- -- Doing mu*V

270 -- productTempStore := Sig_Scalar_Mu * Sig_Vector_V(MultplicationStageArrayPosition);

271 Sig_Vector_Mu_Mult_V(MultplicationStageArrayPosition) <= productTempStore(48 downto 16);

272 end if;

273 end if;

274 end process;

275

276

277 -- This process clears ITERATION_COMPLETE and

278 -- only sets it to 1 when the MultiplicationProductsReady signal is

high.

279 -- At the rising_edge of MultiplicationProductsReady, the vectors

280 -- B_Minus_AX and B_Minus_Ax_Minus_muV are assigned.

281 process(CLK, RST, DFPMCompute, MultiplicationProductsReady, Ready-

Flag)

282 begin

283 if rising_edge(clk) then

284 if (RST = '1') then

285 Sig_ITERATION_COMPLETE <= '0';

286 ReadyFlag <= '0';

287

288 elsif (DFPMCompute = '1') then



291 elsif (MultiplicationProductsReady = '1') and (ReadyFlag = '0')

then


293


295 B_Minus_AX <= Sig_Vector_B_Minus_AX;

296 B_Minus_Ax_Minus_muV <= Sig_Vector_B_Minus_AX_Minus_MuV;

297 else


299 -- end if;

300 end if;

301 end if;

302 end process;

303

304 -- The clock signal created in this process is a real afterthought

305 -- It would not have been created if this module had behaved itself ;-))

306 -- It was observed that the circuit computed an output that was wrong

307 -- for as long as the shifting of the index position was based on the normal clock "CLK"

308 -- Hence this clock that cuts the speed to half.

309 process(CLK)

310 begin


312 Sig_Clk_For_Index_Shifting <= not(Sig_Clk_For_Index_Shifting);

313 end if;

314 End process;

315

316 end Behavioral;

317

318

Tolerance check

1 ---------------------------------------------------------------------

-------------



4 --

5 -- Create Date: 10:42:33 01/07/2015

6 -- Design Name:




10 --------------------------------------------------------------------

--------------

11

12 library IEEE;




16

17




21




25





30

31 entity Signed_Tolerance_Check is

32 Port ( Vector_B_AX : in DFPM_SIGNED_VECTOR_5X32_BIT;

33 Tolerance_Limit : in Signed (32 downto 0);

34 Iteration_Complete : in STD_LOGIC:= '0';

35

36 CLK : in STD_LOGIC:= '0';

37 RST : in STD_LOGIC:= '0';

38

39 Tolerance_Limit_Squared, Vector_B_AX_Sum : out Signed (32 downto 0);

40

41 Iterate : out STD_LOGIC := '1');

42 end Signed_Tolerance_Check;

43

44 architecture Behavioral of Signed_Tolerance_Check is

45

46 Signal Sig_Vector_B_AX, Sig_Vector_B_AX_Squared :

DFPM_SIGNED_VECTOR_5X32_BIT;

47 Signal Sig_Tolerance_Limit, Sig_Tolerance_Limit_Squared : Signed (32

downto 0);

48

49 Signal Sig_Vector_B_AX_Sum : Signed(32 downto 0);

50

51 Signal Sig_Position : integer := 0;

52

53 Signal Sig_ShiftPosition, Sig_Multiplication_Is_Complete,

Sig_Check_Tolerance_Limit

: STD_LOGIC := '0';

54

55

56

57

58 begin

59

60 Tolerance_Limit_Squared <= Sig_Tolerance_Limit_Squared;

61 Vector_B_AX_Sum <= Sig_Vector_B_AX_Sum;

62

63 -- This process determines when data stored internally are to be serially multiplied

64 -- They are serially multiplied to save on Multipliers

65 process(CLK, RST, Iteration_Complete, Sig_ShiftPosition,

Sig_Position)

66 Variable Var_Position: integer := 0;

67 begin


69 if (RST = '1') then

70 Sig_Position <= 0;

71 Sig_ShiftPosition <= '0';

72 Sig_Multiplication_Is_Complete <= '0';

73 elsif (Iteration_Complete = '1') then

74 Sig_Check_Tolerance_Limit <= '0';




78 elsif (Sig_Multiplication_Is_Complete = '1') then

79 Sig_Check_Tolerance_Limit <= '1';

80 else

81 if (Sig_ShiftPosition = '1') then

82 if (Sig_Position = 5) then




86 else

87 Var_Position := Sig_Position;

88 Sig_Position <= Var_Position + 1;

89 end if;

90 end if;

91 end if;

92 end if;

93 end process;

94

95 -- Storing data internally when the signal from the SubtrAndMult module goes high

96 process(Iteration_Complete)

97 Variable productTempStore : Signed(65 downto 0) := (Others => '0');

98 begin

99 if rising_edge(Iteration_Complete) then

100 Sig_Tolerance_Limit <= Tolerance_Limit;

101 Sig_Vector_B_AX <= Vector_B_AX;

102 end if;

103 end process;

104

105 -- Serial multiplication

106 process(CLK, Sig_ShiftPosition, Sig_Position)

107 Variable productTempStore : Signed(65 downto 0);

108 begin


110 if (Sig_ShiftPosition <= '1') then

111 Case Sig_Position is

112 when 0 =>

113 productTempStore := (Sig_Vector_B_AX(Sig_Position) *

Sig_Vector_B_AX(Sig_Position));

114 Sig_Vector_B_AX_Squared(Sig_Position) <= productTempStore(48

downto 16);

115 when 1 =>




downto 16);

118 when 2 =>




downto 16);

121 when 3 =>




downto 16);

124 when 4 =>




downto 16);

127 when 5 =>

128 productTempStore := Sig_Tolerance_Limit * Sig_Tolerance_Limit;

129 Sig_Tolerance_Limit_Squared <= productTempStore(48 downto 16);

130 when others =>

131 NULL;

132 End case;

133 end if;

134 end if;

135 end process;

136

137 process(Sig_Multiplication_Is_Complete)

138 variable Var_Vector_B_AX_Sum : Signed (36 downto 0);

139 begin

140 if rising_edge(Sig_Multiplication_Is_Complete) then

141 Var_Vector_B_AX_Sum := ("0000" & Sig_Vector_B_AX_Squared(0) +

Sig_Vector_B_AX_Squared(1)

142 + Sig_Vector_B_AX_Squared(2) +

Sig_Vector_B_AX_Squared(3)

143 + Sig_Vector_B_AX_Squared(4));

144

145 Sig_Vector_B_AX_Sum <= Var_Vector_B_AX_Sum(32 downto 0);

146 end if;

147 end process;

148

149 process(CLK, Sig_Check_Tolerance_Limit, Sig_Vector_B_AX_Sum,

Sig_Tolerance_Limit_Squared)

150 begin


152 if (Sig_Check_Tolerance_Limit = '1') then

153 if (Sig_Vector_B_AX_Sum < Sig_Tolerance_Limit_Squared) then

154 Iterate <= '0';

155 else

156 Iterate <= '1';

157 end if;

158 end if;

159 end if;

160 end process;

161 end Behavioral;
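The comparison made by the tolerance check can be modelled in software as follows. The Python sketch below is illustrative and is not derived from the synthesized design. It assumes 16 fractional bits, squares each residual element by keeping the product shifted right by 16 bits, and compares the sum of squares with the squared tolerance, so that no square root is needed. The 33-bit word limit is ignored here, and the function names are hypothetical.

```python
# Software model of a squared-tolerance check in fixed point.
# Assumptions: 16 fractional bits, truncating squares, no word-length wrap.

FRAC_BITS = 16


def to_fixed(x):
    """Convert a real number to its fixed-point integer representation."""
    return int(round(x * (1 << FRAC_BITS)))


def fixed_square(q):
    """Square a fixed-point value, dropping the low 16 bits of the product."""
    return (q * q) >> FRAC_BITS


def should_iterate(residual, tolerance):
    """Return True while the sum of squared residuals is not below tol^2."""
    sum_of_squares = sum(fixed_square(r) for r in residual)
    return not (sum_of_squares < fixed_square(tolerance))


if __name__ == "__main__":
    tolerance = to_fixed(0.01)
    print(should_iterate([to_fixed(0.5)] * 5, tolerance))   # large residual
    print(should_iterate([0] * 5, tolerance))               # zero residual
    print(fixed_square(to_fixed(0.001)))                    # squared tolerance
```

One consequence visible in this model is that a tolerance smaller than 1/256 squares to zero once the low 16 bits are dropped, in which case the comparison can never succeed. Whether this applies to the hardware depends on details not shown in the listing.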

162

163

New V operations

1 ---------------------------------------------------------------------

-------------



4 --

5 -- Create Date: 10:42:33 01/07/2015

6 -- Design Name:




10 --------------------------------------------------------------------

--------------

11

12 library IEEE;




16

17








25

26 entity Signed_New_V_Ops is

27 Port ( B_Ax_Muv : in DFPM_SIGNED_VECTOR_5X32_BIT;


29

30 DT : in Signed (32 downto 0);



33 ITERATION_COMPLETE : in STD_LOGIC;

34

35 VECTOR_NEW_V : out DFPM_SIGNED_VECTOR_5X32_BIT;

36 NEW_V_READY : out STD_LOGIC);

37 end Signed_New_V_Ops;

38

39 architecture Behavioral of Signed_New_V_Ops is

40

41 Signal Sig_Vector_V : DFPM_SIGNED_VECTOR_5X32_BIT;

42 Signal Sig_B_Ax_MuV : DFPM_SIGNED_VECTOR_5X32_BIT;

43 Signal Sig_B_Ax_MuV_Mult_Dt : DFPM_SIGNED_VECTOR_5X32_BIT;

44

45


47

48 Signal Sig_ShiftPosition : STD_LOGIC := '0';

49 Signal Sig_NEW_V_READY : STD_LOGIC := '0';

50

51 begin

52

53 process(CLK, RST, ITERATION_COMPLETE, Sig_ShiftPosition,

Sig_Position,

Sig_NEW_V_READY)

54 variable Var_Iteration_Complete : STD_LOGIC := '0';

55 Variable Var_Position : integer := 0;

56 begin




60 Var_Position := 0;


62 Var_Iteration_Complete := '0';

63 Sig_NEW_V_READY <= '0';

64 elsif (ITERATION_COMPLETE = '0') then


66 -- elsif (ITERATION_COMPLETE = '1') and (Var_Iteration_Complete =

'1') then

67 -- Var_Iteration_Complete := '0';

68 elsif (ITERATION_COMPLETE = '1') and (Var_Iteration_Complete = '0')

then





73 end if;

74





79 else



82 end if;

83 end if;

84

85 if (Sig_NEW_V_READY = '1') then


87 end if;

88 end if;

89 end process;

90

91 process(ITERATION_COMPLETE)

92 begin

93 if rising_edge(ITERATION_COMPLETE) then

94 Sig_B_Ax_MuV <= B_Ax_Muv;


96 end if;

97 end process;

98

99 process(CLK, Sig_ShiftPosition)


101 begin



104 productTempStore := Sig_B_Ax_MuV(Sig_Position) * DT;

105

106 Sig_B_Ax_MuV_Mult_Dt(Sig_Position) <= productTempStore(48 downto

16);

107 end if;

108 end if;

109 end process;

110

111 NEW_V_READY <= Sig_NEW_V_READY;

112

113 VECTOR_NEW_V(0) <= Sig_Vector_V(0) + Sig_B_Ax_MuV_Mult_Dt(0);





118

119 end Behavioral;

120

121
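The velocity update in the listing above multiplies each element of the B - Ax - muV vector by DT, keeps bits 48 downto 16 of the product, and adds the result to the previous velocity. The sketch below is a hedged Python model of that step, again assuming 16 fractional bits; `wrap`, `scale_by_dt` and `new_v` are hypothetical names chosen for illustration.

```python
FRAC_BITS = 16   # assumed number of fractional bits
WORD_BITS = 33   # width of Signed(32 downto 0)


def wrap(raw):
    """Reduce an integer to a signed 33-bit two's complement value."""
    raw &= (1 << WORD_BITS) - 1
    return raw - (1 << WORD_BITS) if raw >> (WORD_BITS - 1) else raw


def scale_by_dt(force, dt):
    """Model of productTempStore(48 downto 16) for one vector element."""
    return wrap((force * dt) >> FRAC_BITS)


def new_v(v, force, dt):
    """v_new(i) = v(i) + force(i) * dt, element by element."""
    return [wrap(vi + scale_by_dt(fi, dt)) for vi, fi in zip(v, force)]


ONE = 1 << FRAC_BITS  # raw representation of 1.0
print(new_v([0] * 5, [ONE] * 5, ONE // 2))
```

Dropping the low product bits rounds toward negative infinity, so a force of one negative least significant bit still moves the velocity by a full bit, while the same positive force is lost.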

New X operations

1 ---------------------------------------------------------------------

-------------



4 --

5 -- Create Date: 10:42:33 01/07/2015

6 -- Design Name:




10 --------------------------------------------------------------------

--------------

11

12 library IEEE;




16

17








25

26 entity Signed_New_V_Ops is



29





34



37 end Signed_New_V_Ops;

38

39 architecture Behavioral of Signed_New_V_Ops is

40

41 Signal Sig_Vector_V : DFPM_SIGNED_VECTOR_5X32_BIT;


43 Signal Sig_B_Ax_MuV_Mult_Dt : DFPM_SIGNED_VECTOR_5X32_BIT;

44

45


47

48 Signal Sig_ShiftPosition : STD_LOGIC := '0';

49 Signal Sig_NEW_V_READY : STD_LOGIC := '0';

50

51 begin

52

53 process(CLK, RST, ITERATION_COMPLETE, Sig_ShiftPosition,

Sig_Position,

Sig_NEW_V_READY)

54 variable Var_Iteration_Complete : STD_LOGIC := '0';

55 Variable Var_Position : integer := 0;

56 begin




60 Var_Position := 0;




64 elsif (ITERATION_COMPLETE = '0') then


66 -- elsif (ITERATION_COMPLETE = '1') and (Var_Iteration_Complete =

'1') then

67 -- Var_Iteration_Complete := '0';

68 elsif (ITERATION_COMPLETE = '1') and (Var_Iteration_Complete = '0')

then





73 end if;

74





79 else



82 end if;

83 end if;

84

85 if (Sig_NEW_V_READY = '1') then


87 end if;

88 end if;

89 end process;

90

91 process(ITERATION_COMPLETE)

92 begin

93 if rising_edge(ITERATION_COMPLETE) then

94 Sig_B_Ax_MuV <= B_Ax_Muv;


96 end if;

97 end process;

98

99 process(CLK, Sig_ShiftPosition)


101 begin



104 productTempStore := Sig_B_Ax_MuV(Sig_Position) * DT;

105

106 Sig_B_Ax_MuV_Mult_Dt(Sig_Position) <= productTempStore(48 downto

16);

107 end if;

108 end if;

109 end process;

110

111 NEW_V_READY <= Sig_NEW_V_READY;

112






118

119 end Behavioral;

120

121

One Iteration

1 ---------------------------------------------------------------------

-------------



4 --

5 -- Create Date: 10:42:33 01/07/2015

6 -- Design Name:




10 --------------------------------------------------------------------

--------------

11

12 library IEEE;





17




21





26

27 entity Signed_DFPM_One_Iteration is

28 Port ( VECTOR_A_IN : in DFPM_SIGNED_VECTOR_25X32_BIT;

29 VECTOR_B_IN : in DFPM_SIGNED_VECTOR_5X32_BIT;

30 VECTOR_X_IN : in DFPM_SIGNED_VECTOR_5X32_BIT;

31 VECTOR_V_IN : in DFPM_SIGNED_VECTOR_5X32_BIT;

32 Mu_IN : in Signed (32 downto 0);

33 DT_IN : in Signed (32 downto 0);

34 NEW_ITERATION_IN : in STD_LOGIC;

35



38

39 B_AX_OUT : out DFPM_SIGNED_VECTOR_5X32_BIT;

40 NEW_V_OUT : out DFPM_SIGNED_VECTOR_5X32_BIT;

41 NEW_X_OUT : out DFPM_SIGNED_VECTOR_5X32_BIT;

42

43 ITERATION_STAGE_COMPLETE : out STD_LOGIC;

44 ITERATE_AGAIN : out STD_LOGIC);

45 end Signed_DFPM_One_Iteration;

46

47 architecture Behavioral of Signed_DFPM_One_Iteration is

48

49

50 COMPONENT Signed_SubtrAndMult_Ops_Module






56





61



64 END COMPONENT;

65

66 COMPONENT Signed_New_V_Ops



69





74



77 END COMPONENT;

78

79 COMPONENT Signed_New_X_Ops

80 Port ( VECTOR_X : in DFPM_SIGNED_VECTOR_5X32_BIT;

81 VECTOR_NEW_V : in DFPM_SIGNED_VECTOR_5X32_BIT;

82 DT : in Signed(32 downto 0);

83



86 NEW_V_READY : in STD_LOGIC;

87

88 VECTOR_NEW_X : out DFPM_SIGNED_VECTOR_5X32_BIT;

89 NEW_X_READY : out STD_LOGIC);

90 END COMPONENT;

91

92 COMPONENT Signed_Tolerance_Check




96

97

98



101

102 Tolerance_Limit_Squared, Vector_B_AX_Sum : out Signed (32 downto

0);

103


105 END COMPONENT;

106

107 -------------------------------------------------------------------

--------------------

---------

108

109

110 Signal Sig_VECTOR_A_IN : DFPM_SIGNED_VECTOR_5X32_BIT;

111 Signal Sig_VECTOR_B_IN : DFPM_SIGNED_VECTOR_5X32_BIT;

112 Signal Sig_VECTOR_V_IN : DFPM_SIGNED_VECTOR_5X32_BIT;

113 Signal Sig_VECTOR_X_IN : DFPM_SIGNED_VECTOR_5X32_BIT;

114 Signal Sig_Mu_IN : Signed(32 downto 0);

115 Signal Sig_DT_IN : Signed(32 downto 0);

116 Signal Sig_B_AX_OUT : DFPM_SIGNED_VECTOR_5X32_BIT;

117

118 Signal Sig_Start_SubtrMultOps_To_NewVOps : STD_LOGIC := '0';

119 Signal Sig_Start_NewVOps_To_NewXOps : STD_LOGIC := '0';

120 Signal Sig_New_X_Is_Ready : STD_LOGIC := '0';

121


123 Signal Sig_New_V : DFPM_SIGNED_VECTOR_5X32_BIT;

124 Signal Sig_New_X : DFPM_SIGNED_VECTOR_5X32_BIT;

125

126 Constant Const_Tolerance_Limit : Signed (32 downto 0) :=

"000000000000000000000001000000000"; -- 1*2^(-7): only bit 9 is set (16 fractional bits)

127

128

129

130 begin

131

132 Sig_DT_IN <= DT_IN;

133 Sig_Mu_IN <= Mu_IN;

134 Sig_VECTOR_V_IN <= VECTOR_V_IN;

135 Sig_VECTOR_X_IN <= VECTOR_X_IN;

136 ITERATION_STAGE_COMPLETE <= Sig_New_X_Is_Ready;

137 B_AX_OUT <= Sig_B_AX_OUT;

138 NEW_V_OUT <= Sig_New_V;

139 NEW_X_OUT <= Sig_New_X;

140

141 Inst_Signed_SubtrAndMult_Ops_Module: Signed_SubtrAndMult_Ops_Module

PORT MAP(

142 Vector_A => VECTOR_A_IN,

143 Vector_B => VECTOR_B_IN,

144 Vector_X => Sig_VECTOR_X_IN,

145 Scalar_Mu => Mu_IN,

146 Vector_V => Sig_VECTOR_V_IN,

147 CLK => CLK,

148 RST => RST,

149 NEW_ITERATION => NEW_ITERATION_IN,

150 ITERATION_COMPLETE => Sig_Start_SubtrMultOps_To_NewVOps,

151 B_Minus_AX => Sig_B_AX_OUT,

152 B_Minus_Ax_Minus_muV => Sig_B_Ax_MuV);

153

154

155 Inst_Signed_New_V_Ops: Signed_New_V_Ops PORT MAP(

156 B_Ax_Muv => Sig_B_Ax_MuV,


158 DT => Sig_DT_IN,

159 CLK => CLK,

160 RST => RST,


162 VECTOR_NEW_V => Sig_New_V,

163 NEW_V_READY => Sig_Start_NewVOps_To_NewXOps);

164

165

166 Inst_Signed_New_X_Ops: Signed_New_X_Ops PORT MAP(

167 VECTOR_X => Sig_VECTOR_X_IN,



170 CLK => CLK,

171 RST => RST,

172 NEW_V_READY => Sig_Start_NewVOps_To_NewXOps,

173 VECTOR_NEW_X => Sig_New_X,

174 NEW_X_READY => Sig_New_X_Is_Ready);

175

176 Inst_Signed_Tolerance_Check: Signed_Tolerance_Check PORT MAP (

177 Vector_B_AX => Sig_B_AX_OUT,

178 Tolerance_Limit => Const_Tolerance_Limit,

179 Iteration_Complete => Sig_Start_SubtrMultOps_To_NewVOps,

180 CLK => CLK,

181 RST => RST,

182 Tolerance_Limit_Squared => open,

183 Vector_B_AX_Sum => open,

184 Iterate => ITERATE_AGAIN);

185

186

187 end Behavioral;

188

189
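The One Iteration module above chains four blocks: the residual and damping computation, the tolerance check, the velocity update and the position update. The Python sketch below models that data flow in floating point rather than fixed point. It assumes that the position update is x + v_new * dt, which is inferred from the port names of Signed_New_X_Ops and is not visible in the listing; the values of mu and dt and the names `dfpm_step` and `dfpm_solve` are illustrative only.

```python
def dfpm_step(A, b, x, v, mu, dt, tol):
    """One DFPM iteration for A x = b, mirroring the module chain."""
    n = len(b)
    # Signed_SubtrAndMult_Ops_Module: B - AX and B - AX - mu*V
    residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    force = [residual[i] - mu * v[i] for i in range(n)]
    # Signed_Tolerance_Check: compare summed squares with tolerance squared
    iterate_again = sum(r * r for r in residual) >= tol * tol
    # Signed_New_V_Ops: v_new = v + force * dt
    v_new = [v[i] + force[i] * dt for i in range(n)]
    # Signed_New_X_Ops (assumed form): x_new = x + v_new * dt
    x_new = [x[i] + v_new[i] * dt for i in range(n)]
    return x_new, v_new, iterate_again


def dfpm_solve(A, b, mu=1.0, dt=0.1, tol=2.0 ** -7, max_iter=2000):
    """Repeat dfpm_step until the tolerance check clears or max_iter is hit."""
    x = [0.0] * len(b)
    v = [0.0] * len(b)
    steps = 0
    again = True
    while again and steps < max_iter:
        x, v, again = dfpm_step(A, b, x, v, mu, dt, tol)
        steps += 1
    return x, steps


A = [[4.0, 1.0], [1.0, 3.0]]   # small symmetric positive definite example
b = [1.0, 2.0]
x, steps = dfpm_solve(A, b)
print(x, steps)
```

The example system has the exact solution x = (1/11, 7/11), so the printed values can be compared against it; a tighter tolerance yields a closer match at the cost of more iterations.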

DFPM top module

1 ---------------------------------------------------------------------

-------------



4 --

5 -- Create Date: 10:42:33 01/07/2015

6 -- Design Name:




10 --------------------------------------------------------------------

--------------

11

12 library IEEE;





17




21





26

27 entity Signed_DFPM_One_Iteration is

28 Port ( VECTOR_A_IN : in DFPM_SIGNED_VECTOR_25X32_BIT;

29 VECTOR_B_IN : in DFPM_SIGNED_VECTOR_5X32_BIT;

30 VECTOR_X_IN : in DFPM_SIGNED_VECTOR_5X32_BIT;

31 VECTOR_V_IN : in DFPM_SIGNED_VECTOR_5X32_BIT;

32 Mu_IN : in Signed (32 downto 0);

33 DT_IN : in Signed (32 downto 0);

34 NEW_ITERATION_IN : in STD_LOGIC;

35



38

39 B_AX_OUT : out DFPM_SIGNED_VECTOR_5X32_BIT;

40 NEW_V_OUT : out DFPM_SIGNED_VECTOR_5X32_BIT;

41 NEW_X_OUT : out DFPM_SIGNED_VECTOR_5X32_BIT;

42

43 ITERATION_STAGE_COMPLETE : out STD_LOGIC;

44 ITERATE_AGAIN : out STD_LOGIC);

45 end Signed_DFPM_One_Iteration;

46

47 architecture Behavioral of Signed_DFPM_One_Iteration is

48

49

50 COMPONENT Signed_SubtrAndMult_Ops_Module






56





61



64 END COMPONENT;

65

66 COMPONENT Signed_New_V_Ops



69





74



77 END COMPONENT;

78

79 COMPONENT Signed_New_X_Ops

80 Port ( VECTOR_X : in DFPM_SIGNED_VECTOR_5X32_BIT;

81 VECTOR_NEW_V : in DFPM_SIGNED_VECTOR_5X32_BIT;

82 DT : in Signed(32 downto 0);

83



86 NEW_V_READY : in STD_LOGIC;

87

88 VECTOR_NEW_X : out DFPM_SIGNED_VECTOR_5X32_BIT;

89 NEW_X_READY : out STD_LOGIC);

90 END COMPONENT;

91

92 COMPONENT Signed_Tolerance_Check




96

97

98



101

102 Tolerance_Limit_Squared, Vector_B_AX_Sum : out Signed (32 downto

0);

103


105 END COMPONENT;

106

107 -------------------------------------------------------------------

--------------------

---------

108

109

110 Signal Sig_VECTOR_A_IN : DFPM_SIGNED_VECTOR_5X32_BIT;

111 Signal Sig_VECTOR_B_IN : DFPM_SIGNED_VECTOR_5X32_BIT;

112 Signal Sig_VECTOR_V_IN : DFPM_SIGNED_VECTOR_5X32_BIT;

113 Signal Sig_VECTOR_X_IN : DFPM_SIGNED_VECTOR_5X32_BIT;

114 Signal Sig_Mu_IN : Signed(32 downto 0);

115 Signal Sig_DT_IN : Signed(32 downto 0);

116 Signal Sig_B_AX_OUT : DFPM_SIGNED_VECTOR_5X32_BIT;

117

118 Signal Sig_Start_SubtrMultOps_To_NewVOps : STD_LOGIC := '0';

119 Signal Sig_Start_NewVOps_To_NewXOps : STD_LOGIC := '0';

120 Signal Sig_New_X_Is_Ready : STD_LOGIC := '0';

121


123 Signal Sig_New_V : DFPM_SIGNED_VECTOR_5X32_BIT;

124 Signal Sig_New_X : DFPM_SIGNED_VECTOR_5X32_BIT;

125

126 Constant Const_Tolerance_Limit : Signed (32 downto 0) :=

"000000000000000000000001000000000"; -- 1*2^(-7): only bit 9 is set (16 fractional bits)

127

128

129

130 begin

131

132 Sig_DT_IN <= DT_IN;

133 Sig_Mu_IN <= Mu_IN;

134 Sig_VECTOR_V_IN <= VECTOR_V_IN;

135 Sig_VECTOR_X_IN <= VECTOR_X_IN;

136 ITERATION_STAGE_COMPLETE <= Sig_New_X_Is_Ready;

137 B_AX_OUT <= Sig_B_AX_OUT;

138 NEW_V_OUT <= Sig_New_V;

139 NEW_X_OUT <= Sig_New_X;

140

141 Inst_Signed_SubtrAndMult_Ops_Module: Signed_SubtrAndMult_Ops_Module

PORT MAP(

142 Vector_A => VECTOR_A_IN,

143 Vector_B => VECTOR_B_IN,

144 Vector_X => Sig_VECTOR_X_IN,

145 Scalar_Mu => Mu_IN,


147 CLK => CLK,

148 RST => RST,

149 NEW_ITERATION => NEW_ITERATION_IN,


151 B_Minus_AX => Sig_B_AX_OUT,

152 B_Minus_Ax_Minus_muV => Sig_B_Ax_MuV);

153

154

155 Inst_Signed_New_V_Ops: Signed_New_V_Ops PORT MAP(

156 B_Ax_Muv => Sig_B_Ax_MuV,



159 CLK => CLK,

160 RST => RST,



163 NEW_V_READY => Sig_Start_NewVOps_To_NewXOps);

164

165

166 Inst_Signed_New_X_Ops: Signed_New_X_Ops PORT MAP(

167 VECTOR_X => Sig_VECTOR_X_IN,



170 CLK => CLK,

171 RST => RST,

172 NEW_V_READY => Sig_Start_NewVOps_To_NewXOps,

173 VECTOR_NEW_X => Sig_New_X,

174 NEW_X_READY => Sig_New_X_Is_Ready);

175

176 Inst_Signed_Tolerance_Check: Signed_Tolerance_Check PORT MAP (

177 Vector_B_AX => Sig_B_AX_OUT,

178 Tolerance_Limit => Const_Tolerance_Limit,

179 Iteration_Complete => Sig_Start_SubtrMultOps_To_NewVOps,

180 CLK => CLK,

181 RST => RST,

182 Tolerance_Limit_Squared => open,

183 Vector_B_AX_Sum => open,

184 Iterate => ITERATE_AGAIN);

185

186

187 end Behavioral;

188

189

UART Core

1 ---------------------------------------------------------------------

---

2 -- RS232RefCom.vhd

3 ---------------------------------------------------------------------

---

4 -- Author: Dan Pederson

5 -- Copyright 2004 Digilent, Inc.

6 ---------------------------------------------------------------------

---

7 -- Description: This file defines a UART which transfers data from

8 -- serial form to parallel form and vice versa.

9 ---------------------------------------------------------------------

---

10 -- Revision History:

11 -- 07/15/04 (Created) DanP

12 -- 02/25/08 (Created) ClaudiaG: made use of the baudDivide constant

13 -- in the Clock Dividing Processes

14 --------------------------------------------------------------------

----

15

16 library IEEE;

17 use IEEE.STD_LOGIC_1164.ALL;


18 use IEEE.STD_LOGIC_ARITH.ALL;

19 use IEEE.STD_LOGIC_UNSIGNED.ALL;

20

21 -- Uncomment the following lines to use the declarations that are

22 -- provided for instantiating Xilinx primitive components.



25

26 entity Rs232RefComp is

27 Port (

28 TXD : out std_logic := '1';

29 RXD : in std_logic;

30 CLK : in std_logic; --Master Clock = 50MHz

31 DBIN : in std_logic_vector (7 downto 0); --Data Bus in

32 DBOUT : out std_logic_vector (7 downto 0); --Data Bus out

33 RDA : inout std_logic; --Read Data Available

34 TBE : inout std_logic := '1'; --Transfer Bus Empty

35 RD : in std_logic; --Read Strobe

36 WR : in std_logic; --Write Strobe

37 PE : out std_logic; --Parity Error Flag

38 FE : out std_logic; --Frame Error Flag

39 OE : out std_logic; --Overwrite Error Flag

40 RST : in std_logic := '0'); --Master Reset

41 end Rs232RefComp;

42

43 architecture Behavioral of Rs232RefComp is

44 --------------------------------------------------------------------

----

45 -- Component Declarations

46 --------------------------------------------------------------------

----

47

48 --------------------------------------------------------------------

----

49 -- Local Type Declarations

50 --------------------------------------------------------------------

----

51 --Receive state machine

52 type rstate is (

53 strIdle, --Idle state

54 strEightDelay, --Delays for 8 clock cycles

55 strGetData, --Shifts in the 8 data bits, and checks parity

56 strCheckStop --Sets framing error flag if Stop bit is wrong

57 );

58

59 type tstate is (

60 sttIdle, --Idle state

61 sttTransfer, --Move data into shift register

62 sttShift --Shift out data

63 );

64

65 type TBEstate is (

66 stbeIdle,

67 stbeSetTBE,

68 stbeWaitLoad,

69 stbeWaitWrite

70 );

71

72

73 --------------------------------------------------------------------

----

74 -- Signal Declarations

75 --------------------------------------------------------------------

----

76 constant baudDivide : std_logic_vector(7 downto 0) := "10100011"; --Baud Rate divisor, set now for a rate of 9600.

77 --Found by dividing 50MHz by 9600 and 16, then halving (rClk toggles once per count).

78 signal rdReg : std_logic_vector(7 downto 0) := "00000000"; --Receive holding register

79 signal rdSReg : std_logic_vector(9 downto 0) := "1111111111"; --Receive shift register

80 signal tfReg : std_logic_vector(7 downto 0); --Transfer holding register

81 signal tfSReg : std_logic_vector(10 downto 0) := "11111111111"; --Transfer shift register

82 signal clkDiv : std_logic_vector(8 downto 0) := "000000000"; --used for rClk

83 signal rClkDiv : std_logic_vector(3 downto 0) := "0000"; --used for tClk

84 signal ctr : std_logic_vector(3 downto 0) := "0000"; --used for delay times

85 signal tfCtr : std_logic_vector(3 downto 0) := "0000"; --used to delay in transfer

86 signal rClk : std_logic := '0'; --Receiving Clock

87 signal tClk : std_logic; --Transferring Clock

88 signal dataCtr : std_logic_vector(3 downto 0) := "0000"; --Counts the number of read data bits

89 signal parError: std_logic; --Parity error bit

90 signal frameError: std_logic; --Frame error bit

91 signal CE : std_logic; --Clock enable for the latch

92 signal ctRst : std_logic := '0';

93 signal load : std_logic := '0';

94 signal shift : std_logic := '0';

95 signal par : std_logic;

96 signal tClkRST : std_logic := '0';

97 signal rShift : std_logic := '0';

98 signal dataRST : std_logic := '0';

99 signal dataIncr: std_logic := '0';

100

101 signal strCur : rstate := strIdle; --Current state in the Receive state machine

102 signal strNext : rstate; --Next state in the Receive state machine

103 signal sttCur : tstate := sttIdle; --Current state in the Transfer state machine

104 signal sttNext : tstate; --Next state in the Transfer state machine

105 signal stbeCur : TBEstate := stbeIdle;

106 signal stbeNext: TBEstate;

107

108 -------------------------------------------------------------------

-----

109 -- Module Implementation

110 -------------------------------------------------------------------

-----

111

112 begin

113 frameError <= not rdSReg(9);

114 parError <= not ( rdSReg(8) xor (((rdSReg(0) xor rdSReg(1)) xor

(rdSReg(2) xor

rdSReg(3))) xor ((rdSReg(4) xor rdSReg(5)) xor (rdSReg(6) xor

rdSReg(7)))) );

115 DBOUT <= rdReg;

116 tfReg <= DBIN;

117 par <= not ( ((tfReg(0) xor tfReg(1)) xor (tfReg(2) xor tfReg(3)))

xor ((tfReg(4)

xor tfReg(5)) xor (tfReg(6) xor tfReg(7))) );

118

119 --Clock Dividing Functions--

120

121 process (CLK, clkDiv) --set up clock divide for rClk

122 begin

123 if (Clk = '1' and Clk'event) then

124 if (clkDiv = baudDivide) then

125 clkDiv <= "000000000";

126 else

127 clkDiv <= clkDiv +1;

128 end if;

129 end if;

130 end process;

131

132 process (clkDiv, rClk, CLK) --Define rClk

133 begin

134 if CLK = '1' and CLK'Event then

135 if clkDiv = baudDivide then

136 rClk <= not rClk;

137 else

138 rClk <= rClk;

139 end if;

140 end if;

141 end process;

142

143 process (rClk) --set up clock divide for tClk

144 begin

145 if (rClk = '1' and rClk'event) then

146 rClkDiv <= rClkDiv +1;

147 end if;

148 end process;

149

150 tClk <= rClkDiv(3); --define tClk

151

152 process (rClk, ctRst) --set up a counter based on rClk

153 begin

154 if rClk = '1' and rClk'Event then

155 if ctRst = '1' then

156 ctr <= "0000";

157 else

158 ctr <= ctr +1;

159 end if;

160 end if;

161 end process;

162

163 process (tClk, tClkRST) --set up a counter based on tClk

164 begin

165 if (tClk = '1' and tClk'event) then

166 if tClkRST = '1' then

167 tfCtr <= "0000";

168 else

169 tfCtr <= tfCtr +1;

170 end if;

171 end if;

172 end process;

173

174 --This process controls the error flags--

175 process (rClk, RST, RD, CE)

176 begin

177 if RD = '1' or RST = '1' then

178 FE <= '0';

179 OE <= '0';

180 RDA <= '0';

181 PE <= '0';

182 elsif rClk = '1' and rClk'event then

183 if CE = '1' then

184 FE <= frameError;

185 OE <= RDA;

186 RDA <= '1';

187 PE <= parError;

188 rdReg(7 downto 0) <= rdSReg (7 downto 0);

189 end if;

190 end if;

191 end process;

192

193 --This process controls the receiving shift register--

194 process (rClk, rShift)

195 begin

196 if rClk = '1' and rClk'Event then


197 if rShift = '1' then

198 rdSReg <= (RXD & rdSReg(9 downto 1));

199 end if;

200 end if;

201 end process;

202

203 --This process controls the dataCtr to keep track of shifted values--

204 process (rClk, dataRST)

205 begin

206 if (rClk = '1' and rClk'event) then

207 if dataRST = '1' then

208 dataCtr <= "0000";

209 elsif dataIncr = '1' then

210 dataCtr <= dataCtr +1;

211 end if;

212 end if;

213 end process;

214

215 --Receiving State Machine--

216 process (rClk, RST)

217 begin

218 if rClk = '1' and rClk'Event then


219 if RST = '1' then

220 strCur <= strIdle;

221 else

222 strCur <= strNext;

223 end if;

224 end if;

225 end process;

226

227 --This process generates the sequence of steps needed to receive the data

228

229 process (strCur, ctr, RXD, dataCtr, rdSReg, rdReg, RDA)

230 begin

231 case strCur is

232

233 when strIdle =>

234 dataIncr <= '0';

235 rShift <= '0';

236 dataRst <= '0';

237

238 CE <= '0';

239 if RXD = '0' then

240 ctRst <= '1';

241 strNext <= strEightDelay;

242 else

243 ctRst <= '0';

244 strNext <= strIdle;

245 end if;

246

247 when strEightDelay =>

248 dataIncr <= '0';


249 rShift <= '0';

250 CE <= '0';

251

252 if ctr(2 downto 0) = "111" then

253 ctRst <= '1';

254 dataRST <= '1';

255 strNext <= strGetData;

256 else

257 ctRst <= '0';

258 dataRST <= '0';

259 strNext <= strEightDelay;

260 end if;

261

262 when strGetData =>

263 CE <= '0';

264 dataRst <= '0';

265 if ctr(3 downto 0) = "1111" then

266 ctRst <= '1';

267 dataIncr <= '1';


268 rShift <= '1';

269 else

270 ctRst <= '0';

271 dataIncr <= '0';


272 rShift <= '0';

273 end if;

274

275 if dataCtr = "1010" then

276 strNext <= strCheckStop;

277 else

278 strNext <= strGetData;

279 end if;

280

281 when strCheckStop =>

282 dataIncr <= '0';


283 rShift <= '0';

284 dataRst <= '0';

285 ctRst <= '0';

286

287 CE <= '1';

288 strNext <= strIdle;

289

290 end case;

291

292 end process;

293

294 --TBE State Machine--

295 process (CLK, RST)

296 begin

297 if CLK = '1' and CLK'Event then

298 if RST = '1' then


299 stbeCur <= stbeIdle;

300 else

301 stbeCur <= stbeNext;

302 end if;

303 end if;

304 end process;

305

306 --This process generates the sequence of events needed to control the TBE flag--

307 process (stbeCur, CLK, WR, DBIN, load)

308 begin

309

310 case stbeCur is

311

312 when stbeIdle =>

313 TBE <= '1';

314 if WR = '1' then

315 stbeNext <= stbeSetTBE;

316 else

317 stbeNext <= stbeIdle;

318 end if;

319

320 when stbeSetTBE =>

321 TBE <= '0';

322 if load = '1' then

323 stbeNext <= stbeWaitLoad;

324 else

325 stbeNext <= stbeSetTBE;

326 end if;

327

328 when stbeWaitLoad =>

329 if load = '0' then


330 stbeNext <= stbeWaitWrite;

331 else

332 stbeNext <= stbeWaitLoad;

333 end if;

334

335 when stbeWaitWrite =>

336 if WR = '0' then

337 stbeNext <= stbeIdle;

338 else

339 stbeNext <= stbeWaitWrite;

340 end if;

341 end case;

342 end process;

343

344 --This process loads and shifts out the transfer shift register--

345 process (load, shift, tClk, tfSReg)

346 begin

347 TXD <= tfsReg(0);

348 if tClk = '1' and tClk'Event then

349 if load = '1' then


350 tfSReg (10 downto 0) <= ('1' & par & tfReg(7 downto 0) &'0');

351 end if;

352 if shift = '1' then

353

354 tfSReg (10 downto 0) <= ('1' & tfSReg(10 downto 1));

355 end if;

356 end if;

357 end process;

358

359 -- Transfer State Machine--

360 process (tClk, RST)

361 begin

362 if (tClk = '1' and tClk'Event) then

363 if RST = '1' then


364 sttCur <= sttIdle;

365 else

366 sttCur <= sttNext;

367 end if;

368 end if;

369 end process;

370

371 -- This process generates the sequence of steps needed to transfer the data--

372 process (sttCur, tfCtr, tfReg, TBE, tclk)

373 begin

374

375 case sttCur is

376

377 when sttIdle =>

378 tClkRST <= '0';

379 shift <= '0';

380 load <= '0';

381 if TBE = '1' then

382 sttNext <= sttIdle;

383 else

384 sttNext <= sttTransfer;

385 end if;

386

387 when sttTransfer =>

388 shift <= '0';

389 load <= '1';

390 tClkRST <= '1';

391 sttNext <= sttShift;

392

393

394 when sttShift =>

395 shift <= '1';

396 load <= '0';

397 tClkRST <= '0';

398 if tfCtr = "1100" then

399 sttNext <= sttIdle;

400 else

401 sttNext <= sttShift;

402 end if;

403 end case;

404 end process;

406 end Behavioral;
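
Two numerical details of the UART core above can be checked with a short calculation: the baud rate that results from the baudDivide constant, and the bit order and parity of a transmitted frame. The Python sketch below works both out from the listing. The function names `uart_baud`, `odd_parity` and `build_frame` are hypothetical; the 50 MHz clock comes from the port comment in the entity.

```python
CLK_HZ = 50_000_000        # master clock, from the CLK port comment
BAUD_DIVIDE = 0b10100011   # baudDivide constant, decimal 163


def uart_baud(clk_hz=CLK_HZ, divide=BAUD_DIVIDE):
    """Bit rate produced by the clock dividers.

    clkDiv counts 0..divide, so rClk toggles every divide + 1 clocks
    and has a period of 2 * (divide + 1) clocks. tClk is bit 3 of a
    counter clocked by rClk, which divides rClk by 16.
    """
    rclk_hz = clk_hz / (2 * (divide + 1))
    return rclk_hz / 16


def odd_parity(byte):
    """Parity bit as computed for 'par': 1 when the data has an even number of ones."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 else 1


def build_frame(byte):
    """Bits in the order they leave TXD: start, d0..d7, parity, stop."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [odd_parity(byte), 1]


baud = uart_baud()
print(round(baud), f"{(baud - 9600) / 9600:+.2%}")
print(build_frame(0x3A))
```

With these constants the computed rate falls slightly below the nominal 9600 baud, by less than one percent, which is normally within the tolerance of an asynchronous serial link.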

UART Interface

1 ---------------------------------------------------------------------

-------------

2 -- Company:

3 -- Engineer:

4 --

5 -- Create Date: 21:47:53 01/06/2015

6 -- Design Name:

7 -- Module Name: UART_INTERFACE - Behavioral

8 -- Project Name:

9 -- Target Devices:

10 -- Tool versions:

11 -- Description:

12 --

13 -- Dependencies:

14 --

15 -- Revision:

16 -- Revision 0.01 - File Created

17 -- Additional Comments:

18 --

19 --------------------------------------------------------------------

--------------

20 library IEEE;

21 use IEEE.STD_LOGIC_1164.ALL;


22 use IEEE.STD_LOGIC_ARITH.ALL;

23 use IEEE.STD_LOGIC_UNSIGNED.ALL;




27





32

33 entity UART_INTERFACE is

34 Port ( RXD : in STD_LOGIC := '1';

35 DATA_UART_TO_DFPM : out STD_LOGIC_VECTOR (7 downto 0);

36 RDA_SIG : out STD_LOGIC;

37 DATA_READY_FROM_UART : out STD_LOGIC := '0';

38

39 WAITING_FOR_DFPM : out STD_LOGIC := '0';

40

41 CLK : in STD_LOGIC;

42 RST : in STD_LOGIC := '0';

43 LEDS : out STD_LOGIC_VECTOR (7 downto 0) := "00000000";

44

45 TXD : out STD_LOGIC := '1';

46 DATA_DFPM_TO_UART : in STD_LOGIC_VECTOR (7 downto 0);

47 TBE_SIG : out STD_LOGIC;

48 DATA_READY_FROM_DFPM : in STD_LOGIC := '0');

49 end UART_INTERFACE;

50

51 architecture Behavioral of UART_INTERFACE is

52

53 component RS232RefComp

54 Port (TXD : out std_logic := '1';

55 RXD : in std_logic;

56 CLK : in std_logic;

57 DBIN : in std_logic_vector (7 downto 0);

58 DBOUT : out std_logic_vector (7 downto 0);

59 RDA : inout std_logic;

60 TBE : inout std_logic := '1';

61 RD : in std_logic;

62 WR : in std_logic;

63 PE : out std_logic;

64 FE : out std_logic;

65 OE : out std_logic;

66 RST : in std_logic := '0');

67 end component;

68

69 --------------------------------------------------------------------

-----

70 type mainState is (

71 stReceive,

72 stWaitForDFPMOutput,

73 stSend,

74 stRepeatSend);

75 --------------------------------------------------------------------

-----

76

77 signal dbInSig : std_logic_vector(7 downto 0):= "00000000";

78 signal dbOutSig : std_logic_vector(7 downto 0):= "00000000";

79 signal rdaSig : std_logic;

80 signal tbeSig : std_logic;

81 signal rdSig : std_logic;

82 signal wrSig : std_logic;

83 signal peSig : std_logic;

84 signal feSig : std_logic;

85 signal oeSig : std_logic;

86

87 signal stCur : mainState := stReceive;

88 signal stNext : mainState;

89

90 Signal TxCount, RxCount : integer := 0;

91

92 Signal RxFlag, TxFlag, TbeFlag, RdaFlag, clearSendCount : std_logic

:= '0';

93

94 Signal TxDataReadStartPos : integer := 7;

95 Signal RxDataReadStartPos : integer := 0;

96

97 Signal Sig_Waiting_For_DFPM_Results : std_logic := '0';

98

99 Constant endOfRxMessage : std_logic_vector(7 downto 0) :=

"00111010";

100 Constant endOfTxMessage : std_logic_vector(7 downto 0) :=

"11111111";

101

102 Constant numberOfTxTransmissions : integer := 50;

103 Constant numberOfRxTransmissions : integer := 8;

104

105

106

107 begin

108

109 WAITING_FOR_DFPM <= Sig_Waiting_For_DFPM_Results;

110

111 TBE_SIG <= tbeSig;

112

113 RDA_SIG <= rdaSig;

114

115

116 Instantiating_the_UART: RS232RefComp port map ( TXD => TXD,

117 RXD => RXD,

118 CLK => CLK,

119 DBIN => dbInSig,

120 DBOUT => dbOutSig,

121 RDA => rdaSig,

122 TBE => tbeSig,

123 RD => rdSig,

124 WR => wrSig,

125 PE => peSig,

126 FE => feSig,

127 OE => oeSig,

128 RST => RST);

129

130 -------------------------------------------------------------------

------

131 process (CLK, RST)

132 begin

133 if (CLK = '1' and CLK'Event) then

134 if RST = '1' then


135 stCur <= stReceive;

136 else

137 stCur <= stNext;

138 end if;

139 end if;

140 end process;

141 -------------------------------------------------------------------

------

142

143 process (stCur, rdaSig, dboutsig, tbeSig,

TxCount,DATA_READY_FROM_DFPM)

144 Variable TXFlagVar : std_logic:= '0';

145 begin

146 case stCur is

147 when stReceive =>

148 rdSig <= '0';

149 wrSig <= '0';

150 Sig_Waiting_For_DFPM_Results <= '0';

151 DATA_READY_FROM_UART <= '1';

152 if (dbOutSig = endOfRxMessage) then

153 stNext <= stWaitForDFPMOutput;

154 DATA_READY_FROM_UART <= '0';

155 else

156 stNext <= stReceive;

157 end if;

158 if (rdaSig = '1') then

159 rdSig <= '1';

160 LEDS <= dbOutSig;

161 -- Send the newly received data to DFPM

162 DATA_UART_TO_DFPM <= dbOutSig;

163 end if;

164

165 when stWaitForDFPMOutput =>


167 -- Signal with the LEDS

168 LEDS <= (Others => '1');

169 -- Prevent the RX from receiving and the TX from transmitting

170 rdSig <= '1';

171 wrSig <= '0';

172 -- Do nothing else. Just wait until output is ready from the DFPM

173 if (DATA_READY_FROM_DFPM = '1') then

174 stNext <= stSend;

175 else

176 stNext <= stWaitForDFPMOutput;

177 end if;

178

179 when stSend =>


181 LEDS <= (Others => '0');

182 --LEDS(0) <= '1';

183 if (TxCount = numberOfTxTransmissions) then

184 stNext <= stReceive;

185 else

186 rdSig <= '1';

187 wrSig <= '1';

188 stNext <= stRepeatSend;

189 end if;

190

191 when stRepeatSend =>

192 --LEDS(1) <= '1';

193 wrSig <= '0';

194 if (tbeSig = '1') then

195 --Send the newly received data to UART TX

196 dbInSig <= DATA_DFPM_TO_UART;

197 stNext <= stSend;

198 else

199 stNext <= stRepeatSend;

200 end if;

201 end case;

202 end process;

203

204 ---- Determining the number of tx transmissions to be sent

205 ---- and which position in memory is to be printed out.

206 process(tbeSig)

207 Variable TxCountVar : integer := 0;

208 Variable TxDataReadStartPosVar : integer := 0;

209 begin

210 if falling_edge(tbeSig) then

211 TxCountVar := TxCount;

212 TxCount <= TxCountVar + 1;

213

214 TxDataReadStartPosVar := TxDataReadStartPos;

215 TxDataReadStartPos <= TxDataReadStartPosVar - 1;

216 end if;

217 end process;

218

219 end Behavioral;

220

221
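
The control flow of the UART interface above is a four-state machine: receive until the end-of-message byte arrives, wait for the DFPM result, then alternate between the two send states until the transmission count is reached. The Python sketch below reproduces only the next-state decisions visible in the listing; output signals are left out, and the function name `next_state` and its argument names are illustrative.

```python
END_OF_RX_MESSAGE = 0b00111010   # endOfRxMessage, the ASCII colon
NUMBER_OF_TX = 50                # numberOfTxTransmissions


def next_state(state, db_out, dfpm_ready, tx_count, tbe):
    """Return the next main state for the given inputs."""
    if state == "stReceive":
        if db_out == END_OF_RX_MESSAGE:
            return "stWaitForDFPMOutput"
        return "stReceive"
    if state == "stWaitForDFPMOutput":
        return "stSend" if dfpm_ready else "stWaitForDFPMOutput"
    if state == "stSend":
        return "stReceive" if tx_count == NUMBER_OF_TX else "stRepeatSend"
    if state == "stRepeatSend":
        # Wait for the transmit buffer to empty before the next byte
        return "stSend" if tbe else "stRepeatSend"
    raise ValueError(f"unknown state: {state}")


print(next_state("stReceive", ord(":"), False, 0, True))
```

Writing the transitions as a pure function makes it easy to check that every state has a defined successor for every input combination, which is the property the combinational VHDL process also needs in order to avoid inferred latches on stNext.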

Project Top module

DFPM_ON_FPGA_PROJECT_DEMO_TOP_MODULE.vhd Mon Feb 09 03:31:07 2015

1 ---------------------------------------------------------------------

-------------

2 -- Company:

3 -- Engineer:

4 --

5 -- Create Date: 14:56:46 02/06/2015

6 -- Design Name:

7 -- Module Name: DFPM_ON_FPGA_PROJECT_DEMO_TOP_MODULE - Behavioral

8 -- Project Name:

9 -- Target Devices:

10 -- Tool versions:

11 -- Description:

12 --

13 -- Dependencies:

14 --

15 -- Revision:

16 -- Revision 0.01 - File Created

17 -- Additional Comments:

18 --

19 --------------------------------------------------------------------

--------------

20 library IEEE;


22 USE ieee.std_logic_unsigned.all;


24

25 use work.DFPM_VECTOR_5X32_BIT.all; -- 5 by 1 matrix of std_logic_vectors package

26 use work.DFPM_VECTOR_25X32_BIT.all; -- 5 by 5 matrix of std_logic_vectors package

27 use work.DFPM_ARRAY_5X32_BIT.all; -- 5 by 1 matrix of signed vectors package

28 use work.DFPM_ARRAY_25X32_BIT.all; -- 5 by 5 matrix of signed vectors package

29



32

33





38

39 entity DFPM_ON_FPGA_PROJECT_DEMO_TOP_MODULE is

40 Port ( RXD : in STD_LOGIC;

41 TXD: out STD_LOGIC;



44 LEDS : out STD_LOGIC_VECTOR (7 downto 0));

45 end DFPM_ON_FPGA_PROJECT_DEMO_TOP_MODULE;

46

47 architecture Behavioral of DFPM_ON_FPGA_PROJECT_DEMO_TOP_MODULE is

48

49 COMPONENT UART_INTERFACE

50 PORT(

51 RXD : IN std_logic;

52 CLK : IN std_logic;

53 RST : IN std_logic;

54 DATA_DFPM_TO_UART : IN std_logic_vector(7 downto 0);

55 LEDS : out STD_LOGIC_VECTOR (7 downto 0);

56 DATA_READY_FROM_DFPM : IN std_logic;

57

58 WAITING_FOR_DFPM : out STD_LOGIC := '0';

59

60 DATA_UART_TO_DFPM : OUT std_logic_vector(7 downto 0);

61 RDA_SIG : OUT std_logic;

62 DATA_READY_FROM_UART : OUT std_logic;

63 TXD : OUT std_logic;

64 TBE_SIG : OUT std_logic);

65 END COMPONENT;

66

67 COMPONENT Signed_DFPM_Iteration_Control_Top_Module

68 Port ( VECTOR_A_IN : IN DFPM_SIGNED_VECTOR_25X32_BIT;

69 VECTOR_B_IN : IN DFPM_SIGNED_VECTOR_5X32_BIT;

70

71 DATA_READY_FROM_UART_RX : IN STD_LOGIC;

72

73 CLK : IN STD_LOGIC;

74 RST : IN STD_LOGIC;

75

76 VECTOR_B_AX : OUT DFPM_SIGNED_VECTOR_5X32_BIT;

77

78 DATA_READY_FROM_ONE_ITERATION : OUT STD_LOGIC := '0';

79 DATA_READY_FROM_DFPM_ITERATIONS : OUT STD_LOGIC;

80

81 VECTOR_X_OUT : OUT DFPM_SIGNED_VECTOR_5X32_BIT);

82 END COMPONENT;

83

84 type storage_type_dfpm is array (0 to 5) of std_logic_vector(40 downto 0); -- Larger so as to make room for the new line character

85 type type_DFPM_Format is array (0 to 30) of Signed(32 downto 0);

86 ------------------------------------------------------------------

87 Signal Sig_DFPM_Input_array : type_DFPM_Format := (others => (others => '0')); -- all 31 words initialised to zero

118

119 Signal Sig_DFPM_storage_array : storage_type_dfpm := (

"00000000000000000000000000000000000000000",

120

"00000000000000000000000000000000000000000",

121

"00000000000000000000000000000000000000000",

122

"00000000000000000000000000000000000000000",

123

"00000000000000000000000000000000000000000",

124

"00000000000000000000000000000000000000000");

125

    Signal Sig_UART_STORE_Pos : integer := 0;

    Signal Sig_UART_READ_Pos : integer := 0;

    Signal Sig_UART_READ_Pos_8BitPart : integer := 4;

    Signal Sig_UART_IN_STORAGE_FLAG : STD_LOGIC := '1';

    Signal Sig_RDA, Sig_TBE : STD_LOGIC := '0';
    Signal Sig_WAITING_FOR_DFPM, Sig_DataReady_From_UART : STD_LOGIC;

    Signal Sig_UART_Data_Storage_Complete : STD_LOGIC := '0';

    Signal Sig_start_DFPM_computation : STD_LOGIC := '0';

    Signal Sig_DataReady_From_DFPM : STD_LOGIC := '0';

    Signal Sig_DataReady_To_DFPM_Module, Sig_DATA_READY_FROM_UART : std_logic := '0';

    Signal Sig_DATA_OUT_UART_INTERFACE : std_logic_vector(7 downto 0);

    Signal Sig_VECTOR_A_IN_TO_DFPM : DFPM_SIGNED_VECTOR_25X32_BIT;
    Signal Sig_VECTOR_B_IN_TO_DFPM : DFPM_SIGNED_VECTOR_5X32_BIT;

    Signal Sig_VECTOR_X_OUT_FROM_DFPM : DFPM_SIGNED_VECTOR_5X32_BIT;

    Signal Sig_DATA_IN_UART_INTERFACE : std_logic_vector(7 downto 0);

    Signal Sig_DFPMStartFlag, Sig_RDA_Signal : std_logic := '0';

    Signal sig_FirstWrite : std_logic := '1';

    ----------------------------------------------------------------
    -- ASCII representations
    Constant Const_SemiColon : std_logic_vector(7 downto 0) := "00111011";
    Constant Const_Colon : std_logic_vector(7 downto 0) := "00111010";
    Constant Const_Space : std_logic_vector(7 downto 0) := "00100000";
    Constant Const_Opening_Bracket : std_logic_vector(7 downto 0) := "01011011";
    Constant Const_Closing_Bracket : std_logic_vector(7 downto 0) := "01011101";
    Constant Const_Newline : std_logic_vector(7 downto 0) := "00001010";

    Constant Const_Zero : std_logic_vector(7 downto 0) := "00110000";
    Constant Const_One : std_logic_vector(7 downto 0) := "00110001";
    Constant Const_Two : std_logic_vector(7 downto 0) := "00110010";
    Constant Const_Three : std_logic_vector(7 downto 0) := "00110011";
    Constant Const_Four : std_logic_vector(7 downto 0) := "00110100";
    Constant Const_Five : std_logic_vector(7 downto 0) := "00110101";
    Constant Const_Six : std_logic_vector(7 downto 0) := "00110110";
    Constant Const_Seven : std_logic_vector(7 downto 0) := "00110111";
    Constant Const_Eight : std_logic_vector(7 downto 0) := "00111000";
    Constant Const_Nine : std_logic_vector(7 downto 0) := "00111001";

    -- Other constants
    Constant Const_9_Zeros : signed(8 downto 0) := "000000000";
    Constant Const_16_Zeros : signed(15 downto 0) := "0000000000000000";

    Constant Const_0 : Signed(7 downto 0) := "00000000";
    Constant Const_1 : Signed(7 downto 0) := "00000001";
    Constant Const_2 : Signed(7 downto 0) := "00000010";
    Constant Const_3 : Signed(7 downto 0) := "00000011";
    Constant Const_4 : Signed(7 downto 0) := "00000100";
    Constant Const_5 : Signed(7 downto 0) := "00000101";
    Constant Const_6 : Signed(7 downto 0) := "00000110";
    Constant Const_7 : Signed(7 downto 0) := "00000111";
    Constant Const_8 : Signed(7 downto 0) := "00001000";

    Constant C9 : signed(8 downto 0) := "000000000";
    Constant C16 : signed(15 downto 0) := "0000000000000000";
    ----------------------------------------------------------------

begin

    -- Determining the position in which incoming data is to be stored
    process(Sig_RDA, Sig_UART_STORE_Pos, Sig_DATA_OUT_UART_INTERFACE)
        variable varPos : integer;
    begin
        if rising_edge(Sig_RDA) then
            if (Sig_UART_STORE_Pos < 30) then
                if ((Sig_DATA_OUT_UART_INTERFACE = Const_SemiColon)
                    --or (Sig_DATA_OUT_UART_INTERFACE = Const_Colon)
                    or (Sig_DATA_OUT_UART_INTERFACE = Const_Space)
                    or (Sig_DATA_OUT_UART_INTERFACE = Const_Closing_Bracket)) then

                    varPos := Sig_UART_STORE_Pos;
                    Sig_UART_STORE_Pos <= varPos + 1;
                end if;
            end if;
        end if;
    end process;

    -- The actual data storage
    process(Sig_RDA, Sig_DATA_OUT_UART_INTERFACE, Sig_UART_STORE_Pos)
    begin
        if rising_edge(Sig_RDA) then
            if (Sig_UART_STORE_Pos < 30) then
                if ((Sig_DATA_OUT_UART_INTERFACE = Const_One)
                    or (Sig_DATA_OUT_UART_INTERFACE = Const_Two) or (Sig_DATA_OUT_UART_INTERFACE = Const_Three)
                    or (Sig_DATA_OUT_UART_INTERFACE = Const_Four) or (Sig_DATA_OUT_UART_INTERFACE = Const_Five)
                    or (Sig_DATA_OUT_UART_INTERFACE = Const_Six) or (Sig_DATA_OUT_UART_INTERFACE = Const_Seven)
                    or (Sig_DATA_OUT_UART_INTERFACE = Const_Eight) or (Sig_DATA_OUT_UART_INTERFACE = Const_Nine)) then
                    -- Storing the 4 LSBs - the part of the ASCII numerical representation
                    -- that contains the number being transmitted
                    -- Sig_DFPM_Input_array(Sig_UART_STORE_Pos)(19 downto 16) <= Signed(Sig_DATA_OUT_UART_INTERFACE(3 downto 0));

                    Sig_DFPM_Input_array(Sig_UART_STORE_Pos)(19 downto 16) <= Signed(Sig_DATA_OUT_UART_INTERFACE(3 downto 0));

                end if;
            end if;
        end if;
    end process;

    ------------------------------------------------------------------
    -- Signalling that storage is complete
    process(CLK, Sig_UART_STORE_Pos)
    begin
        if rising_edge(CLK) then
            --if (Sig_UART_STORE_Pos = 29) then
            if (Sig_DATA_OUT_UART_INTERFACE = Const_Colon) then
                Sig_UART_Data_Storage_Complete <= '1';
            end if;
        end if;
    end process;

    -- Signalling that the DFPM computation should start
    process(clk, Sig_UART_Data_Storage_Complete, Sig_DFPMStartFlag)
        variable var_DFPMStartFlag : std_logic := '0';
    begin
        if rising_edge(clk) then
            if (Sig_UART_Data_Storage_Complete = '1') then
                if (Sig_DFPMStartFlag = '0') then
                    Sig_start_DFPM_computation <= '1';
                    Sig_DFPMStartFlag <= '1';
                else
                end if;
            else
            end if;
        end if;
    end process;

    ------------------------------------------------------------------

    process(Sig_TBE, Sig_UART_READ_Pos, Sig_UART_READ_Pos_8BitPart)
        Variable var_readPos, var_ReadPos_8bitPart : integer := 0;
    begin
        if rising_edge(Sig_TBE) then
            if (Sig_UART_READ_Pos < 5) then

                -- Sending bits 31 downto 24, then 23 downto 16, then 15 downto 8, then 7 downto 0
                -- in successive transmissions through the UART is equivalent to sending
                -- (8*(x + 1) - 1) downto (8*x), where x is Sig_UART_READ_Pos_8BitPart.
                -- This approach transmits the data contained in each element of the solution vector
                -- as a series of 8-bit groups, starting from the MSB and ending at the LSB.
                --------------------------------------
                if (Sig_UART_READ_Pos_8BitPart = 4) then
                    Sig_DATA_IN_UART_INTERFACE <= Sig_DFPM_storage_array(Sig_UART_READ_Pos)(39 downto 32);
                elsif (Sig_UART_READ_Pos_8BitPart = 3) then
                    Sig_DATA_IN_UART_INTERFACE <= Sig_DFPM_storage_array(Sig_UART_READ_Pos)(31 downto 24);
                elsif (Sig_UART_READ_Pos_8BitPart = 2) then
                    Sig_DATA_IN_UART_INTERFACE <= Sig_DFPM_storage_array(Sig_UART_READ_Pos)(23 downto 16);
                elsif (Sig_UART_READ_Pos_8BitPart = 1) then
                    Sig_DATA_IN_UART_INTERFACE <= Sig_DFPM_storage_array(Sig_UART_READ_Pos)(15 downto 8);
                elsif (Sig_UART_READ_Pos_8BitPart = 0) then
                    Sig_DATA_IN_UART_INTERFACE <= Sig_DFPM_storage_array(Sig_UART_READ_Pos)(7 downto 0);
                end if;

                if (Sig_UART_READ_Pos_8BitPart = 0) then
                    Sig_UART_READ_Pos_8BitPart <= 4;

                    var_readPos := Sig_UART_READ_Pos;
                    Sig_UART_READ_Pos <= var_readPos + 1;
                else
                    var_ReadPos_8bitPart := Sig_UART_READ_Pos_8BitPart;
                    Sig_UART_READ_Pos_8BitPart <= var_ReadPos_8bitPart - 1;
                end if;
            end if;
        end if;
    end process;

    ------------------------------------------------------------------

    Inst_UART_INTERFACE: UART_INTERFACE PORT MAP(
        RXD => RXD,
        DATA_UART_TO_DFPM => Sig_DATA_OUT_UART_INTERFACE,
        RDA_SIG => Sig_RDA,
        DATA_READY_FROM_UART => Sig_DATA_READY_FROM_UART,

        CLK => CLK,
        RST => RST,
        LEDS => LEDS,

        WAITING_FOR_DFPM => Sig_WAITING_FOR_DFPM,

        TXD => TXD,
        DATA_DFPM_TO_UART => Sig_DATA_IN_UART_INTERFACE,
        TBE_SIG => Sig_TBE,
        DATA_READY_FROM_DFPM => Sig_DataReady_From_DFPM);


    Inst_Signed_DFPM_Iteration_Control_Top_Module: Signed_DFPM_Iteration_Control_Top_Module PORT MAP(
        VECTOR_A_IN => Sig_VECTOR_A_IN_TO_DFPM,
        VECTOR_B_IN => Sig_VECTOR_B_IN_TO_DFPM,
        DATA_READY_FROM_UART_RX => Sig_DATA_READY_FROM_UART,
        CLK => CLK,
        RST => RST,
        VECTOR_B_AX => open,
        DATA_READY_FROM_ONE_ITERATION => open,
        DATA_READY_FROM_DFPM_ITERATIONS => Sig_DataReady_From_DFPM,
        VECTOR_X_OUT => Sig_VECTOR_X_OUT_FROM_DFPM);

    ------------------------------------------------------------------
    --
    --
    -- Sig_VECTOR_A_IN_TO_DFPM <= ( (C9&Const_8&C16, C9&Const_2&C16, C9&Const_3&C16, C9&Const_4&C16, C9&Const_5&C16),
    --     (C9&Const_1&C16, C9&Const_7&C16, C9&Const_3&C16,
    --     (C9&Const_1&C16, C9&Const_2&C16, C9&Const_3&C16, C9&"00001010"&C16, C9&Const_5&C16),
    --     C9&Const_4&C16, C9&"00001100"&C16));
    --
    --
    -- Sig_VECTOR_B_IN_TO_DFPM <= (C9&Const_1&C16, C9&Const_2&C16, C9&Const_3&C16, C9&Const_4&C16, C9&Const_5&C16);

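    -- Direct assignment of the data received on the UART RX pin (stored after conversion)
    Sig_VECTOR_A_IN_TO_DFPM <= (
        (Sig_DFPM_Input_array(0), Sig_DFPM_Input_array(1), Sig_DFPM_Input_array(2), Sig_DFPM_Input_array(3), Sig_DFPM_Input_array(4)),
        (Sig_DFPM_Input_array(5), Sig_DFPM_Input_array(6), Sig_DFPM_Input_array(7), Sig_DFPM_Input_array(8), Sig_DFPM_Input_array(9)),
        (Sig_DFPM_Input_array(10), Sig_DFPM_Input_array(11), Sig_DFPM_Input_array(12), Sig_DFPM_Input_array(13), Sig_DFPM_Input_array(14)),
        (Sig_DFPM_Input_array(15), Sig_DFPM_Input_array(16), Sig_DFPM_Input_array(17), Sig_DFPM_Input_array(18), Sig_DFPM_Input_array(19)),
        (Sig_DFPM_Input_array(20), Sig_DFPM_Input_array(21), Sig_DFPM_Input_array(22), Sig_DFPM_Input_array(23), Sig_DFPM_Input_array(24)) );

    Sig_VECTOR_B_IN_TO_DFPM <= (Sig_DFPM_Input_array(25), Sig_DFPM_Input_array(26), Sig_DFPM_Input_array(27), Sig_DFPM_Input_array(28), Sig_DFPM_Input_array(29));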

    ------------------------------------------------------------------

    -- Direct assignment of the output of the DFPM module to the output storage
    Sig_DFPM_storage_array(0) <= std_logic_vector(Sig_VECTOR_X_OUT_FROM_DFPM(0)) & Const_Newline;
    Sig_DFPM_storage_array(1) <= std_logic_vector(Sig_VECTOR_X_OUT_FROM_DFPM(1)) & Const_Newline;
    Sig_DFPM_storage_array(2) <= std_logic_vector(Sig_VECTOR_X_OUT_FROM_DFPM(2)) & Const_Newline;
    Sig_DFPM_storage_array(3) <= std_logic_vector(Sig_VECTOR_X_OUT_FROM_DFPM(3)) & Const_Newline;
    Sig_DFPM_storage_array(4) <= std_logic_vector(Sig_VECTOR_X_OUT_FROM_DFPM(4)) & Const_Newline;

    ------------------------------------------------------------------

end Behavioral;

Test code written in C++

#include <iostream>
#include <stdio.h>
//#include <time.h>
#include <Windows.h>
#include <cmath>
/*
#define _WIN32_WINNT 0x0600
#define NTDDI_WIN7 (0x06010000)
#define _WIN32_WINNT_WIN7 (0x0601)
*/

/*
 * This application is used to estimate the number of clock cycles used during an execution of
 * the Dynamic Functional Particle Method algorithm. The method can be used to solve many problems
 * requiring numerical methods. In this application, DFPM was used to solve the classical A*X = B
 * problem, where A is a matrix of coefficients, X a vector of variables and B a vector of constants.
 * The values of the variables in vector X are computed by the method and printed to the console/screen.
 *
 * In order to estimate the number of cycles used, the Kernel and User times were first obtained and
 * then the total number of clock cycles used in both Kernel and User modes was obtained. This was
 * done to make it easier to identify the number of clock cycles used up in each mode separately.
 *
 * In this implementation, the actual DFPM algorithm was repeated a thousand (1000) times. A single
 * execution of the algorithm might be completed in so short a time that the times spent in User
 * and Kernel modes would be registered as zero, making it difficult to identify the number of
 * clock cycles spent in each mode.
 *
 * By repeating the algorithm a thousand times and then dividing the number of clock cycles obtained
 * by a thousand, a reasonable approximation of the average number of clock cycles used by the
 * process in user mode was obtained.
 */

int main(){
    using namespace std;

    //************Initial initializations
    //Creating matrix A
    int a[5][5] = {
        { 8, 2, 3, 4, 5 },
        { 1, 7, 3, 4, 5 },
        { 1, 2, 9, 4, 5 },
        { 1, 2, 3, 8, 5 },
        { 1, 6, 3, 4, 9 }
    };
    //Creating vector B
    int b[5] = { 1, 2, 3, 4, 5 };
    int mu = 1;
    double dt = 0.1;
    volatile double Ax[5];
    volatile double B_Minus_Ax[5];
    volatile double B_Minus_Ax_Minus_MuV[5];
    volatile double muV[5];
    double tolerance = 7.8125e-3;
    int i, j, count;
    count = 0;
    int noOfCompleteAlgorithmRepetitions = 1000;

    //*********** Creating multiple arrays for the vectorNorm and Vectors V and X
    volatile double arrayOfV[1000][5];
    volatile double arrayOfX[1000][5];
    volatile double arrayOfVectorNorm[1000];
    for (int outerIndex = 0; outerIndex < noOfCompleteAlgorithmRepetitions; outerIndex++){
        arrayOfVectorNorm[outerIndex] = 10000;
        for (int innerIndex = 0; innerIndex < 5; innerIndex++){
            arrayOfV[outerIndex][innerIndex] = 1;
            arrayOfX[outerIndex][innerIndex] = 1;
        }
    }

    //********Noting the Kernel and User times at the beginning of the algorithm execution
    FILETIME creationTime1, exitTime1, kernelTime1, userTime1;
    //Initialised so that the differences below are defined even if the call fails
    double startTime_Kernel = 0.0, startTime_User = 0.0;
    //FILETIME values are in units of 100 ns, hence the factor 0.0000001
    bool myBool1 = GetProcessTimes(GetCurrentProcess(), &creationTime1, &exitTime1, &kernelTime1, &userTime1) != 0;
    if (myBool1){
        startTime_Kernel = (double)(kernelTime1.dwLowDateTime | ((unsigned long long)kernelTime1.dwHighDateTime << 32))*0.0000001;
        startTime_User = (double)(userTime1.dwLowDateTime | ((unsigned long long)userTime1.dwHighDateTime << 32))*0.0000001;
    }
    else {
        cout << "Function GetProcessTimes failed on the first call" << endl;
    }

    //***Algorithm computations and repetitions
    while (count < noOfCompleteAlgorithmRepetitions){//Enabling repetition here
        //The actual algorithm
        while (arrayOfVectorNorm[count] > tolerance){
            arrayOfVectorNorm[count] = 0;
            i = 0;
            while (i < 5){
                Ax[i] = 0;
                j = 0;
                while (j < 5){
                    //A*X is done here
                    Ax[i] += (a[i][j] * arrayOfX[count][j]);
                    j++;
                }//Ax[i] gets a new value at this end of the loop
                muV[i] = mu*arrayOfV[count][i];
                B_Minus_Ax[i] = b[i] - Ax[i];
                B_Minus_Ax_Minus_MuV[i] = B_Minus_Ax[i] - muV[i];
                arrayOfVectorNorm[count] += ((B_Minus_Ax[i])*(B_Minus_Ax[i]));
                arrayOfV[count][i] += B_Minus_Ax_Minus_MuV[i] * dt;
                arrayOfX[count][i] += arrayOfV[count][i] * dt;
                i++;
            }
        }
        count++;
    }

    ULONG64 myCycleTime = 0;
    bool myBool3 = QueryProcessCycleTime(GetCurrentProcess(), &myCycleTime) != 0;
    if (myBool3){
        cout << "The total number of clock cycles used for " << noOfCompleteAlgorithmRepetitions
             << " repetitions is : " << myCycleTime << endl;
    }
    else {
        cout << "Failed accessing the Process Cycle time" << endl;
    }

    FILETIME creationTime2, exitTime2, kernelTime2, userTime2;
    double stopTime_Kernel = 0.0, stopTime_User = 0.0;
    bool myBool2 = GetProcessTimes(GetCurrentProcess(), &creationTime2, &exitTime2, &kernelTime2, &userTime2) != 0;
    if (myBool2){
        stopTime_Kernel = (double)(kernelTime2.dwLowDateTime | ((unsigned long long)kernelTime2.dwHighDateTime << 32))*0.0000001;
        stopTime_User = (double)(userTime2.dwLowDateTime | ((unsigned long long)userTime2.dwHighDateTime << 32))*0.0000001;
    }
    else {
        cout << "Function GetProcessTimes failed on the second call" << endl;
    }

    for (int pos = 0; pos < 5; pos++){
        cout << arrayOfX[count - 1][pos] << endl;
    }

    double totalUserTime = stopTime_User - startTime_User;
    double totalKernelTime = stopTime_Kernel - startTime_Kernel;
    if (myBool1 && myBool2 && myBool3){
        cout << "The number of repetitions is : " << count << endl;
        cout << "The total kernel time in seconds : " << totalKernelTime << endl;
        cout << "The total user time in seconds : " << totalUserTime << endl;
        cout << "The cpu time in seconds : " << (totalKernelTime + totalUserTime) << endl;
        cout << "The total number of clock cycles used during each run of the complete algorithm is : "
             << (myCycleTime / noOfCompleteAlgorithmRepetitions) << " cycles." << endl;
        cout << "In consideration of the possibility of Kernel time being too low to be measured,"
             << " the minimum number of processor clock cycles in User mode is : "
             << (((totalUserTime - (1 * 0.0000001)) / (totalUserTime + totalKernelTime))*(myCycleTime / noOfCompleteAlgorithmRepetitions))
             << " cycles." << endl;
    }
    else {
        cout << "One of the processor information obtaining processes failed" << endl;
    }
    return 0;
}


Appendix B: Explanation of some basic

mathematical concepts

Two’s complement

When a non-negative number is represented in binary, its unsigned form is the

same as its two’s complement representation. For a negative number, the

conversion between the binary representation of its magnitude and its two’s

complement representation (in either direction) is a two-step operation:

1. Invert all the bits.

2. Add 1 to the LSB.

Negative numbers in two’s complement representation always have a 1 at the MSB.
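The two steps above can be sketched in C++ as follows. This is only an illustration: the 33-bit word width is taken from the ALU width stated in the abstract, and the names kWordBits, kWordMask, negateTwosComplement and isNegative are hypothetical and do not appear in the design.

```cpp
#include <cstdint>

// Assumed word width (33 bits, as stated for the ALU in the abstract).
constexpr int kWordBits = 33;
// Mask that keeps only the lowest kWordBits bits of a 64-bit container.
constexpr std::uint64_t kWordMask = (std::uint64_t{1} << kWordBits) - 1;

// Negates a value held in a kWordBits-wide two's complement word:
// step 1 inverts every bit, step 2 adds 1 to the LSB.
// The mask discards any bits above the word width.
std::uint64_t negateTwosComplement(std::uint64_t word) {
    return ((~word) + 1) & kWordMask;
}

// Returns true when the MSB (the sign bit) of the word is 1.
bool isNegative(std::uint64_t word) {
    return ((word >> (kWordBits - 1)) & 1u) != 0;
}
```

Applying negateTwosComplement twice returns the original word, which reflects the statement that the same two steps convert in both directions.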

Euclidean norm

The Euclidean norm of a vector is calculated by summing the squares of

the elements of the vector and taking the square root of the sum.

$$\lVert x \rVert = \sqrt{\sum_{k=1}^{n} x_k^{2}} \qquad \text{(B.1)}$$

where n is the number of elements in the vector and x_k is the element at position k.
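As a minimal sketch of equation (B.1), the norm can be computed as shown below. The function name euclideanNorm is an illustrative choice and is not taken from the thesis code.

```cpp
#include <cmath>
#include <vector>

// Computes the Euclidean norm of x: the square root of the sum of
// the squares of its elements. Returns 0 for an empty vector.
double euclideanNorm(const std::vector<double>& x) {
    double sumOfSquares = 0.0;
    for (double element : x) {
        sumOfSquares += element * element;
    }
    return std::sqrt(sumOfSquares);
}
```

For the vector (3, 4) the sum of squares is 25, so the norm is 5.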


Appendix C: Project report summary

Figure C.1 Project summary


Appendix D: MATLAB codes

Code for problem specification and comparison.

The MATLAB code below was used to send and receive problem parameters

between the PC running MATLAB and the FPGA. It required access to the PC’s

USB ports, with a USB cable for powering the board and an RS232 cable for

communication.

A short guide is included in the comment at the beginning of the script. The

script takes a problem specification of the form Ax = b and runs a MATLAB

implementation of the algorithm, printing the solution to the screen. Immediately

afterwards, the same problem is fed into the FPGA, which has been

pre-programmed with the DFPM on FPGA design.
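The script passes the problem to the FPGA as an ASCII string in which single digits are separated by spaces, matrix rows by semicolons, and the input is terminated by colons. The sketch below builds such a string from A and b. It is an illustration only: the 5x5 size and the string layout are read off the fwrite call in the script, while the names Matrix5, Vector5 and encodeProblem are hypothetical, and the function assumes every coefficient is a single decimal digit.

```cpp
#include <array>
#include <cstddef>
#include <string>

constexpr std::size_t kSize = 5;  // problem size used in the script
using Vector5 = std::array<int, kSize>;
using Matrix5 = std::array<Vector5, kSize>;

// Appends one row as space-separated single digits, e.g. "6 2 3 4 5".
// Assumes every value lies in the range 0..9.
void appendRow(std::string& out, const Vector5& row) {
    for (std::size_t j = 0; j < kSize; ++j) {
        out += static_cast<char>('0' + row[j]);
        if (j + 1 < kSize) out += ' ';
    }
}

// Builds "[row;row;...][b]::" from the matrix A and the vector b.
std::string encodeProblem(const Matrix5& a, const Vector5& b) {
    std::string out = "[";
    for (std::size_t i = 0; i < kSize; ++i) {
        appendRow(out, a[i]);
        if (i + 1 < kSize) out += ';';
    }
    out += "][";
    appendRow(out, b);
    out += "]::";
    return out;
}
```

With the A and b used in the script, this reproduces the string that the script sends to the port.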

The FPGA receives the problem specification, runs the DFPM algorithm to

compute the solution and then sends the solution back to the MATLAB application.

The MATLAB application then plots a graph of the two sets of values

obtained. The results are discussed in Chapter 6.
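The conversion of the bytes returned by the FPGA into a decimal value can be sketched as follows. The byte layout is an assumption based on a reading of the listings in this appendix and in the VHDL top module: each solution element is taken to arrive as four bytes, most significant first, forming a 32-bit two's complement number with 16 fractional bits. The name decodeFixedPoint is hypothetical.

```cpp
#include <array>
#include <cstdint>

// Converts four received bytes (most significant byte first) into a double.
// Assumed format: 32-bit two's complement with 16 fractional bits.
double decodeFixedPoint(const std::array<std::uint8_t, 4>& bytes) {
    // Reassemble the 32-bit raw word from the four bytes.
    const std::uint32_t raw = (static_cast<std::uint32_t>(bytes[0]) << 24) |
                              (static_cast<std::uint32_t>(bytes[1]) << 16) |
                              (static_cast<std::uint32_t>(bytes[2]) << 8) |
                              static_cast<std::uint32_t>(bytes[3]);
    // Interpret the word as signed: if the MSB is set, subtract 2^32.
    std::int64_t value = raw;
    if ((raw & 0x80000000u) != 0) {
        value -= (std::int64_t{1} << 32);
    }
    // Scale by 2^16 to account for the 16 fractional bits.
    return static_cast<double>(value) / 65536.0;
}
```

Under these assumptions the bytes 00 01 80 00 decode to 1.5 and the bytes FF FF 80 00 decode to -0.5.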

%%%%%%%%%%%%%%
%This script runs an implementation of the DFPM in MATLAB and then passes
%the same problem to the FPGA implementation through a port
%object that is bound to the UART, which is connected to the FPGA.
%%*************************************************************%%%%
%The following parameters define the problem and they can be changed right
%from inside this script by modifying the definitions of A and b for the MATLAB
%implementation and the string passed to fwrite for the FPGA implementation.
%Problem model A*x = b
%A = [9 2 3 4 5; 1 7 3 4 5; 1 2 9 4 5; 1 2 3 8 5; 1 2 3 4 9]
%b = [1 2 3 4 5]
%x, which is the solution, will be printed to the MATLAB console on the PC
%in decimal format.
%%*************************************************************%%%%

%Fixed parameters relevant to the test:
%The following parameters are fixed in the sense that they
%are available and modifiable in this script for the MATLAB implementation,
%but for the FPGA implementation one needs to modify the code in the FPGA,
%re-synthesize, re-map and re-generate the bit programming file and then
%program the FPGA before the algorithm run conditions can be on par between
%the two implementations.
%The parameters are:
%X = [1 1 1 1 1]
%V = [1 1 1 1 1]
%dt = 0.1 (discretization coefficient)
%mu = 1.0 (damping coefficient)
%%*************************************************************%%%%
%Communication parameters:
%The PC communicates with the FPGA through the UART on the COM port.
%The parameters are listed below:
%Baud Rate: 9600
%Number of data bits: 8
%Parity: None
%Stop bits: 1
%Handshaking: None
%In order to have a hitch-free test, please follow these steps:
% 1. Program the FPGA with the bit programming file.
% 2. Ensure that a USB to RS232 communication cable is connected between
%    the PC's USB port and the FPGA's RS232 port.
% 3. Check and verify the identity of the port to which the FPGA is
%    connected (in the Windows device manager or command line).
%    (NOTE: If this same code is to run in a Linux environment, then the
%    port identity should be checked in the command line as well and
%    necessary modifications made to this script)
% 4. Ensure that the COM port identity corresponds to the COM port identity
%    in the connection parameters indicated in the com object creation.
% 5. Run the script and check the solution on the MATLAB console. :-)
% 6. If there is any error, verify that steps 1 to 5 were followed.
%%***First part - MATLAB implementation*********%%%%%%%%%%%%%%%%%%%%%%%%%

A = [6 2 3 4 5; 1 8 3 4 5; 1 2 7 4 5; 1 2 3 8 5; 1 2 3 4 9];
x = [1 1 1 1 1]';
b = [5 4 3 2 1]';
v = [1 1 1 1 1]';
dt = 0.1;
mu = 1;
for i = 1:10000,
    v = v + (b - A*x - mu*v) * dt;
    x = x + v*dt;
    if norm(b - A*x) < 7.8125e-3,
        break,
    end
end
i
%Printing out the values in the solution vector
fprintf('The values obtained from the MATLAB implementation:\n%d, %d, %d, %d, %d\n\n', x(1), x(2), x(3), x(4), x(5));
%****End of the first part**********%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%******Second part - the MATLAB to FPGA implementation/communication*%%
%Communicating through a port object with the following parameters:
%**Com port ID: COM5 -should be modified as appropriate
%**Baud Rate: 9600
%**No of data bits: 8
%**Parity: odd
%**Stop bit: 1
%Creation of the port object
thePort = serial('COM12');
%Opening the port object created
fopen(thePort);
%Setting the Baud rate
thePort.BaudRate = 9600;
%Setting the number of data bits
thePort.DataBits = 8;
%Setting the parity
thePort.Parity = 'odd';
%Setting the stop bit
thePort.StopBits = 1;
%Setting the Terminator to "Carriage Return"
thePort.Terminator = 'cr';
%get(thePort)
%fprintf(thePort, 'This is a Print');
%Sending the parameters of the problem statement to the FPGA
fwrite(thePort, '[6 2 3 4 5;1 8 3 4 5;1 2 7 4 5;1 2 3 8 5;1 2 3 4 9][5 4 3 2 1]::', 'async');
%Acquiring the solution from the FPGA

theSolution = fread(thePort);
%theSolution = fgets(thePort)
%%Storing up each of the 4 digits of 8 bits making up the solution for each
%%of the solution elements
x1_arr = [theSolution(2), theSolution(3), theSolution(4), theSolution(5)];
x2_arr = [theSolution(7), theSolution(8), theSolution(9), theSolution(10)];
x3_arr = [theSolution(12), theSolution(13), theSolution(14), theSolution(15)];
x4_arr = [theSolution(17), theSolution(18), theSolution(19), theSolution(20)];
x5_arr = [theSolution(22), theSolution(23), theSolution(24), theSolution(25)];
solutionMatrix = [x1_arr; x2_arr; x3_arr; x4_arr; x5_arr];
elementMultipliers = [1, 1, 1, 1, 1];
solutionVector = [1, 1, 1, 1, 1];

for i = 1 : length(solutionMatrix)
    %Checking to see if the MSB of the solution is '1'
    %This will only be true if the solution element
    %being considered in this case is a negative number
    %%%Negative numbers%%%%%%%%%%%
    if solutionMatrix(i, 2) == 255
        %This number will eventually be multiplied with the final value of
        %the solution element
        elementMultipliers(i) = -1;
        %%%%Conversion of the 2's complement negative number starts here%%%%
        %%%%%%%%%%%%%%%
        %Inverting every bit of the four 8 bit digits representing the
        %solution element
        solutionMatrix(i, 1) = 255 - solutionMatrix(i, 1);
        solutionMatrix(i, 2) = 255 - solutionMatrix(i, 2);
        solutionMatrix(i, 3) = 255 - solutionMatrix(i, 3);
        solutionMatrix(i, 4) = 255 - solutionMatrix(i, 4);
        %Adding one to the solution element
        solutionMatrix(i, 4) = solutionMatrix(i, 4) + 1;
        %%%%%%%%%%%%%%%
        %%%%Conversion of the 2's complement negative number ends here%%%%

        %Summing up the digits in order to obtain the solution element
        %(each 8 bit digit is weighted by a power of 256)
        solutionVector(i) = solutionMatrix(i, 1) * 256 + solutionMatrix(i, 2) + (solutionMatrix(i, 3)/256) + (solutionMatrix(i, 4)/65536);
    %%%Positive numbers%%%%%%%%%%%%%%
    else
        %Summing up the digits in order to obtain the solution element
        %(each 8 bit digit is weighted by a power of 256)
        solutionVector(i) = solutionMatrix(i, 1) * 256 + solutionMatrix(i, 2) + (solutionMatrix(i, 3)/256) + (solutionMatrix(i, 4)/65536);
    end;
end;

solutionVector = elementMultipliers .* solutionVector;
fclose(thePort)
delete(thePort)
clear thePort
%Printing out the values in the solution vector as computed by the FPGA
fprintf('The values obtained from the FPGA implementation:\n%d, %d, %d, %d, %d\n\n', solutionVector(1), solutionVector(2), solutionVector(3), solutionVector(4), solutionVector(5));
%%**End of the FPGA implementation*****%%%%%%%%%%%%%
plot(x, 'b*');
hold on;
plot(solutionVector, 'rx');
title('Plot of values obtained in MATLAB implementation vs. FPGA implementation for : A = [6 2 3 4 5; 1 8 3 4 5; 1 2 7 4 5; 1 2 3 8 5; 1 2 3 4 9], and b = [5 4 3 2 1]');
legend('MATLAB implementation', 'FPGA Implementation');

Appendix E: Table of standard ASCII symbols and their numerical representation

Below is a table of the standard ASCII[11] symbols used in handling data exchange

between the PC and the FPGA.

Figure E.1 Table of standard ASCII symbols
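The way these codes are used in the data exchange can be illustrated with a short C++ sketch. The numerical codes are the standard ASCII values, which also match the constants declared in the VHDL top module; the function names asciiDigitValue and isElementDelimiter are hypothetical and serve only as an illustration.

```cpp
// Standard ASCII codes of the symbols used in the data exchange.
constexpr char kSemicolon = 0x3B;       // ';' separates matrix rows
constexpr char kColon = 0x3A;           // ':' terminates the input
constexpr char kSpace = 0x20;           // ' ' separates elements
constexpr char kOpeningBracket = 0x5B;  // '['
constexpr char kClosingBracket = 0x5D;  // ']'
constexpr char kNewline = 0x0A;         // appended to each transmitted result
constexpr char kZero = 0x30;            // '0'; the digits run up to '9' (0x39)

// The four least significant bits of an ASCII digit hold its numerical value.
// Only meaningful for the characters '0' to '9'.
int asciiDigitValue(char c) {
    return c & 0x0F;
}

// Returns true for the symbols that mark the end of one element:
// a semicolon, a space or a closing bracket.
bool isElementDelimiter(char c) {
    return c == kSemicolon || c == kSpace || c == kClosingBracket;
}
```

For example, the character '7' has the code 0x37, and its lower four bits give the value 7.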
