Scalable Hardware Architecture for Memristor
Based Artificial Neural Network Systems
A thesis submitted to the
Graduate School
of the University of Cincinnati
in partial fulfillment of the
requirements for the degree of
MASTER OF SCIENCE
in the Dept. of Electrical Engineering and Computing Systems
of the College of Engineering and Applied Sciences
May 2016
by
Ananthakrishnan Ponnileth Rajendran
B.Tech, Amrita Vishwa Vidyapeetham University, Kerala, India
May 2013
Thesis Advisor and Committee Chair: Dr. Ranga Vemuri
Abstract
Since the physical realization of the Memristor by HP Labs in 2008, research on
Memristors and Memristive devices has gained momentum, with focus primarily on modelling
and fabricating Memristors and on developing applications for Memristive devices. The
Memristor’s potential can be exploited in applications such as neuromorphic engineering,
memory technology and analog and digital logic circuit implementations. Research on
Memristor based neural networks has thus far focused on developing algorithms and
methodologies for implementation.
The Memristor Bridge Synapse, a Wheatstone bridge-like circuit composed of four
Memristors, is a very effective way to implement weights in hardware neural networks.
Research on Memristor Bridge Synapse implementations coupled with the Random Weight
Change Algorithm proved effective in learning complex functions, with potential for
implementation on hardware with simple and efficient circuitry. However, the simulations
and experiments were conducted purely in software and served only as proof of concept.
Realizing neural networks using the Memristor Bridge Synapse capable of on-chip training
requires an effective hardware architecture with numerous components and complex timing.
This thesis presents a scalable hardware architecture for implementing artificial neu-
ral networks using the Memristor Bridge Synapse capable of being trained on-chip using
the Random Weight Change algorithm. Individual components required for implement-
ing training logic, timing and evaluation are described and simulated using SPICE. A
complete training simulation for a small neural network based on the proposed architec-
ture was performed using HSPICE. A prototypical placement and routing tool for the
architecture is also presented.
To my parents and my sister. Thank you for being my inspiration.
In memory of my friends Govind and Srinivas. You’ll forever be in my heart.
Acknowledgements
I would like to start by thanking the most important people in my life, my family. My
parents Rajendran and Rajam have made a lot of sacrifices to help my sister Malavika
and me realize our dreams. Thank you very much for believing in me and motivating
me towards realizing my goals. I will forever be indebted to you.
I consider myself very lucky to have received the opportunity to work under Dr. Ranga
Vemuri. The knowledge you imparted will forever stay with me. Thank you very much
for letting me be a part of DDEL and guiding me through my Master’s journey. Thank
you Dr. Wen-Ben Jone and Dr. Carla Purdy for being part of my defense committee.
Thanks to Rob Montjoy for providing continuous support with the DDEL machines.
Special thanks to my friend Prabanjan for our innumerable discussions and the ideas you
gave me to put my work together. I would like to thank my friends Diwakar, Ashwini
and Meera for providing a helping hand on numerous occasions. Thank you Renuka for
reviewing my thesis.
I would like to thank all my teachers from primary school through college for moulding
me into the person I am today. Special thanks to Dr. Rajesh Kannan Megalingam for
inducing interest in the field of VLSI in me and motivating me to pursue a Master’s
degree.
Last but not least, thanks to all my friends and relatives for being a part of my journey
of life. I will forever be grateful for your help and support.
Contents
1 Introduction
1.1 The Memristor
1.2 Artificial Neural Networks
1.3 Artificial Neural Networks on Hardware
1.3.1 Analog Neural Network Implementations
1.3.2 Memristor Based Neural Networks
1.4 Random Weight Change Algorithm
1.5 Memristor Bridge Synapse
1.6 Thesis Statement
1.7 Thesis Overview
2 The Memristor Neural Network Architecture
2.1 Architecture Overview
2.2 Architecture Components
2.2.1 Neuron Block
2.2.2 Microcontroller
2.2.3 Shift Register
2.2.4 Connection Buses
2.3 Memristor Bridge Synapse Bit-Slice
2.4 Architecture in a nutshell
2.5 Summary
3 Placement and Routing Tool for Memristor Neural Network Architecture
3.1 Tool Overview
3.2 Tool Flow
3.3 Output and Performance Analysis
3.3.1 Area Analysis
3.3.2 Runtime Performance
3.4 Scalability
3.5 Summary
4 Experimental Results and Analysis
4.1 Memristor Simulation
4.2 Memristor Bridge Synapse Simulation
4.3 Memristor Bridge Synapse Bit-Slice Simulation
4.4 Simple Neural Network Simulation
4.5 OR-Gate Training in SPICE
4.5.1 Experimental Setup
4.5.2 Observation and Analysis
4.6 Power and Timing Estimation
4.6.1 Power
4.6.2 Timing
4.7 Training Performance
4.8 Summary
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
5.2.1 Implementing Stronger Activation Function
5.2.2 Linear Feedback Shift Register for Random Bits
5.2.3 Implementing other Hardware Friendly Algorithms
5.2.4 Bit-slice in Layout
5.2.5 Testing with more Memristor Models
5.2.6 Reconfigurable Neural Network
Bibliography
List of Figures
1.1 Conceptual symmetries of the four circuit variables with the three classical circuit elements and the memristor [1].
1.2 Cross section of HP’s Crossbar Array showing the memristor switch [2].
1.3 Representation of a simple three-layered artificial neural network [3].
1.4 Differential floating gate synapse schematic diagram of Electrically Trainable Analog Neural Network (ETANN) [4].
1.5 Analog current synapse, synapse current input, weight-control and neuron output circuit schematic of model proposed in [5].
1.6 Schematic of a weight cell of CMOS integrated feed-forward neural network [6].
1.7 Excitatory neuron with the input sensing circuit of Memristor Crossbar Architecture for Synchronous Neural Networks [7].
1.8 Weighting and Range Select circuit for RANLB and MTNLB [8].
1.9 (a) Activation function circuit for RANLB. (b) Activation function circuit for MTNLB [8].
1.10 Circuit that accomplishes weighting using the Memristor bridge synaptic circuit and voltage-to-current conversion with differential amplifier in [9].
1.11 (a) Typical multi-layered neural network inputs in voltage form. (b) Schematic of learning architecture for the equivalent hardware for the neural network in (a) [9].
1.12 Flowchart for Random Weight Change Algorithm.
1.13 Illustration of energy surface tracing by back-propagation and random weight change algorithm [9].
1.14 Memristor Bridge Synapse Circuit [10].
2.1 Sample input for face pose identification problem [11].
2.2 Three layered neural network for face pose identification.
2.3 Memristor based neural network architecture for face pose identification.
2.4 Simple three-layered neural network.
2.5 Memristor Bridge Synapse design.
2.6 Summing logic for neuron N3 from Figure 2.4.
2.7 Summing circuit using voltage average and operational amplifier circuits.
2.8 Difference circuit using differential amplifier.
2.9 Neuron N3 inputs and output for the neural network in Figure 2.4.
2.10 Memristor Bridge Synapse Bit-Slice.
3.1 Placement of 10 blocks of output layer on layout represented with p-diffusion.
3.2 After routing of input bus for placed blocks in Figure 3.1.
3.3 Completed placement and routing for neural network with 30 inputs, 10 hidden layer neurons and 10 output layer neurons.
3.4 Flowchart showing the tool flow for placement and routing.
3.5 Layout for face pose identification neural network with 960 inputs, 10 hidden layer neurons and 4 output layer neurons.
3.6 Output layer layout for face pose identification neural network.
3.7 Output layer layout for neural network with 80 inputs, 12 hidden layer neurons and 15 output layer neurons.
3.8 Neural network with 80 inputs and 15 output layer neurons having two hidden layers with 30 neurons in the first hidden layer and 25 neurons in the second hidden layer.
4.1 Circuit for Memristor simulation with Memristor M1 (Ron = 116Ω, Roff = 16kΩ) in series with resistor R1 (100Ω) and Voltage source Vin.
4.2 Memristor simulation with DC voltage +1V and -1V.
4.3 Resistance change in the memristor for millisecond input pulse-width.
4.4 Resistance change in the memristor for microsecond input pulse-width.
4.5 Memristor Bridge Synapse circuit used for simulation.
4.6 Memristor Bridge Synapse simulation waveform.
4.7 Evaluation pulse applied to Memristor Bridge Synapse.
4.8 Memristor Bridge Synapse Bit-Slice simulation waveform.
4.9 Neural network training input application and output evaluation.
4.10 Neural network weight update pulse application.
4.11 Neural network output at evaluation during different iterations.
4.12 Flowchart showing tool flow for neural network training simulator in SPICE.
4.13 Neural network output for learning OR-gate function at the start of simulation.
4.14 Neural network output for learning OR-gate function for 54th iteration of training.
4.15 Neural network output for learning OR-gate function at the end of simulation.
4.16 Mean squared error vs iterations for training OR-gate function.
4.17 Mean squared error vs iterations for simulation 2.
4.18 Mean squared error vs iterations for simulation 3.
4.19 Mean squared error vs iterations for simulation 4.
4.20 Mean squared error vs iterations for simulation 5.
List of Tables
2.1 Training input selection logic.
3.1 Comparison of total layout area for neural networks for different technology nodes.
3.2 Fraction of unused area in layout for different neural networks.
4.1 Instantaneous current and resistance measurements for forward biased memristor.
4.2 Instantaneous current and resistance measurements for reverse biased memristor.
4.3 Weight change for different training signal pulse-widths for memristor bridge synapse.
4.4 Comparison of training performance for multiple simulations for training OR-gate function in HSPICE.
Chapter 1
Introduction
In 1971, Leon Chua argued that a fourth two-terminal device should exist alongside
the three classical circuit elements, namely the resistor, capacitor and inductor [12].
He named this fourth circuit element the Memristor. Chua pointed out that the three
basic circuit elements are each defined by a relationship between two of the four
fundamental circuit variables: current, voltage, charge and flux-linkage. There are six
possible pairwise relationships between these four circuit variables, of which two are
direct relationships.
q = ∫ i(t) dt, (1.1)
is the relationship between charge (q) and current (i), and
φ = ∫ v(t) dt, (1.2)
is the relationship between flux-linkage (φ) and voltage (v). Three of the remaining
four relations are the axiomatic definitions of the three classical circuit elements. The
resistor is defined by the relationship between current and voltage, the inductor by the
relationship between current and flux-linkage, and the capacitor by the relationship
between charge and voltage. Chua postulated, from a logical as well as an axiomatic
point of view, that a fourth basic two-terminal device should exist, characterized by the
one remaining relationship, that between charge and flux-linkage.
There was no physical realization of such a two-terminal device for over three decades
after Chua’s proposal. Then, in 2008, Dmitri B. Strukov et al. from HP Labs published
an article reporting that memristance arises naturally in nanoscale systems in which
solid-state electronic and ionic transport are coupled under an external bias voltage [13].
Since this discovery, research on Memristors and Memristive devices has gained momentum,
with focus primarily on modelling and fabricating memristors and on developing
applications for memristive devices. The memristor’s potential can be exploited in
applications such as neuromorphic engineering, memory technology and analog and digital
logic circuit implementations. The work presented in this thesis focuses on the
application of memristors in the area of artificial neural networks.
Figure 1.1: Conceptual symmetries of the four circuit variables with the three classical
circuit elements and the memristor [1].
1.1 The Memristor
The Memristor is a two-terminal device whose electrical resistance is not constant, but
varies depending on the amount of charge that has flowed through it. This variable
resistance of the Memristor is termed its Memristance. The memristor is non-volatile,
meaning that the device retains its most recent resistance value even after it is
disconnected from an electric power supply. This property of the memristor makes it
very useful for applications such as efficient memories and hardware
realizations of artificial neural networks.
Figure 1.2: Cross section of HP’s Crossbar Array showing the memristor switch [2].
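The charge dependence described above is usually written compactly. As a brief sketch in the notation of equations (1.1) and (1.2), and assuming a charge-controlled device, the memristance M(q) links flux-linkage and charge. Since dφ = v dt and dq = i dt, it acts as a charge-dependent resistance, and it reduces to an ordinary resistance when M is constant:

```latex
d\varphi = M(q)\,dq, \qquad
M(q) = \frac{d\varphi}{dq}, \qquad
v(t) = M\bigl(q(t)\bigr)\,i(t)
```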
There have been several implementations of the memristor device, such as the Polymeric
Memristor, Layered Memristor, Ferroelectric Memristor and Spin Memristive systems.
In this text, we discuss the Titanium Dioxide Memristor that HP developed
in 2008. Researchers Dmitri B. Strukov et al. developed the memristor while working
on a crossbar memory architecture at HP Labs. The crossbar is an array of perpendicular
wires that are connected using switches at the points where they cross. Their idea was to
open and close these switches by applying voltages at the ends of the wires. The design
of these switches led to the creation of the memristor.
HP’s memristor is made by sandwiching a thin layer of titanium dioxide (TiO2)
between two platinum electrodes. The electrodes are about 5 nm thick and the TiO2 layer
is about 30 nm thick. The TiO2 layer is divided into two separate regions, one composed
of pure TiO2 and the other slightly depleted of oxygen atoms. These oxygen vacancies
act as charge carriers and help conduct current through the device, leading to a lower
resistance in the oxygen depleted region. The application of an electric field results in
a drift of these oxygen vacancies, which shifts the boundary between the
low and high resistance regions. Figure 1.2 shows a cross sectional view of HP’s crossbar
array with the memristor. If an electric field is applied across the two electrodes, the
boundary between the normal region and the oxygen depleted region moves either
towards or away from the upper platinum electrode. If the boundary moves towards the
upper electrode, the resistance increases, and vice versa. Thus, the resistance of the
device depends on how much charge has passed through it in a particular direction.
Memristance is observed only when both the pure and doped regions contribute to
the resistance. After enough charge passes through the device, the ions become unable
to move further and the device enters hysteresis. The device then acts as a simple resistor
until the direction of the current is reversed.
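The boundary drift described above can be sketched numerically. The following is a minimal sketch, assuming the linear ion-drift model commonly associated with the HP device [13]. The function name, the film thickness `d`, the mobility `mu_v`, the starting boundary position `w0`, the time step and the polarity convention are illustrative assumptions, while the `r_on` and `r_off` defaults are borrowed from the values quoted for Figure 4.1. In this sketch `w` is the width of the oxygen depleted (low resistance) region, so a positive voltage widens it and lowers the resistance.

```python
def simulate_memristor(voltages, dt, r_on=116.0, r_off=16e3,
                       d=10e-9, mu_v=1e-14, w0=5e-9):
    """Return the resistance seen at each time step of a voltage waveform.

    All default parameter values are illustrative assumptions.
    """
    w = w0
    resistances = []
    for v in voltages:
        # Depleted and pure regions act as two resistors in series.
        m = r_on * (w / d) + r_off * (1.0 - w / d)
        resistances.append(m)
        i = v / m
        # Linear drift: the boundary velocity is proportional to the current.
        w += mu_v * (r_on / d) * i * dt
        # The boundary cannot leave the film; once it reaches an edge the
        # device behaves as a fixed resistor until the current reverses.
        w = min(max(w, 0.0), d)
    return resistances


forward = simulate_memristor([1.0] * 100, dt=1e-3)
reverse = simulate_memristor([-1.0] * 100, dt=1e-3)
print(forward[0], forward[-1])   # resistance falls under positive bias
print(reverse[0], reverse[-1])   # resistance rises under negative bias
```

The clamp on `w` is what makes the simulated device stop changing once the boundary reaches either electrode, mirroring the saturation behaviour described in the text.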
In 2010, R. Stanley Williams of HP Labs reported that they were able to fabricate
memristors as small as 3 nm by 3 nm that had a switching time of 1 ns (1 GHz
speed). Such small dimensions and high speed promise many applications for the
memristor. In the work presented here, the memristor’s ability to provide a wide range
of resistance values is utilized in creating synaptic weights for artificial neural networks.
For simplicity, we use the term 'resistance' instead of 'memristance' throughout
this text.
1.2 Artificial Neural Networks
Artificial neural networks are groups of nodes that are connected using weighted edges.
They are models inspired by biological neural networks and are used to estimate or
approximate functions that usually depend on a large number of unknown inputs. The
ability of artificial neural networks to adapt to a given set of circumstances is what
makes them very attractive for applications such as pattern recognition, data mining,
game-play and decision making, and medical diagnosis. Neural networks adapt to a given
set of inputs by modifying the weights of the interconnects between their neurons based
on a suitable algorithm. An activation function at the neuron defines its output for an
input or set of inputs to it. There are three main learning paradigms, viz. supervised
learning, unsupervised learning and reinforcement learning.
Every neural network has one input layer and one output layer. It may have one or
more hidden layers. Figure 1.3 shows a simple three-layered neural network. The number
of neurons in each layer depends on the function that the network is trying to approximate.
The neural networks discussed in this thesis are feed-forward neural networks, i.e., data
only flows in the forward direction and there is no feedback for the data while the network
is evaluated. The neural network in Figure 1.3 is a fully interconnected arrangement in the
sense that every neuron in one layer is connected to every neuron in the succeeding layer.
This is not a necessity while designing a neural network, since all connections may not be
required to implement a specific function. However, it is very difficult to accurately
predict the optimal number of hidden layer neurons and connections that a particular
problem might require. The beauty lies in the fact that neural networks have the ability
to learn whether or not a particular neuron or connection has a significant impact on the
output.
Figure 1.3: Representation of a simple three-layered artificial neural network [3].
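To make the evaluation of a fully interconnected feed-forward network concrete, here is a minimal sketch. The logistic activation, the layer sizes and the names `sigmoid`, `layer_output` and `feed_forward` are illustrative assumptions rather than choices taken from this thesis.

```python
import math


def sigmoid(x):
    """Logistic activation function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, biases):
    """Each row of `weights` holds the incoming weights of one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feed_forward(inputs, layers):
    """Propagate `inputs` through a list of (weights, biases) layers."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_output(activations, weights, biases)
    return activations


# Two inputs, two hidden neurons, one output neuron.
hidden = ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.0])
output = ([[1.0, -1.0]], [0.0])
print(feed_forward([1.0, 0.0], [hidden, output]))
```

Data only moves from one layer to the next, which is the feed-forward property described above; every neuron in a layer sees every activation of the preceding layer.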
Supervised learning is one of the most commonly used learning methods for artificial
neural networks. In this kind of learning, the aim is to infer the mapping implied by the
data; the cost function is related to the mismatch between the user’s mapping and the
data and it implicitly contains prior knowledge about the problem domain [3]. The
mean-squared error is often used as the cost and the learning tries to reduce the average
error between the network’s output and the desired output. The Backpropagation algorithm
is a well-known and efficient algorithm used for training neural networks. Training is
accomplished by adjusting the weights on the connections between neurons with the aim
of reducing the mean-squared error at the output of the neural network.
The Backpropagation algorithm calculates the gradient of a loss function with respect
to all of the weights in the network. The algorithm tries to minimize the loss function
by feeding the gradient to an optimization method, which uses it to update the weights.
For the Backpropagation algorithm to work, the activation function used by the
neurons must be differentiable. The activation function is any mathematical function
at the neuron which defines its output for a given set of inputs. The Backpropagation
algorithm is very effective in training neural networks, but poses many challenges when
implemented on a standalone hardware system. The algorithm works in two phases:
the propagation phase and the weight update phase. In the propagation phase, the
algorithm first forward propagates a training input through the network and generates
the output activations. In the next step, the algorithm propagates the output
activations backward through the network, using the target pattern to generate the
difference between the target and actual output values (the deltas) of all the hidden
and output neurons. In the weight update phase, the algorithm first multiplies each
delta by the corresponding input activation to find the gradient of the weight. It then
uses this gradient to update each of the weights in the network.
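The two phases can be sketched for a very small network. This is a minimal illustration, assuming one hidden layer, a logistic activation, a squared-error cost and per-sample updates; the function names, learning rate and starting weights are hypothetical. It also makes visible the multiplications and derivatives that the hardware would have to provide.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(x, target, w_hidden, w_out, rate=0.5):
    """One backpropagation step; updates the weights in place and
    returns the squared error measured before the update."""
    # Propagation phase, forward pass (last weight in each row is the bias).
    xb = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hb)))

    # Propagation phase, backward pass: deltas from target minus actual output,
    # scaled by the derivative of the logistic function.
    delta_out = (target - out) * out * (1.0 - out)
    delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                    for j in range(len(hidden))]

    # Weight update phase: gradient is delta times input activation.
    for j in range(len(w_out)):
        w_out[j] += rate * delta_out * hb[j]
    for j, row in enumerate(w_hidden):
        for k in range(len(row)):
            row[k] += rate * delta_hidden[j] * xb[k]
    return (target - out) ** 2


def train_or(epochs=2000):
    """Train on the OR function; returns (first-epoch error, last-epoch error)."""
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
    w_hidden = [[0.1, -0.2, 0.05], [0.3, 0.2, -0.1]]  # arbitrary starting weights
    w_out = [0.2, -0.3, 0.1]
    errors = []
    for _ in range(epochs):
        errors.append(sum(train_step(x, t, w_hidden, w_out) for x, t in data))
    return errors[0], errors[-1]


print(train_or())
```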
It is quite evident that the Backpropagation algorithm, though very effective, requires
complex multiplication, summation and derivatives that are difficult to implement in VLSI
circuits [14]. A simpler algorithm is desirable for designing a standalone hardware neural
network system. Several hardware friendly algorithms have been implemented to train
artificial neural networks on hardware. The Random Weight Change algorithm is one
such popular algorithm. Though not as efficient as Backpropagation, it is hardware
friendly and much simpler to implement.
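For contrast with Backpropagation, the following is a minimal sketch of a random weight change rule as it is commonly stated in the literature: every weight is changed by a small fixed amount with a random sign, the same change is reused while the error keeps falling, and a fresh random change is drawn when it does not. This formulation, the function name and all parameters are assumptions for illustration; the version used in this thesis is the one described in Section 1.4.

```python
import random


def random_weight_change(error_fn, weights, delta=0.01, iterations=1000, seed=0):
    """Sketch of a random weight change loop (assumed formulation).

    Returns the final weights and the error recorded at every step.
    """
    rng = random.Random(seed)
    weights = list(weights)
    change = [delta * rng.choice((-1.0, 1.0)) for _ in weights]
    previous_error = error_fn(weights)
    history = [previous_error]
    for _ in range(iterations):
        # Apply the current change to every weight at once.
        weights = [w + c for w, c in zip(weights, change)]
        error = error_fn(weights)
        if error >= previous_error:
            # The step did not help: draw a new random sign for every weight.
            change = [delta * rng.choice((-1.0, 1.0)) for _ in weights]
        # Otherwise keep the same change and apply it again next iteration.
        previous_error = error
        history.append(error)
    return weights, history


# Toy error surface with its minimum at (1, 1).
final_weights, history = random_weight_change(
    lambda w: sum((wi - 1.0) ** 2 for wi in w), [0.0, 0.0])
print(history[0], min(history))
```

Note that the loop needs only an error comparison, a sign bit per weight and a fixed-size increment, with no multiplications by activations and no derivatives, which is the property that makes this family of algorithms attractive for hardware.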
1.3 Artificial Neural Networks on Hardware
Implementation of artificial neural networks on hardware has been popular for over three
decades. Hardware neural networks extend from Analog to Digital to FPGA and even to
Optical Neural Networks. In this section, we briefly explore a few analog neural network
implementations and neural network implementations using memristors.
1.3.1 Analog Neural Network Implementations
Implementation of artificial neural networks on hardware gained popularity in the 1980s
with Intel’s Electrically Trainable Analog Neural Network (ETANN) 80170NX chip being
one of the earliest fully developed analog chips [4]. The ETANN is a general purpose neu-
rochip that stores its weights on non-volatile floating gate transistors (Floating-gate MOS-
FET or FGMOS) as electric charge with the help of EEPROM cells, and uses Gilbert-
multiplier synapses to provide four-quadrant multiplication. Training for ETANN is done
off chip using a host computer and the weights are written into the ETANN [4]. The chip
contains 64 fully interconnected neurons and can be cascaded by bus interconnection to
form a network of up to 1024 neurons with up to 81,920 weights [15].
Figure 1.4 shows the synapse circuit of the ETANN, which is an NMOS version of
the Gilbert multiplier with a pair of EEPROM cells in which a differential voltage
is stored as the weight. Fowler-Nordheim tunneling is used to add and remove
electrons from the floating gates in the EEPROM cells to adjust the weights [4]. ETANN was
used in several systems, such as the Mod2 Neurocomputer, which used 12 ETANN
chips for real-time image processing [16], and the MBOX II, which makes use of 8 ETANN
chips to create an analog audio synthesizer [15].
One of the major drawbacks of this chip was the limited resolution in storing the
synaptic weights. The long-term resolution of the weights was not more than five bits.
Another issue was the writing speed and cyclability of the EAROMs used to store the
weights, which restricted the application of chip-in-the-loop training [17].
Figure 1.4: Differential floating gate synapse schematic diagram of Electrically
Trainable Analog Neural Network (ETANN) [4].
Milev and Hristov [5] present a simple analog-signal synapse with inherent quadratic
non-linearity implemented using MOSFETs with no floating-gate transistors. They de-
signed a neural matrix for finger-print feature extraction with 2176 analog current mode
synapses arranged in eight layers of 16 neurons with 16 inputs each. A chip was fabri-
cated in a standard 0.35µm TSMC process to demonstrate the feasibility of non-linear
synapses in practical application.
Apart from the 16 x 8 neural matrix of 128 analog 16-input neurons, 16-bit latched
digital inputs multiplexed with 16 analog-current inputs, 16 analog-current signal
outputs and a 9-bit current-output digital-to-analog converter (DAC) are also implemented
on chip. Weight storage is done in an on-chip SRAM of more than 19K size. The
architecture allows for cascaded interconnection for system expansion. The internal
system clock is specified at 200 MHz maximum frequency. However, the input-data
processing speed is determined by the current propagation delay through the components
in the network and varies significantly with the reference current driving the analog
synapse circuits [5].
Figure 1.5: Analog current synapse, synapse current input, weight-control and neuron
output circuit schematic of model proposed in [5].
Lui et al. [6] developed a mixed signal CMOS feed-forward neural network chip
with on-chip error reduction hardware. The design is compact and capable of high-speed
parallel learning using the Random Weight Change (RWC) algorithm. The weight storage
in the system is accomplished using capacitors. Capacitors used for weight storage are
compact and easy to program, but are susceptible to leakage, which leads to errors in the
stored weights. In their system, Lui et al. designed large capacitors to ensure that the
leakage is negligible. The chip is designed to operate in conditions that change
continuously, and the weight leakage problem is mitigated by constant weight updates.
They found that the weight retention time for the capacitors was around 2 s for losing
1% of the weight value at room temperature.
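As a rough worked example of what this retention figure implies, assume (this is an assumption for illustration, not a claim from [6]) that the stored weight decays exponentially, V(t) = V0 exp(-t/τ). A 1% loss in 2 s then corresponds to a time constant of roughly 199 s. The function name below is hypothetical.

```python
import math


def leakage_time_constant(loss_fraction, elapsed_s):
    """Time constant tau of V(t) = V0 * exp(-t / tau) for a given fractional loss.

    Assumes purely exponential decay of the stored weight voltage.
    """
    return -elapsed_s / math.log(1.0 - loss_fraction)


tau = leakage_time_constant(0.01, 2.0)
print(round(tau))  # 199
```

Under this assumption the weights must be refreshed on a time scale much shorter than τ, which is consistent with the observation that constant weight updates mitigate the leakage.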
Figure 1.6 shows the schematic of a single weight cell with a shift register for random
input, the weight storage and modification circuit and the multiplier circuit. Lui et al.
were able to fabricate and test a chip with 100 weights in a 10x10 array with 10 inputs
and 10 outputs. They tested the chip by connecting it to a PC using an analog to
digital converter (ADC) and a digital to analog converter (DAC). In this work we make
use of the same RWC algorithm used by Lui et al. in their system. The RWC algorithm
is described in detail in Section 1.4.
Figure 1.6: Schematic of a weight cell of CMOS integrated feed-forward neural network
[6].
The analog neural network implementations discussed in this text are only a small
subset of the innumerable VLSI implementations of artificial neural networks. Misra and
Saha [15] provide a comprehensive survey of hardware implementations of artificial
neural networks spanning over 20 years. Their discussion is not limited to analog neural
network implementations, but extends to digital, hybrid, FPGA based, RAM based and
optical neural networks.
1.3.2 Memristor Based Neural Networks
The potential to mimic brain logic is one of the most attractive features of the
memristor. Various architectures and synapse designs have been proposed using memristors
for realizing artificial neural networks. Here, we briefly discuss a couple of neural
network implementations using memristors and the Memristor Bridge Synapse based neural
network that we have used as the primary reference in our work.
Starzyk and Basawaraj [7] propose an architecture and training scheme for neural
networks implemented using crossbar connections of memristors with a view to preserving
the high density of synaptic connections. They employ simple threshold based neurons,
synapses consisting of only a single memristor and a common sensing network. The
synapse is designed with a view to creating large scale systems with synapses arranged
in a grid structure capable of being trained on-chip. The system is composed of a single
layer feed-forward neural network with n inputs and m outputs.
Figure 1.7: Excitatory neuron with the input sensing circuit of Memristor Crossbar
Architecture for Synchronous Neural Networks [7].
The neuron of the Memristor Crossbar Architecture proposed in [7] operates in three
different phases, viz. sensing phase, active phase and resting phase. During the sensing
phase, the neuron waits for input activity and does not fire. Increase in any of the input
signals above the threshold would switch the neuron into active phase, where the neuron
either fires or does not for a specific amount of time. Once the active phase timing expires,
the neuron goes into resting phase where all the inputs and outputs go to 0V and remain
in this state till the next sampling time. The excitatory neuron with the input sensing
circuit of the Memristor Crossbar Architecture is shown in Figure 1.7. The design was
tested in HSPICE for organization of the neural network on noisy digit recognition.
In [8], Soltiz et al. propose two Neuron Logic Block (NLB) designs to overcome the
limitation of not being able to train linearly inseparable functions with existing perceptron
based NLB designs using thin-film memristors that implement static threshold activation
functions. Their designs overcome the limitation by allowing the effective activation function
to be adapted during learning. Soltiz et al. contribute a perceptron based NLB design
with an adaptive activation function and a perceptron based NLB with a static activation
function and multiple activation thresholds, and demonstrate the designs for reconfigurable
logic and optical character recognition for handwritten digits.
Figure 1.8: Weighting and Range Select circuit for RANLB and MTNLB [8].
Figure 1.8 shows the weighting and range selection circuit implemented using mem-
ristors for the Robust Adaptive Neural Logic Block (RANLB) and the Multithreshold
Neural Logic Block (MTNLB). The RANLB implements an adaptive activation function
using the circuit in Figure 1.9 (a), by providing an adjustable digital value for each in-
put current range. A flip-flop stores the digital value for each input current range. The
MTNLB is designed with a view of overcoming the high area overhead of the RANLB’s
activation function which limits its implementation on large neural networks where area
is a primary constraint. The MTNLB employs a static activation function in such a way
that the ability to learn linearly inseparable functions is not compromised. Figure 1.9 (b)
shows the activation function circuit for the MTNLB circuit.
Figure 1.9: (a) Activation function circuit for RANLB. (b) Activation function circuit for MTNLB [8].
Figure 1.10: Circuit that accomplishes weighting using the Memristor bridge synaptic circuit and voltage-to-current conversion with differential amplifier in [9].
The Memristor Bridge Synapse introduced by Kim et al. in [10] is a very popular
synaptic design used to implement neural networks. [9], [18], [19] and [20] present imple-
mentations of the Memristor Bridge Synapse in artificial neural networks. In our work,
we build on the work presented in [9] by Adhikari et al. on neural networks constructed
using Memristor Bridge Synapse that involves the Random Weight Change algorithm for
training.
Each neuron in the Memristor Bridge Synapse based neural network in [9] is composed
of multiple synapses and one activation unit. The inputs to the neural network are supplied
as voltage values which are weighted and then converted to current by differential
amplifiers. Kirchhoff's Current Law (KCL) is used to sum the currents and produce the
output of a neuron. The differential amplifier along with the active load circuit forms the
activation unit of the neuron. Figure 1.10 shows the Memristor Bridge Synapse connected
to the differential amplifier circuit. Figure 1.11 (a) shows a simple neural network with
two neurons and Figure 1.11 (b) shows the equivalent hardware circuit for the neural
network in Figure 1.11 (a) along with the architecture for the training regime.
Adhikari et al. designed and simulated the differential amplifier and the active load
circuit in HSPICE and developed a look-up table from the results. The Memristor model,
error calculation, random number generation and training pulse application were simu-
lated in MATLAB. They tested the architecture to learn the 3-bit parity problem, a Robot
Figure 1.11: (a) Typical multi-layered neural network inputs in voltage form. (b) Schematic of learning architecture for the equivalent hardware for the neural network in (a) [9].
workspace and face pose identification using neural networks with 3 input x 5 hidden x
1 output, 10 input x 20 hidden x 1 output and 960 input x 10 hidden x 4 output nodes
respectively in MATLAB [9]. Their aim was to show that the Memristor Bridge Synapse
based neural networks trained using the Random Weight Change algorithm could be used
to realize simple, compact and reliable neural networks that are capable of being used for
real-life applications.
In our work, we have used the Memristor Bridge Synapse based neural networks
described in [9] as the base and try to build a complete hardware architecture which
can be implemented on a chip. We have made several modifications to the architecture
presented in [9], but have used the Memristor Bridge Synapse as the primary component
of the system along with the application of the RWC algorithm for training. The RWC
algorithm and the circuit implementation of the Memristor Bridge Synapse are discussed
in detail in the following sections.
1.4 Random Weight Change Algorithm
The Random Weight Change (RWC) algorithm was first described by Hirotsu and Brooke
in 1993. They proposed the algorithm as an alternative to Backpropagation to eliminate
the need for complex calculations while training a neural network. The non-idealities
of analog circuits are another reason why Backpropagation is not preferred for hardware
implementations. They were able to successfully implement and test the algorithm on a
chip with 18 neurons and 100 weights which learned the XOR Gate problem [14].
Figure 1.12: Flowchart for Random Weight Change Algorithm.
The algorithm randomly changes all of the weights by a small increment of -δ or +δ
from their initial state. The training input is then supplied to the network and the output
Figure 1.13: Illustration of energy surface tracing by back-propagation and random weight change algorithm [9].
error is calculated. If the new error has reduced compared to the previous iteration,
the same weight change is done again, until the output error either increases or falls
to within a desired limit. If the output error increases, then the weights are updated
randomly again. The algorithm can be summarized using the following equations from
[14]:
$$w_{ij}(n+1) = w_{ij}(n) + \Delta w_{ij}(n+1) \tag{1.3}$$
where,
$$\Delta w_{ij}(n+1) = \begin{cases} \Delta w_{ij}(n) & \text{if } E(n+1) < E(n) \\ \delta \cdot \mathrm{Rand}(n) & \text{if } E(n+1) \geq E(n) \end{cases}$$
E() is the root mean-squared error at the output, δ is a small constant and Rand(n) =
+1 or -1 randomly. The flowchart in Figure 1.12 illustrates the steps in the Random
Weight Change algorithm.
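The update rule above can be sketched in a few lines of Python. This is only an illustrative software model, not the on-chip implementation: the function name `train_rwc`, its parameters and the generic `error_fn` callback are assumptions introduced here, and the weights are plain floating-point numbers rather than memristor bridge states.

```python
import random


def train_rwc(weights, error_fn, delta=0.01, target_error=0.05,
              max_iters=10000, seed=0):
    """Illustrative Random Weight Change loop following Eq. 1.3.

    weights  : initial list of weights (floats)
    error_fn : hypothetical callback mapping a weight list to the error E
    delta    : magnitude of every weight change
    """
    rng = random.Random(seed)
    w = list(weights)
    prev_error = error_fn(w)
    # Initial random change of +delta or -delta for every weight.
    step = [delta * rng.choice((-1, 1)) for _ in w]
    for _ in range(max_iters):
        if prev_error <= target_error:
            break
        # w(n+1) = w(n) + dw(n+1)
        w = [wi + di for wi, di in zip(w, step)]
        error = error_fn(w)
        if error >= prev_error:
            # Error did not decrease: draw a new random change for next time.
            step = [delta * rng.choice((-1, 1)) for _ in w]
        # Otherwise the same change is reused in the next iteration.
        prev_error = error
    return w, prev_error
```

Note that a change which increases the error is not undone in this sketch; the next random change is simply applied on top of it, which is why the operating point moves up and down on the error surface while descending on average.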
The Random Weight Change algorithm is less efficient when compared to Backpropagation.
Figure 1.13 shows a comparison of the RWC algorithm with Backpropagation. For
Backpropagation, the operating point goes down along the steepest slope of the energy
Figure 1.14: Memristor Bridge Synapse Circuit [10].
curve in the network. For the RWC algorithm, the operating point goes up and down on the
energy curve rather than descending straight along the energy curve. However, RWC’s
operating point statistically descends and finally reaches the correct answer [14].
The RWC algorithm is very effective for analog implementations of artificial neural
networks as it eliminates the need for complex circuitry and is not greatly affected by
circuit non-idealities. Moreover, the algorithm does not require any specific network struc-
ture and can be applied to all feed-forward neural networks. Fully connected feedback
networks may have local minimum problems [14].
1.5 Memristor Bridge Synapse
The Memristor Bridge Synapse is a Wheatstone Bridge like circuit that is composed of
four identical memristors. Figure 1.14 shows the arrangement of the memristors in the
Bridge Synapse. The memristors are arranged such that the polarities of memristors M1
and M4 are the same and opposite to those of M2 and M3. When a positive voltage
is supplied at Vin, M1 and M4 are forward biased, which leads to a decrease in their
resistances. M2 and M3, on the other hand, become reverse biased and their resistances
increase [10]. The outputs of the Bridge Synapse are tapped out at the nodes A and B.
The Bridge Synapse basically acts as two voltage divider circuits. The voltages at the
nodes A and B are given by the simple voltage divider formula:
$$V_A = \frac{M_2}{M_1 + M_2} V_{in} \tag{1.4}$$
$$V_B = \frac{M_4}{M_3 + M_4} V_{in} \tag{1.5}$$
where M1, M2, M3 and M4 are the resistances of the memristors M1, M2, M3 and M4
respectively. The weight of the Memristor Bridge Synapse is the difference between the voltages
VA and VB. Initially, when all the memristors are in the same state, the nodes A and B
will have the same voltage. The synaptic weight of the Bridge Synapse is described by the
following expressions from [10]:
positive synaptic weight if $\frac{M_2}{M_1} > \frac{M_4}{M_3}$
negative synaptic weight if $\frac{M_2}{M_1} < \frac{M_4}{M_3}$
zero synaptic weight if $\frac{M_2}{M_1} = \frac{M_4}{M_3}$
The output of the Bridge Synapse can be modelled by the equation
$$V_{out} = \psi V_{in} \tag{1.6}$$
where ψ is the synaptic weight defined by,
$$\psi = \frac{M_2}{M_1 + M_2} - \frac{M_4}{M_3 + M_4} \tag{1.7}$$
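As a quick numerical check of Equations 1.4 through 1.7, the following Python sketch evaluates the node voltages and the synaptic weight for given memristor resistances. The function names and any resistance values passed to them are illustrative assumptions and are not taken from [10].

```python
def bridge_weight(m1, m2, m3, m4):
    """Synaptic weight psi of a bridge with resistances m1..m4 (Eq. 1.7)."""
    return m2 / (m1 + m2) - m4 / (m3 + m4)


def bridge_outputs(v_in, m1, m2, m3, m4):
    """Return (V_A, V_B, V_out) for an input voltage v_in."""
    v_a = m2 / (m1 + m2) * v_in   # Eq. 1.4
    v_b = m4 / (m3 + m4) * v_in   # Eq. 1.5
    return v_a, v_b, v_a - v_b    # V_out = psi * V_in (Eq. 1.6)
```

With four equal resistances the weight is zero. Lowering M1 and M4 while raising M2 and M3, as a positive pulse at Vin would, gives M2/M1 > M4/M3 and hence a positive weight.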
The Memristor Bridge Neuron is implemented by summing the output signals from
different Bridge Synapses. Differential amplifiers are used to process the weighted inputs
from primary inputs or other neurons. The implementation of the Bridge Neuron is
described in Chapter 2.
1.6 Thesis Statement
Since the physical realization of the memristor by HP labs in 2008, research on
memristors and their applications has been constantly gathering pace. The potential
of memristors in realizing simple and fast neuromorphic circuits is immense. As the
lithographic process for fabricating memristors evolves, architectures and tools for circuit
realization also need to evolve.
The majority of the research on memristor based neural networks has thus far focused on
various algorithms and methodologies for implementation. The Bridge Synapse based
artificial neural network presented in [9] shows a lot of promise for practical implementation
because of the simplicity of its design. In the work presented in [9], the authors focused
on illustrating the simplicity and effectiveness of using the Memristor Bridge Synapse
in tandem with the Random Weight Change algorithm for neural network implementations.
They proposed to use the Memristor Bridge Synapse as the weighting element of
the neural network to which inputs were applied as voltage pulses. At the neuron level,
voltage-to-current conversion was achieved using differential amplifiers to take advantage
of Kirchhoff's Current Law to sum the inputs of the neurons. The differential amplifier
along with the active load circuit forms the activation unit of the neuron.
In [9], the authors tested their design by first simulating the differential amplifier and
the active load circuit in HSPICE and creating a look-up table which was then used
for training the neural network in MATLAB. The neural network circuit was created in
MATLAB using a memristor model. The error calculation and random number generation
were done by MATLAB code and the weight updates were done by changing the resistance
of the memristors in the bridge synapse based on the random numbers. They successfully
trained neural networks for the 3-bit parity problem, for learning a robot workspace and for face
pose identification.
Although [9] proves that neural networks using the Memristor Bridge Synapse for
weighting along with the RWC algorithm for training are a good approach for real-life
applications, a path for an actual realization of a chip was not described. Moreover,
on-chip training requires additional circuitry and timing becomes critical. In our work,
we focus on developing an architecture that can efficiently implement the RWC algorithm
and Memristor Bridge Synapses to create a hardware neural network that can be trained
completely on chip. We have made modifications to the design of the neuron and activation
function in [9], but the training algorithm and weighting methodology remain the
same.
Our architecture is composed of the neural network circuit realized using Memristors
and differential amplifiers. The architecture also incorporates a microcontroller, which
is responsible for measuring and calculating the output error and supplying the random
training signals and timing signals to the neural network. We designed and implemented
circuits to supply the random inputs and apply them to each individual Memristor Bridge
Synapse during training.
We also developed a placement and routing tool to realize the architecture on a
physical layout. The tool takes the number of inputs, hidden layers and outputs as its
input and generates a physical layout with interconnections between neuron blocks on
different layers. Since layout libraries for memristors are not available yet, the placement
and routing tool is only a prototype to illustrate how the architecture would appear on
a layout and to gather an approximation of the area occupied by a specific network.
The majority of the simulations in this work were performed using HSPICE. SPICE-level
simulations are the best available approximations to actual circuit behavior in hardware.
Simulations were performed for individual components of the architecture and complete
neural network circuits. We also developed a simulator in Perl to train a small neural network
in HSPICE. The Perl code mimicked functions of the microcontroller, such as supplying
random inputs and clock signals, by generating PWL inputs to the HSPICE circuit. A
neural network with 2 inputs, 3 hidden layer neurons and 1 output layer neuron successfully
learned the OR-gate function in HSPICE.
The aim of our work was to develop an architecture suitable for implementing mem-
ristor based neural networks on chip. With the core of the neural network implemented in
HSPICE using real components and only minimal functionality simulated using software,
we were able to show that our architecture is well suited to be realized on a chip.
1.7 Thesis Overview
The remainder of this document is organized in the following manner: Chapter 2 discusses
the architecture for implementing neural networks with the Memristor Bridge Synapse.
The Chapter describes the various components of the architecture and their functions. An
overview of the functioning of the neural network and the bit-slice design of the synapse
is also presented in this Chapter.
The placement and routing tool for the architecture layout is described in Chapter
3. This Chapter explains the algorithm and the implementation of the tool and presents
and discusses the output. The Chapter also discusses how the tool is designed to produce
layout for varying number of neurons and neural layers.
Chapter 4 describes the experimental setup and observations, and analyzes the results of
the experiments conducted at different abstractions of the neural network design. All
components of the neural network are simulated both individually and as a full circuit.
The power calculations and estimations for neural network training and normal operation
are also presented in this Chapter. The conclusions drawn from this thesis and future
work are described in Chapter 5.
Chapter 2
The Memristor Neural Network
Architecture
The primary focus of this thesis is to develop an efficient hardware architecture to im-
plement the memristor based artificial neural networks described in [9]. This Chapter
focuses on describing our architecture and the various components of the neural network
system and their functions. The architecture is best explained with the help of examples.
In this thesis, we have used two different neural networks for simulations at different
levels of abstraction. A small neural network that aimed to learn the OR-Gate problem
was used in simulations to verify the functionality of the Memristor Bridge Synapse and
other components and the entire architecture at the SPICE level. A much larger neural
network for face pose identification explained in [9], was simulated using Python to verify
the functioning of the large Memristor Bridge Synapse based neural networks for more
practical applications.
Figure 2.1: Sample input for face pose identification problem [11].
2.1 Architecture Overview
Image recognition is a popular application of artificial neural networks and the memris-
tor bridge synapse based artificial neural networks are efficient in learning functions of
this kind. We illustrate the working of the neural network architecture using the face pose
identification problem discussed in [9].
The sample inputs to the neural network for the face pose identification problem are
shown in Figure 2.1. The images for face recognition are available for download from
CMU [11]. In this problem, the network tries to learn the direction in which the face of
the subject in the image is oriented. There are four face poses that the network aims to
learn: left, right, straight and up, as depicted in Figure 2.1 (a) through (d). The images are
greyscale with 32x30 resolution. Figure 2.2 shows a representation of the neural network
used for this problem.
Figure 2.2: Three layered neural network for face pose identification.
The network has a total of 960 (32*30) inputs, 10 hidden layer neurons and 4 output
neurons. Every neuron in one layer is connected to every neuron in the succeeding layer.
The network consists of a total of 9640 memristor bridge synapses. The circuit produces
an output of [1 -1 -1 -1], [-1 1 -1 -1], [-1 -1 1 -1] and [-1 -1 -1 1] for left, right, straight
and up orientations of the subject’s face. In Figure 2.2, the input layer neurons are
only a representation of the fan-out of the external inputs to multiple memristor bridge
synapses. No function is applied to the inputs at the input layer neurons.
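The synapse count quoted above follows directly from the full connectivity between consecutive layers. A minimal Python sketch of the arithmetic, where the helper name `count_bridge_synapses` is an assumption introduced here for illustration:

```python
def count_bridge_synapses(layer_sizes):
    """Number of bridge synapses in a fully connected feed-forward network.

    layer_sizes lists the number of neurons per layer, input layer first.
    Every neuron in one layer connects to every neuron in the next layer
    through one Memristor Bridge Synapse.
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


# Face pose network: 960 inputs, 10 hidden neurons, 4 outputs.
face_pose_synapses = count_bridge_synapses([960, 10, 4])  # 960*10 + 10*4
```

For the face pose network this evaluates to 9600 + 40 = 9640 bridge synapses.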
Figure 2.3: Memristor based neural network architecture for face pose identification.
For this neural network design, we can see that the hidden and output layers contain
far fewer neurons than the input layer. In this particular example of face
pose identification, the number of input neurons is almost 100 times the number of hidden neurons,
and the number of output neurons is less than half the number of hidden neurons. A chip
for such a neural network can have close to 1,000 pins and the architecture in Figure 2.3
is designed keeping the constraint of connecting the pins to internal signals in mind.
Since all inputs go to all neurons, each neuron block in the middle layer receives the
inputs from a bus. A neuron block consists of as many Memristor Bridges as inputs to the
block (960 Memristor Bridge Synapses in this example) and three operational amplifier
circuits, two for summing and one for difference. The middle layer neuron blocks are
placed close to the periphery of the chip on three sides and the output is drawn out from
the fourth side.
The input layer bus (consisting of 960 wires in the example) is placed around the
middle layer neuron blocks. This way, the inputs from the pins can easily be supplied to
the bus and the bus lines can be conveniently accessed by each neuron block. The middle
layer bus will have as many lines as middle layer neuron blocks (10 in this case). The
output of each neuron in the middle layer is connected to the bus and supplied to output
layer neuron blocks. The outputs of the output layer neuron blocks are connected to the
microcontroller, which reads the values generated by the network, calculates the error
and performs training. The outputs can also be tapped out through other pins on the
chip.
2.2 Architecture Components
With respect to the description of the architecture in Figure 2.3, the components of the
neural network can be categorized as internal and external to the neuron block. The
components that are external to the neuron blocks are the connection buses and the
microcontroller. We first describe the components internal to the neuron block and then
move onto the components external to it.
We describe the internal components of the neuron block with the help of a simple
neural network. Figure 2.4 shows an artificial neural network with two input layer neu-
rons, two hidden layer neurons and one output layer neuron. The aim of this neural
network is to learn the OR-Gate function. The training inputs are applied through the
nodes IN1 and IN2. There are a total of five neurons, N1 through N5 and six memristor
bridges, BR1 through BR6 in this network. Each neuron in one layer is connected to
every neuron in the succeeding layer. The neurons N1 and N2 are only representations
and do not apply any function on the inputs. The applied inputs fan out from N1 and
N2 neurons to different bridges. For example, the input supplied at IN1 fans out to
bridges BR1 and BR2. Each memristor bridge produces two output components, the
VA component and the VB component. These components are represented by the two
lines that originate from each bridge synapse and go into the neuron where summing
logic is implemented.
Figure 2.4: Simple three-layered neural network.
2.2.1 Neuron Block
2.2.1.1 Memristor Bridge Synapse
The Memristor Bridge Synapse is the primary component of the neural network and
takes up the most area on the chip. Each memristor is about 50 nm x 50 nm in size. A single
memristor bridge requires about 200 nm x 200 nm of area after including the routing
between the memristors. The biggest network simulated in this work has almost 10,000
bridge synapses.
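A rough estimate of the area taken by the bridges alone can be obtained from the per-bridge figure above. The sketch below is a back-of-the-envelope calculation: the helper name is hypothetical, and the estimate ignores the amplifiers, flip-flops, buses and any spacing between bridges.

```python
def bridge_area_um2(n_bridges, bridge_side_nm=200.0):
    """Total area in square micrometres of n_bridges square bridge cells."""
    side_um = bridge_side_nm / 1000.0   # convert nm to um
    return n_bridges * side_um * side_um
```

For the 9640 bridges of the face pose network this gives about 385.6 µm².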
Figure 2.5 shows the design of a Memristor bridge synapse. The input to the memristor
bridge is applied from one end (node IN) and the other end is tied to ground. As discussed in
Chapter 1, the two memristors connected on either side of node A are arranged such that
one of them will be forward biased and the other reverse biased when a voltage is applied
at node IN. The same logic applies to the memristors connected on either side of node
B, the only difference being that their orientation with respect to node IN is opposite
to that of the other two memristors on either side of node A in the bridge. The nature
of this arrangement ensures that the voltage drop at either node A or node B will be
greater than the other. Because of this arrangement, when the voltage drop at one node
Figure 2.5: Memristor Bridge Synapse design.
increases, the drop at the other node decreases. It also ensures that the total resistance of
the memristor bridge is a constant and brings a symmetry to the weight supplied by the
bridge.
The weight supplied by the bridge synapse is the difference between the node voltages
(VA − VB). The weight is changed by supplying either a positive or negative voltage at
IN. For the bridge in Figure 2.5, a positive voltage pulse at IN will result in a decrease
in the resistances of the memristors M1 and M4, and an increase in the resistances of M2
and M3. Consequently, the voltage drop at A will increase and the voltage drop at B will
decrease, as explained using equations 1.4 and 1.5. Conversely, if a negative voltage
pulse is applied at IN, the voltage drop at A will decrease and that at B will increase.
The weight supplied will either be positive or negative depending on whether VA or VB
is greater.
It is interesting to note that both the evaluation and training pulses are applied through
the same node to the memristor bridge. This raises the question of how the evaluation
input affects the resistance of the bridge and, in turn, its weight, since both pulses
are applied at the same node. From experiments conducted, we observed that if
the pulse width of the input is within 1 ms, it does not bring any notable change to the
resistance of a memristor. Moreover, to ensure that the evaluation pulse does not alter
the resistance of the bridge synapse, the evaluation pulse is supplied as a complement,
e.g. if an input to one of the inputs of the neural network is +1V, a -1V pulse is applied for
the same duration as the +1V input during evaluation to reverse any change to the resistance
caused by the input pulse. To change the resistance of a memristor by 40 Ω, a pulse of
width 250 µs was required. The experiments and observations are described in detail in
Chapter 4.
All connections between neurons in the network are established using the memristor
bridge synapse. While training is in progress, a training pulse is applied to each memristor
bridge based on the random number that was generated for it. The circuitry for applying random
pulses will be discussed in a later section.
The activation function at the neuron is implemented by the operational amplifiers
using a summing logic. The neuron receives its input from the various bridges that are
connected to it. Each bridge supplies two voltage components (VA and VB). The neuron
first sums these two components individually and then evaluates the difference between
the two sums.
Figure 2.6: Summing logic for neuron N3 from Figure 2.4.
Figure 2.6 shows the summing logic implementation of neuron N3 from the circuit in
Figure 2.4. Each bridge synapse has two output components, the VA component (voltage
from node A) and VB component (voltage from node B). At the neuron, the individual
VA and VB components are summed together first. After the summing is complete, the
difference of these individual summed values is evaluated. This evaluated voltage value
will be the output of the neuron. VASUM and VBSUM in Figure 2.6 are evaluated by
summing the VA components and VB components from memristor bridges BR1 and BR3.
After the summation, the difference is evaluated by subtracting VBSUM from VASUM .
The difference gives the output N3OUT of neuron N3.
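The summing logic described above amounts to two sums followed by a subtraction. The following idealized Python sketch mirrors that behavior; the function name and the list-of-pairs input format are assumptions made for illustration, and amplifier saturation is deliberately ignored here.

```python
def neuron_output(bridge_voltages):
    """Ideal neuron output from a list of (V_A, V_B) pairs, one per bridge.

    The VA components and VB components are summed separately, and the
    output is the difference VASUM - VBSUM.
    """
    va_sum = sum(v_a for v_a, _ in bridge_voltages)
    vb_sum = sum(v_b for _, v_b in bridge_voltages)
    return va_sum - vb_sum
```

For neuron N3, the list would hold the node voltages of bridges BR1 and BR3, and the returned value corresponds to N3OUT.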
Both the summing and difference logic are implemented with the help of operational
amplifiers. Each neuron contains three operational amplifiers, one for each of the two summing
circuits and one for the difference circuit. The implementations of the summing and difference
circuits are explained in the following sections.
2.2.1.2 Summing Amplifier
The summing operation is implemented by using a voltage average circuit along with
an operational amplifier as depicted in Figure 2.7. Note that the resistors used along
with the amplifier circuits are normal resistors and not memristors. The memristors
are used only to design the bridge synapses. The voltage averaging is accomplished by
Figure 2.7: Summing circuit using voltage average and operational amplifier circuits.
connecting the input voltages to resistors of resistance R. The other ends of these resistors
are connected to the same node. For example, in Figure 2.7, the voltages VA from BR1
and BR3 are connected to two resistors of resistance R. Now, the voltage at node S1
will be the average of the two input voltages. To get the sum from the averaged voltage,
the voltage at node S1 needs to be multiplied by the total number of inputs to the
summing circuit. This is accomplished by adjusting the gain of the operational amplifier.
In the circuit in Figure 2.7, the operational amplifier is in the non-inverting configuration,
whose gain is determined by the two resistors R1 and R2.
The gain for this particular amplifier is two, since there are two inputs to the summing
circuit. VASUM will automatically be generated after the circuit receives the input
voltages. For a neuron which has n inputs, the operational amplifier will be configured
to have a gain of n. The gain of the amplifier, once fixed, does not have to be altered
during the operation of the neural network. An important point to note here is that the
output, VASUM, of the summing circuit is limited by the supply voltage to the operational
amplifier. In the case of the circuit in Figure 2.7, the output voltage will be within -1V
to +1V.
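The behavior of the summing circuit can be approximated in software as an average, a fixed gain equal to the number of inputs, and a limit at the supply rails. The sketch below is a behavioral model only: the function name is hypothetical, and hard clipping at the rails is a simplifying assumption that does not capture the real saturation characteristic of the amplifier.

```python
def summing_amplifier(inputs, v_supply=1.0):
    """Behavioral model of the voltage-average plus gain-of-n summing circuit."""
    n = len(inputs)
    v_avg = sum(inputs) / n      # node S1: equal resistors R average the inputs
    v_out = n * v_avg            # non-inverting amplifier configured for gain n
    # The output cannot exceed the supply rails (assumed ideal hard clipping).
    return max(-v_supply, min(v_supply, v_out))
```

With two inputs of 0.25 V and 0.5 V the model returns 0.75 V, while two inputs of 0.75 V each are limited to the +1V rail.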
2.2.1.3 Difference Amplifier
Figure 2.8: Difference circuit using differential amplifier.
The implementation of the difference amplifier is much more straightforward. The
differential amplifier circuit used for this operation is shown in Figure 2.8. It is configured
with a gain of 1 and the input VASUM is supplied to the non-inverting input and VBSUM
is supplied to the inverting input. All the resistors in the circuit have the same value.
The circuit essentially does the operation N3OUT = VASUM - VBSUM .
2.2.2 Microcontroller
The microcontroller is one of the key components of the architecture. It is responsible
for implementing the training algorithm by supplying all the necessary signals to the
memristor bridge synapses and the neurons. The microcontroller additionally generates
the random numbers required to train the weights of the bridge synapses.
2.2.2.1 Signals Generated by the Microcontroller
The microcontroller contains the logic for generating the control signals that are required
to train and operate the neural network. There are three control signals: update/evaluate,
shift in and clk.
1. update/evaluate: This signal decides whether the neural network is in weight
update or evaluation mode. When update is high (update = 1), the network is in
weight update mode. The microcontroller supplies this signal to activate the weight
update process by enabling the +1V and -1V power rails. When the signal is low,
the network is either evaluating its output using the supplied external input, or is
in an idle state. When the network is in idle state, all bridge synapses and neurons
are undriven.
The update signal is also used to isolate the memristor bridges from the operational
amplifiers during the weight update phase. The isolation of the operational ampli-
fiers is very important to ensure that the training pulse on one memristor bridge
synapse is not propagated forward to the next layer. This is done by disabling the
input power rails to the differential amplifier through power gating.
2. shift in: The random numbers for each memristor bridge synapse are supplied using
this signal. Each bridge requires a random signal (either 0 or 1) and this random
number is generated and supplied by the microcontroller. The random numbers are
passed on to a shift register that is connected to the bridge synapses. Each bridge
synapse will have one D flip-flop associated with it to supply the random number
for training. The random numbers are supplied to the shift register through the
shift in line. There may be more than one shift in line depending on the size of the
neural network and the number of shift registers implemented.
3. clk : The clk is the global clock supplied to the entire neural network and is used
for supplying the random numbers to all the flip-flops in the network. This signal
is activated only when the random inputs are supplied to the neural network for
training.
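The interaction of the shift in and clk signals can be illustrated with a small behavioral model of the shift register. Everything named in this sketch (the class, its methods and the helper functions) is hypothetical. The mapping of a stored 1 to a +1V pulse and a stored 0 to a -1V pulse is an assumption made for illustration, based on the +1V and -1V rails enabled during weight update.

```python
import random


class TrainingShiftRegister:
    """Chain of D flip-flops, one per bridge synapse."""

    def __init__(self, n_bridges):
        self.bits = [0] * n_bridges

    def clock(self, shift_in):
        # On each clk edge the new bit enters the first flip-flop and every
        # other flip-flop takes the value previously held by its predecessor.
        self.bits = [shift_in] + self.bits[:-1]


def load_random_bits(register, rng):
    """Serially shift one random bit per bridge into the register."""
    for _ in range(len(register.bits)):
        register.clock(rng.randint(0, 1))
    return register.bits


def training_pulse_voltages(bits, v_pulse=1.0):
    """Assumed mapping: bit 1 selects +v_pulse, bit 0 selects -v_pulse."""
    return [v_pulse if b == 1 else -v_pulse for b in bits]
```

Loading a register for n bridges takes n clock cycles in this model, which is why more than one shift in line may be useful for large networks.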
2.2.2.2 Functions of the Microcontroller
The microcontroller is the central component of the architecture. It is responsible for
supplying the input, training signals, updating weights and evaluating the output of the
network. It ensures that all the components in the network function in a synchronized
manner.
Figure 2.9: Neuron N3 inputs and output for the neural network in Figure 2.4.
1. Synchronizing Input Application:
The microcontroller enables and disables the application of external inputs to the
neural network. The external inputs are supplied to the network for only a very
short period of time (<1ms) during the evaluation phase. The inputs should be
disabled while the training pulses are applied to the memristor bridge synapses.
Otherwise, this will lead to two strong signals driving one single node and may
even damage the circuit. The microcontroller ensures that multiple inputs will not
drive a memristor bridge synapse at any given time using the signals it generates.
2. Disabling Operational Amplifiers:
Figure 2.9 shows two input bridges BR1 and BR3 to the neuron N3 and its output
connected bridge BR5. The external input as well as the weight update pulses are
supplied to BR1 and BR3 through the nodes IN1 and IN2 respectively. The weight
update pulse for BR5 is supplied through node N3OUT. During the weight update
process, the voltage pulse supplied to BR1 and BR5 will propagate all the way to
the differential amplifiers and get amplified to generate a voltage value at N3OUT.
To avoid this scenario, the differential amplifiers are turned off while the training
pulses are supplied. Since the input and output of the differential amplifiers are
electrically isolated, the bridges themselves will remain electrically isolated. This
is ensured by gating the power rails of the operational amplifiers with the evaluate
signal generated by the microcontroller.
3. Generating Random Numbers:
The microcontroller is responsible for generating the random numbers that are
required for updating the weights of the memristor bridges during training. Each
memristor bridge requires a uniquely generated binary random number to decide
the direction of its weight update. If the random number associated with a bridge
synapse is 1, its weight is increased. If the random number is 0, then the bridge’s
weight is decreased. Each memristor bridge synapse has a D flip-flop that stores its
training input. A neural network may have more than ten thousand bridge synapses,
each with its own D flip-flop, which makes it impractical to supply so many random
numbers in parallel. All the flip-flops are therefore connected as a shift register
and their inputs are supplied serially.
4. Error Calculation and Processing:
Every iteration in training generates an output. The microcontroller reads this
output and compares it with an expected output already stored in its memory to
generate a mean-squared error value. This error is compared against a predefined,
stored threshold. If the error falls within the threshold, the training process
comes to an end. If the error is above the threshold, it is compared with the
error generated during the previous iteration. If the new error is less than the
old error, the flip-flop values are not updated, and the weight update is done in
the same direction for each bridge as in the previous iteration. If the new error
is greater than the old error, the microcontroller generates new random values
and sends them through the shift register to all memristor bridges before the
weight updates are done.
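The decision made by the microcontroller after each iteration can be sketched in Python as follows. This is only an illustrative sketch, not the firmware used in this work: the function names are hypothetical, and treating a tie between the new and old error as "no improvement" is an assumption, since the text only covers the strictly smaller and strictly greater cases.

```python
import random


def mean_squared_error(outputs, targets):
    """Mean of squared differences between measured and expected outputs."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def next_directions(new_error, old_error, threshold, directions, rng=random):
    """Decide what happens after one training iteration.

    Returns (done, directions). `directions` holds one bit per bridge synapse:
    1 means increase the weight, 0 means decrease it.
    """
    if new_error <= threshold:
        return True, directions      # error within threshold: stop training
    if new_error < old_error:
        return False, directions     # improvement: keep the same flip-flop values
    # No improvement (a tie is treated as no improvement, an assumption):
    # draw a fresh random bit for every bridge synapse.
    return False, [rng.randint(0, 1) for _ in directions]
```

In the hardware, the returned list corresponds to the bits that are shifted serially into the D flip-flops before the next weight update.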
2.2.3 Shift Register
A shift register is a cascade of flip-flops that share the same clock. A shift
register is used in the neural network to supply the random inputs necessary for the
weight update of the memristor bridge synapses. Each bridge synapse has a flip-flop
associated with it, which holds a 1 or 0 to dictate whether a positive or negative pulse
should be applied to the memristor bridge for weight update. A positive pulse increases
the bridge's weight while a negative pulse decreases it.
There will be as many flip-flops as there are memristor bridge synapses.
The number of shift registers may vary depending on the design. If there is a large
number of memristor bridges, it may be suitable to have multiple shift registers to which
the random numbers can be supplied. For example, if there are 10,000 flip-flops in the
neural network, using two shift registers of 5,000 flip-flops each, with the inputs to
the two shift registers supplied in parallel, completes the process in half the time.
However, if there are only 20 bridges, it is not desirable to have two separate shift
registers due to the logic overhead required to supply pulses to two different shift
registers.
Although the flip-flops of the shift register are part of the neuron block, we describe
it as a separate component since the shift register is formed by the interconnection of
flip-flops across multiple neuron blocks.
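The trade-off between the number of shift registers and the loading time can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes that one bit is shifted into every register on each clock cycle and that the flip-flops are divided as evenly as possible.

```python
import math


def shift_cycles(num_flip_flops, num_shift_registers):
    """Clock cycles needed to load all flip-flops when the shift registers
    are filled in parallel (one bit per register per clock cycle)."""
    return math.ceil(num_flip_flops / num_shift_registers)


print(shift_cycles(10_000, 1))  # single chain
print(shift_cycles(10_000, 2))  # two chains loaded in parallel
```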
2.2.4 Connection Buses
The connection buses are used to supply the inputs to each memristor in the neuron
block. The same input needs to be supplied to multiple memristor bridge synapses and
we use a bus to establish this connection. For example, for the neural network in Figure
2.3, each input is supplied to 10 memristor bridge synapses inside 10 different neuron
blocks that are placed on three edges of the chip. The bus of 960 lines is laid around
the 10 neuron blocks and all neuron blocks receive all 960 input signals.
2.3 Memristor Bridge Synapse Bit-Slice
The memristor bridge synapse is the most recurring element in the neural network design.
Each memristor bridge synapse requires additional circuitry to implement the weight
change and evaluation logic. This extra circuitry is composed of a flip-flop to hold the
selection logic for weight change and an additional multiplexer to select the input power
rails. Since these components need to be implemented alongside all memristor bridge
synapses, it is logical to create a bit-slice design with the bridge and the components that
go along with it. The flip-flops in different bit-slices will be connected together to form
a shift register.
Figure 2.10 shows the bit-slice for the memristor bridge synapse. A multiplexer circuit
is implemented in the bit-slice using transistor logic to supply either +1V or -1V to
the input, br in, of the memristor bridge synapse. The D flip-flop, which is part of the
shift register, holds the value that determines which power rail the memristor bridge
synapse is connected to during weight update. Table 2.1 summarizes the logic.
When update = 0, the power rails to the multiplexer are gated, so node br in is either
undriven or driven by bridge input, depending on whether or not the microcontroller has
asserted the training input signal. When update = 1, the power rails to the multiplexer
are active and the weight update pulse is automatically applied.
Figure 2.10: Memristor Bridge Synapse Bit-Slice.
Table 2.1: Training input selection logic.
update    shift out    in
0         0            undriven/bridge input
0         1            undriven/bridge input
1         0            -1V
1         1            +1V
Combining these components into a bit-slice design makes creating the layout for the
architecture much more efficient. The neuron blocks will be composed of an array of
these bit-slices and the differential amplifier circuits.
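The selection logic of Table 2.1 can be written as a small Python function. This is a behavioral sketch only, with a hypothetical function name; it models which source drives the bridge input node, not the transistor-level multiplexer.

```python
def br_in_drive(update, shift_out):
    """Return what drives the bridge input node for the given control bits.

    update    : 1 while the weight update pulse is applied, else 0
    shift_out : bit held by the bit-slice's D flip-flop
    """
    if update == 0:
        # Multiplexer power rails are gated; only the bridge input can drive the node.
        return "undriven/bridge input"
    return "+1V" if shift_out == 1 else "-1V"


for update in (0, 1):
    for shift_out in (0, 1):
        print(update, shift_out, br_in_drive(update, shift_out))
```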
2.4 Architecture in a nutshell
The scalable hardware architecture is composed of several layers of neuron blocks. The
number of neuron blocks in each layer defines the number of neurons in the layer. Each
neuron block in a layer receives all inputs from the previous layer or the primary input.
The innermost layer in the architecture is the output layer, whose neuron blocks are
considerably smaller in size compared to the other layers.
Each neuron block contains as many memristor bridge synapse bit-slices as inputs
to it. Each bit-slice is composed of a memristor bridge synapse, a multiplexer circuit, an
inverter and a flip-flop. The flip-flops across multiple bit-slices are connected together to
form a shift register. The neuron blocks also contain three operational amplifier circuits
for summation and difference.
Training of the neural network is synchronized by the microcontroller. It reads the
neural network output to compute the error and generates all control signals and random
bits for training and supplies training input. Training is implemented by the Random
Weight Change algorithm, with the microcontroller comparing the output error at each
iteration with the previous iteration and deciding whether to continue weight change in
the same direction or switch to a new random direction. Training is stopped when the
output error goes below a set threshold or if the iteration limit is reached.
2.5 Summary
In this Chapter, we presented an overview of the proposed architecture for memristor
based artificial neural networks. All the individual building blocks of the architecture
were introduced and how they all piece together to form the complete hardware for an
artificial neural network was explained. In the next Chapter, we introduce a placement
and routing tool that can be used for realizing the hardware architecture presented in
this Chapter.
Chapter 3
Placement and Routing Tool for Memristor Neural Network Architecture
In Chapter 2, we described the hardware architecture for implementing Memristor Bridge
Synapse based artificial neural networks. In this Chapter we describe the placement and
routing tool that can be used to translate the architecture into a physical layout. The
tool is only a prototype and is designed to be modified when memristor layout libraries
become available.
3.1 Tool Overview
The hardware architecture was designed in such a way that neural networks with a large
number of inputs can be easily implemented on a chip. In Chapter 2, we described the
architecture with the help of an example neural network with 960 inputs, 10 hidden layer
neurons and 4 output layer neurons with complete connectivity. This neural network will
be composed of a total of 9,640 Memristor Bridge Synapses, 38,560 memristors, 9,640
flip-flops, 9,640 voltage multiplexers, 9,640 inverters and 42 differential amplifiers.
Realizing such a large circuit with over 65,000 components requires efficient
floorplanning and routing. The tool also needs to be adaptable to cases where the number
of hidden layers is more than one, which is the primary advantage of the hardware
architecture.
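The component counts quoted above follow directly from the network dimensions. The Python sketch below reproduces them for a fully connected network; the function name is hypothetical, and the per-unit counts (four memristors per bridge, one flip-flop, multiplexer and inverter per bit-slice, three amplifiers per neuron block) are taken from the description in Chapter 2.

```python
def component_counts(num_inputs, hidden_layers, num_outputs):
    """Count components for a fully connected network.

    hidden_layers is a list with the number of neurons in each hidden layer.
    """
    sizes = [num_inputs] + list(hidden_layers) + [num_outputs]
    bridges = sum(a * b for a, b in zip(sizes, sizes[1:]))  # one bridge per connection
    neurons = sum(sizes[1:])                                # one neuron block per neuron
    counts = {
        "bridges": bridges,
        "memristors": 4 * bridges,
        "flip_flops": bridges,
        "multiplexers": bridges,
        "inverters": bridges,
        "differential_amplifiers": 3 * neurons,
    }
    # Bridges are made of the memristors, so they are not counted twice.
    counts["total"] = sum(v for k, v in counts.items() if k != "bridges")
    return counts


print(component_counts(960, [10], 4))
```

For the 960-input example this gives 9,640 bridges, 38,560 memristors, 42 differential amplifiers and 67,522 components in total, consistent with the figure of over 65,000 stated above.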
We have implemented a tool in C++ to generate a layout capable of being loaded
into the Magic [21] layout editor. Magic is a free and open source VLSI layout tool
initially developed at UC Berkeley. Our placement and routing tool generates a .mag file
with layout information stored as coordinates of a two dimensional grid space that can be
read directly by the Magic layout editor. The placement and routing for the tool is done
at a level of abstraction where we illustrate only neuron blocks being placed and routed.
The neuron blocks consist of Memristor Bridge Synapse bit-slices and differential
amplifiers.
As mentioned earlier, placement and routing in Magic is based on two dimensional
grids, where the dimension of each grid is the feature size of the technology used. The
blocks and the connection wires are printed on the two dimensional grid space by
specifying what material should be printed on each grid. Once the placement and routing
is complete, the grid coordinates and the material each grid holds are written out as
text to a .mag file. In most placement and routing tools, the grid space is stored in a
data structure so that each entry in the data structure pertaining to a coordinate can
hold a specific value to indicate what material each grid space holds. Once the placement
and routing is complete, such a tool prints out the coordinates and the content of each
grid to the .mag file.
For our neural network architecture, the total grid space required for placing and
routing the neural network for face pose identification was 7680 x 11520. If a data
structure were to be created to store each grid on the grid space, memory would have to
be allocated to store a total of 88,473,600 grid values.
In the case of most VLSI circuits, placement and routing is accomplished by implementing
already established algorithms. These tools also take a certain amount of runtime to
create and print out a layout to a .mag file, depending on the size of the circuit and
the routing complexity. Our architecture, however, is basically multiple instantiations
of the same components arranged in a specific, structured way. We place and route neuron
blocks on three sides of concentric squares. By taking advantage of this symmetry in the
design, we mathematically calculate the grids on which different materials fall and
directly write the data to the .mag file. This way, we avoid the overhead of creating a
large data structure to store the contents of each grid, which in turn brings a
significant reduction in runtime. Analysis of the tool performance is described in a
later Section in this Chapter.
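The idea of computing rectangle coordinates arithmetically and writing them straight to the layout file, without an intermediate grid array, can be sketched as follows. This is a Python illustration rather than the C++ tool itself: the function name, the single-row arrangement, the technology name `scmos`, the layer name `pdiffusion` and the placeholder timestamp are all assumptions made for the example.

```python
def blocks_to_mag(num_blocks, block_size, offset, tech="scmos"):
    """Return the text of a .mag file holding a row of square blocks.

    Each block is emitted as one 'rect' line whose corners are computed
    from the block index, so no per-grid data structure is needed.
    """
    lines = ["magic", f"tech {tech}", "timestamp 0", "<< pdiffusion >>"]
    for i in range(num_blocks):
        x1 = i * (block_size + offset)           # bottom left x of block i
        lines.append(f"rect {x1} 0 {x1 + block_size} {block_size}")
    lines.append("<< end >>")
    return "\n".join(lines) + "\n"


print(blocks_to_mag(num_blocks=3, block_size=20, offset=5))
```

The memory used by this approach grows with the number of rectangles rather than with the number of grid cells, which is the property exploited by the tool.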
In the following sections, we describe the tool flow and the algorithm, layout com-
ponents, analysis of the output and how the tool produces the layout for networks with
different number of hidden layer neurons. We also discuss the tool run-time analysis and
the area consumed by the layout.
3.2 Tool Flow
The placement and routing tool takes the number of inputs, number of hidden layers
and hidden layer neurons and the number of output layer neurons as its input. From the
description of the architecture in Chapter 2, we can see that neuron blocks are placed on
three sides of a square shaped chip and the outputs are drawn out to the fourth side. The
input layer neurons are placed close to the periphery of the chip and the output layer
neurons at its center. The hidden layers will be placed between the input and output
layer neurons.
The tool only places and routes the neuron blocks discussed in Chapter 2. Each
neuron block will contain three operational amplifier circuits and as many Memristor
Bridge Synapse bit-slices as inputs to the layer in which the block is present.
The tool starts by first placing the output layer neuron blocks on three sides of the
innermost area of the chip. As discussed in the previous section, placement and routing
in Magic is done on a two dimensional grid. Materials are printed on the grid space by
specifying the bottom left and top right corners of a rectangle and the type of material
occupying this rectangular area. The number of blocks to be placed at the top, bottom
and left sides is derived by the following simple formulae:
\[ \text{No. of blocks on top side} = \frac{\text{Total no. of blocks}}{3} \]
\[ \text{No. of blocks on left side} = \frac{\text{Total no. of blocks}}{3} + (\text{Total no. of blocks}) \,\%\, 3 \]
\[ \text{No. of blocks on bottom side} = \frac{\text{Total no. of blocks}}{3} \]
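As a quick check, the formulae can be evaluated in Python. The sketch assumes that the division is an integer division (as the use of the % operator suggests), so that the three counts add up to the total; the function name is hypothetical.

```python
def blocks_per_side(total_blocks):
    """Split the neuron blocks of one layer over the top, left and bottom sides."""
    base, remainder = divmod(total_blocks, 3)   # integer division assumed
    return {"top": base, "left": base + remainder, "bottom": base}


print(blocks_per_side(10))  # the 10-block output layer of Figure 3.1
```

For 10 blocks this places 3 blocks on the top, 4 on the left and 3 on the bottom.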
Figure 3.1: Placement of 10 blocks of output layer on layout represented with p-diffusion.
For simplicity, we abstract the components inside the neuron block and represent
each neuron block with p-diffusion on the Magic layout editor. The size of each block
is estimated from the number of inputs to the block. For example, if a particular layer
receives n inputs from either the previous layer or the primary input to the network, the
size of each block is set to be 2n x 2n grids. The blocks are separated with an offset
value of 5 grid spaces. The separation of blocks with an offset is only done for
illustrating the placement and routing of blocks. For an actual implementation, the
blocks can be placed next to each other. Figure 3.1 shows 10 neuron blocks of dimension
20 x 20 grids, where each block receives 10 inputs from either the previous layer or the
primary input.
Figure 3.2: After routing of input bus for placed blocks in Figure 3.1.
After placing the blocks, the tool creates a bus on the outside of the layer of neuron
blocks that spans all neuron blocks in the layer. Since each neuron block requires all
inputs from the previous layer, the bus is used to supply the inputs. The inputs to the
bus come from either the output of the previous layer or the primary input. Figure 3.2
shows the updated layout in Figure 3.1 after input bus routing.
Once routing of the input bus is complete, the tool routes the output of the neuron
layer. If the layer under construction is the output layer, then the wires are drawn to
connect the output to the pins. For hidden layers under construction, the output of the
layer will be connected to the input bus of the next layer. Note that first the output layer
is routed followed by the hidden layers and finally the input layer.
After construction of one layer is complete, the tool picks up the next higher layer in
the network and repeats the same process. On finishing construction of all layers of the
neural network, the tool routes the control signals and power rails through the blocks.
Figure 3.3 shows a completely routed 3-layered neural network with 30 inputs, 10 hidden
layer neurons and 10 output layer neurons.
Figure 3.3: Completed placement and routing for neural network with 30 inputs, 10 hidden layer neurons and 10 output layer neurons.
The flowchart in Figure 3.4 summarizes the tool flow for placement and routing. In the
next section, we analyze the output of the layout tool for a few example neural networks.
3.3 Output and Performance Analysis
In the previous section, we presented the flow for our placement and routing tool and
showed the final layout for a neural network with 30 inputs, 10 hidden layer neurons and
10 output layer neurons. In this section, we discuss the output produced for neural
networks of different sizes and analyze the area occupied by the layout and the
efficiency. We also briefly discuss the runtime for generating the layout.
Figure 3.4: Flowchart showing the tool flow for placement and routing.
3.3.1 Area Analysis
To analyze the area occupied by the neural networks on layout, we present layout for
three different neural networks. We discuss the total area occupied by the layout and
also show what percentage of the total area is used for placement and routing for different
sized neural networks.
The largest neural network we created using the tool was for the face pose identification
problem, which received 960 inputs and was composed of 10 hidden layer neurons
and 4 output layer neurons. Figure 3.5 shows the full layout for this neural network. The
network occupies a total grid dimension of 9642 x 15393 grids. It is clearly visible from
the figure that a large percentage of the total area is left unused. The reason for this
is the nature of the neural network implemented. Due to the large number of inputs to
the network, each neuron in the hidden layer receives 960 inputs, which means that there
are 960 Memristor Bridge Synapses associated with each neuron. In comparison, each
neuron in the output layer receives only 10 inputs from the previous layer, which makes
the neuron blocks in the output layer very small. Figure 3.6 shows the output layer of
the neural network after zooming into the center of the layout.
Figure 3.5: Layout for face pose identification neural network with 960 inputs, 10 hidden layer neurons and 4 output layer neurons.
Figure 3.6: Output layer layout for face pose identification neural network.
In Figure 3.7, we show the output layout of a neural network with 80 inputs, 12 hidden
layer neurons and 15 output layer neurons. Table 3.1 compares the total area of layout
for different neural networks for different technology nodes and Table 3.2 shows
the total unused area in the layouts for the neural networks in Table 3.1.
Table 3.1: Comparison of total layout area for neural networks for different technology nodes.
Neural Network Description                                              Area (mm^2) for 2λ =
Inputs  Hidden layer neurons  Output layer neurons  Grid Dimensions  45 nm         32 nm         22 nm
960     10                    4                     9642x15396       0.075         0.038         0.018
80      12                    15                    1007x1313        6.60 × 10^-4  3.38 × 10^-4  1.60 × 10^-4
30      10                    10                    342x513          8.88 × 10^-5  4.49 × 10^-5  2.12 × 10^-5
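The conversion from grid dimensions to physical area can be sketched in Python. The relationship used here, one grid unit equal to λ (half of the quoted 2λ value), is an assumption inferred from the tabulated numbers rather than something stated explicitly in the text; the function name is hypothetical.

```python
def layout_area_mm2(width_grids, height_grids, two_lambda_nm):
    """Layout area in mm^2, assuming each grid unit is lambda on a side."""
    grid_mm = (two_lambda_nm / 2.0) * 1e-6      # nm -> mm
    return width_grids * height_grids * grid_mm ** 2


for node in (45, 32, 22):
    print(node, layout_area_mm2(9642, 15396, node))
```

Under this assumption the sketch closely reproduces the entries of Table 3.1, to within rounding for most of them.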
Figure 3.7: Output layer layout for neural network with 80 inputs, 12 hidden layer neurons and 15 output layer neurons.
Table 3.2: Fraction of unused area in layout for different neural networks.
Inputs  Hidden layer neurons  Output layer neurons  Grid Dimensions  Unused area
960     10                    4                     9642x15396       35%
80      12                    15                    1007x1313        40%
30      10                    10                    342x513          41%
From Table 3.2 we can see that a significant area of the layout is unused. This area can
be used to implement other logic that is required for the operation of the neural network.
For example, generation of random numbers can be accomplished by implementing a
linear feedback shift register (LFSR) within the circuit. This way, the amount of time
required to shift in the random numbers to the shift register can be significantly reduced.
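To illustrate how an LFSR produces a pseudo-random bit stream, a software model of a 16-bit Fibonacci LFSR is sketched below. The register width, the tap positions (16, 14, 13, 11) and the seed are illustrative choices and are not taken from this work; an on-chip implementation would select these to suit the number of bridge synapses.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one clock."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def random_bits(count, seed=0xACE1):
    """Return `count` pseudo-random bits; the seed must be non-zero."""
    if seed & 0xFFFF == 0:
        raise ValueError("an all-zero state never changes")
    state, bits = seed & 0xFFFF, []
    for _ in range(count):
        bits.append(state & 1)       # bit shifted out on this clock
        state = lfsr16_step(state)
    return bits


print(random_bits(16))
```

With these taps the register cycles through all 65,535 non-zero states before repeating.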
3.3.2 Runtime Performance
Since we take advantage of the symmetry of the architecture and the multiple
instantiations of the same components, the layout is created using mathematical and
geometric calculations that require little memory access and processing, and the runtime
for creating the layout is very low. The largest layout we created using the tool was the
neural network for face pose identification, which occupied 9642 x 15393 grid vectors.
The runtime required to create this network on a PC with an Intel Core i3-370M processor
at 2.40 GHz clock speed was less than 0.2s. Since the runtime for the tool is very small,
we are not reporting the runtime for any of the other neural networks that we created.
3.4 Scalability
Figure 3.8: Neural network with 80 inputs and 15 output layer neurons having two hidden layers with 30 neurons in the first hidden layer and 25 neurons in the second hidden layer.
We show the scalability of the tool with the example of one neural network. This neural
network has 2 hidden layers with 30 neurons in the first hidden layer and 25 neurons in
the second hidden layer. The network has 80 inputs and 15 output layer neurons. Figure
3.8 shows the neural network layout. This neural network occupies 1974x2303 grids on
layout. When scaling the architecture to incorporate more hidden layers in the
neural network, the number of neurons in each layer needs to be carefully planned. A
reasonable ratio needs to be maintained between the number of inputs to a layer, the
number of neurons in the layer and the number of neurons in the succeeding layer. It
should be noted that the number of primary inputs to the neural network needs to be
greater than the number of neurons in the first layer.
3.5 Summary
In this Chapter, we introduced and described the placement and routing tool that can be
used to realize the scalable hardware architecture for memristor based neural networks.
We gave an overview of the tool and explained how it was designed. The tool flow was
also described and an analysis of the tool output in terms of the area occupied by different
neural networks was also given. The scalability of the tool was illustrated with the help
of an example of a three-layered neural network. In the next Chapter, we discuss the
experiments, observations and results.
Chapter 4
Experimental Results and Analysis
The simulations in this work were primarily done on SPICE and Python. The basic
components such as the memristor, the memristor bridge synapse, the memristor bit-
slice and a small neural network were simulated using SPICE. Bigger neural networks
were simulated using Python, which mimicked the behavior of the basic components at
a higher level of abstraction. In this Chapter, we describe the observations and results
of the simulations in SPICE and analysis of the results. The simulation results from
Python are not presented here since they do not convey anything different from what
was reported in [9].
We build confidence in the design by first illustrating the proper functionality of the
basic components of the architecture using SPICE. We begin by describing the behavior
of the memristor followed by the memristor bridge synapse. Once these components
are described, we go on to simulate the summing and difference logic using operational
amplifier circuits. After simulating all the individual components, we first perform a
partial simulation to illustrate the training process of the small neural network. This
network will have all the basic components working together in unison. We then go on
to simulate a complete training of a neural network to learn the OR-gate function.
4.1 Memristor Simulation
Biolek et al. developed a mathematical model for the memristor, based on the findings
in [13] by incorporating non-linear dopant drift modeled using window functions [22]. In
the experiments presented here, we have used an ideal model of the memristor in the
simulations.
Figure 4.1: Circuit for Memristor simulation with Memristor M1 (Ron = 116Ω, Roff = 16kΩ) in series with resistor R1 (100Ω) and Voltage source Vin.
Figure 4.1 shows the circuit used for simulating the memristor. In this circuit, memristor
M1 is connected in series with a resistor R1 and a voltage source Vin. The memristor
used in this simulation has on resistance (Ron) of 116Ω and off resistance (Roff) of 16kΩ.
The series resistor R1 is 100Ω.
The circuit was simulated by supplying +1V and -1V DC voltages and the change in
the resistance of the memristor M1 was measured. The resistance of the memristor was
calculated based on the measured instantaneous current and applied voltage.
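The resistance calculation can be sketched in Python as follows. Subtracting the 100Ω series resistor from the total resistance seen by the source is an assumption, although it appears consistent with the current and resistance values tabulated later in this section; the function name is hypothetical.

```python
def memristor_resistance(v_applied, current, r_series=100.0):
    """Memristor resistance from the applied voltage and measured current.

    The source sees the memristor in series with R1, so the series
    resistance is subtracted from V / I (an assumed calculation).
    """
    return v_applied / current - r_series


# Example with an illustrative current of 0.1 mA at 1 V.
print(memristor_resistance(1.0, 1.0e-4))
```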
The memristor in its initial state has a resistance of 1kΩ. A +1V voltage was first
applied to terminal A and the memristor was brought to its ON state (state of least
resistance). Then -1V was applied for a certain period of time and the memristor was
brought to its OFF state (state of most resistance). Figure 4.2 shows the waveform for
this simulation.
Figure 4.2: Memristor simulation with DC voltage +1V and -1V.
It took about 9ms for the +1V pulse to completely turn the memristor ON and make
it reach Ron = 116Ω. From the Ron state, it took the -1V pulse about 20ms to bring the
memristor to near OFF state. From Figure 4.2, we can see that the memristor does
not reach the OFF state completely until a long period of time has elapsed. This is due
to the non-linear nature of the memristor. This behavior is expected since it takes a
pulse of longer duration to turn OFF the memristor when compared to turning it ON.
To observe the resistance change of the memristor more closely, we applied voltage
pulses of different duration and measured the resistance change with a view of finding a
suitable training pulse for the memristor bridge synapses. We performed two simulations
on two instances of the circuit in Figure 4.1. To one of the circuits, a positive pulse was
supplied at node A, and to the other a negative pulse was applied at node A, making
them forward and reverse biased respectively.
In the first simulation, voltage pulses of duration in milliseconds were supplied to
both instances of the circuit. Figure 4.3 shows the simulation waveforms for millisecond
pulses. Pulses of pulse-width 10ms and 5ms were applied to the circuit, the instantaneous
initial and final currents were measured and the resistance of the memristor was
calculated. In the second simulation, pulses with duration in microseconds were applied
and similar calculations were made. Figure 4.4 shows the simulation waveforms for the
pulses of pulse-width 400µs and 250µs. The waveform in Figure 4.4 also illustrates
that applying a negative pulse for the same duration can reverse the effect of the initial
positive pulse and vice versa.
Figure 4.3: Resistance change in the memristor for millisecond input pulse-width.
The results of the simulation are summarized in Table 4.1 and Table 4.2. Table 4.1
shows data for forward biased circuit and Table 4.2 for reverse biased. The instantaneous
initial and final currents were measured for both forward and reverse biased memristors,
and the instantaneous resistance values were calculated. Input voltage of pulse-width
10ms, 5ms, 400µs and 250µs were applied and the resistance changes were measured. The
objective of this experiment was to identify a pulse-width that would bring an optimal
resistance change in the memristor for updating the weight of the memristor bridge
synapse.
Table 4.1: Instantaneous current and resistance measurements for forward biased memristor.
Pulse-width  Iinit (A)     Ifinal (A)    Rinit (Ω)   Rfinal (Ω)  Delta R (Ω)
10ms         1.23 × 10^-4  1.59 × 10^-4  8057.9377   6176.2819   1881.6557
5ms          1.59 × 10^-4  1.93 × 10^-4  6176.2819   5083.4957   1092.7862
400µs        1.23 × 10^-4  1.24 × 10^-4  8057.9377   7989.9604   67.977314
250µs        1.24 × 10^-4  1.24 × 10^-4  7989.9604   7946.9944   42.965912
From the data, we can see that pulses with pulse-width in the millisecond range bring a
very large change in the resistance of the memristors. The pulses in the microsecond
range seem more suitable for weight training since the resistance change is less than
100Ω. This will give a wider range for the weights supplied by the memristor bridge
synapse. A positive 1V pulse of 400µs brings a decrease in the resistance of the
memristor by about 68Ω and a negative 1V pulse of the same duration brings an increase of
nearly the same amount. Simulating the memristor bridge synapse will give a better idea
of how the pulse-width of the training pulse affects the weight of the neural network.
Figure 4.4: Resistance change in the memristor for microsecond input pulse-width.
Table 4.2: Instantaneous current and resistance measurements for reverse biased memristor.
Pulse-width  Iinit (A)     Ifinal (A)    Rinit (Ω)   Rfinal (Ω)  Delta R (Ω)
10ms         1.23 × 10^-4  1.03 × 10^-4  8057.9377   9587.1065   1529.1688
5ms          1.03 × 10^-4  9.67 × 10^-5  9587.1065   10239.123   652.01678
400µs        1.23 × 10^-4  1.22 × 10^-4  8057.9377   8125.7136   67.775907
250µs        1.22 × 10^-4  1.21 × 10^-4  8125.7136   8167.1958   41.482187
4.2 Memristor Bridge Synapse Simulation
To simulate the memristor bridge synapse, we used the arrangement shown in Figure 2.5
and inserted a voltage source at node IN. Figure 4.5 shows the circuit used for simulating
the memristor bridge synapse. As described earlier, the top and bottom memristors
are connected such that one of the two memristors is forward biased and the other reverse
biased. The output voltage of the bridge synapse is tapped from nodes A and B. The
weight supplied by the memristor bridge synapse is adjusted by changing the resistance
of the memristors on the bridge synapse. The difference in the voltage at nodes A and B
(VA - VB) is the output of the bridge synapse.
Figure 4.5: Memristor Bridge Synapse circuit used for simulation.
Figure 4.6: Memristor Bridge Synapse simulation waveform.
The memristor bridge synapse was simulated by applying a 1V pulse of width 400µs.
Before the pulse was applied, all the memristors were brought to their initial state with
all memristors having the same resistance. The waveforms from the simulation are shown
in Figure 4.6. The first pulse applied for 400µs is the update pulse. The second spike
in the waveform is the training input pulse applied for a much shorter duration (5ns).
We can observe that the voltage difference (VA - VB) produced by the memristor bridge
synapse is about 4.26mV after updating the weights. When the weight update pulse is
applied, the resistances of memristors M1 and M4 decrease, while those of M2 and M3
increase. This results in an increased voltage drop at node A and a decreased drop at B.
Figure 4.7: Evaluation pulse applied to Memristor Bridge Synapse.
The evaluation pulse is applied to read out the output of the network and is of much
shorter duration than the training pulse. The 5ns pulse is short enough not to
bring about any notable change in the resistance of any of the memristors in the bridge synapse.
Figure 4.7 shows the magnified part of the evaluation pulse from Figure 4.6. We can see
that the output voltage at neither node A nor node B changed after the evaluation pulse
was supplied. Usually, the evaluation pulse is applied with another voltage pulse of the same
duration but opposite polarity to negate any resistance change that might be incurred
during evaluation.
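A rough back-of-the-envelope check of this claim is sketched below. It assumes that the output shift scales linearly with pulse width (Table 4.3 suggests this holds approximately between 400µs and 4800µs) and that the same scaling extends down to nanosecond pulses, which the simulations here do not verify. All constant names are illustrative.

```python
UPDATE_WIDTH_S = 400e-6      # weight update pulse width from the simulation
UPDATE_SHIFT_V = 4.26e-3     # VA - VB produced by that pulse
EVAL_WIDTH_S = 5e-9          # evaluation pulse width

# Linear-scaling assumption: output shift is proportional to pulse width.
shift_per_second = UPDATE_SHIFT_V / UPDATE_WIDTH_S
eval_shift_v = shift_per_second * EVAL_WIDTH_S

# A complementary pulse of equal width and opposite polarity cancels the
# net applied volt-seconds, which is the quantity this linear model tracks.
net_volt_seconds = (+1.0) * EVAL_WIDTH_S + (-1.0) * EVAL_WIDTH_S

print(f"estimated shift from one evaluation pulse: {eval_shift_v:.3e} V")
print(f"net volt-seconds of complementary pair: {net_volt_seconds}")
```

Under these assumptions a single evaluation pulse would move the output by tens of nanovolts, several orders of magnitude below the 4.26mV update step.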
Table 4.3 shows the voltage difference generated for training pulses of different pulse-widths.
In each of the simulations, all the memristors in the memristor bridge synapse
were at the same initial state (R=8050Ω) before the training pulse was applied. A voltage
pulse of 1V amplitude was applied to node IN as the training input.
Table 4.3: Weight change for different training signal pulse-widths for memristor bridge synapse
Pulse-width (µs)   VA (V)    VB (V)    VA − VB (V)
400                0.50213   0.49787   0.00426
800                0.50426   0.49574   0.00852
1200               0.50638   0.49362   0.01276
2400               0.51277   0.48723   0.02554
4800               0.52499   0.47501   0.04998
The memristors used in our experiments have an ON resistance of 116Ω and an OFF resistance
of 16kΩ. In terms of magnitude, the minimum and maximum output voltages the
memristor bridge synapse can produce are 0V and 0.9928V, assuming that the maximum
input voltage is 1V. This means that if a 400µs pulse of 1V magnitude is used as the training
pulse, each bridge synapse will have about 466 possible weights. If a 4800µs pulse
of 1V magnitude is used, there would be around 40 possible weights for each memristor
bridge synapse. The length of the training pulse should be chosen based on the requirements
of the function the neural network is attempting to approximate. For certain problems, it would
be effective to have a larger number of available weights, while for others a smaller number
could be more efficient.
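The weight counts quoted above appear to follow from dividing the output range by the per-pulse step listed in Table 4.3. The sketch below reproduces them under two assumptions of ours, since the calculation is not spelled out: the range is taken to span both polarities (−0.9928V to +0.9928V), and every pulse moves the output by the same step. The helper name `weight_levels` is hypothetical.

```python
V_MAX = 0.9928  # maximum output magnitude quoted in the text (V)

# Per-pulse output step (VA - VB) from Table 4.3, keyed by pulse width in µs.
STEP_BY_PULSE_WIDTH_US = {400: 0.00426, 4800: 0.04998}


def weight_levels(step_v, v_max=V_MAX):
    """Approximate number of distinct weights across -v_max..+v_max."""
    return round(2 * v_max / step_v)


for width_us, step_v in STEP_BY_PULSE_WIDTH_US.items():
    print(f"{width_us} µs pulse: about {weight_levels(step_v)} weights")
```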
4.3 Memristor Bridge Synapse Bit-Slice Simulation
The Memristor Bridge Synapse Bit-Slice was simulated to test its functionality. The
bit-slice design, as described in Figure 2.10, is composed of the memristor bridge synapse,
a flip-flop and a multiplexer circuit to choose between a +1V and a -1V training pulse. The
output of the flip-flop controls whether the multiplexer supplies +1V or -1V to the
input of the memristor bridge synapse during training.
A simple simulation was done to verify the functionality of the memristor bridge
synapse bit-slice. The experiment results are shown in the waveform in Figure 4.8. The
update signal is used to control the weight update and training input application. When
update is low, the +1V and -1V rails used for weight update are gated and driven to GND.
This ensures that there will be no leakage current through the multiplexer circuit that
would otherwise alter the weight of the bridge. When update is high, the power rails are
active and the weight update process is activated. The weight-update pulse is applied
to the memristor bridge synapse for as long as the update signal is high. During the weight
update phase, the training input is not applied.
Figure 4.8: Memristor Bridge Synapse Bit-Slice simulation waveform.
In the simulation waveform in Figure 4.8, the update signal is initially kept low. A
high signal is produced at the input of the flip-flop and a clock signal is applied. Once the
clock is applied, the output of the flip-flop (scan out) becomes high. After a small time
gap, the update signal is made high and the weight update process is activated. When the
weight update is completed, a small pulse is applied through the training input port to the
memristor bridge synapse to evaluate the voltages at nodes A and B. When evaluation is
complete, a low signal is produced at the flip-flop's input and the same process is repeated.
In this experiment, a weight-update pulse of 30ms was applied while the flip-flop
output held a high value and a 20ms pulse was applied when it held a low value. The
waveform in Figure 4.8 shows that the bit-slice circuit is functioning correctly.
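The sequencing described for the bit-slice can be summarised with a small behavioural model. This is only a logic-level sketch, not a circuit model: the class name `BitSliceModel` is hypothetical, and the mapping of a high flip-flop output to +1V (and low to -1V) is an assumption, since the text does not state which polarity each value selects.

```python
class BitSliceModel:
    """Behavioural stand-in for one bridge synapse bit-slice."""

    def __init__(self):
        self.scan_out = 0  # flip-flop output; selects the update polarity

    def clock(self, scan_in):
        """Rising clock edge: the flip-flop captures scan_in."""
        self.scan_out = 1 if scan_in else 0
        return self.scan_out

    def bridge_input(self, update, training_input=0.0):
        """Voltage seen at the bridge synapse input.

        update high: the multiplexer drives +1 V or -1 V according to
        scan_out and the training input is not applied.
        update low: the update rails are gated, so only the training
        input (if any) reaches the bridge.
        """
        if update:
            return 1.0 if self.scan_out else -1.0
        return training_input


slice_ = BitSliceModel()
slice_.clock(1)                                   # shift in a high bit
print(slice_.bridge_input(update=True))           # weight update, assumed +1 V
print(slice_.bridge_input(update=False, training_input=0.5))  # evaluation
slice_.clock(0)                                   # shift in a low bit
print(slice_.bridge_input(update=True))           # weight update, assumed -1 V
```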
4.4 Simple Neural Network Simulation
To illustrate the working of the memristor based artificial neural network as a system of
the basic components working together, we simulated the training of the simple memristor
based artificial neural network in Figure 2.4 explained in Chapter 2. This neural network
aims to approximate the OR-Gate function. The components of this neural network
include six memristor bridge synapses (Figure 2.5), six summing amplifiers (Figure 2.7)
and three difference amplifiers (Figure 2.8). The other elements of the memristor bridge synapse
bit-slice, such as the D flip-flop and the multiplexer, are ignored in this simulation for simplicity.
The weight update pulses are supplied directly to the bridge synapse terminals for this
simulation using separate voltage sources.
Figure 4.9: Neural network training input application and output evaluation.
The initial conditions of the memristors were changed to illustrate the functioning of
the network more clearly. Each memristor is set to have a different initial resistance.
Figure 4.9 shows the simulation results of the circuit functioning. In the waveform, signal
n5out is the output of the neural network. Signals in1 and in2 are the two input training
signals to the neural network. The update signal is used to switch between weight update
and training input application. The training input is applied to the circuit when update
is low. The weight update process happens automatically when update goes high.
For the first step, update is made low and the training pulse is applied. The training
inputs are given as a complementary pair of pulses, each of width 1µs. The output is only measured
and evaluated during the first 1µs. The second, complemented input is applied to restore
any change caused to the memristors by the application of the input.
Figure 4.10: Neural network weight update pulse application.
After the training inputs are applied and the output is measured, the output error is
calculated by comparing the obtained output with the expected output. On taking the mean
squared error of the output, we see that the output error is about 37.4%. Since this error
is much greater than desired, we apply weight update pulses to the memristor bridge
synapses. For the first training iteration, random weight update pulses are applied to
each of the memristor bridges in the network. There are six memristor bridges in this
particular example network and six individual pulses are applied to train each of the
memristor bridges. Figure 4.10 shows the weight update pulses applied to the memristor
bridges.
The first set of random weight update pulses applied to the memristor bridge synapses
[BR1 BR2 BR3 BR4 BR5 BR6] is [-1 1 1 -1 -1 -1]. Each weight update pulse is applied
for a duration of 400µs. The pulse changes the weight of each memristor bridge synapse in
either the positive or negative direction depending on whether a positive or negative voltage
is applied. After the weight update pulses are applied, the training input is applied and the
network is evaluated to measure the output error. We see that the output value during
evaluation is 0.38426V and the output error is 37.9%. The output error has increased
after the first set of weight update pulses was applied. The RWC algorithm prescribes that
a new set of random weight update pulses be applied to the memristor bridge
synapses if the output error has increased compared to the previous iteration. A new set
of voltages, [1 -1 1 1 1 -1], is applied to the memristor bridge synapses and an output
voltage of 0.38608V is obtained on evaluation. The new output error is 37.6%, which is
lower than in the previous iteration. So, the same weight update pulses are applied again
until the error either increases or reaches the expected value. Figure 4.11 (a)-(d) shows
the evaluation pulses magnified.
Figure 4.11: Neural network output at evaluation during different iterations.
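The error figures quoted in this section are consistent with a squared error between the measured output and an expected output of 1V for the single input pair applied. That interpretation is an assumption on our part; the snippet below checks it against the two output voltages reported above. The helper name `squared_error_percent` is hypothetical.

```python
def squared_error_percent(output_v, expected_v=1.0):
    """Squared error of a single output sample, expressed as a percentage."""
    return (expected_v - output_v) ** 2 * 100.0


# Output voltages reported after the first and second sets of update pulses.
for output_v in (0.38426, 0.38608):
    print(f"{output_v} V -> {squared_error_percent(output_v):.2f}%")
```

Under this assumption the first value rounds to the reported 37.9%, while the second comes out near 37.7% against the reported 37.6%, a gap that may simply reflect rounding or truncation in the reported figure.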
4.5 OR-Gate Training in SPICE
4.5.1 Experimental Setup
To show a complete training simulation of the Memristor Based neural network designed
using the hardware architecture described in this thesis, we created a training simulator
using HSPICE and Perl. The simulator tries to mimic the behavior of the neural network
on hardware as closely as possible. SPICE-level simulation is among the most accurate
techniques available for simulating electrical and VLSI circuits.
Figure 4.12: Flowchart showing tool flow for neural network training simulator in SPICE.
In our simulator, the neural network is defined using a SPICE circuit. All the logic
components of the neural network are modeled at the transistor level in SPICE except
for the differential amplifier, for which we have used an ideal model. For the memristor,
we have used the ideal memristor model from [23]. Perl mimics the operations of the
microcontroller by generating the control signals and supplying the training input.
The flowchart in Figure 4.12 shows the simulator flow. The simulator is basically a
wrapper around HSPICE created using Perl. All interactions of the user are through the
command line interface to the Perl script. The simulator receives the number of inputs,
hidden layer neurons and output layer neurons along with pointers to files containing the
training inputs and the expected output. The user can also supply the error threshold and
a maximum iteration count for terminating the simulation. The simulation will terminate
if the output error goes below the error threshold or if the number of iterations of training
reaches the limit.
The Perl script first reads the training input and expected output from their
respective files and stores the information in a data structure. It then creates a SPICE
file of the complete neural network with all of its components using the specifications
provided by the user. Note that the SPICE file contains only the neural network and
none of the functions of the microcontroller are implemented in SPICE.
For the first iteration, the SPICE file contains only the training input supplied and all
other control signals are made inactive. The Perl script includes instructions to sample
the output of the neural network while the training inputs are applied to the network
as voltages. The output voltage values are stored in a log file generated by HSPICE. At
the end of the HSPICE simulation for an iteration, the HSPICE tool is given a directive
to store the circuit state to a file that can be loaded by another SPICE file to begin its
simulation from where the previous iteration had finished.
Figure 4.13: Neural network output for learning OR-gate function at the start of simulation.
Once the simulation is complete, Perl reads the network output voltage values from
the HSPICE log file and compares them with the expected output to compute the output
error. For the first iteration, the error for the previous iteration is saved as 0. The
wrapper checks whether the new error is greater or less than that of the previous iteration. If
the new error is greater, the wrapper generates random bits for training the neural
network. It modifies the PWL voltage inputs defined in the SPICE file to update the
weights of the neural network. It also adds lines to reload the circuit state at the end of
the previous iteration and calls the HSPICE simulator for the next iteration. If the new
error is less than that of the previous iteration, the wrapper first checks
whether the new error is less than the error threshold. If it is, the training
simulation ends. If the new error is greater than the error threshold, the wrapper
starts the next iteration of HSPICE simulation.
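The wrapper rewrites the PWL voltage inputs in the SPICE file on each iteration; this is done in Perl and the generated netlist is not listed here. As a rough illustration of what emitting such a source could look like, the Python sketch below formats one weight-update pulse as a standard SPICE piecewise-linear voltage source line. The source names (`Vbr1`, ...), node names (`br1_in`, ...), the 1µs start delay and the 1ns edge time are all hypothetical.

```python
def pwl_pulse_line(name, node, amplitude_v, start_s, width_s, edge_s=1e-9):
    """Format a single pulse as a SPICE PWL voltage source referenced to node 0.

    start_s must be greater than zero so that the time points stay
    strictly increasing.
    """
    if start_s <= 0 or width_s <= 0 or edge_s <= 0:
        raise ValueError("start_s, width_s and edge_s must be positive")
    points = [
        (0.0, 0.0),
        (start_s, 0.0),
        (start_s + edge_s, amplitude_v),
        (start_s + edge_s + width_s, amplitude_v),
        (start_s + 2 * edge_s + width_s, 0.0),
    ]
    body = " ".join(f"{t:.9g} {v:g}" for t, v in points)
    return f"V{name} {node} 0 PWL({body})"


# Random direction bits, one per bridge synapse, mapped to +1 V / -1 V pulses.
directions = [-1, 1, 1]
for index, direction in enumerate(directions, start=1):
    print(pwl_pulse_line(f"br{index}", f"br{index}_in", float(direction),
                         start_s=1e-6, width_s=500e-6))
```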
Since the RWC algorithm is an iterative heuristic, the output is not guaranteed to be
optimal. There is a chance that the output error may not go below the error threshold
during training. Even if the output error is only 0.1% above the error threshold, the
circuit still continues training and may not find a solution before the iteration limit is
reached. Unlike in software, it is not possible to save a snapshot of the circuit for the best
case output and revert to the required state. Setting the circuit to a specific state would
involve changing the resistance values of many memristors. Hence, choosing a suitable
error threshold is critical in training the neural network with the RWC algorithm.
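The control flow described in this section can be summarised in a short sketch. It is written in Python rather than the Perl used for the actual wrapper, and the callbacks `evaluate` and `apply_update` are hypothetical stand-ins for running an HSPICE evaluation and applying weight-update pulses. The scripted error values in the usage example are purely illustrative and were chosen to exercise each branch.

```python
import random


def rwc_train(evaluate, apply_update, n_synapses, threshold, max_iters, rng):
    """Random Weight Change loop following the wrapper's decision rule.

    evaluate() returns the current output error; apply_update(directions)
    applies one +1/-1 weight-update pulse per bridge synapse.
    Returns (iterations_run, random_updates, last_error).
    """
    prev_error = 0.0               # the first iteration compares against 0
    directions = [0] * n_synapses
    random_updates = 0
    error = None
    for iteration in range(1, max_iters + 1):
        error = evaluate()
        if error > prev_error:
            # Error got worse: draw a fresh random direction per synapse.
            directions = [rng.choice((-1, 1)) for _ in range(n_synapses)]
            random_updates += 1
        elif error < threshold:
            # Error improved and is below the threshold: training is done.
            return iteration, random_updates, error
        # Otherwise keep pushing the weights in the same directions.
        apply_update(directions)
        prev_error = error
    return max_iters, random_updates, error


# Usage with scripted, purely illustrative error values.
scripted_errors = iter([0.374, 0.379, 0.376, 0.01])
applied = []
result = rwc_train(lambda: next(scripted_errors), applied.append,
                   n_synapses=6, threshold=0.015, max_iters=10,
                   rng=random.Random(0))
print(result)
```

Note that the loop keeps no record of the best state seen so far, mirroring the point above that a hardware network cannot be rolled back to an earlier snapshot.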
4.5.2 Observation and Analysis
We successfully trained a neural network with 2 inputs, 3 hidden layer neurons and one
output layer neuron to learn the OR-gate function. The training simulation was done
using the Perl-HSPICE simulator explained in the previous section. For this simulation,
all the memristors were initially in the same state. Hence, the weight supplied by all
Memristor Bridge Synapses will be 0 at the start of training. Figure 4.13 shows the
circuit output for the first iteration of training. v(in1) and v(in2) represent the input
pulses supplied to the neural network and v(l3in) represents the output. The input pulses
are supplied as complementary pairs in order to revert the effect of the applied input. The
output values obtained for the neural network were [8.2067aV 5.384nV 5.2055nV 9.736nV]
for an expected output of [0V 1V 1V 1V]. Note that all four input combinations for the
two input OR-gate are supplied to the neural network for training.
Figure 4.14: Neural network output for learning OR-gate function for 54th iteration of training.
In Figure 4.14 we show the output waveforms for the 54th iteration of training. It is
interesting to note that in this iteration, the output voltage for the input combination
v(in1) = 1V and v(in2) = 0V is -0.012V. The error threshold for the simulation
was set at 0.015%. At the end of the simulation, the outputs obtained were [150.86fV
0.92259V 0.98883V 1V]. The simulation ran for a total of 276 iterations with random bits
being supplied for 28 of those iterations. The weight update pulses were supplied for 500
µs for each iteration. Figure 4.15 shows the output waveform from the neural network at
the end of simulation.
Figure 4.15: Neural network output for learning OR-gate function at the end of simulation.
4.6 Power and Timing Estimation
In this section, we give an approximation of the power consumption and timing of the
neural network circuit. We mathematically analyze the power consumption and timing
for the circuit during training and derive generalized formulas for estimating
these metrics for different neural network designs.
4.6.1 Power
The major contributors to power consumption for the neural network in both training and
normal operation are the memristor bridge synapses. Since these components are resistive
elements, they are likely to consume the most power. Here, we mathematically calculate the
power consumed by a single memristor bridge synapse based on the memristor model
that we have used in our design.
The arrangement of the memristors on the memristor bridge synapse ensures that
the total resistance of the bridge synapse remains constant. When a training pulse is
applied, the resistance of two of the memristors in the bridge synapse increases and that of the
other two decreases. This feature makes it easier for us to calculate the power consumed
during the operation of the circuit. For our simulations, we used a memristor model
with Ron = 116Ω, Roff = 16kΩ and Rinit = 8050Ω. We assume that all the memristors
of all bridge synapses in the circuit are at the same initial state before commencing
training. This implies that the total resistance of each memristor bridge synapse is about
8050Ω at all times. With this notion, we can calculate the instantaneous power drawn by
the memristor bridge synapse during training. To update weights, we supply a positive
or negative pulse of 1V magnitude. With 1V applied across a total resistance of 8050Ω, the
total instantaneous power is given by the equations below.
Power, P = 1/8050 W (4.1)
P = 124.22 µW (4.2)
Both instantaneous power and average power for the memristor bridge synapse during
training are the same, since it can be viewed as a DC circuit with constant overall resistance
during the weight change process, although the actual memristors may be changing
their resistance values. Note that the multiplexer circuit, the inverter and the flip-flop
associated with each memristor bridge synapse also consume power during training, but
we ignore the power consumed by these elements since it is negligible compared to the
power consumed by the bridge synapse. So, the total average power consumed during
training can be generalized by the following formula.
Total Average Power, Ptot = (P ∗ number of bridge synapses) W (4.3)
Total Average Power, Ptot = (124.22 ∗ number of bridge synapses) µW (4.4)
Equation 4.3 shows the general formula for total average power for a memristor bridge
synapse based artificial neural network and Equation 4.4 shows the formula for total
average power for our simulations. The neural network used to simulate the OR-gate
function described in the previous section consisted of a total of 9 memristor bridge
synapses. So, the total average power consumed by the entire network for one iteration of
training is 1.118 mW. The power consumption for the neural network during the evaluation
phase of training and standalone operation post training cannot be accurately estimated
since it depends on the input voltages supplied and the output of differential amplifiers
at each neuron. But the worst case instantaneous or average power consumed by the
neural network during standalone operation will be the same as the instantaneous power
consumed during training.
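The arithmetic behind Equations 4.1 to 4.4 can be checked with a few lines of Python. The sketch treats each bridge synapse as a fixed 8050Ω resistance driven by a 1V pulse, as the text does, and ignores the multiplexer, inverter and flip-flop. The function names are hypothetical.

```python
V_UPDATE = 1.0        # weight-update pulse magnitude (V)
R_BRIDGE = 8050.0     # total resistance of one bridge synapse (ohms)


def bridge_power_w(v=V_UPDATE, r=R_BRIDGE):
    """Power drawn by one bridge synapse, treated as a fixed resistance."""
    return v ** 2 / r


def total_training_power_w(n_bridges, v=V_UPDATE, r=R_BRIDGE):
    """Equation 4.3: per-bridge power times the number of bridge synapses."""
    return n_bridges * bridge_power_w(v, r)


print(f"per bridge: {bridge_power_w() * 1e6:.2f} µW")
print(f"9 bridges:  {total_training_power_w(9) * 1e3:.3f} mW")
```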
4.6.2 Timing
The complete timing for training a neural network cannot be accurately predicted because
of the algorithm being used to train the network. However, we can estimate the time
required for one iteration of training. Application of the weight update pulse occupies
the majority of time in one training iteration. In our simulations, we supplied training
pulses of 500µs duration to update the weights. To evaluate the new output after
updating the weights, signals can be supplied in the ns range. When compared to the time
required for updating weights, the evaluation time is negligible. Another contributing
factor during training is the time required to shift in random bits to the flip-flops when
the weight change directions have to be updated. This time would depend on the number
of memristor bridge synapses in the network and the total number of individual shift
registers in the circuit. The clock period for the flip-flops can also be in the nanosecond
range. The following equations summarize the time for training.
Time for one training iteration with random bit generation = Weight update time + Shift in time (4.5)
Time for one training iteration without random bit generation = Weight update time (4.6)
Equations 4.5 and 4.6 show the total time required for one training iteration when
random training bits are applied and not applied respectively. The neural network
simulation for the OR-gate function required a total of 276 iterations with 28 iterations requiring
random bit generation. In our simulations, we used a clock of period 2µs to shift in the
values to the shift register. There were a total of 9 flip-flops in the circuit, all connected to
form one shift register, which meant 9 clock cycles were required to shift in all the values
to the shift register. So, the total time for training the circuit in hardware would be,
Total time = [(500 + 2 ∗ 9) ∗ 28 + 500 ∗ 248]µs (4.7)
Total time = 138.50ms (4.8)
We can see from Equation 4.8 that the actual time required to train this neural network
in hardware is less than 0.15s. Even though flip-flops can operate with clock periods much
shorter than 2µs, we used such a long period to reduce the run time of the HSPICE simulation
by reducing the resolution. In real-time scenarios, the shift in time for random bits would
be far less than what is reported here.
4.7 Training Performance
We performed five training simulations to learn the OR-gate function using the same
neural network to analyze the performance of the training algorithm. One of the training
simulations required 276 iterations of weight updates. During training, the network
received new random bits for weight update for 28 iterations, meaning that the output
error increased 28 times during training. The graph in Figure 4.16 shows how the mean
squared error of the output reduces during the course of training for this simulation. The
Y-axis shows the mean squared error and the X-axis shows the number of iterations.
We can see from the graph that initially the error increases and decreases a few times
before the error starts to decrease continuously. After the initial slow descent, the error
increases again close to 50 iterations. Following another change in the random inputs, the
error starts to decrease steeply as the network finds a suitable direction of weight changes
to match the expected output. The error again increases and decreases before it finally
falls below the error threshold of 0.015%.
Figure 4.16: Mean squared error vs iterations for training OR-gate function.
Table 4.4: Comparison of training performance for multiple simulations for training OR-gate function in HSPICE
Simulation No.   Total Weight Updates   Random Updates   Continuous Updates   Training Time on Hardware (ms)
1                276                    28               248                  138.50
2                56                     2                54                   28.04
3                693                    20               673                  346.86
4                97                     6                91                   48.61
5                732                    4                728                  366.07
In Table 4.4 we summarize the results for all five training simulations for the OR-gate function.
The number of iterations required for training cannot be predicted because of the
randomness of the algorithm used. We see that for the second experiment, the number of
iterations required for training was only 56, while for the fifth experiment it
was 732. When we compare the fourth and fifth simulations, we can see that the fourth
one finished in 97 iterations, but had more random weight updates than the fifth
simulation. In the fourth simulation, the network was able to find good directions for weight
change for its memristor bridge synapses and the error curve had a steeper slope. Figures
4.17-4.20 show the mean squared error plots for simulations 2-5.
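The hardware training times in Table 4.4 follow from Equations 4.5 and 4.6 with a 500µs update pulse, a 2µs shift clock and a 9-bit shift register. The short Python sketch below recomputes the last column of the table from the random and continuous update counts; the function name `training_time_ms` is hypothetical.

```python
UPDATE_US = 500        # weight-update pulse width per iteration (µs)
CLOCK_PERIOD_US = 2    # shift-register clock period used in the simulations (µs)
N_FLIP_FLOPS = 9       # one flip-flop per bridge synapse, in a single chain


def training_time_ms(random_updates, continuous_updates):
    """Total hardware training time from Equations 4.5 and 4.6, in ms."""
    shift_in_us = CLOCK_PERIOD_US * N_FLIP_FLOPS
    with_random_us = (UPDATE_US + shift_in_us) * random_updates
    without_random_us = UPDATE_US * continuous_updates
    return (with_random_us + without_random_us) / 1000.0


# (random updates, continuous updates) for simulations 1-5 in Table 4.4.
runs = [(28, 248), (2, 54), (20, 673), (6, 91), (4, 728)]
for number, (rand, cont) in enumerate(runs, start=1):
    print(f"simulation {number}: {training_time_ms(rand, cont):.2f} ms")
```

Since the shift-in overhead is only 18µs per random update, the total time is dominated by the number of update pulses rather than by how often new random bits are needed.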
Figure 4.17: Mean squared error vs iterations for simulation 2.
Figure 4.18: Mean squared error vs iterations for simulation 3.
Figure 4.19: Mean squared error vs iterations for simulation 4.
Figure 4.20: Mean squared error vs iterations for simulation 5.
4.8 Summary
The primary objective of this chapter was to simulate and illustrate the working of the
neural network hardware architecture in SPICE. We began by simulating the working
of the individual components of the neural network and followed these simulations up
with a full training simulation completely in SPICE. We also presented an estimation of
the power consumption of the circuit and showed the timing requirements for training a
circuit. In the next chapter, we draw conclusions from this work and propose enhancements
and extensions to the architecture presented in this thesis.
Chapter 5
Conclusion and Future Work
5.1 Conclusion
The Memristor based artificial neural networks presented in [9] employed the Memristor
Bridge Synapse to implement weights and the Random Weight Change algorithm for
training. The focus of the work in [9] was only to prove that Memristor Bridge Synapse
based neural networks can be used to learn complex functions and may be implemented on
a chip with supplementary hardware. The simulations were done in software to illustrate
the training of the neural network, but a path to actual hardware implementation was
not provided.
We based our work on the findings in [9] that the Memristor Bridge Synapse is an effective
system for implementing weights which, when employed with the RWC algorithm, can
yield neural networks capable of learning complex functions on chip without requiring
a host computer. Our aim was to develop a complete hardware architecture to implement
Memristor Bridge Synapse based artificial neural networks. First, we presented an
efficient way to place different layers of neurons to allow maximum inputs to be supplied
to the network with less routing. Then we went on to describe the primary components
of the neural network, like the Memristor Bridge Synapse and the operational amplifiers,
along with various other hardware components necessary to implement the training logic.
After describing the building blocks of the neural network, we went on to show how various
components could be combined to form a bit-slice structure which can be
repeated to form layers of neurons.
We developed a prototypical placement and routing tool for the proposed architecture
to illustrate how the neural network would appear on layout. The tool also gives an
approximate insight into how much area the neural network requires and the efficiency
of the architecture in utilizing the chip area.
To ascertain that the proposed architecture with all its components can successfully
implement an artificial neural network capable of learning complex functions on chip, we
performed various SPICE simulations. We first simulated all of the basic components of
the architecture individually. After verifying their functionality, we combined the basic
components to make circuits to perform the different tasks in implementing and training
the neural network and tested their functioning. We performed a complete neural network
training simulation in HSPICE to learn the OR-gate function.
Through our simulations and analysis, we were able to conclude that the hardware
architecture presented in this thesis is an effective way to implement artificial neural
networks using memristors. As the large scale production of memristors on physical layout
becomes possible, our architecture can be directly realized on chip without requiring any
additional circuitry and can be easily scaled to have several layers of neurons to learn
complex functions.
5.2 Future Work
In this section, we present a few ideas that might help improve the robustness of the
system and its ability to learn functions and reduce power consumption.
5.2.1 Implementing Stronger Activation Function
The activation function implemented in the hardware architecture consists only of summing
the individual VA and VB voltage components of each bridge and taking the difference
of the two sums. A more complex activation function can be implemented to improve the
learning process. Circuits are available to implement popular activation functions such as
the sigmoid function and these circuits can be added to the neuron. The circuit will need
to be tested to see how effective other activation functions will be when implemented
along with the Memristor Bridge Synapse and the Random Weight Change Algorithm.
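To make the contrast concrete, the sketch below compares the current sum-and-difference behaviour with a sigmoid applied to the same quantity. It assumes ideal unity-gain summing and difference amplifiers; the `gain` parameter, the function names and the example node voltages are all hypothetical and are not taken from any simulation in this thesis.

```python
import math


def linear_neuron(v_a, v_b):
    """Current behaviour: difference of the summed VA and summed VB voltages."""
    return sum(v_a) - sum(v_b)


def sigmoid_neuron(v_a, v_b, gain=1.0):
    """Hypothetical variant: pass the same difference through a sigmoid."""
    return 1.0 / (1.0 + math.exp(-gain * linear_neuron(v_a, v_b)))


# Illustrative node voltages for a neuron fed by two bridge synapses.
v_a = [0.50213, 0.49574]
v_b = [0.49787, 0.50426]
print(linear_neuron(v_a, v_b))
print(sigmoid_neuron(v_a, v_b, gain=10.0))
```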
5.2.2 Linear Feedback Shift Register for Random Bits
In our architecture, we had assigned the task of creating random bits for training solely
to the microcontroller. The microcontroller would serially shift in the bits to the shift
register before the training pulses could be applied. This process takes several clock
cycles depending on the size of the network. However, if the shift registers themselves
were made to generate random bit values, then the process would take far less time.
Completely new random bits could be generated in just one clock cycle, which would
save a lot of time during training. The layout of the architecture also contains a lot
of free space to implement logic for creating an LFSR.
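As a software sketch of the idea, the snippet below models a 9-bit Fibonacci LFSR, sized to match the 9 flip-flops of the OR-gate network. The feedback implements the recurrence a[n+9] = a[n+4] XOR a[n], which corresponds to the primitive trinomial x^9 + x^4 + 1 and therefore cycles through all 511 non-zero states. The register width, tap choice and function names are our assumptions and are not part of the proposed architecture.

```python
def lfsr9_step(state):
    """Advance a 9-bit Fibonacci LFSR by one clock.

    Implements a[n+9] = a[n+4] XOR a[n] (trinomial x^9 + x^4 + 1).
    The state must be non-zero.
    """
    feedback = (state ^ (state >> 4)) & 1
    return (state >> 1) | (feedback << 8)


def directions_from_state(state, n_bits=9):
    """Map each register bit to a weight-update direction (+1 or -1)."""
    return [1 if (state >> i) & 1 else -1 for i in range(n_bits)]


state = 0b000000001   # any non-zero seed works
seen = 0
current = state
while True:
    current = lfsr9_step(current)
    seen += 1
    if current == state:
        break

print("period:", seen)
print(directions_from_state(lfsr9_step(state)))
```

One caveat of this simple structure is that successive states are shifted copies of each other, so direction patterns taken on consecutive clocks are strongly correlated; a hardware design might clock the register several times between updates or use a different generator.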
5.2.3 Implementing other Hardware Friendly Algorithms
The same hardware architecture could be used to implement neural networks with a
training algorithm other than the Random Weight Change algorithm. In [24], Moerland
and Fiesler describe a few hardware friendly algorithms for artificial neural networks.
5.2.4 Bit-slice in Layout
A layout for the Memristor Bridge Synapse bit-slice can be created and functionality can
be incorporated into the placement and routing tool to automatically place and route the
bit-slice on the layout by replacing the p-diffusion blocks that represent neuron blocks.
5.2.5 Testing with more Memristor Models
The hardware architecture was tested with only one memristor model in our experiments.
The architecture should be tested with different memristor models, which may allow a
reduction in training pulse application depending on the memristors’ device parameters.
We tested the network with only an ideal model of the memristor. In reality, the
memristor’s non-idealities might play a significant part in the efficiency of the neural network
implementation. Thorough testing of the neural network can be done when characterized
libraries of memristors become available.
5.2.6 Reconfigurable Neural Network
Our architecture, when translated to a layout, will have a fixed number of inputs, hidden
layer neurons and output layer neurons. Different functions require different numbers of
neurons in each layer for efficient implementation. If logic can be incorporated into the
system by which the user can choose the number of neurons for each layer, it will provide
more flexibility and robustness in implementing various types of functions.
Bibliography
[1] Wikipedia, “Memristor — wikipedia, the free encyclopedia,” 2016. [Online; accessed
4-February-2016].
[2] R. Williams, “How we found the missing memristor,” Spectrum, IEEE, vol. 45,
pp. 28–35, Dec 2008.
[3] Wikipedia, “Artificial neural network — wikipedia, the free encyclopedia,” 2016.
[Online; accessed 5-February-2016].
[4] M. Holler, S. Tam, H. Castro, and R. Benson, “An electrically trainable artificial
neural network (etann) with 10240 ’floating gate’ synapses,” in Neural Networks,
1989. IJCNN., International Joint Conference on, pp. 191–196 vol.2, 1989.
[5] M. Milev and M. Hristov, “Analog implementation of ann with inherent quadratic
nonlinearity of the synapses,” Neural Networks, IEEE Transactions on, vol. 14, no. 5,
pp. 1187–1200, 2003.
[6] J. Liu, M. A. Brooke, and K. Hirotsu, “A cmos feedforward neural-network chip
with on-chip parallel learning for oscillation cancellation,” Neural Networks, IEEE
Transactions on, vol. 13, no. 5, pp. 1178–1186, 2002.
[7] J. A. Starzyk et al., “Memristor crossbar architecture for synchronous neural net-
works,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 61,
no. 8, pp. 2390–2401, 2014.
[8] M. Soltiz, D. Kudithipudi, C. Merkel, G. S. Rose, and R. E. Pino, “Memristor-based
neural logic blocks for nonlinearly separable functions,” Computers, IEEE
Transactions on, vol. 62, no. 8, pp. 1597–1606, 2013.
[9] S. Adhikari, H. Kim, R. Budhathoki, C. Yang, and L. Chua, “A circuit-based learning
architecture for multilayer neural networks with memristor bridge synapses,” Circuits
and Systems I: Regular Papers, IEEE Transactions on, vol. 62, pp. 215–223, Jan
2015.
[10] H. Kim, M. Sah, C. Yang, T. Roska, and L. Chua, “Memristor bridge synapses,”
Proceedings of the IEEE, vol. 100, pp. 2061–2070, June 2012.
[11] CMU, “Neural networks for face recognition.” [Online; accessed 18-February-2016].
[12] L. Chua, “Memristor-the missing circuit element,” Circuit Theory, IEEE Transac-
tions on, vol. 18, pp. 507–519, Sep 1971.
[13] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The missing
memristor found,” Nature, vol. 453, pp. 80–83, May 2008.
[14] K. Hirotsu and M. Brooke, “An analog neural network chip with random weight
change learning algorithm,” in Neural Networks, 1993. IJCNN ’93-Nagoya. Proceed-
ings of 1993 International Joint Conference on, vol. 3, pp. 3031–3034 vol.3, Oct
1993.
[15] J. Misra and I. Saha, “Artificial neural networks in hardware: A survey of two
decades of progress,” Neurocomputing, vol. 74, no. 1, pp. 239–255, 2010.
[16] M. L. Mumford, D. K. Andes, and L. R. Kern, “The Mod 2 neurocomputer system
design,” Neural Networks, IEEE Transactions on, vol. 3, no. 3, pp. 423–433, 1992.
[17] I. Bayraktaroglu, A. S. Ogrenci, G. Dundar, S. Balkır, and E. Alpaydın, “ANNSyS: an
analog neural network synthesis system,” Neural Networks, vol. 12, no. 2, pp. 325–
338, 1999.
[18] S. P. Adhikari, C. Yang, H. Kim, and L. O. Chua, “Memristor bridge synapse-
based neural network and its learning,” IEEE Transactions on Neural Networks and
Learning Systems, vol. 23, pp. 1426–1435, Sept 2012.
[19] S. P. Adhikari, H. Kim, R. K. Budhathoki, C. Yang, and J.-M. Kim, “Learning
with memristor bridge synapse-based neural networks,” in 2014 14th International
Workshop on Cellular Nanoscale Networks and their Applications (CNNA), pp. 1–2,
July 2014.
[20] M. P. Sah, C. Yang, H. Kim, T. Roska, and L. Chua, “Memristor bridge circuit
for neural synaptic weighting,” in 2012 13th International Workshop on Cellular
Nanoscale Networks and their Applications, pp. 1–5, Aug 2012.
[21] “Magic VLSI Layout Tool.” http://opencircuitdesign.com/magic/. Accessed: 04-19-
2016.
[22] Z. Biolek, D. Biolek, and V. Biolkova, “SPICE model of memristor with nonlinear
dopant drift,” Radioengineering, vol. 18, no. 2, pp. 210–214, 2009.
[23] D. Biolek, M. Di Ventra, and Y. V. Pershin, “Reliable SPICE simulations of memris-
tors, memcapacitors and meminductors,” arXiv preprint arXiv:1307.2717, 2013.
[24] P. Moerland and E. Fiesler, “Neural network adaptations to hardware implementa-
tions,” tech. rep., IDIAP, 1997.