
Scalable Hardware Architecture for Memristor Based Artificial Neural Network Systems

A thesis submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

MASTER OF SCIENCE

in the Dept. of Electrical Engineering and Computing Systems

of the College of Engineering and Applied Sciences

May 2016

by

Ananthakrishnan Ponnileth Rajendran

B.Tech, Amrita Vishwa Vidyapeetham University, Kerala, India

May 2013

Thesis Advisor and Committee Chair: Dr. Ranga Vemuri


Abstract

Since the physical realization of the Memristor by HP Labs in 2008, research on Mem-

ristors and Memristive devices gained momentum, with focus primarily on modelling

and fabricating Memristors and in developing applications for Memristive devices. The

Memristor’s potential can be exploited in applications such as neuromorphic engineering,

memory technology and analog and digital logic circuit implementations. Research on

Memristor based neural networks has thus far focused on developing algorithms and

methodologies for implementation.

The Memristor Bridge Synapse, a Wheatstone bridge-like circuit composed of four

Memristors, is a very effective way to implement weights in hardware neural networks. Re-

search on Memristor Bridge Synapse implementations coupled with the Random Weight

Change Algorithm proved effective in learning complex functions with potential for imple-

mentation on hardware with simple and efficient circuitry. However, the simulations and

experiments conducted were purely in software and served only as a proof of concept. Realizing

neural networks using the Memristor Bridge Synapse capable of on-chip training requires

an effective hardware architecture with numerous components and complex timing.

This thesis presents a scalable hardware architecture for implementing artificial neu-

ral networks using the Memristor Bridge Synapse capable of being trained on-chip using

the Random Weight Change algorithm. Individual components required for implement-

ing training logic, timing and evaluation are described and simulated using SPICE. A

complete training simulation for a small neural network based on the proposed architec-

ture was performed using HSPICE. A prototypical placement and routing tool for the

architecture is also presented.


To my parents and my sister. Thank you for being my inspiration.

In memory of my friends Govind and Srinivas. You’ll forever be in my heart.


Acknowledgements

I would like to start by thanking the most important people in my life, my family. My

parents Rajendran and Rajam have made a lot of sacrifices to help my sister Malavika

and me to realize our dreams. Thank you very much for believing in me and motivating

me towards realizing my goals. I will forever be indebted to you.

I consider myself very lucky to have received the opportunity to work under Dr. Ranga

Vemuri. The knowledge you imparted will forever stay with me. Thank you very much

for letting me be a part of DDEL and guiding me through my Master’s journey. Thank

you Dr. Wen-Ben Jone and Dr. Carla Purdy for being part of my defense committee.

Thanks to Rob Montjoy for providing continuous support with the DDEL machines.

Special thanks to my friend Prabanjan for our innumerable discussions and the ideas you

gave me to put my work together. I would like to thank my friends Diwakar, Ashwini

and Meera for providing a helping hand on numerous occasions. Thank you Renuka for

reviewing my thesis.

I would like to thank all my teachers from primary school through college for moulding

me into the person I am today. Special thanks to Dr. Rajesh Kannan Megalingam for

inducing interest in the field of VLSI in me and motivating me to pursue a Master’s

degree.

Last but not least, thanks to all my friends and relatives for being a part of my journey

of life. I will forever be grateful for your help and support.


Contents

1 Introduction 1

1.1 The Memristor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Artificial Neural Networks on Hardware . . . . . . . . . . . . . . . . . . . 7

1.3.1 Analog Neural Network Implementations . . . . . . . . . . . . . . 7

1.3.2 Memristor Based Neural Networks . . . . . . . . . . . . . . . . . 10

1.4 Random Weight Change Algorithm . . . . . . . . . . . . . . . . . . . . . 15

1.5 Memristor Bridge Synapse . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.6 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

1.7 Thesis Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2 The Memristor Neural Network Architecture 22

2.1 Architecture Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.2 Architecture Components . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.2.1 Neuron Block . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.2.2 Microcontroller . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.2.3 Shift Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.2.4 Connection Buses . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

2.3 Memristor Bridge Synapse Bit-Slice . . . . . . . . . . . . . . . . . . . . . 35

2.4 Architecture in a nutshell . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37


3 Placement and Routing Tool for Memristor Neural Network Architecture 38

3.1 Tool Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.2 Tool Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.3 Output and Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 43

3.3.1 Area Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.3.2 Runtime Performance . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.4 Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4 Experimental Results and Analysis 50

4.1 Memristor Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2 Memristor Bridge Synapse Simulation . . . . . . . . . . . . . . . . . . . . 54

4.3 Memristor Bridge Synapse Bit-Slice Simulation . . . . . . . . . . . . . . 57

4.4 Simple Neural Network Simulation . . . . . . . . . . . . . . . . . . . . . 59

4.5 OR-Gate Training in SPICE . . . . . . . . . . . . . . . . . . . . . . . . . 62

4.5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4.5.2 Observation and Analysis . . . . . . . . . . . . . . . . . . . . . . 65

4.6 Power and Timing Estimation . . . . . . . . . . . . . . . . . . . . . . . . 66

4.6.1 Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.6.2 Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.7 Training Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5 Conclusion and Future Work 74

5.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

5.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.2.1 Implementing Stronger Activation Function . . . . . . . . . . . . 75

5.2.2 Linear Feedback Shift Register for Random Bits . . . . . . . . . . 76

5.2.3 Implementing other Hardware Friendly Algorithms . . . . . . . . 76


5.2.4 Bit-slice in Layout . . . . . . . . . . . . . . . . . . . . . . . . . . 76

5.2.5 Testing with more Memristor Models . . . . . . . . . . . . . . . . 76

5.2.6 Reconfigurable Neural Network . . . . . . . . . . . . . . . . . . . 77

Bibliography 78


List of Figures

1.1 Conceptual symmetries of the four circuit variables with the three classical

circuit elements and the memristor [1]. . . . . . . . . . . . . . . . . . . . 2

1.2 Cross section of HP’s Crossbar Array showing the memristor switch [2]. . 3

1.3 Representation of a simple three-layered artificial neural network [3]. . . . 5

1.4 Differential floating gate synapse schematic diagram of Electrically Train-

able Analog Neural Network (ETANN) [4]. . . . . . . . . . . . . . . . . . 8

1.5 Analog current synapse, synapse current input, weight-control and neuron

output circuit schematic of model proposed in [5]. . . . . . . . . . . . . . 9

1.6 Schematic of a weight cell of CMOS integrated feed-forward neural network

[6]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.7 Excitatory neuron with the input sensing circuit of Memristor Crossbar

Architecture for Synchronous Neural Networks [7]. . . . . . . . . . . . . . 11

1.8 Weighting and Range Select circuit for RANLB and MTNLB [8]. . . . . . 12

1.9 (a) Activation function circuit for RANLB. (b) Activation function circuit

for MTNLB [8]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.10 Circuit that accomplishes weighting using the Memristor bridge synaptic

circuit and voltage-to-current conversion with differential amplifier in [9]. 13

1.11 (a) Typical multi-layered neural network inputs in voltage form. (b)

Schematic of learning architecture for the equivalent hardware for the neu-

ral network in (a) [9]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.12 Flowchart for Random Weight Change Algorithm. . . . . . . . . . . . . . 15


1.13 Illustration of energy surface tracing by back-propagation and random

weight change algorithm [9]. . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.14 Memristor Bridge Synapse Circuit [10]. . . . . . . . . . . . . . . . . . . . 17

2.1 Sample input for face pose identification problem [11]. . . . . . . . . . . . 22

2.2 Three layered neural network for face pose identification. . . . . . . . . . 23

2.3 Memristor based neural network architecture for face pose identification. 24

2.4 Simple three-layered neural network. . . . . . . . . . . . . . . . . . . . . 26

2.5 Memristor Bridge Synapse design. . . . . . . . . . . . . . . . . . . . . . . 27

2.6 Summing logic for neuron N3 from Figure 2.4. . . . . . . . . . . . . . . . 28

2.7 Summing circuit using voltage average and operational amplifier circuits. 29

2.8 Difference circuit using differential amplifier. . . . . . . . . . . . . . . . . 30

2.9 Neuron N3 inputs and output for the neural network in Figure 2.4. . . . 32

2.10 Memristor Bridge Synapse Bit-Slice. . . . . . . . . . . . . . . . . . . . . 36

3.1 Placement of 10 blocks of output layer on layout represented with p-diffusion. 41

3.2 After routing of input bus for placed blocks in Figure 3.1. . . . . . . . . . 42

3.3 Completed placement and routing for neural network with 30 inputs, 10

hidden layer neurons and 10 output layer neurons. . . . . . . . . . . . . . 43

3.4 Flowchart showing the tool flow for placement and routing. . . . . . . . . 44

3.5 Layout for face pose identification neural network with 960 inputs, 10

hidden layer neurons and 4 output layer neurons. . . . . . . . . . . . . . 45

3.6 Output layer layout for face pose identification neural network. . . . . . . 46

3.7 Output layer layout for neural network with 80 inputs, 12 hidden layer

neurons and 15 output layer neurons. . . . . . . . . . . . . . . . . . . . . 47

3.8 Neural network with 80 inputs and 15 output layer neurons having two

hidden layers with 30 neurons in the first hidden layer and 25 neurons in

the second hidden layer. . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.1 Circuit for Memristor simulation with Memristor M1 (Ron =116Ω, Roff=16kΩ)

in series with resistor R1 (100Ω) and Voltage source Vin. . . . . . . . . . 51


4.2 Memristor simulation with DC voltage +1V and -1V. . . . . . . . . . . . 52

4.3 Resistance change in the memristor for millisecond input pulse-width. . . 53

4.4 Resistance change in the memristor for microsecond input pulse-width. . 54

4.5 Memristor Bridge Synapse circuit used for simulation. . . . . . . . . . . . 55

4.6 Memristor Bridge Synapse simulation waveform. . . . . . . . . . . . . . . 55

4.7 Evaluation pulse applied to Memristor Bridge Synapse. . . . . . . . . . . 56

4.8 Memristor Bridge Synapse Bit-Slice simulation waveform. . . . . . . . . . 58

4.9 Neural network training input application and output evaluation. . . . . 59

4.10 Neural network weight update pulse application. . . . . . . . . . . . . . . 60

4.11 Neural network output at evaluation during different iterations. . . . . . 61

4.12 Flowchart showing tool flow for neural network training simulator in SPICE. 62

4.13 Neural network output for learning OR-gate function at the start of sim-

ulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

4.14 Neural network output for learning OR-gate function for 54th iteration of

training. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.15 Neural network output for learning OR-gate function at the end of simulation. 66

4.16 Mean squared error vs iterations for training OR-gate function. . . . . . 70

4.17 Mean squared error vs iterations for simulation 2. . . . . . . . . . . . . . 71

4.18 Mean squared error vs iterations for simulation 3. . . . . . . . . . . . . . 72

4.19 Mean squared error vs iterations for simulation 4. . . . . . . . . . . . . . 72

4.20 Mean squared error vs iterations for simulation 5. . . . . . . . . . . . . . 73


List of Tables

2.1 Training input selection logic. . . . . . . . . . . . . . . . . . . . . . . . . 35

3.1 Comparison of total layout area for neural networks for different technology

nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.2 Fraction of unused area in layout for different neural networks . . . . . . 47

4.1 Instantaneous current and resistance measurements for forward biased

memristor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.2 Instantaneous current and resistance measurements for reverse biased mem-

ristor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4.3 Weight change for different training signal pulse-widths for memristor

bridge synapse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.4 Comparison of training performance for multiple simulations for training

OR-gate function in HSPICE . . . . . . . . . . . . . . . . . . . . . . . . 70


Chapter 1

Introduction

In 1971, Leon Chua presented an argument that a fourth two-terminal device should

exist along with the three classical circuit elements, namely, the resistor, capacitor and

inductor [12]. He named this fourth circuit element the Memristor. Chua pointed out

that the three basic circuit elements were defined based on a relationship between two

of the four fundamental circuit variables current, voltage, charge and flux-linkage. There

are six possible relationships between these four circuit variables, of which two are direct

relationships.

q = ∫ i(t) dt,    (1.1)

is the relationship between charge (q) and current (i), and

φ = ∫ v(t) dt,    (1.2)

is the relationship between flux-linkage (φ) and voltage (v). The other three relations

are based on the axiomatic definition of the three classical circuit elements. The resistor

is defined by the relationship between current and voltage, the inductor by current and

flux-linkage and the capacitor by the relationship between charge and voltage. Chua

postulated based on a logical as well as axiomatic point of view that a fourth basic

two-terminal device should exist, which can be characterized by charge and flux-linkage.
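
To make the charge-flux argument concrete, the defining relation of this fourth element can be written out as below. This is the standard textbook formulation rather than an equation taken from this thesis, with M(q) denoting the memristance.

```latex
% Standard charge-flux definition of the memristor (not an equation from this thesis):
% a memristor relates flux-linkage to charge, and differentiating in time recovers a
% resistance-like relation between voltage and current.
\begin{align}
  \mathrm{d}\varphi &= M(q)\,\mathrm{d}q \\
  v(t) = \frac{\mathrm{d}\varphi}{\mathrm{d}t}
       &= M(q)\,\frac{\mathrm{d}q}{\mathrm{d}t} = M(q)\, i(t)
\end{align}
```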

There was no physical realization for such a two terminal device for over three decades


Figure 1.1: Conceptual symmetries of the four circuit variables with the three classical circuit elements and the memristor [1].

since Chua’s proposal, until in 2008 Dmitri B. Strukov et al. from HP Labs published

an article reporting that they observed that memristance arises naturally in nanoscale systems

when solid-state electronic and ionic transport are coupled under an external bias voltage [13].

Since this discovery, research on Memristors and Memristive devices gained momentum,

with focus primarily on modelling and fabricating memristors and in developing applica-

tions for memristive devices. The memristor’s potential can be exploited in applications

such as neuromorphic engineering, memory technology and analog and digital logic cir-

cuit implementations. The work presented in this thesis focuses on the application of the

memristors in the area of artificial neural networks.

1.1 The Memristor

The Memristor is a two-terminal device whose electrical resistance is not a constant, but

varies depending on the amount of charge that flows through it. This variable resistance

of the Memristor is termed its Memristance. The memristor is non-volatile in nature,

meaning that the device can remember its most recent resistance value even after it is


Figure 1.2: Cross section of HP’s Crossbar Array showing the memristor switch [2].

disconnected from an electric power supply. This property of the memristor makes it

very useful for various applications such as in designing efficient memories and hardware

realizations of artificial neural networks.

There have been several implementations for the memristor device such as the Poly-

meric Memristor, Layered Memristor, Ferroelectric Memristor, Spin Memristive systems

etc. In this text, we will discuss the Titanium Dioxide Memristor that HP developed

in 2008. Researchers Dmitri B. Strukov et al. developed the memristor while working

on crossbar memory architecture at HP Labs. The crossbar is an array of perpendicular

wires that are connected using switches at points where they cross. Their idea was to

open and close these switches by applying voltages at the end of the wires. The design

of these switches led to the creation of the memristor.

HP's memristor is formed by sandwiching a thin layer of titanium dioxide (TiO2)

between two platinum electrodes. The electrodes are about 5nm thick and the TiO2 layer

is about 30nm thick. The TiO2 layer is divided into two separate regions, one composed

of pure TiO2 and the other slightly depleted of oxygen atoms. These oxygen vacancies

act as charge carriers and help conduct current through the device leading to a lower

resistance in the oxygen depleted region. The application of an electric field results in


a drift of these oxygen vacancies which results in a shift of the boundary between the

low and high resistance regions. Figure 1.2 shows a cross sectional view of HP’s crossbar

array with the memristor. If an electric field is applied across the two electrodes, it results

in the boundary between the normal region and oxygen depleted region moving either

towards or away from the upper platinum electrode. If the boundary moves towards the

upper electrode, it results in higher resistance and vice versa. Thus, the resistance of the

device is dependent on how much charge has passed through it in a particular direction.

The memristance is observed only when both the pure and doped regions contribute to

the resistance. After enough charge passes through the device, the ions become unable

to move further and the device enters hysteresis. The device then acts as a simple resistor

until the direction of the current is reversed.
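
The boundary-drift behaviour described above is commonly captured by the linear ion drift model from [13]. The sketch below is an illustrative rendering of that model, not the SPICE model used in the experiments of this thesis; R_on and R_off follow the 116 Ω and 16 kΩ values quoted for the simulated device later in the text, D is taken as the 30 nm film thickness mentioned above, and mu_v (dopant mobility) is a placeholder literature-style value.

```python
# Minimal sketch of the linear ion-drift memristor model (Strukov et al.), shown only
# to illustrate how charge flow moves the doped/undoped boundary. Parameter values are
# illustrative, not the ones used in this thesis.

def simulate_memristor(v_of_t, dt, R_on=116.0, R_off=16e3, D=30e-9, mu_v=1e-14):
    """Integrate the state variable w (doped-region width) under a voltage drive."""
    w = 0.5 * D                      # start with the boundary mid-device
    history = []
    for v in v_of_t:
        R = R_on * (w / D) + R_off * (1.0 - w / D)   # series combination of both regions
        i = v / R
        w += mu_v * (R_on / D) * i * dt              # linear ion drift: dw/dt = mu_v*Ron/D * i
        w = min(max(w, 0.0), D)                      # boundary cannot leave the device
        history.append((i, R, w))
    return history

# Example: a +1 V pulse lowers the resistance, a -1 V pulse raises it back.
trace = simulate_memristor([1.0] * 1000 + [-1.0] * 1000, dt=1e-6)
```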

In 2010, R. Stanley Williams of HP labs reported that they were able to fabricate

memristors as small as 3 nm by 3 nm in size that had a switching time of 1 ns (1 GHz

speed). Such small dimensions and high speed promise a wide range of applications for the

memristor. In the work presented here, the memristor’s ability to provide a wide range

of resistance values is utilized in creating synaptic weights for artificial neural networks.

For simplicity, we have used the term 'resistance' instead of 'memristance' throughout

this text.

1.2 Artificial Neural Networks

Artificial neural networks are groups of nodes that are connected using weighted edges.

They are models inspired by biological neural networks and are used to estimate or

approximate functions that usually depend on a large number of unknown inputs. The

ability of artificial neural networks to adapt to a given set of circumstances is what

makes them very attractive for applications such as pattern recognition, data mining,

game-play and decision making, medical diagnosis etc. Neural networks adapt to a given

set of inputs by modifying the weights of the interconnects between its neurons based

on a suitable algorithm. An activation function at the neuron defines its output for an


input or set of inputs to it. There are mainly three learning paradigms, viz. supervised

learning, unsupervised learning and reinforcement learning.

Every neural network has one input layer and one output layer. It may have one or

more hidden layers. Figure 1.3 shows a simple three-layered neural network. The number

of neurons in each layer depends on the function that the network is trying to approximate.

The neural networks discussed in this thesis are feed-forward neural networks i.e., data

only flows in the forward direction and there is no feedback for the data while the network

is evaluated. The neural network in Figure 1.3 is fully interconnected arrangement in the

sense that every neuron in one layer is connected to every neuron in the succeeding layer.

This not a necessity while designing a neural network since all connections may not be

required to implement a specific function. However, it is very difficult to accurately

predict the optimal number of hidden layer neurons and connections that a particular

problem might require. The beauty lies in the fact that neural networks have the ability

to learn whether or not a particular neuron or connection has a significant impact on its

output.

Figure 1.3: Representation of a simple three-layered artificial neural network [3].

Supervised learning is one of the most commonly used learning methods for artificial


neural networks. In this kind of learning, the aim is to infer the mapping implied by the

data; the cost function is related to the mismatch between the user’s mapping and the

data and it implicitly contains prior knowledge about the problem domain [3]. The mean-

squared error is often used as the cost and the learning tries to reduce the average error

between the network’s output and the desired output. The Backpropagation algorithm

is a well-known and efficient algorithm used for training neural networks. Training is

accomplished by adjusting the weights on the connections between neurons with an aim

to reduce the mean-squared error at the output of the neural network.

The Backpropagation algorithm calculates the gradient of a loss function with respect

to all of the weights in the network. The algorithm tries to minimize the loss function

by feeding the gradient to an optimization method which uses it to update the weights.

In order for the Backpropagation algorithm to work, the activation function used by the

neurons should be differentiable. The activation function is any mathematical function

at the neuron which defines its output for a given set of inputs. The Backpropagation

algorithm is very effective in training neural networks, but poses a lot of challenges when

implementing it on a standalone hardware system. The algorithm works in two phases;

the propagation phase and the weight update phase. In the propagation phase, the

algorithm first forward propagates a training input through the network and generates

the output activations. In the next step, the algorithm does a backward propagation

of the output activations through the network using the target pattern to generate the

difference between input and output values of all the hidden and output neurons. In the

weight update phase, the algorithm first multiplies the difference obtained with the input

activation to find the gradient of the weight. Then it uses this gradient to update each

of the weights in the network.
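
As a concrete illustration of the two phases just described, a minimal sketch for a single hidden layer with a differentiable tanh activation is given below. It is generic textbook Backpropagation, not circuitry or code from this thesis, and the function name and learning rate are only illustrative.

```python
import numpy as np

# Minimal sketch of Backpropagation's two phases for one hidden layer (illustrative only;
# tanh is chosen simply because the algorithm requires a differentiable activation).
def backprop_step(x, target, W1, W2, lr=0.1):
    # Propagation phase: forward pass generating the output activations.
    h = np.tanh(W1 @ x)
    y = np.tanh(W2 @ h)
    # Backward propagation of the output error through the network.
    delta_out = (y - target) * (1.0 - y ** 2)
    delta_hid = (W2.T @ delta_out) * (1.0 - h ** 2)
    # Weight update phase: gradient = error term times the input activation.
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return W1, W2, float(np.mean((y - target) ** 2))
```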

It is quite evident that the Backpropagation algorithm though very effective, requires

complex multiplication, summation and derivatives that are difficult to implement in VLSI

circuits [14]. A simpler algorithm is desirable to design a standalone hardware neural

network system. There are several hardware friendly algorithms implemented to train

artificial neural networks on hardware. The Random Weight Change algorithm is one


such popular algorithm. Though not as efficient as Backpropagation, it is hardware

friendly and much simpler to implement.

1.3 Artificial Neural Networks on Hardware

Implementation of artificial neural networks on hardware has been popular for over three

decades. Hardware neural networks range from analog to digital to FPGA-based and even to

optical neural networks. In this section, we briefly explore a few analog neural network

implementations and neural network implementations using memristors.

1.3.1 Analog Neural Network Implementations

Implementation of artificial neural networks on hardware gained popularity in the 1980s

with Intel’s Electrically Trainable Analog Neural Network (ETANN) 80170NX chip being

one of the earliest fully developed analog chips [4]. The ETANN is a general purpose neu-

rochip that stores its weights on non-volatile floating gate transistors (Floating-gate MOS-

FET or FGMOS) as electric charge with the help of EEPROM cells, and uses Gilbert-

multiplier synapses to provide four-quadrant multiplication. Training for ETANN is done

off chip using a host computer and the weights are written into the ETANN [4]. The chip

contains 64 fully interconnected neurons and can be cascaded by bus interconnection to

form a network of up to 1024 neurons with up to 81,920 weights [15].

Figure 1.4 shows the synapse circuit of the ETANN, which is an NMOS version of

the Gilbert-Multiplier with a pair of EEPROM cells in which a differential voltage

is stored as weights. Fowler-Nordheim tunneling of electrons is used to add and remove

electrons from the floating gates in the EEPROM to adjust the weights [4]. ETANN was

used in several systems like the Mod2 Neurocomputer which implemented 12 ETANN

chips for real-time image processing [16] and the MBOX II which makes use of 8 ETANN

chips to create an analog audio synthesizer [15].

One of the major drawbacks of this chip was the limited resolution in storing the

synaptic weights. The long-term resolution of the weights was not more than five bits.


Figure 1.4: Differential floating gate synapse schematic diagram of Electrically Trainable Analog Neural Network (ETANN) [4].

Another issue was the writing speed and cyclability of the EAROMs used to store the

weights which restricted the application of chip-in-the-loop training [17].

Milev and Hristov [5] present a simple analog-signal synapse with inherent quadratic

non-linearity implemented using MOSFETs with no floating-gate transistors. They de-

signed a neural matrix for finger-print feature extraction with 2176 analog current mode

synapses arranged in eight layers of 16 neurons with 16 inputs each. A chip was fabri-

cated in a standard 0.35µm TSMC process to demonstrate the feasibility of non-linear

synapses in practical application.

Apart from the 16 x 8 neural matrix of 128 analog 16-input neurons, 16-bit latched

digital inputs multiplexed with 16 analog-current inputs, 16 analog-current signal out-

puts and a 9-bit current-output digital-to-analog converter (DAC) are also implemented on

chip. Weight storage is done on an on-chip SRAM of more than 19K size. The architec-

ture allows for cascaded interconnection for system expansion. The internal system clock

is specified at 200 MHz maximum frequency. However, the input-data processing speed

is determined by current propagation delay through the components in the network and

varies significantly with the reference current driving the analog synapse circuits [5].


Figure 1.5: Analog current synapse, synapse current input, weight-control and neuron output circuit schematic of model proposed in [5].

Lui et al. [6] developed a mixed signal CMOS feed-forward neural network chip

with on-chip error reduction hardware. The design is compact and capable of high-speed

parallel learning using the Random Weight Change Algorithm (RWC). The weight storage

in the system is accomplished using capacitors. Capacitors implemented as weights are

compact and easy to program, but are susceptible to leakage issues leading to error in the

stored weights. In their system, Lui et al. designed large capacitors to ensure that the leakage

would be negligible. The chip is designed to operate in conditions that change continuously, and

the weight leakage problem is mitigated by constant weight updates. They found that

the weight retention time for the capacitors was around 2s for losing 1% of the weight

value at room temperature.

Figure 1.6 shows the schematic of a single weight cell with a shift register for random

input, the weight storage and modification circuit and the multiplier circuit. Lui et al.

were able to fabricate and test a chip with 100 weights and a 10x10 array with 10 inputs

and 10 outputs. They tested the chip by connecting it to a PC using an analog to

digital converter (ADC) and a digital to analog converter (DAC). In this work we make

use of the same RWC algorithm used by Lui et al. in their system. The RWC algorithm


Figure 1.6: Schematic of a weight cell of CMOS integrated feed-forward neural network [6].

is described in detail in the next section.

The analog neural network implementations discussed in this text are only a small

subset of the innumerable VLSI implementations of artificial neural networks. Misra and

Saha [15] provide a comprehensive survey of the hardware implementations of artificial

neural networks for over 20 years. Their discussion is not limited to analog neural network

implementations, but extends to digital, hybrid, FPGA based, RAM based and optical

neural networks.

1.3.2 Memristor Based Neural Networks

The potential to mimic brain logic is one of the most attractive features of the mem-

ristor. Various architectures and synapse designs have been proposed using memristors

for realizing artificial neural networks. Here, we briefly discuss a couple of neural net-

work implementations using memristors and the Memristor Bridge Synapse based neural

network that we have used as the primary reference in our work.

Starzyk and Basawaraj [7] propose an architecture and training scheme for neural

networks implemented using crossbar connections of memristors with a view of preserving

the high density of synaptic connections. They employ simple threshold based neurons,

synapse constituting of only a single memristor and a common sensing network. The

synapse is designed with a view of creating large scale systems with synapses arranged


in a grid structure capable of being trained on-chip. The system is composed of a single

layer feed-forward neural network with n inputs and m outputs.

Figure 1.7: Excitatory neuron with the input sensing circuit of Memristor Crossbar Architecture for Synchronous Neural Networks [7].

The neuron of the Memristor Crossbar Architecture proposed in [7] operates in three

different phases, viz. sensing phase, active phase and resting phase. During the sensing

phase, the neuron waits for input activity and does not fire. An increase in any of the input

signals above the threshold switches the neuron into the active phase, where the neuron

either fires or does not fire for a specific amount of time. Once the active phase timing expires,

the neuron goes into resting phase where all the inputs and outputs go to 0V and remains

in this state till the next sampling time. The excitatory neuron with the input sensing

circuit of the Memristor Crossbar Architecture is shown in Figure 1.7. The design was

tested in HSPICE for organization of the neural network on noisy digit recognition.

In [8], Soltiz et al. propose two Neuron Logic Block (NLB) designs to overcome the

limitation of not being able to train linearly inseparable functions with existing perceptron

based NLB designs using thin-film memristors that implement static threshold activation

functions. Their designs overcome the limitation by allowing the effective activation function

to be adapted during learning. Soltiz et al. contribute a perceptron based NLB design

with an adaptive activation function, a perceptron based NLB with static activation func-

tion and multiple activation thresholds and demonstrate the designs for reconfigurable

logic and optical character recognition for handwritten digits.


Figure 1.8: Weighting and Range Select circuit for RANLB and MTNLB [8].

Figure 1.8 shows the weighting and range selection circuit implemented using mem-

ristors for the Robust Adaptive Neural Logic Block (RANLB) and the Multithreshold

Neural Logic Block (MTNLB). The RANLB implements an adaptive activation function

using the circuit in Figure 1.9 (a), by providing an adjustable digital value for each in-

put current range. A flip-flop stores the digital value for each input current range. The

MTNLB is designed with a view of overcoming the high area overhead of the RANLB’s

activation function which limits its implementation on large neural networks where area

is a primary constraint. The MTNLB employs a static activation function in such a way

that the ability to learn linearly inseparable functions is not compromised. Figure 1.9 (b)

shows the activation function circuit for the MTNLB circuit.

Figure 1.9: (a) Activation function circuit for RANLB. (b) Activation function circuit for MTNLB [8].


Figure 1.10: Circuit that accomplishes weighting using the Memristor bridge synaptic circuit and voltage-to-current conversion with differential amplifier in [9].

The Memristor Bridge Synapse introduced by Kim et al. in [10] is a very popular

synaptic design used to implement neural networks. [9], [18], [19] and [20] present imple-

mentations of the Memristor Bridge Synapse in artificial neural networks. In our work,

we build on the work presented in [9] by Adhikari et al. on neural networks constructed

using Memristor Bridge Synapse that involves the Random Weight Change algorithm for

training.

Each neuron in the Memristor Bridge Synapse based neural network in [9] is composed

of multiple synapses and one activation unit. The inputs to the neural network are sup-

plied as voltage values which are weighted and then converted to current by differential

amplifiers. Kirchhoff Current Law (KCL) is used to sum the currents and produce the

output of a neuron. The differential amplifier along with the active load circuit form the

activation unit of the neuron. Figure 1.10 shows the Memristor Bridge Synapse connected

to the differential amplifier circuit. Figure 1.11 (a) shows a simple neural network with

two neurons and Figure 1.11 (b) shows the equivalent hardware circuit for the neural

network in Figure 1.11 (a) along with the architecture for the training regime.

Adhikari et al. designed and simulated the differential amplifier and the active load

circuit in HSPICE and developed a look-up table from the results. The Memristor model,

error calculation, random number generation and training pulse application were simu-

lated in MATLAB. They tested the architecture to learn the 3-bit parity problem, a Robot


Figure 1.11: (a) Typical multi-layered neural network inputs in voltage form. (b) Schematic of learning architecture for the equivalent hardware for the neural network in

(a) [9].

workspace and face pose identification using neural networks with 3 input x 5 hidden x

1 output, 10 input x 20 hidden x 1 output and 960 input x 10 hidden x 4 output nodes

respectively in MATLAB [9]. Their aim was to show that the Memristor Bridge Synapse

based neural networks trained using the Random Weight Change algorithm could be used

to realize simple, compact and reliable neural networks that are capable of being used for

real-life applications.

In our work, we have used the Memristor Bridge Synapse based neural networks

described in [9] as the base and try to build a complete hardware architecture which

can be implemented on a chip. We have made several modifications to the architecture

presented in [9], but have used the Memristor Bridge Synapse as the primary component

of the system along with the application of the RWC algorithm for training. The RWC


algorithm and the circuit implementation of the Memristor Bridge Synapse are discussed

in detail in the following sections.

1.4 Random Weight Change Algorithm

The Random Weight Change (RWC) algorithm was first described by Hirotsu and Brooke

in 1993. They proposed the algorithm as an alternative to Backpropagation to eliminate

the need for complex calculations while training a neural network. The non-idealities

of analog circuits are another reason why Backpropagation is not preferred for hardware

implementations. They were able to successfully implement and test the algorithm on a

chip with 18 neurons and 100 weights which learned the XOR Gate problem [14].

Figure 1.12: Flowchart for Random Weight Change Algorithm.

The algorithm randomly changes all of the weights by a small increment of -δ or +δ

from its initial state. The training input is then supplied to the network and the output


Figure 1.13: Illustration of energy surface tracing by back-propagation and random weight change algorithm [9].

error is calculated. If the new error has decreased compared to the previous iteration,

the same weight change is done again, until the output error either increases or falls

to within a desired limit. If the output error increases, then the weights are updated

randomly again. The algorithm can be summarized using the following equations from

[14]:

wij(n+1) = wij(n) + ∆wij(n+1)    (1.3)

where,

∆wij(n+1) = ∆wij(n)       if E(n+1) < E(n)

∆wij(n+1) = δ · Rand(n)    if E(n+1) ≥ E(n)

E() is the root mean-squared error at the output, δ is a small constant and Rand(n) =

+1 or -1 randomly. The flowchart in Figure 1.12 illustrates the steps in the Random

Weight Change algorithm.
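
The flowchart translates directly into a few lines of pseudocode. The sketch below is an illustrative rendering of Eq. (1.3); evaluate_error is a hypothetical placeholder standing in for applying the training set and measuring the mean-squared error at the network output, not a function from this thesis.

```python
import random

# Sketch of the Random Weight Change algorithm summarized by Eq. (1.3).
def rwc_train(weights, evaluate_error, delta=0.01, target_error=1e-3, max_iters=10000):
    prev_error = evaluate_error(weights)
    change = [delta * random.choice((+1, -1)) for _ in weights]   # initial random +/- delta
    for _ in range(max_iters):
        weights = [w + dw for w, dw in zip(weights, change)]      # w(n+1) = w(n) + dw(n+1)
        error = evaluate_error(weights)
        if error < prev_error:
            pass   # error decreased: keep the same change vector, dw(n+1) = dw(n)
        else:
            change = [delta * random.choice((+1, -1)) for _ in weights]  # new random change
        prev_error = error
        if error <= target_error:
            break
    return weights
```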

The Random Weight Change algorithm is less efficient when compared to Backpropaga-

tion. Figure 1.13 shows a comparison of the RWC algorithm with Backpropagation. For

Backpropagation, the operating point goes down along the steepest slope of the energy


Figure 1.14: Memristor Bridge Synapse Circuit [10].

curve in the network. For the RWC algorithm, the operating point goes up and down on the

energy curve rather than descending straight along the energy curve. However, RWC’s

operating point statistically descends and finally reaches the correct answer [14].

The RWC algorithm is very effective for analog implementations of artificial neural

networks as it eliminates the need for complex circuitry and is not greatly affected by

circuit non-idealities. Moreover, the algorithm does not require any specific network struc-

ture and can be applied to all feed-forward neural networks. Fully connected feedback

networks may have local minimum problems [14].

1.5 Memristor Bridge Synapse

The Memristor Bridge Synapse is a Wheatstone Bridge like circuit that is composed of

four identical memristors. Figure 1.14 shows the arrangement of the memristors in the

Bridge Synapse. The memristors are arranged such that the polarities of memristors M1

and M4 are the same and opposite to that of M2 and M3. When a positive voltage

is supplied at Vin, M1 and M4 are forward biased, which leads to a decrease in their

resistances. M2 and M3 on the other hand become reverse biased and their resistance

increases [10]. The outputs of the Bridge Synapse are tapped out at the nodes A and B.

The Bridge Synapse basically acts as two voltage divider circuits. The voltages at the


nodes A and B are given by the simple voltage divider formula:

VA = (M2 / (M1 + M2)) · Vin    (1.4)

VB = (M4 / (M3 + M4)) · Vin    (1.5)

where M1, M2, M3 and M4 are the resistances of the memristors M1, M2, M3 and M4

respectively. The weight of the Memristor Bridge Synapse is the difference between the voltages

VA and VB. Initially, when all the memristors are in the same state, the voltages VA and VB

will have the same value. The synaptic weight of the Bridge Synapse is described by the

following expressions from [10]:

positive synaptic weight if M2/M1 > M4/M3,

negative synaptic weight if M2/M1 < M4/M3,

zero synaptic weight if M2/M1 = M4/M3.

The output of the Bridge Synapse can be modelled by the equation

Vout = ψ · Vin    (1.6)

where ψ is the synaptic weight defined by,

ψ = M2 / (M1 + M2) − M4 / (M3 + M4)    (1.7)
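
For reference, Equations (1.4) through (1.7) transcribe directly into the short sketch below; bridge_output is an illustrative helper name, not code from this thesis.

```python
# Sketch of the Memristor Bridge Synapse weighting described by Eqs. (1.4)-(1.7).
# M1..M4 are the instantaneous memristances of the four bridge devices.
def bridge_output(v_in, M1, M2, M3, M4):
    v_a = v_in * M2 / (M1 + M2)            # Eq. (1.4): left voltage divider
    v_b = v_in * M4 / (M3 + M4)            # Eq. (1.5): right voltage divider
    psi = M2 / (M1 + M2) - M4 / (M3 + M4)  # Eq. (1.7): synaptic weight
    return v_a - v_b, psi                  # Vout = psi * Vin

# Example: identical memristances give zero weight; skewing them gives a signed weight.
print(bridge_output(1.0, 8e3, 8e3, 8e3, 8e3))    # (0.0, 0.0)
print(bridge_output(1.0, 4e3, 12e3, 12e3, 4e3))  # (0.5, 0.5)
```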

The Memristor Bridge Neuron is implemented by summing the output signals from

different Bridge Synapses. Differential amplifiers are used to process the weighted inputs

from primary inputs or other neurons. The implementation of the Bridge Neuron is

described in Chapter 2.


1.6 Thesis Statement

Since the physical realization of the memristor by HP Labs in 2008, research on

memristors and their applications has been constantly gathering pace. The potential

of memristors in realizing simple and fast neuromorphic circuits is immense. As the

lithographic processes for fabricating memristors evolve, architectures and tools for circuit

realization also need to evolve.

The majority of the research on memristor based neural networks has thus far focused on

various algorithms and methodologies for implementation. The Bridge Synapse based ar-

tificial neural network presented in [9] shows a lot of promise for practical implementation

because of the simplicity in its design. In the work presented in [9], the authors focused

on illustrating the simplicity and effectiveness of using the Memristor Bridge Synapse

in tandem with the Random Weight Change algorithm for neural network implementa-

tions. They proposed to use the Memristor Bridge Synapse as the weighting element of

the neural network to which inputs were applied as voltage pulses. At the neuron level,

voltage-to-current conversion was achieved using differential amplifiers to take advantage

of Kirchhoff Current Law to sum the inputs of the neurons. The differential amplifier

along with the active load circuit form the activation unit of the neuron.

In [9], the authors tested their design by first simulating the differential amplifier and

the active load circuit in HSPICE and creating a look-up table which was then used

for training the neural network in MATLAB. The neural network circuit was created in

MATLAB using a memristor model. The error calculation and random number generation

was done by MATLAB code and the weight updates were done by changing the resistance

of the memristors in the bridge synapse based on the random numbers. They successfully

trained neural network for 3-bit parity problem, learning robot workspace and for face

pose identification.

Although [9] proves that neural networks using the Memristor Bridge Synapse for

weighting along with the RWC algorithm for training is a good approach for real-life

applications, a path for an actual realization of a chip was not described. Moreover,


on-chip training requires additional circuitry and timing becomes critical. In our work,

we focus on developing an architecture that can efficiently implement the RWC algorithm

and Memristor Bridge Synapses to create a hardware neural network that can be trained

completely on chip. We have made modifications to the design of the neuron and activa-

tion function in [9], but the training algorithm and weighting methodology remains the

same.

Our architecture is composed of the neural network circuit realized using Memristors

and differential amplifiers. The architecture also incorporates a microcontroller, which

is responsible for measuring and calculating the output error and supplying the random

training signals and timing signals to the neural network. We designed and implemented

circuits to supply the random inputs and apply them to each individual Memristor Bridge

Synapse during training.

We also developed a placement and routing tool to realize the architecture on a

physical layout. The tool takes the number of inputs, hidden layers and outputs as its

input and generates a physical layout with interconnections between neuron blocks on

different layers. Since layout libraries for memristors are not available yet, the placement

and routing tool is only a prototype to illustrate how the architecture would appear on

a layout and to gather an approximation of the area occupied by a specific network.

The majority of the simulations in this work were performed using HSPICE. SPICE-level

simulations are the best available approximations to actual circuit behavior in hardware.

Simulations were performed for individual components of the architecture and complete

neural network circuits. We also developed a simulator to train a small neural network

in HSPICE using Perl. Perl mimicked functions of the microcontroller such as supplying

random inputs, clock signals etc. by generating PWL inputs to the HSPICE circuit. A

neural network with 2 inputs, 3 hidden layer neurons and 1 output layer neuron success-

fully learned the OR-gate function in HSPICE.

The aim of our work was to develop an architecture suitable for implementing mem-

ristor based neural networks on chip. With the core of the neural network implemented in

HSPICE using real components and only minimal functionality simulated using software,


we were able to show that our architecture is well suited to be realized on a chip.

1.7 Thesis Overview

The remainder of this document is organized in the following manner: Chapter 2 discusses

the architecture for implementing neural networks with the Memristor Bridge Synapse.

The Chapter describes the various components of the architecture and their functions. An

overview of the functioning of the neural network and the bit-slice design of the synapse

is also presented in this Chapter.

The placement and routing tool for the architecture layout is described in Chapter

3. This Chapter explains the algorithm and the implementation of the tool and presents

and discusses the output. The Chapter also discusses how the tool is designed to produce

layout for varying number of neurons and neural layers.

Chapter 4 describes the experimental setup, observations and analyzes the results of

the experiments conducted at different abstractions of the neural network design. All

components of the neural network are simulated both individually and as a full circuit.

The power calculations and estimations for neural network training and normal operation

are also presented in this Chapter. The conclusions drawn from this thesis and future

work are described in Chapter 5.


Chapter 2

The Memristor Neural Network

Architecture

The primary focus of this thesis is to develop an efficient hardware architecture to im-

plement the memristor based artificial neural networks described in [9]. This Chapter

focuses on describing our architecture and the various components of the neural network

system and their functions. The architecture is best explained with the help of examples.

In this thesis, we have used two different neural networks for simulations at different

levels of abstraction. A small neural network that aimed to learn the OR-Gate problem

was used in simulations to verify the functionality of the Memristor Bridge Synapse and

other components and the entire architecture at the SPICE level. A much larger neural

network for face pose identification, explained in [9], was simulated using Python to verify

the functioning of the large Memristor Bridge Synapse based neural networks for more

practical applications.

Figure 2.1: Sample input for face pose identification problem [11].


2.1 Architecture Overview

Image recognition is a popular application of artificial neural networks and the memris-

tor bridge synapse based artificial neural networks are efficient in learning functions of

this kind. We illustrate the working of the neural network architecture using the face pose

identification problem discussed in [9].

The sample inputs to the neural network for the face pose identification problem are

shown in Figure 2.1. The images for face recognition are available for download from

CMU [11]. In this problem, the network tries to learn the direction in which the face of

the subject in the image is oriented. There are four face poses that the network aims to

learn: left, right, straight and up as depicted in Figure 2.1 (a) through (d). The images are

greyscale with 32x30 resolution. Figure 2.2 shows a representation of the neural network

used for this problem.

Figure 2.2: Three layered neural network for face pose identification.


The network has a total of 960 (32*30) inputs, 10 hidden layer neurons and 4 output

neurons. Every neuron in one layer is connected to every neuron in the succeeding layer.

The network consists of a total of 9640 memristor bridge synapses. The circuit produces

an output of [1 -1 -1 -1], [-1 1 -1 -1], [-1 -1 1 -1] and [-1 -1 -1 1] for left, right, straight

and up orientations of the subject’s face. In Figure 2.2, the input layer neurons are

only a representation of the fan-out of the external inputs to multiple memristor bridge

synapses. No function is applied to the inputs at the input layer neurons.
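
As a quick sanity check on the synapse count quoted above, every input-to-hidden and hidden-to-output connection in the fully connected network carries one bridge synapse:

```python
# Synapse count for the fully connected 960-10-4 face pose network described above.
inputs, hidden, outputs = 960, 10, 4
synapses = inputs * hidden + hidden * outputs   # one Memristor Bridge Synapse per connection
print(synapses)   # 9640
```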

Figure 2.3: Memristor based neural network architecture for face pose identification.

For this neural network design, we can see that the hidden and output layers are smaller

than the input layer. In this particular example of face pose identification, the number of

input neurons is almost 100 times the number of hidden neurons, and the number of output

neurons is less than half the number of hidden neurons. A chip

for such a neural network can have close to 1,000 pins and the architecture in Figure 2.3

is designed keeping the constraint of connecting the pins to internal signals in mind.

Since all inputs go to all neurons, each neuron block in the middle layer receives the

inputs from a bus. A neuron block consists of as many Memristor Bridges as inputs to the

block (960 Memristor Bridge Synapses in this example) and three operational amplifier


circuits, two for summing and one for difference. The middle layer neuron blocks are

placed close to the periphery of the chip on three sides and the output is drawn out from

the fourth side.

The input layer bus (consisting of 960 wires in the example) is placed around the

middle layer neuron blocks. This way, the inputs from the pins can easily be supplied to

the bus and the bus lines can be conveniently accessed by each neuron block. The middle

layer bus will have as many lines as middle layer neuron blocks (10 in this case). The

output of each neuron in the middle layer is connected to the bus and supplied to output

layer neuron blocks. The outputs of the output layer neuron blocks are connected to the

microcontroller, which reads the values generated by the network, calculates the error

and performs training. The outputs can also be tapped out through other pins on the

chip.

2.2 Architecture Components

With respect to the description of the architecture in Figure 2.3, the components of the

neural network can be categorized as internal and external to the neuron block. The

components that are external to the neuron blocks are the connection buses and the

microcontroller. We first describe the components internal to the neuron block and then

move onto the components external to it.

We describe the internal components of the neuron block with the help of a simple

neural network. Figure 2.4 shows an artificial neural network with two input layer neu-

rons, two hidden layer neurons and one output layer neuron. The aim of this neural

network is to learn the OR-Gate function. The training inputs are applied through the

nodes IN1 and IN2. There are a total of five neurons, N1 through N5 and six memristor

bridges, BR1 through BR6 in this network. Each neuron in one layer is connected to

every neuron in the succeeding layer. The neurons N1 and N2 are only representations

and do not apply any function on the inputs. The applied inputs fan out from N1 and

N2 neurons to different bridges. For example, the input supplied at IN1 fans out to


bridges BR1 and BR2. Each memristor bridge produces two output components, the

VA component and the VB component. These components are represented by the two

lines that originate from each bridge synapse and go into the neuron where summing

logic is implemented.

Figure 2.4: Simple three-layered neural network.

2.2.1 Neuron Block

2.2.1.1 Memristor Bridge Synapse

The Memristor Bridge Synapse is the primary component of the neural network and

takes up the most area on the chip. Each memristor is about 50 nm x 50 nm in size. A single

memristor bridge requires about 200 nm x 200 nm of area after including the routing

between the memristors. The biggest network simulated in this work has almost 10,000

bridge synapses.

Figure 2.5 shows the design of a Memristor bridge synapse. The input to the bridge

is applied at one end (node IN) and the other end is tied to ground. As discussed in

Chapter 1, the two memristors connected on either side of node A are arranged such that

one of them will be forward biased and the other reverse biased when a voltage is applied

at node IN. The same logic applies to the memristors connected on either side of node

B, the only difference being that their orientation with respect to node IN is opposite

to that of the other two memristors on either side of node A in the bridge. The nature

of this arrangement ensures that the voltage drop at either node A or node B will be

greater than the other.

Figure 2.5: Memristor Bridge Synapse design.

Because of this arrangement, when the voltage drop at one node
increases, the drop at the other node decreases. It also ensures that the total resistance of

the memristor bridge is a constant and brings a symmetry to the weight supplied by the

bridge.

The weight supplied by the bridge synapse is the difference between the node voltages

(VA − VB). The weight is changed by supplying either a positive or negative voltage at

IN. For the bridge in Figure 2.5, a positive voltage pulse at IN will result in the decrease

in the resistances of the memristors M1 and M4, and an increase in the resistance of M2

and M3. Consequently, the voltage drop at A will increase and voltage drop at B will

decrease as explained using equations 1.4 and 1.5. On the contrary, if a negative voltage

pulse is applied at IN, the voltage drop at A will decrease and that at B will increase.

The weight supplied will either be positive or negative depending on whether VA or VB

is greater.
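As a brief hedged sketch (assuming, consistent with equations 1.4 and 1.5 referenced above, that node A lies between M1 and M2 with VA taken across M2, and node B lies between M3 and M4 with VB taken across M4), the node voltages are simple voltage dividers:

V_A = [M_2 / (M_1 + M_2)] · V_IN,    V_B = [M_4 / (M_3 + M_4)] · V_IN,    weight = V_A − V_B

Decreasing M_1 and M_4 while increasing M_2 and M_3 therefore raises V_A, lowers V_B and moves the weight in the positive direction, matching the behavior described above.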

It is interesting to note that both the evaluation and training pulses are applied to the
memristor bridge through the same node. The question arises as to how the evaluation
input would affect the resistance of the bridge, and in turn its weight, if both
are applied from the same node. From experiments conducted, we observed that if

the pulse width of the input is within 1 ms, it does not bring any notable change to the

resistance of a memristor. Moreover, to ensure that the evaluation pulse does not alter

the resistance of the bridge synapse, the evaluation pulse is supplied as a complement,


e.g., if the value applied at one of the inputs of the neural network is +1V, a -1V pulse is applied for
the same duration as the +1V input during evaluation to reverse any change to the resistance
caused by the input pulse. To change the resistance of a memristor by 40 Ω, a pulse of

width 250 µs was required. The experiments and observations are described in detail in

Chapter 4.

All connections between neurons in the network are established using the memristor

bridge synapse. While training is in progress, each memristor is applied a training pulse

based on the random number that was generated for it. The circuitry for applying random

pulses will be discussed in a later section.

The activation function at the neuron is implemented by the operational amplifiers

using a summing logic. The neuron receives its input from the various bridges that are

connected to it. Each bridge supplies two voltage components (VA and VB). The neuron

first sums these two components individually and then evaluates the difference between

the two sums.

Figure 2.6: Summing logic for neuron N3 from Figure 2.4.

Figure 2.6 shows the summing logic implementation of neuron N3 from the circuit in

Figure 2.4. Each bridge synapse has two output components, the VA component (voltage

from node A) and VB component (voltage from node B). At the neuron, the individual

VA and VB components are summed together first. After the summing is complete, the

difference of these individual summed values is evaluated. This evaluated voltage value


will be the output of the neuron. VASUM and VBSUM in Figure 2.6 are evaluated by

summing the VA components and VB components from memristor bridges BR1 and BR3.

After the summation, the difference is evaluated by subtracting VBSUM from VASUM .

The difference gives the output N3OUT of neuron N3.

Both the summing and difference logic is implemented with the help of operational

amplifiers. Each neuron contains three operational amplifiers, two for each summing cir-

cuit and one for the difference circuit. The implementation of the summing and difference

circuit are explained in the following section.

2.2.1.2 Summing Amplifier

The summing operation is implemented by using a voltage average circuit along with

an operational amplifier as depicted in Figure 2.7. Note that the resistors used along

with the amplifier circuits are normal resistors and not memristors. The memristors

are used only to design the bridge synapses.

Figure 2.7: Summing circuit using voltage average and operational amplifier circuits.

The voltage averaging is accomplished by
connecting the input voltages to resistors of resistance R. The other end of these resistors

are connected to the same node. For example, in Figure 2.7, the voltages VA from BR1

and BR3 are connected to two resistors of resistance R. Now, the voltage at node S1

will be the average of the two input voltages. To get the sum from the averaged voltage,

the voltage at node S1 needs to be multiplied with the total number of inputs to the

summing circuit. This is accomplished by adjusting the gain of the operational amplifier.
In the circuit in Figure 2.7, the operational amplifier is in the non-inverting configuration,


whose gain is determined by the two resistors R1 and R2.

The gain for this particular amplifier is two, since there are two inputs to the summing

circuit. VASUM will automatically be generated after the circuit receives the input

voltages. For a neuron which has n inputs, the operational amplifier will be configured

to have a gain of n. The gain of the amplifier, once fixed, does not have to be altered

during the operation of the neural network. An important point to note here is that the

output VASUM of the summing circuit is limited by the supply voltage of the operational

amplifier. In the case of the circuit in Figure 2.7, the output voltage will be within -1V

to +1V.
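As a small worked sketch of this gain selection (assuming an ideal op-amp in the standard non-inverting configuration and n equal averaging resistors of value R), the averaged node voltage and the restored sum are:

V_S1 = (VA from BR1 + VA from BR3 + ...) / n,    gain = 1 + R2/R1 = n    ⇒    VASUM = n · V_S1

For the two-input circuit of Figure 2.7, a gain of 2 simply corresponds to choosing R2 = R1.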

2.2.1.3 Difference Amplifier

Figure 2.8: Difference circuit using differential amplifier.

The implementation of the difference amplifier is much more straightforward. The

differential amplifier circuit used for this operation is shown in Figure 2.8. It is configured

with a gain of 1; the input VASUM is supplied to the non-inverting input and VBSUM
is supplied to the inverting input. All the resistors in the circuit have the same value.

The circuit essentially does the operation N3OUT = VASUM - VBSUM .


2.2.2 Microcontroller

The microcontroller is one of the key components of the architecture. It is responsible

for implementing the training algorithm by supplying all the necessary signals to the

memristor bridge synapses and the neurons. The microcontroller additionally generates

the random numbers required to train the weights of the bridge synapses.

2.2.2.1 Signals Generated by the Microcontroller

The microcontroller contains the logic for generating the control signals that are required

to train and operate the neural network. There are three control signals: update/evaluate,
shift in and clk.

1. update/evaluate: This signal decides whether the neural network is in weight

update or evaluation mode. When update is high (update = 1), the network is in

weight update mode. The microcontroller supplies this signal to activate the weight

update process by enabling the +1V and -1V power rails. When the signal is low,

the network is either evaluating its output using the supplied external input, or is

in an idle state. When the network is in idle state, all bridge synapses and neurons

are undriven.

The update signal is also used to isolate the memristor bridges from the operational

amplifiers during the weight update phase. The isolation of the operational ampli-

fiers is very important to ensure that the training pulse on one memristor bridge

synapse is not propagated forward to the next layer. This is done by disabling the

input power rails to the differential amplifier through power gating.

2. shift in: The random numbers for each memristor bridge synapse are supplied using

this signal. Each bridge requires a random signal (either 0 or 1) and this random

number is generated and supplied by the microcontroller. The random numbers are

passed on to a shift register that is connected to the bridge synapses. Each bridge

synapse will have one D flip-flop associated with it to supply the random number

for training. The random numbers are supplied to the shift register through the


shift in line. There may be more than one shift in line depending on the size of the

neural network and the number of shift registers implemented.

3. clk: The clk is the global clock supplied to the entire neural network and is used
for supplying the random numbers to all the flip-flops in the network. This signal
is activated only when the random inputs are supplied to the neural network for

training.

2.2.2.2 Functions of the Microcontroller

The microcontroller is the central component of the architecture. It is responsible for

supplying the input, training signals, updating weights and evaluating the output of the

network. It ensures that all the components in the network function in a synchronized

manner.

Figure 2.9: Neuron N3 inputs and output for the neural network in Figure 2.4.

1. Synchronizing Input Application:

The microcontroller enables and disables the application of external inputs to the

neural network. The external inputs are supplied to the network for only a very

short period of time (<1ms) during the evaluation phase. The inputs should be

disabled while the training pulses are applied to the memristor bridge synapses.

Otherwise, this will lead to two strong signals driving one single node and may


even damage the circuit. The microcontroller ensures that multiple inputs will not

drive a memristor bridge synapse at any given time using the signals it generates.

2. Disabling Operational Amplifiers:

Figure 2.9 shows two input bridges BR1 and BR3 to the neuron N3 and its output

connected bridge BR5. The external input as well as the weight update pulses are

supplied to BR1 and BR3 through the nodes IN1 and IN2 respectively. The weight

update pulse for BR5 is supplied through node N3OUT. During the weight update

process, the voltage pulses supplied to BR1 and BR3 will propagate all the way to

the differential amplifiers and get amplified to generate a voltage value at N3OUT.

To avoid this scenario, the differential amplifiers are turned off while the training

pulses are supplied. Since the input and output of the differential amplifiers are

electrically isolated, the bridges themselves will remain electrically isolated. This

is ensured by gating the evaluate signal generated by the microcontroller with the

power rails of the operational amplifiers.

3. Generating Random Numbers:

The microcontroller is responsible for generating the random numbers that are

required for updating the weights of the memristor bridges during training. Each

memristor bridge requires a uniquely generated binary random number to decide

the direction of its weight update. If the random number associated with a bridge

synapse is 1, its weight is increased. If the random number is 0, then the bridge’s

weight is decreased. Each memristor bridge synapse has a D flip-flop that stores its

training input. All the flip-flops are connected as a shift register and their inputs

are supplied serially. A neural network may have more than ten thousand bridge

synapses and each will have its own D flip-flop, which makes it difficult to supply

so many random numbers in parallel.

4. Error Calculation and Processing:

Every iteration in training generates an output. The microcontroller reads this

output and compares it with an expected output already stored in its memory to


generate a mean-squared error value. This generated error is compared against an

already defined and stored threshold. If the error value obtained falls within the

threshold, then the training process will come to an end. If the generated error

is above the threshold, the error is compared with the error generated during the

previous iteration. If the comparison yields that the new error is less than the old

error, then the flip-flop values are not updated, and the weight update is done in

the same direction for each bridge as the previous iteration. If the new error is

greater than the old error, then the microcontroller generates new random values

and sends it through the shift register to all memristor bridges before the weight

updates are done.
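The decision rule described above can be condensed into a short behavioral sketch. Python is used here purely for illustration, and the threshold value is an assumed constant rather than a quantity fixed by the architecture.

import random

ERROR_THRESHOLD = 0.015  # assumed convergence target; illustrative value only

def rwc_decide(new_error, old_error, random_bits, n_bridges):
    """One Random Weight Change decision: stop when the error is below the
    threshold, keep the previous update directions when the error decreased,
    and draw a fresh random direction for every bridge when it increased."""
    if new_error < ERROR_THRESHOLD:
        return True, random_bits                      # training is finished
    if new_error >= old_error:
        random_bits = [random.randint(0, 1) for _ in range(n_bridges)]
    return False, random_bits                         # apply the next weight update

# Example: the error grew from 37.4% to 37.9%, so new directions are drawn.
stop, bits = rwc_decide(0.379, 0.374, [1, 0, 1, 0, 0, 1], 6)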

2.2.3 Shift Register

A shift register is nothing but a cascade of flip-flops that share the same clock. A shift

register is used in the neural network to supply the random inputs necessary for weight

update for the memristor bridge synapses. Each bridge synapse has a flip-flip associated

with it, which hold a 1 or 0, to dictate whether a positive or negative should be applied

to the memristor bridge for weight update. A positive pulse increases the bridge’s weight

while a negative pulse decreases it.

There will be as many flip-flops as there are memristor bridge synapses.

The number of shift registers may vary depending on the design. If there is a large

number of memristor bridges, it may be suitable to have multiple shift registers to which

the random numbers can be supplied. For example, if there are 10,000 flip-flops in the

neural network, using two shift registers of 5,000 flip-flops each, with the inputs to both shift
registers supplied in parallel, the process can be completed in half the time. However, if

there are 20 bridges, it is not desirable to have two separate shift registers due to the

logic overhead required to supply pulses to two different shift registers.
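The trade-off can be stated roughly as follows: with N_FF flip-flops split evenly across k shift registers loaded in parallel at clock frequency f_clk,

t_shift ≈ N_FF / (k · f_clk)

so doubling k halves the time needed to load one set of random bits, at the cost of extra shift-in lines and control logic.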

Although the flip-flops of the shift register are part of the neuron block, we described

it as a separate component since the shift register is formed by the interconnection of

flip-flops across multiple neuron blocks.


2.2.4 Connection Buses

The connection buses are used to supply the inputs to each memristor in the neuron

block. The same input needs to be supplied to multiple memristor bridge synapses and

we use a bus to establish this connection. For example, for the neural network in Figure

2.3, each input is supplied to 10 memristor bridge synapses inside 10 different neuron

blocks that are placed on three edges of the chip. The bus of 960 lines is laid around the 10

neuron blocks and all neuron blocks receive all 960 input signals.

2.3 Memristor Bridge Synapse Bit-Slice

The memristor bridge synapse is the most recurring element in the neural network design.

Each memristor bridge synapse requires additional circuitry to implement the weight

change and evaluation logic. This extra circuitry is composed of a flip-flop to hold the

selection logic for weight change and an additional multiplexer to select the input power

rails. Since these components need to be implemented alongside all memristor bridge

synapses, it is logical to create a bit-slice design with the bridge and the components that

go along with it. The flip-flops in different bit-slices will be connected together to form

a shift register.

Figure 2.10 shows the bit-slice for the memristor bridge synapse. A multiplexer circuit

is implemented in the bit-slice using transistor logic to supply either +1V or -1V to

the input br in of the memristor bridge synapse. The D flip-flop, which is part of the
shift register, holds the value which determines to which power rail the memristor bridge
synapse will be connected during weight update. Table 2.1 summarizes the logic.

When update = 0, node br in will either be undriven or driven by bridge input de-

pending on whether or not the microcontroller has asserted the training input signal.

When update = 0, the power rails to the multiplexer are gated and br in can only be

driven by bridge input. When update = 1, the power rails to the multiplexer are active

and the weight update pulse is automatically applied.

Figure 2.10: Memristor Bridge Synapse Bit-Slice.

Table 2.1: Training input selection logic.

update   shift out   br in
0        0           undriven / bridge input
0        1           undriven / bridge input
1        0           -1V
1        1           +1V
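A minimal behavioral model of this selection logic is sketched below; it is illustrative only, since the actual bit-slice is a transistor-level multiplexer rather than software, and the function name is hypothetical.

def bit_slice_drive(update, shift_out, bridge_input=None):
    """Voltage driven onto node br_in for the given control values (Table 2.1).
    With update = 0 the training rails are gated, so br_in is undriven unless
    the external bridge input is applied; with update = 1 the flip-flop value
    selects the +1 V or -1 V rail."""
    if update == 0:
        return bridge_input            # None models the undriven case
    return 1.0 if shift_out == 1 else -1.0

assert bit_slice_drive(1, 1) == 1.0    # weight increased
assert bit_slice_drive(1, 0) == -1.0   # weight decreased
assert bit_slice_drive(0, 0) is None   # idle, rails gated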

Combining these components together and creating a bit-slice design makes it much

more efficient when creating layout for the architecture. The neuron blocks will be com-

posed of an array of these bit-slices and the differential amplifier circuits.

2.4 Architecture in a Nutshell

The scalable hardware architecture is composed of several layers of neuron blocks. The

number of neuron blocks in each layer defines the number of neurons in the layer. Each

neuron block in a layer receives all inputs from the previous layer or the primary inputs.


The innermost layer in the architecture is the output layer, whose neuron blocks are

considerably smaller in size compared to the other layers.

Each neuron block contains as many memristor bridge synapse bit-slices as inputs

to it. Each bit-slice is composed of a memristor bridge synapse, a multiplexer circuit, an

inverter and a flip-flop. The flip-flops across multiple bit-slices are connected together to

form a shift register. The neuron blocks also contain three operational amplifier circuits

for summation and difference.

Training of the neural network is synchronized by the microcontroller. It reads the

neural network output to compute the error and generates all control signals and random

bits for training and supplies training input. Training is implemented by the Random

Weight Change algorithm, with the microcontroller comparing the output error at each

iteration with the previous iteration and deciding whether to continue weight change in

the same direction or switch to a new random direction. Training is stopped when the

output error goes below a set threshold or if the iteration limit is reached.

2.5 Summary

In this Chapter, we presented an overview of the proposed architecture for memristor

based artificial neural networks. All the individual building blocks of the architecture

were introduced and how they all piece together to form the complete hardware for an

artificial neural network was explained. In the next Chapter, we introduce a placement

and routing tool that can be used for realizing the hardware architecture presented in

this Chapter.


Chapter 3

Placement and Routing Tool for

Memristor Neural Network

Architecture

In Chapter 2, we described the hardware architecture for implementing Memristor Bridge

Synapse based artificial neural networks. In this Chapter we describe the placement and

routing tool that can be used to translate the architecture into a physical layout. The

tool is only a prototype and is designed to be modified when memristor layout libraries

become available.

3.1 Tool Overview

The hardware architecture was designed in such a way that neural networks with a large

number of inputs can be easily implemented on a chip. In Chapter 2, we described the

architecture with the help of an example neural network with 960 inputs, 10 hidden layer

neurons and 4 output layer neurons with complete connectivity. This neural network will

be composed of a total of 9640 Memristor Bridge Synapses, 38,560 memristors, 9640 flip-

flops, 9640 voltage multiplexers, 9640 inverters and 42 differential amplifiers. Realizing

such a large circuit with over 65,000 components requires efficient floorplanning and

routing. The tool also needs to be adaptable in cases where the number of hidden layers is


more than one, which is the primary advantage of the hardware architecture.

We have implemented a tool using C++ to generate a layout capable of being loaded

onto Magic [21] layout editor. Magic is a free and open source VLSI layout tool initially

developed at UC Berkeley. Our placement and routing tool generates a .mag file with

layout information stored as coordinates of a two dimensional grid space that can be read

directly by the Magic layout editor. The placement and routing for the tool is done at

a level of abstraction where we illustrate only neuron blocks being placed and routed.

The neuron blocks consist of Memristor Bridge Synapse bit-slices and differential

amplifiers.

As mentioned earlier, placement and routing in Magic is based on two dimensional

grids, where the dimension of each grid is the feature size of the technology used. The

blocks and the connection wires are printed on the two dimensional grid space by speci-

fying what material should be printed on each grid. Once the placement and routing is

complete, the grid coordinates and what material it holds is written out as text to a .mag

file. In most placement and routing tools, the grid space is stored in a data structure

so that each entry in the data structure pertaining to a coordinate can hold a specific

value to indicate what material each grid space holds. Once the placement and routing

is complete, the tool would print out the coordinates and the content of each grid onto

the .mag file.

For our neural network architecture, the total grid space required for placing and

routing the neural network for face pose identification was 7680 x 11520. If a data

structure were created to store each grid in the grid space, memory would have to be

allocated to store a total of 88,473,600 grid values.

In case of most VLSI circuits, placement and routing is accomplished by implementing

already established algorithms. These tools also take a certain amount of runtime to create
and print out a layout to a .mag file depending on the size of the circuit and the routing
complexity. Our architecture, however, is basically multiple instantiations of the same

components in a specific structured way. We place and route neuron blocks on three

sides of concentric squares. By taking advantage of this symmetry in the design, we


mathematically calculate the grids on which different materials fall and directly write the

data to the .mag file. This way, we avoid the overhead of creating a large data structure

to store the contents of each grid, which in turn brings a significant reduction in runtime.

Analysis of the tool performance is described in a later Section in this Chapter.
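To make the idea of emitting geometry directly concrete, the sketch below writes neuron-block rectangles straight into a .mag file without building a grid array first. It is only an illustration: the actual tool is written in C++, and the header fields and layer name used here are assumptions rather than a description of its output.

def write_mag(path, rects, layer="pdiffusion"):
    """Write axis-aligned rectangles (llx, lly, urx, ury), in grid units,
    to a Magic-style .mag file. Sketch only; header and layer names are
    illustrative assumptions."""
    with open(path, "w") as f:
        f.write("magic\n")
        f.write("tech scmos\n")
        f.write("<< %s >>\n" % layer)
        for llx, lly, urx, ury in rects:
            f.write("rect %d %d %d %d\n" % (llx, lly, urx, ury))
        f.write("<< end >>\n")

# Example: two 20 x 20 neuron blocks separated by a 5-grid offset.
write_mag("layout.mag", [(0, 0, 20, 20), (25, 0, 45, 20)])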

In the following sections, we describe the tool flow and the algorithm, layout com-

ponents, analysis of the output and how the tool produces the layout for networks with

different number of hidden layer neurons. We also discuss the tool run-time analysis and

the area consumed by the layout.

3.2 Tool Flow

The placement and routing tool takes the number of inputs, number of hidden layers

and hidden layer neurons and the number of output layer neurons as its input. From the

description of the architecture in Chapter 2, we can see that neuron blocks are placed on

three sides of a square shaped chip and the outputs are drawn out to the fourth side. The

input layer neurons are placed close to the periphery of the chip and the output layer

neurons at its center. The hidden layers will be placed between the input and output

layer neurons.

The tool only places and routes the neuron blocks discussed in Chapter 2. Each

neuron block will contain three operational amplifier circuits and as many Memristor

Bridge Synapse bit-slices as inputs to the layer in which the block is present.

The tool starts by first placing the output layer neuron blocks on three sides of the

innermost area of the chip. The placement and routing in Magic as discussed in the

previous section is on two dimensional grids. The materials on layout are printed on

the two dimensional grid space by specifying the bottom left and top right corner of

a rectangle and the type of material occupying this rectangular area. The number of

blocks to be placed at the top, bottom and left sides are derived by the following simple

formulae:

No. of blocks on top side = Total no. of blocks / 3

No. of blocks on left side = Total no. of blocks / 3 + (Total no. of blocks) % 3

No. of blocks on bottom side = Total no. of blocks / 3

where the division is integer division.
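A direct transcription of these formulae (illustrative only):

def blocks_per_side(total_blocks):
    """Distribute the neuron blocks of a layer over the top, left and
    bottom sides, assigning any remainder to the left side."""
    per_side = total_blocks // 3
    return {"top": per_side,
            "left": per_side + total_blocks % 3,
            "bottom": per_side}

# Example: 10 blocks -> 3 on top, 4 on the left, 3 on the bottom (Figure 3.1).
assert blocks_per_side(10) == {"top": 3, "left": 4, "bottom": 3}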

Figure 3.1: Placement of 10 blocks of output layer on layout represented with p-diffusion.

For simplicity, we abstract the components inside the neuron block and represent

each neuron block with p-diffusion on the Magic layout editor. The size of each block

is estimated by the number of inputs to the block. For example, if a particular layer

receives n inputs from either the previous layer or the primary input to the network, the

size of each block is set to be 2n x 2n grids. The blocks are separated with an offset

value of 5 grid spaces. The separation of blocks with an offset is only done for illustrating

the placement and routing of blocks. For an actual implementation, the blocks can be

placed next to each other. Figure 3.1 shows 10 neuron blocks of dimension 20 x 20


grids and each block receives 10 inputs from either the previous layer or primary input.

Figure 3.2: After routing of input bus for placed blocks in Figure 3.1.

After placing the blocks, the tool creates a bus on the outside of the layer of neuron

blocks that spans all neuron blocks in the layer. Since each neuron block requires all
inputs from the previous layer, the bus is used to supply the inputs. The inputs to the
bus come from either the output of the previous layer or the primary input. Figure 3.2

shows the updated layout in Figure 3.1 after input bus routing.

Once routing of the input bus is complete, the tool routes the output of the neuron

layer. If the layer under construction is the output layer, then the wires are drawn to

connect the output to the pins. For hidden layers under construction, the output of the

layer will be connected to the input bus of the next layer. Note that first the output layer

is routed followed by the hidden layers and finally the input layer.

Figure 3.3: Completed placement and routing for neural network with 30 inputs, 10 hidden layer neurons and 10 output layer neurons.

After construction of one layer is complete, the tool picks up the next higher layer in
the network and repeats the same process. On finishing construction of all layers of the

neural network, the tool routes the control signals and power rails through the blocks.

Figure 3.3 shows a completely routed 3-layered neural network with 30 inputs, 10 hidden

layer neurons and 10 output layer neurons.

The flowchart in Figure 3.4 summarizes the tool flow for placement and routing. In the

next section, we analyze the output of the layout tool for a few example neural networks.

3.3 Output and Performance Analysis

In the previous section, we presented the flow for our placement and routing tool and

showed the final layout for a neural network with 30 inputs, 10 hidden layer neurons and

10 output layer neurons.

Figure 3.4: Flowchart showing the tool flow for placement and routing.

In this section, we discuss the output produced for different size

neural networks and analyze the area occupied by the layout and the efficiency. We also

briefly discuss the runtime for generating the layout.

3.3.1 Area Analysis

To analyze the area occupied by the neural networks on layout, we present layout for

three different neural networks. We discuss the total area occupied by the layout and

also show what percentage of the total area is used for placement and routing for different

sized neural networks.

The largest neural network we created using the tool was for the face pose identifi-

cation problem which received 960 inputs and was composed of 10 hidden layer neurons


and 4 output layer neurons.

Figure 3.5: Layout for face pose identification neural network with 960 inputs, 10 hidden layer neurons and 4 output layer neurons.

Figure 3.5 shows the full layout for this neural network. The

network occupies a total grid dimension of 9642 x 15393 grids. It is clearly visible from
the figure that a large percentage of the total area is left unused. The reason for this

is the nature of the neural network implemented. Due to the large number of inputs to

the network, each neuron in the hidden layer receives 960 inputs, which means that there

are 960 Memristor Bridge Synapses associated with each neuron. When we compare this

number to the output layer neurons, each of which receives only 10 inputs from the previous
layer, the size of the neuron blocks in the output layer turns out to be very small.

Figure 3.6 shows the output layer of the neural network after zooming into the center of

the layout.


Figure 3.6: Output layer layout for face pose identification neural network.

In Figure 3.7, we show the output layout of a neural network with 80 inputs, 12 hidden

layer neurons and 15 output layer neurons. Table 3.1 compares the total area of layout

for different neural networks for different technology nodes and Table 3.2 shows

the total unused area in the layouts for the neural networks in Table 3.1.

Table 3.1: Comparison of total layout area for neural networks for different technology nodes.

                                                                                  Area (mm2) for 2λ =
Inputs   Hidden layer neurons   Output layer neurons   Grid Dimensions   45 nm          32 nm          22 nm
960      10                     4                      9642x15396        0.075          0.038          0.018
80       12                     15                     1007x1313         6.60 × 10^-4   3.38 × 10^-4   1.60 × 10^-4
30       10                     10                     342x513           8.88 × 10^-5   4.49 × 10^-5   2.12 × 10^-5

Figure 3.7: Output layer layout for neural network with 80 inputs, 12 hidden layer neurons and 15 output layer neurons.

Table 3.2: Fraction of unused area in layout for different neural networks.

Inputs   Hidden layer neurons   Output layer neurons   Grid Dimensions   Unused area
960      10                     4                      9642x15396        35%
80       12                     15                     1007x1313         40%
30       10                     10                     342x513           41%

From Table 3.2 we can see that a significant area of the layout is unused. This area can

be used to implement other logic that is required for the operation of the neural network.

For example, generation of random numbers can be accomplished by implementing a

linear feedback shift register (LFSR) within the circuit. This way, the amount of time

required to shift in the random numbers to the shift register can be significantly reduced.

3.3.2 Runtime Performance

Since we take advantage of the symmetry of the architecture and multiple instantiations of

the same components to create the layout using mathematical and geometric calculations
that require little memory access and processing, the runtime for creating the layout is
very small. The largest layout we created using the tool was the neural network for face
pose identification, which occupied 9642 x 15393 grids. The runtime required to


create this network on a PC with an Intel Core i3-370M processor at 2.40 GHz clock speed

was less than 0.2s. Since the runtime for the tool is very small, we are not reporting the

runtime for any of the other neural networks that we had created.

3.4 Scalability

Figure 3.8: Neural network with 80 inputs and 15 output layer neurons having two hidden layers with 30 neurons in the first hidden layer and 25 neurons in the second

hidden layer.

We show the scalability of the tool with the example of one neural network. This neural

network has 2 hidden layers with 30 neurons in the first hidden layer and 25 neurons in

the second hidden layer. The network has 80 inputs and 15 output layer neurons. Figure

3.8 shows the neural network layout. This neural network occupies 1974x2303 grids on

layout. When scaling the architecture to incorporate more hidden layers in the


neural network, the number of neurons in each layer needs to be carefully planned. A
reasonable ratio needs to be maintained between the number of inputs to a layer, the
number of neurons in that layer, and the number of neurons in the succeeding layer. It

needs to be noted that the number of primary inputs to the neural network needs to be

greater than the number of neurons in the first layer.

3.5 Summary

In this Chapter, we introduced and described the placement and routing tool that can be

used to realize the scalable hardware architecture for memristor based neural networks.

We gave an overview of the tool and explained how it was designed. The tool flow was

also described and an analysis of the tool output in terms of the area occupied by different

neural networks was also given. The scalability of the tool was illustrated with the help

of an example of a three-layered neural network. In the next Chapter, we discuss the

experiments, observations and results.


Chapter 4

Experimental Results and Analysis

The simulations in this work were primarily done on SPICE and Python. The basic

components such as the memristor, the memristor bridge synapse, the memristor bit-

slice and a small neural network were simulated using SPICE. Bigger neural networks

were simulated using Python, which mimicked the behavior of the basic components at

a higher level of abstraction. In this Chapter, we describe the observations and results

of the simulations in SPICE and analysis of the results. The simulation results from

Python are not presented here since they do not convey anything different from what

was reported in [9].

We build confidence in the design by first illustrating the proper functionality of the

basic components of the architecture using SPICE. We begin by describing the behavior

of the memristor followed by the memristor bridge synapse. Once these components

are described, we go on to simulate the summing and difference logic using operational

amplifier circuits. After simulating all the individual components, we first perform a

partial simulation to illustrate the training process of the small neural network. This

network will have all the basic components working together in unison. We then go on

to simulate a complete training of a neural network to learn the OR-gate function.


4.1 Memristor Simulation

Biolek et al. developed a mathematical model for the memristor, based on the findings

in [13] by incorporating non-linear dopant drift modeled using window functions [22]. In

the experiments presented here, we have used an ideal model of the memristor in the

simulations.

Figure 4.1: Circuit for Memristor simulation with Memristor M1 (Ron = 116Ω, Roff = 16kΩ) in series with resistor R1 (100Ω) and voltage source Vin.

Figure 4.1 shows the circuit used for simulating the memristor. In this circuit, memris-

tor M1 is connected in series with a resistor R1 and a voltage source Vin. The memristor

used in this simulation has on resistance (Ron) of 116Ω and off resistance (Roff ) of 16kΩ.

The series resistor R1 is 100Ω.

The circuit was simulated by supplying +1V and -1V DC voltages and the change in

the resistance of the memristor M1 was measured. The resistance of the memristor was

calculated based on the measured instantaneous current and applied voltage.
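Concretely, assuming the full source voltage Vin drops across the series combination of M1 and R1, the instantaneous memristance follows from Ohm's law:

M(t) = (Vin − I(t) · R1) / I(t) = Vin / I(t) − R1

which is how the resistance values reported in Tables 4.1 and 4.2 were obtained from the measured currents.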

The memristor in its initial state has a resistance of 1kΩ. A +1V was first applied

to terminal A and the memristor was brought to its ON state (state of least resistance).

Then a -1V was applied for a certain period of time and the memristor was brought to its

OFF state (state of most resistance). Figure 4.2 shows the waveform for this simulation.

Figure 4.2: Memristor simulation with DC voltage +1V and -1V.

It took about 9ms for the +1V pulse to completely turn the memristor ON and make
it reach Ron = 116Ω. From the Ron state, it took the -1V pulse about 20ms to bring the
memristor to near the OFF state. From Figure 4.2, we can see that the memristor does

not go completely into the OFF state for a long period of time. This is due to the non-

linear nature of the memristor. This behavior is expected since it takes a pulse of longer

duration to turn OFF the memristor when compared to turning it ON.

To observe the resistance change of the memristor more closely, we applied voltage

pulses of different duration and measured the resistance change with a view of finding a

suitable training pulse for the memristor bridge synapses. We performed two simulations

on two instances of the circuit in Figure 4.1. To one of the circuits, a positive pulse was

supplied at node A, and to the other a negative pulse was applied at node A, making

them forward and reverse biased respectively.

In the first simulation, voltage pulses of duration in milliseconds were supplied to

both instances of the circuit. Figure 4.3 shows the simulation waveforms for millisecond

pulses. Pulses of pulse-width 10ms and 5ms were applied to the circuit and the instan-

taneous initial and final currents were measured and the resistance of the memristor was

calculated. In the second simulation, pulses with duration in microseconds were applied

and similar calculations were made.

Figure 4.3: Resistance change in the memristor for millisecond input pulse-width.

Figure 4.4 shows the simulation waveforms for the
pulses of pulse-width 400µs and 250µs. The waveform in Figure 4.4 also illustrates
that applying a negative pulse for the same duration can reverse the effect of the initial

positive pulse and vice versa.

The results of the simulation are summarized in Table 4.1 and Table 4.2. Table 4.1

shows data for forward biased circuit and Table 4.2 for reverse biased. The instantaneous

initial and final currents were measured for both forward and reverse biased memristors,

and the instantaneous resistance values were calculated. Input voltage of pulse-width

10ms, 5ms, 400µs and 250µs were applied and the resistance changes were measured. The

objective of this experiment was to identify a pulse-width that would bring an optimal

resistance change in the memristor for updating the weight of the memristor bridge

synapse.

Table 4.1: Instantaneous current and resistance measurements for forward biased memristor.

Pulse-width   Iinit (A)      Ifinal (A)     Rinit (Ω)   Rfinal (Ω)   Delta R (Ω)
10ms          1.23 × 10^-4   1.59 × 10^-4   8057.9377   6176.2819    1881.6557
5ms           1.59 × 10^-4   1.93 × 10^-4   6176.2819   5083.4957    1092.7862
400µs         1.23 × 10^-4   1.24 × 10^-4   8057.9377   7989.9604    67.977314
250µs         1.24 × 10^-4   1.24 × 10^-4   7989.9604   7946.9944    42.965912

Figure 4.4: Resistance change in the memristor for microsecond input pulse-width.

From the data, we can see that pulses with pulse-width in the millisecond range bring a

very large change in the resistance of the memristors. The pulses in microsecond range

seem more suitable for weight training since the resistance change is less than 100 Ω.

This will give a wider range for the weights supplied by the memristor bridge synapse. A

positive 1V pulse of 400µs brings a decrease in the resistance of the memristor by about

68Ω and a negative 1V pulse of the same duration brings an increase of nearly the same

amount. Simulating the memristor bridge synapse will give a better idea of how the

pulse-width of the training pulse affects the weight of the neural network.

Table 4.2: Instantaneous current and resistance measurements for reverse biased memristor.

Pulse-width   Iinit (A)      Ifinal (A)     Rinit (Ω)   Rfinal (Ω)   Delta R (Ω)
10ms          1.23 × 10^-4   1.03 × 10^-4   8057.9377   9587.1065    1529.1688
5ms           1.03 × 10^-4   9.67 × 10^-5   9587.1065   10239.123    652.01678
400µs         1.23 × 10^-4   1.22 × 10^-4   8057.9377   8125.7136    67.775907
250µs         1.22 × 10^-4   1.21 × 10^-4   8125.7136   8167.1958    41.482187

4.2 Memristor Bridge Synapse Simulation

To simulate the memristor bridge synapse, we used the arrangement shown in Figure 2.5

and inserted a voltage source at node IN. Figure 4.5 shows the circuit used for simulating


the memristor bridge synapse.

Figure 4.5: Memristor Bridge Synapse circuit used for simulation.

As described earlier, the top and bottom memristors in each
branch are connected such that one is forward biased and the other reverse

biased. The output voltage of the bridge synapse is tapped from nodes A and B. The

weight supplied by the memristor bridge synapse is adjusted by changing the resistance

of the memristors on the bridge synapse. The difference in the voltage at nodes A and B

(VA − VB) is the output of the bridge synapse.

Figure 4.6: Memristor Bridge Synapse simulation waveform.

The memristor bridge synapse was simulated by applying a 1V pulse of width 400µs.

Before the pulse was applied, all the memristors were brought to their initial state with

all memristors having the same resistance. The waveforms from the simulation are shown

in Figure 4.6. The first pulse applied for 400µs is the update pulse. The second spike


in the waveform is the training input pulse applied for a much shorter duration (5ns).

We can observe that the voltage difference (VA − VB) produced by the memristor bridge

synapse is about 4.26mV after updating the weights. When the weight update pulse is

applied, the resistance of memristors M1 and M4 decrease, while that of M2 and M3

increase. This results in an increased voltage drop at node A and decreased drop at B.

Figure 4.7: Evaluation pulse applied to Memristor Bridge Synapse.

The evaluation pulse is applied to read out the output of the network and is of much

shorter duration compared to the training pulse. The 5ns pulse is short enough to not

bring any notable change in the resistance of any of the memristors in the bridge synapse.

Figure 4.7 shows the magnified part of the evaluation pulse from Figure 4.6. We can see

that the output voltages at neither node A nor node B changed after the evaluation pulse

was supplied. Usually, the evaluation pulse is applied with another voltage pulse of same

duration but opposite magnitude to negate any resistance change that might be incurred

during evaluation.

Table 4.3 shows the voltage difference generated for training pulses of different pulse-

widths. In each of the simulations, all the memristors in the memristor bridge synapse

were at the same initial state (R=8050Ω) before the training pulse was applied. A voltage

pulse of 1V amplitude was applied to the IN as the training input.

The memristors used in our experiments have ON resistance of 116Ω and OFF resistance of 16kΩ.

Table 4.3: Weight change for different training signal pulse-widths for memristor bridge synapse.

Pulse-width (µs)   VA (V)    VB (V)    VA - VB (V)
400                0.50213   0.49787   0.00426
800                0.50426   0.49574   0.00852
1200               0.50638   0.49362   0.01276
2400               0.51277   0.48723   0.02554
4800               0.52499   0.47501   0.04998

In terms of magnitude, the minimum and maximum output voltage the

memristor bridge synapse can produce is 0V and 0.9928V, assuming that the maximum

input voltage is 1V. This means that if a 400µs pulse of 1V magnitude is used as the train-

ing pulse, each bridge synapse will have about 466 possible weights. If a 4800µs pulse

of 1V magnitude is used, there would be around 40 possible weights for each memristor

bridge synapse. The length of the training pulse should be chosen based on the requirements of the
function the neural network is attempting to approximate. For certain problems, it would
be effective to have a larger number of available weights, while for others a smaller number

could be more efficient.
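The weight counts quoted above follow from dividing the available output swing by the per-pulse step, a rough estimate that assumes uniform steps:

No. of weights ≈ (0.9928 − (−0.9928)) / ΔV = 1.9856 / 0.00426 ≈ 466 for the 400µs pulse, and 1.9856 / 0.04998 ≈ 40 for the 4800µs pulse.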

4.3 Memristor Bridge Synapse Bit-Slice Simulation

The Memristor Bridge Synapse Bit-Slice was simulated to test its functionality. The

bit-slice design as described in Figure 2.10 is composed of the memristor bridge synapse,

a flip-flop and a multiplexer circuit to choose between +1V and -1V training pulse. The

output of the flip-flop controls whether the multiplexer supplies a +1V or -1V to the

input of the memristor bridge synapse during training.

A simple simulation was done to verify the functionality of the memristor bridge

synapse bit-slice. The experiment results are shown in the waveform in Figure 4.8. The

update signal is used to control the weight update and training input application. When

update is low, the +1V and -1V rails used for weight update are gated and driven to GND.

This ensures that there will be no leakage current through the multiplexer circuit that

would otherwise alter the weight of the bridge. When update is high, the power rails are


active and the weight update process is activated. The weight-update pulse is applied

to the memristor bridge synapse as long as the update signal is high. During the weight update

phase, the training input is not applied.

Figure 4.8: Memristor Bridge Synapse Bit-Slice simulation waveform.

In the simulation waveform in Figure 4.8, the update signal is initially kept low. A

high signal is produced at the input of the flip-flop and a clock signal is applied. Once the

clock is applied, the output of the flip-flop (scan out) becomes high. After a small time

gap, the update signal is made high and the weight update process is activated. When the

weight update is completed, a small pulse is applied through the training input port to the

memristor bridge synapse to evaluate the voltages at nodes A and B. When evaluation is

complete, a low signal is produced at the flip-flop's input and the same process is repeated.

In this experiment, a weight-update pulse of 30ms was applied while the flip-flop

output held a high value and a 20ms pulse was applied when it held a low value. The

waveform in Figure 4.8 shows that the bit-slice circuit is functioning correctly.


4.4 Simple Neural Network Simulation

To illustrate the working of the memristor based artificial neural network as a system of

the basic components working together, we simulated the training of the simple memristor

based artificial neural network in Figure 2.4 explained in Chapter 2. This neural network

aims to approximate the OR-Gate function. The components of this neural network

include six memristor bridge synapses (Figure 2.5), six summing amplifiers (Figure 2.7)

and three difference amplifiers (Figure 2.8). The elements of the memristor bridge synapse

bit-slice like the D flip-flop, multiplexer etc. are ignored in this simulation for simplicity.

The weight update pulses are supplied directly to the bridge synapse terminals for this

simulation using separate voltage sources.

Figure 4.9: Neural network training input application and output evaluation.

The initial conditions of the memristors were changed for better clarity in illustrating

the functioning of the network. Each memristor is set to have a different initial resistance.

Figure 4.9 shows the simulation results of the circuit functioning. In the waveform, signal

n5out is the output of the neural network. Signals in1 and in2 are the two input training

signals to the neural network. update is the signal used to switch between weight update


and training input application. The training input is applied to the circuit when update

is low. The weight update process happens automatically when update goes high.

For the first step, update is made low and the training pulse is applied. The training

inputs are given as a complementary pair of pulse-width 1µs. The output is only measured

and evaluated during the first 1µs. The second complemented input is applied to restore

any change caused to the memristors due to the application of the input.

Figure 4.10: Neural network weight update pulse application.

After the training inputs are applied and output measured, the output error is calcu-

lated by comparing the obtained output with the expected output. On taking the mean

squared error of the output, we see that the output error is about 37.4%. Since this error

is much greater than desired, we apply weight update pulses to the memristor bridge

synapses. For the first training iteration, random weight update pulses are applied to

each of the memristor bridges in the network. There are six memristor bridges in this

particular example network and six individual pulses are applied to train each of the

memristor bridges. Figure 4.10 shows the weight update pulses applied to the memristor

bridges.

The first set of random weight update pulses applied to the memristor bridge synapses


are [-1 1 1 -1 -1 -1] to [BR1 BR2 BR3 BR4 BR5 BR6]. Each weight update pulse is applied

for a duration of 400µs. The pulse changes weights of the memristor bridge synapses in

either positive or negative direction depending on whether a positive or negative voltage

is applied. After the weight update pulses are applied, the training input is applied and the

network is evaluated to measure the output error. We see that the output value during

evaluation is 0.38426V and the output error is 37.9%. The output error has increased

after the first set of weight update pulses were applied. The RWC algorithm recommends that

a new set of random weight update pulses should be applied to the memristor bridge

synapses if the output error increased compared to the previous iteration. A new set

of voltages, [1 -1 1 1 1 -1] is applied to the memristor bridge synapses and an output

voltage of 0.38608V is obtained on evaluation. The new output error is 37.6%, which is

lower than the previous iteration. So, the same weight update pulses are applied again

until the error either increases or reaches the expected value. Figure 4.11 (a)-(d) show

evaluation pulses magnified.

Figure 4.11: Neural network output at evaluation during different iterations.


4.5 OR-Gate Training in SPICE

4.5.1 Experimental Setup

To show a complete training simulation of the Memristor Based neural network designed

using the hardware architecture described in this thesis, we created a training simulator

using HSPICE and Perl. The simulator tries to mimic the behavior of the neural network

on hardware as closely as possible. SPICE is the closest and most accurate simulation

technique available to simulate electric and VLSI circuits.

Figure 4.12: Flowchart showing tool flow for neural network training simulator in SPICE.


In our simulator, the neural network is defined using a SPICE circuit. All the logic

components of the neural network are modeled at the transistor level in SPICE except

for the differential amplifier, for which we have used an ideal model. For the memristor,
we have used the ideal memristor model from [23]. Perl mimics the operations of the
microcontroller by generating the control signals and supplying the training input.

The flowchart in Figure 4.12 shows the simulator flow. The simulator is basically a

wrapper around HSPICE created using Perl. All interactions of the user are through the

command line interface to the Perl script. The simulator receives the number of inputs,

hidden layer neurons and output layer neurons along with pointers to files containing the

training inputs and the expected output. The user can also supply the error threshold and

a maximum iteration count for terminating the simulation. The simulation will terminate

if the output error goes below the error threshold or if the number of iterations of training

reaches the limit.

The Perl script first reads the training input and expected output from their

respective files and stores the information in a data structure. It then creates a SPICE

file of the complete neural network with all of its components using the specifications

provided by the user. Note that the SPICE file contains only the neural network and

none of the functions of the microcontroller are implemented in SPICE.

For the first iteration, the SPICE file contains only the training input supplied and all

other control signals are made inactive. The Perl script includes instructions to sample

the output of the neural network when the training inputs are applied to the network

as voltages. The output voltage values are stored in a log file generated by HSPICE. At

the end of the HSPICE simulation for an iteration, the HSPICE tool is given a directive

to store the circuit state to a file that can be loaded by another SPICE file to begin its

simulation from where the previous iteration had finished.

Once the simulation is complete, Perl reads the network output voltage values from

the HSPICE log file and compares it with the expected output to compute the output

error. For the first iteration, the error for the previous iteration is saved as 0.

Figure 4.13: Neural network output for learning OR-gate function at the start of simulation.

The wrapper checks if the new error is greater or less than that of the previous iteration. If
the new error is greater, then the wrapper generates random bits for training the neural

network. It modifies the PWL voltage inputs defined in the SPICE file to update the

weights of the neural network. It also adds lines to reload the circuit state at the end of

the previous iteration and calls the HSPICE simulator for the next iteration. If the new

error is found to be less than that of the previous iteration, then the wrapper first checks
if the new error is less than the error threshold. If this is found to be true, the training
simulation will end. If the new error is greater than the error threshold then the wrapper

starts the next iteration of HSPICE simulation.
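The iteration logic described above can be summarized in the following Perl sketch. This is a simplified outline rather than our exact wrapper: run_hspice(), read_output_error() and update_pwl_sources() are hypothetical stubs standing in for the HSPICE invocation, log parsing and netlist editing steps explained in the text.

    use strict;
    use warnings;

    # Illustrative parameters only.
    my $num_synapses    = 9;
    my $max_iter        = 1000;
    my $error_threshold = 0.015;

    my $prev_error = 0;                          # "previous" error starts at 0
    my @direction  = (0) x $num_synapses;        # weight-change bit per bridge synapse

    for my $iter (1 .. $max_iter) {
        run_hspice($iter);                       # simulate one iteration in HSPICE
        my $error = read_output_error($iter);    # mean squared error from the log file

        if ($error > $prev_error) {
            # Error increased (always the case for iteration 1): new random bits.
            @direction = map { int rand 2 } 1 .. $num_synapses;
        }
        elsif ($error < $error_threshold) {
            last;                                # error below threshold: stop training
        }
        # Otherwise the error decreased but is still above the threshold,
        # so the previous weight-change direction is kept.

        update_pwl_sources(\@direction);         # rewrite the PWL weight-update pulses
        $prev_error = $error;
    }

    # Stubs so the sketch runs on its own; the real wrapper performs the netlist
    # editing, state reloading and HSPICE invocation described in the text.
    sub run_hspice         { my ($i) = @_; print "running HSPICE iteration $i\n"; }
    sub read_output_error  { return rand(); }
    sub update_pwl_sources { my ($bits) = @_; return; }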

Since the RWC algorithm is an iterative heuristic, the output is not guaranteed to be

optimal. There is a chance that the output error may not go below the error threshold

during training. Even if the output error is only 0.1% above the error threshold, the circuit still continues training and may not find a solution even after the iteration limit has been

reached. Unlike in software, it is not possible to save a snapshot of the circuit for the best

case output and revert to the required state. Setting the circuit to a specific state would

involve changing the resistance values of many memristors. Hence, choosing a suitable

error threshold is critical in training the neural network with the RWC algorithm.


4.5.2 Observation and Analysis

We successfully trained a neural network with 2 inputs, 3 hidden layer neurons and one

output layer neuron to learn the OR-gate function. The training simulation was done

using the Perl-HSPICE simulator explained in the previous section. For this simulation,

all the memristors were initially in the same state. Hence, the weight supplied by every Memristor Bridge Synapse is 0 at the start of training. Figure 4.13 shows the circuit output for the first iteration of training. v(in1) and v(in2) represent the input pulses supplied to the neural network and v(l3in) represents the output. The input pulses are supplied as complementary pairs in order to revert the effect of the applied input. The output values obtained from the neural network were [8.2067aV 5.384nV 5.2055nV 9.736nV]

for an expected output of [0V 1V 1V 1V]. Note that all four input combinations for the

two input OR-gate are supplied to the neural network for training.
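For reference, under the assumed whitespace-separated file format described earlier, the training files for this two-input OR-gate experiment would contain the following (the file names are hypothetical):

    train_inputs.txt:
        0 0
        0 1
        1 0
        1 1

    expected_outputs.txt:
        0
        1
        1
        1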

Figure 4.14: Neural network output for learning OR-gate function for the 54th iteration of training.

In Figure 4.14 we show the output waveforms for the 54th iteration of training. It is

interesting to note that in this iteration, the output voltage for the input combination v(in1) = 1V and v(in2) = 0V is -0.012V. The error threshold for the simulation

was set at 0.015%. At the end of the simulation, the outputs obtained were [150.86fV


0.92259V 0.98883V 1V]. The simulation ran for a total of 276 iterations with random bits

being supplied for 28 of those iterations. The weight update pulses were supplied for 500

µs for each iteration. Figure 4.15 shows the output waveform from the neural network at

the end of simulation.

Figure 4.15: Neural network output for learning OR-gate function at the end of simulation.

4.6 Power and Timing Estimation

In this section, we give an approximation of the power consumption and timing of the

neural network circuit. We mathematically analyze the power consumption and timing

for the circuit during training and derive generalized formulas for estimating

these metrics for different neural network designs.

4.6.1 Power

The major contributors to power consumption in the neural network, in both training and normal operation, are the memristor bridge synapses. Since these components are resistive elements, they are likely to consume most of the power. Here, we mathematically calculate the

power consumed by a single memristor bridge synapse based on the memristor model


that we have used in our design.

The arrangement of the memristors on the memristor bridge synapse ensures that

the total resistance of the bridge synapse remains constant. When a training pulse is applied, the resistance of two of the memristors in the bridge synapse increases and that of the other two decreases. This feature makes it easier for us to calculate the power consumed

during the operation of the circuit. For our simulations, we used a memristor model

with Ron = 116Ω, Roff = 16kΩ and Rinit = 8050Ω. We assume that all the memristors

of all bridge synapses in the circuit are at the same initial state before commencing

training. This implies that the total resistance of each memristor bridge synapse is about

8050Ω at all times. With this notion, we can calculate the instantaneous power drawn by

the memristor bridge synapse during training. To update weights, we supply a positive

or negative pulse of 1V magnitude. The total instantaneous power drawn by a single bridge synapse is then given by the equations below.

Power, P = V²/R = 1/8050 W (4.1)

P = 124.22 µW (4.2)

Both the instantaneous power and the average power for the memristor bridge synapse during training are the same, since the bridge can be viewed as a DC circuit with constant overall resistance during the weight change process, even though the individual memristors may be changing their resistance values. Note that the multiplexer circuit, the inverter and the flip-flop associated with each memristor bridge synapse also consume power during training, but we ignore the power consumed by these elements since it is negligible compared to the power consumed by the bridge synapse. So, the total average power consumed during training can be generalized by the following formula.

Total Average Power, Ptot = (P ∗ number of bridge synapses) W (4.3)


Total Average Power, Ptot = (124.22 ∗ number of bridge synapses) µW (4.4)

Equation 4.3 shows the general formula for total average power for a memristor bridge synapse based artificial neural network, and Equation 4.4 shows its specific form for the memristor model used in our simulations. The neural network used to simulate the OR-

gate function described in the previous section consisted of a total of 9 memristor bridge

synapses. So, the total average power consumed by the entire network for one iteration of

training is 1.118 mW. The power consumption of the neural network during the evaluation phase of training and during standalone operation after training cannot be accurately estimated, since it depends on the input voltages supplied and the outputs of the differential amplifiers

at each neuron. But the worst case instantaneous or average power consumed by the

neural network during standalone operation will be the same as the instantaneous power

consumed during training.
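The estimate above can be reproduced with a few lines of Perl; this is only a restatement of Equations 4.1-4.4 with the values used in this section.

    use strict;
    use warnings;

    # Reproducing the estimate of Equations 4.1-4.4.
    my $v_pulse    = 1.0;                        # weight update pulse magnitude (V)
    my $r_total    = 8050;                       # total bridge synapse resistance (ohm)
    my $n_synapses = 9;                          # OR-gate network of Section 4.5

    my $p_synapse = ($v_pulse ** 2) / $r_total;  # ~124.22e-6 W per bridge synapse
    my $p_total   = $p_synapse * $n_synapses;    # ~1.118e-3 W for the whole network

    printf "P per synapse = %.2f uW, P total = %.3f mW\n",
           $p_synapse * 1e6, $p_total * 1e3;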

4.6.2 Timing

The complete timing for training a neural network cannot be accurately predicted because

of the algorithm being used to train the network. However, we can estimate the time

required for one iteration of training. Application of the weight update pulse occupies

the majority of the time in one training iteration. In our simulations, we supplied training pulses of 500µs duration to update the weights. To evaluate the new output after updating the weights, signals can be supplied in the nanosecond range. Compared to the time required for updating the weights, the evaluation time is negligible. Another contributing factor during training is the time required to shift random bits into the flip-flops when the weight change directions have to be updated. This time depends on the number of memristor bridge synapses in the network and the total number of individual shift registers in the circuit. The clock period for the flip-flops can also be in the nanosecond

range. The following equations summarize the time for training.


Time for one training iteration with random bit generation = Weight update time + Shift-in time (4.5)

Time for one training iteration without random bit generation = Weight update time (4.6)

Equations 4.5 and 4.6 show the total time required for one training iteration when

random training bits are applied and not applied, respectively. The neural network simulation for the OR-gate function required a total of 276 iterations, with 28 iterations requiring random bit generation. In our simulations, we used a clock period of 2µs to shift the values into the shift register. There were a total of 9 flip-flops in the circuit, all connected to form one shift register, which meant 9 clock cycles were required to shift in all the values. So, the total time for training the circuit in hardware would be,

Total time = [(500 + 2 ∗ 9) ∗ 28 + 500 ∗ 248]µs (4.7)

Total time = 138.50ms (4.8)

We can see from Equation 4.8 that the actual time required to train this neural network

in hardware is less than 0.15s. Even though flip-flops can operate with clock periods much shorter than 2µs, we used such a long clock period to reduce the HSPICE simulation run time by lowering the required time resolution. In a real implementation, the shift-in time for the random bits would be far less than what is reported here.
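Equation 4.7 can likewise be evaluated programmatically. The sketch below simply plugs in the parameters of the OR-gate simulation reported above.

    use strict;
    use warnings;

    # Evaluating Equations 4.5-4.7 for the OR-gate simulation (times in microseconds).
    my $t_update    = 500;    # weight update pulse duration
    my $t_clock     = 2;      # shift register clock period
    my $n_flipflops = 9;      # flip-flops forming the single shift register
    my $iter_random = 28;     # iterations that shift in new random bits
    my $iter_keep   = 248;    # iterations that keep the previous bits

    my $t_total_us = ($t_update + $t_clock * $n_flipflops) * $iter_random
                   +  $t_update * $iter_keep;

    printf "Total hardware training time = %.2f ms\n", $t_total_us / 1000;  # 138.50 ms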

4.7 Training Performance

We performed five training simulations to learn the OR-gate function using the same

neural network to analyze the performance of the training algorithm. One of the training

simulations required 276 iterations of weight updates. During training, the network


Figure 4.16: Mean squared error vs iterations for training OR-gate function.

received new random bits for weight update for 28 iterations, meaning that the output

error increased 28 times during training. The graph in Figure 4.16 shows how the mean

squared error of the output reduces during the course of training for this simulation. The

Y-axis shows the mean squared error and the X-axis shows the number of iterations.

We can see from the graph that initially the error increases and decreases a few times

before the error starts to decrease continuously. After the initial slow descent, the error increases again at around 50 iterations. Following another change in the random inputs, the

error starts to decrease steeply as the network finds a suitable direction of weight changes

to match the expected output. The error again increases and decreases before it finally

falls below the error threshold of 0.015%.

Table 4.4: Comparison of training performance for multiple simulations for training OR-gate function in HSPICE

Simulation No.   Total Weight Updates   Random Updates   Continuous Updates   Training Time on Hardware (ms)
1                276                    28               248                  138.50
2                56                     2                54                   28.04
3                693                    20               673                  346.86
4                97                     6                91                   48.61
5                732                    4                728                  366.07


In Table 4.4 we summarize the results of all five training simulations for the OR-gate. The number of iterations required for training cannot be predicted because of the randomness of the algorithm used. We see that for the second experiment the number of iterations required for training was only 54, while for the fifth experiment it was 732. When we compare the fourth and fifth simulations, we can see that the fourth one finished in 97 iterations but had more random weight updates than the fifth simulation. In the fourth simulation, the network was able to find good directions of weight change for its memristor bridge synapses, and the error curve had a steeper slope. Figures 4.17-4.20 show the mean squared error plots for simulations 2-5.

Figure 4.17: Mean squared error vs iterations for simulation 2.


Figure 4.18: Mean squared error vs iterations for simulation 3.

Figure 4.19: Mean squared error vs iterations for simulation 4.


Figure 4.20: Mean squared error vs iterations for simulation 5.

4.8 Summary

The primary objective of this Chapter was to simulate and illustrate the working of the

neural network hardware architecture in SPICE. We began by simulating the working

of the individual components of the neural network and followed these simulations up

with a full training simulation completely in SPICE. We also presented an estimation of

the power consumption of the circuit and showed the timing requirements for training a

circuit. In the next Chapter, we draw conclusions from this work and propose enhancements and extensions to the architecture presented in this thesis.


Chapter 5

Conclusion and Future Work

5.1 Conclusion

The Memristor based artificial neural networks presented in [9] employed the Memristor

Bridge Synapse to implement weights and the Random Weight Change algorithm for

training. The focus of the work in [9] was only to prove that Memristor Bridge Synapse

based neural networks can be used to learn complex functions and may be implemented on

a chip with supplementary hardware. The simulations were done in software to illustrate

the training of the neural network but a path to actual hardware implementation was

not provided.

We based our work on the finding in [9] that the Memristor Bridge Synapse is an effective system for implementing weights which, when employed with the RWC algorithm, can yield neural networks capable of learning complex functions on chip without requiring

a host computer. Our aim was to develop a complete hardware architecture to imple-

ment Memristor Bridge Synapse based artificial neural networks. First, we presented an

efficient way to place different layers of neurons to allow maximum inputs to be supplied

to the network with less routing. Then we went on to describe the primary components

of the neural network like the Memristor Bridge Synapse and the operational amplifiers

along with various other hardware components necessary to implement the training logic.

After describing the building blocks of the neural network, we went on to show how various


components could be combined to form a bit-slice structure which can be

repeated to form layers of neurons.

We developed a prototypical placement and routing tool for the proposed architecture

to illustrate how the neural network would appear on layout. The tool also gives an

approximate insight into how much area the neural network requires and the efficiency

of the architecture in utilizing the chip area.

To ascertain that the proposed architecture with all its components can successfully

implement an artificial neural network capable of learning complex functions on chip, we

performed various SPICE simulations. We first simulated all of the basic components of

the architecture individually. After verifying their functionality, we combined the basic

components to make circuits to perform the different tasks in implementing and training

the neural network and tested their functioning. We performed a complete neural network

training simulation in HSPICE to learn the OR-gate function.

Through our simulations and analysis, we were able to conclude that the hardware

architecture presented in this thesis is an effective way to implement artificial neural

networks using memristors. As the large-scale production of memristors in physical layouts becomes possible, our architecture can be directly realized on chip without requiring any additional circuitry and can be easily scaled to have several layers of neurons to learn

complex functions.

5.2 Future Work

In this section, we present a few ideas that might help improve the robustness of the

system and its ability to learn functions and reduce power consumption.

5.2.1 Implementing Stronger Activation Function

The activation function implemented in the hardware architecture simply sums the individual VA and VB voltage components of each bridge and takes the difference of the two sums. A more complex activation function can be implemented to improve the


learning process. Circuits are available to implement popular activation functions such as

the sigmoid function and these circuits can be added to the neuron. The circuit will need

to be tested to see how effective other activation functions will be when implemented

along with the Memristor Bridge Synapse and the Random Weight Change Algorithm.

5.2.2 Linear Feedback Shift Register for Random Bits

In our architecture, we had assigned the task of creating random bits for training solely

to the microcontroller. The microcontroller would serially shift in the bits to the shift

register before the training pulses could be applied. This process takes several clock

cycles depending on the size of the network. However, if the shift registers themselves

were made to generate random bit values, then the process would take far less time.

Completely new random bits could be generated in just one clock cycle and would lead

to saving a lot of time during training. The layout of the architecture also contains a lot of free space to implement the logic for creating an LFSR.
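As a rough illustration of the idea, the Perl sketch below models an 8-bit Fibonacci LFSR; the register width, tap positions and seed are arbitrary choices for the example, and an on-chip implementation would of course reuse the existing flip-flops rather than run in software.

    use strict;
    use warnings;

    # Software model of an 8-bit Fibonacci LFSR with feedback polynomial
    # x^8 + x^6 + x^5 + x^4 + 1 (a maximal-length choice). The register
    # must be seeded with a non-zero value.
    my @lfsr = (1, 0, 0, 0, 0, 0, 0, 0);

    sub lfsr_step {
        # Feedback is the XOR of the tap bits (stages 8, 6, 5 and 4).
        my $feedback = $lfsr[7] ^ $lfsr[5] ^ $lfsr[4] ^ $lfsr[3];
        unshift @lfsr, $feedback;    # shift the feedback bit into stage 1
        return pop @lfsr;            # the bit shifted out of stage 8
    }

    # Each call corresponds to one clock edge; in hardware, every flip-flop of
    # the register takes on a new value on that same edge.
    my @bits = map { lfsr_step() } 1 .. 16;
    print "@bits\n";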

5.2.3 Implementing other Hardware Friendly Algorithms

The same hardware architecture could be used to implement neural networks with a

training algorithm other than the Random Weight Change algorithm. In [24], Moerland

and Fiesler describe a few hardware-friendly algorithms for artificial neural networks.

5.2.4 Bit-slice in Layout

A layout for the Memristor Bridge Synapse bit-slice can be created and functionality can

be incorporated into the placement and routing tool to automatically place and route the

bit-slice on the layout by replacing the p-diffusion blocks that represent neuron blocks.

5.2.5 Testing with more Memristor Models

The hardware architecture was tested only for one memristor model in our experiments.

The architecture should be tested with different memristor models, which may allow a


reduction in the training pulse application time depending on the memristors’ device parameters. We tested the network with only an ideal model of the memristor. In reality, the memristor’s non-idealities might play a significant part in the efficiency of the neural network

implementation. Thorough testing of the neural network can be done when characterized

libraries of memristors become available.

5.2.6 Reconfigurable Neural Network

Our architecture when translated to a layout will have a fixed number of inputs, hidden

layer neurons and output layer neurons. Different functions require different numbers of

neurons in each layer for efficient implementation. If logic can be incorporated into the

system by which the user can choose the number of neurons for each layer, it will provide

more flexibility and robustness in implementing various types of functions.


Bibliography

[1] Wikipedia, “Memristor — wikipedia, the free encyclopedia,” 2016. [Online; accessed 4-February-2016].

[2] R. Williams, “How we found the missing memristor,” Spectrum, IEEE, vol. 45, pp. 28–35, Dec 2008.

[3] Wikipedia, “Artificial neural network — wikipedia, the free encyclopedia,” 2016. [Online; accessed 5-February-2016].

[4] M. Holler, S. Tam, H. Castro, and R. Benson, “An electrically trainable artificial neural network (etann) with 10240 ’floating gate’ synapses,” in Neural Networks, 1989. IJCNN., International Joint Conference on, pp. 191–196 vol. 2, 1989.

[5] M. Milev and M. Hristov, “Analog implementation of ann with inherent quadratic nonlinearity of the synapses,” Neural Networks, IEEE Transactions on, vol. 14, no. 5, pp. 1187–1200, 2003.

[6] J. Liu, M. A. Brooke, and K. Hirotsu, “A cmos feedforward neural-network chip with on-chip parallel learning for oscillation cancellation,” Neural Networks, IEEE Transactions on, vol. 13, no. 5, pp. 1178–1186, 2002.

[7] J. A. Starzyk et al., “Memristor crossbar architecture for synchronous neural networks,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 61, no. 8, pp. 2390–2401, 2014.

[8] M. Soltiz, D. Kudithipudi, C. Merkel, G. S. Rose, and R. E. Pino, “Memristor-based neural logic blocks for nonlinearly separable functions,” Computers, IEEE Transactions on, vol. 62, no. 8, pp. 1597–1606, 2013.

[9] S. Adhikari, H. Kim, R. Budhathoki, C. Yang, and L. Chua, “A circuit-based learning architecture for multilayer neural networks with memristor bridge synapses,” Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 62, pp. 215–223, Jan 2015.

[10] H. Kim, M. Sah, C. Yang, T. Roska, and L. Chua, “Memristor bridge synapses,” Proceedings of the IEEE, vol. 100, pp. 2061–2070, June 2012.

[11] CMU, “Neural networks for face recognition.” [Online; accessed 18-February-2016].

[12] L. Chua, “Memristor-the missing circuit element,” Circuit Theory, IEEE Transactions on, vol. 18, pp. 507–519, Sep 1971.

[13] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams, “The missing memristor found,” Nature, vol. 453, pp. 80–83, May 2008.

[14] K. Hirotsu and M. Brooke, “An analog neural network chip with random weight change learning algorithm,” in Neural Networks, 1993. IJCNN ’93-Nagoya. Proceedings of 1993 International Joint Conference on, vol. 3, pp. 3031–3034, Oct 1993.

[15] J. Misra and I. Saha, “Artificial neural networks in hardware: A survey of two decades of progress,” Neurocomputing, vol. 74, no. 1, pp. 239–255, 2010.

[16] M. L. Mumford, D. K. Andes, and L. R. Kern, “The mod 2 neurocomputer system design,” Neural Networks, IEEE Transactions on, vol. 3, no. 3, pp. 423–433, 1992.

[17] I. Bayraktaroglu, A. S. Ogrenci, G. Dundar, S. Balkır, and E. Alpaydın, “Annsys: an analog neural network synthesis system,” Neural Networks, vol. 12, no. 2, pp. 325–338, 1999.

[18] S. P. Adhikari, C. Yang, H. Kim, and L. O. Chua, “Memristor bridge synapse-based neural network and its learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, pp. 1426–1435, Sept 2012.

[19] S. P. Adhikari, H. Kim, R. K. Budhathoki, C. Yang, and J.-M. Kim, “Learning with memristor bridge synapse-based neural networks,” in 2014 14th International Workshop on Cellular Nanoscale Networks and their Applications (CNNA), pp. 1–2, July 2014.

[20] M. P. Sah, C. Yang, H. Kim, T. Roska, and L. Chua, “Memristor bridge circuit for neural synaptic weighting,” in 2012 13th International Workshop on Cellular Nanoscale Networks and their Applications, pp. 1–5, Aug 2012.

[21] “Magic VLSI Layout Tool.” http://opencircuitdesign.com/magic/. Accessed: 04-19-2016.

[22] Z. Biolek, D. Biolek, and V. Biolkova, “Spice model of memristor with nonlinear dopant drift,” Radioengineering, vol. 18, no. 2, pp. 210–214, 2009.

[23] D. Biolek, M. Di Ventra, and Y. V. Pershin, “Reliable spice simulations of memristors, memcapacitors and meminductors,” arXiv preprint arXiv:1307.2717, 2013.

[24] P. Moerland and E. Fiesler, “Neural network adaptations to hardware implementations,” tech. rep., IDIAP, 1997.
