
AN HYBRID ARCHITECTURE FOR MULTI-LAYER FEED-FORWARD NEURAL NETWORKS

by

Zulfiqar Ahmed

A Thesis Submitted to the College of Graduate Studies through the Department of Electrical Engineering in Partial Fulfillment

of the Requirements for the Degree of Master of Applied Science at the

University of Windsor

Windsor, Ontario, Canada

May 1999


National Library of Canada / Bibliothèque nationale du Canada

Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques

395 Wellington Street, Ottawa ON K1A 0N4, Canada / 395, rue Wellington, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.



© Copyright Zulfiqar Ahmed 1999


ABSTRACT

Multi-layer feed-forward neural networks have the capability to classify and generalize, which is not achievable with other methods. The complete exploitation of their potential to its full limit requires efficient hardware implementation. The two main problems of hardware realization, easy long-term storage of synaptic weights and massive interconnections, are addressed and solved by a mixed-signal architecture for the implementation of feed-forward neural networks. The hybrid architecture is analyzed and implemented in 0.5 micron CMOS technology. The analog processing blocks have been designed in current mode analog CMOS, and the synaptic weights and threshold values are stored in digital ROM.


ACKNOWLEDGEMENTS

I would like to express my deep gratitude and thanks to my supervisor, Dr. M. Ahmadi, for his support and guidance throughout the progress of this thesis. I would also like to thank Dr. G. A. Jullien for his very helpful suggestions and encouragement. I also appreciate the participation of Dr. A. Jaekel in my committee and her fruitful suggestions. Thanks must also go to my parents for their infallible support and encouragement.


TABLE OF CONTENTS

ABSTRACT
DEDICATION
ACKNOWLEDGEMENTS
LIST OF FIGURES
LIST OF TABLES

1. Introduction
1.1 Introduction
1.2 A brief history of research in neural models
1.3 Neural network models
1.3.1 Feed back or recurrent networks
1.3.2 Feed forward networks
1.4 The learning algorithm
1.4.1 Error back propagation algorithm
1.5 Implementation of neural networks
1.6 Goals, objectives and organization

2. VLSI implementation of neural networks
2.1 Introduction
2.2 Analog implementation
2.2.1 Floating gate (EEPROM) devices
2.2.2 Capacitive storage
2.3 Hybrid implementation

3. An architecture for multi-layer neural networks
3.1 A VLSI architecture
3.2 The multiplexed architecture
3.3 Operation
3.4 Input/Output requirements
3.5 Multiplexing
3.6 Pipelined architecture
3.7 Performance analysis on an example
3.7.1 Software implementation of XOR
3.7.2 Quantization
3.7.3 VLSI implementation of XOR
3.7.4 Testing of neural chip
3.8 Summary

4. VLSI circuitry
4.1 Neuron
4.1.1 Current mode neuron
4.1.2 Voltage mode neuron
4.2 Synapse
4.2.1 Multiplier
4.2.2 ROM
4.3 Trans-impedance amplifier
4.4 VLSI implementation of capacitor
4.5 Summary

5. Conclusion

Appendix A Mask Layout Diagrams

Appendix B Simulation Models

Appendix C Verilog Source Code

REFERENCES

VITA AUCTORIS


LIST OF FIGURES

Figure 1.1 Neuron model

Figure 1.2 The architecture of a feed back neural network

Figure 1.3 Multi-layer feed forward neural network

Figure 2.1 Floating gate structure for weight storage

Figure 3.1 An hybrid architecture by Djahanshahi

Figure 3.2 A unified synapse neuron

Figure 3.3 Layers and stages in the architecture

Figure 3.4 Internal structure of a stage

Figure 3.5 The adder

Figure 3.6 Conversion of the signal mode at input and output

Figure 3.7 Second solution to the I/O mode

Figure 3.8 Pipelined architecture

Figure 3.9 Modified structure of each stage

Figure 3.10 XOR network with hidden layer

Figure 3.11 Implemented network for the XOR function

Figure 3.12 Network with quantized values

Figure 3.13 System level VLSI layout for XOR



Figure 3.14 System level simulations

Figure 3.15 Fabrication results of the neural chip

Figure 4.1 Sigmoid function and its derivative

Figure 4.2 Differential transistor pair and its transfer characteristics

Figure 4.3 Current mode neuron

Figure 4.4 Transfer characteristics of current mode neuron

Figure 4.5 Results of Monte Carlo analysis for current mode neuron

Figure 4.6 Voltage mode neuron

Figure 4.7 Transfer characteristics of voltage mode neuron

Figure 4.8 Monte Carlo analysis for voltage mode neuron

Figure 4.9 Multiplier digital to analog converter

Figure 4.10 Transfer characteristics of MDAC

Figure 4.11 ROM

Figure 4.12 Simulation results of ROM

Figure 4.13 Trans-impedance amplifier

Figure 4.14 Transient response

Figure A.1 Mask layout of current mode neuron

Figure A.2 Mask layout of voltage mode neuron

Figure A.3 Mask layout of MDAC

Figure A.4 Mask layout of trans-impedance amplifier


Figure A.5 Mask layout of ROM with buffers

Figure A.6 Mask layout of capacitor

Figure A.7 Mask layout of neural chip


LIST OF TABLES

Table 3.1 Test vectors

Table 3.2 Simulation result

Table 3.3 Simulation result with quantized values

Table 4.1 Device sizes of current mode neuron

Table 4.2 Device sizes of voltage mode neuron

Table 4.3 Device sizes of trans-impedance amplifier

Table 4.4 Device sizes of capacitors



Chapter 1

Introduction

1.1 Introduction:

Massively parallel networks based on neural network models have been the subject of great interest for the last several years. Although neural networks have been considered for many years basically as models for understanding the biological information processing in the human brain, the recent resurgence of interest in neural network concepts as a new approach to computing is a cumulative effect of the work of several researchers [6, 10, 13], driven by several factors. These factors are the maturing of the technologies necessary to implement massively interconnected parallel networks and improved knowledge of the biological models upon which neural models are loosely based. These motivations are underscored by the general failure of symbolic processing to efficiently perform the tasks which are the core of neural network applications, such as image processing, pattern/speech recognition, robotics and process control.


Neural networks consist of a large number of simple node processors. These processors do not require programming in the normal way, and usually any control which is necessary can be exercised on at least a semi-global basis. The stored information (associative memory) in a neural network is represented in a distributed manner. Proper operation of the network is therefore not dependent upon the value at any specific storage location. Information processing within the network is also distributed. In the human brain the biological neurons keep dying, yet the brain still functions correctly without any loss of performance. Likewise, in many neural network implementations a few defective neurons produce little or no degradation in the performance of the network. Therefore neural networks are fault tolerant [63]. This is in contrast to digital computers.

Digital computers are very useful in solving well defined problems, since these can be represented by a sequence of instructions. However, they have great difficulty when addressing some tasks which biological systems appear to perform with relative ease, such as pattern/speech recognition [7-9,11] and control. Real time performance of these tasks, which is sometimes critical, presents even more difficulty for digital computers.

It is quite evident that the full power of the human brain is not needed to solve many perplexing problems such as associative memory and pattern recognition, but the characteristics and techniques employed by the brain are still required to handle these problems effectively. Massive interconnected parallelism is the most important feature of the brain. The human brain performs some tasks very easily which are very difficult for digital computers; the reason for the efficiency of the brain lies in its massively parallel approach.

Analog circuits offer the best choice for providing this type of computational power. An analog synapse can be as simple as a single transistor and a capacitor. This equals the complexity of a single bit of dynamic memory in a digital computer, though the analog circuit is capable of representing the equivalent of about 8 bits worth of information and performing a crude multiplication, which would require a large amount of digital hardware to duplicate. This represents a large reduction in chip area in comparison to digital circuits for a comparable level of processing power. Analog circuitry is prone to process-dependent parameters such as offsets, restricted dynamic range, noise and temperature dependence; however, these can be controlled by special design techniques.

1.2 A brief history of research in neural models:

The history of research on the nervous system goes as far back as the discovery of the electrical nature of nervous transmission by Galvani, and the early experiments of Helmholtz [67]. Significant advances in the area of neuro-modeling, however, had to wait until the end of the nineteenth century. At this time, cognitive psychologists like James [2] outlined fundamental concepts of neural activity which are still in use today.


The famous paper of McCulloch and Pitts [3] in 1943 is generally considered the beginning of serious work on neural modeling. They showed that a network of linear threshold elements can compute any logical function. As a matter of fact, this paper had a more pronounced effect on computer science than on neural networks [4-5]. In 1949, Donald Hebb published his epoch-making work [12], stating the correlation update law. Frank Rosenblatt [13] proposed the first computationally oriented network and also gave the perceptron convergence procedure. Widrow and Hoff [14], with their ADALINE (1960), helped to bring adaptive systems and neural networks together.

At this time, one group, inspired by Rosenblatt, believed in the power of the perceptron and in general considered a connectionist architecture essential to the types of behavior observed in human learning and recall processes. The second group attached little importance to the internal mechanism of the nervous system; rather, they were mostly in favor of serial symbolic processing. The area of artificial intelligence is a reflection of the beliefs of this group.

In 1969, Minsky and Papert published their famous book, Perceptrons [15]. This book is a brilliant mathematical analysis of the limitations of networks consisting of a single layer of linear threshold units. The only mistake of the authors was that they went a step further and surmised that these limitations extend to the general multi-layer and non-linear case. This misconception resulted in a considerable slackening of the pace of research in neural networks for the next two decades.


Between this time and the late 1970's, a new interest in neural networks was aroused; the works of Kohonen on correlation matrix memories [16] and Grossberg on mathematical modeling [17] were the most prominent.

In the 1980's a re-awakening in neural networks was seen. The developments of this decade include studies in letter perception [18,20], Hopfield networks [21-22], self-organized networks [23-25], the Neocognitron [26-29], the Boltzmann machine [30], and the error back propagation learning algorithm [31-34]. The latter topic is of paramount interest and has attracted researchers from various fields of science and engineering.

1.3 Neural network models:

Generally, neural network models share a common underlying structure. In very basic form, a neural network can be described as a collection of simple node processors, called neurons, which interact among themselves through a massively interconnected synaptic network. The function of a single processor is to derive a weighted sum of all the previous outputs connected to its input and to apply a nonlinear function (usually a sigmoid) to the result. The neural model in layer l+1 is shown in figure 1.1.

Neural networks are usually classified by their general structure. Among the many hardware implementations of neural network models, the feed back and feed forward neural networks are most frequently used.


Figure 1.1: Neuron model

Another new class of information processing systems, called the Cellular Neural Network (CNN), was presented by Chua and Yang [19]. Similar to neural networks, it is a large scale non-linear analog circuit which processes signals in real time; like cellular automata, it consists of a massive aggregate of regularly spaced circuit clones, called cells, which communicate with each other directly only through their neighbors. Each cell is made of a linear capacitor, a non-linear voltage controlled current source and a few resistive linear circuit elements. Cellular neural networks are well suited for high speed parallel signal processing. CNNs, which combine some features of fully interconnected analog neural networks with the nearest-neighbor interaction found in cellular automata, are especially well suited for VLSI implementations [69].


1.3.1 Feed back or recurrent networks:

A general structure of a recurrent network is shown in figure 1.2. Each neuron has a non-decreasing sigmoid non-linearity at each node, and its output is fed back to all other neurons via synaptic weights. This model has been applied to tasks such as associative memory and optimization, where one out of several competing solutions must be resolved.

Figure 1.2: The architecture of a feed back neural network

Associative memories, or content addressable memories, can be considered to be optimizing networks, because they attempt to minimize an objective function. Each neuron has an external input and output. The task assigned to the network is to reconstruct a vector while only a part of it is given as input. The synaptic weights are programmed such that the minima of the objective function correspond to the stored vectors. When a partial input vector is applied, the network will generate the remainder of the vector whose stored pattern best matches that presented. Computer aided machines were one of the earliest hardware implementations of neural networks [70]. The most common model of feed back neural networks is the Hopfield model [21,22,71].

1.3.2 Feed forward networks:

The most commonly used model of neural network architecture is the feed forward network with two or more hidden layers. The classical perceptron had a single layer of node processors and is a feed forward network. Multi-layer feed forward neural networks have one or more layers of node processors between the input and output layers. They are also called multi-layer perceptrons [44]. The general form of a feed forward network is shown in figure 1.3; the outputs of any layer are weighted and added as an input to a neuron in the next layer.

An external input is applied to the first layer, i.e. the input layer; the processed output is then fed to the next layers until the last layer, i.e. the output layer. These layers are cascaded to form a multi-layer feed forward neural network. The governing equation for any neuron in any layer is given by:

$Y_j = S_j\left(\sum_i w_{ij}\, x_i + \theta_j\right)$    (1.1)

where $Y_j$ is the output of the jth neuron, $S_j$ is a non-linear sigmoid function, $w_{ij}$ is the connection weight from the ith neuron of the previous layer to the jth neuron input, $x_i$ is the ith output of a neuron in the previous layer and $\theta_j$ is a threshold at the input of the jth neuron.
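As an illustration of equation (1.1), the following is a minimal behavioral sketch (not part of the original thesis) of a feed forward pass through one layer; the layer sizes, the random weights and the logistic form of the sigmoid are arbitrary assumptions chosen only to make the example runnable.

```python
import numpy as np

def sigmoid(net):
    """Non-linear activation S(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def layer_forward(x, w, theta):
    """Equation (1.1): y_j = S(sum_i w_ij * x_i + theta_j).

    x     : outputs of the previous layer, shape (n_in,)
    w     : connection weights, shape (n_in, n_out)
    theta : thresholds of the receiving neurons, shape (n_out,)
    """
    net = x @ w + theta          # weighted sum plus threshold for each neuron
    return sigmoid(net)          # apply the non-linearity

# Example: a 2-3-1 network with hypothetical weights, for illustration only
x = np.array([0.0, 1.0])
w1, t1 = np.random.randn(2, 3), np.random.randn(3)
w2, t2 = np.random.randn(3, 1), np.random.randn(1)
hidden = layer_forward(x, w1, t1)       # hidden layer outputs
output = layer_forward(hidden, w2, t2)  # output layer
print(output)
```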

Figure 1.3: Multi-layer feed forward neural network

Networks with this structure have a flow of information in the forward direction only; therefore they are inherently stable. They are often used in classification tasks; common applications include pattern recognition and speech recognition [8,9,11].


1.4 The learning algorithm:

In order to realize a desired result with neural networks, a learning process must be used to find the correct weight matrix. There are two main classes of learning algorithms: supervised and unsupervised. In the case of unsupervised learning the network does not receive any feedback from the environment and no information about the correct result is given; the training relies on redundancies in order for the network to self-organize. Supervised learning is a process where feedback about the error is given and the weights are adjusted accordingly to minimize the error. One such supervised training algorithm used for multi-layer feed forward neural networks is the error back propagation algorithm [M].

1.4.1 Error back propagation algorithm:

Back error propagation is the most popular supervised learning rule for the implementation of multi-layer neural networks [64]. It is an extension of the gradient rule: a steepest-descent method which minimizes the total mean square error. The training phase consists of a forward pass for the output computation, calculation of the output error, and propagation of this error to each neuron using a backward pass. The weight update may then be performed by application of the neural activations and the associated errors. Let $f(net_i)$ be the activation of the ith neuron, $t_i$ and $y_i$ be the desired and obtained output of the ith network output, and $w_{ij}$ be the weight of the connection from the jth neuron to the ith. The $\delta_i$ is the error propagated to the ith neuron.

At the output layer: $\delta_i = f'(net_i)\,(t_i - y_i)$    (1.2)

At the hidden layer: $\delta_i = f'(net_i)\sum_j \delta_j\, w_{ji}$    (1.3)

The weight update can then be represented by: $\Delta w_{ij} = \varepsilon\, \delta_i\, f(net_j)$    (1.4)

where $\varepsilon$ is the learning rate. The advantage of the error back propagation algorithm is its high parallelism: a single forward pass and a single backward pass are sufficient for the weight update calculations. The parallelism is achieved by assuming a known neural activation function along with its derivative and backward computations [M]. This assumption is quite valid for software implementations but cannot be extended to the analog hardware domain [63].
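The update rules (1.2) to (1.4) can be summarised in a short numerical sketch; this is an illustrative reconstruction (not the thesis software), using a logistic sigmoid, one hidden layer with thresholds, and an arbitrarily chosen learning rate.

```python
import numpy as np

def f(net):                 # sigmoid activation
    return 1.0 / (1.0 + np.exp(-net))

def f_prime(net):           # derivative: f'(net) = f(net) * (1 - f(net))
    s = f(net)
    return s * (1.0 - s)

def backprop_step(x, t, w1, b1, w2, b2, eps=0.5):
    """One forward/backward pass of equations (1.2)-(1.4) for a 2-layer net."""
    # forward pass
    net_h = w1 @ x + b1;  y_h = f(net_h)      # hidden layer
    net_o = w2 @ y_h + b2; y_o = f(net_o)     # output layer
    # backward pass
    delta_o = f_prime(net_o) * (t - y_o)            # eq. (1.2)
    delta_h = f_prime(net_h) * (w2.T @ delta_o)     # eq. (1.3)
    # weight (and threshold) updates, eq. (1.4)
    w2 += eps * np.outer(delta_o, y_h); b2 += eps * delta_o
    w1 += eps * np.outer(delta_h, x);   b1 += eps * delta_h
    return y_o

# train a 2-2-1 network on the XOR function (the example used later in the thesis)
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
w2, b2 = rng.normal(size=(1, 2)), np.zeros(1)
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
for _ in range(10000):
    for x, t in data:
        backprop_step(np.array(x, float), np.array(t, float), w1, b1, w2, b2)
```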

1.5 Implementation of neural networks:

Mathematical modeling and the work of Grossberg and Hebb were fundamental in the development of neural networks [12,17]. However, the investigators of the nervous system and behavioral psychology were only able to evaluate the performance of their models with any accuracy or precision after the advent of digital computers and the availability of numerical simulations. Farley and Clark [35] were the first to utilize the digital computer for modeling and software simulation of neural networks.

An early software implementation was the Neocognitron by Fukushima [26-29]. This was a multi-layer neural network with 9 layers. Four different types of neural units were used and the learning was supervised. The network was trained to recognize hand-written characters regardless of their position.

Another example of a software implementation of neural networks was NETtalk, a network that learned how to read. This feed forward, 3-layer network was developed in 1986 by Sejnowski and Rosenberg [36]. The Boltzmann machine and error back propagation learning rules were applied with comparable results; it was shown that the error back propagation algorithm was faster in learning.

Software simulation on a digital computer is not difficult, but it is time consuming. This stems from modeling a highly interconnected parallel system with serial hardware. The hardware implementation of neural networks has been addressed by many researchers. The first learning machine was made by Marvin Minsky in 1951. This machine had a memory consisting of 40 control knobs, which were moved by a single motor through electric clutches. It had 300 thermionic valves and, in Minsky's own words [37], "was never thoroughly debugged, but worked nonetheless (robustness)".


Rosenblatt, in the early 1960's at Cornell University, built the "MARK 1" perceptron with 400 photo-receptive sensors on a 20 x 20 array. This perceptron had 512 associative units and 8 binary response units for the final classification. Each sensory unit had up to 40 random connections to the associator units [38]. The major obstacle in the realization of such networks is the large amount of hardware required to implement even the simplest functions.

1.6 Goals, objectives and organization:

The objective of this thesis is to develop an architecture and circuits for feed forward multi-layer neural networks. An architecture for this type of network will be presented. It will be shown that this architecture results in a decrease in the number of physical interconnections on the chip without any loss of generality. The multiplexing scheme used will also make multi-chip systems possible without a large number of interconnections. The architecture and circuits presented can be applied to any feed forward neural network regardless of application. A complete set of cells for this architecture has been designed and integrated in a neural chip for the boolean function XOR, designed in a full custom 0.5 micron CMOS (3-metal, single polysilicon) process.

Chapter 2 will draw some comparisons between analog and digital implementations, survey analog synapse techniques, and discuss the solutions to the analog memory problem. In chapter 3 an implementation of a modular hybrid architecture using a 0.5 micron CMOS process will be presented. The building blocks of the hybrid architecture will be presented in chapter 4. Finally, chapter 5 is the conclusion; this chapter presents a summary of the work done and, as well, presents promising future areas of research in this area.


Chapter 2

VLSI Implementation of Neural Networks

2.1 Introduction:

Artificial neural networks are a class of distributed processors, consisting of massively interconnected simple processors. Their capabilities of generalization, adaptation, learning from examples and tolerance to noise [31] have made them very attractive for many applications such as pattern recognition, image/speech processing and control.

The massively interconnected, distributed structure of neural networks typically requires millions of operations per second. Therefore, complete exploitation of neural network potential is not achievable through software implementations. The large number of simple processors, the huge interconnectivity of the processors and the distributed information storage suggest VLSI as a suitable hardware implementation scheme.


As with any other VLSI architecture realization, there are three trends in neural network implementation: digital, analog and mixed. The advantages of the digital approach are:

a) Design techniques are advanced, automated and well understood.

b) Programming of weights can be managed easily.

c) Interchip communication and possible exchange of information with a host computer can be easily performed.

d) Digital memories are comparatively easy to build, and the weight storage problem, from which analog implementations suffer, does not appear in the digital case.

However, neural networks are analog in nature and, intuitively, it seems that an analog implementation would be more suitable and elegant. Neural networks use signals which have limited precision and range, and the environment can be very noisy. Therefore the precision and immunity to noise, the main advantages of a digital system, are not essential. Also, in a digital design the multipliers occupy a larger silicon area compared with their analog counterparts, and the two-level information representation on each line increases the interconnections [65-66].

The analog approach [39-46] provides a compact realization of the neuron non-linearity and the synaptic operation, which are crucial in the implementation of neural networks for real world applications. Also, the ability to transmit/process more than one bit per line provides a desirable reduction of interconnections. Some of its limitations are low precision, low noise immunity, temperature dependence and process parameter dependence. However, special design measures that typically increase both the design complexity and size may be taken to improve precision, temperature and process tolerances [47-48].

The major drawback of the analog implementation approach is the lack of a reliable non-volatile analog memory to store the connection weights.

The third design approach [1,53-56] combines the digital and analog methods. This scheme is called the mixed (hybrid) analog/digital method. Murray and Smith [53] have used pulse coded analog signals and digital synaptic storage. Arima et al [54] have reported a 400-neuron design which takes advantage of analog weight storage on capacitors and binary neuron activation. In the design of Boser et al [7,49] all the operations are performed in analog mode, but in order to simplify system integration, digital input/output (both weights and activations) are used in the design.

Since the implementation of non-volatile programmable memory is the major drawback of fully analog realization, some researchers have used complete analog processing blocks in conjunction with digital weight memory [55-56]. In both of these designs the synaptic multiplication is based on a multiplying D/A converter. This hybrid multiplier produces an analog output through multiplication of the digital weight by the analog input. The drawback of this approach is the limited number of bits of dynamic range. This can have an adverse effect on the convergence of most learning algorithms; however, this method is suitable for applications where constant adaptation is not required.
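As a behavioral illustration of the multiplying D/A converter (MDAC) synapse described above, the sketch below models the synaptic product of an n-bit digital weight and an analog input current. The 5-bit width, unit current and two's-complement weight coding are assumptions for the example, not details taken from the cited designs.

```python
def mdac_synapse(weight_code, i_in, n_bits=5):
    """Behavioral model of a current-mode MDAC synapse.

    weight_code : signed integer weight, -2**(n_bits-1) .. 2**(n_bits-1) - 1
    i_in        : analog input current in amperes

    The output is the analog input scaled by the quantized, normalized weight.
    """
    w_min, w_max = -2**(n_bits - 1), 2**(n_bits - 1) - 1
    w = max(w_min, min(w_max, int(round(weight_code))))   # clip to n-bit range
    return w / 2**(n_bits - 1) * i_in                      # normalized product

def neuron_input(weights, currents):
    """Synaptic output currents sum on a common node (Kirchhoff's current law)."""
    return sum(mdac_synapse(w, i) for w, i in zip(weights, currents))

# example: three synapses driving one neuron input node (assumed currents in amperes)
print(neuron_input([12, -7, 3], [1e-6, 0.5e-6, 2e-6]))
```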


2.2 Analog implementation:

Analog VLSI is a promising platform for the implementation of these networks. Large scale

integration makes it possible to put many electronic components on a single chip with better

reliabiiity and lower cost. However, another inherent property of neural networks, higher inter-

connectivity, has proven to be one of the major obstacles in the way of hardware implementation

of large networks. The interconnections are a major source of limiting the size of networks on

chip. Analog technology has the advantage that the maximum information capacity of single line

is only limited by the noise and other uncertainties. When the signais are represented digitally

and transrnitted in parailel, each activation or node value requires several wires for inter-

connections, whereas in case of analog one wire suffices.

The most difficult issue in the design of analog neural networks is the storage of synaptic weights. Although some studies have been made in this area, a reliable, compact analog memory in CMOS technology that can preserve data with acceptable accuracy for long periods of time is not available. In the following sections the solutions to analog memory proposed by different researchers will be discussed.

2.2.1 Floating gate (EEPROM) devices:

Floating gate memory has been used as a method of analog weight storage in neural networks.


The main reason for its use is the fact that its storage time is measured in years, in contrast to the milliseconds of capacitive storage. A cross section of a simple floating gate device is shown in figure 2.1.

Figure 2.1: Floating gate structure for weight storage

These devices are called floating gate memories because the polysilicon gate of the transistor is left unconnected, and by different methods electrons are deposited on and removed from the gate. The n-channel device is constructed on a p-substrate with heavily doped n+ diffusions for source and drain. A special layer of thin oxide is inserted between the substrate and the floating gate. On top of the floating gate is a layer of normal oxide and then the control gate, which corresponds to the gate of a regular MOS transistor. Electrically this structure acts like two capacitors connected in series between


the control gate terminal and the substrate. If a large voltage is applied to the control gate relative to the substrate, a high electric field will be induced between the control gate and the floating gate and between the floating gate and the substrate. If the electric field is strong enough, electrons can tunnel through the thin oxide layer between the floating gate and the substrate and are then trapped on the floating gate. This phenomenon is known as Fowler-Nordheim tunneling. A negative voltage can be applied to the control gate to cause tunneling in the reverse direction. The trapped charge causes a shift in the threshold voltage of the device, because in the normal mode of operation a higher voltage must be applied to the control gate to overcome the effect of the trapped charge on the floating gate.

Therefore the threshold voltage of this device can be used to represent a modifiable weight value. Programming is performed with short voltage pulses for efficient control of the amount of charge which is transferred. There are other methods for trapping the charge, but Fowler-Nordheim tunneling appears to be the best method for this application. Due to the high quality of VLSI insulators, charges on the floating gate will remain intact for a long duration; therefore these devices can be categorized as non-volatile with non-destructive read. A special fabrication process, which is not universally available, is required to produce the thin oxide layer for these devices.
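To make the idea of a floating gate weight concrete, the following is a rough first-order numerical sketch (an assumption-laden illustration, not a model from the thesis): the trapped charge shifts the threshold voltage by roughly Q/C, and the shifted threshold is read out as the stored weight. The capacitance, charge per pulse and native threshold are invented values for the example.

```python
# Hypothetical first-order model of a floating-gate analog weight cell.
C_CG = 10e-15        # control-gate to floating-gate capacitance (assumed 10 fF)
VT0  = 0.7           # native threshold voltage of the device (assumed, volts)

def threshold_voltage(q_trapped):
    """Trapped electrons raise the apparent threshold: Vt = Vt0 + Q / C."""
    return VT0 + q_trapped / C_CG

def program(n_pulses, dq_per_pulse=0.2e-15):
    """Each short programming pulse tunnels a small packet of charge (assumed 0.2 fC)."""
    return n_pulses * dq_per_pulse

for pulses in (0, 5, 10, 20):
    q = program(pulses)
    print(f"{pulses:2d} pulses -> Vt = {threshold_voltage(q):.3f} V")
```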

Floating gate devices have been proposed as non-volatile weight storage by many researchers [50-52] and have been applied in many neural network implementations [40-42,45]. The most noticeable implementation of floating gate synapses is the ETANN (Electrically Trainable Artificial


Neural Network) chip from Intel [40]. Weight modification is performed using Fowler-Nordheim tunneling; a weight resolution of 7 bits was demonstrated, and weight retention was estimated at 15 years at a resolution of 4 bits. The Fowler-Nordheim process utilized in a standard CMOS process was examined by Thomson and Brooke [76]. Due to the thicker oxides associated with this process, long term weight storage is estimated to be up to 25 years with 10 bits.

Floating gate transistors offer a simple and efficient scalar product circuit implementation technique for neural networks. The technique is particularly suitable for large networks fabricated with modern VLSI processes. The most appealing feature of using floating gate MOS transistors as analog multipliers in neural networks is that weight storage and input processing are implemented concurrently by the same circuitry. The floating gate transistors provide an adjustable, non-volatile analog memory and simultaneously behave as elementary analog processors when appropriately connected [40].

Although the tremendous potential of floating gate devices for large, dense synaptic structures has been amply demonstrated [40], there remain several technical difficulties with this technology. In these design methods [50-52] an extra voltage is required for memory programming, which is slow and relatively imprecise. Cauwenberghs et al [45] have applied ultraviolet illumination of the chip to perform weight updates; therefore the rest of the circuitry has to be shielded from the ultraviolet light. Thus the attractiveness of floating gate memory as a non-volatile analog memory suffers from the necessity of a special fabrication process, the difficulty of making small weight changes [50], the requirement of successive voltage pulses and measurements for accurate voltage adjustment [51], and drift after every weight change.

2.2.2 Capacitive storage:

Synaptic and bias weights can also be stored as voltages on chip capacitors. The input gate of a MOS transistor is actually a capacitor whose bottom plate is the channel of the transistor. When charge is accumulated on the gate, charge carriers are drawn into the channel, causing a current to flow. In general, the more the charge, the larger the flow of current through the channel. Therefore, the current flowing through the transistor can be used to represent the weight, and the value of the current can be programmed by the amount of charge on the gate, which is proportional to the voltage applied to the gate. The gate is isolated from the channel, so ideally the charge can remain on the gate until it is forced to change. In practice there are various leakage currents in the circuit and therefore much larger capacitors are needed. CMOS capacitors are very efficient and decay times measured in minutes are achievable [77]; the memories are nevertheless volatile. However, with a background training process it might be possible to maintain the weights close to the desired target values, especially if the chip is cooled to cryogenic temperatures [39] to extend the decay time.

Since the leakage currents cannot be eliminated completely, in most designs the voltages on the capacitors are serially and invisibly refreshed using an external digital memory for weight storage and a D/A converter [43-44]. The rate at which the voltages are refreshed has a direct effect on how precisely a weight can be represented: the faster the refresh rate, the smaller the voltage drop between refresh cycles and the more accurate the weight. There are other effects which affect the precision, such as clock feed-through in the switch transistors which sample the weight voltage onto the capacitors, thus setting an upper limit on the refresh rate. If the capacitors are large enough and reasonable control over the temperature is maintained, the leakage current can be minimized so that analog refresh from digital memory using D/A techniques is possible.
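The trade-off between leakage, capacitor size and refresh rate can be put into numbers with a simple droop calculation; the sketch below is an illustration only, with the leakage current, capacitance and full-scale voltage picked arbitrarily rather than taken from any of the cited designs.

```python
import math

def droop_bits(i_leak, c_store, t_refresh, v_full_scale=1.0):
    """Worst-case voltage droop between refreshes and the resolution it allows.

    i_leak       : leakage current discharging the capacitor (A)
    c_store      : storage capacitance (F)
    t_refresh    : time between refresh cycles (s)
    v_full_scale : usable weight-voltage range (V)
    """
    dv = i_leak * t_refresh / c_store              # dV = I * t / C
    bits = math.log2(v_full_scale / dv) if dv > 0 else float("inf")
    return dv, bits

# example: 1 pF cell, 10 fA leakage, refreshed every 100 us (assumed values)
dv, bits = droop_bits(i_leak=10e-15, c_store=1e-12, t_refresh=100e-6)
print(f"droop = {dv*1e6:.3f} uV  ->  about {bits:.1f} bits of weight precision")
```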

A proposal to store weights on the gates of MOS transistors by Akers et al. [79] provided a crude multiplication of an input voltage and the weight value. Since the voltages would be stored on the relatively small capacitance of the MOS transistor gates, it would be necessary to refresh them roughly every 100 µs. One method of utilizing capacitors for long term storage was proposed by Brown et al. [80]; it uses a refresh cycle similar to that used in digital dynamic RAM. A similar method of capacitive storage was implemented by Mann and Gilbert [78]. A single capacitor holds the weight value for each synapse and can be read/written by standard addressing means; the weight resolution is 6-8 bits. Furman and Abidi have described an analog CMOS error back propagation VLSI implementation [39]. The design uses dynamic charge stored on 0.7 pF capacitors as the weight representation. For their chip, cooling has been suggested to preserve the valid weights after the training phase is complete. The charges are updated in the learning phase.


Single capacitor weight storage with a MOS capacitor was implemented by Satyanarayana et al. [43]. The 1024-synapse chip allows all of the weight capacitors to be refreshed in approximately 130 µs from off-chip SRAM using off-chip D/A converters. A weight resolution of t bits was mentioned, and it was implied that clock feed-through on the sampling switches was the limiting factor. Boser et al. [7,49] have also described a chip which uses a single capacitor to store each weight. On-chip 6-bit D/A converters are used to refresh the weights from off-chip RAM; the inputs and the neuron outputs are represented as 3-bit quantities. A learning network utilizing single capacitor weight storage was developed by Arima et al. [54]. The main network is recurrent and implements a new learning algorithm. A second feed forward network on the same chip provides a measure of how well a pattern has been learned, which controls the learning circuitry.

In circuits with differential inputs the weights can be stored differentially; if the circuitry is symmetrical, the leakage currents will be very similar. Although each gate voltage will drop at the same rate, the differential voltage will be much more stable. This allows lower refresh rates. The drawback of this differential storage technique is that the capacitors consume a relatively large area.
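A quick way to see why differential storage tolerates slower refresh is to compare the droop of a single-ended cell with the droop of the difference of two nominally matched cells; the mismatch percentage and cell values below are assumptions for illustration only.

```python
def droop(i_leak, c, t):
    """Voltage lost by one capacitor over time t: dV = I * t / C."""
    return i_leak * t / c

# assumed cell: 1 pF capacitors, 10 fA nominal leakage, 1% leakage mismatch
C, I, MISMATCH, T = 1e-12, 10e-15, 0.01, 10e-3   # refreshed every 10 ms

single_ended_error = droop(I, C, T)               # the full droop matters
differential_error = droop(I * MISMATCH, C, T)    # only the mismatch matters
print(f"single-ended error : {single_ended_error*1e6:.2f} uV")
print(f"differential error : {differential_error*1e6:.2f} uV "
      f"({single_ended_error/differential_error:.0f}x smaller)")
```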

In the implementation of the JPL group [81], with dual capacitors holding the weight values, a precision of 11 bits was reported. In this design the capacitor charges are refreshed using an external digital memory. Each synapse cell consists of a pair of sample and hold circuits to store the differential weight value and a four quadrant multiplier to obtain the product of input and weight. Kub's group at the Naval Research Laboratories implemented two different chips utilizing capacitors to store a differential weight value [82]. The actual weight values are digitally stored off chip and the on-chip voltages are frequently refreshed by multiplexing the D/A converted values to the appropriate cell. Results showed that the weight retention was 50 times longer for differential storage in comparison to single ended storage. The researchers at AT&T Bell Laboratories have described a similar weight storage method [71,77]. They used a novel method in which the weight values were modified: a weight is initialized to zero by charging both capacitors to the same value, and CCDs (charge coupled devices) were utilized to pump the charge bidirectionally between the capacitors to increment or decrement the weights. The resolution of the weights was empirically determined to be approximately 10 bits and the maximum weight update rate was estimated to be 2 x 10' updates per second.

The capacitor as an analog memory has the following advantages: small size, a device structure fully compatible with the CMOS process, and information stored as a charge that can be controlled easily. However, its biggest drawback is leakage of the charge. Even so, it is a valuable memory device. It is also used in neural networks as a short term memory along with other non-volatile memories.

2.3 Hybrid implementation:

All the methods described in the last section present problems in the design of a practical neural network for real applications. A realistic design should work at normal temperature and preserve connection strengths during the operation of the system. This requires a certain degree of efficiency in the utilization of chip area. Designs using dynamic charge storage use a large chip area for the capacitive elements to ensure reliable preservation of the weights at different temperatures. Multi-level storage on capacitors involves a lot of overhead circuitry to keep the capacitors locked at certain voltages. Floating gate memories show large variations in transistor characteristics and need large voltages for programming.

The final approach to storing the weights is digital weights in digital memory. In this method MDACs (multiplying digital to analog converters) perform the synaptic multiplications. The hybrid multiplier produces an analog output through multiplication of the digital weight by the analog input [85]. This also allows a faster interface to a host computer, and switching noise is only generated while the weights are updated. The hybrid MDACs are an attractive solution with the virtues of ease of design and simulation, but they are limited to a few bits of dynamic range. This can cause an adverse effect on the convergence of the learning algorithm [1].

The researchers at AT&T designed a configurable chip [83] with binary valued weights and inputs to perform image processing tasks such as edge detection. They used SRAM cells to store the data. The stored data is represented as an interconnection matrix, and only one output, the one which has the largest inner product value between the input vector and the stored vector, is chosen. The design does not generate spurious steady states since it physically stores the desired data as weights. However, it requires extra memory cells to store the desired data.


The JPL group implemented a chip with a 32 x 32 MDAC synaptic crossbar matrix [84]. The inputs are single ended, giving two quadrant multiplication. An NMOS transistor operating in the triode region provides voltage to current conversion. The synaptic precision is limited by the fabrication technology to 7 bits, such that there are only 128 monotonically increasing weight levels.

Hybrid architectures have been proposed by Djahanshahi, Nosratinia and Yazdi [85,1,63]. All of these use digital memories to store the synaptic and bias weights. The synapse is constructed from an MDAC, multiplying the input with the digital weight from memory. The outputs of the MDACs are added to generate the nonlinear sigmoid function. These architectures are discussed in the next chapter.


Chapter 3

An Architecture for Multi-Layer Neural Networks

Among the numerous network models [59-60], the multi layer neural network is a major and widely applicable model. It is a feed forward network with one or more layers of node processors, called hidden layers, between the input and output layers. All the node processors have non linear characteristics. The resultant multistage nonlinear characteristic of the network provides a great potential for input/output mapping in classification applications. The lack of feed back in multi layer neural networks makes them inherently stable. This good dynamic behavior, as well as the availability of powerful training schemes such as error back propagation and genetic algorithms, makes them even more attractive from an engineering point of view.

All the methods used by different researchers, as mentioned in chapter 2, present difficulties for practical applications of neural networks. A realistic design should work at normal temperature and preserve connection strengths during the operation of the system. This requires a certain degree of efficiency in the utilization of chip area. Designs using dynamic charge storage use a large chip area for the capacitive elements to ensure reliable preservation of the weights at different temperatures. Multi-level storage on capacitors involves a sizeable overhead in circuitry to keep the capacitors locked at certain voltages. Floating gate memories show large variations in transistor characteristics and need large voltages for programming.

Digital weight storage has been addressed by Raffel [61] and others [1,63,85]. This method is chosen for the architecture presented in this thesis. The most general form of multi-layer neural network, without feedback, is implemented (Fig 1.3). Most of the functions are realized in analog current mode circuitry because of its wider bandwidth, independence from voltage supply restrictions and lack of a requirement for adder hardware.

3.1 A VLSI architecture:

An hybrid architecture for neural networks implemented by Djahanshahi is shown in figure 3.1 [85]. Each module consists of a 5-bit MDAC, with a register to store the synaptic weight, and a unified synapse neuron, as shown in figure 3.2, which generates a partial non linearity. All of these partial non linearities, when connected in parallel, generate a scalable sigmoid function.


A sigmoidal non linearity sized for one or two synapses looks like a hard limiting function for a moderate or large number of input synapses, e.g. N > 5, because of the large saturating areas. Therefore a scaling scheme proportional to √N is desired.
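The effect of this scaling can be illustrated numerically: if N unit-weight synapses drive a fixed sigmoid, the input swing grows with N and the transfer curve behaves increasingly like a hard limiter, whereas dividing the gain by √N preserves a useful transition region. The sketch below is only an illustration of that scaling argument, with an arbitrary logistic sigmoid and random bipolar inputs.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def fraction_unsaturated(n_synapses, scale=1.0, limit=0.9):
    """Fraction of random +/-1 input patterns that leave the neuron unsaturated."""
    rng = np.random.default_rng(1)
    x = rng.choice([-1.0, 1.0], size=(20000, n_synapses))   # random bipolar inputs
    net = scale * x.sum(axis=1)                              # summed synaptic drive
    y = sigmoid(net)
    return np.mean((y > 1 - limit) & (y < limit))

for n in (2, 5, 20, 50):
    plain  = fraction_unsaturated(n, scale=1.0)
    scaled = fraction_unsaturated(n, scale=1.0 / np.sqrt(n))  # 1/sqrt(N) gain scaling
    print(f"N={n:3d}: unscaled {plain:5.2f} unsaturated, scaled {scaled:5.2f}")
```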

Figure 3.1: An hybrid architecture by Djahanshahi


Figure 3.2: A unified synapse neuron

3.2 The multiplexed architecture:

The general form of the architecture [1] is shown in figure 3.3. The structure of each layer is similar, as shown in figure 3.4. Each stage is defined as a layer of neurons and their corresponding connection strengths. Each set of connection strengths is associated with the neurons of the next layer rather than the preceding layer. Thus each stage constitutes the neurons present in that layer plus all the connection strengths from the previous layer. Therefore the number of physical connections between two stages i and i+1 is reduced to the number of neurons in stage i. This is apparent once the weights are fixed after training: the information passed to a layer does not exceed the information present at the output of the preceding layer, and the minimum number of lines needed to carry this information is no more than m_i.
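A small count makes the saving concrete; the layer sizes below are arbitrary and serve only to compare a fully wired layer-to-layer connection with the stage-based wiring just described.

```python
def wiring(layer_sizes):
    """Compare dedicated synapse wires with the stage-multiplexed scheme."""
    full = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))  # one wire per synapse
    staged = sum(layer_sizes[:-1])                                   # only m_i lines leave stage i
    return full, staged

# hypothetical 16-8-4 network
full, staged = wiring([16, 8, 4])
print(f"dedicated wires: {full}, inter-stage lines: {staged}")       # 160 vs 24
```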

In this architecture the inter-stage wiring has been dramatically reduced, and much larger systems can be designed using a number of chips, each of which constitutes one stage of the network. This can also lead to a semi-custom approach to the design of neural systems, where only one generic chip containing one stage needs to be designed. This chip should then contain the maximum number of neurons in any stage. If the number of neurons in any stage is greater than needed, the extra neurons can be masked out by assigning zeros to the incoming weights.


Figure 3.3: Layers and stages in the architecture


Figure 3.4: Internal structure of a stage

The idea of multiplexing the connection strengths is depicted in figure 3.4. It is possible because the analog bandwidth can be made much smaller than the clocking frequencies available in digital CMOS. Since the feed forward networks are free of feed back, the delays introduced by multiplexing do not disrupt the network dynamics and only increase the time latency. The constraint for this architecture is that the analog nodes should be refreshed in time so that the outputs of the neurons are kept valid at all times. The minimum clock speed depends on the leakage of the nodes, itself dependent upon temperature, and is in the kHz range. The weights are presently stored in static ROM and can also be changed to RAM.


3.3 Operation:

Let mi be the number of neurons in the ith stage. The synaptic weights corresponding to the ith layer and its preceding layer are saved in an (m_{i-1} + 1) × m_i two-dimensional digital weight memory array. Each memory block forms the input of one of the stage multipliers. The threshold of a neuron can be considered as a negative signal coming from a source with a strength of unity and a weight equal to the threshold value. One row of the storage is allocated to the threshold value and, to avoid mismatching, the current source shown in figure 3.4 is actually another neuron with its signal and bias connected to VDD and ground respectively. The multipliers are basically current-mode multiplying digital-to-analog converters. The output current of each multiplier is the product of its digital input weight and the analog input current. The adder is actually a common output node of the multipliers and a current-to-voltage converter.

A counter with a cycle of mi is the main stage address/control generator. At the kth (0 ≤ k ≤ mi) time slot the decoded output of the counter addresses the kth location of each memory block. The multipliers and the adder perform the synaptic/threshold computations of the kth neuron activation. Meanwhile the stage demultiplexer connects the adder output to the branch going to the kth neuron input. A capacitor at the input is applied as a short-term memory. This storage preserves a valid neuron input for the period during which the activations of all the stage neurons are being computed.
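A minimal behavioral sketch of this time-multiplexed stage operation is given below. It assumes a standard logistic activation and illustrative array shapes; the names (stage_forward, weight_rows) are not part of the design, and hardware details (MDAC resolution, capacitor droop, inhibit timing) are deliberately omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stage_forward(prev_outputs, weight_rows):
    """Behavioral sketch of one time-multiplexed stage.

    prev_outputs : outputs of the preceding layer (length m_{i-1})
    weight_rows  : (m_{i-1} + 1) x m_i array of stored weights; the last row holds
                   the thresholds, applied to a unity source with negative sign
    """
    m_i = weight_rows.shape[1]
    x = np.append(prev_outputs, -1.0)            # negative unity source for the threshold row
    held = np.zeros(m_i)                         # capacitor voltages (short-term memory)
    for k in range(m_i):                         # counter: one time slot per neuron
        held[k] = np.dot(x, weight_rows[:, k])   # MDAC products summed at the adder node
    return sigmoid(held)                         # neuron activations after the refresh cycle
```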


Figure 3.5: The adder

An inhibit signal and a switching circuit are also added before the neurons. The signal is nothing but a delayed, inverted clock. Because of the finite settling time of the multipliers, and specifically of the trans-impedance amplifier inside the adder block shown in figure 3.5, the signals at the output of the demultiplexer are not stable at the time the address becomes available. If this inhibition circuitry is not provided, the capacitors at the input nodes of the neurons will receive erroneous charging voltages at the beginning of each refresh cycle. The corresponding voltage spikes in the signals effectively increase the settling time.

3.4 Input/Output requirements:

In the architecture discussed above the input and output signals are in current mode. For I/O communication it is better to use voltage mode, for two reasons. First, the current levels are in the microampere range and so would be severely affected by the noise that can be coupled onto the relatively long connection wires to the chip package. Secondly, current-mode circuits have high output impedance; the high-impedance nodes, especially outside the chip, are therefore susceptible to voltage noise spikes of large amplitude, which will drive the current-mode circuits into saturation if the node voltage goes beyond the power supply voltage.

Therefore, to have voltage-mode input/output, a voltage-to-current converter at the input and a trans-impedance amplifier at the output may be used, as shown in figure 3.6. The combined transfer characteristics of the input converter and the trans-impedance amplifier should have a slope of unity. The magnitude of the current-mode outputs should also be taken into account such that none of them is driven into the nonlinear zone in the normal course of operation.

For the output, another variation is possible which can save chip area. Voltage-mode neural blocks, as shown in figure 3.7, can replace the current-mode neurons and the trans-impedance amplifiers at the last stage. This has an additional advantage: feedback is used in the trans-impedance amplifiers to achieve linear, stabilized I-V characteristics, but the feedback also degrades the network dynamics. Most of the propagation delay of the network is due to the settling time of the trans-impedance amplifiers, so by eliminating them the total network delay can be reduced.

The disadvantage of using the voltage-mode neuron in the network is that its transfer characteristic is slightly different from that of the current-mode neurons and is almost linear. This small amount of error may sometimes not be acceptable; the choice of solution will therefore depend upon the particular application.

Figure 3.6: Conversion of the signal mode at input and output

Figure 3.7: Second solution to the I/O mode


3.5 Multiplexing:

One of the limiting factors on the size of realized networks is the large number of interconnections between two consecutive stages, i.e. if the number of neurons is N, O(N²) connections are required. The interconnection complexity necessitates that the design be squeezed into one chip. Even then the massive interconnections occupy a large share of the die area. This can be solved by time multiplexing the interconnections [56-58]. However, this scheme reduces the forward-pass speed of the network.

The total input/output propagation delay of a multi-layer neural network without any concurrency in the operation of its stages is linearly dependent on the delay of each stage as well as the number of stages. Let the input/output propagation delay of the ith stage of the network be Ti. The total input/output propagation delay of the massively interconnected network with M layers is [63]:

The total propagation delay for the same network with an Ni-base time multiplexing scheme in the ith stage is:
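A minimal sketch of these delay relations, assuming the natural forms — the fully interconnected delay as the sum of the stage delays, T = Σ Ti, and the Ni-base multiplexed delay as that sum with each stage delay scaled by its multiplexing base, T = Σ Ni·Ti — is given below; the function names are illustrative only.

```python
def total_delay(stage_delays):
    """Propagation delay of the fully interconnected network (assumed form: sum of T_i)."""
    return sum(stage_delays)

def total_delay_multiplexed(stage_delays, mux_bases):
    """Propagation delay with N_i-base time multiplexing (assumed form: sum of N_i * T_i)."""
    return sum(n * t for t, n in zip(stage_delays, mux_bases))

# Example: three stages of 1 us each, each multiplexed over 8 neurons
print(total_delay([1e-6] * 3))                          # 3e-06 s
print(total_delay_multiplexed([1e-6] * 3, [8, 8, 8]))   # 2.4e-05 s
```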


The latter relationship shows that the number of layers should also be chosen carefully to achieve an acceptable speed in any time-multiplexing implementation.

3.6 Pipelined architecture:

Implementation of multi-layer neural networks with a large number of layers may not be possible using a time-multiplexed approach without some scheme of speed improvement, since the speed trade-off of the multiplexing scheme could bring the performance below acceptable margins. Speed improvements may be achieved by either architectural measures or faster building blocks.

Yazdi proposed a pipelined architecture for speed improvement, as shown in figure 3.8 [63]. The operation is similar to the architecture of Nosratinia [1]. Each stage operates on its input set, passes the valid output to the input latch of its next stage and then stops operating. The latched values are only altered when the output of the preceding stage is completed. The only required control signal is the stop signal Si. After all the outputs of the ith layer are computed the stop signal is activated. This stop signal halts the operation of the ith layer, signals the succeeding latch to latch the output of the ith layer and signals the next layer to resume operation on the newly latched inputs.

Generation of the stop signal Si is quite simple: in an N-base time-multiplexed stage there is an N-base counter/decoder. The output of the counter at the Nth time slot can be used to trigger a monostable which generates Si. The modified structure of each stage is shown in figure 3.9. In this structure the neurons of the stage have been merged into a single neuron. This neuron has a current input and a sigmoid voltage output, thus eliminating the trans-impedance amplifier. The demultiplexed output is passed to the appropriate analog latches.
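A behavioral sketch of this handshaking, assuming one function call per clock tick and treating the analog latches as plain variables, is given below; the function and variable names are illustrative only.

```python
def pipeline_tick(latches, stages, mux_bases, counters):
    """One clock tick of the pipelined architecture (behavioral sketch).

    latches   : latch contents; latches[i] feeds stage i, latches[i+1] holds its result
    stages    : per-stage functions mapping latched inputs to outputs
    mux_bases : N_i, the number of time slots each stage needs
    counters  : per-stage slot counters (the N-base counter/decoder of each stage)
    """
    for i in reversed(range(len(stages))):          # reverse order: each stage sees last tick's latch
        counters[i] += 1
        if counters[i] == mux_bases[i]:             # Nth slot reached: stop signal S_i fires
            latches[i + 1] = stages[i](latches[i])  # succeeding latch captures the stage output
            counters[i] = 0                         # the stage halts until new inputs are latched
    return latches, counters
```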


Figure 3.8: Pipelined architecture


Figure 3.9: Modified structure of each stage

3.7 Performance analysis on an example:

The XOR problem has been selected for the VLSI implementation of the architecture of Aria Nosratinia [1] due to its disconnectedness. It cannot be solved by a simple perceptron, because the argument space is not linearly separable. A perceptron with one hidden layer can solve all problems where the argument space is divided into two convex open or closed regions of arbitrary shape [60].

The state of the network with a hidden layer, shown in figure 3.10, is governed by the following equations:

$$ s_j = \operatorname{sgn}\!\Big(\sum_k W_{jk}\, o_k - \theta_j\Big), \qquad s_i = \operatorname{sgn}\!\Big(\sum_j W_{ij}\, s_j - \theta_i\Big) $$

where Wjk and Wij are the synaptic weights between the input and hidden layer and between the hidden and output layer, θj and θi are the threshold values for the hidden- and output-layer neurons, sj and si are the outputs of the hidden and output layer, and ok is the input.

The hidden neuron s1 plays the role of a logical element representing the AND function, while s2 emulates the logical OR element. The combination of these two elements allows the generation of the boolean function XOR.
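A small numerical illustration of this decomposition, using the sgn form of the equations above, is given below. The weights and thresholds here are illustrative values chosen by hand, not the trained values of figure 3.11.

```python
import numpy as np

def sgn(x):
    return np.where(x >= 0, 1.0, 0.0)        # hard-limiting version of the neuron

def xor_net(o1, o2):
    # Illustrative weights/thresholds only (the trained values are those of figure 3.11)
    s1 = sgn(1.0 * o1 + 1.0 * o2 - 1.5)      # hidden neuron s1 ~ AND
    s2 = sgn(1.0 * o1 + 1.0 * o2 - 0.5)      # hidden neuron s2 ~ OR
    return sgn(-2.0 * s1 + 1.0 * s2 - 0.5)   # OR but not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))               # prints 0, 1, 1, 0
```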

Figure 3.10: XOR network with hidden layer

3.7.1 Software implementation of XOR:

The XOR function was implemented using a feed-forward neural network. The network consists of one hidden layer in addition to the input and output layers. The six connection strengths and three threshold values were calculated in the training process. These values were computed by MATLAB using the error back propagation algorithm with the following test vectors:

Table 3.1: Test vectors

The result of training is shown in figure 3.11. The simulation results of the network are given in table 3.2.

Table 3.2: Simulation results
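A minimal sketch of the training procedure is given below, assuming a 2-2-1 network of logistic units, batch gradient descent and an arbitrary learning rate; it stands in for the MATLAB run and does not reproduce the exact trained values of figure 3.11.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # test vectors
T = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

W1 = rng.normal(0, 1, (2, 2)); b1 = np.zeros((1, 2))   # 6 connection strengths and
W2 = rng.normal(0, 1, (2, 1)); b2 = np.zeros((1, 1))   # 3 biases (= negative thresholds)

def sig(x): return 1.0 / (1.0 + np.exp(-x))

eta = 1.0                                   # assumed learning rate
for _ in range(20000):
    H = sig(X @ W1 + b1)                    # hidden layer
    Y = sig(H @ W2 + b2)                    # output layer
    dY = (Y - T) * Y * (1 - Y)              # output delta (sigmoid derivative)
    dH = (dY @ W2.T) * H * (1 - H)          # back-propagated hidden delta
    W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(0, keepdims=True)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(0, keepdims=True)

# Typically approaches [0, 1, 1, 0]; some random seeds can stall in a local minimum
print(Y.round(3))
```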

Figure 3.11: Implemented network for the XOR function

3.7.2 Quantization:

The network shown in figure 3.11 has been trained by the error back propagation algorithm, which generates continuous-valued connection strengths. In the architecture of Nosratinia [1] the weights are represented in a fixed-precision format. This implies that the weights obtained from the training process have to be quantized and limited in range before they can be incorporated into the neural network architecture.

For the network and its connection strengths shown in figure 3.11, the dynamic range of the numbers is limited and the weights are concentrated within about ±8. The maximum range is therefore chosen as ±8, and no truncation is necessary.


The resolution of the quantization is

$$ Q = \frac{W_{max} - W_{min}}{2^{n}} $$

where Q is the resolution, Wmax and Wmin are the maximum and minimum values of the synaptic weights, and n is the number of bits. With eight bits of accuracy the resolution is 0.0625.
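A short sketch of the corresponding quantization step, assuming the ±8 range and 8-bit resolution stated above, follows; the function name and clipping behavior are illustrative.

```python
import numpy as np

def quantize(w, w_min=-8.0, w_max=8.0, n_bits=8):
    """Quantize a trained weight to the fixed-precision ROM format (illustrative).

    Resolution: Q = (w_max - w_min) / 2**n_bits = 16 / 256 = 0.0625 for 8 bits.
    """
    q = (w_max - w_min) / 2 ** n_bits
    w = np.clip(w, w_min, w_max)      # limit the range before quantizing
    return np.round(w / q) * q

print(quantize(3.1416))   # 3.125
print(quantize(11.0))     # 8.0 (out-of-range values are clipped)
```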

Figure 3.12 shows the network with quantized weights and threshold values. These are the values actually stored in the ROM, and system-level simulations show almost no change in the behavior of the circuits; the outputs match the original up to the two least significant digits, as shown in table 3.3.

Figure 3.12: Network with quantized values

Table 3.3: Simulation results with quantized values

3.7.3 VLSI Implementation of XOR:

The function XOR has been implemented using the architecture of Nosratinia [1]. The circuit consists of two layers: 2 input neurons, 2 hidden neurons and one output neuron. Two stages are integrated in one chip to realize the network as shown in figure 3.13. The nine synaptic weights of the network are saved in ROMs. The architecture and operation of each stage have been explained in sections 3.3 and 3.4. Since there are only two memory addresses in each ROM section, the clock signal and its complement were used for the selection and addressing of weights; address decoders have thus been avoided. To have a voltage-mode output, voltage-mode neurons have been used in the second stage. At the input, instead of voltage-to-current converters, two current-mode neurons have been used. This is permissible because the test vectors are digital and the behavior of this block is unimportant between 0 and 1.

Waveforms of the signals are given in figure 3.14. The output transition happens two clock cycles after each input transition. The data is presented to the circuit at a frequency of 2 MHz. The effect of the inhibit circuitry is very evident, as the input to the neurons is smooth. The waveforms presented in this section are the result of full-scale SPECTRE simulations. The simulation results of the individual blocks will be presented in the next chapter.

The neural chip is implemented in a 0.5 micron triple-metal, single-poly CMOS process provided by Hewlett-Packard through the Canadian Microelectronics Corporation, using Cadence DF-II design tools. It occupies 1500 × 1500 micrometers of die area including the bonding pads. The mask layouts of the chip and the cells are illustrated in Appendix A.

Figure 3.13: System-level VLSI layout for XOR

Figure 3.14: System-level simulations (waveforms of the clock, inputs A and B, inhibit input, stage-1 adder output, demultiplexer outputs, T-gate outputs, stage-1 neuron outputs, stage-2 adder output and network output)


3.7.4 Testing of neural chip:

This was a simple test chip which included two stages. The tests required an HP workstation with VeeTest software and CMC's TH 1000 testhead. To perform the tests, a program was written using HP's VeeTest software to control the testhead. This program controlled the testhead's current source and when measurements were taken. The measurements were graphed and saved on a Tektronix oscilloscope. The data was transferred from the oscilloscope to a PC and plotted with MATLAB. Figure 3.15 shows the input/output waveforms of the chip at a clock speed of 2 MHz.

Figure 3.15: Fabrication results of the neural chip


3.8 Summary:

Analog CMOS has been used as the basis of the architecture presented in this chapter. The main aims of this design are permanent weight storage and a reduction in physical interconnections. The multiplexing scheme in this architecture allows a reduction in the number of synaptic multipliers and physical interconnections. This is justified by the fact that biological neurons are generally much slower than CMOS circuitry and even then achieve performance unparalleled by a digital computer.

The XOR problem has been used as an example to test the validity of the concept. The boolean function XOR has been implemented in software using the error back propagation algorithm, and the trained network is implemented in hardware using the architecture. Since the XOR function is universal, the architecture need not be trainable; a ROM has been used to store the digital weights (quantized analog weights).


Chapter 4

VLSI Circuitry

The main building blocks of the multi-layer feed-forward neural network for the realization of the boolean function XOR are described in this chapter. The HSPICE level 3 and level 13 models have been used for the simulations and are shown in Appendix B.

4.1 Neuron:

In neural network models, the neuron has a nonlinear transfer characteristic in the form of a sigmoidal function, which has different saturation levels for low and high inputs. It is continuously differentiable, which makes it a good approximation to the activation of biological neural cells. Feed-forward multi-layer neural networks are trained by steepest-descent algorithms like error back propagation, so a sigmoidal characteristic is usually considered for the neuron:
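Assuming the standard logistic form, which is consistent with the derivative property noted below:

$$ f(x) = \frac{1}{1 + e^{-x}}, \qquad f'(x) = f(x)\,\big(1 - f(x)\big) $$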


The sigmoidal function has the advantage that its derivative can be expressed in terms of itself and its shifted version. The derivative is used in the weight update.

Figure 4.1: Sigmoid function and its derivative

The MOS transistor differential pair shown in figure 4.2 has been used to approximate these characteristics. The large-signal behavior of the differential pair is given by [72]:


M1 and M2 are assumed to be in saturation. The solutions for I1 and I2 can be obtained by substituting (4.4) in (4.3). The four regions of operation are:

Region 1:


The transfer characteristic of the differential pair when ISS = 1 and β = 1 is given in figure 4.2. The shape of this curve is a good approximation to figure 4.1.
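Since equations (4.3) and (4.4) are not reproduced here, the sketch below uses the textbook square-law expression for a saturated MOS differential pair; β and ISS are assumed symbols, and the normalization ISS = β = 1 corresponds to the curve of figure 4.2.

```python
import numpy as np

def diff_pair_currents(v_id, i_ss=1.0, beta=1.0):
    """Textbook square-law drain currents of a saturated MOS differential pair.

    Valid while both devices conduct, i.e. |v_id| <= sqrt(2*i_ss/beta); beyond that
    the tail current is fully steered to one side (the clipping below models this).
    """
    v_lim = np.sqrt(2.0 * i_ss / beta)
    v = np.clip(v_id, -v_lim, v_lim)
    d_i = 0.5 * beta * v * np.sqrt(4.0 * i_ss / beta - v ** 2)   # I1 - I2
    return (i_ss + d_i) / 2.0, (i_ss - d_i) / 2.0

# Normalization used for figure 4.2: i_ss = beta = 1
i1, i2 = diff_pair_currents(np.linspace(-2.0, 2.0, 9))
```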

Figure 4.2: Differential transistor pair and its transfer characteristics


4.1.1 Current mode neuron:

The current-mode neuron, shown in figure 4.3, is a neuron with a fixed threshold voltage [63]. The threshold value is also expressed as a bias weight, which when multiplied by negative unity yields the threshold value. The effect of the threshold is therefore included in the post-synaptic neuron activation.

The circuit core consists of the differential pair transistors M5 and M6. M7 is an ideal current source and the bias current is reflected to it. The gate of M6 is the neuron input and the gate of M5 is connected to the drains of M1 and M2. The W/L ratios of M1 and M2 are adjusted such that the drains of these transistors are at 0 volts and provide the bias current through M7 to the differential pair. The current output of the neuron from M6 is reflected through an output driver consisting of the current mirrors M4-M8 and M9-M10. M3 is used to make the circuit symmetric. The capacitor connected at the input node serves to reduce the voltage drop over time between two refresh cycles, i.e. it acts as a short-term memory.

Table 4.1: Device sizes of the current-mode neuron

Figure 4.3: Current-mode neuron

Figure 4.4: Transfer characteristic of the current-mode neuron


The circuit is implemented in an N-well process and is thus prone to the body effect, so Monte Carlo analyses were performed. In these tests a uniformly distributed variation of 10% around the nominal values is allowed in the threshold voltage VT of the transistors in the design [74]. Figure 4.5 shows the result of the analysis over thirty iterations. The output was found to be within 3% of the desired value, i.e. 22 microamperes.


Figure 4.5: Results of Monte Carlo analysis for the current-mode neuron

The circuit is implemented in the 0.5 micron CMOS single-poly, triple-metal technology provided by Hewlett-Packard, has an area of 57.4 × 23.5 µm and has a peak power dissipation of 0.6 milliwatts. The device sizes are shown in table 4.1. The mask layout of the cell is presented in Appendix A.


4.1.2 Voltage Mode Neuron:

The voltage-mode neuron is designed by adding a current-to-voltage converter to the current-mode neuron [1]. The constant current from the drain of M6 is subtracted by M9 from the input node of M10 and M11, which form the current-to-voltage converter. The schematic and transfer characteristics are shown in figures 4.6 and 4.7. The device sizes are shown in table 4.2. It has an area of 61.1 × 25.6 µm and a peak power dissipation of 1.1 milliwatts.

The voltage-mode neuron has positive and negative saturation levels. Since this neuron is only used at the output stage, the saturation levels will not affect the network operation adversely [56].

Table 4.2: Device sizes of the voltage-mode neuron

The results of the Monte Carlo analysis for the voltage-mode neuron are shown in figure 4.8; the results are within 1% of the desired value, i.e. 4.5 volts.


The cell is implemented in the 0.5 micron CMOS single-poly, triple-metal technology provided by Hewlett-Packard. The mask layout of the cell is presented in Appendix A.


Figure 4.6: Voltage-mode neuron

Figure 4.7: Transfer characteristics of the voltage-mode neuron

Figure 4.8: Monte Carlo analysis for the voltage-mode neuron

4.2 Synapse:

The synapse is the connection between two neurons. It is realized by a multiplier and the connection strength. The synapse in this architecture is realized by a current-mode multiplying digital-to-analog converter, and the connection strengths are stored digitally in ROM.


4.2.1 Multiplier:

The multiplier shown in figure 4.9 is a current-mode digital-to-analog converter and multiplier. The multiplying elements are the digital connection weight and the analog input current; the product is an analog output current. The traditional method of making a converter in a linear bipolar process is to generate binary-weighted current mirrors [87]. The same technique can be employed with MOS transistors. The difficulty experienced with this method is the variation of the currents due to mismatch in their drain-source voltages. This can be considerably reduced by using cascode or Wilson current mirrors [72].

The circuit consists of a series of cascode current mirrors, each of which divides the input current by half. The current outputs of these mirrors are summed by NMOS transistors acting as switches controlled by the digital input bits. The choice of cascode current mirrors is based on their high output resistance and the graceful degradation of the linear transfer characteristics [1]. The cell has an area of 101.7 × 46.7 µm and a peak power dissipation of 6 milliwatts.

A fixed device size has been used for all the NMOS and PMOS transistors in all the stages. An 8-bit digital-to-analog converter and multiplier is implemented in the 0.5 micron single-poly, triple-metal CMOS technology provided by Hewlett-Packard. The mask layout of the cell is presented in Appendix A. The transfer characteristic of the cell with all the digital inputs set to 1 is shown in figure 4.10.

Figure 4.9: Multiplier/digital-to-analog converter

Figure 4.10: Transfer characteristics of the MDAC

4.2.2 ROM:

One factor in the selection of a digital memory device is the large number of weights, i.e. for N neurons, O(N²) weights must be stored. The NAND configuration results in a considerable loss in performance and is only useful for small memory arrays [5]. Therefore the NOR ROM shown in figure 4.11 has been selected for the implementation. The combination of p-channel pull-up transistors and n-channel pull-down transistors constitutes a pseudo-NMOS NOR gate with the word lines as inputs. The low resistance of the pull-up transistors reduces the noise margins, which is controlled by feeding the bit lines to complementary inverters. Since the delay introduced in the circuit by the analog blocks is much higher than that of the ROM, a static ROM is sufficient for the system. The simulation results for the cell are shown in figure 4.12.

Figure 4.11: ROM

Figure 4.12: Simulation results of the ROM


The cell has been implemented in the 0.5 micron single-poly, triple-metal CMOS technology provided by Hewlett-Packard. It has an area of 106.3 × 74.3 µm and a peak power dissipation of 17.5 milliwatts. The mask layout is presented in Appendix A. The cell has also been simulated with the Verilog-XL simulator; the Verilog source code is given in Appendix C.

4.3 Trans-Impedance amplifier:

The trans-impedance amplifier shown in figure 4.13 is basically a current-to-voltage converter. This circuit is part of the adder block shown in figure 3.5 and converts the summed current output of the MDACs to a voltage. A prime requirement of the trans-impedance amplifier is to desensitize the input from capacitive loading. In the bipolar technique this is achieved with Miller capacitance at the input transistor stage [88]. In MOS circuits it can be implemented by stabilizing the circuit with the load impedance or by adding a compensation capacitor across the feedback resistor [1].

It consists of two stages: a differential amplifier (M3, M4, M5, M6, M7, M8, M9) and a second gain-boosting stage comprising M1, M2, M10 and M11. Voltage-parallel feedback is applied through the active resistor formed by M12 and M13, which results in predictable I-V characteristics. The drain current of M6 is reflected to M10 through M4 and is available for sourcing to the load. M11 provides the ability to sink current from the output load. A capacitor is added to stabilize the circuit as well as to decrease the settling time. The transient response of the cell is shown in figure 4.14. The device sizes are shown in table 4.3. The cell is implemented in the 0.5 micron single-poly, triple-metal CMOS technology provided by Hewlett-Packard. It has an area of 55.5 × 29 µm and a peak power dissipation of 3.3 milliwatts. The mask layout of the cell is presented in Appendix A.
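For system-level estimates, the amplifier can be approximated by a first-order model whose output settles exponentially toward the transresistance times the input current. The sketch below uses illustrative values, not the fabricated cell's parameters.

```python
import numpy as np

def tia_step_response(i_in, r_t=100e3, tau=50e-9, t_end=1e-6, dt=1e-9):
    """First-order settling of a trans-impedance amplifier to a current step (sketch).

    i_in : input current step (A); r_t : transresistance (ohm);
    tau  : closed-loop time constant, set largely by the compensation capacitor.
    All values here are illustrative assumptions.
    """
    t = np.arange(0.0, t_end, dt)
    v_out = i_in * r_t * (1.0 - np.exp(-t / tau))
    return t, v_out

t, v = tia_step_response(10e-6)
print(v[-1])   # approaches 1.0 V (= 10 uA * 100 kohm)
```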

Table 4.3: Device sizes of the trans-impedance amplifier

Figure 4.13: Trans-impedance amplifier

Figure 4.14: Transient response of the trans-impedance amplifier

4.4 VLSI implementation of the capacitor:

Theoretically, the capacitance is represented by

$$ C = \varepsilon \frac{A}{d} \qquad (4.11) $$

where A is the area of the electric plates, d is the distance between the two plates and ε is the dielectric constant of the insulator.

A capacitor can be made using a standard MOS transistor. This approach suffers from two drawbacks: first, the capacitance of a MOS device is nonlinear, and secondly it uses a large chip area. For a MOS transistor used as a capacitor the area is given by

$$ A = WL = \frac{C \, T_{ox}}{\varepsilon_{ox}} $$


where Tox is the thickness of the gate oxide, εox is the dielectric constant of the gate oxide, C is the value of the capacitance and A is the area (A = WL).

There are two methods for the passive realizations of capacitors. One method uses double

polysilicon separated by silicon dioxide. It requires a double polysilicon process as upper and

lower plates of the capacitor are formed with polysilicon. The dielectric is formed by a thin

silicon dioxide layer. The other method uses a conducting layer on top of crystalline silicon

separated by a dielectric silicon dioxide. In order to achieve a low voltage coefficient, the bottom

plate must be heavily doped diffusion [72].

Our design is implemented in a 0.5 micron CMOS process which is a single-poly process; therefore the latter method described above has been used to implement the capacitor. The capacitance in this case is given by

where C0 is the nominal capacitance and C1 and C2 are voltage-dependent coefficients, which are negligible [73]. This method showed a more linear behavior and a much smaller area in comparison to the active capacitor. Table 4.4 shows the sizes versus capacitances.

Table 4.4: Device sizes of capacitors

The mask layout of the capacitor is shown in Appendix A.

4.5 Summary:

The main building blocks for the VLSI implementation of the boolean function XOR by a multi-layer feed-forward neural network have been described in this chapter. These blocks were redesigned for implementation in 0.5 micron technology [1, 63].

A symmetric differential pair is used as the main element in the current- and voltage-mode neurons, and has the characteristics of the sigmoid function. The hybrid synapse has been constructed from a current-mode multiplying digital-to-analog converter and a static ROM; the multiplier uses cascode current mirrors. The trans-impedance amplifier for the conversion of current to voltage uses voltage-parallel feedback for predictable I-V characteristics, and the compensation capacitor ensures stability.


Chapter 5

Conclusion

Among neural networks, the multi-layer feed-forward neural networks are the major and most widely applicable model. The multistage nonlinear characteristics, lack of feedback and powerful training schemes provide great potential for image/speech recognition, control, etc. Although neural networks can be implemented in software, they take a long time to simulate, because a highly parallel system is modeled on a serial processor. Complete exploitation of their potential therefore requires a large number of operations at high speed, which necessitates hardware realization.

Analog VLSI offers area-efficient implementations of the functions required in neural networks, such as multiplication, summation and the sigmoid transfer function. However, analog circuits are sensitive to the problems of process variation, device matching and cascadability. For this reason attention must be given to the limitations of MOS transistors and to the design techniques. As a result, the high accuracy and linearity found in digital implementations can be traded off for the simplicity, speed, silicon area and interconnectivity found in analog circuits. One major drawback of an analog implementation, however, is the non-availability of a reliable non-volatile memory to store the synaptic weights. Neural networks also have a large amount of connectivity, which is a major problem in the hardware realization. The former problem can be solved by using a mixed-signal architecture in which the synaptic weights can easily be stored in digital memories. The reduction in physical interconnections can be achieved by implementing time-division multiplexing.

The mixed-signal architecture [1], based on analog CMOS units and a digital synaptic weight memory, has been analyzed. The architecture is well suited to descent-based learning algorithms such as error back propagation and weight perturbation [63]. As an example for the architecture, the boolean function XOR has been implemented. The building blocks and the network have been implemented in a 0.5 micron CMOS process.

The architecture was analyzed in chapter 3. In this architecture the number of physical interconnections and multipliers is reduced considerably from the full implementation. The I/O mode of the network was also discussed in section 3.4. The boolean function XOR was implemented as an example to test the validity of the concept. Since the architecture is hybrid, quantized analog values are stored in digital memory; the network with quantized values is discussed in section 3.7.2. The VLSI implementation of the XOR network is discussed in section 3.7.3.


The analog and digital CMOS blocks [1] have been redesigned and are presented in chapter 4. The neuron is presented in two modes, current and voltage mode, with a fixed threshold. An MDAC and a ROM constitute the synapse: the ROM stores the synaptic weight and the MDAC performs the synaptic multiplication. The trans-impedance amplifier is the linear current-to-voltage converter. It is a differential amplifier with voltage-parallel feedback; stability is ensured by connecting a compensation capacitor across the feedback. The robustness of the neuron circuits has been tested by Monte Carlo analysis.

Speed improvements through circuit design are an important area for future research. New integration technologies with smaller channel lengths and lower power supply ranges will add new constraints to the circuit design, and the current building blocks will have to be modified before they can be used in advanced integration technologies. Low-power and faster building blocks will increase the speed of the network. This is an uphill task which, if successfully done, will enhance the speed of the architecture and decrease the power consumption.


APPENDIX A

Mask Layout Diagrams

This appendix contains the layout diagrams of the building blocks of the network. The layout diagram of the chip for the XOR function is also presented. The blocks and the chip are implemented in the 0.5 micron single-polysilicon, triple-metal CMOS technology. The mask layouts of the cells/chip are:

1. Current mode neuron.
2. Voltage mode neuron.
3. Multiplier and digital-to-analog converter (MDAC).
4. Trans-impedance amplifier.
5. ROM with buffers.
6. Capacitor.
7. WRZAC (the neural chip for XOR).

Figure A.1: Mask layout of the current-mode neuron

Figure A.2: Mask layout of the voltage-mode neuron

Figure A.3: Mask layout of the multiplier and digital-to-analog converter

Figure A.4: Mask layout of the trans-impedance amplifier

Figure A.5: Mask layout of the ROM with static buffers

Figure A.6: Mask layout of the capacitor

Figure A.7: Mask layout of the neural chip


APPENDIX B

Simulation Models

B.1 HSPICE level 3 model parameters:

.MODEL CMOSN NMOS LEVEL=3

+ PHI=0.700000 TOX=9.6000E-09 XJ=0.200000U TPG= 1 + VTO=0.6566 DELTA=6.9 100E-O 1 LW.7290E-O8 KP= 1.9647E-04 + UO=546.2 THETA=2.684ûE-0 1 RSH=3.5 12OEi-O 1 GAMMA=0.5976 + NSUB= 1.39îOE+ 17 WS=5.9090E+11 VMAX=2.008OE+05 ETA=3.7 180E-02 + KAPPA=2.8980E-02 CGDO=3.05 ISE- 10 CGSO=3.05 ISE- 10 + CGBO4.0239E- 10 CJz5.62E-04 MJ=0.559 CJSW=S.OOE- 1 1 + MJSW-0.52 1 PB4.99 + X W 4 . IOSE-07 * Weff = Wdrawn - Delta-W * The suggested Delta-W is 4.108OE-07

.MODEL CMOSP PMOS LEVEL=3 + PHI=0.700000 TOX-9.6000E-09 XJ4.200000U TPG=- 1 + VTO=-0.92 13 DELTA=2.8750E-0 1 LLk3.5070E-O8 KP4.8740E-05 + UO= 135.5 THETA= 1.8070E-0 1 RSH= 1.1000E-0 1 GAMMA=0.4673


+ NSUB=8.5 1 20E+ 16 NFS=6SûûûE+ 1 1 VMAX=2.5420E+05 ETA=2.45OOE-02 + KAPPA=7.958OE+OO CGDCk2.3922E- IO CGSO=2.3922E- 10 + CGBOz3.7579E- 10 CJz9.35E-04 MJz0.468 CJS W S 8 9 E - 10 + MJSW=0.505 PBd.99 + XW~3.622E-07 * Weff = Wdrawn - Delta-W * The suggested Delta-W is 3.6220E-07

B.2 HSPICE level 13 (Berkeley Level 4) model parameters:

NMOS PARAMETERS

.MODEL CMOSN nmos levek 13 +vfbO=-7.05628E-0 1 ,Ivfb=-3 -86432E-02, wvfb4.98790E-02 +phi0=8.4 1 845E-0 l,lphi=0.00000E+00, wphi=û.OOûûûE+OO +k l=7.7657OE-O 1 ,Ik 1=-7.65089E-O4,wk l=-4.83494E-O2 +k2=2.66993E-02,lk24.57480E-02,~k2=-2.589 17E-02 +e tao=- 1 -94480E-03, leta= 1.7435 1 E-02, weta=-5 -089 1 &-O3 +muz=5.75297E+02,dlO= 1 -70587E-00 1 ,dwW.75746E-00 1 +u00=3.305 13E-0 1,1~0=9.75 1 lOE-O2,~~0=-8.58678E-02 +U 1 =3 -26384E-02, IU 1 =2.94349E-O2, WU 1 =- 1.38002E-02 +x2rn=9.73293E+00,1~2m=-5.62944E+OO, wxSrn=6.55955E+00 +x2e=4.37 180E-04,1~2e=-3.070 IOE-03, wx2e=8.94355E-04 +x3e=-5.050 12E-O5,lx3e=- 1 -68530E-03,wx3e=- I -4270 1E-03 + ~ 2 ~ 0 = - 1.1 1542E-02,1~2~0=-9.58423E-04, ~ ~ 2 ~ 0 4 . 6 1645E-03 + X ~ U 1 =- 1.04401E-03,lx2u 1= 1.2900 I E - O ~ , W X ~ U 1=-7.1009SE-04 +mus=6.927 16E+02,lms=-5.2 1760E+O 1, wms=7.009 12E+00 +x3ms=-6.4 1307E-02, Wms= 1.37809E+00, wx2ms-d. 1 S45SE+OO +x3ms=8.86387E+OO, lx3ms=2.0602 lE+ûO,wx3ms=-6.198 17E+00 +x3u 1 =9.02467E-03,lx3u 1 =2.06380E-û4,~~3u 1=-5.202 18E-03 +toxrn=9.60000E-003, tempm=2.70000E+O 1, vddm=S.OûûûûE+ûû


*N+ diffusion:: * +rshm=2.1, cjm=3 SOOOOûe-04, cjw=2.900000e- 10, +ijs= l e-08, p j d . 8 +pjw=0.8, mj0=0.44, mjw=0.26, wdf=O, ds=O

Gate Oxide Thickness is 96 Angstroms

PMOS PARAMETERS .MODEL CMOSP pmos level= 13 +vfbO=-2.026 lOE-Ol,l~fb=3.59493E-O2,~~fb=- 1.1065 1E-0 1 +phi0=8.25364E-0 l,lphi=û.OûûE+ûû, wphi=û.O0000E+OO +k 1~3.54 162E-01 ,lkl=-6.88 193E-02, wk1=1 S2476E-0 1 -tu=-4.5 1065E-02, Ikh9.4 1324E-03, ~k2=3.52243E-02 +etaO=- 1 -07507E-02, leta= 1 -96344E-02,weta=-3.5 1067E-04 +muz= l.37992€+02,dlO= 1 -92 169E-00 1 ,dwW.68470E-00 1 +uOO= 1.8933 1 E-O l,Iu0=6.30898E-02,~~0=-6.38388E-02 +U 1 = 1.3 17 10E-02, lu 1 = 1 -44096E-02, WU 1=6.92372E-04 +x2m=6.57709E+Qû,lx2m=- 1.56096E+00, wx2m= l.I3564E+OO +x2e=4.68478E-O5,1~2e=- 1 -09352E-03,wx2e=-1.53 I 1 1E-04 +x3e=7.76679E-O4,1~3e=- 1.972 13E-M,wx3e=- 1.12034E-03 +x3u0=8.7 1439E-03,1~2~0=- 1 -92306E-03, WX~UO= 1 -86243E-03 +x2u 1 =5 -9894 1 E-04,1~2~ 1 =4.54922E-O4, W X ~ U 1 =3.11794E-04 +mus= 1.49460E+02,lms= 1.36 l52E+û 1, wms=3.55246E+ûû +x2ms=6.37235E+00,1x2ms=-6.63305E-01, wx2ms=2.25929E+00 +x3ms=- 1.2 1 135E-O2,1~3ms= 1.92973E+00, wx3rns=1.00 182E+00 +x3u 1 =- 1.1 6599E-O3,1~3u 1 =-5.08278E-04, W X ~ U 1 S.5679 1 E-04 +toxm=9.60000E-003, tempm=2.70000E+Ol, vddm=5.00000E+00 +cgdom=4.18427E-0 IO,cgsom=4.18427E-0 lO,cgbornd.33943E-010 +xpart= l.000OûE+ûûO,durn 1 =0.00000E+000,dumS=O.O0000E+000 +nO= 1.00000E~,ln0=0.00000E+000,wnW.OOOOOE+000 +nb0=0.00000E~,Inb=O.00000E+000,wnb=0.00000E+000 +ndO=O.00000E+000,lnd=O.O0000E+000, wnd=û.O0000E+Oûû


*P+ diffusion::


APPENDIX C

Verilog Source Code

C.1 The Verilog behavioral source code:

// Verilog HDL for "thesis", "ROM_B", "behavioral"

module ROM_B (a1, a2, a3, a4, a5, a6, a7, a8,
              b1, b2, b3, b4, b5, b6, b7, b8,
              c1, c2, c3, c4, c5, c6, c7, c8,
              input1, input2);

output a1, a2, a3, a4, a5, a6, a7, a8;
output b1, b2, b3, b4, b5, b6, b7, b8;
output c1, c2, c3, c4, c5, c6, c7, c8;
input  input1, input2;

// internal wiring
wire x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12,
     x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24;
wire z1, z2, z3, z4, z5, z6, z7, z8, z9, z10, z11, z12,
     z13, z14, z15, z16, z17, z18, z19, z20, z21, z22, z23, z24;

// power supplies
supply1 vdd;
supply0 gnd;

// pull-up transistors (pmos; gates tied low, so they are always on)
pmos p1(x1, vdd, gnd);   pmos p2(x2, vdd, gnd);   pmos p3(x3, vdd, gnd);
pmos p4(x4, vdd, gnd);   pmos p5(x5, vdd, gnd);   pmos p6(x6, vdd, gnd);
pmos p7(x7, vdd, gnd);   pmos p8(x8, vdd, gnd);   pmos p9(x9, vdd, gnd);
pmos p10(x10, vdd, gnd); pmos p11(x11, vdd, gnd); pmos p12(x12, vdd, gnd);
pmos p13(x13, vdd, gnd); pmos p14(x14, vdd, gnd); pmos p15(x15, vdd, gnd);
pmos p16(x16, vdd, gnd); pmos p17(x17, vdd, gnd); pmos p18(x18, vdd, gnd);
pmos p19(x19, vdd, gnd); pmos p20(x20, vdd, gnd); pmos p21(x21, vdd, gnd);
pmos p22(x22, vdd, gnd); pmos p23(x23, vdd, gnd); pmos p24(x24, vdd, gnd);


//transistors nmos

// inverters not nl(z1, X I ) ; not n2(alT zl); not n3(z2, x2); not n4(a2,z2); not nS(z3, x3); not n6(a3, 23); not n7(z4, x4); not n8(a4,z4); not ng(z5, x5); not n lO(a5,zS); not n 1 1 (26, x6); not n l2(a6,z6); not n 13(27, x7); not n 14(a7,z7); not n lS(z8, x8); not n 16(a8,28); not n 17(z9, x9) ; not n 18(b 1, 29); not n19(z10, x10); not n20(b2, z10); not nSl(zl1, XI 1); not n22(b3, z 1 l ); not n23(z 12, x 12); not n24(b4, z 12); not n25(z13, x13); not n26(b5, z 13);


not n27(z 14, x 14); not n28(bG, z 14); not n29(z15, x15); not n30(b7, z 15); not n31(z16, x16); not n32(b8, 216); not n33(2 17, x 17); not n34(c 1, z 17); not n35(z18, x 18); not n36(c2, z 18); not n37(z 19, x 19); not n38(c3, z 19); not n39(z20, x20); not n40(c4,z20); not n41(z21, x21); not n42(c5, z2 1); not n43(z22, x22): not nM(c6.222); not n45(z23, x23); not n46(c7,z23); not n47(z24, x24); not n48(c8, 224);

reg al. a2. a3, a4, a5, a6, a7, a8, b l , b2. b3. b4. b5, b6. b7. b8, c l , CS, c3, c4, c5. c6. c7, c8;

initial begin

a l = l'bO; a2 = I'bO; a3 = 1 ' bO; a 4 = 1 * bO; a5 = 1'bO; a6 = 1 ' bO; a7 = 1 'bO; a8 = l'bO; bl = l'bO; b2 = 1 ' bO; b3 = 1'bO; b4 = I'bO; b5 = 1 %O; b6 = 1'bO; b7 = 1 'bO;


# 1 Sdisplay("a1 =Sb. a2=%b, a3=%b. a4=%b. a5=%b, a6=%b. a7=%b. a8=%b. b 1 =%b. b2=%b. b3=8b. b4=%b, b5=%b, b6=%b. b7=%b, b8=%b, cl=%b. c2=%b. c3=%b, c4=%b. c5=%b, c6=%b. c7=%b. c8=%b\n",al, a2. a3. a4. a5, a6. a7. a8, bl, b2. b3. b4. b5. b6, b7, b8. cl. c2, c3. c4, c5, c6. c7, c8);

a l = l'bO; a2 = l ' b l ; a3 = libO; a 4 = l 'bl ; a5 = 1 %O; a6 = l'bO; a7 = I 'bO; a8 = l 'b l ; b1 = l'bO; b2 = I 'bl; b3 = l'bO; b 4 = l 'bl; b5 = l'bO; b6 = I'bO; b7 = l'bO; b8 = l 'b l ; c l = l 'bl; CS = l 'bl; c3 = l 'bl; c4 = l 'b l ; c5 = l 'bl; c6 = l 'b l ; c7 = l 'bl; c8 = I'bO;

#50 Sdisplay("al=%b, a2=%b, a3=%b1 a4=%b, a5=%b, a6=%b, a7=%b. a8=%b. bl=%b. b2=%b, b3=%b, b4=%b, b5=%b, b6=%b1 b7=%b, b8=%b, cl=%b, c2=%b, c3=%bl ~ 4 = % b , c5=%b, c6=%b. c7=%b, c8=%b\nV,al, a2, a3, a4, a5, a6. d, a8, bl, b2. b3, b4, b5. b6. b7, b8. cl . c2, c3, c4, c5, c6, c7, cg);


al = l 'b l ; a 2 = l 'bl; a3 = I'bl; a4 = l 'bl; a5 = I'bO; a6 = I'bO; a7 = I'bO; a8 = 1'bO; bl = l 'bl; b2 = l 'bl; b3 = l 'bl; b4 = l 'bl: b5 = I'bO; b6 = 1 'bO; b7 = l'bO; b8 = I'bO; c l = l'bO; c2 = I'bO; c3 = l 'bl; c4 = 1 'bO; c5 = l 'b l ; c6 = I 'bl; c7 = l 'b l ; c8 = 1 'bO;

#50 Sdisplay("al=%b, a2=%b, a3=%b, a4=%b, aS=c/ob, a6=%b, a7=%b, a8=%b, bl=%b, b2=%b, b3=%b, b4=%b, b5=%b, b6=%b, b7=%b, b8=%b, cl=%b, c2=%b, c3=%b, c4=%b, c5=%b, c6=%b, c7=%b, c8=%b\n",al, a2, a3, a4, a5, a6, a7, a8, bl, b2, b3, b4, bS, b6, b7, b8, c l , c2, c3, c4, c5, CO, ~7,428);

end

endmodule

C.2 The Verilog stimulus source code:

// Verilog HDL for "thesis", " R O M j " "behavioral"

module R O M j ;

reg input 1, input2; wire a l , a2, a3, a4, a5, a6, a7, a8, b l , b2, b3, b4, b5, b6, b7, b8, cl, c2, c3, c4, c5, c6, c7, c8;


R O M 3 sl(al,aS,~.a4,a5,a6,a7,a8,bl,b2, b3, b4, b5, b6, b7, b8, c 1, c2, c3, c4, c5, c6, c7, c8, input 1, input2);

initial begin

input 1 = 1 %O; input2 = l'bû; #1 $display( "al = %b, a2 = %b, a3= %b, a4= %b, a5 = %b, a6 = %b. a7 =ab . a8 = %b. bl

=%b,b2=%b,b3=%bTb4=%b,b5=%b,b6=%b,b7=%b,b8=%b,cl=%b,c2=%b,c3= %b, c4 = %b, c5 = %b, c6= %bc7 = %B, c8 = %b input1 = %b, input2 = %b\nW.al, a2, a3, a4, a5. a6, a7, a8, bl. b2, b3, b4, b5, b6, b7, b8, c 1, c2, c3, c4, c5, c6, c7, c8, inputl, input2):

input l = 1 'b 1; input2 = l'bO; #50 $display( "a1 = %b, a2 = %b, a3= %b, a4= %b, a5 = %b, a6 = %b, a7 =%b, a8 = %b, bl

=%b, bZ= %b, b3 =%b, b4= %b, b5 =%b, b6=%b, b7 =%b, b8 =%b,cl =%b,c2=%b,c3 = %b, c4 = %b, c5 = %b, c6= %bc7 = %B,c8 = %b inputl = %b, input2 = %bùi",al, d, a3. a4, a5, a6, a7, a8, b 1, b2, b3, b4, b5, b6, b7, b8, c 1, c2, c3, c4, c5, c6, c7, c8, input 1, input2):

input 1 = 1 'bû; input2 = l'bl; #50 $display( "al = %b, a2 = %b, a3= %b, a4= %b, a5 = %b, a6 = a b , a7 =%b, a8 = %b, bl

=%b,b2=%b,b3=%b,b4=%b1b5=%b,b6=%b,b7=%b,b8=%b,cl =%b,c2=%b,c3= %bT c4 = lob, CS = %b, c6= %b c7 = %B, c8 = %b inputl = %b, input2 = %b\n",al, a2, a3, a4, a5, a6, a7, a8, bl, b2, b3, b4, b5, b6, b7, b8, cl , c2, c3, c4, c5, c6, c7, c8, input 1, input2);

end


REFERENCES

[1] A. Nosratinia, "An architecture for multi-layer feedforward neural networks", M.A.Sc. thesis, University of Windsor, 1991.

[2] W. James, Psychology (Briefer Course). New York: Holt, Chapter XVI, "Association", 1890, pp. 253-279.

[3] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity", Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133, 1943.

[4] J. von Neumann, The Computer and the Brain. New Haven: Yale University Press, 1958, pp. 66-82.

[5] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Prentice Hall, December 1995.


[6] B. Widrow and M. A. Lehr, "30 years of adaptive neural networks: Perceptron, Madaline and Backpropagation", Proceedings of the IEEE, Vol. 78, No. 9, pp. 1415-1442, Sept. 1990.

[7] B. Boser, E. Sackinger, J. Bromley, Y. LeCun and L. Jackel, "Hardware requirements for neural network pattern classifiers", IEEE Micro, Vol. 12, No. 1, pp. 32-39, Feb. 1992.

[8] E. Sackinger, B. Boser, J. Bromley, Y. LeCun and L. Jackel, "Application of the ANNA neural network chip to high-speed character recognition", IEEE Trans. Neural Networks, Vol. 3, No. 3, pp. 498-505, May 1992.

[9] T. X. Brown, M. D. Tran, T. Daud and A. P. Thakoor, "Cascaded VLSI neural network chips: Hardware learning for pattern recognition and classification", Simulation, pp. 340-346, May 1992.

[10] J. von Neumann, "First draft of a report on the EDVAC", in The Origins of Digital Computers: Selected Papers, B. Randell (Editor), Berlin: Springer-Verlag, 1945/1982.

[11] A. Sankar and R. Mammone, "Speaker independent vowel recognition using neural trees", Int. Joint Conf. on Neural Networks, Vol. 2, Seattle, WA, pp. 809-814, 1991.

[12] D. O. Hebb, The Organization of Behavior, New York: Wiley, 1949.

[13] F. Rosenblatt, "The perceptron: a probabilistic model for information storage and organization in the brain", Psychological Review, Vol. 65, pp. 386-408, 1958.

[14] B. Widrow and M. E. Hoff, "Adaptive switching circuits", 1960 IRE WESCON Convention Record, IRE, New York, pp. 96-104.

[15] M. Minsky and S. Papert, Perceptrons, Cambridge, Massachusetts: MIT Press, 1969.

[16] T. Kohonen, "Correlation matrix memories", IEEE Transactions on Computers, Vol. C-21, pp. 353-359, 1972.

[17] S. Grossberg, "Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors", Biological Cybernetics, Vol. 23, No. 4, pp. 121-134, July 1976.

[18] J. L. McClelland and D. E. Rumelhart, "An interactive activation model of context effects in letter perception: Part I. An account of basic findings", Psychological Review, Vol. 88, pp. 375-407, 1981.

[19] L. O. Chua and L. Yang, "Cellular Neural Networks: Theory", IEEE Trans. Circuits & Systems, Vol. 35, No. 10, pp. 1257-1272, Oct. 1988.

[20] D. E. Rumelhart and J. L. McClelland, "An interactive activation model of context effects in letter perception: Part II. The contextual enhancement effect and some tests and extensions of the model", Psychological Review, Vol. 89, pp. 75-112, 1982.

[21] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities", Proceedings of the National Academy of Sciences, Vol. 79, pp. 2554-2558, April 1982.


[22] J. J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons", Proceedings of the National Academy of Sciences, Vol. 81, pp. 3088-3092, May 1984.

[23] T. Kohonen, "Self-organized formation of topologically correct feature maps", Biological Cybernetics, Vol. 43, pp. 59-69, 1982.

[24] T. Kohonen, "Adaptive, associative, and self-organizing functions in neural computing", Applied Optics, Vol. 26, No. 23, pp. 4910-4918, December 1987.

[25] T. Kohonen, Self-Organization and Associative Memory, Berlin: Springer-Verlag, 1984.

[26] K. Fukushima, "Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position", Biological Cybernetics, Vol. 36, pp. 193-202, April 1980.

[27] K. Fukushima, S. Miyake and T. Ito, "Neocognitron: a neural network model for a mechanism of visual pattern recognition", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-13, No. 5, pp. 826-834, Sept./Oct. 1983.

[28] K. Fukushima, "Neocognitron: a hierarchical neural network capable of visual pattern recognition", Neural Networks, Vol. 1, No. 2, pp. 109-130, 1988.

[29] K. Fukushima, "A neural network for visual pattern recognition", IEEE Computer Magazine, pp. 65-75, March 1988.


[30] D. H. Ackley, G. E. Hinton and T. J. Sejnowski, "A learning algorithm for Boltzmann machines", Cognitive Science, Vol. 9, pp. 147-169, 1985.

[31] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning representations by back-propagating errors", Nature, Vol. 323, No. 6088, pp. 533-536, October 1986.

[32] Y. Le Cun, "Learning processes in an asymmetric threshold network", in Disordered Systems and Biological Organization, E. Bienenstock, F. Fogelman Soulie and G. Weisbuch (Editors), Berlin: Springer-Verlag, 1986.

[33] D. Parker, "Learning Logic", Technical Report TR-87, Center for Computational Research in Economics and Management Sciences, MIT, Cambridge, MA, 1985.

[34] P. J. Werbos, "Beyond regression: new tools for prediction and analysis in the behavioral sciences", Ph.D. thesis, Harvard University, Cambridge, MA, 1974.

[35] B. G. Farley and W. A. Clark, "Simulation of self-organizing systems by digital computer", IRE Transactions on Information Theory, Vol. 4, pp. 76-84, 1954.

[36] T. J. Sejnowski and C. R. Rosenberg, "NETtalk: a parallel network that learns to read aloud", The Johns Hopkins University Electrical Engineering and Computer Science Technical Report JHU/EECS-86/01, 1986.

[37] J. Bernstein, "Profiles: AI, Marvin Minsky", The New Yorker, pp. 50-126, 14 December 1981.


[38] H. D. Block, "The perceptron: a model for brain functioning", Reviews of Modern Physics, Vol. 34, pp. 123-135, 1962.

[39] B. Furman and A. A. Abidi, "An analog CMOS backward error propagation LSI", Proc. 22nd Asilomar Conf. on Signals, Systems & Computers, pp. 645-648, 1988.

[40] M. Holler et al., "An Electrically Trainable Artificial Neural Network (ETANN) with 10240 floating gate synapses", Proc. 1989 Int'l Joint Conf. on Neural Networks, Vol. 2, pp. 191-196, June 1989.

[41] C. Schneider and H. Card, "Analog CMOS Hebbian synapses", Electronics Letters, Vol. 27, No. 9, pp. 585-586, 25 April 1991.

[42] T. H. Borgstrom, M. Ismail and S. B. Bibyk, "Programmable current-mode network for implementation in analog VLSI", IEE Proceedings, Part G, Vol. 137, No. 2, pp. 75-84, April 1990.

[43] S. Satyanarayana, Y. Tsividis and H. P. Graf, "A reconfigurable VLSI neural network", IEEE Journal of Solid-State Circuits, Vol. 27, No. 1, pp. 67-81, Jan. 1992.

[44] J. B. Lont and W. Guggenbuhl, "Analog CMOS implementation of a multi-layer perceptron with nonlinear synapses", IEEE Trans. on Neural Networks, Vol. 3, pp. 457-465, May 1992.

[45] G. Cauwenberghs, C. F. Neugebauer and A. Yariv, "Analysis and verification of an analog VLSI incremental outer-product learning system", IEEE Trans. on Neural Networks, Vol. 3, No. 3, pp. 488-497, May 1992.


[46] B. Hochet et al., "Implementation of a learning Kohonen neuron based on a new multilevel storage technique", IEEE Journal of Solid-State Circuits, Vol. 26, No. 3, pp. 262-267, March 1991.

[47] D. J. Weller and R. R. Spencer, "A process invariant analog neural network IC with dynamically refreshed weights", Proc. 33rd Midwest Symp. on Circuits & Systems, pp. 273-276, Alberta, Canada, 1990.

[48] M. Verleysen and D. Jespers, "Precision of computations in analog neural networks", in VLSI Design of Neural Networks, U. Ramacher and U. Ruckert, eds., Kluwer Academic Publishers, pp. 65-81, 1991.

[49] B. Boser, E. Sackinger, J. Bromley, Y. LeCun and L. D. Jackel, "An analog neural network processor with programmable topology", IEEE Journal of Solid-State Circuits, Vol. 26, pp. 2017-2025, December 1991.

[50] V. Hu, A. Kramer and P. K. Ko, "EEPROM's as analog storage devices for neural nets", Neural Networks, Suppl., Vol. 1, p. 387, 1988.

[51] E. Vittoz et al., "Analog storage of adjustable synaptic weights", in VLSI Design of Neural Networks, U. Ramacher and U. Ruckert (Editors), Kluwer Academic Publishers, pp. 47-63, 1991.

[52] E. Pasero, "Floating gates as adaptive weights for artificial neural networks", in Silicon Architectures for Neural Nets, Proceedings of the IFIP (WG 10.5) Workshop on Silicon Architectures for Neural Nets, pp. 125-135, Saint Paul de Vence, France, 1990.

[53] A. F. Murray and A. V. Smith, "Asynchronous VLSI neural networks using pulse stream arithmetic", IEEE Journal of Solid-State Circuits, Vol. 23, No. 3, pp. 688-697, June 1988.

[54] Y. Arima et al., "A refreshable analog VLSI neural network chip with 400 neurons and 40k synapses", IEEE Journal of Solid-State Circuits, Vol. 27, No. 12, pp. 1854-1861, Dec. 1992.

[55] A. Moopenn, T. Duong and A. P. Thakoor, "Digital-analog hybrid synapse chips for electronic neural networks", in Advances in Neural Information Processing Systems 2 (NIPS 89), D. S. Touretzky (Editor), Morgan Kaufmann Publishers, 1990.

[56] A. Nosratinia, M. Ahmadi, M. Shridhar and G. A. Jullien, "A hybrid architecture for feed-forward multi-layer neural networks", Proc. 1992 IEEE International Symposium on Circuits and Systems, Vol. 3, pp. 1541-1544, San Diego, USA.

[57] F. Distante, M. G. Sami, R. Stefanelli and G. Storti-Gajani, "A compact and fast silicon implementation for layered neural networks", Proc. Int'l Workshop on VLSI for AI and Neural Networks, Oxford, UK, Sept. 1990.

[58] R. R. Spencer, "Analog implementation of artificial neural networks", Proc. 1991 IEEE International Symposium on Circuits and Systems, pp. 1271-1274, Singapore, 1991.

[59] J. L. McClelland and D. E. Rumelhart, Explorations in Parallel Distributed Processing: A Handbook of Models, Programs and Exercises, Cambridge, Massachusetts: MIT Press, 1988.

[60] R. P. Lippmann, "An introduction to computing with neural nets", IEEE ASSP Magazine, Vol. 4, pp. 4-22, April 1987.

[61] J. Raffel et al., "A generic architecture for wafer scale neuromorphic systems", Proc. IEEE 1st International Conference on Neural Networks, 1987, Vol. III, p. 501.

[62] J. Reinhardt and B. Muller, An Introduction to Neural Networks, Springer-Verlag, 1990.

[63] N. Yazdi, "Pipelined and Trainable Architectures for Multi-layer Neural Networks", M.A.Sc. thesis, University of Windsor, 1992.

[64] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning internal representations by error back propagation", in Parallel Distributed Processing, D. E. Rumelhart and J. L. McClelland, eds., Ch. 8, Cambridge, MA: MIT Press, 1988.

[65] L. E. Atlas and Y. Suzuki, "Digital systems for artificial neural networks", IEEE Circuits and Devices Magazine, Vol. 5, No. 6, pp. 20-24, November 1989.

[66] M. Yasunaga et al., "A wafer scale integration neural network utilizing completely digital circuits", Proc. IJCNN 1989, pp. II-213-217, 1989.


[67] Blasius, Boylan and Kramer, Founders of Experimental Physiology, J. F. Lehmanns, 1971.

[68] J. J. Paulos and W. Hollis, "Artificial neural networks using MOS analog multipliers", IEEE Journal of Solid-State Circuits, Vol. 25, No. 3, pp. 849-855, June 1990.

[69] S. Sadeghi Emamchaie, "Cellular Neural Networks", Ph.D. thesis, University of Windsor, 1999.

[70] J. Hertz, A. Krogh and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Company, 1992.

[71] B. W. Lee and B. J. Sheu, "Design of a neural-based A/D converter using modified Hopfield network", IEEE Journal of Solid-State Circuits, Vol. 24, No. 4, pp. 1129-1135, August 1989.

[72] Phillip Allen and Douglas R. Holberg, CMOS Analog Circuit Design, Oxford University Press, New York, 1987.

[73] Design rule manual for the HP-14TB process.

[74] Meta-Software, HSPICE User's Manual: Simulation and Analysis, Version 96.1 for HSPICE Release 96.1, Feb. 1996, California.

[75] G. G. Lorentz, "The 13th Problem of Hilbert", Mathematical Developments Arising from Hilbert Problems, American Mathematical Society, Providence, R.I., 1976.


[76] Thomson and Brooke, "A floating gate MOSFET with tunneling injector fabricated using a standard double polysilicon CMOS process", IEEE Electron Device Letters, Vol. 12, No. 3, pp. 111-113, 1991.

[77] D. B. Schwartz, R. E. Howard and W. Hubbard, "A programmable analog neural network chip", IEEE Journal of Solid-State Circuits, Vol. 24, No. 2, pp. 313-319, April 1989.

[78] J. Mann and S. Gilbert, "An analog self-organizing neural network chip", in Advances in Neural Information Processing Systems 1, pp. 739-747, Morgan Kaufmann, 1989.

[79] M. Walker, S. Haghighi, L. Akers, R. O. Grondin and D. K. Ferry, "Parallel hardware architecture for neuromorphic computation", Proc. 3rd Annual Parallel Processing Symposium, pp. 540-548, March 1989.

[80] P. Brown, R. Millecchia and M. Stinley, "Analog memory for continuous-voltage, discrete-time implementation of neural networks", IEEE Trans. Neural Networks, Vol. 3, pp. 523-530, San Diego, CA, 1987.

[81] S. Eberhardt, T. Daud and A. Thakoor, "A VLSI analog synapse 'building-block' chip for hardware neural network implementations", Proc. 3rd Annual Parallel Processing Symposium, pp. 257-267, March 1987.

[82] F. J. Kub, K. K. Moon, I. A. Mack and F. M. Long, "Programmable analog vector-matrix multipliers", IEEE Journal of Solid-State Circuits, Vol. 25, No. 1, pp. 645-648, Dec. 1990.


[83] H. P. Graf and D. Henderson, "A reconfigurable CMOS neural network", Proc. IEEE Int'l Solid-State Circuits Conference (ISSCC), pp. 144-146, 1990.

[84] T. Daud, S. Eberhardt, M. D. Tran and A. Thakoor, "Learning and optimization with cascaded VLSI neural network building-block chips", Proc. IJCNN, pp. 184-192, 1992.

[85] H. Djahanshahi, "A robust hybrid VLSI neural network architecture for a smart optical sensor", Ph.D. thesis, University of Windsor, 1998.

[86] B. Maclean, "A VLSI implementation of an intelligent sensor", M.A.Sc. thesis, University of Windsor, 1998.

[87] P. R. Gray and R. G. Meyer, Analysis and Design of Analog Integrated Circuits, New York: John Wiley and Sons, 1977.

[88] Robert G. Meyer and Robert Alan Blauschild, "A wide-band low-noise monolithic transimpedance amplifier", IEEE Journal of Solid-State Circuits, Vol. SC-21, No. 4, August 1986.


VITA AUCTORIS

Zulfiqar Ahmed was born in Pakistan in 1967. He graduated from Pakistan Naval Junior Cadet College in 1984. He obtained his B.E. in Electrical Engineering from N.E.D. University in 1988. From 1988 to 1993, he worked on board P.N. ships and shore establishments as a Weapon Electrical Officer. From 1994 to 1995, he was Deputy Manager of the Design, Configuration Management and R&D Department in the Pakistan Navy Electronic Research Centre. In 1996, he joined the graduate program at the University of Windsor and obtained his M.A.Sc. degree in Electrical Engineering in 1999.