5/26/2018 Neural Networks
1/73 Negnevitsky, Pearson Education, 2005 1
Lecture 7
Artificial neural networks: Supervised learning
Introduction, or how the brain works
The neuron as a simple computing element
The perceptron
Multilayer neural networks
Accelerated learning in multilayer neural networks
The Hopfield network
Bidirectional associative memories (BAM)
Summary
Introduction, or how the brain works
Machine learning involves adaptive mechanisms that enable computers to learn from experience,
learn by example and learn by analogy. Learning
capabilities can improve the performance of an
intelligent system over time. The most popular
approaches to machine learning are artificial
neural networks and genetic algorithms. This
lecture is dedicated to neural networks.
A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.
The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.
Each neuron has a very simple structure, but an
army of such elements constitutes a tremendous
processing power.
A neuron consists of a cell body, soma, a number
of fibers called dendrites, and a single long fiber
called the axon.
Biological neural network
[Figure: two biological neurons, each with a soma, dendrites and an axon, connected through synapses.]
Our brain can be considered as a highly complex, non-linear and parallel information-processing system. Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks, both data and its processing are global rather than local.
Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.
An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain. The neurons are connected by weighted links passing signals from one neuron to another.
The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.
Architecture of a typical artificial neural network
[Figure: input signals enter the input layer, pass through the middle layer, and leave the output layer as output signals.]
Analogy between biological and artificial neural networks

Biological Neural Network    Artificial Neural Network
Soma                         Neuron
Dendrite                     Input
Axon                         Output
Synapse                      Weight
The neuron as a simple computing element
Diagram of a neuron
[Figure: input signals x1, x2, …, xn, weighted by w1, w2, …, wn, enter neuron Y, which sends the same output signal Y along each of its outgoing branches.]
The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. If the net input is less than the threshold, the neuron output is −1. But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains a value +1.
The neuron uses the following transfer or activation function:
$$X = \sum_{i=1}^{n} x_i w_i$$
$$Y = \begin{cases} +1, & \text{if } X \ge \theta \\ -1, & \text{if } X < \theta \end{cases}$$
This type of activation function is called a sign function.
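The rule above can be sketched in a few lines of Python. This is only an illustration: the function name neuron_output and the sample inputs, weights and threshold are assumptions introduced here, not values from the lecture.

```python
def neuron_output(inputs, weights, theta):
    """Return +1 if the weighted sum of the inputs reaches theta, otherwise -1."""
    net_input = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net_input >= theta else -1


# Hypothetical example values, chosen only to exercise both branches.
weights = [0.5, -0.2, 0.4]
print(neuron_output([1, 0, 1], weights, theta=0.8))  # net input 0.9 reaches the threshold
print(neuron_output([1, 1, 0], weights, theta=0.8))  # net input 0.3 stays below it
```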
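Activation functions of a neuron
[Figure: plots of Y against X for the step, sign, sigmoid and linear functions.]
Step function:
$$Y^{step} = \begin{cases} 1, & \text{if } X \ge 0 \\ 0, & \text{if } X < 0 \end{cases}$$
Sign function:
$$Y^{sign} = \begin{cases} +1, & \text{if } X \ge 0 \\ -1, & \text{if } X < 0 \end{cases}$$
Sigmoid function:
$$Y^{sigmoid} = \frac{1}{1 + e^{-X}}$$
Linear function:
$$Y^{linear} = X$$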
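As a quick reference, the four activation functions can be written directly from their definitions. The function names below are assumptions made for this sketch.

```python
import math


def step(x):
    """Step function: 1 for X >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def sign(x):
    """Sign function: +1 for X >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def sigmoid(x):
    """Sigmoid function: squashes any X into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(x):
    """Linear function: the output equals the net input."""
    return x


for x in (-2.0, 0.0, 2.0):
    print(x, step(x), sign(x), round(sigmoid(x), 4), linear(x))
```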
Can a single neuron learn a task?
In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron.
The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.
Single-layer two-input perceptron
[Figure: inputs x1 and x2, weighted by w1 and w2, feed a linear combiner with threshold θ; its result passes through a hard limiter to give the output Y.]
The Perceptron
The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.
The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative.
The aim of the perceptron is to classify inputs, x1, x2, . . ., xn, into one of two classes, say A1 and A2.
In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:
$$\sum_{i=1}^{n} x_i w_i - \theta = 0$$
Linear separability in the perceptrons
[Figure: decision regions for Class A1 and Class A2.]
(a) Two-input perceptron: the boundary is the line x1w1 + x2w2 − θ = 0.
(b) Three-input perceptron: the boundary is the plane x1w1 + x2w2 + x3w3 − θ = 0.
How does the perceptron learn its classification tasks?
This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [−0.5, 0.5], and then updated to obtain the output consistent with the training examples.
If at iteration p, the actual output is Y(p) and the desired output is Yd(p), then the error is given by:
$$e(p) = Y_d(p) - Y(p)$$
where p = 1, 2, 3, . . .
Iteration p here refers to the pth training example presented to the perceptron.
If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).
The perceptron learning rule
$$w_i(p+1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p)$$
where p = 1, 2, 3, . . .
α is the learning rate, a positive constant less than unity.
The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
Perceptron's training algorithm
Step 1: Initialisation
Set initial weights w1, w2, …, wn and threshold θ to random numbers in the range [−0.5, 0.5].
If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).
Perceptron's training algorithm (continued)
Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1
$$Y(p) = step\left[\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right]$$
where n is the number of the perceptron inputs, and step is a step activation function.
Perceptron's training algorithm (continued)
Step 3: Weight training
Update the weights of the perceptron
$$w_i(p+1) = w_i(p) + \Delta w_i(p)$$
where Δwi(p) is the weight correction at iteration p.
The weight correction is computed by the delta rule:
$$\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)$$
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
Example of perceptron learning: the logical operation AND

Epoch   Inputs    Desired     Initial weights   Actual     Error   Final weights
        x1  x2    output Yd   w1      w2        output Y   e       w1      w2
1       0   0     0           0.3     −0.1      0           0      0.3     −0.1
        0   1     0           0.3     −0.1      0           0      0.3     −0.1
        1   0     0           0.3     −0.1      1          −1      0.2     −0.1
        1   1     1           0.2     −0.1      0           1      0.3      0.0
2       0   0     0           0.3      0.0      0           0      0.3      0.0
        0   1     0           0.3      0.0      0           0      0.3      0.0
        1   0     0           0.3      0.0      1          −1      0.2      0.0
        1   1     1           0.2      0.0      1           0      0.2      0.0
3       0   0     0           0.2      0.0      0           0      0.2      0.0
        0   1     0           0.2      0.0      0           0      0.2      0.0
        1   0     0           0.2      0.0      1          −1      0.1      0.0
        1   1     1           0.1      0.0      0           1      0.2      0.1
4       0   0     0           0.2      0.1      0           0      0.2      0.1
        0   1     0           0.2      0.1      0           0      0.2      0.1
        1   0     0           0.2      0.1      1          −1      0.1      0.1
        1   1     1           0.1      0.1      1           0      0.1      0.1
5       0   0     0           0.1      0.1      0           0      0.1      0.1
        0   1     0           0.1      0.1      0           0      0.1      0.1
        1   0     0           0.1      0.1      0           0      0.1      0.1
        1   1     1           0.1      0.1      1           0      0.1      0.1

Threshold: θ = 0.2; learning rate: α = 0.1
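The four training steps can be sketched in Python and run on the AND example. This is a minimal illustration rather than the lecture's own code: the function name train_perceptron, its parameters and the max_epochs safeguard are assumptions introduced here, while the data, the initial weights (0.3, −0.1), the threshold θ = 0.2 and the learning rate α = 0.1 come from the example. Exact fractions are used instead of floats so that net inputs landing exactly on zero are compared correctly.

```python
from fractions import Fraction


def step(x):
    """Step activation: 1 for x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def train_perceptron(samples, weights, theta, alpha, max_epochs=100):
    """Apply the perceptron learning rule until one full epoch has no errors."""
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, desired in samples:
            # Step 2: activation
            net = sum(x * w for x, w in zip(inputs, weights)) - theta
            error = desired - step(net)
            # Step 3: weight training with the delta rule
            if error != 0:
                errors += 1
                weights = [w + alpha * x * error for w, x in zip(weights, inputs)]
        if errors == 0:
            return weights, epoch
    raise RuntimeError("no convergence within max_epochs")


AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
final_weights, epochs = train_perceptron(
    AND_SAMPLES,
    weights=[Fraction(3, 10), Fraction(-1, 10)],
    theta=Fraction(2, 10),
    alpha=Fraction(1, 10),
)
print(epochs, [float(w) for w in final_weights])
```

With these settings the loop stops in epoch 5 with both weights equal to 0.1, in line with the example. Plain floats may behave differently here because values such as 0.3 − 0.1 are not represented exactly.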
Two-dimensional plots of basic logical operations
[Figure: three plots in the (x1, x2) plane: (a) AND (x1 ∩ x2); (b) OR (x1 ∪ x2); (c) Exclusive-OR (x1 ⊕ x2).]
A perceptron can learn the operations AND and OR, but not Exclusive-OR.
Multilayer neural networks
A multilayer perceptron is a feedforward neural network with one or more hidden layers.
The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.
The input signals are propagated in a forward direction on a layer-by-layer basis.
Multilayer perceptron with two hidden layers
[Figure: input signals pass through the input layer, the first hidden layer, the second hidden layer and the output layer, which produces the output signals.]
What does the middle layer hide?
A hidden layer hides its desired output. Neurons in the hidden layer cannot be observed through the input/output behaviour of the network. There is no obvious way to know what the desired output of the hidden layer should be.
Commercial ANNs incorporate three and sometimes four layers, including one or two hidden layers. Each layer can contain from 10 to 1000 neurons. Experimental neural networks may have five or even six layers, including three or four hidden layers, and utilise millions of neurons.
Back-propagation neural network
Learning in a multilayer network proceeds the same way as for a perceptron.
A training set of input patterns is presented to the network.
The network computes its output pattern, and if there is an error (in other words, a difference between actual and desired output patterns), the weights are adjusted to reduce this error.
In a back-propagation neural network, the learning algorithm has two phases.
First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer.
If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.
Three-layer back-propagation neural network
[Figure: input signals x1, x2, …, xi, …, xn enter input-layer neurons 1 to n; weights wij connect them to hidden-layer neurons 1 to m; weights wjk connect the hidden layer to output-layer neurons 1 to l, which produce y1, y2, …, yk, …, yl. Input signals flow forward and error signals flow backward.]
The back-propagation training algorithm
Step 1: Initialisation
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:
$$\left(-\frac{2.4}{F_i},\ +\frac{2.4}{F_i}\right)$$
where Fi is the total number of inputs of neuron i in the network. The weight initialisation is done on a neuron-by-neuron basis.
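A neuron-by-neuron initialisation of this kind might look as follows. The helper name init_neuron and the optional rng argument are assumptions made for this sketch; only the range ±2.4/Fi comes from the step above.

```python
import random


def init_neuron(num_inputs, rng=random):
    """Draw the weights and threshold of one neuron uniformly from (-2.4/F, +2.4/F)."""
    limit = 2.4 / num_inputs
    weights = [rng.uniform(-limit, limit) for _ in range(num_inputs)]
    threshold = rng.uniform(-limit, limit)
    return weights, threshold


# Example: a neuron with 4 inputs gets values within (-0.6, +0.6).
weights, threshold = init_neuron(4, random.Random(0))
print(weights, threshold)
```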
Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p), x2(p), …, xn(p) and desired outputs yd,1(p), yd,2(p), …, yd,n(p).
(a) Calculate the actual outputs of the neurons in the hidden layer:
$$y_j(p) = sigmoid\left[\sum_{i=1}^{n} x_i(p) \cdot w_{ij}(p) - \theta_j\right]$$
where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.
Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:
$$y_k(p) = sigmoid\left[\sum_{j=1}^{m} x_{jk}(p) \cdot w_{jk}(p) - \theta_k\right]$$
where m is the number of inputs of neuron k in the output layer.
Step 3: Weight training
Update the weights in the back-propagation network propagating backward the errors associated with output neurons.
(a) Calculate the error gradient for the neurons in the output layer:
$$\delta_k(p) = y_k(p) \cdot \left[1 - y_k(p)\right] \cdot e_k(p)$$
where
$$e_k(p) = y_{d,k}(p) - y_k(p)$$
Calculate the weight corrections:
$$\Delta w_{jk}(p) = \alpha \cdot y_j(p) \cdot \delta_k(p)$$
Update the weights at the output neurons:
$$w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)$$
Step 3: Weight training (continued)
(b) Calculate the error gradient for the neurons in the hidden layer:
$$\delta_j(p) = y_j(p) \cdot \left[1 - y_j(p)\right] \cdot \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)$$
Calculate the weight corrections:
$$\Delta w_{ij}(p) = \alpha \cdot x_i(p) \cdot \delta_j(p)$$
Update the weights at the hidden neurons:
$$w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$$
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.
As an example, we may consider the three-layer back-propagation network. Suppose that the network is required to perform logical operation Exclusive-OR. Recall that a single-layer perceptron could not do this operation. Now we will apply the three-layer net.
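Three-layer network for solving the Exclusive-OR operation
[Figure: inputs x1 and x2 enter input-layer neurons 1 and 2; these feed hidden-layer neurons 3 and 4 through weights w13, w14, w23 and w24; the hidden neurons feed output neuron 5 through w35 and w45, producing y5. Thresholds θ3, θ4 and θ5 are each connected to a fixed input of −1.]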
The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to −1.
The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = −1.2, w45 = 1.1, θ3 = 0.8, θ4 = −0.1 and θ5 = 0.3.
We consider a training set where inputs x1 and x2 are equal to 1 and desired output yd,5 is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as
$$y_3 = sigmoid(x_1 w_{13} + x_2 w_{23} - \theta_3) = 1 / \left[1 + e^{-(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8)}\right] = 0.5250$$
$$y_4 = sigmoid(x_1 w_{14} + x_2 w_{24} - \theta_4) = 1 / \left[1 + e^{-(1 \cdot 0.9 + 1 \cdot 1.0 + 1 \cdot 0.1)}\right] = 0.8808$$
Now the actual output of neuron 5 in the output layer is determined as:
$$y_5 = sigmoid(y_3 w_{35} + y_4 w_{45} - \theta_5) = 1 / \left[1 + e^{-(-0.5250 \cdot 1.2 + 0.8808 \cdot 1.1 - 1 \cdot 0.3)}\right] = 0.5097$$
Thus, the following error is obtained:
$$e = y_{d,5} - y_5 = 0 - 0.5097 = -0.5097$$
The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer.
First, we calculate the error gradient for neuron 5 in the output layer:
$$\delta_5 = y_5 (1 - y_5)\, e = 0.5097 \cdot (1 - 0.5097) \cdot (-0.5097) = -0.1274$$
Then we determine the weight corrections assuming that the learning rate parameter, α, is equal to 0.1:
$$\Delta w_{35} = \alpha \cdot y_3 \cdot \delta_5 = 0.1 \cdot 0.5250 \cdot (-0.1274) = -0.0067$$
$$\Delta w_{45} = \alpha \cdot y_4 \cdot \delta_5 = 0.1 \cdot 0.8808 \cdot (-0.1274) = -0.0112$$
$$\Delta \theta_5 = \alpha \cdot (-1) \cdot \delta_5 = 0.1 \cdot (-1) \cdot (-0.1274) = 0.0127$$
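Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:
$$\delta_3 = y_3 (1 - y_3) \cdot \delta_5 \cdot w_{35} = 0.5250 \cdot (1 - 0.5250) \cdot (-0.1274) \cdot (-1.2) = 0.0381$$
$$\delta_4 = y_4 (1 - y_4) \cdot \delta_5 \cdot w_{45} = 0.8808 \cdot (1 - 0.8808) \cdot (-0.1274) \cdot 1.1 = -0.0147$$
We then determine the weight corrections:
$$\Delta w_{13} = \alpha \cdot x_1 \cdot \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038$$
$$\Delta w_{23} = \alpha \cdot x_2 \cdot \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038$$
$$\Delta \theta_3 = \alpha \cdot (-1) \cdot \delta_3 = 0.1 \cdot (-1) \cdot 0.0381 = -0.0038$$
$$\Delta w_{14} = \alpha \cdot x_1 \cdot \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015$$
$$\Delta w_{24} = \alpha \cdot x_2 \cdot \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015$$
$$\Delta \theta_4 = \alpha \cdot (-1) \cdot \delta_4 = 0.1 \cdot (-1) \cdot (-0.0147) = 0.0015$$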
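Finally, we update all weights and threshold levels:
$$w_{13} = w_{13} + \Delta w_{13} = 0.5 + 0.0038 = 0.5038$$
$$w_{14} = w_{14} + \Delta w_{14} = 0.9 - 0.0015 = 0.8985$$
$$w_{23} = w_{23} + \Delta w_{23} = 0.4 + 0.0038 = 0.4038$$
$$w_{24} = w_{24} + \Delta w_{24} = 1.0 - 0.0015 = 0.9985$$
$$w_{35} = w_{35} + \Delta w_{35} = -1.2 - 0.0067 = -1.2067$$
$$w_{45} = w_{45} + \Delta w_{45} = 1.1 - 0.0112 = 1.0888$$
$$\theta_3 = \theta_3 + \Delta \theta_3 = 0.8 - 0.0038 = 0.7962$$
$$\theta_4 = \theta_4 + \Delta \theta_4 = -0.1 + 0.0015 = -0.0985$$
$$\theta_5 = \theta_5 + \Delta \theta_5 = 0.3 + 0.0127 = 0.3127$$
The training process is repeated until the sum of squared errors is less than 0.001.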
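The single training iteration worked through above can be checked with a short Python sketch. The variable names (t3, d5 and so on) are assumptions chosen for this illustration; the initial values and the learning rate are those of the example. All gradients are computed from the old weights before any weight is changed.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


alpha = 0.1                      # learning rate
x1, x2, yd5 = 1.0, 1.0, 0.0      # training pattern and desired output
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3      # thresholds of neurons 3, 4 and 5

# Step 2: forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)
e = yd5 - y5

# Step 3: error gradients, output layer first, then hidden layer
d5 = y5 * (1 - y5) * e
d3 = y3 * (1 - y3) * d5 * w35
d4 = y4 * (1 - y4) * d5 * w45

# Each threshold is treated as a weight on a fixed input of -1.
updated = {
    "w13": w13 + alpha * x1 * d3,
    "w14": w14 + alpha * x1 * d4,
    "w23": w23 + alpha * x2 * d3,
    "w24": w24 + alpha * x2 * d4,
    "w35": w35 + alpha * y3 * d5,
    "w45": w45 + alpha * y4 * d5,
    "t3": t3 + alpha * (-1) * d3,
    "t4": t4 + alpha * (-1) * d4,
    "t5": t5 + alpha * (-1) * d5,
}
for name, value in updated.items():
    print(f"{name} = {value:.4f}")
```

Rounded to four decimals, the printed values should agree with the updated weights and thresholds listed above.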
Learning curve for operation Exclusive-OR
[Figure: sum-squared network error (log scale) against epoch; training takes 224 epochs.]
Final results of three-layer network learning

Inputs     Desired      Actual       Error      Sum of squared
x1   x2    output yd    output y5    e          errors
1    1     0            0.0155       −0.0155    0.0010
0    1     1            0.9849        0.0151
1    0     1            0.9849        0.0151
0    0     0            0.0175       −0.0175
Network represented by McCulloch-Pitts model for solving the Exclusive-OR operation
[Figure: the same three-layer network with fixed parameters. All four input-to-hidden weights are +1.0; hidden neurons 3 and 4 have thresholds +1.5 and +0.5; output neuron 5 has weights −2.0 from neuron 3 and +1.0 from neuron 4, and threshold +0.5.]
Decision boundaries
[Figure: three plots in the (x1, x2) plane.]
(a) Decision boundary constructed by hidden neuron 3: x1 + x2 − 1.5 = 0;
(b) Decision boundary constructed by hidden neuron 4: x1 + x2 − 0.5 = 0;
(c) Decision boundaries constructed by the complete three-layer network
Accelerated learning in multilayer neural networks
A multilayer network learns much faster when the sigmoidal activation function is represented by a hyperbolic tangent:
$$Y^{tanh} = \frac{2a}{1 + e^{-bX}} - a$$
where a and b are constants.
Suitable values for a and b are: a = 1.716 and b = 0.667
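A small sketch of this activation function, using the suggested constants. The function name tanh_activation is an assumption; the comparison with math.tanh relies on the identity 2a/(1 + e^(−bX)) − a = a·tanh(bX/2), which explains why the formula is called a hyperbolic tangent.

```python
import math

A, B = 1.716, 0.667  # constants suggested above


def tanh_activation(x, a=A, b=B):
    """Hyperbolic tangent activation: 2a / (1 + exp(-b*x)) - a, bounded by (-a, +a)."""
    return 2.0 * a / (1.0 + math.exp(-b * x)) - a


for x in (-3.0, 0.0, 3.0):
    # The two printed columns should agree up to floating-point rounding.
    print(x, tanh_activation(x), A * math.tanh(B * x / 2.0))
```

Unlike the sigmoid, this function is symmetric about zero, so its output can be negative as well as positive.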
We also can accelerate training by including a momentum term in the delta rule:
$$\Delta w_{jk}(p) = \beta \cdot \Delta w_{jk}(p-1) + \alpha \cdot y_j(p) \cdot \delta_k(p)$$
where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95.
This equation is called the generalised delta rule.
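The generalised delta rule can be sketched as a one-line function. The name momentum_correction and the sample values of y_j and delta_k are assumptions for illustration; α = 0.1 and β = 0.95 follow the values mentioned in the lecture.

```python
def momentum_correction(prev_correction, y_j, delta_k, alpha=0.1, beta=0.95):
    """Weight correction with momentum: beta * previous correction + alpha * y_j * delta_k."""
    return beta * prev_correction + alpha * y_j * delta_k


# Hypothetical case: the gradient term stays constant for three iterations,
# so each correction is larger than the one before.
correction = 0.0
history = []
for _ in range(3):
    correction = momentum_correction(correction, y_j=1.0, delta_k=0.1)
    history.append(correction)
print(history)
```

When successive gradient terms keep the same sign, the corrections grow, which is how momentum speeds up movement along a consistent direction.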
Learning with momentum for operation Exclusive-OR
[Figure: two plots against epoch. Top: sum-squared error (log scale); training takes 126 epochs. Bottom: learning rate.]
Learning with adaptive learning rate
To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics:
Heuristic 1
If the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, then the learning rate parameter, α, should be increased.
Heuristic 2
If the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, then the learning rate parameter, α, should be decreased.
Adapting the learning rate requires some changes in the back-propagation algorithm.
If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.
If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
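One way to express this rule in code is shown below. The function name adapt_learning_rate, its parameter names and the boolean "accept" flag are assumptions of this sketch; the flag reflects one reading of "new weights and thresholds are calculated", namely that the epoch's update is discarded and recomputed with the smaller rate.

```python
def adapt_learning_rate(alpha, previous_sse, current_sse,
                        max_ratio=1.04, decrease=0.7, increase=1.05):
    """Return (new_alpha, accept_update) from the change in sum of squared errors."""
    if current_sse > previous_sse * max_ratio:
        # Error grew too much: shrink the rate and redo the epoch's update.
        return alpha * decrease, False
    if current_sse < previous_sse:
        # Error fell: speed up slightly.
        return alpha * increase, True
    # Error rose only marginally (or stayed the same): keep the rate.
    return alpha, True


print(adapt_learning_rate(0.1, previous_sse=1.0, current_sse=1.10))
print(adapt_learning_rate(0.1, previous_sse=1.0, current_sse=0.90))
print(adapt_learning_rate(0.1, previous_sse=1.0, current_sse=1.02))
```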
Learning with adaptive learning rate
[Figure: two plots against epoch. Top: sum-squared error (log scale); training takes 103 epochs. Bottom: learning rate.]
Learning with momentum and adaptive learning rate
[Figure: two plots against epoch. Top: sum-squared error (log scale); training takes 85 epochs. Bottom: learning rate.]
The Hopfield Network
Neural networks were designed on analogy with the brain. The brain's memory, however, works by association. For example, we can recognise a familiar face even in an unfamiliar environment within 100-200 ms. We can also recall a complete sensory experience, including sounds and scenes, when we hear only a few bars of music. The brain routinely associates one thing with another.
Multilayer neural networks trained with the back-propagation algorithm are used for pattern recognition problems. However, to emulate the human memory's associative characteristics we need a different type of network: a recurrent neural network.
A recurrent neural network has feedback loops from its outputs to its inputs. The presence of such loops has a profound impact on the learning capability of the network.
The stability of recurrent networks intrigued several researchers in the 1960s and 1970s. However, none was able to predict which network would be stable, and some researchers were pessimistic about finding a solution at all. The problem was solved only in 1982, when John Hopfield formulated the physical principle of storing information in a dynamically stable network.
Single-layer n-neuron Hopfield network
[Figure: n neurons with input signals x1, x2, …, xi, …, xn and output signals y1, y2, …, yi, …, yn, with feedback from the outputs to the inputs.]
The Hopfield network uses McCulloch and Pitts neurons with the sign activation function as its computing element:
$$Y^{sign} = \begin{cases} +1, & \text{if } X > 0 \\ -1, & \text{if } X < 0 \\ Y, & \text{if } X = 0 \end{cases}$$
The current state of the Hopfield network is determined by the current outputs of all neurons, y1, y2, . . ., yn.
Thus, for a single-layer n-neuron network, the state can be defined by the state vector as:
$$\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
In the Hopfield network, synaptic weights between neurons are usually represented in matrix form as follows:
$$\mathbf{W} = \sum_{m=1}^{M} \mathbf{Y}_m \mathbf{Y}_m^T - M\,\mathbf{I}$$
where M is the number of states to be memorised by the network, Ym is the n-dimensional binary vector, I is the n × n identity matrix, and superscript T denotes matrix transposition.
Possible states for the three-neuron Hopfield network
[Figure: a cube in (y1, y2, y3) space centred on the origin; its eight vertices (±1, ±1, ±1) are the possible states of the network.]
The stable state-vertex is determined by the weight matrix W, the current input vector X, and the threshold matrix θ. If the input vector is partially incorrect or incomplete, the initial state will converge into the stable state-vertex after a few iterations.
Suppose, for instance, that our network is required to memorise two opposite states, (1, 1, 1) and (−1, −1, −1). Thus,
$$\mathbf{Y}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{Y}_2 = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}$$
or
$$\mathbf{Y}_1^T = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}, \quad \mathbf{Y}_2^T = \begin{bmatrix} -1 & -1 & -1 \end{bmatrix}$$
where Y1 and Y2 are the three-dimensional vectors.
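The 3 × 3 identity matrix I is
$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Thus, we can now determine the weight matrix as follows:
$$\mathbf{W} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \end{bmatrix} + \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} \begin{bmatrix} -1 & -1 & -1 \end{bmatrix} - 2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix}$$
Next, the network is tested by the sequence of input vectors, X1 and X2, which are equal to the output (or target) vectors Y1 and Y2, respectively.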
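First, we activate the Hopfield network by applying the input vector X. Then, we calculate the actual output vector Y, and finally, we compare the result with the initial input vector X.
$$\mathbf{Y}_1 = sign\left\{ \begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
$$\mathbf{Y}_2 = sign\left\{ \begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\} = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}$$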
The remaining six states are all unstable. However, stable states (also called fundamental memories) are capable of attracting states that are close to them.
The fundamental memory (1, 1, 1) attracts unstable states (−1, 1, 1), (1, −1, 1) and (1, 1, −1). Each of these unstable states represents a single error, compared to the fundamental memory (1, 1, 1).
The fundamental memory (−1, −1, −1) attracts unstable states (−1, −1, 1), (−1, 1, −1) and (1, −1, −1).
Thus, the Hopfield network can act as an error correction network.
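The three-neuron example can be reproduced with a short pure-Python sketch. The function names are assumptions, thresholds are taken as zero as in the example, and all neurons are updated together in each iteration, which is an assumption of this sketch because the lecture does not state the update order.

```python
def hopfield_weights(memories):
    """Weight matrix W = sum of Y_m * Y_m^T over all memories, minus M * I."""
    n = len(memories[0])
    w = [[0] * n for _ in range(n)]
    for y in memories:
        for i in range(n):
            for j in range(n):
                w[i][j] += y[i] * y[j]
    for i in range(n):
        w[i][i] -= len(memories)
    return w


def update(weights, state):
    """One synchronous update with the sign rule; a zero net input keeps the old output."""
    new_state = []
    for i, row in enumerate(weights):
        x = sum(w_ij * y_j for w_ij, y_j in zip(row, state))
        new_state.append(1 if x > 0 else -1 if x < 0 else state[i])
    return tuple(new_state)


def recall(weights, state, max_iterations=10):
    """Iterate until the state stops changing (or the iteration limit is reached)."""
    state = tuple(state)
    for _ in range(max_iterations):
        next_state = update(weights, state)
        if next_state == state:
            break
        state = next_state
    return state


W = hopfield_weights([(1, 1, 1), (-1, -1, -1)])
print(W)
print(recall(W, (-1, 1, 1)))    # single error relative to (1, 1, 1)
print(recall(W, (1, -1, -1)))   # single error relative to (-1, -1, -1)
```

Each of the six single-error states should settle on the fundamental memory it differs from in one position, which is the error-correcting behaviour described above.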
Storage capacity of the Hopfield network
Storage capacity is the largest number of fundamental memories that can be stored and retrieved correctly.
The maximum number of fundamental memories Mmax that can be stored in the n-neuron recurrent network is limited by
$$M_{max} = 0.15\, n$$
Bidirectional associative memory (BAM)
The Hopfield network represents an autoassociative type of memory: it can retrieve a corrupted or incomplete memory but cannot associate this memory with another different memory.
Human memory is essentially associative. One thing may remind us of another, and that of another, and so on. We use a chain of mental associations to recover a lost memory. If we forget where we left an umbrella, we try to recall where we last had it, what we were doing, and who we were talking to. We attempt to establish a chain of associations, and thereby to restore a lost memory.
To associate one memory with another, we need a recurrent neural network capable of accepting an input pattern on one set of neurons and producing a related, but different, output pattern on another set of neurons.
Bidirectional associative memory (BAM), first proposed by Bart Kosko, is a heteroassociative network. It associates patterns from one set, set A, to patterns from another set, set B, and vice versa. Like a Hopfield network, the BAM can generalise and also produce correct outputs despite corrupted or incomplete inputs.
BAM operation
[Figure: a BAM with n input-layer neurons and m output-layer neurons. (a) Forward direction: inputs x1(p), x2(p), …, xn(p) produce outputs y1(p), y2(p), …, ym(p). (b) Backward direction: outputs y1(p), y2(p), …, ym(p) produce inputs x1(p+1), x2(p+1), …, xn(p+1).]
The basic idea behind the BAM is to store pattern pairs so that when n-dimensional vector X from set A is presented as input, the BAM recalls m-dimensional vector Y from set B, but when Y is presented as input, the BAM recalls X.
To develop the BAM, we need to create a correlation matrix for each pattern pair we want to store. The correlation matrix is the matrix product of the input vector X and the transpose of the output vector, YT. The BAM weight matrix is the sum of all correlation matrices, that is,
$$\mathbf{W} = \sum_{m=1}^{M} \mathbf{X}_m \mathbf{Y}_m^T$$
where M is the number of pattern pairs to be stored in the BAM.
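The weight matrix can be built and tried out with a small pure-Python sketch. Several things here are assumptions rather than part of the lecture text: the function names, the two example pattern pairs, the recall rules Y = sign(W^T X) and X = sign(W Y), and the choice to map a zero net input to +1.

```python
def sign(v):
    # Assumption: a zero net input is mapped to +1.
    return 1 if v >= 0 else -1


def bam_weights(pairs):
    """W = sum over all stored pairs of X_m * Y_m^T (an n x m matrix)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                w[i][j] += x[i] * y[j]
    return w


def recall_forward(w, x):
    """Assumed forward recall: Y = sign(W^T X)."""
    return tuple(sign(sum(w[i][j] * x[i] for i in range(len(x))))
                 for j in range(len(w[0])))


def recall_backward(w, y):
    """Assumed backward recall: X = sign(W Y)."""
    return tuple(sign(sum(w_ij * y_j for w_ij, y_j in zip(row, y))) for row in w)


# Hypothetical pattern pairs: 4-dimensional X vectors, 2-dimensional Y vectors.
PAIRS = [((1, 1, 1, 1), (1, 1)), ((1, -1, 1, -1), (1, -1))]
W = bam_weights(PAIRS)
print(W)
for x, y in PAIRS:
    print(recall_forward(W, x), recall_backward(W, y))
```

The two X patterns were chosen to be orthogonal, which keeps the stored pairs from interfering with each other in this small example.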
Stability and storage capacity of the BAM
The BAM is unconditionally stable. This means that any set of associations can be learned without risk of instability.
The maximum number of associations to be stored in the BAM should not exceed the number of neurons in the smaller layer.
The more serious problem with the BAM is incorrect convergence. The BAM may not always produce the closest association. In fact, a stable association may be only slightly related to the initial input vector.