Artificial Intelligence Techniques
Multilayer Perceptrons
Overview
The multi-layered perceptron
Back-propagation
Introduction to training
Uses
Pattern space - linearly separable
[Figure: a pattern space with axes X1 and X2, divided into two regions by a single straight line]
Non-linearly separable problems
If a problem is not linearly separable, then it is impossible to divide the pattern space into two regions with a single straight line. A network of neurons is needed. Until fairly recently, it was not known how to train a multi-layered network.
Pattern space - non linearly separable
[Figure: a pattern space with axes X1 and X2 in which the two regions are separated by a curved decision surface]
The multi-layered perceptron (MLP)
[Figure: an MLP with an input layer, a hidden layer and an output layer]
Complex decision surface
The MLP has the ability to emulate any function using one hidden layer with a sigmoid function and a linear output layer. A 3-layered network can therefore produce any complex decision surface. However, the number of neurons needed in the hidden layer cannot be calculated in advance.
The multi-layered perceptron (MLP)
[Figure: the same MLP, showing every neuron in one layer connected to every neuron in the next]
Network architecture
All neurons in one layer are connected to all neurons in the next layer. The network is a feedforward network, so all data flows from the input to the output. The architecture of the network shown is described as 3:4:2. All neurons in the hidden and output layers have a bias connection.
Input layer
Receives all of the inputs. The number of neurons equals the number of inputs. Does no processing. Connects to all the neurons in the hidden layer.
Hidden layer
Could be more than one layer, but theory says that only one layer is necessary. The number of neurons is found by experiment. Processes the inputs. Connects to all neurons in the output layer. The output is a sigmoid function.
Output layer
Produces the final outputs. Processes the outputs from the hidden layer. The number of neurons equals the number of outputs. The output could be linear or sigmoid.
Problems with networks
Originally the neurons had a hard-limiter on the output. An error could be found between the desired output and the actual output, and used to adjust the weights in the output layer, but there was no way of knowing how to adjust the weights in the hidden layer.
The invention of back-propagation
By introducing a smoothly changing output function, it was possible to calculate an error that could be used to adjust the weights in the hidden layer(s)
Output function
The sigmoid function
[Figure: plot of the sigmoid function, y against net, for net from -5 to 5; y rises smoothly from 0 to 1]
Sigmoid function
The sigmoid function goes smoothly from 0 to 1 as net increases. The value of y when net = 0 is 0.5. When net is negative, y is between 0 and 0.5. When net is positive, y is between 0.5 and 1.0.
Back-propagation
The method of training is called the back-propagation of errors
The algorithm is an extension of the delta rule, called the generalised delta rule
Generalised delta rule
The equation for the generalised delta rule is ΔWi = ηXiδ
δ is defined according to which layer is being considered. For the output layer, δ = y(1-y)(d-y). For the hidden layer, δ is more complex.
Pattern recognition
Many problems can be described as pattern recognition
For example, voice recognition, face recognition, optical character recognition
Pattern classification
A more precise definition is pattern classification. In pattern classification a system is shown examples of a number of objects. Each object is given a label or class. The task of the system is to correctly classify objects that it hasn't seen before.
Example of 2-input data

X1    X2     Class
1     1.5    0
2     1.8    0
2     3.5    0
4     0.52   0
5     1.5    0
4     1      0
1     3      0
1.5   2      0
5     2      0
4.5   1.44   0
4.5   2.5    1
5.5   3.5    1
4.5   4      1
3     5      1
3.5   4      1
3.5   2      1
3     3      1
4.5   4      1
4     3.5    1
5.5   4.5    1
Pattern space
[Figure: scatter plot of the data in the pattern space, with X1 and X2 running from 0 to 6; Series1 = class 0, Series2 = class 1]
Training a network
The problem could not be implemented on a single layer, because it is non-linearly separable. A 3-layer MLP with 4 neurons in the hidden layer was tried, and it trained. The number of neurons in the hidden layer was reduced to 2, and it still trained. With 1 neuron in the hidden layer it failed to train.
The weights
The weights for the 2 neurons in the hidden layer are -9, 3.6 and 0.1, and 6.1, 2.2 and -7.8. These weights can be shown in the pattern space as two lines. The lines divide the space into 4 regions.
The hidden neurons
[Figure: the two hidden-neuron lines drawn over the scatter plot of the two classes in the pattern space]
Training and Testing
Starting with a data set, the first step is to divide the data into a training set and a test set
Use the training set to adjust the weights until the error is acceptably low
Test the network using the test set, and see how many it gets right
A better approach
Critics of this standard approach have pointed out that training to a low error can sometimes cause “overfitting”, where the network performs well on the training data but poorly on the test data
The alternative is to divide the data into three sets, the extra one being the validation set
Validation set
During training, the training data is used to adjust the weights. At each iteration, the validation data is also passed through the network and its error recorded, but the weights are not adjusted. The training stops when the error for the validation set starts to increase.
Stopping criteria
[Figure: error plotted against time for the training set and the validation set; the training error keeps falling, while the validation error levels off and then rises. Training stops where the validation error begins to increase]
Architecture
[Figure: the network used in the example, with an input layer, a hidden layer and an output layer]
Hidden Layer
We have to deal with the error from the output layer being fed back to the hidden layer. Let's look at an example: the weight w2(1,2), which is the weight connecting neuron 1 in the input layer with neuron 2 in the hidden layer.
Δw2(1,2) = ηX1(1)δ2(2)
where X1(1) is the output of neuron 1 in the input layer, and δ2(2) is the error on the output of neuron 2 in the hidden layer.
δ2(2) = X2(2)[1 - X2(2)]w3(2,1)δ3(1)
δ3(1) = y(1-y)(d-y) = X3(1)[1 - X3(1)][d - X3(1)]
So we start with the error at the output and use this result to ripple backwards, altering the weights.
Example
Exclusive OR using the network shown earlier: a 2:2:1 network.
Initial weights:
w2(0,1) = 0.862518   w2(1,1) = -0.155797   w2(2,1) = 0.282885
w2(0,2) = 0.834986   w2(1,2) = -0.505997   w2(2,2) = -0.864449
w3(0,1) = 0.036498   w3(1,1) = -0.430437   w3(2,1) = 0.48121
Feedforward – hidden layer (neuron 1)
So if X1(0) = 1 (the bias), X1(1) = 0 and X1(2) = 0, the weighted sum inside neuron 1 in the hidden layer = 0.862518. Then, using the sigmoid function, X2(1) = 0.7031864.
Feedforward – hidden layer (neuron 2)
So if X1(0) = 1 (the bias), X1(1) = 0 and X1(2) = 0, the weighted sum inside neuron 2 in the hidden layer = 0.834986. Then, using the sigmoid function, X2(2) = 0.6974081.
Feedforward – output layer
So if X2(0) = 1 (the bias), X2(1) = 0.7031864 and X2(2) = 0.6974081, the weighted sum inside neuron 1 in the output layer = 0.0694203. Then, using the sigmoid function, X3(1) = 0.5173481. The desired output d = 0.
δ3(1) = X3(1)[1 - X3(1)][d - X3(1)] = -0.1291812
δ2(1) = X2(1)[1 - X2(1)]w3(1,1)δ3(1) = 0.0116054
δ2(2) = X2(2)[1 - X2(2)]w3(2,1)δ3(1) = -0.0131183
Now we can use the delta rule to calculate the change in the weights
ΔWi = ηXiδ
Examples
If we set η = 0.5:
Δw2(0,1) = ηX1(0)δ2(1) = 0.5 × 1 × 0.0116054 = 0.0058027
Δw3(1,1) = ηX2(1)δ3(1) = 0.5 × 0.7031864 × -0.1291812 = -0.0454192
What would be the results of the following?
Δw2(2,1) = ηX1(2)δ2(1)
Δw2(2,2) = ηX1(2)δ2(2)
Answers:
Δw2(2,1) = ηX1(2)δ2(1) = 0.5 × 0 × 0.0116054 = 0
Δw2(2,2) = ηX1(2)δ2(2) = 0.5 × 0 × -0.0131183 = 0
New weights:
w2(0,1) = 0.868321   w2(1,1) = -0.155797   w2(2,1) = 0.282885
w2(0,2) = 0.828427   w2(1,2) = -0.505997   w2(2,2) = -0.864449
w3(0,1) = -0.028093   w3(1,1) = -0.475856   w3(2,1) = 0.436164
Conclusions
Train using training, test and validation sets. An MLP can be used to recognise (classify) complex data. It uses supervised learning with back-propagation to adjust the weights. It divides the pattern space in the hidden layer.
Conclusions
Extending the delta rule to do back-propagation, we need to calculate the error at the outputs of the neurons in the hidden and output layers:
δ3(1) = X3(1)[1 - X3(1)][d - X3(1)]
δ2(1) = X2(1)[1 - X2(1)]w3(1,1)δ3(1)
δ2(2) = X2(2)[1 - X2(2)]w3(2,1)δ3(1)
Once you have the error values (the δ's) for the neurons, you then use the delta rule to calculate the actual change in the weights: ΔWi = ηXiδ