Artificial Intelligence Techniques
Multilayer Perceptrons
Overview
The multi-layered perceptron
Back-propagation
Introduction to training
Uses
Pattern space - linearly separable
[Figure: a pattern space with axes X1 and X2, divided into two regions by a single straight line]
Non-linearly separable problems
If a problem is not linearly separable, then it is impossible to divide the pattern space into two regions with a single straight line. A network of neurons is needed. Until fairly recently, it was not known how to train a multi-layered network.
Pattern space - non linearly separable
[Figure: a pattern space with axes X1 and X2 in which the two regions are separated by a curved decision surface]
The multi-layered perceptron (MLP)
[Figure: an MLP with an input layer, a hidden layer and an output layer]
Complex decision surface
The MLP has the ability to emulate any function using one hidden layer with a sigmoid function and a linear output layer. A 3-layered network can therefore produce any complex decision surface. However, the number of neurons needed in the hidden layer cannot be calculated in advance.
The multi-layered perceptron (MLP)
[Figure: the same MLP, showing every neuron in one layer connected to every neuron in the next]
Network architecture
All neurons in one layer are connected to all neurons in the next layer. The network is a feedforward network, so all data flows from the input to the output. The architecture of the network shown is described as 3:4:2. All neurons in the hidden and output layers have a bias connection.
Input layer
Receives all of the inputs. The number of neurons equals the number of inputs. Does no processing. Connects to all the neurons in the hidden layer.
Hidden layer
Could be more than one layer, but theory says that only one layer is necessary. The number of neurons is found by experiment. Processes the inputs. Connects to all neurons in the output layer. The output is a sigmoid function.
Output layer
Produces the final outputs. Processes the outputs from the hidden layer. The number of neurons equals the number of outputs. The output could be linear or sigmoid.
Problems with networks
Originally the neurons had a hard-limiter on the output. An error could be found between the desired output and the actual output, and used to adjust the weights in the output layer, but there was no way of knowing how to adjust the weights in the hidden layer.
The invention of back-propagation
By introducing a smoothly changing output function, it was possible to calculate an error that could be used to adjust the weights in the hidden layer(s)
Output function
The sigmoid function
[Figure: plot of the sigmoid function, y against net, for net from -5 to 5; y rises smoothly from 0 to 1]
Sigmoid function
The sigmoid function goes smoothly from 0 to 1 as net increases. The value of y when net = 0 is 0.5. When net is negative, y is between 0 and 0.5. When net is positive, y is between 0.5 and 1.0.
Back-propagation
The method of training is called the back-propagation of errors
The algorithm is an extension of the delta rule, called the generalised delta rule
Generalised delta rule
The equation for the generalised delta rule is ΔWi = ηXiδ
δ is defined according to which layer is being considered. For the output layer, δ = y(1-y)(d-y). For the hidden layer, δ is more complex.
Pattern recognition
Many problems can be described as pattern recognition
For example, voice recognition, face recognition, optical character recognition
Pattern classification
A more precise definition is pattern classification. In pattern classification a system is shown examples of a number of objects. Each object is given a label or class. The task of the system is to correctly classify objects that it hasn't seen before.
Example of 2-input data

X1    X2     Class
1     1.5    0
2     1.8    0
2     3.5    0
4     0.52   0
5     1.5    0
4     1      0
1     3      0
1.5   2      0
5     2      0
4.5   1.44   0
4.5   2.5    1
5.5   3.5    1
4.5   4      1
3     5      1
3.5   4      1
3.5   2      1
3     3      1
4.5   4      1
4     3.5    1
5.5   4.5    1
Pattern space
[Figure: scatter plot of the data in the pattern space, with X1 and X2 running from 0 to 6; Series1 = class 0, Series2 = class 1]
Training a network
The problem could not be implemented on a single layer, because it is non-linearly separable. A 3-layer MLP with 4 neurons in the hidden layer was tried, and it trained. The number of neurons in the hidden layer was reduced to 2, and it still trained. With 1 neuron in the hidden layer it failed to train.
The weights
The weights for the 2 neurons in the hidden layer are -9, 3.6 and 0.1, and 6.1, 2.2 and -7.8. These weights can be shown in the pattern space as two lines. The lines divide the space into 4 regions.
The hidden neurons
[Figure: the two hidden-neuron lines drawn over the scatter plot of the two classes in the pattern space]
Training and Testing
Starting with a data set, the first step is to divide the data into a training set and a test set
Use the training set to adjust the weights until the error is acceptably low
Test the network using the test set, and see how many it gets right
A better approach
Critics of this standard approach have pointed out that training to a low error can sometimes cause “overfitting”, where the network performs well on the training data but poorly on the test data
The alternative is to divide the data into three sets, the extra one being the validation set
Validation set
During training, the training data is used to adjust the weights. At each iteration, the validation data is also passed through the network and its error recorded, but the weights are not adjusted. The training stops when the error for the validation set starts to increase.
Stopping criteria
[Figure: error plotted against time for the training set and the validation set; the training error keeps falling, while the validation error levels off and then rises. Training stops where the validation error begins to increase]
Architecture
[Figure: the network used in the example, with an input layer, a hidden layer and an output layer]
Hidden Layer
We have to deal with the error from the output layer being fed back to the hidden layer. Let's look at an example: the weight w2(1,2), which is the weight connecting neuron 1 in the input layer with neuron 2 in the hidden layer.
Δw2(1,2) = ηX1(1)δ2(2)
where X1(1) is the output of neuron 1 in the input layer, and δ2(2) is the error on the output of neuron 2 in the hidden layer.
δ2(2) = X2(2)[1 - X2(2)]w3(2,1)δ3(1)
δ3(1) = y(1-y)(d-y) = X3(1)[1 - X3(1)][d - X3(1)]
So we start with the error at the output and use this result to ripple backwards, altering the weights.
Example
Exclusive OR using the network shown earlier: a 2:2:1 network.
Initial weights:
w2(0,1) = 0.862518   w2(1,1) = -0.155797   w2(2,1) = 0.282885
w2(0,2) = 0.834986   w2(1,2) = -0.505997   w2(2,2) = -0.864449
w3(0,1) = 0.036498   w3(1,1) = -0.430437   w3(2,1) = 0.48121
Feedforward – hidden layer (neuron 1)
So if X1(0) = 1 (the bias), X1(1) = 0 and X1(2) = 0, the weighted sum inside neuron 1 in the hidden layer = 0.862518. Then, using the sigmoid function, X2(1) = 0.7031864.
Feedforward – hidden layer (neuron 2)
So if X1(0) = 1 (the bias), X1(1) = 0 and X1(2) = 0, the weighted sum inside neuron 2 in the hidden layer = 0.834986. Then, using the sigmoid function, X2(2) = 0.6974081.
Feedforward – output layer
So if X2(0) = 1 (the bias), X2(1) = 0.7031864 and X2(2) = 0.6974081, the weighted sum inside neuron 1 in the output layer = 0.0694203. Then, using the sigmoid function, X3(1) = 0.5173481. The desired output d = 0.
δ3(1) = X3(1)[1 - X3(1)][d - X3(1)] = -0.1291812
δ2(1) = X2(1)[1 - X2(1)]w3(1,1)δ3(1) = 0.0116054
δ2(2) = X2(2)[1 - X2(2)]w3(2,1)δ3(1) = -0.0131183
Now we can use the delta rule to calculate the change in the weights
ΔWi = ηXiδ
Examples
If we set η = 0.5:
Δw2(0,1) = ηX1(0)δ2(1) = 0.5 × 1 × 0.0116054 = 0.0058027
Δw3(1,1) = ηX2(1)δ3(1) = 0.5 × 0.7031864 × -0.1291812 = -0.0454192
What would be the results of the following?
Δw2(2,1) = ηX1(2)δ2(1)
Δw2(2,2) = ηX1(2)δ2(2)
Answers:
Δw2(2,1) = ηX1(2)δ2(1) = 0.5 × 0 × 0.0116054 = 0
Δw2(2,2) = ηX1(2)δ2(2) = 0.5 × 0 × -0.0131183 = 0
New weights:
w2(0,1) = 0.868321   w2(1,1) = -0.155797   w2(2,1) = 0.282885
w2(0,2) = 0.828427   w2(1,2) = -0.505997   w2(2,2) = -0.864449
w3(0,1) = -0.028093   w3(1,1) = -0.475856   w3(2,1) = 0.436164
Conclusions
Train using training, test and validation sets. An MLP can be used to recognise (classify) complex data. It uses supervised learning with back-propagation to adjust the weights. It divides the pattern space in the hidden layer.
Conclusions
Extending the delta rule to do back-propagation, we need to calculate the error at the outputs of the neurons in the hidden and output layers:
δ3(1) = X3(1)[1 - X3(1)][d - X3(1)]
δ2(1) = X2(1)[1 - X2(1)]w3(1,1)δ3(1)
δ2(2) = X2(2)[1 - X2(2)]w3(2,1)δ3(1)
Once you have the error values (the δ's) for the neurons, you then use the delta rule to calculate the actual change in the weights: ΔWi = ηXiδ