
Multi-layer Neural Networks

Steve Renals

Informatics 2B: Learning and Data, Lecture 13, 8 March 2011


Overview

Multi-layer neural networks

Multi-layer perceptrons (MLPs)

The credit assignment problem for hidden units

Back-propagation of error (backprop) training


Limitations of single-layer neural networks

Single-layer neural networks have many advantages:

Easy to set up and train
Explicit link to statistical models:

Shared covariance Gaussian density functions
Sigmoid output functions allow a link to posterior probabilities

Outputs are a weighted sum of the inputs: an interpretable representation

But some big limitations:

Can only represent a limited set of functions
Decision boundaries must be hyperplanes
Can only perfectly separate linearly separable data


Generalised Linear Network

Generalises linear discriminants by adding another, non-adaptive layer:

\[ y_k(\mathbf{x}) = \sum_{j=0}^{M} w_{kj}\, \phi_j(\mathbf{x}) \]

The input vector $\mathbf{x}$ is transformed using a set of M predefined non-linear functions, $\phi_j(\mathbf{x})$, called basis functions.

This allows a much larger class of discriminant functions (in fact it can approximate any continuous function to an arbitrary accuracy)

Multilayer neural networks employ adaptive basis functions with parameters (weights) that may be estimated from the training data
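As an illustration (not part of the original lecture), here is a minimal NumPy sketch of such a generalised linear network, using fixed Gaussian radial basis functions; the centres, width, weights and data are arbitrary placeholder choices.

    import numpy as np

    def rbf_design_matrix(X, centres, width):
        # Fixed, non-adaptive basis functions: phi_j(x) = exp(-||x - c_j||^2 / (2 width^2)),
        # with a constant basis function phi_0(x) = 1 prepended for the bias weight w_k0.
        sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        Phi = np.exp(-sq_dist / (2.0 * width ** 2))
        return np.hstack([np.ones((X.shape[0], 1)), Phi])

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 2))                      # 5 input vectors, d = 2
    centres = rng.normal(size=(3, 2))                # M = 3 fixed basis-function centres
    Phi = rbf_design_matrix(X, centres, width=1.0)   # shape (5, M + 1)
    W = rng.normal(size=(2, Phi.shape[1]))           # K = 2 outputs: weights w_kj
    Y = Phi @ W.T                                    # y_k(x) = sum_j w_kj phi_j(x)

Here only W would be fitted to data (e.g. by least squares); the basis functions themselves stay fixed, which is exactly what multi-layer networks relax.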


Multi-layer neural networks

Construct more general networks by considering layers of processing units

Unlike generalised linear discriminants, each layer has a set of adaptive weights

Layers that are not input or output are referred to as hidden

Often called multilayer perceptrons (MLPs)


Multi-layer Perceptron

[Figure: an MLP with inputs $x_0$ (bias), $x_1, \ldots, x_d$; hidden units $z_0$ (bias), $z_1, \ldots, z_M$; and outputs $y_1, \ldots, y_K$. First-layer weights $w^{(1)}_{ji}$ (e.g. $w^{(1)}_{10}$, $w^{(1)}_{Md}$) connect inputs to hidden units, and second-layer weights $w^{(2)}_{kj}$ (e.g. $w^{(2)}_{10}$, $w^{(2)}_{KM}$) connect hidden units to outputs.]


Building up the MLP (1)

First we take M linear combinations of the d-dimensional inputs:

\[ b_j = \sum_{i=0}^{d} w^{(1)}_{ji} x_i \]

$b_j$: activations; $w^{(1)}_{ji}$: first layer of weights

The activations are transformed by a nonlinear activation function h (e.g. a sigmoid):

\[ z_j = h(b_j) = \frac{1}{1 + \exp(-b_j)} \]

$z_j$: hidden unit outputs
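A minimal NumPy sketch of this first-layer computation (not from the lecture), assuming the usual convention that $x_0 = 1$ carries the bias; the dimensions and weight values are placeholders.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    rng = np.random.default_rng(1)
    d, M = 3, 4                                      # input dimension, number of hidden units
    x = np.concatenate([[1.0], rng.normal(size=d)])  # x_0 = 1 for the bias
    W1 = rng.normal(size=(M, d + 1))                 # first-layer weights w(1)_ji
    b = W1 @ x                                       # activations b_j = sum_i w(1)_ji x_i
    z = sigmoid(b)                                   # hidden unit outputs z_j = h(b_j)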


Building up the MLP (2)

Outputs of the hidden units are linearly combined to give the activations of the output units:

\[ a_k = \sum_{j=0}^{M} w^{(2)}_{kj} z_j \]

The output activations are transformed using an activation function g (e.g. a sigmoid):

\[ y_k = g(a_k) = \frac{1}{1 + \exp(-a_k)} \]

For multiclass problems, a softmax may be used

Combine to give the overall forward propagation equation for the network:

\[ y_k = g\!\left( \sum_{j=0}^{M} w^{(2)}_{kj}\, h\!\left( \sum_{i=0}^{d} w^{(1)}_{ji} x_i \right) \right) \]
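Putting the two layers together, a hedged NumPy sketch of the forward propagation equation above, with sigmoid output units (a softmax could be substituted for multiclass problems); the array shapes and names are assumptions for the example.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def forward(X, W1, W2):
        # X: (N, d) inputs; W1: (M, d+1) first-layer weights; W2: (K, M+1) second-layer weights
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x_0 = 1 (bias)
        Z = sigmoid(Xb @ W1.T)                          # hidden outputs z_j = h(b_j)
        Zb = np.hstack([np.ones((Z.shape[0], 1)), Z])   # prepend z_0 = 1 (bias)
        Y = sigmoid(Zb @ W2.T)                          # outputs y_k = g(a_k)
        return Y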


Multi-layer Perceptron

[Figure: the same MLP diagram as before, repeated for reference: inputs $x_0, \ldots, x_d$, hidden units $z_0, \ldots, z_M$, outputs $y_1, \ldots, y_K$, with first-layer weights $w^{(1)}_{ji}$ and second-layer weights $w^{(2)}_{kj}$.]



Training MLPs: Credit assignment

Hidden units make training the weights more complicated

The gradients for a single-layer neural network have a simple form:

\[ \frac{\partial E^n}{\partial w_{ki}} = \delta_k x_i \]

For a multi-layer neural network: what is the "error" of a hidden unit? How important is the input-to-hidden weight $w^{(1)}_{ji}$ to output unit k?

Solution: we need to define the derivatives of the error with respect to each weight

Algorithm: back-propagation of error (backprop)

Backprop gives a way to compute the derivatives. These derivatives are used by an optimisation algorithm (e.g. gradient descent) to train the weights.


Training MLPs: Error function and required gradients

Sum-of-squares error function, obtained by summing over a training set of N examples:

\[ E = \sum_{n=1}^{N} E^n, \qquad E^n = \frac{1}{2} \sum_{k=1}^{K} (y^n_k - t^n_k)^2 \]
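In NumPy this error is a one-liner, assuming Y and T are N x K arrays of network outputs and targets (illustrative names, not the lecture's code):

    import numpy as np

    def sum_of_squares_error(Y, T):
        # E = sum_n E^n, with E^n = 0.5 * sum_k (y^n_k - t^n_k)^2
        return 0.5 * np.sum((Y - T) ** 2)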


Gradient descent training

Operation of gradient descent:

1. Start with a guess for the weight matrix W (small random numbers)
2. Update the weights by adjusting the weight matrix in the direction of $-\nabla_{\mathbf{W}} E$
3. Recompute the error, and iterate

The update for weight $w_{ki}$ at iteration $\tau + 1$ is:

\[ w^{\tau+1}_{ki} = w^{\tau}_{ki} - \eta \frac{\partial E}{\partial w_{ki}} \]

The parameter $\eta$ is the learning rate
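A schematic NumPy sketch of this procedure, assuming a hypothetical helper grad_E(W) that returns dE/dW for the current weights (later in the lecture, backprop provides exactly these derivatives):

    import numpy as np

    def gradient_descent(W0, grad_E, eta=0.1, n_iters=100):
        # W0: initial weights (small random numbers); eta: learning rate
        W = W0.copy()
        for _ in range(n_iters):
            W = W - eta * grad_E(W)   # w(tau+1) = w(tau) - eta * dE/dw
        return W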


Training MLPs: Error function and required gradients

To obtain the overall error gradients for the sum-of-squares error $E = \sum_{n=1}^{N} E^n$, we sum over the training examples:

\[ \frac{\partial E}{\partial w_{kj}} = \sum_{n=1}^{N} \frac{\partial E^n}{\partial w_{kj}}, \qquad \frac{\partial E}{\partial w_{ji}} = \sum_{n=1}^{N} \frac{\partial E^n}{\partial w_{ji}} \]


Training MLPs: Hidden-to-output weights

Write $E^n$ in terms of the hidden-to-output weights:

\[ E^n = \frac{1}{2} \sum_{k=1}^{K} \bigl( g(a^n_k) - t^n_k \bigr)^2 = \frac{1}{2} \sum_{k=1}^{K} \left( g\!\left( \sum_{j=0}^{M} w_{kj} z^n_j \right) - t^n_k \right)^2 \]

Break down the error derivatives using the chain rule:

\[ \frac{\partial E^n}{\partial w_{kj}} = \frac{\partial E^n}{\partial a^n_k} \frac{\partial a^n_k}{\partial w_{kj}} \]

$\partial E^n / \partial a^n_k$ is often referred to as the error signal, $\delta^n_k$:

\[ \delta^n_k = \frac{\partial E^n}{\partial a^n_k} = \frac{\partial E^n}{\partial y^n_k} \cdot \frac{\partial y^n_k}{\partial a^n_k} = (y^n_k - t^n_k)\, g'(a^n_k) \]

Since $\dfrac{\partial a^n_k}{\partial w_{kj}} = z^n_j$, we obtain:

\[ \frac{\partial E^n}{\partial w_{kj}} = \delta^n_k z^n_j \]

This is similar to the gradient for a single-layer neural network with a nonlinear activation function.
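A hedged NumPy sketch of these hidden-to-output gradients for one training example, with sigmoid output units so that $g'(a) = g(a)(1 - g(a)) = y(1 - y)$; the variable names and shapes are assumptions.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def output_layer_grads(z, t, W2):
        # z: (M+1,) hidden outputs with z_0 = 1; t: (K,) targets; W2: (K, M+1) weights w_kj
        a = W2 @ z                        # output activations a_k
        y = sigmoid(a)                    # outputs y_k = g(a_k)
        delta = (y - t) * y * (1.0 - y)   # error signals delta_k = (y_k - t_k) g'(a_k)
        dE_dW2 = np.outer(delta, z)       # dE^n/dw_kj = delta_k * z_j
        return dE_dW2, delta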


Training MLPs: Input-to-hidden weights

To compute the error gradients for the input-to-hidden weights, we must take into account all the ways in which hidden unit j (and hence weight $w_{ji}$) can influence the error. Consider $\delta^n_j$, the error signal for hidden unit j:

\[ \delta^n_j = \frac{\partial E^n}{\partial b^n_j} = \sum_{k=1}^{K} \frac{\partial E^n}{\partial a^n_k} \frac{\partial a^n_k}{\partial b^n_j} = \sum_{k=1}^{K} \delta^n_k \frac{\partial a^n_k}{\partial b^n_j} \]

The sum runs over all the output units' contributions to $\delta^n_j$, where:

\[ \frac{\partial a^n_k}{\partial b^n_j} = \frac{\partial a^n_k}{\partial z^n_j} \frac{\partial z^n_j}{\partial b^n_j} = w_{kj}\, h'(b^n_j) \]

Substituting in, we obtain:

\[ \delta^n_j = h'(b^n_j) \sum_{k=1}^{K} \delta^n_k w_{kj} \]

This is the famous back-propagation of error (backprop) equation.
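Continuing the sketch above, the backprop equation for sigmoid hidden units, where $h'(b_j) = z_j(1 - z_j)$; as before, the names and shapes are illustrative assumptions.

    import numpy as np

    def hidden_deltas(delta, W2, z):
        # delta: (K,) output error signals; W2: (K, M+1) hidden-to-output weights (column 0
        # holds the bias weights, which no hidden unit feeds, so it is dropped);
        # z: (M+1,) hidden outputs with z_0 = 1
        back = W2[:, 1:].T @ delta        # sum_k delta_k w_kj for each hidden unit j
        h_prime = z[1:] * (1.0 - z[1:])   # h'(b_j) for a sigmoid hidden unit
        return h_prime * back             # delta_j = h'(b_j) sum_k delta_k w_kj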



Back-propagation of error: hidden unit error signal

[Figure: hidden unit $z_j$ receives input $x_i$ through weight $w^{(1)}_{ji}$ and feeds each output unit $y_1, \ldots, y_K$ through weights $w^{(2)}_{1j}, \ldots, w^{(2)}_{Kj}$. The output error signals $\delta_1, \ldots, \delta_K$ are passed back through the same weights, giving $\delta_j = h'(b_j) \sum_{\ell} \delta_\ell w^{(2)}_{\ell j}$.]


Back-propagation of error

The derivatives of the error with respect to the input-to-hidden weights can thus be evaluated using:

\[ \frac{\partial E^n}{\partial w_{ji}} = \frac{\partial E^n}{\partial b^n_j} \frac{\partial b^n_j}{\partial w_{ji}} = \delta^n_j x^n_i \]

The back-propagation of error algorithm is summarised as follows:

1. Apply the N input vectors from the training set, $\mathbf{x}^n$, to the network and forward propagate to obtain the set of output vectors $\mathbf{y}^n$
2. Using the target vectors $\mathbf{t}^n$, compute the error E
3. Evaluate the error signals $\delta^n_k$ for each output unit
4. Evaluate the error signals $\delta^n_j$ for each hidden unit using back-propagation of error
5. Evaluate the derivatives for each training pattern, summing to obtain the overall derivatives
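Putting the five steps together, a self-contained NumPy sketch of one evaluation of E and its gradients over the whole training set (sum-of-squares error, sigmoid hidden and output units); the array shapes and names are assumptions rather than the lecture's notation.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop(X, T, W1, W2):
        # X: (N, d) inputs; T: (N, K) targets; W1: (M, d+1); W2: (K, M+1)
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])        # 1. forward propagate
        Z = sigmoid(Xb @ W1.T)
        Zb = np.hstack([np.ones((Z.shape[0], 1)), Z])
        Y = sigmoid(Zb @ W2.T)
        E = 0.5 * np.sum((Y - T) ** 2)                       # 2. compute the error
        delta_out = (Y - T) * Y * (1.0 - Y)                  # 3. output error signals
        delta_hid = Z * (1.0 - Z) * (delta_out @ W2[:, 1:])  # 4. back-propagate to hidden units
        dE_dW2 = delta_out.T @ Zb                            # 5. derivatives, summed over patterns
        dE_dW1 = delta_hid.T @ Xb
        return E, dE_dW1, dE_dW2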


Gradient descent training

Operation of gradient descent:

1. Start with a guess for the weight matrix W (small random numbers)
2. Update the weights by adjusting the weight matrix in the direction of $-\nabla_{\mathbf{W}} E$
3. Recompute the error, and iterate

The update for weight $w_{ki}$ at iteration $\tau + 1$ is:

\[ w^{\tau+1}_{ki} = w^{\tau}_{ki} - \eta \frac{\partial E}{\partial w_{ki}} \]

The parameter $\eta$ is the learning rate


MLP Example

Netlab demmlp2

MLP trained as a classifier on data from a known distribution


Training data

[Figure: the sampled training data (left) and the true probability density p(x) from which it was drawn (right).]


Training data

[Figure: the class-conditional densities p(x|red) and p(x|yellow), and the posterior probabilities p(red|x) and p(yellow|x).]


MLP Example

Netlab demmlp2

MLP trained as a classifier on data from a known distribution

Train MLP: 6 hidden units, 2 output units
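As a rough Python counterpart to the Netlab demo (a sketch under assumptions, not the demo itself), the backprop function from the earlier sketch can be combined with batch gradient descent to train a 6-hidden-unit, 2-output MLP on a toy two-class dataset:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop(X, T, W1, W2):
        # forward pass, error and gradients, as in the earlier sketch
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        Z = sigmoid(Xb @ W1.T)
        Zb = np.hstack([np.ones((Z.shape[0], 1)), Z])
        Y = sigmoid(Zb @ W2.T)
        E = 0.5 * np.sum((Y - T) ** 2)
        d_out = (Y - T) * Y * (1.0 - Y)
        d_hid = Z * (1.0 - Z) * (d_out @ W2[:, 1:])
        return E, d_hid.T @ Xb, d_out.T @ Zb

    rng = np.random.default_rng(0)
    # toy two-class data: two Gaussian clusters with one-hot targets
    X = np.vstack([rng.normal(-1.0, 0.5, size=(100, 2)),
                   rng.normal(1.0, 0.5, size=(100, 2))])
    T = np.vstack([np.tile([1.0, 0.0], (100, 1)),
                   np.tile([0.0, 1.0], (100, 1))])

    M, d, K = 6, 2, 2                       # 6 hidden units, 2 inputs, 2 output units
    W1 = 0.1 * rng.normal(size=(M, d + 1))  # small random initial weights
    W2 = 0.1 * rng.normal(size=(K, M + 1))
    eta = 0.01
    for _ in range(500):                    # batch gradient descent
        E, dW1, dW2 = backprop(X, T, W1, W2)
        W1 -= eta * dW1
        W2 -= eta * dW2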


MLP Network Output

[Figure: the training data (left) and the MLP network output (right), with the Bayes and network decision boundaries overlaid.]


MLP Example

Netlab demmlp2

MLP trained as a classifier on data from a known distribution

Train MLP: 6 hidden units, 2 output units

Compare with single layer network


SLN Network Output

[Figure: the training data (left) and the single-layer network (SLN) output (right), with the Bayes and SLN decision boundaries overlaid.]


MLP Example

Netlab demmlp2

MLP trained as a classifier on data from a known distribution

Train MLP: 6 hidden units, 2 output units

Compare with single layer network

Classify test data


Decision boundaries

[Figure: the test data with the Bayes decision boundary and the SLN decision boundary.]


MLP Example

Netlab demmlp2

MLP trained as a classifier on data from a known distribution

Train MLP: 6 hidden units, 2 output units

Compare with single layer network

Classify test data

Confusion matrices:

Optimal (81% correct):
    79   28
    10   83

MLP (80% correct):
    80   27
    13   80

SLN (73% correct):
    79   38
    16   77
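For reference, percent correct can be read off a confusion matrix as the trace divided by the total count; a small NumPy check using the optimal matrix above:

    import numpy as np

    C = np.array([[79, 28],
                  [10, 83]])                        # confusion matrix for the optimal classifier
    percent_correct = 100.0 * np.trace(C) / C.sum() # (79 + 83) / 200 = 81%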


Summary

Multi-layer perceptrons

Multi-layer neural networks

Back-propagation of error training

Example
