
Feedforward Networks

Gradient Descent Learning and Backpropagation

CPSC 533 — Fall 2003

Christian Jacob ©

Dept. of Computer Science, University of Calgary


Adaptive "Programming" of ANNs through Learning

ANN Learning

A learning algorithm is an adaptive method by which a network of computing units self-organizes to implement the desired behavior.

[Figure 1. Learning process in a parametric system: testing input/output examples → calculating network errors → changing network parameters]

In some learning algorithms, examples of the desired input-output mapping are presented to the network.

A correction step is executed iteratively until the network learns to produce the desired response.


Learning Schemes

Unsupervised Learning

For a given input, the exact numerical output a network should produce is unknown. Since no "teacher" is available, the network must organize itself (e.g., in order to associate clusters with units).

Examples: Clustering with self-organizing feature maps, Kohonen networks.

Figure 2. Three clusters and a classifier network

Supervised Learning

Some input vectors are collected and presented to the network. The output computed by the network is observed, and the deviation from the expected answer is measured. The weights are corrected (= learning algorithm) according to the magnitude of the error.

• Error-correction Learning:

The magnitude of the error, together with the input vector, determines the magnitude of the corrections to the weights.

Examples: Perceptron learning, backpropagation.

• Reinforcement Learning:

After each presentation of an input-output example we only know whether the network produces the desired result or not. The weights are updated based on this Boolean decision (true or false).

Example: Learning how to ride a bike.


Learning by Gradient Descent

Definition of the Learning Problem

Let us start with the simple case of linear cells, which we have introduced as perceptron units.

The linear network should learn mappings (for $\mu = 1, \ldots, P$) between

• an input pattern $x^\mu = (x_1^\mu, \ldots, x_N^\mu)$ and

• an associated target pattern $T^\mu$.

Figure 3. Perceptron


The output $O_i^\mu$ of cell $i$ for the input pattern $x^\mu$ is calculated as

(1)  $O_i^\mu = \sum_k w_{ki} \, x_k^\mu$

The goal of the learning procedure is that eventually the output $O_i^\mu$ for input pattern $x^\mu$ corresponds to the desired output $T_i^\mu$:

(2)  $O_i^\mu \overset{!}{=} T_i^\mu = \sum_k w_{ki} \, x_k^\mu$
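For concreteness, here is a minimal Mathematica sketch of Equation (1) for a single linear cell; the weights and the input pattern are made-up values, not data from the course:

(* hypothetical weights and one input pattern for a single linear cell *)
w = {0.2, -0.5, 0.1};   (* w_k : weights into the cell *)
x = {1.0, 0.0, 1.0};    (* x_k^mu : one input pattern  *)

w . x   (* Equation (1): O^mu = Sum_k w_k x_k^mu = 0.3 *)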

Example: Letter Classification

Note: This letter classification will only work with non-linear (sigmoidal) processing units.


Explicit Solution (Linear Network)

For a linear network, the weights that satisfy Equation (2) can be calculated explicitly using the pseudo-inverse:

(3)  $w_{ki} = \frac{1}{P} \sum_{\mu,\nu} T_i^\mu \, \bigl(Q^{-1}\bigr)_{\mu\nu} \, x_k^\nu$

(4)  $Q_{\mu\nu} = \frac{1}{P} \sum_k x_k^\mu \, x_k^\nu$

Correlation Matrix

Here $Q_{\mu\nu}$ is a component of the correlation matrix $Q$ of the input patterns:

(5)  $Q = \begin{pmatrix} x_k^1 x_k^1 & x_k^1 x_k^2 & \cdots & x_k^1 x_k^P \\ \vdots & & & \vdots \\ x_k^P x_k^1 & \cdots & \cdots & x_k^P x_k^P \end{pmatrix}$

You can check that this is indeed a solution by verifying

(6)  $\sum_k w_{ki} \, x_k^\mu = T_i^\mu$.

Caveat

Note that $Q^{-1}$ only exists for linearly independent input patterns.

That means, if there are $a_\mu$, not all zero, such that for all $k = 1, \ldots, N$

(7)  $a_1 x_k^1 + a_2 x_k^2 + \cdots + a_P x_k^P = 0$,

then the outputs $O_i^\mu$ cannot be selected independently of each other, and the problem is NOT solvable.
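To make the explicit construction concrete, the following Mathematica sketch builds Q from Equation (4), inverts it, and assembles the weights of Equation (3). The two patterns and targets are hypothetical, chosen only to be linearly independent; this is an illustration, not code from the original slides.

(* hypothetical data: P = 2 linearly independent patterns, N = 3 components, one output cell *)
xpat = {{1., 0., 1.}, {0., 1., 1.}};   (* xpat[[mu, k]] = x_k^mu *)
tpat = {1., -1.};                      (* tpat[[mu]]    = T^mu   *)
npat = Length[xpat];

q = (1/npat) Table[xpat[[mu]] . xpat[[nu]], {mu, npat}, {nu, npat}];       (* Equation (4) *)
w = (1/npat) Sum[tpat[[mu]] Inverse[q][[mu, nu]] xpat[[nu]],
                 {mu, npat}, {nu, npat}];                                  (* Equation (3) *)

xpat . w   (* gives {1., -1.}: the targets are reproduced, as Equation (6) requires *)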


Learning by Gradient Descent (Linear Network)

Let us now try to find a learning rule for a linear network with M output units.

Starting from a random initial weight setting $\vec{w}_0$, the learning procedure should find a solution weight matrix for Equation (2).

Error Function

For this purpose, we define a cost or error function $E(\vec{w})$:

(8)  $E(\vec{w}) = \frac{1}{2} \sum_{i=1}^{M} \sum_{\mu=1}^{P} \bigl(T_i^\mu - O_i^\mu\bigr)^2 = \frac{1}{2} \sum_{i=1}^{M} \sum_{\mu=1}^{P} \Bigl(T_i^\mu - \sum_k w_{ki} \, x_k^\mu\Bigr)^2$

$E(\vec{w}) \ge 0$ will approach zero as $\vec{w} = \{w_{ki}\}$ satisfies Equation (2).

This cost function is a quadratic function in weight space.
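Equation (8) can be evaluated directly for any weight matrix; the short Mathematica sketch below does so for hypothetical patterns, targets, and weights (none of these numbers come from the slides):

(* hypothetical setup: N = 3 inputs, M = 2 outputs, P = 2 patterns *)
xpat = {{1., 0., 1.}, {0., 1., 1.}};        (* xpat[[mu, k]] = x_k^mu *)
tpat = {{1., 0.}, {0., 1.}};                (* tpat[[mu, i]] = T_i^mu *)
w    = ConstantArray[0.1, {3, 2}];          (* w[[k, i]]     = w_ki   *)

(* Equation (8): E(w) = 1/2 Sum_i Sum_mu (T_i^mu - Sum_k w_ki x_k^mu)^2 *)
errorE[w_] := (1/2) Total[(tpat - xpat . w)^2, 2]

errorE[w]   (* 0.68 for these numbers *)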


Paraboloid

Therefore, $E(\vec{w})$ is a paraboloid with a single global minimum.

<< RealTime3D`

Plot3D[x^2 + y^2, {x, -5, 5}, {y, -5, 5}];


ContourPlot[x^2 + y^2, {x, -5, 5}, {y, -5, 5}];

If the pattern vectors are linearly independent—i.e., a solution for Equation (2) exists—the minimum is at E = 0.


Graphical Illustration: Following the Gradient

Finding the Minimum: Following the Gradient

We can find the minimum of $E(\vec{w})$ in weight space by following the negative gradient

(9)  $-\nabla_{\vec{w}} E(\vec{w}) = -\frac{\partial E(\vec{w})}{\partial \vec{w}}$

We can implement this gradient strategy as follows:

Changing a Weight

Each weight $w_{ki} \in \vec{w}$ is changed by $\Delta w_{ki}$, proportional to the gradient of E at the current weight position (i.e., the current settings of all the weights):

(10)  $\Delta w_{ki} = -\eta \, \frac{\partial E(\vec{w})}{\partial w_{ki}}$


Steps Towards the Solution

(11)  $\Delta w_{ki} = -\eta \, \frac{\partial}{\partial w_{ki}} \left( \frac{1}{2} \sum_{m=1}^{M} \sum_{\mu=1}^{P} \Bigl(T_m^\mu - \sum_n w_{nm} \, x_n^\mu\Bigr)^2 \right)$

$\Delta w_{ki} = -\eta \, \frac{1}{2} \sum_{\mu=1}^{P} \frac{\partial}{\partial w_{ki}} \left( \sum_{m=1}^{M} \Bigl(T_m^\mu - \sum_n w_{nm} \, x_n^\mu\Bigr)^2 \right)$

$\Delta w_{ki} = -\eta \, \frac{1}{2} \sum_{\mu=1}^{P} 2 \Bigl(T_i^\mu - \sum_n w_{ni} \, x_n^\mu\Bigr) \bigl(-x_k^\mu\bigr)$

Weight Adaptation Rule

(12)  $\Delta w_{ki} = \eta \sum_{\mu=1}^{P} \bigl(T_i^\mu - O_i^\mu\bigr) \, x_k^\mu$

The parameter $\eta$ is usually referred to as the learning rate.

In this formula, the adaptation of the weights is accumulated over all patterns.

Delta, LMS Learning

If we change the weights after each presentation of an input pattern to the network, we get a simpler form for the weight update term:

(13)  $\Delta w_{ki} = \eta \, \bigl(T_i^\mu - O_i^\mu\bigr) \, x_k^\mu$

or

(14)  $\Delta w_{ki} = \eta \, \delta_i^\mu \, x_k^\mu$

with

(15)  $\delta_i^\mu = T_i^\mu - O_i^\mu$.

This learning rule has several names:


• Delta rule

• Adaline rule

• Widrow-Hoff rule

• LMS (least mean square) rule.
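A minimal Mathematica sketch of online delta-rule learning, Equations (13)-(15), for a single linear cell follows; the training data, initial weights, and learning rate are hypothetical:

(* hypothetical task: one linear cell, N = 2 inputs, P = 2 patterns *)
xpat = {{1., 0.}, {0., 1.}};     (* xpat[[mu, k]] = x_k^mu *)
tpat = {1., -1.};                (* tpat[[mu]]    = T^mu   *)
eta  = 0.5;                      (* learning rate *)
w    = {0., 0.};                 (* initial weights *)

Do[
  Do[
    out = w . xpat[[mu]];                (* Equation (1)  *)
    delta = tpat[[mu]] - out;            (* Equation (15) *)
    w = w + eta delta xpat[[mu]],        (* Equations (13)/(14) *)
    {mu, Length[xpat]}],
  {epoch, 20}];

w   (* approaches {1., -1.}, which maps each pattern onto its target *)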


Gradient Descent Learning with Nonlinear Cells

We will now extend the gradient descent technique to the case of nonlinear cells, that is, where the activation/output function is a general nonlinear function g(x).

• The input function is denoted by $h(x)$.

• The activation/output function $g(h(x))$ is assumed to be differentiable in $x$.

Remember:

Rewriting the Error Function

The definition of the error function (Equation (8)) can simply be rewritten as follows:

(16)  $E(\vec{w}) = \frac{1}{2} \sum_{i=1}^{M} \sum_{\mu=1}^{P} \bigl(T_i^\mu - O_i^\mu\bigr)^2 = \frac{1}{2} \sum_{i=1}^{M} \sum_{\mu=1}^{P} \Bigl(T_i^\mu - g\Bigl(\sum_k w_{ki} \, x_k^\mu\Bigr)\Bigr)^2$


Weight Gradients

Consequently, we can compute the $w_{ki}$ gradients:

(17)  $\frac{\partial E(\vec{w})}{\partial w_{ki}} = -\sum_{\mu=1}^{P} \bigl(T_i^\mu - g(h_i^\mu)\bigr) \cdot g'(h_i^\mu) \cdot x_k^\mu$

From Weight Gradients to the Learning Rule

This eventually (after some more calculations) shows us that the adaptation term $\Delta w_{ki}$ for $w_{ki}$ has the same form as in Equations (10), (13), and (14), namely:

(18)  $\Delta w_{ki} = \eta \, \delta_i^\mu \, x_k^\mu$

where

(19)  $\delta_i^\mu = \bigl(T_i^\mu - O_i^\mu\bigr) \cdot g'(h_i^\mu)$

Suitable Activation Functions

The calculation of the above $\delta$ terms is easy for the following functions g, which are commonly used as activation functions:

Hyperbolic Tangent:

(20)  $g(x) = \tanh(\beta x)$,  $g'(x) = \beta \bigl(1 - g^2(x)\bigr)$


Hyperbolic tangent plot:

Plot[Tanh[x], {x, -5, 5}];

Plot of the first derivative:

Plot[Tanh'[x], {x, -5, 5}];


Check for equality with $1 - \tanh^2 x$:

Plot[1 - Tanh[x]^2, {x, -5, 5}];

Influence of the β parameter:

p1[b_] := Plot[Tanh[b x], {x, -5, 5}, PlotRange -> All, DisplayFunction -> Identity]
p2[b_] := Plot[Tanh'[b x], {x, -5, 5}, PlotRange -> All, DisplayFunction -> Identity]
Table[Show[GraphicsArray[{p1[b], p2[b]}]], {b, 1, 5}];


Table[Show[GraphicsArray[{p1[b], p2[b]}]], {b, 0.1, 1, 0.1}];


Sigmoid:

(21)  $g(x) = \frac{1}{1 + e^{-2\beta x}}$,  $g'(x) = 2\beta \, g(x) \bigl(1 - g(x)\bigr)$

Sigmoid plot:

sigmoid[x_, b_] := 1/(1 + E^(-2 b x))

Plot[sigmoid[x, 1], {x, -5, 5}];

Plot of the first derivative:

D[sigmoid[x, b], x]

(2 b E^(-2 b x)) / (1 + E^(-2 b x))^2


Plot[D[sigmoid[x, 1], x] // Evaluate, {x, -5, 5}];

Check for equality with $2 \, g \, (1 - g)$:

Plot[2 sigmoid[x, 1] (1 - sigmoid[x, 1]), {x, -5, 5}];

Influence of the β parameter:

p1[b_] := Plot[sigmoid[x, b], {x, -5, 5}, PlotRange -> All, DisplayFunction -> Identity]
p2[b_] := Plot[D[sigmoid[x, b], x] // Evaluate, {x, -5, 5}, PlotRange -> All, DisplayFunction -> Identity]


Table[Show[GraphicsArray[{p1[b], p2[b]}]], {b, 1, 5}];


Table[Show[GraphicsArray[{p1[b], p2[b]}]], {b, 0.1, 1, 0.1}];


δ Update Rule for Sigmoid Units

Using the sigmoidal activation function, the δ update rule takes the simple form:

(22)  $\delta_i^\mu = O_i^\mu \bigl(1 - O_i^\mu\bigr) \bigl(T_i^\mu - O_i^\mu\bigr)$,

which is used in the weight update rule:

(23)  $\Delta w_{ki} = \eta \, \delta_i^\mu \, x_k^\mu$
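As an illustration, one online update of a single sigmoid output unit per Equations (22) and (23) can be written as the following Mathematica sketch (all numbers are hypothetical, and the factor 2β of Equation (21) is taken as 1):

(* hypothetical single sigmoid unit with N = 2 inputs *)
g[h_] := 1/(1 + E^(-h));
x   = {1., 0.5};      (* input pattern x_k^mu *)
t   = 1.;             (* target T_i^mu *)
eta = 0.5;            (* learning rate *)
w   = {0.2, -0.3};    (* current weights w_ki *)

out   = g[w . x];                     (* output O_i^mu *)
delta = out (1 - out) (t - out);      (* Equation (22) *)
w     = w + eta delta x               (* Equation (23) *)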


Learning in Multilayer Networks

Multilayer networks with nonlinear processing elements have a wider capability for solving classification tasks.

Learning by error backpropagation is a common method to train multilayer networks.

Error Backpropagation

The backpropagation (BP) algorithm describes an update procedure for the set of weights $\vec{w}$ in a feedforward multilayer network.

The network has to learn input-output patterns $\{x_k^\mu, T_i^\mu\}$.

The basis for BP learning is, again, a gradient descent technique similar to the one used for perceptron learning, as described above.

Notation

We use the following notation:

• $x_k^\mu$ : value of input unit $k$ for training pattern $\mu$; $k = 1, \ldots, N$; $\mu = 1, \ldots, P$

• $H_j$ : output of hidden unit $j$

• $O_i$ : output of output unit $i$, $i = 1, \ldots, M$

• $w_{kj}$ : weight of the link from input unit $k$ to hidden unit $j$

• $W_{ij}$ : weight of the link from hidden unit $j$ to output unit $i$

Propagating the input through the network

For pattern $\mu$ the hidden unit $j$ receives the input

(24)  $h_j^\mu = \sum_{k=1}^{N} w_{kj} \, x_k^\mu$

and generates the output

(25)  $H_j^\mu = g(h_j^\mu) = g\Bigl(\sum_{k=1}^{N} w_{kj} \, x_k^\mu\Bigr)$.

These signals are propagated to the output cells, which receive the signals

(26)  $h_i^\mu = \sum_j W_{ij} \, H_j^\mu = \sum_j W_{ij} \, g\Bigl(\sum_{k=1}^{N} w_{kj} \, x_k^\mu\Bigr)$

and generate the output

(27)  $O_i^\mu = g(h_i^\mu) = g\Bigl(\sum_j W_{ij} \, g\Bigl(\sum_{k=1}^{N} w_{kj} \, x_k^\mu\Bigr)\Bigr)$
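A compact Mathematica sketch of this forward pass, Equations (24)-(27), with hypothetical layer sizes, random weights, and the sigmoid as activation function:

(* hypothetical two-layer network: N = 3 inputs, 4 hidden units, M = 2 outputs *)
g[h_] := 1/(1 + E^(-h));                 (* sigmoid activation *)
wIH = RandomReal[{-1, 1}, {3, 4}];       (* wIH[[k, j]] = w_kj *)
wHO = RandomReal[{-1, 1}, {4, 2}];       (* wHO[[j, i]] = W_ij *)
x   = {1., 0., 1.};                      (* one input pattern  *)

hHidden = x . wIH;          (* Equation (24) *)
vHidden = g[hHidden];       (* Equation (25) *)
hOut    = vHidden . wHO;    (* Equation (26) *)
out     = g[hOut]           (* Equation (27) *)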

Error function

We use the known quadratic function as our error function:

(28)  $E(\vec{w}) = \frac{1}{2} \sum_{i=1}^{M} \sum_{\mu=1}^{P} \bigl(T_i^\mu - O_i^\mu\bigr)^2$

Continuing the calculations, we get:

(29)  $E(\vec{w}) = \frac{1}{2} \sum_{i=1}^{M} \sum_{\mu=1}^{P} \bigl(T_i^\mu - g(h_i^\mu)\bigr)^2$

$E(\vec{w}) = \frac{1}{2} \sum_{i=1}^{M} \sum_{\mu=1}^{P} \Bigl(T_i^\mu - g\Bigl(\sum_j W_{ij} \, g\Bigl(\sum_{k=1}^{N} w_{kj} \, x_k^\mu\Bigr)\Bigr)\Bigr)^2$

$E(\vec{w}) = \frac{1}{2} \sum_{i=1}^{M} \sum_{\mu=1}^{P} \Bigl(T_i^\mu - g\Bigl(\sum_j W_{ij} \, H_j^\mu\Bigr)\Bigr)^2$

Updating the weights: hidden – output layer

For the connections from hidden to output cells we can use the delta weight update rule:


(30)  $\Delta W_{ij} = -\eta \, \frac{\partial E}{\partial W_{ij}}$

$\Delta W_{ij} = \eta \sum_\mu \bigl(T_i^\mu - O_i^\mu\bigr) \, g'(h_i^\mu) \, H_j^\mu$

$\Delta W_{ij} = \eta \sum_\mu \delta_i^\mu \, H_j^\mu$

with

(31)  $\delta_i^\mu = g'(h_i^\mu) \bigl(T_i^\mu - O_i^\mu\bigr)$

Updating the weights: input – hidden layer

(32)  $\Delta w_{kj} = -\eta \, \frac{\partial E}{\partial w_{kj}}$

$\Delta w_{kj} = -\eta \sum_\mu \left( \frac{\partial E}{\partial H_j^\mu} \cdot \frac{\partial H_j^\mu}{\partial w_{kj}} \right)$

After a few more calculations we get the following weight update rule:

(33)  $\Delta w_{kj} = \eta \sum_\mu \delta_j^\mu \, x_k^\mu$

with

(34)  $\delta_j^\mu = g'(h_j^\mu) \sum_i W_{ij} \, \delta_i^\mu$


The Backpropagation Algorithm

For the BP algorithm we use the following notation:

• $V_i^m$ : output of cell $i$ in layer $m$

• $V_i^0$ : corresponds to $x_i$, the $i$-th input component

• $w_{ji}^m$ : the connection from $V_j^{m-1}$ to $V_i^m$

Backpropagation Algorithm

Step 1: Initialize all weights with random values.

Step 2: Select a pattern $x^\mu$ and attach it to the input layer ($m = 0$):

(35)  $V_k^0 = x_k^\mu \,, \quad \forall k$

Step 3: Propagate the signals through all layers:

(36)  $V_i^m = g(h_i^m) = g\Bigl(\sum_j w_{ji}^m \, V_j^{m-1}\Bigr), \quad \forall i, \forall m$

Step 4: Calculate the δ's of the output layer:

(37)  $\delta_i^M = g'(h_i^M) \bigl(T_i - V_i^M\bigr)$

Step 5: Calculate the δ's for the inner layers by error backpropagation:

(38)  $\delta_i^{m-1} = g'(h_i^{m-1}) \sum_j w_{ij}^m \, \delta_j^m \,, \quad m = M, M-1, \ldots, 2$

Step 6: Adapt all connection weights:

(39)  $w_{ji}^{\mathrm{new}} = w_{ji}^{\mathrm{old}} + \Delta w_{ji}$  with  $\Delta w_{ji}^m = \eta \, \delta_i^m \, V_j^{m-1}$

Step 7: Go back to Step 2 for the next training pattern.
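To tie Steps 1-7 together, here is a Mathematica sketch of the complete procedure for a network with one hidden layer. The layer sizes, the XOR-style training data, and the learning rate are hypothetical; the sigmoid of Equation (21) with 2β = 1 is assumed, so g'(h) = V(1 - V) as in Equation (22); and no bias units are used, exactly as in the equations above. It illustrates the algorithm rather than reproducing the original course code.

(* Step 1: random weights for a hypothetical 2-4-1 network (input-hidden-output) *)
g[h_] := 1/(1 + E^(-h));                 (* sigmoid, Equation (21) with 2*beta = 1 *)
wIH = RandomReal[{-1, 1}, {2, 4}];       (* w_kj : input k  -> hidden j *)
wHO = RandomReal[{-1, 1}, {4, 1}];       (* W_ij : hidden j -> output i *)
eta = 0.5;

(* hypothetical training data (XOR) *)
xpat = {{0., 0.}, {0., 1.}, {1., 0.}, {1., 1.}};
tpat = {{0.}, {1.}, {1.}, {0.}};

netOut[x_] := g[g[x . wIH] . wHO];                                 (* Equation (27) *)
errorE := (1/2) Total[Flatten[(tpat - Map[netOut, xpat])^2]];      (* Equation (28) *)

before = errorE;
Do[
  Do[
    x = xpat[[mu]]; t = tpat[[mu]];          (* Step 2: pick a pattern          *)
    vH = g[x . wIH];                         (* Step 3: forward propagation     *)
    vO = g[vH . wHO];
    dO = vO (1 - vO) (t - vO);               (* Step 4, with g'(h) = V (1 - V)  *)
    dH = vH (1 - vH) (wHO . dO);             (* Step 5: backpropagated deltas   *)
    wHO = wHO + eta Outer[Times, vH, dO];    (* Step 6: weight adaptation       *)
    wIH = wIH + eta Outer[Times, x, dH],
    {mu, Length[xpat]}],                     (* Step 7: next pattern            *)
  {epoch, 2000}];

{before, errorE}   (* the error should have decreased after training *)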


