
intelligent control techniques


Page 1: intelligent control techniques

[Figure: a single logsig output neuron with inputs o3 = 0.982 (weight 2), o4 = 0.5 (weight 4), and bias input 1 (weight -3.93); learning rate c = 0.1, target d = 1.]

The transfer function is unipolar continuous (logsig)

o = 1 / (1 + e^(-net))

f'(net) = o(1 - o)

Δw = c (d - o) o (1 - o) x

net=2*0.982+4*0.5-3.93*1=0.034

o = 1/(1+exp(-0.034)) = 0.51

δ = (d - o)(1 - o)o = (1 - 0.51)(1 - 0.51)(0.51) = 0.1225

Δw53 = c · δ · o3 = 0.1 × 0.1225 × 0.982 = 0.012

w53(new) = w53(old) + Δw53 = 2 + 0.012 = 2.012

(d denotes the target; here d = t = 1)

Example 2

The remaining weights update the same way:

w54: 4 + 0.1×0.1225×0.5 = 4.0061

w50: -3.93 + 0.1×0.1225×1 = -3.9178

Repeating the forward pass with the updated weights:

net = 2.012×0.982 + 4.0061×0.5 - 3.9178×1 = 0.061

o = 1/(1+exp(-0.061)) = 0.5152

Error before update = 1 - 0.51 = 0.49

Error after update = 1 - 0.5152 = 0.4848
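As a cross-check, the two iterations above can be reproduced with a short Python sketch (not part of the original slides; learning rate c = 0.1 and target d = 1 are taken from the worked numbers):

```python
import math

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

a = [0.982, 0.5, 1.0]    # inputs to the neuron: o3, o4, bias
w = [2.0, 4.0, -3.93]    # initial weights
c, d = 0.1, 1.0          # learning rate and target

for step in range(2):
    net = sum(wi * ai for wi, ai in zip(w, a))
    o = logsig(net)
    print(step, round(net, 3), round(d - o, 4))   # error = d - o
    delta = (d - o) * (1 - o) * o                 # ~0.1225 on the first pass
    w = [wi + c * delta * ai for wi, ai in zip(w, a)]
# prints net = 0.034, error ~0.4915, then net ~0.061, error ~0.4847
```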


Page 4: intelligent control techniques
Page 5: intelligent control techniques

Examples of Network Architectures

Page 6: intelligent control techniques

A two layer network

[Figure: a two-layer network with inputs x1 = 1, x2 = 0, and bias 1; hidden neurons 3 and 4; output neuron 5.]

Transfer function is unipolar continuous:

o = 1 / (1 + e^(-net)),  f'(net) = o(1 - o)

Forward pass:

net3 = u3 = 3*1 + 4*0 + 1*1 = 4  →  o3 = 1/(1+exp(-4)) = 0.982

net4 = u4 = 6*1 + 5*0 + (-6)*1 = 0  →  o4 = 1/(1+exp(0)) = 0.5

net5 = u5 = 2*0.982 + 4*0.5 - 3.93*1 = 0.034  →  o5 = 1/(1+exp(-0.034)) = 0.51

Output delta and weight update (c = 0.1, d = 1):

δ5 = (d - o5)(1 - o5) o5 = (1 - 0.51)(1 - 0.51)(0.51) = 0.1225

Δw53 = c · δ5 · o3 = 0.1 × 0.1225 × 0.982 = 0.012

w53(new) = w53(old) + Δw53 = 2 + 0.012 = 2.012
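The forward pass above can be reproduced with a few lines of Python (a sketch, not part of the slides; the inputs x1 = 1, x2 = 0 are read off the net3 and net4 sums):

```python
import math

def logsig(net):
    return 1.0 / (1.0 + math.exp(-net))

x1, x2, bias = 1.0, 0.0, 1.0

net3 = 3*x1 + 4*x2 + 1*bias      # = 4
o3 = logsig(net3)                # = 0.982
net4 = 6*x1 + 5*x2 - 6*bias      # = 0
o4 = logsig(net4)                # = 0.5
net5 = 2*o3 + 4*o4 - 3.93*bias   # = 0.034
o5 = logsig(net5)                # = 0.51
```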

Page 7: intelligent control techniques

Derivation of Backprop

[Figure: input layer, hidden layer, output layer.]

Define:

ai = activation of neuron i
wij = synaptic weight from neuron j to neuron i
xi = excitation of neuron i (the sum of weighted activations coming into neuron i, before squashing) = net
di = target output of neuron i (= ti)
oi = output of neuron i

By definition:

xi = ∑j wij aj

oi = 1 / (1 + e^(-xi))

Summed, squared error at the output layer:

E = (1/2) ∑i (di - oi)^2

Page 8: intelligent control techniques

Derivation of Backprop

By chain rule:

∂E/∂wij = (∂E/∂oi) (∂oi/∂xi) (∂xi/∂wij)

with E = (1/2) ∑i (di - oi)^2 and xi = ∑j wij aj. Term by term:

∂E/∂oi = (1/2) · 2 · (di - oi) · (-1) = (oi - di)

∂oi/∂xi = ∂/∂xi [1 / (1 + e^(-xi))] = -[1 / (1 + e^(-xi))^2] · (-e^(-xi)) = e^(-xi) / (1 + e^(-xi))^2

  = [(1 + e^(-xi)) - 1] / (1 + e^(-xi)) · [1 / (1 + e^(-xi))] = [1 - 1/(1 + e^(-xi))] · [1/(1 + e^(-xi))]

  = (1 - oi) oi

∂xi/∂wij = aj

Page 9: intelligent control techniques

Derivation of Backprop

∂E/∂wij = (∂E/∂oi) (∂oi/∂xi) (∂xi/∂wij) = (oi - ti) · (1 - oi) oi · aj

where (oi - ti) is the raw error term, (1 - oi) oi is due to the sigmoid, and aj is due to the incoming (pre-synaptic) activation.

Δwij = -η ∂E/∂wij   (where η is the learning rate)

wij(t+1) = wij(t) + η (ti - oi)(1 - oi) oi aj
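As an aside (not in the original slides), the derived gradient can be checked numerically against a finite-difference estimate of E, reusing the activations and weights of the Page 1 example:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

a = [0.982, 0.5, 1.0]    # incoming activations aj (last one is the bias)
w = [2.0, 4.0, -3.93]    # weights wij
d = 1.0                  # target di

def error(weights):
    o = sigmoid(sum(wi * ai for wi, ai in zip(weights, a)))
    return 0.5 * (d - o) ** 2

o = sigmoid(sum(wi * ai for wi, ai in zip(w, a)))
for j, aj in enumerate(a):
    analytic = (o - d) * (1 - o) * o * aj          # the derived dE/dwij
    w_eps = list(w); w_eps[j] += 1e-6
    numeric = (error(w_eps) - error(w)) / 1e-6     # finite-difference slope
    print(j, analytic, numeric)                    # agree to ~6 decimal places
```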

Page 10: intelligent control techniques

Derivation of Backprop

Now we need to compute the weight changes in the hidden layer, so, as before, we write out the equation for the error function slope w.r.t. a particular weight leading into the hidden layer:

∂E/∂wij = (∂E/∂ai) (∂ai/∂xi) (∂xi/∂wij)

(where i now corresponds to a unit in the hidden layer and j now corresponds to a unit in the input or an earlier hidden layer)

From the previous derivation, the last two terms can simply be written down:

∂ai/∂xi = (1 - ai) ai

∂xi/∂wij = aj


Page 12: intelligent control techniques

Derivation of Backprop

However, the first term is more difficult to understand for this hidden layer. It is what Minsky called the credit assignment problem, and it is what stumped connectionists for two decades. The trick is to realize that the hidden nodes do not themselves make errors; rather, they contribute to the errors of the output nodes. So, the derivative of the total error w.r.t. a hidden neuron's activation is the sum of that hidden neuron's contributions to the errors in all of the output neurons:

∂E/∂ai = ∑k (∂E/∂ok) (∂ok/∂xk) (∂xk/∂ai)   (where k indexes over all output units)

The three factors capture, in turn: the contribution of each output neuron (∂E/∂ok); the contribution of all inputs to the output neuron from the hidden layer (∂ok/∂xk); and the contribution of the particular neuron in the hidden layer (∂xk/∂ai).

Page 13: intelligent control techniques

Derivation of Backprop

From our previous derivations, the first two terms are easy:

∂E/∂ok = (ok - dk)

∂ok/∂xk = (1 - ok) ok

For the third term, remember that xk = ∑i wki ai. Since only one member of the sum involves ai, every other term differentiates to zero, leaving:

∂xk/∂ai = wki

Page 14: intelligent control techniques

Derivation of Backprop

Combining these terms then yields:

∂E/∂ai = - ∑k (dk - ok)(1 - ok) ok wki = - ∑k δk wki

where δk = (dk - ok)(1 - ok) ok and wki is the weight between the hidden and output layers.

And combining with previous results yields:

∂E/∂wij = - (∑k δk wki) (1 - ai) ai aj

wij(t+1) = wij(t) + η (∑k δk wki) (1 - ai) ai aj

Here ei = ∑k δk wki is the error reaching hidden neuron i, and δi = ei (1 - ai) ai.
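A sketch of this hidden-layer rule in Python (illustrative names, not from the slides): the error ei reaching hidden neuron i is accumulated from the output deltas, turned into δi through the sigmoid derivative, and applied to every incoming weight.

```python
def hidden_update(w_hidden, i, a_hidden, a_prev, deltas_out, w_out, eta):
    """Update the weights w_hidden[i][j] feeding hidden neuron i.

    deltas_out[k] = (dk - ok)(1 - ok) ok for output neuron k;
    w_out[k][i] is the weight from hidden neuron i to output neuron k.
    """
    e_i = sum(deltas_out[k] * w_out[k][i] for k in range(len(deltas_out)))
    delta_i = e_i * (1 - a_hidden[i]) * a_hidden[i]   # back-propagated delta
    for j, a_j in enumerate(a_prev):
        w_hidden[i][j] += eta * delta_i * a_j         # gradient step
    return delta_i
```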

Page 15: intelligent control techniques

Derivation of Backprop

Page 16: intelligent control techniques

Forward Propagation of Activity

• Forward Direction layer by layer:

– Inputs applied

– Multiplied by weights

– Summed

– ‘Squashed’ by sigmoid activation function

– Output passed to each neuron in next layer

• Repeat above until network output produced

Back-propagation of error

• Compute error (delta or local gradient) for each output unit

• Layer-by-layer, compute error (delta or local gradient) for each hidden unit by backpropagating errors (as shown previously)

The weights can then be updated using the Generalised Delta Rule (GDR), also known as the Back-Propagation (BP) algorithm

Page 17: intelligent control techniques

For an output neuron i:

wij(t+1) = wij(t) + η (di - oi)(1 - oi) oi aj

For a hidden neuron i:

wij(t+1) = wij(t) + η (∑k δk wki)(1 - ai) ai aj,  where δk = (dk - ok)(1 - ok) ok

The chain rule does the following: it distributes the error of an output unit o to all the hidden units that it is connected to, weighted by the connection. Put differently, a hidden unit h receives a delta from each output unit o equal to the delta of that output unit weighted with (= multiplied by) the weight of the connection between those units.

Page 18: intelligent control techniques

Algorithm (Backpropagation)

Start with random weights
while error is unsatisfactory do
    for each input pattern
        compute hidden node input (net)
        compute hidden node output (o)
        compute input to output node (net)
        compute network output (o)
        modify outer layer weights:
        wij(t+1) = wij(t) + η (di - oi)(1 - oi) oi aj
        modify hidden layer weights:
        wij(t+1) = wij(t) + η (∑k δk wki)(1 - ai) ai aj,  where δk = (dk - ok)(1 - ok) ok
    end
end
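A runnable rendering of this loop for a network with one logsig output (a sketch under assumed conventions: bias handled as an extra input fixed at 1, a single output unit, summed squared error as the stopping criterion):

```python
import math, random

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(patterns, n_in, n_hidden, eta=0.1, tol=0.01):
    # start with random weights; the last entry of each row is the bias weight
    w_h = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [random.uniform(-1, 1) for _ in range(n_hidden + 1)]
    error = tol + 1.0
    while error > tol:                          # while error is unsatisfactory
        error = 0.0
        for x, d in patterns:                   # for each input pattern
            xs = list(x) + [1.0]
            a = [logsig(sum(wi * xi for wi, xi in zip(row, xs))) for row in w_h]
            o = logsig(sum(wi * ai for wi, ai in zip(w_o, a + [1.0])))
            delta_o = (d - o) * (1 - o) * o
            # hidden deltas are computed before the outer weights change
            delta_h = [delta_o * w_o[i] * (1 - a[i]) * a[i] for i in range(n_hidden)]
            for i, ai in enumerate(a + [1.0]):  # modify outer layer weights
                w_o[i] += eta * delta_o * ai
            for i, row in enumerate(w_h):       # modify hidden layer weights
                for j, xj in enumerate(xs):
                    row[j] += eta * delta_h[i] * xj
            error += 0.5 * (d - o) ** 2
    return w_h, w_o
```

For example, train([([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)], 2, 2, eta=0.5) attempts the XOR problem, though how quickly (or whether) it converges depends on the random starting weights.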

Page 19: intelligent control techniques

So the error for this training example is: (1 - 0.510)= 0.490

δ5 = (d - o5)(1 - o5) o5 = (1 - 0.51)(1 - 0.51)(0.51) = 0.1225

δ4 = w54 · δ5 · (1 - o4) o4 = 4 × 0.1225 × (1 - 0.5) × 0.5 = 0.1225

δ3 = w53 · δ5 · (1 - o3) o3 = 2 × 0.1225 × (1 - 0.982) × 0.982 = 0.0043
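These deltas are easy to confirm numerically (a quick check, assuming target d = 1 as used throughout):

```python
o3, o4, o5 = 0.982, 0.5, 0.51      # rounded activations from Page 6
w53, w54, d = 2.0, 4.0, 1.0        # output-layer weights and target

delta5 = (d - o5) * (1 - o5) * o5        # 0.1225
delta4 = w54 * delta5 * (1 - o4) * o4    # 0.1225
delta3 = w53 * delta5 * (1 - o3) * o3    # 0.0043
```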

Page 20: intelligent control techniques

Output-layer weight updates (c = 0.1):

Δw50 = c · δ5 · 1 = 0.1 × 0.1225 × 1 = 0.01225
w50(new) = w50(old) + Δw50 = -3.92 + 0.01225 = -3.9078

Δw53 = c · δ5 · o3 = 0.1 × 0.1225 × 0.982 = 0.012
w53(new) = w53(old) + Δw53 = 2 + 0.012 = 2.012

Hidden-layer weight updates:

Δw31 = c · δ3 · x1 = 0.1 × 0.0043 × 1 = 0.00043
w31(new) = w31(old) + Δw31 = 3 + 0.00043 ≈ 3.0004

Δw41 = c · δ4 · x1 = 0.1 × 0.1225 × 1 = 0.01225
w41(new) = w41(old) + Δw41 = 6 + 0.01225 = 6.01225

Page 21: intelligent control techniques

[Table: summary of the updates, listing each weight w, its δ, the input activation a, and the new w.]

Page 22: intelligent control techniques

Verification that it works

Thus the new error, (1 - 0.5239) = 0.476, has been reduced by 0.014 (from 0.490 to 0.476).
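The verification can be reproduced end to end with the sketch below (not from the slides; it assumes the Page 20 starting bias w50 = -3.92, inputs x1 = 1, x2 = 0, target d = 1, and c = 0.1):

```python
import math

def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w31, w32, w30, w41, w42, w40, w53, w54, w50, x1=1.0, x2=0.0):
    o3 = logsig(w31*x1 + w32*x2 + w30)
    o4 = logsig(w41*x1 + w42*x2 + w40)
    return logsig(w53*o3 + w54*o4 + w50), o3, o4

c, d = 0.1, 1.0
w = dict(w31=3, w32=4, w30=1, w41=6, w42=5, w40=-6, w53=2, w54=4, w50=-3.92)

o5, o3, o4 = forward(**w)
d5 = (d - o5) * (1 - o5) * o5
d3 = d5 * w['w53'] * (1 - o3) * o3
d4 = d5 * w['w54'] * (1 - o4) * o4
for key, grad in [('w53', d5*o3), ('w54', d5*o4), ('w50', d5),
                  ('w31', d3), ('w32', 0.0), ('w30', d3),
                  ('w41', d4), ('w42', 0.0), ('w40', d4)]:
    w[key] += c * grad

o5_new, _, _ = forward(**w)
print(1 - o5, 1 - o5_new)   # ~0.489 -> ~0.476, matching the slides' 0.490 -> 0.476
```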

Page 23: intelligent control techniques

Homework

Update the weights of the multi-layer network using the backpropagation algorithm. The transfer functions of the neurons are unipolar sigmoid functions. The target outputs are y2* = 1 and y3* = 0.5. The learning rate is 0.5. Show that with the updated weights there is a reduction in the total error.