Concept Learning Algorithms
• Come from many different theoretical backgrounds and motivations
• Behaviors related to human learning
• Some biologically inspired, others not
[Diagram: a spectrum from Biologically-Inspired (Neural Networks) to Utilitarian, i.e. "just get a good result" (Nearest Neighbor, Tree Learners)]
© Jude Shavlik 2006, David Page 2010 – CS 760: Machine Learning (UW-Madison)
Today's Topics
• Perceptrons
• Artificial Neural Networks (ANNs)
• Backpropagation
• Weight Space
Connectionism
PERCEPTRONS (Rosenblatt, 1957)
• among earliest work in machine learning
• died out in 1960's (Minsky & Papert book)
[Diagram: unit i receives inputs from units j, k, l via weights w_ij, w_ik, w_il]

   Output_i = F( W_ij · output_j + W_ik · output_k + W_il · output_l )
Perceptron as Classifier
• Output for example X is sign(W·X), where sign is -1 or +1 (or use threshold θ and 0/1)
• Candidate hypotheses: real-valued weight vectors
• Training: update W for each misclassified example X (target class t, predicted o) by
     W_i ← W_i + η (t – o) X_i
• Here η is the learning-rate parameter
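A minimal sketch of this training rule in Python (illustrative helper, assuming numeric feature vectors, ±1 labels, and no separate threshold):

import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.
    X: (n_examples, n_features) array; t: array of target labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1 if np.dot(w, x) > 0 else -1   # sign(W . X)
            if o != target:                      # update only misclassified examples
                w += eta * (target - o) * x
    return w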
Gradient Descent for the Perceptron
(Assume no threshold for now, and start with a common error measure)

   Error ≡ ½ (t – o)²

where o is the network's output and t is the teacher's answer (a constant w.r.t. the weights).

   ΔW_k ≡ -η ∂E/∂W_k

   ∂E/∂W_k = (t – o) · ∂(t – o)/∂W_k = -(t – o) · ∂o/∂W_k

Remember: o = W·X
Continuation of Derivation

   ∂E/∂W_k = -(t – o) · ∂/∂W_k ( Σ_k w_k x_k )     [stick in formula for output]
           = -(t – o) x_k

   So ΔW_k = η (t – o) x_k      The Perceptron Rule

Also known as the delta rule, and other names (with small variations in the calculation).
As it looks in your text (processing all data at once)…
Linear Separability
Consider a perceptron; its output is
   1 if W1X1 + W2X2 + … + WnXn > θ, 0 otherwise

In terms of feature space, the decision boundary is W1X1 + W2X2 = θ, i.e.
   X2 = (θ – W1X1) / W2 = (-W1/W2) X1 + θ/W2     (compare y = mx + b)

[Plot: positive (+) and negative (-) examples scattered in feature space, with a separating line]

Hence, can only classify examples if a "line" (hyperplane) can separate them.
Perceptron Convergence Theorem (Rosenblatt, 1957)
Perceptron = no hidden units

If a set of examples is learnable, the perceptron training rule will eventually find the necessary weights.
However, a perceptron can only learn/represent linearly separable datasets.
The (Infamous) XOR Problem

      Input    Output
  a)   0 0       0
  b)   0 1       1
  c)   1 0       1
  d)   1 1       0

Exclusive OR (XOR): not linearly separable.
[Plot: points a, b, c, d at the corners of the unit square; no single line separates the 1's from the 0's]

A Neural Network Solution
[Diagram: inputs X1 and X2 feed two hidden units, which feed one output unit; the connection weights shown are 1, 1, -1, -1, 1, 1. Let θ = 0 for all nodes.]
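The diagram's weights (1, 1, -1, -1, 1, 1 with θ = 0) admit a wiring that computes XOR; the one assumed in this sketch is hidden weights (1, -1) and (-1, 1) and output weights (1, 1), which is an assumption about the exact connections, not a reading of the slide:

def step(net, theta=0.0):
    """Threshold unit: fires (1) when its net input exceeds theta."""
    return 1 if net > theta else 0

def xor_net(x1, x2):
    # Hidden layer: detectors for "x1 and not x2" and "x2 and not x1"
    h1 = step(1 * x1 + -1 * x2)
    h2 = step(-1 * x1 + 1 * x2)
    # Output unit fires if either hidden unit fires
    return step(1 * h1 + 1 * h2)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))   # prints 0, 1, 1, 0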
The Need for Hidden Units
If there is one layer of enough hidden units (possibly 2^N for Boolean functions), the input can be recoded (N = number of input units).
This recoding allows any mapping to be represented (Minsky & Papert).
Question: How to provide an error signal to the interior units?
Hidden Units
One view: allow a system to create its own internal representation – for which problem solving is easy.
[Diagram: a multi-layer network; the output layer is labeled "a perceptron"]
Advantages of Neural Networks
• Provide best predictive accuracy for some problems (though being supplanted by SVMs)
• Can represent a rich class of concepts
  (e.g., positive/negative labels, or "Saturday: 40% chance of rain, Sunday: 25% chance of rain")
Overview of ANNs
[Diagram: a network of input units, hidden units, and output units, with a labeled weight, an error signal, and a recurrent link]
Backpropagation
Backpropagation
• Backpropagation involves a generalization of the perceptron rule
• Rumelhart, Parker, and Le Cun (and Bryson & Ho 1969, Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of interior ("hidden") units
• Derivation involves partial derivatives (hence, threshold function must be differentiable)
[Diagram: the error signal ∂E/∂W_ij propagated back to an interior weight]
Weight Space
• Given a neural-network layout, the weights are free parameters that define a space
• Each point in this weight space specifies a network
• Associated with each point is an error rate, E, over the training data
• Backprop performs gradient descent in weight space
Gradient Descent in Weight Space
[Figure: the error surface E over weights W1 and W2, with the gradient ∇E(w) and a downhill step]
The Gradient-Descent Rule

   ∇E(w) ≡ [ ∂E/∂w_0, ∂E/∂w_1, ∂E/∂w_2, …, ∂E/∂w_N ]

The "gradient": this is an N+1-dimensional vector (i.e., the 'slope' in weight space).
Since we want to reduce errors, we want to go "downhill".
We'll take a finite step in weight space:

   Δw = -η ∇E(w)      or      Δw_i = -η ∂E/∂w_i      ("delta" = change to w)

[Figure: a downhill step on the error surface E over W1, W2]
"On Line" vs. "Batch" Backprop
• Technically, we should look at the error gradient for the entire training set before taking a step in weight space ("batch" Backprop)
• However, as presented, we take a step after each example ("on-line" Backprop)
  • Much faster convergence
  • Can reduce overfitting (since on-line Backprop is "noisy" gradient descent)
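For concreteness, a sketch of the two regimes for a single linear unit using the delta rule from earlier (illustrative code, not from the course):

import numpy as np

def batch_epoch(w, X, t, eta):
    """BATCH: accumulate delta-w over every example, then 'move' once."""
    delta_w = np.zeros_like(w)
    for x, target in zip(X, t):
        o = np.dot(w, x)
        delta_w += eta * (target - o) * x    # sum of per-example delta-w vectors
    return w + delta_w

def online_epoch(w, X, t, eta):
    """ON-LINE (stochastic gradient descent): 'move' after each example."""
    for x, target in zip(X, t):
        o = np.dot(w, x)
        w = w + eta * (target - o) * x
    return w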
"On Line" vs. "Batch" BP (continued)
• BATCH – add Δw vectors for every training example, then 'move' in weight space
• ON-LINE – "move" after each example (a.k.a. stochastic gradient descent)

[Figure: two trajectories through weight space (steps Δw1, Δw2, Δw3, …); the batch and on-line paths differ]

Final locations in weight space need not be the same for BATCH and ON-LINE.
Note: w_i(BATCH) ≠ w_i(ON-LINE) for i > 1
Need Derivatives: Replace Step (Threshold) by Sigmoid
Individual units:
[Diagram: a unit with inputs, a bias, and an output]

   output_i = F( Σ_j weight_ij × output_j )

where

   F(input_i) = 1 / (1 + e^-(input_i – bias_i))
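A small sketch of such a sigmoid unit in Python (names are illustrative):

import numpy as np

def sigmoid(net, bias=0.0):
    """Logistic activation F(net) = 1 / (1 + exp(-(net - bias)))."""
    return 1.0 / (1.0 + np.exp(-(net - bias)))

def unit_output(weights, inputs, bias=0.0):
    """output_i = F(sum_j w_ij * output_j), with a smooth (differentiable) F."""
    return sigmoid(np.dot(weights, inputs), bias)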
Differentiating the Logistic Function

   out_i = 1 / (1 + e^-( Σ_j w_ji × out_j ))

   F'(wgt'ed in) = out_i (1 – out_i)

[Plot: the sigmoid F(wgt'ed in) vs. Σ_j w_j × out_j, rising from 0 through ½ toward 1]
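The derivative quoted above follows directly from the quotient/chain rule; this is the standard algebra the backup slide at the end refers to, filled in here since that slide's content did not survive:

\[
F(x) = \frac{1}{1+e^{-x}}, \qquad
F'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}}
      = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}
      = F(x)\,\bigl(1 - F(x)\bigr).
\]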
BP Calculations
Assume one layer of hidden units (std. topology); layers are indexed k → j → i.

1. Error ≡ ½ Σ_i ( Teacher_i – Output_i )²
2.       = ½ Σ_i ( Teacher_i – F( Σ_j [W_ij × Output_j] ) )²
3.       = ½ Σ_i ( Teacher_i – F( Σ_j [W_ij × F( Σ_k W_jk × Output_k )] ) )²

Determine   ∂Error/∂W_ij = (use equation 2)
            ∂Error/∂W_jk = (use equation 3)

Recall Δw_xy = -η (∂E/∂w_xy). See Table 4.2 in Mitchell for results.
Derivation in Mitchell
[The equations on the following derivation slides were images; only the titles and margin notes survive]
Some Notation
By Chain Rule (since W_ji influences the rest of the network only by its influence on Net_j)…
Also remember this for later – we'll call it -δ_j
Remember that o_j is x_kj: output from j is input to k.
Remember: net_k = w_k1 x_k1 + … + w_kN x_kN
Using BP to Train ANNs
1. Initialize weights & bias to small random values (e.g., in [-0.3, 0.3])
2. Randomize order of training examples; for each, do:
   a) Propagate activity forward to output units (layers k → j → i):
         out_i = F( Σ_j w_ij × out_j )
Using BP to Train ANNs (continued)
   b) Compute "deviation" for output units:
         δ_i = F'(net_i) × (Teacher_i – out_i)
   c) Compute "deviation" for hidden units:
         δ_j = F'(net_j) × ( Σ_i w_ij × δ_i )
   d) Update weights:
         Δw_ij = η × δ_i × out_j
         Δw_jk = η × δ_j × out_k

   where F'(net) ≡ ∂F(net)/∂net
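A compact sketch of steps (a)–(d) for one hidden layer in Python (illustrative names, not the course's code; sigmoid units as above, so F'(net) = out(1 – out); biases omitted):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_step(x, teacher, W_jk, W_ij, eta=0.1):
    """One on-line backprop step for a k -> j -> i network."""
    # a) forward pass
    out_j = sigmoid(W_jk @ x)            # hidden activations
    out_i = sigmoid(W_ij @ out_j)        # output activations
    # b) "deviation" for output units: delta_i = F'(net_i) * (Teacher_i - out_i)
    delta_i = out_i * (1 - out_i) * (teacher - out_i)
    # c) "deviation" for hidden units: delta_j = F'(net_j) * sum_i w_ij * delta_i
    delta_j = out_j * (1 - out_j) * (W_ij.T @ delta_i)
    # d) update weights: delta_w = eta * delta * input-to-that-layer
    W_ij = W_ij + eta * np.outer(delta_i, out_j)
    W_jk = W_jk + eta * np.outer(delta_j, x)
    return W_jk, W_ij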
Using BP to Train ANNs (continued)
3. Repeat until training-set error rate is small enough (or until tuning-set error rate begins to rise – see later slide)
   Should use "early stopping" (i.e., minimize error on the tuning set; more details later)
4. Measure accuracy on the test set to estimate generalization (future accuracy)
Advantages of Neural Networks
• Universal representation (provided enough hidden units)
• Less greedy than tree learners
• In practice, good for problems with numeric inputs, and can also handle numeric outputs
• PHD: for many years, the best protein secondary-structure predictor
Disadvantages
• Models not very comprehensible
• Long training times
• Very sensitive to number of hidden units… as a result, largely being supplanted by SVMs (SVMs take a very different approach to getting non-linearity)
Looking Ahead
• The perceptron rule can also be thought of as modifying weights on data points, rather than on features
• Instead of processing all data (batch) vs. one at a time, could imagine processing 2 data points at a time, adjusting their relative weights based on their relative errors
• This is what Platt's SMO does (the SVM implementation in Weka)
Backup Slide to help with Derivative of Sigmoid
[The algebra on this slide was an image; see the derivation given after the logistic-function slide above]
TodayrsquosTodayrsquos TopicsTopics
bull PerceptronsPerceptronsbull Artificial Neural Networks (ANNs)Artificial Neural Networks (ANNs)bull BackpropagationBackpropagationbull Weight SpaceWeight Space
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ConnectionismConnectionism
PERCEPTRONS (Rosenblatt 1957)PERCEPTRONS (Rosenblatt 1957)bull among earliest work in machine among earliest work in machine
learninglearningbull died out in 1960rsquos (Minsky amp Papert died out in 1960rsquos (Minsky amp Papert
book)book)J
K
L
I
wij
wik
wil
Outputi = F(Wij outputj + Wik outputk + Wil outputl )
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron as Perceptron as ClassifierClassifierbull Output for N example X is sign(WOutput for N example X is sign(WX) X)
where sign is -1 or +1 (or use threshold where sign is -1 or +1 (or use threshold and 01)and 01)
bull Candidate Hypotheses real-valued weight Candidate Hypotheses real-valued weight vectorsvectors
bull Training Update W for each misclassified Training Update W for each misclassified example X (target class example X (target class tt predicted predicted oo) by) bybull WWii W Wii + + ((tt--oo)X)Xii
bull Here Here is learning rate parameteris learning rate parameter
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent Gradient Descent for the Perceptronfor the Perceptron(Assume no threshold for now and (Assume no threshold for now and
start with a common error measure)start with a common error measure) Error Error frac12 ( t ndash o ) frac12 ( t ndash o )
2
Networkrsquos output
Teacherrsquos answer (a constant wrt the weights) EE
WWkk
ΔΔWWjj - η
= (t ndash o)
EE
WWkk
(t ndash o)(t ndash o)
WWkk
= -(t ndash o) oo
WW kk
Remember o = WmiddotX
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Continuation of Continuation of DerivationDerivation
EE WWkk
= -(t ndash o) WWkk
(( sumsumk k ww k k x x kk))
= -(t ndash o) x k
So ΔWk = η (t ndash o) xk The Perceptron Rule
Stick in formula for
output
Also known as the delta rule and other names (with small variations in calc)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
As it looks in your text As it looks in your text (processing all data at once)hellip(processing all data at once)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Linear Separability Linear Separability
Consider a perceptron its output is
1 If W1X1+W2X2 + hellip + WnXn gt 0 otherwise
In terms of feature space W1X1 + W2X2 =
X2 = = W1X1
W2
-W1 W2 W2
X1+
+ + + + + + - + - - + + + + - + + - - -+ + - -+ - - -
- -
Hence can only classify examples if a ldquolinerdquo (hyerplane) can separate them
y = mx + b
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron Convergence Perceptron Convergence
TheoremTheorem (Rosemblatt 1957)(Rosemblatt 1957)
Perceptron no Hidden Units
If a set of examples is learnable the perceptron training rule will eventually find the necessary weightsHowever a perceptron can only learnrepresent linearly separable datasetcopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The (Infamous) XOR The (Infamous) XOR ProblemProblem
Input
0 00 11 01 1
Output
0110
a)b)c)d)
Exclusive OR (XOR)Not linearly separable
b
a c
d
0 1
1
A Neural Network SolutionX1
X2
X1
X2
1
1
-1-1
1
1 Let = 0 for all nodescopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Need for Hidden The Need for Hidden UnitsUnits
If there is one layer of enough hidden units (possibly 2N for Boolean functions) the input can be recoded (N = number of input units)
This recoding allows any mapping to be represented (Minsky amp Papert)Question How to provide an error signal to the interior units
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Hidden UnitsHidden Units
One ViewAllow a system to create its own internal representation ndash for which problem solving is easy A perceptron
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Advantages of Neural NetworksNeural Networks
Provide best predictive Provide best predictive accuracy for some accuracy for some problemsproblems
Being supplanted by Being supplanted by SVMrsquosSVMrsquos
Can represent a rich Can represent a rich class of conceptsclass of concepts
PositivenegativePositive
Saturday 40 chance of rainSunday 25 chance of rain
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ConnectionismConnectionism
PERCEPTRONS (Rosenblatt 1957)PERCEPTRONS (Rosenblatt 1957)bull among earliest work in machine among earliest work in machine
learninglearningbull died out in 1960rsquos (Minsky amp Papert died out in 1960rsquos (Minsky amp Papert
book)book)J
K
L
I
wij
wik
wil
Outputi = F(Wij outputj + Wik outputk + Wil outputl )
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron as Perceptron as ClassifierClassifierbull Output for N example X is sign(WOutput for N example X is sign(WX) X)
where sign is -1 or +1 (or use threshold where sign is -1 or +1 (or use threshold and 01)and 01)
bull Candidate Hypotheses real-valued weight Candidate Hypotheses real-valued weight vectorsvectors
bull Training Update W for each misclassified Training Update W for each misclassified example X (target class example X (target class tt predicted predicted oo) by) bybull WWii W Wii + + ((tt--oo)X)Xii
bull Here Here is learning rate parameteris learning rate parameter
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent Gradient Descent for the Perceptronfor the Perceptron(Assume no threshold for now and (Assume no threshold for now and
start with a common error measure)start with a common error measure) Error Error frac12 ( t ndash o ) frac12 ( t ndash o )
2
Networkrsquos output
Teacherrsquos answer (a constant wrt the weights) EE
WWkk
ΔΔWWjj - η
= (t ndash o)
EE
WWkk
(t ndash o)(t ndash o)
WWkk
= -(t ndash o) oo
WW kk
Remember o = WmiddotX
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Continuation of Continuation of DerivationDerivation
EE WWkk
= -(t ndash o) WWkk
(( sumsumk k ww k k x x kk))
= -(t ndash o) x k
So ΔWk = η (t ndash o) xk The Perceptron Rule
Stick in formula for
output
Also known as the delta rule and other names (with small variations in calc)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
As it looks in your text As it looks in your text (processing all data at once)hellip(processing all data at once)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Linear Separability Linear Separability
Consider a perceptron its output is
1 If W1X1+W2X2 + hellip + WnXn gt 0 otherwise
In terms of feature space W1X1 + W2X2 =
X2 = = W1X1
W2
-W1 W2 W2
X1+
+ + + + + + - + - - + + + + - + + - - -+ + - -+ - - -
- -
Hence can only classify examples if a ldquolinerdquo (hyerplane) can separate them
y = mx + b
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron Convergence Perceptron Convergence
TheoremTheorem (Rosemblatt 1957)(Rosemblatt 1957)
Perceptron no Hidden Units
If a set of examples is learnable the perceptron training rule will eventually find the necessary weightsHowever a perceptron can only learnrepresent linearly separable datasetcopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The (Infamous) XOR The (Infamous) XOR ProblemProblem
Input
0 00 11 01 1
Output
0110
a)b)c)d)
Exclusive OR (XOR)Not linearly separable
b
a c
d
0 1
1
A Neural Network SolutionX1
X2
X1
X2
1
1
-1-1
1
1 Let = 0 for all nodescopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Need for Hidden The Need for Hidden UnitsUnits
If there is one layer of enough hidden units (possibly 2N for Boolean functions) the input can be recoded (N = number of input units)
This recoding allows any mapping to be represented (Minsky amp Papert)Question How to provide an error signal to the interior units
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Hidden UnitsHidden Units
One ViewAllow a system to create its own internal representation ndash for which problem solving is easy A perceptron
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Advantages of Neural NetworksNeural Networks
Provide best predictive Provide best predictive accuracy for some accuracy for some problemsproblems
Being supplanted by Being supplanted by SVMrsquosSVMrsquos
Can represent a rich Can represent a rich class of conceptsclass of concepts
PositivenegativePositive
Saturday 40 chance of rainSunday 25 chance of rain
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron as Perceptron as ClassifierClassifierbull Output for N example X is sign(WOutput for N example X is sign(WX) X)
where sign is -1 or +1 (or use threshold where sign is -1 or +1 (or use threshold and 01)and 01)
bull Candidate Hypotheses real-valued weight Candidate Hypotheses real-valued weight vectorsvectors
bull Training Update W for each misclassified Training Update W for each misclassified example X (target class example X (target class tt predicted predicted oo) by) bybull WWii W Wii + + ((tt--oo)X)Xii
bull Here Here is learning rate parameteris learning rate parameter
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent Gradient Descent for the Perceptronfor the Perceptron(Assume no threshold for now and (Assume no threshold for now and
start with a common error measure)start with a common error measure) Error Error frac12 ( t ndash o ) frac12 ( t ndash o )
2
Networkrsquos output
Teacherrsquos answer (a constant wrt the weights) EE
WWkk
ΔΔWWjj - η
= (t ndash o)
EE
WWkk
(t ndash o)(t ndash o)
WWkk
= -(t ndash o) oo
WW kk
Remember o = WmiddotX
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Continuation of Continuation of DerivationDerivation
EE WWkk
= -(t ndash o) WWkk
(( sumsumk k ww k k x x kk))
= -(t ndash o) x k
So ΔWk = η (t ndash o) xk The Perceptron Rule
Stick in formula for
output
Also known as the delta rule and other names (with small variations in calc)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
As it looks in your text As it looks in your text (processing all data at once)hellip(processing all data at once)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Linear Separability Linear Separability
Consider a perceptron its output is
1 If W1X1+W2X2 + hellip + WnXn gt 0 otherwise
In terms of feature space W1X1 + W2X2 =
X2 = = W1X1
W2
-W1 W2 W2
X1+
+ + + + + + - + - - + + + + - + + - - -+ + - -+ - - -
- -
Hence can only classify examples if a ldquolinerdquo (hyerplane) can separate them
y = mx + b
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron Convergence Perceptron Convergence
TheoremTheorem (Rosemblatt 1957)(Rosemblatt 1957)
Perceptron no Hidden Units
If a set of examples is learnable the perceptron training rule will eventually find the necessary weightsHowever a perceptron can only learnrepresent linearly separable datasetcopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The (Infamous) XOR Problem

  Input     Output
  a) 0 0      0
  b) 0 1      1
  c) 1 0      1
  d) 1 1      0

Exclusive OR (XOR): not linearly separable.

[figure: the four points a–d plotted at the corners of the unit square; no single line separates {b, c} from {a, d}]

A Neural Network Solution
[figure: a two-layer network on inputs X1, X2 with connection weights 1, 1, −1, −1, 1, 1; let the threshold θ = 0 for all nodes]
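For concreteness, here is one hand-wired two-layer network that computes XOR in Python (the particular weights and thresholds below are illustrative choices for this sketch and are not necessarily the ones drawn on the slide):

  def step(z):
      return 1 if z > 0 else 0

  def xor_net(x1, x2):
      h_or  = step(x1 + x2 - 0.5)           # hidden unit that fires for OR
      h_and = step(x1 + x2 - 1.5)           # hidden unit that fires for AND
      return step(h_or - 2 * h_and - 0.5)   # output: OR and not AND, i.e. XOR

  for a in (0, 1):
      for b in (0, 1):
          print(a, b, xor_net(a, b))        # prints 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0

The point is simply that one hidden layer lets the network re-code the inputs so that the output unit faces a linearly separable problem.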
The Need for Hidden Units

If there is one layer of enough hidden units (possibly 2^N for Boolean functions), the input can be recoded (N = number of input units).
This recoding allows any mapping to be represented (Minsky & Papert).

Question: How do we provide an error signal to the interior units?
Hidden Units

One view: allow a system to create its own internal representation, one for which problem solving is easy.
[figure: a network whose top layer is labeled "A perceptron"]
Advantages of Neural Networks

• Provide the best predictive accuracy for some problems (though they are being supplanted by SVMs)
• Can represent a rich class of concepts, e.g. not just positive/negative labels but graded outputs such as "Saturday: 40% chance of rain; Sunday: 25% chance of rain"
Overview of ANNs

[figure: a layered network of input units, hidden units, and output units connected by weights; error signals flow back from the outputs, and a recurrent link is also shown]
Backpropagation

• Backpropagation involves a generalization of the perceptron rule
• Rumelhart, Parker, and Le Cun (and Bryson & Ho, 1969; Werbos, 1974) independently developed (1985) a technique for determining how to adjust the weights of interior ("hidden") units
• The derivation involves partial derivatives (hence, the threshold function must be differentiable)

[figure: the error signal ∂E/∂W_ij propagated back into the network]
Weight Space

• Given a neural-network layout, the weights are free parameters that define a space
• Each point in this weight space specifies a network
• Associated with each point is an error rate, E, over the training data
• Backprop performs gradient descent in weight space
Gradient Descent in Weight Space

[figure: the error surface E plotted over two weights, W1 and W2, with the gradient ∇E_w indicating the uphill direction]
The Gradient-Descent Rule

  ∇E(w) ≡ [ ∂E/∂w_0, ∂E/∂w_1, ∂E/∂w_2, …, ∂E/∂w_N ]

This is the "gradient", an N+1-dimensional vector (i.e., the 'slope' in weight space). Since we want to reduce errors, we want to go "downhill". We'll take a finite step in weight space:

  Δw = − η ∇E(w)        or        Δw_i = − η ∂E/∂w_i

("delta" = the change to w)

[figure: a downhill step on the error surface over W1 and W2]
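A toy numerical version of this rule (a sketch; the quadratic error surface below is invented purely so the behaviour is easy to verify):

  import numpy as np

  def E(w):                      # toy error surface with its minimum at w = (3, 3)
      return np.sum((w - 3.0) ** 2)

  def grad_E(w):                 # its gradient
      return 2.0 * (w - 3.0)

  w = np.array([0.0, 6.0])
  for _ in range(50):
      w = w - 0.1 * grad_E(w)    # delta_w = -eta * grad E(w), with eta = 0.1
  print(w)                       # converges toward [3. 3.]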
"On-Line" vs. "Batch" Backprop

• Technically, we should look at the error gradient for the entire training set before taking a step in weight space ("batch" Backprop)
• However, as presented, we take a step after each example ("on-line" Backprop)
  • Much faster convergence
  • Can reduce overfitting (since on-line Backprop is "noisy" gradient descent)
"On-Line" vs. "Batch" BP (continued)

• BATCH – add up the Δw vectors for every training example, then 'move' in weight space
• ON-LINE – "move" after each example (a.k.a. stochastic gradient descent)

[figure: two trajectories through weight space; note that Δw_i(BATCH) ≠ Δw_i(ON-LINE) for i > 1, and the final locations in weight space need not be the same for BATCH and ON-LINE]
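In code, the difference is only where the update is applied (a sketch reusing the delta rule from earlier; function and variable names are illustrative):

  import numpy as np

  def batch_epoch(W, X, T, eta=0.1):
      total = np.zeros_like(W)
      for x, t in zip(X, T):
          total += eta * (t - W @ x) * x      # accumulate delta-w over every example ...
      return W + total                         # ... then take a single step

  def online_epoch(W, X, T, eta=0.1):
      for x, t in zip(X, T):
          W = W + eta * (t - W @ x) * x        # step immediately after each example
      return W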
Need Derivatives: Replace Step (Threshold) by Sigmoid

Individual units:

  output_i = F( Σ_j weight_ij × output_j )

where

  F(input_i) = 1 / (1 + e^−(input_i − bias_i))

[figure: a single unit with its inputs, bias, and output]
Differentiating the Logistic Function

  out_i = 1 / (1 + e^−(Σ_j w_ji × out_j))

  F′(weighted in) = out_i (1 − out_i)

[figure: the logistic curve F(weighted in), rising from 0 toward 1 and passing through ½ where Σ_j W_j × out_j = 0]
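A quick numerical check of that identity (a sketch; assumes NumPy):

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  z = 0.7
  out = sigmoid(z)
  analytic = out * (1.0 - out)                                # F'(z) = out (1 - out)
  numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6    # finite-difference estimate
  print(analytic, numeric)                                    # the two agree to ~6 decimals

This identity is what makes the backprop "deviation" terms on the following slides cheap to compute: the derivative needs only the unit's own output.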
BP Calculations

Assume one layer of hidden units (standard topology), with layers k → j → i:

  1. Error ≡ ½ Σ ( Teacher_i − Output_i )²
  2.       = ½ Σ ( Teacher_i − F( Σ [ W_ij × Output_j ] ) )²
  3.       = ½ Σ ( Teacher_i − F( Σ [ W_ij × F( Σ W_jk × Output_k ) ] ) )²

Determine ∂Error/∂W_ij (use equation 2) and ∂Error/∂W_jk (use equation 3).

Recall Δw_xy = − η ( ∂E/∂w_xy ). See Table 4.2 in Mitchell for the results.
Derivation in Mitchell
Some Notation
By Chain Rule (since W_ji influences the rest of the network only by its influence on Net_j)…
Also remember this for later – we'll call it −δ_j
Remember that o_j is x_kj: the output from j is the input to k.
Remember net_k = w_k1 x_k1 + … + w_kN x_kN
Using BP to Train ANNs

1. Initialize the weights & biases to small random values (e.g., in [-0.3, 0.3])
2. Randomize the order of the training examples; for each one do:
   a) Propagate activity forward to the output units (layers k → j → i):

      out_i = F( Σ_j w_ij × out_j )
Using BP to Train ANNs (continued)

   b) Compute the "deviation" for the output units:

      δ_i = F′(net_i) × (Teacher_i − out_i)

   c) Compute the "deviation" for the hidden units:

      δ_j = F′(net_j) × Σ_i ( w_ij × δ_i )

   d) Update the weights:

      Δw_ij = η × δ_i × out_j
      Δw_jk = η × δ_j × out_k

   where F′(net_i) = ∂F(net_i)/∂net_i
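Steps (a)–(d) for a single training example can be written out as a minimal NumPy sketch (one hidden layer of sigmoid units; bias terms are omitted for brevity and all names are illustrative):

  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def backprop_step(x, teacher, W_jk, W_ij, eta=0.1):
      # (a) forward pass through layers k (inputs) -> j (hidden) -> i (outputs)
      out_j = sigmoid(W_jk @ x)
      out_i = sigmoid(W_ij @ out_j)
      # (b) deviation for output units: delta_i = F'(net_i) * (teacher_i - out_i)
      delta_i = out_i * (1 - out_i) * (teacher - out_i)
      # (c) deviation for hidden units: delta_j = F'(net_j) * sum_i w_ij * delta_i
      delta_j = out_j * (1 - out_j) * (W_ij.T @ delta_i)
      # (d) weight updates: delta_w = eta * delta * upstream output
      W_ij = W_ij + eta * np.outer(delta_i, out_j)
      W_jk = W_jk + eta * np.outer(delta_j, x)
      return W_jk, W_ij

Here W_jk has one row per hidden unit and one column per input, and W_ij one row per output unit and one column per hidden unit, so the matrix products implement the Σ terms on the slides.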
Using BP to Train ANNs (continued)

3. Repeat until the training-set error rate is small enough (or until the tuning-set error rate begins to rise – see later slide).
   Should use "early stopping" (i.e., minimize error on the tuning set; more details later)
4. Measure accuracy on the test set to estimate generalization (future accuracy)
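The stopping criterion in step 3 might be organized like this sketch (train_one_epoch and error_rate stand in for routines the slides do not spell out; patience and max_epochs are illustrative parameters):

  def train_with_early_stopping(W, train_one_epoch, error_rate,
                                train_set, tune_set,
                                max_epochs=200, patience=10):
      best_err, best_W, best_epoch = float("inf"), W, 0
      for epoch in range(max_epochs):
          W = train_one_epoch(W, train_set)      # one pass of on-line Backprop
          err = error_rate(W, tune_set)          # early stopping watches the TUNING set
          if err < best_err:
              best_err, best_W, best_epoch = err, W, epoch
          if epoch - best_epoch >= patience:     # tuning-set error no longer improving
              break
      return best_W                              # evaluate this network on the test set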
Advantages of Neural Networks

• Universal representation (provided enough hidden units)
• Less greedy than tree learners
• In practice, good for problems with numeric inputs; can also handle numeric outputs
• PHD: for many years the best protein secondary-structure predictor
Disadvantages

• Models are not very comprehensible
• Long training times
• Very sensitive to the number of hidden units… as a result, largely being supplanted by SVMs (which take a very different approach to getting non-linearity)
Looking Ahead

• The perceptron rule can also be thought of as modifying weights on data points rather than on features
• Instead of processing all the data (batch) vs. one example at a time, one could imagine processing 2 data points at a time, adjusting their relative weights based on their relative errors
• This is what Platt's SMO does (the SVM implementation in Weka)
Backup Slide to help with Derivative of Sigmoid
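Since the backup slide itself is an image, here is a standard sketch of the derivative it refers to (reconstructed, not copied from the slide): for out = 1 / (1 + e^(−net)),

  d(out)/d(net) = e^(−net) / (1 + e^(−net))²
                = [ 1 / (1 + e^(−net)) ] × [ e^(−net) / (1 + e^(−net)) ]
                = out × (1 − out),

which is exactly the F′(net) = out (1 − out) identity used in the δ computations above.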
Gradient Descent Gradient Descent for the Perceptronfor the Perceptron(Assume no threshold for now and (Assume no threshold for now and
start with a common error measure)start with a common error measure) Error Error frac12 ( t ndash o ) frac12 ( t ndash o )
2
Networkrsquos output
Teacherrsquos answer (a constant wrt the weights) EE
WWkk
ΔΔWWjj - η
= (t ndash o)
EE
WWkk
(t ndash o)(t ndash o)
WWkk
= -(t ndash o) oo
WW kk
Remember o = WmiddotX
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Continuation of Continuation of DerivationDerivation
EE WWkk
= -(t ndash o) WWkk
(( sumsumk k ww k k x x kk))
= -(t ndash o) x k
So ΔWk = η (t ndash o) xk The Perceptron Rule
Stick in formula for
output
Also known as the delta rule and other names (with small variations in calc)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
As it looks in your text As it looks in your text (processing all data at once)hellip(processing all data at once)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Linear Separability Linear Separability
Consider a perceptron its output is
1 If W1X1+W2X2 + hellip + WnXn gt 0 otherwise
In terms of feature space W1X1 + W2X2 =
X2 = = W1X1
W2
-W1 W2 W2
X1+
+ + + + + + - + - - + + + + - + + - - -+ + - -+ - - -
- -
Hence can only classify examples if a ldquolinerdquo (hyerplane) can separate them
y = mx + b
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron Convergence Perceptron Convergence
TheoremTheorem (Rosemblatt 1957)(Rosemblatt 1957)
Perceptron no Hidden Units
If a set of examples is learnable the perceptron training rule will eventually find the necessary weightsHowever a perceptron can only learnrepresent linearly separable datasetcopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The (Infamous) XOR The (Infamous) XOR ProblemProblem
Input
0 00 11 01 1
Output
0110
a)b)c)d)
Exclusive OR (XOR)Not linearly separable
b
a c
d
0 1
1
A Neural Network SolutionX1
X2
X1
X2
1
1
-1-1
1
1 Let = 0 for all nodescopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Need for Hidden The Need for Hidden UnitsUnits
If there is one layer of enough hidden units (possibly 2N for Boolean functions) the input can be recoded (N = number of input units)
This recoding allows any mapping to be represented (Minsky amp Papert)Question How to provide an error signal to the interior units
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Hidden UnitsHidden Units
One ViewAllow a system to create its own internal representation ndash for which problem solving is easy A perceptron
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Advantages of Neural NetworksNeural Networks
Provide best predictive Provide best predictive accuracy for some accuracy for some problemsproblems
Being supplanted by Being supplanted by SVMrsquosSVMrsquos
Can represent a rich Can represent a rich class of conceptsclass of concepts
PositivenegativePositive
Saturday 40 chance of rainSunday 25 chance of rain
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Continuation of Continuation of DerivationDerivation
EE WWkk
= -(t ndash o) WWkk
(( sumsumk k ww k k x x kk))
= -(t ndash o) x k
So ΔWk = η (t ndash o) xk The Perceptron Rule
Stick in formula for
output
Also known as the delta rule and other names (with small variations in calc)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
As it looks in your text As it looks in your text (processing all data at once)hellip(processing all data at once)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Linear Separability Linear Separability
Consider a perceptron its output is
1 If W1X1+W2X2 + hellip + WnXn gt 0 otherwise
In terms of feature space W1X1 + W2X2 =
X2 = = W1X1
W2
-W1 W2 W2
X1+
+ + + + + + - + - - + + + + - + + - - -+ + - -+ - - -
- -
Hence can only classify examples if a ldquolinerdquo (hyerplane) can separate them
y = mx + b
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron Convergence Perceptron Convergence
TheoremTheorem (Rosemblatt 1957)(Rosemblatt 1957)
Perceptron no Hidden Units
If a set of examples is learnable the perceptron training rule will eventually find the necessary weightsHowever a perceptron can only learnrepresent linearly separable datasetcopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The (Infamous) XOR The (Infamous) XOR ProblemProblem
Input
0 00 11 01 1
Output
0110
a)b)c)d)
Exclusive OR (XOR)Not linearly separable
b
a c
d
0 1
1
A Neural Network SolutionX1
X2
X1
X2
1
1
-1-1
1
1 Let = 0 for all nodescopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Need for Hidden The Need for Hidden UnitsUnits
If there is one layer of enough hidden units (possibly 2N for Boolean functions) the input can be recoded (N = number of input units)
This recoding allows any mapping to be represented (Minsky amp Papert)Question How to provide an error signal to the interior units
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Hidden UnitsHidden Units
One ViewAllow a system to create its own internal representation ndash for which problem solving is easy A perceptron
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Advantages of Neural NetworksNeural Networks
Provide best predictive Provide best predictive accuracy for some accuracy for some problemsproblems
Being supplanted by Being supplanted by SVMrsquosSVMrsquos
Can represent a rich Can represent a rich class of conceptsclass of concepts
PositivenegativePositive
Saturday 40 chance of rainSunday 25 chance of rain
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
As it looks in your text As it looks in your text (processing all data at once)hellip(processing all data at once)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Linear Separability Linear Separability
Consider a perceptron its output is
1 If W1X1+W2X2 + hellip + WnXn gt 0 otherwise
In terms of feature space W1X1 + W2X2 =
X2 = = W1X1
W2
-W1 W2 W2
X1+
+ + + + + + - + - - + + + + - + + - - -+ + - -+ - - -
- -
Hence can only classify examples if a ldquolinerdquo (hyerplane) can separate them
y = mx + b
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron Convergence Perceptron Convergence
TheoremTheorem (Rosemblatt 1957)(Rosemblatt 1957)
Perceptron no Hidden Units
If a set of examples is learnable the perceptron training rule will eventually find the necessary weightsHowever a perceptron can only learnrepresent linearly separable datasetcopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The (Infamous) XOR The (Infamous) XOR ProblemProblem
Input
0 00 11 01 1
Output
0110
a)b)c)d)
Exclusive OR (XOR)Not linearly separable
b
a c
d
0 1
1
A Neural Network SolutionX1
X2
X1
X2
1
1
-1-1
1
1 Let = 0 for all nodescopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Need for Hidden The Need for Hidden UnitsUnits
If there is one layer of enough hidden units (possibly 2N for Boolean functions) the input can be recoded (N = number of input units)
This recoding allows any mapping to be represented (Minsky amp Papert)Question How to provide an error signal to the interior units
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Hidden UnitsHidden Units
One ViewAllow a system to create its own internal representation ndash for which problem solving is easy A perceptron
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Advantages of Neural NetworksNeural Networks
Provide best predictive Provide best predictive accuracy for some accuracy for some problemsproblems
Being supplanted by Being supplanted by SVMrsquosSVMrsquos
Can represent a rich Can represent a rich class of conceptsclass of concepts
PositivenegativePositive
Saturday 40 chance of rainSunday 25 chance of rain
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train ANNs (continued)

   b) Compute the "deviation" for output units:
      δ_i = F'(net_i) × (Teacher_i − out_i)
   c) Compute the "deviation" for hidden units:
      δ_j = F'(net_j) × Σ_i (w_ij × δ_i)
   d) Update the weights:
      Δw_ij = η × δ_i × out_j
      Δw_jk = η × δ_j × out_k

where F'(net_i) = ∂F(net_i) / ∂net_i
Using BP to Train ANNs (continued)

3. Repeat until the training-set error rate is small enough (or until the tuning-set error rate begins to rise; see a later slide). Should use "early stopping" (i.e., minimize error on the tuning set; more details later).
4. Measure accuracy on the test set to estimate generalization (future accuracy).
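Putting steps 1–4 together, a compact on-line Backprop sketch for a single hidden layer of sigmoid units might look like the following. It follows the update rules above, but the dataset, learning rate, epoch count, and stopping test are placeholder choices of my own rather than anything prescribed by the slides.

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def train_bp(X, T, n_hidden=4, eta=0.5, epochs=5000, seed=0):
    """On-line Backprop, one hidden layer; biases folded in as weights on a constant-1 input."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])                   # constant 1 -> hidden-layer biases
    W_jk = rng.uniform(-0.3, 0.3, (n_hidden, Xb.shape[1]))      # step 1: small random weights
    W_ij = rng.uniform(-0.3, 0.3, (T.shape[1], n_hidden + 1))   # extra column -> output-layer biases
    for _ in range(epochs):
        for idx in rng.permutation(len(Xb)):                    # step 2: randomized example order
            x, t = Xb[idx], T[idx]
            out_j = sigmoid(W_jk @ x)                           # step 2a: forward pass
            out_jb = np.append(out_j, 1.0)
            out_i = sigmoid(W_ij @ out_jb)
            delta_i = out_i * (1 - out_i) * (t - out_i)                  # step 2b: output deviation
            delta_j = out_j * (1 - out_j) * (W_ij[:, :-1].T @ delta_i)   # step 2c: hidden deviation
            W_ij += eta * np.outer(delta_i, out_jb)             # step 2d: weight updates
            W_jk += eta * np.outer(delta_j, x)
    return W_jk, W_ij

def predict(W_jk, W_ij, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    out_j = sigmoid(Xb @ W_jk.T)
    out_jb = np.hstack([out_j, np.ones((len(X), 1))])
    return sigmoid(out_jb @ W_ij.T)

# Example: XOR. Predictions should move toward 0/1/1/0, though convergence is
# not guaranteed for every random seed or learning rate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
print(np.round(predict(*train_bp(X, T), X), 2))

With a tuning set available, the epoch loop would instead stop when tuning-set error starts to rise (early stopping), as described in step 3.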
Advantages of Neural Networks

• Universal representation (provided enough hidden units)
• Less greedy than tree learners
• In practice, good for problems with numeric inputs; can also handle numeric outputs
• PHD: for many years the best protein secondary-structure predictor
Disadvantages

• Models not very comprehensible
• Long training times
• Very sensitive to the number of hidden units… as a result, largely being supplanted by SVMs (SVMs take a very different approach to getting non-linearity)
Looking Ahead

• The perceptron rule can also be thought of as modifying weights on data points rather than on features
• Instead of processing all the data (batch) vs. one example at a time, one could imagine processing 2 data points at a time, adjusting their relative weights based on their relative errors
• This is what Platt's SMO does (the SVM implementation in Weka)
Backup Slide to Help with Derivative of Sigmoid
Linear Separability

Consider a perceptron; its output is
  1 if W_1X_1 + W_2X_2 + … + W_nX_n > θ
  0 otherwise

In terms of feature space:
  W_1X_1 + W_2X_2 = θ
  X_2 = (θ − W_1X_1) / W_2 = (−W_1 / W_2) X_1 + θ / W_2     (cf. y = mx + b)

Hence a perceptron can only classify examples if a "line" (hyperplane) can separate them.

[Figure: feature space with + and − examples on either side of the separating line]
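As a small illustration (the weights and points below are invented), the perceptron's decision is just a thresholded dot product, and in two dimensions the boundary W_1X_1 + W_2X_2 = θ is exactly the line X_2 = (−W_1/W_2) X_1 + θ/W_2:

import numpy as np

W = np.array([2.0, -1.0])    # W1, W2 (hypothetical)
theta = 0.5                  # threshold

def perceptron_output(x):
    """1 if W.x exceeds the threshold, else 0."""
    return int(np.dot(W, x) > theta)

# Slope and intercept of the separating line X2 = (-W1/W2) X1 + theta/W2
slope, intercept = -W[0] / W[1], theta / W[1]
print(slope, intercept)                          # 2.0, -0.5
print(perceptron_output(np.array([1.0, 0.0])))   # 2.0 > 0.5  -> 1
print(perceptron_output(np.array([0.0, 1.0])))   # -1.0 > 0.5 -> 0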
Perceptron Convergence Theorem (Rosenblatt, 1957)

Perceptron = no hidden units.

If a set of examples is learnable, the perceptron training rule will eventually find the necessary weights. However, a perceptron can only learn/represent linearly separable datasets.
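To see the convergence theorem in action, here is a sketch of the perceptron training rule on a small linearly separable set (the data, learning rate, and epoch cap are mine); on separable data the loop stops once every example is classified correctly, as the theorem guarantees.

import numpy as np

# Toy linearly separable data: label 1 when x1 + x2 > 1, else 0
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [0, 2]], dtype=float)
t = np.array([0, 0, 0, 1, 1, 1])

Xb = np.hstack([X, np.ones((len(X), 1))])   # fold the threshold in as a bias weight
w = np.zeros(Xb.shape[1])
eta = 0.1

for epoch in range(100):
    errors = 0
    for x, target in zip(Xb, t):
        o = int(np.dot(w, x) > 0)
        if o != target:
            w += eta * (target - o) * x      # perceptron rule: w <- w + eta (t - o) x
            errors += 1
    if errors == 0:                          # converged: all examples classified correctly
        break
print(epoch, w)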
The (Infamous) XOR Problem

Exclusive OR (XOR) is not linearly separable:

     Input    Output
  a)  0 0       0
  b)  0 1       1
  c)  1 0       1
  d)  1 1       0

[Figure: the four points a, b, c, d on the unit square; no single line separates the 1s from the 0s]

A Neural Network Solution:

[Figure: X1 and X2 feed two hidden units, which feed the output unit; the weights shown are 1, 1, −1, −1, 1, 1. Let θ = 0 for all nodes.]
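Reading the figure as the usual two-hidden-unit construction (one unit detecting X1 AND NOT X2, the other detecting NOT X1 AND X2, OR'd together at the output) is an interpretation of the ±1 weights shown, not something stated explicitly on the slide; under that reading, a quick check with step units and θ = 0 reproduces the XOR truth table:

def step(net, theta=0.0):
    """Threshold unit: fire iff the net input exceeds theta (theta = 0 on this slide)."""
    return 1 if net > theta else 0

def xor_net(x1, x2):
    h1 = step(1 * x1 + (-1) * x2)   # assumed wiring: detects x1 AND NOT x2
    h2 = step((-1) * x1 + 1 * x2)   # assumed wiring: detects NOT x1 AND x2
    return step(1 * h1 + 1 * h2)    # OR of the two hidden units

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))  # 0, 1, 1, 0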
The Need for Hidden Units

If there is one layer of enough hidden units (possibly 2^N for Boolean functions), the input can be recoded (N = number of input units).

This recoding allows any mapping to be represented (Minsky & Papert).

Question: how do we provide an error signal to the interior units?
Hidden Units

One view: allow a system to create its own internal representation, one for which problem solving is easy.

[Figure: a perceptron]
Advantages of Neural Networks

• Provide the best predictive accuracy for some problems
• Being supplanted by SVMs
• Can represent a rich class of concepts

[Figure: example outputs – positive/negative labels; "Saturday: 40% chance of rain, Sunday: 25% chance of rain"]
Overview of ANNs

[Figure: input units feeding hidden units feeding output units; weights label the links, an error signal is applied at the outputs, and a recurrent link is shown]
Backpropagation
Backpropagation

• Backpropagation involves a generalization of the perceptron rule
• Rumelhart, Parker, and Le Cun (and Bryson & Ho, 1969; Werbos, 1974) independently developed (1985) a technique for determining how to adjust the weights of interior ("hidden") units
• The derivation involves partial derivatives (hence the threshold function must be differentiable)

[Figure: the error signal ∂E/∂W_ij propagated back through the network]
Weight Space

• Given a neural-network layout, the weights are free parameters that define a space
• Each point in this weight space specifies a network
• Associated with each point is an error rate, E, over the training data
• Backprop performs gradient descent in weight space
Gradient Descent in Weight Space

[Figure: the error surface E over weights W1 and W2, with the gradient ∂E/∂w and the descent step moving downhill]
The Gradient-Descent Rule

∇E(w) = [ ∂E/∂w_0, ∂E/∂w_1, ∂E/∂w_2, …, ∂E/∂w_N ]

The "gradient" is an N+1-dimensional vector (i.e., the 'slope' in weight space). Since we want to reduce errors, we want to go "downhill". We'll take a finite step in weight space:

  Δw = -η ∇E(w)     or     Δw_i = -η ∂E/∂w_i

("delta" = the change to w)

[Figure: error surface over W1 and W2 with a step taken down the slope]
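As a toy illustration of Δw_i = -η ∂E/∂w_i (the quadratic error surface and step size below are invented for the example), each update moves the weight vector a short distance downhill until it settles near the minimum:

import numpy as np

def E(w):
    """A toy error surface whose minimum is at w = (1, -2)."""
    return 0.5 * ((w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2)

def grad_E(w):
    """Gradient of the toy error surface."""
    return np.array([w[0] - 1.0, w[1] + 2.0])

eta = 0.1
w = np.array([4.0, 3.0])        # arbitrary starting point in weight space
for _ in range(100):
    w = w - eta * grad_E(w)     # delta-w = -eta * gradient: a finite step downhill
print(np.round(w, 3), round(E(w), 6))   # w approaches (1, -2); E(w) approaches 0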
"On-Line" vs. "Batch" Backprop

• Technically, we should look at the error gradient for the entire training set before taking a step in weight space ("batch" Backprop)
• However, as presented, we take a step after each example ("on-line" Backprop)
  • Much faster convergence
  • Can reduce overfitting (since on-line Backprop is "noisy" gradient descent)
"On-Line" vs. "Batch" BP (continued)

BATCH – add the Δw vectors for every training example, then 'move' in weight space.

ON-LINE – "move" after each example (a.k.a. stochastic gradient descent).

[Figure: two trajectories through weight space, w1 → w2 → w3, one for BATCH and one for ON-LINE. Note that w_i(BATCH) ≠ w_i(ON-LINE) for i > 1; the final locations in weight space need not be the same for BATCH and ON-LINE.]
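A minimal sketch of the two schedules (a linear unit with squared error is used only to keep the gradient simple; everything here is illustrative): BATCH sums the per-example Δw contributions before moving, while ON-LINE moves immediately after each example, so the two trace different paths through weight space.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))            # 20 illustrative examples, 3 features
w_true = np.array([1.0, -2.0, 0.5])
t = X @ w_true                          # noiseless targets for a linear unit

def grad(w, x, target):
    """Gradient of 1/2 (target - w.x)^2 with respect to w, for one example."""
    return -(target - w @ x) * x

eta = 0.02
w_batch = np.zeros(3)
w_online = np.zeros(3)
for _ in range(50):
    # BATCH: accumulate the delta-w of every example, then take a single step
    total_step = np.zeros(3)
    for x, target in zip(X, t):
        total_step += -eta * grad(w_batch, x, target)
    w_batch += total_step
    # ON-LINE (stochastic): step immediately after each example
    for x, target in zip(X, t):
        w_online += -eta * grad(w_online, x, target)

print(np.round(w_batch, 3), np.round(w_online, 3))  # both drift toward w_true along different paths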
Need Derivatives: Replace Step (Threshold) by Sigmoid

Individual units:

  output_i = F(Σ_j weight_ij × output_j)

where

  F(input_i) = 1 / (1 + e^-(input_i − bias_i))

[Figure: a single unit receiving inputs from units j, with its bias, input, and output labeled]
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Perceptron Convergence Perceptron Convergence
TheoremTheorem (Rosemblatt 1957)(Rosemblatt 1957)
Perceptron no Hidden Units
If a set of examples is learnable the perceptron training rule will eventually find the necessary weightsHowever a perceptron can only learnrepresent linearly separable datasetcopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The (Infamous) XOR The (Infamous) XOR ProblemProblem
Input
0 00 11 01 1
Output
0110
a)b)c)d)
Exclusive OR (XOR)Not linearly separable
b
a c
d
0 1
1
A Neural Network SolutionX1
X2
X1
X2
1
1
-1-1
1
1 Let = 0 for all nodescopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Need for Hidden The Need for Hidden UnitsUnits
If there is one layer of enough hidden units (possibly 2N for Boolean functions) the input can be recoded (N = number of input units)
This recoding allows any mapping to be represented (Minsky amp Papert)Question How to provide an error signal to the interior units
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Hidden UnitsHidden Units
One ViewAllow a system to create its own internal representation ndash for which problem solving is easy A perceptron
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Advantages of Neural NetworksNeural Networks
Provide best predictive Provide best predictive accuracy for some accuracy for some problemsproblems
Being supplanted by Being supplanted by SVMrsquosSVMrsquos
Can represent a rich Can represent a rich class of conceptsclass of concepts
PositivenegativePositive
Saturday 40 chance of rainSunday 25 chance of rain
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The (Infamous) XOR The (Infamous) XOR ProblemProblem
Input
0 00 11 01 1
Output
0110
a)b)c)d)
Exclusive OR (XOR)Not linearly separable
b
a c
d
0 1
1
A Neural Network SolutionX1
X2
X1
X2
1
1
-1-1
1
1 Let = 0 for all nodescopy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Need for Hidden The Need for Hidden UnitsUnits
If there is one layer of enough hidden units (possibly 2N for Boolean functions) the input can be recoded (N = number of input units)
This recoding allows any mapping to be represented (Minsky amp Papert)Question How to provide an error signal to the interior units
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Hidden UnitsHidden Units
One ViewAllow a system to create its own internal representation ndash for which problem solving is easy A perceptron
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Advantages of Neural NetworksNeural Networks
Provide best predictive Provide best predictive accuracy for some accuracy for some problemsproblems
Being supplanted by Being supplanted by SVMrsquosSVMrsquos
Can represent a rich Can represent a rich class of conceptsclass of concepts
PositivenegativePositive
Saturday 40 chance of rainSunday 25 chance of rain
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Need for Hidden The Need for Hidden UnitsUnits
If there is one layer of enough hidden units (possibly 2N for Boolean functions) the input can be recoded (N = number of input units)
This recoding allows any mapping to be represented (Minsky amp Papert)Question How to provide an error signal to the interior units
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Hidden UnitsHidden Units
One ViewAllow a system to create its own internal representation ndash for which problem solving is easy A perceptron
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Advantages of Neural NetworksNeural Networks
Provide best predictive Provide best predictive accuracy for some accuracy for some problemsproblems
Being supplanted by Being supplanted by SVMrsquosSVMrsquos
Can represent a rich Can represent a rich class of conceptsclass of concepts
PositivenegativePositive
Saturday 40 chance of rainSunday 25 chance of rain
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train ANNs (continued)
3. Repeat until the training-set error rate is small enough (or until the tuning-set error rate begins to rise – see later slide). Should use "early stopping" (i.e., minimize error on the tuning set; more details later).
4. Measure accuracy on the test set to estimate generalization (future accuracy).
(A minimal code sketch of steps 1–4 follows.)
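To make steps 1–4 concrete, here is a minimal on-line Backprop sketch (an illustrative toy only: the 2-3-1 topology, the learning rate of 0.1, the random placeholder data, and the omission of bias terms are assumptions made here for brevity, not prescriptions from the slides).

import numpy as np

rng = np.random.default_rng(0)

def F(z):                                        # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: initialize weights to small random values in [-0.3, 0.3] (biases omitted for brevity)
W_jk = rng.uniform(-0.3, 0.3, size=(3, 2))       # input layer k  -> hidden layer j
W_ij = rng.uniform(-0.3, 0.3, size=(1, 3))       # hidden layer j -> output layer i
eta = 0.1                                        # learning rate (an assumed value)

def train_epoch(X, T):
    global W_jk, W_ij
    # Step 2: randomize example order; update after each example (on-line Backprop)
    for n in rng.permutation(len(X)):
        x, t = X[n], T[n]
        out_j = F(W_jk @ x)                                  # step 2a: forward propagation
        out_i = F(W_ij @ out_j)
        delta_i = out_i * (1 - out_i) * (t - out_i)          # step 2b: output deviations
        delta_j = out_j * (1 - out_j) * (W_ij.T @ delta_i)   # step 2c: hidden deviations
        W_ij += eta * np.outer(delta_i, out_j)               # step 2d: weight updates
        W_jk += eta * np.outer(delta_j, x)

def error_rate(X, T):
    preds = [F(W_ij @ F(W_jk @ x))[0] > 0.5 for x in X]
    return float(np.mean([int(p) != int(t) for p, t in zip(preds, T)]))

# Random placeholder data standing in for real train / tune / test splits
X_train, T_train = rng.normal(size=(40, 2)), rng.integers(0, 2, size=40)
X_tune,  T_tune  = rng.normal(size=(20, 2)), rng.integers(0, 2, size=20)
X_test,  T_test  = rng.normal(size=(20, 2)), rng.integers(0, 2, size=20)

# Step 3: repeat, stopping early when the tuning-set error stops improving
best, since_best = np.inf, 0
for epoch in range(500):
    train_epoch(X_train, T_train)
    e = error_rate(X_tune, T_tune)
    if e < best:
        best, since_best = e, 0
    else:
        since_best += 1
    if since_best > 10:
        break

# Step 4: estimate generalization on the held-out test set
print("estimated future error rate:", error_rate(X_test, T_test))

A fuller version would also keep a copy of the weights that achieved the best tuning-set error and restore them before the test-set measurement.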
Advantages of Neural Networks
• Universal representation (provided enough hidden units)
• Less greedy than tree learners
• In practice, good for problems with numeric inputs, and can also handle numeric outputs
• PHD: for many years the best protein secondary-structure predictor
Disadvantages
• Models not very comprehensible
• Long training times
• Very sensitive to the number of hidden units… as a result, largely being supplanted by SVMs (SVMs take a very different approach to getting non-linearity)
Looking Ahead
• The perceptron rule can also be thought of as modifying weights on data points rather than features
• Instead of processing all the data (batch) vs. one example at a time, one could imagine processing 2 data points at a time, adjusting their relative weights based on their relative errors
• This is what Platt's SMO does (the SVM implementation in Weka)
Backup Slide to help with Derivative of Sigmoid
© Jude Shavlik 2006, David Page 2010. CS 760 – Machine Learning (UW-Madison)
Advantages of Advantages of Neural NetworksNeural Networks
Provide best predictive Provide best predictive accuracy for some accuracy for some problemsproblems
Being supplanted by Being supplanted by SVMrsquosSVMrsquos
Can represent a rich Can represent a rich class of conceptsclass of concepts
PositivenegativePositive
Saturday 40 chance of rainSunday 25 chance of rain
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Overview of ANNsOverview of ANNs
Recurrentlink
Output units
Input units
Hidden units
error
weight
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
BackpropagationBackpropagation
bull Backpropagation involves a generalization of the Backpropagation involves a generalization of the perceptron ruleperceptron rule
bull Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Rumelhart Parker and Le Cun (and Bryson amp Ho 1969) Werbos 1974) independently developed (1985) a Werbos 1974) independently developed (1985) a technique for determining how to adjust weights of technique for determining how to adjust weights of interior (ldquohiddenrdquo) unitsinterior (ldquohiddenrdquo) units
bull Derivation involves partial derivatives Derivation involves partial derivatives (hence threshold function must be differentiable)(hence threshold function must be differentiable)
error signal
E Wij
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Logistic Function

out i = 1 / (1 + e^-(Σ wji x outj))

F '(wgt'ed in) = out i ( 1 - out i )

[Plot: F(wgt'ed in) as a function of Σ wj x outj, rising from 0 through ½ toward 1.]
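A small check of this identity (an illustrative sketch with assumed helper names): the derivative expressed through the output, out(1 - out), matches a numerical derivative of the logistic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """F'(weighted input) written through the unit's output: out * (1 - out)."""
    out = sigmoid(z)
    return out * (1.0 - out)

z = np.array([-2.0, 0.0, 3.0])
numeric = (sigmoid(z + 1e-6) - sigmoid(z - 1e-6)) / 2e-6   # central difference
print(np.allclose(numeric, sigmoid_prime(z)))               # True
```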
BP Calculations

Assume one layer of hidden units (std topology)

1. Error ≡ ½ Σ ( Teacheri – Outputi )²
2.       = ½ Σ ( Teacheri – F( Σ [Wij x Outputj] ) )²
3.       = ½ Σ ( Teacheri – F( Σ [Wij x F( Σ (Wjk x Outputk) )] ) )²

Determine  ∂Error/∂Wij   (use equation 2)
           ∂Error/∂Wjk   (use equation 3)

recall  Δwxy = -η ( ∂E / ∂wxy )

See Table 4.2 in Mitchell for results.

[Diagram: network layers labeled k → j → i.]
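As a brief sketch of where equation 2 leads for a single output-layer weight (this worked step is added for readability and is not one of the original slides; it uses the chain rule and the logistic identity F' = out(1 – out)):

```latex
\frac{\partial E}{\partial W_{ij}}
  = \frac{\partial E}{\partial net_i}\,\frac{\partial net_i}{\partial W_{ij}}
  = -\,(Teacher_i - out_i)\,F'(net_i)\; out_j ,
\qquad
\Delta W_{ij} = -\eta\,\frac{\partial E}{\partial W_{ij}}
             = \eta\,\underbrace{F'(net_i)\,(Teacher_i - out_i)}_{\delta_i}\; out_j .
```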
Derivation in Mitchell
Some Notation
By Chain Rule (since Wji influences rest of network only by its influence on Netj)…
Also remember this for later – We'll call it -δj
Remember that oj is xkj (output from j is input to k)
Remember: netk = wk1 xk1 + … + wkN xkN
Using BP to Train ANN's

1. Initiate weights & bias to small random values (eg, in [-0.3, 0.3])
2. Randomize order of training examples; for each, do:
   a) Propagate activity forward to output units:
      outi = F( Σj wij x outj )

[Diagram: network layers labeled k → j → i.]
Using BP to Train ANN's (continued)

b) Compute "deviation" for output units:   δi = F '( neti ) x (Teacheri - outi)
c) Compute "deviation" for hidden units:   δj = F '( netj ) x Σi ( wij x δi )
d) Update weights:                         Δwij = η x δi x outj
                                           Δwjk = η x δj x outk

where F '( net ) = ∂F(net) / ∂net
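Pulling steps a)–d) together, a minimal on-line Backprop update for one example might look like the sketch below. It is an illustration under assumptions (no bias terms, one hidden layer, invented array shapes and names), not code from the deck.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one_example(x, teacher, W_jk, W_ij, eta=0.1):
    """One on-line update for a k -> j -> i network of sigmoid units."""
    # a) propagate activity forward to the output units
    net_j = W_jk @ x                      # weighted input reaching each hidden unit j
    out_j = sigmoid(net_j)
    net_i = W_ij @ out_j                  # weighted input reaching each output unit i
    out_i = sigmoid(net_i)

    # b) "deviation" for output units: delta_i = F'(net_i) * (teacher_i - out_i)
    delta_i = out_i * (1.0 - out_i) * (teacher - out_i)

    # c) "deviation" for hidden units: delta_j = F'(net_j) * sum_i(w_ij * delta_i)
    delta_j = out_j * (1.0 - out_j) * (W_ij.T @ delta_i)

    # d) update weights: delta_w = eta * delta * out
    W_ij = W_ij + eta * np.outer(delta_i, out_j)
    W_jk = W_jk + eta * np.outer(delta_j, x)
    return W_jk, W_ij, out_i

# Hypothetical shapes: 3 inputs (k), 4 hidden units (j), 2 outputs (i); small random init.
W_jk = np.random.uniform(-0.3, 0.3, size=(4, 3))
W_ij = np.random.uniform(-0.3, 0.3, size=(2, 4))
W_jk, W_ij, out = backprop_one_example(np.array([1.0, 0.5, -1.0]), np.array([1.0, 0.0]), W_jk, W_ij)
print(out)
```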
Using BP to Train ANN's (continued)

3. Repeat until training-set error rate small enough (or until tuning-set error rate begins to rise – see later slide)
   Should use "early stopping" (ie, minimize error on the tuning set; more details later)
4. Measure accuracy on test set to estimate generalization (future accuracy)
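A rough sketch of this outer training loop with early stopping on the tuning set; the helpers `one_backprop_epoch` and `error_rate`, and the patience cutoff, are hypothetical placeholders rather than anything specified on the slides.

```python
def train_with_early_stopping(weights, train_set, tune_set,
                              one_backprop_epoch, error_rate,
                              max_epochs=200, patience=10):
    """Keep the weights that minimize tuning-set error; stop once tuning error keeps rising."""
    best_weights, best_err, epochs_since_best = weights, float("inf"), 0
    for _ in range(max_epochs):
        weights = one_backprop_epoch(weights, train_set)   # one randomized pass over the data
        tune_err = error_rate(weights, tune_set)
        if tune_err < best_err:
            best_weights, best_err, epochs_since_best = weights, tune_err, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:              # tuning-set error has begun to rise
                break
    return best_weights
```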
Advantages of Neural Networks
• Universal representation (provided enough hidden units)
• Less greedy than tree learners
• In practice, good for problems with numeric inputs, and can also handle numeric outputs
• PHD: for many years the best protein secondary structure predictor
Disadvantages
• Models not very comprehensible
• Long training times
• Very sensitive to number of hidden units… as a result, largely being supplanted by SVMs (SVMs take a very different approach to getting non-linearity)
Looking Ahead
• Perceptron rule can also be thought of as modifying weights on data points rather than features
• Instead of processing all data (batch) vs one-at-a-time, could imagine processing 2 data points at a time, adjusting their relative weights based on their relative errors
• This is what Platt's SMO does (the SVM implementation in Weka)
Backup Slide to help with Derivative of Sigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Weight SpaceWeight Space
bull Given a neural-network layout the weights Given a neural-network layout the weights are free parameters that are free parameters that define a define a spacespace
bull Each pointEach point in this in this Weight SpaceWeight Space specifies a specifies a networknetwork
bull Associated with each point is an Associated with each point is an error rateerror rate EE over the training dataover the training data
bull Backprop performs Backprop performs gradient descentgradient descent in weight in weight spacespace
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Gradient Descent in Weight Gradient Descent in Weight SpaceSpace
E
W1
W2
Ew
W1
W2
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
The Gradient-Descent The Gradient-Descent RuleRule
E(w) [ ]Ew0
Ew1
Ew
2
EwN
hellip hellip hellip _
The ldquogradien
trdquoThis is a N+1 dimensional vector (ie the lsquoslopersquo in weight space)Since we want to reduce errors we want to go ldquodown hillrdquoWersquoll take a finite step in weight space
E
W1
W2
w = - E ( w )
or wi = - Ewi
ldquodeltardquo = change to
w
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo On Linerdquo vs ldquoBatchrdquo BackpropBackprop
bull Technically we should look at the error Technically we should look at the error gradient for the entire training set gradient for the entire training set before taking a step in weight space before taking a step in weight space (ldquo(ldquobatchbatchrdquo Backprop)rdquo Backprop)
bull HoweverHowever as presented we take a step as presented we take a step after each example (ldquoafter each example (ldquoon-lineon-linerdquo Backprop)rdquo Backprop)bull Much faster convergenceMuch faster convergencebull Can reduce overfitting (since on-line Can reduce overfitting (since on-line
Backprop is ldquonoisyrdquo gradient descent)Backprop is ldquonoisyrdquo gradient descent)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
ldquoldquoOn Linerdquo vs ldquoBatchrdquo BP On Linerdquo vs ldquoBatchrdquo BP (continued)(continued)
BATCHBATCH ndash add ndash add w w vectors for vectors for everyevery training example training example thenthen lsquomoversquo in weight lsquomoversquo in weight spacespace
ON-LINEON-LINE ndash ldquomoverdquo ndash ldquomoverdquo after after eacheach example example (aka (aka stochasticstochastic gradient descent)gradient descent)
E
wi
w1
w3w2
w
w1
w2
w3
Final locations in space need not be the same for BATCH and ON-LINE w
N
ote
w
iB
ATC
H
w
i O
N-L
INE
for
i gt
1
E
w
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Need Derivatives Replace Need Derivatives Replace Step (Threshold) by Step (Threshold) by SigmoidSigmoidIndividual units
bias
output
input
output i= F(weight ij x output j)
Where
F(input i) =
j
1
1+e -(input i ndash bias i)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
Some Notation
By Chain Rule (since W_ji influences the rest of the network only by its influence on Net_j)…
Also remember this for later – we'll call it −δ_j
Remember that o_j is x_kj: the output from j is the input to k.

Remember: net_k = w_k1 x_k1 + … + w_kN x_kN
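The equations on the preceding few slides survive only as images in this transcript, so here is a hedged reconstruction of the chain-rule step they describe, in Mitchell's notation:

  ∂E/∂w_ji = (∂E/∂net_j) · (∂net_j/∂w_ji)

The first factor is the quantity named above, ∂E/∂net_j = −δ_j. For the second factor, since net_j = w_j1 x_j1 + … + w_jN x_jN, we get ∂net_j/∂w_ji = x_ji, the input arriving along that weight (i.e., the output of the upstream unit). Hence ∂E/∂w_ji = −δ_j x_ji, and the gradient-descent step is Δw_ji = η δ_j x_ji.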
Using BP to Train ANNs

1. Initialize weights & biases to small random values (e.g., in [-0.3, 0.3])
2. Randomize the order of the training examples; for each, do:
   a) Propagate activity forward to the output units (layers k → j → i):

        out_i = F( Σ_j w_ij × out_j )
Using BP to Train ANNs (continued)

   b) Compute "deviation" for output units:   δ_i = F'( net_i ) × (Teacher_i − out_i)
   c) Compute "deviation" for hidden units:   δ_j = F'( net_j ) × Σ_i ( w_ij × δ_i )
   d) Update weights:
        Δw_ij = η × δ_i × out_j
        Δw_jk = η × δ_j × out_k

      where F'( net_i ) = ∂F(net_i) / ∂net_i
Using BP to Train ANNs (continued)

3. Repeat until training-set error rate is small enough (or until tuning-set error rate begins to rise – see later slide).
   Should use "early stopping" (i.e., minimize error on the tuning set; more details later).
4. Measure accuracy on test set to estimate generalization (future accuracy).
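Putting steps 1–4 together, here is a minimal Python/NumPy sketch of the training loop for a one-hidden-layer network of sigmoid units. The function and variable names (train_bp, W_jk, W_ij, etc.) are made up for illustration, and this is not the course's reference implementation; early stopping and the test-set measurement of step 4 are only noted in comments.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, T, n_hidden=5, eta=0.1, n_epochs=100, seed=0):
    """Backprop for layers k (inputs) -> j (hidden) -> i (outputs).
    X: (n_examples, n_in) array of inputs; T: (n_examples, n_out) array of targets in [0, 1]."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # 1. Initialize weights & biases to small random values, e.g. in [-0.3, 0.3]
    #    (biases written as additive terms, equivalent to the slides' "input - bias" up to sign)
    W_jk = rng.uniform(-0.3, 0.3, (n_hidden, n_in))    # input -> hidden weights
    b_j  = rng.uniform(-0.3, 0.3, n_hidden)
    W_ij = rng.uniform(-0.3, 0.3, (n_out, n_hidden))   # hidden -> output weights
    b_i  = rng.uniform(-0.3, 0.3, n_out)

    for epoch in range(n_epochs):
        # 2. Randomize the order of the training examples; process one at a time
        for n in rng.permutation(len(X)):
            x, t = X[n], T[n]
            # a) Propagate activity forward to the output units
            out_j = sigmoid(W_jk @ x + b_j)
            out_i = sigmoid(W_ij @ out_j + b_i)
            # b) "Deviation" for output units: delta_i = F'(net_i) * (Teacher_i - out_i)
            delta_i = out_i * (1.0 - out_i) * (t - out_i)
            # c) "Deviation" for hidden units: delta_j = F'(net_j) * sum_i(w_ij * delta_i)
            delta_j = out_j * (1.0 - out_j) * (W_ij.T @ delta_i)
            # d) Update weights (and biases): change = eta * deviation * input to that weight
            W_ij += eta * np.outer(delta_i, out_j)
            b_i  += eta * delta_i
            W_jk += eta * np.outer(delta_j, x)
            b_j  += eta * delta_j
        # 3. In practice, stop early when tuning-set error starts to rise (omitted here),
        # 4. then measure accuracy on a held-out test set to estimate generalization.
    return W_jk, b_j, W_ij, b_i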
© Jude Shavlik 2006, David Page 2010 · CS 760 – Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Differentiating the Differentiating the Logistic FunctionLogistic Function
out i =
1
1 + e - ( wji x outj)
F rsquo(wgtrsquoed in) = out i ( 1- out i ) 0
12
Wj x outj
F(wgtrsquoed in)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Assume one layer of hidden units (std topology)Assume one layer of hidden units (std topology)
11 Error Error frac12 frac12 ( Teacher ( Teacherii ndash ndash OutputOutput ii ) ) 22
22 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x x OutputOutput jj] )] )22
33 = frac12 = frac12 (Teacher(Teacherii ndash F ndash F (( [[WWijij x F x F ((WWjkjk x Output x Output
kk)])))]))22
DetermineDetermine
recallrecall
BP CalculationsBP Calculations
Error Wij
Error Wjk
= (use equation 2)
= (use equation 3)
See Table 42 in Mitchell for resultswxy = - ( E wxy )
k j i
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Derivation in MitchellDerivation in Mitchell
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Some NotationSome Notation
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
By Chain Rule By Chain Rule (since (since WWjiji influences rest of network only influences rest of network only by its influence on by its influence on NetNetjj)hellip)hellip
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Also remember this for later ndashWersquoll call it -δj
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Looking AheadLooking Aheadbull Perceptron rule can also be thought of as Perceptron rule can also be thought of as
modifying modifying weights on data points weights on data points rather rather than featuresthan features
bull Instead of process all data (batch) vs Instead of process all data (batch) vs one-at-a-time could imagine processing 2 one-at-a-time could imagine processing 2 data points at a time adjusting their data points at a time adjusting their relative weights based on their relative relative weights based on their relative errorserrors
bull This is what Plattrsquos SMO does (the SVM This is what Plattrsquos SMO does (the SVM implementation in Weka)implementation in Weka)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Backup Slide to help Backup Slide to help with Derivative of with Derivative of SigmoidSigmoid
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Remember thatoj is xkj outputfrom j is inputto k
Remember netk =wk1 xk1 + hellip+ wkN xkN
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquosANNrsquos
11 Initiate weights amp bias to Initiate weights amp bias to small random values small random values (eg in [-03 03])(eg in [-03 03])
22 Randomize order of Randomize order of training examples for training examples for each doeach do
a)a) Propagate activity Propagate activity forwardforward to output unitsto output units
k j i
outi = F( wij x outj )j
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
b)b) Compute ldquodeviationrdquo for output Compute ldquodeviationrdquo for output unitsunits
c)c) Compute ldquodeviationrdquo for hidden Compute ldquodeviationrdquo for hidden unitsunits
d)d) Update weightsUpdate weights
i = F rsquo( neti ) x (Teacheri-outi)
ij = F rsquo( netj ) x ( wij x
i)
wij = x i x out j
wjk = x j x out k
F rsquo( netj ) = F(neti) neti
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Using BP to Train Using BP to Train ANNrsquos ANNrsquos (continued)(continued)
33 Repeat until training-set error rate Repeat until training-set error rate small enough (or until tuning-set error small enough (or until tuning-set error rate begins to rise ndash see later slide)rate begins to rise ndash see later slide)
Should use ldquoearly stoppingrdquo (ie Should use ldquoearly stoppingrdquo (ie minimize error on the tuning set more minimize error on the tuning set more details later)details later)
44 Measure accuracy on test set to Measure accuracy on test set to estimate estimate generalizationgeneralization (future (future accuracy)accuracy)
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
Advantages of Neural Advantages of Neural NetworksNetworks
bull Universal representation (provided Universal representation (provided enough hidden units)enough hidden units)
bull Less greedy than tree learnersLess greedy than tree learnersbull In practice good for problems with In practice good for problems with
numeric inputs and can also numeric inputs and can also handle numeric outputshandle numeric outputs
bull PHD for many years best protein PHD for many years best protein secondary structure predictorsecondary structure predictor
copy Jude Shavlik 2006 copy Jude Shavlik 2006 David Page 2010 David Page 2010
CS 760 ndash Machine Learning (UW-Madison)CS 760 ndash Machine Learning (UW-Madison)
DisadvantagesDisadvantages
bull Models not very comprehensibleModels not very comprehensiblebull Long training timesLong training timesbull Very sensitive to number of Very sensitive to number of
hidden unitshellip as a result largely hidden unitshellip as a result largely being supplanted by SVMs (SVMs being supplanted by SVMs (SVMs take very different approach to take very different approach to getting non-linearity)getting non-linearity)
Looking Ahead

• The perceptron rule can also be thought of as modifying weights on data points rather than on features (see the sketch after this list)
• Instead of processing all data at once (batch) vs. one example at a time, one could imagine processing 2 data points at a time, adjusting their relative weights based on their relative errors
• This is what Platt's SMO does (the SVM implementation in Weka)
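To make the "weights on data points" view concrete, the following sketch shows the standard dual-form (kernel) perceptron, in which each training example carries its own coefficient alpha_n instead of the model carrying per-feature weights. This is the representational idea SMO builds on, though SMO itself solves the SVM optimization rather than running this update rule; all names below are illustrative:

import numpy as np

def dual_perceptron(X, t, kernel=np.dot, epochs=10):
    # X: (N, d) array of examples; t: (N,) labels in {-1, +1}
    # learns one weight (alpha) per data point rather than one per feature
    N = len(X)
    alpha = np.zeros(N)
    for _ in range(epochs):
        for n in range(N):
            # prediction is a weighted vote of the training points themselves
            score = sum(alpha[m] * t[m] * kernel(X[m], X[n]) for m in range(N))
            if np.sign(score) != t[n]:
                alpha[n] += 1.0        # misclassified: increase this point's weight
    return alpha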
Backup Slide to help with Derivative of Sigmoid
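The figure on this backup slide is not reproduced in this text export; for reference, the standard derivation it refers to (assuming F is the sigmoid used above) is:

F(net) = \frac{1}{1 + e^{-net}}

F'(net) = \frac{e^{-net}}{(1 + e^{-net})^{2}}
        = F(net)\,\bigl(1 - F(net)\bigr)
        = out\,(1 - out)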
© Jude Shavlik 2006, David Page 2010
CS 760 – Machine Learning (UW-Madison)