Before we start ADALINE
Test the response of your Hebb and Perceptron nets on the following noisy version.
Exercise: p. 98, 2.6(d)
ADALINE: ADAPTIVE LINEAR NEURON
An ADALINE typically uses bipolar (1, -1) activations for its input signals and its target output.
The weights are adjustable, and there is a bias whose activation is always 1.
Architecture of an ADALINE
(Figure: input units X1, …, Xn connect to the single output unit Y with adjustable weights w1, …, wn; a bias unit with constant activation 1 connects to Y with weight b.)
ADALINE
In general, an ADALINE can be trained using the delta rule, also known as least mean squares (LMS) or the Widrow-Hoff rule.
The delta rule can also be used for single-layer nets with several output units; an ADALINE is the special case with only one output unit.
ADALINE
The activation of the unit is its net input; the identity function is used:
y = y_in = b + Σi xi wi
The learning rule minimizes the mean squared error between the activation and the target value.
This allows the net to continue learning on all training patterns, even after the correct output value is generated.
ADALINE
After training, if the net is being used for pattern classification in which the desired output is either +1 or -1, a threshold function is applied to the net input to obtain the activation:
If net_input ≥ 0 then activation = 1, else activation = -1.
The Algorithm
Step 0: Initialize all weights and bias (small random values are usually used). Set the learning rate α (0 < α ≤ 1).
Step 1: While stopping condition is false, do Steps 2-6.
Step 2: For each bipolar training pair s:t, do Steps 3-5.
Step 3: Set activations for input units: xi = si, i = 1, …, n.
Step 4: Compute net input to the output unit:
y_in = b + Σi xi wi
Step 5: Update weights and bias, i = 1, …, n:
wi(new) = wi(old) + α(t − y_in)xi
b(new) = b(old) + α(t − y_in)
Step 6: Test stopping condition: if the largest weight change that occurred in Step 2 is smaller than a specified tolerance, then stop; otherwise continue.
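A minimal Python sketch of Steps 0-6 (the function name, initialization range, and tolerance are illustrative choices, not from the slides):

```python
import random

def train_adaline(patterns, alpha=0.1, tol=1e-4, max_epochs=1000):
    """Delta-rule (LMS / Widrow-Hoff) training of a single ADALINE.

    patterns: list of (s, t) pairs; s is a list of inputs,
    t is the bipolar target.
    """
    n = len(patterns[0][0])
    # Step 0: small random weights and bias, fixed learning rate alpha.
    w = [random.uniform(-0.1, 0.1) for _ in range(n)]
    b = random.uniform(-0.1, 0.1)
    for _ in range(max_epochs):                              # Step 1
        largest_change = 0.0
        for s, t in patterns:                                # Step 2
            x = list(s)                                      # Step 3: xi = si
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4
            delta = alpha * (t - y_in)
            for i in range(n):                               # Step 5
                w[i] += delta * x[i]
                largest_change = max(largest_change, abs(delta * x[i]))
            b += delta
            largest_change = max(largest_change, abs(delta))
        if largest_change < tol:                             # Step 6
            break
    return w, b
```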
Setting the learning rate α
It is common to take a small initial value, such as α = 0.1.
If α is too large, the learning process will not converge; if it is too small, learning will be extremely slow.
For a single neuron, a practical range is 0.1 ≤ nα ≤ 1.0, where n is the number of input units.
Application
After training, an ADALINE unit can be used to classify input patterns. If the target values are bivalent (binary or bipolar), a step function can be applied as the activation function for the output unit.
Step 0: Initialize all weights (from training).
Step 1: For each bipolar input vector x, do Steps 2-4.
Step 2: Set activations of the input units to x.
Step 3: Compute net input to the output unit:
y_in = b + Σi xi wi
Step 4: Apply the activation function:
f(y_in) = 1 if y_in ≥ 0; −1 if y_in < 0.
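A sketch of this application phase in the same vein (the helper name is made up; the demo uses the minimizing weights from Example 2 below):

```python
def adaline_classify(x, w, b):
    # Steps 2-4: set activations, compute net input, apply step function.
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if y_in >= 0 else -1

# Demo with w1 = w2 = 1/2, w0 = -1/2 (Example 2, bipolar AND):
for x in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(x, adaline_classify(x, [0.5, 0.5], -0.5))
# prints 1 for (1, 1) and -1 for the other three patterns
```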
Example 1
ADALINE for the AND function: binary input, bipolar targets.
x1 x2 t
1  1  1
1  0  -1
0  1  -1
0  0  -1
The delta rule in ADALINE is designed to find weights that minimize the total error
E = Σ_{p=1}^{4} (x1(p) w1 + x2(p) w2 + w0 − t(p))²
where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p, and t(p) is the associated target for pattern p.
Example 1 (continued)
ADALINE for the AND function: binary input, bipolar targets.
The weights that minimize this error are w1 = 1, w2 = 1, w0 = −3/2.
Separating line: x1 + x2 − 3/2 = 0.
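These values can be checked numerically; a sketch using NumPy's least-squares solver (the variable names are illustrative):

```python
import numpy as np

# Binary-input AND patterns; third column is the constant bias input.
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
t = np.array([1, -1, -1, -1], dtype=float)

# The least-squares solution of X @ [w1, w2, w0] ≈ t minimizes E.
w, *_ = np.linalg.lstsq(X, t, rcond=None)
print(w)  # approximately [ 1.   1.  -1.5]
```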
Example 2
ADALINE for the AND function: bipolar input, bipolar targets.
x1  x2  t
1   1   1
1   -1  -1
-1  1   -1
-1  -1  -1
The delta rule in ADALINE is designed to find weights that minimize the total error
E = Σ_{p=1}^{4} (x1(p) w1 + x2(p) w2 + w0 − t(p))²
where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p, and t(p) is the associated target for pattern p.
Example 2 (continued)
ADALINE for the AND function: bipolar input, bipolar targets.
The weights that minimize this error are w1 = 1/2, w2 = 1/2, w0 = −1/2.
Separating line: (1/2)x1 + (1/2)x2 − 1/2 = 0.
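A side note on why these weights come out so cleanly (an observation, not from the slides): with bipolar inputs, the four patterns make the x1, x2 and bias columns mutually orthogonal, so the least-squares solution decouples into
wi = (1/4) Σ_{p=1}^{4} xi(p) t(p),  w0 = (1/4) Σ_{p=1}^{4} t(p)
which gives w1 = (1 − 1 + 1 + 1)/4 = 1/2, w2 = (1 + 1 − 1 + 1)/4 = 1/2 and w0 = (1 − 1 − 1 − 1)/4 = −1/2, matching the values above.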
Examples
Example 3: ADALINE for AND NOT function: bipolar input, bipolar targets
Example 4: ADALINE for OR function: bipolar input, bipolar targets
Derivations: delta rule for a single output unit
The delta rule changes the weights of the connections so as to minimize the difference between the net input to the output unit, y_in, and the target value t, by reducing the error for each pattern, one at a time.
The delta rule for the Ith weight (for each pattern) is:
ΔwI = α(t − y_in)xI
Derivations
The squared error for a particular training pattern is
E = (t − y_in)²
E is a function of all the weights wi, i = 1, …, n.
The gradient of E is the vector consisting of the partial derivatives of E with respect to each of the weights.
The gradient gives the direction of most rapid increase in E; the opposite direction gives the most rapid decrease in the error.
The error can be reduced by adjusting the weight wI in the direction of −∂E/∂wI.
Derivations (continued)
Since y_in = Σi xi wi,
∂E/∂wI = −2(t − y_in) ∂y_in/∂wI = −2(t − y_in)xI
The local error will be reduced most rapidly by adjusting the weights according to the delta rule:
ΔwI = α(t − y_in)xI
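A quick numerical check of this derivative (a sketch; the pattern, weights, and target are made-up values):

```python
import numpy as np

x = np.array([1.0, -1.0, 1.0])   # an arbitrary bipolar input pattern
w = np.array([0.2, -0.1, 0.05])  # arbitrary current weights
b, t = 0.1, 1.0                  # bias and target

def E(w):
    # Squared error for this single pattern: (t - y_in)^2
    return (t - (b + x @ w)) ** 2

analytic = -2 * (t - (b + x @ w)) * x  # dE/dw_I from the derivation

eps = 1e-6
numeric = np.array([(E(w + eps * e) - E(w - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric))  # True
```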
Derivations: delta rule for several output units
The delta rule for the weight wIJ (for each pattern) is:
ΔwIJ = α(tJ − y_inJ)xI
The squared error for a particular training pattern is
E = Σ_{j=1}^{m} (tj − y_inj)²
E is a function of all the weights.
The error can be reduced by adjusting the weight wIJ in the direction of −∂E/∂wIJ. Since only y_inJ depends on wIJ,
∂E/∂wIJ = ∂/∂wIJ Σ_{j=1}^{m} (tj − y_inj)² = ∂/∂wIJ (tJ − y_inJ)²
and the derivation proceeds exactly as in the single-output case.
Continued on p. 88.
Exercise: Adaline Network Simulator, http://www.neural-networks-at-your-fingertips.com/adaline.html
MADALINE: MANY ADAPTIVE LINEAR NEURONS
Architecture of a MADALINE with two hidden ADALINEs and one output ADALINE
(Figure: input units X1, X2 connect to hidden units Z1, Z2 with weights w11, w21, w12, w22 and biases b1, b2; Z1, Z2 connect to the output unit Y with weights v1, v2 and bias b3.)
MADALINE
The derivation of the delta rule for several outputs shows no change in the training process for any combination of ADALINEs.
The outputs of the two hidden ADALINEs, z1 and z2, are determined by signals from the input units X1 and X2.
Each output signal is the result of applying a threshold function to the unit's net input.
y is a non-linear function of the input vector (x1, x2).
MADALINE
Why do we need hidden units?
The use of hidden units Z1 and Z2 gives the net computational capabilities not found in single-layer nets, but complicates the training process.
Two algorithms:
MRI – only the weights for the hidden ADALINEs are adjusted; the weights for the output unit are fixed.
MRII – provides a method for adjusting all the weights in the net.
ALGORITHM: MRI
(Refer to the MADALINE architecture above.)
The weights v1 and v2 and the bias b3 that feed into the output unit Y are determined so that the response of Y is 1 if the signal it receives from either Z1 or Z2 (or both) is 1, and is -1 if both Z1 and Z2 send a signal of -1. Thus the unit Y performs the logic function OR on the signals it receives from Z1 and Z2.
Set v1 = 1/2, v2 = 1/2 and b3 = 1/2 (see Example 2.19, the OR function).
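A quick check of these values: with y_in = b3 + v1 z1 + v2 z2 = 1/2 + z1/2 + z2/2,
z1 = 1, z2 = 1: y_in = 3/2, so y = 1
z1 = 1, z2 = −1: y_in = 1/2, so y = 1
z1 = −1, z2 = 1: y_in = 1/2, so y = 1
z1 = −1, z2 = −1: y_in = −1/2, so y = −1
so Y does compute OR on (z1, z2).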
ALGORITHM: MRI (continued)
Training pairs (these are the XOR function: bipolar input, bipolar targets):
x1  x2  t
1   1   -1
1   -1  1
-1  1   1
-1  -1  -1
Set α = 0.5.
Initial weights into Z1, Z2 and Y:
w11 w21 b1  | w12 w22 b2  | v1 v2 b3
.05 .2  .3  | .1  .2  .15 | .5 .5 .5
v1 = 1/2, v2 = 1/2 and b3 = 1/2 are fixed (see Example 2.19, the OR function).
Activation function: f(x) = 1 if x ≥ 0; −1 if x < 0.
Step 0: Initialize weights: v1, v2 and b3 as given above; the weights into Z1 and Z2 are small random values. Set the learning rate α (0 < α ≤ 1).
Step 1: While stopping condition is false, do Steps 2-8.
Step 2: For each bipolar training pair s:t, do Steps 3-7.
Step 3: Set activations for input units: xi = si.
Step 4: Compute net input to each hidden ADALINE unit:
z_in1 = b1 + x1 w11 + x2 w21
z_in2 = b2 + x1 w12 + x2 w22
Step 5: Determine output of each hidden ADALINE:
z1 = f(z_in1)
z2 = f(z_in2)
Step 6: Determine output of net:
y = f(y_in), where y_in = b3 + z1 v1 + z2 v2
Step 7: Update weights and bias if an error occurred for this pattern.
If t = y, no weight updates are performed; otherwise:
If t = 1, then update the weights on ZJ, the unit whose net input is closest to 0:
wiJ(new) = wiJ(old) + α(1 − z_inJ)xi
bJ(new) = bJ(old) + α(1 − z_inJ)
If t = −1, then update the weights on all units ZK that have positive net input:
wiK(new) = wiK(old) + α(−1 − z_inK)xi
bK(new) = bK(old) + α(−1 − z_inK)
Step 8: Test stopping condition:
If the weight changes have stopped (or reached an acceptable level), or if a specified maximum number of weight-update iterations (Step 2) have been performed, then stop; otherwise continue.
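A compact Python sketch of one MRI update (Steps 4-7) for the two-hidden-unit net above (the function names and data layout are illustrative; the output weights stay fixed at v1 = v2 = b3 = 1/2):

```python
def f(x):
    # Bipolar step activation: 1 if x >= 0, else -1.
    return 1 if x >= 0 else -1

def mri_update(x, t, W, b, alpha=0.5):
    """One MRI weight update for a MADALINE with two hidden ADALINEs.

    x: bipolar input pair; t: bipolar target;
    W[j] = [w1j, w2j] and b[j]: weights and bias into hidden unit Zj.
    """
    z_in = [b[j] + x[0] * W[j][0] + x[1] * W[j][1] for j in range(2)]  # Step 4
    z = [f(v) for v in z_in]                                           # Step 5
    y = f(0.5 + 0.5 * z[0] + 0.5 * z[1])                               # Step 6 (OR unit)
    if t == y:
        return                  # Step 7: no error, no update
    if t == 1:
        # Update only the unit whose net input is closest to 0.
        units = [min(range(2), key=lambda j: abs(z_in[j]))]
        target = 1
    else:
        # Update all units with positive net input, toward -1.
        units = [j for j in range(2) if z_in[j] > 0]
        target = -1
    for j in units:
        err = alpha * (target - z_in[j])
        W[j][0] += err * x[0]
        W[j][1] += err * x[1]
        b[j] += err
```

Looping this update over the XOR training pairs above until the weights stop changing (Steps 1-3 and 8) completes the algorithm.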