Perceptron
The perceptron is one of the earliest models of the artificial neuron. It was proposed by Rosenblatt in 1958.
It is a single-layer neural network whose weights can be trained to produce a correct target vector when presented with the corresponding input vector.
The training technique used is called the perceptron learning rule.
The perceptron generated great interest due to its ability to generalize from its training vectors and to work with randomly distributed connections. Perceptrons are especially suited for problems in pattern classification.
The schematic diagram of the perceptron is shown in the figure below. Its synaptic weights are denoted by w1, w2, . . . , wn. The inputs applied to the perceptron are denoted by x1, x2, . . . , xn. The externally applied bias is denoted by b.
Fig. Schematic diagram of the perceptron: the inputs x1, x2, . . . , xn (with weights w1, w2, . . . , wn) and the bias b feed a summing junction that produces net; the hard-limiter activation f(.) then gives the output o.
The net input to the activation of the neuron is written as

net = \sum_{i=1}^{n} w_i x_i + b

The output of the perceptron is written as o = f(net),
where f(.) is the activation function of the perceptron. Depending upon the type of activation function, the perceptron may be classified into two types:
- Discrete perceptron, in which the activation function is the hard limiter or sgn(.) function.
- Continuous perceptron, in which the activation function is the sigmoid function, which is differentiable.
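As a concrete illustration of these two variants, here is a minimal sketch (my own, not from the slides) of the forward pass: the net input net = Σ w_i x_i + b followed by either a hard-limiter or a bipolar sigmoid activation (tanh is used here as one common choice of differentiable bipolar sigmoid).

```python
import numpy as np

def net_input(w, x, b):
    # net = sum_i w_i * x_i + b
    return np.dot(w, x) + b

def discrete_output(net):
    # hard limiter / sgn(.): the discrete perceptron outputs +1 or -1
    return 1.0 if net >= 0 else -1.0

def continuous_output(net):
    # a bipolar sigmoid (tanh-shaped), differentiable: the continuous perceptron
    return np.tanh(net)

# example with arbitrary weights, bias and inputs
w, b = np.array([0.5, -0.3]), 0.1
x = np.array([1.0, 2.0])
net = net_input(w, x, b)
print(net, discrete_output(net), continuous_output(net))
```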
Perceptrons
Linear separability
A set of (2D) patterns (x1, x2) of two classes is linearly separable if there exists a line in the (x1, x2) plane,

w_0 + w_1 x_1 + w_2 x_2 = 0,

that separates all patterns of one class from those of the other class.
For such patterns, a perceptron can be built with 3 inputs x0 = 1, x1, x2 and weights w0, w1, w2.
For n-dimensional patterns (x1, . . . , xn), the hyperplane

w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = 0

divides the space into two regions.
Can we get the weights from a set of sample patterns? If the problem is linearly separable, then YES (by perceptron learning).
LINEAR SEPARABILITY
Definition: Two sets of points A and B in an n-dimensional space are called linearly separable if n + 1 real numbers w1, w2, w3, . . . , wn+1 exist such that every point (x1, x2, . . . , xn) ∈ A satisfies \sum_{i=1}^{n} w_i x_i \ge w_{n+1} and every point (x1, x2, . . . , xn) ∈ B satisfies \sum_{i=1}^{n} w_i x_i < w_{n+1}.

Absolute Linear Separability
Two sets of points A and B in an n-dimensional space are called absolutely linearly separable if n + 1 real numbers w1, w2, w3, . . . , wn+1 exist such that every point (x1, x2, . . . , xn) ∈ A satisfies \sum_{i=1}^{n} w_i x_i > w_{n+1} and every point (x1, x2, . . . , xn) ∈ B satisfies \sum_{i=1}^{n} w_i x_i < w_{n+1}.

Two finite sets of points A and B in n-dimensional space which are linearly separable are also absolutely linearly separable. In general, absolutely linearly separable ⇒ linearly separable; but only if the sets are finite does linearly separable ⇒ absolutely linearly separable.
Examples of linearly separable classes

- Logical AND function (bipolar patterns); weights w1 = 1, w2 = 1, w0 = -1; decision boundary: -1 + x1 + x2 = 0

  x1  x2  output
  -1  -1    -1
  -1   1    -1
   1  -1    -1
   1   1     1

- Logical OR function (bipolar patterns); weights w1 = 1, w2 = 1, w0 = 1; decision boundary: 1 + x1 + x2 = 0

  x1  x2  output
  -1  -1    -1
  -1   1     1
   1  -1     1
   1   1     1
(Plots of the two pattern sets in the (x1, x2) plane: x marks class I (output = 1) and o marks class II (output = -1), with the decision boundary line separating the two classes.)
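The weights listed above can be checked directly. The short sketch below (my own illustration, not part of the slides) evaluates the sign of w0 + w1·x1 + w2·x2 for the four bipolar patterns of each gate and compares it with the targets.

```python
def classify(w0, w1, w2, x1, x2):
    # sign of w0 + w1*x1 + w2*x2 decides the class (+1 or -1)
    return 1 if (w0 + w1 * x1 + w2 * x2) > 0 else -1

patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
and_targets = [-1, -1, -1, 1]
or_targets  = [-1,  1,  1, 1]

# AND: w0 = -1, w1 = 1, w2 = 1;  OR: w0 = 1, w1 = 1, w2 = 1
print([classify(-1, 1, 1, *p) for p in patterns] == and_targets)  # True
print([classify( 1, 1, 1, *p) for p in patterns] == or_targets)   # True
```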
Single Layer Discrete Perceptron Networks (SLDP)
Fig. 3.2 Illustration of the hyperplane (in this example, a straight line) as the decision boundary for a two-dimensional, two-class pattern classification problem: class C1 lies on one side of the boundary and class C2 on the other in the (x1, x2) plane.
To develop insight into the behavior of a pattern classifier, it is necessary to plot a map of the decision regions in the n-dimensional space spanned by the n input variables. The two decision regions are separated by a hyperplane defined by

\sum_{i=0}^{n} w_i x_i = 0

(with x_0 = 1 as the bias input).
SLDP
Fig. 3.3 (a) A pair of linearly separable patterns: classes C1 and C2 separated by a decision boundary. (b) A pair of nonlinearly separable patterns.
For the perceptron to function properly, the two classes C1 and C2 must be linearly separable.
In Fig. 3.3(a), the two classes C1 and C2 are sufficiently separated from each other to draw a hyperplane (in this case a straight line) as the decision boundary.
SLDP
Assume that the input variables originate from two linearly separable classes.
Let 𝒳1 be the subset of training vectors X1(1), X1(2), . . . that belong to class C1, and 𝒳2 be the subset of training vectors X2(1), X2(2), . . . that belong to class C2.
Given the sets of vectors 𝒳1 and 𝒳2 to train the classifier, the training process involves the adjustment of the weight vector W in such a way that the two classes C1 and C2 are separated by the decision boundary. That is, there exists a weight vector W such that we may write

W^T X > 0 for every input vector X belonging to class C1
W^T X \le 0 for every input vector X belonging to class C2
SLDP
The algorithm for updating the weights may be formulated as follows:

1. If the k-th member of the training set, X(k), is correctly classified by the weight vector W(k) computed at the k-th iteration of the algorithm, no correction is made to the weight vector of the perceptron, in accordance with the rule

W(k+1) = W(k) if W^T(k) X(k) > 0 and X(k) belongs to class C1
W(k+1) = W(k) if W^T(k) X(k) \le 0 and X(k) belongs to class C2

2. Otherwise, the weight vector of the perceptron is updated in accordance with the rule

W(k+1) = W(k) - \eta X(k) if W^T(k) X(k) > 0 and X(k) belongs to class C2
W(k+1) = W(k) + \eta X(k) if W^T(k) X(k) \le 0 and X(k) belongs to class C1

where the learning-rate parameter η controls the adjustment applied to the weight vector.
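A minimal sketch of one such correction step follows (my own illustration; the choice of labels +1 for class C1 and -1 for class C2 and the value of η are assumptions).

```python
import numpy as np

def perceptron_step(w, x, label, eta=0.1):
    """Apply rules 1 and 2 above to a single training vector x (label: +1 = C1, -1 = C2)."""
    net = np.dot(w, x)
    if net > 0 and label == -1:       # classified as C1 but belongs to C2
        return w - eta * x
    if net <= 0 and label == 1:       # classified as C2 but belongs to C1
        return w + eta * x
    return w                          # correctly classified: no correction

w = np.zeros(3)                       # weight vector, including the bias weight (x0 = 1)
x = np.array([1.0, -1.0, 1.0])
print(perceptron_step(w, x, label=1))
```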
Discrete Perceptron training algorithm
Consider P training patterns available for training the model: {(X1, t1), (X2, t2), . . . , (XP, tP)}, where Xi is the i-th input vector and ti is the i-th target output, i = 1, 2, . . . , P.

Learning Algorithm
Step 1: Set the learning rate η (0 < η ≤ 1).
Step 2: Initialize the weights and bias.
Step 3: Apply the input patterns one by one, repeating Steps 4 and 5 for each pattern.
Algorithm continued..
Step 4: Compute the output response

net_p = \sum_{i=1}^{n} w_i^k x_i^p + b

o_p = f(net_p)

where f(net_p) is the activation function of the perceptron.

For the bipolar binary activation function:

f(net_p) = +1 if net_p \ge \theta, and -1 if net_p < \theta

For the unipolar binary activation function:

f(net_p) = 1 if net_p \ge \theta, and 0 otherwise

(here θ is the threshold of the hard limiter).
Algorithm continued..
Step 5: Update the weights

w_i^{k+1} = w_i^k + \eta (t_p - o_p) x_i^p
Here, the weights are updated only if the target and the output do not match.
Step 6: If p < P, then set p ← p + 1, go to Step 4 and compute the output response for the next input; otherwise go to Step 7.
Step 7: Test the stopping condition: if the weights have not changed, stop and store the final weights (W) and bias (b); else go to Step 3.
The network training stops when all the input vectors are correctly classified, i.e. when the target value matches the output for every input vector.
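Putting Steps 1-7 together, a minimal end-to-end sketch of the training loop might look as follows (my own illustration: bipolar targets, a zero initial weight vector, the Step 5 update w_i ← w_i + η(t_p − o_p)x_i^p, and a fixed epoch limit as a safeguard are all assumptions).

```python
import numpy as np

def sgn(net):
    # bipolar hard limiter used as the activation function
    return 1 if net >= 0 else -1

def train_perceptron(X, T, eta=0.1, max_epochs=100):
    X = np.hstack([X, np.ones((len(X), 1))])    # append x0 = 1 so the bias is learned as a weight
    w = np.zeros(X.shape[1])                    # Steps 1-2: learning rate set, weights initialized
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(X, T):                  # Steps 3-4: present patterns, compute output
            o = sgn(np.dot(w, x))
            if o != t:                          # Step 5: update only when target and output differ
                w = w + eta * (t - o) * x
                changed = True
        if not changed:                         # Step 7: stop when a full pass changes no weight
            return w
    return w

# bipolar AND patterns as a quick test
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
T = np.array([-1, -1, -1, 1])
print(train_perceptron(X, T))
```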
Example:
Build the perceptron network to realize fundamental logic gates, such as AND, OR and XOR.
Solution:
The following steps show the hand calculations for the OR-gate input-output data.

Table: OR logic gate function
Input X1 X2    Output (Target)
0 0            0
0 1            1
1 0            1
1 1            1

Step 1: Initialize the weights w1 = 0.1, w2 = 0.3.
Step 2: Set the learning rate η = 0.1 and the threshold value θ = 0.2.
Step 3: Apply the input patterns one by one and repeat Steps 4 and 5.
For input 1:
Let us consider the input X1 = [0, 0] with target t1 = 0.
Step 4: Compute the net input to the perceptron:

net_1 = \sum_{i=1}^{2} w_i x_i = 0.1(0) + 0.3(0) = 0

With the unipolar binary activation function, the output is obtained as o_1 = f(0) = 0.
Step 5: The output is the same as the target, t1 = 0; that is, the input pattern is correctly classified. Therefore, the weights and bias elements remain at their previous values; no weight update takes place.
Now the weight vector for the next input is W = [0.1 0.3].
For input 2:
Steps 4 and 5 are repeated for the next input, X2 = [0, 1], with target t2 = 1.
The net input is obtained as

net_2 = \sum_{i=1}^{2} w_i x_i = 0.1(0) + 0.3(1) = 0.3

The corresponding output is obtained as o2 = f(0.3) = 1.
The output is the same as the target, t2 = 1; that is, the input pattern is correctly classified. Therefore, the weights and bias elements remain at their previous values; no update takes place. The weight vector for the next input is W = [0.1 0.3].
For input 3:
Repeat Steps 4 and 5 for the next input, X3 = [1, 0], with target t3 = 1.
Compute the net input to the perceptron and the output:

net_3 = \sum_{i=1}^{2} w_i x_i = 0.1(1) + 0.3(0) = 0.1

o3 = f(0.1) = 0

The output is not the same as the target t3 = 1, so the weights are updated using equation (3.14):

w_1^3 = w_1^2 + \eta (t_3 - o_3) x_1^3 = 0.1 + 0.1(1 - 0)(1) = 0.2
w_2^3 = w_2^2 + \eta (t_3 - o_3) x_2^3 = 0.3 + 0.1(1 - 0)(0) = 0.3

So the weights are now [0.2 0.3].
For input 4:
Repeat Steps 4 and 5 for the next input, X4 = [1, 1], with target t4 = 1.
Compute the net input to the perceptron:

net_4 = \sum_{i=1}^{2} w_i x_i = 0.2(1) + 0.3(1) = 0.5

The corresponding output, using equation (3.13), is obtained as o4 = f(0.5) = 1.
The output is the same as the target, t4 = 1; that is, the input pattern is correctly classified. Therefore, the weights and bias elements remain at their previous values; no update takes place. The weight vector after completion of one cycle is W = [0.2 0.3].
The summary of the weight changes is described in Table 3.2.

Table 3.2: The updated weights
Input X1 X2   Net   Output   Target   Updated values w1  w2
(initial)                                            0.1  0.3
0 0           0     0        0                       0.1  0.3
0 1           0.3   1        1                       0.1  0.3
1 0           0.1   0        1                       0.2  0.3
1 1           0.5   1        1                       0.2  0.3
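The first training cycle above can be reproduced with a few lines of code. This is a sketch under the assumptions that no separate bias term is used, the hard limiter outputs 1 when net ≥ θ = 0.2 and 0 otherwise, and the update is w_i ← w_i + η(t − o)x_i.

```python
w, eta, theta = [0.1, 0.3], 0.1, 0.2
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # OR-gate patterns and targets

for (x1, x2), t in data:
    net = w[0] * x1 + w[1] * x2
    o = 1 if net >= theta else 0                              # unipolar hard limiter
    w = [w[0] + eta * (t - o) * x1, w[1] + eta * (t - o) * x2]
    print(f"x=({x1},{x2})  net={net:.1f}  o={o}  t={t}  w={w}")
# final weights after one cycle: [0.2, 0.3], matching Table 3.2
```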
Results

Fig. 3.4 The error profile during the training of the perceptron to learn the input-output relation of the OR gate (error vs. number of epochs).
Fig. 3.5 The error profile during the training of the perceptron to learn the input-output relation of the AND gate (error vs. number of epochs).
Fig. 3.6 The error profile during the training of the perceptron to learn the input-output relation of the XOR gate (error vs. number of epochs).
Single-Layer Continuous Perceptron networks (SLCP)
The activation function used in modeling the continuous perceptron is the sigmoid, which is differentiable.
The two advantages of using a continuous activation function are (i) finer control over the training procedure and (ii) the differentiable characteristic of the activation function, which is used for computation of the error gradient.
This gives the scope to use gradients in modifying the weights. The gradient or steepest-descent method is used in updating the weights: starting from an arbitrary weight vector W, the gradient ∇E(W) of the current error function is computed.
Single-Layer Continuous Perceptron networks
The updated weight vector may be written as

W^{k+1} = W^k - \eta \nabla E(W^k)        (3.22)

where η is the learning constant.
The error function at step k may be written as

E_k = \frac{1}{2}(t_k - o_k)^2        (3.23a)

or

E_k = \frac{1}{2}\left[t_k - f(W^T X_k)\right]^2        (3.23b)
SLCP
The error-minimization algorithm (3.22) requires computation of the gradient of the error function (3.23), which may be written as

\nabla E(W) = \nabla\left[\frac{1}{2}\left(t_k - f(net_k)\right)^2\right]        (3.24)

The (n+1)-dimensional gradient vector is defined as

\nabla E(W) = \left[\frac{\partial E}{\partial w_0}\;\; \frac{\partial E}{\partial w_1}\;\; \cdots \;\;\frac{\partial E}{\partial w_n}\right]^T        (3.25)
SLCP
Using (3.24), we obtain the gradient vector as

\nabla E(W) = -(t_k - o_k)\, f'(net_k) \left[\frac{\partial net_k}{\partial w_0}\;\; \frac{\partial net_k}{\partial w_1}\;\; \cdots \;\;\frac{\partial net_k}{\partial w_n}\right]^T        (3.26)

Since net_k = W^T X_k, we have

\frac{\partial net_k}{\partial w_i} = x_i,  for i = 0, 1, . . . , n        (3.27)

(x_0 = 1 for the bias element), and
SLCP
substituting (3.27) into (3.26), the gradient can be written as

\nabla E(W) = -(t_k - o_k)\, f'(net_k)\, X_k        (3.28a)

or, component-wise,

\frac{\partial E}{\partial w_i} = -(t_k - o_k)\, f'(net_k)\, x_i,  for i = 0, 1, . . . , n        (3.28b)

so that the adjustment of each weight is

\Delta w_i = -\eta \frac{\partial E(W)}{\partial w_i} = \eta\, (t_k - o_k)\, f'(net_k)\, x_i        (3.29)
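Equations (3.28)-(3.29) can be sanity-checked numerically. The sketch below (my own illustration, not from the slides) compares the analytic gradient −(t − o)f′(net)x with a central-difference estimate, assuming the bipolar sigmoid f(net) = 2/(1 + e^{−net}) − 1, whose derivative is f′(net) = (1 − o²)/2.

```python
import numpy as np

f = lambda net: 2.0 / (1.0 + np.exp(-net)) - 1.0        # bipolar sigmoid

def error(w, x, t):
    # E = 0.5 * (t - f(w . x))^2, as in (3.23b)
    return 0.5 * (t - f(np.dot(w, x))) ** 2

def analytic_grad(w, x, t):
    # dE/dw_i = -(t - o) * f'(net) * x_i, with f'(net) = (1 - o^2)/2
    o = f(np.dot(w, x))
    return -(t - o) * 0.5 * (1.0 - o ** 2) * x

w = np.array([0.1, 0.3, -0.2])
x = np.array([1.0, -1.0, 1.0])           # the last component can play the role of x0 = 1
t = 1.0

eps = 1e-6                               # central-difference numerical gradient
num = np.array([(error(w + eps * np.eye(3)[i], x, t) -
                 error(w - eps * np.eye(3)[i], x, t)) / (2 * eps) for i in range(3)])
print(np.allclose(num, analytic_grad(w, x, t)))          # True: matches (3.28b)
```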
SLCP
For the bipolar continuous (sigmoidal) activation function, f'(net_k) = \frac{1}{2}(1 - o_k^2), so the gradient (3.28a) can be written as

\nabla E(W) = -\frac{1}{2}(t_k - o_k)(1 - o_k^2)\, X_k        (3.32)

and the complete delta training rule for the bipolar continuous activation function results from (3.32) as

W^{k+1} = W^k + \frac{1}{2}\eta\, (t_k - o_k)(1 - o_k^2)\, X_k        (3.33)

where k denotes the number of the training step.
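A single update of the delta rule (3.33) could be sketched as follows (my own illustration; the sample weights, input and target are arbitrary, and the bipolar sigmoid f(net) = 2/(1 + e^{−net}) − 1 is assumed as the activation).

```python
import numpy as np

def bipolar_sigmoid(net):
    return 2.0 / (1.0 + np.exp(-net)) - 1.0

def delta_step(w, x, t, eta=0.5):
    """One continuous-perceptron update: w <- w + 0.5*eta*(t - o)*(1 - o**2)*x, as in (3.33)."""
    o = bipolar_sigmoid(np.dot(w, x))
    return w + 0.5 * eta * (t - o) * (1.0 - o ** 2) * x

w = np.array([0.1, 0.3, 0.0])            # last entry is the bias weight (x0 = 1)
x = np.array([1.0, -1.0, 1.0])
print(delta_step(w, x, t=1.0))
```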
Perceptron Convergence Theorem
Theorem: If the two sets of patterns P and N are finite and linearly separable, the perceptron learning algorithm applies only a finite number of corrections to the weight vector.

Proof: Let us make three simplifications, without losing generality:
(i) The sets P and N can be joined in a single set P' = P ∪ N⁻, where N⁻ consists of the negated elements of N.
(ii) The vectors in P' can be normalized (‖p_i‖ = 1), because if a weight vector w is found such that w · x > 0, then this is also valid for any other vector αx, where α > 0 is a constant.
(iii) The weight vector can also be normalized (‖w*‖ = 1). Since we assume that a solution for the linear separation problem exists, we call w* a normalized solution vector.
Perceptron Convergence Theorem
Now assume that after t + 1 steps the weight vector w_{t+1} has been computed. This means that at time t a vector p_i was incorrectly classified by the weight vector w_t, and so a correction was applied:

w_{t+1} = w_t + p_i        (3.37)

The cosine of the angle α between w_{t+1} and w* is

\cos\alpha = \frac{w^* \cdot w_{t+1}}{\|w_{t+1}\|}        (3.38)

Numerator of equation (3.38):

w^* \cdot w_{t+1} = w^* \cdot (w_t + p_i) = w^* \cdot w_t + w^* \cdot p_i \ge w^* \cdot w_t + \delta

where \delta = \min\{\, w^* \cdot p \mid p \in P' \,\}.
Perceptron Convergence Theorem
Since w* defines an absolute linear separation of P and N (the sets are finite and linearly separable), we know that δ > 0. By induction, we obtain

w^* \cdot w_{t+1} \ge w^* \cdot w_0 + (t + 1)\,\delta        (3.39)

(The induction: we have

w^* \cdot w_{t+1} \ge w^* \cdot w_t + \delta
              \ge w^* \cdot w_{t-1} + 2\delta
              \ge \cdots
              \ge w^* \cdot w_0 + (t + 1)\,\delta.)
Perceptron Convergence Theorem
Denominator of equation (3.38):

\|w_{t+1}\|^2 = (w_t + p_i) \cdot (w_t + p_i) = \|w_t\|^2 + 2\, w_t \cdot p_i + \|p_i\|^2

Since w_t \cdot p_i \le 0 (remember, we corrected w_t using p_i because p_i was misclassified),

\|w_{t+1}\|^2 \le \|w_t\|^2 + \|p_i\|^2 = \|w_t\|^2 + 1

(since p_i is normalized). By induction:

\|w_{t+1}\|^2 \le \|w_0\|^2 + (t + 1)        (3.40)
Substituting (3.39) and (3.40) into (3.38), we get

\cos\alpha \ge \frac{w^* \cdot w_0 + (t + 1)\,\delta}{\sqrt{\|w_0\|^2 + (t + 1)}}

The right-hand side grows proportionally to \sqrt{t}, and since δ > 0, it can become arbitrarily large. However, since \cos\alpha \le 1, t must be bounded by a maximum value; starting from w_0 = 0 the bound is

t + 1 \le \frac{1}{\delta^2}

Therefore, the number of corrections to the weight vector must be finite.
Limitations of Perceptron
Perceptrons will learn the solution, if there is a solution to be found. There are, however, limitations to their capabilities:
- First, the output values of a perceptron can take on only one of two values (True or False).
- Second, a perceptron can only classify linearly separable sets of vectors. If a straight line or plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable and the perceptron will find the solution.
- If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly.
- The most famous example of the perceptron's inability to solve problems with linearly non-separable vectors is the Boolean XOR realization, as the sketch below illustrates.
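This failure is easy to observe experimentally. The sketch below (my own illustration, reusing the discrete training loop from earlier) trains for a fixed number of epochs and counts the patterns that remain misclassified: zero for AND, but always at least one for XOR, since no weight vector can classify all four XOR patterns correctly.

```python
import numpy as np

def sgn(net):
    return 1 if net >= 0 else -1

def misclassified_after_training(X, T, eta=0.1, epochs=100):
    X = np.hstack([X, np.ones((len(X), 1))])      # bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, T):
            o = sgn(np.dot(w, x))
            w = w + eta * (t - o) * x             # perceptron learning rule
    return sum(sgn(np.dot(w, x)) != t for x, t in zip(X, T))

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
print(misclassified_after_training(X, np.array([-1, -1, -1, 1])))   # AND: 0 misclassified
print(misclassified_after_training(X, np.array([-1,  1,  1, -1])))  # XOR: at least 1 misclassified
```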