Perceptron Neural Networks


Perceptron

The Perceptron is one of the earliest models of the artificial neuron. It was proposed by Rosenblatt in 1958.

It is a single-layer neural network whose weights can be trained to produce a correct target vector when presented with the corresponding input vector.

The training technique used is called the Perceptron learning rule.

The Perceptron generated great interest due to its ability to generalize from its training vectors and to work with randomly distributed connections. Perceptrons are especially suited for problems in pattern classification.



The schematic diagram of the perceptron is shown in the figure below. Its synaptic weights are denoted by w_1, w_2, . . ., w_n. The inputs applied to the perceptron are denoted by x_1, x_2, . . ., x_n. The externally applied bias is denoted by b.

Fig. Schematic diagram of the perceptron: the inputs x_1, x_2, . . ., x_n (weighted by w_1, w_2, . . ., w_n) and the bias b are summed to form net, which passes through the hard-limiter activation f(.) to give the output o.


The net input to the activation of the neuron is written as

net = \sum_{i=1}^{n} w_i x_i + b

The output of the Perceptron is written as o = f(net), where f(.) is the activation function of the Perceptron. Depending upon the type of activation function, the Perceptron may be classified into two types:

Discrete perceptron, in which the activation function is the hard limiter or sgn(.) function.

Continuous perceptron, in which the activation function is the sigmoid function, which is differentiable.
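To make the two variants concrete, here is a minimal Python sketch of the net input and the two activation choices. The function names, the use of tanh as a bipolar sigmoid, and the sample numbers are illustrative assumptions, not taken from the slides.

```python
import math

def net_input(x, w, b):
    """Net input: net = sum_i w_i * x_i + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hard_limiter(net):
    """Discrete perceptron activation: sgn(net), bipolar output +1 or -1."""
    return 1 if net >= 0 else -1

def sigmoid(net):
    """Continuous perceptron activation: a differentiable bipolar sigmoid in (-1, 1)."""
    return math.tanh(net)

# Example with two inputs and arbitrary weights/bias.
x, w, b = [1.0, -0.5], [0.4, 0.9], 0.1
net = net_input(x, w, b)
print(net, hard_limiter(net), sigmoid(net))
```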


Perceptrons

Linear separability

A set of (2D) patterns (x_1, x_2) of two classes is linearly separable if there exists a line on the (x_1, x_2) plane,

w_0 + w_1 x_1 + w_2 x_2 = 0

that separates all patterns of one class from those of the other class.

A perceptron can be built with 3 inputs x_0 = 1, x_1, x_2 and weights w_0, w_1, w_2.

For n-dimensional patterns (x_1, . . ., x_n), the hyperplane w_0 + w_1 x_1 + w_2 x_2 + . . . + w_n x_n = 0 divides the space into two regions.

Can we get the weights from a set of sample patterns?

If the problem is linearly separable, then YES (by perceptron learning); a small sketch of the resulting hyperplane test follows below.

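The trained perceptron simply reports which side of this hyperplane an augmented input (1, x_1, . . ., x_n) falls on. A minimal sketch of that test is given below; the weight values are only for illustration (they happen to match the bipolar AND boundary given on a later slide).

```python
def hyperplane_side(x, w):
    """Sign of w0 + w1*x1 + ... + wn*xn for the augmented input (1, x1, ..., xn)."""
    s = sum(wi * xi for wi, xi in zip(w, (1.0,) + tuple(x)))
    return 1 if s >= 0 else -1

# Boundary -1 + x1 + x2 = 0, i.e. (w0, w1, w2) = (-1, 1, 1):
print(hyperplane_side((1, 1), (-1, 1, 1)))    # +1 side
print(hyperplane_side((-1, -1), (-1, 1, 1)))  # -1 side
```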


    LINEAR SEPARABILITY

Definition: Two sets of points A and B in an n-dimensional space are called linearly separable if n+1 real numbers w_1, w_2, w_3, . . ., w_{n+1} exist, such that every point (x_1, x_2, . . ., x_n) in A satisfies \sum_{i=1}^{n} w_i x_i \ge w_{n+1} and every point (x_1, x_2, . . ., x_n) in B satisfies \sum_{i=1}^{n} w_i x_i < w_{n+1}.

Absolute Linear Separability

Two sets of points A and B in an n-dimensional space are called absolutely linearly separable if n+1 real numbers w_1, w_2, w_3, . . ., w_{n+1} exist, such that every point (x_1, x_2, . . ., x_n) in A satisfies \sum_{i=1}^{n} w_i x_i > w_{n+1} and every point (x_1, x_2, . . ., x_n) in B satisfies \sum_{i=1}^{n} w_i x_i < w_{n+1}.

Two finite sets of points A and B in n-dimensional space which are linearly separable are also absolutely linearly separable. In general, absolutely linearly separable => linearly separable, but if the sets are finite, linearly separable => absolutely linearly separable.



Examples of linearly separable classes

- Logical AND function (bipolar patterns)

  x1   x2   output        decision boundary: -1 + x1 + x2 = 0
  -1   -1   -1            w1 = 1
  -1    1   -1            w2 = 1
   1   -1   -1            w0 = -1
   1    1    1

- Logical OR function (bipolar patterns)

  x1   x2   output        decision boundary: 1 + x1 + x2 = 0
  -1   -1   -1            w1 = 1
  -1    1    1            w2 = 1
   1   -1    1            w0 = 1
   1    1    1

Fig. Decision boundaries for AND and OR: x marks class I (output = 1), o marks class II (output = -1); in each case a straight line separates the two classes.
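As a quick check of the weights listed above, the following sketch evaluates the bipolar hard limiter on all four patterns of each gate (it assumes sgn(0) = +1, a case that does not occur for these boundaries):

```python
def sgn(net):
    return 1 if net >= 0 else -1

def classify(x1, x2, w0, w1, w2):
    """Bipolar perceptron output for the boundary w0 + w1*x1 + w2*x2 = 0."""
    return sgn(w0 + w1 * x1 + w2 * x2)

patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
# AND uses w0 = -1, OR uses w0 = 1; both use w1 = w2 = 1.
for name, w0 in (("AND", -1), ("OR", 1)):
    print(name, [classify(x1, x2, w0, 1, 1) for x1, x2 in patterns])
# AND -> [-1, -1, -1, 1], OR -> [-1, 1, 1, 1]
```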


    Single Layer Discrete Perceptron Networks (SLDP)


Fig. 3.2 Illustration of the hyperplane (in this example, a straight line) as the decision boundary for a two-dimensional, two-class pattern classification problem: class C1 lies on one side of the line in the (x1, x2) plane, class C2 on the other.

To develop insight into the behavior of a pattern classifier, it is necessary to plot a map of the decision regions in the n-dimensional space spanned by the n input variables. The two decision regions are separated by a hyperplane defined by

\sum_{i=0}^{n} w_i x_i = 0




Fig. 3.3 (a) A pair of linearly separable patterns (classes C1 and C2 on opposite sides of a decision boundary). (b) A pair of nonlinearly separable patterns.

For the Perceptron to function properly, the two classes C1 and C2 must be linearly separable.

In Fig. 3.3(a), the two classes C1 and C2 are sufficiently separated from each other to draw a hyperplane (in this case a straight line) as the decision boundary.


Assume that the input variables originate from two linearly separable classes.

Let \mathcal{X}_1 be the subset of training vectors X_1(1), X_1(2), . . . that belong to class C1, and \mathcal{X}_2 be the subset of training vectors X_2(1), X_2(2), . . . that belong to class C2.

Given the sets of vectors \mathcal{X}_1 and \mathcal{X}_2 to train the classifier, the training process involves the adjustment of the weight vector W in such a way that the two classes C1 and C2 are separated. That is, there exists a weight vector W such that we may write

W^T X > 0 for every input vector X belonging to class C1
W^T X \le 0 for every input vector X belonging to class C2



The algorithm for updating the weights may be formulated as follows:

1. If the kth member of the training set, X_k, is correctly classified by the weight vector W_k computed at the kth iteration of the algorithm, no correction is made to the weight vector of the Perceptron, in accordance with the rule

W_{k+1} = W_k if W_k^T X_k > 0 and X_k belongs to class C1
W_{k+1} = W_k if W_k^T X_k \le 0 and X_k belongs to class C2

2. Otherwise, the weight vector of the Perceptron is updated in accordance with the rule

W_{k+1}^T = W_k^T - \eta X_k^T if W_k^T X_k > 0 and X_k belongs to class C2
W_{k+1}^T = W_k^T + \eta X_k^T if W_k^T X_k \le 0 and X_k belongs to class C1

where the learning-rate parameter \eta controls the adjustment applied to the weight vector.

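Read as code, the two cases above collapse into a single conditional correction. The following Python sketch of one iteration is illustrative (the function name, the bias-folding convention x_0 = 1, and the default \eta are assumptions):

```python
def perceptron_step(w, x, cls, eta=1.0):
    """One step of the fixed-increment rule: correct w only if x is misclassified.

    w and x are sequences of equal length (a bias can be folded in as x0 = 1);
    cls is "C1" or "C2".
    """
    wx = sum(wi * xi for wi, xi in zip(w, x))
    if cls == "C1" and wx <= 0:        # should satisfy w.x > 0, so add eta * x
        return [wi + eta * xi for wi, xi in zip(w, x)]
    if cls == "C2" and wx > 0:         # should satisfy w.x <= 0, so subtract eta * x
        return [wi - eta * xi for wi, xi in zip(w, x)]
    return list(w)                     # correctly classified: no correction
```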


Discrete Perceptron training algorithm

Consider that P training patterns are available for training the model: {(X_1, t_1), (X_2, t_2), . . ., (X_P, t_P)}, where X_i is the ith input vector and t_i is the ith target output, i = 1, 2, . . ., P.

    Learning Algorithm

Step 1: Set the learning rate \eta (0 < \eta \le 1).

Step 2: Initialize the weights and the threshold \theta.

Step 3: Apply the input patterns one by one (p = 1, 2, . . ., P), repeating steps 4 and 5 for each.


Step 4: Compute the output response

net_p = \sum_{i=1}^{n} w_i^k x_{ip} + b

o_p = f(net_p)

where f(net_p) is the activation function.

For the bipolar binary activation function:

o_p = f(net_p) = +1 if net_p \ge \theta, and -1 otherwise.

For the unipolar binary activation function:

o_p = f(net_p) = 1 if net_p \ge \theta, and 0 otherwise.


Step 5: Update the weights

w_i^{k+1} = w_i^k + \frac{1}{2} \eta (t_p - o_p) x_{ip}

Here, the weights are updated only if the target and the output do not match. (With bipolar targets t_p - o_p = \pm 2, so each correction is \pm \eta x_{ip}; in the unipolar worked example below the update is applied as w_i^{k+1} = w_i^k + \eta (t_p - o_p) x_{ip}.)

Step 6: If p < P, then p <- p + 1, go to step 4 and compute the output response for the next input; otherwise go to step 7.

Step 7: Test the stopping condition: if the weights have not changed, stop and store the final weights (W) and bias (b); else go to step 3.

The network training stops when all the input vectors are correctly classified, i.e. when the target value matches the output for all the input vectors.
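Putting steps 3-7 together, here is a compact Python sketch of the training loop. For concreteness it uses the unipolar hard limiter and the values from the worked OR-gate example that follows (initial w = [0.1, 0.3], \eta = 0.1, \theta = 0.2, bias held at 0); those choices, the function name, and the epoch cap are assumptions for illustration.

```python
def train_discrete_perceptron(patterns, targets, w, eta=0.1, theta=0.2, max_epochs=20):
    """Discrete perceptron training with a unipolar hard limiter (steps 3-7)."""
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(patterns, targets):
            net = sum(wi * xi for wi, xi in zip(w, x))        # step 4: net input
            o = 1 if net >= theta else 0                      # step 4: output response
            if o != t:                                        # step 5: update on mismatch
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:                                       # step 7: stopping condition
            break
    return w

# OR-gate data with the hand calculation's initial weights.
w = train_discrete_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 1], [0.1, 0.3])
print(w)   # [0.2, 0.3], matching Table 3.2 after one cycle
```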


Example:

Build the Perceptron network to realize fundamental logic gates, such as AND, OR and XOR.

Solution:

The following steps show the hand calculations with the OR-gate input-output data.

Table: OR logic gate function

  Input          Output (Target)
  X1   X2
  0    0         0
  0    1         1
  1    0         1
  1    1         1

Step 1: Initialize the weights w1 = 0.1, w2 = 0.3.

Step 2: Set the learning rate \eta = 0.1 and the threshold value \theta = 0.2.

Step 3: Apply the input patterns one by one and repeat steps 4 and 5.


For input 1:

Let us consider the input X_1 = [0, 0] with target t_1 = 0.

Step 4: Compute the net input to the Perceptron, using the equation

net_1 = \sum_{i=1}^{2} w_i^0 x_i^1 + b = 0.1 x 0 + 0.3 x 0 = 0

With the unipolar binary activation function (and \theta = 0.2), the output is obtained as

o_1 = f(0) = 0

Step 5: The output is the same as the target, t_1 = 0; that is, the input pattern is correctly classified. Therefore, the weights and bias remain at their previous values; no weight update takes place.

Now the weight vector for the next input is w = [0.1  0.3].


For input 2:

Steps 4 and 5 are repeated for the next input, X_2 = [0, 1], with target t_2 = 1.

The net input is obtained as

net_2 = \sum_{i=1}^{2} w_i^1 x_i^2 + b = 0.1 x 0 + 0.3 x 1 = 0.3

The corresponding output is obtained as o_2 = f(0.3) = 1.

The output is the same as the target, t_2 = 1; that is, the input pattern is correctly classified. Therefore, the weights and bias remain at their previous values; no update takes place. The weight vector for the next input is w = [0.1  0.3].


For input 3:

Repeat steps 4 and 5 for the next input, X_3 = [1, 0], with target t_3 = 1.

Compute the net input to the Perceptron and the output:

net_3 = \sum_{i=1}^{2} w_i^2 x_i^3 + b = 0.1 x 1 + 0.3 x 0 = 0.1

o_3 = f(0.1) = 0

The output is not the same as the target, t_3 = 1, so the weights are updated using equation (3.14):

w_1^3 = w_1^2 + \eta (t_3 - o_3) x_1^3 = 0.1 + 0.1 x (1 - 0) x 1 = 0.2

w_2^3 = w_2^2 + \eta (t_3 - o_3) x_2^3 = 0.3 + 0.1 x (1 - 0) x 0 = 0.3

So the weights are [0.2  0.3].


For input 4:

Repeat steps 4 and 5 for the next input, X_4 = [1, 1], with target t_4 = 1.

Compute the net input to the Perceptron and the output:

net_4 = \sum_{i=1}^{2} w_i^3 x_i^4 + b = 0.2 x 1 + 0.3 x 1 = 0.5

The corresponding output, using equation (3.13), is obtained as o_4 = f(0.5) = 1.

The output is the same as the target, t_4 = 1; that is, the input pattern is correctly classified. Therefore, the weights and bias remain at their previous values; no update takes place. The weight vector after completion of one cycle is w = [0.2  0.3].

The summary of the weight changes is given in Table 3.2.

Table 3.2: The updated weights

  Input          Net    Output   Target   Updated values
  X1   X2                                 w1     w2
  -    -         -      -        -        0.1    0.3   (initial)
  0    0         0      0        0        0.1    0.3
  0    1         0.3    1        1        0.1    0.3
  1    0         0.1    0        1        0.2    0.3
  1    1         0.5    1        1        0.2    0.3


Results

Fig. 3.4 The error profile (error vs. number of epochs) during the training of the Perceptron to learn the input-output relation of the OR gate.

Fig. 3.5 The error profile (error vs. number of epochs) during the training of the Perceptron to learn the input-output relation of the AND gate.


Fig. 3.6 The error profile (error vs. number of epochs) during the training of the Perceptron to learn the input-output relation of the XOR gate.


Single-Layer Continuous Perceptron networks (SLCP)

The activation function used in modeling the continuous Perceptron is sigmoidal, which is differentiable.

The two advantages of using a continuous activation function are (i) finer control over the training procedure and (ii) the differentiable character of the activation function, which allows computation of the error gradient.

This gives scope for using gradients to modify the weights. The gradient or steepest-descent method is used for updating the weights: starting from an arbitrary weight vector W, the gradient \nabla E(W) of the current error function is computed.


The updated weight vector may be written as

W_{k+1} = W_k - \eta \nabla E(W_k)       (3.22)

where \eta is the learning constant.

The error function at step k may be written as

E_k = \frac{1}{2} (t_k - o_k)^2       (3.23a)

or

E_k = \frac{1}{2} [t_k - f(W_k^T X)]^2       (3.23b)


The error-minimization algorithm (3.22) requires computation of the gradient of the error function (3.23), which may be written as

\nabla E(W_k) = \nabla \left[ \frac{1}{2} (t_k - f(net_k))^2 \right]       (3.24)

The (n+1)-dimensional gradient vector is defined as

\nabla E(W_k) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, . . ., \frac{\partial E}{\partial w_n} \right]^T       (3.25)


Using (3.24), we obtain the gradient vector as

\nabla E(W_k) = -(t_k - o_k) f'(net_k) \left[ \frac{\partial net_k}{\partial w_0}, \frac{\partial net_k}{\partial w_1}, . . ., \frac{\partial net_k}{\partial w_n} \right]^T       (3.26)

Since net_k = W_k^T X, we have

\frac{\partial net_k}{\partial w_i} = x_i, for i = 0, 1, . . ., n       (3.27)

(x_0 = 1 for the bias element).


Using (3.27), equation (3.26) can be written as

\nabla E(W_k) = -(t_k - o_k) f'(net_k) X       (3.28a)

or

\frac{\partial E}{\partial w_i} = -(t_k - o_k) f'(net_k) x_i, for i = 0, 1, . . ., n       (3.28b)

so that the weight adjustment is

\Delta w_i = -\eta \frac{\partial E(W_k)}{\partial w_i} = \eta (t_k - o_k) f'(net_k) x_i       (3.29)
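Equation (3.28b) can be verified numerically by comparing it with a finite-difference estimate of \partial E / \partial w_i. The sketch below assumes the bipolar sigmoid f(net) = 2/(1 + e^{-net}) - 1, for which f'(net) = (1 - f(net)^2)/2, and uses arbitrary test values:

```python
import math

def f(net):
    """Bipolar sigmoid: f(net) = 2 / (1 + exp(-net)) - 1."""
    return 2.0 / (1.0 + math.exp(-net)) - 1.0

def error(w, x, t):
    """E = 0.5 * (t - f(w . x))**2, as in eq. (3.23b)."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (t - f(net)) ** 2

def analytic_grad(w, x, t):
    """Eq. (3.28b): dE/dw_i = -(t - o) * f'(net) * x_i, with f'(net) = (1 - o**2)/2."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = f(net)
    return [-(t - o) * 0.5 * (1.0 - o * o) * xi for xi in x]

def numeric_grad(w, x, t, h=1e-6):
    """Central finite differences on E, for comparison."""
    grads = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        grads.append((error(wp, x, t) - error(wm, x, t)) / (2 * h))
    return grads

w, x, t = [0.2, -0.4, 0.1], [1.0, 0.5, -1.0], 1.0
print(analytic_grad(w, x, t))
print(numeric_grad(w, x, t))   # the two should agree to several decimal places
```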


For the bipolar continuous (sigmoid) activation function, f'(net_k) = \frac{1}{2}(1 - o_k^2), so the gradient (3.28a) can be written as

\nabla E(W_k) = -\frac{1}{2} (t_k - o_k)(1 - o_k^2) X       (3.32)

and the complete delta training rule for the bipolar continuous activation function results from (3.32) as

W_{k+1} = W_k + \frac{1}{2} \eta (t_k - o_k)(1 - o_k^2) X_k       (3.33)

where k denotes the number of the training step.
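A minimal Python sketch of this delta rule for a single continuous perceptron is shown below. The bipolar sigmoid 2/(1 + e^{-net}) - 1, the learning constant, the bipolar OR data, and the epoch count are all illustrative assumptions:

```python
import math

def bipolar_sigmoid(net):
    """f(net) = 2 / (1 + exp(-net)) - 1; its derivative is (1 - f(net)**2) / 2."""
    return 2.0 / (1.0 + math.exp(-net)) - 1.0

def delta_step(w, x, t, eta=0.5):
    """One update W <- W + 0.5 * eta * (t - o) * (1 - o**2) * X, as in eq. (3.33)."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = bipolar_sigmoid(net)
    return [wi + 0.5 * eta * (t - o) * (1.0 - o * o) * xi for wi, xi in zip(w, x)]

# Illustrative run: x0 = 1 carries the bias weight; targets are bipolar.
w = [0.0, 0.0, 0.0]
or_data = [((1, -1, -1), -1), ((1, -1, 1), 1), ((1, 1, -1), 1), ((1, 1, 1), 1)]
for _ in range(200):
    for x, t in or_data:
        w = delta_step(w, x, t)
print(w)   # w0, w1, w2 move toward an OR-like separating boundary
```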


Perceptron Convergence Theorem

Theorem: If the two finite sets of training vectors P and N are linearly separable, the perceptron learning algorithm applies only a finite number of corrections to the weight vector; i.e. it converges in a finite number of steps.

Proof: Let us make three simplifications, without losing generality:

(i) The sets P and N can be joined in a single set P' = P \cup N^-, where N^- consists of the negated elements of N.

(ii) The vectors in P' can be normalized (\|p_i\| = 1), because if a weight vector w is found such that w \cdot x > 0, then this is also valid for any other vector \alpha x, where \alpha is a constant.

(iii) The weight vector can also be normalized (\|w^*\| = 1). Since we assume that a solution to the linear separation problem exists, we call w^* a normalized solution vector.


Now, assume that after t+1 steps the weight vector w_{t+1} has been computed. This means that at time t a vector p_i was incorrectly classified by the weight vector w_t, and so a correction was applied:

w_{t+1} = w_t + p_i       (3.37)

The cosine of the angle \rho between w_{t+1} and w^* is

\cos \rho = \frac{w^* \cdot w_{t+1}}{\|w_{t+1}\|}       (3.38)

Numerator of equation (3.38):

w^* \cdot w_{t+1} = w^* \cdot (w_t + p_i) = w^* \cdot w_t + w^* \cdot p_i \ge w^* \cdot w_t + \delta

where \delta = \min \{ w^* \cdot p : p \in P' \}.


Since w^* defines an absolute linear separation of P and N (recall that finite, linearly separable sets are absolutely linearly separable), we know that \delta > 0. By induction, we obtain

w^* \cdot w_{t+1} \ge w^* \cdot w_0 + (t+1) \delta       (3.39)

(The induction is:

w^* \cdot w_{t+1} \ge w^* \cdot w_t + \delta
               \ge w^* \cdot w_{t-1} + 2\delta
               . . .
               \ge w^* \cdot w_0 + (t+1) \delta.)


Denominator of equation (3.38):

\|w_{t+1}\|^2 = (w_t + p_i) \cdot (w_t + p_i) = \|w_t\|^2 + 2 w_t \cdot p_i + \|p_i\|^2

Since w_t \cdot p_i \le 0 (remember that we corrected w_t using p_i),

\|w_{t+1}\|^2 \le \|w_t\|^2 + \|p_i\|^2 \le \|w_t\|^2 + 1       (since p_i is normalized)

By induction:

\|w_{t+1}\|^2 \le \|w_0\|^2 + (t+1)       (3.40)


Substituting (3.39) and (3.40) in (3.38), we get

\cos \rho = \frac{w^* \cdot w_{t+1}}{\|w_{t+1}\|} \ge \frac{w^* \cdot w_0 + (t+1)\delta}{\sqrt{\|w_0\|^2 + (t+1)}}

The right-hand side grows proportionally to \sqrt{t+1} (for w_0 = 0 it equals \sqrt{t+1}\,\delta), and since \delta > 0 it can become arbitrarily large. However, since \cos \rho \le 1, t must be bounded by a maximum value, roughly t \le \frac{1}{\delta^2}.

Therefore, the number of corrections to the weight vector must be finite.


Limitations of the Perceptron

There are limitations to the capabilities of the Perceptron. It will learn the solution, if there is a solution to be found.

First, the output values of a Perceptron can take on only one of two values (True or False).

Second, a Perceptron can only classify linearly separable sets of vectors. If a straight line or plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable and the Perceptron will find the solution.

If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly.

The most famous example of the Perceptron's inability to solve problems with linearly non-separable vectors is the Boolean XOR realization.
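To see this failure concretely, the sketch below runs the discrete perceptron rule on the (unipolar) XOR data; because XOR is not linearly separable, some pattern remains misclassified no matter how many epochs are allowed. The parameter values and the epoch cap are illustrative assumptions:

```python
def train(patterns, targets, w, b=0.0, eta=0.1, theta=0.2, max_epochs=1000):
    """Discrete perceptron training; returns final weights, bias and remaining errors."""
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(patterns, targets):
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= theta else 0
            if o != t:
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                b += eta * (t - o)
                errors += 1
        if errors == 0:
            break
    return w, b, errors

xor_x, xor_t = [(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 0]
print(train(xor_x, xor_t, [0.1, 0.3]))   # errors never reaches 0 for XOR
```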