Perceptron Neural Networks


Perceptron

The Perceptron is one of the earliest models of the artificial neuron. It was proposed by Rosenblatt in 1958.

It is a single-layer neural network whose weights can be trained to produce a correct target vector when presented with the corresponding input vector.

The training technique used is called the Perceptron learning rule.

The Perceptron generated great interest due to its ability to generalize from its training vectors and to work with randomly distributed connections. Perceptrons are especially suited for problems in pattern classification.



The schematic diagram of the perceptron is shown in the figure below. Its synaptic weights are denoted by w_1, w_2, . . ., w_n. The inputs applied to the perceptron are denoted by x_1, x_2, . . ., x_n. The externally applied bias is denoted by b.

Fig. Schematic diagram of the perceptron: the inputs x_1, x_2, . . ., x_n (weighted by w_1, w_2, . . ., w_n) and the bias b are summed to form net, which passes through the hard-limiter activation f(.) to give the output o.


The net input to the activation of the neuron is written as

net = \sum_{i=1}^{n} w_i x_i + b

The output of the Perceptron is written as o = f(net), where f(.) is the activation function of the Perceptron. Depending upon the type of activation function, the Perceptron may be classified into two types:

Discrete perceptron, in which the activation function is the hard limiter or sgn(.) function.

Continuous perceptron, in which the activation function is the sigmoid function, which is differentiable.
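To make the two variants concrete, here is a minimal Python sketch of the net input and the two activation choices. The function names, the use of tanh as a bipolar sigmoid, and the sample numbers are illustrative assumptions, not taken from the slides.

```python
import math

def net_input(x, w, b):
    """Net input: net = sum_i w_i * x_i + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hard_limiter(net):
    """Discrete perceptron activation: sgn(net), bipolar output +1 or -1."""
    return 1 if net >= 0 else -1

def sigmoid(net):
    """Continuous perceptron activation: a differentiable bipolar sigmoid in (-1, 1)."""
    return math.tanh(net)

# Example with two inputs and arbitrary weights/bias.
x, w, b = [1.0, -0.5], [0.4, 0.9], 0.1
net = net_input(x, w, b)
print(net, hard_limiter(net), sigmoid(net))
```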


Perceptrons

Linear separability

A set of (2D) patterns (x_1, x_2) of two classes is linearly separable if there exists a line on the (x_1, x_2) plane,

w_0 + w_1 x_1 + w_2 x_2 = 0

that separates all patterns of one class from those of the other class.

A perceptron can be built with 3 inputs x_0 = 1, x_1, x_2 and weights w_0, w_1, w_2.

For n-dimensional patterns (x_1, . . ., x_n), the hyperplane w_0 + w_1 x_1 + w_2 x_2 + . . . + w_n x_n = 0 divides the space into two regions.

Can we get the weights from a set of sample patterns?

If the problem is linearly separable, then YES (by perceptron learning); a small sketch of the resulting hyperplane test follows below.

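The trained perceptron simply reports which side of this hyperplane an augmented input (1, x_1, . . ., x_n) falls on. A minimal sketch of that test is given below; the weight values are only for illustration (they happen to match the bipolar AND boundary given on a later slide).

```python
def hyperplane_side(x, w):
    """Sign of w0 + w1*x1 + ... + wn*xn for the augmented input (1, x1, ..., xn)."""
    s = sum(wi * xi for wi, xi in zip(w, (1.0,) + tuple(x)))
    return 1 if s >= 0 else -1

# Boundary -1 + x1 + x2 = 0, i.e. (w0, w1, w2) = (-1, 1, 1):
print(hyperplane_side((1, 1), (-1, 1, 1)))    # +1 side
print(hyperplane_side((-1, -1), (-1, 1, 1)))  # -1 side
```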


    LINEAR SEPARABILITY

Definition: Two sets of points A and B in an n-dimensional space are called linearly separable if n+1 real numbers w_1, w_2, w_3, . . ., w_{n+1} exist, such that every point (x_1, x_2, . . ., x_n) in A satisfies \sum_{i=1}^{n} w_i x_i \ge w_{n+1} and every point (x_1, x_2, . . ., x_n) in B satisfies \sum_{i=1}^{n} w_i x_i < w_{n+1}.

Absolute Linear Separability

Two sets of points A and B in an n-dimensional space are called absolutely linearly separable if n+1 real numbers w_1, w_2, w_3, . . ., w_{n+1} exist, such that every point (x_1, x_2, . . ., x_n) in A satisfies \sum_{i=1}^{n} w_i x_i > w_{n+1} and every point (x_1, x_2, . . ., x_n) in B satisfies \sum_{i=1}^{n} w_i x_i < w_{n+1}.

Two finite sets of points A and B in n-dimensional space which are linearly separable are also absolutely linearly separable. In general, absolutely linearly separable => linearly separable, but if the sets are finite, linearly separable => absolutely linearly separable.



Examples of linearly separable classes

- Logical AND function (bipolar patterns)

  x1   x2   output        decision boundary: -1 + x1 + x2 = 0
  -1   -1   -1            w1 = 1
  -1    1   -1            w2 = 1
   1   -1   -1            w0 = -1
   1    1    1

- Logical OR function (bipolar patterns)

  x1   x2   output        decision boundary: 1 + x1 + x2 = 0
  -1   -1   -1            w1 = 1
  -1    1    1            w2 = 1
   1   -1    1            w0 = 1
   1    1    1

Fig. Decision boundaries for AND and OR: x marks class I (output = 1), o marks class II (output = -1); in each case a straight line separates the two classes.
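As a quick check of the weights listed above, the following sketch evaluates the bipolar hard limiter on all four patterns of each gate (it assumes sgn(0) = +1, a case that does not occur for these boundaries):

```python
def sgn(net):
    return 1 if net >= 0 else -1

def classify(x1, x2, w0, w1, w2):
    """Bipolar perceptron output for the boundary w0 + w1*x1 + w2*x2 = 0."""
    return sgn(w0 + w1 * x1 + w2 * x2)

patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
# AND uses w0 = -1, OR uses w0 = 1; both use w1 = w2 = 1.
for name, w0 in (("AND", -1), ("OR", 1)):
    print(name, [classify(x1, x2, w0, 1, 1) for x1, x2 in patterns])
# AND -> [-1, -1, -1, 1], OR -> [-1, 1, 1, 1]
```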


    Single Layer Discrete Perceptron Networks (SLDP)


Fig. 3.2 Illustration of the hyperplane (in this example, a straight line) as the decision boundary for a two-dimensional, two-class pattern classification problem: class C1 lies on one side of the line in the (x1, x2) plane, class C2 on the other.

To develop insight into the behavior of a pattern classifier, it is necessary to plot a map of the decision regions in the n-dimensional space spanned by the n input variables. The two decision regions are separated by a hyperplane defined by

\sum_{i=0}^{n} w_i x_i = 0




Fig. 3.3 (a) A pair of linearly separable patterns (classes C1 and C2 on opposite sides of a decision boundary). (b) A pair of nonlinearly separable patterns.

For the Perceptron to function properly, the two classes C1 and C2 must be linearly separable.

In Fig. 3.3(a), the two classes C1 and C2 are sufficiently separated from each other to draw a hyperplane (in this case a straight line) as the decision boundary.


Assume that the input variables originate from two linearly separable classes.

Let \mathcal{X}_1 be the subset of training vectors X_1(1), X_1(2), . . . that belong to class C1, and \mathcal{X}_2 be the subset of training vectors X_2(1), X_2(2), . . . that belong to class C2.

Given the sets of vectors \mathcal{X}_1 and \mathcal{X}_2 to train the classifier, the training process involves the adjustment of the weight vector W in such a way that the two classes C1 and C2 are separated. That is, there exists a weight vector W such that we may write

W^T X > 0 for every input vector X belonging to class C1
W^T X \le 0 for every input vector X belonging to class C2



The algorithm for updating the weights may be formulated as follows:

1. If the kth member of the training set, X_k, is correctly classified by the weight vector W_k computed at the kth iteration of the algorithm, no correction is made to the weight vector of the Perceptron, in accordance with the rule

W_{k+1} = W_k if W_k^T X_k > 0 and X_k belongs to class C1
W_{k+1} = W_k if W_k^T X_k \le 0 and X_k belongs to class C2

2. Otherwise, the weight vector of the Perceptron is updated in accordance with the rule

W_{k+1}^T = W_k^T - \eta X_k^T if W_k^T X_k > 0 and X_k belongs to class C2
W_{k+1}^T = W_k^T + \eta X_k^T if W_k^T X_k \le 0 and X_k belongs to class C1

where the learning-rate parameter \eta controls the adjustment applied to the weight vector.

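Read as code, the two cases above collapse into a single conditional correction. The following Python sketch of one iteration is illustrative (the function name, the bias-folding convention x_0 = 1, and the default \eta are assumptions):

```python
def perceptron_step(w, x, cls, eta=1.0):
    """One step of the fixed-increment rule: correct w only if x is misclassified.

    w and x are sequences of equal length (a bias can be folded in as x0 = 1);
    cls is "C1" or "C2".
    """
    wx = sum(wi * xi for wi, xi in zip(w, x))
    if cls == "C1" and wx <= 0:        # should satisfy w.x > 0, so add eta * x
        return [wi + eta * xi for wi, xi in zip(w, x)]
    if cls == "C2" and wx > 0:         # should satisfy w.x <= 0, so subtract eta * x
        return [wi - eta * xi for wi, xi in zip(w, x)]
    return list(w)                     # correctly classified: no correction
```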


Discrete Perceptron training algorithm

Consider that P training patterns are available for training the model: {(X_1, t_1), (X_2, t_2), . . ., (X_P, t_P)}, where X_i is the ith input vector and t_i is the ith target output, i = 1, 2, . . ., P.

    Learning Algorithm

Step 1: Set the learning rate \eta (0 < \eta \le 1).

Step 2: Initialize the weights and the threshold \theta.

Step 3: Apply the input patterns one by one (p = 1, 2, . . ., P), repeating steps 4 and 5 for each.


Step 4: Compute the output response

net_p = \sum_{i=1}^{n} w_i^k x_{ip} + b

o_p = f(net_p)

where f(net_p) is the activation function.

For the bipolar binary activation function:

o_p = f(net_p) = +1 if net_p \ge \theta, and -1 otherwise.

For the unipolar binary activation function:

o_p = f(net_p) = 1 if net_p \ge \theta, and 0 otherwise.


Step 5: Update the weights

w_i^{k+1} = w_i^k + \frac{1}{2} \eta (t_p - o_p) x_{ip}

Here, the weights are updated only if the target and the output do not match. (With bipolar targets t_p - o_p = \pm 2, so each correction is \pm \eta x_{ip}; in the unipolar worked example below the update is applied as w_i^{k+1} = w_i^k + \eta (t_p - o_p) x_{ip}.)

Step 6: If p < P, then p <- p + 1, go to step 4 and compute the output response for the next input; otherwise go to step 7.

Step 7: Test the stopping condition: if the weights have not changed, stop and store the final weights (W) and bias (b); else go to step 3.

The network training stops when all the input vectors are correctly classified, i.e. when the target value matches the output for all the input vectors.
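Putting steps 3-7 together, here is a compact Python sketch of the training loop. For concreteness it uses the unipolar hard limiter and the values from the worked OR-gate example that follows (initial w = [0.1, 0.3], \eta = 0.1, \theta = 0.2, bias held at 0); those choices, the function name, and the epoch cap are assumptions for illustration.

```python
def train_discrete_perceptron(patterns, targets, w, eta=0.1, theta=0.2, max_epochs=20):
    """Discrete perceptron training with a unipolar hard limiter (steps 3-7)."""
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(patterns, targets):
            net = sum(wi * xi for wi, xi in zip(w, x))        # step 4: net input
            o = 1 if net >= theta else 0                      # step 4: output response
            if o != t:                                        # step 5: update on mismatch
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:                                       # step 7: stopping condition
            break
    return w

# OR-gate data with the hand calculation's initial weights.
w = train_discrete_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 1], [0.1, 0.3])
print(w)   # [0.2, 0.3], matching Table 3.2 after one cycle
```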


Example:

Build the Perceptron network to realize fundamental logic gates, such as AND, OR and XOR.

Solution:

The following steps show the hand calculations with the OR-gate input-output data.

Table: OR logic gate function

  Input          Output (Target)
  X1   X2
  0    0         0
  0    1         1
  1    0         1
  1    1         1

Step 1: Initialize the weights w1 = 0.1, w2 = 0.3.

Step 2: Set the learning rate \eta = 0.1 and the threshold value \theta = 0.2.

Step 3: Apply the input patterns one by one and repeat steps 4 and 5.


For input 1:

Let us consider the input X_1 = [0, 0] with target t_1 = 0.

Step 4: Compute the net input to the Perceptron, using the equation

net_1 = \sum_{i=1}^{2} w_i^0 x_i^1 + b = 0.1 x 0 + 0.3 x 0 = 0

With the unipolar binary activation function (and \theta = 0.2), the output is obtained as

o_1 = f(0) = 0

Step 5: The output is the same as the target, t_1 = 0; that is, the input pattern is correctly classified. Therefore, the weights and bias remain at their previous values; no weight update takes place.

Now the weight vector for the next input is w = [0.1  0.3].


For input 2:

Steps 4 and 5 are repeated for the next input, X_2 = [0, 1], with target t_2 = 1.

The net input is obtained as

net_2 = \sum_{i=1}^{2} w_i^1 x_i^2 + b = 0.1 x 0 + 0.3 x 1 = 0.3

The corresponding output is obtained as o_2 = f(0.3) = 1.

The output is the same as the target, t_2 = 1; that is, the input pattern is correctly classified. Therefore, the weights and bias remain at their previous values; no update takes place. The weight vector for the next input is w = [0.1  0.3].


For input 3:

Repeat steps 4 and 5 for the next input, X_3 = [1, 0], with target t_3 = 1.

Compute the net input to the Perceptron and the output:

net_3 = \sum_{i=1}^{2} w_i^2 x_i^3 + b = 0.1 x 1 + 0.3 x 0 = 0.1

o_3 = f(0.1) = 0

The output is not the same as the target, t_3 = 1, so the weights are updated using equation (3.14):

w_1^3 = w_1^2 + \eta (t_3 - o_3) x_1^3 = 0.1 + 0.1 x (1 - 0) x 1 = 0.2

w_2^3 = w_2^2 + \eta (t_3 - o_3) x_2^3 = 0.3 + 0.1 x (1 - 0) x 0 = 0.3

So the weights are [0.2  0.3].


For input 4:

Repeat steps 4 and 5 for the next input, X_4 = [1, 1], with target t_4 = 1.

Compute the net input to the Perceptron and the output:

net_4 = \sum_{i=1}^{2} w_i^3 x_i^4 + b = 0.2 x 1 + 0.3 x 1 = 0.5

The corresponding output, using equation (3.13), is obtained as o_4 = f(0.5) = 1.

The output is the same as the target, t_4 = 1; that is, the input pattern is correctly classified. Therefore, the weights and bias remain at their previous values; no update takes place. The weight vector after completion of one cycle is w = [0.2  0.3].

The summary of the weight changes is given in Table 3.2.

Table 3.2: The updated weights

  Input          Net    Output   Target   Updated values
  X1   X2                                 w1     w2
  -    -         -      -        -        0.1    0.3   (initial)
  0    0         0      0        0        0.1    0.3
  0    1         0.3    1        1        0.1    0.3
  1    0         0.1    0        1        0.2    0.3
  1    1         0.5    1        1        0.2    0.3


Results

Fig. 3.4 The error profile (error vs. number of epochs) during the training of the Perceptron to learn the input-output relation of the OR gate.

Fig. 3.5 The error profile (error vs. number of epochs) during the training of the Perceptron to learn the input-output relation of the AND gate.


Fig. 3.6 The error profile (error vs. number of epochs) during the training of the Perceptron to learn the input-output relation of the XOR gate.


Single-Layer Continuous Perceptron networks (SLCP)

The activation function used in modeling the continuous Perceptron is sigmoidal, which is differentiable.

The two advantages of using a continuous activation function are (i) finer control over the training procedure and (ii) the differentiable character of the activation function, which allows computation of the error gradient.

This gives scope for using gradients to modify the weights. The gradient or steepest-descent method is used for updating the weights: starting from an arbitrary weight vector W, the gradient \nabla E(W) of the current error function is computed.


The updated weight vector may be written as

W_{k+1} = W_k - \eta \nabla E(W_k)       (3.22)

where \eta is the learning constant.

The error function at step k may be written as

E_k = \frac{1}{2} (t_k - o_k)^2       (3.23a)

or

E_k = \frac{1}{2} [t_k - f(W_k^T X)]^2       (3.23b)


The error-minimization algorithm (3.22) requires computation of the gradient of the error function (3.23), which may be written as

\nabla E(W_k) = \nabla \left[ \frac{1}{2} (t_k - f(net_k))^2 \right]       (3.24)

The (n+1)-dimensional gradient vector is defined as

\nabla E(W_k) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, . . ., \frac{\partial E}{\partial w_n} \right]^T       (3.25)


Using (3.24), we obtain the gradient vector as

\nabla E(W_k) = -(t_k - o_k) f'(net_k) \left[ \frac{\partial net_k}{\partial w_0}, \frac{\partial net_k}{\partial w_1}, . . ., \frac{\partial net_k}{\partial w_n} \right]^T       (3.26)

Since net_k = W_k^T X, we have

\frac{\partial net_k}{\partial w_i} = x_i, for i = 0, 1, . . ., n       (3.27)

(x_0 = 1 for the bias element).


Using (3.27), equation (3.26) can be written as

\nabla E(W_k) = -(t_k - o_k) f'(net_k) X       (3.28a)

or

\frac{\partial E}{\partial w_i} = -(t_k - o_k) f'(net_k) x_i, for i = 0, 1, . . ., n       (3.28b)

so that the weight adjustment is

\Delta w_i = -\eta \frac{\partial E(W_k)}{\partial w_i} = \eta (t_k - o_k) f'(net_k) x_i       (3.29)
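Equation (3.28b) can be verified numerically by comparing it with a finite-difference estimate of \partial E / \partial w_i. The sketch below assumes the bipolar sigmoid f(net) = 2/(1 + e^{-net}) - 1, for which f'(net) = (1 - f(net)^2)/2, and uses arbitrary test values:

```python
import math

def f(net):
    """Bipolar sigmoid: f(net) = 2 / (1 + exp(-net)) - 1."""
    return 2.0 / (1.0 + math.exp(-net)) - 1.0

def error(w, x, t):
    """E = 0.5 * (t - f(w . x))**2, as in eq. (3.23b)."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (t - f(net)) ** 2

def analytic_grad(w, x, t):
    """Eq. (3.28b): dE/dw_i = -(t - o) * f'(net) * x_i, with f'(net) = (1 - o**2)/2."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = f(net)
    return [-(t - o) * 0.5 * (1.0 - o * o) * xi for xi in x]

def numeric_grad(w, x, t, h=1e-6):
    """Central finite differences on E, for comparison."""
    grads = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        grads.append((error(wp, x, t) - error(wm, x, t)) / (2 * h))
    return grads

w, x, t = [0.2, -0.4, 0.1], [1.0, 0.5, -1.0], 1.0
print(analytic_grad(w, x, t))
print(numeric_grad(w, x, t))   # the two should agree to several decimal places
```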


For the bipolar continuous (sigmoid) activation function, f'(net_k) = \frac{1}{2}(1 - o_k^2), so the gradient (3.28a) can be written as

\nabla E(W_k) = -\frac{1}{2} (t_k - o_k)(1 - o_k^2) X       (3.32)

and the complete delta training rule for the bipolar continuous activation function results from (3.32) as

W_{k+1} = W_k + \frac{1}{2} \eta (t_k - o_k)(1 - o_k^2) X_k       (3.33)

where k denotes the number of the training step.
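A minimal Python sketch of this delta rule for a single continuous perceptron is shown below. The bipolar sigmoid 2/(1 + e^{-net}) - 1, the learning constant, the bipolar OR data, and the epoch count are all illustrative assumptions:

```python
import math

def bipolar_sigmoid(net):
    """f(net) = 2 / (1 + exp(-net)) - 1; its derivative is (1 - f(net)**2) / 2."""
    return 2.0 / (1.0 + math.exp(-net)) - 1.0

def delta_step(w, x, t, eta=0.5):
    """One update W <- W + 0.5 * eta * (t - o) * (1 - o**2) * X, as in eq. (3.33)."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = bipolar_sigmoid(net)
    return [wi + 0.5 * eta * (t - o) * (1.0 - o * o) * xi for wi, xi in zip(w, x)]

# Illustrative run: x0 = 1 carries the bias weight; targets are bipolar.
w = [0.0, 0.0, 0.0]
or_data = [((1, -1, -1), -1), ((1, -1, 1), 1), ((1, 1, -1), 1), ((1, 1, 1), 1)]
for _ in range(200):
    for x, t in or_data:
        w = delta_step(w, x, t)
print(w)   # w0, w1, w2 move toward an OR-like separating boundary
```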


Perceptron Convergence Theorem

Theorem: If the two finite sets of training vectors P and N are linearly separable, the perceptron learning algorithm applies only a finite number of corrections to the weight vector; i.e. it converges in a finite number of steps.

Proof: Let us make three simplifications, without losing generality:

(i) The sets P and N can be joined in a single set P' = P \cup N^-, where N^- consists of the negated elements of N.

(ii) The vectors in P' can be normalized (\|p_i\| = 1), because if a weight vector w is found such that w \cdot x > 0, then this is also valid for any other vector \alpha x, where \alpha is a constant.

(iii) The weight vector can also be normalized (\|w^*\| = 1). Since we assume that a solution to the linear separation problem exists, we call w^* a normalized solution vector.


Now, assume that after t+1 steps the weight vector w_{t+1} has been computed. This means that at time t a vector p_i was incorrectly classified by the weight vector w_t, and so a correction was applied:

w_{t+1} = w_t + p_i       (3.37)

The cosine of the angle \rho between w_{t+1} and w^* is

\cos \rho = \frac{w^* \cdot w_{t+1}}{\|w_{t+1}\|}       (3.38)

Numerator of equation (3.38):

w^* \cdot w_{t+1} = w^* \cdot (w_t + p_i) = w^* \cdot w_t + w^* \cdot p_i \ge w^* \cdot w_t + \delta

where \delta = \min \{ w^* \cdot p : p \in P' \}.


Since w^* defines an absolute linear separation of P and N (recall that finite, linearly separable sets are absolutely linearly separable), we know that \delta > 0. By induction, we obtain

w^* \cdot w_{t+1} \ge w^* \cdot w_0 + (t+1) \delta       (3.39)

(The induction is:

w^* \cdot w_{t+1} \ge w^* \cdot w_t + \delta
               \ge w^* \cdot w_{t-1} + 2\delta
               . . .
               \ge w^* \cdot w_0 + (t+1) \delta.)


Denominator of equation (3.38):

\|w_{t+1}\|^2 = (w_t + p_i) \cdot (w_t + p_i) = \|w_t\|^2 + 2 w_t \cdot p_i + \|p_i\|^2

Since w_t \cdot p_i \le 0 (remember that we corrected w_t using p_i),

\|w_{t+1}\|^2 \le \|w_t\|^2 + \|p_i\|^2 \le \|w_t\|^2 + 1       (since p_i is normalized)

By induction:

\|w_{t+1}\|^2 \le \|w_0\|^2 + (t+1)       (3.40)


Substituting (3.39) and (3.40) in (3.38), we get

\cos \rho = \frac{w^* \cdot w_{t+1}}{\|w_{t+1}\|} \ge \frac{w^* \cdot w_0 + (t+1)\delta}{\sqrt{\|w_0\|^2 + (t+1)}}

The right-hand side grows proportionally to \sqrt{t+1} (for w_0 = 0 it equals \sqrt{t+1}\,\delta), and since \delta > 0 it can become arbitrarily large. However, since \cos \rho \le 1, t must be bounded by a maximum value, roughly t \le \frac{1}{\delta^2}.

Therefore, the number of corrections to the weight vector must be finite.


Limitations of the Perceptron

There are limitations to the capabilities of the Perceptron. It will learn the solution, if there is a solution to be found.

First, the output values of a Perceptron can take on only one of two values (True or False).

Second, a Perceptron can only classify linearly separable sets of vectors. If a straight line or plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable and the Perceptron will find the solution.

If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly.

The most famous example of the Perceptron's inability to solve problems with linearly non-separable vectors is the Boolean XOR realization.
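To see this failure concretely, the sketch below runs the discrete perceptron rule on the (unipolar) XOR data; because XOR is not linearly separable, some pattern remains misclassified no matter how many epochs are allowed. The parameter values and the epoch cap are illustrative assumptions:

```python
def train(patterns, targets, w, b=0.0, eta=0.1, theta=0.2, max_epochs=1000):
    """Discrete perceptron training; returns final weights, bias and remaining errors."""
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(patterns, targets):
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= theta else 0
            if o != t:
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                b += eta * (t - o)
                errors += 1
        if errors == 0:
            break
    return w, b, errors

xor_x, xor_t = [(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 0]
print(train(xor_x, xor_t, [0.1, 0.3]))   # errors never reaches 0 for XOR
```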