Neural Networks


Basic concepts of neural networks.



Lecture 7

Artificial neural networks: Supervised learning

Introduction, or how the brain works

The neuron as a simple computing element

The perceptron

Multilayer neural networks

Accelerated learning in multilayer neural networks

The Hopfield network

Bidirectional associative memories (BAM)

Summary

Introduction, or how the brain works

Machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms. This lecture is dedicated to neural networks.

A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.

The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.

Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power.

A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.

Biological neural network

[Figure: two connected biological neurons, showing the soma, dendrites, axon and synapses.]

Our brain can be considered as a highly complex, non-linear and parallel information-processing system. Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks, both data and its processing are global rather than local.

Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.

An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain. The neurons are connected by weighted links passing signals from one neuron to another.

The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.

Architecture of a typical artificial neural network

[Figure: input signals enter the input layer, pass through the middle layer, and leave the output layer as output signals.]

Analogy between biological and artificial neural networks

Biological Neural Network    Artificial Neural Network
Soma                         Neuron
Dendrite                     Input
Axon                         Output
Synapse                      Weight

The neuron as a simple computing element

[Figure: diagram of a neuron Y; input signals x1, x2, ..., xn arrive through links with weights w1, w2, ..., wn, and the output signal Y is sent out along the neuron's outgoing branches.]

The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. If the net input is less than the threshold, the neuron output is -1. But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains a value +1.

The neuron uses the following transfer or activation function:

X = \sum_{i=1}^{n} x_i w_i , \qquad Y = \begin{cases} +1 & \text{if } X \ge \theta \\ -1 & \text{if } X < \theta \end{cases}

This type of activation function is called a sign function.

Activation functions of a neuron

[Figure: graphs of the step, sign, sigmoid and linear activation functions.]

Y^{step} = \begin{cases} 1 & \text{if } X \ge 0 \\ 0 & \text{if } X < 0 \end{cases} \qquad
Y^{sign} = \begin{cases} +1 & \text{if } X \ge 0 \\ -1 & \text{if } X < 0 \end{cases}

Y^{sigmoid} = \frac{1}{1 + e^{-X}} \qquad Y^{linear} = X
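
As a quick illustration, the four activation functions above translate directly into code. A minimal Python sketch (plain functions, no neural-network library):

    import math

    def step(x):
        # Step function: 1 if x >= 0, otherwise 0
        return 1 if x >= 0 else 0

    def sign(x):
        # Sign function: +1 if x >= 0, otherwise -1
        return 1 if x >= 0 else -1

    def sigmoid(x):
        # Sigmoid function: squashes any input into the (0, 1) interval
        return 1.0 / (1.0 + math.exp(-x))

    def linear(x):
        # Linear function: output equals the net input
        return x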

Can a single neuron learn a task?

In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron.

The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.

Single-layer two-input perceptron

[Figure: inputs x1 and x2, with weights w1 and w2, feed a linear combiner; its output, together with the threshold θ, passes through a hard limiter to produce the output Y.]

The Perceptron

The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.

The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and -1 if it is negative.

The aim of the perceptron is to classify inputs, x1, x2, . . ., xn, into one of two classes, say A1 and A2.

In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:

\sum_{i=1}^{n} x_i w_i - \theta = 0

Linear separability in the perceptrons

[Figure: (a) Two-input perceptron: the line x_1 w_1 + x_2 w_2 - \theta = 0 separates class A1 from class A2 in the (x1, x2) plane. (b) Three-input perceptron: the plane x_1 w_1 + x_2 w_2 + x_3 w_3 - \theta = 0 divides the (x1, x2, x3) space.]

How does the perceptron learn its classification tasks?

This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain the output consistent with the training examples.

If at iteration p, the actual output is Y(p) and the desired output is Yd(p), then the error is given by:

e(p) = Y_d(p) - Y(p), \qquad p = 1, 2, 3, \ldots

Iteration p here refers to the p-th training example presented to the perceptron.

If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).

The perceptron learning rule

w_i(p+1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p), \qquad p = 1, 2, 3, \ldots

where α is the learning rate, a positive constant less than unity.

The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.

Perceptron's training algorithm

Step 1: Initialisation
Set initial weights w1, w2, . . ., wn and threshold θ to random numbers in the range [-0.5, 0.5].

If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).

Perceptron's training algorithm (continued)

Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), . . ., xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:

Y(p) = step\!\left[ \sum_{i=1}^{n} x_i(p) \, w_i(p) - \theta \right]

where n is the number of the perceptron inputs, and step is a step activation function.

Perceptron's training algorithm (continued)

Step 3: Weight training
Update the weights of the perceptron

w_i(p+1) = w_i(p) + \Delta w_i(p)

where Δwi(p) is the weight correction at iteration p. The weight correction is computed by the delta rule:

\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.

Example of perceptron learning: the logical operation AND

Threshold: θ = 0.2; learning rate: α = 0.1

Epoch | Inputs  | Desired  | Initial weights | Actual   | Error | Final weights
      | x1  x2  | output Yd|   w1     w2     | output Y |   e   |   w1     w2
  1   |  0   0  |    0     |   0.3   -0.1    |    0     |   0   |   0.3   -0.1
      |  0   1  |    0     |   0.3   -0.1    |    0     |   0   |   0.3   -0.1
      |  1   0  |    0     |   0.3   -0.1    |    1     |  -1   |   0.2   -0.1
      |  1   1  |    1     |   0.2   -0.1    |    0     |   1   |   0.3    0.0
  2   |  0   0  |    0     |   0.3    0.0    |    0     |   0   |   0.3    0.0
      |  0   1  |    0     |   0.3    0.0    |    0     |   0   |   0.3    0.0
      |  1   0  |    0     |   0.3    0.0    |    1     |  -1   |   0.2    0.0
      |  1   1  |    1     |   0.2    0.0    |    1     |   0   |   0.2    0.0
  3   |  0   0  |    0     |   0.2    0.0    |    0     |   0   |   0.2    0.0
      |  0   1  |    0     |   0.2    0.0    |    0     |   0   |   0.2    0.0
      |  1   0  |    0     |   0.2    0.0    |    1     |  -1   |   0.1    0.0
      |  1   1  |    1     |   0.1    0.0    |    0     |   1   |   0.2    0.1
  4   |  0   0  |    0     |   0.2    0.1    |    0     |   0   |   0.2    0.1
      |  0   1  |    0     |   0.2    0.1    |    0     |   0   |   0.2    0.1
      |  1   0  |    0     |   0.2    0.1    |    1     |  -1   |   0.1    0.1
      |  1   1  |    1     |   0.1    0.1    |    1     |   0   |   0.1    0.1
  5   |  0   0  |    0     |   0.1    0.1    |    0     |   0   |   0.1    0.1
      |  0   1  |    0     |   0.1    0.1    |    0     |   0   |   0.1    0.1
      |  1   0  |    0     |   0.1    0.1    |    0     |   0   |   0.1    0.1
      |  1   1  |    1     |   0.1    0.1    |    1     |   0   |   0.1    0.1
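
The table can be reproduced in a few lines of Python. The sketch below follows Steps 1 to 4 of the training algorithm with the same parameters as the example (threshold θ = 0.2, learning rate α = 0.1) and the same initial weights w1 = 0.3, w2 = -0.1 instead of random ones; exact rational arithmetic is used so that the borderline case X = θ is handled exactly as in the hand calculation.

    from fractions import Fraction as F

    training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

    w = [F(3, 10), F(-1, 10)]   # initial weights w1 = 0.3, w2 = -0.1 (exact rationals)
    theta = F(2, 10)            # threshold 0.2
    alpha = F(1, 10)            # learning rate 0.1

    def step(x):
        return 1 if x >= 0 else 0

    for epoch in range(1, 6):
        for (x1, x2), yd in training_set:
            y = step(x1 * w[0] + x2 * w[1] - theta)   # actual output
            e = yd - y                                # error
            w[0] += alpha * x1 * e                    # delta rule
            w[1] += alpha * x2 * e
        print(f"epoch {epoch}: w1 = {float(w[0]):.1f}, w2 = {float(w[1]):.1f}")
    # The weights printed after each epoch match the final-weights column of the
    # table; after epoch 5 they settle at w1 = 0.1, w2 = 0.1.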

Two-dimensional plots of basic logical operations

A perceptron can learn the operations AND and OR, but not Exclusive-OR.

[Figure: (a) AND (x1 and x2), (b) OR (x1 or x2), (c) Exclusive-OR (x1 xor x2); the AND and OR points can be separated by a single straight line in the (x1, x2) plane, while the Exclusive-OR points cannot.]

Multilayer neural networks

A multilayer perceptron is a feedforward neural network with one or more hidden layers.

The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.

The input signals are propagated in a forward direction on a layer-by-layer basis.

Multilayer perceptron with two hidden layers

[Figure: input signals pass through the input layer, the first hidden layer, the second hidden layer and the output layer to become output signals.]

What does the middle layer hide? A hidden layer hides its desired output. Neurons in the hidden layer cannot be observed through the input/output behaviour of the network. There is no obvious way to know what the desired output of the hidden layer should be.

Commercial ANNs incorporate three and sometimes four layers, including one or two hidden layers. Each layer can contain from 10 to 1000 neurons. Experimental neural networks may have five or even six layers, including three or four hidden layers, and utilise millions of neurons.

Back-propagation neural network

Learning in a multilayer network proceeds the same way as for a perceptron. A training set of input patterns is presented to the network. The network computes its output pattern, and if there is an error, in other words a difference between the actual and desired output patterns, the weights are adjusted to reduce this error.

In a back-propagation neural network, the learning algorithm has two phases.

First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer.

If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.

Three-layer back-propagation neural network

[Figure: input signals x1, x2, . . ., xn enter the input layer (neurons 1 . . . n); weights wij connect it to the hidden layer (neurons 1 . . . m); weights wjk connect the hidden layer to the output layer (neurons 1 . . . l), which produces outputs y1, y2, . . ., yl. Error signals travel in the opposite direction.]

The back-propagation training algorithm

Step 1: Initialisation
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:

\left( -\frac{2.4}{F_i}, \; +\frac{2.4}{F_i} \right)

where Fi is the total number of inputs of neuron i in the network. The weight initialisation is done on a neuron-by-neuron basis.

Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p), x2(p), . . ., xn(p) and desired outputs yd,1(p), yd,2(p), . . ., yd,n(p).

(a) Calculate the actual outputs of the neurons in the hidden layer:

y_j(p) = sigmoid\!\left[ \sum_{i=1}^{n} x_i(p) \, w_{ij}(p) - \theta_j \right]

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.

Step 2: Activation (continued)
(b) Calculate the actual outputs of the neurons in the output layer:

y_k(p) = sigmoid\!\left[ \sum_{j=1}^{m} x_{jk}(p) \, w_{jk}(p) - \theta_k \right]

where m is the number of inputs of neuron k in the output layer.

Step 3: Weight training
Update the weights in the back-propagation network propagating backward the errors associated with output neurons.

(a) Calculate the error gradient for the neurons in the output layer:

\delta_k(p) = y_k(p) \cdot \left[ 1 - y_k(p) \right] \cdot e_k(p)

where

e_k(p) = y_{d,k}(p) - y_k(p)

Calculate the weight corrections:

\Delta w_{jk}(p) = \alpha \cdot y_j(p) \cdot \delta_k(p)

Update the weights at the output neurons:

w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)

Step 3: Weight training (continued)
(b) Calculate the error gradient for the neurons in the hidden layer:

\delta_j(p) = y_j(p) \cdot \left[ 1 - y_j(p) \right] \cdot \sum_{k=1}^{l} \delta_k(p) \, w_{jk}(p)

Calculate the weight corrections:

\Delta w_{ij}(p) = \alpha \cdot x_i(p) \cdot \delta_j(p)

Update the weights at the hidden neurons:

w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.

As an example, we may consider the three-layer back-propagation network. Suppose that the network is required to perform the logical operation Exclusive-OR. Recall that a single-layer perceptron could not do this operation. Now we will apply the three-layer net.
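
Before the worked example, here is a minimal NumPy sketch of the complete algorithm (Steps 1 to 4) applied to a 2-2-1 network learning Exclusive-OR. The weight-initialisation range follows Step 1 with Fi = 2; the learning rate of 0.5 is an assumed value, and with plain gradient descent a run can occasionally settle in a local minimum, in which case a different seed or learning rate is needed.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(1)
    # Step 1: initialise weights and thresholds uniformly in (-2.4/Fi, +2.4/Fi); here Fi = 2
    w_ih = rng.uniform(-1.2, 1.2, size=(2, 2))   # input -> hidden weights w_ij
    w_ho = rng.uniform(-1.2, 1.2, size=(2, 1))   # hidden -> output weights w_jk
    th_h = rng.uniform(-1.2, 1.2, size=2)        # hidden-layer thresholds
    th_o = rng.uniform(-1.2, 1.2, size=1)        # output-layer threshold

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Yd = np.array([0.0, 1.0, 1.0, 0.0])
    alpha = 0.5                                  # learning rate (assumed value)

    for epoch in range(1, 100001):
        sse = 0.0
        for x, yd in zip(X, Yd):
            # Step 2: forward pass through the hidden and output layers
            y_h = sigmoid(x @ w_ih - th_h)
            y_o = sigmoid(y_h @ w_ho - th_o)
            e = float(yd - y_o[0])
            sse += e ** 2
            # Step 3: error gradients and delta-rule weight corrections
            d_o = y_o * (1.0 - y_o) * e              # output-layer gradient
            d_h = y_h * (1.0 - y_h) * (w_ho @ d_o)   # hidden-layer gradients
            w_ho += alpha * np.outer(y_h, d_o)
            th_o += alpha * (-1.0) * d_o
            w_ih += alpha * np.outer(x, d_h)
            th_h += alpha * (-1.0) * d_h
        # Step 4: repeat until the sum of squared errors falls below 0.001
        if sse < 0.001:
            print(f"converged after {epoch} epochs")
            break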

Three-layer network for solving the Exclusive-OR operation

[Figure: inputs x1 and x2 feed hidden neurons 3 and 4 through weights w13, w23, w14 and w24; neurons 3 and 4 feed output neuron 5 through weights w35 and w45; each of neurons 3, 4 and 5 also receives a fixed input of -1 weighted by its threshold θ3, θ4 or θ5. Neuron 5 produces the output y5.]

The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to -1.

The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.

We consider a training set where inputs x1 and x2 are equal to 1 and desired output yd,5 is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as:

y_3 = sigmoid(x_1 w_{13} + x_2 w_{23} - \theta_3) = 1 / \left[ 1 + e^{-(1 \cdot 0.5 + 1 \cdot 0.4 - 1 \cdot 0.8)} \right] = 0.5250

y_4 = sigmoid(x_1 w_{14} + x_2 w_{24} - \theta_4) = 1 / \left[ 1 + e^{-(1 \cdot 0.9 + 1 \cdot 1.0 + 1 \cdot 0.1)} \right] = 0.8808

Now the actual output of neuron 5 in the output layer is determined as:

y_5 = sigmoid(y_3 w_{35} + y_4 w_{45} - \theta_5) = 1 / \left[ 1 + e^{-(-0.5250 \cdot 1.2 + 0.8808 \cdot 1.1 - 1 \cdot 0.3)} \right] = 0.5097

Thus, the following error is obtained:

e = y_{d,5} - y_5 = 0 - 0.5097 = -0.5097

The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer.

First, we calculate the error gradient for neuron 5 in the output layer:

\delta_5 = y_5 (1 - y_5) \, e = 0.5097 \cdot (1 - 0.5097) \cdot (-0.5097) = -0.1274

Then we determine the weight corrections assuming that the learning rate parameter, α, is equal to 0.1:

\Delta w_{35} = \alpha \cdot y_3 \cdot \delta_5 = 0.1 \cdot 0.5250 \cdot (-0.1274) = -0.0067
\Delta w_{45} = \alpha \cdot y_4 \cdot \delta_5 = 0.1 \cdot 0.8808 \cdot (-0.1274) = -0.0112
\Delta \theta_5 = \alpha \cdot (-1) \cdot \delta_5 = 0.1 \cdot (-1) \cdot (-0.1274) = 0.0127

Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:

\delta_3 = y_3 (1 - y_3) \cdot \delta_5 \cdot w_{35} = 0.5250 \cdot (1 - 0.5250) \cdot (-0.1274) \cdot (-1.2) = 0.0381
\delta_4 = y_4 (1 - y_4) \cdot \delta_5 \cdot w_{45} = 0.8808 \cdot (1 - 0.8808) \cdot (-0.1274) \cdot 1.1 = -0.0147

We then determine the weight corrections:

\Delta w_{13} = \alpha \cdot x_1 \cdot \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038
\Delta w_{23} = \alpha \cdot x_2 \cdot \delta_3 = 0.1 \cdot 1 \cdot 0.0381 = 0.0038
\Delta \theta_3 = \alpha \cdot (-1) \cdot \delta_3 = 0.1 \cdot (-1) \cdot 0.0381 = -0.0038
\Delta w_{14} = \alpha \cdot x_1 \cdot \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015
\Delta w_{24} = \alpha \cdot x_2 \cdot \delta_4 = 0.1 \cdot 1 \cdot (-0.0147) = -0.0015
\Delta \theta_4 = \alpha \cdot (-1) \cdot \delta_4 = 0.1 \cdot (-1) \cdot (-0.0147) = 0.0015

At last, we update all weights and thresholds:

w_{13} = w_{13} + \Delta w_{13} = 0.5 + 0.0038 = 0.5038
w_{14} = w_{14} + \Delta w_{14} = 0.9 - 0.0015 = 0.8985
w_{23} = w_{23} + \Delta w_{23} = 0.4 + 0.0038 = 0.4038
w_{24} = w_{24} + \Delta w_{24} = 1.0 - 0.0015 = 0.9985
w_{35} = w_{35} + \Delta w_{35} = -1.2 - 0.0067 = -1.2067
w_{45} = w_{45} + \Delta w_{45} = 1.1 - 0.0112 = 1.0888
\theta_3 = \theta_3 + \Delta \theta_3 = 0.8 - 0.0038 = 0.7962
\theta_4 = \theta_4 + \Delta \theta_4 = -0.1 + 0.0015 = -0.0985
\theta_5 = \theta_5 + \Delta \theta_5 = 0.3 + 0.0127 = 0.3127

The training process is repeated until the sum of squared errors is less than 0.001.
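
The hand calculation above is easy to check in code. The following sketch repeats the forward pass, the error gradients and the weight updates with the same initial values; the commented numbers are the rounded results from the slides.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Initial weights and thresholds from the example
    w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
    th3, th4, th5 = 0.8, -0.1, 0.3
    x1, x2, yd5 = 1, 1, 0
    alpha = 0.1

    # Forward pass
    y3 = sigmoid(x1 * w13 + x2 * w23 - th3)   # 0.5250
    y4 = sigmoid(x1 * w14 + x2 * w24 - th4)   # 0.8808
    y5 = sigmoid(y3 * w35 + y4 * w45 - th5)   # 0.5097
    e = yd5 - y5                              # -0.5097

    # Error gradients
    d5 = y5 * (1 - y5) * e                    # -0.1274
    d3 = y3 * (1 - y3) * d5 * w35             #  0.0381
    d4 = y4 * (1 - y4) * d5 * w45             # -0.0147

    # Weight and threshold updates (delta rule)
    w13 += alpha * x1 * d3      # 0.5038
    w23 += alpha * x2 * d3      # 0.4038
    w14 += alpha * x1 * d4      # 0.8985
    w24 += alpha * x2 * d4      # 0.9985
    w35 += alpha * y3 * d5      # -1.2067
    w45 += alpha * y4 * d5      # 1.0888
    th3 += alpha * (-1) * d3    # 0.7962
    th4 += alpha * (-1) * d4    # -0.0985
    th5 += alpha * (-1) * d5    # 0.3127
    print(round(y5, 4), round(d5, 4), round(w13, 4))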

Learning curve for operation Exclusive-OR

[Figure: sum-squared network error versus epoch, plotted on a logarithmic scale over 224 epochs of training.]

Final results of three-layer network learning

Inputs    | Desired   | Actual    | Error    | Sum of
x1   x2   | output yd | output y5 |    e     | squared errors
 1    1   |    0      |  0.0155   | -0.0155  |
 0    1   |    1      |  0.9849   |  0.0151  |     0.0010
 1    0   |    1      |  0.9849   |  0.0151  |
 0    0   |    0      |  0.0175   | -0.0175  |

Network represented by McCulloch-Pitts model for solving the Exclusive-OR operation

[Figure: inputs x1 and x2 feed hidden neurons 3 and 4 with weights of +1.0; neuron 3 has threshold +1.5 and neuron 4 has threshold +0.5; neuron 3 connects to output neuron 5 with weight -2.0 and neuron 4 with weight +1.0; neuron 5 has threshold +0.5 and produces the output y5.]

Decision boundaries

[Figure: (a) decision boundary constructed by hidden neuron 3: x_1 + x_2 - 1.5 = 0; (b) decision boundary constructed by hidden neuron 4: x_1 + x_2 - 0.5 = 0; (c) decision boundaries constructed by the complete three-layer network.]

Accelerated learning in multilayer neural networks

A multilayer network learns much faster when the sigmoidal activation function is represented by a hyperbolic tangent:

Y^{tanh} = \frac{2a}{1 + e^{-bX}} - a

where a and b are constants. Suitable values for a and b are: a = 1.716 and b = 0.667.
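
A one-function sketch of this bipolar (hyperbolic-tangent-shaped) activation with the suggested constants:

    import math

    def tanh_activation(x, a=1.716, b=0.667):
        # Bipolar sigmoid: Y = 2a / (1 + exp(-b*x)) - a, with outputs in (-a, +a)
        return 2.0 * a / (1.0 + math.exp(-b * x)) - a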

We also can accelerate training by including a momentum term in the delta rule:

\Delta w_{jk}(p) = \beta \cdot \Delta w_{jk}(p-1) + \alpha \cdot y_j(p) \cdot \delta_k(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95. This equation is called the generalised delta rule.
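
In code, the generalised delta rule only requires remembering the previous weight correction. A minimal sketch, assuming the hidden output y_j and the error gradient δ_k have already been computed as in Step 3 of the algorithm (the learning rate of 0.1 is an assumed default):

    def generalised_delta(y_j, delta_k, prev_correction, alpha=0.1, beta=0.95):
        # Generalised delta rule: the new weight correction is a fraction beta of
        # the previous correction plus the ordinary delta-rule term.
        return beta * prev_correction + alpha * y_j * delta_k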

Learning with momentum for operation Exclusive-OR

[Figure: sum-squared error versus epoch when training with momentum; the network converges in 126 epochs.]

Learning with adaptive learning rate

To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics:

Heuristic 1: If the change of the sum of squared errors has the same algebraic sign for several consequent epochs, then the learning rate parameter, α, should be increased.

Heuristic 2: If the algebraic sign of the change of the sum of squared errors alternates for several consequent epochs, then the learning rate parameter, α, should be decreased.

Adapting the learning rate requires some changes in the back-propagation algorithm.

If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.

If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
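
A sketch of this adaptation step, assuming the caller computes the sum of squared errors once per epoch; the ratio 1.04 and the factors 0.7 and 1.05 are the typical values quoted above:

    def adapt_learning_rate(sse, prev_sse, alpha, ratio=1.04, down=0.7, up=1.05):
        # Returns the new learning rate and whether the last weight update
        # should be discarded (weights and thresholds recalculated).
        if sse > prev_sse * ratio:
            return alpha * down, True     # error grew too much: slow down, roll back
        if sse < prev_sse:
            return alpha * up, False      # error decreased: speed up
        return alpha, False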

Learning with adaptive learning rate

[Figure: sum-squared error and learning rate versus epoch when training with an adaptive learning rate; the network converges in 103 epochs.]

Learning with momentum and adaptive learning rate

[Figure: sum-squared error and learning rate versus epoch when training with both momentum and an adaptive learning rate; the network converges in 85 epochs.]

The Hopfield Network

Neural networks were designed on analogy with the brain. The brain's memory, however, works by association. For example, we can recognise a familiar face even in an unfamiliar environment within 100-200 ms. We can also recall a complete sensory experience, including sounds and scenes, when we hear only a few bars of music. The brain routinely associates one thing with another.

Multilayer neural networks trained with the back-propagation algorithm are used for pattern recognition problems. However, to emulate the human memory's associative characteristics we need a different type of network: a recurrent neural network.

A recurrent neural network has feedback loops from its outputs to its inputs. The presence of such loops has a profound impact on the learning capability of the network.

The stability of recurrent networks intrigued several researchers in the 1960s and 1970s. However, none was able to predict which network would be stable, and some researchers were pessimistic about finding a solution at all. The problem was solved only in 1982, when John Hopfield formulated the physical principle of storing information in a dynamically stable network.

Single-layer n-neuron Hopfield network

[Figure: n neurons, each receiving an input xi and producing an output yi; the output of every neuron is fed back as an input to all the other neurons.]

The Hopfield network uses McCulloch and Pitts neurons with the sign activation function as its computing element:

Y^{sign} = \begin{cases} +1 & \text{if } X > 0 \\ -1 & \text{if } X < 0 \\ Y & \text{if } X = 0 \end{cases}

The current state of the Hopfield network is determined by the current outputs of all neurons, y1, y2, . . ., yn. Thus, for a single-layer n-neuron network, the state can be defined by the state vector as:

\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}

In the Hopfield network, synaptic weights between neurons are usually represented in matrix form as follows:

\mathbf{W} = \sum_{m=1}^{M} \mathbf{Y}_m \mathbf{Y}_m^{T} - M \, \mathbf{I}

where M is the number of states to be memorised by the network, Y_m is the n-dimensional binary vector, I is the n × n identity matrix, and superscript T denotes matrix transposition.

Possible states for the three-neuron Hopfield network

[Figure: the eight possible states (±1, ±1, ±1) shown as the vertices of a cube in (y1, y2, y3) space.]

The stable state-vertex is determined by the weight matrix W, the current input vector X, and the threshold matrix θ. If the input vector is partially incorrect or incomplete, the initial state will converge into the stable state-vertex after a few iterations.

Suppose, for instance, that our network is required to memorise two opposite states, (1, 1, 1) and (-1, -1, -1). Thus,

\mathbf{Y}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{Y}_2 = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}
\qquad \text{or} \qquad
\mathbf{Y}_1^{T} = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}, \quad \mathbf{Y}_2^{T} = \begin{bmatrix} -1 & -1 & -1 \end{bmatrix}

where Y1 and Y2 are the three-dimensional vectors.

The 3 × 3 identity matrix I is

\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

Thus, we can now determine the weight matrix as follows:

\mathbf{W} = \mathbf{Y}_1 \mathbf{Y}_1^{T} + \mathbf{Y}_2 \mathbf{Y}_2^{T} - 2\mathbf{I}
= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}
+ \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} \begin{bmatrix} -1 & -1 & -1 \end{bmatrix}
- 2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix}

Next, the network is tested by the sequence of input vectors, X1 and X2, which are equal to the output (or target) vectors Y1 and Y2, respectively.

First, we activate the Hopfield network by applying the input vector X. Then, we calculate the actual output vector Y, and finally, we compare the result with the initial input vector X.

\mathbf{Y}_1 = sign\!\left( \begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
\qquad
\mathbf{Y}_2 = sign\!\left( \begin{bmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}

The remaining six states are all unstable. However, stable states (also called fundamental memories) are capable of attracting states that are close to them.

The fundamental memory (1, 1, 1) attracts unstable states (-1, 1, 1), (1, -1, 1) and (1, 1, -1). Each of these unstable states represents a single error, compared to the fundamental memory (1, 1, 1). The fundamental memory (-1, -1, -1) attracts unstable states (1, -1, -1), (-1, 1, -1) and (-1, -1, 1).

Thus, the Hopfield network can act as an error correction network.
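
The three-neuron example is easy to reproduce with NumPy. The sketch below builds W = Y1 Y1^T + Y2 Y2^T - 2I, verifies that both fundamental memories are stable, and shows a single-error state being corrected:

    import numpy as np

    def sign(x, prev):
        # Sign activation: +1 if x > 0, -1 if x < 0, previous output if x = 0
        return np.where(x > 0, 1, np.where(x < 0, -1, prev))

    Y1 = np.array([1, 1, 1])
    Y2 = np.array([-1, -1, -1])
    M = 2
    W = np.outer(Y1, Y1) + np.outer(Y2, Y2) - M * np.eye(3, dtype=int)
    theta = np.zeros(3)                      # zero thresholds

    for X in (Y1, Y2, np.array([-1, 1, 1])):
        Y = sign(W @ X - theta, X)           # one synchronous update
        print(X, "->", Y)
    # (1, 1, 1) and (-1, -1, -1) reproduce themselves (stable states);
    # the single-error state (-1, 1, 1) is corrected to (1, 1, 1).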

Storage capacity of the Hopfield network

Storage capacity is the largest number of fundamental memories that can be stored and retrieved correctly.

The maximum number of fundamental memories Mmax that can be stored in the n-neuron recurrent network is limited by

M_{max} = 0.15 \, n

Bidirectional associative memory (BAM)

The Hopfield network represents an autoassociative type of memory: it can retrieve a corrupted or incomplete memory but cannot associate this memory with another different memory.

Human memory is essentially associative. One thing may remind us of another, and that of another, and so on. We use a chain of mental associations to recover a lost memory. If we forget where we left an umbrella, we try to recall where we last had it, what we were doing, and who we were talking to. We attempt to establish a chain of associations, and thereby to restore a lost memory.

To associate one memory with another, we need a recurrent neural network capable of accepting an input pattern on one set of neurons and producing a related, but different, output pattern on another set of neurons.

Bidirectional associative memory (BAM), first proposed by Bart Kosko, is a heteroassociative network. It associates patterns from one set, set A, to patterns from another set, set B, and vice versa. Like a Hopfield network, the BAM can generalise and also produce correct outputs despite corrupted or incomplete inputs.

BAM operation

[Figure: (a) forward direction: the input layer (neurons 1 . . . n with inputs x1(p) . . . xn(p)) drives the output layer (neurons 1 . . . m with outputs y1(p) . . . ym(p)); (b) backward direction: the output-layer pattern y(p) drives the input layer to produce x1(p+1) . . . xn(p+1).]

The basic idea behind the BAM is to store pattern pairs so that when n-dimensional vector X from set A is presented as input, the BAM recalls m-dimensional vector Y from set B, but when Y is presented as input, the BAM recalls X.

To develop the BAM, we need to create a correlation matrix for each pattern pair we want to store. The correlation matrix is the matrix product of the input vector X and the transpose of the output vector Y^T. The BAM weight matrix is the sum of all correlation matrices, that is,

\mathbf{W} = \sum_{m=1}^{M} \mathbf{X}_m \mathbf{Y}_m^{T}

where M is the number of pattern pairs to be stored in the BAM.
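
A minimal sketch of BAM storage and recall for two hypothetical bipolar pattern pairs; the weight matrix is the sum of the correlation matrices X_m Y_m^T, and recall applies the sign function to W^T x in the forward direction and to W y in the backward direction:

    import numpy as np

    def bipolar_sign(v):
        # Hard limiter used for recall (+1 for non-negative net input, -1 otherwise)
        return np.where(v >= 0, 1, -1)

    # Two pattern pairs: 4-dimensional vectors from set A, 3-dimensional vectors from set B
    X_patterns = [np.array([1, -1, 1, -1]), np.array([1, 1, -1, -1])]
    Y_patterns = [np.array([1, -1, 1]),     np.array([1, 1, -1])]

    # BAM weight matrix: sum of the correlation matrices X_m Y_m^T
    W = sum(np.outer(x, y) for x, y in zip(X_patterns, Y_patterns))

    x = X_patterns[0]
    y = bipolar_sign(W.T @ x)       # forward pass: recall the Y pattern associated with x
    x_back = bipolar_sign(W @ y)    # backward pass: recall the X pattern associated with y
    print(y, x_back)                # expected: Y_patterns[0] and X_patterns[0]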

Stability and storage capacity of the BAM

The BAM is unconditionally stable. This means that any set of associations can be learned without risk of instability.

The maximum number of associations to be stored in the BAM should not exceed the number of neurons in the smaller layer.

The more serious problem with the BAM is incorrect convergence. The BAM may not always produce the closest association. In fact, a stable association may be only slightly related to the initial input vector.