NN Question Bank VIISem



    NEURAL NETWORKS QUESTION BANK

1. The network of figure 1 is:

    (a) a single layer feed-forward neural network

(b) an autoassociative neural network

(c) a multiple layer neural network

    Figure 1

The answer is (a). The network of figure 1 is a single layer feed-forward neural network because there is only one neuron between any input and output. The network is not autoassociative, i.e. it doesn't have feedback, because there are no loops in it.

    2. A 3-input neuron is trained to output a zero when the input is 110 and a one when the input is

111. After generalisation, the output will be zero when and only when the input is:

(a) 000 or 110 or 011 or 101

    (b) 010 or 100 or 110 or 101

    (c) 000 or 010 or 110 or 100

The answer is (c). The truth table before generalisation is:

    Inputs Output

    000 $

    001 $

    010 $

    011 $

    100 $

    101 $


    110 0

    111 1

    where $ represents don't know cases and the output is random.

    After generalisation, the truth table becomes:

    Inputs Output

    000 0

    001 1

    010 0

    011 1

    100 0

101 1

110 0

    111 1

    Therefore, the output will be zero when the input is 000 or 010 or 110 or 100

    3. A perceptron is:

    (a) a single layer feed-forward neural network with preprocessing

    (b) an autoassociative neural network

    (c) a double layer autoassociative neural network

The answer is (a). The perceptron is a single layer feed-forward neural network. It is not an autoassociative network because it has no feedback, and it is not a multiple layer neural network because the preprocessing stage is not made of neurons.

    4. An autoassociative network is:

    (a) a neural network that contains no loops

    (b) a neural network that contains feedback

    (c) a neural network that has only one loop

The answer is (b). An autoassociative network is equivalent to a neural network that contains feedback. The number of feedback paths (loops) does not have to be one.


5. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with the constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20 respectively. The output will be:

    (a) 238

    (b) 76

    (c) 119

The answer is (a). The output is found by multiplying the weights by their respective inputs, summing the results, and multiplying by the constant of proportionality of the linear transfer function. Therefore: Output = 2 * (1*4 + 2*10 + 3*5 + 4*20) = 238
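A quick numeric check of this computation in Python (a minimal sketch; the weights, inputs, and gain are the values from the question):

    import numpy as np

    weights = np.array([1, 2, 3, 4])
    inputs = np.array([4, 10, 5, 20])
    gain = 2  # constant of proportionality of the linear transfer function

    output = gain * np.dot(weights, inputs)
    print(output)  # 238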

    6. Which of the following is true?

(i) On average, neural networks have higher computational rates than conventional computers.

(ii) Neural networks learn by example.

(iii) Neural networks mimic the way the human brain works.

    (a) all of them are true

    (b) (ii) and (iii) are true

    (c) (i), (ii) and (iii) are true

The answer is (a). Neural networks have higher computational rates than conventional computers because a lot of the operation is done in parallel. Note: that is not the case when the neural network is simulated on a computer. The idea behind neural nets is based on the way the human brain works. Neural nets cannot be programmed; they can only learn by examples.

    7. Which of the following is true for neural networks?

(i) The training time depends on the size of the network.

(ii) Neural networks can be simulated on a conventional computer.

(iii) Artificial neurons are identical in operation to biological ones.

    (a) all of them are true.

    (b) (ii) is true.

    (c) (i) and (ii) are true.

    The answer is (c).

The training time depends on the size of the network: the greater the number of neurons, the greater the number of possible 'states' and the longer the training. Neural networks can be simulated on a conventional computer, but the main advantage of neural networks - parallel execution - is lost. Artificial neurons are not identical in operation to biological ones; we don't yet know in detail what real neurons do.


    8. What are the advantages of neural networks over conventional computers?

(i) They have the ability to learn by example

(ii) They are more fault tolerant

(iii) They are more suited for real time operation due to their high 'computational' rates

    (a) (i) and (ii) are true

    (b) (i) and (iii) are true

    (c) all of them are true

The answer is (c). Neural networks learn by example. They are more fault tolerant because they are always able to respond, and small changes in input do not normally cause a change in output. Because of their parallel architecture, high computational rates are achieved.

    9. Which of the following is true?

    Single layer associative neural networks do not have the ability to:

(i) perform pattern recognition

(ii) find the parity of a picture

(iii) determine whether two or more shapes in a picture are connected or not

    (a) (ii) and (iii) are true

    (b) (ii) is true

    (c) all of them are true

The answer is (a). Pattern recognition is what single layer neural networks are best at, but they don't have the ability to find the parity of a picture or to determine whether two shapes are connected or not.

    10. The network shown in Figure 1 is trained to recognize the characters H and T as shown below:

    If the following pattern was given

    What would be the output of the network?


    (a)

    (b)

    (c)

The answer is (b). The top square of the output is black because the top row of the input differs in 2 squares from a T and in 3 squares from an H. The middle square is not defined because the middle row of the input differs by the same amount from both T and H (differs in 1); therefore, the output can be either black or white. The bottom square is black because it differs in 1 from a T and in 2 from an H.

    11. With a supervised learning algorithm, we can specify target output values, but we may neverget close to those targets at the end of learning. Give two reasons why this might happen.

    Answer:

    (i) data may be valid, and inconsistency results from a stochastic aspect of the task (or some aspect ofthe task is not modelled by the input data collected);

    (ii) the data may contain errors - e.g. measurement errors or typographical errors

    12. Describe the architecture and the computational task of the NetTalk neural network.

    Answer:


Each group of 29 input units represents a letter, so the inputs together represent seven letters. The computational task is to output the representation of the phoneme corresponding to the middle letter of the seven.

    13. Why does a time-delay neural network (TDNN) have the same set of incoming weights for eachcolumn of hidden units?

    Answer:

To provide temporal translation invariance; or, so that the TDNN will be able to identify the input sound, no matter which frame the input sound begins in.

    14. Distinguish between a feedforward network and a recurrent network.

    Answer:

A feedforward network has no cyclic activation flows; a recurrent network contains at least one cycle (feedback connection), so activation can flow back to neurons that produced it.

15. Draw the weight matrix for a feedforward network, showing the partitioning. You can assume that the weight matrix for connections from the input layer to the hidden layer is W_ih, and that the weight matrix for connections from the hidden layer to the output layer is W_ho.

    Answer:
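The original diagram did not survive extraction; the following is a sketch of the usual partitioned form, assuming units are ordered (input, hidden, output) and that row i, column j holds the weight on the connection to unit i from unit j:

$$W = \begin{bmatrix} 0 & 0 & 0 \\ W_{ih} & 0 & 0 \\ 0 & W_{ho} & 0 \end{bmatrix}$$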

    16. In a Jordan network with i input neurons, h hidden layer neurons, and o output neurons:

    (a) how many neurons will there be in the state vector, and

(b) if i = 4, h = 3, and o = 2, draw a diagram showing the connectivity of the network. Do not forget the bias unit.

    Answer:

(a) o neurons in the state vector (the same as the output vector; that's the letter o, not zero)

    (b)


17. Draw a diagram illustrating the architecture of Elman's simple recurrent network that performs a temporal version of the XOR task. How are the two inputs to XOR provided to this network?

    Answer:

    The inputs are passed sequentially to the single input unit (0) of the temporal XOR net.

18. Briefly describe the use of cluster analysis in Elman's lexical class discovery experiments, and one of his conclusions from this.

    Answer:

Elman clustered hidden unit activation patterns corresponding to different input vectors and different sequences of inputs. He found that the clusters corresponded well to the grammatical contexts in which the inputs (or input sequences) occurred, and thus concluded that the network had in effect learned the grammar.

19. Draw an architectural diagram of a rank 2 tensor product network where the dimensions of the input/output vectors are 3 and 4. You do not need to show the detailed internal structure of the binding units.

    Answer:


20. Draw a diagram of a single binding unit in a rank 2 tensor product network illustrating the internal operation of the binding unit in teaching mode.

    Answer:

21. Define the concepts of dense and sparse random representations. How do their properties compare with those of an orthonormal set of representation vectors?

    Answer:

In a dense random representation, each vector component is chosen at random from a uniform distribution over, say, [-1, +1]. In a sparse random representation, the non-zero components are chosen in this way, but most components are chosen (at random) to be zero. In both cases, the vectors are normalised so that they have length 1.

Members of orthonormal sets of vectors have length one, and are orthogonal to one another. Vectors in dense and sparse random representations are orthogonal on average: their inner products have a mean of zero.

22. What is a Hadamard matrix? Describe how a Hadamard matrix can be used to produce suitable distributed concept representation vectors for a tensor product network. What are the properties of the Hadamard matrix that make the associated vectors suitable?

    Answer:


A Hadamard matrix H is a square matrix of side n, all of whose entries are ±1, which satisfies HHᵀ = nIₙ, where Iₙ is the identity matrix of side n. The rows of a Hadamard matrix, once normalised, can be used as distributed representation vectors in a tensor product network. This is because the rows are orthogonal to each other and have no zero components.

23. In a 2-D self-organising map with input vectors of dimension m, and k neurons in the map, how many weights will there be?

    Answer:

    mk

    24. Describe the competitive process of the Self-Organising Map algorithm.

    Answer:

    Let m denote the dimension of the input pattern

    x = [x1, x2, ..., xm]T

The weight vector for each of the neurons in the SOM also has dimension m. So for neuron j, the weight vector will be:

    wj = [wj1, wj2, ..., wjm]T

For an input pattern x, compute the inner product wⱼ·x for each neuron, and choose the largest inner product. Let i(x) denote the index of the winning neuron (and also the output of a trained SOM).
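A minimal sketch of this competitive step in Python (the array shapes and example numbers are assumptions; W holds one m-dimensional weight vector per row):

    import numpy as np

    def winner(W, x):
        # Return the index i(x) of the neuron whose weight vector
        # has the largest inner product with the input pattern x.
        return int(np.argmax(W @ x))

    # Example: 4 map neurons, 3-dimensional input (made-up numbers)
    W = np.array([[0.2, 0.1, 0.7],
                  [0.9, 0.0, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.1, 0.8, 0.1]])
    x = np.array([1.0, 0.0, 0.0])
    print(winner(W, x))  # 1: the second neuron's weight vector matches x best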

    25. Briefly explain the concept of a Voronoi cell.

    Answer:

Given a set of vectors X, the Voronoi cells about those vectors are the ones that partition the space they lie in, according to the nearest-neighbour rule. That is, the Voronoi cell that a vector lies in is the one belonging to the x ∈ X to which it is closest.

    26. Briefly explain the term code book in the context of learning vector quantisation.

    Answer:

When compressing data by representing vectors by the labels of a relatively small set of reconstruction vectors, the set of reconstruction vectors is called the code book.

    27. Describe the relationship between the Self-Organising Map algorithm, and the Learning VectorQuantisation algorithm.

    Answer:


In order to use Learning Vector Quantisation (LVQ), a set of approximate reconstruction vectors is first found using the unsupervised SOM algorithm. The supervised LVQ algorithm is then used to fine-tune the vectors found using SOM.

    28. Briefly describe two types of attractor in a dynamical system.

    Answer:

An attractor is a bounded subset of space to which non-trivial regions of initial conditions converge as time passes. Pick two of:

    point attractor: system converges to a single point

    limit cycle: system converges to a cyclic path

    chaotic attractor: stays within a bounded region of space, but no predictable cyclic path

29. Write down the energy function of a BSB network with weight matrix W, feedback constant β, and activation vector x.

    Answer:
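The formula did not survive extraction; a standard form of the BSB energy function (assuming a symmetric weight matrix W and feedback constant β) is:

$$E = -\frac{\beta}{2}\,\mathbf{x}^T W \mathbf{x}$$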

30. Compute the weight matrix for a 4-neuron Hopfield net with the single fundamental memory ξ₁ = [1, 1, 1, 1]ᵀ stored in it.

    Answer:
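The worked matrix did not survive extraction. A sketch using the standard outer-product construction with zeroed diagonal, W = ξ₁ξ₁ᵀ − I (note that any minus signs in the printed memory vector may have been lost in extraction); for ξ₁ = [1, 1, 1, 1]ᵀ as printed, this gives:

$$W = \xi_1 \xi_1^T - I = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}$$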

    31. Write down the energy function of a discrete Hopfield net.


    Answer:
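The formula did not survive extraction; the standard energy function of a discrete Hopfield net, with weights w_ij, states x_i, and thresholds θ_i, is:

$$E = -\frac{1}{2}\sum_{i}\sum_{j \neq i} w_{ij}\, x_i x_j + \sum_i \theta_i x_i$$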

    32. What is Artificial Neural Network?

An extremely simplified model of the brain

Essentially a function approximator: transforms inputs into outputs to the best of its ability

    Composed of many neurons that co-operate to perform the desired function

    33. What Are ANNs Used For?

Classification: pattern recognition, feature extraction, image matching

Noise reduction: recognize patterns in the inputs and produce noiseless outputs

  • 8/10/2019 NN Question Bank VIISem

    12/42

Prediction: extrapolation based on historical data

Ability to learn: NNs figure out how to perform their function on their own, determining their function based only upon sample inputs

Ability to generalize: produce reasonable outputs for inputs they have not been taught how to deal with

34. How do Neural Networks Work?

    The building blocks of neural networks are the neurons.

In technical systems, we also refer to them as units or nodes.

    Basically, each neuron

receives input from many other neurons,

    changes its internal state (activation) based on the current input,

sends one output signal to many other neurons, possibly including its input neurons (recurrent network).

Information is transmitted as a series of electric impulses, so-called spikes. The frequency and phase of these spikes encode the information.

In biological systems, one neuron can be connected to as many as 10,000 other neurons.

    Usually, a neuron receives its information from other neurons in a confined area, its so-called

    receptive field.

NNs are able to learn by adapting their connectivity patterns so that the organism improves its behavior in terms of reaching certain (evolutionary) goals.

The strength of a connection, or whether it is excitatory or inhibitory, depends on the state of a receiving neuron's synapses. The NN achieves learning by appropriately adapting the states of its synapses.

    The output of a neuron is a function of the weighted sum of the inputs plus a bias

    The function of the entire neural network is simply the computation of the outputs of all the neurons


— an entirely deterministic calculation.

35. Explain Gaussian Neurons

Another type of neuron overcomes this problem by using a Gaussian activation function:

$$f_i(\mathrm{net}_i(t)) = e^{-(\mathrm{net}_i(t)-1)^2/\sigma^2}$$

Gaussian neurons are able to realize non-linear functions. Therefore, networks of Gaussian units are in principle unrestricted with regard to the functions that they can realize. The drawback of Gaussian neurons is that we have to make sure that their net input does not exceed 1. This adds some difficulty to the learning in Gaussian networks.

36. Explain Sigmoidal Neurons

Sigmoidal neurons accept any vectors of real numbers as input, and they output a real number between 0 and 1. Sigmoidal neurons are the most common type of artificial neuron, especially in learning networks.

A network of sigmoidal units with m input neurons and n output neurons realizes a network function f: R^m → (0,1)^n

$$f_i(\mathrm{net}_i(t)) = \frac{1}{1+e^{-(\mathrm{net}_i(t)-\theta)/\tau}}$$

The parameter τ controls the slope of the sigmoid function, while the parameter θ controls the horizontal offset of the function in a way similar to the threshold neurons. In backpropagation networks, we typically choose τ = 1 and θ = 0.

37. Explain Correlation Learning

Hebbian Learning (1949):


When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.

    Weight modification rule:

$$\Delta w_{i,j} = c\, x_i x_j$$

Eventually, the connection strength will reflect the correlation between the neurons' outputs.
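A minimal sketch of one Hebbian update in Python (the learning rate c and the toy activations are assumptions):

    import numpy as np

    def hebbian_update(W, x, c=0.1):
        # Hebb's rule: delta w_ij = c * x_i * x_j, so connections between
        # units that are active together are strengthened.
        return W + c * np.outer(x, x)

    W = np.zeros((3, 3))
    x = np.array([1.0, -1.0, 1.0])  # simultaneous unit activations
    W = hebbian_update(W, x)
    print(W)  # positive for correlated pairs, negative for anti-correlated ones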

    38. Explain Competitive Learning

    Nodes compete for inputs

    Node with highest activation is the winner

    Winner neuron adapts its tuning (pattern of weights) even further towards the current input

    Individual nodes specialize to win competition for a set of similar inputs

    Process leads to most efficient neural representation of input space

    Typical for unsupervised learning

    39. Explain Linear Neurons

    Obviously, the fact that threshold units can only output the values 0 and 1 restricts their applicability to

    certain problems.

We can overcome this limitation by eliminating the threshold and simply turning f_i into the identity function, so that we get: x_i(t) = net_i(t)

    With this kind of neuron, we can build feedforward networks with m input neurons and n output neurons

that compute a function f: R^m → R^n.

    Linear neurons are quite popular and useful for applications such as interpolation.

However, they have a serious limitation: each neuron computes a linear function, and therefore the overall network function f: R^m → R^n is also linear.

This means that if an input vector x results in an output vector y, then for any factor α, the input αx will result in the output αy.

    Obviously, many interesting functions cannot be realized by networks of linear neurons.

    40. Explain Gradient Descent

    Gradient descent is a very common technique to find the absolute minimum of a function.

It is especially useful for high-dimensional functions. We will use it to iteratively minimize the network's (or neuron's) error by finding the gradient of the error surface in weight-space and adjusting the weights in the opposite direction.
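A minimal sketch of this procedure for a one-dimensional error function (the example function and step size are assumptions for illustration):

    def gradient_descent(dE, w0, eta=0.1, steps=100):
        # Repeatedly move the weight against the gradient dE(w).
        w = w0
        for _ in range(steps):
            w -= eta * dE(w)
        return w

    # Example: E(w) = (w - 3)^2 has gradient dE/dw = 2(w - 3); minimum at w = 3
    print(gradient_descent(lambda w: 2 * (w - 3), w0=0.0))  # approx. 3.0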



Gradient descent example: finding the absolute minimum of a one-dimensional error function f(x). We repeat the descent step iteratively until, for some x_i, f(x_i) is sufficiently close to 0.

Gradients of two-dimensional functions: the two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient is always pointing in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.

41. Develop a Perceptron Training Algorithm

Algorithm Perceptron;

Start with a randomly chosen weight vector w0;

Let k = 1;

While there exist input vectors that are misclassified by wk-1, do

Let ij be a misclassified input vector;


Let xk = class(ij)·ij, implying that wk-1·xk < 0;

Update the weight vector to wk = wk-1 + η·xk;

Increment k;

end-while;

For example, for some input i with class(i) = -1,

if w·i > 0, then we have a misclassification.

Then the weight vector needs to be modified to w + Δw

with (w + Δw)·i < w·i to possibly improve classification.

We can choose Δw = -η·i, because

(w + Δw)·i = (w - η·i)·i = w·i - η·(i·i) < w·i,

and i·i is the square of the length of vector i and is thus positive.

If class(i) = 1, things are the same but with opposite signs; we introduce x to unify these two cases.
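A minimal sketch of this training loop in Python (the toy data set is an assumption; classes are coded as +1/-1, a learning rate of 1 is used, and the data are assumed linearly separable so the loop terminates):

    import numpy as np

    def train_perceptron(samples):
        # samples: list of (input_vector, class) pairs with class in {+1, -1}.
        # x_k = class(i)*i is misclassified while w.x_k <= 0; the update is
        # then w_k = w_{k-1} + x_k, as in the algorithm above.
        w = np.random.randn(len(samples[0][0]))  # randomly chosen w0
        while True:
            mis = [c * np.asarray(i) for i, c in samples
                   if c * (w @ np.asarray(i)) <= 0]
            if not mis:
                return w
            w = w + mis[0]

    data = [([1.0, 2.0], +1), ([2.0, 1.0], +1), ([-1.0, -1.5], -1)]
    print(train_perceptron(data))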

    42. Develop an Adaline Learning Algorithm?

    The Adaline uses gradient descent to determine the weight vector that leads to minimal error.

Error is defined as the MSE between the neuron's net input net_j and its desired output d_j (= class(i_j))

    across all training samples ij.

    The idea is to pick samples in random order and perform (slow) gradient descent in their individual error

    functions.

    This technique allows incremental learning, i.e., refining of the weights as more training samples are

    added.

$$E = (d_j - \mathrm{net}_j)^2$$

$$\frac{\partial E}{\partial w_k} = -2\,(d_j - \mathrm{net}_j)\,\frac{\partial\,\mathrm{net}_j}{\partial w_k} = -2\,(d_j - \mathrm{net}_j)\,\frac{\partial}{\partial w_k}\sum_{l=0}^{n} w_l\, i_{j,l}$$


    The Adaline uses gradient descent to determine the weight vector that leads to minimal error.

The gradient is then given by

$$\nabla_{\mathbf{w}} E = \left(\frac{\partial E}{\partial w_0}, \ldots, \frac{\partial E}{\partial w_n}\right) = -2\,(d_j - \mathrm{net}_j)\,\mathbf{i}_j$$

For gradient descent, Δw should be a negative multiple of the gradient:

$$\Delta\mathbf{w} = \eta\,(d_j - \mathrm{net}_j)\,\mathbf{i}_j \quad \text{(with positive step-size parameter } \eta\text{)}$$
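A minimal sketch of one incremental Adaline step in Python (the learning rate and toy sample are assumptions):

    import numpy as np

    def adaline_step(w, i, d, eta=0.01):
        # One gradient-descent step on E = (d - net)^2 for a single sample:
        # delta w = eta * (d - net) * i, with net = w.i
        net = w @ i
        return w + eta * (d - net) * i

    w = np.zeros(3)
    i = np.array([1.0, 0.5, -1.0])  # one training input (including a bias input)
    w = adaline_step(w, i, d=1.0)
    print(w)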

43. Explain the difference between Internal Representation Issues and External Interpretation Issues?

    Internal Representation Issues

    As we said before, in all network types, the amplitude of input signals and internal signals is limited:

    analog networks: values usually between 0 and 1

    binary networks: only values 0 and 1allowed

    bipolar networks: only values 1 and 1allowed

Without this limitation, patterns with large amplitudes would dominate the network's behavior. A disproportionately large input signal can activate a neuron even if the relevant connection weight is very small.

    External Interpretation Issues

    From the perspective of the embedding application, we are concerned with the interpretation of input and

    output signals.

    These signals constitute the interface between the embedding application and its NN component.

    Often, these signals only become meaningful when we define an external interpretation for them.

This is analogous to biological neural systems: the same signal takes on a completely different meaning when it is interpreted by different brain areas (motor cortex, visual cortex, etc.).

    Without any interpretation, we can only use standard methods to define the difference (or similarity)

    between signals.

    For example, for binary patterns x and y, we could



treat them as binary numbers and compute their difference as |x - y|

treat them as vectors and use the cosine of the angle between them as a measure of similarity

count the numbers of digits that we would have to flip in order to transform x into y (Hamming distance)

Example: two binary patterns x and y:

x = 0001 0010111110 0010001100 1011001001

y = 1000 1000010000 1000010000 1000011110

These patterns seem to be very different from each other. However, given their external interpretation, x and y actually represent the same thing.

44. Explain the process of data representation?

Most networks process information in the form of input pattern vectors.

These networks produce output pattern vectors that are interpreted by the embedding application.

All networks process one of two types of signal components: analog (continuously variable) signals or discrete (quantized) signals.

In both cases, signals have a finite amplitude; their amplitude has a minimum and a maximum value.


The main question is: How can we appropriately capture these signals and represent them as pattern vectors that we can feed into the network?

We should aim for a data representation scheme that maximizes the ability of the network to detect (and respond to) relevant features in the input pattern.

Relevant features are those that enable the network to generate the desired output pattern.

Similarly, we also need to define a set of desired outputs that the network can actually produce.

Often, a natural representation of the output data turns out to be impossible for the network to produce.

We are going to consider internal representation and external interpretation issues as well as specific methods for creating appropriate representations.

45. Explain the process of Multiclass Discrimination?

Often, our classification problems involve more than two classes. For example, character recognition requires at least 26 different classes. We can perform such tasks using layers of perceptrons or Adalines.

A four-node perceptron for a four-class problem in n-dimensional input space (figure lost in extraction): each perceptron learns to recognize one particular class, i.e., to output 1 if the input is in that class, and 0 otherwise.


    The units can be trained separately and in parallel.

In production mode, the network decides that its current input is in the k-th class if and only if o_k = 1 and, for all j ≠ k, o_j = 0; otherwise the input is misclassified.

For units with real-valued output, the neuron with maximal output can be picked to indicate the class of the input.

    This maximum should be significantly greater than all other outputs, otherwise the input is misclassified.

    46. Explain difference between Supervised and unsupervised learning?

    Supervised learning:

    An archaeologist determines the gender of a human skeleton based on many past examples of

    male and female skeletons.

    Unsupervised learning:

    The archaeologist determines whether a large number of dinosaur skeleton fragments belong to

    the same species or multiple species. There are no previous data to guide the archaeologist, and

    no absolute criterion of correctness.

    47. Explain different ways of representing the data in the neural network system? 10

    48. Explain temporal data representations? Give example. 10

    49. Write a note on Adaptive Networks

As you know, there is no equation that would tell you the ideal number of neurons in a multi-layer network. Ideally, we would like to use the smallest number of neurons that allows the network to do its task sufficiently accurately, because of the small number of weights in the system, fewer training samples being required, faster training, and typically better generalization for new test samples.

So far, we have determined the number of hidden-layer units in BPNs by trial and error. However, there are algorithmic approaches for adapting the size of a network to a given task. Some techniques start with a large network and then iteratively prune connections and nodes that contribute little to the network function. Other methods start with a minimal network and then add connections and nodes until the network reaches a given performance level. Finally, there are algorithms that combine these pruning and growing approaches.

50. Write a note on Cascade correlation

None of these algorithms is guaranteed to produce ideal networks. (It is not even clear how to define an ideal network.) However, numerous algorithms exist that have been shown to yield good results for most applications. We will take a look at one such algorithm, named cascade correlation. It is of the network-growing type and can be used to build multi-layer networks of adequate size. However, these networks are not strictly feed-forward in a level-by-level manner.


This learning algorithm is much faster than backpropagation learning, because only one neuron is trained at a time. On the other hand, its inability to retrain neurons may prevent the cascade correlation network from finding optimal weight patterns for encoding the given function.

    51. Explain Covariance and Correlation

For a dataset (x_i, y_i) with i = 1, …, n the covariance is:

$$\mathrm{cov}(x,y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Covariance tells us something about the strength and direction (directly vs. inversely proportional) of the linear relationship between x and y. For many applications, it is useful to normalize this variable so that it ranges from -1 to 1. The result is the correlation coefficient r, which for a dataset (x_i, y_i) with i = 1, …, n is given by:

$$r = \mathrm{corr}(x,y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$


In the case of high (close to 1) or low (close to -1) correlation coefficients, we can use one variable as a predictor of the other one. To quantify the linear relationship between the two variables, we can use linear regression:
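A minimal numeric sketch of these two quantities (made-up data; np.corrcoef is used only to cross-check the formula above):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.0])

    cov = np.mean((x - x.mean()) * (y - y.mean()))
    r = cov / (x.std() * y.std())           # correlation coefficient
    print(cov, r, np.corrcoef(x, y)[0, 1])  # r matches numpy's value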

    52. What are the benefits to have smallest number of neurons in the network? 4

53. Develop a cascade correlation algorithm? What is it used for? What are its advantages?

We start with a minimal network consisting of only the input neurons (one of them should be a constant offset = 1) and the output neurons, completely connected as usual. The output neurons (and later the hidden neurons) typically use output functions that can also produce negative outputs; e.g., we can subtract 0.5 from our sigmoid function for a (-0.5, 0.5) output range. Then we successively add hidden-layer neurons and train them to reduce the network error step by step:


Weights to each new hidden node are trained to maximize the covariance of the node's output with the current network error. Covariance:

$$S(\mathbf{w}_{\mathrm{new}}) = \sum_{k=1}^{K} \left| \sum_{p=1}^{P} (x_{\mathrm{new},p} - \bar{x}_{\mathrm{new}})(E_{k,p} - \bar{E}_k) \right|$$

where w_new is the vector of weights to the new node, x_new,p is the output of the new node for the p-th input sample, E_k,p is the error of the k-th output node for the p-th input sample before the new node is added, and x̄_new and Ē_k are averages over the training set.

Since we want to maximize S (as opposed to minimizing some error), we use gradient ascent:

$$\Delta w_i = \eta \frac{\partial S}{\partial w_i} = \eta \sum_{k=1}^{K} \sum_{p=1}^{P} S_k\, (E_{k,p} - \bar{E}_k)\, f'_p\, I_{i,p}$$

where I_{i,p} is the i-th input for the p-th pattern, S_k is the sign of the correlation between the node's output and the k-th network output, η is the learning rate, and f'_p is the derivative of the node's activation function with respect to its net input, evaluated at the p-th pattern.

If we can find weights so that the new node's output perfectly covaries with the error in each output node, we can set the new output node weights and offsets so that the new error is zero. More realistically, there will be no perfect covariance, which means that we will set each output node weight so that the error is minimized. To do this, we can use gradient descent or linear regression for each individual output node weight.



The next added hidden node will further reduce the remaining network error, and so on, until we reach a desired error threshold.

This learning algorithm is much faster than backpropagation learning, because only one neuron is trained at a time. On the other hand, its inability to retrain neurons may prevent the cascade correlation network from finding optimal weight patterns for encoding the given function.

54. What are input space clusters and radial basis functions (RBFs)? 6

To achieve such local receptive fields, we can use radial basis functions, i.e., functions whose output only depends on the Euclidean distance between the input vector and another (weight) vector.

A typical choice is a Gaussian function:

$$g(D) = e^{-(D/c)^2}$$

where c determines the width of the Gaussian. However, any radially symmetric, non-increasing function could be used.

55. Explain linear interpolation for the one-dimensional and multidimensional case? 5

For function approximation, the desired output for new (untrained) inputs could be estimated by linear interpolation. As a simple example, how do we determine the desired output of a one-dimensional function at a new input x0 that is located between known data points x1 and x2?

$$f(x_0) = f(x_1) + \frac{\bigl(f(x_2) - f(x_1)\bigr)(x_0 - x_1)}{x_2 - x_1}$$

which simplifies to:

$$f(x_0) = \frac{D_1^{-1} f(x_1) + D_2^{-1} f(x_2)}{D_1^{-1} + D_2^{-1}}$$

with distances D1 and D2 from x0 to x1 and x2, respectively.

In the multi-dimensional case, hyperplane segments connect neighboring points, so that the desired output for a new input x0 is determined by the P0 known samples that surround it:

$$f(x_0) = \frac{D_1^{-1} f(x_1) + \cdots + D_{P_0}^{-1} f(x_{P_0})}{D_1^{-1} + \cdots + D_{P_0}^{-1}}$$

where Dp is the Euclidean distance between x0 and xp, and f(xp) is the desired output value for input xp.

Example for f: R² → R¹ (figure with the desired outputs lost in extraction): for four nearest neighbors, the desired output for x0 is the corresponding D⁻¹-weighted average with P0 = 4.
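A minimal sketch of this distance-weighted interpolation in Python (the sample points and values are assumptions):

    import numpy as np

    def interpolate(x0, xs, fs):
        # Estimate f(x0) as the D^-1-weighted average of the known values fs
        # at the points xs, where D_p is the Euclidean distance from x0 to x_p.
        d = np.array([np.linalg.norm(np.asarray(x0) - np.asarray(xp)) for xp in xs])
        w = 1.0 / d  # assumes x0 does not coincide with any sample point
        return np.dot(w, fs) / np.sum(w)

    xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    fs = [4.0, 5.0, 6.0, 7.0]
    print(interpolate([0.5, 0.25], xs, fs))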

56. Explain different types of learning methods? What are counterpropagation networks?



Unsupervised/Supervised Learning.

The counterpropagation network (CPN) is a fast-learning combination of unsupervised and supervised learning. Although this network uses linear neurons, it can learn nonlinear functions by means of a hidden layer of competitive units. Moreover, the network is able to learn a function and its inverse at the same time. However, to simplify things, we will only consider the feedforward mechanism of the CPN.

57. Explain the process of learning in radial basis function networks? 5

If we are using such linear interpolation, then our radial basis function (RBF) Φ that weights an input vector based on its distance to a neuron's reference (weight) vector is Φ(D) = D⁻¹.

For the training samples xp, p = 1, …, P0, surrounding the new input x, we find for the network's output o:

$$o = \frac{1}{P_0} \sum_{p=1}^{P_0} d_p\, \Phi(\lVert \mathbf{x} - \mathbf{x}_p \rVert), \quad \text{where } d_p = f(\mathbf{x}_p)$$

(In the following, to keep things simple, we will assume that the network has only one output neuron. However, any number of output neurons could be implemented.)

Since it is difficult to define what "surrounding" should mean, it is common to consider all P training samples and use any monotonically decreasing RBF Φ:

$$o = \frac{1}{P} \sum_{p=1}^{P} d_p\, \Phi(\lVert \mathbf{x} - \mathbf{x}_p \rVert)$$

This, however, implies a network that has as many hidden nodes as there are training samples. This is unacceptable because of its computational complexity and likely poor generalization ability: the network resembles a look-up table. It is more useful to have fewer neurons and accept that the training set cannot be learned 100% accurately:

$$o = \frac{1}{N} \sum_{i=1}^{N} d_i\, \Phi(\lVert \mathbf{x} - \boldsymbol{\mu}_i \rVert)$$

Here, ideally, each reference vector μ_i of these N neurons should be placed in the center of an input-space cluster of training samples with identical (or at least similar) desired output d_i. To learn near-optimal values for the reference vectors and the output weights, we can, as usual, employ gradient descent.

58. Write a note on distance and similarity functions with respect to counterpropagation network? 5

In the hidden layer, the neuron whose weight vector is most similar to the current input vector is the winner. There are different ways of defining such maximal similarity, for example:

(1) Maximal cosine similarity (same as net input):

$$s(\mathbf{w}, \mathbf{x}) = \mathbf{w} \cdot \mathbf{x}$$

(2) Minimal Euclidean distance:

$$d(\mathbf{w}, \mathbf{x}) = \sum_i (w_i - x_i)^2$$

(no square root necessary for determining the winner)

    59. Develop a counterpropagation network learning algorithm? 10

A simple CPN with two input neurons, three hidden neurons, and two output neurons can be described as follows:



The CPN learning process (general form for n input units and m output units):

1. Randomly select a vector pair (x, y) from the training set.

2. If you use the cosine similarity function, normalize (shrink/expand to length 1) the input vector x by dividing every component of x by the magnitude ||x||, where

$$\lVert \mathbf{x} \rVert = \sqrt{\sum_{j=1}^{n} x_j^2}$$

3. Initialize the input neurons with the resulting vector and compute the activation of the hidden-layer units according to the chosen similarity measure.

4. In the hidden (competitive) layer, determine the unit W with the largest activation (the winner).

5. Adjust the connection weights between W and all N input-layer units according to the formula:

$$w_{Wn}^{H}(t+1) = w_{Wn}^{H}(t) + \alpha\,(x_n - w_{Wn}^{H}(t))$$

6. Repeat steps 1 to 5 until all training patterns have been processed once.

7. Repeat step 6 until each input pattern is consistently associated with the same competitive unit.

8. Select the first vector pair in the training set (the current pattern).

9. Repeat steps 2 to 4 (normalization, competition) for the current pattern.

10. Adjust the connection weights between the winning hidden-layer unit and all M output layer units according to the equation:

$$w_{mW}^{O}(t+1) = w_{mW}^{O}(t) + \beta\,(y_m - w_{mW}^{O}(t))$$

11. Repeat steps 9 and 10 for each vector pair in the training set.

12. Repeat steps 8 through 11 for several epochs.

    60. Develop a Quickprop learning algorithm? 10

The assumption underlying Quickprop is that the network error, as a function of each individual weight, can be approximated by a paraboloid. Based on this assumption, whenever we find that the gradient for a given weight switched its sign between successive epochs, we should fit a paraboloid through these data points and use its minimum as the next weight value. Illustration (sorry for the crummy paraboloid):



Newton's method: we model the error as E = aw² + bw + c, so that

$$E'(w(t)) = 2a\,w(t) + b \qquad E'(w(t-1)) = 2a\,w(t-1) + b$$

Solving these two equations for a and b gives

$$a = \frac{E'(w(t)) - E'(w(t-1))}{2\,(w(t) - w(t-1))}, \qquad b = E'(w(t)) - 2a\,w(t)$$

For the minimum of E we must have:

$$E'(w) = 2aw + b = 0 \quad\Rightarrow\quad w(t+1) = -\frac{b}{2a}$$

which leads to the Quickprop weight update

$$w(t+1) = w(t) + \bigl(w(t) - w(t-1)\bigr)\,\frac{E'(w(t))}{E'(w(t-1)) - E'(w(t))}$$

Notice that this method cannot be applied if the error gradient has not decreased in magnitude and has not changed its sign at the preceding time step. In that case, we would ascend in the error function or make an infinitely large weight modification. In most cases, Quickprop converges several times faster than standard backpropagation learning.

    61. Develop an Rprop learning algorithm? 10

Resilient Backpropagation (Rprop)

The Rprop algorithm takes a very different approach to improving backpropagation as compared to Quickprop. Instead of making more use of gradient information for better weight updates, Rprop only uses the sign of the gradient, because its size can be a poor and noisy estimator of required weight updates. Furthermore, Rprop assumes that different weights need different step sizes for updates, which vary throughout the learning process. The basic idea is that if the error gradient for a given weight w_ij had the same sign in two consecutive epochs, we increase its step size Δ_ij, because the weight's optimal value may be far away. If, on the other hand, the sign switched, we decrease the step size. Weights are always changed by adding or subtracting the current step size, regardless of the absolute value of the gradient. This way we do not get stuck with extreme weights that are hard to change because of the shallow slope in the sigmoid function.



Formally, the step size update rules are:

$$\Delta_{ij}^{(t)} = \begin{cases} \eta^{+}\,\Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} > 0 \\[4pt] \eta^{-}\,\Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} < 0 \\[4pt] \Delta_{ij}^{(t-1)}, & \text{otherwise} \end{cases}$$

Empirically, best results were obtained with initial step sizes of 0.1, η⁺ = 1.2, η⁻ = 0.5, Δmax = 50, and Δmin = 10⁻⁶.

Weight updates are then performed as follows:

$$\Delta w_{ij}^{(t)} = \begin{cases} -\Delta_{ij}^{(t)}, & \text{if } \frac{\partial E^{(t)}}{\partial w_{ij}} > 0 \\[4pt] +\Delta_{ij}^{(t)}, & \text{if } \frac{\partial E^{(t)}}{\partial w_{ij}} < 0 \\[4pt] 0, & \text{otherwise} \end{cases}$$

It is important to remember that, like in Quickprop, in Rprop the gradient needs to be computed across all samples (per-epoch learning).

The performance of Rprop is comparable to Quickprop; it also considerably accelerates backpropagation learning. Compared to both the standard backpropagation algorithm and Quickprop, Rprop has one advantage: it does not require the user to estimate or empirically determine a step size parameter and its change over time. Rprop will determine appropriate step size values by itself and can thus be applied as-is to a variety of problems without significant loss of efficiency.
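A minimal sketch of the per-weight Rprop step in Python (a single scalar weight for brevity; the constants are the values quoted above, and the sketch omits the gradient-zeroing backtracking detail of some Rprop variants):

    def rprop_step(w, g, g_prev, delta,
                   eta_plus=1.2, eta_minus=0.5, d_max=50.0, d_min=1e-6):
        # g, g_prev: dE/dw in the current and previous epoch.
        if g * g_prev > 0:                        # same sign: grow the step size
            delta = min(delta * eta_plus, d_max)
        elif g * g_prev < 0:                      # sign flip: shrink the step size
            delta = max(delta * eta_minus, d_min)
        if g > 0:                                 # always move against the gradient
            w -= delta
        elif g < 0:
            w += delta
        return w, delta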

62. What are Maxnets? Give example. 5

A maxnet is a recurrent, one-layer network that uses competition to determine which of its nodes has the greatest initial input value.

All pairs of nodes have inhibitory connections with the same weight -ε, where typically ε ≤ 1/(# of nodes). In addition, each node has a self-excitatory connection to itself, whose weight θ is typically 1. The nodes update their net input and their output by the following equations:

$$\mathrm{net}_i = \sum_j w_{ij}\, x_j, \qquad f(\mathrm{net}) = \max(0, \mathrm{net})$$

All nodes update their output simultaneously. With each iteration, the neurons' activations will decrease until only one neuron remains active. This is the winner neuron that had the greatest initial input. Maxnet is a biologically plausible implementation of a maximum-finding function. In parallel hardware, it can be more efficient than a corresponding serial function. We can add maxnet connections to the hidden layer of a CPN to find the winner neuron.

Example of a Maxnet with five neurons, θ = 1 and ε = 0.2:
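A minimal sketch of the Maxnet iteration in Python (five neurons with ε = 0.2 as in the example above; the initial activations are assumptions):

    import numpy as np

    def maxnet(x, eps=0.2, theta=1.0):
        # Iterate self-excitation (weight theta) and mutual inhibition (-eps)
        # until only one neuron remains active.
        x = np.asarray(x, dtype=float)
        while np.count_nonzero(x) > 1:
            net = theta * x - eps * (x.sum() - x)  # inhibition from all other nodes
            x = np.maximum(0.0, net)               # f(net) = max(0, net)
        return x

    print(maxnet([0.5, 0.9, 1.0, 0.7, 0.3]))  # only the neuron with input 1.0 survives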



    63. Write a note on Kohonen maps? 5

Self-Organizing Maps (Kohonen Maps)

As you may remember, the counterpropagation network employs a combination of supervised and unsupervised learning. We will now study Self-Organizing Maps (SOMs) as examples of completely unsupervised learning (Kohonen, 1980). This type of artificial neural network is particularly similar to biological systems (as far as we understand them). In the human cortex, multi-dimensional sensory input spaces (e.g., visual input, tactile input) are represented by two-dimensional maps. The projection from sensory inputs onto such maps is topology conserving. This means that neighboring areas in these maps represent neighboring areas in the sensory input space. For example, neighboring areas in the sensory cortex are responsible for the arm and hand regions.

    Such topology-conserving mapping can be achieved by SOMs:

Two layers: input layer and output (map) layer.

Input and output layers are completely connected.

Output neurons are interconnected within a defined neighborhood.

A topology (neighborhood relation) is defined on the output layer.

    Network structure:

    Common output-layer structures:


A neighborhood function φ(i, k) indicates how closely neurons i and k in the output layer are connected to each other. Usually, a Gaussian function on the distance between the two neurons in the layer is used, e.g.:

$$\varphi(i, k) = e^{-d(i,k)^2 / 2\sigma^2}$$

    64. Describe Adaptive resonance theory with an example? 10

    Adaptive Resonance Theory (ART) networks perform completely unsupervised learning.

    Their competitive learning algorithm is similar to the first (unsupervised) phase of CPN learning.

    However, ART networks are able to grow additional neurons if a new input cannot be categorized

    appropriately with the existing neurons.

A vigilance parameter ρ determines the tolerance of this matching process.

A greater value of ρ leads to more, smaller clusters (= input samples associated with the same winner neuron).

    ART networks consist of an input layer and an output layer.

    We will only discuss ART-1 networks, which receive binary input vectors.

    Bottom-up weights are used to determine output-layer candidates that may best match the current input.

    Top-down weights represent the prototype for the cluster defined by each output neuron.

    A close match between input and prototype is necessary for categorizing the input.

    Finding this match can require multiple signal exchanges between the two layers in both directions until

    resonance is established or a new neuron is added.

    ART networks tackle the stability-plasticity dilemma:


    Plasticity: They can always adapt to unknown inputs (by creating a new cluster with a new weight vector)

    if the given input cannot be classified by existing clusters.

    Stability: Existing clusters are not deleted by the introduction of new inputs (new clusters will just be

    created in addition to the old ones).

Problem: clusters are of fixed size, depending on ρ.

A. Initialize each top-down weight t_l,j(0) = 1;

B. Initialize each bottom-up weight b_j,l(0) = 1/(1 + n);

C. While the network has not stabilized, do

1. Present a randomly chosen pattern x = (x1, …, xn) for learning.

2. Let the active set A contain all nodes; calculate y_j = b_j,1·x1 + … + b_j,n·xn for each node j ∈ A;

3. Repeat

a) Let j* be a node in A with largest y_j, with ties being broken arbitrarily;

b) Compute s* = (s*1, …, s*n), where s*_l = t_l,j*·x_l;

c) Compare the similarity between s* and x with the given vigilance parameter ρ:

if (Σ_l s*_l) / (Σ_l x_l) < ρ then remove j* from set A

else associate x with node j* and update the weights:

b_j*,l(new) = s*_l / (0.5 + Σ_i s*_i), t_l,j*(new) = s*_l

until A is empty or x has been associated with some node j*;


4. If A is empty, then create a new node whose weight vector coincides with the current input pattern x;

    end-while

    65. What is classification?

    A. Deciding which features to use in a pattern recognition problem.

    B. Deciding which class an input pattern belongs to.

    C. Deciding which type of neural network to use.

    Answer: B

    66. What is a pattern vector?

    A. A vector of weights w = [w1,w2, ...,wn]T in a neural network.

    B. A vector of measured features x = [x1, x2, ..., xn]T of an input example.

    C. A vector of outputs y = [y1, y2, ..., yn]T of a classifier.

    Answer: B

67. For a minimum distance classifier with one input variable, what is the decision boundary between two classes?

    A. A line.

    B. A curve.

    C. A plane.

    D. A hyperplane.

    E. A discriminant.

    Answer: E

68. For a Bayes classifier with two input variables, what is the decision boundary between two classes?

    A. A line.

    B. A curve.

    C. A plane.

    D. A hypercurve.


    E. A discriminant.

    Answer: B

69. Design a minimum distance classifier with three classes using the following training data:

Then classify the test vector [0.5, 1]ᵀ with the trained classifier. Which class does this vector belong to?

    A. Class 1.

    B. Class 2.

    C. Class 3.

    Answer: B

70. The decision function for a minimum distance classifier is d_j(x) = xᵀm_j − ½·m_jᵀm_j, where m_j is the prototype vector for class ω_j. What is the value of the decision function for each of the three classes in the above question for the test vector [0, 0.5]ᵀ?

    A. d1(x) = 1.5, d2(x) = 0.5, d3(x) = 0.5.

    B. d1(x) = 0.875, d2(x) = 0.375, d3(x) = 2.375.

    C. d1(x) = 0.5, d2(x) = 0.5, d3(x) = 1.5.

    D. d1(x) = 0.375, d2(x) = 0.875, d3(x) = 0.875.

    Answer: A

71. Is the following statement true or false? An outlier is an input pattern that is very different from the typical patterns of the same class.

    A. TRUE.

    B. FALSE.

    Answer: A

    72. What is generalization?

A. The ability of a pattern recognition system to approximate the desired output values for pattern vectors which are not in the test set.

B. The ability of a pattern recognition system to approximate the desired output values for pattern vectors which are not in the training set.


C. The ability of a pattern recognition system to extrapolate on pattern vectors which are not in the training set.

D. The ability of a pattern recognition system to interpolate on pattern vectors which are not in the test set.

    Answer: B

73. Is the following statement true or false? In the human brain, roughly 70% of the neurons are used for input and output. The remaining 30% are used for information processing.

    A. TRUE.

    B. FALSE.

    Answer: B

    74. Which of the following statements is the best description of supervised learning?

    A. If a particular input stimulus is always active when a neuron fires then its weight should be increased.

B. If a stimulus acts repeatedly at the same time as a response then a connection will form between the neurons involved. Later, the stimulus alone is sufficient to activate the response.

C. The connection strengths of the neurons involved are modified to reduce the error between the desired and actual outputs of the system.

    Answer: C

75. Is the following statement true or false? Artificial neural networks are parallel computing devices consisting of many interconnected simple processors.

    A. TRUE.

    B. FALSE.

    Answer: A

76. Is the following statement true or false? Knowledge is acquired by a neural network from its environment through a learning process, and this knowledge is stored in the connection strengths (weights) between the processing units (neurons).

    A. TRUE.

    B. FALSE

    Answer: A

77. A neuron with 4 inputs has the weight vector w = [1, 2, 3, 4]ᵀ and a bias θ = 0 (zero). The activation function is linear, where the constant of proportionality equals 2; that is, the activation function is given by f(net) = 2·net. If the input vector is x = [4, 8, 5, 6]ᵀ then the output of the neuron will be


    A. 1.

    B. 56.

    C. 59.

    D. 112.

    E. 118.

    Answer: E
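A quick check of the arithmetic: net = 1·4 + 2·8 + 3·5 + 4·6 = 59, so the output is f(net) = 2·59 = 118.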

    78. Which of the following types of learning can used for training artificial neural networks?

    A. Supervised learning.

    B. Unsupervised learning.

    C. Reinforcement learning.

    D. All of the above answers.

    E. None of the above answers.

    Answer: D

    79. Which of the following neural networks uses supervised learning?

    A. Simple recurrent network.

    B. Self-organizing feature map.

    C. Hopfield network.

    D. All of the above answers.

    E. None of the above answers.

    Answer: A

    80. Which of the following algorithms can be used to train a single-layer feedforward network?

    A. Hard competitive learning.

    B. Soft competitive learning.

    C. A genetic algorithm.

    D. All of the above answers.


What are the new values of the weights and threshold after one step of training with the input vector x = [0, 1]ᵀ and desired output 1, using a learning rate η = 0.5?

A. w1 = 0.5, w2 = 0.2, θ = 0.3.

B. w1 = 0.5, w2 = 0.3, θ = 0.2.

C. w1 = 0.5, w2 = 0.3, θ = 0.2.

D. w1 = 0.5, w2 = 0.3, θ = 0.7.

E. w1 = 1.0, w2 = 0.2, θ = 0.2.

    Answer: C

    85. The Perceptron Learning Rule states that for any data set which is linearly separable, thePerceptron Convergence Theorem is guaranteed to find a solution in a finite number of steps.

    A. TRUE.

    B. FALSE.

    Answer: B

86. Is the following statement true or false? The XOR problem can be solved by a multi-layer perceptron, but a multi-layer perceptron with bipolar step activation functions cannot learn to do this.

    A. TRUE.

    B. FALSE.

    Answer: A

87. The Adaline neural network can be used as an adaptive filter for echo cancellation in telephone circuits. For the telephone circuit given in the above figure, which one of the following signals carries the corrected message sent from the human speaker on the left to the human listener on the right? (Assume that the person on the left transmits an outgoing voice signal and receives an incoming voice signal from the person on the right.)

    A. The outgoing voice signal, s.

    B. The delayed incoming voice signal, n.

    C. The contaminated outgoing signal, s + n0.

    D. The output of the adaptive filter, y.

E. The error of the adaptive filter, ε = s + n0 − y.

    Answer: E


    88. What is the credit assignment problem in the training of multi-layer feedforward networks?

    A. The problem of adjusting the weights for the output layer.

    B. The problem of adapting the neighbours of the winning unit.

    C. The problem of defining an error function for linearly inseparable problems.

    D. The problem of avoiding local minima in the error function.

    E. The problem of adjusting the weights for the hidden layers.

    Answer: E

89. Is the following statement true or false? The generalized Delta rule solves the credit assignment problem in the training of multi-layer feedforward networks.

    A. TRUE.

    B. FALSE.

    Answer: A

90. A common technique for training MLFF networks is to calculate the generalization error on a separate data set after each epoch of training. Training is stopped when the generalization error starts to decrease. This technique is called

    A. Boosting.

    B. Momentum.

    C. Hold-one-out.

    D. Early stopping.

    E. None of the above answers.

    Answer: E

91. Which of the following statements is NOT true for an autoassociative feedforward network with a single hidden layer of neurons?

    A. During training, the target output vector is the same as the input vector.

    B. It is important to use smooth, non-decreasing activation functions in the hidden units.

C. The network could be trained using the backpropagation algorithm, but care must be taken to deal with the problem of local minima.

D. After training, the hidden units give a representation that is equivalent to the principal components of the training data, removing non-redundant parts of the input data.


E. The trained network can be split into two machines: the first layer of weights compresses the input pattern (encoder), and the second layer of weights reconstructs the full pattern (decoder).

    Answer: D

    92. Which of the following statements is NOT true for a simple recurrent network (SRN)?

    A. The training examples must be presented to the network in the correct order.

    B. The test examples must be presented to the network in the correct order.

    C. This type of network can predict the next chunk of data in the series from the past history of data.

D. The hidden units encode an internal representation of the data in the series that precedes the current input.

    E. The number of context units should be the same as the number of input units.

    Answer: E

    93. How many hidden layers are there in an autoassociative Hopfield network?

    A. None (0).

    B. One (1).

    C. Two (2).

    Answer: A

    94. A Hopfield network has 20 units. How many adjustable parameters does this network contain?

    A. 95

    B. 190

    C. 200

    D. 380

    E. 400

    Answer: B
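A quick check of the count: with N = 20 units, a Hopfield network has N(N − 1)/2 = 20·19/2 = 190 symmetric weights (no self-connections), i.e., 190 adjustable parameters.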

95. Is the following statement true or false? Patterns within a cluster should be similar in some way.

    A. TRUE.

    B. FALSE.


    Answer: A

96. Is the following statement true or false? Clusters that are similar in some way should be far apart.

    A. TRUE.

    B. FALSE.

    Answer: B

    97. Which of the following statements is NOT true for hard competitive learning (HCL)?

    A. There is no target output in HCL.

    B. There are no hidden units in a HCL network.

C. The input vectors are often normalized to have unit length, that is, ‖x‖ = 1.

D. The weights of the winning unit k are adapted by Δw_k = η(x − w_k), where x is the input vector.

E. The weights of the neighbours j of the winning unit are adapted by Δw_j = η_j(x − w_j), where η_j < η and j ≠ k.

    Answer: E

    98. Which of the following statements is NOT true for a self-organizing feature map (SOFM)?

    A. The size of the neighbourhood is decreased during training.

    B. The SOFM training algorithm is based on soft competitive learning.

    C. The network can grow during training by adding new cluster units when required.

    D. The cluster units are arranged in a regular geometric pattern such as a square or ring.

    E. The learning rate is a function of the distance of the adapted units from the winning unit.

    Answer: C

    99. Which of the following statements is the best description of reproduction?

    A. Randomly change a small part of some strings.

    B. Randomly generate small initial values for the weights.

    C. Randomly pick new strings to make the next generation.

    D. Randomly combine the genetic information from 2 strings.


    Answer: C

    100. Which of the following statements is the best description of mutation?

    A. Randomly change a small part of some strings.

    B. Randomly pick new strings to make the next generation.

    C. Randomly generate small initial values for the weights.

    D. Randomly combine the genetic information from 2 strings.

    Answer: A

    101. Ranking is a technique used for

    A. deleting undesirable members of the population.

    B. obtaining the selection probabilities for reproduction.

    C. copying the fittest member of each population into the mating pool.

    D. preventing too many similar individuals from surviving to the next generation.

    Answer: B

102. Is the following statement true or false? A genetic algorithm could be used to search the space of possible weights for training a recurrent artificial neural network, without requiring any gradient information.

    A. TRUE.

    B. FALSE.

    Answer: A

103. Is the following statement true or false? Learning produces changes within an agent that, over time, enable it to perform more effectively within its environment.

    A. TRUE.

    B. FALSE.

    Answer: A

104. Which application in intelligent mobile robots made use of a single-layer feedforward network?

    A. Goal finding.

    B. Path planning.


    C. Wall following.

    D. Route following.

    E. Gesture recognition.

    Answer: C

    105. Which application in intelligent mobile robots made use of a self-organizing feature map?

    A. Goal finding.

    B. Path planning.

    C. Wall following.

    D. Route following.

    E. Gesture recognition.

    Answer: D

    106. Which application in intelligent mobile robots made use of a genetic algorithm?

    A. Goal finding.

    B. Path planning.

    C. Wall following.

    D. Route following.

    E. Gesture recognition.

    Answer: B