
Slide 1: Artificial Neural Networks (Chapter 4)

• Perceptron
• Gradient Descent
• Multilayer Networks
• Backpropagation Algorithm

Slide 2: Amazing numbers

Slide 3: Properties of Neural Networks

Slide 6: Real Neurons

• Cell structures:
  – Cell body
  – Dendrites
  – Axon
  – Synaptic terminals

Slide 7: Neural Communication

• Electrical potential across the cell membrane exhibits spikes called action potentials.
• A spike originates in the cell body, travels down the axon, and causes synaptic terminals to release neurotransmitters.
• The chemical diffuses across the synapse to the dendrites of other neurons.
• Neurotransmitters can be excitatory or inhibitory.
• If the net input of neurotransmitters to a neuron from other neurons is excitatory and exceeds some threshold, it fires an action potential.

Slide 8: Real Neural Learning

• Synapses change size and strength with experience.

• Hebbian learning (1949): When two connected neurons are firing at the same time, the strength of the synapse between them increases.

“Neurons that fire together, wire together.”
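The slide states the rule qualitatively; one common formalization (an assumption here, not something from the slide) is the update Δw = η · xi · xj, where xi and xj are the activations of the two connected neurons and η is a learning rate, so the weight between them grows only when both are active at the same time.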

Slide 9: A Neuron Model

Slide 10: Artificial Neuron Model

• Model the network as a graph with cells as nodes and synaptic connections as weighted edges wi from node i.

• Model the net input to a cell as:

  net = Σi wi xi

• The cell output is a threshold of the net input:

  o = 1 if net > 0, −1 if net ≤ 0, i.e. o = sign(net)

• w0 is called the bias or threshold.

[Figure: a unit with inputs x1 … x4 weighted by w1 … w4 plus a bias weight w0, producing output O; the output is a step function of net that jumps from −1 to 1 at net = 0.]
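A minimal runnable sketch of this unit, with illustrative numbers (my own, not the slide's):

```python
# A minimal sketch of the unit on this slide. The function name and the
# example numbers are illustrative, not taken from the slide.

def unit_output(x, w, w0):
    """Compute o = sign(net) for net = w0 + sum_i wi * xi."""
    net = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > 0 else -1  # threshold at 0, as in the slide

# Four inputs x1..x4 with weights w1..w4 and bias weight w0.
print(unit_output(x=[1, 0, 1, 1], w=[0.5, -0.3, 0.2, 0.1], w0=-0.6))  # -> 1
```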

Slide 11: Artificial Neural Network

• Learning approach based on modeling adaptation in biological neural systems:

  – Perceptron: initial algorithm for learning simple neural networks (single layer).
    (Rosenblatt, Frank (1957). The Perceptron: A Perceiving and Recognizing Automaton. Report 85-460-1, Cornell Aeronautical Laboratory.)

  – Backpropagation: more complex algorithm for learning feedforward multi-layer neural networks.
    (Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (8 October 1986). "Learning representations by back-propagating errors". Nature 323 (6088): 533–536.)

Slide 12: A Perceptron

Slide 14: Perceptron Training

• Assume supervised training examples giving the desired output for a unit given a set of known input activations.

• Learn synaptic weights so that unit produces the correct output for each example.

• The perceptron uses an iterative update algorithm to learn a correct set of weights.

Slide 15: Perceptron Learning Rule

• Update weights by:

  wi ← wi + η (t − o) xi

  where η is the "learning rate" and t is the teacher-specified output (target value).

• Equivalent to the rules:
  – If the output is correct, do nothing.
  – If the output is high, lower the weights on active inputs.
  – If the output is low, increase the weights on active inputs.
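A one-step sketch of this rule in code (the function name, learning rate, and numbers are illustrative):

```python
# One application of the perceptron learning rule: wi <- wi + eta*(t - o)*xi.

def update_weights(w, x, t, o, eta=0.1):
    """Return the weights after one perceptron-rule update."""
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]

# Output too low (o = -1, target t = +1): weights on active (nonzero)
# inputs are raised; the weight on the inactive input is unchanged.
print(update_weights(w=[0.2, -0.4], x=[1, 0], t=1, o=-1))  # [0.4, -0.4]
```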

Slide 16: Perceptron Learning Algorithm

• Iteratively update weights until convergence.

• Each execution of the outer loop is typically called an epoch.

Initialize weights to random values
Until the outputs of all training examples are correct:
    For each training pair E do:
        Compute the current output o for E given its inputs
        Compare the current output to the target value t for E
        Update the synaptic weights and threshold using the learning rule
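A runnable sketch of this loop, assuming bipolar (+1/−1) targets and folding the threshold in as a bias weight with a constant input of 1; the max_epochs cap is my addition so the loop terminates even on non-separable data:

```python
import random

def sign(net):
    return 1 if net > 0 else -1

def train_perceptron(examples, eta=0.1, max_epochs=100):
    """examples: list of (inputs, target) pairs with targets in {+1, -1}."""
    n = len(examples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]  # w[0] is the bias
    for _ in range(max_epochs):          # each pass over the data is an epoch
        all_correct = True
        for x, t in examples:
            xb = [1] + list(x)           # constant input for the bias weight
            o = sign(sum(wi * xi for wi, xi in zip(w, xb)))
            if o != t:                   # learning rule fires only on errors
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, xb)]
                all_correct = False
        if all_correct:                  # all outputs correct: converged
            return w
    return w

# Usage: learn the linearly separable AND function.
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(train_perceptron(data))
```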

Slide 17: Perceptron training rule

Slide 18: Perceptron as Hill Climbing

• The goal is to minimize error on the training set.
• The perceptron effectively does hill climbing (gradient descent) in weight space, changing the weights a small amount at each step to decrease training-set error.
• For a single model neuron, the space is well behaved, with a single minimum.

[Figure: training error plotted as a function of the weights, a surface with a single minimum.]

Slide 19: Geometric Interpretation: Perceptron as a Linear Separator

• Since the perceptron uses a linear threshold function, it is searching for a linear separator that discriminates the classes:

  w0 + w1 x1 + w2 x2 = 0

  (or a hyperplane in n-dimensional space).

[Figure: positive and negative points in the (x1, x2) plane, separated by a candidate line labeled "??".]
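A worked rearrangement (not on the slide): when w2 ≠ 0, the boundary w0 + w1 x1 + w2 x2 = 0 is the line x2 = −(w0 + w1 x1) / w2, so the weights w1 and w2 set the slope of the separator and the bias w0 shifts it away from the origin.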

Slide 20: Perceptron Performance

• Linear threshold functions are restrictive

• In practice, converges fairly quickly for linearly separable data.

• Can effectively use even incompletely converged results when only a few outliers are misclassified.

• Experimentally, Perceptron does quite well on many benchmark data sets.

Slide 21: Concept Perceptron Cannot Learn: XOR

• Cannot learn exclusive-or, or parity function in general.

[Figure: the XOR classes plotted over two inputs; (0, 0) and (1, 1) are negative, (0, 1) and (1, 0) are positive, and no single line ("??") can separate the two classes.]
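To see why (a short argument, not from the slide): a single unit o = sign(w0 + w1 x1 + w2 x2) that computed XOR would need all four of the following, with +/− encoded as +1/−1:

  (0, 0) → −:  w0 ≤ 0
  (1, 0) → +:  w0 + w1 > 0
  (0, 1) → +:  w0 + w2 > 0
  (1, 1) → −:  w0 + w1 + w2 ≤ 0

Adding the middle two gives 2w0 + w1 + w2 > 0, while adding the first and last gives 2w0 + w1 + w2 ≤ 0, a contradiction. So no weight vector works.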

Slide 22: Perceptron Limits

• If the data are not linearly separable, the convergence is not assured.

• Minsky and Papert (1969) wrote a book analyzing the perceptron and demonstrating many functions it could not learn.

  (M. L. Minsky and S. Papert. Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press, 1969.)

• These results discouraged further research on neural networks.

Slide 24: Hidden units: a solution for non-linearly separable data

• Such networks show the potential of hidden neurons

[Figure: a two-layer network over inputs x1 and x2, showing the weights and the threshold (bias) of each unit.]
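One concrete illustration of that potential (my own hand-built construction, not necessarily the network drawn on the slide): two hidden threshold units computing OR and AND, combined by an output unit, realize XOR.

```python
# A hand-built two-layer threshold network that computes XOR.
# All weights and biases here are illustrative choices.

def step(net):
    """Threshold unit: fires (1) when its net input exceeds 0."""
    return 1 if net > 0 else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2 - 0.5)       # hidden unit 1: OR  (threshold 0.5)
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: AND (threshold 1.5)
    return step(h_or - h_and - 0.5)  # output: OR but not AND, i.e. XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))  # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```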