A BRIEF INTRODUCTION TO NEURAL NETWORKS Luke Flemmer


History

First proposed by McCulloch and Pitts in 1943

Early models were binary; shown to be Turing complete for large enough networks

McCulloch and Pitts also demonstrated, in 1947, how such networks could perform pattern recognition

Rosenblatt coined the term ‘Perceptron’

Criticism of two-layer networks by Minsky and Papert in 1969 led to reduced interest in the field

Re-emergence in the early 80’s driven by more powerful hardware and multi-layer networks

A Biological Neuron

An Artificial Neuron

[Figure: stages of an artificial neuron, labelled Input Summation, Transfer Function, Activation Value]

One or more inputs connect into the neuron. Each connection has a weight, which is analogous to the synaptic strength of the connection. The effect that the input has on this neuron will be determined by the original strength of the signal, and its weight.

The neuron sums all the inputs, and applies a transfer function, to define a single scalar value for its output. The transfer function can be as simple as y = x, i.e. just pass on the summed value. The output is then fed to other neurons.
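The description above can be sketched in a few lines of Python. This is an illustration only: the function name `neuron_output` and the sample values are assumptions made for the sketch, not something defined in the slides.

```python
def neuron_output(inputs, weights, transfer=lambda x: x):
    """Sum the weighted inputs, then apply the transfer function.

    The default transfer function is y = x, i.e. the summed value
    is passed on unchanged.
    """
    net_input = sum(value * weight for value, weight in zip(inputs, weights))
    return transfer(net_input)


# Two inputs, each connected with weight 0.5: the neuron outputs their average.
print(neuron_output([4, 12], [0.5, 0.5]))  # 8.0
```

Passing a different callable as `transfer` swaps in another transfer function without changing the summation step.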

A Simple Linear Network

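[Figure: two linear networks, each with two inputs and connection weights W = 0.5. Inputs 4 and 12 give an output of 8; inputs 6 and 4 give an output of 5.]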

And Another…

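[Figure: two diagrams of a network with two binary inputs and two outputs, whose four connections have weights W = 1 or W = 0. One diagram shows the value pairs (1, 0) and (0, 1), the other (0, 1) and (1, 0).]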

The Basic Node Evaluation

Input Values are the product of the Activation Values of incoming nodes and the weight of their connection

NetInput = ∑(incomingNodeValue * connectionStrength)

Inputs can be inhibitory (negative weight)

Simple model is called Linear Activation: OutputValue = InputValue

More complex (non-linear) functions can be used

Often activation is gated with some threshold, to generate a binary output

You can use a bias to adjust firing threshold
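A minimal sketch of the node evaluation described on this slide, covering an inhibitory input, a threshold gate, and a bias. The function names, the threshold of 0, and the sample numbers are assumptions chosen for illustration.

```python
def net_input(values, weights, bias=0.0):
    """NetInput = sum(incomingNodeValue * connectionStrength), plus an optional bias."""
    return sum(v * w for v, w in zip(values, weights)) + bias


def linear_activation(x):
    """Linear Activation: OutputValue = InputValue."""
    return x


def threshold_activation(x, threshold=0.0):
    """Gate the activation to a binary output: fire (1) only above the threshold."""
    return 1 if x > threshold else 0


values = [1.0, 1.0]
weights = [1.0, -0.5]  # the second connection is inhibitory (negative weight)

net = net_input(values, weights)                          # 1.0 - 0.5 = 0.5
fires = threshold_activation(net)                         # 1: above the threshold
biased = net_input(values, weights, bias=-1.0)            # 0.5 - 1.0 = -0.5
fires_with_bias = threshold_activation(biased)            # 0: the bias stops the node firing
```

Adding a negative bias has the same effect as raising the firing threshold, which is why the two are often treated as interchangeable.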

More Sophisticated Networks

Introduce intermediate layers between input and output

Must use a non-linear activation function; if we don’t, the set of intermediate nodes can be collapsed to a single set of weights in a two-layer network, rendering the hidden layer(s) useless

A popular activation function is the ‘Logistic Activation Function’: ActivationValue = 1 / (1 + e ^ ( -1 * netInput))

netInput is the sum of weighted inputs, as we have used previously

‘hidden layer’
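The collapse argument can be checked numerically. The sketch below is illustrative: the weight values and helper names (`layer`, `collapse`) are made up for the example. It shows that two linear layers give the same output as a single layer whose weights are the product of the two weight sets, and that the equivalence no longer holds once the hidden layer uses the logistic function.

```python
import math


def logistic(net_input):
    """ActivationValue = 1 / (1 + e ^ (-1 * netInput))."""
    return 1.0 / (1.0 + math.exp(-net_input))


def identity(x):
    return x


def layer(inputs, weights, activation):
    """weights[j][i] is the weight from input i to node j of this layer."""
    return [activation(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def collapse(w_hidden, w_out):
    """Multiply the two weight sets into one equivalent set (matrix product)."""
    return [[sum(w_out[k][j] * w_hidden[j][i] for j in range(len(w_hidden)))
             for i in range(len(w_hidden[0]))]
            for k in range(len(w_out))]


w_hidden = [[0.5, -1.0], [2.0, 0.25]]  # 2 inputs -> 2 hidden nodes (assumed values)
w_out = [[1.0, -0.5]]                  # 2 hidden nodes -> 1 output node
x = [4.0, 2.0]

linear_two_layers = layer(layer(x, w_hidden, identity), w_out, identity)
linear_collapsed = layer(x, collapse(w_hidden, w_out), identity)
logistic_hidden = layer(layer(x, w_hidden, logistic), w_out, identity)

print(linear_two_layers, linear_collapsed)  # [-4.25] [-4.25]
```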

Learning Systems

Also called Evolutionary or Adaptive Systems

Systems are able to adjust their characteristics based on the data to which they are exposed

The systems ‘learn’ the characteristics of the data; in the case of networks, the connection weights between layers

Other examples would be Genetic Algorithms and other optimization methods like Simulated Annealing and Simplex

Neural Networks and Genetic Algorithms have in common that they can evolve not just the constants, but the model itself

Two kinds: Supervised and Unsupervised Learning (e.g. clustering in Kohonen Maps)

Supervised systems use training datasets

Try to minimize error, but there is a risk of over-fitting

Learning in Networks: Hebbian Learning

Simplest kind, and basis of more complex models

Based on the neurobiological theory that synaptic connections are strengthened between neurons that fire concurrently

Only applicable to two-layer networks

Output values are fixed at the expected value

Over multiple iterations (called ‘epochs’), the value of each connection is adjusted with the formula: ∆weight = OutputValue * InputValue * LearningRate

Where LearningRate is a constant, e.g. 0.05

This has the effect of strengthening weights between an input node and an output node when the value of the input agrees with the value of the output node (based on the sum of all its inputs)

This is not a stable formula, i.e. it will not converge on the optimal values for the weights
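A small sketch of the Hebbian update, to make the instability concrete. It assumes bipolar values (+1 / -1) so that "agreement" between input and output is visible in the sign of the product; that encoding, the two training patterns, and the function name are assumptions for the example, not taken from the slides.

```python
LEARNING_RATE = 0.05


def hebbian_epoch(weights, patterns, rate=LEARNING_RATE):
    """Apply delta_weight = OutputValue * InputValue * LearningRate for each pattern.

    The output is fixed at its expected value; the network's actual
    output is never consulted.
    """
    for inputs, expected_output in patterns:
        for i, x in enumerate(inputs):
            weights[i] += expected_output * x * rate
    return weights


# Input 0 always agrees with the output, input 1 always disagrees.
patterns = [([1, -1], 1), ([-1, 1], -1)]
weights = [0.0, 0.0]

for _ in range(10):
    hebbian_epoch(weights, patterns)
after_10 = list(weights)

for _ in range(10):
    hebbian_epoch(weights, patterns)
after_20 = list(weights)
```

The weight on the agreeing input keeps growing by the same amount every epoch and the weight on the disagreeing input keeps shrinking, so the values never settle: nothing in the rule measures how far the network's output is from the target.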

Learning in Networks: The Delta Rule

Also only applicable to two-layer networks

Similar to Hebbian learning, but does not fix output values

Rather, compares actual output with desired output at each epoch, and generates an error correction (effectively minimizes the sum of squared errors across the network’s outputs)

Assuming a linear activation function, the formula for adjustment of the weight of a connection to an individual output unit is given by: ∆weight = (DesiredActivation - ActualActivation) * InputValue * LearningRate

This has the effect of adjusting the connection weight based on the discrepancy between expected and actual output, and doing it in proportion to how large the input value was

This means that, if the input value of a particular node was 0, the connection between it and the output unit will not have its weight adjusted, since we cannot hold it responsible for the error on the output

We are actually taking the partial derivative of the total output error across all nodes, with respect to the input weight for each node.
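One way to write this step out, assuming a linear activation and a squared-error cost with the conventional factor of 1/2. The symbols are notation chosen for this sketch: x_i is the value of input node i, w_{ij} the weight from input i to output j, a_j the actual activation, d_j the desired activation, and \eta the learning rate.

```latex
\begin{aligned}
a_j &= \sum_i w_{ij}\, x_i \\
E &= \tfrac{1}{2} \sum_j (d_j - a_j)^2 \\
\frac{\partial E}{\partial w_{ij}} &= -(d_j - a_j)\, x_i \\
\Delta w_{ij} &= -\eta\, \frac{\partial E}{\partial w_{ij}} = \eta\, (d_j - a_j)\, x_i
\end{aligned}
```

The last line is the Delta Rule: (DesiredActivation - ActualActivation) * InputValue * LearningRate.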

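With a linear activation function and a sufficiently small learning rate, this will converge to the weights that minimize the sum of squared errors for the network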
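A minimal sketch of Delta Rule training for a single linear output unit. The training set is invented for the example: its targets were generated from the weights 2 and -1, so those are the values the rule should recover. The function name, learning rate, and epoch count are likewise assumptions.

```python
def train_delta_rule(samples, n_inputs, rate=0.05, epochs=500):
    """Train one linear output unit with the Delta Rule."""
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, desired in samples:
            # Linear activation: the output is just the weighted sum.
            actual = sum(w * x for w, x in zip(weights, inputs))
            error = desired - actual
            for i, x in enumerate(inputs):
                # An input of 0 leaves its weight unchanged.
                weights[i] += error * x * rate
    return weights


# Targets follow desired = 2 * x1 - 1 * x2 (an assumption for this example).
samples = [
    ([1.0, 0.0], 2.0),
    ([0.0, 1.0], -1.0),
    ([1.0, 1.0], 1.0),
    ([2.0, 1.0], 3.0),
]
weights = train_delta_rule(samples, n_inputs=2)
print([round(w, 3) for w in weights])
```

Unlike the Hebbian rule, each update shrinks as the actual output approaches the desired one, so the weights settle instead of growing indefinitely.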

Learning in Networks: Back Propagation

More sophisticated approach

Applicable to networks incorporating one or more ‘hidden layers’

The concept is the same as for the Delta Rule: we seek to minimize the sum of squared errors of the network’s output, measured against the training set

The problem of dividing the responsibility for the error across the multiple layers becomes more complex

We end up with a recursively applied equation in which we apply the error backwards from the output layer through the one or more hidden layers, using the weights to determine how much error to expose to the hidden layers
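The recursion can be sketched for a network with one hidden layer and logistic activations. Everything specific here is an assumption made for the example: the XOR training set, three hidden nodes, a bias weight on every node, the learning rate, the epoch count, and the random seed.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    # Every weight row ends with a bias weight, fed by a constant input of 1.
    hidden = [logistic(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    output = logistic(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def sum_squared_error(data, w_hidden, w_out):
    return sum((target - forward(x, w_hidden, w_out)[1]) ** 2 for x, target in data)


def train(data, n_hidden=3, rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_error = sum_squared_error(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_out)
            # Output error term: (desired - actual) scaled by the logistic slope.
            delta_out = (target - output) * output * (1.0 - output)
            # Hidden error terms: the output error passed back through each
            # weight, scaled by that hidden node's own logistic slope.
            delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                            for j, h in enumerate(hidden)]
            for j, v in enumerate(hidden + [1.0]):
                w_out[j] += rate * delta_out * v
            for j, row in enumerate(w_hidden):
                for i, v in enumerate(x + [1.0]):
                    row[i] += rate * delta_hidden[j] * v
    return w_hidden, w_out, initial_error


xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
w_hidden, w_out, initial_error = train(xor_data)
final_error = sum_squared_error(xor_data, w_hidden, w_out)
print(initial_error, final_error)
```

The weight updates have the same shape as the Delta Rule (error term * input value * learning rate); the only new ingredient is how the error term for a hidden node is derived from the layer above it.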

Back Propagation Learning: A Visual Representation

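[Figure: a hidden node connected to two output nodes. One output node has Error = 6 and is reached through Weight = 0.5; the other has Error = 8 and is reached through Weight = 0.25. The error assigned to the hidden node is Error = 6 * 0.5 + 8 * 0.25 = 5. An arrow marks the Direction of Error Propagation, from the output layer back to the hidden layer.]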
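The arithmetic in the figure, written as a weighted sum. The variable names are illustrative only.

```python
# Errors measured at the two output nodes, and the weights of the
# connections leading from the hidden node to each of them.
output_errors = [6.0, 8.0]
weights_to_outputs = [0.5, 0.25]

# The hidden node's share of the error is the weighted sum of the output errors.
hidden_error = sum(e * w for e, w in zip(output_errors, weights_to_outputs))
print(hidden_error)  # 5.0
```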

A Non-trivial Network

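[Figure: an image of the letter ‘A’ is presented one pixel per input node (Pixel 1,1 = 0, Pixel 4,9 = 1, Pixel 18,16 = 1, …, Pixel 20,20 = 0). A 400 Node Input Layer feeds a 400 Node Hidden Layer, which feeds an 8 Node Output Layer whose binary (0/1) values represent ASCII Character 65.]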
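A short sketch of how the training target and the size of such a network might be worked out. The bit order (most significant bit first), full connectivity between adjacent layers, and the absence of bias weights are assumptions; the slide only gives the layer sizes and the character code.

```python
def ascii_target(character):
    """Target values for the 8 output nodes, most significant bit first (assumed order)."""
    return [int(bit) for bit in format(ord(character), "08b")]


N_INPUT, N_HIDDEN, N_OUTPUT = 400, 400, 8

# Assuming every node connects to every node in the next layer, with no biases.
n_weights = N_INPUT * N_HIDDEN + N_HIDDEN * N_OUTPUT

print(ord("A"))           # 65
print(ascii_target("A"))  # [0, 1, 0, 0, 0, 0, 0, 1]
print(n_weights)          # 163200
```

Under these assumptions the network has over 160,000 weights to learn, which gives a sense of why the slide calls it non-trivial.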

Auto-Associative Networks

Use excitatory and inhibitory connections to establish connections between items and their properties

When an item is associated with some set of properties, excitatory connections are created between them, and inhibitory connections are created to the non-associated properties

This has the effect of making related items and properties ‘pop out’ when one of them is chosen

For this reason, such systems are also known as ‘content addressable memories’ as items are retrieved by presenting related content stimuli to the network

Generally, the network is evaluated over several cycles to allow the activation to propagate, until it reaches a steady state, which is the network’s ‘answer’ to the inputs

An Example Auto-Associative Network

Presenting ‘Steve’ to the network will yield responses for ‘Lawyer’ and ‘NYC’ as well as weaker responses for ‘Ian’ and ‘London’

Presenting ‘Doctor’ to the network will yield a response of ‘Marcy’ and a weaker response of ‘Paris’

Presenting ‘Lawyer’ and ‘London’ will yield a response of ‘Ian’ – this is the essence of a ‘content addressable memory’

The excitatory / inhibitory connection strengths do not have to be binary, and can be learned from the data using the Hebbian learning rule.

More complex and interdependent networks yield richer semantic discovery

[Figure: network nodes for professions (Lawyer, Doctor, Architect), people (Steve, Ian, Marcy, John), and cities (NYC, London, Paris, Oslo)]
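The three queries described above can be reproduced with a very small spreading-activation sketch. It is a simplification under stated assumptions: only the associations implied by the text are encoded (Steve: Lawyer, NYC; Ian: Lawyer, London; Marcy: Doctor, Paris), so John, Architect, and Oslo are left out because their links are not given; the connection strengths (0.5 excitatory, -0.05 inhibitory), the clipping of activations to the range 0 to 1, and the cycle count are arbitrary choices.

```python
ASSOCIATIONS = {
    "Steve": ["Lawyer", "NYC"],
    "Ian": ["Lawyer", "London"],
    "Marcy": ["Doctor", "Paris"],
}
PEOPLE = list(ASSOCIATIONS)
PROPERTIES = sorted({p for props in ASSOCIATIONS.values() for p in props})
EXCITE, INHIBIT = 0.5, -0.05  # assumed connection strengths


def weight(person, prop):
    """Excitatory link to associated properties, inhibitory link to the rest."""
    return EXCITE if prop in ASSOCIATIONS[person] else INHIBIT


def recall(cues, cycles=50):
    """Clamp the cue nodes on, then let activation propagate for several cycles."""
    act = {node: 0.0 for node in PEOPLE + PROPERTIES}
    for _ in range(cycles):
        net = {}
        for person in PEOPLE:
            net[person] = sum(weight(person, q) * act[q] for q in PROPERTIES)
        for prop in PROPERTIES:
            net[prop] = sum(weight(p, prop) * act[p] for p in PEOPLE)
        # External stimulus of 1 for the cues; activations are kept within [0, 1].
        act = {node: min(1.0, max(0.0, (1.0 if node in cues else 0.0) + net[node]))
               for node in act}
    return act


steve = recall({"Steve"})
doctor = recall({"Doctor"})
lawyer_london = recall({"Lawyer", "London"})
best_match = max(PEOPLE, key=lawyer_london.get)
print(best_match)  # Ian
```

With these settings, presenting ‘Steve’ activates ‘Lawyer’ and ‘NYC’ strongly and ‘Ian’ and ‘London’ more weakly, while unrelated nodes such as ‘Doctor’ stay at zero.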

Attractive Aspects of Networks

Neural Plausibility

More interesting for neuroscientists than computer scientists

Support for Soft Constraints

Graceful degradation

Content-addressable memory

Auto-associative networks

Ability to learn and to generalize

Weaknesses of Networks

Have been called ‘the second best way to do anything’

Biological plausibility of the classic model is poor

Problems of modeling scale, particularly in the area of psychology

Researchers look for equivalence between human learning and the behavior of networks with a tiny number of nodes

Large number of training epochs required

Still prone to the same problems as other optimization techniques: over/under fitting, local minima, etc.

Interesting Areas of Research

Temporal correlation

Spiking models

Feedback and Recursion

Content addressable memory

Hierarchical Networks

Hippocampal Modelling

A Short Reading List

Connectionism and The Mind – William Bechtel and Adele Abrahamsen

The Quest For Consciousness – Christof Koch

Gateway to Memory – Mark Gluck and Catherine Myers

On Intelligence – Jeff Hawkins and Sandra Blakeslee
