A BRIEF INTRODUCTION TO NEURAL NETWORKS
Luke Flemmer
History
First proposed by McCulloch and Pitts in 1943
Early models were binary; shown to be Turing complete for large enough networks
McCulloch and Pitts also demonstrated, in 1947, how such networks could perform pattern recognition
Rosenblatt coined the term ‘Perceptron’
Criticism of two-layer networks by Minsky and Papert in 1969 led to reduced interest in the field
Re-emergence in the early 1980s, driven by more powerful hardware and multi-layer networks
A Biological Neuron
An Artificial Neuron
(Figure: the inputs pass through input summation and a transfer function to produce the activation value)
One or more inputs connect into the neuron. Each connection has a weight, which is analogous to the synaptic strength of the connection. The effect an input has on the neuron is determined by both the strength of the incoming signal and the weight of its connection.
The neuron sums its weighted inputs and applies a transfer function to produce a single scalar output value. The transfer function can be as simple as y = x, i.e. just pass on the summed value. The output is then fed to other neurons.
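The neuron just described can be sketched in a few lines of Python. This is a minimal illustration, assuming inputs and weights are plain lists of floats; the function name `neuron_output` and the choice of identity (y = x) as the default transfer function are illustrative, not taken from the slides.

```python
from typing import Callable, Sequence

def neuron_output(
    inputs: Sequence[float],
    weights: Sequence[float],
    transfer: Callable[[float], float] = lambda x: x,  # identity transfer, y = x
) -> float:
    """Sum the weighted inputs, then pass the sum through the transfer function."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    summed = sum(x * w for x, w in zip(inputs, weights))
    return transfer(summed)

# Hypothetical inputs 2 and 4 on connections weighted 0.25 and 0.5.
print(neuron_output([2.0, 4.0], [0.25, 0.5]))  # 2.5
```

Any single-argument function can be passed as `transfer`, which is where the non-linear functions discussed later would plug in.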
A Simple Linear Network
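Inputs 4 and 12, each on a connection with W = .5, give an output of 4 × .5 + 12 × .5 = 8
Inputs 6 and 4, with the same weights, give an output of 6 × .5 + 4 × .5 = 5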
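The two worked examples on this slide can be checked with a short sketch; `linear_output` is a hypothetical helper name. With both weights at .5, the unit simply outputs the mean of its two inputs.

```python
def linear_output(inputs, weights):
    """Output of a single linear unit: the weighted sum of its inputs."""
    return sum(x * w for x, w in zip(inputs, weights))

weights = [0.5, 0.5]  # both connections have W = .5
print(linear_output([4, 12], weights))  # 8.0
print(linear_output([6, 4], weights))   # 5.0
```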
And Another…
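(Figure: a two-input, two-output linear network whose four connections have weights W = 1 or W = 0, so that each input is passed to exactly one output; the input pattern 1, 0 produces 0, 1 and the pattern 0, 1 produces 1, 0)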
The Basic Node Evaluation
Each input value is the product of an incoming node's activation value and the weight of its connection
NetInput = ∑(incomingNodeValue * connectionStrength)
Inputs can be inhibitory (negative weight)
The simplest model is called Linear Activation:
OutputValue = InputValue
More complex (non-linear) functions can be used
Activation is often gated with a threshold to generate a binary output
A bias can be used to adjust the firing threshold
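Threshold gating, bias and inhibitory weights can be seen together in a small sketch. The weights and bias values are hypothetical, chosen only to show that shifting the bias moves the effective firing threshold, and that a negative (inhibitory) weight can veto firing.

```python
def threshold_unit(inputs, weights, bias=0.0, threshold=0.0):
    """Return 1 if the biased net input exceeds the threshold, else 0."""
    net_input = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net_input > threshold else 0

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Two excitatory inputs with weight 1: bias -1.5 fires only when both are on (AND),
# while raising the bias to -0.5 lets either input trigger it (OR).
print([threshold_unit(p, [1, 1], bias=-1.5) for p in patterns])  # [0, 0, 0, 1]
print([threshold_unit(p, [1, 1], bias=-0.5) for p in patterns])  # [0, 1, 1, 1]

# An inhibitory (negative) weight on the second input suppresses firing.
print([threshold_unit(p, [1, -1], bias=-0.5) for p in patterns])  # [0, 0, 1, 0]
```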
More Sophisticated Networks
Introduce intermediate layers between input and output
Must use a non-linear activation function; if we don't, the intermediate nodes can be collapsed into a single set of weights in a two-layer network, rendering the hidden layer(s) useless
A popular activation function is the ‘Logistic Activation Function’:
ActivationValue = 1 / (1 + e ^ (-1 * netInput))
where netInput is the sum of weighted inputs, as used previously
(Figure: a network with a ‘hidden layer’ between the input and output layers)
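Why the hidden layer needs a non-linear activation can be checked numerically. In this sketch the 2-2-1 weight values are hypothetical: with identity activations, applying the two weight matrices in turn gives the same result as applying their product, a single set of weights; inserting the logistic function breaks that equivalence.

```python
import math

def logistic(net_input):
    """Logistic activation: 1 / (1 + e^(-netInput))."""
    return 1.0 / (1.0 + math.exp(-net_input))

def matvec(matrix, vector):
    """Multiply a weight matrix (list of rows) by an activation vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
w_hidden = [[1.0, -2.0], [0.5, 3.0]]
w_output = [[2.0, 1.0]]
x = [0.4, 0.7]

# Linear hidden layer: two layers of weights equal one combined weight matrix.
two_layers = matvec(w_output, matvec(w_hidden, x))
collapsed = matvec(matmul(w_output, w_hidden), x)
print(two_layers, collapsed)  # equal up to floating-point rounding

# Logistic hidden layer: the result no longer matches the collapsed linear network.
nonlinear = matvec(w_output, [logistic(h) for h in matvec(w_hidden, x)])
print(nonlinear)
```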
Learning Systems
Also called Evolutionary or Adaptive Systems
Systems are able to adjust their characteristics based on the data to which they are exposed
The systems ‘learn’ the characteristics of the data; in the case of networks, these are captured in the connection weights between layers
Other examples are Genetic Algorithms, and optimization methods such as Gradient Descent, Simulated Annealing and Simplex
Neural Networks and Genetic Algorithms have in common that they can evolve not just the constants, but the model itself
Two kinds – Supervised and Unsupervised Learning (e.g. clustering in Kohonen Maps)
Supervised systems use training datasets
They try to minimize error, but there is a risk of over-fitting
Learning in Networks: Hebbian Learning
The simplest kind, and the basis of more complex models
Based on the neurobiological theory that synaptic connections are strengthened between neurons that fire concurrently
Only applicable to two-layer networks
Output values are fixed at the expected value
Over multiple iterations (called ‘epochs’), the weight of each connection is adjusted with the formula:
∆weight = OutputValue * InputValue * LearningRate
where LearningRate is a constant, e.g. 0.05
This has the effect of strengthening weights between an input node and an output node when the value of the input agrees with the value of the output node (based on the sum of all its inputs)
This is not a stable formula, i.e. it will not converge on the optimal values for the weights
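The instability is easy to see in a sketch of the Hebbian rule with clamped outputs. The training patterns below are hypothetical; because nothing in the update depends on the current weights, the same increment is added every epoch, so the weights keep growing instead of settling.

```python
def hebbian_epochs(patterns, n_inputs, n_outputs, learning_rate=0.05, epochs=3):
    """Plain Hebbian learning with output values clamped to their expected values.

    patterns: list of (input_vector, expected_output_vector) pairs.
    Returns a copy of the weight matrix, indexed [output][input], after each epoch.
    """
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    history = []
    for _ in range(epochs):
        for inputs, outputs in patterns:
            for j in range(n_outputs):
                for i in range(n_inputs):
                    # delta weight = OutputValue * InputValue * LearningRate
                    weights[j][i] += outputs[j] * inputs[i] * learning_rate
        history.append([row[:] for row in weights])
    return history

# Hypothetical data: one output that should be on whenever input 0 is on.
patterns = [([1, 0], [1]), ([0, 1], [0]), ([1, 1], [1])]
for epoch, w in enumerate(hebbian_epochs(patterns, 2, 1), start=1):
    print(epoch, w)  # each epoch adds about [0.1, 0.05] again
```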
Learning in Networks: The Delta Rule
Also only applicable to two-layer networks
Similar to Hebbian learning, but does not fix output values
Rather, it compares the actual output with the desired output at each epoch, and generates an error correction (effectively minimizing the sum of squared errors across the network's outputs)
Assuming a linear activation function, the adjustment to the weight of a connection to an individual output unit is given by:
∆weight = (DesiredActivation - ActualActivation) * InputValue * LearningRate
This adjusts the connection weight according to the discrepancy between expected and actual output, in proportion to how large the input value was
This means that if the input value of a particular node was 0, the connection between it and the output unit will not have its weight adjusted, since we cannot hold it responsible for the error on the output
In effect, we are taking the partial derivative of the total squared output error with respect to each connection weight, and moving the weight a small step against it
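The derivative step can be written out, assuming a linear output unit and the conventional factor of 1/2 in the squared error (it only rescales the learning rate). Here x_i is the InputValue on connection i, w_{ik} the weight from input i to output k, y_k the ActualActivation, d_k the DesiredActivation and η the LearningRate:

```latex
\begin{aligned}
y_k &= \sum_i w_{ik}\, x_i \\
E &= \tfrac{1}{2} \sum_k \left(d_k - y_k\right)^2 \\
\frac{\partial E}{\partial w_{ik}} &= -\left(d_k - y_k\right) x_i \\
\Delta w_{ik} &= -\eta \, \frac{\partial E}{\partial w_{ik}} = \eta \left(d_k - y_k\right) x_i
\end{aligned}
```

Only y_k depends on w_{ik}, and its derivative with respect to that weight is x_i, which is why a zero input leaves the weight unchanged.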
With a sufficiently small learning rate, this converges to the weights that minimize the squared error
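A minimal sketch of the delta rule for one linear output unit, updating after every training example. The training set is hypothetical, generated from the averaging network shown earlier (W = .5 on both connections), so the rule should recover those weights; the learning rate of 0.1 is an assumption small enough for these inputs.

```python
def train_delta_rule(samples, n_inputs, learning_rate=0.1, epochs=500):
    """Delta rule for a single linear output unit, starting from zero weights.

    samples: list of (input_vector, desired_output) pairs.
    """
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, desired in samples:
            actual = sum(w * x for w, x in zip(weights, inputs))  # linear activation
            error = desired - actual
            for i, x in enumerate(inputs):
                # delta weight = (Desired - Actual) * InputValue * LearningRate;
                # an input of 0 leaves its weight untouched.
                weights[i] += error * x * learning_rate
    return weights

# Hypothetical targets: the mean of the two inputs.
samples = [([1.0, 0.0], 0.5), ([0.0, 1.0], 0.5), ([1.0, 1.0], 1.0), ([0.5, 0.2], 0.35)]
print(train_delta_rule(samples, n_inputs=2))  # approaches [0.5, 0.5]
```

A much larger learning rate, or much larger input values, can make the updates overshoot and diverge, which is why the convergence statement above is conditional on the learning rate.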
Learning in Networks: Back Propagation
A more sophisticated approach
Applicable to networks incorporating one or more ‘hidden layers’
The concept is the same as for the Delta Rule: we seek to minimize the sum of squared errors of the network's output, measured against the training set
The problem of dividing the responsibility for the error across the multiple layers becomes more complex
We end up with a recursively applied equation in which we apply the error backwards from the output layer through the one or more hidden layers, using the weights to determine how much error to expose to the hidden layers
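A compact sketch of back propagation for one hidden layer, in pure Python. The network shape, logistic units, bias weights, learning rate, random initialization and the XOR dataset are all assumptions made for illustration; the gradient step is on half the summed squared error, so the constant factor is absorbed into the learning rate.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_backprop(data, n_hidden=3, learning_rate=0.5, epochs=5000, seed=0):
    """Batch back propagation: one hidden layer of logistic units, one logistic output.

    data: list of (input_vector, target) pairs. Returns (loss_per_epoch, predict).
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # The last weight in each row is a bias, fed by a constant input of 1.
    w_hid = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        xb = list(x) + [1.0]
        hb = [logistic(sum(w * v for w, v in zip(row, xb))) for row in w_hid] + [1.0]
        y = logistic(sum(w * v for w, v in zip(w_out, hb)))
        return xb, hb, y

    losses = []
    for _ in range(epochs):
        g_out = [0.0] * (n_hidden + 1)
        g_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        loss = 0.0
        for x, target in data:
            xb, hb, y = forward(x)
            loss += (target - y) ** 2
            # Output delta: error scaled by the logistic derivative y * (1 - y).
            d_out = (target - y) * y * (1 - y)
            for j in range(n_hidden + 1):
                g_out[j] += d_out * hb[j]
            for j in range(n_hidden):
                # Error sent back through the connecting weight, then scaled
                # by the hidden unit's own logistic derivative.
                d_hid = d_out * w_out[j] * hb[j] * (1 - hb[j])
                for i in range(n_in + 1):
                    g_hid[j][i] += d_hid * xb[i]
        losses.append(loss)
        # Weights change only after the whole batch, so the deltas above used the old w_out.
        for j in range(n_hidden + 1):
            w_out[j] += learning_rate * g_out[j]
        for j in range(n_hidden):
            for i in range(n_in + 1):
                w_hid[j][i] += learning_rate * g_hid[j][i]
    return losses, lambda x: forward(x)[2]

# Hypothetical dataset: XOR, which a two-layer network cannot represent.
xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
losses, predict = train_backprop(xor)
print(f"sum of squared errors: {losses[0]:.3f} -> {losses[-1]:.3f}")
print([round(predict(x), 2) for x, _ in xor])
```

Like any gradient method, this can stall in a local minimum for an unlucky starting point, so the final error depends on `seed`.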
Back Propagation Learning: A Visual Representation
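A hidden node is connected to two output nodes: to one with Weight = 0.5, where the output error is 6, and to the other with Weight = 0.25, where the output error is 8
The error propagated back to the hidden node is the weighted sum: Error = 6 * 0.5 + 8 * 0.25 = 5
(Figure: arrows show the direction of error propagation, from the output layer back to the hidden layer)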
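The arithmetic on this slide is a weighted sum, sketched below with hypothetical helper names. With a logistic hidden unit, back propagation additionally scales that sum by the unit's derivative a * (1 - a), where a is its activation; the activation of 0.5 used here is a hypothetical value.

```python
def hidden_error(output_errors, weights_to_outputs):
    """Error assigned to a hidden node: each output error times the weight of the connection to it."""
    return sum(e * w for e, w in zip(output_errors, weights_to_outputs))

def logistic_hidden_delta(output_errors, weights_to_outputs, activation):
    """For a logistic hidden unit, also scale by its derivative a * (1 - a)."""
    return hidden_error(output_errors, weights_to_outputs) * activation * (1 - activation)

print(hidden_error([6, 8], [0.5, 0.25]))                # 5.0
print(logistic_hidden_delta([6, 8], [0.5, 0.25], 0.5))  # 1.25
```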
A Non-trivial Network
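The character ‘A’ is drawn on a 20 × 20 pixel grid, and each pixel feeds one node of the input layer: Pixel 1,1 = 0, Pixel 4,9 = 1, Pixel 18,16 = 1, …, Pixel 20,20 = 0
400 Node Input Layer
400 Node Hidden Layer
8 Node Output Layer
The output layer produces the bits 0 1 0 0 0 0 0 1, i.e. ASCII Character 65 (‘A’)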
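The encoding at both ends of this network can be sketched directly. The row-major mapping from 1-based pixel coordinates to input nodes is an assumption, as are the helper names; the bit decoding reads the 8 outputs most significant bit first, which gives 65 for the pattern above. The weight count assumes fully connected layers and ignores bias weights.

```python
GRID_SIZE = 20  # 20 x 20 pixels -> 400 input nodes

def input_index(row, col):
    """Map a 1-based (row, col) pixel to a 0-based input node index (row-major, assumed)."""
    return (row - 1) * GRID_SIZE + (col - 1)

def bits_to_char(bits):
    """Read output bits, most significant first, as an ASCII code and character."""
    code = 0
    for bit in bits:
        code = code * 2 + bit
    return code, chr(code)

layer_sizes = [GRID_SIZE * GRID_SIZE, 400, 8]  # input, hidden, output
n_weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

print(input_index(4, 9), input_index(20, 20))  # 68 399
print(bits_to_char([0, 1, 0, 0, 0, 0, 0, 1]))  # (65, 'A')
print(n_weights)                               # 163200
```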
Auto-Associative Networks
Use excitatory and inhibitory connections to establish associations between items and their properties
When an item is associated with some set of properties, excitatory connections are created between them, and inhibitory connections are created to the non-associated properties
This has the effect of making related items and properties ‘pop out’ when one of them is chosen
For this reason, such systems are also known as ‘content addressable memories’ as items are retrieved by presenting related content stimuli to the network
Generally, the network is evaluated over several cycles to allow the activation to propagate, until it reaches a steady state, which is the network’s ‘answer’ to the inputs
An Example Auto-Associative Network
Presenting ‘Steve’ to the network will yield responses for ‘Lawyer’ and ‘NYC’ as well as weaker responses for ‘Ian’ and ‘London’
Presenting ‘Doctor’ to the network will yield a response of ‘Marcy’ and a weaker response of ‘Paris’
Presenting ‘Lawyer’ and ‘London’ will yield a response of ‘Ian’ – this is the essence of a ‘content addressable memory’
The excitatory / inhibitory connection strengths do not have to be binary, and can be learned from the data using the Hebbian learning rule.
More complex and interdependent networks yield richer semantic discovery
(Figure: a network linking the professions Lawyer, Doctor and Architect; the people Steve, Ian, Marcy and John; and the cities NYC, London, Paris and Oslo)
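The settling process can be sketched for part of this example. The associations used are only those stated in the text (Steve: Lawyer, NYC; Ian: Lawyer, London; Marcy: Doctor, Paris); John, Architect and Oslo are left out because their connections are not given. The weights (+1 to associated properties, -1 to the others) and the update rule with its `rate` and `decay` parameters are assumptions, not the model behind the slides. The sketch shows the network settling into a steady state that retrieves Ian from Lawyer and London; with these all-or-nothing weights it does not reproduce the weaker, graded responses described above.

```python
people = {"Steve": ["Lawyer", "NYC"], "Ian": ["Lawyer", "London"], "Marcy": ["Doctor", "Paris"]}
properties = ["Lawyer", "Doctor", "NYC", "London", "Paris"]
units = list(people) + properties

def build_weights():
    """+1 between a person and each associated property, -1 to every other property."""
    w = {(a, b): 0.0 for a in units for b in units}
    for person, props in people.items():
        for prop in properties:
            value = 1.0 if prop in props else -1.0
            w[(person, prop)] = w[(prop, person)] = value  # symmetric connections
    return w

def settle(clamped, weights, rate=0.1, decay=0.1, tol=1e-6, max_cycles=1000):
    """Spread activation from the clamped units until the activations stop changing."""
    act = {u: (1.0 if u in clamped else 0.0) for u in units}
    for _ in range(max_cycles):
        new = {}
        for u in units:
            if u in clamped:
                new[u] = 1.0  # presented items stay fully active
                continue
            net = sum(weights[(u, v)] * act[v] for v in units)
            # Decay the old activation, add the net input, and keep it within [0, 1].
            new[u] = min(1.0, max(0.0, act[u] * (1 - decay) + rate * net))
        if max(abs(new[u] - act[u]) for u in units) < tol:
            return new  # steady state reached: this is the network's 'answer'
        act = new
    return act

weights = build_weights()
result = settle({"Lawyer", "London"}, weights)
print(max(people, key=result.get))  # Ian

result = settle({"Steve"}, weights)
print([p for p in properties if result[p] > 0.5])  # ['Lawyer', 'NYC']
```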
Attractive Aspects of Networks
Neural Plausibility
More interesting for neuroscientists than computer scientists
Support for Soft Constraints
Graceful degradation
Content-addressable memory
Auto-associative networks
Ability to learn and to generalize
Weaknesses of Networks
Have been called ‘the second best way to do anything’
Biological plausibility of the classic model is poor
Problems of modeling scale, particularly in the area of psychology: researchers look for equivalence between human learning and the behavior of networks with a tiny number of nodes
Large number of training epochs required
Still prone to the same problems as other optimization techniques: over/under-fitting, local minima, etc.
Interesting Areas of Research
Temporal correlation
Spiking models
Feedback and Recursion
Content addressable memory
Hierarchical Networks
Hippocampal Modelling
A Short Reading List
Connectionism and The Mind – William Bechtel and Adele Abrahamsen
The Quest For Consciousness – Christof Koch
Gateway to Memory – Mark Gluck and Catherine Myers
On Intelligence – Jeff Hawkins and Sandra Blakeslee