Artificial Neural Network (ANN)
o Neural network -- "a machine that is designed to model the way in which the brain performs a particular task or function of interest" (Haykin, 1994, pg. 2).
– Uses massive interconnection of simple computing cells (neurons or processing units).
– Acquires knowledge thru learning.
– Modifies synaptic weights of the network in an orderly fashion to attain a desired design objective.
o Attempts to use ANNs date from the 1950s.
– Abandoned by most researchers by the 1970s.

Artificial Intelligence (AI)
o "A field of study that encompasses computational techniques for performing tasks that apparently require intelligence when performed by humans" (Tanimoto, 1990).
– Goal: to increase our understanding of reasoning, learning, & perceptual processes.
o Fundamental issues:
– Knowledge representation.
– Search.
– Perception & inference.
Traditional AI vs. Neural Networks
Traditional AI:
o Programs brittle & overly sensitive to noise.
o Programs are either right or fail completely.
– Human intelligence much more flexible (guessing).
o http://www-ai.ijs.si/eliza/eliza.html

Neural Networks:
o Capture knowledge in large # of fine-grained units.
o More potential for partially matching noisy & incomplete data.
o Knowledge is distributed uniformly across network.
o Model for parallelism – each neuron is an independent unit.
o Similar to human brains?
Many names for the same family of ideas: Neural Networks, Connectionism, Parallel Distributed Processing, Neuro-computing, Natural Intelligent Systems, Machine Learning Algorithms, Artificial Neural Networks, Biologically Inspired Computing.
Handwriting Neural Network
o http://www.youtube.com/watch?v=qXoVGxjUTtA
o http://www.manifestation.com/neurotoys/eliza.php3
NETtalk (Sejnowski & Rosenberg)
o http://cnl.salk.edu/Media/nettalk.mp3
Human Brain
o "… a highly complex, nonlinear, and parallel computer (information-processing system). It has the capability to organize its structural constituents, known as neurons, so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today." (Haykin, 1999, Neural Networks: A Comprehensive Foundation, pg. 1)
Approaches to Studying the Brain
o Know enough neuroscience to understand why computer models make certain approximations.
– Understand when approximations are good & when bad.
o Know tools of formal analysis for models.
– Some simple mathematics.
– Access to a simulator or ability to program.
o Know enough cognitive science to have some idea of what the system is supposed to do.

Why Build Models? "… a model is simply a detailed theory."
1. Explicitness – constructing a model of a theory & implementing it as a computer program requires a great level of detail.
2. Prediction – difficult to predict consequences of a model due to interactions between different parts of the model.
– Connectionist models are non-linear.
3. Discover & test new experiments & novel situations.
4. Practical reasons why it is difficult to test a theory in the real world.
– Systematically vary parameters thru full range of possible values.
5. Help understand why a behavior might occur.
• Simulations are open for direct inspection & explanation of behavior.
Simulations As Experiments
o Easy to do simulations, but difficult to do them well.
o Running a good simulation is like running a good experiment:
1. Clearly articulated problem (goal).
2. Well-defined hypothesis, design for testing the hypothesis, & plan for how to analyze the results.
– Hypothesis drawn from current issues in the literature.
– E.g., test predictions, replicate observed behaviors, test theory of behavior.
3. Task, stimulus representations & network architectures must be defined.

What kinds of problems can ANNs help us understand?
o Brain of newborn child contains billions of neurons.
– But child can't perform many cognitive functions.
o After a few years of receiving continuous streams of signals from outside world via sensory systems,
– child can see, understand language & control movements of body.
o Brain discovers, without being taught, how to make sense of signals from world.
o How??? Where do you start?

NN Applications
http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2000-01/neural-networks/Applications/index.html
o Character recognition
o Image compression
o Stock market prediction
o Traveling salesman problem
o Medicine, electronic nose, loan applications
Neural Networks (ACM)
o Web spam detection by probability mapping GraphSOMs and graph neural networks
o No-reference quality assessment of JPEG images by using CBP neural networks
o An embedded fingerprints classification system based on weightless neural networks
o Forecasting Portugal global load with artificial neural networks
o 2006 Special issue: Neural network forecasts of the tropical Pacific sea surface temperatures
o Developmental learning of complex syntactical song in the Bengalese finch: A neural network model
o Neural networks in astronomy
Artificial & Biological Neural Networks
o Build intelligent programs using models that parallel the structure of neurons in the human brain.
o Neurons – cell body with dendrites & axon.
– Dendrites receive signals from other neurons.
– When combined impulses exceed threshold, neuron fires & impulse passes down axon.
– Branches at end of axon form synapses with dendrites of other neurons.
• Excitatory or inhibitory.
Do Neural Networks Mimic Human Brain?
o "It is not absolutely necessary to believe that neural network models have anything to do with the nervous system, …
o … but it helps.
o Because, if they do, we are able to use a large body of ideas, experiments, and facts from cognitive science and neuroscience to design, construct, and test networks." (Anderson, 1997, p. 1)
Neural Networks Abstract From the Details of Real Neurons
o Conductivity delays are neglected.
o Net input is calculated as weighted sum of input signals.
o Net input is transformed into an output signal via a simple function (e.g., a threshold function).
o Output signal is either discrete (e.g., 0 or 1) or a real-valued number (e.g., between 0 and 1).
ANN Features
o A series of simple computational elements, called neurons (or nodes, units, cells)
o Connections between neurons that carry signals
o Each link (connection) between neurons has a weight that can be modified
o Each neuron sums the weighted input signals and applies an activation function to determine the output signal (Fausett, 1994).
Neural Networks Are Composed of Nodes & Connections
o Nodes – simple processing units.
– Similar to neurons – receive inputs from other sources.
– Excitatory inputs tend to increase neuron's rate of firing.
– Inhibitory inputs tend to decrease neuron's rate of firing.
o Firing rate changes via a real-valued number (activation).
o Input to node comes from other nodes or from some external source.
o Figure: a fully recurrent network vs. a 3-layer feedforward network.
Connections
o Input travels along connection lines.
o Connections between different nodes can have different potency (connection strength) in many models.
– Strength represented by real-valued number (connection weight).
– Input from one node to another is multiplied by connection weight.
o If connection weight is:
– Negative number – input is inhibitory.
– Positive number – input is excitatory.

Nodes & Connections Form Various Layers of NN
A Single Node/Neuron
o Inputs to node are usually summed (Σ).
o Net input passed thru activation function ( f(net) ).
o Produces node's activation, which is sent to other nodes.
o Each input line (connection) represents flow of activity from some other neuron or some external source.
o Figure: inputs from other nodes → Σ → f(net) → outputs to other nodes.
More Complex Model of a Neuron
o Figure (after Haykin): input signals x1, x2, …, xp arrive along the synaptic weights wk1, wk2, …, wkp of neuron k; the summing function produces the linear combiner output uk; the threshold θk is subtracted & the result is passed thru the activation function to give the output yk.
o uk = Σj wkj xj ;  yk = f(uk – θk)
Add up Net Inputs to Node
o Each input (from different nodes) is calculated by multiplying activation value of input node by weight on connection (from input node to receiving node).
neti = Σj wij aj    (net input to node i)
o Σ = sigma (summation over all sending nodes j)
o i = receiving node
o aj = activation on nodes sending to node i
o wij = weight on connection between nodes j & i.

Sums (weights * activation) For All Input Nodes
net4 = Σj w4j aj
o i = 4 (receiving node 4).
o j ranges over the 3 input nodes into node 4.
o Add up w4j * aj for all 3 input nodes.
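As a concrete illustration (a minimal sketch, not from the original slides; the weights & activations below are made-up values), the weighted sum can be computed directly:

def net_input(weights, activations):
    # net_i = sum over j of w_ij * a_j
    return sum(w * a for w, a in zip(weights, activations))

# Node 4 receiving from 3 input nodes (hypothetical weights & activations):
print(net_input([0.5, -0.3, 0.8], [1.0, 1.0, 0.0]))  # 0.2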
Activation Functions: Node Can Do Several Things With Net Input
1. Activation (output) = input.
• f(net) is the identity function.
• Simplest case.
2. Threshold must be achieved before activation occurs.
– Activation function may be a non-linear function of input; resembles a sigmoid (like real neurons).
– Activation function may be linear.
Different Types of NN Possible
1. Single-layer or multi-layer architectures (Hopfield, Kohonen).
2. Data processing thru network.
o Feedforward.
o Recurrent.
3. Variations in nodes.
o Number of nodes.
o Types of connections among nodes in network.
4. Learning algorithms.
o Supervised.
o Unsupervised (self-organizing).
o Back propagation learning (training).
5. Implementation.
– Software or hardware.

Steps in Designing a Neural Network
1. Arrange neurons in various layers.
2. Decide type of connections among neurons for different layers, as well as among neurons within a layer.
3. Decide way a neuron receives input & produces output.
4. Determine strength of connection within network by allowing network to learn appropriate values of connection weights via training data set.
Activation Functions
1. Identity function: f(x) = x for all x.
2. Binary step function: f(x) = 1 if x >= θ; f(x) = 0 if x < θ.
3. Continuous log-sigmoid (logistic) function: f(x) = 1/[1 + exp(-σx)].
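A minimal sketch of the three functions (assuming θ = 0 and σ = 1 as defaults; this example is not part of the original slides):

import math

def identity(x):
    return x                       # 1. activation = input

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0  # 2. fires only at or above threshold

def logistic(x, sigma=1.0):
    return 1.0 / (1.0 + math.exp(-sigma * x))  # 3. squashes input into (0, 1)

print(logistic(1.25))  # ~0.777, matching the worked example later in the deck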
Sigmoid Activation Function
ai = 1 / (1 + e^(–neti))    (EQ 1.2)
– ai = activation (output) of node i
– neti = net activation flowing into node i
– e = exponential
o Gives what the output of the node will be for any given net input.
o Graph of relationship (next slide).
Sigmoid Activation Function Often Used for Nodes in NN
o For inputs of large magnitude (above 4.0 or below -4.0), nodes exhibit all-or-nothing behavior.
– Output max. value of 1 (on).
– Output min. value of 0 (off).
o Within range of –4.0 to 4.0, nodes show greater sensitivity.
– Output capable of making fine discriminations between different inputs.
o Non-linear response is at heart of what makes these networks interesting.

o What will be the activation of node 2, assuming the input you just calculated?
o If node 2 receives input of 1.25, activation is 0.777.
o Activation function scales from 0.0 to 1.0.
o When net input = 0.0, output is exact mid-range of possible activation (0.5).
o Negative inputs yield activations below 0.5.
Example 2-Layered Feedforward Network: Step Thru Process
o Neural network consists of collection of nodes.
– Number & arrangement of nodes defines network architecture.
o Example: 2-layered feedforward network.
– 2 layers (input, output).
– No intra-level connections.
– No recurrent connections.
– Single connection into input nodes & out of output nodes.
o Very simplified in comparison to biological neural network!
o Figure: input nodes a0, a1 feed output node a2 via weights w20, w21.

o Each input node has certain level of activity associated with it.
– 2 input nodes (a0, a1).
– 2 output nodes (a2, a3).
o Look at one output unit (a2).
– Receives input from a0 & a1 via independent connections.
– Amount depends on activation values of input nodes (a0 & a1) and weights (w20, w21).
o For this network, activity flows in one direction along connections.
– E.g., w20 exists but w02 doesn't.
– In wij, i is the receiving node & j the sending node (w20 = weight from node 0 into node 2).
o Total input to node 2 (a2) = w20a0 + w21a1.
Exercise 1.1
o What is the input received by node 2?
o Figure: a0 = 1, a1 = 1, with weights w20 = 0.75, w21 = 0.5.
o Net input for node 2 = (1.0 * 0.75) + (1.0 * 0.5) = 1.25.
o Net input alone doesn't determine activity of output node.
o Must know activation function of node.
o Assume nodes have activation function shown in EQ 1.2 (& Fig. 1.3).
o Next slide shows sample inputs & activations produced, assuming logistic activation function.
Bias Node (Default Activation)
o In absence of any input (i.e., input = 0), nodes have output of 0.5.
o Useful to allow nodes to have default activation.
– Node is "off" (output 0.0) in absence of input.
– Or can have default state where node is "on".
o Accomplish this by adding a node to the network which receives no inputs, but is always fully activated & outputs 1.0 (bias node).
– Node can be connected to any node in network.
– Often connected to all nodes except input nodes.
– Allow weights on connections from this node to receiving nodes to be different.
o Guarantees that all receiving nodes have some input even if all other nodes are off.
o Since output of bias node is always 1.0, input it sends to any other node is 1.0 * wij (value of weight itself).
o Only need one bias node per network.
o Similar to giving each node a variable threshold.
– Large negative bias == node is off (activation close to 0.0) unless it gets sufficient positive input from other sources to compensate.
– Large positive bias == receiving node is on & requires negative input from other nodes to turn it off.
o Useful to allow individual nodes to have different defaults (a small sketch follows).
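A minimal sketch of the idea (the weights here are hypothetical, not from the slides): the bias node is just an extra input clamped to 1.0.

def net_input_with_bias(weights, activations, w_bias):
    # The bias node always outputs 1.0, so it contributes 1.0 * w_bias.
    return w_bias * 1.0 + sum(w * a for w, a in zip(weights, activations))

# Large negative bias: node stays off unless other inputs compensate.
print(net_input_with_bias([0.75, 0.5], [0.0, 0.0], w_bias=-2.0))  # -2.0
print(net_input_with_bias([0.75, 0.5], [1.0, 1.0], w_bias=-2.0))  # -0.75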
Learning From Experience
• Changing a neural network's connection weights (training) causes the network to learn the solution to a problem.
• Strength of connection between neurons stored as weight-value for specific connection.
• System learns new knowledge by adjusting these connection weights.
Three Training Methods for NN
1. Unsupervised learning – hidden neurons must find a way to organize themselves without help from outside.
• No sample outputs provided to network against which it can measure its predictive performance for a given vector of inputs.
• Learning by doing.
2. Supervised learning (reinforcement) – works on reinforcement from outside.
• Connections among neurons in hidden layer randomly arranged, then reshuffled as network is told how close it is to solution.
• Requires teacher -- training set of data, or observer who grades performance of network results.
• Both unsupervised & supervised suffer from relative slowness & inefficiency, relying on random shuffling to find proper connection weights.
3. Back propagation.
o Network given reinforcement for how it is doing on task, plus information about errors is used to adjust connections between layers.
– Proven highly successful in training of multilayered neural nets.
– Form of supervised learning.
Example Learning Algorithms
1. Hebb's Rule -- how physical networks might learn.
2. Perceptron Convergence Procedures (PCP).
– Widrow-Hoff Learning Rule (1960s).
3. Hopfield.
4. Backpropagation of Error (Generalized Delta Rule).
5. Kohonen's Learning Laws (not covered here).
McCulloch-Pitts (1943) Neuron
1. Activity of neuron is an "all-or-none" process.
2. Certain fixed number of synapses must be excited within the period of latent addition to excite the neuron at any time.
o Number is independent of previous activity & position of neuron.
3. Only significant delay within nervous system is synaptic delay.
4. Activity of any inhibitory synapse absolutely prevents excitation of neuron at that time.
5. Structure of net does not change with time.

McCulloch-Pitts Neuron
o Firing within a neuron is controlled by a fixed threshold (θ).
o Binary step function: f(x) = 1 if x >= θ; f(x) = 0 if x < θ.
o What happens here if θ = 2?
McCulloch-Pitts Neuron: AND
P Q | P ∧ Q
T T |   T
T F |   F
F T |   F
F F |   F
Threshold = 2. Does a2 fire?

McCulloch-Pitts Neuron: OR
P Q | P ∨ Q
T T |   T
T F |   T
F T |   T
F F |   F
Threshold = 2. Does a2 fire?

McCulloch-Pitts Neuron: XOR
P Q | P XOR Q
T T |    F
T F |    T
F T |    T
F F |    F
Threshold = 2. Does a2 fire?
McCulloch-Pitts Neuron: AND NOT
o Did you get weights of 2 for w20 and -1 for w21?
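A minimal sketch of a McCulloch-Pitts unit with θ = 2 (the AND NOT weights are the ones from the slide; the AND & OR weights are the usual textbook choices, stated here as assumptions):

def mp_neuron(inputs, weights, theta=2):
    # Binary step: fire (1) iff the weighted sum reaches the threshold.
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

for p in (1, 0):
    for q in (1, 0):
        print(p, q,
              mp_neuron([p, q], [1, 1]),   # AND: both inputs needed to reach 2
              mp_neuron([p, q], [2, 2]),   # OR: either input alone reaches 2
              mp_neuron([p, q], [2, -1]))  # P AND NOT Q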
McCulloch-Pitts Neuron
o http://lcn.epfl.ch/tutorial/english/mcpits/html/index.html
o No learning algorithms.
Hebb: The Organization of Behavior (1949)
o "When an axon of cell A is near enough to excite a cell B & repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
o If a neuron receives input from another neuron & both are highly active, the weight between the neurons should be strengthened.
– Specific synaptic change (the Hebb synapse) which underlies learning.
o Result was interconnections between large, diffuse sets of cells in different parts of brain, called "cell assemblies."
o Changes suggested by Rochester et al. (1956) make a more practical model.
Hebb's Rule: Associative Learning
"Cells that fire together, wire together."
Δwij = ai aj
– Change in weight = product of activations of the two connected nodes.
Δwij = η ai aj
– where η is the learning rate.
o Unsupervised learning.
o Success at learning some patterns.
– But it only learns these patterns (e.g., pair-wise correlations). There will be times when we want an ANN to learn to associate a pattern with some desired behavior even when there is no pair-wise correlation.
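A minimal sketch of the rule (the co-activation history below is made up, not from the slides):

def hebb_update(w, a_i, a_j, eta=0.1):
    # Delta w_ij = eta * a_i * a_j: strengthen only when both nodes are active.
    return w + eta * a_i * a_j

w = 0.0
for a_i, a_j in [(1, 1), (1, 1), (0, 1), (1, 0)]:  # toy activation pairs
    w = hebb_update(w, a_i, a_j)
print(round(w, 2))  # 0.2 -- grew only on the two co-active presentations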
Pros & Cons of Hebbian Learning
Pros:
o Known biological mechanisms that might use Hebbian learning.
o Provides reasonable answer to "where does teacher info for learning process come from?"
– Lots of useful info in correlated activity.
– System just needs to look for patterns.
Cons:
o All it can learn is pair-wise correlations.
o May need to learn to associate patterns with desired behaviors even if patterns aren't pair-wise correlated.
– Hebb rule can't do this.
Perceptron Convergence Procedures (PCP)
o Variations of Hebb's Rule from 1960s.
– Perceptron (Rosenblatt, 1958).
– Widrow-Hoff rule is similar to PCP (1960).
o Start with network of units with connections initialized to random weights.
o Take target set of input/output patterns & adjust weights automatically so at end of training weights yield correct outputs for any input.
– Network should generalize to produce correct output for input patterns it hasn't seen during training.
o AKA gradient descent rule, Delta rule, or Adaline rule.
o http://lcn.epfl.ch/tutorial/english/perceptron/html/index.html
Widrow-Hoff Rule
o Starts with connections initialized to random weights; input patterns are presented to the network one at a time.
o For each input pattern, the network's actual output is compared to the target output for that pattern.
o Any discrepancy (error) used as basis for changing weights on input connections & changing output node's threshold for activation.
o How much weights are changed depends on error produced & activation from given input.
– Correction is proportional to error signal multiplied by value of activation given by derivative of transfer function.
– Using derivative allows making finely tuned corrections when activation is near its extreme values (minimum or maximum) & larger corrections when activation is in middle range.
o Goal of the Widrow-Hoff Rule is to minimize error on output unit by apportioning credit & blame to the input nodes.
o Only works for simple, 2-layer networks (I/O units).
o (Figure 18: Supervised (Delta Rule) vs. Unsupervised (Perceptron) Learning; www.willamette.edu/~gorr/classes/cs449/Classification/delta.html)
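A minimal sketch of one such update for a single logistic output unit (a hedged illustration, not tlearn's actual implementation; initial weights & learning rate are arbitrary):

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def widrow_hoff_step(weights, bias, inputs, target, eta=0.1):
    out = logistic(bias + sum(w * a for w, a in zip(weights, inputs)))
    # Error scaled by the derivative of the transfer function, out * (1 - out):
    # large corrections mid-range, small ones near the extremes.
    delta = (target - out) * out * (1.0 - out)
    weights = [w + eta * delta * a for w, a in zip(weights, inputs)]
    return weights, bias + eta * delta  # bias node's activation is always 1.0

# Train the 2-layer net on Boolean AND:
weights, bias = [0.1, -0.1], 0.0
for _ in range(5000):
    for x, t in [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]:
        weights, bias = widrow_hoff_step(weights, bias, x, t)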
Using Similarity
o Basic principle that drives learning.
o Allows generalization of behaviors because similar inputs tend to yield similar outputs.
o 11110000 vs. 11110001.
o "make" and "bake"; "made" and "baked".
o Cats and tigers.
o Similarity is generally a good rule of thumb, but not in every case.
o Hebbian networks & basic, 2-layer PCP networks can only learn to generalize on basis of physical similarity.
2-layer Perceptron Can't Solve Problem of Boolean XOR
o If we want output to be true (1):
– At least 1 input must be 1 & at least 1 weight must be large enough so when multiplied, output node turns on.
o For patterns (00 & 11) we want 0, so set weights to 0.
o For patterns (01 & 10), need weights from either input large enough so 1 input alone activates output.
o Contradictory requirements -- no set of weights allows output to come on if either input is on & keeps it off if both are on!

Node 0  Node 1  XOR
0       0       0
1       0       1
0       1       1
1       1       0

o Figure: inputs a0, a1 feed output a2 via weights w20, w21.
Vectors
o Vector -- collection of numbers, or point in space.
o Can think of inputs in XOR example as 2-D space.
– With each number indicating how far out along the dimension the point is located.
o Judge similarity of 2 vectors by Euclidean distance in space.
– Pairs of patterns furthest apart & most dissimilar (00 & 11) are the ones that need to be grouped together for the XOR function.
o Figure: the four input patterns (0,0), (0,1), (1,0), (1,1) plotted as corners of the unit square.
o I/O weights impose linear decision bound on input space.
– Patterns which fall on 1 side of decision line classified differently than patterns on other side.
o When groups of inputs can't be separated by line, no way for unit to discriminate between categories.
– Problems called non-linearly separable.
o What's needed are hidden units & learning algorithms that can handle more than one layer.
o Figure: decision lines over the unit square of input patterns (0,0), (0,1), (1,0), (1,1) for AND, OR & XOR -- AND & OR are linearly separable; XOR is not.
Solving the XOR Problem: Allow Internal Representation
o Add extra node(s) between I & O, & the XOR problem is solved.
o "Hidden" units equivalent to internal representations & aren't seen by world.
– Very powerful -- networks have internal representations that capture more abstract, functional relationships.
o Inputs (sensors), outputs (motor effectors) & hidden (inter-neurons).
o Input similarity still important.
– All things being equal, physical resemblance of inputs exerts strong pressure to induce similar responses.
Hidden Units & XOR Problem
o (a) What input looks like to network, showing intrinsic similarity structure of inputs.
o Input vectors are passed through weights between inputs & hidden units (multiplied); this transforms (folds) input space to produce (b).
o (b) 2 most distinct patterns (11, 00) are close in hidden space.
o Weights to output unit can impose linear decision bound & classify output (c).
o Figure: input space (a) → hidden unit space (b) → output (c), over the patterns 0,0 / 0,1 / 1,0 / 1,1.
Hidden Units Used to Construct Internal Representations of External World
o Hidden units make it possible for network to treat physically similar inputs as different, as needed.
– Transform input representations to more abstract kinds of representations.
– Solve difficult problems like XOR.
o However, being able to solve problem just means that some set of weights exists -- in principle.
– Network must be able to learn these weights!
o Real challenge is how to train networks!
– One solution -- backpropagation of error. (One hand-wired set of weights for XOR is sketched below.)
Earlier Laws (PCP) Can't Handle Hidden Layers Since They Don't Know How to Change Weights To Them
o PCP & others work well for weights leading to outputs, since we have a target for the output & can calculate weight changes.
o Problem occurs when have hidden units -- how to change weights from inputs to hidden units?
– With these algorithms, must know how much error is already apparent at level of hidden units before output is activated.
– Don't have predefined target for hidden units, so can't say what their activation levels should be.
– Can't specify error at this level of network.
Hopfield
o Recurrent ANN.
o Guaranteed to converge to a local minimum, but convergence to one of the stored patterns is not guaranteed.
o http://www.cbu.edu/~pong/ai/hopfield/hopfieldapplet.html
Backpropagation of Error, AKA Generalized Delta Rule (δ) (Rumelhart, Hinton & Williams, 1986)
o Begin with network which has been assigned initial weights drawn at random.
– Usually from uniform distribution with mean of 0.0 & some user-defined upper & lower bounds (e.g., ±1.0).
o User has set of training data in form of input/output pairs.
o Goal of training -- learn single set of weights such that any input pattern will produce correct output pattern.
– Desirable if weights allow network to generalize to novel data not seen during training.
Backprop
o Extremely powerful learning tool.
– Applied over wide range of domains.
o Provides very general framework for learning.
– Implements gradient descent search in space of possible network weights to minimize network error.
o What counts as error is up to modeler.
– Usually squared difference between target & actual output, but any quantity that is affected by weights may be minimized.

Backprop Training Takes 4 Steps
1. Select I/O pattern (usually at random).
2. Compare network's output with desired output (teacher pattern) on node-by-node basis & calculate error for each output node.
3. Propagate error info backwards in network from output to hidden.
4. Adjust weights on connections to reduce errors.
1. Select I/O pattern
o Pattern usually selected at random.
o Input pattern used to activate network & activation values for output nodes are calculated.
o Can have additional nodes between I/O ("hidden").
o Since weights selected at random, outputs generated at start are typically not those that go with input pattern.
2. Calculate Delta (δip) Error (EQ 1.3)
δip = (tip – oip) f'(netip) = (tip – oip) oip (1 – oip)
o δip = difference between target for node i on training pattern p (tip) and actual output for that node on that pattern (oip),
o multiplied by derivative of output node's activation function given its input.
– f'(netip) = slope of activation function.
– EQ 1.2, Fig. 1.3 -- steepest around middle of function, where net input is closest to 0.
o For large values of net input to node (positive or negative), derivative is small.
– δip will be small.
– Net input to node tends to be large when connections feeding into it are strong.
o Weak connections tend to yield small net input to node.
– There the derivative of the activation function is large & δip can be large.
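A minimal sketch of EQ 1.3 for a logistic output node, showing how the derivative term scales the correction (the sample values are made up):

def output_delta(target, output):
    # EQ 1.3: delta = (t - o) * f'(net), where f'(net) = o * (1 - o) for the logistic.
    return (target - output) * output * (1.0 - output)

print(output_delta(1.0, 0.5))   # 0.125   -- mid-range output, large correction
print(output_delta(1.0, 0.99))  # ~0.0001 -- saturated output, tiny correction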
Weight Changes in the Delta Rule
o Figure: error surface over two weights (x & y), showing the current weight vector, the delta vector, the new weight vector & the ideal weight vector.
Gradient Descent Learning Rule
o Moves weight vector from current position on bowl to new position closer to minimum error by falling down the negative gradient of the bowl.
o Not guaranteed to find correct answer.
– Always goes downhill & may get stuck in local minimum.
o Use momentum to "push" changes in same direction & possibly keep network from getting stuck.
Backprop: Calculate Weight Adjustments
o Know, for each output node, how far off target value is.
o Must adjust weights on connections that feed into it to reduce error.
– Want to change weight on connections from every node j coming into current node i so as to reduce error on pattern:
Δwij = –η (∂E/∂wij), where ∂E/∂wij is the partial derivative (EQ 1.4)
– η = learning rate; E = error; wij = weights.
o Partial derivative – rate of change.
– May be other variables, but they're being held constant.
o Measures how quantity on top changes when quantity on bottom is changed.
– I.e., how is error (E) affected by changing weights (w)?
o If we know this, we know how to change weight to decrease error.
– I.e., to decrease discrepancy between what network outputs & what we want it to output.
o Partial derivative is bell-shaped for sigmoidal curves (threshold function).
– Large values are in the mid-range.
o Contributes to stability of network – as outputs approach 0 or 1, only small changes occur.
o Helps compensate for excessive blame attached to hidden nodes.
o η = learning rate.
o Converting the partial derivative in EQ 1.4 gives EQ 1.5.
Backprop: Delta Rule (EQ 1.5): Δwij = η δip ojp
o Make changes small -- learning rate (η) set to less than 1.0 so that changes aren't too drastic.
– Change in weight depends on error we have for unit (δip).
o Take output into account (ojp) since node's error is related to how much (mis)information it has received from another node.
• If node j is highly active & contributed lots to current activation, then it is responsible for much of current error.
• If node j is inactive, it won't contribute to i's error.

Delta Rule continued
Δwij = η δip ojp
o δip reflects error on unit i for input pattern p.
– Difference between target & output.
– Also includes the partial derivative (EQ 1.4).
o Calculate errors on all output nodes & weight changes on connections coming into them.
– Don't yet make any changes.
3. Propagate Error Info Backwards From Output to Hidden
o Assign shared blame to a hidden unit on basis of:
– What errors are on the output units the hidden unit is activating, and
– Strength of connection between the hidden unit & each output it connects to.
o Move to hidden layer(s), if any, & use EQ 1.5 to change weights leading into hidden units from below.
– Can't use EQ 1.3 to compute hidden nodes' errors, since no given target to make comparison with.
– Hidden nodes "inherit" errors of all nodes they've activated.
– If nodes activated by hidden unit have large errors, then hidden unit shares blame.
o Calculate hidden unit's error by summing up errors of the nodes it activates, each multiplied by the weight between the nodes (since the weight determines its effect):
δip = f'(netip) Σk δkp wki
– i = hidden node; p = current pattern.
– k indexes the output nodes feeding error back to hidden node i.
– Derivative of hidden unit's activation function is multiplied in.
o Continues iteratively down thru network (backpropagation of error)…
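Putting the four steps together, a minimal sketch for a network with one hidden layer (biases omitted for brevity; this illustrates the procedure, not tlearn's code):

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_sweep(x, t, W_h, W_o, eta=0.1):
    # W_h[i][j]: weight from input j to hidden i; W_o[k][i]: hidden i to output k.
    # Step 1 (forward pass for the selected pattern).
    h = [logistic(sum(w * a for w, a in zip(row, x))) for row in W_h]
    o = [logistic(sum(w * a for w, a in zip(row, h))) for row in W_o]
    # Step 2: output deltas (EQ 1.3).
    d_o = [(tk - ok) * ok * (1 - ok) for tk, ok in zip(t, o)]
    # Step 3: hidden units inherit error from the outputs they activate.
    d_h = [hi * (1 - hi) * sum(d_o[k] * W_o[k][i] for k in range(len(W_o)))
           for i, hi in enumerate(h)]
    # Step 4: weight changes (EQ 1.5), delta_w = eta * delta_i * o_j.
    W_o = [[w + eta * d_o[k] * h[i] for i, w in enumerate(row)]
           for k, row in enumerate(W_o)]
    W_h = [[w + eta * d_h[i] * x[j] for j, w in enumerate(row)]
           for i, row in enumerate(W_h)]
    return W_h, W_o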
4. Adjust Weights on Connections to Reduce Errors
o When reach layer above input layer (no incoming weights), actually impose the weight changes.
o Figure: error flows backwards thru the network.
Backprop Pros & Cons
Pros:
o Extremely powerful learning tool that is applied over wide range of domains.
o Provides very general framework for learning.
– Implements gradient descent search.
o What counts as error is up to modeler.
– Usually squared difference between target & actual output.
– Any quantity that is affected by weights may be minimized.
Cons:
o Requires large # of presentations of input data to learn.
o Each presentation requires 2 passes thru network (forward & backward).
o Each pass is computationally complex.
Kohonen
3 Ways Developmental Models Handle Change
1. Development results from working out predetermined behaviors. Change is the triggering of innate knowledge.
2. Change is inductive learning. Learning involves copying or internalizing behaviors present in the environment.
3. Change arises through interaction of maturational factors, under genetic control, and environment.
• Progress in neurosciences.
• Computational framework good for exploring & modeling.
Biologically-Oriented Connectionism (Elman, et al)
1. We think it is critical to pay attention to what is known about genetic basis for behavior & about developmental neuroscience.
2. At the level of computation & modeling, we believe it is important to understand the sorts of computations that can plausibly be carried out in neural systems.
3. We take a broad view of biology which includes concern for evolutionary basis for behavior.
4. A broader biological perspective emphasizes adaptive aspects of behaviors & recognizes that to understand adaptation requires attention to environment.
Connectionist Models
o Cognitive functions performed by system that computes with simple neuron-like elements, acting in parallel, on distributed representations.
1. Have precisely matched data from human subject experiments.
– Measure speed of reading words – depends on frequency of word & regularity of pronunciation pattern (e.g., GAVE, HAVE).
• Similar pattern (humans – latency, NN – errors).
• Fig. P.1 on pg. 3 (McLeod, Plunkett, Rolls).
2. Connectionist models can predict results.
– Suggest areas of investigation.
– E.g., U-shaped learning or over-generalization problems when kids learn past tense of verbs (WENT – GOED) suggest linguistic development occurs in stages.
– NN model produced over-regularization errors.
– Fig. P.2 (McLeod, Plunkett, Rolls).
3. Connectionist models have suggested solutions to some of the oldest problems in cognitive science.
• E.g., face recognition from various angles.
• View invariance – respond to one particular face (regardless of view) & not the other faces.
• E.g., face 3 in Fig. P.3 (McLeod, Plunkett, Rolls).
Task
o When train network, want it to produce some behavior.
o Task – the behavior that we are training the network to do.
– E.g., associate present tense form of verb with past tense form.
o Task must be precisely defined – for the class of networks we're dealing with, learning the correct output for a given input (the training environment).
– Set of input stimuli.
– Correct output is paired with each input.
Implications of Defining the Task
o Must conceptualize behavior in terms of inputs & outputs.
– May need abstract notion of input & output.
– E.g., associate 2 forms of verb – neither is really input for other.
o Teach network task by example, not by explicit rule.
– If successful, network learns underlying relationship between input & output by induction.
– Can't assume network has learned the generalization we assume underlies the behavior – it may have learned some other behavior! E.g., the 1980s story of the Pentagon training a NN to recognize tanks, where the network reportedly latched onto incidental properties of the photos instead.
Implications - 2
o Nature of training data is extremely important for learning.
– The more data you give a network, the better.
– With too little data, may make bad generalizations.
– Quality counts too!! – structure of environment influences outcome.
o Some tasks are more convincing/more effective/more informative than others for demonstrating a point.
– Is info represented in teacher (output) plausibly available to human learners?
– E.g., children? See task on next slide.
Two Ways to Teach Network to Segment Sounds into Words
1. Expose network to sequences of sounds (presented one at a time, in order, with no breaks between words).
• Train network to produce "yes" when sequence makes a word.
• Explicitly learns about words from info about where words start.
2. Train network on different task – given same sequences of sounds as input, but task is to predict next sound.
• At beginning of word, network makes many mistakes.
• As it hears more of word, prediction error declines until end of word.
• Learns about words implicitly as indirect consequence of task.
o First approach gives away the secret by directly teaching the task (boundary info), which is NOT how children learn.
Network Architectures: Number & Arrangement of Nodes in Network
1. Single-layer feedforward network -- input layer that projects onto output layer of neurons in one direction.
2. Multilayer feedforward network -- has 1+ hidden layers that intervene between external input & network output.
3. Recurrent network -- has at least 1 feedback loop.
4. Lattice structure -- 1-D, 2-D or greater arrays of neurons, with output neurons arranged in rows & columns.
Most Neural Networks Consist of 3 Layers
6 Different Types of Connections Used Between Layers (Inter-layer Connections)
1. Fully connected. Each neuron on first layer is connected to every neuron on second layer.
2. Partially connected. Neuron of first layer does not have to be connected to all neurons on second layer.
3. Feedforward. Neurons on first layer send their output to neurons on second layer, but receive no input back from neurons on second layer.
4. Bi-directional (recurrent). Another set of connections carrying output of neurons of second layer into neurons of first layer.
5. Hierarchical. Neurons of lower layer may only communicate with neurons on next level of layer.
6. Resonance. Layers have bi-directional connections.
– Can continue sending messages across connections a number of times until a certain condition is achieved.
How to Select Correct Network Architectures
o Any task can be solved by some neural network (in theory) – but not any neural network can solve any task.
o Number & arrangement of nodes defines network architecture.
o Textbook uses: 1) feedforward & 2) simple recurrent networks.
o # of nodes depends on task & how I/O are represented.
– E.g., if images are input as a 100x100 dot array -- 10,000 input nodes.
o Selection of architecture reflects modeler's theory about what info processing is required for task.
Analysis
1. Train network on task.
2. Evaluate network's performance & try to understand basis for performance.
o Need to anticipate kinds of tests before training!

Ways to evaluate network performance:
1. Global error.
2. Individual pattern error.
3. Analyzing weights & internal representations.
Evaluate Network Performance: Global Error
o During training, simulator calculates discrepancy between actual network output activations & target activations it is being taught to produce.
o Simulator reports this error on-line -- summed over a number of patterns.
– As learning occurs, error should decline & approach 0.
o If network is trained on task in which same input can produce different outputs, then network can learn correct probabilities, but error rate never reaches 0.
Evaluate Network Performance: Individual Pattern Error
o Global error can be misleading.
– If have large # of patterns to learn, global error may be low even if some patterns are not learned correctly.
– These may be the interesting patterns.
o Also may want to create special test stimuli not presented to network during training.
– Generalize to novel cases?
– What has network learned?
o Helps discover what generalizations have been created from a finite data set.
Evaluate Network Performance: Analyzing Weights & Internal Representations
1. Hierarchical clustering of hidden unit activations.
2. Principal component analysis & projection pursuit.
3. Activation patterns in conjunction with actual weights.
Hierarchical Clustering of Hidden Unit Activations
o Present test patterns to network after training.
o Patterns produce activations on hidden units, which we record & tag -- vectors in multi-dimensional space.
o Clustering looks at similarity structure of space.
o Inputs treated as similar by network produce internal representations that are similar.
o Produces tree format of inter-pattern distances.
o Can't examine space directly -- difficult to visualize high-dimensional spaces.
Principal Component Analysis & Projection Pursuit
o Used to identify interesting lower-dimensional slices of the high-dimensional hidden unit space.
o Move viewing perspective around in this space.
Activation Patterns in Conjunction With Actual Weights
o When look at activation patterns, only see part of what network "knows."
o Network manipulates & transforms info via connections between nodes.
o Examine connections & weights to see how transformations are being carried out.
o Hinton diagrams can be used -- weights shown as squares, with color & size of square representing sign & magnitude of connection.
Hinton Diagram
o White = positive weight. Black = negative weight.
o Area of box proportional to absolute value of corresponding weight.
What Do We Learn From a Simulation?
o Are the simulations framed in such a way that they clearly address some issue?
o Are the task & stimuli appropriate for the points being made?
o Do you feel you've learned something from the simulation?
Uses of Neural Networks
o Prediction -- Use input values to predict some output. E.g., pick best stocks, predict weather, identify people at risk of cancer.
o Classification -- Use input values to determine classification. E.g., is input letter an A; is blob of video data a plane & what kind?
o Data association -- Recognize data that contains errors. E.g., identify characters when scanner is not working properly.
o Data conceptualization -- Analyze inputs so that grouping relationships can be inferred. E.g., extract from database the names most likely to buy a product.
o Data filtering -- Smooth an input signal. E.g., take the noise out of a telephone signal.
Send In The Robots
http://www.spacedaily.com/news/robot-01b.html
by Annie Strickler and Patrick Barry for NASA Science News
Pasadena - May 29, 2001
o As a project scientist specializing in artificial intelligence at NASA's Jet Propulsion Laboratory (JPL), Ayanna is part of a team that applies creative energy to a new generation of space missions -- planetary and moon surface explorations led by autonomous robots capable of "thinking" for themselves.
o Nearly all of today's robotic space probes are inflexible in how they respond to the challenges they encounter (one notable exception is Deep Space 1, which employs artificial intelligence technologies). They can only perform actions that are explicitly written into their software or radioed from a human controller on Earth.
o When exploring unfamiliar planets millions of miles from Earth, this "obedient dog" variety of robot requires constant attention from humans. In contrast, the ultimate goal for Ayanna and her colleagues is "putting a robot on Mars and walking away, leaving it to work without direct human interaction."
o "We want to tell the robot to think about any obstacle it encounters just as an astronaut in the same situation would do," she says. "Our job is to help the robot think in more logical terms about turning left or right, not just by how many degrees." …
o To do this, Ayanna relies on 2 concepts in the field of artificial intelligence: "fuzzy logic" & "neural networks." …
o Neural networks also have the ability to learn from experience. This shouldn't be too surprising, since the design of neural networks mimics the way brain cells process information.
o "Neural networks allow you to associate general input to a specific output," Ayanna says. "When someone sees four legs and hears a bark (the input), their experience lets them know it is a dog (the output)." This feature of neural networks will allow a robot pioneer to choose behaviors based on the general features of its surroundings, much like humans do.
o By combining these two technologies, Ayanna and her colleagues at JPL hope to create a robot "brain" that can learn on its own how to expertly traverse the alien terrains of other planets.
o Such a brainy 'bot might sound more like the science fiction fantasies of children's comics than a real NASA project, but Ayanna thinks the sci-fi flavor of the project contributes to its importance for space exploration.
o Ayanna -- who wanted to be television's "Bionic Woman" when she was young, and later decided she wanted to try to build her instead -- says she believes that the flights of imagination common in childhood translate into adult scientific achievement.
o "I truly believe science fiction drives real science forward," she says. "You must have imagination to go to the next level."
Learning to Use tlearn
o Define task.
o Define architecture.
o Set up simulator.
– Configuration (.cf) file.
– Data (.data) file.
– Teach (.teach) file.
o Check architecture.
o Run simulation.
– Global error.
– Pattern error.
o Examine weights.
– Role of start state.
– Role of learning rate.
o Try:
– Logical OR.
– Exclusive OR.
Define Task
o Train neural network to map Boolean functions AND, OR, EXCLUSIVE OR (XOR).
o Boolean functions take set of inputs (1, 0) & decide if given input falls into positive or negative category.
o Input & output are activation values of nodes in network with 2 input units & 1 output unit.
o Networks simple & relatively easy to construct for task.
o Many of the problems encountered with this task have direct implications for more complex problems.
Boolean Functions AND, OR, XOR
Input Activations     Output Activations (Node 3)
Node 0   Node 1   |   AND   OR   XOR
0        0        |    0     0    0
0        1        |    0     1    1
1        0        |    0     1    1
1        1        |    1     1    0

4 possible input combinations (2^2).
Define Architecture for AND Function
o 4 input patterns & 2 distinct outputs.
– Each input pattern has 2 activation values.
– Each output has single activation.
– For every input pattern, have well-defined output.
o Use simple feedforward network with 2 input units & 1 output unit.
o Single-layer perceptron – 1 layer of weights.
o Figure: inputs a0, a1 feed output a2 via weights w20, w21.
1. Network menu – New Project option.
2. New Project dialogue box appears.
3. Select directory or folder in which to save your project files. Use N: drive!
4. Call the project "and". All files associated with the project should have the same name (any name you want).
5. Get 3 windows on screen – each used for entering info relevant to a different aspect of the network architecture.
– and.teach – defines output patterns to network, how many & format.
– and.data – defines input patterns to network, how many & format.
– and.cf – used to define # of nodes in network & initial pattern of connectivity between nodes before training.
Info Stored in .cf, .data & .teach Files
o Can use editor of tlearn, or a text editor or word processor.
– Must save files in ASCII (text) format.
o Enter data for and.cf file.
– Follow upper- & lower-case distinctions, spaces & colons exactly.
– Use delete or backspace keys to correct errors.
o File Save command in tlearn.
o The AND task: 1 AND 1 = 1; 0 AND 0 = 0; 0 AND 1 = 0; 1 AND 0 = 0.
o Figure: the INPUT (.data), OUTPUT (.teach) & CONFIGURATION (.cf) files together define the project.
The .cf file is key to setting up the simulator. It describes the configuration of the network & conforms to a fairly rigid format, with 3 sections: NODES:, CONNECTIONS: & SPECIAL:.
NODES:
NODES:             beginning of nodes section
nodes = 1          # of units in network (not counting inputs)
inputs = 2         # of input units (counted separately)
outputs = 1        # of output units in network
output node is 1   identifies the output unit – the only non-input node in the network. Node numbering starts at 1.
o Inputs don't count as nodes.
o Output nodes are given as a <node-list>.
o Spaces are critical.
CONNECTIONS:
CONNECTIONS:   beginning of section
groups = 0     how many groups of connections are constrained to have the same value
1 from i1-i2   node 1 (the output) receives input from the 2 input units. Input units are given the prefix i.
1 from 0       node 0 is the bias unit, which is always on; so node 1 has a bias.
o All connections in a group are identical in strength.
– groups = 0 is common.
o <node-list> from <node-list> provides info about connections.
– <node-list> is a comma-separated list of node #s, with dashes indicating that intermediate node #s are included.
– 1 from i1-i2
– Contains no spaces.
– Nodes are numbered counting from 1.
o Inputs are numbered, counting from 1, with the i prefix.
o Node 0 always outputs a 1 & serves as the bias node.
– If biases are desired, connections must be specified from node 0 to specific other nodes.
– 1 from 0
SPECIAL:
SPECIAL:              beginning of section
selected = 1          which units are selected for special printout; output node (1) is selected.
weight_limit = 1.00   sets start weights (from inputs to output & bias to output) randomly in the range ±0.5.
o Optional lines can specify:
– linear = <node-list>     some nodes are linear
– bipolar = <node-list>    values range from –1 to 1
– selected = <node-list>   nodes selected for special printout
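Putting the three sections together, the complete and.cf file described above reads:

NODES:
nodes = 1
inputs = 2
outputs = 1
output node is 1
CONNECTIONS:
groups = 0
1 from i1-i2
1 from 0
SPECIAL:
selected = 1
weight_limit = 1.00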
Data (.data) File
o Defines input patterns presented to tlearn.
o First line is either:
– distributed (normal) – set of vectors with i values each.
– localist (only a few of many input lines are non-zero).
o Second line is an integer specifying the number of input vectors to follow.
o Remainder of file consists of the input.
– Integers or floating-point numbers.
Teach (.teach) File
o Required whenever learning is to be performed.
o First line: distributed (normal) or localist (only a few of many target values are non-zero).
o Second line: integer specifying # of output vectors to follow.
o Ordering of output patterns matches ordering of corresponding input patterns in .data file.
o In normal (distributed) mode, each output vector contains o floating-point or integer numbers.
– o = number of outputs in network.
– Can use * instead of a floating-point number to indicate "don't care".
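The matching and.teach file for the AND task (targets listed in the same order as the .data patterns above):

distributed
4
0
0
0
1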
Checking the Architecture
o If info was typed into the and.cf, and.data & and.teach files correctly, should have no problems.
o tlearn offers a check of and.cf by displaying a picture of the network architecture.
– Displays menu, Network Architecture option.
– Can change how the nodes are displayed, but doesn't change contents of network configuration file.
o Get error message if mistake in syntax of training files.
o Does not find incorrect entries in the data!!
Running the Simulation
o Specify 3 input files (.cf, .data, .teach) & save them.
o Specify parameters for tlearn to determine initial start state of network, learning rate, & momentum.
o Network menu, Training options.
o # of training sweeps before stopping.
– A training sweep is 1 presentation of an input pattern, causing activation to propagate thru the network & appropriate weight adjustments to be carried out.
o Order in which patterns are presented to network is determined by:
– Train sequentially – presents patterns in the order they appear in .data & .teach files.
– Train randomly – presents patterns in random order.
o Learning rate – determines how fast weights are changed in response to a given error signal.
– Set to 0.100.
o Momentum – discussed later.
– Set to 0.0.
o Initial state of network determined by weight values assigned to connections before training starts.
– .cf file specifies weight_limit.
o Weights assigned according to the random seed indicated by the number next to the Seed with: button.
– Select any number you like.
– Simulation can be replicated using the same random seed – initial start weights of network are identical & patterns are sampled in same random order.
o Seed randomly – computer selects random seed.
o Both Seed with & Seed randomly select a set of random start weights within the limits specified by the weight_limit parameter.
Train the Network
o Once training options are set, select Train the network from the Network menu.
o Get tlearn Status display.
– # of sweeps.
– Abort – dump current state in weights file.
– Iconify – clear screen for other tasks while tlearn runs in background.
Has the Network Solved the Problem?
1. Examine global error produced at output nodes, averaged across patterns.
2. Examine response of network to individual input patterns.
3. Analyze weights & internal representations.
Examine Global Error
o During training, simulator calculates discrepancy between actual network output activations & target activations it is being taught to produce.
o Simulator reports this error on-line -- summed over a number of patterns.
– As learning occurs, error should decline & approach 0.
o If network is trained on task in which same input can produce different outputs, then network can learn correct probabilities, but error rate never reaches 0.
o Error calculated by subtracting actual response from desired (target) response.
o Value of discrepancy is either:
– Positive if target is greater than actual output.
– Negative if actual output is greater than target output.
Root Mean Square (RMS) Error
o Global error – average error across the 4 patterns at a given point in training.
o tlearn reports Root Mean Square (RMS) error to prevent cancellation of positive & negative errors.
– Averages the squared errors for all patterns.
– Returns the square root of the average.

• Figure (AND network): tlearn tracks RMS error throughout training (every 100 sweeps).
• Error decreases as training continues … after 1000 sweeps, RMS error = 0.35.
– Average output error = 0.35.
– Output off target by approx. 0.35, averaged across the 4 patterns.
o Equation 3.1: RMS error = sqrt( Σk (ok – tk)² / # of patterns ).
– k indexes the input patterns (4 for AND).
– ok is the vector of output activations produced by input pattern k.
– Number of elements in the vector corresponds to number of output nodes.
• E.g., in this case (AND), only one output node, so the vector contains only 1 element.
– Vector tk specifies the desired or target activations for input pattern k.
o With 1000 sweeps & 4 input patterns, network sees each pattern approximately 250 times.
o Given RMS error = 0.35, has the network learned the AND function?
– Depends on how we define an acceptable level of error.
o Activation function of output unit is the sigmoid function (EQ 1.2).
– Activation curve never reaches 1.0 or 0.0.
– Net input to node would need to be ± infinity.
– Always some residual finite error.
o So what level of error is acceptable? No right answer.
– Can require that all outputs be within 0.1 of target.
– Can round off activation values: ones closest to 1.0 are correct if target is 1.0.
Has Network Solved Problem?
o RMS error = 0.35. Solved?
o Depends on how we define acceptable level of error.
– Can't always use just global error.
– Network may have low RMS, but hasn't solved all input patterns correctly.
Exercise 3.3
1. How many times has network seen each input pattern after 1000 sweeps through training set?
2. How small must RMS error be before we can say network has solved problem?
Pattern Error – Verify Network Has Learned
o RMS error is the average error across the 4 patterns.
o Is error uniformly distributed across different patterns, or have some patterns been correctly learned while others have not?
o Verify network has learned (from Network menu):
– Presents each input pattern to network once & observes resulting output node activation.
– Compare output activations with teacher signal in .teach file.
o Output window indicates file and.1000.wts as specification of state of network.
o Used and.data training patterns to verify network performance.
o Compare activation values to target activations in and.teach file.
o Has the network solved Boolean AND?
Pattern Error – Node Activities
o Activation levels indicated by squares.
– Large white = high activation.
– Small white = low activation.
– Grey = inactive node.
Individual Pattern Error
• Global error can be misleading.
– If have large # of patterns to learn, global error may be low even if some patterns are not learned correctly.
– These may be the interesting patterns.
• Also may want to create special test stimuli not presented to network during training.
– Generalize to novel cases?
– What has network learned?
• Helps discover what generalizations have been created from a finite data set.
Pattern Error: Present each Input Pattern Just Once
• Select Verify network has learned from the Network menu.
• Presents each input pattern to network just once.
• E.g., for the AND function, this does 4 sweeps (1 per training input).
• Observe resulting output node activations.
• Compare output activations with teacher signal in .teach file.
04/21/23 Neural Networks 158
• Output window lists the file and.1000.wts as the specification of the network's state.
• Used and.data training patterns to verify network performance.
• Compare activation values to target activations in and.teach file.
• Has the network solved Boolean AND?
AND Network
04/21/23 Neural Networks 159
Calculate Actual RMS Error Value & Compare it to Value Plotted (Boolean AND)
Input     Output    Round Off    Target    Squared Error
0 0       0.099     0            0         .0098
1 0       0.294     0            0         .0864
0 1       0.301     0            0         .0906
1 1       0.620     1            1         .1444

RMS error = sqrt(.3312 / 4) = .2877
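A minimal Python check of the arithmetic in the table above (this is not part of tlearn, just a verification of the plotted value):

    import math

    # (output, target) pairs from the verification table above
    patterns = [(0.099, 0), (0.294, 0), (0.301, 0), (0.620, 1)]

    # mean of the squared errors, then the square root of that mean
    mse = sum((t - o) ** 2 for o, t in patterns) / len(patterns)
    rms = math.sqrt(mse)
    print(round(rms, 4))  # 0.2878 -- the slide's .2877 reflects rounding the squared errors first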
04/21/23 Neural Networks 160
Pattern Error – Node Activities
• Activation levels indicated by squares.
– Large white = high activation.
– Small white = low activation.
– Grey = inactive node.
04/21/23 Neural Networks 161
Examine Weights
o Input activations transmitted to other nodes along modifiable connections.
o Performance of network determined by strength of connections (weight values).
1. Display menu, Connection Weights (Hinton diagram).
– White = positive.
– Black = negative.
– Size reflects absolute size of connection.
[Figure: Hinton diagram with columns labeled bias node / first input / second input]
04/21/23 Neural Networks 162
o All rectangles in first column code values of connections from the bias node.
o Rectangles in 2nd column code connections from the 1st input unit.
o Across columns – higher numbered source nodes (from the .cf file).
o Rows in each column identify destination nodes of connections.
– Higher numbered rows indicate higher numbered destination nodes.
– Only one node in this example receives inputs (the output node) – it is the only one that receives incoming connections.
04/21/23 Neural Networks 163
o Hinton diagram provides clues to how the network solves Boolean AND.
– Bias has strong negative connection to output node.
– 2 input nodes have moderately sized positive connections to output node.
– One active input node by itself can't provide enough activation to overcome the strong negative bias.
– Two active input nodes together can overcome the negative bias.
– Output node only turns on if both input nodes are active!
04/21/23 Neural Networks 164
Role of Start State
o Network solved Boolean AND starting with particular set of random weights & biases.
o Use different random seed (Training options) to wipe out learning that has occurred …
o Can resume training beyond the specified number of sweeps using the Resume training option.
o Start states can have dramatic impact on the way a network attempts to solve a problem & on its final solution.
– Training networks with different random seeds is like running different subjects in an experiment.
04/21/23 Neural Networks 165
Role of Learning Rate
o Learning rate determines proportion of error signal used to change weights in network.
– Large learning rates lead to big weight changes.
– Small learning rates lead to small weight changes.
o To examine effect of learning rate on performance, run simulation so that learning rate is the only factor changed.
– Start with same random weights & biases.
o Modelers often use small learning rate to avoid large weight changes (see the update rule below).
– Large weight changes can be disruptive (learning is undone).
– Large weight changes can be counter-productive when network is close to a solution!
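In gradient-descent terms (tlearn trains networks with backpropagation), the learning rate \eta is simply the scale factor on every weight update. This is the generic rule, not a formula quoted from the tlearn manual:

    \Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}

Doubling \eta doubles every weight step, which speeds early learning but makes overshooting more likely once the network is near a solution.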
04/21/23 Neural Networks 166
Steps To Building Neural Network in tlearn

1. Network menu – New Project option. New project dialogue box appears.
2. Select directory or folder in which to save your project files. Use N: Drive!
3. Get 3 windows on screen – each used for entering info relevant to a different aspect of network architecture (.teach, .data, & .cf).
4. Check architecture.
5. Specify training option parameters to determine initial start state of network, learning rate, & momentum.
6. Train network (from Network menu).
7. Determine if network has learned task by checking error rates, examining responses to individual patterns, etc.
04/21/23 Neural Networks 167
AND Network: Hinton Diagram
[Figure: Hinton diagram with columns labeled Bias Node, First Input, Second Input]
04/21/23 Neural Networks 168
Hinton Diagram.
White = positive weight. Black = negative weight.
Area of box proportional to absolute value of corresponding weight.
04/21/23 Neural Networks 169
Logical AND Network Implemented With 2 Inputs & 1 Output
o Output unit on (value close to 1.0) when both inputs 1.0. Otherwise off.
o With large negative weight from bias unit to output, the output is off by default.
o Make weights from input nodes to output large enough that if both inputs are present, net input is great enough to turn output on.
– Neither input by itself is large enough to overcome the negative bias.
Node 0 is the bias unit, which is always on; its connection to node 1 is what gives node 1 a bias.
04/21/23 Neural Networks 170
Hinton Diagram Example
04/21/23 Neural Networks 171
Weights File in tlearn
o tlearn keeps up-to-date record of network’s state in weights file.
o Saved to disk at regular intervals & at end of training.
o Lists all connections in network, grouped according to receiving node.
o In and.cf file only 1 receiving node is specified (output node 1).
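For concreteness, the AND architecture can be described by a configuration file along these lines. This is a sketch reconstructed from the conventions mentioned in these slides (bias node 0, inputs i1-i2, single output node 1); the exact and.cf distributed with the tlearn exercises may differ in detail:

    NODES:
    nodes = 1
    inputs = 2
    outputs = 1
    output node is 1

    CONNECTIONS:
    groups = 0
    1 from i1-i2
    1 from 0

    SPECIAL:
    selected = 1
    weight_limit = 1.00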
04/21/23 Neural Networks 172
o 1st # represents weight on connection from bias node to output node (-2.204).
o 2nd # (1.328) shows connection from 1st input node to output node.
o 3rd # (1.36) shows connection from 2nd input node to output node.
o Final number (0.000) shows connection from the output node to itself – non-existent, since the network is strictly feedforward.
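Pushing these three weights through the sigmoid reproduces the output activations from the verification table earlier. A minimal Python check, assuming the logistic activation of EQ 1.2:

    import math

    def sigmoid(net):
        # logistic activation (EQ 1.2, assumed): squashes net input into (0, 1)
        return 1.0 / (1.0 + math.exp(-net))

    bias, w1, w2 = -2.204, 1.328, 1.36  # weights from the and.1000.wts file

    for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        net = bias + w1 * i1 + w2 * i2
        print(i1, i2, round(sigmoid(net), 3))
    # prints ~0.099, 0.294, 0.301, 0.619
    # (the table's 0.620 reflects the full-precision weights before rounding)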
04/21/23 Neural Networks 173
Resume Training
o Can continue network training with the Resume training option on the Network menu.
– Extends training by # of sweeps & adjusts error display to accommodate the extra training sweeps.
o Does the RMS error decrease significantly?
04/21/23 Neural Networks 174
Several Different Ways to Analyze Weights & Examine Internal Representations
1. Hierarchical clustering of hidden unit activations.
2. Principal component analysis & projection pursuit.
3. Activation patterns in conjunction with actual weights.
• Examine these methods in detail later in semester!
04/21/23 Neural Networks 175
1 - Hierarchical Clustering of Hidden Unit Activations
• Present test patterns to network after training.
• Patterns produce activations on hidden units, which we record & tag -- vectors in a multi-dimensional space.
• Clustering looks at the similarity structure of that space.
• Inputs treated as similar by the network produce internal representations that are similar.
• Produces a tree of inter-pattern distances (see the sketch below).
• Can't examine the space directly -- difficult to visualize high-dimensional spaces.
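A minimal sketch of this procedure in Python with SciPy. The activation vectors and pattern labels below are hypothetical, invented purely to show the mechanics:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram
    import matplotlib.pyplot as plt

    # hypothetical hidden-unit activation vectors, one row per test pattern
    activations = np.array([
        [0.10, 0.90, 0.20],   # pattern "A"
        [0.12, 0.85, 0.25],   # pattern "B" -- similar to A
        [0.80, 0.15, 0.70],   # pattern "C"
        [0.78, 0.20, 0.65],   # pattern "D" -- similar to C
    ])
    labels = ["A", "B", "C", "D"]

    # build the cluster tree from inter-pattern distances
    tree = linkage(activations, method="average")
    dendrogram(tree, labels=labels)
    plt.show()  # A/B and C/D should join first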
04/21/23 Neural Networks 176
2 - Principal Component Analysis & Projection Pursuit
• Used to identify interesting lower-dimensional slices through the high-dimensional space of hidden unit activations.
• Move viewing perspective around in this space (see the sketch below).
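A sketch of the PCA step in Python with NumPy, again on hypothetical activation vectors (projection pursuit needs specialised tools and is omitted here):

    import numpy as np

    # hypothetical hidden-unit activations, one row per test pattern
    activations = np.array([
        [0.10, 0.90, 0.20],
        [0.12, 0.85, 0.25],
        [0.80, 0.15, 0.70],
        [0.78, 0.20, 0.65],
    ])

    # centre the data, then use SVD to find the principal components
    centered = activations - activations.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)

    # project each pattern onto the first 2 principal components
    projected = centered @ vt[:2].T
    print(projected)  # 2-D coordinates suitable for plotting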
04/21/23 Neural Networks 177
3 - Activation Patterns In Conjunction With Actual Weights
• When we look at activation patterns, we see only part of what the network “knows.”
• Network manipulates & transforms info via connections between nodes.
• Examine connections & weights to see how transformations are being carried out.
• Hinton diagrams can be used -- weights shown as colored squares, with the size & color of each square representing the magnitude & sign of the connection.
04/21/23 Neural Networks 178
Has Network Solved AND Problem?

• RMS error = 0.35. Solved?
• Depends on how we define an acceptable level of error.
– Can't always use just global error.
– Network may have low RMS but not have solved all input patterns correctly.
• Exercise 3.3
1. How many times has network seen each input pattern after 1000 sweeps through training set?
2. How small must RMS error be before we can say network has solved problem?
• Exercise 3.4
1. Compare exact value of RMS to plotted value.
04/21/23 Neural Networks 179
What Do We Learn From a Simulation?
• Are the simulations framed in such a way that they clearly address some issue?
• Are the task & stimuli appropriate for points being made?
• Do you feel you’ve learned something from the simulation?
04/21/23 Neural Networks 180
Logical OR
o What type of network architecture?
o 2 input, 1 output + bias node
o Try the OR network (pg. 57-62).
Input Activations          Output Activation (Node 3)
Node 0    Node 1           AND    OR    XOR
0         0                0      0     0
0         1                0      1     1
1         0                0      1     1
1         1                1      1     0
04/21/23 Neural Networks 181
04/21/23 Neural Networks 182
Exclusive OR
o Create a third project called xor and try the exclusive OR function with just an input layer and an output layer.
04/21/23 Neural Networks 183
Neural Network Simulation Software: tlearn, Membrain
o Simulations allow examination of how model solved problem.
o Simulator needs to be told:
– Network architecture.
– Training data.
– Learning rate & other parameters.
o Simulator:
– Creates network.
– Performs training.
– Reports results.
o You can examine results.
04/21/23 Neural Networks 184
Tlearn Software
1. Copy win_tlearn.exe from disk or R: drive to N: drive.
2. Double-click on the file to begin installation.
3. Executable is called tlearn.
o http://www.columbia.edu/cu/psychology/courses/3205/tlearn/
To download Adobe Acrobat PDF version: ftp://ftp.crl.ucsd.edu/pub/neuralnets/tlearn/TlearnManual.pdf