Artificial Neural Network (ANN)
o Neural network -- "a machine that is designed to model the way in which the brain performs a particular task or function of interest" (Haykin, 1994, pg. 2).
– Uses massive interconnection of simple computing cells (neurons or processing units).
– Acquires knowledge thru learning.
– Modifies synaptic weights of the network in an orderly fashion to attain a desired design objective.
o Attempts to use ANNs date from the 1950s.
– Abandoned by most researchers by the 1970s.

Artificial Intelligence (AI)
o "A field of study that encompasses computational techniques for performing tasks that apparently require intelligence when performed by humans" (Tanimoto, 1990).
– Goal: to increase our understanding of reasoning, learning, & perceptual processes.
o Fundamental issues:
– Knowledge representation.
– Search.
– Perception & inference.
Traditional AI vs. Neural Networks
Traditional AI:
o Programs brittle & overly sensitive to noise.
o Programs are either right or fail completely.
– Human intelligence much more flexible (guessing).
o http://www-ai.ijs.si/eliza/eliza.html

Neural Networks:
o Capture knowledge in large # of fine-grained units.
o More potential for partially matching noisy & incomplete data.
o Knowledge is distributed uniformly across network.
o Model for parallelism – each neuron is an independent unit.
o Similar to human brains?
Many names for the same family of ideas: Neural Networks, Connectionism, Parallel Distributed Processing, Neuro-computing, Natural Intelligent Systems, Machine Learning Algorithms, Artificial Neural Networks, Biologically Inspired Computing.
Handwriting Neural Network
o http://www.youtube.com/watch?v=qXoVGxjUTtA
o http://www.manifestation.com/neurotoys/eliza.php3
NETtalk (Sejnowski & Rosenberg)
o http://cnl.salk.edu/Media/nettalk.mp3
Human Brain
o "… a highly complex, nonlinear, and parallel computer (information-processing system). It has the capability to organize its structural constituents, known as neurons, so as to perform certain computations (e.g., pattern recognition, perception, and motor control) many times faster than the fastest digital computer in existence today." (Haykin, 1999, Neural Networks: A Comprehensive Foundation, pg. 1)
Approaches to Studying the Brain
o Know enough neuroscience to understand why computer models make certain approximations.
– Understand when approximations are good & when bad.
o Know tools of formal analysis for models.
– Some simple mathematics.
– Access to a simulator or ability to program.
o Know enough cognitive science to have some idea of what the system is supposed to do.

Why Build Models? "… a model is simply a detailed theory."
1. Explicitness – constructing a model of a theory & implementing it as a computer program requires a great level of detail.
2. Prediction – difficult to predict consequences of a model due to interactions between different parts of the model.
– Connectionist models are non-linear.
3. Discover & test new experiments & novel situations.
4. Practical reasons why it is difficult to test a theory in the real world.
– Systematically vary parameters thru full range of possible values.
5. Help understand why a behavior might occur.
• Simulations are open for direct inspection & explanation of behavior.
Simulations As Experiments
o Easy to do simulations, but difficult to do them well.
o Running a good simulation is like running a good experiment:
1. Clearly articulated problem (goal).
2. Well-defined hypothesis, design for testing the hypothesis, & plan for how to analyze the results.
– Hypothesis drawn from current issues in the literature.
– E.g., test predictions, replicate observed behaviors, test theory of behavior.
3. Task, stimulus representations & network architectures must be defined.

What kinds of problems can ANNs help us understand?
o Brain of newborn child contains billions of neurons.
– But child can't perform many cognitive functions.
o After a few years of receiving continuous streams of signals from outside world via sensory systems,
– child can see, understand language & control movements of body.
o Brain discovers, without being taught, how to make sense of signals from world.
o How??? Where do you start?

NN Applications
http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/2000-01/neural-networks/Applications/index.html
o Character recognition
o Image compression
o Stock market prediction
o Traveling salesman problem
o Medicine, electronic nose, loan applications
Neural Networks (ACM)
o Web spam detection by probability mapping GraphSOMs and graph neural networks
o No-reference quality assessment of JPEG images by using CBP neural networks
o An embedded fingerprints classification system based on weightless neural networks
o Forecasting Portugal global load with artificial neural networks
o 2006 Special issue: Neural network forecasts of the tropical Pacific sea surface temperatures
o Developmental learning of complex syntactical song in the Bengalese finch: A neural network model
o Neural networks in astronomy
Artificial & Biological Neural Networks
o Build intelligent programs using models that parallel the structure of neurons in the human brain.
o Neurons – cell body with dendrites & axon.
– Dendrites receive signals from other neurons.
– When combined impulses exceed threshold, neuron fires & impulse passes down axon.
– Branches at end of axon form synapses with dendrites of other neurons.
• Excitatory or inhibitory.
Do Neural Networks Mimic Human Brain?
o "It is not absolutely necessary to believe that neural network models have anything to do with the nervous system, …
o … but it helps.
o Because, if they do, we are able to use a large body of ideas, experiments, and facts from cognitive science and neuroscience to design, construct, and test networks." (Anderson, 1997, p. 1)
Neural Networks Abstract From the Details of Real Neurons
o Conductivity delays are neglected.
o Net input is calculated as weighted sum of input signals.
o Net input is transformed into an output signal via a simple function (e.g., a threshold function).
o Output signal is either discrete (e.g., 0 or 1) or a real-valued number (e.g., between 0 and 1).
ANN Features
o A series of simple computational elements, called neurons (or nodes, units, cells)
o Connections between neurons that carry signals
o Each link (connection) between neurons has a weight that can be modified
o Each neuron sums the weighted input signals and applies an activation function to determine the output signal (Fausett, 1994).
Neural Networks Are Composed of Nodes & Connections
o Nodes – simple processing units.
– Similar to neurons – receive inputs from other sources.
– Excitatory inputs tend to increase neuron's rate of firing.
– Inhibitory inputs tend to decrease neuron's rate of firing.
o Firing rate changes via a real-valued number (activation).
o Input to node comes from other nodes or from some external source.
o Figure: a fully recurrent network vs. a 3-layer feedforward network.
Connections
o Input travels along connection lines.
o Connections between different nodes can have different potency (connection strength) in many models.
– Strength represented by real-valued number (connection weight).
– Input from one node to another is multiplied by connection weight.
o If connection weight is:
– Negative number – input is inhibitory.
– Positive number – input is excitatory.

Nodes & Connections Form Various Layers of NN
A Single Node/Neuron
o Inputs to node are usually summed (Σ).
o Net input passed thru activation function ( f(net) ).
o Produces node's activation, which is sent to other nodes.
o Each input line (connection) represents flow of activity from some other neuron or some external source.
o Figure: inputs from other nodes → Σ → f(net) → outputs to other nodes.
More Complex Model of a Neuron
o Figure (after Haykin): input signals x1, x2, …, xp arrive along the synaptic weights wk1, wk2, …, wkp of neuron k; the summing function produces the linear combiner output uk; the threshold θk is subtracted & the result is passed thru the activation function to give the output yk.
o uk = Σj wkj xj ;  yk = f(uk – θk)
Add up Net Inputs to Node
o Each input (from different nodes) is calculated by multiplying activation value of input node by weight on connection (from input node to receiving node).
neti = Σj wij aj    (net input to node i)
o Σ = sigma (summation over all sending nodes j)
o i = receiving node
o aj = activation on nodes sending to node i
o wij = weight on connection between nodes j & i.

Sums (weights * activation) For All Input Nodes
net4 = Σj w4j aj
o i = 4 (receiving node 4).
o j ranges over the 3 input nodes into node 4.
o Add up w4j * aj for all 3 input nodes.
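As a concrete illustration (a minimal sketch, not from the original slides; the weights & activations below are made-up values), the weighted sum can be computed directly:

def net_input(weights, activations):
    # net_i = sum over j of w_ij * a_j
    return sum(w * a for w, a in zip(weights, activations))

# Node 4 receiving from 3 input nodes (hypothetical weights & activations):
print(net_input([0.5, -0.3, 0.8], [1.0, 1.0, 0.0]))  # 0.2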
Activation Functions: Node Can Do Several Things With Net Input
1. Activation (output) = input.
• f(net) is the identity function.
• Simplest case.
2. Threshold must be achieved before activation occurs.
– Activation function may be a non-linear function of input; resembles a sigmoid (like real neurons).
– Activation function may be linear.
Different Types of NN Possible
1. Single-layer or multi-layer architectures (Hopfield, Kohonen).
2. Data processing thru network.
o Feedforward.
o Recurrent.
3. Variations in nodes.
o Number of nodes.
o Types of connections among nodes in network.
4. Learning algorithms.
o Supervised.
o Unsupervised (self-organizing).
o Back propagation learning (training).
5. Implementation.
– Software or hardware.

Steps in Designing a Neural Network
1. Arrange neurons in various layers.
2. Decide type of connections among neurons for different layers, as well as among neurons within a layer.
3. Decide way a neuron receives input & produces output.
4. Determine strength of connection within network by allowing network to learn appropriate values of connection weights via training data set.
Activation Functions
1. Identity function: f(x) = x for all x.
2. Binary step function: f(x) = 1 if x >= θ; f(x) = 0 if x < θ.
3. Continuous log-sigmoid (logistic) function: f(x) = 1/[1 + exp(-σx)].
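A minimal sketch of the three functions (assuming θ = 0 and σ = 1 as defaults; this example is not part of the original slides):

import math

def identity(x):
    return x                       # 1. activation = input

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0  # 2. fires only at or above threshold

def logistic(x, sigma=1.0):
    return 1.0 / (1.0 + math.exp(-sigma * x))  # 3. squashes input into (0, 1)

print(logistic(1.25))  # ~0.777, matching the worked example later in the deck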
Sigmoid Activation Function
ai = 1 / (1 + e^(–neti))    (EQ 1.2)
– ai = activation (output) of node i
– neti = net activation flowing into node i
– e = exponential
o Gives what the output of the node will be for any given net input.
o Graph of relationship (next slide).
Sigmoid Activation Function Often Used for Nodes in NN
o For inputs of large magnitude (above 4.0 or below -4.0), nodes exhibit all-or-nothing behavior.
– Output max. value of 1 (on).
– Output min. value of 0 (off).
o Within range of –4.0 to 4.0, nodes show greater sensitivity.
– Output capable of making fine discriminations between different inputs.
o Non-linear response is at heart of what makes these networks interesting.

o What will be the activation of node 2, assuming the input you just calculated?
o If node 2 receives input of 1.25, activation is 0.777.
o Activation function scales from 0.0 to 1.0.
o When net input = 0.0, output is exact mid-range of possible activation (0.5).
o Negative inputs yield activations below 0.5.
Example 2-Layered Feedforward Network: Step Thru Process
o Neural network consists of collection of nodes.
– Number & arrangement of nodes defines network architecture.
o Example: 2-layered feedforward network.
– 2 layers (input, output).
– No intra-level connections.
– No recurrent connections.
– Single connection into input nodes & out of output nodes.
o Very simplified in comparison to biological neural network!
o Figure: input nodes a0, a1 feed output node a2 via weights w20, w21.

o Each input node has certain level of activity associated with it.
– 2 input nodes (a0, a1).
– 2 output nodes (a2, a3).
o Look at one output unit (a2).
– Receives input from a0 & a1 via independent connections.
– Amount depends on activation values of input nodes (a0 & a1) and weights (w20, w21).
o For this network, activity flows in one direction along connections.
– E.g., w20 exists but w02 doesn't.
– In wij, i is the receiving node & j the sending node (w20 = weight from node 0 into node 2).
o Total input to node 2 (a2) = w20a0 + w21a1.
Exercise 1.1
o What is the input received by node 2?
o Figure: a0 = 1, a1 = 1, with weights w20 = 0.75, w21 = 0.5.
o Net input for node 2 = (1.0 * 0.75) + (1.0 * 0.5) = 1.25.
o Net input alone doesn't determine activity of output node.
o Must know activation function of node.
o Assume nodes have activation function shown in EQ 1.2 (& Fig. 1.3).
o Next slide shows sample inputs & activations produced, assuming logistic activation function.
Bias Node (Default Activation)
o In absence of any input (i.e., input = 0), nodes have output of 0.5.
o Useful to allow nodes to have default activation.
– Node is "off" (output 0.0) in absence of input.
– Or can have default state where node is "on".
o Accomplish this by adding a node to the network which receives no inputs, but is always fully activated & outputs 1.0 (bias node).
– Node can be connected to any node in network.
– Often connected to all nodes except input nodes.
– Allow weights on connections from this node to receiving nodes to be different.
o Guarantees that all receiving nodes have some input even if all other nodes are off.
o Since output of bias node is always 1.0, input it sends to any other node is 1.0 * wij (value of weight itself).
o Only need one bias node per network.
o Similar to giving each node a variable threshold.
– Large negative bias == node is off (activation close to 0.0) unless it gets sufficient positive input from other sources to compensate.
– Large positive bias == receiving node is on & requires negative input from other nodes to turn it off.
o Useful to allow individual nodes to have different defaults (a small sketch follows).
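A minimal sketch of the idea (the weights here are hypothetical, not from the slides): the bias node is just an extra input clamped to 1.0.

def net_input_with_bias(weights, activations, w_bias):
    # The bias node always outputs 1.0, so it contributes 1.0 * w_bias.
    return w_bias * 1.0 + sum(w * a for w, a in zip(weights, activations))

# Large negative bias: node stays off unless other inputs compensate.
print(net_input_with_bias([0.75, 0.5], [0.0, 0.0], w_bias=-2.0))  # -2.0
print(net_input_with_bias([0.75, 0.5], [1.0, 1.0], w_bias=-2.0))  # -0.75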
Learning From Experience
• Changing a neural network's connection weights (training) causes the network to learn the solution to a problem.
• Strength of connection between neurons stored as weight-value for specific connection.
• System learns new knowledge by adjusting these connection weights.
Three Training Methods for NN
1. Unsupervised learning – hidden neurons must find a way to organize themselves without help from outside.
• No sample outputs provided to network against which it can measure its predictive performance for a given vector of inputs.
• Learning by doing.
2. Supervised learning (reinforcement) – works on reinforcement from outside.
• Connections among neurons in hidden layer randomly arranged, then reshuffled as network is told how close it is to solution.
• Requires teacher -- training set of data, or observer who grades performance of network results.
• Both unsupervised & supervised suffer from relative slowness & inefficiency, relying on random shuffling to find proper connection weights.
3. Back propagation.
o Network given reinforcement for how it is doing on task, plus information about errors is used to adjust connections between layers.
– Proven highly successful in training of multilayered neural nets.
– Form of supervised learning.
Example Learning Algorithms
1. Hebb's Rule -- how physical networks might learn.
2. Perceptron Convergence Procedures (PCP).
– Widrow-Hoff Learning Rule (1960s).
3. Hopfield.
4. Backpropagation of Error (Generalized Delta Rule).
5. Kohonen's Learning Laws (not covered here).
McCulloch-Pitts (1943) Neuron
1. Activity of neuron is an "all-or-none" process.
2. Certain fixed number of synapses must be excited within the period of latent addition to excite the neuron at any time.
o Number is independent of previous activity & position of neuron.
3. Only significant delay within nervous system is synaptic delay.
4. Activity of any inhibitory synapse absolutely prevents excitation of neuron at that time.
5. Structure of net does not change with time.

McCulloch-Pitts Neuron
o Firing within a neuron is controlled by a fixed threshold (θ).
o Binary step function: f(x) = 1 if x >= θ; f(x) = 0 if x < θ.
o What happens here if θ = 2?
McCulloch-Pitts Neuron: AND
P Q | P ∧ Q
T T |   T
T F |   F
F T |   F
F F |   F
Threshold = 2. Does a2 fire?

McCulloch-Pitts Neuron: OR
P Q | P ∨ Q
T T |   T
T F |   T
F T |   T
F F |   F
Threshold = 2. Does a2 fire?

McCulloch-Pitts Neuron: XOR
P Q | P XOR Q
T T |    F
T F |    T
F T |    T
F F |    F
Threshold = 2. Does a2 fire?
McCulloch-Pitts Neuron: AND NOT
o Did you get weights of 2 for w20 and -1 for w21?
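A minimal sketch of a McCulloch-Pitts unit with θ = 2 (the AND NOT weights are the ones from the slide; the AND & OR weights are the usual textbook choices, stated here as assumptions):

def mp_neuron(inputs, weights, theta=2):
    # Binary step: fire (1) iff the weighted sum reaches the threshold.
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

for p in (1, 0):
    for q in (1, 0):
        print(p, q,
              mp_neuron([p, q], [1, 1]),   # AND: both inputs needed to reach 2
              mp_neuron([p, q], [2, 2]),   # OR: either input alone reaches 2
              mp_neuron([p, q], [2, -1]))  # P AND NOT Q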
McCulloch-Pitts Neuron
o http://lcn.epfl.ch/tutorial/english/mcpits/html/index.html
o No learning algorithms.
Hebb: The Organization of Behavior (1949)
o "When an axon of cell A is near enough to excite a cell B & repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
o If a neuron receives input from another neuron & both are highly active, the weight between the neurons should be strengthened.
– Specific synaptic change (the Hebb synapse) which underlies learning.
o Result was interconnections between large, diffuse sets of cells in different parts of brain, called "cell assemblies."
o Changes suggested by Rochester et al. (1956) make a more practical model.
Hebb's Rule: Associative Learning
"Cells that fire together, wire together."
Δwij = ai aj
– Change in weight = product of activations of the two connected nodes.
Δwij = η ai aj
– where η is the learning rate.
o Unsupervised learning.
o Success at learning some patterns.
– But it only learns these patterns (e.g., pair-wise correlations). There will be times when we want an ANN to learn to associate a pattern with some desired behavior even when there is no pair-wise correlation.
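A minimal sketch of the rule (the co-activation history below is made up, not from the slides):

def hebb_update(w, a_i, a_j, eta=0.1):
    # Delta w_ij = eta * a_i * a_j: strengthen only when both nodes are active.
    return w + eta * a_i * a_j

w = 0.0
for a_i, a_j in [(1, 1), (1, 1), (0, 1), (1, 0)]:  # toy activation pairs
    w = hebb_update(w, a_i, a_j)
print(round(w, 2))  # 0.2 -- grew only on the two co-active presentations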
Pros & Cons of Hebbian Learning
Pros:
o Known biological mechanisms that might use Hebbian learning.
o Provides reasonable answer to "where does teacher info for learning process come from?"
– Lots of useful info in correlated activity.
– System just needs to look for patterns.
Cons:
o All it can learn is pair-wise correlations.
o May need to learn to associate patterns with desired behaviors even if patterns aren't pair-wise correlated.
– Hebb rule can't do this.
Perceptron Convergence Procedures (PCP)
o Variations of Hebb's Rule from 1960s.
– Perceptron (Rosenblatt, 1958).
– Widrow-Hoff rule is similar to PCP (1960).
o Start with network of units with connections initialized to random weights.
o Take target set of input/output patterns & adjust weights automatically so at end of training weights yield correct outputs for any input.
– Network should generalize to produce correct output for input patterns it hasn't seen during training.
o AKA gradient descent rule, Delta rule, or Adaline rule.
o http://lcn.epfl.ch/tutorial/english/perceptron/html/index.html
Widrow-Hoff Rule
o Starts with connections initialized to random weights; input patterns are presented to the network one at a time.
o For each input pattern, the network's actual output is compared to the target output for that pattern.
o Any discrepancy (error) used as basis for changing weights on input connections & changing output node's threshold for activation.
o How much weights are changed depends on error produced & activation from given input.
– Correction is proportional to error signal multiplied by value of activation given by derivative of transfer function.
– Using derivative allows making finely tuned corrections when activation is near its extreme values (minimum or maximum) & larger corrections when activation is in middle range.
o Goal of the Widrow-Hoff Rule is to minimize error on output unit by apportioning credit & blame to the input nodes.
o Only works for simple, 2-layer networks (I/O units).
o (Figure 18: Supervised (Delta Rule) vs. Unsupervised (Perceptron) Learning; www.willamette.edu/~gorr/classes/cs449/Classification/delta.html)
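A minimal sketch of one such update for a single logistic output unit (a hedged illustration, not tlearn's actual implementation; initial weights & learning rate are arbitrary):

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def widrow_hoff_step(weights, bias, inputs, target, eta=0.1):
    out = logistic(bias + sum(w * a for w, a in zip(weights, inputs)))
    # Error scaled by the derivative of the transfer function, out * (1 - out):
    # large corrections mid-range, small ones near the extremes.
    delta = (target - out) * out * (1.0 - out)
    weights = [w + eta * delta * a for w, a in zip(weights, inputs)]
    return weights, bias + eta * delta  # bias node's activation is always 1.0

# Train the 2-layer net on Boolean AND:
weights, bias = [0.1, -0.1], 0.0
for _ in range(5000):
    for x, t in [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]:
        weights, bias = widrow_hoff_step(weights, bias, x, t)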
Using Similarity
o Basic principle that drives learning.
o Allows generalization of behaviors because similar inputs tend to yield similar outputs.
o 11110000 vs. 11110001.
o "make" and "bake"; "made" and "baked".
o Cats and tigers.
o Similarity is generally a good rule of thumb, but not in every case.
o Hebbian networks & basic, 2-layer PCP networks can only learn to generalize on basis of physical similarity.
2-layer Perceptron Can't Solve Problem of Boolean XOR
o If we want output to be true (1):
– At least 1 input must be 1 & at least 1 weight must be large enough so when multiplied, output node turns on.
o For patterns (00 & 11) we want 0, so set weights to 0.
o For patterns (01 & 10), need weights from either input large enough so 1 input alone activates output.
o Contradictory requirements -- no set of weights allows output to come on if either input is on & keeps it off if both are on!

Node 0  Node 1  XOR
0       0       0
1       0       1
0       1       1
1       1       0

o Figure: inputs a0, a1 feed output a2 via weights w20, w21.
Vectors
o Vector -- collection of numbers, or point in space.
o Can think of inputs in XOR example as 2-D space.
– With each number indicating how far out along the dimension the point is located.
o Judge similarity of 2 vectors by Euclidean distance in space.
– Pairs of patterns furthest apart & most dissimilar (00 & 11) are the ones that need to be grouped together for the XOR function.
o Figure: the four input patterns (0,0), (0,1), (1,0), (1,1) plotted as corners of the unit square.
o I/O weights impose linear decision bound on input space.
– Patterns which fall on 1 side of decision line classified differently than patterns on other side.
o When groups of inputs can't be separated by line, no way for unit to discriminate between categories.
– Problems called non-linearly separable.
o What's needed are hidden units & learning algorithms that can handle more than one layer.
o Figure: decision lines over the unit square of input patterns (0,0), (0,1), (1,0), (1,1) for AND, OR & XOR -- AND & OR are linearly separable; XOR is not.
Solving the XOR Problem: Allow Internal Representation
o Add extra node(s) between I & O, & the XOR problem is solved.
o "Hidden" units equivalent to internal representations & aren't seen by world.
– Very powerful -- networks have internal representations that capture more abstract, functional relationships.
o Inputs (sensors), outputs (motor effectors) & hidden (inter-neurons).
o Input similarity still important.
– All things being equal, physical resemblance of inputs exerts strong pressure to induce similar responses.
Hidden Units & XOR Problem
o (a) What input looks like to network, showing intrinsic similarity structure of inputs.
o Input vectors are passed through weights between inputs & hidden units (multiplied); this transforms (folds) input space to produce (b).
o (b) 2 most distinct patterns (11, 00) are close in hidden space.
o Weights to output unit can impose linear decision bound & classify output (c).
o Figure: input space (a) → hidden unit space (b) → output (c), over the patterns 0,0 / 0,1 / 1,0 / 1,1.
Hidden Units Used to Construct Internal Representations of External World
o Hidden units make it possible for network to treat physically similar inputs as different, as needed.
– Transform input representations to more abstract kinds of representations.
– Solve difficult problems like XOR.
o However, being able to solve problem just means that some set of weights exists -- in principle.
– Network must be able to learn these weights!
o Real challenge is how to train networks!
– One solution -- backpropagation of error. (One hand-wired set of weights for XOR is sketched below.)
Earlier Laws (PCP) Can't Handle Hidden Layers Since They Don't Know How to Change Weights To Them
o PCP & others work well for weights leading to outputs, since we have a target for the output & can calculate weight changes.
o Problem occurs when have hidden units -- how to change weights from inputs to hidden units?
– With these algorithms, must know how much error is already apparent at level of hidden units before output is activated.
– Don't have predefined target for hidden units, so can't say what their activation levels should be.
– Can't specify error at this level of network.
Hopfield
o Recurrent ANN.
o Guaranteed to converge to a local minimum, but convergence to one of the stored patterns is not guaranteed.
o http://www.cbu.edu/~pong/ai/hopfield/hopfieldapplet.html
Backpropagation of Error, AKA Generalized Delta Rule (δ) (Rumelhart, Hinton & Williams, 1986)
o Begin with network which has been assigned initial weights drawn at random.
– Usually from uniform distribution with mean of 0.0 & some user-defined upper & lower bounds (e.g., ±1.0).
o User has set of training data in form of input/output pairs.
o Goal of training -- learn single set of weights such that any input pattern will produce correct output pattern.
– Desirable if weights allow network to generalize to novel data not seen during training.
Backprop
o Extremely powerful learning tool.
– Applied over wide range of domains.
o Provides very general framework for learning.
– Implements gradient descent search in space of possible network weights to minimize network error.
o What counts as error is up to modeler.
– Usually squared difference between target & actual output, but any quantity that is affected by weights may be minimized.

Backprop Training Takes 4 Steps
1. Select I/O pattern (usually at random).
2. Compare network's output with desired output (teacher pattern) on node-by-node basis & calculate error for each output node.
3. Propagate error info backwards in network from output to hidden.
4. Adjust weights on connections to reduce errors.
1. Select I/O pattern
o Pattern usually selected at random.
o Input pattern used to activate network & activation values for output nodes are calculated.
o Can have additional nodes between I/O ("hidden").
o Since weights selected at random, outputs generated at start are typically not those that go with input pattern.
2. Calculate Delta (δip) Error (EQ 1.3)
δip = (tip – oip) f'(netip) = (tip – oip) oip (1 – oip)
o δip = difference between target for node i on training pattern p (tip) and actual output for that node on that pattern (oip),
o multiplied by derivative of output node's activation function given its input.
– f'(netip) = slope of activation function.
– EQ 1.2, Fig. 1.3 -- steepest around middle of function, where net input is closest to 0.
o For large values of net input to node (positive or negative), derivative is small.
– δip will be small.
– Net input to node tends to be large when connections feeding into it are strong.
o Weak connections tend to yield small net input to node.
– There the derivative of the activation function is large & δip can be large.
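A minimal sketch of EQ 1.3 for a logistic output node, showing how the derivative term scales the correction (the sample values are made up):

def output_delta(target, output):
    # EQ 1.3: delta = (t - o) * f'(net), where f'(net) = o * (1 - o) for the logistic.
    return (target - output) * output * (1.0 - output)

print(output_delta(1.0, 0.5))   # 0.125   -- mid-range output, large correction
print(output_delta(1.0, 0.99))  # ~0.0001 -- saturated output, tiny correction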
Weight Changes in the Delta Rule
o Figure: error surface over two weights (x & y), showing the current weight vector, the delta vector, the new weight vector & the ideal weight vector.
Gradient Descent Learning Rule
o Moves weight vector from current position on bowl to new position closer to minimum error by falling down the negative gradient of the bowl.
o Not guaranteed to find correct answer.
– Always goes downhill & may get stuck in local minimum.
o Use momentum to "push" changes in same direction & possibly keep network from getting stuck.
Backprop: Calculate Weight Adjustments
o Know, for each output node, how far off target value is.
o Must adjust weights on connections that feed into it to reduce error.
– Want to change weight on connections from every node j coming into current node i so as to reduce error on pattern:
Δwij = –η (∂E/∂wij), where ∂E/∂wij is the partial derivative (EQ 1.4)
– η = learning rate; E = error; wij = weights.
o Partial derivative – rate of change.
– May be other variables, but they're being held constant.
o Measures how quantity on top changes when quantity on bottom is changed.
– I.e., how is error (E) affected by changing weights (w)?
o If we know this, we know how to change weight to decrease error.
– I.e., to decrease discrepancy between what network outputs & what we want it to output.
o Partial derivative is bell-shaped for sigmoidal curves (threshold function).
– Large values are in the mid-range.
o Contributes to stability of network – as outputs approach 0 or 1, only small changes occur.
o Helps compensate for excessive blame attached to hidden nodes.
o η = learning rate.
o Converting the partial derivative in EQ 1.4 gives EQ 1.5.
Backprop: Delta Rule (EQ 1.5): Δwij = η δip ojp
o Make changes small -- learning rate (η) set to less than 1.0 so that changes aren't too drastic.
– Change in weight depends on error we have for unit (δip).
o Take output into account (ojp) since node's error is related to how much (mis)information it has received from another node.
• If node j is highly active & contributed lots to current activation, then it is responsible for much of current error.
• If node j is inactive, it won't contribute to i's error.

Delta Rule continued
Δwij = η δip ojp
o δip reflects error on unit i for input pattern p.
– Difference between target & output.
– Also includes the partial derivative (EQ 1.4).
o Calculate errors on all output nodes & weight changes on connections coming into them.
– Don't yet make any changes.
3. Propagate Error Info Backwards From Output to Hidden
o Assign shared blame to a hidden unit on basis of:
– What errors are on the output units the hidden unit is activating, and
– Strength of connection between the hidden unit & each output it connects to.
o Move to hidden layer(s), if any, & use EQ 1.5 to change weights leading into hidden units from below.
– Can't use EQ 1.3 to compute hidden nodes' errors, since no given target to make comparison with.
– Hidden nodes "inherit" errors of all nodes they've activated.
– If nodes activated by hidden unit have large errors, then hidden unit shares blame.
o Calculate hidden unit's error by summing up errors of the nodes it activates, each multiplied by the weight between the nodes (since the weight determines its effect):
δip = f'(netip) Σk δkp wki
– i = hidden node; p = current pattern.
– k indexes the output nodes feeding error back to hidden node i.
– Derivative of hidden unit's activation function is multiplied in.
o Continues iteratively down thru network (backpropagation of error)…
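Putting the four steps together, a minimal sketch for a network with one hidden layer (biases omitted for brevity; this illustrates the procedure, not tlearn's code):

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_sweep(x, t, W_h, W_o, eta=0.1):
    # W_h[i][j]: weight from input j to hidden i; W_o[k][i]: hidden i to output k.
    # Step 1 (forward pass for the selected pattern).
    h = [logistic(sum(w * a for w, a in zip(row, x))) for row in W_h]
    o = [logistic(sum(w * a for w, a in zip(row, h))) for row in W_o]
    # Step 2: output deltas (EQ 1.3).
    d_o = [(tk - ok) * ok * (1 - ok) for tk, ok in zip(t, o)]
    # Step 3: hidden units inherit error from the outputs they activate.
    d_h = [hi * (1 - hi) * sum(d_o[k] * W_o[k][i] for k in range(len(W_o)))
           for i, hi in enumerate(h)]
    # Step 4: weight changes (EQ 1.5), delta_w = eta * delta_i * o_j.
    W_o = [[w + eta * d_o[k] * h[i] for i, w in enumerate(row)]
           for k, row in enumerate(W_o)]
    W_h = [[w + eta * d_h[i] * x[j] for j, w in enumerate(row)]
           for i, row in enumerate(W_h)]
    return W_h, W_o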
4. Adjust Weights on Connections to Reduce Errors
o When reach layer above input layer (no incoming weights), actually impose the weight changes.
o Figure: error flows backwards thru the network.
Backprop Pros & Cons
Pros:
o Extremely powerful learning tool that is applied over wide range of domains.
o Provides very general framework for learning.
– Implements gradient descent search.
o What counts as error is up to modeler.
– Usually squared difference between target & actual output.
– Any quantity that is affected by weights may be minimized.
Cons:
o Requires large # of presentations of input data to learn.
o Each presentation requires 2 passes thru network (forward & backward).
o Each pass is computationally complex.
Kohonen
3 Ways Developmental Models Handle Change
1. Development results from working out predetermined behaviors. Change is the triggering of innate knowledge.
2. Change is inductive learning. Learning involves copying or internalizing behaviors present in the environment.
3. Change arises through interaction of maturational factors, under genetic control, and environment.
• Progress in neurosciences.
• Computational framework good for exploring & modeling.
Biologically-Oriented Connectionism (Elman, et al)
1. We think it is critical to pay attention to what is known about genetic basis for behavior & about developmental neuroscience.
2. At the level of computation & modeling, we believe it is important to understand the sorts of computations that can plausibly be carried out in neural systems.
3. We take a broad view of biology which includes concern for evolutionary basis for behavior.
4. A broader biological perspective emphasizes adaptive aspects of behaviors & recognizes that to understand adaptation requires attention to environment.
Connectionist Models
o Cognitive functions performed by system that computes with simple neuron-like elements, acting in parallel, on distributed representations.
1. Have precisely matched data from human subject experiments.
– Measure speed of reading words – depends on frequency of word & regularity of pronunciation pattern (e.g., GAVE, HAVE).
• Similar pattern (humans – latency, NN – errors).
• Fig. P.1 on pg. 3 (McLeod, Plunkett, Rolls).
2. Connectionist models can predict results.
– Suggest areas of investigation.
– E.g., U-shaped learning or over-generalization problems when kids learn past tense of verbs (WENT – GOED) suggest linguistic development occurs in stages.
– NN model produced over-regularization errors.
– Fig. P.2 (McLeod, Plunkett, Rolls).
3. Connectionist models have suggested solutions to some of the oldest problems in cognitive science.
• E.g., face recognition from various angles.
• View invariance – respond to one particular face (regardless of view) & not the other faces.
• E.g., face 3 in Fig. P.3 (McLeod, Plunkett, Rolls).
Task
o When train network, want it to produce some behavior.
o Task – the behavior that we are training the network to do.
– E.g., associate present tense form of verb with past tense form.
o Task must be precisely defined – for the class of networks we're dealing with, learning the correct output for a given input (the training environment).
– Set of input stimuli.
– Correct output is paired with each input.
Implications of Defining the Task
o Must conceptualize behavior in terms of inputs & outputs.
– May need abstract notion of input & output.
– E.g., associate 2 forms of verb – neither is really input for other.
o Teach network task by example, not by explicit rule.
– If successful, network learns underlying relationship between input & output by induction.
– Can't assume network has learned the generalization we assume underlies the behavior – it may have learned some other behavior! E.g., the 1980s story of the Pentagon training a NN to recognize tanks, where the network reportedly latched onto incidental properties of the photos instead.
Implications - 2
o Nature of training data is extremely important for learning.
– The more data you give a network, the better.
– With too little data, may make bad generalizations.
– Quality counts too!! – structure of environment influences outcome.
o Some tasks are more convincing/more effective/more informative than others for demonstrating a point.
– Is info represented in teacher (output) plausibly available to human learners?
– E.g., children? See task on next slide.
Two Ways to Teach Network to Segment Sounds into Words
1. Expose network to sequences of sounds (presented one at a time, in order, with no breaks between words).
• Train network to produce "yes" when sequence makes a word.
• Explicitly learns about words from info about where words start.
2. Train network on different task – given same sequences of sounds as input, but task is to predict next sound.
• At beginning of word, network makes many mistakes.
• As it hears more of word, prediction error declines until end of word.
• Learns about words implicitly as indirect consequence of task.
o First approach gives away the secret by directly teaching the task (boundary info), which is NOT how children learn.
Network Architectures: Number & Arrangement of Nodes in Network
1. Single-layer feedforward network -- input layer that projects onto output layer of neurons in one direction.
2. Multilayer feedforward network -- has 1+ hidden layers that intervene between external input & network output.
3. Recurrent network -- has at least 1 feedback loop.
4. Lattice structure -- 1-D, 2-D or greater arrays of neurons, with output neurons arranged in rows & columns.
Most Neural Networks Consist of 3 Layers
6 Different Types of Connections Used Between Layers (Inter-layer Connections)
1. Fully connected. Each neuron on first layer is connected to every neuron on second layer.
2. Partially connected. Neuron of first layer does not have to be connected to all neurons on second layer.
3. Feedforward. Neurons on first layer send their output to neurons on second layer, but receive no input back from neurons on second layer.
4. Bi-directional (recurrent). Another set of connections carrying output of neurons of second layer into neurons of first layer.
5. Hierarchical. Neurons of lower layer may only communicate with neurons on next level of layer.
6. Resonance. Layers have bi-directional connections.
– Can continue sending messages across connections a number of times until a certain condition is achieved.
How to Select Correct Network Architectures
o Any task can be solved by some neural network (in theory) – but not any neural network can solve any task.
o Number & arrangement of nodes defines network architecture.
o Textbook uses: 1) feedforward & 2) simple recurrent networks.
o # of nodes depends on task & how I/O are represented.
– E.g., if images are input as a 100x100 dot array -- 10,000 input nodes.
o Selection of architecture reflects modeler's theory about what info processing is required for task.
Analysis
1. Train network on task.
2. Evaluate network's performance & try to understand basis for performance.
o Need to anticipate kinds of tests before training!

Ways to evaluate network performance:
1. Global error.
2. Individual pattern error.
3. Analyzing weights & internal representations.
Evaluate Network Performance: Global Error
o During training, simulator calculates discrepancy between actual network output activations & target activations it is being taught to produce.
o Simulator reports this error on-line -- summed over a number of patterns.
– As learning occurs, error should decline & approach 0.
o If network is trained on task in which same input can produce different outputs, then network can learn correct probabilities, but error rate never reaches 0.
Evaluate Network Performance: Individual Pattern Error
o Global error can be misleading.
– If have large # of patterns to learn, global error may be low even if some patterns are not learned correctly.
– These may be the interesting patterns.
o Also may want to create special test stimuli not presented to network during training.
– Generalize to novel cases?
– What has network learned?
o Helps discover what generalizations have been created from a finite data set.
Evaluate Network Performance: Analyzing Weights & Internal Representations
1. Hierarchical clustering of hidden unit activations.
2. Principal component analysis & projection pursuit.
3. Activation patterns in conjunction with actual weights.
Hierarchical Clustering of Hidden Unit Activations
o Present test patterns to network after training.
o Patterns produce activations on hidden units, which we record & tag -- vectors in multi-dimensional space.
o Clustering looks at similarity structure of space.
o Inputs treated as similar by network produce internal representations that are similar.
o Produces tree format of inter-pattern distances.
o Can't examine space directly -- difficult to visualize high-dimensional spaces.
Principal Component Analysis & Projection Pursuit
o Used to identify interesting lower-dimensional slices of the high-dimensional hidden unit space.
o Move viewing perspective around in this space.
Activation Patterns in Conjunction With Actual Weights
o When look at activation patterns, only see part of what network "knows."
o Network manipulates & transforms info via connections between nodes.
o Examine connections & weights to see how transformations are being carried out.
o Hinton diagrams can be used -- weights shown as squares, with color & size of square representing sign & magnitude of connection.
Hinton Diagram
o White = positive weight. Black = negative weight.
o Area of box proportional to absolute value of corresponding weight.
What Do We Learn From a Simulation?
o Are the simulations framed in such a way that they clearly address some issue?
o Are the task & stimuli appropriate for the points being made?
o Do you feel you've learned something from the simulation?
Uses of Neural Networks
o Prediction -- Use input values to predict some output. E.g., pick best stocks, predict weather, identify people at risk of cancer.
o Classification -- Use input values to determine classification. E.g., is input letter an A; is blob of video data a plane & what kind?
o Data association -- Recognize data that contains errors. E.g., identify characters when scanner is not working properly.
o Data conceptualization -- Analyze inputs so that grouping relationships can be inferred. E.g., extract from database the names most likely to buy a product.
o Data filtering -- Smooth an input signal. E.g., take the noise out of a telephone signal.
Send In The Robots
http://www.spacedaily.com/news/robot-01b.html
by Annie Strickler and Patrick Barry for NASA Science News
Pasadena - May 29, 2001
o As a project scientist specializing in artificial intelligence at NASA's Jet Propulsion Laboratory (JPL), Ayanna is part of a team that applies creative energy to a new generation of space missions -- planetary and moon surface explorations led by autonomous robots capable of "thinking" for themselves.
o Nearly all of today's robotic space probes are inflexible in how they respond to the challenges they encounter (one notable exception is Deep Space 1, which employs artificial intelligence technologies). They can only perform actions that are explicitly written into their software or radioed from a human controller on Earth.
o When exploring unfamiliar planets millions of miles from Earth, this "obedient dog" variety of robot requires constant attention from humans. In contrast, the ultimate goal for Ayanna and her colleagues is "putting a robot on Mars and walking away, leaving it to work without direct human interaction."
o "We want to tell the robot to think about any obstacle it encounters just as an astronaut in the same situation would do," she says. "Our job is to help the robot think in more logical terms about turning left or right, not just by how many degrees." …
o To do this, Ayanna relies on 2 concepts in the field of artificial intelligence: "fuzzy logic" & "neural networks." …
o Neural networks also have the ability to learn from experience. This shouldn't be too surprising, since the design of neural networks mimics the way brain cells process information.
o "Neural networks allow you to associate general input to a specific output," Ayanna says. "When someone sees four legs and hears a bark (the input), their experience lets them know it is a dog (the output)." This feature of neural networks will allow a robot pioneer to choose behaviors based on the general features of its surroundings, much like humans do.
o By combining these two technologies, Ayanna and her colleagues at JPL hope to create a robot "brain" that can learn on its own how to expertly traverse the alien terrains of other planets.
o Such a brainy 'bot might sound more like the science fiction fantasies of children's comics than a real NASA project, but Ayanna thinks the sci-fi flavor of the project contributes to its importance for space exploration.
o Ayanna -- who wanted to be television's "Bionic Woman" when she was young, and later decided she wanted to try to build her instead -- says she believes that the flights of imagination common in childhood translate into adult scientific achievement.
o "I truly believe science fiction drives real science forward," she says. "You must have imagination to go to the next level."
Learning to Use tlearn
o Define task.
o Define architecture.
o Set up simulator.
– Configuration (.cf) file.
– Data (.data) file.
– Teach (.teach) file.
o Check architecture.
o Run simulation.
– Global error.
– Pattern error.
o Examine weights.
– Role of start state.
– Role of learning rate.
o Try:
– Logical OR.
– Exclusive OR.
Define Task
o Train neural network to map Boolean functions AND, OR, EXCLUSIVE OR (XOR).
o Boolean functions take set of inputs (1, 0) & decide if given input falls into positive or negative category.
o Input & output are activation values of nodes in network with 2 input units & 1 output unit.
o Networks simple & relatively easy to construct for task.
o Many of the problems encountered with this task have direct implications for more complex problems.
Boolean Functions AND, OR, XOR
Input Activations     Output Activations (Node 3)
Node 0   Node 1   |   AND   OR   XOR
0        0        |    0     0    0
0        1        |    0     1    1
1        0        |    0     1    1
1        1        |    1     1    0

4 possible input combinations (2^2).
Define Architecture for AND Function
o 4 input patterns & 2 distinct outputs.
– Each input pattern has 2 activation values.
– Each output has single activation.
– For every input pattern, have well-defined output.
o Use simple feedforward network with 2 input units & 1 output unit.
o Single-layer perceptron – 1 layer of weights.
o Figure: inputs a0, a1 feed output a2 via weights w20, w21.
1. Network menu – New Project option.
2. New Project dialogue box appears.
3. Select directory or folder in which to save your project files. Use N: drive!
4. Call the project "and". All files associated with the project should have the same name (any name you want).
5. Get 3 windows on screen – each used for entering info relevant to a different aspect of the network architecture.
– and.teach – defines output patterns to network, how many & format.
– and.data – defines input patterns to network, how many & format.
– and.cf – used to define # of nodes in network & initial pattern of connectivity between nodes before training.
Info Stored in .cf, .data & .teach Files
o Can use editor of tlearn, or a text editor or word processor.
– Must save files in ASCII (text) format.
o Enter data for and.cf file.
– Follow upper- & lower-case distinctions, spaces & colons exactly.
– Use delete or backspace keys to correct errors.
o File Save command in tlearn.
o The AND task: 1 AND 1 = 1; 0 AND 0 = 0; 0 AND 1 = 0; 1 AND 0 = 0.
o Figure: the INPUT (.data), OUTPUT (.teach) & CONFIGURATION (.cf) files together define the project.
The .cf file is key to setting up the simulator. It describes the configuration of the network & conforms to a fairly rigid format, with 3 sections: NODES:, CONNECTIONS: & SPECIAL:.
NODES:
NODES:             beginning of nodes section
nodes = 1          # of units in network (not counting inputs)
inputs = 2         # of input units (counted separately)
outputs = 1        # of output units in network
output node is 1   identifies the output unit – the only non-input node in the network. Node numbering starts at 1.
o Inputs don't count as nodes.
o Output nodes are given as a <node-list>.
o Spaces are critical.
CONNECTIONS:
CONNECTIONS:   beginning of section
groups = 0     how many groups of connections are constrained to have the same value
1 from i1-i2   node 1 (the output) receives input from the 2 input units. Input units are given the prefix i.
1 from 0       node 0 is the bias unit, which is always on; so node 1 has a bias.
o All connections in a group are identical in strength.
– groups = 0 is common.
o <node-list> from <node-list> provides info about connections.
– <node-list> is a comma-separated list of node #s, with dashes indicating that intermediate node #s are included.
– 1 from i1-i2
– Contains no spaces.
– Nodes are numbered counting from 1.
o Inputs are numbered, counting from 1, with the i prefix.
o Node 0 always outputs a 1 & serves as the bias node.
– If biases are desired, connections must be specified from node 0 to specific other nodes.
– 1 from 0
SPECIAL:
SPECIAL:              beginning of section
selected = 1          which units are selected for special printout; output node (1) is selected.
weight_limit = 1.00   sets start weights (from inputs to output & bias to output) randomly in the range ±0.5.
o Optional lines can specify:
– linear = <node-list>     some nodes are linear
– bipolar = <node-list>    values range from –1 to 1
– selected = <node-list>   nodes selected for special printout
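Putting the three sections together, the complete and.cf file described above reads:

NODES:
nodes = 1
inputs = 2
outputs = 1
output node is 1
CONNECTIONS:
groups = 0
1 from i1-i2
1 from 0
SPECIAL:
selected = 1
weight_limit = 1.00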
Data (.data) File
o Defines input patterns presented to tlearn.
o First line is either:
– distributed (normal) – set of vectors with i values each.
– localist (only a few of many input lines are non-zero).
o Second line is an integer specifying the number of input vectors to follow.
o Remainder of file consists of the input.
– Integers or floating-point numbers.
Teach (.teach) File
o Required whenever learning is to be performed.
o First line: distributed (normal) or localist (only a few of many target values are non-zero).
o Second line: integer specifying # of output vectors to follow.
o Ordering of output patterns matches ordering of corresponding input patterns in .data file.
o In normal (distributed) mode, each output vector contains o floating-point or integer numbers.
– o = number of outputs in network.
– Can use * instead of a floating-point number to indicate "don't care".
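The matching and.teach file for the AND task (targets listed in the same order as the .data patterns above):

distributed
4
0
0
0
1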
Checking the Architecture
o If info was typed into the and.cf, and.data & and.teach files correctly, should have no problems.
o tlearn offers a check of and.cf by displaying a picture of the network architecture.
– Displays menu, Network Architecture option.
– Can change how the nodes are displayed, but doesn't change contents of network configuration file.
o Get error message if mistake in syntax of training files.
o Does not find incorrect entries in the data!!
Running the Simulation
o Specify 3 input files (.cf, .data, .teach) & save them.
o Specify parameters for tlearn to determine initial start state of network, learning rate, & momentum.
o Network menu, Training options.
o # of training sweeps before stopping.
– A training sweep is 1 presentation of an input pattern, causing activation to propagate thru the network & appropriate weight adjustments to be carried out.
o Order in which patterns are presented to network is determined by:
– Train sequentially – presents patterns in the order they appear in .data & .teach files.
– Train randomly – presents patterns in random order.
o Learning rate – determines how fast weights are changed in response to a given error signal.
– Set to 0.100.
o Momentum – discussed later.
– Set to 0.0.
o Initial state of network determined by weight values assigned to connections before training starts.
– .cf file specifies weight_limit.
o Weights assigned according to the random seed indicated by the number next to the Seed with: button.
– Select any number you like.
– Simulation can be replicated using the same random seed – initial start weights of network are identical & patterns are sampled in same random order.
o Seed randomly – computer selects random seed.
o Both Seed with & Seed randomly select a set of random start weights within the limits specified by the weight_limit parameter.
Train the Network
o Once training options are set, select Train the network from the Network menu.
o Get tlearn Status display.
– # of sweeps.
– Abort – dump current state in weights file.
– Iconify – clear screen for other tasks while tlearn runs in background.
Has the Network Solved the Problem?
1. Examine global error produced at output nodes, averaged across patterns.
2. Examine response of network to individual input patterns.
3. Analyze weights & internal representations.
Examine Global Error
o During training, simulator calculates discrepancy between actual network output activations & target activations it is being taught to produce.
o Simulator reports this error on-line -- summed over a number of patterns.
– As learning occurs, error should decline & approach 0.
o If network is trained on task in which same input can produce different outputs, then network can learn correct probabilities, but error rate never reaches 0.
o Error calculated by subtracting actual response from desired (target) response.
o Value of discrepancy is either:
– Positive if target is greater than actual output.
– Negative if actual output is greater than target output.
Root Mean Square (RMS) Error
o Global error – average error across the 4 patterns at a given point in training.
o tlearn reports Root Mean Square (RMS) error to prevent cancellation of positive & negative errors.
– Averages the squared errors for all patterns.
– Returns the square root of the average.

• Figure (AND network): tlearn tracks RMS error throughout training (every 100 sweeps).
• Error decreases as training continues … after 1000 sweeps, RMS error = 0.35.
– Average output error = 0.35.
– Output off target by approx. 0.35, averaged across the 4 patterns.
o Equation 3.1: RMS error = sqrt( Σk (ok – tk)² / # of patterns ).
– k indexes the input patterns (4 for AND).
– ok is the vector of output activations produced by input pattern k.
– Number of elements in the vector corresponds to number of output nodes.
• E.g., in this case (AND), only one output node, so the vector contains only 1 element.
– Vector tk specifies the desired or target activations for input pattern k.
o With 1000 sweeps & 4 input patterns, network sees each pattern approximately 250 times.
o Given RMS error = 0.35, has the network learned the AND function?
– Depends on how we define an acceptable level of error.
o Activation function of output unit is the sigmoid function (EQ 1.2).
– Activation curve never reaches 1.0 or 0.0.
– Net input to node would need to be ± infinity.
– Always some residual finite error.
o So what level of error is acceptable? No right answer.
– Can require that all outputs be within 0.1 of target.
– Can round off activation values: ones closest to 1.0 are correct if target is 1.0.
Has Network Solved Problem?
o RMS error = 0.35. Solved?
o Depends on how we define acceptable level of error.
– Can't always use just global error.
– Network may have low RMS, but hasn't solved all input patterns correctly.
Exercise 3.3
1. How many times has network seen each input pattern after 1000 sweeps through training set?
2. How small must RMS error be before we can say network has solved problem?
Pattern Error – Verify Network Has Learned
o RMS error is the average error across the 4 patterns.
o Is error uniformly distributed across different patterns, or have some patterns been correctly learned while others have not?
o Verify network has learned (from Network menu):
– Presents each input pattern to network once & observes resulting output node activation.
– Compare output activations with teacher signal in .teach file.
o Output window indicates file and.1000.wts as specification of state of network.
o Used and.data training patterns to verify network performance.
o Compare activation values to target activations in and.teach file.
o Has the network solved Boolean AND?
Pattern Error – Node Activities
o Activation levels indicated by squares.
– Large white = high activation.
– Small white = low activation.
– Grey = inactive node.
Individual Pattern Error
• Global error can be misleading.
– If have large # of patterns to learn, global error may be low even if some patterns are not learned correctly.
– These may be the interesting patterns.
• Also may want to create special test stimuli not presented to network during training.
– Generalize to novel cases?
– What has network learned?
• Helps discover what generalizations have been created from a finite data set.
Pattern Error: Present each Input Pattern Just Once
• Select Verify network has learned from the Network menu.
• Presents each input pattern to network just once.
• E.g., for the AND function, this does 4 sweeps (1 per training input).
• Observe resulting output node activations.
• Compare output activations with teacher signal in .teach file.
04/21/23 Neural Networks 158
• Output window lists the file and.1000.wts as the specification of the network's state.
• Used and.data training patterns to verify network performance.
• Compare activation values to target activations in and.teach file.
• Has the network solved Boolean AND?
AND Network
04/21/23 Neural Networks 159
Calculate Actual RMS Error Value & Compare it to Value Plotted (Boolean AND)
Input     Output    Round Off    Target    Squared Error
0 0       0.099     0            0         .0098
1 0       0.294     0            0         .0864
0 1       0.301     0            0         .0906
1 1       0.620     1            1         .1444

RMS error = sqrt(.3312 / 4) = .2877
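A minimal Python check of the arithmetic in the table above (this is not part of tlearn, just a verification of the plotted value):

    import math

    # (output, target) pairs from the verification table above
    patterns = [(0.099, 0), (0.294, 0), (0.301, 0), (0.620, 1)]

    # mean of the squared errors, then the square root of that mean
    mse = sum((t - o) ** 2 for o, t in patterns) / len(patterns)
    rms = math.sqrt(mse)
    print(round(rms, 4))  # 0.2878 -- the slide's .2877 reflects rounding the squared errors first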
04/21/23 Neural Networks 160
Pattern Error – Node Activities
• Activation levels indicated by squares.
– Large white = high activation.
– Small white = low activation.
– Grey = inactive node.
04/21/23 Neural Networks 161
Examine Weights
o Input activations transmitted to other nodes along modifiable connections.
o Performance of network determined by strength of connections (weight values).
1. Display menu, Connection Weights (Hinton diagram).
– White = positive.
– Black = negative.
– Size reflects absolute size of connection.
[Figure: Hinton diagram with columns labeled bias node / first input / second input]
04/21/23 Neural Networks 162
o All rectangles in first column code values of connections from the bias node.
o Rectangles in 2nd column code connections from the 1st input unit.
o Across columns – higher numbered source nodes (from the .cf file).
o Rows in each column identify destination nodes of connections.
– Higher numbered rows indicate higher numbered destination nodes.
– Only one node in this example receives inputs (the output node) – it is the only one that receives incoming connections.
04/21/23 Neural Networks 163
o Hinton diagram provides clues to how the network solves Boolean AND.
– Bias has strong negative connection to output node.
– 2 input nodes have moderately sized positive connections to output node.
– One active input node by itself can't provide enough activation to overcome the strong negative bias.
– Two active input nodes together can overcome the negative bias.
– Output node only turns on if both input nodes are active!
04/21/23 Neural Networks 164
Role of Start State
o Network solved Boolean AND starting with particular set of random weights & biases.
o Use different random seed (Training options) to wipe out learning that has occurred …
o Can resume training beyond the specified number of sweeps using the Resume training option.
o Start states can have dramatic impact on the way a network attempts to solve a problem & on its final solution.
– Training networks with different random seeds is like running different subjects in an experiment.
04/21/23 Neural Networks 165
Role of Learning Rate
o Learning rate determines proportion of error signal used to change weights in network.
– Large learning rates lead to big weight changes.
– Small learning rates lead to small weight changes.
o To examine effect of learning rate on performance, run simulation so that learning rate is the only factor changed.
– Start with same random weights & biases.
o Modelers often use small learning rate to avoid large weight changes (see the update rule below).
– Large weight changes can be disruptive (learning is undone).
– Large weight changes can be counter-productive when network is close to a solution!
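In gradient-descent terms (tlearn trains networks with backpropagation), the learning rate \eta is simply the scale factor on every weight update. This is the generic rule, not a formula quoted from the tlearn manual:

    \Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}

Doubling \eta doubles every weight step, which speeds early learning but makes overshooting more likely once the network is near a solution.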
04/21/23 Neural Networks 166
Steps To Building Neural Network in tlearn

1. Network menu – New Project option. New project dialogue box appears.
2. Select directory or folder in which to save your project files. Use N: Drive!
3. Get 3 windows on screen – each used for entering info relevant to a different aspect of network architecture (.teach, .data, & .cf).
4. Check architecture.
5. Specify training option parameters to determine initial start state of network, learning rate, & momentum.
6. Train network (from Network menu).
7. Determine if network has learned task by checking error rates, examining responses to individual patterns, etc.
04/21/23 Neural Networks 167
AND Network: Hinton Diagram
[Figure: Hinton diagram with columns labeled Bias Node, First Input, Second Input]
04/21/23 Neural Networks 168
Hinton Diagram.
White = positive weight. Black = negative weight.
Area of box proportional to absolute value of corresponding weight.
04/21/23 Neural Networks 169
Logical AND Network Implemented With 2 Inputs & 1 Output
o Output unit on (value close to 1.0) when both inputs 1.0. Otherwise off.
o With large negative weight from bias unit to output, the output is off by default.
o Make weights from input nodes to output large enough that if both inputs are present, net input is great enough to turn output on.
– Neither input by itself is large enough to overcome the negative bias.
Node 0 is the bias unit, which is always on; its connection to node 1 is what gives node 1 a bias.
04/21/23 Neural Networks 170
Hinton Diagram Example
04/21/23 Neural Networks 171
Weights File in tlearn
o tlearn keeps up-to-date record of network’s state in weights file.
o Saved to disk at regular intervals & at end of training.
o Lists all connections in network, grouped according to receiving node.
o In and.cf file only 1 receiving node is specified (output node 1).
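For concreteness, the AND architecture can be described by a configuration file along these lines. This is a sketch reconstructed from the conventions mentioned in these slides (bias node 0, inputs i1-i2, single output node 1); the exact and.cf distributed with the tlearn exercises may differ in detail:

    NODES:
    nodes = 1
    inputs = 2
    outputs = 1
    output node is 1

    CONNECTIONS:
    groups = 0
    1 from i1-i2
    1 from 0

    SPECIAL:
    selected = 1
    weight_limit = 1.00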
04/21/23 Neural Networks 172
o 1st # represents weight on connection from bias node to output node (-2.204).
o 2nd # (1.328) shows connection from 1st input node to output node.
o 3rd # (1.36) shows connection from 2nd input node to output node.
o Final number (0.000) shows connection from the output node to itself – non-existent, since the network is strictly feedforward.
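Pushing these three weights through the sigmoid reproduces the output activations from the verification table earlier. A minimal Python check, assuming the logistic activation of EQ 1.2:

    import math

    def sigmoid(net):
        # logistic activation (EQ 1.2, assumed): squashes net input into (0, 1)
        return 1.0 / (1.0 + math.exp(-net))

    bias, w1, w2 = -2.204, 1.328, 1.36  # weights from the and.1000.wts file

    for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        net = bias + w1 * i1 + w2 * i2
        print(i1, i2, round(sigmoid(net), 3))
    # prints ~0.099, 0.294, 0.301, 0.619
    # (the table's 0.620 reflects the full-precision weights before rounding)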
04/21/23 Neural Networks 173
Resume Training
o Can continue network training with the Resume training option on the Network menu.
– Extends training by # of sweeps & adjusts error display to accommodate the extra training sweeps.
o Does the RMS error decrease significantly?
04/21/23 Neural Networks 174
Several Different Ways to Analyze Weights & Examine Internal Representations
1. Hierarchical clustering of hidden unit activations.
2. Principal component analysis & projection pursuit.
3. Activation patterns in conjunction with actual weights.
• Examine these methods in detail later in semester!
04/21/23 Neural Networks 175
1 - Hierarchical Clustering of Hidden Unit Activations
• Present test patterns to network after training.
• Patterns produce activations on hidden units, which we record & tag -- vectors in a multi-dimensional space.
• Clustering looks at the similarity structure of that space.
• Inputs treated as similar by the network produce internal representations that are similar.
• Produces a tree of inter-pattern distances (see the sketch below).
• Can't examine the space directly -- difficult to visualize high-dimensional spaces.
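A minimal sketch of this procedure in Python with SciPy. The activation vectors and pattern labels below are hypothetical, invented purely to show the mechanics:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram
    import matplotlib.pyplot as plt

    # hypothetical hidden-unit activation vectors, one row per test pattern
    activations = np.array([
        [0.10, 0.90, 0.20],   # pattern "A"
        [0.12, 0.85, 0.25],   # pattern "B" -- similar to A
        [0.80, 0.15, 0.70],   # pattern "C"
        [0.78, 0.20, 0.65],   # pattern "D" -- similar to C
    ])
    labels = ["A", "B", "C", "D"]

    # build the cluster tree from inter-pattern distances
    tree = linkage(activations, method="average")
    dendrogram(tree, labels=labels)
    plt.show()  # A/B and C/D should join first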
04/21/23 Neural Networks 176
2 - Principal Component Analysis & Projection Pursuit
• Used to identify interesting lower-dimensional slices through the high-dimensional space of hidden unit activations.
• Move viewing perspective around in this space (see the sketch below).
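A sketch of the PCA step in Python with NumPy, again on hypothetical activation vectors (projection pursuit needs specialised tools and is omitted here):

    import numpy as np

    # hypothetical hidden-unit activations, one row per test pattern
    activations = np.array([
        [0.10, 0.90, 0.20],
        [0.12, 0.85, 0.25],
        [0.80, 0.15, 0.70],
        [0.78, 0.20, 0.65],
    ])

    # centre the data, then use SVD to find the principal components
    centered = activations - activations.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)

    # project each pattern onto the first 2 principal components
    projected = centered @ vt[:2].T
    print(projected)  # 2-D coordinates suitable for plotting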
04/21/23 Neural Networks 177
3 - Activation Patterns In Conjunction With Actual Weights
• When we look at activation patterns, we see only part of what the network “knows.”
• Network manipulates & transforms info via connections between nodes.
• Examine connections & weights to see how transformations are being carried out.
• Hinton diagrams can be used -- weights shown as colored squares, with the size & color of each square representing the magnitude & sign of the connection.
04/21/23 Neural Networks 178
Has Network Solved AND Problem?

• RMS error = 0.35. Solved?
• Depends on how we define an acceptable level of error.
– Can't always use just global error.
– Network may have low RMS but not have solved all input patterns correctly.
• Exercise 3.3
1. How many times has network seen each input pattern after 1000 sweeps through training set?
2. How small must RMS error be before we can say network has solved problem?
• Exercise 3.4
1. Compare exact value of RMS to plotted value.
04/21/23 Neural Networks 179
What Do We Learn From a Simulation?
• Are the simulations framed in such a way that they clearly address some issue?
• Are the task & stimuli appropriate for points being made?
• Do you feel you’ve learned something from the simulation?
04/21/23 Neural Networks 180
Logical OR
o What type of network architecture?
o 2 input, 1 output + bias node
o Try the OR network (pg. 57-62).
Input Activations          Output Activation (Node 3)
Node 0    Node 1           AND    OR    XOR
0         0                0      0     0
0         1                0      1     1
1         0                0      1     1
1         1                1      1     0
04/21/23 Neural Networks 181
04/21/23 Neural Networks 182
Exclusive OR
o Create a third project called xor and try the exclusive OR function with just an input layer and an output layer.
04/21/23 Neural Networks 183
Neural Network Simulation Software: tlearn, Membrain
o Simulations allow examination of how model solved problem.
o Simulator needs to be told:
– Network architecture.
– Training data.
– Learning rate & other parameters.
o Simulator:
– Creates network.
– Performs training.
– Reports results.
o You can examine results.
04/21/23 Neural Networks 184
Tlearn Software
1. Copy win_tlearn.exe from disk or R: drive to N: drive.
2. Double-click on the file to begin installation.
3. Executable is called tlearn.
o http://www.columbia.edu/cu/psychology/courses/3205/tlearn/
To download Adobe Acrobat PDF version: ftp://ftp.crl.ucsd.edu/pub/neuralnets/tlearn/TlearnManual.pdf