Self-organization and error correction

EE1411


Janusz A. Starzyk

http://grey.colorado.edu/CompCogNeuro/index.php/CECN_CU_Boulder_OReilly

http://grey.colorado.edu/CompCogNeuro/index.php/Main_Page

Based on courses taught by Prof. Randall O'Reilly, University of Colorado, and Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika, and on http://wikipedia.org/

Cognitive Neuroscience and Embodied Intelligence

http://psych.colorado.edu/~oreilly



Learning: types

1. How should an ideal learning system look?
2. How does a human being learn?

Detectors (neurons) can change local parameters but we want to achieve a change in the functioning of the entire information processing network.

We will consider two types of learning, which require different mechanisms:

- Learning an internal model of the environment (spontaneous).
- Learning a task set for the network (supervised).
- A combination of both.

Learning operations

One output neuron can't learn much. Operation = sensorimotor transformation, perception-action.

Stimulation and selection of the correct operation, interpretation, expectations, plan…

What type of learning does this allow us to explain? What types of learning require additional mechanisms?

Simulation

Select self_org.proj.gz, in Chapter 4.8.1

25 inputs, 20 hidden neurons, kWTA.

The network will learn interesting features.

Simulation

Choose Self_org.proj from Ch. 4.

The 5x5 input contains either a single horizontal or vertical line (10 samples) or a combination of 2 lines (45 samples).

Learning is possible only for individual lines.

Miracle: Hebbian learning + kWTA is sufficient for the network to form correct internal representations.
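The sample counts quoted above can be checked with a short sketch. It builds the 10 single-line patterns on a 5x5 grid and the 45 two-line combinations, and adds a simplified "hard" kWTA that keeps only the k most strongly driven units. The function names and the hard-threshold form of kWTA are assumptions made for illustration; the simulator's own kWTA mechanism may differ.

```python
from itertools import combinations

SIZE = 5  # 5x5 input grid, as on the slide

def line_pattern(kind, index):
    """Return a 5x5 grid (list of rows) with one horizontal ('h') or vertical ('v') line."""
    grid = [[0] * SIZE for _ in range(SIZE)]
    for k in range(SIZE):
        if kind == "h":
            grid[index][k] = 1
        else:
            grid[k][index] = 1
    return grid

def union(a, b):
    """Overlay two patterns (logical OR of the pixels)."""
    return [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kwta(net_inputs, k):
    """Hypothetical hard kWTA: the k most strongly driven units output 1, the rest 0."""
    order = sorted(range(len(net_inputs)), key=lambda i: net_inputs[i], reverse=True)
    winners = set(order[:k])
    return [1 if i in winners else 0 for i in range(len(net_inputs))]

single_lines = [line_pattern(kind, i) for kind in "hv" for i in range(SIZE)]
line_pairs = [union(a, b) for a, b in combinations(single_lines, 2)]

print(len(single_lines), len(line_pairs))  # 10 45
print(kwta([0.2, 0.9, 0.1, 0.7], k=2))     # [0, 1, 0, 1]
```

The 45 pairs are simply all unordered pairs of the 10 single lines, which is why the network can represent each pair by the two units that code its component lines.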

Simulation

4x5 = 20 hidden neurons, kWTA.

After training (30 epochs presenting all line pairs), selective units responding to single lines appear (2 units for 2 lines), giving a combinatorial representation!

Initially, responses to inputs are random, but winners quickly appear. Some units (5) remain inactive and can be used to learn new inputs.

Self-organization, but no topological representation, since neighbors respond to different features.

10 unique representations for single-line inputs, all correct.

Sensorimotor maps

Self-organization is modeled in many ways; simple models are helpful in explaining qualitative features of topographic maps.

Fig. from: P.S. Churchland, T.J. Sejnowski, The Computational Brain. MIT Press, 1992.

Motor and somatosensory maps

This is a very simplified image. In reality most neurons are multimodal: neurons in the motor cortex react to sensory, auditory, and visual impulses (mirror neurons), and there are many specialized circuits of perception-action-naming.

Finger representation: plasticity

[Figure: hand and face areas, before stimulation and after stimulation.]

Sensory fields in the cortex expand after stimulation: local dominances result from activation.

Plasticity of cortical areas for sensory-motor representations.

Simplest models

SOM or SOFM (Self-Organizing Feature Map): one of the most popular models.

How can topographical maps be created in the brain? Local neural connections create groups that interact strongly with each other, interact more weakly across greater distances, and inhibit nearby groups.

History:

- von der Malsburg and Willshaw (1976): competitive learning, Hebbian learning with a "Mexican hat" potential, mainly the visual system.
- Amari (1980): layered models of neural tissue.
- Kohonen (1981): simplification without inhibition; only two essential variables: competition and cooperation.

SOM: idea

Data: vectors $X^T = (X_1, \ldots, X_d)$ from a d-dimensional space.

A net of nodes with local processors (neurons) in each node.

Local processor # j has d adaptive parameters W(j).

Goal: adjust the W(j) parameters to model the clusters in the space of X.

Training SOM

Fritzke's Growing Neural Gas (GNG) algorithm

Demonstrations of competitive GNG learning in Java: http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/DemoGNG/GNG.html

[Figure: a 2-D grid of neurons and the N-dimensional data space. x = data; o = positions of the neuron weights. The weights of the 2-D grid point to locations in the N-D data space.]

SOM algorithm: competition

Nodes should calculate the similarity of input data to their parameters. Input vector X is compared to node parameters W. Similar = minimal distance or maximal scalar product.

Competition: find node j=c with W most similar to X.

$$\left\| X - W^{(j)} \right\| = \sqrt{\sum_i \left( X_i - W_i^{(j)} \right)^2}, \qquad c = \arg\min_j \left\| X - W^{(j)} \right\|$$

Node number c is most similar to the input vector X. It is the winner, and it will learn to be more similar to X; hence this is a “competitive learning” procedure.

Brain: those neurons that react to some signals activate and learn.
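The competition step can be sketched in a few lines, assuming Euclidean distance as the similarity measure. The names `squared_distance` and `find_winner` and the example weights are illustrative only.

```python
def squared_distance(x, w):
    """Squared Euclidean distance between input x and a node's weight vector w."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))

def find_winner(x, weights):
    """Index c of the node whose weights are most similar (closest) to x."""
    return min(range(len(weights)), key=lambda j: squared_distance(x, weights[j]))

weights = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # three nodes, d = 2
c = find_winner([0.9, 0.8], weights)
print(c)  # 1
```

Minimizing the squared distance selects the same node as minimizing the distance itself, so the square root can be skipped.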

SOM algorithm: cooperation

Cooperation: nodes on a grid close to the winner c should behave similarly. Define the “neighborhood function” O(c):

$$h(r, r_c, t) = h_0(t) \exp\left( -\left\| r - r_c \right\|^2 / \sigma_c^2(t) \right)$$

- t: iteration number (or time).
- $r_c$: position of the winning node c (in physical space, usually 2D).
- $\| r - r_c \|$: distance from the winning node, scaled by $\sigma_c(t)$.
- $h_0(t)$: slowly decreasing multiplicative factor.

The neighborhood function determines how strongly the parameters of the winning node and of the nodes in its neighborhood will be changed, making them more similar to data X.
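As a minimal sketch, assuming the Gaussian form h0 * exp(-||r - rc||^2 / sigma^2) for the neighborhood function, the cooperation term can be computed as follows. The argument names are illustrative.

```python
import math

def neighborhood(r, r_c, h0, sigma):
    """Gaussian neighborhood value for a node at grid position r, given winner position r_c."""
    d2 = sum((a - b) ** 2 for a, b in zip(r, r_c))
    return h0 * math.exp(-d2 / sigma ** 2)

print(neighborhood((0, 0), (0, 0), h0=0.5, sigma=1.0))  # 0.5 (the winner itself)
print(neighborhood((0, 1), (0, 0), h0=0.5, sigma=1.0) < 0.5)  # True
```

The winner receives the full factor h0, and the value falls off with grid distance, so distant nodes are barely changed.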

SOM algorithm: dynamics

Adaptation rule: take the winner node c and those in its neighborhood O(rc), and change their parameters, making them more similar to the data X.

$$\text{For } i \in O(c): \quad W^{(i)}(t+1) = W^{(i)}(t) + h(r_i, r_c, t) \left[ X(t) - W^{(i)}(t) \right]$$

Randomly select a new sample vector X, and repeat. Decrease h0(t) slowly until there are no more changes.

Result: W(i) ≈ the centers of local clusters in the X feature space. Nodes in the neighborhood point to adjacent areas in X space.
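The whole loop (competition, cooperation, adaptation) can be sketched for a 1-D chain of nodes as below. Several choices are assumptions made for illustration: the exponential decay of the step size and neighborhood width, the parameter values, the random initialization, and updating every node with a Gaussian factor instead of a hard neighborhood set O(c).

```python
import math
import random

def train_som(data, n_nodes, t_max, h_i=0.5, h_f=0.01, s_i=3.0, s_f=0.1, seed=0):
    """Train a 1-D chain of n_nodes on data; returns the list of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(t_max):
        frac = t / t_max
        h0 = h_i * (h_f / h_i) ** frac       # slowly decreasing step factor
        sigma = s_i * (s_f / s_i) ** frac    # shrinking neighborhood width
        x = rng.choice(data)                 # random sample vector X
        # Competition: the winner c has the weights closest to x.
        c = min(range(n_nodes),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])))
        # Cooperation and adaptation: move nodes toward x, scaled by grid distance to c.
        for i in range(n_nodes):
            h = h0 * math.exp(-((i - c) ** 2) / sigma ** 2)
            weights[i] = [w + h * (xv - w) for w, xv in zip(weights[i], x)]
    return weights

rng = random.Random(1)
cluster_a = [[rng.uniform(0.1, 0.3), rng.uniform(0.1, 0.3)] for _ in range(50)]
cluster_b = [[rng.uniform(0.7, 0.9), rng.uniform(0.7, 0.9)] for _ in range(50)]
weights = train_som(cluster_a + cluster_b, n_nodes=10, t_max=2000)
for w in weights:
    print([round(v, 2) for v in w])
```

Because each update is a convex combination of the old weight and the sample (the factor h never exceeds 1), the weights cannot leave the region spanned by the data and the initial weights.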

Maps and distortions

Initial distortions may slowly disappear or may get frozen ... giving the user a completely distorted view of reality.

Demonstrations with the help of GNG

Growing Self-Organizing Networks demo: http://www.neuroinformatik.ruhr-uni-bochum.de/ini/VDM/research/gsn/DemoGNG/GNG.html

Parameters of the SOM program:

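- t: iterations.
- $\varepsilon(t) = \varepsilon_i (\varepsilon_f / \varepsilon_i)^{t/t_{max}}$ specifies the learning step.
- $\sigma(t) = \sigma_i (\sigma_f / \sigma_i)^{t/t_{max}}$ specifies the size of the neighborhood.

$$h(r, r_c, t, \varepsilon, \sigma) = \varepsilon(t) \exp\left( -\left\| r - r_c \right\|^2 / \sigma^2(t) \right)$$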
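The two schedules share one form: a value that moves geometrically from an initial to a final setting as t runs from 0 to tmax. A small sketch follows; the function name `decay` and the sample values are illustrative.

```python
def decay(t, t_max, initial, final):
    """Exponential interpolation: initial * (final / initial) ** (t / t_max)."""
    return initial * (final / initial) ** (t / t_max)

for t in (0, 50, 100):
    print(t, decay(t, 100, initial=1.0, final=0.01))
```

Halfway through training the value is the geometric mean of the initial and final settings, so most of the decrease happens early.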

Maps 1x30 show the formation of Peano's curves. We can try to reconstruct Penfield's maps.


http://www.is.umk.pl/~duch/ref/01/01-plastic/brain-maps.ppt

Mapping kWTA CPCA

Hebbian learning finds relationships between input and output.

Example: pat_assoc.proj.gz in Chapter 5, described in 5.2.

Simulations for 3 tasks, from easy to impossible.

Derivative based Hebbian learning

Hebb's rule: $\Delta w_{kj} = \varepsilon (x_k - w_{kj}) y_j$

will be replaced by derivative based learning, which relies on the time domain correlation of firing between neurons.

This can be implemented in many ways. For signal normalization, let us assume that the maximum rate of change between two consecutive time frames is 1. Let dx(t) denote the derivative (change) of the signal x(t).

Assume that the neuron responds to signal changes instead of signal activation:

$$y_j = \sum_k w_{kj} \, dx_k$$

Derivative based Hebbian learning

Define the product of derivatives

$pd_{kj}(t) = dx_k(t) \cdot dy_j(t)$.

Derivative based weight adjustment is calculated as follows. Feedforward weights are adjusted as

$$\Delta w_{kj} = \varepsilon \left( pd_{kj}(t) - w_{kj} \right) \left| pd_{kj}(t) \right|$$

and feedback weights are adjusted as

$$\Delta w_{jk} = \varepsilon \left( pd_{kj}(t) - w_{jk} \right) \left| pd_{kj}(t) \right|$$

This adjustment gives symmetrical feedforward and feedback weights.
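A minimal sketch of the symmetric case, assuming discrete time frames, a frame-to-frame difference as the derivative (clipped to a maximum magnitude of 1), and an illustrative learning rate of 0.5. The helper names and the example signals are hypothetical.

```python
def derivative(signal):
    """Change between consecutive frames, clipped so its magnitude is at most 1."""
    return [max(-1.0, min(1.0, b - a)) for a, b in zip(signal, signal[1:])]

def update(w, pd, rate):
    """Weight change (pd - w) * |pd|, scaled by an assumed learning rate."""
    return w + rate * (pd - w) * abs(pd)

x = [0, 1, 1, 0, 0, 1]   # signal x_k(t)
y = [0, 1, 1, 0, 0, 1]   # signal y_j(t), changing together with x_k
dx, dy = derivative(x), derivative(y)

w_kj = w_jk = 0.0
for dx_t, dy_t in zip(dx, dy):
    pd = dx_t * dy_t                    # product of derivatives at time t
    w_kj = update(w_kj, pd, rate=0.5)   # feedforward weight
    w_jk = update(w_jk, pd, rate=0.5)   # feedback weight
print(w_kj, w_jk)  # 0.875 0.875
```

Both weights receive the same sequence of products, so they stay equal. Frames in which neither signal changes give pd = 0 and leave the weights untouched.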

[Figure: signals x_k(t), y_j(t) and the product pd_kj(t) plotted over time t.]

Derivative based Hebbian learning

Asymmetrical weights can be obtained by using the product of shifted derivative values

$pd_{kj}(+) = dx_k(t) \cdot dy_j(t+1)$ and $pd_{kj}(-) = dx_k(t) \cdot dy_j(t-1)$.

Derivative based weight adjustment is calculated as follows. Feedforward weights are adjusted as

$$\Delta w_{kj} = \varepsilon \left( pd_{kj}(+) - w_{kj} \right) \left| pd_{kj}(+) \right|$$

and feedback weights are adjusted as

$$\Delta w_{jk} = \varepsilon \left( pd_{kj}(-) - w_{jk} \right) \left| pd_{kj}(-) \right|$$
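The effect of the time shift can be illustrated with two step signals, where y_j changes one frame after x_k. This is a sketch under the same assumptions as before: discrete frames, clipped differences as derivatives, and a hypothetical learning rate of 0.5.

```python
def derivative(signal):
    """Change between consecutive frames, clipped so its magnitude is at most 1."""
    return [max(-1.0, min(1.0, b - a)) for a, b in zip(signal, signal[1:])]

def update(w, pd, rate):
    """Weight change (pd - w) * |pd|, scaled by an assumed learning rate."""
    return w + rate * (pd - w) * abs(pd)

x = [0, 1, 1, 1, 1]   # x_k steps up first
y = [0, 0, 1, 1, 1]   # y_j follows one frame later
dx, dy = derivative(x), derivative(y)

w_kj = w_jk = 0.0
for t in range(len(dx)):
    if t + 1 < len(dy):
        w_kj = update(w_kj, dx[t] * dy[t + 1], rate=0.5)  # pd_kj(+)
    if t - 1 >= 0:
        w_jk = update(w_jk, dx[t] * dy[t - 1], rate=0.5)  # pd_kj(-)
print(w_kj, w_jk)  # 0.5 0.0
```

Only the feedforward weight grows, because the change in x_k precedes the change in y_j; the feedback product is zero at every frame.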

[Figure: neurons x_k and y_j connected by the feedforward weight w_kj and the feedback weight w_jk; inputs x1, x2.]

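Feedforward weights are adjusted as

$$\Delta w_{kj} = \varepsilon \left( pd_{kj}(+) - w_{kj} \right) \left| pd_{kj}(+) \right|$$

[Figure: signals x_k(t), y_j(t), y_j(t+1) and the product pd_kj(+) plotted over time t; neurons x_k and y_j connected by the feedforward weight w_kj; inputs x1, x2.]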

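and feedback weights are adjusted as

$$\Delta w_{jk} = \varepsilon \left( pd_{kj}(-) - w_{jk} \right) \left| pd_{kj}(-) \right|$$

[Figure: signals x_k(t), y_j(t), y_j(t-1) and the product pd_kj(-) plotted over time t; neurons x_k and y_j connected by the feedback weight w_jk; inputs x1, x2.]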

Task learning

Unfortunately, Hebbian learning won't suffice to learn an arbitrary relationship between input and output. This can be done by learning based on error correction.

Where do goals come from? From the "teacher," or from confronting the predictions of the internal model.

The Delta rule

Idea: weights $w_{ik}$ should be revised so that they change strongly for large errors and do not change if there is no error, so

$$\Delta w_{ik} \sim \| t_k - o_k \| \, s_i$$

The change is also proportional to the size of the activation of input $s_i$. Phase + is the presentation of the goal, phase – is the result of the network. This is the delta rule.
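A small sketch of the delta rule on a toy mapping follows. It makes several assumptions that are not on the slide: a linear output unit, a learning rate of 0.5, and the signed difference t_k - o_k (rather than its magnitude) so that weights can move in either direction. All names and patterns are illustrative.

```python
def delta_rule_epoch(weights, patterns, rate):
    """One pass over (inputs, targets) pairs; weights[i][k] links input i to output k."""
    for s, t in patterns:
        # Linear outputs o_k = sum_i s_i * w_ik (an assumed activation function).
        o = [sum(s[i] * weights[i][k] for i in range(len(s))) for k in range(len(t))]
        for i in range(len(s)):
            for k in range(len(t)):
                # Change grows with the error and with the activation of input i.
                weights[i][k] += rate * (t[k] - o[k]) * s[i]
    return weights

# Toy task: input 0 should switch the output on, input 1 should leave it off.
patterns = [([1.0, 0.0], [1.0]), ([0.0, 1.0], [0.0])]
weights = [[0.0], [0.0]]
for _ in range(20):
    delta_rule_epoch(weights, patterns, rate=0.5)
print(weights)  # first weight approaches 1, second stays 0
```

The error for the first pattern halves on every epoch, so the change shrinks as the output approaches the goal and stops when there is no error.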

Credit Assignment

Credit/blame assignment

$$\Delta w_{ik} = \varepsilon \, \| t_k - o_k \| \, s_i$$

The error is local, for image k.

If a large error occurs and output $o_k$ is significantly smaller than expected, then the weights from input neurons with a large activation are increased the most. If output $o_k$ is significantly larger than expected, then the weights from input neurons with a large activation are decreased the most. E.g., input $s_i$ is the number of calories in different foods and the output is a moderate body weight; if it is too high, we must decrease the weights of high-calorie foods, and if it is too low, we must increase them.

Representations created by an error-minimization process are the result of the best assignment of credit to many units, and not of the greatest correlation (as in Hebbian models).

Limiting weights

We don't want the weights to change without limit or to take negative values. This is consistent with biological constraints, which separate inhibitory and excitatory neurons and impose upper weight limits.

The weight change mechanism below, based on the delta rule, ensures that both restrictions are fulfilled:

$$\Delta w_{ik} = \Delta_{ik} (1 - w_{ik}) \quad \text{if } \Delta_{ik} > 0$$

$$\Delta w_{ik} = \Delta_{ik} \, w_{ik} \quad \text{if } \Delta_{ik} < 0$$

where $\Delta_{ik}$ is the weight change resulting from error propagation.

This equation limits the weight values to the 0-1 range.

The upper limit is biologically justified by the maximum amount of neurotransmitter (NT) which can be emitted and by the maximum density of the synapses.
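The bounding mechanism can be sketched as a single function. The function name is illustrative, and the sketch assumes the raw change from error propagation has magnitude at most 1 and that the weight starts inside the 0-1 range.

```python
def bounded_update(w, delta):
    """Apply a raw change delta so that w stays between 0 and 1.

    Assumes |delta| <= 1 and 0 <= w <= 1.
    """
    if delta > 0:
        return w + delta * (1.0 - w)   # increases shrink as w nears 1
    return w + delta * w               # decreases shrink as w nears 0

print(bounded_update(0.5, 0.5))   # 0.75
print(bounded_update(0.5, -0.5))  # 0.25

w = 0.5
for _ in range(50):
    w = bounded_update(w, 0.5)     # repeated increases approach 1 but never pass it
print(w <= 1.0)  # True
```

Increases are scaled by the remaining distance to 1 and decreases by the distance to 0, so the weight approaches either limit smoothly instead of being clipped.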

[Figure: plot with axes labeled Δ_ik and w_ik (weight), with the weight range marked from 0 to 1.]

Task learning

We want: Hebbian learning and learning using error correction, hidden units, and biologically justified models.

The combination of error correction and correlations can be aligned with what we know about LTP/LTD.

$$\Delta w_{ij} = \varepsilon \left[ \langle x_i y_j \rangle_{+} - \langle x_i y_j \rangle_{-} \right]$$

Hebbian networks model states of the world but not perception-action.

Error correction can learn mappings. Unfortunately, the delta rule is only good for output units, and not for hidden units, because it has to be given a goal.

Backpropagation of errors can teach hidden units. But there is no good biological justification for this method…

Simulations

Select: pat_assoc.proj.gz, in Chapt. 5

Description: Chapt. 5.5

The delta rule can learn difficult mappings, at least theoretically...
