10. The cognitive brain
Lecture Notes on Brain and Computation
Summary by Byoung-Hee Kim
Lecture by Byoung-Tak Zhang
Biointelligence Laboratory, School of Computer Science and Engineering
Graduate Programs in Cognitive Science, Brain Science and Bioinformatics; Brain-Mind-Behavior Concentration Program
Seoul National University
E-mail: [email protected]
This material is available online at http://bi.snu.ac.kr/
Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2010.
(C) 2010 SNU CSE Biointelligence Lab, http://bi.snu.ac.kr
Introduction
This chapter continues the discussion of system-level models of the brain
Layered representation of the brain: invariant object recognition, visual attention
Workspace hypothesis: how the brain is able to produce novel solutions to new tasks
General discussion of brain theory in bidirectional layered models of cortex – anticipation is a central feature
Bayesian formulations: Boltzmann machine / Helmholtz machine / deep belief networks
Probabilistic reasoning: Bayesian networks, EM
Adaptive resonance theory (ART)
Outline
10.1 Hierarchical maps and attentive vision
10.2 An interconnecting workspace hypothesis
10.3 The anticipating brain
10.4 Adaptive resonance theory
10.5 Where to go from here
10.1 Hierarchical maps and attentive vision
Invariant object recognition of humans Recognize objects even though they vary in form, size, location,
and viewing angle
Neural systems that learn to recognize objects through supervised learning in mapping networks are quite sensitive to changes in the input vector
Solution: hierarchical networks Related question: attention in the visual system
10.1.1 Invariant object recognition
VisNet Model
Each layer is a competitive map; competition is implemented through adjustment of the firing threshold of nodes until a predefined sparseness is reached (see Section 7.5.1)
The model is trained on sequences of patterns from movements of objects in the visual field
Weights between the layers are adjusted with Hebbian learning
Early version: Hebbian learning with a trace rule – some memory in the neural activity (the trace) was used, exploiting temporal associations between moving objects (a minimal sketch of such a trace rule is given below)
Recent model: without a trace rule, when objects in consecutive time steps have some overlap with previous neural representations, using spatial associations
Able to learn invariant object recognition: invariance to translation, rotation, and size
Multiple objects can be trained simultaneously
The information flow between the layers is strictly feedforward
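A minimal numpy sketch of a Hebbian update with a trace rule in the spirit of the early VisNet learning rule described above; the function and parameter names are illustrative assumptions, not the original implementation.

```python
import numpy as np

def trace_rule_update(w, x_pre, y_post, trace, eta=0.01, lam=0.8):
    """One Hebbian step with a trace rule (illustrative sketch, assumed names).

    w      : weight matrix, shape (n_post, n_pre)
    x_pre  : presynaptic activity vector, shape (n_pre,)
    y_post : postsynaptic activity vector, shape (n_post,)
    trace  : running trace of recent postsynaptic activity, shape (n_post,)
    lam    : trace decay; lam = 0 recovers plain Hebbian learning
    """
    trace = lam * trace + (1.0 - lam) * y_post              # memory of recent activity
    w += eta * np.outer(trace, x_pre)                       # associate the trace with the current input
    w /= np.linalg.norm(w, axis=1, keepdims=True) + 1e-12   # keep weights bounded (normalization)
    return w, trace
```

Because the trace outlasts a single presentation, consecutive views of a moving object are associated with the same output nodes, which is what produces the temporal-association effect described above.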
10.1.2 Attentive vision
Top-down information flow in cortical models is crucial in cognitive processes
Example: visual search demands the top-down influence of an object bias that specifies what to look for ↔ object recognition demands spatial attention
The overall scheme of the model for attentive vision
Model by Gustavo Deco et al. Three important parts: ‘V1-V4’, ‘IT’, ‘PP’
We are particularly interested in how the different parts influence each other
Roles of parts: LGN
Representations in the LGN: Gabor functions
Bottom-up input images to the model are decomposed with Gabor functions
Tuning curves in the LGN can be parameterized with Gabor functions (see the sketch below)
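For illustration, a minimal 2D Gabor function of the kind used to parameterize such tuning curves and to decompose input images; the parameter names and values are my own assumptions.

```python
import numpy as np

def gabor(size=32, wavelength=8.0, theta=0.0, sigma=4.0, phase=0.0):
    """Return a size x size Gabor patch: a sinusoidal carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    xr = x * np.cos(theta) + y * np.sin(theta)        # coordinate along the preferred orientation
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# a small filter bank over orientations, as used to decompose an input image
bank = [gabor(theta=t) for t in np.linspace(0.0, np.pi, 4, endpoint=False)]
```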
Roles of parts: ‘V1-V4’
Principal role in the model: decomposing the visual scene into features
V1 neurons respond mainly to simple features – orientation of edges; later areas are specialized to represent other features – color, motion, combinations of basic features
Modeling issue
Simplification: features are just represented in sections which correspond to the location of the object in the visual field
Each section represents a feature of a part as a vector of node activities
No global competition between the modules; only within each section – an inhibitory pool keeps the sparseness of the activity in each section roughly constant (see Section 7.5.1)
Roles of parts: ‘IT’
IT: model of processes in the inferior-temporal cortex Known to be involved in object recognition Modeled as associative memory (refer Ch. 8) Attractor network: Point attractors of specific objects are
formed through the Hebbian-trained collateral connections within this structure
The connections between ‘V1-V4’ and ‘IT’ can also be trained with Hebbian learning
This enables simulation of translation-invariant object recognition
Roles of parts: ‘IT’
The contribution of the attractor network in ‘IT’ to translation-invariant object recognition explains some recent experimental findings: the size of the receptive field of IT neurons depends on the content of the visual field and the specifics of the task
Example: for a single object on a screen with a blank background, the receptive field of a neuron that responds to this object can be very large (> 30 degrees)
Roles of parts: ‘IT’
If two objects are presented simultaneously, or if the target object is shown on top of a complex background (which can be viewed as a scene with many objects)
Then the size of the receptive field shrinks markedly Fig. 10.4(A)
Roles of parts: ‘IT’
Hypothesis given by the model Based on the attractor dynamics of the autoassociator network in ‘IT’ If only one object is shown, then this object would trigger the right point
attractor and thus recall of the object regardless of its location, which corresponds to large receptive fields
If two or more trained objects are shown, then it is likely that the final state of the attractor network is mainly dominated by the object closest to the fovea, which gets the most weight due to cortical magnification
Fig. 10.4(B)
10.1.3 Attentional bias in visual search and object recognition
Object bias input to the attractor network in ‘IT’ Tells the system what to look for in visual search task
Such top-down information is thought to originate in the frontal areas of the brain
Can speed up the recognition process in ‘IT’ Supports the recognition ability of the input from ‘V1-V4’ that
corresponds to the target object in visual search The receptive fields of target objects are larger than receptive
fields of non-target objects
Attentional bias in object recognition
Top-down input to a specific location in ‘PP’ Object recognition task The label of this module suggests processing in the posterior
parietal cortex, which is part of the dorsal visual processing pathway (‘where’ pathway)
Modeled with a spatially organized neural sheet, which is connected to the corresponding section in ‘V1-V4’
Enhance the neural activity in ‘V1-V4’ for the features of the object that are located at the corresponding location in the visual field
Faster completion of the input patterns for the ‘IT’ network
Faster object recognition at the corresponding location
Attentional bias – summary
Origins of the attentional biases Visual search: top-down input to ‘IT’, object-based Object recognition: acts on ‘PP’, location-based
It may be difficult to separate the different forms of attention in experiments because all parts of the model are bidirectional
Numerical experiments
Simulation experiment by Deco et al. Target: the letter ‘E’ Distractors: visually different (‘X’, Fig. 5A) and similar (‘F’,
Fig. 5B) Reaction time
Independent of the number of objects (Fig. 5A) Linear increase with the number of objects (Fig. 5B)
Both modes are present in the same ‘parallel architecture’ The apparent serial search is only due to the more
intense conflict-resolution demand in the recognition process
10.2 An interconnecting workspace hypothesis
Humans are very flexible in coping with the complex world
Example: humans can drive a car with such ease that little attention is required, despite the fact that we have to react to an unknown environment
Model to solve complex cognitive tasks
Attributes: flexible cooperation of many specialized modules; system-level perspective
specialized processors + global system activities; flexibility + robustness
10.2.1 The global workspace
Model by S. Dehaene et al. (1995) Five basic subsystems
Perceptual (input), motor (output), three cognitive subsystems
Global workspace: interconnecting network among subsystems
Projections between cortical areas are indeed abundant
Large portion of the global workspace could be localized within layers II and III
10.2.2 Demonstration of the global workspace in the Stroop task
Stroop task: Task 1 – read the word, or Task 2 – name the color in which the word is written; you should do it rapidly
Idea: the global workspace has to become active to ‘re-wire’ the commonly active word-naming configuration of the brain
Example stimuli (color words): Yellow, Green, Blue
Model that can be tested on a Stroop task to demonstrate the idea
Three specialized processors
Top-down influence of workspace nodes on the nodes in the specialized processors
Reward signal – indicates a mismatch between the desired response and the actual response of the system
Vigilance parameter
Usage of the model Predictions Flexibility to solve different tasks
10.2.2 Demonstration of the global workspace in the Stroop task
10.3 The anticipating brain
Layered architectures with bottom-up and top-down information flow are important for cognitive processes Previous examples: VisNet, interconnecting workspace hypothesis
Generalization (emerging brain theory) brain-style information processing principles general hypothesis of how the brain implements cognitive functions Brain as an anticipating memory system
Remainder is a discussion of Anticipating brain hypothesis Related model implementations
The anticipating brain – factors
Factors that are essential in realizing cognitive functions The brain can develop a model of the world, which can be used
to anticipate or predict the environment The inverse of the model can be used to recognize causes by
evoking internal concepts Hierarchical representations are essential to capture the richness
of the world Internal concepts are learned through matching the brain’s
hypotheses with input from the world An agent can learn actively by testing hypotheses through
actions The temporal domain is an important degree of freedom
10.3.1 The brain as anticipatory system in a probabilistic framework
Notations
Sensory state: s
c: causal state
g: describes the physical process of generating the sensory response
a: action of the agent
s’: internal representation of sensory states in primary sensory cortex
c’: higher-order cortical representations, called concepts
Generative model G: on an abstract level, we see the brain as a generative model of the world
Recognition model Q: the inverse of G, which evokes internal concepts from causes in the environment
\mathbf{s} = g(\mathbf{c}), \quad p(\mathbf{s}'|\mathbf{s}, \mathbf{c}'), \quad p(\mathbf{s}'|\mathbf{c}, \mathbf{a}), \quad p(\mathbf{s}'; G)
Definition of the term ‘causes’ highlights two important functions of brain processing First, one of the major goals of the brain is to learn what causes are by
forming internal concepts Second, the brain must learn concepts at different levels of abstraction Learning concepts and predicting causes in our environment is central
to the thesis developed in this chapter
Agent: system that can explore the environment actively via interaction with the environment
Conjecture The brain is trying to match sensory input with internally generated
states
The world model in a probabilistic framework
Layered structure that includes the necessary bidirectional connections Related models: deep belief networks
Highly interactive system: it is not easy to clearly separate the generative model from the recognition model
Model form: Bayesian network or causal graphical model
Concepts at different levels of cortical representation learned in a self-supervised way through the interaction with the
environment Engrained into a memory system
Example from the visual system Early level can learn to recognize different sequences of retinal
patterns Sequences of these concepts can then be learned by higher-order
cortical areas Higher order concepts that are evoked by specific sensory input can
influence the expectations of concepts in lower cortical representations, ultimately anticipating specific patterns of sensory input
Hypothesis testing – how good is the world model?
Can only be achieved through interactions with the environment: inference of the world model with environmental data
Hypothesis testing by the agent differs from common inference techniques in statistics in that the agent seems to be able to actively interact with the environment
Active learning might be necessary to reduce the demands on learning in large systems
Next, we review some recent models that implement and elaborate on the principal ideas outlined in this section
10.3.2 The Boltzmann machine - Intro
Models which are able to learn expectations of sensory states The attractor neural network (ANN) (Ch. 8) More general dynamic models (in this chapter)
Features of the attractor neural network (ANN)
It can be seen as a predictive memory system
Simple recurrent network, trained with Hebbian autocorrelation rules; the corresponding dynamic system has point attractors
Limitations of the ANN
It always produces the same answer given partial input of a sensory state
It does not reflect the probability of different causes in the environment with similar sensory states
Toward a general dynamic system
Introducing hidden nodes to a recurrent system Hidden nodes
in feedforward mapping networks (perceptrons), provide enough internal representations
In recurrent networks, provide enough degrees of freedom
Finding practical training rules for the dynamic system of recurrent networks with hidden nodes has been a major challenge
Extension Distinguish visible nodes and hidden nodes The system still can be described by an energy function Boltzmann machines – symmetrical connections Helmholtz machines – asymmetrical connections
The Boltzmann machine - 1
The energy between two nodes s: the state variable n or m can have values v or h to indicate visible and hidden
nodes
Probabilistic update rule (Glauber dynamics); a small sketch of this update follows the equations below
β: inverse temperature, which describes the competitive interaction between minimizing the energy and the randomizing thermal force
The probability distribution of such a stochastic system is called the Boltzmann-Gibbs distribution
H = -\frac{1}{2} \sum_{ij} w_{ij} s_i^n s_j^m, \quad n, m \in \{v, h\}   (10.6)

p(s_i^n \to 1) = \frac{1}{1 + \exp(-\beta \sum_j w_{ij} s_j^n)}   (10.7)
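A small Python sketch of the stochastic update in Eq. 10.7 (Glauber dynamics) for binary units; the function and variable names are illustrative assumptions.

```python
import numpy as np

def glauber_step(s, w, beta=1.0, rng=None):
    """Asynchronously update one randomly chosen binary unit (Eq. 10.7).

    s : state vector with entries in {0, 1} (visible and hidden units together)
    w : symmetric weight matrix with zero diagonal
    """
    rng = rng or np.random.default_rng()
    i = rng.integers(len(s))
    h_i = w[i] @ s                              # local field of unit i
    p_on = 1.0 / (1.0 + np.exp(-beta * h_i))    # probability of switching the unit on
    s[i] = 1 if rng.random() < p_on else 0
    return s
```

Repeating this step many times drives the network toward the Boltzmann-Gibbs distribution at inverse temperature β.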
The Boltzmann machine - 2
The distribution of visible states in thermal equilibrium
w: the weights of the recurrent network
normalization term called the partition function
Target of the model
With enough hidden nodes and by choosing the right weight values, we want the dynamical system to approximate the probability function of the sensory states caused by the environment
p(\mathbf{s}^v; \mathbf{w}) = \frac{1}{Z} \sum_{\mathbf{s}^h} \exp(-\beta H)   (10.8)

Z = \sum_{\mathbf{s}^v, \mathbf{s}^h} \exp(-\beta H)

p(\mathbf{s}^v): the probability function of the sensory states caused by the environment
The Boltzmann machine - 3
To derive a learning rule we need to define an objective function – the difference between two density functions: the Kullback-Leibler (KL) divergence
Minimizing the KL divergence is equivalent to maximizing the average log-likelihood function
By gradient ascent the learning rule can be written as
‘clamped’ – thermal average of the correlation between two nodes when the states of the visible nodes are fixed
‘free’ – thermal average when the recurrent system is running freely
\mathrm{KL}\big(p(\mathbf{s}^v), p(\mathbf{s}^v; \mathbf{w})\big) = \sum_{\mathbf{s}^v} p(\mathbf{s}^v) \log \frac{p(\mathbf{s}^v)}{p(\mathbf{s}^v; \mathbf{w})}   (10.9)

= \sum_{\mathbf{s}^v} p(\mathbf{s}^v) \log p(\mathbf{s}^v) - \sum_{\mathbf{s}^v} p(\mathbf{s}^v) \log p(\mathbf{s}^v; \mathbf{w})   (10.10)

l(\mathbf{w}) = \langle \log p(\mathbf{s}^v; \mathbf{w}) \rangle = \sum_{\mathbf{s}^v} p(\mathbf{s}^v) \log p(\mathbf{s}^v; \mathbf{w})   (10.11)

\Delta w_{ij} \propto \frac{\partial l}{\partial w_{ij}} = \frac{\beta}{2} \big( \langle s_i s_j \rangle_{\mathrm{clamped}} - \langle s_i s_j \rangle_{\mathrm{free}} \big)   (10.12)
Note: <x> represents expectation of x
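A schematic numpy sketch of the learning rule in Eq. 10.12: estimate the pairwise correlations from samples collected in a clamped phase and in a freely running phase, and move the weights along their difference. Sampling details (burn-in to thermal equilibrium, the β/2 factor) are folded into the learning rate; the names are illustrative.

```python
import numpy as np

def pair_correlations(samples):
    """Thermal average <s_i s_j>, estimated from a list of sampled state vectors."""
    return np.mean([np.outer(s, s) for s in samples], axis=0)

def boltzmann_update(w, clamped_samples, free_samples, eta=0.01):
    """One gradient-ascent step on the average log-likelihood (Eq. 10.12)."""
    dw = pair_correlations(clamped_samples) - pair_correlations(free_samples)
    np.fill_diagonal(dw, 0.0)          # no self-connections
    return w + eta * dw
```

The clamped samples play the role of the sensory-driven "awake" phase and the free samples that of the "sleep" phase mentioned below.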
Features
In principle, the Boltzmann machine can be trained to represent any arbitrary density function, given that the network has a sufficient number of hidden nodes
The clamped phase could be associated with a sensory driven agent during an awake state
The freely running state could be associated with a sleep phase
Limitations Learning is too demanding in practice
The averages have to be evaluated at thermal equilibrium Instability of the gradient method in recurrent systems – small
changes of weights can trigger large changes in the dynamics
Features and limitations of the Boltzmann machine
10.3.3 The restricted Boltzmann machine and contrastive Hebbian learning
From the Boltzmann machine to the restricted Boltzmann machine (RBM)
Training the Boltzmann machine with (10.12) is challenging because the states of the nodes are always changing:
(1) The update rule is probabilistic – even with constant activity of the visible nodes, hidden nodes receive variable input
(2) The recurrent connections between hidden nodes can change the states of the hidden nodes rapidly
RBM: keep (1) and remove (2) by eliminating the recurrent connections within each layer of the BM
Many layers – stacking still gives the abilities of general recurrent networks
Contrastive Hebbian learning
Outline of the basic step of learning an RBM
A sensory input state is applied to the input layer → probabilistic recognition in the hidden layer
The pattern in the hidden layer is used to approximately reconstruct the pattern of the visible nodes
Alternating Gibbs sampling Learning rule
Contrastive divergence (CD) by Geoffrey Hinton et al. Allow some finite number of alternations between hidden responses and
the reconstruction of sensory states Learning with only a few reconstructions is able to self-organize the
system
\Delta w_{ij} \propto \langle s_i^v s_j^h \rangle_{0} - \langle s_i^v s_j^h \rangle_{1}   (10.13)
Fig. 10.12 Alternating Gibbs sampling
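A minimal numpy sketch of one CD-1 step for a binary RBM, following the alternating Gibbs sampling and Eq. 10.13 above; this is an illustrative reimplementation with assumed names, not Hinton's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(w, a, b, v0, eta=0.05, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    w : visible x hidden weight matrix, a/b : visible/hidden biases, v0 : data vector
    """
    rng = rng or np.random.default_rng()
    # up: probabilistic recognition in the hidden layer
    ph0 = sigmoid(v0 @ w + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # down: reconstruct the visible pattern, then go up once more
    pv1 = sigmoid(h0 @ w.T + a)
    ph1 = sigmoid(pv1 @ w + b)
    # contrastive divergence: data correlations minus reconstruction correlations
    w += eta * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    a += eta * (v0 - pv1)
    b += eta * (ph0 - ph1)
    return w, a, b
```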
Deep belief networks
Building a hierarchy of RBM Using the activities of hidden nodes in one layer as inputs to the
next layer Many applications
Object recognition in images, information retrieval, modeling V1-V2, digit classification, music classification, etc.
Layered RBMs as auto-encoders: restricted alternating Gibbs sampling, or contrastive divergence, was used for pre-training; fine-tuned with the backpropagation technique
Note: for us, it is more important to understand how the brain works
General description of the learning process Driven by differences between sensory states caused by the
environment and the expectations generated by the causal model Measure of differences
In the Boltzmann machine – log-likelihood of the data Helmholtz free energy
– L(G): the log-likelihood of the generative model
– p(c;s,Q): the densities of causes produced by the recognition model
– p(c|s;G): the densities of causes produced by the generative model, for a given set of visible states
10.3.4 The Helmholtz machine
A recurrent network with asymmetrical connections
F(Q, G) = -L(G) + \mathrm{KL}\big( p(\mathbf{c}; \mathbf{s}, Q), \; p(\mathbf{c} | \mathbf{s}; G) \big)   (10.14)
Learning algorithm for the Helmholtz machine
Wake-sleep algorithm
Wake phase
Data are applied to the input layer The generative (top-down) weights are trained
Sleep phase Random sequences are produced by the topmost layer and
propagated down with the generative model to the input layer The recognition (bottom-up) weights are trained
Comments on the wake-sleep algorithm
It resembles the expectation-maximization (EM) algorithm
In the case of stochastic sigmoidal neurons, the training algorithm takes the form of Hebbian-type delta rules (see the sketch below)
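A compact sketch, under my own simplifying assumptions (a single hidden layer, no biases, made-up names), of one wake phase and one sleep phase with Hebbian-type delta rules as described above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p, rng):
    return (rng.random(p.shape) < p).astype(float)

def wake_sleep_step(W_rec, W_gen, v_data, eta=0.05, rng=None):
    """One wake-sleep cycle for a single hidden layer.

    W_rec : visible -> hidden recognition weights
    W_gen : hidden -> visible generative weights
    """
    rng = rng or np.random.default_rng()
    n_hidden = W_gen.shape[0]

    # wake phase: recognize the data, then train the generative (top-down) weights
    h = sample(sigmoid(v_data @ W_rec), rng)
    v_pred = sigmoid(h @ W_gen)
    W_gen += eta * np.outer(h, v_data - v_pred)          # delta rule on generation

    # sleep phase: dream from the top, then train the recognition (bottom-up) weights
    h_dream = sample(0.5 * np.ones(n_hidden), rng)       # random top-level pattern
    v_dream = sample(sigmoid(h_dream @ W_gen), rng)
    h_pred = sigmoid(v_dream @ W_rec)
    W_rec += eta * np.outer(v_dream, h_dream - h_pred)   # delta rule on recognition
    return W_rec, W_gen
```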
Simulation (by Hinton et al.)
Online demonstration: http://www.cs.toronto.edu/~hinton/adi
Recognition-readout-and-stimulation layer
Trained by providing labels as inputs for the purpose of ‘reading the mind’ Analogous to brain-computer interfaces developed with EEG, fMRI, etc.
Two possible test modes Supplying a handwritten image and
asking for recognition Asking the system to produce images
of a certain letter The stimulation device allows us to
instruct the system to ‘visualize’ specific letters
The probabilistic nature of the system much better resembles human abilities to produce a variety of responses
10.3.5 Probabilistic reasoning: causal models and Bayesian networks
Anticipating brain system We want to implement general learning machines which are
able to self-organize from experience
Learning of concepts is the basis of forming a general understanding of the environment and enables sophisticated anticipation of causes
Statistical models to formalize statistical reasoning in causal models
Bayesian networks Dynamic Bayesian networks (DBN) Hidden Markov models
Bayesian Networks
Node (circle): random variable Arrows: represent conditional probabilities The whole density function can be factorized due to the
conditional independence of nodes
One can answer specific questions, such as how likely it is that it rains given that the weather forecast calls for rain
P(R, A, X, W) = P(R | A, X) \, P(W | A, X) \, P(A) \, P(X)   (10.15)
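A toy numerical illustration of this factorization with invented conditional probability tables for the four binary variables, answering a query such as P(R = 1 | W = 1) by brute-force marginalization; the numbers and variable semantics are assumptions for illustration only.

```python
from itertools import product

# made-up conditional probability tables for four binary variables (Eq. 10.15)
P_A = {1: 0.3, 0: 0.7}                                        # P(A)
P_X = {1: 0.5, 0: 0.5}                                        # P(X)
P_W1 = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.7, (0, 0): 0.1}   # P(W = 1 | A, X)
P_R1 = {(1, 1): 0.8, (1, 0): 0.5, (0, 1): 0.6, (0, 0): 0.05}  # P(R = 1 | A, X)

def joint(r, a, x, w):
    """P(R, A, X, W) = P(R|A,X) P(W|A,X) P(A) P(X)."""
    pr = P_R1[(a, x)] if r else 1 - P_R1[(a, x)]
    pw = P_W1[(a, x)] if w else 1 - P_W1[(a, x)]
    return pr * pw * P_A[a] * P_X[x]

# query by marginalizing the joint: P(R = 1 | W = 1)
num = sum(joint(1, a, x, 1) for a, x in product([0, 1], repeat=2))
den = sum(joint(r, a, x, 1) for r, a, x in product([0, 1], repeat=3))
print(num / den)
```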
Dynamic Bayesian networks and Hidden Markov models
The dynamic Bayesian network (DBN) takes temporal aspects into account
The hidden Markov model (HMM) can be seen as a special case of a DBN with the following properties:
Markov chain of hidden nodes
An observable node has a hidden node as its parent
Stationary: the laws (conditional probabilities) do not depend on time
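For concreteness, a minimal forward-algorithm sketch for a stationary HMM with two hidden states and two observation symbols, computing the likelihood of an observation sequence; the matrices are made up for illustration.

```python
import numpy as np

# hypothetical HMM: 2 hidden states, 2 observation symbols
pi = np.array([0.6, 0.4])                 # initial hidden-state distribution
A = np.array([[0.7, 0.3],                 # transition probabilities (stationary)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],                 # emission probabilities P(obs | hidden state)
              [0.3, 0.7]])

def forward_likelihood(obs):
    """P(observation sequence) computed with the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward_likelihood([0, 1, 1, 0]))
```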
10.3.6 Expectation maximization (EM)
Here we view the problem in a different way
We assume that a general form of a model is given; the problem is to estimate the parameters of the corresponding generative/recognition models in an unsupervised (or self-supervised) way
Expectation maximization (EM): a technique for parameter estimation
Self-supervised strategy; repeat the following steps until convergence:
E-step: we make assumptions about the training labels (or the probability that the data were produced by a specific cause) from the current model
M-step: use this hypothesis to update the parameters of the model so as to maximize the likelihood of the observations
Example of EM
Simulation of EM
P(c | \mathbf{x}; G) = \frac{P(\mathbf{x} | c; G) \, P(c; G)}{P(\mathbf{x}; G)}   (10.16)
Recognizing data by inverting the generative model using Bayes’ formula
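A compact sketch of the E/M alternation for a two-component 1-D Gaussian mixture, where the E-step is exactly the Bayesian inversion of the generative model in Eq. 10.16; the data, initial values, and fixed variance are my own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])   # toy data from two causes

mu = np.array([-1.0, 1.0])     # initial means of the two causes
pi = np.array([0.5, 0.5])      # initial mixing proportions
sigma = 1.0                    # fixed, shared standard deviation

for _ in range(50):
    # E-step: P(cause c | x; G) by inverting the generative model (Eq. 10.16);
    # the Gaussian normalization constant cancels because sigma is shared
    lik = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) * pi
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step: update the parameters to maximize the expected log-likelihood
    pi = resp.mean(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

print(mu, pi)   # the means should approach -2 and 3
```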
10.4 Adaptive resonance theory (ART)
Contents of the book so far – important concepts underlying cognitive processes Learning Different forms of memory Self-organization Attention Anticipation
ART is an important theory that combines many of these concepts and explains how they are related
Basic ideas: Stephen Grossberg, 1976
Formal theory: Carpenter and Grossberg, 1987
Extensions: ART1 (binary patterns), ART2/fuzzy ART (real-valued patterns), ARTMAP/fuzzy ARTMAP (supervised learning)
10.4.1 The basic ART model
Theory that specifies more directly how bottom-up and top-down processes interact to guide learning
Plasticity-stability dilemma (7.2.3): a major challenge for advanced learning machines
The learning system should learn new concepts or refine learned concepts quickly
The system should be stable enough not to overthrow the experience and world model it acquired over its development
Questions when a pattern is observed in the environment:
How should this experience change our world model, should it change our acquired concepts, or should it learn the new input as a new concept?
How much should a new input change an existing concept?
When is an input sufficiently different from everything the system has experienced before to warrant the creation of a new concept?
The basic ART model
Three subsystems Attentional subsystem, orienting subsystem,
gain-control subsystem
Two layers: F1, F2
F2: • categories • competition among categories • selection of a winning category
F1: • features • receives some unspecific gain input • selection of a category in F2 cancels the gain input
Adaptive resonance of weights
The confirmation and refinement of the car category is achieved through a resonant state.
This occurs when the activation of a specific category in F2 is mapped back to F1 (attentional process)
Matching between the input and some category; exact equivalence is not required – this matching state is termed resonance
The corresponding resonant state refines the weights through Hebbian learning
This learning process reinforces old features while also taking new features into account
If a new instance of a category is experienced, this example should update the representation of that category
Orienting subsystem
Decides when the input should be treated as a new category
Checks the similarity between the state in F1 and the original input
ρ: vigilance, a threshold on the similarity
If the difference exceeds the vigilance criterion, a search for a new category is triggered; the process resembles an orienting mechanism in a visual search process
The search process may result in selecting or creating a new category
Factors of the search process The way bottom-up and top-down signals are combined in F1
The specific normalization of weights and the activity in layer F1
The choice of the vigilance parameter
10.4.2 ART1 equations
Specific example implementation for binary input patterns Nodes in the ART architecture: leaky integrators that receive excitatory
and inhibitory inputs
Equation that describes the dynamics of the internal states of nodes in F1
The constants a1, b1, and c1 are all positive
Three excitatory inputs: g: the input from the gain system
Necessary in the case of a new search
Output of the F1 units
and effective input to F2
\tau_1 \frac{\mathrm{d}x_i}{\mathrm{d}t} = -x_i + (1 - a_1 x_i) I_i^{+} - (b_1 + c_1 x_i) I_i^{-}   (10.17)

I_i^{+} = I_i + d_1 v_i + b_1 g   (10.18)

g = 1 if an input is provided but F2 is not active, g = 0 otherwise

s_i = 1 for x_i > 0, s_i = 0 otherwise

\mathbf{u} = (\mathbf{w}^{b})^{T} \mathbf{s}
ART1 equations
F2 layer is competitive – need to use coupled differential equations for the dynamic neural field model
Simplifying with a winner-takes-all maximum calculation Output of layer F2
Top-down input to layer F1
Top-down weight matrix update
Only connections between the winning node in F2 and active nodes in F1 are changed
Bottom-up weight update
u_i \to 1 \text{ if } i = \arg\max_j u_j, \quad u_i \to 0 \text{ otherwise}

\mathbf{v} = \mathbf{w}^{t} \mathbf{u}

\frac{\mathrm{d}w_{ij}^{t}}{\mathrm{d}t} = u_i (s_j - w_{ij}^{t})   (10.23)

\frac{\mathrm{d}w_{ij}^{b}}{\mathrm{d}t} = \kappa \, u_j \Big( L (1 - w_{ij}^{b}) s_i - w_{ij}^{b} \sum_{k \neq i} s_k \Big)   (10.24)
10.4.3 Simplified dynamics for unsupervised letter clustering
Demonstrating the self-organized clustering properties Before input to the model is given, the equilibrium activity of
F1 nodes (dx/dt =0) is
During the early processing states before top-down signals become effective, the equilibrium values are
When top-down input becomes effective, the gain will be g = 0 and the equilibrium value is
In the following simulations, a model with three corresponding states is used
before input:   x_i = \frac{-b_1}{1 + c_1}

early processing:   x_i = \frac{I_i}{1 + a_1 (I_i + b_1) + c_1}

top-down effective (g = 0):   x_i = \frac{I_i + d_1 v_i - b_1}{1 + a_1 (I_i + d_1 v_i) + c_1}
Simulation of simplified ART1 processing
Problem: learning visual representations of the letters of the alphabet, without labels
Each of the letter patterns is modified with 10% noise
Result (figure on the right): after presenting 100 examples of each letter, a prototype vector for each of the 26 categories, corresponding to the nodes in layer F2
Most of the categories have been able to extract the underlying pattern perfectly (a simplified clustering sketch follows below)
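A compact fast-learning, ART1-style clustering sketch in the spirit of this simulation, for binary patterns with a vigilance test and creation of new categories; the simplifications and parameter values are mine, not the book's exact implementation.

```python
import numpy as np

def art1_cluster(patterns, rho=0.7, L=2.0):
    """Cluster binary (0/1) patterns with a simplified fast-learning ART1 scheme.

    patterns : array of shape (n_patterns, n_features), each row with at least one active bit
    rho      : vigilance threshold; higher values yield more, finer categories
    L        : choice parameter of the bottom-up weights
    """
    categories = []                                             # one binary prototype per category
    for p in patterns:
        # bottom-up choice: rank existing categories by normalized overlap with the input
        scores = [np.sum(w * p) / (L - 1.0 + np.sum(w)) for w in categories]
        for j in np.argsort(scores)[::-1]:
            match = np.sum(categories[j] * p) / np.sum(p)       # vigilance (similarity) test
            if match >= rho:
                categories[j] = categories[j] * p               # resonance: refine the prototype
                break
        else:
            categories.append(p.copy())                         # orienting: create a new category
    return categories
```

With noisy letter images as rows of `patterns`, each returned prototype approximates the underlying noise-free letter, mirroring the clustering result described above.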
10.5 Where to go from here
It is vital to connect experimental techniques to more quantitative endeavors
Theoretical studies must be rooted in experimental knowledge The brain is one of the most challenging systems on our
planet Quite specific proposals: brain as an anticipatory memory
system, … A current challenge in computational neuroscience is the
multitude of models with diverse aims Applications
Health-care applications, e.g. advanced rehabilitation treatment after brain damage