10. The cognitive brain
Lecture Notes on Brain and Computation
Summary by Byoung-Hee Kim
Lecture by Byoung-Tak Zhang
Biointelligence Laboratory, School of Computer Science and Engineering
Graduate Programs in Cognitive Science, Brain Science and Bioinformatics; Brain-Mind-Behavior Concentration Program
Seoul National University
E-mail: [email protected]
This material is available online at http://bi.snu.ac.kr/
Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2010.
(C) 2010 SNU CSE Biointelligence Lab, http://bi.snu.ac.kr
Introduction
This chapter continues the discussion of system-level models of the brain
Layered representation of the brain: invariant object recognition, visual attention
Workspace hypothesis: how the brain is able to produce novel solutions to new tasks
General discussion of brain theory in bidirectional layered models of cortex – anticipation is a central feature
Bayesian formulations: Boltzmann machine / Helmholtz machine / deep belief networks
Probabilistic reasoning: Bayesian networks, EM
Adaptive resonance theory (ART)
Outline
10.1 Hierarchical maps and attentive vision
10.2 An interconnecting workspace hypothesis
10.3 The anticipating brain
10.4 Adaptive resonance theory
10.5 Where to go from here
10.1 Hierarchical maps and attentive vision
Invariant object recognition of humans Recognize objects even though they vary in form, size, location,
and viewing angle
Neural systems that learn to recognize objects through supervised learning in mapping networks are quite sensitive to changes in the input vector
Solution: hierarchical networks Related question: attention in the visual system
10.1.1 Invariant object recognition
VisNet Model
Each layer is a competitive map; competition is implemented through adjustment of the firing threshold of nodes until a predefined sparseness is reached (see Section 7.5.1)
The model is trained on sequences of patterns from movements of objects in the visual field
Weights between the layers are adjusted with Hebbian learning
Early version: Hebbian learning with a trace rule – some memory in the neural activity (the trace) was used, exploiting temporal associations between moving objects (a minimal sketch of such a trace rule is given below)
Recent model: without a trace rule, when objects in consecutive time steps have some overlap with previous neural representations, using spatial associations
Able to learn invariant object recognition: invariance to translation, rotation, and size
Multiple objects can be trained simultaneously
The information flow between the layers is strictly feedforward
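A minimal numpy sketch of a Hebbian update with a trace rule in the spirit of the early VisNet learning rule described above; the function and parameter names are illustrative assumptions, not the original implementation.

```python
import numpy as np

def trace_rule_update(w, x_pre, y_post, trace, eta=0.01, lam=0.8):
    """One Hebbian step with a trace rule (illustrative sketch, assumed names).

    w      : weight matrix, shape (n_post, n_pre)
    x_pre  : presynaptic activity vector, shape (n_pre,)
    y_post : postsynaptic activity vector, shape (n_post,)
    trace  : running trace of recent postsynaptic activity, shape (n_post,)
    lam    : trace decay; lam = 0 recovers plain Hebbian learning
    """
    trace = lam * trace + (1.0 - lam) * y_post              # memory of recent activity
    w += eta * np.outer(trace, x_pre)                       # associate the trace with the current input
    w /= np.linalg.norm(w, axis=1, keepdims=True) + 1e-12   # keep weights bounded (normalization)
    return w, trace
```

Because the trace outlasts a single presentation, consecutive views of a moving object are associated with the same output nodes, which is what produces the temporal-association effect described above.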
10.1.2 Attentive vision
Top-down information flow in cortical models is crucial in cognitive processes
Example: visual search demands the top-down influence of an object bias that specifies what to look for ↔ object recognition demands spatial attention
The overall scheme of the model for attentive vision
Model by Gustavo Deco et al. Three important parts: ‘V1-V4’, ‘IT’, ‘PP’
We are particularly interested in how the different parts influence each other
Roles of parts: LGN
Representations in the LGN: Gabor functions
Bottom-up input images to the model are decomposed with Gabor functions
Tuning curves in the LGN can be parameterized with Gabor functions (see the sketch below)
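For illustration, a minimal 2D Gabor function of the kind used to parameterize such tuning curves and to decompose input images; the parameter names and values are my own assumptions.

```python
import numpy as np

def gabor(size=32, wavelength=8.0, theta=0.0, sigma=4.0, phase=0.0):
    """Return a size x size Gabor patch: a sinusoidal carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    xr = x * np.cos(theta) + y * np.sin(theta)        # coordinate along the preferred orientation
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# a small filter bank over orientations, as used to decompose an input image
bank = [gabor(theta=t) for t in np.linspace(0.0, np.pi, 4, endpoint=False)]
```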
Roles of parts: ‘V1-V4’
Principal role in the model: decomposing the visual scene into features
V1 neurons respond mainly to simple features – orientation of edges; later areas are specialized to represent other features – color, motion, combinations of basic features
Modeling issue
Simplification: features are just represented in sections which correspond to the location of the object in the visual field
Each section represents a feature of a part as a vector of node activities
No global competition between the modules; only within each section – an inhibitory pool keeps the sparseness of the activity in each section roughly constant (see Section 7.5.1)
Roles of parts: ‘IT’
IT: model of processes in the inferior-temporal cortex Known to be involved in object recognition Modeled as associative memory (refer Ch. 8) Attractor network: Point attractors of specific objects are
formed through the Hebbian-trained collateral connections within this structure
The connections between ‘V1-V4’ and ‘IT’ can also be trained with Hebbian learning
This enables simulation of translation-invariant object recognition
Roles of parts: ‘IT’
The contribution of the attractor network in ‘IT’ to translation-invariant object recognition explains some recent experimental findings: the size of the receptive field of IT neurons depends on the content of the visual field and the specifics of the task
Example: for a single object on a screen with a blank background, the receptive field of a neuron that responds to this object can be very large (> 30 degrees)
Roles of parts: ‘IT’
If two objects are presented simultaneously, or if the target object is shown on top of a complex background (which can be viewed as a scene with many objects)
Then the size of the receptive field shrinks markedly Fig. 10.4(A)
Roles of parts: ‘IT’
Hypothesis given by the model Based on the attractor dynamics of the autoassociator network in ‘IT’ If only one object is shown, then this object would trigger the right point
attractor and thus recall of the object regardless of its location, which corresponds to large receptive fields
If two or more trained objects are shown, then it is likely that the final state of the attractor network is mainly dominated by the object closest to the fovea, which gets the most weight due to cortical magnification
Fig. 10.4(B)
10.1.3 Attentional bias in visual search and object recognition
Object bias input to the attractor network in ‘IT’ Tells the system what to look for in visual search task
Such top-down information is thought to originate in the frontal areas of the brain
Can speed up the recognition process in ‘IT’ Supports the recognition ability of the input from ‘V1-V4’ that
corresponds to the target object in visual search The receptive fields of target objects are larger than receptive
fields of non-target objects
Attentional bias in object recognition
Top-down input to a specific location in ‘PP’ Object recognition task The label of this module suggests processing in the posterior
parietal cortex, which is part of the dorsal visual processing pathway (‘where’ pathway)
Modeled with a spatially organized neural sheet, which is connected to the corresponding section in ‘V1-V4’
Enhance the neural activity in ‘V1-V4’ for the features of the object that are located at the corresponding location in the visual field
Faster completion of the input patterns for the ‘IT’ network
Faster object recognition at the corresponding location
Attentional bias – summary
Origins of the attentional biases Visual search: top-down input to ‘IT’, object-based Object recognition: acts on ‘PP’, location-based
It may be difficult to separate the different forms of attention in experiments because all parts of the model are bidirectional
Numerical experiments
Simulation experiment by Deco et al. Target: the letter ‘E’ Distractors: visually different (‘X’, Fig. 5A) and similar (‘F’,
Fig. 5B) Reaction time
Independent of the number of objects (Fig. 5A) Linear increase with the number of objects (Fig. 5B)
Both modes are present in the same ‘parallel architecture’ The apparent serial search is only due to the more
intense conflict-resolution demand in the recognition process
10.2 An interconnecting workspace hypothesis
Humans are very flexible in coping with the complex world
Example: humans can drive a car with such ease that little attention is required, despite the fact that we have to react to an unknown environment
Model to solve complex cognitive tasks
Attributes: flexible cooperation of many specialized modules; system-level perspective
specialized processors + global system activities; flexibility + robustness
10.2.1 The global workspace
Model by S. Dehaene et al. (1995) Five basic subsystems
Perceptual (input), motor (output), three cognitive subsystems
Global workspace: interconnecting network among subsystems
Projections between cortical areas are indeed abundant
Large portion of the global workspace could be localized within layers II and III
10.2.2 Demonstration of the global workspace in the Stroop task
Stroop task: Task 1 – read the word, or Task 2 – name the color in which the word is written; you should do it rapidly
Idea: the global workspace has to become active to ‘re-wire’ the commonly active word-naming configuration of the brain
Example stimuli (color words): Yellow, Green, Blue
Model that can be tested on a Stroop task to demonstrate the idea
Three specialized processors
Top-down influence of workspace nodes on the nodes in the specialized processors
Reward signal – indicates a mismatch between the desired response and the actual response of the system
Vigilance parameter
Usage of the model Predictions Flexibility to solve different tasks
10.2.2 Demonstration of the global workspace in the Stroop task
10.3 The anticipating brain
Layered architectures with bottom-up and top-down information flow are important for cognitive processes Previous examples: VisNet, interconnecting workspace hypothesis
Generalization (emerging brain theory) brain-style information processing principles general hypothesis of how the brain implements cognitive functions Brain as an anticipating memory system
Remainder is a discussion of Anticipating brain hypothesis Related model implementations
The anticipating brain – factors
Factors that are essential in realizing cognitive functions The brain can develop a model of the world, which can be used
to anticipate or predict the environment The inverse of the model can be used to recognize causes by
evoking internal concepts Hierarchical representations are essential to capture the richness
of the world Internal concepts are learned through matching the brain’s
hypotheses with input from the world An agent can learn actively by testing hypotheses through
actions The temporal domain is an important degree of freedom
10.3.1 The brain as anticipatory system in a probabilistic framework
Notations
Sensory state: s
c: causal state
g: describes the physical process of generating the sensory response
a: action of the agent
s’: internal representation of sensory states in primary sensory cortex
c’: higher-order cortical representations, called concepts
Generative model G: on an abstract level, we see the brain as a generative model of the world
Recognition model Q: the inverse of G, which evokes internal concepts from causes in the environment
\mathbf{s} = g(\mathbf{c}), \quad p(\mathbf{s}'|\mathbf{s}, \mathbf{c}'), \quad p(\mathbf{s}'|\mathbf{c}, \mathbf{a}), \quad p(\mathbf{s}'; G)
Definition of the term ‘causes’ highlights two important functions of brain processing First, one of the major goals of the brain is to learn what causes are by
forming internal concepts Second, the brain must learn concepts at different levels of abstraction Learning concepts and predicting causes in our environment is central
to the thesis developed in this chapter
Agent: system that can explore the environment actively via interaction with the environment
Conjecture The brain is trying to match sensory input with internally generated
states
The world model in a probabilistic framework
Layered structure that includes the necessary bidirectional connections Related models: deep belief networks
Highly interactive system: it is not easy to clearly separate the generative model from the recognition model
Model form: Bayesian network or causal graphical model
Concepts at different levels of cortical representation learned in a self-supervised way through the interaction with the
environment Engrained into a memory system
Example from the visual system Early level can learn to recognize different sequences of retinal
patterns Sequences of these concepts can then be learned by higher-order
cortical areas Higher order concepts that are evoked by specific sensory input can
influence the expectations of concepts in lower cortical representations, ultimately anticipating specific patterns of sensory input
Hypothesis testing – how good is the world model?
Can only be achieved through interactions with the environment: inference of the world model with environmental data
Hypothesis testing by the agent differs from common inference techniques in statistics in that the agent seems to be able to actively interact with the environment
Active learning might be necessary to reduce the demands on learning in large systems
Next, we review some recent models that implement and elaborate on the principal ideas outlined in this section
10.3.2 The Boltzmann machine - Intro
Models which are able to learn expectations of sensory states The attractor neural network (ANN) (Ch. 8) More general dynamic models (in this chapter)
Features of the attractor neural network (ANN)
It can be seen as a predictive memory system
Simple recurrent network, trained with Hebbian autocorrelation rules; the corresponding dynamic system has point attractors
Limitations of the ANN
It always produces the same answer given partial input of a sensory state
It does not reflect the probability of different causes in the environment with similar sensory states
Toward a general dynamic system
Introducing hidden nodes to a recurrent system Hidden nodes
in feedforward mapping networks (perceptrons), provide enough internal representations
In recurrent networks, provide enough degrees of freedom
Finding practical training rules for the dynamic system of recurrent networks with hidden nodes has been a major challenge
Extension Distinguish visible nodes and hidden nodes The system still can be described by an energy function Boltzmann machines – symmetrical connections Helmholtz machines – asymmetrical connections
The Boltzmann machine - 1
The energy between two nodes s: the state variable n or m can have values v or h to indicate visible and hidden
nodes
Probabilistic update rule (Glauber dynamics); a small sketch of this update follows the equations below
β: inverse temperature, which describes the competitive interaction between minimizing the energy and the randomizing thermal force
The probability distribution of such a stochastic system is called the Boltzmann-Gibbs distribution
H = -\frac{1}{2} \sum_{ij} w_{ij} s_i^n s_j^m, \quad n, m \in \{v, h\}   (10.6)

p(s_i^n \to 1) = \frac{1}{1 + \exp(-\beta \sum_j w_{ij} s_j^n)}   (10.7)
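A small Python sketch of the stochastic update in Eq. 10.7 (Glauber dynamics) for binary units; the function and variable names are illustrative assumptions.

```python
import numpy as np

def glauber_step(s, w, beta=1.0, rng=None):
    """Asynchronously update one randomly chosen binary unit (Eq. 10.7).

    s : state vector with entries in {0, 1} (visible and hidden units together)
    w : symmetric weight matrix with zero diagonal
    """
    rng = rng or np.random.default_rng()
    i = rng.integers(len(s))
    h_i = w[i] @ s                              # local field of unit i
    p_on = 1.0 / (1.0 + np.exp(-beta * h_i))    # probability of switching the unit on
    s[i] = 1 if rng.random() < p_on else 0
    return s
```

Repeating this step many times drives the network toward the Boltzmann-Gibbs distribution at inverse temperature β.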
The Boltzmann machine - 2
The distribution of visible states in thermal equilibrium
w: the weights of the recurrent network
normalization term called the partition function
Target of the model
With enough hidden nodes and by choosing the right weight values, we want the dynamical system to approximate the probability function of the sensory states caused by the environment
p(\mathbf{s}^v; \mathbf{w}) = \frac{1}{Z} \sum_{\mathbf{s}^h} \exp(-\beta H)   (10.8)

Z = \sum_{\mathbf{s}^v, \mathbf{s}^h} \exp(-\beta H)

p(\mathbf{s}^v): the probability function of the sensory states caused by the environment
The Boltzmann machine - 3
To derive a learning rule we need to define an objective function – the difference between two density functions: the Kullback-Leibler (KL) divergence
Minimizing the KL divergence is equivalent to maximizing the average log-likelihood function
By gradient ascent the learning rule can be written as
‘clamped’ – thermal average of the correlation between two nodes when the states of the visible nodes are fixed
‘free’ – thermal average when the recurrent system is running freely
\mathrm{KL}\big(p(\mathbf{s}^v), p(\mathbf{s}^v; \mathbf{w})\big) = \sum_{\mathbf{s}^v} p(\mathbf{s}^v) \log \frac{p(\mathbf{s}^v)}{p(\mathbf{s}^v; \mathbf{w})}   (10.9)

= \sum_{\mathbf{s}^v} p(\mathbf{s}^v) \log p(\mathbf{s}^v) - \sum_{\mathbf{s}^v} p(\mathbf{s}^v) \log p(\mathbf{s}^v; \mathbf{w})   (10.10)

l(\mathbf{w}) = \langle \log p(\mathbf{s}^v; \mathbf{w}) \rangle = \sum_{\mathbf{s}^v} p(\mathbf{s}^v) \log p(\mathbf{s}^v; \mathbf{w})   (10.11)

\Delta w_{ij} \propto \frac{\partial l}{\partial w_{ij}} = \frac{\beta}{2} \big( \langle s_i s_j \rangle_{\mathrm{clamped}} - \langle s_i s_j \rangle_{\mathrm{free}} \big)   (10.12)
Note: <x> represents expectation of x
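A schematic numpy sketch of the learning rule in Eq. 10.12: estimate the pairwise correlations from samples collected in a clamped phase and in a freely running phase, and move the weights along their difference. Sampling details (burn-in to thermal equilibrium, the β/2 factor) are folded into the learning rate; the names are illustrative.

```python
import numpy as np

def pair_correlations(samples):
    """Thermal average <s_i s_j>, estimated from a list of sampled state vectors."""
    return np.mean([np.outer(s, s) for s in samples], axis=0)

def boltzmann_update(w, clamped_samples, free_samples, eta=0.01):
    """One gradient-ascent step on the average log-likelihood (Eq. 10.12)."""
    dw = pair_correlations(clamped_samples) - pair_correlations(free_samples)
    np.fill_diagonal(dw, 0.0)          # no self-connections
    return w + eta * dw
```

The clamped samples play the role of the sensory-driven "awake" phase and the free samples that of the "sleep" phase mentioned below.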
Features
In principle, the Boltzmann machine can be trained to represent any arbitrary density function, given that the network has a sufficient number of hidden nodes
The clamped phase could be associated with a sensory driven agent during an awake state
The freely running state could be associated with a sleep phase
Limitations Learning is too demanding in practice
The averages have to be evaluated at thermal equilibrium Instability of the gradient method in recurrent systems – small
changes of weights can trigger large changes in the dynamics
Features and limitations of the Boltzmann machine
10.3.3 The restricted Boltzmann machine and contrastive Hebbian learning
From the Boltzmann machine to the restricted Boltzmann machine (RBM)
Training the Boltzmann machine with (10.12) is challenging because the states of the nodes are always changing:
(1) The update rule is probabilistic – even with constant activity of the visible nodes, hidden nodes receive variable input
(2) The recurrent connections between hidden nodes can change the states of the hidden nodes rapidly
RBM: keep (1) and remove (2) by eliminating the recurrent connections within each layer of the BM
Many layers – stacking still gives the abilities of general recurrent networks
Contrastive Hebbian learning
Outline of the basic step of learning an RBM
A sensory input state is applied to the input layer → probabilistic recognition in the hidden layer
The pattern in the hidden layer is used to approximately reconstruct the pattern of the visible nodes
Alternating Gibbs sampling Learning rule
Contrastive divergence (CD) by Geoffrey Hinton et al. Allow some finite number of alternations between hidden responses and
the reconstruction of sensory states Learning with only a few reconstructions is able to self-organize the
system
\Delta w_{ij} \propto \langle s_i^v s_j^h \rangle_{0} - \langle s_i^v s_j^h \rangle_{1}   (10.13)
Fig. 10.12 Alternating Gibbs sampling
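A minimal numpy sketch of one CD-1 step for a binary RBM, following the alternating Gibbs sampling and Eq. 10.13 above; this is an illustrative reimplementation with assumed names, not Hinton's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(w, a, b, v0, eta=0.05, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    w : visible x hidden weight matrix, a/b : visible/hidden biases, v0 : data vector
    """
    rng = rng or np.random.default_rng()
    # up: probabilistic recognition in the hidden layer
    ph0 = sigmoid(v0 @ w + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # down: reconstruct the visible pattern, then go up once more
    pv1 = sigmoid(h0 @ w.T + a)
    ph1 = sigmoid(pv1 @ w + b)
    # contrastive divergence: data correlations minus reconstruction correlations
    w += eta * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    a += eta * (v0 - pv1)
    b += eta * (ph0 - ph1)
    return w, a, b
```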
Deep belief networks
Building a hierarchy of RBM Using the activities of hidden nodes in one layer as inputs to the
next layer Many applications
Object recognition in images, information retrieval, modeling V1-V2, digit classification, music classification, etc.
Layered RBMs as auto-encoders: restricted alternating Gibbs sampling, or contrastive divergence, was used for pre-training; fine-tuned with the backpropagation technique
Note: for us, it is more important to understand how the brain works
General description of the learning process Driven by differences between sensory states caused by the
environment and the expectations generated by the causal model Measure of differences
In the Boltzmann machine – log-likelihood of the data Helmholtz free energy
– L(G): the log-likelihood of the generative model
– p(c;s,Q): the densities of causes produced by the recognition model
– p(c|s;G): the densities of causes produced by the generative model, for a given set of visible states
10.3.4 The Helmholtz machine
A recurrent network with asymmetrical connections
F(Q, G) = -L(G) + \mathrm{KL}\big( p(\mathbf{c}; \mathbf{s}, Q), \; p(\mathbf{c} | \mathbf{s}; G) \big)   (10.14)
Learning algorithm for the Helmholtz machine
Wake-sleep algorithm
Wake phase
Data are applied to the input layer The generative (top-down) weights are trained
Sleep phase Random sequences are produced by the topmost layer and
propagated down with the generative model to the input layer The recognition (bottom-up) weights are trained
Comments on the wake-sleep algorithm
It resembles the expectation-maximization (EM) algorithm
In the case of stochastic sigmoidal neurons, the training algorithm takes the form of Hebbian-type delta rules (see the sketch below)
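A compact sketch, under my own simplifying assumptions (a single hidden layer, no biases, made-up names), of one wake phase and one sleep phase with Hebbian-type delta rules as described above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p, rng):
    return (rng.random(p.shape) < p).astype(float)

def wake_sleep_step(W_rec, W_gen, v_data, eta=0.05, rng=None):
    """One wake-sleep cycle for a single hidden layer.

    W_rec : visible -> hidden recognition weights
    W_gen : hidden -> visible generative weights
    """
    rng = rng or np.random.default_rng()
    n_hidden = W_gen.shape[0]

    # wake phase: recognize the data, then train the generative (top-down) weights
    h = sample(sigmoid(v_data @ W_rec), rng)
    v_pred = sigmoid(h @ W_gen)
    W_gen += eta * np.outer(h, v_data - v_pred)          # delta rule on generation

    # sleep phase: dream from the top, then train the recognition (bottom-up) weights
    h_dream = sample(0.5 * np.ones(n_hidden), rng)       # random top-level pattern
    v_dream = sample(sigmoid(h_dream @ W_gen), rng)
    h_pred = sigmoid(v_dream @ W_rec)
    W_rec += eta * np.outer(v_dream, h_dream - h_pred)   # delta rule on recognition
    return W_rec, W_gen
```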
Simulation (by Hinton et al.)
Online demonstration: http://www.cs.toronto.edu/~hinton/adi
Recognition-readout-and-stimulation layer
Trained by providing labels as inputs for the purpose of ‘reading the mind’ Analogous to brain-computer interfaces developed with EEG, fMRI, etc.
Two possible test modes Supplying a handwritten image and
asking for recognition Asking the system to produce images
of a certain letter The stimulation device allows us to
instruct the system to ‘visualize’ specific letters
The probabilistic nature of the system much better resembles human abilities to produce a variety of responses
10.3.5 Probabilistic reasoning: causal models and Bayesian networks
Anticipating brain system We want to implement general learning machines which are
able to self-organize from experience
Learning of concepts is the basis of forming a general understanding of the environment and enables sophisticated anticipation of causes
Statistical models to formalize statistical reasoning in causal models
Bayesian networks Dynamic Bayesian networks (DBN) Hidden Markov models
Bayesian Networks
Node (circle): random variable Arrows: represent conditional probabilities The whole density function can be factorized due to the
conditional independence of nodes
One can answer specific questions, such as how likely it is that it rains given that the weather forecast calls for rain
P(R, A, X, W) = P(R | A, X) \, P(W | A, X) \, P(A) \, P(X)   (10.15)
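A toy numerical illustration of this factorization with invented conditional probability tables for the four binary variables, answering a query such as P(R = 1 | W = 1) by brute-force marginalization; the numbers and variable semantics are assumptions for illustration only.

```python
from itertools import product

# made-up conditional probability tables for four binary variables (Eq. 10.15)
P_A = {1: 0.3, 0: 0.7}                                        # P(A)
P_X = {1: 0.5, 0: 0.5}                                        # P(X)
P_W1 = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.7, (0, 0): 0.1}   # P(W = 1 | A, X)
P_R1 = {(1, 1): 0.8, (1, 0): 0.5, (0, 1): 0.6, (0, 0): 0.05}  # P(R = 1 | A, X)

def joint(r, a, x, w):
    """P(R, A, X, W) = P(R|A,X) P(W|A,X) P(A) P(X)."""
    pr = P_R1[(a, x)] if r else 1 - P_R1[(a, x)]
    pw = P_W1[(a, x)] if w else 1 - P_W1[(a, x)]
    return pr * pw * P_A[a] * P_X[x]

# query by marginalizing the joint: P(R = 1 | W = 1)
num = sum(joint(1, a, x, 1) for a, x in product([0, 1], repeat=2))
den = sum(joint(r, a, x, 1) for r, a, x in product([0, 1], repeat=3))
print(num / den)
```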
Dynamic Bayesian networks and Hidden Markov models
The dynamic Bayesian network (DBN) takes temporal aspects into account
The hidden Markov model (HMM) can be seen as a special case of a DBN with the following properties:
Markov chain of hidden nodes
An observable node has a hidden node as its parent
Stationary: the laws (conditional probabilities) do not depend on time
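For concreteness, a minimal forward-algorithm sketch for a stationary HMM with two hidden states and two observation symbols, computing the likelihood of an observation sequence; the matrices are made up for illustration.

```python
import numpy as np

# hypothetical HMM: 2 hidden states, 2 observation symbols
pi = np.array([0.6, 0.4])                 # initial hidden-state distribution
A = np.array([[0.7, 0.3],                 # transition probabilities (stationary)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],                 # emission probabilities P(obs | hidden state)
              [0.3, 0.7]])

def forward_likelihood(obs):
    """P(observation sequence) computed with the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward_likelihood([0, 1, 1, 0]))
```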
10.3.6 Expectation maximization (EM)
Here we view the problem in a different way
We assume that a general form of a model is given; the problem is to estimate the parameters of the corresponding generative/recognition models in an unsupervised (or self-supervised) way
Expectation maximization (EM): a technique for parameter estimation
Self-supervised strategy; repeat the following steps until convergence:
E-step: we make assumptions about the training labels (or the probability that the data were produced by a specific cause) from the current model
M-step: use this hypothesis to update the parameters of the model so as to maximize the likelihood of the observations
Example of EM
Simulation of EM
P(c | \mathbf{x}; G) = \frac{P(\mathbf{x} | c; G) \, P(c; G)}{P(\mathbf{x}; G)}   (10.16)
Recognizing data by inverting the generative model using Bayes’ formula
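A compact sketch of the E/M alternation for a two-component 1-D Gaussian mixture, where the E-step is exactly the Bayesian inversion of the generative model in Eq. 10.16; the data, initial values, and fixed variance are my own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])   # toy data from two causes

mu = np.array([-1.0, 1.0])     # initial means of the two causes
pi = np.array([0.5, 0.5])      # initial mixing proportions
sigma = 1.0                    # fixed, shared standard deviation

for _ in range(50):
    # E-step: P(cause c | x; G) by inverting the generative model (Eq. 10.16);
    # the Gaussian normalization constant cancels because sigma is shared
    lik = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) * pi
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step: update the parameters to maximize the expected log-likelihood
    pi = resp.mean(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

print(mu, pi)   # the means should approach -2 and 3
```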
10.4 Adaptive resonance theory (ART)
Contents of the book so far – important concepts underlying cognitive processes Learning Different forms of memory Self-organization Attention Anticipation
ART is an important theory that combines many of these concepts and explains how they are related
Basic ideas: Stephen Grossberg, 1976
Formal theory: Carpenter and Grossberg, 1987
Extensions: ART1 (binary patterns), ART2/fuzzy ART (real-valued patterns), ARTMAP/fuzzy ARTMAP (supervised learning)
10.4.1 The basic ART model
Theory that specifies more directly how bottom-up and top-down processes interact to guide learning
Plasticity-stability dilemma (7.2.3): a major challenge for advanced learning machines
The learning system should learn new concepts or refine learned concepts quickly
The system should be stable enough not to overthrow the experience and world model it acquired over its development
Questions when a pattern is observed in the environment:
How should this experience change our world model, should it change our acquired concepts, or should it learn the new input as a new concept?
How much should a new input change an existing concept?
When is an input sufficiently different from everything the system has experienced before to warrant the creation of a new concept?
The basic ART model
Three subsystems Attentional subsystem, orienting subsystem,
gain-control subsystem
Two layers: F1, F2
F2: • categories • competition among categories • selection of a winning category
F1: • features • receives some unspecific gain input • selection of a category in F2 cancels the gain input
Adaptive resonance of weights
The confirmation and refinement of the car category is achieved through a resonant state.
This occurs when the activation of a specific category in F2 is mapped back to F1 (attentional process)
Matching between the input and some category; exact equivalence is not required – this matching state is termed resonance
The corresponding resonant state refines the weights through Hebbian learning
This learning process reinforces old features while also taking new features into account
If a new instance of a category is experienced, this example should update the representation of that category
Orienting subsystem
Decides when the input should be treated as a new category
Checks the similarity between the state in F1 and the original input
ρ: vigilance, a threshold on the similarity
If the difference exceeds the vigilance criterion, a search for a new category is triggered; the process resembles an orienting mechanism in a visual search process
The search process may result in selecting or creating a new category
Factors of the search process The way bottom-up and top-down signals are combined in F1
The specific normalization of weights and the activity in layer F1
The choice of the vigilance parameter
10.4.2 ART1 equations
Specific example implementation for binary input patterns Nodes in the ART architecture: leaky integrators that receive excitatory
and inhibitory inputs
Equation that describes the dynamics of the internal states of nodes in F1
The constants a1, b1, and c1 are all positive
Three excitatory inputs: g: the input from the gain system
Necessary in the case of a new search
Output of the F1 units
and effective input to F2
\tau_1 \frac{\mathrm{d}x_i}{\mathrm{d}t} = -x_i + (1 - a_1 x_i) I_i^{+} - (b_1 + c_1 x_i) I_i^{-}   (10.17)

I_i^{+} = I_i + d_1 v_i + b_1 g   (10.18)

g = 1 if an input is provided but F2 is not active, g = 0 otherwise

s_i = 1 for x_i > 0, s_i = 0 otherwise

\mathbf{u} = (\mathbf{w}^{b})^{T} \mathbf{s}
ART1 equations
F2 layer is competitive – need to use coupled differential equations for the dynamic neural field model
Simplifying with a winner-takes-all maximum calculation Output of layer F2
Top-down input to layer F1
Top-down weight matrix update
Only connections between the winning node in F2 and active nodes in F1 are changed
Bottom-up weight update
u_i \to 1 \text{ if } i = \arg\max_j u_j, \quad u_i \to 0 \text{ otherwise}

\mathbf{v} = \mathbf{w}^{t} \mathbf{u}

\frac{\mathrm{d}w_{ij}^{t}}{\mathrm{d}t} = u_i (s_j - w_{ij}^{t})   (10.23)

\frac{\mathrm{d}w_{ij}^{b}}{\mathrm{d}t} = \kappa \, u_j \Big( L (1 - w_{ij}^{b}) s_i - w_{ij}^{b} \sum_{k \neq i} s_k \Big)   (10.24)
10.4.3 Simplified dynamics for unsupervised letter clustering
Demonstrating the self-organized clustering properties Before input to the model is given, the equilibrium activity of
F1 nodes (dx/dt =0) is
During the early processing states before top-down signals become effective, the equilibrium values are
When top-down input becomes effective, the gain will be g = 0 and the equilibrium value is
In the following simulations, a model with three corresponding states is used
before input:   x_i = \frac{-b_1}{1 + c_1}

early processing:   x_i = \frac{I_i}{1 + a_1 (I_i + b_1) + c_1}

top-down effective (g = 0):   x_i = \frac{I_i + d_1 v_i - b_1}{1 + a_1 (I_i + d_1 v_i) + c_1}
Simulation of simplified ART1 processing
Problem: learning visual representations of the letters of the alphabet, without labels
Each of the letter patterns is modified with 10% noise
Result (figure on the right): after presenting 100 examples of each letter, a prototype vector for each of the 26 categories, corresponding to the nodes in layer F2
Most of the categories have been able to extract the underlying pattern perfectly (a simplified clustering sketch follows below)
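A compact fast-learning, ART1-style clustering sketch in the spirit of this simulation, for binary patterns with a vigilance test and creation of new categories; the simplifications and parameter values are mine, not the book's exact implementation.

```python
import numpy as np

def art1_cluster(patterns, rho=0.7, L=2.0):
    """Cluster binary (0/1) patterns with a simplified fast-learning ART1 scheme.

    patterns : array of shape (n_patterns, n_features), each row with at least one active bit
    rho      : vigilance threshold; higher values yield more, finer categories
    L        : choice parameter of the bottom-up weights
    """
    categories = []                                             # one binary prototype per category
    for p in patterns:
        # bottom-up choice: rank existing categories by normalized overlap with the input
        scores = [np.sum(w * p) / (L - 1.0 + np.sum(w)) for w in categories]
        for j in np.argsort(scores)[::-1]:
            match = np.sum(categories[j] * p) / np.sum(p)       # vigilance (similarity) test
            if match >= rho:
                categories[j] = categories[j] * p               # resonance: refine the prototype
                break
        else:
            categories.append(p.copy())                         # orienting: create a new category
    return categories
```

With noisy letter images as rows of `patterns`, each returned prototype approximates the underlying noise-free letter, mirroring the clustering result described above.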
10.5 Where to go from here
It is vital to connect experimental techniques to more quantitative endeavors
Theoretical studies must be rooted in experimental knowledge The brain is one of the most challenging systems on our
planet Quite specific proposals: brain as an anticipatory memory
system, … A current challenge in computational neuroscience is the
multitude of models with diverse aims Applications
Health-care applications, e.g. advanced rehabilitation treatment after brain damage