The social Bayesian brain
Jean Daunizeau Brain and Spine Institute - Paris
“Motivation, Brain & Behaviour” group
The Bayesian brain hypothesis
Cerebral information processing was optimized through natural selection [Friston 2005, Fiorillo 2010]
The brain uses a model of the world that (i) is optimal on average, but (ii) can induce systematic biases [Weiss 2002, Alais 2004]
Theory of mind
= ability to attribute mental states (e.g., beliefs, desires, ...) to others [Premack 1978]
the social Bayesian brain
The social “Bayesian brain” hypothesis
• the brain’s model of other brains should assume they are Bayesian too!
→ ToM = meta-Bayesian inference [Daunizeau 2010a, Daunizeau 2010b]
• although it is limited, is ToM optimal in an evolutionary sense?
• develops very early in life [Surian 2007, Kovacs 2010]
• impairment → severe psychiatric disorders [Baron-Cohen 1985, Frith 1994, Brüne 2005]
• teaching, persuading, deceiving, … → success in social interactions [Baron-Cohen 1999]
• meta-cognitive insight: others’ behaviour is driven by their beliefs [Frith 2012]
Overview of the talk
Inverse Bayesian Decision Theory
Meta-Bayesian modelling of Theory of Mind
Limited ToM sophistication: did evolution fool us?
BDT: from observations to beliefs
• The “amount of information” is related to probability: the “vagueness” of one’s belief is well characterized by the dispersion of a probability distribution that measures how (subjectively) plausible any “state of affairs” is.
• The subjective plausibility of a “state of affairs” is not captured by the objective frequency of the ensuing observable event, because beliefs are shaped by all sorts of (implicit or explicit) prior assumptions that act as a (potentially very complex) filter upon sensory observations.
• $x$: hidden (unknown) state of the world
• $u$: accessible observations or data
• likelihood: $p(u \mid x, m)$
• priors: $p(x \mid m)$
• posterior: $p(x \mid u, m) \propto p(u \mid x, m)\, p(x \mid m)$
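The likelihood, prior and posterior above can be illustrated numerically. The following is a minimal sketch that applies Bayes' rule on a grid of candidate values for the hidden state; the Gaussian shapes and all numbers (prior N(0, 4), unit observation noise, observation u = 1) are hypothetical choices made for illustration, not values taken from the talk.

```python
import math

def gaussian_pdf(value, mean, variance):
    """Density of N(mean, variance) evaluated at value."""
    return math.exp(-0.5 * (value - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)

def grid_posterior(prior, likelihood):
    """Bayes' rule on a grid: the posterior is proportional to likelihood times prior."""
    unnormalised = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalised)
    return [w / evidence for w in unnormalised]

# Hypothetical numbers: prior x ~ N(0, 4), one observation u = 1 with unit noise.
grid = [-5.0 + 0.1 * i for i in range(101)]
prior = [gaussian_pdf(x, 0.0, 4.0) for x in grid]
likelihood = [gaussian_pdf(1.0, x, 1.0) for x in grid]
posterior = grid_posterior(prior, likelihood)

posterior_mean = sum(x * p for x, p in zip(grid, posterior))
posterior_variance = sum((x - posterior_mean) ** 2 * p for x, p in zip(grid, posterior))
```

With these numbers the posterior mean lies between the prior mean and the observation, and the posterior is narrower than the prior: the observation reduced the belief's dispersion.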
BDT: from beliefs to decisions
• Bayesian Decision Theory (BDT) is concerned with how decisions are made or should be made, in ambiguous or uncertain situations:
 - normative: optimal/rational decision? (cf. statistical test)
 - descriptive: what do people do? (cf. behavioural economics)
• BDT is bound to a perspective on preferences → utility theory:
 - utility functions: surrogate for the task goal (reward contingent on a decision)
 - subsumes game theory and control theory
• $a$: alternative actions or decisions
• $\ell(x, a)$: loss function
→ expected cost (posterior risk): $\rho(u, a) = E[\ell(x, a) \mid u, m]$
→ BDT-optimal decision rule: $a^* = \arg\min_a \rho(u, a)$
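The decision rule above (pick the action that minimizes the expected loss under the posterior belief) can be sketched on a small discrete example. The states, posterior probabilities and loss functions below are hypothetical; the point is only that the same belief can lead to different BDT-optimal actions under different loss functions.

```python
def posterior_risk(action, states, posterior, loss):
    """Expected loss of an action under the posterior belief over states."""
    return sum(p * loss(x, action) for x, p in zip(states, posterior))

def bdt_optimal_action(actions, states, posterior, loss):
    """Action that minimizes the posterior risk."""
    return min(actions, key=lambda a: posterior_risk(a, states, posterior, loss))

# Hypothetical belief over three states of the world.
states = [0, 1, 2]
posterior = [0.2, 0.5, 0.3]
actions = [0, 1, 2]

def quadratic_loss(x, a):
    return (x - a) ** 2

def asymmetric_loss(x, a):
    # Hypothetical loss: underestimating the state costs four times more.
    return 4.0 * (x - a) if x > a else float(a - x)

best_quadratic = bdt_optimal_action(actions, states, posterior, quadratic_loss)
best_asymmetric = bdt_optimal_action(actions, states, posterior, asymmetric_loss)
```

Here the quadratic loss favours the most central action, whereas the asymmetric loss pushes the decision upwards even though the belief is unchanged.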
BDT example: speed-accuracy trade-off
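• loss = estimation error + estimation time: $\ell(x, a, t) = (x - a)^2 + K t$
• generative model: $p(u_t \mid x, m) = N(x, 1)$, $p(x \mid m) = N(\mu_0, \sigma_0^2)$
• posterior belief: $p(x \mid u_{1:t}, m) = N(\mu_t, \sigma_t^2)$, with $\sigma_t^2 = \left(1/\sigma_0^2 + t\right)^{-1}$ and $\mu_t = \sigma_t^2 \left(\mu_0/\sigma_0^2 + \sum_{t'=1}^{t} u_{t'}\right)$
• expected cost: $\rho(u, a, t) = \sigma_t^2 + (a - \mu_t)^2 + K t$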
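The trade-off between estimation error and estimation time can be checked numerically. The sketch below assumes a Gaussian prior with variance `prior_variance`, observations with unit noise variance (so that the posterior variance after a time t is 1 / (1/prior_variance + t)), a cost of time `time_cost` (the K of the figure legends), and an action set to the posterior mean. Under these assumptions the expected loss is the posterior variance plus the time cost, and its minimum has a closed form that the grid search should reproduce. The numerical values are hypothetical.

```python
import math

def posterior_variance(t, prior_variance):
    """Posterior variance after observing for a duration t (unit observation noise)."""
    return 1.0 / (1.0 / prior_variance + t)

def expected_loss(t, prior_variance, time_cost):
    """Expected inaccuracy plus time cost, with the action set to the posterior mean."""
    return posterior_variance(t, prior_variance) + time_cost * t

def numerical_decision_time(prior_variance, time_cost, t_max=5.0, n_steps=50001):
    """Grid search for the decision time that minimizes the expected loss."""
    grid = [t_max * i / (n_steps - 1) for i in range(n_steps)]
    return min(grid, key=lambda t: expected_loss(t, prior_variance, time_cost))

def closed_form_decision_time(prior_variance, time_cost):
    """Zero of the derivative of the expected loss, clipped at t = 0."""
    return max(0.0, 1.0 / math.sqrt(time_cost) - 1.0 / prior_variance)

# Hypothetical prior variance; K = 1 and K = 2 as in the figure legends.
t_star_k1 = numerical_decision_time(prior_variance=10.0, time_cost=1.0)
t_star_k2 = numerical_decision_time(prior_variance=10.0, time_cost=2.0)
```

A higher cost of time shortens the optimal decision time, at the price of a larger posterior variance when the decision is made.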
[Figure: posterior variance, time cost and expected loss against decision time, for K=1 and K=2]
BDT example: speed-accuracy trade-off (2)
[Figure: expected inaccuracy against decision time for K=1 and K=2, with the optimal decision times t* (K=1) and t* (K=2) marked]
inverse BDT: the complete class theorem
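There (almost) always exists a pair of prior belief $p(x \mid m)$ and loss function $\ell(x, a)$ such that any observed decision can be interpreted as BDT-optimal.
→ interpreting BDT-optimal responses (e.g., decision times): weak duality
$t^* = \arg\min_t \rho(u, a, t) = \frac{1}{\sqrt{K}} - \frac{1}{\sigma_0^2}$
(K: cost of time; $\sigma_0^2$: prior variance.) The same decision time can therefore arise from different combinations of prior uncertainty and cost of time.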
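The weak duality between priors and losses can be made concrete with the speed-accuracy example. The sketch below assumes the closed-form optimal decision time t* = 1/sqrt(K) - 1/prior_variance (unit observation noise, action set to the posterior mean), and inverts it: for one observed decision time, it returns the cost of time K that would make that time optimal under several hypothetical prior variances. All numbers are illustrative.

```python
import math

def optimal_decision_time(prior_variance, time_cost):
    """Closed-form BDT-optimal decision time, clipped at zero."""
    return max(0.0, 1.0 / math.sqrt(time_cost) - 1.0 / prior_variance)

def time_cost_explaining(decision_time, prior_variance):
    """Cost of time K that makes decision_time optimal under the given prior variance."""
    return 1.0 / (decision_time + 1.0 / prior_variance) ** 2

# One observed decision time, three hypothetical prior variances.
observed_time = 0.9
candidates = [(v, time_cost_explaining(observed_time, v)) for v in (0.5, 2.0, 10.0)]
explained_times = [optimal_decision_time(v, k) for v, k in candidates]
```

Every (prior variance, cost of time) pair in `candidates` reproduces the same decision time, so a single observed decision time cannot separate the two; this is why the meta-Bayesian approach places priors on both and uses richer behavioural data.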
Perceptual model $m^{(1)}$:
 - perceptual likelihood: $p(u_t \mid x_t, \theta, m^{(1)})$
 - perceptual priors: $p(x_t \mid x_{t-1}, \theta, m^{(1)})$
Free-energy maximization (optimal learner assumption):
 - approximate posterior: $q(x_t \mid \lambda_t)$
 - $\lambda_t = f(\lambda_{t-1}, u_t, \theta)$, with $f := \arg\max_{\lambda} F_t$
Posterior risk minimization (optimal decider assumption):
 - $g := \arg\min_a \rho(u, a)$
Response model $m^{(2)}$ (meta-Bayesian model):
 - likelihood: $p(y \mid \varphi, \lambda(\theta), m^{(2)})$
 - priors: $p(\theta, \varphi \mid m^{(2)})$
Daunizeau et al., 2010a
inverse BDT: the meta-Bayesian approach
[Diagram: meta-Bayesian model $m^{(2)}$. Input u: ball position; response y: reaction time; unknown parameters: prior uncertainty and cost of time]
[Figure: Andy Murray’s belief, alongside the meta-Bayesian estimate of Andy Murray’s belief]
Overview of the talk
Inverse Bayesian Decision Theory
Meta-Bayesian modelling of Theory of Mind
Limited ToM sophistication: did evolution fool us?
Recursivity and limited sophistication
• Operational definition of ToM: taking the intentional stance [Dennett 1996]
– Infer beliefs and preferences from observed behaviour
– Use them to understand/predict behaviour
• In reciprocal/repeated social interactions, ToM is potentially recursive [Yoshida 2008]
– « I think that you think », « I think that you think that I think », …
– ToM sophistication levels induce different learning rules / behavioural policies
• Questions:
– Does the meta-Bayesian approach realistically capture people’s ToM?
– Can people appeal to these sophistication levels (e.g. by pure reinforcement), or are these (social) priors that are set by the context?
– What is the inter-individual variability of ToM sophistication?
0-ToM does not apply the intentional stance
→ 0-ToM is a Bayesian agent with:
- beliefs (about non-intentional contingencies)
- preferences
0-ToM
« I think that you will hide behind the tree »
1-ToM learns how the other learns
→ 1-ToM is a meta-Bayesian agent with:
- beliefs (about other’s beliefs and preferences)
- preferences
1-ToM
« I think that you think that I will hide behind the tree »
2-ToM
2-ToM learns how the other learns and her ToM sophistication level
→ 2-ToM is a meta-Bayesian agent with:
- beliefs (about other’s beliefs – about one’s beliefs - and preferences)
- preferences
« I think that you think that I think, … »
Recursive meta-Bayesian modelling
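• k-ToM learns how the other learns and her ToM sophistication level:
$\lambda^{(k)}_{\tau+1} = f^{(k)}\left(\lambda^{(k)}_{\tau}, a_{\tau+1}, \theta\right)$
• k-ToM acts according to her beliefs and preferences:
$p\left(a_{1,\tau+1} \mid \lambda^{(k)}_{\tau+1}\right) \propto \exp\left(-\rho^{(k)}\left(\lambda^{(k)}_{\tau+1}, a_{1,\tau+1}\right)\right)$
• This induces a likelihood for a k+1-ToM observer:
$p\left(a_{1,\tau+1} \mid \kappa, \lambda^{(1,\dots,k)}, m\right) = \prod_{k'=0}^{k} p\left(a_{1,\tau+1} \mid \lambda^{(k')}_{\tau+1}\right)^{\delta_{\kappa,k'}}$
where $\kappa$ is the other's (unknown) ToM sophistication level.
• Deriving the ensuing Free-Energy yields the k+1-ToM learning rule:
$\lambda^{(k+1)}_{\tau+1} = f^{(k+1)}\left(\lambda^{(k+1)}_{\tau}, a_{1,\tau+1}, \theta\right) := \arg\max_{\lambda} F^{(k+1)}_{\tau+1}$
$F^{(k+1)}_{\tau+1} = \left\langle \ln p\left(a_{1,\tau+1} \mid \kappa, \lambda^{(1,\dots,k)}, m\right) + \ln p\left(\kappa, \lambda^{(1,\dots,k)} \mid m\right) - \ln q\left(\kappa, \lambda^{(1,\dots,k)}\right) \right\rangle_q$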
Performance in competitive games
outcome table (« hide and seek »), payoffs listed as (hider, seeker):

                   hider: a1 = 1    hider: a1 = 0
seeker: a2 = 1        -1, 1             1, -1
seeker: a2 = 0         1, -1           -1, 1

[Figure: simulated behavioural performance (#wins/trial) of 0-ToM to 4-ToM agents against 0-ToM to 4-ToM opponents]
1-ToM predicts 0-ToM; 0-ToM predicts 1-ToM
Everybody is somebody’s fool
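To make the « hide and seek » payoffs concrete, here is a minimal sketch of a seeker that does not apply the intentional stance. It is a simplification and not the talk's 0-ToM model: the seeker tracks the hider's choice frequency with Beta pseudo-counts, and chooses through a softmax on expected payoffs. The temperature, the pseudo-counts and the biased random hider are hypothetical choices.

```python
import math
import random

# Seeker's payoff indexed by (hider action a1, seeker action a2): +1 when the seeker
# looks where the hider hides, -1 otherwise (from the « hide and seek » outcome table).
SEEKER_PAYOFF = {(1, 1): 1, (0, 1): -1, (1, 0): -1, (0, 0): 1}

def expected_payoff(a2, p_hider_1):
    """Seeker's expected payoff for action a2, given the predicted hider behaviour."""
    return p_hider_1 * SEEKER_PAYOFF[(1, a2)] + (1.0 - p_hider_1) * SEEKER_PAYOFF[(0, a2)]

def probability_seek_1(p_hider_1, temperature=0.2):
    """Softmax decision rule on the expected payoffs of the two options."""
    difference = expected_payoff(1, p_hider_1) - expected_payoff(0, p_hider_1)
    return 1.0 / (1.0 + math.exp(-difference / temperature))

def simulate_seeker(hider_bias, n_trials, seed=0):
    """Win rate of the frequency-tracking seeker against a biased random hider."""
    rng = random.Random(seed)
    count_1, count_0 = 1.0, 1.0  # Beta(1, 1) pseudo-counts (assumption)
    wins = 0
    for _ in range(n_trials):
        p_hider_1 = count_1 / (count_1 + count_0)
        a2 = 1 if rng.random() < probability_seek_1(p_hider_1) else 0
        a1 = 1 if rng.random() < hider_bias else 0
        wins += SEEKER_PAYOFF[(a1, a2)] == 1
        if a1 == 1:
            count_1 += 1.0
        else:
            count_0 += 1.0
    return wins / n_trials

win_rate_biased = simulate_seeker(hider_bias=0.8, n_trials=2000)
win_rate_unbiased = simulate_seeker(hider_bias=0.5, n_trials=2000)
```

Such an agent exploits a biased opponent but stays at chance against an unpredictable one. An opponent that models how this seeker learns could in turn anticipate and exploit its choices, which is the intuition behind the performance pattern described above.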
Behavioural task design
• social framing (game « hide and seek »): “You are playing against Player 1”
• non-social framing (casino gambling task): “Session 1”
• trial sequence: alternative options 1 / 2 (1.2 sec), subject’s choice, feedback (1 sec): “Well done!” (social framing) or “You win!” (non-social framing)
the social framing effect: group results (N=26)
[Figure: group-average performance (cumulated earnings after 60 game repetitions) against random biased, 0-ToM, 1-ToM and 2-ToM opponents, under non-social and social framing; asterisks mark two of the comparisons]
[Figure: inter-individual variability in cognitive skills: regression on performance against 1-ToM. Regressors: IMT, false belief, Frith-Happe, EQ, WSCT, Go-NoGo, 3-back, mean; asterisks mark two of the regressors]
Volterra decompositions: group results
$p(a_t = 1) = s\left(\omega_0 + \sum_k \omega^{(k)} u_t^{(k)} + \dots\right)$
Volterra 1st-order kernels:
[Figure: Volterra weight against lag (1 to 8), for own action and for opponent's action. Legend: chance level, 0-ToM (acc=86%), 1-ToM (acc=75%), 2-ToM (acc=74%). Additional panels: Volterra weight difference S-NS against lag, for own action and opponent's action, with asterisks on some lags]
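A first-order Volterra decomposition of choices can be read as a logistic regression of the current choice onto past inputs. The sketch below assumes that s is the sigmoid function and that the inputs are the opponent's past actions coded as -1/+1; it fits the weights by plain gradient ascent on the log-likelihood. The simulated agent (which tends to repeat the opponent's previous action), the number of lags and the optimizer settings are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lagged_features(inputs, t, n_lags):
    """Constant term followed by the inputs at lags 1..n_lags before trial t."""
    return [1.0] + [inputs[t - lag] for lag in range(1, n_lags + 1)]

def fit_first_order_kernel(choices, inputs, n_lags, n_iterations=300, step=0.5):
    """Logistic regression of choices onto lagged inputs, by gradient ascent."""
    rows = [lagged_features(inputs, t, n_lags) for t in range(n_lags, len(choices))]
    targets = choices[n_lags:]
    weights = [0.0] * (n_lags + 1)
    for _ in range(n_iterations):
        gradient = [0.0] * len(weights)
        for row, target in zip(rows, targets):
            error = target - sigmoid(sum(w * f for w, f in zip(weights, row)))
            for j, feature in enumerate(row):
                gradient[j] += error * feature
        weights = [w + step * g / len(rows) for w, g in zip(weights, gradient)]
    return weights

# Simulated data: the agent tends to repeat the opponent's previous action.
rng = random.Random(1)
opponent = [rng.choice([-1.0, 1.0]) for _ in range(800)]
choices = [0]
for t in range(1, 800):
    choices.append(1 if rng.random() < sigmoid(2.0 * opponent[t - 1]) else 0)

weights = fit_first_order_kernel(choices, opponent, n_lags=3)
# weights[0] is the constant term; weights[k] is the kernel weight at lag k.
```

For this simulated agent, only the lag-1 weight should stand out, which is the kind of signature the kernels summarize: how strongly, and how far back, past actions drive the current choice.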
[Figure: Model fit, <g(x)|y,m> versus y (modelled choices against observed choices). 2-ToM’s worst fit: subject #21 against 0-ToM (acc=43%). 2-ToM’s best fit: subject #7 against 0-ToM (acc=79%)]
Group-level Bayesian model comparison (I)
[Figure: log model evidences (free energies, group average) for Volterra, Nash, WSLS, RL, 0-ToM, 1-ToM, 2-ToM and 3-ToM, under non-social and social framing]
Group-level Bayesian model comparison (II)
[Figure: estimated model frequencies (social condition) for the same models, grouped into a no-ToM family and a ToM family]
[Figure: model families (no-ToM versus ToM): exceedance probabilities, under non-social and social framing]
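The talk does not spell out how exceedance probabilities are obtained. A common approach in random-effects group-level model comparison, assumed here, is to describe the uncertainty about model (or family) frequencies with a Dirichlet distribution and to estimate, by sampling, the probability that each frequency is the largest. The Dirichlet counts below are hypothetical and are not the study's results.

```python
import random

def exceedance_probabilities(dirichlet_counts, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(frequency of model k is the largest)."""
    rng = random.Random(seed)
    wins = [0] * len(dirichlet_counts)
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of independent Gamma draws divided by their sum;
        # the division does not change which entry is largest, so it is skipped.
        draws = [rng.gammavariate(alpha, 1.0) for alpha in dirichlet_counts]
        wins[draws.index(max(draws))] += 1
    return [w / n_samples for w in wins]

# Hypothetical Dirichlet counts for the two families: [no-ToM, ToM].
family_counts = [3.0, 12.0]
probabilities = exceedance_probabilities(family_counts)
```

An exceedance probability close to 1 for a family means that, given the group data, this family is almost certainly more frequent in the population than the other.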
What about rhesus macaques?
[Thom et al., 2013]
Rhesus macaques: group results (N=4)
[Figure: performance against random biased and 1-ToM opponents; Volterra weight against lag (1 to 8) for own action and for opponent's action]
[Figure: model families (no-ToM family versus ToM family): exceedance probabilities; log model evidences (free energies, group average) for Volterra, Nash, WSLS, RL, 0-ToM, 1-ToM and 2-ToM]
Overview of the talk
Inverse Bayesian Decision Theory
Meta-Bayesian modelling of Theory of Mind
Limited ToM sophistication: did evolution fool us?
Competitive versus cooperative games
« hide and seek »:

               P1: a1 = 1    P1: a1 = 0
P2: a2 = 1       -1, 1          1, -1
P2: a2 = 0        1, -1        -1, 1

« battle of the sexes »:

               P1: a1 = 1    P1: a1 = 0
P2: a2 = 1        2, 0         -1, -1
P2: a2 = 0       -1, -1         0, 2

[Figure: simulated performance of 0-ToM to 4-ToM agents against 0-ToM to 4-ToM opponents, for « hide and seek » and for « battle of the sexes »]
1-ToM predicts 0-ToM; 0-ToM predicts 1-ToM
Being right is as good as being smart
2-ToM vs 2-ToM 3-ToM vs 3-ToM 4-ToM vs 4-ToM
Biases in ToM induction
Evolutionary game theory
Can we explain the emergence of the natural bound on ToM sophistication?
→ Average adaptive fitness:
• is a function of the behavioural performance, relative to other phenotypes
• depends upon the frequency of other phenotypes within the population
→ Replicator dynamics [Maynard-Smith 1982, Hofbauer 1998]:
$\frac{ds}{dt} = \mathrm{Diag}(s)\left(\sum_i \lambda_i Q^{(i)} s - s^T \sum_i \lambda_i Q^{(i)} s\right)$
• $s_k$: frequency of phenotype k within the population
• $\lambda_i$: frequency of game i
• $Q^{(i)}$: expected payoff matrix of game i at round τ
→ evolutionarily stable states: $s_\infty = \lim_{t \to \infty} s(t)$
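The replicator dynamics can be integrated numerically with a simple Euler scheme. The sketch below assumes that each phenotype's fitness is its expected payoff against the current population, averaged over games according to their frequencies, and that the population-average fitness is subtracted from every component. The payoff matrix is a hypothetical two-phenotype example with a known mixed steady state; it is not one of the expected payoff matrices derived from the k-ToM simulations.

```python
def replicator_step(s, games, dt):
    """One Euler step of the replicator dynamics.

    s: phenotype frequencies (sums to 1).
    games: list of (game frequency, payoff matrix) pairs.
    """
    n = len(s)
    # Frequency-weighted payoff matrix, summed over games.
    q = [[sum(lam * payoff[r][c] for lam, payoff in games) for c in range(n)]
         for r in range(n)]
    fitness = [sum(q[r][c] * s[c] for c in range(n)) for r in range(n)]
    mean_fitness = sum(s[r] * fitness[r] for r in range(n))
    return [s[r] + dt * s[r] * (fitness[r] - mean_fitness) for r in range(n)]

def steady_state(s0, games, dt=0.01, n_steps=5000):
    """Approximate the long-run frequencies by iterating Euler steps."""
    s = list(s0)
    for _ in range(n_steps):
        s = replicator_step(s, games, dt)
    return s

# Hypothetical payoff matrix (rows: focal phenotype, columns: opponent phenotype).
example_game = [[-1.0, 2.0],
                [0.0, 1.0]]
s_inf = steady_state([0.9, 0.1], [(1.0, example_game)])
```

In this example neither phenotype takes over: the dynamics settle on a mixture in which both have equal fitness, which illustrates how a mixed distribution of phenotypes can be a stable outcome.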
Replicator dynamics and ESS
[Figure: for « hide and seek » and for « battle of the sexes »: EGT replicator dynamics (traits' frequencies against evolutionary time, 0 to 1000, for k=0 to k=4) and type and frequency of EGT steady states]
ESS: phase portrait
[Figure: ESS phase portrait over tau (game duration, 1 to 512) and lambda (frequency of cooperative game, 0 to 1); legend: k=0 to k=4]
Can non-ToM agents invade?
• RL = reinforcement learning agent (cannot adapt to game rules)
• Nash = « rational » agent (cannot adapt to opponent)
[Figure: performance of 0-ToM to 4-ToM, RL and Nash agents against one another, for « hide and seek » and for « battle of the sexes »]
What do ToM agents think of RL and Nash?
[Figure: histograms (counts, 0 to 500) of the sophistication level that 2-ToM (0-ToM or 1-ToM), 3-ToM (0-ToM to 2-ToM) and 4-ToM (0-ToM to 3-ToM) agents attribute to RL and Nash opponents]
ESS: phase portrait (2)
[Figure: ESS phase portrait over tau (game duration, 1 to 512) and lambda (frequency of cooperative game, 0 to 1); legend: k=0 to k=4, RL, Nash]
The social Bayesian brain: summary
• Meta-Bayesian inference
- the brain’s model of other brains assumes they are Bayesian too!
- there is an inevitable inflation of the uncertainty of others’ beliefs
• Theory of mind:
- reciprocal social interaction → recursive beliefs
- humans → social framing effect (mentalize or be fooled)
- macaque monkeys → no intentional stance (but training?)
• Evolution of ToM:
- cooperation + learning during evolution → natural bounds to ToM sophistication (“being right is as good as being smart”)
- evolutionarily stable ToM distribution = mixed!
Dealing with uncertain motives: advice taking task
Diaconescu et al., in prep.
[Task diagram: probabilistic cue, informed advice, player’s decision, outcome; a progress bar with Gold target = 20 CHF and Silver target = 10 CHF]
Dealing with uncertain motives: results (N=16)
[Figure: exceedance probabilities and per-subject model attributions (16 subjects) for HGF, 1-ToM and 2-ToM; R2=73.0%, R2=45.2%]
[Figure: Model fit, <g(x)|y,m> versus y, for 2-ToM: worst subject and best subject]
Bayes, Laplace, Helmholtz, Jaynes, Friston
Marie Devaine
I would also like to thank Dr. S. Bouret & Dr. A. San-Gali (monkey experiments @MBB)