Daunizeau

The social Bayesian brain

Jean Daunizeau Brain and Spine Institute - Paris

“Motivation, Brain & Behaviour” group

The Bayesian brain hypothesis

Cerebral information processing was optimized through natural selection [Friston 2005, Fiorillo 2010]

The brain uses a model of the world that (i) is optimal on average, but (ii) can induce systematic biases [Weiss 2002, Alais 2004]

Theory of mind

= ability to attribute mental states (e.g., beliefs, desires, ...) to others [Premack 1978]

the social Bayesian brain

The social “Bayesian brain” hypothesis

• the brain’s model of other brains should assume they are Bayesian too!

→ ToM = meta-Bayesian inference [Daunizeau 2010a, Daunizeau 2010b]

• although it is limited, is ToM optimal in an evolutionary sense?

• develops very early in life [Surian 2007, Kovacs 2010]

• impairment → severe psychiatric disorders [Baron-Cohen 1985, Frith 1994, Brüne 2005]

• teaching, persuading, deceiving, … → success in social interactions [Baron-Cohen 1999]

• meta-cognitive insight: others’ behaviour is driven by their beliefs [Frith 2012]

Overview of the talk

Inverse Bayesian Decision Theory

Meta-Bayesian modelling of Theory of Mind

Limited ToM sophistication: did evolution fool us?





BDT: from observations to beliefs

• The “amount of information” is related to probability in the sense that one’s belief “vagueness” is well characterized in terms of the dispersion of a probability distribution that measures how (subjectively) plausible is any “state of affairs”. • The subjective plausibility of any “state of affairs” is not captured by the objective frequency of the ensuing observable event. This is because beliefs are shaped by all sorts of (implicit or explicit) prior assumptions that act as a (potentially very complex) filter upon sensory observations.

x

u

hidden (unknown) state of the world

accessible observations or data

• likelihood: • priors: • posterior:

,p u x m

p x m

, ,p x u m p u x m p x m

BDT: from beliefs to decisions

• Bayesian Decision Theory (BDT) is concerned with how decisions are made or should be made, in ambiguous or uncertain situations: - normative: optimal/rational decision? (cf. statistical test) - descriptive: what do people do? (cf. behavioural economics) • BDT is bound to a perspective on preferences → utility theory: - utility functions: surrogate for the task goal (reward contingent on a decision) - subsumes game theory and control theory

a

,x a

alternative actions or decisions

loss function

, , ,u a E x a u m

* arg min ,a

a u a

→ expected cost (posterior risk):

→ BDT-optimal decision rule:

BDT example: speed-accuracy trade-off

2

, ,x a t x a t • loss = estimation error + estimation time:

( )

( )2

00 0

, ,1

,

yt t t

t

p u x m N x u x

xp x m N

• generative model:

2

0 0

12

1:

2

0

, , :1

t

t t

t t t

t

u

p x u m N

t

• posterior belief:

2 2, , t tu a t a t • expected cost:

0 0.5 1 1.5 20

2

4

6

8

10

12

posterior variance

time cost (K=1)

expected loss (K=1)

time cost (K=2)

expected loss (K=2)

BDT example: speed-accuracy trade-off (2)

t* (K=1)

decision time

t* (K=2)

expected inaccuracy (K=2)

expected inaccuracy (K=1)

inverse BDT: the complete class theorem

There (almost) always exist a duplet of prior belief and loss function

such that any observed decision can be interpreted as BDT-optimal

p x m ,x a

0

→ interpreting BDT-optimal responses (e.g., decision times): weak duality

2

0

* arg min , ,

1

tt u a t

tu

tx

(1)| , ,t tp u x m

(1)

1| , ,t tp x x m

1tx

(2)| , ( ),p y m

(2), |p m priors

likelihood Response model

Perceptual model

Perceptual priors

Perceptual likelihood (1)m

(2)my

arg min ,a

g u a

Posterior risk minimization (optimal decider assumption)

y

,u

u

meta-Bayesian model

(2)m

Free-energy maximization (optimal learner assumption)

approximate posterior

|t tq x 1

1

, ,

: arg maxt

t t t

t

f u

f F

Daunizeau et al., 2010a

inverse BDT: the meta-Bayesian approach

y

meta-Bayesian model

ball position

reaction time

0

u

,t t

prior uncertainty

cost of time

Andy Murray’s belief

-10 -5 0 5 100

1

2

3

4x 10

-3

t

t

Andy Murray’s belief

-10 -5 0 5 100

1

2

3

4x 10

-3

meta-Bayesian estimate of Andy Murray’s belief

(2)m





• Operational definition of ToM: taking the intentional stance [Dennett 1996]

– Infer beliefs and preferences from observed behavior

– Use them to understand/predict behaviour

Recursivity and limited sophistication

• Questions:

– Does the meta-Bayesian approach reallistically capture peoples’ ToM?

– Can people appeal to these sophistication levels (e.g. by pure reinforcement) or are

these (social) priors that are set by the context?

– What is the inter-individual variability of ToM sophistication?

• In reciprocal/repeated social interactions, ToM is potentially recursive [Yoshida 2008]

– « I think that you think », « I think that you think that I think », …

– ToM sophistication levels induce different learning rules / behavioural policies

0-ToM does not apply the intentional stance

→ 0-ToM is a Bayesian agent with:

- beliefs (about non-intentional contingencies)

- preferences

0-ToM

« I think that you will hide behind the tree »

1-ToM learns how the other learns

→ 1-ToM is a meta-Bayesian agent with:

- beliefs (about other’s beliefs and preferences)

- preferences

1-ToM

« I think that you think that I will hide behind the tree »

2-ToM

2-ToM learns how the other learns and her ToM sophistication level

→ 2-ToM is a meta-Bayesian agent with:

- beliefs (about other’s beliefs – about one’s beliefs - and preferences)

- preferences

« I think that you think that I think, … »

Recursive meta-Bayesian modelling

• k-ToM learns how the other learns and her ToM sophistication level:

( ) ( ) ( )

1 1, ,k k kf a

• k-ToM acts according to her beliefs and preferences:

( ) ( ) ( )

1, 1 1 1, 1 2exp ,k k kp a a

• This induces a likelihood for a k+1-ToM observer:

'(1,..., ) ( )

1, 1 1, '

' 0 ' 1

, ,k

kk k

k

k

p a m p a

• Deriving the ensuing Free-Energy yields the k+1-ToM learning rule:

( 1)1

( 1) ( 1) ( 1)

1 1

( 1) ( 1)

( 1) (1,..., ) (1,..., ) (1,..., )

1, 1 1

, ,

: arg max

ln , , ln , ln ,

k

k k k

k k

k k k k

k k

f a

f F

F p a m p m q

Performance in competitive games

hider: a1 = 1 hider: a1 = 0

seeker: a2 = 1 -1, 1 1, -1

seeker: a2 = 0 1, -1 -1, 1

1 2 3 4 5

1

2

3

4

5-0.04

-0.02

0

0.02

0.040-ToM

1-ToM

2-ToM

3-ToM

4-ToM

0-T

oM

1-T

oM

2-T

oM

3-T

oM

4-T

oM

512

simulated behavioural performance

(#wins/trial)

outcome table (« hide and seek »)

1-ToM predicts 0-ToM 0-ToM predicts 1-ToM

Everybody is somebody’s fool

You are playing against Player 1

1 2

subject’s choice

Well done!

feedback

(1sec)

social framing

(game « hide and seek »)

alternative options

(1.2 sec)

Behavioural task design

Session 1

1 2

You win!

non-social framing

(casino gambling task)

the social framing effect: group results (N=26)

1.4

1.6

1.8

2

1.5

1.6

1.7

1.8

1.9

2

2.1

non-social framing

social framing random biased 0-ToM 1-ToM 2-ToM

-6

-4

-2

0

2

4

6

*

*

group-average performance (cumulated earnings after 60 game repetitions)

IMT false belief Frith-Happe EQ WSCT Go-NoGo 3-back mean

-6

-4

-2

0

2

4

6

*

*

inter-individual variability in cognitive skills: regression on performance against 1-ToM

1 2 3 4 5 6 7 8

-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

1 2 3 4 5 6 7 8

-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

6

0.6

0.65

0.7

0.75

own action

lag

we

igh

t

chance level

0-ToM (acc=86%)

1-ToM (acc=75%)

2-ToM (acc=74%)

Volterra decompositions: group results

( ) ( )

01 ...k k

t t

k

p a s u

Volterra 1st-order kernels:

1 2 3 4 5 6 7 8

-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

own action

lag

Vo

lte

rra

we

igh

t

lag

* Volterr

a w

eig

ht: S

-NS

1 2 3 4 5 6 7 8

-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

opponent's action

lag

Vo

lte

rra

we

igh

t

*

lag

Volterr

a w

eig

ht: S

-NS

0 0.2 0.4 0.6 0.8 1-0.2

0

0.2

0.4

0.6

0.8

1

1.2

Model fit: <g(x)|y,m> versus y

<g(x)|y,m>

y

2-ToM’s worst fit: subject #21 against 0-ToM

(acc=43%)

observ

ed c

hoic

es

modelled choices

0 0.2 0.4 0.6 0.8 1-0.2

0

0.2

0.4

0.6

0.8

1

1.2


<g(x)|y,m>

y

2-ToM’s best fit: subject #7 against 0-ToM

(acc=79%)

observ

ed c

hoic

es

modelled choices

Volterra Nash WSLS RL 0-ToM 1-ToM 2-ToM 3-ToM-6

-4

-2

0

2

4free energies

1.4

1.6

1.8

2

1.5

1.6

1.7

1.8

1.9

2

2.1

non-social framing

social framing

log model evidences (group average)

Group-level Bayesian model comparison (I)

Volterra Nash WSLS RL 0-ToM 1-ToM 2-ToM 3-ToM0

0.1

0.2

0.3

0.4

0.5

0.6

model frequencies (social condition)estimated model frequencies (social condition)

no-ToM family ToM family

Group-level Bayesian model comparison (II)

no-ToM ToM0

0.2

0.4

0.6

0.8

1model families: exceedance probabilities

1.4

1.6

1.8

2

1.5

1.6

1.7

1.8

1.9

2

2.1

non-social framing

social framing

model families: exceedance probabilities

What about Rhesus-macaques?

[Thom et al., 2013]

Rhesus-macaques: group-results (N=4)

random biased 1-ToM-10

-5

0

5

10

15

1 2 3 4 5 6 7 8-0.4

-0.2

0

0.2

0.4

0.6own action

lag

Vo

lte

rra

we

igh

t

1 2 3 4 5 6 7 8-0.4

-0.2

0

0.2

0.4

0.6opponent's action

lag

Vo

lte

rra

we

igh

t

Rhesus-macaques: group-results (N=4)

no-ToM ToM0

0.2

0.4

0.6

0.8

1model families: exceedance probabilities

Volterra Nash WSLS RL 0-ToM 1-ToM 2-ToM

-20

-10

0

10

20

free energieslog model evidences (group average)

no-ToM family ToM family





Competitive versus cooperative games

1 2 3 4 5

1

2

3

4

5-0.04

-0.02

0

0.02

0.040-ToM

1-ToM

2-ToM

3-ToM

4-ToM

0-T

oM

1-T

oM

2-T

oM

3-T

oM

4-T

oM

1 2 3 4 5

1

2

3

4

5-0.2

0

0.2

0.4

0.6

0.8

0-ToM

1-ToM

2-ToM

3-ToM

4-ToM 0

-To

M

1-T

oM

2-T

oM

3-T

oM

4-T

oM

P1: a1 = 1 P1: a1 = 0

P2: a2 = 1 -1, 1 1, -1

P2: a2 = 0 1, -1 -1, 1

P1: a1 = 1 P1: a1 = 0

P2: a2 = 1 2, 0 -1, -1

P2: a2 = 0 -1, -1 0, 2

« hide and seek » « battle of the sexes »

1-ToM predicts 0-ToM 0-ToM predicts 1-ToM

« hide and seek »

« battle of the sexes »

Being right is as good as being smart

2-ToM vs 2-ToM 3-ToM vs 3-ToM 4-ToM vs 4-ToM

Biases in ToM induction

Can we explain the emergence of the natural bound on ToM sophistication?

→ Average adaptive fitness:

• is a function of the behavioural performance, relative to other phenotypes

• depends upon the frequency of other phenotypes within the population

i

( )iQ expected payoff matrix of game i at round τ

frequency of game i

→ Replicator dynamics [Maynard-Smith 1982, Hofbauer 1998]:

( ) ( )i T i

i i

i i

dsDiag s Q s s Q s

dt

ks frequency of phenotype k within the population

limt

s s t

evolutionary stable states:

Evolutionary game theory

EGT steady state

k=0

k=1

k=2

k=3

k=4

0 200 400 600 800 10000

0.2

0.4

0.6

0.8

1

evolutionary time

tra

its' fr

eq

ue

ncie

s

EGT replicator dynamics

EGT steady state

k=0

k=1

k=2

k=3

k=4

type and frequency of EGT steady states

0 200 400 600 800 10000

0.2

0.4

0.6

0.8

1

evolutionary time

tra

its' fr

eq

ue

ncie

s

EGT replicator dynamics

« hide and seek »

« battle of the sexes »

Replicator dynamics and ESS

1 2 4 8 16 32 64 128 256 512

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

tau (game duration)

lam

bd

a (

fre

qu

en

cy o

f co

op

era

tive

ga

me

)

k=0

k=1

k=2

k=3

k=4

ESS: phase portrait

« hide and seek » « battle of the sexes »

• RL = reinforcement learning agent (cannot adapt to game rules)

• Nash = « rational » agent (cannot adapt to opponent)

0-T

oM

1-T

oM

2-T

oM

3-T

oM

4-T

oM

Nas

h

RL

0-ToM

1-ToM

2-ToM

3-ToM

4-ToM

RL

Nash

0-ToM

1-ToM

2-ToM

3-ToM

4-ToM

RL

Nash

0-T

oM

1-T

oM

2-T

oM

3-T

oM

4-T

oM

Nas

h

RL

Can non-ToM agents invade?

0-ToM 1-ToM0

100

200

300

400

500

2-ToM

0-ToM 1-ToM0

100

200

300

400

500

2-ToM

0-ToM 1-ToM 2-ToM0

100

200

300

400

500

3-ToM

0-ToM 1-ToM 2-ToM0

100

200

300

400

500

3-ToM

0-ToM 1-ToM 2-ToM 3-ToM0

100

200

300

400

500

4-ToM


100

200

300

400

500

4-ToM

4-ToM

RL

Nash

What do ToM agents think of RL and Nash?

0-ToM 1-ToM0

100

200

300

400

500

2-ToM

0-ToM 1-ToM0

100

200

300

400

500

2-ToM

0-ToM 1-ToM 2-ToM0

100

200

300

400

500

3-ToM

0-ToM 1-ToM 2-ToM0

100

200

300

400

500

3-ToM


100

200

300

400

500

4-ToM


100

200

300

400

500

4-ToM

1 2 4 8 16 32 64 128 256 512

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

tau (game duration)

lam

bd

a (

fre

qu

en

cy o

f co

op

era

tive

ga

me

)

k=0

k=1

k=2

k=3

k=4

RL

Nash

ESS: phase portrait (2)

• Meta-Bayesian inference

- the brain’s model of other brains assumes they are Bayesian too!

- there is an inevitable inflation of the uncertainty of others’ beliefs

The social Bayesian brain: summary

• Theory of mind:

- reciprocal social interaction → recursive beliefs

- Humans → social framing effect (mentalize or be fooled)

- Macaque monkeys → no intentional stance (but training?)

• Evolution of ToM:

- cooperation+learning during evolution → natural bounds to ToM

sophistication (“being right is as good as being smart”)

- evolutionary stable ToM distribution = mixed!

or

probabilistic cue

informed advice

player’s decision

?

P P

A progress bar A

Gold target = 20 CHF Silver target = 10 CHF

Dealing with uncertain motives: advice taking task

Diaconescu et al., in prep.

outcome

*

Dealing with uncertain motives: results (N=16)

HGF 1-ToM 2-ToM0

0.2

0.4

0.6

0.8

1exceedance probabilities

HGF 1-ToM 2-ToM

123456789

10111213141516

sub

jects

model attributions

0 0.2 0.4 0.6 0.8 1

HGF1-ToM2-ToM

12345678910

11

12

13

14

15

16

subjects

0

0.5

11

0

R2=73.0% R2=45.2%

0 0.2 0.4 0.6 0.8 1-0.2

0

0.2

0.4

0.6

0.8

1

1.2


<g(x)|y,m>

y

2-ToM: worst subject

0 0.2 0.4 0.6 0.8 1-0.2

0

0.2

0.4

0.6

0.8

1

1.2


<g(x)|y,m>

y

2-ToM: best subject

Bayes Laplace

Helmoltz Jaynes

Friston

Marie Devaine

I also would like to thank Dr. S. Bouret & Dr. A. San-Gali (monkey experiments @MBB)

Technology

Daunizeau