Page 1:

Prediction, Control and Decisions
Kenji Doya

doya@irp.oist.jp

Initial Research Project, OIST
ATR Computational Neuroscience Laboratories
CREST, Japan Science and Technology Agency

Nara Institute of Science and Technology

Page 2:

Outline

Introduction

Cerebellum, basal ganglia, and cortex

Meta-learning and neuromodulators

Prediction time scale and serotonin

Page 3:

Learning to Walk (Doya & Nakano, 1985)

Action: cycle of 4 postures
Reward: speed sensor output

Multiple solutions: creeping, jumping,…


Page 4:

Learning to Stand Up (Morimoto & Doya, 2001)

[Videos: robot behavior in early trials vs. after learning]

Reward: height of the head
No desired trajectory

Page 5:

Reinforcement Learning (RL)

A framework for learning a state-action mapping (policy) by exploration and reward feedback
Critic: reward prediction
Actor: action selection
Learning: external reward r; internal reward δ, the difference from the prediction
[Diagram: agent (critic and actor) interacting with the environment through state s, action a, and reward r]
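As a concrete sketch of this actor-critic loop, here is a minimal tabular version in Python. The five-state chain environment and all parameter values (α, β, γ) are illustrative assumptions, not the setup used in the work above.

```python
# Minimal actor-critic sketch on an assumed toy chain: states 0..4,
# actions left/right, reward 1 on reaching the rightmost state.
import numpy as np

n_states, n_actions = 5, 2
gamma, alpha, beta = 0.9, 0.1, 2.0       # discount, learning rate, inverse temperature

V = np.zeros(n_states)                   # critic: reward prediction V(s)
pref = np.zeros((n_states, n_actions))   # actor: action preferences

def step(s, a):
    """Hypothetical environment: action 0 = left, 1 = right."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    while s != n_states - 1:
        p = np.exp(beta * pref[s]); p /= p.sum()   # Boltzmann action selection
        a = rng.choice(n_actions, p=p)
        s2, r = step(s, a)
        delta = r + gamma * V[s2] - V[s]           # TD error: the internal reward signal
        V[s] += alpha * delta                      # critic update
        pref[s, a] += alpha * delta                # actor update
        s = s2
```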

Page 6:

Reinforcement Learning Methods

Model-free methods
  Episode-based: parameterize the policy P(a|s; θ)
  Temporal difference: state value function V(s); (state-)action value function Q(s,a)

Model-based methods
  Dynamic programming, using a forward model P(s'|s,a)

Page 7:

Temporal Difference Learning

Predict reward: value function
V(s) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s ]
Q(s,a) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s, a(t) = a ]

Select action
greedy: a = argmax_a Q(s,a)
Boltzmann: P(a|s) ∝ exp[ β Q(s,a) ]

Update prediction: TD error
δ(t) = r(t) + γ V(s(t+1)) − V(s(t))
ΔV(s(t)) = α δ(t)
ΔQ(s(t),a(t)) = α δ(t)
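A minimal sketch of these selection and update rules, assuming tabular arrays V and Q and a single experience tuple (s, a, r, s2); the arrays and parameter values are hypothetical.

```python
import numpy as np

def boltzmann_action(Q, s, beta=1.0, rng=np.random.default_rng()):
    """P(a|s) ∝ exp[β Q(s,a)]; greedy selection is the β → ∞ limit."""
    p = np.exp(beta * (Q[s] - Q[s].max()))   # subtract max for numerical stability
    p /= p.sum()
    return rng.choice(len(p), p=p)

def td_update(V, Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    """Apply the slide's TD updates to one experience (s, a, r, s2)."""
    delta = r + gamma * V[s2] - V[s]   # δ(t) = r(t) + γV(s(t+1)) − V(s(t))
    V[s] += alpha * delta              # ΔV(s(t)) = α δ(t)
    Q[s, a] += alpha * delta           # ΔQ(s(t),a(t)) = α δ(t)
    return delta
```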

Page 8:

Dynamic Programming and RL

Dynamic Programming: model-based, off-line
solve the Bellman equation
V(s) = max_a Σ_s' P(s'|s,a) [ r(s,a,s') + γ V(s') ]

Reinforcement Learning: model-free, on-line
learn by the TD error
δ(t) = r(t) + γ V(s(t+1)) − V(s(t))
ΔV(s(t)) = α δ(t)
ΔQ(s(t),a(t)) = α δ(t)
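A minimal value-iteration sketch for this Bellman equation, assuming the model is given as toy arrays P[s, a, s'] and R[s, a, s'] (hypothetical shapes, not from the talk):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    """Iterate V(s) = max_a Σ_s' P(s'|s,a) [r(s,a,s') + γ V(s')] to convergence."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q(s,a) = Σ_s' P(s'|s,a) [ r(s,a,s') + γ V(s') ]
        Q = np.einsum('ijk,ijk->ij', P, R + gamma * V[None, None, :])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # values and a greedy policy
        V = V_new
```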

Page 9:

Discrete vs. Continuous RL(Doya, 2000)

Discrete time
V(x) = E[ r(t) + γ r(t+Δt) + γ² r(t+2Δt) + … ]
δ(t) = r(t) + γ V(t+Δt) − V(t)

Continuous time
V(x) = ∫ₜ^∞ e^(−(s−t)/τ) r(s) ds
δ(t) = r(t) + V̇(t) − V(t)/τ

with τ = Δt/(1−γ), i.e., γ = 1 − Δt/τ
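The correspondence between the per-step discount γ and the continuous time scale τ can be checked directly; the numbers below are illustrative:

```python
def tau_from_gamma(gamma, dt):
    """τ = Δt / (1 − γ)"""
    return dt / (1.0 - gamma)

def gamma_from_tau(tau, dt):
    """γ = 1 − Δt/τ"""
    return 1.0 - dt / tau

# e.g., γ = 0.9 with time step Δt = 0.1 s corresponds to τ = 1 s
assert abs(tau_from_gamma(0.9, 0.1) - 1.0) < 1e-12
assert abs(gamma_from_tau(1.0, 0.1) - 0.9) < 1e-12
```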

Page 10:

Questions

Computational questions
How to learn:
  direct policy P(a|s)
  value functions V(s), Q(s,a)
  forward models P(s'|s,a)
When to use which method?

Biological questions
Where in the brain?
How are they represented/updated?
How are they selected/coordinated?

Page 11:

Brain Hierarchy

Forebrain
Cerebral cortex (a)
  neocortex
  paleocortex: olfactory cortex
  archicortex: basal forebrain, hippocampus
Basal nuclei (b)
  neostriatum: caudate, putamen
  paleostriatum: globus pallidus
  archistriatum: amygdala
Diencephalon
  thalamus (c)
  hypothalamus (d)

Brain stem & Cerebellum
Midbrain (e)
Hindbrain
  pons (f)
  cerebellum (g)
Medulla (h)

Spinal cord (i)

Page 12:

Just for Motor Control? (Middleton & Strick, 1994)

Basal ganglia (Globus Pallidus)

Prefrontal cortex (area 46)

Cerebellum (dentate nucleus)

Page 13:

Specialization by Learning Algorithms (Doya, 1999)

Cerebellum: supervised learning (input → output, trained by the error between target and output, signaled via the inferior olive, IO)
Basal ganglia: reinforcement learning (input → output, trained by reward, signaled via the substantia nigra, SN)
Cerebral cortex: unsupervised learning (input → output)
[Diagram: cortex, basal ganglia, and cerebellum interconnected through the thalamus]

Page 14:

Cerebellum

Purkinje cells: ~10⁵ parallel fibers, a single climbing fiber, long-term depression

Supervised learning: perceptron hypothesis; internal models

Page 15:

Internal Models in the Cerebellum (Imamizu et al., 2000)

Learning to use a 'rotated' mouse
[fMRI: cerebellar activity in early learning vs. after learning]

Page 16:

Motor Imagery (Luft et al. 1998)

[Panels: finger movement vs. imagery of movement]

Page 17:

Basal Ganglia

Striatum: striosome and matrix; dopamine-dependent plasticity

Dopamine neurons: reward-predictive responses

TD learning

Page 18:

Dopamine Neurons and TD Error (Schultz et al., 1997)

δ(t) = r(t) + γV(s(t+1)) − V(s(t))

[Figure: dopamine neuron activity with reward r and reward prediction V: (a) before learning, (b) after learning, (c) with the reward omitted]

Page 19:

Reward-predicting Activities of Striatal Neurons

Delayed saccade task (Kawagoe et al., 1998)

Not just actions, but resulting rewards

[Figure: striatal activity by target direction (Right/Up/Left/Down) and rewarded direction (Right/Up/Left/Down/All)]

Page 20:

Cerebral Cortex

Recurrent connections
Hebbian plasticity

Unsupervised learning, e.g., PCA, ICA

Page 21:

Replicating V1 Receptive Fields (Olshausen & Field, 1996)

Infomax and sparseness
Hebbian plasticity and recurrent inhibition

Page 22:

Specialization by Learning?

Cerebellum: supervised learning
  error signal via climbing fibers
  forward model s' = f(s,a) and policy a = g(s)

Basal ganglia: reinforcement learning
  reward signal via dopamine fibers
  value functions V(s) and Q(s,a)

Cerebral cortex: unsupervised learning
  Hebbian plasticity and recurrent inhibition
  representation of state s and action a

But how are they recruited and combined?

Page 23:

Multiple Action Selection Schemes

Model-free: a = argmax_a Q(s,a)
Model-based: a = argmax_a [ r(s,a) + γ V(f(s,a)) ], using a forward model s' = f(s,a)
Encapsulation: a = g(s)
[Diagram: three networks mapping state s to action a: via Q; via f and V; via g]
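A sketch of the three schemes side by side, assuming toy tabular objects: Q[s,a], per-action rewards R[s,a], a deterministic forward model f[s,a] giving the next state, values V[s], and a cached policy g[s] (all hypothetical):

```python
import numpy as np

def select_model_free(Q, s):
    """a = argmax_a Q(s,a)"""
    return int(np.argmax(Q[s]))

def select_model_based(R, f, V, s, gamma=0.9):
    """a = argmax_a [ r(s,a) + γ V(f(s,a)) ]: one-step lookahead with the forward model."""
    return int(np.argmax(R[s] + gamma * V[f[s]]))

def select_encapsulated(g, s):
    """a = g(s): a cached state-to-action mapping."""
    return int(g[s])
```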

Page 24:

Lectures at OCNC 2005

Internal models / Cerebellum: Reza Shadmehr, Stefan Schaal, Mitsuo Kawato

Reward / Basal ganglia: Andrew G. Barto, Bernard Balleine, Peter Dayan, John O'Doherty, Minoru Kimura, Wolfram Schultz

State coding / Cortex: Nathaniel Daw, Leo Sugrue, Daeyeol Lee, Jun Tanji, Anitha Pasupathy, Masamichi Sakagami

Page 25:

Outline

Introduction

Cerebellum, basal ganglia, and cortex

Meta-learning and neuromodulators

Prediction time scale and serotonin

Page 26:

Reinforcement Learning (RL)

A framework for learning a state-action mapping (policy) by exploration and reward feedback
Critic: reward prediction
Actor: action selection
Learning: external reward r; internal reward δ, the difference from the prediction
[Diagram: agent (critic and actor) interacting with the environment through state s, action a, and reward r]

Page 27:

Reinforcement Learning

Predict reward: value function
V(s) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s ]
Q(s,a) = E[ r(t) + γ r(t+1) + γ² r(t+2) + … | s(t) = s, a(t) = a ]

Select action
greedy: a = argmax_a Q(s,a)
Boltzmann: P(a|s) ∝ exp[ β Q(s,a) ]

Update prediction: TD error
δ(t) = r(t) + γ V(s(t+1)) − V(s(t))
ΔV(s(t)) = α δ(t)
ΔQ(s(t),a(t)) = α δ(t)

Page 28:

Cyber Rodent Project

Robots with the same constraints as biological agents

What is the origin of rewards? What is to be learned, and what is to be evolved?

Self-preservation: capture batteries
Self-reproduction: exchange programs through IR ports

Page 29:

Cyber Rodent: Hardware

camera, range sensor, proximity sensors, gyro
battery latch, two wheels
IR port, speaker, microphones, R/G/B LED

Page 30:

Evolving Robot Colony

Survival: catch battery packs
Reproduction: copy 'genes' through IR ports


Page 31:

Discounting Future Reward

[Videos: foraging behavior with a large γ vs. a small γ]

Page 32:

Setting of Reward Function

Reward: r = r_main + r_supp − r_cost

e.g., a supplementary reward for having a battery in view

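A minimal sketch of such a composite reward; the names and weights (w_supp, w_cost, and the event flags) are illustrative assumptions, not values from the project:

```python
# Composite reward: main reward (battery capture), supplementary shaping
# reward (battery in view), and an action cost.
def reward(captured_battery, battery_in_view, motor_effort,
           w_supp=0.1, w_cost=0.01):
    r_main = 1.0 if captured_battery else 0.0
    r_supp = w_supp if battery_in_view else 0.0   # e.g., reward for vision of a battery
    r_cost = w_cost * motor_effort                # penalize energetic actions
    return r_main + r_supp - r_cost
```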

Page 33:

Reinforcement Learning of Reinforcement Learning (Schweighofer & Doya, 2003)

Fluctuations in the metaparameters correlate with average reward

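A minimal sketch of this 'RL of RL' idea: metaparameter fluctuations are correlated with the resulting average reward, so the metaparameter drifts toward reward-maximizing values. The average_reward function and all constants are hypothetical stand-ins, not the algorithm of Schweighofer & Doya (2003).

```python
import numpy as np

rng = np.random.default_rng(0)

def average_reward(beta):
    """Hypothetical stand-in: average reward peaks at an intermediate β."""
    return -(beta - 2.0) ** 2 + rng.normal(scale=0.1)

beta, baseline, eta = 0.5, 0.0, 0.05
for t in range(1000):
    noise = rng.normal(scale=0.2)           # fluctuation in the metaparameter
    r = average_reward(beta + noise)
    beta += eta * (r - baseline) * noise    # correlate fluctuation with reward
    baseline += 0.1 * (r - baseline)        # running average of reward
# beta drifts toward the reward-maximizing value (≈ 2 in this toy example)
```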

Page 34:

Randomness Control by Battery Level

[Figure: inverse temperature β (0 to 14) as a function of battery level (0 to 1)]

Greedier action at both extremes

Page 35:

Neuromodulators for Metalearning

(Doya, 2002)

Metaparameter tuning is critical in RL. How does the brain tune metaparameters?

Dopamine: TD error δ
Acetylcholine: learning rate α
Noradrenaline: inverse temperature β
Serotonin: discount factor γ

Page 36:

Learning Rate

ΔV(s(t−1)) = α δ(t)
ΔQ(s(t−1),a(t−1)) = α δ(t)
small α → slow learning; large α → unstable learning

Acetylcholine (basal forebrain): regulates memory update and retention (Hasselmo et al.)
LTP in cortex and hippocampus; top-down vs. bottom-up information flow

Page 37:

Inverse Temperature

Greediness in action selection

P(aᵢ|s) ∝ exp[ β Q(s,aᵢ) ]
small β → exploration
large β → exploitation

Noradrenaline (locus coeruleus): correlation with performance accuracy (Aston-Jones et al.)
Modulation of cellular I/O gain (Cohen et al.)

[Figure: P(a₁) as a function of Q(s,a₁) − Q(s,a₂) for β = 0, 1, 10]
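A sketch matching the three curves in the figure above (β = 0, 1, 10 over two actions); the helper below is illustrative, not code from the talk:

```python
import numpy as np

def p_action1(q_diff, beta):
    """P(a1) for two actions with Q(s,a1) - Q(s,a2) = q_diff.
    The two-action softmax reduces to a logistic function of the Q difference."""
    return 1.0 / (1.0 + np.exp(-beta * q_diff))

for beta in (0.0, 1.0, 10.0):
    print(beta, p_action1(np.array([-2.0, 0.0, 2.0]), beta))
# β = 0: always 0.5 (pure exploration); β = 10: nearly deterministic (exploitation)
```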

Page 38:

Discount Factor

V(s(t)) = E[ r(t+1) + γ r(t+2) + γ² r(t+3) + … ]
Balance between short- and long-term results
[Figure: a reward sequence over time and its value; with γ = 0.5, V = −0.093; with γ = 0.9, V = +0.062]

Serotonin (dorsal raphe): low activity is associated with impulsivity
  depression, bipolar disorders
  aggression, eating disorders
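A sketch of this discounted sum for an assumed cost-then-reward sequence; the slide's exact sequence is not given, so the numbers below only illustrate the sign flip between short and long time scales:

```python
def discounted_value(rewards, gamma):
    """Sum of γ^k r(k) over a finite reward sequence."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

rewards = [-0.1] * 5 + [1.0]           # pay a small cost now, get a reward later
print(discounted_value(rewards, 0.5))  # short time scale: negative value
print(discounted_value(rewards, 0.9))  # long time scale: positive value
```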

Page 39:

TD Error

δ(t) = r(t) + γ V(s(t)) − V(s(t−1))

Global learning signal
reward prediction: ΔV(s(t−1)) = α δ(t)
reinforcement: ΔQ(s(t−1),a(t−1)) = α δ(t)

Dopamine (substantia nigra, VTA): responds to errors in reward prediction; reinforces actions

addiction

Page 40:

TD Model of Basal Ganglia (Houk et al. 1995; Montague et al. 1996; Schultz et al. 1997; ...)

Striosome: state value V(s)
Matrix: action value Q(s,a)

[Diagram: sensory input → cerebral cortex (state representation s) → striatum (striosome: evaluation V(s); matrix: action values Q(s,a)) → SNr/GPi (action selection: Q(s,a) → a) → thalamus → action output; dopamine neurons carry the TD signal δ(t), driven by reward r; candidate modulatory roles for NA?, ACh?, 5-HT?]

Page 41:

Possible Control of Discount Factor

Modulation of TD error

Selection/weighting of parallel networks

[Diagram: parallel striatal value networks V₁, V₂, V₃ with discount factors γ₁, γ₂, γ₃, feeding dopamine neurons that compute δ(t) from V(s(t)) and V(s(t+1))]

δ(t) = r(t) + γ V(s(t+1)) − V(s(t))
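A sketch of the second possibility: value modules with different discount factors γᵢ are selected or blended by weights wᵢ to realize an effective prediction time scale. This is an illustrative scheme with toy numbers, not a claim about the actual circuit:

```python
import numpy as np

gammas = np.array([0.6, 0.9, 0.99])      # time scales of the parallel modules

def mixed_value(values, weights):
    """values[i] = V_i(s) from module i; normalized weights select/blend modules."""
    weights = weights / weights.sum()
    return float(weights @ values)

V_s = np.array([0.2, 0.5, 0.8])          # toy module outputs for one state
print(mixed_value(V_s, np.array([0.0, 0.0, 1.0])))  # select the long time scale
print(mixed_value(V_s, np.array([1.0, 1.0, 1.0])))  # uniform weighting
```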

Page 42:

Markov Decision Task (Tanaka et al., 2004)

State transition and reward functions

Stimulus and response

Page 43:

Behavioral Results

All subjects successfully learned the optimal behavior

Page 44:

Block-Design Analysis

SHORT vs. NO (p < 0.001 uncorrected): OFC, insula, striatum, cerebellum

LONG vs. SHORT (p < 0.0001 uncorrected): DLPFC, VLPFC, IPC, PMd, cerebellum, striatum, dorsal raphe

Different brain areas are involved in immediate and future reward prediction

Page 45:

Ventro-Dorsal Difference

Lateral PFC, Insula, Striatum

Page 46:

 

Model-based Regressor Analysis

Estimate V(t) and δ(t) from each subject's performance data
Regression analysis of the fMRI data
[Diagram: environment (state s(t), reward r(t), e.g., 20 yen) and agent (policy, value function V(s), TD error δ(t)); the model-derived V(t) and δ(t) serve as regressors for the fMRI data]
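A sketch of how such regressors can be built: value and TD-error traces computed from a state/reward sequence, one pair per candidate γ. The toy sequence and learning rate are assumptions; the actual analysis is described in Tanaka et al. (2004):

```python
import numpy as np

def td_regressors(states, rewards, n_states, gamma, alpha=0.2):
    """Return per-step V(t) and TD-error δ(t) traces estimated online."""
    V = np.zeros(n_states)
    v_trace, d_trace = [], []
    for t in range(len(states) - 1):
        s, s2, r = states[t], states[t + 1], rewards[t]
        delta = r + gamma * V[s2] - V[s]
        v_trace.append(V[s]); d_trace.append(delta)
        V[s] += alpha * delta
    return np.array(v_trace), np.array(d_trace)

# toy stand-in for a subject's state/reward sequence
rng = np.random.default_rng(0)
states = rng.integers(0, 4, size=300)
rewards = rng.choice([0.0, 0.2, 1.0], size=300)

# one regressor pair per candidate discount factor, as on the next slide
regressors = {g: td_regressors(states, rewards, 4, g)
              for g in (0.0, 0.3, 0.6, 0.8, 0.9, 0.99)}
```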

Page 47:

Explanatory Variables (subject NS)

[Figure: time courses over 312 trials of the reward prediction V(t) and the reward prediction error δ(t), each computed with γ = 0, 0.3, 0.6, 0.8, 0.9, 0.99]

Page 48:

Regression Analysis

[Maps: reward prediction V in mPFC (x = −2 mm) and insula (x = −42 mm); reward prediction error δ in striatum (z = 2)]

Page 49:

Tryptophan Depletion/Loading

Tryptophan: the precursor of serotonin
Depletion/loading affects central serotonin levels (e.g., Bjork et al. 2001; Luciana et al. 2001)

100 g amino acid drink; experiments 6 hours later

Day 1: Tr− (depletion, no tryptophan)
Day 2: Tr0 (control, 2.3 g of tryptophan)
Day 3: Tr+ (loading, 10.3 g of tryptophan)

Page 50:

Blood Tryptophan Levels

N.D. (< 3.9 µg/ml)

Page 51:

Delayed Reward Choice Task

Page 52:

Delayed Reward Choice Task

Sessions     Initial black patches (Yellow, White)    Patches/step (Yellow, White)
1, 2, 7, 8   72, 24                                   18, 9, 8, 2, 6, 2
3            72, 24                                   18, 9, 8, 2, 14, 2
4            72, 24                                   18, 9, 16, 2, 14, 2
5, 6         72, 24                                   18, 9, 16, 2, 6, 2

yellow: large reward with a long delay
white: small reward with a short delay

Page 53:

Choice Behaviors

The shift of the indifference line was not consistent across the 12 subjects

Page 54:

Modulation of Striatal Response

[Figure: striatal responses under Tr−, Tr0, and Tr+ for γ = 0.6, 0.7, 0.8, 0.9, 0.99]

Page 55:

Modulation by Tr Levels


Page 56:

Changes in Correlation Coefficient

γ = 0.6 (28, 0, −4); γ = 0.99 (16, 2, 28)
Tr− < Tr+: correlation with V at large γ in the dorsal putamen
Tr− > Tr+: correlation with V at small γ in the ventral putamen
[Figure: regression slopes per condition]

ROI (region of interest) analysis

Page 57:

Summary

Immediate reward: lateral OFC

Future reward: parietal cortex, PMd, DLPFC, lateral cerebellum, dorsal raphe

Ventro-dorsal gradient: insula, striatum

Serotonergic modulation

Page 58:

Outline

Introduction

Cerebellum, basal ganglia, and cortex

Meta-learning and neuromodulators

Prediction time scale and serotonin

Page 59:

Collaborators

Kyoto PUM: Minoru Kimura, Yasumasa Ueda

Hiroshima U: Shigeto Yamawaki, Yasumasa Okamoto, Go Okada, Kazutaka Ueda, Shuji Asahi, Kazuhiro Shishida

ATR: Jun Morimoto, Kazuyuki Samejima

CREST: Nicolas Schweighofer, Genci Capi

NAIST: Saori Tanaka

OIST: Eiji Uchibe, Stefan Elfwing