Prefrontal cortex as a Meta-reinforcement learning system Matthew Botvinick DeepMind, London UK Gatsby Computational Neuroscience Unit, UCL

Page 1

Prefrontal cortex as a Meta-reinforcement learning system

Matthew Botvinick
DeepMind, London UK
Gatsby Computational Neuroscience Unit, UCL

Page 2

Mnih et al, Nature (2015)

Page 3
Page 4

Mnih et al, Nature (2015)

Page 5

Yamins & DiCarlo, 2016

Page 6

Schultz et al, Science (1997)

Page 7

Jaderberg et al., 2016

Page 8

Jaderberg et al., 2016

Page 9

Mante et al., Nature, 2013

Song et al., Elife, 2017

Page 10
Page 11

Lake et al, BBS (2017)

Page 12
Page 13

Harlow, Psychological Review, 1949

“Learning to learn”

Page 14

Harlow, Psychological Review, 1949

Training episodes

“Learning to learn”

Page 15

Mnih et al, Nature (2015)

Page 16
Page 17

Jaderberg et al., 2016

Page 18

Jaderberg et al., 2016

Page 19

https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/

Page 20
Page 21

[Figure: network diagram. Inputs: o_t, a_{t-1}, r_{t-1}; outputs: a_t, v_t; error signal: δ; region labels: (PFC), (DA)]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci., 2016; Duan et al., arXiv (2016)
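
The labels on this slide (inputs o_t, a_{t-1}, r_{t-1}; outputs a_t, v_t; error signal δ) can be read as one time step of a recurrent actor-critic agent. The following is a minimal sketch of that reading, not the architecture from the cited papers: the plain tanh cell, the layer sizes, the discount `gamma`, and every function name are assumptions made for illustration. The key point it shows is that the previous action and reward are fed back in as inputs, so the recurrent state can accumulate task information within an episode.

```python
import math
import random

def init_params(n_obs, n_actions, n_hidden, seed=0):
    """Small random weights for a plain tanh recurrent cell (hypothetical sizes)."""
    rng = random.Random(seed)
    n_in = n_obs + n_actions + 1  # o_t, one-hot a_{t-1}, scalar r_{t-1}

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(n_hidden, n_in),
        "W_rec": mat(n_hidden, n_hidden),
        "W_pi": mat(n_actions, n_hidden),
        "W_v": mat(1, n_hidden),
    }

def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def step(params, h_prev, obs, prev_action, prev_reward):
    """One step: (o_t, a_{t-1}, r_{t-1}, h_{t-1}) -> (policy over a_t, v_t, h_t)."""
    n_actions = len(params["W_pi"])
    one_hot = [1.0 if i == prev_action else 0.0 for i in range(n_actions)]
    x = list(obs) + one_hot + [prev_reward]
    pre = [a + b for a, b in zip(matvec(params["W_in"], x),
                                 matvec(params["W_rec"], h_prev))]
    h = [math.tanh(p) for p in pre]
    logits = matvec(params["W_pi"], h)
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]  # numerically stable softmax
    total = sum(exps)
    policy = [e / total for e in exps]
    value = matvec(params["W_v"], h)[0]
    return policy, value, h

def td_error(reward, value, next_value, gamma=0.9):
    """Reward prediction error: delta_t = r_t + gamma * v_{t+1} - v_t."""
    return reward + gamma * next_value - value

# Usage: a single step from a zero hidden state.
params = init_params(n_obs=1, n_actions=2, n_hidden=8)
h = [0.0] * 8
policy, v0, h = step(params, h, obs=[1.0], prev_action=0, prev_reward=0.0)
```

In this sketch δ would only be used to adjust the weights during training; at test time the weights stay fixed and any adaptation has to come from the hidden state `h`.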

Page 22

[Figure: bandit arm reward probabilities, shown in pairs: (0.7, 0.4), (0.6, 0.9), (0.3, 0.1), (0.8, 0.7)]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

Page 23

[Figure: network diagram. Inputs: o_t, a_{t-1}, r_{t-1}; outputs: a_t, v_t; error signal: δ; region labels: (PFC), (DA)]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

Page 24

[Figure: cumulative regret (y-axis, 1 to 4) against trial (x-axis, 1 to 100); curves labeled Gittins indices, UCB, Thompson sampling. Second panel: Trial by Episode, with actions Left / Right]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
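
The regret plot on this slide compares against standard bandit algorithms. As a rough illustration of what such a comparison measures, here is a minimal sketch of Thompson sampling and UCB1 on a two-armed Bernoulli bandit, tracking cumulative expected regret. The arm probabilities, the 100-trial horizon, the Beta(1, 1) prior, and the UCB1 bonus term are all assumptions chosen for the example; Gittins indices are left out because they need a dynamic-programming table that would not fit in a short sketch.

```python
import math
import random

def run_bandit(choose, probs, n_trials, rng):
    """Play one episode; return cumulative expected regret after each trial."""
    n_arms = len(probs)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    best = max(probs)
    regret, curve = 0.0, []
    for t in range(1, n_trials + 1):
        arm = choose(t, pulls, wins, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        regret += best - probs[arm]  # expected loss from not pulling the best arm
        curve.append(regret)
    return curve

def thompson(t, pulls, wins, rng):
    """Sample each arm's success rate from its Beta(1 + wins, 1 + losses) posterior."""
    samples = [rng.betavariate(1 + w, 1 + n - w) for n, w in zip(pulls, wins)]
    return samples.index(max(samples))

def ucb1(t, pulls, wins, rng):
    """Pull each arm once, then maximise mean + sqrt(2 ln t / n)."""
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    scores = [w / n + math.sqrt(2 * math.log(t) / n) for n, w in zip(pulls, wins)]
    return scores.index(max(scores))

rng = random.Random(0)
probs = [0.7, 0.4]  # hypothetical arm reward probabilities
ts_curve = run_bandit(thompson, probs, 100, rng)
ucb_curve = run_bandit(ucb1, probs, 100, rng)
print(ts_curve[-1], ucb_curve[-1])
```

A single run is noisy; a curve like the one on the slide would come from averaging many episodes with freshly drawn arm probabilities.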

Page 25

[Figure: network diagram. Inputs: o_t, a_{t-1}, r_{t-1}; outputs: a_t, v_t; error signal: δ; region labels: (PFC), (DA)]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

Page 26

[Figure: bandit arm reward probabilities, shown in pairs: (0.7, 0.3), (0.6, 0.4), (0.3, 0.7), (0.8, 0.2)]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
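
Read as pairs, the values on page 22 look unrelated within each pair, while the values on this page sum to 1 within each pair. One way to interpret this, offered here as an assumption rather than something the slides state, is as two task distributions: one with independent arms and one with anti-correlated arms. The sketch below draws episodes from both; the uniform sampling and the function names are hypothetical.

```python
import random

def sample_independent(rng):
    """Each arm's reward probability drawn independently from U(0, 1)."""
    return (rng.random(), rng.random())

def sample_correlated(rng):
    """Arm probabilities tied together: p_right = 1 - p_left."""
    p_left = rng.random()
    return (p_left, 1.0 - p_left)

def make_episode(sampler, rng):
    """Fix one bandit for a whole episode; return its arms and a pull function."""
    probs = sampler(rng)

    def pull(arm):
        return 1 if rng.random() < probs[arm] else 0

    return probs, pull

rng = random.Random(0)
for sampler in (sample_independent, sample_correlated):
    probs, pull = make_episode(sampler, rng)
    rewards = [pull(0) for _ in range(5)]  # five pulls of the left arm
    print(sampler.__name__, probs, rewards)
```

Under the correlated distribution a single outcome on one arm is informative about the other arm, which is the kind of structure an agent trained across many such episodes could learn to exploit.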

Page 27

[Figure: cumulative regret (y-axis, 1 to 4) against trial (x-axis, 1 to 100); curves labeled Gittins indices, UCB, Thompson sampling. Second panel: Trial by Episode]

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

Page 28

Training episodes

Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)

Page 29

[Figure: network diagram. Inputs: o_t, a_{t-1}, r_{t-1}; outputs: a_t, v_t; error signal: δ; region labels: (PFC), (DA)]

Volkmann et al., Nature Reviews Neurology, 2010

Page 30

[Figure: two panels with axes log2(R_R/R_L) and log2(C_R/C_L), both ranging from -4 to 4]

Tsutsui et al., Nature Comms, 2016

Wang et al., Nature Neuroscience (2018)

Page 31

[Figure: network diagram. Inputs: o_t, a_{t-1}, r_{t-1}; outputs: a_t, v_t; error signal: δ; region labels: (PFC), (DA)]

Wang et al., Nature Neuroscience (2018)

Page 32

[Figure: two panels over the variables a_{t-1}, r_{t-1}, a_{t-1} x r_{t-1}, v_t, with y-axis ticks from 0.1 to 0.6. First panel, y-axis "Proportion" (Tsutsui et al., Nature Comms, 2016); second panel, y-axis "Correlation"]

Wang et al., Nature Neuroscience (2018)

Page 33

[Figure: network diagram. Inputs: o_t, a_{t-1}, r_{t-1}; outputs: a_t, v_t; error signal: δ; region labels: (PFC), (DA)]

Wang et al., Nature Neuroscience (2018)

Page 34
Page 35

[Figure: cumulative regret (y-axis, 1 to 4) against trial (x-axis, 1 to 100); curves labeled Gittins indices, UCB, Thompson sampling. Second panel: Trial by Episode]

Page 36

[Figure: panels A and B. x-axis: Step (0 to 200); y-axes: 0 to 1. Traces: reward probability, inferred/decoded volatility, learning rate. Labels: action, feedback]

Behrens et al., Nature Neuroscience, 2007; Wang et al., Nature Neuroscience (2018)

Page 37

Behrens et al., Nature Neuroscience, 2007; Wang et al., Nature Neuroscience (2018)

Page 38

[Figure: network diagram. Inputs: o_t, a_{t-1}, r_{t-1}; outputs: a_t, v_t; error signal: δ; region labels: (PFC), (DA)]

Volkmann et al., Nature Reviews Neurology, 2010

Page 39

Bromberg-Martin et al, J Neurophys, 2010

REVERSAL

Wang et al., Nature Neuroscience (2018)

Page 40

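[Figure: network diagram. Inputs: o_t, a_{t-1}, r_{t-1}; outputs: a_t, v_t; error signal: δ; region labels: (PFC), (DA). Conditions: Left rewarded, Right rewarded]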

Wang et al., Nature Neuroscience (2018)

Page 41
Page 42

Miller, Botvinick & Brody, Nat. Neuro., 2017; Daw et al., Neuron, 2011

Model-based RL (from model-free RL)

[Figure: Meta-RL RPE (y-axis, -1 to 1) against Model-based RPE (x-axis, -1 to 1), r^2 = 0.89. Panel labels: Stage 2, Reward]

Wang et al., Nature Neuroscience (2018)

Page 43

Optogenetic manipulation of dopamine

DA blocked upon food reward from large/risky option

DA blocked upon food reward from small/certain option

DA triggered upon food omission from large/risky option

Wang et al., arXiv, 2018; Stopper et al., Neuron, 2014

Page 44

Mnih et al, Nature (2015)

Page 45
Page 46
Page 47

Current / Future Work

• Richer environments / abstractions (Espeholt et al., arXiv, 2018)

• Architectural biases (e.g., Raposo et al., NIPS, 2017)

• Complementary forms of meta-learning (e.g., Fernando et al., under review)

• Episodic reinstatement (Ritter et al., in press)

Page 48

Neuroscience and AI: A virtuous circle

Page 49

Collaborators

Jane Wang, Zeb Kurth-Nelson, Dharshan Kumaran, Chris Summerfield, Hubert Soyer, Joel Leibo, Sam Ritter, Adam Santoro, Tim Lillicrap, David Barrett, Dhruva Tirumala, Remi Munos, Charles Blundell, Demis Hassabis

DeepMind, London UK
Gatsby Computational Neuroscience Unit, UCL