Emotion-Driven Reinforcement Learning Bob Marinier & John Laird University of Michigan, Computer Science and Engineering CogSci’08

Introduction

• Interested in the functional benefits of emotion for a cognitive agent
  ▫ Appraisal theories of emotion
  ▫ PEACTIDM theory of cognitive control

• Use emotion as a reward signal to a reinforcement learning agent
  ▫ Demonstrates a functional benefit of emotion
  ▫ Provides a theory of the origin of intrinsic reward

Outline

• Background
  ▫ Integration of emotion and cognition
  ▫ Integration of emotion and reinforcement learning
  ▫ Implementation in Soar
• Learning task
• Results

Appraisal Theories of Emotion

• A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals
  ▫ Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
• Appraisals influence emotion
• Emotion can then be coped with (via internal or external actions)

[Figure: Situation and Goals → Appraisals → Emotion → Coping]

Appraisals to Emotions (Scherer 2001)

Appraisal                       Joy                 Fear          Anger
Suddenness                      High/medium         High          High
Unpredictability                High                High          High
Intrinsic pleasantness                              Low
Goal/need relevance             High                High          High
Cause: agent                                        Other/nature  Other
Cause: motive                   Chance/intentional                Intentional
Outcome probability             Very high           High          Very high
Discrepancy from expectation                        High          High
Conduciveness                   Very high           Low           Low
Control                                                           High
Power                                               Very low      High
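The table can also be read as data. The Python sketch below is illustrative and is not a method from Scherer (2001) or from the authors' model. It stores each emotion's column as a prototype and returns the prototype that a categorical appraisal frame matches best. Several things are assumptions: the dimension keys, the scoring rule (the fraction of a prototype's filled-in cells that the frame satisfies), and the column placement of the partially filled rows. A blank cell is treated as unconstrained.

```python
from typing import Dict, FrozenSet

# One prototype per emotion column of the table. Only cells the table fills in
# are listed; a blank cell leaves that dimension unconstrained.
# Keys and value spellings are illustrative assumptions.
Prototype = Dict[str, FrozenSet[str]]

PROTOTYPES: Dict[str, Prototype] = {
    "joy": {
        "suddenness": frozenset({"high", "medium"}),
        "unpredictability": frozenset({"high"}),
        "goal_relevance": frozenset({"high"}),
        "cause_motive": frozenset({"chance", "intentional"}),
        "outcome_probability": frozenset({"very high"}),
        "conduciveness": frozenset({"very high"}),
    },
    "fear": {
        "suddenness": frozenset({"high"}),
        "unpredictability": frozenset({"high"}),
        "intrinsic_pleasantness": frozenset({"low"}),
        "goal_relevance": frozenset({"high"}),
        "cause_agent": frozenset({"other", "nature"}),
        "outcome_probability": frozenset({"high"}),
        "discrepancy": frozenset({"high"}),
        "conduciveness": frozenset({"low"}),
        "power": frozenset({"very low"}),
    },
    "anger": {
        "suddenness": frozenset({"high"}),
        "unpredictability": frozenset({"high"}),
        "goal_relevance": frozenset({"high"}),
        "cause_agent": frozenset({"other"}),
        "cause_motive": frozenset({"intentional"}),
        "outcome_probability": frozenset({"very high"}),
        "discrepancy": frozenset({"high"}),
        "conduciveness": frozenset({"low"}),
        "control": frozenset({"high"}),
        "power": frozenset({"high"}),
    },
}


def match_score(frame: Dict[str, str], proto: Prototype) -> float:
    """Fraction of the prototype's specified dimensions that the frame satisfies."""
    hits = sum(1 for dim, allowed in proto.items() if frame.get(dim) in allowed)
    return hits / len(proto)


def closest_emotion(frame: Dict[str, str]) -> str:
    """Prototype with the highest match score (ties go to the first one listed)."""
    return max(PROTOTYPES, key=lambda name: match_score(frame, PROTOTYPES[name]))


if __name__ == "__main__":
    frame = {
        "suddenness": "high", "unpredictability": "high", "goal_relevance": "high",
        "cause_agent": "other", "cause_motive": "intentional",
        "outcome_probability": "very high", "discrepancy": "high",
        "conduciveness": "low", "control": "high", "power": "high",
    }
    print(closest_emotion(frame))
```

A lookup like this discards the graded structure the authors' model relies on. It is only meant to make the table's columns concrete.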

Cognitive Control: PEACTIDM (Newell 1990)

Perceive     Obtain raw perception
Encode       Create domain-independent representation
Attend       Choose stimulus to process
Comprehend   Generate structures that relate stimulus to tasks and can be used to inform behavior
Task         Perform task maintenance
Intend       Choose an action, create prediction
Decode       Decompose action into motor commands
Motor        Execute motor commands

Unification of PEACTIDM and Appraisal Theories

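[Figure: the PEACTIDM cycle (Perceive → Encode → Attend → Comprehend → Intend → Decode → Motor), annotated with what each step passes on and the appraisals it generates. Perceive yields raw perceptual information. Encode yields stimulus relevance (Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness). Attend yields the stimulus chosen for processing. Comprehend yields the current situation assessment (Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power). Intend yields an action and a prediction, appraised as Outcome Probability. Decode yields motor commands, and Motor yields environmental change.]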
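The unification can be sketched as a pass through the PEACTIDM steps in which each appraisal becomes available only once the step that generates it has run. Encode yields the stimulus-relevance appraisals, Comprehend yields the situation-assessment appraisals, and Intend yields a prediction whose outcome probability is appraised. The Python below is a hypothetical illustration of that ordering, not the authors' Soar implementation. The step names follow the PEACTIDM slide and the appraisal groupings follow the unification slide. Everything else (`Step`, `APPRAISALS_BY_STEP`, `run_cycle`) is assumed.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Step(Enum):
    """The eight PEACTIDM functions, in the order listed on the slide."""
    PERCEIVE = "Perceive"
    ENCODE = "Encode"
    ATTEND = "Attend"
    COMPREHEND = "Comprehend"
    TASK = "Task"
    INTEND = "Intend"
    DECODE = "Decode"
    MOTOR = "Motor"


# Appraisals each step makes available; steps not listed generate none here.
APPRAISALS_BY_STEP: Dict[Step, Tuple[str, ...]] = {
    Step.ENCODE: ("suddenness", "unpredictability", "goal relevance",
                  "intrinsic pleasantness"),
    Step.COMPREHEND: ("causal agent/motive", "discrepancy", "conduciveness",
                      "control/power"),
    Step.INTEND: ("outcome probability",),  # appraisal of Intend's prediction
}


def run_cycle(steps: Iterable[Step]) -> List[str]:
    """Run the given steps in order; return the appraisals available afterwards."""
    available: List[str] = []
    for step in steps:
        available.extend(APPRAISALS_BY_STEP.get(step, ()))
    return available


if __name__ == "__main__":
    # After Attend only the Encode appraisals exist; conduciveness needs Comprehend.
    print(run_cycle([Step.PERCEIVE, Step.ENCODE, Step.ATTEND]))
    # Iterating the Enum class visits every step in definition order.
    print(run_cycle(Step))
```

The point of the ordering is that an agent choosing what to Attend to can use only the Encode appraisals. Conduciveness and control exist only after Comprehend.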

Distinction between emotion, mood, and feeling (Marinier & Laird 2007)

• Emotion: result of appraisals
  ▫ Is about the current situation
• Mood: “average” over recent emotions
  ▫ Provides historical context
• Feeling: emotion “+” mood
  ▫ What the agent actually perceives
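A minimal Python sketch of how these three quantities could relate. The slide puts “average” and “+” in quotes, and the 2007 model defines its own mood-update and combination functions, which are not reproduced here. This sketch instead assumes three things. First, each quantity is a vector of per-dimension values in [-1, 1]. Second, mood moves a fixed fraction `rate` toward the current emotion every cycle (an exponential moving average). Third, feeling is the element-wise sum of emotion and mood, clipped to [-1, 1]. The function names and the default `rate` are hypothetical.

```python
from typing import List

Vector = List[float]  # one value per appraisal dimension, assumed in [-1, 1]


def clip(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def update_mood(mood: Vector, emotion: Vector, rate: float = 0.1) -> Vector:
    """Move each mood dimension `rate` of the way toward the current emotion."""
    return [m + rate * (e - m) for m, e in zip(mood, emotion)]


def feeling(emotion: Vector, mood: Vector) -> Vector:
    """Assumed stand-in for emotion "+" mood: clipped element-wise sum."""
    return [clip(e + m) for e, m in zip(emotion, mood)]


if __name__ == "__main__":
    mood = [0.0, 0.0]
    for _ in range(5):  # a run of positive emotion on the first dimension
        mood = update_mood(mood, [0.8, 0.0])
    no_emotion = [0.0, 0.0]  # a cycle with no appraisal-driven emotion
    print(feeling(no_emotion, mood))  # mood alone keeps the feeling non-zero
```

The last lines illustrate one reading of how mood can “fill in the gaps”. On a cycle with no emotion, the feeling still carries the recent emotional history.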

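Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)

[Figure: (left) standard RL: the agent sends actions to the environment, whose critic returns states and rewards. (right) emotion-driven RL: the critic sits in an internal environment within the “organism”, where an appraisal process turns sensations from the external environment into a +/- feeling intensity that serves as the reward for the agent's decisions.]

• Reward = Intensity × Valence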
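In code, the slide's reward is a single product. The Python sketch below makes two assumptions: intensity lies in [0, 1], and valence is the sign of the feeling (+1 or -1, following the “+/- Feeling Intensity” label). It feeds the result into a standard one-step tabular Q-learning update (Sutton & Barto 1998). The slides do not say which temporal-difference rule the authors' Soar agent used, or its learning rate and discount. `alpha`, `gamma`, and all names here are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def emotion_reward(intensity: float, valence: float) -> float:
    """Reward = Intensity * Valence (intensity assumed in [0, 1], valence +1 or -1)."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    if valence not in (-1.0, 1.0):
        raise ValueError("valence must be +1 or -1")
    return intensity * valence


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, next_actions: Sequence[Hashable],
             alpha: float = 0.3, gamma: float = 0.9) -> float:
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q: QTable = defaultdict(float)
    # Hypothetical step: a moderately intense negative feeling after an action.
    r = emotion_reward(0.5, -1)
    print(q_update(q, "cell-3", "east", r, "cell-3", ["north", "south", "east", "west"]))
```

A feeling exists on every processing cycle, so this reward arrives far more often than a task reward delivered only at the goal. The Discussion slide credits that frequency for faster learning.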

Extending Soar with Emotion (Marinier & Laird 2007)

[Figure: the Soar architecture, extended with an appraisal detector. Components: body with perception and action; short-term memory holding the situation and goals; decision procedure; procedural memory with chunking and reinforcement learning; semantic and episodic memories with their learning mechanisms; visual imagery.]

Extending Soar with Emotion (Marinier & Laird 2007)

[Figure: appraisals in short-term memory feed the appraisal detector, which produces an emotion vector (e.g. .5, .7, 0, -.4, .3, …). Emotion is combined with mood (e.g. .7, -.2, .8, .3, .6, …) into a feeling (e.g. .9, .6, .5, -.1, .8, …). The feeling appears in short-term memory, and its +/- intensity is passed to reinforcement learning.]

Learning task

[Figure: task map with a Start location and a Goal location]

Learning task: Encoding

Direction   Passable   On path   Progress
South       true       true      true
East        false      true      true
West        false      false     true
North       false      false     true

Learning task: Encoding & Appraisal

Direction   Intrinsic Pleasantness   Goal Relevance   Unpredictability
South       Neutral                  High             Low
East        Low                      High             High
West        Low                      Low              High
North       Low                      Low              High
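The encoding and appraisal slides give inputs and outputs but not the rules that connect them. The Python sketch below uses one hypothetical rule set that reproduces all four rows. Intrinsic pleasantness and unpredictability depend on Passable, and goal relevance depends on On path. Progress is true in every example, so its effect cannot be determined and it is unused here. Other rule sets fit the four examples equally well, and the class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Encoding:
    passable: bool
    on_path: bool
    progress: bool  # true in every example on the slides, so unused below


@dataclass(frozen=True)
class Appraisal:
    intrinsic_pleasantness: str
    goal_relevance: str
    unpredictability: str


def appraise(enc: Encoding) -> Appraisal:
    """Hypothetical rules chosen only to reproduce the slides' four examples."""
    return Appraisal(
        intrinsic_pleasantness="Neutral" if enc.passable else "Low",
        goal_relevance="High" if enc.on_path else "Low",
        unpredictability="Low" if enc.passable else "High",
    )


# The four encoded directions from the encoding slide.
ENCODINGS: Dict[str, Encoding] = {
    "South": Encoding(passable=True, on_path=True, progress=True),
    "East": Encoding(passable=False, on_path=True, progress=True),
    "West": Encoding(passable=False, on_path=False, progress=True),
    "North": Encoding(passable=False, on_path=False, progress=True),
}

if __name__ == "__main__":
    for direction, enc in ENCODINGS.items():
        print(direction, appraise(enc))
```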

Learning task: Attending, Comprehending & Appraisal

South
  Intrinsic Pleasantness: Neutral
  Goal Relevance: High
  Unpredictability: Low
  Conduciveness: High
  Control: High
  …

Learning task: Tasking

Learning task: Tasking

[Figure: Optimal Subtasks]

What is being learned?

• When to Attend vs. Task
• If Attending, what to Attend to
• If Tasking, which subtask to create
• When to Intend vs. Ignore
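Each of these questions is a choice among internal actions, so the same reinforcement-learning machinery that learns movement can learn them. The Python sketch below shows epsilon-greedy selection over Q-values at one such choice point. Epsilon-greedy is an assumed exploration policy (the slides do not say which one the agent used), and the option labels, state label, and function name are hypothetical.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, str], float]


def choose(q: QTable, state: Hashable, options: Sequence[str],
           epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy: explore uniformly with probability epsilon, else take the best Q-value."""
    if not options:
        raise ValueError("a choice point needs at least one option")
    if rng.random() < epsilon:
        return rng.choice(list(options))
    return max(options, key=lambda o: q[(state, o)])


if __name__ == "__main__":
    q: QTable = defaultdict(float)
    q[("s0", "attend")] = 0.1
    q[("s0", "task")] = 0.4
    rng = random.Random(0)
    # Attend vs. Task. The same function can serve the attend-target, subtask,
    # and Intend vs. Ignore choices, each with its own option list.
    print(choose(q, "s0", ["attend", "task"], epsilon=0.1, rng=rng))
```

The Q-values at these internal choice points would be updated from the same emotion-derived reward as movement actions. This matches the Discussion's point that the agent learns both internal and external actions.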

Learning Results

[Figure: median processing cycles (y-axis, 0 to 12000) per episode (x-axis, episodes 1 to 15) for Standard RL, Feeling=Emotion, and Feeling=Emotion+Mood]

Results: With and without mood

[Figure: median processing cycles (y-axis, 240 to 300) per episode (x-axis, episodes 8 to 15) for Feeling=Emotion, Feeling=Emotion+Mood, and Optimal]

Discussion

• The agent learns both internal (tasking) and external (movement) actions

• Emotion allows for more frequent rewards, so the agent learns faster than with standard RL

• Mood “fills in the gaps”, allowing for even faster learning and less variability

Conclusion & Future Work

• Demonstrated a computational model that integrates emotion and cognitive control

• Confirmed that emotion can drive reinforcement learning

• We have already successfully demonstrated similar learning in a more complex domain

• Would like to explore multi-agent scenarios