Emotion-Driven Reinforcement Learning
Bob Marinier & John Laird
University of Michigan, Computer Science and Engineering
CogSci’08
Introduction
•Interested in the functional benefits of emotion for a cognitive agent
▫Appraisal theories of emotion
▫PEACTIDM theory of cognitive control
•Use emotion as a reward signal to a reinforcement learning agent
▫Demonstrates a functional benefit of emotion
▫Provides a theory of the origin of intrinsic reward
Outline
•Background
▫Integration of emotion and cognition
▫Integration of emotion and reinforcement learning
▫Implementation in Soar
•Learning task
•Results
Appraisal Theories of Emotion
•A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals
▫Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
•Appraisals influence emotion
•Emotion can then be coped with (via internal or external actions)
[Diagram: Situation, Goals → Appraisals → Emotion → Coping]
Appraisals to Emotions (Scherer 2001)
Appraisal | Joy | Fear | Anger
Suddenness | High/medium | High | High
Unpredictability | High | High | High
Intrinsic pleasantness | | Low |
Goal/need relevance | High | High | High
Cause: agent | | Other/nature | Other
Cause: motive | Chance/intentional | | Intentional
Outcome probability | Very high | High | Very high
Discrepancy from expectation | | High | High
Conduciveness | Very high | Low | Low
Control | | | High
Power | | Very low | High
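The table can be read as a set of appraisal profiles. The sketch below is purely illustrative and is not the mechanism used in the model: it stores a subset of the rows and returns the emotions whose profile does not contradict a set of observed appraisals. The names `PROFILES` and `consistent_emotions` are hypothetical, the column placement of the sparse rows is an assumption, and empty cells are treated as unconstrained.

```python
# Subset of the Scherer (2001) rows from the slide. Column placement of the
# sparse rows is an assumption; empty cells are omitted (unconstrained).
PROFILES = {
    "joy": {
        "suddenness": {"high", "medium"},
        "goal_relevance": {"high"},
        "conduciveness": {"very high"},
    },
    "fear": {
        "suddenness": {"high"},
        "goal_relevance": {"high"},
        "conduciveness": {"low"},
        "power": {"very low"},
    },
    "anger": {
        "suddenness": {"high"},
        "goal_relevance": {"high"},
        "conduciveness": {"low"},
        "power": {"high"},
    },
}


def consistent_emotions(observed):
    """Return emotions whose profile does not contradict the observed appraisals."""
    matches = []
    for emotion, profile in PROFILES.items():
        # Only compare dimensions that appear in both the profile and the observation.
        shared = [dim for dim in profile if dim in observed]
        if all(observed[dim] in profile[dim] for dim in shared):
            matches.append(emotion)
    return matches


print(consistent_emotions({"suddenness": "high", "conduciveness": "low", "power": "very low"}))
```

With few observed appraisals, several emotions can remain consistent; each added dimension narrows the set.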
Cognitive Control: PEACTIDM (Newell 1990)
Perceive | Obtain raw perception
Encode | Create domain-independent representation
Attend | Choose stimulus to process
Comprehend | Generate structures that relate stimulus to tasks and can be used to inform behavior
Task | Perform task maintenance
Intend | Choose an action, create prediction
Decode | Decompose action into motor commands
Motor | Execute motor commands
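The eight functions above can be written down as an ordered enumeration. This is a minimal sketch, assuming a strictly linear cycle for illustration; the agent described later chooses among Attend, Task, and Intend rather than always visiting every stage. The names `Stage`, `acronym`, and `next_stage` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    """PEACTIDM functions in slide order, with their descriptions as values."""
    PERCEIVE = "Obtain raw perception"
    ENCODE = "Create domain-independent representation"
    ATTEND = "Choose stimulus to process"
    COMPREHEND = "Generate structures that relate stimulus to tasks"
    TASK = "Perform task maintenance"
    INTEND = "Choose an action, create prediction"
    DECODE = "Decompose action into motor commands"
    MOTOR = "Execute motor commands"


def acronym() -> str:
    """Join the first letter of each stage name, in definition order."""
    return "".join(stage.name[0] for stage in Stage)


def next_stage(stage: Stage) -> Stage:
    """Assumed linear ordering that wraps from Motor back to Perceive."""
    members = list(Stage)
    return members[(members.index(stage) + 1) % len(members)]


print(acronym())
print(next_stage(Stage.MOTOR).name)
```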
Unification of PEACTIDM and Appraisal Theories
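[Figure: the PEACTIDM cycle annotated with appraisals. Each stage passes its output to the next: Perceive → Raw Perceptual Information → Encode → Stimulus Relevance → Attend → Stimulus chosen for processing → Comprehend → Current Situation Assessment → Intend → Action → Decode → Motor Commands → Motor → Environmental Change → Perceive. Appraisal labels: Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness; Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power; Outcome Probability. A Prediction is also labeled.]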
Distinction between emotion, mood, and feeling (Marinier & Laird 2007)
•Emotion: Result of appraisals
▫Is about the current situation
•Mood: “Average” over recent emotions
▫Provides historical context
•Feeling: Emotion “+” Mood
▫What agent actually perceives
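The three quantities can be sketched with appraisal vectors. This is a minimal illustration, not the authors' functions: it assumes mood is a plain mean over a fixed window of recent emotions and that the “+” is element-wise addition clipped to [-1, 1]. The quotation marks on the slide suggest the real combination is more involved. `MoodTracker`, `feeling`, and the window size are hypothetical.

```python
from collections import deque


class MoodTracker:
    """Keeps a window of recent emotion vectors and averages them (assumed rule)."""

    def __init__(self, window=3):
        self.recent = deque(maxlen=window)

    def update(self, emotion):
        """Record the latest emotion and return the resulting mood."""
        self.recent.append(tuple(emotion))
        count = len(self.recent)
        return tuple(sum(column) / count for column in zip(*self.recent))


def feeling(emotion, mood):
    """Combine emotion and mood element-wise, clipped to [-1, 1] (assumed rule)."""
    return tuple(max(-1.0, min(1.0, e + m)) for e, m in zip(emotion, mood))


tracker = MoodTracker(window=3)
tracker.update((0.5, -0.5))
mood = tracker.update((0.0, 0.5))
print(mood)
print(feeling((1.0, 0.5), mood))
```

Because mood carries over from earlier situations, the feeling can be non-zero even when the current emotion is neutral.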
Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)
[Figure: two diagrams. Standard RL: Environment, Critic, and Agent, linked by Actions, States, and Rewards. Intrinsically motivated RL: External Environment plus an “Organism” (Internal Environment, Critic, Agent), linked by Sensations, Actions, States, Rewards, and Decisions; the figure is annotated with Appraisal Process and +/- Feeling Intensity.]
•Reward = Intensity * Valence
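The reward rule is a single product. A minimal sketch, assuming intensity lies in [0, 1] and valence is a sign (+1, 0, or -1); how intensity and valence are derived from the feeling is not specified here, and `feeling_reward` is a hypothetical name.

```python
def feeling_reward(intensity: float, valence: int) -> float:
    """Reward = Intensity * Valence, with assumed ranges checked up front."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity is assumed to lie in [0, 1]")
    if valence not in (-1, 0, 1):
        raise ValueError("valence is assumed to be a sign: -1, 0, or +1")
    return intensity * valence


print(feeling_reward(0.5, -1))
print(feeling_reward(0.25, 1))
```

Under these assumptions the reward is bounded in [-1, 1] and is available whenever a feeling is, rather than only at the goal.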
Extending Soar with Emotion (Marinier & Laird 2007)
[Figure: the Soar architecture extended with emotion, built up over three slides. Components: Body, Perception, Action, Visual Imagery, Short-Term Memory (Situation, Goals), Decision Procedure, Symbolic Long-Term Memories (Procedural, Semantic, Episodic), Chunking, Reinforcement Learning, Semantic Learning, Episodic Learning, and Appraisal Detector. Further labels: Knowledge, Architecture, Appraisals, Feelings, +/- Intensity. Example vectors shown: Emotion (.5, .7, 0, -.4, .3, …), Mood (.7, -.2, .8, .3, .6, …), Feeling (.9, .6, .5, -.1, .8, …).]
Learning task
[Figure: task map with Start and Goal marked.]
Learning task: Encoding
South: Passable: true; On path: true; Progress: true
East: Passable: false; On path: true; Progress: true
West: Passable: false; On path: false; Progress: true
North: Passable: false; On path: false; Progress: true
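The encoding above can be held in a small record per direction. This is a sketch of one possible representation, not the Soar working-memory structure; `DirectionEncoding`, `ENCODINGS`, and `passable_directions` are hypothetical names, and the values are copied from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectionEncoding:
    """Encoded features for one direction, as listed on the slide."""
    passable: bool
    on_path: bool
    progress: bool


# Values copied from the slide for the example location.
ENCODINGS = {
    "south": DirectionEncoding(passable=True, on_path=True, progress=True),
    "east": DirectionEncoding(passable=False, on_path=True, progress=True),
    "west": DirectionEncoding(passable=False, on_path=False, progress=True),
    "north": DirectionEncoding(passable=False, on_path=False, progress=True),
}


def passable_directions(encodings):
    """List the directions the agent could actually move in."""
    return [name for name, enc in encodings.items() if enc.passable]


print(passable_directions(ENCODINGS))
```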
Learning task: Encoding & Appraisal
South: Intrinsic Pleasantness: Neutral; Goal Relevance: High; Unpredictability: Low
East: Intrinsic Pleasantness: Low; Goal Relevance: High; Unpredictability: High
West: Intrinsic Pleasantness: Low; Goal Relevance: Low; Unpredictability: High
North: Intrinsic Pleasantness: Low; Goal Relevance: Low; Unpredictability: High
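One rule set that reproduces these four examples from the encoded features is sketched below. It is a guess fitted to the slide's examples, not the authors' appraisal rules: it assumes pleasantness and unpredictability follow from whether a direction is passable, and goal relevance from whether it is on the path. `appraise` is a hypothetical name.

```python
def appraise(passable: bool, on_path: bool) -> dict:
    """Map encoded features to appraisal labels (assumed rules fitted to four examples)."""
    return {
        "intrinsic_pleasantness": "Neutral" if passable else "Low",
        "goal_relevance": "High" if on_path else "Low",
        "unpredictability": "Low" if passable else "High",
    }


# South on the slide: passable and on the path.
print(appraise(passable=True, on_path=True))
# East on the slide: blocked but on the path.
print(appraise(passable=False, on_path=True))
```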
Learning task: Attending, Comprehending & Appraisal
South: Intrinsic Pleasantness: Neutral; Goal Relevance: High; Unpredictability: Low; Conduciveness: High; Control: High; …
Learning task: Tasking
Optimal Subtasks
What is being learned?
•When to Attend vs. Task
•If Attending, what to Attend to
•If Tasking, which subtask to create
•When to Intend vs. Ignore
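Each of these choices can be treated as an action whose value is learned from reward. The sketch below uses tabular Q-learning with epsilon-greedy selection as a stand-in; the slides do not say which update rule Soar's reinforcement learning applies here. The flat action set, state labels, and parameter values are all hypothetical simplifications.

```python
import random
from collections import defaultdict

# Hypothetical labels for the internal choices named on the slide.
ACTIONS = ("attend", "task", "intend", "ignore")


def choose(q, state, actions=ACTIONS, epsilon=0.1, rng=random):
    """Epsilon-greedy selection over tabular action values."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state,
             actions=ACTIONS, alpha=0.5, gamma=0.9):
    """One Q-learning step; reward would come from the agent's feeling."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
q_update(q, "s0", "task", reward=1.0, next_state="s1")
print(choose(q, "s0", epsilon=0.0))
```

In practice only some choices are available at a given point (Attend vs. Task, Intend vs. Ignore), so the `actions` argument would be narrowed per decision.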
Learning Results
[Figure: median processing cycles (y-axis, 0 to 12000) by episode (x-axis, 1 to 15) for Standard RL, Feeling=Emotion, and Feeling=Emotion+Mood.]
Results: With and without mood
[Figure: median processing cycles (y-axis, 240 to 300) by episode (x-axis, 8 to 15) for Feeling=Emotion, Feeling=Emotion+Mood, and Optimal.]
Discussion
•Agent learns both internal (tasking) and external (movement) actions
•Emotion provides more frequent rewards, so the agent learns faster than with standard RL
•Mood “fills in the gaps”, allowing for even faster learning and less variability
Conclusion & Future Work
•Demonstrated a computational model that integrates emotion and cognitive control
•Confirmed emotion can drive reinforcement learning
•We have already successfully demonstrated similar learning in a more complex domain
•Would like to explore multi-agent scenarios