Bob Marinier
Oral Defense
University of Michigan, CSE
June 17, 2008
A Computational Unification of Cognitive Control, Emotion, and Learning
Introduction
The link between core cognitive functions and emotion has not been fully explored
Existing computational models are largely pragmatic
We integrate the PEACTIDM theory of cognitive control and appraisal theories of emotion
  PEACTIDM supplies process, appraisal theories supply data
We use emotion-driven reinforcement learning to demonstrate improved functionality
  Automatically generate rewards, set parameters
2
Cognitive Control: PEACTIDM
Perceive: Obtain raw perception
Encode: Create domain-independent representation
Attend: Choose stimulus to process
Comprehend: Generate structures that relate stimulus to tasks and can be used to inform behavior
Task: Perform task maintenance
Intend: Choose an action, create prediction
Decode: Decompose action into motor commands
Motor: Execute motor commands
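The eight functions listed above can be read as an ordered processing pipeline. The following is a minimal Python sketch of that reading; `Step`, `run_cycle`, and the handler dictionary are hypothetical names introduced for illustration and are not part of the Soar implementation described in the talk.

```python
from enum import Enum
from typing import Callable, Dict, List


class Step(Enum):
    """The eight PEACTIDM functions, in the order listed on the slide."""
    PERCEIVE = "Obtain raw perception"
    ENCODE = "Create domain-independent representation"
    ATTEND = "Choose stimulus to process"
    COMPREHEND = "Relate stimulus to tasks"
    TASK = "Perform task maintenance"
    INTEND = "Choose an action, create prediction"
    DECODE = "Decompose action into motor commands"
    MOTOR = "Execute motor commands"


def run_cycle(handlers: Dict[Step, Callable[[dict], dict]], state: dict) -> List[Step]:
    """Run each step's handler in order, threading a state dict through.

    Hypothetical helper: steps without a handler are skipped, so a cycle
    that needs no task maintenance simply omits Step.TASK.
    """
    trace = []
    for step in Step:  # Enum iteration follows definition order
        handler = handlers.get(step)
        if handler is None:
            continue
        state.update(handler(state))
        trace.append(step)
    return trace
```

A usage example would pass handlers such as `{Step.PERCEIVE: lambda s: {"raw": "light on"}}`; each handler returns the new data it contributes to the shared state.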
3
PEACTIDM Cycle
[Figure: PEACTIDM cycle diagram. Steps: Perceive, Encode, Attend, Comprehend, Intend, Decode, Motor. Data passed between steps: Raw Perceptual Information, Stimulus Relevance, Stimulus chosen for processing, Current Situation Assessment, Action, Motor Commands, Environmental Change, Prediction. Callout: What is this information?]
4
Appraisal Theories of Emotion
A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals
  Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
Result of appraisals influences emotion
Emotion can then be coped with (via internal or external actions)
[Figure: diagram with labels Situation, Goals, Appraisals, Emotion, Coping]
5
Appraisals to Emotions (Scherer 2001)
6
Dimension | Joy | Fear | Anger
Suddenness | High/medium | High | High
Unpredictability | High | High | High
Intrinsic pleasantness | - | Low | -
Goal/need relevance | High | High | High
Cause: agent | - | Other/nature | Other
Cause: motive | Chance/intentional | - | Intentional
Outcome probability | Very high | High | Very high
Discrepancy from expectation | - | High | High
Conduciveness | Very high | Low | Low
Control | - | - | High
Power | - | Very low | High
Why these dimensions?
What is the functional purpose of emotion?
Unification of PEACTIDM and Appraisal Theories
7
[Figure: PEACTIDM cycle diagram (Perceive, Encode, Attend, Comprehend, Intend, Decode, Motor) annotated with appraisals. Appraisal labels: Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness, Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power, Outcome Probability]
Example: Simple Choice Response Task
8
[Figure: Emotion Intensity with Valence (scale -1 to 1) over Time]
Appraisal Frame
Suddenness 1
Goal Relevance 1
Conduciveness 1
Discrepancy 0
Outcome Probability 1
PEACTIDM in the Button Task
9
“Surprise Factor”
[Figure: Emotion Intensity with Valence (scale -1 to 1) over Time]
Appraisal Frame
Suddenness 1
Goal Relevance 1
Conduciveness 1
Discrepancy 0
Outcome Probability 1
PEACTIDM in the Button Task
10
Conduciveness -1
Discrepancy 1
Summary of Evaluation
11
1. Cognitively generated emotions
   Emotions arise from appraisals
2. Fast primary emotions
   Some appraisals generated and activated early
3. Emotional experience
   Cognitive access to emotional state, but no physiology
4. Body-mind interactions
   Emotions can influence behavior
5. Emotional behavior
   a. Model works and produces useful, purposeful behavior
   b. Different environments lead to:
      i. Different time courses
      ii. Different feeling profiles
   c. Choices impact emotions and success
Primary Contributions
12
1. Appraisals are functionally required by cognition
   They specify the data used by certain steps in PEACTIDM
2. Appraisals provide a task-independent language for control knowledge
   They influence choices such as Attend and Intend
3. PEACTIDM implies a partial ordering of appraisal generation
   Data dependencies imply that some appraisals can’t be generated until after others
4. Circumplex models can be synthesized from appraisal models
   Emotion intensity and valence can be derived from appraisals
5. Emotion intensity is largely determined by expectations
   “Surprise Factor” is determined by Outcome Probability and Discrepancy from Expectation
6. Some appraisals may require an arbitrary amount of inference
   Comprehend can theoretically require arbitrary processing
7. Internal and external stimuli are treated identically
   Tasking options can be Attended and Intended just like external stimuli
Additional Exploration
13
Functionality: What is emotion good for?
  Emotion-driven reinforcement learning
Scale: Does it work in non-trivial domains?
  Continuous time/space environment
  More complex appraisal generation
Understanding: How do appraisals influence performance?
  Try subsets of appraisals
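Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)
14
[Figure: two agent-environment diagrams. First: Environment (containing a Critic) and Agent, linked by Actions, States, and Rewards. Second: External Environment, Internal Environment (containing a Critic, with an Appraisal Process producing +/- Emotion Intensity), and Agent, linked by Sensations, Actions, States, Rewards, and Decisions; the second diagram is labeled “Organism”]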
Reward = Intensity * Valence
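The reward rule above is simple enough to state directly in code. This is a minimal sketch; the function name `intrinsic_reward` and the range checks (intensity in [0, 1], valence in [-1, 1]) are assumptions made for illustration.

```python
def intrinsic_reward(intensity: float, valence: float) -> float:
    """Reward = Intensity * Valence, as stated on the slide.

    Assumes intensity in [0, 1] and valence in [-1, 1] (an assumption
    of this sketch), so the reward lies in [-1, 1].
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    if not -1.0 <= valence <= 1.0:
        raise ValueError("valence must be in [-1, 1]")
    return intensity * valence
```

Under these assumptions an intense negative feeling yields a strongly negative reward, and a feeling with zero intensity yields no reward regardless of valence.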
Clean House Domain
15
[Figure: map of the Clean House domain. Labels: Blocks, Agent, Rooms, Storage Room, Gateways]
Stimuli in the Environment
Gateway to 73
Gateway to 78
Gateway to 93
Current room
Block 1
Create subtask go to room 73
Create subtask go to room 78
Create subtask go to room 93
Create subtask clean current room
16
Learning
In this domain, the agent is only learning what to Attend to (including Tasking)
  Not learning what action to take
Goal: What is the impact of various appraisals?
  Disabled most and developed a few
    Conduciveness
    Discrepancy from Expectation and Outcome Probability
    Goal Relevance
    Intrinsic Pleasantness
Method: SARSA, epsilon-greedy, fixed exploration rate (ER) and learning rate (LR)
  50 trials, 15 episodes per trial
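For readers unfamiliar with the method named above, here is a minimal tabular sketch of SARSA with epsilon-greedy selection. The dictionary-based Q-table, the function names, and the parameters `alpha` (learning rate), `gamma` (discount), and `epsilon` (exploration rate) are assumptions for illustration; the slides do not give the values used, and in this domain the "actions" would be Attend choices.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma):
    """On-policy TD update: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a))."""
    td_error = reward + gamma * q[(s_next, a_next)] - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return td_error


# Usage: a Q-table that defaults unseen (state, action) pairs to 0.0.
q_table = defaultdict(float)
sarsa_update(q_table, "room", "attend-block", 1.0, "room", "attend-gateway",
             alpha=0.5, gamma=0.9)
```

Because SARSA is on-policy, the update uses the action actually selected in the next state rather than the greedy maximum.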
17
Conduciveness
18
Measures how good or bad a stimulus is
Influences emotion intensity and valence
  Sufficient to generate a reward
Value based on “progress” and “path”
  Progress: Is agent getting closer to goal over time?
  Path: Will acting on stimulus get agent closer to goal?
Conduciveness
[Figure: Decision Cycles per episode (Episodes 1 to 15; y-axis 0 to 10000); series: 3rd, median, 1st]
[Figure: Total Reward per episode (Episodes 1 to 15; y-axis -300 to 300); series: 3rd, median, 1st]
Total Failures | Trial Failures | Final Episode Failures
7.6% | 24% | 6%
19
Outcome Probability and Discrepancy from Expectation
20
Measures how likely a prediction is and how accurate the prediction is
Influences emotion intensity via “surprise factor” (unvalenced)
Predictions and Outcome Probability generated via learned task model
  Results in non-stationary reward
Discrepancy generated via comparison to prediction
Added these appraisals on top of Conduciveness
Outcome Probability and Discrepancy from Expectation
[Figure: Fraction Correct Predictions per episode (Episodes 1 to 15; y-axis 0 to 1); series: 3rd, median, 1st]
Total Failures | Trial Failures | Final Episode Failures
0% | 0% | 0%
21
Goal Relevance
Measures how important a stimulus is for the goal
Influences emotion intensity (unvalenced)
Value based on “path” knowledge
  Agent actually had too much path knowledge, so removed some
The value of Goal Relevance for some stimulus is used to “boost” the Q-value of the Attend operator for that stimulus
Added this appraisal on top of Conduciveness, Outcome Probability, and Discrepancy
22
GR Knowledge Reduction Results
23
[Figure: Decision Cycles per episode (episode ticks 1 to 29; y-axis 0 to 10000); series: 3rd, median, 1st]
[Figure: Total Reward per episode (episode ticks 1 to 29; y-axis -20 to 70); series: 3rd, median, 1st]
[Figure: Fraction Correct Predictions per episode (episode ticks 1 to 29; y-axis 0 to 1); series: 3rd, median, 1st]
Intrinsic Pleasantness
Measures how attracted the agent is to a stimulus independent of the current goal
Influences emotion intensity and valence
Made blocks intrinsically pleasant
  This is good because blocks need to be Attended to get cleaned up
  This is bad because agent may be distracted by blocks that have already been cleaned up
Replaced Goal Relevance with this appraisal
24
Intrinsic Pleasantness Results
[Figure: Fraction Correct Predictions per episode (Episodes 1 to 15; y-axis 0 to 1); series: 3rd, median, 1st]
[Figure: Total Reward per episode (Episodes 1 to 15; y-axis -40 to 40); series: 3rd, median, 1st]
[Figure: Decision Cycles per episode (Episodes 1 to 15; y-axis 0 to 20000); series: 3rd, median, 1st]
25
Dynamic Exploration Rate
26
Dynamically adjust exploration rate based on current emotion
  If Valence < 0, then things could probably be better
    ER = |Intensity * Valence|
  If Valence > 0, then things are ok
    ER = 0
Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only
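The exploration-rate rule above can be sketched as a small function. The name `exploration_rate` is hypothetical; the slide does not say what happens when Valence is exactly 0, and this sketch returns 0 in that case (which is also what |Intensity * Valence| would give).

```python
def exploration_rate(intensity: float, valence: float) -> float:
    """ER = |Intensity * Valence| when valence is negative, else 0.

    Valence == 0 is not covered by the slide; returning 0.0 there is
    an assumption of this sketch.
    """
    if valence < 0:
        return abs(intensity * valence)
    return 0.0
```

The returned value would play the role of epsilon in an epsilon-greedy policy, so the agent explores only while it feels bad.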
Dynamic Exploration Rate
Total Failures Trial Failures Final Episode Failures
9.2% 38% 8.0%
27
[Figure: Total Reward per episode (Episodes 1 to 15; y-axis -140 to 40); series: 3rd, median, 1st]
[Figure: Fraction Correct Predictions per episode (Episodes 1 to 15; y-axis 0 to 1); series: 3rd, median, 1st]
Dynamic Learning Rate
28
Dynamically adjust learning rate based on current emotion
  If reward magnitude is large, then there may be something to learn
    LR = |Intensity * Valence|
Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only, Dynamic Exploration Rate enabled
Total Failures | Trial Failures | Final Episode Failures
0.5% | 8.0% | 0.0%
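The learning-rate rule can be sketched the same way. The names `learning_rate` and `scaled_update` are hypothetical, and applying the rate to a generic temporal-difference error is an assumption about how the rule plugs into the learner.

```python
def learning_rate(intensity: float, valence: float) -> float:
    """LR = |Intensity * Valence|: larger reward magnitude, larger step."""
    return abs(intensity * valence)


def scaled_update(q_value: float, td_error: float,
                  intensity: float, valence: float) -> float:
    """Apply one value update whose step size is set by the current feeling."""
    return q_value + learning_rate(intensity, valence) * td_error
```

With this rule a neutral feeling (zero intensity or zero valence) leaves the value unchanged, which matches the stated intuition that there is little to learn when reward magnitude is small.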
Dynamic Exploration and Learning Rates
29
Dynamically adjust exploration and learning rates based on current emotion
  If Valence < 0, then things could probably be better
    ER = |Intensity * Valence|
  If Valence > 0, then things are ok
    ER = 0
  If reward magnitude is large, then there may be something to learn
    LR = |Intensity * Valence|
Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only
Results: Tighter convergences, better prediction accuracy, small number of failures
Learning Summary
Conduciveness: Foundation to learning. Agent learns to perform the task better over time.
Outcome Probability, Discrepancy from Expectation: Introduced learned task model for generating predictions as basis for generating values for these appraisals. Agent learns to predict better over time. Also results in much improved failure rates.
Goal Relevance: Used to “boost” Q-values of proposed Attend operators. Agent does extremely well (except for failures), to the point where it almost isn’t learning, raising questions about the value of other appraisals. Knowledge about Goal Relevance was reduced, leading to more learning.
Intrinsic Pleasantness: Used to provide a task-independent bias on valence and intensity. Results are mixed, as expected, but agent generally learns to overcome problems.
Dynamic Exploration and Learning Rates: Emotion used to regulate part of the architecture. Resulted in tighter convergences and better prediction accuracy. Slightly more failures.
30
Secondary Contributions
31
1. Reinforcement learning can be driven by intrinsically generated rewards based on the agent’s feeling
2. Reinforcement learning parameters can be influenced by the current emotional state, resulting in improved performance
3. Each appraisal contributes to the agent’s performance
4. The system scales to continuous time and space environments
5. Mood averages reward over time, allowing states with no reward-invoking stimulus to still have a reward associated with them
Future Work
32
Cognition
Scalability
Validation
Physiology
Action tendencies
Non-verbal communication
Basic drives
Integration with other architectural mechanisms
Learning (appraisal values, intend, etc.)
Human data
Believability
Sociocultural interactions
More appraisals (social, perceptual, etc.)
Physiological measures
Behavior
Functionality
Decision making
Backup Slides
33
Benefits of Soar
34
Parallel rule firing allows for:
  Parallel Encoding
  Parallel appraisal generation
  Parallel Decoding (theoretically)
Impasses provide:
  Architectural support for PEACTIDM-related subgoals
    Intend
    Comprehend (theoretically)
  Support for fast and extended inference, and transitioning from extended to fast (chunking)
    Intend in button task starts out extended and becomes fast
Reinforcement learning allows fast learning from emotion feedback
Future benefits:
  New modules may assist in appraisal generation
    Episodic/semantic memories, visual imagery, etc.
Architectural Requirements: Soar vs. ACT-R
35
Function | Architectural Requirements | Soar | ACT-R
Perceive | Support perception | Abstracted into environment interface | Part of perception module
Encode | Generate multiple structures in parallel | Parallel rule firings | Part of perception module
Attend | Single out one stimulus for additional processing | Operator selection | Rule firing
Comprehend | Support arbitrary processing ranging from short (e.g., verify) to long (e.g., understanding complex relationships) | Sets of operators, possibly in impasses | Sets of rules; declarative memory
Tasking | Support making persistent changes in memory | Operator application rules | Goal buffer
Intend | Support arbitrary processing ranging from short (e.g., push button) to long (e.g., complex, temporally extended action); support creating predictions | Sets of operators, impasses | Sets of rules
Decode | Expand compact command descriptions into complex motor sequences; possibly support parallel processing of multiple commands | Parallel rule firings; abstracted into environment interface | Part of motor module
Motor | Support motor execution | Abstracted into environment interface | Part of motor module
PEACTIDM and GOMS
36
In general, these are complementary techniques
GOMS
  Focused on HCI
  Focused on motor actions (e.g., keypresses)
  Less focus on cognitive aspects (more abstract)
PEACTIDM
  Focused on required cognitive functions
  Allows for a mapping with appraisals
Could implement PEACTIDM with GOMS, but would lack the proper labels that allow for the mapping
Relating Emotion to Intrinsically Motivated RL
37
Emotion intensity and valence used to:
  Generate intrinsic rewards
    Various appraisals contribute to the reward signal with varying success
    Frequent reward signals allow agent to learn faster, but can also introduce infinite reward cycles
      Task modeling helps address cycles
  Automatically adjust parameters
    Learning and exploration rates
    Helps reduce unnecessary exploration, bad learning
Button Task Timing: Before and After Learning
38
Before learning
Time (ms) | PEACTIDM Processing | Notes
0 | - | Light on
40 | Perceive | -
90 | Encode+Attend | -
140 | Comprehend | Verifies prediction
190 | Intend (to push button) | -
240 | Impasse | -
290 | Create prediction | -
340 | Push button | -
420 | Decode+Motor | Light off
After learning
Time (ms) | PEACTIDM Processing | Notes
0 | - | Light on
40 | Perceive | -
90 | Encode+Attend | -
140 | Comprehend | Verifies prediction
190 | Intend (Push button and Create prediction) | -
270 | Decode+Motor | Light off
Learning the Task Model
39
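[Figure: animated diagram of task-model learning. Labels: Perception/Encoding; Stimulus 1, Stimulus 2, Stimulus 3 (Stim1, Stim2, Stim3); Task Memory; Prediction (generic); Outcome Probability; Discrepancy; Surprise Factor; Intensity; Reward. One frame shows Outcome Probability 0.5, Discrepancy 0.0, Surprise Factor 0.5, Intensity Medium, Reward Medium; a later frame shows Intensity Lower, Reward Lower]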
Extending Soar with Emotion (Marinier & Laird 2007)
[Figure: Soar architecture diagram. Labels: Body; Perception; Action; Visual Imagery; Symbolic Long-Term Memories (Procedural, Semantic, Episodic); Short-Term Memory (Situation, Goals); Decision Procedure; Chunking; Reinforcement Learning; Semantic Learning; Episodic Learning; Feeling Generation]
40
Soar is a cognitive architecture
A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior
Cognitive architectures are general agent frameworks
Extending Soar with Emotion (Marinier & Laird 2007)
41
[Figure: Soar architecture diagram extended with emotion. Labels: Body; Perception; Action; Visual Imagery; Symbolic Long-Term Memories (Procedural, Semantic, Episodic); Short-Term Memory (Situation, Goals); Decision Procedure; Chunking; Reinforcement Learning; Semantic Learning; Episodic Learning; Feeling Generation; Appraisals; Emotion (.5,.7,0,-.4,.3,…); Mood (.7,-.2,.8,.3,.6,…); Feeling (.9,.6,.5,-.1,.8,…); Feelings; +/- Intensity; Knowledge; Architecture]
Appraisal Value Ranges
42
Suddenness [0,1]
Unpredictability [0,1]
Goal Relevance [0,1]
Discrepancy from Expectation [0,1]
Outcome Probability [0,1]
Intrinsic Pleasantness [-1,1]
Conduciveness [-1,1]
Control [-1,1]
Power [-1,1]
Causal Agent [self, other, nature]
Causal Motive [intentional, negligence, chance]
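The ranges above suggest a simple validation step for an appraisal frame. The sketch below assumes a frame is a plain dictionary keyed by snake_case dimension names; the key names and the `validate_frame` function are hypothetical and only mirror the ranges listed on the slide.

```python
UNIT = (0.0, 1.0)      # dimensions listed with range [0, 1]
SIGNED = (-1.0, 1.0)   # dimensions listed with range [-1, 1]

NUMERIC_RANGES = {
    "suddenness": UNIT,
    "unpredictability": UNIT,
    "goal_relevance": UNIT,
    "discrepancy": UNIT,
    "outcome_probability": UNIT,
    "intrinsic_pleasantness": SIGNED,
    "conduciveness": SIGNED,
    "control": SIGNED,
    "power": SIGNED,
}
CAUSAL_AGENTS = {"self", "other", "nature"}
CAUSAL_MOTIVES = {"intentional", "negligence", "chance"}


def validate_frame(frame: dict) -> None:
    """Raise ValueError if any appraisal in the frame is out of range."""
    for name, value in frame.items():
        if name == "causal_agent":
            ok = value in CAUSAL_AGENTS
        elif name == "causal_motive":
            ok = value in CAUSAL_MOTIVES
        elif name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            ok = low <= value <= high
        else:
            raise ValueError(f"unknown appraisal dimension: {name}")
        if not ok:
            raise ValueError(f"{name}={value!r} is out of range")
```

A frame may contain any subset of dimensions, which matches the experiments in which most appraisals were disabled.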
Computing Feeling from Emotion and Mood
43
Assumption: Appraisal dimensions are independent
Limited Range: Inputs and outputs are in [0,1] or [-1,1]
Distinguishability: Very different inputs should lead to very different outputs
Non-linear: Linearity would violate limited range and distinguishability
Example
44
[Figure: bar chart (x-axis -0.3 to 0.7) comparing Emotion (elation-joy), Mood (anxiety-worry), and Feeling (elation-joy) across appraisal dimensions: Suddenness, Unpredictability, Intrinsic-pleasantness, Goal-relevance, Causal-agent (self), Causal-agent (other), Causal-agent (nature), Causal-motive (intentional), Causal-motive (chance), Causal-motive (negligence), Outcome-probability, Discrepancy, Conduciveness, Control, Power]
Maze Tasks
45
no distractions
distractions
single subgoal
multiple subgoals
impossible
Time Course and Impact of Feelings
46
[Figure: Decision Cycles (0 to 450) for Success and Failure, and Completion Percentage (0 to 1), for each maze task: no distractions, distractions, single subgoal, multiple subgoals, impossible]
Feeling Dynamics Results
47
[Figure: Number of Feeling Instances (0 to 300), with legend entries anx-wor and neg-disp-disg, for each condition: no distractions, distractions, single subgoal failure, single subgoal success, multiple subgoals failure, multiple subgoals success, impossible. Additional labels: easy, medium (failure), medium (success), hard (failure), hard (success), very easy]
Computing Feeling Intensity
48
Motivation: Intensity gives a summary of how important (i.e., how good or bad) the situation is
Limited range: Should map onto [0,1]
No dominant appraisal: No single value should drown out all the others
  Can’t just multiply values, because if any are 0, then intensity is 0
Realization principle: Expected events should be less intense than unexpected events
\[ I = \left[(1 - OP)(1 - DE) + (OP \cdot DE)\right] \cdot \frac{S + UP + IP + GR + Cond + Ctrl + P}{num\_dims} \]
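The intensity formula can be sketched in Python as the product of a surprise factor and an average over the remaining appraisals. Reading the symbols as OP = Outcome Probability, DE = Discrepancy from Expectation, S = Suddenness, UP = Unpredictability, IP = Intrinsic Pleasantness, GR = Goal Relevance, Cond = Conduciveness, Ctrl = Control, and P = Power is inferred from the appraisal names used elsewhere in the talk. Two further assumptions are made here and are not shown on the slide: `num_dims` is taken to be the number of summed dimensions (7), and the signed dimensions contribute their absolute values so the result stays in [0, 1].

```python
def surprise_factor(op: float, de: float) -> float:
    """(1 - OP)(1 - DE) + OP * DE: zero when a certain prediction is met."""
    return (1.0 - op) * (1.0 - de) + op * de


def feeling_intensity(op, de, s, up, ip, gr, cond, ctrl, p):
    """Surprise factor times the mean of the remaining appraisals.

    Assumption: the signed dimensions (IP, Cond, Ctrl, P) enter via
    their absolute values so that the result stays in [0, 1].
    """
    dims = [s, up, abs(ip), gr, abs(cond), abs(ctrl), abs(p)]
    return surprise_factor(op, de) * sum(dims) / len(dims)
```

Averaging rather than multiplying the dimensions reflects the "no dominant appraisal" criterion: a single zero-valued appraisal lowers the intensity but does not force it to zero.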
Example
49
[Figure: bar chart (x-axis -0.3 to 0.7) comparing Emotion (elation-joy), Mood (anxiety-worry), and Feeling (elation-joy) across appraisal dimensions: Suddenness, Unpredictability, Intrinsic-pleasantness, Goal-relevance, Causal-agent (self), Causal-agent (other), Causal-agent (nature), Causal-motive (intentional), Causal-motive (chance), Causal-motive (negligence), Outcome-probability, Discrepancy, Conduciveness, Control, Power; with a Feeling Intensity scale from -0.15 to 0.15]
Learning task
50
Start
Goal
Optimal Subtasks
Learning Results
51
[Figure: Median Processing Cycles (0 to 12000) per Episode (1 to 15); series: Standard RL, Feeling=Emotion, Feeling=Emotion+Mood]
Circumplex models
Emotions can be described in terms of intensity and valence, as in a circumplex model:
52
[Figure: circumplex diagram with axes running from NEGATIVE VALENCE to POSITIVE VALENCE and from HIGH INTENSITY to LOW INTENSITY. Emotion terms: upset, stressed, nervous, tense, sad, depressed, lethargic, fatigued, alert, excited, elated, happy, contented, serene, relaxed, calm]
Adapted from Feldman Barrett & Russell (1998)
Full Knowledge Goal Relevance Results
53
[Figure: Decision Cycles per episode (Episodes 1 to 15; y-axis 0 to 10000); series: 3rd, median, 1st]
[Figure: Total Reward per episode (Episodes 1 to 15; y-axis -20 to 120); series: 3rd, median, 1st]
[Figure: Fraction Correct Predictions per episode (Episodes 1 to 15; y-axis 0 to 1); series: 3rd, median, 1st]
Related Work
54
System | Appraisal Theory | Emotion Type | Mood/Feeling | Incremental Appraisals | Appraisals required
EMA | Mixture | Categorical, Single | Mood only | Yes | Coping only
MAMID | Mixture | Categorical, Multiple | No | No | No
OCC/Em | Mixture | Categorical, Multiple | Mood only | No | No
Kismet | Circumplex | Categorical, Single | No | No | No
Our system | Scherer | Continuous, Single | Yes | Yes | Yes