Bob Marinier Oral Defense University of Michigan, CSE June 17, 2008 A Computational Unification of Cognitive Control, Emotion, and Learning

Bob MarinierOral Defense

University of Michigan, CSEJune 17, 2008

A Computational Unification of Cognitive Control, Emotion, and Learning

IntroductionThe link between core cognitive functions

and emotion has not been fully exploredExisting computational models are largely

pragmaticWe integrate the PEACTIDM theory of

cognitive control and appraisal theories of emotion PEACTIDM supplies process, appraisal

theories supply dataWe use emotion-driven reinforcement

learning to demonstrate improved functionalityAutomatically generate rewards, set

parameters

2

Cognitive Control: PEACTIDM

Perceive Obtain raw perceptionEncode Create domain-independent

representationAttend Choose stimulus to processComprehend

Generate structures that relate stimulus to tasks and can be used to inform behavior

Task Perform task maintenanceIntend Choose an action, create

predictionDecode Decompose action into motor

commandsMotor Execute motor commands

3

PEACTIDM Cycle

Comprehend

Perceive

Encode

Attend

Intend

Decode

Motor

Raw Perceptual Information

Stimulus Relevance

Stimulus chosen for processing

Current Situation Assessment

Action

Motor Commands

Environmental Change

Prediction

What is thisinformation?

4

Appraisal Theories of EmotionA situation is evaluated along a number of appraisal

dimensions,many of which relate the situation to current goalsNovelty, goal relevance, goal conduciveness, expectedness,

causal agency, etc.Result of appraisals influences emotionEmotion can then be coped with (via internal or external

actions)

SituationGoals

Appraisals

Emotion

Coping

5

Appraisals to Emotions (Scherer 2001)

6

Joy Fear Anger

Suddenness High/medium High High

Unpredictability High High High

Intrinsic pleasantness Low

Goal/need relevance High High High

Cause: agent Other/nature

Other

Cause: motive Chance/intentional

Intentional

Outcome probability Very high High Very high

Discrepancy from expectation

High High

Conduciveness Very high Low Low

Control High

Power Very low HighWhy these dimensions?What is the functional purpose of emotion?

Unification of PEACTIDM and Appraisal Theories

7

Comprehend

Perceive

Encode

Attend

Intend

Decode

Motor

Raw Perceptual Information

Stimulus Relevance

Stimulus chosen for processing

Current Situation Assessment

Action

Motor Commands

Environmental Change

SuddennessUnpredictabilityGoal Relevance

Intrinsic Pleasantness

Causal Agent/MotiveDiscrepancy

ConducivenessControl/Power

Prediction

Outcome Probability

Example: Simple Choice Response Task

8

-1 -0.5 0 0.5 1

Em

oti

on I

n-

tensit

y w

ith

Vale

nce

Time

Appraisal FrameSuddenness 1

Goal Relevance 1

Conduciveness 1

Discrepancy 0

Outcome Probability

1

PEACTIDM in the Button Task

9 “Surprise Factor”

-1 -0.5 0 0.5 1

Em

oti

on I

n-

tensit

y w

ith

Vale

nce

Time

Appraisal FrameSuddenness 1

Goal Relevance 1

Conduciveness 1

Discrepancy 0

Outcome Probability

1

PEACTIDM in the Button Task

10

Conduciveness -1

Discrepancy 1

Summary of Evaluation

11

1. Cognitively generated emotions Emotions arise from appraisals

2. Fast primary emotions Some appraisals generated and activated early

3. Emotional experience Cognitive access to emotional state, but no

physiology

4. Body-mind interactions Emotions can influence behavior

5. Emotional behaviora. Model works and produces useful, purposeful behaviorb. Different environments lead to:

i. Different time coursesii. Different feeling profiles

c. Choices impact emotions and success

Primary Contributions

12

1. Appraisals are functionally required by cognition They specify the data used by certain steps in PEACTIDM

2. Appraisals provide a task-independent language for control knowledge

They influence choices such as Attend and Intend

3. PEACTIDM implies a partial ordering of appraisal generation

Data dependencies imply that some appraisals can’t be generated until after others

4. Circumplex models can be synthesized from appraisal models

Emotion intensity and valence can be derived from appraisals

5. Emotion intensity is largely determined by expectations “Surprise Factor” is determined by Outcome Probability and

Discrepancy from Expectation

6. Some appraisals may require an arbitrary amount of inference

Comprehend can theoretically require arbitrary processing

7. Internal and external stimuli are treated identically Tasking options can be Attended and Intended just like

external stimuli

Additional Exploration

13

Functionality: What is emotion good for?Emotion-driven reinforcement learning

Scale: Does it work in non-trivial domains?Continuous time/space environmentMore complex appraisal generation

Understanding: How do appraisals influence performance?Try subsets of appraisals

Intrinsically Motivated Reinforcement Learning(Sutton & Barto 1998; Singh et al. 2004)

14

Environment

Critic

Agent

Actions

StatesReward

s

External Environment

Internal Environment

Agent

Critic

Actions

StatesReward

s

Sensations

Appraisal

Process+/-

Emotion Intensity

Decisions

“Organism”

Reward = Intensity * Valence

Clean House Domain

15

Blocks

Agent Rooms

Storage Room

Gateways

Stimuli in the Environment

Gateway to 73 Gateway to 78

Gateway to 93

Current room

Block 1

Create subtask go to room 73



Create subtask clean current room

16

LearningIn this domain, the agent is only learning what

to Attend to (including Tasking)Not learning what action to take

Goal: What is the impact of various appraisals?Disabled most and developed a few

ConducivenessDiscrepancy from Expectation and Outcome ProbabilityGoal RelevanceIntrinsic Pleasantness

Method: SARSA, epsilon-greedy, fixed ER and LR50 trials, 15 episodes per trial

17

Conduciveness

18

Measures how good or bad a stimulus isInfluences emotion intensity and valence

Sufficient to generate a rewardValue based on “progress” and “path”

Progress: Is agent getting closer to goal over time?

Path: Will acting on stimulus get agent closer to goal?

Conduciveness

1 2 3 4 5 6 7 8 9 10 11 12 13 14 150

100020003000400050006000700080009000

10000

3rd median 1st

Episodes

Decis

ion C

ycle

s

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15-300

-200

-100

0

100

200

300

3rd median 1st

Episodes

Tota

l R

ew

ard

Total Failures Trial Failures Final Episode Failures

7.6% 24% 6%

19

Outcome Probability and Discrepancy from Expectation

20

Measures how likely a prediction is and how accurate the prediction is

Influences emotion intensity via “surprise factor” (unvalenced)

Predictions and Outcome Probability generated via learned task modelResults in non-stationary reward

Discrepancy generated via comparison to prediction

Added these appraisals on top of Conduciveness

Outcome Probability and Discrepancy from Expectation

1 2 3 4 5 6 7 8 9 10 11 12 13 14 150

0.10.20.30.40.50.60.70.80.9

1

3rd median 1st

Episodes

Fra

cti

on C

orr

ect

Pre

dic

tions


0% 0% 0%

21

Goal RelevanceMeasures how important a stimulus is for

the goalInfluences emotion intensity (unvalenced)Value based on “path” knowledge

Agent actually had too much path knowledge, so removed some

The value of Goal Relevance for some stimulus is used to “boost” the Q-value of the Attend operator for that stimulus

Added this appraisal on top of Conduciveness, Outcome Probability, and Discrepancy22

GR Knowledge Reduction Results

23

1 3 5 7 9 11 13 15 17 19 21 23 25 27 290

100020003000400050006000700080009000

10000

3rd median 1st

Episodes

Decis

ion C

ycle

s

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29-20-10

010203040506070

3rd median 1st

Episodes

Tota

l R

ew

ard

1 3 5 7 9 11 13 15 17 19 21 23 25 27 290

0.10.20.30.40.50.60.70.80.9

1

3rd median 1st

Episodes

Fra

cti

on C

orr

ect

Pre

dic

tions

Intrinsic PleasantnessMeasures how attracted the agent is to a

stimulus independent of the current goalInfluences emotion intensity and valenceMade blocks intrinsically pleasant

This is good because blocks need to be Attended to get cleaned up

This is bad because agent may be distracted by blocks that have already been cleaned up

Replaced Goal Relevance with this appraisal

24

1 2 3 4 5 6 7 8 9 10 11 12 13 14 150

0.10.20.30.40.50.60.70.80.9

1

3rd median 1st

Episodes

Fra

cti

on C

orr

ect

Pre

dic

tions

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15-40

-30

-20

-10

0

10

20

30

40

3rd median 1st

Episodes

Tota

l R

ew

ard

1 2 3 4 5 6 7 8 9 10 11 12 13 14 150

2000400060008000

100001200014000160001800020000

3rd median 1st

Episodes

Decis

ion C

ycle

s

Intrinsic Pleasantness Results

25

Dynamic Exploration Rate

26

Dynamically adjust exploration rate based on current emotionIf Valence < 0, then things could probably be

betterER = |Intensity * Valence|

If Valence > 0, then things are okER = 0

Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only

Dynamic Exploration Rate


9.2% 38% 8.0%

27

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15-140

-120

-100

-80

-60

-40

-20

0

20

40

3rd median 1st

Episodes

Tota

l R

ew

ard

1 2 3 4 5 6 7 8 9 10 11 12 13 14 150

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

3rd median 1st

Episodes

Fra

cti

on C

orr

ect

Pre

dic

tions

Dynamic Learning Rate

28

Dynamically adjust learning rate based on current emotionIf reward magnitude is large, then there may be

something to learnLR = |Intensity*Valence|

Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only, Dynamic Exploration Rate enabledTotal Failures Trial Failures Final Episode Failures

0.5% 8.0% 0.0%

Dynamic Exploration and Learning Rates

29

Dynamically adjust exploration and learning rates based on current emotionIf Valence < 0, then things could probably be

betterER = |Intensity * Valence|

If Valence > 0, then things are okER = 0

If reward magnitude is large, then there may be something to learnLR = |Intensity*Valence|

Experiment conducted with Conduciveness, Outcome Probability, and Discrepancy only

Results: Tighter convergences, better prediction accuracy, small number of failures

Learning SummaryConduciveness Foundation to learning. Agent learns to perform the task better over time.

Outcome Probability,

Discrepancy from

Expectation

Introduced learned task model for generating predictions as basis for

generating values for these appraisals. Agent learns to predict better over

time. Also results in much improved failure rates.

Goal Relevance Used to “boost” Q-values of proposed Attend operators. Agent does

extremely well (except for failures), to the point where it almost isn’t

learning, raising questions about the value of other appraisals. Knowledge

about Goal Relevance was reduced, leading to more learning.

Intrinsic Pleasantness Used to provide a task-independent bias on valence and intensity. Results

are mixed, as expected, but agent generally learns to overcome problems.

Dynamic Exploration

and Learning Rates

Emotion used to regulate part of the architecture. Resulted in tighter

convergences and prediction accuracy. Slightly more failures.

30

Secondary Contributions

31

1. Reinforcement learning can be driven by intrinsically generated rewards based on the agent’s feeling

2. Reinforcement learning parameters can be influenced by the current emotional state, resulting in improved performance

3. Each appraisal contributes to the agent’s performance

4. The system scales to continuous time and space environments

5. Mood averages reward over time, allowing states with no reward-invoking stimulus to still have a reward associated with them

Future Work

32

Cognition

Scalability

Validation

Physiology

Action tendencies

Non-verbal communication

Basic drives

Integration with other

architectural mechanisms

Learning (appraisal

values, intend, etc.)

Human data

Believability

Sociocultural interactionsMore appraisals

(social, perceptual, etc.)

Physiological measures

Behavior

FunctionalityDecision making

Backup Slides

33

Benefits of Soar

34

Parallel rule firing allows for:Parallel EncodingParallel appraisal generationParallel Decoding (theoretically)

Impasses provide:Architectural support for PEACTIDM-related subgoals

Intend Comprehend (theoretically)

Support for fast and extended inference, and transitioning from extended to fast (chunking)Intend in button task starts out extended and becomes fast

Reinforcement learning allows fast learning from emotion feedback

Future benefits:New modules may assist in appraisal generation

Episodic/semantic memories, visual imagery, etc.

Architectural Requirements:Soar vs. ACT-R

35

Function Architectural Requirements Soar ACT-RPerceive Support perception Abstracted into

environment interface

Part of perception module

Encode Generate multiple structures in parallel

Parallel rule firings

Part of perception module

Attend Single out one stimulus for additional processing

Operator selection

Rule firing

Comprehend

Support arbitrary processing ranging from short (e.g., verify) to long (e.g., understanding complex relationships)

Sets of operators, possibly in impasses

Sets of rules; declarative memory

Tasking Support making persistent changes in memory

Operator application rules

Goal buffer

Intend Support arbitrary processing ranging from short (e.g., push button) to long (e.g., complex, temporally extended action); support creating predictions

Sets of operators, impasses

Sets of rules

Decode Expand compact command descriptions into complex motor sequences; possibly support parallel processing of multiple commands

Parallel rule firings; abstracted into environment interface

Part of motor module

Motor Support motor execution Abstracted into environment interface

Part of motor module

PEACTIDM and GOMS

36

In general, these are complementary techniques

GOMSFocused on HCIFocused on motor actions (e.g. keypresses)Less focus on cognitive aspects (more abstract)

PEACTIDMFocused on required cognitive functionsAllows for a mapping with appraisals

Could implement PEACTIDM with GOMS, but would lack the proper labels that allow for the mapping

Relating Emotion to Intrinsically Motivated RL

37

Emotion intensity and valence used to:Generate intrinsic rewards

Various appraisals contribute to the reward signal with varying success

Frequent reward signals allow agent to learn faster, but can also introduce infinite reward cyclesTask modeling helps address cycles

Automatically adjust parametersLearning and exploration ratesHelps reduce unnecessary exploration, bad learning

Button Task Timing:Before and After Learning

38

Time (ms)

PEACTIDM Processing

Notes

0 Light on40 Perceive90 Encode+Atten

d140 Comprehend Verifies

prediction190 Intend (to push

button)240 Impasse290 Create

prediction340 Push button420 Decode+Motor Light off

Time (ms)

PEACTIDM Processing

Notes

0 Light on40 Perceive90 Encode+Atten

d140 Comprehend Verifies prediction190 Intend (Push button and Create

prediction)270 Decode+Moto

rLight off

Learning the Task Model

39

Stim1

Stimulus 1Stimulus 2Stimulus 3

Stim2

Stim3

0.2

0.2

0.15

1.0

0.0

Task Memory

Prediction (generic)

Outcome Probability0.5

Discrepancy 0.0

Surprise FactorIntensityReward

0.5MediumMedium

Stim2

1.00.5

Stim3

0.57

0.43LowerLower

0.1

0.40.00.00.0

Perception/Encoding

Body

Symbolic Long-Term Memories

Procedural

Short-Term Memory

Situation, Goals

Decision Procedure

Chunking

Reinforcement

Learning

Semantic

SemanticLearning

Episodic

EpisodicLearning

Perception ActionVisual

Imagery

Feelin

g

Gen

era

tion

Extending Soar with Emotion(Marinier & Laird 2007)

40

Soar is a cognitive architecture

A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior

Cognitive architectures are general agent frameworks

Feelin

g

Genera

tion

Reinforcement

Learning

Emotion.5,.7,0,-.4,.3,

…

Extending Soar with Emotion(Marinier & Laird 2007)

41

Body

Decision Procedure

Perception Action

Appraisals

Feelings Short-Term Memory

Situation, Goals

Mood.7,-.2,.8,.3,.6,

…

Feelings

Knowledge

Architecture

Symbolic Long-Term Memories

Procedural

Chunking

Semantic

SemanticLearning

Episodic

EpisodicLearning

+/- In

tensity

Feeling.9,.6,.5,-.1,.8,

…

VisualImagery

Appraisal Value Ranges

42

Suddenness [0,1] Unpredictability [0,1]

Goal Relevance [0,1]Discrepancy from Expectation [0,1]

Intrinsic Pleasantness [-1,1]

Outcome Probability [0,1]

Conduciveness [-1,1]Causal Agent [self, other, nature]

Control [-1,1] Causal Motive [intentional, negligence, chance]Power [-1,1]

Computing Feeling from Emotion and Mood

43

Assumption: Appraisal dimensions are independentLimited Range: Inputs and outputs are in [0,1] or [-1,1]Distinguishability: Very different inputs should lead to

very different outputsNon-linear: Linearity would violate limited range and

distinguishability

Example

44

Power

Control

Conduciveness

Discrepancy

Outcome-probability

Causal-motive (negligence)

Causal-motive (chance)

Causal-motive (intentional)

Causal-agent (nature)

Causal-agent (other)

Causal-agent (self)

Goal-relevance

Intrinsic-pleasantness

Unpredictability

Suddenness

-0.3 0.7

Emotion (elation-joy)Mood (anxiety-worry)Feeling (elation-joy)

Maze Tasks

45

no distractions

distractions

single subgoal

multiple subgoalsimpossible

Time Course and Impact of Feelings

46

no dis-tractions

distrac-tions

single subgoal

multiple subgoals

impos-sible

0

50

100

150

200

250

300

350

400

450

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Success Failure Completion Percentage

Deci

sion

Cycl

es

Com

ple

tion

Perc

en

tag

e

Feeling Dynamics Results

47

no dis-tractions

distrac-tions

single subgoal failure

single subgoal success

multiple subgoals

failure

multiple subgoals success

impos-sible

0

50

100

150

200

250

300

anx-wor neg-disp-disg

Nu

mb

er

of

Feelin

g I

nsta

nces

easy medium

(failure)

medium

(suc-cess)

hard(fail-ure)

hard(suc-cess)

very easy

Computing Feeling Intensity

48

Motivation: Intensity gives a summary of how important (i.e., how good or bad) the situation is

Limited range: Should map onto [0,1]No dominant appraisal: No single value should drown out all

the othersCan’t just multiply values, because if any are 0, then intensity is 0

Realization principle: Expected events should be less intense than unexpected events

𝐼= [ሺ1−𝑂𝑃ሻሺ1−𝐷𝐸ሻ+ሺ𝑂𝑃∙𝐷𝐸ሻ] ∙𝑆+𝑈𝑃+𝐼𝑃+𝐺𝑅+𝐶𝑜𝑛𝑑+𝐶𝑡𝑟𝑙+𝑃𝑛𝑢𝑚_𝑑𝑖𝑚𝑠

Example

49

PowerControl

ConducivenessDiscrepancy

Outcome-probabilityCausal-motive (negligence)

Causal-motive (chance)Causal-motive (intentional)

Causal-agent (nature)Causal-agent (other)

Causal-agent (self)Goal-relevance

Intrinsic-pleasantnessUnpredictability

Suddenness

-0.3 0.7

Emotion (elation-joy)Mood (anxiety-worry)Feeling (elation-joy)

Feeling Intensity

-0.15

-0.1

-0.0500000000000002

-2.22044604925031E-16

0.0499999999999998

0.0999999999999998

0.15

Learning task

50

Start

Goal

Optimal Subtasks

Learning Results

51

1 2 3 4 5 6 7 8 9 10 11 12 13 14 150

2000

4000

6000

8000

10000

12000

Standard RL Feeling=EmotionFeeling=Emotion+Mood

Episode

Med

ian

Pro

cessin

g C

ycle

s

Circumplex modelsEmotions can be described in terms of intensity

and valence, as in a circumplex model:

52

NEGATIVE VALENCE

POSITIVE VALENCE

HIGH INTENSITY

LOW INTENSITY

upset

stressed

nervous

tense

sad

depressed

lethargic

fatigued

alert

excited

elated

happy

contented

serene

relaxed

calm

Adapted from Feldman Barrett & Russell (1998)

Full Knowledge Goal Relevance Results

53

1 2 3 4 5 6 7 8 9 10 11 12 13 14 150

100020003000400050006000700080009000

10000

3rd median 1st

Episodes

Decis

ion C

ycle

s

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15-20

0

20

40

60

80

100

120

3rd median 1st

Episodes

Tota

l R

ew

ard

1 2 3 4 5 6 7 8 9 10 11 12 13 14 150

0.10.20.30.40.50.60.70.80.9

1

3rd median 1st

Episodes

Fra

cti

on C

orr

ect

Pre

dic

tions

Related Work

54

System Appraisal Theory

Emotion Type Mood/Feeling

Incremental Appraisals

Appraisals required

EMA Mixture Categorical Single

Mood only

Yes Coping only

MAMID Mixture Categorical Multiple

No No No

OCC/Em Mixture Categorical Multiple

Mood only

No No

Kismet Circumplex

Categorical Single

No No No

Our system

Scherer Continuous Single

Yes Yes Yes

Documents

Bob Marinier Oral Defense University of Michigan, CSE June 17, 2008 A Computational Unification of Cognitive Control, Emotion, and Learning