
ACT-R Workshop July 2012

Modeling transfer of learning in games of strategic interaction

Ion Juvina & Christian Lebiere

Department of Psychology

Carnegie Mellon University

Outline

Background
Experiment
Cognitive model
Work in progress
Discussion


Transfer of learning

Alfred Binet (1899), formal discipline: exercise of mental faculties -> generalization
Thorndike (1903), identical elements theory: transfer of learning occurs only when identical elements of behavior are carried over from one task to another
Singley & Anderson (1989): surface vs. deep similarities; common “cognitive units”


Transfer in strategic interaction

Bipartisan cooperation in Congress: golf -> bipartisanship?

Similarity? What is transferred?


Prisoner’s Dilemma (PD)


Chicken game (CG)


PD & CG payoff matrices

PD        A          B
A       -1, -1    10, -10
B      -10, 10      1, 1

CG        A          B
A     -10, -10     10, -1
B      -1, 10       1, 1
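For later reference, the two matrices are easy to write down in code. A minimal sketch, assuming moves are labeled 'A' (competitive) and 'B' (cooperative); the dictionary layout is our choice, not from the talk:

```python
# Payoffs keyed by (own move, opponent move); values are
# (own payoff, opponent payoff).
PD = {('A', 'A'): (-1, -1),   ('A', 'B'): (10, -10),
      ('B', 'A'): (-10, 10),  ('B', 'B'): (1, 1)}

CG = {('A', 'A'): (-10, -10), ('A', 'B'): (10, -1),
      ('B', 'A'): (-1, 10),   ('B', 'B'): (1, 1)}
```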


Similarities between PD & CG

Surface (near transfer):
2x2 games
2 symmetric and 2 asymmetric outcomes
The [1,1] outcome is identical

Deep (far transfer):
Mixed motive
Non-zero sum
Mutual cooperation is superior to competition in the long term, though unstable (risky)


Differences between PD & CG

Different equilibria:
Symmetric in PD: [-1,-1]
Asymmetric in CG: [-1,10] and [10,-1]

Different strategies to maximize joint payoff (Pareto-efficient outcome):
[1,1] in PD
Alternation of [-1,10] and [10,-1] in CG
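The arithmetic behind that last contrast, as a quick check:

```python
# Per-player average payoff per round under each long-run strategy.
pd_cooperate = 1              # mutual [1,1] every round
pd_alternate = (10 - 10) / 2  # alternating [10,-10] / [-10,10] -> 0.0
cg_cooperate = 1              # mutual [1,1] every round
cg_alternate = (10 - 1) / 2   # alternating [10,-1] / [-1,10] -> 4.5
# Cooperation wins in PD; alternation wins in CG.
```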


Questions / hypotheses

Similarities: identical elements? Common cognitive units?
Transfer of learning: is there any transfer? Only in one direction, from low to high entropy (Bednar, Chen, Xiao Liu, & Page, in press)? An identical element would predict transfer in both directions.
Mechanism of transfer: reciprocal trust mitigates the risk associated with the long-term solution (Hardin, 2002)


Participants and design

480 participants (CMU students), forming 240 pairs
2 within-subjects games: PD & CG
4 between-subjects information conditions: No-info (60 pairs), Min-info (60 pairs), Mid-info (60 pairs), Max-info (60 pairs)
2 between-subjects order conditions within each information condition: PD-CG (30 pairs), CG-PD (30 pairs)
200 unnumbered rounds for each game


Typical outcomes


Pareto-optimal equilibria


[1,1] increases with info


Alternation increases with info


PD – CG sequence


CG – PD sequence



PD before and after


CG before and after


Transfer from PD to CG

Increased [1,1] (surface transfer)
Increased alternation (deep transfer)


Transfer from CG to PD

Increased [1,1] (surface + deep transfer)


Divergent effects

[Diagram: in the PD -> CG sequence, surface transfer carries the [1,1] outcome over, while deep transfer promotes alternation of [10,-1] and [-1,10]; the two pull toward different CG outcomes.]


Convergent effects

[Diagram: in the CG -> PD sequence, surface transfer from [1,1] and deep transfer from alternation of [10,-1] / [-1,10] both point toward the [1,1] outcome in PD.]


Reciprocation by info


Payoff by info in PD and CG


Summary results

Mutual cooperation increases with awareness of interdependence (info)
Transfer of learning: better performance “after” than “before”
Combined effects of surface and deep similarities:
CG -> PD: surface similarity facilitates transfer
PD -> CG: surface similarity interferes with transfer
Transfer occurs in both directions
Mechanism of generalization: reciprocal trust?


Cognitive model

Awareness of interdependence: opponent modeling
Generality: utility learning (reinforcement learning)
Transfer of learning: surface transfer and deep transfer


Opponent modeling

Instance-based learning: dynamic representation of the opponent
Sequence learning: prediction of the opponent’s next move
An instance is a snapshot of the current situation: the previous moves plus the opponent’s current move
Contextualized expectations
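A minimal sketch of this instance-based scheme, assuming two moves 'A'/'B' and a recency-weighted count in place of ACT-R’s activation-based retrieval (class and method names are ours):

```python
class OpponentModel:
    """Instances pair a context (both players' previous moves)
    with the opponent's move observed in that context."""

    def __init__(self):
        self.instances = []   # chronological list of (context, opponent_move)
        self.context = None   # most recent (own_move, opponent_move) pair

    def predict(self):
        """Predict the opponent's next move from instances whose context
        matches the current one, weighting recent instances more heavily
        (a crude stand-in for recency/frequency-based activation)."""
        scores = {'A': 0.0, 'B': 0.0}
        for age, (ctx, move) in enumerate(reversed(self.instances)):
            if ctx == self.context:
                scores[move] += 1.0 / (1 + age)
        return max(scores, key=scores.get)

    def observe(self, own_move, opp_move):
        """Record the opponent's move in the current context, then update it."""
        self.instances.append((self.context, opp_move))
        self.context = (own_move, opp_move)
```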


Utility learning

Reinforcement learning
Strategy: what move to make given the expected move of the opponent and the context (previous moves)
Reward functions:
Own payoff - opponent’s payoff
Opponent’s payoff
Joint payoff - opponent’s previous payoff
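The three reward functions are simple arithmetic on the round’s payoffs; a sketch with hypothetical signatures:

```python
def reward_competitive(own, opp):
    return own - opp                 # maximize payoff relative to the opponent

def reward_invest(own, opp):
    return opp                       # value the opponent's gain (build trust)

def reward_cooperative(own, opp, opp_prev):
    return (own + opp) - opp_prev    # joint payoff minus opponent's previous payoff
```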


Surface transfer

Declarative sub-symbolic learning: retrieval of instances guided by recency and frequency
Strategy learning: a learned strategy continues to be used for a while until it is unlearned
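The recency-and-frequency guidance is ACT-R’s standard base-level activation (general ACT-R, not specific to this model): an instance retrieved n times, t_j time units ago, has base-level activation

```latex
B_i = \ln\!\Big(\sum_{j=1}^{n} t_j^{-d}\Big)
```

with decay d (conventionally 0.5). Instances practiced often and recently dominate retrieval, which is why a strategy learned in the first game keeps being retrieved, and hence used, early in the second game.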


Deep transfer

Trust learning / trust dynamics
Trust accumulator:
Increases when the opponent makes cooperative (risky) moves
Decreases when the opponent makes competitive moves
Trust-invest accumulator:
Increases with mutually destructive outcomes
Decreases with unreciprocated cooperation (risk taking)
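A minimal sketch of the two accumulators, assuming unit-sized updates (the talk specifies only the directions of change):

```python
class TrustAccumulators:
    def __init__(self):
        self.trust = 0.0    # how trustworthy the opponent has proven
        self.invest = 0.0   # willingness to invest in building trust

    def update(self, own_move, opp_move):
        # Trust tracks the opponent's behavior: cooperative (risky) moves
        # raise it, competitive moves lower it.
        self.trust += 1.0 if opp_move == 'B' else -1.0
        # Trust-investment rises after a mutually destructive outcome
        # and falls when own cooperation goes unreciprocated.
        if own_move == 'A' and opp_move == 'A':
            self.invest += 1.0
        elif own_move == 'B' and opp_move == 'A':
            self.invest -= 1.0
```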


Meta strategy

Determines which reward function to use:
Trust accumulator <= 0: reward = own payoff - opponent’s payoff
Trust-invest accumulator > 0: reward = opponent’s payoff
Trust accumulator > 0: reward = joint payoff - opponent’s previous payoff
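Combining the accumulators with the reward functions sketched earlier, the meta-strategy reduces to a three-way branch (the precedence among the conditions is our assumption):

```python
def select_reward(acc, own, opp, opp_prev):
    """Pick this round's reward signal based on the trust state."""
    if acc.trust > 0:                            # trust established
        return reward_cooperative(own, opp, opp_prev)
    if acc.invest > 0:                           # no trust yet, but worth building
        return reward_invest(own, opp)
    return reward_competitive(own, opp)          # trust <= 0: protect own payoff
```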


Model diagram

[Diagram: the environment feeds the opponent’s move and the reward into ACT-R. Declarative memory stores instances (current and previous moves); retrieving an instance yields a prediction of the opponent’s next move. Procedural memory holds rules that select the best response to the predicted move. The trust and trust-invest accumulators, implemented as an ACT-R extension, modulate the reward.]


PD-CG


CG-PD


PD-CG surface transfer


PD-CG deep transfer


CG – PD surface + deep transfer


Trust simulation


Summary model results

Awareness of interdependence: opponent modeling
Generality: utility learning
Transfer of learning:
Surface-level transfer: cognitive units
Deep-level transfer: trust


In progress

Expand the model to account for all information conditions
Develop a more ecologically valid paradigm (IPD^3)
Model “affective” processes in ACT-R


IPD^3


General discussion

Transfer of learning is possible
Deep similarities: interpersonal level
IPD^3:
To be used in behavioral experiments
A tool for learning strategic interaction skills


Acknowledgments

Coty Gonzalez, Jolie Martin, Hau-Yu Wong, Muniba Saleem

This research is supported by the Defense Threat Reduction Agency (DTRA), grant number HDTRA1-09-1-0053, to Cleotilde Gonzalez and Christian Lebiere.


Thank you for your attention! Questions?
