Learning and teaching in games: Statistical models of human play in experiments

Colin F. Camerer, Social Sciences, Caltech ([email protected])
Teck Ho, Berkeley (Haas Business School)
Kuan Chong, National Univ Singapore

How can bounded rationality be modelled in games?
Theory desiderata: precise, general, useful (game theory), and cognitively plausible, empirically disciplined (cog sci)
Three components:
- Cognitive hierarchy thinking model (one parameter, creates initial conditions)
- Learning model (EWA, fEWA)
- ‘Sophisticated teaching’ model (repeated games)

Shameless plug: Camerer, Behavioral Game Theory (Princeton, Feb ’03), or see website hss.caltech.edu/~camerer

Behavioral models use some game theory principles, and weaken other principles

Principle (rows) vs. model (columns: equilibrium, thinking, learning, teaching)
- concept of a game
- strategic thinking
- best response
- mutual consistency
- learning
- strategic foresight

(Typical) experimental economics methods
- Repeated matrix stage game (Markov w/ 1 state)
- Repeated with “one night stand” (“stranger”) rematching protocol and feedback (to allow learning without repeated-game reputation-building)
- Game is described abstractly; payoffs are public knowledge (e.g., read out loud)
- Subjects paid $ according to choices (~$12/hr)

Why this style?
- The basic question is whether S’s can “compute” equilibrium**; the design is not meant to be realistic
- Establish regularity across S’s and across different game structures
- Statistical fitting: parsimonious (1+ parameter) models that fit (in sample), predict (out of sample), and compute economic value

**Question now answered (no). It would be useful to move to low-information MAL designs.

Beauty contest game: pick numbers in [0,100]; closest to (2/3)*(average number) wins

Figure: Beauty contest results (Expansion, Financial Times, Spektrum). Relative frequencies of numbers chosen, with marks at 22, 33, 50 and 100; average 23.07.

“Beauty contest” game (Ho, Camerer, Weigelt Amer Ec Rev 98):
- Pick numbers x_i ∈ [0,100]
- Closest to (2/3)*(average number) wins $20
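A minimal sketch of the winning rule, assuming choices are real numbers in [0, 100]. How the $20 is split when several players are equally close is not stated here, so the hypothetical `beauty_contest` function simply returns every tied winner.

```python
def beauty_contest(choices: list[float], p: float = 2 / 3) -> tuple[float, list[int]]:
    """Return the target p * (average choice) and the indices of the winners.

    Winners are the players whose choice is closest to the target; ties are
    possible, so a list of indices is returned.
    """
    if not choices:
        raise ValueError("need at least one choice")
    if any(not 0 <= x <= 100 for x in choices):
        raise ValueError("choices must lie in [0, 100]")
    target = p * sum(choices) / len(choices)
    best = min(abs(x - target) for x in choices)
    winners = [i for i, x in enumerate(choices) if abs(x - target) == best]
    return target, winners


# Three players choose 0, 50 and 100: the average is 50, the target is about 33.3,
# and the player who chose 50 is closest.
target, winners = beauty_contest([0, 50, 100])
print(round(target, 2), winners)
```

Because the target is always a fraction p < 1 of the average, no choice above 100p can ever win, which is the first step of the iterated-dominance argument that leads to the equilibrium of 0.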

Figure: Predicted frequency of number choices (1 to 100).


Figure: Beauty contest results, frequency of choices in bins from 0 to 81-90; panels: portfolio managers, Econ PhDs, CEOs, Caltech students.

Table: Data and estimates of τ in pbc games (equilibrium = 0)

subjects/game      data mean   data std dev   steps of thinking (τ)
game theorists     19          21.8           3.7
Caltech            23          11.1           3.0
newspaper          23          20.2           3.0
portfolio mgrs     24          16.1           2.8
econ PhD class     27          18.7           2.3
Caltech g=3        22          25.7           1.8
high school        33          18.6           1.6
1/2 mean           27          19.9           1.5
70 yr olds         37          17.5           1.1
Germany            37          20.0           1.1
CEOs               38          18.8           1.0
game p=0.7         39          24.7           1.0
Caltech g=2        22          29.9           0.8
PCC g=3            48          29.0           0.1
game p=0.9         49          24.3           0.1
PCC g=2            54          29.2           0.0
mean                                          1.56
median                                        1.30
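The steps-of-thinking estimates come from the cognitive hierarchy model, in which thinking levels are Poisson-distributed with mean τ. A minimal sketch of how τ maps to a predicted mean choice in the p-beauty contest, under these assumptions: level-0 players randomize uniformly on [0, 100] (mean 50); a level-k player believes the others are levels 0 to k-1 in normalized Poisson proportions and chooses p times the average it expects; and groups are large enough to ignore a player's own effect on the average. `fit_tau` is a hypothetical moment-matching helper that picks τ so the predicted mean equals an observed mean; it is a simplification and not the estimation procedure behind the table, so it need not reproduce those τ values.

```python
import math


def poisson_weights(tau: float, max_level: int) -> list[float]:
    """Poisson frequencies f(k) = exp(-tau) * tau**k / k! for levels k = 0..max_level."""
    return [math.exp(-tau) * tau**k / math.factorial(k) for k in range(max_level + 1)]


def ch_choices(tau: float, p: float = 2 / 3, max_level: int = 20) -> list[float]:
    """Choice of each thinking level in a p-beauty contest on [0, 100].

    Level 0 randomizes uniformly (mean 50). Level k >= 1 believes the others are
    levels 0..k-1 in normalized Poisson proportions and picks p times the average
    it expects (its own effect on the average is ignored).
    """
    f = poisson_weights(tau, max_level)
    choices = [50.0]
    for k in range(1, max_level + 1):
        total = sum(f[:k])
        expected_avg = sum(fi * ci for fi, ci in zip(f[:k], choices)) / total
        choices.append(p * expected_avg)
    return choices


def ch_mean(tau: float, p: float = 2 / 3, max_level: int = 20) -> float:
    """Population mean choice, renormalizing the Poisson weights truncated at max_level."""
    f = poisson_weights(tau, max_level)
    c = ch_choices(tau, p, max_level)
    return sum(fi * ci for fi, ci in zip(f, c)) / sum(f)


def fit_tau(observed_mean: float, p: float = 2 / 3, lo: float = 0.0, hi: float = 10.0) -> float:
    """Hypothetical moment-matching fit: bisect on tau so ch_mean(tau) equals observed_mean.

    Relies on the predicted mean falling as tau rises; observed means at or above 50
    map to tau = 0 under these assumptions.
    """
    for _ in range(60):
        mid = (lo + hi) / 2
        if ch_mean(mid, p) > observed_mean:
            lo = mid  # prediction still too high: more thinking steps needed
        else:
            hi = mid
    return (lo + hi) / 2


print(round(fit_tau(23.0), 2))
```

The single parameter τ controls both how many players think at each level and what higher levels believe about everyone else, which is why one number can summarize a whole distribution of choices.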

Figure: Multi-round beauty contest, probability of choices (bins 0, 1-10, 11-20, ..., 91-100) by round (1 to 10); panels: predictions and results.

EWA learning: attraction A_i^j(t) for strategy j is updated by

$$A_i^j(t) = \frac{\phi A_i^j(t-1) + \pi_i[s_i(t), s_{-i}(t)]}{\phi(1-\kappa) + 1} \quad \text{(chosen } j\text{)}$$

$$A_i^j(t) = \frac{\phi A_i^j(t-1) + \delta\,\pi_i[s_i^j, s_{-i}(t)]}{\phi(1-\kappa) + 1} \quad \text{(unchosen } j\text{)}$$

Logit response (softmax):

$$P_i^j(t) = \frac{e^{\lambda A_i^j(t)}}{\sum_k e^{\lambda A_i^k(t)}}$$

Key parameters:
- δ: imagination (weight on foregone payoffs)
- φ: decay (forgetting) or change-detection
- κ: growth rate of attractions (κ=0 averages; κ=1 cumulations, “lock-in” after exploration)

“In nature a hybrid [species] is usually sterile, but in science the opposite is often true” -- Francis Crick ’88

Weighted fictitious play (δ=1, κ=0); simple choice reinforcement (δ=0)
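A minimal sketch of one EWA step followed by logit choice. It uses a common generalization with an experience weight N(t) = φ(1-κ)N(t-1) + 1, treated here as an assumption; setting N(t-1) = 1 gives exactly the two update rules above. The function names, the two-strategy payoffs and the parameter values in the usage lines are illustrative only.

```python
import math


def ewa_update(attractions, n_prev, chosen, foregone, phi, delta, kappa):
    """One EWA update for a single player.

    attractions: A^j(t-1) for each own strategy j
    n_prev:      experience weight N(t-1) (N(t-1) = 1 reproduces the slide's formulas)
    chosen:      index of the strategy actually played in period t
    foregone:    pi(s^j, s_-i(t)) for every j, i.e. what each own strategy
                 would have earned against the others' actual choices
    Returns the new attractions A^j(t) and N(t).
    """
    n_new = phi * (1 - kappa) * n_prev + 1
    new_attractions = []
    for j, (a_old, payoff) in enumerate(zip(attractions, foregone)):
        weight = 1.0 if j == chosen else delta  # delta = imagination weight
        new_attractions.append((phi * n_prev * a_old + weight * payoff) / n_new)
    return new_attractions, n_new


def logit_choice_probs(attractions, lam):
    """Softmax P^j = exp(lam * A^j) / sum_k exp(lam * A^k), shifted by the max for stability."""
    top = max(attractions)
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


# Two strategies: strategy 0 was played and earned 5; strategy 1 would have earned 3.
attractions, n = ewa_update([0.0, 0.0], 1.0, 0, [5.0, 3.0], phi=0.9, delta=0.5, kappa=0.5)
print(attractions, n, logit_choice_probs(attractions, lam=1.0))
```

The special cases fall out directly: with δ=1 and κ=0 every strategy is credited with its foregone payoff and attractions are averages (weighted fictitious play), while δ=0 credits only the chosen strategy (choice reinforcement).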

Studies comparing EWA and other learning models

Reference: type of game
- Amaldoss and Jain (Mgt Sci, in press): cooperate-to-compete games
- Cabrales, Nagel and Ermenter ('01): stag hunt “global games”
- Camerer and Anderson ('99, EcTheory): sender-receiver signaling
- Camerer and Ho ('99, Econometrica): median-action coordination; 4x4 mixed-equilibrium games; p-beauty contest
- Camerer, Ho and Wang ('99): normal form centipede
- Camerer, Hsia and Ho (in press): sealed bid mechanism
- Chen ('99): cost allocation
- Haruvy and Erev ('00): binary risky choice decisions
- Ho, Camerer and Chong ('01): “continental divide” coordination; price-matching; patent races; two-market entry games
- Hsia ('99): N-person call markets
- Morgan & Sefton (Games Ec Beh, '01): “unprofitable” games
- Rapoport and Amaldoss ('00 OBHDP, '01): alliances; patent races
- Stahl ('99): 5x5 matrix games
- Sutter et al ('01): p-beauty contest (groups, individuals)

Figure: 20 estimates of learning model parameters, plotted relative to the special cases Cournot, weighted fictitious play, fictitious play, average reinforcement, and cumulative reinforcement.

Functional EWA learning (“EWA Lite”)
Use functions of experience to create parameter values (only free parameter: λ)
φ_i(t) is a change detector:

$$\phi_i(t) = 1 - 0.5\left[\sum_k \left(s_{-i}^k(t) - \frac{\sum_{\tau=1}^{t} s_{-i}^k(\tau)}{t}\right)^2\right]$$

- Compares the average of past frequencies s_{-i}(1), s_{-i}(2), … with s_{-i}(t)
- Decays old experience (low φ) if change is detected
- φ = 1 when other players always repeat strategies
- φ falls after a “surprise”; it falls more if others have been highly variable and less if others have been consistent

$$\delta_i(t) = \phi_i(t)/W$$

(W = number of Nash strategies; this creates low δ in mixed games)

Questions:
- (now) Do functional values pick up differences across games? (Yes.)
- (later) Can function changes create sensible, rapid switching in stochastic games?
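A minimal sketch of the change detector and of δ = φ/W, assuming a single opponent, so that s^k(τ) is 1 if the opponent chose strategy k in period τ and 0 otherwise (with several opponents it would be the fraction choosing k). The function names are hypothetical.

```python
def change_detector(history: list[int], n_strategies: int) -> float:
    """phi_i(t) = 1 - 0.5 * sum_k (s^k(t) - (1/t) * sum_{tau<=t} s^k(tau))**2

    history: the opponent's strategy indices s_-i(1), ..., s_-i(t).
    The surprise term is at most 2, so phi stays in [0, 1].
    """
    t = len(history)
    latest = history[-1]
    surprise = 0.0
    for k in range(n_strategies):
        recent = 1.0 if latest == k else 0.0                      # s^k(t)
        cumulative = sum(1 for s in history if s == k) / t        # average of s^k(1..t)
        surprise += (recent - cumulative) ** 2
    return 1.0 - 0.5 * surprise


def imagination(phi: float, n_nash_strategies: int) -> float:
    """delta_i(t) = phi_i(t) / W, with W the number of Nash strategies."""
    return phi / n_nash_strategies


print(change_detector([0, 0, 0, 0], 2))  # opponent always repeats
print(change_detector([0, 0, 0, 1], 2))  # a switch in the latest period
```

Because φ and δ are computed from the observed history rather than estimated, only λ is left to fit, which is what makes the functional version easy to estimate across games.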

Example: Price matching with loyalty rewards (Capra, Goeree, Gomez, Holt AER ‘99)
- Players 1, 2 pick prices in [80,200] ¢
- Price is P = min(P1, P2)
- Low-price firm earns P+R
- High-price firm earns P-R
- What happens? (e.g., R=50)
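A minimal sketch of the payoff rule and its best responses, assuming integer prices in cents and, since the rule for equal prices is not stated here, that tied firms both earn P (an assumption). Under these assumptions undercutting a rival by one cent earns (q-1)+R, which beats matching (q) whenever R > 1, so iterated best responses unravel from 200 down to 80.

```python
def payoffs(p1: int, p2: int, r: int) -> tuple[int, int]:
    """Payoffs (firm 1, firm 2) in cents. P = min(p1, p2); the low-price firm
    earns P + R and the high-price firm earns P - R.
    Assumption (not stated on the slide): if prices tie, both firms earn P."""
    p = min(p1, p2)
    if p1 < p2:
        return p + r, p - r
    if p2 < p1:
        return p - r, p + r
    return p, p


def best_response(rival: int, r: int, lo: int = 80, hi: int = 200) -> int:
    """Firm 1's payoff-maximizing integer price against a fixed rival price."""
    return max(range(lo, hi + 1), key=lambda x: payoffs(x, rival, r)[0])


def best_response_path(start: int, r: int) -> list[int]:
    """Iterate best responses until a price is its own best response."""
    path = [start]
    while (nxt := best_response(path[-1], r)) != path[-1]:
        path.append(nxt)
    return path


path = best_response_path(200, r=50)
print(path[:3], path[-1], len(path))
```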

Figure: Price matching, probability of each price (80, 81-90, ..., 191-200) by period (1 to 10); panels: empirical frequency and thinking fEWA prediction.

Teaching in repeated (partner) games

Finitely-repeated trust game (Camerer & Weigelt Econometrica ‘88)

                        borrower action
                        repay       default
lender     loan         40, 60      -100, 150
           no loan      10, 10

Payoffs are (lender, borrower).

- 1 borrower plays against 8 lenders
- A fraction p(honest) of borrowers prefer to repay (controlled by the experimenter)
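A worked check of the payoff table, assuming a risk-neutral lender who believes the borrower repays with probability q. Lending is worth 40q - 100(1 - q), not lending is worth 10, so the lender lends only if q ≥ 110/140 = 11/14, roughly 0.79. Exact fractions avoid rounding; the helper names are hypothetical.

```python
from fractions import Fraction

# Payoffs (lender, borrower) from the trust-game table above.
PAYOFFS = {
    ("loan", "repay"): (40, 60),
    ("loan", "default"): (-100, 150),
    ("no loan", None): (10, 10),
}


def lender_expected_payoffs(q: Fraction) -> dict[str, Fraction]:
    """Lender's expected payoff from each action if the borrower repays with probability q."""
    loan = q * PAYOFFS[("loan", "repay")][0] + (1 - q) * PAYOFFS[("loan", "default")][0]
    return {"loan": loan, "no loan": Fraction(PAYOFFS[("no loan", None)][0])}


def repay_threshold() -> Fraction:
    """Repayment probability at which a risk-neutral lender is indifferent.

    Solves r*q + d*(1 - q) = n for q, i.e. q* = (n - d) / (r - d) = 110 / 140.
    """
    r = PAYOFFS[("loan", "repay")][0]
    d = PAYOFFS[("loan", "default")][0]
    n = PAYOFFS[("no loan", None)][0]
    return Fraction(n - d, r - d)


print(repay_threshold(), float(repay_threshold()))
```

The table also shows why reputation matters: in any single period a borrower who cares only about money earns 150 by defaulting versus 60 by repaying, so repayment by such a borrower has to be motivated by its effect on future lending.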

Empirical results (conditional frequencies of no loan and default)

Figure a: Empirical Frequency for No Loan (by period and sequence)
Figure b: Empirical Frequency for Default conditional on Loan (Dishonest Borrower)

Teaching in repeated trust games (Camerer, Ho, Chong J Ec Theory 02)
- Some borrowers (89%) know that lenders learn by fEWA
- Actions in t “teach” lenders what to expect in t+1
- The “peripheral vision” weight is .93

E.g., entering period 4 of sequence 17:

Seq.   period:  1       2        3       4         5 ... 8
16              Repay   Repay    Repay   Default   .....     look “peripherally” (peripheral-vision weight)
17              Repay   No loan  Repay                       look back

Teaching: Strategies have reputations
Bayesian-Nash equilibrium: Borrowers have reputations (types)

Heart of the model:
Attraction of sophisticated borrower strategy j after sequence k, before period t
- J_{t+1} is a possible sequence of choices by the borrower
- The first term is the expected (myopic) payoff from strategy j
- The second term is the sum of expected future payoffs (undiscounted), given the effect of j and the optimal planned future choices (J_{t+1})

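$$A^j_B(s,k,t) = \sum_{j'=\mathrm{Loan}}^{\mathrm{NoLoan}} P^{j'}_L(a,k,t+1)\cdot\pi_B(j,j') + \max_{J_{t+1}}\left\{\sum_{v=t+2}^{T}\ \sum_{j'=\mathrm{Loan}}^{\mathrm{NoLoan}} \hat{P}^{j'}_L(a,k,v \mid j_{v-1}\in J_{t+1})\cdot\pi_B(j_v\in J_{t+1},\, j')\right\}$$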
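The attraction above adds a myopic expected payoff to the best achievable stream of future payoffs, given how the current choice shifts lenders' future loan probabilities. A deliberately simplified sketch of that lookahead, written as backward induction over one sequence. Everything about the lender is a hypothetical stand-in, not the fEWA lenders of the model: a single lender whose belief that the borrower repays comes from pseudo-counts (8 repays, 1 default), who lends with a logit probability in the expected gain from lending (λ = 0.1), and who updates only when a loan is made. Peripheral vision and the mix of teaching and non-teaching borrowers are omitted.

```python
import math
from functools import lru_cache

# Trust-game payoffs from the table: (lender, borrower).
REPAY = (40, 60)
DEFAULT = (-100, 150)
NO_LOAN = (10, 10)

# Hypothetical lender model (assumptions for illustration, not estimates).
PRIOR_REPAY, PRIOR_DEFAULT = 8, 1
LAMBDA = 0.1
PERIODS = 8


def loan_prob(repays: int, defaults: int) -> float:
    """Logit probability that the hypothetical lender lends, given observed repays/defaults."""
    q = (repays + PRIOR_REPAY) / (repays + defaults + PRIOR_REPAY + PRIOR_DEFAULT)
    gain = q * REPAY[0] + (1 - q) * DEFAULT[0] - NO_LOAN[0]
    return 1.0 / (1.0 + math.exp(-LAMBDA * gain))


@lru_cache(maxsize=None)
def value(period: int, repays: int, defaults: int) -> float:
    """Borrower's maximal expected payoff from `period` to the end (undiscounted)."""
    if period == PERIODS:
        return 0.0
    p = loan_prob(repays, defaults)
    after_repay = REPAY[1] + value(period + 1, repays + 1, defaults)
    after_default = DEFAULT[1] + value(period + 1, repays, defaults + 1)
    after_no_loan = NO_LOAN[1] + value(period + 1, repays, defaults)
    return p * max(after_repay, after_default) + (1 - p) * after_no_loan


def best_action(period: int, repays: int, defaults: int) -> str:
    """Borrower's reply to a loan in `period`: myopic payoff plus optimal continuation."""
    repay = REPAY[1] + value(period + 1, repays + 1, defaults)
    default = DEFAULT[1] + value(period + 1, repays, defaults + 1)
    return "repay" if repay >= default else "default"


# Planned replies along a path where a loan is made every period.
repays = defaults = 0
plan = []
for t in range(PERIODS):
    action = best_action(t, repays, defaults)
    plan.append(action)
    if action == "repay":
        repays += 1
    else:
        defaults += 1
print(plan)
```

In the last period there is no future to teach, so the borrower always defaults; earlier, repaying can pay off because it raises the loan probability in every remaining period. Whether it does depends entirely on the assumed lender rule.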

Empirical results (top) and teaching model (bottom)

Figure a: Empirical Frequency for No Loan (by period and sequence)
Figure b: Empirical Frequency for Default conditional on Loan (Dishonest Borrower)
Figure c: Predicted Frequency for No Loan
Figure d: Predicted Frequency for Default conditional on Loan (Dishonest Borrower)

Conclusions

Learning (λ: response sensitivity)
- Hybrid fits & predicts well (20+ games)
- One-parameter fEWA fits well and is easy to estimate
- Well-suited to Markov games because φ means players can “relearn” if the new state is quite different?

Teaching (fraction of teachers)
- Retains strategic foresight in repeated games with partner matching
- Fits trust and entry deterrence better than softmax Bayesian-Nash (aka QRE)

Next?
- Field applications; explore low-information Markov domains…

Parametric EWA learning (E’metrica ‘99)
- free parameters δ, φ, κ, λ, N(0)

Functional EWA learning
- functions for parameters
- one parameter (λ)

Strategic teaching (JEcTheory ‘02)
- Reputation-building w/o “types”
- Two parameters (fraction of teachers, peripheral-vision weight)

Thinking steps (parameter τ)
