Learning and teaching in games: Statistical models of human play in experiments
Colin F. Camerer, Social Sciences Caltech ([email protected])
Teck Ho, Berkeley (Haas Business School)
Kuan Chong, National Univ Singapore
How can bounded rationality be modelled in games?
Theory desiderata: precise, general, useful (game theory), and cognitively plausible, empirically disciplined (cog sci)
Three components:
– Cognitive hierarchy thinking model (one parameter, creates initial conditions)
– Learning model (EWA, fEWA)
– Sophisticated teaching model (repeated games)
Shameless plug: Camerer, Behavioral Game Theory (Princeton, Feb ’03) or see website hss.caltech.edu/~camerer
Behavioral models use some game theory principles, and weaken other principles
Principles (rows) vs. models (columns: equilibrium, thinking, learning, teaching):
concept of a game
strategic thinking
best response
mutual consistency
learning
strategic foresight
(Typical) experimental economics methods
Repeated matrix stage game (Markov w/ 1 state)
Repeated with “one night stand” (“stranger”) rematching protocol & feedback (to allow learning without repeated-game reputation-building)
Game is described abstractly, payoffs are public knowledge (e.g., read out loud)
Subjects paid $ according to choices (~$12/hr)
Why this style? Basic question is whether S’s can “compute” equilibrium**, not meant to be realistic
Establish regularity across S’s, different game structures
Statistical fitting: parsimonious (1+ parameters) models, fit (in sample) & predict (out of sample) & compute economic value
**Question now answered (No): would be useful to move to low-information MAL designs
Beauty contest game: pick numbers in [0,100]; the number closest to (2/3)*(average number) wins
[Figure: Beauty contest results (Expansion, Financial Times, Spektrum): relative frequencies of numbers chosen; average 23.07]
“Beauty contest” game (Ho, Camerer, Weigelt Amer Ec Rev 98):
Pick numbers x_i ∈ [0,100]
Closest to (2/3)*(average number) wins $20
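A minimal Python sketch of the winning rule above. The function name `beauty_contest_winners` is hypothetical, and because the slide does not say how the $20 is split on a tie, the sketch simply returns every tied player.

```python
import math
from statistics import fmean


def beauty_contest_winners(choices: list[float], p: float = 2 / 3) -> list[int]:
    """Return the indices of the players closest to p * (average choice).

    Hypothetical helper: every choice must lie in [0, 100]; all tied
    players are returned since the tie rule is not specified.
    """
    if not choices:
        raise ValueError("need at least one choice")
    if any(not 0 <= x <= 100 for x in choices):
        raise ValueError("choices must lie in [0, 100]")
    target = p * fmean(choices)
    best = min(abs(x - target) for x in choices)
    return [i for i, x in enumerate(choices)
            if math.isclose(abs(x - target), best, abs_tol=1e-9)]


# Four players: the average is 26.25, so the target is about 17.5.
print(beauty_contest_winners([50, 33, 22, 0]))  # prints [2]
```

Since every player wants to be below the average, repeatedly best-responding (2/3 of 50, then 2/3 of that, and so on) pushes choices toward the equilibrium of 0.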
[Figure: Predicted frequency of number choices (1 to 97)]
[Figure: Beauty contest results: frequency of choices by number range for portfolio managers, Econ PhDs, CEOs, and Caltech students]
Table: Data and estimates of τ in p-beauty contest (pbc) games (equilibrium = 0)

subjects/game       data mean   data std dev   steps of thinking (τ)
game theorists         19          21.8             3.7
Caltech                23          11.1             3.0
newspaper              23          20.2             3.0
portfolio mgrs         24          16.1             2.8
econ PhD class         27          18.7             2.3
Caltech g=3            22          25.7             1.8
high school            33          18.6             1.6
1/2 mean               27          19.9             1.5
70 yr olds             37          17.5             1.1
Germany                37          20.0             1.1
CEOs                   38          18.8             1.0
game p=0.7             39          24.7             1.0
Caltech g=2            22          29.9             0.8
PCC g=3                48          29.0             0.1
game p=0.9             49          24.3             0.1
PCC g=2                54          29.2             0.0

mean τ: 1.56
median τ: 1.30
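The τ column comes from the one-parameter cognitive hierarchy (thinking steps) model. The Python sketch below shows how a τ value turns into predicted choices; it is not the estimation procedure behind the table, and its simplifying assumptions are ours: thinking steps are Poisson(τ), step-0 players pick uniformly on [0, 100] (mean 50), and a step-k player believes the others are spread over steps 0 to k-1 in proportion to the Poisson frequencies and picks p times the average they expect, ignoring their own effect on that average. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, tau: float) -> float:
    """Probability that a player does exactly k thinking steps."""
    return math.exp(-tau) * tau**k / math.factorial(k)


def ch_beauty_contest(tau: float, p: float = 2 / 3,
                      max_steps: int = 30) -> tuple[list[float], float]:
    """Cognitive hierarchy sketch for the p-beauty contest.

    Returns (mean choice of each thinking step, predicted population mean).
    Assumptions: steps ~ Poisson(tau), truncated at max_steps; step 0
    averages 50; step k best-responds to a normalized Poisson mix of
    steps 0..k-1 and ignores its own influence on the average.
    """
    freqs = [poisson_pmf(k, tau) for k in range(max_steps + 1)]
    choices = [50.0]  # mean choice of step-0 (uniform) players
    for k in range(1, max_steps + 1):
        lower = freqs[:k]  # beliefs only cover lower steps
        believed_avg = sum(f * c for f, c in zip(lower, choices)) / sum(lower)
        choices.append(p * believed_avg)
    pop_mean = sum(f * c for f, c in zip(freqs, choices)) / sum(freqs)
    return choices, pop_mean


# Predicted mean choice for a few tau values taken from the table.
for tau in (0.1, 1.5, 3.7):
    _, mean_choice = ch_beauty_contest(tau)
    print(f"tau={tau}: predicted mean choice {mean_choice:.1f}")
```

Under these assumptions a larger τ lowers the predicted mean toward the equilibrium of 0, which is the same direction as the table (high-τ groups such as Caltech have lower means than PCC).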
[Figure: Predictions: frequency of choices (0, 1-10, 11-20, …, 91-100) by round]
[Figure: Results: frequency of choices (0, 1-10, 11-20, …, 91-100) by round]
EWA learning
Attraction A_i^j(t) for strategy j updated by
$$A_i^j(t) = \frac{\phi\, A_i^j(t-1) + \pi_i[s_i(t), s_{-i}(t)]}{\phi(1-\kappa) + 1} \quad \text{(chosen } j\text{)}$$
$$A_i^j(t) = \frac{\phi\, A_i^j(t-1) + \delta\, \pi_i[s_i^j, s_{-i}(t)]}{\phi(1-\kappa) + 1} \quad \text{(unchosen } j\text{)}$$
logit response (softmax)
$$P_i^j(t) = \frac{e^{\lambda A_i^j(t)}}{\sum_k e^{\lambda A_i^k(t)}}$$
key parameters:
δ imagination (weight on foregone payoffs)
φ decay (forgetting) or change-detection
κ growth rate of attractions (κ = 0: averages; κ = 1: cumulations, so attractions “lock in” after exploration)
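A minimal Python sketch of one EWA step in exactly the normalized form written above (denominator φ(1-κ)+1), followed by the logit rule. `payoffs[j]` stands for π_i[s_i^j, s_{-i}(t)], the payoff strategy j earned or would have earned against the others' actual choices; the function names are hypothetical.

```python
import math


def ewa_update(attractions: list[float], chosen: int, payoffs: list[float],
               phi: float, delta: float, kappa: float) -> list[float]:
    """One EWA attraction update in the normalized form on the slide.

    payoffs[j] is what strategy j earned (chosen) or would have earned
    (unchosen) against the other players' actual choices this period.
    """
    denom = phi * (1 - kappa) + 1
    updated = []
    for j, (a, pay) in enumerate(zip(attractions, payoffs)):
        weight = 1.0 if j == chosen else delta  # delta = imagination
        updated.append((phi * a + weight * pay) / denom)
    return updated


def logit_probs(attractions: list[float], lam: float) -> list[float]:
    """Softmax choice probabilities; subtracting the max avoids overflow."""
    top = max(attractions)
    weights = [math.exp(lam * (a - top)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]


# Two strategies: the player chose 0 and earned 3; strategy 1 would have paid 5.
attractions = ewa_update([0.0, 0.0], chosen=0, payoffs=[3.0, 5.0],
                         phi=0.9, delta=0.5, kappa=0.0)
print(attractions, logit_probs(attractions, lam=1.0))
```

With δ = 0 an unchosen strategy's attraction only decays, which is the reinforcement-learning end of the family; larger λ makes choices respond more sharply to attraction differences.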
“In nature a hybrid [species] is usually sterile, but in science the opposite is often true” -- Francis Crick ’88
Weighted fictitious play (δ = 1, κ = 0)
Simple choice reinforcement (δ = 0)
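A numerical check of the first special case, as a Python sketch. With δ = 1 and κ = 0 every strategy is updated by its actual or foregone payoff with denominator φ + 1, which is the same linear recursion as expected payoffs under weighted fictitious-play beliefs B(t) = (φB(t-1) + indicator of the observed choice)/(φ + 1). That belief rule, the matching initial conditions, and the 2x2 payoff matrix are assumptions made up for illustration.

```python
# Hypothetical 2x2 payoffs: PAYOFF[j][k] is the player's payoff from
# strategy j when the opponent plays k (made up for illustration).
PAYOFF = [[3.0, 0.0], [5.0, 1.0]]


def ewa_belief_case(attractions: list[float], opp_choice: int,
                    phi: float) -> list[float]:
    """EWA with delta = 1, kappa = 0: every strategy gets its (actual or
    foregone) payoff, and the denominator is phi * (1 - 0) + 1."""
    return [(phi * a + PAYOFF[j][opp_choice]) / (phi + 1)
            for j, a in enumerate(attractions)]


def wfp_update(beliefs: list[float], opp_choice: int, phi: float) -> list[float]:
    """Weighted fictitious play: decay old beliefs by phi, add the new observation."""
    return [(phi * b + (1.0 if k == opp_choice else 0.0)) / (phi + 1)
            for k, b in enumerate(beliefs)]


def expected_payoffs(beliefs: list[float]) -> list[float]:
    """Expected payoff of each strategy under the given beliefs."""
    return [sum(b * row[k] for k, b in enumerate(beliefs)) for row in PAYOFF]


def max_gap(opp_history: list[int], phi: float = 0.8) -> float:
    """Largest gap between EWA attractions and WFP expected payoffs."""
    beliefs = [0.5, 0.5]
    attractions = expected_payoffs(beliefs)  # assumed matching initial conditions
    gap = 0.0
    for opp in opp_history:
        attractions = ewa_belief_case(attractions, opp, phi)
        beliefs = wfp_update(beliefs, opp, phi)
        gap = max(gap, max(abs(a - e) for a, e in
                           zip(attractions, expected_payoffs(beliefs))))
    return gap


print(max_gap([0, 0, 1, 0, 1, 1]))  # zero up to floating-point rounding
```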
Studies comparing EWA and other learning models
Reference: type of game
Amaldoss and Jain (Mgt Sci, in press): cooperate-to-compete games
Cabrales, Nagel and Armenter ('01): stag hunt “global games”
Camerer and Anderson ('99, EcTheory): sender-receiver signaling
Camerer and Ho ('99, Econometrica): median-action coordination; 4x4 mixed-equilibrium games; p-beauty contest
Camerer, Ho and Wang ('99): normal form centipede
Camerer, Hsia and Ho (in press): sealed bid mechanism
Chen ('99): cost allocation
Haruvy and Erev (’00): binary risky choice decisions
Ho, Camerer and Chong ('01): “continental divide” coordination; price-matching; patent races; two-market entry games
Hsia (‘99): N-person call markets
Morgan & Sefton (Games Ec Beh, '01): “unprofitable” games
Rapoport and Amaldoss ('00 OBHDP, '01): alliances; patent races
Stahl ('99): 5x5 matrix games
Sutter et al ('01): p-beauty contest (groups, individuals)
20 estimates of learning model parameters
[Figure: parameter estimates plotted relative to the special cases Cournot, weighted fictitious play, fictitious play, average reinforcement, and cumulative reinforcement]
Functional EWA learning (“EWA Lite”)
Use functions of experience to create parameter values (only free parameter λ)
φ_i(t) is a change detector:
$$\phi_i(t) = 1 - 0.5\left[\sum_k \left(s_{-i}^k(t) - \frac{\sum_{\tau=1}^{t} s_{-i}^k(\tau)}{t}\right)^2\right]$$
Compares average of past frequencies s_{-i}(1), s_{-i}(2), … with s_{-i}(t)
Decay old experience (low φ) if change is detected
φ = 1 when other players always repeat strategies
φ falls after a “surprise”
φ falls more if others have been highly variable
φ falls less if others have been consistent
δ = φ/(# of Nash strategies) (creates low δ in mixed games)
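A small Python sketch of the change detector and of δ = φ/(# of Nash strategies). It assumes a single opponent, so s_{-i}^k(t) is 1 if the opponent chose strategy k in period t and 0 otherwise; with several opponents these indicators would become frequencies. The names `change_detector` and `imagination` are hypothetical.

```python
def change_detector(history: list[int], n_strategies: int) -> float:
    """phi_i(t) = 1 - 0.5 * sum_k (recent_k - cumulative_k)^2.

    history[tau] is the strategy index the (single) opponent chose in
    period tau + 1; recent_k is 1 if k was chosen in the latest period,
    cumulative_k is k's frequency over all t periods so far.
    """
    if not history:
        raise ValueError("need at least one observed period")
    t = len(history)
    latest = history[-1]
    surprise = 0.0
    for k in range(n_strategies):
        recent = 1.0 if latest == k else 0.0
        cumulative = history.count(k) / t
        surprise += (recent - cumulative) ** 2
    return 1 - 0.5 * surprise


def imagination(phi: float, n_nash_strategies: int) -> float:
    """delta = phi / (# of Nash strategies), as on the slide."""
    return phi / n_nash_strategies


print(change_detector([0, 0, 0, 0], 2))  # opponent never switches: 1.0
print(change_detector([0, 0, 0, 1], 2))  # surprise in the last period: 0.4375
```

Because both vectors being compared are probability distributions, the squared distance is at most 2, so φ stays in [0, 1].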
Questions:
(now) Do functional values pick up differences across games? (Yes.)
(later) Can function changes create sensible, rapid switching in stochastic games?
Example: Price matching with loyalty rewards (Capra, Goeree, Gomez, Holt AER ‘99)
Players 1, 2 pick prices in [80,200] ¢
Price is P = min(P1, P2)
Low price firm earns P+R
High price firm earns P-R
What happens? (e.g., R=50)
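A minimal Python sketch of the payoff rule, restricted to whole-cent prices. The slide does not say what tied prices pay, so the sketch assumes both firms earn P; the function names are hypothetical.

```python
def price_matching_payoffs(p1: int, p2: int, r: int = 50) -> tuple[int, int]:
    """Earnings (cents) of firms 1 and 2; ties assumed to pay both P."""
    for price in (p1, p2):
        if not 80 <= price <= 200:
            raise ValueError("prices must lie in [80, 200] cents")
    p = min(p1, p2)
    if p1 < p2:
        return p + r, p - r  # firm 1 is the low-price firm
    if p2 < p1:
        return p - r, p + r  # firm 2 is the low-price firm
    return p, p


def best_response(rival_price: int, r: int = 50) -> int:
    """Firm 1's payoff-maximizing whole-cent price against a fixed rival price."""
    return max(range(80, 201),
               key=lambda own: price_matching_payoffs(own, rival_price, r)[0])


# Iterated best responses starting from the top of the price range.
price = 200
while (nxt := best_response(price)) != price:
    price = nxt
print(price)  # prints 80
```

With R = 50, undercutting by one cent is always worth R - 1 more than matching, so the best-response path walks down to 80, the lowest allowed price.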
[Figure: Empirical frequency of prices chosen, by period (price bins 80, 81-90, …, 191-200)]
[Figure: Thinking fEWA predicted frequency of prices chosen, by period (price bins 80, 81-90, …, 191-200)]
Teaching in repeated (partner) games
Finitely-repeated trust game (Camerer & Weigelt Econometrica ‘88)
Payoffs (lender, borrower):
                        borrower action
                        repay        default
lender    loan          40, 60       -100, 150
          no loan       10, 10
1 borrower plays against 8 lenders
A fraction (p(honest)) of borrowers prefer to repay (controlled by experimenter)
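A worked check of the lender's side of the payoff table, as a Python sketch. It assumes, for illustration, a period in which reputation no longer matters, so an honest borrower repays and a dishonest one defaults (150 > 60 for the borrower). A lender who believes the borrower is honest with probability p then lends only if 40p - 100(1 - p) ≥ 10, i.e. p ≥ 110/140 = 11/14 ≈ 0.79. The function names are hypothetical.

```python
from fractions import Fraction

# Lender payoffs from the table above.
LOAN_REPAID, LOAN_DEFAULTED, NO_LOAN = 40, -100, 10


def lender_expected_loan_payoff(p_honest: Fraction) -> Fraction:
    """Expected lender payoff from lending, assuming honest borrowers repay
    and dishonest borrowers default (illustrative end-game assumption)."""
    return p_honest * LOAN_REPAID + (1 - p_honest) * LOAN_DEFAULTED


def lending_threshold() -> Fraction:
    """Smallest p(honest) at which lending is at least as good as no loan."""
    return Fraction(NO_LOAN - LOAN_DEFAULTED, LOAN_REPAID - LOAN_DEFAULTED)


p_star = lending_threshold()
print(p_star, float(p_star))  # 11/14, about 0.786
```

The threshold is high, which is why lenders' expectations about repayment matter so much in this game, and why a dishonest borrower can gain from repaying early.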
Empirical results (conditional frequencies of no loan and default)
[Figure a: Empirical Frequency for No Loan, by period and sequence]
[Figure b: Empirical Frequency for Default conditional on Loan (Dishonest Borrower), by period and sequence]
Teaching in repeated trust games (Camerer, Ho, Chong J Ec Theory 02)
Some borrowers (estimated at 89%) know lenders learn by fEWA
Actions in t “teach” lenders what to expect in t+1
The “peripheral vision” weight is estimated at .93
E.g. entering period 4 of sequence 17:
Seq. 16, periods 1-8: Repay, Repay, Repay, Default, ..... (look “peripherally”, with the peripheral-vision weight)
Seq. 17, periods 1-3: Repay, No loan, Repay (look back)
Teaching: Strategies have reputations
Bayesian-Nash equilibrium: Borrowers have reputations (types)
Heart of the model:
Attraction of sophisticated Borrower strategy j after sequence k before period t
J_{t+1} is a possible sequence of choices by the borrower
First term is the expected (myopic) payoff from strategy j
Second term is the summation of expected payoffs in the future (undiscounted), given the effect of j and optimal planned future choices (J_{t+1})
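$$A_B^j(s;k;t) = \sum_{j'=\mathrm{Loan}}^{\mathrm{NoLoan}} P_L^{j'}(a;k;t+1)\cdot\pi_B(j, j') + \max_{J_{t+1}}\left\{\sum_{v=t+2}^{T}\ \sum_{j'=\mathrm{Loan}}^{\mathrm{NoLoan}} \hat{P}_L^{j'}(a;k;v \mid j_{v-1} \in J_{t+1})\cdot\pi_B(j_v \in J_{t+1}, j')\right\}$$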
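A brute-force Python sketch of the structure of this attraction: a myopic term plus the best undiscounted sum of expected payoffs over future borrower plans J_{t+1}. The lender model is a deliberately crude, hypothetical stand-in for fEWA: the lender's loan probability in a period depends only on the borrower's action in the previous period (`crude_lender`). Borrower payoffs come from the trust-game table (60 repay, 150 default, 10 no loan); all function names are hypothetical.

```python
from itertools import product
from typing import Callable

ACTIONS = ("repay", "default")
# Borrower payoffs from the trust-game table, by lender choice.
IF_LOAN = {"repay": 60, "default": 150}
IF_NO_LOAN = 10


def expected_payoff(action: str, p_loan: float) -> float:
    """Borrower's expected payoff from an action when P(loan) = p_loan."""
    return p_loan * IF_LOAN[action] + (1 - p_loan) * IF_NO_LOAN


def teaching_attraction(j: str, p_loan_now: float, periods_left: int,
                        loan_prob_after: Callable[[str], float]) -> float:
    """Myopic payoff of j now plus the best undiscounted future plan.

    loan_prob_after(prev_action) is a hypothetical lender model giving
    P(loan) in a period after the borrower's previous (planned) action.
    """
    myopic = expected_payoff(j, p_loan_now)
    best_future = max(
        sum(expected_payoff(a, loan_prob_after(prev))
            for prev, a in zip((j,) + plan[:-1], plan))
        for plan in product(ACTIONS, repeat=periods_left)
    )  # product(..., repeat=0) yields one empty plan, so best_future = 0
    return myopic + best_future


def crude_lender(prev: str) -> float:
    """Hypothetical lender: lends often after repayment, rarely after default."""
    return 0.9 if prev == "repay" else 0.1


for j in ACTIONS:
    print(j, teaching_attraction(j, 0.9, 1, crude_lender))
```

With one period left after this one, repaying now scores about 191 against about 160 for defaulting: the repayment keeps the next loan likely, and the plan defaults in the final period. That future term is what lets a dishonest borrower's current action "teach" the lender.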
Empirical results (top) and teaching model (bottom)
[Figure a: Empirical Frequency for No Loan, by period and sequence]
[Figure b: Empirical Frequency for Default conditional on Loan (Dishonest Borrower), by period and sequence]
[Figure c: Predicted Frequency for No Loan, by sequence and period]
[Figure d: Predicted Frequency for Default conditional on Loan (Dishonest Borrower), by sequence and period]
Conclusions
Learning (λ response sensitivity)
Hybrid fits & predicts well (20+ games)
One-parameter fEWA fits well, easy to estimate
Well-suited to Markov games because φ means players can “relearn” if new state is quite different?
Teaching (fraction of teaching)
Retains strategic foresight in repeated games with partner matching
Fits trust, entry deterrence better than softmax Bayesian-Nash (aka QRE)
Next?
Field applications, explore low-information Markov domains…
Parametric EWA learning (E’metrica ‘99)
• free parameters δ, φ, κ, λ, N(0)
Functional EWA learning
• functions for parameters
• parameter (λ)
Strategic teaching (JEcTheory ‘02)
• Reputation-building w/o “types”
• Two parameters
Thinking steps (parameter τ)