Learning in Games
Fictitious Play
Notation!
For n players we have:
n finite strategy spaces S1, S2, …, Sn
n opponent strategy spaces S-1, S-2, …, S-n
n payoff functions u1, u2, …, un
For each i and each s-i in S-i, a set of best responses BRi(s-i)
What is Fictitious Play?
Each player creates an assessment of the opponents' strategies in the form of a weight function \kappa^i_t : S_{-i} \to \mathbb{R}_+, with arbitrary initial weights \kappa^i_0 and the update

\kappa^i_t(s_{-i}) = \kappa^i_{t-1}(s_{-i}) + \begin{cases} 1 & \text{if } s^{-i}_{t-1} = s_{-i} \\ 0 & \text{otherwise} \end{cases}
Prediction
Probability that player i assigns to player -i playing s-i at time t:

\gamma^i_t(s_{-i}) = \frac{\kappa^i_t(s_{-i})}{\sum_{\tilde{s}_{-i} \in S_{-i}} \kappa^i_t(\tilde{s}_{-i})}
Fictitious Play is …
… any rule \rho^i_t that assigns a best response to the assessment:

\rho^i_t(\gamma^i_t) \in BR_i(\gamma^i_t)

NOT UNIQUE!
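One fictitious-play update can be sketched in a few lines (a minimal illustration; the helper name is hypothetical, and using `argmax` — i.e. the first maximizer — is one arbitrary way to break ties, since the best response need not be unique):

```python
import numpy as np

def fictitious_play_step(weights, payoff, last_opponent_action=None):
    """One fictitious-play update for a single player.

    weights: assessment kappa_t over the opponent's actions
    payoff:  payoff[a, b] = this player's payoff for own action a,
             opponent action b
    """
    if last_opponent_action is not None:
        weights = weights.copy()
        weights[last_opponent_action] += 1.0    # kappa update rule
    prediction = weights / weights.sum()        # gamma_t
    expected = payoff @ prediction              # expected payoff per own action
    best_response = int(np.argmax(expected))    # any maximizer would do
    return weights, prediction, best_response

# Matching pennies, row player (wants to match): opponent's T is weighted
# more heavily, so the best response is T (action 1).
payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])
w, gamma, br = fictitious_play_step(np.array([1.5, 2.0]), payoff)
```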
Further Definitions
In 2-player games:
Marginal empirical distribution of j's play (j = -i):

d^j_t(s_j) = \frac{\kappa^j_t(s_j) - \kappa^j_0(s_j)}{t}
Propositions:
Strict Nash equilibria are absorbing for the process of fictitious play.
Any pure-strategy steady state of fictitious play must be a Nash equilibrium.
Asymptotic Behavior
Example “matching pennies”

        H      T
  H   1,-1   -1,1
  T   -1,1   1,-1

Weights (each player's assessment of the opponent's H and T, updated round by round):

        Row Player    Col Player
          H    T        H    T
  t=0:   1.5   2        2   1.5
  t=1:   1.5   3        2   2.5
  t=2:   2.5   3        2   3.5
  t=3:   3.5   3        2   4.5
  t=4:   4.5   3        3   4.5
  t=5:   5.5   3        4   4.5
  t=6:   6.5   3        5   4.5
  t=7:   6.5   4        6   4.5
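The weight sequence above can be reproduced with a short simulation (a minimal sketch; the helper name and the first-maximizer tie-breaking are illustrative choices):

```python
import numpy as np

# Matching pennies: entry [a, b] is the row player's payoff; the column
# player receives the negative. Actions: 0 = H, 1 = T.
R = np.array([[1.0, -1.0], [-1.0, 1.0]])

def best_response(payoff, weights):
    """Best response to the normalized assessment (first maximizer on ties)."""
    prediction = weights / weights.sum()
    return int(np.argmax(payoff @ prediction))

# Initial weights from the slides: the row player's assessment of the
# column player, and vice versa.
w_row = np.array([1.5, 2.0])
w_col = np.array([2.0, 1.5])

for _ in range(7):
    a_row = best_response(R, w_row)      # row maximizes R
    a_col = best_response(-R.T, w_col)   # col's payoff matrix is -R, transposed
    w_row[a_col] += 1.0                  # each player records the opponent's move
    w_col[a_row] += 1.0
# after 7 rounds: w_row = [6.5, 4], w_col = [6, 4.5], as in the table
```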
Convergence?
Strategies cycle and do not converge … but the marginal empirical distributions converge:

\lim_{t \to \infty} d^j_t(s_j) = \frac{1}{2}
MATLAB Simulation – Pennies [figure: game, play, payoff, and weight-over-time plots]
Proposition
Under fictitious play, if the empirical distributions over each player’s choices converge, the strategy profile corresponding to the product of these distributions is a Nash equilibrium.
Rock-Paper-Scissors [figure: game, play, payoff, and weight-over-time plots]

          A          B          C
  A   1/2,1/2      0,1        1,0
  B     1,0      1/2,1/2      0,1
  C     0,1        1,0      1/2,1/2
Shapley Game [figure: game, play, payoff, and weight-over-time plots]

        A      B      C
  A   0,0    0,1    1,0
  B   1,0    0,0    0,1
  C   0,1    1,0    0,0
Persistent Miscoordination [figure: game, play, payoff, and weight-over-time plots]

        A      B
  A   0,0    1,1
  B   1,1    0,0

Initial weights: 1.41 / 1.41
Nash: (1,0) (0,1) (0.5,0.5)
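With symmetric initial weights, the miscoordination persists forever: both players always choose the same action and earn 0 every round, even though their empirical distributions converge to the mixed Nash (which pays 0.5 per round). A minimal sketch (the first-maximizer tie-breaking on equal weights is an illustrative assumption):

```python
import numpy as np

# Anti-coordination payoffs from the slide: each player gets 1 only if the
# two players choose different actions (0 = A, 1 = B). Symmetric game, so
# the same matrix works for both players.
payoff = np.array([[0.0, 1.0], [1.0, 0.0]])

def best_response(weights):
    prediction = weights / weights.sum()
    return int(np.argmax(payoff @ prediction))  # ties broken toward A

# Symmetric initial weights, as on the slide.
w1 = np.array([1.41, 1.41])
w2 = np.array([1.41, 1.41])

total_payoff = 0.0
plays = []
for _ in range(100):
    a1, a2 = best_response(w1), best_response(w2)
    total_payoff += payoff[a1, a2]   # identical for both players
    w1[a2] += 1.0
    w2[a1] += 1.0
    plays.append(a1)
# both players alternate A, B, A, B, … in lockstep: payoff 0 every round,
# while the empirical frequency of A tends to 1/2
```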
Summary on fictitious play
In case of convergence, the time average of strategies forms a Nash equilibrium.
The average payoff need not be that of a Nash equilibrium (e.g. miscoordination).
The time average may not converge at all (e.g. Shapley game).
References
Fudenberg, D., Levine, D. K. (1998). The Theory of Learning in Games. MIT Press.
Nash Convergence of Gradient Dynamics in General-Sum Games
Notation
2 players:
Strategies α and β (each player's probability of playing the first action), α, β ∈ [0,1]
Payoff matrices

  R = | r11  r12 |     C = | c11  c12 |
      | r21  r22 |         | c21  c22 |
Objective Functions
Payoff functions:

V_r(\alpha,\beta) = r_{11}\alpha\beta + r_{22}(1-\alpha)(1-\beta) + r_{12}\alpha(1-\beta) + r_{21}(1-\alpha)\beta
V_c(\alpha,\beta) = c_{11}\alpha\beta + c_{22}(1-\alpha)(1-\beta) + c_{12}\alpha(1-\beta) + c_{21}(1-\alpha)\beta
Hillclimbing Idea
Gradient Ascent for Iterated Games
With u = (r11 + r22) - (r21 + r12) and u' = (c11 + c22) - (c21 + c12):

\frac{\partial V_r(\alpha,\beta)}{\partial\alpha} = \beta u - (r_{22} - r_{12})
\frac{\partial V_c(\alpha,\beta)}{\partial\beta} = \alpha u' - (c_{22} - c_{21})
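The closed-form gradient can be sanity-checked against a finite-difference approximation of the payoff function (the payoff values below are an arbitrary example, not from the slides):

```python
# V_r as defined above, written out for scalar strategies alpha, beta.
def Vr(alpha, beta, r11, r12, r21, r22):
    return (r11 * alpha * beta + r22 * (1 - alpha) * (1 - beta)
            + r12 * alpha * (1 - beta) + r21 * (1 - alpha) * beta)

# Arbitrary example payoffs and an interior strategy pair.
r11, r12, r21, r22 = 3.0, 0.0, 5.0, 1.0
u = (r11 + r22) - (r21 + r12)
alpha, beta = 0.3, 0.7

# Closed form: dVr/dalpha = beta*u - (r22 - r12)
closed_form = beta * u - (r22 - r12)

# Central finite difference in alpha.
h = 1e-6
finite_diff = (Vr(alpha + h, beta, r11, r12, r21, r22)
               - Vr(alpha - h, beta, r11, r12, r21, r22)) / (2 * h)
# the two values agree up to the finite-difference error
```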
Update Rule

\alpha_{k+1} = \alpha_k + \eta \frac{\partial V_r}{\partial\alpha}(\alpha_k, \beta_k)
\beta_{k+1} = \beta_k + \eta \frac{\partial V_c}{\partial\beta}(\alpha_k, \beta_k)

\alpha_0, \beta_0 can be arbitrary strategies.
Problem
The gradient can lead the players to an infeasible point outside the unit square [0,1] × [0,1].

Solution:
Redefine the gradient as the projection of the true gradient onto the boundary of the unit square.

Let this denote the constrained dynamics!
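The projected update can be sketched as a gradient step followed by clipping back into [0,1] (for the unit square, the projection reduces to clipping). The prisoner's-dilemma-style payoff values are an illustrative choice: both gradients are negative everywhere, so the dynamics should reach the pure equilibrium at (0, 0).

```python
def clip01(x):
    """Projection onto [0, 1]."""
    return max(0.0, min(1.0, x))

# Illustrative payoffs: second action strictly dominates for both players.
r11, r12, r21, r22 = 3.0, 0.0, 5.0, 1.0    # row payoffs
c11, c12, c21, c22 = 3.0, 5.0, 0.0, 1.0    # column payoffs
u = (r11 + r22) - (r21 + r12)
u_prime = (c11 + c22) - (c21 + c12)

alpha, beta = 0.9, 0.9   # arbitrary starting strategies
eta = 0.01               # step size
for _ in range(500):
    grad_a = beta * u - (r22 - r12)          # dVr/dalpha
    grad_b = alpha * u_prime - (c22 - c21)   # dVc/dbeta
    # simultaneous gradient step, then project back onto the unit square
    alpha = clip01(alpha + eta * grad_a)
    beta = clip01(beta + eta * grad_b)
# play converges to (0, 0): each player ends up on the second action
```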
Infinitesimal Gradient Ascent (IGA)
In the limit \eta \to 0, the updates become a system of differential equations:

  | ∂α/∂t |   | 0   u  | | α |   | -(r22 - r12) |
  |       | = |        | |   | + |              |
  | ∂β/∂t |   | u'  0  | | β |   | -(c22 - c21) |

with U denoting the matrix | 0 u ; u' 0 |. (α(t), β(t)) become functions of time!
Case 1: U is invertible. The two possible qualitative forms of the unconstrained strategy pair: [figure]
Case 2: U is not invertible. Some examples of qualitative forms of the unconstrained strategy pair: [figure]
Convergence
If both players follow the IGA rule, then both players' average payoffs converge to the expected payoff of some Nash equilibrium.
If the strategy pair trajectory converges at all, then it converges to a Nash pair.
Proposition
Both previous propositions also hold with a finite, decreasing step size.
References
Singh, S., Kearns, M., Mansour, Y. (2000). Nash Convergence of Gradient Dynamics in General-Sum Games. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, pp. 541–548.
Dynamic computation of Nash equilibria in two-player general-sum games.

Notation
2 players:
Mixed strategies p = (p1, …, pn) and q = (q1, …, qn)
Payoff matrices

  R = | r11 … r1n |     C = | c11 … c1n |
      |  ⋮  ⋱  ⋮  |         |  ⋮  ⋱  ⋮  |
      | rn1 … rnn |         | cn1 … cnn |
Objective Functions
Payoff functions:
Row player: V_r(p,q) = p^T R q
Col player: V_c(p,q) = p^T C q
This means: the payoff depends linearly on the value of each p_i.

Observation!
V_r(p,q) is linear in each p_i and q_j.
Let x_i denote the pure strategy for action i. If V_r(x_i, q) > V_r(p, q), then increasing p_i increases the payoff.
Hill climbing (again)
Multiplicative Update Rules

\frac{\partial p_i}{\partial t} = p_i \left( V_r(x_i, q) - V_r(p, q) \right) = p_i \left( (Rq)_i - p^T R q \right)
\frac{\partial q_i}{\partial t} = q_i \left( V_c(p, x_i) - V_c(p, q) \right) = q_i \left( (p^T C)_i - p^T C q \right)
Hill climbing (again)
System of Differential Equations (i = 1..n):

\frac{\partial p_i}{\partial t} = p_i \left( (Rq)_i - p^T R q \right)
\frac{\partial q_i}{\partial t} = q_i \left( (p^T C)_i - p^T C q \right)
Fixed Points?

\frac{\partial p_i}{\partial t} = p_i \left( (Rq)_i - p^T R q \right) = 0
\iff \text{either } p_i(t) = 0 \text{ or } (Rq)_i - p^T R q = 0
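The multiplicative dynamics can be integrated numerically with a simple Euler scheme. The 2x2 game below (an illustrative choice, not from the slides) has a strictly dominant second action for both players, so the trajectory should approach the pure Nash equilibrium where both play it:

```python
import numpy as np

# dp_i/dt = p_i ((Rq)_i - p^T R q),  dq_i/dt = q_i ((p^T C)_i - p^T C q)
R = np.array([[3.0, 0.0], [5.0, 1.0]])
C = R.T   # symmetric game: column player faces the transposed payoffs

p = np.array([0.5, 0.5])  # start from the uniform mixed strategies
q = np.array([0.5, 0.5])
dt = 0.01
for _ in range(5000):
    dp = p * (R @ q - p @ R @ q)      # (Rq)_i minus the average payoff
    dq = q * (C.T @ p - p @ C @ q)    # (p^T C)_i minus the average payoff
    p = p + dt * dp                   # Euler step; the simplex sum is preserved
    q = q + dt * dq
# p and q approach (0, 1): the second action strictly dominates for both
```

Note that the components of dp sum to zero, so the Euler step keeps p a probability vector without renormalization.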
When is a Fixpoint a Nash?
Proposition: Provided all p_i(0) are neither 0 nor 1, if (p,q) converges to (p*,q*), then (p*,q*) is a Nash equilibrium.
Unit Square?
No problem! p_i = 0 and p_i = 1 both set the derivative to zero! [figure: p_i over t]
Convergence of the average payoff
If the (p,q) trajectory and both players' payoffs converge on average, the average payoff must be that of some Nash equilibrium.
2-Player 2-Action Case
Either the strategies converge immediately to some pure strategy, or the difference between the Kullback-Leibler distances of (p,q) to some mixed Nash (p*,q*) is constant:

KL(p, p^*) = p^* \log\frac{p^*}{p} + (1 - p^*) \log\frac{1 - p^*}{1 - p}

KL(p, p^*) - KL(q, q^*) = \text{const.}
[Figure: trajectories of the difference between the Kullback-Leibler distances; the Nash equilibrium is marked]
But…
… for games with more than 2 actions, convergence is not guaranteed! Counterexample: the Shapley game.