Learning in Games
Fictitious Play
Notation!
For n players we have:
n finite strategy spaces S1, S2, …, Sn
n opponent strategy spaces S-1, S-2, …, S-n
n payoff functions u1, u2, …, un
For each i and each s-i in S-i, a set of best responses BRi(s-i)
What is Fictitious Play?
Each player creates an assessment of the opponents' strategies in the form of a weight function \kappa^i_t : S_{-i} \to \mathbb{R}_+, with arbitrary initial weights \kappa^i_0 and the update

\kappa^i_t(s_{-i}) = \kappa^i_{t-1}(s_{-i}) + \begin{cases} 1 & \text{if } s^{-i}_{t-1} = s_{-i} \\ 0 & \text{otherwise} \end{cases}
Prediction
Probability that player i assigns to player -i playing s-i at time t:

\gamma^i_t(s_{-i}) = \frac{\kappa^i_t(s_{-i})}{\sum_{\tilde{s}_{-i} \in S_{-i}} \kappa^i_t(\tilde{s}_{-i})}
Fictitious Play is …
… any rule \rho^i_t that assigns a best response to the assessment:

\rho^i_t(\gamma^i_t) \in BR_i(\gamma^i_t)

NOT UNIQUE!
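One fictitious-play update can be sketched in a few lines (a minimal illustration; the helper name is hypothetical, and using `argmax` — i.e. the first maximizer — is one arbitrary way to break ties, since the best response need not be unique):

```python
import numpy as np

def fictitious_play_step(weights, payoff, last_opponent_action=None):
    """One fictitious-play update for a single player.

    weights: assessment kappa_t over the opponent's actions
    payoff:  payoff[a, b] = this player's payoff for own action a,
             opponent action b
    """
    if last_opponent_action is not None:
        weights = weights.copy()
        weights[last_opponent_action] += 1.0    # kappa update rule
    prediction = weights / weights.sum()        # gamma_t
    expected = payoff @ prediction              # expected payoff per own action
    best_response = int(np.argmax(expected))    # any maximizer would do
    return weights, prediction, best_response

# Matching pennies, row player (wants to match): opponent's T is weighted
# more heavily, so the best response is T (action 1).
payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])
w, gamma, br = fictitious_play_step(np.array([1.5, 2.0]), payoff)
```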
Further Definitions
In 2-player games:
Marginal empirical distribution of j's play (j = -i):

d^j_t(s_j) = \frac{\kappa^j_t(s_j) - \kappa^j_0(s_j)}{t}
Propositions:
Strict Nash equilibria are absorbing for the process of fictitious play.
Any pure-strategy steady state of fictitious play must be a Nash equilibrium.
Asymptotic Behavior
Example “matching pennies”

        H      T
  H   1,-1   -1,1
  T   -1,1   1,-1

Weights (each player's assessment of the opponent's H and T, updated round by round):

        Row Player    Col Player
          H    T        H    T
  t=0:   1.5   2        2   1.5
  t=1:   1.5   3        2   2.5
  t=2:   2.5   3        2   3.5
  t=3:   3.5   3        2   4.5
  t=4:   4.5   3        3   4.5
  t=5:   5.5   3        4   4.5
  t=6:   6.5   3        5   4.5
  t=7:   6.5   4        6   4.5
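The weight sequence above can be reproduced with a short simulation (a minimal sketch; the helper name and the first-maximizer tie-breaking are illustrative choices):

```python
import numpy as np

# Matching pennies: entry [a, b] is the row player's payoff; the column
# player receives the negative. Actions: 0 = H, 1 = T.
R = np.array([[1.0, -1.0], [-1.0, 1.0]])

def best_response(payoff, weights):
    """Best response to the normalized assessment (first maximizer on ties)."""
    prediction = weights / weights.sum()
    return int(np.argmax(payoff @ prediction))

# Initial weights from the slides: the row player's assessment of the
# column player, and vice versa.
w_row = np.array([1.5, 2.0])
w_col = np.array([2.0, 1.5])

for _ in range(7):
    a_row = best_response(R, w_row)      # row maximizes R
    a_col = best_response(-R.T, w_col)   # col's payoff matrix is -R, transposed
    w_row[a_col] += 1.0                  # each player records the opponent's move
    w_col[a_row] += 1.0
# after 7 rounds: w_row = [6.5, 4], w_col = [6, 4.5], as in the table
```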
Convergence?
Strategies cycle and do not converge … but the marginal empirical distributions converge:

\lim_{t \to \infty} d^j_t(s_j) = \frac{1}{2}
MATLAB Simulation – Pennies [figure: game, play, payoff, and weight-over-time plots]
Proposition
Under fictitious play, if the empirical distributions over each player’s choices converge, the strategy profile corresponding to the product of these distributions is a Nash equilibrium.
Rock-Paper-Scissors [figure: game, play, payoff, and weight-over-time plots]

          A          B          C
  A   1/2,1/2      0,1        1,0
  B     1,0      1/2,1/2      0,1
  C     0,1        1,0      1/2,1/2
Shapley Game [figure: game, play, payoff, and weight-over-time plots]

        A      B      C
  A   0,0    0,1    1,0
  B   1,0    0,0    0,1
  C   0,1    1,0    0,0
Persistent Miscoordination [figure: game, play, payoff, and weight-over-time plots]

        A      B
  A   0,0    1,1
  B   1,1    0,0

Initial weights: 1.41 / 1.41
Nash: (1,0) (0,1) (0.5,0.5)
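With symmetric initial weights, the miscoordination persists forever: both players always choose the same action and earn 0 every round, even though their empirical distributions converge to the mixed Nash (which pays 0.5 per round). A minimal sketch (the first-maximizer tie-breaking on equal weights is an illustrative assumption):

```python
import numpy as np

# Anti-coordination payoffs from the slide: each player gets 1 only if the
# two players choose different actions (0 = A, 1 = B). Symmetric game, so
# the same matrix works for both players.
payoff = np.array([[0.0, 1.0], [1.0, 0.0]])

def best_response(weights):
    prediction = weights / weights.sum()
    return int(np.argmax(payoff @ prediction))  # ties broken toward A

# Symmetric initial weights, as on the slide.
w1 = np.array([1.41, 1.41])
w2 = np.array([1.41, 1.41])

total_payoff = 0.0
plays = []
for _ in range(100):
    a1, a2 = best_response(w1), best_response(w2)
    total_payoff += payoff[a1, a2]   # identical for both players
    w1[a2] += 1.0
    w2[a1] += 1.0
    plays.append(a1)
# both players alternate A, B, A, B, … in lockstep: payoff 0 every round,
# while the empirical frequency of A tends to 1/2
```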
Summary on fictitious play
In case of convergence, the time average of strategies forms a Nash equilibrium.
The average payoff need not be that of a Nash equilibrium (e.g. miscoordination).
The time average may not converge at all (e.g. Shapley game).
References
Fudenberg, D., Levine, D. K. (1998). The Theory of Learning in Games. MIT Press.
Nash Convergence of Gradient Dynamics in General-Sum Games
Notation
2 players:
Strategies α and β (each player's probability of playing the first action), α, β ∈ [0,1]
Payoff matrices

  R = | r11  r12 |     C = | c11  c12 |
      | r21  r22 |         | c21  c22 |
Objective Functions
Payoff functions:

V_r(\alpha,\beta) = r_{11}\alpha\beta + r_{22}(1-\alpha)(1-\beta) + r_{12}\alpha(1-\beta) + r_{21}(1-\alpha)\beta
V_c(\alpha,\beta) = c_{11}\alpha\beta + c_{22}(1-\alpha)(1-\beta) + c_{12}\alpha(1-\beta) + c_{21}(1-\alpha)\beta
Hillclimbing Idea
Gradient Ascent for Iterated Games
With u = (r11 + r22) - (r21 + r12) and u' = (c11 + c22) - (c21 + c12):

\frac{\partial V_r(\alpha,\beta)}{\partial\alpha} = \beta u - (r_{22} - r_{12})
\frac{\partial V_c(\alpha,\beta)}{\partial\beta} = \alpha u' - (c_{22} - c_{21})
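The closed-form gradient can be sanity-checked against a finite-difference approximation of the payoff function (the payoff values below are an arbitrary example, not from the slides):

```python
# V_r as defined above, written out for scalar strategies alpha, beta.
def Vr(alpha, beta, r11, r12, r21, r22):
    return (r11 * alpha * beta + r22 * (1 - alpha) * (1 - beta)
            + r12 * alpha * (1 - beta) + r21 * (1 - alpha) * beta)

# Arbitrary example payoffs and an interior strategy pair.
r11, r12, r21, r22 = 3.0, 0.0, 5.0, 1.0
u = (r11 + r22) - (r21 + r12)
alpha, beta = 0.3, 0.7

# Closed form: dVr/dalpha = beta*u - (r22 - r12)
closed_form = beta * u - (r22 - r12)

# Central finite difference in alpha.
h = 1e-6
finite_diff = (Vr(alpha + h, beta, r11, r12, r21, r22)
               - Vr(alpha - h, beta, r11, r12, r21, r22)) / (2 * h)
# the two values agree up to the finite-difference error
```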
Update Rule

\alpha_{k+1} = \alpha_k + \eta \frac{\partial V_r}{\partial\alpha}(\alpha_k, \beta_k)
\beta_{k+1} = \beta_k + \eta \frac{\partial V_c}{\partial\beta}(\alpha_k, \beta_k)

\alpha_0, \beta_0 can be arbitrary strategies.
Problem
The gradient can lead the players to an infeasible point outside the unit square [0,1] × [0,1].

Solution:
Redefine the gradient as the projection of the true gradient onto the boundary of the unit square.

Let this denote the constrained dynamics!
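The projected update can be sketched as a gradient step followed by clipping back into [0,1] (for the unit square, the projection reduces to clipping). The prisoner's-dilemma-style payoff values are an illustrative choice: both gradients are negative everywhere, so the dynamics should reach the pure equilibrium at (0, 0).

```python
def clip01(x):
    """Projection onto [0, 1]."""
    return max(0.0, min(1.0, x))

# Illustrative payoffs: second action strictly dominates for both players.
r11, r12, r21, r22 = 3.0, 0.0, 5.0, 1.0    # row payoffs
c11, c12, c21, c22 = 3.0, 5.0, 0.0, 1.0    # column payoffs
u = (r11 + r22) - (r21 + r12)
u_prime = (c11 + c22) - (c21 + c12)

alpha, beta = 0.9, 0.9   # arbitrary starting strategies
eta = 0.01               # step size
for _ in range(500):
    grad_a = beta * u - (r22 - r12)          # dVr/dalpha
    grad_b = alpha * u_prime - (c22 - c21)   # dVc/dbeta
    # simultaneous gradient step, then project back onto the unit square
    alpha = clip01(alpha + eta * grad_a)
    beta = clip01(beta + eta * grad_b)
# play converges to (0, 0): each player ends up on the second action
```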
Infinitesimal Gradient Ascent (IGA)
In the limit \eta \to 0, the updates become a system of differential equations:

  | ∂α/∂t |   | 0   u  | | α |   | -(r22 - r12) |
  |       | = |        | |   | + |              |
  | ∂β/∂t |   | u'  0  | | β |   | -(c22 - c21) |

with U denoting the matrix | 0 u ; u' 0 |. (α(t), β(t)) become functions of time!
Case 1: U is invertible. The two possible qualitative forms of the unconstrained strategy pair: [figure]
Case 2: U is not invertible. Some examples of qualitative forms of the unconstrained strategy pair: [figure]
Convergence
If both players follow the IGA rule, then both players' average payoffs converge to the expected payoff of some Nash equilibrium.
If the strategy pair trajectory converges at all, then it converges to a Nash pair.
Proposition
Both previous propositions also hold with a finite, decreasing step size.
References
Singh, S., Kearns, M., Mansour, Y. (2000). Nash Convergence of Gradient Dynamics in General-Sum Games. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, pp. 541–548.
Dynamic computation of Nash equilibria in two-player general-sum games.

Notation
2 players:
Mixed strategies p = (p1, …, pn) and q = (q1, …, qn)
Payoff matrices

  R = | r11 … r1n |     C = | c11 … c1n |
      |  ⋮  ⋱  ⋮  |         |  ⋮  ⋱  ⋮  |
      | rn1 … rnn |         | cn1 … cnn |
Objective Functions
Payoff functions:
Row player: V_r(p,q) = p^T R q
Col player: V_c(p,q) = p^T C q
This means: the payoff depends linearly on the value of each p_i.

Observation!
V_r(p,q) is linear in each p_i and q_j.
Let x_i denote the pure strategy for action i. If V_r(x_i, q) > V_r(p, q), then increasing p_i increases the payoff.
Hill climbing (again)
Multiplicative Update Rules

\frac{\partial p_i}{\partial t} = p_i \left( V_r(x_i, q) - V_r(p, q) \right) = p_i \left( (Rq)_i - p^T R q \right)
\frac{\partial q_i}{\partial t} = q_i \left( V_c(p, x_i) - V_c(p, q) \right) = q_i \left( (p^T C)_i - p^T C q \right)
Hill climbing (again)
System of Differential Equations (i = 1..n):

\frac{\partial p_i}{\partial t} = p_i \left( (Rq)_i - p^T R q \right)
\frac{\partial q_i}{\partial t} = q_i \left( (p^T C)_i - p^T C q \right)
Fixed Points?

\frac{\partial p_i}{\partial t} = p_i \left( (Rq)_i - p^T R q \right) = 0
\iff \text{either } p_i(t) = 0 \text{ or } (Rq)_i - p^T R q = 0
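The multiplicative dynamics can be integrated numerically with a simple Euler scheme. The 2x2 game below (an illustrative choice, not from the slides) has a strictly dominant second action for both players, so the trajectory should approach the pure Nash equilibrium where both play it:

```python
import numpy as np

# dp_i/dt = p_i ((Rq)_i - p^T R q),  dq_i/dt = q_i ((p^T C)_i - p^T C q)
R = np.array([[3.0, 0.0], [5.0, 1.0]])
C = R.T   # symmetric game: column player faces the transposed payoffs

p = np.array([0.5, 0.5])  # start from the uniform mixed strategies
q = np.array([0.5, 0.5])
dt = 0.01
for _ in range(5000):
    dp = p * (R @ q - p @ R @ q)      # (Rq)_i minus the average payoff
    dq = q * (C.T @ p - p @ C @ q)    # (p^T C)_i minus the average payoff
    p = p + dt * dp                   # Euler step; the simplex sum is preserved
    q = q + dt * dq
# p and q approach (0, 1): the second action strictly dominates for both
```

Note that the components of dp sum to zero, so the Euler step keeps p a probability vector without renormalization.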
When is a Fixpoint a Nash?
Proposition: Provided all p_i(0) are neither 0 nor 1, if (p,q) converges to (p*,q*), then (p*,q*) is a Nash equilibrium.
Unit Square?
No problem! p_i = 0 and p_i = 1 both set the derivative to zero! [figure: p_i over t]
Convergence of the average payoff
If the (p,q) trajectory and both players' payoffs converge on average, the average payoff must be that of some Nash equilibrium.
2-Player 2-Action Case
Either the strategies converge immediately to some pure strategy, or the difference between the Kullback-Leibler distances of (p,q) to some mixed Nash (p*,q*) is constant:

KL(p, p^*) = p^* \log\frac{p^*}{p} + (1 - p^*) \log\frac{1 - p^*}{1 - p}

KL(p, p^*) - KL(q, q^*) = \text{const.}
[Figure: trajectories of the difference between the Kullback-Leibler distances; the Nash equilibrium is marked]
But…
… for games with more than 2 actions, convergence is not guaranteed! Counterexample: the Shapley game.