Learning in Games

Learning in Games. Fictitious Play Notation! For n Players we have: n Finite Player’s Strategies Spaces S 1, S 2, …, S n n Opponent’s Strategies Spaces



Fictitious Play


Notation!

For n players we have:

  • n finite strategy spaces $S_1, S_2, \dots, S_n$, one per player
  • n opponents' strategy spaces $S_{-1}, S_{-2}, \dots, S_{-n}$
  • n payoff functions $u_1, u_2, \dots, u_n$
  • for each $i$ and each $s_{-i} \in S_{-i}$, a set of best responses $BR_i(s_{-i})$


What is Fictitious Play?

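Each player i forms an assessment of the opponents' strategies in the form of a weight function $\kappa^i_t$. It starts from initial weights

$$\kappa^i_0 : S_{-i} \to \mathbb{R}_+$$

and adds 1 to the weight of the opponent strategy observed in the previous period:

$$\kappa^i_t(s_{-i}) = \kappa^i_{t-1}(s_{-i}) + \begin{cases} 1 & \text{if } s^{t-1}_{-i} = s_{-i} \\ 0 & \text{if } s^{t-1}_{-i} \neq s_{-i} \end{cases}$$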


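Prediction

Probability that player i assigns to player −i playing $s_{-i}$ at time t, where $\kappa^i_t$ is player i's weight function:

$$\gamma^i_t(s_{-i}) = \frac{\kappa^i_t(s_{-i})}{\sum_{\tilde{s}_{-i} \in S_{-i}} \kappa^i_t(\tilde{s}_{-i})}$$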


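Fictitious Play is …

… any rule $\rho^i_t$ that assigns a best response to the current assessment $\gamma^i_t$:

$$\rho^i_t(\gamma^i_t) \in BR_i(\gamma^i_t)$$

NOT UNIQUE! When several strategies are best responses, the rule may pick any of them.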
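The three ingredients above (weight update, prediction, best response) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the dictionary layout `payoff[own][opponent]`, and the use of the matching-pennies row payoffs as sample data are assumptions made here, not part of the slides.

```python
from typing import Dict, List

Weights = Dict[str, float]


def update_weights(weights: Weights, observed: str) -> Weights:
    """Add 1 to the weight of the opponent action observed last period."""
    return {a: w + (1.0 if a == observed else 0.0) for a, w in weights.items()}


def prediction(weights: Weights) -> Weights:
    """Normalise the weights into a probability assessment (gamma)."""
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def best_responses(payoff: Dict[str, Dict[str, float]], belief: Weights) -> List[str]:
    """Return every own action that maximises expected payoff against belief."""
    expected = {
        own: sum(belief[opp] * row[opp] for opp in belief)
        for own, row in payoff.items()
    }
    best = max(expected.values())
    return [own for own, value in expected.items() if value == best]


# Sample data (assumed): row player's payoffs in matching pennies.
ROW_PAYOFF = {"H": {"H": 1, "T": -1}, "T": {"H": -1, "T": 1}}

weights = {"H": 1.5, "T": 2.0}          # weights on the opponent playing H, T
choice = best_responses(ROW_PAYOFF, prediction(weights))
weights = update_weights(weights, "T")  # opponent was observed playing T
```

`best_responses` returns a list rather than a single action, which makes the non-uniqueness visible: a fictitious-play rule is any selection from that list.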


Further Definitions

In 2-player games: the marginal empirical distribution of j's play (j = −i) is

$$d^j_t(s_j) = \frac{\kappa^i_t(s_j) - \kappa^i_0(s_j)}{t}$$


Asymptotic Behavior

Propositions:

  • Strict Nash equilibria are absorbing for the process of fictitious play.
  • Any pure-strategy steady state of fictitious play must be a Nash equilibrium.


Example “matching pennies”

Payoffs (row player, column player):

            H        T
    H     1,-1     -1,1
    T    -1,1      1,-1

Weights over time. Each player's two weights count the opponent's plays of H and T on top of the initial weights; the last column is the action pair that leads to the next line of weights.

           Row Player      Col Player      Play
    t       H      T        H      T       (row, col)
    0      1.5     2        2     1.5      (T, T)
    1      1.5     3        2     2.5      (T, H)
    2      2.5     3        2     3.5      (T, H)
    3      3.5     3        2     4.5      (H, H)
    4      4.5     3        3     4.5      (H, H)
    5      5.5     3        4     4.5      (H, H)
    6      6.5     3        5     4.5      (H, T)
    7      6.5     4        6     4.5
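The weight sequence of the example can be reproduced with a short simulation. The sketch below hard-codes the two best-response rules for matching pennies (row wants to match, column wants to mismatch); the function names and the tuple layout `(weight on H, weight on T)` are choices made here for illustration. Ties never occur with these initial weights, so the tie-breaking in the comparisons is irrelevant.

```python
def row_best_response(w):
    # w = (weight on column playing H, weight on column playing T).
    # The row player wins on a match, so it copies the action it expects.
    return "H" if w[0] > w[1] else "T"


def col_best_response(w):
    # w = (weight on row playing H, weight on row playing T).
    # The column player wins on a mismatch, so it plays the opposite.
    return "T" if w[0] > w[1] else "H"


def bump(w, action):
    """Add 1 to the weight of the observed action."""
    return (w[0] + 1, w[1]) if action == "H" else (w[0], w[1] + 1)


def simulate(row_w, col_w, rounds):
    """Return the list of (row weights, column weights) after each round."""
    history = [(row_w, col_w)]
    for _ in range(rounds):
        r = row_best_response(row_w)
        c = col_best_response(col_w)
        row_w = bump(row_w, c)  # row observes the column player's action
        col_w = bump(col_w, r)  # column observes the row player's action
        history.append((row_w, col_w))
    return history


history = simulate((1.5, 2.0), (2.0, 1.5), rounds=7)
for t, (row_w, col_w) in enumerate(history):
    print(t, row_w, col_w)
```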


Convergence?

Strategies cycle and do not converge …

… but the marginal empirical distributions?

$$\frac{\kappa^i_t(s_j) - \kappa^i_0(s_j)}{t} \longrightarrow \frac{1}{2} \quad (t \to \infty)$$
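A longer run of the same matching-pennies process illustrates the point numerically. The sketch below assumes the initial weights (1.5, 2) and (2, 1.5) from the example and a horizon of 10,000 rounds (an arbitrary choice); it returns the fraction of rounds in which each player chose H, which is exactly the marginal empirical distribution evaluated at H.

```python
def empirical_frequencies(rounds):
    """Run fictitious play in matching pennies; return the H-frequencies."""
    row_w = [1.5, 2.0]  # row's weights on column playing H, T
    col_w = [2.0, 1.5]  # column's weights on row playing H, T
    row_h = col_h = 0
    for _ in range(rounds):
        row_plays_h = row_w[0] > row_w[1]        # row wants to match
        col_plays_h = not (col_w[0] > col_w[1])  # column wants to mismatch
        row_w[0 if col_plays_h else 1] += 1      # row observes column
        col_w[0 if row_plays_h else 1] += 1      # column observes row
        row_h += row_plays_h
        col_h += col_plays_h
    return row_h / rounds, col_h / rounds


row_freq_h, col_freq_h = empirical_frequencies(10_000)
print(row_freq_h, col_freq_h)
```

Both frequencies should end up close to 1/2 even though the realised play keeps cycling in runs of growing length.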


MATLAB Simulation - Pennies

[Figure panels: Game Play, Payoff, Weight / Time]


Proposition

Under fictitious play, if the empirical distributions over each player’s choices converge, the strategy profile corresponding to the product of these distributions is a Nash equilibrium.


Rock-Paper-Scissors

Payoffs (row player, column player):

             A          B          C
    A     1/2,1/2      0,1        1,0
    B       1,0      1/2,1/2      0,1
    C       0,1        1,0      1/2,1/2

[Figure panels: Game Play, Payoff, Weight / Time]


Shapley Game

Payoffs (row player, column player):

    0,0    0,1    1,0
    1,0    0,0    0,1
    0,1    1,0    0,0

[Figure panels: Game Play, Payoff, Weight / Time]


Persistent miscoordination

Payoffs (row player, column player):

            A      B
    A     0,0    1,1
    B     1,1    0,0

Nash: (1,0), (0,1), (0.5,0.5)

[Figure panels: Game Play, Payoff, Weight / Time; shown for initial weights 1.41 / 1.41, 1.42 / 1.42 and 2.42 / 2.42]
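The miscoordination effect can be reproduced with a small simulation. The sketch below assumes that both players start with the same weights (1, √2) on the opponent playing A and B, which is the textbook version of this example in Fudenberg and Levine; the slides only show the rounded values, so treat the exact starting point as an assumption. The function names and the 1,000-round horizon are likewise illustrative.

```python
import math

# Common payoff of both players, indexed by (row action, column action).
PAYOFF = {("A", "A"): 0, ("A", "B"): 1, ("B", "A"): 1, ("B", "B"): 0}


def best_response(weights):
    # weights = [weight on opponent playing A, weight on opponent playing B].
    # A payoff of 1 is earned only when the actions differ, so the best
    # response is the opposite of the action the opponent is expected to play.
    return "B" if weights[0] > weights[1] else "A"


def simulate(initial, rounds):
    """Both players start from the same weights and run fictitious play."""
    w1, w2 = list(initial), list(initial)
    plays, total = [], 0
    for _ in range(rounds):
        a1, a2 = best_response(w1), best_response(w2)
        plays.append((a1, a2))
        total += PAYOFF[(a1, a2)]
        w1[0 if a2 == "A" else 1] += 1  # player 1 observes player 2
        w2[0 if a1 == "A" else 1] += 1  # player 2 observes player 1
    return plays, total


plays, total_payoff = simulate((1.0, math.sqrt(2)), rounds=1000)
print(plays[:4], total_payoff)
```

Because both players hold identical beliefs, they always choose the same action and earn 0 in every round, while the empirical frequencies of A and B approach (0.5, 0.5), the mixed Nash equilibrium.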


Summary on fictitious play

  • If the empirical distributions converge, the time average of strategies forms a Nash equilibrium.
  • The average payoff need not be that of a Nash equilibrium (e.g. persistent miscoordination).
  • The time average may not converge at all (e.g. Shapley game).


References

Fudenberg D., Levine D. K. (1998). The Theory of Learning in Games. MIT Press.


Nash Convergence of Gradient Dynamics in General-Sum Games


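Notation

2 Players:

Strategies $(\alpha, 1-\alpha)$ for the row player and $(\beta, 1-\beta)$ for the column player

Payoff matrices

$$R = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix}, \qquad C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}$$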


Objective Functions

Payoff Functions:

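$$V_r(\alpha,\beta) = r_{11}(\alpha\beta) + r_{22}((1-\alpha)(1-\beta)) + r_{12}(\alpha(1-\beta)) + r_{21}((1-\alpha)\beta)$$

$$V_c(\alpha,\beta) = c_{11}(\alpha\beta) + c_{22}((1-\alpha)(1-\beta)) + c_{12}(\alpha(1-\beta)) + c_{21}((1-\alpha)\beta)$$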


Hillclimbing Idea


Gradient Ascent for Iterated Games

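$$\frac{\partial V_r(\alpha,\beta)}{\partial \alpha} = \beta u - (r_{22} - r_{12})$$

$$\frac{\partial V_c(\alpha,\beta)}{\partial \beta} = \alpha u' - (c_{22} - c_{21})$$

With $u = (r_{11} + r_{22}) - (r_{21} + r_{12})$ and $u' = (c_{11} + c_{22}) - (c_{21} + c_{12})$.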


Update Rule

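$$\alpha_{k+1} = \alpha_k + \eta \, \frac{\partial V_r(\alpha_k,\beta_k)}{\partial \alpha}$$

$$\beta_{k+1} = \beta_k + \eta \, \frac{\partial V_c(\alpha_k,\beta_k)}{\partial \beta}$$

with step size $\eta$; $(\alpha_0, \beta_0)$ can be arbitrary strategies.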


Problem

Gradient can lead the players to an infeasible point outside the unit square.

[Figure: the unit square, both axes running from 0 to 1]


Solution:

Redefine the gradient to the projection of the true gradient onto the boundary.

[Figure: the unit square, both axes running from 0 to 1]

Let this denote the constrained dynamics!
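A discrete-step version of the constrained dynamics can be sketched as follows. Two things here are assumptions rather than the slides' method: clipping each probability back into [0, 1] after every step is used as a simple stand-in for projecting the gradient onto the boundary, and the payoff matrices are a prisoner's-dilemma-style game chosen only because its unique Nash equilibrium (both players choosing action 2, i.e. probability 0 on action 1) is easy to verify by hand.

```python
def gradients(R, C, alpha, beta):
    """Partial derivatives of V_r w.r.t. alpha and of V_c w.r.t. beta."""
    u = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    u_prime = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    d_alpha = beta * u - (R[1][1] - R[0][1])
    d_beta = alpha * u_prime - (C[1][1] - C[1][0])
    return d_alpha, d_beta


def clip(x):
    """Keep a probability inside the unit interval."""
    return min(1.0, max(0.0, x))


def gradient_ascent(R, C, alpha, beta, eta=0.01, steps=1000):
    """Simultaneous gradient ascent with clipping to the unit square."""
    for _ in range(steps):
        d_alpha, d_beta = gradients(R, C, alpha, beta)
        alpha = clip(alpha + eta * d_alpha)
        beta = clip(beta + eta * d_beta)
    return alpha, beta


# Assumed example game: action 2 strictly dominates action 1 for both players.
R = [[3, 0], [5, 1]]
C = [[3, 5], [0, 1]]
alpha, beta = gradient_ascent(R, C, alpha=0.5, beta=0.5)
print(alpha, beta)
```

In this game both gradients are negative everywhere on the unit square, so the unclipped update would leave the square; the clipped update stops at the corner (0, 0).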


Infinitesimal Gradient Ascent (IGA)

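In the limit $\eta \to 0$:

$$\begin{pmatrix} \partial\alpha / \partial t \\ \partial\beta / \partial t \end{pmatrix} = \begin{pmatrix} 0 & u \\ u' & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} + \begin{pmatrix} -(r_{22} - r_{12}) \\ -(c_{22} - c_{21}) \end{pmatrix}$$

Here $U$ denotes the matrix with off-diagonal entries $u$ and $u'$, and $(\alpha(t), \beta(t))$ become functions of time!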


Case 1: U is invertible

The two possible qualitative forms of the unconstrained strategy pair:


Case 2: U is not invertible

Some examples of qualitative forms of the unconstrained strategy pair:


Convergence

If both players follow the IGA rule, then both players' average payoffs will converge to the expected payoffs of some Nash equilibrium.

If the strategy pair trajectory converges at all, then it converges to a Nash pair.


Proposition

Both previous propositions also hold with finite decreasing step size


References

Singh S., Kearns M., Mansour Y. (2000). Nash Convergence of Gradient Dynamics in General-Sum Games. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, pages 541-548.


Dynamic computation of Nash equilibria in Two-Player general-sum games.


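Notation

2 Players:

Strategies

$$p = \begin{pmatrix} p_1 \\ \vdots \\ p_n \end{pmatrix} \quad \text{and} \quad q = \begin{pmatrix} q_1 \\ \vdots \\ q_n \end{pmatrix}$$

Payoff matrices

$$R = \begin{pmatrix} r_{11} & \cdots & r_{1n} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & r_{nn} \end{pmatrix}, \qquad C = \begin{pmatrix} c_{11} & \cdots & c_{1n} \\ \vdots & \ddots & \vdots \\ c_{n1} & \cdots & c_{nn} \end{pmatrix}$$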


Objective Functions

Payoff Functions:

Row Player: $V_r(p,q) = p^T R q$

Col Player: $V_c(p,q) = p^T C q$
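The bilinear payoffs are straightforward to evaluate directly. The sketch below uses plain Python lists; the helper name `expected_payoff` and the matching-pennies matrices used as sample data are assumptions for illustration.

```python
def expected_payoff(p, M, q):
    """Return p^T M q for probability vectors p, q and payoff matrix M."""
    return sum(
        p[i] * M[i][j] * q[j]
        for i in range(len(p))
        for j in range(len(q))
    )


# Sample data (assumed): matching pennies, a zero-sum game with C = -R.
R = [[1, -1], [-1, 1]]
C = [[-1, 1], [1, -1]]

p = [0.5, 0.5]
q = [0.25, 0.75]
v_r = expected_payoff(p, R, q)  # row player's expected payoff
v_c = expected_payoff(p, C, q)  # column player's expected payoff

# A pure strategy x_i is a unit vector, e.g. x_1 = [1, 0].
v_r_pure = expected_payoff([1, 0], R, q)
```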


Observation!

$V_r(p,q)$ is linear in each $p_i$ and $q_j$.

Let $x_i$ denote the pure strategy for action i.

This means: if $V_r(x_i, q) > V_r(p, q)$, then increasing the value of $p_i$ increases the payoff.


Hill climbing (again)

Multiplicative Update Rules

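$$\frac{\partial p_i}{\partial t} = p_i(t)\,[V_r(x_i,q) - V_r(p,q)] = p_i(t)\,[(Rq)_i - p^T R q]$$

$$\frac{\partial q_i}{\partial t} = q_i(t)\,[V_c(p,x_i) - V_c(p,q)] = q_i(t)\,[(p^T C)_i - p^T C q]$$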


Hill climbing (again)

System of Differential Equations (i=1..n)

$$\frac{\partial p_i}{\partial t} = p_i(t)\,[(Rq)_i - p^T R q]$$

$$\frac{\partial q_i}{\partial t} = q_i(t)\,[(p^T C)_i - p^T C q]$$
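One way to explore this system numerically is a forward-Euler integration, sketched below. The step size, the number of steps, and the example game (a prisoner's-dilemma-style game in which action 2 strictly dominates for both players) are assumptions made for illustration; a production integrator would use an adaptive ODE solver instead.

```python
def mat_vec(M, v):
    """Return the vector M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def vec_mat(v, M):
    """Return the row vector v^T M as a list."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def step(p, q, R, C, dt):
    """One forward-Euler step of the multiplicative update rules."""
    Rq = mat_vec(R, q)   # (Rq)_i = V_r(x_i, q)
    pC = vec_mat(p, C)   # (p^T C)_i = V_c(p, x_i)
    v_r = dot(p, Rq)     # V_r(p, q)
    v_c = dot(pC, q)     # V_c(p, q)
    new_p = [p_i + dt * p_i * (Rq[i] - v_r) for i, p_i in enumerate(p)]
    new_q = [q_i + dt * q_i * (pC[i] - v_c) for i, q_i in enumerate(q)]
    return new_p, new_q


def integrate(p, q, R, C, dt=0.01, steps=5000):
    for _ in range(steps):
        p, q = step(p, q, R, C, dt)
    return p, q


# Assumed example game: action 2 strictly dominates for both players.
R = [[3, 0], [5, 1]]
C = [[3, 5], [0, 1]]
p, q = integrate([0.5, 0.5], [0.5, 0.5], R, C)
print(p, q)
```

The update leaves the sum of each strategy vector unchanged, so no separate normalisation step is needed; in this example both players' weight on action 2 tends to 1, the unique Nash equilibrium.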


Fixed Points?

$$\frac{\partial p_i}{\partial t} = p_i(t)\,[(Rq)_i - p^T R q] = 0$$

so for every i either $p_i(t) = 0$ or $(Rq)_i - p^T R q = 0$.


When is a Fixed Point a Nash?

• Proposition: Provided all $p_i(0)$ are neither 0 nor 1, if $(p,q)$ converges to $(p^*,q^*)$, then $(p^*,q^*)$ is a Nash equilibrium.
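The interior-start condition matters because every pure strategy profile is a fixed point of the multiplicative update, whether or not it is a Nash equilibrium. The sketch below illustrates this with an assumed prisoner's-dilemma-style game: starting exactly at the pure profile where both players choose action 1, the trajectory never moves, although each player would gain by deviating to action 2. The function name and the Euler step size are illustrative choices.

```python
def replicator_step(p, q, R, C, dt=0.01):
    """One forward-Euler step of the multiplicative update rules."""
    Rq = [sum(R[i][j] * q[j] for j in range(len(q))) for i in range(len(p))]
    pC = [sum(p[i] * C[i][j] for i in range(len(p))) for j in range(len(q))]
    v_r = sum(p_i * x for p_i, x in zip(p, Rq))
    v_c = sum(x * q_j for x, q_j in zip(pC, q))
    p_next = [p_i + dt * p_i * (Rq[i] - v_r) for i, p_i in enumerate(p)]
    q_next = [q_j + dt * q_j * (pC[j] - v_c) for j, q_j in enumerate(q)]
    return p_next, q_next


# Assumed example game: action 2 strictly dominates for both players.
R = [[3, 0], [5, 1]]
C = [[3, 5], [0, 1]]

# Start on the boundary: both players put probability 1 on action 1.
p, q = [1.0, 0.0], [1.0, 0.0]
for _ in range(1000):
    p, q = replicator_step(p, q, R, C)

# Row player's gain from switching to the best pure strategy against q.
Rq = [sum(R[i][j] * q[j] for j in range(2)) for i in range(2)]
deviation_gain = max(Rq) - sum(p_i * x for p_i, x in zip(p, Rq))
print(p, q, deviation_gain)
```

A positive `deviation_gain` at a point where the dynamics are stationary shows that this fixed point is not a Nash equilibrium.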


Unit Square?

No Problem! $p_i = 0$ or $p_i = 1$ both set $\partial p_i / \partial t$ to zero!


Convergence of the average of the payoff

If the (p,q) trajectory and both players' payoffs converge in average, the average payoff must be the payoff of some Nash equilibrium.


2 Player 2 Action Case

Either the strategies converge immediately to some pure strategy, or the difference between the Kullback-Leibler distances of (p,q) and some mixed Nash $(p^*,q^*)$ is constant.

$$KL(p, p^*) = p^* \log\left(\frac{p^*}{p}\right) + (1 - p^*) \log\left(\frac{1 - p^*}{1 - p}\right)$$

$$KL(p, p^*) - KL(q, q^*) = \text{const.}$$


Trajectories of the difference between the Kullback-Leibler Distances

[Figure: trajectory plot with the Nash point labelled]


But…

… for games with more than 2 actions, convergence is not guaranteed! Counterexample: Shapley Game
