Computing Nash Equilibrium

1

Computing Nash Equilibrium

Presenter: Yishay Mansour

2

Outline

• Problem Definition

• Notation

• Today: Zero-Sum game

• Next week: General Sum Games

– Multiple players

3

Model

• Multiple players $N = \{1, \ldots, n\}$

• Strategy set

– Player i has m actions $S_i = \{s_{i1}, \ldots, s_{im}\}$

– $S_i$ are the pure actions of player i

– $S = \times_i S_i$

• Payoff functions

– Player i: $u_i : S \to \mathbb{R}$

4

Strategies

• Pure strategies: actions

• Mixed strategy

– Player i: $p_i$ is a distribution over $S_i$

– Game: $P = \prod_i p_i$

• Product distribution

• Modified distribution

– $P_{-i}$ = the distribution P without player i's strategy

– $(q, P_{-i})$ = player i plays q, every other player j plays $p_j$

5

Notations

• Average payoff

– Player i: $u_i(P) = E_{s \sim P}[u_i(s)] = \sum_s P(s)\, u_i(s)$

– $P(s) = \prod_i p_i(s_i)$

• Nash Equilibrium

– $P^*$ is a Nash equilibrium if, for every player i

– and any distribution $q_i$:

– $u_i(q_i, P^*_{-i}) \le u_i(P^*)$

• Best response

6

Notations

• Alternative payoff

– $x_{ij}(P) = u_i(s_{ij}, P_{-i}) = E_{s \sim P}[u_i(s) \mid s_i = s_{ij}]$

• Difference in payoff

– $z_{ij}(P) = x_{ij}(P) - u_i(P)$

• Improvement in payoff

– $g_{ij}(P) = \max\{z_{ij}(P), 0\}$
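The three quantities above can be sketched for the row player of a two-player game. This is a minimal illustration, not part of the original slides: the function name `payoff_terms` and the matching-pennies matrix are assumptions chosen for the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def payoff_terms(
    A: Matrix, p: List[float], q: List[float]
) -> Tuple[float, List[float], List[float], List[float]]:
    """Return (u, x, z, g) for the row player with payoff matrix A.

    p is the row player's mixed strategy, q the column player's.
    """
    # x[j]: alternative payoff of pure row j against q
    x = [sum(a * qk for a, qk in zip(row, q)) for row in A]
    # u: average payoff of the mixed strategy p against q
    u = sum(pj * xj for pj, xj in zip(p, x))
    # z[j]: difference in payoff; g[j]: improvement in payoff
    z = [xj - u for xj in x]
    g = [max(zj, 0.0) for zj in z]
    return u, x, z, g


# Matching pennies, row player's payoffs (hypothetical example data)
A = [[1.0, -1.0], [-1.0, 1.0]]
u, x, z, g = payoff_terms(A, p=[0.5, 0.5], q=[0.75, 0.25])
```

Here the column player leans toward the first column, so only the first row has a positive improvement g; at a Nash equilibrium every g would be zero.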

7

Fixed point Theorems

• Intermediate Value Theorem

– domain [a,b]

– function f continuous

– f(a) f(b) < 0

– there exists z such that f(z) = 0

– Proof: $M^+ = \{x \mid f(x) \ge 0\}$ and $M^- = \{x \mid f(x) \le 0\}$ are nonempty closed sets that cover [a,b]; since [a,b] is connected, they have an intersection.

8

Brouwer’s Fixed point theorem

• $f: S \to S$ continuous, S compact and convex

• There exists z in S: z = f(z)

– For S = [0,1], follows from the previous theorem applied to f(x) – x

9

Kakutani’s Fixed Point Theorem

• $L: S \to S$ correspondence

– L(x) is a convex set

– L upper semi-continuous

– S compact and convex

• There exists z: z in L(z)

10

Nash Equilibrium I

• Best response correspondence

– $L(P) = \arg\max_Q \{u_i(q_i, P_{-i})\}$, taken for every player i

– L is a correspondence, upper semi-continuous

– Nash is a fixed point of L

• $P^* \in L(P^*)$

– Kakutani’s fixed point theorem

11

Nash Equilibrium II

• Fixed point

– K(P) has mn parameters (one per player and action)

– $K_{ij}(P) = \dfrac{p_{ij} + g_{ij}(P)}{1 + \sum_k g_{ik}(P)}$

– Nash is a fixed point of K

• $P^* = K(P^*)$

– Original proof of Nash

– Continuous function on a compact space

• Brouwer’s fixed point theorem
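A minimal sketch of one application of the map K for the row player of a two-player game, holding the column player's strategy q fixed. It divides by 1 plus the sum of the player's improvements, which is what keeps the output a probability distribution. The name `nash_map_row` and the matching-pennies data are assumptions made for illustration.

```python
from typing import List


def nash_map_row(A: List[List[float]], p: List[float], q: List[float]) -> List[float]:
    """Apply K to the row player's strategy p against a fixed q."""
    # Alternative payoff of each pure row against q
    x = [sum(a * qk for a, qk in zip(row, q)) for row in A]
    # Average payoff of p against q
    u = sum(pj * xj for pj, xj in zip(p, x))
    # Improvement g_j = max(x_j - u, 0)
    g = [max(xj - u, 0.0) for xj in x]
    # Normalize so that the new weights sum to one
    total = 1.0 + sum(g)
    return [(pj + gj) / total for pj, gj in zip(p, g)]


# Matching pennies, row player's payoffs (hypothetical example data)
A = [[1.0, -1.0], [-1.0, 1.0]]
fixed = nash_map_row(A, [0.5, 0.5], [0.5, 0.5])    # equilibrium: p is unchanged
moved = nash_map_row(A, [0.5, 0.5], [0.75, 0.25])  # off equilibrium: p shifts
```

At the equilibrium all improvements vanish and the map returns p itself; away from it, probability mass moves toward the rows that improve on the current payoff.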

12

Nash Equilibrium III

• Non-linear complementarity problem (NCP)

– Recall $z_{ij}(P)$

– For every player i and action $s_{ij}$:

• $z_{ij}(P) \cdot p_{ij} = 0$

• $z_i(P)$ is orthogonal to $p_i$

– Nash: $z(P^*) \le 0$

• $z_{ij}(P^*) \le 0$

13

Nash Equilibrium IV

• Stationary point problem

– Recall: x = alternative payoff

– Nash: $P^*$

– For every P

– $(P - P^*) \cdot x(P^*) \le 0$

• $\sum_{i,j} (p_{ij} - p^*_{ij})\, x_{ij}(P^*) \le 0$

14

Nash Equilibrium V

• Minimizing a function

– Objective function:

– $V(P) = \sum_i \sum_j [g_{ij}(P)]^2$

– V(P) is a continuous, differentiable, non-negative function

– Nash: $V(P^*) = 0$

• Local Minima

15

Nash Equilibrium VI

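• Semi-Algebraic set

– distribution P: $\sum_j p_{ij} = 1$

– difference in payoff:

• $z_{ij}(P) \le 0$

• $z_{ij}(P) = x_{ij}(P) - u_i(P) \le 0$

• Explicitly, with $s_{-i}$ the actions of all players other than i:

$$z_{ij}(P) = \sum_{s_{-i}} u_i(s_{ij}, s_{-i}) \prod_{k \ne i} p_k(s_k) \;-\; \sum_{s = (s_1, \ldots, s_n) \in S} u_i(s) \prod_{k} p_k(s_k)$$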

16

Two player games

• Payoff matrices (A,B)

– m rows and n columns

– player 1 has m actions, player 2 has n actions

• strategies p and q

• Payoffs: $u_1(p,q) = p A q^t$ and $u_2(p,q) = p B q^t$

• Zero sum game

– A = –B

17

Linear Programming

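• Primal LP:

• x in $SET_{primal}$ is feasible

• maximize <c,x> subject to x in $SET_{primal}$

$$SET_{primal} = \Big\{ x \in \mathbb{R}^n : \ \sum_j a_{ij} x_j = b_i \ \text{(equality constraints } i\text{)}, \quad \sum_j a_{ij} x_j \le b_i \ \text{(inequality constraints } i\text{)}, \quad x_j \ge 0 \ \text{(sign-constrained variables } j\text{)} \Big\}$$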

18

Linear Programming

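• Dual LP:

• y in $SET_{dual}$ is feasible

• minimize <b,y> subject to y in $SET_{dual}$

$$SET_{dual} = \Big\{ y \in \mathbb{R}^m : \ \sum_i y_i a_{ij} = c_j \ \text{(free primal variables } j\text{)}, \quad \sum_i y_i a_{ij} \ge c_j \ \text{(sign-constrained primal variables } j\text{)}, \quad y_i \ge 0 \ \text{(primal inequality constraints } i\text{)} \Big\}$$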

19

Duality Theorem

• Weak duality: $\langle c, x \rangle \le \langle b, y \rangle$

– for any feasible x and y

– proof!

• Strong Duality

– If there are feasible solutions then

– $\langle c, x \rangle = \langle b, y \rangle$ for some feasible x and y

– sketch of proof.
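A sketch of the weak duality argument, written under the assumption that the constraints are in inequality form: primal feasibility means $\sum_j a_{ij} x_j \le b_i$ and $x_j \ge 0$, dual feasibility means $\sum_i y_i a_{ij} \ge c_j$ and $y_i \ge 0$. Rows or columns constrained with equality go through the same chain with equality at the corresponding step.

```latex
\begin{align*}
\langle c, x \rangle = \sum_j c_j x_j
  &\le \sum_j \Big( \sum_i y_i a_{ij} \Big) x_j
  && \text{since } x_j \ge 0 \text{ and } \sum_i y_i a_{ij} \ge c_j \\
  &= \sum_i y_i \Big( \sum_j a_{ij} x_j \Big)
  && \text{exchanging the order of summation} \\
  &\le \sum_i y_i b_i = \langle b, y \rangle
  && \text{since } y_i \ge 0 \text{ and } \sum_j a_{ij} x_j \le b_i
\end{align*}
```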

20

Two players zero sum

• Fix strategy q of player 2

• Player 1 best response:

– maximize $p (A q^t)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$

– dual LP: minimize u such that $u \ge A q^t$ (in every coordinate)

• Player 2: select strategy q:

– minimize u such that $u \ge A q^t$, $\sum_i q_i = 1$ and $q_i \ge 0$

– dual (strategy for player 1)

– maximize v such that $v \le p A$, $\sum_j p_j = 1$ and $p_j \ge 0$

• There exists a unique value v.
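The two LPs above give a simple optimality certificate that can be checked without an LP solver: for any p, the smallest entry of pA is a feasible v, and for any q, the largest entry of Aq^t is a feasible u, with v ≤ u by weak duality. When the two coincide, (p, q) is optimal and the common number is the value. A minimal sketch, with the function name `security_levels` and the rock-paper-scissors matrix as illustrative assumptions:

```python
from typing import List, Tuple


def security_levels(
    A: List[List[float]], p: List[float], q: List[float]
) -> Tuple[float, float]:
    """Return (v_low, u_high): what p guarantees player 1, and what q concedes."""
    rows, cols = len(A), len(A[0])
    # p A: player 1's expected payoff against each pure column
    pA = [sum(p[i] * A[i][j] for i in range(rows)) for j in range(cols)]
    # A q^t: expected payoff of each pure row against q
    Aq = [sum(A[i][j] * q[j] for j in range(cols)) for i in range(rows)]
    return min(pA), max(Aq)


# Rock-paper-scissors, payoffs to player 1 (hypothetical example data)
A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]

low, high = security_levels(A, uniform, uniform)      # the two levels coincide
low2, high2 = security_levels(A, [1, 0, 0], uniform)  # a pure p guarantees less
```

For the uniform pair the two levels coincide, which certifies optimality; for the pure strategy the gap between them shows that p is not optimal.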

21

Example

22

Summary

• Two players zero sum

– linear programming

– polynomial time

– can have multiple Nash equilibria

– unique value!

– If (p,q) and (p’,q’) are Nash then

– (p,q’) and (p’,q) are Nash

23

Online learning

• Playing with unknown payoff matrix

• Online algorithm:

– at each step selects an action

• can be stochastic or fractional

– observes all possible payoffs

– updates its parameters

• Goal: achieve the value of the game

– payoff matrix of the “game” is defined at the end

24

Online learning - Algorithm

• Notations:

– Opponent distribution $Q_t$

– Our distribution $P_t$

– Observed cost $M(i, Q_t)$ for each action i

• the i-th entry of the vector $M Q_t$

– Goal: minimize cost

• Algorithm: Exponential weights

– Action i has weight proportional to $b^{L(i,t)}$

– L(i,t) = loss of action i until time t

25

Online algorithm: Notations

• Formally:

– parameter: b, with 0 < b < 1

– $w_{t+1}(i) = w_t(i)\, b^{M(i, Q_t)}$

– $Z_t = \sum_i w_t(i)$

– $P_{t+1}(i) = w_{t+1}(i) / Z_{t+1}$

– Number of total steps T is known
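A minimal sketch of the update rule, assuming uniform initial weights $w_1(i) = 1$ (the starting weights are not stated on the slide) and a loss matrix with entries in [0,1]. The function name `exponential_weights` and the example data are illustrative assumptions.

```python
import math
from typing import List, Tuple


def exponential_weights(
    M: List[List[float]], opponent_plays: List[List[float]], b: float
) -> Tuple[List[List[float]], float]:
    """Run the multiplicative update; return (P_1..P_T, total expected loss)."""
    n = len(M)
    w = [1.0] * n  # assumed start: equal weights, so P_1 is uniform
    history: List[List[float]] = []
    total_loss = 0.0
    for Q in opponent_plays:
        Z = sum(w)
        P = [wi / Z for wi in w]  # P_t(i) = w_t(i) / Z_t
        # M(i, Q_t): expected loss of pure action i against Q_t
        loss = [sum(M[i][j] * Q[j] for j in range(len(Q))) for i in range(n)]
        total_loss += sum(P[i] * loss[i] for i in range(n))  # M(P_t, Q_t)
        history.append(P)
        # w_{t+1}(i) = w_t(i) * b^{M(i, Q_t)}
        w = [w[i] * b ** loss[i] for i in range(n)]
    return history, total_loss


# Hypothetical example: two actions, the opponent always plays column 0,
# so action 0 never incurs loss and action 1 always does.
M = [[0.0, 1.0], [1.0, 0.0]]
T = 100
b = 1.0 / (1.0 + math.sqrt(2.0 * math.log(2) / T))
history, total_loss = exponential_weights(M, [[1.0, 0.0]] * T, b)
```

In this run the weight of the losing action decays geometrically, so almost all probability ends up on the action with zero loss.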

26

Online algorithm: Theorem

• Theorem

– For any matrix M with entries in [0,1]

– Any sequence of distributions $Q_1, \ldots, Q_T$

– The algorithm generates $P_1, \ldots, P_T$

– $RE(A \| B) = E_{x \sim A}[\ln(A(x) / B(x))]$

$$\sum_{t=1}^{T} M(P_t, Q_t) \le \min_P \left[ \frac{\ln(1/b)}{1-b} \sum_{t=1}^{T} M(P, Q_t) + \frac{1}{1-b}\, RE(P \| P_1) \right]$$

27

Online algorithm: Analysis

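• Lemma

– For any mixed strategy P

$$RE(P \| P_{t+1}) - RE(P \| P_t) \le \ln(1/b)\, M(P, Q_t) + \ln\big(1 - (1-b)\, M(P_t, Q_t)\big)$$

• Corollary

$$\sum_{t=1}^{T} M(P_t, Q_t) \le \frac{\ln(1/b)}{1-b} \min_P \sum_{t=1}^{T} M(P, Q_t) + \frac{\ln n}{1-b}$$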

28

Online Algorithm: Optimization

• b= 1/(1 + sqrt{2 (ln n) / T})

• Average Loss: v + O(sqrt{(ln n )/T})
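A sketch of how this choice of b yields the average-loss bound, starting from the corollary $\sum_t M(P_t, Q_t) \le \frac{\ln(1/b)}{1-b} \min_P \sum_t M(P, Q_t) + \frac{\ln n}{1-b}$. It relies on the inequality $\ln(1/b) \le (1-b^2)/(2b)$ for $0 < b \le 1$, which is not on the slides and is brought in here as an assumption. Write $\varepsilon = \sqrt{2 (\ln n)/T}$, so that $b = 1/(1+\varepsilon)$, and $L^* = \min_P \sum_t M(P, Q_t) \le T$.

```latex
\begin{align*}
\frac{\ln(1/b)}{1-b} &\le \frac{1+b}{2b} = 1 + \frac{\varepsilon}{2},
\qquad
\frac{\ln n}{1-b} = \ln n + \frac{\ln n}{\varepsilon} \\[4pt]
\sum_{t=1}^{T} M(P_t, Q_t)
  &\le L^* + \frac{\varepsilon}{2}\, T + \frac{\ln n}{\varepsilon} + \ln n
   = L^* + \sqrt{2 T \ln n} + \ln n \\[4pt]
\frac{1}{T} \sum_{t=1}^{T} M(P_t, Q_t)
  &\le \frac{L^*}{T} + \sqrt{\frac{2 \ln n}{T}} + \frac{\ln n}{T}
\end{align*}
```

Since M is linear in Q, $L^*/T = \min_P M(P, \bar{Q})$ for the average opponent distribution $\bar{Q}$, which is at most the value v; this gives the stated $v + O(\sqrt{(\ln n)/T})$.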

29

Two players General sum games

• Input matrices (A,B)

• No unique value

• Computational issues: find some, all Nash

• Player 1 best response:

– Like for zero sum:

– Fix strategy q of player 2

– maximize $p (A q^t)$ such that $\sum_j p_j = 1$ and $p_j \ge 0$

– dual LP: minimize u such that $u \ge A q^t$ (in every coordinate)

30

Two players General sum games

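• Assume the support of strategies known.

– p has support $S_p$ and q has support $S_q$

– Can formulate the Nash as LP:

Constraints for player 1, with payoff v:

$$\sum_j a_{ij} q_j = v \ \text{ for } i \in S_p, \qquad \sum_j a_{ij} q_j \le v \ \text{ for } i \notin S_p$$

$$p_i \ge 0 \ \text{ for } i \in S_p, \qquad p_i = 0 \ \text{ for } i \notin S_p, \qquad \sum_i p_i = 1$$

Constraints for player 2, with payoff u:

$$\sum_i p_i b_{ij} = u \ \text{ for } j \in S_q, \qquad \sum_i p_i b_{ij} \le u \ \text{ for } j \notin S_q$$

$$q_j \ge 0 \ \text{ for } j \in S_q, \qquad q_j = 0 \ \text{ for } j \notin S_q, \qquad \sum_j q_j = 1$$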
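For a 2x2 game in which both supports are assumed to contain both actions, the support conditions reduce to two indifference equations that can be solved in closed form. The sketch below covers only that special case, not the general LP; the function name `full_support_nash_2x2` and the payoff matrices are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def full_support_nash_2x2(
    A: List[List[float]], B: List[List[float]]
) -> Optional[Tuple[List[float], List[float]]]:
    """Solve the indifference equations of a 2x2 game with full supports.

    Returns (p, q), or None when the assumed supports admit no solution.
    """
    # q must make player 1 indifferent between the rows: (A q)_0 = (A q)_1
    denom_q = (A[0][0] - A[1][0]) - (A[0][1] - A[1][1])
    # p must make player 2 indifferent between the columns: (p B)_0 = (p B)_1
    denom_p = (B[0][0] - B[0][1]) - (B[1][0] - B[1][1])
    if denom_q == 0 or denom_p == 0:
        return None  # degenerate indifference equations
    q0 = (A[1][1] - A[0][1]) / denom_q
    p0 = (B[1][1] - B[1][0]) / denom_p
    # Probabilities must be non-negative and sum to one
    if not (0 <= p0 <= 1 and 0 <= q0 <= 1):
        return None
    return [p0, 1 - p0], [q0, 1 - q0]


# A coordination game with different preferences (hypothetical example data)
A = [[2.0, 0.0], [0.0, 1.0]]  # player 1's payoffs
B = [[1.0, 0.0], [0.0, 2.0]]  # player 2's payoffs
p, q = full_support_nash_2x2(A, B)
```

With larger games the same idea requires trying every pair of supports and solving a linear program for each, which is where the exponential cost of support enumeration comes from.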

31

Approximate Nash

32

Lemke & Howson

33

Example
