
Experts Learning and The Minimax Theorem for Zero-Sum Games Maria Florina Balcan December 8th 2011


Page 1: Experts Learning and The Minimax Theorem for Zero-Sum Games Maria Florina Balcan December 8th 2011


Page 2

Motivation

Many situations involve repeated decision making:

• Deciding how to invest your money (buy or sell stocks)
• What route to drive to work each day
• Playing a game repeatedly against an opponent with an unknown strategy

This course: learning algorithms for such settings, with connections to game-theoretic notions of equilibria.

Page 3

Roadmap

Last lecture: Online learning; combining expert advice; the Weighted Majority Algorithm.

This lecture: Online learning, game theory, minimax optimality.

Page 4

Recap: Online learning, minimizing regret, and combining expert advice.

• “The Weighted Majority Algorithm”, N. Littlestone & M. Warmuth
• “Online Algorithms in Machine Learning” (survey), A. Blum
• Algorithmic Game Theory, Nisan, Roughgarden, Tardos, Vazirani (eds) [Chapter 4]

Page 5

Online learning, minimizing regret, and combining expert advice.

[Figure: Expert 1, Expert 2, Expert 3]

Page 6

Using “expert” advice

Assume we want to predict the stock market.

• We solicit n “experts” for their advice. (Will the market go up or down?)
• We then want to use their advice somehow to make our prediction.

Note: “expert” ≡ someone with an opinion. [Not necessarily someone who knows anything.]

Can we do nearly as well as the best expert in hindsight?

Page 7

Formal model

• There are n experts.
• For each round t = 1, 2, …, T:
   – Each expert makes a prediction in {0,1}.
   – The learner (using the experts’ predictions) makes a prediction in {0,1}.
   – The learner observes the actual outcome. There is a mistake if the predicted outcome is different from the actual outcome.

Can we do nearly as well as the best expert in hindsight?

Page 8

Weighted Majority Algorithm

Key Point: A mistake doesn't completely disqualify an expert. Instead of crossing off, just lower its weight.

– Start with all experts having weight 1.
– Predict based on weighted majority vote: if $\sum_{i:\,x_i = 1} w_i \ge \sum_{i:\,x_i = 0} w_i$ then predict 1, else predict 0 (here $x_i$ is expert i's prediction and $w_i$ its weight).

Page 9

Weighted Majority Algorithm

Key Point: A mistake doesn't completely disqualify an expert. Instead of crossing off, just lower its weight.

– Start with all experts having weight 1.
– Predict based on weighted majority vote.
– Penalize mistakes by cutting weight in half.
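The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the lecture: the function name `weighted_majority`, the tie-breaking rule (predict 1 on ties), and the toy data are all assumptions made for the example.

```python
def weighted_majority(expert_predictions, outcomes):
    """Deterministic Weighted Majority on binary predictions.

    expert_predictions[t][i] is expert i's prediction (0 or 1) in round t,
    outcomes[t] is the true outcome of round t.
    Returns (number of learner mistakes, final weights).
    """
    n = len(expert_predictions[0])
    weights = [1.0] * n  # start with all experts having weight 1
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        weight_for_1 = sum(w for w, x in zip(weights, preds) if x == 1)
        weight_for_0 = sum(w for w, x in zip(weights, preds) if x == 0)
        # weighted majority vote (ties go to 1: an assumption of this sketch)
        guess = 1 if weight_for_1 >= weight_for_0 else 0
        if guess != outcome:
            mistakes += 1
        # penalize every expert that was wrong by cutting its weight in half
        weights = [w / 2 if x != outcome else w for w, x in zip(weights, preds)]
    return mistakes, weights


# Hypothetical toy data: 3 experts, 4 rounds; expert 0 is always right.
preds = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
outcomes = [1, 0, 1, 0]
mistakes, weights = weighted_majority(preds, outcomes)
```

On this toy input the two unreliable experts outvote the good one early on, but their weights shrink quickly, so the learner follows expert 0 from the third round onward.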

Page 10

Analysis: do nearly as well as best expert in hindsight

Theorem: If M = # mistakes we've made so far and OPT = # mistakes the best expert has made so far, then:

$M \le 2.4\,(\mathrm{OPT} + \lg n)$
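For reference, the standard potential argument behind the $2.4(\mathrm{OPT} + \lg n)$ bound can be written out as follows. This is a sketch of the usual proof, not text recovered from the slide; the symbol $W$ for the total weight is introduced here for the derivation.

```latex
% W = total weight of all experts; initially W = n.
% When the learner errs, at least half of W sat on wrong experts and is halved,
% so W shrinks by at least a factor 3/4 on every learner mistake:
W \le n \left(\tfrac{3}{4}\right)^{M}.
% The best expert was halved OPT times, and its weight is a lower bound on W:
\left(\tfrac{1}{2}\right)^{\mathrm{OPT}} \le W \le n \left(\tfrac{3}{4}\right)^{M}
\;\Longrightarrow\;
\left(\tfrac{4}{3}\right)^{M} \le n \, 2^{\mathrm{OPT}}
\;\Longrightarrow\;
M \le \frac{\mathrm{OPT} + \lg n}{\lg (4/3)} \approx 2.4\,(\mathrm{OPT} + \lg n).
```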

Page 11

Randomized Weighted Majority

2.4(OPT + lg n) is not so good if the best expert makes a mistake 20% of the time. Can we do better?

• Yes. Instead of taking the majority vote, use the weights as probabilities (e.g., if 70% on up, 30% on down, then pick 70:30). This is equivalent to selecting an expert with probability proportional to its weight.
• Also, generalize ½ to 1 − ε.

Key Point: smooth out the worst case.
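A minimal Python sketch of the randomized variant is given below. The function name, the `seed` parameter, and the toy data are assumptions for illustration; the sketch also tracks the expected number of mistakes (the weight fraction on wrong experts each round), which is the quantity the guarantee is stated for.

```python
import random


def randomized_weighted_majority(expert_predictions, outcomes, eps=0.5, seed=0):
    """Randomized Weighted Majority on binary predictions.

    Each round, follow one expert drawn with probability proportional to its
    weight, then multiply the weight of every wrong expert by (1 - eps).
    Returns (realized mistakes, expected mistakes, final weights).
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(expert_predictions[0])
    weights = [1.0] * n
    mistakes = 0
    expected_mistakes = 0.0
    for preds, outcome in zip(expert_predictions, outcomes):
        total = sum(weights)
        # probability of erring this round = weight fraction on wrong experts
        wrong_mass = sum(w for w, x in zip(weights, preds) if x != outcome)
        expected_mistakes += wrong_mass / total
        # select an expert with probability proportional to its weight
        chosen = rng.choices(range(n), weights=weights, k=1)[0]
        if preds[chosen] != outcome:
            mistakes += 1
        weights = [w * (1 - eps) if x != outcome else w
                   for w, x in zip(weights, preds)]
    return mistakes, expected_mistakes, weights


# Hypothetical toy data: 3 experts, 4 rounds; expert 0 is always right.
preds = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
outcomes = [1, 0, 1, 0]
realized, expected, weights = randomized_weighted_majority(preds, outcomes, eps=0.5)
```

With `eps=0.5` the weight updates coincide with halving, so the only change from the deterministic algorithm is how the prediction is chosen.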

Page 12

Randomized Weighted Majority

Page 13

Formal Guarantee for Randomized Weighted Majority

Theorem: If M = expected # mistakes we've made so far and OPT = # mistakes the best expert has made so far, then:

$M \le (1+\varepsilon)\,\mathrm{OPT} + (1/\varepsilon)\log(n)$

Page 14

Randomized Weighted Majority

Solves to: $M \le (1+\varepsilon)\,\mathrm{OPT} + (1/\varepsilon)\log(n)$

Page 15

Summarizing

• $E[\#\,\text{mistakes}] \le (1+\varepsilon)\,\mathrm{OPT} + \varepsilon^{-1}\log(n)$.
• If we set $\varepsilon = (\log(n)/\mathrm{OPT})^{1/2}$ to balance the two terms out (or use guess-and-double), we get the bound $E[\text{mistakes}] \le \mathrm{OPT} + 2(\mathrm{OPT}\cdot\log n)^{1/2}$.
• Note: Of course we might not know OPT, so if running T time steps, since $\mathrm{OPT} \le T$, set $\varepsilon = (\log(n)/T)^{1/2}$ to get additive loss (regret) at most $2(T\cdot\log n)^{1/2}$: $E[\text{mistakes}] \le \mathrm{OPT} + 2(T\cdot\log n)^{1/2}$.
• So, regret/T → 0. [no-regret algorithm]
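The balancing step can be made explicit with a short worked calculation. The numbers below (T = 10,000 rounds, n = 100 experts, natural logarithm) are illustrative assumptions, not values from the lecture.

```latex
% Additive loss as a function of epsilon, using OPT <= T:
f(\varepsilon) = \varepsilon\,T + \frac{\log n}{\varepsilon},
\qquad
f'(\varepsilon) = T - \frac{\log n}{\varepsilon^{2}} = 0
\;\Longrightarrow\;
\varepsilon = \sqrt{\frac{\log n}{T}},
\qquad
f(\varepsilon) = 2\sqrt{T \log n}.
% Example with T = 10^4, n = 100, natural log (\ln 100 \approx 4.605):
\varepsilon \approx 0.0215,
\qquad
2\sqrt{T \ln n} \approx 429,
\qquad
\frac{\text{regret}}{T} \approx 0.043.
```

Quadrupling T halves the per-round regret, which is the sense in which regret/T goes to 0.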

Page 16

What if we have n options, not n predictors?

• We’re not combining n experts, we’re choosing one. Can we still do it?
• Nice feature of RWM: it can be applied when the experts are n different options.
• E.g., n different ways to drive to work each day, n different ways to invest our money.
• We did not need to see the predictions in order to select an expert (we only needed to see their losses to update our weights).

Page 17

Decision Theoretic Version; Formal model

• There are n experts.
• For each round t = 1, 2, …, T:
   – No predictions. The learner produces a probability distribution $p_t$ on experts based on their past performance.
   – The learner is given a loss vector $l_t$ and incurs expected loss $l_t \cdot p_t$.
   – The learner updates the weights.

The guarantee also applies to this model!

[Interesting for connections between GT and Learning.]

Page 18

Can generalize to losses in [0,1]

• If expert i has loss $l_i$, do: $w_i \leftarrow w_i(1 - \varepsilon l_i)$.

[Before, if an expert had a loss of 1, we multiplied by (1 − ε); if it had a loss of 0, we left it alone. Now we interpolate linearly in between.]

• Same analysis as before.
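A minimal sketch of the decision-theoretic version with losses in [0,1] follows. It assumes the update $w_i \leftarrow w_i(1 - \varepsilon l_i)$, which matches the bracketed description (multiply by 1 − ε at loss 1, leave alone at loss 0, linear in between); the function names and the sample loss vectors are hypothetical.

```python
def rwm_distribution(weights):
    """Probability distribution p_t over experts, proportional to the weights."""
    total = sum(weights)
    return [w / total for w in weights]


def rwm_update(weights, losses, eps):
    """Multiplicative update w_i <- w_i * (1 - eps * l_i) for losses l_i in [0, 1]."""
    return [w * (1 - eps * l) for w, l in zip(weights, losses)]


def run_decision_theoretic_rwm(loss_vectors, eps):
    """Play T rounds; return (total expected loss of the learner, final weights)."""
    n = len(loss_vectors[0])
    weights = [1.0] * n
    total_expected_loss = 0.0
    for losses in loss_vectors:
        p = rwm_distribution(weights)
        # expected loss this round is the dot product l_t . p_t
        total_expected_loss += sum(pi * li for pi, li in zip(p, losses))
        weights = rwm_update(weights, losses, eps)
    return total_expected_loss, weights


# Hypothetical losses for 3 options over 2 rounds.
loss_vectors = [[0.0, 1.0, 0.5], [0.0, 0.5, 1.0]]
total_loss, weights = run_decision_theoretic_rwm(loss_vectors, eps=0.5)
```

Note that the learner never sees any predictions here: the distribution depends only on the past loss vectors, which is what lets the same routine be reused for choosing among options or playing a game.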

Page 19

This lecture: Online Learning, Game Theory, and Minimax Optimality

“Game Theory, On-line Prediction, and Boosting”, Freund & Schapire, GEB

Page 20

Zero Sum Games

Game defined by a matrix M. Assume wlog entries are in [0,1].

Mindy's loss M(i,j), with Mindy's choice as the row and Max's choice as the column:

            Rock   Paper   Scissors
Rock        1/2    1       0
Paper       0      1/2     1
Scissors    1      0       1/2

Row player (Mindy) chooses row i.

Column player (Max) chooses column j (simultaneously).

Mindy’s goal: minimize her loss M(i,j).

Max’s goal: maximize this loss (zero sum).

Page 21

Randomized Play

Mindy chooses a distribution P over rows.

Max chooses a distribution Q over columns [simultaneously].

Mindy’s expected loss: $M(P,Q) = \sum_{i,j} P(i)\,M(i,j)\,Q(j) = P^{\top} M Q$

If i, j = pure strategies, and P, Q = mixed strategies:

M(P,j): Mindy’s expected loss when she plays P and Max plays j

M(i,Q): Mindy’s expected loss when she plays i and Max plays Q
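As a concrete check of these definitions, the sketch below evaluates the expected loss $\sum_{i,j} P(i)\,M(i,j)\,Q(j)$ on the Rock-Paper-Scissors loss matrix (ties cost 1/2, a loss costs 1, a win costs 0). The helper names `expected_loss` and `pure` are assumptions for the example; exact fractions are used so the values are not blurred by rounding.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# Mindy's loss matrix for Rock-Paper-Scissors (rows: Mindy, columns: Max).
M = [
    [HALF, 1, 0],     # Mindy plays Rock
    [0, HALF, 1],     # Mindy plays Paper
    [1, 0, HALF],     # Mindy plays Scissors
]


def expected_loss(M, P, Q):
    """M(P, Q) = sum over i, j of P(i) * M(i, j) * Q(j)."""
    return sum(P[i] * M[i][j] * Q[j]
               for i in range(len(P)) for j in range(len(Q)))


def pure(k, n):
    """Pure strategy k written as a distribution over n actions."""
    return [1 if idx == k else 0 for idx in range(n)]


uniform = [Fraction(1, 3)] * 3
loss_uniform_vs_uniform = expected_loss(M, uniform, uniform)   # M(P, Q)
loss_uniform_vs_paper = expected_loss(M, uniform, pure(1, 3))  # M(P, j), j = Paper
loss_rock_vs_uniform = expected_loss(M, pure(0, 3), uniform)   # M(i, Q), i = Rock
```

All three quantities come out to exactly 1/2, since every row and every column of this matrix averages to 1/2.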

Page 22

Sequential Play

Say Mindy plays before Max. If Mindy chooses P, then Max will pick Q to maximize M(P,Q), so the loss will be $L(P) = \max_Q M(P,Q)$.

So, Mindy should pick P to minimize L(P). Loss will be: $\min_P \max_Q M(P,Q)$

Similarly, if Max plays first, loss will be: $\max_Q \min_P M(P,Q)$
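The inner optimizations are easy to compute because M(P,Q) is linear in the second mover's strategy, so a best response can always be taken to be a pure strategy. The sketch below uses that fact on the Rock-Paper-Scissors loss matrix; the function names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# Mindy's loss matrix for Rock-Paper-Scissors (rows: Mindy, columns: Max).
M = [
    [HALF, 1, 0],
    [0, HALF, 1],
    [1, 0, HALF],
]


def loss_if_mindy_commits(M, P):
    """L(P) = max_Q M(P, Q), attained at some pure column j."""
    return max(sum(P[i] * M[i][j] for i in range(len(M)))
               for j in range(len(M[0])))


def loss_if_max_commits(M, Q):
    """min_P M(P, Q), attained at some pure row i."""
    return min(sum(M[i][j] * Q[j] for j in range(len(M[0])))
               for i in range(len(M)))


uniform = [Fraction(1, 3)] * 3
always_rock = [1, 0, 0]

L_uniform = loss_if_mindy_commits(M, uniform)      # Mindy commits to uniform
L_rock = loss_if_mindy_commits(M, always_rock)     # Mindy commits to Rock
g_uniform = loss_if_max_commits(M, uniform)        # Max commits to uniform
g_rock = loss_if_max_commits(M, always_rock)       # Max commits to Rock
```

Committing to Rock is exploitable in either role (Mindy's loss becomes 1, or 0 when it is Max who commits), whereas committing to the uniform distribution yields 1/2 for both players.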

Page 23

Minimax Theorem

Playing second cannot be worse than playing first:

$\min_P \max_Q M(P,Q) \ge \max_Q \min_P M(P,Q)$

(left side: Mindy plays first; right side: Mindy plays second)

Von Neumann’s minimax theorem:

$\min_P \max_Q M(P,Q) = \max_Q \min_P M(P,Q)$

No advantage to playing second! Regardless of who goes first the outcome is always the same!

Page 24

Optimal Play

Von Neumann’s minimax theorem:

$v = \min_P \max_Q M(P,Q) = \max_Q \min_P M(P,Q)$

where v is the value of the game, a minimizing $P^*$ is a min-max strategy, and a maximizing $Q^*$ is a max-min strategy.

Optimal strategies:

1. There exists a min-max strategy $P^*$ s.t. for any Q, $M(P^*,Q) \le v$. Even if Max knows Mindy’s strategy, Max cannot get a better outcome than v.

2. There exists a max-min strategy $Q^*$ s.t. for any P, $M(P,Q^*) \ge v$. No matter what strategy Mindy uses, the outcome is at worst v for Max, so v is the best possible value for Mindy.

Page 25

Optimal Play

Von Neumann’s minimax theorem gives the value of the game v, a min-max strategy $P^*$, and a max-min strategy $Q^*$.

$P^*$ and $Q^*$ are optimal strategies if the opponent is also optimal!

For a two-person zero-sum game against a good opponent, your best bet is to find your min-max optimal strategy and always play it.

Page 26

Optimal Play

Von Neumann’s minimax theorem gives the value of the game v, a min-max strategy $P^*$, and a max-min strategy $Q^*$.

Note: ($P^*$, $Q^*$) is a Nash equilibrium: $P^*$ is a best response to $Q^*$, and $Q^*$ is a best response to $P^*$.

All the NE have the same value in zero-sum games.

Not true in general, very specific to zero-sum games!


Page 28

Beyond the Classic Theory

Often there is limited info about the game or the opponent:

• M may be unknown or very large.
• Opponent may not be fully adversarial. E.g., Bart Simpson always plays Rock instead of choosing the uniform distribution. You can play Paper and always beat Bart.

As the game is played over and over, there is an opportunity to learn the game and/or the opponent’s strategy.

Page 29

Repeated Play

• M unknown.
• For each round t = 1, 2, …, T:
   – Mindy chooses $P_t$.
   – Max chooses $Q_t$ (possibly based on $P_t$).
   – Mindy’s loss is $M(P_t, Q_t)$.
   – Mindy observes loss $M(i, Q_t)$ for each pure strategy i.

Exactly fits the DT experts model, with loss vector $l_t = M Q_t$ and expected loss $M(P_t, Q_t) = P_t \cdot (M Q_t)$.

Mindy can run RWM to ensure:

$\frac{1}{T}\sum_{t=1}^{T} M(P_t, Q_t) \le \min_P \frac{1}{T}\sum_{t=1}^{T} M(P, Q_t) + \Delta_T$, where $\Delta_T = O\big((\log(n)/T)^{1/2}\big)$.

Moreover, $\min_P \frac{1}{T}\sum_{t=1}^{T} M(P, Q_t) = \min_P M\big(P, (Q_1 + \dots + Q_T)/T\big) \le v$.
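The repeated-play protocol can be simulated directly. In the sketch below Mindy runs the multiplicative update $w_i \leftarrow w_i(1 - \varepsilon\, l_t(i))$ with $l_t = M Q_t$, and Max is assumed to best-respond to $P_t$ with a pure column each round. The function name, the choice T = 2000, and the setting ε = (ln n / T)^{1/2} are illustrative assumptions.

```python
import math

# Mindy's loss matrix for Rock-Paper-Scissors (rows: Mindy, columns: Max).
M = [
    [0.5, 1.0, 0.0],
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.5],
]


def simulate_repeated_play(M, T, eps):
    """Mindy runs RWM; Max best-responds to P_t with a pure column.

    Returns (Mindy's average expected loss, Max's empirical mixture Q_bar).
    """
    n_rows, n_cols = len(M), len(M[0])
    weights = [1.0] * n_rows
    total_loss = 0.0
    column_counts = [0] * n_cols
    for _ in range(T):
        total_w = sum(weights)
        P = [w / total_w for w in weights]
        # M(P_t, j) for every pure column j
        col_losses = [sum(P[i] * M[i][j] for i in range(n_rows))
                      for j in range(n_cols)]
        # Max picks the column that maximizes Mindy's loss
        j = max(range(n_cols), key=lambda c: col_losses[c])
        column_counts[j] += 1
        total_loss += col_losses[j]
        # loss vector l_t = M Q_t; Q_t is the pure column j, so l_t(i) = M[i][j]
        weights = [w * (1 - eps * M[i][j]) for i, w in enumerate(weights)]
    Q_bar = [c / T for c in column_counts]
    return total_loss / T, Q_bar


T = 2000
eps = math.sqrt(math.log(3) / T)
avg_loss, Q_bar = simulate_repeated_play(M, T, eps)
```

Against a best-responding Max the average loss cannot drop below the game value 1/2, and the RWM guarantee keeps it within a vanishing additive term above 1/2, so the two bounds squeeze the average toward v as T grows.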

Page 30

Prove minimax theorem as corollary

Need to prove: $\min_P \max_Q M(P,Q) \le \max_Q \min_P M(P,Q)$   [the ≥ part is trivial]

Imagine the game is played repeatedly. In each round t:

• Mindy plays $P_t$ using RWM
• Max chooses the best response $Q_t = \arg\max_Q M(P_t, Q)$

Define: $\bar{P} = \frac{1}{T}\sum_{t=1}^{T} P_t$ and $\bar{Q} = \frac{1}{T}\sum_{t=1}^{T} Q_t$.

Page 31

One slide proof of minimax theorem

$\bar{P} = \frac{1}{T}\sum_{t=1}^{T} P_t$ is a strategy that you can use if you have to go first.
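The chain of inequalities for this proof did not survive extraction. The following reconstruction follows the standard argument in the style of Freund & Schapire and should be read as a sketch: it assumes $\bar{P}$ and $\bar{Q}$ are the averages of Mindy's and Max's strategies over the T rounds, $Q_t$ is Max's best response to $P_t$, and $\Delta_T$ is the per-round regret of RWM.

```latex
\begin{align*}
\min_P \max_Q M(P,Q)
  &\le \max_Q M(\bar{P}, Q)
     && \text{($\bar{P}$ is one particular choice of $P$)} \\
  &= \max_Q \frac{1}{T}\sum_{t=1}^{T} M(P_t, Q)
     && \text{(definition of $\bar{P}$, linearity)} \\
  &\le \frac{1}{T}\sum_{t=1}^{T} \max_Q M(P_t, Q)
     && \text{(max of an average is at most the average of maxes)} \\
  &= \frac{1}{T}\sum_{t=1}^{T} M(P_t, Q_t)
     && \text{($Q_t$ is a best response to $P_t$)} \\
  &\le \min_P \frac{1}{T}\sum_{t=1}^{T} M(P, Q_t) + \Delta_T
     && \text{(RWM guarantee)} \\
  &= \min_P M(P, \bar{Q}) + \Delta_T
     && \text{(definition of $\bar{Q}$, linearity)} \\
  &\le \max_Q \min_P M(P, Q) + \Delta_T .
\end{align*}
```

Since $\Delta_T \to 0$ as $T \to \infty$, this gives $\min_P \max_Q M(P,Q) \le \max_Q \min_P M(P,Q)$, which together with the trivial direction proves the theorem.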