Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001

Administration

Rush hour (10/22).

Today not part of midterm (10/24), just final.

Uncertainty in Search

We’ve assumed everything is known: starting state, neighbors, goals, etc.

Often need to make decisions even though some things are uncertain.

Complicates things…

Types of Uncertainty

Opponent: What will other player do?
• Minimax

Outcome: Which neighbor do we get?
• Model via probability distribution

State: Where are we now?
• Hidden information

Transition: What are the rules?
• Need to use learning to find out

Nim-Rand

Pile of sticks.
• Lose if take last stick.
• On your turn, take 1 or 2.
• Flip a coin. If H, take 1 more.

Which type of uncertainty?

Value of a Game

Without randomness: maximize your winnings in the worst case.

With randomness: maximize your expected winnings in the worst case.

Want to do well on average.

What games are like this?

Nim-Rand Tree

[Figure: game tree for Nim-Rand starting from three sticks, (|||)-X. Each node shows the pile and the player to move, e.g. (||)-Y; "c" marks chance nodes (coin flips). Leaves ()-X are worth +1 and leaves ()-Y are worth -1.]

Nim-Rand Values

[Figure: the same tree with values backed up from the leaves. Legible node values include +1, -1, +0 and +0.5; the root (|||)-X is worth +0.5.]

Search Model

States, terminal states (G), values for terminal states (V).

X states (maximizer), Y states (minimizer), Z states (chance)

For all s in Z, for all s’ in N(s):
P(s’|s) is the probability of reaching s’ from s.

Game Value (no loops)

Gameval(s) = {
  If (G(s)) return V(s)
  Else if s in X
    return max_{s’ in N(s)} Gameval(s’)
  Else if s in Y
    return min_{s’ in N(s)} Gameval(s’)
  Else
    return sum_{s’ in N(s)} P(s’|s) Gameval(s’)
}
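
As a rough sketch of how the Gameval recursion could be run on Nim-Rand: the state encoding (sticks, player, node type) and the helper names below are assumptions made for illustration, not part of the lecture. It also assumes the coin is fair and is only flipped while sticks remain.

```python
def gameval(s, terminal, value, kind, neighbors):
    """Depth-first game value for a loop-free game with chance nodes."""
    if terminal(s):
        return value(s)
    # neighbors(s) returns (next_state, probability) pairs; the
    # probability is only used at chance (Z) nodes.
    vals = [(gameval(t, terminal, value, kind, neighbors), p)
            for t, p in neighbors(s)]
    if kind(s) == "X":
        return max(v for v, _ in vals)
    if kind(s) == "Y":
        return min(v for v, _ in vals)
    return sum(p * v for v, p in vals)


# Hypothetical Nim-Rand encoding: (sticks, player, node_type).
#   "move": `player` chooses to take 1 or 2 sticks.
#   "coin": `player` has just taken sticks; a fair coin decides
#           whether that player must take one more.
def other(player):
    return "Y" if player == "X" else "X"

def terminal(s):
    return s[0] == 0

def value(s):
    # Empty pile: the player to move did NOT take the last stick, so wins.
    return 1.0 if s[1] == "X" else -1.0

def kind(s):
    return s[1] if s[2] == "move" else "Z"

def neighbors(s):
    sticks, player, node_type = s
    if node_type == "coin":
        return [((sticks - 1, other(player), "move"), 0.5),  # heads
                ((sticks, other(player), "move"), 0.5)]      # tails
    result = []
    for take in (1, 2):
        if take <= sticks:
            left = sticks - take
            if left == 0:
                result.append(((0, other(player), "move"), 1.0))
            else:
                result.append(((left, player, "coin"), 1.0))
    return result

root_value = gameval((3, "X", "move"), terminal, value, kind, neighbors)
print(root_value)  # 0.5
```

Under these assumptions the root value agrees with the +0.5 shown for (|||)-X on the Nim-Rand Values slide.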

Games with Loops

No known poly time algorithm.

Approximated by value iteration:

For all s, if G(s), L(s) = V(s), else 0

Repeat until changes are small:
  for all non-terminal s, L(s) = max, min, avg L(s’), s’ in N(s)
  depending on s in X, Y, or Z (avg weighted by P(s’|s)).
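
A minimal sketch of the value iteration loop, run on a made-up three-node game with a loop (the game, the dictionary layout, the tolerance and the sweep cap are all assumptions for illustration). Values are updated in place within each sweep.

```python
def value_iteration(states, terminal_value, kind, neighbors,
                    tol=1e-9, max_sweeps=10_000):
    # Initialise: terminal states get V(s), everything else 0.
    L = {s: terminal_value.get(s, 0.0) for s in states}
    for _ in range(max_sweeps):
        biggest_change = 0.0
        for s in states:
            if s in terminal_value:
                continue  # terminal values stay fixed
            vals = [(L[t], p) for t, p in neighbors[s]]
            if kind[s] == "X":
                new = max(v for v, _ in vals)
            elif kind[s] == "Y":
                new = min(v for v, _ in vals)
            else:  # chance node: probability-weighted average
                new = sum(p * v for v, p in vals)
            biggest_change = max(biggest_change, abs(new - L[s]))
            L[s] = new
        if biggest_change < tol:
            break
    return L


# Hypothetical loopy game: chance node z returns to x half the time.
states = ["x", "y", "z", "win", "half"]
terminal_value = {"win": 1.0, "half": 0.5}
kind = {"x": "X", "y": "Y", "z": "Z"}
neighbors = {
    "x": [("z", 1.0), ("half", 1.0)],
    "y": [("x", 1.0), ("half", 1.0)],
    "z": [("win", 0.5), ("x", 0.5)],
}
L = value_iteration(states, terminal_value, kind, neighbors)
print(L)
```

In this toy game L(x) and L(z) creep up toward 1 (the error halves each sweep), while L(y) settles at 0.5.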

Hidden Information

Games like Poker, 2-player bridge, Scrabble ™, Diplomacy, Stratego

Don’t fit game tree model, even when chance nodes included.

Pure Strategies

X:
I: 1=L, 4=L
II: 1=L, 4=R
III: 1=R, 4=L
IV: 1=R, 4=R

Y:
I: 2=L, 3=R
II: 2=M, 3=R
III: 2=R, 3=R

[Figure: game tree with Y decision nodes Y-2 and Y-3; branch labels L, M, R.]

Matrix Form

Summarizes all decisions in one choice for each player, chosen simultaneously

         X-I   X-II   X-III   X-IV
Y-I       7     7      2       2
Y-II      3     3      2       2
Y-III    -1     4      2       2

Value of Matrix Game

X picks column with largest min

Y picks row with smallest max

         X-I   X-II   X-III   X-IV
Y-I       7     7      2       2
Y-II      3     3      2       2
Y-III    -1     4      2       2
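
The two rules above can be checked on this matrix with a few lines of Python. This is only a sketch; the list-of-rows layout (rows for Y, columns for X, entries as X's winnings) is an assumed encoding.

```python
# Rows are Y's pure strategies, columns are X's; entries are X's winnings.
matrix = [
    [7, 7, 2, 2],    # Y-I
    [3, 3, 2, 2],    # Y-II
    [-1, 4, 2, 2],   # Y-III
]

columns = list(zip(*matrix))
col_mins = [min(col) for col in columns]    # worst case of each X column
row_maxes = [max(row) for row in matrix]    # worst case of each Y row

maximin = max(col_mins)    # X picks the column with the largest min
minimax = min(row_maxes)   # Y picks the row with the smallest max

x_choice = col_mins.index(maximin)    # 0-based column index
y_choice = row_maxes.index(minimax)   # 0-based row index

print(col_mins, row_maxes)            # [-1, 3, 2, 2] [7, 3, 4]
print(maximin, minimax)               # 3 3
```

Here X's best column is X-II and Y's best row is Y-II, and both give the value 3.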

Minimax

Von Neumann proved: for a zero-sum matrix game, minimax = maximin.

Given perfect information (no state uncertainty), there exists an optimal pure strategy for each player.

Game w/ Chance Nodes

[Figure: game tree with chance nodes. Legible labels: leaf values +4 and -20; chance probabilities 0.5/0.5 and 0.8/0.2; moves L and R.]

Use expected values

            X-I (L)   X-II (R)
Y-I (L)       -8        -2
Y-II (R)      -8        +3

More General Matrices

What game tree leads to this matrix?

Does von Neumann’s theorem still hold?

            X-I (L)   X-II (R)
Y-I (L)        1         0
Y-II (R)       0         1

Hidden Info. Matrices

X picks L or R, keeping the choice hidden from Y.

Y makes a choice.

X’s choice is revealed and game ends.

            X-I (L)   X-II (R)
Y-I (L)        1         0
Y-II (R)       0         1

Micro Poker

X is dealt high or low card, holds/folds.

Y folds/sees.

High card wins

Y can’t see X’s card.

[Figure: game tree. A chance node deals X a low card (X-L) or a high card (X-H) with probability 0.5 each; X folds or holds; Y then folds or sees. Legible leaf values: +10, -40, +30, +10.]

Matrix Form

Player X can guarantee itself +1 on average. How?

It can even announce its strategy.

              X-I (fold)   X-II (hold)
Y-I (fold)       -5           +10
Y-II (see)       +5            -5
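
One way to see where these four entries come from is to average the leaf payoffs over the deal. The sketch below rests on two assumptions that are not legible in the extracted figure: that X always holds a high card (so X's strategy only matters on a low card), and that folding a low card pays -20. Both were chosen because they reproduce the matrix above; the function and variable names are hypothetical.

```python
def leaf(card, x_low_action, y_action):
    """Payoff to X at a leaf of the Micro Poker tree (assumed values)."""
    # Assumption: X always holds on a high card.
    x_action = "hold" if card == "high" else x_low_action
    if x_action == "fold":
        return -20          # assumed: not legible in the figure
    if y_action == "fold":
        return 10           # Y folds after X holds
    return 30 if card == "high" else -40   # Y sees; high card wins


def expected(x_low_action, y_action):
    # Each card is dealt with probability 0.5.
    return (0.5 * leaf("low", x_low_action, y_action)
            + 0.5 * leaf("high", x_low_action, y_action))


# matrix[y_action][x_low_action], matching the layout of the slide.
matrix = {y: {x: expected(x, y) for x in ("fold", "hold")}
          for y in ("fold", "see")}
print(matrix)
```

With these leaf values the expected payoffs come out as -5, +10, +5 and -5.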

Mixed Strategies

Pick a number p.

X: With prob. p, fold; else hold.

Since Y doesn’t know what’s coming, the response will sometimes work, sometimes not.

Guess a Probability

X announces p=1/3.

Y’s pick?

              X-I (fold)   X-II (hold)
Y-I (fold)       -5           +10
Y-II (see)       +5            -5

Fold: +5

See: -1 2/3

Y picks see.

Guess a Probability

X announces p=2/3.

Y’s pick?

              X-I (fold)   X-II (hold)
Y-I (fold)       -5           +10
Y-II (see)       +5            -5

Fold: +0

See: +1 2/3

Y picks fold.

All Strategies

What should X pick for p to maximize its worst case?

p=0.6

Payoff +1

[Figure: plot of X’s expected payoff against p, for p from 0 to 1; the worst case peaks at p=0.6 with payoff +1.]
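
The numbers on the last three slides can be reproduced with exact fractions. A small sketch, assuming the Micro Poker matrix is stored as a nested dictionary M[y][x] (that layout and the function names are illustrative only):

```python
from fractions import Fraction

# M[y_action][x_action] = X's expected winnings (Micro Poker matrix).
M = {
    "fold": {"fold": -5, "hold": 10},
    "see":  {"fold": 5,  "hold": -5},
}

def payoffs(p):
    """X folds with probability p; expected gain against each Y choice."""
    return {y: p * row["fold"] + (1 - p) * row["hold"]
            for y, row in M.items()}

def worst_case(p):
    # Y is the minimizer, so X's guarantee is the smaller payoff.
    return min(payoffs(p).values())

for p in (Fraction(1, 3), Fraction(2, 3), Fraction(3, 5)):
    print(p, payoffs(p), worst_case(p))

# Coarse grid search over p = 0.00, 0.01, ..., 1.00.
best_k = max(range(101), key=lambda k: worst_case(Fraction(k, 100)))
print(best_k / 100)
```

At p=1/3 the worst case is -1 2/3 (Y sees), at p=2/3 it is 0 (Y folds), and at p=3/5 both responses give +1.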

Randomizing Y

If Y random, answer is the same.

No matter what, X can guarantee itself +1.

[Figure: plot with horizontal axis from 0 to 1.]

Bluffing

[Figure: the Micro Poker game tree again: X-L / X-H dealt with probability 0.5 each; X fold/hold; Y fold/see; legible leaf values +10, -40, +30, +10.]

X: On a low card, bluff with prob. 0.4.

Y: On hold, fold with prob. 0.4.

Solving 2x2 Game

          X-I     X-II
Y-I       m_11    m_12
Y-II      m_21    m_22

X-I with prob. p

X’s expected gain
vs. Y-I: m_11 p + m_12 (1-p)
vs. Y-II: m_21 p + m_22 (1-p)

Maximize the minimum.

Try p=0, p=1, where lines meet.
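
A possible implementation of the "try p=0, p=1, where lines meet" recipe, as a sketch only: the function name is made up, and it assumes integer (or Fraction) payoffs so the arithmetic stays exact.

```python
from fractions import Fraction

def solve_2x2(m11, m12, m21, m22):
    """Return (p, value): X plays X-I with prob. p to maximize the minimum."""
    def worst(p):
        vs_y1 = m11 * p + m12 * (1 - p)
        vs_y2 = m21 * p + m22 * (1 - p)
        return min(vs_y1, vs_y2)

    candidates = [Fraction(0), Fraction(1)]
    # The lines meet where m11 p + m12 (1-p) = m21 p + m22 (1-p).
    denom = m11 - m12 - m21 + m22
    if denom != 0:
        p_cross = Fraction(m22 - m12, denom)
        if 0 <= p_cross <= 1:
            candidates.append(p_cross)

    best = max(candidates, key=worst)
    return best, worst(best)


print(solve_2x2(-5, 10, 5, -5))   # Micro Poker matrix
print(solve_2x2(1, 0, 0, 1))      # the 1/0/0/1 hidden-information matrix
```

On the Micro Poker matrix this gives p=3/5 with value 1; on the 1/0/0/1 matrix it gives p=1/2 with value 1/2.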

Solving General mxn

Linear program: p_1, …, p_n.

p_1 + … + p_n = 1, p_i ≥ 0

Maximize X’s gain, g

vs Y-I: m_11 p_1 + … + m_n1 p_n ≥ g
vs Y-II: m_12 p_1 + … + m_n2 p_n ≥ g
…

Against all Y strategies.

(Here m_ij is X’s gain when X plays strategy i and Y plays strategy j.)

Issues

Can we solve poker?
• More than 2 players
• Not zero sum (collude)
• Huge state space

Poker: Opponent modeling

Bridge: Use simulation to approximate

What to Learn

Minimax value in games of chance and the DFS algorithm for computing it.

Converting games to matrix form.

Solve 2x2 game.

Homework 5 (due 11/7)

1. The value iteration algorithm from the Games of Chance lecture can be applied to deterministic games with loops. Argue that it produces the same answer as the “Loopy” algorithm from the Game Tree lecture.

2. Write the matrix form of the game tree below.

Game Tree

[Figure: game tree with Y decision nodes Y-2 and Y-3.]
