Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001


Administration

Rush hour (10/22).

Today’s material is not on the midterm (10/24), just the final.

Uncertainty in Search

We’ve assumed everything is known: starting state, neighbors, goals, etc.

We often need to make decisions even though some things are uncertain.

That complicates things…

Types of Uncertainty

Opponent: What will the other player do?
• Minimax

Outcome: Which neighbor do we get?
• Model via probability distribution

State: Where are we now?
• Hidden information

Transition: What are the rules?
• Need to use learning to find out

Nim-Rand

Pile of sticks.
• You lose if you take the last stick.
• On your turn, take 1 or 2.
• Then flip a coin. If H, take 1 more.

Which type of uncertainty?

Value of a Game

Without randomness: maximize your winnings in the worst case.

With randomness: maximize your expected winnings in the worst case.

Want to do well on average.

What games are like this?

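Nim-Rand Tree

(|||)-X means three sticks left, X to move; c is a coin flip (H or T); 1 and 2 are the number of sticks taken. Leaves are +1 if X wins, -1 if Y wins.

(|||)-X
  1 -> c
    H -> (|)-Y
      1 -> ()-X: +1
    T -> (||)-Y
      1 -> c
        H -> ()-X: +1
        T -> (|)-X
          1 -> ()-Y: -1
      2 -> ()-X: +1
  2 -> c
    H -> ()-Y: -1
    T -> (|)-Y
      1 -> ()-X: +1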

Nim-Rand Values

(|||)-X = +0.5
  1 -> c = +0.5
    H -> (|)-Y = +1
      1 -> ()-X: +1
    T -> (||)-Y = 0
      1 -> c = 0
        H -> ()-X: +1
        T -> (|)-X = -1
          1 -> ()-Y: -1
      2 -> ()-X: +1
  2 -> c = 0
    H -> ()-Y: -1
    T -> (|)-Y = +1
      1 -> ()-X: +1

Search Model

States, terminal states (G), values for terminal states (V).

X states (maximizer), Y states (minimizer), Z states (chance).

For all s in Z and all s’ in N(s), P(s’|s) is the probability of reaching s’ from s.

Game Value (no loops)

Gameval(s) = {
  If (G(s)) return V(s)
  Else if s in X
    return max_{s’ in N(s)} Gameval(s’)
  Else if s in Y
    return min_{s’ in N(s)} Gameval(s’)
  Else
    return sum_{s’ in N(s)} P(s’|s) Gameval(s’)
}
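Here is a minimal Python sketch of Gameval, run on the 3-stick Nim-Rand game. The tuple encoding of states and the helper names (is_terminal, terminal_value, successors) are hypothetical choices for this illustration and do not come from the lecture. functools.lru_cache only memoizes repeated subtrees. The coin is modeled as a Z state that exists only when sticks remain after a move, which is how the Nim-Rand tree reads.

```python
from functools import lru_cache

# Hypothetical state encoding for Nim-Rand (not from the lecture):
#   ("move", n, p): n sticks left, player p ("X" or "Y") to move
#   ("coin", n, p): p just took sticks, leaving n > 0; coin flip pending (Z state)

def other(p):
    return "Y" if p == "X" else "X"

def is_terminal(s):                      # G(s)
    kind, n, _ = s
    return kind == "move" and n == 0

def terminal_value(s):                   # V(s), from X's point of view
    # No sticks left with p to move: the other player took the last stick and lost.
    _, _, p = s
    return +1 if p == "X" else -1

def successors(s):                       # N(s); chance states also carry P(s'|s)
    kind, n, p = s
    if kind == "move":
        result = []
        for take in (1, 2):
            if take <= n:
                left = n - take
                # Taking the last stick ends the game at once; otherwise flip the coin.
                result.append(("move", 0, other(p)) if left == 0 else ("coin", left, p))
        return result
    # Coin flip: heads (0.5) takes 1 more stick, tails (0.5) takes nothing extra.
    return [(0.5, ("move", n - 1, other(p))), (0.5, ("move", n, other(p)))]

@lru_cache(maxsize=None)
def gameval(s):
    if is_terminal(s):
        return terminal_value(s)
    kind, _, p = s
    if kind == "coin":                   # s in Z: probability-weighted sum
        return sum(prob * gameval(t) for prob, t in successors(s))
    values = [gameval(t) for t in successors(s)]
    return max(values) if p == "X" else min(values)   # s in X: max; s in Y: min

print(gameval(("move", 3, "X")))         # 0.5
```

For the start state it returns 0.5, the root value on the Nim-Rand Values slide, so X should take 1 stick.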

Games with Loops

No known polynomial-time algorithm.

Approximated by value iteration:

For all s, if G(s), L(s) = V(s), else 0.

Repeat until changes are small:

for all non-terminal s, L(s) = max, min, or P(s’|s)-weighted average of L(s’), s’ in N(s),

depending on whether s is in X, Y, or Z.
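Here is a minimal Python sketch of the value-iteration loop. It runs on a small made-up game with a loop; the states a, b, c and the terminals quit, win, settle are hypothetical.

- X at a can continue to b or quit for 0.
- The chance state b reaches win (+1) with prob. 0.5, else c.
- Y at c can send play back to a or settle for +0.6.

Each sweep reads only the previous L (synchronous updates), which is one reasonable reading of "for all s".

```python
# Hypothetical loopy game, in the slide's notation (not from the lecture):
# KIND[s] says whether s is in X (max), Y (min), Z (chance) or is terminal (G).
KIND = {"a": "X", "b": "Z", "c": "Y", "quit": "G", "win": "G", "settle": "G"}
V = {"quit": 0.0, "win": 1.0, "settle": 0.6}            # values of terminal states
N = {"a": ["b", "quit"], "b": ["win", "c"], "c": ["a", "settle"]}
P = {("b", "win"): 0.5, ("b", "c"): 0.5}                # P(s'|s) for chance states

def value_iteration(tol=1e-9, max_sweeps=10_000):
    # L(s) = V(s) if G(s), else 0
    L = {s: V.get(s, 0.0) for s in KIND}
    for _ in range(max_sweeps):
        new = dict(L)                                   # terminal values stay fixed
        for s, kind in KIND.items():
            if kind == "X":
                new[s] = max(L[t] for t in N[s])
            elif kind == "Y":
                new[s] = min(L[t] for t in N[s])
            elif kind == "Z":
                new[s] = sum(P[(s, t)] * L[t] for t in N[s])
        change = max(abs(new[s] - L[s]) for s in KIND)
        L = new
        if change < tol:                                # "until changes are small"
            break
    return L

L = value_iteration()
print({s: round(L[s], 3) for s in ("a", "b", "c")})     # {'a': 0.8, 'b': 0.8, 'c': 0.6}
```

The values stop changing after a handful of sweeps, with L(a) = L(b) = 0.8 and L(c) = 0.6. Once looping back is worth more than 0.6 to X, Y prefers to settle.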

Hidden Information

Games like Poker, 2-player bridge, Scrabble™, Diplomacy, Stratego

Don’t fit the game tree model, even when chance nodes are included.

Pure Strategies

X:  I: 1=L, 4=L
    II: 1=L, 4=R
    III: 1=R, 4=L
    IV: 1=R, 4=R

Y:  I: 2=L, 3=R
    II: 2=M, 3=R
    III: 2=R, 3=R

X-1
  L -> Y-2
    L -> +7
    M -> +3
    R -> X-4
      L -> -1
      R -> +4
  R -> Y-3
    L -> +5
    R -> +2

Matrix Form

Summarizes all of each player’s decisions in a single choice, made simultaneously. Entries are X’s gain (rows: Y’s strategies; columns: X’s strategies).

         X-I   X-II   X-III   X-IV
Y-I        7      7       2      2
Y-II       3      3       2      2
Y-III     -1      4       2      2
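This Python sketch performs the conversion for the tree on the Pure Strategies slide. Each cell comes from playing one X strategy against one Y strategy down the tree. The dictionary encoding and the play helper are hypothetical.

The +2 under Y-3’s R branch is read from the matrix (the X-III and X-IV columns), and the +5 under its L branch from the tree. Following the slide, Y’s list includes only strategies that play R at Y-3.

```python
from itertools import product

# Pure Strategies tree: internal nodes map a move to a child; leaves are X's payoff.
# Assumption: Y-3's leaves (+5 for L, +2 for R) are read from the tree and matrix.
TREE = {
    "X-1": {"L": "Y-2", "R": "Y-3"},
    "Y-2": {"L": 7, "M": 3, "R": "X-4"},
    "X-4": {"L": -1, "R": 4},
    "Y-3": {"L": 5, "R": 2},
}

def play(x_strategy, y_strategy, node="X-1"):
    """Follow the tree, using whichever player's plan covers the current node."""
    while isinstance(node, str):
        plan = x_strategy if node.startswith("X") else y_strategy
        node = TREE[node][plan[node]]
    return node

# X's four pure strategies: a move at X-1 and at X-4 (I..IV in the slide's order).
X_STRATS = [{"X-1": a, "X-4": b} for a, b in product("LR", "LR")]
# Y's strategies as listed on the slide (all play R at Y-3).
Y_STRATS = [{"Y-2": m, "Y-3": "R"} for m in "LMR"]

matrix = [[play(x, y) for x in X_STRATS] for y in Y_STRATS]
for row in matrix:
    print(row)
# [7, 7, 2, 2]
# [3, 3, 2, 2]
# [-1, 4, 2, 2]
```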

Value of Matrix Game

X picks the column with the largest minimum.

Y picks the row with the smallest maximum.

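This applies the two rules above in Python to the Matrix Form slide’s matrix. The function names are hypothetical, and indices are 0-based, so 1 means X-II or Y-II.

```python
# Matrix from the Matrix Form slide: rows are Y's strategies, columns X's; entries are X's gain.
M = [
    [7, 7, 2, 2],    # Y-I
    [3, 3, 2, 2],    # Y-II
    [-1, 4, 2, 2],   # Y-III
]

def maximin(m):
    """X: pick the column whose minimum (Y's best reply) is largest."""
    col_mins = [min(row[j] for row in m) for j in range(len(m[0]))]
    best = max(range(len(col_mins)), key=col_mins.__getitem__)
    return best, col_mins[best]

def minimax(m):
    """Y: pick the row whose maximum (X's best reply) is smallest."""
    row_maxes = [max(row) for row in m]
    best = min(range(len(row_maxes)), key=row_maxes.__getitem__)
    return best, row_maxes[best]

print(maximin(M))   # (1, 3): column X-II guarantees X at least 3
print(minimax(M))   # (1, 3): row Y-II holds X to at most 3
```

Both come out to 3. That is also the value obtained by backing up the Pure Strategies tree directly (X-4 = 4, Y-2 = 3, Y-3 = 2, X-1 = 3), in line with the perfect-information result on the Minimax slide.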

Minimax

Von Neumann proved that in a zero-sum matrix game, minimax = maximin, provided the players may use mixed strategies (see below).

Given perfect information (no state uncertainty), there exists an optimal pure strategy for each player.

Game w/ Chance Nodes

(c = chance node; numbers on its branches are probabilities.)

X-1
  L -> c
    0.5 -> +4
    0.5 -> -20
  R -> Y-3
    L -> c
      0.8 -> -5
      0.2 -> +10
    R -> +3

Use expected values:

            X-I (L)   X-II (R)
Y-I (L)       -8         -2
Y-II (R)      -8         +3
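These are the matrix entries worked out under this reading of the tree:

- X’s L leads to a fair coin over +4 and -20.
- At Y-3, L leads to a chance node giving -5 with prob. 0.8 and +10 with prob. 0.2.
- At Y-3, R ends at +3.

X’s two columns then have worst cases -8 and -2, so X picks R.

```latex
\begin{aligned}
\text{X-I (L), either Y}:&\quad 0.5(+4) + 0.5(-20) = -8\\
\text{X-II (R) vs. Y-I (L)}:&\quad 0.8(-5) + 0.2(+10) = -2\\
\text{X-II (R) vs. Y-II (R)}:&\quad +3\\
\text{maximin}:&\quad \max\bigl(\min(-8,-8),\ \min(-2,+3)\bigr) = -2
\end{aligned}
```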

More General Matrices

What game tree leads to this matrix?

Does von Neumann’s theorem still hold?

            X-I (L)   X-II (R)
Y-I (L)        1          0
Y-II (R)       0          1

Hidden Info. Matrices

X picks L or R, keeping the choice hidden from Y.

Y makes a choice.

X’s choice is revealed and the game ends.

Micro Poker

X is dealt a high or low card, then holds or folds.

Y folds or sees.

High card wins.

Y can’t see X’s card.

c
  0.5 -> X-L
    fold -> -20
    hold -> Y
      fold -> +10
      see -> -40
  0.5 -> X-H
    hold -> Y
      fold -> +10
      see -> +30

Matrix Form

Player X can guarantee itself +1 on average. How?

It can even announce its strategy.

               X-I (fold)   X-II (hold)
Y-I (fold)         -5           +10
Y-II (see)         +5            -5
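Here is one way to get the four entries. It assumes that X always holds a high card, which is consistent with the numbers though the slide does not say so. Then X-I and X-II are X’s choices with a low card, and Y-I and Y-II are Y’s replies when X holds. Each card comes up with prob. 0.5.

```latex
\begin{aligned}
\text{X-I vs. Y-I}:&\quad 0.5(-20) + 0.5(+10) = -5\\
\text{X-I vs. Y-II}:&\quad 0.5(-20) + 0.5(+30) = +5\\
\text{X-II vs. Y-I}:&\quad 0.5(+10) + 0.5(+10) = +10\\
\text{X-II vs. Y-II}:&\quad 0.5(-40) + 0.5(+30) = -5
\end{aligned}
```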

Mixed Strategies

Pick a number p.

X: with prob. p, fold; otherwise hold.

Since Y doesn’t know what’s coming, its response will sometimes work and sometimes not.

Guess a Probability

X announces p=1/3.

Y’s pick?


X’s expected gain if Y folds: +5

X’s expected gain if Y sees: -1 2/3

Y’s pick: see

Guess a Probability

X announces p=2/3.

Y’s pick?


X’s expected gain if Y folds: 0

X’s expected gain if Y sees: +1 2/3

Y’s pick: fold
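This is a small Python check of the two Guess a Probability slides. It uses exact fractions so that -1 2/3 shows up as -5/3. The GAIN table re-types the Micro Poker matrix, and y_best_response is a hypothetical helper name.

```python
from fractions import Fraction

# Micro Poker matrix: X's gain for (Y's reply, X's pure strategy).
GAIN = {
    ("fold", "fold"): -5, ("fold", "hold"): 10,
    ("see", "fold"): 5,   ("see", "hold"): -5,
}

def y_best_response(p):
    """X folds with prob. p; Y picks the reply that minimizes X's expected gain."""
    expected = {
        y: p * GAIN[(y, "fold")] + (1 - p) * GAIN[(y, "hold")]
        for y in ("fold", "see")
    }
    choice = min(expected, key=expected.get)
    return choice, expected

for p in (Fraction(1, 3), Fraction(2, 3)):
    choice, expected = y_best_response(p)
    print(p, {y: str(v) for y, v in expected.items()}, "->", choice)
# 1/3 {'fold': '5', 'see': '-5/3'} -> see
# 2/3 {'fold': '0', 'see': '5/3'} -> fold
```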

All Strategies

What should X pick for p to maximize its worst case?

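p=0.6, payoff +1

[Plot: X’s expected payoff (vertical axis, -5 to 10) against p (horizontal axis, 0 to 1), with one line for Y’s reply “see” and one for “fold”.]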
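The crossing point comes from X’s expected gain against each pure reply of Y. This uses the Micro Poker matrix, with X folding (X-I) with prob. p:

```latex
\begin{aligned}
\text{Y folds}:&\quad -5p + 10(1-p) = 10 - 15p\\
\text{Y sees}:&\quad 5p - 5(1-p) = 10p - 5\\
10 - 15p = 10p - 5 \;&\Rightarrow\; p = \tfrac{3}{5} = 0.6,\qquad \text{payoff } 10 - 15(0.6) = +1
\end{aligned}
```

For p below 0.6, Y’s best reply is to see; above it, Y’s best reply is to fold. The worst case peaks where the two lines cross.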

Randomizing Y

If Y randomizes, the answer is the same.

No matter what, X can guarantee itself +1.


Bluffing


X: On a low card, bluff (hold) with prob. 0.4.

Y: When X holds, fold with prob. 0.4.
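Both numbers can be checked with the same crossing-lines idea. X’s bluffing probability is 1 - p = 0.4, with p = 0.6 from the All Strategies slide. For Y, write q for the probability of folding when X holds; q is a symbol introduced here. Y’s best q makes X’s two pure strategies equally good:

```latex
\begin{aligned}
\text{X-I (fold)}:&\quad -5q + 5(1-q) = 5 - 10q\\
\text{X-II (hold)}:&\quad 10q - 5(1-q) = 15q - 5\\
5 - 10q = 15q - 5 \;&\Rightarrow\; q = \tfrac{2}{5} = 0.4,\qquad \text{value } 5 - 10(0.4) = +1
\end{aligned}
```

The common value +1 matches what X can guarantee, consistent with the Randomizing Y slide.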

Solving 2x2 Game

X plays X-I with prob. p.

X’s expected gain vs. Y-I:

$m_{11}p + m_{12}(1-p)$

vs. Y-II:

$m_{21}p + m_{22}(1-p)$

          X-I        X-II
Y-I       $m_{11}$   $m_{12}$
Y-II      $m_{21}$   $m_{22}$

Maximize the minimum.

Try p=0, p=1, and where the lines meet.
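This is a minimal Python sketch of that recipe for X’s side of a 2x2 game. The function name solve_2x2 is hypothetical. Entries should be integers or Fractions so the crossing point stays exact; with floats, the Fraction(...) call below would raise a TypeError. The worst case is the lower of two lines in p, so its maximum lies at p = 0, at p = 1, or where the lines cross.

```python
from fractions import Fraction

def solve_2x2(m11, m12, m21, m22):
    """Maximin mix for X (rows = Y-I, Y-II; columns = X-I, X-II).

    X plays X-I with prob. p. Against Y-I its expected gain is m11*p + m12*(1-p);
    against Y-II it is m21*p + m22*(1-p).
    """
    def worst(p):
        return min(m11 * p + m12 * (1 - p), m21 * p + m22 * (1 - p))

    candidates = [Fraction(0), Fraction(1)]
    denom = (m11 - m12) - (m21 - m22)           # difference of the two slopes
    if denom != 0:
        meet = Fraction(m22 - m12, denom)       # where the two lines cross
        if 0 <= meet <= 1:
            candidates.append(meet)
    best = max(candidates, key=worst)
    return best, worst(best)

p, value = solve_2x2(-5, 10, 5, -5)             # Micro Poker matrix
print(p, value)                                 # 3/5 1
```

On the Micro Poker matrix it returns p = 3/5 with worst-case gain 1, matching the All Strategies slide. On the chance-node matrix, solve_2x2(-8, -2, -8, 3) returns p = 0 (always R) with value -2.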

Solving General mxn

Linear program over $p_1, \dots, p_n$:

$p_1 + \dots + p_n = 1$, $p_i \ge 0$

Maximize X’s gain, $g$, subject to

vs. Y-I: $m_{11}p_1 + \dots + m_{1n}p_n \ge g$

vs. Y-II: $m_{21}p_1 + \dots + m_{2n}p_n \ge g$

…

against all Y strategies.

Issues

Can we solve poker?
• More than 2 players
• Not zero-sum (players can collude)
• Huge state space

Poker: opponent modeling

Bridge: use simulation to approximate

What to Learn

Minimax value in games of chance and the DFS algorithm for computing it.

Converting games to matrix form.

Solving a 2x2 game.

Homework 5 (due 11/7)

1. The value iteration algorithm from the Games of Chance lecture can be applied to deterministic games with loops. Argue that it produces the same answer as the “Loopy” algorithm from the Game Tree lecture.

2. Write the matrix form of the game tree below.

Game Tree

X-1
  L -> Y-2
    L -> +2
    R -> X-4
      L -> -1
      R -> +4
  R -> Y-3
    L -> +5
    R -> +2
