26
Lirong Xia Minimax strategies, alpha beta pruning

Minimax strategies, alpha beta pruningxial/Teaching/2018S/slides/IntroAI_6.pdfTic-tac-toe, chess, checkers The MAX player maximizes result The MIN player minimizes result Ø Minimax

  • Upload
    others

  • View
    16

  • Download
    0

Embed Size (px)

Citation preview

Lirong Xia

Minimax strategies, alpha beta pruning

ØProject 1 due tonight§ Makes sure you DO NOT SEE “ERROR:

Summation of parsed points does not match”

ØProject 2 due in two weeks

2

Reminder

ØNo really mechanical way§ art more than science

ØGeneral guideline: relaxing constraints§ e.g. Pacman can pass through the walls

ØMimic what you would do

3

How to find good heuristics?

Arc Consistency of a CSP

4

Ø A simple form of propagation makes sure all arcs are consistent:

Ø If V loses a value, neighbors of V need to be rechecked!Ø Arc consistency detects failure earlier than forward checkingØ Can be run as a preprocessor or after each assignmentØ Might be time-consuming

Deletefrom tail!

X XX

Limitations of Arc Consistency

5

ØAfter running arc consistency:§ Can have one solution

left

§ Can have multiple solutions left

§ Can have no solutions left (and not know it)

“Sum to 2” gameØ Player 1 moves, then player 2, finally player 1 againØ Move = 0 or 1Ø Player 1 wins if and only if all moves together sum to 2

Player 1

Player 2 Player 2

Player 1

-1

Player 1 Player 1 Player 1

0

0

0

0

00 0

1

11

1 1 11

-1 -1 1 -1 1 1 -1

Player 1’s utility is in the leaves; player 2’s utility is the negative of this

ØAdversarial gameØMinimax search

ØAlpha-beta pruning algorithm

7

Today’s schedule

Adversarial Games

8

Ø Deterministic, zero-sum games:§ Tic-tac-toe, chess, checkers

§ The MAX player maximizes result

§ The MIN player minimizes result

Ø Minimax search:§ A search tree

§ Players alternate turns

§ Each node has a minimax value: best achievable utility against a rational adversary

Computing Minimax Values

9

Ø This is DFS

Ø Two recursive functions:§ max-value maxes the values of successors§ min-value mins the values of successors

Ø Def value (state):If the state is a terminal state: return the state’s utilityIf the agent at the state is MAX: return max-value(state)If the agent at the state is MIN: return min-value(state)

Ø Def max-value(state):Initialize max = -∞For each successor of state:

Compute value(successor)Update max accordingly

return maxØ Def min-value(state): similar to max-value

Minimax Example

10

3 2 2

3

Tic-tac-toe Game Tree

11

12

Renju• 15*15• 5 horizontal, vertical, or

diagonal in a row win• no double-3 or double-4

moves for black• otherwise black’s winning

strategy was computed– L. Victor Allis 1994 (PhD

thesis)

Minimax Properties

13

Ø Time complexity?§

Ø Space complexity?§

Ø For chess, § Exact solution is completely

infeasible

§ But, do we need to explore the whole tree?

( )mO b

( )O bm

35, 100b m≈ ≈

Resource Limits

14

Ø Cannot search to leavesØ Depth-limited search

§ Instead, search a limited depth of tree

§ Replace terminal utilities with an evaluation function for non-terminal positions

Ø Guarantee of optimal play is gone

Evaluation Functions

15

Ø Functions which scores non-terminals

Ø Ideal function: returns the minimax utility of the positionØ In practice: typically weighted linear sum of features:

Ø e.g. , etc.Evals s( ) = w1 f1 s( )+w2 f2 s( )++wn fn s( )

( ) ( )1 # white queens - # black queensf s =

ØSuppose you are the MAX playerØGiven a depth d and current state

ØCompute value(state, d) that reaches depth d§ at depth d, use a evaluation function to

estimate the value if it is non-terminal

16

Minimax with limited depth

17

Improving minimax: pruning

Pruning in Minimax Search

18

ØAn ancestor is a MAX node§ already has an option than my current solution§ my future solution can only be smaller

Alpha-beta pruningØPruning = cutting off parts of the search tree (because

you realize you don’t need to look at them)§ When we considered A* we also pruned large parts of the

search tree

ØMaintain § α = value of the best option for the MAX player along the path

so far§ β = value of the best option for the MIN player along the path

so far§ Initialized to be α = -∞ and β = +∞

ØMaintain and update α and β for each node§ α is updated at MAX player’s nodes§ β is updated at MIN player’s nodes

Alpha-Beta Pruning

20

Ø General configuration§ We’re computing the MIN-VALUE

at n§ We’re looping over n’s children§ n’s value estimate is dropping§ α is the best value that MAX can

get at any choice point along the current path

§ If n becomes worse than α, MAX will avoid it, so can stop considering n’s other children

§ Define β similarly for MIN§ α is usually smaller than β

• Once α >= β, return to the upper layer

Alpha-Beta Pruning Example

21

is MAX’s best alternative here or aboveis MIN’s best alternative here or above

αβ

Alpha-Beta Pruning Example

22

is MAX’s best alternative here or aboveis MIN’s best alternative here or above

αβ

starting /α β

raising α

raising α

lowering β

-+

α

β

= ∞

= ∞

-+

α

β

= ∞

= ∞

-+

α

β

= ∞

= ∞

3+

α

β

=

= ∞

3+

α

β

=

= ∞

-+

αβ= ∞= ∞

-3

αβ= ∞=

-3

αβ= ∞=

-3

αβ= ∞=

-3

αβ= ∞=

83

αβ==

3+

αβ== ∞

32

αβ== 3

+αβ== ∞

314

αβ==

35

αβ==

31

αβ==

Alpha-Beta Pseudocode

23

Alpha-Beta Pruning Properties

24

Ø This pruning has no effect on final result at the root

Ø Values of intermediate nodes might be wrong!§ Important: children of the root may have the wrong value

Ø Good children ordering improves effectiveness of pruning

Ø With “perfect ordering”:§ Time complexity drops to O(bm/2)

§ Doubles solvable depth!

§ Your action looks smarter: more forward-looking with good evaluation function

§ Full search of, e.g. chess, is still hopeless…

ØQ1: write an evaluation function for (state,action) pairs§ the evaluation function is for this question only

ØQ2: minimax search with arbitrary depth and multiple MIN players (ghosts)§ evaluation function on states has been

implemented for youØQ3: alpha-beta pruning with arbitrary depth

and multiple MIN players (ghosts)25

Project 2

ØMinimax search§ with limited depth

§ evaluation function

ØAlpha-beta pruning

ØProject 1 due midnight today

ØProject 2 due in two weeks

26

Recap