
Local Search and Games

Geraint A. Wiggins
Professor of Computational Creativity
Department of Computer Science
Vrije Universiteit Brussel

Objectives

• Learn about

‣ Local search algorithms

‣ Hill-climbing search

‣ Simulated annealing search

‣ Local beam search

‣ Genetic algorithms

‣ Optimal decisions

‣ α-β pruning

‣ Imperfect, real-time decision making

Local Search Algorithms

• Good for optimization problems

‣ no goal test

‣ the path to the goal is irrelevant

‣ the goal state itself is the solution

๏ integrated circuit design

๏ factory-floor layout

๏ automatic programming

๏ telecommunications network optimization

• Find configuration satisfying constraints,

‣ e.g. n-queens (we return to this later)

Local Search Algorithms

• State

‣ keep a single "current" state, try to improve it

‣ cf. keeping the entire path and trying to optimise that

• Advantages

‣ Use little memory (usually constant amount)

‣ Can find reasonable solutions in large or infinite (continuous) problems

Example: the n-Queens puzzle

• Put n queens on an n×n chess board such that no queen can attack another

‣ i.e. no two queens on the same row, column, or diagonal

[Figure: failed solutions to the 4-queens problem]

[Figure: state-space landscape; the objective function plotted over the state space, marking the current state, a local maximum, a plateau, a shoulder, and the global maximum]

State space landscape

• A state has

‣ Location

๏ where it is in the state space

‣ Elevation

๏ its heuristic cost or objective value

• Solutions (two ways of thinking about it, but equivalent)

‣ Heuristic cost (find minimum cost): global minimum

‣ Objective function (find best state, or maximal value): global maximum

Local search algorithms

• “Local” because we don’t care about the path through the search space

• A local search algorithm is

‣ complete: if it always finds a goal if one exists

‣ optimal: if it finds the global minimum/maximum

• Problem: depending on initial state, search can get stuck in local maxima/minima

Hill-climbing search

• Like agenda-based search, but the agenda contains only one item

‣ so we take the best option at each stage, and throw away all the others


Hill-climbing search

• "Like climbing Everest in thick fog with

amnesia"

This algorithm is based on an objective function, how

would you modify it to use a heuristic function?
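The algorithm figure itself does not survive extraction, so here is a minimal Python sketch of steepest-ascent hill-climbing under assumed interfaces: neighbours(state) returns the successor states and f is an objective function to maximise (both names are hypothetical).

    def hill_climbing(start, neighbours, f):
        """Steepest-ascent hill-climbing: move to the best neighbour
        until no neighbour improves on the current state."""
        current = start
        while True:
            best = max(neighbours(current), key=f, default=None)
            if best is None or f(best) <= f(current):
                return current   # a local (not necessarily global) maximum
            current = best

To answer the slide's question: with a heuristic cost h instead of an objective function, take the min over the neighbours and return when the best neighbour's h is no lower than the current state's.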

Hill-climbing: the 8-queens problem

• Local search: use complete states

‣ always 8 queens on board

‣ the numbers in the example count attacks if a piece moves vertically

• Heuristic

‣ h = number of pairs of queens that are attacking each other, either directly or indirectly

‣ h = 17 for the example

‣ h = 0 is a solution state (or global minimum)

• Successors?

‣ The set of states generated by moving one queen


Hill-climbing: the 8-queens problem

• A local minimum with h = 1

• What is the set of successor states?

• Why is this a local minimum?

• What is the next step of hill-climbing?


Hill-climbing search

• Greedy local search

‣ grabs best neighbour without “thinking ahead”

• Rapid improvement in heuristic value

‣ But at a fatal cost

‣ Local maxima/minima will stop search altogether

๏ Ridges: a sequence of local maxima that make it difficult to escape

๏ Plateaux: heuristic values uninformative

๏ Foothills: local maxima that are not global maxima


How bad is the problem?

• Algorithm reaches a point where no progress is made

• To test on the 8-queens example:

‣ run many searches starting from random configurations of 8-queens

๏ too many to do it exhaustively

‣ result

๏ 86% of starting configurations get stuck

๏ 14% of starting configurations succeed

‣ the algorithm works quickly, usually evaluating only 3-4 steps before either getting stuck or finding a solution

How can we solve the problem?

• Allow different moves

‣ e.g., sideways

‣ must account for local maxima (foothills), otherwise we could get stuck in an infinite loop

๏ e.g., cap the number of sideways moves

๏ but we risk losing generality with such ad hoc solutions

• Result:

‣ 6% stuck

‣ 94% solution

• But the search works much more slowly

‣ evaluating 21-64 steps before either getting stuck or finding a solution

Variants of hill-climbing

• Stochastic hill-climbing

‣ choose randomly between available uphill moves

๏ choose from a uniform or non-uniform distribution

‣ converges more slowly but sometimes finds better solutions

๏ optimality still not guaranteed in all cases

• First-choice hill-climbing

‣ good for big problems

๏ stochastic hill-climbing that generates successors at random and picks the first one better than the current state

• Random-restart hill-climbing

‣ if at first you don't succeed, try again, starting from a different place (see the sketch below)

๏ requires that non-optimality can be recognised
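A minimal sketch of random-restart hill-climbing, reusing the hill_climbing sketch above; random_state and is_goal are assumed helper functions (is_goal is what lets non-optimality be recognised).

    def random_restart(random_state, neighbours, f, is_goal, max_restarts=100):
        """Run hill-climbing from fresh random starts until a state
        recognised as optimal is found, or the restart budget runs out."""
        best = None
        for _ in range(max_restarts):
            result = hill_climbing(random_state(), neighbours, f)
            if is_goal(result):
                return result
            if best is None or f(result) > f(best):
                best = result    # remember the best local optimum so far
        return best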

Simulated Annealing search

• Hill-climbing

‣ only improves on the current solution

‣ not complete

๏ may get stuck in local minima/maxima

• Random walk

‣ moves from state to state randomly

‣ complete (given infinite time) but very inefficient

• Simulated Annealing combines these two to give a compromise between search complexity and completeness

Simulated Annealing search

• Escape local maxima by allowing some bad moves

• Gradually decrease frequency of allowed bad moves as search proceeds

• By analogy with annealing, a heat-treatment process in steel production

‣ want crystal structure of metal to be all lined up

‣ heat the metal to make the crystals vibrate and “shake out” irregularities

‣ let it cool slowly so that the more ordered form stays ordered.

Simulated Annealing search

• So in the algorithm, we have an imaginary notion of “temperature”, which allows random movement outside the hill-climb

‣ as the “temperature” drops, less random movement is allowed


Simulated Annealing search

• Like getting a ping-pong ball into the deepest crevice of a bumpy surface

‣ Left alone by itself, ball will roll into a local minimum

‣ If we shake the surface, we can bounce the ball out of a local minimum

‣ The trick is to shake hard enough to get it out of a local minimum, but not hard enough to dislodge it from the global one

‣ We start by shaking hard and then gradually reduce the intensity of shaking

• If the “temperature” decreases slowly enough, simulated annealing search finds a global optimum with probability approaching 1.0

• Widely used in VLSI layout, airline scheduling, etc.
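A minimal Python sketch of simulated annealing, minimising a heuristic cost f; neighbours and the cooling schedule are assumed inputs (this is the textbook scheme, not code from the slides).

    import math
    import random

    def simulated_annealing(start, neighbours, f, schedule):
        """Accept improving moves always; accept "bad" moves with
        probability exp(-delta/T), which falls as the temperature T cools."""
        current, t = start, 0
        while True:
            T = schedule(t)
            if T < 1e-6:                        # effectively frozen: stop
                return current
            candidate = random.choice(neighbours(current))
            delta = f(candidate) - f(current)   # < 0 means improvement
            if delta < 0 or random.random() < math.exp(-delta / T):
                current = candidate
            t += 1

    # e.g. exponential cooling: schedule = lambda t: 100 * (0.95 ** t)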

Local Beam Search

• Keep track of k states rather than just one

• Start with k randomly generated states

• At each iteration

‣ generate all successors of all k states

๏ NB: not the same as an agenda k long!

‣ if any one is a goal state, stop

‣ else select the k best successors from the complete list and repeat

• So this is like a kind of selective breadth-first search

Local Beam Search

• Local beam search looks like running k hill-climbing algorithms in parallel, but it is not

‣ the results of all k states influence each other

‣ if one state generates several good successors, they all end up in the next iteration

‣ states generating bad successors are weeded out

• This is both a strength and a weakness:

‣ unfruitful searches are quickly abandoned and searches making the most progress are intensified

‣ can lead to a lack of diversity: concentration in a small region of the search space

‣ remedy: choose k successors randomly, biasing choice towards good ones
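A minimal sketch of local beam search under assumed helpers random_state, successors, f (a score to maximise), and is_goal:

    def local_beam_search(k, random_state, successors, f, is_goal):
        """Keep the k best states drawn from one shared successor pool,
        not k independent hill-climbs."""
        states = [random_state() for _ in range(k)]
        while True:
            pool = [s for state in states for s in successors(state)]
            for s in pool:
                if is_goal(s):
                    return s
            if not pool:
                return max(states, key=f)   # nowhere left to go
            states = sorted(pool, key=f, reverse=True)[:k]

For the stochastic remedy mentioned above, replace the sorted(...)[:k] line with a random sample of k successors biased towards good ones.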

Genetic Algorithms

• Genetic Algorithms (GAs) are a variant of local beam search

• A successor state is generated by combining two parent states

• Start with a large number of randomly generated states (a population)

• A state is represented as a binary string

• An evaluation function or fitness function assesses the quality of a state

‣ higher values for better states.

• Produce the next generation of states by

‣ selection

‣ crossover

‣ mutation

Problem encoding for GAs

• The way that you choose to encode the problem often has implications for the solution

• Example: 8 queens, 8x8 board

‣ 64 bits, one for each square? (= 8x8 matrix)

‣ 8 octal digits? (one 3-bit number for each column/row)

Genetic Algorithms

• 21641300 (octal)

• 010001110100001011000000 (binary)

• Fitness function: number of non-attacking pairs of queens (min = 0, max = 28)

‣ Fitness here = 6 + 5 + 4 + 3 + 3 + 2 + 0 = 23
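A small sketch that computes this fitness from the 8-digit encoding (digit i gives the row of the queen in column i); on "21641300" it returns 23, matching the slide.

    def fitness(state):
        """Number of non-attacking pairs of queens (max 28)."""
        rows = [int(c) for c in state]
        attacking = 0
        for i in range(8):
            for j in range(i + 1, 8):
                if rows[i] == rows[j] or abs(rows[i] - rows[j]) == j - i:
                    attacking += 1        # same row or same diagonal
        return 28 - attacking

    print(fitness("21641300"))            # 23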


8 Queens & GAs

Draw the board state this corresponds to. Work out the fitness of this state.

43228761

5 minutes

GA Selection

• After encoding states and calculating fitness, select pairs for reproduction

‣ many methods can be used

• Simplest: randomly choose pairs of states with non-uniform probability

State     Fitness   P(selection)
13637441  24        24/(24+23+20+11) = .31
21641300  23        23/(24+23+20+11) = .29
13305013  20        20/(24+23+20+11) = .26
21432102  11        11/(24+23+20+11) = .14
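A minimal sketch of the roulette-wheel rule used here: each state is drawn with probability proportional to its fitness (fitness is the sketch above).

    import random

    def roulette_select(population, fitness):
        """Pick one individual with probability fitness / total fitness."""
        weights = [fitness(ind) for ind in population]
        return random.choices(population, weights=weights, k=1)[0]

    pop = ["13637441", "21641300", "13305013", "21432102"]
    # with the table's fitness values, the selection
    # probabilities are .31, .29, .26 and .14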

[Pie chart: selection proportions 31%, 29%, 26%, 14%]

GA: Selection for Reproduction

[Animated figure, repeated over several slides: pairs are drawn from the four states in the table in proportion to fitness; each selected pair is split at a random crossover point (e.g. 136|37441 and 216|41300) and the tails are swapped, so cross-over leads to a new population containing 13641300 and 21637441. Two new individuals are created from every pair.]

Cross-over

• Pairs are selected by one of a range of methods:

‣ Roulette-wheel (as in our example)

‣ Tournament

• Cross-over point for each pair is randomly selected

‣ Resulting new chromosomes represent new states

[Figure: parent chromosomes combined (⊕) yielding (➽) new chromosomes; NB these are not agreed symbols!]

[Figure: state representation of the parents (13637441, 21641300) and the second offspring, 21637441; two parents cross over to produce two offspring.]

Mutation

• Cross-over is not enough

‣ if the population does not contain examples with each bit of the chromosome at both possible values, parts of the search space are inaccessible

• So introduce mutation

‣ low probability of flipping a random bit at each cross-over step
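A minimal sketch of single-point crossover and mutation on the octal-string encoding; note that on this encoding a mutation re-randomises one digit (one queen's row) rather than flipping one bit of the underlying binary string.

    import random

    def crossover(p1, p2):
        """Split both parents at one random point and swap tails."""
        point = random.randrange(1, len(p1))
        return p1[:point] + p2[point:], p2[:point] + p1[point:]

    def mutate(ind, alphabet="01234567", rate=0.1):
        """With low probability, replace one randomly chosen digit."""
        if random.random() < rate:
            i = random.randrange(len(ind))
            ind = ind[:i] + random.choice(alphabet) + ind[i + 1:]
        return ind

    # with point = 3, crossover("13637441", "21641300") gives
    # ("13641300", "21637441"), as in the worked example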

[Animated figure, repeated over several slides: crossover produces offspring 21637441; a low-probability mutation then flips one bit, giving 21637451.]

The whole process

• Initial Population

‣ Random bit strings – make sure every bit means something

• Selection

‣ Select according to output of fitness function

‣ Various methods (look them up!)

• Cross-over

‣ Random point in chromosome

‣ Various methods, always involving death of unfit chromosomes

• Mutation

‣ To introduce possibilities not currently present in the population

• Repeat from “selection” until fitness levels are as desired
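The whole process can be tied together in one loop; a minimal sketch reusing the fitness, roulette_select, crossover and mutate sketches from earlier (all hypothetical names, not code from the slides).

    import random

    def genetic_algorithm(pop_size=20, generations=1000):
        """Evolve 8-queens states until a fitness-28 solution appears
        or the generation budget runs out."""
        population = ["".join(random.choice("01234567") for _ in range(8))
                      for _ in range(pop_size)]
        for _ in range(generations):
            if max(fitness(ind) for ind in population) == 28:
                break                      # a non-attacking placement found
            next_gen = []
            while len(next_gen) < pop_size:
                p1 = roulette_select(population, fitness)
                p2 = roulette_select(population, fitness)
                for child in crossover(p1, p2):
                    next_gen.append(mutate(child))
            population = next_gen[:pop_size]
        return max(population, key=fitness)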

Adversarial Search and Games

• We have assumed so far that the world we are searching doesn't change

‣ This is often not the case

• Consider Games

‣ Two or more players in competition

‣ There are many kinds of game, but we will only deal with zero-sum games of perfect information

๏ 2 players: player one wins → +1

๏ player two loses → −1

๏ TOTAL: 0

Why study games?

• Games are a form of multi-agent environment

‣ What do other agents do and how do they affect our success?

‣ Cooperative vs. competitive multi-agent environments

‣ Competitive multi-agent environments give rise to adversarial search = games

• Why study games?

‣ Interesting subject of AI study because they are hard for humans

‣ Easy to represent and agents restricted to small number of actions

Adversarial vs. non-adversarial search

• Search – no adversary

‣ Solution is (heuristic) method for finding goal

‣ Heuristics and constraint satisfaction techniques can find optimal solution

‣ Evaluation function: estimate of cost from start to goal through given node

‣ Examples: path planning, scheduling

• Games – adversary

‣ Solution is strategy (strategy specifies move for every possible opponent reply)

‣ Sometimes time limits force an approximate solution

‣ Evaluation function: evaluate quality of game position

‣ Examples: chess, checkers, Othello, war games, simulations of competition for limited resources

Types of Game

                        deterministic                  chance
perfect information     chess, draughts, go, othello   backgammon, monopoly
imperfect information   bridge, poker, scrabble, nuclear war

The MINIMAX method

• Game setup

‣ Two players: MAX and MIN

‣ MAX moves first and they take turns until the game is over

‣ Winner gets prize, loser gets penalty

• Games as search:

‣ Initial state: e.g. board configuration of chess

‣ Successor function: a list of (move, state) pairs specifying legal moves

‣ Terminal test: Is the game finished?

‣ Utility function: gives a numerical value for terminal states

๏ e.g. win (+1), lose (−1), and draw (0) in noughts and crosses (tic-tac-toe)

‣ Each player uses a search tree to determine next move

Partial game tree for noughts and crosses

• This example starts well into the game

• X is MAX

[Figure: partial game tree; at the ply shown MAX has no choice, and the tree then branches over O's replies.]

Optimal Strategies

• Find the contingent strategy for MAX assuming an infallible MIN opponent

• Assumption: Both players play optimally

• Given a game tree, the optimal strategy can be determined by using the minimax value of each node, n:

‣ if n is a terminal node, MINIMAX-VALUE(n) = UTILITY(n)

‣ if n is a MAX node, MINIMAX-VALUE(n) = max over s ∈ successors(n) of MINIMAX-VALUE(s)

‣ if n is a MIN node, MINIMAX-VALUE(n) = min over s ∈ successors(n) of MINIMAX-VALUE(s)

• Do DFS exhaustively to compute the topmost MINIMAX value

• Utility fn:

‣ number of lines with 2 Xs + 10 × number of lines with 3 Xs − number of lines with 2 Os − 10 × number of lines with 3 Os

[Animated figure, repeated over several slides: the partial game tree for noughts and crosses is evaluated bottom-up; leaf utilities such as −2 and 10 are backed up through the tree until the topmost MINIMAX value is known.]

General 2-ply Game Tree

• Imaginary game, made-up values for example here

• The algorithm maximises the worst-case outcome for MAX

[Figure: MAX root over three MIN nodes with leaf values 3 12 8, 2 4 6, and 14 5 2; the MIN nodes evaluate to 3, 2, and 2, so the minimax decision at the root has value 3.]

Multiplayer games

• Replace single zero-sum utility function with a function for each player

• Use a vector of values, one for each player


Problem of minimax search

• The number of game states is exponential in the number of moves

‣ solution: do not examine every node ⇒ alpha-beta pruning

• Remove branches that do not influence the final decision

• Revisit the example …

MINIMAX Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v
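A direct Python transcription of the pseudocode above; successors(state) is assumed to yield (action, state) pairs, and terminal/utility mirror TERMINAL-TEST and UTILITY (a hypothetical interface).

    import math

    def minimax_decision(state, successors, terminal, utility):
        """Return the action whose resulting state has the best
        minimax value for MAX."""
        def max_value(s):
            if terminal(s):
                return utility(s)
            return max((min_value(s2) for _, s2 in successors(s)),
                       default=-math.inf)

        def min_value(s):
            if terminal(s):
                return utility(s)
            return min((max_value(s2) for _, s2 in successors(s)),
                       default=math.inf)

        return max(successors(state), key=lambda pair: min_value(pair[1]))[0]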


More About MINIMAX

• Definition of optimal play for MAX assumes MIN plays optimally

‣ maximizes worst-case outcome for MAX

• But if MIN plays worse than optimally

‣ MAX will do even better

• Complexity:

‣ Time: O(b^m), exponential in the depth of the tree

‣ Space: O(m), linear in the depth of the tree

• Can improve effective complexity by Alpha-Beta (α-β) Pruning

Alpha-beta pruning in MINIMAX

• Use the same algorithm as before, but consider ranges instead of values

‣ a range is expressed as a pair, [lowest, highest] or [α, β]

[Animated figure, repeated over several slides: the same 2-ply tree with leaves 3 12 8, 2 4 6, and 14 5 2. The root's range tightens from [-∞,∞] to [3,∞] once the first MIN node settles at 3; at the second MIN node the range becomes [3,2] as soon as leaf 2 is seen, so leaves 4 and 6 are pruned; the third MIN node's range narrows from [3,∞] to [3,14] to [3,5] to [3,2], and the root's minimax value is 3.]


Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v

(a, s is an [action, state] pair)


General alpha-beta pruning

• Consider a node n somewhere in the tree

• If player has a better choice at

‣ Parent node of n

‣ Or any choice point further up

• n will never be reached in actual play

• Hence when enough is known about n, it can be pruned.

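A Python transcription of the α-β pseudocode above, under the same assumed (action, state) successor interface; it returns the same value as plain minimax while skipping pruned branches.

    import math

    def alpha_beta_value(state, successors, terminal, utility):
        def max_value(s, alpha, beta):
            if terminal(s):
                return utility(s)
            v = -math.inf
            for _, s2 in successors(s):
                v = max(v, min_value(s2, alpha, beta))
                if v >= beta:
                    return v             # MIN will never allow this branch
                alpha = max(alpha, v)
            return v

        def min_value(s, alpha, beta):
            if terminal(s):
                return utility(s)
            v = math.inf
            for _, s2 in successors(s):
                v = min(v, max_value(s2, alpha, beta))
                if v <= alpha:
                    return v             # MAX already has a better option
                beta = min(beta, v)
            return v

        return max_value(state, -math.inf, math.inf)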

Properties of Alpha-Beta pruning

• Pruning does not affect final results

• Entire subtrees can be pruned

• Good move ordering improves effectiveness of pruning

• With “perfect ordering”, time complexity is O(b^(m/2)) = O((b^(1/2))^m)

‣ Branching factor of √b

‣ Best case Alpha-beta pruning can look twice as far ahead as plain MINIMAX in a given amount of time

• Repeated states are again possible

‣ store them in memory; pruning gives a large gain that offsets the memory cost

Games of imperfect information

• Minimax and alpha-beta pruning require too many leaf-node evaluations

• May be impractical within a reasonable amount of time

• Shannon (1950):

‣ Cut off search earlier (replace TERMINAL-TEST by CUTOFF-TEST)

‣ Apply heuristic evaluation function EVAL (replacing UTILITY function of alpha-beta)

Cutting off search

• Change the termination condition of MINIMAX:

‣ if TERMINAL-TEST(state) then return UTILITY(state)

into

‣ if CUTOFF-TEST(state,depth) then return EVAL(state)

• Can introduce a fixed or dynamic depth limit

‣ selected so (e.g.) time will not exceed what the rules of the game allow

• When cutoff occurs, and we are not at a terminal state, heuristic evaluation is performed

‣ in other words, if you don’t know, then take your best guess
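A sketch of Shannon's substitution in code: cutoff replaces the terminal test and evaluate replaces the utility function, both assumed to be supplied (e.g. cutoff = lambda s, d: d >= 4 or terminal(s)).

    def cutoff_max(state, depth, successors, cutoff, evaluate):
        if cutoff(state, depth):          # was: TERMINAL-TEST(state)
            return evaluate(state)        # was: UTILITY(state); a best guess
        return max(cutoff_min(s, depth + 1, successors, cutoff, evaluate)
                   for _, s in successors(state))

    def cutoff_min(state, depth, successors, cutoff, evaluate):
        if cutoff(state, depth):
            return evaluate(state)
        return min(cutoff_max(s, depth + 1, successors, cutoff, evaluate)
                   for _, s in successors(state))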

Heuristic Evaluation

• Idea:

‣ produce an estimate of the expected utility of the game from a given position

• Performance depends on quality of EVAL

• Requirements:

‣ EVAL should order terminal nodes in the same way as UTILITY

‣ computation must not take too long

‣ for non-terminal states, EVAL should be strongly correlated with the actual chance of winning

‣ only useful for quiescent states (= no wild swings in value in the near future)

Games that include chance

• In backgammon, you roll 2 dice, and then use both numbers in either order

‣ so a double, [1,1], has probability 1/36; all other moves have probability 1/18

• In this tree, we calculate the expected value of MINIMAX


Expected MINIMAX value

• Given a game tree, the optimal strategy can be determined by using the expected minimax value of each node n:

‣ if n is a terminal node,

๏ EXPECTED-MINIMAX-VALUE(n) = UTILITY(n)

‣ if n is a MAX node,

๏ EXPECTED-MINIMAX-VALUE(n) = max over s ∈ successors(n) of EXPECTED-MINIMAX-VALUE(s)

‣ if n is a MIN node,

๏ EXPECTED-MINIMAX-VALUE(n) = min over s ∈ successors(n) of EXPECTED-MINIMAX-VALUE(s)

‣ if n is a CHANCE node,

๏ EXPECTED-MINIMAX-VALUE(n) = ∑ over s ∈ successors(n) of P(s) · EXPECTED-MINIMAX-VALUE(s)
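A minimal sketch of these four equations over an assumed node interface: each node has a kind, terminal nodes carry a utility, MAX/MIN nodes a list of children, and chance nodes a list of (probability, child) pairs.

    def expected_minimax(node):
        if node.kind == "terminal":
            return node.utility
        if node.kind == "max":
            return max(expected_minimax(c) for c in node.children)
        if node.kind == "min":
            return min(expected_minimax(c) for c in node.children)
        # chance node: probability-weighted average over outcomes
        return sum(p * expected_minimax(c) for p, c in node.children)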

Position evaluation with CHANCE nodes

• On left, A1 wins; on right, A2 wins

‣ outcome of evaluation may change if values are not scaled linearly

• If you need to scale values, use a positive linear transformation


Discussion

• Examine the section on state-of-the-art games yourself

• Minimax assumes the right tree is better than the left, yet …

‣ we could return a probability distribution over possible values

‣ but that is an expensive calculation

Summary

• Games are fun, useful and potentially distracting

• They illustrate many important points about AI

‣ Perfection is unattainable, so approximation is necessary

‣ We need a good idea of strategy to win

๏ and MINIMAX allows us to express that in terms of final states, which is relatively easy

‣ Uncertainty constrains the assignment of values to states

• Games are to AI as grand prix racing is to car design.

Recommended