Local Search and Games
Geraint A. Wiggins
Professor of Computational Creativity
Department of Computer Science
Vrije Universiteit Brussel
Objectives
• Learn about
‣ Local search algorithms
‣ Hill-climbing search
‣ Simulated annealing search
‣ Local beam search
‣ Genetic algorithms
‣ Optimal decisions
‣ α-β pruning
‣ Imperfect, real-time decision making
Local Search Algorithms
• Good for optimization problems
‣ no goal test
‣ the path to the goal is irrelevant
‣ the goal state itself is the solution
๏ integrated circuit design
๏ factory-floor layout
๏ automatic programming
๏ telecommunications network optimization
• Find configuration satisfying constraints,
‣ e.g. n-queens (we return to this later)
Local Search Algorithms
• State
‣ keep a single "current" state, try to improve it
‣ cf. keeping the entire path and trying to optimise that
• Advantages
‣ Use little memory (usually constant amount)
‣ Can find reasonable solutions in large or infinite (continuous) problems
Example: the n-Queens puzzle
• Put n queens on an n×n chess board such that no queen can attack another
‣ i.e. no two queens on the same row, column, or diagonal
Failed solutions to the 4 Queens problem
[Figure: a one-dimensional state-space landscape plotting objective function against state space, showing the current state, a local maximum, a plateau, a shoulder, and the global maximum.]
State space landscape
• A state has
‣ Location
๏ where it is in the state space
‣ Elevation
๏ its heuristic cost or objective value
• Solutions (two ways of thinking about it, but equivalent)
‣ Heuristic cost (find minimum cost): global minimum
‣ Objective function (find best state or maximal cost): global maximum
Local search algorithms
• “Local” because we don’t care about the path through the search space
• A local search algorithm is
‣ complete if it always finds a goal when one exists
‣ optimal if it finds the global minimum/maximum
• Problem: depending on initial state, search can get stuck in local maxima/minima
Hill-climbing search
• Like agenda-based search, but the agenda contains only one item
‣ so we take the best option at each stage, and throw away all the others
Hill-climbing search
• "Like climbing Everest in thick fog with amnesia"
• This algorithm is based on an objective function; how would you modify it to use a heuristic function?
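The hill-climbing algorithm itself is not reproduced in this transcript, so here is a minimal Python sketch of steepest-ascent hill climbing with an objective function; the toy objective and integer neighbourhood are illustrative assumptions, not from the slides.

```python
def hill_climb(start, objective, neighbours):
    """Steepest-ascent hill climbing: move to the best neighbour
    until no neighbour improves on the current state."""
    current = start
    while True:
        best = max(neighbours(current), key=objective)
        if objective(best) <= objective(current):
            return current          # local (possibly global) maximum
        current = best

# Toy objective with a single peak at x = 3; neighbours are x-1 and x+1.
f = lambda x: -(x - 3) ** 2
print(hill_climb(0, f, lambda x: [x - 1, x + 1]))  # → 3
```

To answer the slide's question: with a heuristic (cost) function, replace `max` with `min` and flip the comparison, so the search descends towards the global minimum instead.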
Hill-climbing: the 8-queens problem
• Local search: use complete states
‣ always 8 queens on board
‣ the numbers in the (omitted) example figure count the attacks that would result from moving a queen within its column
• Heuristic
‣ h = number of pairs of queens that are attacking each other, either directly or indirectly
‣ h = 17 for the example
‣ h = 0 is a solution state (or global minimum)
• Successors?
‣ The set of states generated by moving one queen
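The heuristic h can be checked in code. Representing a state as a list where `state[i]` is the row of the queen in column i is an assumption consistent with the complete-state formulation above.

```python
from itertools import combinations

def h(state):
    """Number of pairs of queens attacking each other, directly or
    indirectly; state[i] is the row of the queen in column i."""
    return sum(1
               for (i, ri), (j, rj) in combinations(enumerate(state), 2)
               if ri == rj or abs(ri - rj) == j - i)

print(h([0] * 8))                    # all queens on one row → 28 pairs
print(h([0, 4, 7, 5, 2, 6, 1, 3]))  # a known solution → 0
```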
Hill-climbing: the 8-queens problem
• A local minimum with h = 1
• What is the set of successor states?
• Why is this a local minimum?
• What is the next step of hill-climbing?
Hill-climbing search
• Greedy local search
‣ grabs best neighbour without “thinking ahead”
• Rapid improvement in heuristic value
‣ But at a fatal cost
‣ Local maxima/minima will stop search altogether
๏ Ridges: a sequence of local maxima that make it difficult to escape
๏ Plateaux: heuristic values uninformative
๏ Foothills: local maxima that are not global maxima
How bad is the problem?
• Algorithm reaches a point where no progress is made
• To test on the 8-queens example:
‣ run many searches starting from random configurations of 8-queens
๏ too many to do it exhaustively
‣ result
๏ 86% of starting configurations get stuck
๏ 14% of starting configurations succeed
‣ the algorithm works quickly, usually only evaluating 3-4 steps before either getting stuck or finding a solution
How can we solve the problem?
• Allow different moves
‣ e.g., sideways
‣ must account for local maxima (foothills), otherwise we could get stuck in an infinite loop
๏ e.g., cap the number of sideways moves
๏ but we risk losing generality with such ad hoc solutions
• Result:
‣ 6% stuck
‣ 94% solution
• Works much more slowly
‣ evaluating 21-64 steps before either getting stuck or finding a solution
Variants of hill-climbing
• Stochastic hill-climbing
‣ choose randomly between available uphill moves
๏ choosing from a uniform or non-uniform distribution
‣ converges more slowly but sometimes finds better solutions
๏ optimality still not guaranteed in all cases
• First-choice hill-climbing
‣ good for big problems
๏ a form of stochastic hill-climbing: generate successors randomly and pick the first one that is better than the current state
• Random-restart hill-climbing
‣ if at first you don't succeed, try again, starting from a different place
๏ requires that non-optimality can be recognised
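Random-restart hill-climbing on 8-queens can be sketched as follows: a minimal Python sketch using steepest descent on the attacking-pairs heuristic (here, recognising success is easy because h = 0 identifies a solution).

```python
import random
from itertools import combinations

def h(state):
    # pairs of queens attacking each other; state[i] = row of queen in column i
    return sum(1 for (i, ri), (j, rj) in combinations(enumerate(state), 2)
               if ri == rj or abs(ri - rj) == j - i)

def hill_climb(state):
    while True:
        # every state reachable by moving one queen within its column
        succs = [state[:c] + [r] + state[c + 1:]
                 for c in range(8) for r in range(8) if r != state[c]]
        best = min(succs, key=h)
        if h(best) >= h(state):
            return state            # stuck at a local minimum, or solved
        state = best

def random_restart():
    while True:                     # restart until a run reaches h = 0
        result = hill_climb([random.randrange(8) for _ in range(8)])
        if h(result) == 0:
            return result

random.seed(0)
print(h(random_restart()))  # → 0
```

Since roughly 14% of restarts succeed, only a handful of restarts are needed on average.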
Simulated Annealing search
• Hill-climbing
‣ only improves on the current solution
‣ not complete
๏ may get stuck in local minima/maxima
• Random walk
‣ moves from state to state randomly
‣ complete (given infinite time) but very inefficient
• Simulated Annealing combines these two to give a compromise between search complexity and completeness
Simulated Annealing search
• Escape local maxima by allowing some bad moves
• Gradually decrease frequency of allowed bad moves as search proceeds
• By analogy with annealing, a controlled heating and slow cooling process in metallurgy (e.g. steel production)
‣ want crystal structure of metal to be all lined up
‣ heat the metal to make the crystals vibrate and “shake out” irregularities
‣ let it cool slowly so that the more ordered form stays ordered.
Simulated Annealing search
• So in the algorithm, we have an imaginary notion of “temperature”, which allows random movement outside the hill-climb
‣ as the “temperature” drops, less random movement is allowed
Simulated Annealing search
• Like getting a ping-pong ball into the deepest crevice of a bumpy surface
‣ Left alone by itself, ball will roll into a local minimum
‣ If we shake the surface, we can bounce the ball out of a local minimum
‣ The trick is to shake hard enough to bounce the ball out of a local minimum, but not so hard that we dislodge it from the global one
‣ We start by shaking hard and then gradually reduce the intensity of shaking
• If the “temperature” decreases slowly enough, simulated annealing search finds a global optimum with probability approaching 1.0
• Widely used in VLSI layout, airline scheduling, etc.
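A minimal Python sketch of the idea: the geometric cooling schedule, the toy cost function, and the step size are all assumptions for illustration, not prescribed by the algorithm.

```python
import math
import random

def simulated_annealing(start, cost, neighbour, t0=10.0, cooling=0.95,
                        steps=400):
    """Minimise `cost`; worse moves are accepted with probability
    exp(-delta/T), and T decays geometrically."""
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = neighbour(current)
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate     # accept: downhill, or uphill by luck
        if cost(current) < cost(best):
            best = current          # remember the best state ever seen
        t *= cooling
    return best

random.seed(1)
cost = lambda x: (x - 5) ** 2       # toy landscape with minimum at x = 5
step = lambda x: max(0, min(10, x + random.choice((-1, 1))))
print(simulated_annealing(8, cost, step))
```

Early on (T large) almost any move is accepted; as T shrinks, exp(-delta/T) vanishes for uphill moves and the search behaves like plain hill climbing.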
Local Beam Search
• Keep track of k states rather than just one
• Start with k randomly generated states
• At each iteration
‣ generate all successors of all k states
๏ NB: not the same as an agenda k long!
‣ if any one is a goal state, stop
‣ else select the k best successors from the complete list and repeat
• So this is like a kind of selective breadth-first search
Local Beam Search
• Local beam search looks like running k hill-climbing algorithms in parallel, but it is not
‣ the results of all k states influence each other
‣ if one state generates several good successors, they all end up in the next iteration
‣ states generating bad successors are weeded out
• This is both a strength and a weakness:
‣ unfruitful searches are quickly abandoned and searches making the most progress are intensified
‣ can lead to a lack of diversity: concentration in a small region of the search space
‣ remedy: choose k successors randomly, biasing choice towards good ones
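The shared-pool behaviour described above can be sketched in a few lines of Python; the toy objective, neighbourhood, and starting states are assumptions for illustration.

```python
def local_beam_search(starts, objective, neighbours, is_goal):
    """Keep the k best states; expand all of them into one shared pool
    each iteration (not k independent hill-climbing searches)."""
    beam = list(starts)
    k = len(beam)
    while True:
        goals = [s for s in beam if is_goal(s)]
        if goals:
            return goals[0]
        pool = {s for state in beam for s in neighbours(state)}
        beam = sorted(pool, key=objective, reverse=True)[:k]

f = lambda x: -(x - 7) ** 2         # toy objective, maximum at 7
print(local_beam_search([0, 2, 20], f,
                        lambda x: [x - 1, x + 1],
                        lambda x: f(x) == 0))  # → 7
```

Note how the unpromising start at 20 is weeded out after the first iteration, while the beam concentrates around the fruitful region: exactly the strength (and the diversity risk) described above.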
Genetic Algorithms
• Genetic Algorithms (GAs) are a variant of local beam search
• A successor state is generated by combining two parent states
• Start with a large number of randomly generated states (a population)
• A state is represented as a binary string
• An evaluation function or fitness function assesses the quality of a state
‣ higher values for better states.
• Produce the next generation of states by
‣ selection
‣ crossover
‣ mutation
Problem encoding for GAs
• The way that you choose to encode the problem often has implications for the solution
• Example: 8 queens, 8x8 board
‣ 64 bits, one for each square? (= 8x8 matrix)
‣ 8 octal digits? (one 3-bit number for each column/row)
Genetic Algorithms
• 21641300 (octal)
• 010001110010001011000000 (binary)
• Fitness function: number of non-attacking pairs of queens (min = 0, max = 28)
‣ Fitness here = 6 + 5 + 4 + 3 + 3 + 2 + 0 = 23
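The fitness function can be checked in code; treating each octal digit as the row of the queen in its column is an assumption consistent with the encoding slide.

```python
def fitness(chromosome):
    """28 minus the number of attacking pairs; 28 (no attacks) is a
    solution, matching the slide's min = 0, max = 28."""
    rows = [int(d) for d in chromosome]          # one octal digit per column
    attacks = sum(1 for i in range(8) for j in range(i + 1, 8)
                  if rows[i] == rows[j] or abs(rows[i] - rows[j]) == j - i)
    return 28 - attacks

print(fitness("21641300"))  # → 23, as computed on the slide
print(fitness("13637441"))  # → 24
```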
8 Queens & GAs
• Draw the board state 43228761 corresponds to
• Work out the fitness of this state
(5 minutes)
GA Selection
• After encoding states and calculating fitness, select pairs for reproduction
‣ many methods can be used
• Simplest: randomly choose pairs of states with non-uniform probability
State Fitness P(selection)
13637441 24 24/(24+23+20+11) = .31
21641300 23 23/(24 + 23 + 20 + 11) = .29
13305013 20 20/(24 + 23 + 20 + 11) = .26
21432102 11 11/(24 + 23 + 20 + 11) = .14
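The table's probabilities can be reproduced with a short sketch of fitness-proportional (roulette-wheel) selection; `random.choices` does the weighted sampling directly.

```python
import random

def selection_probs(fitnesses):
    """Fitness-proportional selection probabilities."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def roulette_pick(population, fitnesses):
    # random.choices implements weighted (roulette-wheel) sampling
    return random.choices(population, weights=fitnesses, k=2)

pop = ["13637441", "21641300", "13305013", "21432102"]
fit = [24, 23, 20, 11]
print([round(p, 2) for p in selection_probs(fit)])  # → [0.31, 0.29, 0.26, 0.14]
print(roulette_pick(pop, fit))
```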
GA: Selection for Reproduction
[Figure: roulette wheel with slices of 31%, 29%, 26% and 14%, spun to pick pairs of parents for reproduction.]
GA: Crossover
• From each individual randomly select a crossover point
• Note: 2 new individuals are created from every pair
[Animation: parents 13637441 and 21641300 are split after the third digit (136|37441 and 216|41300) and recombined, producing offspring 13641300 and 21637441.]
Cross-over
• Pairs are selected by one of a range of methods:
‣ Roulette-wheel (as in our example)
‣ Tournament
• Cross-over point for each pair is randomly selected
‣ Resulting new chromosomes represent new states
⊕ ➽ (NB: not agreed symbols!)
Mutation
• Cross-over is not enough
‣ if the population does not contain examples that have each bit of the chromosome at both possible values, parts of the search space are inaccessible
• So introduce mutation
‣ low probability of flipping a random bit at each cross-over step
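The two operators can be sketched on the octal encoding, using the parents from the earlier example; note the mutation here redraws a whole octal digit rather than flipping a single bit of its 3-bit encoding, a simplification of what the slide describes.

```python
import random

def crossover(parent1, parent2, point):
    """Swap tails after `point`: each pair of parents yields two offspring."""
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(chromosome, rate=0.05):
    """With low probability, redraw each digit as a random row (0-7)."""
    return "".join(str(random.randrange(8)) if random.random() < rate else d
                   for d in chromosome)

# The slides' parents, split after the third digit (136|37441, 216|41300):
print(crossover("13637441", "21641300", 3))  # → ('13641300', '21637441')
```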
The whole process
• Initial Population
‣ Random bit strings – make sure every bit means something
• Selection
‣ Select according to output of fitness function
‣ Various methods (look them up!)
• Cross-over
‣ Random point in chromosome
‣ Various methods, always involving death of unfit chromosomes
• Mutation
‣ To introduce possibilities not currently present in the population
• Repeat from “selection” until fitness levels are as desired
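The whole loop can be sketched for 8-queens; the population size, generation count, mutation rate, and the elitism step (keeping the best individual each generation) are assumptions for illustration, not taken from the slides.

```python
import random

def fitness(chromosome):
    rows = [int(d) for d in chromosome]
    attacks = sum(1 for i in range(8) for j in range(i + 1, 8)
                  if rows[i] == rows[j] or abs(rows[i] - rows[j]) == j - i)
    return 28 - attacks             # 28 non-attacking pairs = solution

def evolve(pop_size=20, generations=50, mutation_rate=0.05):
    random_state = lambda: "".join(str(random.randrange(8)) for _ in range(8))
    population = [random_state() for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(c) for c in population]
        nxt = [max(population, key=fitness)]     # elitism (an addition)
        while len(nxt) < pop_size:
            # selection: fitness-proportional, then crossover and mutation
            p1, p2 = random.choices(population, weights=weights, k=2)
            cut = random.randrange(1, 8)         # random crossover point
            child = p1[:cut] + p2[cut:]
            child = "".join(str(random.randrange(8))
                            if random.random() < mutation_rate else d
                            for d in child)
            nxt.append(child)
        population = nxt
    return max(population, key=fitness)

random.seed(2)
best = evolve()
print(best, fitness(best))
```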
Adversarial Search and Games
• We have assumed so far that the world we are searching doesn't change
‣ This is often not the case
• Consider Games
‣ Two or more players in competition
‣ There are many kinds of game, but we will only deal with zero-sum games of perfect information
๏ 2 players: player one wins → +1
๏ player two loses → −1
๏ TOTAL 0
Why study games?
• Games are a form of multi-agent environment
‣ What do other agents do and how do they affect our success?
‣ Cooperative vs. competitive multi-agent environments
‣ Competitive multi-agent environments give rise to adversarial search = games
• Why study games?
‣ Interesting subject of AI study because they are hard for humans
‣ Easy to represent, and agents are restricted to a small number of actions
Adversarial vs. non-adversarial search
• Search – no adversary
‣ Solution is (heuristic) method for finding goal
‣ Heuristics and constraint satisfaction techniques can find optimal solution
‣ Evaluation function: estimate of cost from start to goal through given node
‣ Examples: path planning, scheduling
• Games – adversary
‣ Solution is strategy (strategy specifies move for every possible opponent reply)
‣ Sometimes time limits force an approximate solution
‣ Evaluation function: evaluate quality of game position
‣ Examples: chess, checkers, Othello, war games, simulations of competition for limited resources
Types of Game

                      | deterministic                | chance
perfect information   | chess, draughts, go, othello | backgammon, monopoly
imperfect information | bridge, poker, scrabble, nuclear war
The MINIMAX method
• Game setup
‣ Two players: MAX and MIN
‣ MAX moves first and they take turns until the game is over
‣ Winner gets prize, loser gets penalty
• Games as search:
‣ Initial state: e.g. board configuration of chess
‣ Successor function: list of (move, state) pairs specifying legal moves
‣ Terminal test: Is the game finished?
‣ Utility function: gives numerical value of terminal states
๏ e.g. win (+1), lose (-1) and draw (0) in noughts and crosses (tic-tac-toe)
‣ Each player uses a search tree to determine next move
Partial game tree for noughts and crosses
• This example starts well into the game
• X is MAX
• MAX has no choice at the root
[Figure: noughts-and-crosses board positions forming the partial game tree.]
Optimal Strategies
• Find the contingent strategy for MAX assuming an infallible MIN opponent
• Assumption: Both players play optimally
• Given a game tree, the optimal strategy can be determined by using the minimax value of each node, n:
‣ if n is a terminal node, MINIMAX-VALUE(n) = UTILITY(n)
‣ if n is a MAX node, MINIMAX-VALUE(n) = maxs ∈ successors(n) MINIMAX-VALUE(s)
‣ if n is a MIN node, MINIMAX-VALUE(n) = mins ∈ successors(n) MINIMAX-VALUE(s)
• Do DFS exhaustively to compute the topmost MINIMAX value
• Utility fn:
‣ (number of lines with 2 Xs) + 10 × (number of lines with 3 Xs) − (number of lines with 2 Os) − 10 × (number of lines with 3 Os)
Partial game tree for noughts and crosses
[Animation: the same partial game tree, with the DFS backing up utility values leaf by leaf (values such as −2 and 10 appear at the evaluated nodes) until the topmost MINIMAX value is computed.]
General 2-ply Game Tree
• Imaginary game, made-up values for example here
• Leaf utilities: 3 12 8 | 2 4 6 | 14 5 2
• The three MIN nodes back up 3, 2 and 2; MAX's minimax decision at the root has value 3
• The algorithm maximises the worst-case outcome for MAX
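The 2-ply example can be evaluated with a short recursive sketch; encoding the tree as nested lists with integer leaves is an illustrative assumption.

```python
def minimax(node, is_max=True):
    """Exhaustive DFS: a node is either a utility (int) or a list of
    child nodes, with MAX and MIN alternating by level."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# The 2-ply example tree: MAX at the root, three MIN nodes below.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # → 3
```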
Multiplayer games
• Replace single zero-sum utility function with a function for each player
• Use a vector of values, one for each player
Problem of minimax search
• Number of game states is exponential in the number of moves
‣ Solution: do not examine every node → alpha-beta pruning
• Remove branches that do not influence the final decision
• Revisit example …
MINIMAX algorithm

function MINIMAX-DECISION(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state)
    return the action in SUCCESSORS(state) with value v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s))
    return v

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s))
    return v
More About MINIMAX
• Definition of optimal play for MAX assumes MIN plays optimally
‣ maximizes worst-case outcome for MAX
• But if MIN plays worse than optimally
‣ MAX will do even better
• Complexity:
‣ Time: O(b^m) - exponential in depth of tree
‣ Space: O(m) - linear in depth of tree
• Can improve effective complexity by Alpha-Beta (α-β) Pruning
Alpha-beta pruning in MINIMAX
• Use the same algorithm as before, but consider ranges instead of values
‣ a range is expressed as a pair, [lowest, highest] or [α, β]
• On the 2-ply example tree (leaves 3 12 8 | 2 4 6 | 14 5 2):
‣ the root starts at [-∞, ∞]; the first MIN node resolves to 3, so the root becomes [3, ∞]
‣ the second MIN node reaches [3, 2] as soon as it sees the leaf 2, so its remaining leaves (4 and 6) are pruned
‣ the third MIN node narrows through [3, 14] and [3, 5] to [3, 2]
‣ the MINIMAX value of the root is therefore 3
Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, −∞, +∞)
    return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← −∞
    for a, s in SUCCESSORS(state) do
        v ← MAX(v, MIN-VALUE(s, α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
        v ← MIN(v, MAX-VALUE(s, α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v

(a, s is an [action, state] pair)
General alpha-beta pruning
• Consider a node n somewhere in the tree
• If player has a better choice at
‣ Parent node of n
‣ Or any choice point further up
• n will never be reached in actual play
• Hence when enough is known about n, it can be pruned.
Properties of Alpha-Beta pruning
• Pruning does not affect final results
• Entire subtrees can be pruned
• Good move ordering improves effectiveness of pruning
• With "perfect ordering", time complexity is O(b^(m/2)) = O((√b)^m)
‣ an effective branching factor of √b
‣ in the best case, alpha-beta pruning can look twice as far ahead as plain MINIMAX in a given amount of time
• Repeated states are again possible
‣ store them in memory; the gain from pruning can outweigh the memory cost
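A Python sketch of the pruning on the 2-ply example tree; the `seen` list is an addition for illustration, recording which leaves actually get evaluated.

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"),
              is_max=True, seen=None):
    """Minimax with alpha-beta pruning over a nested-list tree."""
    if isinstance(node, int):
        if seen is not None:
            seen.append(node)
        return node
    v = float("-inf") if is_max else float("inf")
    for child in node:
        value = alphabeta(child, alpha, beta, not is_max, seen)
        if is_max:
            v = max(v, value)
            if v >= beta:
                return v            # beta cut-off
            alpha = max(alpha, v)
        else:
            v = min(v, value)
            if v <= alpha:
                return v            # alpha cut-off
            beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # the 2-ply example tree
seen = []
value = alphabeta(tree, seen=seen)
print(value, seen)  # → 3 [3, 12, 8, 2, 14, 5, 2]: leaves 4 and 6 pruned
```

The result equals plain MINIMAX, but only 7 of the 9 leaves are evaluated: the pruned branches cannot influence the final decision.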
Games of imperfect information
• Minimax and alpha-beta pruning require too many leaf-node evaluations
• This may be impractical within a reasonable amount of time
• Shannon (1950):
‣ Cut off search earlier (replace TERMINAL-TEST by CUTOFF- TEST)
‣ Apply heuristic evaluation function EVAL (replacing UTILITY function of alpha-beta)
Cutting off search
• Change the termination condition of MINIMAX:
‣ if TERMINAL-TEST(state) then return UTILITY(state)
into
‣ if CUTOFF-TEST(state,depth) then return EVAL(state)
• Can introduce a fixed or dynamic depth limit
‣ selected so (e.g.) time will not exceed what the rules of the game allow
• When cutoff occurs, and we are not at a terminal state, heuristic evaluation is performed
‣ in other words, if you don’t know, then take your best guess
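The CUTOFF-TEST/EVAL substitution can be sketched as follows; the averaging EVAL is a made-up heuristic purely for illustration, not a recommended evaluation function.

```python
def depth_limited_minimax(node, depth, evaluate, is_max=True):
    """Minimax with a cutoff: below `depth`, non-terminal nodes are
    scored by a heuristic EVAL instead of being expanded further."""
    if isinstance(node, int):       # terminal: exact utility
        return node
    if depth == 0:                  # CUTOFF-TEST fires: take your best guess
        return evaluate(node)
    values = [depth_limited_minimax(child, depth - 1, evaluate, not is_max)
              for child in node]
    return max(values) if is_max else min(values)

# A made-up EVAL for illustration: average the leaves under a node.
def leaves(node):
    return [node] if isinstance(node, int) else \
           [x for child in node for x in leaves(child)]

eval_fn = lambda node: sum(leaves(node)) / len(leaves(node))
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(depth_limited_minimax(tree, 2, eval_fn))  # full depth: exact value 3
print(depth_limited_minimax(tree, 1, eval_fn))  # cutoff: estimates instead
```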
Heuristic Evaluation
• Idea:
‣ produce an estimate of the expected utility of the game from a given position
• Performance depends on quality of EVAL
• Requirements:
‣ EVAL should value terminal-nodes in the same order as UTILITY
‣ Computation must not take too long
‣ For non-terminal states, EVAL should be strongly correlated with the actual chance of winning
‣ Only useful for quiescent states (i.e. no wild swings in value in the near future)
Games that include chance
• In backgammon, you roll 2 dice, and then use both numbers in either order
‣ so a double, [1,1], has probability 1/36; all other moves have probability 1/18
• In this tree, we calculate the expected value of MINIMAX
Expected MINIMAX value
• Given a game tree, the optimal strategy can be determined by using the expected minimax value of each node, n:
‣ if n is a terminal node,
๏ EXPECTED-MINIMAX-VALUE(n) = UTILITY(n)
‣ if n is a MAX node,
๏ EXPECTED-MINIMAX-VALUE(n) = max s ∈ successors(n) EXPECTED-MINIMAX-VALUE(s)
‣ if n is a MIN node,
๏ EXPECTED-MINIMAX-VALUE(n) = min s ∈ successors(n) EXPECTED-MINIMAX-VALUE(s)
‣ if n is a CHANCE node,
๏ EXPECTED-MINIMAX-VALUE(n) = ∑ s ∈ successors(n) P(s) · EXPECTED-MINIMAX-VALUE(s)
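The four cases can be sketched directly; the tiny tree and its probabilities are invented for illustration (the backgammon dice distribution is not modelled here).

```python
def expectiminimax(node):
    """A node is a number (utility), ("max", children),
    ("min", children), or ("chance", [(probability, child), ...])."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)   # chance node

# MAX chooses between two chance nodes with made-up probabilities:
tree = ("max", [("chance", [(0.25, 4), (0.75, 8)]),    # expectation 7.0
                ("chance", [(0.5, 2), (0.5, 10)])])    # expectation 6.0
print(expectiminimax(tree))  # → 7.0
```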
Position evaluation with CHANCE nodes
• On left, A1 wins; on right, A2 wins
‣ outcome of evaluation may change if values are not scaled linearly
• If you need to scale values, use a positive linear transformation
Discussion
• Examine the section on state-of-the-art games yourself
• Minimax assumes the right tree is better than the left, yet …
‣ we could return a probability distribution over possible values
‣ but that is an expensive calculation
Summary
• Games are fun, useful and potentially distracting
• They illustrate many important points about AI
‣ Perfection is unattainable, so approximation is necessary
‣ We need a good idea of strategy to win
๏ and MINIMAX allows us to express that in terms of final states, which is relatively easy
‣ Uncertainty constrains the assignment of values to states
• Games are to AI as grand prix racing is to car design.