david-kelley
Game Playing
Revision
Mini-Max search
Alpha-Beta pruning
General concerns on games

Why study board games?
- One of the oldest subfields of AI (Shannon and Turing, 1950)
- Abstract and pure form of competition that seems to require intelligence
- Easy to represent the states and actions
- Very little world knowledge required!
- Game playing is a special case of a search problem, with some new requirements.

Types of games

                        Deterministic                  Chance
Perfect information     Chess, checkers, go, othello   Backgammon, monopoly
Imperfect information   Sea battle                     Bridge, poker, scrabble, nuclear war

Why new techniques for games?
- "Contingency" problem: we don't know the opponent's move!
- The size of the search space:
  - Chess: ~15 moves possible per state, 80 ply, so 15^80 nodes in the tree
  - Go: ~200 moves per state, 300 ply, so 200^300 nodes in the tree
- Game playing algorithms:
  - Search the tree only up to some depth bound
  - Use an evaluation function at the depth bound
  - Propagate the evaluation upwards in the tree
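
To get a feeling for these sizes, the two powers can be evaluated exactly with Python's arbitrary-precision integers. This is only an illustrative sketch; the function name `tree_size_digits` is made up here, and the branching factors and ply counts are simply the figures quoted on the slide.

```python
def tree_size_digits(branching: int, ply: int) -> int:
    """Number of decimal digits of branching**ply (exact integer arithmetic)."""
    return len(str(branching ** ply))


# Figures taken from the slide: (moves per state, ply).
chess_digits = tree_size_digits(15, 80)
go_digits = tree_size_digits(200, 300)
print(chess_digits, go_digits)  # prints: 95 691
```

Both numbers are far beyond anything that can be enumerated, which is why the search is cut at a depth bound.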

MINI MAX

Restrictions:
- 2 players: MAX (computer) and MIN (opponent)
- deterministic, perfect information

Select a depth-bound (say: 2) and an evaluation function.
- Construct the tree up to the depth-bound
- Compute the evaluation function for the leaves
- Propagate the evaluation function upwards:
  - taking minima in MIN
  - taking maxima in MAX

[Figure: tree with layers MAX, MIN, MAX. Leaf evaluations: 2, 5, 3, 1, 4, 4, 3. MIN-layer values: 2, 1, 3. Root (MAX) value: 3, marked "Select this move".]

The MINI-MAX algorithm:

Initialise depthbound;

Minimax(board, depth) =
    IF depth = depthbound
    THEN return static_evaluation(board);
    ELSE IF maximizing_level(depth)
    THEN FOR EACH child child of board
             compute Minimax(child, depth+1);
         return maximum over all children;
    ELSE IF minimizing_level(depth)
    THEN FOR EACH child child of board
             compute Minimax(child, depth+1);
         return minimum over all children;

Call: Minimax(current_board, 0)
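
The pseudocode above might be written in Python roughly as follows. This is a sketch under assumptions, not the course's reference implementation: the game is passed in through two hypothetical callables (`children`, `static_evaluation`), even depths are taken to be maximizing levels, and the toy tree groups the leaf values 2, 5, 3, 1, 4, 4, 3 under three MIN nodes in one way that reproduces the MIN values 2, 1, 3 (the actual grouping in the figure is not recoverable).

```python
def minimax(board, depth, depth_bound, children, static_evaluation):
    """Depth-bounded minimax: MAX moves at even depths, MIN at odd depths."""
    successors = children(board)
    # Stop at the depth bound. Stopping early on a position without
    # successors is an addition not present in the pseudocode.
    if depth == depth_bound or not successors:
        return static_evaluation(board)
    values = [
        minimax(child, depth + 1, depth_bound, children, static_evaluation)
        for child in successors
    ]
    return max(values) if depth % 2 == 0 else min(values)


# Toy game (assumption): a board is a nested list, a leaf is its own value.
def toy_children(board):
    return board if isinstance(board, list) else []


def toy_evaluation(board):
    return board


tree = [[2, 5, 3], [1, 4], [4, 3]]
root_value = minimax(tree, 0, 2, toy_children, toy_evaluation)
print(root_value)  # prints: 3
```

The move to select is the child of the root whose propagated value equals the root value, here the third one.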

Alpha-Beta Cut-off
- Generally applied optimization on Mini-max.
- Instead of:
  - first creating the entire tree (up to depth-level)
  - then doing all propagation
- Interleave the generation of the tree and the propagation of values.
- Point: some of the obtained values in the tree will provide information that other (non-generated) parts are redundant and do not need to be generated.

Alpha-Beta idea:

Principles:
- generate the tree depth-first, left-to-right
- propagate final values of nodes as initial estimates for their parent node.

[Figure: tree with MAX and MIN layers. The first MIN node has leaves 2 and 5 and value 2, which gives the MAX parent the estimate 2; the first leaf under the next MIN node is 1.]

- The MIN-value (1) is already smaller than the MAX-value of the parent (2)
- The MIN-value can only decrease further,
- The MAX-value is only allowed to increase,
- No point in computing further below this node

Terminology:
- The (temporary) values at MAX-nodes are ALPHA-values
- The (temporary) values at MIN-nodes are BETA-values

[Figure: the same tree, with the value 2 at the MAX node labelled "Alpha-value" and the value 1 at the MIN node labelled "Beta-value".]

The Alpha-Beta principles (1):
- If an ALPHA-value is greater than or equal to the BETA-value of a descendant node: stop generation of the children of the descendant

[Figure: the same tree, with the Alpha-value and the Beta-value labelled.]

The Alpha-Beta principles (2):
- If a BETA-value is smaller than or equal to the ALPHA-value of a descendant node: stop generation of the children of the descendant

[Figure: example tree with the Alpha-value and the Beta-value labelled.]

Mini-Max with alpha-beta at work:

[Figure: tree with layers MAX, MIN, MAX and branching factor 3, traversed in 39 numbered steps. Leaf evaluations, left to right: 8 7 3, 9 1 6, 2 4 1, 1 3 5, 3 9 2, 6 5 2, 1 2 3, 9 7 2, 8 6 4. The root (MAX) value is 5.]

11 static evaluations saved!!
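
The two alpha-beta principles can be sketched in Python and run on the 27 leaf evaluations of this example. The code below is an illustration under assumptions: the tree is a nested list built by a hypothetical helper `build_tree`, a one-element list serves as an evaluation counter, and the cut-off test `alpha >= beta` is the usual textbook form of both principles.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, counter=None):
    """Minimax value of a nested-list tree, pruning with alpha and beta."""
    if not isinstance(node, list):  # leaf: one static evaluation
        if counter is not None:
            counter[0] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, counter))
            alpha = max(alpha, value)
            if alpha >= beta:  # a MIN ancestor will never allow this value
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, counter))
        beta = min(beta, value)
        if beta <= alpha:  # a MAX ancestor already has something better
            break
    return value


def build_tree(leaves, b=3):
    """Group a flat list of leaves bottom-up into a tree with branching b."""
    level = list(leaves)
    while len(level) > 1:
        level = [level[i:i + b] for i in range(0, len(level), b)]
    return level[0]


LEAVES = [8, 7, 3, 9, 1, 6, 2, 4, 1, 1, 3, 5, 3, 9, 2,
          6, 5, 2, 1, 2, 3, 9, 7, 2, 8, 6, 4]
counter = [0]
value = alphabeta(build_tree(LEAVES), True, counter=counter)
print(value, counter[0], len(LEAVES) - counter[0])  # prints: 5 16 11
```

With left-to-right generation, 16 of the 27 leaves are evaluated, so 11 static evaluations are saved.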

"DEEP" cut-offs
- For game trees with at least 4 Min/Max layers: the Alpha-Beta rules also apply to deeper levels.

[Figure: four-layer tree; node values shown: 4 and 2.]

The Gain: Best case:
- If at every layer the best node is the left-most one, only the part of the tree drawn in thick lines is explored.

[Figure: tree with layers MAX, MIN, MAX; the explored part is drawn in thick lines.]

Example of a perfectly ordered tree

[Figure: tree with layers MAX, MIN, MAX and branching factor 3.
Leaves:    21 20 19   24 23 22   27 26 25   12 11 10   15 14 13   18 17 16   3 2 1   6 5 4   9 8 7
MAX layer: 21 24 27   12 15 18   3 6 9
MIN layer: 21 12 3
Root:      21]

How much gain?
- Alpha / Beta, best case: the number of static evaluations needed is

$$
\#(\text{static evaluations}) =
\begin{cases}
2\,b^{d/2} - 1 & \text{if } d \text{ is even} \\
b^{(d+1)/2} + b^{(d-1)/2} - 1 & \text{if } d \text{ is odd}
\end{cases}
$$

- The proof is by induction.
- In the running example (d = 3, b = 3) this gives 3^2 + 3^1 - 1 = 11 static evaluations for a perfectly ordered tree, out of 27 leaves.
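
The best-case count is easy to tabulate. The sketch below reads the formula as the number of static evaluations performed on a perfectly ordered tree (the standard best-case result for alpha-beta); the function name `best_case_evaluations` is invented for this illustration.

```python
def best_case_evaluations(b: int, d: int) -> int:
    """Static evaluations alpha-beta needs on a perfectly ordered tree."""
    if d % 2 == 0:
        return 2 * b ** (d // 2) - 1
    return b ** ((d + 1) // 2) + b ** ((d - 1) // 2) - 1


# Compare with plain minimax, which evaluates all b**d leaves.
for depth in range(1, 8):
    print(depth, 10 ** depth, best_case_evaluations(10, depth))
```

For b = 3 and d = 3 the function returns 11, against 27 leaves in the full tree.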

Best case gain pictured:

[Figure: number of static evaluations (10 to 100000) against depth (1 to 7), comparing no pruning with b = 10 and Alpha-Beta in the best case.]

- Note: logarithmic scale.
- Conclusion: still exponential growth!!
- Worst case??
  - For some trees alpha-beta does nothing.
  - For some trees it is impossible to reorder so as to avoid cut-offs.

The horizon effect.

[Figure: game tree with outcomes labelled "Queen lost", "Pawn lost" and "Queen lost"; horizon = depth bound of mini-max.]

Because of the depth-bound we prefer to delay disasters, although we don't prevent them!!

Solution: heuristic continuations

Heuristic Continuation

In situations that are identified as strategically crucial (e.g. king in danger, imminent piece loss, pawn about to become a queen, ...), extend the search beyond the depth-bound!

[Figure: search tree with some branches extended below the depth-bound.]
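
A heuristic continuation can be sketched as a small change to depth-bounded minimax: at the horizon, a position flagged as crucial is searched further instead of being evaluated. Everything specific here is an assumption made for illustration: positions are dictionaries with hypothetical `value`, `crucial` and `children` keys, and `max_extension` caps how far below the depth bound the search may go.

```python
def search(node, depth, depth_bound, max_extension=2):
    """Depth-bounded minimax that extends the search at crucial positions."""
    kids = node.get("children", [])
    at_horizon = depth >= depth_bound
    may_extend = node.get("crucial", False) and depth < depth_bound + max_extension
    if not kids or (at_horizon and not may_extend):
        return node["value"]  # static evaluation
    values = [search(k, depth + 1, depth_bound, max_extension) for k in kids]
    return max(values) if depth % 2 == 0 else min(values)


# Toy position: move A looks fine at the horizon, but the queen is
# attacked and is lost one ply later; move B simply loses a pawn.
move_a = {"value": 0, "crucial": True, "children": [{"value": -9}]}
move_b = {"value": -1, "children": [{"value": -1}]}
root = {"value": 0, "children": [move_a, move_b]}

print(search(root, 0, depth_bound=1, max_extension=0))  # prints: 0
print(search(root, 0, depth_bound=1, max_extension=2))  # prints: -1
```

Without the extension the search prefers move A and only delays the disaster; with it, the loss of the queen becomes visible and move B is preferred.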

How to organize the continuations?
How to control (and stop) extending the tree?

Tapering search (or: heuristic pruning)
- Order the moves in 1 layer by quality.
- b(child) = b(parent) - (rank of the child among its siblings)

[Figure: root with b = 4; its children have b = 3, 2, 1, 0; the children of the b = 3 node have b = 2, 1, 0; the children of the b = 2 node have b = 1, 0; ...]
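
The tapering rule can be sketched as follows. The helper names and the `quality` scoring callable are hypothetical; the rule itself, b(child) = b(parent) - rank with ranks starting at 1 for the best move, is the one stated above.

```python
def tapered_children(moves, b, quality):
    """Keep the b best moves; the move of rank r gets branching budget b - r."""
    ranked = sorted(moves, key=quality, reverse=True)[:b]
    return [(move, b - rank) for rank, move in enumerate(ranked, start=1)]


def tapered_tree_size(b):
    """Number of nodes in a tapered tree whose root has branching budget b."""
    return 1 + sum(tapered_tree_size(b - rank) for rank in range(1, b + 1))


# Five candidate moves scored by a made-up quality function ('a' is best).
budgets = tapered_children(list("abcde"), 4, quality=lambda m: -ord(m))
print(budgets)               # prints: [('a', 3), ('b', 2), ('c', 1), ('d', 0)]
print(tapered_tree_size(4))  # prints: 16
```

Because every budget shrinks with rank, the tree stops growing by itself: a node with budget 0 has no children.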

Time bounds:

How to play within reasonable time bounds?
Even with a fixed depth-bound, times can vary strongly!

Solution: Iterative Deepening!!!

Remember: overhead of previous searches = 1/b
Good investment to be sure to have a move ready.
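
A minimal sketch of iterative deepening under a time bound is given below. It assumes a hypothetical callable `search_to_depth(depth)` that returns the best move found at that depth, and it only checks the clock between iterations; a real player would also abort a search that is still running. The second function estimates the overhead of the shallower searches by counting the b^i positions at the deepest level of each iteration, which is a simplification.

```python
import time


def iterative_deepening(search_to_depth, time_limit_s, max_depth=64):
    """Search depth 1, 2, 3, ... and keep the last completed result."""
    deadline = time.monotonic() + time_limit_s
    best = None
    for depth in range(1, max_depth + 1):
        best = search_to_depth(depth)  # depth 1 always completes
        if time.monotonic() >= deadline:
            break
    return best


def overhead_ratio(b, d):
    """Work of the searches to depths 1..d-1 relative to the depth-d search."""
    earlier = sum(b ** i for i in range(1, d))
    return earlier / b ** d


print(overhead_ratio(10, 5))  # close to 1/b for b = 10
```

Whenever the time runs out, the result of the deepest completed search is available as the move to play.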

Games of chance

Ex.: Backgammon:

Form of the game tree:

"Utility" propagation with chances:

Utility function for a Maximizing node C:

$$
\mathrm{expectimax}(C) = \sum_{i} P(d_i) \max_{s \in S(C, d_i)} \mathrm{utility}(s)
$$

where
- $d_i$: outcome of the dice
- $P(d_i)$: probability of $d_i$
- $S(C, d_i)$: positions reachable from $C$ given $d_i$
- $\mathrm{utility}(s)$: evaluation of $s$
[Figure: MAX node C with dice outcomes d1 to d5; S(C, d3) is the set of positions s1 to s4 reachable after d3; the layer below is Min.]
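
The expectimax formula for a maximizing node can be sketched directly. The data structures are assumptions chosen for the illustration: `dice_probs` maps each dice outcome to its probability, `reachable` maps each outcome to the positions in S(C, d), and `utility` is any evaluation function; the numbers are made up.

```python
def expectimax_value(dice_probs, reachable, utility):
    """Sum over dice outcomes of P(d) times the best utility reachable after d."""
    return sum(
        prob * max(utility(s) for s in reachable[outcome])
        for outcome, prob in dice_probs.items()
    )


# Made-up example with two equally likely dice outcomes.
dice_probs = {"d1": 0.5, "d2": 0.5}
reachable = {"d1": ["s1", "s2"], "d2": ["s3"]}
utilities = {"s1": 1, "s2": 3, "s3": -2}

print(expectimax_value(dice_probs, reachable, utilities.get))  # prints: 0.5
```

For a minimizing node the same sketch applies with `min` in place of `max`.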