6. Fully Observable Game Playing 2012/03/28

Games vs. search problems

Game Theory
  Studied by mathematicians, economists, and finance researchers.
  In AI, we limit games to those that are:
    deterministic
    turn-taking
    two-player
    zero-sum
    perfect information
  This means deterministic, fully observable environments in which there are two agents whose actions must alternate and in which the utility values at the end of the game are always equal and opposite.

Types of Games
                         deterministic                   chance
  perfect information    Chess, Checkers, Go, Othello    Backgammon
  imperfect information                                  Bridge, Poker
  Game playing was one of the first tasks undertaken in AI.
  Machines have surpassed humans at checkers and Othello, and have defeated human champions in chess and backgammon.
  In Go, computers perform at the amateur level.

Checkers

Game as Search Problems
  Games offer pure, abstract competition.
  A chess-playing computer would be an existence proof of a machine doing something generally thought to require intelligence.
  Games are an idealization of worlds in which:
    the world state is fully accessible
    the (small number of) actions are well defined
  Uncertainty arises from:
    moves by the opponent
    the complexity of games

Game as Search Problems (cont.-1)
  Games are usually much too hard to solve.
  Example, chess:
    average branching factor = 35
    average moves per player = 50
    total number of nodes in search tree = 35^100, or about 10^154
    total number of different legal positions = 10^40
  There are time limits for making good decisions.
  The search is unlikely to find a goal, so it must approximate.
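The chess estimate above can be checked with a few lines of arithmetic. This is only a sketch of the calculation: 100 plies is taken as 50 moves for each of the two players, and the variable names are illustrative.

```python
import math

b = 35        # average branching factor (from the slide)
plies = 100   # 50 moves per player, two players

nodes = b ** plies                 # exact integer value of 35^100
exponent = plies * math.log10(b)   # log10 of the node count

print(f"35^100 has {len(str(nodes))} digits")
print(f"log10(35^100) = {exponent:.1f}")
```

The exponent comes out a little above 154, which agrees with the slide's order-of-magnitude figure.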

Game as Search Problems (cont.-2)
  Initial State: how does the game start?
  Successor Function: a list of legal (move, state) pairs for each state.
  Terminal Test: determines when the game is over.
  Utility Function: provides a numeric value for all terminal states, e.g., win, lose, draw with +1, -1, 0.
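The four components can be sketched for a concrete game. The example below assumes tic-tac-toe with X moving first and utilities given from X's point of view; the state encoding and every function name are illustrative choices, not taken from the slides.

```python
from typing import List, Optional, Tuple

State = Tuple[str, ...]  # nine cells, each "X", "O" or " "

# Index triples that form a winning line on the 3x3 board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def initial_state() -> State:
    """Initial state: an empty board."""
    return (" ",) * 9


def to_move(state: State) -> str:
    """X moves when both players have placed the same number of marks."""
    return "X" if state.count("X") == state.count("O") else "O"


def winner(state: State) -> Optional[str]:
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in LINES:
        if state[a] != " " and state[a] == state[b] == state[c]:
            return state[a]
    return None


def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: legal (move, state) pairs, one per empty cell."""
    player = to_move(state)
    return [(i, state[:i] + (player,) + state[i + 1:])
            for i, cell in enumerate(state) if cell == " "]


def terminal_test(state: State) -> bool:
    """Terminal test: someone has won or the board is full."""
    return winner(state) is not None or " " not in state


def utility(state: State) -> int:
    """Utility of a terminal state for X: win +1, loss -1, draw 0."""
    return {"X": 1, "O": -1, None: 0}[winner(state)]
```

Keeping the state as an immutable tuple means each successor is a fresh value, so a search routine can explore branches without undoing moves.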

Game Tree (2-player, deterministic, turns)
  Game tree complexity: 9! = 362880
  Game board complexity: 3^9 = 19683
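Both figures are quick to reproduce. The sketch assumes the slide refers to tic-tac-toe: 9! counts the move sequences that would arise if every game filled all nine squares, and 3^9 counts the ways to mark each square as X, O, or empty, so both are upper bounds rather than exact counts.

```python
import math

tree_bound = math.factorial(9)   # orderings of nine moves
board_bound = 3 ** 9             # three possible contents per square

print(tree_bound)    # 362880
print(board_bound)   # 19683
```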

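Minimax Strategy
  Assumption: both players are knowledgeable and play the best possible move.
  \[
  \mathrm{MinimaxValue}(n) =
  \begin{cases}
  \mathrm{Utility}(n) & \text{if } n \text{ is a terminal state} \\
  \max_{s \in \mathrm{Successors}(n)} \mathrm{MinimaxValue}(s) & \text{if } n \text{ is a MAX node} \\
  \min_{s \in \mathrm{Successors}(n)} \mathrm{MinimaxValue}(s) & \text{if } n \text{ is a MIN node}
  \end{cases}
  \]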

Minimax Strategy (cont.)
  It is an optimal strategy: it leads to outcomes at least as good as any other strategy when playing an infallible opponent.
  Pick the option that most (max) minimizes the damage your opponent can do.
  That is, maximize the worst-case outcome, because your skillful opponent will certainly find the most damaging move.

Minimax
  Perfect play for deterministic, perfect-information games.
  Idea: choose the move to the position with the highest minimax value, i.e., the best achievable payoff against best play.

Minimax Animated Example
  [figure: three-level game tree with Max, Min, and Max layers; the root value is 6]
  The computer can obtain 6 by choosing the right-hand edge from the first node.

Minimax Algorithm
  function MINIMAX-DECISION(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state)
    return the action in SUCCESSORS(state) with value v
  function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for a, s in SUCCESSORS(state) do
      v ← MAX(v, MIN-VALUE(s))
    return v
  function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
      v ← MIN(v, MAX-VALUE(s))
    return v
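The pseudocode translates almost line for line into Python. The sketch below makes simplifying assumptions that are not in the slides: the game tree is an explicit nested list, an integer is a terminal state whose utility is the integer itself, and a move is the index of the chosen child. The leaf values are illustrative.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]  # int = terminal utility, list = successors


def max_value(node: Tree) -> int:
    """Value of a MAX node: the best value MIN will allow."""
    if isinstance(node, int):          # terminal test
        return node                    # utility
    return max(min_value(child) for child in node)


def min_value(node: Tree) -> int:
    """Value of a MIN node: the worst value for MAX among its successors."""
    if isinstance(node, int):
        return node
    return min(max_value(child) for child in node)


def minimax_decision(root: List[Tree]) -> int:
    """Index of the root move whose MIN-node value is highest."""
    values = [min_value(child) for child in root]
    return max(range(len(root)), key=lambda i: values[i])


# Two-ply tree: MAX at the root, three MIN nodes, nine terminal states.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

print(max_value(TREE))         # minimax value of the root
print(minimax_decision(TREE))  # index of the best move for MAX
```

In this tree the three MIN nodes are worth 3, 2, and 2, so MAX prefers the first move.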

Optimal Decisions in Multiplayer Games
  Extend the minimax idea to multiplayer games.
  Replace the single value for each node with a vector of values.
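One common way to realize the vector idea is to let each player pick the successor whose vector is best in that player's own component. The sketch below assumes three players who move in a fixed rotation, an explicit nested-list tree, and made-up utility vectors; the function name `maxn` is a label chosen here.

```python
from typing import List, Tuple, Union

Vector = Tuple[int, ...]                  # one utility per player
Tree = Union[Vector, List["Tree"]]        # tuple = terminal, list = successors


def maxn(node: Tree, player: int, n_players: int) -> Vector:
    """Back up utility vectors; `player` is the index of the player to move."""
    if isinstance(node, tuple):           # terminal state: return its vector
        return node
    next_player = (player + 1) % n_players
    vectors = [maxn(child, next_player, n_players) for child in node]
    # The player to move only cares about their own component.
    return max(vectors, key=lambda v: v[player])


# Three plies: player 0 at the root, then player 1, then player 2.
TREE3 = [
    [[(1, 2, 6), (4, 2, 3)], [(6, 1, 2), (7, 4, 1)]],
    [[(5, 1, 1), (2, 5, 2)], [(7, 7, 1), (5, 4, 5)]],
]

print(maxn(TREE3, player=0, n_players=3))
```

With two players and vectors of the form (u, -u) this reduces to ordinary minimax.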

Minimax Algorithm (cont.)
  Generate the whole game tree.
  Apply the utility function to each terminal state.
  Propagate the utility of terminal states up one level:
    Utility(n) = max / min (n.1, n.2, ..., n.b)
  At the root, MAX chooses the move leading to the highest utility value.

Analysis of Minimax
  Complete? Yes, if the tree is finite.
  Optimal? Yes, against an optimal opponent.
  Time? O(b^m), since minimax is a complete depth-first search (m: maximum depth, b: number of legal moves).
  Space? O(bm) if all successors are generated at once, or O(m) if successors are generated one at a time.
  For chess, b ≈ 35 and m ≈ 100 for reasonable games, so an exact solution is completely infeasible.

Complex Games
  What happens if minimax is applied to large, complex games? What happens to the search space?
  Example, chess:
    A decent amateur program examines 1000 moves / second.
    Tournament play allows 150 seconds / move.
    The program can look at approximately 150,000 moves.
    With a chess branching factor of 35, it generates trees that are 3-4 ply deep.
    The resulting play is purely amateur.
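The 3-4 ply figure follows from the numbers on the slide. A minimal calculation, assuming a uniform tree with branching factor 35 and counting only the deepest level:

```python
import math

moves_per_second = 1000
seconds_per_move = 150
b = 35

positions = moves_per_second * seconds_per_move   # search budget per move
depth = math.log(positions) / math.log(b)         # solve b^depth = positions

print(positions)
print(round(depth, 2))
```

Since 35^3 = 42,875 and 35^4 = 1,500,625, a budget of 150,000 positions completes three plies and only part of a fourth.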

α-β Pruning
  The problem with minimax search: the number of states to examine is exponential in the number of moves.
  α-β pruning returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.
  α: lower bound on a MAX node, never decreasing; the value of the best (highest) choice found so far in the search for MAX.
  β: upper bound on a MIN node, never increasing; the value of the best (lowest) choice found so far in the search for MIN.

α-β Pruning Example - 1

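α-β Pruning Example - 1 (2nd Ed.)
  [figure: game tree in which a MIN node with leaves 2, 5, 14 has value interval [-∞, 2] and an unexplored leaf is marked "?"]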

α-β Pruning (cont.)
  α cut-off: search is discontinued below any MIN node with min-value ≤ α.
  β cut-off: search is discontinued below any MAX node with max-value ≥ β.
  The order of considering successors matters (look at step f in the previous slide).
  If possible, consider the best successors first.

α-β Pruning (cont.)
  If n is worse than α, MAX will avoid it, so prune the branch.
  If m is better than n for the player, play will never reach n, so n can be pruned.
  [figure: tree with alternating MAX and MIN levels showing nodes m and n]

α-β Pruning Example - 2
  [figure: game tree with nodes A through M, annotated step by step with α and β values; the annotations are not recoverable from the extracted text]

α-β Pruning Example - 3
  [figure: four-level tree (MAX, MIN, MAX, MIN) with nodes a to g]
  Node   Alpha   Beta
  a      -∞      +∞
  b      -∞      +∞
  d      -∞      +∞
  d      1       +∞
  d      2       +∞
  d      3       +∞
  b      -∞      3
  e      -∞      3
  e      4       3      CUT-OFF
  a      3       +∞
  c      3       +∞
  f      3       +∞
  c      3       3      CUT-OFF
  Completed

  [table: trace of the α-β search with columns Function, Node, α, β, V, Return, covering MAX nodes A, D, E, F, G and MIN nodes B, C and the leaves; a cutoff at E skips leaves 5 and 7, and a cutoff at G skips leaves 1 and 5; the remaining cell values are not recoverable from the extracted text]
  Key: - = negative infinity; + = positive infinity.
  The last value in a cell is the final value assigned to that variable, i.e., at the end of the search node A's α = 3.

α-β Algorithm
  function ALPHA-BETA-SEARCH(state) returns an action
    inputs: state, current state in game
    v ← MAX-VALUE(state, -∞, +∞)
    return the action in SUCCESSORS(state) with value v
  function MAX-VALUE(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for a, s in SUCCESSORS(state) do
      v ← MAX(v, MIN-VALUE(s, α, β))
      if v ≥ β then return v   // fail-high
      α ← MAX(α, v)
    return v

α-β Algorithm (cont.)
  function MIN-VALUE(state, α, β) returns a utility value
    inputs: state, current state in game
            α, the value of the best alternative for MAX along the path to state
            β, the value of the best alternative for MIN along the path to state
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for a, s in SUCCESSORS(state) do
      v ← MIN(v, MAX-VALUE(s, α, β))
      if v ≤ α then return v   // fail-low
      β ← MIN(β, v)
    return v
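A compact Python sketch of the two functions follows. It assumes an explicit nested-list tree in which integers are terminal utilities, merges MAX-VALUE and MIN-VALUE into one function with a `maximizing` flag, and records which leaves are actually evaluated so the pruning is visible. The helper `run` and the leaf values are illustrative.

```python
import math
from typing import List, Optional, Tuple, Union

Tree = Union[int, List["Tree"]]  # int = terminal utility, list = successors


def alphabeta(node: Tree, alpha: float = -math.inf, beta: float = math.inf,
              maximizing: bool = True,
              visited: Optional[List[int]] = None) -> float:
    """Alpha-beta value of `node`; evaluated leaves are appended to `visited`."""
    if visited is None:
        visited = []
    if isinstance(node, int):            # terminal test
        visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:                # fail-high: MIN will avoid this node
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:                   # fail-low: MAX will avoid this node
            return v
        beta = min(beta, v)
    return v


def run(tree: Tree) -> Tuple[float, List[int]]:
    """Return the root value and the leaves evaluated, in order."""
    visited: List[int] = []
    return alphabeta(tree, visited=visited), visited


# Same tree twice; only the order of the last MIN node's leaves differs.
print(run([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))
print(run([[3, 12, 8], [2, 4, 6], [2, 5, 14]]))
```

Both calls return the minimax value 3. The first evaluates seven of the nine leaves, the second only five, because the leaf 2 is seen first and immediately triggers the cutoff.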

α-β Pruning Example - 4
  [figure: five-level tree (MAX, MIN, MAX, MIN, MAX) with nodes a to n; leaf values are not recoverable from the extracted text]

α-β Pruning Example - 5

Analysis of α-β Search
  Pruning does not affect the final result.
  The effectiveness of α-β pruning is highly dependent on the order in which the successors are examined.
  It is worthwhile to try to examine first the successors that are likely to be best.
    e.g., Example 1, steps (e) and (f): if the successors of D are 2, 5, 14 (instead of 14, 5, 2), then 5 and 14 can be pruned.

Analysis of α-β Search (cont.)
  If the best move is examined first (perfect ordering):
    total number of nodes examined = O(b^(m/2))
    effective branching factor = b^(1/2); for chess, about 6 instead of 35
    i.e., α-β can look ahead roughly twice as far as minimax in the same amount of time
  If successors are examined in random order:
    total number of nodes examined = O(b^(3m/4)) for moderate b
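The perfect-ordering figures can be checked numerically. The depth m = 8 below is an arbitrary illustrative choice, and the counts are the idealized b^m and b^(m/2) rather than measurements from a real program.

```python
import math

b = 35   # chess branching factor (from the slide)
m = 8    # illustrative search depth in plies (assumption)

effective_b = math.sqrt(b)           # b^(1/2)
minimax_nodes = b ** m               # b^m
best_case_nodes = b ** (m // 2)      # b^(m/2) with perfect move ordering

print(round(effective_b, 2))
print(minimax_nodes, best_case_nodes)
```

Because b^(m/2) squared equals b^m, a budget that lets minimax reach depth m/2 lets perfectly ordered α-β reach depth m, which is the "twice as far" claim.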

Imperfect, Real-Time Decisions
  It is not practical to assume the program has time to search all the way to terminal states.
  Since moves must be made in a reasonable amount of time, we alter minimax or α-β in two ways:
    Evaluation Function (instead of utility function): an estimate of the expected utility of the game from a given position.
    Cutoff Test (instead of terminal test): decides when to apply Eval, e.g., a depth limit (perhaps with quiescence search added).

Evaluation Functions
  The heuristic that estimates expected utility.
    It should preserve the ordering among terminal states in the same way as the true utility function; otherwise it can cause bad decision making.
    Its computation cannot take too long.
    For nonterminal states, it should be strongly correlated with the actual chances of winning.
  Define features of the game state that assist in evaluation.
    What are the features of chess? e.g., number of pawns possessed, etc.
  Weighted Linear Function:
    Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
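A weighted linear function is straightforward to write down once the features are fixed. The sketch below assumes material-only features (own piece count minus the opponent's) and the conventional introductory weights of 1, 3, 3, 5, 9; both the features and the weights are illustrative assumptions, not values given in the slides.

```python
from typing import Dict

# Assumed weights w_i: conventional material values, not from the slides.
WEIGHTS: Dict[str, float] = {
    "pawn": 1.0, "knight": 3.0, "bishop": 3.0, "rook": 5.0, "queen": 9.0,
}


def material_features(own: Dict[str, int], opp: Dict[str, int]) -> Dict[str, int]:
    """Features f_i(s): piece-count difference for each piece type."""
    return {piece: own.get(piece, 0) - opp.get(piece, 0) for piece in WEIGHTS}


def evaluate(features: Dict[str, int], weights: Dict[str, float] = WEIGHTS) -> float:
    """Eval(s) = w_1 f_1(s) + ... + w_n f_n(s)."""
    return sum(weights[name] * features[name] for name in weights)


# One side is ahead by a knight and two pawns.
own = {"pawn": 8, "knight": 2, "bishop": 2, "rook": 2, "queen": 1}
opp = {"pawn": 6, "knight": 1, "bishop": 2, "rook": 2, "queen": 1}

print(evaluate(material_features(own, opp)))   # 2*1 + 1*3 = 5.0
```

A material count like this is cheap to compute, but it ignores threats such as an imminent capture, which is why it can misjudge non-quiescent positions.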

Evaluation Functions (cont.-1)
  (a) Black has an advantage of a knight and two pawns and will win the game.
  (b) Black will lose after White captures the queen.

Evaluation Functions (cont.-2)
  Digression: exact values don't matter.
  Behavior is preserved under any monotonic transformation of Eval.
  Only the order matters: payoff in deterministic games acts as an ordinal utility function.
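The invariance can be observed on a small tree. The sketch assumes a nested-list tree with numeric leaves and uses x^3 + 5 as an arbitrary strictly increasing transformation; the helper names are illustrative.

```python
from typing import Callable, List, Union

Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool = True) -> int:
    """Plain minimax value of a nested-list tree."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(tree: List[Tree]) -> int:
    """Index of the root move with the highest MIN-node value."""
    return max(range(len(tree)), key=lambda i: minimax(tree[i], maximizing=False))


def transform(node: Tree, f: Callable[[int], int]) -> Tree:
    """Apply f to every leaf, keeping the tree shape."""
    if isinstance(node, list):
        return [transform(child, f) for child in node]
    return f(node)


TREE = [[1, 2], [2, 4]]
rescaled = transform(TREE, lambda x: x ** 3 + 5)   # strictly increasing

print(best_move(TREE), best_move(rescaled))
```

The leaf values change from 1, 2, 2, 4 to 6, 13, 13, 69, yet the chosen move is the same, because max and min depend only on the ordering of the values.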

Cutting off Search
  When do you use evaluation functions?
    if Cutoff-Test(state, depth) then return Eval(state)
  The simplest way to control the amount of search is to set a fixed depth limit d.
    Cutoff-Test(state, depth) returns 1 or 0.
    It returns 1 for all depths greater than the fixed depth d, and the evaluation function is then used.
  Options:
    cut off beyond a certain depth
    cut off if the state is stable (more predictable)
    cut off moves you know are bad (forward pruning)
  Cutting off can have a disastrous effect if the evaluation function is not sophisticated enough.
  The search should continue until a quiescent position is found.
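A depth-limited search replaces the terminal test and utility with a cutoff test and Eval. The sketch below assumes each node carries a precomputed heuristic estimate, and it cuts off once `depth` reaches `limit`, which is a convention chosen here; the `Node` class and the numbers are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    estimate: float                       # Eval(state); true utility at a leaf
    children: List["Node"] = field(default_factory=list)


def cutoff_test(node: Node, depth: int, limit: int) -> bool:
    """Stop at terminal states or once the depth limit is reached."""
    return not node.children or depth >= limit


def h_minimax(node: Node, depth: int = 0, limit: int = 2,
              maximizing: bool = True) -> float:
    """Minimax that returns Eval(state) wherever the cutoff test fires."""
    if cutoff_test(node, depth, limit):
        return node.estimate
    values = [h_minimax(child, depth + 1, limit, not maximizing)
              for child in node.children]
    return max(values) if maximizing else min(values)


# The left move looks better at depth 1 but is worse once MIN's replies are seen.
root = Node(0, [
    Node(4, [Node(1), Node(9)]),
    Node(3, [Node(5), Node(6)]),
])

print(h_minimax(root, limit=1))   # relies on the estimates 4 and 3
print(h_minimax(root, limit=2))   # sees the replies: min(1, 9) vs min(5, 6)
```

With limit 1 the search trusts the estimates and values the root at 4; with limit 2 it finds that the left move is really worth 1 and values the root at 5, which illustrates how an unsophisticated evaluation can mislead a shallow search.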

Cutting off Search (cont.)
  Does it work in practice?
    b^m = 10^6 with b = 35 gives m ≈ 4
  4-ply lookahead is a hopeless chess player:
    4-ply ≈ human novice
    8-ply ≈ typical PC, human master
    12-ply ≈ Deep Blue, Kasparov

Horizon Effect
  A series of checks by the black rook pushes the inevitable queening move by White over the horizon and makes the position look like a win for Black, when it is really a win for White.
  The horizon effect arises when the program is facing a move by the opponent that causes serious damage and is ultimately unavoidable.
  At present, no general solution has been found for the horizon problem.

Suggestion
  Improve the evaluation function, e.g., so that it knows that the bishop is trapped.
  Make the search deeper.
  Make the search depth more flexible: the program searches deeper in the line in which a pawn is being given away, and less deep in other lines.

HW2, Deadline 4/12
  Design the evaluation functions for Chinese chess and Chinese dark chess.
