Course 16:198:520: Introduction to Artificial Intelligence, Lecture 5

Solving Problems by Searching: Adversarial Search

Abdeslam Boularias

Wednesday, September 16, 2015
Outline

We examine the problems that arise when we make decisions in a world where other agents are also acting, possibly against us.

1 The minimax algorithm
2 Alpha-beta pruning
3 Imperfect fast decisions
4 Stochastic games
5 Partially observable games
Games

Multiagent environments are environments where more than one agent is acting, simultaneously or at different times.
Contingency plans are necessary to account for the unpredictability of other agents.
Each agent has its own utility function.
The corresponding decision-making problem is called a game.
A game is competitive if the utilities of different agents are maximized in different states.
In zero-sum games, the sum of the utilities of all agents is constant.
Zero-sum games are purely competitive.
Games

The abstract nature of games, such as chess, makes them appealing to study in AI.
The state of a game is easy to represent, and agents typically have a small number of actions to choose from.

Left: computer chess pioneers Herbert Simon and Allen Newell (1958). Right: John McCarthy and the Kotok-McCarthy program on an IBM 7090 (1967).
Games

Physical games, such as soccer, are more difficult to study due to their continuous state and action spaces.

Robot soccer (from ri.cmu.edu)
Games

A game is described by:

S0: the initial state (how is the game set up at the start?).
PLAYER(s): indicates which player has the move in state s (whose turn is it?).
ACTIONS(s): the set of legal actions in state s.
RESULT(s, a): returns the next state after we play action a in state s.
TERMINAL-TEST(s): indicates whether s is a terminal state.
UTILITY(s, p): (also called the objective or payoff function) defines a numerical value for a game that ends in terminal state s, for player p.
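The game description above can be sketched as a small set of Python functions. The example uses tic-tac-toe, with the board represented as a tuple of 9 cells; the representation is an illustrative assumption, not part of the slides.

```python
# A minimal sketch of the game interface, illustrated with tic-tac-toe.
# A state is a tuple of 9 cells, each 'X', 'O', or None (empty).

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

S0 = (None,) * 9  # initial state: empty board


def player(s):
    """Whose turn is it? X moves first, so X plays when the counts are equal."""
    return 'X' if s.count('X') == s.count('O') else 'O'


def winner(s):
    for i, j, k in WIN_LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None


def terminal_test(s):
    return winner(s) is not None or all(c is not None for c in s)


def actions(s):
    """Legal actions: indices of empty cells (none once the game is over)."""
    return [] if terminal_test(s) else [i for i, c in enumerate(s) if c is None]


def result(s, a):
    """Next state after the current player marks cell a."""
    return s[:a] + (player(s),) + s[a + 1:]


def utility(s, p):
    """+1 if player p won, -1 if p lost, 0 for a draw (zero-sum)."""
    w = winner(s)
    if w is None:
        return 0
    return 1 if w == p else -1
```

Using immutable tuples for states keeps RESULT side-effect-free, which is convenient for the recursive search algorithms in the rest of the lecture.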
Example: tic-tac-toe

We suppose there are two players in a zero-sum game.
We call our player MAX; she tries to maximize our utility.
We call our opponent MIN; she tries to minimize our utility (i.e., maximize her own utility).

[Figure: the first plies of the tic-tac-toe game tree. MAX (X) and MIN (O) alternate moves down to the terminal states, whose utilities are −1, 0, or +1.]
Optimal games

[Figure: a two-ply game tree. MAX plays at the root A, choosing among actions a1, a2, a3, which lead to MIN nodes B, C, D. The leaves under B have utilities 3, 12, 8; under C: 2, 4, 6; under D: 14, 5, 2. The backed-up minimax values are B = 3, C = 2, D = 2, and 3 at the root.]

△ indicates states where MAX should play.
▽ indicates states where MIN should play.

Which action among {a1, a2, a3} should MAX play?
Minimax strategy

[Figure: the same game tree as before, with minimax values B = 3, C = 2, D = 2 and root value 3.]

We do not take any risk: we assume that MIN will play optimally.
We look for the best action under the worst possible scenario.
What if our opponent is not optimal? Can we learn the opponent's behaviour?
What if our opponent is trying to fool us by behaving in a certain way?
Minimax strategy

The minimax value of a state is defined recursively:

MINIMAX(s) =
  UTILITY(s)                                      if TERMINAL-TEST(s) = true,
  max_{a ∈ Actions(s)} MINIMAX(RESULT(s, a))      if PLAYER(s) = MAX,
  min_{a ∈ Actions(s)} MINIMAX(RESULT(s, a))      if PLAYER(s) = MIN.
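The MINIMAX recursion can be sketched directly in Python. To keep the sketch self-contained, the game is given as an explicit tree rather than through the game-description functions: internal nodes are ('MAX', children) or ('MIN', children), and leaves are utilities. This tree encoding is an assumption for illustration; the tree below is the slide's running example.

```python
# A direct sketch of the MINIMAX recursion over an explicit game tree.
# Internal nodes: ('MAX', children) or ('MIN', children); leaves: utilities.

def minimax(node):
    if isinstance(node, (int, float)):   # terminal state: return UTILITY(s)
        return node
    tag, children = node
    values = [minimax(c) for c in children]
    return max(values) if tag == 'MAX' else min(values)

# The running example: MAX chooses among a1, a2, a3; MIN replies at B, C, D.
tree = ('MAX', [('MIN', [3, 12, 8]),
                ('MIN', [2, 4, 6]),
                ('MIN', [14, 5, 2])])

print(minimax(tree))  # → 3, matching the backed-up root value on the slide
```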
Minimax algorithm

function MINIMAX-DECISION(state) returns an action
  return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v

Figure 5.3 (AIMA): An algorithm for calculating minimax decisions. It returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the opponent plays to minimize utility. The functions MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state. The notation argmax_{a ∈ S} f(a) computes the element a of set S that has the maximum value of f(a).
Minimax strategy in multiplayer games

[Figure: the first three plies of a game tree with three players (A, B, C). Each node is labeled with a utility vector, one value per player; the backed-up vector at the root is (1, 2, 6).]

We have three players: A, B, and C.
The utilities are represented by a 3-dimensional vector (vA, vB, vC).
We apply the same principle: assume that every player plays optimally, i.e., each player chooses the successor whose vector has the highest value for that player.
If the game is not zero-sum, implicit collaborations may occur.
Alpha-Beta pruning

Time is a major issue in game search trees. Searching the complete tree takes O(b^m) operations, where b is the branching factor and m is the depth of the tree (the horizon).
Do we really need to traverse the whole tree to find a minimax strategy?

[Figure: the example game tree again, with MAX at the root, MIN nodes B, C, D, and leaf utilities 3, 12, 8 / 2, 4, 6 / 14, 5, 2.]
Alpha-Beta pruning

Example

Assume that the search tree has been traversed except for actions c2 and c3.
Let us denote the utilities of c2 and c3 by x and y, respectively.

MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
              = max(3, min(2, x, y), 2)
              = max(3, z, 2), where z = min(2, x, y) ≤ 2
              = 3.

The value of the root is independent of x and y, so the branches c2 and c3 can be pruned without affecting the decision.
Alpha-Beta pruning

[Figure: six snapshots (a)-(f) of alpha-beta search on the example tree, showing the interval [α, β] of possible values at each node. As leaves are examined, node B narrows from [−∞, +∞] to [3, 3]; node C is cut off at [−∞, 2] after its first leaf, since 2 is already below the root's lower bound of 3; node D narrows from [−∞, 14] to [2, 2]; and the root's interval goes from [−∞, +∞] through [3, +∞] to the final value [3, 3].]
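The pruning illustrated above can be sketched on the same explicit-tree encoding used earlier. This is a minimal sketch, not the book's exact pseudocode; the `visited` list is an added instrumentation to show which leaves are actually evaluated.

```python
import math

# A sketch of alpha-beta pruning over an explicit game tree.
# Internal nodes: ('MAX', children) or ('MIN', children); leaves: utilities.
# `visited` records evaluated leaves, to make the pruning observable.

def alphabeta(node, alpha=-math.inf, beta=math.inf, visited=None):
    if isinstance(node, (int, float)):
        if visited is not None:
            visited.append(node)
        return node
    tag, children = node
    if tag == 'MAX':
        v = -math.inf
        for c in children:
            v = max(v, alphabeta(c, alpha, beta, visited))
            alpha = max(alpha, v)
            if alpha >= beta:        # beta cutoff: MIN would never allow this
                break
        return v
    else:
        v = math.inf
        for c in children:
            v = min(v, alphabeta(c, alpha, beta, visited))
            beta = min(beta, v)
            if alpha >= beta:        # alpha cutoff: MAX already has better
                break
        return v

# The running example: after B yields 3, node C is abandoned as soon as
# its first leaf (2) is seen, so the leaves 4 and 6 are never evaluated.
tree = ('MAX', [('MIN', [3, 12, 8]),
                ('MIN', [2, 4, 6]),
                ('MIN', [14, 5, 2])])
```

The returned value is the same as plain minimax (3 here); only the amount of work changes.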
Imperfect fast decisions

The minimax algorithm generates the entire search tree.
The alpha-beta algorithm allows us to prune large parts of the search tree, but its complexity is still exponential in the depth of the tree (with the branching factor as the base).
This is still impractical, because moves should be made very quickly.

Cutting off the search

A cutoff test is used to decide when to stop looking further.
A heuristic evaluation function is used to estimate the utility where the search is cut off.

H-MINIMAX(s, d) =
  EVAL(s)                                              if CUTOFF-TEST(s, d) = true,
  max_{a ∈ Actions(s)} H-MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MAX,
  min_{a ∈ Actions(s)} H-MINIMAX(RESULT(s, a), d + 1)  if PLAYER(s) = MIN.
Evaluation functions

Human chess players have ways of judging the value of a position without imagining all the moves ahead until a checkmate.
A good evaluation function should order the actions correctly according to their true utilities.
Evaluation functions should be computed very quickly.
Example: evaluation functions in chess

Evaluation functions use features of given positions in the game.
Example: the number of pawns in a given position. If we know from experience that 72% of "two pawns vs. one pawn" positions lead to a win (utility +1), 20% to a loss (0), and 8% to a draw (1/2), then the expected value of these positions is 0.72 × 1 + 0.20 × 0 + 0.08 × 1/2 = 0.76.
Other features fi can be used, such as the advantage in each piece, "good pawn structure", and "king safety".
The evaluation function can be given by a weighted linear model:

EVAL(s) = Σ_{i=1}^{n} wi fi(s),

where wi is the importance (weight) of feature fi.
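The weighted linear model is a one-liner in code. The particular features (material balance) and the classical weights 1, 3, 3, 5, 9 below are illustrative assumptions, not values given in the slides.

```python
# A sketch of a weighted linear evaluation function:
# EVAL(s) = sum_i w_i * f_i(s), with the features f_i(s) precomputed.

def eval_linear(features, weights):
    return sum(w * f for w, f in zip(weights, features))

# Example features: material balance (ours minus opponent's) for pawns,
# knights, bishops, rooks, queens, with classical weights 1, 3, 3, 5, 9.
weights = [1, 3, 3, 5, 9]
features = [2, 0, -1, 0, 0]   # up two pawns, down one bishop

print(eval_linear(features, weights))  # → -1: slightly worse despite the pawns
```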
Example: evaluation functions in chess

Linear models assume that the features are independent, which is not always true (e.g., bishops are more effective in the endgame).
The values of some features do not increase linearly (two knights are far more useful than one knight).

[Figure: two chess positions, (a) and (b), each with White to move. In both positions the two players have the same pieces, yet the position on the right is much worse for Black than the one on the left. What if the search cutoff happens in the left one?]
Stochastic games

In some games, the state of the game changes randomly, depending on the selected actions.
Backgammon is a typical game that combines luck and skill.

[Figure: a backgammon board, with points numbered 1 to 24.]
Stochastic games

We can use the same minimax strategy, but we need to take the randomness into account by computing the expected utilities.

[Figure: a game tree for backgammon with alternating MAX, CHANCE, MIN, CHANCE layers. Each chance node branches over dice rolls: a double such as 1-1 or 6-6 has probability 1/36, a mixed roll such as 1-2 or 6-5 has probability 1/18. The leaves are terminal utilities such as −1, 1, 2.]
Stochastic games

We can use the same minimax strategy, but we need to take the randomness into account by computing the expected utilities.

EXPECTIMINIMAX(s) =
  UTILITY(s)                                            if TERMINAL-TEST(s) = true,
  max_{a ∈ Actions(s)} EXPECTIMINIMAX(RESULT(s, a))     if PLAYER(s) = MAX,
  min_{a ∈ Actions(s)} EXPECTIMINIMAX(RESULT(s, a))     if PLAYER(s) = MIN,
  Σ_r P(r) EXPECTIMINIMAX(RESULT(s, r))                 if PLAYER(s) = CHANCE,

where r is a chance event (e.g., a dice roll).
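The EXPECTIMINIMAX recursion extends the earlier explicit-tree sketch with a third node type for chance events; the tree encoding and the toy example are illustrative assumptions.

```python
# A sketch of the EXPECTIMINIMAX recursion over an explicit game tree.
# Nodes: ('MAX', children), ('MIN', children), or
# ('CHANCE', [(p, child), ...]) with probabilities p; leaves: utilities.

def expectiminimax(node):
    if isinstance(node, (int, float)):   # terminal state: UTILITY(s)
        return node
    tag, children = node
    if tag == 'MAX':
        return max(expectiminimax(c) for c in children)
    if tag == 'MIN':
        return min(expectiminimax(c) for c in children)
    # CHANCE node: probability-weighted average over chance events r
    return sum(p * expectiminimax(c) for p, c in children)

# Toy example: MAX chooses between a safe action worth 3 and a gamble
# that yields 10 with probability 0.5 and -10 otherwise (expectation 0).
tree = ('MAX', [3,
                ('CHANCE', [(0.5, 10), (0.5, -10)])])

print(expectiminimax(tree))  # → 3: the safe action beats the fair gamble
```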
Trouble with evaluation functions in stochastic games

[Figure: two chance trees with identical structure but rescaled leaf evaluations. On the left, leaves 2, 3, 1, 4 under chance probabilities 0.9 and 0.1 give expected values 2.1 for action a1 and 1.3 for a2, so a1 is preferred. On the right, an order-preserving rescaling of the leaves to 20, 30, 1, 400 gives 21 for a1 and 40.9 for a2, reversing the decision. With chance nodes, preserving the ordering of positions is not enough: the evaluation function must be a positive linear transformation of the expected utility.]
Partially observable games

In some games, the state of the game is not fully known.
Card games and Kriegspiel are examples of such games.
The state of the game can be tracked by remembering past actions and observations.

[Figure: a Kriegspiel example. The player proposes moves such as "Kc3?" or "Rc3?" and learns about the board only through the referee's answers, e.g. "OK", "Illegal", or "Check".]