Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
1
Review
Uninformed Search
b
a
d
pq
h
e
c
f
r
START
GOAL
2
1
3
1
9
15
8
2
2
4
9
5
5
5
4
1
3
2
Complexity• N = Total number of states
• B = Average number of successors (branching factor)
• L = Length for start to goal with smallest number of steps
• Q = Average size of the priority queue
• Lmax = Length of longest path from START to any state
O(Min(N,2BL/2))O(Min(N,2BL/2))Y, If all trans.
have same cost
YBi- Direction.
BFS
BIBFS
O(BL)O(BL)Y, If all trans. have same cost
YIterative Deepening
IDS
O(Min(N,BLmax))O(Min(N,BLmax))NYMemorizing DFS
MEMDFS
O(BLmax)O(BLmax)NYPath Check DFS
PCDFS
O(Min(N,BL))O(log(Q)*Min(N,BL))Y, If cost > 0Y, If cost > 0
Uniform Cost Search
UCS
O(Min(N,BL))O(Min(N,BL))Y, If all trans.
have same cost
YBreadth First
Search
BFS
SpaceTimeOptimalCompleteAlgorithm
START
A
B
C
GOAL
h(A) = 3
h(B) = 6
Our best guess is that A is closer
to GOAL than B so maybe it is a
more promising state to expand
h(B) = 10
Informed Search
3
Informed Search
• Best-First Search: Expand node with minimum h(s)
• A*: Expand node with minimum f(s) = g(s) + h(s)
• Guaranteed to be optimal if h admissible h(s) <= h*(s)
• No nodes revisited if monotonic:
h(s) < h(s’) + cost(s,s’)
• IDA* = Equivalent of Iterative Deepening for A* � guarantees low memory usage
4
)()()2()( swhsgwsf +−=
Most Basic Algorithm: Hill-Climbing (Greedy Local Search)
• X Initial configuration
• Iterate:
1. E Eval(X)
2. N Neighbors(X)
3. For each Xi in N
Ei Eval(Xi)
4. If all Ei’s are lower than E
Return X
Else
i* = argmaxi (Ei) X Xi* E Ei*
5
Stochastic Search: Randomized Hill-Climbing
• X Initial configuration
• Iterate:
1. E Eval(X)
2. X’ one configuration
randomly selected in
Neighbors (X)
3. E’ Eval(X’)
4. If E’ > E
X X’
E E’
Critical change: We no
longer select the best move in the entire
neighborhood
Until when?
Simulated Annealing1. Do K times:
1.1 E Eval(X)
1.2 X’ one configuration randomly
selected in Neighbors (X)
1.3 E’ Eval(X’)
1.4 If E’ >= E
X X’; E E’;
Else accept the move with probability
p = e -(E – E’)/T :
X X’; E E’;
2. T α T
6
Basic GA Outline• Create initial population X = {X1,..,XP}
• Iterate:
1. Select K random pairs of parents (X,X’)
2. For each pair of parents (X,X’):
1.1 Generate offsprings (Y1,Y2) using crossover operation
1.2 For each offspring Yi:
Replace randomly selected element of the population by Yi
With probability µ:
Apply a random mutation to Yi
• Return the best individual in the population
7
CSP• Definitions
• Standard search
• Improvements– Backtracking
– Forward checking
– Constraint propagation
• Heuristics:– Variable ordering
– Value ordering
• Examples
• Tree-structured CSP
• Local search for CSP problems
Constraint Propagation• A = queue of active arcs (Vi,Vj)
• Repeat while A not empty:
– (Vi,Vj) next element of A
– For each x in D(Vi):
• Remove x from D(Vi) if there is no y in D(Vj) for which (x,y) satisfies the constraint between Vi and Vj.
– If D(Vi) has changed:
• Add all the pairs (Vk,Vi), where Vk is a neighbor of Vi (k not equal to j) to A
8
Order the variables such that the parent of a node
appears always before that node in the list
O(N d2)
• Most Constraining Variable– Selecting a variable which contributes to the largest number of
constraints will have the largest effect on the other variables �Hopefully will prune a larger part of the search
– This amounts to finding the variable that is connected to the largest number of variables in the constraint graph.
• Minimum Remaining Values (MRV)
– Selecting the variable that has the least number of candidate
values is most likely to cause a failure early (“fail-first” heuristic)
• Least Constraining Value– Choose the value which causes the smallest reduction in the number of
available values for the neighboring variables
9
• Configuration space C = set of values of q corresponding to
legal configurations of the robot• Defines the set of possible parameters (the search space) and
the set of allowed paths
• Grow the forbidden parts of C-Space � Can assume that
robot is a point in C-Space
Configuration Space (C-Space)
10
Visibility Graphs
• N = total number of vertices of the
obstacle polygons
• Naïve: O(N3)
• Sweep: O(N2 log N)
• Optimal: O(N2)
Voronoi Diagrams
• Key property: The points on the edges of the Voronoidiagram are the furthest from the obstacles• Idea: Construct a path between qstart and qgoal by following edges on the Voronoi diagram• (Use the Voronoi diagram as a roadmap graph instead
of the visibility graph)
11
Cell Decomposition
• Define cells in C-Space• Mark any cell of the grid that intersects Cobs as
blocked• Find path through remaining cells by using (for
example) A* (e.g., use Euclidean distance as heuristic)
• Approximate � Easy to compute but not complete• Exact � Hard to compute but complete
Potential Fields
• Stay away from obstacles: Imagine that the obstacles are made of a material that generate a repulsive field
• Move closer to the goal: Imagine that the goal location is a particle that generates an attractive field
• Key issue: Local minima– Deterministic exploration of local minima
– Stochastic exploration
• In very high dimension � Randomize traversal of neighbors
12
Sampling Techniques
• Tends to explore the space rapidly in all directions
• Does not require extensive pre-processing
• Single query/multiple query problems• Needs only collision detection test � No need to
represent/pre-compute the entire C-space
• For a large class of problems:– Prob(finding a path) � 1 exponentially with the number of samples
• But, cannot detect that a path does not exist
13
Games
14
A
B
3 12 8 2 14 5 24 6
3 =min(3,12,8)
2 2
3 = max(3,2,2)
MinimaxMinimax (s)
If s is terminal
Return U(s)
If next move is A
Return
Else
Return
( ))('
'MinimaxmaxsSuccss
s∈
( ))('
'MinimaxminsSuccss
s∈
15
Minimax Properties
• Complete: If finite game
• Optimal: If opponent plays optimally
• Complexity: Essentially DFS, so:– Time: O(Bm)
– Space: O(Bm)
– B = number of possible moves from any state (branching factor)
– m = depth of search (length of game)
• Pruning (αβ):– Guaranteed to find same solution
– O(Bm/2) with proper ordering of the nodes � At “A” node, the successor are in order from high to low score
Non-Deterministic MinimaxMinimax (s)
If s is terminal
Return U(s)
If next move is A: Return
If next move is B Return
If chance node Return
( ))('
'MinimaxmaxsSuccss
s∈
( ))('
'MinimaxminsSuccss
s∈
( ) ( )∑∈ )('
'Minimax'sSuccss
ssp
16
-1
+2
+1
+1
Min value across each row
Max value = game value = +2
),( jiMinMaxjColumnsiRowsM
+1+5+1+5IV
+1+5+1+5III
+2+2+4+4II
+2+2-1-1I
IVIIIIII
+5 +4 +5 +2
Ma
x v
alu
e a
cro
ss e
ach c
olu
mn
Min value = game value = +2
),( jiMaxMiniRowsjColumns
M
+1+5+1+5IV
+1+5+1+5III
+2+2+4+4II
+2+2-1-1I
IVIIIIII
Note that we find the same value and same
strategies in both cases. Is that always the case?
Minimax vs. Maximin
• Fundamental Theorem I (von Neumann):
– For a two-player, zero-sum game with
perfect information:
• There always exists an optimal pure strategy for each player
• Minimax = Maximin
• Note: This is a game-theoretic
formalization of the minimax search
algorithm that we studied earlier.
17
Minimax with Mixed Strategies• Theorem II (von Neumann):
– For a two-player, zero-sum game with hidden information:
• There always exists an optimal mixed strategy
• In addition, just like for games with perfect information, it does not matter in which order we look at the players, minimax is the same as maximin
• Note: This is a direct generalization of the minimax result to mixed strategies.
))1(,)1(max(min
))1(,)1(min(max
22211211
22122111
mqmqmqmq
mpmpmpmp
q
p
×−+××−+×
=×−+××−+×
p = 0.5 p = 0.5
hold
hold
see
resi
gn
resign
see
resi
gn
-10+4 -20 +4 +16
18
p = 0.5 p = 0.5
hold
hold
see
resi
gn
resign
see
resi
gn
-10+4 -20 +4 +16
Hidden information: Player B cannot know which of these 2 states it’s in
The game is
non-
deterministic
because of
the initial random
choice of
cards
• Generate the matrix form of the game (be careful: It’s not a deterministic game)
• Find the optimal mixed strategy
• Find the expected payoff for Player A
Hold
Resign
SeeResign
Player B
Pla
ye
r A
19
4,52,33,13,8
6,36,28,42,3
9,09,75,85,3
5,65,94,13,0I
II
III
IV
Strategies
Pure Strategy Nash Equilibrium
• Does not always exist
• Is not always unique
• For continuous games:
( ) ( )**
1
**
1
*
1
**
1
*
1
*
1 ,,,,,,,,,,,,niiiiniiii
sssssusssssu LLLL +−+− ≤
( )**
1
*
1
*
1
* ,,,,,,maxargniiii
s
isssssus
i
LL +−=
( ) 0,, **
1 =∂
∂n
i
i sss
uL
20
Mixed Strategy Nash Equilibrium
• Player A plays strategy si with probability pi
• For non-zero games � Mixed Nash
equilibrium always exists
• Is not always unique
( ) ( )*** ,, qpuqpu AA ≤
( )** ,maxarg qpup Ap
=
+1,+20,0Movie
0,0+2,+1Hockey
MovieHockey
+1,00,+1Movie
0,+2+2,0Hockey
MovieHockey
tB=meet
tB=avoid
P(tA=meet | tB=meet) = 1
P(tA=meet | tB=avoid) = 1
P(tA=avoid | tB=meet) = 0
P(tA=avoid | tB=avoid) = 0
P(tB=meet | tA=meet) = 1/2
P(tB=meet | tA=avoid) = 1/2
P(tB=avoid | tA=meet) = 1/2
P(tB=avoid | tA=avoid) = 1/2
21
( ) )|P()(),(
BPlayer of typespossible all
AB
t
BBAAAA tttstsuu
B
∑=
Payoff if Player A knows that Player B is of type tB
Probability that Player B is indeed of type tB
Since Player A does not know Player B’s type, it has to sum over all possible
types to get the expected value
22