9/14: Belief Search Heuristics
Today: Planning graph heuristics for belief search
Wed: MDPs
Heuristics for Belief-Space Planning
Evaluating Search/Planning: Theoretical
• “Worst-case”: look at the complexity
  • Most search/planning problems are NP-complete or harder in the worst case.
  • What would that tell us, other than “find something else easier (if less interesting) to do”?
  • Consider formal restrictions on domains under which complexity may be lower.
  • These restrictions may not be natural.
• “Average-case”: average-case complexity would be better
  • But much harder to analyze
  • What distribution of problems to use?
  • Similar issues arise in empirical analyses
Evaluating Search/Planning: Empirical
• Random problems: look at actual performance on problems. WHICH PROBLEMS?
  • Randomly generated problems
  • Which distribution? (The hardest problems may live in small phase-transition regions, as in SAT.)
  • Find the phase-transition regions and generate random problems there
  • But who said such problems are at all related to problems that occur in practice?
• “Real” or “benchmark” problems
  • Use “real world” problems
    • Fine as far as the customers of that problem (or the boss) are concerned, but it is not clear whether the claims will carry over to any other problems
    • May have to do analysis to figure out what it is about that domain that makes certain approaches work well
  • Develop many “benchmark” domains inspired by various real-world problems and use them to evaluate the coverage of a planner
    • Easy to abstract away the critical characteristics when developing benchmarks
    • See Cushing’s analysis of temporal planning domains
Heuristics for Conformant Planning
• First idea: notice that “classical planning” (which assumes full observability) is a “relaxation” of conformant planning
  • So the length of the classical planning solution is a lower bound (admissible heuristic) for conformant planning
  • Further, the heuristics for classical planning are also heuristics for conformant planning (albeit probably not very informed)
• Next idea: let us get a feel for how estimating distances between belief states differs from estimating those between states
  • Three issues:
    • How many states are there?
    • How far is each of the states from the goal?
    • How much interaction is there between states? For example, if the length of the plan for taking S1 to the goal is 10 and for S2 to the goal is 10, the length of the plan for taking both to the goal could be anywhere between 10 and infinity, depending on the interactions. [Notice that we talk about “state” interactions here just as we talked about “goal interactions” in classical planning.]
  • Need to estimate the length of the “combined plan” for taking all states to the goal
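A minimal sketch of how the first two issues (number of states, per-state distance) might be combined, assuming per-state distance estimates are already available. The function name `aggregate_distances` and the dictionary layout are hypothetical. The max corresponds to assuming one plan serves every state; the sum corresponds to assuming each state needs its own sub-plan. Neither captures negative interactions between states, which is why the true combined length can be far larger.

```python
import math

def aggregate_distances(dist):
    """Combine per-state goal distances for one belief state.

    dist maps each state in the belief state to an estimated plan length.
    Returns (cardinality, optimistic, independent), where
      optimistic  = max of the distances (one plan works for every state)
      independent = sum of the distances (a separate sub-plan per state)
    """
    values = list(dist.values())
    if not values:
        return 0, 0, 0
    if any(math.isinf(v) for v in values):
        # If any single state cannot reach the goal, no conformant plan exists.
        return len(values), math.inf, math.inf
    return len(values), max(values), sum(values)

# The example from the slide: S1 and S2 are each 10 steps from the goal.
print(aggregate_distances({"S1": 10, "S2": 10}))  # (2, 10, 20)
```

With S1 and S2 both at distance 10, the optimistic estimate is the slide's lower end of 10; the sum of 20 is only an estimate under independence, not an upper bound.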
World’s funniest joke (in USA)
In addition to interactions between literals, as in classical planning, we also have interactions between states (belief-space planning)
Belief-state cardinality alone won’t be enough…
• Early work on conformant planning concentrated exclusively on heuristics that look at the cardinality of the belief state
  • The larger the cardinality of the belief state, the higher its uncertainty, and the worse it is (for progression)
  • Notice that in regression we have the opposite heuristic: the larger the cardinality, the higher the flexibility (we are satisfied with any one of a larger set of states), and so the better it is
• From our example in the previous slide, cardinality is only one of the three components that go into actual distance estimation
  • For example, there may be an action that reduces the cardinality (e.g. bomb the place), but the new belief state with low uncertainty will be at infinite distance from the goal
• We will look at planning graph-based heuristics for considering all three components (actually, unless we look at cross-world mutexes, we won’t be considering the interaction part…)
Using a Single, Unioned Graph
[Figure: one planning graph built from the unioned initial level {P, Q, R, M} for the initial states PM, QM, RM; actions A1, A2, A3 add K and L, then A4 and A5 add G]
• Union literals from all initial states into a conjunctive initial graph level
• Minimal implementation
• Not effective
• Lose world-specific support information
Heuristic Estimate = 2
Actions:
A1: M, P => K
A2: M, Q => K
A3: M, R => L
A4: K => G
A5: L => G
Goal State: G
Initially: (P V Q V R) & (~P V ~Q) & (~P V ~R) & (~Q V ~R) & M
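The unioned-graph estimate can be reproduced with a short relaxed-reachability sketch. It assumes each action needs all of its listed literals (for example, A1 needs both M and P to give K) and ignores delete effects and mutexes. The names `ACTIONS` and `level_heuristic` are illustrative, not from the slides.

```python
# Actions from the slide, read as: name -> (required literals, added literal).
ACTIONS = {
    "A1": ({"M", "P"}, "K"),
    "A2": ({"M", "Q"}, "K"),
    "A3": ({"M", "R"}, "L"),
    "A4": ({"K"}, "G"),
    "A5": ({"L"}, "G"),
}

def level_heuristic(initial_literals, goal, actions=ACTIONS):
    """Return the first relaxed planning graph level containing `goal`.

    Returns None if the graph levels off before the goal appears.
    """
    literals = set(initial_literals)
    level = 0
    while goal not in literals:
        # Every action whose conditions hold at this level contributes its effect.
        new = {eff for pre, eff in actions.values() if pre <= literals}
        if new <= literals:
            return None  # nothing new: the goal is unreachable
        literals |= new
        level += 1
    return level

# Union the literals of the initial states PM, QM, RM into one level.
unioned = {"P", "M"} | {"Q", "M"} | {"R", "M"}
print(level_heuristic(unioned, "G"))  # 2
```

Because P, Q and R all sit in the same initial level, the graph cannot tell which world supports K or L, which is the loss of world-specific support information noted on the slide.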
Using Multiple Graphs
[Figure: one planning graph per initial state. From PM: A1 gives K, then A4 gives G. From QM: A2 gives K, then A4 gives G. From RM: A3 gives L, then A5 gives G.]
• Same-world mutexes
• Memory intensive
• Heuristic computation can be costly
Unioning these graphs a priori would give much savings …
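A sketch of the multiple-graph idea: extract one relaxed plan per initial state, then combine the plans. It assumes each action needs all of its listed literals, picks the first supporter found for each literal, and combines by taking the union of the chosen actions. The function `relaxed_plan` is a hypothetical helper, and the union is one possible aggregation, not necessarily the one used in the lecture.

```python
# Actions from the example: name -> (required literals, added literal).
ACTIONS = {
    "A1": ({"M", "P"}, "K"),
    "A2": ({"M", "Q"}, "K"),
    "A3": ({"M", "R"}, "L"),
    "A4": ({"K"}, "G"),
    "A5": ({"L"}, "G"),
}

def relaxed_plan(initial, goal, actions=ACTIONS):
    """Return a set of actions forming a relaxed plan, or None if unreachable."""
    supporter = {lit: None for lit in initial}  # literal -> first achieving action
    changed = True
    while goal not in supporter and changed:
        changed = False
        reached = set(supporter)  # literals available at the current level
        for name, (pre, eff) in actions.items():
            if pre <= reached and eff not in supporter:
                supporter[eff] = name
                changed = True
    if goal not in supporter:
        return None
    # Backchain from the goal through the recorded supporters.
    plan, stack = set(), [goal]
    while stack:
        act = supporter[stack.pop()]
        if act is not None and act not in plan:
            plan.add(act)
            stack.extend(actions[act][0])
    return plan

worlds = [{"P", "M"}, {"Q", "M"}, {"R", "M"}]
plans = [relaxed_plan(w, "G") for w in worlds]
combined = set().union(*plans)
print([sorted(p) for p in plans])  # [['A1', 'A4'], ['A2', 'A4'], ['A3', 'A5']]
print(len(combined))               # 5
```

Each world needs only two actions, but no two worlds share a full plan, so the combined relaxed plan uses all five actions.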
Using a Single, Labeled Graph (joint work with David E. Smith)
[Figure: a single planning graph over the initial literals P, Q, R, M, with actions A1 to A5 and derived literals K, L, G, each annotated with a label]
Action labels: conjunction of labels of supporting literals
Literal labels: disjunction of labels of supporting actions
Heuristic Value = 5
• Memory efficient
• Cheap heuristics
• Scalable
• Extensible
Benefits from BDDs
Label Key (labels signify possible worlds under which a literal holds):
~Q & ~R
~P & ~R
~P & ~Q
(~P & ~R) V (~Q & ~R)
(~P & ~R) V (~Q & ~R) V (~P & ~Q)
True
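The two label rules can be sketched with explicit sets of worlds standing in for the label formulas (a BDD would play this role in a real implementation). The world names `wP`, `wQ`, `wR` and the function `labeled_graph` are hypothetical, and each action is assumed to need all of its listed literals. The sketch only propagates labels; it does not extract the relaxed plan behind the heuristic value on the slide.

```python
ACTIONS = {
    "A1": ({"M", "P"}, "K"),
    "A2": ({"M", "Q"}, "K"),
    "A3": ({"M", "R"}, "L"),
    "A4": ({"K"}, "G"),
    "A5": ({"L"}, "G"),
}
# One named world per initial state; a label is the set of worlds where a literal holds.
WORLDS = {"wP": {"P", "M"}, "wQ": {"Q", "M"}, "wR": {"R", "M"}}

def labeled_graph(worlds, actions, goal):
    """Propagate labels level by level until `goal` holds in every world.

    Returns (level, labels); level is None if the labels stop changing first.
    """
    all_worlds = frozenset(worlds)
    labels = {}
    for world, literals in worlds.items():
        for lit in literals:
            labels[lit] = labels.get(lit, frozenset()) | {world}
    level = 0
    while labels.get(goal, frozenset()) != all_worlds:
        new_labels = dict(labels)
        for pre, eff in actions.values():
            # Action label: conjunction (intersection) of its literals' labels.
            act_label = all_worlds
            for p in pre:
                act_label &= labels.get(p, frozenset())
            # Literal label: disjunction (union) of its supporting actions' labels.
            if act_label:
                new_labels[eff] = new_labels.get(eff, frozenset()) | act_label
        if new_labels == labels:
            return None, labels
        labels = new_labels
        level += 1
    return level, labels

level, labels = labeled_graph(WORLDS, ACTIONS, "G")
print(level)                # 2
print(sorted(labels["K"]))  # ['wP', 'wQ']
print(sorted(labels["L"]))  # ['wR']
```

The label of K covers the P-world and the Q-world, matching the key entry (~P & ~R) V (~Q & ~R), and G is reached in all three worlds at level 2.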
What about mutexes?
• In the previous slide, we considered only relaxed plans (thus ignoring any mutexes)
• We could have considered mutexes in the individual world graphs to get better estimates of the plans in the individual worlds (call these same-world mutexes)
• We could also have considered the impact of having an action in one world on the other world
  • Consider a patient who may or may not be suffering from disease D. There is a medicine M which, if given in the world where he has D, will cure the patient. But if it is given in the world where the patient doesn’t have disease D, it will kill him. Since giving the medicine M will have an impact in both worlds, we now have a mutex between “being alive” in world 1 and “being cured” in world 2!
• Notice that cross-world mutexes take into account the state interactions that we mentioned as one of the three components making up the distance estimate
• We could compute a subset of same-world and cross-world mutexes to improve the accuracy of the heuristics…
  • …but it is not clear whether the accuracy comes at too much additional cost to have a reasonable impact on efficiency [see Bryce et al., JAIR submission]
Heuristics for sensing
• We need to compare the cumulative distance of B1 and B2 to the goal with that of B3 to the goal
  • Notice that planning cost is related to plan size, while plan execution cost is related to the length of the deepest branch (or the expected length of a branch)
• If we use the conformant belief-state distance (as discussed last class), then we will be overestimating the distance (since sensing may allow us to take a shorter branch)
• Bryce [ICAPS 05, submitted] starts with the conformant relaxed plan and introduces sensory actions into the plan to estimate the cost more accurately
[Figure: actions As and A, belief states B1, B2, B3, with the values 7, 12,000, 11,000 and 300]
Symbolic Projection
• A set of states is a logical formula
• A transition function is also a logical formula
• Projection is a logical operation
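A small sketch of projection as a logical operation, using explicit Python sets in place of formulas. The projected belief state is the set of successors s' for which some s in the belief state has (s, s') in the transition relation, which is the set-level reading of "exists s: B(s) and T(s, s')". The helpers `action_relation` and `project`, and the reading of A1 as "if M and P hold, add K; otherwise do nothing", are assumptions for illustration.

```python
def action_relation(states, pre, eff):
    """Transition relation of one action over the given states.

    Each state is a frozenset of true literals. If `pre` holds in a state
    the action adds `eff`; otherwise the state is left unchanged.
    """
    return {(s, s | {eff} if pre <= s else s) for s in states}

def project(belief, transition):
    """Image of `belief`: every s2 with (s1, s2) in `transition` and s1 in `belief`."""
    return {s2 for (s1, s2) in transition if s1 in belief}

# Belief state with the three initial states PM, QM, RM.
belief = {frozenset("PM"), frozenset("QM"), frozenset("RM")}
a1 = action_relation(belief, {"M", "P"}, "K")
after_a1 = project(belief, a1)
print(sorted("".join(sorted(s)) for s in after_a1))  # ['KMP', 'MQ', 'MR']
```

With BDDs, the same conjunction and existential quantification are carried out on the formula representation rather than by enumerating states.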
Symbolic Manipulation with OBDDs
• Strategy
  • Represent data as a set of OBDDs
    • Identical variable orderings
  • Express solution method as a sequence of symbolic operations
    • Sequence of constructor & query operations
    • Similar style to on-line algorithm
  • Implement each operation by OBDD manipulation
    • Do all the work in the constructor operations
• Key Algorithmic Properties
  • Arguments are OBDDs with identical variable orderings
  • Result is an OBDD with the same ordering
  • Each step has polynomial complexity
[From Bryant’s slides]
Transition function as a BDD
Belief state as a BDD
BDDs for representing States & Transition Function
Restriction Execution Example
[Figure: the argument F, an OBDD over variables a, b, c, d; the restriction F[b=1], an OBDD over a, c, d; and the reduced result, an OBDD over c, d]
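A minimal sketch of restriction followed by reduction on a toy OBDD encoding. Nodes are tuples `(var, low, high)` and the terminals are the integers 0 and 1. The function F = (a or b) and (c or d) is a hypothetical stand-in chosen so that restricting b to 1 makes the a node redundant; it is not claimed to be the function drawn on the slide.

```python
ORDER = ["a", "b", "c", "d"]  # variable ordering shared by every OBDD here

def mk(var, low, high):
    """Build a node, dropping it when both branches agree (the reduction rule)."""
    return low if low == high else (var, low, high)

def restrict(node, var, value):
    """Return the OBDD of `node` with `var` fixed to `value` (0 or 1)."""
    if node in (0, 1):
        return node
    v, low, high = node
    if ORDER.index(v) > ORDER.index(var):
        return node  # `var` cannot occur below this node
    if v == var:
        return high if value else low
    return mk(v, restrict(low, var, value), restrict(high, var, value))

def evaluate(node, assignment):
    """Follow one path from the root to a terminal under a full assignment."""
    while node not in (0, 1):
        v, low, high = node
        node = high if assignment[v] else low
    return node

# Hypothetical F = (a or b) and (c or d), built bottom-up in the order a < b < c < d.
c_or_d = mk("c", mk("d", 0, 1), 1)
F = mk("a", mk("b", 0, c_or_d), c_or_d)

result = restrict(F, "b", 1)
print(result == c_or_d)  # True
```

After restriction both branches of the a node lead to the same sub-graph, so `mk` drops it and only the c and d nodes remain.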
Don’t look beyond this point
A* vs. AO* Search
A* search finds a path in an “or” graph
AO* search finds an “And” path in an And-Or graph
AO* reduces to A* if there are no AND branches
AO* typically used for problem reduction search
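To make the contrast concrete, here is a sketch that evaluates the cheapest solution of a small acyclic And-Or graph by plain recursion. This is not AO* itself (there is no heuristic guidance or incremental expansion); it only shows what an "And" solution costs. The graph encoding, in which each node maps to a list of connectors `(cost, [children])` and a connector with several children is an AND branch, is an assumption for illustration.

```python
import math

def best_cost(node, graph, goals):
    """Cheapest solution cost from `node` in an acyclic And-Or graph.

    graph maps a node to a list of connectors (cost, children). Picking a
    connector is an OR choice; all children of the chosen connector must be
    solved (AND), so their costs are summed.
    """
    if node in goals:
        return 0
    connectors = graph.get(node, [])
    return min(
        (cost + sum(best_cost(child, graph, goals) for child in children)
         for cost, children in connectors),
        default=math.inf,  # dead end: no connector leaves this node
    )

# "s" can go through "a" alone, or through the AND branch {"b", "c"}.
and_or = {
    "s": [(1, ["a"]), (2, ["b", "c"])],
    "a": [(5, ["g"])],
    "b": [(1, ["g"])],
    "c": [(1, ["g"])],
}
print(best_cost("s", and_or, {"g"}))   # 4

# With single-child connectors only, the result is an ordinary shortest path.
or_only = {"s": [(1, ["a"]), (4, ["g"])], "a": [(2, ["g"])]}
print(best_cost("s", or_only, {"g"}))  # 3
```

In the second graph there are no AND branches, so the value is the shortest-path cost that A* would also return.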