
Page 1: Planning with Uncertainty in Continuous Domains

Ames Research Center

Richard Dearden (No fixed abode)

Joint work with: Zhengzhu Feng (U. Mass Amherst), Nicolas Meuleau and Dave Smith (NASA Ames), Richard Washington (Google)

Page 2: Motivation

(Figure: a rover with several candidate science activities: Panorama, Image Rock, Dig Trench, …)

Problem: Scientists are interested in many potential targets. How do we decide which to pursue?

Page 3: Motivation (continued)

(Figure: the same candidate activities, annotated with the questions that make the choice hard: How much time? How much power? What is the likelihood of success? The targets also differ in value.)

Page 4: Outline

• Introduction
• Problem Definition
• A Classical Planning Approach
• The Markov Decision Problem Approach
• Final Comments

Page 5: Problem Definition

Aim: to select a “plan” that “maximises” long-term expected reward, given:
• Limited resources (time, power, memory capacity).
• Uncertainty about the resources required to carry out each action (“how long will it take to drive to that rock?”).
• Hard safety constraints on action applicability (must keep enough reserve power to maintain the rover).
• Uncertain action outcomes (some targets may be unreachable, instruments may be impossible to place).

Difficulties:
• Continuous resources.
• Actions have uncertain continuous outcomes.
• Goal selection and optimization.
• Possibly also concurrency, …

Page 6: Possible Approaches

Contingency planning:
• Generate a single plan, but with branches.
• Branch based on the actual outcome of the actions performed so far in the plan.

Policy-based planning:
• A plan is now a policy: a mapping from states to actions.
• There is something to do no matter what the outcome of the actions so far.
• More general, but harder to compute.

(Figure: a contingent plan branching on remaining energy: Power > 5 Ah vs. Power ≤ 5 Ah.)

Page 7: An Example Problem

(Figure: an activity network for the rover. Each action, e.g. Drive (-2), Dig (60), Visual servo (.2, -.15), Lo res, Hi res, Rock finder, NIR, is annotated with an energy precondition (e.g. E > .6 Ah), two numbers each for energy usage and duration (apparently mean and standard deviation, e.g. .2 Ah / .2 Ah and 40s / 20s), and in some cases a time window (e.g. t ∈ [10:00, 14:00]). Goals carry values V = 100, 50, 10 and 5.)

Page 8: Value Function

(Figure: a surface plot of expected value against power (5–20 Ah) and start time (13:20–14:40).)

Page 9: Value Function (continued)

(Figure: the same expected-value surface shown alongside the activity network from Page 7.)

Page 10: Plans

(Figure: the activity network from Page 7, marked up with a contingency branch and with policy regions.)

Contingency planning: branch on a condition such as “Time > 13:40 or Power < 10”.

Policy-based planning: regions of state space have corresponding actions, e.g.
Time < 13:40 and Power > 10 : VisualServo
Time > 14:15 and Time < 14:30 and Power > 10 : Hi-Res
…
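To make the policy representation concrete, here is a minimal sketch of a region-table policy over (time, power), echoing the rules above. The Region/lookup_action names and the default action are illustrative, not from the talk.

```python
# Minimal sketch of a region-based policy over (time, power) state space.
# Region bounds echo the example rules above; all names are hypothetical.
from dataclasses import dataclass

@dataclass
class Region:
    t_min: float   # start-time window, hours since midnight
    t_max: float
    p_min: float   # remaining-power window, Ah
    p_max: float
    action: str

POLICY = [
    Region(0.0, 13.67, 10.0, float("inf"), "VisualServo"),  # Time < 13:40, Power > 10
    Region(14.25, 14.5, 10.0, float("inf"), "Hi-Res"),      # 14:15 < Time < 14:30, Power > 10
]

def lookup_action(t: float, p: float, default: str = "Safe-Idle") -> str:
    """Return the action of the first region containing state (t, p)."""
    for reg in POLICY:
        if reg.t_min <= t < reg.t_max and reg.p_min < p <= reg.p_max:
            return reg.action
    return default

print(lookup_action(13.0, 12.0))   # -> VisualServo
```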

Page 11: Contingency Planning

1. Seed plan
2. Identify best branch point
3. Generate a contingency branch
4. Evaluate & integrate the branch

(Figure: a seed plan with candidate branch points “?”, and the value functions Vb (branch) and Vm (main plan) over resource r.)

Generating a branch: construct plangraph, back-propagate value tables, compute gain. A runnable toy skeleton of the loop follows.
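The sketch below is a runnable toy of the four-step loop above. Every data structure and stub function here is an illustrative stand-in for the planner's real machinery (plangraph construction and value-table back-propagation are reduced to placeholders), not the actual system.

```python
# Toy skeleton of the contingency-planning loop: seed plan, pick a branch
# point, generate a branch, keep it if the expected gain is positive.
import random

random.seed(0)

def identify_best_branch_point(plan):
    # Step 2 (stub): pick the step whose outcome is most uncertain.
    return max(range(len(plan)), key=lambda i: plan[i]["uncertainty"])

def generate_branch(plan, i):
    # Step 3 (stub): the real system builds a plangraph and back-propagates
    # value tables; here we just invent an alternative with a sampled value.
    return {"actions": ["Hi-Res"], "value": random.uniform(0.0, 20.0)}

def branch_gain(plan, branch, i):
    # Step 4 (stub): expected gain of taking the branch at step i.
    return branch["value"] - plan[i]["value_to_go"]

def add_contingencies(plan, max_branches=2):
    for _ in range(max_branches):
        i = identify_best_branch_point(plan)      # step 2
        branch = generate_branch(plan, i)         # step 3
        if branch_gain(plan, branch, i) > 0:      # step 4
            plan[i].setdefault("branches", []).append(branch)
    return plan

seed_plan = [  # step 1: a seed plan with per-step bookkeeping
    {"action": "Drive",       "uncertainty": 3.0, "value_to_go": 12.0},
    {"action": "VisualServo", "uncertainty": 1.0, "value_to_go": 10.0},
    {"action": "Lo-Res",      "uncertainty": 0.2, "value_to_go": 5.0},
]
print(add_contingencies(seed_plan))
```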

Page 12: Construct Plangraph

(Figure: a plangraph expanded from the current state, with goals g1–g4 at the leaves.)

Page 13: Add Resource Usages and Values

(Figure: the plangraph with values V1–V4 attached to goals g1–g4, and resource usages attached to the actions.)

Page 14: Value Graphs

(Figure: each goal gi now carries a value graph Vi(r), giving its value as a function of available resource r.)

Page 15: Propagate Value Graphs

(Figure: the value graphs v(r) are propagated backwards from the goals through the plangraph towards the initial state.)

Page 16: Ames Research Center Planning with Uncertainty in Continuous Domains Richard Dearden No fixed abode Joint work with: Zhengzhu Feng U. Mass Amherst Nicolas

AmesResearchCenter

p

r5 15

.1

V

p

r5 10

.2

v

r

v

r5 15

v

r10 25

Simple Back-propagation
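Numerically, simple back-propagation is a convolution: the value of holding r units of resource before the action is the expectation of the after-action value over the uncertain usage u, i.e. v_before(r) = ∫ p(u) · v_after(r − u) du. The sketch below uses illustrative numbers loosely echoing the figure (a step value function and a uniform usage density), not the talk's exact tables.

```python
# Numeric sketch of simple back-propagation on a resource grid:
#     v_before(r) = integral of  p(u) * v_after(r - u)  du
import numpy as np

dr = 0.1
r = np.arange(0.0, 40.0, dr)                 # resource axis

# Value after the action: a step worth 10 once at least 5 units remain.
v_after = np.where(r >= 5.0, 10.0, 0.0)

# Resource usage: uniform on [5, 15] (height ~.1, as in the figure).
u = np.arange(0.0, 40.0, dr)
p_usage = np.where((u >= 5.0) & (u <= 15.0), 0.1, 0.0)
p_usage /= p_usage.sum() * dr                # normalize the discretized density

# Discretized convolution: v_before[i] = sum_j p(u_j) * v_after(r_i - u_j) * dr
v_before = np.convolve(p_usage, v_after)[: len(r)] * dr

# The step at r = 5 has become a ramp: ~0 at r = 9, ~5 at r = 15, ~10 at r = 21.
for x in (9.0, 15.0, 21.0):
    print(f"v_before({x:4.1f}) = {v_before[int(x / dr)]:.2f}")
```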

Page 17: Constraints

(Figure: as on Page 16, but the action also carries a resource constraint r > 15, which restricts where the back-propagated value graph is non-zero.)

Page 18: Conjunctions

(Figure: back-propagation through actions with conjunctive preconditions (p ∧ q, s ∧ t): the propagated value tables carry labels, e.g. {q} and {t}, recording the conditions still to be established, and their thresholds shift, e.g. from [5, 15] to [10, 20].)

Page 19: Back-propagating Conditions

(Figure: the condition-labelled tables, e.g. {q} and {t}, are propagated further through the graph and resolved against the actions that achieve those conditions.)

Page 20: Back-propagating Conditions (continued)

(Figure: resolving the outstanding conditions merges the labelled tables back into unconditional value graphs, e.g. a table of value 15 with threshold r = 30.)

Page 21: Which Orderings?

(Figure: partially ordered actions A, B, C, D admit many linearizations: ABCD, ACBD, ACDB, CABD, CADB, CDAB. Only some orderings are considered during propagation.)

Page 22: Combining Tables

(Figure: two value tables v1(r) and v2(r) arriving at the same node, e.g. v1 with threshold around 5–10 and v2 with threshold around 10–20, are combined by a pointwise Max into a single table.)

Page 23: Achieving Both Goals

(Figure: when both goals can be achieved, the tables are combined by summation: with enough resource the combined table reaches v1 + v2 = 30.)
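The two combination rules on Pages 22–23 can be sketched directly on a shared resource grid: a pointwise max when either goal alone may be pursued, and a shifted sum when both must be achieved. The tables, thresholds, and the simple "shift v2 by goal 1's needs" treatment of the conjunctive case are illustrative simplifications, not the planner's exact bookkeeping.

```python
# Combining piecewise-constant value tables on a shared resource grid.
import numpy as np

r = np.arange(0.0, 40.0, 0.5)
v1 = np.where(r >= 5.0, 10.0, 0.0)     # goal 1: worth 10 given 5 units
v2 = np.where(r >= 10.0, 20.0, 0.0)    # goal 2: worth 20 given 10 units

# Either goal (Page 22): take the better of the two at each resource level.
v_either = np.maximum(v1, v2)

# Both goals (Page 23): the resource must cover both, so shift v2 by goal 1's
# requirement (5 units) before adding: v_both(r) = v1(r) + v2(r - 5).
shift = int(5.0 / 0.5)
v_both = v1 + np.concatenate([np.zeros(shift), v2[:-shift]])

print(v_either[r == 12.0], v_both[r == 12.0])   # [20.] [10.]
print(v_both[r == 20.0])                        # [30.] -- v1 + v2, as in the figure
```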

Page 24: Estimating Branch Value

(Figure: the branch value estimate is the pointwise Max over the goal value functions V1–V4, each a function of resource r.)

Page 25: Estimating Branch Value (continued)

(Figure: at the candidate branch point we have the main-plan value function Vm(r), the branch value function Vb(r), and P(r), the probability distribution over the resource remaining when the branch point is reached.)

Page 26: Expected Branch Gain

(Figure: as on Page 25; the gain comes from the region where Vb exceeds Vm.)

Gain = ∫₀^∞ P(r) · max{0, Vb(r) − Vm(r)} dr
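The gain integral above is straightforward to evaluate numerically once P, Vb and Vm are on a common grid. The distributions and value functions in this sketch are illustrative stand-ins.

```python
# Numeric sketch of the expected-gain integral:
#     Gain = integral over r of  P(r) * max(0, Vb(r) - Vm(r))
import numpy as np

dr = 0.05
r = np.arange(0.0, 30.0, dr)

# Resource available at the branch point: a (truncated) Gaussian density.
P = np.exp(-0.5 * ((r - 12.0) / 3.0) ** 2)
P /= P.sum() * dr                            # normalize to integrate to 1

Vm = np.where(r >= 10.0, 25.0, 0.0)          # value of continuing the main plan
Vb = np.where(r >= 5.0, 15.0, 0.0)           # value of taking the branch

# The branch only helps where Vb exceeds Vm (here: 5 <= r < 10).
gain = np.sum(P * np.maximum(0.0, Vb - Vm)) * dr
print(f"Expected branch gain: {gain:.3f}")
```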

Page 27: Heuristic Guidance

Plangraphs are generally used as heuristics: the plans they produce may not be executable.
• Not all orderings are considered.
• All the usual plangraph limitations:
  – Delete lists are generally not considered.
  – No mutual-exclusion representation.
• Discrete outcomes are not (currently) handled.
  – Action uncertainty is only in resource usage, not in the resulting state.

The output is used as heuristic guidance for a classical planner:
• Start state.
• Goal(s) to achieve.

The result is an executable plan of high value!

(Figure: the resulting plan: Drive (-1), Dig (5), Visual servo (.2, -.15), Hi res, Lo res, Rock finder, NIR.)

Page 28: Evaluating the Final Plan

(Figure: the expected-value surface over power (5–20 Ah) and start time (13:20–14:40), computed by simulation.)

The plangraph gives a heuristic estimate of the value of the plan. A better estimate can be computed using Monte Carlo techniques, but these are quite slow for a multi-dimensional continuous problem. The figure required 500 samples per point over 4000 × 2000 points, so every branch of the plan was simulated four thousand million (4 × 10⁹) times. Slow!
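The Monte Carlo evaluation referred to here amounts to rolling the plan forward many times, sampling each action's duration and energy use, and averaging the value collected. The plan, distributions and windows in this sketch are illustrative, not the numbers from the example problem.

```python
# Monte-Carlo plan evaluation: average value over sampled rollouts from a
# given (time, power) start state.
import random

random.seed(1)

PLAN = [  # (name, mean_dur_s, sd_dur_s, mean_Ah, sd_Ah, min_Ah_precond, value)
    ("Drive",       1000, 500, 5.0,  2.5,  10.0,  0),
    ("VisualServo",  120,  20, 0.1,  0.01,  0.12, 0),
    ("Lo-Res",         5,   1, 0.01, 0.0,   0.02, 5),
    ("NIR",          600,  60, 2.0,  0.5,   3.0, 50),
]
DEADLINE = 16 * 3600.0          # 16:00, in seconds since midnight

def simulate(t, power):
    """One rollout: the value achieved from start state (t, power)."""
    total = 0.0
    for name, mu_t, sd_t, mu_e, sd_e, e_min, value in PLAN:
        if power <= e_min or t > DEADLINE:
            break                                  # precondition violated: stop
        t += max(0.0, random.gauss(mu_t, sd_t))    # sample duration
        power -= max(0.0, random.gauss(mu_e, sd_e))  # sample energy use
        if t <= DEADLINE and power >= 0.0:
            total += value
    return total

def estimate_value(t, power, n=500):
    return sum(simulate(t, power) for _ in range(n)) / n

print(estimate_value(t=13 * 3600.0, power=15.0))
```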

Page 29: Outline

• Introduction
• Problem Definition
• A Classical Planning Approach
• The Markov Decision Problem Approach
• Final Comments

Page 30: MDP Approach: Motivation

(Figure: the expected-value surface again; much of it is flat.)

The value function is constant throughout large regions. Wouldn't it be nice to compute the value only once per region?

Approach: exploit the structure in the problem to find constant (or linear) regions.

Page 31: Continuous MDPs

• States: X = {X1, X2, …, Xn}
• Actions: A = {a1, a2, …, am}
• Transition: Pa(X′ | X)
• Reward: Ra(X)

Dynamic programming (Bellman backup) can't be computed in general without discretization.
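The Bellman backup itself did not survive this transcription; in its standard form it is Vn+1(X) = max_a [ Ra(X) + ∫ Pa(X′|X) Vn(X′) dX′ ]. Below is a minimal sketch of the naïve discretized alternative the talk compares against: value iteration on a uniform grid over one continuous resource variable, with an illustrative two-action model (action rewards, usages and thresholds are made up).

```python
# Naive value iteration on a discretized 1-D continuous state (resource).
import numpy as np

N = 400                          # grid resolution
r = np.linspace(0.0, 20.0, N)    # remaining resource
step = r[1] - r[0]

# Each action: (reward as a function of r, distribution over resource usage).
ACTIONS = {
    "cheap-goal": (np.where(r >= 2.0, 5.0, 0.0),  {1.0: 0.5, 3.0: 0.5}),
    "big-goal":   (np.where(r >= 8.0, 20.0, 0.0), {6.0: 0.5, 10.0: 0.5}),
}

def backup(V):
    """V_{n+1}(r) = max_a [ R_a(r) + sum_u P_a(u) * V_n(r - u) ]."""
    out = np.full(N, -np.inf)
    for reward, usage in ACTIONS.values():
        future = np.zeros(N)
        for u, p in usage.items():
            shift = int(round(u / step))          # r - u on the grid
            future += p * np.concatenate([np.zeros(shift), V[:-shift]])
        out = np.maximum(out, reward + future)
    return out

V = np.zeros(N)
for _ in range(50):              # resource strictly decreases, so this converges
    V = backup(V)
print(V[r >= 10.0][0])           # value with ~10 units of resource remaining
```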

Page 32: Symbolic Dynamic Programming

• Special representation of the transition, reward and value functions: MTBDDs for discrete variables, kd-trees for continuous ones.
• The representation makes problem structure (if any) explicit.
• Dynamic programming operates on both the value function and the structured representation.
• The idea is to do all the operations of the Bellman equation in MTBDD/kd-tree form.

Page 33: Continuous State Abstraction

Requires rectangular transition and reward functions:
• Transition probabilities remain constant (relative to the current value) over each region.
• The transition function is discrete: continuous functions are approximated by discretizing.
  – Required so that the family of value functions is closed under the Bellman equation.

Page 34: Continuous State Abstraction (continued)

Requires rectangular transition and reward functions:
• The reward function is piecewise constant or linear over each region.
• This, along with the discrete transition function, ensures that all value functions computed using the Bellman equation are also piecewise constant or linear.

The approach is to compute an exact solution to an approximate model.

Page 35: Value Iteration

Theorem: If Vn is rectangular PWC (PWL), then Vn+1 is rectangular PWC (PWL).

(Figure: a Bellman backup in pictures: Pa applied to Vn yields Vn+1.)

Rectangular partitions are represented using kd-trees. The closure property is sketched in code below.
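The closure theorem can be illustrated in one dimension: with a discrete usage distribution, backing up a piecewise-constant (PWC) value function yields another PWC function whose breakpoints are the old breakpoints shifted by each possible usage, so the backup can be computed exactly on breakpoints rather than on a grid. This 1-D list-of-breakpoints representation is a stand-in for the kd-tree; the example numbers are illustrative.

```python
# Closure of PWC value functions under a Bellman backup with discrete usages.
# Representation: sorted breakpoints b and values v, with V(x) = v[i] for
# b[i] <= x < b[i+1], and V(x) = 0 below the first breakpoint.
import itertools

def evaluate(b, v, x):
    """Value of the PWC function (b, v) at point x."""
    val = 0.0
    for bi, vi in zip(b, v):
        if x >= bi:
            val = vi
    return val

def backup_pwc(b, v, usage):
    """Exact backup through an action with usage distribution {u: p}."""
    # New breakpoints: every old breakpoint shifted by every possible usage.
    new_b = sorted({bi + u for bi, u in itertools.product(b, usage)})
    # Value on each new piece: expected old value one usage-step earlier.
    new_v = [sum(p * evaluate(b, v, x - u) for u, p in usage.items())
             for x in new_b]
    return new_b, new_v

# A step worth 10 from r = 5, backed up through usage 5 or 10 (p = .5 each).
b, v = backup_pwc([5.0], [10.0], {5.0: 0.5, 10.0: 0.5})
print(list(zip(b, v)))   # [(10.0, 5.0), (15.0, 10.0)] -- still PWC
```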

Page 36: Partitioning

(Figure: a kd-tree partition of a two-dimensional continuous state space into rectangles.)

Page 37: Performance: 2 Continuous Variables

(Figure: performance results on problems with two continuous variables.)

Page 38: Performance: 3 Continuous Variables

(Figure: performance results on problems with three continuous variables.)

For the naïve algorithm, everything is simply discretized at the given input resolution. For the others, the transition functions are discretized at that resolution, but the algorithm may increase the resolution to represent the final value function accurately. This means the value function is actually more accurate than for the naïve algorithm.

Page 39: Final Remarks

Plangraph-based approach:
• Produces “plans”: easy for people to interpret.
• Fast heuristic estimate of the value of a plan or plan fragment.
• Needs an effective way to evaluate actual values to really know a branch is worthwhile.
• Efficient representation for problems with many goals.
• Still missing discrete action outcomes.

MDP-based approach:
• Produces optimal policies: the best you could possibly do.
• Faster, more accurate value-function computation (if there is structure).
• Hard to represent some problems effectively (e.g. the fact that goals are worth something only before you reach them).
• Policies are hard for humans to interpret.

The two can be combined: use the MDP approach to evaluate the quality of plans and plan fragments.

Page 40: Future Work

• We approximate by building an approximate model, then solving it exactly. One could also approximately solve the exact model.
• The plangraph approach takes advantage of the current system state when planning to narrow the search. The MDP policy probably includes value computations for many unreachable states.
• Preference elicitation is very important here: with many goals we need good estimates of their value.
• This is part of a greater whole, the rover planning problem:
  – Is the policy encoded efficiently enough to transmit to the rover?
  – How much more complex does the executive need to be to carry out a contingent plan?