37
Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

Embed Size (px)

Citation preview

Page 1: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

Conformant Probabilistic Planning via CSPs

ICAPS-2003

Nathanael Hyafil & Fahiem Bacchus

University of Toronto

Page 2: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

2

Contributions• Conformant Probabilistic Planning

– Uncertainty about initial state

– Probabilistic Actions

– No observations during plan execution

– Find plan with highest probability of achieving the goal

• Utilize a CSP Approach– Encode the problem as a CSP

– Develop new techniques for solving this kind of CSP

• Compare Decision-Theoretic algorithms

Page 3: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

3

Conformant Probabilistic Planning

CPP Problem consists of:• S: (finite) set of states (factored representation)• B: (initial) probability distribution over S (belief state)• A: (finite) set of (prob.) actions (represented as sequential-

effect trees)• G: set of Goal states (Boolean expression over state vars)• n: length/horizon of the plan to be computed

Solution: find a plan (finite sequence of actions) that maximizes the probability of reaching a goal state from B

Page 4: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

4

Example Problem: SandCastle• S: {moat, castle}, both Boolean• G: castle = true• A: sequential effects tree representation (Littman 1997)

Erect-Castle

castle

castle

T F1.0 moat

T F1.0 0.5

moat

moat

T Fcastle 0.0

T F0.75 castle:new

T F0.67 0.25

R1 R2R3

R4

Page 5: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

5

Constraint Satisfaction Problems

• Encode the CPP as a CSP• CSP

– finite set of variables each with its own finite domain– finite set of constraints, each over some subset of the variables.

• Constraint: satisfied by only some variable assignments.• Solution: an assignment to all Variables that satisfies all

Constraints• Standard Algorithm: Depth-First Tree Search with

Constraint Propagation

Page 6: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

6

The Encoding• Variables:

– State variables (usually Boolean)– Action variables {1, …, |A|}– Random variables to encode the action probabilities

(True/False/Irrelevant). • The setting of the random variables makes the actions deterministic• Random variables are independent

• Constraints– Initial belief state represented as an “Initial action” (as in partial

order planning).– Actions constraint the state variables at step i with those at step i+1

• Each branch of the effects trees is represented as one constraint.

– A constraint representing the goal

Page 7: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

7

V1

V2

T FState Var

Page 8: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

8

V1

V2

T FState Var

Page 9: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

9

V1

V2

a1 a2 a3

State Var

Action VarT F

Page 10: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

10

V1

V2

a1 a2 a3

State Var

Action Var

Random Var

T F

Page 11: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

11

V1

V2

A1

State Var

Action Var

Random Var

T F

T F

Page 12: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

12

V1

V2

A1

State Var

Action Var

Random Var

T F

T F

Page 13: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

13

V1

V2

A1

State Var

Action Var

Random Var

T F

Page 14: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

14

V1

V2

A1

State Var

Action Var

Random Var

T F

Goal States

Page 15: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

15

V1

V2

A1

State Var

Action Var

Random Var

T F

Goal States

a2

Page 16: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

16

So far

• Encoded CPP as a CSP

• Solved the CSP

• How do we now solve the CPP?

Page 17: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

17

State Var

Action Var

Random Var

Goal States

CSPSolutions

Page 18: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

18

State Var

Action Var

Random Var

Goal States

CSP solution:

assignment to State/Action/Random variables that is

- valid sequence of transitions- from initial - to goal state

Page 19: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

19

State Var

Action Var

Random Var

Goal Statesa1

a1

Action variables: plan executed

State variables: execution path induced by plan

Page 20: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

20

State Var

Action Var

Random Var

Goal Statesa1

a2

Page 21: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

21

State Var

Action Var

Random Var

Goal States

a2

a1

Page 22: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

22

State Var

Action Var

Random Var

Goal States

1-.25

.25

Probability ofa Solution

Product prob of Random vars :probability that this path was traversed by this plan

Page 23: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

23

State Var

Action Var

Random Var

Goal Statesa1

a2

Value ofa Plan

a1

a2

Value of plan : probs of all solutions with plan =

After all ’s evaluated:optimal plan = one with highest value

Page 24: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

24

Redundant Computations

• Due to the Markov property if the same state is encountered again at step i of the plan the subtree below will be the same– If we can cache all the info in this subtree:

explore only once

• To compute the best overall n-step plan: need to know value of every n-1 step plan for all states at step 1.

Page 25: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

25

Caching• Probability of success (value) of a i step plan

<a, > in state s = expectation of ’s success probability over the states reached from s by a:

• If we know the value of in each of these states: can compute its value in s without further search

• So, for each state s reached at step i, we cache the value of all n-i step plans

s

s1 s3s2

V1

a

.5 .2 .3

V3

V2

Page 26: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

26

CPplan Algorithm• CPplan():

Select next unassigned variable VIf V is last state var of a step:

If this state/step is cached returnElse if all vars are assigned (must be at a goal state):

Cache 1 as value of previous state/stepElse

For each value d of VV=dCPplan()If V is some Action var Ai

Update Cache for previous state/step with values of plans starting with d

Page 27: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

27

Caching Scheme

• Needs a lot of memory– proportional to |S|*|A|n

– no known algorithm does better

• Other features– Cache key simple (state / step)– Partial Caching achieves a good

space/time tradeoff

Page 28: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

28

MAXPLAN (Majercik & Littman 1998)

• Parallel approach based on Stochastic SAT

• Caching Scheme different

• Faster than Buridan and other AI Planners

• uses (even) more memory than CPplan

• 2 to 3 orders of magnitude slower than CPplan

Page 29: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

29

Results vs. Maxplan

0.1

1

10

100

1000

10000

10 11 12 13 14 15 16 17 18 19 20

MaxPlanCpplan

SandCastle-67 Slippery GripperTimeTime

Number of Steps Number of Steps

0.1

1

10

100

1000

10000

5 6 7 8 9 10 11 12

MaxplanCpplan

Page 30: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

30

CPP as special case of POMDPs

• POMDP: model for probabilistic planning in partially observable environments

• CPP can be cast as a POMDP in which there are no observations.

• Value Iteration, a standard POMDP algorithm, can be used to compute a solution to CPP.

Page 31: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

31

Value Iteration: Intuitions

• Value Iteration utilizes a powerful form of state abstraction.– Value of an i-step plan (for every belief state) is represented

compactly by vector of values (one for each state): value on a belief state is the expectation of these values.

– This vector of values is called an -vector.– Value iteration need only consider the set of -vectors that are

optimal for some belief state.– Plans optimal for some belief state are optimal over an entire

region of belief states. – So regions of belief states are managed collectively by a single

plan (-vector) that is i-step optimal for all belief states in the region.

Page 32: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

32

-vector Abstraction

• Number of alpha-vectors that need to be considered might grow much more slowly than the number of action sequences.

• Slippery Gripper:– 1 step to go: 2 -vectors instead of 4 actions– 2 steps to go: 6 instead of 16 (action sequences)– 3 steps to go: 10 instead of 64– 10 steps to go: 40 instead of >106

Page 33: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

33

Results vs. POMDP

0.01

0.1

1

10

100

1000

5 6 7 8 9 10 11 12 13 14 15

POMDPCpplan

Time

Number of Steps

Slippery Gripper Grid 10x10Time

Number of Steps

0.1

1

10

100

1000

5 6 7 8 9 10 11 12 13

POMDPCpplan

Page 34: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

34

Dynamic Reachability

• POMDPs: small portion of all possible plans to evaluate but on all belief states including those not reachable from the initial belief state.

• Combinatorial Planners (CPplan, Maxplan) must evaluate all |A|n plans but tree search performs dynamic reachability and goal attainability analysis to only evaluate plans on reachable states at each step

• Ex: Grid 10x10, only 4 states reachable in 1 step

Page 35: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

35

Conclusion -- Future Work

• New approach to CPP, better than previous AI planning techniques (Maxplan, Buridan)

• Analysis of respective benefits of decision theoretic techniques and AI techniques

• Ways to combine abstraction with dynamic reachability for POMDPs and MDPs.

Page 36: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

36

Results vs. Maxplan

SandCastle - 67 Slippery Gripper

Time Time

Number of Steps Number of Steps

Page 37: Conformant Probabilistic Planning via CSPs ICAPS-2003 Nathanael Hyafil & Fahiem Bacchus University of Toronto

37

Results vs. POMDP

Time

Number of Steps

Slippery Gripper Grid 10x10

Time

Number of Steps