
Conformant Probabilistic Planning via CSPs

ICAPS-2003

Nathanael Hyafil & Fahiem Bacchus

University of Toronto

Contributions

• Conformant Probabilistic Planning (CPP)
  – Uncertainty about the initial state
  – Probabilistic actions
  – No observations during plan execution
  – Find the plan with the highest probability of achieving the goal
• Utilize a CSP approach
  – Encode the problem as a CSP
  – Develop new techniques for solving this kind of CSP
• Compare with decision-theoretic algorithms

Conformant Probabilistic Planning

A CPP problem consists of:
• S: (finite) set of states (factored representation)
• B: (initial) probability distribution over S (the belief state)
• A: (finite) set of (probabilistic) actions, represented as sequential-effects trees
• G: set of goal states (a Boolean expression over state variables)
• n: length/horizon of the plan to be computed

Solution: find a plan (a finite sequence of actions) that maximizes the probability of reaching a goal state from B
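
To make the objective concrete, here is a minimal brute-force sketch of this definition (our own illustration, not the paper's algorithm; `transition` and `goal` are assumed helpers): enumerate all |A|^n plans, push the belief state through each one, and keep the plan with the highest goal probability.

```python
from itertools import product

def plan_value(plan, belief, transition, goal):
    """Probability that `plan` reaches a goal state from `belief`.
    transition(s, a) -> {next_state: probability}; goal(s) -> bool."""
    for a in plan:
        next_belief = {}
        for s, p in belief.items():
            for s2, q in transition(s, a).items():
                next_belief[s2] = next_belief.get(s2, 0.0) + p * q
        belief = next_belief
    return sum(p for s, p in belief.items() if goal(s))

def best_plan(actions, n, belief, transition, goal):
    # Exhaustively score all |A|**n candidate plans.
    return max(product(actions, repeat=n),
               key=lambda plan: plan_value(plan, belief, transition, goal))
```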

Example Problem: SandCastle

• S: {moat, castle}, both Boolean
• G: castle = true
• A: sequential-effects tree representation (Littman 1997)

[Figure: the sequential-effects tree for the Erect-Castle action. It branches on the current values of castle and moat and gives, for each branch (labelled R1–R4), the probability that each variable becomes true; the probabilities shown are 1.0, 0.5, 0.75, 0.67, 0.25, and 0.0.]
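
For concreteness, a sequential-effects tree can be encoded as nested dicts: internal nodes test a state variable, leaves give the probability that the affected variable becomes true. The tree shape below is hypothetical (the real Erect-Castle tree is in the figure above); only the 0.67/0.25 leaf probabilities are taken from the slide.

```python
# Hypothetical effects tree for the `castle` variable of Erect-Castle.
erect_castle_castle = {
    "test": "castle",
    True:  {"p_true": 1.0},              # an existing castle stays built
    False: {"test": "moat",              # otherwise, success depends on the moat
            True:  {"p_true": 0.67},
            False: {"p_true": 0.25}},
}

def effect_prob(tree, state):
    """Walk the tree and return P(variable becomes True) in `state`."""
    while "p_true" not in tree:
        tree = tree[state[tree["test"]]]
    return tree["p_true"]

print(effect_prob(erect_castle_castle, {"castle": False, "moat": True}))  # 0.67
```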

Constraint Satisfaction Problems

• Encode the CPP as a CSP
• A CSP has:
  – a finite set of variables, each with its own finite domain
  – a finite set of constraints, each over some subset of the variables
• Constraint: satisfied by only some variable assignments
• Solution: an assignment to all variables that satisfies all constraints
• Standard algorithm: depth-first tree search with constraint propagation (see the sketch below)
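
As a reference point for the standard algorithm, here is a generic backtracking sketch (our own illustration; the "propagation" here is just early constraint checking, far weaker than what a real solver does):

```python
def solve(domains, constraints, assignment=None):
    """domains: {var: [values]}; constraints: [(vars, predicate)]."""
    assignment = assignment or {}
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        # Check every constraint whose variables are all assigned so far.
        if all(any(v not in assignment for v in vs)
               or pred(*(assignment[v] for v in vs))
               for vs, pred in constraints):
            result = solve(domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]
    return None

print(solve({"x": [0, 1], "y": [0, 1]},
            [(("x", "y"), lambda x, y: x != y)]))  # {'x': 0, 'y': 1}
```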

The Encoding

• Variables (see the sketch after this list):
  – State variables (usually Boolean)
  – Action variables with domain {1, …, |A|}
  – Random variables to encode the action probabilities (True/False/Irrelevant)
    • The setting of the random variables makes the actions deterministic
    • The random variables are independent
• Constraints:
  – The initial belief state, represented as an "initial action" (as in partial-order planning)
  – Actions constrain the state variables at step i together with those at step i+1
    • Each branch of the effects trees is represented as one constraint
  – A constraint representing the goal
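
A sketch of the variable layout just described (names and containers are our own, not the paper's encoding): one copy of each state variable per time step, one action variable per step, and one random variable per probabilistic branch per step.

```python
def make_csp_variables(state_vars, actions, branches, n):
    """Return {variable_key: domain} for an n-step encoding."""
    csp = {}
    for t in range(n + 1):                  # state variables at steps 0..n
        for v in state_vars:
            csp[(v, t)] = [True, False]
    for t in range(n):                      # one action choice per step
        csp[("action", t)] = list(actions)
        for b in branches:                  # chance outcomes, per step
            csp[(b, t)] = [True, False, "irrelevant"]
    return csp

layout = make_csp_variables(["moat", "castle"],
                            ["dig-moat", "erect-castle"],
                            ["r1", "r2", "r3", "r4"], n=3)
print(len(layout))  # 8 state vars + 3 action vars + 12 random vars = 23
```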

[Slides 7–15: diagrams that build up the CSP encoding layer by layer: state variables V1 and V2 at each time step; an action variable with domain {a1, a2, a3} linking consecutive steps; random variables attached to each action to make its probabilistic branches explicit; and finally a constraint marking the goal states at the last step. The final diagram highlights one satisfying assignment with action a2.]

So far

• Encoded CPP as a CSP

• Solved the CSP

• How do we now solve the CPP?

[Slides 17–18: the encoding diagram with the CSP solutions highlighted.]

CSP solution: an assignment to the state/action/random variables that forms
- a valid sequence of transitions
- from an initial state
- to a goal state

[Slide 19: one CSP solution traced through the diagram for the plan ⟨a1, a1⟩.]

Action variables: the plan executed
State variables: the execution path induced by the plan

[Slides 20–21: further CSP solutions traced through the diagram, for the plans ⟨a1, a2⟩ and ⟨a2, a1⟩.]

Probability of a Solution

[Slide 22: one solution path through the diagram, annotated with the random-variable probabilities .25 and 1 - .25.]

The product of the probabilities of the random variables is the probability that this path is traversed by this plan.

Value of a Plan

[Slide 23: two solutions traced through the diagram for the same plan ⟨a1, a2⟩.]

Value of a plan π: the sum of the probabilities of all solutions whose action variables equal π.

After all plans π have been evaluated, the optimal plan is the one with the highest value.
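
The two quantities above can be written down directly (container shapes are our assumption: a solution records its action assignment and its random-variable assignment; `branch_probs[r]` is P(r = True)):

```python
from math import prod

def solution_probability(random_assignment, branch_probs):
    """P that this execution path is traversed by this plan."""
    return prod(branch_probs[r] if val else 1.0 - branch_probs[r]
                for r, val in random_assignment.items())

def value_of_plan(plan, solutions, branch_probs):
    """Sum of the probabilities of all solutions whose actions equal `plan`."""
    return sum(solution_probability(sol["random"], branch_probs)
               for sol in solutions if sol["actions"] == plan)
```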

Redundant Computations

• By the Markov property, if the same state is encountered again at step i of the plan, the search subtree below it is the same
  – If we can cache all the information in this subtree, we need to explore it only once
• To compute the best overall n-step plan, we need to know the value of every (n-1)-step plan for all states at step 1

Caching

• The probability of success (value) of an i-step plan ⟨a, π⟩ in state s is the expectation of π's success probability over the states reached from s by a (sketched below):

  V(⟨a, π⟩, s) = Σ_s' P(s' | s, a) · V(π, s')

• If we know the value of π in each of these states, we can compute its value in s without further search
• So, for each state s reached at step i, we cache the values of all (n-i)-step plans

[Figure: action a takes state s to successors s1, s2, s3 with probabilities .5, .2, .3, whose cached values are V1, V2, V3.]
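
A minimal sketch of the caching identity (names are ours; the transition probabilities match the figure above):

```python
def value_with_cache(s, a, pi, transition, cache):
    """Value of plan <a, pi> in s: expectation of pi's cached values."""
    return sum(p * cache[(s2, pi)] for s2, p in transition(s, a).items())

cache = {("s1", "pi"): 1.0, ("s2", "pi"): 0.4, ("s3", "pi"): 0.0}
transition = lambda s, a: {"s1": 0.5, "s2": 0.2, "s3": 0.3}
print(value_with_cache("s", "a", "pi", transition, cache))
# 0.5*1.0 + 0.2*0.4 + 0.3*0.0 = 0.58
```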

CPplan Algorithm

CPplan():
  Select the next unassigned variable V
  If V is the last state variable of a step:
    If this state/step is cached: return
  Else if all variables are assigned (we must be at a goal state):
    Cache 1 as the value of the previous state/step
  Else:
    For each value d of V:
      V = d
      CPplan()
      If V is some action variable Ai:
        Update the cache for the previous state/step with the values of plans starting with d
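
A compressed, assumed reading of this algorithm in executable form: depth-first search memoised on (state, steps_left), where the cache stores, for each state reached at step i, the value of every remaining (n-i)-step plan. This abstracts away the CSP variable ordering of the real CPplan.

```python
def cpplan(state, steps_left, actions, transition, goal, cache):
    """Return {plan: success probability} for all plans from `state`."""
    key = (state, steps_left)
    if key in cache:                 # same state at the same step: reuse subtree
        return cache[key]
    if steps_left == 0:
        result = {(): 1.0 if goal(state) else 0.0}
    else:
        result = {}
        for a in actions:
            for s2, p in transition(state, a).items():
                sub = cpplan(s2, steps_left - 1, actions, transition, goal, cache)
                for pi, v in sub.items():
                    result[(a,) + pi] = result.get((a,) + pi, 0.0) + p * v
    cache[key] = result
    return result

def best_plan_cached(belief, n, actions, transition, goal):
    """Best n-step plan: expectation of the cached plan values over the belief."""
    cache, totals = {}, {}
    for s, p0 in belief.items():
        for pi, v in cpplan(s, n, actions, transition, goal, cache).items():
            totals[pi] = totals.get(pi, 0.0) + p0 * v
    return max(totals, key=totals.get)
```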

Caching Scheme

• Needs a lot of memory
  – proportional to |S| · |A|^n
  – no known algorithm does better
• Other features
  – The cache key is simple (state / step)
  – Partial caching achieves a good space/time tradeoff

MAXPLAN (Majercik & Littman 1998)

• A parallel approach based on stochastic SAT
• A different caching scheme
• Faster than Buridan and other AI planners
• Uses (even) more memory than CPplan
• 2 to 3 orders of magnitude slower than CPplan

Results vs. Maxplan

[Plots: runtime in seconds (log scale, 0.1 to 10000) vs. number of steps, comparing MaxPlan and CPplan on SandCastle-67 (10 to 20 steps) and Slippery Gripper (5 to 12 steps).]

CPP as a Special Case of POMDPs

• POMDP: a model for probabilistic planning in partially observable environments
• CPP can be cast as a POMDP in which there are no observations
• Value Iteration, a standard POMDP algorithm, can be used to compute a solution to CPP

Value Iteration: Intuitions

• Value Iteration utilizes a powerful form of state abstraction
  – The value of an i-step plan (for every belief state) is represented compactly by a vector of values, one per state: its value on a belief state is the expectation of these values (see the sketch below)
  – This vector of values is called an α-vector
  – Value Iteration need only consider the set of α-vectors that are optimal for some belief state
  – Plans optimal for some belief state are optimal over an entire region of belief states
  – So regions of belief states are managed collectively by a single plan (α-vector) that is i-step optimal for all belief states in the region
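
A small numeric sketch of the abstraction (illustrative values, not from the paper): each α-vector holds a plan's value in every state, and a belief's value under a plan is a dot product.

```python
import numpy as np

alphas = np.array([[0.9, 0.1],    # plan 1: value in states s0, s1
                   [0.3, 0.8]])   # plan 2
belief = np.array([0.6, 0.4])     # P(s0), P(s1)

values = alphas @ belief          # expected value of each plan on this belief
best = int(np.argmax(values))
print(best, values[best])         # plan index 0, value 0.58
```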

α-vector Abstraction

• The number of α-vectors that need to be considered may grow much more slowly than the number of action sequences
• Slippery Gripper:
  – 1 step to go: 2 α-vectors instead of 4 actions
  – 2 steps to go: 6 instead of 16 action sequences
  – 3 steps to go: 10 instead of 64
  – 10 steps to go: 40 instead of more than 10^6

Results vs. POMDP

[Plots: runtime in seconds (log scale) vs. number of steps, comparing the POMDP value-iteration solver and CPplan on Slippery Gripper (5 to 15 steps) and Grid 10x10 (5 to 13 steps).]

Dynamic Reachability

• POMDP methods evaluate only a small portion of all possible plans, but on all belief states, including those not reachable from the initial belief state
• Combinatorial planners (CPplan, Maxplan) must evaluate all |A|^n plans, but tree search performs dynamic reachability and goal-attainability analysis, so plans are evaluated only on the states reachable at each step
• Example: in Grid 10x10, only 4 states are reachable in 1 step

Conclusion and Future Work

• A new approach to CPP that outperforms previous AI planning techniques (Maxplan, Buridan)
• An analysis of the respective benefits of decision-theoretic and AI planning techniques
• Ways to combine abstraction with dynamic reachability for POMDPs and MDPs
