1001 Ways to Skin a Planning Graph for Heuristic Fun and Profit Subbarao Kambhampati Arizona State University (With tons of

1001 Ways to Skin a Planning Graph for Heuristic Fun and Profit

Subbarao Kambhampati

Arizona State University

http://rakaposhi.eas.asu.edu

(With tons of help from Daniel Bryce, Minh Binh Do, Xuan Long Nguyen, Romeo Sanchez Nigenda, Biplav Srivastava, Terry Zimmerman)

Funding from NSF & NASA


WMD-in-the-toilet: “After the flush, you may find that there were no bombs to begin with”


Planning Graph and Projection

• Envelope of Progression Tree (Relaxed Progression)
  – Proposition lists: Union of states at kth level
  – Mutex: Subsets of literals that cannot be part of any legal state
• Lowerbound reachability information

[Blum&Furst, 1995] [ECP, 1997]

[Figure: a progression tree rooted at state p, branching through actions A1 to A4 into states such as pq, pr, ps, pqr, pqs, pst, shown next to the planning graph whose proposition lists are p, then pqrs, then pqrst.]

Planning Graphs can be used as the basis for heuristics!

And PG Heuristics for all..

– Classical (regression) planning: AltAlt (AAAI 2000; AIJ 2002); AltAltp (JAIR 2003)
  • Serial vs. Parallel graphs; Level and Adjusted heuristics; Partial expansion
– Graphplan style search: GP-HSP (AIPS 2000); PEGG (IJCAI 2003; AAAI 1999)
  • Variable/Value ordering heuristics based on distances
– Partial order planning: RePOP (IJCAI 2001)
  • Mutexes used to detect Indirect Conflicts
– Metric Temporal Planning: Sapa (ECP 2001; AIPS 2002; JAIR 2003)
  • Propagation of cost functions; Phased relaxation
– Conformant Planning: CAltAlt (ICAPS Uncertainty Wkshp, 2003)
  • Multiple graphs; Labelled graphs

Caveat: “All Tempe, All the time”

I. PG Heuristics for State-space (Regression) planners

[AAAI 2000; AIPS 2000; AIJ 2002; JAIR 2003]

Problem: Given a set of subgoals (regressed state) estimate how far they are from the initial state

[Figure: AltAlt architecture. Inputs: Action Templates and Problem Specification (Initial and Goal State). Boxes: Graphplan Graph Extension Phase (based on STAN), Planning Graph, Extraction of Heuristics, Heuristic, Actions in the Last Level, Regression Planner (based on HSP-R).]

Planning Graphs: Optimistic Projection of Achievability

[Figure: planning graph for the Grid Problem (initial and goal state drawn on a grid with axes 0 1 2).
  Prop list Level 0: At(0,0), Key(0,1)
  Action list Level 0: noop, noop, Move(0,0,0,1), Move(0,0,1,0)
  Prop list Level 1: At(0,0), Key(0,1), At(0,1), At(1,0)
  Action list Level 1: noops, Move(0,1,1,1), Move(1,0,1,1), Pick_key(0,1), ...
  Prop list Level 2: At(0,0), Key(0,1), At(0,1), At(1,0), At(1,1), Have_key, ~Key(0,1), ...
  Mutexes are marked with x.]

Cost of a Set of Literals

• lev(p): index of the first level at which p comes into the planning graph
• lev(S): index of the first level where all props in S appear non-mutexed.
  – If there is no such level, then
    • If the graph is grown to level off, then ∞
    • Else k+1 (k is the current length of the graph)
• Serial PG: PG where any pair of non-noop actions are marked mutex

Heuristics: Sum, Set-Level, Partition-k, Adjusted Sum, Combo, Set-Level with memos

• Sum: \( h(S) = \sum_{p \in S} lev(\{p\}) \)
• Set-Level: \( h(S) = lev(S) \) (Admissible)
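The two level-based heuristics can be sketched in a few lines. This is only an illustration: the graph encoding (a list of levels, each a pair of proposition set and mutex-pair set), the function names, and the toy graph are assumptions made for the example, not AltAlt's data structures. "k+1" is read here as one past the index of the last level built.

```python
import math
from itertools import combinations

def lev(S, levels, leveled_off=False):
    """Index of the first level where every prop in S appears and no pair is mutex."""
    S = frozenset(S)
    for k, (props, mutexes) in enumerate(levels):
        if S <= props and not any(frozenset(pair) in mutexes
                                  for pair in combinations(S, 2)):
            return k
    # No such level: infinity on a leveled-off graph, else one past the last level.
    return math.inf if leveled_off else len(levels)

def h_sum(S, levels, leveled_off=False):
    """Sum heuristic: add up the individual levels of the subgoals."""
    return sum(lev({p}, levels, leveled_off) for p in S)

def h_set_level(S, levels, leveled_off=False):
    """Set-level heuristic: level of the whole set, taking mutexes into account."""
    return lev(S, levels, leveled_off)

# Hypothetical three-level graph: (proposition list, mutex pairs) per level.
levels = [
    ({"p"}, set()),
    ({"p", "q", "r"}, {frozenset({"q", "r"})}),
    ({"p", "q", "r", "s"}, set()),
]
goal = {"q", "r", "s"}
print(h_sum(goal, levels), h_set_level(goal, levels))
```

On this toy graph the sum heuristic counts each subgoal separately, while set-level only reports the first level at which all three subgoals coexist without mutexes.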


PROBLEM       Level        Sum          AdjSum2M
Gripper-25    -            69/0.98      67/1.57
Gripper-30    -            81/1.63      77/2.83
Tower-7       127/1.28     127/0.95     127/1.37
Tower-9       511/47.91    511/16.04    511/48.45
8-Puzzle1     31/6.25      39/0.35      31/0.69
8-Puzzle2     30/0.74      34/0.47      30/0.74
Mystery-6     -            -            16/62.5
Mystery-9     8/0.53       8/0.66       8/0.49
Mprime-3      4/1.87       4/1.88       4/1.67
Mprime-4      8/1.83       8/2.34       10/1.49
Aips-grid1    14/1.07      14/1.12      14/0.88
Aips-grid2    -            -            34/95.98

(Entries are plan length / time.)

Adjusting the Sum Heuristic

• Start with Sum heuristic and adjust it to take subgoal interactions into account
  – Negative interactions in terms of “degree of interaction”
  – Positive interactions in terms of co-achievement links
• Ignore negative interactions when accounting for positive interactions (and vice versa)


[AAAI 2000]

\[ H_{AdjSum2M}(S) = length(RelaxedPlan(S)) + \max_{p,q \in S} \delta(p,q) \]

where \( \delta(p,q) = lev(\{p,q\}) - \max\{lev(p), lev(q)\} \) is the degree of negative interaction between p and q.
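A minimal sketch of the adjustment term, assuming the relaxed-plan length is computed elsewhere and passed in, and that `lev` is any callable from a frozenset of propositions to a level. The function names and the level values below are hypothetical.

```python
from itertools import combinations

def delta(p, q, lev):
    """Degree of negative interaction: lev({p,q}) - max(lev(p), lev(q))."""
    return lev(frozenset({p, q})) - max(lev(frozenset({p})), lev(frozenset({q})))

def h_adjsum2m(S, relaxed_plan_length, lev):
    """Relaxed-plan length plus the worst pairwise interaction among the subgoals."""
    worst = max((delta(p, q, lev) for p, q in combinations(sorted(S), 2)), default=0)
    return relaxed_plan_length + worst

# Hypothetical level values: q and r each appear at level 1,
# but are only non-mutex together at level 3.
table = {frozenset({"q"}): 1, frozenset({"r"}): 1, frozenset({"q", "r"}): 3}
lev = table.__getitem__
print(h_adjsum2m({"q", "r"}, relaxed_plan_length=2, lev=lev))
```

Here the pair costs two extra levels beyond its members, so a relaxed plan of length 2 is adjusted to 4.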

Optimizations in Heuristic Computation

• Taming Space/Time costs

• Bi-level Planning Graph representation

• Partial expansion of the PG (stop before level-off)

– It is FINE to cut corners when using PG for heuristics (instead of search)!!

• Branching factor can still be quite high

– Use actions appearing in the PG
  • Select actions in lev(S) vs Levels-off

Heuristic extracted from partial graph vs. leveled graph

[Figure: Time (Seconds, log scale from 0.1 to 100000) over Problems 1 to 161, comparing Levels-off with Lev(S).]

[Figure: Example: Levels off. Proposition levels grow from A to A, B, C, D, E, with mutexes marked x. Labels: Goals C,D are present; Trade-off; Discarded.]

AltAlt Performance

[Figure: Logistics Domain (AIPS-00). Time (Seconds, log scale from 0.01 to 100000) per problem for STAN3.0, HSP2.0, HSP-r and AltAlt1.0.]

[Figure: Schedule Domain (AIPS-00). Time (Seconds, log scale from 0.01 to 100000) over Problems 1 to 161 for STAN3.0, HSP2.0 and AltAlt1.0.]

Problem sets from IPC 2000

Even Parallel Plans aren’t safe..

AltAltp [JAIR 2003]

Serial graph over-estimates
• Use “parallel” rather than serial PG as the basis for heuristics

Projection over sets of actions too costly
• Select the branch with the best action and fatten it
• Use “push-up” to make the partial plans more parallel

[Figure: AltAltp architecture. Inputs: Action Templates and Problem Spec (Init, Goal state). Boxes: Graphplan Plan Extension Phase (based on STAN), Parallel Planning Graph, Extraction of Heuristics, Heuristics, Actions in the Last Level, Node Expansion (Fattening), Node Ordering and Selection, Plan Compression Algorithm (PushUp). Output: Solution Plan.]

[Figure: ZenoTravel AIPS-02. Steps (0 to 60) on Problems 1 to 15 for AltAlt, AltAlt-PostProc and AltAlt-p.]

[Figure: Logistics AIPS-00. Steps (0 to 90) on Problems 1 to 61 for Altalt-p, STAN, TP4, Blackbox and LPG 2nd.]

II. PG heuristics for Graphplan..

PG Heuristics for Graphplan(!)

• Goal/Action Ordering Heuristics for Backward Search
  – Propositions are ordered for consideration in decreasing value of their levels.
  – Actions supporting a proposition are ordered for consideration in increasing values of their costs
    • Cost of an action = 1 + Cost of its set of preconditions
• Use of level heuristics improves the performance significantly.
  – The heuristics are surprisingly insensitive to the length of the planning graph
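The two ordering rules can be sketched as plain sorts. The names are hypothetical, and the cost of a precondition set is taken here to be the maximum level of its members, which is only one of the set-cost measures the slides discuss.

```python
def order_goals(goals, lev):
    """Consider propositions in decreasing order of their levels."""
    return sorted(goals, key=lambda p: lev[p], reverse=True)

def set_cost(props, lev):
    # Assumption for this sketch: cost of a set = highest level among its members.
    return max((lev[p] for p in props), default=0)

def order_supporters(supporters, lev):
    """Consider supporting actions in increasing order of 1 + cost(preconditions).

    supporters maps an action name to its set of preconditions.
    """
    return sorted(supporters, key=lambda a: 1 + set_cost(supporters[a], lev))

# Hypothetical proposition levels and supporting actions.
lev = {"p": 0, "q": 1, "r": 2}
print(order_goals(["q", "r", "p"], lev))
print(order_supporters({"a2": {"q", "r"}, "a1": {"p"}}, lev))
```

The hardest subgoal (highest level) is tried first, and for each subgoal the cheapest supporter is tried first.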


[AIPS 2000]

              MOP
              +3 levels        +5 levels        +10 levels
Problem       L       T        L       T        L       T
bw-large-a    12/12   .007     12/12   .008     12/12   .01
bw-large-b    18/18   0.21     18/18   0.21     18/18   0.25
bw-large-c    28/28   4.13     28/28   4.18     28/28   7.4
huge-fct      18/18   0.01     18/18   0.02     18/18   0.02
bw-prob04     -       >30      -       >30      -       >30
rocket-a      8/29    .006     8/29    .007     8/29    .009
rocket-b      9/32    0.01     9/32    0.01     9/32    0.01

…And then state-space heuristics for Graphplan

(PEGG)

[Figure: Planning Graph (proposition levels) running from the Init State (A, C, E, F, K) at level 0 to the Goal (X, Y, Z). Graphplan's backward search is overlaid as a trace of Regressed ‘states’ (subgoal sets such as X W Q, W T S, W T R, E Y Q, E Y R T, E F R) connected by action assignments (a2, a3, a4, ...).]

1: Capture a state space view of Graphplan’s search in a search trace

No solution? Extend graph…

…And then state-space heuristics for Graphplan

PEGG now competitive with a heuristic state space planner

Times are cpu sec, with (steps/acts) where reported. AltAlt is the Lisp version, run with heuristics adjusum2 and combo.

Problem            Graphplan GP-e   PEGG-so       PEGG             AltAlt adjusum2   AltAlt combo
bw-large-b         13.4 (18/18)     12.2          3.1 (18/18)      87.1 (/18)        20.5 (/28)
bw-large-c         s                1104          66.9 (28/28)     738 (/28)         114.9 (/38)
bw-large-d         s                pe            340 (38/38)      2350 (/36)        *
rocket-ext-a       3.5 (7/36)       2.8 (7/34)    1.1 (7/34)       43.6 (/40)        1.26 (/34)
att-log-a          31.8 (11/79)     2.6 (11/72)   2.2 (11/62)      36.7 (/56)        2.27 (/64)
Gripper-15         s                47.5          16.7 (36/45)     14.1 (/45)        16.98 (/45)
Gripper-20         s                s             110.8 (40/59)    38.2 (/59)        20.92 (/59)
Tower-9            s (511/511)      118           23.6 (511/511)   121 (/511)        *
8puzzle-1          95.2 (31/31)     31.1          9.2 (31/31)      143.7 (/31)       119.5 (/39)
8puzzle-2          87.5 (30/30)     31.3          7.0 (30/30)      348.3 (/30)       50.5 (/48)
AIPS 1998
grid-y-1           16.7 (14/14)     16.8          16.8 (14/14)     739.4 (/14)       640.5 (/14)
mprime-1           4.8 (4/6)        3.6 (4/6)     2.1 (4/6)        722.6 (/4)        79.6 (/4)
AIPS 2002
driverlog-2-3-6b   27.5 (7/20)      1.9           1.9 (7/20)       232

[IJCAI 2003]

In the beginning it was all POP.

Then it was cruelly UnPOPped

The good times return with Re(vived)POP

III. PG Heuristics for PO Planners

POP Algorithm

1. Plan Selection: Select a plan P from the search queue
2. Flaw Selection: Choose a flaw f (open cond or unsafe link)
3. Flaw resolution:
   – If f is an open condition, choose an action S that achieves f
   – If f is an unsafe link, choose promotion or demotion
   – Update P
   – Return NULL if no resolution exists
4. If there is no flaw left, return P

Choice points
• Flaw selection (open condition? unsafe link? Non-backtrack choice)
• Flaw resolution/Plan Selection (how to select (rank) partial plan?)

[Figure: 1. Initial plan: steps S0 and Sinf, with goals g1 and g2 open at Sinf. 2. Plan refinement (flaw selection and resolution): a partial plan with steps S0, S1, S2, S3 and Sinf, conditions p, ~p, q1, g1, g2, and open conditions oc1, oc2.]

PG Heuristics for Partial Order Planning

• Distance heuristics to estimate cost of partially ordered plans (and to select flaws)
  – If we ignore negative interactions, then the set of open conditions can be seen as a regression state
• Mutexes used to detect indirect conflicts in partial plans
  – A step threatens a link if there is a mutex between the link condition and the step’s effect or precondition
  – Post disjunctive precedences and use propagation to simplify

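[Figure: a causal link from Si to Sj on condition p, and a step Sk with conditions q and r; alongside, a partial plan with steps S0 to S5 and Sinf, conditions p, ~p, q, r, q1, g1, g2.]

\[ S_k \prec S_i \;\lor\; S_j \prec S_k \quad \text{if } mutex(p,q) \text{ or } mutex(p,r) \]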
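As an illustration of the indirect-conflict test, the sketch below checks a step against a link condition using planning-graph mutexes and returns the disjunctive precedence to post. The step encoding, the function names, and the mutex set are hypothetical, not RePOP's internals.

```python
def threatens(step, link_condition, is_mutex):
    """A step threatens a link if the link condition is mutex with
    any of the step's effects or preconditions."""
    return any(is_mutex(link_condition, x) for x in step["eff"] | step["pre"])

def disjunctive_precedence(s_i, s_j, s_k):
    """Orderings that resolve a threat by s_k to the link s_i -> s_j:
    either s_k comes before s_i, or s_j comes before s_k."""
    return [(s_k, s_i), (s_j, s_k)]

# Hypothetical mutex information taken from a planning graph.
mutex_pairs = {frozenset({"p", "q"})}

def is_mutex(a, b):
    return frozenset({a, b}) in mutex_pairs

s_k = {"name": "Sk", "pre": {"r"}, "eff": {"q"}}
harmless = {"name": "Sm", "pre": {"r"}, "eff": {"s"}}

print(threatens(s_k, "p", is_mutex))       # Sk's effect q is mutex with p
print(threatens(harmless, "p", is_mutex))
print(disjunctive_precedence("Si", "Sj", "Sk"))
```

A planner would keep both orderings as one disjunctive constraint and let propagation prune whichever becomes inconsistent.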


RePOP’s Performance

Problem        UCPOP    RePOP        Graphplan   AltAlt
Gripper-8      -        1.01         66.82       .43
Gripper-10     -        2.72         47min       1.15
Gripper-20     -        81.86        -           15.42
Rocket-a       -        8.36         75.12       1.02
Rocket-b       -        8.17         77.48       1.29
Logistics-a    -        3.16         306.12      1.59
Logistics-b    -        2.31         262.64      1.18
Logistics-c    -        22.54        -           4.52
Logistics-d    -        91.53        -           20.62
Bw-large-a     45.78    (5.23) -     14.67       4.12
Bw-large-b     -        (18.86) -    122.56      14.14
Bw-large-c     -        (137.84) -   -           116.34

• RePOP implemented on top of UCPOP
  – Dramatically better than any other partial order planner
  – Competitive with Graphplan and AltAlt
  – VHPOP carried the torch at IPC 2002

[IJCAI, 2001]

You see, pop, it is possible to Re-use all the old POP work!

Written in Lisp, runs on Linux, 500MHz, 250MB

IV. PG Heuristics for Metric Temporal Planning

[Figure: Sapa architecture. Planning Problem → Generate start state → Queue of Time-Stamped states → Select state with lowest f-value → Satisfies Goals? If Yes: Partialize the p.c. plan, then Return o.c and p.c plans. If No: Expand state by applying actions, then Heuristic estimation (Build RTPG; Propagate Cost functions; Extract relaxed plan; Adjust for Mutexes; Resources), and back to the queue. f can have both Cost & Makespan components.]

[ECP 2001; AIPS 2002; ICAPS 2003; JAIR 2003]


Multi-Objective Nature of MTP

• Plan quality in Metric Temporal domains is inherently Multi-dimensional
  – Temporal quality (e.g. makespan, slack)
  – Plan cost (e.g. cumulative action cost, resource consumption)
• Necessitates multi-objective search
  – Modeling objective functions
  – Tracking different quality metrics and heuristic estimation
• Challenge: Inter-dependencies between different quality metrics
  – Typically cost will go down with higher makespan…

[Figure: map of Tempe, Phoenix and L.A.]

SAPA’s approach

• Use a temporal version of the Planning Graph (Smith & Weld) structure to track the time-sensitive cost function:
  – Estimation of the earliest time (makespan) to achieve all goals.
  – Estimation of the lowest cost to achieve goals
  – Estimation of the cost to achieve goals given the specific makespan value.
• Use this information to calculate the heuristic value for the objective function involving both time and cost
• Challenge: How to propagate cost over planning graphs?

[Figure: temporal planning graph over Tempe, Phoenix and Los Angeles with time points t = 0, t = 0.5, t = 1, t = 1.5 and t = 10, and actions Drive-car(Tempe,LA), Heli(T,P), Shuttle(T,P) and Airplane(P,LA).]

Search through time-stamped states

\( S = (P, M, \Pi, Q, t) \)

• P: Set <pi,ti> of predicates pi and the time of their last achievement ti < t.
• M: Set of functions representing resource values.
• \( \Pi \): Set of protected persistent conditions (could be binary or resource conds).
• Q: Event queue (contains resource as well as binary fluent events).
• t: Time stamp of S.

[Figure: the action Flying, annotated with (in-city ?airplane ?city1), (fuel ?airplane) > 0, (in-city ?airplane ?city1) (in-city ?airplane ?city2), and consume (fuel ?airplane).]

• Goal Satisfaction: \( S = (P, M, \Pi, Q, t) \models G \) if \( \forall \langle p_i, t_i \rangle \in G \) either:
  – \( \exists \langle p_i, t_j \rangle \in P \), \( t_j < t_i \) and no event in Q deletes pi.
  – \( \exists e \in Q \) that adds pi at time \( t_e < t_i \).
• Action Application: Action A is applicable in S if:
  – All instantaneous preconditions of A are satisfied by P and M.
  – A’s effects do not interfere with \( \Pi \) and Q.
  – No event in Q interferes with persistent preconditions of A.
  – A does not lead to concurrent resource change
• When A is applied to S:
  – P is updated according to A’s instantaneous effects.
  – Persistent preconditions of A are put in \( \Pi \)
  – Delayed effects of A are put in Q.

Search: Pick a state S from the queue. If S satisfies the goals, end. Else non-deterministically do one of
  – Advance the clock (by executing the earliest event in \( Q_S \))
  – Apply one of the applicable actions to S

Propagating Cost Functions

• Shuttle(Tempe,Phx): Cost: $20; Time: 1.0 hour
• Helicopter(Tempe,Phx): Cost: $100; Time: 0.5 hour
• Car(Tempe,LA): Cost: $100; Time: 10 hour
• Airplane(Phx,LA): Cost: $200; Time: 1.0 hour

\[ Cost(At(LA)) = Cost(At(Phx)) + Cost(Flight(Phx,LA)) \]

[Figure: temporal planning graph over Tempe, Phoenix and L.A with actions Drive-car(Tempe,LA), Hel(T,P), Shuttle(T,P) and Airplane(P,LA) at t = 0, 0.5, 1, 1.5, 2.0 and 10, next to the propagated cost function for At(LA) over time: $300 at t = 1.5, $220 at t = 2, $100 at t = 10.]

Issues in Cost Propagation

Costing a set of literals
• \( Cost(f,t) = \min \{ Cost(A,t) : f \in Effect(A) \} \)
• \( Cost(A,t) = Aggregate(Cost(f,t) : f \in Pre(A)) \)
  – Aggregate can be Sum or Max
  – Set-level idea would entail tracking costs of subsets of literals
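The propagation rules above can be sketched on the Tempe/Phoenix/LA example. This is a simplified illustration and not Sapa's implementation: each fact keeps a set of (time, cost) labels, the action's own execution cost is added on top of the aggregated precondition cost (an assumption of this sketch), and propagation runs to a fix-point. All names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset
    eff: frozenset
    cost: float
    duration: float

def cost_at(labels, t):
    """Cheapest known cost of a fact at time t (inf if not yet achievable)."""
    return min((c for (tt, c) in labels if tt <= t), default=math.inf)

def propagate(init, actions, aggregate=sum):
    """Propagate (time, cost) labels until no fact's cost can be improved."""
    frontier = {f: {(0.0, 0.0)} for f in init}
    changed = True
    while changed:
        changed = False
        for a in actions:
            # Candidate start times: moments at which some precondition improves.
            starts = {t for p in a.pre for (t, _) in frontier.get(p, ())}
            for s in sorted(starts):
                pre_costs = [cost_at(frontier.get(p, ()), s) for p in a.pre]
                if math.inf in pre_costs:
                    continue  # some precondition is not achievable by time s
                end, cost = s + a.duration, aggregate(pre_costs) + a.cost
                for f in a.eff:
                    labels = frontier.setdefault(f, set())
                    if cost < cost_at(labels, end):
                        labels.add((end, cost))
                        changed = True
    return frontier

tempe, phx, la = "At(Tempe)", "At(Phx)", "At(LA)"
actions = [
    Action("Shuttle(Tempe,Phx)", frozenset({tempe}), frozenset({phx}), 20, 1.0),
    Action("Helicopter(Tempe,Phx)", frozenset({tempe}), frozenset({phx}), 100, 0.5),
    Action("Car(Tempe,LA)", frozenset({tempe}), frozenset({la}), 100, 10.0),
    Action("Airplane(Phx,LA)", frozenset({phx}), frozenset({la}), 200, 1.0),
]
frontier = propagate({tempe}, actions)
for t in (1.0, 1.5, 2.0, 10.0):
    print(t, cost_at(frontier[la], t))
```

With the action costs and durations from the slide, the propagated cost of At(LA) steps down over time in the same way as the cost function drawn in the figure.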

Termination Criteria

• Deadline Termination: Terminate at time point t if:
  – \( \forall \) goal G: \( Deadline(G) \le t \), or
  – \( \exists \) goal G: \( (Deadline(G) < t) \wedge (Cost(G,t) = \infty) \)
• Fix-point Termination: Terminate at time point t where we can not improve the cost of any proposition.
• K-lookahead approximation: At t where \( Cost(g,t) < \infty \), repeat the process of applying (set) of actions that can improve the cost functions k times.


Heuristics based on cost functions

Direct
• If we want to minimize makespan:
  – \( h = t_0 \)
• If we want to minimize cost:
  – \( h = CostAggregate(G, t_\infty) \)
• If we want to minimize a function f(time,cost) of cost and makespan:
  – \( h = \min f(t, Cost(G,t)) \) s.t. \( t_0 \le t \le t_\infty \)
  – E.g. f(time,cost) = 100·makespan + Cost, then h = 100×2 + 220 at \( t_0 \le t = 2 \le t_\infty \)

[Figure: Cost(At(LA)) as a step function of time: $300 at t0 = 1.5 (Time of Earliest achievement), $220 at t = 2, $100 at t∞ = 10 (Time of lowest cost).]

Using Relaxed Plan
• Extract a relaxed plan using h as the bias
  – If the objective function is f(time,cost), then action A (to be added to RP) is selected such that \( f(t(RP+A), C(RP+A)) + f(t(G_{new}), C(G_{new})) \) is minimal, where \( G_{new} = (G \cup Precond(A)) \setminus Effects(A) \)
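The direct estimates can be checked on the Cost(At(LA)) example. The sketch assumes the cost function is given as a list of (time, cost) breakpoints and that f does not decrease as time grows, so only the breakpoints need to be examined. The function name is hypothetical.

```python
def direct_heuristic(cost_fn, f):
    """Minimize f(t, Cost(G, t)) over the breakpoints of a step cost function.

    cost_fn: list of (time, cost) pairs sorted by time, costs decreasing.
    Returns (best objective value, time at which it is reached).
    """
    return min((f(t, c), t) for t, c in cost_fn)

# Cost(At(LA)) from the slide: $300 at t0 = 1.5, $220 at t = 2, $100 at t = 10.
cost_at_la = [(1.5, 300), (2, 220), (10, 100)]

h_makespan = cost_at_la[0][0]     # minimize makespan: h = t0
h_cost = cost_at_la[-1][1]        # minimize cost: cost at the time of lowest cost
h_mixed, t_best = direct_heuristic(cost_at_la, lambda t, c: 100 * t + c)

print(h_makespan, h_cost, h_mixed, t_best)
```

For f = 100·makespan + Cost the minimum is reached at t = 2, matching the slide's h = 100×2 + 220.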

Phased Relaxation

The relaxed plan can be adjusted to take into account constraints that were originally ignored

• Adjusting for Mutexes: Adjust the make-span estimate of the relaxed plan by marking actions that are mutex (and thus cannot be executed concurrently)
• Adjusting for Resource Interactions: Estimate the number of additional resource-producing actions needed to make up for any resource short-fall in the relaxed plan

\[ C = C + \sum_R \frac{Con(R) - (Init(R) + Pro(R))}{\Delta_R} \cdot C(A_R) \]
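The resource adjustment is simple arithmetic, sketched below. The dictionary keys, the function name, and the numbers are hypothetical, and the sketch assumes that only a positive short-fall adds cost.

```python
def adjust_cost_for_resources(cost, resources):
    """Add the cost of the extra resource-producing actions needed to cover
    each resource's short-fall in the relaxed plan."""
    for r in resources:
        shortfall = r["consumed"] - (r["initial"] + r["produced"])
        if shortfall > 0:
            # shortfall / delta = number of extra producing actions needed
            cost += shortfall / r["delta"] * r["producer_cost"]
    return cost

# Hypothetical fuel resource: the relaxed plan consumes 50 units, 20 are
# available initially and 10 are produced; one refuel adds 10 units and costs 5.
fuel = {"consumed": 50, "initial": 20, "produced": 10, "delta": 10, "producer_cost": 5}
print(adjust_cost_for_resources(100, [fuel]))
```

In this example the short-fall of 20 units needs two extra refuel actions, so the relaxed plan's cost grows by 10.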

Handling Cost/Makespan Tradeoffs

Results over 20 randomly generated temporal logistics problems involving moving 4 packages between different locations in 3 cities:

\[ O = f(time, cost) = \alpha \cdot Makespan + (1 - \alpha) \cdot TotalCost \]

[Figure: two plots against Alpha (0.1 to 1): Cost variation (Total Cost, 0 to 60) and Makespan variation.]

SAPA at IPC-2002

[Figures: IPC-2002 results for Rover (time setting) and Satellite (complex setting).]

[JAIR 2003]


V. PG Heuristics for Conformant Planning

[Figure: CAltAlt architecture. Components: IPC PDDL Parser, A* Search Engine (HSP-r), Clausal States, Heuristics, Planning Graph(s) (IPP), Labels (CUDD), Model Checker (NuSMV); each is marked Off-The-Shelf or Custom. Arrow labels: Input for, Searches, Guided By, Extracted From, Condense, Validates.]

Conformant Planning as Regression

Actions:
  A1: M & P => K
  A2: M & Q => K
  A3: M & R => L
  A4: K => G
  A5: L => G

Initially: (P V Q V R) & (~P V ~Q) & (~P V ~R) & (~Q V ~R) & M

Goal State: G

Regression from the goal:
  G
  after A4: (G V K)    (G or K must be true before A4 for G to be true after A4)
  after A5: (G V K V L)
  after A1: (G V K V L V P) & M
  after A2: (G V K V L V P V Q) & M
  after A3: (G V K V L V P V Q V R) & M

Each Clause is Satisfied by a Clause in the Initial Clausal State -- Done! (5 actions)

Using a Single, Unioned Graph

Union literals from all initial states into a conjunctive initial graph level

• Easy to implement
• Not effective
• Lose world specific support information
• Incorrect mutexes

[Figure: the initial worlds PM, QM and RM are unioned into one initial level {P, Q, R, M}; A1, A2, A3 then add K and L, and A4, A5 add G. The support extracted for G is A4 from K and A1 from P, M. Heuristic Estimate = 2.]

Using Multiple Graphs

• Accurate Mutexes
• Moderate Implementation Difficulty
• Memory Intensive
• Heuristic Computation Can be costly

[Figure: one planning graph per initial world. PM: A1 adds K, then A4 adds G. QM: A2 adds K, then A4 adds G. RM: A3 adds L, then A5 adds G. Support for G is extracted from each graph separately.]

Unioning these graphs a priori would give much savings …

Using a Single, Labeled Graph

• Action Labels: Conjunction of Labels of Supporting Literals
• Literal Labels: Disjunction of Labels of Supporting Actions
• Label of a literal signifies the set of worlds in which it is supported
  – Full support means all init worlds

• Memory Efficient
• Cheap Heuristics
• Scalable
• Extensible
• Tricky to Implement
• Benefits from BDD’s and a model checker

ATMS

Label Key: ~Q & ~R; ~P & ~R; ~P & ~Q; (~P & ~R) V (~Q & ~R); (~P & ~R) V (~Q & ~R) V (~P & ~Q); M; True

[Figure: a single labeled planning graph over the initial worlds PM, QM and RM, with literals P, Q, R, M, K, L, G and actions A1 to A5 annotated with their labels. Heuristic Value = 5.]
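Label propagation can be sketched with plain sets of worlds standing in for the BDD labels: conjunction becomes intersection and disjunction becomes union. The world names wP, wQ, wR and the function names are hypothetical. The actions are A1 to A5 from the conformant regression example, with M true in every initial world.

```python
from functools import reduce

def action_label(pre, lit_label, all_worlds):
    """Conjunction (intersection) of the labels of the supporting literals."""
    return reduce(frozenset.intersection,
                  (lit_label.get(p, frozenset()) for p in pre), all_worlds)

def next_level(lit_label, actions, all_worlds):
    """Literal label = disjunction (union) of the labels of supporting actions.
    Copying the old labels plays the role of the noops."""
    new = dict(lit_label)
    for pre, eff in actions.values():
        label = action_label(pre, lit_label, all_worlds)
        if not label:
            continue  # action is not supported in any world yet
        for e in eff:
            new[e] = new.get(e, frozenset()) | label
    return new

worlds = frozenset({"wP", "wQ", "wR"})
level0 = {"P": frozenset({"wP"}), "Q": frozenset({"wQ"}),
          "R": frozenset({"wR"}), "M": worlds}
actions = {
    "A1": ({"M", "P"}, {"K"}),
    "A2": ({"M", "Q"}, {"K"}),
    "A3": ({"M", "R"}, {"L"}),
    "A4": ({"K"}, {"G"}),
    "A5": ({"L"}, {"G"}),
}
level1 = next_level(level0, actions, worlds)
level2 = next_level(level1, actions, worlds)
print(sorted(level1["K"]), sorted(level1["L"]), level2["G"] == worlds)
```

G first gets full support (a label covering all initial worlds) at the second level, which is the kind of information a level-based label heuristic reads off the graph.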

CAltAlt Performance

• Label-graph based heuristics make CAltAlt competitive with the current best approaches

[Figure: Rovers Domain. Time (ms, log scale from 1 to 1000000) on Problems 1 to 4 for Single Sum, Multi Level, Multi RP, Union, Label Level, Label RP, CGP, HSCP, GPT and KACMBP.]

[Figure: Logistics. Plan Length (0 to 30) on Problems 1 to 4 for Label RP, CGP, HSCP, GPT and KACMBP.]

The Damage until now..



– Graphplan style search: GP-HSP (AIPS 2000); PEGG (IJCAI 2003; AAAI 1999)

• Multiple graphs; Labelled graphs

Still to come: PG Heuristics for:

• Probabilistic Conformant Planning
• Conditional Planning
• Lifted Planning
• Trans-Atlantic camaraderie
• Post-war reconstruction
• Middle-east peace…

Meanwhile outside Tempe…

• Hoffmann’s FF uses relaxed plans from PG
• Geffner & Haslum derive DP-versions of PG-heuristics
• Gerevini & Serina’s LPG uses PG heuristics to cost the various repairs
• Smith back-propagates (convolves) probability distributions over PG to decide the contingencies worth focusing on
• Trinquart proposes a PG-clone that directly computes reachability in plan-space…
• …

Why do we love PG Heuristics?

• They work!
• They are “forgiving”
  – You don't like doing mutex? okay
  – You don't like growing the graph all the way? okay.
• Allow propagation of many types of information
  – Level, subgoal interaction, time, cost, world support,
• Support phased relaxation
  – E.g. Ignore mutexes and resources and bring them back later…
• Graph structure supports other synergistic uses
  – e.g. action selection
• Versatility…

Versatility of PG Heuristics

• PG Variations
  – Serial
  – Parallel
  – Temporal
  – Labelled
• Propagation Methods
  – Level
  – Mutex
  – Cost
  – Label
• Planning Problems
  – Classical
  – Resource/Temporal
  – Conformant
• Planners
  – Regression
  – Progression
  – Partial Order
  – Graphplan-style
