3/4 The slides on quotienting were added after the class to reflect the white-board discussion in the class


Thoughts on Candidate Set semantics for Temporal Planning

Doing Temporal Planning Correctly [In Search of a Complete Position-Constrained Planner]

Need

Talking about “complete” and “completely optimal” seems to make little sense unless we first define the space over which we want completeness

Qn: What is the space over which candidate set of a temporal plan is defined? For classical planning, we know it is over “action sequences”

Interestingly, even partial-order planners are essentially aiming for completeness over these action sequences

Dispatches as candidates

We can define candidate sets in terms of “dispatches”. A dispatch is a set of 3-tuples { <a, sa, ea> } where

a is a ground (durative) action
sa is the start time for the action a
ea is the end time for the action a

For fixed-duration actions, ea is determined given sa

Completeness, optimality, etc. should ultimately be defined over these dispatches.

Quotient spaces

The space of dispatches is “dense” when you have real-valued time points

It is more convenient to think of search in terms of quotient spaces defined over the space of dispatches

In fact, it seems necessary that we search in quotient spaces for temporal planning (especially with real-valued time), since we want the complexity of planning to be somehow related to the number of actions in the plan, and not to their durations(?)

A quotient space essentially involves setting up disjoint equivalence classes over the base space

SNLP’s partial plans actually set up a quotient space over the ground operator sequences (otherwise, the space of partially ordered plans would be much larger than the space of sequences..)

There are multiple ways of setting up quotient spaces over dispatches

You can discuss completeness of any planner w.r.t. any legal quotient space. But.. some quotient spaces may be more natural for discussing some planners…

Start/End point permutations (SEPP)

One quotient space over dispatches is to consider the space of permutations over the start and end points of actions

Specifically, we consider the space of sequences over the alphabet {as, ae} over all actions a, where:
If the sequence contains as, it must contain ae (and vice versa)
as must come before ae in the sequence
If the sequence contains end points of two actions a1 and a2, then their order must not violate the durations of the actions
If d(a1) < d(a2), then we can’t have ..a1s…a2s…a2e..a1e.. in the sequence

Note that each element of SEPP space is a representative for a possibly infinite number of dispatches

Completeness over the SEPP space is a necessary condition for completeness over dispatch space
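A minimal sketch of how a dispatch maps to its start/end-point sequence, to make the "one representative, many dispatches" point concrete. The function name `endpoint_sequence`, the `_s`/`_e` suffixes and the two sample dispatches are illustrative assumptions, and ties between time points are ignored.

```python
from typing import List, Tuple

# A dispatch is a list of (action, start time, end time) tuples.
Dispatch = List[Tuple[str, float, float]]

def endpoint_sequence(dispatch: Dispatch) -> List[str]:
    """Map a dispatch to the ordered sequence of its start and end points."""
    events = []
    for action, start, end in dispatch:
        if not start < end:
            raise ValueError(f"{action}: start must precede end")
        events.append((start, action + "_s"))
        events.append((end, action + "_e"))
    # Sketch assumption: all time points are distinct, so sorting is unambiguous.
    events.sort(key=lambda e: e[0])
    return [label for _, label in events]

# Two different dispatches of the same fixed-duration actions (a1: 4, a2: 2).
d1 = [("a1", 0.0, 4.0), ("a2", 1.0, 3.0)]
d2 = [("a1", 10.0, 14.0), ("a2", 11.5, 13.5)]
seq1 = endpoint_sequence(d1)
seq2 = endpoint_sequence(d2)
```

Both dispatches fall into the same equivalence class, since `seq1` and `seq2` are the same sequence even though every time stamp differs.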

POP space

The space of partially ordered causal link plans that VHPOP/Zeno search in should be seen as quotienting further over the SEPP space

Similar to the way SNLP plans can be seen as quotienting over the action sequences.

SAPA-space?

Another way of setting up a quotient space over dispatches is to consider specific dispatches themselves as the prototypes of an equivalence class of dispatches

Prototype-based quotient spaces

SAPA seems to be easiest to understand in terms of associating a specific dispatch as the representative of a set of dispatches
It then only searches over these dispatches
..so it will be incomplete if the optimal solution of a problem is not in the space of these canonical dispatches

The basic result of [Cushing et al 2007] can be understood as saying that there is no easy way to set up a finite set of representative dispatches that will be complete for all problems
This, I believe, is the lesson of the failed quest for complete DEP planners

Left-shifted plans as representatives?

Quotient Space & Navigation??

Sapa can be understood as
Trying to navigate in a quotient space of left-shifted dispatches
But with an incomplete navigational strategy

Navigation is being effected through epochs
Our inability to find a good epoch-based navigation seems to suggest that there is no natural way to navigate this space?

Left-shifted plans

Two plans are equivalent if they have the same happening sequence

The canonical representation

Mid-term Feedback..

9 out of 12 gave feedback. I will post them all un-edited.

People are generally happy (perhaps embarrassingly happy) with the way the class is going
One person said it is all too overwhelming and the pace and coverage should be reduced significantly

Readings: A mixture of reading before and after.

Homeworks: Majority seem happy that they force them to re-read the paper.
There seems to be little support for “more” homework
One person said they should be more challenging and go beyond readings.

Semester project: Majority seem to be getting started; and want to spend time on “their” project rather than homeworks etc.

Interactivity: People think there is enough discussion (I beg to disagree, but I am just an instructor).
One person thought that there should be more discussion, and suggested design of more incentives for discussion (sort of like the blog discussion requirement)

Temporal Constraints

Temporal Constraints

Qualitative
Interval constraints (and algebra)
Point constraints (and algebra)

Metric constraints
Best seen as putting distance ranges over time points

General temporal constraint reasoning is NP-hard. Tractable subclasses exist.

• Hybrid: allow qualitative and quantitative constraints

Most temporal constraint formalisms model only binary constraints

Tradeoffs: Progression/Regression/PO Planning for metric/temporal planning

Compared to PO, both progression and regression do a less than complete job of handling concurrency (e.g. slacks may have to be handled through post-processing).

Progression planners have the advantage that the exact amount of a resource is known at any given state. So, complex resource constraints are easier to verify. PO (and to some extent regression), will have to verify this by posting and then verifying resource constraints.

Currently, SAPA (a progression planner) does better than TP4 (a regression planner). Both do oodles better than Zeno/IxTET.
However, TP4 could possibly be improved significantly by giving up the insistence on admissible heuristics
Zeno (and IxTET) could benefit by adapting ideas from RePOP.

Interleaving-Space: TEMPO

Delay dispatch decisions until afterwards

Choose:
Start an action
End an action
Make a scheduling decision

Solve temporal constraints

Temporally Simple: Complete, Optimal
Temporally Expressive: Complete, Optimal

Salvaging State-space Temporal Planning

[Figure: example timelines built from the labels light, match, fix and fuse]

Qualitative Temporal Constraints (Allen 83)

x before y          y after x
x meets y           y met-by x
x overlaps y        y overlapped-by x
x during y          y contains x
x starts y          y started-by x
x finishes y        y finished-by x
x equals y          y equals x

[Figure: X and Y drawn as bars illustrating each relation]

Intervals can be handled directly

The 13 relations on the previous page are primitive. The relation between a pair of intervals may well be a disjunction of these primitive ones: A meets B OR A starts B

There are “transitive” axioms for computing the relation between A and C, given the relations between A and B & B and C
A meets B & B starts C => A meets C
A starts B & B during C => ~ [C before A]
Using these axioms, we can do constraint propagation directly on interval relations, to find the tightest relation between any given pair of intervals (as well as to check the consistency of a set of relations)

This is Allen’s Interval Algebra

Intervals can also be handled in terms of their start and end points. This latter is what we will see next.

Qualitative Temporal Constraints May Be Expressed as Inequalities

(Vilain, Kautz 86)

x before y      X+ < Y-
x meets y       X+ = Y-
x overlaps y    (X- < Y-) & (Y- < X+) & (X+ < Y+)
x during y      (Y- < X-) & (X+ < Y+)
x starts y      (X- = Y-) & (X+ < Y+)
x finishes y    (Y- < X-) & (X+ = Y+)
x equals y      (X- = Y-) & (X+ = Y+)

Inequalities may be expressed as binary interval relations: X+ < Y- becomes X+ - Y- in [-inf, 0]
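The endpoint view can be sketched as a small classifier that returns the primitive Allen relation between two intervals. This is an illustration only: the function name `allen_relation` is hypothetical, intervals are assumed to be `(start, end)` pairs with `start < end`, and it uses the usual convention that "x finishes y" means x starts after y and ends together with it.

```python
def allen_relation(x, y):
    """Return the primitive Allen relation of interval x with respect to y."""
    xs, xe = x
    ys, ye = y
    # Disjoint or touching intervals.
    if xe < ys:
        return "before"
    if xe == ys:
        return "meets"
    if ye < xs:
        return "after"
    if ye == xs:
        return "met-by"
    # Intervals sharing at least one endpoint.
    if xs == ys and xe == ye:
        return "equals"
    if xs == ys:
        return "starts" if xe < ye else "started-by"
    if xe == ye:
        return "finishes" if ys < xs else "finished-by"
    # Proper containment.
    if ys < xs and xe < ye:
        return "during"
    if xs < ys and ye < xe:
        return "contains"
    # Only partial overlap remains.
    return "overlaps" if xs < ys else "overlapped-by"

examples = [
    allen_relation((0, 2), (2, 5)),
    allen_relation((1, 3), (2, 5)),
    allen_relation((3, 5), (1, 5)),
    allen_relation((2, 3), (1, 5)),
]
```

Exactly one of the 13 primitive relations is returned for any pair of proper intervals, which is what makes them usable as the base alphabet for disjunctive interval constraints.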

Metric Constraints

Going to the store takes at least 10 minutes and at most 30 minutes.
→ 10 < [T+(store) – T-(store)] < 30

Bread should be eaten within a day of baking.
→ 0 < [T-(eating) – T+(baking)] < 1 day

Inequalities, X+ < Y-, may be expressed as binary interval relations:
→ -inf < [X+ - Y-] < 0

Metric Time: Quantitative Temporal Constraint Networks

(Dechter, Meiri, Pearl 91)

A set of time points Xi at which events occur.

Unary constraints

(a0 < Xi < b0 ) or (a1 < Xi < b1 ) or . . .

Binary constraints

(a0 < Xj - Xi < b0 ) or (a1 < Xj - Xi < b1 ) or . . .

Not n-ary constraints

STN (simple temporal network) is a TCN that has no disjunctive constraints (each constraint has one interval)

TCSPs Are Visualized Using Directed Constraint Graphs

[Figure: directed constraint graph over time points 0-4 with edge labels [10,20], [30,40][60,inf], [10,20], [20,30][40,50] and [60,70]]

TCSPs vs CSPs

TCSP is a subclass of CSPs with some important properties
The domains of the variables are totally ordered
The domains of the variables are continuous

Most queries on TCSPs would involve reasoning over all solutions of a TCSP (e.g. earliest/latest feasible time of a temporal variable)

Since there are potentially an infinite number of solutions to a TCSP, we need to find a way of representing the set of all solutions compactly

Minimal TCSP network is such a representation

TCSP Queries (Dechter, Meiri, Pearl, AIJ91)

Is the TCSP consistent? (Planning)
What are the feasible times for each Xi?
What are the feasible durations between each Xi and Xj?
What is a consistent set of times? (Scheduling)

Dispatch

What are the earliest possible times? (Scheduling)
What are the latest possible times?

All of these can be done if we compute the minimal equivalent network

Constraint Tightness & Minimal Networks

A TCSP N1 is considered a minimal network if there is no other network N2 that has the same solutions as N1, and has at least one tighter constraint than N1

Tightness means there are fewer valid composite labels for the variables. This has nothing to do with the “syntactic complexity” of the constraint
A constraint a[1 3]b is tighter than a constraint a[0 10]b
A constraint a[1 1.5][1.6 1.9][1.9 2.3][2.3 4.8][5 6]b is tighter than a constraint a[0 10]b

Computation of minimal networks, in general, involves doing two operations: intersection over constraints and composition over constraints

For each path p in the network, connecting a pair of nodes a and b, find the path constraint between a and b (using composition)

Intersect all the constraints between a pair of nodes a and b to find the tightest constraint between a and b

Can lead to “fragmentation of constraints” in the case of disjunctive TCSPs…

Union/Composition/Intersection of Temporal Constraints

Operations on Constraints: Intersection and Composition

[Figure: the directed constraint graph over time points 0-4 from the earlier slide]

Compose [10,20] with [30,40][60,inf] to get constraint between 0 and 3

An example where minimal network is different from the original one.

[Figure: chain of nodes 0, 1, 3 with edges [10,20] and [30,40], plus a direct edge [0,100] between 0 and 3 that the minimal network tightens to [40,60]]

To compute the constraint between 0 and 3, we first compose [10,20] and [30,40] to get [40,60]; we then intersect [40,60] and [0,100] to get [40,60]
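The two operations can be sketched on constraints represented as lists of `(lo, hi)` intervals. The helper names `compose`, `intersect` and `merge` are illustrative assumptions, not notation from the slides.

```python
import math

def merge(intervals):
    """Sort intervals and fuse any that overlap or touch."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def compose(c1, c2):
    """Path constraint: add every interval of c1 to every interval of c2."""
    return merge([(a1 + a2, b1 + b2) for a1, b1 in c1 for a2, b2 in c2])

def intersect(c1, c2):
    """Keep only the distances allowed by both constraints."""
    out = []
    for a1, b1 in c1:
        for a2, b2 in c2:
            lo, hi = max(a1, a2), min(b1, b2)
            if lo <= hi:
                out.append((lo, hi))
    return merge(out)

# The chain example: compose [10,20] with [30,40], then intersect with [0,100].
path = compose([(10, 20)], [(30, 40)])
tight = intersect(path, [(0, 100)])

# Composing with a disjunctive constraint yields a disjunctive result.
disjunctive = compose([(10, 20)], [(30, 40), (60, math.inf)])
```

Here `tight` is `[(40, 60)]`, matching the slide, and `disjunctive` comes out as two separate intervals, a small instance of the fragmentation that disjunctive TCSPs suffer from.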

Computing Minimal Networks Using Path Consistency

Minimal networks for TCSPs can be computed by ensuring “path consistency”

For each triple of vertices i, j, k:
C(i,k) := C(i,k) .intersection. [C(i,j) .compose. C(j,k)]

For STPs we are guaranteed to reach fixpoint by the time we visit each constraint once, i.e., the outer loop executes only once.

For disjunctive TCSPs, enforcing path consistency is NP-hard
Shouldn’t be surprising… consistency of disjunctive precedence constraints is NP-hard
“Fragmentation” happens
Approximation schemes possible

Solving Disjunctive TCSPs: Split disjunction

Suppose we have a TCSP, where just one of the constraints is disjunctive: a [1 2][5 6] b

We have two STPs: one in which the constraint a[1 2]b is there, and the other contains a[5 6]b

Disjunctive TCSPs can be solved by solving the exponential number of STPs
Minimal network for DTP is the union of minimal networks for the STPs

This is a brute-force method; exponential number of STPs, many of which have significant overlapping constraints.
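A sketch of the brute-force split: enumerate one STP per choice of disjunct and test each for consistency with a negative-cycle check. The edge directions in `tcsp` are an assumed reading of the constraint graph figure, and the names `component_stps` and `stn_consistent` are hypothetical.

```python
import itertools
import math

def stn_consistent(n, edges):
    """edges: {(i, j): (lo, hi)} meaning lo <= Xj - Xi <= hi."""
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), (lo, hi) in edges.items():
        d[i][j] = min(d[i][j], hi)    # Xj - Xi <= hi
        d[j][i] = min(d[j][i], -lo)   # Xi - Xj <= -lo
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # A negative diagonal entry signals a negative cycle.
    return all(d[i][i] >= 0 for i in range(n))

def component_stps(tcsp):
    """Yield one STP (a single interval per edge) for each choice of disjuncts."""
    keys = list(tcsp)
    for choice in itertools.product(*(tcsp[k] for k in keys)):
        yield dict(zip(keys, choice))

# Assumed reading of the slide's TCSP; two edges are disjunctive.
tcsp = {
    (0, 1): [(10, 20)],
    (1, 2): [(30, 40), (60, math.inf)],
    (3, 2): [(10, 20)],
    (3, 4): [(20, 30), (40, 50)],
    (0, 4): [(60, 70)],
}
stps = list(component_stps(tcsp))
consistent = [s for s in stps if stn_consistent(5, s)]
```

Two disjunctive edges with two disjuncts each give four STPs; the number doubles with every further binary disjunction, which is the exponential blow-up the slide warns about.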

To Query an STN, Map to a Distance Graph Gd = < V, Ed >

[Figure: the STN over nodes 0-4 with edges [10,20], [30,40], [10,20], [40,50], [60,70], next to its distance graph with edge weights 20, -10, 40, -30, 20, -10, 50, -40, 70, -60]

$T_{ij} = (a_{ij} \le X_j - X_i \le b_{ij})$ becomes two edges:
$X_j - X_i \le b_{ij}$
$X_i - X_j \le -a_{ij}$

Edge encodes an upper bound on distance to target from source.

Conjoined Paths are Computed using All Pairs Shortest Path

(e.g., Floyd-Warshall’s algorithm )

for i := 1 to n do d_ii ← 0;
for i, j := 1 to n do d_ij ← a_ij;
for k := 1 to n do
    for i, j := 1 to n do
        d_ij ← min{d_ij, d_ik + d_kj};
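The all-pairs shortest path computation can be sketched in a few lines. The edge list `STN` is an assumed reading of the slide's figure (it does reproduce the d-graph table shown on the slides), and the function name `d_graph` is illustrative.

```python
import math

# Assumed reading of the slide's STN: (i, j, lo, hi) means lo <= Xj - Xi <= hi.
STN = [(0, 1, 10, 20), (1, 2, 30, 40), (3, 2, 10, 20), (3, 4, 40, 50), (0, 4, 60, 70)]

def d_graph(n, stn):
    """Build the distance graph and close it with Floyd-Warshall."""
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, lo, hi in stn:
        d[i][j] = min(d[i][j], hi)    # edge i -> j: Xj - Xi <= hi
        d[j][i] = min(d[j][i], -lo)   # edge j -> i: Xi - Xj <= -lo
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

D = d_graph(5, STN)
# The STN is consistent when no diagonal entry has gone negative.
consistent = all(D[i][i] >= 0 for i in range(5))
# Minimal network entry (i, j) is the interval [-d_ji, d_ij].
minimal = [[(-D[j][i], D[i][j]) for j in range(5)] for i in range(5)]
```

Row 0 of `D` gives the latest times of each node relative to node 0, and the negated column 0 gives the earliest times.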

0 1 2 3 4

0 0 20 50 30 70

1 -10 0 40 20 60

2 -40 -30 0 -10 30

3 -20 -10 20 0 50

4 -60 -50 -20 -40 0

d-graph

Shortest Paths of Gd


STN Minimum Network

0 1 2 3 4

0 [0] [10,20] [40,50] [20,30] [60,70]

1 [-20,-10] [0] [30,40] [10,20] [50,60]

2 [-50,-40] [-40,-30] [0] [-20,-10] [20,30]

3 [-30,-20] [-20,-10] [10,20] [0] [40,50]

4 [-70,-60] [-60,-50] [-30,-20] [-50,-40] [0]

Each entry (i, j) of the STN minimum network is the interval [-d_ji, d_ij] read off the d-graph.

Testing Plan Consistency

The plan is consistent iff the d-graph has no negative cycles. A negative cycle, say of weight -5 through a time point TA, would require -5 >= TA – TA = 0, which is impossible.

Latest Solution

Node 0 is the reference. The latest solution is read from row 0 of the d-graph above: (d01, . . . , d0n) = (20, 50, 30, 70)

Earliest Solution

Node 0 is the reference. The earliest solution is read from column 0 of the d-graph above, negated.

Solution: Earliest Times

S1 = (-d10, . . . , -dn0) = (10, 40, 20, 60)

Scheduling:Feasible Values

• X1 in [10, 20]
• X2 in [40, 50]
• X3 in [20, 30]
• X4 in [60, 70]

The lower bounds are the earliest times and the upper bounds are the latest times (both read off the d-graph above).

Solution by Decomposition

Using the d-graph above:

• Select value for 1: 15
• Select value for 2, consistent with 1: 45
• Select value for 3, consistent with 1 & 2: 30
• Select value for 4, consistent with 1, 2 & 3

O(N^2)
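A sketch of backtrack-free assignment from the minimal d-graph: each new time point only has to respect the window induced by the points already fixed. The function name `assign_by_decomposition` and the `prefer` parameter are illustrative assumptions; the matrix is the d-graph from the slides.

```python
# d-graph from the slides: D[i][j] is the upper bound on Xj - Xi.
D = [
    [0, 20, 50, 30, 70],
    [-10, 0, 40, 20, 60],
    [-40, -30, 0, -10, 30],
    [-20, -10, 20, 0, 50],
    [-60, -50, -20, -40, 0],
]

def assign_by_decomposition(d, order, prefer=None):
    """Assign times one node at a time; node 0 is the reference at time 0."""
    prefer = prefer or {}
    times = {0: 0}
    for i in order:
        lo = max(t - d[i][j] for j, t in times.items())  # Xi >= Xj - d[i][j]
        hi = min(t + d[j][i] for j, t in times.items())  # Xi <= Xj + d[j][i]
        value = prefer.get(i, lo)  # default to the earliest feasible time
        if not lo <= value <= hi:
            raise ValueError(f"value {value} for node {i} outside [{lo}, {hi}]")
        times[i] = value
    return times

# The choices made on the slide (15, 45, 30); node 4 is left to the procedure.
slide_schedule = assign_by_decomposition(D, [1, 2, 3, 4], prefer={1: 15, 2: 45, 3: 30})
# With no preferences, the earliest feasible value is taken at every step.
earliest_schedule = assign_by_decomposition(D, [1, 2, 3, 4])
```

Each step scans the nodes fixed so far, so assigning all N nodes costs O(N^2), and because the network is minimal no choice ever has to be undone.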

10/30 (Don’t print hidden slides)

Multi-objective search

Multi-dimensional nature of plan quality in metric temporal planning:
Temporal quality (e.g. makespan; slack, i.e. the time when a goal is needed minus the time when it is achieved)
Plan cost (e.g. cumulative action cost, resource consumption)

Necessitates multi-objective optimization:
Modeling objective functions
Tracking different quality metrics and heuristic estimation

Challenge: There may be inter-dependent relations between different quality metrics

Example

Option 1: Tempe → Phoenix (Bus) → Los Angeles (Airplane)
Less time: 3 hours; More expensive: $200

Option 2: Tempe → Los Angeles (Car)
More time: 12 hours; Less expensive: $50

Given a deadline constraint (6 hours), only option 1 is viable
Given a money constraint ($100), only option 2 is viable

[Figure: map of Tempe, Phoenix and Los Angeles]

Solution Quality in the presence of multiple objectives

When we have multiple objectives, it is not clear how to define global optimum

E.g. How does a <cost:5,Makespan:7> plan compare to a <cost:4,Makespan:9> plan?

Problem: We don’t know what the user’s utility metric is as a function of cost and makespan.

Solution 1: Pareto Sets

Present pareto sets/curves to the user

A pareto set is a set of non-dominated solutions
A solution S1 is dominated by another S2, if S1 is worse than S2 in at least one objective and equal or worse in all other objectives. E.g. <C:5,M:9> is dominated by <C:4,M:9>
A travel agent shouldn’t bother asking whether I would like a flight that leaves at 6pm and arrives at 9pm, and costs $100, or another one which also leaves at 6 and arrives at 9, but costs $200.

A pareto set is exhaustive if it contains all non-dominated solutions

Presenting the pareto set allows the users to state their preferences implicitly by choosing what they like rather than by stating them explicitly.

Problem: Exhaustive Pareto sets can be large (exponentially large in many cases).
In practice, travel agents give you non-exhaustive pareto sets, just so you have the illusion of choice

Optimizing with pareto sets changes the nature of the problem: you are looking for multiple solutions rather than a single one.
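A minimal sketch of filtering a set of plans down to its pareto set, assuming every objective is to be minimized. The names `dominates` and `pareto_set` and the sample `(cost, makespan)` pairs are illustrative.

```python
def dominates(a, b):
    """a dominates b: no worse in every objective, strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_set(solutions):
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# (cost, makespan) pairs; lower is better on both axes.
plans = [(5, 7), (4, 9), (5, 9), (6, 7)]
front = pareto_set(plans)
```

The two surviving plans, `(5, 7)` and `(4, 9)`, are exactly the pair the slides say cannot be ranked without knowing the user's utility function.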

Solution 2: Aggregate Utility Metrics

Combine the various objectives into a single utility measure
E.g.: w1*cost + w2*make-span
Could model grad students’ preferences, with w1=infinity, w2=0
Log(cost) + 5*(Make-span)^25
Could model Bill Gates’ preferences.

How do we assess the form of the utility measure (linear? nonlinear?) and how will we get the weights?
Utility elicitation process
Learning problem: Ask tons of questions to the users and learn their utility function to fit their preferences
Can be cast as a sort of learning task (e.g. learn a neural net that is consistent with the examples)
Of course, if you want to learn a true nonlinear preference function, you will need many many more examples, and the training takes much longer.

With aggregate utility metrics, the multi-objective optimization, in theory, reduces to a single-objective optimization problem

*However* if you are trying to find good heuristics to direct the search, then since estimators are likely to be available for naturally occurring factors of the solution quality, rather than random combinations thereof, we still have to follow a two-step process
1. Find estimators for each of the factors
2. Combine the estimates using the utility measure

THIS IS WHAT IS DONE IN SAPA

Sketch of how to get cost and time estimates

Planning graph provides “level” estimates

Generalizing planning graph to “temporal planning graph” will allow us to get “time” estimates
For relaxed PG, the generalization is quite simple: just use the bi-level representation of the PG, and index each action and literal by the first time point (not level) at which they can be introduced into the PG

Generalizing planning graph to “cost planning graph” (i.e. propagate cost information over PG) will get us cost estimates
We discussed how to do cost propagation over classical PGs. Costs of literals can be represented as monotonically reducing step functions w.r.t. levels.

To estimate cost and time together we need to generalize classical PG into Temporal and Cost-sensitive PG
Now, the costs of literals will be monotonically reducing step functions w.r.t. time points (rather than level indices)
This is what SAPA does

SAPA approach

Using the Temporal Planning Graph (Smith & Weld) structure to track the time-sensitive cost function:
Estimation of the earliest time (makespan) to achieve all goals.
Estimation of the lowest cost to achieve goals
Estimation of the cost to achieve goals given the specific makespan value.

Using this information to calculate the heuristic value for the objective function involving both time and cost

Involves propagating cost over planning graphs..

Heuristics in Sapa are derived from the Graphplan-style bi-level relaxed temporal planning graph (RTPG)

Progression; so constructed anew for each state..

Relaxed Temporal Planning Graph

Relaxed Action:
No delete effects
May be okay given progression planning
No resource consumption
Will adjust later

[Figure: RTPG from Init (t=0) to the Goal Deadline (tg) for a person and an airplane at locations A and B, with actions Load(P,A), Fly(A,B), Fly(B,A), Unload(P,A), Unload(P,B)]

while (true)
    forall A != advance-time applicable in S
        S = Apply(A, S)
        {Involves changing P, Π, Q, t. Update Q only with positive effects; and only when there is no other earlier event giving that effect}
    if S satisfies G then Terminate{solution}
    S’ = Apply(advance-time, S)
    if there is (pi, ti) in G such that ti < Time(S’) and pi not in S then Terminate{non-solution}    {Deadline goals}
    else S = S’
end while

RTPG is modeled as a time-stamped plan! (but Q only has +ve events)

Note: Bi-level rep; we don’t actually stack actions multiple times in the PG; we just keep track of the first time the action entered

Heuristics directly from RTPG

For Makespan: Distance from a state S to the goals is equal to the duration between time(S) and the time the last goal appears in the RTPG.

For Min/Max/Sum Slack: Distance from a state to the goals is equal to the minimum, maximum, or summation of slack estimates for all individual goals using the RTPG. Slack estimate is the difference between the deadline of the goal, and the expected time of achievement of that goal.

Proof: All goals appear in the RTPG at times smaller than or equal to their achievable times.

ADMISSIBLE


Heuristics from Relaxed Plan Extracted from RTPG

RTPG can be used to find a relaxed solution which is then used to estimate distance from a given state to the goals

Sum actions: Distance from a state S to the goals equals the number of actions in the relaxed plan.

Sum durations: Distance from a state S to the goals equals the summation of action durations in the relaxed plan.


Resource-based Adjustments to Heuristics

Resource related information, ignored originally, can be used to improve the heuristic values

Adjusted Sum-Action:

$h = h + \sum_R (Con(R) - (Init(R) + Pro(R))) / \Delta_R$

Adjusted Sum-Duration:

$h = h + \sum_R [(Con(R) - (Init(R) + Pro(R))) / \Delta_R] \cdot Dur(A_R)$

Will not preserve admissibility

The (Relaxed) Temporal PG

[Figure: relaxed temporal planning graph for travel from Tempe via Phoenix to Los Angeles, with Drive-car(Tempe,LA), Heli(T,P), Shuttle(T,P) and Airplane(P,LA) placed on a time line at t = 0, 0.5, 1, 1.5, 10]

Time-sensitive Cost Function

Standard (Temporal) planning graph (TPG) shows the time-related estimates e.g. earliest time to achieve fact, or to execute action

TPG does not show the cost estimates to achieve facts or execute actions

Shuttle(Tempe,Phx): Cost: $20; Time: 1.0 hour
Helicopter(Tempe,Phx): Cost: $100; Time: 0.5 hour
Car(Tempe,LA): Cost: $100; Time: 10 hour
Airplane(Phx,LA): Cost: $200; Time: 1.0 hour

[Figure: map of Tempe, Phoenix and L.A; the TPG with Drive-car(Tempe,LA), Heli(T,P), Shuttle(T,P), Airplane(P,LA) at t = 0, 0.5, 1, 1.5, 10; and the cost-vs-time step function for reaching L.A: $300 at t = 1.5, $220 at t = 2, $100 at t = 10]

Estimating the Cost Function

[Figure: the same Tempe, Phoenix, L.A example, building the cost function step by step over the time points t = 0, 0.5, 1, 1.5, 2.0, 10]

Cost(At(LA)) = Cost(At(Phx)) + Cost(Flight(Phx,LA))

Observations about cost functions

Because cost functions decrease monotonically, we know that the cheapest cost is always at t_infinity (don’t need to look at other times)

Cost functions will be monotonically decreasing as long as there are no exogenous events

Actions with time-sensitive preconditions are in essence dependent on exogenous events (which is why PDDL 2.1 doesn’t allow you to say that the precondition must be true at an absolute time point; only at a time point relative to the beginning of the action)

If you have to model an action such as “Take Flight” such that it can only be done with valid flights that are pre-scheduled (e.g. 9:40AM, 11:30AM, 3:15PM etc), we can model it by having a precondition “Have-flight” which is asserted at 9:40AM, 11:30AM and 3:15PM using timed initial literals

Because cost functions are step functions, we need to evaluate the utility function U(makespan,cost) only at a finite number of time points (no matter how complex the U(.) function is)

Cost functions will be step functions as long as the actions do not model continuous change (which will come in at PDDL 2.1 Level 4). If you have continuous change, then the cost functions may change continuously too

ADDED

Cost Propagation Issues:

At a given time point, each fact is supported by multiple actions
Each action has more than one precondition

Propagation rules:
$Cost(f,t) = \min \{Cost(A,t) : f \in Effect(A)\}$
$Cost(A,t) = Aggregate(Cost(f,t) : f \in Pre(A))$
Sum-propagation: $\sum Cost(f,t)$
The plans for individual preconds may be interacting
Max-propagation: $\max \{Cost(f,t)\}$
Combination: $0.5 \sum Cost(f,t) + 0.5 \max \{Cost(f,t)\}$

Probably other better ideas could be tried

Can’t use something like the set-level idea here, because that will entail tracking the costs of subsets of literals
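The propagation rules can be sketched as an event-driven pass over the travel example from the earlier slides. This is not SAPA's implementation: the function `propagate_costs`, the tuple layout of `ACTIONS` and the choice to add the action's own cost on top of the aggregated precondition costs are assumptions made for illustration.

```python
import heapq
import math

# (name, preconditions, effects, duration in hours, cost in dollars)
ACTIONS = [
    ("Shuttle(Tempe,Phx)", ["At(Tempe)"], ["At(Phx)"], 1.0, 20),
    ("Helicopter(Tempe,Phx)", ["At(Tempe)"], ["At(Phx)"], 0.5, 100),
    ("Car(Tempe,LA)", ["At(Tempe)"], ["At(LA)"], 10.0, 100),
    ("Airplane(Phx,LA)", ["At(Phx)"], ["At(LA)"], 1.0, 200),
]

def propagate_costs(initial_facts, actions, aggregate=sum):
    """Return, per fact, the (time, cost) steps of its cost function."""
    best = {}    # cheapest known cost of each fact so far
    steps = {}   # fact -> list of (time, cost) breakpoints
    queue = [(0.0, 0, fact) for fact in initial_facts]
    heapq.heapify(queue)
    while queue:
        t, cost, fact = heapq.heappop(queue)
        if cost >= best.get(fact, math.inf):
            continue  # not an improvement, so no new step
        best[fact] = cost
        steps.setdefault(fact, []).append((t, cost))
        # Re-trigger every action whose preconditions are all reachable.
        for _name, pre, eff, dur, a_cost in actions:
            if fact in pre and all(p in best for p in pre):
                total = aggregate(best[p] for p in pre) + a_cost
                for e in eff:
                    heapq.heappush(queue, (t + dur, total, e))
    return steps

steps = propagate_costs(["At(Tempe)"], ACTIONS)
la_steps = steps["At(LA)"]
```

Passing `aggregate=max` switches from sum-propagation to max-propagation; with the single-precondition actions of this example both give the same steps, $300 at 1.5, $220 at 2 and $100 at 10.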

Termination Criteria

Deadline Termination: Terminate at time point t if:
$\forall$ goal G: Deadline(G) $\le$ t, or
$\exists$ goal G: (Deadline(G) < t) $\wedge$ (Cost(G,t) = $\infty$)

Fix-point Termination: Terminate at time point t where we cannot improve the cost of any proposition.

K-lookahead approximation: At t where Cost(g,t) < $\infty$, repeat the process of applying the (set of) actions that can improve the cost functions k times.

[Figure: cost-vs-time step function ($300 at t = 1.5, $220 at t = 2, $100 at t = 10) above the temporal planning graph with Drive-car(Tempe,LA), H(T,P), Shuttle(T,P) and Plane(P,LA), marking the earliest time point and the cheapest cost]

Heuristic estimation using the cost functions

If the objective function is to minimize time: $h = t_0$

If the objective function is to minimize cost: $h = CostAggregate(G, t_\infty)$

If the objective function is a function of both time and cost, O = f(time,cost), then:
$h = \min f(t, Cost(G,t))$ s.t. $t_0 \le t \le t_\infty$

E.g.: f(time,cost) = 100.makespan + Cost; then h = 100x2 + 220 at $t_0 \le t = 2 \le t_\infty$

[Figure: step function Cost(At(LA)) against time: $300 from t0 = 1.5, $220 from t = 2, $100 from t∞ = 10]

Earliest achieve time: t0 = 1.5
Lowest cost time: t∞ = 10

The cost functions have the information to track both the temporal and cost metrics of the plan, and their inter-dependent relations!
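Because the cost function is a step function, the minimization only needs to look at its breakpoints, assuming f does not decrease as time grows. A minimal sketch using the numbers from the slide; the function name `heuristic` is an illustrative assumption.

```python
def heuristic(cost_steps, f):
    """Evaluate f(t, cost) at each breakpoint and return (best value, time)."""
    return min((f(t, c), t) for t, c in cost_steps)

# Cost(At(LA)) from the slide: $300 from t0 = 1.5, $220 from 2, $100 from 10.
cost_at_la = [(1.5, 300), (2.0, 220), (10.0, 100)]

# Objective from the slide: f(time, cost) = 100 * makespan + cost.
h, t_best = heuristic(cost_at_la, lambda t, c: 100 * t + c)
```

The three candidates evaluate to 450, 420 and 1100, so the minimum is 420 at t = 2, which is neither the earliest time point nor the cheapest one.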

Heuristic estimation by extracting the relaxed plan

Relaxed plan satisfies all the goals ignoring the negative interaction:
Take into account positive interaction
Base set of actions for possible adjustment according to neglected (relaxed) information (e.g. negative interaction, resource usage etc.)

Need to find a good relaxed plan (among multiple ones) according to the objective function

Heuristic estimation by extracting the relaxed plan

General Alg.: Traverse backward searching for actions supporting all the goals. When A is added to the relaxed plan RP, then:
Supported Facts: SF = SF ∪ Effects(A)
Goals: G = (G ∪ Precond(A)) \ SF

Temporal Planning with Cost: If the objective function is f(time,cost), then A is selected such that:
f(t(RP+A), C(RP+A)) + f(t(Gnew), C(Gnew)) is minimal (Gnew = (G ∪ Precond(A)) \ Effects(A))

Finally, use mutexes to set orderings between A and the actions in RP so that fewer causal constraints are violated

[Figure: the Tempe, Phoenix, L.A example with the cost function ($300 at t0 = 1.5, $220 at t = 2, $100 at t∞ = 10) and the objective f(t,c) = 100.makespan + Cost]

End of 10/30 lecture

Adjusting the Heuristic Values

Ignored resource-related information can be used to improve the heuristic values (much like +ve and –ve interactions in classical planning)

Adjusted Cost:

$C = C + \sum_R [(Con(R) - (Init(R) + Pro(R))) / \Delta_R] \cdot C(A_R)$

Cannot be applied to admissible heuristics

Partialization Example

A1(10) gives g1 but deletes p
A3(8) gives g2 but requires p at start
A2(4) gives p at end
We want g1, g2

[Figure: a position-constrained plan over A1, A2, A3 with makespan 22]

Order-constrained plan:
[et(A1) <= et(A2)] or [st(A1) >= st(A3)]
[et(A2) <= st(A3)] ….

[Figure: the best makespan dispatch of the order-constrained plan, with A1 alongside A2 and A3; makespan 14+]

There could be multiple o.c. plans because of multiple possible causal sources. Optimization will involve going through them all.

Problem Definitions

Position constrained (p.c) plan: The execution time of each action is fixed to a specific time point
Can be generated more efficiently by state-space planners

Order constrained (o.c) plan: Only the relative orderings between actions are specified
More flexible solutions, causal relations between actions

Partialization: Constructing an o.c plan from a p.c plan

[Figure: a p.c plan with actions fixed at time points t1, t2, t3 and the corresponding o.c plan, each leading from {Q} to {G}]

Validity Requirements for a partialization

An o.c plan Poc is a valid partialization of a valid p.c plan Ppc, if:
Poc contains the same actions as Ppc
Poc is executable
Poc satisfies all the top level goals
(Optional) Ppc is a legal dispatch (execution) of Poc
(Optional) Contains no redundant ordering relations

[Figure: example of a redundant ordering between actions involving P and Q]

Greedy Approximations

Solving the optimization problem for makespan and number of orderings is NP-hard (Backstrom, 1998)

Greedy approaches have been considered in classical planning (e.g. [Kambhampati & Kedar, 1993], [Veloso et al., 1990]):
Find a causal explanation of correctness for the p.c plan
Introduce just the orderings needed for the explanation to hold

Modeling greedy approaches as value ordering strategies

Variation of the [Kambhampati & Kedar, 1993] greedy algorithm for temporal planning as value ordering:

Supporting variables: $S^p_A = A'$ such that:
$et^p_{A'} < st^p_A$ in the p.c plan $P_{pc}$
there is no $B$ s.t. $et^p_{A'} < et^p_B < st^p_A$
there is no $C$ s.t. $et^p_C < et^p_{A'}$ that satisfies the two conditions above

Ordering and interference variables:
$\rho^p_{AB} = {<}$ if $et^p_B < st^p_A$; $\rho^p_{AB} = {>}$ if $st^p_B > st^p_A$
$\Theta^r_{AA'} = {<}$ if $et^r_A < st^r_{A'}$ in $P_{pc}$; $\Theta^r_{AA'} = {>}$ if $st^r_A > et^r_{A'}$ in $P_{pc}$; $\Theta^r_{AA'} = \bot$ otherwise.

Key insight: We can capture many of the greedy approaches as specific value ordering strategies on the CSOP encoding

Empirical evaluation

Objective: Demonstrate that a metric temporal planner armed with our approach is able to produce plans that satisfy a variety of cost/makespan tradeoffs.

Testing problems: Randomly generated logistics problems from TP4 (Haslum & Geffner)
Load/unload(package,location): Cost = 1; Duration = 1;
Drive-inter-city(location1,location2): Cost = 4.0; Duration = 12.0;
Flight(airport1,airport2): Cost = 15.0; Duration = 3.0;
Drive-intra-city(location1,location2,city): Cost = 2.0; Duration = 2.0;
