A lookahead strategy for heuristic search planningv.vidal.free.fr/onera/publis/look.pdfA lookahead strategy for heuristic search planning Vincent Vidal IRIT – Universite´ Paul Sabatier

A lookahead strategy forheuristic search planning

Vincent VidalIRIT – Universite Paul Sabatier

118 route de Narbonne31062 Toulouse Cedex 04, France

email: [email protected]

Technical report IRIT/2002-35-R

Abstract

The planning as heuristic search framework, initiated by the planners ASP fromBonet, Loerincs and Geffner, and HSP from Bonet and Geffner, lead to some of themost performant planners, as demonstrated in the two previous editions of the Inter-national Planning Competition. We focus in this paper on a technique introduced byHoffmann and Nebel in the FF planning system for calculating the heuristic, based onthe extraction of a solution from a planning graph computed for the relaxed problem ob-tained by ignoring deletes of actions. This heuristic is used in a forward-chaining searchalgorithm to evaluate each encountered state. As a side effect of the computation of thisheuristic, more information is derived from the planning graph and its solution, namelythe helpful actions which permit FF to concentrate its efforts on more promising ways,forgetting the other actions in a local search algorithm. We introduce a novel way for ex-tracting information from the computation of the heuristic and for tackling with helpfulactions, by considering the high quality of the plans computed by the heuristic functionin numerous domains. For each evaluated state, we employ actions from these plans inorder to find the beginning of a valid plan that can lead to a reachable state, that will of-ten bring us closer to a solution state. The lookahead state thus calculated is then addedto the list of nodes that can be chosen to be developed following the numerical value ofthe heuristic. We use this lookahead strategy in a complete best-first search algorithm,modified in order to take into account helpful actions by preferring nodes that can bedeveloped with such actions over nodes that can be developed with actions that are notconsidered as helpful. We then provide an empirical evaluation which demonstrates thatin numerous planning benchmark domains, the performance of heuristic search planningand the size of the problems that can be handled have been drastically improved, whilein more “difficult” domains these strategies remain interesting even if they sometimesdegrade plan quality.

1

1 Introduction

Planning as heuristic search has proven to be a successful framework for non-optimal plan-ning, since the advent of planners capable to outperform in most of the classical benchmarksthe previous state-of-the-art planners Graphplan [BF95, BF97], Satplan [KS96, KMS96]and their descendants Blackbox [KS99], IPP [KNHD97], LCGP [CRV01], STAN [LF99],SGP [WAS98], . . . Although most of these planners compute optimal parallel plans, whichis not exactly the same purpose as non-optimal planning, they also offer no optimality guar-antee concerning plan length in number of actions. This is one reason for which the interestof the planning community turned towards the planning as heuristic search framework andother techniques such as planning as model checking, more promising in terms of perfor-mance for non-optimal planning plus some other advantages such as easier extensions toresource planning and planning under uncertainty.

The planning as heuristic search framework, initiated by the planners ASP [BLG97],HSP and HSPr [BG01], lead to some of the most performant planners, as demonstrated inthe two previous editions of the International Planning Competition with planners such asHSP2 [BG01], FF [HN01] and AltAlt [NK00, NK02]. FF was in particular awarded for out-standing performance at the 2

��International Planning Competition1 and was generally the

top performer planner in the STRIPS track of the��

International Planning Competition2.We focus in this paper on a technique introduced in the FF planning system for calculat-

ing the heuristic, based on the extraction of a solution from a planning graph computed forthe relaxed problem obtained by ignoring deletes of actions. It can be performed in poly-nomial time and space, and the length in number of actions of the relaxed plan extractedfrom the planning graph represents the heuristic value of the evaluated state. This heuristicis used in a forward-chaining search algorithm to evaluate each encountered state. As a sideeffect of the computation of this heuristic, another information is derived from the planninggraph and its solution, namely the helpful actions. They are the actions of the relaxed planexecutable in the state for which the heuristic is computed, augmented in FF by all actionswhich are executable in that state and produce fluents that where found to be goals at thefirst level of the planning graph. These actions permit FF to concentrate its efforts on morepromising ways than considering all actions, forgetting actions that are not helpful in a vari-ation of the hill-climbing local search algorithm. When this last fails to find a solution, FFswitches to a classical complete best-first search algorithm. The search is then started againfrom scratch, without the benefit obtained by using helpful actions and local search.

We introduce a novel way for extracting informations from the computation of theheuristic and for tackling with helpful actions, by considering the high quality of the re-laxed plans extracted by the heuristic function in numerous domains. Indeed, the beginningof these plans can often be extended to solution plans of the initial problem, and there areoften a lot of other actions from these plans that can effectively be used in a solution plan.We define in this paper an algorithm for combining some actions from each relaxed plan,in order to find the beginning of valid plan that can lead to a reachable state. Thanks to thequality of the extracted relaxed plans, these states will frequently bring us closer to a solu-tion state. The lookahead states thus calculated are then added to the list of nodes that canbe chosen to be developed following the numerical value of the heuristic. The best strategywe (empirically) found is to use as much actions as possible from each relaxed plans and toperform the computation of lookahead states as often as possible.

This lookahead strategy can be used in different search algorithms. We propose a mod-ification of a classical best-first search algorithm in a way that preserves completeness. In-deed, it can simply consist in augmenting the list of nodes to be developed (the open list)with some new nodes computed by the lookahead algorithm. The branching factor is slightly

1The 2 �� IPC home page can be found at http://www.cs.toronto.edu/aips2000/.2The 3 � IPC home page can be found at http://www.dur.ac.uk/d.p.long/competition.html.

2

increased, but the performances are generally better and completeness is not affected. In ad-dition to this lookahead strategy, we propose a new way of using helpful actions that alsopreserves completeness. In FF, actions that are not considered as helpful are lost: this makesthe algorithm incomplete. For avoiding that, we modify several aspects of the search algo-rithm. Once a state � is evaluated, two new nodes are added to the open list: one node thatcontains the helpful actions, which are the actions belonging to the relaxed plan computedfor � and executable in � , and one node that contains all actions applicable in � and thatdo not belong to the relaxed plan (we call them rescue actions). A flag is added to eachnode, indicating whether the actions attached to it are helpful or rescue actions. We thenadd a criterium to the node selection mechanism, that always gives preference in develop-ing a node containing helpful actions over a node containing rescue actions, whatever theheuristic estimates of these nodes are. As no action is lost and no node is pruned from thesearch space as in FF, completeness is preserved.

Our empirical evaluation of the use of this lookahead strategy in a complete best-firstsearch algorithm that takes benefit of helpful actions demonstrates that in numerous planningbenchmark domains, the improvement of the performance in terms of running time and sizeof problems that can be handled have been drastically improved. Taking into account helpfulactions makes a best-first search algorithm always more performant, while the lookaheadstrategy makes it able to solve very large problems in several domains. One drawback ofour lookahead strategy is sometimes a degradation of plan quality, which we found to becritical for a few problems. But the trade-off between speed and quality, even in some“difficult” domains where solutions for some problems are substantially longer when usingthe lookahead strategy, seems to always tend in favor of it.

After giving classical definitions and notations in Section 2, we explain the main ideasof the paper and give theoretical issues in Section 3. We then give all details about thealgorithms implemented in our planning system in Section 4, and illustrate them with anexample from the well-known Logistics domain in Section 5. We finally present an experi-mental evaluation of our work in Section 6 before some related works in Section 7 and ourconclusions in Section 8.

2 Definitions

Operators are STRIPS-like operators, without negation in their preconditions. We use afirst order logic language � , constructed from the vocabularies �� , �� , �� that respectivelydenote finite disjoint sets of symbols of variables, constants and predicates.

Definition 1 (operator) An operator, denoted by , is a triple �� where �� , �� and�� denote finite sets of atomic formulas of the language � . �� , !"�#�$�� and %&�(')�� respectively denote the sets �� , �� and �� of the operator .

Definition 2 (state, fluent) A state is a finite set of ground atomic formulas (i.e. without anyvariable symbol). A ground atomic formula is also called a fluent.

Definition 3 (action) An action denoted by � is a ground instance �*,+-�� *��*��.*#� ofan operator which is obtained by applying a substitution * defined with the language �such that �� * , ��* and ��.* are ground. �� .��/�0� , !"�#�$�/�0� , %1�(')�/�2� respectively denote thesets �� * , ��* , ��.* and represent the preconditions, adds and deletes of the action � .

Definition 4 (planning problem) A planning problem is a triple 34+56!7�98��:;� where !denotes a finite set of actions (which are all the possible ground instantiations of a given setof operators defined on � ), 8 denotes a finite set of fluents that represent the initial state,and : denotes a finite set of fluents that represent the goals.

3

Definition 5 (relaxed planning problem) Let 3 +46!7�98��:;� be a planning problem. Therelaxed planning problem 3 � + 6! � ��8�� :7� of 3 is such that

! � + � 6�� /�0��!"��/�0��#�� !Definition 6 (plan) A plan is a sequence of actions 6��(�� . Let 3 + 6!7�98��:;� bea planning problem. The set of all plans constructed with actions of ! is denoted by� '6��#�63 � .Definition 7 (First, Rest, Length, concatenation of plans) We define the classical func-tions �� and �� on non-empty plans as �� 6�� 9� + �� and �� 6� � �� $�9� + 6�!�� , and � ��#"!�%$ on all plans as � ��#"!�%$ ��6�#� �� 9� +&� (with� ��#"!�%$ ��9� +(' ). Let �)�7+ 6��(�� and �*� + ,+-� �� .+0/;� be two plans. The con-catenation of �1� and �*� (denoted by �1��2 �*� ) is defined by �1��2"�*� + 6� � �� .+��(�� .+0/7� .Definition 8 (application of a plan) Let � be a state and � be a plan. The impossible state,which represents a failure in the application of a plan, is denoted by 3 . The application of� on � (denoted by �54 � ) is recursively defined by:

�54 � +if � + �� or � +63then �else /* with � + 6� � �� */

if �� .��/� � �87 �then 9 ��;: %1�(')�/�� 9�=<1!"��/� � �?>�4 6��.�� else 3 .

Definition 9 (valid plan, solution plan) Let � + 6�#�(�� be a plan. � is valid fora state � iff �@4 �BA+C3 . � is a solution plan of a planning problem 3 + 6!7�98��:;� iff�D� � '6��#�63 � and :E7 884 � .

Definition 10 (reachable state) Let 34+ 6!7�98��:;� be a planning problem. Let � and � �be two states. � � is reachable from � in 3 iff there exists a plan �F� � '6��#�63 � such that�@4 � + � � .

3 The ideas

In this section, we expose the two main ideas of the paper (lookahead plans and use ofhelpful actions in a complete algorithm) in a very general way, without entering the detailsof their computation which will be given in the next section. For each of these ideas, wedescribe the main intuitions, we briefly explain how to obtain them in the context of aforward heuristic search planner based on FF [HN01] and we explain how to use them in acomplete heuristic search algorithm.

3.1 Computing and using lookahead states and plans

In classical forward state-space search algorithms, a node in the search graph represents aplanning state and a vertex starting from that node represents the application of one actionto this state. For completeness issue, all actions that can be applied to one state must beconsidered. For example, let � be planning state and

� �=�(�� # be all the actions thatcan be applied to that state. Figure 1 represents the development of a node related to the state� , with each action �#�(�� leading respectively to the states �G� �� , in a completesearch algorithm. The order in which these states will then be considered for developmentdepends on the overall search strategy: depth-first, breadth-first, best-first. . .

4

�

� � � � �=�

� � �!� ��

Figure 1: Node development

The main problem of forward heuristic search planning is of course the selection of thebest node to be developed next. Difficulties appear when heuristic values for such nodes arevery close to each other: no state can be distinguished from other states.

Let us now imagine that for each evaluated state � , we knew a valid plan � that couldbe applied to � and would lead to a state closer to the goal than direct descendants of� . It could then be interesting to apply � to � , and use the resulting state � � as a knewnode in the search. This state could be simply considered as a new descendant of � , asrepresented in the Figure 2: +�� is an action that can be applied to � , +�� is an action that canbe applied to �54 +-� , and so on until the application of +./ : the resulting state � � is such that� � + �54 ,+-� �� (��+�/ � . We have then two kind of vertices in the search graph: the ones thatcome from the direct application of an action to a state, and the ones that come from theapplication of a valid plan to a state � and lead to a state � � reachable from � . We will callsuch states lookahead states, as they are computed by the application of a plan to a node butare considered in the search tree as direct descendants of the nodes they are connected to.Plans labeling vertices that lead to lookahead states will be called lookahead plans. Once agoal state is found, the solution plan is then the concatenation of single actions for classicalvertices and lookahead plans for the other vertices.

�

� � � � ��

� � �� ,+-� �.+.�� .+0/7�

Figure 2: Node development with lookahead state

The computation of a heuristic for each states as is done in the FF planner offers a wayto get such lookahead plans. FF creates a planning graph for each encountered state � , forthe relaxed problem obtained by ignoring deletes of actions and using � as initial state. Arelaxed plan is then extracted from these planning graphs in polynomial time and space. Thelength in number of actions of these relaxed plans corresponds to the heuristic evaluationof the states for which they are calculated. The relaxed plan for a state � is in general notvalid for � , as deletes of actions are ignored during its computation: negative interactions

5

between actions are not considered, so an action can delete a goal or a fluent needed as aprecondition by some actions that follow it in the relaxed plan. But actions of the relaxedplans are used because they produce fluents that can be interesting to obtain the goals, sosome actions of these plans can possibly be interesting to compute the solution plan of theproblem. In numerous benchmark domains, we can observe that relaxed plans have a verygood quality in that they contain a lot of actions that belong to solution plans. We proposea way of computing lookahead plans from these relaxed plans, by trying as most actions aspossible from them and keeping the ones that can be collected into a valid plan. The detailsof the algorithms are given in Section 4.

Completeness and correctness of search algorithms is preserved by this process, becausethe nodes that are added by lookahead plans are reachable from the states they are connectedto, and because no information is lost: all actions that can be applied to a state are stillconsidered. All we do is adding new nodes, corresponding to states that can be reachedfrom the initial state.

Even more, this lookahead strategy can preserve optimality of algorithms such as A*or IDA* when used with admissible heuristics, although some preliminary experiments weconducted did not gave interesting results (we tried to use a heuristic similar to the one usedin [HG00], which corresponds to the length in number of levels of the relaxed planninggraph, while still extracting a relaxed plan for computing a lookahead plan). The conditionfor preserving optimality is to consider that the cost of a vertex is the number of actionsattached to it: 1 for classical vertices, and the length of the lookahead plan for verticesadded by the lookahead algorithm. The evaluation function

� +&"�� $ for a state � , with "being the cost for reaching � from the initial state and $ being the estimation of the cost from� to the goal, thus takes into account the length of the lookahead plans within the value of" . This cost, that takes into account the length of lookahead plans, must also be consideredby the way the state loop control (if used) is handled. Indeed, avoiding to explore alreadyvisited states increases a lot the performances of forward state-space planners [VR99]: astate � encountered at a depth � in the search tree does not need to be developed again whenencountered again at a depth � �� . In order to preserve optimality, we must consider thecost of a node as defined above, and not simply its depth: a state � encountered with a cost "in the search tree need not to be developed again when encountered again with a cost " �� " .

3.2 Using helpful actions: the “optimistic” search algorithms

In classical search algorithms, all actions that can be applied to a node are considered thesame way: the states that they lead to are evaluated by a heuristic function and are thenordered, but there is no notion of preference over the actions themselves. Such a notion ofpreference during search has been introduced in the FF planner [HN01], with the conceptof helpful actions. Once the planning graph for a state � is computed and a relaxed plan isextracted, more information is extracted from these two structures: the actions of the relaxedplan that are executable in � are considered as helpful, while the other actions are forgottenby the local search algorithm of FF. But this strategy appeared to be too restrictive, so the setof helpful actions is augmented in FF by all actions executable in � that produce fluents thatwere considered as goals at the first level of the planning graph, during the extraction of therelaxed plan. The main drawback of this strategy, as used in FF, it that it does not preservecompleteness: the actions executable in a state � that are not considered as helpful aresimply lost. In FF, the search algorithm is not complete for other reasons (it is a variation ofan hill-climbing algorithm), and the search must switch to a complete best-first search whenno solution is found by the local search algorithm.

We present in this paper a way to use such a notion of helpful actions in complete searchalgorithms, that we call optimistic search algorithms because they give a maximum trust tothe informations returned by the computation of the heuristic. The principle is the following:

6

1. Several classes of actions are created. In our implementation, we only use two ofthem: helpful actions (the restricted ones: the actions of the relaxed plan that areexecutable in the state for which we compute a relaxed plan), and rescue actions thatare all the actions that are not helpful.

2. When a newly created state � is evaluated, the heuristic function returns the numer-ical estimation of the state and also the actions executable in � partitioned into theirdifferent classes (for us, helpful or rescue3). For each class, one node is created forthe state � , that contains the actions of that class returned by the heuristic function.

3. When a node is selected to be developed, the class of the actions attached to it is takeninto account: in our “optimistic” algorithm, it is the first criterium: nodes containinghelpful actions are always preferred over nodes containing rescue actions, whatevertheir numerical heuristic values are.

Once again, completeness and correctness are preserved: no information is lost. Theway nodes are developed is simply modified: a state � is developed first with helpful ac-tions, some other nodes are developed, and then � can potentially be developed with rescueactions. As the union of helpful actions and rescue actions is equal to the set of all theactions that can be applied to � , no action is lost. But optimality for use with admissibleheuristics cannot of course be preserved: nothing guarantees that helpful actions alwaysproduce the shortest path to the goal.

4 Description of the algorithms

In this section, we describe the algorithms of our planning system and discuss the maindifferences with the FF planner, which is the closer work presented in the literature. Themain functions of the algorithm are the following:

LOBFS � � : this is the main function (see Figure 3). At first, the function compute nodeis called over the initial state of the problem and the empty plan: the initial state isevaluated by the heuristic function, and a node is created and pushed into the openlist. Nodes in the open list have the following structure: /� �� !7�.$�� , where:

� � is a state,� � is the plan that lead to � from the initial state,� ! is a set of actions applicable in � ,� $ is the heuristic value of � (the length of the relaxed plan computed from � ),� � is a flag indicating if the actions of ! are helpful actions (value $��('�� ' ) or

rescue actions (value �� (� � � ).We then enter in a loop that selects the best node (function pop best node) in theopen list and develops it until a plan is found or the open list is empty. In contrastto standard search algorithms, the actions chosen to be applied to the node state arealready known, as they are part of the informations attached to the node. These actionscome from the computation of the heuristic in the function compute heuristic calledby compute node, which returns a set of helpful actions and a set of rescue actions.

3It must be noted that an action can be considered as helpful for a given state, but can be considered as rescuefor another state: it depends on the role it plays in the relaxed plan and in the planning graph.

7

pop best node � � : returns the best node of the open list. Nodes are compared followingthree criteria of decreasing importance. Let

� + /� �� !7�.$�� be a node in theopen list. The first criterium is the value of the flag

�: $��.' � � � ' is preferred over

�� (� � � . When two flags are equal, the second criterium minimizes the heuristic value� �� +�� $�� #"!�%$ �/�;� , as in � !�� algorithm. In our current implementation,we use � + �

. When two heuristic values are equal, the third criterium minimizesthe length of the plan � that lead to � .

compute node �� ;� : it is called by LOBFS over the initial state and the empty plan, orby LOBFS or itself over a newly created state � and the plan � that lead to that state(see Figure 3). Calls from LOBFS come from the initial state or the selection of anode in the open list and the application of one action to a given state. Calls fromitself come from the computation of a valid plan by the lookahead algorithm and itsapplication to a given state.

If the state � belongs to the close list or is a goal state, the function terminates. Oth-erwise, the state is evaluated by the heuristic function (compute heuristic, whichreturns a relaxed plan, a set of helpful actions and a set of rescue actions). This op-eration is performed the first time with the goal-preferred actions (actions that do notdelete a fluent that belongs to the goal and do not belong to the initial state).

If a relaxed plan can be found, two nodes are added to the initial state: one node forthe helpful actions (the flag is set to $��('�� ' ) and one node for the rescue actions (theflag is set to ��.� � � ). A valid plan is then searched by the lookahead algorithm, andcompute node is called again over the state that results from the application of thisplan to � (if this valid plan is at least two actions long).

If no relaxed plan can be found with the goal-preferred actions, the heuristic is eval-uated again with all actions. In that case, we consider that � has a lowest interest: ifa relaxed plan is found, only one node is added to the open list (helpful actions andrescue actions are merged), the flag is set to ��.� � � , and no lookahead is performed.

compute heuristic �� !�� : this function computes the heuristic value of the state � in away similar to FF. At first, a relaxed planning graph is created, using only actionsfrom the set ! . This parameter allows us to try to restrict actions to be used to goal-preferred actions (actions that delete a goal which is not in the initial state): thisheuristic proved to be useful in some benchmark domains. Once created, the relaxedplanning graph is then searched backward for a solution.

In our current implementation, there is two main differences compared to FF. Thefirst difference holds in the way actions are used for a relaxed plan. In FF, when anaction is selected at a level � , its add effects are marked � � � at level � (as in classicalGraphplan), but also at level �8:�� . As a consequence, a precondition required byanother action at the same level will not be considered as a new goal. In our imple-mentation, add effects of an action are only marked true at time � , but its preconditionsare required at time � and not at the first level they appear, as in FF. A precondition ofan action can then be achieved by an action at the same level, and the range of actionsthat can be selected to achieve it is wider.

The second difference holds in the way actions are added to the relaxed plan. In FF,actions are arranged in the order the get selected. We found useful to use the followingalgorithm. Let � be an action, and 6�� !�� be a plan. � is ordered after �#� iff:the level of the goal � was selected for is greater or equal than the level of the goal� � was selected for, and either � deletes a precondition of �=� or � � does not delete aprecondition of � . In that case, the same process continues between � and �#� , and soon with all actions in the plan. Otherwise, � is placed before �� .

8

The differences between FF and our implementation we described here are all heuris-tic and motivated by our experiments, since making optimal decisions for these prob-lems are not polynomial, as stated in [HN01].

The function compute heuristic returns a structure � � �� 0� � where: � � is a re-laxed plan, � is a set of helpful actions, and � is a set of rescue actions. As complete-ness is preserved by the algorithm, we restrict the set of helpful actions compared toFF: they are only actions of the relaxed plan applicable in the state � for which wecompute the heuristic. In FF, all the actions that are applicable in � and produce agoal at level 1 are considered as helpful. Rescue actions are all actions applicablein � and not present in � � , and are used only when no node with helpful actions ispresent in the open list. So, � < � contains all actions applicable in � .

lookahead �� 0� �;� : this function searches a valid plan for a state � using the actions of arelaxed plan � � calculated by compute heuristic (cf. Figure 4). Several strategiescan be imagined: searching plans with a limited number of actions, returning severalpossible plans, . . . From our experiments, the best strategy we found is to search oneplan, containing as most actions as possible from the relaxed plan, and to try to replacean action of � � which is not applicable by another action when no other choice ispossible.

At first, we enter in a loop that stops if no action can be found or all actions of ��have been used. Inside this loop, there is two parts: one for selecting actions from � � ,and another one for replacing an action of � � by another action in case of failure inthe first part.

In the first part, actions of � � are observed in turn, in the order they are present inthe sequence. Each time an action � is applicable in � , we add � to the end of thelookahead plan and update � by applying � to it. Actions that cannot be applied arekept in a new relaxed plan called

� �� '6�.� , in the order they get selected. If at leastone action has been found to be applicable, when all actions of � � have been tried,the second part is not used (this is controlled by the boolean � -�=� � � � � ). The relaxedplan �� is updated with

� �� '/�(� , and the process is repeated until � � is empty or noaction can be found.

The second part is entered when no action has been found in the first part. The goalis to try to repair the current (not applicable) relaxed plan, by replacing one actionby another which is applicable in the current state � . Actions of

� �� '/�(� are observedin turn, and we look for an action (in the global set of actions ! ) applicable in � ,which achieves an add effect of the action of

� �� '/�(� we observe, this add effect beinga precondition of another action in the current relaxed plan. If several achievers arepossible for the add effect of the action of

� �� '6�.� we observe, we select the one thathas the minimum cost in the relaxed planning graph used for extracting the initialrelaxed plan (function choose best). When such an action is found, it is added to thelookahead plan and the global loop is repeated. The action of

� �� '/�(� observed whena repairing action was found is deleted from the current relaxed plan. This repairingtechnique is also completely heuristic, but gave good results in our experiments.

5 An example

Let us now illustrate the main ideas of the paper and the algorithms by the resolution of asmall problem issued from the well-known Logistics benchmark domain. We provide hereexactly what prints our planning system.

9

let 3 + 6!7��8�� :7� ; /* planning problem */let :�! + � � � !E�� %&�(')�/�0�� :�� 82�0 ; /* goal-preferred actions */let ��$�� +�� ; /* open list: nodes to be developed */let � '/ �(�"+�� ; /* close list: already developed nodes */

function LOBFS � �compute node �68��9� ;while ��$�� A+�� do

let /� �� .� � -�� .$�� '6� "�� +�� #� �forall � � ��.� � -�� do

compute node �� 4 6�0�� 2 6�0�9�endfor

endwhileend

function compute node �� ;� /* S: state, P: plan, */if � �� '/ �.� then

if :E7 � then output and exit �/�;� endif ;� '/ �.�� '/ �(�8< � �8 ;let �� +�� ! ��!"�#$�%��#&�#�� :�!�� ;if � � A+ � �� ' then

�� '� �� < � /� �� #"!�%$ �,� �;��.$��('�� '/��/� �� 0�,�� #"!�%$ �,� �;��9 ��.� � ��0 ;

let /� � �� +�(&)�*�+� ! �+�� 0� �;� ;if � ��#"!�%$ �/� � � ��,

thencompute node �� &2 � � �

endifelse

let � � � � �0� � +��!�� ! ��"�#$�%��#&�� !�� ;if � � A+ � �� ' then

��$��'� ��$�� < � /� �� #"!�%$ �,� �;�� < �;�9 ��.� � ��0endif

endifendif

end

Figure 3: Lookahead Optimistic Best-First Search algorithm

10

function lookahead �� ;� /* S: state, RP: relaxed plan */let �$' �� + �� ;let� �� '6�.�,+ �� ;

let � -�=� �?� � � +&� � � ;while � -�=� � � � �� A+ �� do

� -�=� �?� � � � � ��' �.� ;forall �)� 9 ��0��> do /* with � � + 6� �(�� */

if �� .� �/�� 17 � then� -�=� � � � �� ;� � �54 6�� ;��'6�� '6��@2 6��

else � �� '/�(�� '/�(��2 6�� endif

endfor ;if � -�=� � � � � then

�� '/�(� ;� �� '/�(�� else

�� ;while � � -�=� � � � �� '/�(� A+ �� do

forall� � !"�#�$�,�� '/�(�2�9� do

if�� 2� � �,� � 2 � �� '6�.�� .��/�0� then

let � �-�� + '/� �2�� -��"+ � �� !D� � � !"��/�0��1�� .� �/�2�17 �8 ;if �$ �� + '6� ��.� � -�� A+ � then

let � +�� !)�� %�� +�'/� ��.� � -�� ;� -�=� � � � �� ;� � �@4 6�0� ;��'6�� '6��@2 6�2� ;� � � � �&2 �� '/�(�2� ;� �� '/�(� � ��

endifendif

endfor ;if � � -�=� � � � � then

� � � ��&2 �� '/�(�� ;� �� '/�(�� '/�(��endif

endwhileendif

endwhilereturn �� /��'6��

end

Figure 4: Lookahead algorithm

11

This Logistics world contains two cities (Paris and Toulouse), each one subdivided intotwo districts (post office and airport). Each city contains a truck that can move betweendistricts of this city. There is one airplane, which can move between airports. Trucks andairplane can load and unload packages in every location they can go. In the initial state,three objects are located at Paris post office, each truck is located at its respective city postoffice, and the airplane is located at Paris airport. The goal is to move two of these objectsto Toulouse post office, the third object staying at Paris post office. This problem can bedescribed in typed PDDL [MGH � 98] by the following:

(define (problem small-french-logistics)(:domain logistics)(:objectsParis Toulouse - citypa-po tlse-po - locationpa-apt tlse-apt - airportA320 - airplanepa-truck tlse-truck - truckobj1 obj2 obj3 - package)(:init(in-city pa-po Paris) (in-city pa-apt Paris)(in-city tlse-po Toulouse) (in-city tlse-apt Toulouse)(at A320 pa-apt) (at pa-truck pa-po) (at tlse-truck tlse-po)(at obj1 pa-po) (at obj2 pa-po) (at obj3 pa-po))(:goal(and (at obj1 tlse-po) (at obj2 tlse-po) (at obj3 pa-po))))

The open and close lists are initiated to the empty set:

�� + �

� '6 �.�"+ � As the in-city predicate is only used for constraining trucks to move between districts

of the same city, we can remove fluents using it from the initial state. They can be handledin a separate way, typically by the instantiation procedure. The initial state 8 can then bedescribed by the following:

8 + �(at A320 pa-apt), (at pa-truck pa-po),(at tlse-truck tlse-po), (at obj1 pa-po), (at obj2 pa-po),(at obj3 pa-po)

The function compute node is then called on the initial state 8 and the empty plan. As 8does not belong to the close list, it is evaluated by the heuristic function compute heuristicwhich returns a relaxed plan �8� , a set of helpful actions � � and a set of rescue actions� � . The action (LOAD-TRUCK obj3 pa-truck pa-po) is not considered as helpfulbecause it is not used into the relaxed plan. The union of � � and � � represents all actionsapplicable in 8 .

12

�1� + (LOAD-TRUCK obj1 pa-truck pa-po),(LOAD-TRUCK obj2 pa-truck pa-po),(DRIVE-TRUCK pa-truck pa-po pa-apt Paris),(UNLOAD-TRUCK obj2 pa-truck pa-apt),(UNLOAD-TRUCK obj1 pa-truck pa-apt),(LOAD-AIRPLANE obj2 A320 pa-apt),(LOAD-AIRPLANE obj1 A320 pa-apt),(FLY-AIRPLANE A320 pa-apt tlse-apt),(DRIVE-TRUCK tlse-truck tlse-po tlse-apt Toulouse),(UNLOAD-AIRPLANE obj1 A320 tlse-apt),(UNLOAD-AIRPLANE obj2 A320 tlse-apt),(UNLOAD-TRUCK obj1 tlse-truck tlse-po),(LOAD-TRUCK obj1 tlse-truck tlse-apt),(UNLOAD-TRUCK obj2 tlse-truck tlse-po),(LOAD-TRUCK obj2 tlse-truck tlse-apt) �

� � + �(LOAD-TRUCK obj1 pa-truck pa-po),(LOAD-TRUCK obj2 pa-truck pa-po),(DRIVE-TRUCK pa-truck pa-po pa-apt Paris),(FLY-AIRPLANE A320 pa-apt tlse-apt),(DRIVE-TRUCK tlse-truck tlse-po tlse-apt Toulouse)

� � + �(LOAD-TRUCK obj3 pa-truck pa-po)

As a relaxed plan has been found, the open list is augmented with two new nodes, onefor the helpful actions and one for the rescue actions. The heuristic value 15 corresponds tothe number of actions of the relaxed plan �8� . The initial state 8 is added to the close list.

�� + � 8�� $��('�� '�� , 8��0� � � � � �9 ��.� � ��

� '6 �.�"+ � 8 At that point, a classical search algorithm would have selected and developed a node of

the open list. The preferred node for our algorithm would be the one with the flag set to$��('�� ' . But as the relaxed plan has been found with only using goal-preferred actions, alookahead plan will be computed from the initial state.

This is motivated by the fact that the relaxed plan �� has a very good quality. We canremark that the eleven first actions of �1� give the beginning of an optimal plan that solvesthe problem: the two objects that must move to Toulouse are loaded into the truck at Parispost office, the truck goes to Paris airport, the two objects are unloaded from the truck andboarded into the airplane, the airplane flies from Paris to Toulouse while one truck goesfrom Toulouse post office to Toulouse airport, and the two objects are unloaded from theairplane.

There is a problem with the following four actions: as delete lists of actions are nottaken into account, the fact that one truck is still at Toulouse post office remains true whileextracting a relaxed plan: the action of driving the truck from the airport to the post officedoes not appear. For the same reason, the final actions of loading objects and unloadingthem are badly ordered.

The function lookahead returns /�G� �� +�(&) *�+� ! �+�� 68��)� � , with � � + 884 � � � :

13

� � � + (LOAD-TRUCK obj1 pa-truck pa-po),(LOAD-TRUCK obj2 pa-truck pa-po),(DRIVE-TRUCK pa-truck pa-po pa-apt Paris),(UNLOAD-TRUCK obj2 pa-truck pa-apt),(UNLOAD-TRUCK obj1 pa-truck pa-apt),(LOAD-AIRPLANE obj2 A320 pa-apt),(LOAD-AIRPLANE obj1 A320 pa-apt),(FLY-AIRPLANE A320 pa-apt tlse-apt),(DRIVE-TRUCK tlse-truck tlse-po tlse-apt Toulouse),(UNLOAD-AIRPLANE obj1 A320 tlse-apt),(UNLOAD-AIRPLANE obj2 A320 tlse-apt),(LOAD-TRUCK obj1 tlse-truck tlse-apt),(LOAD-TRUCK obj2 tlse-truck tlse-apt) �

�*� + �(at A320 tlse-apt), (at pa-truck pa-apt),(at tlse-truck tlse-apt),(in obj1 tlse-truck),(in obj2 tlse-truck), (at obj3 pa-po)

The eleven first actions have been taken without any change, and the algorithm foundthat the two actions of loading objects into Toulouse truck can be executed without the twoactions of unloading these objects from the truck, which were present in the relaxed plan.The repairing part of the lookahead algorithm is not useful here, because no action in theglobal set of actions can have the same effects that unloading the objects at Toulouse post-office, which cannot be executed here.

The lookahead plan � �� being more than one action long, the function compute node isrecursively called on the state �*� and the plan � �� that lead to it. As the state �G� does notbelong to the close list, it is evaluated by the heuristic function compute heuristic whichreturns a relaxed plan �G� , a set of helpful actions �� and a set of rescue actions �� :

�G� + (UNLOAD-TRUCK obj1 tlse-truck tlse-po),(DRIVE-TRUCK tlse-truck tlse-apt tlse-po Toulouse),(UNLOAD-TRUCK obj2 tlse-truck tlse-po) �

� � + �(DRIVE-TRUCK tlse-truck tlse-apt tlse-po Toulouse)

�� + �(UNLOAD-TRUCK obj1 tlse-truck tlse-apt),(UNLOAD-TRUCK obj2 tlse-truck tlse-apt),(FLY-AIRPLANE A320 tlse-apt pa-apt),(DRIVE-TRUCK pa-truck pa-apt pa-po Paris)(LOAD-TRUCK obj3 pa-truck pa-po)

A relaxed plan has been found this time again, so the open list is augmented with twonew nodes: one for the helpful actions and one for the rescue actions. There is now fournodes in the open list, as none of them has been developed yet, and �)� is entered into theclose list:

�� + � 8�� $��('�� '�� , 8��0� � � � � �9 ��.� � �� , /� � �� @�� .$��('�� '�� ,/� �(�� 0�� 9 ��.� � ��

� '6 �.�"+ � 8 , �*�)Once again, the relaxed plan has been computed with only using goal-preferred actions.

So, a lookahead plan is searched from the state � � . In the case of a choice by the function

14

pop best nodes, the nodes would have been ordered by decreasing interest as follows:

1. /�*� �� .$��('�� '�� : the flag is $��('�� ' and its heuristic value evaluates to� �� + � � � � � ��#"!�%$ �/� � �� +�� + ,�,,

2. 8�� $��.' � � � '�� : the flag is $��('�� ' and its heuristic value evaluates to� �680� +

� � � � � � ��#"!�%$ ��9� +�� ,

3. /�*� �� 0�� .� � �� : the flag is ��.� � � and its heuristic value evaluates to� ��G� � +� � � � � ��#"!�%$ �/� � �� +�� + ,�,

,

4. 8�� .� � �� : the flag is �� (� � � and its heuristic value evaluates to� �680� +

� � � � � � ��#"!�%$ ��9� +�� .

The function lookahead returns /�� +�(&) *�+� ! �+�� (��*�.� , with � � + � �#4 � �� :

� �� + (DRIVE-TRUCK tlse-truck tlse-apt tlse-po Toulouse),(UNLOAD-TRUCK obj1 tlse-truck tlse-po),(UNLOAD-TRUCK obj2 tlse-truck tlse-po) �

� � + �(at A320 tlse-apt), (at pa-truck pa-apt),(at tlse-truck tlse-po),(at obj1 tlse-po),(at obj2 tlse-po), (at obj3 pa-po)

The lookahead plan � �� being more than one action long, the function compute node isrecursively called on the state � � and the plan �/� �� 2 � �� that lead to it. The state � � doesnot belong to the close list, but contains the goals of the problem: a solution plan is foundand printed out, and the search is stopped. The solution plan is then:

� �� 2 � �� + (LOAD-TRUCK obj1 pa-truck pa-po),(LOAD-TRUCK obj2 pa-truck pa-po),(DRIVE-TRUCK pa-truck pa-po pa-apt Paris),(UNLOAD-TRUCK obj2 pa-truck pa-apt),(UNLOAD-TRUCK obj1 pa-truck pa-apt),(LOAD-AIRPLANE obj2 A320 pa-apt),(LOAD-AIRPLANE obj1 A320 pa-apt),(FLY-AIRPLANE A320 pa-apt tlse-apt),(DRIVE-TRUCK tlse-truck tlse-po tlse-apt Toulouse),(UNLOAD-AIRPLANE obj1 A320 tlse-apt),(UNLOAD-AIRPLANE obj2 A320 tlse-apt),(LOAD-TRUCK obj1 tlse-truck tlse-apt),(LOAD-TRUCK obj2 tlse-truck tlse-apt),(DRIVE-TRUCK tlse-truck tlse-apt tlse-po Toulouse),(UNLOAD-TRUCK obj2 tlse-truck tlse-po),(UNLOAD-TRUCK obj1 tlse-truck tlse-po) �

As a conclusion to the treatment of this example, here is a comparison between its res-olution by our planning system with three different settings: classical Best First Search(BFS), Optimistic Best First Search (OBFS), and Lookahead Optimistic Best First Search(LOBFS):

Developed nodes: BFS and OBFS develop 16 nodes, while LOBFS develops 0 nodes:LOBFS solves this problem without entering the real search part of the algorithm.

15

Evaluated nodes: BFS evaluates 65 states, OBFS evaluates 42 states. As the plan is 16actions long, both of these algorithms evaluate unuseful states. The difference corre-sponds to the fact that OBFS uses only actions that are considered as helpful: rescueactions are in nodes whose flag is �� (� � � , which take no part in node development asa solution is found before using them. LOBFS evaluates only 2 nodes.

Computed nodes: BFS computes 129 nodes, OBFS computes 43 nodes. The differencebetween evaluated and computed nodes for these algorithms is due to the fact thatOBFS focuses faster on a solution state on the last stage of search. LOBFS computesonly 16 states, which is only performed in the lookahead algorithm and never in thedevelopment of nodes, as none of them is developed.

6 Experimental evaluation

6.1 Planners, benchmarks and objectives

We compare four planners. The first one is the FF planning system v2.3 [HN01]4, thehighly optimized heuristic search planner (implemented in C) that won the 2

� �Interna-

tional Planning Competition and was generally the top performer in the STRIPS and simplenumeric tracks of the 3

� �International Planning Competition. The other planners consist in

three different settings of our planning system called YAHSP (which stands for Yet AnotherHeuristic Search Planner, see5) implemented in Objective Caml6 compiled for speed:

� BFS (Best First Search): classical � ! � search, with � + �. The heuristic is based

on the computation of a relaxed plan as in FF, with the differences detailed in Sec-tion 4.

� OBFS (Optimistic Best First Search): identical to BFS, with the use of a flag indicat-ing whether the actions attached to a node are helpful or rescue. A node containinghelpful actions is always preferred over a node containing rescue actions.

� LOBFS (Lookahead Optimistic Best First Search): identical to OBFS, with the use ofthe lookahead algorithm described in Section 4. A lookahead plan is searched eachtime the computation of a relaxed plan can be performed with goal-preferred actions.

We use nine different benchmark domains7: the classical Logistics domain, the Mprimeand Mystery domains created for the 1 �

�IPC, the Freecell domain created for the 2

��IPC,

and the five STRIPS domains created for the 3� �

IPC (Rovers, Satellite, ZenoTravel, Driver-Log, Depots). Problems for Logistics are those of the 2

��IPC and problems for Freecell are

those of the 3� �

IPC.We classified these domains into three categories, in accordance with the way LOBFS

solves them: easy problems in Section 6.2 (Rovers, Satellite, ZenoTravel, Logistics, Driver-Log), medium difficulty problems in Section 6.3 (Mprime, Freecell), and difficult problemsin Section 6.4 (Depots, Mystery).

Our objectives for these experiments are the following:

4The FF home page can be found at:http://www.informatik.uni-freiburg.de/ � hoffmann/ff.html.

5The YAHSP home page can be found at:http://www.irit.fr/recherches/RPDMP/vincent/yahsp.html.

6Objective Caml is a strongly-typed functional language from the ML family, with object oriented extensions.Its home page can be found at http://caml.inria.fr/.

7All domains and problems used in our experiments can be downloaded on the YAHSP home page.

16

1. To test the efficiency of our planning system over a state-of-the-art planner, FF. In-deed, FF is known for its distinguished performances in numerous planning domainsand successes in the 2

��and 3

� �IPC. Although generally not as fast as FF in the BFS

and OBFS settings, our planner compares well to FF.

2. To evaluate the suitability of the optimistic search strategy. This strategy allows us touse helpful actions in complete search algorithms. This is in contrast to their use inthe enforced hill-climbing algorithm of FF which is not complete, and falls back intoa classical complete best-first search when hill-climbing fails to find a plan. We willsee in particular that the OBFS strategy is better than BFS in almost all the problems.

3. To demonstrate that the use of a lookahead strategy greatly improves the perfor-mances of forward heuristic search. Even more, we will see that our planner cansolve problems that are substantially bigger than what other planners can handle (upto 10 times more atoms in the initial state and 16 times more goals in the last Driver-Log problem). The main drawback of our lookahead strategy is the degradation inplan quality (number of actions); however, we will see that this degradation remainsgenerally very limited.

All tests have been performed in the same experimental conditions, on a Pentium II -450MHz machine with 512Mb of memory running Debian GNU/Linux 3.0. The maximalamount of time allocated to all planners for each problem was fixed to one hour.

6.2 Easy problems

As original problems from the competition sets are solved very easily by LOBFS, we cre-ated 10 more problems in each domain with the available problem generators. The 20 firstproblems (numbered from 1 to 20) are the original ones, and the 10 following are newlycreated ones.

In order to fully understand the results we present here, it is very important to remark thefollowing: difficulty (measured as the number of atoms in the initial state and the number ofgoals) between successive new created problems numbered from 21 to 30, increases muchmore than difficulty between original problems. Indeed, the last problem we created in eachproblem is the largest one that can be handled by LOBFS within the memory constraints.As the solving time remains reasonable, larger problems could surely be solved in less thanone hour with more memory.

As a consequence, the graphs representing plan length are divided into two parts: planlength for new created problems increases much more than for original ones. We also addeda table for each domain that shows some data about the largest problems solved by FF,OBFS and LOBFS, in order to realize the progress accomplished in the size of the problemsthat can be solved by a STRIPS planner.

6.2.1 Rovers domain

This domain is inspired by planetary rovers problems. A collection of rovers navigate aplanet surface, analyzing samples of soil and rock, and taking images of the planet. Analysisand images are then communicated back to a lander. Parallel communication between roversand the lander is impossible.

OBFS and FF perform nearly the same, FF being slightly more efficient than OBFS(see Figure 5). They are much more faster than BFS and solve 24 problems, while BFSsolves only 20 problems. OBFS is up to 300 times faster than BFS, on problem 21. LOBFSis much more efficient than all other planners, and solves all the problems. The largestproblem solved by LOBFS contains 6 times more atoms in the initial state and 4 times more

17

goals than the largest problem solved by FF and OBFS (see Table 1). Plans found by LOBFScontain sometimes slightly more actions than plans found by other planners, but this remainslimited (see Figure 6).

What is remarkable in this domain is that LOBFS never develops any node: plans arefound by recursive calls between node computation and lookahead plan application, as inthe example of the Logistics domain we studied in Section 5. Such a behavior will beencountered again for numerous problems in other domains, and is a reason of the goodperformances of LOBFS.

6.2.2 Satellite domain

This domain is inspired by space-applications and is a first step towards the “AmbitiousSpacecraft” described by David Smith at AIPS-2000. It involves planning and schedul-ing a collection of observation tasks between multiple satellites, each equipped in slightlydifferent ways. Satellites can point to different directions, supply power to one selectedinstrument, calibrate it to one target and take images of that target.

Running time curves for OBFS and FF are very similar, FF being generally more effi-cient than OBFS but at most around 10 times faster (see Figure 7). They are much moreefficient than BFS and solve 21 problems against 19 for BFS. OBFS can be up to 30 timesfaster than BFS, on problem 17. LOBFS outperforms all other planners, and solves all theproblems. The largest problem solved by LOBFS contains 10 times more atoms in the initialstate and 6 times more goals than the largest problem solved by FF and OBFS (see Table 2).Plans found by LOBFS contain often slightly more actions than plans found by other plan-ners, but this remains limited (see Figure 8) with the except of one problem for which itfinds a plan containing 176 actions against 105 actions for OBFS.

In this domain, LOBFS develops only one node in 8 problems and develops no node inall other problems.

6.2.3 ZenoTravel domain

This is a transportation domain which involves transporting people around in planes. It wasmore specifically designed for numeric tracks of the competition, with different modes ofmovement: fast and slow. The fast movement consumes fuel faster than the slow once,making the search for a good quality plan (one using less fuel) much harder. In its STRIPSversion, the fast movement gives no advantage, as quality is measured by plan length; butas more fuel is consumed, it still brings a penalty. The best behavior for STRIPS planner isthen to avoid using the fast movement.

In this domain, FF is often more efficient than OBFS but at most around 10 times faster(see Figure 9). Both are much more efficient than BFS. FF solves 25 problems, against24 for OBFS and 21 for BFS. OBFS is up to 30 times faster than BFS, on problem 19.LOBFS is much more efficient than all other planners, and solves all the problems. Thelargest problem solved by LOBFS contains 2 times more atoms in the initial state and 2times more goals than the largest problem solved by OBFS (see Table 3). Plans found byLOBFS contain often slightly more actions than plans found by other planners, but thisremains limited (see Figure 10) with the except of 2 or 3 problems where it finds up to 2times more actions.

In this domain, LOBFS develops only 5 nodes in one problem, one node in 2 problemsand develops no node in all other problems.

6.2.4 Logistics domain

This is the classical domain, as described in Section 5.

18

0.1

1

10

100

1000

10000

1 2 3 4 5 6 7 8 9 11 12 10 14 16 13 15 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Tim

e (s

econ

ds)

Problems

FFBFSOBFSLOBFS

Figure 5: Rovers domain (CPU time)

0

10

20

30

40

50

60

70

80

90

100

1 2 3 4 5 6 7 8 9 11 12 10 14 16 13 15 17 18 19 20

Pla

n le

ngth

(nu

mbe

r of

act

ions

)

Problems

FFBFSOBFSLOBFS

0

100

200

300

400

500

600

700

800

900

21 22 23 24 25 26 27 28 29 30

Problems

Figure 6: Rovers domain (plan length)

Problem 24 Problem 30Rovers 26 66Way-points 80 200Objectives 26 66Cameras 16 40Goals 26 66Init atoms 5920 35791Goals 33 127

FF OBFS LOBFS LOBFSPlan length 130 133 145 560Evaluated nodes 3876 2114 9 24Search time 418.32 430.95 1.97 44.35Total time 422.48 437.92 8.87 219.13

Table 1: Rovers domain (largest problems)

19

0.1

1

10

100

1000

10000

1 2 3 4 6 5 7 8 9 10 11 18 12 14 13 19 15 20 16 17 21 22 23 24 25 26 27 28 29 30

Tim

e (s

econ

ds)

Problems

FFBFSOBFSLOBFS

Figure 7: Satellite domain (CPU time)

0

20

40

60

80

100

120

140

160

180

1 2 3 4 6 5 7 8 9 10 11 18 12 14 13 19 15 20 16 17

Pla

n le

ngth

(nu

mbe

r of

act

ions

)

Problems

FFBFSOBFSLOBFS

0

200

400

600

800

1000

1200

1400

1600

1800

2000

2200

21 22 23 24 25 26 27 28 29 30

Problems

Figure 8: Satellite domain (plan length)

Problem 21 Problem 30Satellites 6 45Instruments/sat. 12 50Modes 12 50Targets 6 25Observations 25 100Init atoms 971 10374Goals 124 768

FF OBFS LOBFS LOBFSPlan length 140 125 151 2058Evaluated nodes 22385 20370 5 5Search time 188.69 328.42 0.12 33.73Total time 188.82 328.70 0.40 94.24

Table 2: Satellite domain (largest problems)

20

0.1

1

10

100

1000

10000

1 2 3 4 5 6 7 9 8 10 11 13 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Tim

e (s

econ

ds)

Problems

FFBFSOBFSLOBFS

Figure 9: ZenoTravel domain (CPU time)

0

20

40

60

80

100

120

1 2 3 4 5 6 7 9 8 10 11 13 12 14 15 16 17 18 19 20

Pla

n le

ngth

(nu

mbe

r of

act

ions

)

Problems

FFBFSOBFSLOBFS

100

150

200

250

300

350

400

450

21 22 23 24 25 26 27 28 29 30

Problems

Figure 10: ZenoTravel domain (plan length)

Problem 24 Problem 25 Problem 30Cities 36 40 80Planes 9 10 20People 45 50 100Init atoms 166 183 353Goals 45 49 100

FF OBFS LOBFS FF LOBFS LOBFSPlan length 163 165 177 179 211 444Evaluated nodes 3481 5271 15 8714 16 20Search time 562.09 1496.81 4.15 1898.26 6.45 59.67Total time 564.07 1505.43 12.80 1901.03 18.98 247.06

Table 3: ZenoTravel domain (largest problems)

21

In this domain, FF is much more efficient than OBFS: up to 57 times faster in problem 6(see Figure 11). FF solves 15 problems against 10 for OBFS. This last remains howevermore efficient than BFS, which solves only 8 problems. LOBFS is outperforms all otherplanners, and solves all the problems. The largest problem solved by LOBFS contains 3.5times more atoms in the initial state and 3 times more goals than the largest problem solvedby OBFS (see Table 4). What is remarkable in this domain is that all planners give plans ofapproximately the same length, with only a few actions of difference (see Figure 12).

In this domain, LOBFS never develops any node.

6.2.5 DriverLog domain

This domain involves driving trucks around delivering packages between locations. Thedifference with a more classical transportation domain such as Logistics is that the trucksrequire drivers who must walk between trucks in order to drive them. The paths for walkingand the roads for driving form different maps on the locations.

In this domain, OBFS is often more efficient than FF (see Figure 13). Both are muchmore efficient than BFS, but this last solves more problems than FF: FF solves 15 problems,BFS solves 18 problems, and OBFS solves 20 problems. LOBFS is much more efficientthan all other planners, and solves all the problems. The largest problem solved by LOBFScontains 3.5 times more atoms in the initial state and 4 times more goals than the largestproblem solved by OBFS (see Table 5). This problem contains also 9 times more atoms inthe initial state and 16 times more goals than the largest problem solved by FF. Plans foundby LOBFS contain sometimes slightly more actions than plans found by other planners, butthis remains very limited (see Figure 14).

Problems from this domain seem to be a little bit more difficult for LOBFS than prob-lems from previous domains: 13 problems require nodes to be developed. But there are stillfew of such nodes: 4 problems develop between 10 and 75 nodes, and 9 problems developbetween 1 and 9 nodes.

6.2.6 Concluding remarks

For the five domains we presented in this section, the superiority of LOBFS over all plannersand the superiority of OBFS over BFS are clearly demonstrated, while OBFS and FF havecomparable performances.

The only drawback of LOBFS is sometimes a slight degradation of plan quality, but thisremains very limited: the trade-off between speed and quality tends without any doubt infavor of our lookahead technique. Several known benchmarks not presented here such asFerry, Gripper, Miconic-10 elevator, are also solved very easily by LOBFS.

It is to be noted that the FF planner had already good performances in these domains,that are for the most part transportation domains; but the time required for solving problemsfrom these domains and the size of problems that can be handled have been considerablyimproved.

6.3 Medium difficulty problems

The results we present for domains in this category are not as advantageous for LOBFS asin the previous section, but still compares well to the results of the other planners.

6.3.1 Mprime domain

In this domain, OBFS and FF have very close performances (FF being slightly faster), withthe except of problem 18 solved by OBFS but not by FF and two problems (22 and 14)solved much faster by OBFS (see Figure 15). BFS solves only 24 of the total 35 problems,

22

0.1

1

10

100

1000

10000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Tim

e (s

econ

ds)

Problems

FFBFSOBFSLOBFS

Figure 11: Logistics domain (CPU time)

0

100

200

300

400

500

600

700

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Pla

n le

ngth

(nu

mbe

r of

act

ions

)

Problems

FFBFSOBFSLOBFS

800

900

1000

1100

1200

1300

1400

1500

1600

1700

1800

21 22 23 24 25 26 27 28 29 30

Problems

Figure 12: Logistics domain (plan length)

Problem 13 Problem 15 Problem 30Airplanes 6 7 20Cities 22 25 50City size 2 2 5Packages 66 75 200Init atoms 320 364 1140Goals 65 75 200

FF OBFS LOBFS FF LOBFS LOBFSPlan length 398 387 403 505 477 1714Evaluated nodes 16456 16456 4 45785 4 5Search time 527.21 1181.95 0.29 2792.51 0.42 16.64Total time 528.10 1184.35 2.68 2793.82 3.88 96.69

Table 4: Logistics domain (largest problems)

23

0.1

1

10

100

1000

10000

1 3 5 6 7 8 2 11 4 10 13 14 9 15 12 17 18 19 20 16 21 23 22 24 25 26 27 28 29 30

Tim

e (s

econ

ds)

Problems

FFBFSOBFSLOBFS

Figure 13: DriverLog domain (CPU time)

0

20

40

60

80

100

120

140

160

180

1 3 5 6 7 8 2 11 4 10 13 14 9 15 12 17 18 19 20 16

Pla

n le

ngth

(nu

mbe

r of

act

ions

)

Problems

FFBFSOBFSLOBFS

0

200

400

600

800

1000

1200

1400

1600

21 23 22 24 25 26 27 28 29 30

Problems

Figure 14: DriverLog domain (plan length)

Problem 15 Problem 21 Problem 30Road junctions 12 28 90Drivers 4 10 40Packages 8 30 120Trucks 4 7 30Init atoms 227 607 2130Goals 10 38 163

FF OBFS LOBFS OBFS LOBFS LOBFSPlan length 44 44 54 184 193 1574Evaluated nodes 161 273 4 3266 8 38Search time 0.21 0.84 0.02 207.89 0.45 93.92Total time 0.26 0.97 0.14 209.40 1.96 284.65

Table 5: DriverLog (largest problems)

24

with running times much more important than the other planners ones. LOBFS solves allthe problems and is generally faster. The problem of LOBFS in this domain is about planquality: several plans found by LOBFS contain much more actions than plans found byother planners (see Figure 16). However, when looking closely to these results, we can makethe following observation: problems for which LOBFS returns a very long plan comparedto other planners are often the ones that cause difficulty to these planners. For instance,problem 22 is solved in 1770 seconds by FF, in 53 seconds by OBFS, but in only 4 secondsby LOBFS. The returned plans are 25 actions long for FF, 18 actions long for OBFS, and54 actions long for LOBFS.

LOBFS develops only one node in 8 problems, and never develops any node in all otherproblems.

6.3.2 Freecell domain

It is the familiar solitaire game found on many computers, involving moving cards from aninitial tableau, constrained by tight restrictions, to achieve a final suit-sorted collection ofstacks.

In this domain, FF is more efficient than OBFS, although this last is faster for 3 problems(up to 23 times for problem 20, see Figure 17). It is interesting to note that the enforced hill-climbing strategy of FF fails to find a solution for these problems, and FF must switch toa complete best-first search algorithm. OBFS does not have this problem, as the optimisticbest-first search algorithm is able to handle helpful actions until a solution is found or thestate space is completely explored without finding a plan. OBFS is generally more efficientthan BFS, except for two problems where BFS is slightly faster (the difference is not reallysignificant). LOBFS is more efficient than BFS and OBFS, and tends to be faster than FF:up to 67 times for problem 20. However, problem 19, solved by FF, is not solved by any ofour planners. Plan quality is equivalent for all planners, plans found by LOBFS being oftenslightly longer (see Figure 18).

Some problems seem to be more difficult for LOBFS than problems from domains wehave studied so far, because LOBFS develops more nodes: five problems require between70 and 250 nodes to be developed, while the other problems require less than 3 nodes. Thisis however limited compared to the number of nodes developed by OBFS: for instance inproblem 18, OBFS develops 6915 nodes and LOBFS only 250.


Although not as impressive as for the five first domains we studied, the improvementsobtained by using our lookahead technique are still interesting for these two domains, asLOBFS has much better performances than OBFS. This last compares well to FF, and ismore efficient than BFS.

The loss in quality of plan solution observed for LOBFS remains limited to a smallnumber of problems, and for example in Mprime domain where LOBFS solves all problemsin less than 10 seconds, we could use LOBFS for getting a solution as fast as possible andthen another planner to get a better solution.

Other techniques within our planner can also be imagined; for example we could let itrun when a solution is found and try to get a better one in an anytime way. We have somepreliminary results of such a technique that are not yet enough convincing and will not bepresented here.

6.4 Difficult problems

The domains we test here are very difficult for all the planners we use. While still upgradingthe efficiency of OBFS, our lookahead technique seems to be less useful than in previous

25

0.1

1

10

100

1000

10000

25 1 28 35 4 32 7 11 12 3 9 5 2 29 26 27 34 16 31 8 19 21 33 17 23 30 24 6 18 20 15 22 10 13 14

Tim

e (s

econ

ds)

Problems

FFBFSOBFSLOBFS

Figure 15: Mprime domain (CPU time)

0

10

20

30

40

50

60

70

80

90

25 1 28 35 4 32 7 11 12 3 9 5 2 29 26 27 34 16 31 8 19 21 33 17 23 30 24 6 18 20 15 22 10 13 14

Pla

n le

ngth

(nu

mbe

r of

act

ions

)

Problems

FFBFSOBFSLOBFS

Figure 16: Mprime domain (plan length)

26

0.1

1

10

100

1000

10000

1 2 3 4 5 6 8 9 12 10 13 16 7 15 17 14 11 18 20 19

Tim

e (s

econ

ds)

Problems

FFBFSOBFSLOBFS

Figure 17: Freecell domain (CPU time)

0

20

40

60

80

100

120

140

160

1 2 3 4 5 6 8 9 12 10 13 16 7 15 17 14 11 18 20 19

Pla

n le

ngth

(nu

mbe

r of

act

ions

)

Problems

FFBFSOBFSLOBFS

Figure 18: Freecell domain (plan length)

27

domains. The trade-off between speed and quality is not so clearly in favor of LOBFS.

6.4.1 Depots domain

This domain was devised in order to see what would happen if two previously well-researcheddomains were joined together. These were the logistics and blocks domains. They are com-bined to form a domain in which trucks can transport crates around and then the cratesmust be stacked onto pallets at their destinations. The stacking is achieved using hoists, sothe stacking problem is like a blocks-world problem with hands. Trucks can behave like“tables”, since the pallets on which crates are stacked are limited.

Problems from this domain are difficult for all planners (see Figure 19). FF solves moreproblems than the other planners: it solves 21 problems on 22, LOBFS solves 19 problems,OBFS solves 17 problems and BFS solves 14 problems. FF is generally more efficient thenOBFS, with the except of 4 problems solved much faster by OBFS: it is up to 2000 timesfaster on problem 8. As in the Freecell domain, 3 of these problems (problems 4, 8 and 10)require that FF switches to a complete best-first search algorithm, unable to handle helpfulactions contrary to OBFS. BFS is only slower than OBFS, while LOBFS is generally fasterbut in 2 problems and one problem solved by OBFS and not by LOBFS. This last compareswell to FF and is often faster, but misses two problems more than FF. The plans returnedby LOBFS are also significantly longer than plans returned by other planners in 7 problems(see Figure 20).

In this domain, LOBFS develops a lot of nodes for several problems: between 9 and6659 nodes for 10 problems, and less than 3 for 9 problems. But it still develops very fewernodes than OBFS: for example in problem 9, OBFS develops 48629 nodes while LOBFSdevelops only 6659 nodes.

The interest of our lookahead technique is debatable for this domain because of planquality, but seems to be still interesting: 2 more problems are solved and LOBFS is generallyfaster than OBFS.

6.4.2 Mystery domain

What is very difficult to handle for heuristic search planners in this domain is the fact thatsome problems do not admit a solution plan. It is some time trivial, and can be determinedby the instantiation procedure; but it more often requires to exhaustively explore the searchspace.

This domain is the most difficult one we used in our experiments, for all planners (seeFigure 21). Among the 30 problems tested, only 16 are solved by FF, 14 are solved by BFS,and 18 by OBFS and LOBFS. BFS is the slowest planner, except on problem 9 that it solvesmuch better than OBFS and LOBFS. FF is generally faster than OBFS and LOBFS, butsolves less problems. OBFS and LOBFS have very close performances, with the except of4 problems solved much better by LOBFS (up to 1050 times faster on problem 19). Themain drawback of LOBFS, as in the Depots domain, is the quality of plans that it returns(see Figure 22): 9 plans found by LOBFS are significantly longer than plans returned by theother planners.

LOBFS develops less than 3 nodes in 15 problems, but develops a lot of nodes for the 3other problems. On problem 12, which admits no solution plan, OBFS and LOBFS developexactly the same number of nodes (1032977 nodes); but on problems 6 and 9, LOBFSdevelops slightly more nodes than OBFS: 15729 against 15663 for problem 6 and 3976against 3960. The running time overhead due to the lookahead computation in LOBFSremain limited: LOBFS solves problem 6 in 1530 seconds against 1314 seconds for OBFS.

28

0.1

1

10

100

1000

10000

1 2 3 10 13 7 16 17 11 19 18 21 14 4 8 5 9 22 12 15 6 20

Tim

e (s

econ

ds)

Problems

FFBFSOBFSLOBFS

Figure 19: Depots domain (CPU time)

0

20

40

60

80

100

120

140

160

1 2 3 10 13 7 16 17 11 19 18 21 14 4 8 5 9 22 12 15 6 20

Pla

n le

ngth

(nu

mbe

r of

act

ions

)

Problems

FFBFSOBFSLOBFS

Figure 20: Depots domain (plan length)

29

0.1

1

10

100

1000

10000

1 7 18 25 28 11 29 27 3 26 30 2 20 19 17 15 13 14 9 6 12

Tim

e (s

econ

ds)

Problems

FFBFSOBFSLOBFS

Figure 21: Mystery domain (CPU time)

0

5

10

15

20

25

1 7 18 25 28 11 29 27 3 26 30 2 20 19 17 15 13 14 9 6 12

Pla

n le

ngth

(nu

mbe

r of

act

ions

)

Problems

FFBFSOBFSLOBFS

Figure 22: Mystery domain (plan length)

30


Due to the loss in plan quality, the use of our lookahead technique is less interesting than inprevious studied domains; it however allows to find plans for problems where OBFS fails todo so, and to get a better running time for some problems. Further developments of the ideaspresented in this paper should concentrate on improving the behavior of LOBFS for suchdomains, where there are a lot of subgoal interactions as in the Depots domain, or limitedresources as in the mystery domain.

6.5 Lookahead utility

7 Related works

8 Conclusion

Plans extracted from relaxed planning graphs are used in the FF planning system for pro-viding an estimation of the length of the plans that lead to goal states. We presented a newmethod for deriving information from these relaxed plans, by the computation of lookaheadplans. A lookahead plan, calculated by a polynomial algorithm, is composed of as mostactions as possible from a given relaxed plan. It is valid for the state for which the relaxedplan is extracted in the sense that its application to this state is possible and leads to an-other state called lookahead state. We then expose how to employ the produced lookaheadplans in a modified version of a classical best-first search algorithm to be used by a forwardstate-space planner. For each state evaluated by the heuristic function, a lookahead planis computed and the state that it leads is added to the list of nodes to be developed, in thesame way the states computed by using the actions executable in that state are added to thatlist: lookahead states thus produce supplementary nodes in the search tree. The differencewith classical algorithms is that the vertices that lead to lookahead states are labeled by aplan and not by only one action. We then improved this search algorithm by using helpfulactions in a way different than in FF, that preserves completeness of the search algorithmin a strategy we called “optimistic”. Instead of using them in a local search, and loosingactions that are not considered as helpful, we introduce a criterium for choosing nodes thatalways prefers to develop states that can be developed with helpful actions over states thatcannot. Although lookahead states are generally not goal states and the branching factoris increased with each created lookahead state, the experiments we conducted prove that innumerous domains (Rovers, Logistics, DriverLog. . . ), our planner can solve problems thatare up to ten times bigger (in number of actions of the initial state) than those solved by FFor by the optimistic best-first search without lookahead (we got plans with more than 2000actions long). The efficiency for problems solved by all planners is also greatly improvedwhen using our lookahead strategy. In domains that present more difficulty for all planners(Mystery, Depots), the use of the lookahead strategy can still improve performances forseveral problems. There is very few problems for which the optimistic search algorithm isbetter without lookahead. The price to pay for such improvements in the performances andin the size of the problems that can be handled resides in the quality of solution plans thatcan be in some cases severely damaged. However, there are few of such plans and qualityremain generally very good compared to FF.

Acknowledgments

To be included. . .

31

0.1

1

10

100

1000

10000

0 10 20 30 40 50 60 70 80 90 100

Acc

eler

atio

n LO

BF

S /

OB

FS

Lookahead utility

Figure 23: Utility

0

20

40

60

80

100

120

0 10 20 30 40 50 60 70 80 90 100

Lookahead utility

Number of problemsMean acceleration

Figure 24: Utility

32

References

[BF95] A. Blum and M. Furst. Fast planning through planning-graphs analysis. InProc. IJCAI-95, pages 1636–1642, 1995.

[BF97] A. Blum and M. Furst. Fast planning through planning-graphs analysis. Artifi-cial Intelligence, 90(1-2):281–300, 1997.

[BG01] B. Bonet and H. Geffner. Planning as heuristic search. Artificial Intelligence,129(1-2):5–33, 2001.

[BLG97] B. Bonet, G. Loerincs, and H. Geffner. A robust and fast action selectionmechanism for planning. In Proc. AAAI-97, pages 714–719, 1997.

[CRV01] M. Cayrol, P. Regnier, and V. Vidal. Least commitment in Graphplan. ArtificialIntelligence, 130(1):85–118, 2001.

[HG00] P. Haslum and H. Geffner. Admissible heuristics for optimal planning. In Proc.AIPS-2000, pages 140–149, 2000.

[HN01] J. Hoffmann and B. Nebel. The FF planning system: Fast plan generationthrough heuristic search. JAIR, 14:253–302, 2001.

[KMS96] H. Kautz, D. McAllester, and B. Selman. Encoding plans in propositional logic.In Proc. KR-96, pages 374–384, 1996.

[KNHD97] J. Koehler, B. Nebel, J. Hoffmann, and Y. Dimopoulos. Extending planning-graphs to an ADL subset. In Proc. ECP-97, pages 273–285, 1997.

[KS96] H. Kautz and B. Selman. Pushing the envelope: Planning, propositional logicand stochastic search. In Proc. AAAI-96, pages 1194–1201, 1996.

[KS99] H. Kautz and B. Selman. Unifying SAT-based and Graph-based planning. InProc. IJCAI-99, pages 318–325, 1999.

[LF99] D. Long and M. Fox. The efficient implementation of the plan-graph in STAN.JAIR, 10:87–115, 1999.

[MGH � 98] D. McDermott, M. Ghallab, A. Howe, C. Knoblock, A. Ram, M. Veloso, andD. Weld. The Planning Domain Definition Language. 1998.

[NK00] X.L. Nguyen and S. Kambhampati. Extracting effective and admissible statespace heuristics from planning-graph. In Proc. AAA1-2000, pages 798–805,2000.

[NK02] X.L. Nguyen and S. Kambhampati. Planning graph as the basis for derivingheuristics for plan synthesis by state space and CSP search. Artificial Intelli-gence, 135(1-2):73–123, 2002.

[VR99] V. Vidal and P. Regnier. Total order planning is better than we thought. In Proc.AAAI-99, pages 591–596, 1999.

[WAS98] D.S. Weld, C.R. Anderson, and D.E. Smith. Extending Graphplan to handleuncertainty and sensing actions. In Proc. AAAI-98, pages 897–904, 1998.

33

Documents

A lookahead strategy for heuristic search planningv.vidal.free.fr/onera/publis/look.pdfA lookahead strategy for heuristic search planning Vincent Vidal IRIT – Universite´ Paul Sabatier