

Integrated Computer-Aided Engineering 22 (2015) 343–360
DOI 10.3233/ICA-150493
IOS Press

Reactive execution for solving plan failures in planning control applications

Cesar Guzman a, Pablo Castejon a, Eva Onaindia a,∗ and Jeremy Frank b
a Universitat Politècnica de València, Valencia, Spain
b NASA Ames Research Center, Moffett Field, CA, USA

Abstract. We present a novel reactive execution model for planning control applications which repairs plan failures at runtime. Our proposal is a domain-independent regression planning model which provides good-quality responses in a timely fashion. The use of a regressed model allows us to work exclusively with the sufficient and necessary information to deal with the plan failure. The model performs a time-bounded process that continuously operates on the plan to recover from incoming failures. This process guarantees that there always exists a plan repair for a plan failure at any time. The model is tested on a simulation of a real-world planetary space mission and on a well-known vehicle routing problem.

Keywords: Reactive planning, dynamic execution, monitoring plan execution, reactive execution agent, unpredictable environment

1. Introduction

The application of Artificial Intelligence (AI) planning techniques is helping industries to improve their efficiency and performance in a great variety of applications: manufacturing and telecommunication networks [31,37], education [20], route planning [10,41], military and civilian coalition operations [36], space exploration [9], etc.

In general, even though much of the research on AI planning is aimed at generating domain-independent planning technology, the application of planning to industry gives rise to special-purpose systems, which are expensive to extend to other cases. On the other hand, there exist few systems that integrate automated planning and plan execution and this is, perhaps, one of the main causes of the relatively low deployment of automated planning applications [22]. While the primary focus of planning is on deliberative tools to calculate plans that achieve operation goals, the focus of execution is on developing control methods over relatively short time spans to ensure the plan actions are executed stably [3].

∗Corresponding author: Eva Onaindia, Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València, Spain. Tel.: +34 963 877 755; Fax: +34 963 877 359; E-mail: [email protected].

Most planning and execution (P&E) systems follow an integrated approach in which the execution monitoring system is integrated with the planner or vice versa. Systems like IXTET-EXEC [29] or TPOPEXEC [46] work under a continual planning approach [6], interleaving planning and execution in a world under continual change. Another example can be found in [9], where the planner of a spacecraft continuously operates on the plan execution to repair failures. IDEA (Intelligent Distributed Execution Architecture) [1] is a real-time architecture that proposes a unified view of deliberation and execution where the planner is embedded within the executor; and T-REX [32] (Teleo-Reactive EXecutive) is a deliberative P&E system for AUV control inspired by IDEA. This type of unified approach allows only for a strict and controlled interleaving of P&E, making it difficult to have a general-purpose planner for different types of executor systems.

Some of the aforementioned P&E systems [29,46] deal with temporal plans and focus on a unified approach that accommodates flexible plan executions, but they are not concerned with providing responses in a timely fashion. In contrast, the works in [1,32], besides generating plans over relatively long time periods, also introduce a reactive planner to allow robust performance in dynamic environments. The term reactive planning has been approached from different perspectives [8]:

ISSN 1069-2509/15/$35.00 © 2015 – IOS Press and the author(s). All rights reserved. This article is published online with Open Access and distributed under the terms of the Creative Commons Attribution Non-Commercial License.

– Responding very quickly to changes in the environment through a reactive plan library that stores the best course of action for each possible contingency.

– Choosing the immediate next action on the basis of the current context; in this case, a deliberative procedure can be used to specify the next act.

– Using more complex constructs in order to handle execution failures or environmental changes.

The first approach, used by early P&E systems, implies storing a plan for each possible state of the world, an option which is not affordable in highly dynamic environments. Hierarchical control structures provide a mechanism to choose the immediate next action when a quick response is required in unpredictable environments [7]. This is the approach followed by the models that emphasize reactiveness, computing just one next action at every instant, based on the current context [32]. The use of more complex structures allows for more deliberative (long horizon) responses rather than short-term reactiveness but, in practice, none of these reactive frameworks has ever exploited the idea of providing quick deliberative responses. They react to a change or failure but they do not guarantee a response within a limited time.

A key aspect of reactive planning is that it is not only about providing quick responses to changes but also about preserving plan stability [18] and guaranteeing that the operation goals achieved by the plan are still reachable after the plan repair. In contrast to control applications that use condition monitoring for anticipating and reacting to faults [4,40], reactive planning is about repairing a fault when it is detected while trying to maintain the executability of the rest of the plan.

The focus of this work is on the development of a reactive P&E system capable of providing fast deliberative responses to repair a failed action without explicitly representing contingency branches and eligible plans for each possible state of the world. Our system does not simply return the immediate next executable action but operates over a planning horizon that is longer than the minimum latency interval starting at the current execution time. This idea was exposed, though never exploited, in IDEA [13], which in practice works with the minimal planning horizon in order to reduce the reactive planner's search space; that is, the shorter the planning horizon, the more reactivity, but also the less contextual information to repair the failure. The idea of a planning horizon has also been exploited in non-reactive dynamic planning applications [47].

In this work, we propose a novel reactive P&E model that keeps track of the execution of a plan and repairs the incoming failures. The executor-repairing system incorporates a reactive planning procedure which is exclusively used for plan repair and is independent of the deliberative planner that computes the solution plan for the problem. Unlike the integrated P&E approaches mentioned before, ours is a highly modular and reconfigurable P&E architecture. The reactive planner is specialized in small plan repairs that must be accomplished promptly. Additionally, it pre-computes an initial search space which encodes solution plans for potential failures in a fragment of the plan named plan window (we use this term equivalently to the concept of planning horizon in IDEA [1]). The construction of the search space is a time-bounded process subject to the agent's execution latency and the number of execution cycles in the plan window, whose objective is to have a solution plan available when a failure arises. Once the search space is built, the agent proceeds with the plan execution, and simultaneously the reactive planner computes the search space of the next plan window. The reactive model is thus capable of deducing strict limits on the length of the plan window and computing the largest search space within the time limit. Then, if a failure occurs, the corresponding search space is used to find a recovery plan. This gives our reactive model an anytime-like behaviour.

Our model contributes several novelties: (a) it is a domain-independent P&E model that can be exploited in any application context; (b) it is independent of the deliberative planner; (c) it trades off deliberative and reactive mechanisms to provide good-quality responses in a timely fashion; (d) it avoids dealing with unnecessary information from the world, handling specifically the information relevant to the failure; and (e) it performs a time-bounded deliberative process that permits continuous operation on the plan to repair problems during execution.

This paper is organized as follows. Section 2 presents some related work and Section 3 outlines the general P&E architecture where the reactive model is integrated. The two following sections present some formal concepts and introduce the reactive planning model to recover from plan failures. Section 6 presents a motivating example on the Mars space mission. Section 7 presents the model evaluation, Section 8 discusses some limitations of the model and, finally, the last section concludes and outlines future research lines.

2. Related work

Classical planning refers generically to planning for state-transition systems that adopt a series of assumptions: the system is deterministic and static, actions are instantaneous transitions, and the planner is not concerned with any change that may occur in the system while it is planning [21]. In contrast, reactive planning operates in a timely fashion with highly dynamic, non-deterministic and unpredictable environments, assuming uncertainty in the world and the existence of multiple outcomes due to action failures or exogenous events [34]. Our reactive planner is not a temporal planner but it is designed to return timely responses in highly dynamic environments.

Regarding non-deterministic planning, the study of finding the sequence of actions or events that explains the current observed state of the world is called diagnosis. Most of the research on diagnosis puts the emphasis on malfunctioning components and on obtaining a plan to recover from the component failure rather than on monitoring a plan execution [5,43]. In particular, the work in [43] provides a formal characterization of diagnosis and its relation to planning and, in [5], the authors propose an evolutionary strategy that successfully diagnoses several types of component failures, such as the separation of a body part or the complete failure of a sensor or motor.

Planning in non-deterministic environments has also been addressed from a probabilistic perspective, representing probabilities over the expected action outcomes or belief state space. Planning based on Markov Decision Processes is a well-known approach to deal with non-determinism when an accurate probability distribution on each state transition is available [27,33]. In highly dynamic environments where exogenous events frequently occur, it is not possible to have a model of the uncertainty in the world. Likewise, a Finite State Machine (FSM) can also be used to implement a reactive behaviour [39], but an FSM requires an explicit modeling of a finite set of plans (states and transitions) and it is particularly aimed at choosing only the immediate next action. When a plan is to be executed in an unpredictable environment, it is impractical to have all potential repair plans explicitly represented, and the emphasis is not only on short-term reactivity but also on preserving the achievability of the operation goals. For this reason, a time-bounded reactive planner that promptly calculates recovery plans over a planning horizon is the more suitable solution.

3. An architecture for planning and execution

Our work takes place in the context of PELEA [24], a single-agent architecture in which an agent is endowed with capabilities for generating a plan, executing it, monitoring the plan and, optionally, learning. Our ultimate goal of extending this model to a multi-agent context is discussed in Section 8. Before addressing this issue, our purpose is to have an agent equipped with a reactive execution mechanism that enables the agent to repair a plan at runtime, thus avoiding the need to resort to the deliberative planner each time a failure occurs. In the following, we will refer to the concept of agent, specifically an execution agent, as an autonomous entity capable of performing reasoning and communicating with other entities of the system like the deliberative planner, which offers a planning service.

In our approach, a planning service provides execution agents with independent plans to be executed. An agent, which is an extension of a PELEA1 agent [24], executes and monitors one action at a time and calls its repairing mechanism for a recovery plan whenever a failure arises. In case the agent is unable to solve the failure on its own, it will request a new plan from the planning service.

The focus of this paper is on the repairing mechanism of the execution agent. The planning module embedded into the execution agent is a reactive planner, which is used to recover from failures at runtime. The components of an execution agent (see Fig. 1) are:

– Execution module (EX). The EX is initialized with a planning task, whose current state is read from the environment through the sensors. The EX is responsible for reading and communicating the current state to the rest of the modules, as well as executing the actions of the plan in the environment.

– Monitoring module (MO). The main task of the MO is to verify that the actions are executable in the current state before sending them to the EX. When the EX reports to the MO the state resulting from the execution of some action of the plan, the MO checks whether the next action of the plan is executable in the resulting state. This process is called plan monitoring: verifying whether the values of the variables of the received state match the expected values. If they do not, the MO determines the existence of a plan failure.

1A more detailed description may be found at http://servergrps.dsic.upv.es/pelea/.

Fig. 1. Flow of the reactive execution model.

– Reactive Planner module (RP). The RP is used when a plan failure is detected by the MO and a recovery is required. The RP uses a pre-computed search space, called repairing structure, to promptly find a plan that brings the current state to one from which the plan execution can be resumed (see Section 5). In case the RP is not able to find a plan with its repairing structure, the MO requests a new plan from the planning service.

The EX, MO, and RP modules of an execution agent operate the Reactive Execution Model. The control flow of the reactive execution model is shown in Fig. 1. An action plan Π for solving a planning task is calculated by the planning service. Π consists of a series of actions to be executed at given time steps, each of which makes a deterministic change to the current world state. The elapsed time from one time step to the next defines an execution cycle; i.e., the monitor/acting/sensing cycle of an action execution. The model follows several execution cycles, performing the scheduled action in each cycle until the plan execution is completed.

Initially, the MO receives the plan Π from the planning service and, before sending Π to execution, it performs two operations:

1. It sends Π to the RP, which creates a repairing structure for a fragment of Π of length l called plan window, where l is the number of actions or execution cycles in the plan window. The repairing structure associated to the plan window contains information, in the form of alternative plans, to repair a failure that affects any of the l actions included in the plan window.

2. When the time of the RP expires, and a repairing structure has been calculated for a particular plan window, the MO monitors the variables of the first action of the window. If the sensed values of the action variables match the required values for the action to be executed, the MO sends the scheduled action to the EX for its execution (see Fig. 1). Otherwise, a failure is detected and the MO calls the RP, which will make use of the repairing structure to fix the failing action. Notice that a failure that occurs in the first action of a window is due to an exogenous event (e.g., other agents change the world state) and not due to an erroneous execution of the preceding action in the plan.

The MO receives the result of the sensing task from the EX after executing the action; it then updates the plan window accordingly by eliminating the already executed action and proceeds with the next action of the plan window. For instance, assuming a plan of five actions and a repairing structure for a plan window of l = 3 ([a1, a2, a3]), the plan window will be updated to [a2, a3] after successfully executing a1, and the RP will use the same repairing structure to fix a potential failure in the remaining actions of the plan window, that is, a2 or a3. Subsequently, the RP will create a new repairing structure, for example, for the plan window [a4, a5]. The flow goes on as long as no plan failures are encountered. In case a non-executable action is found, the MO reports the failure to the RP. Then, the RP uses the repairing structure associated to the plan window of the non-executable action and obtains a new plan Π′ that solves the failure and replaces the old plan Π, attaining likewise the goals of the planning task.
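The monitor/execute/shrink cycle described above can be sketched in a few lines of Python. This is an illustrative skeleton only, not code from the paper: the names `run_window`, `monitor`, `execute` and `repair` are hypothetical placeholders for the MO, EX and RP behaviours.

```python
# Sketch of the plan-window bookkeeping described above (illustrative names;
# monitoring, execution and repair are passed in as placeholder callables).

def run_window(window, repairing_structure, monitor, execute, repair):
    """Execute the actions of one plan window, shrinking the window after
    each success and invoking the repairing structure on a failure."""
    while window:
        action = window[0]
        if not monitor(action):          # plan monitoring failed for `action`
            new_plan = repair(repairing_structure, action)
            if new_plan is None:
                raise RuntimeError("repair failed: request a new plan "
                                   "from the planning service")
            return new_plan              # the recovery plan replaces Π
        execute(action)
        window.pop(0)                    # [a1, a2, a3] becomes [a2, a3]
    return None                          # window completed without failures

# Toy run: every action monitors fine, so the window simply empties.
log = []
result = run_window(["a1", "a2", "a3"], None,
                    monitor=lambda a: True,
                    execute=log.append,
                    repair=lambda t, a: None)
```

A real MO would also sense the resulting state after each `execute` call; the sketch only captures the window-shrinking and failure-dispatch logic.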

The RP is constantly working while the EX is executing the actions of the plan. Hence, besides having a repairing structure ready to attend to a failure in an action of the current plan window, the RP is also generating the subsequent structure for the next plan window. Typically, the time for the RP to compute the repairing structure of the subsequent plan window is the time that the EX will take to execute the actions included in the current window. Therefore, the more actions in the current plan window, the more time the RP will have to create the next repairing structure and, consequently, the longer the window associated with this repairing structure. This working scheme gives our model an anytime behaviour, thus guaranteeing that the RP can be interrupted at any time and will always have a repairing structure available to attend to an immediate plan failure.

On the other hand, some similarities between our model and the life cycle of a scientific workflow can be found. Following [23], we can draw this analogy: the modeling phase is equivalent to the planning task modeling; the deployment phase amounts to the plan Π calculated by the planning service; and the execution and monitoring phase would be the same in our model. Unlike scientific workflows, our model does not include an analysis phase; however, successive repetitions of the deployment (repair plan) and execution phases happen when a plan failure arises.

4. Formal model

In this section, we formalize the concepts of planning task, partial state and solution plan for a task as a sequence of partial states [21]. Our planning formalism is based on a multi-valued state-variable representation where each variable is assigned a value from a multiple-value domain (the finite domain of the variable). For modeling planning problems, we use PDDL3.1,2 the most recent version of the Planning Domain Definition Language [17] (PDDL).

Definition 1. Planning task. A planning task is given by the 4-tuple P = 〈V, I, G, A〉:

– V is a finite set of state variables, each associated to a finite domain, Dv, of mutually exclusive values that refer to objects in the world. v ∈ V maps a tuple of objects to an object p of the planning task, which represents the value of v. For example, in a planetary Mars rovers domain,3 a rover (B) can be placed at any of the waypoints w1, w2 or w3. Hence, the variable loc-B represents the location of rover B, and Dloc-B = {w1, w2, w3}.
A variable assignment or fluent is a function f on a variable v such that f(v) ∈ Dv, wherever f(v) is defined. A fluent is represented as a tuple 〈v, p〉, meaning that the variable v takes the value p. A total variable assignment or state applies the function f to all variables in V. A state is always interpreted as a world state. A partial variable assignment or partial state over V applies the function f to some subset of V.

2PDDL syntax definition introduced in 2008 by M. Helmert (http://ipc.informatik.uni-freiburg.de/PddlExtension/).

3Our PDDL specification of this domain can be found at http://servergrps.dsic.upv.es/planinteraction/resources/.

Fig. 2. Plan as a sequence of partial states. Variables are loc-B: location of rover B; link-w1-w2: map to travel from w1 to w2; com-r: communication of the results of analyzing the rock r. Underlined variables are the preconditions of the action Navigate.

– I is a state that represents the initial state of theplanning task.

– G is a partial state over V called the problem goalstate.

– A is a finite set of actions over V. An action a is defined as a pair of partial variable assignments a = 〈pre, eff〉 over V, called preconditions and effects, respectively. If an action is executable in a state, i.e. its preconditions hold in such a state, the values of the state variables (fluents) change as specified in the effects.

An action plan, ΠA, that solves a planning task P is a sequence of actions ΠA = 〈a1, . . . , an〉 that, applied in the initial state I, satisfies the goal state G. An action ai ∈ ΠA is executable in a world state S if the fluents contained in S satisfy the preconditions of ai; i.e. pre(ai) ⊆ S. The result of executing ai in a state S is a new state S′ that contains the fluents of S which are not modified by eff(ai) plus the fluents as specified in eff(ai). Then, executing ΠA in the initial state I results in a sequence of states 〈S1, . . . , Sn〉 such that S1 is the result of applying a1 in I, S2 is the result of applying a2 in S1, . . . , and Sn is the result of applying an in Sn−1. A plan ΠA is a solution plan iff G ⊆ Sn [21].
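These progression semantics can be sketched directly: a fluent is a (variable, value) pair, a state is a set of fluents, and an action pairs preconditions with effects. This is a minimal illustration, not the paper's implementation; the function names `executable` and `progress` are ours.

```python
# A fluent is a (variable, value) pair; a state is a set of fluents.
# An action is a dict pairing preconditions with effects, per Definition 1.

def executable(action, state):
    """True iff pre(a) ⊆ S, i.e. every precondition fluent holds in S."""
    return action["pre"] <= state

def progress(action, state):
    """Compute S′: keep the fluents of S whose variable is not reassigned
    by eff(a), then add the effect fluents."""
    assigned = {var for (var, _) in action["eff"]}
    kept = {(var, val) for (var, val) in state if var not in assigned}
    return kept | action["eff"]

# Example from the rover domain: Navigate B from w1 to w2.
navigate = {
    "pre": {("loc-B", "w1"), ("link-w1-w2", "true")},
    "eff": {("loc-B", "w2")},
}
state = {("loc-B", "w1"), ("link-w1-w2", "true"), ("com-r", "true")}
state2 = progress(navigate, state)
# state2 now maps loc-B to w2 and keeps the unmodified fluent ⟨com-r, true⟩.
```

Note that because variables are multi-valued and mutually exclusive, applying an effect on a variable implicitly deletes its previous value, which is why `progress` drops every fluent whose variable is reassigned.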

A plan can also be viewed from the point of view of the world conditions (fluents) that are necessary for the plan to be executed. That is, instead of viewing a plan as the result of its execution, we can view a plan as the necessary conditions for its execution. Thus, a plan can also be defined as a sequence of partial states, rather than world states, containing the minimal set of fluents that must hold in the world for the plan to be executable in such a world state.

Fig. 3. Plan as a sequence of partial states for the plan ΠA. Variables are loc-B: location of rover B; loc-L: location of lander L; loc-r-w3: location of the rock r; have-B: indicates if B has the rock r; link-wi-wj: map to travel from wi to wj; vis-w3-w2: location w2 is visible from w3; com-r: results of analyzing the rock r are communicated.

In the example depicted in Fig. 2, the partial state G is the goal state G of a planning task P, and it contains two fluents. Let us consider that the last action of a plan is (Navigate B w1 w2), which achieves the effect 〈loc-B, w2〉. Then, the fluents necessary to execute the action and achieve the fluents in G are represented in state G′. We can observe that G′ does not only contain the fluents that match the preconditions of the action Navigate (i.e., 〈loc-B, w1〉 and 〈link-w1-w2, true〉, which represent that the location of rover B must be the waypoint w1 and that a link between w1 and w2 must exist, respectively) but also the fluent 〈com-r, true〉. This is because this fluent (communicating the results of analyzing the rock r) is a goal of G that is not achieved by the effects of the action Navigate. Thereby, 〈com-r, true〉 is achieved earlier in the plan and it must hold in G′ in order to guarantee that it is satisfied in G.

The state G′ in Fig. 2 is called a regressed partial state because it is calculated by regressing the goals in G through the action Navigate. Likewise, the same regression can be applied to the rest of the actions of a given plan ΠA. Let ai be an action and G a goal state such that P = pre(ai), E = eff(ai) and E ⊆ G. The partial state G′ in which ai can be applied is calculated by the regressed transition function Γ(G, ai), defined as:

G′ := Γ(G, ai) := G \ E ∪ P (1)

G′ is a partial state that represents the minimal set of fluents that must hold in the world state in order to achieve G by means of the execution of ai. Notice that G′ includes P, the preconditions of ai, plus the fluents which are in G but are not produced by E (G \ E); i.e., the fluents that are achieved before G′ in the plan and must keep their values until G.
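Using the same set-of-fluents encoding as before, Eq. (1) is a one-line set computation. The sketch below reproduces the Fig. 2 example; the function name `regress` is illustrative.

```python
# Regression of a partial state G through an action a, per Eq. (1):
# G′ := Γ(G, a) := G \ E ∪ P, with P = pre(a), E = eff(a) and E ⊆ G.

def regress(goal, action):
    """Minimal partial state from which executing `action` achieves `goal`."""
    pre, eff = action["pre"], action["eff"]
    assert eff <= goal, "regression is defined only when E ⊆ G"
    return (goal - eff) | pre

# Fig. 2 example: regress G = {⟨loc-B, w2⟩, ⟨com-r, true⟩} through Navigate.
goal = {("loc-B", "w2"), ("com-r", "true")}
navigate = {
    "pre": {("loc-B", "w1"), ("link-w1-w2", "true")},
    "eff": {("loc-B", "w2")},
}
g_prime = regress(goal, navigate)
# G′ keeps ⟨com-r, true⟩ (not produced by Navigate) and adds the preconditions.
```

The result matches the G′ described in the text: the preconditions of Navigate plus the earlier-achieved goal fluent 〈com-r, true〉.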

The regressed partial state approach was first used by PLANEX [11] to supervise the execution of a sequence of actions. Plans are represented by means of a triangle table (this structure provides support for plan monitoring) and the problem goals are regressed from the last column of the table, including action preconditions, through the remaining actions of the plan. Roughly, the regression of a fluent over an action (through the regressed transition function Γ) is a sufficient and necessary condition for the satisfaction of the fluent following the execution of the action. The work in [19] formalizes this concept in the situation calculus language, whereas we apply the same formalization in PDDL.

Fig. 4. Plan ΠA for a planetary Mars rover domain.

Definition 2. Solution plan as a sequence of partial states. Given a solution plan ΠA = 〈a1, . . . , an〉 for a planning task P, the regressed plan for ΠA is defined as a chronologically ordered sequence of partial states 〈G0, G1, . . . , Gn〉, where:

Gn := G
G0 ⊆ I
Gi−1 := Γ(Gi, ai)

A regressed plan is denoted by ΠG0−Gn, where G0 is the initial partial state and Gn is the final partial state of the plan ΠA. Definition 2 specifies the relevant fluents at each time step for the successful execution of ΠA, where each ai ∈ ΠA is the relevant action for achieving Gi from Gi−1. Hence, a regressed plan ΠG0−Gn derived from ΠA denotes the fluents that must hold in the environment at each time step to successfully execute the actions in ΠA. In other words, this definition allows us to discern between the fluents that are relevant for the execution of a plan and those that are not. It exploits the idea of annotating plans with conditions that can be checked at execution time to confirm the validity of a plan [11]. That is, if the fluents of a partial state Gi hold in a world state S (Gi ⊆ S), then the actions comprised in the plan fragment ΠGi−Gn are executable in S, thus guaranteeing the goals of the planning task are achieved.

Fig. 5. Repairing structures for a rover B in a planetary Mars domain. a: regressed plan ΠG0−G5 for 〈a1, a2, a3, a4, a5〉; b1: the repairing structure of the plan window [a1, a2, a3]; b2: the repairing structure of the plan window [a4, a5].
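Definition 2 amounts to folding Γ backwards over the plan, and the monitoring condition is a subset test. The following sketch, under the same illustrative set-of-fluents encoding used earlier (the helper names are ours, not the paper's), builds the sequence 〈G0, . . . , Gn〉 and checks Gi ⊆ S.

```python
# Per Definition 2: Gn := G and G_{i-1} := Γ(G_i, a_i); plan monitoring then
# checks G_i ⊆ S against each sensed world state S.

def regress(goal, action):
    """Γ(G, a) = G \\ eff(a) ∪ pre(a), as in Eq. (1)."""
    return (goal - action["eff"]) | action["pre"]

def regressed_plan(actions, final_goal):
    """Return [G0, ..., Gn] for the plan ⟨a1, ..., an⟩ with Gn = G."""
    states = [final_goal]
    for action in reversed(actions):
        states.append(regress(states[-1], action))
    states.reverse()
    return states

def supports_suffix(partial_state, world_state):
    """If G_i ⊆ S, the plan suffix from G_i onwards is executable in S."""
    return partial_state <= world_state

# One-action plan: Navigate B from w1 to w2, with goal G of Fig. 2.
plan = [{"pre": {("loc-B", "w1"), ("link-w1-w2", "true")},
         "eff": {("loc-B", "w2")}}]
goal = {("loc-B", "w2"), ("com-r", "true")}
g0, g1 = regressed_plan(plan, goal)
# Monitoring succeeds in any world state that contains all fluents of G0,
# even if it also contains irrelevant fluents.
ok = supports_suffix(g0, g0 | {("loc-L", "w3")})
```

The subset test is exactly why the regressed view pays off for monitoring: only the relevant fluents of Gi are compared against the sensed state, never the full world state.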

Figure 3 shows the regressed plan ΠG0−G5 derived from the solution plan shown in Fig. 4, calculated through the successive application of the regressed transition function Γ. The plan in Fig. 4 shows the actions for a rover to gather a rock sample, communicate the results of analyzing the rock and navigate back to the initial position. Action a2, for instance, creates the fluent 〈have-B, r〉 in G2; and the partial state G1 is the result of applying Γ(G2, a2), which includes pre(a2) (the fluents which are underlined in node G1 of Fig. 3) plus the fluents that are in G2 but are not produced by eff(a2). Therefore, if the sensor reading returns a world state in which all of the fluents in G1 hold, then action a2 is executable in such a world state; if the fluents in G2 occur in the subsequent world state, then a3 is executable, and so on. The last partial state, G5, comprises G, the goals of the planning task. In terms of plan monitoring, G5 represents the fluents that satisfy the preconditions of a fictitious final action af, where pre(af) = G and eff(af) = ∅; i.e., G is monitored by checking the preconditions of af.
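The regression mechanics of Definition 2 can be sketched as follows. This is a minimal illustrative Python sketch, not the paper's (Java) implementation; the `Action` class and the dict-based encoding of fluents as variable-to-value pairs are our own assumptions.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    pre: dict   # preconditions: state variable -> value
    eff: dict   # effects: state variable -> value

def regress(G, a):
    """Gamma(G, a): pre(a) plus the fluents of G not produced by eff(a)."""
    G_prev = {v: p for v, p in G.items() if a.eff.get(v) != p}
    G_prev.update(a.pre)
    return G_prev

def regressed_plan(goal, plan):
    """Return <G0, ..., Gn> for a solution plan, with Gn := G (Definition 2)."""
    states = [dict(goal)]
    for a in reversed(plan):          # Gi-1 := Gamma(Gi, ai)
        states.insert(0, regress(states[0], a))
    return states

def holds(G, S):
    """G subset of S: the plan suffix starting at G is executable in S."""
    return all(S.get(v) == p for v, p in G.items())
```

Monitoring then amounts to checking `holds(Gi, S)` against each sensed world state S before executing the next action.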

As a final remark, we note that the reactive execution model is defined at the same granularity level as the planning model, and both use PDDL as the specification language. This eases the communication between the planning service and the execution agents and avoids the overhead of translating a high-level planning specification into a low-level description, as happens in other models [14].

5. Reactive execution model

The key concept of our reactive execution model is the repairing structure. Generally speaking, given a solution plan ΠA, a repairing structure T is a partial-state search tree that encodes recovery plans for a plan window of ΠA. Since nodes in T are partial states, the reactive execution model only handles the minimal data set that is necessary to carry out a repairing task.

Figure 5 shows two repairing structures, T1 and T2, for the regressed plan in Fig. 3 (for simplicity, we do


not show all the partial states that would be generated). T1 is the search tree associated to the plan window [a1, a2, a3], or, equivalently, to the regressed subplan ΠG0−G3. The length of the window is three (l = 3), because it comprises three actions, and the partial state G3 is called the root node (Gr) of T1. A path in T1 represents a (regressed) plan to reach the root node G3. Specifically, a path in a repairing structure is interpreted as a recovery plan that leads the current world state to another state from which the execution of the plan ΠA can be resumed. All recovery plans in T1 have one thing in common: they eventually guide the execution of the plan towards G3, the root node of T1. For instance, suppose that S is the set of fluents that represents the current world state such that G′16 ⊆ S (see Fig. 5 b1). The application of the plan Π′ = 〈a1, a4, a8, a6〉 to S will reach the partial state G3, from which the rest of the plan ΠA, 〈a4, a5〉, can be executed.

The tree T2 (see Fig. 5 b2) is associated to the plan window [a4, a5], or to the regressed plan ΠG3−G5. The number of repairing structures necessary to keep track of the execution of a plan ΠA depends on the time limit to create the search trees, which, in turn, delimits the size of each tree. Two parameters determine the size of a search space T: the length of the plan window associated to T (l), and the depth of the tree (d). In general, the larger the value of l, the more alternatives to find a recovery plan; and the deeper the tree, the longer the recovery plans comprised in T. The minimum value of d must be l + 1 in order to ensure that the tree comprises at least one action that repairs the first action of the plan window associated to T. On the other hand, the maximum value of d is determined by the available time to build T. In particular, in T1, d = 6, which results from l = 3 and the time limit to build T1 (Section 5.3 explains in detail how to estimate the maximum size of a repairing structure).

In the following, we explain (1) the process to build a repairing structure, (2) how to find a plan in a search tree to repair a failure, and (3) the analysis to estimate the size of the search tree.

5.1. Building a repairing structure T

The construction of the repairing structure T starts after estimating the size of T; i.e., when the values of l and d are known. The generation process, shown in Algorithm 1, consists of expanding T from the root node Gr via the application of the regressed transition function Γ(G, a) following Eq. (1) (line 6 of Algorithm 1).

The algorithm is a classical backward construction of a planning search space [21], where a node G is expanded until depth(G) = d, i.e., G reaches the maximum tree depth (line 4), or until G is superseded by another node that exists in the tree (lines 7 to 13 define a mechanism for the control of repeated states, which is detailed below).

Input: Gr, d
Output: T

 1: Q ← {Gr}, T ← {Gr}
 2: while Q ≠ ∅ do
 3:   G ← remove first node from Q
 4:   if depth(G) < d then
 5:     for all {a | a ∈ A is a relevant action to G} do
 6:       G′ ← Γ(G, a)
 7:       if G′ ∉ T then
 8:         if ∃ G′′ ∈ T | G′′ ⊂ G′ then
 9:           mark G′ as superset of G′′
10:         else
11:           Q ← Q ∪ G′
12:           set transition (labeled a) from G′ to G
13:           T ← T ∪ G′
14:   else
15:     Q ← ∅
16: return T

Algorithm 1: Generating the repairing structure T.

The purpose of Algorithm 1 is to generate multiple regressed plans from Gr. Unlike the application of Γ(G, a) in Definition 2, which departs from a given solution plan ΠA, such a plan does not exist when building a tree T. Actually, the aim of Algorithm 1 is precisely to find the relevant actions for a node G (line 5), and eventually create a plan ΠA that links two particular partial states.

The operation of regressing a fluent f in a node G over an action a checks whether a is a relevant action to achieve f or not. An action a is relevant for f, and originates an arc (G′, G) in T, if it does not cause any conflict with the fluents in G and G′. The construction of T then has to check two consistency restrictions: (1) that eff(a) does not conflict with the fluents in G, and (2) that pre(a) does not conflict with the fluents in G′.

We define Φ(G′, G) as the function that returns whether or not a conflict exists between the two sets of fluents G′ and G. Φ(G′, G) holds if ∃〈v, p〉 ∈ G and ∃〈v, p′〉 ∈ G′ such that p ≠ p′.
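Under the assumption that a partial state is stored as a dict mapping each state variable v to its value p (our encoding, not the paper's data structure), Φ can be sketched as:

```python
def conflicts(G1, G2):
    """Phi(G1, G2): true iff some variable v takes a value p in G2 and a
    different value p' in G1, i.e. two fluents assign v inconsistently."""
    return any(v in G1 and G1[v] != p for v, p in G2.items())
```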

Definition 3. Relevant action. Given a fluent f ∈ G, a is a relevant action for f if the following conditions hold:

1) f ∈ eff(a) and


2) ¬Φ(eff(a), G) and
3) G′ = Γ(G, a) ∧ ¬Φ(pre(a), G′)

The construction of T follows the application of Definition 3 for each fluent of a partial state G which has not reached depth(G) = d (lines 4 and 5 of Algorithm 1), and the expansion continues until no new partial states are added to the tree. The search space T is actually a graph due to the existence of multiple paths that reach the same partial state from the root node during the construction of T. Multiple paths arise because of actions like (Communicate rock B L w3 w2) and (Communicate soil B L w3 w2), which can be executed in either order, or because of reversible actions like (Navigate B w1 w2) and (Navigate B w2 w1). Consequently, T may contain many redundant paths. A set of state variables induces a state space whose size is exponential in the number of variables, and, for this reason, planning, like many search problems, suffers from a combinatorial explosion. Even though nodes in T are partial states that contain far fewer fluents than world states, the large size of the repairing structures is sometimes unaffordable for a reactive system. With the aim of reducing the size of T, we only consider for expansion the fluents of G that are related to the relevant variables, that is, the variables involved in the preconditions of the actions of the plan window. Thus, given a plan window [a1, a2, a3], we approximate T by expanding only the fluents related to the relevant variables involved in the set pre(a1) ∪ pre(a2) ∪ pre(a3), which is actually the set of fluents that might need to be repaired. The time complexity of Algorithm 1 corresponds to the classical complexity of the generation of a tree, that is, O(b^d), where b is the estimated branching factor of T, detailed in Section 5.3.
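The relevance test of Definition 3 and the backward expansion of Algorithm 1 can be rendered in Python as follows. This is an illustrative sketch under our own assumptions (dict-based partial states, an `Action` dataclass, frozensets as node keys), not the paper's Java implementation; it also omits the relevant-variable filtering described above.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    pre: dict   # preconditions: state variable -> value
    eff: dict   # effects: state variable -> value

def regress(G, a):
    """Gamma(G, a): pre(a) plus the fluents of G not produced by eff(a)."""
    G_prev = {v: p for v, p in G.items() if a.eff.get(v) != p}
    G_prev.update(a.pre)
    return G_prev

def conflicts(G1, G2):
    """Phi: some variable takes different values in G1 and G2."""
    return any(v in G1 and G1[v] != p for v, p in G2.items())

def relevant(a, G):
    """Definition 3: a achieves some fluent of G without conflicts."""
    if not any(a.eff.get(v) == p for v, p in G.items()):   # condition 1
        return False
    if conflicts(a.eff, G):                                # condition 2
        return False
    kept = {v: p for v, p in G.items() if a.eff.get(v) != p}
    return not conflicts(a.pre, kept)                      # condition 3

def build_tree(G_root, actions, d):
    """Backward breadth-first expansion of T from the root node, pruning
    repeated states and supersets of existing nodes (Algorithm 1)."""
    root = frozenset(G_root.items())
    depth = {root: 0}                  # node -> depth in T
    parent = {}                        # node -> (action name, parent node)
    queue = deque([root])
    while queue:
        key = queue.popleft()
        if depth[key] >= d:            # maximum tree depth reached
            continue
        G = dict(key)
        for a in actions:
            if not relevant(a, G):
                continue
            child = frozenset(regress(G, a).items())
            if child in depth:         # repeated state: keep shortest path
                continue
            if any(k < child for k in depth):   # superset of existing node
                continue
            depth[child] = depth[key] + 1
            parent[child] = (a.name, key)
            queue.append(child)
    return depth, parent
```

The `parent` map records, for each node G′, the action labeling the arc (G′, G), so a recovery plan is read off by following parents up to the root.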

The generation process makes two nodes in T be connected by a unique simple path. Since we are interested in keeping only the shortest (optimal) paths, the construction of T prunes repeated states (line 7 in Algorithm 1) and avoids the expansion of superset nodes (lines 8 and 9). Let's assume that T contains a path from a node G to the root node Gr of T. A node G′ such that G ⊂ G′ is said to be a superset of node G. In this case:

– G stands for the minimal set of fluents that must hold in S in order to execute the actions of the path that reaches Gr.

– The best recovery plan from G is also the best path from G′ because the RP returns the shortest plan to Gr.

All in all, a repairing structure encodes the optimal path between each pair of nodes for which a recovery plan can be found. Once the RP has created T, it communicates to the MO all the variables involved in T.

5.2. Repairing a failure

When an action of the plan window associated to a repairing structure T fails, the RP finds a way to keep the plan going, either by reaching a partial state in T from which to execute the faulty action again, or by reaching another state from which to execute a later action of the plan window.

Let T be a repairing structure of a regressed plan ΠG0−Gr associated to the plan window [a1, . . . , ar] of a plan ΠA. When an action in [a1, . . . , ar] fails, a repairing task defined as R = 〈S, Gt〉 is activated, where S⁴ is the set of fluents of the current world state and Gt is the target state we want to reach in T. The node Gt varies depending on the failed action and the particular repairing task for such action. Since several recovery plans can be found to fix a faulty action, the RP will successively execute repairing tasks until one of them succeeds in fixing the action. This way, if the erroneous action is a1, the RP will first try the repairing task R = 〈S, G0〉; otherwise, it will try R = 〈S, G1〉 and so on until Gt = Gr; if the failure occurs in a2, the first attempt will be R = 〈S, G1〉 and the last attempt will be for Gt = Gr; in the case that the failure affects ar, only two repairing tasks can be realized, R = 〈S, Gr−1〉 and R = 〈S, Gr〉.

More formally, given R = 〈S, Gt〉 for a faulty action a, the RP applies a modified breadth-first search from Gt until a node Gs that satisfies Gs ⊆ S is found in T. Gs is a state consistent with S, a state that comprises all the necessary fluents to execute in the current world state the plan formed with the actions from Gs to Gt. If Gs ⊆ S is found, the recovery plan from Gs to Gt is concatenated with the plan from Gt to Gr (unless Gt = Gr), and with the plan from Gr to Gn, where Gn is the last state of the original plan ΠA, which contains G, the problem goal state. If Gs ⊆ S is not found, the RP will execute the subsequent repairing task R = 〈S, Gt+1〉 until one of them successfully retrieves a recovery plan or R is invoked with Gt = Gr and a plan is not found. In this latter case, T does not comprise the necessary information to find a

⁴ Technically speaking, the MO does not communicate to the RP all of the fluents in S but only the values of the variables that appear in T; these variables were sent by the RP to the MO after building T.


recovery plan and the planner service is invoked, whichperforms a replanning task.
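The successive repairing tasks can be sketched as follows. We assume a children-map encoding of the tree (each node maps to its child nodes and the actions labeling the arcs) and nodes stored as frozensets of (variable, value) fluents; these representation choices are ours, not the paper's.

```python
from collections import deque

def find_recovery_plan(children, G_t, S):
    """Modified breadth-first search for R = <S, Gt>: walk down the tree
    from the target node G_t until a node Gs consistent with the world
    state S (Gs subset of S) is found; return the action path from Gs up
    to G_t, or None if T lacks the needed information."""
    world = frozenset(S.items())
    queue = deque([(G_t, [])])
    seen = {G_t}
    while queue:
        G, path = queue.popleft()
        if G <= world:                     # Gs ⊆ S found
            return list(reversed(path))    # actions executed from Gs to G_t
        for a, child in children.get(G, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, path + [a]))
    return None

def repair(children, targets, S):
    """Try R = <S, G0>, then <S, G1>, ... up to the root node, in order."""
    for G_t in targets:
        plan = find_recovery_plan(children, G_t, S)
        if plan is not None:
            return G_t, plan
    return None, None                      # fall back to replanning
```

When `repair` returns `(None, None)`, the structure holds no recovery plan and the deliberative planner service would be invoked, as described above.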

Let's see how the repairing task procedure applies to a particular example. Consider that a failure occurs in the action a1, Navigate B w2 w3, in the subplan ΠG0−G3 of the repairing structure T1 of Fig. 5 b1. Let's assume that the failure is due to a wrong location of rover B, which is not at w2 but at w1.

– The first repairing task is R = 〈S, G0〉, where G0 ⊆ G′12. The only two nodes that are reachable from G′12 are G′17 and G′21. If neither of these two states matches the current world state S, that is, G′17 ⊈ S and G′21 ⊈ S, then no plan repair actually exists to reach G0 from S in T1 and, hence, there is no way to return rover B to w2 from w1.

– Assuming there is no solution to move rover B back to w2, the RP attempts the next repairing task R = 〈S, G1〉, where G1 ⊆ G′7 in T1 is a partial state in which B analyzes the rock at location w3 and communicates the results to the lander. Hence, for every descendant node Gs of G′7 (excluding G′12 and its descendant nodes, which were already explored in the previous repairing task), the RP checks whether Gs ⊆ S holds, in which case the RP will return the plan from Gs to G′7 concatenated with the plans 〈a2, a3〉 and 〈a4, a5〉. Let's assume that G′13 ⊆ S holds, in which case a path that reaches G′7 from S actually exists. This path is formed of the action a6, Navigate B w1 w3, that moves the rover B from w1 to w3, where the rover has to analyze the rock. Then, the RP returns the recovery plan 〈a6〉, concatenated with 〈a2, a3〉 (analyze the rock and communicate the results, respectively) and concatenated with 〈a4, a5〉 (the rest of the actions in ΠA).

Two issues related to the repairing task are worth mentioning here. First, it is important to highlight that the choices to find a recovery plan increase when the target state is closer to the root node of the tree. This can be graphically observed in Fig. 6. The top figure shows a tree T of depth d associated to a plan window of length l = 3. If the repairing task is R = 〈S, G0〉, actions a1, a2 and a3 must be included in the recovery plan, and the path-finding algorithm restricts the search to m levels of the tree, the shadowed portion of the tree in the figure. If, however, our repairing task is R = 〈S, G1〉 (bottom figure), then the recovery plan must only comprise a2 and a3, and the choices to find a state that matches S increase, as do the recovery plans to reach G1. In conclusion, when the target state

Fig. 6. Abstract view of repairing structures.

is one of the first states of the subplan, we find fewer alternatives for repairing, but the recovery plan guarantees more stability with respect to the original plan. In contrast, if the target state is closer to the root node, then there are more choices to find a recovery plan, although the plan found might not keep any of the actions of the original plan. Clearly, the more flexibility, the less stability.

Secondly, it might happen that no recovery plan is found for a repairing task. Notice that the tree has a limited size in terms of l and d that is determined by the available time to build the tree, and the information included in the tree may not be sufficient to solve all the potential contingencies. The principle underpinning reactive systems is that of providing a prompt reply, and this requires working with bounded data structures. Moreover, reactive responses are applied when slight deviations from the main course of action occur.


A major fault that makes a variable take a value that is not considered in the repairing structure would likely need a more deliberative response.

5.3. Estimating the size of T

In order to ensure reactivity, we need some guarantee that a repairing structure T is available when a failure arises and that the only operation that needs to be done is to find a recovery plan in T. In this section, we explain the details of the time-bounded building of T.

The time limit (ts) to build some T is given by the agent's execution latency (the longest acceptable time for the agent to start the execution of an action and return the world state resulting from such execution), and by the number of execution cycles (actions) in the plan window of the preceding T. The agent's latency depends on whether the system that is being observed is simulated or real, and on the characteristics of the agent's domain. Some domains may require a relatively large execution latency (e.g., 10 s), but others may have a much shorter latency (e.g., 10 ms). In our experiments (see Section 7) we assume a latency of 1000 ms.

Estimating variables that have a long range of values, like the time to generate T (tT), with a multiple linear regression model may produce highly inaccurate predictions [26]. Therefore, our proposal is to estimate the branching factor of T (b) instead of directly estimating tT; the estimation of b is given by Eq. (2a). Our estimation model relies on several parameters that affect the size of T, namely: the depth d of T (x1); the number of fluents in Gr (x2); the number of relevant variables associated to T (x3), as well as the sum of the domain cardinalities of these variables (x4); the number of relevant actions that modify these variables without removing duplicates (x5), as well as removing them (x6); and the number of these variables that also appear in Gr (x7). The values from x2 to x7 depend on the length (l) of the plan window; hence, our estimation depends mainly on the depth d and the length l of T. We preserve the value of d in our estimation model to ensure that T will comprise at least one action to repair the first action of the plan window. In order to learn the weights wi, we designed a series of experiments to generate random repairing structures for different problems from diverse planning benchmarks⁵ and obtain the value of b of the generated trees. Since

⁵ Part of the benchmark suite and the training data set can be found at http://servergrps.dsic.upv.es/planinteraction/resources/.

Table 1
Trace to calculate l and d values with ts = 1000 ms

l, d       2,3   2,4   2,5    3,4   3,5    4,5
δ(l, d)    255   637   1052   795   1230   1084
Selected                      ✓

these trees are irregularly shaped, we approximated the value of b (b̂) through Eq. (2b) (an approximation to the number of nodes N in a uniform tree). Finally, the time to apply the regressed transition function Γ to a Gi (tΓ) is computed as the average of the ratios between the time of generating a tree and the number of nodes of such a tree over the whole benchmark.

b̂ = w0 + ∑_{i=1}^{7} wi · xi    (2a)

N = (b̂ + 0.34)^d    (2b)

tT ≡ δ(l, d) = N · tΓ    (2c)

{l, d} = argmax_{l ∈ [2, x], d ∈ [l+1, y], δ(l,d) < ts} δ(l, d)    (2d)

Once the estimation model is trained, in the testing stage we use Eq. (2d) to find the values of l and d that maximize tT within the time limit ts. Through this maximization process we compute, for every pair of l and d: (1) the associated value of b̂, (2) the number of partial states N, and (3) whether or not tT, the result of Eq. (2c), is lower than ts. Table 1 shows a trace of this process. The RP starts with the combination l = 2 and d = 3, and the value of d is progressively increased by 1 until δ(l, d) > ts (see δ(2, 5) in Table 1). Then, the value of l is increased by 1 and d resets to l + 1 = 4. Since δ(l, d) is an increasing function of the values of l and d, their upper boundaries, x and y, will be limited according to the value of ts (in Table 1 these upper boundaries are 4 and 5, respectively). Finally, the RP selects the combination closest to ts (δ(3, 4) in Table 1), and the remaining time ε = ts − tT is added to ts for the next structure.
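The selection process of Eqs. (2a)–(2d) can be sketched as below. The regression weights and the feature extractor `features(l, d)` (returning x1 through x7) are stand-ins for the learned model, not the paper's trained values.

```python
def predict_time(l, d, features, weights, t_gamma):
    """delta(l, d) via Eqs. (2a)-(2c); features(l, d) yields (x1, ..., x7)."""
    b = weights[0] + sum(w * x for w, x in zip(weights[1:], features(l, d)))
    n = (b + 0.34) ** d          # Eq. (2b): node-count approximation
    return n * t_gamma           # Eq. (2c): delta(l, d) = N * t_Gamma

def select_window(ts, features, weights, t_gamma, max_l=6, max_d=12):
    """Eq. (2d): the admissible (l, d) maximizing delta(l, d) under ts."""
    best, best_time = None, -1.0
    for l in range(2, max_l + 1):
        for d in range(l + 1, max_d + 1):     # minimum depth: d >= l + 1
            t = predict_time(l, d, features, weights, t_gamma)
            if t >= ts:           # delta grows with d: move on to next l
                break
            if t > best_time:
                best, best_time = (l, d), t
    return best, best_time
```

The early `break` mirrors the trace in Table 1: once δ(l, d) exceeds ts for some d, larger depths for that l are not explored.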

6. Planetary Mars domain, a real-world motivating example

The Mars planning domain stems from a real-world problem that NASA is dealing with in some research projects on space exploration [30]. In this domain, a rover that works on the Martian surface may have different instruments (e.g., cameras, robotic arms, drills, atmospheric sensors) to analyze rocks or soils, to take


images of the terrain, or to perform atmospheric measurements over many location types on the Mars surface. Location types include soft sand, hard-packed soil, rough rock fields, and combinations of all of these. In our domain, this data must be communicated to a lander, which in turn sends the results to a control center situated on Earth (current Mars rovers communicate to Earth via orbital relays). The communication from Mars to Earth has a long delay of at least 2.5 minutes and at most 22 minutes. The rover could perform reconnaissance activities [12] to identify interesting targets and maps for future missions. After a long time, the rover may lose capabilities [28,42] (e.g., reduced mobility due to wear and tear on motors or mechanical systems, reduced power generation due to battery degradation, accumulated dust on solar arrays), as happened with the Spirit and Opportunity rovers.

Our interest is in providing rovers with on-board execution and reactive planning capabilities so that they can repair plan failures by themselves in a timely fashion and thus reduce the communication overhead with Earth. This capability is present in a limited way on current rovers for navigation and hazard avoidance [15], and experiments have been conducted involving replanning to perform opportunistic science. These capabilities will be needed for future missions. This future hypothetical Mars mission scenario is the problem that has primarily motivated our work.

7. Test and evaluations

Our domain-independent reactive model has been tested in a problem scenario that simulates the Mars domain described in Section 6, as well as in other domains such as supply chain, manufacturing or vehicle routing. All of the domains and problems are encoded in PDDL. For the Mars domain, we adapted the PDDL files of the rovers domain from the International Planning Competition⁶ (IPC) to endow rovers with reconnaissance abilities like, for instance, the operators (SeekSoil ?r ?w) or (SeekRock ?r ?w) that allow a rover ?r to seek more samples in a waypoint ?w. The resulting files accurately reflect the size and difficulty of the real problems except for the numeric capabilities of the rovers.

In this section we show the results obtained for 12 problems of the Mars scenario and 10 problems of a

⁶ http://ipc.icaps-conference.org/.

vehicle routing domain [45] adapted from the logistics domain of the IPC, a significant problem in the transportation industry. For both domains, we discuss the accuracy of our approach in generating the repairing structures within the given time; additionally, we compare the average time to solve a plan failure in the Mars domain with our approach and with other replanning and plan-adaptation mechanisms.

The reactive execution model was implemented in Java and all tests were run on a GNU/Linux Debian computer with an Intel Core i7-3770 CPU @ 3.40 GHz × 8 and 8 GB RAM.

7.1. Time-bounded repairing structures

In order to test the timely generation of a repairing structure, we executed several problems of diverse complexity from the Mars exploration and vehicle domains, and we gathered the results of the first three generated Ti for each problem (see Table 2). The data shown for every Ti are: ΠA, the number of remaining actions of the plan when Ti was created; ts, the time limit to create Ti; the values of l and d selected by the estimation model within ts; tT, the real time used in the construction of Ti; and N, the number of partial states in Ti. All times are measured in seconds. Notice that the value of ts for Ti is subject to the value of l in Ti−1, except for T1, whose time limit is fixed to 1 s. The number of locations, resources and goals contributes to the complexity of each problem.

The top part of Table 2 shows the results for the Mars exploration domain. Three repairing structures (T1, T2 and T3) were collected for all the problems except for problem 2, where T1 and T2 covered all the actions of ΠA. The first remark about these results is that almost every Ti was generated within the deadline (tTi < ts), excluding T3 of problem 12, which slightly exceeded its time limit. A second observation is that, for relatively small problems, like 1–5, the value of tT is far from ts because the search space of these problems is fairly small and the newly generated partial states are all repeated nodes after a certain point. Hence, in most of the simplest problems the entire state space is quickly exhausted, in contrast to problems 6–12. In some problems, we can also observe that the values of l, d and ts are the same but the values of tT and N are different. For instance, (l, d) = (3, 4) and ts = 2 s for problems 7, 8 and 9, but the values of tT and N are fairly different because of the increasing complexity of these problems in the number of locations and goals, which implies a higher branching factor in each pro-


Table 2
First three repairing structures of Mars and vehicle routing domains
[Table body not recoverable from the extraction. For each problem, the table reports the data-set complexity (rovers or airports, locations, goals) and, for each of T1 (ts = 1 s), T2 and T3: the remaining actions of ΠA, the selected values of l and d, the time limit ts, the construction time tT (s), and the number of partial states N (nodes).]


Table 3
Time and recovery plans for repairing tasks with T1 in the Mars exploration domain
[Table body not recoverable from the extraction. For each problem (1–12) and each simulated failure (id and type A–D), the table reports the remaining actions of ΠA and, for the RP (Repairing), LPG-ADAPT (Adaptation) and LAMA (Replanning) approaches: the actions reused from ΠA, the size of the recovery plan Π′A (with "root" marking recovery plans computed up to the root node of T1), and the repair time in ms.]


blem. In general, the branching factor also depends on the relevant variables of each particular plan window and, hence, a tree like T2 of problem 12 may result in a smaller search space than T2 of problem 10.

The lower part of Table 2 shows the results obtained for the vehicle domain, where the goal is to find optimal routes for several vehicles which have to deliver a number of packages. Most of the repairing structures were generated within the time limit, except in the case of T1 and T2 of problem 8 and T3 of problem 10, due to the complexity of these problems. In general, the trees in this domain are larger than those of the Mars domain under the same value of ts. This is because the branching factor in the vehicle domain is much lower than in the Mars domain, since trucks are confined to move within a particular city and, hence, loading a package in a truck will only ramify across the trucks defined in the city where the package is located. In contrast, in the Mars domain rovers are equipped with all functionalities and therefore the branching factor is considerably higher.

All in all, the results corroborate the accuracy of the predictive model and the timely generation of the repairing structures. Overall, across all of the experiments carried out in the diverse domains, 95% of the repairing structures were generated within the time limit. On the other hand, in contrast to most reactive planners [1,13], our model is far more flexible since it is capable of dealing with more than one action (in the plan window) and multiple failure states with a single repairing structure.

7.2. Repairing plan failures

This section shows the performance of the plan recovery search procedure when repairing a plan failure in the Mars domain. Specifically, failures were randomly simulated in the plan window of the tree T1 for the problems in the upper part of Table 2. Failures were generated by changing the value of a fluent in the initial state and in the states resulting from the execution of an action ai, thus provoking an erroneous execution of the next action ai+1 of the plan window. In other words, the first failure affects action a1 of ΠA; the second failure affects a2 of ΠA, assuming a1 was correctly executed; the third failure affects a3, assuming the two preceding actions were correctly executed; and so on.

Table 3 depicts the results of the repairing task in the Mars domain with three approaches: our RP; the LPG-ADAPT mechanism [18], which repairs a given plan by adapting it to a new current state; and the classical planner LAMA [38], used as a deliberative planner to obtain a new plan from scratch (replanning) when a failure is detected. Plan failures are classified as follows:

A. Failures originated by an error in the executionof the actions (Navigate ?r ?wo ?wd). An errorof this type is because of: 1) the rover ?r is notlocated in ?wo at the time of executing the actionor 2) the path from ?wo to ?wd is blocked and therover cannot traverse it.

B. Failures that prevent the rover from analyzingthe results or taking good pictures. This type offailure is caused by: 1) the rover loses the sam-ple (rock or soil) by an unexpected event whenit is about to analyze it, or 2) the camera losescalibration before taking the picture.

C. Failures that are solved with the help of otherrovers when a hardware failure disables the de-vice of a rover. Since fixing the damaged hard-ware is not an eligible option in the NASA sce-nario, the only possible way to repair this failureis with the help of other rover, as long as this ispossible within the repairing structure.

D. Failures that positively affect the plan execution.

Table 3 also shows the number of remaining actions of the plan ΠA at the time of executing the repairing task, the number of actions reused from ΠA, the total number of actions of the recovery plan (Π′A) and the time of the plan repair measured in milliseconds. The parameter reused ΠA represents the stability with respect to ΠA; i.e., the number of actions of ΠA that are reused in Π′A. Finally, the failures labeled as root in the Π′A row denote that the recovery plans were calculated up to the root node of T1.
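The per-failure quantities just described (actions reused from ΠA, stability, and growth in plan length) can be computed as in the following sketch; the list-based plan representation is a simplifying assumption.

```python
def repair_metrics(original_plan, recovery_plan):
    """Compute, for one repair, the metrics reported per failure:
    the number of actions of the original plan reused in the recovery
    plan, the stability as the reused fraction, and the change in
    plan length (number of actions added or removed)."""
    reused = sum(1 for a in original_plan if a in recovery_plan)
    stability = reused / len(original_plan) if original_plan else 1.0
    delta = len(recovery_plan) - len(original_plan)
    return reused, stability, delta

# A 4-action plan repaired into a 5-action plan reusing 3 actions:
print(repair_metrics(["a1", "a2", "a3", "a4"],
                     ["a1", "b1", "a3", "a4", "b2"]))  # (3, 0.75, 1)
```

A stability of 1.0 with a non-positive delta corresponds to the best cases discussed next, where the recovery plan keeps every original action or even shortens the plan.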

As we can see in Table 3, the three approaches were able to find a plan except for failure 4 of problem 2. In this case, the path from waypoint w1 to w2 is blocked and there is no other possible way to navigate to w2; that is, no recovery plan exists for this failure.

Regarding stability, the solution plans found by the RP can be classified as: (1) plans that benefit from positive failures of type D and thus contain fewer actions than the original plan ΠA; (2) plans that reuse all the actions in ΠA (e.g., failure 1 of problems 2 and 6); and (3) plans that reuse only some of the actions in ΠA (e.g., failure 1 of problems 5 and 11).

The solutions to failures of type A consist of rerouting rovers through alternative paths using the available travel maps. Recovering from a failure of type B implies either exploring the area again in search of a new sample of rock or soil (e.g., failure 2 of problem 2) and then analyzing it, or recalibrating the rover's camera (e.g., failure 2 of problem 4). Failures of type C were found in two cases, which could be repaired because their respective T1 included paths involving the second rover (e.g., failure 3 of problem 6). Here, a hardware failure prevents the rover from analyzing the soil at a specific location; our model repairs the failure with the second rover, which explores the area in search of soil, analyzes the soil and communicates the results to the lander. In failures of type D, the RP takes advantage of the positive failure, which achieves the effects of the next action to execute, and consequently the RP proceeds with the following action in ΠA.

358 C. Guzman et al. / Reactive execution for solving plan failures in planning control applications

Table 4
Summary of statistics for RP, LPG-ADAPT and LAMA performance

             Stability (%)    ΔΠA            Time (ms)
             μ      σ         μ      σ       μ       σ
RP           92     19        0.97   0.85    1.85    4.33
LPG-ADAPT    85     19        2.40   1.94    49.47   4.72
LAMA         51     27        1.33   1.52    62.83   36.99
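The aggregate statistics of Table 4 are plain means and standard deviations over the per-failure measurements. A sketch of the aggregation, using illustrative numbers rather than the paper's data:

```python
from statistics import mean, stdev

def summarize(results):
    """Aggregate per-failure measurements into a mean/std summary.
    `results` maps an approach name to a list of
    (stability_pct, delta_actions, time_ms) tuples; the output gives
    one (mean, std) pair per metric, in the same order."""
    summary = {}
    for approach, rows in results.items():
        cols = list(zip(*rows))  # transpose into per-metric columns
        summary[approach] = [(round(mean(c), 2), round(stdev(c), 2))
                             for c in cols]
    return summary

# Illustrative measurements for two repairs by one approach:
print(summarize({"RP": [(100, 0, 1.2), (84, 2, 2.1)]}))
```

Here `stdev` is the sample standard deviation; whether the paper uses the sample or population estimator is not stated, so this is one reasonable reading.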

The performance results in Table 3 and the summary of statistics in Table 4 show that our RP performs remarkably well in all the measured dimensions. Regarding stability, RP outperforms LPG-ADAPT and LAMA. LAMA is the approach with the worst stability rate (51%), which is reasonable since the planner does not repair a plan but computes a new one. In Table 3 we can see that the plan quality, or number of actions of Π′A, is slightly higher with the RP than with LAMA in some cases (e.g., failure 1 of problem 12 or failure 3 of problem 7), and lower in some other cases (e.g., failure 2 of problem 12 or failures 1 and 2 of problem 10). LAMA is able to find shorter plans in a few cases because it computes a plan for the new situation without being required to keep the actions in ΠA. Nevertheless, all in all, the RP returns plans of better quality than LAMA, as Table 4 shows (the mean increase in the number of actions is 0.97 for RP against 1.33 for LAMA). The comparison between RP and LPG-ADAPT clearly favors RP in both stability and quality of the recovery plan, particularly in the most complex problems (10 to 12). As for the computation time, finding a recovery path to the root node is more costly since the repairing mechanism explores the entire search space. However, RP shows outstanding results compared to LPG-ADAPT and LAMA, which proves the benefit of using the RP to repair plan failures in reactive environments, besides avoiding the overhead of communicating with a deliberative planner. In conclusion, we can affirm that our model is a robust recovery mechanism for reactive planning that also provides good-quality solutions.

8. Limitations and extensions of the model

The results in Section 7 show that our reactive execution model meets the performance needs of reactive plan repair and that it also outperforms other repairing mechanisms. However, the model presents some limitations that we intend to overcome in the future. One limitation is the machine dependency of the estimation model explained in Section 5.3. In order to reproduce the experiments, or to export them to other systems, the training of the estimation model must be repeated to adjust the value of tΓ for a particular processor.
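How such a retraining could look: assuming, hypothetically, that repair time grows linearly with the number of expanded nodes, tΓ can be re-fitted on the target processor by least squares over timing samples gathered on that machine. The functional form and names here are assumptions, not the estimation model of Section 5.3.

```python
def calibrate_t_gamma(samples):
    """Re-fit a per-processor time constant tΓ under the hypothetical
    model time ≈ tΓ · expanded_nodes, by least squares through the
    origin. `samples` is a list of (expanded_nodes, elapsed_ms) pairs
    measured on the target machine."""
    num = sum(n * t for n, t in samples)
    den = sum(n * n for n, _ in samples)
    return num / den

print(calibrate_t_gamma([(100, 5.0), (200, 10.0)]))  # -> 0.05 ms/node
```

Repeating this fit on each deployment machine would remove the dependence on the processor used for the original experiments.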

Assuming several agents executing their plans in a common environment, a repairing task of one agent might cause conflicts in the plans of the others. Additionally, the occurrence of failures could restrict the capabilities of the agents, preventing them from achieving some goal. Therefore, a multi-agent approach where execution agents act, coordinate and jointly repair a failure is desirable [2,16,35]. A communication protocol that helps agents request, provide and agree on a particular recovery plan is a desirable approach for a multi-agent repair system [25]. In particular, the present work represents a first step towards a multi-agent P&E system capable of coordinating agents' plans while minimizing crowd effects [44].

Our model can be easily extended to parallel and temporal planning. The parallel execution of several actions of an agent is achievable by grouping together the preconditions and effects of the parallel actions into a new action. The existence of grouped actions would avoid the duplicated states that arise from the multiple serializations of the actions in the group. On the other hand, handling durative actions in temporal planning would involve creating a regressed partial state at each relevant time point of an action and adding fluents to represent the ongoing executing actions at each execution cycle.
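The grouping of parallel actions can be sketched for STRIPS-style actions as follows; the dictionary representation and the interference check are assumptions for illustration, not the paper's formulation.

```python
def group_parallel(actions):
    """Merge mutually non-interfering actions into one grouped action:
    the union of their preconditions, add effects and delete effects.
    Raises if one action deletes a precondition or add effect of
    another, since such actions cannot truly execute in parallel."""
    for a in actions:
        for b in actions:
            if a is not b and a["del"] & (b["pre"] | b["add"]):
                raise ValueError("actions interfere; cannot be grouped")
    return {"name": "+".join(a["name"] for a in actions),
            "pre": set().union(*(a["pre"] for a in actions)),
            "add": set().union(*(a["add"] for a in actions)),
            "del": set().union(*(a["del"] for a in actions))}

# Two rovers navigating independently can be grouped into one action:
nav1 = {"name": "Nav-r1", "pre": {"at(r1,w0)"},
        "add": {"at(r1,w1)"}, "del": {"at(r1,w0)"}}
nav2 = {"name": "Nav-r2", "pre": {"at(r2,w0)"},
        "add": {"at(r2,w1)"}, "del": {"at(r2,w0)"}}
grouped = group_parallel([nav1, nav2])
```

Because the grouped action is applied atomically, the search never generates the intermediate states produced by the different serializations of the individual actions.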

9. Conclusions and future work

This paper presents a reactive execution model, which comprises a RP, to recover from failures in planning control applications. The model is embedded into a P&E system in which an execution agent receives a plan from a deliberative planner and its mission is to monitor, execute and repair the given plan.

Providing time-bounded responses in reactive environments is a difficult and sometimes unfeasible task


due to the unpredictability of the environment and the impossibility of guaranteeing a response within a given time. An alternative solution to overcome this difficulty is to work with time-bounded data structures rather than designing time-bounded reasoning processes. By following this approach, our model ensures the availability of a repairing structure, or search tree, within a given time, which is later used to fix action failures during plan execution.

Several features have been considered in order to generate the search tree in due time: (1) the tree is formed of partial states, which contain far fewer fluents than world states; (2) the tree is limited to a particular fragment of the plan and a tree depth that are calculated by an estimation model; and (3) the expansion of the tree only considers the relevant variables that might potentially fail during the plan execution. Under these criteria, we show the results obtained for two different domains: a simulation of a real NASA space problem, and a vehicle routing domain. The results corroborate that there is a 95% likelihood of obtaining a repairing structure in time. Additionally, the exhaustive experimentation on the repairing tasks confirms that the repairing structure, together with the search recovery process, is a very suitable mechanism to fix failures that represent slight deviations from the main course of action in a planning control application. The results support several conclusions: the accuracy of the model in generating repairing structures in time, the usefulness of a single repairing structure to repair more than one action in a plan fragment while reusing the original plan as much as possible, and the reliability and performance of our recovery search procedure in comparison with other well-known classical planning mechanisms.

The current RP can be extended in several different directions, for instance, by including the necessary machinery to deal with temporal plans. Our next future work is to exploit this model for a multi-agent recovery mechanism in which agents dynamically form a team at execution time and work together on the repair of a plan failure.

Acknowledgments

This work has been partly supported by the Spanish MICINN under project TIN2014-55637-C2-2-R and by the Valencian project PROMETEOII/2013/019.

References

[1] P. Aschwanden, V. Baskaran, S. Bernardini, C. Fry, M. Moreno, N. Muscettola, C. Plaunt, D. Rijsman and P. Tompkins, Model-unified planning and execution for distributed autonomous system control, in: AAAI Fall Symposium on Spacecraft Autonomy (2006).

[2] R. Badawy, A. Yassine, A. Heßler, B. Hirsch and S. Albayrak, A novel multi-agent system utilizing quantum-inspired evolution for demand side management in the future smart grid, Integrated Comp-Aided Engineering 20(2) (2013), 127–141.

[3] A.G. Banerjee and S.K. Gupta, Research in automated planning and control for micromanipulation, IEEE Transactions on Automation Science and Engineering 10(3) (2013), 485–495.

[4] P. Baraldi, R. Canesi, E. Zio, R. Seraoui and R. Chevalier, Genetic algorithm-based wrapper approach for grouping condition monitoring signals of nuclear power plant components, Integrated Comp-Aided Engineering 18(3) (2011), 221–234.

[5] J.C. Bongard and H. Lipson, Automated damage diagnosis and recovery for remote robotics, in: IEEE Robotics and Automation 4 (2004), 3545–3550.

[6] M. Brenner and B. Nebel, Continual planning and acting in dynamic multiagent environments, Autonomous Agents and Multi-Agent Systems 19(3) (2009), 297–331.

[7] B. Browning, J. Bruce, M. Bowling and M. Veloso, STP: Skills, tactics and plays for multi-robot control in adversarial environments, IEEE Journal of Control and Systems Engineering 219 (2005), 33–52.

[8] J. Bryson and L.A. Stein, Modularity and design in reactive intelligence, Intl Joint Conference on Artificial Intelligence (2001), 1115–1120.

[9] S. Chien, B. Cichy, A. Davies, D. Tran, G. Rabideau, R. Castaño, R. Sherwood, D. Mandl, S. Frye, S. Shulman, J. Jones and S. Grosvenor, An autonomous earth-observing sensorweb, IEEE Intelligent Systems 20 (2005), 16–24.

[10] J.Y.J. Chow, Activity-based travel scenario analysis with routing problem reoptimization, Comp-Aided Civil and Infrastruct Engineering 29(2) (2014), 91–106.

[11] R.E. Fikes, P.E. Hart and N.J. Nilsson, Learning and executing generalized robot plans, Artificial Intelligence 3 (1972), 251–288.

[12] W. Fink, J.M. Dohm, M.A. Tarbell, T.M. Hare and V.R. Baker, Next-generation robotic planetary reconnaissance missions: A paradigm shift, Planetary and Space Science 53(14) (2005), 1419–1426.

[13] A. Finzi, F. Ingrand and N. Muscettola, Model-based executive control through reactive planning for autonomous rovers, in: IEEE Intelligent Robots Systems 1 (2004), 879–884.

[14] L. Flückiger and H. Utz, Service oriented robotic architecture for space robotics: Design, testing, and lessons learned, J of Field Robotics 31(1) (2014), 176–191.

[15] T.W. Fong, M. Bualat, M. Deans, M. Allan, X. Bouyssounouse, M. Broxton, L. Edwards, R. Elphic, L. Fluckiger, J. Frank, L. Keely, L. Kobayashi, P. Lee, S.Y. Lee, D. Lees, E. Pacis, E. Park, L. Pedersen, D. Schreckenghost, T. Smith, V. To and H. Utz, Field testing of utility robots for lunar surface operations, in: AIAA Space Conference and Exposition (2008), 22–27.

[16] A. Fougères and E. Ostrosi, Fuzzy agent-based approach for consensual design synthesis in product configuration, Integrated Comp-Aided Engineering 20(3) (2013), 259–274.

[17] M. Fox and D. Long, PDDL2.1: An extension to PDDL for expressing temporal planning domains, Journal of Artificial Intelligence Research 20 (2003), 61–124.

[18] M. Fox, A. Gerevini, D. Long and I. Serina, Plan stability: Replanning versus plan repair, in: Automated Planning and Scheduling (2006), 212–221.

[19] C. Fritz and S.A. McIlraith, Monitoring plan optimality during execution, in: Automated Planning and Scheduling (2007), 144–151.

[20] A. Garrido and E. Onaindia, Assembling learning objects for personalized learning: An AI planning perspective, IEEE Intelligent Systems 28(2) (2013), 64–73.

[21] M. Ghallab, D. Nau and P. Traverso, Automated Planning: Theory & Practice, Elsevier, 2004.

[22] M. Ghallab, D.S. Nau and P. Traverso, The actor's view of automated planning and acting: A position paper, Artificial Intelligence Journal 208 (2014), 1–17.

[23] K. Görlach, M. Sonntag, D. Karastoyanova, F. Leymann and M. Reiter, Conventional workflow technology for scientific simulation, in: Guide to e-Science (2011), 323–352.

[24] C. Guzmán, V. Alcázar, D. Prior, E. Onaindia, D. Borrajo, J. Fdez-Olivares and E. Quintero, PELEA: A domain-independent architecture for planning, execution and learning, in: Scheduling and Planning Applications woRKshop 12 (2012), 38–45.

[25] C. Guzmán, P. Castejon, E. Onaindia and J. Frank, Multi-agent reactive planning for solving plan failures, in: Hybrid Artificial Intelligent Systems – 8th International Conference, Lecture Notes in Computer Science 8073 (2013), 530–539.

[26] M.R. Hagerty and V. Srinivasan, Comparing the predictive powers of alternative multiple regression models, Psychometrika 56(1) (1991), 77–85.

[27] R. Haijema and E.M.T. Hendrix, Traffic responsive control of intersections with predicted arrival times: A Markovian approach, Comp-Aided Civil and Infrastruct Engineering 29(2) (2014), 123–139.

[28] Y. Kuwata, A. Elfes, M. Maimone, A. Howard, M. Pivtoraiko, T.M. Howard and A. Stoica, Path planning challenges for planetary robots, in: IEEE Intelligent Robots Systems (2008), 22–27.

[29] S. Lemai and F. Ingrand, Interleaving temporal planning and execution in robotics domains, in: Innovative Applications of Artificial Intelligence (2004), 617–622.

[30] M.W. Maimone, P.C. Leger and J.J. Biesiadecki, Overview of the Mars Exploration Rovers' autonomous mobility and vision capabilities, in: IEEE Robotics and Automation, Jet Propulsion Laboratory, NASA (2007).

[31] M.G. Marchetta and R. Forradellas, An artificial intelligence planning approach to manufacturing feature recognition, Comp-Aided Design 42(3) (2010), 248–256.

[32] C. McGann, F. Py, K. Rajan, H. Thomas, R. Henthorn and R. McEwen, A deliberative architecture for AUV control, in: IEEE Robotics and Automation (2008), 1049–1054.

[33] N. Meuleau and D.E. Smith, Optimal limited contingency planning, Uncertainty in Artificial Intelligence (2003), 417–426.

[34] A. Milani and V. Poggioni, Planning in reactive environments, Computational Intelligence 23(4) (2007), 439–463.

[35] I. Montalvo, J. Izquierdo, R. Pérez-García and M. Herrera, Water distribution system computer-aided design by agent swarm optimization, Comp-Aided Civil and Infrastruct Engineering 29(6) (2014), 433–448.

[36] J. Patel, M.C. Dorneich, D.H. Mott, A. Bahrami and C. Giammanco, Improving coalition planning by making plans alive, IEEE Intelligent Systems 28(1) (2013), 17–25.

[37] C. Piacentini, V. Alimisis, M. Fox and D. Long, Combining a temporal planner with an external solver for the power balancing problem in an electricity network, in: 23rd Automated Planning and Scheduling, AAAI (2013), 398–406.

[38] S. Richter and M. Westphal, The LAMA planner: Guiding cost-based anytime planning with landmarks, Journal of Artificial Intelligence Research 39(1) (2010), 127–177.

[39] A. Rodríguez and J.A. Reggia, Collective-movement teams for cooperative problem solving, Integrated Comp-Aided Engineering 12(3) (2005), 217–235.

[40] M. Santofimia, X. del Toro, P. Roncero-Sánchez, F. Moya, M. Martinez and J. López, A qualitative agent-based approach to power quality monitoring and diagnosis, Integrated Comp-Aided Engineering 17(4) (2010), 305–319.

[41] J. Sedano, C. Chira, J. Villar and E. Ambel, An intelligent route management system for electric vehicle charging, Integrated Comp-Aided Engineering 20(4) (2013), 321–333.

[42] M. Smart, B. Ratnakumar, L. Whitcanack, F. Puglia, S. Santee and R. Gitzendanner, Life verification of large capacity Yardney Li-ion cells and batteries in support of NASA missions, Intl Journal of Energy Research 34(2) (2010), 116–132.

[43] S. Sohrabi, J.A. Baier and S.A. McIlraith, Diagnosis as planning revisited, in: 21st Intl Workshop on the Principles of Diagnosis (2010).

[44] Q. Sun and S. Wu, A configurable agent-based crowd model with generic behavior effect representation mechanism, Comp-Aided Civil and Infrastruct Engineering 29(7) (2014), 531–545.

[45] P. Toth and D. Vigo, The Vehicle Routing Problem, Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2001.

[46] C. Muise, J.C. Beck and S.A. McIlraith, Flexible execution of partial order plans with temporal constraints, in: 23rd Intl Joint Conference on Artificial Intelligence, AAAI (2013), 2328–2335.

[47] W. Xie and Y. Ouyang, Dynamic planning of facility locations with benefits from multitype facility colocation, Comp-Aided Civil and Infrastruct Engineering 28(9) (2013), 666–678.