




Workshop of the XIII AI*IA Conference (AI*IA 2013)

http://ips2013.istc.cnr.it/

December 4th, 2013, Turin - Italy

Organising Committee

• Gabriella Cortellessa (ISTC-CNR, Italy)

• Alfonso Gerevini (University of Brescia, Italy)

• Daniele Magazzeni (King’s College London, UK)

• Ivan Serina (University of Brescia, Italy)


Program Committee

• Marco Baioletti, University of Perugia, Italy

• Sara Bernardini, King’s College London, UK

• Alessandro Cimatti, FBK-irst, Italy

• Amanda Coles, King’s College London, UK

• Gabriella Cortellessa, ISTC-CNR, Italy (co-chair)

• Giuseppe De Giacomo, University of Rome “La Sapienza”, Italy

• Giuseppe Della Penna, University of L’Aquila, Italy

• Yannis Dimopoulos, University of Cyprus

• Alberto Finzi, University of Napoli “Federico II”, Italy

• Simone Fratini, ESA, Germany

• Antonio Garrido, Universidad Politecnica de Valencia, Spain

• Hector Geffner, Universitat Pompeu Fabra, Barcelona, Spain

• Alfonso Gerevini, University of Brescia, Italy (co-chair)

• Enrico Giunchiglia, University of Genova, Italy

• Ugur Kuter, Smart Information Flow Technologies (SIFT), USA

• Carlos Linares, Universidad Carlos III de Madrid, Spain

• Daniele Magazzeni, King’s College London, UK (co-chair)

• Marco Maratea, University of Genova, Italy

• Lee McCluskey, University of Huddersfield, UK

• Fabio Mercorio, University of Milan “Bicocca”, Italy

• Roberto Micalizio, University of Torino, Italy

• Alfredo Milani, University of Perugia, Italy

• Barry O’Sullivan, University College Cork, Ireland

• Angelo Oddi, ISTC-CNR, Rome, Italy

• Andrea Orlandini, ISTC-CNR, Rome, Italy

• Fabio Patrizi, University of Rome “La Sapienza”, Italy

• Valentina Poggioni, University of Perugia, Italy

• Nicola Policella, ESA, Germany

• Riccardo Rasconi, ISTC-CNR, Rome, Italy

• Ioannis Refanidis, University of Macedonia, Greece

• Alessandro Saetti, University of Brescia, Italy

• Andrea Schaerf, University of Udine, Italy

• Ivan Serina, University of Brescia, Italy (co-chair)

• Mauro Vallati, University of Huddersfield, UK

• Kristen Brent Venable, Tulane University, New Orleans, USA


Foreword

This volume contains the papers presented at IPS 2013, the 5th Italian Workshop on Planning and Scheduling, held within the XIII AI*IA Conference in Turin, Italy, on December 4th, 2013.

The aim of this series of workshops is to bring together researchers interested in different aspects of planning and scheduling, and to introduce new researchers to the community. Although the primary target of the IPS workshops is the Italian planning and scheduling community, an international Program Committee is recruited with the aim of attracting an international gathering.

This year, 14 papers were accepted for presentation at the workshop, involving 42 authors from Italy and other European countries. The papers mainly focus on planning applications in different domains, plan execution, repair and robustness, and planning and scheduling with time constraints.

Gabriella Cortellessa, Alfonso Gerevini, Daniele Magazzeni, Ivan Serina

Workshop Organizers

December 2013


Contents

SESSION 1

Toward a Test Environment for Autonomous Controllers
Pablo Muñoz, Amedeo Cesta, Andrea Orlandini and Maria D. R-Moreno ................................................. 3

Planning the Behaviour of Low-Cost Quadcopters for Surveillance Missions
Sara Bernardini, Maria Fox and Derek Long .............................................................................................. 11

Autonomous Energy Management as a High Level Reasoning for Planetary Rover Problems
Daniel Diaz, Amedeo Cesta, Angelo Oddi, Riccardo Rasconi and Maria Dolores Rodriguez-Moreno ...... 19

Planning and Replanning for Autonomous Underwater Vehicles
Daniele Magazzeni and Francesco Maurelli ............................................................................................... 29

SESSION 2

A TGA-based Method for Safety Critical Plan Execution
Andrea Orlandini, Marco Suriano, Amedeo Cesta and Alberto Finzi ......................................................... 39

Plan Repair Driven by Model-Based Agent Diagnosis
Roberto Micalizio ......................................................................................................................................... 47

Timelines with Temporal Uncertainty
Alessandro Cimatti, Andrea Micheli and Marco Roveri .............................................................................. 55

Can Planning meet Data Cleansing?
Roberto Boselli, Mirko Cesarini, Fabio Mercorio and Mario Mezzanzanica .............................................. 63

Evaluating Plan Robustness in Presence of Numeric Fluents
Enrico Scala .................................................................................................................................................. 67

On the Plan-library Maintenance Problem in a Case-based Planner
Alfonso Emilio Gerevini, Anna Roubickova, Alessandro Saetti and Ivan Serina ....................................... 71

Towards Automated Planning Domain Models Generation
Mauro Vallati, Lukas Chrpa and Federico Cerutti ....................................................................................... 79

SESSION 3

Business Model Design as a Temporal Planning Problem: Preliminary Results
Daniele Magazzeni, Fabio Mercorio, Balbir Barn, Tony Clark, Franco Raimondi and Vinay Kulkarni ..... 85

Reasoning about Time Constraints in a Mixed-Initiative Calendar Manager
Liliana Ardissono, Giovanna Petrone, Marino Segnan and Gianluca Torta ................................................ 93

Efficient DTPP solving with a reduction-based approach
Jean-Rémi Bourguet, Marco Maratea and Luca Pulina ............................................................................... 101


Session 1

Planning on earth, sea, sky and space





Toward a Test Environment for Autonomous Controllers

Pablo Munoz
Universidad de Alcala

Alcala de Henares, Madrid (Spain)

Amedeo Cesta, Andrea Orlandini
CNR – Italian National Research Council

ISTC, Rome (Italy)

Maria Dolores R-Moreno
Universidad de Alcala

Alcala de Henares, Madrid (Spain)

Abstract

In the last two decades, increasing attention has been dedicated to the use of high-level task planning in robotic control, aiming to deploy advanced robotic systems in challenging scenarios where a high degree of autonomy is required. Nevertheless, an interesting open problem in the literature is the lack of a well-defined methodology for approaching the design of deliberative systems and for fairly comparing different approaches to deliberation. This paper presents the general idea of an environment for facilitating knowledge engineering for autonomy and, in particular, for supporting accurate experiments on planning and execution systems for robotics. It discusses features of the On-Ground Autonomy Test Environment (OGATE), a general testbench for interfacing deliberative modules. In particular, we present features of an initial instance of such a system built to support the GOAC robotic software.

Introduction

In the last two decades, the ongoing work in robotics has been marked by an exponential growth of the available functionalities. This is a consequence of advancements in both hardware and software technology. As a result, robotic systems can reach new locations and perform more complex tasks, even in unexplored environments.

When advanced robotic systems are deployed in unknown and dynamic environments, the control software required to achieve the mission goals must deal with a large number of constraints. Thus, advances in the Artificial Intelligence (AI) field are naturally combined with the control of robotic systems in order to allow them to generate long-term plans with little or no human interaction. In this respect, developments in planning and scheduling systems, such as task planning (Korf 1987; Fikes and Nilsson 1971), CSPs (Dechter and Pearl 1987; Do and Khambhampati 2000; Cesta, Fratini, and Oddi 2005), timelines (Muscettola 1994; Jonsson et al. 2000; Smith, Frank, and Jonsson 2000; Frank and Jonsson 2003; Fratini, Pecora, and Cesta 2008; Cesta et al. 2009; Chien et al. 2010) and, recently, efforts interleaving planning and execution (Ambros-Ingerson and Steel 1998; Finzi, Ingrand, and Muscettola 2004; Py, Rajan, and McGann 2010), can be very valuable for controlling robots in dynamic environments with an increasing degree of autonomy.

These AI controllers are the top layer of complex tools that are usually built ad hoc to control a specific robotic platform and to perform specific missions. Usually, to test and verify the correctness of an architecture, a small set of missions is carried out. Also, some parts of the control system can be evaluated in a standalone manner via particular test beds. But testing the robustness, adequacy and performance of the whole control architecture cannot be done easily; it requires collecting and analyzing relevant data from all the parts of the control system, while the test bed must cover more cases than the typical scenarios. This is currently an open and interesting problem on which little work has been done so far.

The paper presents a prototype of the On-Ground Autonomy Test Environment (OGATE)1 as the initial result of an effort that aims to provide testing support while developing controller systems for space robotics missions. The OGATE environment would constitute an entry point to investigate these open problems in autonomous controllers as the combination of an engineering effort, identifying the requirements, designing and implementing a general environment to provide testing and verification tools for autonomous controller systems, and a research effort to identify the key factors in research on planning, scheduling and execution in order to evaluate the performance of autonomous controllers.

We also aim to provide support for the ground segment facility in space robotics missions, hiding the complexity of the controlled system from the user. Thus, in the paper, the space robotics context is exploited as a real-world scenario, employed to test different solutions for deliberation and execution under the same conditions.

The paper is structured as follows: the next section describes a space robotic scenario related to the GOAC project and exploited as a case study. Then, the objectives of the system are presented, followed by a brief description of OGATE and its functionality. An initial deployment of the system and a description of what constitutes a plug-in component for OGATE are then given. Finally, some conclusions are outlined.

1 Funded by the ESA Networking and Partnering Initiative Cooperative Systems for Autonomous Exploration Missions.



A Space Robotic Case Study

Our interest in plan-based autonomy is also related to a recent participation in the Goal Oriented Autonomous Controller (GOAC) (Ceballos et al. 2011) project: an ESA effort to create a common platform for robotic software development. In particular, the GOAC effort combines several technologies: (a) a timeline-based deliberative layer which integrates a planner, called OMPS (Fratini, Pecora, and Cesta 2008), built on top of APSI-TRF to synthesize timelines and revise them according to execution needs, and an executive a la T-REX (Py, Rajan, and McGann 2010); (b) a functional layer (Bensalem et al. 2010) which combines a state-of-the-art tool for developing functional modules of robotic systems (GenoM) with a component-based framework for implementing embedded real-time systems (BIP).

The GOAC system allows controllers to be implemented in a flexible way, i.e., for each robot or mission a different instance of the T-REX system can be deployed, defining various cooperating reactors and their associated interactions and thus providing a scalable architecture. A T-REX agent is composed of a hierarchy of deliberative reactors. Each deliberative reactor has its own deliberative scope as well as its own planning latency and look-ahead; it is in charge of controlling a particular aspect of the mission and interacts with other reactors by sending goals and receiving observations. A reactor exploits a planning system to generate plans and to monitor their execution, following a sense-plan-act paradigm for goal-oriented autonomy. This allows a divide-and-conquer approach in which the scope of each deliberative reactor can be refined by other, more specific reactors. Initially, the planning system employed in the T-REX deliberative reactors was the EUROPA2 planning and scheduling framework (Jonsson et al. 2000; Bresina et al. 2005). For the GOAC project the planning system was replaced by the OMPS planning system, which exploits the APSI-TRF execution facilities.
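To make the reactor abstraction more concrete, the listing below is a minimal sketch of how a deliberative reactor with its own scope, latency and look-ahead might be modelled. The class and field names are purely illustrative assumptions and do not correspond to the actual T-REX or GOAC code base.

    // Illustrative sketch only: a T-REX-style deliberative reactor.
    // Names and fields are hypothetical; the real T-REX API differs.
    import java.util.List;

    final class DeliberativeReactor {
        private final String name;                 // e.g. "navigation" or "science"
        private final List<String> ownedTimelines; // the reactor's deliberative scope
        private final int latencyTicks;            // ticks the reactor may take to deliberate
        private final int lookAheadTicks;          // planning horizon, in ticks

        DeliberativeReactor(String name, List<String> ownedTimelines,
                            int latencyTicks, int lookAheadTicks) {
            this.name = name;
            this.ownedTimelines = ownedTimelines;
            this.latencyTicks = latencyTicks;
            this.lookAheadTicks = lookAheadTicks;
        }

        // Sense-plan-act step: receive observations, replan within the
        // look-ahead window, and emit goals for other (lower-level) reactors.
        void step(List<String> observations, List<String> outgoingGoals) {
            // the planning call (e.g. OMPS or EUROPA in GOAC) would go here
        }
    }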

Figure 1 represents a possible instance of the GOAC architecture. The figure shows two deliberative reactors, each one with its own planner and model over which to deliberate. These reactors are interconnected with each other and also with a command dispatcher reactor. The latter is in charge of sending commands to the functional layer (generally composed of different functional modules), while retrieving the observations and propagating them to the other reactors. With these data, the different deliberative reactors, using their respective deliberation models, can dynamically adapt their plans to the new circumstances.

Within the GOAC initiative, the DALA rover has been considered to simulate a robotic scenario as close as possible to a planetary exploration rover. DALA is one of the LAAS-CNRS robotic platforms that can be used for autonomous exploration experiments. In particular, it is an iRobot ATRV robot that provides a large number of sensors and effectors. It can use vision-based navigation (such as the one used by the Mars Exploration Rovers Spirit and Opportunity), as well as indoor navigation based on a Sick laser range finder.

In this regard, DALA can be considered a fair representative of a planetary rover equipped with a Pan-Tilt Unit (PTU), two stereo cameras (mounted on top of the PTU), a panoramic camera and a communication facility. The rover is able to autonomously navigate the environment, move the PTU, take high-resolution pictures and communicate images to a Remote Orbiter.

Figure 1: Representation of a GOAC instance

The considered mission goal is a list of required pictures to be taken in different locations, each with an associated PTU configuration, and to be communicated to an Orbiter when it is visible to the robot. Also, the rover must operate following some operative rules to maintain safe and effective configurations (the reader may refer to (Ceballos et al. 2011) for further details). To deal with the objectives and the operative rules, the OMPS deliberative layer is in charge of synthesizing a sequence of actions that, starting from the environment state, the robot state and the goals, reaches a final state in which the goals are satisfied.

A possible sequence of mission actions is the following: navigate to one of the requested locations, move the PTU to point in the requested direction, take a picture, communicate the image to the orbiter during the next available visibility window, put the PTU back in the safe position and, finally, move to the next requested location. Once all the locations have been visited and all the pictures have been communicated, the mission is considered successfully completed.
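As an illustration only, the nominal per-location cycle described above could be sketched as the following loop; the action names are invented for this example and do not correspond to the actual GOAC domain model.

    // Illustrative sketch of the nominal mission cycle described above.
    // Action names are hypothetical, not the actual GOAC domain constants.
    import java.util.List;

    final class MissionCycleSketch {
        enum Action { GOTO, POINT_PTU, TAKE_PICTURE, COMMUNICATE, PTU_TO_SAFE }

        static void runMission(List<String> locations) {
            for (String location : locations) {
                execute(Action.GOTO, location);
                execute(Action.POINT_PTU, location);   // requested pan/tilt configuration
                execute(Action.TAKE_PICTURE, location);
                execute(Action.COMMUNICATE, location); // during the next orbiter visibility window
                execute(Action.PTU_TO_SAFE, location);
            }
            // mission completed once every location has been visited and
            // every picture has been communicated
        }

        private static void execute(Action a, String location) {
            System.out.println(a + " @ " + location);  // placeholder for dispatching to the executive
        }
    }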

The GOAC project also exploited the ExoMars rover model using the ESA 3DROV simulator suite (Poulakis et al. 2008), which allows early-stage virtual modelling of terrain and mobile robot systems. The system is composed of multiple modules connected through standardized interfaces. The most important are the Simulation Framework, that is, ESA's Simsat, responsible for the execution and scheduling of the simulation, and the Generic Controller, which manages the onboard flight software and enables software modules to be connected to control the rover. It also includes the Environment block, in charge of timekeeping, terrain and atmospheric conditions, and the Visualization Environment, a front-end that provides real-time visualization of the simulation progress.

On-ground support for autonomous control

The current investigation is triggered by the space environment, where the use of software for on-board autonomy is often perceived as losing control of critical mission components (namely space robots, spacecraft, etc.). Certainly, the current complexity of software for autonomy is quite high, and such complexity reinforces the general skepticism toward its wide use in the space environment. To cope with this problem we have conceived the idea of creating a software environment to be used on-ground to facilitate the demonstration and testing of software for autonomy. Such an environment can also represent the seed for a future knowledge engineering environment for autonomous controllers. Indeed, the first goals for such an environment are: (a) facilitating the use of autonomous controllers; (b) allowing the use of different solutions for autonomous control (e.g., toward a plug-and-play style in their use); (c) enabling the comparison of different solutions by gathering reliable execution data on a given mission. For the time being we are also making an additional assumption: we are focusing our attention on the deliberative part of the autonomous control, the part that can be referred to as the one performing "planning and execution"; hence we assume that the physical system is accessible through a functional interface (in GOAC, a la GenoM (Mallet et al. 2010)) or through a robotic operating system.

To make our general goal clear, let us refer to the space robot domain introduced above. Our research plan is to develop an easy-to-use system able to (i) interface different planning and execution solutions with the same robot (or its simulator) and (ii) automatically generate realistic test bed scenarios of increasing complexity.

Here, we consider missions related to the space robotics case study. Those missions consist in using a given autonomous control architecture on a robotic platform to return some science objectives (i.e., scientific pictures), taking into account a set of constraints such as the availability time windows for Remote Orbiter communication, or the time period in which science targets are available (some events may be time-bounded). The goal is to progressively increase the difficulty of the missions, aiming to stress the planning and execution system and also to collect performance information, exploiting a real robotic platform like DALA as well as simulator suites such as 3DROV (Poulakis et al. 2008).

Afterward, we plan to investigate how different ways of conceiving the planning and execution task can be compared. In particular, we need to compare them using the same functional support, as fig. 2 shows, and identical missions.

Figure 2: Different configurations for planning and execution controlling the same robotic platform

Although stress tests for planning systems exist (for example the International Planning Competition (IPC)2, in which the performance of PDDL-based planners (Gerevini and Long 2005) is evaluated on quantitative criteria), they are standalone tests under invariant conditions. This means that there is no interference due to a dynamic environment (e.g., climate changes, external agents, new goal opportunities, etc.) or to changes in the system (e.g., malfunctions or failures) during a mission that directly affect the planning and execution system. Also, newer technologies interleaving planning and execution may use different schemas, for instance in terms of the number of deliberative components or their scope and specialization. A solution plan can thus also be found by different cooperating autonomous systems. This also entails the need to investigate how these schemas potentially affect the planning process. To the best of our knowledge, a methodology to compare performance metrics in realistic scenarios, such as the space robotic mission domain under non-nominal conditions, is still missing. This therefore constitutes an interesting open research issue.

2 The IPC is usually co-located with the International Conference on Automated Planning and Scheduling (ICAPS).

The OGATE infrastructure

We can briefly define a mission as a set of goals that are solvable by an autonomous agent. This autonomous agent is composed of two parts: i) the platform, which could be a simulator or a robot, and ii) the control architecture managing the platform, a set of software components that work together to accomplish the objectives defined in the problem. Thus, considering an autonomous agent, one or more of these components will be deliberative components, which are responsible for managing the long-term planning needed to accomplish the objectives of the mission.

Currently, a large number of technologies are available to define problems for autonomous agents and their corresponding model of the world (usually called the domain). These technologies produce high-level plans, which must be decomposed in order to be executed by a robotic platform that accepts low-level commands. To deal with the complexity of operating the platform and controlling the execution and decomposition of the high-level plans, control architectures are typically structured in three levels (Gat 1998): i) the deliberative layer, formed by one or more deliberative components in charge of the long-term planning and scheduling; ii) the low-level support, called the functional level, that controls the platform using low-level commands; and iii) the executive layer, an intermediate level that decomposes the actions produced by the deliberative layer into commands accepted by the functional level. Newer developments interleave the two upper layers (deliberative and executive) into a decisional layer, in which planning and execution are highly coupled.
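Purely as an illustration of this three-layer decomposition, the separation of concerns could be expressed as in the sketch below; the interface and method names are invented for this example and do not correspond to any specific architecture.

    // Illustrative sketch of a Gat-style three-layer decomposition.
    // Interface and method names are hypothetical.
    import java.util.List;

    interface DeliberativeLayer {
        // long-term planning and scheduling: from goals to high-level actions
        List<String> plan(List<String> goals, List<String> observations);
    }

    interface ExecutiveLayer {
        // decomposes a high-level action into low-level commands
        List<String> decompose(String highLevelAction);
    }

    interface FunctionalLayer {
        // drives the platform with low-level commands and returns raw observations
        String executeCommand(String command);
    }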



So, if we are interested in evaluating the performance of a control architecture, we need to take into account all the components, not only the deliberative ones. The configuration of the architecture, that is, the hierarchy built on top of a set of different components and how they are connected, plays a fundamental role: some planning technologies generate a complete plan before executing it, while others generate partial plans, interleaving planning and execution in a loop. This raises questions such as how the different delays in exchanging data between layers affect the performance of the system. To cover this issue we take the functional support for the controlled robotic platform as an invariant part of the system, while employing different technologies for planning and execution over the same domain.

Figure 3: The OGATE environment

To deal with these questions, OGATE aims at providing an environment to test features of goal-oriented controllers and to obtain quantitative comparisons based on accurate experiments, as well as the qualitative analysis allowed by inspection and visualization of the software internal monitors of the controlled system. The infrastructure of the system relies on three main modules, as seen in fig. 3, to support a general test bed environment for autonomous agents (a schematic sketch of this pipeline follows the list):

– Mission specification: to define the functionality and goals, it is first required to specify the configuration of the mission: the components of the control architecture and the platform over which they operate. In this way, the system provides a convenient mode to allow the user to configure some components of the controlled system, such as the deliberators. Typically, these AI controllers work with a domain and a problem. The former defines the interactions between the different elements of the world, and the latter includes some initial facts and the desired objectives of the mission. The specification of a mission testbench could consist of an evaluation of different control architectures, or of various configurations of a control architecture to select the best one for a particular mission, or of the performance of a particular control architecture over a set of missions, in order to evaluate and improve the components employed. The OGATE system will be able to support these tests in an automated way.

– Mission execution: the mission specification includes the configuration of the different components (executives, deliberatives, etc.) involved in the mission. The execution support of OGATE provides a framework to deal with the complexity of the underlying architecture of the different components, to execute the user-defined mission and to gather the relevant data in order to give it to the user. Also, during the execution, the user must be able to interact with the controlled system, modifying internal parameters or including new mission goals to change the nominal execution, in order to test the robustness of the system and the replanning capabilities of the planning and execution components, or to include new science opportunities for real missions to maximize the science return. Thus, using specifically defined interfaces, OGATE is able to access the different components to control them and gather the relevant data.

– Report: from the previous data, a research effort to identify and develop useful metrics for comparing the performance of different deliberative layers will be addressed, in order to obtain strong conclusions when comparing different deliberative components under the same conditions. OGATE will provide a human-legible report of the mission once the required tests have been performed.
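The following minimal sketch summarizes how the three modules could fit together; all type names are invented for illustration and do not reflect the actual OGATE implementation.

    // Illustrative sketch of the specification -> execution -> report pipeline.
    // All names are hypothetical, not the actual OGATE API.
    final class OgatePipelineSketch {
        record MissionSpecification(String domain, String problem, String platform) {}
        record ExecutionLog(String rawData) {}
        record Report(String summary) {}

        static ExecutionLog execute(MissionSpecification spec) {
            // attach components, run the mission, gather data from plug-ins
            return new ExecutionLog("data gathered for " + spec.problem());
        }

        static Report report(ExecutionLog log) {
            // apply metrics to the gathered data and produce a human-legible report
            return new Report("metrics computed over: " + log.rawData());
        }

        public static void main(String[] args) {
            MissionSpecification spec =
                new MissionSpecification("rover-domain", "mission-01", "DALA-simulator");
            System.out.println(report(execute(spec)).summary());
        }
    }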

A first contact with the environment

For the first deployment of the OGATE environment we employ the GOAC architecture. The mission specification relies on the OGATE infrastructure to let the user provide the configuration of the mission, that is, to select the domain and problem, and the deliberative components that will be attached to the GOAC architecture. Also, it will be possible to define different platforms to control, with their respective functional layers. The configuration of the different reactors of the architecture (i.e., command dispatcher or functional layer) will be easily set by the user using a provided graphical interface, encapsulating the complexity of dealing with the underlying GOAC architecture. The mission specification generates a configuration file, as shown in fig. 4, to be used by the mission execution.

Figure 4: OGATE mission specification

The mission execution takes the configuration defined by the user and attaches the different components to the OGATE system to execute them in a coordinated way.



Some of these components are basic components over which OGATE has no control; it only executes them because they are required (for example, a simulation platform will be executed before the functional layer that controls the platform), while other components, such as the deliberative component, will be controlled directly by OGATE. These components are defined as OGATE plug-in components (more details in the next section). For deliberative plug-in components, a GUI allows the user to include new goals or to modify the internal state during the test, showing the relevant data in real time.

From the data gathered by the plug-in components and other relevant data generated by the system, OGATE will provide a report measuring the performance of these components and of the whole configuration.

At this moment we are focused on the deliberative capabilities, and for the initial tests we are interested in two points:

– Starting from the timelines approach and with the APSI-TRF support, deploy an automated problem generator and perform some initial tests to evaluate the planner while increasing the problem hardness. This generator must be based on critical factors that directly affect the goal-oriented controllers, and the problems must be defined using a set of parameters which denote the problem hardness according to valid and objective criteria (a sketch of such a generator is given after this list).

– When the system is mature, we plan to interchange the configuration and technology employed for the deliberative and executive process, using different instances of GOAC for interleaving planning and execution, and other solutions based on different paradigms, such as the plan-then-act schema adopted by the MOBAR architecture (Munoz and R-Moreno 2013).
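A minimal sketch of what such an automated problem generator could look like is given below. The hardness parameters (number of science targets, orbiter visibility windows) follow the description above, but the class and its output format are assumptions introduced only for illustration.

    // Illustrative sketch of an automated problem generator parameterised by
    // problem hardness. Names and output format are hypothetical.
    import java.util.Random;

    final class ProblemGeneratorSketch {
        private final Random rng = new Random(42);   // fixed seed for reproducible test beds

        /** Generates a textual problem description with the given hardness parameters. */
        String generate(int scienceTargets, int visibilityWindows, int missionHorizonSec) {
            StringBuilder sb = new StringBuilder("; auto-generated mission problem\n");
            for (int i = 0; i < scienceTargets; i++) {
                sb.append(String.format("(goal take-picture loc%d pan%d tilt%d)%n",
                        i, rng.nextInt(360), rng.nextInt(90)));
            }
            for (int w = 0; w < visibilityWindows; w++) {
                int start = rng.nextInt(Math.max(1, missionHorizonSec / 2));
                int end = start + 300;               // assume 5-minute visibility windows
                sb.append(String.format("(orbiter-visible %d %d)%n", start, end));
            }
            return sb.toString();
        }
    }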

The deliberative plug-in

An OGATE plug-in is a component that implements some functionality, giving OGATE access to control its execution or to retrieve data from it through specific interfaces. Focusing on the deliberative component, the OGATE mission execution module should access and control this component; its functionality (the planning service) must therefore be encapsulated in an OGATE plug-in that can have three connections with the OGATE system, as fig. 5 shows. In the figure, the deliberative component functionality is accessed by the OGATE system through the following interfaces:

– Control interface: provides the basic functionality to run, pause or stop the component safely.

– Data interface: supplies a bidirectional channel to retrieve the relevant data and to modify the internal state of the component. For deliberative components it also provides a function to include new goals during the execution.

– Component GUI: it is possible to include a specific user interface for that component in the OGATE GUI.

Figure 5: A deliberative component plug-in

In order to implement the different interfaces, a set of Java methods is given in a template for every interface. For the first and the second ones, the methods have signatures such as boolean stop() or float getFloatVal(String valId). As it is possible that the component is developed in C/C++, OGATE will also include some functionality to connect these components, giving some standard code to make the interconnection between OGATE and the component. The implementation of the functionality of these methods is the responsibility of the component developer. For the component GUI, OGATE can integrate a Java GUI inside its environment with a minimal modification of the code; graphical components developed in other technologies must be executed outside the environment or executed by OGATE as a basic component.
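To give a concrete flavour of such a template, a possible shape of the control and data interfaces is sketched below. Only boolean stop() and float getFloatVal(String valId) are named in the text; the remaining method names are assumptions added for illustration.

    // Sketch of an OGATE-style plug-in template. Only stop() and
    // getFloatVal(String) are named in the paper; the other methods are
    // hypothetical placeholders for the kind of functionality described.
    interface ControlInterface {
        boolean run();                         // start the wrapped component (assumed)
        boolean pause();                       // pause it safely (assumed)
        boolean stop();                        // stop it safely (named in the text)
    }

    interface DataInterface {
        float getFloatVal(String valId);       // retrieve a monitored value (named in the text)
        void setVal(String valId, float v);    // modify the internal state (assumed)
        void addGoal(String goalSpec);         // inject a new goal at run time (assumed)
    }

    // A deliberative plug-in would implement both interfaces and, optionally,
    // expose a Java GUI panel that OGATE can embed in its own GUI.
    abstract class DeliberativePlugin implements ControlInterface, DataInterface {
    }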

From the different plug-ins attached to the OGATE system, the information gathered through the data interface will be passed to the Metrics module, which takes the relevant data and generates a report based on specialized metrics. This is the final result of the execution of the OGATE system.

Current deployment of OGATE

The deployment of an automated control architecture is usually a complex task that requires some knowledge about the different components: how they are interconnected, which one must be executed first to be coupled with the others, etc. So, if we are interested in disseminating our work, we must spend some time preparing manuals and helping new users.

For these new users, when they are able to start the system, a huge amount of data is (generally) generated and presented via the command line or via specialized graphical interfaces. Depending on their preferences, users may want to pay attention only to some particular data.

The current state of the OGATE system integrates Knowledge Engineering capabilities, using the different components that form the GOAC architecture, which allows us to:

– Create new problems for the space robotic case study domain (mentioned in the previous section) using an automatic generator that defines the hardness as a function of the number of science tasks and the time in which the Remote Orbiter is visible.



– Create scenarios through a GUI to specify the planning and execution configuration used to solve the problem(s) defined. Currently, the T-REX configuration is fixed to one deliberative reactor, the command dispatcher reactor (connected to the functional support of the robot) and a visualization tool provided by the T-REX framework (which visualizes the current state of the different timelines contained in the domain). So, the user can only change the planning service employed by the deliberative reactor and the domain and problem to solve.

– Define components that will be run without OGATE control, such as the functional support of the robotic platform or the simulator. Currently, in our implementation, only the DALA robot simulator is available.

– Integrate different plug-ins; OGATE not only accepts deliberative plug-ins, but other components can also be attached to the system, such as the visualization tool of T-REX.

– Attach the different components to generate an instance of the system, which will be executed by OGATE without any intervention of the user.

– Focus on what we are really interested in, since inside OGATE it is possible to define which components generate relevant data and then show only that information inside a single GUI.

The cycle for the execution of a control system within OGATE can be seen in fig. 6. Starting from the different components available (the T-REX engine, a timeline-based planner, a domain and problem, and a robotic platform), the user can exploit the mission specification module of OGATE to generate, within the mission specification GUI (shown in the top left window), a configuration file that defines which problems to address and with which instance of T-REX each problem will be solved. After that, the mission execution module takes the user's choice, creates an instance of T-REX with the selected planner and the functional support, and attaches them correctly. In fig. 6 it is possible to see (bottom right) OGATE running the currently available instance of T-REX with the OMPS planner as the deliberative component inside an OGATE plug-in. This plug-in implements the control and data interfaces to allow OGATE to control the planning service and to retrieve the relevant data produced. There is also another OGATE plug-in that includes the T-REX visualization tool inside the OGATE GUI. The functional support is the DALA simulator, which requires specific components to be executed without being controlled by the OGATE system. One of them is mp-oprs, a message passer service for communication with the robotic platform exploited by OGATE, whose output is presented inside the OGATE GUI. Also, OGATE maintains a registry or log for the different components, which can be valuable for debugging the controlled system. Finally, OGATE must present the relevant data gathered to the user.

Although OGATE provides some advantages, we are currently working on expanding its capabilities to make execution more "user-comprehensible". At the moment OGATE shows the information of the components, but not the interaction between them. This is an important issue that has to be addressed to better understand what the system does and how it works. Also, through the GUI, our intention is to provide a framework that allows the user to support different planning and execution systems in a plug-in style, trying to ease the deployment of complex systems. The support is not only applicable to the deliberative layer, but also to the functional support, in order to exploit different robotic platforms or simulator suites. Finally, our intention is to investigate metrics to define the performance of planning and execution systems, implementing objective and comprehensive comparison reports for autonomous controllers inside the OGATE system.

Conclusions

Control architectures for autonomous robots are complex software systems, not only from a design and implementation perspective but also from a research point of view, since we want to analyze and (possibly) improve their performance. This is in part a consequence of the lack of a testbench that allows the user to define metrics and obtain results from the execution of different tests (by inspection of the relevant parameters).

Also, it is usually very complex to deploy and execute an autonomous control architecture, which is generally formed by different interconnected components. Besides, once the system is executing, the operator gets a huge amount of non-human-legible information, confusing the user and keeping him/her away from the relevant data.

For this reason we are working on the OGATE system, which aims to address part of this gap by providing a general testbench that allows easy deployment of an autonomous architecture: given the OGATE plug-ins and configuration files, it will be easy to deploy any system; it will only be required to load the specific files and run it. Then, specialized graphical interfaces will be enabled to show the relevant data to the user, encapsulating the complexity of the underlying functionality. Finally, OGATE will also offer an automated testbench to obtain quantitative comparisons based on accurate experiments, leading to human-legible reports with well-defined metrics.

Acknowledgements

Pablo Munoz is supported by the European Space Agency (ESA) under the Networking and Partnering Initiative (NPI) Cooperative systems for autonomous exploration missions. The CNR authors are partially supported by the Italian Ministry for University and Research (MIUR) and CNR under the GECKO Project (Progetto Bandiera "La Fabbrica del Futuro"). The authors want to thank the ESA technical officer Mr. Michel Van Winnendael for his continuous support.

Figure 6: A first deployment of the OGATE system

References

Ambros-Ingerson, J., and Steel, S. 1998. Integrating Planning, Execution and Monitoring. In AAAI, 83–88. AAAI Press.

Bensalem, S.; de Silva, L.; Gallien, M.; Ingrand, F.; and Yan, R. 2010. "Rock Solid" Software: A Verifiable and Correct-by-Construction Controller for Rover and Spacecraft Functional Levels. In i-SAIRAS-10. Proc. of the 10th Int. Symp. on Artificial Intelligence, Robotics and Automation in Space.

Bresina, J.; Jonsson, A.; Morris, P.; and Rajan, K. 2005. Activity planning for the Mars Exploration Rovers. In Proc. of the 15th International Conference on Automated Planning and Scheduling.

Ceballos, A.; Bensalem, S.; Cesta, A.; Silva, L. D.; Fratini, S.; Ingrand, F.; Ocon, J.; Orlandini, A.; Py, F.; Rajan, K.; Rasconi, R.; and Winnendael, M. V. 2011. A Goal-Oriented Autonomous Controller for Space Exploration. In ASTRA 2011 - 11th Symposium on Advanced Space Technologies in Robotics and Automation.

Cesta, A.; Cortellessa, G.; Fratini, S.; and Oddi, A. 2009. Developing an end-to-end planning application from a timeline representation framework. In IAAI-09. Proc. of the Twenty-First Innovative Applications of Artificial Intelligence Conference.

Cesta, A.; Fratini, S.; and Oddi, A. 2005. Planning with Concurrency, Time and Resources: A CSP-Based Approach. In Vlahavas, I., and Vrakas, D., eds., Intelligent Techniques for Planning. Idea Group Publishing. 259–295.

Chien, S.; Tran, D.; Rabideau, G.; Schaffer, S.; Mandl, D.; and Frye, S. 2010. Timeline-based space operations scheduling with external constraints. In ICAPS-10. Proc. of the Twentieth International Conference on Automated Planning and Scheduling.

Dechter, R., and Pearl, J. 1987. Network-based Heuristics for Constraint-Satisfaction Problems. Artificial Intelligence 34(1):1–38.

Do, M. B., and Khambhampati, S. 2000. Solving Planning-Graph by Compiling It Into CSP. In ICAPS-00. The Fifth International Conference on Artificial Intelligence Planning and Scheduling, 82–91.

Fikes, R. E., and Nilsson, N. J. 1971. STRIPS: A New Approach to the Application of Theorem-Proving to Problem-Solving. Artificial Intelligence 2(3):189–208.

Finzi, A.; Ingrand, F.; and Muscettola, N. 2004. Model-based executive control through reactive planning for autonomous rovers. In Proc. of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems.

Frank, J., and Jonsson, A. 2003. Constraint-based attribute and interval planning. Journal of Constraints 8(4):339–364.

Fratini, S.; Pecora, F.; and Cesta, A. 2008. Unifying Planning and Scheduling as Timelines in a Component-Based Perspective. Archives of Control Sciences 18(2):231–271.

Gat, E. 1998. Three-layer architectures. In Kortenkamp, D.; Bonasso, R.; and Murphy, R., eds., Mobile Robots and Artificial Intelligence. AAAI Press. 195–210.

Gerevini, A., and Long, D. 2005. Plan constraints and preferences in PDDL3. In Proc. of the Fifth International Planning Competition.

Jonsson, A. K.; Morris, P. H.; Muscettola, N.; Rajan, K.; and Smith, B. D. 2000. Planning in interplanetary space: Theory and practice. In Proc. of the 5th International Conference on Artificial Intelligence Planning Systems.

Korf, R. E. 1987. Planning as Search: A Quantitative Approach. Artificial Intelligence 33:65–88.

Mallet, A.; Pasteur, C.; Herrb, M.; Lemaignan, S.; and Ingrand, F. 2010. GenoM3: Building middleware-independent robotic components. In Proc. of the 2010 IEEE International Conference on Robotics and Automation.

Munoz, P., and R-Moreno, M. D. 2013. Model-Based Architecture on the ESA 3DROV simulator. In Proc. of the 23rd ICAPS Application Showcase.

Muscettola, N. 1994. HSTS: Integrating Planning and Scheduling. In Zweben, M., and Fox, M. S., eds., Intelligent Scheduling. Morgan Kaufmann.

Poulakis, P.; Joudrier, L.; Wailliez, S.; and Kapellos, K. 2008. 3DROV: A planetary rover system design, simulation and verification tool. In International Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS).

Py, F.; Rajan, K.; and McGann, C. 2010. A Systematic Agent Framework for Situated Autonomous Systems. In AAMAS-10. Proc. of the 9th Int. Conf. on Autonomous Agents and Multiagent Systems.

Smith, D.; Frank, J.; and Jonsson, A. 2000. Bridging the gap between planning and scheduling. Knowledge Engineering Review 15(1):47–83.



Planning the Behaviour of Low-Cost Quadcopters for Surveillance Missions

Sara Bernardini and Maria Fox and Derek Long
Department of Informatics

King’s College London
London, UK, WC2R 2LS

[email protected]

Abstract

Micro Aerial Vehicles (MAVs) are increasingly regarded as a valid alternative to UAVs and ground robots in surveillance missions and a number of other civil and military applications. Thanks to their light weight, small size and aerodynamic characteristics, MAVs are greatly flexible and manoeuvrable, easily portable and deployable, and safe for close interaction. Research on autonomous MAVs is still in its infancy and has focused almost exclusively on integrating control and computer vision techniques to achieve reliable autonomous flight. In this paper, we describe our approach to using automated planning in order to elicit high-level intelligent behaviour from autonomous MAVs engaged in surveillance applications. Planning offers effective tools to handle the unique challenges faced by MAVs, which relate to their fast and unstable dynamics as well as their short range, low endurance, and small payload capabilities. We focus on a specific MAV, the “Parrot AR.Drone 2.0” quadcopter, and a specific surveillance application, Search-and-Tracking, which involves searching for a mobile target and tracking it after it is found.

1 Introduction

In the last few years, there has been a considerable amount of work in developing autonomous Micro Aerial Vehicles (MAVs). Thanks to their light weight, small size and aerodynamic characteristics, MAVs afford greater flexibility and manoeuvrability than traditional unmanned aircraft. Being capable of hovering, flying at very low speed, turning with a small radius, as well as quickly taking off and landing, MAVs can perform complex and aggressive manoeuvres (Lupashin and D’Andrea 2012; Purwin and D’Andrea 2009). The emergence of low-cost MAVs, to the point that they can be considered disposable devices, has allowed for rapid prototyping and testing of innovative techniques to support autonomous behaviour. Despite their relatively new appearance in the commercial market, MAVs have already been used in a number of military and civilian missions, including surveillance operations, exploration, weather observation, disaster relief coordination, and civil engineering inspections. Similarly to UAVs, MAVs can be used in any situation in which it would be difficult or dangerous to send a human. However, thanks to their reduced dimensions, they can also be used in scenarios that are inaccessible to large unmanned vehicles, such as indoors, in cluttered outdoor scenarios and in the vicinity of people.

As for other robotic artefacts, building an autonomous MAV is a challenging task because it requires the integration of techniques developed in different fields, including control theory, navigation, real-time systems, computer vision and physics. However, in comparison with ground robots, MAVs present unique characteristics that make devising algorithms for autonomy particularly demanding (Bachrach et al. 2010). First, these vehicles are difficult to control as they are inherently unstable systems with fast dynamics. Second, given their reduced dimensions and light weight, MAVs have a restricted payload, i.e., reduced computational power as well as noisy and limited sensors. Third, the life of a MAV battery is usually short and allows continuous flight for a limited period, ranging from a minimum of a few minutes to a maximum of around two hours. Finally, MAVs are almost always in motion, as landing and taking off are expensive operations that cannot be performed too often within a mission. Considering all these factors, MAVs appear particularly well-suited to operate in situations in which there is very little stability, information is changing rapidly and decisions about what action to perform and how to coordinate with other MAVs must be made almost instantaneously. Effective management of uncertainty, restricted resources and tight deadlines are crucial requirements for creating an autonomous and intelligent MAV.

To date, research concerning MAVs has mainly focused on perception and control. In particular, in GPS-denied and contained environments, such as indoor scenarios, the focus has been on perception, with both vision sensors such as cameras (Engel, Sturm, and Cremers 2012b; Bills, Chen, and Saxena 2011; Zingg et al. 2010) and non-vision sensors such as laser range scanners, sonar, and infra-red being widely investigated (Roberts et al. 2007; Achtelik et al. 2009), and on navigation (Engel, Sturm, and Cremers 2012b; Bills, Chen, and Saxena 2011; Courbon et al. 2009; Mori, Hirata, and Kinoshita 2007). Conversely, in outdoor domains, where MAVs are subject to wind and turbulence, the emphasis has been on control, stabilisation and pose estimation (Abbeel et al. 2007; Moore et al. 2009; Fan et al. 2009). The use of automated planning to underpin the behaviour of MAVs has received little attention so far, with some effort devoted to path planning and trajectory planning (Bachrach et al. 2010; He, Prentice, and Roy 2008; Hehn and D’Andrea 2012), but none to task planning (to the best of our knowledge).

In contrast with this trend, we believe that task planning carries significant potential for the development of intelligent MAVs, as their short range, low endurance, and small payload capabilities pose challenges that cannot be met by simple low-level control algorithms. In our previous work (Bernardini et al. 2013), we used task planning to underpin the high-level operation of an autonomous UAV engaged in surveillance operations, and particularly in Search-and-Tracking (SaT) missions involving searching for a mobile target and tracking it after it is found. By developing a simulator of a fixed-wing UAV undertaking SaT operations, we demonstrated the effectiveness of our planning-based approach to SaT in comparison with static strategies. In this paper, we show that this approach generalises well to MAVs involved in SaT operations. In so doing, we offer a proof of concept that complementing computer vision and control techniques with automated planning mechanisms is a viable way of enabling MAVs to exhibit intelligent and autonomous behaviour. Although we focus here on a specific MAV, the “Parrot AR.Drone 2.0” quadcopter (see Figure 1), and a specific surveillance application, SaT, we believe that our approach generalises to other MAVs and robotic vehicles as well as to different surveillance operations. Ultimately, our goal is to demonstrate that our planning-based approach can be used in any surveillance scenario to underpin the behaviour of an observer that, knowing the dynamics of the target but not its intentions, needs to quickly make control decisions in the face of such uncertainty and under tight resource constraints.


Figure 1: Parrot AR.Drone

The work presented in this paper is still in progress, so we cannot yet report on extensive experimental results. However, based on our previous experience with SaT missions for UAVs and the very limited experiments that we have performed so far, we are confident that our approach will prove successful.

The organisation of the paper is the following. In Sections 2 and 3, we describe the characteristics of the AR.Drone 2.0 and of SaT missions, respectively. In Section 4, we give an overview of our planning-based approach to SaT using the quadcopter as the observer. We then explain our approach in detail in Section 5. In Sections 6 and 7, we describe how we have implemented our method and the intended set-up for our experiments. We conclude with final considerations and a description of future work in Section 8.

2 Hardware Platform: Parrot AR.Drone

We use the AR.Drone quadcopter1 as our prototyping hardware platform and as an example of a MAV to which our approach can be applied. The AR.Drone is a low-cost and light-weight quadcopter that was launched in 2010 by the French company “Parrot” as a high-tech toy for augmented reality games. Since then, it has become increasingly popular in academia and research organisations as an affordable test platform for MAV demonstrations (see projects at Cornell University2 (Bills, Chen, and Saxena 2011), Technische Universität München3 (Engel, Sturm, and Cremers 2012b) and RMIT University in Melbourne4 (Graether and Mueller 2012), just to mention a few). With respect to other modern MAVs, the AR.Drone has a number of advantages: it is sold at a very low cost, is robust to crashes, can be used very close to people, and its onboard software provides reliable communication, stabilisation, and assisted manoeuvres such as take-off and landing.

1 http://ardrone2.parrot.com/
2 http://mav.cs.cornell.edu/
3 https://vision.in.tum.de/research/quadcopter
4 http://exertiongameslab.org/projects/joggobot

The AR.Drone is composed of a carbon-fibre tube structure, a plastic body, high-efficiency propellers, four brushless motors, a sensor and control board, two cameras and removable indoor and outdoor hulls. It weighs 380 g with the indoor hull and 420 g with the outdoor hull, and has maximum dimensions of 52 × 52 centimetres. The drone affords an average speed of 5 meters per second, with a maximum speed reported as 11 meters per second. Its lithium polymer battery provides enough energy for up to 13 minutes of continuous flight and takes around 90 minutes to recharge.

The quadcopter is equipped with an ARM9 processor running at 468 MHz, with 128 MB of DDR RAM running at 200 MHz, and a WiFi network interface. The drone acts as a wireless server and assigns itself (via its DHCP server) a fixed IP address through which it is possible to communicate with it.

The sensory equipment of the AR.Drone is composed of: i) a 3-axis accelerometer; ii) a 2-axis gyroscope (for measuring/maintaining orientation); iii) a 1-axis yaw precision gyroscope; iv) an ultrasound altimeter (for vertical stabilisation); and v) two cameras. The first camera is aimed forward, covers a field of view of 73.5° × 58.5°, has a resolution of 1280×720 pixels (720p) and its output is streamed to a laptop at 30 fps. The second camera aims downward, covers a field of view of 47.5° × 36.5° and has a resolution of 320×240 pixels (QVGA) at 60 fps.

The onboard software uses the down-looking camera to estimate the horizontal velocity and the other sensors to control the roll Φ and pitch Θ, the yaw rotational speed Ψ and the vertical velocity z of the quadcopter according to an external reference value (see Figure 2).

The onboard software provides three communication channels with the drone:

• Command channel: used to send commands to the drone (e.g. take-off, land, calibrate sensors, etc.) at 30 Hz.




Figure 2: Coordinate system of the AR.Drone

• Navdata channel: providing information about the drone status (flying, calibrating sensors, etc.) and pre-processed sensory data (current yaw, pitch, roll, altitude, battery state and 3D speed estimates) at 30 Hz. Since the drone can run a simple analysis of the images from the frontal camera and search for specially designed tags in the images, the navdata channel contains estimates of the tags' positions if such tags have been detected.

• Stream channel: providing images from the frontal and/or bottom cameras. Images from both cameras cannot be acquired at the same time, and switching between cameras requires 300 ms.

3 Mission Description: Search-and-Tracking

Although the AR.Drone is suitable for a number of surveillance applications, we focus here on SaT missions. SaT is the problem of searching for a mobile target and tracking it after it is found. Solving this problem effectively is important as SaT is a common component task in many surveillance operations. SaT missions are plausible both outdoors and indoors as well as both in adversarial and cooperative contexts. We can imagine, for example, a drone chasing a suspect in a parking lot or escorting a worker who performs risky tasks in a factory.

In our case, the observer is the AR.Drone, while the target can be a person or any moving object that proceeds at a speed compatible with the speed of the drone. As we do not deal with object recognition, we assume that our target is identified by a specific tag known in advance by the drone. In addition, we assume that the configuration of the environment that confines the movement of the target is known to the drone. If the SaT mission takes place outdoors, we assume that the road network and the terrain types are known, whereas if the mission happens indoors, we assume that the topology of the indoor space is known to the quadcopter. Finally, we assume that the target has a destination and, to reach such destination, it follows the road network when it moves outdoors and the path structure (corridors, stairs, etc.) when it moves indoors. The target does not perform evasive actions by attempting to use features in the environment for concealment or to deviate from the originally intended path. This is a plausible assumption as the target might be cooperating with the drone or simply unaware of its presence. In practice, as described in Section 6, in our initial experiments, we have used a robotic wheeled vehicle as the target and assume that it follows a road network when it moves. All these assumptions are in line with our previous work relating to a fixed-wing UAV involved in a SaT mission (Bernardini et al. 2013). This helped us to generalise the solution adopted for the UAV case to the MAV case.

The objective of a SaT mission is to follow the target to its destination. In general, a SaT mission proceeds in two phases, which interleave until the target stops or until the observer acknowledges it has lost the target irrevocably and abandons the mission. These phases are:

• Tracking: the drone simply flies over the target, observing its progress; and

• Search: the drone has lost the target and flies a series of manoeuvres intended to rediscover the target.

Once the target is rediscovered, the drone switches back to the tracking mode.

4 Planning-based Approach to SaT

As described in the previous section, any SaT mission consists of two phases: tracking and search.

We manage the tracking phase through a reactive controller equipped with vision capabilities: the problem is simply one of finding a route that maximises observations of the target. However, when the drone fails to observe the target, it must attempt to rediscover it. How this is achieved depends on the time since the observer last observed the target. For a short period after losing the target, the drone can simply track the predicted location of the target, since the target cannot move fast enough to significantly deviate from this prediction. However, after a period (whose length depends on the speed of the target, the field of view of the imaging equipment, and the observation probability), it will be necessary to make a more systematic effort to rediscover the target by directing the search into specific places. This is when task planning comes into play. Right after losing track of the target, the following information is available to the drone: the last known location and velocity of the target, the average velocity over the period the target has been tracked, the map of the environment and the current position of the drone itself. Based on this information, it is possible to formulate the task of rediscovering the target as a planning problem: a search plan is constructed from a set of candidate manoeuvres (for example, spirals, lawnmowers, polygonal figures, hovering, etc.) that can be arranged in a sequence to attempt to optimise the likelihood of rediscovering the target. If, while flying this search plan, the target is rediscovered, the observer switches back to tracking mode.

As mentioned in Section 1, we have already demonstrated the effectiveness of a planning-based approach to SaT in our previous work concerning a single UAV tracking and searching for a ground vehicle through a mixed urban, sub-urban and rural landscape (Bernardini et al. 2013). The solution we proposed is to track the target reactively while it is in view and to plan a recovery strategy that relocates the target every time it is lost, using a high-performing automated planning tool. The recovery strategy involves flying a sequence of standard search patterns, i.e. spirals and lawnmowers, at particular times and in specific locations of the geographical area considered. The planning problem consists of deciding where to search and which search patterns to use in order to maximise the likelihood of recovering the target. Our results indicated that this approach is successful and certainly outperforms static search strategies. By using our technique, we have been able to tackle SaT problems on a large scale, a 100 kilometre square area, which represents a significant challenge to the problem of search, far beyond the capabilities of current alternatives.

The use of quadcopters in SaT missions makes the planning problem even more challenging than when UAVs are used. In fact, the planning model must reflect the specific constraints pertaining to the quadcopter's capabilities, and the planner must reason within such constraints in order to formulate an effective search strategy. These constraints relate to safety considerations in the use of the drone, the fact that the drone is always in motion, and finally the drone's limited computational power, cheap and noisy sensors, and short battery life. We believe that dynamically planning the behaviour of a quadcopter in the face of all these constraints instead of using a static policy is even more crucial than in the case of UAVs. Only with a carefully crafted strategy can the quadcopter achieve satisfactory performance within the limitations posed by its technical characteristics.

As our project is still in progress, we have not yet fully handled all the constraints pertaining to quadcopters in our current implementation. Instead, as reported in detail in the next two sections, we have focused on translating the planning approach to SaT used for fixed-wing UAVs to MAVs and on robustly implementing such a method on a real quadcopter, the AR.Drone. However, modelling the specificities of the drone and exploiting them to formulate more effective plans is part of our future work.

5 Search as Planning

If the drone loses the target beyond the short period for which it tracks its predicted location, it must follow a search strategy to attempt to rediscover it. We propose to exploit hovering and flight patterns, such as zig-zags, spirals, lawnmowers, polygonal figures and patrolling around obstacles, as the building blocks for a search plan that attempts to maximise the expectation of rediscovering the target. The challenge, then, is to decide where these search patterns should be deployed. One possibility is to use a fixed strategy of simply flying some standard configuration of these patterns over the area where the target was lost. However, a more interesting approach is to see the problem of selection and sequencing of search patterns as a planning problem: each search pattern can be assigned a value corresponding to the expectation of finding the target in a search of that area and then the drone can select a sequence of search patterns, linking them together with a series of flight paths, in order to maximise the accumulated expectation of rediscovery within the limits of its resources.

As we have already pointed out in our previous work (Bernardini et al. 2013), despite the inherent uncertainty in the situation, the problem is deterministic, since the uncertainty arises in the position of the target and, if the target is found, the plan ceases to be relevant. Therefore, the plan is constructed entirely under the assumption that the target remains undiscovered. Somewhat counter-intuitively, "plan-failure" corresponds to the situation in which the target is found, counter to the assumption on which the plan is based. However, on failure of this assumption, the plan becomes irrelevant and the drone switches back to the tracking mode.

5.1 Planning Domain

The domain model for the search problem has a straightforward structure. There are actions for: i) taking off; ii) landing; iii) hovering; iv) flying from one waypoint to another; and v) performing basic search patterns, such as spirals, lawnmowers, zig-zags and polygonal figures. The search pattern actions all have similar forms: they each have an entry waypoint and an exit waypoint and the effect, other than to move the drone from the entry to the exit point, is to increase the reward (which is the accumulated expectation of finding the target). The actions are durative and their duration is fixed in the problem instance to be the correct (computed) value for the execution of the corresponding search. The search patterns can only be executed so that they coincide with a period during which the target could plausibly be in the area the pattern covers. This is calculated by considering the minimum and maximum reasonable speeds for the target and the distance from where the target was last observed. The reward is more complicated and is discussed in detail below, but the problem instance associates with each pattern a reward, using a time-dependent function.

As an example of how search actions are modelled in PDDL2.2, the following is the description of the action doSmallLawnmower, which specifies the location and availability of a lawnmower search pattern, the time it takes to complete it and the reward available.

(:durative-action doSmallLawnmower
 :parameters (?from ?to - waypoint ?p - smallLawnmower)
 :duration (= ?duration (timefor ?p))
 :condition (and (at start (beginAt ?from ?p))
                 (at start (endAt ?to ?p))
                 (at start (at ?from))
                 (at end (active ?p)))
 :effect (and (at end (at ?to))
              (at start (not (at ?from)))
              (at end (increase (reward) (rewardof ?p)))))

The time-dependent reward function is managed by timed initial fluents in the problem specification that change the reward of the patterns as time progresses. The shape of the function is constructed to represent an approximate lifted Gaussian distribution, with no reward until the target could plausibly have arrived at the search area and no reward after the target is unlikely to be still present in the area. Between these extremes, the reward peaks at the point where the target would be in the centre of the search pattern if driving at average speed. The model can be modified by adding more or fewer intermediate points in the step-function approximation of the distribution. To ensure that the planner does not exploit search patterns when there is no reward associated with them, the patterns are only made active during the period when the distribution is positive, using timed initial literals that are asserted and retracted at the appropriate times.
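As a hedged illustration (not the authors' exact implementation), the step-function breakpoints of such a lifted Gaussian can be computed offline and then emitted as timed initial fluents over the (rewardof ?p) function of the domain above; the number of steps and the spread used below are illustrative assumptions.

import math

def reward_steps(t_earliest, t_latest, peak_reward, n_steps=5):
    """Return (time, reward) breakpoints approximating a lifted Gaussian
    that is non-zero only in [t_earliest, t_latest] and peaks at the midpoint.
    The number of steps and the spread (sigma) are illustrative choices."""
    t_peak = 0.5 * (t_earliest + t_latest)
    sigma = (t_latest - t_earliest) / 4.0          # assumed spread
    steps = []
    for k in range(n_steps + 1):
        t = t_earliest + k * (t_latest - t_earliest) / n_steps
        value = peak_reward * math.exp(-((t - t_peak) ** 2) / (2 * sigma ** 2))
        steps.append((t, round(value, 2)))
    return steps

# Each breakpoint would become a timed initial fluent of the form
# (at <time> (= (rewardof p1) <value>)) in the problem file.
print(reward_steps(60.0, 180.0, 10.0))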

14

Page 19: December 4th, 2013 Turin - Italy - CNRips2013.istc.cnr.it/wp-content/uploads/2013/12/ips2013_proceedings.pdf · December 4th, 2013 Turin - Italy ... University of Brescia, Italy(co-chair)

In future work, we plan to enrich the planning domain by adding, for example, a model of the drone's battery. As the battery has a very limited life, the planner needs to take into account the cost of performing different actions in terms of their battery consumption.

5.2 Planning Problem

The problem has no goal, but the plan metric measures the value of the plan in terms of the accumulated expectation of finding the target. A few examples of problems of this sort have been considered before (for example, one variant of the satellite observation problem used in the 3rd International Planning Competition (Long and Fox 2003) had this character) and it is typically the case that benchmark planners generate empty plans, ignoring the metric. We discuss below our management of this problem.

To create the initial states for our planning problems, we have to manage two tasks: (i) identifying candidate search patterns; and (ii) assigning appropriate rewards to them. The first task is made difficult by the fact that there are infinitely many patterns that could be used, while the second is made difficult because of the lack of knowledge about the intentions of the target. In what follows, we show how we manage these tasks for outdoor SaT missions, on which we are currently focusing in our experiments. However, similar considerations can be made for indoor spaces.

To address the first problem, we observe that the planner can only consider a finite subset of search patterns and, since we want to perform planning in real time, it is limited to considering a reasonably small number of candidates. Therefore, we generate a sample of possible search patterns by randomly selecting a set of shapes (circles, rectangles and polygons) and placing them onto the general search area. There are three steps involved in this (see (Bernardini et al. 2013) for additional technical details on each step; a small sampling sketch follows the list below):

1. Circular sector construction: we use as our general search area a circular sector that is centred on the last known location of the target and extends outwards with its symmetry axis aligned with the average bearing of the target over the period the target has been observed. The sector extends outwards for a distance whose exact value depends on the total area included in the sector and the relative time required to fly a search pattern of a given area. There is a point at which the area where the target could be present is so much larger than the area that the drone can search, during a period when the target could be present, that the expectation of finding the target diminishes to a negligible value. The angle subtended by the sector reflects the degree of uncertainty in the heading of the target. In general, a target will follow the direction that is forced on it by the paths it uses, but the average bearing will converge, over time, on the direction from the origin to the destination. The longer the target is observed, the closer this convergence will become.

2. Sampling: once the relevant sector is identified, we then sample points using a probability distribution laid over the sector. This distribution is based on the density of roads across the sector, which is measured by using a fine-mesh grid and counting the number of significant roads within each grid cell, the terrain type (urban, suburban, mountainous, forested, rough or open rural ground) and the distance from the symmetry axis and from the last known location of the target. The distribution decays linearly with distance from the origin, linearly away from the symmetry axis and is weighted by values for terrain type and road density. Although the density of patterns decays away from the origin, the effect is muted because the relative areas available for selection are proportional to the distance from the origin.

3. Search pattern generation: finally, we decide the type of pattern to use for each point: we favour spirals for covering an area of high-density road network, particularly in urban or suburban terrain, and lawnmowers when attempting to search over a more elongated stretch covering a major road and including some possible side roads. For spirals, we select a radius based on the width of the sector at that point and the road network density. For lawnmowers, we select an orientation and then width and length. The orientation is based on the road network and is aligned to follow major roads or high densities of roads, while the width and length are determined by examining the road network and probability distribution.
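The following sketch illustrates the sampling step under stated assumptions: the road-density/terrain weight function, the linear decay factors and the rejection-sampling scheme are placeholders rather than the values and procedure used in the original system.

import math, random

def sample_points(last_pos, bearing, sector_angle, radius, n_samples, weight_fn):
    """Rejection-sample candidate pattern centres inside a circular sector.

    last_pos      -- (x, y) last known target location (sector origin)
    bearing       -- average target bearing in radians (sector axis)
    sector_angle  -- total angular width of the sector in radians
    radius        -- sector radius in metres
    weight_fn     -- user-supplied weight in [0, 1] combining road density
                     and terrain type at a point (an assumption of this sketch)
    """
    samples = []
    while len(samples) < n_samples:
        r = radius * math.sqrt(random.random())              # uniform over the area
        a = bearing + (random.random() - 0.5) * sector_angle
        x = last_pos[0] + r * math.cos(a)
        y = last_pos[1] + r * math.sin(a)
        # Linear decay with distance from the origin and from the symmetry axis.
        decay = (1.0 - r / radius) * (1.0 - abs(a - bearing) / (sector_angle / 2))
        if random.random() < decay * weight_fn(x, y):
            samples.append((x, y))
    return samples

# Example: uniform terrain/road weight over a 60-degree, 500 m sector.
print(sample_points((0.0, 0.0), 0.0, math.pi / 3, 500.0, 5, lambda x, y: 0.8))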

As for the second problem, i.e. reward selection, we com-pute a shortest and longest time of arrival for the target byconsidering an average speed and variation in speed over thepath from the origin to the pattern. In principle, this mech-anism should use the road map to identify shortest paths,but this is too costly to compute in real time, so we insteadsample the terrain along the straight line from the origin tothe leading and far edges of the pattern. This is used to as aguide to the likely speed of the target on this path. In prac-tice, if the straight line path traverses rural areas then thetarget will either have to use smaller roads or else deviatefrom the direct path in order to exploit more major roads. Ineither case, the target will arrive at the target later than if thedirect path is through suburban terrain. On the other hand, ifthe terrain is urban then speed will be constrained by trafficlaws and other road users. The earliest and latest times areused to set up a value function, with these as the limits of thereward (outside this range the pattern is awarded no value).The peak reward is calculated as a proportion of the proba-bility density in the distribution across the intersection of thesector and the annulus centred at the same origin and withedges coinciding with the boundaries of the search pattern.This represents a surrogate for the total available probabilitydensity across the time period covered by the search pattern,although it is clearly an approximation.
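A minimal sketch of the arrival-window computation, assuming the terrain sampling is abstracted into an average target speed plus a symmetric variation; the numeric speeds in the example are placeholders.

def arrival_window(distance_near, distance_far, avg_speed, speed_var):
    """Earliest/latest plausible arrival times of the target at a pattern.

    distance_near -- straight-line distance to the leading edge (m)
    distance_far  -- straight-line distance to the far edge (m)
    avg_speed     -- terrain-dependent average target speed (m/s), assumed
    speed_var     -- symmetric variation around the average (m/s), assumed
    """
    earliest = distance_near / (avg_speed + speed_var)
    latest = distance_far / max(avg_speed - speed_var, 0.1)
    return earliest, latest

# Example: leading edge 300 m and far edge 450 m away, target at 2 +/- 1 m/s.
print(arrival_window(300.0, 450.0, 2.0, 1.0))   # -> (100.0, 450.0)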

Once the initial state is prepared, we can plan.

5.3 Planning Mechanism

We exploit the period in which the quadcopter tracks the predicted location of the target to perform planning. In our application, we use an off-the-shelf planner called OPTIC (Benton, Coles, and Coles 2012) to build plans for the drone. OPTIC is a version of POPF (Coles et al. 2010) specifically designed to perform anytime, cost-improving search.

15

Page 20: December 4th, 2013 Turin - Italy - CNRips2013.istc.cnr.it/wp-content/uploads/2013/12/ips2013_proceedings.pdf · December 4th, 2013 Turin - Italy ... University of Brescia, Italy(co-chair)

We use a time-bounded search limited to 10 seconds because the drone's very limited battery life puts us in a time-critical situation. The planner will typically find a first solution very easily, since the empty plan is already a feasible solution, but it will then spend the additional time improving on this by adding further search patterns to the plan, or trying different collections of patterns. The search uses a weighted-A* scheme with steadily changing weights in a tiered fashion (see (Benton, Coles, and Coles 2012) for details). The plans produced in this way are monotonically improving, so the final plan produced is the one we select for execution. We use OPTIC because it is very fast at producing its first solution and provides an any-time improvement behaviour. LPG would offer similar characteristics, but our experiments indicated OPTIC was better for our problems.

The plan is dispatched via a simple controller, action by action. At the conclusion of execution of the plan, in principle, two alternatives are viable for the drone, depending on how long has passed since the target was last seen: spending more time generating a new plan, or abandoning the search. We always make the drone abandon the search and land at this point in our current implementation.

So far we have implemented a static policy, which is based on replanning. The observer enters a planning phase every time it loses the target. We allow the observer 10 seconds for planning (it can be configured to have longer) at each of these points. We intend to later replace this planning behaviour with an essentially instantaneous action selection using a learned policy; learning such a policy, to replace the static policy currently implemented, is a future goal for our work.
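The overall tracking/search policy can be summarised by the following hedged control loop; the drone and planner objects and their method names are placeholders for the components described in Section 6, not actual API calls, and the prediction window is an assumed value.

import time

PLANNING_BUDGET = 10.0      # seconds allowed to the planner (configurable)
PREDICTION_WINDOW = 5.0     # assumed short period of dead-reckoning tracking

def sat_mission(drone, planner):
    """High-level Search-and-Tracking loop (illustrative pseudostructure)."""
    while drone.has_battery():
        if drone.target_visible():
            drone.track_target()                       # reactive PID tracking
        else:
            t_lost = time.time()
            # First, follow the predicted target location for a short while.
            while time.time() - t_lost < PREDICTION_WINDOW:
                if drone.target_visible():
                    break
                drone.fly_to(drone.predicted_target_position())
            else:
                # Then build and fly a search plan; abandon and land if it fails.
                plan = planner.plan(drone.state(), budget=PLANNING_BUDGET)
                if not drone.execute_until_target_found(plan):
                    drone.land()
                    return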

6 Implementation

In order to carry out a SaT mission, the AR.Drone needs to combine the abstract deliberative skills illustrated in the previous sections with low-level control and vision capabilities. In particular, the drone needs different skills in the two phases of a SaT mission:

• For the tracking phase, the drone needs to be able to accomplish the following tasks:

– Tag recognition: we assume that our target is identified by a specific tag, and therefore the drone needs to be able to recognise tags from a distance based on the video stream coming from its cameras. We use computer vision algorithms to solve this problem.

– Tag following: once the drone has recognised the tag corresponding to the target, it needs to follow it reactively. We achieved this by implementing a Proportional-Integral-Derivative (PID) controller that works based on the navigation data provided by the drone's driver (a minimal sketch of such a controller is given after this list).

• For the search phase, the drone essentially needs to be able to fly autonomously to a given position, as the plan formulated by the planner specifies waypoints to fly to and search patterns to execute, which are in turn sets of waypoints to be reached in a particular sequence. Autonomous flight requires a combination of low-level control capabilities, such as maintaining attitude, stabilisation and compensation for disturbances, and high-level control skills, such as compensation for drift, localisation and mapping, obstacle avoidance and navigation. Conveniently, the AR.Drone provides built-in low-level control, so we have focused on high-level control only. Since we currently ignore obstacle avoidance, our implementation provides capabilities for localisation and mapping, and navigation. The navigation system that we use is composed of three major components: monocular simultaneous localisation and mapping (SLAM) for visual tracking, an Extended Kalman Filter (EKF) for data fusion and prediction, and PID control for pose stabilisation and navigation. We describe them in detail below.
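A minimal sketch of such a PID loop for a single control axis (e.g. keeping the detected tag centred in the image); the gains, the saturation limit and the error source are placeholders to be tuned on the real platform.

class PID:
    """Textbook PID controller for a single axis (illustrative gains)."""

    def __init__(self, kp, ki, kd, out_limit=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        """Return the control output for the current error and time step."""
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, out))

# Example: drive the horizontal offset of the detected tag towards zero and
# use the output as the lateral velocity command sent to the drone.
lateral_pid = PID(kp=0.8, ki=0.05, kd=0.2)
print(lateral_pid.update(error=0.3, dt=1.0 / 30.0))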

We have implemented our application within the Robot Operating System (ROS) framework (http://www.ros.org) (Quigley et al. 2009). ROS is an open-source meta-operating system for robots and provides basic services such as hardware abstraction, low-level device control, implementation of commonly-used functionality and message-passing between processes. ROS facilitates code reuse in robotics as it offers tools for finding, building and running code across multiple computers. A ROS application can be seen as a peer-to-peer network of processes, called nodes, that perform computation and communicate with each other by passing messages, which are data structures with typed fields. A node sends a message by publishing it to a given topic, which is simply a string used to identify the content of the message. A node that is interested in a certain kind of data will subscribe to the appropriate topic. A node can also offer a service to the other nodes, which send request messages for such a service and wait for replies. In the spirit of ROS, we have leveraged existing packages for implementing our SaT application. In particular, we built on the following ROS packages:

• ARDRONE AUTONOMY (http://wiki.ros.org/ardrone_autonomy): this is a ROS driver for the Parrot AR.Drone based on the official AR.Drone SDK version 2.0 and developed by the Autonomy Lab at Simon Fraser University. The driver's executable node, ARDRONE DRIVER, offers a number of features:

– it converts all raw sensor readings, debug values and status reports sent from the drone into standard ROS messages, which can then be used and interpreted by the other ROS nodes involved in the application;

– it allows control commands to be sent to the drone for taking off, landing, hovering and specifying the desired linear and angular velocities; and

– it provides additional services such as LED and flight animations.

• AR RECOG (http://wiki.ros.org/ar_recog): this is a ROS vision package developed by the Robotics, Learning and Autonomy group at Brown University that allows the drone to recognise specific tags as well as to locate and transform them in the image space. This package is based on the ARToolKit (http://www.hitl.washington.edu/artoolkit/), which is a well-established software library for building augmented reality applications.

• TUM ARDRONE (http://wiki.ros.org/tum_ardrone): this ROS package is based on ARDRONE AUTONOMY and has been developed by the Computer Vision Group of Technische Universität München. It implements autonomous navigation and figure flying in previously unknown and GPS-denied environments (Engel, Sturm, and Cremers 2012a; Engel, Sturm, and Cremers 2012b). The package is composed of two main nodes:

– DRONE STATE-ESTIMATION: This node provides the drone with SLAM capabilities. SLAM is the process of using onboard sensors to estimate the vehicle's position and then using the same sensor data to build a map of the environment around the vehicle. In particular, this node implements a SLAM algorithm based on Parallel Tracking and Mapping (PTAM) (Klein and Murray 2007). Finally, this node implements an EKF for state estimation and compensation of the time delays in the system arising from wireless LAN communication.

– DRONE AUTOPILOT: This node implements a PID controller for pose stabilisation and navigation. Based on the position and velocity estimates from the EKF, the PID control allows us to steer the quadcopter towards a desired goal location expressed in a global coordinate system. In addition, this node implements a scripting language to control the drone by sending it commands for initialising PTAM, taking off, landing, going to a specific waypoint and moving in a given direction with respect to the current position.

• TUM SIMULATOR (http://wiki.ros.org/tum_simulator): this is a ROS package that implements a Gazebo (http://gazebosim.org) simulator for the AR.Drone, developed by the Computer Vision Group of Technische Universität München. Since this Gazebo simulator can be used as a transparent replacement for the quadcopter, it allows us to develop and evaluate algorithms more quickly because they do not need to be run on the real quadcopter.

We have implemented two additional ROS nodes for our application: AR TAG FOLLOWING and AR PLANNER. The first node implements a PID controller that, based on the messages received from the AR RECOG package, allows the drone to follow the detected tag. The second node, AR PLANNER, wraps the OPTIC planner and allows us to integrate it with the rest of the system. The planner is invoked by the AR TAG FOLLOWING node when the tag has not been detected for an amount of time sufficient for the node to conclude that the target is lost. After the planner has built the plan, the plan's actions are translated into the scripting language provided by the DRONE AUTOPILOT node, which then imparts the commands to the ARDRONE DRIVER node.
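A hypothetical sketch of this hand-over using rospy: plan actions are turned into script command strings published on the /tum_ardrone/com topic shown in Figure 3. The command string format used below is a placeholder; the actual scripting syntax accepted by DRONE AUTOPILOT must be taken from the tum_ardrone package documentation.

import rospy
from std_msgs.msg import String

def dispatch_plan(plan):
    """Translate planner actions into autopilot script commands and publish them.

    plan -- list of (action_name, args) pairs produced by the planner,
            e.g. ("flyto", (x, y, z)); the action vocabulary is illustrative.
    """
    pub = rospy.Publisher("/tum_ardrone/com", String, queue_size=10)
    rospy.sleep(1.0)                       # give the connection time to establish
    for name, args in plan:
        if name == "flyto":
            x, y, z = args
            cmd = "goto %.2f %.2f %.2f 0" % (x, y, z)   # placeholder syntax
        else:
            rospy.logwarn("action not handled in this sketch: %s", name)
            continue
        pub.publish(String(data=cmd))

if __name__ == "__main__":
    rospy.init_node("ar_planner_dispatch")
    dispatch_plan([("flyto", (1.0, 0.0, 1.5)), ("flyto", (2.0, 1.0, 1.5))])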

As ARDRONE AUTONOMY is the driver for the drone, we use this package both for the tracking and for the search phase. We use the AR TAG FOLLOWING and AR RECOG nodes for the tracking phase and the DRONE STATE-ESTIMATION and DRONE AUTOPILOT nodes for the search phase. The AR PLANNER node acts as the interface between the tracking and the search phase. Figure 3 shows the active ROS nodes and topics for the search phase.

Figure 3: ROS nodes and topics involved in the search phase (nodes: /ardrone_driver, /drone_stateestimation, /drone_autopilot, /ar_planner; topics: /ardrone/image_raw, /ardrone/navdata, /cmd_vel, /tum_ardrone/com)

Although using ROS and leveraging existing packages for the AR.Drone facilitated the implementation of our application, working with a real quadcopter remains a time-consuming and challenging task for several reasons. First, flying a physical object (in contrast with a simulated one) requires solving several implementation and low-level issues and manually tuning a number of parameters. In addition, despite the fact that the Parrot AR.Drone is considerably more robust than other low-cost quadcopters, it is still an unstable system and its control is not as easy as the control of a ground robot. Finally, the majority of existing ROS packages for controlling the drone do not work for both versions of the drone (AR.Drone 1.0 and 2.0) and are not supported by the latest versions of ROS and Gazebo. This required us to modify the existing packages and make them compatible with the new version of the drone, as well as with the ROS and Gazebo frameworks.

7 Experimental Set-up

In order to evaluate our approach to SaT in outdoor scenarios, we intend to create an indoor analogue of the outdoor setting. As our project is still ongoing, we have not yet completed the set-up of our experiments and, consequently, we have not yet performed extensive experiments with our approach. In what follows, we describe the design of the experimental set-up, while we will report on the actual experimental results in future publications.

We use the AR.Drone 2.0 as the observer and a LEGO Mindstorms NXT assembled in the shape of a wheeled vehicle as the target (see Figure 4). This vehicle is able to detect and follow a line marked on the floor and is identified by the drone through an ARTag attached to it (see Figure 5).

We use markers on the floor to create a road network and boxes to create spaces where the drone is unable to see the vehicle. We currently assume that the drone flies at an altitude where there are no obstacles to interfere with its flight. However, we plan to revisit this assumption at a later stage and add walls with windows, poles and other obstacles in the environment.



Figure 4: Target vehicle used in our experiments

Figure 5: ARTag used to identify the vehicle

This will require additional control capabilities for our drone. Finally, we assume that the drone knows the configuration of the environment. Although the TUM ARDRONE and AR TAG FOLLOWING packages do not require a priori knowledge of the environment, the planner needs to know the configuration of the road network for generating plausible candidate search patterns. In future work, we intend to experiment with a different set of assumptions. For example, instead of knowing the road network, the drone might know the motion model of the target and reason about such a model in order to compute candidate search patterns.

In our previous work (Bernardini et al. 2013), we showed that our planning approach to SaT, i.e. viewing the search problem as a planning problem in which search patterns must be selected and sequenced to maximise the expectation of rediscovering the target, is successful and certainly outperforms static search strategies. That work differs from the work reported in this paper in a number of respects: it focuses on a fixed-wing aeroplane as the observer and a car as the target, the vehicles are free to move within a large geographical area, and the experimental results were obtained by running a computer simulation. At this stage, we cannot yet report on extensive experiments with our SaT approach in the new setting, as our work is still in progress. However, our initial experiments, limited to testing the different modules of our system, seem to suggest that the promising results we obtained in our original scenario could translate to the new setting.

8 Conclusions and Future Work

In this paper, we describe our ongoing work concerning the use of automated planning for the high-level operation of a low-cost and light-weight quadcopter engaged in SaT missions. We formulate the search problem as a planning problem in which search patterns must be selected and sequenced to maximise the expectation of rediscovering the target.

The approach described here builds on the work of Bernardini et al. (2013) on planning the behaviour of a UAV that searches for a moving target across large areas and over a long time. We have demonstrated that our planning-based approach to SaT for UAVs generalises well to the case of MAVs, although the temporal and spatial scales of UAV and MAV missions are quite different. We believe that, given the specific technical characteristics of MAVs, i.e. short range, low endurance and small payload capacities, augmenting their control architectures with planning capabilities is crucial to obtaining effective performance.

In future work, we intend to generalise our approach further to any situation in which an observer with limited resources needs to track a moving object, whose contingent behaviour is unknown, based on its motion model. Automated planning is ideally suited to generate intelligent behaviour in the face of uncertainty, tight deadlines and resource constraints.

References

[2007] Abbeel, P.; Coates, A.; Quigley, M.; and Ng, A. Y. 2007. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems 19. MIT Press.

[2009] Achtelik, M.; Bachrach, A.; He, R.; Prentice, S.; and Roy, N. 2009. Stereo Vision and Laser Odometry for Autonomous Helicopters in GPS-denied Indoor Environments. In Proceedings of the SPIE Unmanned Systems Technology XI, volume 7332.

[2010] Bachrach, A.; de Winter, A.; He, R.; Hemann, G.; Prentice, S.; and Roy, N. 2010. RANGE - Robust Autonomous Navigation in GPS-denied Environments. In 2010 IEEE International Conference on Robotics and Automation (ICRA), 1096–1097.

[2012] Benton, J.; Coles, A.; and Coles, A. 2012. Temporal Planning with Preferences and Time-Dependent Continuous Costs. In Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling (ICAPS-12).

[2013] Bernardini, S.; Fox, M.; Long, D.; and Bookless, J. 2013. Autonomous Search and Tracking via Temporal Planning. In Proceedings of the 23rd International Conference on Automated Planning and Scheduling (ICAPS-13).

[2011] Bills, C.; Chen, J.; and Saxena, A. 2011. Autonomous MAV Flight in Indoor Environments using Single Image Perspective Cues. In 2011 IEEE International Conference on Robotics and Automation (ICRA).

[2010] Coles, A. J.; Coles, A. I.; Fox, M.; and Long, D. 2010. Forward-Chaining Partial-Order Planning. In Proceedings of the 20th International Conference on Automated Planning and Scheduling.

[2009] Courbon, J.; Mezouar, Y.; Guenard, N.; and Martinet, P. 2009. Visual navigation of a quadrotor aerial vehicle. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5315–5320.

[2012a] Engel, J.; Sturm, J.; and Cremers, D. 2012a. Accurate figure flying with a quadrocopter using onboard visual and inertial sensing. In Proc. of the Workshop on Visual Control of Mobile Robots (ViCoMoR) at the IEEE/RSJ International Conference on Intelligent Robot Systems.

[2012b] Engel, J.; Sturm, J.; and Cremers, D. 2012b. Camera-based navigation of a low-cost quadrocopter. In Proc. of the International Conference on Intelligent Robot Systems (IROS).

[2009] Fan, C.; Song, B.; Cai, X.; and Liu, Y. 2009. Dynamic visual servoing of a small scale autonomous helicopter in uncalibrated environments. In 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5301–5306.

[2012] Graether, E., and Mueller, F. 2012. Joggobot: a flying robot as jogging companion. In CHI '12 Extended Abstracts on Human Factors in Computing Systems, CHI EA '12, 1063–1066. ACM.

[2008] He, R.; Prentice, S.; and Roy, N. 2008. Planning in Information Space for a Quadrotor Helicopter in GPS-denied Environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2008), 1814–1820.

[2012] Hehn, M., and D'Andrea, R. 2012. Real-time Trajectory Generation for Interception Maneuvers with Quadrocopters. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 4979–4984. IEEE.

[2007] Klein, G., and Murray, D. 2007. Parallel tracking and mapping for small AR workspaces. In Proc. Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR'07).

[2003] Long, D., and Fox, M. 2003. The 3rd International Planning Competition: Results and Analysis. Journal of Artificial Intelligence Research (JAIR) 20:1–59.

[2012] Lupashin, S., and D'Andrea, R. 2012. Adaptive fast open-loop maneuvers for quadrocopters. Autonomous Robots 33(1-2):89–102.

[2009] Moore, R.; Thurrowgood, S.; Bland, D.; Soccol, D.; and Srinivasan, M. 2009. A stereo vision system for UAV guidance. In Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on, 3386–3391.

[2007] Mori, R.; Hirata, K.; and Kinoshita, T. 2007. Vision-based guidance control of a small-scale unmanned helicopter. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2648–2653.

[2009] Purwin, O., and D'Andrea, R. 2009. Performing aggressive maneuvers using iterative learning control. In Robotics and Automation, 2009. ICRA'09. IEEE International Conference on, 1731–1736. IEEE.

[2009] Quigley, M.; Gerkey, B.; Conley, K.; Faust, J.; Foote, T.; Leibs, J.; Berger, E.; Wheeler, R.; and Ng, A. 2009. ROS: an open-source Robot Operating System. In Proceedings of the International Conference on Robotics and Automation (ICRA).

[2007] Roberts, J.; Stirling, T.; Zufferey, J.; and Floreano, D. 2007. Quadrotor using minimal sensing for autonomous indoor flight. In European Micro Air Vehicle Conference and Flight Competition (EMAV2007).

[2010] Zingg, S.; Scaramuzza, D.; Weiss, S.; and Siegwart, R. 2010. MAV navigation through indoor corridors using optical flow. In 2010 IEEE International Conference on Robotics and Automation (ICRA), 3361–3368.



Autonomous Energy Management as a High Level Reasoning for Planetary Rover Problems∗

D. Díaz
Universidad de Alcalá

Alcalá de Henares, Madrid (Spain)

A. Cesta and A. Oddi and R. Rasconi
CNR – Italian National Research Council

ISTC, Rome (Italy)

M.D. R-Moreno
Universidad de Alcalá

Alcalá de Henares, Madrid (Spain)

Abstract

This paper presents recent results on applying advanced autonomous reasoning capabilities to a planetary rover concept for synthesizing complete command plans that involve a wide assortment of mission requirements. Our solution exploits AI scheduling techniques to manage complex temporal and resource constraints within an integrated power-aware decision-making strategy. The main contribution of this work is threefold: (i) we propose a model of the world inspired by the Mars Sample Return (MSR) mission concept, a long-range planetary exploration scenario; (ii) we introduce an MSR-inspired scheduling problem called Power Aware Resource Constrained Mars Rover Scheduling (PARC-MRS), and we present an extension of a well-known constraint-based, resource-driven reasoner that returns rover activity plans as solutions of the PARC-MRS; finally, (iii) we conduct an exhaustive experimentation to report the quality of the generated solutions according to both feasibility and makespan optimization criteria.

Introduction

The forthcoming planetary exploration scene will call for ambitious robotic missions. Increasing the level of autonomy in those missions inevitably entails entrusting the rovers with higher level responsibilities, such as the synthesis of complete mission plans from high-level goal descriptions, plan adaptation/modification to address contingent situations, and even the possibility of performing opportunistic science and hazard prediction (Estlin et al. 2007).

In this work, the Mars Sample Return (MSR) mission concept (Treiman et al. 2009) is proposed as a plausible and efficient paradigm shift to continue the exploration of the Red Planet in the near future. Roughly speaking, the MSR mission consists of placing a rover on Mars' surface, gathering scientific samples from a set of scattered and challenging sites (up to many kilometers from the landing site) within relatively short time frames, and transporting them to a specific location where an ascent vehicle will be in charge of initiating the return trip. The proposed model encapsulates a wide range of interesting features which make it particularly challenging, as it involves: first, global path-planning, focused on "long-range navigation" planning in contrast to classical path-planning research, which addresses "local navigation" to trace safe routes between pairs of locations separated by a few meters from each other.

∗ An extended version of this paper will be published in the IEEE Computational Intelligence Magazine – November 2013.

Second, resource management, by analyzing the energy production/consumption profiles of all the plan activities. Third, a wide assortment of temporal constraints, such as absolute deadlines on the experiment execution (e.g., to communicate critical experimental results via orbiting relays), or rover inactivity periods (e.g., nights or solar storms) represented as static synchronization events of finite duration.

To this aim, we introduce an MSR-inspired scheduling problem called Power Aware Resource Constrained Mars Rover Scheduling (PARC-MRS), and present an extension of a well-known constraint-based, resource-driven reasoner that returns rover activity plans as solutions of the PARC-MRS. Our solving process exploits advanced Artificial Intelligence (AI) P&S constraint-based resource reasoning techniques, in particular "Precedence Constraint Posting" (PCP) (Oddi and Smith 1997; Cesta, Oddi, and Smith 2002), to reason upon a detailed model of the PARC-MRS problem instances. One of the contributions of this work is the exploitation of a known methodology to represent consumable resources by means of a cumulative scheme (Simonis and Cornelissens 1995), to model and solve the proposed scheduling problem. The remainder of the paper is structured as follows: we start with a detailed description of the mission scenario of reference. Next, we provide a detailed description of the extended constraint-based, resource-driven reasoner with integrated power-aware decision capabilities. Following that, we conduct an exhaustive experimentation to evaluate the efficiency of our solution algorithm. Finally, a conclusion and future work section closes the paper.

The Mars Sample Return mission scenario

In this section we provide a definition of the Power-Aware Resource Constrained Mars Rover Scheduling (PARC-MRS) problem, that is grounded on the commitment to the Mars Sample Return (MSR) (Treiman et al. 2009) reliability and efficiency baseline requirements: the first requirement refers to the need to synthesize plans capable of partially absorbing the effects of possible exogenous events arising during the plan's execution, while the second refers to the goal of minimizing the plan's completion time, thus maximizing the overall science return.

The attainment of the mission's goals requires the use of the rover's set of instruments/resources, whose utilization must be synchronized over time in order to guarantee the correct execution of the plan's activities.



Figure 1: Mission scenario overview: (1) navigation, (2) acquiring science (drill) and (3) sample release activities

Each rover activity a_i requires a specific amount of one or more resources during its entire execution.

Specifically, the soil extraction operations require a science acquisition asset and a sample cache (SC_r), which basically consist of a drilling subsystem and a container with a capability to store and transport up to C standard-sized samples, respectively.

Navigation tasks demand a Locomotion, Guidance, Navigation and Control (Loco & GNC) subsystem, which provides all the functionalities that allow the rover to reliably reach a desired target. The rover energy supply is provided by a powering subsystem consisting of a combination of a Solar Array (SA_r) as a primary power source, and a Battery (B_r). The battery is characterized by a maximum capacity or saturation level B_max (in Watt-hours) and by a minimum usage threshold B_min (expressed as a percentage of the maximum capacity), representing the minimum battery power level that can be reached, for safety reasons. During nominal rover operations, the power generated by the solar panels is sufficient to propel the rover and charge the batteries in the day time, while during the night the rover suspends every activity. The battery is required to sustain the execution of the soil extraction activities as well as to maintain the minimum operating temperature of the rover system during the night, but under certain conditions, it can also contribute with additional power for locomotion operations.

More formally, the PARC-MRS problem entails the synchronization of a set of resources R = {r_1, r_2, ..., r_m} to perform a set of n rover activities over time A = {a_1, a_2, ..., a_n}. The set of activities is organized along a set of n_e experiments (or job sequences) Exp = {Exp_1, ..., Exp_{n_e}}. More concretely, the complete execution of the i-th experiment Exp_i is modeled as a tuple composed of the following ordered activities:

Exp_i = 〈Nav_{S,i}, Drill_i, Nav_{i,F}, Rel_i〉    (1)

Figure 1 illustrates the PARC-MRS problem scenario, as well as the basic activities that are executed in a typical MSR mission: the Nav_{i,j} activities represent the long-range traversals for science acquisition or sample delivery between two different locations i and j (the initial location of the rover and the location of the ascent vehicle are denoted with S and F, respectively); the Drill_i activities represent the deployment of the onboard science collection system (i.e., a drilling instrument) to retrieve and store a soil sample situated at the location or waypoint i; finally, the Rel_i activities represent the releasing of the sample (collected at the waypoint i) at the final location, where the Mars ascent vehicle is in charge of uploading it into orbit. The ascent vehicle is equipped with a robotic arm (A_r) that is used to recover the soil samples collected by the rover. Every rover activity undergoes complete suspension periods during the nights, which can have different durations depending on the Martian season.

A feasible solution S = {st_1, st_2, ..., st_n} is an assignment to the start times st_i of the activities a_i ∈ A imposing a total order among all the activities a_k ∈ {Drill_i, Rel_i : i = 1, 2, ..., n} ⊂ A, and satisfying the following set of constraints.

• Temporal Constraints - S is consistent with the partial ordering imposed in each sequence Exp_i. Pairs of consecutive activities in each sequence Exp_i are supposed to be contiguous, i.e., in every sequence, the end time of each previous activity coincides with the start time of the following activity. The durations of the Drill_i and Rel_i activities are lower bounded by the time required to complete the science extraction and the release operations, respectively. The minimum duration of the Nav_{i,j} activities depends on the nominal traversal time (tt_{ij}) required to travel the distance between the pair of i, j waypoints. In this work, waypoint-to-waypoint paths are considered as sequences of straight, traversable segments computed during the mission preparation phase. Finally, the completion time of some of the Exp_i sequences might be constrained by an absolute deadline d_i.

• Sample Cache Constraints - the number of samples contained at all times in the cache SC_r cannot exceed the rover's maximum sample capacity C.

• Energy Constraints - in our model, the execution of the power-demanding activities (i.e., drilling operations and navigation) requires a certain amount of energy that has to be completely available at the beginning of the activity.

Nav_{i,j} activities demand a variable amount of energy e_{ij}, which depends on the traveling distance between the two different locations i and j, while the Drill_i activities demand a constant amount of energy e_i necessary to operate the drill subsystem.

The rover powering subsystem imposes an additional global constraint on the set of activities A. In particular, the global production/consumption battery usage profile B(t) is computed according to the hypothesis that the onboard rover solar arrays produce a continuum of energy at a monotonic rate σ_charge (Watts). The generated power is directly used to both propel the rover and charge the battery up to the saturation level B_max. The surplus energy, if any, is discarded as the battery cannot be charged in excess of B_max (saturation). As the activities a_i consume the energy instantaneously at their start times st_i, we can consider the assessment of the usage profile B(t) only for t = st_i, with i = 1, 2, ..., n.
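A minimal sketch of this battery-profile check under the stated hypotheses (constant charge rate σ_charge, saturation at B_max, instantaneous consumption at activity start times); the numeric values in the example and the use of an absolute minimum level (the paper expresses B_min as a percentage of B_max) are simplifying assumptions.

def battery_profile_feasible(start_times, demands, b0, b_max, b_min, sigma_charge):
    """Check the battery usage profile B(t) at the activity start times.

    start_times -- start times st_i; demands -- energy drawn instantaneously
    at each start time; b0 -- initial charge; b_min -- minimum allowed level
    (used here as an absolute value for simplicity). Units must be consistent.
    """
    events = sorted(zip(start_times, demands))
    level, t_prev = b0, 0.0
    for t, demand in events:
        level = min(b_max, level + sigma_charge * (t - t_prev))  # charge, clipped
        level -= demand                                          # instantaneous draw
        if level < b_min:
            return False
        t_prev = t
    return True

# Example: three activities drawing 20, 35 and 15 units of energy.
print(battery_profile_feasible([10.0, 40.0, 90.0], [20.0, 35.0, 15.0],
                               b0=50.0, b_max=100.0, b_min=10.0,
                               sigma_charge=0.5))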



Figure 2: Activity-on-the-node graph representation of the problem model: the edges represent the precedence constraints, while the nodes (boxes) represent the activities; the resource usage information is shown within each box

Finally, an optimal solution S∗ is a feasible solution where the plan's completion time, defined as the highest end time among all the plan's activities, is minimized.

The constraint-based solving algorithm

In this section we present in detail the profile-based, power-aware reasoning algorithm ESTA^p developed in this work to provide feasible solutions for the PARC-MRS problem, as well as a meta-heuristic strategy for solution optimization.

The constraint-based PARC-MRS problem representation

We formulate the PARC-MRS scheduling problem in terms of a Constraint Satisfaction Problem (CSP) (Montanari 1974). In our CSP-based formulation of the PARC-MRS, a set of decision variables called Minimal Critical Sets (MCSs) is identified. An MCS is defined as a set of activities that simultaneously require a resource r_k with a combined capacity requirement greater than the resource's total capacity, such that the combined requirement of any subset is less than or equal to the resource capacity. From the definition of an MCS, it follows that the posting of a single precedence between some pair of activities in the MCS is sufficient to eliminate the conflict. Each MCS variable is associated with a domain of feasible values corresponding to the set of precedence constraints that can be posted to resolve the MCS (i.e., the possible orderings allowed between any pair of activities belonging to the same MCS).

Two different types of solving separation constraints are considered: simple precedence and traversal time constraints, denoted respectively as a_i ≺ a_j and a_i ≺_val a_j, where val is the minimum separation value that must hold between a_i and a_j. Traversal time constraints are posted between drilling and/or sample release activity pairs in order to properly model the traveling times among the different locations, while simple precedence constraints are used in all the other cases.

To support the search for a consistent assignment to the set of MCS variables, for any PARC-MRS problem instance we can define a temporal constraint network which maps the temporal constraints in the problem to distance constraints between appropriate time points (i.e., the activity start times and/or end times); such a temporal constraint network corresponds to the so-called Simple Temporal Problem (STP) (Dechter, Meiri, and Pearl 1991), and is formulated as a CSP (ground-CSP). Thus, our PARC-MRS formulation can be seen as a meta-CSP formulation, which utilizes the ground-CSP representation for the underlying temporal reasoning, on top of which a second CSP problem is formulated that enables resource constraint reasoning.
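For illustration, the consistency of such an STP can be checked by looking for negative cycles in the distance graph induced by the time points; the sketch below uses a standard Bellman-Ford pass and is a generic STP check, not the authors' implementation.

def stp_consistent(num_points, constraints):
    """Check a Simple Temporal Problem for consistency.

    constraints -- list of (i, j, w) edges meaning t_j - t_i <= w.
    Returns True iff the induced distance graph has no negative cycle
    (Bellman-Ford from a virtual source connected to every time point).
    """
    dist = [0.0] * num_points            # virtual source at distance 0 to all
    for _ in range(num_points):
        updated = False
        for i, j, w in constraints:
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                updated = True
        if not updated:
            return True                  # fixed point reached: consistent
    # A further relaxation round would still update: negative cycle.
    for i, j, w in constraints:
        if dist[i] + w < dist[j]:
            return False
    return True

# Example: points 0 = start of a1, 1 = end of a1, 2 = start of a2;
# a1 lasts at least 5 time units and must end before a2 starts.
print(stp_consistent(3, [(1, 0, -5), (2, 1, 0)]))   # -> True (consistent)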

Figure 2 presents an activity-on-the-node graph representation of the PARC-MRS problem considered here. In the graph, the nodes represent the problem activities, each characterized by (i) a pair of time points (indicating the starting and end times), (ii) a resource demand, where U(a_i, r_k) represents the amount of resource r_k required by the activity a_i, and (iii) a specific (flexible) duration, i.e., expressed as a temporal interval [lb, ub]; the edges correspond to the precedence relation constraints between the activities, again expressed as temporal intervals. The graph contains two special time points (A and B) indicating the schedule's time origin and horizon, respectively.

The integrated power-aware, resource-driven ESTA^p solver

The proposed ESTA^p procedure for solving instances of the PARC-MRS problem is based on the precedence constraint posting (PCP) approach (Smith and Cheng 1993; Cheng and Smith 1994), which consists of deciding and posting a set of temporal precedence constraints that eliminates all the resource contentions. Basically, ESTA^p is a modified version of the basic profile-based schema of ESTA (Cesta, Oddi, and Smith 2002), which provides cumulative-based resource reasoning by "iteratively leveling contention peaks" through the exploitation of a new set of dominance conditions introduced in (Oddi et al. 2011) that allow us to take into account both the simple and setup-time precedence constraints within the general problem-solving strategy.

Algorithm 1 shows ESTAp's resolution process in detail. The algorithm receives as input a description of the scheduling problem according to the constraint-based specification introduced in the previous section, and iteratively performs a solving sequence composed of three steps: (i) checking the temporal consistency of the current partial solution; (ii) estimating the utilization of all the resources throughout the current solution, i.e., profiling the sample cache, rover and battery resources; and (iii) identifying and resolving all of the resource conflicts possibly existing in the current solution.



Algorithm 1: ESTAp algorithm.
Input: Problem, Horizon
Output: FeasibleSolution, EmptySolution

 1:  <meta-CSP, ground-CSP> ← CreateCSP(Problem)
 2:  loop
 3:      if CheckConsistency(ground-CSP) then
 4:          // Earliest Start-time Solution extraction
 5:          ESS ← ExtractESS(ground-CSP)
 6:          // Resource profiling
 7:          ComputeResourceUsages(ESS)
 8:          // Resource contention peaks levelling
 9:          meta-CSP ← ComputeMCSs(ground-CSP)
10:          if ConflictFree(meta-CSP) then
11:              FeasibleSolution ← ExtractSolution(ground-CSP)
12:              Return(FeasibleSolution)
13:          else
14:              if Unsolvable(meta-CSP) then
15:                  Return(EmptySolution)
16:              else
17:                  MCS ← SelectMCS(meta-CSP)
18:                  PrecedenceConstraint ← SelectPrecedence(MCS)
19:                  ground-CSP ← PostConstraint(ground-CSP, PrecedenceConstraint)
20:      else
21:          Return(EmptySolution)
22:  end-loop

In the following sections we provide a detailed description of each of these steps.

Step 1: constraint propagation & temporal consistency checking. Within this step, the temporal constraint network (ground-CSP) underlying the problem is checked for consistency by the function CheckConsistency(ground-CSP) (line 3 of Algorithm 1). If the ground-CSP is found to be inconsistent, the procedure exits immediately.
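Conceptually, this check amounts to negative-cycle detection on the distance graph of the STP underlying the ground-CSP. The sketch below is an illustration under our own encoding assumptions, not the actual CheckConsistency implementation:

    def stp_consistent(num_timepoints, edges):
        """Consistency check for a Simple Temporal Problem.

        edges: list of (u, v, w) triples encoding t_v - t_u <= w.
        A precedence a_i ≺ a_j contributes (start_j, end_i, 0); a separation
        a_i ≺val a_j contributes (start_j, end_i, -val).  The STP is
        consistent iff the distance graph contains no negative cycle.
        """
        # Implicit virtual source at distance 0 from every time point.
        dist = [0.0] * num_timepoints
        for _ in range(num_timepoints - 1):
            updated = False
            for u, v, w in edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    updated = True
            if not updated:
                break
        # A still-relaxable edge reveals a negative cycle (inconsistency).
        return all(dist[u] + w >= dist[v] for u, v, w in edges)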

Step 2: Resource usage profiles computation. In this step (lines 4-7), the algorithm extracts a solution and estimates all the resource usage profiles. At line 5, an Earliest Start-time Schedule (ESS, see footnote 1) is extracted from the partial CSP solution (the ExtractESS(ground-CSP) function), while at line 7 the ComputeResourceUsages(ESS) function returns all the resource utilization profiles on the basis of the ESS solution.
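As an illustration of the profiling step (a sketch under simplified assumptions, not the ComputeResourceUsages implementation), a piecewise-constant usage profile can be obtained from the ESS projection with a simple event sweep:

    def resource_profile(activities):
        """Build a piecewise-constant usage profile for one resource.

        activities: (start, end, demand) triples taken from the ESS projection.
        Returns the sorted list of (time, usage-after-time) breakpoints.
        """
        events = []
        for start, end, demand in activities:
            events.append((start, demand))   # demand is acquired at the start
            events.append((end, -demand))    # ...and released at the end
        usage, profile = 0, []
        # Negative deltas sort before positive ones at equal times,
        # so releases are applied before acquisitions.
        for time, delta in sorted(events):
            usage += delta
            profile.append((time, usage))
        return profile

    # Contention peaks are the breakpoints whose usage exceeds the capacity.
    print(resource_profile([(0, 4, 2), (2, 6, 2), (5, 8, 1)]))

For the battery, the same sweep can be applied to the consumer and producer pseudo-activities introduced below.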

1 An ESS is a consistent temporal assignment in which every time point takes the lower bound of its feasibility interval.

In this work, in order to reduce the consumable behavior of the battery to the cumulative scheme used by ESTA, we exploit a modified version of the model introduced by Simonis (Simonis and Cornelissens 1995) (see footnote 2). Below, we describe how the estimation of the overall power respectively consumed and produced by all the activities of the schedule is computed according to our adaptation of the Simonis model.

Energy consumption profile. Figure 3 (left) illustrates an example of how energy consumptions are modeled in our framework: we introduce as many battery-consuming activities, or energy consumers (Cons1 and Cons2), as there are plan activities that require energy (A1 and A2 in the figure). As shown in the figure, the energy consumers' end times are constrained to coincide with the horizon time point (i.e., the end of the schedule), while their start times are constrained to match the start times of A1 and A2 (i.e., the instants at which a specific amount of energy is required). This expresses the fact that each amount of energy required by a task is lost forever (unless replenished by a producer task), hence modeling the typical consumable resource behavior.

Energy production profile. The computation of the energy production profile follows a logic that is exemplified by Figure 3 (right). As a consequence of adopting the Simonis model, the continuous charging-rate curve (i.e., the σcharge-rate charging profile) is approximated by a sequence of small, discrete chunks of energy producers (the Prodi activities in the figure) distributed along the complete horizon. The result is a piecewise-constant representation of the energy production profile; each chunk of energy is modeled as a time-fixed activity which produces an amount of energy equal to the nominal quantity of energy collected during the related piecewise segment, minus the energy possibly lost because of saturation during the same segment. Each energy producer activity starts at the beginning of the schedule (i.e., the origin time point) and terminates at the instant at which the battery is charged with the associated energy chunk (i.e., the energy chunk is released).
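A rough sketch of how the consumer and producer pseudo-activities just described could be generated follows (illustrative only; the names, the fixed-step discretisation and the omission of the saturation correction are our simplifications of the adapted Simonis model):

    def make_energy_consumers(energy_requirements, horizon):
        """One consumer per energy-requiring activity: it holds the consumed
        amount from the activity's start time until the schedule horizon, so
        the energy stays 'spent' unless a producer replenishes it."""
        return [(start, horizon, amount) for (start, amount) in energy_requirements]

    def make_energy_producers(charge_rate, horizon, step, origin=0.0):
        """Piecewise-constant approximation of the continuous charging curve:
        each segment's nominal energy becomes a producer that starts at the
        schedule origin and ends when the chunk is released to the battery."""
        producers, t = [], origin
        while t < horizon:
            seg_end = min(t + step, horizon)
            energy = charge_rate(t) * (seg_end - t)
            if energy > 0:
                producers.append((origin, seg_end, energy))
            t = seg_end
        return producers

    # Example: two consuming activities and a crude day/night charging profile.
    consumers = make_energy_consumers([(10.0, 5.0), (30.0, 3.0)], horizon=100.0)
    producers = make_energy_producers(lambda t: 0.2 if t % 48 < 24 else 0.0,
                                      horizon=100.0, step=4.0)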

Figure 4 presents an MSR problem instance composed of two job sequences (top) together with the set of the related energy-consuming activities (four consumers for each sequence); at the bottom of the figure, the resulting overall battery usage profile is drawn (energy consumptions are depicted as red down arrows, while energy productions are depicted in green). As shown in the figure, the relation between the rover activities and their related consumers is as follows: the first consumers (i.e., those starting at t0) refer to the consumption of the initial traversals to be performed before reaching the drill locations; the second consumers refer to the soil extraction operations (Drilli); the third consumers refer to the navigation activities between the soil extraction and the final location; and the last consumers of each job (i.e., those attached to the end of the Reli activities) are introduced to model the consumption of further possible movements starting from the final location.

2 The Simonis model was originally proposed to model consumable resources following a classical cumulative scheme. The reference model basically copes with stock-based consumable resources (such as a fuel tank or a storage warehouse) in flow-shop or job-shop application contexts.



Figure 3: Energy consumption (left) and production (right) constraints representation

The energy consumption profile is ultimately computed as the sum of all the power demands (depicted as downward red arrows) of all the consumer activities across the whole schedule's makespan.

Step 3: Resource contention peaks leveling. This step (lines 8-19 in Algorithm 1) constitutes the most important part of the solving process, as it deals with the identification and the resolution of the next resource conflict on the basis of a specific heuristic rationale. This decision process is known as "resource contention peak leveling", since it consists of (i) identifying the resource over-consumptions and (ii) flattening them through the imposition of new precedence constraints which temporally separate the execution of the contending activities, according to the steps below.

Resource conflict detection. Firstly, a resource usage analysis is performed with the aim of determining all possible resource capacity violations (i.e., the resource contention peaks), by identifying the sets of activities that are executed concurrently (on the basis of the ESS projection) and that cause resource over-consumptions by globally requiring a resource in excess of its maximum capacity.

More concretely, a meta-CSP is computed by extracting a set of Minimal Critical Sets (MCSs) from each resource contention peak (the ComputeMCSs(ground-CSP) function, line 9). For example, contention peaks occurring on the battery resource are detected at all instants ti where the overlapping of (at least) one production and one consumption activity causes the battery level to fall below the Bmin value. Figure 4 shows an example of a battery contention peak spanning a temporal interval (denoted as a critical segment) during which the battery usage profile remains below the threshold energy level Bmin (i.e., the battery is over-consumed).
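For the battery, the peak-detection step reduces to finding the critical segments in the piecewise-constant profile. A minimal sketch (ours, not the authors' code):

    def critical_segments(battery_profile, b_min):
        """Return the (start, end) intervals in which a piecewise-constant
        battery profile, given as sorted (time, level) breakpoints, stays
        below the minimum admissible level b_min."""
        segments, seg_start = [], None
        for time, level in battery_profile:
            if level < b_min and seg_start is None:
                seg_start = time                      # critical segment opens here
            elif level >= b_min and seg_start is not None:
                segments.append((seg_start, time))    # ...and closes here
                seg_start = None
        if seg_start is not None:                     # still below b_min at the last breakpoint
            segments.append((seg_start, battery_profile[-1][0]))
        return segments

    print(critical_segments([(0, 10), (5, 2), (8, 12), (12, 1)], b_min=3))
    # [(5, 8), (12, 12)]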

Resource conflict resolution. Subsequently, the function SelectMCS(meta-CSP) (line 17) is invoked to return the next MCS to resolve. The MCS is chosen according to the most-constrained-variable ordering heuristic, so as to select the MCS candidate (decision variable) characterized by the smallest temporal flexibility, i.e., a function of the degree to which its constituent activities can be reciprocally shifted in time; the idea is that the less flexibility an MCS has, the more critical it is to resolve it first. Once the MCS is selected, the function SelectPrecedence(MCS) (line 18) is in charge of (i) selecting a pair of activities from the MCS and (ii) deciding their relative ordering so as to resolve the MCS. This decision follows the least-constraining-value ordering heuristic: the greater the flexibility retained after posting a precedence constraint, the more desirable it is to post that constraint.
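The two heuristics can be summarised schematically as follows (a sketch only; flexibility, orderings and retained_flexibility stand for functions computed on the underlying STP and are not spelled out here):

    def select_mcs(mcss, flexibility):
        """Most-constrained-variable ordering: pick the MCS with the smallest
        temporal flexibility, i.e. the most critical one to resolve first."""
        return min(mcss, key=flexibility)

    def select_precedence(mcs, orderings, retained_flexibility):
        """Least-constraining-value ordering: among the pairwise precedence
        constraints that resolve the MCS, post the one whose addition retains
        the greatest temporal flexibility in the network."""
        return max(orderings(mcs), key=retained_flexibility)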

The three steps previously discussed are iterated until (i) a conflict-free solution schedule (i.e., temporally and resource feasible) is found, or (ii) a temporal inconsistency is detected. In the second case, the algorithm stops, as it has reached a dead-end situation.

Providing better solutions

Both the feasibility and the optimization versions of the scheduling problem addressed here are NP-hard, and therefore cannot be solved in reasonable time by using systematic, non-informed search techniques. As previously stated, the solutions provided by the ESTAp algorithm are generally far from optimal, as the ESTAp procedure is only concerned with providing feasible solution schedules. Therefore, we embedded our ESTAp algorithm within an iterative sampling optimization loop, similarly to the approach used in the Iterative Sampling Earliest Solutions (ISES) (Cesta, Oddi, and Smith 2002) strategy for makespan minimization, an efficient multi-pass approach which performs quite well on scheduling problems involving very large search spaces. More concretely, ISES is a stochastic procedure that controllably broadens the exploration of the search space without incurring the exponential cost of classical backtracking strategies, by iterating a non-deterministic version of ESTAp's conflict selection heuristic (called ESTAp_rand) across solutions characterized by increasingly smaller temporal horizons.

The solution we employ in this work is a simplified version of the ISES procedure (still referred to as ISES for simplicity), and is illustrated in Algorithm 2. The procedure receives as inputs (i) a scheduling problem specification, (ii) an initial "sufficiently large" horizon value (MaxH), and (iii) two additional parameters that control the stop conditions, i.e., the maximum CPU time allotted for optimization (MaxTime) and the maximum number of iterations permitted without any improvement (MaxAttempts). Our ISES version works according to the following two basic steps: (i) an initial, deterministic invocation of ESTAp with the horizon value MaxH (line 1), in order to find the first feasible solution, and (ii) the execution of an optimization loop.



Figure 4: Example of energy profile computation: the consumption components of the profile (eij) are depicted in red, while the production components are depicted in green

Algorithm 2: The iterative sampling search framework (ISES) for solution optimization.
Input: Problem, MaxH, MaxTime, MaxAttempts
Output: Sbest

1:  Sbest ← ESTAp(MaxH)
2:  while (¬ StopCondition(MaxTime, MaxAttempts)) do
3:      Sol ← ESTAp_rand(Mk(Sbest))
4:      if (Mk(Sol) < Mk(Sbest)) then
5:          Sbest ← Sol

The optimization loop consists of successive calls to ESTAp_rand in which, at each iteration, the temporal horizon is reduced to the makespan Mk(Sbest) of the best solution found so far (line 3), thus forcing the algorithm towards solutions of increasingly smaller makespan. The algorithm returns the best solution encountered when either of the two stop conditions described above is met.

Experimental analysis

In this section we conduct an experimental analysis aimed at assessing both (a) the efficiency of our solving algorithm ESTAp, and (b) the effectiveness of an optimization framework based on the ISES strategy.

The MSR benchmark problem sets

Due to the lack of reference scheduling problem instances in the literature that suitably match the characteristics of our MSR problem description, we decided to generate a problem library by using our own benchmark instance generator MSR/Gen (see footnote 3) for the class of problems here referred to as PARC-MRS.

The benchmark library used in this work has been instantiated by using seed templates whose baseline parameters were carefully selected from specifications characterizing recent real-world rover-based missions. More concretely, we have based our benchmark production on one of the rover models contained within ESA's 3DROV simulator (Poulakis et al. 2008), an advanced planetary rover design, visualization and validation tool. The 3DROV rover model represents a prototype of one of the possible configurations of the ExoMars rover, part of a planned Mars mission to search for possible biosignatures of Martian life (van Winnendael, Baglioni, and Vago 2005).

The MSR benchmark library used in the experimental phase of this work consists of three benchmark sets (containing 40 problem instances each), where each set is composed of instances characterized by 20, 25 and 30 experiment sequences respectively (referred to as MSR40–20, MSR40–25 and MSR40–30; see footnote 4).

Experimental results

The empirical analysis has been organized in two parts, concerning the feasibility and the optimization assessments respectively. The former conveys the results related to the execution of the deterministic ESTAp algorithm in computing a feasible schedule solution, while the latter focuses on the outcome produced by the optimization framework (ISES) in attempting to improve the results obtained from the feasibility analysis.

3 The MSR/Gen Java code can be downloaded from the following link: atc1.aut.uah.es/~mdolores/PARC MSR

4 The complete MSR benchmark library, as well as a self-contained description of the specific format of each problem instance and of the seed templates, can be downloaded at the following link: atc1.aut.uah.es/~mdolores/PARC MSR



In either case, we solved the benchmark instance sets presented above under three different environmental conditions, depending on the period of the year in which the mission takes place, i.e., summer, winter and mid-season. The idea is to study the performance of the solar-array and battery-powered rover on the problem at hand under different conditions of available daylight. More concretely, a Martian day is slightly longer than 24 terrestrial hours (here considered exactly 24 hours for simplicity) and, depending on the season and on the latitude of the rover's area of operations, daytime periods may vary from approximately 16 hours (nights lasting 8 hours) to 8 hours (nights lasting 16 hours). In our study we also considered a "mid-season" situation in which each day is equally divided into 12 hours of daylight and 12 hours of night. In the model we use the simplifying assumption that day/night transitions occur instantaneously. Regardless of the season, feasible solution plans must guarantee the rover's capability to retain the energy required to keep the rover subsystems sufficiently warm during the night inactivity cycles. It should be noted that, for obvious reasons, such heating power can only be supplied by the onboard battery.

As explained in the previous section, 3 different benchmark sets are used in the experimental campaign, labeled according to the notation MSR40–x–yh, where x denotes the number of jobs of each instance of the set (x ∈ {20, 25, 30}) and y refers to the duration, expressed in hours, of the night periods (y ∈ {8, 12, 16}, corresponding to summer, mid-season and winter light conditions, respectively). Every problem instance contains four experiment sequences characterized by deadline constraints (therefore defined as critical), two of which are forced to be executed at some random instant before the 10th day of the mission, while the other two are forced to be completed before the 20th day of the mission.

Table 1 collects the results of both the feasibility assessment and the optimization assessment for each benchmark set. All the reported figures are computed by averaging the data obtained from the 40 instances belonging to each set. The results shown in each column of the table have the following meaning:

– Mkspavg(mins) is the average solution makespan length (expressed in minutes).

– CPUavg(secs) is the average CPU computation time (expressed in seconds).

– Cacheavg(%) represents the rover's average sample cache usage (expressed as a percentage; see footnote 5) along the whole plan's horizon.

– Batavg(%) is the average battery resource usage (expressed as a percentage; see footnote 5) along the whole plan's horizon.

5 Resavg = [1/(n · maxCap)] · Σ_{i=1..n} [ (∫_0^{mk_i} f_i(t) dt) / mk_i ] · 100, where n is the number of problem instances, maxCap is the resource's maximum capacity, mk_i is the solution makespan of instance i, and f_i(t) is the curve representing the resource utilization profile along the complete makespan.

– ΔavgLWU(%) conveys the average improvement ratio (expressed as a percentage; see footnote 6) between the makespan lengths of the initial and the optimized solutions.

– #Iter(avg) is the average number of iterations performed by ISES while attempting to improve the initial solution within an estimated maximum time window of 10 minutes (or after 200 consecutive attempts if no makespan improvement is obtained).

A maximum CPU time of 10 minutes has been allotted for each optimization run. In both assessments we considered an initial mission horizon of 138 Sols (Martian days). Finally, the experiments were executed on an Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz machine with 4GB of RAM.

From the observation of the obtained results, we can draw the following conclusions. Concerning the results returned by ESTAp, it can be observed that, for all three benchmark sets, the average makespans (Mkspavg(mins) column, feasibility assessment) follow an increasing trend as the daylight periods shorten, confirming our expectations about the significant impact of the seasonal conditions on solution quality (the results show that in some cases the plan's duration can be as much as doubled). Still with regard to the makespan, we can appreciate the significant improvement rates provided by ISES (Mkspavg(mins) column, optimization assessment), ranging from a 35.6% improvement for the MSR40–20–8h instances to an 8.5% improvement for the MSR40–30–16h instances. It should however be observed that, in the latter case, only an average of ≈ 3 optimization iterations were possible within the allotted time of 10 minutes (see the #Iter(avg) column).

Still with regard to the average makespan improvements, it can be observed that the deteriorating seasonal lighting conditions severely affect the optimization quality (ΔavgLWU(%) column), as the room for "compacting" the plan's activities decreases for reasons related both to the longer rover quiescence periods and to the higher amount of battery power that must be charged before the rover goes off-duty. In fact, this power (which might otherwise be used to perform a number of pre-dusk activities that must instead be postponed to the following day) has to be saved to guarantee proper heating of the equipment during the longer nights.

With regard to the average battery power utilization (see both Batavg(%) columns), we observed a rather regular trend confirming that the shorter the Martian days, the higher the average battery power demand. While this result may seem quite straightforward (e.g., more battery power is required to safely "survive" the longer nights), the fact that approximately the same amount of power is used in both the baseline and the makespan-optimized solutions is puzzling.

One possible explanation may be derived directly from the formula used for the Batavg assessment (footnote 5).

6 ΔavgLWU = (1/n) · Σ_{i=1..n} [ (mk_i − mk0_i) / mk0_i ] · 100, where mk_i corresponds to the makespan length of the optimized solution provided by ISES, and mk0_i is the makespan length of the initial solution provided by ESTAp.



                    ESTAp (feasibility assessment)                ISES (optimization assessment)
Benchmark          Mkspavg    CPUavg   Cacheavg  Batavg    Mkspavg    ΔavgLWU  Cacheavg  Batavg  #Iter(avg)
MSR40–20–8h       17027.2      50.367    28.575   8.864   12620.325   35.628    30.473    8.679    14.575
MSR40–20–12h      19336.232    55.609    33.062  21.511   15990.45    26.399    33.866   21.395    12.375
MSR40–20–16h      30144.4      59.440    23.535  39.771   25729.3     17.239    25.795   39.672    11.3
MSR40–25–8h       23826.228   108.793    30.458   8.91    20870.925   21.805    33.954    8.861     7.85
MSR40–25–12h      29249.686   121.651    28.153  21.598   26293.85    17.907    28.481   21.523     6.05
MSR40–25–16h      41558.232   163.706    22.062  39.825   37720.55    11.785    24.801   39.783     4.725
MSR40–30–8h       30605.825   181.842    32.674   8.889   26643.7     15.588    35.213    8.857     5.775
MSR40–30–12h      36516.125   207.214    30.166  21.639   33517.45     9.248    30.995   21.608     3.975
MSR40–30–16h      52707.65    272.359    22.435  37.185   48680.2      8.569    22.819   38.034     2.925

Table 1: Experimental results corresponding to the feasibility and optimization assessments

As that formula shows, while shorter plans should require less battery power (e.g., the distances traveled are shorter), the Batavg value is inversely proportional to the makespan (i.e., an optimized, shorter makespan increases the Batavg value). Despite this, the very close correspondence of the values in all cases remains to be fully explained.

Finally, the average rover cache utilization data (both Cacheavg(%) columns) deserve some attention. Looking at the Cacheavg(%) columns, a decreasing utilization of the cache can be observed as the seasonal situation moves from summer to winter daylight conditions. This can be noticed for all the MSR40–25–∗ and MSR40–30–∗ benchmark sets, and the same behavior applies to both the feasibility and the optimization assessment data (even though in the makespan-optimized solutions the average cache utilization tends to increase). This circumstance is easily explained as a direct consequence of the longer times necessary to complete the same missions under less favorable power-charging conditions (i.e., longer plan makespans entail a less efficient cache utilization). Yet, it can also be observed that in the MSR40–20–∗ case the previous regular trend is not followed: as the lighting conditions worsen, there is a "counterintuitive" behavior in which the average cache utilization first seems to increase, before eventually falling to the expected values. This "anomaly" in the general trend might be explained by the influence of the maximum time windows on the execution of some job sequences, which may cause the rover to decide not to release all of the acquired samples at the AV location before heading for a new experiment's location, in order to satisfy some experiment-related deadline constraint. In all such circumstances the cache utilization tends to increase, as the cache remains occupied by the unreleased samples. The reason this phenomenon becomes evident only with the smaller instances (i.e., those composed of 20 experiment sequences) is that, since each problem instance always has 4 sequences characterized by a deadline (regardless of its size), such deadlines become more relevant for the instances with a higher ratio of constrained to unconstrained sequences.

Conclusions and Future Work

In this paper we presented our latest results on delivering advanced autonomous reasoning capabilities to robotic planetary exploration. In our current work we were inspired by the requirements of a particular rover-based Mars exploration mission, namely the Mars Sample Return (MSR) mission concept. One contribution of this work is the integration of the most significant MSR mission requirements into a scheduling problem model, the Power Aware Resource Constrained Mars Rover Scheduling (PARC-MRS) problem. Following the proposed model, we presented a scheduling algorithm aimed at synthesizing complete plan sequences that span the whole mission horizon by reasoning upon a wide set of realistic mission requirements. More concretely, the reasoner we propose builds on a number of results from previous research, and provides an extension of a well-known constraint-based, resource-driven procedure that exploits power-aware reasoning capabilities within an integrated resolution strategy, where a wide variety of complex temporal and resource constraints are considered, with special attention paid to the energy requirements. Indeed, one of the main contributions of this work is the successful exploitation of a well-known methodology for representing consumable resources by means of a classical cumulative scheme to model and solve the PARC-MRS problem. We also conducted an experimental assessment to evaluate the efficiency of our solution algorithm, as well as the effectiveness of an optimization schema in providing minimum-makespan solutions.

The contents of this paper describe ongoing work. More activities are currently being carried out in several directions, such as the refinement of the terrain model to take into account characteristics such as slope, compactness, roughness, etc., in view of a fully dynamic utilization of the scheduling engine in a simulated Sense-Plan-Act loop execution context. It was outside the scope of this paper to present the preliminary results of such experimentation; the interested reader may refer to (Díaz et al. 2012) for more information on the current state of activities.

Acknowledgments

Daniel Diaz is supported by the European Space Agency (ESA) under the Networking and Partnering Initiative (NPI) Autonomy for Interplanetary Missions (ref. 2169/08/NI/PA). The authors are grateful for all the support obtained through ESA-ESTEC, especially from the ESA technical officer, Mr. Michel Van Winnendael. The last author is funded by the CDTI project COLSUVH.



References

Cesta, A.; Oddi, A.; and Smith, S. F. 2002. A constraint-based method for project scheduling with time windows. J. Heuristics 8(1):109–136.

Cheng, C., and Smith, S. 1994. Generating Feasible Schedules under Complex Metric Constraints. In 12th National Conference on AI (AAAI-94).

Dechter, R.; Meiri, I.; and Pearl, J. 1991. Temporal constraint networks. Artificial Intelligence 49:61–95.

Diaz, D.; R-Moreno, M.; Cesta, A.; Oddi, A.; and Rasconi, R. 2011. Scheduling a Single Robot in a Job-Shop Environment through Precedence Constraint Posting. In IEA/AIE, Syracuse, NY, USA.

Díaz, D.; R-Moreno, M.; Cesta, A.; Rasconi, R.; and Oddi, A. 2012. An Integrated Constraint-based, Power Aware Control System for Autonomous Rover Mission Operations. In i-SAIRAS, article n. 10A–4.

Estlin, T.; Gaines, D.; Chouinard, C.; Castano, R.; Bornstein, B.; Judd, M.; Nesnas, I.; and Anderson, R. 2007. Increased Mars Rover Autonomy using AI Planning, Scheduling and Execution. In Robotics and Automation, IEEE International Conference, 4911–4918.

Montanari, U. 1974. Networks of Constraints: Fundamental Properties and Applications to Picture Processing. Information Sciences 7:95–132.

Oddi, A., and Smith, S. 1997. Stochastic Procedures for Generating Feasible Schedules. In 14th National Conference on AI (AAAI-97), 308–314.

Oddi, A.; Rasconi, R.; Cesta, A.; and Smith, S. F. 2011. Solving job shop scheduling with setup times through constraint-based iterative sampling: an experimental analysis. Annals of Mathematics and Artificial Intelligence 62(3-4):371–402.

Poulakis, P.; Joudrier, L.; Wailliez, S.; and Kapellos, K. 2008. 3DROV: A Planetary Rover System Design, Simulation and Verification Tool. In i-SAIRAS.

Simonis, H., and Cornelissens, T. 1995. Modelling Producer/Consumer Constraints. In First International Conference on Principles and Practice of Constraint Programming, 449–462. London, UK: Springer-Verlag.

Smith, S., and Cheng, C. 1993. Slack-Based Heuristics for Constraint Satisfaction Scheduling. In 11th National Conference on AI (AAAI-93).

Treiman, A. H.; Wadhwa, M.; Shearer, C. K.; McPherson, G. J.; Papike, J. J.; Wasserburg, G. J.; Floss, C.; Rutherford, M. J.; Flynn, G. J.; Papanastassiou, D.; Westphal, A.; Neal, C.; Jones, J. H.; Harvey, R. H.; and Schwenzer, S. 2009. Groundbreaking Sample Return from Mars: The Next Giant Leap in Understanding the Red Planet. In Planetary Science Decadal Survey.

van Winnendael, M.; Baglioni, P.; and Vago, J. 2005. Development of the ESA ExoMars Rover. In i-SAIRAS.

Wood, E. G. 2002. Multi Mission Power Analysis Tool. In IT Symposium, Pasadena, CA, USA.


Planning and Replanning for Autonomous Underwater Vehicles∗

Daniele Magazzeni
Department of Informatics
King's College London
United Kingdom

Francesco Maurelli
Ocean Systems Laboratory
Heriot-Watt University
United Kingdom

Abstract

This paper presents the framework of an FP7 EU project on persistent autonomy for autonomous underwater vehicles. It highlights the major challenges and tasks addressed by the project, and then focuses on the results achieved at the planning level. In the last part, it presents ontologies as a way to represent reality and outlines future work on the integration of ontologies with the planning system.

1 Introduction

Whilst humans and animals effortlessly perform complicated tasks in unknown environments, our human-built robots are not very good at being similarly independent. Operating in real environments, they easily get stuck, often ask for help, and generally succeed only when attempting simple tasks in well-known situations. We want autonomous robots to be much better at being autonomous for a long time (persistent autonomy), and to be able to carry out more complicated tasks without getting stuck, lost or confused. Following the Deepwater Horizon disaster in the BP Macondo oilfield in the Gulf of Mexico in 2010, oil companies are developing improved ways to cost-effectively and safely carry out more frequent inspection, repair and maintenance tasks on their subsea infrastructure. This is particularly challenging in deep water. To date, Autonomous Underwater Vehicles (AUVs) have been deployed very successfully for various forms of seabed and water-column transit survey. The first commercial units will soon be applied to simple hovering inspection tasks, with future units expected to address much harder intervention tasks in which contact is made to turn a valve or replace a component. Because these vehicles reduce or remove the need for expensive ships, their adoption is expected to grow over the next 5 to 10 years.

To be commercially successful, these hovering AUVs must operate for extended periods (12-48 hours or more) without the continual presence of a surface vessel. They must therefore demonstrate persistent autonomy in a challenging environment. This is the aim of the European FP7 project PANDORA: Persistent Autonomy through Learning, Adaptation, Observation and Re-planning. Three essential areas have been identified:

*Parts of this paper have already been presented at OCEANS’13.

• Describing the World
• Directing and Adapting Intentions
• Acting Robustly

This paper first presents an overview of the project, with the challenges and tasks that are addressed. It then focuses on one of the tasks, presenting the planning problems that have been demonstrated both in simulation and on a real vehicle, interacting with geometric information. Lastly, it outlines future plans for the integration of the planning system with a symbolic representation of the world, in the form of an ontology architecture.

2 Architecture

Figure 1 outlines the computational architecture designed for the development and study of persistent autonomy. Key is the notion that the robot's response to change and to the unexpected takes place at one or a number of hierarchical levels. At an Operational level, sensor data is processed in Perception to remove noise, extract and track features, and localise using SLAM, in turn providing measurement values for Robust Control of body axes, contact forces/torques and relative positions. One of the goals is to further explore some of the current approaches (Aulinas et al. 2011; Lee, Clark, and Salvi 2012) and integrate them on a real vehicle. In cases where a map is given, localisation techniques can be used (Petillot et al. 2010), with specific attention to active localisation (Maurelli et al. 2010). Relevant work on robust control can be found in (Panagou and Kyriakopoulos 2011; Karras, Loizou, and Kyriakopoulos 2011). At a Tactical level, Status Assessment uses status information from around the robot, in combination with expectations of planned actions, the world model and observed features, to determine whether actions are proceeding satisfactorily or have failed. Alongside this, reinforcement and imitation learning techniques are used to train the robot to execute pre-determined tasks, providing reference values to controllers. Fed by measurement values from Perception, they update controller reference values when disturbance or poor control causes action failure. Finally, at a Strategic level, sensor features and state information are matched with geometric data about the environment to update a geometric world model. These updates include making semantic assertions about the task and the world geometry, and using reasoning to propagate the implications of these through the world description.



Figure 1: PANDORA: Computational architecture to develop and study persistent autonomy

Task Planning uses both semantic and geometric information as pre-conditions on possible actions or action sequences that can be executed (Fox, Long, and Magazzeni 2011; 2012). When Status Assessment detects the failure of an action, Task Planning instigates a plan repair to assess the best response, if any. Where there is insufficient data to repair, Task Planning specifies Focus Areas to which it would like further sensor attention directed. These are recorded in the World Model and propagated through Status Assessment as a Focus of Attention, to direct the relevant sensors to make further measurements.

3 Test Scenarios

The goal of the PANDORA project is to address three tasks, described in the following.

3.1 Task A: Autonomous inspection of a submerged structure

A hover-capable autonomous underwater vehicle is equipped with a forward-looking sonar, a video camera and a dead-reckoning navigation system, and its mission is to perform an autonomous inspection of submerged structures, such as a ship hull (FPSO) or a manifold (see Figure 2). The structure is partially known, but there are inconsistencies between it and the geometric world model. The vehicle's high-level goal is to autonomously inspect the entire structure with no data holidays, and bring back a complete data set of video and sonar for mosaicking and post-processing. There may be a current running, and the optical visibility may be very poor. In some cases, the sonar inspection sensors must be kept at a constant grazing angle relative to the structure for best performance.

In the absence of a pan-and-tilt unit, the vehicle must dynamically pitch, yaw and roll to maintain this orientation.

3.2 Task B: Autonomous location, cleaning and inspection of an anchor chain

A hover-capable autonomous underwater vehicle is equipped as above, but in addition carries a high-pressure water jet. Its goal is to locate the correct anchor chain of an FPSO and traverse it to remove the marine growth on all sides using the water jet (see Figure 3). Thereafter it revisits the chain and brings back complete video inspection data for subsequent post-processing. The reaction forces from the water jet introduce significant forces and moments onto the vehicle, and also disturb the anchor chain. Both are therefore in constant disturbed motion. The optical visibility drops to zero during jetting as the marine growth floats in the water. There may be sea currents moving over the anchor, creating minor turbulence down-current of the chain. The chain is located adjacent to flexible risers of slightly larger dimension bringing oil to the surface.

3.3 Task C: Autonomous grasping and turning of a valve from a swimming, undocked vehicle

A hover-capable autonomous underwater vehicle is equipped as in Task A, with a simple robot arm at the front. Its goal is to locate the correct valve panel of a subsea manifold and open the correct valve (see Figure 4). On each panel a selection of valve heads is exposed, each with a T-bar attached for grasping. The vehicle must identify the state of the valves (open, closed, in-between) from the T-bar orientations and, if appropriate, use the robot arm to grasp the correct valve and open it. The vehicle does not dock, because there are no docking bars on the panel.



Figure 2: Task A: Autonomous Inspection of Ship Hull or Subsea Structure

Figure 3: Task B: (a) Marine growth (b) Anchor Chain (c) FPSO, Anchors and Risers

Figure 4: Task C: Valve Turning: (a) Docked ROV (b) Hover-capable Inspection AUV Prior to Launch

It must therefore hover by swimming, counteracting any reaction forces from the turning. It must also ensure that the position and orientation of the gripper after grasping do not cause significant shear forces in the T-bar and break it off. The visibility is generally good, but there may be sea currents running and minor turbulence down-current from the manifold.

In the rest of the paper we focus on task A, describing itin detail and presenting the results obtained so far.

4 AUV Inspection Task

Planning and replanning play a major role in any autonomous system, especially when a high level of autonomy is sought. In this section, the current approach to planning, linked to PANDORA Task A, is presented. We consider the situation in which an AUV has to inspect a collection of structures in the oilfield, beginning with a coarse map, a 2-D slice of which is shown by the dark blue shape in Figure 5. Here, the precise shapes and conditions of the structures are not known, so the structures are represented as simple shapes surrounded by free space.

Inspection points are constructed for these structures, either by recognising their forms and selecting pre-determined inspection points for them, or from an ontological model available within the PANDORA architecture.

The AUV is required to navigate through the environment, which may differ from the coarse map, and to inspect each inspection point, possibly from a number of different angles if the inspection point is only partially visible from one.

4.1 Method Overview

In the initial map, waypoints are placed and connected using a Probabilistic Roadmap (PRM) (Kavraki et al. 1996; Lavalle 2006), which is then extended with a set of strategic waypoints. The extended roadmap is used to generate the input for the planner, which constructs a plan that allows the AUV to move between these waypoints, visiting all the relevant structures and performing observations. Figure 5 shows a 2-D example of a coarse map and a PRM. We call this the fly-past plan; it plays a similar role to the initial survey used in (Englot and Hover 2010) to initialise a model for inspection of an unknown hull. As the AUV executes its fly-past plan, sonar data will reveal more detail about the precise shape and relative position of the structures. As more detail is revealed, structures in the map can be replaced with more accurate representations, and the planner can be used again to replan and find better plans. Figure 6 shows an example in which the shape of the object has been updated based on new sonar data and a new plan has been found.

The inspection plans are built using the planner POPF (Coles et al. 2010), and the actions in the plan are sent in sequence to a controller using the ROS framework, as described in more detail in the following section.



Figure 5: The coarse map for inspecting a submerged structure (dark blue) and a preliminary roadmap (red).

Figure 6: The map of the submerged structure is updated as sonar data becomes available. Following the generated plan, the AUV moves through the highlighted (red) edges, and makes observations (dark lines) at some waypoints. Many inspection points (yellow) require inspection from multiple locations.

Figure 7: Inspection points (yellow) are used to generate strategic points (red) between the safest approach distance from the object and the maximum viewing distance.

5 Planning Inspection Tasks

5.1 Roadmap Generation

The first step for planning an inspection task is to define an abstraction of the environment. To this end, a PRM is created. The planner will then use edges from the PRM in place of actual motion to approximate real distances between different locations. It is important that every edge in the PRM is representative of a collision-free motion for the AUV. For this reason the PRM is generated using an OMPL motion planner (see footnote 1), which takes into account the vehicle dynamics and checks for collision-free trajectories. In particular, the motion planner generates the PRM in a model of the environment provided by the PANDORA architecture and subsequently updated by sensor data.

In the current architecture the orientation of the AUV is not taken into account during planning. For this reason, in order to ensure that edges remain collision-free when the AUV re-orients itself during the execution of a plan, we adopt a conservative approach: a bounding sphere is used to approximate the shape of the AUV during collision detection.

Inspection points are only visible from certain locations. In order to ensure that the PRM contains sufficient viewpoints for each inspection point, additional waypoints are added to the PRM. We call these waypoints strategic points. It is possible to continually make the PRM denser until there is sufficient coverage of each inspection point; however, this comes at the cost of increasing the size and difficulty of the resulting planning problem. As a result, strategic points are an equally robust yet more tractable alternative.

Strategic points are obtained by projecting rays from the detected structure and sampling along each ray. New waypoints are sampled between the safest approach distance and the maximum effective viewing distance away from the observed structure. This process is shown in Figure 7. These waypoints are connected to the roadmap using the motion planner and can then be used when planning an inspection path.
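For illustration (a sketch under our own naming, not the project code), sampling candidate strategic points along a ray cast outwards from an inspection point can be as simple as:

    import numpy as np

    def strategic_points(inspection_point, outward_normal, d_safe, d_max, n_samples):
        """Sample viewpoints along the outward ray of an inspection point,
        between the safest approach distance and the maximum effective
        viewing distance measured from the structure."""
        direction = np.asarray(outward_normal, dtype=float)
        direction /= np.linalg.norm(direction)
        point = np.asarray(inspection_point, dtype=float)
        return [point + d * direction for d in np.linspace(d_safe, d_max, n_samples)]

    # Five candidate viewpoints 2 m to 8 m in front of an inspection point.
    print(strategic_points([10.0, 4.0, -6.0], [1.0, 0.0, 0.0], 2.0, 8.0, 5))

In the actual pipeline each candidate would still have to be connected to the roadmap by the motion planner and discarded if no collision-free edge exists.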

5.2 Inspection Task Planning Model

In this section we briefly describe how the inspection task can be modelled as a temporal planning problem in PDDL (Fox and Long 2003).

In order to define a domain for the inspection task scenario, we consider the waypoints of the PRM generated in the previous phase. The state of the AUV is described through its position, given by a waypoint. The AUV can then perform two actions, namely hover and observe.

In PDDL (Fox and Long 2003), actions are described through the following components: a list of parameters, where a question mark denotes a variable token with its corresponding type; the precondition, which defines the conditions that must be satisfied in the current state for the action to be performed; and the effect, which defines the change in the state after the action has been performed.

The hover action moves the AUV between two connected waypoints (which, by construction, are the end-nodes of a collision-free edge). In particular, the AUV can move from waypoint ?from to waypoint ?to only if it is currently at waypoint ?from and the two waypoints are connected (i.e., the predicate (connected ?from ?to) is true). As the effect of the action, the AUV will be at waypoint ?to and no longer at waypoint ?from. The duration of the action depends on the distance between the two waypoints.

1 The Open Motion Planning Library (Sucan, Moll, and Kavraki 2012).



(:durative-action do_hover
 :parameters (?v - vehicle ?from ?to - waypoint)
 :duration (= ?duration (* (distance ?from ?to) (invtime ?v)))
 :condition (and (at start (at ?v ?from))
                 (at start (connected ?from ?to)))
 :effect (and (at start (not (at ?v ?from)))
              (at end (at ?v ?to))))

(:durative-action observe
 :parameters (?v - vehicle ?wp - waypoint ?ip - inspectionpoint)
 :duration (= ?duration (obstime))
 :condition (and (at start (at ?v ?wp))
                 (at start (cansee ?v ?ip ?wp)))
 :effect (and (at start (not (cansee ?v ?ip ?wp)))
              (at end (increase (observed ?ip) (obs ?ip ?wp)))))

Figure 8: A fragment of the PDDL inspection-task domain.

In order to take into account the time required for the AUV to turn to the correct orientation, we use a conservative model in which the duration of navigation actions is determined by a bound on the expected time, including corrections and reorientations.

The observe action allows the AUV to observe an inspection point. The precondition requires the AUV to be at a waypoint from which the target inspection point is (at least partially) visible (predicate cansee). The action effect increases the duration of the mission and records the portion of the inspection point that has been inspected. Note that the effect (not (cansee ?v ?ip ?wp)) prevents the AUV from observing the same portion of the structure more than once.

The structure to be inspected is described as a PDDL problem by encoding the generated PRM as the list of all pairs of connected waypoints together with the distance between each pair; the list of inspection points that can be observed from each waypoint (if any), together with a value indicating what percentage of each area can be observed; the initial position of the AUV; the goal state; and the metric function. A fragment of a sample problem is shown in Figure 9.

5.3 Solving the Planning Problem

To solve the problem, we use the forward-search temporal planner POPF (Coles et al. 2010). As described earlier, the planner deals with coarse-grained events: in this case, movement between waypoints and observation of inspection points. The plan generated by POPF does not specify how this movement is to take place, or what the orientation of the AUV must be in order to successfully carry out the inspection tasks, as these are tasks handled by the controller at a lower level. Figure 6 shows a 2-D example of a plan. The plan may contain waypoints where multiple observe actions take place. As mentioned, the planner does not take the orientation of the AUV into account. In order for the plan to be valid, new motions must be introduced for re-orienting the vehicle between adjacent inspection actions.

(define (problem inspection-task-p1)
 (:objects auv - vehicle
           wp1 wp2 wp3 ... - waypoint
           ip1 ip2 ip3 ... - inspectionpoint)
 (:init
  (at auv wp1)
  (= (mission-time) 0)
  (= (observed ip1) 0)
  (connected wp1 wp2) (connected wp2 wp1)
  (= (distance wp1 wp2) 7.16958)
  (= (distance wp2 wp1) 7.16958)
  (connected wp1 wp9) (connected wp9 wp1)
  (= (distance wp1 wp9) 3.21484)
  (= (distance wp9 wp1) 3.21484)
  ...
  (cansee auv ip4 wp12)
  (= (obs ip4 wp12) 0.445331)
  ...
 )
 (:goal (and (>= (observed ip1) 1)
             ...
 ))
 (:metric minimize (total-time)))

Figure 9: A fragment of the PDDL inspection-task problem

Additionally, the plan may require the AUV to revisit a waypoint, but the orientation might not sensibly be the same; for example, the AUV might be travelling in the opposite direction. For this reason, a single waypoint in the task planning problem might correspond to multiple configurations of the AUV. Both of these issues are accounted for in a post-processing step.

This step produces a mapping file containing coordinate data for use by the controller. Each waypoint in the post-processed plan corresponds to a coordinate, including orientation, in the file.

The post-processing is performed as follows:

1. New movement actions for re-orientation are inserted between adjacent inspection actions.

2. Waypoints are duplicated for each time they are revisited in the plan. The duplicate waypoints share the same position as the original, but possibly have a different orientation. Every time the plan requires the AUV to revisit a waypoint – including during reorientation – an unused duplicate waypoint is used instead.

3. An orientation is generated for each waypoint, pointing the AUV either in the correct direction for a subsequent inspection action, or towards the next waypoint in the path.

4. The positional and orientational information for each waypoint is written to the mapping file in the order in which the waypoints are visited in the plan.

Therefore, the post-processing step creates a file mapping the symbolic representation of the waypoints used by the planner to their real coordinates.
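A minimal sketch of the waypoint-duplication part of this step (illustrative only; the naming convention for duplicates is ours):

    from collections import defaultdict

    def duplicate_revisits(waypoint_sequence):
        """Give every revisit of a waypoint a fresh identifier so that each
        visit can carry its own orientation entry in the mapping file."""
        visits = defaultdict(int)
        renamed = []
        for wp in waypoint_sequence:
            count = visits[wp]
            renamed.append(wp if count == 0 else "%s_dup%d" % (wp, count))
            visits[wp] += 1
        return renamed

    print(duplicate_revisits(["wp1", "wp2", "wp1", "wp3", "wp1"]))
    # ['wp1', 'wp2', 'wp1_dup1', 'wp3', 'wp1_dup2']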

6 Plan Execution and Replanning

The planning approach described in the previous section has been implemented in a system that integrates planning and execution in the ROS framework.



Figure 10: State diagram of the executor

In ROS the computation is performed by nodes; therefore the planner is set up in a node that provides an RPC service. This service can be requested by another part of the system at any time with an environment file and a list of inspection points, as described previously.

Once the planner has found a plan and the waypoints file has been generated, they need to be converted into ROS messages to be sent to the AUV. To this end, the actions in the plan are tokenized and each action is translated into a ROS actionlib goal invocation. These goal invocations are passed sequentially to the controller by an executor. The controller is responsible for achieving the goal and providing feedback. The feedback can be either success, in which case the executor passes the next goal invocation to the controller, or failure, with a request for replanning.

In the following we describe each step of the integrationand the mechanisms behind replanning in more detail.

6.1 Integration and Execution

The executor reads the list of actions in the plan into a queue for sequential execution. The state diagram for the executor is summarised in Figure 10.

In order for the AUV to effect an action, the symbolic descriptions of actions must be converted into a numerical form for the initialization of an actionlib goal. The first token of a plan element is the action name (in this case hover or observe). Each action name is associated with a ROS actionlib goal invocation.
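A skeletal version of this conversion is sketched below, using the standard rospy/actionlib client API; the pandora_msgs package, the HoverAction/HoverGoal message definitions and the 'do_hover' server name are hypothetical and are not the project's actual interfaces:

    #!/usr/bin/env python
    import rospy
    import actionlib
    from actionlib_msgs.msg import GoalStatus
    from pandora_msgs.msg import HoverAction, HoverGoal   # hypothetical action spec

    def execute_plan(plan, waypoint_coords):
        """Send each plan action to the controller as an actionlib goal and
        stop (returning False) as soon as one fails, so that the caller can
        update the world map and trigger a replan."""
        client = actionlib.SimpleActionClient('do_hover', HoverAction)
        client.wait_for_server()
        # A full executor would dispatch on the action name (hover vs observe);
        # this sketch only handles hover-style goals.
        for _action_name, target_wp in plan:
            goal = HoverGoal()
            goal.x, goal.y, goal.z, goal.yaw = waypoint_coords[target_wp]
            client.send_goal(goal)
            client.wait_for_result()
            if client.get_state() != GoalStatus.SUCCEEDED:
                return False
        return True

    if __name__ == '__main__':
        rospy.init_node('plan_executor_sketch')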

6.2 Feedback and Replanning

If a ROS goal is achieved successfully, the executor takes into consideration the next goal invocation, corresponding to the next action in the plan.

It can happen, however, that the action execution fails. Action failures are mainly due to wrong assumptions about the environment and the structures to inspect. Indeed, as stated above, the inspection task mission starts with an assumption about the map of the world, and the inspection points are defined according to this assumption. However, during the mission new sonar data becomes available (as a result of the observe action) that can reveal that the real environment does not match the expected one. In such a case, the world map is updated and the observe action declares the execution to have failed. The fail signal causes the executor to request a replan on the updated world map.

Note that, in our framework, replanning is based on reformulating the inspection task as a new planning problem. The re-modelling is performed dynamically: as new information about the environment and the structures to inspect becomes available, the current PRM is updated accordingly, and a new plan is then generated.

7 Experimentation

7.1 Simulation

A simple scenario was devised to test the replanning loop, using UWSim (Prats et al. 2012). The AUV was told to inspect a cube-shaped object, whereas in fact the cube had an unaccounted-for dent on the distal side (as shown in Figure 12).

The aim of the simulation was to test the planning integration and the behaviour of the planning strategy. Therefore, at this stage we did not test the interpretation of sonar data to update the world map; instead, we customised the implementation of the observe action so that the feedback was success if executed when the AUV was on the non-dented sides of the cube, and failure on the dented side. Furthermore, when the observe action failed on the dented side, the world map was replaced with the correct version. Testing the sonar data interpretation is outside the scope of the present paper, and will be the focus of future tests.

Figure 11 (above) shows the probabilistic roadmap generated with the motion planner. The roadmap is then given as input to the planner, which finds the plan shown in Figure 11 (below).

The processes of generating the roadmap and finding a plan are both very fast, as shown in Figure 13, where we report the computation time required for constructing the roadmap and the plan in different tests.

Figure 12 shows the execution of a different plan in the ROS environment. The purpose of this was to test the use of strategic points. Inspection points on the structure are set in such a way that a simple trajectory over the strategic points would allow completing the inspection task.

Figure 12 (left) shows the AUV in its initial position, and its initial (incorrect) assumption that a cube is to be inspected. Around the cube are the oriented (strategic) waypoints (in green) that the AUV will use for planning hover actions between. Figure 12 (center) shows the AUV approaching the dented location. It is attempting to look for the inspection point marked in yellow, but the sonar does not see it (red). Figure 12 (right) shows the updated map, and the corresponding updated plan, with the AUV now inspecting the dent and the remaining inspection points.

7.2 Integration and Tank Trials

The planning system has been integrated into the Nessie VII AUV and tested at the Ocean Frontier, The Underwater Centre, in Fort William (Scotland, UK).



Figure 12: Inspection task simulation.

Figure 11: Inspection task simulation: roadmap generation and planning.

derwater Centre, in Fort William (Scotland, UK). For the ex-periments, the portion of the tank that was considered is anenvironment with two pillars, like in Figure 14. Both pillarshave a pipe fixed vertically to all four sides. The physicaltest mission was to inspect each of these pipes. The originalknowledge asserts the two pillar to be connected into one bigstructure and a plan is built accordingly. Sensor data show adifference among the perceived environment and the previ-ous knowledge and a new plan is formed. This approach hasbeen successfully validated with in-water trials. The AUVsuccessfully completed the plan as it was generated by theplanner, finishing by returning to the initial positio, whilesonar data and images of inspection points were taken, such

Time (s)
PRM Construction   Planning
0.67               0.71
0.94               0.56
0.67               1.04
0.67               0.85

Figure 13: Computation time for generating the roadmap and planning.


Figure 14: The vehicle Nessie VII at Ocean Frontier, The Underwater Centre, Fort William.

8 Ontology Representation and Future Work

In parallel with the work on planning and dynamic replanning presented in the previous sections, work has been done towards the extraction of information from noisy sensors, and its conversion into objects which are stored in an ontology. An ontology format for semantic data is used for several reasons:

• it is easy to create and examine the concepts used by the system, and the attributes available for each concept, without requiring programming knowledge.

• many tools exist to perform logical reasoning within the knowledge base.


Figure 15: Camera image of the pipe

• an ontology represents a well-specified central data store that all the software components comprising the agent can make use of.

• common ontologies (such as the IEEE robotics ontology currently under development) are very useful for exchanging data between robotic systems created by different organisations.

The key area that we are going to address is related to the integration of semantic information into the planning system. Particular attention will be devoted to a probabilistic representation of the reality, and its implications for the planner. The results presented here are based on geometric planning decisions, whilst in the future we want to focus on high-level information. Inspection points will become concepts in the knowledge base and this will allow a broader generalisation and abstraction of the planning system.

9 Conclusion

This paper has presented the EU FP7 PANDORA project and its preliminary results in the planning domain. After the analysis of the general architecture and of the tasks addressed in the project, the paper has focused on the results on the planning side. Both simulation results and results from a real trial were presented, showing a successful integration among the systems. As future work, a world representation formalised through ontologies was presented. This will be key for the planning system to plan based on ontologies and high-level concepts, and not just on geometric information.

10 Acknowledgments

This is joint work with Michael Cashmore, Valerio De Carolis, Maria Fox, David Lane, Tom Larkworthy, Derek Long, Georgios Papadimitriou, Zeyn Saigol. The research leading to these results has received funding from the European Union Seventh Framework Programme FP7/2007-2013 Challenge 2 Cognitive Systems, Interaction, Robotics under grant agreement No 288273 PANDORA.

References

Aulinas, J.; Carreras, M.; Petillot, Y. R.; Salvi, J.; Llado, X.; Garcia, R.; and Prados, R. 2011. Feature extraction for underwater visual SLAM. In IEEE/OES OCEANS 2011.

Coles, A. J.; Coles, A. I.; Fox, M.; and Long, D. 2010. Forward-chaining partial-order planning. In Proceedings of ICAPS 2010.

Sucan, I. A.; Moll, M.; and Kavraki, L. E. 2012. The Open Motion Planning Library. IEEE Robotics & Automation Magazine 19(4):72–82. http://ompl.kavrakilab.org.

Englot, B., and Hover, F. 2010. Inspection planning for sensor coverage of 3D marine structures. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems.

Fox, M., and Long, D. 2003. PDDL2.1: An extension to PDDL for expressing temporal planning domains. J. Artif. Intell. Res. (JAIR) 20:61–124.

Fox, M.; Long, D.; and Magazzeni, D. 2011. Automatic construction of efficient multiple battery usage policies. In Proceedings of ICAPS 2011.

Fox, M.; Long, D.; and Magazzeni, D. 2012. Plan-based policy learning for autonomous feature tracking. In Proceedings of ICAPS 2012.

Karras, G. C.; Loizou, S. G.; and Kyriakopoulos, K. J. 2011. Towards semi-autonomous operation of under-actuated underwater vehicles: sensor fusion, on-line identification and visual servo control. Auton. Robots 31(1):67–86.

Kavraki, L. E.; Svestka, P.; Latombe, J.-C.; and Overmars, M. H. 1996. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transactions on Robotics and Automation 12(4):566–580.

LaValle, S. M. 2006. Planning Algorithms. Cambridge University Press.

Lee, C. S.; Clark, D. E.; and Salvi, J. 2012. SLAM with single cluster PHD filters. In Proceedings of ICRA 2012.

Maurelli, F.; Mallios, A.; Krupinski, S.; Petillot, Y.; and Ridao, P. 2010. Speeding-up particle convergence with probabilistic active localisation for AUV. In IFAC IAV.

Panagou, D., and Kyriakopoulos, K. J. 2011. Control of underactuated systems with viability constraints. In Proc. of the 50th IEEE Conference on Decision and Control and European Control Conference, 5497–5502.

Petillot, Y.; Maurelli, F.; Valeyrie, N.; Mallios, A.; Ridao, P.; Aulinas, J.; and Salvi, J. 2010. Acoustic-based techniques for AUV localisation. Journal of Engineering for the Maritime Environment 224(4):293–307.

Prats, M.; Perez, J.; Fernandez, J.; and Sanz, P. 2012. An open source tool for simulation and supervision of underwater intervention missions. In Proceedings of IROS 2012.


Session 2

Plans: execution, repair, robustness


A TGA-based Method for Safety Critical Plan Execution

Andrea Orlandini, Marco Suriano, Amedeo Cesta
CNR – National Research Council of Italy

Institute for Cognitive Science and Technology, Rome, Italy – {name.surname}@istc.cnr.it

Alberto Finzi
Universita di Napoli “Federico II”

DIETI, Naples, Italy – [email protected]

Abstract

Safety critical planning and execution is a crucial issue in autonomous systems. This paper proposes a methodology for controller synthesis suitable for timeline-based planning and demonstrates its effectiveness in a space domain where robustness of execution is a crucial property. The proposed approach uses Timed Game Automata (TGA) for formal modeling and the UPPAAL-TIGA model checker for controller synthesis. An experimental evaluation is performed using a real-world control system.

Introduction

The design of safety critical and dependable systems is becoming increasingly important as technology advances. In fact, safety critical systems need to be certified and, thus, crucial properties, e.g., a bounded maximum execution time, need to be enforced. On the other hand, the need to meet operational requirements in challenging domains, as for instance in several space applications, often leads to the use of highly efficient software modules that address specific sub-parts of a larger problem with ad-hoc solving algorithms that cannot be easily verified. In this regard, the authors are working on the integration of Planning and Scheduling (P&S) technology with Validation and Verification (V&V) techniques to synthesize safety critical systems in space robotics. In particular, our current goal consists in cascading a timeline-based planner (OMPS (Fratini, Pecora, and Cesta 2008)) and a V&V technique based on Timed Game Automata (TGA) (Maler, Pnueli, and Sifakis 1995) to automatically synthesize a robot controller that guarantees certain properties.

More specifically, we are addressing the dynamic controllability issue (e.g., see (Morris and Muscettola 2005)): once a planner has generated a temporal plan, it is up to the executive system to decide, at run-time, how and when to execute each planned activity preserving both plan consistency and controllability. Such capability is even more crucial when the generated plan is temporally flexible, as it captures an envelope of potential behaviors to be instantiated during the

The present paper has already been accepted for publication at the IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2013).

execution, taking into account temporal/causal constraints and controllable/uncontrollable activities and events. 1

A previous paper (Cesta et al. 2010a) introduces a technique for the P&S and V&V integration, while (Orlandini et al. 2011) first uses such integration for controller synthesis. The present paper integrates the technology in a real control architecture resulting from the GOAC project 2 (Ceballos et al. 2011) and explores its applicability in a set of experiments based on a real robot. The experimental evaluation shows the practical feasibility of the on-line deployment of such a TGA-based approach in different operative modalities considering increasingly complex instances of a real-world robotics case study. In all the considered settings, robust plan execution is formally enforced, maintaining plans dynamically controllable. It is worth underscoring that, even though the running example is taken from a specific project, the work described in this paper is valid for any generic layered control architecture (e.g., (Gat 1997)) that integrates a temporal planning system.

Plan of the Paper. A first section introduces timeline-based planning and execution to set the context of the work. The second presents the integration with the TGA-based method. The real-world robotic scenario is illustrated in the subsequent section, followed by the outcome of the associated empirical evaluation. Some conclusions end the paper.

Planning and Execution with Timelines

Timeline-based planning has been introduced in (Muscettola 1994) and has proved successful in a number of space applications (Muscettola 1994; Jonsson et al. 2000; Cesta et al. 2007). The modeling assumption underlying this approach is inspired by classical Control Theory: the problem is modeled by identifying a set of relevant components whose temporal evolutions need to be controlled to obtain a desired behavior. Components are primitive entities for knowledge modeling, and represent logical or physical

1 Uncontrollable events are those that cannot be planned for as they are decided by Nature – the external environment.

2 GOAC (Goal Oriented Autonomous Controller) has been a multi-institutional effort within the activities on robotics funded by the European Space Agency (ESA).


subsystems whose properties may vary in time. In this respect, the set of domain features under control is modeled as a set of temporal functions whose values have to be decided over a time horizon. Such functions are synthesized during problem solving by posting planning decisions. The evolution of a single temporal feature over a time horizon is called the timeline of that feature. In particular, for the purpose of this paper multi-valued state variables are considered as the basic type of time varying features (Cesta and Oddi 1996). As in classical control theory, the evolution of those features is described by some causal laws which determine legal temporal evolutions of timelines. For the state variables, such causal laws are encoded in a Domain Theory which determines the operational constraints of a given domain. The task of a planner is to find a sequence of control decisions that brings the variables into a final set of desired evolutions (i.e., the Planning Goals), always satisfying the domain specification.
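As an illustration of these notions, the sketch below models a multi-valued state variable whose admissible value transitions play the role of (a fragment of) a domain theory; the class shape and the example values are assumptions, not the actual APSI-TRF or OMPS data structures.

from dataclasses import dataclass, field

@dataclass
class StateVariable:
    # A multi-valued state variable: its timeline is a sequence of valued
    # intervals, and the domain theory restricts which value may follow which.
    name: str
    values: frozenset
    transitions: dict                               # value -> set of values allowed next
    timeline: list = field(default_factory=list)    # [(value, start, end), ...]

    def can_follow(self, prev_value, next_value):
        return next_value in self.transitions.get(prev_value, set())

# Toy example: a pan-tilt unit that must go through a moving value.
ptu = StateVariable(
    name="PTU",
    values=frozenset({"PointingAt(0,0)", "Moving", "PointingAt(pan,tilt)"}),
    transitions={
        "PointingAt(0,0)": {"Moving"},
        "Moving": {"PointingAt(0,0)", "PointingAt(pan,tilt)"},
        "PointingAt(pan,tilt)": {"Moving"},
    },
)
assert ptu.can_follow("PointingAt(0,0)", "Moving")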

In this area of research, a lot of effort has been dedicated to building software development environments, like EUROPA (Barreiro et al. 2012), ASPEN (Chien et al. 2010), and APSI-TRF (Cesta et al. 2009), aiming to facilitate the synthesis of timeline-based P&S applications. Nevertheless, a crucial issue in real applications is the tight integration of planning and execution.

Previous works have tackled the robust execution issue within a Constraint-based Temporal Planning (CBTP) framework, deploying specialized techniques based on temporal-constraint networks. Several authors (Morris, Muscettola, and Vidal 2001; Morris and Muscettola 2005; Shah and Williams 2008; Hunsberger 2010) have proposed a dispatchable execution approach where a flexible temporal plan is then used by a plan executive that schedules activities on-line while guaranteeing constraint satisfaction. This general line of research has concerned specifically the use of timeline-based planning and its temporal constraint network implementation for a homogeneous synthesis of controllers. Among the architectures that use a uniform representation for the continuous planning and execution task are IDEA (Muscettola et al. 2002), T-REX (Py, Rajan, and McGann 2010) and, more recently, GOAC (Ceballos et al. 2011). In particular, the GOAC effort combines several technologies: (a) a timeline-based deliberative layer which integrates a planner, called OMPS (Fratini, Pecora, and Cesta 2008), built on top of APSI-TRF to synthesize timelines and revise them according to execution needs, and an executive a la T-REX (Py, Rajan, and McGann 2010); (b) a functional layer (Bensalem et al. 2010) which combines a state of the art tool for developing functional modules of robotic systems (GenoM) with a component based framework for implementing embedded real-time systems (BIP).

In this context, the present paper particularly focuses on a timeline-based, domain independent deliberative control system, called APSI Deliberative Reactor (ADR) 3 (Cesta et al. 2012), proposed in GOAC.

3 The term Reactor is a legacy from T-REX. It is also worth saying that the initial motivation of our work is to design a smooth integration with the T-REX executive, which in its original implementation uses a different timeline-based planner.

The ADR has been designed to address a set of open issues in planning and execution with timelines, i.e., the dynamic management of goals during planning and execution, the assessment of the status of partially executed goals and the dynamic dispatching of commands. More in detail, the ADR is an instance of a proactive control system entirely based on APSI-TRF technology and is constituted by (i) an execution module, to dispatch planned timelines, to supervise their execution status and to entail continuous planning and re-planning, and (ii) a timeline-based planning module, i.e., OMPS, to model and solve planning problems.

The ADR is designed to be domain independent, i.e., once provided with a suitable timeline-based description model of the system to be controlled and a set of temporal goals to be achieved, it fully implements all the required functionalities to plan for goals, dispatch planned values to the controlled system and supervise plan execution collecting the telemetry of the controlled system. One of the main advantages of domain independence is the capability of the deliberative reactor to both plan for user goals and dynamically react to off-nominal conditions detected from the controlled system telemetry. Additionally, it allows flexibility in two directions: it can achieve different classes of user goals in the same system by substituting the controller model, and it can be deployed to control different systems by substituting the domain description of the controlled system.

Finally, it is worth pointing out that, following a T-REX-like approach (Py, Rajan, and McGann 2010), the use of reactors allows implementing controller systems by means of hierarchical compositions of various deliberative reactors. In fact, reactors are differentiated on the basis of whether they need to deliberate in abstraction (reasoning on the highest level of representation) or they need to be responsive to the inputs from the lower levels closer to the robotic hardware. In the former case the planner has a larger planning horizon to deliberate and returns partial plans for dispatching to other reactors. In the latter the planner has no time to reason, hence it implements simple reactive policies. Such gradation allows the entire system to be both deliberative and reactive over its temporal scope. The GOAC architecture, whose executive component is based on T-REX, uses such a hierarchical configuration of reactors (see further details in (Ceballos et al. 2011)).

TGA-based controller synthesis

This section presents the integration in APSI-TRF of an alternative and novel approach to flexible plan dispatching/execution proposed in (Orlandini et al. 2011), where robust plan execution is pursued by relying on Timed Game Automata (TGA) formal modeling and controller synthesis. The technique used to synthesize plan controllers is a direct consequence of the formalization proposed in (Cesta et al. 2010a), in which plan correctness as well as dynamic controllability are checked by means of TGA model checking. Analogously to that work, the dynamic P&S domain and the generated flexible temporal plan are encoded into TGA models. However, a different perspective is exploited through the use of a model checker (i.e., UPPAAL-TIGA


(Behrmann et al. 2007)) to directly synthesize a real-time plan controller for the flexible plan. Such a controller guarantees robust plan execution along with dynamic controllability.

TGA-based controllers for flexible plan execution

Timed Game Automata (TGA) (Maler, Pnueli, and Sifakis 1995) allow modeling real-time systems and controllability problems, representing uncontrollable activities as adversary moves within a game between the controller and the environment. Following the same approach presented in (Cesta et al. 2010a) (and briefly discussed above), flexible timeline-based plan verification can be performed by solving a Reachability Game using UPPAAL-TIGA (Cassez et al. 2005). To this end, flexible timeline-based plans, state variables, and domain theory descriptions are compiled into a network of TGA (nTGA). This is obtained through the following steps: (1) a flexible timeline-based plan P is mapped into an nTGA Plan. Each timeline is encoded as a sequence of locations (one for each timed interval), while transition guards and location invariants are defined according to (respectively) lower and upper bounds of flexible timed intervals; (2) the associated set of state variables SV is mapped into an nTGA StateVar. Basically, a one-to-one mapping is defined between state variable descriptions and TGA. In such encoding, value transitions are partitioned into controllable and uncontrollable according to their actual execution profile; (3) an Observer automaton is introduced to check for violations of both value constraints and Domain Theory. In particular, two locations are defined: an Error location, to state constraint violations, and a Nominal (OK) location, to state that the plan behavior is correct. The Observer is defined as fully uncontrollable; (4) the compound nTGA PL = StateVar ∪ Plan ∪ {Observer} encapsulates flexible plan, state variables and domain theory descriptions.
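A minimal sketch of step (1) is given below: each flexible timed interval of a timeline becomes a location whose invariant encodes the upper bound and whose outgoing guard encodes the lower bound; the data layout and names are illustrative assumptions, not the actual encoding produced by the authors' tool.

from dataclasses import dataclass

@dataclass
class TGALocation:
    # One location per flexible timed interval (step 1): the invariant forces
    # the automaton to leave by the upper bound, the guard allows leaving only
    # after the lower bound; controllability is assigned as in step (2).
    name: str
    invariant: str
    guard: str
    controllable: bool

def encode_timeline(intervals, clock="c"):
    # intervals: list of (value, lower_bound, upper_bound, controllable)
    return [
        TGALocation(
            name=f"{value}_{i}",
            invariant=f"{clock} <= {ub}",
            guard=f"{clock} >= {lb}",
            controllable=controllable,
        )
        for i, (value, lb, ub, controllable) in enumerate(intervals)
    ]

# Example: a camera timeline with one uncontrollable interval.
locations = encode_timeline([("TakingPicture", 5, 10, False), ("CamIdle", 0, 100, True)])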

Then, considering a Reachability Game RG(PL, Init, Safe, Goal), where Init represents the set of the initial locations of each automaton in PL, Safe is the OK location of the Observer automaton, and Goal is the set of goal locations (one for each automaton in Plan), plan verification can be performed by solving/winning the RG(PL, Init, Safe, Goal) defined above. In order to win/solve the reachability game RG, UPPAAL-TIGA is exploited as verification tool, checking a suitable CTL formula, i.e., Φ = A [ Safe U Goal ] in PL. In fact, the formula Φ states that along all its possible temporal evolutions, PL remains in Safe states until Goal states are reached. That is, in all the possible temporal evolutions of the timeline-based plan P all the constraints are fulfilled and the plan is completed. Thus, if the solver verifies the above property, then the flexible temporal plan is valid. Whenever the flexible plan is not verified, UPPAAL-TIGA produces an execution trace showing one temporal evolution that leads to a fault. Such a trace can be analyzed in order to check either for plan weaknesses or for the presence of flaws in the planning model.
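For reference, a small helper that renders Φ as a textual control query might look as follows; the concrete query syntax accepted by UPPAAL-TIGA and the location names used here are assumptions made for illustration, not taken from the paper.

# Hypothetical rendering of Phi = A [ Safe U Goal ] as a control query string;
# the "control:" syntax and the location names are assumptions.
def reachability_query(safe="Observer.OK",
                       goal_locations=("Plan_tl1.goal", "Plan_tl2.goal")):
    goal = " and ".join(goal_locations)
    return f"control: A [ {safe} U ({goal}) ]"

print(reachability_query())
# control: A [ Observer.OK U (Plan_tl1.goal and Plan_tl2.goal) ]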

Furthermore, a mapping between flexible temporal behaviors defined by P over the temporal horizon [0, H] and the automata behaviors defined by PL can be shown: for each partial temporal behavior pb ∈ P defined over H′ < H,

there exists a unique temporal evolution ρpb of PL such that ρpb represents the partial temporal behavior pb over the same horizon H′. That is, ρpb represents the same valued interval sequence in P limited to H′, and the duration of ρpb is exactly the horizon H′. As a consequence, the winning strategy f generated by UPPAAL-TIGA solving the reachability game on PL represents a flexible plan controller Cf that achieves the planning goals maintaining dynamic controllability during the overall plan execution. In (Orlandini et al. 2011), the reader may find a formal account of the generation of a plan controller Cf derived from a winning strategy f generated by UPPAAL-TIGA.

Integrating the controller in the deliberative reactor

Here, the integration in the ADR of the TGA-based method discussed above is presented. In particular, a suitable embedding of the UPPAAL-TIGA tool within the ADR planning and execution cycle is shown and, then, the advantages in terms of plan correctness and robust execution enforcement (i.e., dynamic controllability) are discussed.

The integration schema is shown in Figure 1. The left part of the figure shows the APSI-TRF general architecture. The domain and problem models are encoded as Domain Definition Language (DDL) and Problem Definition Language (PDL) input files. Then, both DDL and PDL files are parsed and managed by the Component-based Domain Modeling Engine and a Current Plan (i.e., the initial planning problem) is created to be manipulated by a Problem Solver. Indeed, the Current Plan is specialized as a data structure called Decision Network in APSI-TRF. Then, a generic problem solver, e.g., OMPS, applies a solving procedure until the Current Plan satisfies all the planning goals (or fails in finding a solution plan).

The right part of Fig. 1 depicts a simplified view of the APSI Deliberative Reactor with two relevant services, i.e., the Dispatch Services and the Execution Feedback modules, in charge of (respectively) dispatching suitable commands for the controlled system and collecting feedback from the field. The new APSI Deliberative Reactor architecture still reflects the structure of a T-REX reactor (as defined in (Cesta et al. 2012)), while introducing two new components, i.e., the TGA-based Controller (TC) and the Strategy Manager (SM), enabling robust plan execution through the use of strategies generated by UPPAAL-TIGA.

The TC is in charge of managing plans in order to (i) verify plan correctness and (ii) generate a dynamically controllable execution strategy: once a solution plan P is generated by the problem solver (i.e., the stored Current Plan is actually the valid plan to be executed), the TC automatically generates the associated TGA encoding (PL) and, then, invokes UPPAAL-TIGA in order to verify the correctness of the plan as well as to check for the existence of (at least) one temporal plan execution guaranteeing the correct achievement of the plan goals, independently from the exogenous events generated by the environment (i.e., enforcing dynamic controllability). If the verifier finds one of these sequences, then a strategy for the plan execution is generated.


Figure 1: Integration of TGA-based controller in the APSI Deliberative Reactor.

Namely, a strategy generated by UPPAAL-TIGA is a set of temporal rules that should guide the controlled system through the execution space, avoiding plan failures during its execution. More formally, an UPPAAL-TIGA strategy is a set of rules f(t, s) defined as follows:

f(t, s) =
    Wait         if twl < t < twu
    Action an    if tal < t < tau
    Error        if t > terr

where t is the execution time, s is one of the possible states of the system, twl and twu represent, respectively, the lower and upper bounds of a time interval in which the system must wait for the environment to act, tal and tau represent the lower and upper bounds of a time interval in which the system should perform the action an (i.e., one of the timelines should change value), and terr is a time limit beyond which the system generates an error. The latter represents the case in which the execution strategy is coping with exogenous events that are not properly modeled in the planning domain, e.g., the actual duration of an uncontrollable event is shorter/longer than the minimal/maximal duration stated in the domain model. This implies that the planning model is inconsistent with the actual behavior of the controlled system and, thus, a revision of that model (and of the TGA encoding) is required.

The SM is the module in charge of implementing the concrete dispatching policy relying on the UPPAAL-TIGA strategy. In fact, once the strategy has been generated, the SM exploits it to choose the most suitable f(t, s) rule to be executed, thus extracting the associated action to be dispatched (or waiting while the controlled system is evolving), as well as to continuously monitor the internal status of the reactor timelines and the execution feedback received from the field.
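The following sketch illustrates how an SM-like dispatcher could apply such rules at execution time; the rule representation and the function names are assumptions made for illustration, not the actual GOAC implementation.

# Illustrative dispatcher over f(t, s) rules. rules[s] is assumed to be a list
# of (kind, lower, upper, action) tuples mirroring the three cases above.
def dispatch(rules, state, t):
    for kind, lower, upper, action in rules.get(state, []):
        if kind == "wait" and lower < t < upper:
            return ("wait", None)            # let the environment act
        if kind == "act" and lower < t < upper:
            return ("dispatch", action)      # make a timeline change value
    return ("error", None)                   # beyond t_err: model revision needed

# Example: in state s0 the system waits until time 5, then must act before 12.
rules = {"s0": [("wait", 0, 5, None), ("act", 5, 12, "start_communication")]}
print(dispatch(rules, "s0", 7.0))   # ('dispatch', 'start_communication')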

Given the above, the new integrated reactor architecture guarantees plan correctness as well as the robust execution of the generated plans, thus increasing the probability of successfully performing the temporal plan.

Testing the Synthesis on a Robot Controller

This section describes a robotic scenario related to the GOAC project exploited as a case study for the experimental assessment presented in the next section. First, we describe the DALA platform, i.e., the real robotic platform deployed within the GOAC project. Then, we exploit the same scenario in order to show a possible configuration of a control

system implemented by means of an APSI Deliberative Reactor.

The Robotic Platform

The DALA rover is one of the LAAS-CNRS robotic platforms that can be used for autonomous exploration experiments. In particular, it is an iRobot ATRV robot that provides a large number of sensors and effectors. It can use vision based navigation (such as the one used by the Mars Exploration Rovers Spirit and Opportunity), as well as indoor navigation based on a Sick laser range finder. Then, the use of DALA in the GOAC project was to simulate a robotic scenario as close as possible to a planetary exploration rover.

In this regard, DALA can be considered a fair representative of a planetary rover equipped with a Pan-Tilt Unit (PTU), two stereo cameras (mounted on top of the PTU), a panoramic camera and a communication facility. The rover is able to autonomously navigate the environment, move the PTU, take high-resolution pictures and communicate images to a Remote Orbiter. During the mission, the Orbiter may not be visible for some periods. Thus, the robotic platform can communicate only when the Orbiter is visible. The mission goal is a list of required pictures to be taken in different locations with an associated PTU configuration. A possible mission action sequence is the following: navigate to one of the requested locations, move the PTU pointing at the requested direction, take a picture, then communicate the image to the orbiter during the next available visibility window, put back the PTU in the safe position and, finally, move to the following requested location. Once all the locations have been visited and all the pictures have been communicated, the mission is considered successfully completed. The rover must operate following some operative rules to maintain safe and effective configurations. Namely, the following conditions must hold during the overall mission (a simple snapshot check over these rules is sketched below): (C1) While the robot is moving the PTU must be in the safe position (pan and tilt at 0); (C2) The robotic platform can take a picture only if the robot is still in one of the requested locations while the PTU is pointing at the related direction; (C3) Once a picture has been taken, the rover has to communicate the picture to the base station; (C4) While communicating, the rover has to be still; (C5) While communicating, the orbiter has to be visible. The reader may refer to (Ceballos et al. 2011) for further details.
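As a simple illustration of the sketch referenced above, the rules that constrain a single execution snapshot can be read as state predicates; the dictionary keys are hypothetical, and C3 is a purely temporal ordering constraint that cannot be checked on a single state, so it is omitted here.

# Illustrative snapshot check of the operative rules; C3 (picture before
# communication) is temporal and is not checkable on a single state.
def violated_rules(state):
    v = []
    if state["moving"] and state["ptu"] != (0, 0):
        v.append("C1")   # PTU must be in the safe position while moving
    if state["taking_picture"] and (state["moving"] or not state["at_target"]):
        v.append("C2")   # pictures only when still at a requested location
    if state["communicating"] and state["moving"]:
        v.append("C4")   # the rover has to be still while communicating
    if state["communicating"] and not state["orbiter_visible"]:
        v.append("C5")   # the orbiter has to be visible while communicating
    return v

snapshot = {"moving": True, "ptu": (30, 10), "taking_picture": False,
            "at_target": False, "communicating": False, "orbiter_visible": False}
print(violated_rules(snapshot))   # ['C1']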


Figure 2 shows a timeline-based plan and the associated temporal constraints implementing the operative rules given above. The depicted constraints are: (C1) GoingTo(x,y) must occur during PointingAt(0,0); (C2) TakingPicture(pic,x,y,pan,tilt) must occur during At(x,y) and PointingAt(pan,tilt); (C3) TakingPicture(pic,x,y,pan,tilt) must occur before Communicating(pic); (C4) Communicating(file) must occur during At(x,y); (C5) Communicating(file) must occur during Visible.

Figure 2: An example of timeline-based plan with constraints.

Control system configuration

According to the T-REX design approach (Py, Rajan, and McGann 2010), a GOAC control system configuration has been designed considering an analogy with human control responsibilities for the mentioned rover, thus implementing a suitable control system as the composition of a set of different deliberative reactors. Namely, different personnel acting specific roles can be considered as involved in control tasks (Ceballos et al. 2011). As the goal of this paper is to evaluate the new architecture for the APSI Deliberative Reactor, taking advantage of the flexibility provided by the GOAC framework, a control system is here defined considering only two different reactors, i.e., a Mission Manager responsible for performing all the deliberative tasks and a Command Dispatcher in charge of executing commands and collecting execution feedback.

More in detail, the Mission Manager reactor is designed to provide plans for user requested goals, i.e., requests for (i) scientific pictures in desired locations, (ii) reaching a certain position and (iii) monitoring a certain area. Then, the timelines planned by the Mission Manager are dispatched for execution to the Command Dispatcher reactor that, in turn, encodes the planned values into actual commands for the rover and uses the replies provided by the functional layer to produce observations on the low-level timelines. Thus, each reactor has a specific functional role over different temporal scopes during the mission: the Mission Manager's temporal scope is the entire mission and it can potentially take minutes to deliberate; the Command Dispatcher interfaces to the DALA functional layer and requires minimal latency with no deliberation. It is also worth underscoring that the Mission Manager is the only APSI Deliberative Reactor in this

use of the GOAC architecture. The Command Dispatcher is a fully reactive system that interacts with the actual controlled system with no deliberation task involved.

Empirical Evaluation

This section illustrates the assessment of the new APSI Deliberative Reactor performance considering the control system configuration presented in the previous section. Here, the aim is to assess the on-line TGA-based Controller synthesis performance in a real world scenario in order to show its viability with respect to actual execution requirements, i.e., the latencies of an on-line planning and execution cycle. Therefore, similarly to (Orlandini et al. 2011), different planning/execution scenarios are considered by varying the complexity of the robotic planning problem along the following dimensions: (1) Plan Length. Problem instances are considered with an increasing number of requested pictures (from 1 to 3). At the same time, flexible plans are generated over a horizon length ranging from 150 to 400 seconds. (2) Plan Flexibility. For each uncontrollable activity (i.e., robot and PTU movements as well as camera and communication tasks), a minimal duration is set, but temporal flexibility on activity termination is considered, i.e., the end of each activity presents a tolerance ranging from 10 to 30 seconds. This interval represents the degree of temporal flexibility/uncertainty that we introduce in the system. (3) Plan Choices. We define from 1 to 3 visibility windows that can be exploited to communicate picture contents. Notice that an increasing number of communication opportunities raises the complexity of the planning problem with a combinatorial effect.

More in general, among all the generated problem instances, the ones with a higher number of required pictures, higher temporal flexibility, and a higher number of visibility windows result as the hardest ones. In these scenarios, we analyzed the performance of the APSI Deliberative Reactor considering costs for planning, TGA model generation, plan verification/strategy synthesis and actual plan execution. The OMPS tool has been exploited as CBTP Domain Independent Planner. The DALA rover has been simulated by means of a software environment 4 used for testing the control system during the GOAC project, offering the same robotic functional interface as well as fully replicating the physical rover behaviors (i.e., random temporal durations for uncontrollable tasks). The experiments have been run on a PC endowed with an Intel Core i7 CPU (2.93GHz) and 4GB RAM and, for each setting, 10 runs have been performed; in the tables, average timings are reported in milliseconds.

In Table 1, the performance of the APSI Deliberative Reactor during the whole planning and execution cycle is reported. Such execution settings seem to be suitable only in problems with one picture, while, with 2 pictures, verification costs rather dominate both deliberative and execution costs. For instance, with 3 communication windows and 30 seconds flexibility (i.e., the most complex scenario),

4 DALA software simulator courtesy of Felix Ingrand and Lavindra De Silva from LAAS-CNRS.


Table 1: Performance with verification and strategy generation performed on a complete TGA model (timings in secs).

                 1 Comm Window       2 Comm Windows      3 Comm Windows
flex             10    20    30      10    20    30      10    20    30

PLANNING
TP1              0.3   0.3   0.3     0.3   0.3   0.3     0.3   0.3   0.3
TP2              1.0   1.1   1.0     1.1   1.1   1.1     1.1   1.1   1.1

TGA ENCODING
TP1              0.008 0.007 0.006   0.007 0.009 0.006   0.009 0.007 0.006
TP2              0.007 0.007 0.008   0.008 0.006 0.007   0.007 0.007 0.008

PLAN VERIFICATION & STRATEGY GENERATION ON COMPLETE TGA MODEL
TP1              0.6   0.6   0.8     0.6   0.6   0.8     0.6   0.8   2.2
TP2              137.8 152.8 149.9   137.4 149.4 150.5   139.4 150.3 151.5

PLAN EXECUTION
TP1              27.9  38.3  42.7    30.8  36.5  44.5    31.7  36.4  40.4
TP2              65.5  77.2  103.5   60.4  78.7  89.0    66.0  78.0  106.1

even the execution costs are comparable with the time spent by UPPAAL-TIGA in verifying the plan and generating the strategy. Moreover, in the case of 3 pictures, UPPAAL-TIGA has always been terminated after 500 seconds with no suitable generated strategy.

Table 2: Performance with verification performed on a complete TGA model and strategy generation on a reduced TGA model (timings in msecs).

                 1 Comm Window       2 Comm Windows      3 Comm Windows
flex             10    20    30      10    20    30      10    20    30

PLANNING
TP1              0.3   0.3   0.3     0.3   0.3   0.3     0.3   0.3   0.3
TP2              1.0   1.0   0.9     1.0   1.1   1.0     1.1   1.1   1.1
TP3              14.9  15.2  14.1    15.4  15.7  15.9    16.4  16.4  16.5

TGA ENCODING
TP1              0.006 0.006 0.006   0.006 0.006 0.007   0.007 0.006 0.006
TP2              0.006 0.005 0.005   0.005 0.006 0.005   0.006 0.005 0.005
TP3              0.004 0.004 0.003   0.005 0.004 0.004   0.005 0.005 0.004

PLAN VERIFICATION (COMPLETE) & STRATEGY GENERATION (REDUCED)
TP1              0.6   0.5   0.5     0.6   0.5   0.5     0.5   0.5   0.5
TP2              10.8  10.8  10.7    10.8  10.8  10.6    10.8  10.6  10.6
TP3              70.4  70.7  70.1    70.3  70.4  71.1    70.6  70.5  70.5

PLAN EXECUTION
TP1              37.6  46.4  49.8    38.8  47.6  50.2    39.4  49.6  50.8
TP2              89.4  100.8 120.6   87.0  108.6 117.4   90.8  110.8 124.0
TP3              142.2 165.6 169.6   137.8 166.6 182.6   135.8 172.8 180.2

Then, considering the different performance of the verification tool in checking plan correctness only (see (Orlandini et al. 2011)) and taking advantage of the flexibility of the TGA method, a slightly modified approach has been deployed and tested. The TC in the APSI Deliberative Reactor has been modified in order to first invoke the UPPAAL-TIGA tool to check plan correctness on the complete TGA model PL without generating the winning strategy and, afterwards, to ask the verification tool to generate a strategy on a reduced TGA model. Namely, the TC produces a reduced TGA model considering only the plan Plan and the Observer automata (i.e., focusing the strategy generation on the plan and the domain theory descriptions only), relying on the fact that the plan validity is guaranteed by the previous verification step.
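A minimal sketch of this two-step flow is given below; verify and synthesise stand for the two UPPAAL-TIGA invocations and are passed in as callables, since the actual tool interface is not shown in the paper.

# Two-step TC flow: verify on the complete model, synthesise on the reduced one.
# The callables are placeholders for the two UPPAAL-TIGA invocations.
def synthesise_controller(plan_automata, statevar_automata, observer, verify, synthesise):
    complete_model = statevar_automata + plan_automata + [observer]   # StateVar U Plan U {Observer}
    if not verify(complete_model):
        return None                                # plan not valid: replan or revise the model
    reduced_model = plan_automata + [observer]     # drop the state-variable automata
    return synthesise(reduced_model)               # winning strategy then used by the SM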

Such an approach leads to verification performance more compatible with the considered on-line execution scenarios, even though plan verification and strategy generation costs still cannot be neglected with respect to planning and execution costs (see Table 2). Furthermore, considering the average timing values among all the execution settings (i.e., the average values of each row in the table), Figure 3

Figure 3: Performance related to full execution with plan verification and strategy generation performed in two different steps (timings in msecs).

depicts a radar chart showing how each phase affects the whole planning and execution cycle. The plan verification and strategy generation task is always greater by almost one order of magnitude (axes are in logarithmic scale) and, in the 3 pictures setting, such cost is comparable even with the plan execution cost.

Then, a further modification of the TC has been deployed where strategy generation is performed on the reduced TGA model without checking plan correctness. This option entails the strong assumption that every plan generated by the problem solver, in this case OMPS, is valid, i.e., off-line plan verification is required. In Table 3, the reported performance shows that planning and strategy generation costs are equivalent and fully compatible in all the plan execution scenarios. This is also shown in Figure 4, in which, again, average values are considered in the radar charts.

Table 3: Performance with strategy generation only on a reduced TGA model (timings in msecs).

                 1 Comm Window       2 Comm Windows      3 Comm Windows
flex             10    20    30      10    20    30      10    20    30

PLANNING
TP1              0.3   0.3   0.3     0.3   0.3   0.3     0.3   0.3   0.3
TP2              1.0   1.0   0.9     1.0   1.1   1.0     1.1   1.1   1.1
TP3              14.9  15.2  14.1    15.4  15.7  15.9    16.4  16.3  16.5

TGA ENCODING
TP1              0.006 0.006 0.006   0.006 0.006 0.007   0.007 0.006 0.006
TP2              0.006 0.005 0.005   0.005 0.006 0.005   0.006 0.005 0.005
TP3              0.004 0.004 0.003   0.005 0.004 0.004   0.005 0.005 0.004

STRATEGY GENERATION ON REDUCED TGA MODEL
TP1              0.6   0.5   0.5     0.6   0.5   0.5     0.5   0.5   0.5
TP2              3.9   3.7   3.6     3.8   3.7   3.6     3.8   3.6   3.6
TP3              15.4  15.7  15.1    15.3  15.3  16.1    15.6  15.5  15.5

PLAN EXECUTION
TP1              37.6  46.4  49.8    38.8  47.6  50.2    39.4  49.6  50.8
TP2              89.4  100.8 120.6   87.0  108.6 117.4   90.8  110.8 124.0
TP3              142.2 165.6 169.6   137.8 166.6 182.6   135.8 172.8 180.2

Discussion. The present experimental evaluation shows that the APSI Deliberative Reactor infrastructure allows the


Figure 4: Performance related to full execution with plan verification and strategy generation performed in two different steps (timings in msecs).

deployment of different compositions of verification and strategy generation tasks. In particular, the TGA-based Controller is adaptable to different real contexts, allowing the implementation of different suitable controller solutions. The most effective and affordable composition considers only a strategy generation task, even though it entails the assumption on the validity of generated plans. Envisaging a use of the technique within a suitable Knowledge Engineering system (Cesta et al. 2010b; Bernardi et al. 2013) potentially guarantees the deployment of an APSI-based application after extensive off-line plan verification and testing phases (in addition to the known deterministic behavior of the considered problem solver), and suggests considering also such a composition as a fully reliable solution. More generally, the APSI Deliberative Reactor is open to support any operative modality with a computational load that can be tuned according to the criticality of the controlled system.

Conclusion

In this paper, an extension of the APSI Deliberative Reactor control system has been presented, integrating a TGA-based plan controller synthesis approach and thus enforcing robust plan execution. Then, an experimental evaluation has been reported discussing the practical feasibility of the on-line deployment of such a TGA-based approach in different operative modalities and considering increasingly complex instances of a real-world robotics case study derived from a research project funded by the European Space Agency. However, the work described here is valid for any generic layered control architecture (e.g., (Gat 1997)) that integrates a temporal planning and scheduling system.

The reported results show the viability of the approach as well as highlight two main general advantages: the presented methodology relies on off-the-shelf planning/verification tools and, thus, it enables its application to any generic lay-

ered control architecture that integrates a temporal P&S system; the possibility of applying different settings for the control system allows searching for a trade-off between planning, verification and execution costs, i.e., the control system can be tuned according to the actual criticality of the controlled system.

Acknowledgments. Cesta, Orlandini and Suriano are partially funded by the Italian Ministry for University and Research (MIUR) and CNR under the GECKO project (Progetto Bandiera “La Fabbrica del Futuro”). Finzi is partially supported by the EC within the SHERPA FP7 project under grant agreement ICT-600958.

References

Barreiro, J.; Boyce, M.; Do, M.; Frank, J.; Iatauro, M.; Kichkaylo, T.; Morris, P.; Ong, J.; Remolina, E.; Smith, T.; and Smith, D. 2012. EUROPA: A Platform for AI Planning, Scheduling, Constraint Programming, and Optimization. In ICKEPS 2012: the 4th Int. Competition on Knowledge Engineering for Planning and Scheduling.

Behrmann, G.; Cougnard, A.; David, A.; Fleury, E.; Larsen, K.; and Lime, D. 2007. UPPAAL-TIGA: Time for playing games! In Proc. of CAV-07, number 4590 in LNCS, 121–125. Springer.

Bensalem, S.; de Silva, L.; Gallien, M.; Ingrand, F.; and Yan, R. 2010. “Rock Solid” Software: A Verifiable and Correct-by-Construction Controller for Rover and Spacecraft Functional Levels. In i-SAIRAS-10. Proc. of the 10th Int. Symp. on Artificial Intelligence, Robotics and Automation in Space.

Bernardi, G.; Cesta, A.; Orlandini, A.; and Finzi, A. 2013. A Knowledge Engineering Environment for P&S with Timelines. In ICAPS Workshop on Knowledge Engineering for Planning and Scheduling (KEPS).

Cassez, F.; David, A.; Fleury, E.; Larsen, K. G.; and Lime, D. 2005. Efficient on-the-fly algorithms for the analysis of timed games. In CONCUR 2005, 66–80. Springer-Verlag.

Ceballos, A.; Bensalem, S.; Cesta, A.; de Silva, L.; Fratini, S.; Ingrand, F.; Ocon, J.; Orlandini, A.; Py, F.; Rajan, K.; Rasconi, R.; and van Winnendael, M. 2011. A Goal-Oriented Autonomous Controller for Space Exploration. In ASTRA-11. 11th Symposium on Advanced Space Technologies in Robotics and Automation.

Cesta, A., and Oddi, A. 1996. DDL.1: A Formal Description of a Constraint Representation Language for Physical Domains. In Ghallab, M., and Milani, A., eds., New Directions in AI Planning. IOS Press: Amsterdam.

Cesta, A.; Cortellessa, G.; Fratini, S.; Oddi, A.; and Policella, N. 2007. An Innovative Product for Space Mission Planning: An A Posteriori Evaluation. In ICAPS-07, 57–64.

Cesta, A.; Cortellessa, G.; Fratini, S.; and Oddi, A. 2009. Developing an End-to-End Planning Application from a Timeline Representation Framework. In IAAI-09. Proc. of the 21st Innovative Application of Artificial Intelligence Conference, Pasadena, CA, USA.

Cesta, A.; Finzi, A.; Fratini, S.; Orlandini, A.; and Tronci, E. 2010a. Analyzing Flexible Timeline Plan. In ECAI 2010. Proceedings of the 19th European Conference on Artificial Intelligence, volume 215. IOS Press.

Cesta, A.; Finzi, A.; Fratini, S.; Orlandini, A.; and Tronci, E. 2010b. Validation and Verification Issues in a Timeline-Based Planning System. Knowledge Engineering Review 25(3):299–318.


Cesta, A.; Fratini, S.; Orlandini, A.; and Rasconi, R. 2012. Continuous Planning and Execution with Timelines. In i-SAIRAS-12. Proc. of the 11th Int. Symp. on Artificial Intelligence, Robotics and Automation in Space.

Chien, S.; Tran, D.; Rabideau, G.; Schaffer, S.; Mandl, D.; and Frye, S. 2010. Timeline-Based Space Operations Scheduling with External Constraints. In ICAPS-10. Proc. of the 20th Int. Conf. on Automated Planning and Scheduling.

Fratini, S.; Pecora, F.; and Cesta, A. 2008. Unifying Planning and Scheduling as Timelines in a Component-Based Perspective. Archives of Control Sciences 18(2):231–271.

Gat, E. 1997. On Three-Layer Architectures. In Artificial Intelligence and Mobile Robots. MIT Press.

Hunsberger, L. 2010. A fast incremental algorithm for managing the execution of dynamically controllable temporal networks. In Temporal Representation and Reasoning (TIME), 2010 17th International Symposium on, 121–128.

Jonsson, A.; Morris, P.; Muscettola, N.; Rajan, K.; and Smith, B. 2000. Planning in Interplanetary Space: Theory and Practice. In AIPS-00. Proceedings of the Fifth Int. Conf. on AI Planning and Scheduling.

Maler, O.; Pnueli, A.; and Sifakis, J. 1995. On the Synthesis of Discrete Controllers for Timed Systems. In STACS, LNCS, 229–242. Springer.

Morris, P. H., and Muscettola, N. 2005. Temporal Dynamic Controllability Revisited. In Proc. of AAAI 2005, 1193–1198.

Morris, P. H.; Muscettola, N.; and Vidal, T. 2001. Dynamic Control of Plans With Temporal Uncertainty. In Proc. of IJCAI 2001, 494–502.

Muscettola, N.; Dorais, G. A.; Fry, C.; Levinson, R.; and Plaunt, C. 2002. IDEA: Planning at the Core of Autonomous Reactive Agents. In Proc. of NASA Workshop on Planning and Scheduling for Space.

Muscettola, N. 1994. HSTS: Integrating Planning and Scheduling. In Zweben, M., and Fox, M. S., eds., Intelligent Scheduling. Morgan Kaufmann.

Orlandini, A.; Finzi, A.; Cesta, A.; and Fratini, S. 2011. TGA-based Controllers for Flexible Plan Execution. In KI 2011: Advances in Artificial Intelligence, 34th Annual German Conference on AI, volume 7006 of Lecture Notes in Computer Science, 233–245. Springer.

Py, F.; Rajan, K.; and McGann, C. 2010. A Systematic Agent Framework for Situated Autonomous Systems. In AAMAS-10. Proc. of the 9th Int. Conf. on Autonomous Agents and Multiagent Systems.

Shah, J., and Williams, B. C. 2008. Fast Dynamic Scheduling of Disjunctive Temporal Constraint Networks through Incremental Compilation. In ICAPS-08, 322–329.


Plan Repair Driven by Model-Based Agent Diagnosis

Roberto Micalizio
Dipartimento di Informatica, Universita di Torino
corso Svizzera 185 - 10149 Torino, Italy

Abstract

This paper proposes a methodology for repairing a plan executed in a partially observable environment. In particular, the paper takes into account that the plan to be repaired is part of a Multi-Agent Plan, and hence the synchronization among the agents must be considered as a further constraint during the repair process. The paper formalizes a local plan repair strategy, where each agent in the system is responsible for controlling (monitoring and diagnosing) the actions it executes, and for autonomously repairing its own plan when an action failure is detected. The paper also describes how to mitigate the impact of an action failure on the plans of other agents when the local recovery strategy fails.

Introduction

Many real complex tasks find proper solutions in the adoption of a Multi-Agent Plan (MAP). In a MAP, a team of agents cooperate with one another to reach a common goal G by performing actions concurrently. The actual execution of a MAP, however, is menaced by the possible occurrence of plan threats (e.g., agent faults in (Birnbaum et al. 1990)) that can disrupt the nominal progress of the plan by causing the failure of some actions. The occurrence of a plan threat does not, in general, prevent the agents from completing their activities, but the MAP needs to be repaired: a new planning process is required to overcome the effects of the action failure and achieve the global goal in some alternative way.

Dealing with action failures in a multi-agent setting is particularly challenging. First of all, since the agents cooperate by exchanging services, the local failure of an agent can easily propagate through the global MAP and, like a domino effect, trigger a series of harmful effects on the plans of other agents. Moreover, even though an impaired agent does not provide services to other agents, it may still represent a latent menace for them because it may lock critical resources indefinitely.

To cope with these issues, the paper proposes a local approach to autonomous plan repair. In particular, each agent performs a closed control loop over the actions in its local

Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

plan; this control loop includes three main tasks: plan monitoring, agent diagnosis, and plan repair (a sketch of the loop is given below). In this paper we discuss how these three activities can be realized and concatenated even when the system where the agents operate is just partially observable. The limited amount of observations represents a challenge: the monitoring can just estimate the agent state as a set of alternatives (i.e., a belief state), and the agent diagnosis is typically ambiguous (i.e., a set of alternative explanations); as a consequence, the re-planning step must be able to deal with uncertain initial states and non deterministic actions.
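A minimal sketch of this closed loop is shown below; the plan interface and the monitor/diagnose/repair callables are illustrative placeholders for the components formalised in the following sections, not the actual implementation.

# Illustrative closed control loop run by each agent on its local plan.
def local_control_loop(plan, observe, monitor, diagnose, repair):
    belief = plan.initial_belief()                 # set of alternative states
    while plan.has_next():
        action = plan.next_action()
        action.execute()
        obs = observe()                            # partial observations
        belief, failed = monitor(belief, action, obs)
        if failed:
            explanations = diagnose(belief, action, obs)   # typically ambiguous
            plan = repair(plan, belief, explanations)      # conformant re-planning
            if plan is None:
                return "local repair failed: mitigate impact on the other agents"
    return "local sub-goal achieved"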

The paper is organized as follows: first, we introduce some basic notions on multi-agent plans; then we formalize the three steps of the control loop: monitoring, diagnosis, and repair. Since the repair strategy is based on a planning phase, we also sketch the conformant planner that we employ in our framework. Then we present some experimental results, and discuss some related works.

Background

Global Plan. In this paper, a MAP is a system where a team T of agents actively cooperate for reaching a common goal G. For the sake of discussion, the model of a MAP is a simplified version of the formalism presented in (Cox et al. 2005). In particular, the MAP P is a tuple 〈A, E, CL, RE〉, where:

- A is the set of the action instances the agents have to execute. Two pseudo-actions, a0 and a∞, belong to A: a0 (the starting action) has no preconditions, and its effects specify the propositions that are initially true; a∞ (the ending action) has no effects, and its preconditions specify the propositions which must hold in the final state (i.e., the goal G of the MAP). Except for a0 and a∞, each action instance a ∈ A is assigned to a specific agent i ∈ T.

- E is a set of precedence links between actions: a precedence link a ≺ a′ in E indicates that the execution of a must precede the execution of a′;

- CL is a set of causal links of the form cl : a −q→ a′; cl states that action a provides action a′ with service q (q is an atom occurring in the preconditions of a′);

- RE is a set of precedence links ruling the access to the resources. In fact, according to the concurrency requirement introduced in (Roos and Witteveen 2007), two actions a and a′, assigned to different agents, cannot be executed


at the same time instant if they require the same resource res. We assume that the planning process that synthesized P also resolved the conflicts for accessing the resources by adding either a ≺res a′ or a′ ≺res a. To keep track of these additional precedence links, they are labeled with the identifier of the specific resource they refer to, and are collected in the set RE.

Local Plans. The MAP P is decomposed into as many local plans as there are agents in the team: each local plan P i is assigned to an agent i, and reaches a specific sub-goal Gi. Formally, the local plan for agent i is the tuple P i = 〈Ai, Ei, CLi, Ti_in, Ti_out, REi_in, REi_out〉, where Ai, Ei and CLi have the same meaning as the sets A, E and CL, respectively, restricted to actions assigned to agent i. Ai also includes two special actions ai_0 and ai_∞ which specify, respectively, the initial and final conditions for the sub-plan P i. Ti_in (Ti_out) is a set of incoming (outgoing) causal links of the form a −q→ a′ where a′ (a) belongs to Ai and a (a′) is assigned to another agent j in the team. Similarly, REi_in (REi_out) is a set of incoming (outgoing) precedence links of the form a ≺res a′ where a′ (a) belongs to Ai and a (a′) is assigned to another agent j in the team.

We assume that each local plan P i is totally ordered, that is, P i is the ordered sequence of actions [ai_0, ai_1, . . . , ai_∞].
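A possible way of representing these tuples as data structures is sketched below; the Python layout is an assumption made for illustration, while the field names mirror the formalisation above.

from dataclasses import dataclass, field
from typing import List, Set, Tuple

Link = Tuple[str, str, str]   # (from_action, to_action, label: service q or resource res)

@dataclass
class GlobalMAP:
    # Global plan P = <A, E, CL, RE>; a0 and a_inf are included in A.
    A: Set[str]
    E: Set[Tuple[str, str]]            # precedence links a < a'
    CL: Set[Link]                      # causal links a -q-> a'
    RE: Set[Link]                      # resource precedence links a <_res a'

@dataclass
class LocalPlan:
    # Local plan P^i = <A^i, E^i, CL^i, T^i_in, T^i_out, RE^i_in, RE^i_out>.
    agent: str
    A: List[str]                       # totally ordered: [a^i_0, ..., a^i_inf]
    E: Set[Tuple[str, str]] = field(default_factory=set)
    CL: Set[Link] = field(default_factory=set)
    T_in: Set[Link] = field(default_factory=set)
    T_out: Set[Link] = field(default_factory=set)
    RE_in: Set[Link] = field(default_factory=set)
    RE_out: Set[Link] = field(default_factory=set)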

Distributed plan execution. An agent executes its next action as soon as the action preconditions have been satisfied (the notions of preconditions and effects of an action will be formalized in the following section). However, an agent can execute no more than one action in a given time instant. In particular, the time is assumed to be a discrete sequence of instants, and actions are atomic. In the following, the notation ai_l(t) will denote that the l-th action in the local plan P i is executed by agent i at time t.

Coordination during plan execution. Since agents execute actions concurrently, they need to coordinate their activities in order to avoid the violation of the constraints defined during the planning phase. Effective coordination among agents is obtained by exploiting the causal and precedence links in the global MAP. As pointed out in (Decker and Li 2000), coordination between two agents i and j is required when i provides j with a service q; this is modeled by a causal link cl : ai_h −q→ aj_k in the MAP P. (As an effect of the MAP decomposition, cl belongs both to Ti_out and to Tj_in.) Since an agent can observe (at most) the direct effects of the actions it executes, only agent i has the chance of observing the achievement (or the absence) of q; thereby, agent i must notify agent j about the outcome of action ai_h. Similarly, the consistent access to the resources is a form of coordination which involves precedence links. For example, the precedence link pl : ai_h ≺res aj_k means that agent i will release resource res to agent j just after the execution of action ai_h; resource res will be used by agent j to execute action aj_k. Of course, pl belongs both to REi_out and to REj_in. Since the system is distributed, an agent does not have a global view of the status of all the system resources, but it knows just the status of the resources it holds. After having released resource res, agent i will not have access to the actual status of res. In the following we will denote as AvRes(i, t) (available resources) the subset of resources assigned to agent i at time t; i.e., only agent i observes and knows the actual status of those resources.

Monitoring a MAP

In this section we formalize the first step of the local control loop: the plan monitoring activity. Before that, however, we first introduce some fundamental notions.

Agent status. The status of agent $i$ is modeled by a set of status variables $VAR^i$, partitioned into three subsets $END^i$, $ENV^i$ and $HLT^i$. $END^i$ and $ENV^i$ denote the set of endogenous (e.g., the agent's position) and environment (e.g., the resources' state) status variables, respectively. Because of the partitioning, each agent $i$ maintains a private copy of the resource status variables; therefore for each resource $res_k \in RES$ ($k: 1..|RES|$) the private variable $res_{k,i}$ is included in the set $ENV^i$. The consistency among all these copies is assured by the fact that conflicts for accessing the resources are solved at the planning level. The precedence links in $RE$ guarantee in fact that, at each time $t$, a resource $res_k$ is available just for one agent $i$ (i.e., $res_k$ belongs to $AvRes(i, t)$); therefore for any other agent $j \in T \setminus \{i\}$ the status of $res_k$ is unknown.

Since we are interested in monitoring the plan execution even when something goes wrong, we introduce a further set $HLT^i$ of variables for modeling the health status of an agent. For each agent functionality $f$, a variable $v_f \in HLT^i$ represents the health status of $f$; the domain of variable $v_f$ is the set $\{ok, abn_1, \ldots, abn_n\}$, where $ok$ denotes the nominal mode, while $abn_1, \ldots, abn_n$ denote anomalous or degraded modes. An action failure can therefore be explained in terms of faults in a subset of functionalities of a specific agent.

System observability. We assume that after the execution of an action $a^i_l(t)$ the agent $i$ receives a set $obs^i(t+1)$ of observations, which conveys information about a subset of variables in $VAR^i$. Given the partial observability, an agent can directly observe just the status of its available resources and the value of a subset of variables in $END^i$, whereas the variables in $HLT^i$ are not directly observable and their actual value can only be inferred. As a consequence, at each time $t$ the agent $i$ can just estimate a set of alternative states which are consistent with the received observations; in the literature this set is known as belief state, and in the following the notation $B^i(t)$ will refer to the belief of agent $i$ inferred at time $t$.

Action models. In order to monitor the execution of action $a^i_l(t)$, agent $i$ needs a model for estimating all the possible, nominal as well as anomalous, evolutions of the action itself. In our framework, an action model is the tuple $\langle var(a^i_l(t)), PRE(a^i_l(t)), EFF(a^i_l(t)), \Delta(a^i_l(t)) \rangle$, where: $var(a^i_l(t)) \subseteq VAR^i$ is the subset of active status variables over which the preconditions $PRE(a^i_l(t))$ and the action effects $EFF(a^i_l(t))$ are defined; finally, $\Delta(a^i_l(t))$ is a transition relation defined over the agent status variables from time $t$ (when the action starts) to time $t+1$ (when the action ends). Given action $a^i_l$, $healthVar(a^i_l) = HLT^i \cap var(a^i_l)$ denotes the set of variables representing the health status of the functionalities which directly affect the outcome of action $a^i_l$.

The healthy formula $healthy(a^i_l)$ of action $a^i_l$ is computed by restricting each variable $v \in healthVar(a^i_l)$ to the nominal behavioral mode $ok$, and represents the nominal health status required of agent $i$ for successfully completing the action.

Definition 1 The set of the nominal effects of action $a^i_l$ is $nomEff(a^i_l) = \{q \in EFF(a^i_l) \mid PRE(a^i_l) \cup healthy(a^i_l) \vdash q\}$.

On the contrary, when at least one variable $v \in healthVar(a^i_l)$ assumes an anomalous mode (i.e., a functionality is not in the nominal mode), the behavior of the action may be non-deterministic and some of the expected effects may be missing. In the following, $\mathcal{A}$ will denote the set of action models an agent exploits for monitoring the progress of its own plan.

The estimation of the agent status. The estimation process aims at predicting the status of agent $i$ at time $t+1$ after the execution of an action $a^i_l(t)$. However, because of the non-determinism in the action models and the partial system observability, the estimation process can in general infer just a set of alternative agent states (i.e., a belief state) rather than the actual agent state. The estimation can be formalized in terms of the Relational algebra operators as follows.

Definition 2 Let $B^i(t)$ be the belief state of agent $i$, and let $\Delta(a^i_l(t))$ be the model of the action executed at time $t$; the agent belief state at time $t+1$ results from:
$B^i(t+1) = \mathrm{PROJECTION}_{t+1}[\ \mathrm{SELECTION}_{obs^i(t+1)}\ (B^i(t)\ \mathrm{JOIN}\ \Delta(a^i_l(t)))\ ]$

The join operation $B^i(t)\ \mathrm{JOIN}\ \Delta(a^i_l(t))$ is the predictive step by means of which all the possible agent states at time $t+1$ are estimated. The selection $\mathrm{SELECTION}_{obs^i(t+1)}$ refines the predictions by pruning off all those estimates which are inconsistent with the agent observations. Finally, the belief state $B^i(t+1)$ is obtained by projecting the resulting estimates over the status variables of agent $i$ at time $t+1$.

Action outcome. The outcome of an action is either succeeded or failed. Given the belief $B^i(t+1)$, the agent $i$ determines the successful completion of action $a^i_l(t)$ as follows:

Definition 3 The outcome of action $a^i_l(t)$ is succeeded iff $\forall q \in nomEff(a^i_l(t)), \forall s \in B^i(t+1),\ s \models q$.

In order to be conservative, we consider action $a^i_l(t)$ successfully completed only when all the atoms $q$ in $nomEff(a^i_l(t))$ are satisfied in every state $s$ in $B^i(t+1)$; i.e., when all the nominal effects of $a^i_l(t)$ hold in every possible state estimated after the execution of the action. When we cannot assert that action $a^i_l(t)$ has succeeded, we assume that the action has failed. This conservative assumption can be relaxed along the lines discussed in (Micalizio and Torasso 2008; 2009), where an action outcome is pending when the available observations are not sufficient for discriminating between success and failure.
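To make Definitions 2 and 3 concrete, the following minimal Python sketch represents a belief state as a collection of variable assignments and implements the join/selection/projection pipeline and the outcome test with plain set operations. The data layout (dicts keyed by variable name) and the function names are illustrative assumptions; the paper's actual implementation is symbolic and OBDD-based.

```python
def estimate_belief(belief, delta, obs):
    """Definition 2: B(t+1) = PROJECTION_{t+1}[ SELECTION_obs (B(t) JOIN Delta) ]."""
    # JOIN: combine every current state with every applicable transition of the action model
    predicted = []
    for state in belief:                         # state: dict var -> value at time t
        for pre, post in delta:                  # delta: list of (guard at t, effects at t+1)
            if all(state.get(v) == val for v, val in pre.items()):
                nxt = dict(state)                # frame assumption: untouched vars persist
                nxt.update(post)
                predicted.append(nxt)
    # SELECTION: keep only predictions consistent with the received observations
    consistent = [s for s in predicted
                  if all(s.get(v) == val for v, val in obs.items())]
    # PROJECTION over the time t+1 status variables (states here already range over them)
    return consistent

def outcome(belief_next, nominal_effects):
    """Definition 3: succeeded iff every nominal effect holds in every estimated state."""
    ok = belief_next and all(
        all(s.get(v) == val for v, val in nominal_effects.items()) for s in belief_next)
    return "succeeded" if ok else "failed"
```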

Agent Diagnosis

In this section we formalize the step of agent diagnosis, which explains an action failure in terms of faults in the agent functionalities. As a further refinement of the failure analysis, we also show how to compute the set of missing goals, which in principle steers the recovery process.

Agent diagnosis. The diagnostic process is activated whenever the agent $i$ detects the failure of an action $a^i_l(t)$. The purpose of the diagnostic process is to single out which fault (or combination of faults) is a possible cause (i.e., an explanation) for the detected failure. An explanation is therefore expressed in terms of the status variables in $healthVar(a^i_l)$, as these variables model the health status of the functionalities required for the successful execution of action $a^i_l$.

Intuitively, given the agent belief state $B^i(t+1)$, the agent diagnosis $D^i$ is inferred by projecting $B^i(t+1)$ over the status variables in $healthVar(a^i_l)$. However, since $B^i(t+1)$ is in general ambiguous, the agent diagnosis $D^i$ turns out to be a set of alternative explanations: each explanation $exp \in D^i$ is a complete assignment of values to the status variables in $healthVar(a^i_l)$. More formally, the agent diagnosis is defined in Relational terms as
$D^i = \mathrm{PROJECTION}_{healthVar(a^i_l)}\ B^i(t+1)$

Missing Goals. As noted earlier, the agents in the team $T$ cooperate with one another by exchanging services; that is, there exist causal dependencies between actions of different agents. As a consequence, the failure of action $a^i_l$ prevents the execution of the actions in the plan segment $[a^i_{l+1}, \ldots, a^i_\infty]$ and, since some services will never be provided, it can indirectly impact the local plans of the other teammates. The set of the services that agent $i$ can no longer provide due to the failure is denoted as the set of missing goals; singling out these services is important as, in principle, it would be sufficient to find an alternative way to provide them in order to reach the global goal $G$ despite the failure of action $a^i_l(t)$.

Definition 4 Given the failure of action $a^i_l$, let $[a^i_{l+1}, \ldots, a^i_\infty]$ be the plan segment the agent $i$ is unable to complete; the set of missing goals is:
$MG(i) = \{ \text{service } q \mid q \in nomEff(a^i_k) \text{ for some } a^i_k \in [a^i_l, \ldots, a^i_\infty], \text{ and either } q \in PRE(a^i_\infty) \text{ or } \exists\, cl \in CL \text{ such that } cl: a^i_k \xrightarrow{q} a^j_h,\ i \neq j \}$

Namely, the service $q$ is a missing goal when $q$ is a nominal effect no longer provided by an action in the plan segment $[a^i_l, \ldots, a^i_\infty]$, and when either $q$ is an atom appearing in the sub-goal $G^i$ (i.e., $q$ is a precondition of the special action $a^i_\infty$) or $q$ is a service agent $i$ should provide to another agent $j$.
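A direct reading of Definition 4 as a set comprehension, in a short hypothetical Python sketch (the argument names and data layout are assumptions made for illustration):

```python
def missing_goals(segment, nom_eff, pre_final, causal_links, agent):
    """Definition 4: services the failed agent can no longer provide.

    segment:      actions [a_l, ..., a_inf] the agent cannot complete
    nom_eff:      dict action -> set of nominal effects
    pre_final:    preconditions of the special action a_inf (the sub-goal G^i)
    causal_links: set of (producer_action, service, consumer_agent) triples
    """
    mg = set()
    for a in segment:
        for q in nom_eff.get(a, set()):
            serves_subgoal = q in pre_final
            serves_other = any(p == a and s == q and j != agent
                               for (p, s, j) in causal_links)
            if serves_subgoal or serves_other:
                mg.add(q)
    return mg
```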

Plan Repair: a local strategy

In this section we discuss a methodology for repairing the local plan $P^i$, interrupted after the failure of action $a^i_l$ has been detected. Essentially, this repairing process consists in a re-planning step intended to overcome, if possible, the harmful effects of the failure. In the next section we sketch the planning algorithm activated by agent $i$; in this section we focus on which goals should be reached for the recovery purpose. As noted above, the set of missing goals can be used to this end; unfortunately, when the recovery is driven by the missing goals it requires global changes. In fact, the missing goals are long-term objectives that can be reached by acquiring new resources; the acquisition of a resource, however, imposes coordination with other teammates, and hence new causal and precedence links have to be introduced in the global MAP $P$; it follows that a number of other agents in the team have to change their local plans.

The local strategy we propose, instead, tries to recover from the failure of $a^i_l$ just by changing the local plan $P^i$, without any direct impact on the plans of other teammates. The idea of a local strategy stems from the observation that in many cases an agent is still able to do something useful even if its health status is not completely nominal. By exploiting this possibility, we first formalize a local replanning strategy intended to overcome the causes of an action failure, and then restore the plan execution from the point where it was stopped. However, when such a replanning step fails, we also show how the agent in trouble can reduce the impact of the failure on the MAP $P$ by moving into a safe status.

Repairing the interrupted local plan. This step is based on the observation that the plan segment $[a^i_{l+1}, \ldots, a^i_\infty]$ could be carried out if the root causes of the failure of action $a^i_l$ were removed. Of course these root causes have been singled out by the diagnostic inferences: the agent diagnosis $D^i$ explains the failure of $a^i_l$ as a combination of anomalous conditions in the functionalities of agent $i$; therefore, to overcome the causes of this failure the agent $i$ has to restore a healthy condition in those functionalities. To this end the agent $i$ can exploit a set $AR$ of repairing actions; each action $ar \in AR$ restores the healthy condition in a specific agent functionality. Of course, it is possible that some faults are not autonomously repairable by an agent. For example, if our agents were robots, a low charge in the battery could be fixed by means of a recharge repairing action, whereas a fault in the mobility functionality would require human intervention and could not be fixed autonomously by the robots.

Therefore, relying on the agent diagnosis $D^i$, agent $i$ assesses whether one (or a set of) repairing action(s) exists. If recovery actions do not exist, agent $i$ gives up the synthesis of a recovery plan and tries to reach a safe status (see later). If recovery actions exist, the agent $i$ tries to reach a new goal $K$ consisting in: 1) restoring the healthy conditions in its functionalities by executing an appropriate set of repairing actions, and 2) restarting the execution of the plan from the failed action $a^i_l$. The repairing plan $Pr^i$ is a plan which meets these two goals, and it can be found by solving the following planning problem:

Definition 5 The repairing plan $Pr^i = [ar^i_0, \ldots, ar^i_\infty]$ is a solution of the plan problem $\langle \mathcal{I}, \mathcal{F}, A, AvRes(i, t), D^i \rangle$, where:

- $\mathcal{I}$ (initial state) corresponds to the agent belief state: $EFF(ar^i_0) \equiv B^i(t+1)$ (i.e., the belief state inferred after the execution of action $a^i_l(t)$).
- $\mathcal{F}$ (final state) is the goal $K$ defined as $PRE(ar^i_\infty) \equiv \{\forall v \in healthVar(a^i_l),\ v = ok\} \wedge PRE(a^i_l)$.
- $A \subseteq \mathcal{A} \cup AR$ is the set of action models which can be used during the planning process given the agent diagnosis $D^i$ and the available resources $AvRes(i, t)$.

The repairing plan $Pr^i$, however, must satisfy two further demanding requirements:

Requirement 1 Since the repairing plan $Pr^i$ can impose local changes only, no new resources can be acquired: the actions in $Pr^i$ can just exploit the resources in $AvRes(i, t)$, already acquired by agent $i$ at the time of the failure.

Requirement 2 Since the belief state $B^i(t+1)$ is potentially ambiguous (the actual agent health status is not precisely known), the repairing plan $Pr^i$ must be conformant; namely, it must be executable no matter the actual health status of agent $i$.

An important consequence of the conformant requirement is the following property.

Property 1 For each action $ar^i_k \in Pr^i$ it must hold that $healthy(ar^i_k) \cup D^i \not\vdash \bot$.

Property 1 states that all the actions in the repairing plan must be executable even though the current status of agent $i$ is not healthy. Therefore, when no action is executable given the agent diagnosis $D^i$, the repairing plan does not exist. Assuming that the plan $Pr^i$ exists, agent $i$ yields its new local plan $P^{*i} = [ar^i_0, \ldots, ar^i_\infty] \circ [a^i_l, \ldots, a^i_\infty]$, where $\circ$ denotes the concatenation of two plans (i.e., the second plan can be executed just after the last action of the first plan has been executed).

Property 2 The recovery plan $P^{*i}$ is feasible and executable.

Due to space reasons the proof is omitted; intuitively, the feasibility of the recovery plan $P^{*i}$ stems from two characteristics: 1) every plan segment is feasible on its own, as it has been produced by a specific planning step, and 2) by Definition 5, the preconditions of the action $ar^i_\infty$ closing the first plan match the preconditions of the action $a^i_l$ opening the second one. A more important property is the following one:

Property 3 The recovery plan $P^{*i}$ meets all the services in $MG(i)$.

Property 3 guarantees that, by executing $P^{*i}$ in lieu of $[a^i_l, \ldots, a^i_\infty]$, agent $i$ can recover from the failure of action $a^i_l$ and achieve its sub-goal $G^i$ despite the failure.

Reaching the Safe Status. The repairing plan $Pr^i$, however, may not exist. In fact, the faults assumed in $D^i$ may not be repairable, or a conformant solution may not exist. When the plan recovery process fails, the impaired agent can be seen as a latent menace for the other team members (e.g., when the agent locks critical resources indefinitely). We complement the first step of the local strategy with a further step intended to lead the impaired agent $i$ into a safe status $S^i$. In this paper, we define a safe status as a condition where all the resources used by $i$ at time $t$ (the time of the failure of action $a^i_l(t)$) have been released. Also this step can be modeled as a planning problem as follows:


Definition 6 The plan-to-safe-status $Ps^i = [as^i_0, \ldots, as^i_\infty]$ is a solution of the plan problem $\langle \mathcal{I}, \mathcal{F}, A, AvRes(i, t), D^i \rangle$, where:

- $\mathcal{I}$ (initial state) corresponds to the agent belief state: $EFF(as^i_0) \equiv B^i(t+1)$ (i.e., the belief state inferred after the execution of action $a^i_l(t)$).
- $\mathcal{F}$ (final state) is the safe status $S^i$, defined as $PRE(as^i_\infty) \equiv \forall res_k \in AvRes(i, t),\ res_{k,i} = free$.
- $A$ is, as before, the set of action models which can be used during the planning process given the agent diagnosis $D^i$ and the set of available resources $AvRes(i, t)$.

Of course, the plan-to-safe-status $Ps^i$ must also satisfy Requirements 1 and 2; thus Property 1 can be extended to the actions in $Ps^i$ too. Repairing actions can also be used during this planning step: in some cases, in fact, it is necessary to restore the healthy status of some functionalities in order to release a resource. When the plan-to-safe-status $Ps^i$ exists, it becomes the new local plan assigned to agent $i$; that is, $P^{*i} = Ps^i$, and all the actions in $[a^i_l, \ldots, a^i_\infty]$ are aborted. Therefore, even though the agent $i$ is unable to reach its goal $G^i$, it moves into a safe status in order not to obstruct the other team members in their activities.

When the recovery strategy fails. The recovery strategy fails when neither the plan to a repaired state nor the plan to a safe status exists. In this case, we adopt a conservative policy and impose that the impaired agent gives up the execution of its local plan. Performing further actions, in fact, may lead the agent into dangerous conditions; for example, the agent could lock some resources indefinitely, preventing others from accessing them.

The failure of the local recovery strategy does not imply, in general, that the action failure cannot be recovered from, but different, global strategies should be activated. These strategies (out of the scope of this paper) are driven by the set of missing goals, and may require the cooperation of a subset of agents, or the activation of a global re-planning step.

The algorithm. The high-level algorithm of the control loop performed by each agent $i \in T$ is shown in Figure 1. The algorithm consists in a while loop, where at each iteration agent $i$ singles out the next action $a^i_l$ to be executed. The action $a^i_l$ is executed iff its preconditions are satisfied in the current belief state $B^i(t)$. After the action execution, $i$ gathers the available observations and detects the outcome of $a^i_l$ (see Definition 3). In case the action outcome is failed, first the diagnostic inferences are activated, and then the conformant planner is invoked to find a plan to a repaired state (Definition 5), or alternatively a plan to a safe status (Definition 6). When both planning steps fail, the agent $i$ sends a message to all the other agents about its failure, and interrupts the execution of its local plan $P^i$.

LocalControlLoop(P^i, B^i(0)) {
   t = 0;
   while there are actions in P^i to be executed
      a^i_l ← nextAction(P^i);
      if PRE(a^i_l) are satisfied in B^i(t)
         EXECUTE a^i_l;
         gather observations obs^i(t+1);
         B^i(t+1) ← Monitoring(B^i(t), Δ(a^i_l));
         if outcome(a^i_l, B^i(t+1)) equals failed
            D^i ← Infer-Diagnosis(B^i(t+1), healthVar(a^i_l));
            Pr^i ← ConfPlan(B^i(t+1), K, A, AvRes(i, t), D^i);
            if Pr^i is not empty
               P^i ← Pr^i ∘ [a^i_l, ..., a^i_∞];
            else
               Ps^i ← ConfPlan(B^i(t), S, A, AvRes(i, t), D^i);
               if Ps^i is not empty
                  P^i ← Ps^i;
               else
                  invoke a global recovery strategy;
      t = t + 1;
}

Figure 1: The control loop algorithm.

Conformant Planning

In this section we just sketch the main steps performed by the conformant planner we propose; more details can be found in (Micalizio 2013). The high-level algorithm of the conformant planner is shown in Figure 3. The simple idea at the basis of our planner is to create a macro-operator $\Phi$ which gathers, in a disjunctive form, all the models of the actions that can be used during the repair plan. This task is performed by function Build-$\Phi$ (line 00), which takes in input the set of action models $\mathcal{A}$, the set of available resources $AvRes(i, t)$, and the agent diagnosis $D^i$. The function selects from $\mathcal{A}$ all those actions that can be performed given the set of resources currently assigned to agent $i$, and that are consistent with the diagnosis $D^i$. In other words, an action that requires a functionality assumed faulty in $D^i$, or that requires a resource that the agent does not hold, is not included in $\Phi$.

The macro-operator $\Phi$ is subsequently used to extend the set of current plan hypotheses. To keep track of such plan hypotheses, we introduce a further structure, named $PSET$. Intuitively, $PSET_h$ can be seen as a set of trajectories, where each trajectory $tr$ has the form $\langle B^i(0), a_1, B^i(1), a_2, \ldots, a_h, B^i(h) \rangle$. Each $B^i(k)$ ($k: 0..h$) represents an agent belief state, while each $a_k$ ($k: 1..h$) represents an action that brings belief state $B^i(k-1)$ to evolve into belief state $B^i(k)$.

In particular, we call these actions "conformant" in the sense that each action $a_k$ is applicable in every state $s \in B^i(k-1)$. In other words, $PSET_h$ represents the set of conformant plans of length $h$ found so far. At the beginning of the algorithm, of course, $PSET_0$ is set to the initial belief state $\mathcal{I}$ (line 02).

After these preliminary steps, the algorithm starts a while loop that terminates either when a conformant plan has been found, or when a maximum depth MAXDEPTH has been reached. In this second case, the algorithm has explored the whole space of plans no longer than MAXDEPTH actions without finding a solution, and hence terminates with a failure.

At each iteration of the while loop, the current $PSET$ structure is extended by means of the macro-operator $\Phi$ (line 06). Such an extension consists in applying each action in $\Phi$ to each belief state in the frontier of $PSET_h$, namely, to each belief state at depth $h$. The result of this operation is a new set of plan hypotheses $PSET_{h+1}$, which are one action longer than the previous ones.


[Figure 2: An example of how the PSET structure is extended. Belief states (boxes) are expanded from the initial belief I by applying the macro-operator Φ (actions a1, a2, ..., an) to the frontier at each step, until a belief state satisfying the goal F is reached.]

Note that this extension is carried out by means of a Relational JOIN, which does not guarantee that $PSET_{h+1}$ contains conformant actions only. For this reason, the newly created $PSET_{h+1}$ is refined by removing all those belief states produced by non-conformant actions (line 07). If the resulting $PSET_{h+1}$ gets empty, the planning process terminates with the guarantee that no conformant plan exists for the given problem. Otherwise, the algorithm checks whether some of the belief states in the frontier of $PSET_{h+1}$ satisfy the goal. In the positive case, a conformant plan has been found, so it is extracted and returned as a result. In the negative case, the loop is repeated.

Figure 2 gives an intuition of how the search proceeds. Belief states are represented as boxes. Grey boxes are those along a solution. The macro-operator $\Phi$ contains all the actions that can be used to find a conformant plan (i.e., $a_1, a_2, \ldots, a_n$). The operator is applied to each belief state in the frontier of the current $PSET$ structure, starting from the initial belief $\mathcal{I}$. After three steps of the algorithm, a solution is found since a belief state satisfies the goal $\mathcal{F}$, and hence the plan $\langle a_1, a_2, a_n \rangle$ is returned.

The planning algorithm ConfPlan has some interesting properties.

Property 4 (Optimality) If ConfPlan terminates with success, it finds all the optimal (i.e., with the minimum number of actions) conformant plans.

Intuitively, the proof follows from the fact that the algorithm implements an exhaustive, forward-chaining search. Thus, it is possible to demonstrate that if a conformant plan $\pi$ found by ConfPlan were longer than another conformant plan $\pi'$ for the same problem, then $\pi'$ would have to mention at least one action that is not included in the macro-operator $\Phi$; but in this case, $\pi'$ would not be executable by the impaired agent $i$.

Property 5 (Soundness) If ConfPlan terminates with a failure because the set PSET gets empty, then there exists no conformant plan that solves the problem.

ConfPlan(I, F, A, AvRes(i, t), D^i)
00  Φ ← Build-Φ(A, AvRes(i, t), D^i)
01  π ← ∅
02  PSET_0 ← I
03  h ← 0
04  solved ← false
05  while not solved and h < MAXDEPTH
06      PSET_{h+1} ← PSET_h JOIN Φ
07      PSET_{h+1} ← PruneNotConformant(PSET_h, PSET_{h+1})
08      if PSET_{h+1} is empty return ∅
09      solved ← CheckGoal(PSET_{h+1}, F)
10      if solved = true
11          π ← ExtractPlan(PSET_{h+1})
12      else h ← h + 1
13  return π

Figure 3: The high-level algorithm for the synthesis of a conformant plan.

Also this property follows from the fact that the search is exhaustive: if $PSET_{h+1}$ gets empty, it means that no action in $\Phi$ is applicable as a conformant action to any of the belief states in the frontier of $PSET_h$. Therefore, no solutions longer than $h$ actions are possible.

Property 6 (Incompleteness) If ConfPlan terminates with a failure because the threshold MAXDEPTH has been reached, a conformant plan longer than MAXDEPTH actions might exist.

The incompleteness of the ConfPlan algorithm comes from the fact that the search is made in a forward-chaining manner without backtracking. Thus, to guarantee termination even in the presence of infinite paths, we need to set an artificial limit to the search space, which introduces the incompleteness.
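The following short Python sketch mirrors the structure of ConfPlan with explicit belief states (frozensets of states) instead of the OBDD encoding used by the authors; action names, the state representation, deterministic action effects, and the applicability test are simplifying assumptions for illustration only.

```python
def conf_plan(initial_belief, goal_holds, actions, max_depth):
    """Forward, exhaustive search over belief states; returns one shortest
    conformant plan (a list of action names) or None.

    initial_belief: frozenset of states
    goal_holds:     function state -> bool
    actions:        dict name -> (applicable(state) -> bool, apply(state) -> state)
    """
    frontier = [(initial_belief, [])]          # PSET_h: belief states plus the plan reaching them
    for _ in range(max_depth):
        next_frontier = []
        for belief, plan in frontier:
            for name, (applicable, apply_) in actions.items():
                # "conformant" action: applicable in every state of the belief
                if all(applicable(s) for s in belief):
                    new_belief = frozenset(apply_(s) for s in belief)
                    if all(goal_holds(s) for s in new_belief):
                        return plan + [name]   # goal certainly reached
                    next_frontier.append((new_belief, plan + [name]))
        if not next_frontier:                  # no conformant extension exists
            return None
        frontier = next_frontier
    return None                                # MAXDEPTH reached: inconclusive
```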

Experimental results

The proposed control loop has been implemented in Java JDK 1.6 by exploiting the symbolic formalism of Ordered Binary Decision Diagrams (OBDDs) to encode the agents' belief states and the non-deterministic models of the actions. Monitoring, diagnosis and planning are therefore implemented in terms of standard OBDD operators (see (Micalizio 2013; Cimatti and Roveri 2000; Jensen and Veloso 2000)). Agents are threads running on the same PC (Intel Core 2, 2.16 GHz, 2 GB RAM, Windows XP).

In our experiments we have (software) simulated a service-robot scenario where a team of robotic agents offers a "mail delivery service" in an office-like environment. Resources are parcels, clerks' desks, doors, and one or more repositories. Resources are constrained: desks, doors and repositories can be accessed by only one agent at a time; moreover, at most one parcel can be put on a desk. The environment we have simulated is fairly large, involving 30 critical resources. In such an environment we have considered the execution of 15 MAPs, involving 6 agents, each of which executes 10 actions and has to reach 2 sub-goals (i.e., the complex goal G consists of 12 sub-goals). The execution of each MAP has been perturbed by the injection of a fault in the functionalities of one agent.

In order to prove the effectiveness of the local recovery strategy, we have considered four alternative scenarios including from 2 to 8 agents. In each scenario we have simulated the execution of 40 MAPs, whose main characteristics are reported in Table 1. In these four scenarios, we compared the behavior of four strategies when the actual execution of the MAPs is affected by the occurrence of faults. Up to three faults are injected randomly in each MAP; some of them are repairable, others are not. The four strategies we compared are: no-repair, the agent in trouble does not handle the failure (the agent just stops the execution of its local plan); safe-status, whenever an action failure occurs, the agent in trouble moves into a safe status; repair, the agent tries to repair its own plan, and when the repair process fails the agent stops the plan execution; r+s (repair and safe-status), a combination of the previous strategies: first the agent tries to repair its plan, and in case this step fails the agent reaches a safe status.

Table 1: Main characteristics of the simulated plans (avg. values) in each scenario.

                        SCN2   SCN4   SCN6   SCN8
# actions                140    312    308    444
# causal links           430    846    818   1062
# subgoals                43    114     99    148
# actions per agent       70     78     51     56
# subgoals per agent      22     28     17     19

Figure 4: The average number of performed actions in the four scenarios: comparison between the repair strategies and the nominal situation.

Figure 4 shows the average number of actions that have been performed with the four strategies. From the picture it emerges that the best strategy is r+s. This strategy, in fact, is the most flexible as it can take advantage of both a plan repair and, when this is not possible, of a plan-to-safe-status. In Figure 5 we show the percentage of subgoals actually achieved in the four scenarios by the four strategies. Also in this case strategy r+s is the best choice as it achieves the highest number of subgoals in all four scenarios.

Figure 5: The percentage of achieved subgoals by the four repair strategies in the four scenarios.

Table 2: CPU time [msec] for the repair process.

                         repaired      safestatus
SCN2  r+s                841 ± 129     876 ± 138
      repair             838 ± 128     -
      safestatus         -             112 ± 27
SCN4  r+s                874 ± 134     904 ± 127
      repair             870 ± 132     -
      safestatus         -             72 ± 25
SCN6  r+s                1009 ± 130    1053 ± 135
      repair             1007 ± 146    -
      safestatus         -             109 ± 34
SCN8  r+s                1677 ± 45     1765 ± 47
      repair             1657 ± 35     -
      safestatus         -             217 ± 34

Due to space reasons, we present the computational times required for the repair purpose only. These times are shown in Table 2. Of course, strategy no-repair is not included as its repair cost is always zero (indeed, this strategy does not attempt any plan repair). The table reports the average CPU time spent for planning either to a repaired state or to a safe status.

The safestatus strategy can only plan to a safe status; such a planning problem is in general simpler than a plan repair problem, and hence it is the cheapest strategy, but its effectiveness is limited, as we have already shown.

The repair strategy can only plan to a repaired state from which the nominal plan execution can be resumed. Such a planning problem is more complex since it has to restore the nominal conditions in the agent's functionalities, and to satisfy the preconditions of the plan segment still to be performed. So the computational time is on the order of 800 msec.

The r+s strategy can plan both to a repaired state and to a safe status. Indeed, when a plan to a repaired state exists, the strategy behaves similarly to the repair strategy. However, when such a planning step fails, the r+s strategy tries a plan to safe status; in these cases the computational cost of r+s is on the order of one second or more. It is worth noting, however, that even though r+s can invoke, at least in some cases, the conformant planner twice, the computational cost is not doubled. This happens because in most of the cases r+s detects very early that a conformant plan to a repaired state does not exist, and hence the overhead introduced by this planning step is still acceptable.

Related works

In (Birnbaum et al. 1990) a model-based approach to plan diagnosis is presented; in this approach the authors relate the health status of a planning agent to the outcome of the planning activity. Also in this paper we relate the outcome of the actions executed by plan executors to the health status of these executors; however, in this work we consider multi-agent plans, whereas Birnbaum et al. considered just a single planning agent.

The multi-agent setting is discussed in (Roos and Witteveen 2007), where the authors introduce the notion of plan diagnosis as the subset of actions whose failure is consistent with the anomalous observed behavior of the system. In contrast to our work, this approach does not relate the failure of an action to the health status of the agents; it focuses just on the detection of abnormal actions.

In (Kalech and Kaminka 2007) the authors introduce the notion of social diagnosis to find the cause of coordination failures. In their approach, however, they do not explicitly consider plans; rather, they model a hierarchy of behaviors: each agent selects, independently from the others, the most appropriate behavior given its own beliefs.

The plan repair task has been addressed by a number of works (see e.g., (van der Krogt and de Weerdt 2005; Decker and Li 2000; Horling and Benyo 2001)), which consider both self-interested and collaborative agents; however, these works are not directly applicable in our framework. These approaches in fact are mainly focused on repairing coordination flaws occurring during plan execution; thus they involve a re-scheduling task rather than performing a re-planning step (see the GPGP solution in (Decker and Li 2000)). In (Horling and Benyo 2001) a solution for reorganizing the tasks among the (collaborative) agents is presented: this approach is driven by the results of a diagnostic engine which explains detected plan failures. In this case, however, the explanations are derived from a causal model where anomalous events (e.g., resource unavailable) are organized in a fault tree, and the reaction to a plan failure is a suitable precompiled repairing solution.

Conclusions

In this paper we have formalized a closed loop of control over the execution of a multi-agent plan.

The paper contributes to showing the importance of a repair strategy driven by a failure analysis which highlights the root causes of an action failure. Depending on the (possibly multiple) faults and on the activities of the agent in trouble, different courses of action are synthesized, either to recover from the action failure (if the local repairing plan exists) or to bring the agent into a safe status and limit the impact of the failure.

The preliminary experimental results show that the proposed methodology is adequate to promptly react to an action failure and to actually mitigate the harmful effects of the failure. Also, the computational cost of the approach is affordable since the search for a recovery plan is strongly constrained by the agent diagnosis.

The proposed framework can be extended to deal with more sophisticated notions of multi-agent plans. First of all, concurrency constraints can be introduced to model joint actions (see e.g., (Micalizio and Torasso 2008)). A more interesting extension concerns the temporal dimension. Dealing with temporal plans has a strong impact on the conformant planner: in fact, the planner has to find a repairing plan that meets the set of missing goals and that can be executed without violating any temporal constraint.

References

L. Birnbaum, G. Collins, M. Freed, and B. Krulwich. Model-based diagnosis of planning failures. In Proc. AAAI-90, pages 318–323, 1990.
A. Cimatti and M. Roveri. Conformant planning via symbolic model checking. JAIR, 13:305–338, 2000.
J. S. Cox, E. H. Durfee, and T. Bartold. A distributed framework for solving the multiagent plan coordination problem. In Proc. AAMAS-05, pages 821–827, 2005.
K. Decker and J. Li. Coordinating mutually exclusive resources using GPGP. Journal of AAMAS, 3(2):113–157, 2000.
B. Horling, B. Benyo, and V. Lesser. Using self-diagnosis to adapt organizational structures. In Proc. ICAA'01, pages 529–536, 2001.
R. M. Jensen and M. M. Veloso. OBDD-based universal planning for synchronized agents in non-deterministic domains. JAIR, 13:189–226, 2000.
M. Kalech and G. A. Kaminka. On the design of coordination diagnosis algorithms for teams of situated agents. AI, 171:491–513, 2007.
R. Micalizio and P. Torasso. Monitoring the execution of a multi-agent plan: Dealing with partial observability. In Proc. of ECAI'08, pages 408–412, 2008.
R. Micalizio and P. Torasso. Agent cooperation for monitoring and diagnosing a MAP. In MATES, volume 5774 of Lecture Notes in Computer Science, pages 66–78, 2009.
R. Micalizio. Action failure recovery via model-based diagnosis and conformant planning. Computational Intelligence, 29(2):233–280, 2013.
N. Roos and C. Witteveen. Models and methods for plan diagnosis. Journal of AAMAS, 16:30–52, 2007.
R. van der Krogt and M. de Weerdt. Plan repair as an extension of planning. In Proc. of ICAPS'05, pages 284–259, 2005.


Timelines with Temporal Uncertainty∗

Alessandro Cimatti (1), Andrea Micheli (1,2), and Marco Roveri (1)
(1) Fondazione Bruno Kessler – Italy, (2) University of Trento – Italy
{cimatti,amicheli,roveri}@fbk.eu

∗ This is a presentation-only paper. It has been published in the Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, July 14-18, 2013, Bellevue, Washington, USA.

Abstract

Timelines are a formalism to model planning domains where the temporal aspects are predominant, and have been used in many real-world applications. Despite their practical success, a major limitation is the inability to model temporal uncertainty, i.e. the fact that the plan executor cannot decide the actual duration of some activities.

In this paper we make two key contributions. First, we propose a comprehensive, semantically well-founded framework that (conservatively) extends the state-of-the-art timeline approach with temporal uncertainty. Second, we focus on the problem of producing time-triggered plans that are robust with respect to temporal uncertainty, under a bounded horizon. In this setting, we present the first complete algorithm, and we show how it can be made practical by leveraging the power of Satisfiability Modulo Theories.

Introduction

Timelines are a comprehensive formalism to model planning domains where the temporal aspects are predominant. The framework builds on a quantitative extension of Allen's temporal operators (Allen 1983; Angelsmark and Jonsson 2000). For example, it is possible to state that a certain activity must last no longer than 5 seconds, and must be carried out during another activity. The key difference with respect to (Allen 1983) is in the fact that, with timelines, the number and type of activities is not known a priori: they are the result of unrolling the domain description over time (similar to the instantiation of operators into actions in classical planning).

Timelines have been used in many real-world applications. The research line pioneered by NASA, which resulted in the Europa planner (Barreiro et al. 2012), is based on a timeline framework. APSI is a timeline-based framework that has been developed for the European Space Agency (ESA) since 2008 (Donati et al. 2008; Cesta et al. 2009a). The framework is very expressive, and it has been used to describe real-world planning and scheduling domains and problems (Donati et al. 2008; Cesta et al. 2008), and as a core for several practical applications (Cesta et al. 2009b; 2010a; 2011).

Despite the practical success of the timeline approach, a major limitation is the inability to express temporal uncertainty. Temporal uncertainty is needed to model situations in which some activities have a duration that cannot be controlled by the plan executor. Such a phenomenon is pervasive in several application domains, including transportation, production planning, and aerospace. In fact, in 2010 ESA issued an Invitation to Tender aiming at the extension of the APSI timeline-based framework with uncertainty.

In this paper, we make the following contributions. First, we propose a comprehensive framework that (conservatively) extends the state-of-the-art timeline approach with temporal uncertainty. We provide a semantic foundation for the strong controllability problem, that is, the problem of producing time-triggered plans that are robust with respect to temporal uncertainty. In practice, this is useful to generate plans that are guaranteed to fulfill the goal under any possible behavior of the uncertain components of the domain.

Second, we present the first complete algorithm for timeline planning under temporal uncertainty. The approach is based on the logical encoding of the problem into the problem of satisfiability of a first-order formula with respect to a background theory. The approach is made practical by leveraging the power of Satisfiability Modulo Theories (Barrett et al. 2009) (SMT). In addition to the direct encoding, we propose a lazy algorithm that relies on the incremental use of the underlying SMT solver. We experimented on various problems, and the results confirm the potential of the approach.

This paper is structured as follows. We first present some background; we then model timelines with uncertainty and show how to encode the strong controllability problem into SMT. We then compare our approach with related work and experimentally evaluate it. Finally, we draw some conclusions and outline directions for future research.

Background

Allen's Algebra. Allen's algebra is a well-known formalism to reason about the temporal properties of a finite set of activities (Allen 1983). The algebra is defined by 13 operators, representing all the possible relations between a pair of intervals, by a transitivity table that allows for constraint propagation, and by an inverse function. The problem of checking the temporal consistency of a set of Allen constraints is known to be NP-hard (Allen 1983).

Allen's algebra has been extended in a number of works to express quantitative information (Angelsmark and Jonsson 2000; Cheng and Smith 1994; Drakengren and Jonsson 1997; Wetprasit and Sattar 1998; Cesta et al. 2009a). In this paper, we use the extension proposed in (Cesta et al. 2009a), where operators are annotated with intervals: for example, the expression "A contains [10,20] [2,5] B" states that the interval A contains the interval B, and the start of A precedes the start of B by no less than 10 and no more than 20 time units; similarly, the end of B precedes the end of A by no less than 2 and no more than 5 time units. The semantics of the operators is given in terms of structures describing the mutual relations between the start/end points of each interval. Clearly, this quantitative formalism subsumes the qualitative version: each "classical" Allen operator can be obtained by setting the quantitative intervals to $[0, \infty]$.

In our setting we consider time to be dense. Time points are interpreted over real values.

Satisfiability Modulo Theory. Given a first-order formula $\psi$ in a decidable background theory $T$, Satisfiability Modulo Theory (SMT) (Barrett et al. 2009) is the problem of deciding whether there exists a satisfying assignment to the free variables in $\psi$.

In this work we concentrate on the theory of Linear Arithmetic over the Real numbers (LRA). A formula in LRA is obtained from atoms by applying Boolean connectives (negation $\neg$, conjunction $\wedge$, disjunction $\vee$), and universal ($\forall$) and existential ($\exists$) quantification. Atoms are of the form $\sum_i a_i x_i \bowtie c$, where $\bowtie \in \{>, <, \leq, \geq, \neq, =\}$, every $x_i$ is a real variable, and every $a_i$ and $c$ is a real constant. We denote with QF_LRA the quantifier-free fragment of LRA.

As an example, consider the QF_LRA formula $(x \leq y) \wedge (x + 3 = z) \vee (z \geq y)$ with $x, y, z$ being real variables. In the theory of real arithmetic, numerical constants are interpreted as the corresponding real numbers, and $+, =, <, >, \leq, \geq$ as the corresponding operations and relations over $\mathbb{R}$. The formula is satisfiable, and a satisfying assignment is $\{x := 5, y := 6, z := 8\}$.
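For readers who want to reproduce the example, the check can be run with an off-the-shelf SMT solver; the snippet below uses the Z3 Python bindings as an arbitrary choice, not the solver used by the authors.

```python
from z3 import Reals, And, Or, Solver, sat

x, y, z = Reals('x y z')
# (x <= y) AND (x + 3 = z), OR (z >= y)
formula = Or(And(x <= y, x + 3 == z), z >= y)

s = Solver()
s.add(formula)
if s.check() == sat:
    print(s.model())   # prints some satisfying assignment, e.g. x = 5, y = 6, z = 8
```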

An SMT solver is a decision procedure which solves the satisfiability problem for a formula expressed in a decidable subset of First-Order Logic. Currently, the most efficient implementations of SMT solvers use the so-called "lazy approach", where a SAT solver is tightly integrated with a $T$-solver. See (Barrett et al. 2009) for a survey.

Several techniques have been developed for removing quantifiers from an LRA formula (e.g., Fourier-Motzkin (Schrijver 1998), Loos-Weispfenning (Loos and Weispfenning 1993; Monniaux 2008)): they transform an LRA formula into a QF_LRA formula that is logically equivalent modulo the LRA theory. These techniques enable the solution of quantified formulae at a cost that is doubly exponential in time and space in the original formula size (Schrijver 1998; Monniaux 2008; Loos and Weispfenning 1993).

[Figure 1: Running example. The Satellite timeline alternates Visible [10, 11] and Hidden [10, 12]; the Device timeline cycles through Idle [1, ∞], Send1 [5, 5] and Send2 [5, 5], with each Send activity required to occur DURING a visibility window.]

It is possible to reduce the consistency problem for quantitative Allen's algebra to SMT(QF_LRA). Intuitively, for each activity, two (start/end) real variables are introduced; each constraint over two activities is mapped to an SMT formula over the corresponding variables. For example, "A before B by at least 10 time units" is expressed as $B.start - A.end \geq 10$. The disjunction in SMT is essential to express "non-convex" Allen constraints.

In the following we will use these shorthands. Let $V$ be a finite set $\{v_1, v_2, \ldots, v_n\}$. We write $t \in V$ for the formula $t = v_1 \vee t = v_2 \vee \ldots \vee t = v_n$. Let $I$ be an interval $[l, h)$. We write $t \in I$ for the formula $t \geq l \wedge t < h$. We write $\mathcal{I}$ for the set of all possible intervals. Let $I_1 = [s_1, e_1)$ and $I_2 = [s_2, e_2)$ be two intervals; we write $I_1 \subseteq I_2$ if $s_1 \geq s_2$ and $e_1 \leq e_2$.
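As a small illustration of this reduction (again using the Z3 Python bindings as an assumed, interchangeable solver), two activities can be encoded by their start/end variables and the quantitative "before" constraint added directly as a QF_LRA atom:

```python
from z3 import Real, Solver, sat

# One (start, end) pair of real variables per activity
a_start, a_end = Real('A.start'), Real('A.end')
b_start, b_end = Real('B.start'), Real('B.end')

s = Solver()
s.add(a_start <= a_end, b_start <= b_end)      # well-formed intervals
s.add(b_start - a_end >= 10)                   # "A before B by at least 10 time units"
s.add(a_start >= 0)

if s.check() == sat:
    print(s.model())                           # one consistent schedule for A and B
```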

Timelines with Uncertainty

Example. Consider a communication device that can send data packets of two different types to a satellite during the time period in which the satellite is visible. The visibility window of the satellite is not controllable by the communication device and ranges between 10 and 11 hours, while the satellite remains hidden in the following 10-12 hours (also uncontrollably). The device needs 5 hours to send each packet of data, and a transmission has to happen during the visibility window. Notice that both the satellite and the device can be in each state more than once. The satellite is initially hidden, and the device is idle. The goal is to send one data packet per type. The situation is depicted in Figure 1.

Syntax. We introduce an abstract notation for timeline-based domain descriptions. We retain all the features of the concrete languages used in the applications. Intuitively, the timeline framework can be thought of as a "sequential version" of Allen's algebra, where the same activity can be instantiated multiple times. The instantiations are obtained by means of generators.

Definition 1. A generator $G$ is a tuple $(V, T, \delta)$ such that $V$ is a finite set of values, $T \subseteq V \times V$ is a transition relation, and $\delta: V \to \mathcal{I}$ is a temporal labeling function.

A generator represents a state variable over values $V$ in a timeline framework (without loss of generality, we disregard the parametrization used in some timeline languages). The transition relation $T$ is used to logically describe the evolution of the generator. If $T(v_i, v_j)$, then the end of (an instance of) activity $v_i$ can be followed by the start of (an instance of) activity $v_j$. In our satellite example, the system is composed of two generators: the Satellite and the Communicator. The Satellite generator has values $\{Visible, Hidden\}$; the transition relation imposes the alternation of $Visible$ and $Hidden$ values, and the $\delta$ function imposes the minimal and maximal duration of each value ($Visible \to [10, 11)$, $Hidden \to [10, 12)$). The Communicator generator is three-valued ($Idle$, $Send1$, $Send2$); the transition relation imposes the automaton shape depicted in Figure 1, and the duration constraints are $Idle \to [1, \infty)$, $Send1 \to [5, 5)$ and $Send2 \to [5, 5)$.
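The two generators of the running example can be written down as plain data, for instance as in the Python sketch below; here duration bounds are simplified to (low, high) pairs with inf for an open bound, and the transition set for the Communicator is our reading of Figure 1, so the exact shape is an assumption.

```python
from math import inf

# Generator = (values, transition relation, duration labeling delta)
satellite = (
    {"Visible", "Hidden"},
    {("Visible", "Hidden"), ("Hidden", "Visible")},      # strict alternation
    {"Visible": (10, 11), "Hidden": (10, 12)},           # duration bounds in hours
)

communicator = (
    {"Idle", "Send1", "Send2"},
    {("Idle", "Send1"), ("Idle", "Send2"),
     ("Send1", "Idle"), ("Send2", "Idle")},              # automaton shape of Figure 1
    {"Idle": (1, inf), "Send1": (5, 5), "Send2": (5, 5)},
)
```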

In order to express the constraints between different generators, we introduce the notion of synchronization.

Definition 2. Let $G_i = (V_i, T_i, \delta_i)$ be generators, with $i \in \{0, \ldots, n\}$. An n-ary synchronization $\sigma$ is a triple $((G_0, v_0), \{(G_1, v_1), \ldots, (G_n, v_n)\}, C)$ such that, for all $i \in \{0, \ldots, n\}$, $v_i \in V_i$, and $C$ is a set of Allen constraints of the form $v_h \bowtie v_k$, with $h, k \in \{0, \ldots, n\}$.

The synchronizations are based on Allen's temporal operators applied to generator values. The interpretation, however, is quite different from (Allen 1983). For example, "$Send1$ during $[0,\infty)$ $Visible$" means that every instance of $Send1$ occurs during some instance of $Visible$; similarly, "$Visible$ during$^{-1}$ $[0,\infty)$ $Send1$" means that during every visibility window some $Send1$ occurs. Therefore, the algebraic properties of (Allen 1983) are not retained here. In Figure 1, we indicated two synchronizations using dashed arrows. These synchronizations are used to require that the packet of data is sent during the visibility window.

A set of generators and a set of synchronizations are sufficient to define a planning domain. As for the planning problem, we do not distinguish between facts and goals: we just require an execution that exhibits a set of (temporally-extended and temporally constrained) facts.

Definition 3. Let $G = (V, T, \delta)$ be a generator. A unary fact is a tuple $(G, v, I_s, I_e)$, where $v \in V$ and $I_s, I_e \in \mathcal{I}$. Let $f_1$ and $f_2$ be two unary facts and let $\bowtie$ be a quantified Allen operator. A binary fact is a constraint of the form $f_1 \bowtie f_2$.

A unary fact prescribes the existence of a value $v$ in the execution of $G$ that starts during $I_s$ and ends during $I_e$. A binary fact is useful to impose constraints (e.g., precedence, containment) between the intervals of the corresponding unary facts. In our satellite example, we use two unary facts to force the initial condition of the system: $f_1 = (Satellite, Hidden, [0, 0], [0, \infty))$ forces the satellite to be in the $Hidden$ state at time 0. Similarly, $f_2 = (Communicator, Idle, [0, 0], [0, \infty))$ constrains the initial state of the communicator to be idle.

Similarly, to express the goals we introduce $g_1 = (Communicator, Send1, [0, \infty), [0, \infty))$ and $g_2 = (Communicator, Send2, [0, \infty), [0, \infty))$, which require the communicator to be eventually in the $Send1$ state and in the $Send2$ state. If we need to order the goals, prescribing that packet 1 must be sent before packet 2, we can impose a binary fact $g_1$ before$[0,\infty)$ $g_2$. Notice that the goals are temporally extended, i.e. they do not simply require reaching a final condition.

The above definitions characterize timelines in the classical sense. In order to deal with temporal uncertainty, we now introduce an annotation to distinguish controllable and uncontrollable elements.

Definition 4. A CU-annotation for a set of generators $\mathcal{G} = \{G_i = (V_i, T_i, \delta_i)\}$ is a function $\chi: \mathcal{G} \times \bigcup_i V_i \to \{C, U\} \times \{C, U\}$. A CU-annotation for a set of synchronizations $S$ is a function $\chi: S \to \{C, U\}$.

With a slight abuse of notation, we overload the $\chi$ function. The U flag identifies an uncontrollable element; therefore the flagged time instant is not under the control of the agent. Instead, the C flag identifies controllable elements. Consider again the running example. If we flag both the states of the satellite with $(U, U)$ and all the rest as controllable, we are modeling a situation in which the satellite visibility cannot be decided by the communicator; the only possible assumptions are the minimal and maximal durations.

We now define what a planning problem is.

Definition 5. Let $\mathcal{G}$ be a generator set, $\Sigma$ a set of synchronizations over the generators in $\mathcal{G}$, and $\mathcal{F}$ and $\mathcal{R}$ sets of unary and binary facts, respectively. Let $\chi$ be a CU-annotation. A timeline controllability problem $P$ is a tuple $(\mathcal{G}, \Sigma, \mathcal{F}, \mathcal{R}, \chi)$.

In this work the possible solutions are time-triggered plans, defined as follows.

Definition 6. A time-triggered plan is a (possibly infinite) sequence $(G_1, v_1, cmd_1, t_1); (G_2, v_2, cmd_2, t_2); \ldots$ where, for all $i \geq 1$, $v_i$ is a value for $G_i$, $cmd_i \in \{S, E\}$, and $t_i \leq t_{i+1}$.

Intuitively, at a specific time point, a time-triggered plan may specify one or more start/end commands to be executed on a specific generator and value. This definition is syntactic; the executability of a time-triggered plan is defined at the semantic level.

Semantics. In the following we assume that a timeline description is given. We provide an interpretation of timelines by means of streams, i.e. possibly infinite sequences of time-labeled activity instances.

Definition 7. Let $G = (V, T, \delta)$ be a generator. A stream $S$ for $G$ is a (possibly infinite) sequence $(v_1, d_1); (v_2, d_2); \ldots$ such that, for all $i \geq 1$, $v_i \in V$, $(v_i, v_{i+1}) \in T$, and $d_i \in \delta(v_i)$.

Given a stream $S$, we use the following notation: $Value(S, i) = v_i$; $StartTime(S, i) = \sum_{j=1}^{i-1} d_j$; $EndTime(S, i) = StartTime(S, i) + d_i$; $Interval(S, i) = (StartTime(S, i), EndTime(S, i))$.

We can now define the compatibility of a stream with the problem constraints.


Definition 8. Let $G_0, \ldots, G_n$ be generators, and let $\sigma = ((G_0, v_0), \{(G_1, v_1), \ldots, (G_n, v_n)\}, C)$ be a synchronization. For $0 \leq i \leq n$, let $S_i$ be a stream for $G_i$. $\{S_0, \ldots, S_n\}$ fulfills $\sigma$ iff for all $j_0$ such that $Value(S_0, j_0) = v_0$, there exist $j_1, \ldots, j_n$ such that for every constraint $(v_h \bowtie v_k) \in C$, $Interval(S_h, j_h) \bowtie Interval(S_k, j_k)$ holds.

Notice that, in general, n-ary synchronizations cannot be expressed in terms of binary synchronizations only. This is true only in the case where each Allen constraint involves one value from $G_0$ and one from another $G_i$. In the case of constraints between $G_i$ and $G_j$, with $i, j > 0$, a binding between the activities in $G_i$ and $G_j$ is introduced, but the binding is further constrained by $G_0$.

Definition 9. Let $G$ be a generator, and let $S$ be a stream for $G$. $S$ fulfills the unary fact $(G, v, I_s, I_e)$ at $i$ iff $Value(S, i) = v$, $StartTime(S, i) \in I_s$ and $EndTime(S, i) \in I_e$.

Definition 10. Let $f_1 \bowtie f_2$ be a binary fact, where $f_i \stackrel{\text{def}}{=} (G_i, v_i, I_{s_i}, I_{e_i})$. Let $S_1$ and $S_2$ be streams for $G_1$ and $G_2$, respectively. $S_1$ and $S_2$ fulfill $f_1 \bowtie f_2$ iff $S_1$ fulfills $f_1$ at $i_1$, $S_2$ fulfills $f_2$ at $i_2$, and $Interval(S_1, i_1) \bowtie Interval(S_2, i_2)$.

Definition 11. A time-triggered plan $(G_1, v_1, cmd_1, t_1); (G_2, v_2, cmd_2, t_2); \ldots$ induces a stream $S$ on $G = (V, T, \delta)$ iff for all $i \geq 1$, when $G = G_i$, there exists $j \geq 1$ such that (1) if $cmd_i = S$ then $StartTime(S, j) = t_i$, and (2) if $cmd_i = E$ then $EndTime(S, j) = t_i$.

Definition 12. A time-triggered plan $(G_1, v_1, cmd_1, t_1); (G_2, v_2, cmd_2, t_2); \ldots$ obeys a CU-annotation $\chi$ iff for each $i \geq 1$, (1) if $cmd_i = S$ then $\chi(G_i, v_i) \in \{(C, C), (C, U)\}$, and (2) if $cmd_i = E$ then $\chi(G_i, v_i) \in \{(C, C), (U, C)\}$.

Intuitively, this means that each assigned time point is labeled as controllable.

Definition 13. Let $\pi$ be a time-triggered plan, $\chi$ a CU-annotation and $\mathcal{G}$ the set of generators controlled by $\pi$. $\pi$ is complete with respect to $\chi$ iff for each $G \in \mathcal{G}$, for each stream $S = (v_1, d_1); (v_2, d_2); \ldots$ of $G$ induced by $\pi$ and for each $i$: (1) if $\chi(G, v_i) \in \{(C, C), (C, U)\}$ then $(G, v_i, S, StartTime(S, i)) \in \pi$; (2) if $\chi(G, v_i) \in \{(C, C), (U, C)\}$ then $(G, v_i, E, EndTime(S, i)) \in \pi$.

In other words, if $\pi$ is complete, each controllable time point of an induced stream $S$ is assigned by $\pi$.

Definition 14. Given the CU-annotation $\chi$, a stream $(v_1, d_1); (v_2, d_2); \ldots$ for generator $G = (V, T, \delta)$ is said to satisfy the contingencies of $G$ iff for each $i \geq 1$, $v_i \in V$, $(v_i, v_{i+1}) \in T$ and, if $\chi(G, v_i) \in \{(U, U), (U, C)\}$, then $d_i \in \delta(v_i)$.

In other words, a stream satisfies the contingencies of a generator if it is compatible with the generator constraints on the uncontrollable values.

Definition 15 (Solution to strong controllability problem). A time-triggered plan $\pi$ is a strong solution for $P = (\mathcal{G}, \Sigma, \mathcal{F}, \mathcal{R}, \chi)$ iff it obeys and is complete w.r.t. $\chi$, and all the streams induced by $\pi$ that are compatible with the contingencies of the generators in $\mathcal{G}$ and that fulfill the synchronizations labeled as U, also fulfill each generator, the rest of $\Sigma$, $\mathcal{F}$ and $\mathcal{R}$.

Intuitively, we are searching for a plan that constrains the execution in such a way that, for every possible evolution of the uncontrollable parts (fulfilling the assumed contingencies), all the problem constraints are satisfied.

In practice, we are interested in finding solutions to a strong controllability problem within a given temporal horizon $H$.

Definition 16 (Bounded solution to strong controllability problem). A finite time-triggered plan $\pi$ is a strong bounded solution for $P = (\mathcal{G}, \Sigma, \mathcal{F}, \mathcal{R}, \chi)$ for a time horizon $H \in \mathbb{R}^+$ iff the following conditions hold: (1) $\pi$ obeys and is complete w.r.t. $\chi$; (2) all the streams compatible with $\pi$ finish after $H$; (3) for each stream $S$ that is compatible with the contingencies of the generators in $\mathcal{G}$ and that satisfies the synchronizations labeled as U, the generator constraints, $\mathcal{F}$, $\mathcal{R}$, and the rest of $\Sigma$ are satisfied for every interval of $S$ that ends before $H$.

Note that we chose to impose no constraint on intervals that end after the horizon, but other semantics are possible. We highlight that searching for a time-triggered plan means searching for a fixed assignment of controllable decisions in time. For instance, in the satellite example it is possible to produce a time-triggered plan for sending each packet once, as shown in Figure 2. However, it is not possible to send more packets, because the uncertainty in the satellite compresses the guaranteed visibility window. Consider again Figure 2: the next guaranteed visibility window of the satellite would be $[58, 60)$, which is too short for sending another packet.

Bounded Encoding in FOL
We now reduce the problem of finding a solution for a bounded strong controllability problem to an SMT problem. Intuitively, we aim at finding a finite sequence of intervals that completely covers the time span between 0 and the horizon H and that fulfills all the problem constraints. Note that no synchronization constraints are imposed on intervals that end after the horizon bound. The underlying idea is to logically model a set of bounded streams and to impose the problem constraints on the streams. If the resulting formula is satisfiable, a model of the formula codifies a stream that witnesses a solution for the original problem.

Let H ∈ R+ be the horizon. A generator G = (V, T, δ) is associated with a maximum number of intervals (assuming each δ(v) > 0). A coarse upper bound M_G is obtained by dividing H by the minimal duration associated with any value in V:

$M_G = \left\lceil \frac{H}{\min_{v \in V} start(\delta(v))} \right\rceil$

We use two sets of variables for each generator G: ValueOf_G(j) and EndOf_G(j), whose interpretation defines the stream for G. ValueOf_G(j) gives the value of the j-th interval, while EndOf_G(j) encodes the end time point of the j-th interval. Thus, for each generator G = (V, T, δ), we can use M_G variables ValueOf_G(j) ranging over the domain V, and M_G variables EndOf_G(j) of type R+ to model a bounded stream that is guaranteed to cover the interval [0, H].


EndOf_G(j) defines the time points at which the stream changes its value. Unfortunately, whether a time point is controllable or not cannot be detected statically in general. In fact, depending on the discrete path encoded in the assignments to ValueOf_G(j), the j-th time point can be either controllable or uncontrollable. For this reason, we introduce M_G new variables, called U_G(j), that model the uncertain values (analogous to EndOf_G(j)). In order to properly capture the strong controllability of the execution, we consider EndOf_G(j−1) and ValueOf_G(j) as existentially quantified variables, and U_G(j) as universally quantified variables. We indicate with U_G the set of all the U_G(j) variables. In order to impose the proper constraints on either EndOf_G(j) or U_G(j), we have to condition the constraint on the controllability of the j-th interval, which is decided at solving time. Therefore we introduce two macros S_G(j, U_G) and E_G(j, U_G) that encapsulate this conditioning and return the proper value encoding the start or the end of the j-th interval, respectively. The first macro, S_G(j, U_G), is defined as ite(j = 0, 0, ite(χ(G, ValueOf_G(j)) ∈ {(C, C), (C, U)}, EndOf_G(j−1), U_G(j−1))). Similarly, E_G(j, U_G) is defined as ite(χ(G, ValueOf_G(j)) ∈ {(C, C), (U, C)}, EndOf_G(j), U_G(j)).
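To make the conditioning concrete, the following is a minimal sketch of how the two macros could be built as symbolic terms, using the Z3 Python API as a stand-in for the MathSAT-based implementation described in the Evaluation section. The function and parameter names, the indexing of intervals from 1 and the encoding of the CU-annotation as sets of generator values are our own illustrative assumptions.

```python
# Illustrative sketch (not the authors' tool): the S_G and E_G macros as
# symbolic terms built with the Z3 Python API.
from z3 import If, Or, BoolVal, RealVal

def S_G(j, value_of, end_of, u, start_ctrl_values):
    """Start of the j-th interval. start_ctrl_values lists the generator
    values v with chi(G, v) in {(C,C), (C,U)}, i.e. a controllable start."""
    if j == 1:  # assumption: intervals are indexed from 1, the first starts at 0
        return RealVal(0)
    is_ctrl = Or([value_of[j] == v for v in start_ctrl_values]) if start_ctrl_values else BoolVal(False)
    return If(is_ctrl, end_of[j - 1], u[j - 1])

def E_G(j, value_of, end_of, u, end_ctrl_values):
    """End of the j-th interval, conditioned on chi(G, v) in {(C,C), (U,C)}."""
    is_ctrl = Or([value_of[j] == v for v in end_ctrl_values]) if end_ctrl_values else BoolVal(False)
    return If(is_ctrl, end_of[j], u[j])
```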

Let Used_G(j) be the predicate defined as E_G(j, U_G) ≤ H. The encoding is defined as follows. For each generator G = (V, T, δ), we define

$Value_G \triangleq \bigwedge_{j=1}^{M_G} ValueOf_G(j) \in V$

to force the domain of ValueOf_G(j), and

$Trans_G \triangleq \bigwedge_{j=1}^{M_G-1} T(ValueOf_G(j), ValueOf_G(j+1))$

to codify the transition relation of G.

We split the constraints encoding the interval durations into two distinct formulae as follows.

$\Gamma_G(U_G) = \bigwedge_{j=1}^{M_G} \Big( \big( \chi(G, ValueOf_G(j)) \in \{(C,U),(U,U)\} \big) \rightarrow \big( E_G(j, U_G) - S_G(j, U_G) \in \delta(ValueOf_G(j)) \big) \Big)$

$\Psi_G(U_G) = \bigwedge_{j=1}^{M_G} \Big( \big( \chi(G, ValueOf_G(j)) \in \{(C,C),(U,C)\} \big) \rightarrow \big( E_G(j, U_G) - S_G(j, U_G) \in \delta(ValueOf_G(j)) \big) \Big)$

For every uncontrollable synchronization σ = ((G0, v0), {(G1, v1), ..., (Gn, vn)}, C) (i.e. χ(σ) = U), we define Γ_σ(U_{G_0}, ..., U_{G_n}) as follows.

$\bigwedge_{j_0=1}^{M_{G_0}} \Big[ \big( ValueOf_{G_0}(j_0) = v_0 \wedge Used_{G_0}(j_0) \big) \rightarrow \bigvee_{j_1=1}^{M_{G_1}} \Big( ValueOf_{G_1}(j_1) = v_1 \wedge Used_{G_1}(j_1) \wedge \dots \wedge \bigvee_{j_n=1}^{M_{G_n}} \big( ValueOf_{G_n}(j_n) = v_n \wedge Used_{G_n}(j_n) \wedge \bigwedge_{v_k \bowtie v_h \in C} \xi(\bowtie,\, S_{G_k}(j_k, U_{G_k}), E_{G_k}(j_k, U_{G_k}),\, S_{G_h}(j_h, U_{G_h}), E_{G_h}(j_h, U_{G_h})) \big) \Big) \Big]$

Here ξ(⋈, s1, e1, s2, e2) is the LRA encoding of the Allen constraint I1 ⋈ I2 with the interval Ii being (si, ei). We also define Ψ_σ(U_{G_0}, ..., U_{G_n}) in the very same way for each controllable synchronization (χ(σ) = C). The formula encoding unary facts is obtained by imposing the existence of a compatible interval in the considered stream.

Figure 2: An execution of the satellite example that fulfills the problem constraints. The two timelines (Satellite: Hidden/Visible; Device: Idle/Send1/Idle/Send2) are drawn over a time axis ranging from 0 to 40. The striped regions are uncertain: depending on the actual duration of the intervals, the satellite can be either in the Hidden or in the Visible state.

For each unary fact f = (G, v, I_s, I_e) we define Ψ_f(U_G) as $\bigvee_{j=1}^{M_G} Fact(U_G, j)$, where Fact(U_G, j) is $Used_G(j) \wedge (ValueOf_G(j) = v) \wedge S_G(j, U_G) \in I_s \wedge E_G(j, U_G) \in I_e$.

For every binary fact requirement r = f1 ⋈ f2, where fi = (Gi, vi, I_{s_i}, I_{e_i}), we define Ψ_r(U_{G_1}, U_{G_2}) as

$\bigvee_{j_1=1}^{M_{G_1}} \bigvee_{j_2=1}^{M_{G_2}} \big( Fact(U_{G_1}, j_1) \wedge Fact(U_{G_2}, j_2) \wedge \xi(\bowtie,\, S_{G_1}(j_1, U_{G_1}), E_{G_1}(j_1, U_{G_1}),\, S_{G_2}(j_2, U_{G_2}), E_{G_2}(j_2, U_{G_2})) \big)$

Finally, let Σ_u be the subset of Σ of uncontrollable synchronizations and let Σ_c be Σ \ Σ_u. The overall encoding for the problem is:

$\bigwedge_{G\in\mathcal{G}} Value_G \;\wedge\; \bigwedge_{G\in\mathcal{G}} Trans_G \;\wedge\; \forall U_{G_0},\dots,U_{G_n}.\Big[ \Big( \bigwedge_{G\in\mathcal{G}} \Gamma_G(U_G) \wedge \bigwedge_{\sigma\in\Sigma_u} \Gamma_\sigma(U_{G_0},\dots,U_{G_n}) \Big) \rightarrow \Big( \bigwedge_{G\in\mathcal{G}} \Psi_G(U_G) \wedge \bigwedge_{\sigma\in\Sigma_c} \Psi_\sigma(U_{G_0},\dots,U_{G_n}) \wedge \bigwedge_{f=(G,v,I_s,I_e)\in\mathcal{F}} \Psi_f(U_G) \wedge \bigwedge_{r=(G_1,v_1,I_{s_1},I_{e_1})\bowtie(G_2,v_2,I_{s_2},I_{e_2})\in\mathcal{R}} \Psi_r(U_{G_1}, U_{G_2}) \Big) \Big]$

The universal quantification captures the "universality" of the solution: for each possible allocation of the uncontrollables, given by U_{G_0}, ..., U_{G_n}, we impose that the contingent part of the problem implies the requirements. The encoding admits a model iff there exists a bounded solution to the original problem, and the model can be used to build a complete time-triggered plan for the original bounded strong controllability problem. This formula is a first-order quantification over a finite set of real variables. Therefore, it can be decided by an SMT(LRA) solver equipped with a quantifier elimination procedure.
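To give a concrete feel for the shape of such a query, the toy example below (our own illustration, not the encoding produced by the tool described in the next section) uses the Z3 Python API to ask for a controllable end point e such that, for every allocation of an uncontrollable end point u satisfying an assumed contingency, the requirement holds. All names and numeric bounds are hypothetical.

```python
# Toy strong-controllability query with the same Gamma -> Psi shape as above.
from z3 import Real, ForAll, Implies, And, Solver, sat

e = Real('e')        # controllable end point (existentially quantified)
u = Real('u')        # uncontrollable end point (universally quantified)
H = 60               # time horizon (assumed)

gamma = And(u - e >= 2, u - e <= 5)   # assumed uncontrollable duration in [2, 5]
psi = u <= H                          # requirement: finish within the horizon

s = Solver()
s.add(e >= 0, ForAll([u], Implies(gamma, psi)))
if s.check() == sat:
    print("strong bounded solution, e.g. e =", s.model()[e])
else:
    print("no strong solution within the horizon")
```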

Related Work
This work is most closely related to two research lines: timeline-based planning and temporal problems with uncertainty.

The literature on timeline-based planning is extensive, starting from the seminal work described in (Muscettola 1993), and including the APSI framework (Cesta et al. 2009a; Cesta, Fratini, and Pecora 2008; Cesta et al. 2009b), the EUROPA framework (Frank and Jonsson 2003) with its formalization in (Bernardini 2008), and (Verfaillie, Pralet, and Lemaître 2010). The key difference of our work is that we provide a full formal account of timelines with temporal uncertainty, while the frameworks mentioned above assume controllable durations of activities. The problem addressed in this paper, which requires universal quantification to model the effect of an "adversarial" environment, is significantly harder than the "consistency" problem, where quantification over (start and end) time points is only existential.


It is important to emphasize that the problem of finding flexible timelines (Cesta et al. 2010b; 2010a) is very different from the one solved here: timeline flexibility delegates the scheduling of the activities to the executor, but does not guarantee goal achievement in a temporally uncertain domain with uncontrollable durations. Finally, we mention that we use a dense-time interpretation: we represent time points as real variables, while APSI and EUROPA use integers.

There are various extensions of temporal problems with uncertainty, starting from (Vidal and Fargier 1999), to strong (Peintner, Venable, and Yorke-Smith 2007; Cimatti, Micheli, and Roveri 2012a), weak (Venable et al. 2010; Cimatti, Micheli, and Roveri 2012b), and dynamic controllability (Morris, Muscettola, and Vidal 2001). In temporal problems, the number of instances of activities is known a priori. This is a key difference with the work discussed here, where determining the right type and number of activities is part of the problem.

IxTeT (Ghallab and Laruelle 1994) is a temporal planning system that is able to deal with temporal uncertainty. Differently from our approach, IxTeT does not produce robust plans. The approach separates planning and scheduling, delegating to the plan executor the on-line solution of the dynamic controllability of a temporal problem with uncertainty.

For completeness, we also contrast our work with the (less related) work on planning for durative actions based on PDDL (Coles et al. 2012; 2009). The first difference is implicit in the two modeling paradigms: for example, timelines can naturally express temporally extended goals. More importantly, planning for durative actions (Coles et al. 2012; 2009) assumes that the duration of actions is controllable.

Evaluation
Implementation. The approach described in the previous sections was implemented in the first (sound and complete) decision procedure for timelines with uncertainty. We developed a tool chain that uses an APSI-like syntax for specifying the planning domain and problem, with an extension for the CU-annotation of generator states and synchronizations.

The implementation uses the state-of-the-art MathSAT SMT solver (Cimatti et al. 2012) as a backend, and a quantifier elimination procedure based on the Loos-Weispfenning method (Loos and Weispfenning 1993).

The tool implements the encoding described above, referred to in the following as "Monolithic". We also implemented an "Incremental" approach, which exploits the incrementality feature of the SMT solver2. The idea is to limit the number of intervals considered for a generator (M_G), thus resulting in a smaller and easier formula to decide. If the check returns a plan, the algorithm can terminate; otherwise more intervals are considered, until we reach M_G for each generator.

2 In an incremental setting, a solver instance can be queried for the satisfiability of a formula and then clauses can be pushed or popped to obtain another formula that can be decided, possibly recycling parts of the previous search.

Type    Problem      Monolithic Time(s)  Monolithic Memory(Mb)  Incremental Time(s)  Incremental Memory(Mb)
Sat     Satellite    6.87                111.5                  1.88                 31.9
Sat     Machinery1   TO                  TO                     360.15               611.5
Sat     Meeting      MO                  MO                     182.52               1897.0
Unsat   Satellite    7.17                126.2                  171.25               147.6
Unsat   Machinery2   104.86              253.7                  113.53               284.4
Unsat   Meeting      23.12               630.8                  105.17               776.9

Table 1: Experimental results.

This solution often avoids submitting to the solver the (bigger) formula corresponding to the whole problem.
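A possible shape of the incremental loop is sketched below with the Z3 Python API in place of MathSAT's incremental interface; the helper encode_up_to, the bound handling and the overall structure are illustrative assumptions, not the tool's actual code.

```python
# Sketch of the "Incremental" strategy: grow the per-generator bound on the
# number of intervals and reuse the solver between checks via push/pop.
from z3 import Solver, sat

def solve_incrementally(encode_up_to, max_bound):
    """encode_up_to(k) is an assumed helper returning the constraints that
    model at most k intervals per generator (including the quantified part)."""
    s = Solver()
    for k in range(1, max_bound + 1):
        s.push()                     # add the k-bounded part on top of the context
        s.add(encode_up_to(k))
        if s.check() == sat:
            return s.model()         # the model encodes a time-triggered plan
        s.pop()                      # retract it and retry with more intervals
    return None                      # unsatisfiable up to the coarse bound M_G
```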

Both approaches were optimized by applying a rewriting similar to the one described in (Cimatti, Micheli, and Roveri 2012a): the formula of the bounded encoding has the form $\forall \vec{x}.\ \Gamma(\vec{x}) \rightarrow \bigwedge_h \psi_h(\vec{x})$ and can be equivalently rewritten as $\bigwedge_h \forall \vec{x}.\ \Gamma(\vec{x}) \rightarrow \psi_h(\vec{x})$. This is often useful, since a large number of small quantifications may be preferable to a single monolithic one.

Experiments. No competitor approach is available for the problem that we address. For this reason, we compare the performance of the Monolithic approach with that of the Incremental one.

We considered problems from three different domains and ran several (solvable and unsolvable) instances with various time horizons. The results, reported in Table 1, show that the Incremental approach outperforms the Monolithic one on all the instances that have a solution. On unsolvable instances, the Monolithic approach is superior. This is because the Incremental version has to solve increasingly difficult problems and, in order to prove unsatisfiability, ends up solving the same problem the Monolithic approach solved in the first place. On satisfiable instances, instead, the Incremental algorithm can terminate early, without having to solve the big quantifications needed for the termination proof. For lack of space we omit the description of the domains and problems used. An archive containing the runnable tool, the tested instances and the relative explanations can be found at https://es.fbk.eu/people/amicheli/resources/aaai13.

Conclusions
We presented a formal foundation for planning with timelines under temporal uncertainty, where the duration of actions cannot be controlled by the executor. We proposed the first decision procedure able to produce time-triggered plans that satisfy the problem constraints regardless of temporal uncertainty, and implemented it leveraging the power of SMT solvers.

In the future, we plan to extend the framework to encompass resource consumption, to investigate more efficient quantifier elimination techniques and other forms of lazy encoding, and to address the production of conditional plans that are robust with respect to temporal uncertainty.


References
Allen, J. F. 1983. Maintaining knowledge about temporal intervals. Communications of the ACM 26(11):832–843.
Angelsmark, O., and Jonsson, P. 2000. Some observations on durations, scheduling and Allen's algebra. In CP, 484–488.
Barreiro, J.; Boyce, M.; Do, M.; Frank, J.; Iatauro, M.; Kichkaylo, T.; Morris, P.; Ong, J.; Remolina, E.; Smith, T.; and Smith, D. 2012. EUROPA: A Platform for AI Planning, Scheduling, Constraint Programming, and Optimization. In Proc. of the 4th International Competition on Knowledge Engineering for Planning and Scheduling (ICKEPS).
Barrett, C. W.; Sebastiani, R.; Seshia, S. A.; and Tinelli, C. 2009. Satisfiability modulo theories. In Handbook of Satisfiability. IOS Press. 825–885.
Bernardini, S. 2008. Constraint-based Temporal Planning: Issues in Domain Modelling and Search Control. Ph.D. Dissertation, Faculty of Computer Science, University of Trento (Italy).
Cesta, A.; Fratini, S.; Oddi, A.; and Pecora, F. 2008. APSI Case#1: Pre-planning Science Operations in Mars Express. In Proceedings of i-SAIRAS-08, the 9th International Symposium in Artificial Intelligence, Robotics and Automation in Space.
Cesta, A.; Cortellessa, G.; Fratini, S.; Oddi, A.; and Rasconi, R. 2009a. The APSI Framework: a Planning and Scheduling Software Development Environment. In Working Notes of the ICAPS-09 Application Showcase Program.
Cesta, A.; Cortellessa, G.; Fratini, S.; and Oddi, A. 2009b. Developing an end-to-end planning application from a timeline representation framework. In Haigh, K. Z., and Rychtyckyj, N., eds., IAAI. AAAI.
Cesta, A.; Finzi, A.; Fratini, S.; Orlandini, A.; and Tronci, E. 2010a. Analyzing flexible timeline-based plans. In Coelho, H.; Studer, R.; and Wooldridge, M., eds., ECAI, volume 215 of Frontiers in Artificial Intelligence and Applications, 471–476. IOS Press.
Cesta, A.; Finzi, A.; Fratini, S.; Orlandini, A.; and Tronci, E. 2010b. Validation and verification issues in a timeline-based planning system. Knowledge Eng. Review 25(3):299–318.
Cesta, A.; Cortellessa, G.; Fratini, S.; and Oddi, A. 2011. MrSPOCK - steps in developing an end-to-end space application. Computational Intelligence 27(1):83–102.
Cesta, A.; Fratini, S.; and Pecora, F. 2008. Planning with multiple-components in OMPS. In Nguyen, N. T.; Borzemski, L.; Grzech, A.; and Ali, M., eds., IEA/AIE, volume 5027 of Lecture Notes in Computer Science, 435–445. Springer.
Cheng, C.-C., and Smith, S. F. 1994. Generating feasible schedules under complex metric constraints. In Hayes-Roth, B., and Korf, R. E., eds., AAAI, 1086–1091. AAAI Press / The MIT Press.
Cimatti, A.; Griggio, A.; Sebastiani, R.; and Schaafsma, B. 2012. The MathSAT5 SMT solver. http://mathsat.fbk.eu.
Cimatti, A.; Micheli, A.; and Roveri, M. 2012a. Solving temporal problems using SMT: Strong controllability. In CP, 248–264.
Cimatti, A.; Micheli, A.; and Roveri, M. 2012b. Solving temporal problems using SMT: Weak controllability. In AAAI.
Coles, A.; Fox, M.; Halsey, K.; Long, D.; and Smith, A. 2009. Managing concurrency in temporal planning using planner-scheduler interaction. Artif. Intell. 173(1):1–44.
Coles, A. J.; Coles, A.; Fox, M.; and Long, D. 2012. COLIN: Planning with continuous linear numeric change. J. Artif. Intell. Res. (JAIR) 44:1–96.

Donati, A.; Policella, N.; Cesta, A.; Fratini, S.; Oddi, A.; Cortellessa, G.; Pecora, F.; Schulster, J.; Rabenau, E.; Niezette, M.; and Steel, R. 2008. Science Operations Pre-Planning & Optimization using AI constraint-resolution - the APSI Case Study 1. In Proceedings of SpaceOps-08, the 10th International Conference on Space Operations.
Drakengren, T., and Jonsson, P. 1997. Eight maximal tractable subclasses of Allen's algebra with metric time. J. Artif. Intell. Res. (JAIR) 7:25–45.
Frank, J., and Jonsson, A. 2003. Constraint-based Attribute and Interval Planning. Constraints 8(4):339–364.
Ghallab, M., and Laruelle, H. 1994. Representation and control in IxTeT, a temporal planner. In AIPS, 61–67.
Loos, R., and Weispfenning, V. 1993. Applying linear quantifier elimination. Computer Journal 36(5):450–462.
Monniaux, D. 2008. A Quantifier Elimination Algorithm for Linear Real Arithmetic. In Logic for Programming, Artificial Intelligence, and Reasoning - LPAR, 243–257.
Morris, P. H.; Muscettola, N.; and Vidal, T. 2001. Dynamic control of plans with temporal uncertainty. In International Joint Conference on Artificial Intelligence - IJCAI, 494–502.
Muscettola, N. 1993. HSTS: Integrating planning and scheduling. Technical report, DTIC Document.
Peintner, B.; Venable, K. B.; and Yorke-Smith, N. 2007. Strong controllability of disjunctive temporal problems with uncertainty. In Bessiere, C., ed., Principles and Practice of Constraint Programming - CP, volume 4741 of LNCS, 856–863. Springer.
Schrijver, A. 1998. Theory of Linear and Integer Programming. J. Wiley & Sons.
Venable, K. B.; Volpato, M.; Peintner, B.; and Yorke-Smith, N. 2010. Weak and dynamic controllability of temporal problems with disjunctions and uncertainty. In Workshop on Constraint Satisfaction Techniques for Planning & Scheduling, 50–59.
Verfaillie, G.; Pralet, C.; and Lemaître, M. 2010. How to model planning and scheduling problems using constraint networks on timelines. Knowledge Eng. Review 25(3):319–336.
Vidal, T., and Fargier, H. 1999. Handling contingency in temporal constraint networks: from consistency to controllabilities. Journal of Experimental and Theoretical Artificial Intelligence 11(1):23–45.
Wetprasit, R., and Sattar, A. 1998. Temporal reasoning with qualitative and quantitative information about points and durations. In Mostow, J., and Rich, C., eds., AAAI/IAAI, 656–663. AAAI Press / The MIT Press.



Can Planning meet Data Cleansing?

Roberto Boselli and Mirko Cesarini and Fabio Mercorio and Mario Mezzanzanica
Dept. of Statistics and Quantitative Methods - CRISP Research Centre, University of Milano-Bicocca, Milan, Italy

Abstract

One of the motivations for research in data quality is to automatically identify cleansing activities, namely a sequence of actions able to cleanse a dirty dataset, which today are often developed manually by domain experts. Here we explore the idea that AI Planning can contribute to identifying data inconsistencies and automatically fixing them. To this end, we formalise the concept of cost-optimal Universal Cleanser - an object summarising the best cleansing actions for each feasible data inconsistency - as a planning problem; then we present a motivating government application in which it has been used.
Keywords: Data Quality, Data Cleansing, Government Application

1 Introduction and Related Work
Today, most personal, business, and administrative data are collected, managed and distributed electronically, thanks to the wide diffusion of Information Systems (ISs). Most researchers agree that the quality of such data is frequently poor (Fisher et al. 2012) and, according to the "garbage in, garbage out" principle, dirty data may affect the effectiveness of decision-making processes. In such a scenario, the data cleansing (or cleaning) research area focuses on the identification of a set of domain-dependent activities able to cleanse a dirty database (w.r.t. quality requirements), which in industry are usually realised by means of business rules. However, the design of such business rules relies on the experience of domain experts; thus, there is no guarantee that the cleansing activities identified are suitable for the analysis purposes. Furthermore, exploring cleansing alternatives is a very time-consuming task: each business rule has to be developed and analysed separately, and the resulting solutions need to be compared and evaluated manually. In this regard, expressing data cleansing problems via planning may contribute to addressing the following open issues.
(i) Modelling the behaviour of longitudinal data. Usually longitudinal data (aka panel or historical data) extracted from ISs provide knowledge about a given subject, object or

Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

phenomena observed at multiple sampled time points. In this regard, planning languages like PDDL can help domain experts formalise how data should evolve according to an expected behaviour. Indeed, as argued by McDermott et al. (1998), "the PDDL language is intended to express the physics of a domain, that is, what predicates there are, what actions are possible, what the structure of compound actions is, and what the effects of actions are". In our context, a planning domain describes how data arriving from the external world - the database - may change the subject status1, while a planning instance is initialised with the subject's data. The goal of such a planning problem is to evaluate whether data evolve according to the domain model.
(ii) Expressing data quality requirements. Data quality is a domain-dependent concept, usually defined as "fitness for use"; thus, quality considered appropriate for one use may not be sufficient for another use. Here we shall focus on consistency, which refers to "the violation of semantic rules defined over (a set of) data items, where items can be tuples of relational tables or records in a file" (Batini and Scannapieco 2006). In reference to relational models, such "semantic rules" have usually been expressed through functional dependencies (FDs), conditional dependencies, join dependencies, and inclusion dependencies, which are useful for specifying integrity constraints. Indeed, as argued by Chomicki (1995), FDs are expressive enough to model static constraints, which evaluate the current state of the database, but they do not take into account the past, that is, how the current state of the database has evolved over time. Furthermore, even though FDs enable the detection of errors, they have limited usefulness since they fall short of acting as a guide in correcting them (Fan et al. 2010). Finally, FDs are only a fragment of first-order logic, and this motivates the usefulness of formal systems in databases, as studied by Vardi (1987). In this regard, planning formalisms are expressive enough to model complex temporal constraints, e.g. by using temporal logics to express goal conditions.

1 "Status" here is considered in terms of a value assignment to a set of finite-domain state variables.


A cleansing approach based on AI planning might allow domain experts to concentrate on what quality constraints need to be modelled rather than on how to verify them.

(iii) Automatic identification of cleaning activities. A gap between practice-oriented approaches and academic research contributions still exists in the data quality field. From an academic point of view, two very effective approaches based on FDs are database repair and consistent query answering (Chomicki and Marcinkowski 2005). The former aims at finding a repair, i.e. a database instance that satisfies the integrity constraints and minimally differs from the original one, while the latter tries to compute consistent query answers in response to a query, namely answers that are true in every repair of the given database, without fixing the source data. The main drawback of these approaches is that finding consistent answers to aggregate queries becomes NP-complete already with two (or more) FDs, as observed by Bertossi (2006). To mitigate this problem, a number of works have recently exploited heuristics to find a database repair (Yakout, Berti-Équille, and Elmagarmid 2013; Kolahi and Lakshmanan 2009). These approaches seem very promising, even though their effectiveness has not been evaluated on real-life domains.

From an industry perspective, a lot of off-the-shelf tools are available and well supported, but they often lack formality in addressing domain-independent problems, as is the case for several ETL tools2. In such tools a quite relevant amount of the data analysis and cleaning work still has to be done manually or by ad-hoc routines, which may be difficult to write and maintain (Rahm and Do 2000).

In this regard, planning can contribute to the synthesis of optimal cleansing activities (w.r.t. a domain-dependent objective function) by enabling domain experts to express complex quality requirements and effortlessly identify the cleansing actions best suited to a particular data quality context.

2 The Labour Market Dataset
The scenario we present focuses on the Italian labour market domain, studied by statisticians, economists and computer scientists at the CRISP Research Centre3. According to Italian law, every time an employer hires or dismisses an employee, or an employment contract is modified (e.g. from part-time to full-time, or from fixed-term to unlimited-term), a Compulsory Communication - an event - is sent to a job registry. The public administration has developed an ICT infrastructure (The Italian Ministry of Labour and Welfare 2012) generating an administrative archive useful for studying labour market dynamics (see, e.g., (Lovaglio and Mezzanzanica 2013)). Each mandatory communication is stored in a record which contains several relevant attributes. e_id and w_id are identifiers of the communication and of the person involved. e_date is the event occurrence date, whilst e_type describes the event type occurring in the worker's career. Event types are the start, cessation and extension of a working contract, and the conversion from one contract type to a different one. c_flag states whether the event is related to a full-time or a part-time contract, while c_type describes the contract type with respect to Italian law; here we consider limited (fixed-term) and unlimited (unlimited-term) contracts. Finally, empr_id uniquely identifies the employer involved. A communication represents an event arriving from the external world (ordered with respect to e_date and grouped by w_id), whilst a career is a longitudinal data sequence whose consistency has to be evaluated. To this end, the consistency semantics has been derived from the Italian labour law and from the domain knowledge as follows.
c1: an employee cannot have further contracts if a full-time contract is active;
c2: an employee cannot have more than K part-time contracts (signed by different employers); in our context we shall assume K = 2;
c3: an unlimited-term contract cannot be extended;
c4: a contract extension can change neither the contract type (c_type) nor the modality (c_flag); for instance, a part-time and fixed-term contract cannot be turned into a full-time contract by an extension;
c5: a conversion requires either the c_type or the c_flag to change (or both).

2 In the ETL approach (Extract, Transform and Load), data extracted from a source system pass through a sequence of transformations that analyse, manipulate and then cleanse the data before loading them into a data warehouse.

3 The Interuniversity Research Centre on Public Services. http://www.crisp-org.it/english

Table 1: Example of a worker career

e_date          e_type   c_flag  c_type     empr_id
1st May 2010    start    PT      limited    CompanyX
1st Nov 2010    convert  PT      unlimited  CompanyX
12th Jan 2012   convert  FT      unlimited  CompanyX
28th July 2013  start    PT      limited    CompanyY

For simplicity, we omit the description of some trivial constraints, e.g., an employee cannot have a cessation event for a company for which she/he does not work, an event cannot be recorded twice, etc.

To clarify the matter, let us consider the worker's career in Tab. 1. The worker started a limited-term part-time contract with CompanyX in May 2010. After six months the contract was converted to unlimited-term. Then, in January 2012 the worker converted the contract from part-time to full-time. Finally, in July 2013 a communication arrived from CompanyY reporting that the worker had started a new part-time contract, but no communication concerning the cessation of the previously active contract had ever been notified.


The last communication makes the career inconsistent with respect to the labour law, as it violates constraint c1. Being able to catch and fix such an inconsistency is quite important, as it may strongly affect statistical indicators based on "working days", and this makes the data cleansing process necessary to guarantee the believability of the overall decision-making process. Clearly, there are several alternatives that domain experts may define to fix an inconsistency (as shown in Tab. 2), many of which are inspired by common practice. Probably, in the example above the communication was lost, so it is reasonable to assume that the full-time contract was closed at some point between 1st January 2012 and 28th July 2013. However, one might argue that the missing communication might have been a conversion to part-time rather than a cessation of the previous contract and, in such a case, the career would not violate any constraint, as two part-time contracts are allowed. Although this last scenario seems unusual, a domain expert should consider such a hypothesis.
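As an illustration of how such a consistency check can be mechanised, the sketch below replays a career against constraints c1 and c2 only; the dictionary-based event encoding and the function name are our own assumptions, not the CRISP implementation.

```python
# Minimal sketch of checking c1 and c2 over a worker's career
# (events already ordered by e_date and grouped by w_id). Illustrative only.
K = 2  # maximum number of concurrent part-time contracts (constraint c2)

def check_career(events):
    """events: list of dicts with keys e_type, c_flag, c_type, empr_id."""
    active = {}                                   # empr_id -> (c_flag, c_type)
    for i, e in enumerate(events):
        if e["e_type"] == "start":
            if any(flag == "FT" for flag, _ in active.values()):
                return f"event {i}: violates c1 (a full-time contract is active)"
            part_time = sum(1 for flag, _ in active.values() if flag == "PT")
            if e["c_flag"] == "PT" and part_time >= K:
                return f"event {i}: violates c2 (more than {K} part-time contracts)"
            active[e["empr_id"]] = (e["c_flag"], e["c_type"])
        elif e["e_type"] == "cessation":
            active.pop(e["empr_id"], None)
        elif e["e_type"] == "convert":
            active[e["empr_id"]] = (e["c_flag"], e["c_type"])
    return "consistent"

career = [  # the career of Table 1
    {"e_type": "start",   "c_flag": "PT", "c_type": "limited",   "empr_id": "CompanyX"},
    {"e_type": "convert", "c_flag": "PT", "c_type": "unlimited", "empr_id": "CompanyX"},
    {"e_type": "convert", "c_flag": "FT", "c_type": "unlimited", "empr_id": "CompanyX"},
    {"e_type": "start",   "c_flag": "PT", "c_type": "limited",   "empr_id": "CompanyY"},
]
print(check_career(career))   # reports the c1 violation on the last event
```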

3 Data Cleansing as Planning
Modelling data cleansing as a planning problem can be used (i) to check whether data evolution follows an expected behaviour (w.r.t. quality requirements) and (ii) to support domain experts in the identification of all the cleansing alternatives, summarising those that minimise/maximise an indicator. To this aim, a planner is well suited to explore all the feasible actions able to cleanse the dataset and to select the best ones with respect to given, domain-dependent criteria. Notice that an IS recording longitudinal data can be seen as an event-driven system where a database record is an event modifying the system state, whereas an ordered set of records forms an event sequence. We can formalise this concept as follows.

Definition 1. Let R = (R1, ..., Rn) be a schema relation of a database. Then,
(i) an event e = (r1, ..., rm) is a record of the projection (R1, ..., Rm) over R with m ≤ n, s.t. r1 ∈ R1, ..., rm ∈ Rm;
(ii) let ∼ be a total order relation over events; an event sequence is a ∼-ordered sequence of events ε = e1, ..., en concerning the same object or subject.

In classical planning, a model describes how the system evolves in reaction to input actions. A planner fires actions to explore the domain dynamics, in search of an (optimal) path to the goal. Similarly, when dealing with longitudinal data, the system is represented by the object or subject we are observing, while an event represents an action able to modify the state of the system. Once a model describing the evolution of an event sequence has been defined, a planner works in two distinct and separate steps.

First, it simulates the execution of all the feasible (bounded) event sequences, summarising all the inconsistencies into an object, the so-called Universal Checker (UCK). This represents a topology of all the feasible inconsistencies that may affect a data source. Second, for each inconsistency classified in the UCK with a unique identifier - the error-code - the planner looks for the best correction by exploring all the cleansing alternatives. The result is a Universal Cleanser (UC) that enhances the UCK with a sequence of actions able to fix the inconsistency.

The first step can be easily accomplished by enabling a planner to continue the search when a goal (an inconsistency) has been found. Indeed, in this first phase, the goal of the planning problem is to identify an inconsistency, that is, a violation of one or more semantic rules. We formalise our planning problem on FSSs.

Definition 2. A Finite State System (FSS) S is a 4-tuple (S, I, A, F), where S is a finite set of states, I ⊆ S is a finite set of initial states, A is a finite set of actions and F : S × A → S is the transition function, i.e. F(s, a) = s′ iff the system can reach state s′ from state s via action a.

Then, a planning problem on an FSS is a triple PP = (S, G, T) where s0 ∈ S, G ⊆ S is the set of goal states, and T is the finite temporal horizon. A solution for PP is a path on the FSS (a plan) π = s0 a0 s1 a1 ... sn−1 an−1 sn where, ∀i = 0, ..., n−1, si ∈ Reach(S) is a state reachable from the initial ones, ai ∈ A is an action, F(si, ai) is defined, |π| ≤ T, and sn ∈ G ⊆ Reach(S).

Finally, let Π be the set of all plans; a Universal Checker C is a set of state-action pairs that, for each π ∈ Π, summarises all the pairs (sn−1, an−1) s.t. F(sn−1, an−1) ∈ G.

Second, for each pair (si, ai) ∈ C denoting an inconsistency, we construct a new planning problem which differs from the previous one as follows: (i) the new initial state is I = {si}, where si is the state before the inconsistent one, that is, F(si, ai) = si+1 where si+1 violates the rules, and (ii) the new goal is to "execute action ai". Intuitively, a corrective action sequence represents an alternative route leading the system from a state si to a state sj where the action ai can be applied (without violating the consistency rules). To this aim, in this phase the planner explores the search space and selects the best corrections according to a given policy.

Definition 3. Let PP = (S, G, T) be a planning problem and let C be a Universal Checker of PP. Then, a T-cleansing action sequence for the pair (si, ai) ∈ C is a non-empty sequence of actions εc = c0, ..., cn, with |εc| ≤ T, s.t. there exists a path πc = si c0 ... si+n cn sk ai sk+1 on S, where all the states si, ..., sk ∉ G whilst sk+1 ∈ G.

Finally, let C : S × A → R+ be a cost function; a cost-optimal cleansing sequence πc is a sequence s.t. for all other sequences π′c the following holds: C(πc) ≤ C(π′c).
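The search described in Definition 3 can be pictured as a bounded breadth-first exploration of the FSS. The sketch below is our own illustration (not the UPMurphi-based procedure used later): apply, enabled and consistent are assumed helpers standing for the transition function F, the actions applicable in a state and the semantic rules; the returned shortest sequence is cost-optimal only when the cost counts the number of interventions.

```python
# Bounded breadth-first search for a T-cleansing action sequence for an
# inconsistent pair (s_i, a_i). Illustrative only.
from collections import deque

def cleansing_sequence(s_i, a_i, apply, enabled, consistent, T):
    frontier = deque([(s_i, [])])
    visited = {s_i}
    while frontier:
        s, seq = frontier.popleft()
        if seq:                                   # Definition 3 requires a non-empty sequence
            s_next = apply(s, a_i)
            if s_next is not None and consistent(s_next):
                return seq                        # a_i is applicable again without violations
        if len(seq) >= T:
            continue
        for c in enabled(s):
            s2 = apply(s, c)
            if s2 is not None and consistent(s2) and s2 not in visited:
                visited.add(s2)
                frontier.append((s2, seq + [c]))
    return None                                   # no cleansing sequence within the bound T
```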

Table 2: Some corrective action sequences

State               employed [FT,Limited,CompanyX]
Inconsistent event  (start,PT,Limited,CompanyY)
Alternative 1       (cessation,FT,Limited,CompanyX)
Alternative 2       (conversion,PT,Limited,CompanyX)
Alternative 3       (conversion,PT,Unlimited,CompanyX)
Alternative 4       (conversion,FT,Unlimited,CompanyX) (cessation,FT,Unlimited,CompanyX)

Then, a UC is a collection of cleansing action sequences synthesised for each inconsistency identified.


Definition 4. Let C be a Universal Checker. A Universal Cleanser is a map K : Reach(S) × A → 2^A which assigns to each pair (si, ai) ∈ C a T-cleansing action sequence εc.

Figure 1: Overview of the cleansing process. The block diagram connects the planning domain, the planning problem, the planner, the source dataset, the Universal Cleanser and the cleansed dataset: consistent sequences are passed through, while inconsistent sequences lead to the generation of a new planning problem and to the application of the cleansing.

The UC is synthesised off-line and contains a single cost-optimal action sequence for each entry. Clearly, the cost function is domain-dependent and usually driven by the purposes of the analysis. To give a few examples, one could fix the data by minimising/maximising either the number of interventions or an indicator computed on the overall cleansed sequence. We remark that the UC generated is domain-dependent, as it can deal only with event sequences conforming to the model used during its generation. On the other hand, the UC is also data-independent, since it has been computed by taking into account all the feasible (bounded) event sequences, and this makes the UC able to cleanse any data source, as shown in Fig. 1.

Preliminary Results and Expected Outlook. In practice, we used the UPMurphi temporal planner (Della Penna et al. 2012; Della Penna et al. 2009) to synthesise a UC for our domain, exploiting the planning-as-model-checking paradigm, as shown in (Mezzanzanica et al. 2013; Boselli et al. 2013). The UC has been synthesised with no optimisation requirements; thus the result is an exhaustive repository of all the (bounded) feasible cleansing activities. The UC contains 342 different error-codes, i.e. all the possible 3-step (state, action) pairs leading to an inconsistent state of the model. We have performed a data consistency evaluation on an online dataset provided in (Mezzanzanica et al. 2013)4, composed of 1,248,814 anonymised mandatory communications describing the careers of 214,429 people observed over ten years. UPMurphi recognised 92,598 careers as inconsistent (43% of the total), assigning an error-code to each of them.

This kind of result is quite relevant for the domain experts at CRISP, as it provides a bird's-eye view of the inconsistency distribution affecting the source dataset. For instance, we discovered that about 30% of the total inconsistencies arose due to an extension, cessation or conversion event received while the worker was in the unemployed status, showing that the identification of cleansing activities for these careers is essential to guarantee the quality of the cleansed data.

4 Dataset available at http://goo.gl/zrbrR

As a further step, we intend to model the labour market domain in PDDL3 (which would enable the use of temporal logics) and to realise the process described in Fig. 1 by connecting a PDDL planner to our DBMS, so that the archive can be cleansed in real time.

References
[Batini and Scannapieco 2006] Batini, C., and Scannapieco, M. 2006. Data Quality: Concepts, Methodologies and Techniques. Data-Centric Systems and Applications. Springer.
[Bertossi 2006] Bertossi, L. 2006. Consistent query answering in databases. ACM Sigmod Record 35(2):68–76.
[Boselli et al. 2013] Boselli, R.; Cesarini, M.; Mercorio, F.; and Mezzanzanica, M. 2013. Inconsistency knowledge discovery for longitudinal data management: A model-based approach. In CHI-KDD, volume 7947 of LNCS, 183–194. Springer.
[Chomicki and Marcinkowski 2005] Chomicki, J., and Marcinkowski, J. 2005. On the computational complexity of minimal-change integrity maintenance in relational databases. In Inconsistency Tolerance. Springer. 119–150.
[Chomicki 1995] Chomicki, J. 1995. Efficient checking of temporal integrity constraints using bounded history encoding. ACM Transactions on Database Systems (TODS) 20(2):149–186.
[Della Penna et al. 2009] Della Penna, G.; Intrigila, B.; Magazzeni, D.; and Mercorio, F. 2009. UPMurphi: a tool for universal planning on PDDL+ problems. In ICAPS 2009, 106–113. AAAI Press.
[Della Penna et al. 2012] Della Penna, G.; Magazzeni, D.; and Mercorio, F. 2012. A universal planning system for hybrid domains. Applied Intelligence 36(4):932–959.
[Fan et al. 2010] Fan, W.; Li, J.; Ma, S.; Tang, N.; and Yu, W. 2010. Towards certain fixes with editing rules and master data. Proceedings of the VLDB Endowment 3(1-2):173–184.
[Fisher et al. 2012] Fisher, C.; Lauría, E.; Chengalur-Smith, S.; and Wang, R. 2012. Introduction to Information Quality.
[Kolahi and Lakshmanan 2009] Kolahi, S., and Lakshmanan, L. V. 2009. On approximating optimum repairs for functional dependency violations. In ICDT, 53–62. ACM.
[Lovaglio and Mezzanzanica 2013] Lovaglio, P. G., and Mezzanzanica, M. 2013. Classification of longitudinal career paths. Quality & Quantity 47(2):989–1008.
[McDermott et al. 1998] McDermott, D.; Ghallab, M.; Howe, A.; Knoblock, C.; Ram, A.; Veloso, M.; Weld, D.; and Wilkins, D. 1998. PDDL - the Planning Domain Definition Language.
[Mezzanzanica et al. 2013] Mezzanzanica, M.; Boselli, R.; Cesarini, M.; and Mercorio, F. 2013. Automatic synthesis of data cleansing activities. In DATA, 138–149. Scitepress.
[Rahm and Do 2000] Rahm, E., and Do, H. 2000. Data cleaning: Problems and current approaches. IEEE Data Engineering Bulletin 23(4):3–13.
[The Italian Ministry of Labour and Welfare 2012] The Italian Ministry of Labour and Welfare. 2012. Annual report about the CO system, available at www.cliclavoro.gov.it/Barometro-Del-Lavoro/Documents/Rapporto_CO/Executive_summary.pdf, last accessed: November 2013.
[Vardi 1987] Vardi, M. 1987. Fundamentals of dependency theory. Trends in Theoretical Computer Science 171–224.
[Yakout, Berti-Équille, and Elmagarmid 2013] Yakout, M.; Berti-Équille, L.; and Elmagarmid, A. K. 2013. Don't be scared: use scalable automatic repairing with maximal likelihood and bounded changes. In International Conference on Management of Data, 553–564. ACM.


Evaluating Plan Robustness in Presence of Numeric Fluents

Enrico Scala
Department of Computer Science - Università di Torino
Corso Svizzera 185, 10149 Torino, Italy

Abstract

This paper presents an ongoing research activity in the context of robust plan execution involving consumable and continuous resources modelled as numeric fluents. In particular, the document shows how it is possible to exploit the notion of numeric kernel for reasoning about the robustness of a plan, given a domain model and an initial state of the world. The contribution of the paper is a metric that can be exploited to evaluate how tolerant a plan can be considered to unexpected variations in the consumption of resources. Although preliminary, the metric defined in this paper could represent a good seed for future research.

Introduction
The execution of a plan in real-world environments can often be threatened by the occurrence of unexpected deviations from the nominal predicted trajectory. Thus, to avoid costly replanning, the plan should be robust enough to tolerate the turbulence that could arise during its execution.

For these reasons, as highlighted in (Fox, Howey, and Long 2006), it is crucial to have a methodology that helps in understanding when the plan to be executed is actually robust (or not) with respect to an execution model.

The problem has been faced in the context of plan schedule generation, where the resource envelope has been introduced (Muscettola 1993) and exploited in building proactively robust schedules (Policella et al. 2009). In these works, robustness is intended in terms of the flexibility of the underlying action schedule, while, in the context of temporal and numeric variations, the work of (Fox, Howey, and Long 2006; Fox and Long 2003) has proposed a practical testing strategy based on the theoretical framework of hybrid automata (Gupta, Henzinger, and Jagadeesan 1997).

Relying on the plan validity notion of numeric kernel presented in (Scala 2013), and in line with the PDDL 2.1 semantics (Fox and Long 2003), in this paper we introduce a new way of assessing the robustness of sequential numeric plans. Differently from approaches in the context of action schedules, but similarly to (Fox, Howey, and Long 2006), we are interested in measuring robustness for variations affecting all the resources taken into account. As an innovation, our approach does not rely on a testing phase. Rather, our metric relates the plan robustness to the boundaries of validity computed by the numeric kernels of a given plan. Let us recall that the set of numeric kernels of a plan (Scala 2013) defines the sufficient and necessary conditions that must be satisfied by each state of the execution so that the goal is reached via that plan. The metric we are going to propose gives an evaluation of the critical points that the plan may encounter during the execution. More precisely, this metric assesses how the trajectory relates to the boundaries defined by the numeric kernels. Differently from probabilistic approaches ((Coles 2012), (Beaudry, Kabanza, and Michaud 2010)), our metric does not require any prior knowledge of the distribution functions associated with the numeric variables of interest.

Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

The technique is suitable for those situations in which the plan validity depends on hard constraints expressed on resources (e.g., time, energy, money and so forth) modelled as numeric fluents. As a matter of fact, since the introduction of numeric fluents, goal and action preconditions can encompass conditions given in the form of inequalities (e.g., energy > 5). As we will see, differently from the classical paradigm, such constraints define large valid state spaces. Therefore, reasoning about these constraints cannot be reduced to a simple satisfaction test. Rather, the distance between the current plan and such constraints (and in particular those identified by kernels) could be beneficial for predicting whether the plan may fail.

Motivating Example
Let us imagine a problem instance of the ZenoTravel domain1. We have a passenger p1 that must be moved from l1 to l2 by the airplane a1. We assume that the plan solving this problem has been computed as follows: (board p1 a1) (fly a1 l1 l2) (debark p1 a1). Consider that, besides the classical propositional conditions, the goal also involves a condition on the time of arrival of p1, expressing that p1 must be in location l2 before 100 units of time.

1 The ZenoTravel domain is a well-known numeric domain, employed many times in the context of the International Planning Competition; for more information see http://www.plg.inf.uc3m.es/ipc2011-deterministic.


Assuming that the main steps of the execution are deterministic (e.g., the airplane will reach position l2, and the boarding and the debarking are not affected by uncertainty), it is quite evident that the critical aspect of this plan concerns the actual time of arrival of the airplane a1. As a matter of fact, an important aspect to take into account when considering whether this plan is robust is its tolerance to action delays. But how can we evaluate this aspect in general? How can we relate such information with other implicit constraints present in the plan (e.g., the fuel of the airplane) that are not explicitly mentioned in the goal set? From this intuition we came up with the idea of combining the notion of numeric kernels with the robustness of the plan.

Formal Framework
This section reports the reference planning formalism and then the notion of numeric kernel, which is key to the metric we are going to propose. We assume the reader is familiar with PDDL-like languages; for a thorough discussion see (Fox and Long 2003).

Basic Definitions
Definition 1 (World State) A world state is built upon a set F of propositions and a set X of numeric variables. Thus a state s is a pair <F(s), X(s)>, where F(s) is the set of atoms that are true in s (Closed World Assumption) and X(s) is an assignment in Q to each numeric variable in X.

Definition 2 (Numeric Action) Given F and X as defined above, a numeric action "a" is a pair <pre(a), eff(a)> where:
• "pre" is the applicability condition for "a"; it consists of:
  – a numeric part (prenum), i.e. a set of comparisons of the form {exp {<, ≤, =, ≥, >} exp'};
  – a propositional part (preprop), i.e. a set of propositions defined over F.
• "eff" is the effects set of a; it consists of:
  – a set of numeric operations (effnum) of the form {f, op, exp}, where f ∈ X is the numeric fluent affected by the operation, and op is one of {+=, -=, =};
  – an "add" and a "delete" list (eff+ and eff−), which respectively formulate the propositions produced and deleted after the action execution.

Here, exp and exp' are arithmetic expressions involving variables from X. An expression is recursively defined in terms of (i) a constant in Q, (ii) a numeric fluent, or (iii) an arithmetical combination among {+, *, /, -} of expressions2.

An action a is said to be applicable in a state s iff its propositional and numeric preconditions are satisfied in s, meaning that (i) preprop(a) ⊆ F(s) and (ii) prenum(a) is satisfied (in the arithmetical sense) by X(s).

2 For computational reasons, several numeric planners require such expressions to be linear (e.g. (Hoffmann 2003), (Coles et al. 2012)). While this condition is not necessary for the numeric kernel, our metric does require such a restriction, as we will see in the next section.

Given a state s and a numeric action a, the application of a in s, denoted by s[a], (deterministically) produces a new state s′ as follows. s′ is initialised to be s; each atom present in eff+(a) is added to F(s′) (if not already present); each atom present in eff−(a) is removed from F(s′); each numeric fluent f of a numeric operation {f, op, exp} is modified according to the exp and the op involved. The state resulting from a non-applicable action is undefined. An undefined world state does not satisfy any condition.
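A compact sketch of Definitions 1 and 2 together with the state-update rule just described is given below; it is our own illustration, and the representation choices (numeric preconditions and effect expressions as Python callables, the refuel example) are assumptions not made in the paper.

```python
# Sketch of a world state <F, X> and of numeric-action application (Defs. 1-2).
from dataclasses import dataclass

@dataclass
class Action:
    pre_prop: frozenset        # propositional preconditions over F
    pre_num: tuple             # numeric preconditions: callables X -> bool
    add: frozenset             # eff+
    delete: frozenset          # eff-
    num_eff: tuple             # numeric effects: (fluent, op, callable X -> value)

def applicable(a, F, X):
    return a.pre_prop <= F and all(c(X) for c in a.pre_num)

def apply(a, F, X):
    """Returns the successor state <F', X'>, or None if a is not applicable
    (an undefined state satisfies no condition)."""
    if not applicable(a, F, X):
        return None
    F2 = (F - a.delete) | a.add
    X2 = dict(X)
    for f, op, expr in a.num_eff:
        v = expr(X)                       # expressions evaluated on the old state
        X2[f] = X2[f] + v if op == '+=' else X2[f] - v if op == '-=' else v
    return F2, X2

# Hypothetical example: refuel adds 30 units of fuel if below capacity.
refuel = Action(frozenset({'at-airport'}), (lambda X: X['fuel'] < X['capacity'],),
                frozenset(), frozenset(), (('fuel', '+=', lambda X: 30),))
print(apply(refuel, frozenset({'at-airport'}), {'fuel': 10, 'capacity': 100}))
```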

Definition 3 (Numeric planning problem) A numeric planning problem Π is the tuple <s0, G, A> where s0 is the initial state, G3 is the goal condition and A is a set of ground actions. A solution for Π is a totally ordered set of actions of length n (a subset of A) such that the execution of these actions transforms the state s0 into a state sn where G is satisfied.

Given the formulation above, we allow access to a segment of the plan by subscripting the plan symbol. More precisely, πi→j with i < j identifies the sub-plan going from the i-th to the j-th action. Moreover, when the right bound is omitted, the length of the plan is assumed, i.e. πi ≡ πi→|π|. Finally, we denote by s[π] the state produced by executing π starting from s.

Definition 4 (Numeric i-th Plan Validity) Let s be a world state and G a set of goal conditions; the sub-plan πi is said to be i-th valid w.r.t. s and G iff s[πi] satisfies G.

As reported in (Scala 2013), given a solution for a planning problem, it is (always) possible to compute the sufficient and necessary conditions for a state to be valid for that solution and a given goal. The set of such conditions is called a numeric kernel.

Definition 5 (Numeric Kernel) Let π be a numeric plan for achieving G, and K a set of (propositional and numeric) conditions built over F and X. K is said to be a numeric kernel of π iff it represents a set of sufficient and necessary conditions for the achievement of the goal G starting from s via π. That is, given a state s, s[π] satisfies G iff snum satisfies Knum and sprop satisfies Kprop.

By considering each suffix of the plan π, i.e. π0 = {a0, ..., an−1}, π1 = {a1, ..., an−1}, π2 = {a2, ..., an−1}, ..., πn−1 = {an−1}, down to the empty plan πn = {}, it is possible to identify an ordered set of numeric kernels where the i-th element of the set is the numeric kernel of πi. It is worth noting that, by definition, the goal is a special kind of numeric kernel, namely the one for the empty sub-plan.

Therefore, given a plan of size n we can say that:
- s0[π0] satisfies G iff s0 satisfies K0
- s1[π1] satisfies G iff s1 satisfies K1
- ...
- Kn = G, corresponding to the kernel for the empty plan πn

3 G has the same form as the applicability conditions of a numeric action.


where the superscript indicates the "time" index of interest. The resulting set of kernels will be denoted by K, i.e. K = {K0, ..., Kn}.

In (Scala 2013), a regression method for the construction of the numeric kernel set of a plan is reported. Each numeric kernel is expressed in the same formalism used for expressing the set of action preconditions.

Measuring the numeric robustness of a plan via kernels

By analysing the numeric kernel formulation, we observed that, differently from the satisfaction of propositional atoms, the admissibility region defined by the numeric conditions does not determine a particular assignment for the involved numeric fluents. What is interesting when dealing with conditions in the form of inequalities is that, varying the assignment to the numeric fluents of a state, the condition may still be satisfied, even if, sometimes, in a very different way. For instance, given two states having power = 5 and power = 10, respectively, in both cases the condition power > 4 is satisfied. However, it is quite obvious that a state containing the second assignment would be preferable, as it is clearly more "robust".

The idea is hence to capture this aspect when assessing the robustness of the plan. Intuitively, the larger the distance from the modelled constraint, the lower the chance of violating (at execution time) the condition in the kernel.

To formalise this intuition for the general case, we consider a particular class of conditions, i.e. inequalities containing expressions given as weighted summations, in which the comparison operator can only be one of {<, >}. Each variable of the summation represents a numeric fluent4.

More formally, let X be the set of numeric fluents of interest; each condition c of a kernel K has the form $c : a_1 x_0 + a_2 x_1 + \dots + a_n x_{n-1} + a_0 \ \{<,>\}\ 0$, where the x_i are numeric fluents of the planning problem at hand and each a_i is a real value.

Afterwards, we consider the numeric part of the state s as a point in the space defined by X. Therefore, by exploiting the well-known perpendicular distance defined for Euclidean spaces, we can compute the distance from s to c as the distance between the point whose coordinates are defined by the numeric part of s and the hyperplane defined by c5. That is:

$d\big((x'_0, x'_1, \dots, x'_{n-1}), c\big) = \frac{|a_1 x'_0 + a_2 x'_1 + \dots + a_n x'_{n-1} + a_0|}{\sqrt{a_1^2 + a_2^2 + \dots + a_n^2}} \quad (1)$

Having defined the distance from a state to a specific condition, we need to model all those situations where the state must be compared with more than one condition. To this purpose, we start from the consideration that the robustness metric should assign a maximum score (possibly infinite) to the situation in which there is no constraint at all, and a value close to 0 to cases where at least one constraint is on the verge of being violated. Note that, since we assume that such conditions are satisfied by s, our distance will never be 0.

4 This kind of representation is possible under the condition of restricting the language to linear expressions (Hoffmann 2003).

5 Note that, if we also consider non-admissible states, the distance represents a measure of the effort needed to satisfy such a constraint.

So, given a state s and a kernel K, the risk for s to become invalid is given by6:

$Risk(s, K) = \sum_{c \in K} w_c\, \frac{1}{d(s, c)} \quad (2)$

Each element of this summation takes the value 0 when the distance is infinite; this happens when the condition is not expressed in that kernel. Conversely, it takes high values when the constraint is very tight.

Given the risk as defined above, the robustness of a given plan starting from the initial state s0 is given by

$Robustness(s_0, \pi) = \frac{1}{Risk(s_0, K^0)} \quad (3)$

Therefore, when the risk is equal to 0, the robustness takes an infinite value, meaning that the plan is not threatened by any condition. On the opposite end, when the risk is very high, the robustness of the plan tends to 0, meaning that the execution could easily violate the conditions on the resources.
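Equations (1)-(3) can be prototyped in a few lines under the linearity restriction; the encoding of a condition as a (coefficients, constant, weight) triple and the uniform weights in the example are illustrative assumptions.

```python
# Prototype of equations (1)-(3). A condition a1*x1 + ... + an*xn + a0 {<,>} 0
# is encoded as (coeffs, a0, weight); weights are an illustrative choice.
import math

def distance(x, coeffs, a0):
    """Perpendicular distance of the point x from the hyperplane sum(ai*xi) + a0 = 0."""
    return abs(sum(a * v for a, v in zip(coeffs, x)) + a0) / math.sqrt(sum(a * a for a in coeffs))

def risk(x, kernel):
    """kernel: list of (coeffs, a0, weight) conditions, assumed satisfied by x."""
    return sum(w / distance(x, coeffs, a0) for coeffs, a0, w in kernel)

def robustness(x0, kernel_K0):
    r = risk(x0, kernel_K0)
    return math.inf if r == 0 else 1.0 / r

# A state with power = 10 against the single condition power - 4 > 0:
# distance 6, risk 1/6, robustness about 6.
print(robustness([10.0], [([1.0], -4.0, 1.0)]))
```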

This measure gives us the robustness of the plan for a par-ticular state of the system; however, let us remember that thenumeric kernel compresses all the conditions throughout theplan toward the state of interest (Scala 2013). For this rea-son, also all the services to forthcoming actions are capturedin evaluating the robustness.

On the other hand, to understand possible weaknesses also in the rest of the plan, we can repeat this measurement across all the actions the plan consists of. By exploiting these further pieces of information, for instance in a mixed-initiative scenario, the user can change actions, modify action parameters, or choose to run the planner for alternative solutions. For instance, if the fly action in our domain is preceded by a state that scarcely satisfies the kernel condition regarding the power, one can decide to add a refuel action just before the fly action execution. Also the idea of creating further execution branches matching particular conditions could be interesting (Coles 2012).

Another application could be in a continual planning strategy (Brenner and Nebel 2009). We can in fact execute the plan and exploit the robustness estimation to trigger replanning when such an evaluation crosses a given safety threshold. In fact, thanks to the very compact nature of the kernel formulation, the process can be performed very efficiently; it is not necessary to simulate the whole plan execution, since the evaluation of the robustness depends only on the current (observed) state of the world.
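A minimal sketch of such a monitoring loop, under our own assumptions (one kernel per plan step, and caller-supplied callables for observation, action dispatch, and replanning); it only illustrates the triggering idea and is not the authors' implementation:

```python
def monitor_and_replan(plan, kernels, observe_state, dispatch, replan, threshold):
    """Continual-planning sketch: before dispatching each action, re-evaluate the
    robustness of the remaining plan on the observed numeric state and trigger
    replanning if it falls below a safety threshold."""
    for step, action in enumerate(plan):
        state = observe_state()                    # current (observed) numeric state
        if robustness(state, kernels[step]) < threshold:
            return replan(state)                   # too risky: compute a new plan
        dispatch(action)                           # execute the next action
    return plan
```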

⁶ Here, we consider a weighted summation for modelling the different contributions of the given conditions. This is a domain-dependent characteristic that relies on the risk that each condition originates. As future work, we aim at understanding whether it is possible to derive this value autonomously by considering the model of the actions, or by analyzing the way in which the search space is explored.




Motivating example - Evaluating the Robustness

Given the plan reported in the first part of this paper, for each state of the execution of our example, we obtained the following robustness values: (79.13), (79.13), (92).

It is quite evident that the most critical situation in the plan is represented by the states prior to the execution of the fly action. Although this fact can seem quite natural to a human user, just looking at the preconditions of the fly action does not suffice. Instead, the kernel notion also captures the time information; in fact, when the kernel is generated, the condition is regressed toward the kernel supporting the suffix of the plan (fly - debark). Hence, since the robustness considers all the conditions of a kernel, the weakness at that point is larger, since the state satisfying such a kernel has to take into account not only the constraint on the time, but also the constraint on the available fuel.

Conclusion

This paper has presented a preliminary study on how it is possible to exploit the notion of numeric kernel for reasoning on the robustness of a plan when dealing with numeric fluents. The notion of numeric fluents is very important in the context of real-world planning because, since their introduction, it has been possible to compute plans that conform not only to propositional conditions, but to particular resource profiles as well.

However, while a large amount of work has addressed the plan generation problem ((Hoffmann 2003), (Coles et al. 2012), (Gerevini, Saetti, and Serina 2008)), less attention has been devoted to the problem of managing, revising and analyzing such plans after generation.

In this paper we propose a new robustness metric for analyzing a numeric plan. Differently from the work of (Fox, Howey, and Long 2006), the proposed metric does not require a testing phase; rather, it evaluates the goodness of a solution w.r.t. the distance between the plan and the constraints defined by the kernels. The metric is still preliminary and needs a validation phase, so we are working on testing the robustness metric reported in this paper against various domains from the planning competition. Of course, a thorough comparison with the strategy reported in (Fox, Howey, and Long 2006), and with approaches based on a probabilistic model of execution ((Domshlak and Hoffmann 2007), (Beaudry, Kabanza, and Michaud 2010)), is also necessary.

From an applicative point of view, as anticipated in the paper, we believe the metric can be interesting in a mixed-initiative framework and also in the context of continual planning, to signal potentially risky situations.

References

Beaudry, E.; Kabanza, F.; and Michaud, F. 2010. Planning with concurrency under resources and time uncertainty. In Proceedings of ECAI 2010: 19th European Conference on Artificial Intelligence, 217-222. Amsterdam, The Netherlands: IOS Press.
Brenner, M., and Nebel, B. 2009. Continual planning and acting in dynamic multiagent environments. Journal of Autonomous Agents and Multiagent Systems 19(3):297-331.
Coles, A. J.; Coles, A.; Fox, M.; and Long, D. 2012. COLIN: Planning with continuous linear numeric change. Journal of Artificial Intelligence Research (JAIR) 44:1-96.
Coles, A. J. 2012. Opportunistic branched plans to maximise utility in the presence of resource uncertainty. In ECAI, 252-257.
Domshlak, C., and Hoffmann, J. 2007. Probabilistic planning via heuristic forward search and weighted model counting. Journal of Artificial Intelligence Research (JAIR) 30:565-620.
Fox, M., and Long, D. 2003. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Journal of Artificial Intelligence Research (JAIR) 20:61-124.
Fox, M.; Howey, R.; and Long, D. 2006. Exploration of the robustness of plans. In AAAI, 834-839.
Gerevini, A.; Saetti, A.; and Serina, I. 2008. An approach to efficient planning with numerical fluents and multi-criteria plan quality. Artificial Intelligence 172(8-9):899-944.
Gupta, V.; Henzinger, T.; and Jagadeesan, R. 1997. Robust timed automata. In Maler, O., ed., Hybrid and Real-Time Systems, volume 1201 of Lecture Notes in Computer Science. Springer Berlin Heidelberg. 331-345.
Hoffmann, J. 2003. The Metric-FF planning system: Translating "ignoring delete lists" to numeric state variables. Journal of Artificial Intelligence Research (JAIR) 20:291-341.
Muscettola, N. 1993. HSTS: Integrating planning and scheduling. Technical Report CMU-RI-TR-93-05, Robotics Institute, Pittsburgh, PA.
Policella, N.; Cesta, A.; Oddi, A.; and Smith, S. 2009. Solve-and-robustify. Journal of Scheduling 12:299-314.
Scala, E. 2013. Numeric kernel for reasoning about plans involving numeric fluents. In M. B. et al., ed., AI*IA 2013, LNAI 8249. Springer International Publishing Switzerland.



On the Plan-library Maintenance Problem in a Case-based Planner

Alfonso E. Gerevini† and Anna Roubíčková‡ and Alessandro Saetti† and Ivan Serina†

†Dept. of Information Engineering, University of Brescia, Brescia, Italy
‡Faculty of Computer Science, Free University of Bozen-Bolzano, Bolzano, Italy

{gerevini, saetti, serina}@ing.unibs.it, [email protected]

Abstract

Case-based planning is an approach to planning where previous planning experience stored in a case base provides guidance to solving new problems. Such guidance can be extremely useful when the new problem is very hard to solve, or when the stored previous experience is highly valuable (because, e.g., it was provided and/or validated by human experts) and the system should try to reuse it as much as possible. However, as is known in general case-based reasoning, the case base needs to be maintained at a manageable size, in order to prevent the computational cost of querying it from growing excessively, which would make the entire approach ineffective. We formally define the problem of case base maintenance for planning, discuss which criteria should drive a successful policy to maintain the case base, introduce some policies optimizing different criteria, and experimentally analyze their behavior by evaluating their effectiveness and performance.

Introduction

It is well known that AI planning is a computationally very hard problem (Ghallab, Nau, and Traverso 2004). In order to address it, over the last two decades several syntactical and structural restrictions that guarantee better computational properties have been identified (e.g., (Backstrom et al. 2012; Backstrom and Nebel 1996)), and various algorithms and heuristics have been developed (e.g., (Gerevini, Saetti, and Serina 2003; Richter and Westphal 2010)). Another complementary approach, which usually gives better computational performance, attempts to build planning systems that can exploit additional knowledge not provided in the classical planning domain model. This knowledge is encoded as, e.g., domain-dependent heuristics, hierarchical task networks and temporal logic formulae controlling the search, or it can be automatically derived from the experiences of the planning system in different forms.

Case-based planning (e.g., (Gerevini, Saetti, and Serina 2012; Munoz-Avila 2001; Serina 2010; Spalazzi 2001)) follows this second approach and concerns techniques that improve the overall performance of the planning system by reusing its previous experiences (or "cases"), provided that the system frequently encounters problems similar to those already solved and that similar problems have similar solutions. If these assumptions are fulfilled, a well-designed

This work has already been published at the 21st International Conference on Case-Based Reasoning (ICCBR 2013).

case-based planner gradually creates a plan library that allows more problems to be solved (or higher quality solutions to be generated) compared to using a classical domain-independent planner. Such a library is a central component of a case-based planning system, which needs a policy to maintain the quality of the library as high as possible in order to be efficient. Even though this problem has been studied in the context of case-based reasoning, comparable work in the planning context is still missing.

In this paper, we define the assumptions underlying the case-based methodology in the context of planning, which characterize the typical distribution of the cases in a plan library. Then we formalize the problem of maintaining the plan library, we introduce criteria for evaluating its quality, and we propose different policies for maintaining it. Such policies are experimentally evaluated and compared using a recent case-based planner, OAKplan (Serina 2010), and considering some benchmark domains from the International Planning Competitions (Koenig 2013).

Preliminaries

In this section, we give the essential background and notation of (classical) planning problems (Ghallab, Nau, and Traverso 2004), as well as some basic concepts in case-based reasoning and planning.

A planning problem is a tuple Π = ⟨F, I, G, A⟩ where: F is a finite set of ground atomic propositional formulae; I ⊆ F is the set of atoms that are true in the initial state; G ⊆ F is a set of literals over F defining the problem goals; A is a finite set of actions, where each a ∈ A is defined by a set pre(a) ⊆ F forming the preconditions of a, and a set eff(a) ⊆ F forming the effects of a.

A plan π for a planning problem Π is a partially ordered set of actions of Π. A plan π solves Π if the application of the actions in π according to their planned order transforms the initial state into a state Sg where the goals G of Π are true (G ⊆ Sg). Classical generative planning is concerned with finding a solution plan for a given planning problem.
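For concreteness, a minimal Python sketch of these definitions, restricted to totally ordered plans and using the usual STRIPS add/delete split for effects (slightly more detailed than the eff(a) ⊆ F notation above); the class and function names are ours:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]       # preconditions pre(a) ⊆ F
    add: FrozenSet[str]       # positive effects
    delete: FrozenSet[str]    # negative effects

@dataclass(frozen=True)
class Problem:
    init: FrozenSet[str]      # I: atoms true in the initial state
    goals: FrozenSet[str]     # G: goal atoms
    actions: Tuple[Action, ...]

def solves(plan: List[Action], problem: Problem) -> bool:
    """Check whether a totally ordered plan maps I into a state containing G."""
    state = set(problem.init)
    for a in plan:
        if not a.pre <= state:                 # action not applicable
            return False
        state = (state - a.delete) | a.add     # apply effects
    return problem.goals <= state
```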

Case-based planning (CBP) is a type of case-based reasoning, exploiting the use of different forms of planning experiences concerning problems previously solved and organized in cases forming a case base or plan library. The search for a solution plan can be guided by the stored information about previously generated plans in the case base, which may be adapted to become a solution for the new problem.



When a CBP system solves a new problem, a new case is generated and possibly added to the library for potential reuse in the future. In order to benefit from remembering and reusing past plans, a CBP system needs efficient methods for retrieving analogous cases and for adapting retrieved plans, as well as a case base of sufficient size and coverage to yield useful analogues.

In our work, we focus on planning cases that are planner-independent and consist of a planning problem Π and a solution plan π of Π. A plan library is a set of cases {⟨Πi, πi⟩ | i ∈ {1, . . . , N}}, constituting the experience of the planner using this library. In our approach, the relevant information of each library case is encoded using a graph-based representation called planning problem graph (for a detailed description of this representation see (Serina 2010)).

A case-based planner follows a sequence of steps typical in CBR (Aamodt and Plaza 1994):
• Retrieve - querying the library to identify cases suitable for reuse and selecting the best one(s) of these;
• Reuse - adapting such cases to solve the new problem;
• Revise - testing the validity of the adapted plan in the new context by, e.g., a (simulated) execution of the plan, repairing it in case of failures;
• Retain - possibly storing the new problem and the corresponding solution plan into the library,
where the first three steps compose the adaptation phase and the fourth one the maintenance phase. The general CBR schema may differ depending on various implementation choices, e.g., the retrieval phase may provide one or more cases to be reused, or the reuse step may discard the proposed cases for insufficient quality and generate a solution from scratch, i.e., behave like a classical generative planner.

Related Work

The topic of case base maintenance has been of great interest in the case-based reasoning community for the last two decades. However, researchers studying case-based planning have not paid much attention to the problem of case base maintenance yet. Therefore, the related work falls mostly in the field of CBR, where most of the proposed systems handle classification problems.

(Leake and Wilson 1998) defined the problem of case-base maintenance as "an implementation of a policy to revise the case base to facilitate future reasoning for a particular set of performance objectives". Depending on the evaluation criteria, they distinguish two types of CB maintenance techniques: the quantitative criteria (e.g., time) lead to performance-driven policies, while the qualitative criteria (e.g., coverage) lead to competence-driven policies.

The quantitative criteria are usually easier to compute; among these policies are the very simple random deletion policy (Markovitch, Scott, and Porter 1993) and a policy driven by a case utility metric (Minton 1990), where the utility of a case is increased by its frequent reuse and decreased by the costs associated with its maintenance and matching.

The most used qualitative criterion corresponds to the notion of "competence" introduced by (Smyth 1998). Intuitively, the elements are removed from the case base in reverse order w.r.t. their importance, where the importance of a case is determined by the case "coverage" and "reachability". The two notions capture how many problems the case solves and how many cases it is solved by. Note, however, that differently from our approach to CBP, in his work Smyth considers systems without an underlying generic generative solver. With such a solver the case-based system can solve any problem independently of the quality of the case base, and the notion of competence needs to be reconsidered.

The notion of competence was also used to define the footprint deletion and footprint-utility deletion policies (Smyth and Keane 1998). Another extension is the RC-CNN algorithm (Smyth and McKenna 1999), which compresses the case base using the condensed-nearest-neighbor algorithm and an ordering derived from the relative coverages of the cases. Furthermore, (Leake and Wilson 2000) suggested replacing the relative coverage by the relative performance of a case. (Zhu and Yang 1998), however, claim that the competence-driven policies of Smyth and his collaborators do not ensure competence preservation. They propose a case addition policy which mimics a greedy algorithm for set covering, always adding the case that has the biggest coverage until the whole original case base is covered or the size limit is reached. The policies we propose in this paper differ from the approach of (Zhu and Yang 1998) mainly in the condition guiding the selection of the cases to keep in the case base: Zhu and Yang evaluate the utility of each case based on the frequency of its reuse in comparison with the frequency with which its neighbors are reused. Moreover, their policy does not consider the quality with which the original case base is covered, whereas the weighted approach proposed here does. In a sense, we generalize the work of Zhu and Yang and adapt it for use in the context of planning.

Munoz-Avila studied the case retention problem in order to filter redundant cases (Munoz-Avila 2001), which is closely related to the problem studied here. However, the policy proposed in (Munoz-Avila 2001) is guided by the case reuse effort, called the benefit of the retrieval, required by a specific "derivational" case-based planner to solve the problem. Intuitively, a case c is kept only if there is no other case in the case base that could be easily adapted by the planner to solve the problem represented by c. In our approach, the decision of keeping a case is independent from the adaptation cost of the other cases. Moreover, the policy proposed in (Munoz-Avila 2001) can decide only about problems solved by the adopted derivational case-based planner, while the policies studied in this paper are independent from the planner used to generate the solutions of the cases.

Case Base Maintenance

The core idea of CBP is providing a complementary approach to traditional generative planning under some assumptions. Coming from the field of case-based reasoning, the world needs to be regular and problems need to recur. The regularity of the world requires that similar problems have similar solutions. Such an assumption obviously links together the similarities between problems and between solutions, which (among others) provides a guarantee that a retrieved case containing a problem similar to the new problem to solve will provide a good solution plan for the reuse.¹



The latter assumption (that problems recur) is meant to ensure that the case base contains a good reuse candidate to retrieve.

In addition to the assumptions on the problems and their solutions, there are assumptions coming from the design of the case-based methodology: a case base needs to represent as many of the various experiences the system has made as possible, while remaining of manageable size. The interplay between these two parameters has a significant impact on the observed performance of the case-based planner, because a too large case base requires a vast amount of time to be queried, whereas even a well-designed retrieval algorithm fails to provide a suitable case to the reuse procedure if such a case is not present in the case base.

These assumptions seem quite reasonable and not overly restrictive; however, they are not very formal. In the following, we propose some definitions that allow us to formalize the assumptions and to define a maintenance policy.

The Maintenance Problem

Case base maintenance is responsible for preserving and improving the quality of the case base. So far the planning community has focused its research on CBP mostly on the problems related to the reuse and retrieval steps. Retention is usually settled with one of the extremes: either maintaining everything or using a pre-built case base which is fixed during the lifetime of the system.

To design a procedure for the maintenance, we start by deciding which parameters of the case base define its quality, and so which criteria should guide the maintenance policy in determining which experiences to keep and which to discard. Obviously, an important criterion is the variety of problems the case base can address, which is also referred to as the case base competence (Smyth 1998), and its interplay with the size, or cardinality, of the case base.

However, the notion of competence used in CBR cannot be directly adapted to the planning context. Differently from CBR, where a case usually either can or cannot be adapted to solve a problem, the reuse procedure of planning systems can change any unfit part of the stored solution, or it can even disregard the whole stored solution and attempt to find a new solution from scratch. In modern case-based planners, any case can be used to solve any problem; the system will decide how much the new solution deviates from the stored one, how expensive the reuse is, and therefore how useful the stored solution is. Consequently, we define some criteria for guiding the maintenance.

There are two different kinds of maintenance policies: an additive policy, which considers inserting a new case into the case base when a new solution is found/provided, and a removal policy, which identifies cases that can be removed without decreasing the quality of the case base too much. Hence, we formalize the general maintenance problem as a two-decision problem.

Definition 1 (Case Base Maintenance Problem)
• Given a case base L = {ci | ci = ⟨Πi, πi⟩, i ∈ {1, . . . , n}}, decide for each i ∈ {1, . . . , n} whether the case ci should be removed from L.
• Given a new case c = ⟨Π, π⟩, c ∉ L, decide whether c should be added to L.

In our work, we focus on the removal maintenance, as we see it as the most critical one: in the absence of a policy to decide which elements to add, we can simply add every new case until the case base reaches a critical size, and then employ the removal maintenance policy to obtain a small case base of good coverage. The alternative approach, which adds "useful" cases until the case base reaches a critical size, may not perform that well, as, differently from the first approach, it needs to estimate the distribution of the future problems, whereas the first approach operates on known past data.

¹ During the retrieval, the system has no information about the solution it is looking for, and so it needs to decide solely based on the properties of the problem.

We start by considering when a case can address another problem, or rather how well it can do so. Intuitively, planning can be interpreted as a search problem in the space of plans (e.g., (Ghallab, Nau, and Traverso 2004)), where classical planners start from an empty plan, while case-based planners start from a plan retrieved from the plan library (see, e.g., LPG-Adapt (Fox et al. 2006)). We can define the distance between the stored solution and the solution of the new problem as the minimum number of actions that need to be added/removed in order to convert the stored plan into the new one. Let ℘ denote the space of plans for a given planning domain. Then, a distance function da : (℘ × ℘) → [0, 1] measures the distance of any two plans πi, πj ∈ ℘, where a greater distance indicates greater effort needed during the adaptation phase. However, computing such a function can be very hard as, in the worst case, it can be reduced to searching for the solution plan from an empty plan (i.e., to the classical planning problem), which is known to be PSPACE-hard (Backstrom and Nebel 1996). Therefore we define da(πi, πj) as the number of actions that are in πi and not in πj plus the number of actions that are in πj and not in πi, normalized over the total number of actions in πi and πj (Srivastava et al. 2007), that is,
$$d_a(\pi_i, \pi_j) = \frac{|\pi_i - \pi_j| + |\pi_j - \pi_i|}{|\pi_i| + |\pi_j|},$$
unless both plans are empty, in which case their distance is 0.

Clearly, if the case-based system needs to revert to an empty plan and search from there, the provided case is not considered useful. Hence, we can say that a case can be useful to solve a problem if the distance between the corresponding plans is not bigger than the distance from an empty plan. The distance from the empty plan estimates the effort the case-based system needs to spend in order to generate a solution from scratch, which is equivalent to the estimated work a generative planner requires in order to find a solution, as the reuse procedures often perform the same kind of search as generative planners do. Consequently, it is not worth trying to reuse plans more distant than the empty one.
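The plan distance da defined above admits a very small implementation; the following Python sketch treats plans as multisets of ground actions (a simplification of ours):

```python
from collections import Counter

def plan_distance(plan_i, plan_j):
    """d_a: symmetric difference of the plans' actions, normalized by the total
    number of actions; plans are sequences of (hashable) ground actions."""
    if not plan_i and not plan_j:
        return 0.0
    ci, cj = Counter(plan_i), Counter(plan_j)
    diff = sum((ci - cj).values()) + sum((cj - ci).values())
    return diff / (sum(ci.values()) + sum(cj.values()))

# plan_distance(["load", "fly", "unload"], ["load", "drive", "unload"]) -> 2/6 ≈ 0.33
```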

Definition 2 Given a finite value δ ∈ R, we say that a case ci = ⟨Πi, πi⟩ can be useful to solve problem Π, that is addressesδ(ci, Π), if there exists a solution plan π for Π such that da(πi, π) < δ.

Note that the definition of addressesδ(ci, Π) heavily relies on the distance between the solutions, and completely disregards the relation between the respective problems. However, the structural properties of the problems also play a considerable role, as the case retrieval step is based on the planning problem descriptions.



Therefore, we also use a distance function dr that is intended to reflect the similarity of the problems. Let P denote the space of problems in a given planning domain, Π ∈ P be a new problem, and Π′ ∈ P be a problem previously solved. Assuming that the matching between the objects of Π and Π′ has already been performed, and that I′ ∪ G ≠ ∅, a problem distance function dr : (P × P) → [0, 1] is defined as follows (Serina 2010):
$$d_r(\Pi', \Pi) = 1 - \frac{|I' \cap I| + |G' \cap G|}{|I'| + |G|},$$
where I and I′ (G and G′) are the initial states (sets of goals) of Π and Π′, respectively. If I′ ∪ G = ∅, dr(Π′, Π) = 0.

The smaller the distance dr between two problems is, the more similar they are; consequently, by the regular world assumption, they are more likely to have similar solutions, and so it is useful to retrieve from the case base the case of a problem that is most similar to the problem to solve. We can say that dr guides the retrieval phase while da estimates the plan adaptation effort. The maintenance policy should consider both distances, in order not to remove important cases, but also to support the retrieval process. Therefore, we combine the two functions, obtaining a distance function d : ((P × ℘) × (P × ℘)) → [0, 1] measuring the distance between cases. The combination of dr and da allows us to assign different importance to the similarity of problems and their solutions, depending on the application requirements.²
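A possible Python rendering of dr and of the combined case distance d, reusing plan_distance from the sketch above; the case representation (a pair of an (initial state, goals) problem and a plan, all as sets/sequences of ground atoms and actions) and the linear combination with α (cf. footnote 2) are assumptions of ours:

```python
def problem_distance(init_stored, goals_stored, init_new, goals_new):
    """d_r between a previously solved problem (init_stored, goals_stored) and a
    new one (init_new, goals_new), assuming the object matching was already done."""
    denom = len(init_stored) + len(goals_new)
    if denom == 0:
        return 0.0
    return 1.0 - (len(init_stored & init_new) + len(goals_stored & goals_new)) / denom

def case_distance(case_i, case_j, alpha=0.5):
    """Combined case distance d = alpha * d_r + (1 - alpha) * d_a.
    A case is a pair ((init, goals), plan)."""
    (prob_i, plan_i), (prob_j, plan_j) = case_i, case_j
    dr = problem_distance(prob_i[0], prob_i[1], prob_j[0], prob_j[1])
    da = plan_distance(plan_i, plan_j)
    return alpha * dr + (1 - alpha) * da
```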

The regular world assumption presented at the beginning of this section used a notion of similarity between problems and solutions, neither providing details on how the similarity should be interpreted, nor specifying which solutions are considered. This is an important concern when a problem may have several significantly different solutions. We formalize this assumption keeping the notion of similarity undetailed, to preserve the generality of the definition, but establishing the quantification over the solutions as follows:

Definition 3 (Regular World Assumption) Let Π and Π′ ∈ P, Π ≠ Π′, be two similar planning problems of a planning domain with plan space ℘. If the world is regular, then ∀π ∈ ℘ that is a solution for Π, ∃π′ ∈ ℘ that is a solution for Π′, and ∀π′ ∈ ℘ that is a solution for Π′, ∃π ∈ ℘ that is a solution for Π, such that these π and π′ are similar.

We interpret problem and plan similarity using the distance functions dr and da. Specifically, we consider two problems Π, Π′ similar iff dr(Π, Π′) < δ, and two solutions π, π′ similar iff da(π, π′) < δ′, where δ and δ′ are reals whose specific values depend on the particular distance function and on the planning domain considered. Next, we formalize the notion of problem recurrence:

Definition 4 (Recurring Problems Assumption) For every new problem a case-based planner encounters, it is likely that a similar problem has already been encountered and solved.

² We use a linear combination d = α · dr + (1 − α) · da, where α = 0.5 for domains that exhibit very regular behavior (e.g., Logistics), while we used α = 1, i.e., d = dr, for domains where the solutions generated for the case base using planner TLPlan (Bacchus and Kabanza 2000) differed even for similar problems (e.g., ZenoTravel), being quite irregular w.r.t. da.

Case-based planning relies on the two assumptions of Def. 3-4 being fulfilled. If that indeed is the case, since in our approach every encountered problem is simply added to the case base, when a new problem Π is encountered, the case base likely contains a case ci = ⟨Πi, πi⟩ such that addressesδ(ci, Π) holds for some (small) value of δ, and, by Def. 3, the case-based planning system can produce solutions for Π similar to the previous ones by reusing those. Moreover, by Def. 4, it can be expected that the cases in the case base form groups of elements, which we call case clusters, similar to each other and that could be reduced to smaller groups without significant loss of information.

Definition 4 does not specify how likely similar problems are assumed to be encountered. This means that there can be different degrees of problem recurrence. In the strongest case, all newly encountered problems are similar to a problem in the case base. Different degrees of problem recurrence lead to differently structured plan libraries in terms of case clusters, which can affect the performance of a plan library maintenance policy exploiting them.

In our work, case similarity is interpreted by means of the distance function d, i.e., c is similar to c′ iff d(c, c′) < δ′′, where the specific value of δ′′ ∈ R depends on the specific implementation choices as well as on the domain.

Maintenance Policies

Instead of maximizing competence as an absolute property of a case base, the maintenance is guided by minimizing the amount of knowledge that is lost in the maintenance process, where removing a case from the library implies losing the corresponding knowledge, unless the same information is contained in some other case. The notions of case covering and case base coverage are defined to capture this concept:

Definition 5 Given a case base L and a case distance threshold δ ∈ R, we say that a case ci ∈ L covers a case cj ∈ L, that is, covers(ci, cj), if d(ci, cj) ≤ δ.

Definition 6 Let L, L′ denote two case bases and let C denote the set of all cases in L that are covered by the cases in L′, i.e., C = {ci ∈ L | ∃c′i ∈ L′, covers(c′i, ci)}. The coverage of L′ over L, coverage(L′, L), is defined as |C|/|L|.

We can now formally define the outcome of an algorithm addressing the plan library maintenance problem: it should be a case base L′ that is smaller than the original case base L, but that contains very similar experiences. Under such conditions, we say that L′ reduces L:

Definition 7 Case base L′ reduces case base L, denoted as reduces(L′, L), if and only if L′ ⊆ L and coverage(L′, L) = 1.
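Definitions 5-7 translate almost directly into code; a small sketch building on the case_distance function above (cases are assumed to be hashable tuples):

```python
def covers(case_i, case_j, delta):
    """Def. 5: case_i covers case_j if their distance is at most delta."""
    return case_distance(case_i, case_j) <= delta

def coverage(reduced, original, delta):
    """Def. 6: fraction of cases in `original` covered by some case in `reduced`."""
    covered = [c for c in original if any(covers(r, c, delta) for r in reduced)]
    return len(covered) / len(original)

def reduces(reduced, original, delta):
    """Def. 7: `reduced` is a subset of `original` with full coverage."""
    return set(reduced) <= set(original) and coverage(reduced, original, delta) == 1.0
```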

In the previous definition, we may set additional requirements on L′ to find a solution that is optimal in some way. For example, we may want to minimize the size of L′, or we may try to maximize the quality of the coverage. The structure of the policy remains the same: it constructs L′ by selecting the cases that satisfy a certain condition optimizing L′. Such a condition corresponds to the specific criterion the maintenance policy attempts to optimize.



Random Policy (Smyth 1998). This policy reduces the case base by randomly removing cases (Markovitch, Scott, and Porter 1993), which is easy to implement and fast to compute. However, the coverage of the reduced case base L′ over the original case base L cannot be guaranteed.

Distance-Guided Policy. Due to the assumption of recurring problems, we expect that the problems in the library can be grouped into sets of problems that are similar (close in the sense of dr) to each other. Consequently, by the regular world assumption, for a problem Π′ there exists a solution π′ that is similar to the solution π of a stored case c = ⟨Π, π⟩ where Π is similar to Π′. Case c′ = ⟨Π′, π′⟩ is similar to c (close in the sense of d) and its inclusion in the case base introduces some redundancy because of its similarity with c.

We propose a distance-guided policy that attempts to remove the cases that are mostly redundant. Intuitively, these cases are those whose distance from other cases is too small. In particular, the distance-guided policy identifies the cases to remove by exploiting the notion of average minimum distance δµ in the case base. Given a case ci ∈ L, the minimum distance case c*i of ci is a case in L such that d(ci, c*i) < d(ci, cj), ∀cj ∈ L \ c*i. The distance-guided policy keeps a case ci in the case base if and only if d(ci, c*i) ≥ δµ, where δµ is defined as follows:³
$$\delta_\mu = \frac{\sum_{c_i \in L} d(c_i, c^*_i)}{|L|}.$$
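A possible Python sketch of this policy (assuming a library with at least two cases, and reusing case_distance; the handling of isolated cases follows footnote 3):

```python
def distance_guided_policy(library, isolated_dist=0.5):
    """Keep case c_i iff d(c_i, c*_i) >= delta_mu, where c*_i is the closest other
    case; delta_mu is averaged over the non-isolated cases (a case is isolated if
    its minimum distance equals `isolated_dist`, cf. footnote 3)."""
    min_dists = [
        min(case_distance(c, other) for other in library if other is not c)
        for c in library
    ]
    regular = [v for v in min_dists if v != isolated_dist]   # exclude isolated cases
    delta_mu = sum(regular) / len(regular) if regular else 0.0
    return [c for c, v in zip(library, min_dists) if v >= delta_mu]
```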

The distance-guided policy is clearly better informed than the random policy, and it can recognize cases of high importance for the coverage of the case base (e.g., isolated elements that are dissimilar to any other case). The better information is, however, reflected in increased computational complexity: the distance-guided policy needs to consider the distance between all pairs of cases in order to find the closest one; therefore it requires a quadratic number of distance evaluations, resulting in a run-time of O(|L|² · td), where td denotes the time needed to compute the distance between two cases.

Coverage-Guided Policy. The distance-guided policy can preserve the knowledge in the case base better than the random policy does. However, it is not optimal, as some information is missed when only pairs of cases are considered. We generalize the approach by considering at once all the cases that may contain redundant information. For that we define the notion of neighborhood of a case c with respect to a certain similarity distance value δ, denoted nδ(c).

The idea of the case neighborhood is to group elements which contain redundant information and hence can be reduced to a single case. The case neighborhood uses a value of δ in accordance with Def. 5. Note that such a value, together with the structure and distribution of the cases in the case base, influences the cardinality of the case neighborhoods and therefore determines the amount by which L can be reduced.

Definition 8 (Case neighbourhood) Given a case base L, a case c ∈ L and a similarity distance threshold δ ∈ R, the neighborhood of c is nδ(c) = {ci ∈ L | d(c, ci) < δ}.

³ The isolated cases are excluded in the computation of δµ. A case ci is considered isolated if distance d(ci, c*i) = 0.5.

The Coverage-Guided policy is concerned with finding a set L′ of cases such that the union of all their neighborhoods covers all the elements of the given case base L, or, using the terminology of Def. 7, finding a case base L′ such that reduces(L′, L) holds.

There are many possible ways to reduce a case base in accordance with this policy, out of which some are more suitable than others. We introduce two criteria for reducing the case base that we observed can significantly influence the performance of a case-based system adopting the coverage-guided policy: minimizing the size of the reduced case base, which has a significant impact on the efficiency of the retrieval phase, and maximizing the quality of the coverage of the reduced case base, which influences the adaptation costs. Considering the first criterion, the optimal result of the coverage-guided policy takes into account the number of elements in the reduced set:

Definition 9 (Cardinality Coverage-Guided Policy) Given a similarity threshold value δ ∈ R and a case base L, find a reduction L′ of L with minimal cardinality.

Concerning the second criterion, consider three cases c, c1, c2 so that d(c, c1) < d(c, c2) < δ. By Def. 5, c covers both c1 and c2; however, the expected adaptation cost of c1 is lower than the cost of c2, and therefore c1 is better covered. The quality of the case base coverage can intuitively be defined as the average distance from the removed cases to the closest kept case (average coverage distance). Regarding the coverage quality, the optimal result of the coverage-guided policy is a case base L′ reducing L with minimal average coverage distance. Note, however, that if only the coverage distance were considered, then L′ = L would be a special case of an optimal reduced case base. Therefore, the quality measure to optimize needs to be more complex, in order to take into account the size of the reduced case base. In particular, given a reduction L′ of L, we define the uncovered neighborhood Uδ(c) of an element c ∈ L as its neighbors in L\L′, i.e., Uδ(c) = {cj ∈ L | cj ∈ {nδ(c) ∩ L\L′} ∪ {c}}. Then, we define the cost of a case c as the real function
$$v_\delta(c) = \left(\frac{\sum_{c_j \in U_\delta(c)} d(c, c_j)}{|U_\delta(c)|} + p\right).$$
The first term within the brackets indicates the average coverage distance of the uncovered neighbors; the second term, p ∈ R, is a penalization value that is added in order to favour reduced case bases with fewer elements and to assign a value different from 0 also to isolated cases in the case base.⁴ The sum of these costs over all the elements of a reduced set L′ defines the cost Mδ(L′) of L′, i.e., Mδ(L′) = Σc∈L′ vδ(c). The policy optimizing the quality of the case base coverage can then be defined as follows:

Definition 10 (Weighted Coverage-Guided Policy) Given a similarity threshold value δ ∈ R and a case base L, find a reduction L′ of L that minimizes Mδ(L′).

Unfortunately, computing the reduction of Def. 10 can be computationally very expensive. Therefore, we propose to compute an approximation of the reduced case base of this policy using the greedy algorithm described in Fig. 1.

⁴ In our experiments, we use p = maxci∈L d(ci, c*i).

⁵ If the δ value is not provided, the algorithm uses the average minimum distance (δµ).



Algorithm: CoverageBasedPolicy(L, δ)
Input: a case base L = {ci | 1 ≤ i ≤ n}, a threshold δ ∈ R.⁵
Output: a case base L′ reducing L
1. L′ ← ∅;
2. Uncovered ← L;
3. repeat
4.   select ci ∈ Uncovered that satisfies condition(ci);
5.   Uncovered ← Uncovered \ nδ(ci);
6.   L′ ← L′ ∪ {ci};
7. until Uncovered = ∅;
8. return L′;

Figure 1: A greedy algorithm computing a Coverage-Based Policy approximation.

This algorithm has two variants that depend on how line 4 is implemented, corresponding to the two proposed versions of the coverage-guided policy. For the Cardinality Coverage-Guided Policy, the condition test at line 4 of the algorithm is used to select the uncovered element ci with the greatest |Uδ(ci)|, in order to maximize the number of uncovered elements in nδ(ci) that can be covered by inserting ci into L′. For the Weighted Coverage-Guided Policy, in order to optimize the quality of the reduced case base L′, the condition of line 4 is used to select the uncovered element ci with the minimum vδ(ci)/|Uδ(ci)| value, where the vδ(ci) value is scaled down by |Uδ(ci)| to favor the cases that cover a higher number of still-uncovered elements.

Experimental Results

The policies presented in the previous sections have been implemented in a new version of the CBP system OAKPlan (Serina 2010). In our experiments, the plan retrieved by OAKPlan is adapted using planner LPG-Adapt (Gerevini, Saetti, and Serina 2003). The benchmark domains considered in the experimental analysis are the available domains DriverLog, Logistics, Rovers, and ZenoTravel from the 2nd, 3rd and 5th International Planning Competitions (Koenig 2013).

For each considered domain we generated a plan library with ∼5000 cases. Specifically, each plan library contains a number of case clusters ranging from 34 (for Rovers) to 107 (for ZenoTravel); each cluster c is formed by using either a large-size competition problem or a randomly generated problem Πc (with a problem structure similar to the large-size competition problems) plus a random number of cases, ranging from 0 to 99, that are obtained by changing Πc. Problem Πc was modified either by randomly changing at most 10% of the literals in its initial state and set of goals, or by adding/deleting an object to/from the problem. The solution plans of the planning cases were computed by planner TLPlan (Bacchus and Kabanza 2000). TLPlan exploits domain-specific control knowledge to speed up the search significantly, so that large plan libraries can be constructed using a relatively small amount of CPU time. In our libraries, plans have a number of actions ranging from 68 to 664. For each considered domain, we generated 25 test problems, each of which was derived by (randomly) changing the problem of a cluster randomly selected among those in the case base. Note that the cases in the library are grouped into clusters and the test problems were generated from the library problems because the aim of our experimental analysis is to study the effectiveness of the proposed techniques for domains with recurring problems. We experimentally compared nine specific maintenance policies:
• three random policies, R50, R75, and R90, that remove a case from the full plan library with probability 0.50, 0.75, and 0.90;
• three distance-guided policies, D1, D2, and D3, that remove the mostly redundant cases from (1) the full plan library, (2) the library obtained from D1, and (3) the library obtained from D2;
• three coverage-guided policies, C1, C2, and C3, that compute a reduced case base by using the greedy algorithm in Fig. 1 with (1) the full plan library, (2) the library obtained from C1, and (3) the library obtained from C2.

Table 1 compares the plan libraries reduced by the nine considered maintenance policies in terms of size, coverage (using δ = 0.1) w.r.t. the full plan library, number of elements of the full plan library that are not covered by the reduced plan libraries, and average distance from any uncovered case to the closest case in the reduced plan library. Obviously, the closer the coverage is to 1, or, equally, the lower the number of uncovered cases is, the better the maintenance policy is. Moreover, since a high-quality policy should remove only redundant cases, lower values of the average minimum distance from the uncovered cases indicate better plan libraries.

While the size of the case bases obtained by C1 and D1 is often comparable with the case base obtained by using R50, C1 and D1 are always better (usually significantly) than R50 in terms of coverage, number of uncovered elements and average minimum distance from the uncovered cases. Similarly, while the sizes of the case bases are often comparable, C2 and D2 are better than R75, and C3 and D3 are better than R90. The results in Table 1 also confirm the fact that the random policy may remove important cases, since the number of uncovered elements is often high, while the other policies can compute reduced case bases of comparable size with fewer uncovered elements. Moreover, it is interesting to note that the case bases with the best coverage are computed by D1, although those obtained by C1 have a similar coverage while containing fewer cases. For ZenoTravel, even if the case bases obtained through the distance-guided policies contain many more cases, the coverage of the case bases obtained by the coverage-guided policies is similar to or better than that obtained by the distance-guided policies.

Table 2 shows the performance of OAKPlan using the full plan library and the reduced libraries derived by the nine considered maintenance policies for the considered domains, in terms of average CPU seconds, average plan stability and IPC speed score (defined below). Given a library plan π′ and a new plan π for a test problem, the plan stability of π with respect to π′ can be defined as 1 − da(π, π′). Having a high value of plan stability can obviously be very important in plan adaptation because, e.g., high stability reduces the cognitive load on human observers of a planned activity by ensuring coherence and consistency of behaviors (Fox et al. 2006).



                                Random policy            Distance-Guided policy   Coverage-Guided policy
Domain       Metric             R50      R75      R90    D1       D2       D3      C1       C2       C3
DriverLog    Case-base size     2617     1368     566    3152     2253     1727    2318     1222     684
             Coverage           0.776    0.628    0.501  0.972    0.857    0.733   0.956    0.681    0.502
             #Uncovered         1177     1957     2623   146      754      1404    231      1679     2617
             Avg. uncov. dist.  0.067    0.112    0.171  0.017    0.035    0.058   0.022    0.069    0.130
Logistics    Case-base size     2615     1274     462    2826     1443     659     2767     1283     460
             Coverage           0.888    0.767    0.572  1        0.996    0.874   1        0.999    0.862
             #Uncovered         583      1213     2226   0        21       658     0        5        720
             Avg. uncov. dist.  0.036    0.064    0.103  0.018    0.043    0.057   0.018    0.042    0.059
Rovers       Case-base size     2107     1012     518    2130     1358     1165    1758     1018     720
             Coverage           0.599    0.358    0.227  0.672    0.476    0.387   0.586    0.375    0.294
             #Uncovered         1730     2770     3336   1416     2261     2646    1786     2696     3044
             Avg. uncov. dist.  0.131    0.218    0.295  0.065    0.120    0.156   0.080    0.150    0.210
ZenoTravel   Case-base size     2493     1242     479    2718     1729     1240    2588     1205     538
             Coverage           0.989    0.959    0.873  1        1        0.993   1        0.999    0.999
             #Uncovered         56       202      632    0        0        36      0        1        1
             Avg. uncov. dist.  0.027    0.046    0.067  0.014    0.027    0.036   0.015    0.031    0.042

Table 1: Evaluation of nine reduced plan libraries. Gray boxes indicate the best results.

                                  Full     Random policy          Distance-Guided policy   Coverage-Guided policy
Domain       Metric               library  R50      R75     R90    D1       D2      D3       C1       C2      C3
DriverLog    Avg. CPU seconds     19.8     11.1     6.7     5.7    12.9     9.4     7.5      9.6      5.6     3.3
             Speed score          3.76     6.83     11.93   20.39  5.90     8.08    10.01    7.86     14.30   22.70
             Avg. plan stability  0.801    0.817    0.787   0.751  0.818    0.823   0.811    0.812    0.829   0.839
Logistics    Avg. CPU seconds     34.4     22.2     22.2    15.0   21.5     13.9    10.2     21.3     12.7    8.9
             Speed score          6.13     9.83     13.16   17.62  10.42    15.84   22.44    10.55    17.31   24.88
             Avg. plan stability  0.953    0.945    0.920   0.912  0.952    0.950   0.950    0.952    0.952   0.952
Rovers       Avg. CPU seconds     160.3    82.8     44.5    78.0   67.3     27.0    19.7     53.3     18.6    17.6
             Speed score          3.54     6.27     10.36   15.02  7.58     15.73   20.25    8.40     21.76   22.30
             Avg. plan stability  0.975    0.956    0.953   0.870  0.970    0.969   0.969    0.969    0.970   0.969
ZenoTravel   Avg. CPU seconds     28.2     29.8     21.9    21.7   22.7     21.3    20.6     22.7     19.3    17.1
             Speed score          13.19    12.62    16.66   16.92  16.19    17.10   17.94    16.13    20.16   24.14
             Avg. plan stability  0.892    0.853    0.866   0.831  0.892    0.882   0.877    0.897    0.897   0.897

Table 2: Performance of OAKPlan. Gray boxes indicate the best results.

A good reduced case base should allow the planner to produce stable solutions. Given two compared policies and a problem set, the average CPU time and plan stability for each policy are computed over the test problems in the set that are solved by both the compared policies.

The speed score function was first introduced and used by the organizers of the 6th International Planning Competition (Fern, Khardon, and Tadepalli 2011) for evaluating the relative performance of the competing planners, and since then it has been a standard method for comparing the performance of planning systems. The speed score for a maintenance policy m is defined as the sum of the speed scores assigned to m over all the considered test problems. The speed score for m with a planning problem Π is defined as 0 if Π is unsolved using policy m, and T*Π / T(m)Π otherwise, where T*Π is the lowest measured CPU time to solve problem Π and T(m)Π denotes the CPU time required to solve problem Π using the case base reduced through policy m. Higher values of the speed score indicate better performance.
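A small sketch of this scoring scheme, assuming the measured CPU times are collected per policy (the data layout is ours):

```python
def speed_score(times_by_policy, policy):
    """IPC speed score for one policy. times_by_policy maps each policy name to a
    dict {problem: CPU time in seconds, or None if unsolved}."""
    score = 0.0
    problems = set().union(*(t.keys() for t in times_by_policy.values()))
    for prob in problems:
        solved = [t[prob] for t in times_by_policy.values() if t.get(prob) is not None]
        t_m = times_by_policy[policy].get(prob)
        if t_m is None or not solved:
            continue                      # unsolved problems contribute 0
        score += min(solved) / t_m        # T*_P / T(m)_P
    return score
```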

The results in Table 2 indicate that OAKPlan using the libraries reduced through the compared distance-guided and coverage-guided policies is always faster than using the full library, and even the use of the simple random policies usually improves the speed of OAKPlan.

Concerning plan stability, the plans computed using the libraries reduced by the distance-guided and the coverage-guided policies are always, on average, as stable as the plans computed using the full library.



Surprisingly, for DriverLog and ZenoTravel, OAKPlan with policy C3 computes plans that are even more stable than with the full library. The rationale for this is related to the use of LPG-Adapt in OAKPlan: since LPG-Adapt is based on a stochastic local search algorithm, it may happen that LPG-Adapt computes plans that are far from the library plans even if there exist solution plans similar to some of them.

OAKPlan using C1 and D1 is always on average faster than using R50, while the size of the case bases is often comparable, except for D1 and domain DriverLog. For DriverLog, OAKPlan using D1 is slower than using R50, because the library reduced by D1 is much bigger than the one reduced by R50. Moreover, OAKPlan using C1 and D1 usually computes plans that are on average more stable than using R50. The performance gaps of C2 and D2 w.r.t. R75, and of C3 and D3 w.r.t. R90, are similar. Finally, in terms of average CPU time, speed score, and average plan distance, the coverage-guided policies perform almost always better than, or similarly to, the distance-guided policies.

Conclusion

In this work, we have addressed the problem of maintaining a plan library for case-based planning by proposing and experimentally evaluating some maintenance policies for the case base. The investigated policies optimize different quality criteria of the reduced case base.

The random policy, which is also used in general case-based reasoning, does not optimize any criterion but is very fast to compute. We have introduced two better-informed policies, the distance-guided and the coverage-guided policies, which attempt to generate reduced case bases of good quality. Since computing such policies can be computationally hard, we have proposed a greedy algorithm for effectively computing an approximation of them. An experimental analysis shows that these approximated policies can be much more effective than the random policy, in terms of quality of the reduced case base and performance of a case-based planner using them.

There are several research directions along which to extend the work presented here. We intend to study in detail additional distance functions to assess the similarity between problems and solutions, to develop and compare additional policies, to investigate alternative methods for efficiently computing good policy approximations, and to extend the experimental analysis with a larger set of benchmarks. Moreover, current work includes determining the computational complexity of the two proposed (exact) coverage-guided policies, which we conjecture are both NP-hard.

References

Aamodt, A., and Plaza, E. 1994. Case-based reasoning: foundational issues, methodological variations, and system approaches. AI Communications 7(1):39-59.
Bacchus, F., and Kabanza, F. 2000. Using temporal logic to express search control knowledge for planning. Artificial Intelligence 116(1-2):123-191.
Backstrom, C., and Nebel, B. 1996. Complexity results for SAS+ planning. Computational Intelligence 11:625-655.
Backstrom, C.; Chen, Y.; Jonsson, P.; Ordyniak, S.; and Szeider, S. 2012. The complexity of planning revisited - a parameterized analysis. In 26th AAAI Conf. on AI.
Fern, A.; Khardon, R.; and Tadepalli, P. 2011. The first learning track of the int. planning competition. Machine Learning 84(1-2):81-107.
Fox, M.; Gerevini, A.; Long, D.; and Serina, I. 2006. Plan stability: Replanning versus plan repair. In 16th Int. Conf. on AI Planning and Scheduling.
Gerevini, A.; Saetti, A.; and Serina, I. 2003. Planning through stochastic local search and temporal action graphs. JAIR 20:239-290.
Gerevini, A.; Saetti, A.; and Serina, I. 2012. Case-based planning for problems with real-valued fluents: Kernel functions for effective plan retrieval. In ECAI 2012.
Ghallab, M.; Nau, D. S.; and Traverso, P. 2004. Automated Planning - Theory and Practice. Elsevier.
Koenig, S. 2013. Int. planning competition. http://ipc.icaps-conference.org/.
Leake, D. B., and Wilson, D. C. 1998. Categorizing case-base maintenance: Dimensions and directions. In 4th European Workshop on CBR.
Leake, D. B., and Wilson, D. C. 2000. Remembering why to remember: Performance-guided case-base maintenance. In 5th European Workshop on CBR.
Markovitch, S.; Scott, P. D.; and Porter, B. 1993. Information filtering: Selection mechanisms in learning systems. In 10th Int. Conf. on Machine Learning, 113-151.
Minton, S. 1990. Quantitative results concerning the utility of explanation-based learning. AI 42(2-3):363-391.
Munoz-Avila, H. 2001. Case-base maintenance by integrating case-index revision and case-retention policies in a derivational replay framework. Computational Intelligence 17(2):280-294.
Richter, S., and Westphal, M. 2010. The LAMA planner: Guiding cost-based anytime planning with landmarks. JAIR 39:127-177.
Serina, I. 2010. Kernel functions for case-based planning. Artificial Intelligence 174(16-17):1369-1406.
Smyth, B., and Keane, M. T. 1998. Adaptation-guided retrieval: Questioning the similarity assumption in reasoning. Artificial Intelligence 102(2):249-293.
Smyth, B., and McKenna, E. 1999. Footprint-based retrieval. In ICCBR 1999, 343-357.
Smyth, B. 1998. Case-base maintenance. In Eleventh Int. Conf. on Industrial and Engineering Applications of AI and Expert Systems. Springer.
Spalazzi, L. 2001. A survey on case-based planning. AI Review 16(1):3-36.
Srivastava, B.; Nguyen, T. A.; Gerevini, A.; Kambhampati, S.; Do, M. B.; and Serina, I. 2007. Domain independent approaches for finding diverse plans. In IJCAI 2007.
Zhu, J., and Yang, Q. 1998. Remembering to add: Competence-preserving case-addition policies for case-base maintenance. In 16th Int. Joint Conf. on AI.



Towards Automated Planning Domain Models Generation

Mauro Vallati and Lukas Chrpa
School of Computing and Engineering

University of Huddersfield, UK
{m.vallati,l.chrpa}@hud.ac.uk

Federico Cerutti
School of Natural & Computing Science

University of Aberdeen, [email protected]

Abstract

It is a common practice in Automated Planning to evaluate algorithms on existing benchmark domains. The number of domain models is limited, since they encode simplified versions of real-world domains and the generation of a new planning domain is a complex task. The limited number of domain models does not allow one to have a complete overview of the performance of automated planning engines. It would therefore be useful to have a generator of planning domain models for improving the evaluation of planning algorithms. In this paper we introduce the requirements that an automatic generator of random domain models should fulfill, and we discuss the related work and the main issues that a domain model generator will have to face.

Introduction

In AI planning, algorithms are commonly evaluated only on a limited number of benchmark domain models, usually those that have been designed and used in an International Planning Competition (IPC) (Coles et al. 2012). They are inspired by real-world domains with the ultimate aim of testing planning algorithms in everyday applications. However, the resulting models are usually very simplified, they share several similarities, and they are provided in a very limited number.

In this paper we introduce the requirements for a generator of planning domain models that will be able to overcome the limits stated above. These requirements consider four situations where the usage of such a generator can be envisaged: (i) achieving a better comprehension of algorithm performance; (ii) configuring learning-based planners on large classes of domains; (iii) configuring domain-independent portfolios on very large sets of domains; and (iv) improving the currently existing techniques for comparing planning models.

In the next section we introduce the existing techniques for generating domain models for planning. Then, in the subsequent section, we discuss the main issues related to the automated generation of AI planning domain models. Finally, we provide conclusions and future work.

Related works

The generation of a new planning domain model is a complex task. The traditional method involves an AI planning expert who uses a text editor to manually hand-code a set of previously gathered requirements that represent a real-world domain. Recently, knowledge engineering tools such as GIPO (Simpson, Kitchin, and McCluskey 2007), itSIMPLE (Vaquero et al. 2012) and PDDL Studio (Plch et al. 2012) have been developed. These tools usually include techniques for analysing the structure of domain models. This is the case for itSIMPLE, where Petri Nets are exploited for analysing dynamic properties of the model, and for GIPO, in which it is possible to check the correctness of invariants. Clearly, these KE tools are designed to support users in the model-generation task, whereas in this paper we are introducing the idea of an automated model generator. It is also worth considering that, for humans, it is hard to design a new domain which is not somehow related to a real-world application. Moreover, a user will probably re-use solutions previously adopted for encoding similar constraints, which results in models with significant similarities. These facts represent a clear limit to the generation of new models by human users.

Considering the automatic generation of planning domain models, several techniques exist for handling this problem, but all of them require the exploitation of some sort of existing knowledge. LOCM (Cresswell, McCluskey, and West 2013) is able to generate domain models from sample plan traces; for instance, LOCM is able to learn a Freecell domain model just by observing legal moves in several games. Opmaker2 (McCluskey et al. 2010) learns domain models from sample plans and partial domain models. Domain models can also be learnt from existing formal models (Bartak, Fratini, and McCluskey 2010).

Differently from the existing automatic approaches for generating planning domain models, the final outcome of this research project will be a system able to automatically create new models that will mainly be exploited for improving the comprehension of planner performance (encoding domain models described in a non-formal language or encoding real-world problems is not considered at this stage).

To this aim, in the next section we introduce the design requirements for such a system.


Requirements for an Automated Planning Domain Models Generator

An automated planning domain models generator should address the following three issues: (i) how to define a well-structured domain; (ii) which parameters can be safely randomised, and which have to be selected by humans; and (iii) the equivalence between models.

Insights into how actions, instances of planning operators, might be ordered in plans can be gained by investigating operators' preconditions and effects. This mainly influences planning domain structures. Following Chapman's terminology (Chapman 1987), we can define a possible achiever and a possible clobberer. An operator o is a possible achiever (resp. clobberer) for another operator o′ (o′ does not necessarily have to be different from o) if and only if o creates (resp. deletes) a predicate for o′ (operators o and o′ must share corresponding arguments). Straightforwardly, a predicate in a precondition of some operator must be achieved by some other operator or be present in an initial state. Similarly, an operator should be an achiever for some other operator or for a goal state. In other words, operators should be reachable (their instances should be applicable at some point) and useful (their instances should somehow contribute to the goal). Intuitively, achiever and clobberer relations among operators in a domain should be somehow balanced. These aspects might help ensure that the domain is well structured. However, it is also important to investigate the computational complexity of determining whether a domain is well structured.
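As an illustration of the kind of syntactic analysis discussed above, the sketch below checks the possible-achiever and possible-clobberer relations between two lifted operators encoded as simple precondition/add/delete sets, and tests the two necessary (not sufficient) conditions for a well-structured domain: reachable preconditions and useful operators. The Python encoding, the operator names and the well_structured helper are ours and purely illustrative; matching of corresponding arguments is deliberately abstracted away.

    from dataclasses import dataclass

    @dataclass
    class Operator:
        """Simplified lifted operator: only predicate names are kept;
        argument matching between operators is abstracted away."""
        name: str
        precond: frozenset
        add: frozenset
        dele: frozenset   # delete list ("del" is a Python keyword)

    def possible_achiever(o, o2):
        # o is a possible achiever for o2 if it creates a predicate o2 needs.
        return bool(o.add & o2.precond)

    def possible_clobberer(o, o2):
        # o is a possible clobberer for o2 if it deletes a predicate o2 needs.
        return bool(o.dele & o2.precond)

    def well_structured(operators, init, goal):
        """Necessary conditions only: every precondition is achievable,
        and every operator contributes to some operator or to the goal."""
        for o in operators:
            for p in o.precond:
                if p not in init and not any(p in o2.add for o2 in operators):
                    return False            # unreachable precondition
            if not (o.add & goal) and not any(possible_achiever(o, o2)
                                              for o2 in operators):
                return False                # operator never useful
        return True

    # Toy logistics-like operators.
    load = Operator("load", frozenset({"at-truck", "at-pkg"}),
                    frozenset({"in-truck"}), frozenset({"at-pkg"}))
    unload = Operator("unload", frozenset({"in-truck"}),
                      frozenset({"at-pkg"}), frozenset({"in-truck"}))
    print(possible_achiever(load, unload))      # True: load adds in-truck
    print(possible_clobberer(unload, unload))   # True: unload deletes in-truck
    print(well_structured([load, unload],
                          init={"at-truck", "at-pkg"}, goal={"at-pkg"}))  # True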

Regarding the second aspect, the system must provide configuration parameters that humans can select for testing planning algorithms on domain models with specific characteristics. This can determine, for instance, whether planning operators, when instantiated into actions, have a high level of interference with each other, or which restrictions the domain models should have. Specific restrictions of planning domains influence how difficult it is to solve the corresponding planning problems in terms of computational complexity (Backstrom and Nebel 1995; Backstrom et al. 2012). The computational complexity of known IPC benchmarks has also been studied (Helmert 2003; 2006). Despite the proven tractability of some classes of problems, planning engines tend to struggle with them. Therefore, the possibility of generating domains with proven tractability can point to issues which might be characteristic of current domain-independent planning engines. Given that the problem of determining how to design an optionally constrained domain has never been carefully analysed, we believe that the opportunity to generate domains with specific constraints might contribute to ongoing research on the structural analysis of planning domains/problems, which might result in new planning techniques, heuristics and theoretical results. We will address this specific problem by studying the relations between the metrics of Roberts and Howe (2009) for existing domain models that have been demonstrated to be constrained, thus also providing an innovative usage of these metrics.

Another important aspect of planning domain models is the reversibility of the operators. Reversibility means that, at any state, an operator's application can be reversed by another operator to restore the original state (Wickler 2011). If an operator is not reversible, this can lead to dead ends in the search space of the planning problems. This aspect is important to address both for comparing planning models and for configuring domain-independent portfolios, because planning domain models with many dead ends in the search space are especially problematic for heuristic search based planners (Helmert 2004).
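As a rough illustration of such a reversibility test, the sketch below checks a simple sufficient condition in the same simplified propositional encoding used above: o2 undoes o in a given state if o2 is applicable after o and applying o and then o2 restores the state. This is only an approximation of the notion discussed by Wickler (2011); the encoding is ours.

    from collections import namedtuple

    # Hypothetical simplified operator encoding (precondition/add/delete sets).
    Op = namedtuple("Op", ["name", "precond", "add", "dele"])

    def reverses(o, o2, state):
        """True if applying o in `state` and then o2 yields `state` again."""
        after_o = (state - o.dele) | o.add
        if not o2.precond <= after_o:        # o2 must be applicable after o
            return False
        return ((after_o - o2.dele) | o2.add) == state

    def reversible(o, operators, state):
        # o is reversible in `state` if some available operator undoes it.
        return any(reverses(o, o2, state) for o2 in operators)

    load = Op("load", {"at-truck", "at-pkg"}, {"in-truck"}, {"at-pkg"})
    unload = Op("unload", {"in-truck"}, {"at-pkg"}, {"in-truck"})
    print(reverses(load, unload, {"at-truck", "at-pkg"}))             # True
    print(reversible(load, [load, unload], {"at-truck", "at-pkg"}))   # True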

The third aspect is related to the equivalence of domain models. Referring to the definitions given in (Shoeeb and McCluskey 2011), there are two different types of equivalence: strong and weak. The former defines two domain models that are identical up to naming, while the latter implies that the functional behaviour of the domain is the same for both models. This means that the same task can be formulated in both models and the same solutions can be generated. Clearly, a domain model generator must avoid generating strongly equivalent models: the analysis of the performance of planning algorithms on strongly equivalent domains does not improve the knowledge of their behaviour. Therefore, if different strongly equivalent domain models are used for testing planner performance, the resulting statistical model of their behaviour would be misleading. On the other hand, the impact of weakly equivalent domain models on the evaluation of planners is not as clear as that of strong equivalence. Intuitively, it seems reasonable that the generation of weakly equivalent models should also be avoided by an automated system, but more analysis is needed for a complete understanding of this issue.

Conclusions

The evaluation of planning algorithms is currently limited to existing benchmark domain models. This approach does not allow a complete overview of the performance of planners. An automated domain model generator would therefore be useful in several situations, including algorithm evaluation and the configuration of learning-based planners.

In this paper we introduced three general requirements for a domain model generator. First, it has to build well-structured and exploitable domain models. Second, it has to be open to user customisation, in order to generate planning domains with chosen characteristics (e.g., tractability, operator reversibility) that are believed to be useful for given purposes. Finally, it must be able to generate domains that are not equivalent to a given one.

Our future work includes a more specific analysis of the domain model generation problem, such as the best way of representing and handling new models, and the development of a prototype domain generator. It will also be important to study the techniques that can be used for evaluating the quality of the newly generated models, and the performance of the proposed system.


References
Backstrom, C., and Nebel, B. 1995. Complexity results for SAS+ planning. Computational Intelligence 11:625–656.
Backstrom, C.; Chen, Y.; Jonsson, P.; Ordyniak, S.; and Szeider, S. 2012. The complexity of planning revisited - a parameterized analysis. In Proceedings of the 26th Conference on Artificial Intelligence (AAAI-12), 1735–1741.
Bartak, R.; Fratini, S.; and McCluskey, T. L. 2010. The 3rd competition on knowledge engineering for planning and scheduling. AI Magazine 31(1):95–98.
Chapman, D. 1987. Planning for conjunctive goals. Artificial Intelligence 32(3):333–377.
Coles, A. J.; Coles, A.; Olaya, A. G.; Jimenez, S.; Lopez, C. L.; Sanner, S.; and Yoon, S. 2012. A survey of the seventh international planning competition. AI Magazine 33(1).
Cresswell, S. N.; McCluskey, T. L.; and West, M. M. 2013. Acquiring planning domain models using LOCM. The Knowledge Engineering Review FirstView:1–19.
Helmert, M. 2003. Complexity results for standard benchmark domains in planning. Artificial Intelligence 143(2):219–262.
Helmert, M. 2004. A planning heuristic based on causal graph analysis. In Proceedings of the 14th International Conference on Automated Planning and Scheduling (ICAPS-04), 161–170.
Helmert, M. 2006. New complexity results for classical planning benchmarks. In Proceedings of the 16th International Conference on Automated Planning and Scheduling (ICAPS-06), 52–62.
McCluskey, T. L.; Cresswell, S.; Richardson, N.; and West, M. 2010. Action knowledge acquisition with Opmaker2. In Agents and Artificial Intelligence, volume 67 of Communications in Computer and Information Science. Springer. 137–150.
Plch, T.; Chomut, M.; Brom, C.; and Bartak, R. 2012. Inspect, edit and debug PDDL documents: Simply and efficiently with PDDL Studio. In System Demonstration – Proceedings of the 22nd International Conference on Automated Planning & Scheduling (ICAPS-12).
Roberts, M., and Howe, A. 2009. Learning from planner performance. Artificial Intelligence 173(5-6):536–561.
Shoeeb, S., and McCluskey, T. L. 2011. On comparing planning domain models. In The 29th Workshop of the UK Planning and Scheduling Special Interest Group (PlanSIG-11).
Simpson, R.; Kitchin, D. E.; and McCluskey, T. 2007. Planning domain definition using GIPO. Knowledge Engineering Review 22(2):117–134.
Vaquero, T. S.; Tonaco, R.; Costa, G.; Tonidandel, F.; Silva, J. R.; and Beck, J. C. 2012. itSIMPLE4.0: Enhancing the modeling experience of planning problems. In System Demonstration – Proceedings of the 22nd International Conference on Automated Planning & Scheduling (ICAPS-12).
Wickler, G. 2011. Using planning domain features to facilitate knowledge engineering. In Knowledge Engineering for Planning and Scheduling Workshop (KEPS).


Session 3

Planning and Scheduling with time constraints


Business Model Design as a Temporal Planning Problem: Preliminary Results

Daniele Magazzeni
Department of Informatics, King's College London, UK

Fabio Mercorio
CRISP Research Centre, University of Milan Bicocca, Italy

Balbir Barn, Tony Clark, Franco Raimondi
Department of Computer Science, Middlesex University, London, UK

Vinay Kulkarni
Tata Consultancy Services, Pune, India

Abstract

A number of formalisms and notations are available to design business models. Typically, a top-level model is created manually with one of these formalisms by a team of business experts, and the model is then analysed using simulations and model-based testing to find the most efficient configuration for resource allocation. The aim of this paper is to present a novel application domain for planning, together with a benchmark problem based on an industrial partner's experience: we encode the problem of finding the most efficient resource allocation for a business process as a planning problem. In particular, we consider a list of actions that various divisions in an organization can take, together with their associated costs, and we look for a solution that minimises time-to-market for a given project budget.

1 Introduction
Business organisations that provide or build products (e.g., software, mixed software-hardware solutions, and even actual goods) are likely to employ abstract modelling languages to describe and analyse their business processes. As described below, a number of formal languages are available for modelling business processes, and various tools are available to automate the analysis of the modelled workflows with the aim of minimising time-to-market and production costs, maximising return on investment, etc. To the best of our knowledge, however, the design of business models is still a manual process that relies on the experience of top-level managers and domain experts to produce sequences of steps that achieve a desired business goal, subject to the minimisation/maximisation of various metrics. As a result, there is no guarantee that the business models obtained using this process are indeed the most efficient solution. Additionally, the exploration of different options is a very time-consuming task: each new model has to be developed and analysed separately, and alternative solutions need to be compared manually.

In this paper we argue that the design of business models can be automated using AI planning techniques, thus providing an effective tool to search for "efficient" solutions for resource allocation and task scheduling, or to quickly explore alternative business models.

Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

More in detail, our contributions can be summarised as follows: we present a novel application domain for temporal planning that is derived from a concrete instance provided by an industrial partner; we describe a benchmark domain that results in a challenging multi-objective temporal planning problem; and, as a practical example, we consider a model-driven software development process and show how it can be encoded as a temporal planning problem for our domain.

Finally, we use a temporal planner to generate solutions that minimise time-to-market for a given project budget, and we provide a visual representation of the automatically generated non-trivial resource allocation using a Gantt chart.

Related work: UML Activity Diagrams (UML 2012) and their associated profiles are among the most commonly used representations for business process modelling. A number of tools support UML modelling, including Eclipse plug-ins (www.eclipse.org) and IBM Rational Rose (www.ibm.com/software/rational/). Various approaches have been presented to analyse and verify activity diagrams (Eshuis 2006; Forster et al. 2007). The key difference with our work is that the input of all the existing tools (and of the methods mentioned below) is a business model that is manually created: using a planner, an optimised model (in terms of resource allocation) is automatically generated.

Other business modelling languages include Petri Nets (van der Aalst 1998; Hinz, Schmidt, and Stahl 2005) and Event-Driven Process Chains (EPC) (van der Aalst 1999). Due to space constraints, we refer to (List and Korherr 2006) for the description of additional methods.

Perhaps the closest work to ours is the translation of Status and Action Management (SAM) models to PDDL (Hoffmann, Weber, and Kraft 2012). This work has been developed in collaboration with SAP (www.sap.com), and is based on the use of an adaptation of the planner FF (Hoffmann and Nebel 2001). Using this approach, business process modellers can employ a planner to refine high-level actions and automatically generate sequences of steps expressed in BPMN (Business Process Modelling Notation (OMG 2012)). The main difference with our proposal is that we consider cost and duration of actions, together with planning metrics.


2 A Formal Model for Business Processes
In this section we first provide a formal semantics for business processes that abstracts the existing approaches described above. Then, we describe an actual process currently in use at Tata Consultancy Services. In Section 3 a mapping between the formal model and a temporal planning model is presented, using the concrete business process as a running example.

2.1 The Model
Definition 1 (Business Process). A Business Process (BP) B is a 10-tuple (S, si, se, P, D, R, A, T, C, b), where: S is a finite set of states, si ∈ S is the initial state, se ∈ S is the end state, P is a finite set of parameters, D : S → 2^S is the dependency function, R : S → 2^P is the requirements function, A : S → 2^P is the allocation function, T : S → R+ is the temporal function, C : S × 2^P × R+ → R+ is the cost function, and b ∈ R+ is the budget.

The set of states S corresponds to the set of typical business steps, such as "Requirements analysis" or "Code testing". Each state may depend on other states: for instance, "Code testing" depends on "Code generation". The set P includes parameters such as the number of developers, number of testers, number of domain experts, etc. For each state s ∈ S, D(s) defines the set of states s depends on, R(s) defines the requirements for the phase represented by s, A(s) defines the resources allocated for it, T(s) defines its duration and C(s, A(s), T(s)) defines its cost. Finally, we have an initial state si ∈ S (such that D(si) = ∅), and an end state se ∈ S.

Definition 2 (Business Process Execution). A Business Process Execution for the BP B = (S, si, se, P, D, R, A, T, C, b) is a sequence π = (s0 a0 t0)(s1 a1 t1)(s2 a2 t2) ... sn, where, ∀j ≥ 0, sj ∈ S, aj = A(sj), tj = T(sj).

A Business Process Execution is admissible iff:

1. s0 = si
2. sn = se
3. ∀j ≥ 1, ∀sk ∈ D(sj) : sk ∈ π ∧ k < j
4. ∀sl ∈ S, R(sl) ⊆ A(sl)
5. Σ_{j=0...n−1} C(sj, aj, tj) ≤ b

In particular, conditions 3, 4, and 5 of Definition 2 require all the dependencies among phases to be satisfied, all phase requirements to be met, and the total cost to be within the given budget, respectively.
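To make Definition 2 concrete, the sketch below checks the admissibility conditions on a toy business process. The dictionary-based encodings of D, R, A, T and C, the phase names and the cost function are ours and purely illustrative; they are not part of the paper's formal model.

    def is_admissible(execution, s_init, s_end, D, R, A, T, C, budget):
        """Check conditions 1-5 of Definition 2 on a sequence of states."""
        states = execution
        if states[0] != s_init or states[-1] != s_end:          # conditions 1-2
            return False
        seen = set()
        for j, s in enumerate(states):
            if j >= 1 and not D.get(s, set()) <= seen:           # condition 3
                return False
            seen.add(s)
        if any(not R.get(s, set()) <= A.get(s, set()) for s in states):
            return False                                         # condition 4
        total = sum(C(s, A.get(s, set()), T.get(s, 0)) for s in states[:-1])
        return total <= budget                                   # condition 5

    # Toy example: two phases plus an end state.
    D = {"HLD": {"RE"}, "END": {"HLD"}}
    R = {"RE": {"DE"}, "HLD": {"SA"}}
    A = {"RE": {"DE"}, "HLD": {"SA", "TA"}}
    T = {"RE": 15, "HLD": 5}
    C = lambda s, alloc, t: len(alloc) * t    # simplistic cost: people * days
    print(is_admissible(["RE", "HLD", "END"], "RE", "END",
                        D, R, A, T, C, budget=100))   # True: 15 + 10 <= 100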

2.2 The MDSD Process
Table 1 describes the states and the associated functions R(s), T(s), and C(s, a, t) for a model-driven software development (MDSD) process that has been used to deliver several large business applications over the past 16 years at Tata Consultancy Services. A row of the table depicts a specific phase of the MDSD process, the time the phase takes to complete as a percentage of the total time taken for completion of the MDSD process, and the various actors participating in the phase along with their relative contribution. For instance, the High Level Design phase requires 5% of the time taken by the overall MDSD process, and requires the participation of a Solution Architect (SA), a Domain Expert (DE) and a Technology Architect (TA). If the overall time required by a project is 100 man-days, this line encodes the fact that the HLD phase requires 0.3 · 5 TA days, 0.1 · 5 DE days, and 0.6 · 5 SA days. The remaining actors are Test Engineer (TE), MDE Expert (ME), Modeller (M), Developer (D), and Tester (T).
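As a quick illustration of this encoding (using the HLD figures quoted above, with the assumed overall effort of 100 man-days), the per-role effort of a phase can be computed as follows; the helper below is ours and purely illustrative.

    def phase_effort(total_man_days, time_pct, contributions):
        """Split a phase's share of the project into per-role man-days.
        `contributions` maps each role to its relative contribution."""
        phase_days = total_man_days * time_pct / 100.0
        return {role: share * phase_days for role, share in contributions.items()}

    # High Level Design: 5% of the project, split among TA, DE and SA.
    print(phase_effort(100, 5, {"TA": 0.3, "DE": 0.1, "SA": 0.6}))
    # {'TA': 1.5, 'DE': 0.5, 'SA': 3.0}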

The initial state is si = RE, and the end state is se = END. The dependency function D is graphically shown in Figure 1. The function C(s, a, t) can be derived from the times described in Table 1 and the costs in Table 2.

3 Business Process as a Planning Problem
The proposal of this paper is a translation of a business process (as defined in Definition 1) into a temporal planning problem, so that planners can be used to find admissible business process executions (as defined in Definition 2) while minimising the time-to-market.

It is worth noting that the use of temporal planning is key in this context, as it allows the modelling of concurrent activities and time-dependent resource allocations. Furthermore, although business process design could be seen as a scheduling problem, the setting we consider makes it a planning problem. In fact, the number of resources that can be allocated to each phase is not known in advance, nor is the order in which phases are executed, their duration, or how concurrency can be exploited.

In the following we present the main components of the PDDL domain and problem.

3.1 The Planning Domain
The business process domain presents a number of challenging features to be modelled. First, the business process consists of different phases, each of which, in turn, requires a number of tasks to be accomplished. The order of execution of the phases is not fixed, while there is a set of dependencies among phases that must be satisfied, which, however, allows sets of phases to be executed in parallel. Second, different tasks require people with different skills, and people have to be allocated to each phase/task. The number of people to allocate is not known in advance, and only an upper bound is provided. Third, the project cost needs to be computed, depending on the resource allocation, and maintained within the given budget. Finally, as we want to optimise the time-to-market, the duration of the plan has to be minimised.

We begin the description of the domain with the whole_project action, shown in Figure 2. It is an envelope action, which encloses the whole process execution and whose duration is chosen by the planner. Therefore, in order to minimise the time-to-market, the planner will try to minimise the duration of the whole_project action, which, in turn, is used to set dynamically the value of project_time, used to compute the cost of the project, as described in the following.

In order to model resource allocation, we distinguish between employing a resource (which defines the total number of resources for each skill that will be used) and allocating a resource (which defines how resources are used throughout the process).


s ∈ S (phase)                                          T(s) % time   R(s): non-zero shares over DE, SA, TA, TE, ME, M, D, T (in column order)
Requirements Elicitation (RE)                          15            0.9, 0.1
High Level Design (HLD)                                5             0.1, 0.6, 0.3
Test Case Preparation (TCP)                            5             0.2, 0.1, 0.7
Low Level Design (LLD)                                 10            0.3, 0.7
Code Generator Procurement (CGP)                       10            0.2, 0.2, 0.6
Component Interface Modelling (CIM)                    2             0.1, 0.1, 0.8
Component Interface Validation (CIV)                   2             0.1, 0.1, 0.8
Component Interface Assembly (CIA)                     1             0.2, 0.1, 0.7
Modelling Component Implementation (MCM)               5             0.1, 0.1, 0.8
Validation of Component Implementation Model (VCIM)    5             0.1, 0.1, 0.8
Coding of Component Implementation (CCI)               7             0.8, 0.2
Model-Based Code Generation (MBCG)                     3             0.1, 0.1, 0.8
DSL translation (DSLT)                                 4             0.1, 0.9
Compilation (COMP)                                     5             1
Unit Testing (UT)                                      5             1
Component Assembly (CA)                                5             0.2, 0.4, 0.4
Integration Testing (IT)                               5             0.1, 0.1, 0.8
User Acceptance Testing (UAT)                          5             0.1, 0.1, 0.1, 0.7
Sign Off (SO)                                          1             0.4, 0.3, 0.3
END

Table 1: Phases of the MDSD process

Figure 1: Dependency graph for the MDSD process

(:durative-action whole_project
  :parameters (?p - phase)
  :duration (<= ?duration (MAX_PLAN_LENGHT))
  :condition (and (at start (todo_project))
                  (at start (is_last_phase ?p))
                  (at end (completed ?p)))
  :effect (and (at start (running_project))
               (at start (not (todo_project)))
               (at end (project_completed))
               (at end (assign (project_time) ?duration))))

Figure 2: The whole_project action


Employing Resources. Figure 3 shows the employ and payOne actions for domain experts (similar actions are defined for the other skills). These actions are used for managing the amount of resources and for updating the project cost accordingly. In particular, the employ action increments the project cost by the cost of recruiting that particular resource (costs of resources of different skills are shown in Table 2). On the other hand, the payOne action, which is applied when the project is completed, is used to increment the project cost based on the daily cost of the resource and the duration of the project.

Allocating resources. The planner can use allocation (deallocation) actions to assign (release) resources of different skills to each phase of the business model before (after) performing that phase through the corresponding execution action. As an example, Figure 4 shows the actions for allocating and deallocating a domain expert. Note that the deallocate action for skill Y does not require the whole phase to be finished, but only the subtask for skill Y to be completed. This allows a flexible allocation of the same resource to different phases.

Executing Phases. Modelling the execution of a phase presents an interesting issue, as a phase consists of one or more tasks to be completed. Furthermore, the duration of each task is defined in terms of man-days for each skill required to perform the task (as shown in Table 1). Let us assume that task p requires skills A, B and C, and that for each of them the amount of work is pA_days, pB_days and pC_days, respectively. If the planner has allocated pA_res, pB_res and pC_res resources to task p, then the duration of the phase is

    max_{i ∈ {A, B, C}} ( pi_days / pi_res )


(:action employ_DE
  :parameters ()
  :precondition (and (< (employed_DE) (max_DE))
                     (running_project))
  :effect (and (increase (available_DE) 1)
               (increase (employed_DE) 1)
               (increase (total_project_cost) (employment_cost_DE))))

(:action payOne_DE
  :parameters ()
  :precondition (and (project_completed)
                     (> (employed_DE) 0)
                     (> (available_DE) 0))
  :effect (and (increase (total_project_cost)
                         (* (project_time) (per_day_cost_DE)))
               (decrease (employed_DE) 1)))

Figure 3: The employ and payOne actions

(:action allocate_DE
  :parameters (?p - phase)
  :precondition (and (doing ?p)
                     (> (available_DE) 0))
  :effect (and (increase (allocated_DE ?p) 1)
               (decrease (available_DE) 1)))

(:action deallocate_DE
  :parameters (?p - phase)
  :precondition (and (completed_DE ?p)
                     (> (allocated_DE ?p) 0))
  :effect (and (decrease (allocated_DE ?p) 1)
               (increase (available_DE) 1)))

Figure 4: An example of allocation and deallocation actions

Therefore, the effects of the action become effective only when all the sub-tasks have been completed. On the other hand, the resources of skill j allocated for the task become available as soon as the sub-task requiring skill j terminates, even if the other sub-tasks are still executing.
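For instance, with the duration rule above, a phase whose sub-tasks require 6 SA man-days and 2 D man-days, staffed with 3 SAs and 2 Developers, lasts max(6/3, 2/2) = 2 days. A minimal helper illustrating this rule (the function name and data layout are ours, not the paper's):

    def phase_duration(subtask_days, allocated):
        """Duration of a phase: the longest of its sub-tasks, where each
        sub-task of skill i takes subtask_days[i] / allocated[i] days."""
        return max(days / allocated[skill] for skill, days in subtask_days.items())

    # SA sub-task of 6 man-days with 3 SAs, D sub-task of 2 man-days with 2 Developers.
    print(phase_duration({"SA": 6.0, "D": 2.0}, {"SA": 3, "D": 2}))   # 2.0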

Modelling such a scenario is not trivial, and the proposed solution is illustrated in Figure 7. For each phase of the business process, an envelope action is used, whose duration is left to the planner. Then, the task is split into k durative actions (where k is the number of different subtasks required to complete the phase), whose duration depends on the resources previously allocated by the planner.

As an example, Figure 5 shows the envelope action to perform a phase, while Figure 6 shows the action for the sub-task requiring domain experts. Furthermore, the model allows the presence of continuous non-linear constraints over resources, as the business model design is expressed in terms of a temporal planning problem.

(:durative-action execute_phase
  :parameters (?p ?dp1 ?dp2 ?dp3 - phase)
  :duration (and (<= ?duration (upper_bound ?p)))
  :condition (and
    (at start (todo ?p))
    (at start (running_project))
    (at start (depends ?p ?dp1 ?dp2 ?dp3))
    (at start (completed ?dp1))
    (at start (completed ?dp2))
    (at start (completed ?dp3))
    (at end (completed_DE ?p))
    (at end (completed_SA ?p))
    (at end (completed_TA ?p)))
  :effect (and (at start (doing ?p))
               (at end (completed ?p))
               (at end (not (doing ?p)))
               (at end (not (todo ?p)))))

Figure 5: An example of envelope-execution action

(:durative-action executive_subtask_DE
  :parameters (?p - phase)
  :duration (= ?duration (/ (duration_subtask_DE ?p) (allocated_DE ?p)))
  :condition (and (at start (todosubtask_DE ?p))
                  (over all (doing ?p))
                  (at end (>= (employed_DE) (allocated_DE ?p))))
  :effect (and (at start (not (todosubtask_DE ?p)))
               (at end (completed_DE ?p))))

Figure 6: An example of subtask-execution action

3.2 The Planning Problem
The goal is to complete the whole project satisfying all the dependencies. Furthermore, the total project cost must be within the given budget. To this end, the goal has the condition that total_project_cost must be no greater than budget, where total_project_cost depends on the number of resources with different skills employed and on the per-day cost of each resource (as shown in Table 2). As said before, we are interested in minimising time-to-market. This is mapped into the planning metric (:metric minimize (total-time)).

Skill   per-day cost   employment cost
DE      5              30
SA      5              30
TA      4              25
TE      2              20
ME      5              30
M       2              20
D       1.25           12.5
T       1              10

Table 2: Normalized costs of resources of different skills
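Combining Table 2 with the employ and payOne actions of Figure 3, each employed resource of a given skill contributes its one-off employment cost plus its per-day cost multiplied by project_time. The snippet below only illustrates this bookkeeping on made-up staffing figures; the dictionary layout and function name are ours.

    COSTS = {  # (per-day cost, employment cost), from Table 2
        "DE": (5, 30), "SA": (5, 30), "TA": (4, 25), "TE": (2, 20),
        "ME": (5, 30), "M": (2, 20), "D": (1.25, 12.5), "T": (1, 10),
    }

    def total_project_cost(employed, project_time):
        """Recruiting cost plus daily cost over the whole project duration."""
        return sum(n * (emp + project_time * per_day)
                   for skill, n in employed.items()
                   for per_day, emp in [COSTS[skill]])

    # Hypothetical staffing: 2 DEs and 3 Developers on a 15-day project.
    print(total_project_cost({"DE": 2, "D": 3}, 15))
    # 2*(30 + 15*5) + 3*(12.5 + 15*1.25) = 210 + 93.75 = 303.75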


Figure 7: Envelope action for task execution

A fragment of the PDDL problem is shown in Figure 8, where we show key elements for phase Compilation (COMP) and resource Developer (D).

The budget is fixed to 3,000. The per_day_cost_D, the employment_cost_D and the maximum number of Developers that can be employed are defined according to the normalized costs of resources shown in Tab. 2.

Then, the predicate (todosubtask_D COMP) is used to specify which kinds of skills are needed to complete the phase (here a Developer is needed to perform the Compilation phase). The dependency graph for the MDSD process is defined through the predicate (depends COMP CIA MBCG DSLT), which constrains the execution of the COMP phase to the completion of three distinct phases, namely Component Interface Assembly, Model-Based Code Generation, and DSL Translation. The duration of phase COMP and of its three subtasks is initialised through upper_bound and duration_subtask respectively, as explained in Sec. 2.2. Finally, as described above, the goal is to find a feasible Business Process Execution according to Def. 2 minimising the time-to-market.

4 Experimental Results
Given the model detailed above, we used the POPF planner (Coles et al. 2010) to synthesise a solution. Figure 9 provides a Gantt chart of an efficient solution found by POPF in less than 30 minutes, working on an x64 Linux machine equipped with 6 GB of RAM. The MDSD process requires about 15 days, with a total cost of 2,686 for people, against an available budget of 3,000. Notice that the maximum number of employees for each skill has been limited to 5.

An approach typically followed by managers - and confirmed by our industrial partner - is to move a resource to a different phase only when the current phase is finished, as the switching of resources between ongoing phases is hard to plan manually. Conversely, the key element of the plan is the optimised parallel execution of several phases,

POPF is an anytime planner, which improves the current solution as more time is given.

(= (MAX_PLAN_LENGHT) 100) ;; sum of T(s) in Tab. 1
(is_last_phase END)
(= (budget) 3000)
(= (total_project_cost) 0)
(= (project_time) 0)

;; Developer
(= (employed_D) 0)
(= (max_D) 5)
(= (available_D) 0)
(= (employment_cost_D) 12.5)
(= (per_day_cost_D) 1.25)

;; columns R(s) of Tab. 1 for a Developer
(todosubtask_D CCI) (todosubtask_D MBCG)
(todosubtask_D DSLT) (todosubtask_D COMP)
(todosubtask_D UT) (todosubtask_D CA)

;; graph of Fig. 1
(depends COMP CIA MBCG DSLT)

;; COMP PHASE
(completed_DE COMP) (completed_TA COMP)
(completed_TE COMP) (completed_ME COMP)
(completed_M COMP) (completed_T COMP)
(completed_SA COMP)

(= (upper_bound HLD) 5)
(= (duration_subtask_D COMP) 5)

(:goal (and (project_completed)
            (<= (total_project_cost) (budget))))

(:metric minimize (total-time))

Figure 8: An extract of the PDDL problem for the COMP phase

as the model allows the switching of a resource between phases even when they are still ongoing. To give a few examples, Figure 11 focuses on the phases Modelling Component Implementation (MCM), Component Interface Assembly (CIA), and Coding of Component Implementation (CCI) of the Gantt chart depicted in Figure 9. These phases start in parallel on the sixth day of the project execution. In particular, the phases Modelling Component Implementation and Component Interface Assembly have the same skill requirements, as shown in Table 1, that is, Software Architects and MDE Experts.

In order to speed up the execution of these parallel phases (on which the next Compilation phase depends), the planner decides to allocate the available Modellers to the Component Interface Assembly task until it ends; then the planner switches them to the same task of the phase Modelling Component Implementation. The same happens for the tasks requiring MDE Experts. On the other hand, all three of these phases need to complete a task which involves Software Architects (boxes with a green vertical texture in Figure 11). As a consequence, the planner decides to assign 4 out of 5 Software Architects to complete the task of phase Coding of Component Implementation. Simultaneously, it first assigns


[Figure 11 chart: days 6-7 of the Gantt chart of Figure 9, showing phases MCM, CIA and CCI with task bars TaskM CIA 3, TaskSA CCI 4, TaskD CCI 5, TaskME CIA 3, TaskME MCM 5, TaskM MCM 5, TaskSA CIA 1 and TaskSA MCM 1.]

Figure 11: A focus on the Gantt chart of Figure 9. Relevant tasks are highlighted with a green vertical texture for Software Architects (SA), with an orange crosshatched texture for MDE Experts (ME), and with a blue slanted texture for Modellers (M).

the remaining Software Architect to phase Component Interface Assembly, and then to the phase Modelling Component Implementation. This non-trivial allocation of resources between tasks allows the planner to compute an optimised task duration, minimising the project time while maximising the budget usage. Finally, a less efficient plan is provided in Figure 10, which requires 57 days to complete the MDSD process. Notice that, in comparison with the optimised solution of Figure 9, this plan recruits fewer employees but requires more budget to complete the process. This example shows that there is much room for optimisation in this domain using automated planning.

5 Conclusion and Future Work
In this paper we have presented how the problem of designing a business model can be cast as a planning problem. Our experience at Tata Consultancy Services over nearly two decades has shown that the correct design of business models can make the difference between successful and unsuccessful projects. However, in spite of the large body of work available for the verification of existing business processes (see for instance (Bianculli, Ghezzi, and Spoletini 2007) and references therein), there is currently no support for the automatic generation of business processes. We have presented a first approach in this direction, modelling the design of the business process as a temporal planning domain and finding plans that minimise time-to-market for a given project budget.

The abstract model described in Section 2.1 captures the key elements of most modelling languages described in the Introduction: we are currently working on automatic translators from mainstream notations (OMG 2012; Clark, Barn, and Oussena 2011) to PDDL, and for the future we envisage automatic tool support to enable the communication between modelling tools and an appropriate planner.

A natural direction for extending the proposed model is to exploit the expressive power of PDDL3.0 (Gerevini and Long 2006) and use preferences to take into account soft constraints and model further Key Performance Indicators. Finally, to deal with the multi-objective optimisation involved in this problem and to provide richer suggestions to business organisations, the use of Pareto frontiers (Sroka and Long 2012) appears promising.

References
Bianculli, D.; Ghezzi, C.; and Spoletini, P. 2007. A model checking approach to verify BPEL4WS workflows. In Proceedings of the IEEE International Conference on Service-Oriented Computing and Applications, SOCA '07, 13–20. Washington, DC, USA: IEEE Computer Society.
Clark, T.; Barn, B. S.; and Oussena, S. 2011. LEAP: a precise lightweight framework for enterprise architecture. In Proceedings of the 4th Annual India Software Engineering Conference, ISEC 2011, 85–94. ACM.
Coles, A. J.; Coles, A. I.; Fox, M.; and Long, D. 2010. Forward-chaining partial-order planning. In Proceedings of the Twentieth International Conference on Automated Planning and Scheduling (ICAPS-10).
Eshuis, R. 2006. Symbolic model checking of UML activity diagrams. ACM Trans. Softw. Eng. Methodol. 15(1):1–38.
Forster, A.; Engels, G.; Schattkowsky, T.; and Van Der Straeten, R. 2007. Verification of business process quality constraints based on visual process patterns. In Theoretical Aspects of Software Engineering, 2007, 197–208.
Gerevini, A., and Long, D. 2006. Preferences and soft constraints in PDDL3. In Proceedings of the ICAPS Workshop on Planning with Preferences and Soft Constraints.
Hinz, S.; Schmidt, K.; and Stahl, C. 2005. Transforming BPEL to Petri nets. Business Process Management 220–235.
Hoffmann, J., and Nebel, B. 2001. The FF planning system: Fast plan generation through heuristic search. J. Artif. Intell. Res. (JAIR) 14:253–302.
Hoffmann, J.; Weber, I.; and Kraft, F. 2012. SAP speaks PDDL: Exploiting a software-engineering model for planning in business process management. Journal of Artificial Intelligence Research 44:587–632.
List, B., and Korherr, B. 2006. An evaluation of conceptual business process modelling languages. In Proceedings of the 2006 ACM Symposium on Applied Computing, 1532–1539. ACM.
OMG. 2012. OMG business process model and notation. www.bpmn.org/. Last accessed: 13 November 2012.
Sroka, M., and Long, D. 2012. Exploring metric sensitivity of planners for generation of Pareto frontiers. In STAIRS, volume 241 of Frontiers in Artificial Intelligence and Applications, 306–317.
UML. 2012. OMG formal specifications. http://www.omg.org/spec/. Last accessed: 12 November 2012.
van der Aalst, W. 1998. The application of Petri nets to workflow management. Journal of Circuits, Systems, and Computers 8(01):21–66.
van der Aalst, W. 1999. Formalization and verification of event-driven process chains. Information and Software Technology 41(10):639–650.


[Figure 9 chart: days 0-14 of the whole project (cost 2686.685), showing black phase bars and blue task bars TaskX Y K for every MDSD phase from RE to SO.]

Figure 9: Gantt chart of a POPF solution. The black bar represents the overall phase duration, whereas a blue bar TaskX Y K represents the execution of task X of phase Y using K allocated employees.


[Figure 10 chart: days 0-56 of the whole project (cost 2750.655), showing phase and task bars, each task executed by a single allocated employee.]

Figure 10: Gantt chart of a POPF feasible solution. The black bar represents the overall phase duration, whereas a blue bar TaskX Y K represents the execution of task X of phase Y using K allocated employees.


Reasoning about Time Constraints in a Mixed-Initiative Calendar Manager

Liliana Ardissono, Giovanna Petrone, Marino Segnan and Gianluca Torta
Dipartimento di Informatica, Università di Torino, Italy
email: {liliana.ardissono, giovanna.petrone, marino.segnan, gianluca.torta}@unito.it

Abstract
Scheduling support is very important for calendar management in order to automatize the execution of possibly complex reasoning tasks. However, an interactive approach is desirable to enable the user to steer the allocation of events, which is a rather personal and critical kind of activity. This paper proposes a mixed-initiative scheduling model supporting the user's awareness during the exploration of the solution space. The paper describes the temporal reasoning techniques underlying MARA (Mixed-initiAtive calendaR mAnager), focusing on the generation of scheduling options and on the characterization of their properties, needed to present the pros and cons of each possible solution to the user.

Keywords: mixed-initiative scheduling, temporal reasoning, constraint satisfaction.

Introduction
Calendar management is burdensome and challenging when schedules are overconstrained or include items having multiple participants, because it requires the verification of a possibly large number of temporal constraints. However, fully automated scheduling support is considered hardly acceptable for this type of activity because it fails to keep the user in control of the decisions to be taken; e.g., see (Berry et al. 2011).

In the attempt to address this issue, we propose a new, mixed-initiative scheduling model that enables the user to temporally allocate multi-user events and tasks in cooperation with the system. The paper also proposes a novel, conservative scheduling policy to suggest calendar revisions by modifying small portions of the schedules, leaving the rest as originally planned or with minor temporal shifts. The idea is that of keeping the changes to the user's plans as local as possible in order to maintain stable daily plans.

Our scheduling model is applied in the MARA Mixed-initiAtive calendaR mAnager, which exploits Temporal Constraint Satisfaction Problem techniques for suggesting safe scheduling solutions across multiple calendars. The mixed-initiative interaction with the user is achieved by invoking the Interval-based TEmporal Reasoner (ITER) for computing a synthesis of the solutions to be proposed to the user, instead of presenting a possibly large number of alternative schedules to choose from. Such a synthesis is based

on the specification of admissible intervals for the allocation of items and of the corresponding impact on the calendars of the involved people. In this way, the user can analyze the solution space at an abstract level and select the paths which are worth exploring in an informed way.

In the following, we first describe the mixed-initiative scheduling support offered by MARA. Then, we discuss in detail the temporal reasoning techniques adopted in ITER. Finally, we present related research and conclusions.

Mixed-Initiative Scheduling in MARA

Calendar Management
MARA supports cross-calendar management, providing the user with an overview of the impact of her/his actions on the schedules of all the involved actors. Figure 1 shows a portion of the User Interface of the system. The table in the upper right portion of the page shows the list of shared calendars and enables the user to select those to be jointly visualized. In order to let the user identify the actors involved in a calendar item (e.g., a meeting), each item has an associated number of vertical bars whose colors correspond to those of the respective calendars; e.g., the participants of event CDD in Figure 1 are Gianluca and Prof. Rossi.

While the user adds new items or revises the existing ones, the system checks whether the temporal constraints of the affected items are satisfied. If any conflicts occur, it notifies the user; moreover, it helps her/him to incrementally explore the solution space in order to quickly evaluate the available options. Specifically:
• The system offers a "Where can I place the task?" feature which helps the user to allocate calendar items. When the user asks for help regarding an item M, MARA presents in a calendar window an overview of the feasible time intervals in which M could be allocated, possibly moving other existing items. Each interval I provides the user with information useful to evaluate its convenience; i.e.: (i) the names of the actors whose existing commitments have to be revised if M is placed in I; (ii) the criticity of placing M in I, given the existing commitments of the involved actors (e.g., the criticity is high if at least one high-priority commitment has to be shifted in order to allocate the item).


Figure 1: Gianluca's calendar, jointly visualized with Prof. Rossi's one.

• The user can select a specific interval for scheduling the item. At that point, the system further enables her/him to steer the scheduling activity by presenting a few alternative revision hypotheses for the calendar; e.g., alternative items might be shifted to allocate M. The user can accept one of the suggestions, in which case the involved actors are informed and asked to confirm the change, or (s)he can go back to the available options and investigate further revision opportunities.

See (Ardissono, Segnan, and Torta 2013) for more information about MARA's User Interface and functionality.

System Architecture
MARA stores the information about calendars, items and actors as lists of objects. While the user revises a calendar, the core of MARA invokes the following modules to check whether the user's actions violate any temporal constraints and to suggest how such conflicts might be solved:
1. Every time an item is manually added/moved, a consistency check module checks the consistency of the temporal constraints associated to the item, to verify whether they are satisfied or not.
2. In case of inconsistency (or, more generally, when requested by the user), the Interval-based TEmporal Reasoner (ITER) supports the addition/movement of an item by identifying the feasible intervals where it could be allocated and the consequent impact on the schedules of the involved actors. Once the user chooses where to place the item, this module generates the corresponding conservative calendar revisions.

MARA is a Java Web application. The ITER module is developed in Perl using the Graph.pm extension module (Hietaniemi 2010) for representing and manipulating STNs. The minimization of the STNs is performed by invoking the implementation of the Floyd-Warshall algorithm included in

Graph.pm. The Java Web application invokes ITER as a local REST service via HTTP.

Temporal Reasoning Underlying MARA

Running Example
In this section we will use the following scenario to illustrate the temporal reasoning performed by ITER. Our running example refers to the calendar displayed in Figure 1. Gianluca is a University staff member and he collaborates with some colleagues (Liliana, Giovanna, Marino) and with the Head of Department, Prof. Rossi.

Gianluca's calendar includes teaching hours (e.g., Progr I, SW), Department meetings (CDD, CCS), student support (Tutoring, Thesist Ugo / Ida), project meetings (e.g., Skype call PRIN, Meet Dr. Neri) and personal commitments (e.g., Baseball, Plumber, Attorney). Some activities have a fixed schedule; e.g., teaching hours and Department meetings. Others are flexible and could be moved to other times if needed; for instance, Gianluca is available for student support on Wednesday, Thursday and Friday.

Suppose that Gianluca has to schedule a 2-hour Staff meeting with Prof. Rossi on Thursday. The event can start at 8.00 and must finish by 16.00 (deadline). In order to accommodate this meeting, the current calendar has to be revised. By analyzing Prof. Rossi's constraints, it can be seen that he is available starting from 10.00 until 13.00. However, Gianluca is busy at that time because he teaches SW from 9.00 to 11.00 and then he meets two students (Thesist Ugo / Ida). The SW lesson cannot be moved. Thus, the only solution is to move Thesist Ugo and Thesist Ida to different times.

In a typical calendar manager this revision would imply that Gianluca first analyzes all the relevant commitments (including Prof. Rossi's ones) in order to identify the items that can be moved and their alternative times; then, he manually moves the selected items; finally, he inserts the new item. Considering that calendar revision is a frequent daily activity, it is worth saving as much effort as possible in it. Thus,


an automatic support to its execution is crucial. Our work attempts to address this need.

Representation of Calendar Items
The ITER module exploits reasoning techniques based on Temporal Constraint Satisfaction Problems (TCSP) (Dechter, Meiri, and Pearl 1991). Specifically, ITER employs a subclass of TCSPs, the Simple Temporal Problems (STPs) (Dechter, Meiri, and Pearl 1991), where all of the constraints are binary and they do not contain any disjunctions, i.e., they have the following format:

    a ≤ Xj − Xi ≤ b

This class of problems can be represented as a graph named Simple Temporal Network (STN), whose consistency can be checked in polynomial time (Planken, de Weerdt, and van der Krogt 2011). Also the minimization of an STN (i.e., the computation, for each pair of variables Xi, Xj, of an interval [amin, bmin] which guarantees the existence of a global solution for the STN) can be done in polynomial time.

The scheduling of a set of calendars is done by reasoning on the joint set of constraints associated to their items. A calendar item is represented as an object having several features, among which the expected duration (number of hours devoted to the item), the earliest start time for scheduling it, the deadline for its completion/end, the schedule (current temporal allocation), the list of participants and a priority (low, medium, high) representing the importance of the commitment.

Given such information, each calendar item M is internally represented by means of:
• Two numeric variables Ms and Me, representing the start and end time of M in a given schedule. For simplicity, we assume that the value of a variable Ms (resp. Me) is the number of one-hour slots between Monday 8.00 and the start (respectively the end) of M. Note, however, that the TCSP techniques we use are able to deal with real numbers, so that we could easily deal with finer granularities of time.
• The temporal constraints on Ms and Me needed to schedule M consistently with its earliest start time, duration and deadline.
• The temporal constraints on Ms and Me needed to impose precedence relations with respect to other calendar items.

For instance, given an item Staff Meeting (SM in Figure 1):
• The earliest start time, Thursday at 8.00, is expressed as SMs ≥ 36 because in the calendar there are 36 one-hour slots between Monday 8.00 and Thursday 8.00.
• The deadline, Thursday 16.00, is expressed as SMe ≤ 44.
• The duration, 2 hours, is SMe − SMs = 2, as the item takes 2 time slots.
• A precedence relation among calendar items, e.g., the fact that SM must take place after another event E, is expressed as SMs − Ee ≥ 0.
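To make the encoding above concrete, the following sketch (ours, not part of MARA, which is implemented in Perl on top of Graph.pm) builds the distance-graph form of a small STN containing the Staff Meeting constraints and runs Floyd-Warshall to check consistency and extract minimized bounds. A constraint a ≤ Xj − Xi ≤ b becomes the pair of weighted edges Xi → Xj with weight b and Xj → Xi with weight −a; the network is consistent iff the graph has no negative cycle.

    import itertools

    def floyd_warshall(nodes, edges):
        """All-pairs shortest paths on the STN distance graph.
        edges: dict (u, v) -> weight. Returns None if a negative cycle exists."""
        INF = float("inf")
        d = {(u, v): (0 if u == v else edges.get((u, v), INF))
             for u in nodes for v in nodes}
        for k, i, j in itertools.product(nodes, repeat=3):   # k is the outer loop
            if d[i, k] + d[k, j] < d[i, j]:
                d[i, j] = d[i, k] + d[k, j]
        if any(d[u, u] < 0 for u in nodes):
            return None          # negative cycle: the STN is inconsistent
        return d

    def add_constraint(edges, xi, xj, a, b):
        # Encode a <= Xj - Xi <= b as two distance-graph edges.
        edges[(xi, xj)] = min(b, edges.get((xi, xj), float("inf")))
        edges[(xj, xi)] = min(-a, edges.get((xj, xi), float("inf")))

    INF = float("inf")
    # Z is the reference time point (Monday 8.00); slots are one hour long.
    nodes = ["Z", "SMs", "SMe"]
    edges = {}
    add_constraint(edges, "Z", "SMs", 36, INF)    # earliest start: SMs >= 36
    add_constraint(edges, "Z", "SMe", -INF, 44)   # deadline: SMe <= 44
    add_constraint(edges, "SMs", "SMe", 2, 2)     # duration: SMe - SMs = 2
    d = floyd_warshall(nodes, edges)
    if d is not None:
        # Minimized bounds for SMs relative to Z: [-d[SMs,Z], d[Z,SMs]]
        print(-d[("SMs", "Z")], d[("Z", "SMs")])   # 36 42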

With slight abuse we use the term deadline also to indicate constraints on the exact end of a calendar item; e.g., the fact that an item P must end exactly on Wednesday at 13.00 is represented as Pe = 29 (i.e., 29 ≤ Pe ≤ 29).

ITER: Computing the Feasible Intervals
Given the temporal constraints of the calendar and an item M to be added (or moved), ITER:
• Searches for feasible intervals for allocating M. For each interval, it computes the criticity on the basis of the priorities of the existing items which should be moved to allocate M in the corresponding time window.
• Generates a schedule, given a (feasible) start time for M selected by the user.

ALGORITHM 1: Feasible intervals for starting a new calendar item M for a specific user Ui.
input:
  new item M
  other Ui items in current schedule order (Ti,1, ..., Ti,ki)
  STN Ni (temporal constraints for M, Ti,1, ..., Ti,ki)
  Si current schedule for Ti,1, ..., Ti,ki
1  foreach C ∈ {⊥, L, M, H} do
2      Ii^C ← Intervals(M, (Ti,1, ..., Ti,ki), Ni, Si, C)
3  end
4  Ii ← DComp(Ii^⊥, Ii^L, Ii^M, Ii^H);
5  return Ii

Note that, differently from a classic scheduler, ITER computes a set of intervals within which the item can be placed, instead of a set of specific places. In this way, ITER supports a compact and synthetic presentation of the scheduling options, to be further refined. As we shall see, each interval corresponds to different revisions of the calendars, based on which time point within the interval will be chosen by the user, as well as on additional user input.

The feasible intervals and the schedules that are computed by ITER are restricted by conservativeness criteria. In general, this means that placing M in a feasible interval does not require a heavy revision of the current schedule. In our current implementation, we adopt a simple conservativeness criterion, requiring that the relative order of the existing items in the schedule can be maintained when M is added.

For simplicity, we will first consider feasible intervals which do not require moving any already scheduled items involving multiple users. The relaxation of this restriction will be discussed later, in the section on handling existing items involving multiple actors.

We introduce a notation in order to tag feasible intervals with information about their criticity. Each computed interval IM,i will have an associated label:

    l(IM,i) = {U1^C1, ..., Um^Cm}

where U1, ..., Um are the users involved by M and the Ci superscripts denote the criticity of the interval for each actor Ui; i.e., the impact on Ui's calendar of allocating


M in the interval IM,i, in terms of revisions. Specifically, Ci ∈ {⊥, L, M, H}, where:

• ⊥ means that the schedule of Ui is not affected by M;

• L: only low-importance commitments of Ui have to be shifted to allocate M;

• M: only medium- and low-importance commitments of Ui have to be shifted;

• H: at least one high-importance commitment of Ui has to be shifted.

The computation of the set of feasible intervals IM for M is split in two steps:

1. Computing the feasible intervals Ii = (Ii,1, ..., Ii,ni) for each user Ui. Each interval Ii,j is assigned a label l(Ii,j) = Ui^Cj with Cj ∈ {⊥, L, M, H}; the label specifies how critical it is to allocate M in that interval, given the temporal constraints of Ui's commitments.

2. Computing the joint feasible intervals IM and their labels from user intervals Ii in order to synthesize a (limited) number of options to be considered for scheduling M by its organizer.

Computing the User Intervals. Algorithm 1 (shown above) concerns a single user Ui and implements the first step. It takes as inputs: (i) the new item M to be allocated; (ii) Ui's other items (Ti,1, ..., Ti,ki) in the order in which they appear in the current schedule; (iii) an STN Ni encoding the deadline, duration and precedence constraints for Ti,1, ..., Ti,ki and M; and, (iv) the current scheduled times Si for Ti,1, ..., Ti,ki.

In the loop starting at line 1, the algorithm computes four sequences of intervals Ii^⊥, Ii^L, Ii^M, Ii^H by invoking procedure Intervals, described below. Each sequence Ii^C contains (ki + 1) intervals, one for each position where M can be placed in the order of the existing items Ti,1, ..., Ti,ki. For each position j, interval Ii,j^C ∈ Ii^C represents the set of time points t such that starting M at time t requires to modify Ui's schedule by shifting items of importance C or lower.

The intervals in sequences Ii^⊥, Ii^L, Ii^M, Ii^H can overlap both within the same sequence and among different sequences. Thus, in line 4 the algorithm invokes procedure DComp to merge Ii^⊥, Ii^L, Ii^M, Ii^H into a single sequence Ii of labeled and ordered disjoint intervals. Each overlapping portion is labeled with the lowest (i.e., best) criticity class to which it belongs.

Procedure Intervals is called to generate a sequence of intervals of given criticity C. Therefore, it immediately adds the scheduled times of tasks of classes C′ > C to the temporal constraints encoded by STN Ni, obtaining STN Ni^C (line 2). With these added constraints, Ni^C ensures that no task with criticity C′ > C can be moved.

Then, Intervals considers each possible positioning j of M in the sequence of existing items Ti,1, ..., Ti,ki involving Ui. In this way a total order ≺ is determined among all the items, including M, and can be asserted as a set of precedence constraints into the STN. The resulting STN Ni,j^C is then minimized, yielding a feasible interval Ii,j^C = [min, max] for the start of M. As said above, for each position j of M, such an interval represents the set of time points t such that starting M at time t requires to shift items having importance C or lower in Ui's schedule. Intervals Ii,j^C are added to sequence Ii^C and, after all the positions have been considered, such a sequence is returned.

Procedure Intervals - feasible intervals of a given class C for starting item M.
input:
  new item M
  other Ui items in current schedule order (Ti,1, ..., Ti,ki)
  STN Ni (temporal constraints for M, Ti,1, ..., Ti,ki)
  Si current schedule for Ti,1, ..., Ti,ki
  C criticity class of the computed intervals
1  Ii^C ← ();
2  Ni^C ← assert scheduled times of tasks of classes C′ > C in Ni;
3  for j = 0 ... ki do
4    ≺ ← (Ti,1, ..., Ti,j, M, Ti,j+1, ..., Ti,ki);
5    Ni,j^C ← assert order ≺ in Ni^C;
6    minimize Ni,j^C;
7    Ii,j^C ← get interval [min, max] for Ms from Ni,j^C;
8    Ii^C ← Ii^C · (Ii,j^C);
9  end
10 return Ii^C

Figure 2: STN N2^H representing Gianluca's constraints on Thursday when Staff Meeting is placed between S3 and TU and all existing items can be shifted.

Example 1 Let us refer to the running example and consider the execution of Intervals when it is invoked on user Gianluca for adding the Staff meeting SM on Thursday with the class parameter C set to H. That day, the items already allocated for Gianluca include, in the order: Attorney (AT), SW/3 (S3), thesist Ugo (TU), thesist Ida (TI) and Plumber (PL).

Figure 2 shows a portion of the STN N2^H computed by Intervals for user Gianluca. Such a network corresponds to the iteration of the Intervals procedure when the for loop has set position j = 2 (i.e., between S3 and TU).

The z time point represents Monday 8.00 and the boldface intervals on the arcs express the minimum and maximum


distance between the connected time points. For example, interval [36, 42] on the arc connecting z and SMs represents: 36 ≤ SMs − z ≤ 42; i.e., SM must start on Thursday between 8.00 and 14.00. The dashed arcs represent the precedence between two items T′, T″ in the current order ≺. Their associated intervals, omitted for readability, would be [0, +∞]; i.e., T″s must follow T′e by at least 0 hours. Moreover, each item is labeled with its importance; e.g., AT has low importance and TU has medium importance.

The intervals after the minimization of N2^H are shown in italics. Specifically, the intervals computed for SMs, SMe are, respectively, [39, 41] (Thursday 11.00 to 13.00) and [41, 43] (13.00 to 15.00). Indeed, when SM is positioned between S3 and TU, it can start only after the end of S3 (11.00) and its latest end (15.00) must leave enough time for TU, TI and PL to be completed by 20.00. The start interval [39, 41] is added to the sequence Ii^H of feasible intervals with class H for Gianluca.

Let us now consider the interval for SMs when position is j = 2 but the class C is L, i.e., scheduled items of importance M and H cannot be shifted. In such a case, the interval for SMs is ∅ because TU cannot be moved (it has medium importance); thus, SM cannot be allocated between S3 and TU.

By repeating the process, it is easy to see that the non-empty intervals of class H for user Gianluca are: [39, 41] (j = 2), [40, 42] (j = 3), [41, 42] (j = 4). The M intervals are the same but the L and ⊥ intervals are empty.

Procedure DComp receives sequences Ii^⊥, Ii^L, Ii^M, Ii^H and composes them into a single sequence Ii of ordered, labeled, disjoint intervals. In line 1, it orders the start and end time points of each of the input intervals and it stores them in a list T. This operation consists of two steps:

1. Flatten each input list of intervals Ii^C = (Ii,0^C, ..., Ii,ki^C) into an ordered list of start and end time points (s(Ii,0^C), ..., e(Ii,ki^C));

2. Merge such ordered lists into list T. In case of ties, put start time points before end time points, and further order start (resp., end) points in increasing (resp., decreasing) order of their criticity classes (⊥, L, M, H). In other words, if several intervals start at the same time, the start of the best (i.e., lowest class) is the first one in the order. Moreover, if several intervals end at the same time, the end of the best interval is the last one.

In the procedure, line 2 initializes the output sequence Ii to the empty list. Moreover it sets to 0 the counter nin of input intervals and the counter nC of input intervals of class C to which the elements of T belong. The loop starting at line 3 considers each t ∈ T. If t is the start point of an input interval Ii,j^C (line 4), nin and nC (intervals to which t belongs) are incremented. Then:

• If nin is equal to 1, t corresponds to the start s(I) of a new output interval I (line 7) whose label is set to Ui^C, because the input interval has class C.

• If nin > 1 but the class C of the input interval is lower than the label l of the current output interval, a new output interval has to be created after closing and adding to the output sequence I the current output interval (line 10).

Procedure DComp - disjunctive composition of labeled intervals.
input:
  sequences Ii^⊥, Ii^L, Ii^M, Ii^H, each one containing intervals Ii,j^C, j = 0, ..., ki
1  T ← ordered list of time points s(Ii,j^C), e(Ii,j^C) for all given Ii,j^C;
2  Ii ← (); nin ← 0; nC ← 0, C ∈ {⊥, ..., H}; l ← Ui^⊥;
3  foreach t ∈ T do
4    if (t = s(Ii,j^C)) then                       // start of an input interval
5      nin ← nin + 1;
6      nC ← nC + 1;
7      if (nin = 1) then                           // start of an output interval
8        s(I) ← t;
9        l ← Ui^C
10     else if (C < l) then                        // new output interval with label C
11       e(I) ← t, l(I) ← l;
12       Ii ← Ii · I;
13       s(I) ← t;
14       l ← Ui^C
15     end
16   end
17   if (t = e(Ii,j^C)) then                       // end of an input interval
18     nin ← nin − 1;
19     nC ← nC − 1;
20     if (nin = 0) then                           // end of an output interval
21       e(I) ← t, l(I) ← l;
22       Ii ← Ii · I;
23     else if ((l = C) and (nC = 0)) then         // new output interval with label > C
24       e(I) ← t, l(I) ← l;
25       Ii ← Ii · I;
26       s(I) ← t;
27       l ← Ui^Cnew where Cnew = min{C′ : nC′ > 0}
28     end
29   end
30 end
31 return Ii

If t is the end point of an input interval Ii,j^C (line 17), nin and nC are decremented.

• If nin is equal to 0, t is the end e(I) of the current output interval I (line 20), which has to be closed and added to sequence I.

• If nin > 1 but the class C of the input interval is equal to the label l of the current output interval and nC = 0, a new output interval has to be created after closing and adding to the output sequence I the current output interval (line 23). In line 27 the label of the new output interval is set to the minimum (best) class Cnew associated with at least one input interval which includes t, i.e. such that nCnew > 0.
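The composition step can also be rendered compactly in Python (a sketch under our own data-structure assumptions, not the MARA code): per-class interval sequences are swept slice by slice, and each maximal slice is labeled with the best, i.e. lowest, covering class.

CLASSES = ["B", "L", "M", "H"]          # "B" stands for the ⊥ (no-impact) class

def dcomp(sequences):
    """Compose per-class interval sequences into disjoint labeled intervals.

    sequences: dict class -> list of (start, end) pairs (possibly overlapping).
    Returns a list of (start, end, best_class) with best = lowest class index."""
    points = sorted({p for ivs in sequences.values() for s, e in ivs for p in (s, e)})
    out = []
    for lo, hi in zip(points, points[1:]):
        # best (lowest) class whose intervals cover the elementary slice [lo, hi]
        covering = [c for c in CLASSES
                    if any(s <= lo and hi <= e for s, e in sequences.get(c, []))]
        if covering:
            label = covering[0]
            if out and out[-1][2] == label and out[-1][1] == lo:
                out[-1] = (out[-1][0], hi, label)     # extend the previous piece
            else:
                out.append((lo, hi, label))
    return out

# Gianluca's Thursday example: H and M sequences coincide, L and ⊥ are empty.
seqs = {"H": [(39, 41), (40, 42), (41, 42)], "M": [(39, 41), (40, 42), (41, 42)]}
print(dcomp(seqs))      # -> [(39, 42, 'M')]

On Gianluca's Thursday sequences this returns the single interval [39, 42] labeled M, matching Example 2 below.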


Example 2 As shown in Example 1, the H and M sequences for Gianluca contain the following intervals: [39, 41] (j = 2), [40, 42] (j = 3), [41, 42] (j = 4). The sequence T of time points considered by DComp is therefore:

T = (39_s^M, 39_s^H, 40_s^M, 40_s^H, 41_s^M, 41_s^H, 41_e^H, 41_e^M, 42_e^H, 42_e^H, 42_e^M, 42_e^M)

where we have marked each t ∈ T with the class of its interval and its role (start/end). When t = 39_s^M is considered, a new output interval I is started with Is = 39 and label Gl^M (for Gianluca). Time point t = 39_s^H is skipped because its class is higher than the current output class (line 10 of DComp). For the same reason, time points 40_s^M, 40_s^H, 41_s^M, 41_s^H are skipped. Note that meanwhile the number nin of feasible input intervals has grown to 6, with nH = 3 and nM = 3. For this reason also ending time points 41_e^H, 41_e^M, 42_e^H, 42_e^H, 42_e^M are skipped (line 20 of DComp). When the last point t = 42_e^M is considered, nin drops to 0 and the (only) output interval [39, 42] with label Gl^M is returned in the sequence Ii for Gianluca.

Computing the Joint Intervals. For each user Ui involved in M, Algorithm 1 computes a sequence of ordered, disjoint intervals Ii such that each interval Ii,j has a label l(Ii,j) which takes values in Ui^C, C ∈ {⊥, L, M, H}.

Given that the involved users are {U1, ..., Um}, ITER computes a single sequence of ordered, disjoint intervals IM = (IM,1, ..., IM,n) such that each element IM,j of IM represents a jointly feasible interval for starting M. Interval IM,j is labeled based on the labels of the user intervals from which it is derived. Given I1, ..., Im, the sequence of jointly feasible intervals IM satisfies the following conditions:

• Two time points t, t′ belong to an interval IM,j iff for each involved user Ui they belong to the same user interval Ii,ji ∈ Ii.

• The label l(IM,j) associated with interval IM,j is given by ∪i l(Ii,ji).

The computation of IM is performed by a procedure JComp (joint composition) analogous to DComp. The procedure receives the sequences of intervals Ii for all users Ui and produces the single sequence IM of jointly feasible intervals.

Example 3 As shown in Example 2, the only interval for user Gianluca is [39, 42] with label Gl^M. Let us assume that Prof. Rossi's BUSY3 cannot be moved and that BUSY4 could be postponed to 14.00. Then, for Prof. Rossi there are two feasible intervals: [38, 39] with label Ro^⊥ (i.e., his schedule does not have to be modified) and [39, 40] with label Ro^M (medium importance task BUSY4 has to be moved). The invocation of JComp on the sequences of intervals for Gianluca and Prof. Rossi returns a single interval [39, 40] with label {Ro^M, Gl^M}. Indeed, [39, 40] is the only feasible time window for both Prof. Rossi and Gianluca; moreover, the interval has impact M on both actors. Therefore, SM must start at 11.00 or later and end by 14.00.
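JComp can be sketched in the same style (again our own illustrative code, not the MARA implementation; degenerate single-point overlaps are simply dropped here): user sequences of disjoint labeled intervals are intersected, and each joint piece collects one class per user.

def jcomp(user_sequences):
    """Intersect per-user sequences of disjoint labeled intervals.

    user_sequences: dict user -> list of (start, end, class).
    Returns a list of (start, end, {user: class}) jointly feasible intervals."""
    users = list(user_sequences)
    joint = [(s, e, {users[0]: c}) for s, e, c in user_sequences[users[0]]]
    for user in users[1:]:
        new_joint = []
        for s1, e1, labels in joint:
            for s2, e2, c in user_sequences[user]:
                lo, hi = max(s1, s2), min(e1, e2)
                if lo < hi:                      # keep only non-degenerate overlaps
                    new_joint.append((lo, hi, {**labels, user: c}))
        joint = new_joint
    return joint

# Example 3: Gianluca has [39, 42] with class M; Prof. Rossi has
# [38, 39] with class ⊥ (written "B") and [39, 40] with class M.
print(jcomp({"Gl": [(39, 42, "M")],
             "Ro": [(38, 39, "B"), (39, 40, "M")]}))
# -> [(39, 40, {'Gl': 'M', 'Ro': 'M'})]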

ITER: Computing Revisions to a Calendar

We briefly describe the computation of revisions to shared calendars because it only requires running again the Intervals procedure used for computing the feasible intervals.

Let us consider the feasible interval [39, 40] with label {Ro^M, Gl^M} computed for SM, and let us suppose that Gianluca places SM at time point 40 (12.00 pm). In order to compute the alternative revisions, Intervals has to be invoked with the additional constraints that SM starts at time point 40 and that this allocation has impact M on Gianluca's and Prof. Rossi's calendars. For Gianluca, SM can be placed between S3 and TU, or between TU and TI; for Prof. Rossi, between BUSY3 and BUSY4.

The first revision of Gianluca's calendar is obtained by considering the minimized STN computed by placing SM between items S3 and TU, which also contains feasible intervals for the other items in the calendar. For each such item we choose a start time point as close as possible to its current schedule, which results in pushing TU to 14.00, TI to 15.00, and PL to 16.00. The second revision of Gianluca's calendar and the only available revision of Prof. Rossi's calendar are computed in a similar way.

Handling existing items involving multiple actors

The results presented in the previous section hold when existing items which involve multiple actors (henceforth, meetings) cannot be anticipated nor postponed. However, when looking for the feasible intervals for a new item M, it is desirable to consider re-scheduling such items. Let us assume that M involves a set of actors U = {U1, ..., Um}. The previous meetings of users U can be of two kinds:

1. They involve only (some of) the members of U.

2. They have additional participants U′ which are not involved in M.

Re-scheduling calendar items of the second kind can result in a domino effect: actors in U′ may have existing meetings with yet other users U″, and so forth. Thus, adding M may lead to revise schedules of people having an indirect connection with users U. While we believe this is an interesting problem, that may likely involve some forms of automatic negotiation, such propagations are out of the scope of this paper. Thus, we add the following restriction: if an existing calendar item which involves actors U also involves other actors U′, it can only be moved to time slots where the members of U′ are available.

In the following we discuss how to handle the items involving subsets of U. Handling additional users with the above stated restriction trivially consists in pruning some of the solutions computed for users U, based on the free time slots of users U′. Therefore, we do not discuss it.

In order to handle existing meetings among users U, we first partition them in families U1, ..., Up (an illustrative sketch of this grouping follows the list below) such that:

• Users U, U′ should be in the same family U if there is a meeting involving U, U′.

• Families should be disjoint, i.e., if two families U′, U″ share at least one user, they are replaced by a new family U = U′ ∪ U″.
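The family construction described by the list above is essentially a connected-components computation; a minimal union-find sketch (our own rendering, with hypothetical user names; the paper does not prescribe a specific algorithm) is:

def partition_into_families(users, meetings):
    """Group users into disjoint families: users sharing a meeting end up together.

    meetings: iterable of participant sets. Returns a list of frozensets."""
    parent = {u: u for u in users}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]        # path halving
            u = parent[u]
        return u

    def union(u, v):
        parent[find(u)] = find(v)

    for participants in meetings:
        participants = list(participants)
        for other in participants[1:]:
            union(participants[0], other)
    families = {}
    for u in users:
        families.setdefault(find(u), set()).add(u)
    return [frozenset(f) for f in families.values()]

# Hypothetical example: Gianluca and Prof. Rossi share a meeting, Ugo does not.
print(partition_into_families(["Gl", "Ro", "Ugo"], [{"Gl", "Ro"}]))
# -> [frozenset({'Gl', 'Ro'}), frozenset({'Ugo'})]  (element order may vary)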


The previously described techniques can be applied to this case by computing the sequences of intervals Ii for each family instead of for each user. Indeed, if users U′1, ..., U′q in a family have an existing shared meeting M′, each of them has an item M′i in her/his calendar representing M′ and the start/end times of items M′i, i = 1, ..., q must be equal. For this reason, we have to build and use an STN that encodes the calendar constraints of all the users in the family. After the sequences for each family have been computed, they can be merged with the previously described JComp procedure for computing jointly feasible intervals.

We now explain how Algorithm 1, Intervals and DComp are adapted to operate on families of users.

Let us start by considering the changes to Algorithm 1. Instead of receiving a single sequence of items (Ti,1, ..., Ti,ki) and a single schedule Si associated with a user Ui, the algorithm must receive a sequence (Ti,1, ..., Ti,ki) and a schedule Si for each user Ui in a family U = {U1, ..., Uq}. Moreover, instead of an STN Ni encoding the basic constraints for the items of an individual user, it must receive an STN NU encoding the basic constraints for the items of all of the users in family U.

Procedure Intervals returns a set of 4^q sequences, each one associating a class to each user in the family. For example, if U = {U1, U2}, Intervals must return a sequence I_{U1,U2}^{⊥,⊥}, a sequence I_{U1,U2}^{⊥,L}, and so forth. Moreover, the number of positions to be considered by Intervals for placing the new meeting M (line 3) is now:

(k1 + 1) · ... · (kq + 1)

For example, if the existing tasks of U1 are (T1,1, T1,2) (k1 = 2) and the existing tasks of U2 are (T2,1, T2,2, T2,3) (k2 = 3), we must consider scheduling M at position "0 for U1 and 0 for U2" (i.e., before both T1,1 and T2,1), at position "0 for U1 and 1 for U2" (i.e., before T1,1 and between T2,1 and T2,2), and so on.

Therefore, Intervals computes 4^q sequences of size (k1 + 1) · ... · (kq + 1), and such sequences are passed as inputs to DComp. DComp is almost unchanged but there is an important point to make about the labels. While for the intervals of a single user labels are totally ordered (as ⊥ < L < M < H), this is no longer true for the intervals of a family. For example, labels U1^⊥U2^L (i.e., some low-importance tasks of U2 have to be shifted) and U1^LU2^⊥ (i.e., some low-importance tasks of U1 have to be shifted) are not ordered. As a consequence, the if starting at line 7 of DComp should have one additional branch, covering the case when the class C of the starting input interval is not comparable with the label l of the current output interval. In such a case, a new output interval should be started with label (C, l).

Related Work

The importance of a mixed-initiative approach to scheduling calendars was recognized in previous works, such as (Cesta, D'Aloisi, and Brancaleoni 1996) and (Berry et al. 2011). However, relatively little work has been done on this topic so far.

Some recent calendar managers (e.g., Google Calendar Smart Rescheduler (Marmaros 2010)) analyze the estimated duration and temporal constraints of the items to be scheduled in order to identify the available time slots where they could be allocated. However, they cannot suggest any schedule revisions for addressing temporal conflicts.

Many task managers such as Things (Cultured Code 2011) and Standss Smart Schedules for Outlook (Standss 2012) manage tasks and deadlines but they have no scheduling capabilities. Others are very powerful but they require too much information from the user for everyday activity management, and/or they only handle single-user tasks; e.g., see (Refanidis and Yorke-Smith 2010).

Opportunistic schedulers synchronously guide the user in the execution of activities; e.g., see (Horvitz and Subramani 2007). However, they cannot present an overview of long-term schedules.

PTIME (Berry et al. 2011) adopts a mixed-initiative approach and generates personalized scheduling options by learning the user's preferences. A major difference with respect to MARA is the fact that it proposes complete solutions to choose from, instead of interacting with the user during the exploration of the solution space, represented as a set of feasible intervals for adding/moving a task.

The TCSP-based classic techniques used in the paper (Dechter, Meiri, and Pearl 1991) have been extensively studied and extended in the temporal reasoning and planning/scheduling communities. Of particular interest for extensions of this paper are, among others, improved efficiency of STN consistency check (Planken, de Weerdt, and van der Krogt 2011), consistency check of distributed STNs (Boerkoel and Durfee 2010), incremental consistency checks (Planken, de Weerdt, and Yorke-Smith 2011) and temporal preferences (Peintner and Pollack 2004). While these techniques could definitely improve the efficiency and scalability of our approach, we are not aware of any existing work which exploits them in the way proposed in this paper for improving the mixed-initiative user experience in calendar management.

Finally, it is worth mentioning the works by Bresina et al. on mixed-initiative planning of Mars rover missions (Bresina and Morris 2006; 2007). In such works, however, the role of the automated reasoner is that of helping the human to plan the daily activity for the rover by detecting (temporal) inconsistencies and trying to explain them in terms of previous commitments made by the user.

Conclusions

We presented the temporal reasoning support underlying the Mixed-initiative cAlendaR mAnager. MARA exploits Temporal Constraint Satisfaction techniques to generate safe schedules across multiple calendars; it adopts a mixed-initiative interaction model to guide the user in the exploration of the solution space, providing her/him with information about the available options and their impact on the existing commitments of the involved actors. In this way, it helps the user to quickly solve calendar management problems, leaving her/him in control of the scheduling activity.


A preliminary test with users provided encouraging results on the efficacy and usefulness of MARA's calendar management features: users particularly appreciated its awareness support because it enabled them to easily find the allocation options for events and to select the most convenient ones by previewing their impact on people's commitments, without analyzing all the possible solutions in detail.

References

Ardissono, L.; Segnan, G. P. M.; and Torta, G. 2013. Mixed-initiative management of online calendars. In Lecture Notes in Business Information Processing, Web Information Systems and Technologies, 167–182. Springer.

Berry, P.; Gervasio, M.; Peintner, B.; and Yorke-Smith, N. 2011. PTIME: Personalized assistance for calendaring. ACM Transactions on Intelligent Systems 2(4):40:1–22.

Boerkoel, J., and Durfee, E. 2010. A comparison of algorithms for solving the multiagent simple temporal problem. In Proceedings of the 20th Int. Conf. on Automated Planning and Scheduling (ICAPS-10).

Bresina, J. L., and Morris, P. H. 2006. Explanations and recommendations for temporal inconsistencies. In Proc. Int. Work. on Planning and Scheduling for Space.

Bresina, J. L., and Morris, P. H. 2007. Mixed-initiative planning in space mission operations. AI Magazine 28(2).

Cesta, A.; D'Aloisi, D.; and Brancaleoni, R. 1996. Considering the user in mixed-initiative meeting management. In Proc. 2nd ERCIM Workshop on User Interfaces for All.

Cultured Code. 2011. Things Mac.

Dechter, R.; Meiri, I.; and Pearl, J. 1991. Temporal constraint networks. Artificial Intelligence 49:61–95.

Hietaniemi, J. 2010. Graph-0.94.

Horvitz, E., and Subramani, M. 2007. Mobile opportunistic planning: methods and models. In Lecture Notes in Artificial Intelligence n. 4511: Proc. 11th Int. Conf. on User Modeling, 228–237.

Marmaros, D. 2010. Smart rescheduler in Google Calendar Labs.

Peintner, B., and Pollack, M. E. 2004. Low-cost addition of preferences to DTPs and TCSPs. In Proceedings of the National Conference on Artificial Intelligence (AAAI-04), 723–728.

Planken, L.; de Weerdt, M.; and van der Krogt, R. 2011. Computing all-pairs shortest paths by leveraging low treewidth. In Proceedings of the 21st Int. Conf. on Automated Planning and Scheduling (ICAPS-11).

Planken, L.; de Weerdt, M.; and Yorke-Smith, N. 2011. Incrementally solving STNs by enforcing partial path consistency. In Proceedings of the 20th Int. Conf. on Automated Planning and Scheduling (ICAPS-10).

Refanidis, I., and Yorke-Smith, N. 2010. A constraint-based approach to scheduling an individual's activities. ACM Transactions on Intelligent Systems 1(2):12:1–32.

Standss. 2012. Standss Smart Schedules for Outlook.


Efficient DTPP solving with a reduction-based approach

Jean-Remi Bourguet and Luca Pulina
POLCOMING - University of Sassari
Viale Mancini 5, 07100 Sassari, Italy

[email protected],[email protected]

Marco Maratea
DIBRIS - University of Genova

Viale F. Causa 15, Genova, [email protected]

Abstract

Disjunctive Temporal Problems with Preferences (DTPPs) extend DTPs with piece-wise constant preference functions associated to each constraint of the form l ≤ x − y ≤ u, where x, y are (real or integer) variables, and l, u are numeric constants. The goal is to find an assignment to the variables of the problem that maximizes the sum of the preference values of satisfied DTP constraints, where such values are obtained by aggregating the preference functions of the satisfied constraints in it under a "max" semantic. The state-of-the-art approach in the field, implemented in the DTPP solver Maxilitis, extends the approach of the DTP solver Epilitis.

In this paper we present an alternative approach that reduces DTPPs to Maximum Satisfiability of a set of Boolean combinations of constraints of the form l ⋈ x − y ⋈ u, ⋈ ∈ {<, ≤}, that extends previous work that dealt with constant preference functions only. Results obtained with the Satisfiability Modulo Theories (SMT) solver Yices on randomly generated DTPPs show that our approach is competitive with, and can be faster than, Maxilitis. This paper is to appear in the Proc. of AI*IA 2013.

Introduction

Temporal constraint networks (Dechter, Meiri, and Pearl 1991) provide a convenient formal framework for representing and processing temporal knowledge. Along the years, a number of extensions to the framework have been presented to deal with, e.g., more expressive preferences. Disjunctive Temporal Problems with Preferences (DTPPs) are one such extension. DTPPs extend DTPs, i.e. conjunctions of disjunctions of constraints of the form l ≤ x − y ≤ u, where x, y are (real or integer) variables, and l, u are numeric constants, with piece-wise constant preference functions associated to each constraint. The goal is to find an assignment to the variables of the problem that maximizes the sum of the preference values of satisfied disjunctions of constraints (called DTP constraints), where such values are obtained by aggregating the preference functions of the satisfied constraints in it. We consider a


utilitarian aggregation of such DTP constraint values, and a "max" semantic for aggregating preference values within DTP constraints: given a (candidate) solution of a DTPP, the preference value of the DTP constraint is defined to be the maximum value achieved by any of its satisfied disjuncts (see, e.g. (Moffitt 2011)). The actual state-of-the-art approach that considers such aggregation methods is implemented in the DTPP solver Maxilitis, and is based on an extension of the DTP approach of the solver Epilitis (Tsamardinos and Pollack 2003) to deal with piece-wise constant preference functions. Various other approaches have been designed in the literature to deal with DTPPs (Sheini et al. 2005; Moffitt and Pollack 2005; 2006; Moffitt 2011), possibly relying on alternative preference aggregation methods (see, e.g. (Peintner and Pollack 2004; Peintner, Moffitt, and Pollack 2005)).

In this paper we present an alternative approach that reduces DTPPs to Maximum Satisfiability of a set of Boolean combinations of constraints of the form l ⋈ x − y ⋈ u, where ⋈ ∈ {<, ≤}. At first, we have considered a very natural modeling of the problem where the generated constraints are mutually exclusive, and each is weighted by a preference value: the set is constructed in order to maximize the degree of satisfaction of the DTP constraint. Preliminary experiments report that this solution is impractical. A second solution we propose is, instead, obtained by extending previous work that dealt with constant preference functions only (Maratea and Pulina 2012), and reduces each DTP constraint to a set of disjunctions of constraints, and a non-trivial interplay among their preference values to maximize, as before, the preference value of the DTP constraint. In order to test the effectiveness of our proposal, we have randomly generated DTPPs, following the method originally developed in (Peintner and Pollack 2004) and then employed in all other papers on DTPPs. In our framework, each problem is then represented as a Satisfiability Modulo Theories (SMT) formula, and the Yices SMT solver, which is able to deal with optimization issues, is employed.¹ An experimental analysis conducted on a




wide set of benchmarks, using the same benchmark settings already employed in past papers, shows that our approach is competitive with, and can be faster than, Maxilitis.

The rest of the paper is structured as follows. The next section introduces preliminaries about DTPs, DTPPs and Maximum Satisfiability. We then present our reduction from DTPPs to Maximum Satisfiability of Boolean combinations of constraints, followed by the experimental analysis. The paper ends with a discussion of related work and some conclusions.

Formal Background

Problems involving disjunctions of temporal constraints have been introduced in (Stergiou and Koubarakis 1998), as an extension of the Simple Temporal Problem (STP) (Dechter, Meiri, and Pearl 1991), which consists of a conjunction of different constraints. The problem was referred to for the first time as Disjunctive Temporal Problem (DTP) in (Armando, Castellini, and Giunchiglia 1999), and is presented in the first subsection. The remaining subsections introduce Maximum Satisfiability of DTPs and DTPPs.

DTP

Let V be a set of symbols, called variables. A constraint is an expression of the form l ⋈ x − y ⋈ u, where ⋈ ∈ {<, ≤}, x, y ∈ V, and l, u are numeric constants. A DTP constraint is a disjunction of constraints having ⋈ = ≤ (equivalently seen as a disjunctively intended finite set of constraints). A DTP formula, or simply formula, is a conjunction of DTP constraints. A DTP constraint can be either hard, i.e. its satisfaction is mandatory, or soft, i.e. its satisfaction is not necessary but preferred, and in case of satisfaction it contributes to the generation of high quality solutions according to the aggregation methods employed and defined later. A DTPA constraint is a Boolean combination of constraints.

About the semantics, let the set D (domain of interpretation) be either the set of the real numbers R or the set of integers Z. An assignment is a total function mapping variables to D. Let σ be an assignment and φ be a formula composed of hard DTP constraints only. Then, σ |= φ (σ satisfies a formula φ) is defined as follows:

• σ |= l ≤ x − y ≤ u if and only if l ≤ σ(x) − σ(y) ≤ u;

• σ |= ¬φ if and only if it is not the case that σ |= φ;

• σ |= ∧_{i=1}^{n} φi if and only if for each i ∈ [1, n], σ |= φi; and

• σ |= ∨_{i=1}^{n} φi if and only if for some i ∈ [1, n], σ |= φi.

¹ Yices showed the best performance in (Maratea and Pulina 2012) among a number of alternatives, and it is the only SMT solver able to cope with (Partial Weighted) Maximum Satisfiability problems.

If σ |= φ then σ is also called a model of φ. We also say that a formula φ is satisfiable if and only if there exists a model for φ. The DTP is the problem of deciding whether a formula φ is satisfiable or not in the given domain of interpretation D. Notice that the satisfiability of a formula depends on D, e.g. the formula

x − y > 0 ∧ x − y < 1

is satisfiable if D is R but unsatisfiable if D is Z. However, the problems of checking satisfiability in Z and R are closely related and will be treated uniformly.
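For concreteness, the semantics just given can be turned into a small checker (an illustrative sketch; the nested-tuple representation of formulas is our own choice, not the paper's encoding):

def holds(sigma, phi):
    """Evaluate sigma |= phi.

    phi is either a constraint ("le", l, x, y, u) meaning l <= x - y <= u,
    or ("not", f), ("and", f1, ..., fn), ("or", f1, ..., fn)."""
    tag = phi[0]
    if tag == "le":
        _, l, x, y, u = phi
        return l <= sigma[x] - sigma[y] <= u
    if tag == "not":
        return not holds(sigma, phi[1])
    if tag == "and":
        return all(holds(sigma, f) for f in phi[1:])
    if tag == "or":
        return any(holds(sigma, f) for f in phi[1:])
    raise ValueError(tag)

# x - y > 0  ∧  x - y < 1 : satisfiable over R, unsatisfiable over Z.
phi = ("and", ("not", ("le", float("-inf"), "x", "y", 0)),   # x - y > 0
              ("not", ("le", 1, "x", "y", float("inf"))))    # x - y < 1
print(holds({"x": 0.5, "y": 0.0}, phi))   # True with a real-valued assignment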

Max-DTP

Consider now a DTPA formula φ consisting of hard DTP constraints and soft DTPA constraints. Intuitively, in this case the goal is to find an assignment to the variables in φ that satisfies all hard DTP constraints and maximizes the sum of the weights associated to satisfied soft DTPA constraints. The problem is called Partial Weighted Maximum Satisfiability of DTPA, and is formally defined as a pair 〈φ, w〉, where

1. φ is a DTPA formula consisting of both hard DTP and soft DTPA constraints, and

2. w is a function that maps DTPA constraints to positive integer numbers.

More precisely, the goal is to find an assignment σ′ for φ that satisfies all hard DTP constraints and maximizes the following linear objective function f

f = Σ_{d ∈ φ, σ′ |= d} w(d)    (1)

where d is a soft DTPA constraint. In the following, for simplicity we will use Max-DTP to refer to the Partial Weighted Maximum Satisfiability problem of mixed DTP and DTPA constraints as defined above.
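Reusing the holds function from the previous sketch, the objective (1) of a candidate assignment can be computed directly (again illustrative only, not the encoding actually handed to a Max-DTP solver):

def objective(sigma, hard, soft):
    """Objective (1): sum of the weights of satisfied soft DTPA constraints.

    hard: list of formulas; soft: list of (formula, weight) pairs.
    Returns None when some hard DTP constraint is violated."""
    if not all(holds(sigma, h) for h in hard):      # hard constraints are mandatory
        return None
    return sum(w for f, w in soft if holds(sigma, f))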

DTPP

DTPP is an extension of DTP, and it is defined as a pair 〈φ, w′〉, where

1. φ is a DTP formula consisting of both hard and soft DTP constraints, and

2. w′ is a (possibly partial) function that maps constraints in soft DTP constraints to piece-wise constant preference functions.

We consider, as before, a utilitarian method for aggregating soft DTP constraint weights: the goal is now to find an assignment σ′ for φ that (i) satisfies all hard DTP constraints, and (ii) maximizes the sum of weights associated to satisfied soft DTP constraints, i.e. maximizes the linear objective function (1).

It remains to define how weights, corresponding to preference values, are aggregated within soft DTP constraints to "define" their weights w(d) in (1). In our work we consider a prominent semantic for this purpose: the max semantic.



Given a constraint dc := l ≤ x − y ≤ u, its preference function w′(dc) is in general defined as:

w′(dc) : t ⊆ [l, u] → [0, R+]

mapping every feasible temporal interval t to a preference value expressing its weight. The max semantic (Moffitt and Pollack 2005; Moffitt 2011) defines the weight w(d) of a satisfied soft DTP constraint d as the maximum among the possible preference values of satisfied constraints in d, i.e. given an assignment σ′

w(d) := max{w′(σ′(x)− σ′(y)) : dc ∈ d, σ′ |= dc}

Reducing DTPPs to Max-DTPs

As we said before, our main idea is to reduce the problem of solving DTPPs to solving Max-DTPs. Hard DTP constraints remain unchanged in our reduction, while soft DTP constraints need special treatment. Given a soft DTP constraint d, for each constraint dc in d, let Ldc be a set of pairs, each pair 〈DC, v〉 being composed of (i) a set DC of pairs (l′, u′), representing the end points of intervals, such that [l′, u′] ⊆ [l, u], and (ii) the preference value v of the constraints of the type l′ ⋈ x − y ⋈ u′, ⋈ ∈ {≤, <}, extracted from DC, where the variables x, y are obtained from the constraint name. If the preference function is a constant v′, Ldc is composed of only one pair 〈{(l, u)}, v′〉, i.e. the interval [l, u], representing the constraint l ≤ x − y ≤ u, and its preference value v′.

We now need to "aggregate" the preference values corresponding to different levels of the piece-wise constant functions in the various constraints in order to implement our reduction. The idea is to "merge" the pairs 〈DC, v〉, representing the preference functions of the constraints in the same soft DTP constraint; intuitively, this means that, if the candidate solution satisfies at least one of the constraints obtained from DC at preference value v, then a possible preference value for d is v.

More formally, consider aggregating Ldc1 and Ldc2, coming from two constraints dc1 and dc2 in d, respectively. Ldc1∨dc2 := merge(Ldc1, Ldc2) is an operator that

• contains the preference values that are in the preference functions of dc1 or dc2; and

• if the preference functions of dc1 and dc2 have a common preference value, i.e. Ldc1 contains a pair 〈DCi, vi〉, Ldc2 contains a pair 〈DCj, vj〉 and vi = vj, these pairs are merged and Ldc1∨dc2 contains a pair 〈DCi ∪ DCj, vi〉.

Moreover, during merge, pairs (l, u) are attached a subscript, from which we deduce the ordered pair of variables involved in the constraint it represents.

The operator merge can be easily generalized to an arbitrary finite number of constraints.
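A possible rendering of merge in Python (our own illustrative code, anticipating the worked example given below; interval triples carry the name of the originating constraint, playing the role of the subscripts just mentioned):

from collections import defaultdict

def merge(*constraint_levels):
    """merge(L_dc1, L_dc2, ...): fuse pairs <DC, v> sharing the preference value v.

    Each argument is a list of (intervals, v) where intervals is a list of
    (l, u, tag) triples; the tag identifies the originating constraint."""
    by_value = defaultdict(list)
    for levels in constraint_levels:
        for intervals, v in levels:
            by_value[v].extend(intervals)
    # return the merged pairs ordered by decreasing preference value
    return [(ivs, v) for v, ivs in sorted(by_value.items(), reverse=True)]

# Example from the text: dc1 is over x - y, dc2 is over z - q.
L_dc1 = [([(1, 3, "dc1"), (7, 10, "dc1")], 1), ([(3, 7, "dc1")], 2)]
L_dc2 = [([(5, 8, "dc2"), (10, 15, "dc2")], 2), ([(8, 10, "dc2")], 4)]
print(merge(L_dc1, L_dc2))
# -> [([(8, 10, 'dc2')], 4),
#     ([(3, 7, 'dc1'), (5, 8, 'dc2'), (10, 15, 'dc2')], 2),
#     ([(1, 3, 'dc1'), (7, 10, 'dc1')], 1)]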

Consider a soft DTP constraint

d := dc1 ∨ ... ∨ dck (2)

where {dc1, . . . , dck} is the set of constraints in d.

The first attempt we considered for our reduction is to express a soft DTP constraint d using soft DTPA constraints that force the highest preference value associated to satisfied constraints in d to be assigned as weight for d. First, we apply the operator merge to all the constraints in d, and related piece-wise constant preference functions, i.e. Ld := merge(Ldc1, . . . , Ldck).

Further, consider an ordering on the k pairs in Ld of a dc in d induced by the preference values, i.e. an ordering ≺ in which 〈DCi, vi〉 ≺ 〈DCj, vj〉 iff vi < vj, 1 ≤ i, j ≤ k, i ≠ j. For simplicity, from now on we consider the pairs in Ld to be re-ordered according to ≺, i.e. DC1 is the set whose v1 is maximum among the weights in d, i.e. v1 > vi, 2 ≤ i ≤ k, while the set DCk is such that vk < vi, 1 ≤ i ≤ k − 1.

Then, starting from Ld, d and its preference value are expressed by the following |Ld| soft DTPA constraints: for each z = 1 . . . |Ld|

cz := ∧_{i=1}^{z−1} ¬(∨_{p ∈ DCi} dcp) ∧ (∨_{p ∈ DCz} dcp),   w(d) = w(cz) = vz    (3)

where dcp is a constraint built from the pair p (we recall that the subscript identifies the variables involved in the constraint, and in which order). The set of constraints is mutually exclusive: considering an assignment, at most one of the constraints in (3) can be satisfied, and the relative value is assigned to d. If a constraint in (3) is satisfied, this is the constraint leading to the maximum value (according to the candidate solution considered).

This is done for each soft DTP constraint in the formula.

Example. Consider a soft DTP constraint dc1 ∨ dc2, where dc1 : 1 ≤ x − y ≤ 10 and dc2 : 5 ≤ z − q ≤ 15. The piece-wise constant preference function associated to dc1 is

f(dc1) =
  1   if 1 ≤ x − y ≤ 3
  2   if 3 < x − y ≤ 7
  1   if 7 < x − y ≤ 10        (4)

and can be represented with Ldc1 = {〈{(1, 3), (7, 10)}, 1〉, 〈{(3, 7)}, 2〉}.

Regarding dc2, its preference function is

f(dc2) =
  2   if 5 ≤ z − q ≤ 8
  4   if 8 < z − q ≤ 10
  2   if 10 < z − q ≤ 15        (5)

represented with Ldc2 = {〈{(5, 8), (10, 15)}, 2〉, 〈{(8, 10)}, 4〉}. We now "merge" Ldc1 and Ldc2 into Ldc1∨dc2 := merge(Ldc1, Ldc2), whose result is

{〈{(1, 3)1, (7, 10)1}, 1〉, 〈{(3, 7)1, (5, 8)2, (10, 15)2}, 2〉, 〈{(8, 10)2}, 4〉}    (6)

Following (3), the reduction is

c1 : (8 < z − q ≤ 10), w(c1) = 4



c2 : ¬c1 ∧ ((3 < x − y ≤ 7) ∨ (5 ≤ z − q ≤ 8) ∨ (10 < z − q ≤ 15)), w(c2) = 2

c3 : ¬c1 ∧ ¬c2 ∧ (1 ≤ x− y ≤ 3 ∨ 7 < x− y ≤ 10), w(c3) = 1
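The constraints c1–c3 above can be generated mechanically from the merged list; a symbolic sketch (our own representation, reusing merge, L_dc1 and L_dc2 from the earlier merge sketch) is:

def first_reduction(Ld):
    """Build the mutually exclusive soft constraints (3) from a merged list Ld.

    Ld: list of (intervals, v) pairs ordered by decreasing preference value v.
    Returns (formula, weight) pairs; formulas are symbolic nested tuples."""
    soft = []
    for z, (intervals, v) in enumerate(Ld):
        level = ("or",) + tuple(("in", iv) for iv in intervals)
        better = tuple(("not", ("or",) + tuple(("in", iv) for iv in ivs))
                       for ivs, _ in Ld[:z])
        soft.append((("and",) + better + (level,), v))   # c_z, weighted by v_z
    return soft

for formula, weight in first_reduction(merge(L_dc1, L_dc2)):
    print(weight, formula)     # weights 4, 2, 1, mirroring c1, c2, c3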

Further note that the preference functions we have considered are characterized by having the left-most sub-interval with both bounds included, while the remaining sub-intervals have only the right bound included: to correctly reproduce the reduction from the set L, we have further assumed that with the subscript we can recognize the left-most sub-interval of each constraint.

This first reduction corresponds to a very natural way of expressing soft DTP constraints; unfortunately, preliminary experiments show that it is inefficient.

A second reduction transforms each soft DTP constraint d to |Ld| soft DTPA constraints as follows: for each z = 1 . . . |Ld|

c′z := ∨_{i=1}^{z} ∨_{p ∈ DCi} dcp    (7)

The problem is now to define what are the weights associated to each newly defined soft DTPA constraint, in order to reflect the semantic of our problem. In the previous reduction (2), the constraints occurred positively only once; now there can be many occurrences in the corresponding soft DTPA constraints in (7) that influence constraint weights adaptation and definition. Our solution starts from the following fact: if the constraint c′|Ld| (i.e. the one that contains all constraints generated with our method) is satisfied, it is safe to consider that it contributes for at least the minimal preference value v|Ld|, i.e. the one associated to the set DC|Ld|, from which c′|Ld| is constructed. Satisfying the constraint c′|Ld|−1 contributes for v|Ld|−1 − v|Ld| and, given that a constraint c′z implies all constraints c′z′, z′ > z, these two soft DTPA constraints together contribute for v|Ld|−1. This method is recursively applied up to the set of constraints constructable from DC1, i.e. c′1, whose preference value is v1 − v2 and, given that c′1 implies all other introduced soft DTPA constraints, satisfying c′1 correctly corresponds to assigning a weight v1 to d.

More formally, for each z = 1 . . . |Ld|

w(c′z) = v|Ld|          if z = |Ld|
         vz − vz+1      if 1 ≤ z < |Ld|        (8)

and, given an assignment σ′, w(d) = Σ_{z ∈ {1,...,|Ld|}, σ′ |= c′z} w(c′z).

Example. Concerning the second reduction, the soft DTPA constraints that express the constraint d with the preference functions in the previous example are

c′1 := 8 < z − q ≤ 10, w(c′1) = 2

c′2 := c′1 ∨ (3 < x − y ≤ 7 ∨ 5 ≤ z − q ≤ 8 ∨ 10 < z − q ≤ 15), w(c′2) = 1

c′3 := c′1 ∨ c′2 ∨ (1 ≤ x− y ≤ 3 ∨ 7 < x− y ≤ 10), w(c′3) = 1
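The weight assignment (8) can be sanity-checked on this example with a few lines (illustrative only; the satisfaction flags for c′1, c′2, c′3 are supplied by hand for three candidate assignments):

# Preference levels of the example, ordered as in Ld: v1=4, v2=2, v3=1.
v = [4, 2, 1]
weights = [v[z] - v[z + 1] for z in range(len(v) - 1)] + [v[-1]]   # (8): [2, 1, 1]

def weight_of_d(satisfied):
    """Sum the weights of the satisfied c'_z; recall c'_z implies c'_{z'} for z' > z."""
    return sum(w for w, sat in zip(weights, satisfied) if sat)

print(weight_of_d([True, True, True]))    # c'_1 satisfied (z - q in (8,10]) -> 4
print(weight_of_d([False, True, True]))   # best satisfied level is 2        -> 2
print(weight_of_d([False, False, False])) # d not satisfied                  -> 0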

Such a reduction works correctly if we consider a single soft DTP constraint. However, considering a formula φ, given our reduction, it is possible to have repeated DTPA constraints in the reduced formula φ′. In this case, intuitively, we want each single occurrence in φ′ to count "separately", given that they take into account different contributions from different soft DTP constraints in φ. A solution is to consider a single occurrence of the resulting soft DTPA constraint in φ′ whose weight is the sum of the weights of the various occurrences. The same applies to the first reduction.

Experimental Analysis

We have implemented both reductions, and expressed the resulting formulas as SMT formulas with optimization, then solved with Yices ver. 1.0.38. A preliminary analysis showed that the first reduction is not competitive, thus our experimental analysis compares the performance of our second reduction, called dtppYices, with two versions of the Maxilitis solver, namely Maxilitis-IW and Maxilitis-BB. Maxilitis-IW (IW standing for Iterative Weakening) searches for solutions with a progressively increasing number of violated constraints; Maxilitis-BB uses a branch and bound approach for reaching the optimal solution. Our experiments aim at comparing the considered solvers on two dimensions, namely the size of the benchmarks and the number of preference levels in the piece-wise constant preference function, as used in past papers on DTPPs, with the same parameter settings. Moreover, we also investigated the performance of the solvers in the case where the preference values of the levels of the piece-wise preference functions are randomly generated. For randomly generating the benchmarks the main parameters considered are: (i) the number k of disjuncts per DTP constraint; (ii) the number n of arithmetic variables; (iii) the number m of DTP constraints; and (iv) the number l of levels in the preference functions.² For each tuple of values of the parameters, 25 instances have been generated.

The experiments reported in the following ran on PCs equipped with a processor Intel Core i5 running at 3.20 GHz, with 4 GB of RAM, and running GNU Linux Ubuntu 12.04. The timeout for each instance has been set to 300s.

As a first experiment, we randomly generated benchmarks by varying the total amount of constraints, with the following parameters: k=2, m ∈ {10, . . . , 100}, n=0.8 × m, l=5.

² The preference functions considered are semi-convex piece-wise constant: starting from the lower and upper bounds of the constraints, intervals corresponding to higher preference levels are randomly put within the interval of the immediate lower level, with a reduction factor, up to a highest level. For details see, e.g. (Moffitt 2011).



Figure 1: Results of the evaluated solvers on random DTPPs.

The lower and upper bounds of each constraint are taken in [−50, 100]. In this setting, the preference value of the i-th level is i.³

³ These benchmarks have been generated using the program provided by Michael D. Moffitt, author of Maxilitis.

The results obtained in the experiment are shown in Figure 1, which is organized as follows. Concerning the left-most plots, in the x axis we show the total amount of constraints, while in the right-most plots the total amount of levels of the piece-wise constant preference function is reported. The y axis (in log scale) shows the related median CPU time (in seconds). Maxilitis-BB's performance is depicted by blue triangles, Maxilitis-IW's by orange upside-down triangles, and dtppYices' performance is denoted by black circles. Plots in the top row have a preference value corresponding to i for the i-th preference level, while plots in the bottom row are related to random DTPPs whose preference values are randomly generated in {1, . . . , 100} (still ensuring to maintain the same shape for preference functions).

Looking at Figure 1, and considering the top-left plot, we can see that the median time of Maxilitis-BB on benchmarks with 100 constraints runs into timeout. We can also see that up to m = 80, Maxilitis-IW is one order of magnitude of CPU time faster than dtppYices, while for m > 80 the performances of the solvers are in the same ballpark. Now, considering the same analysis in the case where the values of the preference levels are randomly generated, we can see (bottom-left plot) that the picture changes in a noticeable way. Benchmarks are harder than previously: Maxilitis-BB and Maxilitis-IW are not able to efficiently cope with benchmarks with m > 30. In this case, dtppYices is the best


solver, and we report that it is able to deal with benchmarks up to m = 60.

Our next experiment aims to evaluate the solvers by varying the number of levels in the preference functions, with the following parameters: k=2, n=24, m=30, l ∈ {2, . . . , 8}, lower and upper bounds of each constraint taken in [−50, 100]. Top-right and bottom-right plots have the same meaning as before w.r.t. the preference functions. Looking at the top-right plot of Figure 1, we can see that Maxilitis-IW is the best solver up to l = 7, while for l = 8, we report that dtppYices is faster. Also in this case Maxilitis-BB does not efficiently deal with the most difficult benchmarks in the suite. Looking now at the plot in the bottom-right, we can see the same picture as in the bottom-left plot: the performances of both versions of Maxilitis are very similar, while dtppYices is the fastest solver: the median CPU time of both Maxilitis-BB and Maxilitis-IW runs into timeout for l > 5, while dtppYices solves all sets of benchmarks within the time limit. Along with the previous results, this reveals that Maxilitis may have specialized techniques to deal with DTPPs whose preference values are of the first type we have analyzed.

Related Work

Maxilitis (Moffitt 2011; Moffitt and Pollack 2005), WeightWatcher (Moffitt and Pollack 2006) and ARIO (Sheini et al. 2005) implement different approaches for solving DTPPs as defined in (Peintner and Pollack 2004). Maxilitis is a direct extension of the DTP solver Epilitis (Tsamardinos and Pollack 2003), while WeightWatcher uses an approach based on Weighted Constraint Satisfaction problems, even if the



two methods are similar (as mentioned in, e.g., (Moffitt and Pollack 2006)). ARIO, instead, relies on an approach based on Mixed Logical Linear Programming (MLLP) problems. In our analysis we have used Maxilitis because the results in, e.g. (Moffitt 2011) clearly indicate its superior performance.

About the comparison to Maxilitis, our solution is easy, yet efficient, and has a number of advantages w.r.t. the approach of Maxilitis. On the modeling side, it allows considering (with no modifications) both integer and real variables, while Maxilitis can deal with integer variables only. Moreover, our implementation provides a unique framework for solving DTPPs, while the techniques proposed in (Moffitt 2011) are implemented in two separate versions of Maxilitis. Finally, our solution is modular, i.e. it is easy to rely on different back-end solvers (or on a new version of Yices), thus taking advantage of new algorithms and tools for solving our formulas of interest.

Conclusions

In this paper we have introduced a general reduction-based approach for solving DTPPs, that reduces these problems to Maximum Satisfiability of DTPs as defined in the paper. An experimental analysis performed with the Yices SMT solver on randomly generated DTPPs shows that our approach is competitive with, and sometimes faster than, the specific implementations of the Maxilitis solver. The executable of our solver can be found at

http://www.star.dist.unige.it/~marco/DTPPYices/.

Acknowledgment. The authors would like to thank Michael D. Moffitt for providing his solvers and the program for generating random benchmarks, and Bruno Dutertre for his support with Yices.

References

Armando, A.; Castellini, C.; and Giunchiglia, E. 1999. SAT-based procedures for temporal reasoning. In Biundo, S., and Fox, M., eds., Proc. of the 5th European Conference on Planning (ICAPS 1999), volume 1809 of Lecture Notes in Computer Science, 97–108. Springer.

Dechter, R.; Meiri, I.; and Pearl, J. 1991. Temporal constraint networks. Artificial Intelligence 49(1-3):61–95.

Maratea, M., and Pulina, L. 2012. Solving disjunctive temporal problems with preferences using maximum satisfiability. AI Communications 25(2):137–156.

Moffitt, M. D., and Pollack, M. E. 2005. Partial constraint satisfaction of disjunctive temporal problems. In Russell, I., and Markov, Z., eds., Proc. of the 18th International Conference of the Florida Artificial Intelligence Research Society (FLAIRS 2005), 715–720. AAAI Press.

Moffitt, M. D., and Pollack, M. E. 2006. Temporal preference optimization as weighted constraint satisfaction. In Proc. of the 21st National Conference on Artificial Intelligence (AAAI 2006). AAAI Press.

Moffitt, M. D. 2011. On the modelling and optimization of preferences in constraint-based temporal reasoning. Artificial Intelligence 175(7-8):1390–1409.

Peintner, B., and Pollack, M. E. 2004. Low-cost addition of preferences to DTPs and TCSPs. In McGuinness, D. L., and Ferguson, G., eds., Proc. of the 19th National Conference on Artificial Intelligence (AAAI 2004), 723–728. AAAI Press / The MIT Press.

Peintner, B.; Moffitt, M. D.; and Pollack, M. E. 2005. Solving over-constrained disjunctive temporal problems with preferences. In Biundo, S.; Myers, K. L.; and Rajan, K., eds., Proc. of the 15th International Conference on Automated Planning and Scheduling (ICAPS 2005), 202–211. AAAI.

Sheini, H. M.; Peintner, B.; Sakallah, K. A.; and Pollack, M. E. 2005. On solving soft temporal constraints using SAT techniques. In van Beek, P., ed., Proc. of the 11th International Conference on Principles and Practice of Constraint Programming (CP 2005), volume 3709 of Lecture Notes in Computer Science, 607–621. Springer.

Stergiou, K., and Koubarakis, M. 1998. Backtracking algorithms for disjunctions of temporal constraints. In Shrobe, H. E.; Mitchell, T. M.; and Smith, R. G., eds., Proc. of the 15th National Conference on Artificial Intelligence (AAAI 1998), 248–253. AAAI Press / The MIT Press.

Tsamardinos, I., and Pollack, M. 2003. Efficient solution techniques for disjunctive temporal reasoning problems. Artificial Intelligence 151:43–89.
