

Reactive Planning in a Motivated Behavioral Architecture

Éric Beaudry, Yannick Brosseau, Carle Côté, Clément Raïevsky, Dominic Létourneau, Froduald Kabanza, François Michaud

Université de Sherbrooke, Québec, Canada

http://www.gel.usherb.ca/laborius/ - [email protected]

Abstract

To operate in natural environmental settings, autonomous mobile robots need more than just the ability to navigate in the world, react to perceived situations or follow predetermined strategies: they must be able to plan and to adapt those plans according to the robot's capabilities and the situations encountered. Navigation, simultaneous localization and mapping, perception, motivations, planning, etc., are capabilities that contribute to the decision-making processes of an autonomous robot. How can they be integrated while preserving their underlying principles, without making the planner or any other capability a central element on which everything else relies? In this paper, we address this question with an architectural methodology that uses a planner along with other independent motivational sources to influence the selection of behavior-producing modules. Influences of the planner over other motivational sources are demonstrated in the context of the AAAI Challenge.

Motivated Behavioral Architecture (MBA)

Motivations

Motivational modules (MM) are high-level independent modules that influence how the robot behaves. MMs can add, modify, query and recommend tasks in DTW.

Dynamic Task Workspace (DTW)

DTW is a central component that organizes tasks in a hierarchy using a tree-like structure, from high-level/abstract tasks to primitive/BPM-related tasks.

System Know How (SNOW)

The SNOW is a rule-based template that maps low-level tasks to BPMs.

Behavior-Producing Modules (BPM)

BPMs define how particular percepts and conditions influence the control of the robot’s actuators. An arbitration mechanism (priority-based in this implementation, but other methods like fuzzy logic could be used) filters the commands generated by BPMs before applying them to the actuators.
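The priority-based arbitration mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the MBA implementation: the `BPM` class, the priority values, the percept key `obstacle_distance` and the actuator names are all hypothetical. Each activated BPM may propose commands for some actuators, and for each actuator the proposal of the highest-priority BPM is kept.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BPM:
    """Hypothetical behavior-producing module: maps percepts to actuator commands."""
    name: str
    priority: int  # assumption of this sketch: a higher value wins
    behave: Callable[[Dict[str, float]], Dict[str, float]]
    activated: bool = True


def arbitrate(bpms: List[BPM], percepts: Dict[str, float]) -> Dict[str, float]:
    """Return one command per actuator, taken from the highest-priority active BPM."""
    chosen: Dict[str, float] = {}
    # Visit BPMs from highest to lowest priority. The first command written
    # for an actuator is kept; lower-priority proposals are filtered out.
    for bpm in sorted(bpms, key=lambda b: b.priority, reverse=True):
        if not bpm.activated:
            continue
        for actuator, value in bpm.behave(percepts).items():
            chosen.setdefault(actuator, value)
    return chosen


def avoid(percepts: Dict[str, float]) -> Dict[str, float]:
    # Proposes commands only when an obstacle is close (threshold is made up).
    if percepts.get("obstacle_distance", float("inf")) < 0.5:
        return {"rotation": 1.0, "velocity": 0.0}
    return {}


def goto(percepts: Dict[str, float]) -> Dict[str, float]:
    # Always proposes to drive forward.
    return {"velocity": 0.4, "rotation": 0.0}


bpms = [BPM("Goto", 1, goto), BPM("Avoid", 2, avoid)]
far = arbitrate(bpms, {"obstacle_distance": 2.0})
near = arbitrate(bpms, {"obstacle_distance": 0.2})
```

With a distant obstacle only the Goto proposal reaches the actuators; with a close obstacle the Avoid commands take over for both actuators.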

Reactive Planning Process

Experimentation Setup: AAAI Challenge 2005

Examples, Tests and Results

Conclusion

Task Management and Selection in MBA

Inside the Plan motivation, we introduce a reactive planning process. The planner is decomposed into two sub-processes that can run concurrently. The first is the ExecMonitor sub-process (for execution and monitoring), which communicates with the Dynamic Task Workspace to extract the robot’s current mission. It also adds and recommends lower-level tasks in DTW to achieve high-level tasks. The second is the Planning sub-process, which invokes a planner to plan the mission. The reactive planning process is generic and is not limited to one particular planner.

MBAPlanner_ExecMonitor( ) :
  while(true) :
    e = waitForEventOrTimeout();
    switch case(e) :
      newTask(t) :
        if(isPlannableTask(t)) :
          mission += t;
          needreplan = true;
      cancelledTask(t) :
        if(mission.contains(t)) :
          mission -= t;
          needreplan = true;
          if(currenttask.isDecendendOf(t)) :
            currenttask = <notask>;
      completedTask(t) :
        if(t == currenttask) :
          if(currenttask.isLastDecendendOf(currenttask.parent)) :
            workspace.SetCompleted(currenttask.parent);
          currenttask = plan.nextTask();
        if(mission.contains(t)) :
          mission -= t;
          needreplan |= plan.containsDecendantOf(t);
      failedTask(t) :
        if(t == currenttask) :
          currenttask = <notask>;
          needreplan = true;
      sensedData(d) :
        currentstate.update(d);
        needreplan |= plan.validate(currentstate)==FAILURE;
    if(needreplan) :
      resetPlanningProcess();
    workspace.setRecommendation(currenttask);

MBAPlanner_Planning( ) :
  while(true) :
    // Detect and accept the possibility of an opportunity
    if(plan.progressFasterThanPlanned() && !plan.nextTask().isReady(now + planningAvgDuration)) :
      needreplan = true;
    // Plan as needed
    if(needreplan) :
      needreplan = false;
      // First : removing unachievable tasks
      for each task t in mission :
        if(planner.fastplan(t) == FAILURE) :
          mission -= t;
          workspace.failedTask(t);
      // Second : try to detect mutex tasks by sufficient condition
      for each task pair (t1, t2) :
        tempmission = {t1, t2};
        if(planner.fastplan(tempmission) == FAILURE) :
          mission -= lowestprioritytask(t1, t2);
      // Third : full planning, dropping the lowest priority task until a plan is found
      newplan = FAILURE;
      while(newplan == FAILURE) :
        newplan = planner.plan(currenttask, mission);
        if(newplan == FAILURE) :
          mission -= lowestprioritytask(mission);
      plan = newplan;
      currenttask = plan.nextTask();
      monitorexec.sendTimeoutEvent();
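The mission-pruning logic of the Planning sub-process can be illustrated with a small runnable sketch. It is not the authors' code: the function `prune_and_plan`, the representation of `FAILURE` as `None`, and the toy planner based on fixed time windows are assumptions made for this example. Unlike the pseudocode, the sketch simply returns every dropped task in one list instead of notifying the workspace.

```python
from itertools import combinations

FAILURE = None  # assumption: a planning failure is represented as None


def prune_and_plan(mission, priority, fastplan, plan):
    """Drop tasks until `plan` succeeds; return (plan, dropped tasks)."""
    dropped = []
    # First: remove tasks that cannot be planned even on their own.
    for t in list(mission):
        if not fastplan([t]):
            mission.remove(t)
            dropped.append(t)
    # Second: a pair that cannot be planned together is mutually exclusive
    # (sufficient condition), so the lower-priority task of the pair is dropped.
    for t1, t2 in combinations(list(mission), 2):
        if t1 in mission and t2 in mission and not fastplan([t1, t2]):
            loser = min((t1, t2), key=priority.get)
            mission.remove(loser)
            dropped.append(loser)
    # Third: full planning, dropping the lowest-priority task after each failure.
    new_plan = plan(mission)
    while new_plan is FAILURE and mission:
        loser = min(mission, key=priority.get)
        mission.remove(loser)
        dropped.append(loser)
        new_plan = plan(mission)
    return new_plan, dropped


# Toy planner (hypothetical): each task has a (start, duration) window,
# or None when it can never be achieved.
windows = {"GuardA": (10, 5), "GuardB": (12, 5), "Talk": (30, 10), "Broken": None}
priority = {"GuardA": 2, "GuardB": 1, "Talk": 3, "Broken": 5}


def overlaps(a, b):
    (s1, d1), (s2, d2) = windows[a], windows[b]
    return s1 < s2 + d2 and s2 < s1 + d1


def toy_fastplan(tasks):
    if any(windows[t] is None for t in tasks):
        return False
    return not any(overlaps(a, b) for a, b in combinations(tasks, 2))


def toy_plan(tasks):
    if not toy_fastplan(tasks):
        return FAILURE
    return sorted(tasks, key=lambda t: windows[t][0])


mission = ["GuardA", "GuardB", "Talk", "Broken"]
new_plan, dropped = prune_and_plan(mission, priority, toy_fastplan, toy_plan)
```

In this toy run, `Broken` is removed as unachievable and `GuardB` is removed because its window overlaps that of the higher-priority `GuardA`, leaving a plan for `GuardA` followed by `Talk`.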

Task Life Cycle

Office map experimentation

Testing Scenarios

To test and validate MBA, we created a set of scenarios inspired by the AAAI Challenge. In these simplified scenarios, the robot has to perform tasks specified by people in a simulated conference site.

- GiveConference: make a presentation at a predetermined location and at a specific time.
- AttendConference: listen to a presentation at a specific location during a fixed period of time.
- AttendPoster: look at a poster, specified by a location and a duration, during the poster session time window.
- DeliverMessage: receive a message from a person at an initial location and bring the message to another person at the destination location.
- Guard: guard a location during a fixed time period.
- Explore: explore the environment by wandering and looking for interesting things (e.g., tracking symbols).
- OpenInteraction: interact with other attendees. Help people or receive help as needed.

Hypotheses

- The metric map of the environment is already known (no mapping to do).
- Locations are partially known: the robot has to find the position of a location on the map by asking for help or searching for landmarks.
- Tasks can be given using a graphical interface on the robot and/or directly from a mission text file.

Hardware

We use U2S/Spartacus, a UdeS wheeled robot platform equipped with a laser range finder for navigation. The robot is controlled by a laptop computer.

Software

- Player: low-level interface for communicating with the hardware.
- MARIE: software integration environment.
- FlowDesigner / RobotFlow: execution control platform for BPM implementation.
- ConfPlan: an HTN-based planner with anytime and metrics capabilities.
- CARMEN: localization and path planning for position tracking and navigation.

Planning Domain

- Navigation table with distances between each pair of locations, dynamically computed as new locations are found.
- Average speed of the robot.
- Operators with preconditions and effects modeling the robot’s actions.
- HTN task templates as search controls.
- Optimization criteria:
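A navigation table that grows as locations are found, combined with an average speed, is enough to estimate travel durations for the planner. The following sketch shows one way this could look. The class `NavigationTable` and its methods are hypothetical, and it uses straight-line distances for simplicity, whereas a real system would more likely take path lengths from its path planner.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


class NavigationTable:
    """Hypothetical distance table between named locations."""

    def __init__(self, average_speed: float) -> None:
        self.average_speed = average_speed  # metres per second
        self.positions: Dict[str, Point] = {}
        self.distances: Dict[Tuple[str, str], float] = {}

    def add_location(self, name: str, position: Point) -> None:
        # Compute the distance to every known location when a new one is found.
        for other, other_position in self.positions.items():
            d = math.dist(position, other_position)
            self.distances[(name, other)] = d
            self.distances[(other, name)] = d
        self.positions[name] = position
        self.distances[(name, name)] = 0.0

    def travel_time(self, origin: str, destination: str) -> float:
        # Estimated duration in seconds; raises KeyError for unknown locations.
        return self.distances[(origin, destination)] / self.average_speed


table = NavigationTable(average_speed=0.5)
table.add_location("p1", (0.0, 0.0))
table.add_location("p2", (3.0, 4.0))
table.add_location("c1", (3.0, 0.0))  # a location discovered later
```

Here `table.travel_time("p1", "p2")` divides a 5 m distance by 0.5 m/s. The coordinates and the speed are made up for the example.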

Complementarity of Motivational Modules

1. Initial State

[Figure: office map with locations p1 to p11]

time=0, RobotAt(p1)

Mission

- Guard p6 at time=0:15 for a duration of 0:05.
- DeliverMessage from c1 to p7.

* Location c1 is unknown at start.

2. Planner Plan

Since the position of c1 is unknown, the planner cannot make a good estimate of how long it takes to reach it. To guarantee that the Guard task is accomplished, it assumes the worst case, namely that c1 is very far away. As a consequence, going to p6 is the first action.

3. Instinctive Explore

While the Plan MM is executing its plan by recommending a ProceedTo(p6) task, the Explore MM is recommending finding location c1. This forces the GUI to ask for help. A person clicks on the robot’s touch screen to show the position of c1 on the displayed map.

4. Planner Replan

When the Plan MM is notified of the position of c1 (between p2 and p7), the planner is invoked again to find a new plan. Because AskMessage and GiveMessage are feasible before Guard, the planner chooses this order.
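The reasoning in steps 2 and 4 amounts to a feasibility check on travel durations, where any leg involving an unknown location is replaced by a pessimistic bound. The sketch below illustrates this under stated assumptions: the function names, the 60-minute worst-case bound and the 3- and 5-minute final legs are hypothetical, while the 4- and 2-minute legs are consistent with the trace timestamps (help at 0:02, AskMessage at 0:06, GiveMessage at 0:08).

```python
from typing import List, Optional


def deliver_before_guard(now: float, guard_start: float,
                         legs: List[Optional[float]], worst_case: float) -> bool:
    """True if all travel legs fit before the Guard task must start.

    `legs` are travel durations in minutes; None marks a leg whose duration
    is unknown because one endpoint has not been located yet.
    """
    total = sum(worst_case if leg is None else leg for leg in legs)
    return now + total <= guard_start


def plan_order(now: float, guard_start: float,
               legs: List[Optional[float]], worst_case: float = 60.0) -> List[str]:
    # Deliver first only when the delivery provably leaves time to reach p6.
    if deliver_before_guard(now, guard_start, legs, worst_case):
        return ["DeliverMessage", "Guard"]
    return ["Guard", "DeliverMessage"]


# Step 2: c1 is unknown, so the legs p1 -> c1 and c1 -> p7 are pessimistic.
initial_order = plan_order(now=0, guard_start=15, legs=[None, None, 3])

# Step 4: at t=0:02 the position of c1 becomes known and all legs are short.
replanned_order = plan_order(now=2, guard_start=15, legs=[4, 2, 5])
```

The first call keeps Guard in front because the pessimistic total exceeds the 0:15 deadline; the second call finds that the delivery fits before the Guard task starts.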

5. Trace

[Figure: office map with locations p1 to p11, annotated with the executed trace]

- t=0:02 Receive Help
- t=0:06 Ask Message
- t=0:08 Give Message
- t=0:15 Guard

Mission Optimization and Failure Detection from the Planner

[Figure: plan timeline, 0:00 to 1:00]

1. Goto(p2)
2. AskMessage(p2,m1)
3. Goto(p3)
4. GiveMessage(p3,m1)
5. Goto(p9)
6. AskMessage(p9,m2)
7. Goto(p6)
8. GiveMessage(p6,m2)
9. Goto(p4)
10. GuardLocation(p4, 23m, 5m)
11. Goto(p3)
12. GuardLocation(p3, 36m, 5m)

Initial State

- RobotAt(p1)
- time = 0

One objective of MBA is to avoid having one central module on which all decisions depend. At the same time, MBA seeks to take advantage of a deliberative module (from AI planning) to improve the overall performance of the architecture.

To show that MBA is not centralized on a planner, test cases demonstrate that the system can continue to operate and achieve submitted missions totally or partially.

Optimal plan for this test

We generated random tests that were executed with and without the planner.

Random Tests

Initial State

- RobotAt(Random_Place)
- time = 0

Mission

- 1 or 2 DeliverMessage with random origin and destination places
- 1 or 2 Guard at random places and times
- 1 or 2 AttendConference at random places and times
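A random test of this shape could be generated along the following lines. This is a sketch, not the generator used for the experiments: the function `random_mission`, the tuple encoding of tasks, the 0 to 60 minute range for start times and the choice of distinct origin and destination places are all assumptions.

```python
import random

# Location names taken from the office map (p1 to p11).
PLACES = [f"p{i}" for i in range(1, 12)]


def random_mission(rng: random.Random):
    """Return (initial_state, mission) following the random test description."""
    initial_state = {"RobotAt": rng.choice(PLACES), "time": 0}
    mission = []
    # 1 or 2 DeliverMessage tasks with random origin and destination places.
    for _ in range(rng.randint(1, 2)):
        origin, destination = rng.sample(PLACES, 2)
        mission.append(("DeliverMessage", origin, destination))
    # 1 or 2 Guard tasks at random places and start times (minutes).
    for _ in range(rng.randint(1, 2)):
        mission.append(("Guard", rng.choice(PLACES), rng.randint(0, 60)))
    # 1 or 2 AttendConference tasks at random places and start times (minutes).
    for _ in range(rng.randint(1, 2)):
        mission.append(("AttendConference", rng.choice(PLACES), rng.randint(0, 60)))
    return initial_state, mission


# A seeded generator makes a test reproducible, so the same mission can be
# executed once with the planner and once without it.
initial_state, mission = random_mission(random.Random(2005))
```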

The intelligence of a system depends on its sensing, acting and processing capabilities, taken not individually but as a whole. The work presented here offers one solution by integrating different motivational sources, such as a planner, to influence the decision-making process of an autonomous mobile robot. Using only a planner to select behavioral modes would require frequent generation of plans to handle dynamic changes in real-life settings. Not using a planner makes it difficult to anticipate and reorganize behavioral strategies. Our objective is to find the right balance between the two, using the planner as a motivational source that allows the robot to act rationally by selecting and sequencing primitive tasks.

Acknowledgments

F. Michaud holds the Canada Research Chair (CRC) in Mobile Robotics and Autonomous Intelligent Systems. Support for this work is provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Research Chair program and the Canada Foundation for Innovation.

Many students on the project are supported by NSERC and by the Fonds québécois de la recherche sur la nature et les technologies (FQRNT) programs.

[Figure: plan timeline, 0:00 to 2:00]

1. Goto(p6)
2. GuardLocation(p6, 0:15, 0:05)
3. FindPlace(c1)
4. Goto(c1)
5. AskMessage(c1)
6. Goto(p7)
7. GiveMessage(p7)

[Figure: plan timeline, 0:00 to 1:00]

1. Goto(c1)
2. AskMessage(c1)
3. Goto(p7)
4. GiveMessage(p7)
5. Goto(p6)
6. GuardLocation(p6, 0:15, 0:05)

Task Selection for BPM Activations

[Figure: two office maps with locations p1 to p11, titled "Trace with Planner" and "Trace without Planner"]

[Figure: task life cycle, showing task t and its child tasks s1 and s2 in the workspace as they are handled by MM1, MM2, MM3 and MM4]

1. MM1 adds task t
2. MM2 adds child task s1
3. MM3 handles task s1 and marks it completed
4. MM2 deletes task s1 and adds child task s2
5. MM4 handles task s2 and marks it completed
6. MM2 deletes task s2 and marks task t completed
8. MM1 deletes task t
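The life cycle listed above can be replayed on a small tree structure. The sketch below is illustrative only: the `Task` and `Workspace` classes and their methods are hypothetical stand-ins for the Dynamic Task Workspace, and the module names are recorded only as task owners.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class Task:
    name: str
    owner: str  # name of the MM that added the task
    parent: Optional["Task"] = None
    children: List["Task"] = field(default_factory=list)
    completed: bool = False


class Workspace:
    """Hypothetical tree-like task workspace."""

    def __init__(self) -> None:
        self.roots: List[Task] = []

    def add(self, name: str, owner: str, parent: Optional[Task] = None) -> Task:
        task = Task(name, owner, parent)
        siblings = parent.children if parent is not None else self.roots
        siblings.append(task)
        return task

    def complete(self, task: Task) -> None:
        task.completed = True

    def delete(self, task: Task) -> None:
        siblings = task.parent.children if task.parent is not None else self.roots
        siblings.remove(task)


ws = Workspace()
t = ws.add("t", owner="MM1")               # MM1 adds task t
s1 = ws.add("s1", owner="MM2", parent=t)   # MM2 adds child task s1
ws.complete(s1)                            # MM3 handles s1 and marks it completed
ws.delete(s1)                              # MM2 deletes s1 ...
s2 = ws.add("s2", owner="MM2", parent=t)   # ... and adds child task s2
ws.complete(s2)                            # MM4 handles s2 and marks it completed
ws.delete(s2)                              # MM2 deletes s2 ...
ws.complete(t)                             # ... and marks task t completed
ws.delete(t)                               # MM1 deletes task t
```

At the end of the sequence the workspace is empty again, and each task was completed before being deleted.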


Mission

- DeliverMessage(p2, p3, m1)
- DeliverMessage(p9, p6, m2)
- Guard(p3, 23m, 5m)
- Guard(p4, 36m, 5m)

Sample Test

[Figure: sample test traces, annotated with the following events]

- t=0:07 AskMessage
- t=0:07 AskMessage
- t=0:13 GiveMessage
- t=0:18 AskMessage
- t=0:20 AskMessage
- t=0:26 Guard
- t=0:36 Guard
- t=0:48 AskMessage
- t=0:55 GiveMessage
- t=0:03 AskMessage
- t=0:06 AskMessage
- t=0:26 Guard
- t=0:36 Guard

[Figure: task selection for BPM activations. Labels recovered from the figure:]

- Motivational modules: Plan, Explore, Agenda, Survive, Navigate, GUI
- Dynamic Task Workspace tasks: GiveConference(p1, t1), ProceedTo(p1), Goto(x1, y1), AttendConference(p3, t2), ProceedTo(p3), FindPlace(p2), AskForHelp(p2), DeliverMessage(p1, p2), Goto(x2, y2), Avoid(), Rest()
- BPM Selection: Veto Selection Strategy
- Behavior-Producing Modules: RestBehavior, AvoidBehavior, AskForHelpBehavior, GotoBehavior, WanderBehavior

Legend

- +DR(t): direct positive recommendation for task t
- -DR(t): direct negative recommendation for task t
- Rec(t): recommendation status for task t

Each MM is responsible for handling a subset of the tasks in DTW, and each task can be handled by one or more MMs. When more than one MM can handle a task, they compete to determine how the task is achieved.
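One plausible reading of a veto selection strategy is that a task is recommended when at least one MM recommends it positively and no MM recommends against it. The sketch below encodes that reading. It is an assumption made for illustration, not a description of the actual MBA selection rule, and the function `recommendation_status` and the example votes are hypothetical.

```python
from typing import Dict, Optional


def recommendation_status(votes: Dict[str, Optional[bool]]) -> bool:
    """Combine per-MM recommendations for one task.

    True stands for a direct positive recommendation (+DR), False for a
    direct negative recommendation (-DR), None for no recommendation.
    """
    # Assumed veto rule: a single negative recommendation rejects the task.
    if any(vote is False for vote in votes.values()):
        return False
    # Otherwise at least one positive recommendation is required.
    return any(vote is True for vote in votes.values())


goto_status = recommendation_status({"Plan": True, "Explore": None, "Survive": None})
rest_status = recommendation_status({"Plan": True, "Agenda": False})
idle_status = recommendation_status({"Plan": None, "GUI": None})
```

Under this rule `goto_status` is true, `rest_status` is false because of the veto from Agenda, and `idle_status` is false because nobody recommends the task.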
