From Deliberative to Routine Behaviors: a Cognitively-inspired Action Selection Mechanism for Routine Behavior Capture

Sonia Chernova
Computer Science Dept.
Carnegie Mellon University
Pittsburgh, PA, [email protected]

Ronald C. Arkin
College of Computing
Georgia Institute of Technology
Atlanta, GA, USA
[email protected]

Contact Information
Sonia Chernova
Tel: (412) 268-2601
Fax: (412) [email protected]
Carnegie Mellon University
Computer Science Department
5000 Forbes Ave.
Pittsburgh, PA 15213, USA

November 6, 2006


Abstract

Long-term human-robot interaction, especially in the case of humanoid robots, requires an adaptable and varied behavior base. In this work we present a method for capturing, or learning, sequential tasks by transferring serial behavior execution from deliberative to routine control. The incorporation of this approach leads to natural development of complex and varied behaviors, with lower demands for planning, coordination and resources. We demonstrate how this process can be performed autonomously as part of the normal function of the robot, without the need for an explicit learning stage or user guidance. The complete implementation of this algorithm on the Sony QRIO humanoid robot is described.


1 Introduction

It is well known that anthropomorphic robots are subject to high expectations with regard to natural behavior in their interactions with humans. Furthermore, interaction over extended periods of time requires that the robot acquire new skills, adapt to its surroundings, and change its behavior in response to different situations in order to appear natural and interesting. In this work we address the issue of compelling long-term interaction by enabling the robot to adapt through the capture and execution of routine behaviors.

Routine behavior is defined as the habitual performance of an established procedure or task. Routine behavior occurs in a reactive manner, sometimes without awareness, and requires far less focused attention than conscious and highly supervised task execution, or deliberative behavior.

In humans, the classification of a task into one of these two categories is dependent on the familiarity of the task, as well as other psychological aspects such as perceived dangers or complex decision making (Norman & Shallice, 1986). Over time, most behaviors are typically transferred from deliberative to routine execution. This type of adaptation is believed to be fundamental to our development due to the limited nature of human attention.

In this work, we demonstrate a similar adaptational mechanism for robotic systems. We present a method for capturing, or learning, robotic tasks and transferring their execution from deliberative to routine control. Specifically, we focus on sequential tasks, ones that involve series of consecutive actions. We present a method, based on the repeated execution of a task, that allows the robotic system to learn habitual execution, so as to require less planning, coordination and resources. We demonstrate how this process can be performed autonomously as part of the normal function of the robot, without the need for a specific learning stage or user guidance. The incorporation of this method results in the more varied and adaptable behavior that is so important for successful long-term interaction.

Figure 1: The Sony entertainment robot QRIO.

This research was conducted on a fully autonomous robotic humanoid system, the Sony QRIO entertainment robot (Figure 1), designed for long-term human-robot interaction. All experiments were performed on the physical platform without simulation.

Psychological and Neuroscientific Motivations

Action selection, the problem of selecting what to do next, has been studied extensively in humans in the areas of psychology, physiology, neurology and other related fields (Cleeremans, 1993; Sun & Giles, 2001; Keele, Ivry, Mayr, Hazeltine, & Heuer, 2004). Our approach is motivated by the Contention Scheduling (CS) model for action selection, originally proposed by Norman and Shallice (Norman & Shallice, 1986; Cooper, Shallice, & Farringdon, 1995; Cooper & Shallice, 1997, 2000). Contention Scheduling combines elements of automatic and deliberative action execution, encompassing both action selection for routine actions as well as deliberative planning and execution.

Original inspiration for Contention Scheduling, and other serial behavior models, can be traced back to early work by Lashley (Lashley, 1951). Lashley stated, consistent with the Contention Scheduling model, that a parallel set of chunk actions is activated even before action is produced. More recently, Houghton and Hartley (1995) revisited Lashley’s early work on parallel models of serial behavior, and employed the concept of schemas as the basis for producing sequential behavior, providing new evidence to support his claim. This inspired the concept of activation levels that is currently used, in one form, for the action-selection mechanism in the QRIO architecture.

In other work, Cisek contends that animals have two pragmatic concerns: action specification and action selection (Cisek, 2001). Specification allows multiple processes to be primed for action that are based on spatial information of the agent and its relationship to world objects, while selection reduces this set to a unique behavior for enactment based upon the nature and identity of the environmental objects. As far as we can tell, however, this model has yet to be exported to a robotic system in its native form.

One recent neuroscientific study (Badgalyan, 2000) gives strong support to Norman and Shallice’s model of central deliberative control by identifying regions in the brain where such activity occurs, specifically, the cingulate and prefrontal cortical regions. The statement that the deliberative system is recruited only when the action is conscious helps us understand the non-dominant relationship between the Supervisory Attentional System (SAS) and Contention Scheduler (CS), and helps ground us in designing a suitable interface between planning and control for QRIO.

Other neuroscientific studies implicate the basal ganglia in action selection as well as action gating via disinhibition (Prescott, Gurney, & Redgrave, 2003). Their notion is that sensory requests arrive at the basal ganglia in the form of requests for access to the motor control system, which utilizes multiple selection mechanisms to service those requests. A model of this system was embedded in a mobile robot and tested, but more for the ability to evaluate it as an explanatory model of the human action selection mechanism as opposed to an efficient robot controller.

Several implementations of the Contention Scheduling model have also been developed, including one by Cooper and Shallice themselves (Cooper & Shallice, 2000). However, most work so far has focused on the independent implementation and study of the two behavior mechanisms for the control of routine and non-routine behaviors. The problem of transference of behaviors and skills from deliberative to routine control is much less studied as it requires a functional model of both systems.

We are aware of only two projects involved in the study of routine behavior capture for robotic systems relating to the Contention Scheduling model. One is the work of Garforth and Meehan (Garforth, Meehan, & McHale, 2000), where the Contention Scheduling model is implemented as a large scale neural network. The authors demonstrate the transfer of skills from deliberative to routine control using a simulated robot under the task of avoiding distracting stimuli in order to fully complete a primary task. This study differs significantly from the work presented here, not only in the implementation of the system, but also in its functionality and target behavior. Specifically, the related work is concerned with only single behaviors instead of sequential tasks, and does not include an emotional model.

The second related work, published by Cooper and Glasspool (Cooper & Glasspool, 2001), focuses on acquiring environment-action associations through unguided exploration in the environment and reinforcement learning. This work differs significantly in that learning is performed only at the lowest levels, pairing environmental features with basic actions. It is independent of any Deliberative System and the algorithm does not represent any higher order behavior sequences.

Our work is complementary to the Cooper and Glasspool approach, consisting of a mechanism for capturing higher order behavior sequences that comprise more complex tasks. Over time, sequential tasks executed through deliberation can become routine, such that the mechanism of their execution is shifted to the reactive behavior level and deliberative planning is no longer involved in the execution of the sequence. This separation of routine and non-routine behaviors allows different methods to be applied to each, while reducing the load on the planner and freeing system resources. Our approach is the first complete implementation of the routine behavior capture mechanism at this level, and we demonstrate its effectiveness using the fully functional behavior model of the QRIO robot.

In the rest of the paper we present an overview of QRIO’s behavior system, the Emotionally GrOunded (EGO) Architecture, followed by a detailed description of the behavior selection algorithm. We then present the routine behavior serialization method which allows the transference of routine behavior control from the deliberative to the reactive layer of the behavior system. Finally, we describe the results of a series of experiments.

2 Behavior System Overview

The autonomous behavior control architecture of the QRIO robot, termed the EGO architecture, is designed for long-term human-robot interaction based on an ethological behavior model (Arkin, Fujita, Takagi, & Hasegawa, 2003). Detailed descriptions of its various components appear in a series of previous publications (Fujita, Kuroki, Ishida, & Doi, 2003; Tanaka, Noda, Sawada, & Fujita, 2004; Hoshino, Takagi, Profio, & Fujita, 2004; Sawada, Takagi, & Fujita, 2004; Sawada, Takagi, Hoshino, & Fujita, 2004), and in this section we provide only a brief description of the individual software components related to this work. Figure 2 presents an overview of the system.

2.1 The EGO Architecture

The perception component of the system processes data on three different sensor channels: visual, auditory and tactile. Information about the environment perceived through the robot’s sensors is integrated into the Short Term Memory (STM) segment of the memory component, computing information such as object location and sound source, while assigning IDs to these perceptual events. Using the robot’s kinematic data, the STM also tracks and maintains the position of objects located outside the current field of view of the robot.

Figure 2: Overview of the EGO architecture components.

The Long Term Memory (LTM) component associates perceived information with the internal state of the robot. This allows the robot to identify people through face and voice recognition, as well as recall previously made associations and emotions felt about the person.

Variables related to the internal state of the robot are maintained by the Internal State Model (ISM). Example state variables include nourishment, sleep, fatigue, and vitality. Changes in the robot’s internal state occur with the passage of time, and as a result of external stimuli or the robot’s actions. While some variables can be grounded on a physical sensor, such as battery charge level, others represent more abstract values.

Three different behavior control modules are responsible for behavior selection. The Reflexive Behavior Layer (RBL) regulates behaviors that require a quick response time, such as being startled. The Situated Behavior Layer (SBL) controls reactive and homeostatic behaviors. The Deliberative System (DS) performs deliberative planning and control of sequential behaviors. Within the SBL, which is the focal point for this research, behaviors are organized in a tree-structured network of schemas (Figure 3). Action selection is conducted when schemas in the network compete for activation based on their relevance and resource requirements. The specific details of this behavior selection mechanism are described in the following section.

Figure 3: An example of the SBL behavior tree.

2.2 Behavior Selection

The behavior cycle is executed at a rate of 2 Hz. During each cycle, every behavior calculates a fitness value called the Activation Level (AL), which indicates the relevance of that behavior in the current situation. The Activation Level is calculated based on the external stimuli and internal state of the robot, as well as intentional values provided by the Deliberative System. Details of the Deliberative System will be discussed in Section 4.1.

Behavior selection occurs using a greedy policy: the behavior with the highest Activation Level is selected first. Other behaviors, from highest to lowest AL value, can then be selected for concurrent execution as long as their resource demands do not conflict with the already chosen ones. A behavior’s resource demands are the physical robot parts (torso, head, etc.) and virtual resources (speech, etc.) needed for the execution of the behavior. A more detailed discussion concerning parallel activation and the propagation of Activation Level values through the behavior tree can be found in (Hoshino et al., 2004).
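As an illustration only, the greedy policy described above can be sketched in Python as follows. The `Behavior` class, the resource names, and the AL values are hypothetical and are not taken from the EGO implementation; propagation of values through the behavior tree is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    # Hypothetical container; field names are assumptions for illustration.
    name: str
    activation_level: float
    resources: frozenset


def select_behaviors(behaviors):
    """Pick the highest-AL behavior first, then add lower-AL behaviors
    whose resource demands do not conflict with those already chosen."""
    selected = []
    used = set()
    for b in sorted(behaviors, key=lambda b: b.activation_level, reverse=True):
        if used.isdisjoint(b.resources):
            selected.append(b)
            used |= b.resources
    return selected


candidates = [
    Behavior("kick_ball", 8.0, frozenset({"legs"})),
    Behavior("greet", 6.5, frozenset({"speech", "head"})),
    Behavior("track_face", 5.0, frozenset({"head"})),
    Behavior("dance", 4.0, frozenset({"legs", "torso"})),
]
chosen = [b.name for b in select_behaviors(candidates)]
```

In this example `kick_ball` and `greet` run concurrently, while `track_face` and `dance` are rejected because the head and legs are already claimed.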

In the current implementation, Activation Level values are strictly positive and can have arbitrarily high values. Alternative approaches could include normalizing or bounding all values to some range. Since the values are compared on a relative instead of absolute scale, all of these approaches will result in the same outcome, provided the rescaling preserves the ordering of the values.

The Activation Level Function, used to calculate the AL value, plays a crucial role in determining which behaviors are selected. In effect, different values in the AL function control the overt personality and behavioral manifestation of the robot, making it of critical importance in the development of an entertainment robot system. In the following section we describe the full details of this implementation.

3 Activation Level Evaluation

Our new formulation of the Activation Level Function builds upon the previously existing AL Function of the EGO architecture (Hoshino et al., 2004; Sawada et al., 2004), while adding fundamental components inspired by the Contention Scheduling (CS) model (Cooper et al., 1995; Cooper & Shallice, 2000). The pre-existing AL function was based upon an ethologically inspired model (Arkin et al., 2003) and contained elements balancing internal motivations and external stimuli.

We have extended this function to now include a resting level and random noise, both of which are key features governing the CS model (Cooper et al., 1995; Cooper & Shallice, 2000). Most importantly, we have added a new mechanism we call Self Excitation, which, although inspired by the concepts of Self Activation and Lateral Inhibition present in the CS model, in a way replaces both while providing new abilities and flexibility to the system.

The new Activation Level value is calculated as a weighted sum of the following five components:

Previously existing components

• Motivation value (Mot)

• Releasing value (Rel)

Contributed components

• Resting Level (RL)

• Random noise (Noise)

• Self Excitation value (SE )

Each of these contributing factors is described below.

3.1 Motivation Value

The motivation value is derived from the internal state of the robot, and embodies its instinctual drive. It is calculated as a weighted sum of individual instincts, Ins[i]. Each instinct value corresponds to an internal state i, and expresses the robot’s current desire with respect to that state (Hoshino et al., 2004; Sawada et al., 2004).

Figure 4: Curve expressing the relationship between intention and instinct for the internal state nourishment.

Example values for the nourishment internal state are shown in Figure 4. The instinct associated with nourishment is high when the internal value of nourishment is low, signifying that the robot has a desire to satisfy this need and raise the nourishment level. The less nourishment there is, the greater the desire to obtain it. In the case when the nourishment level is high, the instinct value is low to signify satisfaction. It is also possible to have a negative instinct value, signifying that the robot has exceeded the desired level for that internal state. In our example, this phenomenon symbolizes overeating; if the nourishment level is too great, the instinct or desire to eat becomes negative.

The function relating each instinct to its corresponding internal state is designed on an individual basis, as described in (Sawada et al., 2004). The motivation value for each behavior is calculated as a weighted sum of the robot’s current instincts:

$$Mot_{Beh} = \sum_{i} WM_{Beh}[i] \cdot Ins[i] \qquad (1)$$

where $WM_{Beh}[i]$ is the motivation weight associated with instinct $Ins[i]$ for that behavior. The weights are used to control which instincts motivate each behavior. For example, an instinct based on nourishment would motivate an eating behavior, but may not affect the desire for sleep or exercise.

Figure 5: Curve expressing the relationship between intention and satisfaction for the internal state nourishment.
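A minimal sketch of the weighted sum in Equation 1 is given below, under the assumption that instincts and weights are stored in dictionaries keyed by internal state. The numeric values are invented for illustration and do not come from the robot.

```python
def motivation(weights, instincts):
    """Weighted sum of instinct values for one behavior (Equation 1)."""
    return sum(weights[state] * instincts[state] for state in weights)


# Hypothetical instinct values; a negative value signals an exceeded need.
instincts = {"nourishment": 0.8, "sleep": 0.2, "vitality": -0.1}

# An eating behavior is motivated by nourishment only.
eat_weights = {"nourishment": 1.0, "sleep": 0.0, "vitality": 0.0}
mot_eat = motivation(eat_weights, instincts)
```

Setting a weight to zero is how a behavior ignores an instinct, which matches the eating example in the text.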

3.2 Releasing Value

The releasing value denotes the expected satisfaction associated with an internal state that the robot would achieve from executing a behavior. It is composed of the current satisfaction value, Sat[i], which is based on the internal state of the robot at this time, and an expected satisfaction, ESat[i], which is calculated based on the internal state and the properties of external stimuli (Sawada et al., 2004).

Expected satisfaction is calculated by predicting the change in internal state as a result of performing a behavior. Consider the nourishment example, as shown in Figure 5. From the figure we see that if the nourishment level is low, raising it leads to increased satisfaction. However, if the nourishment level is currently high, eating further can lead to overeating and a decrease in the satisfaction value. The lowest levels of satisfaction result from extreme hunger and extreme satiety.

Satisfaction values can also depend on characteristics of external stimuli, such as object type, size or distance. For example, it is possible to expect more satisfaction from interacting with a friend than a stranger. In the current implementation of our system, satisfaction functions are designed on an individual basis for each behavior.

The releasing value is calculated based on the satisfaction values using the following formulas:

$$dSat[i] = ESat[i] - Sat[i] \qquad (2)$$

$$Rel_{Beh} = \sum_{i} WR_{Beh}[i] \cdot \left( W_{dSat} \, dSat[i] + (1 - W_{dSat}) \, ESat[i] \right) \qquad (3)$$

where $dSat[i]$ denotes the expected change in satisfaction, $WR_{Beh}[i]$ is the releasing weight associated with internal state $i$, and $W_{dSat}$ is the weight of the expected change in satisfaction against the final expected satisfaction value. For further details on motivation and releasing values, and their involvement in behavior selection, please see (Sawada et al., 2004).
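The following sketch evaluates Equations 2 and 3 for a single internal state. It reads the second term of Equation 3 as (1 − WdSat) · ESat[i], which is an assumption about the intended grouping; the dictionary layout and all numbers are likewise hypothetical.

```python
def releasing(wr, sat, esat, w_dsat):
    """Releasing value for one behavior.

    wr, sat, esat map internal-state names to the releasing weight,
    current satisfaction and expected satisfaction respectively.
    """
    total = 0.0
    for state in wr:
        d_sat = esat[state] - sat[state]  # Equation 2
        # Blend expected change against final expected satisfaction.
        total += wr[state] * (w_dsat * d_sat + (1.0 - w_dsat) * esat[state])
    return total


sat = {"nourishment": 0.2}   # robot is currently hungry
esat = {"nourishment": 0.7}  # predicted satisfaction after eating
wr = {"nourishment": 1.0}
rel = releasing(wr, sat, esat, w_dsat=0.5)
```

With `w_dsat` close to 1 the behavior is released mainly by the improvement it promises; close to 0, by the absolute satisfaction it is expected to reach.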

3.3 Rest Level

The Rest Level value establishes a baseline activation for behaviors having no additional input. In the absence of any internal or external influences, the Activation Level value tends toward the Rest Level. Its function is to permit low-level differentiation of behaviors based on priority due to the assignment of different resting levels.

In the case when a behavior is active and there are no internal or external stimuli, the drop of the AL to rest level is controlled by a decay function called the Persistence Function (Figure 6). This feature mimics the mechanism by the same name proposed by Norman and Shallice (Norman & Shallice, 1986). Its purpose is to promote the continuity of the Activation Level of active behaviors.

Figure 6: An idealized graph of the Activation Levels of two behaviors, demonstrating the effect of the Persistence Function. In the absence of any internal or external stimuli, the Activation Level of the active behavior decays until another behavior is activated as a result of a higher AL value. Activation of behavior A drops to the rest level.

In the case that a behavior is already active, it is natural that the desire to perform the activity should persist for some time after all stimuli disappear. For example, if the robot is playing soccer and loses sight of the ball, the desire to play soccer should persist for some time, allowing the robot to search for the ball. However, the probability that another behavior is selected should increase over time.

Persistence is implemented through a decay function which decreases the Status Self Excitation value over time. The decay continues until either the rest level value is reached or another behavior is activated.
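The shape of the decay is not specified here, so the sketch below assumes a fixed linear step per behavior cycle purely for illustration. The function names, the constant competitor AL, and the numbers are all hypothetical.

```python
def decay_step(value, floor, step):
    """Lower the value by one step, never going below the floor."""
    return max(floor, value - step)


def run_persistence(start, floor, step, competitor_al, max_cycles=100):
    """Decay an active behavior's activation once per cycle until it reaches
    the floor (rest level) or a competitor's AL exceeds it."""
    value = start
    history = [value]
    for _ in range(max_cycles):
        if value <= floor or competitor_al > value:
            break
        value = decay_step(value, floor, step)
        history.append(value)
    return history


# A competitor at AL 2.5 takes over once the active behavior drops below it.
history = run_persistence(start=5.0, floor=1.0, step=1.0, competitor_al=2.5)
# With no effective competitor, the decay runs down to the rest level.
to_rest = run_persistence(start=5.0, floor=1.0, step=1.0, competitor_al=0.0)
```

A smaller step lengthens the time the robot keeps searching for the ball in the soccer example before another behavior can win.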


Figure 7: Graph of actual Activation Level values including the noise parameter.

3.4 Noise

The Random Noise parameter adds normally distributed random noise to the Activation Level in order to break ties between equally competing behaviors and add variability to the behavioral display. The magnitude of the noise is a controllable parameter, usually proportional to the strength of the AL value. Figure 7 shows noisy Activation Levels of three competing behaviors. The noise component is omitted in other example figures in this section in order to demonstrate the underlying concepts more clearly.
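One way to realize such a noise term, shown here only as a sketch, is to draw from a zero-mean normal distribution whose standard deviation is a fixed fraction of the AL value. The `scale` parameter and its default are assumptions, not values from the robot.

```python
import random


def add_noise(al, scale=0.05, rng=random):
    """Return the AL plus zero-mean Gaussian noise.

    The standard deviation is scale * |al|, so stronger activations
    receive proportionally larger perturbations.
    """
    return al + rng.gauss(0.0, scale * abs(al))


rng = random.Random(0)  # seeded only to make the example repeatable
a = add_noise(4.0, rng=rng)
b = add_noise(4.0, rng=rng)  # same underlying AL, different noisy value
```

Two behaviors with identical underlying AL values thus end up with different noisy values, which breaks the tie.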

3.5 Self Excitation

The Self Excitation (SE) concept, and the underlying function used for setting this value, comprise one of the main contributions of this work. Self Excitation is composed of two factors, Status Self Excitation (SSE) and Routine Self Excitation (RSE), which play two different and important roles in Activation Level regulation.

$$SE = SSE + RSE \qquad (4)$$

Figure 8: An idealized graph of the Activation Levels of three competing behaviors, demonstrating the role of SSE in activation level regulation.

Status Self Excitation serves as a mechanism for minimizing the risk of rapid oscillation between behaviors. Its value is dependent upon the current operational status of the behavior, which represents whether the behavior is currently active, not active, or transitioning in or out of the active state. Different levels of excitement are associated with each status.

A behavior that is not active experiences very little or no excitement. When the behavior becomes active, its excitement level rises until the completion of the behavior, increasing the overall Activation Level value. This ensures that the Activation Level of an active behavior has an additional margin of separation from the other behaviors. The Activation Levels of other behaviors must overcome this margin in order to become active. Figure 8 shows the Activation Levels of three competing behaviors and demonstrates the effect of SSE.

This mechanism is designed to increase the likelihood that each behavior runs to completion without interruption. This is a desirable characteristic as it prevents behavioral dithering, the phenomenon of behaviors thrashing back and forth between themselves as they compete for execution. Note that the currently active behavior can still be interrupted if the Activation Level of a competitor surpasses the excitation margin.

Figure 9: An idealized graph showing the Activation Levels of a captured routine sequence consisting of three behaviors executed via Routine Self Excitation.

Routine Self Excitation controls the activation of routine behavior sequences. Its operation is based on sequences of behaviors that were captured as a result of repeated execution of some task.

Figure 9 demonstrates the effect of RSE on a captured routine sequence of behaviors A → B → C. In the captured sequence, each behavior knows its predecessor in the chain. Routine Self Excitation works by increasing the Activation Level of each behavior when its predecessor becomes active, a process called priming. In our example we see that the AL value for B rises as its predecessor A becomes active. A similar relationship exists between C and B.

Intuitively, this can be interpreted as a behavior anticipating or predicting being next in the sequence. The resulting higher Activation Level increases the likelihood, although it does not guarantee, that all the behaviors in the sequence will be executed in the captured order. Note that the RSE value is usually set so as not to overcome the SSE margin, otherwise the primed behavior will activate early, interrupting the execution of its predecessor. Overall, the excitation components work together to achieve continuous and complete behavior sequence execution. Further details of the RSE implementation are discussed in Section 4.
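The interplay of the two excitation terms can be sketched as below. The status names, the per-status SSE levels, and the priming gain are invented for illustration; the only property carried over from the text is that the priming gain is kept below the active-status SSE margin.

```python
# Hypothetical SSE level for each operational status.
SSE_BY_STATUS = {"inactive": 0.0, "starting": 1.0, "active": 2.0, "stopping": 0.5}


def self_excitation(status, predecessor_active, rse_gain):
    """Equation 4: SE = SSE + RSE for one behavior.

    SSE depends on the behavior's own status; RSE is non-zero only while
    the behavior's captured predecessor (trigger) is active.
    """
    sse = SSE_BY_STATUS[status]
    rse = rse_gain if predecessor_active else 0.0
    return sse + rse


# Captured sequence A -> B: A is running, B is primed by A.
se_a = self_excitation("active", predecessor_active=False, rse_gain=1.5)
se_b = self_excitation("inactive", predecessor_active=True, rse_gain=1.5)
```

Because `se_b` stays below `se_a`, B is favored over unprimed competitors once A finishes, yet does not interrupt A when all other terms are equal.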

3.6 Activation Level Computation Summary

The complete Activation Level Function is summarized in Equations 5 and 6. The motivational and releasing components are combined in $MR$, where $W_M$ sets the weight or importance between the two. $W_{SE}$ provides a similar balance between Self Excitation and $MR$.

$$MR = W_M \, Mot + (1 - W_M) \, Rel \qquad (5)$$

$$AL = W_{SE} \, SE + (1 - W_{SE}) \, MR + RL + Noise \qquad (6)$$

The two weight parameters of the Activation Level Function, $W_M$ and $W_{SE}$, control the emergent behavioral pattern and apparent personality of the robot. A high $W_M$ results in the robot exhibiting self-centered behavior aimed at satisfying its own internal state. This type of behavior would likely be perceived as selfish or unfriendly by humans as the robot may ignore their presence or attempts at interaction. A low $W_M$ value would result in the opposite effect where the robot would be overly responsive to external stimuli and interaction. $W_{SE}$ controls how likely the robot is to complete a task it has begun. It affects both short independent behaviors and the completion of longer chains of sequential tasks. Algorithms for dynamically adjusting these values still need to be explored; the current implementation relies on fixed values.
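Equations 5 and 6 translate directly into a short function. The sketch below uses invented input values; the parameter names mirror the symbols in the text and nothing else is assumed about the robot's implementation.

```python
def activation_level(mot, rel, se, rl, noise, w_m, w_se):
    """Combine the five components into a single Activation Level."""
    mr = w_m * mot + (1.0 - w_m) * rel                 # Equation 5
    return w_se * se + (1.0 - w_se) * mr + rl + noise  # Equation 6


# Hypothetical component values for one behavior in one cycle.
al = activation_level(mot=0.8, rel=0.6, se=2.0, rl=0.5, noise=0.0,
                      w_m=0.5, w_se=0.25)

# Raising w_m shifts the balance toward internal motivation.
al_selfish = activation_level(mot=0.8, rel=0.6, se=2.0, rl=0.5, noise=0.0,
                              w_m=1.0, w_se=0.25)
```

Since `mot` exceeds `rel` in this example, the higher `w_m` yields a higher AL, in line with the self-centered tendency described above.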

4 Routine Behavior Capture and Execution

Routine behavior is defined here as the repeated execution of the same task or sequence of tasks over a prolonged period of time. Over its lifetime, QRIO will complete many repeated tasks. In this research, we are specifically interested in higher-order routine tasks composed of a number of sub-behaviors executed in a specified order. For example, the task of playing soccer requires the robot to find the ball, approach it, and kick it. These sub-behaviors must be executed in a specific sequence in order for the overall task to be successfully completed.

In this section, we first describe the mechanism by which normal sequential behavior is executed through deliberative intention. We then present a novel approach for capturing, or learning, these repeated behavioral sequences and executing them subsequently as routine behaviors.

4.1 The Deliberative System

In this architecture, deliberative control of the robot is performed by the Deliberative System via the intentional bus (Figure 10) (Ulam & Arkin, 2006). The intentional bus forms a gateway between the reactive and deliberative layers in the architecture. Its purpose is to convert the high-level, goal-oriented tasks generated by the Deliberative System into intentional bias that serves to influence behavioral activation in the reactive layer. These biases can be combined with the existing Activation Level computations (Equation 7) in order to guide the robot towards accomplishing the goals of the Deliberative System.

Figure 10: Overview of the interaction between the Deliberative System, Intentional Bus and the SBL behavior schema tree.

$$AL_{total} = AL + IntentionalBias \qquad (7)$$

The intentional bus provides three major functions for the Deliberative System:

• It serves as a repository of information about the underlying behavior level, maintaining information about the current status of each behavior as well as their Activation Levels.

• It biases the Activation Levels of designated reactive behaviors in response to requests emanating from the Deliberative System.

• It maintains appropriate levels of intentional bias despite changes in Activation Levels of the behaviors.

When the Deliberative System generates and then executes a plan for the robot, the intentional bus responds by sending intention, or bias, to the first behavior in the planned sequence. The strength of the bias is encoded in the plan representation itself, and conveys the importance of that behavior in the sequence. A weak intentional bias implies that the behavior has a low priority, which may result in its execution being interrupted, deferred, or perhaps even skipped should higher priority activities be in play.

While the behavior is performed by the robot, the intentional bus monitors its progress and maintains appropriate intentional bias levels. When the behavior completes, the focus is shifted to the next behavior in the sequence and the process is repeated. In this way the intentional bus activates each of the behaviors in the plan in the specified order. Figure 11 shows the Activation Levels of a sequence of four behaviors executed by the Deliberative System.
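The stepping of bias through a plan can be illustrated with a small sketch. The `IntentionalBus` class below is a hypothetical simplification: it represents a plan as a list of (behavior, bias) pairs and does not model progress monitoring or bias adjustment.

```python
class IntentionalBus:
    """Hypothetical, simplified bus that biases one plan step at a time."""

    def __init__(self, plan):
        self.plan = list(plan)  # [(behavior_name, bias), ...]
        self.index = 0          # position of the step currently in focus

    def bias_for(self, name):
        """Intentional bias currently applied to the named behavior."""
        if self.index < len(self.plan) and self.plan[self.index][0] == name:
            return self.plan[self.index][1]
        return 0.0

    def notify_completed(self, name):
        """Shift focus to the next step when the focused behavior completes."""
        if self.index < len(self.plan) and self.plan[self.index][0] == name:
            self.index += 1


def total_activation(al, bus, name):
    return al + bus.bias_for(name)  # Equation 7


bus = IntentionalBus([("find_ball", 3.0), ("approach", 2.0), ("kick", 2.5)])
before = total_activation(1.0, bus, "approach")  # not yet in focus
bus.notify_completed("find_ball")
after = total_activation(1.0, bus, "approach")   # now biased by the plan
```

A step with a small bias value is more easily overtaken by other behaviors, which corresponds to the low-priority case described above.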

4.2 Routine Behavior

While the execution of deliberative plans is controlled by a centralized, higher-level system, the capture and execution of routine behaviors occurs entirely at the reactive layer. Information between schemas is shared via a structure called LogMemory, which records the status and intention value of each schema. The capture of routine sequences occurs independently for each behavior; each schema only records and learns behavior sequences in which it plays a part. Figure 12 provides an overview of the serialization process.

Figure 11: An idealized graph showing the Activation Levels of a sequence consisting of four behaviors executed under the control of the Deliberative System.

Using information from LogMemory, each behavior creates an Intentioned Schema History (ISH) which maintains information about any currently active deliberative plans. Statistics maintained by the ISH include the order of the behaviors in the sequence and the value of the intentional signal sent via the intentional bus. If no plans are being executed by the Deliberative System, the history remains empty.

When a specific behavior is deliberately activated by the intentional bus, it queries the ISH to determine the behavioral sequence prior to its activation. In the case that the active behavior is the first in a planned sequence, the ISH will be empty. If the behavior is not the first of a planned sequence, then the ISH will contain a summary of the entire plan executed up to this point. The behavior listed as the most recent one in the ISH, the previous step in the plan, is called the trigger schema, and this occurrence is termed a triggered sequence.

Figure 12: Overview of the behavior serialization process.

When a triggered sequence occurs, the trigger schema is compared to a list of previously encountered trigger schemas. If this trigger schema has never been previously encountered, the schema is entered into the Candidate Captured Routine List (CCRL). This list maintains a record of behaviors that have preceded the currently active behavior in previously executed plans. The schemas listed here have no effect on the Activation Level, but instead serve as a local memory bank of previous experiences for the behavior.

When a triggered sequence occurs that involves a previously encountered trigger schema, information relating to that schema is updated. Statistics maintained about each schema include the average intentional bias value used to trigger the schema and the total number of times the schema has been active.

As stated above, trigger schemas on the candidate list do not yet have an effect on the RSE and AL, and do not constitute captured behaviors. A trigger schema must pass certain requirements before being captured and considered a part of a routine activity. Once captured, the schema is upgraded from the CCRL to the Captured Routine List (CRL).
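The bookkeeping described above can be illustrated with a minimal Python sketch. It is not the QRIO implementation: the class names, field names, and the representation of the ISH as a plain list of behavior names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TriggerStats:
    """Statistics a behavior keeps about one trigger schema (hypothetical layout)."""
    pair_count: int = 0      # times this behavior followed the trigger in a plan
    bias_sum: float = 0.0    # running sum of intentional bias values received

    @property
    def ib_avg(self) -> float:
        # Average intentional bias over all recorded pairings.
        return self.bias_sum / self.pair_count if self.pair_count else 0.0


@dataclass
class BehaviorMemory:
    """Per-behavior memory holding the candidate list (CCRL) and captured list (CRL)."""
    name: str
    candidates: dict = field(default_factory=dict)  # CCRL: trigger name -> TriggerStats
    captured: dict = field(default_factory=dict)    # CRL: trigger name -> TriggerStats

    def on_deliberate_activation(self, ish, bias):
        """Record a triggered sequence.

        `ish` is assumed to be the ordered list of behavior names already
        executed in the current plan; an empty list means this behavior is
        the first step and therefore has no trigger schema.
        """
        if not ish:
            return None
        trigger = ish[-1]  # most recent behavior in the plan
        table = self.captured if trigger in self.captured else self.candidates
        stats = table.setdefault(trigger, TriggerStats())
        stats.pair_count += 1
        stats.bias_sum += bias
        return trigger

    def promote(self, trigger):
        """Upgrade a trigger schema from the CCRL to the CRL."""
        self.captured[trigger] = self.candidates.pop(trigger)
```

In this sketch the decision of when to call `promote` is left open; it depends on the capture criterion in use.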

Each behavior maintains its own CRL and monitors the status of the associated trigger schemas through LogMemory. When a trigger schema becomes active, the behavior primes itself for activation in anticipation of being next to execute, as seen in Figure 9. When priming occurs, the Routine Self Excitation (RSE) value increases, raising the overall Activation Level of the behavior. The amount by which the RSE increases is proportional to the average intentional bias (IBavg) received from the Intentional Bus.

The process by which a trigger schema, symbolizing part of a sequence, achieves routine behavior status is important for the success of the system. Capturing sequences too quickly may result in undesired behavior by the robot, while capturing too slowly will make the whole process ineffective as relatively few routine behaviors will be learned. We have tested three different approaches to this problem, although other methods clearly exist:

1. Simple Thresholding: The sequence pairing must occur some minimum number of times before it is considered routine.

2. Pair Frequency Thresholding: In addition to the Simple Thresholding criteria, the behavior must follow the trigger schema in a significant proportion of seen plans. More specifically, the ratio of the number of times the behavior is activated following the trigger schema relative to the total number of times the trigger schema is active must pass some threshold. If the pairing occurs only sporadically, and the majority of the time the trigger schema is followed by some other behavior, then this pairing does not have a strong routine bond and is not captured.

3. Convergence Thresholding: In addition to the Simple Thresholding criteria, the recorded average intention value must stabilize or converge. This ensures that the RSE, which is calculated based on the average intention value, accurately simulates the intentional bias signal.
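The three criteria can be sketched as simple predicates. This is an illustrative Python sketch rather than the authors' code: the function names are hypothetical, the default count and ratio mirror the values reported for the experiments in Section 5, and the convergence test (the running average changing by less than a tolerance over a short window) is one assumed interpretation of "stabilize or converge".

```python
def simple_threshold(pair_count, min_count=5):
    """Pairing has been observed at least `min_count` times."""
    return pair_count >= min_count


def pair_frequency_threshold(pair_count, trigger_count, min_count=5, min_ratio=0.75):
    """Simple Thresholding plus a minimum ratio of pairings to trigger activations."""
    if trigger_count == 0:
        return False
    return (simple_threshold(pair_count, min_count)
            and pair_count / trigger_count >= min_ratio)


def convergence_threshold(bias_history, min_count=5, window=3, tolerance=1.0):
    """Simple Thresholding plus a stable running average of the intentional bias.

    `bias_history` lists the bias values received for this pairing, in order.
    The window size and tolerance are assumptions, not values from the paper.
    """
    if not simple_threshold(len(bias_history), min_count):
        return False
    averages, total = [], 0.0
    for i, bias in enumerate(bias_history, start=1):
        total += bias
        averages.append(total / i)  # running average after each observation
    recent = averages[-window:]
    return max(recent) - min(recent) < tolerance
```

For example, a pairing seen 6 times out of 8 trigger activations passes `pair_frequency_threshold` with the defaults, while one seen 5 times out of 10 does not.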

Once the routine has been captured, the Routine Self Excitation value takes over the role of the intentional bias in generating the behavior sequence. It is therefore natural that the RSE is calculated based on the average intentional bias value. The intentional bias average (IBavg) is maintained for each behavior pair individually, since the strength of the bias signal conveys the strength or importance of the sequence pairing. The following methods have been tested for calculating the RSE value:

1. Static RSE: The RSE is equal to IBavg.

2. Likelihood RSE: The RSE is equal to IBavg scaled by the likelihood of the routine pairing occurring. This results in a higher Self Excitation value for more likely pairings, increasing the probability of their occurrence.

3. Stochastic RSE: The RSE value is calculated using a stochastic method based on the likelihood of the pairing occurring. Note that in the current implementation the individual behavior schemas share only a limited amount of information; specifically the Activation Levels and RSE values are not shared. Therefore, the RSE value is probabilistically chosen between the full IBavg value and the value of IBavg scaled by the likelihood of the routine pairing occurring. Other stochastic approaches could also be applied.
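The three calculations can be written down as a short Python sketch. The function names are hypothetical, and for the stochastic variant the text does not state the probability with which the full value is chosen, so the `p_full` parameter below is purely an assumption.

```python
import random


def static_rse(ib_avg):
    """Static RSE: the average intentional bias itself."""
    return ib_avg


def likelihood_rse(ib_avg, likelihood):
    """Likelihood RSE: average bias scaled by the pairing likelihood (0..1)."""
    return ib_avg * likelihood


def stochastic_rse(ib_avg, likelihood, rng=random, p_full=0.5):
    """Stochastic RSE: pick either the full or the likelihood-scaled value.

    `p_full` is the (assumed) probability of returning the full IBavg;
    the source does not specify how this choice is weighted.
    """
    if rng.random() < p_full:
        return ib_avg
    return ib_avg * likelihood
```

With an average bias of 100 and a pairing likelihood of 0.75, the static method yields 100 and the likelihood method yields 75, while the stochastic method returns one of those two values on each call.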

The final issue that must be addressed is at what point a behavior sequence becomes truly routine. Despite learning the appropriate sequencing, anticipating activation and setting RSE, this goal is not achieved until deliberative planning is no longer directly involved in the execution of the sequence. To accomplish this, the Deliberative System must be notified when a routine is captured.

Once notified, the Deliberative System can choose to stop controlling the plan through intention and shift to an attentional method that merely triggers the sequence rather than constantly overseeing it (Ulam & Arkin, 2006). This process can be compared to a parent teaching a child to ride a bicycle and deciding when to let go of the seat. We can imagine certain cases, plans involving dangerous activities for example, where the planner may never decide to rely completely on the routine behavior. However, in the majority of cases, the learned routine activity can be released from deliberation.

The Deliberative System learns of a routine’s capture by monitoring the Routine Self Excitation values of the behaviors. RSE values are set to zero unless the behavior is being primed for activation as part of a sequence. Note that if intentional bias is present, signaling that the sequence is still under deliberative control, the RSE is also set to zero to avoid increasing the Activation Level by double the desired amount. In this case, however, the RSE value which would have been present is still sent to the Deliberative System, signaling that the sequence is a captured one.
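The gating rule in this paragraph can be summarized in a few lines of Python. This is a sketch under assumptions: the function name and the two-value return convention are hypothetical, and intentional bias is treated as "present" whenever it is greater than zero.

```python
def effective_rse(primed, rse_value, intentional_bias):
    """Return (rse_applied_to_activation_level, rse_reported_to_deliberative_system).

    - Not primed: both values are zero.
    - Primed under deliberative control (bias present): the RSE is withheld
      from the Activation Level to avoid double counting, but is still
      reported so the Deliberative System can see the routine is captured.
    - Primed with no bias: the RSE is applied and reported.
    """
    if not primed:
        return 0.0, 0.0
    if intentional_bias > 0:
        return 0.0, rse_value
    return rse_value, rse_value
```

For instance, a primed behavior with an RSE of 75 and an intentional bias of 100 contributes nothing extra to its Activation Level, yet the value 75 is still visible to the Deliberative System.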

The intentional bus provides a mechanism for the Deliberative System to monitor the status of all schemas. This enables it to observe whether the intended plan sequence did indeed proceed in the required order, ensuring that the outcome is still correct even in the absence of additional intentional bias. It is desirable that the system monitor the progress of the plan for some interval in order to make sure that the sequence has been learned properly and continues to be executed correctly. In the case that it has not, the Deliberative System resumes executing the plan through intention. If the routine sequence performs well, however, monitoring can be reduced or eliminated entirely. Details about the Deliberative System and the implementation of the intentional bias mechanism appear in (Ulam & Arkin, 2006).

4.3 External Stimuli during Routine Behavior Execution

Since the activation level is dependent upon external stimuli, it is possible that an external stimulus will increase the AL of an unrelated behavior during the execution of a routine sequence. For example, if the robot recognizes a human face while executing a soccer sequence, the activation level of a dialogue behavior may rise. Whether routine task execution ought to be interrupted by this occurrence should be determined by such factors as the importance of the routine behavior sequence, the relative desire for interaction, and personality preferences of the robot. In our system, the intentional bias mechanism is used to control this process. A series of experiments by Ulam and Arkin (2006) demonstrates the robot’s ability to ignore or attend to such distractions based on the strength of the intentional bias signal.

Figure 13: Behavioral tree used for serialization experiments.

5 Experiments and Results

Several experiments were designed to test the ability of the proposed method to capture and execute routine behaviors. In each experiment, the same behavior sequence was used, which required QRIO to emulate attending a music class. In the music class activity, the robot must go to the class location, locate and play a musical instrument, and sing. All experiments and tasks were performed by the real robot.

The behavior tree used in each experiment can be seen in Figure 13. In addition to the four behaviors that make up the behavior sequence which we wish the robot to learn (GoToClass, FindBell, RingBell and Sing), it contains two additional behaviors, Sleep and Soccer, which compete for activation.

In all experiments, the Pair Frequency Thresholding method was used. The behavior sequence had to be experienced 5 times, and each pairing had to occur at least 75% of the time for the behavior to be captured. Comparable experimental results have been achieved with the other thresholding approaches and metrics. The Likelihood RSE method was used in Experiments 1-3 and the Stochastic RSE method in Experiment 4.

Figure 14: Summary of three experimental setups, shown in panels (a), (b) and (c). Each panel displays the plans executed by the Deliberative System, the percentage of the time each plan was used, and the internal state of the Captured Routine List once the sequence was learned. The CRL of each behavior lists the trigger schemas for that behavior, the average intention value used to activate the behavior, and the fraction of the time the trigger schema is followed by the behavior.

5.1 Experiment 1

The goal of the first experiment was to test the system’s ability to learn a frequently repeated behavior sequence and execute it as a routine behavior. The experiment consisted of QRIO executing behaviors under the control of the normal action selection mechanism. At some intervals this was interrupted by the Deliberative System, which executed the planned music class sequence. For each behavior in the plan, the intentional bias was set to 100, a value selected to be high enough to guarantee that the planned sequence was never interrupted.

Figure 14(a) provides an overview of the experiment. The left side of the figure shows the plan that was executed by the Deliberative System. The right side of the figure lists the contents of the Captured Routine List of each behavior upon the completion of the routine capture. Observe that the GoToClass behavior, since it is first in the sequence, does not have a trigger schema. Each of the other three behaviors learns its connection to the previous behavior in the plan.

Since the sequence was always executed to completion in a consistent manner, and with no other plans present, each sequence pair was captured at approximately the same time. Upon the completion of the capture, the Deliberative System shifted to using an attentional signal instead of deliberative intentional bias. During the next scheduled execution of the music class sequence, the Deliberative System initiated the behavior using attention by sending a short burst of intentional bias to the GoToClass schema, and then entered a monitoring state for the duration of the sequence’s execution. The remainder of the music class sequence was successfully executed entirely through Routine Self Excitation.

Figure 15: Graphs of the Activation Levels of a sequence of four behaviors executed (a) by the Deliberative System and (b) as a routine behavior. AL values of two other competing behaviors are also present for demonstration purposes. Two bars below each graph are used to indicate the active and primed behaviors.

Figure 15 compares the AL values of the behavior sequence when executed by the planner and as a learned routine behavior. As the plot lines can be difficult to distinguish, symbols on two bars below each figure are used to indicate the active and primed behavior for each time segment.

Figure 15(a) shows the Activation Levels of the system as the planner executes the music class sequence. The robot is initially executing a Sleep behavior, which is interrupted by the Deliberative System. The planned behaviors are then executed one by one. The overall shape of the AL curves of the sequence closely resembles the idealized curves seen earlier in Figure 11. Upon the completion of the sequence the robot resumes the Sleep behavior.

Figure 15(b) displays the internal state of the system as the robot executes the music class sequence as a routine behavior. This sequence also begins with the robot sleeping, which is again interrupted by the Deliberative System, this time through an attentional instead of an intentional signal. This is characterized by the short spike in the Activation Level of the GoToClass behavior. As the GoToClass behavior becomes active, we see an immediate response from the FindBell behavior, whose Activation Level rises as it is primed for activation. As the robot reaches the class and the GoToClass behavior completes, FindBell is activated and RingBell is primed. This results in the step-like graph characteristic of Activation Levels of routine behavior sequences, similar to the idealized graph in Figure 9.

5.2 Experiment 2

The second experiment was performed to test the system’s response to variable order plans. Specifically, we are interested in plans that have a fixed number of subtasks that must be achieved, but where the order of some or all of the subtasks is not fixed. These types of plans are fairly common and cover a wide range of activities. In our experiment, we chose to leave the order of QRIO’s class activities unspecified. Upon arriving at class QRIO could choose to sing or ring the bell in either order, although both activities had to be completed.

Figure 14(b) summarizes the experimental setup. The left side of the figure displays the two plans executed by the planner. In this experiment no preference was given to either plan and they were executed an equal proportion of the time. On the right of the figure we get another view of the CRL. GoToClass again has no trigger schema, and RingBell remains unchanged because this behavior always follows after the robot finds the bell. The FindBell and Sing behaviors now both have two trigger schemas. The FindBell behavior will sometimes occur following GoToClass and at other times following Sing, and the CRL expresses both of these possibilities while also keeping track of the likelihood of each pairing based on previous history.

Figure 16 tracks the Activation Level values over the course of the sequence. Plots of the behaviors not involved in the music class sequence have been omitted for clarity. Initially the Activation Levels of FindBell and Sing are very close, separated only by the noise parameter. As GoToClass is activated, both behaviors are primed through RSE. Because both plans were equally likely and equal intention levels were used for both behaviors, the RSE values in this case are identical and both behaviors remain close in activation. The noise parameter remains the only separating factor between these two behaviors, and upon the completion of GoToClass the FindBell behavior is activated. Since FindBell is always followed by ringing, RingBell becomes primed at this time while Sing returns to its rest level value. Sing is primed again during the RingBell behavior and is finally activated as the last behavior in the sequence.

Figure 16: Graph of the Activation Levels of a routine sequence resulting from equal preference variable order plans. Both FindBell and Sing are primed during the active phase of the GoToClass behavior. Both behaviors have equal initial motivation levels and random noise breaks the tie.

The internal state of the robot, its internal desires or moods, plays an important role in behavior selection and therefore also in sequence execution. Figure 17 presents another set of results from the same experiment where the outcome was affected by the internal state of the robot. In this case the robot’s desire to sing is much greater starting out than its desire to ring the bell. This is apparent from the large difference in Activation Levels of the two behaviors before the sequence begins. As GoToClass is activated, both FindBell and Sing are again primed by the same amount, but the gap between their AL values remains due to the internal state of the robot at this time. This leads to Sing being activated upon arriving in class, followed by the bell ringing behavior. This example demonstrates the ability of the system to express the internal state and preferences of the robot.

Figure 17: Graph of the Activation Levels of a routine sequence resulting from equal preference variable order plans. Both FindBell and Sing are primed during the active phase of the GoToClass behavior. The Sing behavior has a higher initial motivation and is therefore selected.

Figure 18: Graph of the Activation Levels of a routine sequence resulting from biased preference variable order plans. Both FindBell and Sing are primed during the active phase of the GoToClass behavior. The preferred behavior, FindBell, has a higher RSE value, causing it to be selected.

5.3 Experiment 3

The third experiment was based on the same setup as Experiment 2, but one of the plans was given preference over the other. As can be seen in Figure 14(c), executing bell ringing before singing was preferred and 75% of the executed plans used this order. The CRL, which tracks the likelihood of each pairing occurring, reflects this preference in the FindBell and Sing behaviors.

Figure 18 shows the Activation Level plot for this sequence. Initially the AL values for FindBell and Sing are again very close. Just as we saw in the previous experiment, as the GoToClass behavior becomes activated both FindBell and Sing are primed. However, under the Likelihood RSE method the RSE is scaled by the expected probability of a pairing occurring, which results in different levels of excitation for the two behaviors. Since FindBell is more likely to follow GoToClass, its excitation level is greater. This results in the bell ringing behavior being executed before singing.

It is important to note the amount of separation between the primed AL values of FindBell and Sing, and to compare this to the separation at the beginning of Figure 17. In Figure 17 there is a strong internal desire associated with the Sing behavior but not with FindBell. This causes a large separation in the AL values of the two behaviors which is greater than the separation due to RSE in Figure 18. This leads to the conclusion that a preference for one plan ordering over another will indeed bias the behavior selection towards preferring that order, but without eliminating the occurrence of the less preferred sequence. If QRIO again experiences a strong desire to Sing and little or no desire for FindBell, then, as can be seen in Figure 19, the less preferred behavior sequence will indeed occur.

5.4 Experiment 4

The final experiment utilizes the routine sequence captured in Experiment 3, but the Routine Self Excitation is calculated using the Stochastic instead of the Likelihood method. The Stochastic method aims to generate a distribution of AL values that resembles the plan or pairing distribution. Specifically, in the case where the initial AL values of two behaviors are very close, the goal is to ensure that each behavior has a probability of being activated that is proportional to the likelihood of the pairing.

Figure 19: Graph of the Activation Levels of a routine sequence resulting from biased preference variable order plans. Both FindBell and Sing are primed during the active phase of the GoToClass behavior. The Sing behavior has a higher initial motivation, and although it is not the preferred behavior, its overall Activation Level exceeds that of the FindBell behavior.

The resulting AL graph can be seen in Figure 20. The most notable difference occurs during the time when the GoToClass behavior is active. In this case, Sing is the less likely behavior, but its Activation Level exceeds that of FindBell approximately 28% of the time, roughly proportional to the likelihood of the GoToClass-Sing pairing. This ensures that, all else being equal, all pairings will have a probability of being selected that is proportional to the likelihood of the pairing as observed under the control of the Deliberative System. Of course, in the case that a preference for one activity over the other exists due to the internal state, it will play a strong role in behavior selection as was previously demonstrated.

Figure 20: Graph of the Activation Levels of a routine sequence resulting from biased preference variable order plans, demonstrating the Stochastic RSE calculation method.

6 Discussion and Conclusions

This work demonstrates a method by which sequential tasks executed through deliberation can become routine, such that the mechanism of their execution is shifted to the reactive behavior level and deliberative planning is no longer involved in the execution of the sequence. This allows the separation of routine and non-routine activities, while reducing the load on the planner and freeing system resources.

The experimental results presented above demonstrate the flexibility of the routine capture system, as well as many of its strengths. This method allows an unlimited number of behavior pairings to be captured into routines, enabling the robot to learn and perform sequenced tasks in a natural and ordered manner. Numerous behaviors can be linked to each trigger schema, allowing the same behavior to be reused in many plans. Which behavior is selected for activation depends not only on the observed experiences, but also on the current internal state of the robot and the external stimuli. As a result, the routine behavior execution builds upon the robot’s ability to express its desires instead of suppressing it. Over time, this leads to more interesting behavior combinations that vary depending on the robot’s environment, a key property for long-term robot interaction.

Several interesting and important extensions to this model are left to consider. One natural extension to examine is using a richer and more flexible trigger schema representation. Whereas the current algorithm only looks one step back in the history, it could be extended to trace further back along the sequence of past behaviors. This would enable the system to differentiate between the sequences A → B → C and D → B → E, where just knowing that B is active does not provide enough information to determine whether E or C is more appropriate as the next behavior. In this case, the ambiguity can be eliminated by checking further back in the history. This extension would result in more accurate and reliable execution of the routine behavior sequences.
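The ambiguity in the A → B → C and D → B → E example can be made concrete with a small Python sketch. It is only an illustration of the proposed extension, not part of the implemented system: the `build_table` helper and the use of tuples of preceding behaviors as lookup keys are assumptions.

```python
from collections import Counter, defaultdict


def build_table(plans, depth):
    """Count which behavior follows each context of up to `depth` preceding behaviors."""
    table = defaultdict(Counter)
    for plan in plans:
        for i in range(1, len(plan)):
            context = tuple(plan[max(0, i - depth):i])  # the last `depth` steps before i
            table[context][plan[i]] += 1
    return table


plans = [["A", "B", "C"], ["D", "B", "E"]]

one_step = build_table(plans, depth=1)
two_step = build_table(plans, depth=2)

# Looking one step back, B alone is followed by both C and E (ambiguous).
print(dict(one_step[("B",)]))
# Looking two steps back, each context has a single successor.
print(dict(two_step[("A", "B")]), dict(two_step[("D", "B")]))
```

The trade-off in such a design is that deeper contexts need more observations before any single context has been seen often enough to satisfy a capture threshold.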

Another question of interest is whether the robot should be allowed to unlearn, or forget, previously captured routines. Such an extension seems natural as the environmental features or the robot’s tasks may permanently change, resulting in some learned sequences becoming useless or inappropriate. The implementation and analysis of this mechanism have been left for future work.

While many directions are left to be explored, the system presented here is already the first of its kind to autonomously perform the capture and execution of higher level behavior sequences. The implementation and execution of this complete system on an autonomous humanoid robot brings us a step closer to achieving the goal of natural and varied long-term human-robot interaction.

7 Acknowledgments

This research was conducted while the authors were visiting the Sony Intelligence Dynamics Laboratories (SIDL). We would like to thank Kazumi Aoyama, Hideki Shimomura, and the other members of the SIDL development team for their invaluable help with this project. We also thank Masahiro Fujita and Toshitada Doi for making this collaboration possible.


References

Arkin, R., Fujita, M., Takagi, T., & Hasegawa, R. (2003). An ethological basis for human-robot interaction. Robotics and Autonomous Systems, 42 (3-4).

Badgalyan, R. (2000). Executive control, willed actions, and nonconscious processing. Human Brain Mapping, 9 (38), 38-41.

Cisek, P. (2001). Embodiment is all in the head. Behavioral and Brain Sciences, 24 (1), 36-38.

Cleeremans, A. (1993). Mechanisms of implicit learning: connectionist models of sequence processing. Cambridge, MA, USA: MIT Press.

Cooper, R., & Glasspool, D. (2001). Learning action affordances and action schemas. Connectionist Models of Learning, Development, and Evolution, 133-142.

Cooper, R., & Shallice, T. (1997). Modeling the selection of routine action: Exploring the criticality of parameter values. In Proceedings of the 19th annual conference of the cognitive science society (p. 130-135).

Cooper, R., & Shallice, T. (2000). Contention scheduling and the control of routine activities. Cognitive Neuropsychology, 17 (4), 297-338.

Cooper, R., Shallice, T., & Farringdon, J. (1995). Symbolic and continuous processes in the automatic selection of actions. Hybrid Problems, Hybrid Solutions, 27-37.

Fujita, M., Kuroki, Y., Ishida, T., & Doi, T. (2003). Autonomous behavior control architecture of entertainment humanoid robot SDR-4X. In IEEE/RSJ international conference on intelligent robots and systems. Las Vegas, USA.

Garforth, J., Meehan, A., & McHale, S. (2000). Attentional behavior in robotic agents. In Proceedings of the international workshop on recent advances in mobile robots.

Hoshino, Y., Takagi, T., Profio, U., & Fujita, M. (2004, April). Behavior description and control using behavior module for personal robot. In Proceedings of the IEEE international conference on robotics and automation.

Houghton, G., & Hartley, T. (1995). Parallel models of serial behaviour: Lashley revisited. Psyche, 2 (25).

Keele, S., Ivry, R., Mayr, E., Hazeltine, E., & Heuer, H. (2004). The cognitive and neural architecture of sequence representation. Psychological Review, 316-339.

Lashley, K. S. (1951). The problem of serial order in behavior. In L. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley.

Norman, D. A., & Shallice, T. (1986). Attention to action: Willed and automatic control of behavior. In R. Davidson, G. Schwartz, & D. Shapiro (Eds.), Consciousness and self regulation: Advances in theory and research (Vol. 4, p. 515-549). Academic Press.

Prescott, T., Gurney, K., & Redgrave, P. (2003). Basal ganglia. In The handbook of brain theory and neural networks (Second ed.). MIT Press.

Sawada, T., Takagi, T., & Fujita, M. (2004, September). Behavior selection and motion modulation in emotionally grounded architecture for QRIO SDR-4X II. In IEEE/RSJ international conference on intelligent robots and systems. Sendai, Japan.

Sawada, T., Takagi, T., Hoshino, Y., & Fujita, M. (2004, November). Learning behavior selection through interaction based on emotionally grounded symbol concept. In IEEE-RAS/RSJ international conference on humanoid robots.

Sun, R., & Giles, L. (2001). Sequence learning: Paradigms, algorithms, and applications. Springer.

Tanaka, F., Noda, K., Sawada, T., & Fujita, M. (2004). Associated emotion and its expression in an entertainment robot QRIO. In International conference on entertainment computing (p. 499-504).

Ulam, P., & Arkin, R. (2006). Biasing behavioral activation with intent (Tech. Rep. No. GIT-GVU-06-11). GVU Center, Georgia Institute of Technology.
