

Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August 12-17, 2007

Robotic Architecture Inspired by Behavior Analysis

Abstract- Social robots are embodied agents that are part of a heterogeneous group: a society of robots or humans. They are able to recognize human beings and each other, and engage in social interactions. They possess histories and they explicitly communicate and learn from interactions. The construction of social robots may strongly benefit from using a robotic architecture. However, a robotic architecture for sociable robots must have structures and mechanisms to allow social interaction control and learning from the environment. In this paper, we propose a robotic architecture inspired by Behavior Analysis. Methods and structures of the proposed architecture are presented and discussed. The architecture was evaluated on a Skinner box simulator and the obtained results show that the architecture is able to produce appropriate behavior and to learn from social interaction.

I. INTRODUCTION

Social robots are embodied agents that are part of a heterogeneous group: a society of robots or humans. They are able to recognize human beings and each other, and engage in social interactions. They possess histories (they perceive and interpret the world in terms of their own experience), and they explicitly communicate and learn from interactions [1] [2].

There are several scientific and practical motivations for developing social robots as platforms for research, education and entertainment. Socially interactive robots are important for domains in which robots must exhibit peer-to-peer interaction skills, either because such skills are required for solving specific tasks, or because the primary function of the robot is to interact socially with people. One area where social interaction is desirable is that of the "robot as persuasive machine", where the robot is used to change the behavior, feelings or attitudes of humans. This is the case when robots mediate human-human interaction, as in autism therapy. Additionally, some researchers design socially interactive robots simply to study embodied models of social behavior [2] [3].

The construction of social robots may strongly benefit from using a robotic architecture. Several robotic architectures have been proposed in the literature [4] [5] [6]. However, a robotic architecture for sociable robots must have structures and mechanisms to allow social interaction control and learning from the environment. The implementation of learning processes evidenced in Behavior Analysis can lead to the development of promising methods and structures for the construction of social robots that are able to learn through interaction with the environment and to exhibit appropriate social behavior.

Claudio A. Policastro is with the Department of Computer Science, University of Sao Paulo, Sao Carlos, Brazil; email: [email protected]. Roseli A. F. Romero is with the Department of Computer Science, University of Sao Paulo, Sao Carlos, Brazil; email: [email protected]. Giovana Zuliani is with the Post-Graduation Program on Special Education, Federal University of Sao Carlos, Sao Carlos, Brazil; email: [email protected].

This article reports an ongoing work aimed at developing a robotic architecture inspired by Behavior Analysis. The built-in methods and structures allow the exhibition of appropriate social behavior and learning from interaction. This work is the basis for developing a tool for the construction of social robots. The proposed architecture was empirically evaluated in the context of a Skinner box simulator. Results show that the architecture is able to produce appropriate behavior and to learn from interaction. Future work includes experiments with a real robotic head interacting with human beings, in order to evaluate the learning capabilities of the architecture on a non-trivial real problem: the learning of shared attention.

This article is organized in the following way. In Section II we briefly introduce Behavior Analysis, the base theory of this work. In Section III we describe the Skinner box experiment, the domain used for the preliminary evaluation of the proposed architecture by simulation of this classical experiment. In Section IV we detail the proposed architecture. In Section V the main results from a set of experiments carried out to evaluate the proposed architecture are discussed. Finally, in Section VI we present our conclusions.

II. BEHAVIOR ANALYSIS

This section briefly introduces Behavior Analysis, the base theory of this work. Behavior is a primary characteristic of living things. Some behavior makes sense in terms of the events that precede it; other behavior makes more sense in terms of the events that follow it. Behavior guided by its consequences was called, by Skinner, operant behavior [7]. The word operant refers to an essential property of goal-directed behavior: that it has some effect on the environment. The term control is sometimes also used to refer to operant behavior, as behavior that is controlled by its consequences [7] [8]. The consequences of behavior may feed back into the organism, affecting a behavior positively or negatively. When they do so, they may change the probability that the behavior which produced them will occur again. This probability refers to tendencies or predispositions to behave in particular ways. The change in the frequency with which a response is emitted by an individual is the process of operant conditioning.

In operant conditioning we strengthen (reinforce) an operant in the sense of making a response more probable or, in actual fact, more frequent [7]. Operant conditioning may be described without mentioning any stimulus which acts before the response is made. Stimuli are always acting upon

1-4244-1380-X/07/$25.00 ©2007 IEEE


an organism, but their functional connection with operant behavior is not like that in the reflex. Operant behavior, in short, is emitted, rather than elicited. It must have this property if the notion of probability of response is to make sense. Most operant behavior, however, acquires important connections with the surrounding world. Those connections may be shaped by submitting an individual to a contingency or to histories of reinforcement. We describe a contingency by saying that a stimulus is the occasion upon which a response is followed by reinforcement. The process through which this comes about is called discrimination. Its importance in a theoretical analysis, as well as in the practical control of behavior, is obvious: when a discrimination has been established, we may alter the probability of a response by presenting or removing the discriminative stimulus [7] [8].

Operant behavior almost necessarily comes under stimulus control, since only a few responses are automatically reinforced by the organism's own body without respect to external circumstances. Reinforcement achieved by adjusting to a given environment almost always requires the sort of physical contact which we call stimulation. The environmental control has an obvious biological significance. If all behavior were equally likely to occur on all occasions, the result would be chaotic. It is obviously advantageous that a response occur only when it is likely to be reinforced [7].

One develops a behavior because visual stimulation from an object is the occasion upon which certain responses of walking, reaching, and so on lead to particular tactual consequences. The visual field is the occasion for effective manipulatory action. The contingencies responsible for the behavior are generated by the relations between visual and tactual stimulation characteristic of physical objects. Other connections between the properties of objects supply other sorts of contingencies which lead to similar changes in behavior [7].

III. SKINNER BOX DOMAIN

This section presents the Skinner box domain. This domain was used to preliminarily evaluate the proposed architecture, by simulating this classical experiment. A Skinner box typically contains one or more levers which an animal can press, one or more stimulus lights and one or more places in which reinforcement stimuli like food can be delivered. The animal's presses on the levers can be detected and recorded, and a contingency between these presses, the state of the stimulus lights and the delivery of reinforcement can be set up, all automatically. It is also possible to deliver other reinforcement stimuli such as water, or to produce punishments like an electric shock through the floor of the chamber. If the box is programmed so that a single lever press causes a pellet to be dispensed, followed by a period for the rat to eat the pellet when the discriminative stimulus light is out and the lever inoperative, then the rat may learn to press the lever if left to its own devices for long enough. With the box, we can initially shape the rat's behavior so that it learns to eat from the hopper. Instead of rewarding the rat for producing the exact behavior we require (lever pressing), it is rewarded whenever it performs a behavior which approximates lever pressing. The closeness of the approximation to the desired behavior required in order for the rat to get a pellet is gradually increased so that eventually it is only reinforced for pressing the lever. We start by reinforcing the animal whenever it is on the side where the lever is. After this, reinforcement occurs only if its head is pointing towards the lever, then later only when it approaches the lever, when it touches the lever with the front half of its body, when it touches the lever with its paw, and so on, until the rat is pressing the lever in order to obtain the reinforcer. The rat may still not have completely learned the operant contingency; specifically, it may not yet have learned that the contingency between the operant response and reinforcement is signalled by the light. If we now leave the rat to work in the Skinner box on its own, it will soon learn this and will only press the lever when the light is on.
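The contingency just described (a lever press is reinforced only while the discriminative light is on) can be sketched as a minimal simulator. This is an illustrative toy, not code from the paper; all names are assumptions:

```python
# Minimal sketch of the Skinner-box contingency: a lever press delivers
# a food pellet only while the discriminative light is on. Names and
# structure are illustrative assumptions, not the paper's simulator.

class SkinnerBox:
    def __init__(self):
        self.light_on = False
        self.pellets_delivered = 0

    def press_lever(self):
        """Return True if the press was reinforced with a pellet."""
        if self.light_on:
            self.pellets_delivered += 1
            return True
        return False

box = SkinnerBox()
box.light_on = True
assert box.press_lever()        # reinforced: light is on
box.light_on = False
assert not box.press_lever()    # not reinforced: light is off
```

The rat-facing shaping procedure described above would then amount to gradually tightening the condition under which `press_lever` style events are reinforced.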

IV. PROPOSED ARCHITECTURE

In this section we detail the proposed robotic architecture, composed of mechanisms and structures evidenced from Behavior Analysis. The architecture simulates an individual's operant conditioning through histories of reinforcement. It is composed of three main modules: stimulus perception, response emission and consequence control. The stimulus perception module employs data acquisition algorithms and a vision system. This module detects the state of the environment and encodes this state using an appropriate representation. The response emission module is composed of a learning mechanism that constructs a nondeterministic policy for response emission, that is, which response is to be emitted in the presence of a certain antecedent stimulus. The consequence control module is composed of a motivational system that simulates internal necessities of the robot and detects reinforcements received from the environment. The motivational system is formed by necessity units that are implemented as a simple perceptron with recurrent connections [9]. Those necessity units simulate the homeostasis of a living organism. A positive value of a necessity unit, above a predefined threshold, indicates the privation of the robot with respect to a certain reinforcement stimulus. In this way, the architecture supplies mechanisms to simulate privation states and satisfaction of necessities, and a mechanism to determine reinforcements as consequences of an emitted response. Figure 1 illustrates the general organization of the proposed architecture and the interaction between the three main modules.

In the following, we describe the main structures and methods of the proposed architecture.

A. Knowledge Representation

The knowledge representation adopted for the proposed architecture is based on First Order Logic [10], enabling the representation of large spaces in an economical way. The stimuli detected from the environment are represented as terms or objects that may have properties like color, size, shape and quantity. Those terms represent relevant objects


Fig. 1. General organization of the architecture. Arrows indicate the flow of information in the three modules of the architecture. The circles indicate the methods and component structures of the modules. The Stimulus Perception Module encodes stimuli from the environment. Those stimuli are then used by the Consequence Control and Response Emission Modules for learning appropriate behaviors.

from the real world. Properties of the objects are set by the Stimulus Perception Module, in order to encode the current environment state. In the domain selected for the experiments reported in this paper (the Skinner box), stimuli are sides and corners of the box (ne corner, se side, etc.), the bar, the light and the reinforcement stimulus food. The environment state is encoded employing perception predicates: see(X), hear(Y), at(Z) and smell(W). The perception predicates relate all detected stimuli to build a representation of the current environment state. Robot responses are encoded in the architecture as action predicates. These predicates represent predefined motor scripts that should be executed to emit a selected response. The action predicates are press(X) and explore(Z), representing the rat's responses or behaviors. The proposed architecture encodes knowledge in the form of behavior rules and constraint rules. Behavior rules encode knowledge about appropriate behavior learned by the architecture, having the following general form:

               fitness
    stimulus ---------> response
                 S^D

where the antecedent part is a set of stimuli representing an environment state and the consequent part is the response to be emitted by the robot. The value above the connector is a fitness value that indicates the probability of rule execution when the antecedent part is satisfied. The value below the connector indicates the reinforcement stimulus, if any, to be obtained as a consequence of the rule execution. Both values are employed by the response selection mechanism to build the selection roulette. Constraint rules are employed to indicate when certain responses can or cannot be emitted, having the following general form:

stimulus -> can(response)

where the antecedent part is a set of stimuli representing an environment state and the consequent part is the response that can be emitted only if the antecedent part is satisfied. These rules are employed by the architecture to constrain the emission of some responses.
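As an illustration of this rule encoding, a behavior rule can be stored as an antecedent stimulus set, a response, a fitness value and an expected reinforcer, while a constraint rule gates when a response may be emitted. The sketch below is an assumption about one possible data structure, not the paper's implementation:

```python
# Hypothetical encoding of behavior and constraint rules as Python
# data structures; predicate strings mirror the paper's notation.
from dataclasses import dataclass
from typing import Optional

@dataclass
class BehaviorRule:
    antecedent: frozenset             # e.g. {"at(et side)", "see(light on)"}
    response: str                     # e.g. "press(lever)"
    fitness: float = 0.001            # default fitness f_d
    reinforcer: Optional[str] = None  # reinforcement below the connector

    def matches(self, state: set) -> bool:
        # the rule fires only when every antecedent stimulus is present
        return self.antecedent <= state

@dataclass
class ConstraintRule:
    antecedent: frozenset
    response: str

    def allows(self, state: set, response: str) -> bool:
        # constrain only the response this rule is about
        return response != self.response or self.antecedent <= state

state = {"at(et side)", "see(light on)", "see(bar)"}
rule = BehaviorRule(frozenset(state), "press(lever)",
                    fitness=0.9, reinforcer="food")
assert rule.matches(state)
gate = ConstraintRule(frozenset({"see(bar)"}), "press(lever)")
assert gate.allows(state, "press(lever)")
assert not gate.allows({"at(nw corner)"}, "press(lever)")
```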

B. Working Memory

The proposed architecture employs a working memory to exchange information between the three main modules. This memory is used to keep information about stimuli (antecedents and consequents), the last emitted response and the internal necessities. Each element inserted in the working memory has a counter that keeps the notion of time. When a new element is inserted in the working memory, its age counter is set to zero, and it is incremented by 1 whenever new subsequent predicates are inserted. Thus, elements persist for a number of time steps in the memory. This mechanism is employed to control the chronology of facts and events, and to determine the three terms of a contingency.
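The age-counter mechanism can be sketched as follows; the expiry policy (dropping elements older than a fixed number of steps) is an assumption, since the paper only states that elements persist for a number of time steps:

```python
# Sketch of the working memory: each inserted element carries an age
# counter that starts at zero and is incremented whenever a new element
# is inserted. Elements older than max_age are dropped; the exact expiry
# policy is an assumption, not stated in the paper.

class WorkingMemory:
    def __init__(self, max_age=3):
        self.max_age = max_age
        self.elements = {}  # predicate -> age

    def insert(self, predicate):
        # age every existing element, then add the new one at age 0
        self.elements = {p: a + 1 for p, a in self.elements.items()
                         if a + 1 <= self.max_age}
        self.elements[predicate] = 0

wm = WorkingMemory(max_age=2)
wm.insert("see(light_on)")
wm.insert("at(et_side)")
wm.insert("press(lever)")
assert wm.elements["see(light_on)"] == 2
wm.insert("hear(pellet)")   # see(light_on) now exceeds max_age and is dropped
assert "see(light_on)" not in wm.elements
```

The relative ages give the chronology needed to link the three terms of a contingency: antecedent stimuli (oldest), emitted response, and consequence (newest).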

C. Motivational System

The operant behavior of higher animals depends on their motivation and on the value of the reward or punishment. An artificial motivational system may enable a robot to proactively interact with the environment, driving its behaviors to satiate its artificial internal necessities. The consequence control module is composed of a motivational system that simulates the internal necessities of an individual. The motivational system is based on the works presented by Breazeal [2] and Gadanho [11], and it has one or more necessity units implemented as a simple perceptron with recurrent connections [9]. The activation of a necessity unit is given by Equation (1).

    u = ( Σ_{j=1}^{n} w_j · i_j ) + w_r · i_r + b        (1)

where i_j is the input signal representing a stimulus detected from the environment, i_r is the signal from the recurrent connection, w_j are the connection weights of the input signals, w_r is the connection weight of the recurrent signal, and b is the bias of the unit. All weights and biases are empirically defined according to the necessity being simulated. The output of a necessity unit is given by Equation (2).

    y = 1 / (1 + e^(-u/δ))        (2)

where u is the activation value and δ is the sigmoid function inclination. A necessity unit simulates an internal necessity of an individual. Additionally, the motivational system has an output mediator that mediates activation among several necessity units, employing competition and an activation threshold. Figure 2 shows the general architecture of the motivational system.

The motivational system works as follows. Initially, the stimuli detected from the environment are sent to the consequence control module. Then, the Preprocessor encodes those stimuli to construct an appropriate input pattern.



Fig. 2. General architecture of the motivational system. The system is modeled as a competitive artificial neural network with recurrent connections. Stimulus is a set of input stimuli from the environment. Preprocessor encodes the input signals received from the environment into an appropriate form. Units i, j, ..., m represent an array of m necessity units that can be simulated by the system. Bias is the activation bias of each unit. The output y of each unit is given by y = f(u). Mediator selects the dominant necessity that is higher than a predefined threshold.

This input pattern may or may not be normalized, depending on the numeric interval of the selected connection weights and the problem domain. Afterwards, the necessity units calculate their activation values employing Equation (1), and their output values employing Equation (2). After this, the Mediator performs a competition among all unit outputs and selects the winner. The Mediator checks if the winner is higher than the activation threshold. If so, the motivational system outputs the active necessity. After this, the motivational system checks and reports whether any necessity unit has received a reinforcement, by checking the decrease of its output value.
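A single necessity unit following Equations (1) and (2) can be sketched as below. Because the printed formulas are partly garbled, the sigmoid in Equation (2) is assumed to be the standard form 1/(1 + e^(-u/δ)); the parameter values echo the hunger unit of Section V (input weights -1.50 and 0.10 for see(food) and smell(food), recurrent weight 1.00, bias 1.50, inclination 0.20):

```python
# Sketch of one necessity unit (Eqs. (1)-(2)); the sigmoid form is an
# assumed reading of the garbled original equation.
import math

def necessity_unit(inputs, weights, i_r, w_r=1.00, b=1.50, delta=0.20):
    # Eq. (1): weighted inputs plus recurrent signal plus bias
    u = sum(w * i for w, i in zip(weights, inputs)) + w_r * i_r + b
    # Eq. (2): sigmoid with inclination delta
    return 1.0 / (1.0 + math.exp(-u / delta))

# hunger unit: with no food stimuli the unit saturates near 1 (hungry);
# seeing food (negative weight) pulls the activation down (satiation)
hungry = necessity_unit([0.0, 0.0], [-1.50, 0.10], i_r=0.0)
fed = necessity_unit([1.0, 1.0], [-1.50, 0.10], i_r=0.0)
assert fed < hungry
```

Feeding the previous output back as i_r gives the recurrent behavior that lets a privation state persist across time steps.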

D. Contingency Learning

The proposed architecture is able to simulate learning of contingencies and stimulus discrimination from histories of reinforcement. Learning is carried out by a nondeterministic reinforcement algorithm [12] [13], by storing new behavior rules and updating the execution probability of existing ones.

Figure 3 shows the learning control algorithm.

During an interaction, the stimulus perception module acquires and codifies the environment state and deploys this coded state to the response emission and consequence control modules. Afterwards, the consequence control module checks the internal state of the robot and sets the active necessities, if there are any. Then, the architecture control enters a loop that may be finished at the end of an interaction or when the robot reaches its goal. In the loop, the response emission module uses the state and necessity information to select a response to be emitted by the robot. Response selection is done in a probabilistic way, based on the roulette-wheel selection method [14]. This method is also called stochastic sampling with replacement. The roulette-wheel selection algorithm provides zero bias, and the probability of a response being chosen is proportional to its fitness value. The probability distribution of this method is given by Equation (3).

Fig. 3. Architecture main control algorithm. After initiation, the algorithm enters a loop that performs response emission and learning while interacting with the environment.

    p_i = f_i / Σ_{j=1}^{n} f_j        (3)

where f_i and f_j are the fitness values of each response or behavior rule. All responses in the robot's repertory keep a default fitness value (f_d) that is predefined as a parameter of the architecture. This default fitness value, as well as the fitness values of the behavior rules, is employed to build the selection roulette. While selecting the appropriate behavior rules, the response selection method can increase a rule's fitness value by an influence rate if the rule satisfies an active necessity, or decrease it if the rule satisfies an inactive necessity. This influence rate (I) is given by the motivational system. It reflects the internal state of the robot and is the difference between the activation value of a necessity unit and the activation threshold. Therefore, the influence is positive if the necessity unit is active and negative if the necessity unit is inactive.
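The roulette-wheel draw of Equation (3), with the motivational influence rate I applied to rules that satisfy the active necessity, can be sketched as follows. The exact way I is combined with the fitness before the draw is an assumption about the mechanism described above:

```python
# Sketch of fitness-proportionate (roulette-wheel) response selection,
# Eq. (3): p_i = f_i / sum_j f_j. The additive application of the
# influence rate is an assumption; names are illustrative.
import random

def select_response(rules, influence=0.0, satisfies_active=None):
    # rules: list of (response, fitness) pairs
    satisfies_active = satisfies_active or set()
    adjusted = [(r, max(f + influence, 1e-9)) if r in satisfies_active
                else (r, f) for r, f in rules]
    total = sum(f for _, f in adjusted)
    pick = random.uniform(0.0, total)   # spin the roulette
    acc = 0.0
    for response, f in adjusted:
        acc += f
        if pick <= acc:
            return response
    return adjusted[-1][0]

random.seed(0)
rules = [("explore(se_side)", 0.003), ("press(lever)", 0.9)]
choices = [select_response(rules, influence=0.1,
                           satisfies_active={"press(lever)"})
           for _ in range(200)]
assert choices.count("press(lever)") > choices.count("explore(se_side)")
```

Because the draw is stochastic with replacement, low-fitness responses are still occasionally emitted, which preserves exploration.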

Afterwards, the selected response is emitted by executing a motor script. Then, the stimulus perception module acquires and encodes the new current environment state and sends it to the response emission module. The consequence control module propagates the encoded new state through the motivational system and checks the internal state of the robot and any reinforcement received as a consequence of the last emitted response.

If the last emitted response is not yet a rule, the learning algorithm links the three-term contingency (antecedent stimulus, last emitted response and consequence), storing this new knowledge as a new behavior rule. If the behavior rule already exists, the architecture updates its fitness using the perceived consequence of its execution. The fitness update is carried out employing the learning rule given by Equation (4).

function MainControl()
    State <- EncodeEnvironmentState()
    UpdateInternalState(State)
    Reinforcements <- CheckReinforcements()
    ActiveNecessities <- CheckNecessities()
    while Interaction Episode
        Response <- SelectResponse(ActiveNecessities)
        EmitResponse(Response)
        State <- EncodeEnvironmentState()
        UpdateInternalState(State)
        Reinforcements <- CheckReinforcements()
        ActiveNecessities <- CheckNecessities()
        if Last response is not a rule
            AntecedentState <- GetAntecedentState()
            CreateBehaviorRule(AntecedentState, Response, Reinforcements)
        else
            UpdateBehaviorRule(Reinforcements)
        end if
    end while
end function


    f_{n+1} = α_n · (P · C_r / C_n) + (1 - α_n) · f_n        (4)

where f_{n+1} is the new fitness value at the present time, P is the power of a reinforcement stimulus, C_r and C_n are the reinforcement and execution counters that represent, respectively, the number of reinforcements received and the number of executions of the rule, and α_n is a decreasing learning rate given by Equation (5).

    α_n = A                              if C_n ≤ N_interactions
    α_n = (A · N_interactions) / C_n     if C_n > N_interactions        (5)

where N_interactions denotes the minimum number of executions of a behavior rule before α decreases, and A is a learning constant, both set as parameters of the architecture. This decreasing learning rate allows the convergence of the algorithm to the optimal policy.

This function makes it possible to increase a fitness value when a behavior rule receives a reinforcement, and to decrease a fitness value, as a punishment, when a behavior rule does not receive a reinforcement. The fitness value f_n may vary in the range [f_d..∞], where f_d is the default fitness value. This parameter prevents the extinction of existing behavior rules. Additionally, this learning mechanism is tolerant to non-stationary environments.
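Under the reading of Equations (4) and (5) adopted here (the printed formulas are partly garbled, so this reconstruction is an assumption), the fitness update can be sketched as:

```python
# Sketch of the fitness update: f_{n+1} = a_n*(P*C_r/C_n) + (1-a_n)*f_n,
# with a learning rate that starts at A and decays once the rule has
# been executed more than N_interactions times. Parameter names follow
# the paper; the exact formulas are an assumed reading of the garbled
# originals.

def update_fitness(f_n, P, C_r, C_n, A=0.5, N_interactions=1, f_d=0.001):
    # Eq. (5): decreasing learning rate
    a_n = A if C_n <= N_interactions else A * N_interactions / C_n
    # Eq. (4): move toward the observed reinforcement ratio
    f_next = a_n * (P * C_r / C_n) + (1 - a_n) * f_n
    return max(f_next, f_d)  # fitness never falls below the default f_d

# a rule reinforced on every execution climbs toward P
f = 0.001
for n in range(1, 6):
    f = update_fitness(f, P=1.0, C_r=n, C_n=n)
assert 0.7 < f < 1.0
```

The `max(..., f_d)` floor reflects the statement that f_d prevents the extinction of existing behavior rules, and the decaying a_n gives the convergence and non-stationarity tolerance described above.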

In this way, the architecture is able to simulate the process of learning contingencies for an individual during an interaction with the environment.

V. RESULTS AND EVALUATION

In this section, the main results from an experiment carried out to evaluate the proposed architecture are presented. The experiment was carried out employing a simulator of a Skinner box. The objective of the experiment was to teach a rat to press a lever to obtain food only when the control light of the box is on. Teaching was done by controlling the stimuli present in the environment and reinforcing correct actions of the rat in a process of successive approximations.

In this experiment, the simulator was programmed with a food lever that drops pellets of food. The control light is on during the first 500 time units and is then switched on and off until the end of the experiment. The rat is able to explore any side of the box, to press the food lever and to perceive several stimuli like food, the lever, and the sides and corners of the box. The motivational system was configured with one necessity simulating hunger, and the input pattern was configured to recognize food as a reinforcement stimulus.

The architecture was set up as follows. The learning constant (parameter A) was set to 0.5. The minimum number of rule executions before the learning rate starts decreasing (parameter N_interactions) was set to 1. The default fitness value (parameter f_d) was set to 0.001. The activation threshold of the motivational system was set to 0.7. The sigmoid function inclination of the necessity unit (parameter δ) was set to 0.20. The bias of the unit was set to 1.50 and its weight was set to 0.15. The weight of the recurrent connection was set to 1.00. The weights of the input pattern (see(food) and smell(food)) were set to -1.50 and 0.10, respectively. The authors have found empirically that these parameters produced the best results for the experiment.

In the experiment, the rat was initially put in an arbitrary place and started to explore the entire box. When the rat explored the side near the lever, a pellet of food was delivered to it. This procedure was repeated five times to teach the rat that exploring the side near the lever is a good way to receive food. So, whenever the rat was hungry, it went near the bar. In one of these explorations, the rat incidentally pressed the lever and got food when the control light was on. This process was repeated some times, so the rat learned to press the lever to get food. Afterwards, when the rat was hungry, it tried to press the lever with the control light off. After some time performing this behavior, the rat learned that when the control light is off, food is not delivered by pressing the lever. So, the rat learned to press the lever to get food only when the control light is on. Figure 4 shows the rat's behavior throughout the simulation. The chart shows the influence of the motivational system on the behavior of the rat and response emission in appropriate contexts. As the motivational system output goes above the activation threshold, the rat becomes hungry and then explores the side near the lever. Then, it presses the lever to get food and satiate its necessity, but only when the control light is on.

This result shows that the architecture is able to control the behavior of a social agent and to emit appropriate behavior learned from interaction with the environment. The results also show that the architecture supports behavior shaping; that is, the agent's behavior can be shaped by successive approximations, reinforcing and chaining intermediate behaviors to form a more complex one. In the experiment, this was done by chaining the behavior of exploring the side near the lever with the behavior of pressing the lever. Behavior modeling and teaching are important procedures from Behavior Analysis that are employed to shape an individual's behavior. Therefore, the results indicate that the architecture is a potential tool to control sociable robots during interactions in a social environment. As a result of this experiment, seventy-nine behavior rules were created. Several behavior rules control the rat's exploration of the box, for example:
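A behavior rule of the kind listed below (antecedent, behavior, expected consequence, fitness) and its reinforcement update can be sketched as follows. The update formula is a plausible reading of the learning constant A, the Ninteractions decay, and the default fitness fd reported earlier; it is an assumption for illustration, not the paper's stated equation, and all field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class BehaviorRule:
    """antecedent -> behavior (-> expected consequence), with a
    fitness value that biases rule selection."""
    antecedent: frozenset        # e.g. {"at(et_side)", "see(light_on)"}
    behavior: str                # e.g. "press(lever)"
    consequence: str = ""        # e.g. "food"
    fitness: float = 0.001      # default fitness fd from the text
    executions: int = 0

A = 0.5             # learning constant
N_INTERACTIONS = 1  # executions before the learning rate starts decaying

def update_fitness(rule, reinforced):
    """Move fitness toward 1 when the rule's execution is reinforced
    and toward 0 otherwise, with a rate that decays after
    N_INTERACTIONS executions (assumed update, for illustration)."""
    rule.executions += 1
    rate = A / max(1, rule.executions - N_INTERACTIONS + 1)
    target = 1.0 if reinforced else 0.0
    rule.fitness += rate * (target - rule.fitness)
```

Under this sketch, a repeatedly reinforced rule climbs quickly from the default fitness of 0.001, while a rule that stops being reinforced decays, which is consistent with the fitness values listed for the rules below.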

at(ne_side) ↦ explore(se_side)   [fitness 0.003]

Some behavior rules were created to control exploration of the side near the lever (et_side) whenever the rat was hungry:


Fig. 4. Influence of the motivational system on the behavior of the rat. Activation Threshold is the threshold employed by the motivational system to check the dominant active necessity. All outputs were normalized to fit the chart scale.

at(ne_side) ↦ explore(et_side), food   [fitness 0.699]
at(st_side) ↦ explore(et_side), food
at(nw_corner) ↦ explore(et_side), food   [fitness 0.2099]

Additionally, two behavior rules were created to control the behavior of pressing the lever when the rat was hungry:

at(et_side) & see(light_on) & see(bar) ↦ press(lever), food
at(et_side) & see(light_off) & see(bar) ↦ press(lever)   [fitness 0.0075]

The first behavior rule shapes the rat's behavior to press the food bar to get food, with a high probability, whenever the control light is on. The second behavior rule shapes the rat's behavior to press the food bar, with a much lower probability, whenever the control light is off. Therefore, the rat has learned to press the food bar when the control light is on.
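One plausible reading of these per-rule values is that the fitness of each matching rule biases a probabilistic choice of which behavior to emit. A fitness-proportional (roulette-wheel) selection would look like the sketch below; this is an assumed mechanism for illustration, not the paper's stated selection method.

```python
import random

def select_rule(matching_rules, rng=random.random):
    """Pick a behavior among rules whose antecedents match the current
    context, with probability proportional to each rule's fitness.
    matching_rules is a list of (behavior, fitness) pairs."""
    total = sum(fitness for _, fitness in matching_rules)
    r = rng() * total
    acc = 0.0
    for behavior, fitness in matching_rules:
        acc += fitness
        if r <= acc:
            return behavior
    return matching_rules[-1][0]  # numerical safety fallback
```

Under this reading, a rule with fitness 0.0075 is almost never selected against high-fitness competitors, so lever pressing is emitted almost exclusively in the light-on context.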

VI. CONCLUSIONS

In this paper, we have presented ongoing work on the development of a robotic architecture inspired by Behavior Analysis. This work is the basis for developing a tool for the construction of social robots. The proposed architecture was preliminarily evaluated on a simulation of a Skinner Box experiment. The evaluation shows that the architecture is able to produce appropriate behavior and to learn contingencies from interaction. The results also show that the architecture is a potential tool to control sociable robots. Future work includes the incorporation of algorithms for generalizing the knowledge learned by the architecture, as well as experiments with a real robotic head interacting with human beings, in order to evaluate the learning capabilities of the architecture on a non-trivial real problem: the learning of shared attention.

ACKNOWLEDGMENT

The authors would like to thank FAPESP and CNPq for the support received.

REFERENCES

[1] K. Dautenhahn and A. Billard, "Bringing up robots or: the psychology of socially intelligent robots: From theory to implementation," in Autonomous Agents, 1999.
[2] C. Breazeal, Designing Sociable Robots. MIT Press, 2002.
[3] B. Robins, P. Dickerson, P. Stribling, and K. Dautenhahn, "Robot-mediated joint attention in children with autism: A case study in robot-human interaction," Interaction Studies, vol. 5, no. 2, pp. 161-198, 2004.
[4] R. Arkin, Behavior-Based Robotics. Cambridge: MIT Press, 1998.
[5] N. Oza, "A survey of robot architectures," citeseer.ist.psu.edu/205448.html.
[6] B. Duffy, M. Dragone, and G. O'Hare, "Social robot architecture: A framework for explicit social interaction," in Android Science: Towards Social Mechanisms, CogSci 2005 Workshop, Stresa, Italy, 2005.
[7] B. Skinner, About Behaviorism. Penguin Books, 1974.
[8] J. Staddon, Adaptive Behavior and Learning. Cambridge Press, 1983.
[9] S. Haykin, Neural Networks: A Comprehensive Foundation. Prentice Hall, 1999.
[10] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice-Hall, 2003.
[11] S. Gadanho and J. Hallam, "Robot learning driven by emotions," Adaptive Behavior, vol. 9, no. 1, pp. 42-64, 2001.
[12] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
[13] T. Mitchell, Machine Learning. McGraw-Hill, 1997.
[14] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Massachusetts: Addison-Wesley, 1989.