

Chapter 1
Abstraction Levels for Robotic Imitation: Overview and Computational Approaches

Manuel Lopes, Francisco Melo, Luis Montesano and José Santos-Victor

Abstract This chapter reviews several approaches to the problem of learning by imitation in robotics. We start by describing several cognitive processes identified in the literature as necessary for imitation. We then proceed by surveying different approaches to this problem, placing particular emphasis on methods whereby an agent first learns about its own body dynamics by means of self-exploration and then uses this knowledge about its own body to recognize the actions being performed by other agents. This general approach is related to the motor theory of perception, particularly to the mirror neurons found in primates. We distinguish three fundamental classes of methods, corresponding to three abstraction levels at which imitation can be addressed. As such, the methods surveyed herein exhibit behaviors that range from raw sensory-motor trajectory matching to high-level abstract task replication. We also discuss the impact that knowledge about the world and/or the demonstrator can have on the particular behaviors exhibited.

Manuel Lopes
Instituto de Sistemas e Robótica, Instituto Superior Técnico, Lisbon, Portugal
e-mail: [email protected]

Francisco Melo
Carnegie Mellon University, Pittsburgh, PA, USA
e-mail: [email protected]

Luis Montesano
Universidad de Zaragoza, Zaragoza, Spain
e-mail: [email protected]

José Santos-Victor
Instituto de Sistemas e Robótica, Instituto Superior Técnico, Lisbon, Portugal
e-mail: [email protected]

In "From Motor Learning to Interaction Learning in Robots", Springer Verlag, Series: Studies in Computational Intelligence, Vol. 264, Sigaud, Olivier; Peters, Jan (Eds.), 2010.

Fig. 1.1: Approaches to imitation at the three levels of abstraction discussed in this chapter: motor resonance (replication of observed motor patterns), object-mediated imitation (replication of observed effects), and imitation by goal inference (replication of actions toward the inferred goal).

1.1 Introduction

In this chapter we study several approaches to the problem of imitation in robots. This type of skill transfer is only possible if the robots have several cognitive capabilities that, in turn, pose multiple challenges in terms of modeling, perception, estimation and generalization. Throughout the chapter, we survey several methods that allow robots to learn from a demonstration. Several other surveys cover different aspects of imitation, including [6, 11, 16, 128].

Rather than providing another extensive survey of learning from demonstration, in this chapter we review some recent developments in imitation in biological systems and focus on robotics works that consider self-modeling as a fundamental part of the cognitive processes involved in and required for imitation. Self-modeling, in this context, refers to the learning processes that allow the robot to understand its own body and its interaction with the environment.

In this survey, we distinguish three fundamental classes of methods, each addressing the problem of learning by imitation at a different level of abstraction. Each of these levels of abstraction focuses on a particular aspect of the demonstration, giving rise to different imitative behaviors, ranging from motor resonance to a more abstract imitation of inferred goals. This hierarchy of behaviors is summarized in the diagram of Fig. 1.1. It is interesting to note that the approaches at these different levels of abstraction, rather than being mutually exclusive, actually provide a natural hierarchical decomposition, in which approaches at the more abstract levels can build on the outcome of methods at less abstract levels (see [80, 84] for examples of such integration).

Why Learn by Imitation?

The impressive research advances in robotics and autonomous systems in the past years have led to the development of robotic platforms with increasingly complex motor, perceptual and cognitive capabilities. These achievements open the way for new applications that require these systems to interact with other robots and/or human users during extended periods of time. Traditional programming methodologies and robot interfaces will no longer suffice, as these systems need to learn to execute new complex tasks and improve their performance throughout their lifetimes.

Learning by imitation is likely to become one of the primary forms of teaching such complex robots [9, 127]. Paralleling the ability of human infants to learn through (extensive) imitation, an artificial system can retrieve a large amount of task-related information simply by observing other individuals, humans or robots, perform that same task. Such a system would ideally be able to observe humans and learn how to solve similar tasks by imitation alone. Before this capability can be achieved, however, several other skills must be developed first [84].

The ability to imitate has also been used in combination with other learning mechanisms. For instance, it can speed up learning, either by providing an initial solution for the intended task that can then be improved by trial-and-error [109] or by guiding exploration [112, 114]. It also provides more intuitive and acceptable human-machine interactions due to its inherent social component [20, 79]. Learning by imitation predates the advent of humanoid robots and has been applied in several different domains, including robotics [75], teleoperation [153], assembly tasks [149], game characters [139], multiagent systems [113], computer programming [49] and others.

What Is Imitation?

In the biological literature, many behaviors have been identified under the general label of “social learning”. Two such social learning mechanisms have raised particular interest among the research community, namely imitation and emulation [148]. In both, the agent tries to replicate the effects achieved by the demonstrator; in imitation, however, the agent also replicates the motor behavior used to achieve that goal, while in emulation only the effects are replicated (the agent achieves the effect by its own means).

In robotics research, the word imitation is also used to denote many different behaviors and methodologies. Some works seek to clarify and distinguish several such approaches, either from a purely computational point of view [84, 89, 104, 127] or taking inspiration from their biological counterparts [25, 79, 140, 143, 146, 148, 154]. The taxonomy depicted in Fig. 1.2 provides one possible classification of different social learning mechanisms that takes into account three sources of information, namely goals, actions and effects.

In this chapter, we define imitation in its everyday sense and use the designations imitation and learning/programming by demonstration interchangeably. Taking into account the previous taxonomy, the works presented here may be classified under other labels. Roughly speaking, we can consider the three levels as going from mimicking, through (goal) emulation and finally to imitation. This division is clear if we consider methods that make an explicit inference about the goal as imitation, but not so clear in the cases where trajectory generalization is performed using an implicit goal inference.

Fig. 1.2: Behavior classification in terms of goals, actions and effects (reproduced from [25]):

• Understand (and adopt) goal:
  – Copy action: reproduce result = imitation; do not reproduce result = imitation (failed);
  – Do not copy action: reproduce result (or not) = goal emulation.
• Do not understand (or adopt) goal:
  – Copy action: reproduce result (or not) = mimicry;
  – Do not copy action: reproduce result = emulation; do not reproduce result = other.

Organization of the Chapter

In the continuation, and before entering into the different computational approaches to imitation, Section 1.2 briefly outlines relevant aspects from psychology and neurophysiology on the topic of imitation in biological systems. Section 1.3 then discusses imitation in artificial systems, pointing out the main scientific challenges that have been identified and addressed in the literature on imitation learning.

The remainder of the chapter is divided into three main sections, each addressing imitation from a specific perspective:


• Section 1.4 addresses imitation from a motor resonance perspective, namely trajectory matching and generalization. It discusses approaches that work at the trajectory level (either in joint or task space). These approaches can be interpreted as performing regression (at the trajectory level) using the observed demonstration, including additional steps to allow the learner to generalize from it.

• Section 1.5 discusses imitation by replication of observed (world) events. In this section, the learner focuses on replicating observed effects in the world, mainly effects on objects.

• Goal inference is finally presented in Section 1.6. We survey approaches in which the learner explicitly tries to infer the goal of the demonstrator and then uses this goal to guide its action choice.

We note that this division is not strict and some of the approaches share ideas across several of the perspectives above. Also, depending on the application and context, one particular perspective might be more appropriate for imitation than the others. We conclude the chapter in Sections 1.7 and 1.8 by discussing other approaches to imitation and providing some concluding remarks.

1.2 Imitation in Natural Systems

The idea of learning by imitation has a clear inspiration in the way humans and other animals learn. Therefore, results from neurophysiology and psychology on imitation in humans, chimpanzees and other primates are a valuable source of information to better understand, develop and implement artificial systems able to learn and imitate. Section 1.2.1 details information from neurophysiology and Section 1.2.2 presents evidence from psychology and biology. Such results illustrate what a highly complex task imitation is. The brief literature review in this section identifies some of the problems that must be addressed before robots can learn (efficiently) by imitation, as well as some works in the robotics literature that seek to model/test cognitive hypotheses.

1.2.1 Neurophysiology

Neurophysiology has identified several processes involved in action understanding that, in turn, have contributed in different ways to the development of learning approaches in robotics. For example, mirror neurons [51, 106] provided a significant motivation for using motor simulation theories in robotics. Similar ideas were suggested in speech recognition [50, 77]. Also, the existence of forward and backward models in the cerebellum gave further evidence that the production system is involved in perception, although it is not clear if it is necessary [152]. Most of these methods consider already known actions and not the way such knowledge can be acquired (we refer to [106] for further discussion). For the case of novel actions, there is evidence that the mirror system is not sufficient to explain action understanding and that a reasoning mechanism must be involved [19]. We now discuss several of these views in more detail.

Motor Theories of Perception

Several theories have claimed that the motor system is involved in perception. An example is the motor theory of speech perception [77]. The three main claims of this theory are: “(a) speech processing is special, (b) perceiving speech is perceiving gestures, and (c) the motor system is recruited for perceiving speech”. In [50], the authors revisit this theory in light of the results of the last 50 years. They argue that although claim (a) is likely false, claims (b) and (c) are still likely to be true, although they admit that most of the findings supporting these claims may be explained by alternative accounts.

One piece of evidence in favor of the theory that the motor system is involved in perception is the existence of several mechanisms in the brain involved in motor prediction and reconstruction. One such mechanism depends on the existence of several pairs of forward and backward models in the brain [152]. The forward model codes the perceptual effects of motor actions, while the backward model represents the inverse relation, i.e., the motor actions that might cause a given percept. These models provide the agent with “simulation capabilities” for its own body dynamics, and are thus able to adapt to perturbations. They are also general enough to take into account task restrictions.
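As a rough illustration of this pairing (our sketch, not a model from the chapter or from [152]), consider a one-dimensional point mass: the forward model predicts the perceptual effect (next state) of a motor command, and the backward model recovers the command that explains an observed transition. All names and numerical values here are illustrative assumptions.

```python
# Minimal forward/backward model pair for a 1-D point mass (illustrative
# values only). State s = (position, velocity); action a = applied force.
DT, MASS = 0.1, 1.0

def forward_model(s, a):
    """Forward model: predict the perceptual effect (next state) of action a."""
    pos, vel = s
    acc = a / MASS
    return (pos + vel * DT, vel + acc * DT)

def backward_model(s, s_next):
    """Backward (inverse) model: the action that explains an observed transition."""
    (_, vel), (_, vel_next) = s, s_next
    return MASS * (vel_next - vel) / DT

s0 = (0.0, 0.0)
s1 = forward_model(s0, 2.0)                        # simulate the effect of a force
assert abs(backward_model(s0, s1) - 2.0) < 1e-9    # the inverse model recovers it
```

In learned versions of such models the two directions are fitted from data rather than written down, but the division of labor is the same: prediction of effects in one direction, attribution of causes in the other.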

Mirror and Canonical Neurons

The discovery of mirror neurons [51, 96, 106] fostered a significant interest in the brain mechanisms involved in action understanding. These neurons are located in area F5 of the macaque's brain and discharge during the execution of hand/mouth movements. In spite of their location in a pre-motor area of the brain, mirror neurons fire both when the animal performs a specific goal-oriented grasping action and when it sees that same action being performed by another individual. This observation suggests that the motor system responsible for triggering an action is also involved in the recognition of that action. In other words, recognition may also involve motor information, rather than purely visual information. Furthermore, by establishing a direct connection between gestures performed by a subject and similar gestures performed by others, mirror neurons may be related to the ability to imitate found in some species [117], establishing an implicit level of communication between individuals.

Canonical neurons [96] have the intriguing characteristic of responding when objects that afford a specific type of grasp are present in the scene, even if the grasp action is not performed or observed. Thus, canonical neurons may encode object affordances, as introduced in [55], and may help distinguish ambiguous gestures during the process of recognition. In fact, many objects are grasped in very precise ways that allow the object to be used for specific purposes. A pen is usually grasped in a way that affords writing, and a glass is held in such a way that we can use it to drink. Hence, by recognizing an object that is being manipulated, it is also possible to attain information about the most likely grasping possibilities (expectations) and hand motor programs, simplifying the task of gesture recognition.

Reasoning Processes

Even if there is strong evidence that the motor system is involved in perception, it is not clear how fundamental it is, and many claims about the mirror system are unlikely to hold [56]. For instance, mirror neurons are not strictly necessary for action production, as their temporary deactivation does not impair grasping control but only slows it down [47, 106]. On the other hand, mechanisms more complex than mirroring are necessary to understand unexpected behaviors of an agent. In [19], an experiment is presented where a person turns a light on using their knee. Similar demonstrations are shown where the person has their arms occupied with a folder, with many folders, or with none. Results from an fMRI scan showed that the mirror mechanism is active during the empty-arms situation (expected behavior) but not during the other situations (unexpected behavior). This and other similar results suggest that action understanding in unexpected situations is achieved by an inference-based mechanism taking contextual constraints into account. In turn, this indicates that there may exist a reasoning mechanism to understand/interpret observed behaviors.

1.2.2 Psychology

Studies in behavioral psychology have provided evidence of the ability of both children and chimpanzees to use different “imitative” behaviors. Individuals of both species also seem to switch between such behaviors depending on perceived cues about the world [54, 62]. These cues include, for example, the inferred purpose of the observed actions [13, 14, 67], even when the action fails [67, 91, 145]. Other social learning mechanisms are analyzed in [154] under the more general designation of social influence/learning. In the continuation, we discuss some examples of behavior switching identified in the literature.

Imitation Capabilities and Behaviour Switching

Imitation and emulation are two classes of social learning mechanisms observed in both children and apes [138, 146, 148].1 In imitation, the learning individual adheres to the inferred goal of the demonstrator, eventually adopting the same action choice.

1 Other species, such as dogs, have also been shown to switch strategies after having observed a demonstration, as seen in [118].

Page 8: Chapter 1 Abstraction Levels for Robotic Imitationrobotcub.org/misc/papers/09_Lopes_Montesano_etal_2.pdf · Section 1.2.1 details information from neurophysiology and Section 1.2.2

8 Manuel Lopes, Francisco Melo, Luis Montesano and Jose Santos-Victor

In emulation, on the other hand, the individual focuses on the observed effects of the actions of the demonstrator, possibly reaching these using a different action choice. The predisposition of an individual to imitate or emulate can thus be confirmed in tasks where the same effect can be achieved using different actions/motor patterns. For example, both chimpanzees and children are able to copy the choice of a push or twist movement in opening a box [147].

Children, in particular, can be selective about which parts of a demonstration to imitate [54, 150], but are generally more prone to imitate than to emulate. For example, children can replicate parts of a demonstration that are clearly not necessary to achieve the most obvious goal – a phenomenon known as over-imitation [62]. Over-imitation can be diminished by reducing the social cues or by increasing the urgency of task completion [21, 85, 88]. It has also been argued that over-imitation can occur for a variety of social reasons [103] or because the individuals interpret the actions in the demonstration as causally meaningful [85].

Sensitivity to Task Constraints

Social animals also exhibit some sensitivity to the context surrounding task execution, particularly task constraints. For example, in [90], 14-month-olds were shown a box with a panel that lit up when the demonstrator touched it with his forehead. The results showed that most infants reproduced the use of the forehead, rather than using their hands, when presented with the object a week later. This experiment was further extended in [54] by including a condition in which the demonstrator was restricted and could not use her hands. It was observed that only 21% of the infants copied the use of the forehead, against the 69% observed in a control condition replicating the [90] study. It was argued that, in the latter condition, infants recognize no constraints upon the demonstrator and thus encode the use of the forehead as a specific part of the intention. In the restricted case, they recognize the constraint as an extraneous reason for the use of the forehead and do not encode the specific action as part of the intention.

We return to this particular experiment in Section 1.6, in the context of computational models for social learning.

Imperfect Knowledge

Several experiments have been conducted to investigate how knowledge about the world dynamics influences social learning mechanisms. In one archetypical experiment, an individual observes a sequence of actions, not all of which are actually necessary to achieve the outcome. For example, in [62], preschoolers and chimpanzees were presented with two identical boxes, one opaque and one transparent. A demonstrator then inserted a stick into a hole on the top of the box and then into another hole on the front of the box, and was then able to retrieve a reward from the box. In this experiment, the insertion of the stick into the top hole was unnecessary in order to obtain the reward, but this was only perceivable in the transparent box. The results showed that 3- and 4-year-old children tended to always imitate both actions. On the contrary, chimpanzees were able to switch between emulation and imitation when causal information was available: after having observed demonstrations in a transparent box, the chimpanzees were much less prone to insert the stick into the upper (useless) hole.

Goal Inference

Finally, it has been shown that some species exhibit imitative behavior beyond simple motion mimicry. For example, primates tend to interpret and actually reproduce observed actions in a teleological manner – that is, in terms of the inferred goals of the action [33]. In an experiment designed to test this hypothesis, 3- to 6-year-old children observed a demonstrator reaching across her body to touch a dot painted on a table to one side of her, using the hand on her other side [13]. When prompted to reproduce the observed demonstration, children tended to copy the dot-touching action, but not the use of the contra-lateral hand. However, when the same demonstration was performed without a dot, children tended to imitate the use of the contra-lateral hand. It was argued that, in the first case, children interpreted the dot touching as the intention, choosing their own (easier) way to achieve it, while in the second case there was no clear target of the action but the action itself. As such, children interpreted the use of the contra-lateral hand as the intention and imitated it more faithfully. Results in experiments adapted for infants are similar [28].

1.2.3 Remarks

The experiments described above show that imitation behaviors result from several complex cognitive skills, such as action understanding, reasoning and planning. Each of these depends on the physical and social context, as well as on the knowledge of the agent. Partial world knowledge and contextual restrictions all influence the way an action is understood and replicated. A robot that is able to imitate in a flexible way should thus be able to take all of these aspects into account.

1.3 Imitation in Artificial Systems

Imitation learning brings the promise of making the task of programming robots much easier [127]. However, to be able to imitate, robots need several complex skills that must be previously implemented or developed [84].


In Section 1.2 we discussed some of the complexities involved in the process of learning by imitation in natural systems, as well as the contextual information taken into account when interpreting actions. We now replicate this discussion for artificial systems, outlining some of the issues that must be dealt with when developing an artificial system (e.g., a robot) that can learn by imitation.

In [151], the authors identified three subproblems (or classes thereof) to be addressed in developing one such system:

• Mapping the perceptual variables (e.g., visual and auditory input) into corresponding motor variables;

• Compensating for the differences in the physical properties and control capabilities of the demonstrator and the imitator; and

• Understanding the intention/purpose/reason behind an action (e.g., the cost function to be minimized in optimal control that determines the action to be taken in each situation) from the observation of the resulting movements.

If we further take into account that the perceptual variables from the demonstrator must also be mapped from an allo- to an ego-centric frame of reference, the first of the above subproblems further subdivides into two other subproblems: viewpoint transformation and sensory-motor matching [8, 22, 82, 83, 123]. The second of the problems referred to above is usually known as the body correspondence problem [4, 99, 100] and is, in a sense, closely related to and dependent on the first problem of mapping between perception and action.

Data Acquisition

The way to address the issues discussed above will largely depend on the context in which imitation takes place. When used for robot programming, it is possible to use data acquisition systems that simplify the interpretation and processing of the input data, thus reducing partial observability issues. The latter is important since the learner will seldom be able to unambiguously observe all the relevant aspects of the demonstration. In particular, this can allow more robust and efficient algorithms to tackle the allo-ego transformation, the perception-to-action mapping and the body correspondence. Examples of such systems include exoskeletons, optical trackers and kinesthetic demonstrations [16].

Other applications of imitation occur in less controlled environments, for example as a result of the natural interaction between a robot and a human. In such contexts, perceptual problems must be explicitly addressed. Some authors address this problem by adopting a computer vision perspective [82], by modeling partial observability [40] or by being robust to noise in the demonstration [26, 80, 116].


Mapping of Perceptual Variables and Body Correspondence

Many approaches do not consider a clear separation between data acquisition and learning by demonstration. One way to deal with the lack of information/data is to use prior knowledge to interpret the demonstration. Action interpretation strongly depends on the knowledge about how the world evolves, as well as on the capabilities of both the learner and the demonstrator to interact with it [54, 62, 79]. This process is closely related to the first two issues pointed out in [151], since it provides a way to map external inputs to internal motor representations (e.g., to robot actions). Therefore, imitation learning algorithms will typically benefit from prior knowledge about the environment, especially when data acquisition cannot provide a full description of the demonstration.

Knowledge about the agent's own body and its interaction with the world simplifies some of the difficulties found in imitation. On one hand, for the perception-to-action mapping, the recognition of others' actions can rely on the learner's model of the world dynamics, e.g., by inferring the most probable state-action sequence given this model. This idea draws inspiration from psychological and neurophysiological theories of motor perception, where recognition and interpretation of behavior are performed using an internal simulation mechanism [5, 51, 83]. As seen in Section 1.2, mirror neurons are one such mechanism [51], and a significant amount of research in imitation learning in robotics flourished from this particular discovery.

On the other hand, this type of knowledge also allows action recognition and matching to occur with an implicit body correspondence, even if the bodies of the learner and the demonstrator are different. Several works have explored this idea. For example, in [82, 83, 99, 100, 130, 132], action matching is addressed at the trajectory level. In these works, the demonstration is interpreted taking into account the different dynamics of the learner and the demonstrator. In [71, 93, 94], the same problem is addressed at a higher level of abstraction that considers the effects on objects.
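The simulation-based recognition idea can be illustrated with a deliberately minimal example (our construction; the candidate actions, noise model and names are assumptions, not taken from any of the cited works): the learner replays each of its own candidate actions through its forward model and attributes to the demonstrator the action whose predicted effect best matches the observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate motor commands known to the learner (hypothetical names/values).
ACTIONS = {"reach_left": -1.0, "stay": 0.0, "reach_right": 1.0}

def forward_model(x, a, dt=0.1):
    """The learner's own body model: next position after command a."""
    return x + a * dt

def recognize(x, x_next_observed, sigma=0.05):
    """Motor simulation: pick the action whose predicted effect best
    explains the observed transition (maximum likelihood under an
    assumed Gaussian observation noise)."""
    def log_lik(a):
        pred = forward_model(x, a)
        return -((x_next_observed - pred) ** 2) / (2 * sigma ** 2)
    return max(ACTIONS, key=lambda name: log_lik(ACTIONS[name]))

# A noisy observation of a demonstrator reaching to the right:
observed = forward_model(0.0, ACTIONS["reach_right"]) + rng.normal(0.0, 0.01)
assert recognize(0.0, observed) == "reach_right"
```

Because recognition runs through the learner's own motor repertoire, a body correspondence is implicit: the demonstrator's limb lengths or dynamics never appear, only the effects that the learner itself could produce.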

Goal/Intention Inference

Understanding actions and inferring intentions generally requires a more explicit reasoning process than just a mirror-like mechanism [19]. As discussed in Section 1.2, by further abstracting the process of learning by imitation to the task level, it is possible to additionally include contextual cues [10, 64, 79]. At this level of abstraction, the third issue identified in [151] becomes particularly relevant.

Identifying the goal driving a demonstration is a particularly complex inference process, indeed an ill-defined one. What to imitate depends on several physical, social and psychological factors (see Section 1.2.2). One possible way to answer this question relies on the concept of imitation metrics. These metrics evaluate “how good” imitation is. Imitation metrics were first explicitly introduced in [101] in order to quantify the quality of imitation, to guide learning and also to evaluate learned behavior. However, it is far from clear what “good imitation” is and, perhaps more importantly, how variable/well-defined the learned behavior can be. Some studies along this direction have characterized the quality of imitation in humans. In [111], subjects were asked to perform imitation tasks and quantitative results were obtained to assess the effect of rehearsal during observation and repetition of the task.

In any case, it is often not clear whether imitation concerns the motor intention or the underlying goal of that motor intention [17, 23, 79, 127]. In other words, it is often the case that the agent cannot unambiguously identify whether it should imitate the action, its outcome, or the reason driving the demonstrator to perform it. In each of the following sections we discuss in detail one of these three approaches to imitation learning. In particular, we refer to Section 1.6 for a more detailed discussion on the problem of inferring the goal behind a demonstration. In that section we survey several recent works in which a learner does infer the goal of the demonstrator and adopts this goal as its own [2, 10, 64, 80, 119].

The Role of Self-Observation

It is interesting to note that most of the necessary information about the robot’s body and the world dynamics can be gathered by self-observation [36, 84]. Although slow in many situations, self-observation often allows greater adaptation to changing scenarios.

Many different techniques can be used by the robot to learn about its own body [41, 84, 108, 129, 144]. For example, several works adopt an initial phase of motor babbling [57, 76, 83, 84]. By performing random motions, a great amount of data becomes available, allowing the robot to infer useful relations about causes and consequences of actions. These relations can then be used to learn a body schema useful in different application scenarios. The specific methods used to learn body models vary, ranging from parametric methods [27, 61] and neural network methods [76, 82, 83] to non-parametric regression [41, 84, 144] and graphical models [57, 135]. As for learning about the world dynamics, this is closely related to the concept of learning affordances. Repeated interaction with the world allows the robot to understand how the environment behaves under its actions [46, 94, 95, 122]. As seen in the previous section, the knowledge about the world dynamics and the capabilities of others strongly influences how actions are understood.
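The idea of learning a body schema from motor babbling can be illustrated with a minimal sketch. Everything below is a toy assumption: a hypothetical two-link planar arm stands in for the robot's body, and a nearest-neighbour lookup stands in for the parametric, neural or non-parametric regression methods cited above.

```python
import math
import random

# Hypothetical 2-link planar arm: the "body" whose schema is to be learned.
def forward_kinematics(q1, q2, l1=1.0, l2=1.0):
    """Ground-truth hand position -- unknown to the learner."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return (x, y)

# Motor babbling: execute random joint angles, record (action, consequence).
random.seed(0)
experience = []
for _ in range(2000):
    q = (random.uniform(-math.pi, math.pi), random.uniform(-math.pi, math.pi))
    experience.append((q, forward_kinematics(*q)))

# Learned sensory-motor map: nearest-neighbour lookup over babbled data.
def predict_hand_position(q):
    _, obs = min(experience,
                 key=lambda e: (e[0][0] - q[0])**2 + (e[0][1] - q[1])**2)
    return obs

# The learned schema approximates the true kinematics on a novel posture.
q_test = (0.3, -0.7)
pred = predict_hand_position(q_test)
true = forward_kinematics(*q_test)
err = math.hypot(pred[0] - true[0], pred[1] - true[1])
```

With enough babbled samples, the prediction error on unseen postures becomes small, which is the sense in which random exploration yields a usable body model.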

So far in this chapter we have presented insights from neurophysiology, psychology and robotics research on the problems involved in learning by demonstration. The next sections provide an overview of methods that handle such problems. We divide these methods according to the different formalisms or sources of information used, namely (a) trajectory matching and generalization, where sample trajectories are the main source of information from which the learner generalizes; (b) object-mediated imitation, where effects occurring on objects are the relevant features of a demonstration; and (c) imitation of inferred goals, where there is an explicit estimation of the demonstrator’s goal/intention.

Fig. 1.3: Imitation architecture. Observed actions are first transformed to an ego frame of reference (VPT), where segmentation and recognition take place. After deciding how to imitate, a correspondence between the two different bodies is established by selecting the corresponding sensory-motor map (SMM). Finally, imitation is enacted.

1.4 Imitating by Motor Resonance

This section presents several methods that learn by demonstration by first mapping state-action trajectories to the learner’s own body and then generalizing them.

Following what is proposed in [83, 151], the imitation process consists of the steps enumerated below and illustrated in Fig. 1.3:

(i) The learner observes the demonstrator’s movements;
(ii) A viewpoint transformation (VPT) is used to map a description in the demonstrator’s frame (allo-image) to the imitator’s frame (ego-image);
(iii) Action recognition is used (if necessary) to abstract the observed motion; and
(iv) A sensory-motor map (SMM) is used to generate the motor commands with the highest probability of generating the observed features.

In this section we survey several methods that adopt this general approach. In these methods, not all of the steps enumerated above are dealt with explicitly; some are ensured implicitly through different simplifying assumptions.
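The four-step pipeline of Fig. 1.3 can be sketched as a composition of functions. Every component here is a hypothetical toy placeholder (a mirroring transform, a two-primitive recognizer, a lookup-table SMM), intended only to show how the stages connect:

```python
# Minimal sketch of the observation-to-imitation pipeline of Fig. 1.3.
# All components are toy placeholders for illustration only.

def viewpoint_transform(allo_obs):
    """VPT: map the demonstrator's (allo) frame to the learner's (ego) frame.
    Toy version: mirror the horizontal axis."""
    return [(-x, y) for (x, y) in allo_obs]

def recognize(ego_obs):
    """Abstract the observed motion into a known primitive (toy rule)."""
    dx = ego_obs[-1][0] - ego_obs[0][0]
    return "reach_left" if dx < 0 else "reach_right"

def sensory_motor_map(primitive):
    """SMM: motor command most likely to reproduce the observed features."""
    commands = {"reach_left": [-1.0, 0.0], "reach_right": [1.0, 0.0]}
    return commands[primitive]

def imitate(allo_observation):
    ego = viewpoint_transform(allo_observation)   # step (ii)
    primitive = recognize(ego)                    # step (iii)
    return sensory_motor_map(primitive)           # step (iv)

# A demonstrator moving right in its own frame appears as leftward motion
# after the ego transformation, and is imitated accordingly.
cmd = imitate([(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)])
```

The sketch also makes the simplifying assumptions visible: a richer system would replace each stage rather than the overall structure.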


Fig. 1.4: Perceptual difference between the same gesture seen from an ego- or an allo-perspective.

1.4.1 Visual Transformations

The same motor action can have very distinctive perceptual results when viewed from different points of view. For example, when a person waves goodbye, she sees the back of her own hand, while when someone else performs the same gesture she sees the palm of the other’s hand. This poses a problem when mapping actions from the demonstrator to the learner, as already identified in [22] and depicted in Fig. 1.4.

The typical solution to this problem is to perform a complete three-dimensional reconstruction. However, several works have proposed alternative approaches that consider simplifications of this problem. In [8, 12, 52, 82, 83] several transformations are discussed, ranging from a simple image transformation (e.g., mirroring the image) to a partial reconstruction assuming an affine camera, up to full three-dimensional reconstruction. These works also point out that such transformations can be seen as imitation metrics, because the depth information in some gestures can indeed change the meaning of the action. Such transformations can also be performed using neural networks [123].

1.4.2 Mimicking Behaviors and Automatic Imitation

Several works on imitation seek to transfer behaviors by a simple motor resonance process. The observation of perceptual consequences of simple motor actions can elicit an automatic mimicking behavior, and several initial works on imitation adopted this approach [34, 35, 53, 59]. In these works, changes in perceptual channels caused by a demonstrator are directly mapped into motor actions of the learner. In particular, in [34], the orientation of a mobile robot is controlled according to the observed position and orientation of a human head.

In these approaches the model of the agent’s own body is not explicit: the relations between action and perception are learned without concern for the true geometry and dynamics of the body.

Using the visual transformations introduced earlier, it is possible to generate more complex behaviors. In [82, 83], the learner starts by acquiring a correspondence between its own perceptions and actions. The mimicking behavior results from mapping the perception of the demonstrator to its own using a viewpoint transformation, and then activating an inverse sensory-motor map. Different visual transformations result in different types of behavior.

Examples of other methods posing imitation within this visuo-somatic perspective include [7, 53]. An interesting approach is proposed in [18], where facial expressions are learned by interacting with a person and creating a resonance of expressions.

1.4.3 Imitation through Motor Primitives

All previous approaches consider a simple perception-action loop in imitation. However, when considering a learner equipped with several motor primitives, it is possible to achieve more complex interactions. The “recognition” block in Fig. 1.3 represents the translation of observed trajectories in terms of such motor primitives. A motor sequence is perceived, possibly after some visual processing, and recognized as a specific motor primitive [86, 87]. This general approach can be used to perform human-machine interaction [20] or to learn how to sequence such primitives [23].

In [23] a string parsing mechanism is proposed to explain how apes are able to learn by imitation to process food. The string parsing mechanism is initialized with several sequences of primitives. Learning and generalization are performed by extracting regularities and sub-sequences. This approach can be seen as a grammatical inference process.
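A toy version of such regularity extraction can be sketched as counting recurring sub-sequences across demonstrated primitive strings. The primitive alphabet and demonstrations below are invented for illustration; they are not the representation used in [23]:

```python
from collections import Counter

def common_subsequences(demonstrations, length=2, min_count=2):
    """Extract sub-sequences of primitives that recur across demonstrations --
    a crude stand-in for grammatical inference over primitive strings."""
    counts = Counter()
    for demo in demonstrations:
        for i in range(len(demo) - length + 1):
            counts[tuple(demo[i:i + length])] += 1
    return {seq for seq, c in counts.items() if c >= min_count}

# Hypothetical food-processing demonstrations as strings of motor primitives.
demos = [
    ["reach", "grasp", "peel", "eat"],
    ["reach", "grasp", "strip", "peel", "eat"],
    ["reach", "grasp", "peel", "discard", "eat"],
]
regular = common_subsequences(demos)
# Recurring pairs such as ("reach", "grasp") survive; one-off pairs do not.
```

A real grammatical inference system would induce rules (and recursion) rather than just frequent n-grams, but the core operation of finding shared structure across demonstrations is the same.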

Other approaches use hidden Markov models to extract such regularities and filter behavior, usually in the context of tele-operation. The main goal of such approaches is to eliminate sluggish motion of the user and correct errors. We refer to [153] for an application of one such approach to the control of an Orbit Replaceable Unit. In this work, a robot observes an operator performing a task and builds a hidden Markov model that describes that same task. In [63], a similar approach is used in assembly tasks. Such models can also be used to detect regularities in human motion [24].
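The HMM-based recognition underlying these works can be illustrated with a minimal forward-algorithm scorer. The two single-state models and the "move"/"pause" observation symbols below are invented toys; a real system would have multiple hidden states and would train the models from operator data:

```python
def forward_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(observation sequence | HMM)."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in start}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * trans[r][s] for r in alpha) * emit[s][o]
                 for s in start}
    return sum(alpha.values())

# Two toy task models over observation symbols "move" / "pause".
smooth = {  # model of smooth, purposeful motion
    "start": {"go": 1.0},
    "trans": {"go": {"go": 1.0}},
    "emit":  {"go": {"move": 0.9, "pause": 0.1}},
}
sluggish = {  # model of hesitant, sluggish motion
    "start": {"go": 1.0},
    "trans": {"go": {"go": 1.0}},
    "emit":  {"go": {"move": 0.4, "pause": 0.6}},
}

obs = ["move", "move", "move", "pause", "move"]
p_smooth = forward_likelihood(obs, smooth["start"], smooth["trans"], smooth["emit"])
p_sluggish = forward_likelihood(obs, sluggish["start"], sluggish["trans"], sluggish["emit"])
# The observed motion is attributed to the model with the higher likelihood.
```

Filtering sluggish operator motion amounts to decoding the observation stream against the task model and regenerating the motion from the model's states instead of from the raw input.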

Regularities of human motion can be represented in low dimensions using principal component analysis [29], clustering [74] or other non-linear manifold learning techniques [65].
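As a sketch of such dimensionality reduction, principal component analysis via the SVD recovers a low-dimensional representation of trajectory data. The data here is synthetic (generated from two latent degrees of freedom), standing in for recorded human motion:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "motion" data: 100 trajectories of 20 samples each (flattened),
# generated from only 2 latent degrees of freedom plus a little noise.
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 20))
data = latent @ mixing + 0.01 * rng.normal(size=(100, 20))

# PCA: centre the data and take the top singular directions.
centered = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = (S ** 2) / (S ** 2).sum()

# Nearly all variance is captured by the first two components, so each
# 20-dimensional trajectory can be summarised by just 2 coordinates.
low_dim = centered @ Vt[:2].T
```

On real motion data the latent dimensionality is not known in advance; the explained-variance spectrum is what reveals how few components suffice.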

Some authors rely on completely separate methods to recognize and to generate motor primitives. Others combine both processes, thus exploring the self-modeling phase [37, 38, 66, 83]. As seen in Section 1.2.1, this process can be explained by the use of motor simulations of some kind. For example, in both [38] and [151], action recognition is performed using the idea of coupled forward/backward models discussed in [152]. The former decouples each of the action primitives and is thus able to deal with a larger variety of tasks, while the latter is able to combine several primitives and deal with complex motor skills in a robust way. In [43, 44], dynamic neural networks are used to recognize action goals taking task restrictions into account. A single neural network able to encode several behaviors, performing similar computations, was introduced in [137].

1.4.4 Learning of New Task Solutions

In some cases the learner has an explicit goal but finds it very difficult to plan how to reach it. This is especially important in complex environments or in situations involving highly redundant robots (e.g., humanoid robots). One of the main motivations behind imitation learning is that it provides an easy way to program robots. Therefore, most approaches to imitation consider the learning of new actions. In such approaches, two main trends have been adopted: one considers many demonstrations of the same task and tries to find invariants in the observed motions [17, 119]; the other uses the observed trajectories only as an initialization and then improves and generalizes further (e.g., using reinforcement learning) [109].

In [112], imitation is used to speed up learning, and several metrics were defined for evaluating the improvement in learning when using imitation. Imitation is used in [129] to learn dynamic motor primitives. As argued in this work, “movement primitives are parameterized policies that can achieve a complete movement behavior”. From this perspective, motor primitives can be seen as dynamical systems that generate different complex motions by changing a set of parameters. The authors also suggest the use of data from a demonstration to initialize such parameters. The parameters can then be optimized using, for example, policy gradient methods [109]. We point out that these methods consider classes of parameterized policies, namely the parameters of the dynamical system.
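A minimal sketch in the spirit of such dynamic movement primitives follows: a damped spring pulled toward a goal, modulated by a learnable forcing term that decays with a phase variable. The gains, basis functions and weights below are illustrative choices, not the values of [129]; in practice the weights would be fit from a demonstrated trajectory and refined by policy gradient.

```python
import math

# Minimal discrete movement primitive (toy parameters, not from [129]).
def run_dmp(y0, goal, weights, tau=1.0, dt=0.001, K=100.0, D=20.0, ax=2.0):
    n = len(weights)
    centers = [math.exp(-ax * (i / (n - 1))) for i in range(n)]
    y, v, x = y0, 0.0, 1.0
    for _ in range(int(tau / dt) * 3):          # run past tau to let it settle
        psi = [math.exp(-50.0 * (x - c) ** 2) for c in centers]
        f = sum(w * p for w, p in zip(weights, psi)) * x / (sum(psi) + 1e-10)
        dv = (K * (goal - y) - D * v + (goal - y0) * f) / tau
        dy = v / tau
        x += (-ax * x / tau) * dt               # canonical system: phase -> 0
        v += dv * dt
        y += dy * dt
    return y

# Changing the goal parameter re-uses the same primitive for a new target;
# the weights (arbitrary here) shape the transient, not the endpoint.
w = [0.5, -0.3, 0.8, 0.1, -0.2]
y_end_a = run_dmp(y0=0.0, goal=1.0, weights=w)
y_end_b = run_dmp(y0=0.0, goal=2.0, weights=w)
```

Because the forcing term vanishes with the phase, convergence to the goal is guaranteed regardless of the weights, which is what makes these primitives safe to optimize.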

One of the few approaches taking uncertainty into account is proposed in [57]. The approach in this work starts by learning a Gaussian mixture model as a forward model, using self-observation [130]. The demonstration is taken as an observation of a probabilistic process, and the goal is to find a sequence of actions that maximizes the likelihood of such evidence. The work focuses on copying motions taking into account the dynamics of the robot and, as such, uses the estimated state trajectory as observations and ignores the dynamic information. It achieves body correspondence by inferring the most probable trajectory using the imitator’s body. Extra task restrictions can also be included. In [29], walking patterns are transferred from humans to robots after being adapted to the different kinematics using low-level representations.

Finally, several other methods are agnostic as to what exactly is the goal of the demonstrated task and aim only at learning the observed course of action. For example, in [31], learning from demonstration is formulated as a classification problem and solved using support-vector machines. These methods, in a sense, disregard the effects and social context of the demonstration, and focus only on replicating in each situation the demonstrated course of action (see Section 1.6 for a more detailed discussion on this). One disadvantage of this general approach is that it places excessive confidence in the demonstrator. Furthermore, the course of action learned is specific to the context and environment of the learner and, as such, does not generalize to different environments.
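The classification view of learning from demonstration can be sketched as follows. The one-dimensional states and the nearest-neighbour classifier are toy stand-ins (for the support-vector machine of [31]); the point is that the policy is a pure state-to-action classifier with no notion of the task's goal:

```python
# Demonstrated (state, action) pairs: the demonstrator walks toward the
# origin on a 1-D line. States are positions, actions are "left"/"right".
demos = [(5, "left"), (3, "left"), (1, "left"),
         (-4, "right"), (-2, "right"), (-1, "right")]

def policy(state):
    """Classify the state into an action by the nearest demonstrated state --
    a stand-in for the SVM classifier; no goal information is used."""
    _, action = min(demos, key=lambda d: abs(d[0] - state))
    return action

# The learner replicates the demonstrated course of action in each situation,
# but the policy is tied to this state space and does not transfer elsewhere.
a1 = policy(4)    # near demonstrated positive states
a2 = policy(-3)   # near demonstrated negative states
```

The sketch also exposes the stated disadvantage: a demonstrator error at some state is faithfully classified and replicated, and the mapping is meaningless in an environment with a different state space.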

1.5 Object Mediated Imitation

In the previous section, we discussed imitation from a motor perspective. From this perspective, context is mainly provided by the body parts and corresponding motion. In this section we discuss a more abstract approach to imitation, where the context is enlarged to accommodate objects. In other words, the learner is now aware of the interaction with objects and, consequently, takes this information into account during learning. The most representative example of this type of behavior is emulation (see Fig. 1.2). In contrast with the motor resonance mechanisms discussed previously, which could perhaps best be described as mimicry, emulation focuses on copying (replicating) the results/effects of actions.

This abstraction from low-level control to higher-level representations of actions also facilitates reasoning about the causality of actions, i.e., how to induce specific changes in the environment. Consider the simple case of piling two objects. To learn this task, motor resonance alone does not suffice, as the learner must take into account which object can be placed on top of which, depending on their sizes, shapes and other features. Another illustrative example is opening a door, where the shape of the handle provides meaningful information about how to perform the action. In the motor-resonance-based approaches, body correspondence addressed problems such as different numbers of degrees of freedom, or different kinematics and dynamics. In this section we discuss correspondence in terms of the usage that different objects have for different agents.

A biologically inspired concept related to the previous discussion is that of affordances [55]. Developed in the field of psychology, the theory of affordances states that the relation between an individual and the environment is strongly shaped by the individual’s perceptual-motor skills. Back in the 1970s, this theory established a new paradigm in which action and perception are coupled at every level. Biological evidence of this type of coupling is now common in neuroscience [51], and several experiments have shown the presence of affordance knowledge based on the perception of heaviness [141] or traversability [69].

Affordances have also been widely studied in robotics. In this section we discuss the concept of affordances in the context of imitation learning in robots, as well as the inclusion of object information and properties in the learning process. A thorough review of these topics can be found in [122], with special emphasis placed on affordance-based control.


We generally define affordances as mappings that relate actions, objects and consequences (effects). This very general concept can be modeled using different formalisms, including dynamical systems [131], self-organizing maps [32], relational learning [58] and algebraic formulations [100].

However, independently of the selected representation or formalism, there are two core challenges in achieving affordance-based imitation: acquiring the model of affordances and exploiting this model. The latter depends heavily on the representation, but usually resorts to some type of action selection. For instance, if a dynamical system is used to encode a forward model, then the agent emulates by selecting the action that best matches the desired effect. It is worth mentioning that the approaches in this section depend strongly on a self-modeling phase, as the concept of affordances is strictly a self-modeling idea.
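Exploiting a learned forward model for emulation reduces to action selection over predicted effects. The forward model and the one-dimensional effect space below are toy assumptions; in a real system the model would be learned from experience:

```python
# Toy forward model: predicted effect of each action on a given object.
# In a real system this table would be a learned model, not hand-written.
def predict_effect(action, obj):
    effects = {
        ("tap", "ball"):   {"velocity": 0.8},
        ("tap", "box"):    {"velocity": 0.2},
        ("grasp", "ball"): {"velocity": 0.0},
        ("grasp", "box"):  {"velocity": 0.0},
    }
    return effects[(action, obj)]

def emulate(desired_effect, obj, actions=("tap", "grasp")):
    """Select the action whose predicted effect best matches the desired one."""
    def mismatch(action):
        pred = predict_effect(action, obj)
        return sum((pred[k] - v) ** 2 for k, v in desired_effect.items())
    return min(actions, key=mismatch)

# Having observed an object moving fast, the learner emulates by tapping,
# without ever matching the demonstrator's motor trajectory.
chosen = emulate({"velocity": 0.9}, "ball")
```

Note that the selection criterion compares effects only; no correspondence between the demonstrator's and the learner's bodies is needed.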

The data required to infer the affordance model may be acquired either by observation or by experimentation. In the case of self-experimentation, no body correspondence or visual transformation is required, although such a capability is important when interacting with other agents; when learning by observation, this problem is immediately present. An advantage of object-mediated imitation is that the match occurs only in the effects on objects, so the specific body kinematics and action dynamics are not considered, simplifying several problems in imitation.

Affordances as Perception-Action Maps

A simple way of describing effects is to learn mappings from a set of predefined object features to changes in these features. This approach was used in a manipulation task to learn by experimentation the resulting motion directions as a function of the object shape and the poking direction [46]. Once the mapping has been learned, emulating an observed motion direction can be achieved by simply selecting the appropriate poking direction for the object. A similar approach was proposed in [71], where the imitation is also driven by the effects. In this case, the demonstrator’s and imitator’s actions are grouped according to the effect they produce on an object, irrespective of their motor programs. Given an observed object and effect pair, the appropriate action (or sequence of actions) can then be easily retrieved. Another example is the use of invariant information [32, 48, 133]. In this case, the system learns invariant descriptions across several trials of the same action upon an object. Depending on the parameterization, the learned invariant descriptors may represent object characteristics or effects. Although this type of information has usually been applied in robot control, it is possible to use the invariants for emulation under the assumption that they are invariant to the viewpoint and that they capture the effects.

Grasping is a paradigmatic example of affordance knowledge that has been widely studied in the literature. Perception-action maps have appeared in several forms, such as the Q-function of a reinforcement learning algorithm [155]; a pure regression map from object features to image points based on labelled examples [125, 126] or self-experimentation [39, 92]; or the correct position of a mobile robot to trigger a particular grasping policy [134].


Affordances as Dynamical Models

An alternative approach consists in modeling the dynamical system composed of the agent (demonstrator or imitator) and the environment. In [131], a hidden Markov model is used to encode the state of the agent and objects. In order to train the forward model, reflective markers were placed on the demonstrator and on the objects and tracked by a capture system. Viewpoint transformation then uses linear transformations between the demonstrator’s and the imitator’s body poses. Emulation is cast as a Bayesian decision problem over the Markov model, i.e., selecting the maximum a posteriori action for each transition of the Markov chain. Interestingly, the model is able to modify the selected behavior with its own experience and refine the model previously learned solely by observation. Again, this is due to the fact that emphasis is placed on achieving the same effects instead of copying the action.

Dynamical systems have also been used for goal-directed imitation in [43, 44]. The proposed architecture contains three interconnected layers corresponding to different brain areas responsible for the observed action, the action primitives and the goal. Each layer is implemented using a dynamic field that evolves with experience and is able to incorporate new representations using a correlation learning rule between adjacent neurons.

1.5.1 Bayesian Networks as Models for Affordances

Affordances can be seen as statistical relations between actions, objects and effects, modeled for example using Bayesian networks. One such approach was proposed in [94], in which the nodes in the network represent actions, object features or measured effects. As in standard Bayesian networks, the absence of an edge between two nodes indicates conditional independence. Self-experimentation provides most of the data necessary to learn such relations. If a robot exerts its actions upon different objects, it can measure the effects of such actions. Even in the presence of noise, the robot is able to infer that some actions have certain effects that depend on some of the object features. Also, the existence of irrelevant and redundant features is automatically detected.

Based on such prior experience, structure learning [60] can be used to distinguish all such relations. Once these dependencies are known, one can query the network to provide valuable information for several imitation behaviors. Table 1.1 summarizes the different input-output combinations. In particular, for emulation purposes, the ability to predict the effect conditioned on a set of available objects and the possible actions directly provides the robot with a way to select the appropriate action in a single-step Bayesian decision problem. It is interesting to note that at this abstraction level the same mechanism is used for prediction and control, giving a mirror-like behavior.

This approach is halfway between learning specific maps and a full dynamical system description. If specific maps are learned, it is not easy to consider demonstrators with different dynamics, nor to explicitly consider task restrictions. On the other hand, this approach is not as applicable as learning a full dynamical system description, because it does not easily allow encoding long-term plans; it provides predictions only for incremental state changes. For a more detailed discussion, we refer to [80].

Table 1.1: Affordances as relations between actions (A), objects (O) and effects (E) that can be used for different purposes: predict the outcome of an action, plan actions to achieve a goal, or recognize objects or actions.

Inputs   Outputs   Function
(O, A)   E         Predict effect
(O, E)   A         Action recognition and planning
(A, E)   O         Object recognition and selection
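The input-output combinations of Table 1.1 can be sketched with a toy conditional probability table P(E | O, A). The objects, effects and probabilities below are illustrative inventions, not the network learned in [94]:

```python
# Toy affordance model: P(effect | object, action), illustrative numbers only.
P_E = {
    ("ball", "tap"):   {"moves_fast": 0.8, "stays": 0.2},
    ("ball", "grasp"): {"moves_fast": 0.1, "stays": 0.9},
    ("box", "tap"):    {"moves_fast": 0.3, "stays": 0.7},
    ("box", "grasp"):  {"moves_fast": 0.0, "stays": 1.0},
}

def predict_effect(obj, action):
    """(O, A) -> E: predict the most likely outcome of an action."""
    dist = P_E[(obj, action)]
    return max(dist, key=dist.get)

def recognize_action(obj, effect, actions=("tap", "grasp")):
    """(O, E) -> A: single-step Bayesian decision, uniform prior over actions."""
    return max(actions, key=lambda a: P_E[(obj, a)][effect])

def recognize_object(action, effect, objs=("ball", "box")):
    """(A, E) -> O: which object best explains the observed effect."""
    return max(objs, key=lambda o: P_E[(o, action)][effect])

# The same model serves prediction and recognition -- the mirror-like
# property noted in the text.
e = predict_effect("ball", "tap")           # -> "moves_fast"
a = recognize_action("ball", "moves_fast")  # -> "tap"
o = recognize_object("tap", "moves_fast")   # -> "ball"
```

Each query direction corresponds to one row of Table 1.1: the same joint model is conditioned on different subsets of variables.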

1.5.2 Experiments

In this section we present several experimental results obtained with a Bayesian network model for affordances. In particular, we describe how the model of affordances can be learned by the robot and then used to attain affordance-based emulation behaviors.

For all experiments we used BALTAZAR [78], a robotic platform consisting of a humanoid torso with one anthropomorphic arm and hand and a binocular head (see Fig. 1.5). The robot is able to perform a set of different parameterized actions, namely A = {a1 = grasp(λ), a2 = tap(λ), a3 = touch(λ)}, where λ represents the height of the hand in the 3D workspace when reaching an object in the image. The robot also implements an object detector that extracts a set of features related to the object properties and the effects of the action.

Affordance Learning

We now describe the process by which the robot learned the affordance network used in the emulation behaviors. We recorded a total of 300 trials following the protocol depicted in Fig. 1.6. In each trial, the robot was presented with a random object of one of two possible shapes (round and square), four possible colors and three possible sizes (see Fig. 1.5 for an illustration of the objects used). BALTAZAR then randomly selected an action and moved its hand toward the object using pre-learned action primitives [84, 94]. When the reaching phase was completed, the robot performed the selected action (grasp(λ), tap(λ) or touch(λ)) and finally returned the hand to the initial position. During the action, object features and effects were recorded.


Fig. 1.5: Robot playground used in the experiments.

Fig. 1.6: Protocol used in the experiments. The object used in each trial is selected manually and the robot then randomly selects an action to interact with it. Object properties are recorded from the Init to the Approach states, when the hand is not occluding the object. The effects are recorded in the Observe state. Init moves the hand to a predefined position in open loop.

Visual information is automatically clustered using the X-means algorithm [107]. The resulting classes constitute the input for the affordance learning algorithm. The features and their discretization are shown in Table 1.2. Summarizing, shape descriptors (e.g., compactness and roundness) provided two different classes, size was discretized into three classes and color into four. Based on this data, the robot adjusted the parameter λ for each action and then learned an affordance model as described above.

The learned model is shown in Fig. 1.7. The network was learned using Monte Carlo sampling with BDeu priors for the graph structure and a random network initialization. The dependencies basically state that color is irrelevant for the behavior of the objects under the available actions. In addition, a successful grasp requires an appropriate object size, while the velocity of the object depends on its shape and the action. We refer the reader to [94] for further details on the affordance learning.

Table 1.2: Random variables in the network and possible values.

Symbol   Description            Values
A        Action                 grasp, tap, touch
C        Color                  green1, green2, yellow, blue
Sh       Shape                  ball, box
S        Size                   small, medium, large
OV       Object velocity        small, medium, large
HV       Hand velocity          small, medium, large
Di       Object-hand velocity   small, medium, large
Ct       Contact duration       none, short, long

Fig. 1.7: Affordance network representing relations between actions, object features and the corresponding effects. Node labels are shown in Table 1.2.

Emulation

We now present the results obtained in several basic interaction games using the affordance network. The games proceed as follows. The robot observes a demonstrator performing an action on a given object. Then, given a specific imitation metric, it selects an action and an object to interact with so as to imitate (emulate) the demonstrator. Figure 1.8 depicts the demonstration, the different objects presented to the robot, and the selected actions and objects for different metrics.

In the experiments we used two different demonstrations: a tap on a small ball (resulting in high velocity and medium hand-object distance) and a grasp on a small square (resulting in small velocity and small hand-object distance). Notice that contact information is not available when observing others.

The goal of the robot is to replicate the observed effects. The first situation (Fig. 1.8a) is trivial, as only tapping has a non-zero probability of producing high velocity. Hence, the emulation function selected a tap on the single object available.


Fig. 1.8: Different emulation behaviors. Top row: demonstration; middle row: set of potential objects; bottom row: emulation. Situations (a) through (d) represent: (a) emulation of the observed action, (b) replication of the observed effect, (c) replication of the observed effect, and (d) replication of the observed effect considering the shape of the object.

In Fig. 1.8b the demonstrator performed the same action, but the robot now has to decide between two different objects. Table 1.3 shows the probabilities of the desired effects given the six possible combinations of actions and objects. The robot selected the one with the highest probability and performed a tap on the ball.

Table 1.3: Probability of achieving the desired effect for each action and the objects of Fig. 1.8b.

Obj \ Action       Grasp   Tap    Touch
Large blue ball    0.00    0.20   0.00
Small yellow box   0.00    0.06   0.00
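Using the values of Table 1.3, the selection in this game reduces to maximizing the probability of the desired effect over object-action pairs, e.g.:

```python
# Probabilities of achieving the desired effect (values from Table 1.3).
p_effect = {
    ("large blue ball", "grasp"): 0.00,
    ("large blue ball", "tap"): 0.20,
    ("large blue ball", "touch"): 0.00,
    ("small yellow box", "grasp"): 0.00,
    ("small yellow box", "tap"): 0.06,
    ("small yellow box", "touch"): 0.00,
}

# Emulation: pick the object-action pair with the highest probability
# of reproducing the demonstrated effect.
best = max(p_effect, key=p_effect.get)
```

This yields a tap on the large blue ball, matching the robot's choice described in the text.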


Figures 1.8c and 1.8d illustrate how including the object features in the metric function produces different behaviors. After observing the grasp demonstration, the robot has to select among three objects: a large yellow ball, a small yellow ball and a small blue box. In the first case the objective was to obtain the same effects. The probability of grasping each of the objects is 0.88, 0.92 and 0.52, respectively, and the robot grasped the small yellow ball, even though the same object is also on the table (Fig. 1.8c). Notice that this is not a failure, since it maximizes the probability of a successful grasp, which is the only requirement of the metric function.

We conclude by noting that other criteria can include more complex information, such as similarity of object shape. When also taking this new criterion into account, the robot selected the blue box instead (see Fig. 1.8d).

1.6 Imitating by Inferring the Goal

So far in this chapter we have discussed learning by imitation at the motor level and at the effect level. The former focuses on replicating the exact movement observed, in a sense disregarding the effect on the environment and the social context in which the movement is executed. The latter addresses imitation at a higher level of abstraction, focusing on replicating the effects produced by the observed action in the environment, ignoring to some extent the exact motor trajectory executed. As seen in Section 1.2, knowledge about the world, the demonstrator and the learner's own body all influence the way a goal is inferred.

In this section we discuss imitation learning at a yet higher level of abstraction, approaching the concept of "imitation" according to the taxonomy in Fig. 1.2. Concretely, we discuss the fundamental process by which a learner can infer the task to be learned after observing the demonstration by another individual (e.g., a human). We discuss several approaches from the literature that address the problem of inferring the goal of a demonstration at different levels. We then discuss in detail a recent approach to this problem that provides a close relation to, and potentially interesting insights into, imitation in biological contexts.

1.6.1 Goal Inference from Demonstration

Inferring the goal behind a demonstration is, in general, a hard problem, as it requires some form of common background between the learner and the demonstrator. In social animals, this common background greatly depends on the social relation between the demonstrator and the learner. For example, social cues were found to be important in promoting imitation in infants [21, 91]. Several other studies address the general problem of understanding the process of inferring the goal/intention behind a demonstration [14, 67, 91]. Most such studies also address the related problem of understanding how unfulfilled intentions are perceived.


Translating this complex social learning mechanism into artificial systems usually requires the common background to be provided by the designer, who "imprints" in the system whatever background knowledge she determines to be relevant for the particular environmental context of the system. As such, it is hardly surprising that different researchers address the problem of "goal inference" from perspectives as distinct as their own backgrounds and lines of work. For example, in [17] goal inference is cast as an optimization problem, while motor theories of perception are used in [38] to build better imitation systems. These works essentially seek to determine which specific elements of the demonstration are relevant, seeking a hard answer to the fundamental problem of "what to imitate" discussed in Section 1.3.

Other recent works have adopted a fundamentally different approach, in which the learning agent chooses, from a library of possible goals, the one most likely to lead to the observed demonstration. For example, in [64] the problem of imitation is tackled within a planning approach. In this setting, the learner chooses from a pool of possible goals by assessing the optimality of the demonstration (viewed as a plan). Evaluative feedback from the demonstrator is also used to disambiguate between different possible goals.

One significant difficulty in inferring the goal behind a demonstration is that the same observed behavior can often be explained by several possible goals. Goal inference is, therefore, an ill-posed problem, and many approaches adopt a probabilistic setting to partly mitigate this situation [10, 119, 142]. For example, in [10], the authors address the problem of action understanding by children. To this purpose, they propose a Bayesian model that matches the inferences observed in children facing new tasks or environmental constraints. Similar ideas have been applied to robots in different works [80, 119, 142]. In [119], the goals of the robot are restricted to shortest-path problems, while in [80, 142] general goals are considered. In [156], a maximum entropy approach is used to infer the goal in navigation tasks. The paper computes a distribution over "paths to the goal" that matches the observed empirical distributions while otherwise being as little "committed" as possible; optimization is performed by a gradient-based approach. All these approaches handle the body correspondence problem by performing the recognition in terms of a self-world model.

In a sense, all the aforementioned approaches interpret the demonstration as providing "implicit" information about the goal of the demonstrator, a soft answer to the problem of "what to imitate". In other words, while the approaches in [17, 38] seek to single out a particular aspect of the demonstration to replicate, the latter class assumes that the actual goal of the demonstration drives the action choice in the demonstration, but need not be "contained" in it. This makes the latter approach more robust to errors in the demonstration and to non-exhaustive demonstrations. Another way of looking at the distinction between the two classes of approaches outlined above is to interpret the latter as providing models and methods that allow the agent to extract a general task description from the demonstration, rather than a specific mapping from situations to actions that may replicate, to some extent, the observed behavior. This approach is closer to imitation in the biological sense, as defined


in [25]. Finally, several recent works have proposed general models that contrast with those referred to above in that they are able to generate multiple social-learning behaviors [79, 89].

In the remainder of this section we describe in detail the approach in [79]. Following the taxonomy in [25], our model takes into account several possible sources of information. Concretely, the sources of influence on our model's behavior are: beliefs about the world's possible states and the actions causing transitions between them; a baseline preference for certain actions; a variable tendency to infer and share goals in observed behavior; and a variable tendency to act efficiently to reach the observed final states (or any other salient state).

1.6.2 Inverse Reinforcement Learning as Goal Inference

We now introduce the general formalism used to model the interaction of both the learning agent and the demonstrator with their environment. This approach shares many concepts with those in [10, 119, 142], in that it infers a goal from the observed behavior using Bayesian inference to deal with noise and to disambiguate the set of possible goals.

Environment Model

At each time instant t, the environment can be described by its state, a random variable that takes values in a finite set of possible states (the state-space). The transitions between states are controlled to some extent by the actions of the agent (the demonstrator during the demonstration and the learner otherwise). In particular, at each time instant the agent (be it the learner or the demonstrator) chooses an action from its (finite) repertoire of action primitives and, depending on the particular action chosen, the state evolves at time t + 1 according to the transition probabilities P[Xt+1 | Xt, At].

We assume that the learner has knowledge of its world, in the sense that it knows the set of possible states of the environment, its action repertoire and that of the demonstrator, and the world dynamics, i.e., how both its own and the demonstrator's actions affect the way the state changes (the transition probabilities). Note that we do not assume that this world knowledge is correct, in the sense that the agent may not know (or may know incorrectly) the transitions induced by certain actions. In any case, throughout this section we assume this knowledge is fixed; one can imagine the approach described herein taking place after a period of self-modeling and learning about the world.2 In this section, the modeled agent does not learn new actions, but instead learns how to combine known actions in new ways. In this sense, it is essentially distinct from the approach surveyed in Section 1.4.

2 To our knowledge, no work exists that explores knowledge acquisition simultaneously with learning by imitation, but we believe that such an approach could yield interesting results.
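The world model assumed here, finite states, finite actions, and transition probabilities P[Xt+1 | Xt, At], can be stored as a simple array of stochastic matrices. The following sketch is a toy model with made-up numbers, not any of the experiments in this chapter:

```python
import numpy as np

# A minimal tabular world model: P[a] is an |X| x |X| matrix whose entry
# (i, j) is the probability of moving from state i to state j under action a.
n_states, n_actions = 3, 2
P = np.zeros((n_actions, n_states, n_states))
P[0] = [[0.9, 0.1, 0.0],   # action 0 mostly stays put
        [0.0, 0.9, 0.1],
        [0.0, 0.0, 1.0]]
P[1] = [[0.1, 0.9, 0.0],   # action 1 mostly advances one state
        [0.0, 0.1, 0.9],
        [0.0, 0.0, 1.0]]

# Every row must be a probability distribution over next states.
assert np.allclose(P.sum(axis=2), 1.0)
```

A learner and a demonstrator with different bodies would simply carry different arrays of this form, which is all the "world knowledge" the remainder of the section relies on.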


Finally, in a first approach, we assume that the agent is able to recognize the actions performed by the demonstrator. In this section we do not discuss how this recognizer can be built, but note that it can rely, for example, on the affordance models discussed in Section 1.5. Toward the end of the section we briefly discuss how the ability to recognize the demonstrator's actions affects the ability of the learner to recover the correct task to be learned (see also [80]).

In our adopted formalism, we "encode" a general task as a function r mapping states to real values, describing the "desirability" of each particular state. This function r can be seen as a reward for the learner and, once r is known, the problem falls back into the standard framework of Markov decision processes [115]. In fact, given the transition probabilities and the reward function r, it is possible to compute a function Qr that, at each possible state, provides a "ranking" of all actions, detailing how useful each particular action is in terms of the overall goal encoded in r. From this function Qr(x,a) it is possible to extract an optimal decision rule, henceforth denoted by πr and referred to as the optimal policy for reward r, that indicates to the agent the best action(s) to choose at each state,

πr(x) = argmax_a Qr(x,a).

The computation of πr (or, equivalently, Qr) given r is a standard problem and can be solved using any of several standard methods available in the literature [115].
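For instance, Qr can be obtained by standard value iteration, after which πr is just the greedy rule. The sketch below is one such generic tabular method (the two-state problem at the end is purely illustrative):

```python
import numpy as np

def q_value_iteration(P, r, gamma=0.95, tol=1e-8):
    """Compute Q_r for a state reward r under transition model P.

    P has shape (n_actions, n_states, n_states); r has shape (n_states,).
    """
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    while True:
        V = Q.max(axis=1)                               # greedy state values
        Q_new = r[:, None] + gamma * np.einsum('aij,j->ia', P, V)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

# Two states, two actions: action a moves deterministically to state a,
# and state 1 is the only rewarding state.
P = np.array([[[1.0, 0.0], [1.0, 0.0]],                 # action 0 -> state 0
              [[0.0, 1.0], [0.0, 1.0]]])                # action 1 -> state 1
r = np.array([0.0, 1.0])
Q = q_value_iteration(P, r)
policy = Q.argmax(axis=1)                               # pi_r(x) = argmax_a Q_r(x,a)
```

In this toy problem the greedy policy selects action 1 in both states, heading for (and staying in) the rewarding state.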

Bayesian Recovery of the Task Description

We consider that the demonstration consists of a sequence D of state-action pairs

D = {(x1, a1), (x2, a2), ..., (xn, an)}.

Each pair (xi, ai) exemplifies to the learner the expected action (ai) in each of the states visited during the demonstration (xi). In the formalism just described, the goal inference problem lies in the estimation of the function r from the observed demonstration D. Notice that this is closely related to the problem of inverse reinforcement learning as described in [1]. We adopt the method described in [89], which is a basic variation of the Bayesian inverse reinforcement learning (BIRL) algorithm in [116], but the same problem could be tackled using other IRL methods from the literature (see, for example, [102, 136]).

For a given reward function r, we define the likelihood of a state-action pair (x,a) as

Lr(x,a) = P[(x,a) | r] = e^{η Qr(x,a)} / Σ_b e^{η Qr(x,b)},

where η is a user-defined confidence parameter that we describe further ahead. The value Lr(x,a) translates the plausibility of the choice of action a in state x when the underlying task is described by r. The likelihood of a demonstration sequence D can therefore easily be computed from the expression above. We use MCMC


to estimate the distribution over the space of possible reward functions given the demonstration (as proposed in [116]) and choose the maximum a posteriori estimate.
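This inference can be sketched with the softmax likelihood above and a basic random-walk Metropolis sampler standing in for the full BIRL machinery. Here `solve_q` is a placeholder for whatever method computes Qr from r (e.g., value iteration), and all names and the toy problem at the end are illustrative:

```python
import numpy as np

def demo_log_likelihood(Q, demo, eta=5.0):
    """Log of the product of L_r(x_i, a_i) over the demonstration."""
    logL = 0.0
    for x, a in demo:
        z = eta * Q[x]                       # eta * Q_r(x, .)
        m = z.max()                          # log-sum-exp, for stability
        logL += z[a] - (m + np.log(np.exp(z - m).sum()))
    return logL

def birl_map(solve_q, demo, n_states, n_iter=2000, step=0.1, seed=0):
    """Random-walk Metropolis over rewards in [0,1]^n with a flat prior;
    returns the highest-likelihood sample as a MAP estimate."""
    rng = np.random.default_rng(seed)
    r = np.zeros(n_states)
    ll = demo_log_likelihood(solve_q(r), demo)
    best_r, best_ll = r.copy(), ll
    for _ in range(n_iter):
        r_new = np.clip(r + step * rng.standard_normal(n_states), 0.0, 1.0)
        ll_new = demo_log_likelihood(solve_q(r_new), demo)
        if np.log(rng.random()) < ll_new - ll:   # Metropolis accept/reject
            r, ll = r_new, ll_new
            if ll > best_ll:
                best_r, best_ll = r.copy(), ll
    return best_r

# Toy check: two states, two actions, action a jumps to state a; a myopic
# "solver" scores each action by the reward of the state it reaches.
solve_q = lambda r: np.array([[r[0], r[1]], [r[0], r[1]]])
demo = [(0, 1), (1, 1)]                      # demonstrator always picks action 1
r_hat = birl_map(solve_q, demo, n_states=2)
```

Since the demonstrator always chooses the action leading to state 1, the recovered reward assigns more value to state 1 than to state 0, which is exactly the "goal" implicit in the demonstration.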

We note that it may happen that the available transition model is inaccurate. In this situation, the learner should still be able to perceive the demonstrated task, provided that the "errors" in the model are not too severe. We also note that, in the process of estimating this maximum, the learner uses the knowledge concerning the action repertoire and world dynamics of the demonstrator. After the task description (the reward function r) is recovered, the learning agent then uses its own world model to compute the right policy for the recovered task in terms of its own world dynamics and action repertoire.

A Model for Social Learning Mechanisms

Following the taxonomy in Fig. 1.2, we include in our model of social learning three sources of information to be used by the learner in determining the behavior to adopt [79]: baseline preferences for certain actions; a tendency to infer and share goals in observed behavior; and a tendency to act efficiently to reach rewarding states. Each of these sources is quantified in terms of a utility function, QA, QI and QE, respectively, and the learner adheres to the decision-rule associated with the combination of the three,

Q∗ = λAQA +λEQE +λIQI , (1.1)

with λA + λI + λE = 1. By resorting to a convex combination as in Eq. 1.1, there is an implicit trade-off between the different sources of information (see also Fig. 1.9). It remains to discuss how QA, QI and QE are computed from the demonstration.

• The first source of information is the learner's preference between actions. This preference can be interpreted, for example, as a preference for "easier" actions over "harder" ones, in terms of their respective energetic costs. It corresponds to natural inclinations of the learner and is independent of the demonstration. The preference is translated into the corresponding utility function QA, whose values are pre-programmed into the agent.3

• The second source of information corresponds to the desire of the learner to replicate the effects observed in the demonstration. For example, the learner may wish to reproduce the change in the surroundings observed during the demonstration, or to replicate some particular transition experienced by the teacher. This is translated into a utility function QE by considering the reward function that assigns a positive value to the desired effect and then solving the resulting Markov decision process for the corresponding Q-function, which is taken as QE.

3 The exact values of QA translate, at each state, how much a given action is preferred to any other. The values are chosen so as to lie in the same range as the other utility functions, QI and QE.


Fig. 1.9: Combination of several simple behaviors: non-social behavior (follow baseline preferences, λA = 1), emulation (replicate the observed effect, λE = 1), and imitation (adhere to the inferred "intention", replicating the observed actions and effects, λI = 1), with mixed imitative behaviors in between. The line separates the observed social vs. non-social behavior, and does not correspond to the agent's reasoning (reproduced with permission from [79]).

• The third and final source of information is related to the desire of the learner to pursue the same goal as the teacher. Given the demonstration, the learner uses the Bayesian approach outlined before to infer the underlying intention of the teacher. Inferring this intention from the demonstration is thus achieved by a teleological argument [33]: the goal of the demonstrator is perceived as the one that most rationally explains its actions. Note that the goal cannot be reduced to the final effect only, since the means to reach this end effect may also be part of the demonstrator's goal. We denote the corresponding utility function by QI.

It is only to be expected that the use of different values for the parameters λA, λE and λI will lead to different behaviors from the learner. This is indeed so, as illustrated by our experiments. We also emphasize that QE greatly depends on the world model of the learner, while QI also depends on the world model of the teacher.4

4 Clearly, the world model of the learner includes all necessary information relating the action repertoire of the learner and its ability to reproduce a particular effect. On the other hand, the world model of the teacher provides the only information relating the decision-rule of the teacher to its eventual underlying goal.
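Once the three utility functions are available, the convex combination of Eq. 1.1 and the resulting greedy decision rule are immediate. In the toy sketch below (made-up two-state, two-action utilities), moving the weight from λA to λI flips the learner from its baseline preference to the inferred goal:

```python
import numpy as np

def combined_policy(Q_A, Q_E, Q_I, lam_A, lam_E, lam_I):
    """Greedy rule for Q* = lam_A Q_A + lam_E Q_E + lam_I Q_I (Eq. 1.1)."""
    assert abs(lam_A + lam_E + lam_I - 1.0) < 1e-9   # convex combination
    Q_star = lam_A * Q_A + lam_E * Q_E + lam_I * Q_I
    return Q_star.argmax(axis=1)

# Illustrative utilities: the baseline prefers action 0 everywhere, while
# the inferred-goal utility prefers action 1 everywhere.
Q_A = np.array([[1.0, 0.0], [1.0, 0.0]])
Q_I = np.array([[0.0, 1.0], [0.0, 1.0]])
Q_E = np.zeros((2, 2))

non_social = combined_policy(Q_A, Q_E, Q_I, 1.0, 0.0, 0.0)  # baseline only
imitative = combined_policy(Q_A, Q_E, Q_I, 0.0, 0.0, 1.0)   # inferred goal only
```

Intermediate weightings trade off the two tendencies, producing the mixed behaviors of Fig. 1.9.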


1.6.3 Experiments

In this section we compare the simulation results obtained using our proposed model with those observed in a well-known behavioral experiment with children. We also illustrate the application of our imitation-learning framework in a task with a robot.

Modeling Imitation in Humans

In a simple experiment described in [90], several infants were presented with a demonstration in which an adult turned a light on by pressing it with the head. One week later, most infants replicated this peculiar behavior, instead of simply using their hands. Further insights were obtained when, years later, a new dimension was added to the study by including task constraints [54]. In the new experiment, infants were faced with an adult turning the light on with the head while having the hands restrained/occupied. The results showed that, in this new situation, children displayed a more significant tendency to use their hands to turn the light on. The authors suggest that infants understand both the goal and the restriction: when the hands are occupied, infants emulate because they assume that the demonstrator only departed from the "obvious" solution because of the restriction. Notice that, according to Fig. 1.2, using the head corresponds to imitation while using the hand corresponds to (goal) emulation.

We applied our model of social learning to an abstracted version of this experiment, evaluating the dependence of the learner's behavior on the parameters λA, λE and λI in two distinct experiments. In the first experiment, we fixed the weight assigned to the baseline preferences (i.e., we set λA = 0.2) and observed how the behavior changed as λI goes from 0 to 1 (i.e., as the learner increasingly adheres to the inferred goal of the demonstration). The results are depicted in Figure 1.10(a). Notice that, when faced with a restricted teacher, the learner switches to an "emulative" behavior much sooner, replicating the results in [54].

In a second experiment, we disregarded the observed effect (i.e., we set λE = 0) and observed how the behavior of the learner changes as it assigns more importance to the demonstration and focuses less on its baseline preferences (i.e., as λI goes from 0 to 1). The results are depicted in Figure 1.10(b). Notice that, in this test, setting λE to zero means that the agent does not explicitly consider the observed effect. However, when combining its own interests with the observed demonstration (which includes goals, actions and effects), the learner tends to replicate the observed effect and disregard the observed actions, thus displaying emulative behavior. This is particularly evident in the situation of a restricted teacher.

We emphasize that the difference in behavior between the restricted and non-restricted teacher is due only to the perceived difference in the ability of the teacher to interact with the environment. We refer to [79] for further details.


Fig. 1.10: Percentages of replication of the demonstrated action. (a) Percentage of runs in which the learner replicates the demonstrated use of the head, as λI varies with λA = 0.2, for a hands-free and for a restricted teacher; whenever the action was not performed with the head, it was performed with the hand. (b) Rates of occurrence of the different actions (use head vs. use hand), as λI varies with λE = 0; when neither of the two indicated actions is performed, no action is performed. In both plots, each bar corresponds to a trial of 2,000 independent runs.

Robot Learning by Imitation

We now present an application of our imitation learning model in a sequential task using BALTAZAR [78]. To test the imitation learning model on the robot we considered a simple recycling game, where the robot must separate different objects according to their shape (Figure 1.11). We set λE = λA = 0 and used only the imitation module to estimate the intention behind the demonstration. In front of the robot are two slots (Left and Right) where three types of objects can be placed: large balls, small balls and boxes. The boxes should be dropped in a corresponding container and the small balls should be tapped out of the table. The large balls should only be touched, since the robot is not able to efficiently manipulate them. Every time a large ball is touched, it is removed from the table by an external user. The robot thus has available a total of six possible actions: Touch Left (TcL), Touch Right (TcR), Tap Left (TpL), Tap Right (TpR), Grasp Left (GrL) and Grasp Right (GrR).

For the description of the task at hand, we considered a state-space consisting of 17 possible states. Of these, 16 correspond to the possible combinations of objects in the two slots (including empty slots). The 17th state is an invalid state that accounts for the situations where the robot's actions do not succeed (for example, when the robot drops the ball in an invalid position in the middle of the table).
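The 17-state description above can be enumerated directly. A small sketch (slot values written as in Table 1.4, with the empty slot here spelled out as "empty"):

```python
from itertools import product

# Slot contents: big ball ("Ball"), small ball ("ball"), box ("Box"), or empty.
contents = ["Ball", "ball", "Box", "empty"]

# 16 combinations of the two slots, plus the absorbing invalid state, give
# the 17 states of the recycling game.
states = list(product(contents, contents)) + ["invalid"]
```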

We first provided the robot with an error-free demonstration of the optimal behavior rule. As expected, the robot was able to successfully reconstruct the optimal policy. We also observed the learned behavior when the robot was provided with two different demonstrations, both optimal. The results are described in Table 1.4. Each state is represented as a pair (S1, S2), where each Si can take one of the values "Ball"


Fig. 1.11: Illustration of the recycling game. (a) The setup: tap the small balls out of the table, drop the boxes in the pile, touch the large balls. (b) Transition diagrams describing, for each slot/object type (big ball, small ball, box), the probabilities with which each action (Touch, Tap, Grasp) leaves the slot empty or in the invalid state.

(Big Ball), "ball" (Small Ball), "Box" (Box) or ∅ (empty). The second column of Table 1.4 then lists the observed actions for each state and the third column lists the learned policy. Notice that, as before, the robot was able to reconstruct an optimal policy by choosing one of the demonstrated actions in those states where different actions were observed.

Table 1.4: Demonstration 1 is an error-free demonstration. Demonstration 2 is inaccurate and incomplete: the action at state (∅, Ball) was never demonstrated, and the action demonstrated at state (Ball, Ball) was inaccurate. Columns 3 and 5 present the learned policies for Demonstrations 1 and 2, respectively.

State         Demo 1     Learned Pol.   Demo 2   Learned Pol.
(∅, Ball)     TcR        TcR            -        TcR
(∅, Box)      GrR        GrR            GrR      GrR
(∅, ball)     TpR        TpR            TpR      TpR
(Ball, ∅)     TcL        TcL            TcL      TcL
(Ball, Ball)  TcL, TcR   TcL, TcR       GrR      TcL
(Ball, Box)   TcL, GrR   GrR            TcL      TcL
(Ball, ball)  TcL        TcL            TcL      TcL
(Box, ∅)      GrL        GrL            GrL      GrL
(Box, Ball)   GrL, TcR   GrL            GrL      GrL
(Box, Box)    GrL, GrR   GrR            GrL      GrL
(Box, ball)   GrL        GrL            GrL      GrL
(ball, ∅)     TpL        TpL            TpL      TpL
(ball, Ball)  TpL, TcR   TpL            TpL      TpL
(ball, Box)   TpL, GrR   GrR            TpL      TpL
(ball, ball)  TpL        TpL            TpL      TpL


We then provided the robot with an incomplete and inaccurate demonstration. As seen in Table 1.4, the action at state (∅, Ball) was never demonstrated and the action at state (Ball, Ball) was wrong. The last column of Table 1.4 shows the learned policy. Notice that, even in this case, the robot was able to recover the correct policy from an incomplete and inaccurate demonstration.

Fig. 1.12: Execution of the learned policy in state (Box, SBall): (a) initial state; (b) GraspL; (c) TapR; (d) final state.

In Figure 1.12 we illustrate the execution of the optimal learned policy from the initial state (Box, SBall).5

To assess the sensitivity of the imitation learning module to action recognition errors, we tested the learning algorithm for different recognition error rates. For each error rate, we ran 100 trials, each consisting of 45 state-action pairs corresponding to three optimal policies. The obtained results are depicted in Figure 1.13.

As expected, the error in the learned policy increases as the number of wrongly interpreted actions increases. Notice, however, that for small error rates (≤ 15%) the robot is still able to recover the demonstrated policy with an error of only 1%. In particular, taking into account that the error rates of the action recognition method used by the robot are between 10% and 15%, the results in Figure 1.13 guarantee a high probability of accurately recovering the optimal policy.
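The policy error plotted in Fig. 1.13 can be measured, for example, as the fraction of states in which the learned action falls outside the set of optimal actions. The helper below is a hypothetical sketch of such a metric, not necessarily the exact one used in the experiments:

```python
def policy_error(learned, optimal):
    """Fraction of states whose learned action is not among the optimal ones.

    learned: list of actions, one per state.
    optimal: list of sets of optimal actions, one per state.
    """
    wrong = sum(a not in opts for a, opts in zip(learned, optimal))
    return wrong / len(learned)

# E.g., one wrong action out of four states gives an error of 25%.
err = policy_error([0, 1, 2, 0], [{0}, {1}, {2}, {1}])
```

Using sets of optimal actions (rather than a single action per state) accommodates states such as (Ball, Ball) in Table 1.4, where more than one action is optimal.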

We conclude by remarking that a more sophisticated model could take observation noise explicitly into account. This may afford greater insensitivity to the

5 For videos showing additional experiments see http://vislab.isr.ist.utl.pt/baltazar/demos/


Fig. 1.13: Percentage of wrong actions in the learned policy (policy error) as the action recognition error increases.

noise, by including it explicitly in the inference module that estimates the reward representing the goal of the demonstrator.

1.7 Other Imitation Settings

Imitation learning goes far beyond programming a single robot to perform a task, and has been used in many other settings.

For example, in [120], learning from demonstrated behavior somewhat resembles transfer learning: an agent observes demonstrations in different scenarios and uses this knowledge to recover a reward function that can be used in yet other scenarios. This problem is addressed as a max-margin structured learning problem to recover a reward function from a set of demonstrations. In order to simplify the problem and to leverage fast solution methods, the paper formulates it as a non-linear (non-differentiable) unconstrained optimization problem, tackled using subgradient techniques that rely on the solution structure of the (embedded) constraints.

In [45, 97], robots able to imitate have been used to interact with autistic children. In a related application, the Infanoid project [70, 72] deals with gesture imitation [71], interaction with people [73], and joint attention [98]. The robots in this project are used in human-robot interaction scenarios with particular emphasis on people with special needs. Although the results seem promising, and in the short term people seem to react well, care must be taken to ensure that the robots are used to promote socialization with other people, rather than an even stronger focus on the machine itself [121].


Some authors have also addressed imitation in multiagent scenarios, considering multiple demonstrators [132], multiple learners [30] and human-robot joint work [42]. In the presence of multiple demonstrators, these may be performing different tasks and the agent must actively select which one to follow. In [132], this observation led the authors to call their approach active imitation. Active learning approaches applied to imitation are very recent [81, 132]. Typically, the burden of selecting informative demonstrations has been completely on the side of the mentor; active learning approaches instead endow the learner with the power to select which demonstrations the mentor should perform. Several criteria have been proposed: game-theoretic approaches [132], entropy [81] and risk minimization [40].

Computational models of imitation have also been proposed to understand biology by synthesis. Examples include models of language and culture [3, 15], curiosity drives resulting in imitation behaviors [68], and behavior switching in children and chimpanzees [79]. There have also been studies of imitation deficits relying on models of brain connections [110, 124].

We also note that there are other social learning mechanisms that fall outside the "imitation realm" in biological research. Imitation is often seen as a fundamental mental process for acquiring complex social skills, but other mechanisms, although cognitively simpler, may have their own evolutionary advantages [89, 104, 105].

1.8 Discussion

In this chapter we presented an overview of imitation learning from two different perspectives. First, we discussed evidence coming from research in biology and neurophysiology and identified several cognitive processes required for imitation. We particularly emphasized two ideas that have a direct impact on imitation: 1) the action-perception coupling mechanisms involved, for instance, in the mirror system; and 2) the different social learning mechanisms found in infants and primates, not all of which can be classified as "true imitation". We also pointed out the importance of contextual cues in driving these mechanisms and in interpreting the demonstration. As the bottom line, we stress that social learning happens at different levels of abstraction, from pure mimicry to more abstract cognitive processes.

Taking this evidence into account, we then reviewed imitation in artificial systems, i.e., methods to learn from a demonstration. As a result of the advances on this topic, there is currently a vast amount of work. Following the three main challenges identified in [151], we surveyed several methods that take advantage of the information provided by a demonstration in different ways: as initial conditions for self-exploration methods (including planning), as exploration strategies, as data from which to infer world models, or as data from which to infer what the task is. These methods are used with many different goals in mind: to speed up robot programming, to develop more intuitive human-robot interfaces, or to study cognitive and social skills of humans. In addition, we provided experimental results of increasingly abstract imitation behaviors, from motor resonance to task learning.


An open question, and one we only addressed in an empirical way, is how all these methods are related or could be combined to achieve complex imitation behaviors. Indeed, different approaches usually tailor their formalisms to a particular domain of application. It is still not clear how different they are and whether they can be used across several domains. If it turns out that they are indeed different, it would be interesting to understand how to switch between mechanisms, and eventually whether there is a parallel in the brain.

Another important aspect that requires further research is perception. Although not specific to imitation, it plays a crucial role when interpreting the demonstration. Currently, robots are still unable to properly extract relevant information and perceive contextual restrictions from a general-purpose demonstration. Due to this difficulty, having a robot companion that learns by imitation is still beyond our technological and scientific knowledge.

Nevertheless, most of these problems can be somewhat reduced when robot programming is conducted by skilled people who can handle more intrusive sensory modalities. In this chapter, we analyzed in more detail an alternative path to imitation that relies on previously learned models of the robot and the environment to help interpret the demonstration.

Using prior knowledge may simplify the interpretation of the demonstration, but it requires the acquisition of good motor, perceptual and task descriptions. Most approaches consider predefined feature spaces for each of these entities. For object-related tasks, this problem is even more important than for pure motor tasks. A given world state may be described in terms of object locations, object-object relations, or robot-object relations, among many others, but it is not easy to automatically extract, or choose among, the correct representations.
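To make the representation issue concrete, the sketch below describes one hypothetical tabletop scene under the three feature families mentioned above. The scene, the object names and the coordinates are illustrative assumptions; none of the surveyed methods is implied to use exactly these features.

```python
from itertools import combinations
import math

# Hypothetical scene: 2-D positions for the robot hand and two objects.
scene = {"hand": (0.0, 0.0), "cup": (0.3, 0.1), "box": (0.6, 0.4)}

def absolute_features(scene):
    """World state as raw object locations."""
    return {name: pos for name, pos in scene.items() if name != "hand"}

def object_object_features(scene):
    """World state as pairwise object-object distances."""
    objs = [n for n in scene if n != "hand"]
    return {(a, b): math.dist(scene[a], scene[b])
            for a, b in combinations(objs, 2)}

def robot_object_features(scene):
    """World state as robot-object distances."""
    return {n: math.dist(scene["hand"], scene[n])
            for n in scene if n != "hand"}
```

The same physical situation yields three different state descriptions, and a demonstration that looks invariant under one of them (e.g. "bring the cup next to the box") may look arbitrary under another, which is precisely why choosing the representation automatically is hard.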

Finally, a recent trend in imitation learning tries to learn task abstractions from demonstrations. The rationale is that, once the robot has understood a task in an abstract manner, it can easily reason about the contextual cues that drive imitation behaviors, include them in future plans and, as a result, generalize better to other situations. In our experimental results, we showed how to combine multiple task descriptions to switch between different social learning behaviors through a biologically-inspired computational imitation model. Having such a representation also opens the door to more general cognitive imitation architectures for robots.

Future applications of imitation will handle human-robot collaboration in cooperative settings (with several robots or people) and active strategies for interaction with the demonstrator.

We conclude this review by stating our belief that imitation and learning by demonstration will become one of the capabilities that future fully autonomous robots will use extensively, both to acquire new skills and to adapt to new situations in an efficient manner. The path to this objective is still full of exciting research challenges and fascinating links to the way we, humans, develop and learn.

Acknowledgements This work was (partially) supported by: the Portuguese Fundação para a Ciência e a Tecnologia; the Information and Communications Technologies Institute (under the Carnegie Mellon-Portugal Program); the Programa Operacional Sociedade de Conhecimento (POS C), which includes FEDER funds, and the project PTDC/EEA-ACR/70174/2006; and by the EU projects RobotCub (IST-004370) and Contact (EU-FP6-NEST-5010).

References

1. Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the 21st International Conference on Machine Learning (ICML'04), pp. 1–8 (2004)

2. Alissandrakis, A., Nehaniv, C.L., Dautenhahn, K.: Imitating with Alice: Learning to imitate corresponding actions across dissimilar embodiments. IEEE Transactions on Systems, Man, & Cybernetics, Part A: Systems and Humans 32(4), 482–496 (2002)

3. Alissandrakis, A., Nehaniv, C.L., Dautenhahn, K.: Towards robot cultures? Learning to imitate in a robotic arm test-bed with dissimilarly embodied agents. Interaction Studies 5(1), 3–44 (2004)

4. Alissandrakis, A., Nehaniv, C.L., Dautenhahn, K.: Action, state and effect metrics for robot imitation. In: 15th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN 06), pp. 232–237. Hatfield, United Kingdom (2006)

5. Arbib, M.A., Billard, A., Iacoboni, M., Oztop, E.: Synthetic brain imaging: grasping, mirror neurons and imitation. Neural Networks 13, 975–997 (2000)

6. Argall, B., Chernova, S., Veloso, M.: A survey of robot learning from demonstration. Robotics and Autonomous Systems 57(5), 469–483 (2009)

7. Asada, M., Ogino, M., Matsuyama, S., Ooga, J.: Imitation learning based on visuo-somatic mapping. In: O.K. Marcelo, H. Ang (eds.) 9th Int. Symp. Exp. Robot., vol. 21, pp. 269–278. Springer-Verlag, Berlin, Germany (2006)

8. Asada, M., Yoshikawa, Y., Hosoda, K.: Learning by observation without three-dimensional reconstruction. In: Intelligent Autonomous Systems (IAS-6), pp. 555–560 (2000)

9. Atkeson, C.G., Schaal, S.: Robot learning from demonstration. In: 14th International Conference on Machine Learning, pp. 12–20. Morgan Kaufmann (1997)

10. Baker, C.L., Tenenbaum, J.B., Saxe, R.R.: Bayesian models of human action understanding. Advances in Neural Information Processing Systems 18 (2006)

11. Bakker, P., Kuniyoshi, Y.: Robot see, robot do: An overview of robot imitation. In: AISB'96 Workshop on Learning in Robots and Animals, pp. 3–11 (1996)

12. Bandera, J.P., Marfil, R., Molina-Tanco, L., Rodríguez, J.A., Bandera, A., Sandoval, F.: Robot Learning by Active Imitation, pp. 519–544. ARS Publ. (2007)

13. Bekkering, H., Wohlschlager, A., Gattis, M.: Imitation of gestures in children is goal-directed. Quarterly J. Experimental Psychology 53A, 153–164 (2000)

14. Bellagamba, F., Tomasello, M.: Re-enacting intended acts: Comparing 12- and 18-month-olds. Infant Behavior and Development 22(2), 277–282 (1999)

15. Billard, A.: Imitation: A means to enhance learning of a synthetic proto-language in an autonomous robot. In: Imitation in Animals and Artifacts, pp. 281–311. MIT Press (1999)

16. Billard, A., Calinon, S., Dillman, R., Schaal, S.: Handbook of Robotics, chap. 59: Robot Programming by Demonstration. Springer (2007)

17. Billard, A., Epars, Y., Calinon, S., Cheng, G., Schaal, S.: Discovering optimal imitation strategies. Robotics and Autonomous Systems 47(2-3) (2004)

18. Boucenna, S., Gaussier, P., Andry, P.: What should be taught first: the emotional expression or the face? In: 8th International Conference on Epigenetic Robotics. Brighton, UK (2008)

19. Brass, M., Schmitt, R.M., Spengler, S., Gergely, G.: Investigating action understanding: Inferential processes versus action simulation. Current Biology 17(24), 2117–2121 (2007)

20. Breazeal, C.: Imitation as social exchange between humans and robots. In: AISB Symp. Imitation in Animals and Artifacts, pp. 96–104 (1999)


21. Brugger, A., Lariviere, L.A., Mumme, D.L., Bushnell, E.W.: Doing the right thing: Infants' selection of actions to imitate from observed event sequences. Child Development 78(3), 806–824 (2007)

22. Bruner, J.: Nature and use of immaturity. American Psychologist 27, 687–708 (1972)

23. Byrne, R.W.: Imitation without intentionality. Using string parsing to copy the organization of behaviour. Animal Cognition 2, 63–72 (1999)

24. Calinon, S., Billard, A.: Learning of gestures by imitation in a humanoid robot. In: K. Dautenhahn, C.L. Nehaniv (eds.) Imitation and Social Learning in Robots, Humans and Animals: Behavioural, Social and Communicative Dimensions, pp. 153–177. Cambridge University Press (2007)

25. Call, J., Carpenter, M.: Three sources of information in social learning. In: Imitation in Animals and Artifacts. MIT Press, Cambridge, MA, USA (2002)

26. Cantin-Martinez, R., Lopes, M., Melo, F.: Inverse reinforcement learning with noisy observations. Tech. rep., Institute for Systems and Robotics, Lisbon, Portugal (2009)

27. Cantin-Martinez, R., Lopes, M., Montesano, L.: Active body schema learning. Tech. rep., Institute for Systems and Robotics, Lisbon, Portugal (2009)

28. Carpenter, M., Call, J., Tomasello, M.: Twelve- and 18-month-olds copy actions in terms of goals. Developmental Science 8(1), F13–F20 (2005)

29. Chalodhorn, R., Rao, R.P.N.: From Motor to Interaction Learning in Robots, chap. Learning to imitate human actions through Eigenposes. Springer (2009)

30. Chernova, S., Veloso, M.: Teaching collaborative multi-robot tasks through demonstration. In: IEEE-RAS International Conference on Humanoid Robots. Korea (2008)

31. Chernova, S., Veloso, M.: Interactive policy learning through confidence-based autonomy. J. Artificial Intelligence Research 34, 1–25 (2009)

32. Cos-Aguilera, I., Canamero, L., Hayes, G.: Using a SOFM to learn object affordances. In: Workshop of Physical Agents (WAF). Girona, Spain (2004)

33. Csibra, G., Gergely, G.: "Obsessed with goals": Functions and mechanisms of teleological interpretation of actions in humans. Acta Psychologica 124, 60–78 (2007)

34. Demiris, J., Hayes, G.: Imitative learning mechanisms in robots and humans. In: European Workshop on Learning Robots, pp. 9–16 (1996)

35. Demiris, J., Rougeaux, S., Hayes, G.M., Berthouze, L., Kuniyoshi, Y.: Deferred imitation of human head movements by an active stereo vision head. In: 6th IEEE Int. Workshop on Robot Human Communication, pp. 88–93 (1997)

36. Demiris, Y., Dearden, A.: From motor babbling to hierarchical learning by imitation: a robot developmental pathway. In: EPIROB-2005, pp. 31–37. Japan (2005)

37. Demiris, Y., Hayes, G.: Imitation as a dual-route process featuring predictive and learning components: a biologically-plausible computational model. In: K. Dautenhahn, C. Nehaniv (eds.) Imitation in Animals and Artifacts. MIT Press, Cambridge, MA, USA (2002)

38. Demiris, Y., Khadhouri, B.: Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems 54, 361–369 (2006)

39. Detry, R., Baseski, E., M.P., Touati, Y., Kruger, N., Kroemer, O., Peters, J., Piater, J.: Learning object-specific grasp affordance densities. In: IEEE 8th International Conference on Development and Learning (2009)

40. Doshi, F., Pineau, J., Roy, N.: Reinforcement learning with limited reinforcement: using Bayes risk for active learning in POMDPs. In: Proceedings of the 25th International Conference on Machine Learning (ICML'08), pp. 256–263 (2008)

41. D'Souza, A., Vijayakumar, S., Schaal, S.: Learning inverse kinematics. In: International Conference on Intelligent Robots and Systems, pp. 298–303. Hawaii, USA (2001)

42. Erlhagen, W., Mukovskiy, A., Chersi, F., Bicho, E.: On the development of intention understanding for joint action tasks. In: 6th IEEE International Conference on Development and Learning (ICDL 2007). London, UK (2007)

43. Erlhagen, W., Mukovsky, A., Bicho, E.: A dynamic model for action understanding and goal-directed imitation. Brain Research 1083, 174–188 (2006)


44. Erlhagen, W., Mukovsky, A., Bicho, E., Panin, G., Kiss, C., Knoll, A., van Schie, H., Bekkering, H.: Goal-directed imitation in robots: a bio-inspired approach to action understanding and skill learning. Robotics and Autonomous Systems 54(5), 353–360 (2006)

45. Field, T., Field, T., Sanders, C., Nadel, J.: Children with autism become more social after repeated imitation sessions. Autism pp. 317–324 (2001)

46. Fitzpatrick, P., Metta, G., Natale, L., Rao, S., Sandini, G.: Learning about objects through action: Initial steps towards artificial cognition. In: IEEE International Conference on Robotics and Automation. Taipei, Taiwan (2003)

47. Fogassi, L., Gallese, V., Buccino, G., Craighero, L., Fadiga, L., Rizzolatti, G.: Cortical mechanism for the visual guidance of hand grasping movements in the monkey: A reversible inactivation study. Brain 124(3), 571–586 (2001)

48. Fritz, G., Paletta, L., Breithaupt, R., Rome, E., Dorffner, G.: Learning predictive features in affordance based robotic perception systems. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. Beijing, China (2006)

49. Furse, E.: A model of imitation learning of algorithms from worked examples. Cybernetics and Systems 32, 121–154 (2001)

50. Galantucci, B., Fowler, C.A., Turvey, M.T.: The motor theory of speech perception reviewed. Psychonomic Bulletin & Review 13(3), 361–377 (2006)

51. Gallese, V., Fadiga, L., Fogassi, L., Rizzolatti, G.: Action recognition in the premotor cortex. Brain 119, 593–609 (1996)

52. Gardner, M.: Imitation and egocentric perspective transformation. In: Virtual poster associated with the Perspectives on Imitation Conference (2002)

53. Gaussier, P., Moga, S., J.B., Quoy, M.: From perception-action loops to imitation processes: A bottom-up approach of learning by imitation. Applied Artificial Intelligence: An International Journal 1(7) (1997)

54. Gergely, G., Bekkering, H., Kiraly, I.: Rational imitation in preverbal infants. Nature 415, 755 (2002)

55. Gibson, J.J.: The Ecological Approach to Visual Perception. Houghton Mifflin, Boston (1979)

56. Gopnik, A.: Cells that read minds? What the myth of mirror neurons gets wrong about the human brain. Slate: Brains: A special issue on neuroscience and neuroculture (2007)

57. Grimes, D.B., Rashid, D.R., Rao, R.P.: Learning nonparametric models for probabilistic imitation. In: Advances in NIPS (2007)

58. Hart, S., Grupen, R., Jensen, D.: A relational representation for procedural task knowledge. In: Proceedings of the 2005 American Association for Artificial Intelligence (AAAI) Conference. Pittsburgh, USA (2005)

59. Hayes, G., Demiris, J.: A robot controller using learning by imitation. In: Int. Symp. Intelligent Robotic Systems, pp. 198–204 (1994)

60. Heckerman, D., Geiger, D., Chickering, M.: Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning (1995)

61. Hersch, M., Sauser, E., Billard, A.: Online learning of the body schema. International Journal of Humanoid Robotics 5(2), 161–181 (2008)

62. Horner, V., Whiten, A.: Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes) and children (Homo sapiens). Animal Cognition 8, 164–181 (2005)

63. Hovland, G., Sikka, P., McCarragher, B.: Skill acquisition from human demonstration using a hidden Markov model. In: IEEE International Conference on Robotics and Automation, pp. 2706–2711. Minneapolis, MN (1996)

64. Jansen, B., Belpaeme, T.: A model for inferring the intention in imitation tasks. In: 15th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN06). Hatfield, UK (2006)

65. Jenkins, O., Bodenheimer, R., Peters, R.: Manipulation manifolds: Explorations into uncovering manifolds in sensory-motor spaces. In: International Conference on Development and Learning (ICDL 2006). Bloomington, IN, USA (2006)


66. Jenkins, O.C., Mataric, M.J., Weber, S.: Primitive-based movement classification for humanoid imitation. In: IEEE International Conference on Humanoid Robots (Humanoids 2000) (2000)

67. Johnson, S., Booth, A., O'Hearn, K.: Inferring the goals of a nonhuman agent. Cognitive Development 16(1), 637–656 (2001)

68. Kaplan, F., Oudeyer, P.Y.: The progress drive hypothesis: an interpretation of early imitation. In: Models and Mechanisms of Imitation and Social Learning: Behavioural, Social and Communication Dimensions, pp. 361–377. Cambridge University Press, Cambridge (2007)

69. Konczak, J., Meeuwsen, H., Cress, M.: Changing affordances in stair climbing: The perception of maximum climbability in young and older adults. Journal of Experimental Psychology: Human Perception & Performance 19, 691–697 (1992)

70. Kozima, H.: Infanoid: An experimental tool for developmental psycho-robotics. In: International Workshop on Developmental Study. Tokyo, Japan (2000)

71. Kozima, H., Nakagawa, C., Yano, H.: Emergence of imitation mediated by objects. In: 2nd Int. Workshop on Epigenetic Robotics (2002)

72. Kozima, H., Nakagawa, C., Yano, H.: Attention coupling as a prerequisite for social interaction. In: IEEE International Workshop on Robot and Human Interactive Communication. Millbrae, USA (2003)

73. Kozima, H., Yano, H.: Designing a robot for contingency-detection game. In: International Workshop on Robotic and Virtual Agents in Autism Therapy. Hertfordshire, England (2001)

74. Kulic, D., Nakamura, Y.: From Motor to Interaction Learning in Robots, chap. Incremental Learning of Full Body Motion Primitives. Springer (2009)

75. Kuniyoshi, Y., Inaba, M., Inoue, H.: Learning by watching: Extracting reusable task knowledge from visual observation of human performance. IEEE Trans. on Robotics and Automation 10(6), 799–822 (1994)

76. Kuniyoshi, Y., Yorozu, Y., Inaba, M., Inoue, H.: From visuo-motor self learning to early imitation: a neural architecture for humanoid learning. In: IEEE Int. Conf. Robotics and Automation, vol. 3, pp. 3132–3139 (2003)

77. Liberman, A.M., Mattingly, I.G.: The motor theory of speech perception revised. Cognition 21, 1–36 (1985)

78. Lopes, M., Beira, R., Praca, M., Santos-Victor, J.: An anthropomorphic robot torso for imitation: design and experiments. In: International Conference on Intelligent Robots and Systems. Sendai, Japan (2004)

79. Lopes, M., Melo, F., Kenward, B., Santos-Victor, J.: A computational model of social-learning mechanisms. Adaptive Behavior (to be published)

80. Lopes, M., Melo, F.S., Montesano, L.: Affordance-based imitation learning in robots. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1015–1021. USA (2007)

81. Lopes, M., Melo, F.S., Montesano, L.: Active learning for reward estimation in inverse reinforcement learning. In: European Conference on Machine Learning (ECML/PKDD). Bled, Slovenia (2009)

82. Lopes, M., Santos-Victor, J.: Visual transformations in gesture imitation: What you see is what you do. In: IEEE Int. Conf. Robotics and Automation (2003)

83. Lopes, M., Santos-Victor, J.: Visual learning by imitation with motor representations. IEEE Trans. Systems, Man, and Cybernetics - Part B: Cybernetics 35(3) (2005)

84. Lopes, M., Santos-Victor, J.: A developmental roadmap for learning by imitation in robots. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics 37(2) (2007)

85. Lyons, D.E., Young, A.G., Keil, F.C.: The hidden structure of overimitation. Proceedings of the National Academy of Sciences 104(50), 19751–19756 (2007)

86. Maistros, G., Marom, Y., Hayes, G.: Perception-action coupling via imitation and attention. In: AAAI Fall Symp. Anchoring Symbols to Sensor Data in Single and Multiple Robot Systems (2001)

87. Mataric, M.J.: Sensory-motor primitives as a basis for learning by imitation: linking perception to action and biology to robotics. In: Imitation in Animals and Artifacts. MIT Press (2002)


88. McGuigan, N., Whiten, A., Flynn, E., Horner, V.: Imitation of causally opaque versus causally transparent tool use by 3- and 5-year-old children. Cognitive Development 22, 353–364 (2007)

89. Melo, F., Lopes, M., Santos-Victor, J., Ribeiro, M.I.: A unified framework for imitation-like behaviors. In: 4th International Symposium on Imitation in Animals and Artifacts. Newcastle, UK (2007)

90. Meltzoff, A.N.: Infant imitation after a 1-week delay: Long-term memory for novel acts and multiple stimuli. Developmental Psychology 24(4), 470–476 (1988)

91. Meltzoff, A.N.: Understanding the intentions of others: Re-enactment of intended acts by 18-month-old children. Developmental Psychology 31(5), 838–850 (1995)

92. Montesano, L., Lopes, M.: Learning grasping affordances from local visual descriptors. In: IEEE 8th International Conference on Development and Learning. China (2009)

93. Montesano, L., Lopes, M., Bernardino, A., Santos-Victor, J.: Affordances, development and imitation. In: IEEE International Conference on Development and Learning. London, UK (2007)

94. Montesano, L., Lopes, M., Bernardino, A., Santos-Victor, J.: Learning object affordances: From sensory-motor coordination to imitation. IEEE Transactions on Robotics 24(1), 15–26 (2008)

95. Montesano, L., Lopes, M., Bernardino, A., Santos-Victor, J.: Learning object affordances: From sensory-motor coordination to imitation. IEEE Transactions on Robotics 28(1) (2008)

96. Murata, A., Fadiga, L., Fogassi, L., Gallese, V., Raos, V., Rizzolatti, G.: Object representation in the ventral premotor cortex (area F5) of the monkey. Journal of Neurophysiology 78(4), 2226–2230 (1997)

97. Nadel, J.: Imitation and imitation recognition: functional use in preverbal infants and nonverbal children with autism. In: The Imitative Mind. Cambridge University Press (2002)

98. Nagai, Y., Asada, M., Hosoda, K.: A developmental approach accelerates learning of joint attention. In: International Conference on Development and Learning (2002)

99. Nakaoka, S., Nakazawa, A., Yokoi, K., Hirukawa, H., Ikeuchi, K.: Generating whole body motions for a biped humanoid robot from captured human dances. In: IEEE International Conference on Robotics and Automation. Taipei, Taiwan (2003)

100. Nehaniv, C., Dautenhahn, K.: Mapping between dissimilar bodies: Affordances and the algebraic foundations of imitation. In: European Workshop on Learning Robots (1998)

101. Nehaniv, C.L., Dautenhahn, K.: Like me? Measures of correspondence and imitation. Cybernetics and Systems 32, 11–51 (2001)

102. Neu, G., Szepesvári, C.: Apprenticeship learning using inverse reinforcement learning and gradient methods. In: Uncertainty in Artificial Intelligence (UAI), pp. 295–302 (2007)

103. Nielsen, M.: Copying actions and copying outcomes: Social learning through the second year. Developmental Psychology 42(3), 555–565 (2006)

104. Noble, J., Franks, D.W.: Social learning mechanisms compared in a simple environment. In: Artificial Life VIII: Proceedings of the Eighth International Conference on the Simulation and Synthesis of Living Systems, pp. 379–385. MIT Press, Cambridge, MA, USA (2002)

105. Oudeyer, P.Y.: The social formation of acoustic codes with "something simpler". In: K. Dautenhahn, C. Nehaniv (eds.) Second International Workshop on Imitation in Animals and Artefacts, AISB'03. Aberystwyth, Wales (2003)

106. Oztop, E., Kawato, M., Arbib, M.: Mirror neurons and imitation: A computationally guided review. Neural Networks 19(3), 254–271 (2006)

107. Pelleg, D., Moore, A.W.: X-means: Extending k-means with efficient estimation of the number of clusters. In: International Conference on Machine Learning. San Francisco, CA, USA (2000)

108. Peters, J., Schaal, S.: Reinforcement learning of motor skills with policy gradients. Neural Networks pp. 682–697 (2008)

109. Peters, J., Vijayakumar, S., Schaal, S.: Natural actor-critic. In: Proc. 16th European Conf. Machine Learning, pp. 280–291 (2005)

110. Petreska, B., Billard, A.: A neurocomputational model of an imitation deficit following brain lesion. In: ICANN'2006, vol. 4131, pp. 770–779 (2006)


111. Pomplun, M., Mataric, M.J.: Evaluation metrics and results of human arm movement imitation. In: IEEE-RAS Int. Conf. Humanoid Robotics (2000)

112. Price, B.: Accelerating reinforcement learning with imitation. Ph.D. thesis, University of British Columbia (2003)

113. Price, B., Boutilier, C.: Implicit imitation in multiagent reinforcement learning. In: Proc. ICML, pp. 325–334 (1999)

114. Price, B., Boutilier, C.: Accelerating reinforcement learning through implicit imitation. J. Artificial Intelligence Research 19, 569–629 (2003)

115. Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc. (1994)

116. Ramachandran, D., Amir, E.: Bayesian inverse reinforcement learning. In: 20th Int. Joint Conf. Artificial Intelligence. India (2007)

117. Ramachandran, V.: Mirror neurons and imitation learning as the driving force behind the great leap forward in human evolution. Edge 69 (2000)

118. Range, F., Viranyi, Z., Huber, L.: Selective imitation in domestic dogs. Current Biology 17(10), 868–872 (2007)

119. Rao, R., Shon, A., Meltzoff, A.: Imitation and Social Learning in Robots, Humans, and Animals, chap. A Bayesian model of imitation in infants and robots. Cambridge University Press (2007)

120. Ratliff, N., Bagnell, J., Zinkevich, M.: Maximum margin planning. In: Proc. 23rd Int. Conf. Machine Learning, pp. 729–736 (2006)

121. Robins, B., Dautenhahn, K., Dubowski, J.: Robots as isolators or mediators for children with autism? A cautionary tale. In: AISB05: Social Intelligence and Interaction in Animals, Robots and Agents, pp. 12–15. Hatfield, UK (2005)

122. Sahin, E., Cakmak, M., Dogar, M., Ugur, E., Ucoluk, G.: To afford or not to afford: A new formalization of affordances towards affordance-based robot control. Adaptive Behavior 15(5), 447–472 (2007)

123. Sauser, E., Billard, A.: View sensitive cells as a neural basis for the representation of others in a self-centered frame of reference. In: 3rd Int. Symp. Imitation in Animals & Artifacts (2005)

124. Sauser, E., Billard, A.: Parallel and distributed neural models of the ideomotor principle: An investigation of imitative cortical pathways. Neural Networks 19(3), 285–298 (2006)

125. Saxena, A., Driemeyer, J., Ng, A.Y.: Robotic grasping of novel objects using vision. International Journal of Robotics Research (IJRR) (2008)

126. Saxena, A., Wong, L., Ng, A.Y.: Learning grasp strategies with partial shape information. In: AAAI (2008)

127. Schaal, S.: Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences 3(6), 233–242 (1999)

128. Schaal, S., Ijspeert, A., Billard, A.: Computational approaches to motor learning by imitation. Phil. Trans. of the Royal Society of London: Series B, Biological Sciences 358(1431) (2003)

129. Schaal, S., Peters, J., Nakanishi, J., Ijspeert, A.: Learning movement primitives. In: International Symposium on Robotics Research (ISRR2003) (2003)

130. Shon, A., Grochow, K., Rao, R.: Robotic imitation from human motion capture using Gaussian processes. In: IEEE/RAS International Conference on Humanoid Robots (Humanoids) (2005)

131. Shon, A.P., Storz, J., Rao, R.P.N.: Towards a real-time Bayesian imitation system for a humanoid robot. In: ICRA'07 (2007)

132. Shon, A.P., Verma, D., Rao, R.P.N.: Active imitation learning. In: AAAI (2007)

133. Stoytchev, A.: Behavior-grounded representation of tool affordances. In: International Conference on Robotics and Automation. Barcelona, Spain (2005)

134. Stulp, F., Fedrizzi, A., Beetz, M.: Learning and performing place-based mobile manipulation. In: IEEE 8th International Conference on Development and Learning (2009)

135. Sturm, J., Plagemann, C., Burgard, W.: Adaptive body scheme models for robust robotic manipulation. In: RSS - Robotics Science and Systems IV. Zurich, Switzerland (2008)


136. Syed, U., Schapire, R., Bowling, M.: Apprenticeship learning using linear programming. In: Proceedings of the 25th International Conference on Machine Learning (ICML'08), pp. 1032–1039 (2008)

137. Tani, J., Ito, M., Sugita, Y.: Self-organization of distributedly represented multiple behavior schemata in a mirror system: Reviews of robot experiments using RNNPB. Neural Networks 17(8/9), 1273–1289 (2004)

138. Tennie, C., Call, J., Tomasello, M.: Push or pull: Imitation vs. emulation in great apes and human children. Ethology 112(12), 1159–1169 (2006)

139. Thurau, C., Paczian, T., Sagerer, G.: Bayesian imitation learning in game characters. Int. J. Intelligent Systems Technologies and Applications 2(2/3) (2007)

140. Tomasello, M., Kruger, A.C., Ratner, H.H.: Cultural learning. Behavioral and Brain Sciences 16(3), 495–511 (1993)

141. Turvey, M., Shockley, K., Carello, C.: Affordance, proper function, and the physical basis of perceived heaviness. Cognition 73 (1999)

142. Verma, D., Rao, R.: Goal-based imitation as probabilistic inference over graphical models. In: Advances in NIPS 18 (2006)

143. Visalberghi, E., Fragaszy, D.: "Do monkeys ape?": ten years after. In: Imitation in Animals and Artifacts. MIT Press, Cambridge, MA, USA (2002)

144. Wang, J.M., Fleet, D.J., Hertzmann, A.: Gaussian process dynamical models for human motion. IEEE Trans. Pattern Analysis and Machine Intelligence pp. 283–298 (2008)

145. Want, S., Harris, P.: Learning from other people's mistakes: Causal understanding in learning to use a tool. Child Development 72(2), 431–443 (2001)

146. Want, S.C., Harris, P.L.: How do children ape? Applying concepts from the study of non-human primates to the developmental study of "imitation" in children. Developmental Science 5(1), 1–13 (2002)

147. Whiten, A., Custance, D., Gomez, J.C., Teixidor, P., Bard, K.A.: Imitative learning of artificial fruit processing in children (Homo sapiens) and chimpanzees (Pan troglodytes). Journal of Comparative Psychology 110, 3–14 (1996)

148. Whiten, A., Horner, V., Litchfield, C.A., Marshall-Pescini, S.: How do apes ape? Learning & Behavior 32(1), 36–52 (2004)

149. Williams, T.G., Rowland, J.J., Lee, M.H.: Teaching from examples in assembly and manipulation of snack food ingredients by robot. In: 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2300–2305 (2001)

150. Williamson, R.A., Markman, E.M.: Precision of imitation as a function of preschoolers' understanding of the goal of the demonstration. Developmental Psychology 42(4), 723–731 (2006)

151. Wolpert, D.M., Doya, K., Kawato, M.: A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences 358(1431), 593–602 (2003)

152. Wolpert, D.M., Kawato, M.: Multiple paired forward and inverse models for motor control. Neural Networks 11(7-8), 1317–1329 (1998)

153. Yang, J., Xu, Y., Chen, C.: Hidden Markov model approach to skill learning and its application to telerobotics. IEEE Transactions on Robotics and Automation 10(5), 621–631 (1994)

154. Zentall, T.R.: Imitation in animals: Evidence, function, and mechanisms. Cybernetics and Systems 32(1), 53–96 (2001)

155. Zhang, J., Rossler, B.: Self-valuing learning and generalization with application in visually guided grasping of complex objects. Robotics and Autonomous Systems 47, 117–127 (2004)

156. Ziebart, B., Maas, A., Bagnell, J., Dey, A.: Maximum entropy inverse reinforcement learning. In: Proc. 23rd AAAI Conf. Artificial Intelligence, pp. 1433–1438 (2008)