
Robotics and Autonomous Systems 60 (2012) 729–741

    Contents lists available at SciVerse ScienceDirect

    Robotics and Autonomous Systems

    journal homepage: www.elsevier.com/locate/robot

Imitating others by composition of primitive actions: A neuro-dynamic model

Hiroaki Arie a,∗, Takafumi Arakaki b, Shigeki Sugano b, Jun Tani a

a Brain Science Institute, RIKEN, Japan
b Department of Advanced Mechanical Engineering, Waseda University, Japan

Article info

Article history: Available online 25 November 2011

Keywords: Cognitive robotics; Neural network; Imitation; Dynamical system

Abstract

This paper introduces a novel neuro-dynamical model that accounts for possible mechanisms of action imitation and learning. Imitation learning is considered to require at least two classes of generalization. One is generalization over sensory–motor trajectory variances; the other operates on the cognitive level and concerns a more qualitative understanding of compositional actions, by oneself and by others, that does not necessarily depend on exact trajectories. This paper describes a possible model dealing with these classes of generalization by focusing on the problem of action compositionality. The model was evaluated in experiments using a small humanoid robot. The robot was trained on a set of different object-manipulation actions, each of which can be decomposed into a sequence of action primitives. The robot was then asked to imitate a novel compositional action, composed of prior-learned action primitives, demonstrated by a human subject. The results showed that the novel action can be successfully imitated by decomposing and recomposing it from the primitives, by means of a unified intentional representation hosted by mirror neurons, even though the trajectory-level appearance differs between observed and self-generated actions.

© 2011 Elsevier B.V. All rights reserved.

    1. Introduction

Understanding others' actions and imitating them is a significant cognitive capability for humans. Human adults can easily acquire various actions within a very short time by watching others' actions. The learning of such skilled actions should not be considered just as recording a set of experienced motor trajectories in rote memory, but as acquiring certain structures from them. One such essential structure is so-called "behavior compositionality" [1]. The term compositionality here is adopted from the "principle of compositionality" [2] in linguistics. The principle claims that the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them. This principle can be adopted also in the action domain: skilled actions can be generated by combining a set of re-usable action primitives adaptively by following rules. For example, drinking a cup of water might be decomposed into multiple action primitives such as reaching for a cup, grasping the cup, and moving the cup toward one's mouth. Each action primitive can also be re-utilized as a component of other actions; e.g., reaching for a cup can serve another goal such as clearing it away.

∗ Corresponding author. E-mail address: [email protected] (H. Arie).

This idea of decomposing whole actions into sequences of reusable primitives was proposed by Arbib [3] as motor schemata theory. The theory holds that a set of action primitives might be preserved in a memory pool, from which a higher level retrieves adequate primitives and combines them to generate desired actions. A substantial number of engineering as well as theoretical studies have attempted to implement this idea in autonomous agents, including robots [4–6], as a key mechanism for generating diverse and complex actions. The typical implementations prepare a set of local modules for registering action primitive patterns, using a soft computing scheme such as neural network models [7] or fuzzy systems [8], in the lower level, and combine those primitives into sequences by manipulating pointers to them symbolically in the higher level. However, such hybrid approaches are likely to suffer from the symbol grounding problem [9]. This is because the interface between the higher level of symbol processing and the lower level of pattern processing can be just arbitrary, since symbol systems consisting of arbitrary shapes of tokens cannot share the same metric space with the analog patterns in their interactions [10,11].

On this account, Tani and his colleagues [10–12] considered that whole systems might be better constructed on seamless analog dynamical systems instead of having hybrids of symbol and pattern systems. This idea seemed possible because the theory of symbolic dynamics [13,14] had already shown that dynamical systems can exhibit the equivalence of symbol-like computation by

0921-8890/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.robot.2011.11.005

730 H. Arie et al. / Robotics and Autonomous Systems 60 (2012) 729–741


Fig. 1. (a) The dynamic function generates different trajectories of x depending on the initial internal state ct=0. (b) An example of generating three different x trajectories which trifurcate after the branching point depending on the initial internal state values.


    Fig. 2. (a) The internal initial state for achieving a distal goal state is inversely computed. (b) Multiple initial states can exist for achieving the same distal goal state.

means of the combinatorial mechanics of chaos and fractals. It had also been shown that adaptive dynamical systems, especially the recurrent neural network (RNN) model, can acquire equivalences of desired symbolic systems of finite state machines (FSMs) by self-organizing fractals [15,16] or chaos [17] in the internal dynamics. Our expectation has been that cognitive processes that are compositional, yet contextual and situated in reality, could result from seamless interactions between top-down cognitive processing and bottom-up sensory–motor processing in embodied neuro-cognitive dynamical systems, under the name of organic compositionality [1].

Now, we explain more specifically how the dynamical systems framework can be utilized to achieve compositionality in generating diverse actions. First, we explain the simpler problem of how a repertory of multiple sensory–motor primitives can be embedded in a dynamic neural network through learning. There are two distinct schemes: a local representation scheme and a distributed representation scheme. In the local representation scheme, each sensory–motor pattern is learned as a forward model in a specific local dynamic neural network.

Each forward model predicts how the sensory–motor state x changes in time by means of its dynamic function f(x, c), where c is the internal state, which is not observable from the outside. This type of dynamic function can be realized by the continuous-time recurrent neural network (CTRNN) model [18,19]. The output of a corresponding local network can be selected externally among others, for example, by utilizing a winner-take-all (WTA) type gating mechanism [20,21].
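The gating idea in the local-representation scheme can be sketched in a few lines. This is only an illustration of the selection principle; the function name and the use of prediction error as the selection criterion are our own assumptions, not the specific mechanism of [20,21].

```python
import numpy as np

def wta_select(prediction_errors):
    """Winner-take-all gating over local forward-model modules: the
    module whose forward model currently predicts the sensory-motor
    state best (smallest prediction error) wins and is the only one
    allowed to drive the output."""
    gates = np.zeros(len(prediction_errors))
    gates[int(np.argmin(prediction_errors))] = 1.0
    return gates
```

For example, given per-module prediction errors [0.4, 0.1, 0.7], the second module wins and its gate is set to 1 while the others are silenced.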

On the other hand, in the distributed representation scheme, a set of sensory–motor primitives is embedded in a single dynamic network. How can each distinct repertory of sensory–motor patterns then be selectively generated? There are two ways to achieve this. One way is to utilize the parameter bifurcation characteristics of nonlinear dynamical systems. A dynamic function associated with a parameter p, denoted f(x, c, p), can generate multiple dynamic patterns by modulating the value of p. Tani et al. [22] implemented this idea in a neural network model, the so-called recurrent neural network with parametric biases (RNNPB).

The other way is to utilize the initial-sensitivity characteristics of nonlinear dynamical systems, the mechanism that is the focus of the current paper. The idea is schematized in Fig. 1(a). The dynamic function can generate different trajectories of x depending on the initial internal state ct=0 given. Based on this idea, Nishimoto et al. [23] showed that a humanoid robot learns to generate three different actions: grasping an object and moving it to the left, grasping an object and lifting it up, and grasping an object and moving it to the right, all starting from the same home position, via repeated supervised training of trajectories. It was shown that differences in the initial internal state values, which are self-determined through learning, enable the robot trajectories to trifurcate after the robot grasps the object (see the schematic in Fig. 1(b)). In this situation, an initial internal state is regarded as the setting of an intention for a particular action program.

The above-mentioned scheme is further extended to apply to goal-directed planning problems [24]. If distal states of learned sensory trajectories are taken as goal states, e.g. the object moved left, upward or right in the previous example, the initial internal states which lead to these goal states can be inversely computed (Fig. 2(a)). This inverse computation requires an iterative search for an optimal initial state value. This process corresponds to deliberative planning. There can be cases in which the same distal goal state is achieved by multiple trajectories, as shown in Fig. 2(b). In such cases, multiple plans, in terms of the corresponding initial states, should be found through the iterative search [24].
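The structure of this inverse computation can be illustrated with a toy one-dimensional system. Everything here (the dynamics, decay factor, learning rate) is a hypothetical stand-in for the actual learned network and search procedure of [24]; the point is only the loop: roll the dynamics forward from a candidate initial state, compare the distal state with the goal, and descend the error.

```python
import numpy as np

def rollout(c0, steps=20):
    # Toy forward dynamics: the trajectory of x depends only on the
    # initial internal state c0 (a stand-in for the learned network).
    x, c = 0.0, c0
    for _ in range(steps):
        x = np.tanh(0.5 * x + c)
        c = 0.9 * c                    # internal state decays over time
    return x                           # distal state reached

def plan_initial_state(goal, c0=0.0, lr=1.0, iters=300, eps=1e-4):
    """Iteratively search an initial state whose rollout reaches the
    goal state, by finite-difference gradient descent on the
    distal-state error (a sketch of the inverse computation)."""
    for _ in range(iters):
        err = rollout(c0) - goal
        grad = (rollout(c0 + eps) - rollout(c0 - eps)) / (2 * eps)
        c0 -= lr * err * grad          # gradient step on 0.5 * err**2
    return c0
```

When several initial states reach the same goal, restarting this search from different values of c0 would recover the multiple plans of Fig. 2(b).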

It is generally thought that the introduction of hierarchy can significantly increase the behavioral complexity of a system. Yamashita and Tani [25] showed that a functional hierarchical structure, responsible for generating action primitives and their sequencing, can emerge in the dynamics of a CTRNN consisting of multiple time-scale dynamics, as illustrated in Fig. 3. In the slow dynamics part, different profiles of slowly changing dynamic patterns (Sa, Sb or Sc) appear depending on the action program given in terms of the initial state (ISa, ISb or ISc), and these drive the fast dynamics part to generate the corresponding sequences of primitive patterns (Fa, Fb or Fc).

Now let us consider possible correspondences of the above-mentioned models to system-level neuroscience. Tani et al. [1] speculated that setting or determining initial internal states might


    Fig. 3. Functional hierarchical structures appear in a dynamic neural network model consisting of multiple time-scale (fast and slow) dynamics.

    Fig. 4. The proposed brain model where triangular functional relations among PFC, PMv and IPL are shown.

correspond to the building-up of a motor program, which is known to take place in the premotor area (PM) as well as the supplementary motor area (SMA) [26]. Furthermore, the prefrontal cortex (PFC), which has been assumed to play important roles in planning [27], might be involved in searching for optimal motor programs in terms of the initial state in PM or SMA. It has been widely considered that PM is involved more with the generation of sensory-guided actions, while SMA handles those driven internally [26]. In particular, the roles of mirror neurons [28] in the ventral premotor cortex (PMv), in action generation and in recognition of the same action performed by others, have drawn a large amount of attention recently. Some emphasize mutual interactions between PMv and the inferior parietal lobe (IPL) in the generation of one's own actions and recognition of the same actions by others [29,30]. They assume that PMv receives integrated sensory information from IPL and sends back the efference copy to IPL. Under these circumstances, the current paper focuses on the triangular functional relations among PFC, PMv and IPL in the generation of one's own skilled actions and the recognition of those of others. One specific assumption in the current model is that IPL might play essential roles in acquiring the predictive model for well-practiced actions by means of the so-called sensory-forward model [1]. More specifically, IPL might anticipate the coming sensory flow in terms of proprioception and vision associated with the currently intended action. The assumption of an anticipation function for IPL might be unusual to some readers, because the main role of IPL has been regarded as an integrator of different modalities of sensory perception [31]. However, there is growing evidence [32–35] that could support the idea of anticipatory functions in IPL.

The global picture of the proposed brain model is shown in Fig. 4. In this figure, PFC might set an intention for an action program in PMv by providing initial states. PMv might generate

abstract sequencing of action primitives by means of its slow dynamics, while IPL, which is bi-directionally connected to PMv, might generate detailed visuo-proprioceptive flow by anticipation by means of its fast dynamics. The prediction of proprioception in terms of body posture, represented in the joint coordinate system, may provide the target position at the next moment to the motor cortex or cerebellum, in which the necessary motor commands might be obtained by means of inverse computation. On the other hand, the prediction of the visual image would generate visual mental imagery for the future, whose signal might descend the visual pathway in a top-down manner. PFC might utilize the visual mental imagery generated in IPL for the purpose of searching for the action programs in PMv that provide the best match between the desired visual goal images and the mentally simulated ones. The details of the brain model will be described in the later sections.

The current paper introduces a novel experiment concerning the imitation of actions using a humanoid robot implemented with the above-mentioned neuro-dynamic model. We consider that there are two different ways of imitation. One is trajectory-level imitation, in which an imitator mimics an action trajectory of a demonstrator exactly [36]. This trajectory-level imitation can be achieved by acquiring the abilities of motion tracking, pose estimation, body correspondence and coordinate transformation. The other is primitive-level imitation, in which an imitator recognizes a demonstrated action as a sequence of action primitives; the imitator then maps the demonstrator's primitives to its own and generates its own action primitive sequence. In the current paper, we focus on the latter case.

At first, the robot learns to bind observations of a human subject's (demonstrator's) actions to the generation of its own corresponding actions in a supervised way (see Fig. 5 for a schematic). The learning is conducted for a set of different actions concerning the manipulation of an object, each of which can be decomposed into sequences of


    Fig. 5. The imitation learning task.

action primitives. The robot learning is conducted with variations in the relative observational angle from the robot to the human demonstrator, as well as in object positions, for the purpose of gaining generalization via accumulated learning. It is important to note that there can be a substantial difference in visual perception between observing the demonstrator's action and generating the corresponding action oneself. So the robot should acquire a more qualitative understanding of actions, independent of the view; that is, it should find a set of action primitives, shared by the self-generation case and the demonstrator-observation case, as segmented from continuous perceptual trajectories, even though the trajectory-level appearances differ between the two.

Then, an interesting test is to see how the robot can imitate novel actions, shown by the demonstrator, which are composed from prior-learned action primitives. This examination addresses general questions in the imitation-learning community: how imitation can be conducted through a qualitative understanding of

others' actions rather than by merely mimicking physical action trajectories, and how generalization via learning can be achieved [37–39]. The current paper attempts to provide a possible account from our proposed neuro-dynamic systems perspective.

    2. Model

In this section, our proposed neuro-dynamic model is explained in more detail. The next subsection describes the overall information flow, making possible correspondences to the anatomical organization of the brain. In particular, we describe two specific modes: on-line generation of anticipatory actions and off-line goal-directed plan generation. More details of the neural network modeling scheme are then given in the subsequent subsection.

    2.1. Brain model

2.1.1. Generation of anticipatory actions
Fig. 6 illustrates how anticipatory actions are generated by

going through interactions between PMv and IPL. It is supposed that PMv receives inputs from PFC which switch the system's operational mode between the self mode and the demonstrator mode. (Neurons distinguishing between one's own actions and the demonstrator's have been found in parietal cortex [40], but not yet in premotor cortex.) The visuo-proprioceptive inputs accompanying one's own actions are predicted in the self mode, while the visual perception of the observed demonstrator is predicted in the demonstrator mode. PFC also sets the initial states (the initial potential values of some neuronal units) in PMv. After starting from the initial state, neuronal activation profiles change slowly in PMv, which affects the forward dynamics in IPL through the mutual interactions between the two. In the self mode, the coming visual and proprioceptive inputs, in terms of body posture, are predicted in IPL based on prior learning. The prediction of proprioception (posture in terms of joint angles) at the next time step is utilized as the target to be achieved at that step. The cerebellum or motor cortex might compute the motor torques necessary to achieve the target joint angles. On the other hand, in the demonstrator mode, the visual inputs at the next time step during observation of the demonstrator's action are predicted. Although prediction outputs for proprioception also exist in the demonstrator mode, they are not utilized but are fed back as recurrent inputs.

    Fig. 6. Generation of anticipatory behaviors via interactions between PMv and IPL.


    Fig. 7. Planning by interactions among PFC, PMv and IPL.

2.1.2. Planning
In the current setting, planning is regarded as a process to

determine the optimal action program that matches the distal visual images of mental simulation with the specified goal image (see Fig. 7). For this purpose, the initial state of PMv is iteratively searched such that the look-ahead prediction of the visual image by IPL fits the specified goal image. The look-ahead prediction in IPL is conducted by means of mental imagery, closing the loop between the next-step prediction and the current-step inputs for both vision and proprioception. The planning can be conducted both for one's own actions and for the demonstrator's actions by switching between the self and demonstrator modes.

    2.2. Mathematical model

2.2.1. Overview
The multiple timescale recurrent neural network (MTRNN) [25]

model used in this study is shown in Fig. 8. A small humanoid robot, which has two 4-DOF arms, a 2-DOF neck and a stereo camera, is used in the role of a physical body interacting with the actual environment. The neural network model receives the proprioceptive state (Pt), representing the posture of the arms in terms of the joint angles, the direction of the camera in terms of the neck joint angles (Vdt), and the visual image obtained with the camera (Vvt).

At first, the current sensory inputs (Pt, Vdt, Vvt) are pre-processed

using topology preserving maps (TPMs), which transform a vector of continuous values into a neural population code. After this transformation, the input signals are sent to the MTRNN, and the MTRNN predicts the sensory inputs, encoded in the neural population code, for the next time step. This predicted signal is transformed inversely from the population code back to continuous values (Pt+1, Vdt+1, Vvt+1) using the TPMs. The actual robot movement then simply follows this prediction of the proprioceptive state. In addition, the system can also be operated in the so-called closed-loop mode to generate mental simulations of sensory sequential patterns, without generating physical movements, by feeding back the prediction of the next sensory inputs as the current ones.
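The closed-loop ("mental simulation") mode can be sketched generically: whatever the one-step predictor is, its output is simply fed back as its next input. The step_fn interface below is a hypothetical simplification of the MTRNN's prediction loop, not the paper's implementation.

```python
def mental_simulation(step_fn, state, x0, steps):
    """Closed-loop generation: the prediction of the next sensory input
    is fed back as the current input, so a whole sensory sequence can
    be imagined without physical movement.

    step_fn : one-step predictor, (state, x) -> (state, x_next)
    state   : internal (context) state carried between steps
    x0      : initial sensory input
    """
    xs = [x0]
    x = x0
    for _ in range(steps):
        state, x = step_fn(state, x)
        xs.append(x)
    return xs
```

In open-loop operation the same predictor would instead receive the actually sensed input at each step; the only difference between the two modes is the source of x.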

In our current model, the MTRNN consists of three parts, namely input-output units, fast context units and slow context units. Both the input-output units and the fast context units correspond to the IPL, while the slow context units correspond to the PMv. The number of neural units is 100 for the fast context and 50 for the slow

    Fig. 8. Neural network model interfacing with a robot.

context, respectively. Among the slow context units, there are four special neuronal units. Two of them are used to specify action programs. The initial states of these neurons (IS) take the same value both when generating a particular action in the self mode and when generating mental imagery of observing the corresponding action by the demonstrator. Each initial state for the corresponding action sequence is self-determined in the learning process, bound to be the same for both modes. The other two neurons function as a switch between the self mode and the demonstrator mode. The neuronal activations of these switching units are set to different values, held constant from the start to the end of the action sequence, by PFC, depending on the operational mode (self or demonstrator).

In our current model, PFC has two functions. The first is to remember the initial states of PMv, which were self-determined in the learning process, as a database. When the robot is required to generate an action sequence that it has already experienced, PFC recalls the initial state from memory and sets the initial state of PMv. The other is to find the initial state of PMv corresponding to the optimal action program for achieving a given goal state. This search algorithm is described in the following section.
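These two functions of PFC can be sketched as a recall-or-search dispatch. All names here are illustrative assumptions rather than the paper's implementation, and storing the searched result back into memory is our own small extrapolation.

```python
def pfc_initial_state(goal, memory, search_fn):
    """Sketch of PFC's two functions: recall a stored initial state for
    an already-learned action, otherwise run the iterative search for
    one (search_fn stands in for the planning procedure)."""
    if goal in memory:
        return memory[goal]            # function 1: database recall
    initial_state = search_fn(goal)    # function 2: iterative search
    memory[goal] = initial_state       # remember it for next time
    return initial_state
```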


2.2.2. Sparse encoding of input-output
Inputs to the neural network are sparsely encoded in the form

of a population coding using topology preserving maps (TPMs). The current model has three TPMs: one map corresponding to the proprioceptive state P, one corresponding to the neck joint angle Vd, and one corresponding to the camera image Vv. The TPM is a type of neural network that produces a discretized representation of the input space of training samples. This sparse encoding of input trajectories reduces the overlaps between input sequences, which improves the learning ability of the MTRNN, as described in [25]. The sizes of the TPMs are 100 (10 × 10) for proprioception, 36 (6 × 6) for the neck joint angle and 144 (12 × 12) for the camera image, respectively.

In the current study, the TPMs are trained in advance of the MTRNN training using a conventional unsupervised learning algorithm [41]. The training data set of the TPMs includes all teaching signals for the MTRNN. The TPM transformation is described by the following formula,

$$q_{i,t} = \frac{\exp\left\{-\left\| k_i - k^{teach} \right\|^2 / \sigma\right\}}{\sum_{j \in Z} \exp\left\{-\left\| k_j - k^{teach} \right\|^2 / \sigma\right\}} \qquad (1)$$

where, if $i \in P$, then $Z = P$ and $k^{teach} = P_t$; if $i \in V^d$, then $Z = V^d$ and $k^{teach} = V^d_t$; and if $i \in V^v$, then $Z = V^v$ and $k^{teach} = V^v_t$. $\sigma$ is a constant indicating the shape of the distribution of $q_{i,t}$, set to 0.1 in the current study, and $k_i$ is the reference vector of the $i$-th neural unit in a TPM.

The MTRNN generates a prediction of the next-step visuo-proprioceptive state based on the acquired forward dynamics described later. This prediction can be taken as an activation probability distribution over the TPMs. It is inversely transformed to a visuo-proprioceptive input signal using the same TPMs:

$$k^{out} = \sum_{i \in Z} y_{i,t} \, k_i \qquad (2)$$

where, if $i \in P$, then $Z = P$ and $k^{out} = P_{t+1}$; if $i \in V^d$, then $Z = V^d$ and $k^{out} = V^d_{t+1}$; and if $i \in V^v$, then $Z = V^v$ and $k^{out} = V^v_{t+1}$.
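Both TPM transformations can be sketched directly. The tiny one-dimensional grid used in the example below is a stand-in for a trained map; in the paper the reference vectors come from unsupervised training and the maps are 10 × 10, 6 × 6 and 12 × 12.

```python
import numpy as np

def tpm_encode(k_teach, K, sigma=0.1):
    """Eq. (1): population-code a continuous vector as a normalized
    exponential of negative squared distances to the TPM reference
    vectors (the rows of K)."""
    d2 = np.sum((K - k_teach) ** 2, axis=1)
    e = np.exp(-d2 / sigma)
    return e / e.sum()

def tpm_decode(y, K):
    """Eq. (2): the activation-weighted sum of reference vectors maps
    a population code back to a continuous vector."""
    return y @ K
```

Encoding and then decoding approximately recovers the original value when the map covers the input space densely enough, which is what lets the MTRNN's distribution-like prediction be turned back into joint angles and image values.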

2.2.3. Forward generation of the MTRNN
The MTRNN [25] is a type of continuous-time recurrent

neural network (CTRNN). In the CTRNN, neuronal activations are calculated using the conventional firing-rate model shown in Eq. (3).

$$\tau_i \frac{\mathrm{d}u_{i,t}}{\mathrm{d}t} = -u_{i,t} + \sum_j w_{ij} x_{j,t} \qquad (3)$$

$$y_{i,t} = \begin{cases} \dfrac{\exp(u_{i,t})}{\sum_{j \in Z} \exp(u_{j,t})} & \text{if } i \in Z \\[1.5ex] \dfrac{1}{1 + \exp(-u_{i,t})} & \text{otherwise} \end{cases} \qquad (4)$$

$$x_{i,t+1} = \begin{cases} q_{i,t+1} & \text{if } i \in Z \\ y_{i,t} & \text{otherwise} \end{cases} \qquad (5)$$

where $u_{i,t}$ is the potential of the $i$-th neural unit at time step $t$, $x_{j,t}$ is the neural state of the $j$-th unit, $w_{ij}$ is the synaptic weight from the $j$-th unit to the $i$-th unit, $\tau_i$ is the time constant of the $i$-th unit, $Z$ is $P$, $V^d$ or $V^v$, and $q_{i,t}$ is the input encoded by the TPMs.
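Eqs. (3)–(5) can be put together in one forward step. For brevity, this sketch uses Euler integration with dt = 1 and treats all input-output units as a single softmax group, whereas the model applies the softmax separately within P, Vd and Vv; the interface is our own simplification, not the paper's code.

```python
import numpy as np

def mtrnn_step(u, x, W, tau, io_mask, q_next=None):
    """One forward step of a CTRNN with per-unit time constants.

    u       : unit potentials, updated by Euler-integrating Eq. (3)
    x       : current unit states
    io_mask : boolean mask of input-output units (the set Z)
    q_next  : next TPM-encoded sensory input; if None, the network's
              own prediction is fed back (closed loop), per Eq. (5).
    """
    u = u + (-u + W @ x) / tau                 # Eq. (3), dt = 1
    y = 1.0 / (1.0 + np.exp(-u))               # sigmoid units, Eq. (4)
    e = np.exp(u[io_mask])
    y[io_mask] = e / e.sum()                   # softmax over Z, Eq. (4)
    x_next = y.copy()
    if q_next is not None:
        x_next[io_mask] = q_next               # Eq. (5), open loop
    return u, y, x_next
```

Passing a small tau for some units and a large tau for others yields the fast and slow context groups described next.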

The characteristic of the MTRNN is that it consists of neurons with multiple time-scale dynamics. The MTRNN used in the current study consists of input-output units and non-input-output units, the latter referred to as context units. Context units are divided into two groups based on the value of the time constant τ. The first group consists of fast context units with a small time constant (τ = 2), whereas the second group consists of slow context units with a large time constant (τ = 10). This characteristic enables the MTRNN to articulate complex visuo-proprioceptive sequences into reusable primitive patterns by self-organizing a functional hierarchy during learning. The effect of having multiple time-scale dynamics among the groups can be roughly summarized as follows: neurons with a fast time constant assemble a set of primitives, and neurons with a slow time constant manipulate them in various sequences depending on the IS.

The IS values are set by PFC before forward generation. These values are self-determined through the learning process, under the condition that they take the same value when generating a particular action in the self mode and the corresponding mental imagery in the demonstrator mode. The potential values u of the neural units corresponding to the self-other switch are set to 100 in the self mode, whereas they are set to −100 in the demonstrator mode. The same potential values are used during the learning phase of the MTRNN.

2.2.4. Learning of own and demonstrator's action sequences
In our current study, the TPMs are trained in advance of

the MTRNN training using a conventional unsupervised learning algorithm. The goal of training the MTRNN is to find the optimal values of the connective weights and the IS which minimize the value of E, defined as the learning error. We define the learning error using the Kullback–Leibler divergence as the following equation,

    E = \sum_{t} \sum_{i \in IO} \bar{y}_{i,t} \log \frac{\bar{y}_{i,t}}{y_{i,t}} \qquad (6)

    where IO represents the input-output units, \bar{y}_{i,t} is the desired activation value of the output neuron at time t, and y_{i,t} is the activation value of the output neuron with the current connective weights and IS. As a training method, we use the general Back-Propagation Through Time (BPTT) algorithm [42]. Using the BPTT algorithm, the network can approach its optimum for all given teaching sequences by updating the connective weights in the opposite direction of the gradient \partial E / \partial w_{ij}.
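    The learning error of Eq. (6) can be computed directly; a minimal sketch is shown below. The function name is ours, and the small epsilon added for numerical safety is our addition, not part of the paper's formulation.

    ```python
    import numpy as np

    def learning_error(y_target, y_output, eps=1e-8):
        """KL-divergence-style learning error of Eq. (6), summed over
        time steps (rows) and input-output units (columns).
        `eps` guards against log(0) and is our addition."""
        y_t = np.asarray(y_target, dtype=float)
        y_o = np.asarray(y_output, dtype=float)
        return float(np.sum(y_t * np.log((y_t + eps) / (y_o + eps))))

    # The error vanishes when the network reproduces the teacher exactly,
    # and is positive otherwise (rows here sum to one):
    teacher = np.array([[0.2, 0.8], [0.5, 0.5]])
    output = np.array([[0.5, 0.5], [0.5, 0.5]])
    ```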

    w_{ij}(n+1) = w_{ij}(n) - \alpha \frac{\partial E}{\partial w_{ij}} \qquad (7)

    where \alpha is the learning rate constant and n is an index representing the iteration epoch in the learning process. Note that the training data set contains both own action sequences and demonstrator's action sequences. However, there is no teaching signal for the proprioceptive sequence when the neural network is trained with a demonstrator's action sequence, so the error is calculated only for the direction of the camera (V^d_t) and the visual image (V^v_t) for demonstrator's action sequences.
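    The weight update of Eq. (7) and the masking of the missing proprioceptive teaching signal can be sketched as below. This is an illustrative sketch, not the paper's code; the function names and the index-based masking are our assumptions about one reasonable way to zero out the unsupervised components.

    ```python
    import numpy as np

    def weight_update(w, grad, alpha=0.2):
        """Eq. (7): one gradient-descent step on the connective weights."""
        return w - alpha * grad

    def output_error(y_target, y_output, proprio_idx, demonstrator_mode):
        """Error signal at the output layer. For a demonstrator sequence
        there is no proprioceptive teaching signal, so those components
        are zeroed; only camera-direction and vision errors propagate."""
        err = np.asarray(y_output, dtype=float) - np.asarray(y_target, dtype=float)
        if demonstrator_mode:
            err[..., proprio_idx] = 0.0
        return err
    ```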

    \frac{\partial E}{\partial w_{ij}} is given by

    \frac{\partial E}{\partial w_{ij}} = \sum_{t} \frac{1}{\tau_i} \frac{\partial E}{\partial u_{i,t}} \, y_{j,t-1} \qquad (8)

    and \frac{\partial E}{\partial u_{i,t}} is recursively calculated from the following recurrence formula

    \frac{\partial E}{\partial u_{i,t}} =
    \begin{cases}
    y_{i,t} - \bar{y}_{i,t} + \left(1 - \dfrac{1}{\tau_i}\right) \dfrac{\partial E}{\partial u_{i,t+1}} & \text{if } i \in IO \\[2ex]
    \displaystyle\sum_{k \in N} \dfrac{\partial E}{\partial u_{k,t+1}} \left[ \delta_{ik} \left(1 - \dfrac{1}{\tau_i}\right) + \dfrac{1}{\tau_k} w_{ki} f'(u_{i,t}) \right] & \text{if } i \notin IO
    \end{cases} \qquad (9)

    where f'(\cdot) is the derivative of the sigmoidal function and \delta_{ik} is the Kronecker delta.


    Fig. 9. Task design: the robot was trained for three task trajectories of (R)(K), (U)(K) and (U)(L), each starting from and returning to the home position. Then it was tested whether the robot can generate the (R)(L) task trajectory by observing its corresponding action by the demonstrator.

    In addition, an update of the initial context unit values corresponding to each action sequence (IS) is obtained simultaneously by utilizing the delta error \partial E / \partial u_{i,0} back-propagated through the time steps to the context units at the initial step.

    c_{i,0}(n+1) = c_{i,0}(n) - \alpha \frac{\partial E}{\partial u_{i,0}} \qquad (10)

    where c_{i,0} is the potential of the i-th neural unit. As described in the previous section, the initial states that correspond to the same action program should take the same value in either the self mode or the demonstrator mode in the proposed bind learning scheme. To satisfy this condition, the initial state values that correspond to the same action program are averaged over the training data set after applying Eq. (10). In summary, the learning proceeds through dense interaction between the top-down and bottom-up pathways, where imagery of visuo-proprioceptive (VP) sequences generated in the top-down pathway by means of the current weights and IS values iteratively encounters the actual VP sequences of the bottom-up pathway. As the generated error is propagated through all neurons, including the fast and slow parts, the synaptic weights in the whole network can be updated.
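    The combination of the IS update (Eq. (10)) and the binding-by-averaging step can be sketched as follows. This is our illustrative reading of the scheme, not the authors' code; the function name and the grouping structure are assumptions.

    ```python
    import numpy as np

    def update_and_bind_initial_states(c0, grads, groups, alpha=0.2):
        """Eq. (10) followed by the binding step: update each sequence's
        initial context state with its back-propagated delta error, then
        average the initial states of sequences that share the same
        action program (e.g. the self and demonstrator versions of the
        same action), enforcing a unified IS value.

        c0     : (S, D) initial states for S training sequences
        grads  : (S, D) delta errors dE/du_{i,0}, one row per sequence
        groups : list of index lists, one per shared action program
        """
        c0 = c0 - alpha * grads
        for idx in groups:
            c0[idx] = c0[idx].mean(axis=0)  # bind to a common value
        return c0
    ```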

    2.2.5. Goal-directed planning

    When the goal state is given by an experimenter in terms of a distal visuo-proprioceptive state (P_goal, V^d_goal, V^v_goal), the IS value that generates VP sequences including the goal state in their distal steps is searched for. The search is conducted by means of the BPTT algorithm [42] but without modulating the synaptic weights. First, the MTRNN generates an imagery VP sequence by closed-loop operation with temporarily set IS values. At this time, the neuronal activations of the context neurons corresponding to the self-other switch are set to the self mode. Then, the Euclidean distance between the given goal state and the states generated in the imagery VP sequence is calculated for all time steps in the sequence, and the particular time step where the calculated Euclidean distance becomes minimum is marked. The error is back-propagated from this marked step to the initial step in the sequence, and the IS values are updated using Eq. (10). The search is started by setting the initial IS values randomly and is iterated until the error is minimized. Then the plan is enacted by the robot in the physical environment by activating the network's forward dynamics with the IS set to the values obtained in the search.
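    The structure of this search can be sketched as below. This is a simplified sketch under stated assumptions: `forward(is_vec)` is a hypothetical stand-in for the MTRNN's closed-loop imagery generation, and we approximate the gradient numerically, whereas the paper obtains it by BPTT through the network.

    ```python
    import numpy as np

    def plan_is(forward, goal, is_init, steps=200, alpha=0.2):
        """Goal-directed IS search: weights stay fixed; only the IS
        vector is updated to minimize the distance between the goal
        state and the closest ("marked") state of the closed-loop
        imagery sequence returned by forward(is_vec) -> (T, D) array."""
        is_vec = is_init.astype(float).copy()
        eps = 1e-4
        for _ in range(steps):
            seq = forward(is_vec)
            # mark the step closest to the goal state
            t_star = int(np.argmin(np.linalg.norm(seq - goal, axis=1)))

            def loss(v):
                return float(np.linalg.norm(forward(v)[t_star] - goal) ** 2)

            # finite-difference gradient (the paper uses BPTT instead)
            base = loss(is_vec)
            g = np.zeros_like(is_vec)
            for i in range(is_vec.size):
                v = is_vec.copy()
                v[i] += eps
                g[i] = (loss(v) - base) / eps
            is_vec -= alpha * g
        return is_vec
    ```

    With a toy forward model whose final state equals the IS vector, the search recovers the IS that reaches a given goal.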

    3. Experiment

    3.1. Task design

    The experiment was carried out using a small humanoid robot named HOAP-3. The robot was fixed to a chair, and a workbench was set up in front of the robot. A movable rectangular solid object, painted half blue and half red, was placed on the workbench. The neck joints are controlled by a PID controller, which is programmed to track a red-colored object so that it is centered in the retinal image. Inputs to the neural model are the angles of the arm joints P_t (8-dimensional vector) and neck joints V^d_t (2-dimensional vector) and the visual perception V^v_t (16 × 12-dimensional vector). Each pixel component of the visual perception vector can take one of four possible values (0, 0.375, 0.625, 0.875) depending on the color (others, green, blue, red) of the corresponding retinal cell, respectively.
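    The color-to-value encoding of the visual perception vector can be written out directly. The helper below is our assumption about how such an encoding could be implemented (the function and dictionary names are ours); only the four values and their color assignments come from the text.

    ```python
    # Map each retinal cell's detected color category to the scalar
    # value given in the text: others -> 0, green -> 0.375,
    # blue -> 0.625, red -> 0.875.
    COLOR_VALUE = {"others": 0.0, "green": 0.375, "blue": 0.625, "red": 0.875}

    def encode_retina(colors):
        """Turn a list of per-cell color labels into the visual
        input vector component values."""
        return [COLOR_VALUE[c] for c in colors]
    ```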

    The compositional actions used in the current experiment are shown in Fig. 9. As shown in this figure, the compositional actions assumed in the experimental task had a temporal structure which could be described by a path of state transitions with branching. The first half of the action is moving the object to the right (R) or moving the object upward (U), whereas the last half is knocking the standing object over with the left arm (K) or moving the object to the left (L). In total, as combinations of the first and last halves, there are four actions, i.e. (RK), (RL), (UK) and (UL).

    In order to obtain a teaching signal for self actions, the human tutor guided both hands of the robot along the trajectory of the action without providing explicit cues showing segmentation points for the primitives. In these actions, the robot started to move from the same home position, so that the MTRNN cannot utilize initial posture differences to generate different action programs. As the robot's hands were guided along the trajectory, the joint angles at each step were recorded, and their sequences were used as teaching sequences. For each action, the robot was tutored 3 times, with the initial object position varied among 2 cm left of the original position, the original position, and 2 cm right of the original position. Meanwhile, others' action sequences were obtained by showing the corresponding demonstrator's actions to the robot. The sensory information during this observation of the demonstrator's actions consists of the vision inputs and the head direction of the robot. For each action by the demonstrator, the action sequence was recorded 3 times by changing the relative


    Fig. 10. These pictures show the appearance of the compositional actions performed by the robot or human demonstrator. In (a), the robot moves the object upward using both hands, while the demonstrator performed the corresponding action using only one hand. In (b), the robot or demonstrator knocks the object over from its left-hand side to its right-hand side.

    Fig. 11. Time development of the three actions generated. Each pair of graphs corresponds to one of the three actions described in Fig. 9. The upper graphs show the proprioceptive state (PS) for the four joint angles in the right arm. The lower graphs show the activation history of the IS neurons.

    position between the demonstrator and the robot: located in front, 15° to the right, and 15° to the left. In the current setup of the experiment, the demonstrator demonstrated the action sequences using only one hand. Therefore the appearances of the same actions by the demonstrator and the robot could be significantly different, as shown in Fig. 10.

    First, we examine how the robot can imitate novel actions shown by the demonstrator by utilizing prior-learned behavior skills. For this purpose, the robot was trained for three actions, (RK), (UK) and (UL), by means of the bind training of the observations of the demonstrator's actions and the robot's own. After the training, another observation of the demonstrator performing the action sequence (RL) was added to the training data set, and the corresponding initial state values were calculated. In the test generation, it is examined whether the robot can successfully generate the novel own action sequence (RL) by using the initial state obtained by learning from the observation of its corresponding action by the demonstrator.

    Next, we examine how plans for actions can be generated in the proposed scheme. The test was conducted using the network trained in the previous experiment. In the planning, the action program should be made to achieve the goal state of the object being located on the left-hand side, from the initial state of the object standing on the workbench with the robot's posture in its home position. Note that this goal state can be achieved by generating either action sequence (RL) or (UL). In this experiment, action plans were generated by searching for the initial state from which the sensory-forward model can bring the best match between the specified goal state and the predicted one at the distal step, as described in the previous section.

    3.2. Results

    3.2.1. Regeneration of the trained actions

    All three TPMs were pre-adapted before the training of the MTRNN, utilizing the sensory patterns of both self and others' action sequences. For the training of the MTRNN, BPTT was iterated for 12000 epochs with randomly set initial weights, which took about 5 h in the current setting. The mean square error (MSE), which is the average square error per input-output neuron per step over all teaching sequences, converged to 0.000733.

    After the training, the MTRNN was tested on regenerating the trained actions with the robot by setting the IS with the corresponding values determined through the learning process. Each action was tested with the object position varied among the three originally trained positions and two novel positions, 1 cm left and right of the center. Each trial was repeated three times. The results show that the robot could regenerate the trained actions with a success rate of 100%. Fig. 11 shows the time development of the actions in which the object is placed in the original position. The upper graphs show the encoder readings of four joint angles of the right arm, normalized to the range between 0.0 and 1.0. The lower graphs show the activation histories of the IS neurons as red and blue lines, whose initial values were self-determined in the learning. As shown in this figure, the initial values of the IS neurons differ depending on the action sequence. In order to analyze the relationship between the IS and the corresponding generated action sequences, we constructed the initial state map, which shows the mapping from the two-dimensional initial state to the resulting imagery VP sequences. The result can be seen in Fig. 12. We compared a


    Fig. 12. The initial state map obtained after the first training of the MTRNN. Each grid cell is labeled according to the generated combination of action primitives, where (R), (L), (K) and (U) denote the classified primitives and (X) denotes an unclassifiable distorted one.

    mental image, which is generated from each initial state in the grid, with the trained action primitive patterns, and classified the mental image manually. Each grid cell shows a label representing the generated combination of action primitives, where (R), (L), (K) and (U) denote classified primitives and (X) denotes unclassifiable distorted ones. It can be seen that three of the trained combinations of the primitives, namely (UL), (UK) and (RK), appear as large clusters, while the distorted ones appear in small regions.

    3.2.2. Generation of novel action by imitation

    After the first training, an observation of the demonstrator performing the novel action sequence (RL) was added to the training data set. Then BPTT learning was iterated for 3000 epochs on the synaptic weights obtained through the first training, which took about 75 min. As a result, the MSE converged to 0.000591 with the new data set including the observation of the novel action sequence. The imitational action sequence was generated by using the initial state determined for the novel action sequence with the self mode. The generated action sequence was tested with the object position varied under the same conditions as the regeneration of the trained action sequences. The result of this experiment shows that the robot could generate the imitational action sequence with a success rate of 100%. The time development of the imitational action sequence in which the object is placed in the original position is shown in Fig. 13.

    Looking at Figs. 11 and 13, we can see that the IS values, shown by the initial values of the red and blue lines, differ depending on the combination of action primitives. When the first half of the action sequence is (U), the blue line in the lower graph starts from a lower value, whereas it starts from a higher value when the first half is (R). Likewise, the red line in the lower graph starts from a lower value when the last half of the action sequence is (L), and from a higher value when the last half is (K). This observation suggests that compositional action primitive sequences are successfully embedded in the slow dynamics of the IS neurons. Fig. 14 shows the initial state map obtained after the additional training. Compared with Fig. 12, a novel cluster corresponding to the action sequence (RL) appears at the bottom left of this map, while the positions of the clusters corresponding to the previously learned action sequences (UL), (UK) and (RK) change only slightly. This result is interpreted to mean that a novel own action program that corresponds to the observation of the demonstrator's action can be generated by utilizing the prior-learned structures of action primitives.

    Fig. 13. Time development of the imitational action. The upper graph shows the proprioceptive state (PS) for the four joint angles in the right arm. The lower graph shows the activation history of the IS neurons.

    Fig. 14. Initial state map generated after the additional training of the novel action sequence (RL).

    3.2.3. Planning of action for a given goal state

    The action plans to achieve the specified goal state as described in the previous section were generated by searching for the optimal IS values. The search calculation was iterated for 2000 epochs with the learning rate α = 0.2, which took about 50 min in the current setting. As expected, two distinct action plans, (RL) and (UL), were generated in terms of imagery VP sequences, depending on the initial setting of the IS values in the search process. These two typical VP sequences are shown in Fig. 15 for RL and Fig. 16 for UL. In these figures, imagery sequences generated in both the self mode and the others mode are shown. These figures indicate that two possible imageries for the robot's own action, as well as for the other's, in achieving the specified goal are successfully generated by means of planning. It was also confirmed that the robot can successfully execute those plans in its physical behavior by setting the IS with the values obtained in the search.


    Fig. 15. The visual imagery corresponding to RL generated as a result of planning. In (a), visual imagery for the robot's own action is shown in the lower row and the actual execution of the plan in the upper row. In (b), visual imagery for the demonstrator's action is shown in the lower row and the actual demonstrator's action shown to the robot in the upper row.

    Fig. 16. The visual imagery corresponding to UL generated as a result of planning. See the caption of Fig. 15.

    To carry out a more detailed analysis, we constructed a map showing the mapping from the initial IS (IIS) values, which are set as the IS at the onset of the search, to the action sequences generated from the converged IS values. The IIS values were varied in the range between 0.0 and 1.0 at 0.1 intervals. The result is shown in Fig. 17. It can be seen that the search process can generate correct action plans of either (RL) or (UL) from most IIS points in the space. Also, we can see two clusters, (UL) in the upper part and (RL) in the lower part of the space, that overlap largely with the corresponding clusters shown in Fig. 14. This analysis indicates that goal-directed planning can be performed quite robustly.

    4. Discussion

    The current paper presented a neuro-dynamic systems approach to the problem of generating and understanding actions of one's own and of demonstrators, by introducing our synthetic robot modeling study. In the following, we give our qualitative interpretation and discussion of the obtained experimental results, focusing on the problems of generalization in imitation learning and the creation of novel actions. We will also discuss future related studies.

    4.1. Generalization in imitation learning

    It is considered that imitation learning requires at least two classes of generalization [37-39]. One is generalization over sensory-motor trajectory variances, such as those caused by differences in the size, position and shape of the object to be manipulated, or by view differences in observing the demonstrator. The current study showed that such trajectory-level variances related to object positions and views of the demonstrator can be resolved by introducing unified intentional states, in terms of the initial internal state values, over such variances by means of the proposed binding learning scheme. Such intentional states might be hosted by mirror neurons [28] in PMv.


    Fig. 17. Initial IS map. This map shows the mapping from the initial IS (IIS) value to the imagery sequences generated with the converged IS value in the search. Each grid cell indicates a label of the imagery sequences.

    The other class of generalization is on the cognitive level, which concerns a more qualitative understanding of actions by oneself and demonstrators that does not necessarily depend on exact action trajectories. Understanding actions requires understanding multiple possible ways of achieving the same goal with abstraction. This requires the decomposition of the original visuo-proprioceptive trajectories into reusable primitives and their recombination into multiple pathways achieving the same goal. Our experimental results showed that such abstraction can be realized by learning in multiple time-scale neuronal dynamics, and that the action planning scheme of searching for the initial state can construct mental imagery of multiple pathways achieving the same goal.

    On this aspect, some may claim that the same function could be accomplished by hybrid system approaches [7,8] that interface a symbol level, constructed by some machine learning scheme, with analog representations at the sensory-motor level using neural nets or fuzzy systems. We, however, argue that such approaches may suffer from the symbol grounding problem [9], because the interface between those levels can be just arbitrary. On the contrary, in our proposed dynamical systems approach, there is no explicit separation of the symbol level and the sensory-motor level, because the whole cognitive-behavioral function is realized by one body of neuro-dynamical system. When seamless interactions are iterated between the top-down goal-directed intentionality and the bottom-up sensation of reality in the adopted neuro-dynamical system through repeated learning, cognitive constructs that are compositional, yet situated in sensory-motor reality, should emerge upon the self-organized dynamical structures.

    4.2. Creation of novel actions

    One interesting question to ask might be how a diversity of novel action programs can be generated by utilizing and modifying prior-learned action schemes. Our experiment showed that after the robot observes the demonstrator's action, the corresponding own action, which is novel for the robot, can be successfully generated by adequately mapping the observed action to its own by utilizing the prior-learned own primitives. The initial state mapping analysis shown in Fig. 14 indicated that relatively large clusters appear for the IS of the novel combination of action primitives, as well as for those of the other trained combinations, but that only small regions appear for distorted combinations of primitives. It was, however, shown in our preliminary experiment that such a novel combination cannot be generated unless the observation of the corresponding demonstrator's action is bound in the second learning phase. On the contrary, our prior study [24] using the MTRNN indicated that diverse novel combinations of action primitives, including distorted ones, can be generated by introducing chaotic dynamics in the slow dynamics part of the network. These results suggest that the diversity of novel actions created by combining prior-learned primitives can be drastically increased by the introduction of chaos. It should, however, also be noted that the introduction of chaos may lead to unstable situations where useless distorted actions are more likely to be generated. It is speculated that creativity in generating novel actions could face a dilemma between flexibility and stability.

    4.3. Future studies

    Future studies should extend the vision system for the purpose of scaling the task complexity. In the current experiments, the variance of the relative position between the robot and the demonstrator was limited to a narrow range of ±15°. Therefore, the visual sensation of the demonstrator can be easily mapped to that of the imitator by the nonlinear function of the adopted network. It is, however, not certain that the same scheme can be applied to the case of a larger range of view variances. In such a case, vision processing known as mental rotation [43] might be required.

    In the course of scaling the task complexity, mechanisms of visual attention and saccading might be required to deal with the manipulation of more than one object. In our current setting, because it was assumed that there is only a single object to be manipulated, no attention switching mechanism was required. It is left for future studies to investigate how a scheme of attention switching accompanied by saccading can be attained through learning.

    Another line of interesting future investigation might be the comparison between the stochastic learning approach using Bayesian formulations [44] and the deterministic dynamical systems approach. Recently, fast progress in stochastic computational models such as the hidden Markov model (HMM) has attracted much attention in many research areas. The action imitation task has also been achieved by using the HMM [45]. One interesting recent result by Namikawa and Tani [46] is that a forward model realized by a CTRNN can learn the branching probability of primitives as embedded in its self-organized chaos. Such a capability of mimicking stochastic sequences with a deterministic dynamic neural network model should be qualitatively examined by comparison with various stochastic modeling schemes.

    5. Conclusion

    The current paper presented a synthetic robot modeling study which accounts for how actions by others can be recognized and imitated with compositionality, in terms of the mirror neuron system, by utilizing a dynamic neural network model, the MTRNN. The method was evaluated in experiments using a small humanoid robot implemented with the model. The results showed that (1) all of the tutored actions can be regenerated robustly by generalizing over the variance of the position of the target object, (2) novel actions demonstrated by the demonstrator can be imitated by means of combining prior-acquired primitives into the corresponding self-generated actions, and (3) when the goal state is given in terms of the robot's own visuo-proprioceptive state, multiple action plans which achieve the given goal state can be found robustly through iterative search. It is finally concluded that, by having seamless and dense interactions between the top-down goal-directed intentionality and the bottom-up perception of reality through repeated learning of one's own experiences, cognitive constructs that are compositional, yet situated in sensory-motor reality, can emerge as the result of the self-organization of adequate neuro-dynamical structures.

    Acknowledgments

    This work was supported in part by funds from the Grant-in-Aid for Scientific Research in Priority Areas "Emergence of Adaptive Motor Function through Interaction among the Body, Brain and Environment: A Constructive Approach to the Understanding of Mobiligence" from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan.

    References

    [1] J. Tani, R. Nishimoto, R. Paine, Achieving organic compositionality through self-organization: reviews on brain-inspired robotics experiments, Neural Networks 21 (2008) 584-603.

    [2] G. Evans, Semantic theory and tacit knowledge, in: S. Holzman, C. Leich (Eds.), Wittgenstein: To Follow a Rule, Routledge and Kegan Paul, London, 1981, pp. 118-137.

    [3] M. Arbib, Perceptual structures and distributed motor control, in: Handbook of Physiology: The Nervous System, II, Motor Control, MIT Press, Cambridge, MA, 1981, pp. 1448-1480.

    [4] M. Mataric, Integration of representation into goal-driven behavior-based robot, IEEE Transactions on Robotics and Automation 8 (3) (1992) 304-312.

    [5] S. Harnad, Grounding symbolic capacity in robotic capacity, in: L. Steels, R. Brooks (Eds.), The Artificial Life Route to Artificial Intelligence: Building Situated Embodied Agents, Lawrence Erlbaum, New Haven, 1993.

    [6] A. Cangelosi, A. Greco, S. Harnad, From robotic toil to symbolic theft: grounding transfer from entry-level to higher-level categories, Connection Science 12 (2) (2000) 143-162.

    [7] R. Sun, Connectionist Implementationalism and Hybrid Systems, Nature Publishing Group MacMillan, 2002.

    [8] P. Doherty, The WITAS integrated software system architecture, Linkoping Electronic Articles in Computer and Information Science 4 (17) (1999).

    [9] S. Harnad, The symbol grounding problem, Physica D 42 (1990) 335-346.

    [10] J. Tani, Model-based learning for mobile robot navigation from the dynamical systems perspective, IEEE Transactions on Systems, Man and Cybernetics: Part B 26 (3) (1996) 421-436.

    [11] J. Tani, An interpretation of the self from the dynamical systems perspective: a constructivist approach, Journal of Consciousness Studies 5 (5-6) (1998) 516-542.

    [12] J. Tani, Learning to generate articulated behavior through the bottom-up and the top-down interaction process, Neural Networks 16 (2003) 11-23.

    [13] B.-l. Hao, Elementary Symbolic Dynamics and Chaos in Dissipative Systems, World Scientific, Singapore, 1989.

    [14] J. Crutchfield, Inferring statistical complexity, Physical Review Letters 63 (1989) 105-108.

    [15] J. Elman, Finding structure in time, Cognitive Science 14 (1990) 179-211.

    [16] J. Pollack, The induction of dynamical recognizers, Machine Learning 7 (1991) 227-252.

    [17] J. Tani, N. Fukumura, Embedding a grammatical description in deterministic chaos: an experiment in recurrent neural learning, Biological Cybernetics 72 (1995) 365-370.

    [18] K. Doya, S. Yoshizawa, Memorizing oscillatory patterns in the analog neuron network, in: Proc. of 1989 Int. Joint Conf. on Neural Networks, Washington, DC, vol. 1, 1989, pp. 27-32.

    [19] R.J. Williams, D. Zipser, A learning algorithm for continually running fully recurrent neural networks, Neural Computation 1 (2) (1989) 270-280.

    [20] D.M. Wolpert, M. Kawato, Multiple paired forward and inverse models for motor control, Neural Networks 11 (7-8) (1998) 1317-1329.

    [21] J. Tani, S. Nolfi, Learning to perceive the world as articulated: an approach for hierarchical learning in sensory-motor systems, Neural Networks 12 (1999) 1131-1141.

    [22] J. Tani, M. Ito, Y. Sugita, Self-organization of distributedly represented multiple behavior schemata in a mirror system: reviews of robot experiments using RNNPB, Neural Networks 17 (2004) 1273-1289.

    [23] R. Nishimoto, J. Namikawa, J. Tani, Learning multiple goal-directed actions through self-organization of a dynamic neural network model: a humanoid robot experiment, Adaptive Behavior 16 (2-3) (2008) 166-181.

    [24] H. Arie, T. Endo, T. Arakaki, S. Sugano, J. Tani, Creating novel goal-directed actions at criticality: a neuro-robotic experiment, New Mathematics and Natural Computation (2009).

    [25] Y. Yamashita, J. Tani, Emergence of functional hierarchy in a multiple timescale neural network model: a humanoid robot experiment, PLoS Computational Biology 4 (11) (2008).

    [26] H. Mushiake, M. Inase, J. Tanji, Neuronal activity in the primate premotor, supplementary, and precentral motor cortex during visually guided and internally determined sequential movements, Journal of Neurophysiology 66 (3) (1991) 705-718.

    [27] J. Fuster, The Prefrontal Cortex, Raven Press, New York, 1989.

    [28] G. Rizzolatti, L. Fadiga, V. Gallese, L. Fogassi, Premotor cortex and the recognition of motor actions, Cognitive Brain Research 3 (2) (1996) 131-141.

    [29] A.H. Fagg, M.A. Arbib, Modeling parietal-premotor interactions in primate control of grasping, Neural Networks 11 (1998) 1277-1303.

    [30] E. Oztop, D. Wolpert, M. Kawato, Mental state inference using visual control parameters, Cognitive Brain Research 22 (2005) 129-151.

    [31] C. Colby, J. Duhamel, M. Goldberg, Ventral intraparietal area of the macaque: anatomic location and visual response properties, Journal of Neurophysiology 69 (1993) 902-914.

    [32] E. Eskandar, J. Assad, Dissociation of visual, motor and predictive signals in parietal cortex during visual guidance, Nature Neuroscience 2 (1999) 88-93.

    [33] H. Ehrsson, A. Fagergren, R. Johansson, H. Forssberg, Evidence for the involvement of the posterior parietal cortex in coordination of fingertip forces for grasp stability in manipulation, Journal of Neurophysiology 90 (2003) 2978-2986.

    [34] S. Imazu, T. Sugio, S. Tanaka, T. Inui, Differences between actual and imagined usage of chopsticks: an fMRI study, Cortex 43 (3) (2007) 301-307.

    [35] K. Ogawa, T. Inui, Lateralization of the posterior parietal cortex for internal monitoring of self- versus externally generated movements, Journal of Cognitive Neuroscience 19 (2007) 1827-1835.

    [36] A. Billard, M.J. Mataric, Learning human arm movements by imitation: evaluation of a biologically inspired connectionist architecture, Robotics and Autonomous Systems 37 (2-3) (2001) 145-160.

    [37] Y. Kuniyoshi, M. Inaba, H. Inoue, Learning by watching: extracting reusable task knowledge from visual observation of human performance, IEEE Transactions on Robotics and Automation 10 (6) (1994) 799-822.

    [38] S. Calinon, F. Guenter, A. Billard, On learning, representing and generalizing a task in a humanoid robot, IEEE Transactions on Systems, Man and Cybernetics: Part B 37 (2) (2007) (special issue on robot learning by observation, demonstration and imitation).

    [39] K. Dautenhahn, C.L. Nehaniv (Eds.), Imitation in Animals and Artifacts, MIT Press, Cambridge, MA, 2002.

    [40] A. Murata, H. Ishida, Representation of bodily self in the multimodal parieto-premotor network, in: S. Funahashi (Ed.), Representation and Brain, Springer, 2007, pp. 151-176.

    [41] J. Saarinen, T. Kohonen, Self-organized formation of colour maps in a model cortex, Perception 14 (6) (1985) 711-719.

    [42] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning Internal Representations by Error Propagation, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, MIT Press, Cambridge, 1986.

    [43] R. Shepard, L. Cooper, Mental Images and their Transformations, MIT Press, Cambridge, MA, 1982.

    [44] R.P.N. Rao, A.P. Shon, A.N. Meltzoff, A Bayesian model of imitation in infants and robots, in: K. Dautenhahn, C. Nehaniv (Eds.), Imitation and Social Learning in Robots, Humans, and Animals: Behavioural, Social and Communicative Dimensions, Cambridge University Press, Cambridge, 2004.

    [45] T. Inamura, Y. Nakamura, I. Toshima, Embodied symbol emergence based on mimesis theory, International Journal of Robotics Research 23 (4) (2004) 363-377.

    [46] J. Namikawa, J. Tani, A model for learning to segment temporal sequences, utilizing a mixture of RNN experts together with adaptive variance, Neural Networks 21 (2008) 1466-1475.

    Hiroaki Arie received the B.S. and M.S. degrees from Waseda University, Tokyo, Japan, in 2003 and 2005, respectively. His interests include intelligence models for self-developing autonomous robots and embodied artificial intelligence. He is currently a research associate at the Brain Science Institute, RIKEN.

    Takafumi Arakaki received the B.S. degree from Waseda University, Tokyo, Japan, in 2009. He is interested in the theoretical understanding of dynamical properties and biological implementations of cognitive systems. He is currently a master's student at Waseda University.

    Shigeki Sugano received the B.S., M.S., and Dr. of Engineering degrees in mechanical engineering from Waseda University in 1981, 1983, and 1989, respectively. From 1987 to 1991, he was a Research Associate at Waseda University. Since 1991, he has been a faculty member in the Department of Mechanical Engineering, Waseda University, where he is currently a Professor. From 1993 to 1994, he was a Visiting Scholar in the Mechanical Engineering Department, Stanford University. Since 2001, he has been a Director of the Waseda WABOT-HOUSE Laboratory. He is a member of the Humanoid Robotics Institute of Waseda University. Since 2001, he has been the President of the Japan Association for Automation Advancement.

    His research interests include anthropomorphic robots, dexterous manipulators, and human–robot communication. He received the Technical Innovation Award from the Robotics Society of Japan for the development of the Waseda piano-playing robot WABOT-2 in 1991, the JSME Medal for Outstanding Paper from the Japan Society of Mechanical Engineers in 2000, the JSME Fellow Award in 2006, and the IEEE Fellow Award in 2007.

    He serves as the Secretary of the IEEE Robotics & Automation Society and as a Co-Chair of the IEEE RAS Technical Committee on Humanoid Robotics. He served on the IEEE RAS Conference Board as Meetings Chair from 1997 to 2005. He is an Associate Editor of the International Journal of Advanced Robotics.

    He served as the General Chair of the 2003 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM 2003) and as the General Co-Chair of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2006).

    Jun Tani received the B.S. degree in mechanical engineering from Waseda University, dual M.S. degrees in electrical engineering and mechanical engineering from the University of Michigan, and the Dr. Eng. degree from Sophia University. He started his research career at the Sony Computer Science Laboratory in 1990.

    Since 2000, he has been a team leader in the Laboratory for Behavior and Dynamic Cognition, Brain Science Institute, RIKEN, Tokyo. He was also a visiting associate professor at the University of Tokyo from 1997 to 2002. He is interested in neuroscience, psychology, phenomenology, complex adaptive systems, and robotics.