


Online Learning of an Open-Ended Skill Library for Collaborative Tasks

Dorothea Koert1, Susanne Trick1, Marco Ewerton1, Michael Lutter1, and Jan Peters1,3

Abstract— Intelligent robotic assistants can potentially improve the quality of life for elderly people and help them maintain their independence. However, the number of different and personalized tasks renders pre-programming of such assistive robots prohibitively difficult. Instead, to cope with a continuous and open-ended stream of cooperative tasks, new collaborative skills need to be continuously learned and updated from demonstrations. To this end, we introduce an online learning method for a skill library of collaborative tasks that employs an incremental mixture model of probabilistic interaction primitives. This model chooses a corresponding robot response to a human movement, where the human intention is extracted from previously demonstrated movements. Unlike existing batch methods of movement primitives for human-robot interaction, our approach builds a library of skills online, in an open-ended fashion, and updates existing skills using new demonstrations. The resulting approach was evaluated both on a simple benchmark task and in an assistive human-robot collaboration scenario with a 7DoF robot arm.

I. INTRODUCTION

The expected demographic change is a major challenge for society, as a significantly growing elderly population will require substantially increased assistance [1]. Intelligent robot assistants could improve quality of life, assist in cooperative tasks, and help the elderly maintain their partial independence while staying in their own homes. However, such cooperative robot assistants need to be able to adapt to individual needs and a multitude of tasks at hand, rendering pre-programming of all possible tasks prohibitively difficult in practice.

An intuitive way for non-expert users to teach personalized skills to the robot is required, for which Learning from Demonstration (LfD) is considered a promising approach [2]. In particular, personal cooperative robots require the ability to learn multiple different tasks and adapt them to varying contexts. Hereby, the robot should be able to learn to distinguish between different human intentions and react with the matching interaction patterns. However, human motions exhibit high variability [3]. To model and react to this variability, a probabilistic approach to cooperative skill learning is needed, one that takes the variations in human motions into account both during demonstrations and at execution time. Additionally, the ability to incorporate

This work is supported by the German Federal Ministry of Education and Research (BMBF) in the project 16SV7984 (KoBo34) and by ERC StG SKILLS4ROBOTS #640554.

1 Intelligent Autonomous Systems, TU Darmstadt, Germany
3 MPI for Intelligent Systems, Tuebingen

{koert,ewerton,lutter,peters}@ias.tu-darmstadt.de, susanne gabriele.trick@stud.tu-darmstadt.de

Fig. 1. Intelligent robot assistants should learn multiple personalized cooperative tasks from a continuous and open-ended stream of new demonstrations. To this end, we propose a novel approach for online and open-ended learning of a mixture model of probabilistic interaction primitives. In particular, our approach updates existing cooperative tasks from new demonstrations and extends the collaborative skill library for new tasks when needed. Hereby, our model chooses a robot response to an observed human motion based on prior demonstrations, while considering variance in the demonstrations as well as the coupling between human and robot motions.

new demonstrations in an online and open-ended fashion is desirable.

In this paper, we propose an online learning method for collaborative skills that employs an incremental mixture model of probabilistic interaction primitives for online learning of collaborative skills from demonstrations. Probabilistic interaction movement primitives (Interaction ProMPs) [4] are used as a representation of cooperative skills that can capture correlations between human and robot movements as well as the inherent variance. While Interaction ProMPs have already been used in scenarios with multiple contexts [5], to the best of the authors' knowledge no method for online and open-ended learning of multiple Interaction ProMPs from demonstrations has been proposed. However, for personalized robot assistants it is crucial to learn new tasks in an open-ended fashion and to continuously update existing cooperative skills with new demonstrations. In particular, in such an open-ended scenario the total number of cooperative tasks cannot be known beforehand and thus needs to be extended during the learning process. In this paper, this problem is tackled with an online learning method to update and extend a library of cooperative skills. This library allows inferring the human intention from previous demonstrations and is used to choose the appropriate robot response to a human motion, or to request more demonstrations in case of high uncertainty. The employed probabilistic movement representation is capable of


representing the abundant variance in demonstrations and can adapt to variation in human motions during execution. The resulting approach is able to distinguish between multiple different interaction tasks, to update existing skills with new demonstrations and, if necessary, to extend the interaction library for new tasks. In contrast to prior work on learning a Mixture of Interaction Primitives [5], our new approach does not rely solely on demonstrations available at the first training time but can integrate new demonstrations and tasks into the collaborative skill library over multiple training sessions.

The rest of this paper is organized as follows: First, we discuss related work in Section II. Next, in Section III, we provide a short overview of the existing approach of batch learning for a Mixture of Interaction Primitives and afterwards introduce our novel approach for online open-ended learning of a Mixture of Interaction Primitives. In Section IV, we evaluate this new approach on 2D trajectory data and in a collaborative scenario where a robot assists a human in making a salad. Finally, we conclude in Section V and discuss ideas for future work.

II. RELATED WORK

Learning cooperative tasks between humans and robots from demonstration is a popular approach, as it enables non-expert users to teach personalized skills to robots [6], [7], [8], [9]. When learning from demonstrations, the concept of movement primitives offers a lower-dimensional representation of trajectories [10], [11], [12], [8]. In particular, Probabilistic Interaction Movement Primitives (Interaction ProMPs) [13], [4], [9] offer a probabilistic representation to model inherent correlations in the movements of two actors, such as human and robot, from coupled demonstrated trajectories.

However, to achieve a personalized cooperative robot, it is desirable to learn multiple different cooperative tasks and decide on their activation depending on the context or on the human intention [14], [15], [16]. To this end, an approach that deploys Gaussian Mixture Models (GMMs) and Expectation Maximization to learn multiple Interaction ProMPs from unlabeled demonstrations has been introduced [5]. This approach considers batch data, i.e., it assumes the availability of all data points during training. This limits its application to settings where the number of tasks does not change after training and no new demonstration trajectories need to be integrated. Moreover, such batch learning prevents scalability, as the computation time and memory requirements become infeasible for large skill libraries or datasets [17]. Various approaches outside the human-robot interaction (HRI) scope have addressed these problems.

Initially, the machine learning community proposed incremental learning approaches for Gaussian Mixture Models. Some approaches propose updating a GMM with complete new model component datasets [18] or assume the incoming data points to be time-coherent [19]. Incremental Gaussian Mixture Model learning introduced a way to continuously learn a GMM from an incoming data stream while not fixing the total number of components beforehand [20], [21]. Another, two-level approach introduces methods for splitting and merging GMM components [22].

Updating robotic movement representations online from new demonstrations has also been used for incremental learning of extensions of GMMs for gesture imitation [17], for updating Gaussian Processes from demonstrations and thereby reducing the movement variance [23], and for incremental updating of task-parameterized Gaussian Mixture Models [24].

While all these works focus on updating multiple existing movement representations, in a long-term setting adding new tasks is also important. Approaches that also add new components when needed have been proposed in the context of online updating of task-parameterized semi-tied hidden semi-Markov models for manipulation tasks [25], learning full-body movements [26], a bootstrapping cycle for automatic extraction of primitives from complex trajectories [27], and robot table tennis [28]. However, while we draw inspiration from the aforementioned related work, in an HRI scenario it is additionally desirable to consider the inherent coupling and variance of human and robot motions in the demonstrations.

III. INCREMENTAL INTERACTION PRIMITIVES

In this section, we introduce a new approach to continuously learn and update multiple cooperative skills from demonstrations.

Here, demonstrations are given in the form of coupled human and robot trajectories $d_n = \{\tau_n^h, \tau_n^r\}$, where $\tau_n^h$ can, e.g., be a sequence of human wrist positions and $\tau_n^r$ a sequence of robot joint positions. To learn multiple cooperative tasks from these demonstrations in an online, open-ended fashion, we introduce a model that is inspired by the Mixture of Experts architecture [29] and consists of two intertwined parts. On the one hand, we use the human trajectories from the demonstrations to train and update a gating model, which is later used to decide between different cooperative tasks. On the other hand, we train probabilistic models to generate appropriate robot response trajectories. Here, we deploy Interaction ProMPs [4], as they are able to capture the inherent correlation of robot and human motions from the demonstrations. Figure 2 summarizes our approach to training this mixture model in an online and open-ended fashion.

In the following, we briefly describe the previously proposed batch-based, stationary Mixture of Interaction ProMPs in Section III-A. Next, we present our novel approach to learn a mixture model of Probabilistic Movement Primitives in an online and open-ended fashion in Section III-B. Finally, in Section III-C, we show how the obtained library of multiple Interaction ProMPs and the corresponding gating model can be deployed in an HRI scenario.

A. Batch Learning for Mixture of Interaction ProMPs

Probabilistic Movement Primitives (ProMPs) [12] represent demonstrated movements in the form of distributions over trajectories. In order to obtain this distribution, the trajectories are first approximated by a linear combination of basis functions $\phi$. More precisely, a joint position $q_t$ at time step $t$ can be represented as

$$q_t = \phi_t^T w + \epsilon, \quad (1)$$

where $\phi_t$ contains $N$ basis functions $\phi$ evaluated at time step $t$, $w$ is a weight vector, and $\epsilon$ is zero-mean Gaussian noise. The choice of basis functions $\phi$ depends on the type of demonstrated movements.

The weight vector $w$ for each demonstrated trajectory is computed with ridge regression. For multiple recorded demonstrated trajectories, a Gaussian distribution over the weight vectors, $p(w) = \mathcal{N}(w | \mu_w, \Sigma_w)$, can then be obtained with Maximum Likelihood Estimation. Since the number $N$ of basis functions is usually much lower than the number of time steps of recorded trajectories, the distribution $p(w)$ can be seen as a compact representation of the demonstrated movements, which accounts for variability in the execution. In particular, ProMPs offer a representation that allows for operations from probability theory to specify goal or via-points, correlate different degrees of freedom via conditioning, and combine different primitives through blending [12].
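As an illustration of this weight-space representation, the following sketch fits per-demonstration weight vectors with ridge regression and then a Gaussian over them via Maximum Likelihood. It is a minimal reconstruction from Eq. (1), not the authors' code; the basis width, the number of basis functions, and the regularizer are illustrative choices.

    import numpy as np

    def gaussian_basis(T, N, width=0.02):
        # N Gaussian basis functions, evenly spaced in normalized phase, at T time steps
        z = np.linspace(0, 1, T)
        centers = np.linspace(0, 1, N)
        Phi = np.exp(-0.5 * (z[:, None] - centers[None, :]) ** 2 / width)
        return Phi / Phi.sum(axis=1, keepdims=True)          # shape (T, N)

    def fit_weights(traj, Phi, beta=1e-6):
        # ridge regression projection of one trajectory onto the basis, cf. Eq. (1)
        N = Phi.shape[1]
        return np.linalg.solve(Phi.T @ Phi + beta * np.eye(N), Phi.T @ traj)

    # Gaussian over weights from several (here synthetic) demonstrations
    demos = [np.sin(np.linspace(0, np.pi, 100)) + 0.05 * np.random.randn(100)
             for _ in range(10)]
    Phi = gaussian_basis(T=100, N=15)
    W = np.stack([fit_weights(d, Phi) for d in demos])       # one weight vector per demo
    mu_w, Sigma_w = W.mean(axis=0), np.cov(W, rowvar=False)  # p(w) = N(mu_w, Sigma_w)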

An Interaction ProMP [4] is a ProMP over the trajectories of at least two interacting agents. The demonstrations are now given in the form of a stacked vector for the observed and the controlled agent, $q = [q_o, q_c]^T$, where $q_o$ denotes the demonstrated trajectories of the observed agent and $q_c$ the demonstrated trajectories of the controlled agent. Respectively, the weight vector is also represented in an augmented form, $w = [w_o^T, w_c^T]^T$. Given a set of demonstrations, a distribution over the stacked weight vectors can be obtained just as previously described, such that $p(w) = \mathcal{N}(w | \mu_w, \Sigma_w)$. Given a sequence $D$ of positions of the observed agent (e.g., the human), Interaction ProMPs provide methods to infer a corresponding (most likely) trajectory of the controlled agent (the robot) [4].

The previously proposed batch learning for a Mixture of Interaction ProMPs [5] extends Interaction ProMPs to learn several different interaction patterns from unlabeled demonstrations by applying Gaussian Mixture Models (GMMs), where each mixture component represents one interaction pattern. The Mixture of Interaction Primitives is hereby learned from batch data, and the number of components needs to be fixed beforehand. In the case of $K$ different interaction patterns, the distribution over the weight vectors $w$ is

$$p(w) = \sum_{k=1}^{K} p(k)\, p(w|k) = \sum_{k=1}^{K} \alpha_k\, \mathcal{N}(w | \mu_k, \Sigma_k), \quad (2)$$

where $\alpha_k$ is the $k$-th mixture weight, which can be prior (if not learned) or posterior (if learned from given data), $\mu_k$ is the mean, and $\Sigma_k$ the covariance matrix of the $k$-th component.

The parameters of the GMM are hereby learned in the weight space using the Expectation Maximization (EM) algorithm. However, since this approach assumes that all demonstrations are available at learning time, the number of components $K$ remains fixed after learning.

Fig. 2. We introduce a novel approach for online and open-ended learning of a mixture model for cooperative tasks. During training, demonstrations are given in the form of trajectories of a human demonstrator $\tau^h$ and corresponding trajectories of a robot arm $\tau^r$, obtained via kinesthetic teaching. From these demonstrations, we update or extend our library of cooperative tasks, which consists of a gating model and multiple corresponding Interaction ProMPs. During runtime, the gating model decides on the activation of a particular Interaction ProMP, which we subsequently adapt to the variance in the observed motion. If the gating model is too uncertain about the activation of an Interaction ProMP, the robot can request more demonstrations.

This means that the parameters of the GMM need to be recomputed using all previous demonstrations whenever a new demonstration becomes available. Moreover, a GMM with a fixed number of mixture components cannot cope with new interaction patterns.

B. Online Open-Ended Mixture of Interaction ProMPs

We propose a new method to achieve online learning of cooperative tasks in an open-ended fashion. Hereby, demonstrations are given in the form of robot and human trajectories $\{\tau^r, \tau^h\}$. First, we compute a corresponding representation with weight vectors as introduced in Section III-A. Here, we consider that the human trajectory is of dimensions $D_h \times T$, where $D_h$ is the number of degrees of freedom of the observations (e.g., $D_h = 3$ in case of observing the wrist position) and $T$ is the number of time steps, and that the robot trajectory is of dimensions $D_r \times T$, where $D_r$ is the number of degrees of freedom of the robot (e.g., $D_r = 7$ in case of a 7DoF robot arm). For $N$ basis functions $\phi$ we compute the matrix $\Phi = [\phi_0, ..., \phi_t, ..., \phi_T]$ with dimension $N \times T$. In this work, Gaussian basis functions evenly spaced along the time axis are an appropriate choice due to the stroke-based movements. We then compute the weight vectors as a lower-dimensional representation of the trajectories, where we first compute the weight vectors for each dimension as

$$\begin{bmatrix} w_1^h \\ \vdots \\ w_{D_h}^h \\ w_1^r \\ \vdots \\ w_{D_r}^r \end{bmatrix} = (\Phi\Phi^T + \beta I)^{-1} \Phi \begin{bmatrix} \tau^h \\ \tau^r \end{bmatrix}, \quad (3)$$

where $\beta$ is a ridge regression factor and $I$ is an identity matrix. In our experimental evaluation, we found that normalizing the trajectory data within a fixed range before transforming it into the weight space yields overall better results.


Algorithm 1: Online Open-Ended Mixture of Interaction ProMPs
input: $\Sigma_{init}^g = I$, $T_{nov}$, $v_{min}$
while new data $\tau_n^h, \tau_n^r$ do
    compute $w_n^h, w_n^r$ from $\tau_n^h, \tau_n^r$, Eq. (3), (4);
    compute $p(w_n^h | k)$ $\forall k$ according to Eq. (5);
    if $p(w_n^h | k) < T_{nov}$ $\forall k$ then
        add new component, Eq. (9); $k$++;
    else
        compute $p(k | w_n^h)$ $\forall k$;
        update model parameters $\forall k$, Eq. (6), (7), (8);
        if $v_k > v_{min}$ then
            check merging and, if any, merge, Eq. (10);
        end
    end
end

Subsequently, we compute the stacked weight vectors

$$w^h = [w_1^h, \dots, w_{D_h}^h] \quad \text{and} \quad w^r = [w_1^r, \dots, w_{D_r}^r]. \quad (4)$$

From these demonstrations, now represented in the form of $\{w^r, w^h\}$, we learn the two intertwined parts of our model: the gating model, which decides on the cooperative task based on human motions, and multiple corresponding Interaction ProMPs, which subsequently generate a corresponding robot response. For the gating model, we train a Gaussian Mixture Model (GMM) only on the weights of the human trajectories $w^h$, as at runtime only the human motion is observed when the system needs to decide on the particular cooperative task and the response of the robot. In parallel to the gating model, the corresponding Interaction ProMPs are trained with the augmented weight vector $w$ of human and robot trajectories to model the correlations in the motions.

We assume that new training data needs to be integrated continuously and that we do not know beforehand the number of different collaborative tasks that might be shown to the robot during long-term training. To this end, we use Incremental Gaussian Mixture Models [20] to achieve the continuous integration of new demonstrations. Here, we update the gating model and the parameters of the Interaction ProMPs in an Expectation Maximization fashion.

In the Expectation step, we compute the responsibility $\lambda_{kn}$ of the existing cooperative task $k$ for a new demonstration $\{w_n^h, w_n^r\}$, that is, the probability of the new demonstration belonging to an already known cooperative task,

$$\lambda_{kn} := p(k | w_n^h) = \frac{p(w_n^h | k)\, p(k)}{p(w_n^h)} = \frac{\alpha_k\, \mathcal{N}(w_n^h | \mu_k^g, \Sigma_k^g)}{\sum_{j=1}^{K} \alpha_j\, \mathcal{N}(w_n^h | \mu_j^g, \Sigma_j^g)}, \quad (5)$$

where $\mu_k^g$ and $\Sigma_k^g$ are respectively the mean and covariance matrix of the $k$-th component of the gating model, and $\alpha_k$ are the mixture component weights. In the Maximization step, we use the responsibilities to recursively update the
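A minimal sketch of this Expectation step, assuming the gating model is kept as a list of per-component parameters (the data structure and field names are illustrative, not from the paper):

    import numpy as np
    from scipy.stats import multivariate_normal

    def responsibilities(w_h, components, alphas):
        # E-step, Eq. (5): responsibility of each known task k for the human weights w_h
        liks = np.array([alpha * multivariate_normal.pdf(w_h, mean=c["mu_g"], cov=c["C_g"])
                         for c, alpha in zip(components, alphas)])
        return liks / liks.sum()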

parameters of the gating model as well as the parameters of the already learned Interaction ProMPs. For each already learned Interaction ProMP $k$ we first compute

$$v_k = v_k + 1, \quad s_k = s_k + \lambda_{kn}, \quad \gamma_k = \frac{\lambda_{kn}}{s_k}, \quad \bar{\gamma}_k = \gamma_k + \exp(-s_k)\,\lambda_{kn}, \quad (6)$$

where $v_k$ is the age of the $k$-th component and $s_k$ represents the trajectories the component has already modeled well. We then update the parameters of the gating model

$$\begin{aligned} \mu_k^g &= \mu_k^g + \gamma_k (w_n^h - \mu_k^g), \\ C_k^g &= (1 - \bar{\gamma}_k)\, C_k^g + \bar{\gamma}_k (w_n^h - \mu_k^g)(w_n^h - \mu_k^g)^T - (\bar{\gamma}_k - \gamma_k)(\mu_k^g - \mu_k^{g,\text{old}})(\mu_k^g - \mu_k^{g,\text{old}})^T, \\ \alpha_k &= \frac{s_k}{\sum_{j=1}^{K} s_j}, \end{aligned} \quad (7)$$

where the formulas correspond to those of the incremental GMM [20], except that we introduce $\bar{\gamma}_k$ so that during the first demonstrations the covariance is shifted faster away from the (possibly wrong) initialization. Additionally, we compute the updated parameters of the corresponding Interaction ProMPs

$$\begin{aligned} \mu_k^e &= \mu_k^e + \gamma_k (w_n - \mu_k^e), \\ C_k^e &= (1 - \bar{\gamma}_k)\, C_k^e + \bar{\gamma}_k (w_n - \mu_k^e)(w_n - \mu_k^e)^T - (\bar{\gamma}_k - \gamma_k)(\mu_k^e - \mu_k^{e,\text{old}})(\mu_k^e - \mu_k^{e,\text{old}})^T, \end{aligned} \quad (8)$$

where $\mu_k^e$ is the mean and $\Sigma_k^e$ the covariance matrix of the $k$-th Interaction ProMP.
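A sketch of the recursive Maximization-step update for a single component, following Eqs. (6)-(8). Each component is an illustrative dict holding gating parameters (mu_g, C_g) and Interaction ProMP parameters (mu_e, C_e); the placement of the two learning rates follows our reading of the update rules above.

    import numpy as np

    def update_component(comp, w_h, w_full, lam):
        # Eq. (6): age, accumulated responsibility, and learning rates
        comp["v"] += 1
        comp["s"] += lam
        gamma = lam / comp["s"]
        gamma_bar = gamma + np.exp(-comp["s"]) * lam   # shifts covariance faster early on

        # Eq. (7) for the gating model, Eq. (8) for the Interaction ProMP
        for mu_key, C_key, x in (("mu_g", "C_g", w_h), ("mu_e", "C_e", w_full)):
            mu_old = comp[mu_key].copy()
            comp[mu_key] = mu_old + gamma * (x - mu_old)
            d = x - comp[mu_key]
            d_mu = comp[mu_key] - mu_old
            comp[C_key] = ((1 - gamma_bar) * comp[C_key]
                           + gamma_bar * np.outer(d, d)
                           - (gamma_bar - gamma) * np.outer(d_mu, d_mu))
        # mixture weights alpha_k = s_k / sum_j s_j are recomputed over all components
        return comp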

Whenever $p(w_n^h | k)$ is below a threshold $T_{nov}$ for all existing $K$ components, we initialize a new component with

$$\mu_{K+1}^g = w_n^h, \quad \Sigma_{K+1}^g = \Sigma_{init}^g, \quad \mu_{K+1}^e = w_n, \quad \Sigma_{K+1}^e = \Sigma_{init}^e, \quad v_{K+1} = 1, \quad s_{K+1} = 1. \quad (9)$$
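The novelty check and component creation of Eq. (9) could then look as follows, with T_nov and the initial covariances as hyperparameters and the same illustrative component structure as above:

    import numpy as np
    from scipy.stats import multivariate_normal

    def maybe_add_component(components, w_h, w_full, T_nov, Sigma_init_g, Sigma_init_e):
        # Eq. (9): if no existing component explains w_h, start a new cooperative task
        liks = [multivariate_normal.pdf(w_h, mean=c["mu_g"], cov=c["C_g"])
                for c in components]
        if not liks or max(liks) < T_nov:
            components.append(dict(mu_g=w_h.copy(), C_g=Sigma_init_g.copy(),
                                   mu_e=w_full.copy(), C_e=Sigma_init_e.copy(),
                                   v=1, s=1.0))
            return True
        return False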

If a component has reached a certain age, $v_k > v_{min}$, we also check for merging of components to ensure that no unnecessary components are maintained. Therefore, we compute the probability of the mean of a cluster $j$ belonging to a cluster $i$ as $p(\mu_j | i) = \mathcal{N}(\mu_j | \mu_i, \Sigma_i)$ and decide on merging if $p(\mu_j | i) > T_{nov}$. Once we decide on candidates $i, j$ for merging, we recompute the joined mean and covariance

$$\mu_{ij} = \frac{s_i \mu_i + s_j \mu_j}{s_i + s_j}, \qquad \Sigma_{ij} = \frac{s_i^2 \Sigma_i + s_j^2 \Sigma_j + (s_i\mu_i + s_j\mu_j)(s_i\mu_i + s_j\mu_j)^T}{(s_i + s_j)^2} - \mu_{ij}\mu_{ij}^T. \quad (10)$$
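A corresponding sketch of the merge test and the moment-matched merge of two components; the covariance combination mirrors Eq. (10) as stated above, and the handling of s and v after a merge is an assumption on our part:

    import numpy as np
    from scipy.stats import multivariate_normal

    def try_merge(ci, cj, T_nov):
        # merge component j into i if i explains j's gating mean well, cf. Eq. (10)
        if multivariate_normal.pdf(cj["mu_g"], mean=ci["mu_g"], cov=ci["C_g"]) <= T_nov:
            return None
        si, sj = ci["s"], cj["s"]
        merged = {"s": si + sj, "v": max(ci["v"], cj["v"])}
        for mu_key, C_key in (("mu_g", "C_g"), ("mu_e", "C_e")):
            m = si * ci[mu_key] + sj * cj[mu_key]
            mu = m / (si + sj)
            C = ((si**2 * ci[C_key] + sj**2 * cj[C_key] + np.outer(m, m))
                 / (si + sj)**2 - np.outer(mu, mu))
            merged[mu_key], merged[C_key] = mu, C
        return merged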

Algorithm 1 summarizes our approach for online learning of a gating model and multiple Interaction ProMPs.


Fig. 3. We first evaluate our approach on the task of learning ProMPs of multiple hand-drawn letter trajectories when the demonstrations are provided incrementally and no batch data are stored. The intermediate results during the training of the ProMP library are shown in the upper row, while the accumulated demonstrations are shown in the lower row. In the upper row, the shaded area represents two times the standard deviation, the solid lines show the mean, and the demonstrated trajectories are shown as gray lines. Our approach successfully updates existing components with new demonstrations and adds new components when required. In particular, with more demonstrations per letter, the influence of the initial covariance matrix diminishes and the component covariance converges to the covariance of the demonstrations.

C. A Skill Library for Collaborative Tasks

To demonstrate the use of the learned probabilistic mixture model for cooperative tasks, we assume we are now observing the human and obtain an observation $w_*^h$. To determine the most probable cluster given this observation, we model the posterior of the cluster given the observation,

$$p(k | w_*^h) = \lambda_{k*}, \quad (11)$$

where $\lambda_{k*}$ is the responsibility of the $k$-th cluster for the observation $w_*^h$ as defined in Equation (5). For a given observation $w_*^h$ we can now infer the most likely Interaction ProMP $k^*$ using our probabilistic gating model,

$$k^* = \arg\max_k \; p(k | w_*^h). \quad (12)$$

If the responsibility of all components is smaller than the novelty threshold $T_{nov}$, the robot does not execute a response but asks the user for new demonstrations, which are subsequently included in the library as described in Section III-B. Otherwise, we condition the chosen Interaction ProMP on the observed trajectory to infer the corresponding robot response. Therefore, the observation $o$ is used to obtain a posterior distribution over the weights. The posterior is again Gaussian with mean $\mu_{new}$ and covariance matrix $\Sigma_{new}$,

$$\begin{aligned} \Lambda &= \Sigma_{k^*} H_t (\Sigma_o + H_t^T \Sigma_{k^*} H_t)^{-1}, \\ \mu_{new} &= \mu_{k^*} + \Lambda (o - H_t^T \mu_{k^*}), \\ \Sigma_{new} &= \Sigma_{k^*} - \Lambda H_t^T \Sigma_{k^*}, \end{aligned} \quad (13)$$

where $\Sigma_o = I\sigma_o$ is the observation noise and $H_t$ is the observation matrix as defined in [4]. More details can be found in [9]. To obtain a corresponding robot motion, we execute the mean robot trajectory of this posterior.
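At runtime, the gating decision of Eqs. (11)-(12) and the conditioning of Eq. (13) can be combined as in the sketch below. For simplicity it conditions directly on the human block of the stacked weight vector, i.e., the observation matrix H simply selects the human weights; conditioning on raw trajectory points, as in [4], would use a time-dependent H_t instead.

    import numpy as np
    from scipy.stats import multivariate_normal

    def respond(components, w_h_obs, T_nov, dim_h, sigma_o=1e-4):
        # Eq. (12): pick the most likely Interaction ProMP for the observed human weights
        liks = np.array([multivariate_normal.pdf(w_h_obs, mean=c["mu_g"], cov=c["C_g"])
                         for c in components])
        if liks.max() < T_nov:
            return None                        # too uncertain: request a new demonstration
        c = components[int(liks.argmax())]

        # Eq. (13): condition the stacked [human, robot] weight distribution on w_h_obs
        mu, Sigma = c["mu_e"], c["C_e"]
        H = np.zeros((mu.size, dim_h))         # observation matrix selecting human weights
        H[:dim_h, :] = np.eye(dim_h)
        Sigma_o = sigma_o * np.eye(dim_h)
        L = Sigma @ H @ np.linalg.inv(Sigma_o + H.T @ Sigma @ H)
        mu_new = mu + L @ (w_h_obs - H.T @ mu)
        Sigma_new = Sigma - L @ H.T @ Sigma
        return mu_new[dim_h:], Sigma_new       # mean robot weights give the robot trajectory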

IV. EXPERIMENTAL EVALUATION

We evaluate our approach on 2D trajectory data and in a collaborative scenario with a 7DoF robot arm. For both, we show the qualitative applicability and evaluate the quantitative convergence w.r.t. a baseline. In addition, we demonstrate that the proposed approach can learn personalized libraries of collaborative tasks for different persons, and we report successful task completion via the decision accuracy of our gating model.

Fig. 4. (a) The demonstrations in the first experiment are given in the form of multiple hand-drawn letter trajectories. (b) We compare our approach against an EM approach, where for both we compute the KL-divergence to a baseline solution from labeled data. For an increasing number of samples per letter, our approach converges to the EM solution, while additionally being able to continuously integrate new demonstrations. (c) For fewer samples per letter, the covariance of the components in our approach is governed by the initial covariance. For an increasing number of samples per letter, the covariances converge to the underlying data covariances, yielding results comparable to the EM approach.

A. 2D trajectory data

For the 2D trajectory data experiment, demonstrations are given in the form of multiple hand-drawn letters, as illustrated in Figure 4 (a). Here, we learn a library of Probabilistic Movement Primitives in an incremental fashion. The system never has access to the whole training dataset at once; only one new unlabeled demonstration is provided at each update step. The general procedure is shown in Figure 3, where the upper row shows the x-dimension of the learned library and the lower row the accumulated demonstrations. Initially, a single "a" is demonstrated and the first skill is added with the initial covariance $\Sigma_{init}$. Afterwards, additional "a"s are demonstrated, recognized, and used to update the mean and covariance of the corresponding cluster. Once a new demonstration, i.e. the letter "m", is recognized as not belonging to an existing cluster, a new cluster is generated.


Fig. 5. (a) We evaluate our approach in a collaborative scenario where a robot (A) assists a human (B) in making a salad. Hereby, the robot can hand over the board (D), the dressing (C), a tomato (F), or the bowl (E), or assist with a standup motion. (b) The demonstrations are recorded as human and robot trajectories, where the robot is moved in kinesthetic teaching mode and the wrist trajectory is tracked with a motion capture system. (c) We again compare the KL-divergence of both our approach and an EM approach to a baseline from labeled data. For an increasing number of demonstrations per task, our approach converges to the EM solution but requires less recomputation and memory as new demonstrations arrive.

Fig. 6. For a human test subject (subject 1) we recorded 15 demonstrations per task. The trajectories are shown from a top-down view (a) and from a front view (b). On these demonstrations, we train a collaborative task library, where we do not provide all demonstrations as batch data but incrementally add more demonstrations. (c) shows the resulting gating model, which corresponds to the human trajectory part of the Interaction ProMPs. The shaded area represents two times the standard deviation, while the solid lines show the mean. The demonstrated trajectories are depicted as gray lines.

With an increasing number of samples, the variance converges to the variance of the demonstrations as the impact of the initialization covariance decreases. The final skill library consists of five clusters representing the different letters. Please note that in this experiment $\mu_k^g, \Sigma_k^g = \mu_k^e, \Sigma_k^e$. We evaluate the approach in a collaborative setting later in Section IV-B. To demonstrate that the library learned with our new approach, using incremental processing of demonstrations, converges to the solution of EM with batch learning, we compare the resulting skill libraries first qualitatively, as shown in Figure 4 (c), and quantitatively using the Kullback-Leibler divergence to a baseline, as shown in Figure 4 (b). Qualitatively speaking, our approach (Figure 4 (c), upper row) represents all different letters as individual clusters, and the trajectory means of the mixture model components match the means learned with EM in batch mode (Figure 4 (c), bottom row). While for fewer samples per letter the trajectory covariances learned with our approach are dominated by $\Sigma_{init}$, with an increasing number of samples per letter the trajectory covariances converge to the covariances of the EM solution as the influence of the initial covariance decreases. The same behavior can also be observed in the quantitative comparison. Hereby, we compute the Kullback-Leibler (KL) divergence of our approach and of EM to a baseline, computed with Maximum Likelihood estimation from labeled data. The KL-divergence of our approach is averaged over 100 trials, where the order of demonstrations is randomly permuted. In the batch EM case, we provided the method with the correct number of components, while our approach had to find the correct number of components by itself.

Fig. 7. Once trained with demonstrations, our model can subsequently be used to produce a corresponding robot response to an observed human trajectory. First, the gating model decides which of the Interaction Primitives (light gray) to activate (green). The activated primitive is subsequently adapted to the variance in the observed human trajectory via conditioning (dark gray). The plots show joints q2, q4, and q6 of the robot arm, where the shaded area represents two times the standard deviation and the solid lines show the mean.

Figure 4 (b) shows that the KL-divergence between the solution of our approach and the baseline is large for fewer samples and decreases with an increasing number of samples. The high variance in the KL-divergence for few samples is expected, as the KL-divergence is sensitive to the entropy of the ground truth model, which clearly depends on the selected demonstrations. The variance also shrinks as the entropy converges for multiple samples. The experiments show that our novel approach achieves results comparable to those of an EM approach. However, in contrast to EM, our approach does not require all data in batch mode but can incrementally learn and update its models from new demonstrations in an online and open-ended fashion.
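For reference, the KL-divergence between two multivariate Gaussians used in this comparison has a standard closed form; the following helper is generic and not specific to our implementation:

    import numpy as np

    def kl_gaussians(mu0, S0, mu1, S1):
        # KL( N(mu0, S0) || N(mu1, S1) ) in nats, for d-dimensional Gaussians
        d = mu0.size
        S1_inv = np.linalg.inv(S1)
        diff = mu1 - mu0
        return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                      + np.log(np.linalg.det(S1) / np.linalg.det(S0)))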


B. Learning Cooperative Tasks with a Robotic Arm

In this experiment, the proposed approach is tested in a collaborative scenario where a robot is supposed to assist an elderly person in making a salad. The robot assists the person by first observing and recognizing the human action and then determining, adapting, and executing an assistive response based on prior demonstrations. For the salad scenario, shown in Figure 5 (a), five different cooperative tasks are required, namely:

• Board: The robot hands over the cutting board after the human grasps the knife.

• Tomato: The robot passes the tomato when the human reaches for the tomato.

• Bowl: The robot passes the salad bowl when the human reaches for the bowl.

• Dressing: The robot gets the salad dressing from the shelf after the human reaches for the dressing.

• Standup: The robot supports the standup motion.

Each of the cooperative tasks is demonstrated separately and multiple times. The robot response is shown using kinesthetic teaching, while the human action is recorded using motion capture markers on the wrist. This teaching procedure is shown for the bowl task in Figure 5 (b).

In an initial experiment, 15 demonstrations are recorded for every task with a human test subject (subject 1). The resulting human trajectories are shown in top-down and front view in Figure 6 (a), (b). From the demonstrations, our approach learns a library of cooperative skills consisting of a gating model and multiple corresponding Interaction ProMPs. Hereby, the demonstrations are not provided as batch data but incrementally, and the data are not stored. An example of the gating model for the human trajectories (which corresponds to the human part of the Interaction ProMPs) is shown in Figure 6 (c). The five different skill clusters are clearly visible. Figure 5 (c) shows that, similarly to the letter experiment, the averaged KL-divergence w.r.t. the ground truth solution learned from labeled data decreases with the number of demonstrations per task and, for more demonstrations per task, becomes comparable to the EM solution, while requiring less computation and memory.

In the interactive setting, the robot determines and adapts its response to the human movement based on prior demonstrations. Such an adapted robot response is shown for the bowl task at the bottom of Figure 5 (b). The adaptation of the robot response is achieved by conditioning the Interaction ProMP on the observed human wrist trajectory, as described in Section III-C. An example of such an adaptation can be seen in Figure 7 for the tomato task.

To further demonstrate the applicability and robustness of the proposed approach, we conducted more experiments with different subjects and identical hyperparameters. For each subject, individual demonstrations are recorded and a corresponding personalized skill library is incrementally learned. To evaluate the performance, the classification accuracy for recognizing the correct cooperative task is evaluated by k-fold cross-validation. For subject 1 we use

a training set of 10 demonstrations per task and test on 5 demonstrations per task, while all other subjects use 4 demonstrations per task for training and 1 demonstration per task for testing. The classification results, averaged over 100 test and train sets, are shown in Table I. The first value corresponds to the percentage of successful classifications, the second to the percentage of wrong classifications (such as classifying tomato as bowl), and the third to the percentage of classifications as unknown. The results reveal that even though our approach works well for six of the subjects, the classification accuracies for the other four subjects vary between tasks. In particular, for the board task, subjects 3, 7, and 9 have some movements with a high variance with respect to the training set that are therefore classified as unknown. However, a classification as unknown does not yield a wrong robot response; the robot only asks for a new demonstration. Only for subject 10 does the robot misclassify the stand-up skill. In addition to the classification accuracy, Table II shows the number of learned components for the individual subjects to provide some insight into the personalized skill libraries. Here, depending on the variance of a subject's movements, a single skill can be represented by multiple clusters, since we used the same hyperparameters for all subjects and did not tune them individually. Additional clusters do not cause wrong classifications but can lead to unknown classifications, as shown for subjects 3, 7, and 9. The results for subject 10 illustrate that the wrong classifications for the standup task for this subject were caused by too few learned clusters. In all experiments we assumed a fixed observation time of the human motion, after which the gating model decides on a particular cooperative task. For future work, we consider replacing this fixed observation time with a more flexible one to allow for temporal correlation between human and robot motions.

V. CONCLUSIONS

In this paper, we introduce a novel approach to learn a mixture model of probabilistic interaction primitives in an online and open-ended fashion. In contrast to prior work, which focused on batch learning of a Mixture of Interaction Primitives, our approach is able to update existing interaction primitives continuously from new data and to extend a cooperative task library with new interaction patterns when needed. Experimental evaluation in a collaborative scenario with a 7DoF robot arm showed that our approach is able to learn multiple different collaborative tasks from unlabeled training data and to generate corresponding robot motions based on prior demonstrations. Evaluations with 10 human subjects showed that our approach successfully learned a personalized collaborative library for the majority of subjects.

However, since the experiments with different subjects indicate that motion data do not work equally well for all subjects and tasks, we are currently investigating how to include other modalities, such as gaze direction or voice commands, in our gating model. Another interesting line of future work is to train a cooperative task library across subjects to


TABLE I
CLASSIFICATION ACCURACY

            board          tomato  dressing       standup        bowl
subject1    1.0            1.0     1.0            1.0            0.99 (0|0.01)
subject2    1.0            1.0     1.0            1.0            1.0
subject3    0.85 (0|0.15)  1.0     1.0            0.72 (0|0.28)  1.0
subject4    1.0            1.0     1.0            1.0            1.0
subject5    1.0            1.0     0.99 (0.01|0)  1.0            1.0
subject6    1.0            1.0     1.0            1.0            1.0
subject7    0.77 (0|0.23)  1.0     0.88 (0|0.12)  1.0            1.0
subject8    1.0            1.0     1.0            1.0            1.0
subject9    0.84 (0|0.16)  1.0     1.0            1.0            1.0
subject10   1.0            1.0     1.0            0.77 (0.23|0)  1.0

TABLE II
NUMBER OF CLUSTERS AFTER TRAINING
(columns: number of clusters; entries: fraction of trials)

            4     5     6     7
subject1    0     1.0   0     0
subject2    0     1.0   0     0
subject3    0     0.23  0.77  0
subject4    0     1.0   0     0
subject5    0.01  0.72  0.23  0.04
subject6    0     1.0   0     0
subject7    0     0.85  0.15  0
subject8    0     1.0   0     0
subject9    0     0.78  0.22  0
subject10   0.23  0.77  0     0

achieve better transfer to new subjects. In particular, we are investigating how the number of demonstrations per new subject can be reduced by reusing prior demonstrations from other subjects. Moreover, since for now the Interaction ProMPs in the cooperative task library are solely learned from demonstrations, an important component of future work is to enrich and improve the trajectories of the robot, for example by using reinforcement learning.

REFERENCES

[1] K. Linz and S. Stula, "Demographic change in Europe - an overview," Observatory for Sociopolitical Developments in Europe, vol. 4, no. 1, pp. 2-10, 2010.

[2] S. Schaal, "Is imitation learning the route to humanoid robots?" Trends in Cognitive Sciences, vol. 3, no. 6, pp. 233-242, 1999.

[3] D. A. Rosenbaum, Human Motor Control. Academic Press, 2009.

[4] G. Maeda, M. Ewerton, R. Lioutikov, H. B. Amor, J. Peters, and G. Neumann, "Learning interaction for collaborative tasks with probabilistic movement primitives," in Humanoid Robots (Humanoids), 2014 14th IEEE-RAS International Conference on. IEEE, 2014, pp. 527-534.

[5] M. Ewerton, G. Neumann, R. Lioutikov, H. B. Amor, J. Peters, and G. Maeda, "Learning multiple collaborative tasks with a mixture of interaction primitives," in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 1535-1542.

[6] B. D. Argall, S. Chernova, M. Veloso, and B. Browning, "A survey of robot learning from demonstration," Robotics and Autonomous Systems, vol. 57, no. 5, pp. 469-483, 2009.

[7] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, "Robot programming by demonstration," in Springer Handbook of Robotics. Springer, 2008, pp. 1371-1394.

[8] D. Vogt, S. Stepputtis, S. Grehl, B. Jung, and H. B. Amor, "A system for learning continuous human-robot interactions from human-human demonstrations," in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 2882-2889.

[9] G. J. Maeda, G. Neumann, M. Ewerton, R. Lioutikov, O. Kroemer, and J. Peters, "Probabilistic movement primitives for coordination of multiple human-robot collaborative tasks," Autonomous Robots, vol. 41, no. 3, pp. 593-612, Mar 2017. [Online]. Available: https://doi.org/10.1007/s10514-016-9556-2

[10] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, "Dynamical movement primitives: learning attractor models for motor behaviors," Neural Computation, vol. 25, no. 2, pp. 328-373, 2013.

[11] S. Calinon, F. Guenter, and A. Billard, "On learning, representing, and generalizing a task in a humanoid robot," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 37, no. 2, pp. 286-298, 2007.

[12] A. Paraschos, C. Daniel, J. Peters, and G. Neumann, "Using probabilistic movement primitives in robotics," Autonomous Robots, pp. 1-23, 2018.

[13] H. B. Amor, G. Neumann, S. Kamthe, O. Kroemer, and J. Peters, "Interaction primitives for human-robot cooperation tasks," in Robotics and Automation (ICRA), 2014 IEEE International Conference on. IEEE, 2014, pp. 2831-2837.

[14] C. Perez-D'Arpino and J. A. Shah, "Fast target prediction of human reaching motion for cooperative human-robot manipulation tasks using time series classification," in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 6175-6182.

[15] D. Lee, C. Ott, and Y. Nakamura, "Mimetic communication model with compliant physical contact in human-humanoid interaction," The International Journal of Robotics Research, vol. 29, no. 13, pp. 1684-1704, 2010.

[16] G. Konidaris, S. Kuindersma, R. Grupen, and A. Barto, "Robot learning from demonstration by constructing skill trees," The International Journal of Robotics Research, vol. 31, no. 3, pp. 360-375, 2012.

[17] S. Calinon and A. Billard, "Incremental learning of gestures by imitation in a humanoid robot," in Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. ACM, 2007, pp. 255-262.

[18] A. Ahmed and E. Xing, "Dynamic non-parametric mixture models and the recurrent chinese restaurant process: with applications to evolutionary clustering," in Proceedings of the 2008 SIAM International Conference on Data Mining. SIAM, 2008, pp. 219-230.

[19] O. Arandjelovic and R. Cipolla, "Incremental learning of temporally-coherent gaussian mixture models," Society of Manufacturing Engineers (SME) Technical Papers, pp. 1-1, 2006.

[20] P. M. Engel and M. R. Heinen, "Incremental learning of multivariate gaussian mixture models," in Brazilian Symposium on Artificial Intelligence. Springer, 2010, pp. 82-91.

[21] R. C. Pinto and P. M. Engel, "A fast incremental gaussian mixture model," PLoS ONE, vol. 10, no. 10, p. e0139931, 2015.

[22] A. Declercq and J. H. Piater, "Online learning of gaussian mixture models - a two-level approach," in VISAPP (1), 2008, pp. 605-611.

[23] G. Maeda, M. Ewerton, T. Osa, B. Busch, and J. Peters, "Active incremental learning of robot movement primitives," in Conference on Robot Learning (CORL), 2017.

[24] J. Hoyos, F. Prieto, G. Alenya, and C. Torras, "Incremental learning of skills in a task-parameterized gaussian mixture model," Journal of Intelligent & Robotic Systems, vol. 82, no. 1, pp. 81-99, 2016.

[25] I. Havoutis, A. K. Tanwani, and S. Calinon, "Online incremental learning of manipulation tasks for semi-autonomous teleoperation," 2016.

[26] D. Kulic, C. Ott, D. Lee, J. Ishikawa, and Y. Nakamura, "Incremental learning of full body motion primitives and their sequencing through human motion observation," The International Journal of Robotics Research, vol. 31, no. 3, pp. 330-345, 2012.

[27] A. Lemme, R. F. Reinhart, and J. J. Steil, "Self-supervised bootstrapping of a movement primitive library from complex trajectories," in Humanoid Robots (Humanoids), 2014 14th IEEE-RAS International Conference on. IEEE, 2014, pp. 726-732.

[28] K. Muelling, J. Kober, and J. Peters, "Learning table tennis with a mixture of motor primitives," in Humanoid Robots (Humanoids), 2010 10th IEEE-RAS International Conference on. IEEE, 2010, pp. 411-416.

[29] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, "Adaptive mixtures of local experts," Neural Computation, vol. 3, no. 1, pp. 79-87, 1991.