

1184 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 38, NO. 5, OCTOBER 2008

Natural Movement Generation Using Hidden Markov Models and Principal Components

Junghyun Kwon and Frank C. Park

Abstract—Recent studies have shown that the perception of natural movements—in the sense of being "humanlike"—depends on both joint and task space characteristics of the movement. This paper proposes a movement generation framework that merges two established techniques from gesture recognition and motion generation—hidden Markov models (HMMs) and principal components—into an efficient and reliable means of generating natural movements, which uniformly considers joint and task space characteristics. Given human motion data that are classified into several movement categories, for each category, the principal components extracted from the joint trajectories are used as basis elements. An HMM is, in turn, designed and trained for each movement class using the human task space motion data. Natural movements are generated as the optimal linear combination of principal components, which yields the highest probability for the trained HMM. Experimental case studies with a prototype humanoid robot demonstrate the various advantages of our proposed framework.

Index Terms—Hidden Markov model (HMM), movement primitive, natural movement, principal component.

I. INTRODUCTION

WHILE the various motivating arguments put forth for designing robots that physically resemble humans—the need for robots to function in environments designed expressly for humans and the better facilitation of human–robot interaction leading to wider social acceptance of robots—are by now well known, recent works that focus on getting humanoid robots to move in a humanlike way, e.g., [1] and [2], point to a growing realization that how the robot moves and behaves is just as important in the way that humans ultimately perceive robots.

Several recent works have attempted to quantify how humanlike or natural a particular movement is. Considering only the angle and velocity profiles of the joints, Ren et al. [3] employ various statistical methods, e.g., a mixture of Gaussians and hidden Markov models (HMMs), to measure how similar a movement is to a previously acquired database of human movements. In contrast, the task space properties of a movement are emphasized in [4] and [5]; drawing upon evidence

Manuscript received September 17, 2007; revised December 31, 2007 and April 16, 2008. This research was supported in part by grants from the Frontier 21C Program in Intelligent Robotics, by the Seoul National University (SNU) Center for Biomimetic Systems, and by IAMD—SNU. This paper was recommended by Associate Editor C.-T. Lin.

This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the authors. This includes four movie clips, which show the humanoid motion generation results for the numerical and experimental case studies in Section IV. This material is 9.9 MB in size.

The authors are with the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul 151-742, Korea (e-mail: jhkwon@robotics.snu.ac.kr; [email protected]).

Digital Object Identifier 10.1109/TSMCB.2008.926324

from human motor control research, Simmons and Demiris [4] generate human point-to-point arm-reaching movements in which the hand typically follows a linear path and the velocity profile resembles a smooth bell-shaped curve. In [5], human subjects are asked to subjectively evaluate the naturalness of arm movements generated from various algorithms, e.g., the minimum velocity or acceleration of the hand, minimum angular velocity or acceleration of the joints, minimum joint torque or torque change based on dynamic optimization, etc., and the authors conclude, among other things, that the smoothness of hand velocity profiles is critical and that minimum torque and other dynamically optimal motions tend not to be perceived as being natural (the latter finding is also confirmed in [6]). The important consequence inferred from such previous works is that both the joint and task space characteristics of a movement are important ingredients affecting the movement's naturalness.

The easiest and most popular way to generate humanoid movements resembling human movements in both the joint and task spaces is to exploit some existing database of human motion capture data. Recent approaches attempt to generate movements that are similar, in some statistical sense, to this database of movements; this is also true for many works in imitation learning and programming by demonstration. Nearly all of the metrics used to measure the similarity between movements rely on joint and task space representations of the movement. However, directly applying a particular set of human motion capture data—in the form of joint angle trajectories—to a robot will more often than not yield unsatisfactory movements because topological and structural differences inevitably exist between the human subjects providing the motion data and the actual robot (e.g., differences in kinematic degrees of freedom, joint types, link dimensions, joint limits, mass and inertial properties, etc.).

Therefore, what is needed is a procedure for generating new movements from a collection of similar movements (or, in the context of imitation learning, from multiple demonstrations of a specific movement), which adjusts the joint trajectories to compensate for these structural differences while still preserving the "natural" qualities of the movement in both the joint and task space senses.

The main contribution of this paper is an algorithm for generating natural humanlike movements, which simultaneously accounts for task and joint space characteristics in a unified and consistent fashion while preserving the natural qualities of the movement within an optimization setting. The approach that we set forth is based on the following: 1) the encoding of movement primitives in joint space as a set of basis elements that are extracted via the principal component analysis (PCA)

1083-4419/$25.00 © 2008 IEEE


of existing human movement data and 2) the use of HMMs to determine the optimal linear combination of basis elements, which best represents the movement in the task space.

The idea that complex movements are created from a vocabulary of movement primitives has inspired a wealth of related studies in the human motor control literature (see, e.g., [7]–[9] and the references cited therein). The method of representing and generating human arm movements as a linear superposition of principal components was first investigated in [8]. Basis elements obtained from PCA have also been used in interpolation-based schemes for deriving action and behavior primitives for humanoid robots [10]–[12] and for obtaining dynamically suboptimal motions in real time, e.g., suboptimal minimum torque motions [12].

The use of HMMs is ubiquitous in time series signal processing, particularly in speech recognition [13]. Recently, HMMs have also appeared in video-based gesture recognition [14]–[16] and robot task learning [17], [18]. The ability of HMMs to generalize human demonstrations has naturally led to several proposed methods for HMM-based movement generation [19]–[23], whose details we discuss in the following.

In our proposed method, each movement primitive is represented by an HMM trained with human motion capture data. The HMM training process uses only the task space properties of the movement, i.e., only the hand trajectories and velocities in Cartesian space. At the same time, basis elements in the form of joint angle trajectories for each movement primitive are extracted via the PCA of the same HMM training data. Movements are then generated by selecting the basis element weights that result in the maximum probability for the given HMM and that meet boundary conditions and other user-specified constraints.

A. Related Work

Several closely related works that employ HMMs for movement generation have been recently proposed in the literature. Inamura et al. [19] train a continuous HMM using joint angle trajectories of human motion capture data as the observation data. New joint angle trajectories are obtained via the direct simulation of the state transition and observation emission process with an averaging strategy, in which the following are accomplished: 1) the average state sequence is obtained from repeated trials of the state transition process; 2) the joint angle trajectories are obtained from a single trial of the observation emission process based on the averaged state sequence; and 3) the averaged joint angle trajectories are obtained from repeated trials of 1) and 2).

Calinon and Billard [20]–[22] primarily use HMMs in an imitation learning context, i.e., for generalizing multiple human demonstrations of a given task. In contrast with Inamura et al. [19], Calinon and Billard [20] use only key-point values, such as inflection points extracted from the joint angle or 3-D hand path trajectories, to train two corresponding separate HMMs. New movements are generated by interpolating the key-point sequence retrieved from the trained observation mean vectors. The works of Calinon and Billard in [21] and [22] extend their work in [20] by the following: 1) using

PCA or independent component analysis to reduce the data dimension and the effects of noise before HMM training and 2) introducing a cost function to determine whether to control the humanoid using either the retrieved joint angle or 3-D hand path trajectories.

Asfour et al. [23] also rely on key-point retrieval from the trained HMM and interpolation of those key points, which is similar to [20]. A distinctive feature of [23] is that three separate HMMs are employed for a specific movement, namely, joint-angle-based, hand-path-based, and hand-orientation-based. The newly generated movement is the blend of the joint angle trajectories produced by those HMMs.

The approach presented in this paper enhances, in several important respects, the previously mentioned HMM-based methods for movement generation. The primary original contributions can be summarized as follows.

1) Our framework provides a systematic and efficient means of simultaneously considering the goodness of fit in both the joint and task spaces; the basis elements extracted via PCA reflect the joint space characteristics of the movement, whereas the trained HMM focuses on the features of the movement in the task space. Considering that the requisite optimization involves selecting the best linear combination of the movement basis elements that yields the highest probability for the trained HMM, the resulting movement preserves both the joint and task space characteristics of the original human motion. Our approach can be contrasted with that of Inamura et al. [19], which uses only human joint angle trajectories and does not consider the task space features of the movement. At the same time, the other methods described in [20]–[23] generate separate movements according to the task and joint space characteristics, of which one is either selected or the two are suitably merged through some ad hoc user-defined criterion.

2) Our optimization procedure effectively resolves any movement discrepancies arising from the inherent kinematic differences between the target humanoid and the human subject providing the movement database. Because previous approaches [19]–[23] directly generate the movements without considering such structural differences, it becomes necessary to, e.g., manually adjust the human joint angle trajectories prior to HMM training to ensure, for example, that the joint limits of the target humanoid are not exceeded and that the qualitative features that make the movement natural are preserved even for different link length proportions. Such adjustments must, moreover, be made repeatedly whenever the target humanoid is changed. In contrast, because any joint limit constraints can be easily included in our optimization procedure, our approach avoids any laborious manual modification of the human joint angle trajectories. As we will also discuss later, because the HMM observation vector is defined to be the Cartesian hand position and velocity (normalized by the sum of the link lengths), humanlike movements can be generated simply by explicitly specifying the initial and


final joint configurations of the movement as constraints in the optimization procedure. Unlike past approaches, neither modification of the human joint angle trajectories nor retraining of the HMM is necessary even when the target humanoid is changed.

We emphasize that although our proposed approach bears some superficial similarities to the methodology presented in [21] and [22], in that both apply PCA and HMMs to human movement data, the two approaches are fundamentally different in both purpose and means; the works of Calinon and Billard in [21] and [22] apply PCA to human movement data across different joint or Cartesian position variables to reduce the data dimension and the effects of noise prior to HMM training. In contrast, our approach applies PCA to human movement data across multiple human demonstrations of every single joint comprising a movement in order to extract the movement basis elements for each corresponding joint.

B. Paper Organization

This paper is organized as follows. In Section II, we describe the principal component-based approach to constructing joint space basis elements and how these elements can be used to generate movements. Section III then presents our overall HMM-based optimization framework for movement generation. Section IV provides a detailed case study involving diverse sets of arm movements for a prototype humanoid robot, which is followed by a summary and topics for further study in Section V.

II. JOINT SPACE BASIS ELEMENTS VIA PRINCIPAL COMPONENTS

In our procedure for movement basis extraction via PCA, a human demonstrator first executes repeated trials of a specific movement (e.g., hand raising), which are then recorded with a motion capture device. Assuming that the kinematic parameters of the human subject are available, the movement data are stored in the form of joint angle trajectories obtained from the inverse kinematics. For each joint, PCA is then applied to the captured joint angle trajectories, and a set of dominant principal components (or, more precisely in our case, their continuous-time polynomial approximations) are selected as basis elements representing that movement class. This procedure is repeated for different types of basic movements, with the associated principal components constituting a convenient "primitive" representation for each movement class.

If the demonstrator executes a specific movement N times, with each movement sampled at M uniform time intervals, the joint angle trajectories of a particular joint can be represented as a sample movement matrix X ∈ ℝ^{M×N}; here, each column of X, which is denoted as x_i ∈ ℝ^M, i = 1, 2, . . . , N, represents a single (sampled) joint angle trajectory for each execution. The associated sample covariance matrix C ∈ ℝ^{M×M} can be obtained as

C = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^\top \quad (1)

Fig. 1. (a) Joint angle trajectories obtained from the human motion capture session. The thick line represents the mean joint angle trajectory. (b) First four dominant principal components extracted from human joint angle trajectories.

where x̄ is the sample mean. Denoting the eigenvalues of C, in decreasing order, by λ_i (i.e., λ_1 ≥ λ_2 ≥ · · · ≥ λ_M), the corresponding eigenvectors e_i are the principal components of X. The ratio between each eigenvalue and the sum of all the eigenvalues is then the percentage of data explained by each corresponding principal component. Usually, the first p dominant principal components are selected as movement basis elements according to the following criterion:

\frac{\sum_{i=1}^{p} \lambda_i}{\sum_{i=1}^{M} \lambda_i} \geq T \quad (2)

where T is a specific user-supplied threshold (e.g., 0.9 and 0.95).
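As a concrete illustration, the covariance construction of (1) and the component selection criterion of (2) can be sketched in numpy as follows; the trajectory matrix here is a synthetic stand-in for motion-capture joint angle data, and all variable names are ours, not the paper's:

```python
import numpy as np

# Synthetic stand-in for N = 20 demonstrations of one joint, each
# sampled at M = 50 uniform time steps (one demonstration per column).
rng = np.random.default_rng(0)
M, N = 50, 20
t = np.linspace(0.0, 1.0, M)
mean_traj = np.sin(np.pi * t)                       # nominal joint trajectory
amps = 1.0 + 0.3 * rng.standard_normal(N)           # per-trial amplitude variation
X = mean_traj[:, None] * amps[None, :] + 0.02 * rng.standard_normal((M, N))

x_bar = X.mean(axis=1)                              # sample mean trajectory
Xc = X - x_bar[:, None]
C = Xc @ Xc.T / N                                   # sample covariance, Eq. (1)

# Eigendecomposition; eigh returns ascending eigenvalues, so reverse.
lam, E = np.linalg.eigh(C)
lam, E = lam[::-1], E[:, ::-1]

# Eq. (2): keep the first p components explaining at least T of the variance.
T_threshold = 0.9
ratios = np.cumsum(lam) / lam.sum()
p = int(np.searchsorted(ratios, T_threshold) + 1)
basis = E[:, :p]                                    # movement basis elements
```

In the paper these discrete eigenvectors would subsequently be fitted with continuous-time polynomials b_i(t); that fitting step is omitted here.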

Fig. 1 shows an example of the captured joint angle trajectories and the four dominant principal components. If we use only the first three principal components as the movement basis elements, the newly generated joint angle trajectory can be expressed as a linear combination of b_i(t), which is the continuous-time polynomial approximation of each principal component e_i, as follows:

q(t) = \bar{q}(t) + c_1 b_1(t) + c_2 b_2(t) + c_3 b_3(t) + c_4 \quad (3)

where q̄(t) is the continuous-time polynomial approximation of the mean trajectory and c_i represents the scalar weighting coefficients. What remains now is how to determine the coefficients c_i.

One of the simplest ways of determining c_i is to perform a linear movement interpolation via the solution of the linear equation

\begin{bmatrix} b_1(t_0) & b_2(t_0) & b_3(t_0) & 1 \\ b_1(t_f) & b_2(t_f) & b_3(t_f) & 1 \\ \dot{b}_1(t_0) & \dot{b}_2(t_0) & \dot{b}_3(t_0) & 0 \\ \dot{b}_1(t_f) & \dot{b}_2(t_f) & \dot{b}_3(t_f) & 0 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} = \begin{bmatrix} q(t_0) - \bar{q}(t_0) \\ q(t_f) - \bar{q}(t_f) \\ \dot{q}(t_0) - \dot{\bar{q}}(t_0) \\ \dot{q}(t_f) - \dot{\bar{q}}(t_f) \end{bmatrix} \quad (4)

where {q(t_0), q(t_f), q̇(t_0), q̇(t_f)} are boundary conditions at the initial and final configurations specified by the user. The computational time for solving (4) is negligible. Modifying the number of principal components used or applying other boundary conditions, e.g., joint angle accelerations at the end points, can be accommodated in a similar fashion.
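The boundary-condition solve of (4) can be sketched in numpy as follows; the polynomial basis elements b_i(t) and mean trajectory q̄(t) below are hypothetical stand-ins, not the paper's fitted principal-component approximations:

```python
import numpy as np

# Hypothetical stand-ins for the polynomial basis elements b_i(t) and the
# polynomial approximation of the mean trajectory qbar(t).
P = np.polynomial.Polynomial
b = [P([0.0, 1.0]), P([0.0, 0.0, 1.0]), P([0.0, 0.0, 0.0, 1.0])]  # t, t^2, t^3
qbar = P([0.1, 0.2, -0.1])

t0, tf = 0.0, 1.0
q0, qf, dq0, dqf = 0.3, 0.8, 0.0, 0.0               # user boundary conditions

db = [poly.deriv() for poly in b]                   # basis derivatives
A = np.array([[b[0](t0),  b[1](t0),  b[2](t0),  1.0],
              [b[0](tf),  b[1](tf),  b[2](tf),  1.0],
              [db[0](t0), db[1](t0), db[2](t0), 0.0],
              [db[0](tf), db[1](tf), db[2](tf), 0.0]])
rhs = np.array([q0 - qbar(t0), qf - qbar(tf),
                dq0 - qbar.deriv()(t0), dqf - qbar.deriv()(tf)])
c = np.linalg.solve(A, rhs)                         # weights c1..c4 of Eq. (4)

def q(t):
    """Generated trajectory of Eq. (3)."""
    return qbar(t) + c[0] * b[0](t) + c[1] * b[1](t) + c[2] * b[2](t) + c[3]
```

Evaluating q at t0 and tf reproduces the prescribed boundary positions, and the derivative conditions hold by construction of the third and fourth rows.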

Note that (4) must be separately applied to each joint. When the number of weights to be determined exceeds the number


of boundary conditions, the weight values can be determined via optimization with respect to a suitable criterion. The resulting suboptimal motions for physical criteria, e.g., minimum torque motion, can generally be obtained far more efficiently than with more general basis elements such as B-splines.

Finally, we note that although our framework employs principal component representations of the joint trajectories, in principle, one can use any number of methods for constructing the basis elements for each movement primitive, e.g., independent component analysis, nonnegative matrix factorization, etc.

III. MOVEMENT GENERATION FRAMEWORK

By using the principal components as the movement basis elements, we can generate new joint angle trajectories that appear similar to the captured human joint angle trajectories. However, this apparent similarity in the joint space does not ensure a natural appearance in the task space. In previous case studies involving arm movements [24], we have shown that both the linear interpolation method using (4) and torque minimization, although efficient, produce movements that somehow fail to capture the small but essential details that result in the perception of a movement as being natural—the spatiotemporal characteristics of the movement in the task space (such as the motion of the hand while waving) clearly need to be considered.

To address this concern, we combine the PCA-based movement generation framework [12] and the HMM representation of movements used for gesture recognition into a single movement generation framework. Specifically, the HMM is trained in the task space for a specific movement primitive, using the same human motion capture data employed in the PCA-based movement basis extraction procedure. We then determine the linear combination of weights for the movement basis elements so as to maximize the probability for the trained HMM, subject to any user-imposed boundary joint values and constraints. The end result is a movement that is natural from the perspective of both the task and joint spaces, and it satisfies the desired initial and final configurations of the end effector. We describe the proposed movement generation framework in the following two stages: 1) HMM setup and training and 2) motion optimization using the principal components with the trained HMM.

A. HMM Setup and Training

HMMs can be characterized as discrete state stochastic processes consisting of hidden states S and observation vectors o from S. The characteristics of a specific HMM are completely determined by the probability distribution set λ_HMM = {A, B, π}, where A, B, and π are, respectively, the state transition probability distribution, the observation probability distribution, and the initial state distribution. In Fig. 2, a_ij

comprising A represents the transition probabilities from state S_i to state S_j, i.e., a_ij = P(q_{t+1} = S_j | q_t = S_i), where we denote the actual state at time t as q_t, and b_j(o) comprising B represents the observation probability distributions at S_j.

Fig. 2. Examples of HMMs. (a) Ergodic HMM with four states. (b) Left–right HMM with three states.

If a mixture of M Gaussian densities is used, b_j(o) can be expressed as

b_j(o) = \sum_{m=1}^{M} w_{jm} N_{jm}(o; \mu_{jm}, \Sigma_{jm}) \quad (5)

where w_jm represents the mixture coefficients for the mth Gaussian mixture in the state S_j and μ_jm and Σ_jm are the mean vectors and the covariance matrices for the mth Gaussian mixture in the state S_j. Given the observation sequences, HMM training involves determining the probability distribution set λ_HMM = {A, B, π} so as to maximize the probability of the observation sequences for the given model. Many different training algorithms are available, e.g., the Baum–Welch algorithm is widely used in speech recognition applications [13].
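As a minimal sketch of the observation density in (5), the following evaluates a mixture-of-Gaussians density for one state; the two-component mixture and its parameters are invented for illustration (the paper itself later uses a single Gaussian per state):

```python
import numpy as np

def gaussian_pdf(o, mu, Sigma):
    """Multivariate normal density N(o; mu, Sigma)."""
    d = len(mu)
    diff = o - mu
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

def b_j(o, weights, mus, Sigmas):
    """Eq. (5): mixture-of-Gaussians observation density for one state."""
    return sum(w * gaussian_pdf(o, mu, S)
               for w, mu, S in zip(weights, mus, Sigmas))

# Hypothetical two-component mixture in a 2-D observation space.
weights = [0.6, 0.4]
mus = [np.zeros(2), np.ones(2)]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]
p_mix = b_j(np.array([0.5, 0.5]), weights, mus, Sigmas)
```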

As a first step toward using HMMs for movement generation, the observation vector must be defined; for this purpose, we use the 3-D Cartesian position and velocity of the end effector, which can be straightforwardly obtained from the forward kinematics once the joint angles and kinematic parameters of the human links are available. Considering that in most cases, the link length proportions of the humanoid robot will differ from those of the human demonstrator, we use the Cartesian position normalized by the sum of the link lengths, i.e., if the Cartesian position with respect to the fixed frame is p = (X, Y, Z) and the normalized position is \bar{p} = (\bar{X}, \bar{Y}, \bar{Z}), then the observation vector is chosen to be o = (\bar{X}, \bar{Y}, \bar{Z}, \dot{\bar{X}}, \dot{\bar{Y}}, \dot{\bar{Z}}) ∈ ℝ^6. For the observation density, we use a single Gaussian density rather than a mixture of multiple Gaussians, considering that the training observation sequences are necessarily limited in practice.

Next, the type of HMM must be decided. The most general type of HMM is the ergodic HMM shown in Fig. 2(a), where all of the states are mutually connected. In the ideal case, the choice of the type of HMM should not be a critical factor. For example, in speech recognition applications, where the left–right HMM shown in Fig. 2(b) is widely used according to the temporal characteristics of the spoken word, all the a_ij's that


Fig. 3. Examples of choices of the number of hidden states for hand movements, namely, hand raising and the figure "3."

do not correspond to the left–right HMM would vanish during the training process, even for the ergodic HMM. However, considering that the training observation sequences are noisy to some degree and their available quantity is limited in practice, choosing an appropriate type of HMM is crucial for effective training. For our application in which the observation vector is defined to be the position and velocity of the end effector, the left–right HMM shown in Fig. 2(b) is found to be an appropriate choice.

The number of hidden states must also be decided. In the extreme, the number of states can be set equal to the length of the observation vector sequence [13]; however, this is not recommended for the obvious reasons of limited training data and excessive computational requirements at the training stage. In the case of speech recognition, the number of hidden states typically corresponds to the number of phonemes in a specific word. Similarly, we can select the number of hidden states according to the spatial properties of a specific movement. We heuristically split the specific movement into the initial and final states and a minimal number of intermediate states that are representative of the movement characteristics. Fig. 3 shows examples of such choices of the number of hidden states for hand movements that are considered in our case studies, i.e., hand raising and a gesture representing the figure "3." We note that our method is not bound to any particular type of movement segmentation and that any number of methods based on, e.g., curvature considerations, information-theoretic criteria, such as the minimum description length, etc., can be employed.
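A left–right topology of the kind shown in Fig. 2(b) can be initialized as in the sketch below; the 0.5 self-loop/advance split and the absorbing final state are arbitrary starting guesses of ours, not the paper's trained values:

```python
import numpy as np

def left_right_hmm(n_states):
    """Initial parameters for a left-right HMM in which each state may
    only loop on itself or advance to the next state."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = A[i, i + 1] = 0.5      # self-loop / advance split
    A[-1, -1] = 1.0                      # absorbing final state
    pi = np.zeros(n_states)
    pi[0] = 1.0                          # always start in the first state
    return A, pi

A_lr, pi_lr = left_right_hmm(4)
```

Training (e.g., Baum–Welch) would then re-estimate the nonzero entries while the structural zeros, which encode the left–right constraint, remain zero.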

For HMM training, the human motion capture data used to extract the movement basis elements are also used to obtain the observation sequences. An observation sequence O = {o1, o2, . . . , oM} is thus available for each motion capture execution, giving N sample observation sequences in total for HMM training. As a result of training, multiple human demonstrations are generalized, and the essential characteristics of a specific movement are encoded in the probability distribution set λHMM = {A, B, π}.

B. Movement Optimization

Once the trained HMM and the principal components extracted from the human motion capture data are available, optimal movements using the principal components as movement basis elements can be determined that yield the highest probability for the trained HMM. This step can be expressed in the form of an optimization involving the trained HMM, i.e.,

max_c P(O|λHMM)    (6)

Fig. 4. Four joints of the humanoid arm.

where c denotes the weights in the linear combination of the PCA-based basis elements and O is the observation vector sequence [computed from the forward kinematic equations with the joint angle trajectories obtained from the linear combination of the principal components for each joint as given by (3)], which is subject to appropriate boundary joint values at the initial and final configurations. When the joint limits of the humanoid robot differ from those of the human demonstrator, the joint limit constraints can easily be inserted into the optimization process. The boundary joint values and joint limit constraints are, respectively, treated as linear equality and nonlinear inequality constraints in the optimization. The forward procedure [13] is used to calculate P(O|λHMM).
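The forward procedure referenced here can be sketched in a few lines. The implementation below (Python/NumPy in the log domain for numerical stability; an illustration of ours, not the authors' Matlab code) computes log P(O|λHMM) from the log transition matrix, the log initial distribution, and the per-frame log emission densities.

```python
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_A, log_pi, log_b):
    """Forward procedure [13]: log P(O | lambda) for an HMM.

    log_A  : (N, N) log transition matrix, log_A[i, j] = log P(s_j | s_i)
    log_pi : (N,)   log initial state distribution
    log_b  : (T, N) log emission densities, log_b[t, i] = log p(o_t | s_i)
    """
    T, N = log_b.shape
    alpha = log_pi + log_b[0]                          # initialization
    for t in range(1, T):                              # induction
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_b[t]
    return logsumexp(alpha)                            # termination
```

For a left–right HMM, the zero transition probabilities simply become −inf in the log matrix and drop out of the sums.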

To reduce the optimization search space, we use the projected values of the human motion data onto the selected principal components. For each principal component of a specific joint, there exist N projected values of the original human sample movement matrix X ∈ R^(M×N). In order for the resulting movement to be similar to the sample human movements, each weighting coefficient of (3) should be close to the corresponding projected values of the human motion data. We therefore restrict the search space of each weighting coefficient according to |ci| ≤ pi,max · (1 + δ), where pi,max is the maximum absolute value of the N projected values of the human movements onto the corresponding principal component and δ is a small positive number that allows the new movement to exceed the range of the original human movements. For the following case studies, δ is set to 0.3. The weighting coefficients not associated with the principal components, such as c4 in (3), are restricted to lie between −(1/2)π and (1/2)π.
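As a sketch of this search-space restriction (the function name and array layout are our assumptions), the bounds |ci| ≤ pi,max · (1 + δ) follow directly from the projections of the sample movement matrix onto the selected principal components:

```python
import numpy as np

def coefficient_bounds(X, V, delta=0.3):
    """Search-space bounds for the PCA weighting coefficients.

    X : (M, N) matrix of N sample joint trajectories of length M
    V : (M, k) selected principal components, one per column
    Returns (k,) bounds b with |c_i| <= b_i = p_i,max * (1 + delta),
    where p_i,max is the largest |projection| onto component i.
    """
    P = V.T @ X                        # (k, N) projections of the samples
    p_max = np.abs(P).max(axis=1)      # p_i,max for each component
    return p_max * (1.0 + delta)
```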

IV. NUMERICAL AND EXPERIMENTAL CASE STUDIES

In this section, we examine the feasibility of our proposed movement generation framework via numerical and experimental case studies involving several arm movements. We consider a 4-DOF humanoid arm, consisting of a 3-DOF shoulder and a 1-DOF elbow, with the wrist assumed fixed (see Fig. 4). To test the adaptability of our framework to variations in the link length proportions, three different humanoid models are used in the case studies, as shown in Table I. The forearm length represents the length between the elbow and the reference point of the hand. The first humanoid model has the same link length proportions as the human demonstrator. Simple arm movements, including hand raising and a figure “3” gesture, are


TABLE I
THREE HUMANOID MODELS USED IN THE SIMULATION. ALL UNITS ARE IN MILLIMETERS

considered in the case studies. The Matlab HMM toolbox [25] is used throughout for all calculations related to our HMMs.
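For illustration, the forward kinematics of the 4-DOF arm of Fig. 4 can be sketched as below (Python/NumPy; the z–y–x shoulder angle convention and the downward rest pose are our assumptions, since the paper does not specify the joint axes):

```python
import numpy as np

def rot(axis, th):
    """Elementary rotation matrix about the x, y, or z axis."""
    c, s = np.cos(th), np.sin(th)
    if axis == 'x':
        return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])
    if axis == 'y':
        return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def hand_position(q, l_upper, l_fore):
    """Hand position of a 4-DOF arm: 3-DOF shoulder (assumed z-y-x
    angles q[0:3]) and 1-DOF elbow (q[3], about the local y axis);
    at q = 0 the arm hangs straight down."""
    R_s = rot('z', q[0]) @ rot('y', q[1]) @ rot('x', q[2])
    p_elbow = R_s @ np.array([0.0, 0.0, -l_upper])
    R_e = R_s @ rot('y', q[3])
    return p_elbow + R_e @ np.array([0.0, 0.0, -l_fore])
```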

A. Hand Raising

We first consider a simple hand-raising motion. The human demonstrator executes this motion 50 times, and the human joint angle trajectories are calculated from the recorded 3-D position data of the markers. The principal components and observation sequences are then extracted from the obtained joint angle trajectories. We use a left–right HMM with five hidden states defined as follows.

State 1) The hand is at the initial lower position, with the elbow outstretched.

State 3) The hand is in the middle position, with the elbow bent.

State 5) The hand is in the final vertical position, with the elbow outstretched.

States 2) and 4) are defined in the obvious way as the intermediate spatial positions of the adjacent states.

Fig. 5 shows both the training data and the training results of the associated HMM for the hand-raising motion. In Fig. 5(b), the estimated mean vectors of the observation Gaussian density for each state are plotted as asterisk markers, together with the mean of the observation sequences. Only the first three components of the mean vector, corresponding to the normalized Cartesian position of the hand, can be displayed. As shown in Fig. 3, we can see that the mean vectors are well placed at the positions presumed during the design of the associated HMM.

Fig. 5(c) shows the log-likelihoods of each training observation sequence for the trained HMM, i.e., P(O|λHMM). In Fig. 5(d), for comparison, the maximum and minimum probability training sequences for the trained hand-raising HMM are displayed with the mean observation sequence. Again, only the first three components, i.e., the normalized Cartesian hand trajectories, are plotted. The maximum probability observation sequence, whose log-likelihood for the trained HMM is −59.5, appears very similar to the mean observation sequence, whose log-likelihood is −36.1, whereas the minimum probability observation sequence, whose log-likelihood is −584.3, does not. These results suggest that the HMM has been adequately trained for the hand-raising motion.

Before showing the results of our movement optimization, we illustrate, with examples, the problems that arise when directly applying human motion capture data to humanoids. First, the resulting motions may exceed joint limits, as is the case when the humanoid elbow joint limit is set to (2/3)π in Fig. 6(a). Fig. 6(b) shows the resulting Cartesian hand trajectories obtained from the human joint trajectories for our three humanoid models. The log-likelihood of the observation sequence for humanoid model 1 for the trained

Fig. 5. (a) Cartesian hand trajectories of human demonstrations of the hand-raising motion. (b) Results of HMM training for the hand-raising motion. The thick line represents the mean of the observation sequences. (c) Log-likelihoods of training observation sequences for the trained HMM. (d) Normalized Cartesian hand trajectories of the (A) mean, (B) maximum probability, and (C) minimum probability hand-raising motions.

HMM is −197.5, whereas those for humanoid models 2 and 3 are, respectively, −308.5 and −328.1. These numbers reflect the diminished naturalness of the movement arising from the difference in length proportions.


Fig. 6. Examples of problems encountered when directly applying human motion capture data to humanoids. (a) Human motion capture data may exceed the different humanoid joint limits. The dotted line represents the humanoid’s elbow joint limit. (b) Movement may appear unnatural due to the different link length proportions. The lines A, B, and C represent the normalized Cartesian hand trajectories of the hand-raising motion applied, respectively, to humanoid models 1, 2, and 3.

TABLE II
PERCENTAGE OF DATA EXPLAINED BY EACH PRINCIPAL COMPONENT

We now compare the movements obtained from the optimization stage with the previous results. Table II shows the percentage of data explained by each principal component for our basic movements (hand raising and figure “3”). For both cases, we use the four dominant principal components in the optimization, which explain at least 80% of the original motion capture data. The joint angle values at the initial and final configurations are imposed as equality constraints in the optimization.
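The selection of dominant components can be sketched as follows (Python/NumPy, our illustration; the function picks the smallest number of principal components whose cumulative explained variance reaches a threshold such as the 80% used here):

```python
import numpy as np

def dominant_components(X, threshold=0.8):
    """PCA of the sample movement matrix X (M x N, one trajectory per
    column): return the smallest set of principal components whose
    cumulative explained-variance ratio reaches `threshold`."""
    Xc = X - X.mean(axis=1, keepdims=True)         # center across samples
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)                    # explained-variance ratios
    k = int(np.searchsorted(np.cumsum(ratio), threshold) + 1)
    return U[:, :k], ratio[:k]                     # components and ratios
```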

We first examine the effects of different joint limits on optimized hand-raising movements for humanoid model 1. The solid and dashed lines of Fig. 7(a) represent, respectively, the optimized joint angle trajectories for the cases without and with an elbow joint limit (set to (2/3)π in the latter case). The corresponding normalized Cartesian trajectories of the hand position

Fig. 7. (a) Joint angle trajectories of the optimized hand-raising motions (solid line) without and (dashed line) with the elbow joint limit. The dotted line represents the elbow joint limit. (b) Normalized Cartesian hand trajectories of the optimized hand-raising motion (A) without and (B) with elbow joint limits.

are shown in Fig. 7(b). Please see video attachment 1 for the corresponding moving picture of the optimized movements. This will be available at http://ieeexplore.ieee.org. The log-likelihood of the movement is −56.3 without the elbow joint limit and −95.9 with it.

We next examine the results of the optimized motions for different link length proportions. We first optimize the hand-raising motion for humanoid model 1 for the prescribed boundary values of the joints. We then determine the optimal motions for humanoid models 2 and 3, using boundary values calculated such that the initial and final normalized Cartesian hand positions are the closest (in the sense of least-squares distance) to those for humanoid model 1. From the results shown in Fig. 8 and video attachment 2, the similarities between the resulting normalized Cartesian hand trajectories are immediately apparent in spite of the differences evident in the corresponding joint angle trajectories. This video attachment will be available at http://ieeexplore.ieee.org. The log-likelihoods of the optimized motions for humanoid models 1, 2, and 3 are, respectively, −56.3, −62.3, and −67.4.

B. Figure “3” Gesture

We now apply our proposed framework to an arm gesture tracing the figure “3.” The entire procedure for movement generation is identical to the previous cases.

A left–right HMM with seven states is used (see Fig. 3). Again using 50 executions by the human demonstrator, the results of the HMM training are shown in Fig. 9. Fig. 9(b) shows that our choice of the number of states for the figure “3”


Fig. 8. (a) Joint angle trajectories for the optimized hand-raising motions. The solid, dashed, and dotted lines represent, respectively, the results for humanoid models 1, 2, and 3. (b) Normalized Cartesian hand trajectories of the optimized hand-raising motions. The lines A, B, and C represent, respectively, the results for humanoid models 1, 2, and 3.

gesture is appropriate. The log-likelihoods of the maximum and minimum probability figure “3” gestures are, respectively, −162.7 and −420.8, whereas that of the mean observation sequence is −141.8. The corresponding normalized Cartesian hand trajectories are shown in Fig. 9(d).

The figure “3” gesture generation results are shown in Fig. 10 and video attachment 3. This will be available at http://ieeexplore.ieee.org. The boundary joint values for each humanoid model are set following the hand-raising case. Despite the differences in joint angle trajectories for the three humanoid models, the normalized Cartesian hand trajectories are almost identical. The log-likelihoods of the resulting gestures for humanoid models 1, 2, and 3 are, respectively, −138.9, −146.2, and −143.3, suggesting that the training has been adequate.

C. Application to the KIBO Humanoid Prototype

We now demonstrate the results of our movement generation procedure using the KIBO humanoid robot prototype. Similar to the robot used for our earlier simulation studies, KIBO also has a 4-DOF arm (3-DOF shoulder and 1-DOF elbow). KIBO’s upper arm and forearm lengths are both 120 mm, and its elbow joint limit is set to (110/180)π. The two simple movements used in the previous case studies are generated again for KIBO. The initial and final boundary joint values for each movement are specified in the same way as in the previous case studies.

The optimized motions are shown in Fig. 11 and video attachment 4. This will be available at http://ieeexplore.ieee.org.

Fig. 9. (a) Cartesian hand trajectories of human demonstrations for the figure “3” gesture. (b) Results of HMM training. The thick line represents the mean of the observation sequences. (c) Log-likelihoods of training observation sequences for the trained HMM. (d) Cartesian hand trajectories for the (A) mean, (B) maximum probability, and (C) minimum probability motions.

The white lines represent the 2-D trajectories of the reference point on the hand. For the resulting hand-raising motion shown in the top row of Fig. 11, it can be verified that the resulting motion is natural even with the elbow joint limit constraint. The


Fig. 10. (a) Joint angle trajectories of the optimized figure “3” gestures. The solid, dashed, and dotted lines represent, respectively, the results for humanoid models 1, 2, and 3. (b) Normalized Cartesian hand trajectories of the optimized figure “3” gestures. The lines A, B, and C represent, respectively, the results for humanoid models 1, 2, and 3.

Fig. 11. Movement generation results for the humanoid robot prototype KIBO. (Top) Hand raising. (Bottom) Figure “3” gesture.

resulting figure “3” gesture, which is shown in the bottom row of Fig. 11, also appears quite satisfactory. The log-likelihoods of the optimized hand-raising and figure “3” gestures are, respectively, −147.1 and −138.4 (also summarized in Table III). The log-likelihoods expressed in italics denote cases where the elbow joint limit constraint was in effect. It should be noted that comparing log-likelihoods across different movement classes is not meaningful, because each movement class possesses its own trained HMM.

The average computational times for generating each movement are shown in Table IV. The optimization is performed on a Pentium Dual-Core 2.8-GHz processor with 2-GB memory using the “fmincon” function of the Matlab Optimization Toolbox. The optimization termination tolerances on the objective

TABLE III
LOG-LIKELIHOODS FOR THE TRAINED HMMs

TABLE IV
AVERAGE COMPUTATIONAL TIMES

function value and the weighting coefficients are all set to 0.01. The computational times are averaged over ten trials for each movement with varying boundary joint values. From Table IV, we can see that the computational time increases slightly in proportion to the number of states.

An analysis reveals that, in the evaluation of the objective function, most of the computational time is spent calculating the forward kinematics of the observation vector, particularly in the curve fitting and differentiation procedure used to obtain the requisite velocities for the hand trajectory. In our simulations, this stage is considerably sped up, without any noticeable loss in accuracy, by using finite-difference approximated derivatives. It should be noted that the main emphasis of this paper is not on minimizing computation times but on evaluating the validity of the algorithm and on a qualitative analysis of the generated movements. We expect that further speedup can be attained via, for example, more efficient approximation techniques, customized optimization algorithms, or implementing our algorithm directly in C or C++.
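The finite-difference speedup mentioned above amounts to replacing curve fitting and analytic differentiation with something like the following (Python/NumPy sketch of ours; `np.gradient` uses second-order central differences in the interior of the trajectory):

```python
import numpy as np

def hand_velocities(positions, dt):
    """Finite-difference approximation of hand velocities, replacing
    curve fitting followed by analytic differentiation.

    positions : (T, 3) Cartesian hand positions sampled every dt seconds
    Returns (T, 3) velocity estimates.
    """
    return np.gradient(positions, dt, axis=0)
```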

D. Comparison With Other Algorithms

We now compare our movement generation procedure with other natural movement generation algorithms reported in the literature. In [24], we have already shown that our proposed approach can yield far more natural movements than linear interpolation or joint torque minimization. Here, among the various algorithms introduced in [5], the following four are compared: the minimum end-effector velocity and acceleration movements (MV and MA) in 3-D Cartesian space and the minimum angular velocity and acceleration movements (MAV and MAA) in joint space. Lim et al. [12] report that it is more efficient to use, as basis elements in the movement optimization, the principal components of human joint trajectories rather than, for example, piecewise B-spline polynomials; we therefore use the principal components as the movement basis elements for these four algorithms as well.

The 3-D Cartesian trajectories of the resulting movements generated via each method, under the same boundary conditions for each movement (hand-raising and figure “3” gestures), are shown in Fig. 12. The log-likelihoods of the resulting hand-raising movements generated via our approach, MV, MA, MAA, and MAV for the trained HMM are, respectively, −41.3, −237.5, −177.7, −339.1, and −258.5. We can also see that the trajectories of the hand-raising movements generated via


Fig. 12. Normalized Cartesian hand trajectories of (a) hand-raising movements and (b) figure “3” gestures generated via (A) our framework; (B) hand velocity and (C) acceleration minimization; and (D) joint velocity and (E) acceleration minimization.

our approach shown in Fig. 12(a) are much more similar to the mean trajectories shown in Fig. 5(b) than are the trajectories of the other algorithms.

The advantage of our proposed movement generation framework over MV, MA, MAV, and MAA can be seen even more clearly in the results of the figure “3” gesture shown in Fig. 12(b). Although the principal components representing the joint space characteristics of the figure “3” gesture are used as the movement basis elements, none of the other methods considers the task space characteristics of the gesture in the optimization process; the consequence is an unnatural-looking movement, as shown in Fig. 12(b). The log-likelihoods of the resulting figure “3” gestures generated via our approach, MV, MA, MAA, and MAV are, respectively, −163.7, −1019.1, −1241.9, −956.1, and −1135.4, further supporting our claims of naturalness.

E. Discussion

Beyond the simple arm movements demonstrated in our experiments, our framework can in principle be extended to more complex movements of longer duration that involve a greater number of states. To assess the associated computational complexity, we recall that the complexity of the forward procedure for computing P(O|λHMM) is O(N²T), where N and T denote, respectively, the number of states and the length of the observation sequence [13]. Based on this result, it would seem that, in general, it is better to break up movements involving a large number of states into a concatenation of smaller (and simpler)

movement segments. Such a movement segmentation can be performed manually or automatically, as done in [26] and [27]. While it is difficult to provide a precise quantitative projection, because it depends on factors such as the kinematic structure and the geometric and topological characteristics of the movement trajectory, for our arm movement studies, we have found that limiting the number of states for each movement to below ten is reasonably effective.

In our HMM training procedure, we use the 3-D Cartesian position and velocity of the end effector, normalized by the total length of the links, as the observation vector. Via such a normalization, it is possible to apply the trained HMM to other humanoids with different link length proportions without retraining the HMM. The observation vector normalization also allows us to use the motion capture data of multiple human subjects for HMM training without degrading the quality of the resulting humanoid movement. For the movement basis element extraction via PCA, using multiple human subjects should not significantly affect the quality of the generated movements, considering that the joint angle trajectories should not differ significantly, provided that they belong to the same movement class.
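The normalization described here is simple to state in code (a sketch; the function and argument names are our own):

```python
import numpy as np

def normalized_observation(p, v, link_lengths):
    """Observation vector: hand position and velocity normalized by the
    total arm length, so a trained HMM transfers across humanoids with
    different link length proportions.

    p, v         : (3,) Cartesian hand position and velocity
    link_lengths : iterable of link lengths (e.g., upper arm, forearm)
    Returns the (6,) normalized observation vector.
    """
    L = float(np.sum(link_lengths))
    return np.concatenate([p / L, v / L])
```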

Finally, we remark that our proposed framework is best suited for situations where natural movements are a high priority, e.g., gestural human–robot interaction and entertainment robots performing dance or athletic movements. In its current form, our framework is not well suited for situations where movement accuracy is critical, e.g., two-arm cooperative movements that interact physically or manipulate a common object. Indeed, for movements requiring any combination of precise force–position–impedance control, demanding that the movement further appear natural would seem to call for an altogether different approach. For two-arm movements that do not involve physical interaction, one straightforward way to apply our framework is to first generate one arm movement using our methodology; the second arm movement can then be obtained from the same methodology, with collision avoidance between the two arms imposed as a constraint in the associated optimization for the latter arm.

V. CONCLUSION

This paper has presented an algorithm for generating natural humanlike movements that simultaneously accounts for task and joint space characteristics in a unified and consistent fashion. Each movement class is represented as an HMM, which is trained with only the task space trajectories of human motion capture data. At the same time, basis elements in the form of joint angle trajectories for each movement primitive are extracted via PCA of the motion data. Movements are then generated by selecting the basis element weights that yield the maximum probability for the given HMM while meeting boundary conditions and other user-specified constraints.

The results obtained from our numerical and experimental case studies demonstrate the advantages of our proposed framework. The resulting movements appear substantially more natural than, for example, movements obtained from a torque minimization procedure or movements obtained by directly


applying human joint trajectory data to the humanoid platform. Another important advantage of our method is that any kinematic differences between the humanoid and human subjects, i.e., different link length proportions or joint limit constraints, are naturally accounted for in the embedded optimization procedure.

Our current efforts are focused on the following: 1) considering, in lieu of PCA, other methods for constructing the joint space basis elements, e.g., independent component analysis and nonnegative matrix factorization; 2) replacing the manual movement segmentation step in the HMM representation of a complex movement with an automatic segmentation procedure, such as one based on minimizing the description length of the movement encoding; 3) seeking ways to improve the computational efficiency of the optimization procedure; and 4) considering cyclic movements, such as walking and running, and more complex movements that arise in sports and dance.

REFERENCES

[1] D. Matsui, T. Minato, K. MacDorman, and H. Ishiguro, “Generating natural motion in an android by mapping human motion,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2005, pp. 3301–3308.

[2] M. Bennewitz, F. Faber, D. Joho, M. Schreiber, and S. Behnke, “Towards a humanoid museum guide robot that interacts with multiple persons,” in Proc. 5th IEEE-RAS Int. Conf. Humanoid Robots, 2005, pp. 418–423.

[3] L. Ren, A. Patrick, A. A. Efros, J. K. Hodgins, and J. M. Rehg, “A data-driven approach to quantifying natural human motion,” ACM Trans. Graph., vol. 24, no. 3, pp. 1090–1097, Jul. 2005.

[4] G. Simmons and Y. Demiris, “Imitation of human demonstration using a biologically inspired modular optimal control scheme,” in Proc. 4th IEEE-RAS Int. Conf. Humanoid Robots, 2004, pp. 215–234.

[5] F. E. Pollick, J. G. Hale, and M. Tzoneva-Hadjigeorgieva, “Perception of humanoid movement,” Int. J. Humanoid Robot., vol. 2, no. 3, pp. 277–300, 2005.

[6] A. Safonova, J. K. Hodgins, and N. S. Pollard, “Synthesizing physically realistic human motion in low-dimensional, behavior-specific spaces,” ACM Trans. Graph., vol. 23, no. 3, pp. 514–521, Aug. 2004.

[7] N. A. Bernstein, The Coordination and Regulation of Movements. London, U.K.: Pergamon, 1967.

[8] T. D. Sanger, “Human arm movements described by a low-dimensional superposition of principal components,” J. Neurosci., vol. 20, no. 3, pp. 1066–1072, Feb. 2000.

[9] F. A. Mussa-Ivaldi and E. Bizzi, “Motor learning through the combination of primitives,” Philos. Trans. R. Soc. Lond. B, Biol. Sci., vol. 355, no. 1404, pp. 1755–1769, Dec. 2000.

[10] O. C. Jenkins and M. Mataric, “Deriving action and behavior primitives from human motion data,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2002, pp. 2551–2556.

[11] O. C. Jenkins and M. Mataric, “Automated derivation of primitives for movement classification,” Auton. Robots, vol. 12, no. 1, pp. 39–54, Jan. 2002.

[12] B. Lim, S. Ra, and F. C. Park, “Movement primitives, principal component analysis, and the efficient generation of natural motions,” in Proc. IEEE Int. Conf. Robot. Autom., 2005, pp. 4630–4635.

[13] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.

[14] J. Yamato, J. Ohya, and K. Ishii, “Recognizing human action in time-sequential images using hidden Markov model,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recog., 1992, pp. 379–385.

[15] T. Starner and A. Pentland, “Real-time American sign language recognition using desk and wearable computer based video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 12, pp. 1371–1375, Dec. 1998.

[16] A. D. Wilson and A. F. Bobick, “Parametric hidden Markov models for gesture recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 9, pp. 884–900, Sep. 1999.

[17] J. Yang, Y. Xu, and C. S. Chen, “Hidden Markov model approach to skill learning and its application to telerobotics,” IEEE Trans. Robot. Autom., vol. 10, no. 5, pp. 621–631, Oct. 1994.

[18] J. Yang, Y. Xu, and C. S. Chen, “Human action learning via hidden Markov model,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 27, no. 1, pp. 34–44, Jan. 1997.

[19] T. Inamura, H. Tanie, and Y. Nakamura, “Keyframe compression and decompression for time series data based on the continuous hidden Markov model,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2003, pp. 1487–1492.

[20] S. Calinon and A. Billard, “Stochastic gesture production and recognition model for a humanoid robot,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2004, pp. 2769–2774.

[21] S. Calinon and A. Billard, “Recognition and reproduction of gestures using a probabilistic framework combining PCA, ICA and HMM,” in Proc. 22nd Int. Conf. Mach. Learn., 2005, pp. 105–112.

[22] S. Calinon and A. Billard, “Learning of gestures by imitation in a humanoid robot,” in Imitation and Social Learning in Robots, Humans and Animals: Behavioural, Social and Communicative Dimensions, K. Dautenhahn and C. L. Nehaniv, Eds. Cambridge, U.K.: Cambridge Univ. Press, 2007.

[23] T. Asfour, F. Gyarfas, P. Azad, and R. Dillmann, “Imitation learning of dual-arm manipulation tasks in humanoid robots,” in Proc. 6th IEEE-RAS Int. Conf. Humanoid Robots, 2006, pp. 40–47.

[24] J. Kwon and F. C. Park, “Using hidden Markov models to generate natural humanoid movements,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2006, pp. 1990–1995.

[25] K. Murphy, Hidden Markov Model (HMM) Toolbox for Matlab. [Online]. Available: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html

[26] J. Barbič, A. Safonova, J. Y. Pan, C. Faloutsos, J. K. Hodgins, and N. S. Pollard, “Segmenting motion capture data into distinct behaviors,” in Proc. Graph. Interface, 2004, pp. 185–194.

[27] D. Bouchard and N. Badler, “Semantic segmentation of motion capture using Laban movement analysis,” in Proc. Intell. Virtual Agents, 2007, vol. 4722, pp. 37–44.

Junghyun Kwon received the B.S. and Ph.D. degrees in mechanical engineering from Seoul National University, Seoul, Korea, in 2002 and 2008, respectively.

He is currently with the School of Mechanical and Aerospace Engineering, Seoul National University. His research interests include visual tracking and estimation in robotics-related applications, movement recognition and generation for human–robot interaction, and 2-D/3-D image visualization for medical imaging and nondestructive testing.

Frank C. Park received the B.S. degree in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, in 1985, and the Ph.D. degree in applied mathematics from Harvard University, Cambridge, MA, in 1991.

From 1991 to 1994, he was an Assistant Professor of mechanical and aerospace engineering with the University of California, Irvine. Since 1995, he has been with the School of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea, where he is currently a Professor. He is an

Associate Editor of the American Society of Mechanical Engineers Journal of Mechanisms and Robotics and the Parts Editor of the Springer-Verlag Handbook of Robotics. His research interests are robotics, mathematical systems theory, and related areas of applied mathematics.

Dr. Park is a 2007–2008 IEEE Robotics and Automation Society Distinguished Lecturer and serves as Senior Editor of the IEEE TRANSACTIONS ON ROBOTICS and as the Secretary of the IEEE Robotics and Automation Society.