
Style machines

Matthew Brand, Mitsubishi Electric Research Laboratory

Aaron Hertzmann, NYU Media Research Laboratory

A pirouette and promenade in five synthetic styles drawn from a space that contains ballet, modern dance, and different body types. The choreography is also synthetic. Streamers show the trajectory of the left hand and foot.

Abstract

We approach the problem of stylistic motion synthesis by learning motion patterns from a highly varied set of motion capture sequences. Each sequence may have a distinct choreography, performed in a distinct style. Learning identifies common choreographic elements across sequences, the different styles in which each element is performed, and a small number of stylistic degrees of freedom which span the many variations in the dataset. The learned model can synthesize novel motion data in any interpolation or extrapolation of styles. For example, it can convert novice ballet motions into the more graceful modern dance of an expert. The model can also be driven by video, by scripts, or even by noise to generate new choreography and synthesize virtual motion capture in many styles.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Animation; I.2.9 [Artificial Intelligence]: Robotics—Kinematics and Dynamics; G.3 [Mathematics of Computing]: Probability and Statistics—Time series analysis; E.4 [Data]: Coding and Information Theory—Data compaction and compression; J.5 [Computer Applications]: Arts and Humanities—Performing Arts

Keywords: animation, behavior simulation, character behavior.

1 Introduction

It is natural to think of walking, running, strutting, trudging, sashaying, etc., as stylistic variations on a basic motor theme. From a directorial point of view, the style of a motion often conveys more meaning than the underlying motion itself. Yet existing animation tools provide little or no high-level control over the style of an animation.

In this paper we introduce the style machine: a statistical model that can generate new motion sequences in a broad range of styles, just by adjusting a small number of stylistic knobs (parameters). Style machines support synthesis and resynthesis in new styles, as well as style identification of existing data. They can be driven by many kinds of inputs, including computer vision, scripts, and noise. Our key result is a method for learning a style machine, including the number and nature of its stylistic knobs, from data. We use style machines to model highly nonlinear and nontrivial behaviors such as ballet and modern dance, working with very long unsegmented motion-capture sequences and using the learned model to generate new choreography and to improve a novice's dancing.

Style machines make it easy to generate long motion sequences containing many different actions and transitions. They can offer a broad range of stylistic degrees of freedom; in this paper we show early results manipulating gender, weight distribution, grace, energy, and formal dance styles. Moreover, style machines can be learned from relatively modest collections of existing motion capture; as such they present a highly generative and flexible alternative to motion libraries.

Potential uses include:

Generation: Beginning with a modest amount of motion capture, an animator can train and use the resulting style machine to generate large amounts of motion data with new orderings of actions.

Casts of thousands: Random walks in the machine can produce thousands of unique, plausible motion choreographies, each of which can be synthesized as motion data in a unique style.

Improvement: Motion capture from unskilled performers can be resynthesized in the style of an expert athlete or dancer.

Retargetting: Motion capture data can be resynthesized in a new mood, gender, energy level, body type, etc.

Acquisition: Style machines can be driven by computer vision, data-gloves, even impoverished sensors such as the computer mouse, and thus offer a low-cost, low-expertise alternative to motion capture.

2 Related work

Much recent effort has addressed the problem of editing and reuse of existing animation. A common approach is to provide interactive animation tools for motion editing, with the goal of capturing the style of the existing motion while editing the content. Gleicher [11] provides a low-level interactive motion editing tool that searches for a new motion that meets some new constraints while minimizing the distance to the old motion. A related optimization method is also used to adapt a motion to new characters [12]. Lee et al. [15] provide an interactive multiresolution motion editor for fast, fine-scale control of the motion. Most editing

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. SIGGRAPH 2000, New Orleans, LA USA © ACM 2000 1-58113-208-5/00/07 ...$5.00



Figure 1: Schematic illustrating the effects of cross-entropy minimization. [A] Three simple walk cycles projected onto 2-space; each datapoint represents the body pose observed at a given time. [B] In conventional learning, one fits a single model to all the data (ellipses indicate state-specific isoprobability contours; arcs indicate allowable transitions). But here learning is overwhelmed by variation among the sequences, and fails to discover the essential structure of the walk cycle. [C] Individually estimated models are hopelessly overfit to their individual sequences, and will not generalize to new data. In addition, they divide up the cycle differently, and thus cannot be blended or compared. [D] Cross-entropy minimized models are constrained to have similar qualitative structure and to identify similar phases of the cycle. [E] The generic model abstracts all the information in the style-specific models; various settings of the style variable v will recover all the specific models plus any interpolation or extrapolation of them.

systems produce results that may violate the laws of mechanics; Popovic and Witkin [16] describe a method for editing motion in a reduced-dimensionality space in order to edit motions while maintaining physical validity. Such a method would be a useful complement to the techniques presented here.

An alternative approach is to provide more global animation controls. Signal processing systems, such as those described by Bruderlin and Williams [7] and Unuma et al. [20], provide frequency-domain controls for editing the style of a motion. Witkin and Popovic [22] blend between existing motions to provide a combination of motion styles. Rose et al. [18] use radial basis functions to interpolate between and extrapolate around a set of aligned and labeled example motions (e.g., happy/sad and young/old walk cycles), then use kinematic solvers to smoothly string together these motions. Similar functionality falls out of our framework.

Although such interactive systems provide fine control, they rely on the labors of skilled animators to produce compelling and convincing results. Furthermore, it is generally difficult to produce a new motion that is substantially different from the existing motions, in style or in content (e.g., to convert by hand a ponderous walk into a jaunty dance).

The above signal-processing methods also require that the example motions be time-warped; in other words, that sequential correspondences can be found between each component of each motion. Unfortunately, it is rarely the case that any set of complex motions will have this property. Style machines automatically compute flexible many-to-many correspondences between sequences using fast dynamic programming algorithms.

Our work unites two themes that have separate research histories in motion analysis: estimation of dynamical (in the sense of time-evolving) models from examples, and style and content separation. Howe et al. [14] analyze motion from video using a mixture-of-Gaussians model. Grzeszczuk et al. [13] learn control and physical systems from physical simulation. Several authors have used hidden Markov models to analyze and synthesize motion. Bregler [6] and Brand [5] use HMMs to recognize and analyze motion from video sequences. Brand [4] analyzes and resynthesizes animation of human speech from example audio and video. With regard to styles, Wilson and Bobick [21] use parametric HMMs, in which motion recognition models are learned from user-labeled styles. Tenenbaum and Freeman [19, 10] separate style from content in general domains under a bilinear model, thereby modeling factors that have individually linear but cooperatively multiplicative effects on the output, e.g., the effects of lighting and pose in images of faces.

These style/content models depend on large sets of hand-labeled and hand-aligned samples (often exponential in the number of stylistic degrees of freedom, DOFs) plus an explicit statement of what the stylistic DOFs are. We now introduce methods for extracting this information directly and automatically from modest amounts of data.

3 Learning stylistic state-space models

We seek a model of human motion from which we can generate novel choreography in a variety of styles. Rather than attempt to engineer such a model, we will attempt to learn it, i.e., to extract from data a function that approximates the data-generating mechanism.

We cast this as an unsupervised learning problem, in which the goal is to acquire a generative model that captures the data's essential structure (traces of the data-generating mechanism) and discards its accidental properties (particulars of the specific sample). Accidental properties include noise and the bias of the sample. Essential structure can itself be divided into two components, which we will call structure and style. For example, walking, running, strutting, etc., are all stylistic variations on bipedal locomotion, a dynamical system with particularly simple temporal structure: a deterministic loop.

It is up to the modeler to make the structure/style distinction. State-space representations are very useful here: we take the structure of bipedal locomotion to be a small set of dynamically significant qualitative states along with the rules that govern changes of state. We take style to be variations in the mapping from qualitative states to quantitative observations. For example, shifting one's weight load onto the right leg is a dynamically significant state common to all forms of bipedal locomotion, but it will look quite different in running, trudging, etc.

An appropriate state-space model for time-series data is the hidden Markov model (HMM). An HMM is a probabilistic finite-state machine consisting of a set of discrete states, state-to-state transition probabilities, and state-to-signal emission probabilities; in this paper, each state has a Gaussian distribution over a small space of full-body poses and motions. (See §A for a concise HMM tutorial; see [17] for a detailed tutorial.) We will add to the HMM a multidimensional style variable v that can be used to vary its parameters, and call the result a stylistic HMM (SHMM), or time-series style machine. (See §B for formal definitions.) The SHMM defines a space of HMMs; fixing the parameter v yields a unique HMM.

Here we show how to separate structure, style, and accidental properties in a dataset by minimizing entropies in the SHMM. The main advantage of separating style from structure is that we wind up with simpler, more generative models for both, and we can do so with significantly less data than required for the general learning setting. Our framework is fully unsupervised and automatically identifies the number and nature of the stylistic degrees of freedom (often much fewer than the number of variations in the dataset).


The discovered degrees of freedom lend themselves to some intuitive operations that are very useful for synthesis: style mixtures, exaggerations, extrapolations, and even analogies.

3.1 Generic and style-specific models

We begin with a family of training samples. By "family" we mean that all the samples have some generic data-generating mechanism in common, e.g., the motor program for dancing. Each sample may instantiate a different variation. The samples need not be aligned; e.g., the ordering, timing, and appearance of actions may vary from sample to sample. Our modeling goal is to extract a single parameterized model which covers the generic behavior in the entire family of training samples, and which can easily be made to model an individual style, combination of styles, or extrapolation of styles, just by choosing an appropriate setting of the style variable v.

Learning involves the simultaneous estimation of a generic model and a set of style-specific models with three objectives: (1) each model should fit its sample(s) well; (2) each specific model should be close to the generic model; and (3) the generic model should be as simple as possible, thereby maximizing the probability of correct generalization to new data. These constraints have an information-theoretic expression in eqn. 1. In the next section we will explain how the last two constraints interact to produce a third desirable property: the style-specific models can be expressed as small variations on the generic model, and the space of such variations can be captured with just a few parameters.

We first describe the use of style machines as applied to pure signal data. Details specific to working with motion-capture data are described in §4.

3.2 Estimation by entropy minimization

In learning we minimize a sum of entropies, which measure the ambiguity in a probability distribution, and cross-entropies, which measure the divergence between distributions. The principle of minimum entropy, advocated in various forms by [23, 2, 8], seeks the simplest model that explains the data, or, equivalently, the most complex model whose parameter estimates are fully supported by the data. This maximizes the information extracted from the training data and boosts the odds of generalizing correctly beyond it.

The learning objective has three components, corresponding to the constraints listed above:

1. The cross-entropy between the model distribution and statistics extracted from the data measures the model's misfit of the data.

2. The cross-entropy between the generic and a specific model measures inconsistencies in their analysis of the data.

3. The entropy of the generic model measures ambiguity and lack of structure in its analysis of the data.

Minimizing #1 makes the model faithful to the data. Minimizing #2 essentially maximizes the overlap between the generic and specific models and the congruence (similarity of support) between their hidden states. This means that the models "behave" similarly and their hidden states have similar "meanings." For example, in a dataset of bipedal motion sequences, all the style-specific HMMs should converge to similar finite-state machine models of the locomotion cycle, with corresponding states in each HMM referring to qualitatively similar poses and motions in the locomotion cycle. E.g., the ith state in each model is tuned to the poses in which the body's weight shifts onto the right leg, regardless of the style of motion (see figure 1). Minimizing #3 optimizes the predictiveness of the model by making sure that it gives the clearest and most concise picture of the data, with each hidden state explaining a clearly delineated phenomenon in the data.

Figure 2: Flattening and alignment of Gaussians by minimization of entropy and cross-entropy, respectively. Gaussian distributions are visualized as ellipsoid iso-probability contours.

Putting this all together gives the following learning objective function:

    θ* = argmin_θ [ H(ω) + D(ω ‖ θ) + H(θ) + D(θ ‖ θ̄) ]        (1)

The first two terms are the negative log-likelihood: H(ω) is the data entropy and D(ω ‖ θ) is the misfit (constraint 1). The last two terms are the negative log-prior: H(θ) is the model entropy (constraint 3) and D(θ ‖ θ̄) is the incongruence (constraint 2). Here θ is a vector of model parameters; ω is a vector of expected sufficient statistics describing the data X; θ̄ parameterizes a reference model (e.g., the generic); H(·) is an entropy measure; and D(·‖·) is a cross-entropy measure. Eqn. 1 can also be formulated as a Bayesian posterior P(θ|ω) ∝ P(ω|θ)·P(θ), with likelihood function P(X|θ) ∝ e^{−H(ω)} e^{−D(ω‖θ)} and prior P(θ) ∝ e^{−H(θ)} e^{−D(θ‖θ̄)}. The data entropy term, not mentioned above, arises in the normalization of the likelihood function; it measures ambiguity in the data-descriptive statistics that are calculated vis-à-vis the model.
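For the Gaussian emission distributions used here, the entropy and cross-entropy terms have simple closed forms in one dimension. The following toy computation (ours, not from the paper) shows the two quantities the prior pushes on: narrowing a Gaussian lowers its entropy, and pulling a specific model toward the generic one lowers their cross-entropy.

```python
import math

def gauss_entropy(sigma):
    # H(N(mu, sigma^2)) = 0.5 * log(2*pi*e*sigma^2)
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)

def gauss_cross_entropy(mu_p, sigma_p, mu_q, sigma_q):
    # H(p, q) = E_p[-log q] for 1-D Gaussians p and q
    return (0.5 * math.log(2 * math.pi * sigma_q**2)
            + (sigma_p**2 + (mu_p - mu_q)**2) / (2 * sigma_q**2))

# Narrowing a Gaussian lowers its entropy (term H(theta))...
assert gauss_entropy(0.5) < gauss_entropy(1.0)
# ...and moving a specific model's mean toward the generic one
# lowers their cross-entropy (term D(theta || theta_bar)).
assert gauss_cross_entropy(0.1, 1.0, 0.0, 1.0) < gauss_cross_entropy(2.0, 1.0, 0.0, 1.0)
```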

3.3 Effects of the prior

It is worth examining the prior, because this is what will give the final model θ* its special style-spanning and generative properties.

The prior term e^{−H(θ)} expresses our belief in the parsimony principle: a model should give a maximally concise and minimally uncertain explanation of the structure in its training set. This is an optimal bias for extracting as much information as possible from the data [3]. We apply this prior to the generic model. The prior has an interesting effect on the SHMM's emission distributions over pose and velocity: it gradually removes dimensions of variation, because flattening a distribution is the most effective way to reduce its volume and therefore its entropy (see figure 2).

The prior term e^{−D(θ‖θ̄)} keeps style models close and congruent to the generic model, so that corresponding hidden states in two models have similar behavior. In practice we assess this prior only on the emission distributions of the specific models, where it has the effect of keeping the variation-dependent emission distributions clustered tightly around the generic emission distribution. Consequently it minimizes distance between corresponding states in the models, not between the entire models. We also add a temperature-weighted term that allows us to vary the strength of the cross-entropy prior in the course of optimization.

By constraining generic and style-specific Gaussians to overlap, and constraining both to be narrow, we cause the distribution of state-specific Gaussians across styles to have a small number of degrees of freedom. Intuitively, if two Gaussians are narrow and overlap, then they must be aligned in the directions of their narrowness (e.g., two overlapping disks in 3-space must be co-planar). Figure 2 illustrates. The more dimensions in which the overlapping Gaussians are flat, the fewer degrees of freedom they have relative to each other. Consequently, as style-specific models are drawn toward the generic model during training, all the models settle into a parameter subspace (see figure 3). Within this subspace, all the variation between the style-specific models can be described with just a few parameters. We can then identify those degrees of freedom by solving for a smooth low-dimensional manifold that contains the parameterizations of all the style-specific models. Our experiments showed that a linear subspace usually provides a good low-dimensional parameterization of the dataset's stylistic degrees of freedom. The subspace is easily obtained from a principal components analysis (PCA) of a set of vectors, each representing one model's parameters.

Figure 3: Schematic of style DOF discovery. LEFT: Without cross-entropy constraints, style-specific models (pentagons) are drawn to their data (clouds), and typically span all dimensions of parameter space. RIGHT: When also drawn to a generic model, they settle into a parameter subspace (indicated by the dashed plane).
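A sketch of this subspace-identification step, assuming each model's parameters have already been flattened into one vector (all data below is synthetic by construction, so the latent dimensionality is known to be two):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 6 style-specific models, each flattened to a
# 20-D parameter vector that (by construction) varies along only
# 2 latent stylistic directions around the generic model.
basis = rng.normal(size=(2, 20))
coords = rng.normal(size=(6, 2))
generic = rng.normal(size=20)            # generic model = origin of style space
params = generic + coords @ basis        # style-specific parameter vectors

# PCA of the differences from the generic model recovers the subspace.
diffs = params - generic
u, s, vt = np.linalg.svd(diffs, full_matrices=False)
explained = (s**2) / (s**2).sum()
```

Here `explained[:2]` accounts for essentially all of the variance, so two principal directions suffice as the stylistic DOFs.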

It is often useful to extend the prior with additional functions of θ. For example, adding T·H(θ) and varying T gives deterministic annealing, an optimization strategy that forces the system to explore the error surface at many scales as T → 0 instead of myopically converging to the nearest local optimum (see §C for equations).

3.4 Optimization

In learning we hope to simultaneously segment the data into motion primitives, match similar primitives executed in different styles, and estimate the structure and parameters for minimally ambiguous, maximally generative models. Entropic estimation [2] gives us a framework for solving this partially discrete optimization problem by embedding it in a high-dimensional continuous space via entropies. It also gives us numerical methods in the form of maximum a posteriori (MAP) entropy-optimizing parameter estimators. These attempt to find a best data-generating model by gradually extinguishing excess model parameters that are not well supported by the data. This solves the discrete optimization problem by causing a diffuse distribution over all possible segmentations to collapse onto a single segmentation.

Optimization proceeds via Expectation-Maximization (EM) [1, 17], a fast and powerful fixpoint algorithm that guarantees convergence to a local likelihood optimum from any initialization. The estimators we give in §D modify EM to do cross-entropy optimization and annealing. Annealing strengthens EM's guarantee to quasi-global optimality (global MAP optimality with probability approaching 1 as the annealing schedule lengthens), a necessary assurance due to the number of combinatorial optimization problems that are being solved simultaneously: segmentation, labeling, alignment, model selection, and parameter estimation. The full algorithm is:

1. Initialize a generic model and one style-specific model for each motion sequence.

2. EM loop until convergence:

   (a) E step: Compute expected sufficient statistics ω of each motion sequence relative to its model.

   (b) M step (generic): Calculate maximum a posteriori parameter values θ̄ with the minimum-entropy prior, using E-step statistics from the entire training set.

   (c) M step (specific): Calculate maximum a posteriori parameter values θ with the minimum-cross-entropy prior, using only E-step statistics from the current sequence.

   (d) Adjust the temperature (see below for schedules).

3. Find a subspace that spans the parameter variations between models, e.g., calculate a PCA of the differences between the generic and each style-specific model.

Initialization can be random because full annealing will obliterate initial conditions. If one can encode useful hints in the initial model, then the EM loop should use partial annealing by starting at a lower temperature.

HMMs have a useful property that saves us the trouble of hand-segmenting and/or labelling the training data: the actions in any particular training sequence may be squashed and stretched in time, oddly ordered, and repeated; in the course of learning, the basic HMM dynamic programming algorithms will find an optimal segmentation and labelling of each sequence. Our cross-entropy prior simply adds the constraint that similarly-labeled frames exhibit similar behavior (but not necessarily appearance) across sequences. Figure 4 illustrates with the induced state machine and labelling of four similar but incongruent sequences, and an induced state machine that captures all their choreographic elements and transitions.
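The dynamic program that recovers an optimal labelling is the standard Viterbi algorithm; a minimal sketch in its textbook form (not the authors' implementation):

```python
import numpy as np

def viterbi(log_T, log_emit):
    """Most probable state path for an HMM.
    log_T: n_states x n_states log transition matrix.
    log_emit: n_frames x n_states per-frame log emission likelihoods."""
    n_frames, n_states = log_emit.shape
    score = log_emit[0].copy()
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + log_T      # score of each predecessor/successor pair
        back[t] = cand.argmax(axis=0)      # best predecessor for each state
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):   # trace best path backwards
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy check: emissions strongly favor states 0, 0, then 1.
log_T = np.log(np.full((2, 2), 0.5))
log_emit = np.log(np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]))
path = viterbi(log_T, log_emit)
```

The labelling this produces is what the cross-entropy prior then constrains to be congruent across sequences.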

4 Working with Motion Capture

As in any machine-learning application, one can make the problem harder or easier depending on how the data is represented to the algorithm. Learning algorithms look for the most statistically salient patterns of variation in the data. For motion capture, these may not be the patterns that humans find perceptually and expressively salient. Thus we want to preprocess the data to highlight sources of variation that "tell the story" of a dance, such as leg motions and compensatory body motions, and suppress irrelevant sources of variation, such as inconsistent marker placements and world coordinates between sequences (which would otherwise be modeled as stylistic variations). Other sources of variation, such as inter-sequence variations in body shapes, need to be scaled down so that they do not dominate style space. We now describe methods for converting raw marker data into a suitable representation for learning motion.

4.1 Data Gathering and Preprocessing

We first gathered human motion capture data from a variety of sources (see acknowledgements in §8). The data consists of the 3D positions of physical markers placed on human actors, acquired over short intervals in motion capture studios. Each data source provided data with a different arrangement of markers over the body. We defined a reduced 20-marker arrangement, such that all of the markers in the input sequences could be converted by combining and deleting extra markers. (Note that the missing markers can be recovered from synthesized data later by remapping the style machines to the original input marker data.) We also doubled the size of the dataset by mirroring, and resampled all sequences to 60Hz.

Captured and synthetic motion capture data in the figures and animations show the motions of markers connected by a fake skeleton. The "bones" of this skeleton have no algorithmic value; they are added for illustration purposes only, to make the markers easier to follow.

The next step is to convert marker data into joint angles plus limb lengths, global position, and global orientation. The coccyx (near the base of the back) is used as the root of the kinematic tree. Joint angles alone are used for training. Joint angles are by nature periodic (for example, ranging from 0 to 2π); because training assumes that the input signal lies in the infinite domain of ℝⁿ, we took some pains to choose a joint angle parameterization without discontinuities (such as a jump from 2π to 0) in the training data.¹ However, we were not able to fully eliminate all discontinuities. (This is partially due to some perversities in the input data, such as inverted knee bends.)

Conversion to joint angles removes information about which articulations cause the greatest changes in body pose. To restore this information, we scale joint angle variables to make statistically salient those articulations that most vary the pose, measured in the dataset by a procedure similar to that described by Gleicher [11]. To reduce the dependence on individual body shape, the mean pose is subtracted from each sequence. Finally, noise and dimensionality are reduced via PCA; we typically use ten or fewer significant dimensions of data variation for training.
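A minimal sketch of the last two preprocessing steps: mean-pose subtraction and PCA reduction. (The joint-angle conversion and salience scaling are omitted, and the shapes and data below are invented for illustration.)

```python
import numpy as np

rng = np.random.default_rng(2)

def preprocess(X, n_dims=10):
    """X: frames x joint-angle-channels. Subtract the mean pose,
    then reduce noise and dimensionality with PCA."""
    X0 = X - X.mean(axis=0)                    # remove the mean pose
    u, s, vt = np.linalg.svd(X0, full_matrices=False)
    return X0 @ vt[:n_dims].T, vt[:n_dims]     # reduced data + PCA basis

X = rng.normal(size=(600, 40))                 # fake 40-channel sequence
Y, basis = preprocess(X, n_dims=10)
```

The retained `basis` is what lets synthesized low-dimensional output be mapped back to the full joint-angle representation.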

4.2 Training

Models are initialized with a state transition matrix that has probabilities declining exponentially off the diagonal; the Gaussians are initialized randomly or centered on every kth frame of the sequence. These initial conditions save the learning algorithm the gratuitous trouble of selecting from among a factorial number of permutationally equivalent models, differing only in the ordering of their states.

We train with annealing, setting the temperature T high and making it decay exponentially toward zero. This forces the estimators to explore the error surface at many scales before committing to a particular region of parameter space. In the high-temperature phase, we set the cross-entropy temperature T_D to zero, to force the variation models to stay near the generic model. At high temperatures, any accidental commitments made in the initialization are largely obliterated. As the generic temperature declines, we briefly heat up the cross-entropy temperature, allowing the style-specific models to venture off to find data points not well explained by the generic model. We then drive both temperatures to zero and let the estimators converge to an entropy minimum.

These temperature schedules are hints that guide the optimization: (1) find global structure; (2) offload local variation to the specific models; (3) then simplify (compress) all models as much as possible.

The result of training is a collection of models and, for each model, a distribution $\gamma$ over its hidden states, where $\gamma_{t,i} = p(\text{state } s_i \text{ explains frame } t, \text{ given all the information in the sequence } X)$. Typically this distribution has zero or near-zero entropy, meaning that $\gamma$ has collapsed to a single state sequence that explains the data. $\gamma$ (or the sequence of most probable states) encodes the content of the data; as we show below, applying either one to a different style-specific model causes that content to be resynthesized in a different style.

We use $\gamma$ to remap each model's emission distributions to joint angles and angular velocities, scaled according to the importance of each joint. This information is needed for synthesis. Remapping means re-estimating emission parameters to observe a time-series that is synchronous with the training data.

4.3 Making New Styles

We encode a style-specific HMM in a vector by concatenating its state means $\mu_i$, square-root covariances, and state dwell times (on average, how long a model stays in one state before transitioning out). New styles can be created by interpolation and extrapolation within this space. The dimensionality of the space is reduced by PCA, treating each HMM as a single observation and the generic HMM as the origin. The PCA gives us a

¹For cyclic domains, one would ideally use von Mises' distribution, essentially a Gaussian wrapped around a circle, but we cannot because no analytic variance estimator is known.

subspace of models whose axes are intrinsic degrees of variation across the styles in the training set. Typically, only a few stylistic DOFs are needed to span the many variations in a training set, and these become the dimensions of the style variable $v$. One interpolates between any styles in the training set by varying $v$ between the coordinates of their models in the style subspace, then reconstituting a style-specific HMM from the resulting parameter vector. Of course, it is more interesting to extrapolate, by going outside the convex hull of the training styles, a theme that is explored below in §5.
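The style-space construction above can be sketched as follows. The helper names are ours; each trained HMM is flattened to one vector, PCA is taken with the generic model as origin, and new styles are reconstituted from style coordinates v:

```python
import numpy as np

def model_to_vector(means, sqrt_covs, dwell):
    # Concatenate state means, square-root covariances, and dwell times.
    return np.concatenate([means.ravel(), sqrt_covs.ravel(), dwell.ravel()])

def style_space(model_vectors, generic_vector, n_dofs=3):
    # PCA with each HMM as one observation and the generic HMM as origin.
    D = np.asarray(model_vectors) - generic_vector
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    basis = Vt[:n_dofs]           # the stylistic degrees of freedom
    coords = D @ basis.T          # style coordinates v of each training model
    return basis, coords

def reconstitute(v, basis, generic_vector):
    # Map a style variable v (interpolated or extrapolated) back to a model.
    return generic_vector + v @ basis
```

Interpolation is then just blending coordinate vectors, and extrapolation is stepping outside their convex hull before calling `reconstitute`.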

4.4 Analyzing New Data

To obtain the style coordinates of a novel motion sequence $X$, we begin with a copy of the generic model (or of a style-specific model which assigns $X$ high likelihood), then retrain that model on $X$, using cross-entropy constraints with respect to the original generic model. Projection of the resulting parameters onto the style manifold gives the style coordinates. We also obtain the sample's state occupancy matrix $\gamma$. As mentioned before, this summarizes the content of the motion sequence $X$ and is the key to synthesis, described below.

4.5 Synthesizing Virtual Motion Data

Given a new value of the style variable $v$ and a state sequence $S(X) \sim \gamma(X)$ encoding the content of $X$, one may resynthesize $X$ in the new style $X'$ by calculating the maximum-likelihood path $X'^* = \arg\max_{X'} p(X' \mid v, S(X))$. Brand [4] describes a method for calculating the maximum-likelihood sample in $O(T)$ time for $T$ time-steps. §F generalizes and improves on this result, so that all the information in $\gamma$ is used.

The resulting path is an inherently smooth curve that varies even if the system dwells in the same hidden state for several frames, because of the velocity constraints on each frame. Motion discontinuities in the synthesized samples are possible if the difference in velocities between successive states is large relative to the frame (sampling) rate. The preprocessing steps are then reversed to produce virtual motion-capture data as the final output.

Some actions take longer in different styles; as we move from style to style, this is accommodated by scaling dwell times of the state sequence to match those of the new style. This is one of many ways of making time flexible; another is to incorporate dwell times directly into the emission distributions and then synthesize a list of varying-sized time-steps by which to clock the synthesized motion-capture frames.
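Dwell-time rescaling can be sketched as replacing each run of repeated states with the target style's dwell length. This is a simplification of what the paper describes; `target_dwell`, a map from state to frames-per-visit, is a hypothetical interface of our own:

```python
from itertools import groupby

def rescale_dwells(states, target_dwell):
    # Replace each dwell (run of identical states) with the dwell
    # length expected under the new style.
    out = []
    for s, _run in groupby(states):
        out.extend([s] * max(1, round(target_dwell[s])))
    return out
```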

In addition to resynthesizing existing motion-capture in new styles, it is possible to generate entirely new motion data directly from the model itself. Learning automatically extracts motion primitives and motion cycles from the data (see figure 4), which take the form of state sub-sequences. By cutting and pasting these state sequences, we can sequence new choreography. If a model's state machine has an arc between states in two consecutively scheduled motion primitives, the model will automatically modify the end and beginning of the two primitives to transition smoothly. Otherwise, we must find a path through the state machine between the two primitives and insert all the states on that path. An interesting effect can also be achieved by doing a random walk on the state machine, which generates random but plausible choreography.
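Generating random but plausible choreography is then a random walk on the learned transition matrix. A sketch (the real system works at the level of state sub-sequences for whole motion primitives, not single states):

```python
import numpy as np

def random_walk(P, start, n_steps, rng=None):
    # Sample a plausible state sequence ("choreography") by walking
    # the state machine according to the transition matrix P.
    rng = np.random.default_rng(rng)
    states = [start]
    for _ in range(n_steps - 1):
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return states
```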

5 Examples

We collected a set of bipedal locomotion time-series from a variety of sources. These motion-capture sequences feature a variety of different kinds of motion, body types, postures, and marker placements. We converted all motion data to use a common set of markers on a prototypical body. (If we do not normalize the body,



[Figure 4 diagram: state machine with labeled motion cycles (walk, kick, spin, tiptoe, windmill), and occupancy matrices over states and time for the sequences female lazy ballet, female expert ballet, male ballet, and female modern dance.]

Figure 4: TOP: State machine learned from four dance sequences totalling 6000+ frames. Very low-probability arcs have been removed for clarity. Motion cycles have been labeled; other primitives are contained in linear sequences. BOTTOM: Occupancy matrices (constructed while learning) indicate how each sequence was segmented and labeled. Note the variations in timing, ordering, and cycles between sequences.


Figure 5: Completion of the analogy walking:running::strutting:X via synthesis of stylistic motion. Stick figures show every 5 frames; streamers show the trajectories of the extremities. X extrapolates both the energetic arm swing of strutting and the power stride of running.

Figure 6: Five motion sequences synthesized from the same choreography, but in different styles (one per row). The actions, aligned vertically, are tiptoeing, turning, kicking, and spinning. The odd body geometry reflects marker placements in the training motion-capture.



the algorithm typically identifies variations in body geometry as the principal stylistic DOFs.)

As a hint to the algorithm, we first trained an HMM on a very low-dimensional representation of the data. Entropic estimation yielded a model which was essentially a phase-diagram of the locomotive cycle. This was used as an initialization for the full SHMM training. The SHMM was lightly annealed, so it was not constrained to use this hint, but the final generic model did retain some of the information in the initialization.

The PCA of the resulting style-specific models revealed that 3 stylistic degrees of freedom explained 93% of the variation between the 10 models. The most significant stylistic DOF appears to be global pose, e.g., one tilts forward for running, back for funny-walking. It also contains information about the speed of motion. Style DOF #2 controls balance and gender; varying it modifies the hips, the distance of the footfall to the midline, and the compensating swing of the arms. Finally, style DOF #3 can be characterized as the amount of swagger and energy in the motion; increasing it yields sequences that look more and more high-spirited. Extrapolating beyond the hull of specific models yields well-behaved motion with the expected properties. E.g., we can double the amount of swagger, or tilt a walk forward into a slouch. We demonstrate with analogies.

Analogies are a particularly interesting form of extrapolation. Given the analogical problem walking:running::strutting:X, we can solve for X in terms of the style coordinates, X = strutting + (running − walking), which is equivalent to completing the parallelogram having the style coordinates for walking, running, and strutting as three of its corners.
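In style coordinates the analogy is ordinary vector arithmetic. The numbers below are made up purely for illustration:

```python
import numpy as np

# Hypothetical 3-D style coordinates (illustrative values only).
walking   = np.array([0.0, 0.0, 0.0])
running   = np.array([1.2, 0.1, 0.8])
strutting = np.array([0.3, 0.9, 0.4])

# walking:running::strutting:X -- complete the parallelogram.
X = strutting + (running - walking)
```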


The resulting synthesized sequence (figure 5) looks like a fast-advancing form of skanking (a rather high-energy pop dance style). Similarly, the analogy walking:running::cat-walking:X gives something that looks like how a model might skip/run down a catwalk in a fashion show.

Now we turn to examples that cannot be handled by existing time-warping and signal-processing methods.

In a more complicated example, the system was trained on four performances by classically trained dancers (man, woman-ballet, woman-modern-dance, woman-lazy-ballet) of 50-70 seconds duration each, with roughly 20 different moves. The performances all have similar choreographies but vary in the timing, ordering, and style of moves. A 75-state model took roughly 20 minutes to train on 6000+ frames, using interpreted Matlab code on a single CPU of a 400MHz AlphaServer. Parameter extinction left a 69-state SHMM with roughly 3500 parameters. Figure 4 shows that the system has discovered roughly equivalent qualitative structure in all the dances. The figure also shows a flowchart of the "choreography" discovered in learning.

We then took a 1600-frame sequence of a novice dancer attempting similar choreography, but with little success: getting moves wrong and wrongly ordered, losing the beat, and occasionally stumbling. We resynthesized this in a masculine-modern style, obtaining notable improvements in the grace and recognizability of the dance. This is shown in the accompanying video.

We then generated new choreography by doing a random walk on the state machine. We used the resulting state sequence to synthesize new motion-capture in a variety of styles, e.g., ballet minus languid; modern plus male; etc. These are shown in the video. Figure 6 illustrates how different the results are by showing poses from aligned time-slices in the different synthesized performances.

Finally, we demonstrate driving style machines from video. The essence of our technique is the generation of stylistically varied motion capture from HMM state sequences (or distributions over states). In the examples above, we obtained state sequences from existing motion capture or random walks on the HMM state machine. In fact, such state sequences can be calculated from arbitrary signals: We can use Brand's shadow puppetry technique [5] to infer state sequences and/or 3D body pose and velocity from video image sequences. This means that one can create animations by acting out a motion in front of a camera, then use style machines to map someone else's (e.g. an expert's) style onto one's choreography. In the accompanying video we show some vision-driven motion-capture and stylistic variations thereon.

6 Discussion

Our unsupervised framework automates many of the dreariest tasks in motion-capture editing and analysis: The data needn't be segmented, annotated, or aligned, nor must it contain any explicit statement of the theme or the stylistic degrees of freedom (DOFs). All these things are discovered in learning. In addition, the algorithms automatically segment the data, identify primitive motion cycles, learn transitions between primitives, and identify the stylistic DOFs that make primitives look quite different in different motion-capture sequences.

This approach treats animation as a pure data-modeling and inference task: There is no prior kinematic or dynamic model; no representation of bones, masses, or gravity; no prior annotation or segmentation of the motion-capture data into primitives or styles. Everything needed for generating animation is learned directly from the data.

However, the userisn’t forcedto stay “data-pure.” We expectthat our methodscan be easily coupledwith other constraints;the quadraticsynthesisobjective function and/orits linear gradi-ent (eqn.21) canbe usedaspenaltytermsin larger optimizationsthat incorporateuser-specifiedconstraintson kinematics,dynam-ics, foot placement,etc.Thatwehavenotdonesoin thispaperandvideoshouldmakeclearthepotentialof raw inference.

Our method generalizes reasonably well off of its small training set, but like all data-driven approaches, it will fail (gracefully) if given problems that look like nothing in the training set. We are currently exploring a variety of strategies for incrementally learning new motions as more data comes in.

An important open question is the choice of temperature schedules, in which we see a trade-off between learning time and quality of the model. The results can be sensitive to the time-courses of $T$ and $T'$, and we have no theoretical results about how to choose optimal schedules.

Although we have concentrated on motion-capture time-series, the style machine framework is quite general and could be applied to a variety of data types and underlying models. For example, one could model a variety of textures with mixture models, learn the stylistic DOFs, then synthesize extrapolated textures.

7 Summary

Style machines are generative probabilistic models that can synthesize data in a broad variety of styles, interpolating and extrapolating stylistic variations learned from a training set. We have introduced a cross-entropy optimization framework that makes it possible to learn style machines from a sparse sampling of unlabeled style examples. We then showed how to apply style machines to full-body motion-capture data, and demonstrated three kinds of applications: resynthesizing existing motion-capture in new styles; synthesizing new choreographies and stylized motion data therefrom; and synthesizing stylized motion from video. Finally, we showed style machines doing something that every dance student has wished for: superimposing the motor skills of an expert dancer on the choreography of a novice.

8 Acknowledgments

The datasets used to train these models were made available by Bill Freeman, Michael Gleicher, Zoran Popović, Adaptive Optics, Biovision, Kinetix, and some anonymous sources. Special thanks to Bill Freeman, who choreographed and collected several dance sequences especially for the purpose of style/content analysis. Egon Pasztor assisted with converting motion capture file formats, and Jonathan Yedidia helped to define the joint angle parameterization.

References

[1] L. Baum. An inequality and associated maximization technique in statistical estimation of probabilistic functions of Markov processes. Inequalities, 3:1–8, 1972.

[2] M. Brand. Pattern discovery via entropy minimization. In D. Heckerman and C. Whittaker, editors, Artificial Intelligence and Statistics #7. Morgan Kaufmann, January 1999.

[3] M. Brand. Exploring variational structure by cross-entropy optimization. In P. Langley, editor, Proceedings, International Conference on Machine Learning, 2000.

[4] M. Brand. Voice puppetry. Proceedings of SIGGRAPH 99, pages 21–28, August 1999.

[5] M. Brand. Shadow puppetry. Proceedings of ICCV 99, September 1999.

[6] C. Bregler. Learning and recognizing human dynamics in video sequences. Proceedings of CVPR 97, 1997.

[7] A. Bruderlin and L. Williams. Motion signal processing. Proceedings of SIGGRAPH 95, pages 97–104, August 1995.

[8] J. Buhmann. Empirical risk approximation: An induction principle for unsupervised learning. Technical Report IAI-TR-98-3, Institut für Informatik III, Universität Bonn, 1998.

[9] R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E. Knuth. On the Lambert W function. Advances in Computational Mathematics, 5:329–359, 1996.

[10] W. T. Freeman and J. B. Tenenbaum. Learning bilinear models for two-factor problems in vision. In Proceedings, Conf. on Computer Vision and Pattern Recognition, pages 554–560, San Juan, PR, 1997.

[11] M. Gleicher. Motion editing with spacetime constraints. 1997 Symposium on Interactive 3D Graphics, pages 139–148, April 1997.

[12] M. Gleicher. Retargeting motion to new characters. Proceedings of SIGGRAPH 98, pages 33–42, July 1998.

[13] R. Grzeszczuk, D. Terzopoulos, and G. Hinton. NeuroAnimator: Fast neural network emulation and control of physics-based models. Proceedings of SIGGRAPH 98, pages 9–20, July 1998.

[14] N. R. Howe, M. E. Leventon, and W. T. Freeman. Bayesian reconstruction of 3D human motion from single-camera video. In S. Solla, T. Leen, and K. Müller, editors, Advances in Neural Information Processing Systems, volume 10. MIT Press, 2000.

[15] J. Lee and S. Y. Shin. A hierarchical approach to interactive motion editing for human-like figures. Proceedings of SIGGRAPH 99, pages 39–48, August 1999.

[16] Z. Popović and A. Witkin. Physically based motion transformation. Proceedings of SIGGRAPH 99, pages 11–20, August 1999.

[17] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, February 1989.

[18] C. Rose, M. F. Cohen, and B. Bodenheimer. Verbs and adverbs: Multidimensional motion interpolation. IEEE Computer Graphics & Applications, 18(5):32–40, September–October 1998.

[19] J. B. Tenenbaum and W. T. Freeman. Separating style and content. In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, pages 662–668. MIT Press, 1997.

[20] M. Unuma, K. Anjyo, and R. Takeuchi. Fourier principles for emotion-based human figure animation. Proceedings of SIGGRAPH 95, pages 91–96, August 1995.

[21] A. Wilson and A. Bobick. Parametric hidden Markov models for gesture recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 21(9), 1999.

[22] A. Witkin and Z. Popović. Motion warping. Proceedings of SIGGRAPH 95, pages 105–108, August 1995.

[23] S. C. Zhu, Y. Wu, and D. Mumford. Minimax entropy principle and its applications to texture modeling. Neural Computation, 9(8), 1997.


Figure 7: Graphical models of an HMM (top), SHMM (middle), and path map (bottom). The observed signal $x_t$ is explained by a discrete-valued hidden state variable $s(t)$ which changes over time, and in the case of SHMMs, a vector-valued style variable $v$. Both $v$ and $s(t)$ are hidden and must be inferred probabilistically. Arcs indicate conditional dependencies between the variables, which take the form of parameterized compatibility functions. In this paper we give rules for learning (inferring all the parameters associated with the arcs), analysis (inferring $v$ and $s(t)$), and synthesis of novel but consistent behaviors (inferring a most likely $x_t$ for arbitrary settings of $v$ and $s(t)$).

A Hidden Markov models

An HMM is a probability distribution over time-series. Its dependency structure is diagrammed in figure 7. It is specified by $\theta = \{\mathcal{S}, P_j, P_{i\to j}, p_j(x)\}$ where

• $\mathcal{S} = \{s_1, \ldots, s_N\}$ is the set of discrete states;

• stochastic matrix $P_{i\to j}$ gives the probability of transitioning from state $i$ to state $j$;

• stochastic vector $P_j$ is the probability of a sequence beginning in state $j$;

• emission probability $p_j(x)$ is the probability of observing $x$ while in state $j$, typically a Gaussian

$$p_j(x) = \mathcal{N}(x;\, \mu_j, K_j) = \frac{e^{-\frac{1}{2}(x-\mu_j)^{\top} K_j^{-1} (x-\mu_j)}}{\sqrt{(2\pi)^d\, |K_j|}}$$

with mean $\mu_j$ and covariance $K_j$.

We cover HMM essentials here; see [17] for a more detailed tutorial.

It is useful to think of a (continuous-valued) time-series $X$ as a path through configuration space. An HMM is a state-space model, meaning that it divides this configuration space into regions, each of which is more or less "owned" by a particular hidden state, according to its emission probability distribution. The likelihood of a path $X = \{x_1, x_2, \ldots, x_T\}$ with respect to a particular sequence of hidden states $S = \{s(1), s(2), \ldots, s(T)\}$ is the probability of each point on the path with respect to the current hidden state ($\prod_{t=1}^{T} p_{s(t)}(x_t)$), times the probability of the state sequence itself, which is the product of all its state transitions ($P_{s(1)} \prod_{t=2}^{T} P_{s(t-1)\to s(t)}$). When this is summed over all possible hidden state sequences, one obtains the likelihood of the path with respect to the entire HMM:

$$p(X \mid \theta) = \sum_{S \in \mathcal{S}^T} P_{s(1)}\, p_{s(1)}(x_1) \prod_{t=2}^{T} P_{s(t-1)\to s(t)}\, p_{s(t)}(x_t) \qquad (2)$$

A maximum-likelihood HMM may be estimated from data $X$ via alternating steps of Expectation (computing a distribution over the hidden states) and Maximization (computing locally optimal parameter values with respect to that distribution). The E-step contains a dynamic programming recursion for eqn. 2 that saves the trouble of summing over the exponential number of state sequences in $\mathcal{S}^T$:

$$p(X \mid \theta) = \sum_j \alpha_{T,j} \qquad (3)$$

$$\alpha_{t,j} = p_j(x_t) \sum_i \alpha_{t-1,i}\, P_{i\to j}\,; \qquad \alpha_{1,j} = P_j\, p_j(x_1) \qquad (4)$$

$\alpha$ is called the forward variable; a similar recursion gives the backward variable $\beta$:

$$\beta_{t,i} = \sum_j \beta_{t+1,j}\, p_j(x_{t+1})\, P_{i\to j}\,; \qquad \beta_{T,i} = 1 \qquad (5)$$

In the E-step the variables $\alpha, \beta$ are used to calculate the expected sufficient statistics $\omega = \{C, \gamma\}$ that form the basis of new parameter estimates. These statistics tally the expected number of times the HMM transitioned from one state to another,

$$C_{i\to j} = \sum_{t=2}^{T} \alpha_{t-1,i}\, P_{i\to j}\, p_j(x_t)\, \beta_{t,j} \Big/ p(X \mid \theta)\,, \qquad (6)$$

and the probability that the HMM was in hidden state $s_j$ when observing data point $x_t$,

$$\gamma_{t,j} = \frac{\alpha_{t,j}\, \beta_{t,j}}{\sum_j \alpha_{t,j}\, \beta_{t,j}}\,. \qquad (7)$$

These statistics are optimal with respect to all the information in the entire sequence and in the model, due to the forward and backward recursions. In the M-step, one calculates maximum-likelihood parameter estimates, which are simply normalizations of $\omega$:

$$\hat{P}_{i\to j} = C_{i\to j} \Big/ \sum_j C_{i\to j} \qquad (8)$$

$$\hat{\mu}_j = \sum_t \gamma_{t,j}\, x_t \Big/ \sum_t \gamma_{t,j} \qquad (9)$$

$$\hat{K}_j = \sum_t \gamma_{t,j}\, (x_t - \hat{\mu}_j)(x_t - \hat{\mu}_j)^{\top} \Big/ \sum_t \gamma_{t,j} \qquad (10)$$

After training, eqns. 9 and 10 can be used to remap the model to any synchronized time-series.
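Equations 3-7 translate almost line for line into code. This sketch takes the per-state emission likelihoods as a precomputed matrix `B` and, for brevity, omits the numerical rescaling needed for long sequences:

```python
import numpy as np

def forward_backward(pi, P, B):
    """E-step recursions for an HMM (a sketch of eqns. 3-7).
    pi: initial probs (N,); P: transitions (N,N); B: B[t,j] = p_j(x_t), (T,N)."""
    T, N = B.shape
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[0]                         # eqn. 4, base case
    for t in range(1, T):
        alpha[t] = B[t] * (alpha[t - 1] @ P)     # eqn. 4
    beta[-1] = 1.0                               # eqn. 5, base case
    for t in range(T - 2, -1, -1):
        beta[t] = P @ (B[t + 1] * beta[t + 1])   # eqn. 5
    likelihood = alpha[-1].sum()                 # eqn. 3
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)    # eqn. 7
    return alpha, beta, gamma, likelihood
```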

In §D we replace these with more powerful entropy-optimizing estimates.

B Stylistic hidden Markov models

A stylistic hidden Markov model (SHMM) is an HMM whose parameters are functionally dependent on a style variable $v$ (see figure 7). For simplicity of exposition, here we will only develop the case where the emission probability functions $p_j(x_t)$ are Gaussians whose means and covariances are varied by $v$. In that case the SHMM is specified by $\theta = \{\mathcal{S}, P_j, P_{i\to j}, \mu_j, K_j, U_j, W_j, v\}$ where

• mean vector $\mu_j$, covariance matrix $K_j$, variation matrices $U_j, W_j$, and style vector $v$ parameterize the multivariate Gaussian probability $p_j(x_t)$ of observing a data point $x_t$ while in state $j$:

$$p(x_t \mid s_j) = \mathcal{N}\!\left(x_t;\; \mu_j + U_j v,\; K_j + W_j v\right),$$

where the stylized covariance matrix $K_j + W_j v$ is kept positive definite by mapping its eigenvalues to their absolute values (if necessary).

The parameters $\{P_j, P_{i\to j}, \mu_j, K_j\}$ are obtained from data via entropic estimation; $\{U_j, W_j\}$ are the dominant eigenvectors obtained in the post-training PCA of the style-specific models; and $v$ can be estimated from data and/or varied by the user. If we fix the value of $v$, then the model becomes a standard discrete-state, Gaussian-output HMM. We call the $v = \mathbf{0}$ case the generic HMM.

casethegenericHMM.A simpler versionof this model hasbeentreatedbefore in a

supervisedcontext by [21]; in their work, only the meansvary,andonemustspecifyby handthe structureof the model’s transi-tion function

©`«b¬?ª, thenumberof dimensionsof stylisticvariation

dim(¡ ), andthevalueof ¡ for every trainingsequence.Our frame-work learnsall of this automaticallywithout supervision,andgen-eralizesto awidevarietyof graphicalmodels.

C Entropies and cross-entropies

The first two terms of our objective function (eqn. 1) are essentially the likelihood function, which measures the fit of the model to the data:

$$H(\omega) + D(\omega \,\|\, \theta) = -L(X \mid \theta) = -\log p(X \mid \theta) \qquad (11)$$

The remaining terms measure the fit of the model to our beliefs. Their precise forms are derived from the likelihood function. For multinomials with parameters $\theta = \{\theta_1, \ldots, \theta_d\}$,

$$H(\theta) = -\sum_j \theta_j \log \theta_j\,, \qquad (12)$$

$$D(\theta^* \,\|\, \theta) = \sum_j \theta^*_j \log\!\left(\theta^*_j / \theta_j\right). \qquad (13)$$

For $d$-dimensional Gaussians of mean $\mu$ and covariance $K$,

$$H(\theta) = \tfrac{1}{2}\left[\, d \log 2\pi e + \log |K| \,\right], \qquad (14)$$

$$D(\theta^* \,\|\, \theta) = \tfrac{1}{2}\left[\log|K| - \log|K^*| + \operatorname{tr}\!\left(K^{-1} K^*\right) + (\mu - \mu^*)^{\top} K^{-1} (\mu - \mu^*) - d\right]. \qquad (15)$$

The SHMM likelihood function is composed of multinomials and Gaussians by multiplication (for any particular setting of the hidden states). When working with such composite distributions, we optimize the sum of the components' entropies, which gives us a measure of model coding length, and typically bounds the (usually uncalculable) entropy of the composite model. As entropy declines the bounds become tight.




Figure 8: TOP: Expectation-maximization finds a local optimum by repeatedly constructing a convex bound that touches the objective function at the current parameter estimate (E-step; blue), then calculating optimal parameter settings in the bound (M-step; red). In this figure the objective function is shown as energy = -log posterior probability; the optimum is the lowest point on the curve. BOTTOM: Annealing adds a probabilistic guarantee of finding a global optimum, by defining a smooth blend between model-fitting (the hard optimization problem symbolized by the foremost curve) and maximizing entropy (an easy problem represented by the hindmost curve), then tracking the optimum across the blend.

D Estimators

The optimal Gaussian parameter settings for minimizing cross-entropy vis-à-vis data points $x_i$ and a reference Gaussian parameterized by mean $\mu^*$ and covariance $K^*$ are

$$\hat{\mu} = \frac{\sum_i x_i + Z' \mu^*}{N + Z'}\,, \qquad (16)$$

$$\hat{K} = \frac{\sum_i (x_i - \hat{\mu})(x_i - \hat{\mu})^{\top} + Z'\big[(\hat{\mu} - \mu^*)(\hat{\mu} - \mu^*)^{\top} + K^*\big]}{N + Z + Z'}\,. \qquad (17)$$

The optimal multinomial parameter settings vis-à-vis event counts $N_{i\to j}$ and a reference multinomial distribution parameterized by probabilities $P^*_{i\to j}$ are given by the fixpoint

$$\hat{P}_{i\to j} = \frac{-\big(N_{i\to j} + Z' P^*_{i\to j}\big)/Z}{W\!\Big(\!-\big(N_{i\to j} + Z' P^*_{i\to j}\big)\, e^{1+\lambda/Z} \big/ Z\Big)}\,, \qquad (18)$$

$$\hat{\lambda} = -\frac{1}{N}\sum_j \left[\frac{N_{i\to j} + Z' P^*_{i\to j}}{\hat{P}_{i\to j}} + Z \log \hat{P}_{i\to j}\right] - Z\,, \qquad (19)$$

where $W$ is the Lambert W function, defined by $W(x)\, e^{W(x)} = x$ [9]. The factors $Z, Z'$ vary the strength of the entropy and cross-entropy priors in annealing. Derivations will appear in a technical report available from http://www.merl.com.

These estimators comprise the maximization step illustrated in figure 8.

E Path maps

A path map is a statistical model that supports predictions about the time-series behavior of a target system from observations of a cue system. A path map is essentially two HMMs that share a backbone of hidden states whose transition function is derived from the target system (see figure 7). The output HMM is characterized by per-state Gaussian emission distributions over both target configurations and velocities. Given a path through cue configuration space, one calculates a distribution $\gamma$ over the hidden states as in §A, eqn. 7. From this distribution one calculates an optimal path through target configuration space using the equations in §F.

F Synthesis

Here we reprise and improve on Brand's [4] solution for the likeliest motion sequence given a matrix of state occupancy probabilities $\gamma$. As the entropy of $\gamma$ declines, the distribution over all possible motion sequences becomes

$$p(Y \mid \gamma) = \frac{1}{c}\, \exp\!\bigg[-\frac{1}{2}\sum_{t}\sum_{i} \gamma_{t,i}\; \bar{y}_{t,i}^{\top}\, K_i^{-1}\, \bar{y}_{t,i}\bigg], \qquad (20)$$

where the vector $\bar{y}_{t,i} := \big[\, y_t - \mu_i\,;\; (y_t - y_{t-1}) - \dot{\mu}_i \,\big]$ is the target position and velocity at time $t$ minus the mean of state $i$, and $c$ is a constant. The means $\mu_i, \dot{\mu}_i$ and covariances $K_i$ are from the synthesizing model's set of Gaussian emission probabilities $p_{s(t)}(y_t, \dot{y}_t) := \mathcal{N}\big([\,y_t\,;\; y_t - y_{t-1}\,];\; [\,\mu_{s(t)}\,;\; \dot{\mu}_{s(t)}\,],\; K_{s(t)}\big)$. Breaking each inverse covariance into four submatrices, $K_i^{-1} := \big[\begin{smallmatrix} A_i & B_i \\ C_i & D_i \end{smallmatrix}\big]$,

we canobtainthemaximumlikelihoodtrajectory s!¡ (mostlikelymotion sequence)by solving the weightedsystemof linear equa-tions

¢Y£ SVU��¤¦¥ ¢ @ �|# /�§�� SVU � @ +�¨ @ 6ª© @ 2§�� SVU � @ +�« @ 6 ¨ @ 6­¬ @ 6­© @ 2�6 §��.�|# © #/�§��.�|# + ¬ # 6ª© # 24 ® � SVU® �® � � U* §�� SVU � @°¯A@ /±§��.�|#{²:# ' (21)

where² @ <* � ¬ @ © @ � �´³ @uµ³ @ � 4 ' ¯A@ <* � « @ ¨ @ � �´³ @uµ³ @ � 4¶6 ² @ andtheendpointsareobtainedby droppingtheappropriatetermsfromequation21:

¢ @ § @ � U © @/ ¬ @ / © @4 ® o® U * /�§ @ � U ² @ (22)

¢ @ § @ � £ / ¨ @ / © @« @ 6 ¨ @ 6�¬ @ 6ª© @4 ® £ SVU® £ * § @ � £ ¯A@ (23)

This generalizesBrand’s geodesic[4] to useall the informationintheoccupancy matrix g , ratherthanjustastatesequence.

The least-squares solution of this $G\,Y = R$ system can be calculated in $O(T)$ time because $G$ is block-tridiagonal.
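The O(T) claim follows because a block-tridiagonal system yields to block Gaussian elimination (the Thomas algorithm). The standalone sketch below is not the paper's solver; `A`, `B`, `C` are the diagonal, super-diagonal, and sub-diagonal blocks:

```python
import numpy as np

def solve_block_tridiagonal(A, B, C, r):
    # Row t reads: C[t-1] y[t-1] + A[t] y[t] + B[t] y[t+1] = r[t].
    # Forward elimination then back-substitution: O(T) block operations.
    T = len(A)
    A = [a.copy() for a in A]
    r = [x.copy() for x in r]
    for t in range(1, T):
        m = C[t - 1] @ np.linalg.inv(A[t - 1])   # eliminate sub-diagonal block
        A[t] = A[t] - m @ B[t - 1]
        r[t] = r[t] - m @ r[t - 1]
    y = [None] * T
    y[-1] = np.linalg.solve(A[-1], r[-1])
    for t in range(T - 2, -1, -1):
        y[t] = np.linalg.solve(A[t], r[t] - B[t] @ y[t + 1])
    return np.array(y)
```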

We introduce a further improvement in synthesis: If we set $Y$ to the training data in eqn. 21, then we can solve for the set of Gaussian means $M = \{\mu_1, \mu_2, \ldots\}$ that minimizes the weighted squared error in reconstructing that data from its state sequence. To do so, we factor the r.h.s. of eqn. 21 into $G\,Y = R = F\,M$, where $F$ is an indicator matrix built from the sequence of most probable states. The direct solution for $M$ tends to be of enormous dimension and ill-conditioned, so we precondition to make the problem well-behaved: $M = (Q^{\top}Q)^{-1}(Q^{\top}Y)$, where $Q := G^{-1}F$. One caution: There is a tension between perfectly fitting the training data and generalizing well to new problems; unless we have a very large amount of data, the minimum-entropy setting of the means will do better than the minimum squared-error setting.
