
Crowd Motion Capture

Nicolas Courty
Samsara/VALORIA/Université de Bretagne Sud
Campus de Tohannic, 56000 Vannes, France
[email protected]

Thomas Corpetti
COSTEL/Université de Haute-Bretagne
Place du Recteur Henri Le Moal, 35043 Rennes, France
[email protected]

ABSTRACT

In this paper a new and original technique to animate a crowd of human beings is presented. Following the success of data-driven animation models (such as motion capture) in the context of articulated figure control, we propose to derive a similar approach for crowd motions. In our framework, the motion of the crowd is represented as a time series of velocity fields estimated from a video of a real crowd. This time series is used as the input of a simple animation model that "advects" people along this time-varying flow. We demonstrate the power of our technique on both synthetic and real examples of crowd videos. We also introduce the notion of crowd motion editing and present possible extensions to our work.

Categories and Subject Descriptors

I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Animation; I.6.8 [Simulation and Modeling]: Types of Simulation - Animation

Keywords

Crowd simulation, data-driven animation, vision-based techniques for computer animation, motion estimation

1. INTRODUCTION

Crowds of people exhibit particular and subtle behaviors whose complexity reflects the complex nature of human beings. Animating crowds is usually a tedious task for animators, who rely most of the time on simulation models. Achieving particular effects with global emergent crowd behaviors from such models appears to be very tricky, since animators usually only have control over specific characteristics of the crowd members. In the context of human-like figure animation, huge progress has been made with the use of motion capture. Using motions acquired from real performers is now common, and modifying them through a variety of editing operations brings substantial benefits in terms of control and realism in the produced animation. The aim of our technique is to adapt such a methodology to the context of crowd animation. This raises an open set of questions:

• what kind of descriptors will best represent crowd motions?

• how to capture/estimate them from real situations?

• how to apply/use them as input for an interactive, parameterizable animation system?

• and finally, is it possible to edit those captured motions in order to produce new motions?

We chose to use crowd videos as input data because they are relatively easy to capture and do not require invasive techniques such as markers, which are impractical in the presence of thousands of individuals. At first it sounds tempting to track individual pedestrians in the flow of people. We argue that, despite the technical difficulties of achieving such a task, this approach fails to describe the crowd behavior as a whole (though it may well succeed in characterizing individual specific behaviors). Conversely, our framework is based on the assumption that the motions of individuals within the crowd are the expression of a continuous flow that drives the crowd motion. This assumes that the crowd is dense enough that pedestrians can be considered as markers of an underlying flow. In this sense, our method is more related to macroscopic simulation models (which try to define an overall structure for the crowd's motions) than to microscopic models (which define the crowd's motions as an emergent behavior of the sum of individual displacement strategies).

Overview of the data-driven animation system. Our methodology relies on an analysis/synthesis scheme which is depicted in Figure 1. First, images are extracted from a video of a real crowd. From each pair of successive images a vector field is computed through a motion estimation process. The concatenation of all these vector fields forms a time series which accounts for the displacement of the whole crowd over time. This concludes the analysis part. The synthesis of a new crowd animation is done by advecting particles (the pedestrians) along this time-varying flow.
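To fix ideas, here is a minimal Python sketch of this analysis/synthesis pipeline; estimate_flow and advect are hypothetical placeholders for the estimator of Section 3 and the integrator of Section 4.1, and the (T, H, W, 2) array layout is our assumption, not the paper's notation.

import numpy as np

def analyze(frames, estimate_flow):
    # Analysis: one velocity field per pair of successive images;
    # the result is the time series describing the whole crowd motion.
    return np.stack([estimate_flow(f0, f1)
                     for f0, f1 in zip(frames[:-1], frames[1:])])

def synthesize(fields, positions, advect):
    # Synthesis: advect particles (the pedestrians) along the
    # time-varying flow, one integration step per captured field.
    trajectory = [positions]
    for t in range(len(fields)):
        positions = advect(fields, positions, t)
        trajectory.append(positions)
    return trajectory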

This paper is organized as follows: Section 2 presents related work in the context of crowd simulation as well as motion estimation. Section 3 deals with the estimator used in our

Figure 1: Overview of the whole process.

methodology, and Section 4 presents the integration of the motion descriptor in a crowd animation controller, as well as possible ways to edit the captured motions. The last two sections present results obtained with our method, along with a conclusion and perspectives for our work.

2. STATE OF THE ART

Data-driven animation techniques usually require means to extract information from the real world. In the presence of rigid motions such as character motions, marker-based techniques bring the best results because only a few motion parameters have to be considered. In the context of non-rigid motions such as facial parameters, fluid flows, animal gaits/behaviors or tree structures, video-based techniques have already been considered in the literature [1, 2, 3, 4, 5]. Simulating crowds from real videos is closely related to this topic, yet differs in the choice of the motion descriptor (time-varying vector fields). Such a description has been previously considered in a general animation system, Flow tiles [6], but the author did not consider that the flow tiles could be designed from real data. In [?], Brogan et al. successfully performed statistics on captured human walking paths and derived from them a new path planning model. In order to motivate a data-driven animation scheme for crowd animation, we first present the state of the art in crowd animation, then we introduce some general issues in the motion estimation problem.

2.1 Crowd simulation

Crowd behavior and motion of virtual people have been studied and modeled for several purposes ranging from movie production to the simulation of emergency situations or the design of buildings and open spaces. The state of the art in human crowd behavioral modeling is large and can be classified into two main approaches: microscopic and macroscopic models. The models belonging to the first category describe the time-space behavior of individual pedestrians whereas the second category describes the emergent properties of the crowd. The simplest microscopic simulation models are based on cellular automata [7, 8]. Dynamic systems have also been considered by Helbing in his popular social force model [9]. It consists in expressing the motion of each pedestrian as the result of a combination of social forces that repel/attract pedestrians toward each other. It has been shown that this model generates realistic phenomena such as arc formations at exits or increasing evacuation time with increased desired evacuation velocities. It has been extended

to account for individualities [10] or the presence of toxic gases in the environment [11]. More complex models consider each member of the crowd as an autonomous pedestrian endowed with perceptive and cognitive abilities [12, 13, 14, 15]. Those models exhibit a variety of results depending on the quality of the behavior design, and may lead to incorrect emergent global behaviors. Those difficulties can be avoided by using a continuum formulation [16, 17]. Equations using the concepts of fluid mechanics have been derived in order to model human crowds in this way. Those approaches rely on the assumption that the characteristic distance scale between individuals is much smaller than the characteristic distance scale of the region in which the individuals move [16]. As such, the density of the crowd has to be taken into account for those models to be pertinent. Finally, several hypotheses on the behavior of each member of the crowd lead to partial differential equations governing the flow of people.

Although crowds are made up of independent individuals with their own objectives and behavior patterns, the behavior of crowds is widely understood to have collective characteristics which can be described in general terms. However, macroscopic models may lack subtlety and often rely on strong hypotheses (notably on density). Our framework proposes to capture this global dynamic from real crowd video sequences. This imposes the use of motion estimation techniques.

2.2 Motion estimation

Tracking a very small number of people and using their motions in a computer animation system has previously been investigated by Somasundaram and Parent in [?]. When a crowd is dense enough, usual tracking systems like Kalman filters or stochastic filtering [18, ?, ?] generate large state spaces that yield a computationally expensive problem. It is then necessary to use alternative methods to obtain information on the dynamics of the crowd in order to characterize its behavior. The idea of using optical flow to estimate crowd motions has recently drawn attention in the context of human activity recognition [19]. This section reviews the most popular ways to estimate motion from image sequences.

Almost all available approaches are based on the well-known Optical-Flow Constraint Equation (ofce): it assumes that visible points roughly conserve their intensity in the course of a displacement. This reads:

dE(x, t)/dt = ∂E(x, t)/∂t + ∇E(x, t) · v(x, t) ≈ 0,   (1)

where v(x, t) = (u, v)^T is the unknown velocity field at time t and location x = (x, y) in the image plane Ω, E(x, t) being the image brightness, viewed for a while as a continuous function. It is known that this assumption is subject to the aperture problem: in homogeneous areas, the latter equation is underdetermined and there exists an infinity of solutions. A number of approaches have hence been proposed to cope with this problem and to obtain the optical flow under the ofce constraint (see for instance [20] for a survey). In the following, we list the most popular ones.

Correlation: these techniques are based on the search for a window that is similar (w.r.t. the ofce) to a window

centered at position x. The associated displacement corresponds to a peak of the correlation surface. Very efficient implementations of such approaches can be obtained in Fourier space, but these techniques suffer from a lack of spatial consistency, and some known limitations of such approaches [21] prevent comfortable use.

Parametric models: the idea is to cope with the aperture problem by using a parametric model Θ to represent the velocity in the image (v(x) = Θx). The model can be obtained by minimizing:

Θ̂ = arg min_Θ ∫_Ω ( ∂E(x, t)/∂t + ∇E(x, t) · Θx )² dx.   (2)

Such techniques are very efficient when one has reliable a priori knowledge of the flow to estimate, but this representation strongly limits the possibility of estimating the motion in a number of various situations.

Lucas & Kanade: Lucas and Kanade [22] assumed that the unknown optic flow vector is constant within some neighborhood of size n. Hence, the motion field v can be estimated by minimizing the function:

∫_Ω K_n ∗ ( ∂E(x, t)/∂t + ∇E(x, t) · v(x, t) )² dx   (3)

where the Gaussian kernel K_n in fact tends to eliminate homogeneous areas. The main drawback of this approach is its incapacity to estimate local measurements.

Optical Flow: In their seminal work, Horn & Schunck [24] proposed to solve the ofce by adding a spatial regularization constraint. The motion field is then obtained by minimizing:

∫_Ω ( ∂E(x, t)/∂t + ∇E(x, t) · v(x, t) )² dx + α ∫_Ω f²(v) dx,   (4)

where f is a smoothing function and α a parameter to be fixed. In the original work, the authors proposed a first-order spatial smoothing term f(v) = ‖∇u‖² + ‖∇v‖² that assumes spatial coherency of the motion field. This penalty term is based on the idea that all points of the same rigid object have a similar motion. Unfortunately, in the presence of discontinuities, this term is not adapted, as it smooths out the frontiers. Over the last two decades, many authors have proposed regularizations of the type f(v) = g(‖∇u‖) + g(‖∇v‖), where g is a robust function to be defined. Its objective is to extract a piecewise smooth motion field while preserving the discontinuities (see for instance [25, 26]).
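As an illustration of this classical formulation (not of the div-curl estimator we actually use, which is introduced in Section 3), here is a minimal sketch of the standard Horn & Schunck fixed-point iteration for minimizing (4) with the first-order smoothing term; the derivative filters and default parameter values are illustrative assumptions.

import numpy as np
from scipy.ndimage import convolve

def horn_schunck(E0, E1, alpha=250.0, n_iter=200):
    """Minimal Horn & Schunck flow between two grayscale frames,
    minimizing the ofce data term plus a first-order smoothing (Eq. 4)."""
    E0 = np.asarray(E0, dtype=float)
    E1 = np.asarray(E1, dtype=float)
    kx = np.array([[-0.25, 0.25], [-0.25, 0.25]])
    ky = np.array([[-0.25, -0.25], [0.25, 0.25]])
    kt = np.full((2, 2), 0.25)
    # Spatio-temporal derivatives of the brightness function E.
    Ex = convolve(E0, kx) + convolve(E1, kx)
    Ey = convolve(E0, ky) + convolve(E1, ky)
    Et = convolve(E1, kt) - convolve(E0, kt)
    u = np.zeros_like(E0)
    v = np.zeros_like(E0)
    avg = np.array([[1., 2., 1.], [2., 0., 2.], [1., 2., 1.]]) / 12.0
    for _ in range(n_iter):
        u_bar, v_bar = convolve(u, avg), convolve(v, avg)
        # Fixed-point update derived from the Euler-Lagrange equations of (4).
        common = (Ex * u_bar + Ey * v_bar + Et) / (alpha**2 + Ex**2 + Ey**2)
        u = u_bar - Ex * common
        v = v_bar - Ey * common
    return u, v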

3. CROWD ESTIMATION AND REPRESENTATION

In this section we present our motion estimation technique along with the necessary post-processing treatments that enhance both the quality of the signal and its compactness.

3.1 Motion estimation

It has been shown in [21] that a penalty term based on a first-order smoothing f(v) = ‖∇u‖² + ‖∇v‖² is equivalent,

from a minimization point of view, to the minimization of:

∫_Ω ( div²(v) + curl²(v) ) dx   (5)

where div(v) = ∂u/∂x + ∂v/∂y and curl(v) = ∂v/∂x − ∂u/∂y are respectively the divergence and the vorticity of the flow. Therefore, a classic smoothing term penalizes areas that exhibit high values of divergence and/or vorticity. Such motion descriptors come from fluid mechanics and are also very useful to characterize a crowd phenomenon: the divergence represents a dispersion while the vorticity is linked to walking around an obstacle. Some studies have actually shown that a sufficiently dense crowd sometimes behaves in ways that can be explained by fluid mechanics laws [16]. It is then of primary interest to integrate such prior knowledge into the optical flow to obtain a technique devoted to crowd motion. In particular, it is important to measure the divergence and the vorticity precisely from real data to perform an accurate synthesis of the observed phenomenon. In this paper, we therefore prefer to rely on a div-curl smoothing term that extracts those two components of the flow more precisely [21]. The new smoothing function f is then f(v) = ‖∇div v‖² + ‖∇curl v‖². Such a penalization tries to preserve areas that contain high values of divergence and/or vorticity.
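These two descriptors are easily computed from a discrete field with finite differences; here is a minimal numpy sketch (the regular grid with unit spacing and x along columns are our assumptions):

import numpy as np

def div_curl(u, v):
    """Divergence and vorticity maps of a 2D velocity field (u, v)
    sampled on a regular grid (unit spacing, x along columns)."""
    du_dy, du_dx = np.gradient(u)   # centered finite differences
    dv_dy, dv_dx = np.gradient(v)
    div = du_dx + dv_dy             # dispersion of the crowd
    curl = dv_dx - du_dy            # rotation, e.g. around an obstacle
    return div, curl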

3.2 Motion post-processing

In practice, real crowd sequences are composed of a high number of images (between 250 (≈ 10 s) and 1500 (≈ 1 min)). This results in a large number of instantaneous dense motion fields. Under the assumption that the pedestrians are "tracers" of the underlying flow that transports them, the information contained in all the instantaneous motion fields is redundant. Moreover, some locations in the images are pedestrian-free (only the background is visible); in the time series this results in noisy areas where the corresponding displacements are computed only with the help of the spatial regularization. In such situations, from one image to the next, there is no guarantee that the flow is consistent from a temporal point of view. This results in noisy time series, as depicted in the green signal of Figure 5. According to these remarks, it is then of primary interest to i) reduce this huge amount of data and ii) de-noise the time series. To that end, we suggest exploiting some properties of the Fourier transform.

Fourier transform: Following the Fourier transform, any discrete signal of size T can be represented as a series of T frequency components of increasing frequency. The lowest frequency represents the mean of the signal; the lower frequencies represent its slow variations; higher frequencies are linked to fast variations, whereas the very highest frequencies represent noise. One can therefore de-noise a signal by reconstructing it without its highest frequencies. Concerning the reduction of dimension, Figure 2 represents the reconstruction error of a complete sequence of motion fields as a function of the number of frequencies used to reconstruct it. It shows that if we keep only a small number of frequencies (less than 10%), the reconstruction is fairly accurate.
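A minimal sketch of this de-noising/reduction step, assuming the time series is stored as a (T, H, W, 2) numpy array (our layout, not the paper's notation): only the lowest temporal frequencies are kept, which both de-noises the series and compresses it to roughly keep_ratio of the original coefficients.

import numpy as np

def truncate_spectrum(fields, keep_ratio=0.1):
    """Keep only the lowest temporal frequencies of a (T, H, W, 2)
    time series of motion fields; returns the de-noised reconstruction
    and the compact spectrum actually worth storing."""
    T = fields.shape[0]
    k = max(1, int(keep_ratio * T))
    spectrum = np.fft.rfft(fields, axis=0)   # real FFT along time
    spectrum[k:] = 0.0                       # drop high frequencies (noise)
    denoised = np.fft.irfft(spectrum, n=T, axis=0)
    return denoised, spectrum[:k]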

4. DATA-DRIVEN ANIMATION OF CROWDS

Once the time series of motion fields has been computed, it is possible to consider this information as input data for

Figure 2: Reconstruction error. In this example, the original signal contains 641 frequencies. The reconstruction error decreases continuously; with 50 frequencies (less than 8%), the error becomes lower than 5%.

a data-driven crowd animation system. We first present in this section a simple model to compute crowd motions from these time series (motion synthesis). We then exhibit two possible techniques to edit/modify those data (motion editing).

4.1 Motion synthesis

Let us first recall that the computed velocities correspond to velocities in image space, while our goal is to animate pedestrians in the virtual world space. Given the position of such a person in the virtual world, it is possible to get the corresponding position in the image frame through a camera projection model. Parameters for this projection can be obtained exactly through camera calibration. As an approximation of this model, we have considered a simple orthographic projection in the experiments presented in the results section. This assumption holds whenever the camera is sufficiently far away from the scene. Once this projection has been defined, animating the individuals which constitute the crowd amounts to solving the following classical differential equation (with x(t) the position of a person in the image frame at time t):

∂x/∂t = v(x(t), t)   (6)

equipped with the appropriate initial condition x(0) = x₀, which stands for the initial position of the individual in the flow field. In our framework we have used the classical 4th-order Runge-Kutta integration scheme, which computes a new position x(t + 1) for a given fixed time step with acceptable accuracy. This new position is then projected back into the virtual world frame. This process is depicted in Figure 3.
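A minimal sketch of one such integration step, assuming bilinear interpolation of the field at sub-pixel positions (scipy's map_coordinates), velocity components stored in (row, col) order, and a field held piecewise constant between frames; all of these are our implementation choices, not details given in the paper.

import numpy as np
from scipy.ndimage import map_coordinates

def sample(field, x):
    """Bilinearly interpolate a (H, W, 2) field at N positions x
    given as (row, col) coordinates, shape (N, 2)."""
    r = map_coordinates(field[..., 0], x.T, order=1, mode='nearest')
    c = map_coordinates(field[..., 1], x.T, order=1, mode='nearest')
    return np.stack([r, c], axis=1)

def rk4_step(fields, x, t, dt=1.0):
    """Advance pedestrian positions x through Eq. (6) with the
    classical 4th-order Runge-Kutta scheme; fields is (T, H, W, 2)."""
    f = lambda ti, xi: sample(fields[min(int(ti), len(fields) - 1)], xi)
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt / 2 * k1)
    k3 = f(t + dt / 2, x + dt / 2 * k2)
    k4 = f(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)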

4.2 Motion editing

Continuous crowd motion. Assuming that only a short video sequence of a crowd is available, we present here a technique to generate a continuous and aperiodic crowd animation. Following the work on video textures [28] (which was later used in the motion graph techniques for body motions [29]), the idea is to generate transitions

Figure 3: Motion synthesis from the flow field. The position of the crowd member is projected onto the flow (step 1), the integration is performed in the image frame (step 2), and the new position is projected back into the virtual world frame (step 3).

between different instants in the vector field time series, based on a similarity criterion. In order to compare different instantaneous velocity fields we propose to use, as suggested in [?], the angular error:

d(U, V) = ∫_Ω arccos( ⟨u(x), v(x)⟩ / (‖u(x)‖ ‖v(x)‖) ) dx   (7)

where U and V are two vector fields of the same dimensions, and u(x) = (U(x), 1)^T and v(x) = (V(x), 1)^T correspond to the homogeneous coordinates of the fields' vector elements (vectors of dimension 3). The rest of the procedure is identical to [28]: the distance matrix is filtered with an n-tap filter that magnifies similarities over a small time window. From this matrix, transition probabilities from one instantaneous field to another are derived. Figure 4 shows an example of such matrices built from a three-second crowd video.

Figure 4: Distance matrices. (a): distance matrix using an angular distance between each vector field in the time series; (b): matrix filtered with an 8-tap filter.
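A minimal sketch of the construction and diagonal filtering of this matrix (the uniform tap weights are our simplification; [28] uses a weighted window):

import numpy as np

def angular_distance(U, V):
    """Angular error (Eq. 7) between two (H, W, 2) vector fields,
    compared in homogeneous coordinates (third component set to 1)."""
    ones = np.ones(U.shape[:2] + (1,))
    u = np.concatenate([U, ones], axis=-1)
    v = np.concatenate([V, ones], axis=-1)
    cos = (u * v).sum(-1) / (np.linalg.norm(u, axis=-1)
                             * np.linalg.norm(v, axis=-1))
    return np.arccos(np.clip(cos, -1.0, 1.0)).sum()

def filtered_distance_matrix(fields, taps=8):
    """Pairwise distances between all instantaneous fields, then
    n-tap filtering along the diagonals, which magnifies similarity
    over a small time window as in video textures [28]."""
    T = len(fields)
    D = np.array([[angular_distance(fields[i], fields[j])
                   for j in range(T)] for i in range(T)])
    Df = np.full_like(D, np.inf)
    for i in range(T - taps + 1):
        for j in range(T - taps + 1):
            Df[i, j] = D[i:i + taps, j:j + taps].diagonal().mean()
    return Df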

Perspectives. There exists a variety of potential editing operations for crowd motions. As perspectives for this work we sketch two of them:

• Changing the crowd style. Using the Fourier transform may also help in separating different patterns in the crowd behavior, depending on the classes of frequencies used for the reconstruction. A signal composed of only a few low-frequency coefficients actually corresponds to a stable and regular behavior. Conversely, manually adjusting the scale of the high frequencies, for instance, is likely to produce a more "nervous" pattern in the crowd motions, thus giving the animator potential control over the crowd "style" (see the sketch after this list). Examples of reconstruction using either low or high frequencies are given in Figure 5.

• Tiling crowd patterns. Given several available motion time series, one could envisage combining them spatially in the same virtual environment (an entrance and an obstacle acquired separately, for example). The problem here is to define a proper juxtaposition or fusion tool that preserves the interesting features of the flow. Let us finally note that this registration can also be done in the time domain (how to blend two series together?). Dynamic Time Warping techniques stand as good candidates for these purposes.
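As a sketch of the first operation, and with the same assumed (T, H, W, 2) layout as in Section 3.2, the animator could simply reweight the high-frequency band before reconstruction; the function and parameter names are hypothetical:

import numpy as np

def restyle(fields, cutoff=0.1, gain=2.0):
    """Hypothetical 'crowd style' edit: amplify (gain > 1) or damp
    (gain < 1) the temporal frequencies above a relative cutoff,
    then reconstruct the time series of motion fields."""
    T = fields.shape[0]
    k = max(1, int(cutoff * T))
    spectrum = np.fft.rfft(fields, axis=0)
    spectrum[k:] *= gain      # gain > 1 yields a more "nervous" crowd
    return np.fft.irfft(spectrum, n=T, axis=0)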

Figure 5: Illustration of the Fourier transform. Dashed line: an initial time series; gray line: its mean value (reconstruction with one frequency); black signal: reconstruction with 3 frequencies, where the resulting filtered time series is smooth and corresponds to a quiet displacement; gray signal: a de-noised signal (very high frequencies removed) reconstructed with only high frequencies, which is linked to a stressed behavior. The black ellipses are some examples of pedestrian-free areas.

5. RESULTS

Our approach was first tested on a synthetic crowd sequence to validate the theoretical part of our work. We have also used real sequences to handle real cases. Those results are presented in this section. Concerning the post-processing, each time series of motion fields has been represented with only 10% of its lowest frequencies. This choice is based on the graph of Figure 2.

    5.1 Synthetic example

The synthetic sequence represents a continuous flow of human beings with an obstacle (a cylinder named C) in the middle of the image. It has been generated using the classical Helbing simulation model [9].

Motion estimation process: In this situation, the area C is subject to the aperture problem: any vector inside C is an equally plausible candidate. The motion estimation technique being equipped with a spatial regularization, we need to apply a specific process to prevent incoherent motions from occurring inside the cylinder. To that end, we completely blurred this area from one image to the next, so that the ofce constraint is verified nowhere inside C. We applied a robust M-estimator to the ofce (first term of relation (4)) instead of a quadratic penalization. Such functions, arising from robust statistics [?], limit the impact of the locations where the brightness constancy assumption does not hold. As a consequence, this area is not taken into account by the observation term of the estimation process. The motion fields estimated outside the cylinder are then not disturbed by the ones inside C. This is illustrated in Figure 6. We present 4 images of the sequence in Figure 6(a–d); an estimated motion field is depicted in Figure 6(e), and a zoom on the cylinder area with and without the specific treatment is presented in Figures 6(f) and 6(g) respectively. Concerning the smoothing parameter α, its value was set to 250 (this corresponds to the dynamics of the brightness function).

Animation: Some images of the synthesized crowd animation are shown in Figure 6(i–l). The animation was generated with a Maya plugin which defines a crowd as a set of particles and performs the synthesis described in Section 4. As expected, the virtual crowd moves in accordance with the underlying motion and the obstacle is correctly managed. This first example proves the ability of the proposed approach to synthesize a coherent motion from an estimated motion field. Let us now apply this technique to real data.

5.2 Real data

We present the results obtained on two real sequences. Both were acquired with a simple video camera with an MPEG encoder. The motion estimation approach was applied without any particular processing, with a smoothing parameter α = 250.

Strike sequence: The first real sequence is a video representing a strike. All pedestrians are walking in the same direction. Two images of the sequence can be seen in Figure 7 (a-b). In Figure 7 (c-d), we present the synthetic crowd animation obtained, superimposed on the estimated motion field. One can observe that the resulting crowd animation respects the initial, yet simple, pedestrian behaviors. The real scene can then be reproduced with accuracy while maintaining, despite the regularity of the flow, a particular diversity in the pedestrian trajectories.

Entrance sequence: The second real sequence shows a crowd entering a stadium. This example is very worthwhile since a variety of phenomena are present: a continuous flow at the beginning, followed by a compression of some people in the left part of the images. In addition, the limit of the door is an obstacle that creates two opposite fluxes and generates a vortex in the motion fields. Four images of the sequence are displayed in Figures 7 (e–h).

The animation is represented in Figures 7 (i–l). Figures 7 (m–p) focus on the region that exhibits opposite motions.

Figure 6: Experimental results on synthetic data. (a,b,c,d): Sequence #1, simulation of a crowd flow with a cylindrical obstacle; (e): the estimated motion field; (f): motion near the cylinder estimated with special care for this no-data area; (g): same as (f) but without the specific treatment for the cylinder (one can see that the motion near the cylinder in (g) is not totally coherent); (i,j,k,l): images of the simulated crowd with the synthetic flow at t = 1, 50, 100, 150.

One can see that this complex behavior has been correctly captured and re-synthesized. This is very encouraging regarding the possibilities of this approach to manage complex flows. The next step of the process will be to extract the different behaviors (compression, rotation for instance) and to synthesize them independently.

Quantitative values: Table 1 shows some details on the size, the computation time (measured on a 2 GHz PC with 1 GB of RAM) and the quality of the compression for the three sequences. We also give the space required by the whole motion field and by the compressed one. The animations were generated at interactive framerates (≈ 60 frames per second). Our approach is thus suitable for real-time applications and does not require a lot of memory.

5.3 Discussion

Our technique has been applied with success to reproduce the observed scenes. Nevertheless, we have identified two limitations of our approach:

• the quality of the generated animation is linked to the initial density of the crowd members. In this sense, it is the role of the animator to design an initial crowded situation that is similar to the video conditions, though

                            seq #1          seq #2          seq #3
  Size (lin, col) × t       (80, 60) × 214  (80, 60) × 274  (80, 80) × 641
  CPU time (estimation)     3 min 35 s      4 min 35 s      ≈ 11 min
  Uncompressed field size   19.2 MB         24.7 MB         76.4 MB
  Compressed size           1.92 MB         2.47 MB         7.64 MB
  Information retained      93.4%           97.3%           97.5%

Table 1: Numerical aspects. 1st row: size of the sequences (lin, col) × t; 2nd row: CPU time for the motion estimation; 3rd row: uncompressed size of the motion fields; 4th row: compressed size; 5th row: percentage of the information retained in the compressed field.

a certain latitude is possible. This point requires, in our opinion, more investigation;

• this analysis/synthesis scheme may not keep people from colliding with each other. This point could be solved by feeding the a priori information acquired by motion estimation as an input parameter into a classical dynamic system such as Helbing's model [9] or the more recent Continuum Crowds model [17]. Those aspects are part of our future work.

    6. CONCLUSION

Figure 7: Experimental results on real data. (a,b): Sequence #2, strike video taken from above; (c,d): images of the resulting animation; (e,f,g,h): Sequence #3, video of a crowded entrance; (i,j,k,l): images of the resulting animation; (m,n,o,p): close-up on a remarkable zone where two opposite fluxes of people are juxtaposed.

In this paper an extension of the concept of motion capture to the domain of crowd animation was proposed, based on the analysis of video sequences of crowds. Our framework relies on i) a specific motion estimation process applied to the video images and ii) a data-driven animation scheme that generates crowd animations from this input information. We applied the presented method to both synthetic and real examples. In our opinion the results are convincing, and we believe that this method could be used efficiently by animators to produce realistic crowd effects from sample situations. We also think that this approach can be used to compare the results of simulation models to real situations according to their underlying flows, thus providing a quantitative evaluation framework (rather than the traditional qualitative one). Motivated by our experiments, three directions are envisaged to continue this work:

• concerning the analysis part, the motion estimation process can be improved by introducing more specific spatio-temporal models of continuous crowd formulations like [16]. This can be done through the optical flow framework, but it may also be relevant to explore the data-assimilation techniques used for instance in meteorology [30],

• concerning the synthesis part, it is possible to consider more sophisticated animation systems by introducing the a priori information obtained from the analysis part into dynamic systems such as [9, 17],

• finally, we plan to continue the analogy with motion capture and propose new editing methods that will make it possible to slightly change/control the initial captured motions.

7. ACKNOWLEDGMENTS

The authors wish to thank Amel Achour and Gwendal Jabot for their contributions to the project, and also the reviewers for pointing out missing references in our work.

8. REFERENCES

[1] J. Chai, J. Xiao, and J. Hodgins. Vision-based control of 3D facial animation. In Eurographics/ACM SIGGRAPH Symposium on Computer Animation, pages 79–87, Grenoble, France, August 2003.

[2] K. Bhat, S. Seitz, J. Hodgins, and P. Khosla. Flow-based video synthesis and editing. ACM Transactions on Graphics, special issue, Proceedings ACM SIGGRAPH 2004, 23(3):360–363, 2004.

[3] L. Favreau, L. Reveret, C. Depraz, and M.-P. Cani. Animal gaits from video. In Eurographics/ACM SIGGRAPH Symposium on Computer Animation (SCA'04), Grenoble, France, August 2004.

[4] D. Gibson, D. Oziem, C. Dalton, and N. Campbell. Capture and synthesis of insect motion. In Eurographics/ACM SIGGRAPH Symposium on Computer Animation (SCA'05), pages 39–48, July 2005.

[5] J. Diener, L. Reveret, and E. Fiume. Hierarchical retargetting of 2D motion fields to the animation of 3D plant models. In Eurographics/ACM SIGGRAPH Symposium on Computer Animation (SCA'06), Vienna, Austria, August 2006.

[6] S. Chenney. Flow tiles. In Eurographics/ACM SIGGRAPH Symposium on Computer Animation (SCA'04), pages 233–242, Grenoble, France, August 2004.

[7] E. Bouvier, E. Cohen, and L. Najman. From crowd simulation to airbag deployment: particle systems, a new paradigm of simulation. Journal of Electronic Imaging, 6(1):94–107, January 1997.

[8] V. Blue. Cellular automata microsimulation for modeling bi-directional pedestrian walkways. Transportation Research, Part B: Methodological, 35(3):293–312, March 2001.

[9] D. Helbing, I. Farkas, and T. Vicsek. Simulating dynamical features of escape panic. Nature, 407:487–490, 2000.

[10] A. Braun and S. Raupp Musse. Modeling individual behaviors in crowd simulation. In Proc. of Computer Animation and Social Agents (CASA'03), New Jersey, USA, May 2003.

[11] N. Courty and S. Musse. Simulation of large crowds including gaseous phenomena. In Proc. of IEEE Computer Graphics International 2005, pages 206–212, New York, USA, June 2005.

[12] S. Raupp Musse and D. Thalmann. Hierarchical model for real time simulation of virtual human crowds. IEEE Trans. on Visualization and Computer Graphics, 7(2):152–164, 2001.

[13] M. Sung, M. Gleicher, and S. Chenney. Scalable behaviors for crowd simulation. Comput. Graph. Forum, 23(3):519–528, 2004.

[14] T. Sakuma, T. Mukai, and S. Kuriyama. Psychological model for animating crowded pedestrians. Journal of Visualization and Computer Animation, 16(3-4):343–351, 2005.

[15] W. Shao and D. Terzopoulos. Animating autonomous pedestrians. In Proc. SIGGRAPH/EG Symposium on Computer Animation (SCA'05), pages 19–28, Los Angeles, CA, July 2005.

[16] R. L. Hughes. The flow of human crowds. Annual Review of Fluid Mechanics, 35:169–182, 2003.

[17] A. Treuille, S. Cooper, and Z. Popovic. Continuum crowds. ACM Transactions on Graphics, special issue, Proceedings ACM SIGGRAPH 2006, 25(3):1160–1168, 2006.

[18] D. Crisan. Particle filters, a theoretical perspective. In A. Doucet, N. de Freitas, and N. Gordon, editors, Sequential Monte Carlo Methods in Practice. Springer, 2001.

[19] E. Andrade, S. Blunsden, and R. Fisher. Modelling crowd scenes for event detection. In Int. Conf. on Pattern Recognition, ICPR 2006, pages 175–178, 2006.

[20] A. Mitiche and P. Bouthemy. Computation and analysis of image motion: a synopsis of current problems and methods. Int. J. Computer Vision, 19(1):29–55, 1996.

[21] Fluid experimental flow estimation based on an optical-flow scheme. Experiments in Fluids, 40(1):80–97, 2006.

[22] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proc. Seventh International Joint Conference on Artificial Intelligence, pages 674–679, Vancouver, Canada, 1981.

[23] B.D. Lucas. Generalized image matching by the method of differences. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1984.

[24] B. Horn and B. Schunck. Determining optical flow. Artificial Intelligence, 17:185–203, 1981.

[25] M. Black. Recursive non-linear estimation of discontinuous flow fields. In Proc. Europ. Conf. Computer Vision, pages 138–145, Stockholm, Sweden, 1994.

[26] E. Mémin and P. Pérez. Dense estimation and object-based segmentation of the optical flow with robust techniques. IEEE Trans. Image Processing, 7(5):703–719, 1998.

[27] T. Corpetti, E. Mémin, and P. Pérez. Dense estimation of fluid flows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):365–380, March 2002.

[28] A. Schödl, R. Szeliski, D. Salesin, and I. Essa. Video textures. ACM Transactions on Graphics, special issue, Proceedings ACM SIGGRAPH 2000, pages 489–498, 2000.

[29] L. Kovar, M. Gleicher, and F. Pighin. Motion graphs. ACM Transactions on Graphics, special issue, Proceedings ACM SIGGRAPH 2002, 21(3):473–482, 2002.

[30] F.X. Le Dimet and O. Talagrand. Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects. Tellus, 38A:97–110, 1986.