6
Online Marker Labeling for Fully Automatic Skeleton Tracking in Optical Motion Capture Johannes Meyer Markus Kuderer org M¨ uller Wolfram Burgard Abstract— Methods to accurately capture the motion of humans in motion capture systems from optical markers are important for a large variety of applications including animation, interaction, orthopedics, and rehabilitation. Major challenges in this context are to associate the observed markers with skeleton segments, to track markers between consecutive frames, and to estimate the underlying skeleton configuration for each frame. Existing solutions to this problem often assume fully labeled markers, which usually requires labor-intensive manual labeling, especially when markers are temporally oc- cluded during the movements. In this paper, we propose a fully automated method to initialize and track the skeleton configuration of humans from optical motion capture data without the need of any user intervention. Our method applies a flexible T-pose-based initialization that works with a wide range of marker placements, robustly estimates the skeleton configuration through least-squares optimization, and exploits the skeleton structure for fully automatic marker labeling. We demonstrate the capabilities of our approach for online skeleton tracking and show that our method outperforms solutions that are widely used and considered as state of the art. I. I NTRODUCTION Recently, methods to accurately capture the motion of people gained increasing interest for variety of applications including interaction, animation, orthopedics, and rehabilita- tion. The advantages of marker-based optical motion capture systems are that they provide automatic calibration proce- dures and precise position information about the markers at a high frame rate. Compared to markerless approaches, marker-based methods are typically more accurate and at the same time are more robust against occlusions [14]. Accordingly, they are perfectly suited to accurately capture even fast movements of people. However, inferring the body pose in terms of the skeleton configuration, i.e., the global pose and the joint angles of the underlying skeleton, from raw 3D marker position data is a challenging task. Although the camera system provides accurate 3D marker positions, their association to the individual markers placed on the person is initially unknown. This data association problem is called labeling and is usually solved by analyzing the geometric structure of the set of detected 3D marker positions, which requires tedious and time-consuming manual work even with state- of-the-art software. Furthermore, markers are occasionally occluded by parts of the body or objects around the tracked person, which makes the labeling of the remaining and All authors are with the Department of Computer Science at the Univer- sity of Freiburg, Germany. This work has partially been partly supported by the German Research Foundation under grant number EXC 1086 and the Research Training Group 1103. Fig. 1. Skeleton pose estimation. Top: Human with passive optical markers attached to the body. Bottom: Observed markers and the skeleton configuration as estimated by our fully automated method. especially the re-appearing markers even more challenging. Finally, given the labeling of the markers, inferring the skeleton configuration requires to take into account that the markers are only attached to the skin or to clothes. Hence, the skin movement causes the markers to slightly move with respect to the bones during the activity, so that the skeleton configuration needs to be inferred in a robust way. In this paper, we propose a novel fully automatic skeleton tracking technique. Our method is flexible and includes the labeling of passive markers, which can be placed on arbitrary positions on the human body, without any manual effort. We propose to use a parameterized standard human skeleton model, which we automatically adapt to humans of different size, and assume that each marker is attached to one of the limbs of this skeleton. Our approach to skeleton tracking jointly estimates the labeling of the observed markers and the skeleton configuration. In contrast to other approaches, our method exploits the information about the human skeleton during the labeling step and therefore provides a more informed data association. Our approach initializes the skeleton tracking based on a T-pose executed by the person being tracked (see Fig. 1). Our method uses this initialization step to scale the skeleton to the person’s size and aligns the skeleton to the person’s limbs. During tracking, it then in an alternating fashion labels the observed markers and optimizes the skeleton configuration in an expectation-maximization-like procedure [6]. Thereby, it exploits both the marker positions of the most recent frame and the current skeleton configuration to obtain a consistent

Online Marker Labeling for Fully ... - uni-freiburg.deais.informatik.uni-freiburg.de/publications/papers/meyer14icra.pdf · tedious and time-consuming manual work even with state-of-the-art

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Online Marker Labeling for Fully ... - uni-freiburg.deais.informatik.uni-freiburg.de/publications/papers/meyer14icra.pdf · tedious and time-consuming manual work even with state-of-the-art

Online Marker Labeling for Fully Automatic Skeleton Trackingin Optical Motion Capture

Johannes Meyer Markus Kuderer Jorg Muller Wolfram Burgard

Abstract— Methods to accurately capture the motion ofhumans in motion capture systems from optical markersare important for a large variety of applications includinganimation, interaction, orthopedics, and rehabilitation. Majorchallenges in this context are to associate the observed markerswith skeleton segments, to track markers between consecutiveframes, and to estimate the underlying skeleton configurationfor each frame. Existing solutions to this problem often assumefully labeled markers, which usually requires labor-intensivemanual labeling, especially when markers are temporally oc-cluded during the movements. In this paper, we propose afully automated method to initialize and track the skeletonconfiguration of humans from optical motion capture datawithout the need of any user intervention. Our method appliesa flexible T-pose-based initialization that works with a widerange of marker placements, robustly estimates the skeletonconfiguration through least-squares optimization, and exploitsthe skeleton structure for fully automatic marker labeling. Wedemonstrate the capabilities of our approach for online skeletontracking and show that our method outperforms solutions thatare widely used and considered as state of the art.

I. INTRODUCTION

Recently, methods to accurately capture the motion ofpeople gained increasing interest for variety of applicationsincluding interaction, animation, orthopedics, and rehabilita-tion. The advantages of marker-based optical motion capturesystems are that they provide automatic calibration proce-dures and precise position information about the markersat a high frame rate. Compared to markerless approaches,marker-based methods are typically more accurate and atthe same time are more robust against occlusions [14].Accordingly, they are perfectly suited to accurately captureeven fast movements of people.

However, inferring the body pose in terms of the skeletonconfiguration, i.e., the global pose and the joint anglesof the underlying skeleton, from raw 3D marker positiondata is a challenging task. Although the camera systemprovides accurate 3D marker positions, their associationto the individual markers placed on the person is initiallyunknown. This data association problem is called labelingand is usually solved by analyzing the geometric structureof the set of detected 3D marker positions, which requirestedious and time-consuming manual work even with state-of-the-art software. Furthermore, markers are occasionallyoccluded by parts of the body or objects around the trackedperson, which makes the labeling of the remaining and

All authors are with the Department of Computer Science at the Univer-sity of Freiburg, Germany. This work has partially been partly supported bythe German Research Foundation under grant number EXC 1086 and theResearch Training Group 1103.

Fig. 1. Skeleton pose estimation. Top: Human with passive opticalmarkers attached to the body. Bottom: Observed markers and the skeletonconfiguration as estimated by our fully automated method.

especially the re-appearing markers even more challenging.Finally, given the labeling of the markers, inferring theskeleton configuration requires to take into account that themarkers are only attached to the skin or to clothes. Hence,the skin movement causes the markers to slightly move withrespect to the bones during the activity, so that the skeletonconfiguration needs to be inferred in a robust way.

In this paper, we propose a novel fully automatic skeletontracking technique. Our method is flexible and includes thelabeling of passive markers, which can be placed on arbitrarypositions on the human body, without any manual effort.We propose to use a parameterized standard human skeletonmodel, which we automatically adapt to humans of differentsize, and assume that each marker is attached to one of thelimbs of this skeleton. Our approach to skeleton trackingjointly estimates the labeling of the observed markers and theskeleton configuration. In contrast to other approaches, ourmethod exploits the information about the human skeletonduring the labeling step and therefore provides a moreinformed data association.

Our approach initializes the skeleton tracking based on aT-pose executed by the person being tracked (see Fig. 1). Ourmethod uses this initialization step to scale the skeleton to theperson’s size and aligns the skeleton to the person’s limbs.During tracking, it then in an alternating fashion labels theobserved markers and optimizes the skeleton configurationin an expectation-maximization-like procedure [6]. Thereby,it exploits both the marker positions of the most recent frameand the current skeleton configuration to obtain a consistent

Page 2: Online Marker Labeling for Fully ... - uni-freiburg.deais.informatik.uni-freiburg.de/publications/papers/meyer14icra.pdf · tedious and time-consuming manual work even with state-of-the-art

labeling and to reliably identify re-appearing markers thatwere temporally occluded.

We implemented our approach and evaluated it in exten-sive experiments. The results demonstrate the effectivenessand reliability of our approach and show that it outperformsCortex, a state-of-the-art commercial solution for trackingarticulated objects. Furthermore, we present results indicat-ing that our method is highly efficient and enables onlineskeleton tracking on a standard desktop computer.

II. RELATED WORK

The problem of inferring the structure and motion ofobjects based on the observation of markers attached tothese objects has been studied intensively in the past. Acommon approach is to detect connected rigid bodies andto estimate the structure of the underlying skeleton in termsof the joints connecting these segments. Ringer and Lasenby[15] cluster observed 3D markers using the variance of thepairwise distance over all frames. Given the association of allmarkers to the segments, they determine their offsets, similarto our work. Similarly, de Aguiar et al. [5] and Kirk et al.[10] propose a method to cluster the observed markers torigid bodies and estimate the center of rotation between theresulting limb segments.

Several authors estimate the joint rotation centers and thejoint angles of a human skeleton from clusters of labeled2D and 3D marker positions while taking into accountskin movement artifacts [1, 2, 11]. Cerveri et al. [2] andKlous and Klous [11] even infer the skeleton structure andits configuration by calculating a statistics over all frames,which allows only offline processing of the captured data. Incontrast, the method proposed by Aristidou and Lasenby [1]as well as our approach are able to track the skeleton onlineframe by frame without observing all frames first.

In many applications, for example when tracking humans,the underlying skeleton structure is known in advance. Inparticular, Contini [3] describes a standard human skeletonmodel, which we also use in the work described here. Zordanand Van Der Horst [18] propose a force based approach to fitsuch a human skeleton model to observed markers. Similarto our work, Xiang et al. [16] estimate the configuration of ahuman skeleton and utilize the knowledge about joint limits.One can either use medical studies [7] to determine thesejoint limits or learn them for individual humans by observingtheir motions [9].

All the approaches described above assume known markerlabels to estimate the underlying skeleton model, or theysolve the marker labeling independently of the skeletonestimation process. In contrast to these methods, we proposea marker labeling technique that exploits the knowledgeabout the current skeleton configuration. Similarly, Herdaet al. [8] present a method to estimate the labeling of themarkers in each frame and Yu et al. [17] aim to track multipletargets by fitting rigid bodies into the observed point cloud.Both approaches are able to deal with occluded markers butthey require to manually label the markers in the first frame.

Fig. 3. This figure illustrates two linked joints si and its parent segmentp(si), their local coordinate systems and the transformation between the twosegments. Furthermore, it illustrates an observed point and its normalizedprojection and distance to the segment si, which is used to compute thelikelihood for the assignment of the observed marker to a segment.

In contrast, our approach performs labeling and skeleton poseestimation automatically without any user assistance.

In addition to the academic approaches described above,there are commercial solutions available for tracking articu-lated objects. We compare our work to Cortex developedby Motion Analysis [4] that provides online labeling ofobjects after a prior manual initialization and model trainingprocedure. However, as opposed to our method, Cortexentirely separates the labeling from the skeleton tracking andtherefore requires tedious manual post-processing to achieveresults that are comparable to those obtained with our system.

III. BASICS

The goal of this section is to give a formal problemdefinition and to present the foundations of online marker-based skeleton tracking.

A. Problem Definition

At each discrete time step t, we assume to receive a frameof data Ft that is a set of unlabeled 3D points {oi,t}. Eachpoint oi,t ∈ Ft is the observed 3D position of a markerattached to one of the limbs of a person. The goal of ourmethod is to estimate the skeleton configuration C for eachframe, i.e., to estimate the global pose and the joint angles ofthe underlying skeleton. In particular, we use a human skele-ton model that consists of 14 connected segments si ∈ Shaving overall 45 degrees of freedom and that is based onmedical data [3]. However, the 3D observations are subjectto measurement noise and the markers are usually placedon the skin. During the movements of the body this inducesnon-deterministic variations of 3D position of the markersalso with respect to the segments of the skeleton. Thereforewe consider the full posterior probability p(C | F ) andestimate the maximum likelihood skeleton configuration ofthe skeleton configuration given the observations. This entailsto label the observed points, i.e., to find the associationfunction χt : Ft → M that assigns each point to a markerlabel mj ∈ M , which is one of the key challenges of themethod presented in this paper.

B. Skeleton Model

In this work, we use a hierarchical skeleton model, wherethe pose of each segment si is defined by its local pose withrespect to the coordinate system of its parent segment p(si).In particular, we denote the quaternion that describes the

Page 3: Online Marker Labeling for Fully ... - uni-freiburg.deais.informatik.uni-freiburg.de/publications/papers/meyer14icra.pdf · tedious and time-consuming manual work even with state-of-the-art

Initialization

Labeling (based on preceding frame)

Optimization

EM

Labeling (based on skeleton)

Optimization

Framet→

t+1

T-pose

Resize skeleton model

Extract arms and legs

Fit skeleton to principal axes

Assign markers to segments

otop

First principalaxes of limbs

h

Fig. 2. Overview of the proposed method for automated skeleton estimation. Based on the observed markers in the first frame our method adjusts theskeleton model to the estimated height and fits the skeleton into the observed point cloud using the principal axes of arms and legs (middle and right).Furthermore, it initializes the marker labeling and their association to skeleton segments. Left: In each successive frame, most observed points are labeledbased on the preceding frame by nearest neighbor association. Repeated optimization of the skeleton configuration and association based on the currentskeleton estimate robustly labels the remaining points.

orientation of segment si in its parent’s coordinate systemat time step t as q

p(si)si (t) and the position as t

p(si)si (t),

respectively. Fig. 3 shows an example of a segment si andits parent segment p(si). Consequently, the global orientationqgsi(t) of segment si at time step t is recursively given by

qgsi(t) = qgp(si)(t)� qp(si)si (t) , (1)

where � is the quaternion product. Similarly, the globalposition tgsi(t) of segment si is recursively given by

tgsi(t) = tgp(si)(t) + qgp(si)(t)� tp(si)si (t). (2)

The position tp(si)s0 (t) = tgs0(t) and orientation q

p(si)s0 (t) =

qgs0(t) of the root segment, which is the hip segment in ourexperiments, describes the global pose of the skeleton. Thepositions t

p(si)si (t) of the remaining segments correspond to

their parent’s lengths and are therefore static and specified bythe skeleton model. As a result, the skeleton configuration Cconsists of 14 orientations that correspond to the joint anglesand the global position of the skeleton. This results in overall14 · 3 + 3 = 45 degrees of freedom of C.

We assume each marker mj to be rigidly attached toone of the segments of the skeleton and introduce thefunction ξ :M → S that maps each marker to a segment.The local position of a marker mj in the coordinate systemof its corresponding segment is denoted as p

ξ(mj)mj . Thus,

the position of marker mj in the global coordinate systemat time step t is given by

pgmj(t) = tgξ(mj)

(t) + qgξ(mj)(t)� pξ(mj)

mj. (3)

IV. PROBABILISTIC MARKER-BASED SKELETONTRACKING

In general skeleton tracking, one aims at estimating theskeleton configuration C1:t from time step 1 to t giventhe frame of unlabeled, noisy 3D observations of markersF1:t = {oi,1:t}. In particular, we consider the likelihoodL(C1:t | F1:t) of the skeleton configuration given the obser-vations. However, the association of markers to segments ξ1:tand the labeling of the observations χ1:t are latent variables

in our observation model. Thus, to compute the likelihood

L(C1:t | F1:t) = p(F1:t | C1:t)

=∑

χ1:t,ξ1:t

p(F1:t, ξ1:t, χ1:t | C1:t) (4)

we marginalize over these latent variables. Since the max-imization of Eq. (4) is infeasible in practice, we rely onthe popular EM algorithm, which iteratively determines themaximum likelihood skeleton configuration

C∗1:t = argmaxC1:t

∑χ1:t,ξ1:t

p(F1:t, ξ1:t, χ1:t | C1:t) . (5)

In particular, the EM algorithm consists of two steps. First,the E-step computes the expectation value of the latentvariables E(χ1:t, ξ1:t | C(k)

1:t , F1:t) given the configurationC

(k)1:t . Second, the M-step computes the configuration C(k+1)

1:t

that maximizes the likelihood under these expectations.However, evaluating all possible marker and segment asso-

ciations over all frames is not feasible in practice. Therefore,we propose the following approximations:

1) We assume the association ξ to be static and onlycompute it once in the initialization phase.

2) We consider online skeleton tracking and thereforerecursively compute the likelihood. Hence, we assumethe latent variables of the preceding frame to be knownso that the recursive E-step reduces to computingthe expectation value E(χt | C1:t, F1:t, χ1:t−1) and therecursive M-step computes C(k+1)

t .3) We apply the Hungarian method for optimal assign-

ment to efficiently approximate the expectation valueof the latent variables in the E-step. This maximumlikelihood assignment technique is also known as hardEM in the literature.

Fig. 2 illustrates the resulting algorithm for online skeletontracking. In the initialization phase, we determine the initialskeleton configuration as well as the association functionξ. For every incoming frame at time Ft, we first label theobservations based on the labeling of the preceding frameand optimize the skeleton configuration to obtain an initialguess for the EM skeleton estimation. We then iteratively

Page 4: Online Marker Labeling for Fully ... - uni-freiburg.deais.informatik.uni-freiburg.de/publications/papers/meyer14icra.pdf · tedious and time-consuming manual work even with state-of-the-art

find the most likely labeling χt given the current skeletonconfiguration (E-step) and optimize the configuration Ct(M-step). In the following, we describe the individual stepsof our algorithm for automatic skeleton tracking in detail.

V. T-POSE INITIALIZATION

In this section, we present the initialization step of ourmethod to estimate the skeleton configuration of a persongiven the 3D position of observed markers that are attachedto the body, as illustrated in Fig. 2.

The goal of the initialization step is to estimate the initialskeleton configuration C0, to determine the initial labelingχ0, i.e., to assign a marker label to each observed point in thefirst frame of observations F0, and to determine ξ0, i.e., toassign each marker label to one of the skeleton segments. Weassume that the person initially stands in the T-pose, in whichthe person stands on the floor, stretches both arms and legs,and holds the arms approximately horizontally sidewards asshown in Fig. 1. Furthermore, we assume that there is onemarker placed on top of the person’s head. Fig. 2 (middle)illustrates the initialization process.

First, we obtain the height of the person by extracting theuppermost marker observation. Given the height, we scalethe skeleton model, i.e., the local position vectors t

p(si)si

of all segments si ∈ S \ {s0}, to match the size of theobserved person. By considering the anatomy of humans,we identify the subset of points that belong to the legsand to the arms and compute their first principal axes, asillustrated in Fig. 2 (right). We align the skeleton model tothese axes through least-squares optimization to obtain theinitial skeleton configuration C0. Note, that our initializationmethod is robust against deviations from the perfect T-posedue to the optimization-based alignment.

We initialize the bijective association function χ0 : F0 →M by assigning each observed point oi,0 to the markermi. For each observed point oj , we compute the projectionn(oj , si) and distance d(oj , si) to the corresponding segmentsi = χ0(oj), normalized by the segment’s length and radius,respectively. Thus, as illustrated in Fig. 3, points with thevalue of n and d in the interval [0, 1] form a tube that hasapproximately the size of the corresponding body part. Wedefine the likelihood of a point oi to belong to a segment sias

L(si | oi) = f(n(oj , si)) f(d(oj , si)) , (6)

where f is a function that has high values in the interval[0, 1] and converges to 0 outside this interval. As a result,we assign high likelihood to points located inside the tubethat represents the body part corresponding to segment si.In particular we choose

f(x) = φ((x+ θ1)θ2) φ((1− x+ θ1)θ2) (7)

withφ(x) =

0.5x√1 + x2

+ 0.5. (8)

The parameters θ1 and θ2 determine the convergence prop-erties of f . Then, we assign each point to the segment it is

most likely attached to:

ξ(mj) = argmaxsj∈S

L(sj | χ−10 (mj)

). (9)

Finally, we compute the initial estimate of the relative posi-tion p

ξ(mj)mj of all markers with respect to their corresponding

segments.To resolve the ambiguity between the two possible head-

ings of the person in T-pose, we maintain both hypotheses atfirst and dismiss one hypothesis as soon as it gets unlikelydue to the joint limits of the skeleton (Sec. VI-C).

VI. JOINT LABELING AND SKELETON ESTIMATION

Skeleton tracking aims at maximizing the likelihood ofthe skeleton configuration Ct given the unlabeled, noisyobservations of markers in an online fashion. As illustratedin Fig. 2, we initialize the labeling based on the precedingframe in a nearest neighbor association. Given the initialskeleton configuration, we simultaneously perform the label-ing of marker observations and the estimation of the skeletonconfiguration through an EM procedure, which alternatelyupdates the labeling based on the skeleton configuration andestimates the skeleton configuration given the labeling.

A. Labeling based on the Preceding FrameWe initialize the labeling of every incoming frame of

observations based on the labeling and the skeleton config-uration of the preceding frame. Due to the high frame rateof motion capture systems usually most of the markers onlymove a short distance between two consecutive frames. Inour approach, we initially label these markers through nearestneighbor association given the preceding frame.

In particular, we optimally associate the labeled observa-tions of frame Ft−1 to the unlabeled observations of frameFt given the spacial distance as a cost function. Additionally,we define the threshold θPF

max as an upper limit for a validassociation to avoid a wrong labeling in cases where onemarker disappears and another marker appears at the sametime. We determine the optimal one-to-one assignment usingthe Hungarian method [12]. This method assigns each obser-vation of Ft to one observation of Ft−1 such that the sum ofthe distances is minimized. The resulting labeling function isχt(oi,t) = χt−1(oj,t−1) for each pair (oi,t,oj,t−1) that hasbeen assigned by the Hungarian method. Note, that we canadapt the thresholds θPF

max online for each marker individuallybased on statistics over the previous frames to account for achanging velocity of the markers.

B. Labeling based on the Skeleton ConfigurationIf a marker was occluded during some frames and reap-

pears, or if it moves more than the threshold θPFmax, we

cannot label it based on the preceding frame. However, thecurrent skeleton configuration Ct provides a prediction of theposition of each marker, given by the global marker positionspgmj

(t). According to Sec. VI-A, we use the Hungarianmethod [12] to assign all remaining observations to markersthat have not already been labeled based on the precedingframe. Here, we assign only observed points to markers ina radius of θS

max to be more robust against outliers.

Page 5: Online Marker Labeling for Fully ... - uni-freiburg.deais.informatik.uni-freiburg.de/publications/papers/meyer14icra.pdf · tedious and time-consuming manual work even with state-of-the-art

C. Skeleton Estimation

Given the full or partial labeling χt of the marker ob-servations of the current frame Ft and the relative positionpξ(mj)mj of each marker mj ∈ M with respect to its corre-

sponding segment, we estimate the skeleton configuration Ct.In particular, we estimate the maximum likelihood skeletonconfiguration of p(Ct | Ft) taking into account the uncer-tainty of observations and the joint limits of the skeleton.Due to the optical measurement process and skin effects, weassume Gaussian noise on the 3D observations of markerswith respect to the segments of the skeleton. Thus the M-step translates to optimizing the skeleton configuration withrespect to the mean squared error of the marker distancesand the joint limits of the skeleton.

For each observation oi,t, we compute the reprojection,i.e., the estimated global position given Ct, of the marker as-signed to this observation pgχt(oi,t)

(t) by evaluating Eq. (3).We aim to find the configuration of the skeleton thatminimizes the quadratic reprojection error, which is thequadratic distance between this reprojection and the actualobservation. To improve the performance of the skeletonpose estimation, we include penalties for joint configurationsthat are outside of certain limits defined by considering thenatural configurations of the skeleton, forcing for examplethe knee to move in one degree of freedom. Thus, the overalloptimization function at time step t,

f(Ct) =∑i∈It

∥∥∥pgχt(oi,t)(t)− oi,t

∥∥∥2 + l(Ct) , (10)

sums over the set It of labeled marker observations and takesinto account the joint limit cost function l(Ct).

We determine the skeleton configuration Ct using theoptimization framework g2o [13], which provides modularand flexible gradient descent optimization. Particularly, weformulate the skeleton estimation problem as a graph, inwhich the skeleton configuration is contained in a variablenode, the relative marker positions are contained in fixednodes, and the components of the error function f(Ct) arecomputed in unary and binary edges. The gradient of f(Ct)with respect to the skeleton parameterization Ct is computedby means of numerical differentiation. This optimizationprocedure usually converges after a few iterations, evenduring fast movements and when initialized with the skeletonconfiguration Ct−1 of the preceding frame.

VII. EXPERIMENTAL RESULTS

We evaluated the presented algorithm on various motioncapture recordings of different test subjects and marker sets.Each data set was recorded with a Motion Analysis motioncapture system with ten Raptor-E cameras at 100 Hz framerate. Our method was able to process data online at 100 Hzon an Intel R© CoreTM i7-2600K CPU with 3.40 GHz.

A. Initial association of markers to limbs

In a first set of experiments, we evaluated the initializationmethod (see Sec. V) that associates observed markers to theskeleton segments. Therefore, we compared the association

ObservationsOur method

Preceding frameCortex online

Cortex manual 0

10

20

30

40

0 5000 10000 15000

# O

bserv

ati

on

s

Frame

0

10

20

30

40

0 5000 10000

# O

bserv

ati

on

s

Frame

0

10

20

30

40

0 5000 10000 15000

# O

bserv

ati

on

s

Frame

0

10

20

30

40

0 5000 10000 15000

# O

bserv

ati

on

sFrame

0

10

20

30

40

0 5000 10000 15000

# O

bserv

ati

on

s

Frame

Fig. 4. The number of observed markers (solid) and the number of wronglyor not labeled markers (dashed) of our proposed method compared to thecommercially available software Cortex and to a baseline method.

of our method to manually labeled ground truth data. Formarkers that were located in the joint area between twosegments, for example the elbow or the knee, we allowedthe association to both segments. As parameter for theassociation of markers to limbs in Eq. (7) we chose θ1 = 5and θ2 = 0.2. In 24 experiments, our approach associated99.1% of overall 969 markers to the correct segment.

B. Performance of the marker labeling

A key challenge during the estimation of the skeleton poseof a human is to associate the observed points to the correctmarker labels in each frame. To evaluate and compare thelabeling performance of our method, we manually annotatedfive publicly available datasets1 of overall 10 min motioncapture data, including walking, sitting on a chair, stretchingarms and legs, and jumping. Fig. 4 shows for each dataset andfor our method as well as the following three alternatives thenumber of observed markers (Observations) and the numberof markers that were either not or falsely labeled:• Preceding frame: Our method with labeling based on

the preceding frame only.• Cortex online: Motion Analysis Cortex without manual

post processing.• Cortex manual: Motion Analysis Cortex where each

dataset was manually post-processed for 1 h.Overall, our approach was able to correctly label 99.6% of allmarkers whereas Cortex online correctly labeled only 79.8%of the data without manual post processing. After one hour

1http://www.informatik.uni-freiburg.de/∼kudererm/

Page 6: Online Marker Labeling for Fully ... - uni-freiburg.deais.informatik.uni-freiburg.de/publications/papers/meyer14icra.pdf · tedious and time-consuming manual work even with state-of-the-art

0 0.1 0.2 0.3 0.40

0.5

1

1.5

2x 10

5

Ob

serv

ati

on

s

Distance in m0 0.1 0.2 0.3 0.4

0

10

20

30

40

50

Ob

serv

ati

on

s

Distance in m

(a) correct associations (b) wrong associations

Fig. 5. Histogram of the movement of markers between two consecutiveframes for correctly associated markers (a) and incorrectly associatedmarkers (b) when assigned by the Hungarian method in a typical capture.Note the substantially distinct scales of the y-axes, which shows that thebulk of the markers are correctly associated based on the preceding frame.

of manual post processing for each dataset, Cortex manualcorrectly labeled 93.6% of the data. Obviously, labeling ofmarkers based on the preceding frame only is not sufficient,as occluded markers are not correctly labeled when theyreappear. Thus, our approach provides the best performancein fully automatic marker labeling for skeleton tracking.

C. Data Association Threshold Experiments

Labeling based on the preceding frame is sensitive to theparameter θPF

max, which determines the maximum distance foroptimal data association in the Hungarian method. We canadjust this parameter for each marker online using statisticsover the previous frames. Thus, we can adapt the thresholdto changing velocities of individual markers. To initialize thethresholds, we evaluated the movements of markers in a typ-ical capture of natural movements. Fig. 5 shows a histogramof correctly (left) and incorrectly (right) associated markersbased on the preceding frame with θPF

max set to ∞. Note, thatthe scales of the two figures deviate largely, which indicatesthat most of the markers are labeled correctly. All correctlyassociated markers moved less than 5 cm compared to thepreceding frame. The incorrectly labeled observations aremostly due to occlusions. Therefore, in all of our experimentswe chose the initial value θPF

max = 5 cm.

D. Ambiguity of the Initial Pose

As described above, our method initially tracks both poten-tial headings of the person until the constraints in the jointsand especially in the knee joints disambiguate the situation.To evaluate the robustness of our approach to determine theheading we evaluated its performance on 24 datasets withtwo different humans and six different marker setups. In allof the 24 datasets, our method chose the correct heading after1.78 sec in average, with a standard deviation of 0.93 sec.

VIII. CONCLUSIONS

In this paper we presented a fully automatic method toestimate the underlying skeleton configuration of a humanbased on the position of markers perceived in a motioncapture system and freely attached to the human. Our methoduses the popular EM algorithm to compute the most likelyskeleton configuration by estimating the unknown associationof observations to marker labels in each frame. In contrast to

existing approaches, which typically require tedious manualpost processing, our method solves the estimation of themarker labeling without any user intervention. In an ex-tensive set of experiments we demonstrate that our methodoutperforms even a commercially available and state-of-the-art method for skeleton pose estimation.

REFERENCES

[1] A. Aristidou and J. Lasenby. Real-time marker predictionand CoR estimation in optical motion capture. The VisualComputer, 29, 2013.

[2] P. Cerveri, A. Pedotti, and G. Ferrigno. Kinematical modelsto reduce the effect of skin artifacts on marker-based humanmotion estimation. Journal of Biomechanics, 38(11), 2005.

[3] R. Contini. Body segment parameters II. Artificial Limbs, 16(1), 1972.

[4] Motion Analysis Corp. Cortex, 2013. URL http://www.motionanalysis.com/html/industrial/cortex.html. Version4.0.0.1387.

[5] E. de Aguiar, C. Theobalt, and H.-P. Seidel. Automaticlearning of articulated skeletons from 3D marker trajectories.In Second Int. Symposium on Advances in Visual Computing(ISVC), Part I, 2006.

[6] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum like-lihood from incomplete data via the EM algorithm. Journal ofthe Royal Statistical Society. Series B (Methodological), 1977.

[7] R. Hepp and R. Debrunner. Orthopadisches Diagnostikum.Thieme, 7 edition, 2004.

[8] L. Herda, P. Fua, R. Plankers, R. Boulic, and D. Thalmann.Using skeleton-based tracking to increase the reliability ofoptical motion capture. Human Movement Science, 20(3),2001.

[9] L. Herda, R. Urtasun, P. Fua, and A. Hanson. Automaticdetermination of shoulder joint limits using quaternion fieldboundaries. Int. Journal of Robotics Research, 22, 2003.

[10] A.G. Kirk, J.F. O’Brien, and D.A. Forsyth. Skeletal parameterestimation from optical motion capture data. In IEEE Conf. onComputer Vision and Pattern Recognition (CVPR), 2005.

[11] M. Klous and S. Klous. Marker-based reconstruction of thekinematics of a chain of segments: a new method that incor-porates joint kinematic constraints. Journal of BiomechanicalEngineering, 132(7), 2010.

[12] H.W. Kuhn. The hungarian method for the assignmentproblem. Naval Research Logistics Quarterly, 2(1), 1955.

[13] R. Kummerle, G. Grisetti, H. Strasdat, K. Konolige, andW. Burgard. g2o: A general framework for graph optimization.In Proc. of the IEEE Int. Conf. on Robotics & Automation(ICRA), 2011.

[14] S. Obdrzalek, G. Kurillo, F. Ofli, R. Bajcsy, E. Seto, H. Jimi-son, and M. Pavel. Accuracy and robustness of kinect poseestimation in the context of coaching of elderly population. InEngineering in Medicine and Biology Society (EMBC), 2012Annual Int. Conf. of the IEEE, 2012.

[15] M. Ringer and K. Lasenby. A procedure for automaticallyestimating model parameters in optical motion capture. InProc. of the British Machine Vision Conf., 2002.

[16] Yujiang Xiang, Salam Rahmatalla, Jasbir S Arora, and KarimAbdel-Malek. Enhanced optimisation-based inverse kinemat-ics methodology considering joint discomfort. Int. Journal ofHuman Factors Modelling and Simulation, 2(1), 2011.

[17] Q. Yu, Q Li, and Z. Deng. Online motion capture markerlabeling for multiple interacting articulated targets. ComputerGraphics Forum, 2007.

[18] V.B. Zordan and N.C. Van Der Horst. Mapping optical motioncapture data to skeletal motion using a physical model. InProc. of the 2003 ACM SIGGRAPH/Eurographics symposiumon Computer animation, 2003.