
J Intell Robot Syst
DOI 10.1007/s10846-013-9961-0

A Multi-Camera Active-Vision System for Deformable-Object-Motion Capture

David S. Schacter · Mario Donnici · Evgeny Nuger · Matthew Mackay · Beno Benhabib

Received: 12 September 2013 / Accepted: 14 September 2013
© Springer Science+Business Media Dordrecht 2013

Abstract A novel methodology is proposed to select the on-line, near-optimal positions and orientations of a set of dynamic cameras, for a reconfigurable multi-camera active-vision system to capture the motion of a deformable object. The active-vision system accounts for the deformation of the object-of-interest by fusing tracked vertices on its surface with those triangulated from features detected in each camera's view, in order to predict the shape of the object at subsequent demand instants. It then selects a system configuration that minimizes error in the recovered position of each of these features. The tangible benefits of using a reconfigurable system, particularly with translational cameras, versus systems with static cameras in a fixed configuration, are demonstrated through simulations and experiments in both obstacle-free and obstacle-laden environments.

D. S. Schacter · E. Nuger · M. Mackay · B. Benhabib (B)
Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
e-mail: [email protected]

D. S. Schacter
e-mail: [email protected]

M. Donnici
Department of Mechanical and Industrial Engineering, University of Calabria, Rende, Italy
e-mail: [email protected]

Keywords Active vision · Camera reconfiguration · Deformable objects · Motion capture · Robot vision systems · View planning

1 Introduction

The motion capture of deformable objects—those whose shape's variance is distributed relatively uniformly over their surfaces—is a complex research problem, owing to the inherent difficulties in recovering dynamic surface forms in noisy environments [1–5]. Often, these methods rely on breaking the object's surface into a mesh of vertices whose motion is then tracked [1, 6], as will be the focus herein. Nevertheless, deformable-object-motion capture has been attempted in a variety of applications, such as performance capture [7, 8], expression recognition [9], tele-immersion [10], and soft-tissue deformation tracking [11, 12]. However, for many proposed systems, the underlying assumption has been that the positions and orientations (poses) of the cameras are fixed in space, and clear views of the object-of-interest (OoI) are available. These constraints limit the applicability of such systems, especially in the presence of obstacles occluding views of the deformable OoI, and in the event of self-occlusion [13].

In order to address the abovementioned challenges, researchers have suggested the use of active-vision systems, wherein cameras are reconfigured in response to changes in the environment [14–17]. Namely, the poses of the cameras may be varied dynamically in order to maximize the number of un-occluded features, and/or to minimize the error in recovering the positions of these features [18].

Recovering a dynamic surface form is a challenging problem, and methods that have been developed for the on-line selection of optimal camera poses to improve an OoI's 3D-shape recovery are, typically, limited to cases where the object is rigid (e.g., [19–22]). When dealing with deformable objects, these methods suffer from a number of shortcomings. First, by assuming a unique and static representation of the OoI, they are unable to recognize or track the OoI if its shape or appearance changes. Second, they assume that the entire OoI's shape can be known given the position of a small number of reference points on it, due to a rigid shape known a priori. Accordingly, they do not necessarily ensure that all desired features of interest are observable by the cameras at all times, thereby allowing unacceptable errors in the recovered OoI's shape. Third, when faced with self-occlusions, they do not reconfigure to views which avoid increasing the errors in the occluded features' estimated positions.

A number of research teams have been focusing on overcoming these shortcomings. The active multi-camera system proposed in [23], for example, was designed for volumetric reconstruction of a deformable object for 3D video. This system reconstructs the OoI's shape by calculating the volume intersection of silhouettes projected from the cameras' image planes onto multiple parallel planes, although a control scheme to select camera viewpoints is not mentioned.

In [24], for a given image sequence, factorization is performed on detected feature points in order to determine the relative camera poses. Although this technique was designed for 3D deformable-object-motion capture, it does not prescribe explicitly how the camera poses should be chosen optimally. The active multi-camera system proposed in [25] does address the issue of optimal camera viewpoint selection, even in the presence of occluding obstacles. This system uses an agent-based approach to maximize a visibility criterion over bounding ellipsoids encircling the object's articulated joints. However, as one notes, only recognition of articulated (multi-rigid-link) objects is addressed.

The camera-assignment method discussed in [26] maximizes the visibility of an unknown deformable OoI. The proposed windowing scheme controls the orientation of pan/tilt cameras, although no provision is made for future deformation prediction. A stochastic quality metric was proposed in [27] for the optimal control of an active-camera network for deformable-object tracking and reconstruction; however, as in [20], no configuration-management component or deformation prediction was presented. An active pan-tilt-zoom camera system that does use a tracking algorithm to keep a moving person centered in each camera's view was proposed in [28]. This system, however, does not explicitly optimize for each target's visibility, allowing the possibility for occlusion.

From the abovementioned works, one may conclude that none of the existing systems adjust cameras' positions to better observe deforming objects. Allowing cameras in an active multi-camera network to translate may be particularly beneficial in two cases. First, in mixed-scale environments—where the scale of the OoI's deformations is significantly smaller than the scale of the OoI's motion within a workspace, which is usually the case with mobile deformable objects—a system must simultaneously capture the deformation, which requires high-resolution imagery suitable for resolving small details, and increase the working volume in which motion capture is possible [29]. Translation allows even a limited number of cameras with limited zoom to do just that: to approach an OoI and recover its fine deformations, while allowing a wide region in a workspace to be covered as needed. Second, in cluttered environments, a system must be able to capture the motion of an OoI even when it passes behind an obstacle. In these cases, translation allows a system with even a limited number of cameras to recover the OoI's deformation by repositioning them to viewpoints from which the OoI is un-occluded by obstacles.

Herein, a reconfigurable active-vision system that minimizes the error in capturing the motion of a known deformable object with detectible features is proposed. Such a system would be beneficial in improving the effectiveness of methods such as those presented in [1, 6], which capture the motion of clothing with a known color pattern, and could be used to capture the motion of the flowing dress of a dancer moving around a cluttered set. This system is novel in that it accounts for the deformation of the OoI in an on-line manner, and allows for pan/tilt as well as translation of the cameras. Moreover, an algorithm is provided for the efficient determination of near-optimal camera poses. Simulations demonstrate the system's ability to reduce the error in deformation recovery and account for error in deformation prediction. Additionally, experiments empirically validate the effectiveness of the proposed implementation with application to motion capture, and show the benefit of allowing reconfigurable cameras to translate.

This paper is, thus, organized as follows. The notation used throughout the paper is provided in Section 2, and the problem addressed herein is, then, defined in Section 3, in terms of the objective to be achieved and the underlying tasks required to achieve it. Section 4 provides the description of, and theory behind, the proposed system for solving this problem. This system is evaluated in three distinct simulations in Section 5, and in two experimental scenarios in Section 6, respectively. Both Sections 5 and 6 provide their respective descriptions of the test set-up, procedure, results, and discussion. Section 7 concludes the paper.

2 Notation

T          Set of all DIs (demand instants)
J          Total number of DIs
t_j        jth DI
j          Index of current DI
V_j        Set of all vertex positions at t_j (set of [3 × 1])
n          Number of vertices
x_{i,j}    Vertex i's position in world coordinates [3 × 1]
x̂_{i,j}    Estimate of Vertex i's position in world coordinates at t_j [3 × 1]
C          Set of all cameras
c_k        Camera k
R_{c,j}    Pose of Camera c at t_j
E          Average error metric over all DIs
E_j        Error metric at t_j
m          Number of degrees of freedom for each camera
δ          Index of a camera's degree of freedom
Ψ_{i,j}    Ellipsoidal uncertainty region around Vertex i at t_j
W_{i,j}    Scaled covariance matrix of Vertex i's predicted position at t_j [3 × 3]
γ          Number of variances defined by the user to be enclosed in the uncertainty ellipse
Ψ_{ijc}    Projection of the uncertainty ellipsoid into Camera c's pixel coordinates
v_c        Arbitrary pixel in Camera c's pixel coordinates [2 × 1]
Φ_{Pc}     Mapping of world coordinates to pixel coordinates for Camera c [2 × 1]
J_{cij}    Jacobian matrix of Φ_{Pc} evaluated at x̂⁻_{i,j} [2 × 3]
P_c        Camera c's projection matrix [3 × 4]
x_c        Pixel coordinates of x in Camera c's image [2 × 1]
s_{i,j}    State of Vertex i at DI j in terms of position and velocity [6 × 1]
ŝ_{i,j}    Estimate of state s_{i,j} [6 × 1]
ω          Noise in vertex dynamics [6 × 1]
Q          Covariance of vertex dynamics [6 × 6]
η_{i,j}    Measurement noise for Vertex i at DI j [3 × 1]
R_{i,j}    Covariance of measurement noise for Vertex i at DI j [3 × 3]
β_{ij}     Covariance variable for Vertex i's measurement at DI j
σ          Standard deviation of vertex position detection
o_{icj}    Boolean occlusion value for a given vertex at a given DI, in the perspective of Camera c
x̂⁻_{i,j}   Predicted world coordinates of Vertex i at DI j [3 × 1]
X_{i,j}    Covariance matrix of Vertex i's predicted position at DI j [3 × 3]
G_q        Objective function reflecting expected error for a given camera configuration at t_q
q          Index of DI after t_j
U_q        Expected uncertainty in vertex position estimates for a given camera configuration at t_q
Λ_q        Non-uniform uncertainty unaccounted for in U_q
ε_{a_c}    Error due to Camera c calibration
ε_{z_pc}   Triangulation error based on baseline separation between camera pairs
b_{pc}     Baseline between a camera pair
z_{pc}     Distance between the baseline and the vertex centroid (depth)
w_a        Weighting for camera calibration inaccuracy errors
w_z        Weighting for camera triangulation errors
pc         Index of camera pair

3 Problem Formulation

The problem of multi-camera reconfiguration for deformable-object motion-capture is defined below in the case of a known object with detectible features.

3.1 Objective

Let us suppose that a single deformable object, the OoI, is present in a known workspace, and the shape of its dynamically deforming surface during the time period of interest is a priori unknown, and is to be determined over a set of J discrete points in time given at the outset, T = {t_j | j = 1, ..., J}, called demand instants (DIs). The OoI's identity is known a priori, and there exist a number of detectable feature points on its surface such that the shape of its surface at the jth DI, t_j, can be modeled in 3D by a mesh of n vertices, one at each feature's location, V_j = {x_{i,j} | i = 1, ..., n}, connected by a set of edges. Vertex trajectories are, therefore, not known deterministically over T.

The objective is, thus, to define a system which, given a set of k calibrated cameras, C, attached to actuators with known kinematic capability (mobility, constraints, and maximum speed and acceleration), will locate the cameras at each DI, t_j, in such a configuration, R_{C,j}, so as to determine the position of the n vertices with the minimum error over all DIs. One may note that this problem differs from traditional placement problems, in that the object is deforming in time, and so V_j is not constant across T up to some Euclidean transformation.

If the system determines the position of the ith vertex at DI t_j, in world coordinates, to be x̂_{i,j}, the accuracy of this estimate can be defined by its difference from the vertex's true position, x_{i,j}. An overall error metric, E, is defined as the optimality criterion to represent this error across all vertices and DIs according to:

$$E = \left( \sum_{j=1}^{J} E_j \right) \Big/ J, \qquad (1)$$

where the error metric at each DI, E_j, is defined as

$$E_j = \frac{1}{n} \sum_{i=1}^{n} \left\| \hat{\mathbf{x}}_{i,j} - \mathbf{x}_{i,j} \right\|. \qquad (2)$$

Thus, given that the estimated positions of the vertices, V̂_j, are implicitly determined by the poses of the cameras with which they are observed, an ideal system should be capable of minimizing E over T according to:

$$\min_{\mathbf{R}_{C;T}} \left( E\left( \mathbf{R}_{c=1,\dots,k;\; j=1,\dots,J} \right) \right);$$
$$\text{s.t. } R_{c,j,\min_\delta} < R_{c,j,\delta} < R_{c,j,\max_\delta}; \quad c = 1,\dots,k; \; \delta = 1,\dots,m; \; j = 1,\dots,J \qquad (3)$$

where, for Camera c, R_{c,j} = (R_{c,j,1}, ..., R_{c,j,m}) is its pose, and R_{c,j,min_δ} and R_{c,j,max_δ} define the limits of its achievable range of motion in the δth degree of freedom (dof) at DI t_j.
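As a minimal illustration of Eqs. 1 and 2, the following Python sketch computes E from arrays of true and estimated vertex positions; the array names (true_verts, est_verts) and shapes are assumptions for illustration only, not part of the original formulation.

import numpy as np

def error_metric(true_verts, est_verts):
    """Average error metric E over all demand instants (Eqs. 1-2).

    true_verts, est_verts: arrays of shape (J, n, 3), i.e. the true and
    estimated world coordinates of the n vertices at each of the J DIs.
    """
    # E_j: mean Euclidean distance between estimated and true vertices at DI j
    per_di_error = np.linalg.norm(est_verts - true_verts, axis=2).mean(axis=1)
    # E: average of E_j over all DIs
    return per_di_error.mean()

# Example: J = 2 DIs, n = 3 vertices
rng = np.random.default_rng(0)
truth = rng.random((2, 3, 3))
estimate = truth + 0.01 * rng.standard_normal((2, 3, 3))
print(error_metric(truth, estimate))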

The above error metric assumes that the true positions of the OoI's vertices are known; in practice, however, they may only be measured to a certain level of accuracy. Thus, when evaluating this error metric herein, results for simulations reflect the actual error, whereas results for experiments are provided to an accuracy of the aforementioned measurement error.

3.2 Required Tasks

In order to achieve its objective, the proposed system must be able to fulfill a number of different functions in an on-line manner, each subject to a number of challenges:

Detection Given a set of images acquired at a certain DI t_j, the system must detect as many of the OoI's features as possible. It must do so efficiently in the face of significant uncertainty in their expected locations in each image, and from multiple changing viewpoints. Moreover, the system must account for the fact that textural features may themselves be deforming in time.

Association Given a set of detected features in each image at a DI, t_j, the system must be able to correctly associate each one with its corresponding vertex, despite uncertainty in its predicted location, limited capability to uniquely differentiate each one, and close proximity of similar features due to many densely distributed vertices.

Recovery Given a set of corresponding features in images acquired by multiple cameras with known poses, the system must recover the world coordinates of the associated vertices. Since the cameras are reconfiguring, maintaining an accurate camera model is a challenging task, and the recovery method must be able to perform accurately despite errors in calibration.

Prediction Since cameras cannot be instantaneously maneuvered to desired poses, the system must predict what deformation the object will adopt in the next DI. Such predictions must be able to account for uncertainty in the dynamics of the deformable OoI's vertices. Moreover, if the vertices are subject to occlusion, predictions must be able to recover from drift to ensure that the system remains stable.

Reconfiguration Given the predicted deformation, the system must determine which configuration it should adopt to minimize the shape-recovery error at the next DI. Since this error is a non-linear combination of each camera's view, the reconfiguration method should account for the global effect of all cameras' poses taken together. Solving for such a high-dimensional search space in an on-line environment necessitates an efficient algorithm.

4 Proposed System

The proposed on-line active-vision system-reconfiguration method comprises seven modules to accomplish the required tasks, as depicted in Fig. 1. The first two modules, namely, image capture, and feature detection and association, both run on independent camera agents such that they can be processed in parallel for each camera. Similarly, the camera-view selection and configuration-quality evaluation modules also run on as many independent view-management agents.

Fig. 1 Proposed system-reconfiguration algorithm

4.1 Image Capture

A set of cameras with synchronized clocks is tasked with capturing images at a specified DI. The cameras are moved into a configuration prescribed by the Camera Assignment module so as to be stationary in the optimal pose at the appointed time. When the images are acquired, the poses of the cameras at that point are stored, and the images are forwarded for feature detection.

4.2 Feature Detection and Association

In order to determine a feature's position in world coordinates, it must first be detected in the captured images. Searching an entire image for each feature is computationally expensive. However, efficiently and robustly matching features between images taken from different viewpoints (or different points in time) without any prior expectation for the features' position in image space is a challenging open problem [30]. Although typical motion-capture methods can track the position of a feature off-line, and thereby make use of computationally expensive methods, a reconfigurable camera system which attempts to account for self-occlusions does not have that luxury, as an up-to-date model of the OoI's shape is required at each DI.

Hence, strong priors are needed to constrain the feature-detection search process, and to facilitate efficient feature matching between images, which can be provided using tracking. Since a prediction for a vertex's position in world coordinates at t_j is available, it is not necessary to search the entire image space of each camera's captured image for features. Instead, given known camera poses, the probability distribution for a vertex's position can be projected into each camera's view, thereby providing the prior for a feature's position in image space necessary both to constrain the search and to provide an expected association which can be used to match corresponding features between views.

Thus, in order to detect features in each image, first, a search region is defined in each image for each Vertex i, based upon the region in which the vertex is predicted to lie. This region can be defined using two outputs obtained from the 3D Deformation Recovery and Prediction module at the previous DI, t_{j−1}: the prediction for the position of the vertex at the jth DI, x̂⁻_{i,j}, and the 3 × 3 covariance matrix representing its uncertainty, X⁻_{i,j}. Namely, an ellipsoidal uncertainty region, Ψ_{i,j}, enclosing the vertex's expected position, can be defined as:

$$\Psi_{i,j}: \left\{ \mathbf{x} \;\middle|\; \left( \mathbf{x} - \hat{\mathbf{x}}^{-}_{i,j} \right)^{T} \mathbf{W}^{-1}_{i,j} \left( \mathbf{x} - \hat{\mathbf{x}}^{-}_{i,j} \right) \le 1 \right\}, \qquad (4)$$

where

$$\mathbf{W}_{i,j} = \gamma \mathbf{X}^{-}_{i,j}, \qquad (5)$$

x is a column vector representing some position in world coordinates, γ is a user-defined scalar parameter regulating the level of uncertainty from Vertex i's expected position to be enclosed by the ellipsoid [31], and W_{i,j} is, thus, the 3 × 3 matrix defining the size and shape of said ellipsoid. The projection of this ellipsoid into the cth camera's image space can be linearly approximated by an elliptical region of pixels [32],

$$\Psi_{ijc}: \left\{ \mathbf{v}_c \;\middle|\; \left( \mathbf{v}_c - \Phi_{P_c}\!\left( \hat{\mathbf{x}}^{-}_{i,j} \right) \right)^{T} \mathbf{W}^{-1}_{ijc} \left( \mathbf{v}_c - \Phi_{P_c}\!\left( \hat{\mathbf{x}}^{-}_{i,j} \right) \right) \le 1, \; \mathbf{v}_c \in \mathbb{N}^2 \right\}, \qquad (6)$$

where

$$\mathbf{W}_{ijc} = \mathbf{J}_{cij} \mathbf{W}_{i,j} \mathbf{J}^{T}_{cij}, \qquad (7)$$

v_c is a column vector representing an arbitrary pixel in Camera c's pixel coordinates, J_{cij} is the Jacobian matrix of Φ_{Pc} evaluated at x̂⁻_{i,j}, and Φ_{Pc} : ℝ³ → ℕ² maps Euclidean world coordinates to Camera c's pixel coordinates as follows:

$$\mathbf{x}_c = \left[ \left\lfloor \frac{x'_{c_x}}{x'_{c_z}} \right\rfloor \;\; \left\lfloor \frac{x'_{c_y}}{x'_{c_z}} \right\rfloor \right]^{T} = \Phi_{P_c}(\mathbf{x}), \qquad (8)$$

$$\mathbf{x}'_c = \mathbf{P}_c \mathbf{x}' = \left[ x'_{c_x} \;\; x'_{c_y} \;\; x'_{c_z} \right]^{T}, \quad \mathbf{x}'_c \in \mathbb{P}^2, \qquad (9)$$

where P_c is Camera c's projection matrix, x′ is the homogeneous-coordinates representation of x, x′_c represents the homogeneous coordinates of x′ in Camera c's image, and x_c represents that point's pixel coordinates.

Thus, by modifying the method proposed in [31], Algorithm 1 below allows the detector to cover a minimal search space by beginning with a small region around the predicted feature location, defined by a low value of γ, and successively increasing γ to expand the search region until either a feature is found, or until γ exceeds γ_Threshold, the maximum level of uncertainty within which to detect features. The higher γ_Threshold is, the greater the chance that a feature will indeed be detected, but the more time may be wasted searching for a pixel which may in fact not be visible in the image, and the greater the likelihood of detecting a feature which in fact belongs to a different vertex. Since these factors depend on the number of vertices, how densely packed they are, the speed of the detection algorithm's implementation, and the accuracy of the tracking and prediction module, γ_Threshold should be tuned empirically to balance the aforementioned trade-offs. Moreover, since each region is prescribed for a specific vertex, the resulting detected feature is automatically associated with the appropriate corresponding vertex.
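A minimal Python sketch of this expanding-search idea is given below, assuming a pin-hole projection matrix and a hypothetical detect_in_region callback standing in for the application-specific feature detector; the Jacobian-based ellipse projection follows Eqs. 6–8.

import numpy as np

def project_uncertainty(P, x_pred, W):
    """Project a 3D uncertainty ellipsoid (Eqs. 4-7) into image space.

    P: 3x4 camera projection matrix; x_pred: predicted vertex position (3,);
    W: 3x3 scaled covariance of the prediction. Returns the predicted pixel
    and the 2x2 matrix of the approximating image-space ellipse.
    """
    xh = P @ np.append(x_pred, 1.0)            # homogeneous image point (Eq. 9)
    u = xh[:2] / xh[2]                         # pixel coordinates (Eq. 8)
    # Jacobian of the projection w.r.t. the 3D point, evaluated at x_pred
    J = (P[:2, :3] - np.outer(u, P[2, :3])) / xh[2]
    return u, J @ W @ J.T                      # image-space ellipse (Eq. 7)

def detect_with_expanding_region(P, x_pred, X_pred, detect_in_region,
                                 gamma_step=1.0, gamma_threshold=9.0):
    """Expand the search ellipse until a feature is found (Algorithm 1 idea).

    detect_in_region(center, ellipse) is a hypothetical detector callback that
    searches only the pixels inside the given ellipse and returns a pixel
    location or None.
    """
    gamma = gamma_step
    while gamma <= gamma_threshold:
        center, ellipse = project_uncertainty(P, x_pred, gamma * X_pred)
        feature = detect_in_region(center, ellipse)
        if feature is not None:
            return feature                     # implicitly associated with Vertex i
        gamma += gamma_step                    # enlarge the search region
    return None                                # treat the vertex as undetected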

Since different objects may have different features, an appropriate feature detector which is both fast enough to work in real-time and robust enough to detect the feature from different viewpoints should be used. It is important to note that since the object is non-rigid, textural features may appear deformed, and an associated feature detector should also be robust to warping of the features in the image plane [33].

4.3 3D Deformation Recovery and Prediction

Once the features in each image are detected, they are forwarded to the vertex Position Recovery and Prediction module, so that the world coordinates of their corresponding vertices are determined at the current and future DIs.

Current DI Vertex Triangulation Given the pose of each of the cameras at DI t_j, and the 2D location of each of the detected features in each camera's image, the 3D location of the vertices corresponding to each of these features can be triangulated.
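The paper does not prescribe a particular triangulation method; a standard linear (DLT) least-squares triangulation, sketched below in Python, is one common choice for recovering a vertex from its detections in two or more calibrated views.

import numpy as np

def triangulate(projections, pixels):
    """Linear (DLT) triangulation of one vertex from multiple calibrated views.

    projections: list of 3x4 camera projection matrices P_c.
    pixels: list of corresponding (u, v) detections, one per camera.
    Returns the least-squares 3D point in world coordinates.
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        # Each view contributes two linear constraints on the homogeneous point
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    Xh = vt[-1]                      # null-space solution
    return Xh[:3] / Xh[3]            # de-homogenize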

Current DI Vertex Location Estimate Since measurements are themselves subject to error, for any specific application, a priori knowledge of the nature of the deformation can be used to improve the estimate of the 3D vertex position. Thus, given the triangulated 3D positions of each of the detected vertices at DI t_j, their estimated positions can be found by fusing those triangulations with their predicted positions at t_j, themselves determined during DI t_{j−1} based upon the given OoI's dynamic model. To this end, a number of different estimation and prediction methods can be employed, and the particular ones selected should depend on the dynamics of the deformations for the given application.

In order to provide a generic methodology, we assume that the OoI's deformation is not known deterministically, but that its vertices' accelerations vary randomly and isotropically about a mean value of 0. Thus, we propose the use of a Kalman Filter (KF) [34] with a constant-velocity motion model with Gaussian process noise, since it can provide both estimation and prediction for the OoI's vertices' positions, as well as for the uncertainty thereof, although other approaches could have been used. In order to implement the selected approach, the state, the parameters of the dynamic model, and the measurement model, including its associated parameters, must be defined.

The state of each vertex at the jth DI can be defined in 3D world coordinates as:

$$\mathbf{s}_{i,j} = \left[ \mathbf{x}^{T}_{i,j} \;\; \dot{\mathbf{x}}^{T}_{i,j} \right]^{T}, \qquad (10)$$

where ẋ_{i,j} is the velocity of x_{i,j}, and the state's estimate is defined as:

$$\hat{\mathbf{s}}_{i,j} = \left[ \hat{\mathbf{x}}^{T}_{i,j} \;\; \hat{\dot{\mathbf{x}}}^{T}_{i,j} \right]^{T}. \qquad (11)$$

In order to define the parameters in the dynamic model, the uncertainty in the vertex dynamics for any vertex, ω, is assumed to be constant in time, and is set to be normally distributed according to p(ω) ∼ N(0, Q). Application-specific knowledge can be used to specify the values for Q; however, if such information is unavailable, Q can be learned from ground-truth data [35].

Since the vertices of a deformable object may have velocities which are not constant in time, and these velocities are included in the state, the measurement model must be defined in a manner which updates them. The measurements from the cameras, however, only detect vertex positions. Therefore, in order to update the velocities, a first-order measurement model was selected which allows position measurements, from the previous DI, t_{j−1}, to update, at the current DI, t_j, both the estimated state defined in Eq. 11 and the estimated state covariance S_{i,j}, where the last term represents the amount of uncertainty in the state for a given Vertex i. The measurement noise, η_{i,j}, was modeled as a normal distribution with covariance R_{i,j}, p(η_{i,j}) ∼ N(0, R_{i,j}).

Since the OoI is deforming and moving, certain features may not be visible from particular camera poses due to self-occlusion, occlusion by an obstacle, or field-of-view (FOV) limitations. Occlusion, therefore, influences features' detectability from particular views, so the state of occlusion of each vertex is explicitly modeled to inform both the vertex-estimation and view-evaluation modules.

For a given Vertex i, its occlusion value at DI t_j, for Camera c, o_{icj}, is set to 0 if its associated feature is not visible, and to 1 if it is visible. Visibility of a feature in a given camera's view can be determined based on the camera's known pose, the known positions of obstacles, and the predicted form of the OoI at DI t_j. Namely, if the predicted vertex position x̂⁻_{i,j} lies outside of the camera's FOV, or a ray passing between it and the camera is intersected by a polygon in the model of the OoI itself, or of an obstacle, then the feature can be classified as not visible by Camera c.

Thus, R_{i,j} is defined as follows:

$$\mathbf{R}_{i,j} = \beta_{ij} \mathbf{I}_{3 \times 3}, \qquad (12)$$

where

$$\beta_{ij} = \begin{cases} \sigma^{2}, & \text{if } \sum_{c} o_{icj} \ge 2 \\ \infty, & \text{if } \sum_{c} o_{icj} < 2 \end{cases}. \qquad (13)$$

Above, σ is the standard deviation of detected vertex positions for the set of cameras. Hence, when a vertex is unoccluded in at least 2 camera views, it is expected to be detected near its true position with a high degree of certainty; however, when it is not, it is unlikely to be detected near that point, and in fact, any positive detection is likely a false positive. Therefore, by modeling R_{i,j} in this manner, appropriate weights can be placed on the strength of any detection's contribution to the estimated vertex location.

Based upon the aforementioned specification for the state-space model, the standard KF evolution equations, provided for reference in Appendix A, can be used to estimate the deformed shape of the OoI at DI t_j. Once the positions of all the vertices at DI t_j are determined, they are forwarded as motion-capture results, and also used for future DI predictions.
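A minimal single-vertex sketch of the constant-velocity KF described above is given below; the occlusion-dependent covariance of Eqs. 12–13 is approximated by skipping the measurement update when fewer than two cameras observe the vertex, which is equivalent to an infinite β. The parameter values (q, sigma) are illustrative assumptions.

import numpy as np

def kf_step(s, S, z, dt, q=1e-3, sigma=2.0, num_views=2):
    """One constant-velocity Kalman-filter step for a single vertex.

    s: state estimate [x, y, z, vx, vy, vz]; S: 6x6 state covariance;
    z: triangulated position measurement (3,) or None; dt: time between DIs.
    When fewer than two cameras observe the vertex, beta is infinite
    (Eq. 13), which is equivalent to skipping the measurement update.
    """
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                    # constant-velocity transition
    Q = q * np.eye(6)                             # process-noise covariance
    H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only positions are measured

    # Predict
    s_pred = F @ s
    S_pred = F @ S @ F.T + Q
    if z is None or num_views < 2:
        return s_pred, S_pred                     # no usable measurement

    # Update with R = sigma^2 * I (Eq. 12)
    R = sigma**2 * np.eye(3)
    K = S_pred @ H.T @ np.linalg.inv(H @ S_pred @ H.T + R)
    s_new = s_pred + K @ (z - H @ s_pred)
    S_new = (np.eye(6) - K @ H) @ S_pred
    return s_new, S_new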

Future DI Vertex Location Prediction In order to determine the optimal configuration for the next DI, the system must predict what deformation the object will adopt. A prediction, V̂⁻_q = {x̂⁻_{i,q} | i = 1, ..., n}, for the most likely position of each of the vertices at subsequent DIs, t_q > t_j, can be made using sequential recursive KF iterations, and stored for use in determining the optimal system configuration at those DIs. Sequential iterations can be calculated based on the realization that at DI t_j, no measurements yet exist for future DIs, and so estimates of the measurements can be used instead. Therefore, the measurement of Vertex i at DI t_q, z_{i,q}, can be replaced by z_{i,q} = H ŝ⁻_{i,q}, where H is the measurement matrix and ŝ⁻_{i,q} is the predicted state at DI t_q. Consequently, these measurements contribute no new information in the state-update equation, therefore allowing it to be replaced with ŝ_{i,q} = ŝ⁻_{i,q}. The prediction of the state at any DI, t_τ, can be determined by iteratively following the process for the KF from q = j+1 until q = τ. Thus, given the predictions for the positions of all the vertices at some DI t_τ, and the known object mesh connecting all the vertices, a predicted model of the OoI's form at that instant can be rendered.
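A simplified sketch of this multi-DI prediction, reusing the kf_step helper from the previous sketch (a hypothetical name), rolls the model forward with predict-only iterations; substituting the predicted measurement leaves the state estimate itself unchanged, so this is used here as a stand-in for the sequential KF iterations described above.

def predict_ahead(s, S, steps, dt, q=1e-3):
    """Roll the constant-velocity model forward with no measurement updates.

    Substituting z = H s_pred into the KF state update leaves the state
    unchanged, so multi-DI prediction is sketched here as repeated
    predict-only steps.
    """
    predictions = []
    for _ in range(steps):
        s, S = kf_step(s, S, z=None, dt=dt, q=q)   # predict-only iteration
        predictions.append((s[:3].copy(), S[:3, :3].copy()))
    return predictions  # predicted vertex positions and covariances per DI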

4.4 Configuration-Error Evaluation

In order to accurately reflect the error, E, inherent in the sensing task, an error metric must consider a number of factors, such as the prior uncertainty in vertices' positions and, for the configuration in question, the number of visible vertices, and the accuracy with which features can be detected in each image and triangulated. Accordingly, an objective function, G_q, representing the expected error at DI q, is defined herein as:

$$G_q = U_q + \Lambda_q. \qquad (14)$$

Above, U_q is the expected amount of uncertainty remaining in vertices' position estimates once a given configuration has been adopted at DI t_q. It is defined herein as:

$$U_q = \sum_{i=1}^{n} \det\left( \mathbf{X}_{i,q} \right), \qquad (15)$$

where det(·) is the determinant operator. This metric is useful since, for each Vertex i, det(X_{i,q}) reflects the expected uncertainty in x̂_{i,q}, as it is proportional to the square of the volume enclosed by its uncertainty ellipsoid. Taking the sum of these values over i allows vertices with very small error (e.g., ones whose positions we know with very high accuracy) to contribute minimal additional error, but not to cancel the error of other vertices (whose positions we potentially know with very poor accuracy), as would occur if we multiplied each of the vertices' errors.

Since X_{i,q} is modeled using uniform measurement noise, it does not account for the variability in errors in the detection process, and neglects the non-uniform uncertainty in detected feature locations between multiple camera views, as is the case with triangulation. This is critical for reconfigurable cameras, particularly those with many degrees of freedom, where calibration error is expected to introduce a significant amount of error in triangulation. In order to overcome this issue, the term Λ_q is included in the proposed objective function:

$$\Lambda_q = w_a \sum_{c=1}^{k} \varepsilon_{a_c}^{2} + w_z \sum_{p_c=1}^{k(k-1)/2} \varepsilon_{z_{p_c}}^{2}, \qquad (16)$$

where ε_{a_c} is the error caused by inaccuracies in calibration for a given camera, and can be modeled, using sensor modeling, with respect to the relative position of the cameras and the OoI's vertices. Also, ε_{z_{p_c}} is the error in triangulation, herein evaluated in terms of the depth-error [36] of each pair of cameras, p_c:

$$\varepsilon_{z_{p_c}} = \frac{n}{2} \left( \frac{z_{p_c}^{2}}{b_{p_c}} \right), \qquad (17)$$

where, for each pair, b_{p_c} is the baseline, namely, the distance between the pair of cameras, and z_{p_c} is the distance between the baseline and the centroid of the vertices. The weights w_a and w_z in Eq. 16 are used to normalize the errors, and can be selected through empirical testing.
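The following Python sketch illustrates how the objective of Eqs. 14–17 can be assembled; the per-camera calibration-error estimates (eps_a) and the constant factor in the depth-error model (depth_gain) are supplied by the caller, since the paper leaves their sensor models application-specific, and the function and argument names are illustrative.

import numpy as np
from itertools import combinations

def configuration_error(pred_covs, cam_positions, vert_centroid, eps_a,
                        w_a=1.0, w_z=1.0, depth_gain=1.0):
    """Expected configuration error G_q = U_q + Lambda_q (Eqs. 14-17).

    pred_covs: list of 3x3 predicted vertex covariances X_{i,q}.
    cam_positions: camera positions of the candidate configuration.
    vert_centroid: centroid of the predicted vertex positions.
    eps_a: per-camera calibration-error estimates (application-specific).
    depth_gain: stand-in constant for the depth-error model of Eq. 17.
    """
    # U_q: summed determinants of the predicted vertex covariances (Eq. 15)
    U = sum(np.linalg.det(X) for X in pred_covs)

    # Lambda_q: calibration term plus pair-wise depth (triangulation) errors
    calib = w_a * sum(e ** 2 for e in eps_a)
    tri = 0.0
    for ca, cb in combinations(np.asarray(cam_positions), 2):
        b = np.linalg.norm(ca - cb)                          # baseline b_pc
        z = np.linalg.norm((ca + cb) / 2.0 - vert_centroid)  # depth z_pc
        tri += (depth_gain / 2.0 * z ** 2 / b) ** 2          # Eq. 17, squared per Eq. 16
    return U + calib + w_z * tri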

In this manner, the estimation error for a given configuration can be evaluated given the OoI's predicted deformed shape and each camera's pose relative to it. This formulation of the objective function yields four desirable effects:

1) Given uniform prior uncertainty in vertex position, the objective function is minimized when the configuration at hand provides the greatest ability to view the vertices in question, since, for a properly formulated measurement covariance matrix, superior configurations minimize the measurement error, which in turn minimizes the expected vertex-position uncertainty.

2) The more uncertain the location of a particular vertex at the previous DI, t_{q−1}, the more impact observing said vertex will have on reducing its uncertainty and, thus, the more likely the camera-pose selection algorithm is to assign a camera to observe that vertex.

3) Errors in calibration can be mitigated by explicitly reconfiguring the cameras to those views which minimize the detection error.

4) The amount of triangulation error due to the relative camera positions associated with a given configuration can be explicitly accounted for and minimized.

Thus, this objective function provides an effective means by which the error from any given configuration can be minimized for a future DI. Nevertheless, since the input to the objective function, X_{i,q}, is only a prediction of the expected uncertainty in future measurements, when significant drift accumulates due to vertices being unobserved over successive frames, G_q cannot be guaranteed to exactly match E_q for each configuration. In these cases, X_{i,q} may be inaccurate, since the expected location of a vertex which drifts may be far enough from its true location that the vertex's expected visibility in a given camera may be wrong. Therefore, in these cases, minimizing G_q at each DI may yield sub-optimal configurations instead of those corresponding to the true minimum of E.

4.5 Camera-View Selection

The configurations that yield the minimum error can be determined according to the optimization problem described in Section 3.1. The search space of this problem, however, has dimensionality mkJ, rendering it computationally expensive. Therefore, in practice, an efficient algorithm is employed to reduce the search space to dimensionality m by optimizing each camera's pose independently at each frame. Namely, at a given DI t_q, for a given Camera c*, and a test configuration containing the expected pose of each other camera, R_{c=1,...,c*−1,c*+1,...,k;q}, as well as Camera c*'s previous pose, R_{c*,q−1}, the optimal pose for Camera c*, R*_{c*,q}, which minimizes the error over all vertices, G_q(R_{c=1,...,c*,...,k;q}), is determined within the space of achievable poses. This is expressed in the optimization problem (18) below, where the predicted form of the object (V̂⁻_q, M), the model of obstacles in the workspace, each camera's FOV, aspect ratio, and current pose, and each vertex's uncertainty are known.

$$\mathbf{R}^{*}_{c^*,q} = \underset{\mathbf{R}_{c^*,q}}{\arg\min} \left( G_q\left( \mathbf{R}_{c=1,\dots,c^*,\dots,k;\,q} \right) \right);$$
$$\text{s.t. } R_{c,q,\min_\delta} < R_{c,q,\delta} < R_{c,q,\max_\delta}; \quad c = 1,\dots,k; \; \delta = 1,\dots,m \qquad (18)$$

Above, the limits R_{c,q,min_δ} and R_{c,q,max_δ} are calculated based upon the workspace constraints, (R_{c,min_δ}, R_{c,max_δ}), and the maximum distance the cameras can reach in the available time, R_{c,δ,max}(Δt), namely:

$$R_{c,q,\min_\delta} = \max\left( R_{c,q-1,\delta} - R_{c,\delta,\max}(\Delta t),\; R_{c,\min_\delta} \right), \qquad (19)$$

$$R_{c,q,\max_\delta} = \min\left( R_{c,q-1,\delta} + R_{c,\delta,\max}(\Delta t),\; R_{c,\max_\delta} \right). \qquad (20)$$

This approach allows a reduction in not only the search space, but also in the computation time of the algorithm, since Camera c* is the only camera with a variable pose and, hence, all other cameras' parameters used in calculating the error can be pre-computed. Moreover, since each camera's contribution to the error is calculated independently, each camera's pose optimization can be performed in parallel.
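A hedged sketch of this per-camera view selection is given below: the reachable range of each degree of freedom is clipped per Eqs. 19–20, and candidate poses are scored with a caller-supplied G_q evaluator (config_error, e.g., the configuration_error sketch above). The coarse grid search is purely illustrative, as the paper does not prescribe the underlying optimizer.

import itertools
import numpy as np

def select_camera_view(c_star, poses, prev_pose, dof_limits, max_step,
                       config_error, steps=7):
    """Near-optimal pose for one camera with the others held fixed (Eq. 18).

    poses: current/expected poses of all k cameras (each a dof vector).
    dof_limits: list of (lo, hi) workspace limits per dof.
    max_step: maximum reachable change per dof before the next DI.
    config_error: callable evaluating G_q for a full candidate configuration.
    """
    # Reachable range per dof (Eqs. 19-20)
    ranges = [np.linspace(max(lo, p - d), min(hi, p + d), steps)
              for p, (lo, hi), d in zip(prev_pose, dof_limits, max_step)]

    best_pose, best_err = np.asarray(prev_pose), np.inf
    for candidate in itertools.product(*ranges):
        trial = [np.asarray(candidate) if c == c_star else np.asarray(p)
                 for c, p in enumerate(poses)]
        err = config_error(trial)              # G_q for the full configuration
        if err < best_err:
            best_err, best_pose = err, np.asarray(candidate)
    return best_pose, best_err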

It is important to note that in the optimization problem in Eq. 18, the objective function still represents the estimation error due to the information provided by the combined set of cameras, and only the search space, over which that error is evaluated, is limited to one camera at a time. Due to the limited search space, however, it is possible that the global minimum exists outside of it and, thus, an approach must be used to allow the cameras to find the global minimum.

4.6 Camera Assignment

A Camera Assignment module is used to ensure that camera poses initially selected independently provide a beneficial configuration when considered together. Since optimizing all camera poses simultaneously over time is computationally prohibitive, this assignment requirement ensures that resulting camera configurations provide the desired effect while still allowing poses to be selected in a computationally efficient manner.


Thus, Algorithm 2 below selects the configuration that reduces the overall error with which the shape of the deformation is estimated. It does so by determining the maximum improvement to the error which can be achieved by moving any one camera, assigning the camera which allows for such improvement to its optimal pose, re-evaluating the visibility of each vertex for the given configuration, and reiterating the process until the optimal error converges.

Below, Select_Camera_Configuration is the algorithm which solves optimization problem (18), c_change is the index of the camera whose change in pose leads to the greatest decrease in error, G^old_q represents the error for the previous iteration's configuration, ΔG is the change in error over the previous iteration, and ΔG_thr is some specified threshold on said change, indicating the stop condition, under which the error is expected to have sufficiently converged.

The other condition to interrupt the optimization process is when the time spent processing data, t_proc, exceeds the maximum time limit for the selection of a final configuration, t_max. This limit ensures that sufficient time remains to move cameras to their respective poses if no further time remains to process the algorithm, and that cameras are assigned to the best known configuration. The steps are as follows: vertices visible in no more than one camera's view are removed from the calculation of G_{c,q}; when the overall system cannot reach a consensus in the allowed time t_max, cameras are positioned in a fallback pose. Namely, off-line, cameras are given a prescribed pose such that all cameras are evenly spaced around the OoI, and their views are centered at its vertices' centroid.
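The greedy loop below sketches this assignment process under stated assumptions: optimize_one stands in for Select_Camera_Configuration applied to one camera, config_error evaluates G_q, and the t_max and ΔG_thr checks mirror the stop conditions described above; all names are illustrative.

import time
import numpy as np

def assign_cameras(poses, optimize_one, config_error,
                   delta_g_thr=1e-3, t_max=5.0, fallback=None):
    """Greedy camera assignment in the spirit of Algorithm 2.

    optimize_one(c, poses) -> (best_pose_for_camera_c, resulting_G_q)
    config_error(poses)    -> G_q of the current configuration
    fallback: prescribed evenly-spaced configuration used if time runs out.
    """
    start = time.time()
    g_old = config_error(poses)
    while True:
        if time.time() - start > t_max:
            return fallback if fallback is not None else poses
        # Find the single camera move that most reduces G_q
        results = [optimize_one(c, poses) for c in range(len(poses))]
        c_change = int(np.argmin([g for _, g in results]))
        best_pose, g_new = results[c_change]
        if g_old - g_new < delta_g_thr:        # converged
            return poses
        poses = list(poses)
        poses[c_change] = best_pose            # commit the best single move
        g_old = g_new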

Since Algorithm 2 is not guaranteed to yield a global minimum, one important factor which must be considered is the selection of the initial point of the optimal configuration, R_{c=1,...,k;q}. The initial configuration is herein specified as the current configuration due to its ability both to lead to minima of the objective function and to satisfy the constraints. Namely, if the OoI does not deform significantly between successive DIs, the optimal configuration found for the current DI should be close to the optimal configuration for the next DI as well. Additionally, the current configuration is guaranteed to lie within the achievable region defined by the camera-motion constraints. Other points may, however, be selected depending on the nature of the application.

4.7 Motion Capture

Motion capture can be performed on-line or off-line depending on the needs of the application. For on-line deformation estimation, the estimates for vertex positions, x̂_{i,j}, are acquired directly from the KF at each DI t_j. Performing such a task off-line, on the other hand, would allow the use of data from future DIs to further improve the accuracy of the estimate at DI t_j. Moreover, the system could process images directly using computationally expensive techniques, such as [37], to extract the shape of the entire OoI's surface, rather than relying solely on vertex data.

4.8 OoI Feature Guidelines

Given the high degree of variability in non-rigid deformations for different objects, such as in terms of deformation speed and complexity, the following guidelines are proposed in order to ensure that an OoI's deformation can indeed be captured. The OoI should have permanent features on its surface, but said features need not be uniquely identifiable. The number and positions of such features are determined by the complexity of the OoI's deformation, such that the greater the complexity of the deformation, the greater the number of features required to establish a sufficient sampling rate necessary to accurately approximate the shape of the object with a mesh model. These features can be natural textural features, such as high-contrast spots or colour patterns; however, where no such features exist, markers should be placed to be able to triangulate and track points on the OoI's surface. The specific distribution of markers will depend on the particular nature of the OoI's deformation, and the resolution with which the OoI's shape is desired to be estimated.

5 Simulations

Three simulations were performed in order to evaluate different aspects of the system's performance. The first simulation's objective was to demonstrate the system's effectiveness in recovering an OoI's deformed shape at different levels of camera reconfigurability. The second's objective was to demonstrate the system's robustness to uncertainty in the predicted deformation's shape. The third's objectives were to compare the effectiveness of the system, in an obstacle-laden environment, to that of non-translating cameras, and to determine how its effectiveness changes with respect to the number of cameras.

In all the simulations, a 3D virtual deformable OoI was generated, and its exact shapes were specified over a set of DIs in a virtual environment in order to provide ground-truth data against which the recovered deformations could be compared. The system then controlled a set of virtual cameras in this environment to observe the OoI and capture its motion. The detailed description of the set-up, procedure, results, and discussion of each of these simulations is provided below.

5.1 Simulations 1 and 2: No Obstacles

5.1.1 Deformable Object Model

In order to simulate a non-rigid object which could deform in a wide variety of possible ways, a virtual 3D OoI surface model capable of adopting specified shapes was created. The OoI model was generated by creating a mesh of 144 triangular polygons between 74 vertices. Deformed shapes were specified by applying a set of deformations to a predefined neutral shape, a shape selected such that any part of its surface can be recovered by any symmetric configuration of cameras. This specification for the neutral object was achieved by satisfying three principles: axial symmetry, uniform surface density of vertices, and no concavities.

In order to generate deformed shapes, 12 deformation modes were defined for the OoI by applying linearly independent transformations to all of its vertices simultaneously. Specific transformations were selected so as to allow the object to deform according to any combination of stretching in the x-, y-, and z-dimensions, shearing along the x-y, y-z, and z-x planes, and bending along x, y, and z in each of the other two dimensions.

Also, in order to allow the OoI to deform over time, a set of 6 deformed shapes were generated, some examples of which are shown in Fig. 2.

Fig. 2 Example images of four deformations viewed from the same viewpoint

For a total duration of 48 DIs, the OoI's shape was specified at every 8th DI, starting with the first, to be one of the 6 shapes. The OoI's shape at each intermediate DI was specified by determining the cubic-spline interpolation of each vertex's position from the previously defined shapes.
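Assuming the keyframe shapes are stored as arrays of vertex positions, the intermediate shapes can be generated with SciPy's CubicSpline, as in the short sketch below; the array names and keyframe indices are illustrative.

import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_shapes(key_shapes, key_dis, all_dis):
    """Cubic-spline interpolation of vertex trajectories between keyframes.

    key_shapes: array (n_key, n_vertices, 3) of specified deformed shapes.
    key_dis: DIs at which those shapes occur (e.g., 1, 9, 17, ..., 41).
    all_dis: DIs at which interpolated shapes are required (e.g., 1..48).
    Returns an array (len(all_dis), n_vertices, 3).
    """
    # One spline per vertex coordinate, fit through the keyframe positions
    spline = CubicSpline(np.asarray(key_dis), np.asarray(key_shapes), axis=0)
    return spline(np.asarray(all_dis))

# Example: 6 keyframe shapes at every 8th DI, interpolated over 48 DIs
keys = np.random.default_rng(1).random((6, 74, 3))
shapes = interpolate_shapes(keys, key_dis=np.arange(1, 48, 8), all_dis=np.arange(1, 49))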

Although any distinct features can be detected, in order to evaluate the accuracy of the proposed system, uniquely coloured spherical markers were placed at each of the vertices, and particular color regions in YCbCr space, of each of the markers, were used to represent their features in the images. The centroid of the patches of connected pixels whose colors fell within the accepted regions was detected in each image to determine the features' locations in image coordinates. By triangulating these features' locations, their associated vertices' estimated 3D positions could then be compared to the known true positions of the vertices in the simulation.
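A minimal OpenCV sketch of this colour-marker detection is given below; the YCbCr bounds (passed as lower/upper, in OpenCV's YCrCb channel order) are placeholders to be tuned per marker, and the function name is illustrative.

import cv2
import numpy as np

def detect_marker_centroid(image_bgr, lower, upper):
    """Centroid of connected pixels whose YCbCr colour falls in [lower, upper].

    image_bgr: BGR image as loaded by OpenCV; lower/upper: illustrative
    3-element bounds in OpenCV's YCrCb channel order.
    Returns (u, v) pixel coordinates of the largest matching patch, or None.
    """
    ycrcb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    mask = cv2.inRange(ycrcb, np.array(lower, np.uint8), np.array(upper, np.uint8))
    # Keep the largest connected patch of in-range pixels
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    if num < 2:
        return None
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return tuple(centroids[largest])  # (u, v) marker centre in image coordinates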

5.1.2 Camera Constraints

Four virtual cameras were placed around the OoI. They were specified to have a resolution of 640 × 480 and a vertical FOV of 45°. As displayed in Fig. 3, their positions were constrained to a volume bounded within two concentric spheres whose centers coincided with the mean position of the neutral object's vertices. The outer sphere radius was defined as 10 D_max, and the inner radius was defined as 0.5 D_max (in order to ensure that the cameras did not collide with the OoI), where D_max was defined as the maximum dimension of the OoI's bounding box.

Each camera was free to move within a quadrant of the volume defined by two adjacent octants. Camera orientations were constrained such that the camera focal line could not point more than 45° away from the OoI's mean vertex position, and the upward direction in each camera's camera-coordinates always pointed in the plane prescribed by its focal line and the upward direction in world-coordinates.

Fig. 3 Top view of camera constraints for simulation

5.1.3 Simulation 1: System Reconfigurability Comparison

Procedure The objective of Simulation 1 was to demonstrate the effectiveness of a multi-camera system at different levels of reconfigurability. Since the configurations which the system could adopt at any DI were limited by the cameras' ranges of motion and maximum achievable speeds, three distinct trials were performed for this simulation:

Trial 1: Static cameras were employed. They remained at their initial respective poses over the entire trial duration T. The initial system configuration was determined based on the one optimally determined by our methodology for the neutral OoI shape.

Trial 2: Dynamic cameras with limited speeds were employed. They could be repositioned at selected configurations at each DI subject to camera-relocation speed and acceleration constraints. Namely, achievable system configurations were limited to those whose cameras' poses in the workspace could be reached within the available time before the next DI. The maximum camera speeds and accelerations were constrained to 0.2 D_max/(t_j − t_{j−1}) and 0.4 D_max/(t_j − t_{j−1})², respectively.

Trial 3: Unconstrained dynamic cameras with no speed or acceleration limitations were employed, representing the ideal case.

The test procedure was as follows: At each DI, a view of the deformed OoI was rendered using a pin-hole camera model in OpenGL from each camera's viewpoint as defined by the configuration adopted for that DI. Images captured from these views were used as input to the feature-detection algorithm, and the visible colored markers (vertices) were thereby detected in each image and identified by matching their colors to the corresponding features. The world coordinates of these detected features were triangulated based on the known poses of each of the cameras, and stored for later comparison. Then, the OoI was deformed to the shape specified for the subsequent DI, and the process was repeated for all DIs in T. DIs were defined to be evenly spaced at 20-second intervals.

Fig. 4 Error metric E_t evaluated at each DI

Results and Discussion As can be noted in Fig. 4, Trials 2 and 3 yielded about 41 % and 57 % improvements in the overall error metric E, respectively, when compared to Trial 1 (static cameras). Clearly, employing active reconfiguration tangibly improves the deformation-recovery accuracy over static cameras, and the degree of reconfigurability is positively related to the degree of performance improvement.

As the OoI deforms, certain shapes may not allow for effective visibility of all the vertices by the static cameras and, thus, the benefit of reconfiguring the system by relocating the cameras becomes apparent. This effect can be noted in Fig. 5, which displays an example of static versus active cameras (Trials 1 and 3, respectively) for the same deformation at DI 31. The blue mesh represents the true deformed shape and the red dots represent undetected vertices. By reconfiguring the active-vision system, a preferred view from Camera 1 allowed vertices undetected under the static configuration to actually be detected. Moreover, this superior configuration ensured that the locations of such vertices were accurately tracked not only in the current frame, but also in subsequent frames, whereas the static configuration led to significant accumulation of errors over the next four frames.

Fig. 5 DI 31: (a) Static cameras, (b) Unconstrained dynamic cameras

It is, thus, important to note that not all OoI vertices must necessarily be visible at every DI in order to minimize errors. At a given DI, a sufficiently robust tracking method allows the locations of vertices with minimal drift to be accurately estimated, even if they are not detected. Consequently, a reconfigurable system could require fewer cameras than a static system would, as cameras can be dynamically and selectively reassigned to detect only those vertices whose drifts exceed the threshold required by the application at hand.

In the static case, however, the amount of drift is not known a priori, and so the camera set-up must be planned such that all vertices can be viewed at all times in order to account for the potential drift of each vertex, even if such drift would not in fact arise. Namely, not only do reconfigurable cameras provide greater coverage of the OoI, but they also may yield lower error estimates at the same coverage level.

5.1.4 Simulation 2: Robustness to Prediction Error

In order to capture the ideal images of the OoI at a DI, cameras must already be relocated to their optimal poses before the OoI actually assumes its deformation. Since the system configuration depends on the OoI's deformation, which is not known a priori, on-line prediction of the OoI's deformation is necessary. Furthermore, since predictions cannot be made without some degree of uncertainty, it is critical that configurations superior to the static case may still be selected in the face of prediction errors. The objective of Simulation 2 is, thus, to demonstrate the ability of the system-reconfiguration methodology to account for uncertainty in the predicted deformation.

Procedure In order to characterize prediction error and evaluate its effect on the accuracy of deformation recovery, two trials were run:

Trial 1: The OoI's true vertex coordinates were used as the vertices' predicted coordinates at each DI.

Trial 2: The output from the prediction module was used as the vertices' predicted coordinates at each DI.

In both trials, the system was initially presented with the true coordinates of the vertices at the first DI as a prediction for the OoI's shape at that DI. Based on this prediction, the configuration-selection module determined an initial configuration for the cameras. The system then operated in the same manner described in Section 5.1.3 for the limited-speed trial, except that, depending on the trial, either the output generated by the prediction module or the OoI's true vertex coordinates were used as the input to the configuration-selection module. This process was repeated for all DIs in the trial, T.

In order to characterize the level of error in each estimate introduced by the prediction module, the actual and predicted positions of each vertex were recorded at each DI in Trial 1, and the normalized error in the prediction was calculated as the Euclidean distance between the predicted and actual coordinates divided by the radius of the object, in percentages.

Results and Discussion As depicted in Fig. 6, the prediction error for a given vertex varies over the OoI's deformation, with a mean of about 4 % of the OoI radius. One may note that for Trial 1, the views adopted by the cameras were selected based upon the true vertex locations. Thus, the errors in Fig. 6 can be uniquely attributed to errors introduced by the prediction module itself, and not to inadequacies in camera-view selection.

Fig. 6 Distribution of errors in prediction, normalized as a percent of OoI radius and averaged over all DIs, for Trial 1

Even by using a simple tracking scheme, such as a KF, with a 4 % mean tracking error, the system was able to achieve a maximum error of only 14 % as described in Section 5.1.3. Adopting a more accurate tracking scheme might therefore allow for improved system performance at the same level of error, or lead to even lower error in capturing the motion of deformable objects for a given level of system performance. For example, more accurate tracking could be beneficial for on-line applications by allowing the system to operate at a lower frame rate, and reducing the need for complex or computationally expensive detection methods. Additionally, it would reduce the level of calibration accuracy required, as errors in camera measurements would be compensated by increased accuracy of predictions.

Given that a certain degree of error in the prediction is unavoidable, it is necessary to ensure that drift from tracking error does not lead the system to become unstable in selecting configurations. In order to assess the proposed system's robustness to such prediction error, the instantaneous error metric, E_t, was evaluated for each of the recovered deformations in Trials 1 and 2, and displayed in Fig. 7.

For Trial 2, the views adopted by the cameras were selected based on the predicted vertex locations, which were in turn dependent upon the previous DI's views. In this manner, the stability of the system could be evaluated, since the effect of drift (prediction-error accumulation) over multiple DIs was accounted for in the recovered deformations. Over most DIs, the mean error metric, E, for Trial 2 did not exceed that of Trial 1 by more than 2. In Frames 16 to 23, poor tracking led to a significant accumulation of estimation error. At Frame 23, however, the system was able to recover accuracy in estimation, demonstrating its stability in the face of significant drift. Thus, for the given deformations, even using a simple tracking model, the system was robust to tracking error.

5.2 Simulation 3: Obstacle Avoidance and Impact of Number of Cameras

The objective of the third simulation was to evaluate the system's effectiveness in the presence of obstacles, and how this effectiveness changes with respect to the number of cameras. To this end, simulations were conducted for cases with 3, 4, 5, and 6 cameras, where each case consisted of three trials: (i) static cameras, (ii) cameras which were only free to rotate (pan), and (iii) cameras which could rotate and translate. In each trial, the motion of a deforming OoI moving between a pair of obstacles was captured, and the error metric in each frame was evaluated. Below is the description of the workspace set-up, OoI, and simulation procedure, followed by the results and a discussion thereof.

Deformable Object Model The OoI was generated from a set of 22 vertices stacked in two rings, forming a cylindrical shell with a deforming surface, to resemble the physical model used in the experiments in Section 6. The vertices' motion was defined as the ground-truth positions of the vertices measured in the second scenario of experiments in Section 6, such that the OoI translated across the center of the workspace at 25 mm per DI, while its surface deformed in a smooth manner.
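
For illustration, a synthetic stand-in for such a two-ring vertex layout can be built in a few lines (Python/NumPy); the radius and height below are placeholders rather than the model's exact dimensions, and the rigid per-DI translation omits the measured surface deformation used here:

import numpy as np

def two_ring_shell(n_per_ring=11, radius=60.0, height=69.0):
    # 22 vertices stacked in two rings, approximating a cylindrical shell;
    # the radius and height values are illustrative placeholders only.
    angles = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
    ring = np.column_stack((radius * np.cos(angles),
                            radius * np.sin(angles),
                            np.zeros(n_per_ring)))
    return np.vstack((ring, ring + np.array([0.0, 0.0, height])))   # (22, 3)

def translate_per_di(vertices, n_dis=15, step_mm=(25.0, 0.0, 0.0)):
    # Rigid 25 mm-per-DI translation across the workspace (no deformation).
    return np.array([vertices + i * np.asarray(step_mm) for i in range(n_dis)])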

Fig. 7 Error metric Et evaluated at each DI for ideal prediction (Trial 1) and actual prediction (Trial 2)

Fig. 8 Simulation 3 workspace with all cameras' initial poses

Set-up and Camera Constraints The simulated scene was contained within a virtual 1700 × 1200 mm workspace, with two static cylindrical obstacles, 45 mm in diameter and 500 mm tall, placed as indicated in Fig. 8. Up to six cameras were present in the workspace, with initial poses set, as depicted in Fig. 8, such that, if the obstacles were removed, all cameras could maintain the OoI within their FOV throughout its motion.

The particular cameras included in each case were those whose index was no greater than the number of cameras in the case. To permit camera motion, in Trial 2, all cameras were allowed to pan ±90° about the vertical axis, and in Trial 3, Cameras 2, 4, 5, and 6 were, when present, also allowed to translate over a range of 250 mm. The maximum speed and acceleration of the cameras were limited to 450 mm/s and 900 mm/s², respectively, for translation, and to 40.1 °/s and 80.2 °/s², respectively, for rotation. Throughout, the heights of Cameras 1 through 4 were maintained at 350 mm, while those of Cameras 5 and 6 were maintained at 250 mm.
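
One way to verify that a candidate pose is reachable within the inter-DI interval under such limits is a rest-to-rest trapezoidal-profile check; the sketch below is illustrative only and is not the paper's configuration-selection code:

def max_travel(dt, v_max, a_max):
    # Largest rest-to-rest displacement achievable in time dt under symmetric
    # acceleration/deceleration limits (trapezoidal velocity profile).
    t_ramp = v_max / a_max
    if dt <= 2.0 * t_ramp:
        return a_max * (dt / 2.0) ** 2        # triangular profile: v_max never reached
    return v_max * (dt - t_ramp)              # accelerate, cruise, decelerate

# With the stated limits and a 20-second inter-DI interval:
print(max_travel(20.0, 450.0, 900.0))   # translation: 8775.0 mm
print(max_travel(20.0, 40.1, 80.2))     # rotation: 781.95 degrees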

Procedure The simulation consisted of 15 DIs uniformly spaced at 20-second intervals. To detect the vertices, instead of rendering images and running the feature detector, the simulation projected the coordinates of all unoccluded vertices in each camera's FOV into the cameras' image-spaces, and then forwarded these feature locations to the association module. Detection error was simulated by adding Gaussian noise, with a standard deviation of 3 pixels in the x and y directions, to each detected feature's location in each image, which matched the amount of error exhibited by the feature detector in the real-world experiments described in Section 6. The system and simulations were implemented in Matlab on 3.4 GHz CPUs, and the system completed the calculation of each frame in, on average, 4.0 s in the 4-camera case.
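
A minimal sketch of this simulated measurement step is given below (Python/NumPy, pinhole projection); the intrinsic matrix K, the world-to-camera pose (R, t), and the occlusion/FOV test are assumed inputs, since only the 3-pixel noise level is specified above:

import numpy as np

rng = np.random.default_rng(seed=0)

def simulate_detections(vertices_w, K, R, t, visible, noise_px=3.0):
    # Project world-frame vertices into one camera and add Gaussian detection noise.
    # vertices_w: (N, 3); K: (3, 3) intrinsics; R, t: world-to-camera rotation/translation;
    # visible: boolean (N,) mask from an occlusion and field-of-view test (not shown here).
    cam = (R @ vertices_w.T + t.reshape(3, 1)).T          # world -> camera frame
    uvw = (K @ cam.T).T
    pixels = uvw[:, :2] / uvw[:, 2:3]                     # perspective division
    pixels += rng.normal(scale=noise_px, size=pixels.shape)
    return pixels[visible]                                # only unoccluded vertices are reported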

Results and Discussion The results of the simulation are presented in Fig. 9, wherein each vertical bar represents the range of errors Et across all DIs for a given trial and camera case, and each horizontal line connects the mean errors, E, for a given trial across camera cases.

In the 3-, 4-, and 5-camera cases, the proposed system reduced both the mean and the maximum error relative to those trials where cameras were static or could only change their orientation. As the number of cameras increased, the average errors of each of the trials decreased asymptotically, and the effectiveness of static and rotation-only cameras approached that of translating cameras.

It is important to note that although, in the current simulations, repositioning the cameras provided only marginal improvement over the other trials when 6 cameras were used, in other situations (e.g., when more obstacles are present in the scene, or the OoI exhibits different motion or deformations), actively repositioning the cameras may indeed reduce the error even when 6 or more cameras are available.

Fig. 9 Mean and range of error metric Et with 3, 4, 5, and 6 cameras

These results demonstrate the proposed system's effectiveness at capturing the motion of deforming objects in the presence of obstacles, and the benefit of using it over static or rotation-only cameras for a given number of cameras. Although the simulations were conducted with a dynamic OoI moving through static obstacles, since all motion is relative, they also demonstrate that actively repositioning the cameras would reduce the error in capturing the motion of a deformable OoI in the presence of dynamic obstacles. This is particularly beneficial as cameras may not be able to be placed a priori in static positions that avoid occlusions from such obstacles.

6 Experiments

The objective of the experiments reported herein is to confirm the effectiveness of the proposed method in recovering the shapes of deformable objects and in overcoming obstacles, and to characterize its performance with application to the motion capture of a dynamic deformable OoI. To this end, two scenarios of real-world experiments were performed: obstacle-free, and with two obstacles in the workspace, in both of which a deforming OoI moves across a scene observed by four cameras. Each scenario in turn contained three trials of varying levels of reconfigurability, corresponding to cameras with (i) static pose, (ii) adjustable orientation, and (iii) both adjustable orientation and translation. These scenarios, for example, could reflect the motion capture of the flowing dress of a dancer moving across an open set, and a set with two columns, respectively.

6.1 Set-up

In the experiments, a 69 mm tall and 400 mm in circumference flexible model, akin to a cylindrical shell with a deformable surface, provided precise ground-truth measurements for the shape of the deformable OoI. The model shape could be accurately specified at each DI by adjusting the poses of three braces, Fig. 10. The surface of the model was populated with twenty-two 6.2 mm radius colored spherical markers to represent the vertices.

Fig. 10 Experimental set-up and OoI's motion path; Inset: Deformable OoI

The experimental scene was identical to the one described in Section 5.2, but with four cameras present, two of which were mounted on high-precision 2-dof linear and rotary stages (Cameras 2 and 4), and two mounted only on high-precision 1-dof rotary stages (Cameras 1 and 3), Fig. 10. All stages were fixed to an optical table, and their motion could be controlled to within ±0.05 mm of translation and ±0.01° of rotation. Thus, even without the benefit of extrinsic calibration, the cameras' motion model (and, thus, their poses for any given input to the stages) was known up to the rotational and positional offsets at which the cameras were mounted on the stages.

Four Logitech QuickCam Pro 9000 cameras were used, with a resolution of 960 × 720 pixels and a vertical FOV of 45°. Each camera was calibrated independently off-line using an advanced geometric technique that employed a frontal-image concept and a precise control-point-detection scheme with digital-image correlation [38]. The calibration was performed using the existing set-up, with a calibration-target template fixed to the central stage. In this manner, first the intrinsic camera parameters were determined and, then, the extrinsic parameters provided during the calibration process were used to calculate the camera-pose offsets in the motion model. The OoI model itself was mounted on a high-precision X–Y motion stage.
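
In other words, once the fixed mounting offset has been recovered from the extrinsic calibration, the camera pose for any stage command follows by composing homogeneous transforms. The sketch below (Python/NumPy) assumes, purely for illustration, a single translational axis plus a vertical rotary axis, not the exact stage geometry used here:

import numpy as np

def rot_z(deg):
    # Homogeneous rotation about the vertical axis.
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    T = np.eye(4)
    T[0, 0], T[0, 1], T[1, 0], T[1, 1] = c, -s, s, c
    return T

def trans(x, y, z):
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def camera_pose(stage_translation_mm, stage_rotation_deg, T_offset):
    # World pose of the camera: known stage motion composed with the fixed
    # camera-to-stage mounting offset estimated from the extrinsic calibration.
    T_stage = trans(stage_translation_mm, 0.0, 0.0) @ rot_z(stage_rotation_deg)
    return T_stage @ T_offset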

6.2 Implementation

The calibration error in Eq. 16, εac, was modeled as a function of both the Euclidean distance and the angle between each camera and the centroid of the visible vertices. Feature detection was accomplished using color segmentation and circle fitting, ensuring that the locations of the projected centers of the color markers could be measured to sub-pixel accuracy, even in the case of partially overlapping features.
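
For illustration, one way to implement such a detector is sketched below (Python/OpenCV); the HSV thresholds, the centroid-from-moments sub-pixel estimate, and the minimum-enclosing-circle fit are assumptions, not the exact routine used in the experiments:

import cv2
import numpy as np

def detect_markers(image_bgr, hsv_low, hsv_high, min_area=20.0):
    # Segment one marker color and return sub-pixel center estimates (u, v, radius).
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_low, np.uint8), np.array(hsv_high, np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centers = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue                                    # reject speckle noise
        (cx, cy), radius = cv2.minEnclosingCircle(c)    # coarse circle fit
        m = cv2.moments(c)
        if m["m00"] > 0:                                # sub-pixel centroid from moments
            cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
        centers.append((cx, cy, radius))
    return centers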

6.3 Procedure

In each scenario, three trials were run in which the OoI followed the path shown in Fig. 10, above, and underwent deformations. In the first scenario, the OoI stage translated at 9 mm per DI over a sequence of 44 DIs. In the second scenario, obstacles were placed in the workspace, and the OoI stage translated at 25 mm per DI over a sequence of 16 DIs. The OoI's deformations in the second scenario were the same as those in the first 16 DIs of the first scenario, and, within both scenarios, all three trials exhibited identical OoI deformation and stage displacement. DIs were also defined to be evenly spaced at 20-second intervals in both scenarios. The deformation sequence was generated to achieve a smooth motion of the vertices along the motion path of the OoI, and vertex positions were measured in 3D to within 0.9 mm error.

Only changes in camera orientation were permitted in Trial 2, whereas changes in both orientation and position were allowed in Trial 3. In the static-camera trial (Trial 1), the configuration was set using a standard off-line camera-placement strategy common to many other works, such as [39]. The four cameras were placed in a rectangular configuration, each with its focal line placed on a normal to the path of the OoI. In the active-camera trials, the cameras were positioned in the configurations selected by the proposed methodology at each DI, and camera motions were constrained by the capabilities of the camera actuators.

6.4 Results and Discussion

Time-series results of the system's behavior in each of the two scenarios of real-world experiments are provided below, along with the associated discussion thereof, followed by statistics for the overall performance of the system. A movie-strip sample of the experimental results is available in Appendix B, while complete tables of camera poses for the simulations and experiments are provided in Appendix C.

Scenario 1: System Behavior For the scenario with no obstacles in the scene, Fig. 11 displays the errors for each trial, at every DI, where their accuracy at each frame is ±0.2 mm. In Trials 1 and 2, without translating cameras, OoI self-occlusions occurred during DIs 13 to 19 and 33 to 38. As expected, for these DIs, the errors increased. In Trial 3, however, with both rotating and translating cameras, the errors remained lower because system reconfiguration allowed the cameras to observe vertices occluded in the other two trials. This demonstrates the benefit of translating cameras to achieve superior OoI shape recovery by overcoming self-occlusion.

Fig. 11 Error metric Et evaluated at each DI, in obstacle-free workspace

It is important to note that, in all the DIs, the errors for Trial 3 did not exceed 5.2 mm, with the exception of a large error at DI 38. At this DI, vertices were occluded such that they would only have been observable outside of the cameras' feasible range of motion as defined by the workspace constraints.

In addition to the benefits mentioned before, the frame sequence above demonstrates two more ways in which the proposed system reduces error: first, the proposed system avoids drift by properly selecting configurations which minimize uncertainty; second, it improves the OoI shape recovery even for vertices which are already visible, by reducing error in the priors.

Scenario 2: System Behavior For the second scenario, with obstacles in the scene, Fig. 12 displays the errors for each trial, at every DI, where their accuracy at each frame is ±0.2 mm. In the trials with cameras which were static or which could only rotate, the obstacles occluded parts of the OoI from DI 2 to 15. During these frames, the error increased, as no visual feedback was available to correct for drift. This drift led to permanent tracking loss of certain vertices, and so the error continued to grow even after the obstacle passed the OoI.

In the trial where the cameras could translate, however, the cameras were able to reposition so as to avoid the obstacles, and so the error remained relatively constant and did not exceed 6.3 mm. This demonstrates the benefit of translating cameras and their ability to continuously recover vertices in the presence of obstacles.

Fig. 12 Error metric Et evaluated at each DI, in obstacle-laden workspace

Table 1 Overall error for each trial

Trial #   Camera reconfigurability    Average overall error,    Average overall error,
                                      no obstacles (mm)         with obstacles (mm)
1         Static                      4.82 ± 0.03               15.63 ± 0.06
2         Orientation only            4.27 ± 0.03                9.50 ± 0.06
3         Orientation and position    3.98 ± 0.03                4.31 ± 0.06

Overall System Performance Table 1 summarizes the average overall errors in the static-camera and rotating-camera trials compared to those in the fully reconfigurable-camera trials for both scenarios. In the first scenario, with no obstacles present, when cameras were allowed to change both their orientation and position, actively reconfiguring them reduced the overall error by about 17 % relative to the static-camera case, and by about 7 % relative to the orientation-only case. In the second scenario, with two obstacles placed in the workspace, full reconfiguration reduced the overall error by about 72 % relative to the static-camera case, and by about 55 % relative to the orientation-only case. Thus, not only does changing the position of the cameras significantly reduce the overall error in both obstacle-free and obstacle-laden scenes, but the benefit of the proposed system in recovering the shape of a deformable OoI is even more pronounced in obstacle-laden environments.
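
The quoted reductions follow directly from the Table 1 entries:

\[
\frac{4.82-3.98}{4.82}\approx 17\,\%,\quad
\frac{4.27-3.98}{4.27}\approx 7\,\%,\quad
\frac{15.63-4.31}{15.63}\approx 72\,\%,\quad
\frac{9.50-4.31}{9.50}\approx 55\,\%.
\]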

It is important to note that the errors during those DIs when occlusion does take place, particularly for those vertices which become occluded, are much greater than the average error, and are significantly worse for the static and rotating cameras than for the translating cameras. For example, as depicted in Fig. 13, for the rotating-camera case in Scenario 1, there were 38 instances in which vertices had an error over 10 mm (some reaching as high as 33 mm); repositioning the cameras using the proposed system reduced the number of vertices with such errors by 26 % (and their maximum error was no more than 24 mm).

In Scenario 2, individual vertices' errors in the rotating-camera case were as high as 65 mm, whereas repositioning the cameras reduced the errors to no more than 16 mm and cut the number of vertices with errors over 10 mm by 93 %. Clearly, the proposed system was particularly useful in improving the accuracy of those vertices which were occluded and which induced the highest errors.

Fig. 13 Histogram of vertices' errors above 10 mm: (a) obstacle-free environment, (b) obstacle-laden environment

7 Conclusions

In this paper, an effective system for actively reconfiguring a set of cameras to minimize the error in motion capture and shape recovery of deformable objects-of-interest (OoIs) was presented. The proposed system selects camera poses which minimize an error metric that accounts for such factors as vertex occlusions, measurement error in vertex detection, and prior uncertainty in vertex locations. It has been shown that, by directly accounting for the OoI's deformation when selecting camera configurations, the error in the recovered deformations could be reduced.

Extensive simulations demonstrated the tangible benefit of using active-camera reconfiguration over static-camera systems in recovering the shape of a deformable OoI, and evaluated the robustness of the method in the presence of motion-prediction errors. Simulations also demonstrated the benefit of repositioning cameras to overcome obstacles occluding the OoI, compared to cameras which are static or only able to rotate, and evaluated this effect as the number of cameras increases.

Experiments also validated the applicability of our proposed system to motion capture, and particularly highlighted the effectiveness of reconfigurable systems in large open as well as obstacle-laden scenes, by determining optimal camera positions in addition to orientations.

Acknowledgements The authors would like to acknowledge the support received, in part, from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Appendix A: Kalman Filter

For reference, the process for estimating the vertex positions using the Kalman Filter equations is provided below in our notation.

At DI t_j, given the estimated state vector, s_{i,j-1}, and state covariance matrix, S_{i,j-1}, both determined at the previous DI t_{j-1}, and the detected vertex positions, z_{i,j}, the process to recover an estimate for Vertex i's current state, s_{i,j}, and state covariance matrix, S_{i,j}, is as follows:

1. Predict the current state from the previous estimate:

   s^-_{i,j} = F s_{i,j-1}   (21)

2. Predict the current error covariance:

   S^-_{i,j} = F S_{i,j-1} F^T + Q   (22)

3. Compute the Kalman gain:

   K_{i,j} = S^-_{i,j} H^T (H S^-_{i,j} H^T + R_{i,j})^{-1}   (23)

4. Update the state estimate with the measurement:

   s_{i,j} = s^-_{i,j} + K_{i,j} (z_{i,j} - H s^-_{i,j})   (24)

5. Update the error covariance:

   S_{i,j} = (I - K_{i,j} H) S^-_{i,j}   (25)
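
A minimal per-vertex implementation of Eqs. 21-25 is sketched below (Python/NumPy); the constant-velocity state layout and the particular F and H shown in the example are assumptions made for illustration:

import numpy as np

def kalman_step(s_prev, S_prev, z, F, H, Q, R):
    # One predict/update cycle for a single vertex, following Eqs. 21-25.
    s_pred = F @ s_prev                                        # Eq. 21
    S_pred = F @ S_prev @ F.T + Q                              # Eq. 22
    K = S_pred @ H.T @ np.linalg.inv(H @ S_pred @ H.T + R)     # Eq. 23
    s_new = s_pred + K @ (z - H @ s_pred)                      # Eq. 24
    S_new = (np.eye(len(s_prev)) - K @ H) @ S_pred             # Eq. 25
    return s_new, S_new

# Example set-up: state [x, y, z, vx, vy, vz], position-only measurements,
# and dt equal to the inter-DI interval (an assumed, illustrative choice).
dt = 20.0
F = np.eye(6)
F[:3, 3:] = dt * np.eye(3)
H = np.hstack((np.eye(3), np.zeros((3, 3))))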

Appendix B: Camera-Pose Movie Strip

“Movie-strip” views of a selected sequence of DIs from the trials with rotating-only and with both rotating and translating cameras in the first (obstacle-free) experimental scenario are shown in Table 2. At DI 6, all the vertices were detectable by the cameras in the initial configuration. At DI 13, in the rotating-only trial, the two red vertices in the middle of Camera 1's view were obstructed by the OoI's surface, increasing the error to about 4.6 mm. In the trial with both rotating and translating cameras, however, the system properly selected a configuration which allowed Camera 2 to observe the otherwise occluded vertices, and so the error was only 3.9 mm. In the subsequent DIs, up to DI 19, the two red vertices mentioned above remained occluded from Camera 2's view in the rotating-only trial, and so the inability to detect the vertices impeded the system's ability to accurately predict their locations, causing drift which led to successively greater error at each DI. When cameras were able to change their position, however, the system was able to continuously reconfigure the cameras to observe these vertices, achieving consistently lower errors than those in the trial where cameras could only rotate. Even at DI 21, when the previously occluded vertices returned to view in the trial with only rotating cameras, the estimation errors were still larger than those for the trial with translating cameras, due to better prior predictions.


Table 2 “Movie-strip” views: error metric (mm) at selected DIs

DI    Rotating cameras    Rotating and translating cameras
6     3.4                 3.3
13    4.6                 3.9
19    7.1                 4.7
21    3.5                 2.8

Appendix C: Camera Poses for Simulations and Experiments

Table notation:

R = rotation (°)
T = translation (mm)

Table 3 Simulation-3 camera trajectories – 3 rotating cameras

Frame # Camera

1 2 2 4

R R T R

1 70 100 850 2502 62 110 850 2533 60 109 850 2544 58 107 850 2565 56 106 850 2576 54 104 850 2597 53 103 850 2608 51 101 850 2629 49 100 850 26410 48 98 850 26511 46 96 850 26712 45 95 850 26913 43 93 850 27114 41 91 850 27315 40 89 850 275

Table 4 Simulation-3 camera trajectories – 3 rotating & translating cameras

Frame # Camera

1 2 2 3

R R T R

1 70 100 850 2502 62 103 724 2533 60 105 788 2544 58 103 789 2565 56 102 789 2576 54 100 789 2597 53 95 731 2608 51 87 638 2629 49 100 850 26410 47 89 727 26611 46 96 850 26812 44 95 850 27013 42 93 850 27114 41 91 850 27315 40 89 850 275

Table 5 Simulation-3 camera trajectories – 4 rotating cameras

Frame # Camera

1 2 2 3 4 4

R R T R R T

1 70 100 850 250 290 2502 62 110 850 253 286 2503 60 108 850 254 291 250


Table 5 (Continued)

Frame # Camera

1 2 2 3 4 4

R R T R R T

4 58 107 850 256 295 2505 56 105 850 257 298 2506 54 104 850 259 302 2507 53 102 850 260 305 2508 51 101 850 262 307 2509 50 100 850 263 307 25010 48 98 850 265 308 25011 46 97 850 267 310 25012 45 95 850 269 310 25013 43 94 850 271 311 25014 42 92 850 273 311 25015 40 90 850 275 311 250

Table 6 Simulation-3 camera trajectories – 4 rotating & translating cameras

Frame Camera

1 2 2 3 4 4

R R T R R T

1 70 100 850 250 290 2502 62 107 788 253 290 2503 60 105 788 254 290 2504 58 103 789 256 277 3585 56 102 789 257 283 3506 54 100 789 259 289 3377 53 99 791 260 276 4408 51 94 745 262 298 3129 49 93 758 264 300 31810 47 89 726 266 303 31811 46 86 711 268 303 31812 44 86 725 269 304 31713 43 86 725 271 301 37814 41 91 850 273 308 31615 40 89 850 275 309 315

Table 7 Simulation-3 camera trajectories – 5 rotating cameras

Frame # Camera

1 2 2 3 4 4 5 5

R R T R R T R T

1 70 100 850 250 290 250 0 9252 62 110 850 253 286 250 −4 9253 60 109 850 254 290 250 −4 9254 58 107 850 256 294 250 −4 9255 56 106 850 257 297 250 −4 9256 54 104 850 259 300 250 −3 9257 53 103 850 260 303 250 −3 9258 51 101 850 262 305 250 −3 9259 49 99 850 264 307 250 −3 925

Table 7 (Continued)

Frame # Camera

1 2 2 3 4 4 5 5

R R T R R T R T

10 47 98 850 266 308 250 −4 92511 46 96 850 268 309 250 −4 92512 44 95 850 269 308 250 −4 92513 43 93 850 271 309 250 −4 92514 41 91 850 273 310 250 −4 92515 40 89 850 275 311 250 −3 925

Table 8 Simulation-3 camera trajectories – 5 rotating & translating cameras

Frame # Camera

1 2 2 3 4 4 5 5

R R T R R T R T

1 70 100 850 250 290 250 0 9252 62 107 788 253 290 250 4 8683 60 105 788 254 277 337 4 8674 58 103 789 256 281 335 4 8675 56 102 789 257 282 353 −3 9256 54 100 789 259 294 300 4 8667 53 99 793 260 283 397 −3 9258 51 94 742 262 298 311 −3 9259 49 93 756 264 301 319 −3 92510 47 89 725 266 308 257 3 86611 46 96 850 268 303 318 2 86512 44 86 726 269 304 317 1 86513 43 86 726 271 301 378 −4 92514 41 91 850 273 307 316 1 86515 40 89 850 275 311 252 1 865

Table 9 Simulation-3 camera trajectories – 6 rotating cameras

Frame # Camera

1 2 2 3 4 4 5 5 6 6

R R T R R T R T R T

1 70 100 850 250 290 250 0 925 180 9252 62 110 850 253 286 250 −4 925 182 9253 60 108 850 254 291 250 −4 925 182 9254 58 107 850 256 294 250 −4 925 182 9255 56 106 850 257 298 250 −3 925 182 9256 54 104 850 259 300 250 −3 925 182 9257 53 103 850 260 303 250 −3 925 183 9258 51 101 850 262 305 250 −3 925 183 9259 49 99 850 264 307 250 −3 925 183 92510 47 98 850 266 308 250 −3 925 183 92511 46 96 850 268 308 250 −4 925 184 92512 44 95 850 269 308 250 −4 925 184 92513 43 93 850 271 309 250 −4 925 185 92514 41 91 850 273 310 250 −4 925 185 92515 40 89 850 275 311 250 −4 925 185 925


Table 10 Simulation-3 camera trajectories – 6 rotating & translating cameras

Frame # Camera

1 2 2 3 4 4 5 5 6 6

R R T R R T R T R T

1 70 100 850 250 290 250 0 925 180 9252 62 110 850 253 284 262 4 868 178 8613 60 108 850 254 276 338 4 867 177 8614 58 107 850 256 293 261 4 867 177 8605 56 98 724 257 282 354 −3 925 177 8606 54 100 789 259 299 260 3 866 177 8607 52 103 850 261 302 259 3 866 177 8608 51 101 850 262 304 258 3 866 177 8609 49 99 850 264 306 258 3 866 177 86010 47 98 850 266 308 257 3 866 177 86011 46 96 850 268 308 256 2 865 178 86012 44 95 850 269 308 255 1 865 178 86013 43 86 750 271 309 254 1 865 178 86014 41 91 850 273 310 253 1 865 178 85915 40 89 850 275 311 251 1 865 177 859

Table 11 Experiment scenario-1 camera trajectories – rotating cameras with no obstacles

Frame # Camera

1 2 2 3 4 4

R R T R R T

1 70 100 850 250 290 2502 71 102 850 241 305 2503 70 101 850 241 305 2504 69 101 850 242 307 2505 69 100 850 242 307 2506 68 100 850 243 307 2507 67 99 850 243 302 2508 67 99 850 244 304 2509 66 98 850 245 305 25010 65 103 850 245 301 25011 65 103 850 246 301 25012 64 103 850 246 301 25013 64 102 850 247 302 25014 63 102 850 247 303 25015 62 101 850 248 304 25016 62 100 850 249 304 25017 61 100 850 249 305 25018 61 99 850 250 305 25019 70 114 850 245 291 25020 69 113 850 246 291 25021 68 112 850 247 293 25022 63 100 850 252 298 25023 75 101 850 253 284 25024 67 101 850 251 290 25025 66 100 850 252 291 250

Table 11 (Continued)

Frame # Camera

1 2 2 3 4 4

R R T R R T

26 66 100 850 252 291 25027 65 99 850 253 292 25028 64 99 850 254 292 25029 64 98 850 255 293 25030 63 98 850 255 293 25031 63 97 850 255 294 25032 62 96 850 256 295 25033 62 96 850 257 295 25034 61 95 850 258 296 25035 61 95 850 259 296 25036 60 94 850 260 297 25037 59 93 850 260 292 25038 59 93 850 261 293 25039 58 92 850 261 298 25040 58 91 850 262 299 25041 58 91 850 263 300 25042 57 90 850 263 300 25043 57 89 850 264 301 25044 56 88 850 265 301 25045 56 88 850 265 302 250

Table 12 Experiment scenario-1 camera trajectories – rotating & translating cameras with no obstacles

Frame # Camera

1 2 2 3 4 4

R R T R R T

1 70 100 850 250 290 2502 71 102 850 241 305 2503 70 101 850 241 306 2504 69 101 850 242 307 2505 69 100 850 242 307 2506 68 100 850 243 307 2507 67 99 850 243 302 2508 67 99 850 244 304 2509 66 98 850 245 305 25010 65 103 850 245 301 25011 65 103 850 246 301 25012 64 103 850 246 302 25013 64 89 645 247 302 25014 63 97 780 247 303 25015 62 90 681 248 304 25016 62 87 650 249 304 25017 61 88 675 249 305 25018 61 87 658 249 305 25019 70 100 650 245 291 25020 69 113 850 246 292 25021 68 112 850 247 293 250


Table 12 (Continued)

Frame # Camera

1 2 2 3 4 4

R R T R R T

22 63 100 850 252 298 25023 75 101 850 253 284 25024 67 101 850 251 290 25025 66 100 850 252 291 25026 66 100 850 253 292 25027 65 99 850 253 292 25028 64 99 850 254 292 25029 64 98 850 255 293 25030 64 98 850 255 293 25031 63 97 850 255 294 25032 62 97 850 256 295 25033 62 96 850 257 288 37534 61 95 850 258 308 45835 61 95 850 259 282 45236 60 94 850 260 297 25037 59 93 850 260 289 30038 59 93 850 261 293 25039 58 92 850 262 298 25040 58 91 850 262 299 25041 58 91 850 263 300 25042 57 90 850 263 300 25043 57 89 850 264 301 25044 56 89 850 265 301 25045 56 88 850 265 302 250

Table 13 Experiment scenario-2 camera trajectories – rotating cameras with obstacles

Frame # Camera

1 2 2 3 4 4

R R T R R T

1 70 100 850 250 290 2502 62 110 850 253 287 2503 60 109 850 254 290 2504 58 107 850 255 294 2505 56 106 850 257 297 2506 55 104 850 258 300 2507 53 103 850 260 302 2508 52 102 850 261 304 2509 50 100 850 263 305 25010 49 99 850 264 307 25011 47 97 850 266 308 25012 45 96 850 268 308 25013 44 94 850 270 308 25014 43 93 850 271 309 25015 41 91 850 273 311 25016 40 90 850 275 312 250

Table 14 Experiment scenario-2 camera trajectories – rotating & translating cameras with obstacles

Frame # Camera

1 2 2 3 4 4

R R T R R T

1 70 100 850 250 290 2502 62 106 788 253 290 2503 60 105 788 254 290 2504 58 104 789 255 279 3595 56 102 789 257 281 3546 55 101 789 258 288 3367 53 99 789 260 284 3758 52 94 737 261 296 3099 50 94 764 263 299 31610 48 92 752 265 294 37211 47 90 743 267 295 37812 45 88 744 268 302 31713 44 73 743 270 304 31214 42 93 850 272 306 31615 41 91 850 274 308 31616 40 89 850 275 311 251


References

1. White, R., Crane, K., Forsyth, D.: Capturing and animating occluded cloth. ACM Trans. Graph. 26, 34 (2007)

2. Bai, X., Liu, W., Wang, X., Latecki, L.J., Tu, Z.: Active skeleton for non-rigid object detection. In: IEEE Int. Conf. Computer Vision, pp. 575–582. Kyoto, Japan (2009)

3. Park, S.I., Hodgins, J.K.: Capturing and animating skin deformation in human motion. ACM Trans. Graph. 25(3), 881–889 (2006)

4. Vlasic, D., et al.: Dynamic shape capture using multi-view photometric stereo. ACM Trans. Graph. 28(5), 174:1–174:11 (2009)

5. Drouin, S., Hebert, P., Parizeau, M.: Incremental discovery of object parts in video sequences. Comput. Vision Image Underst. 110, 60–74 (2008)

6. Scholz, V., Stich, T., Magnor, M., Keckeisen, M., Wacker, M.: Garment motion capture using color-coded patterns. CGF 24(3), 439–448 (2005)

7. Bradley, D., Heidrich, W., Popa, T., Sheffer, A.: High resolution passive facial performance capture. Proceedings of ACM SIGGRAPH (Los Angeles, USA, July 25–29, 2010). ACM Trans. Graph. 29(4), 41:1–41:10 (2010)

8. de Aguiar, E., et al.: Performance capture from sparse multi-view video. ACM Trans. Graph. 27(3), 98–108 (2008)

9. Cerezo, E., et al.: Real-time facial expression recognition for natural interaction. In: Iberian Conf. Pattern Recognition and Image Analysis, pp. 40–47. Girona, Spain (2007)

10. Petit, B., et al.: Multicamera real-time 3D modeling for telepresence and remote collaboration. Int. J. Digit. Multimed. Broadcast. 2010, 24108–12 (2009)

11. Lo, B., Chung, A., Stoyanov, D., Mylonas, G., Yang, G.Z.: Real-time intra-operative 3D tissue deformation recovery. In: IEEE Int. Symp. Biomedical Imaging, pp. 1387–1390. Paris, France (2008)

12. Richa, R., Bo, A.P., Poignet, P.: Towards robust 3D visual tracking for motion compensation in beating heart surgery. Med. Image Anal. 15(3), 301–315 (2011)

13. Sharf, A., et al.: Space-time surface reconstruction using incompressible flow. ACM Trans. Graph. 27(5), 110–120 (2008)

14. Chen, S., Li, Y., Ming Kwok, N.: Active vision in robotic systems: a survey of recent developments. Int. J. Robot. Res. 30(11), 1343–1377 (2011)

15. Chellappa, R., Roy-Chowdhury, A.K., Zhou, S.K.: Human Activity Recognition. Morgan & Claypool Publishing, San Rafael, CA (2005)

16. Abrams, S., Allen, P.K., Tarabanis, K.A.: Dynamic sensor planning. In: IEEE Int. Conf. Robotics and Automation, pp. 605–610. Atlanta, GA (1993)

17. Miao, Y.-Q., Khamis, A., Kamel, M.: Coordinated motion control of mobile sensors in surveillance systems. In: Int. Conf. Signals, Circuits and Systems, pp. 1–6. Djerba, Tunisia (2009)

18. Caglioti, V.: A unified criterion for minimum uncertainty sensing in object recognition and localization. In: IEEE Int. Conf. on Intelligent Robots and Systems, pp. 1585–1590. Raleigh, NC (1992)

19. Beß, R., Paulus, D., Niemann, H.: 3D recovery using calibrated active cameras. In: Int. Conf. Image Processing, pp. 855–858. Lausanne, Switzerland (1996)

20. Farid, H., Lee, S.W., Bajcsy, R.: View selection strategies for multi-view, wide-baseline stereo. Univ. of Pennsylvania, Dept. of Comput. Inform. Sci., Philadelphia, PA, Tech. Rep. MS-CIS-94-18 (1994)

21. Olague, G., Mohr, R.: Optimal camera placement for accurate reconstruction. Pattern Recog. 35(4), 927–944 (2002)

22. Klarquist, W.N., Bovik, A.C.: FOVEA: a foveated vergent active stereo vision system for dynamic three-dimensional scene recovery. IEEE Trans. Robot. Autom. 14(5), 755–770 (1998)

23. Wu, X., Matsuyama, T.: Real-time active 3D shape reconstruction for 3D video. In: 3rd Int. Symp. Image Signal Processing Analysis, pp. 186–191. Rome, Italy (2003)

24. Tan, J.K., Ishikawa, S., Yamaguchi, I., Naito, T., Yokota, M.: 3-D recovery of human motion by mobile stereo cameras. Artif. Life Robot. 10, 64–68 (2006)

25. Mackay, M., Fenton, R., Benhabib, B.: Time-varying-geometry object surveillance using a multi-camera active-vision system. Int. J. Smart Sensing Intell. Syst. 1(3), 679–704 (2008)

26. Yous, S., Ukita, N., Masatsugu, K.: An assignment scheme to control multiple pan/tilt cameras for 3D video. J. Multimedia 2(1), 10–19 (2007)

27. Ilie, A., Welch, G., Macenko, M.: A stochastic quality metric for optimal control of active camera network configurations for 3D computer vision tasks. In: Workshop Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications. Marseille, France (2008)

28. Collins, R.T., Amidi, O., Kanade, T.: An active camera system for acquiring multi-view video. In: Conf. Image Processing, pp. 517–520. Rochester, NY (2002)

29. Davis, J.E.: Mixed scale motion recovery. Ph.D. Dissertation, Dept. Computer Science, Stanford Univ., Stanford, CA (2002)

30. Doshi, A., Starck, J., Hilton, A.: An empirical study of non-rigid surface feature matching of human from 3D video. J. Virtual Real. Broadcast. 7(3), 1–20 (2010)

31. Canton-Ferrer, C., Casas, J.R., Tekalp, M., Pardas, M.: Projective Kalman filter: multiocular tracking of 3D locations towards scene understanding. Lecture Notes Comput. Sci. 3869, 250–261 (2006)

32. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press (2004)

33. Bolme, D.S., Beveridge, J.R., Draper, B.A.: Visual object tracking using adaptive correlation filters. In: Computer Vision and Pattern Recognition, pp. 2544–2550. San Francisco, CA (2010)

34. Kalman, R.E.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45 (1960)

35. Kokkinos, I., Yuille, A.: Unsupervised learning of object deformation models. In: IEEE Int. Conf. Computer Vision, pp. 1–8. Rio de Janeiro, Brazil (2007)

36. Gallup, D., Frahm, J.-M., Mordohai, P., Pollefeys, M.: Variable baseline/resolution stereo. In: Proc. Computer Vision and Pattern Recognition, pp. 1–8. Anchorage, AK (2008)

37. de Aguiar, E., Theobalt, C., Stoll, C.: Marker-less deformable mesh tracking for human shape and motion capture. In: Proc. Computer Vision and Pattern Recognition, pp. 1–8. Minneapolis, MN (2007)

38. Vo, M., Wang, Z., Luu, L., Ma, J.: Advanced geometric camera calibration for machine vision. Opt. Eng. 50(11), 110503 (2011)

39. Veres, G.V., Gordon, L., Carter, J.N., Nixon, M.S.: What image information is important in silhouette-based gait recognition? In: Proc. Computer Vision and Pattern Recognition, pp. 776–782. Washington, D.C. (2004)