A Multi-Camera Active-Vision System for Deformable-Object-Motion Capture


    J Intell Robot Syst
    DOI 10.1007/s10846-013-9961-0


    David S. Schacter · Mario Donnici · Evgeny Nuger · Matthew Mackay · Beno Benhabib

    Received: 12 September 2013 / Accepted: 14 September 2013
    © Springer Science+Business Media Dordrecht 2013

    Abstract A novel methodology is proposed to select the on-line, near-optimal positions and orientations of a set of dynamic cameras, for a reconfigurable multi-camera active-vision system to capture the motion of a deformable object. The active-vision system accounts for the deformation of the object-of-interest by fusing tracked vertices on its surface with those triangulated from features detected in each camera's view, in order to predict the shape of the object at subsequent demand instants. It then selects a system configuration that minimizes error in the recovered position of each of these features. The tangible benefits of using a reconfigurable system, particularly with translational cameras, versus systems with static cameras in a fixed configuration, are demonstrated through simulations and experiments in both obstacle-free and obstacle-laden environments.

    D. S. Schacter · E. Nuger · M. Mackay · B. Benhabib (✉)
    Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
    e-mail: benhabib@mie.utoronto.ca

    D. S. Schacter
    e-mail: david.schacter@utoronto.ca

    M. Donnici
    Department of Mechanical and Industrial Engineering, University of Calabria, Rende, Italy
    e-mail: mario.donnici@gmail.com

    Keywords Active vision · Camera reconfiguration · Deformable objects · Motion capture · Robot vision systems · View planning

    1 Introduction

    The motion capture of deformable objects (those whose shape's variance is distributed relatively uniformly over their surfaces) is a complex research problem, owing to the inherent difficulties in recovering dynamic surface forms in noisy environments [1–5]. Often, these methods rely on breaking the object's surface into a mesh of vertices whose motion is then tracked [1, 6], as will be the focus herein. Nevertheless, deformable-object-motion capture has been attempted in a variety of applications, such as performance capture [7, 8], expression recognition [9], tele-immersion [10], and soft-tissue deformation tracking [11, 12]. However, for many proposed systems, the underlying assumption has been that the positions and orientations (poses) of the cameras are fixed in space, and clear views of the object-of-interest (OoI) are available. These constraints limit the applicability of such systems, especially in the presence of obstacles occluding views of the deformable OoI, and in the event of self-occlusion [13].

    In order to address the abovementioned challenges, researchers have suggested the use of active-vision systems, wherein cameras are reconfigured in response to changes in the environment [14–17]. Namely, the poses of the cameras may be varied dynamically in order to maximize the number of un-occluded features, and/or to minimize the error in recovering the positions of these features [18].
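    As a minimal illustration of the first criterion above (maximizing the number of un-occluded features), the sketch below scores candidate camera positions by counting the features that have a clear line of sight. It is not the method of any cited work or of the system proposed herein: the spherical obstacle model, the treatment of features as points, and the names segment_hits_sphere, count_visible, and best_camera_position are all assumptions introduced for illustration only.

```python
import math


def segment_hits_sphere(p, q, center, radius):
    """Return True if the segment p -> q passes within `radius` of `center`.

    Obstacles are modelled as spheres (an illustrative assumption).
    """
    d = [qi - pi for pi, qi in zip(p, q)]
    f = [ci - pi for pi, ci in zip(p, center)]
    dd = sum(x * x for x in d)
    # Parameter (clamped to [0, 1]) of the segment point closest to the centre.
    t = 0.0 if dd == 0.0 else max(0.0, min(1.0, sum(a * b for a, b in zip(d, f)) / dd))
    closest = [pi + t * di for pi, di in zip(p, d)]
    return math.dist(closest, center) < radius


def count_visible(camera, features, obstacles):
    """Count features whose line of sight to `camera` is clear of all obstacles."""
    return sum(
        1
        for feat in features
        if not any(segment_hits_sphere(camera, feat, c, r) for c, r in obstacles)
    )


def best_camera_position(candidates, features, obstacles):
    """Pick the candidate position that sees the most un-occluded features."""
    return max(candidates, key=lambda cam: count_visible(cam, features, obstacles))


# Hypothetical scene: three surface features and one spherical obstacle.
features = [(0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (-0.2, 0.0, 0.0)]
obstacles = [((0.0, 2.0, 0.0), 0.5)]
candidates = [(0.0, 4.0, 0.0), (4.0, 1.0, 0.0)]
chosen = best_camera_position(candidates, features, obstacles)
```

    In this hypothetical scene, the first candidate lies directly behind the obstacle and sees none of the features, so the second candidate is chosen. A practical system would also need to account for field of view, self-occlusion, and the error criterion of [18], none of which is modelled here.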

    Recovering a dynamic surface form is a challenging problem, and methods that have been developed for the on-line selection of optimal camera poses to improve an OoI's 3D-shape recovery are, typically, limited to cases where the object is rigid (e.g., [19–22]). When dealing with deformable objects, these methods suffer from a number of shortcomings. First, by assuming a unique and static representation of the OoI, they are unable to recognize or track the OoI if its shape or appearance changes. Second, they assume that the entire OoI's shape can be known given the position of a small number of reference points on it, due to a rigid shape known a priori. Accordingly, they do not necessarily ensure that all desired features of interest are observable by the cameras at all times, thereby allowing unacceptable errors in the recovered OoI's shape. Third, when faced with self-occlusions, they do not reconfigure to views which avoid increasing the errors in the occluded features' estimated positions.

    A number of research teams have been focusing on overcoming these shortcomings. The active multi-camera system proposed in [23], for example, was designed for volumetric reconstruction of a deformable object for 3D video. This system reconstructs the OoI's shape by calculating the volume intersection of silhouettes projected from the cameras' image planes onto multiple parallel planes, although a control scheme to select camera viewpoints is not mentioned.

    In [24], for a given image sequence, factorization is performed on detected feature points in order to determine the relative camera poses. Although this technique was designed for 3D deformable-object-motion capture, it does not prescribe explicitly how the camera poses should be chosen optimally. The active multi-camera system proposed in [25] does address the issue of optimal camera viewpoint selection, even in the presence of occluding obstacles. This system uses an agent-based approach to maximize a visibility criterion over bounding ellipsoids encircling the object's articulated joints. However, as one notes, only recognition of articulated (multi-rigid-link) objects is addressed.

    The camera-assignment method discussed in [26] maximizes the visibility of an unknown deformable OoI. The proposed windowing scheme controls the orientation of pan/tilt cameras, although no provision is made for future deformation prediction. A stochastic quality metric was proposed in [27] for the optimal control of an active-camera network for deformable-object tracking and reconstruction; however, as in [20], no configuration-management component or deformation prediction was presented. An active pan-tilt-zoom camera system that does use a tracking algorithm to keep a moving person centered in each camera's view was proposed in [28]. This system, however, does not explicitly optimize for each target's visibility, allowing the possibility for occlusion.

    From the abovementioned works, one may conclude that none of the existing systems adjusts cameras' positions to better observe deforming objects. Allowing cameras in an active multi-camera network to translate may be particularly beneficial in two cases. First, in mixed-scale environments (where the scale of the OoI's deformations is significantly smaller than the scale of the OoI's motion within a workspace, which is usually the case with mobile deformable objects), a system must simultaneously capture the deformation, which requires high-resolution imagery suitable for resolving small details, and increase the working volume in which motion capture is possible [29]. Translation allows even a limited number of cameras with limited zoom to do just that: to approach an OoI and recover its fine deformations, while allowing a wide region in a workspace to be covered as needed. Second, in cluttered environments, a system must be able to capture the motion of an OoI even when it passes behind an obstacle. In these cases, translation allows a system with even a limited number of cameras to recover the OoI's deformation by repositioning them to viewpoints from which the OoI is un-occluded by obstacles.
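    The mixed-scale argument can be made concrete with a short worked example. The sketch below assumes an ideal pinhole camera whose focal length is expressed in pixels, and a surface patch roughly parallel to the image plane; under those assumptions, one pixel spans approximately depth / focal length in object space. The function names and the numerical values are hypothetical and are not taken from the paper.

```python
def metres_per_pixel(depth_m, focal_px):
    """Approximate object-space length covered by one pixel.

    Assumes an ideal pinhole camera and a fronto-parallel surface patch.
    """
    return depth_m / focal_px


def max_depth_for_detail(detail_m, focal_px, min_pixels):
    """Largest camera-to-object distance at which a detail of size
    `detail_m` still spans at least `min_pixels` pixels."""
    return focal_px * detail_m / min_pixels


# Hypothetical camera with a 1000-pixel focal length.
far_resolution = metres_per_pixel(5.0, 1000.0)   # camera 5 m from the OoI
near_resolution = metres_per_pixel(1.0, 1000.0)  # camera 1 m from the OoI
limit = max_depth_for_detail(0.002, 1000.0, 2.0)  # 2 mm detail over 2 pixels
```

    With these assumed values, a camera at 5 m resolves about 5 mm per pixel, whereas the same camera at 1 m resolves about 1 mm per pixel, and a 2 mm surface detail is resolvable over two pixels only within roughly 1 m. A camera with fixed optics can therefore meet both the resolution and the working-volume requirement only if it is allowed to move toward the OoI.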

    Herein, a reconfigurable active-vision system that minimizes the error in capturing the motion of a known deformable object with detectable features is proposed. Such a system would be beneficial in improving the effectiveness of methods such as those presented in [1, 6], which capture the motion of clothing with a known color pattern, and could be used to capture the motion of the flowing dress of a dancer moving around a cluttered set. This system is novel in that it accounts for the deformation of the OoI in an on-line manner, and allows for pan/tilt as well as translation of the cameras. Moreover, an algorithm is provided for the efficient determination of near-optimal camera poses. Simulations demonstrate the system's ability to reduce the error in deformation recovery and account for error in deformation prediction. Additionally, experiments empirically validate the effectiveness of the proposed implementation with application to motion capture, and show the benefit of allowing reconfigurable cameras to translate.

    This paper is, thus, organized as follows. The notation used throughout the paper is provided in Section 2, and the problem addressed herein is, then, defined in Section 3, in terms of the objective to be achieved, and the underlying tasks required to achieve it. Section 4 provides the description of, and theory behind, the proposed system for solving this problem. This system is evaluated in three distinct simulations in Section 5, and in two experimental scenarios in Section 6. Both Sections 5 and 6 provide their respective description of the test set-up, procedure, results, and discussion. Section 7 concludes the paper.

    2 Notation

    T          Set of all DIs (demand instants)
    J          Total number of DIs
    t_j        jth DI
    j          Index of current DI
    V_j        Set of all vertex positions at t_j (set of [3 × 1])
    n          Number of vertices
    x_{i,j}    Vertex i's position in world coordinates [3 × 1]
    x̂_{i,j}    Estimate of Vertex i's position in world coordinates at t_j [3 × 1]
    C          Set of all cameras
    c_k        Camera k
    R_{c,j}    Pose of camera c at t_j
    E          Average error metric over all DIs
    E_j        Error metric at t_j
    m          Number of degrees of freedom for each camera
    ?          Index of a camera's degree of freedom
    ?_{i,j}    Ellipsoidal uncertainty region around Vertex i at t_j
    W_{i,j}    Scaled covariance matrix of Vertex i's predicted position at t_j [3 × 3]
    ?          Number of variances defined by user to be enclosed in uncertainty ellipse
    ?_{ijc}    Projection of uncertainty ellipsoid into camera c's pixel coordinates
    v_c        Arbitrary pixel in camera c's pixel coordinates [2 × 1]
    P_c        Mapping of world coordinates to pixel coordinates for