Capture of Arm-Muscle Deformations using a Depth-Camera
Nadia Robertini
University of Saarland
Dr. Kiran Varanasi
Max-Planck-Institut Informatik, Saarbruecken, Germany
Prof. Dr. Christian Theobalt
Max-Planck-Institut Informatik, Saarbruecken, Germany
ABSTRACT
Modeling realistic skin deformations due to underlying muscle bulging has a wide range of applications in medicine, entertainment and art. Current acquisition systems based on dense markers and multiple synchronized cameras are able to record and reproduce fine-scale skin deformations with sufficient quality. However, the complexity and the high cost of these systems severely limit their applicability. In this paper, we propose a method for reconstructing fine-scale arm muscle deformations using the Kinect depth camera. The captured data from the depth camera has no temporal contiguity and suffers from noise and sensor artifacts, and is thus unsuitable by itself for potential applications in visual media production or biomechanics. We process noisy depth input to obtain spatio-temporally consistent 3D mesh reconstructions showing fine-scale muscle bulges over time. Our main contribution is the incorporation of statistical deformation priors into the spatio-temporal mesh registration process. We obtain these priors from a previous dataset of a limited number of physiologically different actors captured using a high-fidelity acquisition setup, and these priors help provide a better initialization for the ultimate non-rigid surface refinement that models deformations beyond the range of the previous dataset. Thus, our method is an easily scalable framework for bootstrapping the statistical muscle deformation model, by extending the set of subjects through a Kinect-based acquisition process. We validate our spatio-temporal surface registration method on several arm movements performed by people of different body shapes.
Currently at Technicolor Research & Innovation, Rennes, France
1. INTRODUCTION
Reconstructing high-quality muscle deformations in a non-intrusive manner is a key problem in the areas of entertainment, human biomechanics and human-centered design. Depth cameras like the Microsoft Kinect, which recently appeared on the consumer market, provide a relatively cheap and easy mechanism to capture 3D images. However, the captured depth images have significant artifacts due to sensor noise, occlusions and the lack of temporal contiguity in capture. As such, they are unusable for researchers in biomechanics or human-computer interfaces who want to build accurate user-specific models of muscle deformations. In the current paper, we propose a method for reconstructing high-quality and temporally aligned 3D meshes of the human shoulder-arm region from depth images captured by the Kinect camera. Our method thus bridges an important gap and enlarges the research scope for many areas concerned with modeling muscle deformations, allowing them to capitalize on cheap consumer hardware. For example, realistic virtual humans and their muscle movements can be modeled for visual media production in a cost-effective manner, through the use of cheap acquisition systems and fewer hours of manual work by artists. Sports scientists and medical practitioners can observe the physiological action of muscles on a day-to-day basis and provide personalized advice to athletes and patients without the use of expensive and intrusive sensors.
At present, modeling realistic muscle deformations of virtual humans remains a highly labour-intensive task. Commercial systems use specialized kinematic rigs for virtual characters, which have hundreds of control parameters to derive localized bulging effects on a fine scale by approximating them with a set of bones. As an alternative, bio-mechanically based simulation of human anatomy and physics-based muscle deformation can be performed. However, this remains computationally very expensive and such rigs are hard to control and adapt to new characters. Thus, data-driven simulation methods have been developed in order to overcome some of these limitations. Based on a training set of artist-given deformation examples or 3D scans acquired directly from the real world, an artistic interface can be developed that is simple to use, but which reproduces complex muscle deformation behavior as visible in the training set. These data-driven simulation methods bridge an important gap in the artistic production pipeline. However, acquiring
Figure 1: Overview of our capture pipeline: depth measurements are first filtered and then used for an initial surface reconstruction. A statistical deformation model based on two different datasets is built and then used to clean the initial reconstruction within the space of learned deformations. A last refinement step is used to capture fine-scale details not captured by previous shapes in the database.
the training set of spatio-temporally aligned muscle deformation examples remains a challenging problem. In order to acquire fine-scale deformation, a large number of markers have to be placed on the human body and tracked using expensive imaging systems. The complexity of this acquisition process raises a high barrier that keeps novice artists and practitioners from taking advantage of the research advances in data-driven muscle simulation. Furthermore, people exhibit an enormous statistical variation in muscle deformations with respect to body pose. The data-driven simulation methods are, by their very design, restricted in their modeling ability to the limited set of human subjects captured in the training data. Unless the acquisition process becomes cheap and simple to use, it is difficult to capture a substantially large set of people and model the statistical variation in their muscle deformations. In this paper, we make a contribution in this regard by proposing a novel acquisition method based on the Kinect depth camera.
The Kinect depth sensor has been deployed with great success for various tasks by researchers in robotics, computer vision and human-computer interaction. However, most of these tasks have been restricted to reconstructing a static world or recovering the motion dynamics only at a coarse scale. Modeling fine-scale non-rigid surface deformation using a consumer-grade depth sensor like the Kinect remains an immense challenge. The noise artifacts that occur in the depth image make the simultaneous recovery of accurate 3D geometry and motion severely under-constrained. These artifacts can arise from limitations in the imaging process, from the limited resolution of the capture sensor, and from surface occlusions that naturally occur during motion. In this paper, we propose a method to process these noisy depth images and reconstruct high-quality spatio-temporally aligned 3D meshes. Our main observation is that a previously acquired dataset of muscle deformations (from a set of 10 subjects captured with a multi-camera acquisition system, kindly made available by ) provides useful priors for initializing the 3D registration, and ultimately for reconstructing fine-scale 3D surface deformations beyond the previous dataset. We make the following key contributions to push the research agenda in this field.
1. We provide a method for filtering three-dimensional correspondence estimation in the noisy depth map input, by using geometric priors of deformation and statistical priors learned from a capture dataset.
2. We provide a framework for extending the generalization scope of a statistical model of deformation, by capturing more people than are present in the initial dataset.
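To make the role of a statistical prior in the first contribution more concrete, the following is a minimal sketch, not the method of this paper. It assumes that each captured shape is flattened into a vector of per-vertex displacements, learns a single principal deformation axis from a few training vectors by power iteration (a stand-in for a full PCA model), and cleans a noisy observation by projecting it onto the learned subspace. All function names and the toy data are hypothetical.

```python
import math


def mean_vector(samples):
    """Component-wise mean of a list of equally long vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def first_principal_axis(samples, iterations=100):
    """Return (mean, unit axis) of the dominant deformation direction.

    Uses power iteration on the (unnormalised) covariance of the centred
    samples. Assumption: one component suffices for this toy example; a
    real model would keep several components.
    """
    mu = mean_vector(samples)
    centred = [[x - m for x, m in zip(s, mu)] for s in samples]
    dim = len(mu)
    axis = [1.0] * dim  # arbitrary start vector (hypothetical choice)
    for _ in range(iterations):
        # Covariance-vector product without forming the matrix:
        # C v = sum_i (x_i . v) x_i
        proj = [sum(a * b for a, b in zip(row, axis)) for row in centred]
        new_axis = [sum(p * row[j] for p, row in zip(proj, centred))
                    for j in range(dim)]
        norm = math.sqrt(sum(a * a for a in new_axis))
        if norm == 0.0:
            break  # degenerate data: no variation to learn
        axis = [a / norm for a in new_axis]
    return mu, axis


def project_onto_prior(observation, mu, axis):
    """Keep only the part of the observation explained by the learned axis."""
    coeff = sum((o - m) * a for o, m, a in zip(observation, mu, axis))
    return [m + coeff * a for m, a in zip(mu, axis)]


# Toy training set: displacements that vary only along the direction (1, 2, 0).
training = [[-1.0, -2.0, 0.0], [0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [2.0, 4.0, 0.0]]
mu, axis = first_principal_axis(training)

# Noisy observation: the third component is noise the prior has never seen.
noisy = [1.0, 2.0, 0.7]
cleaned = project_onto_prior(noisy, mu, axis)
```

In this toy setting the projection removes the component that the training data never exhibits. By construction, such a projection cannot represent deformations outside the learned space, which is why a subsequent refinement step is needed to recover details beyond the training set.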
2. RELATED WORK
Modeling fine-scale muscle deformations has long been an active research topic in computer graphics. We refer the readers to the related work section in  for an elaborate review of muscle deformation models: skinning approaches, physiologically-based simulation models and data-driven simulation models. In the following, we review only certain important related works in data-driven modeling of muscle deformations.
In 2006, Park and Hodgins developed and demonstrated an acquisition system for capturing fine-scale muscle deformations during high-speed motions on several actors. They used a very large set of reflective markers (350, against the 40-60 previously used) placed on muscular and fleshy parts of the body. They first captured the rigid body motion of the markers and then used the residual deformations to deform a hand-designed subject-specific model. However, the marker application time and acquisition complexity were extremely high. This inhibits the acquisition of a larger number of subjects, which could contribute more muscle and skin motion data. Another limitation of their system is that the captured dynamic motion cannot be generalized to different body types. In their later work in 2008, they presented a data-driven technique for synthesizing skin deformation from skeletal motion. Using the same input data as in the previous work, they built up a database of deformation data separately parametrized by pose and acceleration. Afterward, they learned pose-specific and acceleration-specific deformations, respectively, using Principal Component Analysis (PCA) and built a statistical model. Because of the complexity of the acquisition step, they filled the database with a large number of poses from a single subject, causing the statistical model again to be highly shape dependent. They did, however, introduce the possibility of generating novel motions for subjects with body shapes similar to the one contained in their database. Using a similar acquisition system and pipeline, a later work presented by Hong