
Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Linköping University (Linköpings universitet)
SE-601 74 Norrköping, Sweden

LiU-ITN-TEK-A--10/068--SE

Magic mirror using motion capture in an exhibition environment

Daniel Eriksson
Thom Persson

2010-11-18


LiU-ITN-TEK-A--10/068--SE

Magic mirror using motion capture in an exhibition environment

Master's thesis in media technology (examensarbete utfört i medieteknik)
at the Institute of Technology, Linköping University

Daniel Eriksson
Thom Persson

Supervisor: Thomas Rydell
Examiner: Stefan Gustavson

Norrköping, 2010-11-18


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Daniel Eriksson, Thom Persson


Magic mirror using motion capture in an exhibition environment

Daniel Eriksson Thom Persson

November 30, 2010


Abstract

Motion capture is a commonly used technique in the movie and computer game industries for recording animation data. The systems used in these industries are expensive high-end systems that often use markers on the actor together with several cameras. Reasonable results can, however, be achieved with no markers and a single webcam. In this report we take a look at such a system and then use it together with our own animation software. The final product will be placed in an exhibition environment, which restricts the level of user interaction that is practical.


Contents

List of Figures
List of Algorithms
List of Abbreviations

1 Introduction
   1.1 Interactive Institute
   1.2 Visualization Center C
   1.3 Purpose
   1.4 Our task

2 Motion capture
   2.1 Optical systems
       2.1.1 Passive markers
       2.1.2 Active markers
       2.1.3 Markerless
   2.2 Magnetic systems
   2.3 Mechanical systems
   2.4 Body motion capture
   2.5 Facial motion capture
       2.5.1 Feature tracking
       2.5.2 Active appearance models
       2.5.3 visage|SDK
       2.5.4 faceAPI

3 Animation
   3.1 Blend shapes
   3.2 Skeletal animation

4 Blink detection
   4.1 Normal flow
   4.2 State machine

5 Implementation
   5.1 Blend shapes
   5.2 Skinning
   5.3 Filtering
   5.4 Matching landmarks
       5.4.1 Eyebrows
       5.4.2 Mouth
       5.4.3 Jaw movement
   5.5 Blink detection
   5.6 Key framing
       5.6.1 Animation
       5.6.2 Editor
   5.7 Eyes
   5.8 Upper body movement
   5.9 Noise
   5.10 Lighting
   5.11 SSAO

6 Results
   6.1 Performance
   6.2 SSAO
   6.3 Usage Example
   6.4 The installation

7 Conclusions
   7.1 OpenCV
   7.2 Blink detection
   7.3 Tracking and animation separation
   7.4 Future work
       7.4.1 Gaze tracking

A faceAPI landmark standard

B User manual
   B.1 Command line arguments
   B.2 Available Shapes
   B.3 File formats
       B.3.1 Config
       B.3.2 Models
       B.3.3 Animation

Bibliography


List of Figures

1.1 An early concept image of the installation.

2.1 Optical motion capture suit with markers, from MoCap for Artists: Workflow and Techniques for Motion Capture [4].
2.2 Example of detected facial features, from A robust facial feature tracking system [13].
2.3 Screen capture of an example video showcasing visage|SDK from Visage Technologies.
2.4 Part of a screen capture of our final application visualizing the tracked features from faceAPI.

3.1 Example of blend shapes, from the book GPU Gems 3 [30].

4.1 The normal flow showing an opening sequence between two eye frames.
4.2 The visualization used by Divjak and Bischof [31] for their state machine.

5.1 Illustrated bone weights.
5.2 Illustrated bone placement.
5.3 Key frame editor.

6.1 Comparison between SSAO on (left) and SSAO off (right).
6.2 Usage examples.
6.3 Usage examples.
6.4 The installation seen from the front.
6.5 The installation seen from the back.

A.1 faceAPI landmark standard.


List of Algorithms

1 General pseudocode
2 Eyebrow pseudocode
3 Open mouth pseudocode
4 Smile pseudocode
5 Blink detection pseudocode


List of Abbreviations

AAM   Active Appearance Model
API   Application Programming Interface
CGI   Computer Generated Imagery
CPU   Central Processing Unit
FIR   Finite Impulse Response
GLSL  OpenGL Shading Language
GPU   Graphics Processing Unit
GUI   Graphical User Interface
IIR   Infinite Impulse Response
LED   Light-Emitting Diode
MIT   Massachusetts Institute of Technology
PCA   Principal Component Analysis
SDK   Software Development Kit
SSAO  Screen Space Ambient Occlusion
VR    Virtual Reality


Chapter 1

Introduction

This Master's thesis is carried out at the Interactive Institute in Norrköping, with the goal of being part of an exhibition called "To show what cannot be seen" at Norrköping Visualization Center C.

1.1 Interactive Institute

The Interactive Institute1 is an experimental media research institute with expertise in art, design and technology. They conduct research in these areas and also provide strategic advice to corporations and public organizations.

The Interactive Institute is organized into subgroups that are located around the country, each one with a slightly different focus. The group this Master's thesis was developed at is called C-Studio and is also a part of the Norrköping Visualization Center C.

1.2 Visualization Center C

Norrköping Visualization Center C is a cooperation between Norrköping Kommun, Linköping University, Norrköping Science Park and The Interactive Institute.

Besides the permanent exhibition mentioned above, the center also contains a temporary exhibition, a restaurant and café, office space for the various branches of C, conference rooms, a VR theater and a dome theater.

1.3 Purpose

The purpose of this Master's thesis was to develop an interactive installation that in a playful manner demonstrates how motion capture techniques work and are used in the game and movie industries.

1 http://www.tii.se


1.4 Our task

The task was twofold. The first task was to develop, from scratch or based on existing software, a robust system for tracking a user's movements. The requirements for this system were:

• Real time, or as close to real time as possible.

• Camera based: usable with a single off-the-shelf webcam.

• No user input other than the camera.

• Can handle large head rotations, up to 45 degrees.

• Can handle occlusion.

• Does not utilize markers or other equipment.

The second task was to develop, from scratch or based on existing software, an application that utilizes the tracking software to render animated characters.

Figure 1.1: An early concept image of the installation.


Chapter 2

Motion capture

Motion capture is a process used in a number of areas, including the military [1], clinical medicine [2] and the entertainment industry [3]. Despite the wide use of motion capture and its different implementations, a fitting definition can be found in Kitagawa and Windsor [4]:

"Motion capture (mocap) is sampling and recording motion of humans, animals, and inanimate objects as 3D data. The data can be used to study motion or to give an illusion of life to 3D computer models."

A common use of motion capture today can be found in the feature film industry, where an actor's movements are recorded using one of the available systems for motion capture and then translated onto a virtual character. While some call it "Satan's Rotoscope" and see it as a threat to the livelihood of animators, others see it as a promising technique which will allow animators to focus on more creative endeavors [5]. Whatever role motion capture will play in the future, its main qualities today are:

• The high speed, making it possible to animate a virtual character much faster than traditional key framing, resulting in an easy way to do multiple takes with different deliveries from the actor. Some systems are so fast that even real time viewing is possible, giving even more control to the director [6].

• The amount of work needed is not as dependent on the complexity, or the length, of the animation as it is with other common animation techniques like key framing.

• Realistic movements and physical interactions, including exchange of forces and secondary motion, are easily recorded.

Although it is a very fast and cost effective technique, the automatic nature of motion capture has a few disadvantages as well:

• Specific hardware, software, and personnel are needed to record and process the data, as well as a designated motion capture area in some cases. The combined cost can be too much for small studios or productions with a low budget.

• The motion capture area may be very limited in volume, severely affecting the possible range of scenes to capture. The system may also be dependent on specific clothing or prohibit the use of metal objects.

• Movements which are not physically possible cannot be captured.

• Artifacts occur when the proportions of the capture subject's limbs differ from those of the computer model.

One of the earliest techniques related to motion capture is rotoscoping, an animation technique invented in 1915 by Max Fleischer [7]. With rotoscoping, animators trace over live-action film, frame by frame, with hand-drawn animation. The primary use of rotoscoping was to help the animators quickly create realistic movement with as little work as possible. Rotoscoping has since been used in many of Disney's animated feature films with human movement, for example "Snow White and the Seven Dwarfs" from 1937, as well as more recent productions like Ralph Bakshi's animated film "The Lord of the Rings" from 1978 [5].

The first successful CGI implementation of motion capture more closely related to today's techniques is the movement of the animated robot "Brilliance". She was produced by Robert Abel and Associates for the National Canned Food Information Council for a commercial aired during the 1985 Super Bowl. Her movements were animated by painting a total of 18 black dots on a live model's joints and photographing the model from multiple angles. The photographs were then processed in the computer to generate the information needed to animate the robot.

Kitagawa and Windsor [4] divide today's motion capture techniques into five main areas: optical, magnetic, mechanical, ultrasonic and inertial systems. The last two are not discussed due to their rare use in the entertainment business.

2.1 Optical systems

Optical systems rely on the use of multiple cameras with overlapping capture areas to gather motion capture data. The cameras are used together with up to hundreds of reflective or light-emitting markers attached to the capture subject's joints, or in some recent systems, with no markers at all. At least two cameras need to see a marker in order to triangulate its 3D position, although three or more are preferable for accuracy. Real time viewing is possible, although limited to less accurate motions due to a needed post-processing step where rotational information is calculated for the markers. Problems with the technique are mostly related to occlusion from the capture target or props, as well as a limited capture area requiring a very controlled lighting environment [4].


2.1.1 Passive markers

Passive markers usually have a spherical or circular shape and are coated with a reflective material, often put directly on the skin of the actor or on a slim body suit. Using a light source directed from each camera, the markers reflect the light and appear very bright. The images from the camera are thresholded so only the markers are seen.

2.1.2 Active markers

Active markers are made of light-emitting diodes (LEDs) instead of reflecting external light. This requires the capture target to wear electrical equipment but helps significantly with the identification of each marker. This is achieved either by lighting one marker at a time at the cost of frame rate, or by using a unique combination of amplitude and frequency for each LED.

Figure 2.1: Optical motion capture suit with markers, from MoCap for Artists: Workflow and Techniques for Motion Capture [4].

2.1.3 Markerless

Research in computer vision has pushed markerless motion capture techniques forward. At MIT [8] and Stanford [2], techniques using no specific clothing or markers have emerged, extracting motion parameters from edges and silhouettes in the image stream. Among commercial systems, the company Mova and their Contour Reality Capture system use phosphorescent makeup to capture the geometry and texture of an actor's face.

2.2 Magnetic systems

Magnetic systems use fewer sensors than optical systems, often as few as 12-20, and with a lower sampling rate. The sensors are attached to the capture target and output both position and orientation by measuring the spatial relationship to a magnetic transmitter. This enables real time viewing without any tedious post-processing, but with the drawback of a smaller capture area. Magnetic systems do not suffer from occlusion or problems with sensor identification, but are prone to magnetic and electrical interference caused by metal objects as well as electrical equipment. The output tends to be a bit noisy, and the mobility of the capture target is somewhat limited by carrying a battery and wiring for the sensors.

2.3 Mechanical systems

Mechanical systems are worn as an exo-skeletal device consisting of hinge joints, straight rods and potentiometers. The system measures the joint angles of the capture subject and is suitable for real time viewing. There is no occlusion, no interference, and no capture area is needed due to the system's high portability. The problems are mostly related to the bad mobility caused by the rigid exo-skeleton's tendency to break, as well as the limited range of motion from the hinge joints. Mechanical systems also rely on accelerometers to measure global translation like walking or jumping, often resulting in the data staying at the same spot on the ground or in a sliding motion.

2.4 Body motion capture

The task from The Interactive Institute demands that a single webcam is used with no markers, severely limiting the available techniques to mono view markerless optical systems. A lot of these systems cannot be considered for real time applications or rely on a too controlled environment [9]. Although promising results can be achieved with a powerful computer and a parallelized implementation, this requires large amounts of training data as well as an initiation by the user [10]. Full body mono view markerless motion capture is overall a very challenging task, especially in an exhibition environment with changing lighting conditions and a crowded background [11]. A more realistic approach, while still moving within the boundaries of the task, is finding a less demanding motion capture target, for example facial motion capture.


2.5 Facial motion capture

Facial motion capture focuses on capturing the head movements as well as the movement of face muscles in order to recreate facial expressions. Although some of the concerns from full body motion capture remain in facial motion capture, it is still a more suitable approach due to the high number of real time implementations available.

2.5.1 Feature tracking

A feature tracker finds and follows facial features from one frame to another. A feature can be any point in a face but is often found on the edge around the mouth, an eye, an eyebrow or any other contrasting part easily found through image processing.

A common first step in locating features to track is to find a face, to limit the search. Some use color analysis to find skin tones in an image and then image processing techniques like thresholding and morphological operations to locate the overall face [12, 13]. Others use a machine learning approach where a system is trained to recognize a face based on classifiers from training data [14]. Both techniques can also be used to locate facial features in the limited region of the face. The thresholding approach, however, often identifies the found features by comparing their geometrical information from the image to a predefined face model. Due to the variety of individual appearances, for example skin color and face proportions, we consider both methods far from ideal. One solution is to manually mark facial features from a neutral expression in an initialization phase. This obviously requires tedious user input and is therefore not an approach suited for our application.

Figure 2.2: Example of detected facial features, from A robust facial feature tracking system [13].

The second step is tracking the identified feature points from one frame to another. It is possible to repeat the previous procedure for each frame, but a less computationally heavy way is to track the feature points using their neighboring pixels in template matching [13, 15]. The third and last step is approximating head rotations. Many feature tracking algorithms skip this step entirely and are therefore prone to error when handling out-of-plane rotations in the input image [14, 16, 17]. Promising results are accomplished using a Kalman filter or the POSIT algorithm together with a simplified 3D face model to predict head movements [15, 13, 18]. The downside is that the technique relies heavily on the similarity between the simplified 3D face model and the geometry of the actual face being captured.

2.5.2 Active appearance models

An interesting approach, not too distant from feature tracking, is active appearance models (AAM) [19]. An AAM consists of two parts: a shape model and an appearance model. The shape model is defined as a simple triangulated 2D mesh of a human face, or more specifically its vertex positions. The shape model is commonly implemented as a shape vector s containing the v vertex positions that make up the mesh.

s = (x_1, y_1, x_2, y_2, \ldots, x_v, y_v)^T \qquad (2.1)

AAMs allow linear shape variation, meaning that the shape s can be expressed as the linear combination of a base shape s_0 and n weighted shape vectors s_i.

s = s_0 + \sum_{i=1}^{n} p_i s_i \qquad (2.2)

The base shape s_0 and the shape vectors s_i are created by manually overlaying the 2D mesh on a series of training images containing faces. Principal Component Analysis (PCA) is then run on the training shapes, creating s_0 from the mean shape and the s_i from the reshaped eigenvectors with the largest eigenvalues [20].

The appearance model is defined as the pixel content within the base shape s_0. The appearance of an AAM is then an image A(u) where u = (u, v) ∈ s_0. Like the shape model, each appearance A(u) can be described as a linear combination of a base appearance A_0(u) and m weighted appearances A_i(u).

A(u) = A_0(u) + \sum_{i=1}^{m} \lambda_i A_i(u) \qquad (2.3)

A_0(u) and A_i(u) are created using PCA on the pixel content within each shape in the training images. First the shapes need to be transformed to the shape s_0 through a piecewise affine warp between the corresponding triangles in each shape and s_0. A_0(u) is the mean image and A_i(u) are the eigenimages with the largest corresponding eigenvalues from the PCA.

Given the shape weights p = (p_1, p_2, \ldots, p_n)^T, the shape model can be calculated using equation 2.2. Likewise, given the appearance weights \lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)^T, the appearance model can be calculated using equation 2.3. The AAM instance can now be created by warping the appearance A from its base shape s_0 to the model shape s.
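To make equation 2.2 concrete, the following is a minimal C++ sketch (with hypothetical type and function names, not code from the thesis) of how a shape instance is assembled from the base shape and the weighted PCA shape vectors:

#include <cstddef>
#include <vector>

// A shape is stored as interleaved 2D coordinates,
// (x1, y1, x2, y2, ..., xv, yv), matching equation 2.1.
using Shape = std::vector<float>;

// s = s0 + sum_i p_i * s_i (equation 2.2): add the weighted PCA shape
// vectors to the base shape.
Shape shapeInstance(const Shape& s0,
                    const std::vector<Shape>& shapeVectors,
                    const std::vector<float>& p)
{
    Shape s = s0;
    for (std::size_t i = 0; i < shapeVectors.size(); ++i)
        for (std::size_t j = 0; j < s.size(); ++j)
            s[j] += p[i] * shapeVectors[i][j];
    return s;
}

The appearance instance is computed analogously per pixel from equation 2.3, followed by the warp from s_0 to s.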

Normally the AAM is fitted to a face in an input image, which means finding the optimal shape and appearance weights minimizing the difference between the input image and the AAM instance. There are a number of ways to solve this problem, depending on the speed and efficiency needed. Among the fast solutions, the inverse compositional alignment [19] appears to outperform previous algorithms in efficiency.

AAMs can be extended to include 3D head pose estimation [21, 22] as well as to handle occlusions [23]. However, the use of training data requires a lot of manual labour, and the technique still performs unsatisfactorily on persons excluded from the training data [24].

2.5.3 visage|SDK

One commercially available feature tracker is visage|SDK from Linköping-based Visage Technologies AB1. Visage Technologies AB offers services and applications involving computer generated virtual characters and computer vision for finding and tracking faces and facial features in images and video. Their real time feature tracker, based on a master's thesis by Nils Ingemars [15], supports out-of-plane head rotations with full 3D head tracking but requires a setup procedure. The setup consists of manually positioning and scaling a 2D projection of the Candide-3 [25] face mesh model over a still image of the capture target at the beginning of the tracking session. Another noticeable issue is the lack of recovery from errors caused by fast head movements or occlusion.

Figure 2.3: Screen capture of an example video showcasing visage|SDK from Visage Technologies.

1 http://www.visagetechnologies.com/


2.5.4 faceAPI

Among the commercially available feature trackers, faceAPI from Seeing Machines2 stands out. Their real time implementation offers a fully automatic and highly robust face tracker with an estimated 3D head position and orientation. The features available for tracking are eyes, eyebrows and lips, which can all be extracted from a number of movie file formats as well as from any webcam. The tracking is robust to occlusions, fast movements, large head rotations and environment lighting, as well as to varying personal traits including facial deformation, skin color, beards and glasses, and it automatically recovers from tracking errors. The tracker is easily integrated, highly configurable and comes with comprehensive documentation. faceAPI fulfills all of our demands and much more, and is therefore the optimal tracker for our final application.

Figure 2.4: Part of a screen capture of our final application visualizing the tracked features from faceAPI.

2 http://www.seeingmachines.com/


Chapter 3

Animation

The second task in our Master's thesis is to develop a system animating a 3D face model with the output feature point coordinates from our tracking software. The animation system is to be separated from the tracking system in order to play a key framed animation when no tracking occurs.

A possible animation system is the muscle-based approach, where facial expressions are created by simulating the movement of the underlying muscle structure of the face [26]. Another approach is to simply move the vertices in the face model according to their feature point counterparts from the tracking data and interpolate the leftover vertices.

The first approach adds a lot of complexity, since we need to decide which muscles are involved based on the movements of the feature points, which is far from trivial. The second approach is quite promising but requires a higher number of feature points, at greater precision than available, in order to create realistic expressions.

What is needed is a technique that can create realistic expressions from a small set of feature points without adding a lot of complexity. The technique also needs to be well known in order for us to find a realistic, textured and animated 3D face model resource, since our 3D modeling experience is not enough for the limited timeframe. A technique fulfilling all these demands is blend shapes.

3.1 Blend shapes

Blend shapes is not a new concept; according to Joshi et al. [27] it can be traced back to Parke's work on facial animation in the early 70's [28, 29]. The technique consists of a model with a neutral expression and a finite set of extreme expressions. From these it is possible to construct a virtually infinite number of expressions by blending the neutral expression with the extremes, weighted in different ways. The use of blend shapes makes it possible to create details which are hard to track but likely to occur together with trackable details, for example the wrinkles above the nose when the inner feature points of the eyebrows move downward and inward to create a frown.

Figure 3.1: Example of blend shapes, from the book GPU Gems 3 [30].

Realistic and textured blend shape resources can be obtained from various places. A popular tool is FaceGen1 from Singular Inversions, a program used a lot in the game industry to create diverse facial blend shape meshes with little effort.

3.2 Skeletal animation

The linear blending of blend shapes makes it a bad solution for motion including rotations. To animate such motion, for example head rotations or jaw movement, virtual bones are placed inside the model. Each vertex is then connected with weights to the bones that can affect that vertex. When the bones move, the model is animated.

1 http://www.facegen.com/
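To make the weighted vertex-bone connection concrete, here is a small C++ sketch of linear blend skinning in the usual formulation v' = Σ_b w_b (M_b v); the types and names are illustrative, not the thesis code:

#include <vector>

struct Vec3 { float x, y, z; };

// Minimal 3x4 bone transform (rotation plus translation).
struct BoneXform {
    float m[3][4];
    Vec3 apply(const Vec3& v) const {
        return { m[0][0]*v.x + m[0][1]*v.y + m[0][2]*v.z + m[0][3],
                 m[1][0]*v.x + m[1][1]*v.y + m[1][2]*v.z + m[1][3],
                 m[2][0]*v.x + m[2][1]*v.y + m[2][2]*v.z + m[2][3] };
    }
};

// Linear blend skinning for one vertex: blend the vertex as transformed
// by every bone, weighted by that bone's influence. 'weights[b]' is the
// weight tying this vertex to bone b; the weights sum to 1.
Vec3 skinVertex(const Vec3& v,
                const std::vector<BoneXform>& bones,
                const std::vector<float>& weights)
{
    Vec3 out = {0.0f, 0.0f, 0.0f};
    for (std::size_t b = 0; b < bones.size(); ++b) {
        Vec3 t = bones[b].apply(v);
        out.x += weights[b] * t.x;
        out.y += weights[b] * t.y;
        out.z += weights[b] * t.z;
    }
    return out;
}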


Chapter 4

Blink detection

Humans blink very often and involuntarily. If our virtual character can match the blinking of the user, it will help to improve the user experience. There is quite a bit of research done on detecting and analyzing blinks and eye movement [31, 32, 33]. When observed over a period of time, this can give insight into the mental status of the subject [33], for example measuring fatigue in an operator of heavy machinery or critical systems.

A common problem in blink detection is the initial localization of the eyes, a problem faceAPI handles for us. Once the eyes are found, the actual blink detection can commence, by matching each eye to a person-specific open-eye template [32] or by searching for motion in the immediate surroundings [31, 33].

The template approach detects blinks by measuring the error between the eye template and an area around the last known eye location. A high error indicates a bad fit, meaning that the eye has changed from open to closed. One problem with the template approach is knowing when the eyes are open for the initial creation of the templates. A bigger problem is the bad matching that naturally occurs when the user turns his head and each eye region is compared to a template seen from a frontal position. This results in false positives, making the technique viable only for stationary heads or head-mounted cameras. An approach more resilient to our head rotations is searching for motion using normal flow.

4.1 Normal flow

The basis of the technique used by Heishman and Duric [33], as well as Divjak and Bischof [31], is to calculate the direction and amplitude of the movement within a frame around the eyes. A common technique for this is optical flow, but it is considered too slow for real time applications by Heishman and Duric, who instead use a similar technique called normal flow. The advantage of normal flow is that it can be computed using only local information. By subtracting the global head movement from the local flow movement around the eyes, a reliable flow direction and magnitude can be found and used to determine the state of the eyes.

Figure 4.1: The normal flow showing an opening sequence between two eye frames. Image from Using Image Flow to Detect Eye Blinks in Color Videos [33].

4.2 State machine

To monitor and update the state of the eyes, Heishman and Duric [33] propose the use of a state machine. In their state machine they have three states: open, opening and closing. Divjak and Bischof [31] extend this and add a fourth state to their state machine: closed. The reason for using a state machine is that normal flow by itself does not tell whether the eye is open or closed, but given the previous state of the eye as well as the information from the flow, it is possible to know what the next state is. A potential problem is finding generic thresholds for the flow direction and magnitude, which together decide the current state.
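As an illustration, a minimal C++ sketch of such a four-state eye state machine; the field names and the way the threshold is applied are placeholders rather than the exact rules from [31, 33]:

enum class EyeState { Open, Opening, Closing, Closed };

struct FlowSample {
    float magnitude;  // mean flow magnitude around the eye
    bool downward;    // true if the dominant flow direction is down
};

// Advance the eye state from one frame's flow measurement. T is the
// flow magnitude threshold separating "motion" from "no motion".
EyeState nextState(EyeState current, const FlowSample& flow, float T)
{
    if (flow.magnitude < T) {
        // No significant motion: a closing eye settles into closed,
        // an opening eye settles into open, otherwise nothing changes.
        if (current == EyeState::Closing) return EyeState::Closed;
        if (current == EyeState::Opening) return EyeState::Open;
        return current;
    }
    // Significant motion: downward flow means the lid is closing,
    // upward flow means it is opening.
    return flow.downward ? EyeState::Closing : EyeState::Opening;
}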


Figure 4.2: The visualization used by Divjak and Bischof [31] for their state machine. mag is the mean flow magnitude, dir is the dominant flow direction and T is the mean flow magnitude threshold.


Chapter 5

Implementation

The application is primarily written in C++. For face tracking faceAPI is used; other APIs used are OpenGL for graphics, OpenCV1 to interact with the camera and for image processing, and GLFW2 and Qt3 for window management and GUI handling. Boost4 is also used, for thread management and for easy handling of command line options. Shaders are written in GLSL. OpenGL is favoured over DirectX and the XNA framework since we are already familiar with it. GLFW is used because of its simplicity, while OpenCV is the obvious option for image processing in C/C++.

1 http://opencv.willowgarage.com/wiki/
2 http://glfw.sf.net
3 http://qt.nokia.com/products/
4 http://www.boost.org/

5.1 Blend shapes

In the blend shape calculations each vertex can be calculated independently of the others, making the work suitable for the vertex shader. Each blend shape consists of the difference between the expression mesh and the neutral expression mesh. The differences for vertices and normals are stored in a texture which, together with the texture width, vertex count and number of shapes, is supplied to the shader in order to calculate the correct row and column for the texture lookups when blending.
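To illustrate the layout, here is a hedged C++ sketch of how the deltas could be packed on the CPU so that the shader's row and column arithmetic finds them; the type and function names are placeholders, not the actual application code:

#include <vector>

struct Vec3 { float x, y, z; };

// Pack per-shape vertex deltas into a textureWidth-wide RGB float
// texture: shape i, vertex id -> linear index = vertexCount * i + id,
// the same indexing the shader uses to recover row and column.
std::vector<float> packDeltas(const std::vector<std::vector<Vec3> >& shapeDeltas,
                              int vertexCount, int textureWidth)
{
    int numShapes = (int)shapeDeltas.size();
    int texelCount = vertexCount * numShapes;
    int height = (texelCount + textureWidth - 1) / textureWidth;
    std::vector<float> texture(textureWidth * height * 3, 0.0f);
    for (int i = 0; i < numShapes; ++i)
        for (int id = 0; id < vertexCount; ++id) {
            int index = vertexCount * i + id;
            texture[index * 3 + 0] = shapeDeltas[i][id].x;
            texture[index * 3 + 1] = shapeDeltas[i][id].y;
            texture[index * 3 + 2] = shapeDeltas[i][id].z;
        }
    return texture; // uploaded as a rectangle float texture
}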

To get the current vertex id, the GL_EXT_gpu_shader4 extension5 is used, enabling the built-in gl_VertexID variable, which just as easily could be supplied as an attribute. The GL_ARB_texture_rectangle extension6 is also used, providing rectangle textures with integer indices instead of the standard 2D textures where the coordinates are normalized to the 0 to 1 range.

5 http://www.opengl.org/registry/specs/EXT/gpu_shader4.txt
6 http://www.opengl.org/registry/specs/ARB/texture_rectangle.txt

The formula used to calculate the resulting vertices and normals for the shapes can be seen in equation 5.1 below, where \vec{d}_k is the difference vector and w_k the weight for shape k.

\vec{v}_{out} = \vec{v}_{base} + \sum_{k=1}^{N} \vec{d}_k w_k \qquad (5.1)

This results in the following shader code:

uniform sampler2DRect vertexDeltas;
uniform sampler2DRect normalDeltas;
uniform sampler2DRect weights;
uniform int vertexCount;
uniform int numOfShapes;
uniform int textureWidth;

...

int id = gl_VertexID;
vec4 vertex = gl_Vertex;
vec3 normal = gl_Normal;

for (int i = 0; i < numOfShapes; i++) {
    // Linear index of this vertex's delta for shape i, unpacked into
    // a row and a column in the delta textures.
    int index = vertexCount * i + id;
    int row = index / textureWidth;
    int col = index % textureWidth;
    vec2 dpos = vec2(col, row);
    vec2 wpos = vec2(i, 1); // weight of shape i in the weights texture

    float curr_weight = texture2DRect(weights, wpos).x;

    vec3 vertexDelta = texture2DRect(vertexDeltas, dpos).xyz;
    vertex += vec4(vertexDelta * curr_weight, 0);

    vec3 normalDelta = texture2DRect(normalDeltas, dpos).xyz;
    normal += vec3(normalDelta * curr_weight);
}

The blend shape mesh used in the application was created by the digital artist Kent Trammell7.

7 http://www.ktrammell.com

5.2 Skinning

The model has bones defined in Maya for head rotations, jaw movement and eye movement. Unfortunately, the format8 used to export the model from Maya does not have any support for skinning data, making it impossible to export the bone information. To solve this, a pre-processing step was added where the bone weights for the head rotations are estimated based on the y-value of the vertex and which part of the mesh it belongs to. In figure 5.1 this is illustrated using green color where the model is affected by rotation and red color where it is not. The gradient is the area where we do a linear interpolation to get a smooth transition from non-affected vertices to affected vertices. For example, it is known that the shirt mesh is not affected by the bone, so there is no point checking the y-value of its vertices. It is also known that the eyes, teeth and tongue meshes are always affected by the current maximum of the bone weight, making it possible to skip the check for their y-values as well.

8 Wavefront .OBJ

Figure 5.1: Illustrated bone weights.

The bone is manually placed inside the throat of the face mesh, visualized as the top red dot in figure 5.2. The output from faceAPI is the Euler angles for the head rotations, which are used to construct the rotation matrix on the CPU. The rotation matrix is then uploaded to the GPU every frame, where it is applied in the vertex shader after all blend shape calculations.
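For illustration, a small C++ sketch of building such a rotation matrix from Euler angles; the rotation order Rz·Ry·Rx is an assumption made here and must in practice match the faceAPI convention:

#include <cmath>

// Build a row-major 3x3 rotation matrix from head Euler angles in
// radians, composed as R = Rz(roll) * Ry(yaw) * Rx(pitch).
void headRotationMatrix(float pitch, float yaw, float roll, float out[9])
{
    float cx = std::cos(pitch), sx = std::sin(pitch);
    float cy = std::cos(yaw),   sy = std::sin(yaw);
    float cz = std::cos(roll),  sz = std::sin(roll);
    out[0] = cz*cy;  out[1] = cz*sy*sx - sz*cx;  out[2] = cz*sy*cx + sz*sx;
    out[3] = sz*cy;  out[4] = sz*sy*sx + cz*cx;  out[5] = sz*sy*cx - cz*sx;
    out[6] = -sy;    out[7] = cy*sx;             out[8] = cy*cx;
}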


Figure 5.2: Illustrated bone placement.

Skinning for jaw movement and eye movement is substituted with blend shapes for simplicity.

5.3 Filtering

The values received from faceAPI are very noisy, resulting in jerky movements for the virtual character unless a low pass filter is applied. Both an Infinite Impulse Response (IIR) filter, described in equations 5.2 and 5.3, and a Finite Impulse Response (FIR) filter, described in equation 5.4, are implemented. In equation 5.3, RC is the time constant controlling the behavior of the filter and \Delta t is the length of the current time step. In equation 5.4, N is the length of the buffer.

y[n] = y[n-1] + \alpha (x[n] - y[n-1]) \qquad (5.2)

\alpha = \frac{\Delta t}{RC + \Delta t} \qquad (5.3)

y[n] = \frac{1}{N} \sum_{k=0}^{N} x[n-k] \qquad (5.4)

For head rotations the IIR filter was discovered to be unusable: since its impulse response is non-zero for an infinite amount of time, the virtual head movements lag too far behind the user's movements.
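A minimal C++ sketch of the two filters as described by equations 5.2-5.4 (the class names are illustrative, not the application's own):

#include <deque>

// One-pole IIR low-pass filter (equations 5.2 and 5.3).
class IIRFilter {
    float y = 0.0f;
    bool primed = false;
    float rc; // time constant RC
public:
    explicit IIRFilter(float rc) : rc(rc) {}
    float filter(float x, float dt) {
        float alpha = dt / (rc + dt);       // equation 5.3
        y = primed ? y + alpha * (x - y)    // equation 5.2
                   : x;                     // seed with the first sample
        primed = true;
        return y;
    }
};

// Moving-average FIR low-pass filter (equation 5.4) over a buffer of
// the most recent n samples.
class FIRFilter {
    std::deque<float> buf;
    std::size_t n;
public:
    explicit FIRFilter(std::size_t n) : n(n) {}
    float filter(float x) {
        buf.push_back(x);
        if (buf.size() > n) buf.pop_front();
        float sum = 0.0f;
        for (float v : buf) sum += v;
        return sum / buf.size();
    }
};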


5.4 Matching landmarks

The tracked feature points are called landmarks in the faceAPI standard. In every frame, the differences between landmark coordinates are calculated in order to find the appropriate weights for the blend shapes. For example, the weight for the open mouth blend shape is calculated by comparing the difference between an upper lip landmark and a lower lip landmark. The problem with this approach is to find the largest difference possible, in order to normalize the difference to a value between 0 and 1 and use it as a weight for the blend shape. Every user has a unique appearance and different face proportions, making it hard to specify fixed maximum differences that work for anyone. The approach used is to set all the maximum differences relative to the distance between the eyes. This distance works well as an indicator of head width since it remains the same during the entire user experience, and since it is created from the very first tracked image, where a frontal face is required. More precise relationships could be used if the first tracked image always contained a neutral expression, which unfortunately is not the case.

The "lowpassFilter" function in the pseudocode below is always the FIR filter described in section 5.3. The width of the buffer differs between the different uses, as the goal was to make the window as narrow as possible to reduce lag.

Algorithm 1 General pseudocode

pupilSeperation = abs(rightEyeCenter - leftEyeCenter)
mouthNeutralWidth = 0.7 * pupilSeperation

"rightEyeCenter" in algorithm 1 is landmark 600 and "leftEyeCenter" is landmark 700. See appendix A for the faceAPI landmark standard.


5.4.1 Eyebrows

Algorithm 2 Eyebrow pseudocode

for each eyebrow do
    browDiff = lowpassFilter(abs((eyeCenter - innerBrowPoint) / pupilSeperation))
    if browDiff < 0.22 then
        amountDown = min((0.22 - browDiff) / 0.7, 1.0)
        setShape(Brow down, amountDown * 1.3)
        setShape(Eye squint, amountDown)
        setShape(Brow up, 0)
        setShape(Eye wide, 0)
    else
        amountUp = min((browDiff - 0.22) / 0.23, 1.0)
        setShape(Brow up, amountUp * 1.3)
        setShape(Eye wide, amountUp)
        setShape(Brow down, 0)
        setShape(Eye squint, 0)
    end if
end for

"innerBrowPoint" refers to landmarks 302 and 400, "eyeCenter" to landmarks 600 and 700.

5.4.2 Mouth

Algorithm 3 Open mouth pseudocode

mouthOpen = lowpassFilter(abs(overlipYPos - underlipYPos))
amountOpen = min(mouthOpen / (0.8 * mouthNeutralWidth), 1.0)
if amountOpen > 0.3 then
    setShape(Mouth open, ((1.0 / 0.7) * (amountOpen - 0.3))^2)
else
    setShape(Mouth open, 0)
end if


Algorithm 4 Smile pseudocode

mouthWidth = lowpassFilter(abs(leftMouthCorner - rightMouthCorner))
mouthMaxWidthDelta = (1.04 * pupilSeperation) - mouthNeutralWidth
mouthMinWidthDelta = mouthNeutralWidth - (0.5 * pupilSeperation)
if mouthWidth > mouthNeutralWidth then
    amountWide = min((mouthWidth - mouthNeutralWidth) / mouthMaxWidthDelta, 1.0)
    setShape(Full smile, amountWide)
    setShape(Left temple flex, amountWide)
    setShape(Right temple flex, amountWide)
else
    setShape(Full smile, 0)
    setShape(Left temple flex, 0)
    setShape(Right temple flex, 0)
end if

The corresponding landmarks:

• "overlipYPos" is the y-component of landmark 202
• "underlipYPos" is the y-component of landmark 206
• "rightMouthCorner" is landmark 4
• "leftMouthCorner" is landmark 5

5.4.3 Jaw movement

The landmarks for the lips together with the face contour landmarks allow for detecting when the user moves his jaw to the right or left. The program does not keep track of the jaw position; instead it looks for movement and initiates the correct animation when it detects a movement to the right or left.

5.5 Blink detection

Using the eye landmarks from faceAPI, a 40x50 pixel sub image is created around each eye for the blink detection calculations. Each eye is calculated independently, but we make our virtual character blink with both eyes even if only one eye detects a blink; otherwise the eyes blink out of sync, which looks very unnatural. This also helps animating blinks for users with asymmetric blink behavior [33], at the cost of removing single eye blinks.

To calculate motion around the eyes, Heishman and Duric [33] use the normal flow algorithm and reject optical flow for real time applications. Due to time constraints, and the fact that OpenCV has several optical flow implementations readily available, optical flow was chosen as the blink detection technique.


Based on the required inputs and what is said about them in [34], the cvCalcOpticalFlowHS() function was chosen.

Head movements introduce noise into the optical flow calculations. By compensating for roll, and by ignoring the results if the pitch or yaw in a frame is above a certain threshold, a large portion of the false positive blinks should be removed.

Algorithm 5 Blink detection pseudocode

Apply the reverse head rotation around the Z-axis to the image.
for each eye do
    Calculate optical flow on a 40x50 image with center in the pupil.
    Construct histograms of the directions and magnitudes of the optical flow vectors.
    T = 1.8 + (z-value of the head position) * 0.9
    if the length of the rotation-difference vector < 0.8 then
        if the average magnitude > T and the dominating direction is down then
            Blink with both eyes.
        end if
    end if
end for
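For reference, a hedged C++ sketch of the per-eye flow step using OpenCV's legacy C interface; the buffer setup and parameter values are illustrative, not the application's actual code:

#include <opencv/cv.h>

// 'prev' and 'curr' are the 40x50 8-bit grayscale eye images, already
// counter-rotated for head roll; velx and vely must be single-channel
// 32-bit float arrays of the same size.
void eyeFlow(IplImage* prev, IplImage* curr, CvMat* velx, CvMat* vely)
{
    CvTermCriteria criteria =
        cvTermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 64, 0.01);
    // Horn-Schunck optical flow; 'lambda' weights the smoothness term.
    cvCalcOpticalFlowHS(prev, curr, 0 /* usePrevious */, velx, vely,
                        0.001 /* lambda */, criteria);
    // velx/vely now hold the per-pixel flow from which the direction
    // and magnitude histograms in algorithm 5 are built.
}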

5.6 Key framing

When the system is not tracking a user, it looks a bit boring if the model is not doing anything. This is fixed by playing a predefined animation. The animation is implemented by interpolating between key framed expressions and head rotations.

5.6.1 Animation

In the key frame animation system, a key frame is defined by a time and a complete set of all rotations and shape weights. For every time step we calculate how far we are between two key frames on a scale from 0 to 1, and then the following function is used to interpolate between the two.

y(t) = 6t^5 - 15t^4 + 10t^3 \qquad (5.5)

This equation was proposed by Ken Perlin in [35] and is C2 continuous, which means that it is continuous in the second derivative, thus making the transition between key frames really smooth.
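A small C++ sketch of the interpolation step using equation 5.5 (function names are illustrative):

// Perlin's C2-continuous fade curve, equation 5.5.
float smootherstep(float t)
{
    return t * t * t * (t * (6.0f * t - 15.0f) + 10.0f);
}

// Interpolate one animated channel (a rotation angle or a shape
// weight) between the key frames (t0, v0) and (t1, v1) at time 'time'.
float interpolateChannel(float time, float t0, float v0, float t1, float v1)
{
    float t = (time - t0) / (t1 - t0); // normalized position in [0, 1]
    float s = smootherstep(t);
    return v0 + s * (v1 - v0);
}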

5.6.2 Editor

Hard coding the key frame animations in C++ is very tiresome. Therefore a simple file format was developed to store the key frame data, together with an editor for editing it. Details on the file format can be found in appendix B.3.3.


Figure 5.3: Key frame editor

5.7 Eyes

To get a good user experience it is important that the eyes of the virtual character do not stare out into nothing but instead look at the user. This was accomplished by rotating each eye around its center in the opposite direction of the pitch and yaw head rotations, making the virtual character look straight at the user at all times.
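In code this counter-rotation is trivial; a C++ sketch with hypothetical names:

// Counter-rotate each eye against the head pose so the character keeps
// looking straight at the user.
struct HeadPose { float pitch, yaw, roll; };

void lookAtUser(const HeadPose& head, float& eyePitch, float& eyeYaw)
{
    eyePitch = -head.pitch; // undo the head pitch
    eyeYaw   = -head.yaw;   // undo the head yaw
}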

5.8 Upper body movement

The shoulder section of the model has no movement caused by blend shapes or the head's bone animation, but if it is completely still it does not feel right. To address this problem, a low frequency and low magnitude noise was added, controlling the upper body rotation around an imaginary bone placed in the hip of the virtual character (the bottom red dot in figure 5.2). This gives the impression of the character swinging back and forth, as if he is shifting his weight from one foot to the other.

5.9 Noise

For the low frequency upper body motion, Perlin noise [35] is used, specifically an implementation made by Stefan Gustavson [36]. The noise is applied independently for each rotation axis and consists of a sum of noises with different magnitudes and frequencies.
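A C++ sketch of such an octave sum; the value-noise helper below is a cheap stand-in for the noise implementation from [36], and the base frequency, octave count and falloff are illustrative values:

#include <cmath>

// Cheap deterministic value noise, only a stand-in for [36].
static float hashNoise(int i)
{
    i = (i << 13) ^ i;
    return 1.0f - ((i * (i * i * 15731 + 789221) + 1376312589)
                   & 0x7fffffff) / 1073741824.0f;
}

static float noise1(float x)
{
    int i = (int)std::floor(x);
    float f = x - i;
    float u = f * f * (3.0f - 2.0f * f); // smooth blend between lattice points
    return hashNoise(i) * (1.0f - u) + hashNoise(i + 1) * u;
}

// Sum of noise octaves with increasing frequency and decreasing
// magnitude; evaluated per rotation axis (with a different offset per
// axis) to drive the slow upper body sway.
float bodyNoise(float t)
{
    float sum = 0.0f;
    float frequency = 0.1f; // low base frequency gives a slow sway
    float magnitude = 1.0f;
    for (int octave = 0; octave < 3; ++octave) {
        sum += magnitude * noise1(t * frequency);
        frequency *= 2.0f;
        magnitude *= 0.5f;
    }
    return sum;
}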

5.10 Lighting

For lighting the model, the Phong model is used. The light sources are four point lights: one key light, two rim lights and one fill light.

5.11 SSAO

Screen space ambient occlusion (SSAO) is a technique that tries to approximate ambient occlusion in screen space using the depth value and normal of each pixel. An implementation from [37] with some modifications is used. The original code uses the texture coordinates of the fullscreen quad to look up a random vector, resulting in an artifact where the SSAO shadow "slides" over the model when it moves. To fix this, the texture coordinates of the model are used instead for the random vector lookup.


Chapter 6

Results

An important factor to consider when evaluating our work is how our installation has been received in the exhibition. We have not done any formal survey, but judging from our observations during the opening weekend and what we have heard from the tour guides, people do like the installation. However, there are some usability problems; to some people it is not clear what to do.

6.1 Performance

Our vertex shader is not optimized for performance in any way; for example, the two rows of teeth in our model consist of 50k polygons. They have no blend shapes, but the blend shape calculations in the vertex shader are applied to them anyway for a simpler code structure. Our only performance consideration for the GPU calculations has been to make sure that the anti-aliasing level does not make the framerate drop below 60 fps. On the CPU, faceAPI takes a lot of performance, bringing our development machines with a Core 2 Duo processor to their limit. This became a non-issue on the exhibition computer, which has a more modern Core i7 quad core processor.


6.2 SSAO

Figure 6.1: Comparison between SSAO on (left) and SSAO off (right).

As can be seen in figure 6.1, the visual result is not that good. The image is taken from the exhibition computer, which runs an ATI Radeon 5770 graphics card. On our development machines, which run Nvidia 8800 GTS graphics cards, the results were still not good enough but do not look quite as bad. There were also a lot of problems with the framebuffer extension on the ATI card: things that worked on the Nvidia card did not work at all on the ATI card. When using framebuffers we also do not take advantage of the anti-aliasing applied to our OpenGL window, since we missed the OpenGL extension that handles anti-aliasing for framebuffer rendering. Our solution was to supersample the texture ourselves by rendering to a texture that is larger than the window resolution and then scaling it down when displaying it.
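A minimal sketch of that manual supersampling using the EXT framebuffer extension; the sizes and supersampling factor are illustrative, and the depth attachment and quad drawing are omitted for brevity:

#include <GL/glew.h>

const int winW = 1280, winH = 720;  // illustrative window size
const int factor = 2;               // illustrative supersampling factor
GLuint fbo, colorTex;

void createSupersampleTarget()
{
    // Color texture at factor times the window resolution, with
    // linear filtering for the downscale step.
    glGenTextures(1, &colorTex);
    glBindTexture(GL_TEXTURE_2D, colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, winW * factor, winH * factor,
                 0, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, colorTex, 0);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
}

void renderFrame()
{
    // 1. Render the scene at the higher resolution.
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glViewport(0, 0, winW * factor, winH * factor);
    // ... draw the scene ...

    // 2. Scale down: draw a window-sized quad textured with colorTex.
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    glViewport(0, 0, winW, winH);
    // ... draw fullscreen quad sampling colorTex ...
}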


6.3 Usage Example

Figure 6.2: Usage examples. (a) Tracked neutral expression; (b) keyframed expression; (c) keyframed expression; (d) tracked sceptical expression.


Figure 6.3: Usage examples. (a) Tracked smiling expression with faceAPI output frame and wire frame; (b) tracked open mouth with faceAPI output frame and wire frame; (c) tracked head rotations with faceAPI output frame and wire frame; (d) tracked closed eyes with faceAPI output frame and wire frame.


6.4 The installation

Figure 6.4: The installation seen from the front.


Figure 6.5: The installation seen from the back.


Chapter 7

Conclusions

During the work we have had a couple of decision points where we had to decide on the direction of the work. The two important ones were choosing a model and choosing the tracking software.

7.1 OpenCV

We found that OpenCV is very good at what it does, but the nature of the C language makes it hard to use and extremely hard to debug. However, OpenCV also has a Python interface which looks a lot easier to use; it would be interesting to see what kind of performance hit replacing our C OpenCV code with the same functionality in Python would have.

7.2 Blink detection

Latency is often a bad thing, but for our blink detection it is actually beneficial, since it allows the user to see the virtual character blink. Even though we simplified the algorithms discussed by Heishman et al. [33] and Divjak et al. [31], we are happy with the results. Our goal from the start was to keep track of the open and closed state of the eye, but that proved not to be feasible with our setup. Besides the flow calculation algorithms, we looked into whether it was possible to filter out the white part of the eye and determine the state of the eye from how much "white" we found. This idea was rejected quite early because of the low resolution of our webcam images, and because the amount of "white" varied hugely between just the two of us.

7.3 Tracking and animation separation

As we said in the animation chapter, the goal was to separate the animation and the tracking, and we think we succeeded in that part. Our solution is that they run in separate threads within a single program and communicate via abstract interfaces, thus making it easy to add other classes that can communicate via the same interfaces. Of course, the separation could be taken to the extreme, where they run in separate processes and communicate via, for example, TCP/IP, but that would be completely overkill for our application.
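A minimal sketch of what such an abstract interface can look like; the names are hypothetical, and the real interfaces carry more state than this:

#include <mutex>

// Data handed from the tracking thread to the animation thread.
struct TrackingData
{
    float headPitch, headYaw, headRoll;
    float shapeWeights[28];  // one weight per blend shape, see B.2
};

class ITrackingSource
{
public:
    virtual ~ITrackingSource() {}
    virtual TrackingData latest() = 0;
};

// Any class implementing ITrackingSource can drive the animation:
// the face tracker, the key frame editor, a network client...
class FaceTracker : public ITrackingSource
{
public:
    TrackingData latest() override
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_;  // written by the tracking thread
    }

private:
    std::mutex mutex_;
    TrackingData data_ = {};
};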

7.4 Future work

7.4.1 Gaze tracking

It would have been neat if the eyes of the model followed the user's eyes. We had no time to look further into this, but we still think that our solution, with the model always looking forward, is good enough for this installation. If the user looks at the screen, the eyes are looking in the right direction; if the user looks in any other direction, he or she will not notice if the eyes do not follow perfectly.


Appendix A

faceAPI landmark standard

Figure A.1: faceAPI landmark standard


Appendix B

User manual

B.1 Command line arguments

--help        Produces this help message.
--windowmode  Force window mode.
--fullscreen  Force fullscreen mode.
--fsaa arg    Set FSAA level. Default value is 4.
--width arg   Set horizontal resolution.
--height arg  Set vertical resolution.

Fullscreen mode will always override window mode, regardless of the ordering of the arguments.
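For example, a hypothetical invocation (the binary name is assumed) forcing fullscreen mode at a specific resolution with a higher FSAA level:

magicmirror --fullscreen --width 1920 --height 1080 --fsaa 8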

B.2 Available Shapes

We have in total 28 different shapes.

• Left brow down.

• Right brow down.

• Left brow up.

• Right brow up.

• Left eye closed.

• Right eye closed.

• Open mouth.

• Angry pucker with the mouth.

• Full smile.


• Look right.

• Look left.

• Look up.

• Look down.

• Left temple flex.

• Right temple flex.

• Left eye squint.

• Right eye squint.

• Left eye wide.

• Right eye wide.

• Shift jaw left.

• Shift jaw right.

• Crinkle nose.

• Upper lip up.

• Lower lip up.

• Right side smile.

• Left side smile.

• Left brow middle down.

• Right brow middle down.

B.3 File formats

B.3.1 Config

struct ConfigFileHeader
{
    int shapes;  // number of blend shapes
    int files;   // number of FileInfo entries that follow
};

struct FileInfo
{
    char name[40];  // file name
};
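A minimal sketch of reading this format, assuming the file is simply a ConfigFileHeader followed by one FileInfo record per file (a raw binary read, so the struct layout must match the file):

#include <cstdio>
#include <vector>

bool readConfig(std::FILE* f, ConfigFileHeader& header,
                std::vector<FileInfo>& files)
{
    if (std::fread(&header, sizeof(header), 1, f) != 1)
        return false;
    files.resize(header.files);
    return std::fread(files.data(), sizeof(FileInfo), files.size(), f)
           == files.size();
}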


B.3.2 Models

The binary model files consist of the header seen below, followed by float arrays with the data. The length of these arrays can be calculated from the data in the header, and the order of the arrays is: vertices, normals, texture coordinates, weights, vertex deltas and normal deltas.

struct MeshFileHeader
{
    char name[50];
    int numOfShapes;
    int vertexCount;
    unsigned int meshType;
    char textureName[100];
    float gloss;
    float Ka[3];
    float Kd[3];
    float Ks[3];
};
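A sketch of how a reader can use the header to size the arrays. The per-vertex component counts (3 for positions and normals, 2 for texture coordinates, 1 for weights) are our assumptions for the sketch; only the order of the arrays is given by the format:

#include <cstdio>
#include <vector>

// Naive reader for the model format: reads the header, then the
// float arrays in their fixed order. Assumes the in-memory struct
// layout matches the file.
bool readModel(std::FILE* f)
{
    MeshFileHeader h;
    if (std::fread(&h, sizeof(h), 1, f) != 1)
        return false;

    std::vector<float> vertices(h.vertexCount * 3);
    std::vector<float> normals(h.vertexCount * 3);
    std::vector<float> texCoords(h.vertexCount * 2);  // assumed 2 per vertex
    std::vector<float> weights(h.vertexCount);        // assumed 1 per vertex
    std::vector<float> vertexDeltas(h.numOfShapes * h.vertexCount * 3);
    std::vector<float> normalDeltas(h.numOfShapes * h.vertexCount * 3);

    std::fread(vertices.data(), sizeof(float), vertices.size(), f);
    std::fread(normals.data(), sizeof(float), normals.size(), f);
    std::fread(texCoords.data(), sizeof(float), texCoords.size(), f);
    std::fread(weights.data(), sizeof(float), weights.size(), f);
    std::fread(vertexDeltas.data(), sizeof(float), vertexDeltas.size(), f);
    std::fread(normalDeltas.data(), sizeof(float), normalDeltas.size(), f);
    return true;
}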

B.3.3 Animation

Our animation file format is based on the .OBJ file format. Since we already had a reader for that, we could reuse some of its logic to read our animation files.

This is an example from the actual idle animation that is running in the exhibition:

newkftime 0
rotation 0 0 0
newkftime 1.4
rotation 0 5 10
shape 2 0.3
shape 3 0.2
shape 4 1.4
shape 5 1.4
shape 6 1
shape 19 0.3
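As in .OBJ, each line starts with a keyword. Below is a minimal sketch of a parser for the three keywords above, with our reading of the semantics: newkftime starts a new key frame at the given time, rotation sets the rotation for that key frame, and shape sets the weight of one blend shape:

#include <fstream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct KeyFrame
{
    double time = 0.0;
    double rotation[3] = {0.0, 0.0, 0.0};
    std::map<int, double> shapeWeights;  // blend shape index -> weight
};

std::vector<KeyFrame> readAnimation(const std::string& path)
{
    std::vector<KeyFrame> frames;
    std::ifstream in(path);
    std::string line, keyword;
    while (std::getline(in, line))
    {
        std::istringstream ls(line);
        ls >> keyword;
        if (keyword == "newkftime")
        {
            KeyFrame kf;
            ls >> kf.time;
            frames.push_back(kf);
        }
        else if (keyword == "rotation" && !frames.empty())
        {
            ls >> frames.back().rotation[0] >> frames.back().rotation[1]
               >> frames.back().rotation[2];
        }
        else if (keyword == "shape" && !frames.empty())
        {
            int index;
            double weight;
            ls >> index >> weight;
            frames.back().shapeWeights[index] = weight;
        }
    }
    return frames;
}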


Bibliography

[1] Lockheed Martin - Human Immersive Lab. http://www.lockheedmartin.com/aeronautics/labs/human_immersive.html, visited 7/8 2010.

[2] L. Mundermann, S. Corazza, and T.P. Andriacchi. Accurately measuring human movement using articulated ICP with soft-joint constraints and a repository of articulated models. In Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, pages 1–6, 2007.

[3] Remington Scott. Sparking life: notes on the performance capture sessions for The Lord of the Rings: The Two Towers. SIGGRAPH Comput. Graph., 37(4):17–21, 2003.

[4] Midori Kitagawa and Brian Windsor. MoCap for Artists: Workflow and Techniques for Motion Capture. Focal Press, 2008.

[5] Gordon Cameron, Andre Bustanoby, Ken Cope, Steph Greenberg, Craig Hayes, and Olivier Ozoux. Motion capture and CG character animation (panel). In SIGGRAPH '97: Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 442–445, New York, NY, USA, 1997. ACM Press/Addison-Wesley Publishing Co.

[6] Sharon Waxman. Computers join actors in hybrids on screen. http://www.nytimes.com/2007/01/09/movies/09came.html, January 2007, visited 7/8 2010.

[7] Max Fleischer. Method of producing moving-picture cartoons. U.S. patent 1242674, 1915.

[8] Michael Leventon and W. Freeman. Bayesian estimation of 3D human motion from an image sequence. Technical report, 1998.

[9] K. Onishi, T. Takiguchi, and Y. Ariki. 3D human posture estimation using the HOG features from monocular image, pages 1–4, Dec. 2008.

[10] Ryuzo Okada and Björn Stenger. A single camera motion capture system for human-computer interaction. IEICE Trans. Inf. Syst., E91-D(7):1855–1862, 2008.


[11] Cristian Sminchisescu. 3D human motion analysis in monocular video: techniques and challenges. In AVSS '06: Proceedings of the IEEE International Conference on Video and Signal Based Surveillance, page 76, Washington, DC, USA, 2006. IEEE Computer Society.

[12] Jari Hannuksela, Janne Heikkilä, and Matti Pietikäinen. A real-time facial feature based head tracker. In Advanced Concepts for Intelligent Vision Systems, Brussels, pages 267–272, 2004.

[13] Jingying Chen and B. Tiddeman. A robust facial feature tracking system, pages 445–449, Sep. 2005.

[14] Jong-Gook Ko, Kyung-Nam Kim, and R.S. Ramakrishna. Facial feature tracking for eye-head controlled human computer interface, volume 1, pages 72–75, 1999.

[15] Nils Ingemars. A feature based face tracker using extended Kalman filtering, 2007.

[16] Taro Goto, Marc Escher, Christian Zanardi, and Nadia Magnenat-Thalmann. MPEG-4 based animation with face feature tracking. In Proc. Eurographics Workshop on Computer Animation and Simulation '99, pages 89–98. Springer, 1999.

[17] Marian Stewart Bartlett, Gwen Littlewort, Mark Frank, Claudia Lainscsek, Ian Fasel, and Javier Movellan. Recognizing facial expression: Machine learning and application to spontaneous behavior, 2005.

[18] Tommaso Gritti. Toward fully automated face pose estimation. In IMCE '09: Proceedings of the 1st international workshop on Interactive multimedia for consumer electronics, pages 79–88, New York, NY, USA, 2009. ACM.

[19] Iain Matthews and Simon Baker. Active appearance models revisited. International Journal of Computer Vision, 60:135–164, 2003.

[20] Kyungnam Kim. Face recognition using principal component analysis, 2003.

[21] B. Theobald, I. Matthews, S. Boker, and J. F. Cohn. Real-time expression cloning using appearance models.

[22] Daniel F. DeMenthon and Larry S. Davis. Model-based object pose in 25 lines of code. International Journal of Computer Vision, 15:123–141, 1995.

[23] Soumya Hamlaoui and Franck Davoine. Facial action tracking using particle filters and active appearance models. In sOc-EUSAI '05: Proceedings of the 2005 joint conference on Smart objects and ambient intelligence, pages 165–169, New York, NY, USA, 2005. ACM.


[24] Ralph Gross, Iain Matthews, and Simon Baker. Generic vs. person specific active appearance models. Image Vision Comput., 23(12):1080–1093, 2005.

[25] Jörgen Ahlberg. Candide-3 - an updated parameterised face. Technical report, 2001.

[26] Mauricio Radovan and Laurette Pretorius. Facial animation in a nutshell: past, present and future. In Proceedings of the 2006 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries, SAICSIT '06, pages 71–79, Republic of South Africa, 2006. South African Institute for Computer Scientists and Information Technologists.

[27] Pushkar Joshi, Wen C. Tien, Mathieu Desbrun, and Frederic Pighin. Learning controls for blend shape based realistic facial animation. In SIGGRAPH '06: ACM SIGGRAPH 2006 Courses, page 17, New York, NY, USA, 2006. ACM.

[28] Frederick I. Parke. Computer generated animation of faces. In ACM '72: Proceedings of the ACM annual conference, pages 451–457, New York, NY, USA, 1972. ACM.

[29] Frederic Ira Parke. A parametric model for human faces. PhD thesis, 1974.

[30] Hubert Nguyen. GPU Gems 3. Addison-Wesley Professional, first edition, 2007.

[31] Matjaz Divjak and Horst Bischof. Real-time video-based eye blink analysis for detection of low blink-rate during computer use. Technical report, 2008.

[32] Michael Chau and Margrit Betke. Real time eye tracking and blink detection with USB cameras. Technical report, 2005.

[33] Ric Heishman and Zoran Duric. Using image flow to detect eye blinks in color videos. In WACV '07: Proceedings of the Eighth IEEE Workshop on Applications of Computer Vision, page 52, Washington, DC, USA, 2007. IEEE Computer Society.

[34] Gary Rost Bradski and Adrian Kaehler. Learning OpenCV, 1st edition. O'Reilly Media, Inc., 2008.

[35] Ken Perlin. Improving noise. In SIGGRAPH '02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 681–682, New York, NY, USA, 2002. ACM.

[36] Stefan Gustavson. Noise. http://webstaff.itn.liu.se/~stegu/aqsis/aqsis-newnoise/.

[37] SSAO. http://www.gamerendering.com/2009/01/14/ssao/, January 2009.
