gravis.dmi.unibas.ch/publications/2007/Mirage07.pdf

3D Reconstruction of Human Faces fromOccluding Contours

Michael Keller, Reinhard Knothe, and Thomas Vetter

University of Basel, Computer Science Department, Basel, Switzerland{michael.keller, reinhard.knothe, thomas.vetter}@unibas.ch

Abstract. In this paper we take a fresh look at the problem of extracting shape from contours of human faces. We focus on two key questions: how can we robustly fit a 3D face model to a given input contour; and how much information about shape does a single contour image convey? Our system matches silhouettes and inner contours of a PCA-based Morphable Model to an input contour image. We discuss different types of contours in terms of their effect on the continuity and differentiability of related error functions and justify our choices of error function (modified Euclidean Distance Transform) and optimization algorithm (Downhill Simplex). In a synthetic test setting we explore the limits of accuracy when recovering shape and pose from a single correct input contour and find that pose is much better captured by contours than is shape. In a semi-synthetic test setting – the input images are edges extracted from photorealistic renderings of the PCA model – we investigate the robustness of our method and argue that not all discrepancies between edges and contours can be solved by the fitting process alone.

1 Introduction

Automatic face recognition from a given image is one of the most challenging research topics in computer vision, and it has been demonstrated that variations in pose and light are the major problems [Zha00]. Since 2D view-based methods are limited in their representation, most face recognition systems show good results only for faces under frontal or near-frontal pose. Methods based on fitting deformable, parametrized 3D models of human heads have been proposed to overcome this issue [Bla03].

Our work investigates the problem of recovering the 3D shape of human faces from a single contour image. Previous work shows that shape recovery methods based on pixel intensities or color are inappropriate for matching edges accurately, while multi-feature approaches such as [Rom05], where contour features are used among other features, achieve higher accuracies at the edges. To further understand the benefits and limits of the contour feature, we here take a step back and look at this feature in isolation. Our first interest lies in building a system that robustly finds a solution with a small contour reconstruction error. Our second interest lies in determining the degree to which such a solution, and therefore the 3D shape of the face, is constrained by the input contour.


[Fig. 1 diagram: input image → edge map (feature extraction); edge map + 3D model contour → 3D reconstruction (fitting)]

Fig. 1. Schematic view of the problem (see Section 1.1)

1.1 The Problem

Adapting a model to contours in an image can be decomposed into two subproblems: feature extraction and fitting (see Fig. 1). In a naive approach, feature extraction is performed by applying a general-purpose edge detection algorithm, such as [Can86]. Face model and camera parameters are then optimized by minimizing a distance between the model contours and the edge image.

There is an important discrepancy where the two processes interface: edge extraction computes edges and not contours. Even though most contours are found among the edges, they are incomplete and noisy. There are several ways to deal with this problem: (a) using a robust distance function which ignores both unmatched edges and unmatched contours, (b) using the model to identify which edges correspond to contours, (c) using a more specific form of contour detection in the first place.

In this paper we present and discuss a system based on the first approach: can we robustly fit model contours against edges by using an appropriate image distance measure? Additionally we investigate the ultimately more interesting question: assuming contour extraction were ideally solved, how well could we fit our model to a correct and complete input contour (synthesized from the model), and how good is the reconstruction of shape and pose?

1.2 Related Work

Moghaddam et al. [Mog03] built a system similar to our own and applied it both to video sequences and to photos acquired with a multi-camera rig – single images are not discussed. Only silhouettes are used, detected using background subtraction. Another difference lies in the error function used in their system.

Roy-Chowdhury et al. [Roy05] also model faces from video sequences, but as opposed to this work a locally deformable model is used. In the second part of their paper, silhouettes and control points of the model are matched against edges extracted from the video frames. Model and optimization technique are very different from ours, except that a Euclidean Distance Transform is used for image comparison.

Ilic et al. [Ili05] adapt an implicit surface representation of a morphable PCA model to silhouette contours by solving a system of differential equations. Very interesting is their integrated approach of detecting the contour based on the location and direction of the model contour.

Both [Mog03] and [Roy05] use multiple images to achieve higher accuracy. While this is legitimate, it cannot answer our basic question of how strongly a single contour constrains shape. None of the cited papers provide quantitative results describing reconstruction performance ([Roy05] do, but only for their SfM experiment, which is not based on contours).

In the remainder of the paper we first discuss types of contours and their properties, to develop exact terminology. We then explain in detail the design and implementation of our fitting system. Finally we motivate and describe our experiments, and discuss results.

2 Contours

Different authors often mean different things when using the term “contour”. Sometimes, contours are “image contours” (e.g. in the context of Active Contour Models), a synonym of what we call “edges”. In the context of shapes, contour is often used interchangeably with “silhouette”. Other definitions of contours are implied by terms such as “inner contours”, “texture contours”, or “shadow contours”.

2.1 Types of Contours

We define contours as the collection of those edges whose image locations are invariant to lighting. Contours then fall into two classes: those originating in the geometry of the solid (“occlusion contours”), and those originating in material properties (“texture contours”). Please note that a line drawing of a face would characteristically consist of just these edges.

Occluding Contours. Assuming a smooth solid shape [Koe90], occluding contours can be formally defined as the projection of those points P of the surface whose surface normal n is perpendicular to the direction e from the eye to P, i.e. 〈n, e〉 = 0. This definition encompasses both “outer” contours (silhouettes) and “inner” contours (self-occluding contours), where the latter only appear on non-convex solids.

Texture Contours. Texture contours are salient edges on texture maps. For faces, such contours are found at the lips, the eyes, the eyebrows, as well as at hair boundaries and within hair.

2.2 Continuity and Differentiability

We will now investigate how different types of contours behave differently under parameter changes, in terms of continuity and differentiability of related error functions. These characteristics are very important when choosing an optimization algorithm.

For simplicity’s sake we describe the problem in a 2D analogy (Fig. 2). We observe the projections of silhouette (S1, S2), inner contour (U) and texture contour (T) with respect to a rotation ρ around R.


Fig. 2. 2D analogy of contour projection (see 2.2)

Only T is fixed with respect to the rotating body. Without knowledge of the shape of the body, its projection xt and its speed dxt/dρ can be analytically determined, as long as T is not occluded.

Let now s1, s2 be defined as the two tangent rays with the maximum angle between them, and u the third tangent ray in the figure. When rotating the body clockwise, u will disappear. When rotating it counter-clockwise, u will coincide with s2 to form a bitangent, then disappear.

Therefore the projection xu is not defined for all ρ. Whether absence of the contour is penalized or not, xu will contribute a discontinuity to a compound error measure. The projections x1, x2 are always present and unique, but where s2 becomes a bitangent the derivative dx2/dρ is undefined.

In 3D the situation is analogous: the movement of texture contour points with respect to parameters is analytically differentiable [Rom05], silhouette movement is piecewise differentiable (albeit generally not analytically), and inner contour movement is discontinuous (with 3D solids, contours appear, disappear, split and merge under parameter changes [Koe90]).

2.3 Contour Matching

Which contour features are accessible for matching depends, of course, on the application context. If indeed one were reconstructing faces from a shadow play, then only silhouettes would be available. Much more common is a scenario where the input is a well-lit photograph and all types of contours are in principle accessible. While it is our goal to use all types of contours, we concentrate for now on silhouettes and inner contours, where the latter are arguably the hardest feature to work with.1 Because the error function is neither differentiable nor even continuous, our chosen optimization algorithm is Downhill Simplex [Nel65], which uses only function evaluations and deals well with ill-behaved functions.

3 Implementation

Our face model is computed from a database of m = 200 laser-scanned faces. Each face is defined by n = 23888 3D vertices which are in correspondence across the database [Bla03]. Let xi ∈ IR^3n, 1 ≤ i ≤ m, be the vector representations of these faces; then our (m−1)-dimensional face space is the affine subspace {∑_{i=1}^m αi xi | ∑_{i=1}^m αi = 1}, or {x̄ + v | v ∈ V} with x̄ the mean face and V the vector subspace spanned by the mean-free vectors (x1 − x̄, ..., xm − x̄).

1 Note that in most related works only the silhouette feature is used, which does not suffer from discontinuities.

We estimate the probability of faces by fitting a multivariate Gaussian distribution to the mean-free data, yielding the pdf p(p) = c exp(−(1/2) pᵀ Σ⁻¹ p) for coefficient vectors p ∈ IR^m, where Σ = (1/m) X Xᵀ and c is a normalization constant. This assumption allows us to judge the plausibility of results and to randomly sample test cases.

In practice we work with an orthonormal basis U = (u1, ..., um), where U is such that UᵀΣU = Λ, with Λ the diagonal matrix of eigenvalues of Σ (in descending order). This transformation is called Principal Component Analysis, and allows us to make a feature reduction which incurs a minimal L2 error. Specifically, if p̂ ∈ IR^k is the projection of p ∈ IR^m onto the k-dimensional subspace span(u1, ..., uk), then E[||Up − Up̂||²] = ∑_{i=k+1}^m λi. In our case the sum of the first 30 eigenvalues comprises 96% of the sum of all 200 eigenvalues; therefore fitting in a 30-dimensional subspace is mostly sufficient for good results.
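As an illustration of this construction (a sketch on toy data, not the authors' code; the toy dimension d stands in for the real 3n = 71664-dimensional shape vectors), the basis U and the eigenvalues can be obtained from the mean-free data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 200, 50                              # toy stand-ins (real d = 3n = 71664)
X = rng.normal(size=(d, m))
X -= X.mean(axis=1, keepdims=True)          # mean-free data matrix

# Sigma = (1/m) X X^T; its eigenvectors form the orthonormal basis U
Sigma = (X @ X.T) / m
lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]              # descending eigenvalue order

# fraction of total variance captured by the first k principal components
k = 30
retained = lam[:k].sum() / lam.sum()
print(f"first {k} components retain {retained:.1%} of the variance")
```

With the real eigenvalue spectrum of the scan database, this fraction is the 96% figure quoted above.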

Hair. While the hair line for an individual person may remain steady for years, hair length and hair style do not. Additionally, hair often covers more innate and salient features such as ears and chin. In our opinion, hair is a destructive factor when creating a PCA-based model, leading to many false correspondences.

For this reason our head model is completely without hair. When registering heads we then need robust procedures which simply ignore portions of the input picture covered by hair. Future work may attempt to use such portions of input pictures constructively, but for the time being our model has no concept of hair.

Shape Error. Although we understand that the L2 error is not optimal for describing shape similarity as perceived by humans, it is sufficient in our context. We quote the L2 error as a root-mean-squared error, with unit mm. This error is easy to visualize as the average distance of the reconstructed surface from the true surface (although this is of course only correct if the error is relatively uniform). With the definitions for m, n, λi from above, the L2 distance between two faces described by shape vectors p, q is calculated as:

L2(p, q) = √( (1/n) ∑_{i=1}^m λi (pi − qi)² ).   (1)
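Eq. (1) translates directly into code; the eigenvalues, vertex count and coefficient vectors below are hypothetical toy values (the real λi and n come from the scan database):

```python
import numpy as np

def l2_shape_error(p, q, lam, n):
    """Eq. (1): RMS surface distance [mm] between faces with coefficients p and q."""
    return np.sqrt(np.sum(lam * (p - q) ** 2) / n)

lam = np.array([4.0, 1.0])     # hypothetical eigenvalues
n = 2                          # hypothetical vertex count (real n = 23888)
p = np.array([1.0, 0.0])
q = np.array([0.0, 2.0])
print(l2_shape_error(p, q, lam, n))   # sqrt((4*1 + 1*4)/2) = 2.0
```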

3.1 Error Function

Let p′ ∈ IR^k, k = m + 7, be a shape vector extended by the seven camera parameters of the pinhole camera model. The error function to be minimized must map such parameter vectors p′ to scalar values. Since the binary contour image I ∈ {0, 1}^(w·h) is in the 2D domain, it is reasonable to compose the error function E as E(p′) = D(I, R(p′)), with R : IR^k → {0, 1}^(w·h) a rendering function and D : {0, 1}^(w·h) × {0, 1}^(w·h) → IR an image comparison function (“distance”). As opposed to previous work, we do not add a prior probability term to our error function, as overfitting has not been an issue in our experiments.


Rendering Function. The contour is found and rendered by determining front- and back-facing polygons on the mesh parametrized by p. Edges between a back- and a front-facing polygon are contour candidates. Hidden contours are eliminated through z-buffering. Finally, the remaining edges are projected onto an image.
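The front-/back-facing classification step can be sketched for a triangle mesh as follows (the mesh arrays and eye position are hypothetical; z-buffer elimination and projection are omitted):

```python
import numpy as np

def contour_candidate_edges(verts, faces, eye):
    """Edges shared by one front- and one back-facing triangle (before hidden-line removal)."""
    # outward normal of each triangle
    a, b, c = verts[faces[:, 0]], verts[faces[:, 1]], verts[faces[:, 2]]
    normals = np.cross(b - a, c - a)
    # a triangle is front-facing iff its normal points toward the eye
    front = np.einsum('ij,ij->i', normals, eye - a) > 0
    # collect the facing flags of the (up to two) triangles adjacent to each edge
    edge_front = {}
    for f, is_front in zip(faces, front):
        for e in ((f[0], f[1]), (f[1], f[2]), (f[2], f[0])):
            edge_front.setdefault(tuple(sorted(e)), []).append(is_front)
    return [e for e, flags in edge_front.items()
            if len(flags) == 2 and flags[0] != flags[1]]

# demo: silhouette of a tetrahedron seen from far along +x
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
faces = np.array([[0, 2, 1], [0, 1, 3], [0, 3, 2], [1, 2, 3]])  # outward orientation
edges = contour_candidate_edges(verts, faces, eye=np.array([10., 0., 0.]))
print(sorted(edges))   # → [(1, 2), (1, 3), (2, 3)]
```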

Distance Function. Choosing a good distance function is much more difficult. Edge comparison is a research topic in its own right with many interesting approaches, such as elastic matching, Fourier Descriptors, or Shock Graphs [Sid98], to name a few. All of them are computationally expensive and require relatively noise-free input. As long as we are dealing with unreliable input from the Canny edge detector, we choose the more primitive yet robust approach of distance transforms, which solves correspondence implicitly (albeit not necessarily well).

The Euclidean Distance Transform [Fel04] of an input contour image is a scalar field d : IR² → IR where d(x, y) is the Euclidean distance of the pixel at (x, y) to the nearest contour pixel. With S = {(x, y) | R(p)(x, y) = 1} the set of “on” pixels in the synthetic contour image, we calculate the distance term as

D(I, R(p)) = (1/|S|) ∑_{(x,y)∈S} d(x, y).   (2)

Such a distance term is fairly robust against unmatched edges, although such edges can create an undesirable potential. On the other hand, contours not present in the input image will create large error terms, and optimization will drive them toward completely unrelated edges. To prevent this, the gradient far away from input edges should be small or zero.

We experimented with three modifications of the EDT, of the form g(x, y) = f(d(x, y)): (a) “linear with plateau”, f(d) = d for d < c, f(d) = c for d ≥ c; (b) “quadratic with plateau”, f(d) = d² for d < c, f(d) = c² for d ≥ c; (c) “exponential”, f(d) = 1 − exp(−d/c), for some c > 0. (a) and (b) result in distance transforms with a gradient of zero at pixels that are further than c pixels away from any input contour point, while (c) produces a monotonically decreasing gradient. Note that the latter is qualitatively most similar to the boundary-weighted XOR function of [Mog03].2
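Assuming SciPy's distance_transform_edt as a stand-in for the EDT of [Fel04], Eq. (2) together with the three modified transforms can be sketched as follows (the exponential variant is written in its increasing form):

```python
import numpy as np
from scipy import ndimage

def distance_term(input_contour, synth_contour, variant='linear', c=25.0):
    """Eq. (2): mean (modified) EDT value over the 'on' pixels of the synthetic contour."""
    # EDT: distance of every pixel to the nearest input-contour pixel
    d = ndimage.distance_transform_edt(~input_contour.astype(bool))
    if variant == 'linear':        # f(d) = d for d < c, else c
        g = np.minimum(d, c)
    elif variant == 'quadratic':   # f(d) = d^2 for d < c, else c^2
        g = np.minimum(d, c) ** 2
    else:                          # 'exponential': f(d) = 1 - exp(-d/c)
        g = 1.0 - np.exp(-d / c)
    on = synth_contour.astype(bool)
    return g[on].mean()
```

For example, a synthetic contour pixel 5px away from the nearest input pixel contributes 5 under the linear variant, 25 under the quadratic one, and 1 − exp(−5/c) under the exponential one.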

3.2 Minimizing Algorithm

For fitting against occluding contours, where the error function is not well behaved and analytical gradients are not available (see above), we chose an optimization algorithm based on function evaluations only: the Downhill Simplex algorithm due to Nelder and Mead [Nel65], in an implementation from [Pre99].

Much effort went into the setup of the initial simplex and an appropriate restarting behavior. Our initial simplex is based on the estimated standard deviation of pG − pT (the difference of the initial guess to the ground truth). Therefore, if σ is the vector of the estimated standard deviations of the k parameters to be adjusted, our initial simplex is made up of the k + 1 points pi = p0 + σi ei for 1 ≤ i ≤ k (with ei the i-th unit vector) and p0 = pG − (1/k)σ, the initial guess.3

2 Moghaddam et al. integrate their error over the entire area between input and synthetic silhouette, with penalties of 1/d per pixel.

After convergence, the algorithm was restarted with an initial simplex constructed in the same way as above, but with the point of convergence in place of the initial guess. The algorithm was started no more than ten times for a given test case, and aborted if three consecutive runs did not produce a new minimum. This resulted in an average of around 6000 error function evaluations per test case.
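A sketch of this simplex setup and restart scheme, using SciPy's Nelder–Mead as a stand-in for the [Pre99] implementation (the error function E and standard deviations σ are supplied by the caller):

```python
import numpy as np
from scipy.optimize import minimize

def fit_with_restarts(E, p_guess, sigma, max_starts=10, patience=3):
    """Downhill Simplex with restarts; aborts after `patience` starts without improvement."""
    k = len(p_guess)
    best_p, best_e, stale = p_guess, E(p_guess), 0
    p0 = p_guess - sigma / k                 # centered start, p0 = pG - (1/k) sigma
    for _ in range(max_starts):
        # initial simplex: p0 plus k points displaced by sigma_i along each axis
        simplex = np.vstack([p0, p0 + np.diag(sigma)])
        res = minimize(E, p0, method='Nelder-Mead',
                       options={'initial_simplex': simplex})
        if res.fun < best_e:
            best_p, best_e, stale = res.x, res.fun, 0
        else:
            stale += 1
            if stale >= patience:
                break
        p0 = best_p - sigma / k              # restart around the best point so far
    return best_p, best_e

# demo on a simple quadratic bowl (stand-in for the contour error E)
target = np.array([1.0, 2.0])
p, e = fit_with_restarts(lambda v: np.sum((v - target) ** 2), np.zeros(2), np.ones(2))
print(p, e)   # p close to [1, 2], e close to 0
```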

3.3 Camera Model and Initial Pose Estimate

Our camera model is the pinhole camera model, which has seven degrees of freedom: three rotations (azimuth α, declination δ, rotation ρ around the viewing axis) and four translations (left x, up y, distance from object to focal point f, distance from focal point to photographic plate d). Only α, δ, f have influence on occluding contour formation, while the other parameters describe a 2D similarity transform. When randomizing test cases, the contour-defining parameters were sampled uniformly: α ∈ [−90°, 90°], δ ∈ [−30°, 30°], 1/f ∈ [0 m⁻¹, 2 m⁻¹].

From the rotations we derived our pose error measure, the aspect error. The aspect error is the angle of the compound rotation required to align the coordinate frame of the ground truth with the coordinate frame of the estimate. For small angles and frontal views this is approximately the sum of the errors in α, δ and ρ.
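The angle of the compound rotation can be computed from two rotation matrices; a sketch (the matrices here are hypothetical examples):

```python
import numpy as np

def aspect_error_deg(R_true, R_est):
    """Angle of the single rotation aligning the estimated frame with the true one."""
    R = R_est.T @ R_true
    # for a rotation matrix, trace(R) = 1 + 2*cos(angle)
    cos_a = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_a))

# demo: a 5 degree rotation about the z axis vs. the identity
t = np.radians(5.0)
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0,        0.0,       1.0]])
print(aspect_error_deg(np.eye(3), Rz))   # → 5.0
```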

As in previous work, we always assume that an initial guess for α, δ, ρ, x, y and the scale f/d has been provided by either a manual procedure or another program. When randomizing test cases we simulate this condition by randomly scattering the parameters in question. Examples of the low accuracy of the initial guess can be seen in Figs. 4 and 5.

4 Experiments

4.1 Test Settings

We differentiate between three types of test settings: realistic, synthetic and semi-synthetic. The realistic test setting is an application of the system to real photos; while being the most difficult test setting, reconstruction accuracy cannot be quantified unless the test subject is laser scanned.

In the synthetic test setting, geometrically correct contours are generated from random configurations of the model (Fig. 3 (a)). This simulates the condition where feature extraction has been optimally solved (Fig. 1) and an exact match is possible.

In the semi-synthetic test setting, random configurations of the model are rendered photorealistically and edges extracted [Can86]. This mimics the realistic case, except that reconstruction accuracy can be easily measured since the ground truth is known. Additionally, the difficulty of the problem can be controlled, and specific aspects of contour fitting can be investigated by, e.g., enabling or disabling hair texture and hard shadows (see Fig. 3 (b)-(d)).3

3 When using an uncentered simplex with p0 = pG, we observed a bias of the error of the reconstructed parameters toward positive values.


Fig. 3. Hierarchy of the used test settings: (a) synthetic: contour computed from geometry. (b)-(d) semi-synthetic: edges extracted from renderings with increasingly many features – (b) without hair, (c) with hair, (d) with shadows.

4.2 Synthetic Experiments

The purpose of this experiment is to answer our second key question: how much information about shape does a single contour image convey in the best imaginable case?

We achieved best results when matching 30 principal components with an unmodified Euclidean Distance Transform (the modifications in 3.1 are designed for the semi-synthetic case). On a large test set with 500 test cases, the distance term was under 0.5px for 90% of the results (Fig. 4).

The aspect errors of the fits are on average very small: 92% with aspect error ea < 3°. Please note that to a first approximation ea ≈ eα + eδ + eρ. Therefore the pose is indeed very well estimated.

In terms of shape error, the fits are not significantly nearer to the ground truth than the mean is. It is worth mentioning that this by no means signifies that the recovered head is “practically random”. Let p, q be independent random heads, where all components pi, qi are taken from a normal distribution with µ = 0 and σ = 1. With E[(pi − qi)²] = 2 and (1), it can easily be shown that E[L2(p, q)²] = 2 E[L2(p, 0)²]. Therefore L2(fit, ground truth) would be on average √2 times larger than L2(ground truth, mean) if the reconstructed heads were indeed random.

Nevertheless, the fact that the mean is almost as similar in L2 terms as the reconstructed heads is very interesting. Obviously reconstructed heads can be very different from the ground truth while displaying virtually the same contours (Fig. 4). This supports the view that contour does not constrain shape tightly, even with a holistic head model.
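The √2 argument can be checked numerically; a Monte Carlo sketch with arbitrary eigenvalues in place of the model's λi:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = rng.uniform(0.5, 2.0, size=30)   # arbitrary positive eigenvalues

N = 20000
p = rng.normal(size=(N, 30))           # random heads, components ~ N(0, 1)
q = rng.normal(size=(N, 30))

# E[L2(p,q)^2] / E[L2(p,0)^2]; the 1/n factor of Eq. (1) cancels
ratio = (lam * (p - q) ** 2).sum(axis=1).mean() / (lam * p ** 2).sum(axis=1).mean()
print(ratio)   # ≈ 2, i.e. E[L2(p,q)^2] = 2 E[L2(p,0)^2]
```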


[Fig. 4 percentile plots: distance term [px]; aspect error [°] for fit vs. initial guess; shape error L2 [mm] for L2(fit, ground truth) vs. L2(ground truth, mean).]

Fig. 4. Synthetic Case: (a) Matched contour (black), input contour (white, mostly covered by matched contour), contour of initial guess (gray). (b),(c) ground truth (actual camera parameters, frontal view). (d),(e) reconstruction (estimated camera parameters, frontal view). Comparing the 3D reconstruction to the ground truth demonstrates that even near-perfect fitting of the contour may result in very dissimilar 3D shape. Distance error and aspect error were near the median (edist = 0.29px, easp = 1.47°, eL2 = 3.61mm).

More evidence is provided by a look at correlations: while aspect error correlates strongly with contour distance (correlation coefficient c = 0.62), L2 error correlates less (c = 0.44).4

4.3 Semi-Synthetic Experiments

The purpose of this experiment is to see whether we can formulate the distance function in such a way that comparing an edge map to model contours automatically leads to correct correspondence between contours and edges.

We used the same test set as in the previous section, and generated edge maps for the four “unwanted feature” combinations: test persons with/without hair (predicate H), hard shadows enabled/disabled (predicate S). For brevity we notate e.g. the case “with hair, shadows disabled” as H,¬S.

Image Errors. Since the input is now an edge map, which is generally quite different from the synthetic contour R(pT) (in the notation of 3.1, with pT the parameters of the ground truth), we are now faced with a residual error eres = D(I, R(pT)), eres ≥ 0. This residual error is highly dependent on the nature of each test case as well as on edge extraction parameters. Therefore it would be a mistake to quote the distance error ed = D(I, R(pR)) of a result vector pR, as its value does not describe reconstruction quality.

4 To determine correlation, results with a distance term > 1px (1.6% of results) were excluded.

[Fig. 5 percentile plots: reconstruction error [px], aspect error [°] and shape error L2 [mm] for the settings synth., ¬H,¬S, ¬H,S, H,¬S, H,S, H,¬S,V and H,S,V, compared against the initial guess and against L2(ground truth, mean).]

Fig. 5. Semi-Synthetic Case (H, S): Characteristic test result with vertex selection; reconstruction, aspect and L2 error near the median (erec = 0.97px, easp = 3.99°, eL2 = 3.18mm). (a) Matched contour (black), input contour (white), contour of initial guess (gray). (b),(c) ground truth (actual camera parameters, frontal view). (d),(e) reconstruction (estimated camera parameters, frontal view).

Instead we quote the reconstruction error erec = D(R(pT), R(pR)), the difference between the resulting contour and the synthetic input contour. This error measures the performance of the algorithm in matching the model to the unknown true contour.

However, we must be aware that the algorithm can only minimize ed and never erec (for which the ground truth must be known). If there is a solution with ed < eres, the system will prefer it, even if it means erec becomes larger.


Reconstruction. Our foremost goal is to reduce the above reconstruction error, erec. In a comparison of the three variants of the distance transform described above, “linear with plateau” was best in every respect. Especially the non-zero gradient of the exponential variant proved troublesome with incomplete input. The plateau constant c was chosen depending on the amount of noise in the input (c = 25 for ¬H,¬S, down to c = 5 for H,S).

Results were best if only 20 principal components were matched. Be aware, though, that if p is a random shape vector, E[L2(p, 0)] = 3.26mm, while for q with qi = pi (1 ≤ i ≤ 20) and qi = 0 otherwise, E[L2(q, 0)] = 3.14mm. So on average the first 20 components carry 96% of the L2 “content”.

An interesting result in itself is that the presence of hard shadows made virtually no difference to reconstruction accuracy (Fig. 5) – when looking again at Fig. 3 (c) and (d), this makes a strong case for contour matching, considering how different the lighting is between these two pictures.

On the other hand, hair had a very destructive influence on reconstruction quality, mainly because our model is an entire head, and in many test cases the silhouette of the head was matched against the hair line of the input image. For this reason we include results where the model was artificially restricted to the face area (“vertex selection”, predicate V).

Results for both aspect and shape error (Fig. 5) are in line with expectations from the reconstruction errors. The results are by no means bad in absolute terms (Fig. 5), but compared to the synthetic case much accuracy is lost.

5 Conclusions

We have shown that it is possible to compute a shape and a pose of a human face explaining a given input contour with consistently very high accuracy. While the original pose is recovered very well, the recovered shape often differs greatly from the ground truth. This predestines contour matching as part of a system combining several features, as it can ensure contour consistency while not imposing tight constraints on the shape.

More difficult is the case where the input contour is replaced by an edge map from a picture, because of the different nature of edges and model contours. While it is possible to design a distance function that solves correspondence implicitly, there are limits to robustness where an unwanted edge is too close to a missed contour. We showed that restricting fitting to areas where such misunderstandings are least likely to occur does alleviate the problem, but would like to point out that the scope of the restriction is arbitrary and has to be appropriate for all test cases simultaneously.

In our problem description we listed three basic options for overcoming the edge-contour correspondence problem: (a) using a robust distance function, (b) using the model to establish correspondence, (c) using a more elaborate feature detection process. In this paper we thoroughly explored the first option, showing on the one hand that despite the simplicity of the method, good results can be achieved most of the time. On the other hand, certain kinds of false correspondences are difficult to avoid without exploring the other options, which will be our focus in future work.

Furthermore we have laid out that different types of contours require different optimization techniques; we believe that combining subsystems for different types of contours may prove to be a powerful concept.

Acknowledgements

This work was supported by a grant from the Swiss National Science Foundation, 200021-103814.

References

[Nel65] J. A. Nelder, R. Mead, “A Simplex Method for Function Minimization,” Comput. J., 7:308–313, 1965.

[Can86] J. Canny, “A computational approach to edge detection,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 8:679–698, November 1986.

[Mog03] B. Moghaddam, J. Lee, H. Pfister, R. Machiraju, “Model-based 3-D face capture with shape-from-silhouettes,” IEEE International Workshop on Analysis and Modeling of Faces and Gestures, Nice, France, pages 20–27, October 2003.

[Bla99] V. Blanz, Th. Vetter, “A morphable model for the synthesis of 3D faces,” SIGGRAPH ’99, pages 187–194, New York, NY, USA, 1999. ACM Press/Addison-Wesley Publishing Co.

[Bla03] V. Blanz, Th. Vetter, “Face Recognition Based on Fitting a 3D Morphable Model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9), 2003.

[Fel04] P. Felzenszwalb, D. Huttenlocher, “Distance transforms of sampled functions,” Cornell Computing and Information Science Technical Report TR2004-1963, September 2004.

[Koe90] J. J. Koenderink, “Solid Shape,” The MIT Press, 1990.

[Pre99] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, “Numerical Recipes in C: The Art of Scientific Computing,” Cambridge University Press, 1999.

[Rom05] S. Romdhani, Th. Vetter, “Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior,” CVPR ’05, San Diego, 2:986–993, 2005.

[Roy05] A. Roy-Chowdhury, R. Chellappa, H. Gupta, “3D Face Modeling From Monocular Video Sequences,” in Face Processing: Advanced Modeling and Methods (eds. R. Chellappa and W. Zhao), Academic Press, 2005.

[Ili05] S. Ilic, M. Salzmann, P. Fua, “Implicit surfaces make for better silhouettes,” CVPR ’05, San Diego, 1:1135–1141, 2005.

[Zha00] W. Zhao, R. Chellappa, A. Rosenfeld, P. Phillips, “Face recognition: A literature survey,” UMD CfAR Technical Report CAR-TR-948, 2000.

[Sid98] K. Siddiqi, A. Shokoufandeh, S. J. Dickinson, S. W. Zucker, “Shock Graphs and Shape Matching,” International Conference on Computer Vision (ICCV ’98), pages 222–229, 1998.