

Pose-Robust Albedo Estimation from a Single Image ∗

Soma Biswas and Rama Chellappa
Department of Electrical and Computer Engineering and CfAR,
University of Maryland, College Park
{soma, rama}@cfar.umd.edu

Abstract

We present a stochastic filtering approach to perform albedo estimation from a single non-frontal face image. Albedo estimation has far-reaching applications in various computer vision tasks like illumination-insensitive matching, shape recovery, etc. We extend the formulation proposed in [3], which assumes a face in known pose, and present an algorithm that can perform albedo estimation from a single image even when pose information is inaccurate. The 3D pose of the input face image is obtained as a byproduct of the algorithm. The proposed approach utilizes class-specific statistics of faces to iteratively improve albedo and pose estimates. Illustrations and experimental results are provided to show the effectiveness of the approach. We highlight the usefulness of the method for the task of matching faces across variations in pose and illumination. The facial pose estimates obtained are also compared against ground truth.

1. Introduction

Albedo at a surface point is defined as the fraction of light that is reflected by the point. One of the earliest efforts for albedo estimation can be traced back to the lightness algorithms, which follow a filtering approach to separate different frequency components [7]. Since then, albedo estimation has often been coupled with the task of shape recovery [19][22], making the accuracy of the estimated albedo depend on the accuracies of shape and illumination estimates. Recently, an approach based on an image formation model has been proposed for robust estimation of albedo from a single face image [3]. The approach uses a stochastic filtering framework for handling errors due to inaccuracies

∗This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), through the Army Research Laboratory (ARL). All statements of fact, opinion or conclusions contained herein are those of the authors and should not be construed as representing the official views or policies of IARPA, the ODNI, or the U.S. Government.

in the surface normals and light source direction to estimate albedo across a wide range of challenging illumination conditions. One limitation of the approach is that it requires accurate knowledge of the pose of the face, which may not typically be available. Since the ultimate goal is to recognize faces in real, unconstrained scenarios, it may not be realistic to assume either a frontal pose or accurate knowledge of the pose, as facial pose estimation is by itself a challenging research problem [13].

In this paper, we extend the formulation in [3] to account for inaccurate pose information in addition to inaccuracies in light source and surface normal information. The proposed approach is an image estimation framework that utilizes class-specific statistics of the imaged object to iteratively improve pose and albedo estimates. In each iteration, given the current albedo estimate, the 3D facial pose is estimated by solving a linear Least-Squares (LS) problem, which is then used to further improve the albedo estimate, and so on. The input to the algorithm is a face image in which the face and eyes are automatically located using OpenCV's Haar-based detectors.

Extensive experiments have been performed to evaluate the usefulness of the proposed approach. Experimental results on synthetic data in varying poses are provided to show the accuracy of the albedo and 3D pose estimates for different unknown poses. To show the usefulness of the estimated albedo maps as illumination-insensitive measures, the estimated albedo maps are used for the task of face recognition across pose and illumination variations. We also provide comparisons with ground truth for the estimated 3D facial poses. Experiments on unconstrained real face images from the web further highlight the effectiveness of the approach.

The rest of the paper is organized as follows. Section 2 discusses a few related works. The proposed albedo and pose estimation framework is described in Section 3. The details of the proposed algorithm are given in Section 4. The results of experimental evaluation are presented in Section 5. The paper concludes with a summary and discussion.



2. Previous Work

This section discusses some of the related work on albedo estimation and its applications to matching faces across pose and illumination variations. Zhao and Chellappa [23] utilize domain-specific knowledge in a Shape-from-Shading (SFS) framework to recover shape and albedo for the class of bilaterally symmetric objects with a piecewise constant albedo field. Smith and Hancock [19] also use a statistical model of facial shape in an SFS formulation to estimate shape. Albedo is then computed as the residual that accounts for the difference between predicted and observed image brightness.

Since one of the main applications of albedo estimation is as an illumination-insensitive signature for representing and recognizing faces across illumination variations, much of the work on albedo estimation has taken place in the context of face recognition. Blanz and Vetter [4] propose a 3D morphable model approach in which a face is represented using a linear combination of basis exemplars. The shape and albedo parameters of the model are computed by fitting the morphable model to the input image. An efficient and robust algorithm for fitting a 3D morphable model to input images using shape and texture error functions was proposed by Romdhani et al. [15]. Zhang and Samaras [22] combine the spherical harmonics illumination representation with 3D morphable models [4]; an iterative approach is used to compute albedo and illumination coefficients using the estimated shape. Liu and Chen [12] propose a geometric approach in which they approximate a human head with a 3D ellipsoid model, and recognition is performed by comparing texture maps obtained by projecting the training and test images to the surface of the ellipsoid. Zhou et al. [24] impose a rank constraint on shape and albedo maps to separate the two from illumination using a factorization approach.

Other than the albedo-based methods, several approaches have been proposed for face recognition across pose variations. Due to space constraints, we provide pointers only to a few of the recent approaches. For face recognition across pose, local patches are considered more robust than the whole face, and several patch-based approaches have been proposed [8][2][11]. In a recent paper, Prince et al. [14] propose a generative model for generating the observation space from the identity space using an affine mapping and pose information. Yue et al. [21] extend the spherical harmonics representation to encode pose information. Castillo and Jacobs [6] propose using the cost of stereo matching for 2D face recognition across pose without performing 3D reconstruction.

Notation: Throughout the paper, ρ, n, s, Θ denote the true unknown albedo, surface normals, illuminant direction and pose of the object, while ρ̄, n̄, s̄, Θ̄ represent the initial estimates of the corresponding variables.

3. Albedo Estimation from a Single Image

For the class of Lambertian objects, the diffuse component of the surface reflection is modeled using Lambert's cosine law

\[ I = \rho \max(n^T s, 0) \qquad (1) \]

where I is the pixel intensity, s is the light source direction, ρ is the surface albedo and n is the surface normal at the corresponding point. The max function in the relation accounts for the formation of attached shadows.

Let n̄_{i,j} and s̄ be some initial estimates of the surface normals and illuminant direction, respectively, and let Θ̄ represent the initial knowledge of the pose. The Lambertian assumption imposes the following constraint on the initial albedo ρ̄ obtained at pixel (i, j)

\[ \bar{\rho}_{i,j} = \frac{I_{i,j}}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}} \qquad (2) \]

where · is the standard dot product operator and n̄^{Θ̄}_{i,j} denotes the initial estimate of the surface normal at pixel (i, j) in pose Θ̄. In most real applications, the input is only a single intensity image, so we do not have accurate estimates of the pose, surface normals and light source direction. Inaccuracies in these initial estimates lead to considerable errors in the initial albedo estimate (Figure 1).
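To make the initialization concrete, below is a minimal NumPy sketch of the rough albedo computation in (2). The function name, array layout, and the masking of near-shadow pixels via `eps` are our own illustrative choices, not part of [3].

```python
import numpy as np

def initial_albedo(image, normals, s_bar, eps=1e-6):
    """Rough albedo map via Eq. (2): rho_bar = I / (n_bar . s_bar).

    image   : (H, W) intensity image
    normals : (H, W, 3) unit surface normals of the average face model,
              rendered at the assumed (initial) pose
    s_bar   : (3,) initial light-source direction estimate
    """
    shading = normals @ s_bar            # per-pixel dot product n_bar . s_bar
    # Eq. (2) is undefined where the dot product vanishes (attached
    # shadows); mask those pixels instead of dividing by ~0.
    valid = shading > eps
    rho_bar = np.zeros_like(image, dtype=float)
    rho_bar[valid] = image[valid] / shading[valid]
    return rho_bar, valid
```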

Figure 1. Illustration of errors in albedo due to errors in surface normals, illuminant direction and pose. (a) Input image; (b) True albedo; (c) Albedo estimate using the average facial surface normals, estimated illuminant direction and true pose; (d) Error map for (c); (e) Albedo estimate using true values of surface normal and illuminant direction and assuming frontal pose; (f) Error map for (e) due to inaccuracies in pose information.

As shown in the figure, even if the surface normals and illuminant direction are accurately known, error in pose information can result in unacceptable errors in the albedo map. In [3], an image estimation formulation was proposed to account for the inaccuracies in the surface normals and the light source direction, but the pose was assumed to be known a priori. In this work, we extend the framework to address the more general scenario where the pose is unknown. As a byproduct of the formulation, we also obtain an estimate of the 3D pose, which is itself a challenging problem and an active area of research [13].


3.1. Image Estimation Formulation

Here we formulate the image estimation framework to obtain a robust albedo estimate from the initial albedo map, which is erroneous due to inaccuracies in the pose, surface normal and light source estimates. The expression in (2) can be rewritten as follows

\[ \bar{\rho}_{i,j} = \frac{I_{i,j}}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}} = \rho_{i,j}\,\frac{n^{\Theta}_{i,j} \cdot s}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}} \qquad (3) \]

where ρ, n and s are the true unknown albedo, normal and illuminant direction, respectively, and Θ denotes the true unknown pose. ρ̄_{i,j} can further be expressed as follows

\[ \bar{\rho}_{i,j} = \rho_{i,j}\,\frac{\bar{n}^{\Theta}_{i,j} \cdot \bar{s}}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}} + \frac{n^{\Theta}_{i,j} \cdot s - \bar{n}^{\Theta}_{i,j} \cdot \bar{s}}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}}\,\rho_{i,j} \qquad (4) \]

We substitute

\[ w_{i,j} = \frac{n^{\Theta}_{i,j} \cdot s - \bar{n}^{\Theta}_{i,j} \cdot \bar{s}}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}}\,\rho_{i,j}, \qquad h_{i,j} = \frac{\bar{n}^{\Theta}_{i,j} \cdot \bar{s}}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}} \qquad (5) \]

So equation (4) simplifies to

\[ \bar{\rho}_{i,j} = \rho_{i,j} h_{i,j} + w_{i,j} \qquad (6) \]

This can be identified with the standard image estimation formulation [1]. Here ρ is the original signal (the true albedo), the rough albedo estimate ρ̄ is the degraded signal, and w is signal-dependent additive noise. When the head pose is known accurately, i.e., when Θ̄ = Θ, we have h_{i,j} = 1, so this is a generalization of the formulation proposed in [3] to the case of unknown head pose.

4. Albedo Estimate

Several methods have been proposed in the literature to solve image estimation equations of the form (6). Here we compute the Linear Minimum Mean Squared Error (LMMSE) albedo estimate, which is given by [17]

\[ \rho^{est} = E(\rho) + C_{\rho\bar{\rho}}\,C_{\bar{\rho}}^{-1}\,\big(\bar{\rho} - E(\bar{\rho})\big) \qquad (7) \]

Here C_{ρρ̄} is the cross-covariance matrix of ρ and ρ̄, and E(ρ̄) and C_{ρ̄} are the ensemble mean and covariance matrix of ρ̄, respectively. The LMMSE filter requires the second-order statistics of the signal and noise.

From (5), the expression for the signal-dependent noise w_{i,j} can be rewritten as follows

\[ w_{i,j} = \frac{(n^{\Theta}_{i,j} - \bar{n}^{\Theta}_{i,j}) \cdot s + \bar{n}^{\Theta}_{i,j} \cdot (s - \bar{s})}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}}\,\rho_{i,j} \qquad (8) \]

Assuming the errors in illumination and surface normals to be unbiased, the noise w is zero-mean. Under this assumption, the expressions for C_{ρρ̄} and C_{ρ̄} simplify (details in the supplementary material) to

\[ C_{\rho\bar{\rho}} = C_{\rho} H^T \quad \text{and} \quad C_{\bar{\rho}} = H C_{\rho} H^T + C_w \qquad (9) \]

where H is the diagonal matrix containing the h_{i,j} for the entire image as its diagonal entries and C_w is the covariance matrix of the noise term.
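For completeness, here is a brief sketch of why (9) holds (the full derivation is in the supplementary material): writing (6) in vector form as ρ̄ = Hρ + w, the unbiasedness assumption gives E[w | ρ] = 0, so the cross terms between ρ and w vanish and

\[ \begin{aligned} C_{\rho\bar{\rho}} &= E\big[(\rho - E\rho)(\bar{\rho} - E\bar{\rho})^T\big] = E\big[(\rho - E\rho)\big(H(\rho - E\rho) + w\big)^T\big] = C_{\rho} H^T, \\ C_{\bar{\rho}} &= E\big[(\bar{\rho} - E\bar{\rho})(\bar{\rho} - E\bar{\rho})^T\big] = H C_{\rho} H^T + C_w. \end{aligned} \]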

We assume a Non-stationary Mean Non-stationary Variance (NMNV) model for the original signal, which has been shown to be a reasonable assumption in many applications [9]. Under this model, the original signal is characterized by a non-stationary mean and a diagonal covariance matrix with non-stationary variance. Under the NMNV assumption, the LMMSE filtered output (7) simplifies (details in the supplementary material) to the following scalar (point) processor

\[ \rho^{est}_{i,j} = E(\rho_{i,j}) + \alpha_{i,j}\big(\bar{\rho}_{i,j} - E(\bar{\rho}_{i,j})\big), \quad \text{where } \alpha_{i,j} = \frac{\sigma^2_{i,j}(\rho)\,h_{i,j}}{\sigma^2_{i,j}(\rho)\,h^2_{i,j} + \sigma^2_{i,j}(w)} \qquad (10) \]

where σ²_{i,j}(ρ) and σ²_{i,j}(w) are the non-stationary signal and noise variances, respectively. Since the noise w is zero-mean, E(ρ̄_{i,j}) = h_{i,j} E(ρ_{i,j}). Therefore, (10) can be written as

\[ \rho^{est}_{i,j} = (1 - h_{i,j}\alpha_{i,j})\,E(\rho_{i,j}) + \alpha_{i,j}\,\bar{\rho}_{i,j} \qquad (11) \]

So the LMMSE albedo estimate is a weighted sum of the ensemble mean E(ρ) and the observation ρ̄, where the weight depends on the ratio of the signal variance to the noise variance. We now derive the different entities in the expression for the albedo estimate.
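As an illustration, the point processor (11), together with the noise variance (12) derived below, reduces to a few per-pixel array operations. This is a sketch with hypothetical function and argument names; the class statistics are assumed to be given.

```python
import numpy as np

def noise_variance(var_n, var_s, shading, second_moment_rho):
    """Non-stationary noise variance of Eq. (12).

    var_n             : (H, W) per-direction error variance of the normals
    var_s             : scalar error variance of the illuminant direction
    shading           : (H, W) denominator term n_bar(Theta_bar) . s_bar
    second_moment_rho : (H, W) class statistic E(rho^2)
    """
    return (var_n + var_s) / shading**2 * second_moment_rho

def lmmse_albedo(rho_bar, mean_rho, var_rho, h, var_w):
    """Point-wise LMMSE filter of Eqs. (10)-(11).

    rho_bar  : (H, W) rough albedo observation
    mean_rho : (H, W) ensemble mean E(rho) from class statistics
    var_rho  : (H, W) non-stationary signal variance sigma^2(rho)
    h        : (H, W) multiplicative term h_ij of Eq. (13)
    var_w    : (H, W) noise variance from noise_variance()
    """
    alpha = var_rho * h / (var_rho * h**2 + var_w)
    # Weighted combination of the class mean and the observation, Eq. (11)
    return (1.0 - h * alpha) * mean_rho + alpha * rho_bar
```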

4.1. Expression for the Noise Variance

From (8), assuming the errors in the surface normal (n_{i,j} − n̄_{i,j}) to be uncorrelated in the x, y and z directions with equal variances, the expression for the noise variance can be shown to be (details in the supplementary material)

\[ \sigma^2_{i,j}(w) = \frac{\sigma^2_{i,j}(n) + \sigma^2(s)}{(\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s})^2}\,E(\rho^2_{i,j}) \qquad (12) \]

Here σ²_{i,j}(n) and σ²(s) are the error variances in each direction of the surface normal and light source direction, respectively.

4.2. Expression for h_{i,j}

The expression for h_{i,j} is given by

\[ h_{i,j} = \frac{\bar{n}^{\Theta}_{i,j} \cdot \bar{s}}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}} = 1 + \frac{(\bar{n}^{\Theta}_{i,j} - \bar{n}^{\bar{\Theta}}_{i,j}) \cdot \bar{s}}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}} \qquad (13) \]

The term h_{i,j} depends on the difference between the surface normals corresponding to pixel location (i, j) in the initial pose and in the true pose. The term is present because an incorrect pose Θ̄, different from the true unknown pose Θ, is used to compute the initial albedo.


Let Figure 2(a) represent the initial pose Θ̄ and Figure 2(b) the true pose Θ, and consider the surface points corresponding to the same pixel location (i, j) in the two poses. Let P_1 be the surface point of the face in the initial pose that corresponds to pixel (i, j) (which is P′_1 in the true pose), and let P′_2 be the surface point in the true pose for the same pixel (i, j) (which is P_2 in the initial pose). P_1 and P′_2 correspond to the same pixel location but are physically different surface points, since the initial pose differs from the true pose. Let us assume that the initial pose and the true pose are related by (Ω, T), where Ω = [Ω_x, Ω_y, Ω_z] denotes the rotation about the centroid of the face and T = [T_x, T_y, T_z] denotes the translation of the centroid.

Figure 2. Illustration to explain the relation between the surface normals of two different surface points corresponding to the same pixel location. (a) Initial pose; (b) True pose. Here P_1 and P′_2 correspond to the same pixel location (i, j), though they are physically different points.

In this case, the difference between the normals can be expressed as

\[ \Delta n = n_{P'_2} - n_{P_1} = J_{P_1}\Delta + \Delta n_{P_2, P'_2} \qquad (14) \]

Here, Δ = P_2 − P_1 is the difference in the coordinates of P_2 and P_1, and J_{P_1} is the Jacobian matrix of the surface normal n_{P_1} at surface point P_1. The term Δn_{P_2,P′_2} denotes the difference in surface normals between n_{P_2} and n_{P′_2}. The first term conveys that P_2 is a different surface point from P_1, and the second term accounts for the fact that the surface normal n_{P′_2} is a rotated version of the surface normal n_{P_2}.

In [20], Xu and Roy-Chowdhury use a similar equation to relate different frames of a video sequence when the object under consideration is undergoing rotation and translation. They showed that under a small-motion assumption, the difference in surface normals can be expressed as a linear function of the object motion variables, i.e., (14) can be expressed as

\[ \Delta n_{i,j} = A_{i,j}\Omega + B_{i,j}T \qquad (15) \]

where the matrices A and B can be computed from the average surface normal at the initial pose; the exact expressions are given in Appendix A. Figure 3 illustrates how well the linear expression for Δn_{i,j} approximates the true difference n̄^{Θ}_{i,j} − n̄^{Θ̄}_{i,j} for the average 3D face model. The figure shows the average angular errors due to the linear approximation of Δn_{i,j} for different values of pitch and yaw. We see that for small rotations the error is quite small, which means that the approximation is quite good. Using (15), the expression for h_{i,j} can be written in terms of the rotation and translation (Ω, T) as

\[ h_{i,j} = 1 + \frac{(A_{i,j}\Omega + B_{i,j}T) \cdot \bar{s}}{\bar{n}^{\bar{\Theta}}_{i,j} \cdot \bar{s}} \qquad (16) \]
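For illustration, (16) can be evaluated for the whole image in vectorized form, assuming the per-pixel matrices A_{i,j} and B_{i,j} of Appendix A have been precomputed; the names and array layout below are our own.

```python
import numpy as np

def h_map(A, B, omega, t, s_bar, shading):
    """Evaluate h_ij of Eq. (16) under the linearization (15).

    A, B    : (H, W, 3, 3) per-pixel matrices from Appendix A
    omega   : (3,) rotation about the face centroid
    t       : (3,) translation of the centroid
    s_bar   : (3,) illuminant direction estimate
    shading : (H, W) n_bar(Theta_bar) . s_bar
    """
    delta_n = A @ omega + B @ t        # (H, W, 3) linearized normal change
    return 1.0 + (delta_n @ s_bar) / shading
```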

Figure 3. Average angular errors in surface normals (deg) for the average 3D face model due to the linear approximation of n̄^{Θ}_{i,j} − n̄^{Θ̄}_{i,j} in (15), plotted over pitch and yaw (deg).

4.3. Algorithm for Albedo and Pose Estimation

In this section, we describe the proposed algorithm for estimating the unknown albedo map and the pose using the described formulation. From (10), (12) and (16), we can express the LMMSE albedo estimate as a function of pose and class-based statistics as follows

\[ \rho^{est} = f(S, \Theta) \qquad (17) \]

where S represents the various statistics E(ρ_{i,j}), σ²_{i,j}(ρ) and σ²_{i,j}(w), and Θ represents the pose, which is given by the rotation Ω and translation T. The statistics implicitly depend on the facial pose. If the pose is known, the LMMSE albedo estimate can be computed using the above relation, and vice versa. Based on this, we propose an iterative algorithm that alternately estimates albedo and pose.

The input to the algorithm is a single intensity image and some initial estimates of surface normals and pose. In all our experiments, we use an average 3D face model as the initial estimate of the surface normals, and the initial pose is assumed to be frontal. Given the image, OpenCV Haar-based detectors are used to obtain face and eye locations that provide an initial localization of the face region. Using the average shape and initial pose information, we obtain an initial estimate of the illuminant direction


as follows [5]

\[ \bar{s} = \Big( \sum_{i,j} \bar{n}^{\bar{\Theta}}_{i,j}\,\bar{n}^{\bar{\Theta}\,T}_{i,j} \Big)^{-1} \sum_{i,j} I_{i,j}\,\bar{n}^{\bar{\Theta}}_{i,j} \qquad (18) \]

where n̄^{Θ̄}_{i,j} is the average facial surface normal at the initial pose Θ̄. The required class statistics S are computed based on the initial pose information using Vetter's 3D face data [4].
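The initialization (18) is simply the normal-equations solution of the per-pixel linear system I_{i,j} ≈ n̄^{Θ̄}_{i,j} · s. A minimal sketch follows (hypothetical names; a real implementation would restrict the sums to valid face pixels):

```python
import numpy as np

def estimate_light(image, normals):
    """Initial illuminant direction via Eq. (18).

    image   : (H, W) intensity image
    normals : (H, W, 3) average facial surface normals at the initial pose
    """
    n = normals.reshape(-1, 3)
    i = image.reshape(-1)
    lhs = n.T @ n                      # 3x3: sum over pixels of n n^T
    rhs = n.T @ i                      # 3-vector: sum over pixels of I n
    return np.linalg.solve(lhs, rhs)
```

The rest of the algorithm proceeds as follows.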

1. Using the current estimate of the pose, the LMMSE albedo estimate ρ^{est} is computed using (11).

2. If the current pose estimate is very different from the true unknown pose, the current albedo estimate can be quite erroneous. So we perform a regularization step in which the current albedo estimate is projected onto a statistical albedo model to ensure that the resulting albedo map lies within the space of allowable facial albedo maps. In our implementation, we use a standard Principal Component Analysis (PCA)-based linear statistical model computed from Vetter's facial data [4] to perform this regularization. Let the regularized albedo map be denoted by ρ^{reg}.

To avoid computing the statistical model for every intermediate pose, we bring the albedo map to the frontal pose before regularization. The albedo map at the frontal pose, ρ^{frontal}, is related to the albedo map at the current pose, ρ^{est}, as follows

\[ \rho^{frontal}_{i,j} = \rho^{est}_{i,j} - \Delta\rho_{i,j} \qquad (19) \]

From Figure 2, the albedo changes from P_1 to P_2 but is the same for P_2 and P′_2. Therefore, Δρ = ρ_{P′_2} − ρ_{P_1} = ∇ρ_{P_1} Δ, where ∇ρ_{P_1} is the gradient of ρ at point P_1. Δρ_{i,j} can further be approximated as [20]

\[ \Delta\rho_{i,j} = C_{i,j}\Omega + D_{i,j}T \qquad (20) \]

where the variables C and D are computed from the class statistics (details in Appendix A).

3. The regularized albedo map is then used to compute a revised estimate of the pose. From (17), we can express the pose in terms of the albedo estimate as follows

\[ \begin{pmatrix} X_1 & X_2 \end{pmatrix} \begin{pmatrix} \Omega \\ T \end{pmatrix} = (\bar{\rho} - \rho^{reg})\,\bar{n}^{\bar{\Theta}} \cdot \bar{s} \qquad (21) \]

where X_1 = ρ^{reg} s̄^T A + (n̄^{Θ̄} · s̄) C and X_2 = ρ^{reg} s̄^T B + (n̄^{Θ̄} · s̄) D; the subscript i, j has been omitted from (21) for clarity. Equation (21) is solved for the new pose estimate using the LS method (a code sketch is given after this list).

4. If the differences in the albedo and pose estimates between two successive iterations are below a specified threshold, terminate the algorithm and output the current albedo and pose. Otherwise, repeat the iteration using the updated pose and illuminant estimates.
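The least-squares pose update referenced in step 3 can be sketched as follows: each pixel contributes one row of the system (21), and the stacked system is solved for the six motion parameters. All names are illustrative and valid-pixel masking is omitted.

```python
import numpy as np

def update_pose(rho_bar, rho_reg, A, B, C, D, s_bar, shading):
    """Least-squares pose update of Eq. (21).

    rho_bar, rho_reg : (H, W) rough and regularized albedo maps
    A, B             : (H, W, 3, 3) matrices of Eq. (15) / Appendix A
    C, D             : (H, W, 3) row vectors of Eq. (20) / Appendix A
    s_bar            : (3,) illuminant direction estimate
    shading          : (H, W) n_bar(Theta_bar) . s_bar
    """
    sTA = np.einsum('k,ijkl->ijl', s_bar, A)   # (H, W, 3): s_bar^T A_ij
    sTB = np.einsum('k,ijkl->ijl', s_bar, B)
    X1 = rho_reg[..., None] * sTA + shading[..., None] * C
    X2 = rho_reg[..., None] * sTB + shading[..., None] * D
    # One equation per pixel in the six unknowns (Omega, T)
    X = np.concatenate([X1.reshape(-1, 3), X2.reshape(-1, 3)], axis=1)
    b = ((rho_bar - rho_reg) * shading).reshape(-1)
    sol, *_ = np.linalg.lstsq(X, b, rcond=None)
    return sol[:3], sol[3:]                    # (Omega, T)
```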

Figure 4. Flowchart illustrating the proposed algorithm.

As we have seen from Figure 3, the linear approximation for Δn_{i,j} in (15) works well only for small differences between the initial pose and the true pose, which imposes a limit on the pose difference the above algorithm can handle. Experimentally, we have seen that the above algorithm can handle rotations of about 5-6 degrees. To generalize the method to larger pose differences, we de-rotate and de-translate the input image by the estimated rotation and translation after every iteration, and use the new de-rotated and de-translated image as input to the next iteration. This enables the algorithm to handle pose errors of over 30°. Figure 4 shows the different steps of the proposed algorithm. As shown, we obtain pose and albedo (in frontal pose) estimates as the output of the algorithm. The number of iterations required depends on the pose, but we observed that it typically takes around 5-6 iterations for a pose error of around 20°. In all our experimental results, the iterative optimization is terminated when the pose difference between two consecutive iterations becomes less than one degree for all three angles (roll, yaw and pitch). Our MATLAB implementation of the algorithm converges in around 1.5 minutes on a Pentium M 1.60 GHz laptop, most of which is spent in image warping that can potentially be made faster using a GPU-based parallel implementation.

Figure 5. (a) Input image; (b) Initial rough albedo estimate using frontal pose; (c) Estimated 3D pose; (d) Estimated albedo map; (e) True albedo; (f) The derotated images after every iteration.

Figure 5 shows the albedo map and 3D pose obtained using the proposed algorithm for a face image generated using 3D facial data [4]. The derotated images after every iteration are shown in the second row. The black pixels in the derotated images correspond to regions in the original image that are not visible due to the non-frontal pose. The albedo map in (d) is obtained using the pose estimate and the estimated albedo at the frontal pose.

Figure 6. Visualization of the error surface for a synthetically generated face image. (Left) Average per-pixel error in the albedo map for different pose hypotheses (yaw and pitch, in deg); the path taken by our algorithm is shown in red. (Right) Top view of the error surface.

To further illustrate the working of the algorithm, we present the error surface along with the path traversed by the proposed iterative algorithm (Figure 6). The error surface is generated by computing the average per-pixel albedo error of the albedo estimates obtained for different pose hypotheses. The error is minimum at the true pose of 20 degrees yaw. The algorithm starts from the assumption of frontal pose and converges to a pose close to the true pose in 5 iterations (red line in the plot).

Discussion: We now analyze why the proposed algorithm works reliably for pose errors over 30° even though the linear approximation for Δn_{i,j} in (15) appears accurate only for much smaller angles. Note that the error plot in Figure 3 shows errors averaged over the entire face, and we observe that most of these errors come from the nose region. The linear approximation is fairly accurate for angles as large as 30° for facial points that are not close to the nose, making the proposed algorithm capable of dealing with such large pose errors.

5. Experimental Evaluation

Experiment on synthetically generated data: For comparison with ground truth, we first evaluate the proposed approach on images synthetically generated from 3D facial data [4]. Tables 1 and 2 show the average accuracy of the pose and albedo estimates obtained for 1000 images generated under different illumination conditions and poses. For all the images, the initial pose was assumed to be frontal, so the tables show the results of the algorithm for increasing errors in the initial pose. The albedo estimates obtained are significantly more accurate (by around 40%) than the initial noisy maps obtained assuming frontal pose.

Table 1. Average accuracy of the pose estimates (in deg) for synthetic data under different illumination conditions and poses. The results are averaged over 1000 images. The initial pose is always taken to be frontal.

True pose        5°     10°    15°    20°    25°    30°
Yaw    Mean     5.9    10.3   15.2   20.1   24.3   28.8
       Std      1.05    1.3    1.3    1.6    1.5    1.6
Pitch  Mean     5.4    10.3   14.9   20.1   24.9   29.2
       Std      1.3     1.5    1.9    1.6    1.6    2.1
Roll   Mean     4.7     9.7   14.5   19.5   24.1   28.6
       Std      1.3     1.5    1.6    1.5    1.5    1.4

Table 2. Average accuracy of the albedo estimates for the experiment described in Table 1. The entries in the table represent the average per-pixel errors in the albedo estimates.

True pose     5°     10°    15°    20°    25°    30°
Yaw          14.8   14.9   14.4   14.9   14.9   15.1
Pitch        14.3   14.4   15.2   15.4   15.9   16.1
Roll         14.7   14.8   14.8   15.2   15.3   15.9

Recognition across illumination and pose: Figure 7 shows the estimated frontal albedo maps for several images under different illumination conditions and poses for one subject from the PIE dataset [18]. As desired, the albedo maps look quite similar to each other, with much of the illumination and viewpoint difference removed. We further use the estimated albedo maps as illumination- and pose-insensitive signatures in a face recognition experiment on the PIE dataset, which contains face images of 68 subjects taken under several different illumination conditions and poses. The estimated frontal albedo maps are projected onto an albedo PCA space (generated from FRGC training data) to compute similarity between gallery and probe images. Here the gallery images are in frontal pose and frontal illumination f12, and the probe images are in side pose and 21 different illumination conditions. In this experiment, each gallery and probe set contains just one image per subject. Table 3 shows the rank-1 recognition results obtained.
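For concreteness, the matching step described above can be sketched as a PCA projection followed by a nearest-neighbor search. The Euclidean distance and the names below are our assumptions; the paper does not specify the similarity measure used in the PCA space.

```python
import numpy as np

def rank1_match(gallery, probe, mean, basis):
    """Project flattened albedo maps onto a PCA basis and return the
    index of the nearest gallery subject.

    gallery : (N, H*W) flattened gallery albedo maps
    probe   : (H*W,) flattened probe albedo map
    mean    : (H*W,) PCA mean of the training albedos
    basis   : (H*W, K) top-K PCA eigenvectors
    """
    g = (gallery - mean) @ basis       # gallery PCA coefficients
    p = (probe - mean) @ basis         # probe PCA coefficients
    return int(np.argmin(np.linalg.norm(g - p, axis=1)))
```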


Figure 7. Albedo estimates obtained for several images of the same subject from the PIE dataset [18].

Table 3. Recognition results (%) on the PIE dataset [18] for each probe illumination source. The recognition rates of [15][22] are included for comparison.

Method   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22   Avg
[15]    60  78  83  91  89  92  94  97  89  97  98  97  98  97  94  89  85  86  97  98  97  90.8
[22]    81  88  91  89  92  95  93  96  97  98  99  93  94  93  91  92  88  90  94  96  95  92.6
Our     68  84  91  96  97  97  97  97  99  97  99  97  97  97  93  90  96  97  97  96  97  94.2

We see that the proposed algorithm compares favorably with the state-of-the-art [15][22].

Head pose estimation and comparison with ground truth: Figure 8 shows the results of head pose estimation using the proposed algorithm on a set of images from the BU dataset [10]. The sequence has 200 frames, of which we considered every alternate frame. For every frame, we started with the frontal pose as the initial pose. The first row in Figure 8 shows some of the frames from the sequence, and the second row shows the comparison of the pose estimates against the ground truth provided with the dataset. As can be seen, the proposed estimates are quite close to the ground truth (with mean errors of 2.7, 1.3 and 1.2 degrees in pitch, yaw and roll, respectively).

Figure 8. Comparison of the pose estimation results on the BU dataset [10] with the provided ground truth.

We also use the proposed algorithm to estimate albedo and pose on images downloaded from the web with little control over the imaging conditions. Figure 9 shows the albedo and pose estimates obtained.

Figure 9. Row 1: A few images downloaded from the web with automatically detected face and eye locations; Row 2: Estimated 3D head pose; Row 3: Estimated albedo map.

6. Summary and Discussion

In this paper, we have proposed an approach for simultaneous estimation of albedo and 3D head pose from a single image. In all our experiments, we used OpenCV's Haar-based detectors to automatically detect faces and eyes for initial localization. Compared to most state-of-the-art approaches [16], the proposed approach does not require manually marked landmarks and is completely automatic. In addition, the method does not impose any linear statistical constraint on the unknown albedo; the statistical albedo model is used only for regularization. Currently, we do not estimate the 3D shape of the input face image; this will be part of our future research. The proposed algorithm works well for a wide range of poses (around 30° on either side, for a total range of around 60°). Starting with a different canonical pose, the method can be extended to more extreme poses. Multiple illumination sources can also be incorporated in the proposed formulation, as done in [3].

Appendix

A. Expressions for (15) and (20)

Assuming P_1 is the 3D face point corresponding to pixel (i, j) in the initial pose, the expressions for A and B in (15) are given by

\[ A = J_{P_1} M \hat{P}_1 - \hat{n}_{P_1}; \qquad B = -J_{P_1} M \qquad (22) \]

The subscript i, j has been omitted for clarity. Here,

\[ M = I - \frac{1}{n^T_{P_1} u}\, u\, n^T_{P_1} \]

where I is the identity matrix and u is the unit vector in the direction joining the optical center of the camera to the surface point P_1 corresponding to pixel (i, j). The skew-symmetric matrix X̂ of a vector X is

\[ X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \quad\Rightarrow\quad \hat{X} = \begin{pmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{pmatrix} \]

The expressions for C and D in (20) are given by

\[ C = \nabla\rho_{P_1} M \hat{P}_1; \qquad D = -\nabla\rho_{P_1} M \]

The subscript i, j has been omitted for clarity. For derivations of these expressions, readers are referred to [20].
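Putting Appendix A together, a per-pixel computation of A, B from (22) and C, D from (20) might look as follows; the Jacobian J of the normal field at P_1 is supplied by the caller, and all names are illustrative.

```python
import numpy as np

def skew(x):
    """Skew-symmetric matrix X_hat of a 3-vector x (x_hat v = x cross v)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def motion_matrices(J, n_p1, p1, u, grad_rho):
    """Per-pixel A, B of Eq. (22) and C, D of Eq. (20) for one point P1.

    J        : (3, 3) Jacobian of the surface-normal field at P1
    n_p1     : (3,) surface normal at P1
    p1       : (3,) 3D coordinates of P1
    u        : (3,) unit vector from the optical center toward P1
    grad_rho : (3,) albedo gradient at P1
    """
    M = np.eye(3) - np.outer(u, n_p1) / (n_p1 @ u)
    A = J @ M @ skew(p1) - skew(n_p1)
    B = -J @ M
    C = grad_rho @ M @ skew(p1)        # row vector times matrices
    D = -grad_rho @ M
    return A, B, C, D
```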

References

[1] H. C. Andrews and B. R. Hunt. Digital Image Restoration. Prentice-Hall Signal Processing Series, 1977.

[2] A. Ashraf, S. Lucey, and T. Chen. Learning patch correspondences for improved viewpoint invariant face recognition. In IEEE Conf. on Comp. Vision and Pattern Recog., 2008.

[3] S. Biswas, G. Aggarwal, and R. Chellappa. Robust estimation of albedo for illumination-invariant matching and shape recovery. IEEE Trans. on PAMI, 31(5):884–899, May 2009.

[4] V. Blanz and T. Vetter. Face recognition based on fitting a 3D morphable model. IEEE Trans. on PAMI, 25(9):1063–1074, Sept 2003.

[5] M. J. Brooks and B. K. P. Horn. Shape and source from shading. In Proceedings of International Joint Conference on Artificial Intelligence, pages 932–936, Aug 1985.

[6] C. Castillo and D. Jacobs. Using stereo matching for 2-D face recognition across pose. In IEEE Conf. on Comp. Vision and Pattern Recog., pages 1–8, 2007.

[7] B. K. P. Horn. Determining lightness from an image. Comp. Graphics and Image Processing, 3(4):277–299, 1974.

[8] T. Kanade and A. Yamada. Multi-subregion based probabilistic approach toward pose-invariant face recognition. In IEEE International Symp. on Computational Intelligence in Robotics and Automation, pages 954–959, 2003.

[9] D. T. Kuan, A. A. Sawchuk, T. C. Strand, and P. Chavel. Adaptive noise smoothing filter for images with signal-dependent noise. IEEE Trans. on PAMI, 7(2):165–177, March 1985.

[10] M. La Cascia, S. Sclaroff, and V. Athitsos. Fast, reliable head tracking under varying illumination: An approach based on registration of texture-mapped 3D models. IEEE Trans. on PAMI, 22(4):322–336, April 2000.

[11] A. Li, S. Shan, X. Chen, and W. Gao. Maximizing intra-individual correlations for face recognition across pose differences. In IEEE Conf. on Comp. Vision and Pattern Recog., 2009.

[12] X. Liu and T. Chen. Pose-robust face recognition using geometry assisted probabilistic modeling. In IEEE Conf. on Comp. Vision and Pattern Recog., pages 502–509, 2005.

[13] E. Murphy-Chutorian and M. Trivedi. Head pose estimation in computer vision: A survey. IEEE Trans. on PAMI, 31(4):607–626, April 2009.

[14] S. Prince, J. Warrell, J. Elder, and F. Felisberti. Tied factor analysis for face recognition across large pose differences. IEEE Trans. on PAMI, 30(6):970–984, June 2008.

[15] S. Romdhani, V. Blanz, and T. Vetter. Face identification by fitting a 3D morphable model using linear shape and texture error functions. In European Conference on Computer Vision, pages 3–19, 2002.

[16] S. Romdhani, J. Ho, T. Vetter, and D. Kriegman. Face recognition using 3-D models: Pose and illumination. Proceedings of the IEEE, 94(11), November 2006.

[17] A. P. Sage and J. L. Melsa. Estimation Theory with Applications to Communications and Control. McGraw-Hill, 1971.

[18] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression database. IEEE Trans. on PAMI, 25(12):1615–1618, Dec. 2003.

[19] W. A. P. Smith and E. R. Hancock. Recovering facial shape using a statistical model of surface normal direction. IEEE Trans. on PAMI, 28(12):1914–1930, Dec 2006.

[20] Y. Xu and A. Roy-Chowdhury. Integrating motion, illumination, and structure in video sequences with applications in illumination-invariant tracking. IEEE Trans. on PAMI, 29(5):793–806, May 2007.

[21] Z. Yue, W. Zhao, and R. Chellappa. Pose-encoded spherical harmonics for face recognition and synthesis using a single image. EURASIP Journal on Advances in Signal Processing.

[22] L. Zhang and D. Samaras. Face recognition from a single training image under arbitrary unknown lighting using spherical harmonics. IEEE Trans. on PAMI, 28(3):351–363, March 2006.

[23] W. Zhao and R. Chellappa. Symmetric shape from shading using self-ratio image. International Journal of Computer Vision, 45(1):55–75, October 2001.

[24] S. Zhou, R. Chellappa, and D. Jacobs. Characterization of human faces under illumination variations using rank, integrability, and symmetry constraints. In European Conf. on Computer Vision, 2004.