
Research Article
A Robust Shape Reconstruction Method for Facial Feature Point Detection

Shuqiu Tan, Dongyi Chen, Chenggang Guo, and Zhiqi Huang

School of Automation Engineering, University of Electronic Science and Technology of China, No. 2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu 611731, China

Correspondence should be addressed to Shuqiu Tan; tanshuqiu123136@hotmail.com and Dongyi Chen; dychen@uestc.edu.cn

Received 24 October 2016; Revised 18 January 2017; Accepted 30 January 2017; Published 19 February 2017

Academic Editor: Ezequiel López-Rubio

Copyright © 2017 Shuqiu Tan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Facial feature point detection has seen great research advances in recent years. Numerous methods have been developed and applied in practical face analysis systems. However, it is still a quite challenging task because of the large variability in expressions and gestures and the existence of occlusions in real-world photographs. In this paper, we present a robust sparse reconstruction method for face alignment problems. Instead of a direct regression between the feature space and the shape space, the concept of shape increment reconstruction is introduced. Moreover, a set of coupled overcomplete dictionaries, termed the shape increment dictionary and the local appearance dictionary, are learned in a regressive manner to select robust features and fit shape increments. Additionally, to make the learned model more generalized, we select the best matched parameter set through extensive validation tests. Experimental results on three public datasets demonstrate that the proposed method achieves better robustness than state-of-the-art methods.

1. Introduction

In most of the literature, facial feature points are also referred to as facial landmarks or facial fiducial points. These points mainly lie around the edges or corners of facial components such as the eyebrows, eyes, mouth, nose, and jaw (see Figure 1). Existing databases for method comparison are labeled with different numbers of feature points, varying from the minimal 5-point configuration [1] to the maximal 194-point configuration [2]. Generally, facial feature point detection is a supervised or semisupervised learning process that trains a model on a large number of labeled facial images. It starts from a face detection process and then predicts facial landmarks inside the detected face bounding box. The localized facial feature points can be utilized for various face analysis tasks, for example, face recognition [3], facial animation [4], facial expression detection [5], and head pose tracking [6].

In recent years, regression-based methods have gained increasing attention for robust facial feature point detection. Among these methods, a cascade framework is adopted to recursively estimate the face shape S of an input image, which is the concatenation of facial feature point coordinates.

Beginning with an initial shape S^(1), S is updated by inferring a shape increment ΔS from the previous shape:

    ΔS^(t) = W^(t) Φ^(t)(I, S^(t)),    (1)

where ΔS^(t) and W^(t) are the shape increment and linear regression matrix after t iterations, respectively. As the input variables of the mapping function Φ^(t), I denotes the image appearance and S^(t) denotes the corresponding face shape. The regression goes to the next iteration by the additive formula:

    S^(t) = S^(t−1) + ΔS^(t−1).    (2)
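The update rule in (1) and (2) can be sketched as a plain loop; the feature extractor `phi`, the stage count, and the regression matrices below are random placeholders (toy values, not the learned model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: a face shape is the concatenated (x, y) coordinates
# of 29 landmarks, and phi() stands in for shape-indexed feature extraction.
n_points, feat_dim, n_stages = 29, 128, 5

def phi(image, shape):
    """Placeholder for the shape-indexed feature mapping Phi(I, S)."""
    seed = int(abs(np.sum(shape)) * 1e3) % (2**32)
    return np.random.default_rng(seed).standard_normal(feat_dim)

# Per-stage regression matrices W^(t) (random stand-ins for trained ones).
W = [rng.standard_normal((2 * n_points, feat_dim)) * 0.01 for _ in range(n_stages)]

image = None                            # placeholder for the image I
S = rng.standard_normal(2 * n_points)   # initial shape S^(1)

for t in range(n_stages):
    delta_S = W[t] @ phi(image, S)      # Eq. (1): Delta S^(t) = W^(t) Phi^(t)(I, S^(t))
    S = S + delta_S                     # Eq. (2): S^(t+1) = S^(t) + Delta S^(t)
```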

In this paper, we propose a sparse reconstruction method that embeds sparse coding in the reconstruction of the shape increment. As a very popular signal coding algorithm, sparse coding has recently been successfully applied to the fields of computer vision and machine learning, such as feature selection and clustering analysis, image classification, and face recognition [7–11]. In our method, sparse overcomplete dictionaries are learned to encode various facial poses and local textures, considering the complex nature of imaging

Hindawi Publishing Corporation, Computational Intelligence and Neuroscience, Volume 2017, Article ID 4579398, 11 pages, http://dx.doi.org/10.1155/2017/4579398


Figure 1: Schematic diagram of our robust sparse reconstruction method for facial feature point detection. (The diagram shows the training and testing pipeline: shape increment reconstruction with the learned dictionaries D_Δ and D and the regressor H, refining an initial shape S^(1)′ stage by stage.)

conditions. The schematic diagram of the proposed shape increment reconstruction method is illustrated in Figure 1. In the training stage, two kinds of overcomplete dictionaries need to be learned. The first kind is termed the shape increment dictionary, since its atoms consist of typical shape increments in each iteration. The other kind is termed the local appearance dictionary, because its atoms abstract the complex facial feature appearance. In the testing stage, local features are extracted around the shape points of the current iteration and then encoded into feature coefficients using the local appearance dictionary. Thus, shape increments can be reconstructed by the shape increment dictionary and the shape coefficients transformed from the feature coefficients. Considering the holistic performance, we adopt a strategy of alternate verification and local enumeration to find the best parameter set through a large number of experiments. Comparison with three previous methods is evaluated on three publicly available face datasets. Experimental results show that the proposed sparse reconstruction method achieves superior detection robustness compared with the other methods.

The remainder of this paper is organized as follows: related work is introduced in Section 2. The proposed sparse reconstruction method is described in detail in Section 3, and experimental results are compared in Section 4. Finally, we conclude the whole paper in Section 5.

2. Related Work

During the past two decades, a large number of methods have been proposed for facial feature point detection. Among the early methods, the Active Appearance Model (AAM) [12] is a representative parametric model that aims to minimize the difference between the texture sampled from the testing image and the texture synthesized by the model. Later, many improvements and extensions of AAM were proposed [6, 13–20]. To improve efficiency in real-time systems, Tzimiropoulos and Pantic [16] proposed a model to efficiently solve the AAM fitting problem. Tresadern et al. [14] used Haar-like features to reduce computation, which can help mobile devices perform real-time tracking. Nguyen et al. [17–19] observed that AAM fitting easily converges to local minima, and to overcome this problem they designed a new model that learns a cost function having local minima only at desired places. In terms of improving robustness, Huang et al. [6] combined view-based AAM [20] and a Kalman filter to perform pose tracking and used shape parameters to rebuild the view space. Hansen et al. [13] introduced a nonlinear shape model based on the Riemannian elasticity model to handle the problem of poor pose initialization.

Generally, Constrained Local Model- (CLM-) based methods [21–25] learn a group of local experts and then apply various shape priors for refinement. Vogler et al. [21] used the Active Shape Model (ASM) [23] to build a 3D deformable model for real-time tracking. Yu et al. [22] used the mean-shift method [25] to rapidly approach the global optimum. And Liang et al. [24] constrained the structure of facial feature points using the component locations.

The aforementioned methods share the same characteristic of controlling face shape variations through certain parameters. Different from those methods, the regression-based methods [26–28] directly learn the regression function from image appearance to target output. Gao et al. [26] adopt a two-level cascaded boosted regression [27] structure to obtain a vectorial output for all points. To solve the problems of large shape variations and occlusions, Burgos-Artizzu et al. [28] improved their method in three aspects: firstly, they reference pixels by linear interpolation between two landmarks; secondly, the regression model directly embeds the occlusion information for robustness; thirdly, they designed a smart initialization restart scheme to avoid unsuitable random initializations.

Our method belongs to the regression-based methods like [28–31]. However, our work differs from previous methods in several aspects. Firstly, existing methods like [31] acquire the descent direction in a supervised learning manner, whereas in our proposed method the information of descent directions is included in the sparse dictionaries for reconstructing the shape increment. Secondly, unlike the method in [28], our method makes no use of the occlusion information of each feature point. Finally, the method from [30] designed a two-level boosted regression model to infer the holistic face shape; in our regression model, we refine the face shape stage by stage using the two learned coupled dictionaries.

3. Regression-Based Sparse Reconstruction Method

3.1. Problem Formulation. In this paper, the alignment target of all methods is assessed through the following formula:

    ‖S^(final) − S^(ground-truth)‖₂,    (3)

where S^(final) and S^(ground-truth) denote the final estimated shape and the corresponding ground-truth shape of an image, respectively. In the regression-based methods, the iterative equation is formulated as follows:

    S^(k) = S^(k−1) + ΔS^(k−1)*,    (4)

where ΔS^(k−1)* is a variable of the shape increment after k − 1 iterations, and its value should approximate the ground-truth shape increment ΔS^(k−1), where ΔS^(k−1) = S^(ground-truth) − S^(k−1).

3.2. Multi-Initialization and Multiparameter Strategies. Multi-initialization means diversification of the initial iteration shape, which can improve the robustness of the reconstruction model. Specifically, we randomly select multiple ground-truth face shapes from the training set to form a group of initial shapes for the current image. Obviously, the multi-initialization strategy is able to enlarge the training sample size and enrich the extracted feature information, which makes each regression model more robust, while during the testing stage multi-initialization creates more chances to step out of potential local minima that may lead to inaccurate feature point localization.
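As a sketch, the strategy amounts to sampling several training shapes as starting points and fusing the resulting predictions; `run_cascade` and the value K = 10 below are placeholders, not the paper's trained regressor:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training shapes: N faces, 29 landmarks flattened to 58-dim vectors.
train_shapes = rng.standard_normal((811, 58))

def run_cascade(image, init_shape):
    """Stand-in for the learned cascade; returns a refined shape."""
    return init_shape * 0.9  # placeholder update

K = 10  # number of initializations (illustrative value)
inits = train_shapes[rng.choice(len(train_shapes), size=K, replace=False)]

# Run the regressor from each initialization and fuse by averaging,
# which reduces sensitivity to a single bad starting shape.
predictions = np.stack([run_cascade(None, s0) for s0 in inits])
fused_shape = predictions.mean(axis=0)
```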

In our method, there are four key parameters: the size of the feature dictionary, the size of the shape increment dictionary, and their corresponding sparsities. The selection of these four parameters has a direct influence on the learned reconstruction model. Therefore, we conduct a large number of validation tests to find the best matched parameters. Then, according to the validation results, we decide to adopt three sets of parameters to train the model.

3.3. The Learning of Sparse Coding. We use the Orthogonal Matching Pursuit (OMP) [32] algorithm and the K-Singular Value Decomposition (K-SVD) [33] algorithm to find the overcomplete dictionary by minimizing the overall reconstruction error:

    min_{D,γ} ‖S − Dγ‖²_F  subject to ‖γᵢ‖₀ ≤ T₀, ∀i,    (5)

where S is the input data and D and γ denote the sparse dictionary and sparse coefficients, respectively. T₀ defines the number of nonzero values in a coefficient vector and is termed the sparsity.
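A minimal numpy sketch of the objective in (5), with a greedy OMP coder and rank-1 K-SVD atom updates; the signal dimensions, atom count, and sparsity are toy values, and a production implementation would use an optimized OMP:

```python
import numpy as np

def omp(D, x, T0):
    """Orthogonal Matching Pursuit: sparse code of x over dictionary D
    with at most T0 nonzeros (greedy, unoptimized)."""
    residual, support = x.copy(), []
    gamma = np.zeros(D.shape[1])
    for _ in range(T0):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit coefficients on the chosen support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    gamma[support] = coef
    return gamma

def ksvd(S, n_atoms, T0, n_iter=3, seed=0):
    """Minimal K-SVD: alternate OMP coding with rank-1 atom updates."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((S.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        G = np.column_stack([omp(D, s, T0) for s in S.T])  # sparse codes
        for j in range(n_atoms):
            users = np.nonzero(G[j])[0]
            if users.size == 0:
                continue
            # Error matrix without atom j, restricted to signals using it.
            E = S[:, users] - D @ G[:, users] + np.outer(D[:, j], G[j, users])
            U, s_vals, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            G[j, users] = s_vals[0] * Vt[0]
    return D, G

# Toy data: 64-dim signals, a 128-atom overcomplete dictionary, sparsity 4.
rng = np.random.default_rng(0)
S = rng.standard_normal((64, 200))
D, G = ksvd(S, n_atoms=128, T0=4)
```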

3.4. The Learning of Shape Increment. In the Supervised Descent Method (SDM [31]), the authors adopt a linear regression equation to approximate shape increments:

    ΔS^(k) = R^(k)φ^(k) + b^(k),    (6)

where φ^(k) denotes the Histograms of Oriented Gradients (HoG) features extracted from the shapes S^(k) of the previous stage. R^(k) and b^(k) are obtained from the training set by minimizing

    arg min_{R^(k), b^(k)} Σᵢ ‖ΔS^(k,i)* − R^(k)φ^(k,i) − b^(k)‖²₂.    (7)

Different from the idea of linear approximation proposed in SDM, we introduce the concept of direct sparse reconstruction for reconstructing shape increments:

    ΔS^(k)* = D_Δ^(k) γ_Δ^(k).    (8)

Here D_Δ^(k) and γ_Δ^(k) represent the shape increment dictionary and its corresponding sparse coefficient in the kth iteration, respectively. From another perspective, the generic descent directions are embedded into the sparse dictionary D_Δ^(k), which can be more robust in the face of large shape variations.

3.5. The Shape Regression Framework. To better represent local appearances around facial feature points, the extracted HoG features are also encoded into sparse coefficients:

    φ^(k) = D^(k) γ^(k),    (9)

where D^(k) and γ^(k) are the local appearance dictionary and the local appearance sparse coefficient, respectively. Instead of a direct mapping from the whole feature space to the shape increment space, we propose to perform regression only in the sparse coefficient space. Since both coefficient matrices are sufficiently sparse, the regression matrix can be quickly solved. The equation is formulated as follows:

    γ_Δ^(k) = H^(k) γ^(k).    (10)
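Given the coupled coefficient matrices of one stage, the mapping H in (10) is the least-squares solution of the fitting problem stated in (12); a toy sketch with random coefficients standing in for the OMP codes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse-coefficient matrices for one cascade stage:
# columns are per-sample codes (appearance gamma, increment gamma_delta).
n_samples, n_app_atoms, n_inc_atoms = 500, 256, 256
gamma = rng.standard_normal((n_app_atoms, n_samples))
gamma_delta = rng.standard_normal((n_inc_atoms, n_samples))

# Solve H = argmin ||gamma_delta - H gamma||_2^2 via least squares
# on the transposed system (each sample is one equation row).
H_T, *_ = np.linalg.lstsq(gamma.T, gamma_delta.T, rcond=None)
H = H_T.T

# Eq. (10): predicted increment coefficients for the appearance codes.
pred = H @ gamma
```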

Now we describe the shape regression framework in detail (see Pseudocode 1). During the training stage, we can get the shape prediction of the next stage using (4). By iteratively learning the shape increment ΔS^(k)*, we can obtain the final face shape. Combining (10) and (8), ΔS^(k)* is computed from ΔS^(k)* = D_Δ^(k) H^(k) γ^(k), where D_Δ^(k) and γ^(k) are variables that can be acquired by the following sparse reconstruction formulas:

    arg min_{D^(k), γ^(k)} ‖φ^(k) − D^(k) γ^(k)‖²₂  s.t. ‖γ^(k)‖₀ ≤ T_φ,

    arg min_{D_Δ^(k), γ_Δ^(k)} ‖ΔS^(k) − D_Δ^(k) γ_Δ^(k)‖²₂  s.t. ‖γ_Δ^(k)‖₀ ≤ T_Δ.    (11)


Input: Training set images and corresponding shapes I_train = [I₁, I₂, ..., I_N] and S_train = [S₁, S₂, ..., S_N]. Testing set images and corresponding initial shapes I_test = [I₁, I₂, ..., I_N′] and S_initial = [S₁^(1)′, S₂^(1)′, ..., S_N′^(1)′]. Sparse coding parameter set: T_Δ, T_φ, size of D_Δ, and size of D. Total iterations Q.
Output: Final face shapes S_test = [S₁^(Q+1)′, S₂^(Q+1)′, ..., S_N′^(Q+1)′].
Training Stage:
for k from 1 to Q do
  step 1: Given S* and S^(k), obtain ΔS^(k). Then extract φ^(k) from S^(k).
  step 2: Sequentially get γ^(k), D_Δ^(k), and γ_Δ^(k) using Equations (11).
  step 3: Get H^(k) using Equation (12).
end for
Trained model: [D_Δ^(1), D_Δ^(2), ..., D_Δ^(Q)], [D^(1), D^(2), ..., D^(Q)], and [H^(1), H^(2), ..., H^(Q)].
Testing Stage:
for k from 1 to Q do
  step 1: Similarly, extract φ^(k)′ on S^(k)′.
  step 2: Given D_Δ^(k), D^(k), and H^(k), calculate ΔS^(k)′ using Equation (14).
  step 3: Obtain S^(k+1)′ using Equation (13).
end for

Pseudocode 1: Pseudocode of our proposed regression-based method.
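One testing iteration of Pseudocode 1 reduces to a few matrix products; in this sketch the dictionaries, regressor, and sparse code are random stand-ins for the learned quantities:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stage-k model; every quantity here is random, standing in for the
# learned dictionaries and regressor of Pseudocode 1.
n_app, n_inc, shape_dim = 256, 256, 58
D_delta = rng.standard_normal((shape_dim, n_inc))  # shape increment dictionary
H = rng.standard_normal((n_inc, n_app)) * 0.01     # coefficient regressor H^(k)

# Steps 1-2 of the testing stage produce a sparse appearance code gamma^(k)';
# we fake one with 8 nonzeros (the OMP output in the real pipeline).
gamma = np.zeros(n_app)
idx = rng.choice(n_app, size=8, replace=False)
gamma[idx] = rng.standard_normal(8)

S = rng.standard_normal(shape_dim)   # current shape S^(k)'
delta_S = D_delta @ (H @ gamma)      # Eq. (14): reconstruct the increment
S = S + delta_S                      # Eq. (13): update the shape
```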

T_Δ and T_φ represent the shape increment sparsity and the local appearance sparsity, respectively. Given γ^(k) and γ_Δ^(k), we can get H^(k) by

    arg min_{H^(k)} ‖γ_Δ^(k) − H^(k) γ^(k)‖²₂.    (12)

Finally, we can generate the sets [D_Δ^(1), D_Δ^(2), ..., D_Δ^(Q)], [D^(1), D^(2), ..., D^(Q)], and [H^(1), H^(2), ..., H^(Q)] after Q iterations. Here Q is the number of iterations and k = 1, 2, ..., Q.

During the testing stage, we can get the local appearance coefficients γ^(k)′ using the already learned D^(k). Then the final face shape is estimated using (13) and (14) after Q iterations:

    S^(k+1)′ = S^(k)′ + ΔS^(k)′,    (13)

    ΔS^(k)′ = D_Δ^(k) H^(k) γ^(k)′.    (14)

3.6. Major Contributions of the Proposed Method. In this section, we summarize the following three contributions of the proposed method:

(1) Sparse coding is utilized to learn a set of coupled dictionaries named the shape increment dictionary and the local appearance dictionary. The solved corresponding sparse coefficients are embedded in a regression framework for approximating the ground-truth shape increments.

(2) A strategy of alternate verification and local enumeration is applied for selecting the best parameter set in extensive experiments. Moreover, experimental results show that the proposed method has strong stability under different parameter settings.

(3) We also rebuild the testing conditions such that the top 5%, 10%, 15%, 20%, and 25% of the testing images are removed according to the descending order sorted by the normalized alignment error. The proposed method is then compared with three classical methods on three publicly available face datasets. Results support that the proposed method achieves better detection accuracy and robustness than the other three methods.

4. Experiments

4.1. Face Datasets. In this section, three publicly available face datasets are selected for performance comparison: Labeled Face Parts in the Wild (LFPW-68 points and LFPW-29 points [16]) and Caltech Occluded Faces in the Wild 2013 (COFW) [34]. In the downloaded LFPW dataset, 811 training images and 224 testing images are collected. Both the 68-point configuration and the 29-point configuration labeled for the LFPW dataset are evaluated. The COFW dataset includes 1345 training images and 507 testing images, and each image is labeled with 29 facial feature points and related binary occlusion information. Notably, the collected images in this dataset show a variety of occlusions and large shape variations.

4.2. Implementation Details

4.2.1. Codes. The implementation codes of SDM [31], Explicit Shape Regression (ESR) [30], and Robust Cascaded Pose Regression (RCPR) [28] were obtained from the Internet. The codes of RCPR and ESR were released on the personal websites of at least one of their authors, while the code of SDM was obtained from GitHub.

4.2.2. Parameter Settings. Generally, the sizes of the shape increment dictionary and the local appearance dictionary in our method depend on the dimensionality of the HoG descriptor. And in the following validation experiments we will


Table 1: Comparison of different parameter sets on the LFPW (68 points) dataset. Here T_Δ is fixed to 2.

T_Δ | T_φ | Size of D_Δ | Size of D | Q | K (tr & ts) | Mean error
2 | 2 | 256 | 256 | 5 | 1 & 1 | 0.079381
2 | 2 | 256 | 512 | 5 | 1 & 1 | 0.08572
2 | 2 | 512 | 256 | 5 | 1 & 1 | 0.085932
2 | 2 | 512 | 512 | 5 | 1 & 1 | 0.086731
2 | 4 | 256 | 256 | 5 | 1 & 1 | 0.081958
2 | 4 | 256 | 512 | 5 | 1 & 1 | 0.083715
2 | 4 | 512 | 256 | 5 | 1 & 1 | 0.086187
2 | 4 | 512 | 512 | 5 | 1 & 1 | 0.087157
2 | 6 | 256 | 256 | 5 | 1 & 1 | 0.07937
2 | 6 | 256 | 512 | 5 | 1 & 1 | 0.07986
2 | 6 | 512 | 256 | 5 | 1 & 1 | 0.075987
2 | 6 | 512 | 512 | 5 | 1 & 1 | 0.084429
2 | 8 | 256 | 256 | 5 | 1 & 1 | 0.075863
2 | 8 | 256 | 512 | 5 | 1 & 1 | 0.082588
2 | 8 | 512 | 256 | 5 | 1 & 1 | 0.077048
2 | 8 | 512 | 512 | 5 | 1 & 1 | 0.082644
2 | 10 | 256 | 256 | 5 | 1 & 1 | 0.076178
2 | 10 | 256 | 512 | 5 | 1 & 1 | 0.076865
2 | 10 | 512 | 256 | 5 | 1 & 1 | 0.080907
2 | 10 | 512 | 512 | 5 | 1 & 1 | 0.088414

introduce how to select the best combination of parameters. The parameter settings of SDM, ESR, and RCPR are consistent with the original settings reported in their papers. In SDM, the regression runs 5 stages. In ESR, the numbers of features in a fern and of candidate pixel features are 5 and 400, respectively; to build the model, the method uses 10 and 500 stages to train a two-level boosted framework. And in RCPR, 15 iterations, 5 restarts, 400 features, and 100 random fern regressors are adopted.

4.2.3. Assessment Criteria. In our experiments, we use the following equation to calculate and normalize the alignment errors. Firstly, we calculate the localization error between the ground-truth point coordinates and the detected point coordinates, that is, the Euclidean distance between two vectors. Then it is further normalized by the interocular distance as follows:

    d_error = ‖P − G‖₂ / ‖G_leye − G_reye‖₂.    (15)

In (15), P denotes the detected facial point coordinates and G denotes the ground-truth point coordinates. G_leye and G_reye denote the ground-truth center coordinates of the left eye and right eye, respectively.
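Equation (15) can be implemented directly; this sketch assumes coordinates are stored as (n_points, 2) arrays, which is an illustration choice rather than the paper's data format:

```python
import numpy as np

def normalized_error(P, G, left_eye, right_eye):
    """Eq. (15): landmark error normalized by the interocular distance.

    P, G: (n_points, 2) arrays of detected and ground-truth coordinates;
    left_eye, right_eye: ground-truth eye-center coordinates.
    """
    interocular = np.linalg.norm(np.asarray(left_eye) - np.asarray(right_eye))
    return np.linalg.norm(P - G) / interocular

# Toy check: shifting every point by (3, 4) with a 50-pixel interocular
# distance gives an error of 5 * sqrt(n_points) / 50.
G = np.zeros((29, 2))
P = G + np.array([3.0, 4.0])
err = normalized_error(P, G, left_eye=(0.0, 0.0), right_eye=(50.0, 0.0))
```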

4.3. Experiments

4.3.1. Parameter Validation. In this section, we introduce how we use alternate verification and local enumeration to find the final values of the parameters. As described above, there are six variables that need to be fixed: T_Δ, T_φ, size of D_Δ, size of D, Q, and K; here K is the initialization number during the process of training and testing. Depending on the requirements of sparsity, the candidate values of T_Δ and T_φ are selected from the following set:

    T_Δ, T_φ ∈ {2, 4, 6, 8, 10}.    (16)

Similarly, the candidate sizes of D_Δ, sizes of D, Q, and K form the following sets:

    size of D_Δ ∈ {256, 512},
    size of D ∈ {256, 512},
    Q ∈ {1, 2, 3, 4, 5},
    K ∈ {1, 6, 8, 10, 12, 14}.    (17)

Firstly, the values of Q and K are set to 5 and 1, respectively. Note that the value of K in the testing stage should be equal to the value of K in the training stage. Then we set the value of T_φ to 2, 4, 6, 8, and 10 sequentially. The sizes of D_Δ and D are selected in random combination. For different values of T_Δ, we can get five groups of results, and Table 1 gives the detailed results when T_Δ is fixed to 2. From Table 1, we may find that the parameter set {2, 8, 256, 256, 5, 1 & 1} achieves the lowest alignment error. Similarly, we conduct the remaining experiments and find the best parameter sets. The corresponding sparsity is also fixed, and therefore we get three sets of parameters: {6, 10, 512, 256}, {8, 8, 512, 256}, and {10, 10, 512, 256}. In Table 2, we test the multi-initialization and multiparameter strategies while the regression runs 4 iterations and 10 initializations with different parameter settings. Finally, all point localizations are averaged to get the fusion result.
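The alternate-verification/local-enumeration procedure is essentially a grid search over the candidate sets in (16) and (17); the sketch below uses a dummy error surface in place of actually training and validating a model per setting:

```python
import itertools

# Candidate values drawn from the paper's sets; T_delta is held at 2
# as in Table 1, and Q, K are fixed during this sweep.
T_phi_values = [2, 4, 6, 8, 10]
dict_sizes = [256, 512]

def validation_error(T_delta, T_phi, size_D_delta, size_D, Q=5, K=1):
    """Stand-in for: train with these parameters, return the mean error.
    The surface below is a dummy with its minimum at (2, 8, 256, 256)."""
    return abs(T_phi - 8) * 0.001 + (size_D_delta + size_D) * 1e-5

best = min(
    itertools.product([2], T_phi_values, dict_sizes, dict_sizes),
    key=lambda p: validation_error(*p),
)
```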


Table 2: Comparison of the multi-initialization and multiparameter strategies on the LFPW (68 points) dataset. Here Q and K are set to 4 and 10, respectively.

T_Δ | T_φ | Size of D_Δ | Size of D | Q | K (tr & ts) | Mean error
6 | 10 | 512 | 256 | 4 | 10 & 10 | 0.062189
10 | 10 | 512 | 256 | 4 | 10 & 10 | 0.06075
8 | 8 | 512 | 256 | 4 | 10 & 10 | 0.061787

Fusion error (all three settings averaged): 0.055179

Table 3: Mean error of each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 5% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method | Contour | Eyebrow | Mouth | Nose | Eye
SDM | 0.0829 | 0.0619 | 0.0478 | 0.0395 | 0.0369
ESR | 0.0862 | 0.0750 | 0.0651 | 0.0596 | 0.0527
RCPR | 0.0948 | 0.0690 | 0.0562 | 0.0493 | 0.0433
Our method | 0.0747 | 0.0587 | 0.0455 | 0.0405 | 0.0392

(b) LFPW (29 points)

Method | Jaw | Eyebrow | Mouth | Nose | Eye
SDM | 0.0422 | 0.0422 | 0.0410 | 0.0422 | 0.0328
ESR | 0.0570 | 0.0459 | 0.0531 | 0.0502 | 0.0400
RCPR | 0.0748 | 0.0507 | 0.0568 | 0.0509 | 0.0357
Our method | 0.0382 | 0.0403 | 0.0401 | 0.0406 | 0.0323

(c) COFW (29 points)

Method | Jaw | Eyebrow | Mouth | Nose | Eye
SDM | 0.0713 | 0.0714 | 0.0709 | 0.0600 | 0.0519
ESR | 0.1507 | 0.1022 | 0.1082 | 0.0952 | 0.0801
RCPR | 0.1209 | 0.0810 | 0.0781 | 0.0655 | 0.0539
Our method | 0.0668 | 0.0642 | 0.0702 | 0.0567 | 0.0497

4.3.2. Comparison with Previous Methods. Due to the existence of a small number of facial images having large shape variations and severe occlusions, the random multi-initialization strategy is challenged and may fail to generate an appropriate starting shape. Therefore, we compare our method with the three classic methods on rebuilt datasets. These datasets still include most of the images coming from LFPW (68 points), LFPW (29 points), and COFW (29 points); we just remove the top 5%, 10%, 15%, 20%, and 25% of the testing facial images in each dataset by sorting the alignment errors in descending order (see Figure 2).

In Figure 2, all curves of COFW show a more dispersive distribution than those of the other two datasets. Since this dataset consists of many more facial images with large shape variations and occlusions, it may affect the detection accuracy to some degree. Meanwhile, the irregular textural features around facial feature points are challenging for the learning of the structural model during training. Obviously, in Figure 2 the curves of our proposed method are superior to the others. Additionally, LFPW (68 points) and LFPW (29 points) share the same facial images but different face shapes, so we may find some useful information about the performance of the methods through these datasets.

In general, the more facial feature points there are, the more difficult they are to detect. Comparing the five facial components, the mean errors of the nose and eyes given in Tables 3 and 4 do not change obviously across the three datasets, because the vicinal textural information of the eyes is easy to recognize and the textural information around the nose has less possibility of being occluded. Moreover, the facial feature points located in the regions of the nose and eyes are denser than the points of the contour, which also benefits the regressive searching process.

Figure 3 shows the alignment errors of the four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. In Figure 3, we may find that the mean error curves show a rapidly descending trend when the most difficult 5% of testing images are removed. This indicates that the statistical average can be biased by a few challenging images. Then, as the removal proportion increases, all the curves become smoother. Figure 3 shows that our proposed method is more stable than the other methods, which means our training model is robust in dealing with occlusions and large shape variations.

Specifically, we plot the detection curves of the five facial components in Figure 4. It is obvious in Figure 4 that ESR and RCPR have a less competitive performance for localizing each facial component. Our method shows better robustness in localizing feature points that belong to the eyebrows and contour, since these two facial components are very likely to


[Figure 2: Cumulative error distribution (CED) curves on LFPW (68 points), LFPW (29 points), and COFW (29 points) for our method, SDM, ESR, and RCPR; panels (a)–(e) correspond to the five removal proportions. Each plot shows the proportion of database images (0 to 1) against the relative error (0 to 0.5).]

Figure 2: Continued.

8 Computational Intelligence and Neuroscience

LFPW (68 points) CED LFPW (29 points) CED COFW (29 points) CED

Our methodSDM

RCPRESR

002040608

1

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

01 02 03 04 050Relative error

01 02 03 04 050Relative error

01 02 03 04 050Relative error

Our methodSDM

RCPRESR

002040608

1

Our methodSDM

RCPRESR

002040608

1

(f)

Figure 2: Cumulative Error Distribution (CED) curves of four methods tested on LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top (a) 0%, (b) 5%, (c) 10%, (d) 15%, (e) 20%, and (f) 25% of the testing images are removed according to the descending order sorted by the normalized alignment errors.

Table 4: Mean error of each facial component on LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 25% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method       Contour   Eyebrow   Mouth     Nose      Eye
SDM          0.0718    0.0581    0.0433    0.0363    0.0337
ESR          0.0746    0.0634    0.0535    0.0435    0.0408
RCPR         0.0830    0.0615    0.0484    0.0414    0.0367
Our method   0.0634    0.0518    0.0406    0.0364    0.0332

(b) LFPW (29 points)

Method       Jaw       Eyebrow   Mouth     Nose      Eye
SDM          0.0385    0.0381    0.0376    0.0389    0.0295
ESR          0.0498    0.0419    0.0461    0.0435    0.0350
RCPR         0.0637    0.0460    0.0471    0.0433    0.0309
Our method   0.0336    0.0360    0.0365    0.0362    0.0292

(c) COFW (29 points)

Method       Jaw       Eyebrow   Mouth     Nose      Eye
SDM          0.0607    0.0643    0.0614    0.0525    0.0457
ESR          0.1255    0.0873    0.0882    0.0781    0.0664
RCPR         0.1055    0.0679    0.0633    0.0533    0.0440
Our method   0.0561    0.0569    0.0603    0.0497    0.0435


Figure 3: Mean errors of four methods tested on LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets.



Figure 4: Facial feature point detection curves of four methods for each facial component on LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. (a) Eyebrow. (b) Eye. (c) Nose. (d) Mouth. (e) Contour or jaw.


be occluded by hair or objects and have a more separated distribution pattern. Experimental results demonstrate that our proposed method can estimate facial feature points with high accuracy and is able to handle the task of face alignment under complex occlusions and large shape variations.

5. Conclusion

A robust sparse reconstruction method for facial feature point detection is proposed in this paper. In this method, we build the regressive training model by learning a coupled set of shape increment dictionaries and local appearance dictionaries, which encode various facial poses and rich local textures. We then apply the sparse model to infer the final face shape of an input image by continuous reconstruction of shape increments. Moreover, in order to find the best matched parameters, we perform extensive validation tests by means of alternate verification and local enumeration. The comparison results show that our sparse coding based reconstruction model has a strong stability. In the later experiments, we compare our proposed method with three classic methods on three publicly available face datasets while removing the top 0%, 5%, 10%, 15%, 20%, and 25% of the testing facial images according to the descending order of alignment errors. The experimental results also support that our method is superior to the others in detection accuracy and robustness.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is supported by the National Key Research & Development Plan of China (no. 2016YFB1001401) and the National Natural Science Foundation of China (no. 61572110).

References

[1] Z. Zhang, P. Luo, C. Change Loy, and X. Tang, "Facial landmark detection by deep multi-task learning," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 94–108, Springer International, Zurich, Switzerland, 2014.

[2] L. Vuong, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, "Interactive facial feature localization," in Proceedings of the European Conference on Computer Vision, pp. 679–692, Springer, Florence, Italy, October 2012.

[3] H.-S. Lee and D. Kim, "Tensor-based AAM with continuous variation estimation: application to variation-robust face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 1102–1116, 2009.

[4] T. Weise, S. Bouaziz, L. Hao, and M. Pauly, "Realtime performance-based facial animation," ACM Transactions on Graphics, vol. 30, no. 4, p. 77, 2011.

[5] S. W. Chew, P. Lucey, S. Lucey, J. Saragih, J. F. Cohn, and S. Sridharan, "Person-independent facial expression detection using constrained local models," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG '11), pp. 915–920, IEEE, Santa Barbara, Calif, USA, March 2011.

[6] C. Huang, X. Ding, and C. Fang, "Pose robust face tracking by combining view-based AAMs and temporal filters," Computer Vision and Image Understanding, vol. 116, no. 7, pp. 777–792, 2012.

[7] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 1794–1801, IEEE, Miami, Fla, USA, June 2009.

[8] S. Gao, I. W.-H. Tsang, L.-T. Chia, and P. Zhao, "Local features are not lonely – Laplacian sparse coding for image classification," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3555–3561, IEEE, June 2010.

[9] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.

[10] M. Yang, L. Zhang, J. Yang, and D. Zhang, "Robust sparse coding for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 625–632, IEEE, Providence, RI, USA, June 2011.

[11] D.-S. Pham, O. Arandjelovic, and S. Venkatesh, "Achieving stable subspace clustering by post-processing generic clustering results," in Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN '16), pp. 2390–2396, IEEE, Vancouver, Canada, July 2016.

[12] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681–685, 2001.

[13] M. F. Hansen, J. Fagertun, and R. Larsen, "Elastic appearance models," in Proceedings of the 22nd British Machine Vision Conference (BMVC '11), September 2011.

[14] P. A. Tresadern, M. C. Ionita, and T. F. Cootes, "Real-time facial feature tracking on a mobile device," International Journal of Computer Vision, vol. 96, no. 3, pp. 280–289, 2012.

[15] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "Robust and efficient parametric face alignment," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 1847–1854, IEEE, Barcelona, Spain, November 2011.

[16] G. Tzimiropoulos and M. Pantic, "Optimization problems for fast AAM fitting in-the-wild," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 593–600, IEEE, Sydney, Australia, December 2013.

[17] M. H. Nguyen and F. De la Torre, "Local minima free parameterized appearance models," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.

[18] M. H. Nguyen and F. De La Torre, "Learning image alignment without local minima for face detection and tracking," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–7, IEEE, Amsterdam, Netherlands, September 2008.

[19] N. M. Hoai and F. De la Torre, "Metric learning for image alignment," International Journal of Computer Vision, vol. 88, no. 1, pp. 69–84, 2010.

[20] T. F. Cootes, G. V. Wheeler, K. N. Walker, and C. J. Taylor, "View-based active appearance models," Image and Vision Computing, vol. 20, no. 9-10, pp. 657–664, 2002.

[21] C. Vogler, Z. Li, A. Kanaujia, S. Goldenstein, and D. Metaxas, "The best of both worlds: combining 3D deformable models with active shape models," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–7, IEEE, Rio de Janeiro, Brazil, October 2007.

[22] X. Yu, J. Huang, S. Zhang, W. Yan, and D. N. Metaxas, "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 1944–1951, IEEE, Sydney, Australia, December 2013.

[23] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models: their training and application," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.

[24] L. Liang, R. Xiao, F. Wen, and J. Sun, "Face alignment via component-based discriminative search," in Computer Vision – ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part II, vol. 5303 of Lecture Notes in Computer Science, pp. 72–85, Springer, Berlin, Germany, 2008.

[25] J. M. Saragih, S. Lucey, and J. F. Cohn, "Deformable model fitting by regularized landmark mean-shift," International Journal of Computer Vision, vol. 91, no. 2, pp. 200–215, 2011.

[26] H. Gao, H. K. Ekenel, and R. Stiefelhagen, "Face alignment using a ranking model based on regression trees," in Proceedings of the British Machine Vision Conference (BMVC '12), pp. 1–11, London, UK, September 2012.

[27] N. Duffy and D. Helmbold, "Boosting methods for regression," Machine Learning, vol. 47, no. 2-3, pp. 153–200, 2002.

[28] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, "Robust face landmark estimation under occlusion," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '13), pp. 1513–1520, Sydney, Australia, 2013.

[29] P. Dollar, P. Welinder, and P. Perona, "Cascaded pose regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1078–1085, IEEE, San Francisco, Calif, USA, June 2010.

[30] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," International Journal of Computer Vision, vol. 107, no. 2, pp. 177–190, 2014.

[31] X. Xiong and F. De La Torre, "Supervised descent method and its applications to face alignment," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 532–539, Portland, Ore, USA, June 2013.

[32] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Proceedings of the 27th Asilomar Conference on Signals, Systems & Computers, pp. 40–44, Pacific Grove, Calif, USA, November 1993.

[33] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

[34] X. Gao, Y. Su, X. Li, and D. Tao, "A review of active appearance models," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 2, pp. 145–158, 2010.




Figure 1: Schematic diagram of our robust sparse reconstruction method for facial feature point detection.

conditions. The schematic diagram of the proposed shape increment reconstruction method is illustrated in Figure 1. In the training stage, two kinds of overcomplete dictionaries need to be learned. The first kind is termed the shape increment dictionary, since its atoms consist of typical shape increments in each iteration. The other kind is termed the local appearance dictionary, because its atoms abstract the complex facial feature appearance. In the testing stage, local features are extracted around the shape points of the current iteration and then encoded into feature coefficients using the local appearance dictionary. Thus, shape increments can be reconstructed by the shape increment dictionary and the shape coefficients transformed from the feature coefficients. Considering the holistic performance, we adopt a strategy of alternate verification and local enumeration to obtain the best parameter set through a large number of experiments. Comparison with three previous methods is evaluated on three publicly available face datasets. Experimental results show that the proposed sparse reconstruction method achieves superior detection robustness compared with the other methods.

The remainder of this paper is organized as follows: related work is introduced in Section 2. The proposed sparse reconstruction method is described in detail in Section 3, and experimental results are compared in Section 4. Finally, we conclude the whole paper in Section 5.

2. Related Work

During the past two decades, a large number of methods have been proposed for facial feature point detection. Among the early methods, the Active Appearance Model (AAM) [12] is a representative parametric model that aims to minimize the difference between the texture sampled from the testing image and the texture synthesized by the model. Later, many improvements and extensions of AAM were proposed [6, 13–20]. To improve efficiency in real-time systems, Tzimiropoulos and Pantic [16] proposed a model to efficiently solve the AAM fitting problem. Tresadern et al. [14] used Haar-like features to reduce computation, which can help

the mobile device to perform real-time tracking. Nguyen et al. [17–19] observed that AAMs easily converge to local minima; to overcome this problem, they designed a new model that learns a cost function having local minima only at desired places. In terms of improving robustness, Huang et al. [6] combined the view-based AAM [20] and a Kalman filter to perform pose tracking, using shape parameters to rebuild the view space. Hansen et al. [13] introduced a nonlinear shape model based on the Riemannian elasticity model to handle the problem of poor pose initialization.

Generally, Constrained Local Model- (CLM-) based methods [21–25] learn a group of local experts and then apply various shape priors for refinement. Vogler et al. [21] used the Active Shape Model (ASM) [23] to build a 3D deformable model for real-time tracking. Yu et al. [22] used the mean-shift method [25] to rapidly approach the global optimum. And Liang et al. [24] constrained the structure of facial feature points using the component locations.

The aforementioned methods share the same characteristic of controlling face shape variations through certain parameters. Different from those methods, the regression-based methods [26–28] directly learn the regression function from image appearance to target output. Gao et al. [26] adopt a two-level cascaded boosted regression [27] structure to obtain a vectorial output for all points. To solve the problems of large shape variations and occlusions, Burgos-Artizzu et al. [28] improved their method in three aspects: firstly, they reference pixels by linear interpolation between two landmarks; secondly, the regression model directly embeds the occlusion information for robustness; thirdly, they designed a smart initialization restart scheme to avoid unsuitable random initializations.

Our method belongs to the regression-based methods like [28–31]. However, our work differs from previous methods in several aspects. Firstly, existing methods like [31] acquire the descent direction in a supervised learning manner, whereas in our proposed method the information of the descent direction is included in the sparse dictionaries for reconstructing the shape increment. Secondly, unlike the method in [28], our method makes no use of the occlusion


information of each feature point. Finally, the method from [30] designed a two-level boosted regression model to infer the holistic face shape; in our regression model, we refine the face shape stage by stage using the two learned coupled dictionaries.

3. Regression-Based Sparse Reconstruction Method

3.1. Problem Formulation. In this paper, the alignment target of all methods is assessed through the following formula:

$\| S^{(\text{final})} - S^{(\text{ground-truth})} \|_2$,  (3)

where $S^{(\text{final})}$ and $S^{(\text{ground-truth})}$ denote the final estimated shape and the corresponding ground-truth shape of an image, respectively. In the regression-based methods, the iterative equation is formulated as follows:

$S^{(k)} = S^{(k-1)} + \Delta S^{(k-1)*}$,  (4)

where $\Delta S^{(k-1)*}$ is a variable of shape increment after $k-1$ iterations, and its value should approximate the ground-truth shape increment $\Delta S^{(k-1)}$, where $\Delta S^{(k-1)} = S^{(\text{ground-truth})} - S^{(k-1)}$.

3.2. Multi-Initialization and Multiparameter Strategies. Multi-initialization means diversification of the initial iteration shape, which can improve the robustness of the reconstruction model. Specifically, we randomly select multiple ground-truth face shapes from the training set to form a group of initial shapes for the current image. Obviously, the multi-initialization strategy is able to enlarge the training sample size and enrich the extracted feature information, which makes each regression model more robust; during the testing stage, multi-initialization creates more chances to step out of potential local minima that may lead to inaccurate feature point localization.
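The sampling step of this strategy can be sketched as follows; the helper name `sample_initial_shapes` and the toy shapes are illustrative, not part of the paper's released code.

```python
import numpy as np

def sample_initial_shapes(train_shapes, K, rng):
    """Draw K distinct ground-truth shapes from the training set to serve
    as initial shapes for one image (used at both training and testing time)."""
    idx = rng.choice(len(train_shapes), size=K, replace=False)
    return [train_shapes[i] for i in idx]

# Toy training set: ten 29-point shapes stored as (29, 2) coordinate arrays.
train_shapes = [np.zeros((29, 2)) + i for i in range(10)]
inits = sample_initial_shapes(train_shapes, K=3, rng=np.random.default_rng(0))
```

Each of the `K` initializations is then run through the full cascade, which is what gives the method several chances to escape a bad starting point.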

In our method, there are four key parameters: the size of the feature dictionary, the size of the shape increment dictionary, and their corresponding sparsities. The selection of these four parameters has a direct influence on the learned reconstruction model. Therefore, we run a large number of validation tests to find the best matched parameters. Then, according to the validation results, we adopt three sets of parameters to train the model.
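The local-enumeration part of this search amounts to a plain grid over the candidate values; the candidate sets below follow (16) and Table 1, while the sweep itself is a simplified stand-in for the paper's alternate-verification procedure.

```python
from itertools import product

candidates = {
    "T_delta": [2, 4, 6, 8, 10],   # shape increment sparsity
    "T_phi": [2, 4, 6, 8, 10],     # local appearance sparsity
    "size_D_delta": [256, 512],    # shape increment dictionary size
    "size_D": [256, 512],          # local appearance dictionary size
}

def enumerate_settings(cands):
    """Yield every combination of candidate parameter values as a dict."""
    keys = list(cands)
    for values in product(*(cands[k] for k in keys)):
        yield dict(zip(keys, values))

settings = list(enumerate_settings(candidates))  # 5 * 5 * 2 * 2 = 100 combinations
```

In practice each setting would be scored on a validation split, and the best-scoring sets kept for the final training.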

3.3. The Learning of Sparse Coding. We use the Orthogonal Matching Pursuit (OMP) [32] algorithm and the K-Singular Value Decomposition (K-SVD) [33] algorithm to find the overcomplete dictionary by minimizing the overall reconstruction error:

$\min_{D, \gamma} \| S - D\gamma \|_F^2 \quad \text{subject to } \| \gamma_i \|_0 \le T_0, \ \forall i$,  (5)

where $S$ is the input data, and $D$ and $\gamma$ denote the sparse dictionary and the sparse coefficients, respectively. $T_0$ defines the number of nonzero values in a coefficient vector and is termed the sparsity.
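As a concrete illustration of the objective in (5), the following sketch uses scikit-learn's dictionary learner with OMP as the coding step. Note the hedges: scikit-learn's alternating optimizer stands in for K-SVD, the data are random placeholders, and scikit-learn codes rows as signals, i.e., it factorizes $S \approx \gamma D$.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
S = rng.standard_normal((200, 32))   # 200 toy training signals of dimension 32

T0 = 2                               # sparsity: at most T0 nonzeros per code vector
dl = DictionaryLearning(n_components=64,            # overcomplete: 64 atoms > 32 dims
                        transform_algorithm="omp",  # OMP for the sparse coding step
                        transform_n_nonzero_coefs=T0,
                        max_iter=15, random_state=0)
gamma = dl.fit(S).transform(S)       # sparse coefficients, shape (200, 64)
D = dl.components_                   # dictionary atoms, shape (64, 32)

recon_error = np.linalg.norm(S - gamma @ D) ** 2  # the ||S - D gamma||_F^2 objective
```

Every row of `gamma` has at most `T0` nonzero entries, matching the $\ell_0$ constraint in (5).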

3.4. The Learning of Shape Increment. In the Supervised Descent Method (SDM) [31], the authors adopt a linear regression equation to approximate shape increments:

$\Delta S^{(k)} = R^{(k)} \phi^{(k)} + b^{(k)}$,  (6)

where $\phi^{(k)}$ denotes the Histograms of Oriented Gradients (HoG) features extracted from the shapes $S^{(k)}$ of the previous stage; $R^{(k)}$ and $b^{(k)}$ are obtained from the training set by minimizing

$\arg\min_{R^{(k)}, b^{(k)}} \sum_i \| \Delta S^{(k,i)*} - R^{(k)} \phi^{(k,i)} - b^{(k)} \|_2^2$.  (7)

Different from the idea of linear approximation proposed in SDM, we introduce the concept of direct sparse reconstruction for reconstructing shape increments:

$\Delta S^{(k)*} = D_\Delta^{(k)} \gamma_\Delta^{(k)}$.  (8)

Here $D_\Delta^{(k)}$ and $\gamma_\Delta^{(k)}$ represent the shape increment dictionary and its corresponding sparse coefficient in the $k$th iteration, respectively. From another perspective, the generic descent directions are embedded into the sparse dictionary $D_\Delta^{(k)}$, which can be more robust in the face of large shape variations.

3.5. The Shape Regression Framework. To better represent local appearances around facial feature points, the extracted HoG features are also encoded into sparse coefficients:

$\phi^{(k)} = D^{(k)} \gamma^{(k)}$,  (9)

where $D^{(k)}$ and $\gamma^{(k)}$ are the local appearance dictionary and the local appearance sparse coefficient, respectively. Instead of a direct mapping from the whole feature space to the shape increment space, we propose to perform regression only in the sparse coefficient space. Since both coefficient matrices are sufficiently sparse, the regression matrix can be quickly solved. The equation is formulated as follows:

$\gamma_\Delta^{(k)} = H^{(k)} \gamma^{(k)}$.  (10)

Now we describe the shape regression framework in detail (see Pseudocode 1). During the training stage, we can get the shape prediction of the next stage using (4). By iteratively learning the shape increment $\Delta S^{(k)*}$, we can obtain the final face shape. Combining (10) and (8), $\Delta S^{(k)*}$ is computed from $\Delta S^{(k)*} = D_\Delta^{(k)} H^{(k)} \gamma^{(k)}$, where $D_\Delta^{(k)}$ and $\gamma^{(k)}$ are variables that can be acquired by the following sparse reconstruction formulas:

$\arg\min_{D^{(k)}, \gamma^{(k)}} \| \phi^{(k)} - D^{(k)} \gamma^{(k)} \|_2^2 \quad \text{s.t. } \| \gamma^{(k)} \|_0 \le T_\phi$,

$\arg\min_{D_\Delta^{(k)}, \gamma_\Delta^{(k)}} \| \Delta S^{(k)} - D_\Delta^{(k)} \gamma_\Delta^{(k)} \|_2^2 \quad \text{s.t. } \| \gamma_\Delta^{(k)} \|_0 \le T_\Delta$.  (11)


Input: Training set images and corresponding shapes $I_{\text{train}} = [I_1, I_2, \ldots, I_N]$ and $S_{\text{train}} = [S_1, S_2, \ldots, S_N]$. Testing set images and corresponding initial shapes $I_{\text{test}} = [I_1, I_2, \ldots, I_{N'}]$ and $S_{\text{initial}} = [S_1^{(1)'}, S_2^{(1)'}, \ldots, S_{N'}^{(1)'}]$. Sparse coding parameter set: $T_\Delta$, $T_\phi$, size of $D_\Delta$, and size of $D$. Total iterations $Q$.
Output: Final face shapes $S_{\text{test}} = [S_1^{(Q+1)'}, S_2^{(Q+1)'}, \ldots, S_{N'}^{(Q+1)'}]$.

Training Stage
for $k$ from 1 to $Q$ do
    step 1. Given $S^*$ and $S^{(k)}$, obtain $\Delta S^{(k)}$. Then extract $\phi^{(k)}$ from $S^{(k)}$.
    step 2. Sequentially get $\gamma^{(k)}$, $D_\Delta^{(k)}$, and $\gamma_\Delta^{(k)}$ using Equations (11).
    step 3. Get $H^{(k)}$ using Equation (12).
end for
Training Model: $[D_\Delta^{(1)}, D_\Delta^{(2)}, \ldots, D_\Delta^{(Q)}]$, $[D^{(1)}, D^{(2)}, \ldots, D^{(Q)}]$, and $[H^{(1)}, H^{(2)}, \ldots, H^{(Q)}]$.

Testing Stage
for $k$ from 1 to $Q$ do
    step 1. Similarly, extract $\phi^{(k)'}$ on $S^{(k)'}$.
    step 2. Given $D_\Delta^{(k)}$, $D^{(k)}$, and $H^{(k)}$, calculate $\Delta S^{(k)'}$ using Equation (14).
    step 3. Obtain $S^{(k+1)'}$ using Equation (13).
end for

Pseudocode 1: Pseudocode of our proposed regression-based method.
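The testing stage of Pseudocode 1 condenses to a short loop. In the sketch below, `phi_fn` is a hypothetical placeholder for HoG extraction around the current shape, and the dictionaries are toy stand-ins rather than learned ones.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def run_cascade(phi_fn, S0, D_list, D_delta_list, H_list, T_phi):
    """Iterate S^(k+1) = S^(k) + D_delta^(k) H^(k) gamma^(k), where gamma^(k)
    is the OMP code of the features phi^(k) = phi_fn(S^(k))."""
    S = np.asarray(S0, dtype=float).copy()
    for D, D_delta, H in zip(D_list, D_delta_list, H_list):
        phi = phi_fn(S)                                       # step 1: local features
        gamma = orthogonal_mp(D, phi, n_nonzero_coefs=T_phi)  # encode phi ≈ D gamma
        S = S + D_delta @ (H @ gamma)                         # steps 2-3: ΔS = D_Δ H γ
    return S

# One-stage toy check: with an identity dictionary, OMP recovers the feature
# vector exactly, so the shape moves by the first atom of D_delta.
D_delta = np.zeros((4, 8)); D_delta[0, 0] = 1.0
S_final = run_cascade(lambda S: np.eye(8)[0], np.zeros(4),
                      [np.eye(8)], [D_delta], [np.eye(8)], T_phi=1)
```

Running `K` initializations simply means calling `run_cascade` once per initial shape and combining the results.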

$T_\Delta$ and $T_\phi$ represent the shape increment sparsity and the local appearance sparsity, respectively. Given $\gamma^{(k)}$ and $\gamma_\Delta^{(k)}$, we can get $H^{(k)}$ by

$\arg\min_{H^{(k)}} \| \gamma_\Delta^{(k)} - H^{(k)} \gamma^{(k)} \|_2^2$.  (12)

Finally, we can generate a set of $[D_\Delta^{(1)}, D_\Delta^{(2)}, \ldots, D_\Delta^{(Q)}]$, $[D^{(1)}, D^{(2)}, \ldots, D^{(Q)}]$, and $[H^{(1)}, H^{(2)}, \ldots, H^{(Q)}]$ after $Q$ iterations. Here $Q$ is the number of iterations and $k = 1, 2, \ldots, Q$.

During the testing stage, we can get the local appearance coefficients $\gamma^{(k-1)'}$ using the already learned $D^{(k-1)}$. Then the final face shape is estimated using (13) and (14) after $Q$ iterations:

$S^{(k+1)'} = S^{(k)'} + \Delta S^{(k)'}$,  (13)

$\Delta S^{(k)'} = D_\Delta^{(k)} H^{(k)} \gamma^{(k)'}$.  (14)

3.6. Major Contributions of the Proposed Method. In this section, we summarize the following three contributions of the proposed method:

(1) Sparse coding is utilized to learn a set of coupled dictionaries, named the shape increment dictionary and the local appearance dictionary. The solved corresponding sparse coefficients are embedded in a regression framework for approximating the ground-truth shape increments.

(2) A strategy of alternate verification and local enumeration is applied for selecting the best parameter set in extensive experiments. Moreover, experimental results show that the proposed method has a strong stability under different parameter settings.

(3) We also rebuild testing conditions in which the top 5%, 10%, 15%, 20%, and 25% of the testing images are removed according to the descending order sorted by the normalized alignment error. The proposed method is then compared with three classical methods on three publicly available face datasets. Results support that the proposed method achieves better detection accuracy and robustness than the other three methods.

4. Experiments

4.1. Face Datasets. In this section, three publicly available face datasets are selected for performance comparison: Labeled Face Parts in the Wild (LFPW-68 points and LFPW-29 points [16]) and Caltech Occluded Faces in the Wild 2013 (COFW) [34]. In the downloaded LFPW dataset, 811 training images and 224 testing images are collected. Both the 68-point configuration and the 29-point configuration labeled for the LFPW dataset are evaluated. The COFW dataset includes 1345 training images and 507 testing images, and each image is labeled with 29 facial feature points and related binary occlusion information. Notably, the images collected in this dataset show a variety of occlusions and large shape variations.

4.2. Implementation Details

4.2.1. Codes. The implementation codes of SDM [31], Explicit Shape Regression (ESR) [30], and Robust Cascaded Pose Regression (RCPR) [28] were obtained from the Internet. The codes of RCPR and ESR are released on personal websites by at least one of the authors, while the code of SDM was obtained from GitHub.

4.2.2. Parameter Settings. Generally, the sizes of the shape increment dictionary and the local appearance dictionary in our method depend on the dimensionality of the HoG descriptor. In the following validation experiments, we will


Table 1: Comparison of different parameter sets on the LFPW (68 points) dataset. Here $T_\Delta$ is fixed to 2.

T_Δ   T_φ   Size of D_Δ   Size of D   Q   K (tr & ts)   Mean error
2     2     256           256         5   1 & 1         0.079381
2     2     256           512         5   1 & 1         0.08572
2     2     512           256         5   1 & 1         0.085932
2     2     512           512         5   1 & 1         0.086731
2     4     256           256         5   1 & 1         0.081958
2     4     256           512         5   1 & 1         0.083715
2     4     512           256         5   1 & 1         0.086187
2     4     512           512         5   1 & 1         0.087157
2     6     256           256         5   1 & 1         0.07937
2     6     256           512         5   1 & 1         0.07986
2     6     512           256         5   1 & 1         0.075987
2     6     512           512         5   1 & 1         0.084429
2     8     256           256         5   1 & 1         0.075863
2     8     256           512         5   1 & 1         0.082588
2     8     512           256         5   1 & 1         0.077048
2     8     512           512         5   1 & 1         0.082644
2     10    256           256         5   1 & 1         0.076178
2     10    256           512         5   1 & 1         0.076865
2     10    512           256         5   1 & 1         0.080907
2     10    512           512         5   1 & 1         0.088414

introduce how to select the best combination of parametersParameters settings of SDM ESR and RCPR are consistentwith the original settings reported in the papers In SDM theregression runs 5 stages In ESR the number of features in afern and candidate pixel features are 5 and 400 respectivelyTo build themodel themethod uses 10 and 500 stages to traina two-level boosted framework And in RCPR 15 iterations5 restarts 400 features and 100 random fern regressors areadopted

4.2.3. Assessment Criteria. In our experiments we use the following equation to calculate and normalize the alignment errors. First, we calculate the localization error between the ground-truth point coordinates and the detected point coordinates, that is, the Euclidean distance between the two vectors. It is then normalized by the interocular distance as follows:

d_error = ‖P − G‖_2 / ‖G_leye − G_reye‖_2.  (15)

In (15), P denotes the detected facial point coordinates and G denotes the ground-truth point coordinates; G_leye and G_reye denote the ground-truth center coordinates of the left eye and the right eye, respectively.
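For concreteness, the normalization in (15) can be sketched as follows; the array layout (one (x, y) row per landmark) and the eye-center indices are illustrative assumptions, not part of the released code.

```python
import numpy as np

def normalized_error(P, G, left_eye_idx, right_eye_idx):
    """Alignment error of (15): ||P - G||_2 / ||G_leye - G_reye||_2.

    P, G: (n_points, 2) arrays of detected and ground-truth coordinates.
    left_eye_idx, right_eye_idx: indices of the eye-center landmarks
    (assumed for illustration; real datasets define these differently).
    """
    interocular = np.linalg.norm(G[left_eye_idx] - G[right_eye_idx])
    return np.linalg.norm(P - G) / interocular
```

A detection that is off by a total Euclidean distance of half the interocular distance thus yields an error of 0.5, matching the scale of the values reported in Tables 1-4.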

4.3. Experiments

4.3.1. Parameter Validation. In this section we introduce how to use alternate verification and local enumeration to find the final values of the parameters. As described above, there are six variables, T_Δ, T_φ, size of D_Δ, size of D, Q, and K, that need to be fixed; here K is the number of initializations during training and testing. Depending on the requirements of sparsity, the candidate values of T_Δ and T_φ are selected from the following set:

T_Δ, T_φ ∈ {2, 4, 6, 8, 10}.  (16)

Similarly, the candidate sizes of D_Δ, sizes of D, values of Q, and values of K form the following sets:

size of D_Δ ∈ {256, 512},
size of D ∈ {256, 512},
Q ∈ {1, 2, 3, 4, 5},
K ∈ {1, 6, 8, 10, 12, 14}.  (17)

First, the values of Q and K are set to 5 and 1, respectively. Note that the value of K in the testing stage should equal the value of K in the training stage. Then we set the value of T_φ to 2, 4, 6, 8, and 10 sequentially, and the sizes of D_Δ and D are selected in random combination. For the different values of T_Δ we obtain five groups of results, and Table 1 gives the detailed results when T_Δ is fixed to 2. From Table 1 we find that the parameter set {2, 8, 256, 256, 5, 1 & 1} achieves the lowest alignment error. Similarly, we conduct the remaining experiments to find the best parameter sets. The corresponding sparsity is also fixed, and we therefore obtain three sets of parameters: {6, 10, 512, 256}, {8, 8, 512, 256}, and {10, 10, 512, 256}. In Table 2 we test the multi-initialization and multiparameter strategies, where the regression runs 4 iterations and 10 initializations with different parameter settings. In the end, all point localizations are averaged to obtain the fusion result.
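The local-enumeration step described above amounts to fixing some parameters and exhaustively scoring the rest. A minimal sketch of one such enumeration pass is given below; `evaluate` is a hypothetical placeholder standing in for a full train/validate run that returns the mean alignment error.

```python
import itertools

def select_parameters(evaluate):
    """Enumerate candidate (T_delta, T_phi, |D_delta|, |D|) combinations
    with Q = 5 and K = 1 held fixed, returning the combination with the
    lowest mean alignment error. `evaluate` stands in for training and
    testing the full model under one parameter set."""
    candidates = itertools.product([2, 4, 6, 8, 10],   # T_delta, from (16)
                                   [2, 4, 6, 8, 10],   # T_phi, from (16)
                                   [256, 512],          # size of D_delta
                                   [256, 512])          # size of D
    return min(candidates, key=lambda p: evaluate(*p, Q=5, K=1))
```

In the paper's alternate-verification scheme, the winning combination is then fixed in turn while the remaining parameters (Q and K) are enumerated.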


Table 2: Comparison of the multi-initialization and multiparameter strategies on the LFPW (68 points) dataset. Here Q and K are set to 4 and 10, respectively.

T_Δ   T_φ   Size of D_Δ   Size of D   Q   K (tr & ts)   Mean error
6     10    512           256         4   10 & 10       0.062189
10    10    512           256         4   10 & 10       0.060758
8     8     512           256         4   10 & 10       0.061787
Fusion error (the three settings fused): 0.055179

Table 3: Mean error of each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 5% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method       Contour   Eyebrow   Mouth    Nose     Eye
SDM          0.0829    0.0619    0.0478   0.0395   0.0369
ESR          0.0862    0.0750    0.0651   0.0596   0.0527
RCPR         0.0948    0.0690    0.0562   0.0493   0.0433
Our method   0.0747    0.0587    0.0455   0.0405   0.0392

(b) LFPW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0422    0.0422    0.0410   0.0422   0.0328
ESR          0.0570    0.0459    0.0531   0.0502   0.0400
RCPR         0.0748    0.0507    0.0568   0.0509   0.0357
Our method   0.0382    0.0403    0.0401   0.0406   0.0323

(c) COFW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0713    0.0714    0.0709   0.0600   0.0519
ESR          0.1507    0.1022    0.1082   0.0952   0.0801
RCPR         0.1209    0.0810    0.0781   0.0655   0.0539
Our method   0.0668    0.0642    0.0702   0.0567   0.0497

4.3.2. Comparison with Previous Methods. Because a small number of facial images exhibit large shape variations and severe occlusions, the random multi-initialization strategy can fail to generate an appropriate starting shape for them. Therefore, we compare our method with three classic methods on rebuilt datasets. These datasets still include most of the images from LFPW (68 points), LFPW (29 points), and COFW (29 points); we simply remove the top 5%, 10%, 15%, 20%, and 25% of the testing facial images in each dataset by sorting the alignment errors in descending order (see Figure 2).

In Figure 2, all curves of COFW show a more dispersive distribution than those of the other two datasets. Since this dataset contains many more facial images with large shape variations and occlusions, the detection accuracy is affected to some extent; meanwhile, the irregular textural features around facial feature points are challenging for the learning of the structural model during training. Clearly, in Figure 2, the curves of our proposed method are superior to the others. Additionally, LFPW (68 points) and LFPW (29 points) share the same facial images but different face shapes, so these datasets reveal useful information about the performance of the methods.

In general, the more facial feature points there are, the more difficult they are to detect. Comparing the five facial components, the mean errors of the nose and eyes given in Tables 3 and 4 do not change obviously across the three datasets, because the textural information in the vicinity of the eyes is easy to recognize and the textural information around the nose is less likely to be occluded. Moreover, the facial feature points located in the regions of the nose and eyes are denser than the points of the contour, which also benefits the regressive searching process.

Figure 3 shows the alignment errors of the four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. In Figure 3, the mean error curves show a rapid descending trend when the most difficult 5% of testing images are removed, which indicates that the statistical average can be biased by a few challenging images. Then, as the removal proportion increases, all the curves become smoother. Figure 3 also shows that our proposed method is more stable than the other methods, which means our training model is robust in dealing with occlusions and large shape variations.

Specifically, we plot the detection curves of the five facial components in Figure 4. It is obvious in Figure 4 that ESR and RCPR have a less competitive performance in localizing each facial component. Our method shows better robustness in localizing the feature points belonging to the eyebrows and contour, since these two facial components are very likely to


Figure 2: Cumulative Error Distribution (CED) curves of the four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top (a) 0%, (b) 5%, (c) 10%, (d) 15%, (e) 20%, and (f) 25% of the testing images are removed according to the descending order sorted by the normalized alignment errors.

Table 4: Mean error of each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 25% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method       Contour   Eyebrow   Mouth    Nose     Eye
SDM          0.0718    0.0581    0.0433   0.0363   0.0337
ESR          0.0746    0.0634    0.0535   0.0435   0.0408
RCPR         0.0830    0.0615    0.0484   0.0414   0.0367
Our method   0.0634    0.0518    0.0406   0.0364   0.0332

(b) LFPW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0385    0.0381    0.0376   0.0389   0.0295
ESR          0.0498    0.0419    0.0461   0.0435   0.0350
RCPR         0.0637    0.0460    0.0471   0.0433   0.0309
Our method   0.0336    0.0360    0.0365   0.0362   0.0292

(c) COFW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0607    0.0643    0.0614   0.0525   0.0457
ESR          0.1255    0.0873    0.0882   0.0781   0.0664
RCPR         0.1055    0.0679    0.0633   0.0533   0.0440
Our method   0.0561    0.0569    0.0603   0.0497   0.0435

Figure 3: Mean errors of the four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets.


Figure 4: Facial feature point detection curves of the four methods for each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. (a) Eyebrow. (b) Eye. (c) Nose. (d) Mouth. (e) Contour or jaw.


be occluded by hair or objects and have a more separated distribution pattern. The experimental results demonstrate that our proposed method can estimate facial feature points with high accuracy and is able to handle face alignment under complex occlusions and large shape variations.

5. Conclusion

A robust sparse reconstruction method for facial feature point detection is proposed in this paper. In this method, we build the regressive training model by learning a coupled set of shape increment dictionaries and local appearance dictionaries, which encode various facial poses and rich local textures. We then apply the sparse model to infer the final face shape of an input image by continuous reconstruction of shape increments. Moreover, in order to find the best matched parameters, we perform extensive validation tests using alternate verification and local enumeration. The comparison results show that our sparse coding based reconstruction model has strong stability. In the later experiments, we compare our proposed method with three classic methods on three publicly available face datasets while removing the top 0%, 5%, 10%, 15%, 20%, and 25% of the testing facial images according to the descending order of alignment errors. The experimental results also support that our method is superior to the others in detection accuracy and robustness.

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the National Key Research & Development Plan of China (no. 2016YFB1001401) and the National Natural Science Foundation of China (no. 61572110).

References

[1] Z. Zhang, P. Luo, C. Change Loy, and X. Tang, "Facial landmark detection by deep multi-task learning," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 94–108, Springer International, Zurich, Switzerland, 2014.

[2] L. Vuong, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, "Interactive facial feature localization," in Proceedings of the European Conference on Computer Vision, pp. 679–692, Springer, Florence, Italy, October 2012.

[3] H.-S. Lee and D. Kim, "Tensor-based AAM with continuous variation estimation: application to variation-robust face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 1102–1116, 2009.

[4] T. Weise, S. Bouaziz, L. Hao, and M. Pauly, "Realtime performance-based facial animation," ACM Transactions on Graphics, vol. 30, no. 4, p. 77, 2011.

[5] S. W. Chew, P. Lucey, S. Lucey, J. Saragih, J. F. Cohn, and S. Sridharan, "Person-independent facial expression detection using constrained local models," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG '11), pp. 915–920, IEEE, Santa Barbara, Calif, USA, March 2011.

[6] C. Huang, X. Ding, and C. Fang, "Pose robust face tracking by combining view-based AAMs and temporal filters," Computer Vision and Image Understanding, vol. 116, no. 7, pp. 777–792, 2012.

[7] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 1794–1801, IEEE, Miami, Fla, USA, June 2009.

[8] S. Gao, I. W.-H. Tsang, L.-T. Chia, and P. Zhao, "Local features are not lonely: Laplacian sparse coding for image classification," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3555–3561, IEEE, June 2010.

[9] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.

[10] M. Yang, L. Zhang, J. Yang, and D. Zhang, "Robust sparse coding for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 625–632, IEEE, Providence, RI, USA, June 2011.

[11] D.-S. Pham, O. Arandjelovic, and S. Venkatesh, "Achieving stable subspace clustering by post-processing generic clustering results," in Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN '16), pp. 2390–2396, IEEE, Vancouver, Canada, July 2016.

[12] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681–685, 2001.

[13] M. F. Hansen, J. Fagertun, and R. Larsen, "Elastic appearance models," in Proceedings of the 22nd British Machine Vision Conference (BMVC '11), September 2011.

[14] P. A. Tresadern, M. C. Ionita, and T. F. Cootes, "Real-time facial feature tracking on a mobile device," International Journal of Computer Vision, vol. 96, no. 3, pp. 280–289, 2012.

[15] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "Robust and efficient parametric face alignment," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 1847–1854, IEEE, Barcelona, Spain, November 2011.

[16] G. Tzimiropoulos and M. Pantic, "Optimization problems for fast AAM fitting in-the-wild," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 593–600, IEEE, Sydney, Australia, December 2013.

[17] M. H. Nguyen and F. De la Torre, "Local minima free parameterized appearance models," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.

[18] M. H. Nguyen and F. De La Torre, "Learning image alignment without local minima for face detection and tracking," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–7, IEEE, Amsterdam, Netherlands, September 2008.

[19] N. M. Hoai and F. De la Torre, "Metric learning for image alignment," International Journal of Computer Vision, vol. 88, no. 1, pp. 69–84, 2010.

[20] T. F. Cootes, G. V. Wheeler, K. N. Walker, and C. J. Taylor, "View-based active appearance models," Image and Vision Computing, vol. 20, no. 9-10, pp. 657–664, 2002.

[21] C. Vogler, Z. Li, A. Kanaujia, S. Goldenstein, and D. Metaxas, "The best of both worlds: combining 3D deformable models with active shape models," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–7, IEEE, Rio de Janeiro, Brazil, October 2007.

[22] X. Yu, J. Huang, S. Zhang, W. Yan, and D. N. Metaxas, "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 1944–1951, IEEE, Sydney, Australia, December 2013.

[23] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models: their training and application," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.

[24] L. Liang, R. Xiao, F. Wen, and J. Sun, "Face alignment via component-based discriminative search," in Computer Vision: ECCV 2008, 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part II, vol. 5303 of Lecture Notes in Computer Science, pp. 72–85, Springer, Berlin, Germany, 2008.

[25] J. M. Saragih, S. Lucey, and J. F. Cohn, "Deformable model fitting by regularized landmark mean-shift," International Journal of Computer Vision, vol. 91, no. 2, pp. 200–215, 2011.

[26] H. Gao, H. K. Ekenel, and R. Stiefelhagen, "Face alignment using a ranking model based on regression trees," in Proceedings of the British Machine Vision Conference (BMVC '12), pp. 1–11, London, UK, September 2012.

[27] N. Duffy and D. Helmbold, "Boosting methods for regression," Machine Learning, vol. 47, no. 2-3, pp. 153–200, 2002.

[28] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, "Robust face landmark estimation under occlusion," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '13), pp. 1513–1520, Sydney, Australia, 2013.

[29] P. Dollar, P. Welinder, and P. Perona, "Cascaded pose regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1078–1085, IEEE, San Francisco, Calif, USA, June 2010.

[30] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," International Journal of Computer Vision, vol. 107, no. 2, pp. 177–190, 2014.

[31] X. Xiong and F. De La Torre, "Supervised descent method and its applications to face alignment," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 532–539, Portland, Ore, USA, June 2013.

[32] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Proceedings of the 27th Asilomar Conference on Signals, Systems & Computers, pp. 40–44, Pacific Grove, Calif, USA, November 1993.

[33] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

[34] X. Gao, Y. Su, X. Li, and D. Tao, "A review of active appearance models," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 2, pp. 145–158, 2010.



information of each feature point. Finally, the method in [30] designed a two-level boosted regression model to infer the holistic face shape. In our regression model, we refine the face shape stage by stage with the two learned coupled dictionaries.

3. Regression-Based Sparse Reconstruction Method

3.1. Problem Formulation. In this paper, the alignment target of all methods is assessed through the following formula:

‖S^(final) − S^(ground-truth)‖_2,  (3)

where S^(final) and S^(ground-truth) denote the final estimated shape and the corresponding ground-truth shape of an image, respectively. In the regression-based methods, the iterative equation is formulated as follows:

S^(k) = S^(k−1) + ΔS^(k−1)*;  (4)

here ΔS^(k−1)* is the shape increment variable after k − 1 iterations, and its value should approximate the ground-truth shape increment ΔS^(k−1), where ΔS^(k−1) = S^(ground-truth) − S^(k−1).

3.2. Multi-Initialization and Multiparameter Strategies. Multi-initialization means diversifying the initial iteration shape, which can improve the robustness of the reconstruction model. Specifically, we randomly select multiple ground-truth face shapes from the training set to form a group of initial shapes for the current image. The multi-initialization strategy enlarges the training sample size and enriches the extracted feature information, which makes each regression model more robust; during the testing stage, multi-initialization creates more chances to step out of potential local minima that could lead to inaccurate feature point localization.
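A minimal sketch of the multi-initialization strategy at test time might look as follows; `regress` is a hypothetical stand-in for the learned cascade, and the fused result is the average of the K predictions, as done in our experiments. All names here are illustrative.

```python
import numpy as np

def multi_init_predict(image, train_shapes, regress, K, rng=None):
    """Run the regressor from K randomly drawn training shapes and
    fuse the K predictions by averaging.

    train_shapes: (n_train, n_points, 2) array of ground-truth shapes.
    regress(image, S0): placeholder for the learned cascade, returning
    a refined (n_points, 2) shape from initialization S0.
    """
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(train_shapes), size=K, replace=False)
    preds = [regress(image, train_shapes[i]) for i in idx]
    return np.mean(preds, axis=0)   # point-wise fusion of K runs
```

Averaging the runs damps the effect of any single unlucky initialization, which is the stability argument made above.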

In our method there are four key parameters: the size of the feature dictionary, the size of the shape increment dictionary, and their corresponding sparsities. The selection of these four parameters has a direct influence on the learned reconstruction model. Therefore, we run a large number of validation tests to find the best matched parameters, and according to the validation results we adopt three sets of parameters to train the model.

3.3. The Learning of Sparse Coding. We use the Orthogonal Matching Pursuit (OMP) [32] algorithm and the K-Singular Value Decomposition (K-SVD) [33] algorithm to find the overcomplete dictionary by minimizing the overall reconstruction error:

min_{D, γ} ‖S − Dγ‖_F^2   subject to ‖γ_i‖_0 ≤ T_0  ∀i,  (5)

where S is the input data, and D and γ denote the sparse dictionary and the sparse coefficients, respectively. T_0 defines the number of nonzero values in a coefficient vector and is termed the sparsity.
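As an illustration of the sparsity-constrained coding step in (5), here is a bare-bones Orthogonal Matching Pursuit in the spirit of [32]; K-SVD [33] alternates such a coding step with dictionary atom updates. This is a simplified sketch, not the authors' implementation.

```python
import numpy as np

def omp(D, s, T0):
    """Greedy OMP: approximate s with at most T0 atoms of dictionary D.

    D: (m, n) dictionary, ideally with unit-norm columns.
    s: (m,) input signal. Returns the (n,) sparse coefficient vector
    gamma with ||gamma||_0 <= T0, so that s ~= D @ gamma.
    """
    residual, support = s.copy(), []
    gamma = np.zeros(D.shape[1])
    for _ in range(T0):
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # re-fit the signal on the selected support by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], s, rcond=None)
        gamma[:] = 0
        gamma[support] = coef
        residual = s - D @ gamma
    return gamma
```

In the full K-SVD loop, all training signals are coded this way, and then each dictionary atom is updated (via an SVD of the residual restricted to the signals using that atom) before the next coding pass.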

3.4. The Learning of Shape Increment. In the Supervised Descent Method (SDM [31]), the authors adopt a linear regression equation to approximate shape increments:

ΔS^(k) = R^(k) φ^(k) + b^(k);  (6)

here φ^(k) denotes the Histograms of Oriented Gradients (HoG) features extracted from the shapes S^(k) of the previous stage, and R^(k) and b^(k) are obtained from the training set by minimizing

arg min_{R^(k), b^(k)} Σ_i ‖ΔS^(k,i)* − R^(k) φ^(k,i) − b^(k)‖_2^2.  (7)

Different from the idea of linear approximation proposed in SDM, we introduce the concept of direct sparse reconstruction for reconstructing shape increments:

ΔS^(k)* = D_Δ^(k) γ_Δ^(k).  (8)

Here, D_Δ^(k) and γ_Δ^(k) represent the shape increment dictionary and its corresponding sparse coefficients in the kth iteration, respectively. From another perspective, the generic descent directions are embedded into the sparse dictionary D_Δ^(k), which is more robust in the face of large shape variations.

3.5. The Shape Regression Framework. To better represent local appearances around facial feature points, the extracted HoG features are also encoded into sparse coefficients:

φ^(k) = D^(k) γ^(k),  (9)

where D^(k) and γ^(k) are the local appearance dictionary and the local appearance sparse coefficients, respectively. Instead of a direct mapping from the whole feature space to the shape increment space, we propose to perform regression only in the sparse coefficient space. Since both coefficient matrices are sufficiently sparse, the regression matrix can be solved quickly. The equation is formulated as follows:

γ_Δ^(k) = H^(k) γ^(k).  (10)

Now we describe the shape regression framework in detail (see Pseudocode 1). During the training stage, we can get the shape prediction of the next stage using (4). By iteratively learning the shape increment ΔS^(k)*, we can obtain the final face shape. Combining (10) and (8), ΔS^(k)* is computed from ΔS^(k)* = D_Δ^(k) H^(k) γ^(k), where D_Δ^(k) and γ^(k) are variables that can be acquired by the following sparse reconstruction formulas:

arg min_{D^(k), γ^(k)} ‖φ^(k) − D^(k) γ^(k)‖_2^2   s.t. ‖γ^(k)‖_0 ≤ T_φ,

arg min_{D_Δ^(k), γ_Δ^(k)} ‖ΔS^(k) − D_Δ^(k) γ_Δ^(k)‖_2^2   s.t. ‖γ_Δ^(k)‖_0 ≤ T_Δ.  (11)


Input: Training set images and corresponding shapes I_train = [I_1, I_2, ..., I_N] and S_train = [S_1, S_2, ..., S_N]; testing set images and corresponding initial shapes I_test = [I_1, I_2, ..., I_N′] and S_initial = [S_1^(1)′, S_2^(1)′, ..., S_N′^(1)′]; sparse coding parameter set {T_Δ, T_φ, size of D_Δ, size of D}; total iterations Q.
Output: Final face shapes S_test = [S_1^(Q+1)′, S_2^(Q+1)′, ..., S_N′^(Q+1)′].

Training stage:
for k from 1 to Q do
  step 1: Given S* and S^(k), obtain ΔS^(k); then extract φ^(k) from S^(k).
  step 2: Sequentially get γ^(k), D_Δ^(k), and γ_Δ^(k) using Equations (11).
  step 3: Get H^(k) using Equation (12).
end for
Trained model: [D_Δ^(1), D_Δ^(2), ..., D_Δ^(Q)], [D^(1), D^(2), ..., D^(Q)], and [H^(1), H^(2), ..., H^(Q)].

Testing stage:
for k from 1 to Q do
  step 1: Similarly, extract φ^(k)′ on S^(k)′.
  step 2: Given D_Δ^(k), D^(k), and H^(k), calculate ΔS^(k)′ using Equation (14).
  step 3: Obtain S^(k+1)′ using Equation (13).
end for

Pseudocode 1: Pseudocode of our proposed regression-based method.

T_Δ and T_φ represent the shape increment sparsity and the local appearance sparsity, respectively. Given γ^(k) and γ_Δ^(k), we can get H^(k) by

arg min_{H^(k)} ‖γ_Δ^(k) − H^(k) γ^(k)‖_2^2.  (12)
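The objective in (12) is an ordinary per-stage least-squares problem; stacking the coefficient vectors of all training samples as columns, H^(k) admits the pseudoinverse solution sketched below. The matrix shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def solve_regression_matrix(gamma_delta, gamma):
    """Solve min_H ||gamma_delta - H @ gamma||_F^2 for one stage.

    gamma:       (n_atoms, n_samples) local appearance coefficients.
    gamma_delta: (m_atoms, n_samples) shape increment coefficients.
    """
    # H = gamma_delta @ pinv(gamma) minimizes the Frobenius-norm residual
    return gamma_delta @ np.linalg.pinv(gamma)
```

Because both coefficient matrices are sparse and far smaller than the raw HoG feature matrix, this solve is cheap, which is the speed argument made above for regressing in the coefficient space.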

Finally, after Q iterations we generate the sets [D_Δ^(1), D_Δ^(2), ..., D_Δ^(Q)], [D^(1), D^(2), ..., D^(Q)], and [H^(1), H^(2), ..., H^(Q)]. Here Q is the number of iterations and k = 1, 2, ..., Q.

During the testing stage, we obtain the local appearance coefficients γ^(k)′ using the already learned D^(k). Then the final face shape is estimated using (13) and (14) after Q iterations:

S^(k+1)′ = S^(k)′ + ΔS^(k)′,  (13)

ΔS^(k)′ = D_Δ^(k) H^(k) γ^(k)′.  (14)
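Combining (9), (10), (13), and (14), one testing-stage pass reduces to a sparse coding step and two matrix products per iteration. The sketch below assumes hypothetical helpers `extract_hog` (mapping a shape to its HoG feature vector) and `code` (an OMP-style sparse coder); it illustrates the update rule rather than reproducing the released code.

```python
import numpy as np

def predict_shape(S0, extract_hog, code, models):
    """Refine an initial shape S0 through Q learned stages.

    models: list of per-stage tuples (D, D_delta, H, T_phi).
    extract_hog(S): shape -> HoG feature vector phi (hypothetical helper).
    code(D, phi, T): sparse coefficients gamma with ||gamma||_0 <= T,
    so that phi ~= D @ gamma, as in Equation (9).
    """
    S = S0.copy()
    for D, D_delta, H, T_phi in models:
        gamma = code(D, extract_hog(S), T_phi)   # encode local appearance (9)
        S = S + D_delta @ (H @ gamma)            # update via (14) then (13)
    return S
```

Under the multi-initialization strategy, this loop is simply run once per initial shape and the resulting shapes are averaged.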

3.6. Major Contributions of the Proposed Method. In this section we summarize the following three contributions of the proposed method.

(1) Sparse coding is utilized to learn a set of coupled dictionaries, named the shape increment dictionary and the local appearance dictionary. The solved corresponding sparse coefficients are embedded in a regression framework for approximating the ground-truth shape increments.

(2) A way of alternate verification and local enumeration is applied for selecting the best parameter set in extensive experiments. Moreover, experimental results show that the proposed method has a strong stability under different parameter settings.

(3) We also rebuild the testing conditions so that the top 5%, 10%, 15%, 20%, and 25% of the testing images are removed according to the descending order sorted by the normalized alignment error. The proposed method is then compared with three classical methods on three publicly available face datasets. Results support that the proposed method achieves better detection accuracy and robustness than the other three methods.

4. Experiments

4.1. Face Datasets. In this section, three publicly available face datasets are selected for performance comparison: Labeled Face Parts in the Wild (LFPW-68 points and LFPW-29 points [16]) and Caltech Occluded Faces in the Wild 2013 (COFW) [34]. In the downloaded LFPW dataset, 811 training images and 224 testing images are collected. Both the 68 points' configuration and the 29 points' configuration labeled for the LFPW dataset are evaluated. The COFW dataset includes 1345 training images and 507 testing images, and each image is labeled with 29 facial feature points and related binary occlusion information. In particular, the collected images in this dataset show a variety of occlusions and large shape variations.

4.2. Implementation Details

4.2.1. Codes. The implementation codes of SDM [31], Explicit Shape Regression (ESR) [30], and Robust Cascaded Pose Regression (RCPR) [28] were obtained from the Internet: the codes of RCPR and ESR are released on the personal websites of at least one of their authors, while the code of SDM was obtained from GitHub.

4.2.2. Parameter Settings. Generally, the sizes of the shape increment dictionary and the local appearance dictionary in our method depend on the dimensionality of the HoG descriptor. In the following validation experiments, we will

Computational Intelligence and Neuroscience 5

Table 1: Comparison of different parameter sets on the LFPW (68 points) dataset. Here T_Δ is fixed to 2.

T_Δ | T_φ | Size of D_Δ | Size of D | Q | K (tr & ts) | Mean errors
 2  |  2  |    256      |    256    | 5 |   1 & 1     | 0.079381
 2  |  2  |    256      |    512    | 5 |   1 & 1     | 0.08572
 2  |  2  |    512      |    256    | 5 |   1 & 1     | 0.085932
 2  |  2  |    512      |    512    | 5 |   1 & 1     | 0.086731
 2  |  4  |    256      |    256    | 5 |   1 & 1     | 0.081958
 2  |  4  |    256      |    512    | 5 |   1 & 1     | 0.083715
 2  |  4  |    512      |    256    | 5 |   1 & 1     | 0.086187
 2  |  4  |    512      |    512    | 5 |   1 & 1     | 0.087157
 2  |  6  |    256      |    256    | 5 |   1 & 1     | 0.07937
 2  |  6  |    256      |    512    | 5 |   1 & 1     | 0.07986
 2  |  6  |    512      |    256    | 5 |   1 & 1     | 0.075987
 2  |  6  |    512      |    512    | 5 |   1 & 1     | 0.084429
 2  |  8  |    256      |    256    | 5 |   1 & 1     | 0.075863
 2  |  8  |    256      |    512    | 5 |   1 & 1     | 0.082588
 2  |  8  |    512      |    256    | 5 |   1 & 1     | 0.077048
 2  |  8  |    512      |    512    | 5 |   1 & 1     | 0.082644
 2  | 10  |    256      |    256    | 5 |   1 & 1     | 0.076178
 2  | 10  |    256      |    512    | 5 |   1 & 1     | 0.076865
 2  | 10  |    512      |    256    | 5 |   1 & 1     | 0.080907
 2  | 10  |    512      |    512    | 5 |   1 & 1     | 0.088414

introduce how to select the best combination of parameters. Parameter settings of SDM, ESR, and RCPR are consistent with the original settings reported in their papers. In SDM, the regression runs 5 stages. In ESR, the number of features in a fern and the number of candidate pixel features are 5 and 400, respectively; to build the model, the method uses 10 and 500 stages to train a two-level boosted framework. In RCPR, 15 iterations, 5 restarts, 400 features, and 100 random fern regressors are adopted.

4.2.3. Assessment Criteria. In our experiments, we use the following equation to calculate and normalize the alignment errors. Firstly, we calculate the localization error between the ground-truth point coordinates and the detected point coordinates, that is, the Euclidean distance between the two vectors. This error is then normalized by the interocular distance as follows:

    d_error = ‖P − G‖_2 / ‖G_leye − G_reye‖_2.  (15)

In (15), P denotes the detected facial point coordinates and G denotes the ground-truth point coordinates. G_leye and G_reye denote the ground-truth center coordinates of the left eye and the right eye, respectively.
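Equation (15) amounts to a few lines of NumPy. In the sketch below, P and G are (n_points, 2) arrays; one common reading of the point-to-point term averages the per-point Euclidean distances, and the eye centers are passed in directly (in practice they are computed from the labeled eye landmarks, whose indices depend on the annotation scheme):

```python
import numpy as np

def normalized_error(P, G, G_leye, G_reye):
    """Alignment error of Equation (15): mean point-to-point Euclidean
    distance, normalized by the interocular distance."""
    per_point = np.linalg.norm(P - G, axis=1)      # ||P - G||_2 per point
    interocular = np.linalg.norm(G_leye - G_reye)  # ||G_leye - G_reye||_2
    return per_point.mean() / interocular
```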

4.3. Experiments

4.3.1. Parameter Validation. In this section, we introduce how to use alternate verification and local enumeration to find the final values of the parameters. As described above, there are six variables that need to be fixed: T_Δ, T_φ, size of D_Δ, size of D, Q, and K; here K is the initialization number during the process of training and testing. Depending on the requirements of sparsity, the candidate values of T_Δ and T_φ are selected from the following set:

    T_Δ, T_φ ∈ {2, 4, 6, 8, 10}.  (16)

Similarly, the candidate sizes of D_Δ, sizes of D, Q, and K form the following sets:

    size of D_Δ ∈ {256, 512},
    size of D ∈ {256, 512},
    Q ∈ {1, 2, 3, 4, 5},
    K ∈ {1, 6, 8, 10, 12, 14}.  (17)

Firstly, the values of Q and K are set to 5 and 1, respectively. Note that the value of K in the testing stage should be equal to the value of K in the training stage. Then we set the value of T_φ to 2, 4, 6, 8, and 10 sequentially; the sizes of D_Δ and D are selected in random combination. For different values of T_Δ, we can get five groups of results, and Table 1 gives the detailed results when T_Δ is fixed to 2. From Table 1, we find that the parameter set {2, 8, 256, 256, 5, 1 & 1} achieves the lowest alignment error. Similarly, we conduct the remaining experiments and find the best parameter sets. The corresponding sparsity is also fixed, and therefore we get three sets of parameters: {6, 10, 512, 256}, {8, 8, 512, 256}, and {10, 10, 512, 256}. In Table 2, we test the multi-initialization and multiparameter strategies, while the regression runs 4 iterations and 10 initializations with different parameter settings. Finally, all point localizations are averaged to get the fusion result.
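The enumeration step of this procedure can be sketched as a grid search with the axes Q and K frozen, keeping the parameter set with the lowest mean error. This is only a compact illustration of the enumeration loop, not the full alternate-verification protocol; the `evaluate` callback is hypothetical and stands in for a complete train-and-test run returning a mean alignment error.

```python
import itertools

def select_parameters(evaluate):
    """Enumerate candidate parameter sets with Q and K frozen to 5 and 1
    (as in Section 4.3.1) and return the set with the lowest mean error.
    `evaluate` maps a parameter dict to a mean alignment error."""
    best = None
    for T_delta, T_phi, size_Dd, size_D in itertools.product(
            [2, 4, 6, 8, 10], [2, 4, 6, 8, 10], [256, 512], [256, 512]):
        params = dict(T_delta=T_delta, T_phi=T_phi,
                      size_D_delta=size_Dd, size_D=size_D, Q=5, K=1)
        err = evaluate(params)
        if best is None or err < best[1]:
            best = (params, err)  # keep the lowest-error parameter set
    return best
```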


Table 2: Comparison of multi-initialization and multiparameter strategies on the LFPW (68 points) dataset. Here Q and K are set to 4 and 10, respectively. The three parameter sets share a single fusion error.

T_Δ | T_φ | Size of D_Δ | Size of D | Q | K (tr & ts) | Mean errors | Fusion error
 6  | 10  |    512      |    256    | 4 |  10 & 10    | 0.062189    | 0.055179
10  | 10  |    512      |    256    | 4 |  10 & 10    | 0.060758    | 0.055179
 8  |  8  |    512      |    256    | 4 |  10 & 10    | 0.061787    | 0.055179

Table 3: Mean error of each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 5% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method     | Contour | Eyebrow | Mouth  | Nose   | Eye
SDM        | 0.0829  | 0.0619  | 0.0478 | 0.0395 | 0.0369
ESR        | 0.0862  | 0.0750  | 0.0651 | 0.0596 | 0.0527
RCPR       | 0.0948  | 0.0690  | 0.0562 | 0.0493 | 0.0433
Our method | 0.0747  | 0.0587  | 0.0455 | 0.0405 | 0.0392

(b) LFPW (29 points)

Method     | Jaw     | Eyebrow | Mouth  | Nose   | Eye
SDM        | 0.0422  | 0.0422  | 0.0410 | 0.0422 | 0.0328
ESR        | 0.0570  | 0.0459  | 0.0531 | 0.0502 | 0.0400
RCPR       | 0.0748  | 0.0507  | 0.0568 | 0.0509 | 0.0357
Our method | 0.0382  | 0.0403  | 0.0401 | 0.0406 | 0.0323

(c) COFW (29 points)

Method     | Jaw     | Eyebrow | Mouth  | Nose   | Eye
SDM        | 0.0713  | 0.0714  | 0.0709 | 0.0600 | 0.0519
ESR        | 0.1507  | 0.1022  | 0.1082 | 0.0952 | 0.0801
RCPR       | 0.1209  | 0.0810  | 0.0781 | 0.0655 | 0.0539
Our method | 0.0668  | 0.0642  | 0.0702 | 0.0567 | 0.0497

4.3.2. Comparison with Previous Methods. Because a small number of facial images have large shape variations and severe occlusions, the random multi-initialization strategy is challenged and may fail to generate an appropriate starting shape. Therefore, we compare our method with three classic methods on rebuilt datasets. These datasets still include most of the images from LFPW (68 points), LFPW (29 points), and COFW (29 points); we simply remove the top 5%, 10%, 15%, 20%, and 25% of the testing facial images in each dataset by sorting the alignment errors in descending order (see Figure 2).
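Rebuilding the testing conditions is a simple sort-and-trim on the per-image normalized errors; a minimal sketch (the helper name is ours, not from the paper):

```python
import numpy as np

def remove_hardest(errors, proportion):
    """Drop the top `proportion` of test images when per-image errors
    are sorted in descending order; return indices of the kept images."""
    order = np.argsort(errors)[::-1]           # hardest images first
    n_drop = int(round(len(errors) * proportion))
    return np.sort(order[n_drop:])             # kept indices, original order
```

Applying it with proportion values 0.05, 0.10, 0.15, 0.20, and 0.25 yields the rebuilt test sets used in the comparison.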

In Figure 2, all curves of COFW show a more dispersive distribution than those of the other two datasets. Since this dataset consists of many more facial images with large shape variations and occlusions, the detection accuracy is affected to some extent. Meanwhile, the irregular textural features around facial feature points are challenging for learning a structural model during training. Clearly, the curves of our proposed method are superior to the others'. Additionally, LFPW (68 points) and LFPW (29 points) share the same facial images but different face shapes, so we may find some useful information about the performance of the methods through these two datasets.

In general, the more facial feature points there are, the more difficult they are to detect. Comparing the five facial components, the mean errors of the nose and eyes given in Tables 3 and 4 do not change obviously across the three datasets, because the vicinal textural information of the eyes is easy to recognize and the textural information around the nose is less likely to be occluded. Moreover, the facial feature points located in the regions of the nose and eyes are denser than the points of the contour, which also benefits the regressive searching process.

Figure 3 shows the alignment errors of the four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. In Figure 3, we may find that the mean error curves show a rapid descending trend when the most difficult 5% of testing images are removed. This indicates that the statistical average can be biased by a few challenging images. Then, as the removal proportion increases, all the curves become smoother. Figure 3 shows that our proposed method is more stable than the other methods, which means our training model is robust in dealing with occlusions and large shape variations.

Specifically, we plot the detection curves of the five facial components in Figure 4. It is obvious in Figure 4 that ESR and RCPR have a less competitive performance in localizing each facial component. Our method shows better robustness in localizing feature points that belong to the eyebrows and the contour, since these two facial components are very likely to


Figure 2: Cumulative Error Distribution (CED) curves of four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top (a) 0%, (b) 5%, (c) 10%, (d) 15%, (e) 20%, and (f) 25% of the testing images are removed according to the descending order sorted by the normalized alignment errors.
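A CED curve like those in Figure 2 plots, for each error threshold, the fraction of test images whose normalized alignment error falls at or below that threshold. A minimal sketch (function name and bin count are ours):

```python
import numpy as np

def ced_curve(errors, max_error=0.5, n_bins=100):
    """Cumulative Error Distribution: proportion of images whose
    normalized alignment error is <= each threshold in [0, max_error]."""
    thresholds = np.linspace(0.0, max_error, n_bins)
    errors = np.asarray(errors)
    proportions = np.array([(errors <= t).mean() for t in thresholds])
    return thresholds, proportions
```

Plotting `proportions` against `thresholds` for each method reproduces the curve style of Figure 2, where relative error runs from 0 to 0.5 on the x-axis.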

Table 4: Mean error of each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 25% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method     | Contour | Eyebrow | Mouth  | Nose   | Eye
SDM        | 0.0718  | 0.0581  | 0.0433 | 0.0363 | 0.0337
ESR        | 0.0746  | 0.0634  | 0.0535 | 0.0435 | 0.0408
RCPR       | 0.0830  | 0.0615  | 0.0484 | 0.0414 | 0.0367
Our method | 0.0634  | 0.0518  | 0.0406 | 0.0364 | 0.0332

(b) LFPW (29 points)

Method     | Jaw     | Eyebrow | Mouth  | Nose   | Eye
SDM        | 0.0385  | 0.0381  | 0.0376 | 0.0389 | 0.0295
ESR        | 0.0498  | 0.0419  | 0.0461 | 0.0435 | 0.0350
RCPR       | 0.0637  | 0.0460  | 0.0471 | 0.0433 | 0.0309
Our method | 0.0336  | 0.0360  | 0.0365 | 0.0362 | 0.0292

(c) COFW (29 points)

Method     | Jaw     | Eyebrow | Mouth  | Nose   | Eye
SDM        | 0.0607  | 0.0643  | 0.0614 | 0.0525 | 0.0457
ESR        | 0.1255  | 0.0873  | 0.0882 | 0.0781 | 0.0664
RCPR       | 0.1055  | 0.0679  | 0.0633 | 0.0533 | 0.0440
Our method | 0.0561  | 0.0569  | 0.0603 | 0.0497 | 0.0435


Figure 3: Mean errors of four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets.



Figure 4: Facial feature point detection curves of four methods for each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. (a) Eyebrow. (b) Eye. (c) Nose. (d) Mouth. (e) Contour or jaw.


be occluded by hair or other objects and have a more separated distribution pattern. Experimental results demonstrate that our proposed method can estimate facial feature points with high accuracy and is able to deal with the task of face alignment under complex occlusions and large shape variations.

5. Conclusion

A robust sparse reconstruction method for facial feature point detection is proposed in this paper. In this method, we build the regressive training model by learning a coupled set of shape increment dictionaries and local appearance dictionaries, which encode various facial poses and rich local textures. We then apply the sparse model to infer the final face shape of an input image by continuous reconstruction of shape increments. Moreover, in order to find the best matched parameters, we perform extensive validation tests using alternate verification and local enumeration. The comparison results show that our sparse coding based reconstruction model has strong stability. In later experiments, we compare our proposed method with three classic methods on three publicly available face datasets, removing the top 0%, 5%, 10%, 15%, 20%, and 25% of the testing facial images according to the descending order of alignment errors. The experimental results also support that our method is superior to the others in detection accuracy and robustness.

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the National Key Research & Development Plan of China (no. 2016YFB1001401) and the National Natural Science Foundation of China (no. 61572110).

References

[1] Z. Zhang, P. Luo, C. Change Loy, and X. Tang, "Facial landmark detection by deep multi-task learning," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 94–108, Springer International, Zurich, Switzerland, 2014.

[2] L. Vuong, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, "Interactive facial feature localization," in Proceedings of the European Conference on Computer Vision, pp. 679–692, Springer, Florence, Italy, October 2012.

[3] H.-S. Lee and D. Kim, "Tensor-based AAM with continuous variation estimation: application to variation-robust face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 1102–1116, 2009.

[4] T. Weise, S. Bouaziz, L. Hao, and M. Pauly, "Realtime performance-based facial animation," ACM Transactions on Graphics, vol. 30, no. 4, p. 77, 2011.

[5] S. W. Chew, P. Lucey, S. Lucey, J. Saragih, J. F. Cohn, and S. Sridharan, "Person-independent facial expression detection using constrained local models," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG '11), pp. 915–920, IEEE, Santa Barbara, Calif, USA, March 2011.

[6] C. Huang, X. Ding, and C. Fang, "Pose robust face tracking by combining view-based AAMs and temporal filters," Computer Vision and Image Understanding, vol. 116, no. 7, pp. 777–792, 2012.

[7] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 1794–1801, IEEE, Miami, Fla, USA, June 2009.

[8] S. Gao, I. W.-H. Tsang, L.-T. Chia, and P. Zhao, "Local features are not lonely: Laplacian sparse coding for image classification," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3555–3561, IEEE, June 2010.

[9] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.

[10] M. Yang, L. Zhang, J. Yang, and D. Zhang, "Robust sparse coding for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 625–632, IEEE, Providence, RI, USA, June 2011.

[11] D.-S. Pham, O. Arandjelovic, and S. Venkatesh, "Achieving stable subspace clustering by post-processing generic clustering results," in Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN '16), pp. 2390–2396, IEEE, Vancouver, Canada, July 2016.

[12] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681–685, 2001.

[13] M. F. Hansen, J. Fagertun, and R. Larsen, "Elastic appearance models," in Proceedings of the 22nd British Machine Vision Conference (BMVC '11), September 2011.

[14] P. A. Tresadern, M. C. Ionita, and T. F. Cootes, "Real-time facial feature tracking on a mobile device," International Journal of Computer Vision, vol. 96, no. 3, pp. 280–289, 2012.

[15] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "Robust and efficient parametric face alignment," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 1847–1854, IEEE, Barcelona, Spain, November 2011.

[16] G. Tzimiropoulos and M. Pantic, "Optimization problems for fast AAM fitting in-the-wild," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 593–600, IEEE, Sydney, Australia, December 2013.

[17] M. H. Nguyen and F. De la Torre, "Local minima free parameterized appearance models," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.

[18] M. H. Nguyen and F. De La Torre, "Learning image alignment without local minima for face detection and tracking," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–7, IEEE, Amsterdam, Netherlands, September 2008.

[19] N. M. Hoai and F. De la Torre, "Metric learning for image alignment," International Journal of Computer Vision, vol. 88, no. 1, pp. 69–84, 2010.

[20] T. F. Cootes, G. V. Wheeler, K. N. Walker, and C. J. Taylor, "View-based active appearance models," Image and Vision Computing, vol. 20, no. 9-10, pp. 657–664, 2002.

[21] C. Vogler, Z. Li, A. Kanaujia, S. Goldenstein, and D. Metaxas, "The best of both worlds: combining 3D deformable models with active shape models," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–7, IEEE, Rio de Janeiro, Brazil, October 2007.

[22] X. Yu, J. Huang, S. Zhang, W. Yan, and D. N. Metaxas, "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 1944–1951, IEEE, Sydney, Australia, December 2013.

[23] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models: their training and application," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.

[24] L. Liang, R. Xiao, F. Wen, and J. Sun, "Face alignment via component-based discriminative search," in Computer Vision - ECCV 2008, vol. 5303 of Lecture Notes in Computer Science, pp. 72–85, Springer, Berlin, Germany, 2008.

[25] J. M. Saragih, S. Lucey, and J. F. Cohn, "Deformable model fitting by regularized landmark mean-shift," International Journal of Computer Vision, vol. 91, no. 2, pp. 200–215, 2011.

[26] H. Gao, H. K. Ekenel, and R. Stiefelhagen, "Face alignment using a ranking model based on regression trees," in Proceedings of the British Machine Vision Conference (BMVC '12), pp. 1–11, London, UK, September 2012.

[27] N. Duffy and D. Helmbold, "Boosting methods for regression," Machine Learning, vol. 47, no. 2-3, pp. 153–200, 2002.

[28] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, "Robust face landmark estimation under occlusion," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '13), pp. 1513–1520, Sydney, Australia, 2013.

[29] P. Dollar, P. Welinder, and P. Perona, "Cascaded pose regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 1078–1085, IEEE, San Francisco, Calif, USA, June 2010.

[30] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," International Journal of Computer Vision, vol. 107, no. 2, pp. 177–190, 2014.

[31] X. Xiong and F. De La Torre, "Supervised descent method and its applications to face alignment," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 532–539, Portland, Ore, USA, June 2013.

[32] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Proceedings of the 27th Asilomar Conference on Signals, Systems & Computers, pp. 40–44, Pacific Grove, Calif, USA, November 1993.

[33] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

[34] X. Gao, Y. Su, X. Li, and D. Tao, "A review of active appearance models," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 2, pp. 145–158, 2010.

Page 4: A Robust Shape Reconstruction Method for Facial Feature ... · 2 ComputationalIntelligenceandNeuroscience Learned model Shape increment reconstruction Training Testing {Δ +Δ +···+ΔS(A)}

4 Computational Intelligence and Neuroscience

Input Training set images and corresponding shapes 119868train = [1198681 1198682 119868119873] and 119878train = [1198781 1198782 119878119873] Testingset images and corresponding initial shapes 119868test = [1198681 1198682 1198681198731015840 ] and 119878initial = [119878

(1)10158401 119878(1)10158402 119878

(1)1015840

1198731015840] Sparse

coding parameters set 119879Δ 119879120601 size of119863Δ and size of119863 Total iterations 119876Output Final face shapes 119878test = [119878

(119876+1)10158401 119878

(119876+1)10158402 119878

(119876+1)1015840

1198731015840]

Training Stagefor 119896 from 1 to 119876 dostep 1 Given 119878lowast and 119878(119896) obtain Δ119878(119896) Then extract 120601(119896) from 119878(119896)step 2 Sequentially get 120574(119896)119863(119896)Δ and 120574(119896)Δ using Equations (11)step 3 Get119867(119896) using Equation (12)

end forTraining Model [119863(1)Δ 119863

(2)Δ 119863

(119876)Δ ] [119863

(1) 119863(2) 119863(119876)] and [119867(1) 119867(2) 119867(119876)]Testing Stagefor 119896 from 1 to 119876 dostep 1 Similarly extract 120601(119896)1015840 on 119878(119896)1015840step 2 Given119863(119896)Δ 119863

(119896) and119867(119896) calculate Δ119878(119896)1015840 using Equation (14)step 3 Obtain 119878(119896+1) using Equation (13)

end for

Pseudocode 1 Pseudocode of our proposed regression-based method

119879Δ and 119879120601 represent the shape increment sparsity and thelocal appearance sparsity respectively Given 120574(119896) and 120574(119896)Δ wecan get119867(119896) by

arg min119867(119896)

10038171003817100381710038171003817120574(119896)Δ minus 119867

(119896)120574(119896)100381710038171003817100381710038172

2 (12)

Finally we can generate a set of [119863(1)Δ 119863(2)Δ 119863

(119876)Δ ] [119863

(1)119863(2) 119863(119876)] and [119867(1) 119867(2) 119867(119876)] after 119876 iterationsHere 119876 is the number of iterations and 119896 = 1 2 119876

During the testing stage we can get the local appearancecoefficients 120574(119896minus1)1015840 using the already learned 119863(119896minus1) Thenthe final face shape is estimated using (16) and (17) after 119876iterations

119878(119896+1)1015840 = 119878(119896)1015840 + Δ119878(119896)1015840 (13)

Δ119878(119896+1)1015840 = 119863(119896)Δ 119867(119896)120574(119896)1015840 (14)

36 Major Contributions of the Proposed Method In thissection we summarize the following three contributions ofthe proposed method

(1) Sparse coding is utilized to learn a set of coupleddictionaries named the shape increment dictionaryand the local appearance dictionary The solved cor-responding sparse coefficients are embedded in aregression framework for approximating the ground-truth shape increments

(2) A way of alternate verification and local enumera-tion is applied for selecting the best parameter setin extensive experiments Moreover it is shown inexperimental results that the proposed method has astrong stability under different parameter settings

(3) We also rebuild testing conditions that the top 510 15 20 and 25 of the testing images are

removed according to the descending order sortedby the normalized alignment error And then theproposed method is compared with three classicalmethods on three publicly available face datasetsResults support that the proposed method achievesbetter detection accuracy and robustness than theother three methods

4 Experiments

41 FaceDatasets In this section three publicly available facedatasets are selected for performance comparison LabeledFace Parts in the Wild (LFPW-68 points and LFPW-29points [16]) and Caltech Occluded Faces in the Wild 2013(COFW) [34] In the downloaded LFPWdataset 811 trainingimages and 224 testing images are collected Both the 68pointsrsquo configuration and the 29 pointsrsquo configuration labeledfor the LFPW dataset are evaluated The COFW datasetincludes 1345 training images and 507 testing images andeach image is labeled with 29 facial feature points and relatedbinary occlusion information Particularly collected imagesin this dataset show a variety of occlusions and large shapevariations

42 Implementation Details

421 Codes The implementation codes of SDM [31] ExplicitShape Regression (ESR) [30] and Robust Cascaded PoseRegression (RCPR) [28] are got from the Internet Exceptthat the codes of RCPR and ESR are released on the personalwebsites by at least one of the authors we get the code of SDMfrom Github

422 Parameter Settings Generally the size of shape incre-ment dictionary and local appearance dictionary in ourmethod depends on the dimensionality of the HoG descrip-tor And in the following validation experiments we will

Computational Intelligence and Neuroscience 5

Table 1 Comparison of different parameter sets on LFPW (68 points) dataset Here 119879Δ is fixed to 2

119879Δ 119879120601 Size of119863Δ Size of119863 119876 119870 (tr amp ts) Mean errors

2 2

256 256 5 1 amp 1 0079381256 512 5 1 amp 1 008572512 256 5 1 amp 1 0085932512 512 5 1 amp 1 0086731

2 4

256 256 5 1 amp 1 0081958256 512 5 1 amp 1 0083715512 256 5 1 amp 1 0086187512 512 5 1 amp 1 0087157

2 6

256 256 5 1 amp 1 007937256 512 5 1 amp 1 007986512 256 5 1 amp 1 0075987512 512 5 1 amp 1 0084429

2 8

256 256 5 1 amp 1 0075863256 512 5 1 amp 1 0082588512 256 5 1 amp 1 0077048512 512 5 1 amp 1 0082644

2 10

256 256 5 1 amp 1 0076178256 512 5 1 amp 1 0076865512 256 5 1 amp 1 0080907512 512 5 1 amp 1 0088414

introduce how to select the best combination of parametersParameters settings of SDM ESR and RCPR are consistentwith the original settings reported in the papers In SDM theregression runs 5 stages In ESR the number of features in afern and candidate pixel features are 5 and 400 respectivelyTo build themodel themethod uses 10 and 500 stages to traina two-level boosted framework And in RCPR 15 iterations5 restarts 400 features and 100 random fern regressors areadopted

4.2.3. Assessment Criteria. In our experiments, we use the following equation to calculate and normalize the alignment errors. First, we calculate the localization error between the ground-truth point coordinates and the detected point coordinates, that is, the Euclidean distance between the two vectors. This distance is then normalized by the interocular distance as follows:

d_error = ||P − G||_2 / ||G_leye − G_reye||_2.  (15)

In (15), P denotes the detected facial point coordinates and G denotes the ground-truth point coordinates; G_leye and G_reye denote the ground-truth center coordinates of the left eye and the right eye, respectively.
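This normalization can be sketched as follows (a minimal Python sketch; the eye-landmark index lists `leye_idx` and `reye_idx` are hypothetical and depend on the annotation scheme used):

```python
import numpy as np

def normalized_alignment_error(P, G, leye_idx, reye_idx):
    """Mean point-to-point error normalized by the inter-ocular distance.

    P, G: (N, 2) arrays of detected and ground-truth landmark coordinates.
    leye_idx, reye_idx: indices of the landmarks forming each eye
    (hypothetical; the actual groupings depend on the annotation scheme).
    """
    P = np.asarray(P, dtype=float)
    G = np.asarray(G, dtype=float)
    # Eye centers taken as the mean of the ground-truth eye landmarks.
    leye_center = G[leye_idx].mean(axis=0)
    reye_center = G[reye_idx].mean(axis=0)
    interocular = np.linalg.norm(leye_center - reye_center)
    # Average Euclidean distance over all points, normalized.
    per_point = np.linalg.norm(P - G, axis=1)
    return per_point.mean() / interocular
```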

4.3. Experiments

4.3.1. Parameter Validation. In this section, we introduce how alternate verification and local enumeration are used to find the final parameter values. As described above, there are six variables that need to be fixed: T_Δ, T_φ, the size of D_Δ, the size of D, Q, and K; here K is the number of initializations during training and testing. Depending on the sparsity requirements, the candidate values of T_Δ and T_φ are selected from the following set:

T_Δ, T_φ ∈ {2, 4, 6, 8, 10}.  (16)

Similarly, the candidate sizes of D_Δ, the sizes of D, Q, and K form the following sets:

size of D_Δ ∈ {256, 512},
size of D ∈ {256, 512},
Q ∈ {1, 2, 3, 4, 5},
K ∈ {1, 6, 8, 10, 12, 14}.  (17)

First, the values of Q and K are set to 5 and 1, respectively. Note that the value of K in the testing stage should be equal to the value of K in the training stage. Then we set the value of T_φ to 2, 4, 6, 8, and 10 sequentially, while the sizes of D_Δ and D are selected in every combination. For different values of T_Δ, we obtain five groups of results, and Table 1 gives the detailed results when T_Δ is fixed to 2. From Table 1, we find that the parameter set {2, 8, 256, 256, 5, 1 & 1} achieves the lowest alignment error. Similarly, we conduct the remaining experiments and find the best parameter sets. The corresponding sparsity is also fixed, and therefore we obtain three sets of parameters: {6, 10, 512, 256}, {8, 8, 512, 256}, and {10, 10, 512, 256}. In Table 2 we test the multi-initialization and multiparameter strategies, where the regression runs 4 iterations and 10 initializations with different parameter settings. Finally, all point localizations are averaged to obtain the fusion result.
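The alternate-verification and local-enumeration procedure above is essentially a grid search over the candidate sets in (16) and (17). A minimal sketch, with a hypothetical `evaluate` callback standing in for one full training/testing run:

```python
import itertools

def grid_search(evaluate):
    """Enumerate candidate parameter sets and keep the lowest mean error.

    `evaluate` is a hypothetical callback that trains the model with one
    parameter set and returns the mean alignment error on the test set.
    """
    T_delta_set = [2, 4, 6, 8, 10]
    T_phi_set = [2, 4, 6, 8, 10]
    dict_sizes = [256, 512]
    best = (float("inf"), None)
    # Q and K are fixed to 5 and 1 during this enumeration, as in the paper.
    for T_d, T_p, sz_d, sz_a in itertools.product(
            T_delta_set, T_phi_set, dict_sizes, dict_sizes):
        err = evaluate(T_d, T_p, sz_d, sz_a, Q=5, K=1)
        if err < best[0]:
            best = (err, (T_d, T_p, sz_d, sz_a))
    return best
```

In the paper's procedure the winning set is then refined further by varying Q and K, which this sketch omits.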


Table 2: Comparison of multi-initialization and multiparameter strategies on the LFPW (68 points) dataset. Here Q and K are set to 4 and 10, respectively.

T_Δ  T_φ  Size of D_Δ  Size of D  Q  K (tr & ts)  Mean errors
6    10   512          256        4  10 & 10      0.062189
10   10   512          256        4  10 & 10      0.06075
8    8    512          256        4  10 & 10      0.061787
Fusion error (all three parameter sets combined): 0.055179

Table 3: Mean error of each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 5% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method      Contour  Eyebrow  Mouth   Nose    Eye
SDM         0.0829   0.0619   0.0478  0.0395  0.0369
ESR         0.0862   0.0750   0.0651  0.0596  0.0527
RCPR        0.0948   0.0690   0.0562  0.0493  0.0433
Our method  0.0747   0.0587   0.0455  0.0405  0.0392

(b) LFPW (29 points)

Method      Jaw      Eyebrow  Mouth   Nose    Eye
SDM         0.0422   0.0422   0.0410  0.0422  0.0328
ESR         0.0570   0.0459   0.0531  0.0502  0.0400
RCPR        0.0748   0.0507   0.0568  0.0509  0.0357
Our method  0.0382   0.0403   0.0401  0.0406  0.0323

(c) COFW (29 points)

Method      Jaw      Eyebrow  Mouth   Nose    Eye
SDM         0.0713   0.0714   0.0709  0.0600  0.0519
ESR         0.1507   0.1022   0.1082  0.0952  0.0801
RCPR        0.1209   0.0810   0.0781  0.0655  0.0539
Our method  0.0668   0.0642   0.0702  0.0567  0.0497

4.3.2. Comparison with Previous Methods. A small number of facial images exhibit large shape variations and severe occlusions, which challenges the random multi-initialization strategy because it fails to generate an appropriate starting shape. Therefore, we compare our method with three classic methods on rebuilt datasets. These datasets still include most of the images from LFPW (68 points), LFPW (29 points), and COFW (29 points); we only remove the top 5%, 10%, 15%, 20%, and 25% of the testing facial images in each dataset by sorting the alignment errors in descending order (see Figure 2).
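Rebuilding the datasets in this way amounts to trimming the worst-scoring fraction of test images before averaging. A minimal sketch, assuming per-image normalized errors have already been computed:

```python
def trimmed_mean_error(errors, remove_fraction):
    """Mean alignment error after dropping the worst `remove_fraction`
    of test images (those with the largest normalized errors)."""
    errors = sorted(errors, reverse=True)  # descending order
    n_remove = int(len(errors) * remove_fraction)
    kept = errors[n_remove:]
    return sum(kept) / len(kept)
```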

In Figure 2, all curves of COFW show a more dispersive distribution than those of the other two datasets. Since this dataset contains many more facial images with large shape variations and occlusions, the detection accuracy may be affected to some extent. Meanwhile, the irregular textural features around facial feature points are challenging for learning the structural model during training. Evidently, in Figure 2 the curves of our proposed method are superior to the others. Additionally, LFPW (68 points) and LFPW (29 points) share the same facial images but different face shapes, so we may find some useful information about the performance of the methods through these datasets.

In general, the more facial feature points there are, the more difficult they are to detect. Comparing the five facial components, the mean errors of the nose and eyes given in Tables 3 and 4 do not change noticeably across the three datasets, because the textural information around the eyes is easy to recognize and the textural information around the nose is less likely to be occluded. Moreover, the facial feature points located in the regions of the nose and eyes are denser than the points of the contour, which also benefits the regressive searching process.

Figure 3 shows the alignment errors of the four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. In Figure 3, the mean error curves show a rapid descending trend when the most difficult 5% of testing images are removed, which indicates that the statistical average can be biased by a few challenging images. Then, as the removal proportion increases, all the curves become smoother. Figure 3 shows that our proposed method is more stable than the other methods, which means our training model is robust in dealing with occlusions and large shape variations.

Specifically, we plot the detection curves of the five facial components in Figure 4. It is obvious in Figure 4 that ESR and RCPR have a less competitive performance in localizing each facial component. Our method shows better robustness in localizing feature points that belong to the eyebrows and contour, since these two facial components are very likely to


[Figure 2: panels (a)-(f), each with three CED plots (LFPW 68 points, LFPW 29 points, COFW 29 points) of proportion of database images versus relative error, comparing our method, SDM, RCPR, and ESR.]

Figure 2: Cumulative Error Distribution (CED) curves of four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top (a) 0%, (b) 5%, (c) 10%, (d) 15%, (e) 20%, and (f) 25% of the testing images are removed according to descending order of the normalized alignment errors.
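A CED curve of this kind reports, for each error threshold, the fraction of test images whose normalized alignment error does not exceed it. A minimal sketch:

```python
import numpy as np

def ced_curve(errors, thresholds):
    """Cumulative Error Distribution: fraction of images whose
    normalized alignment error is at most each threshold."""
    errors = np.asarray(errors, dtype=float)
    return np.array([(errors <= t).mean() for t in thresholds])
```

Plotting the returned fractions against the thresholds reproduces one curve of Figure 2; a curve closer to the upper-left corner indicates a better method.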

Table 4: Mean error of each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 25% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method      Contour  Eyebrow  Mouth   Nose    Eye
SDM         0.0718   0.0581   0.0433  0.0363  0.0337
ESR         0.0746   0.0634   0.0535  0.0435  0.0408
RCPR        0.0830   0.0615   0.0484  0.0414  0.0367
Our method  0.0634   0.0518   0.0406  0.0364  0.0332

(b) LFPW (29 points)

Method      Jaw      Eyebrow  Mouth   Nose    Eye
SDM         0.0385   0.0381   0.0376  0.0389  0.0295
ESR         0.0498   0.0419   0.0461  0.0435  0.0350
RCPR        0.0637   0.0460   0.0471  0.0433  0.0309
Our method  0.0336   0.0360   0.0365  0.0362  0.0292

(c) COFW (29 points)

Method      Jaw      Eyebrow  Mouth   Nose    Eye
SDM         0.0607   0.0643   0.0614  0.0525  0.0457
ESR         0.1255   0.0873   0.0882  0.0781  0.0664
RCPR        0.1055   0.0679   0.0633  0.0533  0.0440
Our method  0.0561   0.0569   0.0603  0.0497  0.0435

[Figure 3: three panels (LFPW 68 points, LFPW 29 points, COFW 29 points) plotting average alignment error against the proportion of removed database images (0-25%) for our method, SDM, RCPR, and ESR.]

Figure 3: Mean errors of four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets.


[Figure 4: five rows of panels (a)-(e), one per facial component, each plotting average alignment error against the proportion of removed database images (0-25%) on LFPW (68 points), LFPW (29 points), and COFW (29 points) for our method, SDM, ESR, and RCPR.]

Figure 4: Facial feature point detection curves of four methods for each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. (a) Eyebrow. (b) Eye. (c) Nose. (d) Mouth. (e) Contour or jaw.


be occluded by hair or objects and have a more separated distribution pattern. The experimental results demonstrate that our proposed method can estimate facial feature points with high accuracy and is able to handle face alignment under complex occlusions and large shape variations.

5. Conclusion

A robust sparse reconstruction method for facial feature point detection is proposed in this paper. In this method, we build the regressive training model by learning a coupled set of shape increment dictionaries and local appearance dictionaries, which encode various facial poses and rich local textures. We then apply the sparse model to infer the final face shape of an input image by continuous reconstruction of shape increments. Moreover, in order to find the best matched parameters, we perform extensive validation tests using alternate verification and local enumeration. The comparison results show that our sparse coding based reconstruction model has strong stability. In the later experiments, we compare our proposed method with three classic methods on three publicly available face datasets, removing the top 0%, 5%, 10%, 15%, 20%, and 25% of the testing facial images according to the descending order of alignment errors. The experimental results also support that our method is superior to the others in detection accuracy and robustness.
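The reconstruction step summarized above can be illustrated with a simplified sketch: the local appearance feature is sparse-coded over the appearance dictionary (here via orthogonal matching pursuit [32]), and the resulting code rebuilds the shape increment from the coupled dictionary. The function names and dictionary shapes below are illustrative only, not the paper's actual implementation:

```python
import numpy as np

def omp(D, y, sparsity):
    """Orthogonal matching pursuit: approximate y ~ D @ x with at most
    `sparsity` nonzero coefficients; D is assumed to have unit-norm columns."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(sparsity):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit over the selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = y - D[:, support] @ coef
    return x

def reconstruct_increment(D_app, D_shape, feature, sparsity):
    """Code the appearance feature over D_app, then rebuild the shape
    increment from the coupled dictionary D_shape using the same code."""
    code = omp(D_app, feature, sparsity)
    return D_shape @ code
```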

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is supported by the National Key Research & Development Plan of China (no. 2016YFB1001401) and the National Natural Science Foundation of China (no. 61572110).

References

[1] Z. Zhang, P. Luo, C. Change Loy, and X. Tang, "Facial landmark detection by deep multi-task learning," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 94-108, Springer International, Zurich, Switzerland, 2014.

[2] L. Vuong, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, "Interactive facial feature localization," in Proceedings of the European Conference on Computer Vision, pp. 679-692, Springer, Florence, Italy, October 2012.

[3] H.-S. Lee and D. Kim, "Tensor-based AAM with continuous variation estimation: application to variation-robust face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 1102-1116, 2009.

[4] T. Weise, S. Bouaziz, L. Hao, and M. Pauly, "Realtime performance-based facial animation," ACM Transactions on Graphics, vol. 30, no. 4, p. 77, 2011.

[5] S. W. Chew, P. Lucey, S. Lucey, J. Saragih, J. F. Cohn, and S. Sridharan, "Person-independent facial expression detection using constrained local models," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG '11), pp. 915-920, IEEE, Santa Barbara, Calif, USA, March 2011.

[6] C. Huang, X. Ding, and C. Fang, "Pose robust face tracking by combining view-based AAMs and temporal filters," Computer Vision and Image Understanding, vol. 116, no. 7, pp. 777-792, 2012.

[7] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 1794-1801, IEEE, Miami, Fla, USA, June 2009.

[8] S. Gao, I. W.-H. Tsang, L.-T. Chia, and P. Zhao, "Local features are not lonely: Laplacian sparse coding for image classification," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3555-3561, IEEE, June 2010.

[9] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, 2009.

[10] M. Yang, L. Zhang, J. Yang, and D. Zhang, "Robust sparse coding for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 625-632, IEEE, Providence, RI, USA, June 2011.

[11] D.-S. Pham, O. Arandjelovic, and S. Venkatesh, "Achieving stable subspace clustering by post-processing generic clustering results," in Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN '16), pp. 2390-2396, IEEE, Vancouver, Canada, July 2016.

[12] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, 2001.

[13] M. F. Hansen, J. Fagertun, and R. Larsen, "Elastic appearance models," in Proceedings of the 22nd British Machine Vision Conference (BMVC '11), September 2011.

[14] P. A. Tresadern, M. C. Ionita, and T. F. Cootes, "Real-time facial feature tracking on a mobile device," International Journal of Computer Vision, vol. 96, no. 3, pp. 280-289, 2012.

[15] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "Robust and efficient parametric face alignment," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 1847-1854, IEEE, Barcelona, Spain, November 2011.

[16] G. Tzimiropoulos and M. Pantic, "Optimization problems for fast AAM fitting in-the-wild," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 593-600, IEEE, Sydney, Australia, December 2013.

[17] M. H. Nguyen and F. De la Torre, "Local minima free parameterized appearance models," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1-8, IEEE, Anchorage, Alaska, USA, June 2008.

[18] M. H. Nguyen and F. De La Torre, "Learning image alignment without local minima for face detection and tracking," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1-7, IEEE, Amsterdam, Netherlands, September 2008.

[19] N. M. Hoai and F. De la Torre, "Metric learning for image alignment," International Journal of Computer Vision, vol. 88, no. 1, pp. 69-84, 2010.

[20] T. F. Cootes, G. V. Wheeler, K. N. Walker, and C. J. Taylor, "View-based active appearance models," Image and Vision Computing, vol. 20, no. 9-10, pp. 657-664, 2002.

[21] C. Vogler, Z. Li, A. Kanaujia, S. Goldenstein, and D. Metaxas, "The best of both worlds: combining 3D deformable models with active shape models," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1-7, IEEE, Rio de Janeiro, Brazil, October 2007.

[22] X. Yu, J. Huang, S. Zhang, W. Yan, and D. N. Metaxas, "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 1944-1951, IEEE, Sydney, Australia, December 2013.

[23] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models: their training and application," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38-59, 1995.

[24] L. Liang, R. Xiao, F. Wen, and J. Sun, "Face alignment via component-based discriminative search," in Computer Vision - ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part II, vol. 5303 of Lecture Notes in Computer Science, pp. 72-85, Springer, Berlin, Germany, 2008.

[25] J. M. Saragih, S. Lucey, and J. F. Cohn, "Deformable model fitting by regularized landmark mean-shift," International Journal of Computer Vision, vol. 91, no. 2, pp. 200-215, 2011.

[26] H. Gao, H. K. Ekenel, and R. Stiefelhagen, "Face alignment using a ranking model based on regression trees," in Proceedings of the British Machine Vision Conference (BMVC '12), pp. 1-11, London, UK, September 2012.

[27] N. Duffy and D. Helmbold, "Boosting methods for regression," Machine Learning, vol. 47, no. 2-3, pp. 153-200, 2002.

[28] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, "Robust face landmark estimation under occlusion," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '13), pp. 1513-1520, Sydney, Australia, 2013.

[29] P. Dollar, P. Welinder, and P. Perona, "Cascaded pose regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1078-1085, IEEE, San Francisco, Calif, USA, June 2010.

[30] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," International Journal of Computer Vision, vol. 107, no. 2, pp. 177-190, 2014.

[31] X. Xiong and F. De La Torre, "Supervised descent method and its applications to face alignment," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 532-539, Portland, Ore, USA, June 2013.

[32] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Proceedings of the 27th Asilomar Conference on Signals, Systems & Computers, pp. 40-44, Pacific Grove, Calif, USA, November 1993.

[33] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311-4322, 2006.

[34] X. Gao, Y. Su, X. Li, and D. Tao, "A review of active appearance models," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 2, pp. 145-158, 2010.

Page 5: A Robust Shape Reconstruction Method for Facial Feature ... · 2 ComputationalIntelligenceandNeuroscience Learned model Shape increment reconstruction Training Testing {Δ +Δ +···+ΔS(A)}

Computational Intelligence and Neuroscience 5

Table 1 Comparison of different parameter sets on LFPW (68 points) dataset Here 119879Δ is fixed to 2

119879Δ 119879120601 Size of119863Δ Size of119863 119876 119870 (tr amp ts) Mean errors

2 2

256 256 5 1 amp 1 0079381256 512 5 1 amp 1 008572512 256 5 1 amp 1 0085932512 512 5 1 amp 1 0086731

2 4

256 256 5 1 amp 1 0081958256 512 5 1 amp 1 0083715512 256 5 1 amp 1 0086187512 512 5 1 amp 1 0087157

2 6

256 256 5 1 amp 1 007937256 512 5 1 amp 1 007986512 256 5 1 amp 1 0075987512 512 5 1 amp 1 0084429

2 8

256 256 5 1 amp 1 0075863256 512 5 1 amp 1 0082588512 256 5 1 amp 1 0077048512 512 5 1 amp 1 0082644

2 10

256 256 5 1 amp 1 0076178256 512 5 1 amp 1 0076865512 256 5 1 amp 1 0080907512 512 5 1 amp 1 0088414

introduce how to select the best combination of parametersParameters settings of SDM ESR and RCPR are consistentwith the original settings reported in the papers In SDM theregression runs 5 stages In ESR the number of features in afern and candidate pixel features are 5 and 400 respectivelyTo build themodel themethod uses 10 and 500 stages to traina two-level boosted framework And in RCPR 15 iterations5 restarts 400 features and 100 random fern regressors areadopted

423 Assessment Criteria In our experiments we use thefollowing equation to calculate and normalize the alignmenterrors Firstly we calculate the localization errors betweenthe ground-truth point coordinates and the detected pointcoordinates that is the Euclidean distance between twovectors Then it is further normalized by the interoculardistance as follows

119889error =119875 minus 119866210038171003817100381710038171003817119866leye minus 119866reye

100381710038171003817100381710038172 (15)

In (15) 119875 denotes the detected facial point coordinates and 119866denotes the ground-truth point coordinates 119866leye and 119866reyedenote the ground-truth center coordinates of left eye andright eye respectively

43 Experiments

431 Parameter Validation In this section we will introducehow to use the way of alternate verification and local enu-meration to find the final values of parameters As describedabove there are six variables 119879Δ 119879120601 size of 119863Δ size of 119863119876 and 119870 that need to be fixed here 119870 is the initialization

number during the process of training and testing Depend-ing on the requirements of sparsity the candidate values of119879Δ and 119879120601 are selected from the following set

119879Δ 119879120601 isin 2 4 6 8 10 (16)

Similarly the candidate sizes of119863Δ sizes of119863119876 and119870 formthe following sets

size of 119863Δ isin 256 512

size of 119863 isin 256 512

119876 isin 1 2 3 4 5

119870 isin 1 6 8 10 12 14

(17)

Firstly the values of119876 and119870 are set to 5 and 1 respectivelyNote that the value of 119870 in the testing stage should be equalto the value of 119870 in the training stage Then we set the valueof 119879120601 to 2 4 6 8 and 10 sequentially The size of 119863Δ and 119863is selected in random combination For different values of 119879Δwe can get five groups of results AndTable 1 gives the detailedresults when 119879Δ is fixed to 2 From Table 1 we may find thatthe parameter set 2 8 256 256 5 1 amp 1 achieves the lowestalignment error Similarly we conduct the rest experimentsand find the best parameter sets The corresponding sparsityis also fixed and therefore we get three sets of parameters thatare 6 10 512 256 8 8 512 256 and 10 10 512 256 InTable 2 we test the multi-initialization and multiparameterstrategies while the regression runs 4 iterations and 10initializations with different parameter settings In the finalall point localizations are averaged to get the fusion result

6 Computational Intelligence and Neuroscience

Table 2 Comparison of multi-initialization and multiparameter strategies on LFPW (68 points) dataset Here 119876 and 119870 are set to 4 and 10respectively

119879Δ 119879120601 Size of119863Δ Size of119863 119876 119870 (tr amp ts) Mean errors Fusion errors6 10 512 256 4 10 amp 10 0062189

005517910 10 512 256 4 10 amp 10 0060758 8 512 256 4 10 amp 10 0061787

Table 3Mean error of each facial component on LFPW (68 points) LFPW (29 points) and COFW (29 points) datasetsThe top 5maximalmean errors of the testing facial images in each dataset are removed

(a)

Method Contour Eyebrow Mouth Nose Eye

LFPW (68 points)

SDM 00829 00619 00478 00395 00369ESR 00862 00750 00651 00596 00527RCPR 00948 00690 00562 00493 00433

Our method 00747 00587 00455 00405 00392

(b)

Method Jaw Eyebrow Mouth Nose Eye

LFPW (29 points)

SDM 00422 00422 00410 00422 00328ESR 00570 00459 00531 00502 00400RCPR 00748 00507 00568 00509 00357

Our method 00382 00403 00401 00406 00323

(c)

Method Jaw Eyebrow Mouth Nose Eye

COFW (29 points)

SDM 00713 00714 00709 00600 00519ESR 01507 01022 01082 00952 00801RCPR 01209 00810 00781 00655 00539

Our method 00668 00642 00702 00567 00497

432 Comparison with Previous Methods Due to the exis-tence of a small number of facial images having large shapevariations and severe occlusions it challenges the randommulti-initialization strategy which fails to generate an appro-priate starting shapeTherefore we compare ourmethod withthree classic methods on rebuilt datasets These datasets stillinclude most of the images coming from LFPW (68 points)LFPW (29 points) and COFW (29 points) We just removethe top 5 10 15 20 and 25 of the testing facialimages in each dataset by sorting the alignment errors in adescending order (see Figure 2)

In Figure 2 all curves of COFW show a more dispersivedistribution than the other two datasets Since this datasetconsists of many more facial images with large shape vari-ations and occlusions it may affect the detection accuracymore or less Meanwhile the irregular textural featuresaround facial feature points are challenging for learning ofstructural model during the training Obviously in Figure 2the curves of our proposedmethod are superior to the othersAdditionally the LFPW (68 points) and LFPW (29 points)share the same facial images but different face shapes so wemay find some useful information about the performance ofmethods through these datasets

In general the more facial feature points are the moredifficult they are to detect By comparing among five facial

components themean errors of nose and eyes given in Tables3 and 4 do not change obviously across three datasets becausethe vicinal textural information of eyes is easy to recognizeand the textural information aroundnose has a less possibilityto be occluded Moreover the facial feature points locatedin the regions of nose and eyes are denser than the pointsof contour which is also benefit to the regressive searchingprocess

Figure 3 shows the alignment errors of four methodstested on LFPW (68 points) LFPW (29 points) and COFW(29 points) datasets In Figure 3 we may find that the meanerror curves show a rapid descending trend when the mostdifficult 5 of testing images are removed It indicates thatthe statistical average can be biased by a few challenge imagesThen as the removal proportion increases all the curvesbecome smoother It shows in Figure 3 that our proposedmethod is more stable than other methods which means ourtraining model has a robustness in dealing with occlusionsand large shape variations

Specifically we plot the detection curves of five facialcomponents in Figure 4 It is obvious in Figure 4 that ESR andRCPR has a less competitive performance for localizing eachfacial components And our method shows better robustnessin localizing feature points that belong to eyebrows andcontour since these two facial components are very likely to

Computational Intelligence and Neuroscience 7

LFPW (68 points) CED LFPW (29 points) CED COFW (29 points) CED

002040608

1

002040608

1

002040608

1

01 02 03 04 050Relative error

01 02 03 04 050Relative error

01 02 03 04 050Relative error

Our methodSDM

RCPRESR

Our methodSDM

RCPRESR

Our methodSDM

RCPRESR

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

(a)

LFPW (68 points) CED LFPW (29 points) CED COFW (29 points) CED

002040608

1

002040608

1

002040608

1

01 02 03 04 050Relative error

01 02 03 04 050Relative error

01 02 03 04 050Relative error

Our methodSDM

RCPRESR

Our methodSDM

RCPRESR

Our methodSDM

RCPRESR

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

(b)

LFPW (68 points) CED LFPW (29 points) CED COFW (29 points) CED

002040608

1

002040608

1

002040608

1

01 02 03 04 050Relative error

01 02 03 04 050Relative error

01 02 03 04 050Relative error

Our methodSDM

RCPRESR

Our methodSDM

RCPRESR

Our methodSDM

RCPRESR

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

(c)

LFPW (68 points) CED LFPW (29 points) CED COFW (29 points) CED

002040608

1

002040608

1

002040608

1

01 02 03 04 050Relative error

01 02 03 04 050Relative error

01 02 03 04 050Relative error

Our methodSDM

RCPRESR

Our methodSDM

RCPRESR

Our methodSDM

RCPRESR

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

(d)

LFPW (68 points) CED LFPW (29 points) CED COFW (29 points) CED

Our methodSDM

RCPRESR

002040608

1

Our methodSDM

RCPRESR

002040608

1

Our methodSDM

RCPRESR

002040608

1

01 02 03 04 050Relative error

01 02 03 04 050Relative error

01 02 03 04 050Relative error

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

(e)

Figure 2 Continued

8 Computational Intelligence and Neuroscience

LFPW (68 points) CED LFPW (29 points) CED COFW (29 points) CED

Our methodSDM

RCPRESR

002040608

1

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

01 02 03 04 050Relative error

01 02 03 04 050Relative error

01 02 03 04 050Relative error

Our methodSDM

RCPRESR

002040608

1

Our methodSDM

RCPRESR

002040608

1

(f)

Figure 2 Cumulative Error Distribution (CED) curves of four methods tested on LFPW (68 points) LFPW (29 points) and COFW (29points) datasetsThe top (a) 0 (b) 5 (c) 10 (d) 15 (e) 20 and (f) 25 of the testing images are removed according to the descendingorder sorted by the normalized alignment errors

Table 4Mean error of each facial component on LFPW(68 points) LFPW(29 points) andCOFW(29 points) datasetsThe top 25maximalmean errors of the testing facial images in each dataset are removed

(a)

Method Contour Eyebrow Mouth Nose Eye

LFPW (68 points)

SDM 00718 00581 00433 00363 00337ESR 00746 00634 00535 00435 00408RCPR 00830 00615 00484 00414 00367

Our method 00634 00518 00406 00364 00332

(b)

Method Jaw Eyebrow Mouth Nose Eye

LFPW (29 points)

SDM 00385 00381 00376 00389 00295ESR 00498 00419 00461 00435 00350RCPR 00637 00460 00471 00433 00309

Our method 00336 00360 00365 00362 00292

(c)

Method Jaw Eyebrow Mouth Nose Eye

COFW (29 points)

SDM 00607 00643 00614 00525 00457ESR 01255 00873 00882 00781 00664RCPR 01055 00679 00633 00533 00440

Our method 00561 00569 00603 00497 00435

LFPW (68 points)

Our methodSDM

RCPRESR

LFPW (29 points) COFW (29 points)

0045005

0055006

0065007

0075008

Aver

age a

lignm

ent e

rror

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

RCPRESR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

RCPRESR

5 10 15 20 250Proportion of removal database images ()

0030035

0040045

0050055

Aver

age a

lignm

ent e

rror

005006007008009

01011

Aver

age a

lignm

ent e

rror

Figure 3 Mean errors of four methods tested on LFPW (68 points) LFPW (29 points) and COFW (29 points) datasets

Computational Intelligence and Neuroscience 9

LFPW (68 points) eyebrow

Our methodSDM

ESRRCPR

LFPW (29 points) eyebrow COFW (29 points) eyebrow

0050055006

0065007

0075008

0085

Aver

age a

lignm

ent

erro

r

Aver

age a

lignm

ent

erro

r

Aver

age a

lignm

ent

erro

r

0040045005

0055006

00400600801

012014

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

(a)

LFPW (68 points) eye LFPW (29 points) eye COFW (29 points) eye

Our methodSDM

ESRRCPR

Aver

age a

lignm

ent

erro

r

0030035004

0045005

0055006

0065Av

erag

e alig

nmen

ter

ror

0025003

0035004

0045005

0055

Aver

age a

lignm

ent

erro

r

00400500600700800901

011

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

(b)

LFPW (68 points) nose LFPW (29 points) nose COFW (29 points) nose

Our methodSDM

ESRRCPR

Aver

age a

lignm

ent

erro

r

003004005006007008

Aver

age a

lignm

ent

erro

r

0040045005

0055006

0065007

Aver

age a

lignm

ent

erro

r00400600801

012014

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

(c)

Aver

age a

lignm

ent

erro

r

LFPW (68 points) mouth LFPW (29 points) mouth COFW (29 points) mouth

Our methodSDM

ESRRCPR

004

005

006

007

008

Aver

age a

lignm

ent

erro

r

003004005006007008

Aver

age a

lignm

ent

erro

r

00600801

012014016

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

(d)

Aver

age a

lignm

ent

erro

r

LFPW (68 points) contour LFPW (29 points) jaw COFW (29 points) jaw

Our methodSDM

ESRRCPR

Our methodSDM

ESRRCPR

Our methodSDM

ESRRCPR

00600700800901

011

Aver

age a

lignm

ent

erro

r

003004005006007008009

Aver

age a

lignm

ent

erro

r

005

01

015

02

025

5 10 15 20 250Proportion of removal database images ()

5 10 15 20 250Proportion of removal database images ()

5 10 15 20 250Proportion of removal database images ()

(e)

Figure 4 Facial feature point detection curves of four methods for each facial component on LFPW (68 points) LFPW (29 points) andCOFW (29 points) datasets (a) Eyebrow (b) Eye (c) Nose (d) Mouth (e) Contour or jaw

10 Computational Intelligence and Neuroscience

be occluded by hair or objects and have a more separateddistribution pattern Experimental results demonstrates thatour proposed method can estimate facial feature points withhigh accuracy and is able to deal with the task of facealignment on complex occlusions and large shape variations

5 Conclusion

A robust sparse reconstruction method for facial featurepoint detection is proposed in this paper In the methodwe build the regressive training model by learning a coupledset of shape increment dictionaries and local appearancedictionaries which are learned to encode various facial posesand rich local textures And then we apply the sparse modelto infer the final face shape locations of an input image bycontinuous reconstruction of shape increments Moreoverin order to find the best matched parameters we performextensive validation tests by using the way of alternate veri-fication and local enumeration It shows in the comparisonresults that our sparse coding based reconstruction modelhas a strong stability In the later experiments we compareour proposed method with three classic methods on threepublicly available face datasets when removing the top 05 10 15 20 and 25 of the testing facial imagesaccording to the descending order of alignment errors Theexperimental results also support that ourmethod is superiorto the others in detection accuracy and robustness

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the National Key Researchamp Development Plan of China (no 2016YFB1001401)and National Natural Science Foundation of China (no61572110)

References

[1] Z Zhang P Luo C Change Loy and X Tang ldquoFacial landmarkdetection by deep multi-task learningrdquo in Proceedings of theEuropean Conference on Computer Vision (ECCV rsquo14) pp 94ndash108 Springer International Zurich Switzerland 2014

[2] L Vuong J Brandt Z Lin L Bourdev and T S Huang ldquoInter-active facial feature localizationrdquo in Proceedings of the Euro-pean Conference on Computer Vision pp 679ndash692 SpringerFlorence Italy October 2012

[3] H-S Lee and D Kim ldquoTensor-based AAM with continuousvariation estimation application to variation-robust face recog-nitionrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 31 no 6 pp 1102ndash1116 2009

[4] T Weise S Bouaziz L Hao and M Pauly ldquoRealtime perform-ance-based facial animationrdquo ACM Transactions on Graphicsvol 30 no 4 p 77 2011

[5] S W Chew P Lucey S Lucey J Saragih J F Cohn andS Sridharan ldquoPerson-independent facial expression detectionusing constrained local modelsrdquo in Proceedings of the IEEE

International Conference on Automatic Face and Gesture Recog-nition andWorkshops (FG rsquo11) pp 915ndash920 IEEE Santa BarbaraCalif USA March 2011

[6] C Huang X Ding and C Fang ldquoPose robust face tracking bycombining view-based AAMs and temporal filtersrdquo ComputerVision and Image Understanding vol 116 no 7 pp 777ndash7922012

[7] J Yang K Yu Y Gong and T Huang ldquoLinear spatial pyramidmatching using sparse coding for image classificationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo09) pp 1794ndash1801 IEEE MiamiFla USA June 2009

[8] S Gao I W-H Tsang L-T Chia and P Zhao ldquoLocal featuresare not lonelymdashLaplacian sparse coding for image classifica-tionrdquo in Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo10) pp 3555ndash3561 IEEE June 2010

[9] JWright A Y Yang A Ganesh S S Sastry and YMa ldquoRobustface recognition via sparse representationrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 31 no 2 pp210ndash227 2009

[10] M Yang L Zhang J Yang and D Zhang ldquoRobust sparse cod-ing for face recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo11) pp625ndash632 IEEE Providence RI USA June 2011

[11] D-S Pham O Arandjelovic and S Venkatesh ldquoAchievingstable subspace clustering by post-processing generic clusteringresultsrdquo inProceedings of the IEEE International Joint Conferenceon Neural Networks (IJCNN rsquo16) pp 2390ndash2396 IEEE Vancou-ver Canada July 2016

[12] T F Cootes G J Edwards and C J Taylor ldquoActive appearancemodelsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 23 no 6 pp 681ndash685 2001

[13] M F Hansen J Fagertun and R Larsen ldquoElastic appearancemodelsrdquo in Proceedings of the 22nd British Machine VisionConference (BMVC rsquo11) September 2011

[14] P A Tresadern M C Ionita and T F Cootes ldquoReal-time facialfeature tracking on a mobile devicerdquo International Journal ofComputer Vision vol 96 no 3 pp 280ndash289 2012

[15] G Tzimiropoulos S Zafeiriou and M Pantic ldquoRobust andefficient parametric face alignmentrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp1847ndash1854 IEEE Barcelona Spain November 2011

[16] G Tzimiropoulos and M Pantic ldquoOptimization problems forfast AAM fitting in-the-wildrdquo in Proceedings of the 14th IEEEInternational Conference on Computer Vision (ICCV rsquo13) pp593ndash600 IEEE Sydney Australia December 2013

[17] MHNguyen andFDe laTorre ldquoLocalminima free parameter-ized appearance modelsrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo08) pp 1ndash8 IEEE Anchorage Alaska USA June 2008

[18] M H Nguyen and F De La Torre ldquoLearning image align-ment without local minima for face detection and trackingrdquoin Proceedings of the 8th IEEE International Conference onAutomatic Face and Gesture Recognition (FG rsquo08) pp 1ndash7 IEEEAmsterdam Netherlands September 2008

[19] N M Hoai and F De la Torre ldquoMetric learning for imagealignmentrdquo International Journal of Computer Vision vol 88no 1 pp 69ndash84 2010

[20] T F Cootes GVWheeler KNWalker andC J Taylor ldquoView-based active appearance modelsrdquo Image and Vision Computingvol 20 no 9-10 pp 657ndash664 2002

Computational Intelligence and Neuroscience 11

[21] C Vogler Z Li A Kanaujia S Goldenstein and D MetaxasldquoThe best of both worlds combining 3D deformable modelswith active shape modelsrdquo in Proceedings of the IEEE 11thInternational Conference on Computer Vision (ICCV rsquo07) pp 1ndash7 IEEE Rio de Janeiro Brazil October 2007

[22] X Yu J Huang S Zhang W Yan and D N Metaxas ldquoPose-free facial landmark fitting via optimized part mixtures andcascaded deformable shape modelrdquo in Proceedings of the 14thIEEE International Conference on Computer Vision (ICCV rsquo13)pp 1944ndash1951 IEEE Sydney Australia December 2013

[23] T F Cootes C J Taylor D H Cooper and J Graham ldquoActiveshape models-their training and applicationrdquo in ComputerVision and Image Understanding vol 61 no 1 pp 38ndash59 1995

[24] L Liang R Xiao F Wen and J Sun ldquoFace alignment viacomponent-based discriminative searchrdquo inComputer VisionmdashECCV 2008 10th European Conference on Computer VisionMarseille France October 12-18 2008 Proceedings Part II vol5303 of Lecture Notes in Computer Science pp 72ndash85 SpringerBerlin Germany 2008

[25] JM Saragih S Lucey and J F Cohn ldquoDeformablemodel fittingby regularized landmark mean-shiftrdquo International Journal ofComputer Vision vol 91 no 2 pp 200ndash215 2011

[26] H Gao H K Ekenel and R Stiefelhagen ldquoFace alignmentusing a rankingmodel based on regression treesrdquo in Proceedingsof the British Machine Vision Conference (BMVC rsquo12) pp 1ndash11London UK September 2012

[27] N Duffy and D Helmbold ldquoBoosting methods for regressionrdquoMachine Learning vol 47 no 2-3 pp 153ndash200 2002

[28] X P Burgos-Artizzu P Perona and P Dollar ldquoRobust facelandmark estimation under occlusionrdquo in Proceedings of theIEEE International Conference on Computer Vision (ICCV rsquo13)pp 1513ndash1520 Sydney Australia 2013

[29] P Dollar P Welinder and P Perona ldquoCascaded pose regres-sionrdquo in Proceedings of the IEEE Conference on Computer Visionand Pattern Recognition (CVPR) pp 1078ndash1085 IEEE SanFrancisco Calif USA June 2010

[30] X Cao Y Wei F Wen and J Sun ldquoFace alignment by explicitshape regressionrdquo International Journal of Computer Vision vol107 no 2 pp 177ndash190 2014

[31] X Xiong and F De La Torre ldquoSupervised descent method andits applications to face alignmentrdquo in Proceedings of the 26thIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo13) pp 532ndash539 Portland Ore USA June 2013

[32] Y C Pati R Rezaiifar and P S Krishnaprasad ldquoOrthogonalmatching pursuit recursive function approximation with appli-cations to wavelet decompositionrdquo in Proceedings of the 27thAsilomar Conference on Signals Systems amp Computers pp 40ndash44 Pacific Grove Calif USA November 1993

[33] M Aharon M Elad and A Bruckstein ldquoK-SVD an algorithmfor designing overcomplete dictionaries for sparse representa-tionrdquo IEEE Transactions on Signal Processing vol 54 no 11 pp4311ndash4322 2006

[34] X Gao Y Su X Li and D Tao ldquoA review of active appearancemodelsrdquo IEEE Transactions on Systems Man and CyberneticsPart C Applications and Reviews vol 40 no 2 pp 145ndash1582010

Page 6: A Robust Shape Reconstruction Method for Facial Feature ... · 2 ComputationalIntelligenceandNeuroscience Learned model Shape increment reconstruction Training Testing {Δ +Δ +···+ΔS(A)}

6 Computational Intelligence and Neuroscience

Table 2: Comparison of multi-initialization and multiparameter strategies on the LFPW (68 points) dataset. Here Q and K are set to 4 and 10, respectively.

T_Δ   T_φ   Size of D_Δ   Size of D_φ   Q   K (tr & ts)   Mean error
 6    10    512           256           4   10 & 10       0.062189
10    10    512           256           4   10 & 10       0.06075
 8     8    512           256           4   10 & 10       0.061787

Fusion error (over the three configurations): 0.055179
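The fusion error in Table 2 combines the outputs of the Q random initializations into a single shape estimate. The paper does not restate the fusion operator at this point; the element-wise median is a common, outlier-robust choice in cascaded regression (e.g., in RCPR), sketched below with hypothetical landmark values:

```python
import numpy as np

def fuse_predictions(shapes):
    """Fuse face shapes predicted from multiple random initializations.

    shapes: (Q, 2 * n_points) array, one predicted shape per initialization.
    The element-wise median is assumed here as the fusion operator; the
    paper does not spell out its exact choice in this section.
    """
    shapes = np.asarray(shapes, dtype=float)
    return np.median(shapes, axis=0)

# Three hypothetical 2-point shapes laid out as (x1, y1, x2, y2);
# the median suppresses the outlying third prediction.
preds = [[0.0, 0.0, 1.0, 1.0],
         [0.1, 0.1, 1.1, 1.1],
         [0.9, 0.9, 1.9, 1.9]]
print(fuse_predictions(preds))  # -> [0.1 0.1 1.1 1.1]
```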

Table 3: Mean error of each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 5% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method       Contour   Eyebrow   Mouth    Nose     Eye
SDM          0.0829    0.0619    0.0478   0.0395   0.0369
ESR          0.0862    0.0750    0.0651   0.0596   0.0527
RCPR         0.0948    0.0690    0.0562   0.0493   0.0433
Our method   0.0747    0.0587    0.0455   0.0405   0.0392

(b) LFPW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0422    0.0422    0.0410   0.0422   0.0328
ESR          0.0570    0.0459    0.0531   0.0502   0.0400
RCPR         0.0748    0.0507    0.0568   0.0509   0.0357
Our method   0.0382    0.0403    0.0401   0.0406   0.0323

(c) COFW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0713    0.0714    0.0709   0.0600   0.0519
ESR          0.1507    0.1022    0.1082   0.0952   0.0801
RCPR         0.1209    0.0810    0.0781   0.0655   0.0539
Our method   0.0668    0.0642    0.0702   0.0567   0.0497

4.3.2. Comparison with Previous Methods. Because a small number of facial images exhibit large shape variations and severe occlusions, the random multi-initialization strategy is challenged and can fail to generate an appropriate starting shape. Therefore, we compare our method with three classic methods on rebuilt datasets. These datasets still include most of the images from LFPW (68 points), LFPW (29 points), and COFW (29 points); we only remove the top 5%, 10%, 15%, 20%, and 25% of the testing facial images in each dataset by sorting the alignment errors in descending order (see Figure 2).
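The rebuilt-dataset protocol (dropping the hardest fraction of test images by normalized alignment error) can be sketched as follows. NumPy and the error values are illustrative, and interocular-distance normalization of the errors is an assumption rather than something this section states:

```python
import numpy as np

def remove_hardest(errors, proportion):
    """Indices of test images retained after dropping the `proportion`
    fraction with the largest normalized alignment errors.

    errors: per-image normalized alignment errors (normalization by the
    interocular distance is the usual convention and assumed here).
    """
    errors = np.asarray(errors, dtype=float)
    n_drop = int(round(len(errors) * proportion))
    order = np.argsort(errors)                 # ascending: easiest first
    return np.sort(order[:len(errors) - n_drop])

# Ten illustrative per-image errors; dropping the top 20% removes the
# two worst images (errors 0.40 and 0.35).
errs = [0.03, 0.12, 0.05, 0.40, 0.04, 0.06, 0.35, 0.02, 0.07, 0.30]
print(remove_hardest(errs, 0.20))  # -> [0 1 2 4 5 7 8 9]
```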

In Figure 2, all curves of COFW show a more dispersive distribution than those of the other two datasets. Since this dataset contains many more facial images with large shape variations and occlusions, the detection accuracy is affected to some extent. Meanwhile, the irregular textural features around the facial feature points make learning the structural model during training more challenging. Evidently, the curves of our proposed method in Figure 2 are superior to the others. Additionally, LFPW (68 points) and LFPW (29 points) share the same facial images but different face shapes, so comparing them reveals useful information about the methods' performance.
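A Cumulative Error Distribution curve of the kind plotted in Figure 2 maps each relative-error threshold to the proportion of database images whose error falls at or below it. A minimal sketch with illustrative values:

```python
import numpy as np

def ced_curve(errors, thresholds):
    """Cumulative Error Distribution: for each relative-error threshold,
    the proportion of database images whose normalized alignment error
    does not exceed it (the quantity on the axes of Figure 2)."""
    errors = np.asarray(errors, dtype=float)
    return np.array([(errors <= t).mean() for t in thresholds])

# Five illustrative per-image errors evaluated at three thresholds.
errs = [0.03, 0.05, 0.08, 0.12, 0.40]
print(ced_curve(errs, [0.1, 0.2, 0.5]))  # -> [0.6 0.8 1. ]
```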

In general, the more facial feature points there are, the more difficult they are to detect. Comparing the five facial components, the mean errors of the nose and eyes given in Tables 3 and 4 do not change obviously across the three datasets, because the textural information around the eyes is easy to recognize and the region around the nose is less likely to be occluded. Moreover, the facial feature points in the nose and eye regions are denser than the contour points, which also benefits the regressive searching process.
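The per-component figures reported in Tables 3 and 4 are grouped averages of per-landmark errors. A minimal sketch; the landmark index layout below is illustrative, not the datasets' actual annotation order:

```python
import numpy as np

def component_mean_errors(per_point_errors, groups):
    """Average per-landmark errors within each facial component
    (contour/jaw, eyebrow, mouth, nose, eye), as in Tables 3 and 4.

    per_point_errors: (n_images, n_points) normalized point errors.
    groups: dict mapping component name -> list of landmark indices
    (hypothetical indices here for illustration).
    """
    per_point_errors = np.asarray(per_point_errors, dtype=float)
    return {name: float(per_point_errors[:, idx].mean())
            for name, idx in groups.items()}

# Two images, four landmarks: the first two belong to "eye",
# the last two to "mouth" (illustrative grouping).
errs = np.array([[0.02, 0.04, 0.10, 0.12],
                 [0.04, 0.06, 0.10, 0.08]])
groups = {"eye": [0, 1], "mouth": [2, 3]}
print(component_mean_errors(errs, groups))  # mean error per component
```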

Figure 3 shows the alignment errors of the four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. In Figure 3, the mean error curves show a rapid descending trend when the most difficult 5% of testing images are removed, indicating that the statistical average can be biased by a few challenging images. Then, as the removal proportion increases, all the curves become smoother. Figure 3 also shows that our proposed method is more stable than the other methods, which means our training model is robust in dealing with occlusions and large shape variations.
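The rapid initial drop in Figure 3 can be reproduced directly from per-image errors by computing a trimmed mean for each removal proportion. A minimal sketch with illustrative values:

```python
import numpy as np

def trimmed_mean_curve(errors, proportions):
    """Mean alignment error after removing the hardest fraction of test
    images, for each removal proportion (the curves of Figure 3)."""
    errors = np.sort(np.asarray(errors, dtype=float))
    out = []
    for p in proportions:
        keep = len(errors) - int(round(len(errors) * p))
        out.append(errors[:keep].mean())
    return np.array(out)

# One heavily occluded outlier (0.50) dominates the untrimmed mean;
# removing the hardest 10-20% of images pulls the average down sharply.
errs = [0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.10, 0.30, 0.50]
print(trimmed_mean_curve(errs, [0.0, 0.10, 0.20]))
```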

Specifically, we plot the detection curves of the five facial components in Figure 4. It is obvious from Figure 4 that ESR and RCPR have a less competitive performance for localizing each facial component. Our method shows better robustness in localizing feature points that belong to the eyebrows and contour, since these two facial components are very likely to


Figure 2: Cumulative Error Distribution (CED) curves of four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top (a) 0%, (b) 5%, (c) 10%, (d) 15%, (e) 20%, and (f) 25% of the testing images are removed according to the descending order sorted by the normalized alignment errors.

Table 4: Mean error of each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 25% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method       Contour   Eyebrow   Mouth    Nose     Eye
SDM          0.0718    0.0581    0.0433   0.0363   0.0337
ESR          0.0746    0.0634    0.0535   0.0435   0.0408
RCPR         0.0830    0.0615    0.0484   0.0414   0.0367
Our method   0.0634    0.0518    0.0406   0.0364   0.0332

(b) LFPW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0385    0.0381    0.0376   0.0389   0.0295
ESR          0.0498    0.0419    0.0461   0.0435   0.0350
RCPR         0.0637    0.0460    0.0471   0.0433   0.0309
Our method   0.0336    0.0360    0.0365   0.0362   0.0292

(c) COFW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0607    0.0643    0.0614   0.0525   0.0457
ESR          0.1255    0.0873    0.0882   0.0781   0.0664
RCPR         0.1055    0.0679    0.0633   0.0533   0.0440
Our method   0.0561    0.0569    0.0603   0.0497   0.0435


Figure 3: Mean errors of four methods tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets.


Figure 4: Facial feature point detection curves of four methods for each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. (a) Eyebrow. (b) Eye. (c) Nose. (d) Mouth. (e) Contour or jaw.


be occluded by hair or other objects and have a more scattered distribution pattern. The experimental results demonstrate that our proposed method can estimate facial feature points with high accuracy and is able to handle face alignment under complex occlusions and large shape variations.

5. Conclusion

A robust sparse reconstruction method for facial feature point detection is proposed in this paper. In this method, we build the regressive training model by learning a coupled set of shape increment dictionaries and local appearance dictionaries, which encode various facial poses and rich local textures. We then apply the sparse model to infer the final face shape of an input image by continuous reconstruction of shape increments. Moreover, in order to find the best matched parameters, we perform extensive validation tests by way of alternate verification and local enumeration. The comparison results show that our sparse coding based reconstruction model has strong stability. In the later experiments, we compare our proposed method with three classic methods on three publicly available face datasets while removing the top 0%, 5%, 10%, 15%, 20%, and 25% of the testing facial images according to the descending order of alignment errors. The experimental results also support that our method is superior to the others in detection accuracy and robustness.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is supported by the National Key Research & Development Plan of China (no. 2016YFB1001401) and the National Natural Science Foundation of China (no. 61572110).

References

[1] Z Zhang P Luo C Change Loy and X Tang ldquoFacial landmarkdetection by deep multi-task learningrdquo in Proceedings of theEuropean Conference on Computer Vision (ECCV rsquo14) pp 94ndash108 Springer International Zurich Switzerland 2014

[2] L Vuong J Brandt Z Lin L Bourdev and T S Huang ldquoInter-active facial feature localizationrdquo in Proceedings of the Euro-pean Conference on Computer Vision pp 679ndash692 SpringerFlorence Italy October 2012

[3] H-S Lee and D Kim ldquoTensor-based AAM with continuousvariation estimation application to variation-robust face recog-nitionrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 31 no 6 pp 1102ndash1116 2009

[4] T Weise S Bouaziz L Hao and M Pauly ldquoRealtime perform-ance-based facial animationrdquo ACM Transactions on Graphicsvol 30 no 4 p 77 2011

[5] S W Chew P Lucey S Lucey J Saragih J F Cohn andS Sridharan ldquoPerson-independent facial expression detectionusing constrained local modelsrdquo in Proceedings of the IEEE

International Conference on Automatic Face and Gesture Recog-nition andWorkshops (FG rsquo11) pp 915ndash920 IEEE Santa BarbaraCalif USA March 2011

[6] C Huang X Ding and C Fang ldquoPose robust face tracking bycombining view-based AAMs and temporal filtersrdquo ComputerVision and Image Understanding vol 116 no 7 pp 777ndash7922012

[7] J Yang K Yu Y Gong and T Huang ldquoLinear spatial pyramidmatching using sparse coding for image classificationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo09) pp 1794ndash1801 IEEE MiamiFla USA June 2009

[8] S Gao I W-H Tsang L-T Chia and P Zhao ldquoLocal featuresare not lonelymdashLaplacian sparse coding for image classifica-tionrdquo in Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo10) pp 3555ndash3561 IEEE June 2010

[9] JWright A Y Yang A Ganesh S S Sastry and YMa ldquoRobustface recognition via sparse representationrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 31 no 2 pp210ndash227 2009

[10] M Yang L Zhang J Yang and D Zhang ldquoRobust sparse cod-ing for face recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo11) pp625ndash632 IEEE Providence RI USA June 2011

[11] D-S Pham O Arandjelovic and S Venkatesh ldquoAchievingstable subspace clustering by post-processing generic clusteringresultsrdquo inProceedings of the IEEE International Joint Conferenceon Neural Networks (IJCNN rsquo16) pp 2390ndash2396 IEEE Vancou-ver Canada July 2016

[12] T F Cootes G J Edwards and C J Taylor ldquoActive appearancemodelsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 23 no 6 pp 681ndash685 2001

[13] M F Hansen J Fagertun and R Larsen ldquoElastic appearancemodelsrdquo in Proceedings of the 22nd British Machine VisionConference (BMVC rsquo11) September 2011

[14] P A Tresadern M C Ionita and T F Cootes ldquoReal-time facialfeature tracking on a mobile devicerdquo International Journal ofComputer Vision vol 96 no 3 pp 280ndash289 2012

[15] G Tzimiropoulos S Zafeiriou and M Pantic ldquoRobust andefficient parametric face alignmentrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp1847ndash1854 IEEE Barcelona Spain November 2011

[16] G Tzimiropoulos and M Pantic ldquoOptimization problems forfast AAM fitting in-the-wildrdquo in Proceedings of the 14th IEEEInternational Conference on Computer Vision (ICCV rsquo13) pp593ndash600 IEEE Sydney Australia December 2013

[17] MHNguyen andFDe laTorre ldquoLocalminima free parameter-ized appearance modelsrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo08) pp 1ndash8 IEEE Anchorage Alaska USA June 2008

[18] M H Nguyen and F De La Torre ldquoLearning image align-ment without local minima for face detection and trackingrdquoin Proceedings of the 8th IEEE International Conference onAutomatic Face and Gesture Recognition (FG rsquo08) pp 1ndash7 IEEEAmsterdam Netherlands September 2008

[19] N M Hoai and F De la Torre ldquoMetric learning for imagealignmentrdquo International Journal of Computer Vision vol 88no 1 pp 69ndash84 2010

[20] T F Cootes GVWheeler KNWalker andC J Taylor ldquoView-based active appearance modelsrdquo Image and Vision Computingvol 20 no 9-10 pp 657ndash664 2002

Computational Intelligence and Neuroscience 11


Figure 2: Cumulative Error Distribution (CED) curves of four methods (our method, SDM, ESR, and RCPR) tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets; each panel plots the proportion of database images against the relative error. The top (a) 0%, (b) 5%, (c) 10%, (d) 15%, (e) 20%, and (f) 25% of the testing images are removed according to the descending order sorted by the normalized alignment errors.
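For readers reproducing such plots, a CED curve is simply the empirical cumulative distribution of the per-image normalized alignment errors. The sketch below assumes the errors have already been normalized (e.g., by interocular distance) and that the threshold grid matches the relative-error axis of Figure 2; all names are illustrative.

```python
import numpy as np

def ced_curve(errors, thresholds):
    """Empirical CDF: for each threshold, the fraction of test images
    whose normalized alignment error does not exceed it."""
    errors = np.asarray(errors, dtype=float)
    return np.array([(errors <= t).mean() for t in thresholds])

# Hypothetical per-image errors; the 0-0.5 grid mirrors the x-axis of Figure 2.
errors = [0.03, 0.05, 0.08, 0.12, 0.40]
proportions = ced_curve(errors, thresholds=[0.1, 0.2, 0.5])
# proportions -> [0.6, 0.8, 1.0]
```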

Table 4: Mean error of each facial component on LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. The top 25% maximal mean errors of the testing facial images in each dataset are removed.

(a) LFPW (68 points)

Method       Contour   Eyebrow   Mouth    Nose     Eye
SDM          0.0718    0.0581    0.0433   0.0363   0.0337
ESR          0.0746    0.0634    0.0535   0.0435   0.0408
RCPR         0.0830    0.0615    0.0484   0.0414   0.0367
Our method   0.0634    0.0518    0.0406   0.0364   0.0332

(b) LFPW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0385    0.0381    0.0376   0.0389   0.0295
ESR          0.0498    0.0419    0.0461   0.0435   0.0350
RCPR         0.0637    0.0460    0.0471   0.0433   0.0309
Our method   0.0336    0.0360    0.0365   0.0362   0.0292

(c) COFW (29 points)

Method       Jaw       Eyebrow   Mouth    Nose     Eye
SDM          0.0607    0.0643    0.0614   0.0525   0.0457
ESR          0.1255    0.0873    0.0882   0.0781   0.0664
RCPR         0.1055    0.0679    0.0633   0.0533   0.0440
Our method   0.0561    0.0569    0.0603   0.0497   0.0435
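The per-component means reported in Table 4 come from grouping per-landmark errors by facial part. A minimal sketch of that grouping is below; the index ranges follow the common 68-point markup and are an illustrative assumption, not taken from the paper.

```python
import numpy as np

# Assumed component index ranges for a 68-point markup (illustrative only).
COMPONENTS = {
    "contour": range(0, 17), "eyebrow": range(17, 27),
    "nose": range(27, 36), "eye": range(36, 48), "mouth": range(48, 68),
}

def component_errors(pred, gt, iod):
    """Mean point-to-point error per facial component, normalized by
    the interocular distance iod; pred and gt are (68, 2) arrays."""
    per_point = np.linalg.norm(pred - gt, axis=1) / iod
    return {name: float(per_point[list(idx)].mean())
            for name, idx in COMPONENTS.items()}
```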

Figure 3: Mean errors of four methods (our method, SDM, ESR, and RCPR) tested on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets; each panel plots the average alignment error against the proportion of removed database images (%).
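Figures 2 and 3 (and Table 4) repeatedly discard the worst-aligned test images before averaging. Roughly, that removal protocol can be sketched as follows; the function name and rounding choice are illustrative assumptions.

```python
import numpy as np

def mean_error_after_removal(errors, fraction):
    """Sort per-image errors, drop the top `fraction` of images in
    descending error order, and return the mean error of the rest."""
    errors = np.sort(np.asarray(errors, dtype=float))  # ascending order
    keep = len(errors) - int(round(fraction * len(errors)))
    return float(errors[:keep].mean())

# Hypothetical errors: removing the top 25% drops the single outlier image.
m = mean_error_after_removal([0.03, 0.05, 0.08, 0.40], fraction=0.25)
# m -> mean of [0.03, 0.05, 0.08]
```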


Figure 4: Facial feature point detection curves of four methods (our method, SDM, ESR, and RCPR) for each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets; each panel plots the average alignment error against the proportion of removed database images (%). (a) Eyebrow. (b) Eye. (c) Nose. (d) Mouth. (e) Contour or jaw.


be occluded by hair or objects and have a more separated distribution pattern. Experimental results demonstrate that our proposed method can estimate facial feature points with high accuracy and is able to handle face alignment under complex occlusions and large shape variations.

5. Conclusion

A robust sparse reconstruction method for facial feature point detection is proposed in this paper. In the method, we build the regressive training model by learning a coupled set of shape increment dictionaries and local appearance dictionaries, which encode various facial poses and rich local textures. We then apply the sparse model to infer the final face shape of an input image by continuous reconstruction of shape increments. Moreover, to find the best matched parameters, we perform extensive validation tests using alternate verification and local enumeration; the comparison results show that our sparse coding based reconstruction model is highly stable. In the later experiments, we compare our proposed method with three classic methods on three publicly available face datasets while removing the top 0%, 5%, 10%, 15%, 20%, and 25% of the testing facial images according to the descending order of alignment errors. The experimental results also support that our method is superior to the others in detection accuracy and robustness.
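The inference pipeline described above, coding local appearance against one dictionary at each cascade stage and reusing the sparse code with its coupled shape-increment dictionary, can be sketched as follows. The plain OMP solver and all names are illustrative assumptions under the coupled-dictionary idea, not the authors' exact implementation.

```python
import numpy as np

def omp(D, y, k):
    """Plain orthogonal matching pursuit: greedily pick up to k atoms of
    dictionary D (columns assumed unit-norm) and least-squares fit y."""
    residual, support, coeffs = y.astype(float).copy(), [], np.zeros(0)
    for _ in range(k):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:          # residual already orthogonal to this atom
            break
        support.append(idx)
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

def reconstruct_shape(s0, stages, features_of, k=5):
    """Cascaded shape-increment reconstruction: at each stage, sparse-code
    the local appearance against Da, then decode the shape increment with
    the coupled dictionary Ds (a sketch of the coupled-dictionary idea)."""
    s = np.asarray(s0, dtype=float).copy()
    for Da, Ds in stages:                 # one coupled pair per cascade stage
        x = omp(Da, features_of(s), k)    # sparse code of local appearance
        s = s + Ds @ x                    # reconstructed shape increment
    return s
```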

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the National Key Research & Development Plan of China (no. 2016YFB1001401) and the National Natural Science Foundation of China (no. 61572110).

References

[1] Z. Zhang, P. Luo, C. Change Loy, and X. Tang, "Facial landmark detection by deep multi-task learning," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 94–108, Springer International, Zurich, Switzerland, 2014.

[2] L. Vuong, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, "Interactive facial feature localization," in Proceedings of the European Conference on Computer Vision, pp. 679–692, Springer, Florence, Italy, October 2012.

[3] H.-S. Lee and D. Kim, "Tensor-based AAM with continuous variation estimation: application to variation-robust face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 1102–1116, 2009.

[4] T. Weise, S. Bouaziz, L. Hao, and M. Pauly, "Realtime performance-based facial animation," ACM Transactions on Graphics, vol. 30, no. 4, p. 77, 2011.

[5] S. W. Chew, P. Lucey, S. Lucey, J. Saragih, J. F. Cohn, and S. Sridharan, "Person-independent facial expression detection using constrained local models," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG '11), pp. 915–920, IEEE, Santa Barbara, Calif, USA, March 2011.

[6] C. Huang, X. Ding, and C. Fang, "Pose robust face tracking by combining view-based AAMs and temporal filters," Computer Vision and Image Understanding, vol. 116, no. 7, pp. 777–792, 2012.

[7] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 1794–1801, IEEE, Miami, Fla, USA, June 2009.

[8] S. Gao, I. W.-H. Tsang, L.-T. Chia, and P. Zhao, "Local features are not lonely—Laplacian sparse coding for image classification," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3555–3561, IEEE, June 2010.

[9] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.

[10] M. Yang, L. Zhang, J. Yang, and D. Zhang, "Robust sparse coding for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 625–632, IEEE, Providence, RI, USA, June 2011.

[11] D.-S. Pham, O. Arandjelovic, and S. Venkatesh, "Achieving stable subspace clustering by post-processing generic clustering results," in Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN '16), pp. 2390–2396, IEEE, Vancouver, Canada, July 2016.

[12] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681–685, 2001.

[13] M. F. Hansen, J. Fagertun, and R. Larsen, "Elastic appearance models," in Proceedings of the 22nd British Machine Vision Conference (BMVC '11), September 2011.

[14] P. A. Tresadern, M. C. Ionita, and T. F. Cootes, "Real-time facial feature tracking on a mobile device," International Journal of Computer Vision, vol. 96, no. 3, pp. 280–289, 2012.

[15] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "Robust and efficient parametric face alignment," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 1847–1854, IEEE, Barcelona, Spain, November 2011.

[16] G. Tzimiropoulos and M. Pantic, "Optimization problems for fast AAM fitting in-the-wild," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 593–600, IEEE, Sydney, Australia, December 2013.

[17] M. H. Nguyen and F. De la Torre, "Local minima free parameterized appearance models," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.

[18] M. H. Nguyen and F. De La Torre, "Learning image alignment without local minima for face detection and tracking," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–7, IEEE, Amsterdam, Netherlands, September 2008.

[19] N. M. Hoai and F. De la Torre, "Metric learning for image alignment," International Journal of Computer Vision, vol. 88, no. 1, pp. 69–84, 2010.

[20] T. F. Cootes, G. V. Wheeler, K. N. Walker, and C. J. Taylor, "View-based active appearance models," Image and Vision Computing, vol. 20, no. 9-10, pp. 657–664, 2002.

[21] C. Vogler, Z. Li, A. Kanaujia, S. Goldenstein, and D. Metaxas, "The best of both worlds: combining 3D deformable models with active shape models," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–7, IEEE, Rio de Janeiro, Brazil, October 2007.

[22] X. Yu, J. Huang, S. Zhang, W. Yan, and D. N. Metaxas, "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 1944–1951, IEEE, Sydney, Australia, December 2013.

[23] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models—their training and application," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.

[24] L. Liang, R. Xiao, F. Wen, and J. Sun, "Face alignment via component-based discriminative search," in Computer Vision—ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part II, vol. 5303 of Lecture Notes in Computer Science, pp. 72–85, Springer, Berlin, Germany, 2008.

[25] J. M. Saragih, S. Lucey, and J. F. Cohn, "Deformable model fitting by regularized landmark mean-shift," International Journal of Computer Vision, vol. 91, no. 2, pp. 200–215, 2011.

[26] H. Gao, H. K. Ekenel, and R. Stiefelhagen, "Face alignment using a ranking model based on regression trees," in Proceedings of the British Machine Vision Conference (BMVC '12), pp. 1–11, London, UK, September 2012.

[27] N. Duffy and D. Helmbold, "Boosting methods for regression," Machine Learning, vol. 47, no. 2-3, pp. 153–200, 2002.

[28] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, "Robust face landmark estimation under occlusion," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '13), pp. 1513–1520, Sydney, Australia, 2013.

[29] P. Dollar, P. Welinder, and P. Perona, "Cascaded pose regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1078–1085, IEEE, San Francisco, Calif, USA, June 2010.

[30] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," International Journal of Computer Vision, vol. 107, no. 2, pp. 177–190, 2014.

[31] X. Xiong and F. De La Torre, "Supervised descent method and its applications to face alignment," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 532–539, Portland, Ore, USA, June 2013.

[32] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Proceedings of the 27th Asilomar Conference on Signals, Systems & Computers, pp. 40–44, Pacific Grove, Calif, USA, November 1993.

[33] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

[34] X. Gao, Y. Su, X. Li, and D. Tao, "A review of active appearance models," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 2, pp. 145–158, 2010.

Page 8: A Robust Shape Reconstruction Method for Facial Feature ... · 2 ComputationalIntelligenceandNeuroscience Learned model Shape increment reconstruction Training Testing {Δ +Δ +···+ΔS(A)}

8 Computational Intelligence and Neuroscience

LFPW (68 points) CED LFPW (29 points) CED COFW (29 points) CED

Our methodSDM

RCPRESR

002040608

1

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

Prop

ortio

n of

da

taba

se im

ages

01 02 03 04 050Relative error

01 02 03 04 050Relative error

01 02 03 04 050Relative error

Our methodSDM

RCPRESR

002040608

1

Our methodSDM

RCPRESR

002040608

1

(f)

Figure 2 Cumulative Error Distribution (CED) curves of four methods tested on LFPW (68 points) LFPW (29 points) and COFW (29points) datasetsThe top (a) 0 (b) 5 (c) 10 (d) 15 (e) 20 and (f) 25 of the testing images are removed according to the descendingorder sorted by the normalized alignment errors

Table 4Mean error of each facial component on LFPW(68 points) LFPW(29 points) andCOFW(29 points) datasetsThe top 25maximalmean errors of the testing facial images in each dataset are removed

(a)

Method Contour Eyebrow Mouth Nose Eye

LFPW (68 points)

SDM 00718 00581 00433 00363 00337ESR 00746 00634 00535 00435 00408RCPR 00830 00615 00484 00414 00367

Our method 00634 00518 00406 00364 00332

(b)

Method Jaw Eyebrow Mouth Nose Eye

LFPW (29 points)

SDM 00385 00381 00376 00389 00295ESR 00498 00419 00461 00435 00350RCPR 00637 00460 00471 00433 00309

Our method 00336 00360 00365 00362 00292

(c)

Method Jaw Eyebrow Mouth Nose Eye

COFW (29 points)

SDM 00607 00643 00614 00525 00457ESR 01255 00873 00882 00781 00664RCPR 01055 00679 00633 00533 00440

Our method 00561 00569 00603 00497 00435

LFPW (68 points)

Our methodSDM

RCPRESR

LFPW (29 points) COFW (29 points)

0045005

0055006

0065007

0075008

Aver

age a

lignm

ent e

rror

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

RCPRESR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

RCPRESR

5 10 15 20 250Proportion of removal database images ()

0030035

0040045

0050055

Aver

age a

lignm

ent e

rror

005006007008009

01011

Aver

age a

lignm

ent e

rror

Figure 3 Mean errors of four methods tested on LFPW (68 points) LFPW (29 points) and COFW (29 points) datasets

Computational Intelligence and Neuroscience 9

LFPW (68 points) eyebrow

Our methodSDM

ESRRCPR

LFPW (29 points) eyebrow COFW (29 points) eyebrow

0050055006

0065007

0075008

0085

Aver

age a

lignm

ent

erro

r

Aver

age a

lignm

ent

erro

r

Aver

age a

lignm

ent

erro

r

0040045005

0055006

00400600801

012014

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

(a)

LFPW (68 points) eye LFPW (29 points) eye COFW (29 points) eye

Our methodSDM

ESRRCPR

Aver

age a

lignm

ent

erro

r

0030035004

0045005

0055006

0065Av

erag

e alig

nmen

ter

ror

0025003

0035004

0045005

0055

Aver

age a

lignm

ent

erro

r

00400500600700800901

011

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

(b)

LFPW (68 points) nose LFPW (29 points) nose COFW (29 points) nose

Our methodSDM

ESRRCPR

Aver

age a

lignm

ent

erro

r

003004005006007008

Aver

age a

lignm

ent

erro

r

0040045005

0055006

0065007

Aver

age a

lignm

ent

erro

r00400600801

012014

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

(c)

Aver

age a

lignm

ent

erro

r

LFPW (68 points) mouth LFPW (29 points) mouth COFW (29 points) mouth

Our methodSDM

ESRRCPR

004

005

006

007

008

Aver

age a

lignm

ent

erro

r

003004005006007008

Aver

age a

lignm

ent

erro

r

00600801

012014016

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

Our methodSDM

ESRRCPR

5 10 15 20 250Proportion of removal database images ()

(d)

Aver

age a

lignm

ent

erro

r

LFPW (68 points) contour LFPW (29 points) jaw COFW (29 points) jaw

Our methodSDM

ESRRCPR

Our methodSDM

ESRRCPR

Our methodSDM

ESRRCPR

00600700800901

011

Aver

age a

lignm

ent

erro

r

003004005006007008009

Aver

age a

lignm

ent

erro

r

005

01

015

02

025

5 10 15 20 250Proportion of removal database images ()

5 10 15 20 250Proportion of removal database images ()

5 10 15 20 250Proportion of removal database images ()

(e)

Figure 4 Facial feature point detection curves of four methods for each facial component on LFPW (68 points) LFPW (29 points) andCOFW (29 points) datasets (a) Eyebrow (b) Eye (c) Nose (d) Mouth (e) Contour or jaw

10 Computational Intelligence and Neuroscience

be occluded by hair or objects and have a more separateddistribution pattern Experimental results demonstrates thatour proposed method can estimate facial feature points withhigh accuracy and is able to deal with the task of facealignment on complex occlusions and large shape variations

5 Conclusion

A robust sparse reconstruction method for facial featurepoint detection is proposed in this paper In the methodwe build the regressive training model by learning a coupledset of shape increment dictionaries and local appearancedictionaries which are learned to encode various facial posesand rich local textures And then we apply the sparse modelto infer the final face shape locations of an input image bycontinuous reconstruction of shape increments Moreoverin order to find the best matched parameters we performextensive validation tests by using the way of alternate veri-fication and local enumeration It shows in the comparisonresults that our sparse coding based reconstruction modelhas a strong stability In the later experiments we compareour proposed method with three classic methods on threepublicly available face datasets when removing the top 05 10 15 20 and 25 of the testing facial imagesaccording to the descending order of alignment errors Theexperimental results also support that ourmethod is superiorto the others in detection accuracy and robustness

Competing Interests

The authors declare that they have no competing interests

Acknowledgments

This work is supported by the National Key Researchamp Development Plan of China (no 2016YFB1001401)and National Natural Science Foundation of China (no61572110)

References

[1] Z Zhang P Luo C Change Loy and X Tang ldquoFacial landmarkdetection by deep multi-task learningrdquo in Proceedings of theEuropean Conference on Computer Vision (ECCV rsquo14) pp 94ndash108 Springer International Zurich Switzerland 2014

[2] L Vuong J Brandt Z Lin L Bourdev and T S Huang ldquoInter-active facial feature localizationrdquo in Proceedings of the Euro-pean Conference on Computer Vision pp 679ndash692 SpringerFlorence Italy October 2012

[3] H-S Lee and D Kim ldquoTensor-based AAM with continuousvariation estimation application to variation-robust face recog-nitionrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 31 no 6 pp 1102ndash1116 2009

[4] T Weise S Bouaziz L Hao and M Pauly ldquoRealtime perform-ance-based facial animationrdquo ACM Transactions on Graphicsvol 30 no 4 p 77 2011

[5] S W Chew P Lucey S Lucey J Saragih J F Cohn andS Sridharan ldquoPerson-independent facial expression detectionusing constrained local modelsrdquo in Proceedings of the IEEE

International Conference on Automatic Face and Gesture Recog-nition andWorkshops (FG rsquo11) pp 915ndash920 IEEE Santa BarbaraCalif USA March 2011

[6] C Huang X Ding and C Fang ldquoPose robust face tracking bycombining view-based AAMs and temporal filtersrdquo ComputerVision and Image Understanding vol 116 no 7 pp 777ndash7922012

[7] J Yang K Yu Y Gong and T Huang ldquoLinear spatial pyramidmatching using sparse coding for image classificationrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo09) pp 1794ndash1801 IEEE MiamiFla USA June 2009

[8] S Gao I W-H Tsang L-T Chia and P Zhao ldquoLocal featuresare not lonelymdashLaplacian sparse coding for image classifica-tionrdquo in Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo10) pp 3555ndash3561 IEEE June 2010

[9] JWright A Y Yang A Ganesh S S Sastry and YMa ldquoRobustface recognition via sparse representationrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 31 no 2 pp210ndash227 2009

[10] M Yang L Zhang J Yang and D Zhang ldquoRobust sparse cod-ing for face recognitionrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo11) pp625ndash632 IEEE Providence RI USA June 2011

[11] D-S Pham O Arandjelovic and S Venkatesh ldquoAchievingstable subspace clustering by post-processing generic clusteringresultsrdquo inProceedings of the IEEE International Joint Conferenceon Neural Networks (IJCNN rsquo16) pp 2390ndash2396 IEEE Vancou-ver Canada July 2016

[12] T F Cootes G J Edwards and C J Taylor ldquoActive appearancemodelsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 23 no 6 pp 681ndash685 2001

[13] M F Hansen J Fagertun and R Larsen ldquoElastic appearancemodelsrdquo in Proceedings of the 22nd British Machine VisionConference (BMVC rsquo11) September 2011

[14] P A Tresadern M C Ionita and T F Cootes ldquoReal-time facialfeature tracking on a mobile devicerdquo International Journal ofComputer Vision vol 96 no 3 pp 280ndash289 2012

[15] G Tzimiropoulos S Zafeiriou and M Pantic ldquoRobust andefficient parametric face alignmentrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp1847ndash1854 IEEE Barcelona Spain November 2011

[16] G Tzimiropoulos and M Pantic ldquoOptimization problems forfast AAM fitting in-the-wildrdquo in Proceedings of the 14th IEEEInternational Conference on Computer Vision (ICCV rsquo13) pp593ndash600 IEEE Sydney Australia December 2013

[17] MHNguyen andFDe laTorre ldquoLocalminima free parameter-ized appearance modelsrdquo in Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo08) pp 1ndash8 IEEE Anchorage Alaska USA June 2008

[18] M H Nguyen and F De La Torre ldquoLearning image align-ment without local minima for face detection and trackingrdquoin Proceedings of the 8th IEEE International Conference onAutomatic Face and Gesture Recognition (FG rsquo08) pp 1ndash7 IEEEAmsterdam Netherlands September 2008

[19] N M Hoai and F De la Torre ldquoMetric learning for imagealignmentrdquo International Journal of Computer Vision vol 88no 1 pp 69ndash84 2010

[20] T F Cootes GVWheeler KNWalker andC J Taylor ldquoView-based active appearance modelsrdquo Image and Vision Computingvol 20 no 9-10 pp 657ndash664 2002

Computational Intelligence and Neuroscience 11

Figure 4: Facial feature point detection curves of four methods (our method, SDM, ESR, and RCPR) for each facial component on the LFPW (68 points), LFPW (29 points), and COFW (29 points) datasets. Each panel plots the average alignment error against the proportion of removed database images (0–25%). (a) Eyebrow. (b) Eye. (c) Nose. (d) Mouth. (e) Contour or jaw.
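The curves in Figure 4 are obtained by ranking the test images by alignment error and reporting the mean error after the worst p% are discarded. A minimal sketch of that evaluation protocol (an illustrative reimplementation, not the authors' code):

```python
def error_curve(errors, proportions=(0, 5, 10, 15, 20, 25)):
    """Mean alignment error after discarding the worst p% of test images,
    for each p in `proportions` (errors: one normalized error per image)."""
    ranked = sorted(errors, reverse=True)           # worst images first
    curve = []
    for p in proportions:
        kept = ranked[int(len(ranked) * p / 100):]  # drop the top p%
        curve.append(sum(kept) / len(kept))
    return curve

# Toy per-image errors; the curve falls as the hardest images are removed.
curve = error_curve([0.20, 0.10, 0.08, 0.06, 0.05,
                     0.04, 0.04, 0.03, 0.03, 0.02])
```

A method whose curve stays low even at p = 0 handles the hardest (occluded, large-pose) images well, which is the robustness property the figure compares.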


be occluded by hair or objects and have a more separated distribution pattern. Experimental results demonstrate that our proposed method can estimate facial feature points with high accuracy and is able to handle face alignment under complex occlusions and large shape variations.

5. Conclusion

A robust sparse reconstruction method for facial feature point detection is proposed in this paper. In this method, we build the regressive training model by learning a coupled set of shape increment dictionaries and local appearance dictionaries, which encode various facial poses and rich local textures. We then apply the sparse model to infer the final face shape of an input image by continuous reconstruction of shape increments. Moreover, to find the best matched parameters, we perform extensive validation tests using alternate verification and local enumeration. The comparison results show that our sparse coding based reconstruction model has strong stability. In the later experiments, we compare our proposed method with three classic methods on three publicly available face datasets while removing the top 0%, 5%, 10%, 15%, 20%, and 25% of the testing facial images in descending order of alignment error. The experimental results also support that our method is superior to the others in detection accuracy and robustness.
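The shape increment reconstruction summarized above can be sketched as a single cascade step: the local appearance features are sparse-coded against the appearance dictionary, and the resulting coefficients are reused with the coupled shape increment dictionary. The toy sketch below assumes unit-norm atoms and a 1-sparse code; the dictionary values and names (`D_app`, `D_shape`) are illustrative placeholders, not the paper's learned model.

```python
import math

# Toy coupled dictionaries: the k-th appearance atom (unit norm) is paired
# with the k-th shape-increment atom. Values are illustrative only.
D_app = [(1.0, 0.0), (0.0, 1.0), (math.sqrt(0.5), math.sqrt(0.5))]
D_shape = [(2.0, -1.0), (0.5, 0.5), (1.0, 1.0)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sparse_increment(features):
    """One cascade step: 1-sparse coding of the local appearance features,
    then shape-increment reconstruction with the coupled shape dictionary."""
    # Greedy atom selection (orthogonal matching pursuit with sparsity 1).
    k = max(range(len(D_app)), key=lambda i: abs(dot(D_app[i], features)))
    coef = dot(D_app[k], features)  # exact coefficient for a unit-norm atom
    return tuple(coef * c for c in D_shape[k])

shape = (0.0, 0.0)                    # initial shape s^(0)
delta = sparse_increment((0.2, 1.0))  # toy appearance feature vector
shape = tuple(s + d for s, d in zip(shape, delta))  # s^(1) = s^(0) + delta
```

In the full method this step would repeat for several iterations, with a higher sparsity level and the dictionaries learned in a regressive manner as described in the paper.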

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is supported by the National Key Research & Development Plan of China (no. 2016YFB1001401) and the National Natural Science Foundation of China (no. 61572110).

References

[1] Z. Zhang, P. Luo, C. Change Loy, and X. Tang, "Facial landmark detection by deep multi-task learning," in Proceedings of the European Conference on Computer Vision (ECCV '14), pp. 94–108, Springer International, Zurich, Switzerland, 2014.

[2] L. Vuong, J. Brandt, Z. Lin, L. Bourdev, and T. S. Huang, "Interactive facial feature localization," in Proceedings of the European Conference on Computer Vision, pp. 679–692, Springer, Florence, Italy, October 2012.

[3] H.-S. Lee and D. Kim, "Tensor-based AAM with continuous variation estimation: application to variation-robust face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 1102–1116, 2009.

[4] T. Weise, S. Bouaziz, L. Hao, and M. Pauly, "Realtime performance-based facial animation," ACM Transactions on Graphics, vol. 30, no. 4, p. 77, 2011.

[5] S. W. Chew, P. Lucey, S. Lucey, J. Saragih, J. F. Cohn, and S. Sridharan, "Person-independent facial expression detection using constrained local models," in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG '11), pp. 915–920, IEEE, Santa Barbara, Calif, USA, March 2011.

[6] C. Huang, X. Ding, and C. Fang, "Pose robust face tracking by combining view-based AAMs and temporal filters," Computer Vision and Image Understanding, vol. 116, no. 7, pp. 777–792, 2012.

[7] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '09), pp. 1794–1801, IEEE, Miami, Fla, USA, June 2009.

[8] S. Gao, I. W.-H. Tsang, L.-T. Chia, and P. Zhao, "Local features are not lonely – Laplacian sparse coding for image classification," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 3555–3561, IEEE, June 2010.

[9] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.

[10] M. Yang, L. Zhang, J. Yang, and D. Zhang, "Robust sparse coding for face recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), pp. 625–632, IEEE, Providence, RI, USA, June 2011.

[11] D.-S. Pham, O. Arandjelovic, and S. Venkatesh, "Achieving stable subspace clustering by post-processing generic clustering results," in Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN '16), pp. 2390–2396, IEEE, Vancouver, Canada, July 2016.

[12] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681–685, 2001.

[13] M. F. Hansen, J. Fagertun, and R. Larsen, "Elastic appearance models," in Proceedings of the 22nd British Machine Vision Conference (BMVC '11), September 2011.

[14] P. A. Tresadern, M. C. Ionita, and T. F. Cootes, "Real-time facial feature tracking on a mobile device," International Journal of Computer Vision, vol. 96, no. 3, pp. 280–289, 2012.

[15] G. Tzimiropoulos, S. Zafeiriou, and M. Pantic, "Robust and efficient parametric face alignment," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 1847–1854, IEEE, Barcelona, Spain, November 2011.

[16] G. Tzimiropoulos and M. Pantic, "Optimization problems for fast AAM fitting in-the-wild," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 593–600, IEEE, Sydney, Australia, December 2013.

[17] M. H. Nguyen and F. De la Torre, "Local minima free parameterized appearance models," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '08), pp. 1–8, IEEE, Anchorage, Alaska, USA, June 2008.

[18] M. H. Nguyen and F. De La Torre, "Learning image alignment without local minima for face detection and tracking," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–7, IEEE, Amsterdam, Netherlands, September 2008.

[19] N. M. Hoai and F. De la Torre, "Metric learning for image alignment," International Journal of Computer Vision, vol. 88, no. 1, pp. 69–84, 2010.

[20] T. F. Cootes, G. V. Wheeler, K. N. Walker, and C. J. Taylor, "View-based active appearance models," Image and Vision Computing, vol. 20, no. 9-10, pp. 657–664, 2002.


[21] C. Vogler, Z. Li, A. Kanaujia, S. Goldenstein, and D. Metaxas, "The best of both worlds: combining 3D deformable models with active shape models," in Proceedings of the IEEE 11th International Conference on Computer Vision (ICCV '07), pp. 1–7, IEEE, Rio de Janeiro, Brazil, October 2007.

[22] X. Yu, J. Huang, S. Zhang, W. Yan, and D. N. Metaxas, "Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 1944–1951, IEEE, Sydney, Australia, December 2013.

[23] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models - their training and application," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.

[24] L. Liang, R. Xiao, F. Wen, and J. Sun, "Face alignment via component-based discriminative search," in Computer Vision – ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12–18, 2008, Proceedings, Part II, vol. 5303 of Lecture Notes in Computer Science, pp. 72–85, Springer, Berlin, Germany, 2008.

[25] J. M. Saragih, S. Lucey, and J. F. Cohn, "Deformable model fitting by regularized landmark mean-shift," International Journal of Computer Vision, vol. 91, no. 2, pp. 200–215, 2011.

[26] H. Gao, H. K. Ekenel, and R. Stiefelhagen, "Face alignment using a ranking model based on regression trees," in Proceedings of the British Machine Vision Conference (BMVC '12), pp. 1–11, London, UK, September 2012.

[27] N. Duffy and D. Helmbold, "Boosting methods for regression," Machine Learning, vol. 47, no. 2-3, pp. 153–200, 2002.

[28] X. P. Burgos-Artizzu, P. Perona, and P. Dollar, "Robust face landmark estimation under occlusion," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '13), pp. 1513–1520, Sydney, Australia, 2013.

[29] P. Dollar, P. Welinder, and P. Perona, "Cascaded pose regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1078–1085, IEEE, San Francisco, Calif, USA, June 2010.

[30] X. Cao, Y. Wei, F. Wen, and J. Sun, "Face alignment by explicit shape regression," International Journal of Computer Vision, vol. 107, no. 2, pp. 177–190, 2014.

[31] X. Xiong and F. De La Torre, "Supervised descent method and its applications to face alignment," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 532–539, Portland, Ore, USA, June 2013.

[32] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Proceedings of the 27th Asilomar Conference on Signals, Systems & Computers, pp. 40–44, Pacific Grove, Calif, USA, November 1993.

[33] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.

[34] X. Gao, Y. Su, X. Li, and D. Tao, "A review of active appearance models," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 40, no. 2, pp. 145–158, 2010.
