
For submission to WACV 2000, paper number: 1


DETECTION OF SIDE-VIEW FACES IN COLOR IMAGES

Gang Wei (1), Dongge Li (1) and Ishwar K. Sethi (2)

(1) Department of Computer Science, Wayne State University, Detroit, MI 48202

(2) Intelligent Information Engineering Laboratory, Department of Computer Science and Engineering, Oakland University, Rochester, MI 48309-4478

Abstract

This paper proposes a coarse-to-fine scheme for the detection of side-view faces in color images, extending the current state of the art of face detection research. The input image may depict a complex scene containing a cluttered background and confusing objects such as frontal-view faces. The system consists of four stages, each a refinement of the previous one: 1) skin-tone detection by color, 2) region and edge preprocessing with morphological operations and length filtering, 3) face candidate region selection based on a normalized similarity value, and 4) final verification using Hidden Markov Models. Encouraging experimental results have been obtained, owing to the use of multiple image features and the combination of several image processing and pattern recognition techniques. Beyond detecting faces other than frontal views, the paper contains several original designs: the Normalized Similarity Value (NSV) for detecting the presence of a given curve pattern, the Iterative Partition Process for separating the object from confusing extraneous regions to improve detection accuracy, and an exploration of the use of HMMs for recognizing objects in images.

1. Introduction

The problem of face detection in images - locating image areas corresponding to human faces - has received considerable attention in recent years because of its many practical applications, including person identification, surveillance, and image/video annotation. A variety of face detection methods has been reported in the literature, and a comprehensive review can be found in [2]. While many of the earlier methods were developed to detect faces in gray-scale images, most recent methods work on color images. Despite the large number of methods proposed in the literature, their performance remains unsatisfactory for images containing side-view faces. This probably stems from the fact that people usually focus on frontal-view faces when interpreting video or image content, and therefore the detection of side-view faces has been neglected in most research work. While in applications such as access control via face identification it is reasonable to look for a frontal-view face only, applications such as surveillance, criminal identification in a mug-shot database, and image/video annotation would greatly benefit from the ability to detect side-view faces.

In this paper, we propose a coarse-to-fine scheme to detect side-view faces in color images, extending the state of the art of face detection research. The proposed scheme is designed to complement our frontal-face detection system [11]. For a given color image of an arbitrarily complex scene, the purpose of the system is to locate only side-view faces, as in Fig. 1. An omni-face detection system capable of finding human faces from various views can be obtained by integrating the frontal-view and side-view face detection systems.

Unlike frontal-view faces, which have an oval shape, the shapes of side-view faces are hard to model mathematically. Some characteristics of side-view faces have been analyzed in previous research on person identification, such as [5] and [10]. It has been observed that the most distinguishable feature of side-view faces is the protrusion of the nose in the face-region contour. This naturally suggests modeling the side-view face profile with a curve representation method such as B-splines, as in [1] or [8], and using it as a filter to select matching curves in the edge map of the image. In practice, however, face profiles cannot be extracted perfectly because of the limitations of existing edge detection techniques. As illustrated in Fig. 1 (d), the face profile can be broken into many segments or connected to extraneous edges by environmental factors such as noise, shading and occlusion, so that no single edge segment satisfies the selection criteria. To circumvent this difficulty, we developed the Normalized Similarity Value (NSV) to find groups of edge segments that, considered together, resemble side-view face profiles. It does not require explicit correspondence between pixels in the template and the image, which makes it especially suitable for this problem, where the scene can be highly cluttered.

Working on multiple image features, our system is stable and accurate owing to its novel components besides the NSV, including the Iterative Partition Process and the use of Hidden Markov Models for recognizing objects in 2D images. The remainder of this paper is organized as follows. Section 2 presents a detailed description of the four stages of the system. In Section 3 we give examples of the experimental results and evaluate the performance of the system. Conclusions and possible future extensions of the system are discussed in Section 4.

2. System description

The system follows a coarse-to-fine strategy in that each of the four stages is a refinement of the previous one. A stage makes hypotheses about the position and size of side-view faces based on certain image features; the hypotheses are either verified or rejected in the next stage, until the faces are located in the final stage. The four stages are 1) skin-tone region extraction by color, 2) region and edge preprocessing with morphological operations and length filtering, 3) face candidate region selection based on curve pattern detection with the Normalized Similarity Value, and 4) final verification using Hidden Markov Models.

2.1 Skin-tone region segmentation

The purpose of the first stage is to reduce the pixel search space. Human faces have skin-tone color, so by extracting skin-tone regions we can focus our attention on them instead of the whole image. The YIQ color coordinate system is used to perform the segmentation at the pixel level. By manually marking the skin-tone regions in a large collection of images used as the training set, we observed that skin-tone pixels reside in a half ellipse in the Y-I plane, as shown in Fig. 2. The distribution of skin-tone pixels is modeled by this half ellipse, which is applied to the input color image to classify each pixel as skin-tone or non-skin-tone. The output of this operation is a binary image whose white pixels have skin tone in the original image, as illustrated in Fig. 1 (a) and (b). This stage provides the coarsest estimate of side-view faces because only color information is considered, and many false alarms may be generated, such as the frontal-view face and the non-face objects in Fig. 1 (b).
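As a rough illustration of this stage, the Python sketch below classifies pixels by whether their (Y, I) values fall inside a half ellipse. The RGB-to-YIQ coefficients are the standard ones; the ellipse center and semi-axes are placeholders, since the paper fits its model to labeled training pixels and does not report the fitted parameters.

```python
import numpy as np
import cv2

# Placeholder half-ellipse parameters in the Y-I plane (hypothetical values;
# the paper derives its model from manually labeled training pixels).
Y_CENTER, I_CENTER = 150.0, 0.0
Y_AXIS, I_AXIS = 120.0, 40.0

def skin_tone_mask(bgr_image):
    """Binary mask of pixels whose (Y, I) values fall inside the half ellipse."""
    b, g, r = cv2.split(bgr_image.astype(np.float32))
    # Standard RGB -> YIQ conversion.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    inside = ((y - Y_CENTER) / Y_AXIS) ** 2 + ((i - I_CENTER) / I_AXIS) ** 2 <= 1.0
    upper_half = i >= I_CENTER        # keep only one half of the ellipse
    return (inside & upper_half).astype(np.uint8) * 255
```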

2.2 Region and edge preprocessing

The candidate regions of side-view faces are selected based on an analysis of the shapes and edges of the regions. Owing to factors such as noise, shading and illumination variations, skin-tone regions and edges cannot be extracted perfectly, so it is necessary to mitigate the effects of these factors before proceeding to the next stage. Small, isolated regions are usually unlikely to be faces, and it is desirable to remove them at this early stage rather than in later stages, which involve far more computational overhead. Morphological operations are used to accomplish this. We apply the Open operation to the skin-tone binary image to remove small isolated regions; it can also break weak connections and erase thin protrusions [4]. This property is important because face regions may be connected to other skin-tone objects, which complicates later analysis, and breaking them apart in advance minimizes this adverse effect. To obtain optimal results, the size of the structuring element of the Open operation is adaptive to the size of the image, constrained by an upper and a lower limit. Fig. 1 (c) shows the result of this process.
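A minimal sketch of this preprocessing step, assuming OpenCV; the scaling fraction and the size limits of the structuring element are illustrative values, since the paper only states that the size is adaptive and bounded.

```python
import cv2

def preprocess_skin_mask(mask, scale=0.01, lower=3, upper=15):
    """Open the binary skin-tone mask with a structuring element whose size
    grows with the image but stays within [lower, upper] pixels."""
    k = int(min(mask.shape[:2]) * scale)
    k = max(lower, min(upper, k))
    if k % 2 == 0:                      # odd size so the element has a center
        k += 1
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```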

Edge detection is applied to the image, with the result shown in Fig. 1 (d). Since face contours are also in skin tone, it is sufficient to inspect only skin-tone edges. To extract them, a pixel-wise intersection is performed between the edge map (Fig. 1 (d)) and the preprocessed skin-tone regions (Fig. 1 (c)); Fig. 1 (e) shows the result. Very short edge segments are usually due to noise and minor variations, so an edge-length filter is used to remove them. The result is a cleaner edge map, shown in Fig. 1 (f).
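The edge preprocessing could be sketched as follows; the Canny thresholds and the minimum segment length are assumptions, and connected components are used as a stand-in for edge segments.

```python
import cv2
import numpy as np

def skin_edge_map(gray, skin_mask, min_edge_len=20):
    """Edges restricted to skin-tone regions, with short segments removed."""
    edges = cv2.Canny(gray, 80, 160)             # thresholds are assumptions
    edges = cv2.bitwise_and(edges, skin_mask)    # pixel-wise intersection
    # Treat each 8-connected component of edge pixels as one segment.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(edges, connectivity=8)
    clean = np.zeros_like(edges)
    for lbl in range(1, n):                      # label 0 is the background
        if stats[lbl, cv2.CC_STAT_AREA] >= min_edge_len:
            clean[labels == lbl] = 255
    return clean
```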

2.3 Face candidate selection

In this stage face candidates are selected based on an analysis of each region. Two side-view face profile templates are constructed as in Fig. 3, for left-oriented and right-oriented faces, respectively. If a region contains edge curves similar to one of the templates, that region is considered a candidate. Because of extraneous factors, the contour of a face is often broken into several edge segments, so that no single segment resembles the templates. Instead, groups of edge segments need to be checked to see whether they satisfy the criteria. However, exhaustively considering all possible combinations of edge segments is impractical. In our system, this difficulty is circumvented by the Normalized Similarity Value (NSV) measure, which addresses the problem of finding matches for a designated pattern (template) in the image. NSV is relatively insensitive to noise and does not require explicit correspondence between the points of the pattern and the image. These two characteristics make it especially suitable for our application, where the scene can be very complex and the curve pattern to be detected cannot be defined mathematically.

To determine whether a region should be accepted as a candidate, its NSVs with respect to the templates are computed. If the highest NSV exceeds a threshold, the region is accepted as a face candidate; otherwise the Iterative Partition Process is applied to see whether it can be divided into smaller candidates, and this continues until no more potential face candidates are found.

2.3.1 Normalized Similarity Value

NSV is essentially a variation of the directed Hausdorff distance discussed in [6] and [7]. Before introducing our approach, it is therefore useful to briefly review the Hausdorff distance, a measure of the similarity between two finite point sets. Given sets A and B, the Hausdorff distance is defined as

H(A, B)=max(h(A, B), h(B, A)) (1)

where h(A, B) is called the directed distance from A to B. For every point in A, compute the distance to its nearest neighbor in B; h(A, B) is the largest of these values:

h(A, B) = max_{a∈A} min_{b∈B} ||a - b||   (2)

||.|| is some norm in the plane, and h(B, A), the directed distance from B to A, is defined similarly. The undirected H(A, B) is small only when both h(A, B) and h(B, A) are small. If H(A, B) = d, then every point in A is within distance d of some point in B (the term "some" is used because no explicit pairing between points in A and B is required), and vice versa. The Hausdorff distance can therefore be used to evaluate the mismatch (and similarity) between A and B. However, this measure is vulnerable to noise, since a single far-away point can make H(A, B) very large. A modification called the partial Hausdorff distance is used to solve this problem; it is defined as

H_LK(A, B) = max(h_L(A, B), h_K(B, A))   (3)

where h_L(A, B) and h_K(B, A) are the partial directed distances. Ranking all points of A by their distance to the nearest point of B, h_L(A, B) is the Lth largest such distance (1 <= L <= q, where q = |A|):

h_L(A, B) = Lth_{a∈A} min_{b∈B} ||a - b||   (4)

and h_K(B, A) is defined similarly. Thus if H_LK(A, B) = d, then L points of A are within distance d of some point of B, and K points of B are within distance d of some point of A. By truncating the contribution of the largest values, the influence of noise is effectively reduced.
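For concreteness, a small sketch of the directed and partial directed distances of Eqs. (2) and (4), using a k-d tree for the nearest-neighbor queries (an implementation choice, not one prescribed by the paper):

```python
import numpy as np
from scipy.spatial import cKDTree

def directed_hausdorff(A, B):
    """h(A, B) of Eq. (2): the largest nearest-neighbor distance from A to B.
    A and B are (n, 2) arrays of point coordinates."""
    d, _ = cKDTree(B).query(A)   # distance from each point of A to its nearest point of B
    return d.max()

def partial_directed_hausdorff(A, B, L):
    """h_L(A, B) of Eq. (4): the L-th ranked nearest-neighbor distance."""
    d, _ = cKDTree(B).query(A)
    return np.sort(d)[::-1][L - 1]
```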

We decide whether a region is a candidate by checking whether any edges resemble the side-view face templates well. Therefore, when evaluating the similarity, only the distance from the template to the image (the forward distance) is considered. The following measure s, the NSV, expresses how well the points of an edge-segment map A match the template B:

s = card({b ∈ B : min_{a∈A} ||a - b|| < Γ_d}) / N   (5)

Here N is the number of points in B and card(.) is the cardinality of a set. The value of s ranges from 0 to 1 and indicates the fraction of points in B that are within distance Γ_d of some point in A; the larger s is, the better the match that can be found in A for B. When s = 1, every point in B is within distance Γ_d of its nearest neighbor in A and a perfect match is found. In our implementation, Γ_d is adaptive to the size of the template and ranges from 2 to 6 pixels.
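A sketch of Eq. (5) follows. It precomputes a distance transform of the edge map so that the nearest-edge distance of each template point is a single array lookup, in the spirit of the speed-up described in Section 2.3.2; the interface (point arrays in row/column coordinates) is an assumption.

```python
import numpy as np
import cv2

def nsv(edge_map, template_points, gamma_d):
    """Eq. (5): fraction of template points whose nearest edge pixel is
    closer than gamma_d. template_points is an (N, 2) array of (row, col)
    coordinates of the template placed on the image grid."""
    # Distance from every pixel to the nearest edge pixel: make edges the
    # zero pixels and run the distance transform on the complement.
    dist = cv2.distanceTransform((edge_map == 0).astype(np.uint8), cv2.DIST_L2, 3)
    d = dist[template_points[:, 0], template_points[:, 1]]
    return float(np.count_nonzero(d < gamma_d)) / len(template_points)
```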



2.3.2 Comparison between a region and the templates

We compare each region against the two templates to decide whether it should be accepted as a candidate. The following steps are applied:

• Compute the region's similarity to the left and right templates (S_L and S_R, respectively).
• Take the larger of the two as the final similarity S, that is, S = max(S_L, S_R). If S is less than a threshold Γ_s, the region is put into the rejected list and subjected to an iterative process.
• Otherwise the region is accepted. If S_L is greater than S_R, the region looks more like a left-oriented face than a right-oriented one and is annotated as a left candidate; if S_L is less than S_R, it is annotated as a right candidate. If they are equal, an arbitrary decision is made (although this is very unlikely to happen).

To calculate S_R for a given region, the right face template is scaled to the same height as the region and aligned with the region vertically, as in Fig. 4. The template is then translated horizontally, pixel by pixel, from the leftmost point to the rightmost point of the region. At each position i, the Normalized Similarity Value sr_i between the template and the cleaned edge map of the region (as in Fig. 1 (f)) is calculated, giving a series of NSVs. S_R is defined as

S_R = max(sr_1, sr_2, ..., sr_n)   (6)

S_L is computed in the same way with the left face template. By comparing the values of S_L and S_R, we can decide whether the region should be accepted and further annotate the candidates as left- or right-oriented.
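A sketch of the sliding template comparison, reusing nsv() from the previous sketch; the bounding-box interface and the coordinate conventions are assumptions.

```python
import numpy as np

def region_similarity(edge_map, region_box, template_points, gamma_d):
    """Eq. (6): scale the template to the region height, slide it across the
    region column by column, and keep the best NSV over all positions."""
    x0, y0, x1, y1 = region_box
    scale = (y1 - y0) / float(template_points[:, 0].max() + 1)
    t = np.round(template_points * scale).astype(int)
    t_width = int(t[:, 1].max()) + 1
    best = 0.0
    for dx in range(x0, max(x0 + 1, x1 - t_width)):
        pts = np.column_stack((t[:, 0] + y0, t[:, 1] + dx))
        pts[:, 0] = np.clip(pts[:, 0], 0, edge_map.shape[0] - 1)
        pts[:, 1] = np.clip(pts[:, 1], 0, edge_map.shape[1] - 1)
        best = max(best, nsv(edge_map, pts, gamma_d))
    return best
```

S_L would be obtained the same way with the mirrored template, and the larger of the two values compared against Γ_s.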

A straightforward implementation of the above process is inefficient because it involves complex and unnecessary computation. First, repeatedly finding the nearest neighbor in the image for each point of the template is time-consuming; we use the distance transform discussed in [3] to solve this problem. Second, at most translation positions the NSVs are less than Γ_s, so they have no impact on the decision and annotation results, and calculating their exact values is pointless. It is therefore desirable to determine that the NSV at the current position is below Γ_s before the full calculation is finished. By the definition of NSV, if the NSV at a position is greater than Γ_s, then at least N*Γ_s points of B are within distance Γ_d of some point of A. Conversely, if more than N*(1-Γ_s) points of B are farther than Γ_d from every point of A, the NSV is less than Γ_s. Based on this analysis, we define a counter C, initialized to 0 and incremented by 1 each time a point of B is found to be more than Γ_d away from every point of A. When C exceeds N*(1-Γ_s), the computation at the current position is aborted and the NSV is assigned an arbitrary value less than Γ_s; the template is then translated to the next position and the process repeats. In this way a great deal of unnecessary computation is avoided and the efficiency is improved significantly.
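The counter-based early abort can be written directly from this description; dist is assumed to be the precomputed distance transform of the cleaned edge map.

```python
def nsv_with_early_abort(dist, template_points, gamma_d, gamma_s):
    """NSV with the counter C described above. dist[r, c] is the precomputed
    distance from pixel (r, c) to the nearest edge pixel."""
    n = len(template_points)
    allowed_misses = int(n * (1.0 - gamma_s))
    hits = 0
    misses = 0                                   # the counter C
    for r, c in template_points:
        if dist[r, c] < gamma_d:
            hits += 1
        else:
            misses += 1
            if misses > allowed_misses:          # cannot reach gamma_s anymore
                return 0.0                       # any value below gamma_s will do
    return hits / float(n)
```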

Discarding the regions rejected in the template comparison can lead to low detection rates because a face region may be connected to another skin-tone region. For example, in Fig. 1 (c) the woman's face region was rejected in the first round of template comparison because the extension of her neck caused improper scaling of the template, as illustrated in Fig. 4. We designed an Iterative Partition Process to solve this problem: the rejected regions are divided into smaller sub-regions to see whether new face candidates can be found. A similar algorithm is discussed in [11]; however, the objective in [11] is to find frontal-view face candidates, so a clustering partition is used to generate convex sub-regions. In this application, regions are split vertically into several parts of uniform width (the empirical number is 3), as in Fig. 4, so that the frontal part of the side-view face can be separated from the rest of the region. The resulting smaller regions undergo the same template comparison as described above, and the process is applied iteratively until no more face candidates can be found. Fig. 1 (g) illustrates that the frontal part of the woman's face was detached from the neck and picked up by the Iterative Partition Process. In the binary image two other small regions are also falsely accepted as face candidates; these candidates are subject to further verification in the final stage.
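A sketch of the vertical split used by the Iterative Partition Process, with the empirical value of three parts as the default; bounding boxes are again assumed to be (x0, y0, x1, y1) tuples.

```python
def split_region(region_box, parts=3):
    """Split a rejected region into `parts` vertical slices of uniform width,
    each of which is then re-scored with region_similarity()."""
    x0, y0, x1, y1 = region_box
    step = (x1 - x0) // parts
    slices = []
    for i in range(parts):
        right = x1 if i == parts - 1 else x0 + (i + 1) * step
        slices.append((x0 + i * step, y0, right, y1))
    return slices
```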

2.4 Final verification with Hidden Markov Models

The final verification stage determines whether a candidate region is indeed a face by inspecting the intensity patterns within the region. Facial features such as eyes and eyebrows are usually darker than the rest of the face, which motivated us to divide the candidate region into a number of horizontal stripes of uniform height and check which stripes have higher average intensity than others. However, manually identifying the patterns unique to faces is tedious and difficult because of the variation among faces and the complications caused by factors such as expression and hairstyle. By using Hidden Markov Models, we enable the system to capture face patterns automatically, which solves this difficulty elegantly.

The Hidden Markov Model (HMM) is a popular technique widely used in signal processing and is particularly powerful for recognizing time-series patterns. The essence of an HMM is to construct a model that explains the occurrence of observations (symbols) and to use it to identify other observations; [9] presents the fundamentals of HMMs and their applications. We used a collection of face candidate regions containing both real-face regions and false-positive regions as the training set. Each region is partitioned into a number of horizontal stripes (16 in our implementation) and the average intensity values of the stripes are used as the observation sequence for the HMM. Based on these observation sequences, we trained two HMMs, for face and non-face respectively. For a given candidate region, the same type of pattern is extracted and presented as the observation sequence to both HMMs; if the face HMM generates the larger response, the region is labeled as a face, otherwise it is rejected. The final result of the system is shown in Fig. 1 (h).
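A sketch of the verification step, assuming the hmmlearn package and Gaussian-emission HMMs; the number of states and training iterations are arbitrary choices, since the paper does not specify its HMM topology.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM   # assumed toolkit; any HMM library works

N_STRIPES = 16

def stripe_features(gray_region):
    """Average intensity of 16 horizontal stripes, top to bottom
    (one observation per stripe, as a column vector)."""
    stripes = np.array_split(gray_region.astype(np.float64), N_STRIPES, axis=0)
    return np.array([[s.mean()] for s in stripes])

def train_models(face_regions, nonface_regions):
    """Fit one HMM on real-face candidates and one on false positives."""
    def fit(regions):
        X = np.vstack([stripe_features(r) for r in regions])
        lengths = [N_STRIPES] * len(regions)
        return GaussianHMM(n_components=4, n_iter=50).fit(X, lengths)
    return fit(face_regions), fit(nonface_regions)

def is_face(gray_region, face_hmm, nonface_hmm):
    """Accept the candidate if the face model gives the larger log-likelihood."""
    obs = stripe_features(gray_region)
    return face_hmm.score(obs) > nonface_hmm.score(obs)
```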

3. Experimental results

Our system has been tested on 130 images representing varying degrees of difficulty with respect to different factors of the problem, e.g., skin-tone objects in the background, partial occlusion and varying scales. The images contain 60 frontal faces and 85 side-view faces; 60 of the side-view faces are correctly detected, and 15 false positives are generated, 7 of which are caused by frontal-view faces. The performance is encouraging considering the complexity of the test images and the fact that the recall and precision are comparable to the state of the art of frontal-view face detection systems. More results are presented in Fig. 5. We observed that false positives are often found in skin-tone regions rich in texture, whereas false negatives are usually faces whose nose protrusion is not obvious or which are enclosed by other skin-tone objects.

4. Summary and future work

Our work extends the state of the art of face detection research by providing the ability to detect side-view faces, which is not supported by existing systems. The proposed system works in a top-down fashion on color images. Each of the four stages generates hypotheses and seeks evidence to verify them; the hypotheses correspond to face candidate regions and the evidence consists of particular image features. Unsupported regions are rejected, while accepted ones are subject to refinement in the next stage until the faces are finally located.

The system contains three original components. The first is the design of the NSV to detect the presence of a given curve pattern, which is especially useful when the pattern cannot be modeled mathematically; it does not require explicit correspondence between pixels in the template and the image, which makes it well suited to detecting patterns in complex scenes. The Iterative Partition Process is another novel part of our scheme: it minimizes falsely rejected regions by detaching candidate regions from the extraneous regions to which they are connected. Finally, we explored the use of HMMs to recognize objects in images. The HMM is a popular technique for identifying time-series patterns, and our experiments show that it also works well on 2D images.

Further refinements of each stage will be made to improve the performance in terms of precision, recall and efficiency. We are developing a module for the final verification stage that uses a time-warping technique, and we will compare its performance with that of the current HMM method. The frontal-view face detection system in [11] will be incorporated to obtain an omni-face detector capable of detecting faces irrespective of pose. We have already extended the frontal-view detection system to face tracking in video, and adding the side-view detection capability will greatly strengthen this tracking.

References

1. T.J. Cham and R. Cipolla (1999). Automated B-spline Curve Representation Incorporating MDL and Error-Minimizing Control Point Insertion Strategies. IEEE Trans. Pattern Anal. Mach. Intell., Vol. 21.

2. R. Chellappa, C.L. Wilson and S. Sirohey (1995). Human and machine recognition of faces: a survey. Proceedings of the IEEE, Vol. 83, No. 5, pp. 705-740.

3. P.E. Danielsson (1980). Euclidean distance mapping. Computer Graphics and Image Processing.

4. R.C. Gonzalez and R.E. Woods (1993). Digital Image Processing. Addison-Wesley Publishing Company.

5. L. Harmon, M.K. Khan, R. Lasch and P.F. Ramig (1981). Machine identification of human faces. Pattern Recognition, Vol. 13, pp. 97-110.

6. D.P. Huttenlocher and W.J. Rucklidge (1992). A Multi-Resolution Technique for Comparing Images Using the Hausdorff Distance. TR 92-1321, Department of Computer Science, Cornell University.

7. D.P. Huttenlocher, G.A. Klanderman and W.J. Rucklidge (1993). Comparing Images Using the Hausdorff Distance. IEEE Trans. Pattern Anal. Mach. Intell., Vol. 15, No. 9, pp. 850-863.

8. G. Medioni and Y. Yasumoto (1987). Corner Detection and Curve Representation Using Cubic B-Splines. Computer Vision, Graphics and Image Processing, Vol. 39, pp. 267-278.

9. L.R. Rabiner and B.H. Juang (1986). An Introduction to Hidden Markov Models. IEEE ASSP Magazine, January 1986, pp. 4-15.

10. C.J. Wu and J. Huang (1990). Human face profile recognition by computer. Pattern Recognition, Vol. 23, pp. 255-259.

11. G. Wei and I.K. Sethi (1999). Face Detection for Image Annotation. Pattern Recognition Letters, Vol. 20, pp. 1313-1321.


Figure 2. The distribution of skin-tone pixels in the Y-I plane.

Figure 3. Side-view face templates

Figure 4. Illustration of iterative region splitting for side-view face candidate selection.

Figure 5. More examples: (a)-(f).