IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2005), Saskatoon, SK, Canada, May 1-4, 2005


3D Characteristic Facial Contours

Xue Dong Yang William Xu Boting Yang Department of Computer Science

University of Regina Regina, Saskatchewan S4S 0A2

{yang, xuwill11, boting}@cs.uregina.ca

Abstract

Since the 1970s, many sophisticated face recognition techniques have been developed, and the performance of existing face recognition systems has improved steadily. However, two very challenging problems remain: changes in lighting conditions and head poses. Several approaches, mostly based on 3D models with high computational costs, have been proposed in recent years. In this paper, we investigate the feasibility of using only a small set of characteristic contours extracted from 3D face models. The 3D surface-matching problem is thereby reduced to a contour-matching problem, which is further simplified into a one-dimensional z-distance comparison. Our preliminary experiments were performed on a small database containing 20 different face models, including a pair of identical twin brothers. The results show very encouraging performance in both comparison accuracy and computational speed. Promising directions for further research are discussed.

Keywords: Face Recognition, 3D Facial Model, Facial Database Retrieval

1. Introduction

In recent years, face recognition has become a very active research area, owing to its wide range of commercial applications and the availability of feasible technology after nearly 30 years of research. There are numerous commercial and law enforcement applications of face recognition technology [5], including static matching of photographs, such as passport and driver's license photos, and real-time matching of surveillance video images.

The face matching techniques can be broadly categorized into two types of approaches: statistical approaches and neural network approaches. The main idea of the statistical approach is to use eigenfaces for face recognition [10]. Each face in the database is represented by a weight vector, obtained by projecting the image onto the eigenface components with an inner product operation. This approach is relatively robust to changes in lighting conditions but degrades quickly with changes of scale. The idea of eigenfaces can also be

extended to eigenfeatures [1] such as eigeneyes, eigenmouths, etc. Neural networks have been used for problems such as gender classification, face recognition, and facial expression classification (e.g., [4]). Readers may refer to [12] for a comprehensive overview of the field. Many of these techniques work reasonably well in controlled environments. However, they are often not sufficiently robust to handle several challenging problems, such as changes in lighting conditions, head poses, cosmetics, or hair styles [12]. Illumination variation and pose variation are the most difficult among these problems.

In the past decade, research interest in using 3D face models for face recognition has increased dramatically. With 3D face models, the problems of changing lighting conditions and head poses are greatly attenuated by the very nature of the models. In [3], a 3D morphable model was proposed using Principal Component Analysis (PCA). The geometry of a face is represented by a shape vector S and its texture by a texture vector T; a human face is then expressed as a linear combination of the shapes and textures of m exemplar faces. When matching the morphable model against a 2D input image, the coefficients are optimized so that the model produces an image as close as possible to the input. In [9], the system uses 12 facial features, such as longitudinal sections and transections, for face recognition. Using 3D head models, synthetic face images can be generated under different poses and illumination conditions [8]; a component-based detector then detects the face and extracts fourteen facial components from each synthetic image. Curvature information calculated from range image data has also been used for face recognition [7]. Facial feature information, such as the locations of the eyes, nose, and mouth, is used to normalize the range image into a standard position, and the system computes the volume of space between two face surfaces as a measure of similarity; a smaller value indicates higher similarity between two faces.

There are several disadvantages and limitations of the existing 3D-based face recognition techniques. For example, when the morphable model obtained from the 3D scan database is changed, the coefficients of all the gallery images must be recalculated. Facial-feature-based techniques require very high-resolution face models. The accuracy of the

0-7803-8886-0/05/$20.00 ©2005 IEEE. CCECE/CCGEI, Saskatoon, May 2005


feature extraction affects the model's transformation into a standard position, and hence the comparison result. Most of these techniques apply a linear search to find the matching image, which is very expensive for a large database.

The motivation of this paper is to study a new 3D face recognition technique that overcomes some of the problems in previous techniques. The main idea is to compare characteristic contour curves extracted from normalized 3D face models. The differences between the characteristic contour curves of two 3D facial models provide a quantitative measure of similarity. The 3D surface-matching problem is therefore reduced to a contour-matching problem, which is further simplified into a one-dimensional z-distance comparison. Since only the extracted characteristic contours are saved in the data structure, the size of a 3D facial database can be greatly reduced. Furthermore, a binary search algorithm for retrieving face models from the database is developed, greatly improving the efficiency of the search process. Our preliminary experiments were performed on a small database containing 20 different face models, including a pair of identical twin brothers, and show very encouraging performance in both comparison accuracy and computational speed. Promising directions for further research are discussed.

2. 3D Characteristic Facial Contours

The human head has eyes, ears, a mouth, a nose, a forehead, cheeks, a chin, and hair. The same person can wear different hairstyles at different times, which can make him or her look different; hair is therefore not useful for matching two face models in general. On the other hand, setting aside facial expressions, facial components such as the eyes, mouth, nose, and forehead are in fixed locations on the head, and their sizes are fixed for an individual. The 3D information about these components, such as size, shape, and location, differs from person to person: no two people have exactly the same facial components, not even identical twins. People use facial components instinctively; for example, we often describe a person by characteristic features such as small eyes, a big nose, or a big head. Based on these observations, using 3D face models in face recognition is very promising.

Accurate acquisition of the range data of 3D face models is crucial to any 3D face recognition approach. The two devices commonly used to acquire range data are the stereo vision camera and the 3D laser scanner. 3D laser scanners produce more accurate range data than stereo vision systems; stereo vision techniques depend heavily on the availability of fine texture in the image and thus produce poor results in smooth regions. However, some existing stereo vision systems have been reported to obtain 3D range data with accuracy

close to that of 3D laser scanners. In this research, we used a 3D laser scanner (Minolta Vivid 900) to collect 3D face models; their resolution is about 1 mm. We also downloaded a set of models obtained by stereo vision techniques [6] for testing purposes. A single face data file obtained from the laser scanner is about 1.5 to 2.0 MB, which would result in a very large database. The models generated from stereo vision are represented as adaptive polygon meshes with much lower resolutions. The challenges of using 3D face models in face recognition include, but are not limited to:

1. Can we compare the 3D face models’ data when they are in different resolutions?

2. How can we compare the 3D face models’ data when they are in different scales?

3. Can we reduce the size of the data file so that we have a reasonably sized database for large applications?

The 3D face models acquired from either the 3D laser scanner or the stereo vision camera can vary in size and orientation. Most face recognition systems normalize the face models before applying matching algorithms, typically using extracted feature points as reference points. For example, in [9], the system uses the locations of the nose bridge, the nose base, and the outside corners of the eyes to transform the face models into a standard position. We normalize a 3D face model into a standard orientation by the following steps:

1. For the profile view: Align the top of eyebrows and the top of the lip vertically;

2. For the front view: Align the center of the two eyes horizontally;

3. For the top view: Align the rear edge of the two ears horizontally.

The other problem with 3D face models is that they may have different scales. Although 3D range data acquired from the laser scanner has no scaling problem, models generated from 2D images by stereo vision techniques do. Our system uses the distance between the edges of the two ears as the reference distance for normalizing the size of the head.

A person's hairstyle can change dramatically over time, and even for the same style the shape is never kept exactly the same. Hairstyles are therefore not reliable information for identification. It should also be noted that current 3D devices produce poor or no data in the hair region. We instead use a standardized central region of the face for further processing, with the nose tip as the reference point. Our system intersects a sphere with the polygon mesh to obtain a standardized face model, also in the form of a polygon mesh. The radius of the sphere is 90 mm (in our normalized scale). We obtain the center of the sphere by


adding 10 mm to the z value of the nose tip, since this is roughly the center of the face. After intersecting the sphere with the face model, the polygons outside the sphere are discarded, and polygons intersecting the sphere are clipped to generate new boundary polygons. These boundary polygons are combined with the polygons inside the sphere, yielding our standardized polygon mesh for the face model. Figure 1 shows the face models (a) before and (b) after the intersection.
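The cropping step can be sketched at the vertex level. The following Python sketch assumes the vertices are given as an (n, 3) array in the normalized millimetre scale; the function and argument names are illustrative, and the clipping of polygons that cross the sphere into boundary polygons is omitted.

```python
import numpy as np

def standardize_face(vertices, nose_tip, radius=90.0):
    """Keep only the mesh vertices inside a sphere centered near the nose tip.

    A simplified, vertex-level sketch of the crop described above; the full
    method also clips polygons that cross the sphere into boundary polygons.
    """
    center = np.array(nose_tip, dtype=float)  # copy so the caller's data is untouched
    center[2] += 10.0                         # sphere center: nose-tip z value plus 10 mm
    verts = np.asarray(vertices, dtype=float)
    dist = np.linalg.norm(verts - center, axis=1)
    return verts[dist <= radius]
```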

(a) Before Intersection

(b) After Intersection Figure 1. Standardized Face Models

Comparing faces using the standardized polygon mesh models is still a difficult and computationally intensive task, particularly when the two meshes have different resolutions. We hypothesize that a small subset of the polygon mesh data is sufficient to differentiate between persons. In this study, we use six experimentally selected planes to intersect the standardized face models and obtain six characteristic contours (cross sections) of each face model (Figure 2). The first plane is a vertical plane cutting through the center of the sphere; it lies on the vertical center of the face model, since faces are approximately symmetric about the nose ridge. The second (or third) plane is a vertical plane 35 mm to the left (or right) of the first plane, cutting vertically through the vicinity of the center of the left (or right) eye. The fourth plane is a horizontal plane cutting through the center of the sphere, 10 mm above the nose tip. The fifth plane is 50 mm above the fourth plane, cutting horizontally through the vicinity

of the top of the eyebrows. The sixth plane is 30 mm below the fourth plane, cutting horizontally through the region between the nose base and the mouth.

Figure 2. The Positions of Cutting Planes.

For each plane, we first find the polygons that intersect it. We outline this computation briefly below (details of the algorithm can be found in [11]):

• Find the polygon faces that intersect the cutting plane by checking whether the vertices of each polygon all lie on the same side of the plane, and store the intersecting faces in a polygon face list;

• For each polygon in the list, traverse its edges to compute the polygon edge/cutting plane intersection points;

• By connecting the intersection points in the proper order, obtain the intersection contour curve between the face model and the cutting plane. An example is shown in Figure 3.
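The first two steps can be sketched as follows for a triangle mesh. This is a minimal illustration under assumed names; ordering the points into a connected curve, as the third step requires, is left out, and vertices lying exactly on the plane are ignored.

```python
import numpy as np

def plane_contour_points(triangles, plane_point, plane_normal):
    """Collect edge/plane intersection points for triangles that straddle
    a cutting plane.

    triangles: (n, 3, 3) array of triangle vertex coordinates.
    Returns an (m, 3) array of intersection points (unordered).
    """
    n = np.asarray(plane_normal, dtype=float)
    p0 = np.asarray(plane_point, dtype=float)
    points = []
    for tri in np.asarray(triangles, dtype=float):
        # Signed distances of the three vertices to the plane.
        d = (tri - p0) @ n
        if np.all(d > 0) or np.all(d < 0):
            continue  # all vertices on one side: no intersection
        for i in range(3):
            a, b = d[i], d[(i + 1) % 3]
            if a * b < 0:  # this edge crosses the plane
                t = a / (a - b)
                points.append(tri[i] + t * (tri[(i + 1) % 3] - tri[i]))
    return np.array(points)
```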

Figure 3. Extracted Contours.

3. Facial Contour Comparison


There are many existing techniques for comparing 2D shapes, but most of these algorithms handle 2D polygonal (closed) shapes with the same number of vertices. Comparing the similarity of open curves is nontrivial and can be very difficult, especially for curves with different numbers of vertices. The first difficulty is that our contours are open shapes: although the contours in Figure 3 appear closed, they are actually open curves, since the spherical portions of the curves were added for display purposes and cannot be used for similarity comparison. The second difficulty is that face models with different resolutions yield different numbers of vertices on a contour, since the contours are obtained from different numbers of polygons. The third difficulty is that we need a similarity value for each pair of contours, because these values are summed to obtain the difference between two face models. We tried several existing 2D shape comparison algorithms (e.g., [2]), but all failed to overcome these three problems.

Two major challenges in comparing 2D shapes are finding (1) the best orientation and (2) reference points for the two shapes. Fortunately, our 2D contours are obtained from standardized polygon meshes and cutting planes, which means the extracted contours are already standardized. We therefore propose the idea of scan-line z-distances for comparing the similarity between two contours. As shown in Figure 4, based on the reference point (the nose tip), we use horizontal (or vertical) lines at 2 mm intervals to scan through the contours.

Figure 4. Scan-line z-distances.

It is clear that the interval between the sample points is uniform regardless of the original resolution of the polygon mesh, which solves the resolution problem for the subsequent comparison step. The only difference is that a higher-resolution original mesh produces more accurate sample positions. To calculate the similarity between two contours, we only need to compute the sum of the absolute differences of the transformed z-coordinates between each pair of vertices within the overlapping range of lower and upper bounds in their corresponding vertex lists. The formula is as follows:

$\mathrm{Diff}_c = \sum_{i=1}^{m} \left| Z_{ai} - Z_{bi} \right|$

The variable m is the number of scan-line vertices within the range of the lower and upper bounds. As the formula shows, if two shapes are very similar, the value of Diff_c will be close to zero, and Diff_c increases as the similarity between the two contours drops. By adding up all six contour differences, we can check whether the two face models are close enough to be considered the same person. We denote this sum by Diff_m and use two threshold values, T0 and T1, both experimentally determined for reliable decisions. If Diff_m < T0, a match is found; if Diff_m > T1, a mismatch is concluded.
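The per-contour comparison can be sketched as below, assuming the two contours have already been resampled onto the same fixed 2 mm scan-line positions (the resampling itself is omitted, and NaN is used here to mark scan lines that fall outside a contour's bounds).

```python
import numpy as np

def contour_difference(za, zb):
    """Diff_c: sum of absolute z-differences over the scan lines that
    fall inside both contours' lower/upper bounds."""
    za = np.asarray(za, dtype=float)
    zb = np.asarray(zb, dtype=float)
    overlap = ~np.isnan(za) & ~np.isnan(zb)  # overlapping scan lines only
    return float(np.sum(np.abs(za[overlap] - zb[overlap])))
```

Summing this value over the six contours gives Diff_m for the threshold test.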

If Diff_m falls between T0 and T1, we cannot be sure whether a match has been found; we call the range between these two thresholds the uncertainty range. There are two possible causes: the location of the nose tip may not be accurate enough because of errors introduced while acquiring the face models, and the orientation of the face model may differ due to errors in face normalization. To address both, we apply two optimization steps to find the smallest possible Diff_m: 1) translate the nose tip by small displacements in the vertical and horizontal directions; 2) rotate the face model by small angles. If the minimum Diff_m < T0, a match is accepted.
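The two optimization steps can be sketched as a small grid search. The diff_m callback, which would re-extract the contours under each perturbation and return the model difference, is a hypothetical interface, as are the default step sizes.

```python
import itertools

def refine_match(diff_m, shifts=(-2.0, 0.0, 2.0), angles=(-2.0, 0.0, 2.0)):
    """Search small nose-tip translations (mm) and head rotations (degrees)
    for the perturbation giving the smallest Diff_m.

    diff_m(dx, dy, angle) is a caller-supplied evaluation function; its
    signature here is an illustrative assumption.
    """
    best = min(itertools.product(shifts, shifts, angles),
               key=lambda p: diff_m(*p))
    return best, diff_m(*best)
```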

If the linear search used by most existing algorithms were employed here, a significant amount of computation would still be required, especially for a database containing a large number of face models. We also notice that human faces have different width-to-height ratios. By using this characteristic of our uniquely standardized face models to arrange them in the database, we can greatly improve the efficiency of the search process.

In our system, we use the ratio of the height (the yDist of the first vertical contour) to the width (the xDist of the first horizontal contour) of the standardized face model to sort the face models in the database. When searching the database for a matching face model, we use the ratio of the query model and extract only those face models whose ratios are close to it for the matching process. We therefore do not need to perform a linear search over every face model; since the database is already sorted by ratio, locating the candidates becomes a binary search on a scalar value, which costs only O(log n) per query after a one-time O(n log n) sort.
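The ratio-based retrieval can be sketched with a standard binary search. The tolerance value and names below are assumptions for illustration.

```python
import bisect

def candidates_by_ratio(sorted_ratios, query_ratio, tol=0.02):
    """Return the index range [lo, hi) of models whose height/width ratio
    lies within +/- tol of the query's, given a ratio-sorted database."""
    lo = bisect.bisect_left(sorted_ratios, query_ratio - tol)
    hi = bisect.bisect_right(sorted_ratios, query_ratio + tol)
    return lo, hi
```

Only the models in the returned index range need the full contour comparison.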

The original file size of a typical face model is about 1.5 to 2.0 MB. By storing only the 3D characteristic contour information, the size of a face model's representation is reduced to 7 to 8 KB, about 0.5% of the original file size. Our system thus greatly reduces the storage requirements.

4. Experimental Results


A small database was created for testing purposes. Due to the difficulty of finding volunteers, it contains only 20 different faces. However, the small size of the database is partly compensated for by the inclusion of two identical twin brothers, who pose a great challenge to any face recognition technique.

Figure 5 shows the average vertex differences of the 20 face models in the database, each compared with model 15. The second-lowest value (model 14), 1.39 units, is the result of comparing the twin brothers' face models.

Figure 5. The average differences.

To check the robustness of our algorithm, we add noise to the face models by rotating a face model by a small random amount about the x, y, and z axes. Figure 6 shows the comparison of a face model with itself before and after the random noise is added; the average vertex difference values range from 1.4 to 1.9 units. When the optimization step is applied, the average difference values all fall within 0.9 units (Figure 7).

Several more experimental results and a comprehensive analysis can be found in [11]; due to space limitations, they are not included in this paper. It should be pointed out that the choice of six characteristic contours and their positions is based on a large amount of empirical evidence. We observed that when more contours are used for comparison, the discrimination capability of the system actually decreases. Although the selected cutting positions appear effective, they are by no means optimal. Investigating how many contours are sufficient, and where their optimal positions lie, using a significantly larger set of sample facial models, will be an interesting topic for future research.

Figure 6. Self-comparison without optimization.

Figure 7. Self-comparison with optimization.

For comparison purposes, we implemented the existing algorithm described in [7]. In its experimental results, some of the self-differences between the noisy face models are close to or larger than the cross-differences between different face models, potentially leading to wrong matches. The existing algorithm [7] is thus more sensitive to noise than our method.

5. Conclusions and Future Research


The primary objective of this research is to investigate the feasibility of face recognition using only a small set of characteristic contours extracted from 3D face models. Six characteristic contours are extracted from the normalized 3D face model at empirically selected positions, using the nose tip as the reference point. With these contours, we reduce the complex 3D surface-matching problem to a 2D contour-matching problem. Scalar values measuring the similarity between two sets of characteristic contours are calculated from the lists of scan-line z values derived from each contour, which further simplifies the 2D contour-matching problem into a 1D z-distance comparison. Very promising results have been obtained in our experiments. To improve the robustness of the system, we apply an optimization process that reduces errors introduced at the face normalization stage.

We proposed an indexing scheme that enables binary search of the face model database. In addition, the data files in our database store only the characteristic contour information for each face model, about 0.5% of the original size. Our technique thus shows a high degree of efficiency in both storage and search computation for the face model database.

The current face normalization procedure is not fully automatic, as it requires a few manual operations. In our opinion, it is highly achievable to develop a fully automatic algorithm that reliably detects the reference positions on a 3D face model; different 3D facial features that may serve as the basis of normalization should be studied. Our novel attempt to index a face model database by a single scalar value suggests a very promising research direction: alternative index formulations should be studied in depth and comprehensively evaluated. The current face model database is relatively small, due primarily to the limited pool of volunteers; a much larger database may raise new research issues, requiring continued research in this direction.

Acknowledgement

This work was conducted in the New Media Studio Lab at the University of Regina that is funded by a CFI Grant.

References

[1] Akamatsu, S., Sasaki, T., and Fukamachi, H., “A Robust Face Identification Scheme - KL Expansion of An Invariant Feature Space,” SPIE Proc.: Intell. Robots and Computer Vision X: Algorithms and Techniques, Vol. 1607, pp. 71-84, 1991.

[2] Arkin, E.M., Chew, L.P., Huttenlocher, D.P., and Mitchell, S.B., “An Efficiently Computable Metric for Comparing Polygonal Shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, pp. 209-216, 1991.

[3] Blanz, V., and Vetter, T., “Face Recognition Based on Fitting a 3D Morphable Model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, pp. 1063-1074, September, 2003.

[4] Brunelli R., and Poggio, T., “HyperBF Networks for Gender Classification,” Proc. of DARPA Image Understanding Workshop, pp. 331-314, 1992.

[5] Chellappa, R., Wilson, C.L., and Sirohey, S., “Human and Machine Recognition of Faces: A Survey,” Proc. of the IEEE, Vol. 83, pp. 705-740, 1995.

[6] http://cvlab.epfl.ch/research/face/stereo/stereo.html (as accessed in May 2004)

[7] Gordon, G., “Face Recognition from Frontal and Profile Views,” Int’l Workshop on Face and Gesture Recognition, pp. 47-52, 1996.

[8] Huang, J., Heisele, B., and Blanz, V., “Component-based Face Recognition with 3D Morphable Models,” 4th Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA), 2003.

[9] Lee Y., “3D Face Recognition Using Longitudinal Section and Transection,” DICTA 2003, pp. 49-58, 2003.

[10] Turk, M.A., and Pentland, A.P., “Face Recognition Using Eigenfaces,” Proc. CVPR 1991, pp. 586-591, 1991.

[11] Xu, W. Characteristic Contour-based 3D Face Model Recognition Technique, M.Sc. Thesis, Dept. of Computer Science, University of Regina, 2004.

[12] Zhao, W., Chellappa, R., Rosenfeld, A., and Phillips, P.J., “Face Recognition: A Literature Survey,” CVL Technical Report, University of Maryland, 2000.
