
Synthesis of 3D faces using region-based morphing under intuitive control


By Yu Zhang* and Norman I. Badler

This paper presents a new region-based method for automatically synthesizing varied, natural-looking 3D human face models by morphing local facial features according to intuitive control parameters. We automatically compute a one-to-one vertex correspondence among the unregistered face scans in a large database by deforming a generic mesh to fit each person's face geometry in a global-to-local fashion. With the obtained correspondence, we transform the generated data sets of feature shapes into vector space representations. We parameterize the example models using face anthropometric measurements that reflect the facial physical structure, and predefine interpolation functions for the parameterized example models based on radial basis functions. At runtime, the interpolation functions are evaluated to efficiently generate the appropriate feature shape, taking the anthropometric parameters as input. We use a shape blending approach to generate a seamlessly deformed mesh around the feature region boundary. The correspondence among all example textures is established by parameterizing the 3D generic mesh over a 2D image domain. A new feature texture with the desired attributes is synthesized by interpolating the example textures. Our 3D face synthesis method has several advantages: (1) it regulates the naturalness of synthesized faces, maintaining the quality present in the real face examples; (2) the region-based morphing and comprehensive face shape and texture control parameters allow more diverse faces to be generated readily; and (3) the automatic runtime face synthesis is computationally efficient and fast. Copyright © 2006 John Wiley & Sons, Ltd.

Received: 10 April 2006; Revised: 2 May 2006; Accepted: 10 May 2006

KEY WORDS: face modeling; facial features; region-based morphing; anthropometry; texture; 3D scanned data

Comp. Anim. Virtual Worlds 2006; 17: 421–432. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cav.145

*Correspondence to: Y. Zhang, Center for Human Modeling and Simulation, Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104-6389, USA. E-mail: [email protected]

Introduction

Generation of realistic human face models is one of the most interesting problems in computer animation. Many applications such as character animation for films and advertisements, computer games, video teleconferencing, user-interface agents, and avatars require a large number of different faces. However, generating diverse 3D face models for these applications is a difficult and time-consuming task, particularly if realism is desired.

One avenue for creating 3D human face models is manual construction, either by deforming an existing model or by having an artist design one from scratch. Without a proper parameterization closely tied to the facial physical structure and constraints from real human faces, it usually requires a great deal of expertise and time-consuming manual control to avoid unrealistic results. With a significant increase in the quality and availability of 3D capture methods, a common approach to creating face models of real humans uses laser range scanners to acquire the face geometry and texture simultaneously. Scanning technology coupled with model reconstruction software works well for capturing static faces, but often requires significant effort to process the noisy and incomplete captured surface data. More limiting, however, is that the resulting model corresponds to a single individual and is difficult to modify automatically to yield a realistic, novel face.


The reported psychophysical evidence [1] suggests that internal facial features (eyes, nose, mouth, and chin) are important for discriminating faces. Thus, we construct a face synthesis system by perceiving the face as a set of feature regions. Such region-based face synthesis allows us to generate more diverse faces through various combinations of the synthesized features. In this paper, we present a new region-based method for automatically synthesizing varied realistic face models by morphing local facial features according to intuitive control parameters. Our method takes 3D scanned data as examples to exploit the variations of real human faces. We use a three-step model fitting approach for the 3D registration problem. The obtained correspondence enables the application of principal component analysis (PCA) to exemplar shapes of each facial feature to build a local shape space. We parameterize the example models using face anthropometric measurements, and predefine the interpolation functions for the parameterized example models based on radial basis functions. At runtime, the interpolation functions are evaluated to efficiently generate the appropriate feature shapes by taking the anthropometric parameters as input. We use a shape blending approach to generate a seamlessly deformed mesh around the feature region borders. We determine correspondence among the example textures by constructing a parameterization of the 3D generic mesh over a 2D image domain. With the texture interpolators formulated, runtime texture synthesis reduces to evaluating the interpolation functions according to the texture attribute parameters.

Previous and Related Work

Face modeling and animation is an active area of research in computer graphics (see Reference [2] for an excellent survey). For our purpose, we focus on the modeling of static faces, which is directly related to our work. In this category, several methods are documented in the literature. The aim of the early parametric models [3-6] was to create an encapsulated model that could generate a wide range of faces based on a small set of input parameters. However, manual parameter tuning without constraints from real human faces makes generating a realistic face difficult and time-consuming. Furthermore, the choice of the parameter set depends on the face mesh topology, and therefore the manual association of a group of vertices to a specific parameter is required. Control of face texture is also ignored in these models.

The image-based technique [7-12] utilizes an existing 3D face model and information from a few pictures to reconstruct both geometry and texture. Although this technique can provide reconstructed face models easily, its drawbacks are inaccurate geometry reconstruction and the inability to generate new faces without image counterparts.

DeCarlo et al. [13] construct a range of face models using a variational constrained optimization technique with anthropometric measurements as constraints. However, this approach requires about 1 minute of computation for the optimization process to generate a new face, and it lacks texture control. In contrast, our method generates a face with the desired shape and texture within a second. Moreover, we utilize prior knowledge of the face shape in relation to the given measurements to regulate the naturalness of modeled faces, maintaining the quality that exists in the real faces of individuals. Kahler et al. [14] use statistical data of face anthropometric measurements to drive landmark-based face deformation according to growth and aging. Our goal is to rapidly generate variation of the face shape using anthropometric measurements as direct control.

Blanz and Vetter [15] present a process for estimating the shape of a face in a single photograph, together with a set of controls for intuitive manipulation of appearance attributes. There are several key differences from our work. First, they manually assign the attribute values to face shape and texture, and devise attribute controls for a single variable using linear regression. We automatically compute the anthropometric measurements for face shape and relate several variables simultaneously by learning a mapping between the measurement space and the shape space through scattered data interpolation. Second, they use a 3D variant of a gradient-based optical flow algorithm to derive the point-to-point correspondence. This approach does not work well for faces of different races or under different illumination, given the inherent problem of using static textures. We present a robust method of determining correspondences that does not depend on texture information. Third, our goal here is to synthesize faces under direct control by perceiving the face as a set of independent features.

Example-based synthesis is another stream of research related to our method. Rose et al. [16] and Sloan et al. [17] propose example-based motion blending frameworks employing scattered data interpolation. Lewis et al. [18] introduce an example-based pose space deformation technique, and Allen et al. [19] apply a similar technique to range scan data for creating new pose models. Allen et al. [20] and Seo et al. [21] present methods for generating a variety of human body shapes with intuitive parameters based on unorganized scanned data. Our method differs from these approaches in the following respects. First, we efficiently identify feature points for scanned data with much less user interaction and use a robust model fitting approach. Second, our framework is capable of generating variations of the model texture. Third, we use a more comprehensive set of control parameters to characterize the model appearance. Last but not least, we focus on modeling the face, especially facial features, and present effective algorithms to address the geometry and texture blending problems that arise in this context.

Face Data and Feature Point Identification

We use the USF face database [22], which consists of Cyberware scans of 186 human faces (126 male and 60 female) with a mixture of races and ages. Each subject is captured wearing a bathing cap and with a neutral expression. The laser scans provide face structure data (see Figure 1(a)) and a 360 × 524 RGB image for texture mapping (see Figure 1(b) and (c)). We use a generic head model created with Autodesk Maya (see Figure 1(h)); it consists of 2274 triangles.

Let each 3D face in the database be F_i (i = 1, ..., M). Since the number of vertices in F_i varies, we resample all faces in the database so that they have the same number of vertices, all in mutual correspondence. Feature points are identified to guide the resampling. In our method, the feature points are identified semi-automatically; Figure 1(d)-(g) depicts the process. A 2D feature template consisting of polylines groups a set of 83 feature points that correspond to the facial features such as the eyes, eyebrows, nose, mouth, and face outline. It is superimposed onto the front-view face image obtained by orthographic projection of a textured face scan. The facial features in this image are identified using Active Shape Models (ASM) [23], and the feature template is fitted to the features automatically. A little user interaction is needed to tune the feature point positions, owing to slight inaccuracies of the automatic facial feature detection. The 3D positions of the feature points on the scanned surface are then recovered by re-projection into 3D space. In this way, we efficiently define a set of feature points of a face F_i as U_i = {u_{i,1}, ..., u_{i,n}}, where n = 83. Our generic model G is already tagged with the corresponding set of feature points V = {v_1, ..., v_n} by default.

Figure 1. Face data and semi-automatic feature point identification: (a) scanned face geometry; (b) acquired color image; (c) textured face scan; (d) initial outline of the feature template; (e) after automatic facial feature detection; (f) after interactive user tuning; (g) and (h) feature points identified on the scanned data and generic model, respectively.


Model Fitting

The problem of deriving full correspondence for all models F_i can be stated as: resample the surface of each F_i using G under the constraint that v_j is mapped to u_{i,j}. We define an RBF interpolation function that gives a mapping between v_j and u_{i,j}:

  f(x) = \sum_{j=1}^{n} w_j \phi_j(\|x - v_j\|) + M x + t   (1)

where x ∈ R^3 is a vertex on the generic head model and \phi_j is a radial basis function. w_j, M, and t are the unknown parameters: w_j ∈ R^3 are the interpolation weights, M ∈ R^{3×3} represents the rotation and scaling transformation, and t ∈ R^3 represents the translation. Many different functions for \phi(r) have been proposed [24]. We had better results with the multi-quadric function \phi_j(r) = \sqrt{r^2 + \rho_j^2}, where the locality parameter \rho_j is determined as the Euclidean distance from v_j to the nearest other feature point. To determine w_j, M, and t, we solve the following equations:

  u_{i,j} = f(v_j), \quad j = 1, \ldots, n; \qquad \sum_{j=1}^{n} w_j = 0; \qquad \sum_{j=1}^{n} w_j^T v_j = 0   (2)

This system of linear equations is solved using an LU decomposition. The generic model is then deformed by feeding all its vertices into Equation (1) (see Figure 2(a)).
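As a concrete illustration of this step, the following Python sketch (our own, not the authors' implementation) fits the warp of Equations (1) and (2) from two NumPy arrays, v and u, holding the generic-model and scanned feature points, and returns a function that deforms arbitrary generic-mesh vertices. The kernel, the side conditions, and the dense LU-based solve follow the text; the array layout is an assumption.

```python
import numpy as np

def fit_rbf_warp(v, u):
    """Fit the global RBF warp of Equation (1): n multiquadric terms plus an affine
    part (M, t), constrained so that f(v_j) = u_j and the side conditions of Eq. (2) hold.
    v: (n, 3) feature points on the generic mesh; u: (n, 3) targets on the scan."""
    n = len(v)
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)    # pairwise distances
    rho = np.where(np.eye(n, dtype=bool), np.inf, d).min(axis=1)  # nearest-neighbour locality rho_j
    K = np.sqrt(d**2 + rho[None, :]**2)                           # multiquadric kernel; column j uses rho_j
    P = np.hstack([v, np.ones((n, 1))])                           # affine part [x | 1]
    A = np.block([[K, P], [P.T, np.zeros((4, 4))]])               # interpolation + side conditions
    b = np.vstack([u, np.zeros((4, 3))])
    sol = np.linalg.solve(A, b)                                   # dense LU-based solve
    w, affine = sol[:n], sol[n:]                                  # affine rows encode M and t

    def warp(x):
        """Apply Equation (1) to vertices x of shape (m, 3)."""
        r = np.linalg.norm(x[:, None, :] - v[None, :, :], axis=-1)
        phi = np.sqrt(r**2 + rho[None, :]**2)
        return phi @ w + x @ affine[:3] + affine[3]
    return warp
```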

We further improve the shape using a local deformation that ensures all generic mesh vertices are truly embedded in the scanned surface. The local deformation is based on the closest points on the surfaces of the generic model and the scanned data: the vertices of the generic model are displaced towards their closest positions on the surface of the scanned data. The polygons of the scanned data are organized into a binary space partition tree to speed up closest point identification. Each generic mesh vertex takes the texture coordinates of its sampling point. Figure 2(b) and (c) show the result of the local deformation.

In order to accurately represent high-resolution surface detail, we use a quaternary subdivision scheme to construct a subdivision hierarchy on top of the base mesh resulting from the local deformation process. The vertices newly generated by subdivision do not necessarily lie on the scanned surface. We employ a normal mesh [25] to represent a hierarchy of surface details: for each newly generated vertex, we compute the distance along its normal direction to the nearest point on the scanned surface. By applying the computed distances as displacements to the vertices, the subdivision mesh is refined (see Figure 2(d)). In our experiments, normal meshes up to level 2 are used as a good trade-off between approximation accuracy and computational cost.
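A simplified sketch of the local deformation step is given below. It is not the paper's BSP-tree search over the scanned polygons; instead it approximates the closest-point query with a nearest-vertex lookup in a SciPy k-d tree, which is a reasonable stand-in when the scan is dense.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_deformation(generic_vertices, scan_vertices):
    """Snap each generic-mesh vertex towards its closest sample on the scan.
    Approximation: nearest scan vertex instead of nearest point on the scanned surface."""
    tree = cKDTree(scan_vertices)
    _, idx = tree.query(generic_vertices)   # index of the closest scan vertex per generic vertex
    return scan_vertices[idx]               # displaced positions (their texture coordinates can be reused)
```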

Region-Based Face Shape Morphing

Forming Local Shape Spaces

To morph local shapes of facial features, we form local shape spaces using PCA. The model fitting process generates the necessary vertex-to-vertex correspondence across the 3D faces in the database, which is the prerequisite for PCA. We divide faces into subregions that can be morphed independently. Since all face scans are in correspondence through mapping onto the generic model, it is sufficient to define these regions only on the generic model. We partition the generic mesh into four regions: eyes, nose, mouth, and chin. The segmentation is transferred to the multi-resolution normal meshes to generate individualized feature shapes with correspondence (see Figure 2(e)). Note that, in order to isolate shape variation from position variation, we normalize all face scans with respect to the rotation and translation of the face before the model fitting step. Thus, PCA can be performed directly on the obtained data sets of feature shapes.

Figure 2. Model fitting: (a) generic model after global warping; (b) after local deformation; (c) textured; (d) level-by-level model refinement using the normal mesh; (e) meshes of four features decomposed from the level 2 normal mesh shown in (d).

Given the set {F} of features, we obtain a compact representation of the meshes of each facial feature using PCA. Let {F_i}, i = 1, ..., M, be a set of example meshes of feature F, each mesh being associated with one of the M meshes of the database. These meshes are represented as vectors that contain the x, y, z coordinates of the n_F vertices, F_i = (x_{i1}, y_{i1}, z_{i1}, ..., x_{i n_F}, y_{i n_F}, z_{i n_F}) ∈ R^{3 n_F}. Each mesh can be expressed as a linear combination of M + 1 meshes {\Phi_j^F}, j = 0, ..., M:

  F_i = \Phi_0^F + \sum_{j=1}^{M} \alpha_{ij}^F \Phi_j^F   (3)

where \Phi_0^F is the mean shape and \Phi_j^F are the eigenvectors of the covariance matrix of the set {F_i - \Phi_0^F}. By truncating the expansion of Equation (3) at j = k_F we introduce an error whose magnitude decreases as k_F is increased. We choose k_F such that \sum_{j=1}^{k_F} \lambda_j^F \geq \tau \sum_{j=1}^{M} \lambda_j^F, where \lambda_j^F are the eigenvalues and \tau defines the proportion of the total variation retained (98% for each feature in our experiments).

Each eigenvector is a new coordinate axis for our existing data; thus each feature mesh can be restated as a point in the space spanned by the PCA-derived orthogonal mesh basis. We call these axes eigenmeshes. The coefficients \alpha_{ij}^F (j = 1, ..., k_F) give the coordinates of the feature mesh in terms of the reduced eigenmesh basis.
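The eigenmesh construction can be summarized in a few lines of NumPy. The sketch below is our own illustration, under the assumption that the fitted feature meshes are stacked row-wise into a single array; it returns the mean shape, the truncated basis, and the per-example eigenmesh coordinates that later serve as interpolation targets.

```python
import numpy as np

def build_local_shape_space(meshes, tau=0.98):
    """Form the eigenmesh basis of Equation (3) for one facial feature.
    meshes: (M, 3*n_F) array, each row a feature mesh flattened as (x1, y1, z1, ...)."""
    mean_shape = meshes.mean(axis=0)                      # Phi_0
    X = meshes - mean_shape
    U, S, Vt = np.linalg.svd(X, full_matrices=False)      # PCA via SVD; rows of Vt are eigenmeshes
    eigvals = S**2 / (len(meshes) - 1)                    # lambda_j
    # keep the smallest k_F capturing at least tau of the total variation
    k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), tau)) + 1
    basis = Vt[:k]                                        # (k_F, 3*n_F)
    coords = X @ basis.T                                  # alpha_ij, per-example eigenmesh coordinates
    return mean_shape, basis, eigvals[:k], coords

def reconstruct(mean_shape, basis, alpha):
    """Invert Equation (3): recover a feature mesh from its eigenmesh coordinates."""
    return mean_shape + alpha @ basis
```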

Anthropometric Parameters

Although eigenmeshes represent the most salient directions of feature shape variation in the dataset, they bear little resemblance to the underlying interdependent structure of biological forms. Arguably, face anthropometry provides a set of meaningful measurements, or shape parameters, that allow the most complete control over the shape of the face. The anthropometric study in Reference [26] describes a set of 132 measurements for characterizing the human face. The measurements are taken between landmarks defined in terms of visually identifiable or palpable features on the subject's face; a total of 47 landmarks are used for describing the face. Following the conventions laid out in Reference [26], we have chosen a subset of 38 landmarks from the standard landmark set for anthropometric measurements (see Figure 3).

Instead of supporting all 132 measurements, in this paper we are concerned only with those related to the four facial features. The example models are placed in the standard posture for measurement. The measurements are computed using the Euclidean coordinates of the landmarks; in particular, the axial distances correspond to the x, y, and z axes of the world coordinate system. Such a systematic collection of anthropometric measurements is taken across all example models to determine their locations in a multi-dimensional measurement space.

Feature Shape Synthesis

From the previous stage we obtain a set of examples of each feature with measured shape characteristics, each of them consisting of the same set of dimensions, where every dimension is an anthropometric measurement. We assume that an example model F_i of feature F has m_F dimensions, where each dimension is represented by a value in the interval (0, 1]; a value of 1 corresponds to the maximum measurement value of that dimension. That is, the example measurements are normalized. The measurements of F_i can then be represented by the vector

  q_i^F = [q_{i1}^F, \ldots, q_{i m_F}^F], \quad \forall j \in [1, m_F]: q_{ij}^F \in (0, 1]   (4)

This is equivalent to projecting each example model F_i into a measurement space spanned by the m_F selected anthropometric measurements.
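As a small illustration (ours, with an assumed array layout), normalizing the raw measurements column-wise by their maxima produces the vectors of Equation (4):

```python
import numpy as np

def normalized_measurements(raw):
    """Project example models into the measurement space of Equation (4).
    raw: (M, m_F) array of anthropometric measurements (e.g. landmark distances);
    each column is divided by its maximum so values fall in (0, 1]."""
    return raw / raw.max(axis=0, keepdims=True)
```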

Figure 3. Anthropometric landmarks (green dots). The landmark names are taken from Reference [26].


With the input shape control thus parameterized, our goal is to generate a new deformation of the generic mesh by computing the corresponding eigenmesh coordinates under control of the measurement parameters. Given an input measurement q^F in the measurement space, the controlled deformation should interpolate the example models. To do this, we interpolate the eigenmesh coordinates of the example models. The interpolation is multi-dimensional: considering an R^{m_F} → R mapping, the interpolated eigenmesh coordinates \alpha_j^F(\cdot) ∈ R, 1 ≤ j ≤ k_F, at an input measurement vector q^F ∈ R^{m_F} are computed as

  \alpha_j^F(q^F) = \sum_{i=1}^{M} \gamma_{ij} R_i(q^F), \quad 1 \leq j \leq k_F   (5)

where \gamma_{ij} ∈ R are the radial coefficients and M is the number of example models. Let q_i^F be the measurement vector of an example model. The radial basis function R_i(q^F) is a multi-quadric function of the Euclidean distance between q^F and q_i^F in the measurement space:

  R_i(q^F) = \sqrt{\|q^F - q_i^F\|^2 + \rho_i^2}, \quad 1 \leq i \leq M   (6)

where \rho_i is the locality parameter used to control the behavior of the basis function:

  \rho_i = \min_{j \neq i} \|q_i^F - q_j^F\|, \quad i, j = 1, \ldots, M   (7)

The jth eigenmesh coordinate of the ith example model, \alpha_{ij}^F, corresponds to the measurement vector of the ith example model, q_i^F. Equation (5) should be satisfied for q_i^F and \alpha_{ij}^F; hence, substituting q_i^F and \alpha_{ij}^F for q^F and \alpha_j^F respectively in Equation (5), we obtain the interpolation constraints

  \alpha_{ij}^F = \sum_{l=1}^{M} \gamma_{lj} R_l(q_i^F), \quad 1 \leq j \leq k_F   (8)

The coefficients \gamma_{ij} are obtained by solving Equation (8) using an LU decomposition. We can now generate the eigenmesh coordinates, and hence the shape, corresponding to any input measurement vector q^F according to Equation (5).
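The precomputation of the radial coefficients and the runtime evaluation of Equation (5) amount to one dense solve and one matrix-vector product. The sketch below is our own NumPy illustration; combined with the reconstruct helper from the shape-space sketch above, it turns an input measurement vector into a new feature mesh.

```python
import numpy as np

def train_shape_interpolator(q_examples, alpha_examples):
    """Precompute the radial coefficients gamma of Equations (5)-(8).
    q_examples: (M, m_F) normalized measurement vectors of the examples;
    alpha_examples: (M, k_F) their eigenmesh coordinates."""
    d = np.linalg.norm(q_examples[:, None, :] - q_examples[None, :, :], axis=-1)
    rho = np.where(np.eye(len(q_examples), dtype=bool), np.inf, d).min(axis=1)  # Eq. (7)
    R = np.sqrt(d**2 + rho[None, :]**2)                                         # Eq. (6) at the examples
    gamma = np.linalg.solve(R, alpha_examples)                                  # Eq. (8), LU-based solve
    return q_examples, rho, gamma

def synthesize_eigenmesh_coords(q, model):
    """Evaluate Equation (5) for a new measurement vector q of shape (m_F,)."""
    q_examples, rho, gamma = model
    R = np.sqrt(np.linalg.norm(q - q_examples, axis=1)**2 + rho**2)
    return R @ gamma                                                            # (k_F,) eigenmesh coordinates
```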

Smooth Shape Blending

After the shape interpolation procedure, the surrounding facial areas must be blended with the deformed facial features to generate a seamlessly smooth mesh. Let x_i' denote the position of a vertex x_i of feature region F after deformation, and let V denote the set of vertices of the head mesh. For smooth blending, the positions of the subset \bar{V}_F = V \setminus V_F of vertices that are not inside the feature region should be updated along with the deformation of the facial features. For each vertex x_j ∈ \bar{V}_F, the vertex of each feature region that exerts influence on it, x_{k_j}^F, is the one of minimal distance to it, that is, \|x_j - x_{k_j}^F\| = \min_{i \in V_F} \|x_j - x_i\|. Note that this distance is measured offline on the original undeformed generic mesh. For each non-feature vertex x_j, the displacement of its closest feature vertex x_{k_j}^F is used to update its position in shape blending. The displacement is weighted by an exponential fall-off function of the distance between x_j and x_{k_j}^F:

  x_j' = x_j + \sum_{F \in G} \exp\!\left(-\frac{1}{a}\|x_j - x_{k_j}^F\|\right)\left(x_{k_j}'^F - x_{k_j}^F\right)   (9)

where G is the set of features and a controls the size of the region influenced by the blending. We set a to 1/10 of the diagonal length of the bounding box of the head model.
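A possible implementation of the blending step is sketched below; the feature_regions and deformed dictionaries are our own assumed bookkeeping, not structures described in the paper.

```python
import numpy as np

def blend_surrounding(vertices, feature_regions, deformed, a):
    """Fall-off blending of Equation (9).
    vertices: (V, 3) undeformed generic-mesh vertex positions;
    feature_regions: dict mapping feature name -> index array of its vertices;
    deformed: dict mapping feature name -> (len(indices), 3) deformed positions;
    a: fall-off scale (1/10 of the head bounding-box diagonal in the paper)."""
    out = vertices.copy()
    feature_idx = np.concatenate(list(feature_regions.values()))
    non_feature = np.setdiff1d(np.arange(len(vertices)), feature_idx)
    for name, idx in feature_regions.items():
        # closest feature vertex for every non-feature vertex, measured on the undeformed mesh
        d = np.linalg.norm(vertices[non_feature, None, :] - vertices[None, idx, :], axis=-1)
        k = d.argmin(axis=1)
        dist = d[np.arange(len(non_feature)), k]
        displacement = deformed[name] - vertices[idx]             # x'_k - x_k per feature vertex
        out[non_feature] += np.exp(-dist / a)[:, None] * displacement[k]
        out[idx] = deformed[name]                                 # feature vertices take their deformed positions
    return out
```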

Region-Based Face Texture Morphing

Mesh Parameterization

To morph the textures of facial features, we form local texture spaces using PCA. In general, applying PCA to a set of face images requires normalization to remove texture variation due to shape differences, and correspondences must be found between the face images. In our case, however, correspondences between two textures are implicit in the texture coordinates of the two associated face meshes. Since every face generated from the one generic model shares the same layout of texture coordinates, we can produce shape-free face textures by constructing a parameterization of the 3D generic mesh over a 2D image plane.

Given the vertex-wise correspondence between a fitted generic mesh (base mesh) and the original undeformed generic mesh, it is trivial to transfer a texture map between them: each vertex on the original generic mesh simply takes the texture coordinates of its corresponding vertex on the base mesh for texture mapping (see Figure 4(b)). We parameterize the 3D generic head mesh over a 2D domain [0, 1]^2 in order to obtain a shape-free texture map. We project the original generic mesh, rendered with the transferred texture, onto a 2D image plane using a cylindrical projection. The resulting cylindrical coordinates map to a cylindrical texture image of suitable resolution (512 × 512 in our experiments) in which each pixel value represents the surface color of the texture-mapped face surface in cylindrical coordinates (see Figure 4(c)). The generic mesh can then be textured with this cylindrical texture image using the normalized cylindrical coordinates as texture coordinates.
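The cylindrical parameterization might be computed as in the following sketch; the choice of cylinder axis (the y axis) and the height normalization are our assumptions, since the paper does not spell out these details.

```python
import numpy as np

def cylindrical_uv(vertices, center=None):
    """Map 3D head vertices to normalized cylindrical texture coordinates in [0, 1]^2.
    Assumes the head is roughly aligned with the y (up) axis; 'center' is the cylinder
    axis position in the xz-plane (defaults to the centroid)."""
    if center is None:
        center = vertices[:, [0, 2]].mean(axis=0)
    x = vertices[:, 0] - center[0]
    z = vertices[:, 2] - center[1]
    theta = np.arctan2(x, z)                         # angle around the cylinder axis
    u = (theta + np.pi) / (2 * np.pi)                # normalize angle to [0, 1]
    y = vertices[:, 1]
    v = (y - y.min()) / (y.max() - y.min())          # normalize height to [0, 1]
    return np.stack([u, v], axis=1)
```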

Forming Local Texture Spaces

As the generic mesh has been partitioned into four feature regions, the cylindrical texture image can be divided into corresponding texture patches (see Figure 4(d)). PCA is used to parameterize the local textures in a low-dimensional space. We represent the shape-free texture of a facial feature by a texture vector I = (R_1, G_1, B_1, ..., R_n, G_n, B_n) ∈ R^{3n} that contains the R, G, B color values, where n is the number of pixels in the texture image. A texture model is then constructed using a data set of M (M ≪ n) exemplar local textures:

  I = C_0 + \sum_{j=1}^{M} \beta_j C_j = C_0 + C \beta   (10)

where C_0 is the mean texture and C = (C_1 | \ldots | C_M) is the orthogonal texture basis consisting of eigentextures. The vector \beta defines a set of parameters of a texture. From the obtained M eigentextures, we keep the top k modes, which correspond to the largest eigenvalues.

Feature Texture Synthesis

While facial feature shapes can easily be related to anthropometric measurements, texture attributes can hardly be measured quantitatively. For each facial feature, we therefore define a set of distinct texture attributes to build a texture attribute space in which each attribute represents an axis. We manually assign the attribute values (in the interval (0, 1]) that describe the markedness of each attribute to every example texture, projecting it into the texture attribute space. We then map the high-level texture control parameters onto the eigentexture coefficients through scattered data interpolation. Given a new set of input texture attribute values, the desired characteristics can be synthesized on the cylindrical full-head texture image (see Figure 4(e)) by blending the example local textures through RBF-based interpolation. The reader is referred to Reference [27] for a detailed description of our full-head texture generation technique.

Smooth Texture Blending

We perform a gradual blend with the surrounding area for region-based texture morphing. The pixels on the outermost ring of the feature region are grouped into the boundary pixel set P^0 = {p_1^0, ..., p_{n_0}^0}, where n_0 is the number of boundary pixels. We then identify N rings of pixels around the feature region to define the blending region (see Figure 4(e)). We denote by C(P^0) = {C(p_1^0), ..., C(p_{n_0}^0)} and C'(P^0) = {C'(p_1^0), ..., C'(p_{n_0}^0)} the color sets of the boundary pixels before and after texture morphing, respectively. The change of the boundary pixel colors is used to update the colors of the set P^j = {p_k^j | k = 1, ..., n_j} of pixels in the jth ring around the region, where j ranges from 1 to N and n_j is the number of pixels in the jth ring. The color updating is executed in order, starting at the first ring and expanding outwards to the Nth ring:

  C'(p_k^j) = C(p_k^j) + \frac{W_j}{s_k^j} \sum_{l=1}^{s_k^j} \left( C'(p_l^{j-1}) - C(p_l^{j-1}) \right), \quad j = 1, \ldots, N   (11)

where p_l^{j-1} are the pixels adjacent to pixel p_k^j in the (j - 1)th ring, s_k^j is the number of such adjacent pixels found for p_k^j, and W_j is a weight function that attenuates the color update. W_j is defined according to the distance between the pixel and the region boundary, measured in terms of the index of the ring in which the pixel lies:

  W_j = \frac{f(j)}{f(j-1)}, \quad j = 1, \ldots, N; \qquad f(x) = \frac{e^{-kx} - e^{-kN}}{1 - e^{-kN}}   (12)

where the function f defines an attenuation profile that dictates how the change of the texture color decreases from the feature region boundary to the surrounding Nth-ring pixels. The parameter k controls the profile of the attenuation. We use k = 0.2 and N = 10 in our experiments. As shown in Figure 4(e), the blending regions of every two features overlap. To obtain a smooth color transition, for pixels in an overlapping zone we compute a weighted average of the change of the pixel color contributed by each blending region.

Figure 4. (a) Base mesh with texture mapping. (b) Texture transferred to the original undeformed generic model. (c) Cylindrical texture image. (d) Segmented textures of four features. (e) Facial feature regions and blending regions in the mean cylindrical full-head texture image: the local feature regions are in red with their boundaries in blue; the white areas are the texture blending regions; the overlap of two blending regions is in green.
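The ring-by-ring update of Equations (11) and (12) could be organized as below. The rings and neighbors structures (pixel coordinates per ring, and adjacency into the previous ring) are assumed bookkeeping on our part.

```python
import numpy as np

def blend_rings(color_old, color_new, rings, neighbors, k=0.2):
    """Ring-by-ring color update of Equations (11)-(12).
    color_old / color_new: (H, W, 3) float images before and after feature-texture morphing
    (color_new initially differs from color_old only inside the feature region);
    rings: list of length N+1; rings[0] holds the boundary pixel coordinates of the feature
    region, rings[j] the coordinates of the jth surrounding ring;
    neighbors: neighbors[j][i] lists indices into rings[j-1] adjacent to the ith pixel of rings[j]."""
    N = len(rings) - 1
    f = lambda x: (np.exp(-k * x) - np.exp(-k * N)) / (1 - np.exp(-k * N))
    out = color_new.copy()
    for j in range(1, N + 1):
        W = f(j) / f(j - 1)                                               # Eq. (12)
        prev = rings[j - 1]
        # color change accumulated so far on the previous ring (boundary ring when j == 1)
        delta_prev = np.array([out[p] - color_old[p] for p in prev])
        for i, p in enumerate(rings[j]):
            adj = neighbors[j][i]
            if adj:
                out[p] = color_old[p] + W * delta_prev[adj].mean(axis=0)  # Eq. (11)
    return out
```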

Results

Table 1 shows the number of eigenmeshes/eigentextures and input shape/texture control parameters used for feature morphing. The shape/texture parameters provide an 82-D combined appearance control for face synthesis. The user can select the feature to work on using a GUI; the parameter values are chosen interactively within [0, 1].

Figure 5 illustrates a number of distinct facial shapes produced by region-based shape morphing. Figure 6 shows some examples of new facial appearances generated by region-based texture morphing, together with snapshots of dynamic feature texture morphing. Note that it is not necessary to begin with the average model: we can start with any face model of a specific person and edit various aspects of its shape and texture. Figure 7 illustrates face editing results on two subjects.

In order to quantify performance, we arbitrarily selected ten examples in the database for cross-validation. Each example was excluded from the example database when training the face synthesis system, and its shape and texture measurements were used as a test input to the system. We assess the reconstruction by measuring the maximum, mean, and root mean square (RMS) errors from the feature regions of the output model to those of the input model. For the feature shapes, errors are computed as the distances between corresponding vertex positions. Table 2 shows the average errors measured for the ten reconstructed models. The errors are given both as absolute measures (in mm) and as a percentage of the diameter of the output head model's bounding box. For textures, errors are measured as the differences between the colors of corresponding pixels of the input and output cylindrical full-head textures, in terms of Euclidean distance in RGB color space. The average errors in absolute measurements (the color value of each channel is in the interval [0, 255]) are also given in Table 2.

Our system is implemented on a 2.8 GHz PC with an Nvidia Quadro FX 3450 graphics board. Even though the preprocessing steps (model fitting, PCA of feature shapes and textures, computing anthropometric measurements, and LU decomposition) take considerable time, this does not impair usability: they are automatic (beyond the initial feature point identification and the assignment of texture attribute values) and are computed only once. At runtime, our scheme spends about 1 second generating a new face using the level 2 normal mesh (36,392 triangles) rendered with a 512 × 512 texture image upon receiving the input parameters.

Table 1. Number of eigenmodes and high-level control parameters used in our system

                                               Shape                      Texture
                                       Eyes  Nose  Mouth  Chin    Eyes  Nose  Mouth  Chin
Number of eigenmodes used for morphing   23    26     20    18      32    21     26    17
Number of control parameters             13    20     12     7      10     5      9     6

Figure 5. (a) Automatically generated face models produced by morphing the shapes of four facial features on the average model (outlined) according to the input anthropometric parameters. (b) Close view of the synthesized shapes of each individual facial feature.

Figure 6. (a) Some novel facial appearances generated by morphing the textures of four facial features on the average model rendered with the mean texture (outlined). (b) Region-based face texture morphing (left to right in each example). Face shape is unchanged.

Figure 7. Variation of facial features of two individual faces. Top row: region-based shape morphing. Bottom row: region-based texture morphing. The original model is the first image of each example.

Table 2. Cross-validation results of our 3D face synthesis method

                        Shape: mm (% of bounding-box diameter)                     Texture: RGB distance
               Eyes          Nose          Mouth         Chin            Eyes   Nose   Mouth   Chin
Average max.   3.85 (0.91%)  3.55 (0.84%)  6.58 (1.65%)  4.46 (1.06%)    18.3   15.9   23.5    25.7
Average mean   1.37 (0.33%)  1.62 (0.38%)  2.04 (0.49%)  2.57 (0.57%)     7.8    7.2   11.7     9.4
Average RMS    1.93 (0.46%)  2.23 (0.53%)  2.84 (0.67%)  3.62 (0.86%)    10.6   10.1   15.8    13.1

Conclusion and Future Work

We have presented a new region-based method for synthesizing realistic faces by morphing local facial features according to intuitive control parameters. The original contribution of our method lies in the following advantages:

- As the correlations between the control parameters and the face shape and texture are estimated by exploiting the real faces of individuals, our method regulates the naturalness of synthesized faces.
- Our system provides sets of comprehensive anthropometric parameters to easily control face shape characteristics by taking into account the physical structure of real faces.
- The extension of control from face shape to texture allows more diverse face appearances to be generated.
- The automatic runtime face synthesis is computationally efficient and fast.

Scanned face data provides the best available resource for face synthesis. On the other hand, it might be a limitation, since the morphing is limited in its expressive power by the variety of the faces in the training database. We would like to extend our current database to incorporate more faces of different races and to increase the diversity of age. We also plan to extend our system to morph other face regions such as the forehead, cheeks, and upper jaw. In order to fully automate the system, we would like to use extensions of the ASM to achieve more robust facial feature detection. Automatic determination of the texture attribute values is also one of the future challenges.

References

1. Young A, Hay D. Configurational information in face perception. Experimental Psychology Society, January 1986.
2. Parke FI, Waters K. Computer Facial Animation. AK Peters: Wellesley, MA, 1996.
3. DiPaola S. Extending the range of facial types. Journal of Visualization and Computer Animation 1991; 2(4): 129–131.
4. Magnenat-Thalmann N, Minh H, deAngelis M, Thalmann D. Design, transformation and animation of human faces. The Visual Computer 1989; 5: 32–39.
5. Parke FI. Parameterized models for facial animation. IEEE Computer Graphics and Applications 1982; 2(9): 61–68.
6. Patel M, Willis P. FACES: the facial animation, construction and editing system. Eurographics '91, 1991, pp. 33–45.
7. Akimoto T, Suenaga Y, Wallace RS. Automatic creation of 3D facial models. IEEE Computer Graphics and Applications 1993; 13(5): 16–22.
8. Pighin F, Hecker J, Lischinski D, Szeliski R, Salesin DH. Synthesizing realistic facial expressions from photographs. In Proceedings of SIGGRAPH '98, July 1998, pp. 75–84.
9. Guenter B, Grimm C, Wood D, Malvar H, Pighin F. Making faces. In Proceedings of SIGGRAPH '98, July 1998, pp. 55–66.
10. Lee WS, Magnenat-Thalmann N. Fast head modeling for animation. Image and Vision Computing 2000; 18(4): 355–364.
11. Liu Z, Zhang Z, Jacobs C, Cohen M. Rapid modeling of animated faces from video. Journal of Visualization and Computer Animation 2001; 12(4): 227–240.
12. Park IK, Zhang H, Vezhnevets V, Choh HK. Image-based photorealistic 3D face modeling. In Proceedings of IEEE Automatic Face and Gesture Recognition, 2004, pp. 49–54.
13. DeCarlo D, Metaxas D, Stone M. An anthropometric face model using variational techniques. In Proceedings of SIGGRAPH '98, July 1998, pp. 67–74.
14. Kahler K, Haber J, Yamauchi H, Seidel H-P. Head shop: generating animated head models with anatomical structure. In Proceedings of the ACM SIGGRAPH Symposium on Computer Animation, 2002, pp. 55–64.
15. Blanz V, Vetter T. A morphable model for the synthesis of 3D faces. In Proceedings of SIGGRAPH '99, August 1999, pp. 187–194.
16. Rose C, Cohen M, Bodenheimer B. Verbs and adverbs: multidimensional motion interpolation using RBF. IEEE Computer Graphics and Applications 1998; 18(5): 32–40.
17. Sloan P-P, Rose CF, Cohen MF. Shape by example. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics, 2001, pp. 135–143.
18. Lewis J, Cordner M, Fong N. Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation. In Proceedings of SIGGRAPH '00, July 2000, pp. 165–172.
19. Allen B, Curless B, Popovic Z. Articulated body deformation from range scan data. In Proceedings of SIGGRAPH '02, 2002, pp. 612–619.
20. Allen B, Curless B, Popovic Z. The space of human body shapes: reconstruction and parameterization from range scans. In Proceedings of SIGGRAPH '03, 2003, pp. 587–594.
21. Seo H, Magnenat-Thalmann N. Automatic modeling of human bodies from sizing parameters. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics, 2003, pp. 19–26.
22. USF DARPA HumanID 3D Face Database. Courtesy of Professor Sudeep Sarkar, University of South Florida.
23. Cootes TF, Taylor CJ, Cooper DH, Graham J. Active shape models: their training and applications. Computer Vision and Image Understanding 1995; 61(1): 38–59.
24. Carr JC, Beatson RK, Cherrie JB, Mitchell TJ, Fright WR, McCallum BC, Evans TR. Reconstruction and representation of 3D objects with radial basis functions. In Proceedings of SIGGRAPH 2001, August 2001, pp. 67–76.
25. Guskov I, Vidimce K, Sweldens W, Schroder P. Normal meshes. In Proceedings of SIGGRAPH '00, July 2000, pp. 95–102.
26. Farkas LG. Anthropometry of the Head and Face. Raven Press: New York, 1994.
27. Zhang Y. An efficient texture generation technique for human head cloning and morphing. In Proceedings of the International Conference on Computer Graphics Theory and Applications, February 2006.

Authors’ biographies:

Yu Zhang is currently a postdoctoral researcher in the Computer and Information Science Department at the University of Pennsylvania. He received his B.E. and M.E. degrees from Northwestern Polytechnical University, Xi'an, China, in 1997 and 1999, respectively, and his Ph.D. from Nanyang Technological University, Singapore, in 2004. From 2003 to 2005, he was a research fellow in the Department of Computer Science, School of Computing, National University of Singapore. In 2005, he worked as a research scientist at Genex Technologies, Inc., USA. His research interests include computer graphics, computer animation, physically-based modeling, visualization, and virtual reality. He is a member of the IEEE Computer Society and ACM SIGGRAPH.

Norman I. Badler is a professor of Computer and Information Science at the University of Pennsylvania, and has been on the faculty since 1974. His research focuses on animation via simulation, embodied agent software, and computational connections between language, instructions, and action. He directs the Center for Human Modeling and Simulation at Penn and is a co-editor of Graphical Models.
