
AN MPEG-4 FACIAL ANIMATION PARAMETERS GENERATION SYSTEM

Gunnar Hovden and Nam Ling
Computer Engineering Department
Santa Clara University
Santa Clara, USA

ABSTRACT

We present a method for generating MPEG-4 FAPs (Facial Animation Parameters) from a video sequence of a talking head. The method includes a render unit* that animates a face based on a set of FAPs. The render unit provides feedback to the FAP generation process as guidance toward an optimal set of FAPs. Our optimization process consists of minimizing a penalty function, which includes a matching function and a few barrier functions. The matching function compares how well an animated face matches the original face. Each barrier function indicates the level of distortion from a normal-looking face for a certain part of the face, and advises the optimizer. Unnecessary FAPs are eliminated and the search is partitioned to speed up the optimization process. Three different search techniques, the Steepest Descent Method, the Linear Search Method, and the Cyclic Coordinates Method, are applied to derive an optimum.

I. INTRODUCTION

The MPEG-4 FBA (Face and Body Animation) standard [1] defines a face model that can be used to animate the human face. A total of 68 FAPs (Facial Animation Parameters) are defined by the standard. Each FAP describes the movement of a certain feature point on the face (for example, the left corner of the inner part of the lip or the right corner of the right eyebrow). A stream of FAPs can animate the movements, moods, and expressions of a talking face. Very high compression can be achieved by this method, making it suitable for many consumer electronics products, including video phone and video conferencing, cell phones and PDAs with video capabilities, and Internet agents with virtual human face interfaces. In addition, PC and video games will benefit from the added realism provided by close-ups of animated human faces, and characters in cartoons can be animated with FAPs generated by a human actor for better facial expressions.

The success of facial animation relies on being able to reliably and accurately track facial features and generate FAPs. Many methods have been proposed, including the use of eigenspaces and deformable graphs [2], segmentation and geometrical characteristics [3], color snakes [4], color tracking [5], as well as edge detection and templates [6]. Known difficulties with the above methods include lack of robustness and accuracy.

The uniqueness of our method is the use of feedback from the render unit to ensure good resemblance between the animated face and the original face from the video sequence. This improves the robustness and accuracy over existing methods.

II. PROPOSED FAP GENERATION SYSTEM

Figure 1 shows an overview of the proposed FAP generation system. A video sequence contains frames of a talking person's face. The render unit [7] is based on a three-dimensional face model with texture mapping and delivers high-quality, lifelike animations of the face based on the FAPs. For the first iteration all FAPs are set to 0, which corresponds to a face with neutral expression and closed mouth looking straight ahead. The animated face is compared to the original face from the video sequence and the penalty for the animated face is computed. The penalty indicates how well the animated face matches the original face. It reflects how humans perceive colors, patterns, and similarities between pictures. A higher value means a poorer match. If penalty = 0, then the original face and the animated face are identical. The penalty is fed back to the FAP optimizer, where it is used to guide the search for a better set of FAPs. Determining the values of the best FAPs is an iterative process. The process continues until some stopping criteria are fulfilled, and the FAPs are then finalized.

* We would like to thank face2face animation inc. [7] for generously letting us use their face model and render unit.
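To make the iteration concrete, the following is a minimal sketch of this feedback loop in C, the language of our implementation; the types and the helpers render(), compute_penalty(), stopping_criteria_met(), and propose_faps() are illustrative stand-ins for the render unit, the penalty evaluation, the stopping test, and the FAP optimizer, not the actual interfaces of the system.

    #include <string.h>

    #define NUM_FAPS 66   /* low-level FAPs 3..68 */

    /* Illustrative stand-ins for the system's units (not the real interfaces). */
    typedef struct { unsigned char rgb[480 * 420 * 3]; } Frame;

    void render(const double faps[NUM_FAPS], Frame *animated);      /* render unit   */
    double compute_penalty(const Frame *original, const Frame *animated);
    void propose_faps(double faps[NUM_FAPS], double penalty);       /* FAP optimizer */
    int stopping_criteria_met(void);

    void generate_faps(const Frame *original, double faps[NUM_FAPS])
    {
        Frame animated;
        memset(faps, 0, NUM_FAPS * sizeof faps[0]);  /* iteration 1: neutral face  */
        for (;;) {
            render(faps, &animated);
            double penalty = compute_penalty(original, &animated);
            if (penalty == 0.0 || stopping_criteria_met())
                break;                               /* FAPs are then finalized    */
            propose_faps(faps, penalty);             /* feedback guides the search */
        }
    }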


Fig. 1 Overview of the FAP generation process

III. PENALTY FUNCTION

The penalty is a function of the 66 FAPs (not counting the first two FAPs, which are high-level FAPs) defined in the MPEG-4 standard. We can express the penalty as

penalty = f(fap3, fap4, ..., fap68)    (1)

where fap3 is the value for FAP #3, etc.

The complexity of the penalty function depends on the quality of the face model and the render unit. If the image from the render unit is lifelike, then a simple penalty function can be used with great success. Our render unit has very high quality, giving very lifelike results. A pixel-by-pixel penalty function will therefore suffice. Each pixel in the original image is compared with the pixel at the corresponding location in the animated image. We use the RGB color space. Colors are represented as three-tuples of real numbers (r, g, b) ranging from 0 to 1. A match function, match(r1, g1, b1, r2, g2, b2), tells how well the colors match. We have derived the following match function:

match(r1, g1, b1, r2, g2, b2) = (1 - abs(r1 - r2)) · (1 - abs(g1 - g2)) · (1 - abs(b1 - b2))    (2)

where abs(...) is the absolute value.

The above match function returns 1 if two colors and their intensities are identical; otherwise it returns a non-negative value less than 1. The match function returns a value lower than 1 if the two colors have different intensity levels. The skin on the face is mostly of the same color - different intensity levels are thus important for matching. Different colors that have the same intensities also result in a value lower than 1, so that the lips are clearly distinguished from the surrounding area even if they happen to have similar intensities.

The penalty is based on the match function and is defined as:

penalty = f(fap3, ..., fap68) = Σ(x,y) mask(x, y) · (1 - match(ori_r(x, y), ori_g(x, y), ori_b(x, y), ani_r(x, y), ani_g(x, y), ani_b(x, y)))    (3)

where

mask(x, y) = 1, if the pixel at (x, y) is part of the face
mask(x, y) = 0, if the pixel at (x, y) is not part of the face

and ori(x, y) is the pixel at position (x, y) in the original frame and ani(x, y) is the pixel at position (x, y) in the animated frame.

The mask function in (3) is used to mask out the part of the image that is not part of the face, so that the background does not affect the penalty. The mask is based on the rendered face rather than on the original face, since the area occupied by the rendered face, as opposed to the original face, can be determined with 100% accuracy. Figure 2 shows an example of a face and the corresponding mask.

Fig. 2 (a) A rendered face (b) The corresponding mask

Although (3) defines the penalty in terms of pixels, color matches, and masks, it is important to remember that this merely provides a way of evaluating the function f(fap3, ..., fap68). The penalty is a function of the 66 FAPs.
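As an illustration, equations (2) and (3) map directly to code. The sketch below assumes pixels are stored as floating-point RGB triples in [0, 1] and that the mask has already been derived from the rendered face; the array shapes and names are ours, not the paper's.

    #include <math.h>

    #define W 480
    #define H 420

    /* Equation (2): returns 1.0 for identical colors, otherwise a
     * non-negative value less than 1. */
    static double match(double r1, double g1, double b1,
                        double r2, double g2, double b2)
    {
        return (1.0 - fabs(r1 - r2)) * (1.0 - fabs(g1 - g2)) * (1.0 - fabs(b1 - b2));
    }

    /* Equation (3): sum of (1 - match) over all pixels marked as face,
     * so the background does not affect the penalty. */
    double penalty(const double ori[H][W][3], const double ani[H][W][3],
                   const unsigned char mask[H][W])
    {
        double p = 0.0;
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                if (mask[y][x])
                    p += 1.0 - match(ori[y][x][0], ori[y][x][1], ori[y][x][2],
                                     ani[y][x][0], ani[y][x][1], ani[y][x][2]);
        return p;
    }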


Searching in a 66-dimensional space is hard. Add noise and the problem of local versus global minima, and the task becomes overwhelming. Hence, we need search strategies to make the search practical.

IV. ELIMINATING UNNECESSARY FAPS

Not all of the FAPs defined in the MPEG-4 FBA standard are necessary to make a lifelike animation that is truthful to the original face. A total of 45 FAPs are used in our experiment, and they are:

• Jaw (2 FAPs)
• Outer lip (10 FAPs)
• Inner lip (10 FAPs)
• Cheek (4 FAPs)
• Eyelids (4 FAPs)
• Eyeballs (4 FAPs)
• Eyebrows (8 FAPs)
• Head pose (3 FAPs)

The MPEG-4 FBA standard assumes that the head does not move vertically, horizontally, or back and forth. It is, however, necessary to know the location of the head before the other FAPs can be found. Our search space is therefore augmented to 48 dimensions.

V. PARTITION THE SEARCH SPACE

The complexity of the optimization problem can be reduced by partitioning the 48-tuple of FAP values into smaller, independent optimization problems. It is considerably easier (and faster) to partition a search space into independent spaces with lower dimensions and search in each smaller space than to search in the original space. Dissimilarities between the original and the animated face in the upper part of the face (eyes and eyebrows) do not affect the dissimilarities in the lower part of the face (nose, lips, cheek, and chin), and vice versa. We can therefore search for an optimal solution for FAPs related to the upper face independently from the FAPs related to the lower face. Further partitioning is possible; an example is to partition the left from the right part of the face. The partitions used in our experiment are as follows:

• Location (horizontal position, vertical position, and scale) and pose (roll, pitch, and yaw) of head (6 FAPs)
• Jaw (2 FAPs)
• Inner lip (10 FAPs)
• Outer lip (10 FAPs)
• Left cheek (2 FAPs)
• Right cheek (2 FAPs)
• Left eyelid (2 FAPs)
• Right eyelid (2 FAPs)
• Left eyeball (2 FAPs)
• Right eyeball (2 FAPs)
• Left eyebrow (4 FAPs)
• Right eyebrow (4 FAPs)
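Under the assumption that each partition can indeed be optimized with the others held fixed, the partitioned search might be organized as in the following sketch; the ordering of parameters within the 48-vector and the helper optimize_partition() are illustrative, not taken from the paper.

    typedef struct {
        const char *name;
        int first;   /* index of the partition's first parameter */
        int count;   /* dimension of the partition's subspace    */
    } Partition;

    /* The twelve partitions listed above (dimensions sum to 48). */
    static const Partition partitions[] = {
        { "head location and pose",  0,  6 }, { "jaw",            6,  2 },
        { "inner lip",               8, 10 }, { "outer lip",     18, 10 },
        { "left cheek",             28,  2 }, { "right cheek",   30,  2 },
        { "left eyelid",            32,  2 }, { "right eyelid",  34,  2 },
        { "left eyeball",           36,  2 }, { "right eyeball", 38,  2 },
        { "left eyebrow",           40,  4 }, { "right eyebrow", 44,  4 },
    };

    /* Illustrative: minimizes the penalty over one low-dimensional subspace
     * while all parameters outside [first, first + count) stay fixed. */
    void optimize_partition(double params[48], int first, int count);

    void optimize_all(double params[48])
    {
        for (unsigned i = 0; i < sizeof partitions / sizeof partitions[0]; i++)
            optimize_partition(params, partitions[i].first, partitions[i].count);
    }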

VI. ANATOMICAL CONSTRAINTS

The penalty function as given by (3) has no knowledge about the appearance of a normal face. What may seem to be a good fit according to the penalty may look exaggerated, twisted, and distorted to a human. We need to guide the FAP optimizer unit to avoid exaggerated, twisted, and distorted faces.

So far we have attempted to generate FAPs by minimizing the penalty function, penalty = f(...). We now add one term, which includes knowledge about how normal and non-distorted faces look. The problem then becomes to minimize penalty = f(fap3, ..., fap68) + barrier(fap3, ..., fap68). The purpose of the function barrier(...) is to tell the FAP optimizer unit if a set of FAPs generates a distorted or unnatural face. If barrier(...) = 0, then the set of FAPs generates a normal-looking, undistorted face. A set of FAPs resulting in barrier(...) > 0 indicates that the FAPs represent an unnatural or distorted face. The higher the value of barrier(...), the more distorted the face is. The name of the function is chosen because it serves as a barrier that is difficult for optimization algorithms to cross or overcome, forcing the optimization to produce FAPs that do not generate distorted or unnatural faces.

The barrier function does not prohibit any set of FAP values. It simply advises the optimizer unit on what a natural face should look like. Even unnatural or unusual facial expressions are permitted if the minimal point in f(fap3, ..., fap68) is very dominant.

A. Shape of lips

A problem encountered while optimizing the lips is that feature point 2.9 is too close to feature point 2.7 and feature point 2.6 is too close to feature point 2.8, as depicted in Figure 3. This will occasionally happen when the mouth is almost, but not fully, closed. The shape of the upper lip can be approximated with a sine function with the argument ranging from 0 for the right, inner lip corner (feature point 2.5) to π for the left, inner lip corner (feature point 2.4), where the magnitude is given by the mid, inner lip (feature point 2.2). This sine function is drawn with a dashed line in Figure 3. We will use a barrier function to suggest to the optimizer unit that a naturally looking upper lip should resemble a sine function from 0 to π. The upper lip is certainly allowed to deviate from the shape of a sine function to ensure accuracy with


the original lip. The barrier function is simply a guide to natural looking lips. We will skip the geometry and express the barrier as

barrier1(fap3, ..., fap68) = c1 · (distance from feature point 2.7 to sine function) + c2 · (distance from feature point 2.6 to sine function)    (4)

for suitable constants c1 and c2.
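A possible realization of (4) is sketched below; the feature-point accessor, the parameterization of the sine arc, and the constants C1 and C2 are our assumptions for illustration, since the paper deliberately skips the geometry.

    #include <math.h>

    typedef struct { double x, y; } Point;

    #define C1 1.0   /* illustrative values for the suitable constants */
    #define C2 1.0

    /* Illustrative accessor: animated position of an MPEG-4 feature point. */
    Point feature_point(const double faps[48], const char *fp);

    /* Vertical distance from p to the sine arc running from the right inner
     * lip corner (argument 0) to the left inner lip corner (argument pi),
     * with magnitude given by the mid inner lip point. */
    static double dist_to_sine(Point p, Point right, Point left, Point mid)
    {
        double t = M_PI * (p.x - right.x) / (left.x - right.x);
        double base = right.y + (left.y - right.y) * (t / M_PI);
        double amp = mid.y - 0.5 * (right.y + left.y);
        return fabs(p.y - (base + amp * sin(t)));
    }

    double barrier1(const double faps[48])
    {
        Point right = feature_point(faps, "2.5"), left = feature_point(faps, "2.4");
        Point mid = feature_point(faps, "2.2");
        return C1 * dist_to_sine(feature_point(faps, "2.7"), right, left, mid)
             + C2 * dist_to_sine(feature_point(faps, "2.6"), right, left, mid);
    }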

Fig. 3 Inner lip feature points

B. Upper lip above lower lip

It is difficult to distinguish between the upper and the lower lips when the mouth is closed. Sometimes the optimizer unit will position the lower part of the upper lip below the upper part of the lower lip. We can avoid this problem by insisting that the lower part of the upper lip should always be above the upper part of the lower lip, i.e., feature point 2.7 should be above feature point 2.9, feature point 2.2 should be above feature point 2.3, and feature point 2.6 should be above feature point 2.8. We adopt the notation that 2.3.y means the y-value for feature point 2.3 in the animated face. The following barrier function provides the optimizer unit with the necessary guidance:

barrier2(fap3, ..., fap68) =
    0,                      if 2.7.y ≥ 2.9.y
    c1 · (2.9.y - 2.7.y),   if 2.7.y ...
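The recoverable part of this barrier is a one-sided penalty: zero while the required ordering holds, and growing linearly with the violation otherwise. A minimal sketch for a single feature-point pair, with the constant c illustrative:

    /* Zero while the upper-lip point stays above the lower-lip point
     * (e.g., 2.7.y >= 2.9.y); otherwise proportional to the violation. */
    static double ordering_term(double upper_y, double lower_y, double c)
    {
        return (upper_y >= lower_y) ? 0.0 : c * (lower_y - upper_y);
    }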


Method with a coarse estimate of the lips' position, by gradually opening the mouth in 30 steps. The mouth's opening and the shape of the lips are then optimized by the Steepest Descent Method.

• Cyclic Coordinates Method: The last step in the FAP generation process is two iterations of the Cyclic Coordinates Method to refine the FAPs before they are finalized. Each of the partitioned subproblems that comprise the complete penalty function, penalty = f(fap3, ..., fap68) + barrier1(fap3, ..., fap68) + barrier2(fap3, ..., fap68) + barrier3(fap3, ..., fap68), has been shown to have a condition number close to 1. This indicates that even a simple optimization method like the Cyclic Coordinates Method will perform well in terms of the number of iterations required to reach the minimum. The condition number is calculated as the ratio of the largest eigenvalue to the smallest eigenvalue of the Hessian matrix of the penalty function.
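For illustration, one pass of the Cyclic Coordinates Method over a single partition could look like the sketch below, where eval_objective() stands for the complete penalty-plus-barriers function and the fixed step size is an assumption of ours:

    /* Illustrative: evaluates penalty + barrier1 + barrier2 + barrier3. */
    double eval_objective(const double faps[48]);

    /* One cycle: walk along each coordinate in turn, in both directions,
     * as long as the objective keeps improving. */
    void cyclic_coordinates_pass(double faps[48], int first, int count, double step)
    {
        for (int i = first; i < first + count; i++) {
            double best = eval_objective(faps);
            for (int dir = -1; dir <= 1; dir += 2) {
                for (;;) {
                    faps[i] += dir * step;
                    double v = eval_objective(faps);
                    if (v < best)
                        best = v;              /* keep the improving step     */
                    else {
                        faps[i] -= dir * step; /* undo and try next direction */
                        break;
                    }
                }
            }
        }
    }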

VIII. RESULTS

All tests are performed in full color with each frame having a resolution of 480 by 420 pixels and 24 bits per pixel. The implementation of the FAP generation system described above is written in C and runs on an AMD K7 processor running at 1.533 GHz with a RADEON 7500 graphics card. On average, 2263 iterations are required to compute the FAPs for each frame. The time to process each frame is approximately 46 seconds. Faster hardware could reduce this time significantly.

Figure 4 shows the neutral animated face (all FAPs are zero). The left column of Figure 5 shows three frames from the video sequence, and the right column shows the corresponding animated faces based on the FAPs found through optimization. In particular, the pose, the gaze direction, the eyelids, the eyebrows, and the lips are very close to the original face, and the animated face looks very natural.

IX. CONCLUSION AND FUTURE WORK

This paper introduces a new method for automatic FAP generation and demonstrates that the method works in a real-world system. Unlike previous methods in the literature, the introduced method utilizes feedback from the render unit to ensure that the generated FAPs produce an animation that resembles the face in the original video sequence. The method is very robust and the resulting animations are accurate, lifelike, and truthful to the original face.

    Fig. 4 Neutral face generated by the render unit

Fig. 5 Original face (left column) and the corresponding animated face (right column) for three different frames in the video sequence


The emphasis in this paper has been on the quality of the results. Possible ways to improve the speed of the FAP generation include:

• Reducing the number of iterations required to reach the minimum
• Tweaking the source code to speed up the program
• Using faster hardware (CPU and graphics card)
• Removing memory traffic bottlenecks
• Utilizing MMX and 3DNow! technology that is already present in the CPU

X. REFERENCES

[1] ISO/IEC JTC1/SC29/WG11, "Final Draft of International Standard ISO/IEC 14496-2, Coding of Audio-visual Objects: Visual", Atlantic City, Oct. 1998.

[2] J. Ahlberg, "Facial Feature Extraction using Eigenspaces and Deformable Graphs", International Workshop on Synthetic-Natural Hybrid Coding and Three Dimensional Imaging, Sep. 1999, pp. 8-11.

[3] N. Sarris, P. Karagiannis, and M. Strintzis, "Automatic Extraction of Facial Feature Points for MPEG4 Videophone Applications", IEEE International Conference on Consumer Electronics, 2000, pp. 130-131.

[4] K. Seo, W. Kim, C. Oh, and J. Lee, "Face Detection and Facial Feature Extraction Using Color Snake", Proceedings of the 2002 IEEE International Symposium on Industrial Electronics, Volume 2, 2002, pp. 457-462.

[5] S. Ahn and H. Kim, "Automatic FDP (Facial Definition Parameters) Generation for a Video Conferencing System", International Workshop on Synthetic-Natural Hybrid Coding and Three Dimensional Imaging, Sep. 1999, pp. 16-19.

[6] J. Kim, M. Song, I. Kim, Y. Kwon, H. Kim, and S. Ahn, "Automatic FDP/FAP Generation from an Image Sequence", IEEE International Symposium on Circuits and Systems, May 2000, pp. I-40 - I-42.

[7] face2face animation inc. [Online]. Available: http://www.f2fanimation.com
