
Real Time Face Detection and Facial Expression Recognition: Development and Applications to Human Computer Interaction.

Marian Stewart Bartlett, Gwen Littlewort, Ian Fasel, Javier R. Movellan

Machine Perception Laboratory, Institute for Neural Computation
University of California, San Diego, CA 92093

Intelligent Robotics and Communication Laboratories
ATR, Kyoto, Japan.

Abstract

Computer animated agents and robots bring a social dimension to human computer interaction and force us to think in new ways about how computers could be used in daily life. Face to face communication is a real-time process operating at a time scale in the order of 40 milliseconds. The level of uncertainty at this time scale is considerable, making it necessary for humans and machines to rely on sensory-rich perceptual primitives rather than slow symbolic inference processes. In this paper we present progress on one such perceptual primitive. The system automatically detects frontal faces in the video stream and codes them with respect to 7 dimensions in real time: neutral, anger, disgust, fear, joy, sadness, surprise. The face finder employs a cascade of feature detectors trained with boosting techniques [16, 3]. The expression recognizer receives image patches located by the face detector. A Gabor representation of the patch is formed and then processed by a bank of SVM classifiers. A novel combination of Adaboost and SVM's enhances performance. The system was tested on the Cohn-Kanade dataset of posed facial expressions [7]. The generalization performance to new subjects for a 7-way forced choice was 93% correct. Most interestingly, the outputs of the classifier change smoothly as a function of time, providing a potentially valuable representation to code facial expression dynamics in a fully automatic and unobtrusive manner. The system has been deployed on a wide variety of platforms including Sony's Aibo pet robot, ATR's RoboVie, and CU animator, and is currently being evaluated for applications including automatic reading tutors and assessment of human-robot interaction.

1. Introduction

Computer animated agents and robots bring a social dimension to human computer interaction and force us to think in new ways about how computers could be used in daily life. Face to face communication is a real-time process operating at a time scale in the order of 40 milliseconds. The level of uncertainty at this time scale is considerable, making it necessary for humans to rely on sensory-rich perceptual primitives rather than slow symbolic inference processes. Thus fulfilling the idea of machines that interact face to face with us requires development of robust real-time perceptive primitives. In this paper we present some first steps towards the development of one such primitive: a system that automatically finds faces in the visual video stream and codes facial expression dynamics in real time. The system has been deployed on a wide variety of platforms including Sony's Aibo pet robot, ATR's RoboVie [6], and CU animator [10]. The performance of the system is currently being evaluated for applications including automatic reading tutors, assessment of human-robot interaction, and evaluation of psychiatric intervention.

Charles Darwin was one of the first scientists to recognize that facial expression is one of the most powerful and immediate means for human beings to communicate their emotions, intentions, and opinions to each other. In addition to providing information about affective state, facial expressions also provide information about cognitive state, such as interest, boredom, confusion, and stress, and conversational signals with information about speech emphasis and syntax. A number of groundbreaking systems have appeared in the computer vision literature for automatic facial expression recognition. See [12, 1] for reviews. Automated systems will have a tremendous impact on basic research by making facial expression measurement more accessible as a behavioral measure, and by providing data on the dynamics of facial behavior at a resolution that was previously unavailable. Computer systems with this capability have a wide range of applications in basic and applied research areas, including man-machine communication, security, law enforcement, psychiatry, education, and telecommunications.

In this paper we present results on a user-independent fully automatic system for real time recognition of basic emotional expressions from video. The system automatically detects frontal faces in the video stream and codes each frame with respect to 7 dimensions: neutral, anger, disgust, fear, joy, sadness, surprise. The system presented here differs from previous work in that it is fully automatic and operates in real-time at a high level of accuracy (93% generalization to new subjects on a 7-alternative forced choice). Another distinction is that the preprocessing does not include explicit detection and alignment of internal facial features. This provides a savings in processing time which is important for real-time applications. We present a method for further speed advantage by combining feature selection based on Adaboost with feature integration based on support vector machines.


2. Preparing training data

2.1. Dataset

The system was trained and tested on Cohn and Kanade's DFAT-504 dataset [7]. This dataset consists of 100 university students ranging in age from 18 to 30 years. 65% were female, 15% were African-American, and 3% were Asian or Latino. Videos were recorded in analog S-video using a camera located directly in front of the subject. Subjects were instructed by an experimenter to perform a series of 23 facial expressions. Subjects began and ended each display with a neutral face. Before performing each display, an experimenter described and modeled the desired display. Image sequences from neutral to target display were digitized into 640 by 480 pixel arrays with 8-bit precision for grayscale values.

For our study, we selected 313 sequences from the dataset. The only selection criterion was that a sequence be labeled as one of the 6 basic emotions. The sequences came from 90 subjects, with 1 to 6 emotions per subject. The first and last frames (neutral and peak) were used as training images and for testing generalization to new subjects, for a total of 625 examples. The trained classifiers were later applied to the entire sequence.

2.2. Locating the faces

We recently developed a real-time face-detection system based on [16], capable of detection and false positive rates equivalent to the best published results [14, 15, 13, 16]. The system scans across all possible 24x24 pixel patches in the image and classifies each as face vs. non-face. Larger faces are detected by applying the same classifier at larger scales in the image (using a scale factor of 1.2). Any detection windows with significant overlap are averaged together. The face detector was trained on 5000 faces and millions of non-face patches from about 8000 images collected from the web by Compaq Research Laboratories.
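The scanning strategy can be illustrated with a short sketch. This is a schematic illustration under our own assumptions, not the authors' code: `classify_window` stands in for the trained cascade, the stride is illustrative, and averaging of overlapping detections is left as a comment.

```python
# Schematic multiscale sliding-window scan: classify every 24x24 patch,
# then rescan with windows grown by a factor of 1.2 so larger faces are
# caught by the same classifier. Illustrative sketch only.
import numpy as np

def scan(image, classify_window, base=24, scale_step=1.2, stride=2):
    detections = []
    scale = 1.0
    while base * scale <= min(image.shape):
        size = int(round(base * scale))
        for top in range(0, image.shape[0] - size + 1, stride):
            for left in range(0, image.shape[1] - size + 1, stride):
                window = image[top:top + size, left:left + size]
                if classify_window(window):          # face vs. non-face
                    detections.append((top, left, size))
        scale *= scale_step
    return detections  # overlapping detections would be averaged afterwards

img = np.random.rand(120, 160)
print(len(scan(img, lambda w: w.mean() > 0.95)))     # toy stand-in classifier
```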

The system consists of a cascade of classifiers, each of which contains a subset of filters reminiscent of Haar basis functions, which can be computed very fast at any location and scale in constant time (see Figure 1). In a 24x24 pixel window, there are over 160,000 possible filters of this type. For each classifier in the cascade, a subset of 2 to 200 of these filters is chosen using a feature selection procedure based on Adaboost [4], as follows. First, all 5000 face patches and a random set of 10,000 non-face patches are taken from the labeled image set from Compaq. Then, using a random sample of 5% of the possible filters, a simple classifier (or "weak learner") using only one filter at a time is trained to minimize the weighted classification error on this sample for each of the filters. The single-filter classifier that gives the best performance is selected. Rather than use this result directly, we refine the selection by finding the best performing single-feature classifier from a new set of filters generated by shifting and scaling the chosen filter by two pixels in each direction, as well as composite filters made by reflecting each shifted and scaled feature horizontally about the center and superimposing it on the original. This can be thought of as a single-generation genetic algorithm, and is much faster than exhaustively searching for the best classifier among all 160,000 possible filters and their reflection-based cousins. Using the chosen classifier as the weak learner for this round of boosting, the weights over the examples are then adjusted according to its performance on each example using the Adaboost rule. This feature selection process is then repeated with the new weights, and the entire boosting procedure continues until the "strong classifier" (i.e., the combined classifier using all the weak learners for that stage) can achieve a minimum desired performance rate on a validation set. Finally, after training each strong classifier, a bootstrap round (a la [15]) is performed, in which the full system up to that point is scanned across a database of non-face images, and false alarms are collected and used as the non-faces for training the subsequent strong classifier in the sequence.

Figure 1: Integral image filters (after Viola & Jones, 2001 [16]). a. The value of the integral image at (x, y) is the sum of all the pixels above and to the left. b. The sum of the pixels within a rectangle can be computed from the four values of the integral image at its corners. c. Each feature is computed by taking the difference of the sums of the pixels in the white boxes and grey boxes. Features include those shown in (c), as in [16], plus (d) the same features superimposed on their reflection about the Y axis.
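A minimal sketch of the integral image trick behind Figure 1 follows: after one cumulative-sum pass over the image, the sum of any rectangle is four array lookups, so each Haar-like filter costs constant time regardless of its size. Function names and the example feature geometry are our own.

```python
# Integral image and a two-rectangle Haar-like feature (cf. Figure 1).
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y, :x]; a zero row/column is prepended so
    # the four-corner formula needs no boundary checks.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    # Sum over the rectangle = ii(bottom-right) + ii(top-left)
    #                        - ii(top-right)   - ii(bottom-left)
    return (ii[top + height, left + width] + ii[top, left]
            - ii[top, left + width] - ii[top + height, left])

def two_rect_feature(ii, top, left, height, width):
    # One filter in the style of Figure 1c: the difference between the
    # sums of an upper band and a lower band of the window.
    half = height // 2
    return (rect_sum(ii, top, left, half, width)
            - rect_sum(ii, top + half, left, half, width))

window = np.random.rand(24, 24)               # one 24x24 detection window
ii = integral_image(window)
print(two_rect_feature(ii, 4, 6, 8, 12))      # constant time at any scale
```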

While [16] use Adaboost in their feature selection algorithm, which requires binary classifiers, we have recently been experimenting with Gentleboost, described in [5], which uses real valued features. Figure 2 shows the first two filters chosen by the system along with the real valued output of the weak learners (or tuning curves) built on those filters. We have also developed a training procedure to eliminate the cascade of classifiers, so that after each single feature, the system can decide whether to test another feature or to make a decision. Preliminary results show potential for dramatic improvements in speed with no loss of accuracy over the current system.
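To make the contrast with binary Adaboost concrete, here is a compact sketch of Gentleboost [5] with regression stumps, under our own simplifying assumptions (a single median threshold per feature rather than the tuned thresholds the paper describes). Each weak learner outputs a real value, the weighted mean of the labels on each side of the threshold, which is the kind of real-valued response shown as tuning curves in Figure 2.

```python
# Gentleboost with regression stumps: each round adds the real-valued
# stump that minimizes weighted squared error, then re-weights examples.
import numpy as np

def gentle_stump(col, y, w, t):
    # Real-valued weak learner: weighted mean of y on each side of t.
    left = col <= t
    a = np.sum(w[left] * y[left]) / max(np.sum(w[left]), 1e-12)
    b = np.sum(w[~left] * y[~left]) / max(np.sum(w[~left]), 1e-12)
    return np.where(left, a, b)

def gentleboost(X, y, n_rounds=10):
    # X: examples x features, y in {-1, +1}.
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    F = np.zeros(n)                          # additive strong classifier
    for _ in range(n_rounds):
        best, best_err = None, np.inf
        for j in range(d):
            f = gentle_stump(X[:, j], y, w, np.median(X[:, j]))
            err = np.sum(w * (y - f) ** 2)   # weighted squared error
            if err < best_err:
                best, best_err = f, err
        F += best
        w *= np.exp(-y * best)               # Gentleboost re-weighting
        w /= w.sum()
    return F

X = np.random.rand(80, 50)
y = np.where(X[:, 0] > 0.5, 1, -1)
print(np.mean(np.sign(gentleboost(X, y)) == y))   # training accuracy
```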

Because the strong classifiers early in the sequence need very few features to achieve good performance (the first stage can reject 60% of the non-faces using only 2 features, requiring only 20 simple operations, or about 60 microprocessor instructions), the average number of features that need to be evaluated for each window is very small, making the overall system very fast. The current system achieves an excellent tradeoff in speed and accuracy. We host an online demo of the face detector, along with the facial expression recognition system of [2], on the world wide web at http://mplab.ucsd.edu.

Figure 2: The first two features (top) and their respective tuning curves (bottom). Each feature is shown over the average face. The tuning curves show the evidence for face (high) vs. non-face (low). The first tuning curve shows that a dark horizontal region over a bright horizontal region in the center of the window is evidence for a face, and for non-face otherwise. The output of the second filter is bimodal. Both a strong positive and a strong negative output are evidence for a face, while output closer to zero is evidence for non-face.

Performance on the CMU-MIT dataset (a standard, public dataset for benchmarking frontal face detection systems) is comparable to [16]. While CMU-MIT contains wide variability in the images due to illumination, occlusions, and differences in image quality, the performance was much more accurate on the dataset used for this study, because the faces were frontal, focused and well lit, with simple background [3]. All faces were detected for this dataset.

2.3. Preprocessing

The automatically located faces were rescaled to 48x48 pixels. A comparison was also made at double resolution (96x96). No further registration was performed. The typical distance between the centers of the eyes was roughly 24 pixels. The images were converted into a Gabor magnitude representation, using a bank of Gabor filters at 8 orientations and 5 spatial frequencies (4:16 pixels per cycle at 1/2 octave steps) [8].
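A rough sketch of this preprocessing step is given below: a 48x48 patch is convolved with Gabor kernels at 8 orientations and 5 wavelengths (4 to 16 pixels in half-octave steps), keeping only the complex magnitude. The kernel size and the ratio of the Gaussian envelope width to the wavelength are illustrative assumptions, not parameters reported in the paper.

```python
# Gabor magnitude representation: 8 orientations x 5 wavelengths applied
# to a 48x48 face patch gives 40 x 48 x 48 = 92,160 output values.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(wavelength, theta, size=17, sigma_ratio=0.5):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    sigma = sigma_ratio * wavelength               # envelope scales with wavelength
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    carrier = np.exp(2j * np.pi * xr / wavelength) # complex sinusoid
    return envelope * carrier

def gabor_magnitudes(patch):
    wavelengths = 4 * (2 ** (np.arange(5) / 2))    # 4..16 px, 1/2-octave steps
    thetas = np.arange(8) * np.pi / 8              # 8 orientations
    feats = [np.abs(fftconvolve(patch, gabor_kernel(w, t), mode='same'))
             for w in wavelengths for t in thetas]
    return np.stack(feats)

face = np.random.rand(48, 48)                      # stand-in for a detected face
print(gabor_magnitudes(face).shape)                # (40, 48, 48)
```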

3. Facial Expression Classification

Facial expression classification was based on support vector machines (SVM's). SVM's are well suited to this task because the high dimensionality of the Gabor representation does not affect training time for kernel classifiers. The system performed a 7-way forced choice between the following emotion categories: happiness, sadness, surprise, disgust, fear, anger, neutral. The classification was performed in two stages. First, support vector machines performed binary decision tasks: seven SVM's were trained to discriminate each emotion from everything else. The emotion category decision was then implemented by choosing the classifier with the maximum margin for the test example. Generalization to novel subjects was tested using leave-one-subject-out cross-validation. Linear, polynomial, and RBF kernels with Laplacian and Gaussian basis functions were explored. Linear and RBF kernels employing a unit-width Gaussian performed best, and are presented here.
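The two-stage decision can be sketched as follows: seven one-vs-all SVMs, with the final label taken from the classifier whose signed margin on the test example is largest. This is a minimal sketch under our assumptions (scikit-learn's SVC as the SVM implementation, random stand-in features); data loading and Gabor extraction are assumed to happen elsewhere.

```python
# Seven binary SVMs, one per emotion; classify by maximum margin.
import numpy as np
from sklearn.svm import SVC

EMOTIONS = ["happiness", "sadness", "surprise", "disgust",
            "fear", "anger", "neutral"]

def train_one_vs_all(X, y):
    # One SVM per emotion, trained to separate it from everything else.
    return [SVC(kernel="linear").fit(X, (y == k).astype(int))
            for k in range(len(EMOTIONS))]

def classify(svms, x):
    # decision_function gives the signed distance to each hyperplane.
    margins = [svm.decision_function(x.reshape(1, -1))[0] for svm in svms]
    return EMOTIONS[int(np.argmax(margins))]

# Toy usage with random stand-in features (92,160-D Gabor in the paper).
X = np.random.rand(70, 200)
y = np.repeat(np.arange(7), 10)
svms = train_one_vs_all(X, y)
print(classify(svms, X[0]))
```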

We compared recognition performance using the output of the automatic face detector to performance on images with explicit feature alignment using hand-labeled features. For the manually aligned face images, the faces were rotated so that the eyes were horizontal and then warped so that the eyes and mouth were aligned in each face. The results are given in Table 1. There was no significant difference between performance on the automatically detected faces and performance on the manually aligned faces (z=0.25, p=0.4, n=625).

SVM       Automatic   Manually aligned
Linear    84.8        85.3
RBF       87.5        87.6

Table 1: Facial expression recognition performance for manually aligned versus automatically detected faces (96x96 images).

3.1. SVM's and Adaboost

SVM performance was next compared to Adaboost for emotion classification. The features employed for the Adaboost emotion classifier were the individual Gabor filters. There were 48x48x40 = 92,160 possible features. A subset of these filters was chosen using Adaboost. On each training round, the threshold and scale parameter of each filter was optimized and the feature that provided best performance on the boosted distribution was chosen. Since Adaboost is significantly slower to train than SVM's, we did not do 'leave one subject out' cross-validation. Instead we separated the subjects randomly into ten groups of roughly equal size and did 'leave one group out' cross-validation.
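A schematic sketch of this feature selection follows, with assumptions throughout: each weak learner thresholds a single Gabor output (a small quantile grid stands in for the paper's threshold/scale optimization), each round keeps the filter with the lowest weighted error, and the example weights are updated by the standard discrete Adaboost rule.

```python
# Adaboost as Gabor feature selection: each round picks one filter.
import numpy as np

def fit_stump(col, y, w):
    # Best threshold/polarity for one feature column; y in {-1, +1}.
    best = (np.inf, 0.0, 1)
    for t in np.quantile(col, [0.1, 0.3, 0.5, 0.7, 0.9]):
        for s in (1, -1):
            pred = np.where(col > t, s, -s)
            err = w[pred != y].sum()         # weighted classification error
            if err < best[0]:
                best = (err, t, s)
    return best

def adaboost_select(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                  # boosting distribution
    selected = []
    for _ in range(n_rounds):
        stumps = [fit_stump(X[:, j], y, w) for j in range(X.shape[1])]
        j = int(np.argmin([s[0] for s in stumps]))
        err, t, s = stumps[j]
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        pred = np.where(X[:, j] > t, s, -s)
        w *= np.exp(-alpha * y * pred)       # Adaboost re-weighting rule
        w /= w.sum()
        selected.append((j, t, s, alpha))
    return selected

X = np.random.rand(100, 500)                 # stand-in Gabor outputs
y = np.where(np.random.rand(100) > 0.5, 1, -1)
print([j for j, *_ in adaboost_select(X, y, 5)])  # indices of chosen Gabors
```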

During Adaboost, training for each emotion classifier continued until the distributions for the positive and negative samples were completely separated by a gap proportional to the widths of the two distributions (see Figure 3). The total number of filters selected using this procedure was 538. Here, the system calculated the output of the Gabor filters less efficiently, as the convolutions were done in pixel space rather than Fourier space, but the use of 200 times fewer Gabor filters nevertheless resulted in a substantial speed benefit. The generalization performance, 85.0%, was comparable to linear SVM performance on the leave-group-out testing paradigm, but Adaboost was substantially faster, as shown in Table 2.

Adaboost provides an added value of choosing which features are most informative to test at each step in the cascade. Figure 4 illustrates the first 5 Gabor features chosen for each emotion. The chosen features show no preference for direction, but the highest frequencies are chosen more often. Figure 5 shows the number of chosen features at each of the 5 wavelengths used.

Figure 3: Stopping criteria for Adaboost training. a. Output of one expression classifier during Adaboost training. The response for each of the training examples is shown as a function of the number of features as the classifier grows. b. Generalization error as a function of the number of features chosen by Adaboost. Generalization error does not increase with "overtraining."

Figure 4: Gabors selected by Adaboost for each expression (anger, disgust, fear, joy, sadness, surprise). White dots indicate locations of all selected Gabors. Below each expression is a linear combination of the real part of the first 5 Adaboost features selected for that expression. Faces shown are a mean of 10 individuals.

3.2. AdaSVM's

A combination approach, in which the Gabor features chosen by Adaboost were used as a reduced representation for training SVM's (AdaSVM's), outperformed Adaboost by 3.8 percentage points, a difference that was statistically significant (z=1.99, p=0.02). AdaSVM's outperformed SVM's by an average of 2.7 percentage points, an improvement that was marginally significant (z=1.55, p=0.06).
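A minimal sketch of the AdaSVM combination: the Gabor outputs already picked by Adaboost (their column indices, passed in here as `idx`) become a much smaller input representation for an SVM. This is illustrative rather than the authors' pipeline; the toy data, the choice of scikit-learn's SVC, and the number of selected indices are all assumptions.

```python
# AdaSVM sketch: train an SVM only on the Adaboost-selected Gabor outputs.
import numpy as np
from sklearn.svm import SVC

def train_adasvm(X, y, idx):
    # idx: indices of Adaboost-selected Gabor outputs (538 of 92,160 in
    # the paper); the SVM never sees the remaining columns.
    return SVC(kernel="rbf").fit(X[:, idx], y)

X = np.random.rand(60, 1000)                  # stand-in Gabor magnitudes
y = np.repeat(np.arange(6), 10)               # toy emotion labels
idx = np.sort(np.random.choice(1000, 50, replace=False))  # pretend Adaboost picks
clf = train_adasvm(X, y, idx)
print(clf.predict(X[:2, idx]))                # apply the same reduction at test
```

At test time, only the selected Gabor filters need to be computed, which is the source of the speed benefit reported in Table 3.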

Figure 5: Wavelength distribution of features selected by Adaboost (number of chosen features at each wavelength, in pixels).

After examination of the frequency distribution of the Gabor filters selected by Adaboost, it became apparent that higher spatial frequency Gabors and higher resolution images could potentially improve performance. Indeed, doubling the resolution to 96x96 and increasing the number of Gabor wavelengths from 5 to 9, so that they spanned 2:32 pixels in 1/2 octave steps, improved performance of the nonlinear AdaSVM to 93.3% correct. As the resolution goes up, the speed benefit of AdaSVM's becomes even more apparent. At the higher resolution, the full Gabor representation increased by a factor of 7, whereas the number of Gabors selected by Adaboost only increased by a factor of 1.75.

          Leave-group-out       Leave-subject-out
          Adaboost   SVM        SVM      AdaSVM
Linear    85.0       84.8       86.2     88.8
RBF       --         86.9       88.0     90.7

Table 2: Performance of Adaboost, SVM's and AdaSVM's (48x48 images).

          SVM                Adaboost   AdaSVM
          Lin      RBF                  Lin      RBF
Time t    t        90t       0.01t      0.01t    0.0125t
Time t'   t        90t       0.16t      0.16t    0.2t
Memory    m        90m       3m         3m       3.3m

Table 3: Processing time and memory considerations. Time t' includes the extra time to calculate the outputs of the 538 Gabors in pixel space for Adaboost and AdaSVM, rather than the full FFT employed by the SVM's.

4. Real Time Emotion Mirroring

Although each individual image is separately processed and classified, the system output for a sequence of video frames changes smoothly as a function of time (see Figure 6). This provides a potentially valuable representation to code facial expression in real time.

Figure 6: The neutral output decreases and the output for the relevant emotion increases as a function of time. Two test sequences for one subject are shown.

Figure 7: Examples of the emotion mirror. The animated character mirrors the facial expression of the user.

To demonstrate the potential of this idea we developed a real time 'emotion mirror'. The emotion mirror renders a 3D character in real time that mimics the emotional expression of a person.

In the emotion mirror, the face-finder captures a face image which is sent to the emotion classifier. The outputs of the 7-emotion classifier constitute a 7-D emotion code. This code was sent to CU Animate, a set of software tools for rendering 3D computer animated characters in real time, developed at the Center for Spoken Language Research at CU Boulder. The 7-D emotion code gave a weighted combination of morph targets for each emotion. The emotion mirror was demonstrated at NIPS 2002. Figure 7 shows the prototype system at work.
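A toy sketch of how a 7-D emotion code could drive blend weights over one morph target per emotion is shown below. CU Animate's actual interface is not described in the paper, so the function name and the normalization scheme are hypothetical.

```python
# Normalize per-frame classifier outputs into morph-target blend weights.
import numpy as np

EMOTIONS = ["neutral", "anger", "disgust", "fear", "joy", "sadness", "surprise"]

def emotion_code_to_morph_weights(code):
    code = np.clip(np.asarray(code, dtype=float), 0.0, None)  # no negative weights
    total = code.sum()
    weights = code / total if total > 0 else np.zeros_like(code)
    return dict(zip(EMOTIONS, weights))      # e.g. handed to the renderer

print(emotion_code_to_morph_weights([0.1, 0, 0, 0, 0.8, 0, 0.1]))
```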

The emotion mirror is a prototype system that recognizes the emotion of the user and responds in an engaging way. The long-term goal is to incorporate this system into robotic and computer animation applications in which it is important to engage the user at an emotional level and/or have the computer recognize and adapt to the emotions of the user.

5. Deployment and Evaluation

The real time system presented here has been deployed in a variety of platforms, including Sony's Aibo robot, ATR's RoboVie [6], and CU animator [10]. The performance of the system is currently being evaluated at homes, schools, and in laboratory environments.

Automated tutoring systems may be more effective if they adapt to the emotional and cognitive state of the student, like good teachers do. We are presently integrating automatic face tracking and expression analysis in automatic animated tutoring systems (see Figure 8). This work is in collaboration with Ron Cole at U. Colorado.

Face tracking and expression recognition may also make robots more engaging. For this purpose, the real time system has been deployed in the Aibo robot and in the RoboVie robot (see Figure 9). The system also provides a method for measuring the goodness of interaction between humans and robots. We have employed automatic expression analysis to evaluate whether new features of the RoboVie robot enhanced user enjoyment (see Figure 10).

Figure 8: We are presently integrating automatic face tracking and expression analysis in automatic animated tutoring systems.

Figure 9: The real time system has been deployed in the Aibo robot (left) and the RoboVie robot (right).

Figure 10: Human response during interaction with the RoboVie robot at ATR is measured by automatic expression analysis.

6. Conclusions

Computer animated agents and robots bring a social dimension to human computer interaction and force us to think in new ways about how computers could be used in daily life. Face to face communication is a real-time process operating at a time scale in the order of 40 milliseconds. The level of uncertainty at this time scale is considerable, making it necessary for humans and machines to rely on sensory-rich perceptual primitives rather than slow symbolic inference processes. In this paper we present progress on one such perceptual primitive: real time recognition of facial expressions.

Our results suggest that user-independent, fully automatic real time coding of basic expressions is an achievable goal with present computer power, at least for applications in which frontal views can be assumed. The problem of classification into 7 basic expressions can be solved with high accuracy by a simple linear system, after the images are preprocessed by a bank of Gabor filters. These results are consistent with those reported by Padgett and Cottrell on a smaller dataset [11]. A previous system [9] employed discriminant analysis (LDA) to classify facial expressions from Gabor representations. Here we explored using SVM's for facial expression classification. While LDA is optimal when the class distributions are Gaussian, SVM's may be more effective when the class distributions are not Gaussian.

Good performance results were obtained for directly processing the output of an automatic face detector without the need for explicit detection and registration of facial features. Performance of a nonlinear SVM on the output of the automatic face finder was almost identical to performance on the same set of faces using explicit feature alignment with hand-labeled features.

Using Adaboost to perform feature selection greatly sped up the application. SVM's trained on this representation showed improved classification performance over Adaboost as well.

Acknowledgments

Support for this project was provided by ONR N00014-02-1-0616, NSF-ITR IIS-0086107, California Digital Media Innovation Program DIMI 01-10130, and the MIND Institute. Two of the authors (Movellan and Fasel) were also supported by the Intelligent Robotics and Communication Laboratory at ATR. We would also like to thank Hiroshi Ishiguro at ATR for help porting the software presented here to RoboVie. This research was supported in part by the Telecommunications Advancement Organization of Japan.

References

[1] Marian S. Bartlett. Face Image Analysis by Unsupervised Learning, volume 612 of The Kluwer International Series on Engineering and Computer Science. Kluwer Academic Publishers, Boston, 2001.

[2] M.S. Bartlett, G. Littlewort, B. Braathen, T.J. Sejnowski, and J.R. Movellan. A prototype for automatic recognition of spontaneous facial actions. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems, volume 15, Cambridge, MA, 2003. MIT Press.

[3] I. Fasel and J.R. Movellan. Comparison of neurally inspired face detection algorithms. In Proceedings of the International Conference on Artificial Neural Networks (ICANN 2002). UAM, 2002.

[4] Yoav Freund and Robert E. Schapire. Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning, pages 148-156. Morgan Kaufmann, 1996.

[5] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28(2):337-374, 2000.

[6] H. Ishiguro, T. Ono, M. Imai, T. Maeda, T. Kanda, and R. Nakatsu. Robovie: an interactive humanoid robot. 28(6):498-503, 2001.

[7] T. Kanade, J.F. Cohn, and Y. Tian. Comprehensive database for facial expression analysis. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), pages 46-53, Grenoble, France, 2000.

[8] M. Lades, J. Vorbruggen, J. Buhmann, J. Lange, W. Konen, C. von der Malsburg, and R. Wurtz. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, 42(3):300-311, 1993.

[9] M. Lyons, J. Budynek, A. Plante, and S. Akamatsu. Classifying facial attributes using a 2-D Gabor wavelet representation and discriminant analysis. In Proceedings of the 4th International Conference on Automatic Face and Gesture Recognition, pages 202-207, 2000.

[10] Jiyong Ma, Jie Yan, and Ron Cole. CU Animate: Tools for enabling conversations with animated characters. In Proceedings of ICSLP-2002, Denver, USA, 2002.

[11] C. Padgett and G. Cottrell. Representing face images for emotion classification. In M. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, volume 9, Cambridge, MA, 1997. MIT Press.

[12] M. Pantic and J.M. Rothkrantz. Automatic analysis of facial expressions: State of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1424-1445, 2000.

[13] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. IEEE Trans. on Pattern Analysis and Machine Intelligence, 20(1):23-38, 1998.

[14] H. Schneiderman and T. Kanade. Probabilistic modeling of local appearance and spatial relationships for object recognition. In Proc. IEEE Intl. Conf. on Computer Vision and Pattern Recognition, pages 45-51, 1998.

[15] Kah-Kay Sung and Tomaso Poggio. Example-based learning for view-based human face detection. IEEE Trans. Pattern Anal. Mach. Intelligence, 20(1):39-51, 1998.

[16] Paul Viola and Michael Jones. Robust real-time object detection. Technical Report CRL 2001/01, Cambridge Research Laboratory, 2001.
