
To appear in Giese, M., Curio, C., Bulthoff, H. (Eds.), Dynamic Faces: Insights from Experiments and Computation. MIT Press, 2009.

Part III

Chapter 14

Insights on spontaneous facial expressions from automatic expression measurement

Marian Bartlett¹, Gwen Littlewort¹, Esra Vural¹,³, Jake Whitehill¹, Tingfan Wu¹, Kang Lee², and Javier Movellan¹

¹ Institute for Neural Computation, University of California, San Diego
² Human Development and Applied Psychology, University of Toronto, Canada
³ Engineering and Natural Science, Sabanci University, Istanbul, Turkey

Correspondence to be addressed to: [email protected]


    Introduction

The computer vision field has advanced to the point that we are now able to begin to apply automatic facial expression recognition systems to important research questions in behavioral science. One of the major outstanding challenges has been to achieve robust performance with spontaneous expressions. Systems that performed well on highly controlled datasets often performed poorly when tested on a different dataset with different image conditions, and especially had trouble generalizing to spontaneous expressions, which tend to have much greater noise from numerous causes. A major milestone in recent years is that automatic facial expression recognition systems are now able to measure spontaneous expressions with some success. This chapter describes one such system, the Computer Expression Recognition Toolbox (CERT). CERT will soon be available to the research community (see http://mplab.ucsd.edu).

This system was employed in some of the earliest experiments in which spontaneous behavior was analyzed with automated expression recognition to extract new information about facial expression that was previously unknown (Bartlett et al., 2008). These experiments are summarized in this chapter. The experiments measured facial behavior associated with faked versus genuine expressions of pain, driver drowsiness, and perceived difficulty of a video lecture. The analysis revealed information about facial behavior during these conditions that was previously unknown, including the coupling of movements such as eye openness with brow raise during driver drowsiness. Automated classifiers were able to differentiate real from faked pain significantly better than naïve human subjects, and to detect driver drowsiness with above 98% accuracy. Another experiment showed that facial expression was able to predict perceived difficulty of a video lecture and preferred presentation speed (Whitehill et al., 2008). CERT is also being employed in a project to give feedback on facial expression production to children with autism. A prototype game, called SmileMaze, requires the player to produce a smile to enable a character to pass through doors and obtain rewards (Cockburn et al., 2008).

These are among the first generation of research studies to apply fully automated facial expression measurement to research questions in behavioral science. Tools for automatic expression measurement will bring about paradigmatic shifts in a number of fields by making facial expression more accessible as a behavioral measure. Moreover, automated facial expression analysis will enable investigations into facial expression dynamics that were previously intractable by human coding because of the time required to code intensity changes.

The Facial Action Coding System

The Facial Action Coding System (FACS) (Ekman and Friesen, 1978) is arguably the most widely used method for coding facial expressions in the behavioral sciences. The system describes facial expressions in terms of 46 component movements, which roughly correspond to the individual facial muscle movements. An example is shown in Figure 1. FACS provides an objective and comprehensive way to analyze expressions into elementary components, analogous to the decomposition of speech into phonemes. Because it is comprehensive, FACS has proven useful for discovering facial movements that are indicative of cognitive and affective states. See Ekman and Rosenberg (2005) for a review of facial expression studies using FACS. The primary limitation to the widespread use of FACS is the time required to code. FACS was developed for coding by hand, using human experts. It takes over 100 hours of training to become proficient in FACS, and it takes approximately 2 hours for human experts to code each minute of video. The authors have been developing methods for fully automating the Facial Action Coding System (e.g., Donato et al., 1999; Bartlett et al., 2006). In this chapter we apply a computer vision system trained to automatically detect facial Action Units to data-mine facial behavior and to extract facial signals associated with states including: (1) real versus faked pain, (2) driver fatigue, and (3) easy versus difficult video lectures.

[INSERT FIGURE 1 ABOUT HERE]

Spontaneous Expressions

The machine learning system presented here was trained on spontaneous facial expressions. The importance of using spontaneous behavior for developing and testing computer vision systems becomes apparent when we examine the neurological substrate for facial expression production. There are two distinct neural pathways that mediate facial expressions, each one originating in a different area of the brain. Volitional facial movements originate in the cortical motor strip, whereas spontaneous facial expressions originate in the subcortical areas of the brain (see Rinn, 1984, for a review). These two pathways have different patterns of innervation on the face, with the cortical system tending to give stronger innervation to certain muscles primarily in the lower face, while the subcortical system tends to more strongly innervate certain muscles primarily in the upper face (e.g., Morecraft et al., 2001).

The facial expressions mediated by these two pathways differ both in which facial muscles are moved and in their dynamics (Ekman, 2001; Ekman & Rosenberg, 2005). Subcortically initiated facial expressions (the spontaneous group) are characterized by synchronized, smooth, symmetrical, consistent, and reflex-like facial muscle movements, whereas cortically initiated facial expressions (posed expressions) are subject to volitional real-time control and tend to be less smooth, with more variable dynamics (Rinn, 1984; Frank, Ekman, & Friesen, 1993; Schmidt, Cohn & Tian, 2003; Cohn & Schmidt, 2004). Given the two different neural pathways for facial expression production, it is reasonable to expect to find differences between genuine and posed expressions of states such as pain or drowsiness. Moreover, it is crucial that the computer vision model for detecting states such


as genuine pain or driver drowsiness is based on machine learning of expression samples when the subject is actually experiencing the state in question.

The Computer Expression Recognition Toolbox

The Computer Expression Recognition Toolbox (CERT), developed at the University of California, San Diego, is a fully automated system that analyzes facial expressions in real time. CERT is based on 15 years of experience in automated facial expression recognition (e.g., Bartlett et al., 1996, 1999, 2006; Donato et al., 1999; Littlewort et al., 2006). This line of work originated from a collaboration between Sejnowski and Ekman, in response to an NSF planning workshop on automated facial expression understanding (Ekman et al., 1993). The present system automatically detects frontal faces in the video stream and codes each frame with respect to 40 continuous dimensions, including basic expressions of anger, disgust, fear, joy, sadness, surprise, and contempt, a continuous measure of head pose (yaw, pitch, and roll), as well as 30 facial action units (AUs) from the Facial Action Coding System (Ekman & Friesen, 1978). See Figure 2.

System overview

The technical approach to CERT is an appearance-based discriminative approach. Such approaches have proven highly robust and fast for face detection and tracking (e.g., Viola & Jones, 2004). Appearance-based discriminative approaches don't suffer from initialization and drift, which present challenges for state-of-the-art tracking algorithms, and they take advantage of the rich appearance-based information in facial expression images. This class of approaches achieves a high level of robustness through the use of very large datasets for machine learning. It is important that the training set is similar to the proposed applications in terms of noise. A detailed analysis of machine learning methods for robust detection of one facial expression, smiles, is provided in Whitehill et al. (in press).

[INSERT FIGURE 2 ABOUT HERE]

The design of CERT is as follows. Face detection and detection of internal facial features are first performed on each frame using boosting techniques in a generative framework (Fasel et al., 2005), extending work by Viola and Jones (2004). The automatically located faces then undergo a 2D alignment by computing a fast least-squares fit between the detected feature positions and a six-feature face model. The least-squares alignment allows rotation, scale, and shear. The aligned face image is then passed through a bank of Gabor filters with 8 orientations and 9 spatial frequencies (2 to 32 pixels per cycle at 1/2-octave steps). Output magnitudes are then normalized and passed to facial action classifiers.
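The Gabor feature extraction stage can be illustrated with the minimal sketch below. This is not the CERT implementation: the OpenCV-based filtering, the kernel size, the bandwidth choice, and the normalization scheme are all illustrative assumptions; only the 8-orientation by 9-frequency layout follows the description above.

```python
import cv2
import numpy as np

def gabor_magnitudes(aligned_face, n_orientations=8,
                     wavelengths=(2, 2.83, 4, 5.66, 8, 11.3, 16, 22.6, 32)):
    """Apply a bank of Gabor filters (8 orientations x 9 spatial frequencies)
    to an aligned grayscale face image and return normalized magnitudes.
    Parameter values are illustrative, not those of CERT."""
    face = aligned_face.astype(np.float32)
    features = []
    for wavelength in wavelengths:                  # pixels per cycle, ~1/2-octave steps
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            sigma = 0.5 * wavelength                # bandwidth choice (assumption)
            # Even- and odd-phase kernels give the complex Gabor response.
            even = cv2.getGaborKernel((31, 31), sigma, theta, wavelength, gamma=1.0, psi=0)
            odd = cv2.getGaborKernel((31, 31), sigma, theta, wavelength, gamma=1.0, psi=np.pi / 2)
            re = cv2.filter2D(face, cv2.CV_32F, even)
            im = cv2.filter2D(face, cv2.CV_32F, odd)
            features.append(np.sqrt(re ** 2 + im ** 2).ravel())
    x = np.concatenate(features)
    return (x - x.mean()) / (x.std() + 1e-8)        # simple output normalization
```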

Facial action detectors were then developed by training separate support vector machines to detect the presence or absence of each facial action. The training set consisted of over 10,000 images which were coded for facial actions from the Facial Action Coding System, including over 5,000 examples from spontaneous expressions.

In previous work, we conducted empirical investigations of machine learning methods applied to the related problem of classifying expressions of basic emotions. We compared image features (e.g., Donato et al., 1999), classifiers such as AdaBoost, support vector machines, and linear discriminant analysis, as well as feature selection techniques (Littlewort et al., 2006). Best results were obtained by selecting a subset of Gabor filters using AdaBoost and then training support vector machines on the outputs of the filters selected by AdaBoost. An overview of the system is shown in Figure 2a.
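A rough sketch of this AdaBoost-selection-plus-SVM strategy, written with scikit-learn, is shown below. The stump-based selection criterion, the number of selected features, and the linear-kernel SVM are assumptions for illustration, not the published CERT configuration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

def select_features_and_train_svm(X, y, n_features=200):
    """X: (n_samples, n_gabor_outputs) Gabor magnitudes; y: binary AU labels.
    AdaBoost over single-feature stumps ranks Gabor outputs; an SVM is then
    trained on the selected subset."""
    booster = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # one stump == one Gabor output
        n_estimators=n_features)
    booster.fit(X, y)
    # Rank Gabor outputs by their boosting importance and keep the top subset.
    selected = np.argsort(booster.feature_importances_)[::-1][:n_features]
    svm = SVC(kernel="linear")                          # linear SVM on the selected outputs
    svm.fit(X[:, selected], y)
    return selected, svm
```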

Benchmark performance

In this chapter, performance for expression detection is assessed using a measure from signal detection theory, area under the ROC curve (A'). The ROC curve is obtained by plotting true positives against false positives as the decision threshold for deciding "expression present" shifts from high (0 detections and 0 false positives) to low (100% detections and 100% false positives). A' ranges from 0.5 (chance) to 1 (perfect discrimination). We employ A' instead of percent correct since percent correct can change for the same system depending on the proportion of targets and nontargets in a given test set. A' can be interpreted in terms of percent correct on a 2-alternative forced choice task in which two images are presented on each trial and the system must decide which of the two is the target.
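As a concrete illustration of the A' statistic, the sketch below computes the area under the ROC from detector outputs and checks its 2-alternative-forced-choice interpretation by direct simulation. It uses scikit-learn's ROC utility on toy data and is not part of CERT.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def two_afc_accuracy(scores, labels, n_trials=100000, seed=0):
    """Empirical 2AFC accuracy: draw one target and one non-target output and
    ask how often the target scores higher (ties count as half)."""
    rng = np.random.default_rng(seed)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    a = rng.choice(pos, n_trials)
    b = rng.choice(neg, n_trials)
    return np.mean((a > b) + 0.5 * (a == b))

scores = np.random.randn(2000) + np.repeat([0.0, 1.0], 1000)   # toy detector outputs
labels = np.repeat([0, 1], 1000)
print(roc_auc_score(labels, scores))        # area under the ROC (A')
print(two_afc_accuracy(scores, labels))     # approximately the same value
```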

Performance on a benchmark dataset (Cohn-Kanade) shows state-of-the-art performance for both recognition of basic emotions and facial actions. Performance for expressions of basic emotion was .98 area under the ROC for detection (1 vs. all) across 7 expressions of basic emotion, and 93% correct for a 7-alternative forced choice. This is the highest performance reported to our knowledge on this benchmark dataset. Performance for recognizing facial actions from the Facial Action Coding System was .93 mean area under the ROC for posed facial actions. (This was mean detection performance across 20 facial action detectors.) Recognition of spontaneous facial actions was tested on the RU-FACS dataset (Bartlett et al., 2006). This dataset is an interview setting containing speech. Performance was tested for 33 subjects with four minutes of continuous video each. Mean area under the ROC for detection of 11 facial actions was 0.84.

System outputs consist of the margin of the SVM (distance to the separating hyperplane between the two classes) for each frame of video. System outputs are significantly correlated with the intensity of the facial action, as measured by FACS expert intensity codes (Bartlett et al., 2006), and also as measured by naïve observers who turned a dial while watching continuous video (Whitehill et al., in press). Thus the frame-by-frame intensities provide information on the dynamics of facial expression at temporal resolutions that were previously impractical via manual coding. There is also preliminary evidence from Jim Tanaka's laboratory of concurrent validity with EMG: CERT outputs significantly correlated with EMG measures of zygomatic and corrugator activity despite the visibility of the electrodes in the video processed by CERT.


A second-layer classifier to detect internal states

The overall CERT system gives a frame-by-frame output with N channels, consisting of N facial Action Units. This system can be applied to data-mine human behavior. By applying CERT to face video while subjects experience spontaneous expressions of a given state, we can learn new things about the facial behaviors associated with that state. Also, by passing the N-channel output to a machine learning system, we can directly train detectors for the specific state in question. See Figure 3.
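The two-stage idea can be sketched as follows: frame-level AU outputs are summarized per clip and passed to a second-stage classifier. The per-clip summary statistics and the logistic-regression second stage are illustrative assumptions only; the studies described below used different second-stage representations (e.g., SVMs over temporally filtered AU streams, or MLR over windowed sums).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def summarize_clip(au_outputs):
    """au_outputs: (n_frames, n_AUs) CERT-style channel outputs for one clip.
    Collapse each channel to a few simple statistics (an illustrative choice)."""
    return np.concatenate([au_outputs.mean(axis=0),
                           au_outputs.std(axis=0),
                           au_outputs.max(axis=0)])

def train_state_detector(clips, states):
    """clips: list of (n_frames, n_AUs) arrays; states: binary labels
    (e.g., real pain = 1, faked pain = 0). Returns a second-layer classifier."""
    X = np.vstack([summarize_clip(c) for c in clips])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, states)
    return clf
```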

[INSERT FIGURE 3 ABOUT HERE]

Real versus Faked Expressions of Pain

The ability to distinguish real pain from faked pain (malingering) is an important issue in medicine (Fishbain, 2006). Naïve human subjects are near chance for differentiating real from faked pain from observing facial expressions (e.g., Hadjistavropoulos et al., 1996). In the absence of direct training in facial expressions, clinicians are also poor at assessing pain from the face (e.g., Prkachin et al., 2001 and 2007; Grossman, 1991). However, a number of studies using the Facial Action Coding System (FACS) (Ekman & Friesen, 1978) have shown that information exists in the face for differentiating real from posed pain (e.g., Hill and Craig, 2002; Craig et al., 1991; Prkachin, 1992). In fact, if subjects receive corrective feedback, their performance improves substantially (Hill & Craig, 2004). Thus it appears that a signal is present, but that most people don't know what to look for.

This study explored the application of a system for automatically detecting facial actions to this problem. This section is based on work described in Littlewort, Bartlett & Lee (2009). The goals of this work were to (1) assess whether the automated measurements with CERT were consistent with expression measurements obtained by human experts, and (2) develop a classifier to automatically differentiate real from faked pain in a subject-independent manner from the automated measurements.

In this study, participants were videotaped under three experimental conditions: baseline, posed pain, and real pain. We employed a machine learning approach in a two-stage system. In the first stage, the video was passed through a system for detecting facial actions from the Facial Action Coding System (Bartlett et al., 2006). This data was then passed to a second machine learning stage, in which a classifier was trained to detect the difference between expressions of real pain and faked pain. Naïve human subjects were tested on the same videos to compare their ability to differentiate faked from real pain.

The ultimate goal of this work is not the detection of malingering per se, but rather to demonstrate the ability of automated systems to detect facial behavior that the untrained eye might fail to interpret, and to differentiate types of neural control of the face.


It holds out the prospect of illuminating basic questions pertaining to the behavioral fingerprint of neural control systems, and thus opens many future lines of inquiry.

Video data collection

Video data was collected of 26 human subjects during real pain, faked pain, and baseline conditions. Human subjects were university students consisting of 6 men and 20 women. The pain condition consisted of cold pressor pain induced by immersing the arm in ice water at 5° Celsius. For the baseline and faked pain conditions, the water was 20° Celsius. Subjects were instructed to immerse their forearm into the water up to the elbow, and hold it there for 60 seconds in each of the three conditions. For the faked pain condition, subjects were asked to manipulate their facial expressions so that an expert would be convinced they were in actual pain. In the baseline condition, subjects were instructed to display their expressions naturally. Participants' facial expressions were recorded using a digital video camera during each condition. Examples are shown in Figure 4. For the 26 subjects analyzed here, the order of the conditions was baseline, faked pain, and then real pain. Another 22 subjects received the counterbalanced order: baseline, real pain, then faked pain. Because repeating facial movements that were experienced minutes before differs from faking facial expressions without immediate pain experience, the two conditions were analyzed separately. The following analysis focuses on the condition in which subjects faked first.

[INSERT FIGURE 4 ABOUT HERE]

After the videos were collected, a set of 170 naïve observers were shown the videos and asked to guess whether each video contained faked or real pain. Subjects were undergraduates with no explicit training in facial expression measurement. They were primarily Introductory Psychology students at UCSD. Mean accuracy of naïve human subjects for discriminating faked from real pain in these videos was at chance, at 49.1% (standard deviation 13.7%).

Characterizing the Difference Between Real and Faked Expressions of Pain

The Computer Expression Recognition Toolbox was applied to the three one-minute videos of each subject. The following set of 20 facial actions¹ was detected for each frame: [1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 18, 20, 23, 24, 25, 26, 1+4, 1+2+4]. This produced a 20-channel output stream, consisting of one real value for each learned AU, for each frame of the video. We first assessed which AU outputs contained information about genuine pain expressions and faked pain expressions, and which showed differences between genuine and faked pain. The results were compared to studies that employed expert human coding.

¹ A version of CERT was employed that detected just these 20 AUs.

Real pain vs. baseline. We first examined which facial action detectors were elevated in real pain compared to the baseline condition. Z-scores for each subject and each AU detector were computed as Z = (x − μ)/σ, where μ and σ are the mean and standard deviation of the detector output over frames 100-1100 of the baseline condition (warm water, no faked expressions). The mean difference in Z-score between the baseline and pain conditions was computed across the 26 subjects. Table 1 shows the action detectors with the largest difference in Z-scores. We observed that the actions with the largest Z-scores for genuine pain were mouth opening and jaw drop (25 and 26), lip corner puller by zygomatic (12), nose wrinkle (9), and to a lesser extent, lip raise (10) and cheek raise (6). These facial actions have been previously associated with cold pressor pain (e.g., Prkachin, 1992; Craig & Patrick, 1985).
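A minimal sketch of this Z-score analysis is given below, assuming the CERT outputs for one subject are stored as a (n_frames, n_AUs) array per condition. The frame range and the comparison follow the description above, but the code itself is illustrative rather than the authors' analysis script.

```python
import numpy as np

def au_zscore_difference(baseline, condition, ref_frames=slice(100, 1100)):
    """baseline, condition: (n_frames, n_AUs) AU detector outputs for one subject.
    Z-scores each AU channel against the baseline reference frames and returns
    the mean Z in the comparison condition minus the mean Z in baseline."""
    mu = baseline[ref_frames].mean(axis=0)
    sigma = baseline[ref_frames].std(axis=0) + 1e-8
    z_cond = ((condition - mu) / sigma).mean(axis=0)
    z_base = ((baseline - mu) / sigma).mean(axis=0)
    return z_cond - z_base

# Averaging au_zscore_difference(...) over subjects gives Table 1 style values.
```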

Faked pain vs. baseline. The Z-score analysis was next repeated for faked pain versus baseline. We observed that in faked pain there was relatively more facial activity than in real pain. The facial action outputs with the highest Z-scores for faked pain relative to baseline were brow lower (4), distress brow (1 or 1+4), inner brow raise (1), mouth open and jaw drop (25 and 26), cheek raise (6), lip raise (10), fear brow (1+2+4), nose wrinkle (9), mouth stretch (20), and lower lid raise (7).

Real vs. faked pain. Differences between real and faked pain were examined by computing the difference of the two Z-scores. Differences were observed primarily in the outputs of action unit 4 (brow lower), as well as distress brow (1 or 1+4) and inner brow raise (1 in any combination).

[INSERT TABLE 1 ABOUT HERE]

There was considerable variation among subjects in the difference between their faked and real pain expressions. However, the most consistent finding is that 9 of the 26 subjects showed significantly more brow lowering activity (AU 4) during the faked pain condition, whereas none of the subjects showed significantly more AU 4 activity during the real pain condition. Also, 7 subjects showed more cheek raise (AU 6), and 6 subjects showed more inner brow raise (AU 1) and the fear brow combination (1+2+4). The next most common differences were to show more 12, 15, 17, and distress brow (1 alone or 1+4) during faked pain. Paired t-tests were conducted for each AU to assess whether it was a reliable indicator of genuine versus faked pain in a within-subjects design. Of the 20 actions tested, the difference was statistically significant for three actions. It was highly significant for AU 4 (p


The automated system findings showed similar patterns to previous studies of real and faked pain.

Real pain. In previous studies using manual FACS coding by human experts, at least 12 facial actions showed significant relationships with pain across multiple studies and pain modalities. Of these, the ones specifically associated with cold pressor pain were 4, 6, 7, 9, 10, 12, 25, 26 (Craig & Patrick, 1985; Prkachin, 1992). Agreement of the automated system with the human coding studies was computed as follows: first, a superset of the AUs tested in the two cold pressor pain studies was created. AUs that were significantly elevated in either study were assigned a 1, and otherwise a 0. This vector was then correlated against the findings for the 20 AUs measured by the automated system. AUs with the highest Z-scores, shown in Table 1A, were assigned a 1, and the others a 0. Only AUs that were measured both by the automated system and by at least one of the human coding studies were included in the correlation analysis. Agreement computed in this manner was 90% for AUs associated with real pain.
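The agreement computation described above can be sketched as follows. The exact AU sets used here are illustrative placeholders, and reporting both a percent-match figure and a correlation is an assumption about the statistic (the text describes a correlation over the commonly measured AUs), so treat this as a sketch rather than the authors' procedure.

```python
import numpy as np

# Illustrative inputs (not the exact sets from the cited studies):
cert_aus = [1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 18, 20, 23, 24, 25, 26]
human_measured = [4, 6, 7, 9, 10, 12, 20, 25, 26]        # AUs tested by the human studies (assumed)
human_significant = {4, 6, 7, 9, 10, 12, 25, 26}          # elevated in either study (from the text)
auto_top = {25, 12, 9, 26, 10, 6}                         # highest automated Z-scores (Table 1A)

common = [au for au in cert_aus if au in human_measured]  # AUs measured by both approaches
h = np.array([int(au in human_significant) for au in common])
a = np.array([int(au in auto_top) for au in common])

print(100.0 * np.mean(h == a))       # percent agreement over the common AUs
print(np.corrcoef(h, a)[0, 1])       # correlation of the two binary vectors
```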

Faked pain. A study of faked pain in adults showed elevation of the following AUs: 4, 6, 7, 10, 12, 25 (Craig, Hyde & Patrick, 1991). A study of faked pain in children ages 8-12 (LaRochette et al., 2006) observed significant elevation in the following AUs for faked pain relative to baseline: 1, 4, 6, 7, 10, 12, 20, 23, 25, 26. These findings again match the AUs with the highest Z-scores in the automated system output of the present study, as shown in Table 1B. (The two human coding studies did not measure AU 9 or the brow combinations.) Agreement of the computer vision findings with these two studies was 85%.

Real vs. faked pain. Exaggerated activity of the brow lower (AU 4) during faked pain is consistent with previous studies in which the real pain condition was exacerbated lower back pain (Craig et al., 1991; Hill & Craig, 2002). Only one other study looked at real versus faked pain in which the real pain condition was cold pressor pain. This was a study with children ages 8-12 (LaRochette et al., 2006). When faked pain expressions were compared with real cold pressor pain in children, LaRochette et al. found significant differences in AUs 1, 4, 7, 10. Again the findings of the present study using the automated system are similar, as the AU channels with the highest Z-scores were 1, 4, and 1+4 (Table 1C), and the t-tests were significant for 4, 1+4, and 7. Agreement of the automated system with the human coding findings was 90%.

Human FACS coding of video in this study. In order to further assess the validity of the automated system findings, we obtained FACS codes for a portion of the video data employed in this study. FACS codes were obtained by an expert coder certified in the Facial Action Coding System. For each subject, the last 500 frames of the faked pain and real pain conditions were FACS coded (about 15 seconds each). It took 60 man-hours to collect the human codes, over the course of more than 3 months, since human coders can only code up to 2 hours per day before there are negative repercussions for accuracy and coder burn-out. The sum of the frames containing each action unit was collected for each subject and condition, as well as a weighted sum, multiplied by the intensity of the action on a 1-5 scale. To investigate whether any action units successfully differentiated real from faked pain, paired t-tests were computed on each individual action unit. (Tests on specific brow region combinations 1+2+4 and 1, 1+4 have not yet been conducted.) The one action unit that significantly differentiated the two conditions was AU 4, brow lower (p


25 subjects for parameter estimation and reserving a different subject for testing. This provided an estimate of subject-independent detection performance. The percent correct 2-alternative forced choice of faked versus real pain on new subjects was 88 percent. This was significantly higher than the performance of naïve human subjects, who obtained a mean accuracy of 49% correct for discriminating faked from real pain on the same set of videos. Performance using the integrated event representation was also considerably higher than an alternative representation that did not take temporal dynamics into account (see Littlewort et al., in press, for details). This integrated event representation contains useful dynamic information allowing more accurate behavioral analysis. This suggests that this decision task depends not only on which subset of AUs is present at which intensity, but also on the duration and number of AU events.

Discussion of pain study

The field of automatic facial expression analysis has advanced to the point that we can begin to apply it to address research questions in behavioral science. Here we describe a pioneering effort to apply fully automated facial action coding to the problem of differentiating faked from real expressions of pain. While naïve human subjects were only at 49% accuracy for distinguishing faked from real pain, the automated system obtained .88 area under the ROC, which is equivalent to 88% correct on a 2-alternative forced choice. Moreover, the pattern of results in terms of which facial actions may be involved in real pain, faked pain, and differentiating real from faked pain is similar to previous findings in the psychology literature using manual FACS coding.

Here we applied machine learning on a 20-channel output stream of facial action detectors. The machine learning was applied to samples of spontaneous expressions during the subject state in question. Here the state in question was faked versus real pain. The same approach can be applied to learn about other subject states, given a set of spontaneous expression samples. The next section develops another example, in which this approach is applied to the detection of driver drowsiness from facial expression.

Automatic Detection of Driver Fatigue²

² Based on Vural, E., Cetin, M., Ercil, A., Littlewort, G., Bartlett, M., and Movellan, J. (2007). Drowsy driver detection through facial movement analysis. ICCV Workshop on Human Computer Interaction. 2007 IEEE.


The US National Highway Traffic Safety Administration estimates that in the US alone approximately 100,000 crashes each year are caused primarily by driver drowsiness or fatigue (Department of Transportation, 2001). In fact, the NHTSA has concluded that drowsy driving is just as dangerous as drunk driving. Thus methods to automatically detect drowsiness may help save many lives.

There are a number of techniques for analyzing driver drowsiness. One set of techniques places sensors on standard vehicle components, e.g., the steering wheel and gas pedal, and analyzes the signals sent by these sensors to detect drowsiness (Takei & Furukawa, 2005). It is important for such techniques to be adapted to the driver, since Abut and his colleagues note that there are noticeable differences among drivers in the way they use the gas pedal (Igarashi et al., 2005). A second set of techniques focuses on measurement of physiological signals such as heart rate, pulse rate, and electroencephalography (EEG) (e.g., Cobb, 1983). Researchers have reported that as the alertness level decreases, EEG power in the alpha and theta bands increases (Hung & Chung, 2005), thereby providing indicators of drowsiness. However, this method has drawbacks in terms of practicality since it requires a person to wear an EEG cap while driving.

A third set of solutions focuses on computer vision systems that can detect and recognize the facial motion and appearance changes occurring during drowsiness (Gu & Ji, 2004; Gu, Zhang & Ji, 2005; Zhang & Zhang, 2006). The advantage of computer vision techniques is that they are non-invasive, and thus are more amenable to use by the general public. Most of the previous research on computer vision approaches to detection of fatigue makes pre-assumptions about the relevant behavior, focusing on blink rate, eye closure, and yawning. Here we employ machine learning methods to data-mine actual human behavior during drowsiness episodes. The objective of this study was to discover what facial configurations are predictors of fatigue. In this study, facial motion was analyzed automatically from video using the Computer Expression Recognition Toolbox (CERT) to automatically code facial actions from the Facial Action Coding System.

Driving task

Subjects played a driving video game on a Windows machine using a steering wheel³ and an open source multi-platform video game⁴ (see Figure 6). Subjects were instructed to keep the car as close to the center of the road as possible as they drove a simulation of a winding road. At random times, a wind effect was applied that dragged the car to the right or left, forcing the subject to correct the position of the car. This type of manipulation had been found in the past to increase fatigue (Orden, Jung & Makeig, 2000). Driving speed was held constant. Four subjects performed the driving task over a three hour period beginning at midnight. During this time subjects fell asleep multiple times, thus crashing their vehicles. Episodes in which the car left the road (crash) were recorded. Video of the subject's face was recorded using a DV camera for the entire 3 hour session.

³ Thrustmaster® Ferrari Racing Wheel
⁴ The Open Racing Car Simulator (TORCS)

In addition to measuring facial expressions with CERT, head movement was measured using an accelerometer placed on a headband, as well as steering wheel movement data. The accelerometer had 3 degrees of freedom, consisting of three one-dimensional accelerometers mounted at right angles, measuring accelerations in the range of -5g to +5g, where g represents earth gravitational force.

[INSERT FIGURE 6 ABOUT HERE]

Facial Actions Associated with Driver Fatigue

Subject data was partitioned into drowsy (non-alert) and alert states as follows. The one minute preceding a sleep episode or a crash was identified as a non-alert state. There was a mean of 24 non-alert episodes per subject, with each subject contributing between 9 and 35 non-alert samples. Fourteen alert segments for each subject were collected from the first 20 minutes of the driving task.

[INSERT FIGURE 7 ABOUT HERE]

The output of the facial action detector (CERT) consisted of a continuous value for each facial action and each video frame, which was the distance to the separating hyperplane, i.e., the margin. Histograms for two of the action units in alert and non-alert states are shown in Figure 7. The area under the ROC (A') was computed for the outputs of each facial action detector to see to what degree the alert and non-alert output distributions were separated.

In order to understand how each action unit is associated with drowsiness across different subjects, Multinomial Logistic Ridge Regression (MLR) was trained on each facial action individually. Examination of the A' for each action unit reveals the degree to which each facial movement was able to predict drowsiness in this study. The A' values for the drowsy and alert states are shown in Table 2. The five facial actions that were the most predictive of drowsiness by increasing in drowsy states were 45 (blink/eye closure), 2 (outer brow raise), 15 (frown), 17 (chin raise), and 9 (nose wrinkle). The five actions that were the most predictive of drowsiness by decreasing in drowsy states were 12 (smile), 7 (lid tighten), 39 (nostril compress), 4 (brow lower), and 26 (jaw drop). The high predictive ability of the blink/eye closure measure was expected. However, the predictability of the outer brow raise (AU 2) was previously unknown.

We observed during this study that many subjects raised their eyebrows in an attempt to keep their eyes open, and the strong association of the AU 2 detector is consistent with that observation. Also of note is that action 26, jaw drop, which occurs during yawning, actually occurred less often in the critical 60 seconds prior to a crash. This is consistent with the prediction that yawning does not tend to occur in the final moments before falling asleep.

[INSERT TABLE 2 ABOUT HERE]

Automatic Detection of Driver Fatigue

The ability to predict drowsiness in novel subjects from the facial action code was then tested by running MLR on the full set of facial action outputs. Prediction performance was tested using a leave-one-out cross-validation procedure, in which one subject's data was withheld from the MLR training and retained for testing, and the test was repeated for each subject. The data for each subject and facial action was first normalized to zero mean and unit standard deviation. The MLR output for each AU feature was summed over a temporal window of 12 seconds (360 frames) before computing A'.
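A minimal sketch of a leave-one-subject-out evaluation of this kind is given below, using an L2-regularized (ridge) logistic regression as a stand-in for the MLR described above; the data layout, the frame rate, and the windowing detail are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def loso_drowsiness_auc(au_frames, labels, subject_ids, window=360):
    """au_frames: (n_frames, n_AUs) CERT-style outputs; labels: 0/1 per frame
    (alert/drowsy); subject_ids: subject index per frame. Returns the mean A'
    over left-out subjects, with classifier outputs summed over `window` frames."""
    X = au_frames.astype(float).copy()
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        X[rows] = (X[rows] - X[rows].mean(0)) / (X[rows].std(0) + 1e-8)  # per-subject z-scoring
    aucs = []
    for s in np.unique(subject_ids):
        train, test = subject_ids != s, subject_ids == s
        clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)     # ridge-penalized stand-in for MLR
        clf.fit(X[train], labels[train])
        scores = clf.decision_function(X[test])
        windowed = np.convolve(scores, np.ones(window), mode="same")     # sum over a 12 s window
        aucs.append(roc_auc_score(labels[test], windowed))
    return float(np.mean(aucs))
```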

MLR trained on all AU features obtained an A' of .90 for predicting drowsiness in novel subjects. Because prediction accuracy may be enhanced by feature selection, in which only the AUs with the most information for discriminating drowsiness are included in the regression, a second MLR was trained by contingent feature selection, starting with the most discriminative feature (AU 45), and then iteratively adding the next most discriminative feature given the features already selected. These features are shown in Table 3. Best performance of .98 was obtained with five features: 45, 2, 19 (tongue show), 26 (jaw drop), and 15. This five-feature model outperformed the MLR trained on all features.

Effect of temporal window length. The performances shown in Table 3 employed a temporal window of 12 seconds, meaning that the MLR output was summed over 12 seconds (360 frames) for making the classification decision (drowsy / not drowsy). We next examined the effect of the size of the temporal window on performance. Here, the MLR output was summed over windows of N seconds, where N ranged from 0.5 to 60 seconds. The five-feature model was again employed for this analysis. Figure 8 shows the area under the ROC for drowsiness detection in novel subjects over a range of temporal window sizes. Performance saturates at about 0.99 as the window size exceeds 30 seconds. In other words, given a 30 second video segment the system can discriminate sleepy versus non-sleepy segments with 0.99 accuracy across subjects.

[INSERT TABLE 3 ABOUT HERE]

[INSERT FIGURE 8 ABOUT HERE]

Coupling of behaviors


Coupling of steering and head motion. Observation of the subjects during drowsy and non-drowsy states indicated that the subjects' head motion differed substantially when alert versus when the driver was about to fall asleep. Surprisingly, head motion increased as the driver became drowsy, with large roll motion coupled with the steering motion as the driver became drowsy. Just before falling asleep, the head would become still.

We also investigated the coupling of the head and arm motions. Correlations between head motion, as measured by the roll dimension of the accelerometer output, and the steering wheel motion are shown in Figure 9. For this subject (subject 2), the correlation between head motion and steering increased from 0.33 in the alert state to 0.71 in the non-alert state. For subject 1, the correlation between head motion and steering similarly increased from 0.24 in the alert state to 0.43 in the non-alert state. The other two subjects showed a smaller coupling effect. Future work includes combining the head motion measures and steering correlations with the facial movement measures in the predictive model for detecting driver drowsiness.
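The head-steering coupling measure amounts to a correlation between two time series within alert and non-alert segments. A minimal sketch, assuming the accelerometer roll and steering position have already been resampled onto a common time base:

```python
import numpy as np

def head_steering_coupling(head_roll, steering, segments):
    """head_roll, steering: 1-D arrays on a common time base.
    segments: list of (start, end) index pairs (e.g., alert or pre-crash minutes).
    Returns the mean Pearson correlation across the segments."""
    rs = []
    for start, end in segments:
        h, s = head_roll[start:end], steering[start:end]
        rs.append(np.corrcoef(h, s)[0, 1])
    return float(np.mean(rs))

# coupling_alert = head_steering_coupling(roll, wheel, alert_segments)
# coupling_drowsy = head_steering_coupling(roll, wheel, drowsy_segments)
```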

Coupling of eye openness and eyebrow raise. We observed that for some of the subjects, coupling between eyebrow raise and eye openness increased in the drowsy state. In other words, subjects tried to open their eyes using their eyebrows in an attempt to keep awake. See Figure 10.

[INSERT FIGURE 9 ABOUT HERE]

[INSERT FIGURE 10 ABOUT HERE]

Conclusions of Driver Fatigue Study

This section presented a system for automatic detection of driver drowsiness from video. Previous approaches focused on assumptions about behaviors that might be predictive of drowsiness. Here, a system for automatically measuring facial expressions was employed to data-mine spontaneous behavior during real drowsiness episodes. This is the first work to our knowledge to reveal significant associations between facial expression and fatigue beyond eye blinks. The project also revealed a potential association between head roll and driver drowsiness, and the coupling of head roll with steering motion during drowsiness. Of note is that a behavior that is often assumed to be predictive of drowsiness, yawning, was in fact a negative predictor of the 60-second window prior to a crash. It appears that in the moments before falling asleep, drivers yawn less, not more, often. This highlights the importance of using examples of fatigue and drowsiness conditions in which subjects actually fall asleep.

Automated Feedback for Intelligent Tutoring Systems


Whitehill, Bartlett and Movellan (2008) investigated the utility of integrating automatic facial expression recognition into an automated teaching system. We briefly describe that study here. There has been a growing thrust to develop tutoring systems and agents that respond to the student's emotional and cognitive state and interact with them in a social manner (e.g., Kapoor et al., 2007; D'Mello et al., 2007). Whitehill's work used expression to estimate the student's preferred viewing speed of the videos, and the level of difficulty, as perceived by the individual student, of the lecture at each moment of time. This study took first steps towards developing methods for closed-loop teaching policies, i.e., systems that have access to real-time estimates of the cognitive and emotional states of the students and act accordingly. The goal of this study was to assess whether automated facial expression measurements using CERT, collected in real time, could predict factors such as the

perceived difficulty of a video lecture or the preferred viewing speed.

In this study, 8 subjects separately watched a video lecture composed of several short clips on mathematics, physics, psychology, and other topics. The playback speed of the video was controllable by the student using the keyboard: the student could speed up, slow down, or rewind the video by pressing a different key. The subjects were instructed to watch the video as quickly as possible (so as to be efficient with their time) while still retaining accurate knowledge of the video's content, since they would be quizzed afterwards.

While watching the lecture, the student's facial expressions were measured in real time by the CERT system (Bartlett et al., 2006). CERT is fully automatic and requires no manual initialization or calibration, which enables real-time applications. The version of CERT used in the study contained detectors for 12 different facial Action Units as well as a smile detector trained to recognize social smiles. Each detector output a real-valued estimate of the expression's intensity at each moment in time. After watching the video and taking the quiz, each subject then watched the lecture video again at a fixed speed of 1.0 (real time). During this second viewing, subjects specified how easy or difficult they found the lecture to be at each moment in time using the keyboard.

In total, the experiment resulted in three data series for each student: (1) Expression, consisting of 13 facial expression channels (12 different facial actions, plus smile); (2) Speed, consisting of the video speed at each moment in time as controlled by the user during the first viewing of the lecture; and (3) Difficulty, as reported by the student him/herself during the second viewing of the video.

For each subject, the three data series were time-aligned and smoothed. Then, a regression analysis was performed using the Expression data (both the 13 intensities themselves, as well as their first temporal derivatives) as independent variables to predict both Difficulty and Speed using standard linear regression. An example of such predictions is shown in Figure 11c for one subject.
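A minimal sketch of this per-subject regression, assuming the three series have already been aligned to a common sampling rate. The smoothing width and the use of scikit-learn's LinearRegression are illustrative choices, and the correlation reported here is in-sample, whereas the study evaluated on a held-out validation segment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def predict_from_expression(expression, target, smooth=15):
    """expression: (n_samples, 13) AU + smile intensities; target: the Difficulty
    or Speed series. Uses the smoothed intensities and their first temporal
    derivatives as regressors."""
    kernel = np.ones(smooth) / smooth
    smoothed = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, expression)
    features = np.hstack([smoothed, np.gradient(smoothed, axis=0)])  # intensities + derivatives
    model = LinearRegression().fit(features, target)
    predicted = model.predict(features)
    return model, np.corrcoef(predicted, target)[0, 1]               # correlation with self-report
```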

[INSERT FIGURE 11 ABOUT HERE]


The results of the correlation study suggested that facial expression, as estimated by a fully automatic facial expression recognition system, is significantly predictive of both preferred viewing Speed and perceived Difficulty. Across the 8 subjects, and evaluated on a separate validation set taken from the three data series that was not used for training, the average correlations with Difficulty and Speed were 0.42 and 0.29, respectively. The specific facial expressions which were correlated varied highly from subject to subject. No individual facial expression was consistently correlated across all subjects, but the most consistently correlated expression (taking parity into account) was AU 45 ("blink"). The blink action was negatively correlated with perceived Difficulty, meaning that subjects blinked less during the more difficult sections of video. This is consistent with previous work associating decreases in blink rate with increases in cognitive load (Holland & Tarlow, 1972; Tada, 1986).

Overall, this pilot study provided proof of principle that fully automated facial expression recognition at the present state of the art can be used to provide real-time feedback in automated tutoring systems. The validation correlations were small, but statistically significant (Wilcoxon signed rank test, p


    References

Bartlett, M., Littlewort, G., Vural, E., Lee, K., Cetin, M., Ercil, A., and Movellan, J. (2008). Data mining spontaneous facial behavior with automatic expression coding. Lecture Notes in Computer Science 5042, p. 1-21.

Bartlett, M.S., Littlewort, G.C., Frank, M.G., Lainscsek, C., Fasel, I., and Movellan, J.R. (2006). Automatic recognition of facial actions in spontaneous expressions. Journal of Multimedia, 1(6), p. 22-35.

Bartlett, M.S., Hager, J.C., Ekman, P., and Sejnowski, T.J. (1999). Measuring facial expressions by computer image analysis. Psychophysiology, 36, 253-263.

Bartlett, M. Stewart, Viola, P.A., Sejnowski, T.J., Golomb, B.A., Larsen, J., Hager, J.C., and Ekman, P. (1996). Classifying facial action. Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA, p. 823-829.

Cockburn, J., Bartlett, M., Tanaka, J., Movellan, J., Pierce, M., and Schultz, R. (2008). SmileMaze: A tutoring system in real-time facial expression perception and production for children with autism spectrum disorder. Int'l Conference on Automatic Face and Gesture Recognition, Workshop on Facial and Bodily Expressions for Control and Adaptation of Games.

Cobb, W. (1983). Recommendations for the practice of clinical neurophysiology. Elsevier.

Cohn, J.F. & Schmidt, K.L. (2004). The timing of facial motion in posed and spontaneous smiles. J. Wavelets, Multi-resolution & Information Processing, Vol. 2, No. 2, pp. 121-132.

Craig, K.D., Hyde, S., Patrick, C.J. (1991). Genuine, suppressed, and faked facial behaviour during exacerbation of chronic low back pain. Pain 46:161-172.

Craig, K.D., Patrick, C.J. (1985). Facial expression during induced pain. J Pers Soc Psychol. 48(4):1080-91.

D'Mello, S., Picard, R., & Graesser, A. (2007). Towards an affect-sensitive AutoTutor. IEEE Intelligent Systems, Special Issue on Intelligent Educational Systems, 22(4), 53-61.

DOT (2001). Saving lives through advanced vehicle safety technology. USA Department of Transportation. http://www.its.dot.gov/ivi/docs/AR2001.pdf.


Donato, G., Bartlett, M.S., Hager, J.C., Ekman, P. & Sejnowski, T.J. (1999). Classifying facial actions. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 21, No. 10, pp. 974-989.

Ekman, P. (2001). Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage. W.W. Norton, New York, USA.

Ekman, P., Levenson, R.W. and Friesen, W.V. (1983). Autonomic nervous system activity distinguishes between emotions. Science, 221, 1208-1210.

Ekman, P. and Friesen, W. (1978). Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto, CA.

Ekman, P. & Rosenberg, E.L. (Eds.) (2005). What the face reveals: Basic and applied studies of spontaneous expression using the FACS. Oxford University Press, Oxford, UK.

Fasel, I., Fortenberry, B., Movellan, J.R. (2005). A generative framework for real-time object detection and classification. Computer Vision and Image Understanding 98.

Fishbain, D.A., Cutler, R., Rosomoff, H.L., Rosomoff, R.S. (2006). Accuracy of deception judgments. Pers Soc Psychol Rev. 10(3):214-34.

Frank, M.G., Ekman, P., Friesen, W.V. (1993). Behavioral markers and recognizability of the smile of enjoyment. J Pers Soc Psychol. 64(1):83-93.

Grossman, S., Shielder, V., Swedeen, K., Mucenski, J. (1991). Correlation of patient and caregiver ratings of cancer pain. Journal of Pain and Symptom Management 6(2), p. 53-57.

Gu, H., Ji, Q. (2004). An automated face reader for fatigue detection. In: FGR (2004), p. 111-116.

Gu, H., Zhang, Y., Ji, Q. (2005). Task oriented facial behavior recognition with selective sensing. Comput. Vis. Image Underst. 100(3), p. 385-415.

Hadjistavropoulos, H.D., Craig, K.D., Hadjistavropoulos, T., Poole, G.D. (1996). Subjective judgments of deception in pain expression: accuracy and errors. Pain. 65(2-3):251-8.

Hill, M.L., & Craig, K.D. (2004). Detecting deception in facial expressions of pain: Accuracy and training. Clinical Journal of Pain, 20, 415-422.


Hill, M.L., Craig, K.D. (2002). Detecting deception in pain expressions: the structure of genuine and deceptive facial displays. Pain. 98(1-2):135-44.

Holland, M.K. and Tarlow, G. (1972). Blinking and mental load. Psychological Reports, 31(1).

Hong, Chung, K. (2005). Electroencephalographic study of drowsiness in simulated driving with sleep deprivation. International Journal of Industrial Ergonomics, 35(4), pages 307-320.

Igarashi, K., Takeda, K., Itakura, F., Abut, H. (2005). DSP for In-Vehicle and Mobile Systems. Springer US.

Kapoor, A., Burleson, W., & Picard, R. (2007). Automatic prediction of frustration. International Journal of Human-Computer Studies, 65(8), 724-736.

Larochette, A.C., Chambers, C.T., Craig, K.D. (2006). Genuine, suppressed and faked facial expressions of pain in children. Pain. 2006 Dec 15;126(1-3):64-71.

Littlewort, Bartlett, & Lee (in press). Automatic coding of facial expressions displayed during posed and genuine pain. Image and Vision Computing, 2009.

Littlewort, G., Bartlett, M.S., Fasel, I., Susskind, J. & Movellan, J. (2006). Dynamics of facial expression extracted automatically from video. J. Image & Vision Computing, Vol. 24, No. 6, pp. 615-625.

Morecraft, R.J., Louie, J.L., Herrick, J.L., Stilwell-Morecraft, K.S. (2001). Cortical innervation of the facial nucleus in the non-human primate: a new interpretation of the effects of stroke and related subtotal brain trauma on the muscles of facial expression. Brain 124(Pt 1):176-208.

Orden, K.F.V., Jung, T.P., Makeig, S. (2000). Combined eye activity measures accurately estimate changes in sustained visual task performance. Biological Psychology 52(3):221-40.

Prkachin, K.M., Solomon, P.A. & Ross, A.J. (2007). The underestimation of pain among health-care providers. Canadian Journal of Nursing Research, 39, 88-106.

Prkachin, K.M., Schultz, I., Berkowitz, J., Hughes, E., Hunt, D. Assessing pain behaviour of low-back pain patients in real time: concurrent validity and examiner sensitivity. Behav Res Ther. 40(5):595-607.


Prkachin, K.M. (1992). The consistency of facial expressions of pain: a comparison across modalities. Pain. 51(3):297-306.

Rinn, W.E. (1984). The neuropsychology of facial expression: a review of the neurological and psychological mechanisms for producing facial expression. Psychol Bull 95:52-77.

Schmidt, K.L., Cohn, J.F., Tian, Y. (2003). Signal characteristics of spontaneous facial expressions: automatic movement in solitary and social smiles. Biol Psychol. 65(1):49-66.

Tada, H. (1986). Eyeblink rates as a function of the interest value of video stimuli. Tohoku Psychologica Folia, 45.

Takei, Y., Furukawa, Y. (2005). Estimate of driver's fatigue through steering motion. In: Man and Cybernetics, 2005 IEEE International Conference, Volume 2, pp. 1765-1770.

Viola, P. & Jones, M. (2004). Robust real-time face detection. J. Computer Vision, Vol. 57, No. 2, pp. 137-154.

Vural, E., Cetin, M., Ercil, A., Littlewort, G., Bartlett, M., and Movellan, J. (2007). Drowsy driver detection through facial movement analysis. ICCV Workshop on Human Computer Interaction. 2007 IEEE.

Whitehill, J., Littlewort, G., Fasel, I., Bartlett, M., & Movellan, J. (in press). Towards practical smile detection. Transactions on Pattern Analysis and Machine Intelligence, 2009.

Whitehill, J., Bartlett, M., and Movellan, J. (2008). Automated teacher feedback using facial expression recognition. Workshop on CVPR for Human Communicative Behavior Analysis, IEEE Conference on Computer Vision and Pattern Recognition.

Zhang, Z., shu Zhang, J. (2006). Driver fatigue detection based intelligent vehicle control. In: Proceedings of the 18th International Conference on Pattern Recognition, Washington, DC, USA, IEEE Computer Society, p. 1262-1265.


Table 1. Z-score differences of the three pain conditions, averaged across subjects. FB: Fear brow (1+2+4). DB: Distress brow (1, 1+4).

A. Real Pain vs. Baseline:

Action Unit   25    12    9     26    10    6
Z-score       1.4   1.4   1.3   1.2   0.9   0.9

B. Faked Pain vs. Baseline:

Action Unit   4     DB    1     25    12    6     26    10    FB    9     20    7
Z-score       2.7   2.1   1.7   1.5   1.4   1.4   1.3   1.3   1.2   1.1   1.0   0.9

C. Real Pain vs. Faked Pain:

Action Unit          4     DB    1
Z-score difference   1.8   1.7   1.0

Table 2. MLR model for predicting drowsiness across subjects. Predictive performance of each facial action individually is shown.

More when critically drowsy:

AU    Name                   A'
45    Blink/Eye Closure      0.94
2     Outer Brow Raise       0.81
15    Lip Corner Depressor   0.80
17    Chin Raiser            0.79
9     Nose Wrinkle           0.78
30    Jaw Sideways           0.76
20    Lip Stretch            0.74
11    Nasolabial Furrow      0.71
14    Dimpler                0.71
1     Inner Brow Raise       0.68
10    Upper Lip Raise        0.67
27    Mouth Stretch          0.66
18    Lip Pucker             0.66
22    Lip Funneler           0.64
24    Lip Presser            0.64
19    Tongue Show            0.61

Less when critically drowsy:

AU    Name                   A'
12    Smile                  0.87
7     Lid Tighten            0.86
39    Nostril Compress       0.79
4     Brow Lower             0.79
26    Jaw Drop               0.77
6     Cheek Raise            0.73
38    Nostril Dilate         0.72
23    Lip Tighten            0.67
8     Lips Toward            0.67
5     Upper Lid Raise        0.65
16    Upper Lip Depress      0.64
32    Bite                   0.63


Table 3. Drowsiness detection performance for novel subjects, using an MLR classifier with different feature combinations. The weighted features are summed over 12 seconds before computing A'.

Feature                               A'
AU 45, AU 2, AU 19, AU 26, AU 15      .9792
All AU features                       .8954


Figure 1. Sample facial actions from the Facial Action Coding System incorporated in CERT. CERT includes 30 in total.


Figure 2. Computer Expression Recognition Toolbox (CERT). a. Overview of system design. b. Example of CERT running on live video. Each subplot has time on the horizontal axis, and the vertical axis indicates the intensity of a particular facial movement.


Figure 3. Data mining human behavior. CERT is applied to face videos containing spontaneous expressions of the states in question. Machine learning is applied to the outputs of CERT to learn a classifier to automatically discriminate state A from state B.


Figure 4. Sample facial behavior and facial action codes from the faked (a) and real pain (b) conditions.


Figure 5. Example of the Integrated Event Representation, for one subject and one frequency band for AU 4. Green (.): raw CERT output for AU 4. Blue (-): DOG-filtered signal for one frequency band. Red (-.): area under each curve.


Figure 6. Driving Simulation Task. Top: Screenshot of the driving game. Center: Steering wheel. Bottom: Image of a subject in a drowsy state (left), and falling asleep (right).


Figure 7. Example histograms for blink (left) and Action Unit 2 (right) in alert and non-alert states for one subject. A' is the area under the ROC. The x-axis is CERT output.


Figure 8. Performance for drowsiness detection in novel subjects over temporal window sizes.


Figure 9. Head motion (blue/gray) and steering position (red/black) for 60 seconds in an alert state (left) and 60 seconds prior to a crash (right). Head motion is the output of the roll dimension of the accelerometer.

Figure 10. Eye openness (red/black) and eyebrow raise (AU 2) (blue/gray) for 10 seconds in an alert state (left) and 10 seconds prior to a crash (right).


Figure 11. a. Sample video lecture. b. Automated facial expression recognition is performed on the subject's face as she watches the lecture. c. Self-reported difficulty values (dashed) and the reconstructed difficulty values (solid) computed using linear regression over the facial expression outputs for one subject (12 action units and smile detector). Reprinted with permission from Whitehill, Bartlett, & Movellan (2008). 2008 IEEE.