The CMU Pose, Illumination, and Expression Database

Terence Sim, Simon Baker, and Maan Bsat

Corresponding Author: Simon Baker
The Robotics Institute
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
[email protected]

Abstract

In the Fall of 2000 we collected a database of over 40,000 facial images of 68 people. Using the CMU 3D Room we imaged each person across 13 different poses, under 43 different illumination conditions, and with 4 different expressions. We call this the CMU Pose, Illumination, and Expression (PIE) database. We describe the imaging hardware, the collection procedure, the organization of the images, several possible uses, and how to obtain the database.



1 Introduction

People look very different depending on a number of factors. Perhaps the 3 most significant factors are: (1) the pose; i.e., the angle at which you look at them, (2) the illumination conditions at the time, and (3) their facial expression; e.g., smiling, frowning, etc. Although several other face databases exist with a large number of subjects [Philips et al., 1997], and with significant pose and illumination variation [Georghiades et al., 2000], we felt that there was still a need for a database consisting of a fairly large number of subjects, each imaged a large number of times, from several different poses, under significant illumination variation, and with a variety of expressions.

Between October 2000 and December 2000 we collected such a database consisting of over 40,000 images of 68 subjects. (The total size of the database is about 40GB.) We call this the CMU Pose, Illumination, and Expression (PIE) database. To obtain a wide variation across pose, we used 13 cameras in the CMU 3D Room [Kanade et al., 1998]. To obtain significant illumination variation we augmented the 3D Room with a "flash system" similar to the one constructed by Athinodoros Georghiades, Peter Belhumeur, and David Kriegman at Yale University [Georghiades et al., 2000]. We built a similar system with 21 flashes. Since we captured images with, and without, background lighting, we obtained 21 x 2 + 1 = 43 different illumination conditions. Furthermore, we asked the subjects to pose with several different expressions.
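The illumination count can be checked with a few lines of arithmetic. Note that the exact decomposition assumed here (21 flash conditions with the room lights on, 21 with them off, plus one ambient-only condition) is our reading of the text, not a breakdown the paper states explicitly:

```python
# Counting the PIE illumination conditions.
# Assumption: 43 = 21 flashes with room lights on
#             + 21 flashes with room lights off
#             + 1 ambient-only (room lights, no flash) condition.
NUM_FLASHES = 21

lights_on = NUM_FLASHES   # each flash fired with background lighting
lights_off = NUM_FLASHES  # each flash fired in a dark room
ambient_only = 1          # room lights on, no flash

total_conditions = lights_on + lights_off + ambient_only
print(total_conditions)  # 43, matching the abstract
```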

In the remainderof this paperwe describethe capturehardware, the organizationof the

images,a largenumberof possibleusesof thedatabase,andhow to obtainacopy of it.

2 Capture Apparatus

2.1 Setup of the Cameras: Pose

Obtaining images of a person from multiple poses requires either multiple cameras capturing images simultaneously, or multiple "shots" taken consecutively (or a combination of the two). There are a number of advantages of using multiple cameras: (1) the process takes less time, (2) if the cameras are fixed in space, the (relative) pose is the same for every subject and there is less difficulty in positioning the subject to obtain a particular pose, (3) if the images are taken simultaneously we know that the imaging conditions (i.e., incident illumination, etc.) are the same.

Figure 1: (a) The setup in the CMU 3D Room [Kanade et al., 1998]. The subject sits in a chair with his head against a pole to fix the head position. We used 13 Sony DXC 9000 (3 CCD, progressive scan) cameras with all gain and gamma correction turned off. We augmented the 3D Room with 21 Minolta 220X flashes controlled by an Advantech PCL-734 digital output board, duplicating the Yale "flash dome" [Georghiades et al., 2000]. (b) The xyz-locations of the head position, the 13 cameras, and the 21 flashes plotted in 3D. These locations were measured with a Leica theodolite and are included in the meta-data.

This final advantage can be particularly useful for detailed geometric and photometric modeling of objects. On the other hand, the disadvantages of using multiple cameras are: (1) we actually need to possess multiple cameras, digitizers, and computers to capture the data, (2) the cameras need to be synchronized: the shutters must all open at the same time and we must know the correspondence between the frames, and (3) the cameras will all have different intrinsic parameters.

Setting up a synchronized multi-camera imaging system is quite an engineering feat. Fortunately, such a system already existed at CMU, namely the 3D Room [Kanade et al., 1998]. We reconfigured the 3D Room and used it to capture multiple images simultaneously across pose. Figure 1 shows the capture setup in the 3D Room. There are 49 cameras in the 3D Room: 14 very high quality (3 CCD, progressive scan) Sony DXC 9000's, and 35 lower quality (single CCD, interlaced) JVC TK-C1380U's. We decided to use only the Sony cameras so that the image quality is approximately the same across the database. Due to other constraints we were only able to use 13 of the 14 Sony cameras. This still allowed us to capture 13 poses of each person simultaneously.

We positioned 9 of the 13 cameras at roughly head height in an arc from approximately full left profile to full right profile. Each neighboring pair of these 9 cameras is approximately 22.5 degrees apart. Of the remaining 4 cameras, 2 were placed above and below the central camera c27, and 2 were placed in the corners of the room, where surveillance cameras are typically located.


The locations of 10 of the cameras can be seen in Figure 1(a). The other 3 are symmetrically opposite the 3 right-most cameras in the figure. We measured the locations of the cameras using a theodolite. The measured locations are shown in Figure 1(b) and are included in the meta-data.
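As a rough sanity check on the geometry, the camera separation follows from the arc span. This is a sketch assuming the 9-camera arc covers the 180 degrees from full left profile to full right profile:

```python
# 9 cameras in an arc from full left profile to full right profile
# leave 8 equal gaps between neighboring cameras.
NUM_ARC_CAMERAS = 9
ARC_SPAN_DEG = 180.0  # assumption: profile-to-profile is a half circle

separation = ARC_SPAN_DEG / (NUM_ARC_CAMERAS - 1)
print(separation)  # 22.5 degrees between neighboring cameras
```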

2.2 The Flash System: Illumination

To obtain significant illumination variation we extended the 3D Room with a "flash system" similar to the Yale Dome used to capture the data in [Georghiades et al., 2000]. With help from Athinodoros Georghiades and Peter Belhumeur, we used an Advantech PCL-734, 32-channel digital output board to control 21 Minolta 220X flashes. The Advantech board can be directly wired into the "hot-shoe" of the flashes. Generating a pulse on one of the output channels then causes the corresponding flash to go off. We placed the Advantech board in one of the 17 computers used for image capture in the 3D Room and integrated the flash control code into the image capture routine so that the flash, the duration of which is approximately 1ms, occurs while the shutter (duration approximately 16ms) is open. We then modified the image capture code so that one flash goes off in turn for each image captured. We were then able to capture 21 images, each with different illumination, in 21 x 1/30 = 0.7 sec. The locations of the flashes, measured with a theodolite, are shown in Figure 1(b) and included in the database meta-data, along with the camera locations.
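The flash-sequencing logic described above can be sketched as follows. The board interface (`fire_channel`) is a hypothetical stand-in — the actual Advantech PCL-734 driver calls are not given in the paper — but the control flow (one flash fired per captured frame, 21 frames in a row) follows the description:

```python
NUM_FLASHES = 21
FRAME_PERIOD = 1.0 / 30.0  # ~33ms per frame (assumption: 30 fps capture)

def fire_channel(channel):
    """Hypothetical stand-in for pulsing one output channel on the
    digital output board, which fires the flash wired to that channel."""
    print(f"flash on channel {channel:02d}")

def capture_flash_sequence():
    """Fire one flash per captured frame, cycling through all 21 flashes."""
    fired = []
    for frame in range(NUM_FLASHES):
        # In the real system the ~1ms pulse is issued while the ~16ms
        # shutter is open; here we simply record the firing order.
        fire_channel(frame + 1)
        fired.append(frame + 1)
    return fired

channels = capture_flash_sequence()
print(len(channels) * FRAME_PERIOD)  # 0.7 sec for the whole sweep
```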

In the Yale illumination database [Georghiades et al., 2000] the images are captured with the room lights switched off. The images in that database therefore do not look entirely natural. In the real world, illumination usually consists of an ambient light with perhaps one or two point sources. To obtain representative images of such cases (that are more appropriate for determining the robustness of face recognition algorithms to illumination change) we decided to capture images both with the room lights on and with them off. We decided to include images with the lights off to give some partial overlap with the database used in [Georghiades et al., 2000].

3 Database Contents

On average the entire capture procedure took about 10 minutes per subject. In that time, we captured (and retained) over 600 images from 13 poses, with 43 different illuminations, and with 4 expressions. The color images have size 640 x 486. (The first 6 rows of the images contain synchronization information added by the VITC units in the 3D Room [Kanade et al., 1998].

Figure 2: An illustration of the pose variation in the PIE database (one image from each of cameras c22, c02, c37, c05, c07, c27, c09, c29, c11, c14, c34, c25, and c31). The pose varies from full left profile to full frontal and on to full right profile. The 9 cameras in the horizontal sweep are each separated by about 22.5 degrees. The 4 other cameras include 2 above and 2 below the central camera, and 2 in the corners of the room, typical locations for surveillance cameras. See Figure 1 for the camera locations.

This information may be discarded.) The storage required per person is approximately 600MB using color "raw PPM" images. Thus, the total storage requirement for 68 people is around 40GB. This can of course be reduced by compression, but we did not do so in order to preserve the original data.
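The storage figures are easy to reproduce. Assuming 640 x 486 color images and raw PPM's 3 bytes per pixel (ignoring the small header), the back-of-the-envelope numbers land close to those quoted:

```python
# Back-of-the-envelope storage check for the PIE database.
WIDTH, HEIGHT = 640, 486    # color image size, 3 bytes/pixel in raw PPM
IMAGES_PER_SUBJECT = 600    # "over 600 images" retained per subject
NUM_SUBJECTS = 68

bytes_per_image = WIDTH * HEIGHT * 3  # ignoring the small PPM header
mb_per_subject = bytes_per_image * IMAGES_PER_SUBJECT / 1e6
gb_total = mb_per_subject * NUM_SUBJECTS / 1e3

print(round(mb_per_subject))  # ~560 MB/subject, i.e. "approximately 600MB"
print(round(gb_total))        # ~38 GB, i.e. "around 40GB"
```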

3.1 Pose Variation

An example of the pose variation in the PIE database is shown in Figure 2. This figure contains images of 1 subject from each of the 13 cameras. As can be seen, there is a wide variation in pose from full profile to full frontal. This subset of the data should be useful for evaluating the robustness of face recognition algorithms across pose. Since the camera locations are known, it can also be used for the evaluation of pose estimation algorithms. Finally, it might be useful for the evaluation of algorithms that combine information from multiple widely separated views.

3.2 Pose and Illumination Variation

Examples of the pose and illumination variation are shown in Figure 3. Figure 3(a) contains the variation with the room lights on and Figure 3(b) with the lights off. Comparing the images we see that those in Figure 3(a) appear more natural and representative of images that occur in the real world than those in Figure 3(b). On the other hand, the data with the lights off was captured to reproduce the Yale database used in [Georghiades et al., 2000]. This will allow a direct comparison between the 2 databases. We foresee a number of possible uses for the pose and illumination

Figure 3: Examples of the pose and illumination variation with (a) the room lights on, and (b) the room lights off. The rows show cameras c27, c22, and c05; the columns show the room lights alone and flashes f01 and f09 in (a), and flashes f01, f09, and f17 in (b). Notice how the combination of room illumination and flashes leads to much more natural looking images than with just the flash alone.

variation data: First, it can be used to reproduce the results in [Georghiades et al., 2000]. Second, it can be used to evaluate the robustness of face recognition algorithms to pose and illumination.

3.3 Pose and Expression Variation

An example of the pose and expression variation is shown in Figure 4. The subject is asked to provide a neutral expression, to smile, to blink (i.e., to keep their eyes shut), and to talk. For neutral, smiling, and blinking, we kept 13 images, 1 from each camera. For talking, we captured 2 seconds of video (60 frames). Since this occupies a lot more space, we kept this data for only 3 cameras: the frontal camera c27, the three-quarter profile camera c05, and the full profile camera c22. In addition, for subjects who usually wear glasses, we collected an extra set of 13 images without their glasses, asking them to put on a neutral expression.

The pose and expression variation data can be used to test the robustness of face recognition algorithms to expression (and pose). The reason for including blinking was that many face recognition algorithms use the eye pupils to align a face model. It is therefore possible that these algorithms are particularly sensitive to subjects blinking. We can now test whether this is indeed the case.

3.4 Meta-Data

We collected a variety of miscellaneous "meta-data" to aid in calibration and other processing:

Figure 4: An example of the pose and expression variation in the PIE database (cameras c27, c22, and c05; neutral, smiling, blinking, and talking). Each subject is asked to give a neutral expression, to smile, to blink, and to talk. We capture this variation across all poses. For the neutral images, the smiling images, and the blinking images, we keep the data from all cameras. For the talking images, we keep 60 frames of video from only 3 cameras (frontal c27, three-quarter profile c05, and full profile c22). For subjects who wear glasses we also capture, from all cameras, one set of neutral-expression images of them without their glasses.

Head, Camera, and Flash Locations: Using a theodolite, we measured the xyz-locations of the head, the 13 cameras, and the 21 flashes. See Figure 1(b). The values are included in the database and can be used to estimate (relative) head poses and illumination directions.

Background Images: At the start of each recording session, we captured a background image from each of the 13 cameras. These images can be used to help localize the face region.

Color Calibration Images: Although the cameras that we used are all of the same type, there is still a large amount of variation in their photometric responses, both due to their manufacture and due to the fact that the aperture settings on the cameras were all set manually. We did perform "auto white-balance" on the cameras, but there is still some noticeable variation in their color response. To allow the cameras to be intensity- (gain and bias) and color-calibrated, we captured images of color calibration charts.

Personal Attributes of the Subjects: Finally, we include some personal information about the 68 subjects. For each subject we record the subject's sex and age, the presence or absence of eyeglasses, mustache, and beard, as well as the date on which the images were captured.
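As a small example of using the location meta-data, the illumination direction for a given flash can be estimated as the unit vector from the head position toward that flash. The coordinates below are made up for illustration; the real values ship with the database:

```python
import math

# Hypothetical xyz-locations in meters; the real values are in the
# database meta-data, measured with a theodolite.
head = (0.0, 0.0, 1.2)
flash_f01 = (1.5, -0.8, 1.6)

def illumination_direction(head_xyz, flash_xyz):
    """Unit vector pointing from the head toward the flash."""
    d = [f - h for f, h in zip(flash_xyz, head_xyz)]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

direction = illumination_direction(head, flash_f01)
print(direction)
```

The same construction with a camera location in place of the flash gives the (relative) viewing direction, and hence an estimate of head pose with respect to that camera.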

4 Potential Uses of the Database

Throughout this paper we have pointed out a number of potential uses of the database. We now summarize some of the possibilities, citing several papers that have already used the database:

- Evaluating pose invariant face detectors [Heisele et al., 2001a, Heisele et al., 2001b].

- Evaluation of head pose estimation algorithms.

- Evaluation of the robustness of face recognition algorithms to the pose of the probe image [Gross et al., 2001, Blanz et al., 2002].

- Evaluation of face recognition algorithms that operate across pose; i.e., algorithms for which the gallery and probe images have different poses [Gross et al., 2002, Blanz et al., 2002].

- Evaluation of face recognizers that use multiple images across pose [Gross et al., 2002].

- Evaluation of the robustness of face recognition algorithms to illumination (and pose) [Sim and Kanade, 2001, Gross et al., 2001, Blanz et al., 2002].

- Evaluation of the robustness of face recognition algorithms to facial expression and pose.

- 3D face model building either using multiple images across pose (stereo) or multiple images across illumination (photometric stereo [Georghiades et al., 2000]).

The importance of the evaluation of algorithms (and the databases used to perform the evaluation) for the development of algorithms should not be underestimated. It is often the failure of existing algorithms on new datasets, or simply the existence of new datasets, that drives research forward.

5 Obtaining the Database

We have been distributing the PIE database in the following manner:

1. The recipient ships an empty (E)IDE hard drive to us.

2. We copy the data onto the drive and ship it back.

To date we have shipped the PIE database to over 50 research groups worldwide. Anyone interested in receiving the database should contact the second author by email at [email protected] or visit the PIE database website at www.ri.cmu.edu/projects/project_418.html.

Acknowledgements

We would like to thank Athinodoros Georghiades and Peter Belhumeur for giving us details of the Yale "flash dome." Sundar Vedula and German Cheung gave us great help using the CMU 3D Room. We would also like to thank Henry Schneiderman and Jeff Cohn for discussions on what data to collect. Financial support for the collection of this database was provided by the U.S. Office of Naval Research (ONR) under contract N00014-00-1-0915. A version of this paper appeared in the 5th International Conference on Automatic Face and Gesture Recognition [Sim et al., 2002]. We thank both the PAMI reviewers and the reviewers of [Sim et al., 2002] for their feedback.

References

[Blanz et al., 2002] V. Blanz, S. Romdhani, and T. Vetter. Face identification across different poses and illuminations with a 3D morphable model. In Proceedings of the 5th IEEE International Conference on Face and Gesture Recognition, 2002.

[Georghiades et al., 2000] A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman. From few to many: Generative models for recognition under variable pose and illumination. In Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, 2000.

[Gross et al., 2001] R. Gross, J. Shi, and J. Cohn. Quo vadis face recognition. In Proceedings of the 3rd Workshop on Empirical Evaluation Methods in Computer Vision, 2001.

[Gross et al., 2002] R. Gross, I. Matthews, and S. Baker. Eigen light-fields and face recognition across pose. In Proceedings of the 5th IEEE International Conference on Face and Gesture Recognition, 2002.

[Heisele et al., 2001a] B. Heisele, T. Serre, M. Pontil, and T. Poggio. Component-based face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2001.

[Heisele et al., 2001b] B. Heisele, T. Serre, M. Pontil, T. Vetter, and T. Poggio. Categorization by learning and combining object parts. In Neural Information Processing Systems, 2001.

[Kanade et al., 1998] T. Kanade, H. Saito, and S. Vedula. The 3D room: Digitizing time-varying 3D events by synchronized multiple video streams. Technical Report CMU-RI-TR-98-34, Carnegie Mellon University Robotics Institute, 1998.

[Philips et al., 1997] P.J. Philips, H. Moon, P. Rauss, and S.A. Rizvi. The FERET evaluation methodology for face-recognition algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.

[Sim and Kanade, 2001] T. Sim and T. Kanade. Combining models and exemplars for face recognition: An illuminating example. In Proceedings of the Workshop on Models versus Exemplars in Computer Vision, 2001.

[Sim et al., 2002] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression (PIE) database. In Proceedings of the 5th IEEE International Conference on Face and Gesture Recognition, 2002.
