The CMU Pose, Illumination, and Expression Database
Terence Sim, Simon Baker, and Maan Bsat
Corresponding Author: Simon Baker
The Robotics Institute, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213
Abstract
In the Fall of 2000 we collected a database of over 40,000 facial images of 68 people. Using the CMU 3D Room we imaged each person across 13 different poses, under 43 different illumination conditions, and with 4 different expressions. We call this the CMU Pose, Illumination, and Expression (PIE) database. We describe the imaging hardware, the collection procedure, the organization of the images, several possible uses, and how to obtain the database.
1 Introduction
People look very different depending on a number of factors. Perhaps the 3 most significant factors are: (1) the pose, i.e. the angle at which you look at them, (2) the illumination conditions at the time, and (3) their facial expression, e.g. smiling, frowning, etc. Although several other face databases exist with a large number of subjects [Philips et al., 1997], and with significant pose and illumination variation [Georghiades et al., 2000], we felt that there was still a need for a database consisting of a fairly large number of subjects, each imaged a large number of times, from several different poses, under significant illumination variation, and with a variety of expressions.
Between October 2000 and December 2000 we collected such a database consisting of over 40,000 images of 68 subjects. (The total size of the database is about 40GB.) We call this the CMU Pose, Illumination, and Expression (PIE) database. To obtain a wide variation across pose, we used 13 cameras in the CMU 3D Room [Kanade et al., 1998]. To obtain significant illumination variation we augmented the 3D Room with a "flash system" similar to the one constructed by Athinodoros Georghiades, Peter Belhumeur, and David Kriegman at Yale University [Georghiades et al., 2000]. We built a similar system with 21 flashes. Since we captured images with, and without, background lighting, we obtained 21 x 2 + 1 = 43 different illumination conditions. Furthermore, we asked the subjects to pose with several different expressions.
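The illumination count can be sanity-checked directly from the capture protocol: each of the 21 flashes fires once with the room lights on and once with them off, and one image is taken with the room lights alone. A minimal check, assuming that breakdown:

```python
# Illumination-condition accounting for the PIE capture protocol.
flashes = 21
with_lights = flashes       # one image per flash, room lights on
without_lights = flashes    # one image per flash, room lights off
lights_only = 1             # ambient room lighting, no flash

conditions = with_lights + without_lights + lights_only
print(conditions)  # → 43
```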
In the remainderof this paperwe describethe capturehardware, the organizationof the
images,a largenumberof possibleusesof thedatabase,andhow to obtainacopy of it.
2 Capture Apparatus
2.1 Setup of the Cameras: Pose
Obtaining images of a person from multiple poses requires either multiple cameras capturing images simultaneously, or multiple "shots" taken consecutively (or a combination of the two). There are a number of advantages of using multiple cameras: (1) the process takes less time, (2) if the cameras are fixed in space, the (relative) pose is the same for every subject and there is less difficulty in positioning the subject to obtain a particular pose, (3) if the images are taken simultaneously we know that the imaging conditions (i.e. incident illumination, etc.) are the same. This
Figure 1: (a) The setup in the CMU 3D Room [Kanade et al., 1998]. The subject sits in a chair with his head against a pole to fix the head position. We used 13 Sony DXC 9000 (3 CCD, progressive scan) cameras with all gain and gamma correction turned off. We augmented the 3D Room with 21 Minolta 220X flashes controlled by an Advantech PCL-734 digital output board, duplicating the Yale "flash dome" [Georghiades et al., 2000]. (b) The xyz-locations of the head position, the 13 cameras, and the 21 flashes plotted in 3D. These locations were measured with a Leica theodolite and are included in the meta-data.
final advantage can be particularly useful for detailed geometric and photometric modeling of objects. On the other hand, the disadvantages of using multiple cameras are: (1) we actually need to possess multiple cameras, digitizers, and computers to capture the data, (2) the cameras need to be synchronized: the shutters must all open at the same time and we must know the correspondence between the frames, and (3) the cameras will all have different intrinsic parameters.
Setting up a synchronized multi-camera imaging system is quite an engineering feat. Fortunately, such a system already existed at CMU, namely the 3D Room [Kanade et al., 1998]. We reconfigured the 3D Room and used it to capture multiple images simultaneously across pose. Figure 1 shows the capture setup in the 3D Room. There are 49 cameras in the 3D Room, 14 very high quality (3 CCD, progressive scan) Sony DXC 9000's, and 35 lower quality (single CCD, interlaced) JVC TK-C1380U's. We decided to use only the Sony cameras so that the image quality is approximately the same across the database. Due to other constraints we were only able to use 13 of the 14 Sony cameras. This still allowed us to capture 13 poses of each person simultaneously.
We positioned 9 of the 13 cameras at roughly head height in an arc from approximately full left profile to full right profile. Each neighboring pair of these 9 cameras is approximately 22.5 degrees apart. Of the remaining 4 cameras, 2 were placed above and below the central camera c27, and 2 were placed in the corners of the room, where surveillance cameras are typically located.
The locations of 10 of the cameras can be seen in Figure 1(a). The other 3 are symmetrically opposite the 3 right-most cameras in the figure. We measured the locations of the cameras using a theodolite. The measured locations are shown in Figure 1(b) and are included in the meta-data.
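The nominal geometry of the head-height arc can be sketched as follows. The radius and coordinate frame below are illustrative assumptions, not the measured values; the actual theodolite-measured xyz-locations ship with the database meta-data.

```python
import math

# Hypothetical sketch: 9 cameras on a semicircular arc at head height,
# neighboring cameras 22.5 degrees apart, centered on the frontal camera.
# 8 gaps of 22.5 degrees span the full 180 degrees from profile to profile.
RADIUS_M = 3.0  # assumed camera-to-head distance (illustrative only)

angles = [math.radians(22.5 * (i - 4)) for i in range(9)]  # -90 .. +90 deg

# x: subject's left/right, y: toward the frontal camera (head at origin).
positions = [(RADIUS_M * math.sin(a), RADIUS_M * math.cos(a)) for a in angles]
for a, (x, y) in zip(angles, positions):
    print(f"{math.degrees(a):+6.1f} deg -> x={x:+.2f} m, y={y:+.2f} m")
```

The middle entry (0 degrees) corresponds to the frontal camera c27; the endpoints correspond to the two full-profile views.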
2.2 The Flash System: Illumination
To obtain significant illumination variation we extended the 3D Room with a "flash system" similar to the Yale Dome used to capture the data in [Georghiades et al., 2000]. With help from Athinodoros Georghiades and Peter Belhumeur, we used an Advantech PCL-734, 32-channel digital output board to control 21 Minolta 220X flashes. The Advantech board can be directly wired into the "hot-shoe" of the flashes. Generating a pulse on one of the output channels then causes the corresponding flash to go off. We placed the Advantech board in one of the 17 computers used for image capture in the 3D Room and integrated the flash control code into the image capture routine so that the flash, the duration of which is approximately 1ms, occurs while the shutter (duration approximately 16ms) is open. We then modified the image capture code so that one flash goes off in turn for each image captured. We were then able to capture 21 images, each with different illumination, in approximately 0.7 sec. The locations of the flashes, measured with a theodolite, are shown in Figure 1(b) and included in the database meta-data, along with the camera locations.
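The flash-sequencing logic described above can be sketched roughly as follows. The function names and the 30 frames-per-second rate are assumptions for illustration; the real control code ran inside the 3D Room capture software and pulsed the Advantech board directly.

```python
import time

FLASH_CHANNELS = list(range(21))   # one digital-output channel per flash
FRAME_PERIOD_S = 1.0 / 30.0        # assumed capture rate of 30 frames/sec

def fire_flash(channel):
    """Placeholder for pulsing one output channel of the digital output
    board (in the real system, an Advantech PCL-734 wired to hot-shoes)."""
    pass

def capture_frame():
    """Placeholder for grabbing one frame while the shutter is open."""
    return object()

# One flash per frame: the ~1 ms flash pulse is issued while the ~16 ms
# shutter is open, so each captured frame sees exactly one flash.
images = []
for channel in FLASH_CHANNELS:
    fire_flash(channel)
    images.append(capture_frame())
    time.sleep(FRAME_PERIOD_S)

# 21 frames at the assumed 30 fps take roughly 0.7 seconds in total.
```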
In the Yale illumination database [Georghiades et al., 2000] the images are captured with the room lights switched off. The images in the database therefore do not look entirely natural. In the real world, illumination usually consists of an ambient light with perhaps one or two point sources. To obtain representative images of such cases (that are more appropriate for determining the robustness of face recognition algorithms to illumination change) we decided to capture images both with the room lights on and with them off. We decided to include images with the lights off to give some partial overlap with the database used in [Georghiades et al., 2000].
3 Database Contents
On average the entire capture procedure took about 10 minutes per subject. In that time, we captured (and retained) over 600 images from 13 poses, with 43 different illuminations, and with 4 expressions. The color images have size 640 x 486. (The first 6 rows of the images contain synchronization information added by the VITC units in the 3D Room [Kanade et al., 1998]. This
Figure 2: An illustration of the pose variation in the PIE database. The pose varies from full left profile to full frontal and on to full right profile. The 9 cameras in the horizontal sweep are each separated by about 22.5 degrees. The 4 other cameras include 2 above and 2 below the central camera, and 2 in the corners of the room, typical locations for surveillance cameras. See Figure 1 for the camera locations.
information may be discarded.) The storage required per person is approximately 600MB using color "raw PPM" images. Thus, the total storage requirement for 68 people is around 40GB. This can of course be reduced by compression, but we did not do so in order to preserve the original data.
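These storage figures are consistent with the image format. A rough estimate, assuming the usual 3 bytes per pixel for raw color PPM (P6) data:

```python
# Back-of-the-envelope storage estimate for the uncompressed database.
width, height = 640, 486        # image size in pixels
bytes_per_pixel = 3             # raw color PPM, one byte per channel
images_per_subject = 600
subjects = 68

per_image_mb = width * height * bytes_per_pixel / 1e6  # ~0.93 MB
per_subject_mb = per_image_mb * images_per_subject     # ~560 MB
total_gb = per_subject_mb * subjects / 1e3             # ~38 GB

print(round(per_subject_mb), round(total_gb))  # → 560 38
```

The small gap between this estimate and the quoted 600MB/40GB is accounted for by PPM headers, the meta-data files, and the "over 600 images" per subject.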
3.1 Pose Variation
An example of the pose variation in the PIE database is shown in Figure 2. This figure contains images of 1 subject from each of the 13 cameras. As can be seen, there is a wide variation in pose from full profile to full frontal. This subset of the data should be useful for evaluating the robustness of face recognition algorithms across pose. Since the camera locations are known, it can also be used for the evaluation of pose estimation algorithms. Finally, it might be useful for the evaluation of algorithms that combine information from multiple widely separated views.
3.2 Pose and Illumination Variation
Examples of the pose and illumination variation are shown in Figure 3. Figure 3(a) contains the variation with the room lights on and Figure 3(b) with the lights off. Comparing the images we see that those in Figure 3(a) appear more natural and representative of images that occur in the real world than those in Figure 3(b). On the other hand, the data with the lights off was captured to reproduce the Yale database used in [Georghiades et al., 2000]. This will allow a direct comparison between the 2 databases. We foresee a number of possible uses for the pose and illumination
Figure 3: Examples of the pose and illumination variation with (a) the room lights on, and (b) the room lights off. Notice how the combination of room illumination and flashes leads to much more natural looking images than with just the flash alone.
variation data: First, it can be used to reproduce the results in [Georghiades et al., 2000]. Second, it can be used to evaluate the robustness of face recognition algorithms to pose and illumination.
3.3 Pose and Expression Variation
An example of the pose and expression variation is shown in Figure 4. The subject is asked to provide a neutral expression, to smile, to blink (i.e. to keep their eyes shut), and to talk. For neutral, smiling, and blinking, we kept 13 images, 1 from each camera. For talking, we captured 2 seconds of video (60 frames). Since this occupies a lot more space, we kept this data for only 3 cameras: the frontal camera c27, the three-quarter profile camera c22, and the full profile camera c05. In addition, for subjects who usually wear glasses, we collected an extra set of 13 images without their glasses, asking them to put on a neutral expression.
The pose and expression variation data can be used to test the robustness of face recognition algorithms to expression (and pose). The reason for including blinking was that many face recognition algorithms use the eye pupils to align a face model. It is therefore possible that these algorithms are particularly sensitive to subjects blinking. We can now test whether this is indeed the case.
3.4 Meta-Data
We collected a variety of miscellaneous "meta-data" to aid in calibration and other processing:
Figure 4: An example of the pose and expression variation in the PIE database. Each subject is asked to give a neutral expression, to smile, to blink, and to talk. We capture this variation across all poses. For the neutral images, the smiling images, and the blinking images, we keep the data from all cameras. For the talking images, we keep 60 frames of video from only 3 cameras (frontal c27, three-quarter profile c22, and full profile c05). For subjects who wear glasses we also capture, from all cameras, one set of neutral-expression images of them without their glasses.
Head, Camera, and Flash Locations: Using a theodolite, we measured the xyz-locations of the head, the 13 cameras, and the 21 flashes. See Figure 1(b). The values are included in the database and can be used to estimate (relative) head poses and illumination directions.

Background Images: At the start of each recording session, we captured a background image from each of the 13 cameras. These images can be used to help localize the face region.

Color Calibration Images: Although the cameras that we used are all of the same type, there is still a large amount of variation in their photometric responses, both due to their manufacture and due to the fact that the aperture settings on the cameras were all set manually. We did perform "auto white-balance" on the cameras, but there is still some noticeable variation in their color response. To allow the cameras to be intensity- (gain and bias) and color-calibrated, we captured images of color calibration charts.

Personal Attributes of the Subjects: Finally, we include some personal information about the 68 subjects. For each subject we record the subject's sex and age, the presence or absence of eyeglasses, mustache, and beard, as well as the date on which the images were captured.
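As one example of using the location meta-data, a relative illumination direction can be computed from the measured head and flash positions. The coordinates below are made-up placeholders for illustration; the real values come from the database files.

```python
import math

def illumination_direction(head_xyz, flash_xyz):
    """Unit vector pointing from the head toward a flash, expressed in
    the same world coordinate frame as the theodolite measurements."""
    d = [f - h for f, h in zip(flash_xyz, head_xyz)]
    norm = math.sqrt(sum(c * c for c in d))
    return [c / norm for c in d]

# Placeholder coordinates (meters), for illustration only.
head = (0.0, 0.0, 1.2)
flash_f01 = (1.5, 2.0, 1.8)
print(illumination_direction(head, flash_f01))
```

The same subtraction-and-normalize step, applied to a camera location instead of a flash, gives a (relative) viewing direction for pose estimation.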
4 Potential Uses of the Database
Throughout this paper we have pointed out a number of potential uses of the database. We now summarize some of the possibilities, citing several papers that have already used the database:
- Evaluating pose invariant face detectors [Heisele et al., 2001a, Heisele et al., 2001b].

- Evaluation of head pose estimation algorithms.

- Evaluation of the robustness of face recognition algorithms to the pose of the probe image [Gross et al., 2001, Blanz et al., 2002].

- Evaluation of face recognition algorithms that operate across pose; i.e. algorithms for which the gallery and probe images have different poses [Gross et al., 2002, Blanz et al., 2002].

- Evaluation of face recognizers that use multiple images across pose [Gross et al., 2002].

- Evaluation of the robustness of face recognition algorithms to illumination (and pose) [Sim and Kanade, 2001, Gross et al., 2001, Blanz et al., 2002].

- Evaluation of the robustness of face recognition algorithms to facial expression and pose.

- 3D face model building either using multiple images across pose (stereo) or multiple images across illumination (photometric stereo [Georghiades et al., 2000]).
The importance of the evaluation of algorithms (and the databases used to perform the evaluation) for the development of algorithms should not be underestimated. It is often the failure of existing algorithms on new datasets, or simply the existence of new datasets, that drives research forward.
5 Obtaining the Database
We have been distributing the PIE database in the following manner:

1. The recipient ships an empty (E)IDE hard drive to us.

2. We copy the data onto the drive and ship it back.

To date we have shipped the PIE database to over 50 research groups worldwide. Anyone interested in receiving the database should contact the second author by [email protected] or visit the PIE database website at www.ri.cmu.edu/projects/project_418.html.
Acknowledgements
We would like to thank Athinodoros Georghiades and Peter Belhumeur for giving us details of the Yale "flash dome." Sundar Vedula and German Cheung gave us great help using the CMU 3D Room. We would also like to thank Henry Schneiderman and Jeff Cohn for discussions on what data to collect. Financial support for the collection of this database was provided by the U.S. Office of Naval Research (ONR) under contract N00014-00-1-0915. A version of this paper appeared in the 5th International Conference on Automatic Face and Gesture Recognition [Sim et al., 2002]. We thank both the PAMI reviewers and the reviewers of [Sim et al., 2002] for their feedback.
References
[Blanz et al., 2002] V. Blanz, S. Romdhani, and T. Vetter. Face identification across different poses and illuminations with a 3D morphable model. In Proceedings of the 5th IEEE International Conference on Face and Gesture Recognition, 2002.

[Georghiades et al., 2000] A.S. Georghiades, P.N. Belhumeur, and D.J. Kriegman. From few to many: Generative models for recognition under variable pose and illumination. In Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, 2000.

[Gross et al., 2001] R. Gross, J. Shi, and J. Cohn. Quo vadis face recognition. In Proceedings of the 3rd Workshop on Empirical Evaluation Methods in Computer Vision, 2001.

[Gross et al., 2002] R. Gross, I. Matthews, and S. Baker. Eigen light-fields and face recognition across pose. In Proceedings of the 5th IEEE International Conference on Face and Gesture Recognition, 2002.

[Heisele et al., 2001a] B. Heisele, T. Serre, M. Pontil, and T. Poggio. Component-based face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2001.

[Heisele et al., 2001b] B. Heisele, T. Serre, M. Pontil, T. Vetter, and T. Poggio. Categorization by learning and combining object parts. In Neural Information Processing Systems, 2001.

[Kanade et al., 1998] T. Kanade, H. Saito, and S. Vedula. The 3D room: Digitizing time-varying 3D events by synchronized multiple video streams. Technical Report CMU-RI-TR-98-34, Carnegie Mellon University Robotics Institute, 1998.

[Philips et al., 1997] P.J. Philips, H. Moon, P. Rauss, and S.A. Rizvi. The FERET evaluation methodology for face-recognition algorithms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.

[Sim and Kanade, 2001] T. Sim and T. Kanade. Combining models and exemplars for face recognition: An illuminating example. In Proceedings of the Workshop on Models versus Exemplars in Computer Vision, 2001.

[Sim et al., 2002] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression (PIE) database. In Proceedings of the 5th IEEE International Conference on Face and Gesture Recognition, 2002.