To appear in: Proc. 7th Int. Conf. on Computer Vision, Corfu, Greece, September 22-25, 1999

A Theory of Shape by Space Carving

Kiriakos N. Kutulakos
Depts. of Computer Science & Dermatology
University of Rochester
Rochester, NY 14627 USA

Steven M. Seitz
The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213 USA

Kiriakos Kutulakos gratefully acknowledges the support of the National Science Foundation under Grant No. IRI-9875628, of Roche Laboratories, Inc., and of the Dermatology Foundation. Part of this work was conducted while Steven Seitz was employed by the Vision Technology Group at Microsoft Research. The support of the Microsoft Corporation is gratefully acknowledged.

Abstract

In this paper we consider the problem of computing the 3D shape of an unknown, arbitrarily-shaped scene from multiple photographs taken at known but arbitrarily-distributed viewpoints. By studying the equivalence class of all 3D shapes that reproduce the input photographs, we prove the existence of a special member of this class, the photo hull, that (1) can be computed directly from photographs of the scene, and (2) subsumes all other members of this class. We then give a provably-correct algorithm, called Space Carving, for computing this shape and present experimental results on complex real-world scenes. The approach is designed to (1) build photorealistic shapes that accurately model scene appearance from a wide range of viewpoints, and (2) account for the complex interactions between occlusion, parallax, shading, and their effects on arbitrary views of a 3D scene.

1. Introduction

A fundamental problem in computer vision is reconstructing the shape of a complex 3D scene from multiple photographs. While current techniques work well under controlled conditions (e.g., small stereo baselines [1], active viewpoint control [2], spatial and temporal smoothness [3], or scenes containing linear features or texture-less surfaces [4–6]), very little is known about scene reconstruction under general conditions. In particular, in the absence of a priori geometric information, what can we infer about the structure of an unknown scene from N arbitrarily positioned cameras at known viewpoints? Answering this question has many implications for reconstructing real objects and environments, which tend to be non-smooth, exhibit significant occlusions, and may contain both textured and texture-less surface regions (Figure 1).

In this paper, we develop a theory for reconstructing arbitrarily-shaped scenes from arbitrarily-positioned cameras by formulating shape recovery as a constraint satisfaction problem. We show that any set of photographs of a rigid scene defines a collection of picture constraints that are satisfied by every scene projecting to those photographs. Furthermore, we characterize the set of all 3D shapes that satisfy these constraints and use the underlying theory to design a practical reconstruction algorithm, called Space Carving, that applies to fully-general shapes and camera configurations. In particular, we address three questions:

- Given N input photographs, can we characterize the set of all photo-consistent shapes, i.e., shapes that reproduce the input photographs?
- Is it possible to compute a shape from this set and, if so, what is the algorithm?
- What is the relationship of the computed shape to all other photo-consistent shapes?
Our goal is to study the N-view shape recovery problem in the general case where no constraints are placed upon the scene's shape or upon the viewpoints of the input photographs. In particular, we address the above questions for the case when (1) no constraints are imposed on scene geometry or topology, (2) no constraints are imposed on the positions of the input cameras, (3) no information is available about the existence of specific image features in the input photographs (e.g., edges, points, lines, contours, texture, or color), and (4) no a priori correspondence information is available. Unfortunately, even though several algorithms have been proposed for recovering shape from multiple views that work under some of these conditions (e.g., work on stereo [7–9]), very little is currently known about how to answer the above questions, and even less so about how to answer them in this general case.

At the heart of our work is the observation that these questions become tractable when scene radiance belongs to a general class of radiance functions we call locally computable. This class characterizes scenes for which global illumination effects such as shadows, transparency and inter-reflections can be ignored, and is sufficiently general to include scenes with parameterized radiance models (e.g., Lambertian, Phong, Torrance-Sparrow [10]). Using this observation as a starting point, we show how to compute, from N photographs of an unknown scene, a maximal shape called the photo hull that encloses the set of all photo-consistent reconstructions.


The only requirements are that (1) the viewpoint of each photograph is known in a common 3D world reference frame (Euclidean, affine, or projective), and (2) scene radiance follows a known, locally-computable radiance function. Experimental results illustrating our method's performance are given for both real and simulated geometrically-complex scenes.

To our knowledge, no previous theoretical work has studied the equivalence class of solutions to the general N-view reconstruction problem or provably-correct algorithms for computing them.1 The Space Carving Algorithm that results from our analysis, however, is related to other 3D scene-space stereo algorithms that have been recently proposed [14–21]. Of these, most closely related are mesh-based [14] and level-set [22] algorithms, as well as methods that sweep a plane or other manifold through a discretized scene space [15–17,20,23]. While the algorithms in [14,22] generate high-quality reconstructions and perform well in the presence of occlusions, their use of regularization techniques penalizes complex surfaces and shapes. Even more importantly, no formal study has been undertaken to establish their validity for recovering arbitrarily-shaped scenes from unconstrained camera configurations (e.g., the one shown in Figure 1a). In contrast, our Space Carving Algorithm is provably correct and has no regularization biases. Even though space-sweep approaches have many attractive properties, existing algorithms [15–17,20] are not fully general, i.e., they rely on the presence of specific image features such as edges and hence generate only sparse reconstructions [15], or they place strong constraints on the input viewpoints relative to the scene [16,17]. Unlike all previous methods, Space Carving guarantees complete reconstruction in the general case.

Our approach offers four main contributions over the existing state of the art. First, it introduces an algorithm-independent analysis of the N-view shape-recovery problem, making explicit the assumptions required for solving it as well as the ambiguities intrinsic to the problem. Second, it establishes the tightest possible bound on the shape of the true scene obtainable from N photographs without a priori geometric information. Third, it describes the first provably-correct algorithm for scene reconstruction from unconstrained camera viewpoints. Fourth, the approach leads naturally to global reconstruction algorithms that recover 3D shape information from all photographs at once, eliminating the need for complex partial reconstruction and merging operations [19,24].

1 Faugeras [11] has recently proposed the term metameric to describe such shapes, in analogy with the term's use in the color perception [12] and structure-from-motion literature [13].

Figure 1. Viewing geometry. The scene volume and camera distribution covered by our analysis are both completely unconstrained. Examples include (a) a 3D environment viewed from a collection of cameras that are arbitrarily dispersed in free space, and (b) a 3D object viewed by a single camera moving around it.

2. Picture Constraints

Let V be a 3D scene defined by a finite, opaque, and possibly disconnected volume in space. We assume that V is viewed under perspective projection from N known positions c_1, ..., c_N in R^3 - V (Figure 1b). The radiance of a point p on the scene's surface is a function rad_p(ξ) that maps every oriented ray ξ through the point to the color of light reflected from p along ξ. We use the term shape-radiance scene description to denote the scene V together with an assignment of a radiance function to every point on its surface. This description contains all the information needed to reproduce a photograph of the scene for any camera position.

Every photograph of a 3D scene taken from a known location partitions the set of all possible shape-radiance scene descriptions into two families, those that reproduce the photograph and those that do not. We characterize this constraint for a given shape and a given radiance assignment by the notion of photo-consistency:2

Definition 1 (Point Photo-Consistency) A point p in V that is visible from c is photo-consistent with the photograph at c if (1) p does not project to a background pixel, and (2) the color at p's projection is equal to rad_p(pc), where pc is the oriented ray from p to c.

Definition 2 (Shape-Radiance Photo-Consistency) A shape-radiance scene description is photo-consistent with the photograph at c if all points visible from c are photo-consistent and every non-background pixel is the projection of a point in V.

Definition 3 (Shape Photo-Consistency) A shape V is photo-consistent with a set of photographs if there is an assignment of radiance functions to the visible points of V that makes the resulting shape-radiance description photo-consistent with all photographs.
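As an illustration of Definition 1, the following sketch tests a single point against one photograph. The camera.project, camera.center, image.is_background, image.color, and rad helpers are hypothetical stand-ins for the known projection model, background labelling, and the point's radiance function, and the exact equality of Definition 1 is relaxed to a tolerance.

```python
import numpy as np

def point_photo_consistent(p, camera, image, rad, tol=0.0):
    """Test Definition 1 for one point p and one photograph (a sketch).

    camera.project, camera.center, image.is_background, image.color, and
    rad are hypothetical helpers standing in for the projection model,
    the background labelling, and the radiance function rad_p.
    """
    u, v = camera.project(p)                      # pixel to which p projects
    if image.is_background(u, v):                 # condition (1): not a background pixel
        return False
    ray = np.asarray(camera.center, float) - np.asarray(p, float)
    ray /= np.linalg.norm(ray)                    # oriented ray from p toward the camera
    observed = np.asarray(image.color(u, v), float)
    predicted = np.asarray(rad(p, ray), float)
    return bool(np.linalg.norm(observed - predicted) <= tol)   # condition (2)
```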

2 In the following, we make the simplifying assumption that pixel values in the image measure scene radiance directly.


Our goal is to provide a concrete characterization of the family of all scenes that are photo-consistent with N input photographs. We achieve this by making explicit the two ways in which photo-consistency with N photographs can constrain a scene's shape.

2.1. Background Constraints

Photo-consistency requires that no point of V projects to a background pixel. If a photograph taken at position c contains identifiable background pixels, this constraint restricts V to a cone defined by c and the photograph's non-background pixels. Given N such photographs, the scene is restricted to the visual hull, which is the volume of intersection of their corresponding cones [5].
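For intuition, a minimal voxel-based sketch of this intersection of cones follows; the project and is_background helpers are hypothetical, and a voxel that projects outside an image is treated here as falling outside that camera's cone.

```python
import numpy as np

def visual_hull_mask(voxel_centers, cameras, images):
    """Intersection-of-cones sketch: keep voxels that project to a
    non-background pixel in every photograph.

    cameras[i].project and images[i].is_background are hypothetical
    helpers supplied by the calling code.
    """
    keep = np.ones(len(voxel_centers), dtype=bool)
    for cam, img in zip(cameras, images):
        for k, x in enumerate(voxel_centers):
            if not keep[k]:
                continue                           # already carved by an earlier cone
            uv = cam.project(x)                    # None if x is outside the field of view
            if uv is None or img.is_background(*uv):
                keep[k] = False
    return keep
```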

When no a priori information is available about the scene's radiance, the visual hull defines all the shape constraints in the input photographs. This is because there is always an assignment of radiance functions to the points on the surface of the visual hull that makes the resulting shape-radiance description photo-consistent with the N input photographs.3 The visual hull can therefore be thought of as a "least commitment reconstruction" of the scene: any further refinement of this volume must rely on assumptions about the scene's shape or radiance.

While visual hull reconstruction has often been used as a method for recovering 3D shape from photographs [25,26], the picture constraints captured by the visual hull only exploit information from the background pixels in these photographs. Unfortunately, these constraints become useless when photographs contain no background pixels (i.e., the visual hull degenerates to R^3) or when background identification cannot be performed accurately. Below we study picture constraints from non-background pixels when the scene's radiance is restricted to a special class of radiance models. The resulting constraints lead to photo-consistent scene reconstructions that are subsets of the visual hull and, unlike the visual hull, can contain concavities.

2.2. Radiance Constraints

Surfaces that are not transparent or mirror-like reflect light in a coherent manner, i.e., the color of light reflected from a single point along different directions is not arbitrary. This coherence provides additional picture constraints beyond what can be obtained from background information. In order to take advantage of these constraints, we focus on scenes whose radiance satisfies the following criterion:

Consistency Check Criterion: An algorithm consist_K() is available that takes as input at least K >= N colors col_1, ..., col_K, K vectors ξ_1, ..., ξ_K, and the light source positions (non-Lambertian case), and decides whether it is possible for a single surface point to reflect light of color col_i in direction ξ_i simultaneously for all i = 1, ..., K.

3 For example, set rad_p(pc) equal to the color at p's projection.

Figure 2. Illustration of the Visibility and Non-Photo-Consistency Lemmas. If p is non-photo-consistent with the photographs at c_1, c_2, c_3, it is non-photo-consistent with the entire set Vis_V'(p), which also includes c_4.

Given a shape V, the Consistency Check Criterion gives us a way to establish the photo-consistency of every point on V's surface. This criterion defines a general class of radiance models, that we call locally computable, that are characterized by a locality property: the radiance at any point is independent of the radiance of all other points in the scene. The class of locally-computable radiance models therefore restricts our analysis to scenes where global illumination effects such as transparency, inter-reflection, and shadows can be ignored. This class subsumes the Lambertian and other parameterized radiance models.4

Given an a priori locally computable radiance model for the scene, we can determine whether or not a given shape V is photo-consistent with a collection of photographs. Even more importantly, when the scene's radiance is described by such a model, the non-photo-consistency of a shape V tells us a great deal about the shape of the underlying scene. We use the following two lemmas to make explicit the structure of the family of photo-consistent shapes. These lemmas provide the analytical tools needed to describe how the non-photo-consistency of a shape V affects the photo-consistency of its subsets (Figure 2):

Lemma 1 (Visibility Lemma) Let p be a point on V's surface, Surf(V), and let Vis_V(p) be the collection of input photographs in which V does not occlude p. If V' ⊆ V is a shape that also has p on its surface, then Vis_V(p) ⊆ Vis_V'(p).
Proof: Since V' is a subset of V, no point of V' can lie between p and the cameras corresponding to Vis_V(p). QED

Lemma 2 (Non-Photo-Consistency Lemma) If p ∈ Surf(V) is not photo-consistent with a subset of Vis_V(p), it is not photo-consistent with Vis_V(p).

4 Specific examples include (1) using a mobile camera mounted with a light source to capture photographs of a scene whose reflectance can be expressed in closed form (e.g., using the Torrance-Sparrow model [10,27]), and (2) using multiple cameras to capture photographs of an approximately Lambertian scene under arbitrary unknown illumination (Figure 1).


Figure 3. Trivial shape solutions in the absence of free-space constraints. A two-dimensional object consisting of a black square whose sides are painted four distinct diffuse colors (red, blue, orange, and green) is viewed by four cameras. Carving out a small circle around each camera and projecting the image onto the interior of that circle yields a trivial photo-consistent shape.

Intuitively, Lemmas 1 and 2 suggest that both visibility and non-photo-consistency exhibit a form of "monotonicity": the Visibility Lemma tells us that the collection of photographs from which a surface point p ∈ Surf(V) is visible strictly expands as V gets smaller (Figure 2). Analogously, the Non-Photo-Consistency Lemma, which follows as a direct consequence of the definition of photo-consistency, tells us that each new photograph can be thought of as an additional constraint on the photo-consistency of surface points: the more photographs are available, the more difficult it is for those points to achieve photo-consistency. Furthermore, once a surface point fails to be photo-consistent no new photograph of that point can re-establish photo-consistency.

The key consequence of Lemmas 1 and 2 is given by the following theorem, which shows that non-photo-consistency at a point rules out the photo-consistency of an entire family of shapes:

Theorem 1 (Subset Theorem) If p ∈ Surf(V) is not photo-consistent, no photo-consistent subset of V contains p.

Proof: Let V' ⊆ V be a shape that contains p. Since p lies on the surface of V, it must also lie on the surface of V'. From the Visibility Lemma it follows that Vis_V(p) ⊆ Vis_V'(p). The theorem now follows by applying the Non-Photo-Consistency Lemma to V' and using the locality property of locally computable radiance models. QED

We explore the ramifications of the Subset Theorem in the next section.

3. The Photo Hull

The family of all shapes that are photo-consistent with N photographs defines the ambiguity inherent in the problem of recovering 3D shape from those photographs. When there is more than one photo-consistent shape it is impossible to decide, based on those photographs alone, which photo-consistent shape corresponds to the true scene. This ambiguity raises two important questions regarding the feasibility of scene reconstruction from photographs:

- Is it possible to compute a shape that is photo-consistent with N photographs and, if so, what is the algorithm?
- If a photo-consistent shape can be computed, how can we relate that shape to all other photo-consistent 3D interpretations of the scene?

Before providing a general answer to these questions we observe that when the number of input photographs is finite, the first question can be answered with a trivial shape (Figure 3). In general, trivial shape solutions such as this one can only be eliminated with the incorporation of free space constraints, i.e., regions of space that are known not to contain scene points. Our analysis enables the (optional) inclusion of such constraints by specifying an arbitrary shape V within which a photo-consistent scene is known to lie.5

In particular, our answers to both questions rest on the following theorem. Theorem 2 shows that for any shape V there is a unique photo-consistent shape that subsumes, i.e., contains within its volume, all other photo-consistent shapes in V (Figure 4):

Theorem 2 (Photo Hull Theorem) Let V be an arbitrary set of points and let V* be the union of all photo-consistent subsets of V. The shape V* is photo-consistent and is called the photo hull.6

Proof: (By contradiction) Suppose V* is not photo-consistent and let p be a non-photo-consistent point on its surface. Since p ∈ V*, there exists a photo-consistent shape, V' ⊆ V*, that also has p on its surface. It follows from the Subset Theorem that V' is not photo-consistent. QED

Theorem 2 provides an explicit relation between the photo hull and all other possible 3D interpretations of the scene: the theorem guarantees that every such interpretation is a subset of the photo hull. The photo hull therefore represents a least-commitment reconstruction of the scene. We describe a volumetric algorithm for computing this shape in the next section.

4. Reconstruction by Space Carving

An important feature of the photo hull is that it can be computed using a simple, discrete algorithm that "carves" space in a well-defined manner. Given an initial volume V that contains the scene, the algorithm proceeds by iteratively removing (i.e. "carving") portions of that volume until it converges to the photo hull, V*. The algorithm can therefore be fully specified by answering four questions: (1) how do we select the initial volume V, (2) how should we represent that volume to facilitate carving, (3) how do we carve at each iteration to guarantee convergence to the photo hull, and (4) when do we terminate carving?

5 Note that if V = R^3, the problem reduces to the case when no constraints on free space are available.

6 Our use of the term photo hull to denote the "maximal" photo-consistent shape defined by a collection of photographs is due to a suggestion by Leonard McMillan.

Figure 4. Illustration of the Photo Hull Theorem. The gray-shaded region corresponds to an arbitrary shape V containing the object of Figure 3. V* is a polygonal region that extends beyond the true scene and whose boundary is defined by the polygonal segments shown. When these segments are colored as shown, V*'s projections are indistinguishable from that of the true object and no photo-consistent shape in the gray-shaded region can contain points outside V*.

The choice of the initial volume has a considerable impact on the outcome of the reconstruction process (Figure 3). Nevertheless, selection of this volume is beyond the scope of this paper; it will depend on the specific 3D shape recovery application and on information about the manner in which the input photographs were acquired.7 Below we consider a general algorithm that, given N photographs and any initial volume that contains the scene, is guaranteed to find the (unique) photo hull contained in that volume.

In particular, let V be an arbitrary finite volume that contains the scene as an unknown sub-volume. Also, assume that the surface of the true scene conforms to a radiance model defined by a consistency check algorithm consist_K(). We represent V as a finite collection of voxels v_1, ..., v_M. Using this representation, each carving iteration removes a single voxel from V.

The Subset Theorem leads directly to a method for selecting a voxel to carve away from V at each iteration. Specifically, the theorem tells us that if a voxel v on the surface of V is not photo-consistent, the volume V - {v} must still contain the photo hull. Hence, if only non-photo-consistent voxels are removed at each iteration, the carved volume is guaranteed to converge to the photo hull.

7 Examples include defining V to be equal to the visual hull or, in the case of a camera moving through an environment, R^3 minus a tube along the camera's path.

The order in which non-photo-consistent voxels are examined and removed is not important for guaranteeing correctness. Convergence to this shape occurs when no non-photo-consistent voxel can be found on the surface of the carved volume. These considerations lead to the following algorithm for computing the photo hull:8

Space Carving Algorithm

Step 1: Initialize V to a volume containing the true scene.

Step 2: Repeat the following steps for voxels v ∈ Surf(V) until a non-photo-consistent voxel is found:

a. Project v to all photographs in Vis_V(v). Let col_1, ..., col_j be the pixel colors to which v projects and let ξ_1, ..., ξ_j be the optical rays connecting v to the corresponding optical centers.

b. Determine the photo-consistency of v using consist_K(col_1, ..., col_j, ξ_1, ..., ξ_j).

Step 3: If no non-photo-consistent voxel is found, set V* = V and terminate. Otherwise, set V = V - {v} and repeat Step 2.
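A minimal sketch of Steps 1-3 follows, assuming hypothetical helpers surface_voxels(V) for Surf(V), visible_cameras(V, v) for Vis_V(v), and per-camera project()/ray_to() accessors; the multi-sweep implementation described below avoids recomputing visibility explicitly.

```python
def space_carve(initial_voxels, images, consist_k, surface_voxels, visible_cameras):
    """Sketch of Steps 1-3 of the Space Carving Algorithm.

    surface_voxels(V) plays the role of Surf(V) and visible_cameras(V, v)
    the role of Vis_V(v); these, and the per-camera project()/ray_to()
    accessors, are hypothetical helpers.
    """
    V = set(initial_voxels)                        # Step 1: volume containing the true scene
    while True:
        carved = None
        for v in surface_voxels(V):                # Step 2: scan surface voxels
            cols, rays = [], []
            for cam in visible_cameras(V, v):      # Step 2a: gather colors and optical rays
                u, w = cam.project(v)
                cols.append(images[cam.index].color(u, w))
                rays.append(cam.ray_to(v))
            if cols and not consist_k(cols, rays): # Step 2b: photo-consistency check
                carved = v
                break
        if carved is None:                         # Step 3: no inconsistent voxel found
            return V                               # V* = V
        V.discard(carved)                          # Step 3: carve it and repeat Step 2
```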

The key step in the algorithm is the search and voxel consistency checking of Step 2. The following proposition gives an upper bound on the number of voxel photo-consistency checks:9

Proposition 1 The total number of required photo-consistency checks is bounded by N * M, where N is the number of input photographs and M is the number of voxels in the initial (i.e., uncarved) volume.

To perform visibility computations efficiently, we use a multi-sweep implementation of space carving. In what follows, we briefly summarize the technique, but full details of the approach are omitted due to space limitations. Each pass consists of sweeping a plane through the scene volume and testing the photo-consistency of voxels on that plane. The advantage of this method is that voxels are always visited in an order that captures all occlusion relations between the entire set of voxels and an appropriately-chosen subset C of the cameras: each sweep guarantees that if a voxel p occludes another voxel q when viewed from a camera in C, p will necessarily be visited before q. This is achieved by choosing C to be the set of all cameras that lie on one side of the sweep plane. Since each plane sweep considers only a subset of the cameras from which a voxel may be visible, multiple sweeps are needed to ensure photo-consistency of voxels with all input views. Our implementation cycles through six directions in each pass, i.e., in increasing/decreasing x, y, and z directions, and applies repeated passes until the carving procedure converges. In practice, this typically occurs after 2 or 3 passes.
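The camera selection that yields this ordering can be sketched as follows; the integer grid coordinates for voxels and camera centers and the visit callback are assumptions made for illustration.

```python
def plane_sweep(voxels, cameras, axis, increasing, visit):
    """One axis-aligned plane sweep (a sketch of the visiting order only).

    voxels: iterable of integer (x, y, z) grid positions; cameras expose a
    .center in the same coordinates; visit(voxel, active_cameras) performs
    the photo-consistency test.
    """
    ordered = sorted(voxels, key=lambda p: p[axis], reverse=not increasing)
    for p in ordered:
        plane = p[axis]
        # Use only cameras on the already-swept side of the plane, so an
        # occluding voxel is always visited before the voxels it occludes.
        if increasing:
            active = [c for c in cameras if c.center[axis] < plane]
        else:
            active = [c for c in cameras if c.center[axis] > plane]
        if active:
            visit(p, active)
```

A full pass would call this sweep six times, once for each increasing and decreasing x, y, and z direction, with passes repeated until no voxel is carved.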

8 Convergence to this shape is provably guaranteed only for scenes representable by a discrete set of voxels.

9 Proof is omitted due to lack of space.


5. Experimental Results

To demonstrate the applicability of our approach, we performed several experiments on real and synthetic image sequences. In all examples, a Lambertian model was used for the Consistency Check Criterion, i.e., it was assumed that a voxel projects to pixels of approximately the same color in every image. We used a threshold on the standard deviation of these pixels to decide whether or not to carve a voxel.
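A sketch of this Lambertian consistency test follows, assuming the threshold is expressed in the same units as the pixel values; the viewing rays are not needed in the Lambertian case.

```python
import numpy as np

def lambertian_consistent(colors, threshold):
    """Accept a voxel if the pixels it projects to agree to within `threshold`.

    colors: RGB values gathered from the images that see the voxel; the
    test thresholds their per-channel standard deviation.
    """
    c = np.asarray(colors, dtype=float)
    if len(c) < 2:             # a voxel seen by at most one camera cannot be rejected
        return True
    return bool(np.all(np.std(c, axis=0) <= threshold))
```

With 8-bit color channels, the 6% component error quoted below for the gargoyle sequence corresponds to roughly 15 gray levels.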

We first ran the Space Carving Algorithm on 16 images of a gargoyle sculpture (Figs. 5a-e). The sub-pixel calibration error in this sequence enabled using a small threshold of 6% for the RGB component error. This threshold, along with the voxel size and the 3D coordinates of a bounding box containing the object, were the only parameters given as input to our implementation. Some errors are still present in the reconstruction, notably holes that occur as a result of shadows and other illumination changes caused by moving the object rather than the camera. These effects were not accounted for by the radiance model, causing some voxels to be erroneously carved. The finite voxel size, calibration error, and image discretization effects resulted in a loss of some fine surface detail. Voxel size could be further reduced with better calibration, but only up to the point where image discretization effects (i.e., finite pixel size) become a significant source of error. Figs. 5f-i show results from applying our algorithm to images of a human hand.

In a final experiment, we applied our algorithm to images of a synthetic building scene rendered from both its interior and exterior (Figure 6). This placement of cameras yields an extremely difficult stereo problem, due to the drastic changes in visibility between interior and exterior cameras.10 Figure 6 compares the original model and the reconstruction from different viewpoints. The model's appearance is very good near the input viewpoints, as demonstrated in Figs. 6b-c. Note that the reconstruction tends to "bulge" out and that the walls are not perfectly planar (Figure 6e). This behavior is exactly as predicted by Theorem 2: the algorithm converges to the largest possible shape that is consistent with the input images. In low-contrast regions where shape is visually ambiguous, this causes significant deviations between the computed photo hull and the true scene. While these deviations do not adversely affect scene appearance near the input viewpoints, they can result in noticeable artifacts for far-away views. These deviations and the visual artifacts they cause are easily remedied by including images from a wider range of camera viewpoints to further constrain the scene's shape, as shown in Figure 6f.

Our experiments highlight a number of advantages of our approach over previous techniques. Existing multi-baseline stereo techniques [1] work best for densely textured scenes and suffer in the presence of large occlusions. In contrast, the gargoyle and hand sequences contain many low-textured regions and dramatic changes in visibility. While contour-based techniques like volume intersection [25] work well for similar scenes, they require detecting silhouettes or occluding contours. For the gargoyle sequence, the background was unknown and heterogeneous, making the contour detection problem extremely difficult. Note also that Seitz and Dyer's voxel coloring technique [16] would not work for any of the above sequences because of the constraints it imposes on camera placement. Our approach succeeds because it integrates both texture and contour information as appropriate, without the need to explicitly detect features or contours, or constrain viewpoints. Our results indicate the approach is highly effective for both densely textured and untextured objects and scenes.

10 For example, the algorithms in [16,17] fail catastrophically for this scene because the distribution of the input views and the resulting occlusion relationships violate the assumptions used by those algorithms.

6. Concluding Remarks

While the Space Carving Algorithm's effectiveness was demonstrated in the presence of image noise, the photo-consistency theory itself is based on an idealized model of image formation. Extending the theory to explicitly model image noise, quantization and calibration errors, and their effects on the photo hull is an open research problem. Extending the formulation to handle non-locally computable radiance models (e.g., shadows) is another important topic of future work. Other research directions include (1) developing space carving algorithms for noisy images, (2) investigating the use of surface-based rather than voxel-based techniques for finding the photo hull, (3) incorporating a priori shape constraints (e.g., smoothness), and (4) analyzing the topological structure of the set of photo-consistent shapes.

References

[1] M. Okutomi and T. Kanade, "A multiple-baseline stereo," T-PAMI, v.15, pp. 353–363, 1993.

[2] K. N. Kutulakos and C. R. Dyer, "Recovering shape by purposive viewpoint adjustment," IJCV, v.12, n.2, pp. 113–136, 1994.

[3] R. C. Bolles, H. H. Baker, and D. H. Marimont, "Epipolar-plane image analysis: An approach to determining structure from motion," IJCV, v.1, pp. 7–55, 1987.

[4] R. Cipolla and A. Blake, "Surface shape from the deformation of apparent contours," IJCV, v.9, n.2, pp. 83–112, 1992.

[5] A. Laurentini, "The visual hull concept for silhouette-based image understanding," T-PAMI, v.16, pp. 150–162, 1994.

[6] K. N. Kutulakos and C. R. Dyer, "Global surface reconstruction by purposive control of observer motion," Artificial Intelligence Journal, v.78, n.1-2, pp. 147–177, 1995.

[7] P. N. Belhumeur, "A Bayesian approach to binocular stereopsis," IJCV, v.19, n.3, pp. 237–260, 1996.

[8] I. Cox, S. Hingorani, S. Rao, and B. Maggs, "A maximum likelihood stereo algorithm," CVIU: Image Understanding, v.63, n.3, pp. 542–567, 1996.



Figure 5. Reconstruction results for two real scenes. (a-e) Reconstruction of a gargoyle stone sculpture. Four out of 16 486x720 RGB input images are shown in (a). The images were acquired by (1) rotating the object in front of a stationary background in 22.5-degree increments, and (2) altering the object's background before each image was acquired. This latter step enabled complete reconstruction of the sculpture without any initial segmentation step: the space carving process ensured that photo-consistency could not be enforced for points projecting to non-object pixels. (b-e) Reconstruction of the sculpture. The model contains 215,000 surface voxels, carved out of an initial volume of approximately 51 million. It took 250 minutes to compute on an SGI O2 R10000/175MHz workstation. One of the 16 input images is shown (b), along with views of the reconstruction from the same (c) and new (d-e) viewpoints. (f-i) Reconstruction of a hand. Three out of one hundred calibrated images of the hand are shown in (f). The sequence was generated by moving a camera around the subject's stationary hand. (g-i) Views of the reconstruction. The reconstructed model was computed using an RGB component error threshold of 15. The model has 112,000 voxels and took 53 minutes to compute on the SGI O2 using a graphics hardware-accelerated version of the algorithm.

[9] C. V. Stewart, "MINPRAN: A new robust estimator for computer vision," T-PAMI, v.17, pp. 925–938, 1995.

[10] K. E. Torrance and E. M. Sparrow, "Theory of off-specular reflection from roughened surfaces," JOSA, v.57, pp. 1105–1114, 1967.

[11] O. D. Faugeras, "Personal communication."



Figure 6. Reconstruction of a synthetic building scene. (a) 24 cameras were placed in both the interior and exterior of a building to enable simultaneous, complete reconstruction of its exterior and interior surfaces. The reconstruction contains 370,000 voxels, carved out of the initial voxel block. (b) A rendered image of the building for a viewpoint near the input cameras (shown as "virtual view" in (a)) is compared to the view of the reconstruction (c). (d-f) Views of the reconstruction from far away camera viewpoints. (d) shows a rendered top view of the original building, (e) the same view of the reconstruction, and (f) a new reconstruction resulting from adding image (d) to the set of input views. Note that adding just a single top view dramatically improves the quality of the reconstruction.

[12] R. L. Alfvin and M. D. Fairchild, "Observer variability in metameric color matches using color reproduction media," Color Research & Application, v.22, n.3, pp. 174–178, 1997.

[13] J. A. J. C. van Veen and P. Werkhoven, "Metamerisms in structure-from-motion perception," Vision Research, v.36, pp. 2197–2210, 1996.

[14] P. Fua and Y. G. Leclerc, "Object-centered surface reconstruction: Combining multi-image stereo and shading," IJCV, v.16, pp. 35–56, 1995.

[15] R. T. Collins, "A space-sweep approach to true multi-image matching," Proc. CVPR, pp. 358–363, 1996.

[16] S. M. Seitz and C. R. Dyer, "Photorealistic scene reconstruction by voxel coloring," Proc. CVPR, pp. 1067–1073, 1997.

[17] S. M. Seitz and K. N. Kutulakos, "Plenoptic image editing," Proc. ICCV, pp. 17–24, 1998.

[18] C. L. Zitnick and J. A. Webb, "Multi-baseline stereo using surface extraction," Tech. Rep. CMU-CS-96-196, Carnegie Mellon University, Pittsburgh, PA, November 1996.

[19] P. J. Narayanan, P. W. Rander, and T. Kanade, "Constructing virtual worlds using dense stereo," Proc. ICCV, pp. 3–10, 1998.

[20] R. Szeliski and P. Golland, "Stereo matching with transparency and matting," Proc. ICCV, pp. 517–524, 1998.

[21] S. Roy and I. J. Cox, "A maximum-flow formulation of the N-camera stereo correspondence problem," Proc. ICCV, pp. 492–499, 1998.

[22] O. Faugeras and R. Keriven, "Complete dense stereovision using level set methods," Proc. ECCV, pp. 379–393, 1998.

[23] M. S. Langer and S. W. Zucker, "Shape-from-shading on a cloudy day," JOSA-A, v.11, n.2, pp. 467–478, 1994.

[24] W. B. Seales and O. Faugeras, "Building three-dimensional object models from image sequences," CVIU, v.61, n.3, pp. 308–324, 1995.

[25] R. Szeliski, "Rapid octree construction from image sequences," CVGIP: Image Understanding, v.58, n.1, pp. 23–32, 1993.

[26] K. N. Kutulakos, "Shape from the light field boundary," Proc. CVPR, pp. 53–59, 1997.

[27] Y. Sato, M. D. Wheeler, and K. Ikeuchi, "Object shape and reflectance modeling from observation," Proc. SIGGRAPH, pp. 379–387, 1997.