43
Feature-Based Similarity Search in 3D Object Databases BENJAMIN BUSTOS, DANIEL A. KEIM, DIETMAR SAUPE, TOBIAS SCHRECK, AND DEJAN V. VRANI ´ C University of Konstanz The development of effective content-based multimedia search systems is an important research issue due to the growing amount of digital audio-visual information. In the case of images and video, the growth of digital data has been observed since the introduction of 2D capture devices. A similar development is expected for 3D data as acquisition and dissemination technology of 3D models is constantly improving. 3D objects are becoming an important type of multimedia data with many promising application possibilities. Defining the aspects that constitute the similarity among 3D objects and designing algorithms that implement such similarity definitions is a difficult problem. Over the last few years, a strong interest in methods for 3D similarity search has arisen, and a growing number of competing algorithms for content-based retrieval of 3D objects have been proposed. We survey feature-based methods for 3D retrieval, and we propose a taxonomy for these methods. We also present experimental results, comparing the effectiveness of some of the surveyed methods. Categories and Subject Descriptors: I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Curve, surface, solid, and object representations; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval General Terms: Algorithms Additional Key Words and Phrases: 3D model retrieval, content-based similarity search 1. INTRODUCTION The development of multimedia database systems and retrieval components is be- coming increasingly important due to a This work was partially funded by the German Research Foundation (DFG), Projects KE 740/6-1, KE 740/8-1, and SA 449/10-1 within the strategic research initiative SPP 1041 “Distributed Processing and Delivery of Digital Documents” (V3D2). It was also partially funded by the Information Society Technologies program of the European Commission, Future and Emerging Technologies under the IST-2001-33058 PANDA project (2001–2004). The first author is on leave from the Department of Computer Science, University of Chile. Authors’ address: Department of Computer and Information Science, University of Konstanz, Uni- versitaetsstr. 10 Box D78, 78457 Konstanz, Germany; email: {bustos,keim,saupe,schreck,vranic}@ informatik.uni-konstanz.de. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected]. c 2005 ACM 0360-0300/05/1200-0345 $5.00 rapidly growing amount of available mul- timedia data. As we see progress in the fields of acquisition, storage, and dissem- ination of various multimedia formats, the application of effective and efficient ACM Computing Surveys, Vol. 37, No. 4, December 2005, pp. 345–387.

Feature-Based Similarity Search in 3D Object …...Feature-Based Similarity Search in 3D Object Databases 347 rounding, and small shift errors must be handled accordingly [Ankerst

  • Upload
    others

  • View
    28

  • Download
    0

Embed Size (px)

Citation preview

Feature-Based Similarity Search in 3D Object Databases

BENJAMIN BUSTOS, DANIEL A. KEIM, DIETMAR SAUPE, TOBIAS SCHRECK, ANDDEJAN V. VRANIC

University of Konstanz

The development of effective content-based multimedia search systems is an important

research issue due to the growing amount of digital audio-visual information. In the

case of images and video, the growth of digital data has been observed since the

introduction of 2D capture devices. A similar development is expected for 3D data as

acquisition and dissemination technology of 3D models is constantly improving. 3D

objects are becoming an important type of multimedia data with many promising

application possibilities. Defining the aspects that constitute the similarity among 3D

objects and designing algorithms that implement such similarity definitions is a

difficult problem. Over the last few years, a strong interest in methods for 3D similarity

search has arisen, and a growing number of competing algorithms for content-based

retrieval of 3D objects have been proposed. We survey feature-based methods for 3D

retrieval, and we propose a taxonomy for these methods. We also present experimental

results, comparing the effectiveness of some of the surveyed methods.

Categories and Subject Descriptors: I.3.5 [Computer Graphics]: Computational

Geometry and Object Modeling—Curve, surface, solid, and object representations;

I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism;

H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms: Algorithms

Additional Key Words and Phrases: 3D model retrieval, content-based similarity search

1. INTRODUCTION

The development of multimedia databasesystems and retrieval components is be-coming increasingly important due to a

This work was partially funded by the German Research Foundation (DFG), Projects KE 740/6-1, KE 740/8-1,and SA 449/10-1 within the strategic research initiative SPP 1041 “Distributed Processing and Delivery ofDigital Documents” (V3D2). It was also partially funded by the Information Society Technologies programof the European Commission, Future and Emerging Technologies under the IST-2001-33058 PANDA project(2001–2004).

The first author is on leave from the Department of Computer Science, University of Chile.

Authors’ address: Department of Computer and Information Science, University of Konstanz, Uni-versitaetsstr. 10 Box D78, 78457 Konstanz, Germany; email: {bustos,keim,saupe,schreck,vranic}@informatik.uni-konstanz.de.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is grantedwithout fee provided that copies are not made or distributed for profit or direct commercial advantage andthat copies show this notice on the first page or initial screen of a display along with the full citation.Copyrights for components of this work owned by others than ACM must be honored. Abstracting withcredit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use anycomponent of this work in other works requires prior specific permission and/or a fee. Permissions may berequested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212)869-0481, or [email protected]©2005 ACM 0360-0300/05/1200-0345 $5.00

rapidly growing amount of available mul-timedia data. As we see progress in thefields of acquisition, storage, and dissem-ination of various multimedia formats,the application of effective and efficient

ACM Computing Surveys, Vol. 37, No. 4, December 2005, pp. 345–387.

346 B. Bustos et al.

database management systems to han-dle these formats is highly desirable. Theneed is obvious for image and video con-tent. In the case of 3D objects, a simi-lar development is expected in the nearfuture. The improvement in 3D scannertechnology and the availability of 3D mod-els widely distributed over the Internetare rapidly contributing to the creation oflarge databases of this type of multime-dia data. Also, the continuing advances ingraphics hardware are making fast pro-cessing of this complex data possible and,as a result, this technology is becomingavailable to a wide range of potential usersat a relatively low cost compared with thesituation ten years ago.

One of the most important tasks ina multimedia retrieval system is to im-plement effective and efficient similar-ity search algorithms. Multimedia ob-jects cannot be meaningfully queried inthe classical sense (exact search) becausethe probability that two multimedia ob-jects are identical is negligible unlessthey are digital copies from the samesource. Instead, a query in a multimediadatabase system usually requests a num-ber of the most similar objects to a givenquery object or a manually entered queryspecification.

One approach to implement similaritysearch in multimedia databases is by us-ing annotation information that describesthe content of the multimedia object. Un-fortunately, this approach is not very prac-ticable in large multimedia repositoriesbecause, in most cases, textual descrip-tions have to be generated manually andare difficult to extract automatically. Also,they are subject to the standards adoptedby the person who created them and can-not encode all the information availablein the multimedia object. A more promis-ing approach for implementing a similar-ity search system is using the multimediadata itself which is called content-basedsearch. In this approach, the multimediadata itself is used to perform a similarityquery. Figure 1 illustrates the concept ofcontent-based 3D similarity search. Thequery object is a 3D model of a chair. Thesystem is expected to retrieve similar 3D

Fig. 1. Example of a similarity search on a databaseof 3D objects, showing a query object (q) and a set ofpossible relevant retrieval answers (a).

objects from the database as shown inFigure 1.

1.1. Similarity Search in 3D ObjectDatabases

The problem of searching similar 3D ob-jects arises in a number of fields. Exam-ple problem domains include ComputerAided Design/Computer Aided Manufac-turing (CAD/CAM), virtual reality (VR),medicine, molecular biology, military ap-plications, and entertainment.

—In medicine, the detection of similarorgan deformations can be used fordiagnostic purposes. For example, thecurrent medical theory of child epilepsyassumes that an irregular developmentof a specific portion of the brain, calledthe hippocampus, is the reason forepilepsy. Several studies show that thesize and shape of the deformation of thehippocampus may indicate the defect,and this is used to decide whether ornot to remove the hippocampus by brainsurgery. Similarity search in a databaseof 3D hippocampi models can supportthe decision process and help to avoidunnecessary surgeries [Keim 1999].

—Structural classification is a basic taskin molecular biology. This classificationcan be successfully approached bysimilarity search, where proteins andmolecules are modeled as 3D objects. In-accuracies in the molecule 3D model dueto measurement, sampling, numerical

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 347

rounding, and small shift errors mustbe handled accordingly [Ankerst et al.1999b].

—For a number of years, many weatherforecast centers include pollen-forecastsin their reports in order to warn andaid people allergic to different kinds ofpollen. Ronneberger et al. [2002] devel-oped a pattern recognition system thatclassifies pollen from 3D volumetricdata acquired using a confocal laserscan microscope. The 3D structure ofpollen can be extracted. Grey scale in-variants provide components of featurevectors for classification.

—Forensic institutes around the worldmust deal with the complex task of iden-tifying any tablets with illicit products(drug pills). In conjunction with chem-ical analysis, physical characteristic ofthe pill (e.g., shape and imprint) areused in the identification process. Theshape and imprint recognition methodsinclude object bounding box, region-based shape and contour-based shapewhich can be used to define a 3D model ofthe pill. A similarity search system canbe used to report similarities betweenthe studied pill and the models of knownillicit tablets [Geradts et al. 2001].

—A 3D object database can be used to sup-port CAD tools because a 3D object canexactly model the geometry of an object,and any information needed about it canbe derived from the 3D model, for exam-ple, any possible 2D view of the object.These CAD tools have many applica-tions in industrial design. For example,standard parts in a manufacturingcompany can be modeled as 3D objects.When a new product is designed, it canbe composed by many small parts thatfit together to form the product. If someof these parts are similar to one of thestandard parts already designed, thenthe possible replacement of the originalpart with the standard part can lead toa reduction in production costs.

—Another industrial application is theproblem of fitting shoes [Novotni andKlein 2001a]. A 3D model of the client’sfoot is generated using a 3D scanning

tool. Next, a similarity search is per-formed to discard the most unlikelyfitting models according to the client’sfoot. The remaining candidates are thenexactly inspected to determine the bestmatch.

—A friend/foe detection system is sup-posed to determine whether a givenobject (e.g., a plane or a tank) is con-sidered friendly or hostile based on itsgeometric classification. This kind ofsystem has obvious applications inmilitary defense. One way to implementsuch a detection system is to store 3Dmodels of the known friendly or hostileobjects, and the system determines theclassification of a given object basedon the similarity definition and thedatabase of reference objects. As suchdecisions must be reached in real-timeand are obviously critical, high effi-ciency and effectiveness of the retrievalsystem is a dominant requirement forthis application.

—Movie and video game producers makeheavy use of 3D models to enhancerealism in entertainment applications.Reuse and adaptation of 3D objects bysimilarity search in existing databasesis a promising approach to reduceproduction costs.

As 3D objects are used in diverse appli-cation domains, different forms for objectrepresentation, manipulation, and pre-sentation have been developed. In theCAD domain, objects are often built bymerging patches of parametrized surfaceswhich are edited by technical personnel.Also, constructive solid geometry tech-niques are often employed, where com-plex objects are modeled by composingprimitives. 3D acquisition devices usuallyproduce voxelized object approximations(e.g., computer tomography scanners) orclouds of 3D points (e.g., in the sensingphase of structured light scanners). Otherrepresentation forms exist like swept vol-umes or 3D grammars. Probably the mostwidely used representation is to approxi-mate a 3D object by a mesh of polygons,usually triangles. For a survey on impor-tant representation forms, see Campbell

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

348 B. Bustos et al.

and Flynn [2001]. For 3D retrieval, ba-sically all of these formats may serve asinput to a similarity query. Where avail-able, information other than pure geom-etry data can be exploited, for example,structural data that may be included ina VRML representation. Many similaritysearch methods that are presented in theliterature to date rely on triangulationsbut could easily be extended to other rep-resentation forms. Of course, it is alwayspossible to convert or approximate fromone representation to another one.

Research on describing shapes and es-tablishing similarity relations between ge-ometric and visual shape has been doneextensively in the fields of computer vi-sion, shape analysis, and computationalgeometry for several decades. In computervision, the usual method is to try to rec-ognize objects in a scene by segmenting a2D image and then matching these seg-ments to a set of a priori known referenceobjects. Specific problems involve accom-plishing invariance with respect to light-ing conditions, view perspective, clutter,and occlusion. From the database perspec-tive, it is assumed that the objects are al-ready described in their entity and can bedirectly used. Problems arise in the form ofheterogeneous object representations (of-ten certain properties of 3D objects can-not be assured), and the decision problemper se is difficult. What is the similaritynotion? Where is the similarity threshold?How much tolerance is sustainable in agiven application context, and which an-swer set sizes are required? In addition,the database perspective deals with a pos-sibly large number of objects, therefore thefocus lies not only on accurate methodsbut also on fast methods providing effi-cient answer times even on large objectrepositories.

1.2. Feature Vector Paradigm

The usage of feature vectors (FVs) isthe standard approach for multimedia re-trieval [Faloutsos 1996] when it is notclear how to compare two objects directly.The feature-based approach is generaland can be applied on any multimedia

database, but we will present it from theperspective of 3D object databases.

1.2.1. Feature Vector Extraction. Havingdefined certain object aspects, numeri-cal values are extracted from a 3D ob-ject. These values describe the 3D objectand form a feature vector (FV) of usuallyhigh dimensionality. The resulting FVsare then used for indexing and retrievalpurposes. FVs describe particular char-acteristics of an object based on the na-ture of the extraction method. For 3D ob-jects, a variety of extraction algorithmshave been proposed, ranging from basicones, for example, properties of an object’sbounding box, to more complex ones, likethe distribution of normal vectors or cur-vature, or the Fourier transform of somespherical functions that characterize theobjects. It is important to note that dif-ferent extraction algorithms capture dif-ferent characteristics of the objects. It isa difficult problem to select some particu-lar feature methods to be integrated intoa similarity search system as we find thatnot all methods are equally suited for allretrieval tasks. Ideally, a system would im-plement a set of fundamentally differentmethods, such that the appropriate fea-ture could be chosen based on the applica-tion domain and/or user preferences. Aftera method is chosen and FVs are producedfor all objects in the database, a distancefunction calculates the distance of a querypoint to all objects of the database, pro-ducing a ranking of objects in ascendingdistance.

Figure 2 shows the principle of afeature-based similarity search system.The FV is extracted from the original 3Dquery object, producing a vector v ∈ R

d forsome dimensionality d .

The specific FV type and its givenparametrization determine the extractionprocedure and the resulting vector dimen-sionality. In general, different levels ofresolution for the FV are allowed: morerefined descriptors are obtained usinghigher resolutions. After the FV extrac-tion, the similarity search is performed ei-ther by a full scan of the database or by

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 349

Fig. 2. Feature-based similarity search.

using an index structure to retrieve therelevant models.

1.2.2. Metrics for Feature Vectors. Thesimilarity measure of two 3D objects is de-termined by a nonnegative, real number.Generally, a similarity measure is, there-fore, a function of the form

δ : Obj × Obj → R+0 ,

where Obj is a suitable space of 3D objects.Small values of δ denote strong similarity,and high values of δ correspond to dissim-ilarity.

Let U be the 3D object database and letq be the query 3D object. There are ba-sically two types of similarity queries inmultimedia databases.

—Range queries. A range query (q, r), forsome tolerance value r ∈ R

+, reportsall objects from the database that arewithin distance r to q, that is, (q, r) ={u ∈ U, δ(u, q) ≤ r}.

—k nearest neighbors (k-NN) queries. It re-ports the k objects from U closest to q,that is, it returns a set C ⊆ U such as|C| = k and for all u ∈ C and v ∈ U − C,δ(u, q) ≤ δ(v, q).

Assume that a FV of dimension d istaken for a similarity search. In typicalretrieval systems, the similarity measureδ(u, v) is simply obtained by a metric dis-tance L (�x, �y) in the d -dimensional spaceof FVs, where �x and �y denote the FVs ofu and v, respectively. An important fam-ily of similarity functions in vector spacesis the Minkowski (Ls) family of distances,

defined as:

Ls (�x, �y) =( ∑

1≤i≤d

|xi − yi |s)1/s

, �x, �y ∈ Rd , s ≥ 1.

Examples of these distance functionsare L1, which is called Manhattan dis-tance, L2, which is the Euclidean dis-tance, and the maximum distance L∞ =max1≤i≤d |xi − yi|.

A first extension to the standardMinkowski distance is to apply a weightvector w that weighs the influence thateach pair of components exerts on the to-tal distance value. This is useful if a userhas knowledge about the semantics of theFV components. Then, she can manuallyassign weights based on her preferenceswith respect to the components. If no suchexplicit knowledge exists, it is still possibleto generate weighting schemes based onrelevance feedback, see for example, Eladet al. [2002]. The basic idea in relevancefeedback is to let the user assign relevancescores to a number of retrieved results.Then, the query metric may automaticallybe adjusted such that the new ranking isin better agreement with the supplied rel-evance scores, and thereby (presumably)producing novel (previously not seen) rel-evant objects in the answer set.

If the feature components correspondto histogram data, several further exten-sions to the standard Minkowski distancecan be applied. In the context of imagesimilarity search, color histograms are of-ten used. The descriptors then consist ofhistogram bins, and cross-similarities canbe used to reflect natural neighborhood

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

350 B. Bustos et al.

similarities among different bins. Oneprominent example for employing cross-similarities is the QBIC system [Ashleyet al. 1995] where results from human per-ceptual research are used to determinea suitable cross-similarity scheme. It isshown that quadratic forms are the natu-ral way to handle these cross-similaritiesformally and that they can be efficientlyevaluated for a given database [Seidl andKriegel 1997]. If such intrafeature cross-similarities can be identified, quadraticforms may also be used for 3D similaritysearch, as done, for example, in the shapehistogram approach (Section 3.4.2). Apartfrom Minkowski and quadratic forms,other distance functions for distributionscan be borrowed from statistics and infor-mation theory. But this variety of distancefunctions also makes it difficult to selectthe appropriate distance function as theretrieval effectiveness of a given metric de-pends on the data to be retrieved and theextracted features [Puzicha et al. 1999].

1.3. Overview

The remainder of this article presents asurvey of approaches for searching 3D ob-jects in multimedia databases under thefeature vector paradigm. In Section 2, wediscuss fundamental issues of similaritysearch in 3D objects databases. In Section3, we review and classify feature-basedmethods for describing and comparing 3Dobjects that are suited for database de-ployment. A comparison in Section 4 triesto contrast the surveyed approaches withrespect to important characteristics andgives experimental retrieval effectivenessbenchmarks that we performed on a num-ber of algorithms. Finally, in Section 5, wedraw some conclusions and outline futurework in the area.

2. PROBLEMS AND CHALLENGES OF 3DSIMILARITY SEARCH SYSTEMS

Ultimately, the goal in 3D similaritysearch is to design database systems thatstore 3D objects and effectively and effi-ciently support similarity queries. In thissection, we discuss the main problems

posed by similarity search in 3D objectdatabases.

2.1. Descriptors for 3D Similarity Search

3D objects can represent complex informa-tion. The difficulties to overcome in defin-ing similarity between spatial objects arecomparable to those for the same task ap-plied to 2D images. Geometric propertiesof 3D objects can be given by a numberof representation formats as outlined inthe introduction. Depending on the for-mat, surface and matter properties canbe specified. The object’s resolution canbe arbitrarily set. Given that there is nofounded theory on a universally applica-ble description of 3D shapes or how to usethe models directly for similarity search,in a large class of methods for similar-ity ranking, the 3D data is transformedin some way to obtain numeric descrip-tors for indexing and retrieval. We alsorefer to these descriptors as feature vec-tors (FVs). The basic idea is to extract nu-meric data that describe the objects undersome identified geometric aspect and to in-fer the similarity of the models from thedistance of these numerical descriptions insome metric space. The similarity notionis derived by an application context thatdefines which aspects are of relevance forsimilarity. Similarity relations among ob-jects obtained in this way are then subjectto the specific similarity model employedand may not reflect similarity in a differ-ent application context.

The feature-based approach has severaladvantages compared to other approachesfor implementing similarity search. Theextraction of features from multime-dia data is usually fast and easilyparametrizable. Metrics for FVs such asthe Minkowski distances can also be effi-ciently computed. Spatial access methods[Bohm et al. 2001] or metric access meth-ods [Chavez et al. 2001] can be used toindex the obtained FVs. All these advan-tages make the feature-based approach agood candidate for implementing a 3D ob-ject similarity search engine.

3D similarity can also be estimatedunder paradigms other than the FV

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 351

Fig. 3. A 3D object in different scale and orientation (left) and also represented with anincreasing level of detail (right).

approach. Generally, nonnumeric descrip-tions can be extracted from 3D objects,like structural information. Also, directgeometric matching is an approach. Here,how easily a certain object can be trans-formed into another one is measured, anda cost associated by this transformationserves as the metric for similarity. Usuallythese metrics are computationally costlyto compute and do not always hold the tri-angle inequality, therefore it is more dif-ficult to index the database under thesealternative paradigms.

2.2. Descriptor Requirements and 3D PoseNormalization

Considering the descriptor approach, onecan define several requirements that ef-fective FV descriptors should meet. Gooddescriptors should abstract from the po-tentially very distinctive design decisionsthat different model authors make whenmodeling the same or similar objects.Specifically, the descriptors should be in-variant to changes in the orientation(translation, rotation and reflection) andscale of 3D models in their reference co-ordinate frame. That is, the similaritysearch engine should be able to retrieve ge-ometrically similar 3D objects with differ-ent orientations. Figure 3 (left) illustratesdifferent orientations of a Porsche car 3Dobject: the extracted FV should be (almost)the same in all cases. Ideally, an arbitrarycombination of translation, rotation, andscale applied to one object should not af-fect its similarity score with respect to an-other object.

Furthermore, a descriptor should alsobe robust with respect to small changesin the level of detail, geometry, and topol-ogy of the models. Figure 3 (right) showsthe Porsche car 3D object at four differentlevels of resolution. If such transforma-tions are made to the objects, the result-ing similarity measures should not changeabruptly but still reflect the overall simi-larity relations within the database.

Invariance and robustness propertiescan be achieved implicitly by those de-scriptors that consider relative objectproperties, for example, the distributionof the surface curvature of the objects.For other descriptors, these properties canbe achieved by a preprocessing normal-ization step which transforms the objectsso that they are represented in a canon-ical reference frame. In such a referenceframe, directions and distances are com-parable between different models, and thisinformation can be exploited for similar-ity calculation. The predominant methodfor finding this reference coordinate frameis pose estimation by principal componentanalysis (PCA) [Paquet et al. 2000; Vranicet al. 2001], also known as Karhunen-Loeve transformation. The basic idea isto align a model by considering its cen-ter of mass and principal axes. The objectis translated to align its center of masswith the coordinate origin (translation in-variance), and then is rotated around theorigin such that the x, y and z axes co-incide with the three principal compo-nents of the object (rotation invariance).Additionally, flipping invariance may beobtained by flipping the object based on

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

352 B. Bustos et al.

Fig. 4. Pose estimation using the PCA for three classes of 3D objects.

some moment test, and scaling invariancecan be achieved by scaling the model bysome canonical factor. Figure 4 illustratesPCA-based pose and scaling normaliza-tion of 3D objects. For some applications,matching should be invariant with respectto anisotropic scaling. For this purpose,Kazhdan et al. [2004] proposed a methodthat scales objects such that they are max-imally isotropic before computing FVs forshape matching.

While PCA is a standard approach topose estimation, several variants can beemployed. When a consistent definition ofobject mass properties is not available as isusually the case in mesh representations,one has to decide on the input to the PCA.Just using polygon centers or mesh ver-tices would make the outcome dependenton the tessellation of the model. Then, itis advantageous to use a weighing schemeto reflect the influence that each polygoncontributes to the overall object distribu-tion when using polygon centers or meshvertices [Vranic and Saupe 2000; Paquetand Rioux 2000]. Analytically, it is neces-sary to integrate over all of the infinites-imal points on a polygon [Vranic et al.2001]. Others use a Monte-Carlo approachto sample many polygon points [Ohbuchiet al. 2002] to obtain PCA input.

Some authors articulate a fundamentalcritique on the PCA as a tool for 3D re-trieval. Funkhouser et al. [2003] find PCAbeing unstable for certain model classesand consequently propose descriptors thatdo not rely on orientation information. Onthe other hand, omitting orientation infor-mation may also omit valuable object in-formation.

A final descriptor property that is alsodesirable to have is the multiresolutionproperty. Here, the descriptor embeds pro-gressive model detail information whichcan be used for similarity search on differ-ent levels of resolution. It eliminates theneed to extract and store multiple descrip-tors with different levels of resolution ifmultiresolution search is required, for ex-ample, for implementing a filter and re-finement step. A main class of descriptorsthat implicitly provides the multiresolu-tion property is one that performs a dis-crete Fourier or Wavelet transform of sam-pled object measures.

2.3. Retrieval System Requirements

There are two major concerns when de-signing and evaluating a similarity searchsystem, effectiveness and efficiency. Toprovide effective retrieval, the system,given a query, is supposed to return themost relevant objects from the database inthe first rankings, and to hold back irrele-vant objects from this ranking. Therefore,it needs to implement discriminationmethods to distinguish between similarand nonsimilar objects. The previouslydescribed invariants need to be provided.However, it is not clear what the exactmeaning of similarity is. As is obviousfrom the number of different methodsreviewed in Section 3, there exist a varietyof concepts for geometric similarity. Themost formalizable one until now is globalshape similarity, as illustrated in the firstrow of chairs shown in Figure 1. But,in spite of significant difference in theirglobal shapes, two objects could still be

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 353

considered similar in that they belongto some kind of semantic class, for ex-ample, as in the second row of chairs inFigure 1. Furthermore, partial similarityamong different objects also constitutesa similarity relationship within certainapplication domains. Most of the currentmethods are designed for global geometricsimilarity, while partial similarity stillremains a difficult problem.

The search system also has to pro-vide efficient methods for descriptor ex-traction, indexing, and query processingon the physical level. This is needed be-cause it can be expected that 3D databaseswill grow rapidly once 3D scanning and3D modeling become commonplace. Indatabases consisting of millions of objectswith hundreds of thousands of voxels ortriangles each which need to be auto-matically described and searched for, ef-ficiency becomes mandatory. Two broadtechniques exist to efficiently conduct fastsimilarity search [Faloutsos 1996]. A filter-and-refinement architecture first restrictsthe search space with some inexpensive,coarse similarity measure. On the cre-ated candidate set, some expensive butmore accurate similarity measure is em-ployed in order to produce the resultset. It is the responsibility of such fil-ter measures to guarantee no false dis-missals, or at least only a few, in or-der to generate high-quality answer sets.Second, if the objects in a multimediadatabase are already feature-transformedto numerical vectors, specially suitedhigh-dimensional data structures alongwith efficient nearest-neighbor query al-gorithms can be employed to avoid thelinear scan of all objects. Unfortunately,due to the curse of dimensionality [Bohmet al. 2001], the performance of allknown index structures deteriorates forhigh-dimensional data. Application of di-mensionality reduction techniques as apostprocessing step can help improvingthe indexability of high-dimensional FVs[Ngu et al. 2001].

Finally, note that, in traditionaldatabases, the key-based searchingparadigm implicitly guarantees full effec-tiveness of the search so efficiency aspects

are the major concern. In multimediadatabases, where effectiveness is subjectto some application and user context,efficiency and effectiveness concerns areof equal importance.

2.4. Partial Similarity

Almost all available methods for similar-ity search in 3D object databases focus onglobal geometric similarity. In some appli-cation domains, the notion of partial simi-larity is also considered. In partial similar-ity, similarities in parts or sections of theobjects are relevant. In some applications,complementarity between solid object seg-ments constitutes similarity between ob-jects, for examle, in the molecular dock-ing problem [Teodoro et al. 2001]. In thecase of 2D polygons, some solutions to thepartial similarity problem have been pro-posed [Berchtold et al. 1997]. For 3D ob-jects, to date it is not clear how to de-sign fast segmentation methods that leadto suited object partitions which could becompared pairwise. Although partial sim-ilarity is an important research field inmultimedia databases, this survey focuseson global geometric similarity.

2.5. Ground Truth

A crucial aspect for objective and repro-ducible effectiveness evaluation in mul-timedia databases is the existence of awidely accepted ground truth. Until nowthis is only partially the case for the re-search in 3D object retrieval since mostresearch groups in this field have collectedand classified their own 3D databases.In Section 4, we present our own pre-pared ground truth which we use to ex-perimentally compare the effectiveness ofseveral feature-based methods for 3D sim-ilarity search. Recently, the carefully com-piled Princeton Shape Benchmark wasproposed by Shilane et al. [2004]. Thebenchmark consists of a train databasewhich is proposed for calibrating searchalgorithms, and a test database which canthen be used to compare different searchengines with each other. This benchmarkcould eventually become a standard in

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

354 B. Bustos et al.

Fig. 5. 3D descriptor extraction process model.

evaluating and comparing retrieval per-formance of 3D retrieval algorithms.

3. METHODS FOR CONTENT-BASED 3DRETRIEVAL

This section reviews recent methods forfeature-based retrieval of 3D objects. InSection 3.1, an overview and a classifica-tion of the different methods discussed inthis survey is given. In Sections 3.2–3.7,we give a detailed description of many in-dividual methods, sorted according to ourclassification.

3.1. Overview and Classification

Classifying methods for 3D descriptioncan be done using different criteria. A pop-ular differentiation from the field of shapeanalysis functions according to the follow-ing schema [Loncaric 1998].

—Descriptors can be built based on thesurface of an object or based on interiorproperties. Curvature of the boundary isan example of the first type of descrip-tor, while measures for the distributionof object mass are of the second type ofdescription.

—Depending on the type of resulting ob-ject descriptor, numeric methods pro-duce a vector of scalar values represent-ing the object, while spatial methods useother means, for example, a sequenceof primitive shapes approximating theoriginal shape or a graph representingobject structure.

—Preserving descriptors preserve thecomplete object information which al-lows the lossless reconstruction of theoriginal object from the description.Nonpreserving descriptors discard a cer-tain amount of object information, usu-

ally retaining only that part of the infor-mation that is considered the most im-portant.

A descriptor differentiation more spe-cific to 3D models can be done basedon the type of model information focusedon, for example, geometry, color, texture,mass distribution, and material proper-ties. Usually geometry is regarded as mostimportant for 3D objects, and thus all de-scriptors presented in this survey makeuse of geometry input only. This is alsobecause geometry is always specified inmodels, while other characteristics aremore application-dependent and cannotbe assumed to be present in arbitrary 3Ddatabases.

Furthermore, one could differentiate de-scriptors with respect to integrity con-straints assumed for the models, for ex-ample, solid shape property, consistentface orientation, or the input type as-sumed (polygon mesh, voxelization, CSGset, etc.). Most of the presented methodsare flexible in that they allow for model in-consistencies and assume triangulations.Of course, the description flexibility de-pends on the model assumptions; addi-tional information can be expected to yieldmore options for designing descriptors.

Recently, we proposed a new way toclassify methods for 3D model retrieval[Bustos et al. 2005]. In this classifica-tion, the extraction of shape descriptorsis regarded as a multistage process (seeFigure 5). In the process, a given 3D object,usually represented by a polygonal mesh,is first preprocessed to approximate the re-quired invariance and robustness proper-ties. Then, the object is abstracted so thatits character is either of surface type, orvolumetric, or captured by one or several

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 355

2D images. Then, a numerical analysis ofthe shape can take place and, from the re-sult, finally the FVs are extracted.

We briefly sketch these basic steps in thefollowing. Without loss of generality, weassume that the 3D object is representedby a polygonal mesh.

(1) Preprocessing. If required by the de-scriptor, the 3D model is preprocessedfor rotation (R), translation (T), and/orscaling (S) invariance.

(2) Type of object abstraction. There arethree different types of object abstrac-tion: surface, volumetric, and image.Statistics of the curvature of the objectsurface is an example of a descriptorbased directly on surface, while mea-sures for the 3D distribution of objectmass, for example, using moment-based descriptors, belong to thevolumetric type of object abstraction.A third way to capture the characteris-tics of a mesh is to project it onto one orseveral image planes, producing ren-derings, corresponding depth maps,silhouettes, and so on, from whichdescriptors can be derived. This formsimage-based object abstractions.

(3) Numerical transformation. The mainfeatures of the polygonal mesh can becaptured numerically using differentmethods. For example, voxels gridsand image arrays can be wavelettransformed or surfaces can be adap-tively sampled. Other numericaltransformations include sphericalharmonics (SH), curve fitting, andthe discrete Fourier transform (DFT).Such transforms yield a numerical rep-resentation of the underlying object.

(4) Descriptor generation. At this stage,the final descriptor is generated. Itcan belong to one of three classes:

(a) Feature vectors (FVs) consist of el-ements in a vector space equippedwith a suitable metric. Usually theEuclidean vector space is takenwith dimensions that may easilyreach several hundreds.

(b) In statistical approaches, 3Dobjects are inspected for specific

features which are usually sum-marized in the form of a histogram.For example, in simple cases, thisamounts to the summed up surfacearea in specified volumetric regionsor, more complex, it may collectstatistics about distances of pointpairs randomly selected from the3D object. Usually, the obtainedhistogram is represented as a FVwhere each coordinate value corre-sponds to a bin of the histogram.

(c) The third category is better suitedfor structural 3D object shape de-scription that can be representedin the form of a graph [Sundaret al. 2003; Hilaga et al. 2001]. Agraph can more easily representthe structure of an object that ismade up of, or can be decomposedinto, several meaningful partssuch as the body and the limbs ofobjects modeling animals.

Table I shows the algorithms sur-veyed in this article with their refer-ences, preprocessing steps employed, typeof object abstraction considered, numerictransform applied, and descriptor typeobtained.

For presentation in this survey, we haveorganized the descriptors in the followingSections of the article.

—Statistics (Section 3.2). Statistical de-scriptors reflect basic object propertieslike the number of vertices and poly-gons, the surface area, the volume, thebounding volume, and statistical mo-ments. A variety of statistical descrip-tors are proposed in the literature for 3Dretrieval. In some application domains,simple spatial extension or volumetricmeasures may already be enough to re-trieve objects of interest.

—Extension-based descriptors (Section3.3). Extension-based methods build ob-ject descriptors from features sampledalong certain spatial directions from anobject’s center.

—Volume-based descriptors (Section3.4). These methods derive object fea-tures from volumetric representations

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

356 B. Bustos et al.

Table I. Overview of the Surveyed MethodsDescriptor Name Sect. Prepr. Obj. abs. Num. transf. Type

Simple statistics 3.2.1 RTS Volum. None FVParametrized stat. 3.2.2 RTS Surface Sampling FVGeometric 3D moments 3.2.3 RTS Surface Sampling FVRay moments 3.2.3 RTS Surface Sampling FVShape distr. (D2) 3.2.4 None Surface Sampling Hist.Cords based 3.2.5 RT Surface Sampling Hist.Ray based w. SH 3.3.1 RTS Image Sampl.+SH FVShading w. SH 3.3.1 RTS Image Sampl.+SH FVComplex w. SH 3.3.1 RTS Image Sampl.+SH FVExt. to ray based 3.3.2 RTS Image Sampl.+SH FVShape histograms 3.4.2 RTS Volum. Sampling Hist.Rot. inv. point cloud 3.4.3 RTS Volum. Sampling Hist.Voxel 3.4.4 RTS Volum. None Hist.3DDFT 3.4.4 RTS Volum. 3D DFT FVVoxelized volume 3.4.5 RTS Volum. Wavelet FVVolume 3.4.5 RTS Volum. None FVCover sequence 3.4.5 RTS Volum. None FVRot. inv. sph. harm. 3.4.6 TS Volum. Sampl.+SH FVReflective symmetry 3.4.7 TS Volum. Sampling FVWeighted point sets 3.4.8 RTS Volum. None Hist.Surface normal direct. 3.5.1 None Surface None Hist.Shape spectrum 3.5.2 None Surface Curve fitting Hist.Ext. Gaussian image 3.5.3 R Surface None Hist.Shape based on 3DHT 3.5.4 None Surface Sampling FVSilhouette 3.6.1 RTS Image Sampl.+DFT FVDepth Buffer 3.6.2 RTS Image 2D DFT FVLightfield 3.6.3 TS Image DFT, Zernike FV

Topological Matching 3.7.1 None Surface Sampling GraphSkeletonization 3.7.2 None Volumetric Dist. transf., clustering GraphSpin Image 3.7.3 None Surface Binning 2D Hist.

obtained by discretizing object surfaceinto voxel grids or by relying on themodels to already have volumetricrepresentation.

—Surface geometry (Section 3.5). Thesedescriptors focus on characteristics de-rived from model surfaces.

—Image-based descriptors (Section 3.6).The 3D similarity problem can be re-duced to an image similarity problem bycomparing 2D projections rendered fromthe 3D models.

While this survey focuses on FV-baseddescriptors, we recognize that a rich bodyof work from computer vision and shapeanalysis exists which deals with advanced3D shape descriptors relying on structuralshape analysis and customized data struc-tures and distance functions. In principle,these can also be used to implement sim-ilarity search algorithms for 3D objects.Therefore, in Section 3.7, we exemplarilyrecall 3D matching approaches based on

topological graphs, skeleton graphs, and acustomized data structure built for eachpoint in a 3D image (or model).

Figure 6 summarizes the chosen orga-nization of the methods surveyed in thisarticle. The remainder of this section fol-lows this organization.

3.2. Statistical 3D Descriptors

3.2.1. Simple Statistics. Bounding vol-ume, object orientation, and object volumedensity descriptors are probably the mostbasic shape descriptors and are widelyused in the CAD domain. In Paquet et al.[2000], the authors review several possiblesimple shape descriptors. The boundingvolume (BV) is given by the volume of theminimal rectangular box that encloses a3D object. The orientation of this bound-ing box is usually specified parallel toeither the coordinate frame or parallelto the principal axes of the respectiveobject. Also, the occupancy fraction of the

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 357

Fig. 6. Organization of 3D retrieval methods in this survey.

Fig. 7. Principal axes-based bounding volume and orientation of an ob-ject with respect to the original coordinate system (2D illustration).

object within its bounding volume givesinformation on how solid respectivelyrectangular the object is. Having deter-mined the principal axes, it is also possibleto integrate orientation information intothe description, relating the principalaxes to the given world coordinates of theobject. Here, it is proposed to consider thedistance between the bounding volume’scenter from the origin of the coordinatesystem as well as the angle enclosedbetween the principal axes and the co-ordinate system. If only the boundingvolume is considered, this descriptor isinvariant with respect to translation.If the bounding volume is determinededge-parallel to the object’s principal

axes, it is also approximately invariantwith respect to rotation. In both variants,the bounding volume descriptor is notinvariant with respect to the object’sscale. Figure 7 illustrates this.

3.2.2. Parameterized Statistics. Ohbuchiet al. [2002] propose a statistical fea-ture vector which is composed of threemeasures taken from the partitioningof a model into slices orthogonal to itsthree principal axes. The FV consists of3 ∗ 3 ∗ (n − 1) components, where n isthe number of equally-sized bins alongthe principal axes. A sampling windowis moved along the axes that considersthe average measures from consecutive

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

358 B. Bustos et al.

Fig. 8. Discretization of a model into 5 equally-sized slices, yielding 4 descriptor components.

pairs of adjacent slides, obtaining n − 1values on each principal axis for each ofthe three proposed measures (see Figure8). The measures used are the moment ofinertia of the surface points, the averagedistance of surface points from the princi-pal axis, and the variance in this distance.Selection of object points for PCA andstatistical measure calculation is done byrandomly sampling a number of pointsfrom the object’s faces (assuming a polyg-onal mesh), keeping the number of pointsin each face proportional to its area. Forretrieval, the authors experiment withthe standard Euclidean distance as wellas with a custom distance called elasticdistance which allows for some shift inthe bins to be compared [Ohbuchi et al.2002]. Both metrics are shown to producesimilar results. The authors conductexperiments on a VRML object databaseand conclude that their descriptor is wellsuited for objects that possess rotationalsymmetry, for example, chess figures. Asensitivity analysis indicates that thereexists some optimal choice for the numberof analysis windows, given a number oftotal sampling points.

3.2.3. Geometric 3D Moments. The usageof moments as a means of description hasa tradition in image retrieval and classifi-cation. Thus, moments have been used insome of the first attempts to define featurevectors for 3D object retrieval. Statisticalmoments μ are scalar values that describe

a distribution f . Parametrized by their or-der, moments represent a spectrum fromcoarse-level to detailed information of thegiven distribution [Paquet et al. 2000]. Inthe case of 3D objects, an object may beregarded as a distribution f (x, y , z) ∈ R

3,and the moment μi, j ,k of order n = i+ j +kin continuous form can be given as:

μi j k =∫ +∞

−∞

∫ +∞

−∞

∫ +∞

−∞f (x, y ,z)xi y jzkdxdydz.

It is well known, that the complete (in-finite) set of moments uniquely describesa distribution and vice versa. In its dis-crete form, objects are taken as finite pointsets P in 3D, and the moment formula

becomes μi j k = ∑|P |p=1 xp

i ypj z p

k . Because

moments are not invariant with respect totranslation, rotation, and scale of the con-sidered distribution, appropriate normal-ization should be applied before momentcalculation. When given as a polygonmesh, candidates for input to moment cal-culation are the mesh vertices, the centersof mass of triangles, or other object pointssampled by some scheme. A FV can then beconstructed by concatenating several mo-ments, for example, all moments of orderup to some n.

Studies that employ moments as de-scriptors for 3D retrieval include Vranicand Saupe [2001a] where moments arecalculated for object points sampled

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 359

uniformly with a ray-based scheme (seeSection 3.3.1), and Paquet et al. [2000]where moments are calculated from thecenters of mass (centroids) of all ob-ject faces (see Section 3.2.5). Vranic andSaupe [2001a] compare the retrieval per-formance of ray-based with centroid-basedmoments and conclude that the former aremore effective. Another publication thatproposed the usage of moments for 3D re-trieval is Elad et al. [2002]. Here, the au-thors uniformly sample a certain numberof points from the object’s surface for mo-ment calculation. Special to their analysisis the usage of relevance feedback to adjustthe distance function employed on theirmoment-based descriptor. While in mostsystems a static distance function is em-ployed, here it is proposed to interactivelyadapt the metric. A user performs an ini-tial query using a feature vector of sev-eral moments under the Euclidean norm.She marks relevant and irrelevant objectsin a prefix of the complete ranking. Then,via solving a quadratic optimization prob-lem, weights are calculated that reflect thefeedback so that, in the new ranking usingthe weighted Euclidean distance, relevantand irrelevant objects (according to theuser input) are discriminated by a fixeddistance threshold. The user is allowed toiterate through this process until a satis-factory end result is obtained. The authorsconclude that this process is suited to im-prove search effectiveness.

3.2.4. Shape Distribution. Osada et al.[2002] propose describing the shape ofa 3D object as a probability distributionsampled from a shape function, whichreflects geometric properties of the ob-ject. The algorithm calculates histogramscalled shape distributions and estimatessimilarity between two shapes by any met-ric that measures distances between dis-tributions (e.g., Minkowski distances). De-pending on the shape function employed,shape distributions possess rigid trans-formation invariance, robustness againstsmall model distortions, independence ofobject representation, and efficient com-putation. The shape functions studied by

Fig. 9. D2 distance histograms for some exampleobjects. (Figure adapted from Osada et al. [2002] c©2002 ACM Press).

the authors include the distribution of an-gles between three random points on thesurface of a 3D object and the distribu-tion of Euclidean distances between onecertain fixed point and random points onthe surface. Furthermore, they propose touse the Euclidean distance between tworandom points on the surface, the squareroot of the area of the triangle formed bytriples of random surface points, or thecube root of the volume of the tetrahe-dron between four random points on thesurface. Where necessary, a normalizationstep is applied for differences in scale.

Generally, the analytic computation ofdistributions is not feasible. Thus, the au-thors perform random point sampling ofan object and construct a histogram torepresent a shape distribution. Retrievalexperiments yielded that the best resultswere achieved using the D2 distance func-tion (distance between pairs of points onthe surface, see Figure 9 [Osada et al.2002].) and using the L1 norm of the prob-ability density histograms which were nor-malized by aligning the mean of each of thetwo histograms to be compared.

Shape distributions for 3D retrievalhave also been explored in Ip et al. [2002],Ip et al. [2003], and Ohbuchi et al. [2003].

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

360 B. Bustos et al.

3.2.5. Cords-Based Descriptor. Paquetet al. [2000] present a descriptor thatcombines information about the spatialextent and orientation of a 3D object. Theauthors define a cord as a vector thatruns from an object’s center of mass to thecentroid of a face of the object. For all ob-ject faces, such a cord is constructed. Thedescriptor consists of three histograms,namely, for the angles between the cordsand the object’s first two principal axes,and for the distribution of the cord length,measuring spatial extension. The threehistograms are normalized by the numberof cords. Using the principal axes forreference, the descriptor is invariant torotation and translation. It is also invari-ant to scale as the length distribution isbinned to the same number of bins forall objects. It can be inferred that thedescriptor is not invariant to nonuniformtessellation changes.

3.3. Extension-Based Descriptors

3.3.1. Ray-Based Sampling with SphericalHarmonics Representation. Vranic andSaupe [2001a, 2002] propose a descriptorframework that is based on taking sam-ples from a PCA-normalized 3D objectby probing the polygonal mesh alongregularly spaced directional unit vectorsui j as defined and visualized in Figure 10.The samples can be regarded as valuesof a function on a sphere (||ui j || = 1).The so-called ray-based feature vectormeasures the extent of the object fromits center of gravity O in directions ui j .The extent r(ui j ) = ||P (ui j ) − O|| indirection ui j is determined by findingthe furthest intersection point P (ui j )between the mesh and the ray emittedfrom the origin O in the direction ui j . Ifthe mesh is not intersected by the ray,then the extent is set to zero, r(ui j ) = 0.

The number of samples, 4B2 (Figure 10),should be large enough (e.g., B ≥ 32)so that sufficient information about theobject can be captured. The samplesobtained can be considered components ofa feature vector in the spatial domain. Asimilar FV called Sphere Projection wasconsidered by Leifman et al. [2003] which

Fig. 10. Determining ray directions u by uniformlyvarying spherical angular coordinates θi and ϕ j .

includes a number of empirical studies,showing good performance with respectto to a ground truth database of VRMLmodels collected from the Internet.

Nonetheless, such a descriptor consistsof a large dimensionality. In order to char-acterize many samples of a function on asphere by just a few parameters, sphericalharmonics [Healy et al. 2003] are pro-posed as a suitable tool. The magnitudesof complex coefficients, which are obtainedby applying the fast Fourier transformon the sphere (SFFT) to the samples, areregarded as vector components. Thus, theray-based feature vector is representedin the spectral domain where each vectorcomponent is formed by taking intoaccount all original samples. Bearing inmind that the extent function is a real-valued function, the magnitudes of theobtained coefficients are equal pairwise.Therefore, vector components are formedby using magnitudes of nonsymmetriccoefficients. Also, an embedded multires-olution representation of the feature caneasily be provided. A useful property ofthe ray-based FV with spherical harmonicrepresentation is invariance with respectto rotation around the z-axis (whenthe samples are taken as depicted inFigure 10). The inverse SFFT can beapplied to a number of the sphericalharmonic coefficients to reconstruct anapproximation of the underlying object atdifferent levels (see Figure 11). Besidesconsidering the extent as a feature aimedat describing 3D-shape, the authors con-sider a rendered perspective projectionof the object on an enclosing sphere. Thescalar product x(u) = |u · n(u)|, wheren(u) is the normal vector of the polygon

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 361

Fig. 11. Ray-based feature vector (left). The right illustration shows the back-transform ofthe ray-based samples from frequency to spatial domain.

that contains the point O + r(u)u (ifr(u) > 0), can be regarded as informationabout shading at the point (θ , ϕ) on theenclosing sphere. A shading-based FV isgenerated analogously to the ray-basedFV by sampling the shading function,applying the SFFT, and taking the mag-nitudes of low-frequency coefficients asvector components. An extension to usingeither r(u) or x(u), and also the combi-nation of both measures in a complexfunction y(u) = r(u)+i ·x(u), is consideredby the authors and called the complex FV.The authors demonstrate experimentallythat this combined FV in spherical har-monics representation outperforms boththe ray-based and the shading-based FVsin terms of retrieval effectiveness.

3.3.2. Extensions for Ray-Based Sampling.Vranic [2003] further explores an im-provement of the ray-based methods pre-viously described. Particularly the authorproposes not to restrict the sampling ateach ray to the last intersection point withthe mesh, and also to consider interior in-tersection points of the ray with modelsurfaces. This is implemented by usingconcentric spheres centered at the modelorigin and with uniformly varying radiiand associating all intersection points be-tween rays and the mesh, each to the clos-est sphere. For each ray and each sphereradius, the largest distance between theorigin and the intersection points asso-ciated with the respective ray and ra-dius is set as the sampling value if sucha point exists (otherwise, the respectivesampling value is set to zero). The au-thor thereby obtains samples of functions

on multiple concentric spheres. He definestwo FVs by applying the spherical Fouriertransform on these samples and extract-ing FV components from either the en-ergy contained in certain low frequencybands (RH1 FV) as done in the approachby Funkhouser et al. [2003] and describedin Section 3.4.6, or from the magnitudes ofcertain low frequency Fourier coefficients(RH2 FV). While RH2 relies on the PCA forpose estimation and includes orientationinformation, RH1 is rotation invariant bydefinition, discarding orientation informa-tion. The author experimentally evaluatesthe retrieval quality of these two descrip-tors against the ray-based FV in spher-ical harmonics representation describedpreviously, and against the FV defined byFunkhouser et al. [2003]. From the re-sults, the author concludes that (1) RH1outperforms the implicitly rotation invari-ant FV based on the spherical harmonicsrepresentation of a model voxelization (seeSection 3.4.6), implying that the SFT is ef-fective in filtering high frequency noise,and (2) that RH2 and the ray-based FV,both relying on PCA, outperform the othertwo FVs, implying that including orien-tation information using the PCA in FVsmay positively affect object retrieval on av-erage. As a further conclusion, the authorstates that RH2 performs slightly betterthan the ray-based FV, implying that con-sidering interior model information canincrease retrieval effectiveness.

3.4. Volume-Based Descriptors

3.4.1. Discretized Model Volume. A classencompassing several 3D descriptors that

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

362 B. Bustos et al.

Fig. 12. Shells and sectors as basic space decompositions for shape his-tograms. In each of the 2D examples, a single bin is marked.

are all derived from some form of volumet-ric discretization of the models is reviewednext. Here, the basic idea is to construct afeature vector from a model by partition-ing the space in which it resides, and thenaggregating the model content that is lo-cated in the respective partitioning seg-ments to form the components of featurevectors. Unless otherwise stated, thesedescriptors rely on model normalization,usually with PCA methods, to approxi-mately provide comparability between thespatial partitions of all models.

3.4.2. Shape Histograms. Ankerst et al.[1999a] studied classification and similar-ity search of 3D objects modeled as pointclouds. They describe 3D object shapes ashistograms of point fractions that fall intopartitions of the enclosing object space un-der different partitioning models. One de-composition is the shell model which par-titions the space into shells concentric tothe object’s center of mass, keeping radiiintervals constant. The sector model de-composition uses equally-sized segmentsobtained by forming Voronoi partitionsaround rays emitted from the model ori-gin and pointing to the vertices of an en-closing regular polyhedron. Finally, a com-bined model uses the intersection of shellsand sectors, see Figure 12 for an illus-tration. While the shell model is inher-ently rotation invariant, the sector and thecombined models rely on rotational objectnormalization. The authors propose thequadratic form distance for similarity esti-mation in order to reflect cross-similaritiesbetween histogram bins. Experiments are

conducted in a molecular classification setup and good discrimination capabilitiesare reported for the high-dimensional sec-tor (122-dim) and combination (240-dim)models, respectively.

3.4.3. Rotation Invariant Point Cloud Descrip-tor. Kato et al. [2000] present a descriptorthat relies on PCA registration but, at thesame time, is invariant to rotations of 90degrees along the principal axes. To con-struct the descriptor, an object is placedand oriented into the canonical coordinateframe using PCA and scaled to fit intoa unit cube with its origin at the cen-ter of mass of the object and perpendic-ular to the principal axes. The unit cubeis then partitioned into 7 × 7 × 7 equally-sized cube cells, and for each cell, the fre-quency of points regularly sampled fromthe object surface and which lie in the re-spective cell, is determined. To reduce thesize of the descriptor, which until now con-sists of 343 values, all grid cells are asso-ciated with one of 21 equivalence classesbased on their location in the grid. To thisend, all cells that coincide when perform-ing arbitrary rotations of 90 degrees alongthe principal axes are grouped togetherin one of the classes (see Figure 13 [Katoet al. 2000] for an illustration). For eachequivalence class, the frequency data con-tained in the cells belonging to the respec-tive equivalence class is aggregated, andthe final descriptor of dimensionality 21 isobtained. The authors presented retrievalperformance results on a 3D database, onwhich 7×7×7 was found to be the best griddimensionality, but state that, in general,

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 363

Fig. 13. Aggregating object geometry in equivalence classes defined on a 3×3×3grid. (Figure taken from Kato et al. [2000] c© 2000 IEEE).

the optimal size of the descriptor dependson the specific database characteristics.

3.4.4. Model Voxelization. Vranic andSaupe [2001b] present a FV, based on therasterization of a model into a voxel gridstructure and experimentally evaluatethe representation of this FV in boththe spatial and the frequency domain(see Figure 14). The voxel descriptor isobtained by first subdividing the bound-ing cube of an object (after PCA-basedrotation normalization) into n × n × nequally-sized voxel cells. Each of thesevoxel cells vij k , i, j , k ∈ {1, . . . , n} then

stores the fraction pij k = Sij k

S of the object

surface area Sij k that lies in voxel vij k ,where S = ∑n

i=1

∑nj=1

∑nk=1 Sij k is the

total surface area of the model. In orderto compute the value of Sij k , each modeltriangle Ti (i = 1, . . . , m) is subdividedinto l2

i (l ∈ N) coincident triangles, where

the value of l2i is proportional to the area

of T . The value of STi /l2i (STi is the area of

triangle Ti) is the attribute of the centersof gravity of the triangles obtained by thesubdivision. Finally, the value of Sij k isapproximated by summing up attributesof centroids lying in the correspondingvoxel cell. The object’s voxel cell occupan-cies constitute the FV of dimension n3. Forsimilarity estimation with this FV, a met-ric can be defined in the spatial domain(voxel), or after a 3D Fourier-transformin the frequency domain (3DDFT). Then,magnitudes of certain k low-frequency

Fig. 14. The voxel-based feature vector comparesoccupancy fractions of voxelized models in the spatialor frequency domain.

coefficients are used for description,enabling multiresolution search.

Using octrees for 3D similarity searchwas also recently proposed by Leifmanet al. [2003] where the similarity of two ob-jects is given by the sum of occupancy dif-ferences for each nonempty cell pair of thevoxel structure. The authors report goodretrieval capabilities of this descriptor.

3.4.5. Voxelized Volume. In the precedingsection, an object was considered as a col-lection of 2D-polygons, that is, as a surfacein 3D. This approach is the most generalapplying to unstructured “polygon soups.”In the case of polygons giving rise to awatertight model, one may want to use theenclosed volume to derive shape descrip-tors. Such schemes require an additionalpreprocessing step after pose normal-ization, namely the computation of a 3Dbitmap that specifies the inside/outsiderelation of each voxel with respect to theenclosed volume of the polygonal model.

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

364 B. Bustos et al.

Several methods for similarity estimationbased on voxelized volume data of normal-ized models have been proposed. Paquetet al. [2000] and Paquet and Rioux [2000]propose a descriptor that characterizesvoxelized models by statistical momentscalculated at different levels of resolutionof the voxel grid and where the differentresolutions are obtained by applying theDaubechies D4 wavelet transform onthe 3D grid. Keim [1999] describes asimilarity measure based on the amountof intersection between the volumes oftwo voxelized 3D objects.

Novotni and Klein [2001b] proposedusing the minimum of the symmetricvolume differences between two solidobjects obtained when considering differ-ent object alignments based on principalaxes in order to measure volume simi-larity. The authors also give a techniquethat supports the efficient calculation ofsymmetric volume differences based onthe discretization of volumes into a grid.Sanchez-Cruz and Bribiesca [2003] reporta scheme for optimum voxel-based trans-formation from one object into anotherone which can be employed as a measureof object dissimilarity.

Another volume-based FV is presentedin Heczko et al. [2002]. In order to re-move the topological requirement of a wa-tertight model the volume of a given 3D-model specified by a collection of polygonsis defined in a different way. Each poly-gon contributes a (signed) volume givenby the tetrahedron that is formed by con-sidering the center of mass of all poly-gons as a vertex for a polyhedron with thegiven polygon as a base face. The sign ischosen according to the normal vector forthe polygon given by the model. The spacesurrounding the 3D models is partitionedinto sectors similar to the method in Sec-tion 3.4.2 and, in each sector, the (signed)volumes of the intersection with a gener-ated polyhedra is accumulated and givesone component of the FV. The partitioningscheme is as follows. The six surfaces of anobject’s bounding cube are equally dividedinto n2 squares each. Adding the object’scenter of mass to all squares, a total of6n2 pyramid-like segments in the bound-

Fig. 15. Volume-based feature vector.

ing cube is obtained. For similarity searcheither the volumes occupied in each seg-ment or a number of k first coefficientsafter a Fourier transform is considered.Figure 15 illustrates the idea in a 2Dsketch. Experimental results with this de-scriptor are presented in Section 4. Itperforms rather poorly which may be at-tributed to the fact that the retrievaldatabase used does not guarantee consis-tent orientation of the polygons.

Kriegel et al. [2003] present anotherapproach for describing voxelized objects.The cover sequence model approximates avoxelized 3D object using a sequence ofgrid primitives (called covers) which arebasically large parallelepipeds. The qual-ity of a cover sequence is measured as thesymmetric volume difference between theoriginal voxelized object and the sequenceof grid primitives. The sequence is de-scribed as the set of unions or differencesof the covers, and then each cover of the se-quence contributes six values for the finaldescriptor (three values for describing theposition of the cover, and three values fordescribing the extension of the cover). Themain problem is the ordering of the cov-ers in the sequence: two voxelized objectsthat are similar may produce features thatare distant from one and other, depend-ing on the ordering of the covers. To over-come this problem, the authors propose touse sets of FVs (one FV for each cover) todescribe the 3D objects and to compute a

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 365

Fig. 16. Descriptor extraction process for the harmonics 3D descriptor. (Figure taken from Funkhouser et al.[2003] c© 2003 ACM Press).

similarity measure between two sets ofFVs that ensures that the optimal se-quence of covers, the one that produces theminimal distance between the two objects,will always be considered.

3.4.6. Rotation Invariant Spherical HarmonicsDescriptor. Funkhouser et al. [2003] pro-pose a descriptor also based on the spher-ical harmonics representation of objectsamples (see Figure 16). The main differ-ence between this approach and the onereported in Section 3.3.1, apart from thesampling function chosen, is that, by de-scriptor design, it provides rotation invari-ance without requiring pose estimation.This is possible since the energy in eachfrequency band of the spherical transformis rotation invariant [Healy et al. 2003].

Input to their transform is the binaryvoxelization of a polygon mesh into a gridwith dimension 2R ∗ 2R ∗ 2R, where eachoccupied voxel indicates the intersectionof the mesh with the respective voxel. Toconstruct the voxelization, the object’s cen-ter of mass is translated into grid posi-tion (R, R, R) (grid origin), and the ob-ject is scaled so that the average distanceof occupied voxels to the center of mass

amounts to R2

, that is, 14

of the grids edgelength. By using this scale instead of scal-ing it so that the bounding cube fits intothe grid, it is possible to lose some ob-ject geometry in the description. On theother hand, sensitivity with respect to out-liers is expected to be reduced. The 8R3

voxels give rise to a binary function onthe corresponding cube which is written inspherical coordinates as fr (θ , φ) with the

origin (r = 0) placed at the cube center.The binary function is sampled for radiir = 1, 2, . . . , R and a sufficient numberof angles θ , φ to allow computation of thespherical harmonics representation of thespherical functions fr . The feature vectorconsists of low-frequency band energies ofthe functions fr , r = 1, . . . , R. By con-struction, it is invariant with respect torotation around the center of mass of theobject.

The authors presented experimental re-sults conducted on a database of 1,890models that were manually classified,comparing the harmonics 3D descriptorwith the shape histogram (see Section3.4.2), shape distribution D2 (see Sec-tion 3.2.4), EGI (see Section 3.5.3), and amoment-based (see Section 3.2.3) descrip-tor. The experiments indicated that theirdescriptor consistently outperformed theother descriptors which among other ar-guments was attributed to discarding ro-tation information.

A generalization of this approach con-sidering the full volumetric model infor-mation was introduced in Novotni andKlein [2003, 2004]. The authors formrotational-invariant descriptors from 3DZernike moments obtained from appropri-ately voxelized mesh models. The authorspresent the mathematical framework anddiscuss implementation specifics. Fromanalysis and experiments, they concludedthat, by considering the integral of thevolumetric information in extension tojust sampling it on concentric spheres,retrieval performance may be improved,and, at the same time, a more compact de-scriptor is obtained.

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

366 B. Bustos et al.

3.4.7. Reflective Symmetry Descriptor.Kazhdan et al. [2003] present a descriptorthat is based on global object symmetry.The method is based on a function onthe unit sphere that measures reflectivesymmetry between two parts of an objectlying on the opposite sides of a cuttingplane. The cutting plane contains thecenter of gravity of the object, while thenormal vector is determined by a pointon the unit sphere. The main idea ofthe approach is that the shape of anobject may be characterized by using anappropriate measure of symmetry andby sampling the corresponding functionat a sufficient number of points. Briefly,for a given cutting plane, the reflectivesymmetry is computed using a functionf on concentric spheres. The functionf is defined by sampling a voxel-basedrepresentation of the object. The voxelattributes are defined using an expo-nentially decaying Euclidean distancetransform. The reflective symmetry mea-sure describes the proportion of f thatis symmetric with respect to the givenplane and the proportion of f that isantisymmetric. The symmetric proportionis obtained by projecting the function fonto the space π of functions invariantunder reflection about the given plane andby computing the L2 distance between theoriginal function and the projection. Theantisymmetric proportion is calculated ina similar manner, by projecting f ontothe space π⊥ that is orthogonal to π andby computing the L2 distance between fand the projection. Since the reflectivesymmetry measure is determined byanalyzing the whole object, a samplevalue of the function on the unit spheregives information about global symmetrywith respect to a given plane.

For 3D retrieval, the similarity betweenobjects is estimated by the L∞ norm be-tween their reflective symmetry descrip-tions. Analytically and experimentally, theauthors show the main properties of thisapproach to be stability against high-frequency noise, scale invariance, and ro-bustness against the level of detail ofthe object representation. Considering re-trieval power, the algorithm is experimen-

Fig. 17. Visualization of the reflective symmetrymeasure (lower row) for certain objects (upper row).The extension in direction u indicates the degree ofsymmetry with respect to the symmetry plane per-pendicular to u. (Figure taken from Kazhdan et al.[2003] c© 2003 Springer-Verlag with kind permissionof Springer Science and Business Media.)

tally shown to be orthogonal to some ex-tent to other descriptors, in the sensethat overall retrieval power is increasedwhen combining it with other descriptors.Figure 17 [Kazhdan et al. 2003] shows anexample of reflective symmetry measuresfor several objects.

3.4.8. Point Sets Methods. The weightedpoint set method [Tangelder and Veltkamp2003] compares two 3D objects repre-sented by polyhedral meshes. A shape sig-nature of the 3D object is defined as aset of points that consists of weightedsalient points from the object. The firststep of the algorithm is to place the 3Dobject into a canonical coordinate systemwhich is established by means of the PCAand to partition the object’s bounding cubeinto a rectangular voxel grid. Then, foreach nonempty grid cell, one representa-tive vertex as well as an associated weightare determined by one of three methodsexplored by the authors. In the first pro-posed method, for each nonempty cell, oneof the contained points is selected basedon the Gaussian curvature at the respec-tive point and the Gaussian curvature isassociated with that point. The two otherproposed methods average over the ver-tices of a cell, and associate either a mea-sure for the normal variation or the unitweight. The latter method is given in or-der to be able to support meshes withinconsistent polygon orientation because

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 367

Fig. 18. Shape index values for some elementary shapes. (Figure taken from Zaharia and Preteux [2001]c© 2001 SPIE).

then curvature and normal variation can-not be determined meaningfully. A varia-tion of the Earth’s Mover Distance (EMD)[Rubner et al. 1998], 1 the so-called pro-portional transportation distance (PTD),is introduced as the similarity functionto compare two weighted point sets. ThePTD is described as a linear program-ming problem that can be solved, for ex-ample, using the simplex algorithm. Theauthors state that the PTD is a pseudo-metric which makes it suitable to use forindexing purposes (in contrast, the EMDdoes not obey the triangle inequality). Ex-periments were performed which indicatecompetitive retrieval performance, but noclear winner could be identified among thethree proposed weighing methods.

Another approach that matches sets ofpoints was introduced in Shamir et al.[2003]. There, the point sets to be matchedare obtained for each 3D model by decom-posing the model into a coarse-to-fine hi-erarchy of an elementary shape (spheres).The point sets, therefore, consist of sphereradii and associated centers and can bematched by a custom coarse-to-fine al-gorithm involving exhaustive search ona coarse level and graph matching tech-niques on finer levels in the multiresolu-tion representation.

3.5. Surface Geometry-Based Descriptors

In this Section, we present 3D descriptorsthat are based on object surface measures.These surface measures include surfacecurvature measures as well as the distri-bution of surface normal vectors.

1The basic idea of the earth movers distance is tomeasure the distance between two histograms bysolving the transportation problem of converting onehistogram into the other.

3.5.1. Surface Normal Directions. Paquetand Rioux [2000] consider histograms ofthe angles enclosed between each of thefirst two principal axes and the face nor-mals of all object polygons. Straightfor-ward, it is possible to construct eitherone unifying histogram, or two separatehistograms for the distribution with re-spect to each of the two first principalaxes, or a bivariate histogram which re-flects the dependency between the angles.Intuitively, the bidimensional distributioncontains the most information. Still, suchhistograms are sensitive to the level ofdetail by which the model is repre-sented. An illustrative example is givenby the authors that presents two pyra-mids with the sides of one of themformed by inclined planes and the otherformed by a stairway-like makeup. Ob-viously, the angular distributions of thetwo pyramids will differ tremendously,while their global shape might be quitesimilar.

3.5.2. Surface Curvature. Zaharia andPreteux [2001] present a descriptorfor 3D retrieval proposed within theMPEG-7 framework for multimedia con-tent description. The descriptor reflectscurvature properties of 3D objects. Theshape spectrum FV is defined as thedistribution of the shape index for pointson the surface of a 3D object which is afunction of the two principal curvatures.The shape index is a scaled version of theangular coordinate of a polar representa-tion of the principal curvature vector, andit is invariant with respect to rotation,translation and scale by construction.Figure 18 [Zaharia and Preteux 2001]illustrates some elementary shapes withtheir corresponding shape index values.

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

368 B. Bustos et al.

Fig. 19. Mapping from object normals to the Gaussian sphere.

Because the shape index is not definedfor planar surfaces, but 3D objects areusually approximated by polygon meshes,the authors suggest approximating theshape index by fitting quadratic surfacepatches to all mesh faces based on therespective face and adjacent faces andusing this surface for shape index cal-culation. To compensate for potentialestimation unreliability due to (near)planar surface approximations and (near)isolated polygonal face areas, these areexcluded from the shape index distri-bution based on a threshold criterion,but the relative area of the sum of suchproblematic surface regions is accumu-lated in two additional attributes namedplanar surface and singular surface,respectively. These attributes, togetherwith the shape index histogram, form thefinal descriptor. Experiments conductedby the authors with this descriptor onseveral 3D databases quantitatively showgood retrieval results.

Surface curvature as a description for3D models was also considered by Shumet al. [1996]. In this work, the modelswere resampled by fitting a regularlytessellated spherical polygon mesh ontothe model. Then, the curvature wasdetermined for each vertex of the fittedspherical mesh based on the vertex’neighbor nodes. Finally, the similaritymeasure between two models was ob-tained by minimizing an l p norm betweentheir curvature maps over all rotations

of the map, thereby supporting rotationinvariance.

3.5.3. Extended Gaussian Image. The dis-tribution of the normals of the polygonsthat form a 3D object can be used to de-scribe its global shape. One way to rep-resent this distribution is using the Ex-tended Gaussian Image (EGI) [Horn 1984;Ip and Wong 2002]. The EGI is a mappingfrom the 3D object to the Gaussian sphere(see Figure 19). To compute the EGI of a3D object, the normal vectors of all poly-gons of the 3D objects are mapped onto therespective point of the Gaussian spherethat has the same normal as the polygon.To build a descriptor from this mapping,the Gaussian sphere is partitioned intoR × C cells (by using R different longi-tudes and C−1 different latitudes), whereeach cell corresponds to a range of normalorientations. The number of mapped nor-mals on cell ci j gives the value of this cell.All cell’s values are mapped to a R×C ma-trix which is called the signature of the 3Dobject. The similarity between two objectsignatures a and b is given by sim(a, b) =∑R

i=1

∑Cj=1

(∣∣aij − bij∣∣ /

∣∣aij + bij∣∣) [Ip and

Wong 2002]. The EGI is scale andtranslation invariant, but it requiresrotational normalization. Retrieval per-formance studies were performed inKazhdan et al. [2003] and Funkhouseret al. [2003]. Also, its performance wasevaluated in recognition of aligned human

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 369

head models in Ip and Wong [2002]. Thereis also a complex version of the EGI (CEGI)[Kang and Ikeuchi 1993] which associatesa complex weight to each cell of the EGI.

3.5.4. Shape Descriptor Based on the 3DHough Transform. Zaharia and Preteux[2002] presents a descriptor based onthe 3D Hough transform, the so-calledcanonical 3D Hough transform descrip-tor (C3DHTD). The basic idea of the3D Hough transform is to accumulatethree-dimensional points within a set ofplanes. These planes are determined byparametrizing the space using sphericalcoordinates (e.g., a distances from the ori-gin, b azimuth angles, and c elevation an-gles, thus obtaining a · b · c planes). Eachtriangle t of the object contributes to eachplane p with a weight equal to the projec-tion area of t on p but only if the scalarproduct between the normals of t and p ishigher than a given threshold.

Rotation invariance for this descriptoris approximated by normalizing the 3Dobject with PCA, determining its princi-pal axes, and using its center of gravityas the origin of the coordinate system forthe Hough transform. However, it is ar-gued that PCA may fail to provide thecorrect orientation of a 3D object by justlabeling the principal axes according tothe eigenvalues (in ascending or descend-ing order). For this reason, the 48 possibleHough transforms (one for each possiblePCA-based coordinate system) are aggre-gated into the descriptor.

The direct concatenation of 48 descrip-tors would lead to a high complexityin terms of descriptor size and match-ing computation time. To solve this prob-lem, the authors propose to partition theunit sphere by projecting the vertices ofany regular polyhedron. This partition-ing schema is then used as parametriza-tion of the space. Then they show howto derive all Hough transforms from justone of them which is then called the gen-erating transform. The similarity mea-sure between two canonical 3DHT de-scriptors from objects q and r, hq andhr , respectively, is defined as d (hq , hr ) =

Fig. 20. Silhouettes of a 3D models. Note that, fromleft to right, the viewing direction is parallel to thefirst, second, and third principal axes of the model.Equidistant sampling points are marked along thecontour.

min1≤i≤48

{∣∣∣∣hq − hir

∣∣∣∣}, where the set {hir}

corresponds to one of the 48 possible 3DHough transforms of object r, and || · || de-notes the L1 or L2 norm. Retrieval exper-iments were conducted, contrasting theproposed descriptor with the shape spec-trum (cf. Section 3.5.2) and EGI (cf. Section3.5.3) descriptors, attributing best perfor-mance to the C3DHT descriptor.

3.6. Image-Based Descriptors

In the real world, spatial objects, apartfrom means of physical interaction, arerecognized by humans in the way theyare visually perceived. Therefore, a natu-ral approach is to consider 2D projectionsof spatial objects for similarity estimation.Thereby the problem of 3D retrieval isreduced to one in two dimensions wheretechniques from image retrieval can be ap-plied. One advantage of image-based re-trieval methods over most of the other de-scriptors is that it is straightforward todesign query interfaces where a user sup-plies a 2D sketch which is then input intothe search algorithm [Funkhouser et al.2003; Loffler 2000]. The problem of ro-tational invariance can again be solvedby either rotational normalization prepro-cessing, by using rotational-invariant fea-tures, or by matching over many differentalignments simultaneously.

3.6.1. Description with Silhouettes. A met-hod called silhouette descriptor [Heczkoet al. 2002] characterizes 3D objects interms of their silhouettes that are ob-tained by parallel projections. The objectsare first PCA-normalized and scaled intoa unit cube that is axis-parallel to the

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

370 B. Bustos et al.

Fig. 21. Depth buffer-based feature vector. The first row of images shows thedepth buffers of a car model. Darker pixels indicate that the distance betweenview plane and object is smaller than at brighter pixels. The second row showscoefficient magnitudes of the 2D Fourier transform of the six images.

principal axes. Then, parallel projectionsonto three planes each orthogonal to one ofthe principal axes are calculated. The au-thors propose to obtain descriptors by con-catenating Fourier descriptors of the threeresulting contours. To obtain the descrip-tors, a silhouette contour is scanned byplacing equally-spaced, sequential pointsonto the contour. The sequence of centroid-distances of the (ordered) contour pointsis Fourier transformed, and magnitudesof a number of low-frequency Fourier co-efficients contribute to the feature vector.Via PCA preprocessing, the Silhouette de-scriptor is pose and scale invariant. Figure20 illustrates the contour images of a carobject.

Experimental results on the retrievaleffectiveness of this descriptor were pub-lished in Vranic [2004] and some resultsare also given in Section 4 of this survey.Song and Golshani [2002] also address theusage of projected images for 3D retrieval.The authors propose to render object im-ages from certain directions and to employvarious distance functions on resultingimage pairs, for example, based on circu-larity measures from the projections ordistances between vectors of magnitudesafter Fourier transform. Further work onimage-based retrieval methods has beenreported in Ansary et al. [2004], Loffler[2000], and Cyr and Kimia [2004].

3.6.2. Description with Depth Information.Another image-based descriptor was pro-posed in Heczko et al. [2002] and furtherdiscussed in Vranic [2004]. The so-calleddepth buffer descriptor starts with the

same setup as the silhouette descriptor:the model is oriented and scaled into thecanonical unit cube. Instead of three sil-houettes, six grey-scale images are ren-dered using parallel projection, two eachfor each of the principal axes. Each pixelencodes in an 8-bit grey value the dis-tance from the viewing plane (i.e., sidesof the unit cube) of the object. These im-ages correspond to the concept of z- ordepth-buffers in computer graphics. Af-ter rendering, the six images are trans-formed using the standard 2D discreteFourier transform, and the magnitudes ofcertain k first low-frequency coefficientsof each image contribute to the depthbuffer feature vector of dimensionality 6k.An illustration of this method is given inFigure 21.

From our own experimental results (seeVranic [2004] as well as Section 4.2 inthis article), we conclude that the depthbuffer has good retrieval capability andis able to outperform other descriptors onour benchmarking database.

Figure 21 shows the depth buffer ren-derings of a car object as well as a visu-alization of the respective Fourier trans-forms.

3.6.3. Lightfield Descriptor. Chen et al.[2003] proposed a descriptor based onimages from many different viewingdirections. The authors define the Light-Field descriptor as certain image featuresextracted from a set of silhouettes thatare obtained from parallel projections ofa 3D object. A camera system is definedwhere a camera is located on each of

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 371

Fig. 22. The LightField descriptor determines similarity between 3D objects bythe maximum similarity when aligning sets of projections obtained from an arrayof cameras surrounding the object. (Figure taken from Chen et al. [2003] c© 2003Blackwell Publishing).

the vertices of a dodecahedron which iscentered at the object’s center, completelysurrounding the object (see Figure 22[Chen et al. 2003]). The cameras’ viewingdirections point towards the center of thedodecahedron and the camera up-vectoris uniquely defined. Considering parallelprojections, due to symmetries, at most 10unique silhouettes result from one suchcamera system. The similarity betweentwo objects is then defined as the mini-mum of the sum of distances between allcorresponding image pairs when rotatingone camera system relative to the other,covering all 60 possible alignments of thecamera systems. To further support therotation invariance of this method, theauthors consider not just one, but ten im-ages per dodecahedron vertex obtained byuniformly varying all camera positions inthe neighborhood of the vertex. For a fullobject-to-object comparison run, this leadsto 5, 460 rotations of one camera systemin order to determine the final distance.

The image metric employed to compareeach image pair is the l1 norm over a vec-tor of coefficients composed of 35 Zernikemoments and 10 Fourier coefficients ex-tracted from the rendered silhouettes. Foronline retrieval purposes, this rather ex-pensive algorithm is accelerated by a mul-tistage filter-and-refinement process. Thisprocess gradually increases the number of

rotations, images and vector componentsas well as the component quantization ac-curacy that are considered in each refine-ment iteration, discarding all the objectsthat exhibit a distance greater than themean distance between the query objectand all objects of the database. From ex-periments conducted by the authors aswell as those presented in Shilane et al.[2004], it can be concluded that the re-trieval quality of this method is excellent,and that it outperforms a range of othermethods presented.

3.7. Nonfeature Vector Matching Techniques

Until now, we have reviewed 3D descrip-tors based on rather fast and easy toextract vectors of real-valued measures(feature vectors) defined on model char-acteristics such as spatial extent, surfacecurvature, 2D projections, and so on.While the feature vector approach is prac-tical for application on large databasesof 3D objects, other paradigms for objectdescription and matching, that originatefrom computer vision and shape analysisexist. While very powerful, methodsfrom these fields usually are compu-tationally more complex, may lead torepresentations other than real vectors,and demand for customized distancefunctions.

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

372 B. Bustos et al.

Graphs are a natural choice to cap-ture model topology but they ofteninvolve complex extraction algorithmsand matching strategies. Graphs havebeen derived from model surface [Hi-laga et al. 2001; McWherter et al. 2001;McWherter et al. 2001] and volumet-ric [Sundar et al. 2003] properties of3D models. Similarity calculation canproceed on the graphs themselves viacustomized graph matching approaches[Hilaga et al. 2001; Bespalov et al. 2003] orvia numeric descriptions of the graphs ob-tained, for example, using spectral theory[McWherter et al. 2001] or combinationsof both methods. In Sections 3.7.1 and3.7.2, we exemplarily recall two graph-based techniques, noting that a growingbody of work exists in this area [Bespalovet al. 2003; Biasotti et al. 2003].

Besides graphs and feature vectors, cus-tomized numeric data structures havebeen proposed for 3D description and re-trieval such as the Spin Images recalled inSection 3.7.3.

While all of these methods introduceinteresting matching concepts, their ap-plication to large databases of general 3Dobjects raises problems due to complexityissues or certain restrictions imposed onthe types of 3D models supported.

3.7.1. Topological Matching. Hilaga et al.[2001] present an approach to describe thetopology of 3D objects by a graph structureand show how to use it for matching andretrieval. The algorithm is based on con-structing so-called Reeb graphs from themodels which can be interpreted as infor-mation about the skeletal structure of anobject. The basic idea is to partition the ob-ject into connected portions by analyzinga function μ that is defined over the en-tire object’s surface. Informally, the Reebgraph generated from a 3D object is madeup of nodes that represent portions of theobject for which μ assumes values rangingin certain value intervals. Parent-child re-lationships between nodes represent ad-jacent intervals of these function valuesfor the contained object parts. For comput-ing the similarity of two objects, the au-

thors propose comparing the topology ofthe objects respective Reeb graphs as wellas similarities between the mesh proper-ties of the model parts that are associatedwith corresponding graph nodes.

Defining a suited function μ is criti-cal to the construction of graphs suitedfor object analysis and matching. For ex-ample, the height function h(x, y , z) = zthat returns the height of a surface at po-sition (x, y) is suited to analyze terraindata where orientation is well-defined. Tobe rotation, translation, and scale invari-ant, Hilaga et al. [2001] propose using theappropriately normalized sum of geodesicdistances between a unique central pointand all other points of the model surfaceas the function μ. Intuitively, if for a pointp of the objects surface, μ(p) is relativelylow, p is expected to be closer to the centerof the object, while points on the object’speriphery would possess higher functionvalues. To construct the final descriptorfor an object, the range of possible functionvalues is discretized into a number of bins.For each bin, the restriction of the objectto the parts containing μ-values in the re-spective bin, topologically connected sub-parts are identified and aggregated intoa node of the Reeb graph each. By merg-ing the nodes belonging to adjacent bins, agiven Reeb graph is recursively condensedinto coarser Reeb graphs and the so-calledmultiresolution Reeb graph (MRG) is ob-tained. The authors give details on com-puting the descriptor and a coarse-to-fine MRG matching strategy. Sensitivityand retrieval experiments are reported,indicating that the descriptor is usefulfor retrieving topologically similar objectsaccording to human notion. Figure 23[Hilaqa et al. 2001] schematically illus-trates construction of a Reeb graph andvisualizes the geodesic distance functionon two similar but deformed objects.

3.7.2. Skeleton-Based Object Matching.Skeletons derived from solid objects canbe regarded as intuitive object descrip-tions. They are able to capture importantinformation about the structure of objectswith applications in, for example, object

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 373

Fig. 23. The construction of several Reeb graphs is shown by recursively refining μ-intervals(left). The right image visualizes the aggregated, normalized geodesic distance function eval-uated on two topologically similar objects. (Figure taken from [Hilaga et al. 2001] c© 2001ACM Press).

analysis, compression, or animation.In order to use skeletons for 3D objectretrieval, suitable skeletonization algo-rithms and skeleton similarity functionshave to be defined. In a recent paper,Sundar et al. [2003] presented a frame-work for this task. To obtain a thinskeleton, the authors proposed to firstapply a thinning algorithm on the vox-elization of a solid object. The methodreduces the model voxels to those voxelsthat are important for object reconstruc-tion as determined by a heuristic thatrelates the distance transform value ofeach voxel with the mean of the distancetransform values of the voxels amongits 26 neighbors [Gagvani and Silver1999]. In a second step, the remainingvoxels are clustered, and a minimumspanning tree is constructed connectingthe voxel clusters. Clustering and con-necting proceed subject to the conditionof not violating the object boundary. Theresulting tree may be converted to adirected acyclic graph (DAG) by directingedges guided by the distance transformvalues of the voxels within a cluster.It may also be converted to a uniquelyrooted tree [Siddiqi et al. 1998]. Havingobtained a DAG, a topological signaturevector (TSV) is associated to each node inthe DAG as a node label which is formedby sums of eigenvalues of the adjacencymatrices of all subtrees rooted at theconsidered node. The TSV is used toencode structural information about thesubgraph rooted at the respective node.As another node label, measures for thedistribution of distance transform valuesof the respective cluster members are

Fig. 24. A pair of mutually best-matching objectsfrom a database of about 100 models. (Figure takenfrom Sundar et al. [2003] c© 2003 IEEE).

considered. These node labels constitutethe input to a distance function that mea-sures similarity between individual nodesin skeletal graphs. The final matchingof two skeletal graphs is performed byestablishing a set of node-to-node corre-spondences between the graphs basedon a greedy, recursive bipartite graph-matching algorithm [Shokoufandeh andDickinson 2001]. A final measure for thedissimilarity between two skeletal graphsmay be obtained from the quality of thenode correspondences as determined bytheir node label distance.

The authors demonstrated the capabil-ities of their framework by a number ofmatches obtained from querying a testdatabase (see Figure 24 [Sundar et al.

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

374 B. Bustos et al.

Fig. 25. Building a spin image as a histogram of dis-tances α and β of points in some neighborhood withrespect to basis point p. (Figure taken from Johnsonand Hebert [1998] c© 1998 Elsevier with permissionfrom Elsevier.)

2003] for an example). They emphasizethe method’s suitability for matching ar-ticulated objects and also the potential forfinding partial matches between objects.The approach requires several parametersto be set, for example, the threshold levelsfor thinning and clustering.

3.7.3. Spin-Images. A 3D descriptorusing sets of so-called spin-images tocharacterize 3D objects was proposed byJohnson and Hebert [1999] and, regardingthe matching process, modified in a studyby de Alarcon et al. [2002]. The descriptoris rotation and translation invariant bydesign. It requires a set of points on themodel surface and associated normalvectors (i.e., an oriented point set O) asinput. The basic idea is to generate a set oftwo-dimensional histograms of the objectgeometry in the neighborhood of selectedpoints and to use these descriptions tosearch for point-to-point correspondencesbetween two models. It can also be usedto search for correspondences between amodel and a whole 3D scene. Via refine-ment steps, a final measure for similaritybetween parts of an object or two objectsas a hole can be generated. We recall thedescription generation algorithm in thefollowing.

For each oi ∈ O, O, for example, cho-sen as the centers of mass of (oriented)triangles of a model mesh, a so-called spin-image Si is generated by accumulating theoutput of a mapping R

3 → R2 in a two–

dimensional index, the spin-image. Thesespin-images describe object geometry in

Fig. 26. Selected spin images generated from a 3Dmodel. (Figure taken from Johnson [1997].

the neighborhood of oi. The set of spin im-ages gives an object description by a set ofdescriptions each local to one point oi fromthe object. Given an object point oi and itsassociated normal vector ni, the mappingis performed by building a 2-dimensionalhistogram from distances αi, j and βi, j .αi, j is defined as the distance of all pointso j ∈ O, i �= j to the line L extending ni toinfinity. βi, j is defined as the distance of allpoints o j ∈ O, i �= j to the plane throughoi with normal vector ni. The pair of dis-tance distributions (αi j , βi j ) for a point oiis then discretized into a two-dimensionalhistogram Si, where for each distance bin,the number of points o j that belong to therespective bin is recorded. Note that thismapping is equal to discretizing radiusand elevation components of points o jin a cylindrical coordinate system givenby origin oi and normal vector ni. Viathresholding, the relevant neighborhoodaround oi is controlled; alternatively, onemay consider the complete object as theneighborhood. The authors suggest apply-ing bilinear filtering on the spin-imagesin order to reduce the impact of noise.Scaling invariance is provided by normal-izing the distance range to unit length.Figures 25 [Johnson and Hebert 1998]

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 375

and 26 [Johnson 1997] illustrate the spinimages generation process.

Due to potentially high storage and com-putation overhead when cross-comparingall spin-images of two objects and alsothe presence of redundant informationamong close or symmetrically-related spinimages, Johnson and Hebert [1999] sug-gest performing compression on the setof an object’s spin images using dimen-sion reduction. de Alarcon et al. [2002]propose a two-step data reduction pro-cess by first clustering a spin-image setusing the self-organizing map (SOM) al-gorithm to group similar spin images,followed by application of a clusteringalgorithm. This technique is suited to re-duce the number of required descriptorcomparisons by checking only the spin im-age prototypes. Also, Assfalg et al. [2004]suggested spin image postprocessing tech-niques that help to reduce the number ofspin images used to describe each object.Particularly, spin images were interpretedas grey-scale images which could be ef-ficiently described by a low-dimensionalregion-based description scheme from thecontent-based image retrieval (CBIR) do-main. Orthogonally, fuzzy clustering wasproposed to reduce the number of spin im-ages to a smaller number of prototypesonto which a sum of cluster distance func-tion was suggested.

4. COMPARISON BETWEEN 3DDESCRIPTORS

Comparing the surveyed descriptors is adifficult task since the amount of technicaldetails given in the original literaturedid vary between different descriptorsand most of the authors did employindividually compiled benchmarks whenempirically evaluating retrieval precision.We recognize it is a tremendous task to(re)produce an objective analytic andexperimental comparison of the wealthof 3D retrieval methods that is beyondour resources. What we provide here isa limited comparison of key features ofthe surveyed algorithms as well as anexperimental comparison of the retrievalperformance measured against our own

benchmark of a subset of algorithms ofwhich we possess implementations.

4.1. Qualitative Comparison

We first summarize the surveyed meth-ods using the following main algorithmcharacteristics.

—Proposed dimensionality gives eitherthe dimensionality that was found toperform best if experiments were per-formed or recommended values fromthe literature if available. It is gen-erally agreed that the optimal di-mensionality depends on the experi-mental set up, that is, the choice ofdatabase and ground truth for retrievalexperiments.

—Invariance indicates the provided in-variance (rotation (R), translation (T),scale (S)), and how they are achieved(implicit or by object normalization).

—Required object representation gives theassumed object representation. Typi-cally the algorithms work on triangula-tions, but some assume point clouds orvoxelizations.

—Consistency requirements states theproperties required in addition to geom-etry information, mainly topology andorientation of objects.

—The proposed metric mentions thedistance function used in retrievalexperiments or recommended by therespective authors (if applicable). If aninterval is given, good retrieval resultsare reported throughout the interval.

Table II shows a qualitative comparisonbetween the FVs surveyed in Section3. Where more than one descriptor wasproposed in a descriptor class (e.g., for sta-tistical moments), we chose the descriptorthat was technically described best inthe literature, according to our view. Weomitted descriptors from the table, incases where the technical descriptionin the original sources was insufficientfor this comparison. In general, muchtechnical detail is also encapsulated in theimplementation of object preprocessingsuch as mesh normalization, surface point

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

376 B. Bustos et al.

Table II. Comparison Table Between 3D DescriptorsDescriptor Proposed

Dim.Invariance Req. Obj.

Represent.Req. Obj.Consistency

ProposedMetric

ParametrizedStatistics 3.2.2

64 × 3 × 3 RTS via PCA Triangulation — L2, elasticmatching

Shape distribu-tion 3.2.4

1024 bins RT implicit, Svia hist. norm.

Triangulation — L1

Shape his-tograms 3.4.2

122–240bins

RTS via PCA Point cloud — Quadraticform

Rot. inv. pointcloud 3.4.3

21 RTS via PCA Triangulation — Not specified

Voxel 3.4.4 172 RTS via PCA Triangulation — L1

Volume 3.4.5 486 RTS via PCA Triangulation Orientation L1

Cords 3.2.5 120 RT via PCA, Svia hist. norm.

Triangulation — L1

Ray-based sam-pling 3.3.1

91–169 RTS via PCA Triangulation — L1, L2

Rot. inv. sph.harm. 3.4.6

512 TS via norm., Rimplicit

Triangulation — L2

Reflective sym-metry 3.4.7

— TS via norm., Rassumed

Triangulation — L∞

Surf. normalproperties 3.5.1

n.a. RT via PCA, Simplicit

Triangulation Orientation n.a.

Shape spectrum3.5.2

10–100 RTS implicit Triangulation Orientation L1, L2

Ext. Gaussianimage 3.5.3

200 TS implicit, Rassumed

Triangulation Orientation Histogrammetric

Canonical 3DHT3.5.4

2560 RTS via PCA Triangulation Orientation L1, L2

Weighted pointsets 3.4.8

25 × 3cells insignature

RTS via PCA Triangulation Orientation(some of thevariants)

Solution totransportproblem

Silhouette 3.6.1 375 RTS via PCA Triangulation — L1

Depth buffer3.6.2

366 RTS via PCA Triangulation — L1

Lightfield 3.6.3 45 perimage

RTS implicit Triangulation — Multistagematching

Topologicalmatching 3.7.1

n.a. RTS implicit,non-structuraldeformation

Triangulation Non-disconnectedobjects

Customgraphmatching

Skeletonization3.7.2

n.a. RTS implicit Volumetric Volume Customgraphmatching

Sping Image3.7.3

n.a. RTS implicit Point Cloud — Correlation

resampling, choosing voxel grid resolu-tions, and so forth. Also, parametrizationof the numeric transform methods of-ten employed, such as the Fourier andSpherical Harmonics transform, whichin itself may have an impact on retrievalperformance of the methods, would haveto be considered. Such effects are notreflected in the comparison table.

4.2. Experimental Comparison

The database used for our experimentscontains 1,838 3D objects that we collected

from the Internet.2 From this set, 472 ob-jects were manually classified by shapesimilarity into 55 different model classes.The rest of the objects were left as unclas-sified. Each classified object of each modelclass was used as a query object. The ob-jects belonging to the same model class,excluding the query, were taken as the rel-evant objects.

Table III gives a complete description ofthe classified objects of the database. The

2Konstanz 3D model search engine at http://-merkur01.inf.uni-konstanz.de/CCCC/

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 377

Table III. Description of the Classified Set of Our 3D Object DatabaseClass id Description # of models

1 ants 62 rabbits 43 cows 74 dogs 45 fish-like 136 bees 57 CPUs 48 keyboards 89 cans 4

10 bottles 1411 bowls 412 pots 413 cups 814 wine glasses 915 teapots 416 biplanes 517 helicopters 918 missiles 1619 jet planes 1820 fighter jet planes 2621 propeller planes 1022 other planes 423 zeppelins 624 motorcycles 525 sport cars 626 cars 2327 Formula-1 cars 928 galleons 4

Class id Description # of models

29 submarines 530 warships 531 beds 732 chairs 2433 office chairs 634 sofas 435 benches 336 couches 1137 axes 438 glasses 739 knives 340 screws 341 spoons 342 tables 643 skulls 344 human heads 845 human masks 446 books 447 watches 248 sand clocks 449 swords 2550 barrels 351 birches 452 flower pots 953 trees 1154 weeds 955 human bodies 56

first column indicates the class identifica-tion number. The second column describesthe 3D class models. The last column liststhe number of objects per model class.

We implemented 16 different typesof FVs to perform experiments whichincludes statistical FVs (3D moments),geometry-based FVs (principal curvature,shape distribution, ray-based, ray-basedwith spherical harmonics, shading, com-plex valued shading, cords-based, segmentvolume occupation, voxel-based, 3DDFT,rotation invariant spherical harmonics),image-based FVs (depth buffer, silhou-ette), and other approaches (rotation in-variant point cloud descriptor).

4.3. Computational Complexity ofDescriptors

Firstly, we compared the computationalcomplexity of 16 implemented descriptors.Typically the computational cost of fea-ture extraction is not of primary concernas extraction needs to be done only oncefor a database, while additional extractionmust be performed only for those objects

that are to be inserted into the databaseor for query examples submitted by a userto the database. Nevertheless, we presentsome efficiency measures taken on an In-tel P4 2.4 GHz platform with 1GB of mainmemory, running Microsoft Windows. Wemade the observation that, in general, fea-ture calculation is quite fast for most ofthe methods and 3D objects. Shape spec-trum is an exception. Due to the approxi-mation of local curvature from polygonaldata by the fitting of quadratic surfacepatches to all object polygons, this methodis very computation intensive. In general,PCA object preprocessing only constituteda minor fraction of total extraction cost ason average the PCA cost was only 3.59 sec-onds for the complete database of 1,838objects (1.96 milliseconds per object onaverage).

Figure 27 shows the average extrac-tion time as a function of the dimen-sionality of a descriptor. We did notinclude in this chart those descriptorsthat posses the multiresolution property(because we computed those descriptorsonly once, using the maximum possible

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

378 B. Bustos et al.

Fig. 27. Average extraction time for some of the descriptors while varying theirdimensionality.

dimensionality), and we also discarded thecurves for shape spectrum (almost con-stant and one order of magnitude higherthan the others) and volume (a constantvalue for all possible dimensions, 387milliseconds). It follows that the extrac-tion complexity depends on the imple-mented descriptor. For example, one ofthem has constant extraction complexity(shape distribution), others produce sub-linear curves (e.g., rotation invariant andcords), others produce linear curves (e.g.,ray-moments), and the rest produce super-linear curves (e.g., harmonics 3D and mo-ments).

4.4. Effectiveness Comparison BetweenDescriptors

We use precision versus recall figures[Baeza-Yates and Ribeiro-Neto 1999] forcomparing the effectiveness of the searchalgorithms. Precision is the fraction of theretrieved objects which is relevant to agiven query, and recall is the fraction of therelevant objects which has been retrievedfrom the database. That is, if R is the setof relevant objects to the query, A is theset of objects retrieved, and RA is the set

of relevant objects in the result set, then

precision = |RA||A| and recall = |RA|

|R| .

All our precision versus recall figures arebased on the eleven standard recall levels(0%, 10%, . . . , 100%), and we average theprecision figures over all test queries ateach recall level.

In addition to the precision at multi-ple recall points, we also calculate the R-precision [Baeza-Yates and Ribeiro-Neto1999] for each query which is defined bythe precision when retrieving only the firstN objects, where N is the number of rele-vant objects for the query. The R-precisiongives a single number to rate the perfor-mance of a retrieval algorithm. This mea-sure is similar to the Bull-Eye Percent-age (BEP) score adopted as an evaluationstandard by MPEG-7. The BEP is also asingle value measure and equal to recallwhen retrieving the first 2N objects.

We tested all these FVs using differentlevels of resolution from a few dimensionsup to 512, and we used the Manhattan(L1) distance as the similarity function

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 379

Table IV. Average R-Precision of the 3D DescriptorsDescriptor Best dimensionality Avg. R-precisionDepth buffer 366 0.3220Voxel 343 0.3026Complex valued shading 196 0.2974Rays with spherical harmonics 105 0.2815Silhouette 375 0.27363DDFT 365 0.2622Shading 136 0.2386Ray-based 42 0.2331Rotation invariant point cloud 406 0.2265Rotation invariant spherical harmonics 112 0.2219Shape distribution 188 0.1930Ray moments 363 0.1922Cords-based 120 0.17283D moments 31 0.1648Volume 486 0.1443Principal curvature 432 0.1119

between vectors (we also tested L2

and Lmax , but we consistently obtainedthe best effectiveness scores using L1).Table IV shows the best R-precisionvalues obtained with all the FVs in de-scending order. The first column lists thedifferent descriptors. The second columnindicates the best dimensionality (interms of effectiveness) of the FV. The lastcolumn lists the average R-precision val-ues obtained for each FV with their bestdimensionality.

The best overall FV among our setof implemented methods was the depthbuffer, with an average R-precision of 0.32.The difference in effectiveness betweenthe best and the worst performing FV(depth buffer and principal curvature, re-spectively) was significant. However, thedifference in effectiveness between sim-ilar performing FVs was small speciallywhen comparing the most effective de-scriptors. This implies that, in practice,these best FVs should be suited aboutequally well for retrieval of general polyg-onal objects. As a contrast, the effective-ness difference between the worst and thebest descriptor was significant (up to a fac-tor of 3). We observed that descriptors thatrely on consistent polygon orientation likeshape spectrum or volume exhibited lowretrieval rates as consistent orientation isnot guaranteed for many of the models re-trieved from the Internet. Also, the geo-metrical moment-based descriptors seemto offer only limited discrimination capa-

bilities. Figures 28 and 29 show the pre-cision vs. recall figures for all the imple-mented descriptors (first eight and lasteight descriptors according to Table IV,respectively).

Figures 30 and 31 (first eight and lasteight descriptors, respectively) show theeffect of the descriptor dimensionality onthe overall effectiveness. The figures showthat the effectiveness of the FVs first in-creases with dimensionality, but the im-provement rate diminishes quickly forroughly more than 64 dimensions for mostFVs (except for 3DDFT). It is interest-ing to note that the saturation effect isreached for most descriptors at roughlythe same dimensionality level. This is anunexpected result, considering that differ-ent FVs describe different characteristicsof 3D objects.

We also performed some tests usingthe Princeton Shape Benchmark [Shilaneet al. 2004], to contrast our experimen-tal results with those obtained using adifferent 3D ground truth. In summary,we obtained the same results as with ourdatabase with only minor differences (seeBustos et al. [2005] for details).

4.5. Analysis of the Experimental Results

From the results obtained in our exper-iments on a limited set of implementeddescriptors, we conclude that the bestdescriptors on average are those basedon projections (2D, ray-based) of the

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

380 B. Bustos et al.

Fig. 28. Average precision vs. recall with best dimensionality, first eight descriptors,according to Table IV.

Fig. 29. Average precision vs. recall with best dimensionality, last eight descriptors,according to Table IV.

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 381

Fig. 30. Dimensionality vs. R-precision, first eight descriptors, according to Table IV.

Fig. 31. Dimensionality vs. R-precision, last eight descriptors, according to Table IV.

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

382 B. Bustos et al.

original 3D object, for example, depthbuffer, silhouette, and so forth. Excep-tions to this rule are the voxel FV and the3DDFT FV, which are volumetric descrip-tors and also obtained a good experimentaleffectiveness. Surface-based descriptorsobtained, in general, low effectivenessvalues. All the implemented FVs showedgood robustness with respect to the level ofdetail of the 3D objects. The good retrievalquality of image-based descriptors fromour experiments are in accordance withChen et al. [2003] where an image-baseddescriptor embedded in an advancedmultistage matching framework wasshown to provide excellent retrievalresults and outperformed several otherdescriptors.

However, we also observed significantvariance with respect to the effective-ness of retrieval when comparing the re-sults for classes of objects. For differentclasses of objects, a different FV was usu-ally the most effective one. Unfortunately,we could not find a strong correlation be-tween geometric properties of the 3D ob-ject class and the best suited FV for thatmodel class. A notable exception for this isthe Shape Spectrum descriptor (the worstdescriptor on average) which obtained thebest retrieval effectiveness for the humanmodel class. Shape spectrum was able torecognize different models of human bod-ies in different poses, something that wasnot possible for the rest of the imple-mented FVs. This can be attributed tothe fact that the Shape Spectrum descrip-tor considers the distribution of local cur-vature on the 3D object which does notvary considerably in similar 3D modelswith different poses (e.g., human bodies indifferent positions). Another observationthat we made is that model classes whichare difficult to correctly orient using PCA(cf. Figure 4, first row) are best retrievedby FVs that are inherently rotational in-variant, for example, rotational invariantspherical harmonics.

Besides these specific exceptions, it isdifficult to assess a priori which FV willhave the best retrieval effectiveness foran unknown query object. On average,depth buffer or voxel will do pretty well,

but one would like to always select thebest FV given a query object. There-fore, it is hard to give a recommendationwith respect to which FV to implementinto a 3D similarity search system whenbuilding.

A promising approach to solve thisproblem is to resort to the usage ofcombinations of feature vectors. The ideais to not only use one but several FVstogether, hence taking advantage of theparticularities of each considered FV. Alinear combination of FVs will not providethe optimal results because if one of theconsidered FVs has a very bad effective-ness for the given query object, then it willspoil the final result. Dynamic weightingmethods have been recently proposed[Bustos et al. 2004a, 2004b] which aimto avoid this problem, giving only a highweight to those FVs that are most promis-ing to the query object. The goodness ofa FV is estimated against a training (orreference) database prior to performingthe weighted query against the actualdatabase. The presented experimentalresults showed noticeable improvementsin the overall effectiveness of the retrievalsystem by using dynamically generatedcombinations of FVs.

Regarding the nature of the surveyedFVs, we conclude that they are all pro-posed for usage on databases which donot restrict the type of objects containedtherein. In practice, authors have usedgeneral purpose VRML models obtainedfrom the Internet, representing a widespectrum of objects. Of course, if thetype of models to be supported can beanticipated in advance, it is possible toperform benchmarks targeted at the spe-cific models in order to select the best FVsto implement. On the other hand, if therelevant model features for the retrievaltask are known, it should be possible todesign custom descriptors. For example,in CAD databases, it might be possibleto specify certain geometric featuresrelevant for a construction process soone could design descriptors exploitingsuch knowledge. Identifying application-specific requirements and designingdescriptors that support them is an

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 383

interesting future work with high com-mercial potential.

5. CONCLUSIONS

This survey described a variety of recentlyproposed feature-based descriptors for3D objects and introduced a taxonomyto classify them. We believe that the re-ported feature extraction methods presentthe first important achievements in thesearch for general purpose, fast retrievalalgorithms for 3D object databases. Thefeature vector approach maps 3D objectsto a vector of real values which, in turn,can be used for distance calculation. Fur-thermore, it makes applicable the widearea of multimedia indexing techniqueswhich have been researched for a longtime now [Bohm et al. 2001]. Retrievalsystems may also profit from semi-interactive query enhancement methods,like relevance feedback [Elad et al. 2002],annotation information [Zhang and Chen2001], or feature selection and combina-tion techniques [Bustos et al. 2004b].

While many sophisticated object anal-ysis and matching methods exist inthe domain of computational geometryand computer vision, these are usuallytailored to specific recognition problems,and it is questionable whether they easilyextend to the database retrieval problem.This is due to restrictions imposed onthe objects and due to computationalcomplexity issues. Extracting featurevectors from skeletal representations ofthe objects [Lou et al. 2004; Sundar et al.2003] is an interesting approach but,to date, its applicability to the databaseretrieval problem in terms of effectivenessand efficiency is unclear. Specifically, therobustness of such methods with respectto feature extraction parametrization hasto be explored.

Considering the wealth of feature ex-traction methods proposed so far (newmethods defining novel 3D features areproposed regularly), selecting the onesto use when building an actual 3D re-trieval system is a difficult problem. Acomplete and fair comparison of all themain available methods does not seem fea-

sible as it is currently more attractive forresearchers to propose new methods thanto re-implement existing ones. But consid-ering computational complexity and ob-ject consistency requirements can provideguidance in order to select application-specific methods to implement.

In this survey, we compared the com-putational complexity of certain featurevectors which are currently implementedin our own system. In practice, thenormalization step and the descriptorcomputation cost is small, and almost alldescriptors can be computed in less thana second for an object, on average, on astandard workstation. As the descriptorcomputation must be performed only onceper object, this implies that the describeddescriptors can be used for real-worldapplications.

We also experimentally compared sev-eral of these 3D FVs on a classifieddatabase of 3D objects formed by modelscollected from the Internet, and we com-pared their retrieval performance usingstandard effectiveness measures from theinformation retrieval domain (precisionvs. recall diagrams and the R-precisionvalues). Our experimental comparison of16 different 3D FVs shows that there arenumber of them that have good averageeffectiveness and work well in most cases(e.g., depth buffer, voxel and complex FVs)for the types of 3D models found on theInternet today.

There remain important open problemsin the research of content-based descrip-tion and retrieval of 3D objects, some ofwhich we outline in the following.

To consider searching 3D objectsfrom heterogeneous databases wherethe objects may be arbitrarily scaled andoriented in their respective coordinate sys-tems, scale and rotation invariant meth-ods must either normalize the models oremploy descriptions that provide these in-variances implicitly. Most methods to dateadvocate rotation and translation normal-ization based on Principal ComponentsAnalysis. As PCA may lead to counterintu-itive alignment results for certain 3D mod-els [Funkhouser et al. 2003; Tangelderand Veltkamp 2003], extensions and

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

384 B. Bustos et al.

alternatives to the PCA-based normaliza-tion need to be devised. Depending on theapplication domain, additional invariancemay be desirable, for example, invari-ance with respect to local deformations ingeometry and topology or invariance withrespect to anisotropic scaling [Kazhdanet al. 2004].

Also, the current methods focus mainlyon geometric aspects of 3D models. Leftaside are other attributes which arepresent in many 3D databases: color, ma-terial properties, and texture can be spec-ified in many formats, such as the popularVRML format. More specialized formatsfrom the CAD domain usually also containstructural object information and machin-ing process information; this informationmight as well be exploited for 3D retrieval.

How to improve the efficiency of 3Dsearch systems is also an open issue. Theneed for appropriate indexing techniques,considering the high dimensionality of thedescriptors, seems obvious. Moreover, ifwe consider the segmentation of objectsas a possible approach for partial similar-ity search, then the original database witha few thousands of models can be trans-formed into a database with millions ofobjects where efficiency considerations be-come mandatory.

ACKNOWLEDGMENTS

We thank the editors of ACM Computing Surveys

and the anonymous referees for their helpful com-

ments on the earlier version of this article. We also

thank the authors and publishing houses that gave

us permission to use some of the figures included in

this article.

REFERENCES

ANKERST, M., KASTENMULLER, G., KRIEGEL, H.-P., AND

SEIDL, T. 1999a. 3D shape histograms forsimilarity search and classification in spatialdatabases. In Proceedings of the 6th Inter-national Symposium on Advances in SpatialDatabases (SSD’99). Springer-Verlag, London,UK, 207–226.

ANKERST, M., KASTENMULLER, G., KRIEGEL, H.-P., AND

SEIDL, T. 1999b. Nearest neighbor classifica-tion in 3D protein databases. In Proceedings ofthe 7th International Conference on IntelligentSystems for Molecular Biology. AAAI Press, 34–43.

ANSARY, T. F., VANDEBORRE, J.-P., MAHMOUDI, S., AND

DAOUDI, M. 2004. A bayesian framework for3D models retrieval based on characteristicviews. In Proceedings of the 2nd InternationalSymposium on 3D Data Processing, Visualiza-tion, and Transmission. IEEE Computer Society,139–146.

ASHLEY, J., FLICKNER, M., HAFNER, J., LEE, D., NIBLACK,W., AND PETKOVIC, D. 1995. The query by imagecontent (QBIC) system. SIGMOD Rec. 24, 2, 475.

ASSFALG, J., BIMBO, A. D., AND PALA, P. 2004. Re-trieval of 3D objects by visual similarity. In Pro-ceedings of the 6th ACM SIGMM InternationalWorkshop on Multimedia Information Retrieval(MIR’04). ACM Press, New York, NY, 77–83.

BAEZA-YATES, R. A. AND RIBEIRO-NETO, B. 1999.Modern Information Retrieval. Addison-WesleyLongman Publishing Co., Inc., Boston, MA.

BERCHTOLD, S., KEIM, D. A., AND KRIEGEL, H.-P. 1997.Using extended feature objects for partial simi-larity retrieval. VLDB J. 6, 4, 333–348.

BESPALOV, D., SHOKOUFANDEH, A., REGLI, W. C., AND

SUN, W. 2003. Scale-space representation of3D models and topological matching. In Proceed-ings of the 8th ACM Symposium on Solid Model-ing and Applications (SM’03). ACM Press, NewYork, NY, 208–215.

BIASOTTI, S., MARINI, S., MORTARA, M., PATANE, G.,SPAGNUOLO, M., AND FALCIDIENO, B. 2003. 3Dshape matching through topological structures.In Proceedings of the 11th International Confer-ence on Discrete Geometry for Computer Imagery.Lecture Notes in Computer Science, Vol. 2886.Springer, 194–203.

BOHM, C., BERCHTOLD, S., AND KEIM, D. A. 2001.Searching in high-dimensional spaces: Indexstructures for improving the performanceof multimedia databases. ACM Comput.Surv. 33, 3, 322–373.

BUSTOS, B., KEIM, D. A., SAUPE, D., SCHRECK, T., AND

VRANIC, D. 2004a. Automatic selection andcombination of descriptors for effective 3D sim-ilarity search. In Proceedings of the IEEE In-ternational Workshop on Multimedia Content-based Analysis and Retrieval. IEEE ComputerSociety, 514–521.

BUSTOS, B., KEIM, D. A., SAUPE, D., SCHRECK, T., AND

VRANIC, D. 2004b. Using entropy impurity forimproved 3D object similarity search. In Pro-ceedings of the IEEE International Conferenceon Multimedia and Expo. IEEE, 1303–1306.

BUSTOS, B., KEIM, D. A., SAUPE, D., SCHRECK, T., AND

VRANIC, D. 2005. An experimental effective-ness comparison of methods for 3D similaritysearch. J. Digital Libraries. Springer-Verlag.

CAMPBELL, R. J. AND FLYNN, P. J. 2001. A surveyof free-form object representation and recogni-tion techniques. Comput. Vision Image Under-stand. 81, 2, 166–210.

CHAVEZ, E., NAVARRO, G., BAEZA-YATES, R., AND MAR-ROQUıN, J. L. 2001. Searching in metric spaces.ACM Comput. Surv. 33, 3, 273–321.

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 385

CHEN, D.-Y., TIAN, X.-P., SHEN, Y.-T., AND OUHYOUNG,M. 2003. On visual similarity based 3D modelretrieval. Computer Graphics Forum 22, 3, 223–232.

CYR, C. M. AND KIMIA, B. B. 2004. A similarity-based aspect-graph approach to 3D object recog-nition. Int. J. Comput. Vision 57, 1, 5–22.

DE ALARCON, P. A., PASCUAL-MONTANO, A. D., AND

CARAZO, J. M. 2002. Spin images and neuralnetworks for efficient content-based retrieval in3D object databases. In Proceedings of the In-ternational Conference on Image and Video Re-trieval (CIVR’02). Springer-Verlag, London, UK,225–234.

ELAD, M., TAL, A., AND AR, S. 2002. Content basedretrieval of VRML objects: an iterative and in-teractive approach. In Proceedings of the 6thEurographics Workshop on Multimedia 2001.Springer-Verlag New York, NY, 107–118.

FALOUTSOS, C. 1996. Searching MultimediaDatabases by Content. Kluwer AcademicPublishers, Norwell, MA.

FUNKHOUSER, T., MIN, P., KAZHDAN, M., CHEN, J., HAL-DERMAN, A., DOBKIN, D., AND JACOBS, D. 2003.A search engine for 3D models. ACM Trans.Graph. 22, 1, 83–105.

GAGVANI, N. AND SILVER, D. 1999. Parameter-controlled volume thinning. Graph. Models Im-age Process. 61, 3, 149–164.

GERADTS, Z., HARDY, H., POORTMAN, A., AND BIJHOLD, J.2001. Evaluation of contents-based image re-trieval methods for a database of logos on drugtablets. In Proceedings of SPIE Enabling Tech-nologies for Law Enforcement and Security. Vol.4232, 553–562.

HEALY, D. M., ROCKMORE, D. N., KOSTELEC, P. J., AND

MOORE, S. S. B. 2003. FFTs for the 2-sphere -Improvements and variations. J. Fourier Analy.Appl. 9, 4, 341–385.

HECZKO, M., KEIM, D. A., SAUPE, D., AND VRANIC,D. 2002. Methods for similarity search on 3Ddatabases. Datenbank-Spektrum 2, 2, 54–63. (InGerman).

HILAGA, M., SHINAGAWA, Y., KOHMURA, T., AND KUNII,T. L. 2001. Topology matching for fully auto-matic similarity estimation of 3D shapes. In Pro-ceedings of the 28th Annual Conference on Com-puter Graphics and Interactive Techniques (SIG-GRAPH’01). ACM Press, New York, NY, 203–212.

HORN, B. 1984. Extended Gaussian image. Pro-ceedings of the IEEE 72, 12, 1671–1686.

IP, C. Y., LAPADAT, D., SIEGER, L., AND REGLI, W. C.2002. Using shape distributions to comparesolid models. In Proceedings of the 7th ACMSymposium on Solid Modeling and Applications(SMA’02). ACM Press, New York, NY, 273–280.

IP, C. Y., REGLI, W. C., SIEGER, L., AND SHOKOUFAN-DEH, A. 2003. Automated learning of modelclassifications. In Proceedings of the 8th ACMSymposium on Solid Modeling and Applica-

tions (SM’03). ACM Press, New York, NY, 322–327.

IP, H. AND WONG, W. 2002. 3D head models re-trieval based on hierarchical facial region sim-ilarity. In Proceedings of the 15th InternationalConference on Vision Interface. 314–319.

JOHNSON, A. E. 1997. Spin-images: A representa-tion for 3-D surface matching. Ph.D. thesis,Robotics Institute, Carnegie Mellon University.

JOHNSON, A. E. AND HEBERT, M. 1998. Surfacematching for object recognition in complexthree-dimensional scenes. Image Vision Com-put. 16, 9–10, 635–651.

JOHNSON, A. E. AND HEBERT, M. 1999. Using spinimages for efficient object recognition in clut-tered 3D scenes. IEEE Trans. Pattern Anal.Mach. Intell. 21, 5, 433–449.

KANG, S. B. AND IKEUCHI, K. 1993. The complexegi: A new representation for 3-d pose deter-mination. IEEE Trans. Pattern Anal. Mach. In-tell. 15, 7, 707–721.

KATO, T., SUZUKI, M., AND OTSU, N. 2000. A similar-ity retrieval of 3D polygonal models using rota-tion invariant shape descriptors. In Proceedingsof the IEEE International Conference on Sys-tems, Man Management, and Cybernetics. 2946–2952.

KAZHDAN, M., CHAZELLE, B., DOBKIN, D., FUNKHOUSER,T., AND RUSINKIEWICZ, S. 2003. A reflectivesymmetry descriptor for 3D models. Algorith-mica 38, 1, 201–225.

KAZHDAN, M., FUNKHOUSER, T., AND RUSINKIEWICZ, S.2004. Shape matching and anisotropy. ACMTrans. Graph. 23, 3, 623–629.

KEIM, D. A. 1999. Efficient geometry-based simi-larity search of 3D spatial databases. In Pro-ceedings of the 1999 ACM SIGMOD Interna-tional Conference on Management of Data (SIG-MOD’99). ACM Press, New York, NY, 419–430.

KRIEGEL, H.-P., BRECHEISEN, S., KRUGER, P., PFEIFLE,M., AND SCHUBERT, M. 2003. Using sets of fea-ture vectors for similarity search on voxelizedCAD objects. In Proceedings of the 2003 ACMSIGMOD International Conference on Manage-ment of Data (SIGMOD’03). ACM Press, NewYork, NY, 587–598.

LEIFMAN, G., KATZ, S., TAL, A., AND MEIR, R. 2003.Signatures of 3D models for retrieval. In Pro-ceedings of the 4th Israel-Korea Bi-National Con-ference on Geometric Modeling and ComputerGraphics. 159–163.

LOFFLER, J. 2000. Content-based retrieval of 3Dmodels in distributed web databases by visualshape information. In Proceedings of the Inter-national Conference on Information Visualisa-tion (IV’00). IEEE Computer Society, Washing-ton, DC, 82.

LONCARIC, S. 1998. A survey of shape analysis tech-niques. Pattern Recog. 31, 8, 983–1001.

LOU, K., PRABHAKAR, S., AND RAMANI, K. 2004.Content-based three-dimensional engineering

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

386 B. Bustos et al.

shape search. In Proceedings of the 20th In-ternational Conference on Data Engineering(ICDE’04). IEEE Computer Society, Washing-ton, DC, 754–765.

MCWHERTER, D., PEABODY, M., REGLI, W. C., AND SHOK-OUFANDEH, A. 2001. Transformation invariantshape similarity comparison of solid models.In ASME Design Engineering Technical Confer-ences, 6th Design for Manufacturing Conference(DETC 2001/DFM-21191).

MCWHERTER, D., PEABODY, M., SHOKOUFANDEH, A. C.,AND REGLI, W. 2001. Database techniques forarchival of solid models. In Proceedings of the6th ACM Symposium on Solid Modeling and Ap-plications (SMA’01). ACM Press, New York, NY,78–87.

NGU, A. H. H., SHENG, Q. Z., HUYNH, D. Q., AND LEI, R.2001. Combining multi-visual features for effi-cient indexing in a large image database. VLDBJ. 9, 4, 279–293.

NOVOTNI, M. AND KLEIN, R. 2001a. Geometric 3Dcomparison - An application. In ECDL WS Gen-eralized Documents.

NOVOTNI, M. AND KLEIN, R. 2001b. A geometric ap-proach to 3D object comparison. In Proceedingsof the International Conference on Shape Mod-eling & Applications (SMI’01). IEEE ComputerSociety, Washington, DC, 167–175.

NOVOTNI, M. AND KLEIN, R. 2003. 3D Zernike de-scriptors for content based shape retrieval. InProceedings of the 8th ACM Symposium on SolidModeling and Applications (SM’03). ACM Press,New York, NY, 216–225.

NOVOTNI, M. AND KLEIN, R. 2004. Shape retrievalusing 3d zernike descriptors. Comput. Aid. De-sign 36, 11, 1047–1062.

OHBUCHI, R., MINAMITANI, T., AND TAKEI, T. 2003.Shape-similarity search of 3D models by us-ing enhanced shape functions. In Proceedings ofthe Theory and Practice of Computer Graphics(TPCG’03). IEEE Computer Society, Washing-ton, DC, 97.

OHBUCHI, R., OTAGIRI, T., IBATO, M., AND TAKEI,T. 2002. Shape-similarity search of three-dimensional models using parameterized statis-tics. In Proceedings of the 10th Pacific Conferenceon Computer Graphics and Applications (PG’02).IEEE Computer Society, Washington, DC, 265–274.

OSADA, R., FUNKHOUSER, T., CHAZELLE, B., AND DOBKIN,D. 2002. Shape distributions. ACM Trans.Graph. 21, 4, 807–832.

PAQUET, E., MURCHING, A., NAVEEN, T., TABATABAI, A.,AND RIOUX, M. 2000. Description of shape in-formation for 2-D and 3-D objects. Signal Pro-cess. Image Comm. 16, 103–122.

PAQUET, E. AND RIOUX, M. 2000. Nefertiti: A tool for3-D shape databases management. Image VisionComput. 108, 387–393.

PUZICHA, J., BUHMANN, J. M., RUBNER, Y., AND TOMASI,C. 1999. Empirical evaluation of dissimilar-

ity measures for color and texture. In Proceed-ings of the International Conference on ComputerVision-Volume 2 (ICCV’99). IEEE Computer So-ciety, Washington, DC, 1165–1173.

RONNEBERGER, O., BURKHARDT, H., AND SCHULTZ, E.2002. General-purpose object recognition in 3Dvolume data sets using gray-scale invariants -classification of airborne pollen-grains recordedwith a confocal laser scanning microscope. InProceedings of the 16th International Conferenceon Pattern Recognition. Vol. 2. IEEE ComputerSociety, 290–295.

RUBNER, Y., TOMASI, C., AND GUIBAS, L. J. 1998. Ametric for distributions with applications to im-age databases. In Proceedings of the 6th Interna-tional Conference on Computer Vision (ICCV’98).IEEE Computer Society, Washington, DC, 59–66.

SANCHEZ-CRUZ, H. AND BRIBIESCA, E. 2003. Amethod of optimum transformation of 3D ob-jects used as a measure of shape dissimilarity.Image Vision Comput. 21, 11, 1027–1036.

SEIDL, T. AND KRIEGEL, H.-P. 1997. Efficient user-adaptable similarity search in large multime-dia databases. In Proceedings of the 23rd Inter-national Conference on Very Large Data Bases(VLDB’97). Morgan Kaufmann Publishers Inc.,San Francisco, CA, 506–515.

SHAMIR, A., SHARF, A., AND COHEN-OR, D. 2003. En-hanced hierarchical shape matching for shapetransformation. Int. J. Shape Model. 9, 2.

SHILANE, P., MIN, P., KAZHDAN, M., AND FUNKHOUSER,T. 2004. The Princeton shape benchmark. InProceedings of the Shape Modeling International2004 (SMI’04). IEEE Computer Society, Wash-ington, DC, 167–178.

SHOKOUFANDEH, A. AND DICKINSON, S. J. 2001. A uni-fied framework for indexing and matching hi-erarchical shape structures. In Proceedings ofthe 4th International Workshop on Visual Form(IWVF-4). Springer-Verlag, London, UK, 67–84.

SHUM, H.-Y., HEBERT, M., AND IKEUCHI:, K. 1996.On 3D shape similarity. In Proceedings of the1996 Conference on Computer Vision and PatternRecognition (CVPR’96). IEEE Computer Society,Washington, DC, 526–531.

SIDDIQI, K., SHOKOUFANDEH, A., DICKINSON, S. J., AND

ZUCKER, S. W. 1998. Shock graphs and shapematching. In Proceedings of the 6th Interna-tional Conference on Computer Vision (ICCV’98).IEEE Computer Society, Washington, DC, 222–229.

SONG, J.-J. AND GOLSHANI, F. 2002. 3D object re-trieval by shape similarity. In Proceedings ofthe 13th International Conference on Databaseand Expert Systems Applications (DEXA’02).Springer-Verlag, London, UK, 851–860.

SUNDAR, H., SILVER, D., GAGVANI, N., AND DICKINSON,S. J. 2003. Skeleton based shape matchingand retrieval. In Proceedings of the Shape Model-ing International (SMI’03). IEEE Computer So-ciety, Washington, DC, 130–142.

ACM Computing Surveys, Vol. 37, No. 4, December 2005.

Feature-Based Similarity Search in 3D Object Databases 387

TANGELDER, J. AND VELTKAMP, R. 2003. Polyhedralmodel retrieval using weighted point sets. Int. J.Image and Graphics 3, 1, 209–229.

TEODORO, M. L., PHILLIPS, G. N., AND KAVRAKI, L. E.2001. Molecular docking: A problem with thou-sands of degrees of freedom. In Proceedings of theIEEE International Conference on Robotics andAutomation. IEEE, 960–966.

VRANIC, D. 2003. An improvement of rotation in-variant 3D shape descriptor based on functionson concentric spheres. In Proceedings of theIEEE International Conference on Image Pro-cessing. Vol. 3. IEEE, 757–760.

VRANIC, D. 2004. 3D model retrieval. Ph.D. thesis,University of Leipzig, Germany.

VRANIC, D. AND SAUPE, D. 2000. 3D model retrieval.In Proceedings of the Spring Conference on Com-puter Graphics and its Applications. ComeniusUniversity, 89–93.

VRANIC, D. AND SAUPE, D. 2001a. 3D model re-trieval with spherical harmonics and moments.In Proceedings of the 23rd DAGM-Symposiumon Pattern Recognition. Springer-Verlag, Lon-don, UK, 392–397.

VRANIC, D. AND SAUPE, D. 2001b. 3D shape descrip-tor based on 3D fourier transform. In Proceed-

ings of the EURASIP Conference on Digital Sig-nal Processing for Multimedia Communicationsand Services. Comenius University, 271–274.

VRANIC, D. AND SAUPE, D. 2002. Description of 3D-shape using a complex function on the sphere.In Proceedings of the IEEE International Con-ference on Multimedia and Expo. 177–180.

VRANIC, D., SAUPE, D., AND RICHTER, J. 2001. Toolsfor 3D-object retrieval: Karhunen-Loeve trans-form and spherical harmonics. In Proceedings ofthe IEEE 4th Workshop on Multimedia SignalProcessing. 293–298.

ZAHARIA, T. AND PRETEUX, F. 2001. Three-dimensional shape-based retrieval withinthe MPEG-7 framework. In Proceedings of theSPIE Conference on Nonlinear Image Processingand Pattern Analysis XII. 133–145.

ZAHARIA, T. AND PRETEUX, F. 2002. Shape-based re-trieval of 3D mesh models. In Proceedings of theIEEE International Conference on Multimediaand Expo (ICME’02).

ZHANG, C. AND CHEN, T. 2001. Indexing and re-trieval of 3D models aided by active learning. InProceedings of the 9th ACM International Con-ference on Multimedia (MULTIMEDIA’01). ACMPress, New York, NY, 615–616.

Received December 2004; revised October 2005; accepted December 2005

ACM Computing Surveys, Vol. 37, No. 4, December 2005.