
Flexible Syntactic Matching of Curves and Its Application to Automatic Hierarchical Classification of Silhouettes

Yoram Gdalyahu and Daphna Weinshall, Member, IEEE

Abstract—Curve matching is one instance of the fundamental correspondence problem. Our flexible algorithm is designed to match curves under substantial deformations and arbitrary large scaling and rigid transformations. A syntactic representation is constructed for both curves and an edit transformation which maps one curve to the other is found using dynamic programming. We present extensive experiments where we apply the algorithm to silhouette matching. In these experiments, we examine partial occlusion, viewpoint variation, articulation, and class matching (where silhouettes of similar objects are matched). Based on the qualitative syntactic matching, we define a dissimilarity measure and we compute it for every pair of images in a database of 121 images. We use this experiment to objectively evaluate our algorithm: First, we compare our results to those reported by others. Second, we use the dissimilarity values in order to organize the image database into shape categories. The veridical hierarchical organization stands as evidence to the quality of our matching and similarity estimation.

Index Terms—Curve matching, syntactic matching, image database, silhouettes.

    1 INTRODUCTION

GIVEN a large collection of images, unraveling its redundancies is an important and challenging task. One could use this knowledge to assist in image querying and to construct more efficient and compact image representations. In order to identify redundancy in the database, we propose the following approach: First, design algorithms to measure the similarity between images. Second, given pairwise image similarity, use similarity-based clustering to reveal the structure in the data by hierarchically dividing the images into distinct clusters. Third, identify redundancy in each cluster and use it to prune the database and pick cluster representatives; this would allow for efficient indexing into the database.

It is important to distinguish between our approach, where the clustering of N images uses only their N × N similarity matrix, and the more typical approach, where images are first embedded in some D-dimensional vector space whose dimension should be significantly reduced using such methods as PCA. Mapping an image into such a space, in effect, requires the identification of D measurements (or "features") that completely describe the image. This has proven to be an elusive task. The task of image comparison, on the other hand, seems more within our reach: Rather than look for an explicit representation of images as vectors, we seek an algorithm (as complex as necessary) which receives as input two images and returns as output the similarity between them.

In this paper, we focus on the design of similarity measures, limiting ourselves to the shape dimension of similarity and ignoring other dimensions (e.g., color, motion, and context). The similarity is, therefore, defined as similarity between silhouettes. We describe our silhouette matching algorithm and show results of extensive experiments with real images. We then outline a stochastic clustering algorithm (described at length in [14]) and argue that the good clustering results we show give objective evidence to the quality of our silhouette matching algorithm. The issue of database pruning, and how to identify within-cluster redundancies to allow for efficient indexing into the database, is discussed in later work [21].

More specifically, we describe a novel flexible curve matching algorithm to relate between feature points that are automatically extracted on the boundaries of objects. Unlike most pattern recognition applications of clustering, we use real images of three-dimensional objects, basing our similarity measure on the shape of their occluding contours. Our algorithm is designed to give a graded similarity value, where low values reflect similarity between weakly similar curves and higher values indicate strong similarity. To illustrate, given two curves describing the shape of two different mammals, we consider our algorithm to be "successful" if it matches their limbs and head correspondingly. The matched pairs of feature points are then aligned using an optimal 2D similarity transformation (translation, rotation, and scale). From the residual distances between corresponding features, we compute a robust dissimilarity measure between silhouettes. The matching algorithm is outlined in Section 3 and the resulting dissimilarity values are compared with those reported in the literature.

According to the general approach adopted in this paper, our next step is to feed the computed dissimilarities into a pairwise clustering algorithm to obtain hierarchical clusters of similar images. A pairwise clustering algorithm exploits only proximity information and is, therefore, suitable when vectorial representation of images is not available. Instead, the images are represented as nodes in a graph with edges whose weights reflect the similarity between every image pair. In [14], we describe our stochastic clustering algorithm, which is outlined in Section 4. Our algorithm determines the number of clusters in a (true) hierarchical manner and tolerates, to some extent, violations of metric properties (i.e., violation of the triangle inequality). We demonstrate perceptually veridical results using a database of 121 images of 12 different objects, which are hierarchically classified. The useful clustering results illustrate the quality of our matching algorithm and the usefulness of our general approach.

1312 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 21, NO. 12, DECEMBER 1999

The authors are with the Institute of Computer Science, The Hebrew University, 91904 Jerusalem, Israel. E-mail: {yoram, daphna}@cs.huji.ac.il.
Manuscript received 4 Jan. 1998; revised 19 Oct. 1999. Recommended for acceptance by K. Bowyer. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number 107787.
0162-8828/99/$10.00 © 1999 IEEE
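To make the pairwise setting concrete, the following sketch is a minimal average-link agglomeration driven only by a precomputed dissimilarity matrix, with no vector embedding of the images. It is a generic illustration of the pairwise approach, not the stochastic algorithm of [14], and the matrix values are invented toy data:

```python
def cluster_from_dissimilarity(D, n_clusters):
    """Average-link agglomeration on a symmetric pairwise dissimilarity
    matrix D (lists of lists); no vectorial image representation needed."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average dissimilarity.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(D[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    return clusters

# Toy matrix: images 0,1 are mutually similar, as are images 2,3.
D = [[0.0, 0.1, 0.9, 0.8],
     [0.1, 0.0, 0.85, 0.9],
     [0.9, 0.85, 0.0, 0.2],
     [0.8, 0.9, 0.2, 0.0]]
```

Repeating the merge step while recording the merge order yields the hierarchy; the true algorithm of [14] additionally decides the number of clusters and tolerates metric violations.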

2 CURVE MATCHING: PROBLEM AND RELATED WORK

Contour matching is an important problem in computer vision with a variety of applications, including model-based recognition, depth from stereo, and tracking. In these applications, the two matched curves are usually very similar. For example, a typical application of curve matching to model-based recognition would be to decide whether a model curve and an image curve are the same, up to some scaling or 2D rigid transformation and some permitted level of noise.

In this paper, we are primarily interested in the case where the similarity between the two curves is weak. The organization of silhouettes into shape categories (like tools, cars, etc.) necessitates flexible matching, which can support graded similarity estimation.

While our approach focuses on the silhouette boundary, a dual approach is based on its medial axis. Specifically, a medial axis, together with singularities labeling, forms a shock graph representation, and matching shock graphs is an isomorphism problem. The methods for solving it include semidefinite programming [38], replicator dynamics [32], graduated assignment [36], and syntactic graph matching [40]. In some of these cases, the matching is only structural, while, in others, two levels of matching (structural and metrical) are supported. The methods based on shock graphs succeed in defining a graded similarity measure and may be combined with suitable database indexing [37], [24]. In this paper, we show, however, that our results are of the same quality in spite of using boundary representation, which is inherently less sensitive to occlusion and which does not involve the NP-complete graph isomorphism problem.

To put our method in the context of existing work on boundary matching, we first distinguish between dense matching and feature matching. Dense matching is usually formulated as a parameterization problem, with some cost function to be minimized. The cost might be defined as the "elastic energy" needed to transform one curve to the other [5], [8], [10], but other alternatives exist [2], [12], [15], [30]. The main drawbacks of these methods are their high computational complexity (which is reduced significantly if only key points are matched) and the fact that they are usually not invariant under both 2D rotation and scaling. In addition, the computation of elastic energy (which is defined in terms of curvature) is scale dependent and requires accurate evaluation of second order derivatives.

Feature matching methods may be divided into three groups: proximity matching, spread primitive matching, and syntactic matching. The idea behind proximity matching methods is to search for the best matching while permitting the rotation, translation, and scaling (to be called alignment transformation) of each curve such that the distances between matched key points are minimized [4], [20], [22], [45]. Consequently, these methods are rather slow; moreover, if scaling is permitted, an erroneous shrinking of one feature set may result, followed by the matching of the entire set with a small number of features from the other set. One may avoid these problems by excluding many-to-one matches and by using the points' order, but then the method becomes syntactic (see below). Moreover, we illustrate in Section 5.7.3 why proximity matching is not adequate for weakly similar curves. As an alternative to the alignment transformation, features may be mapped to an intrinsic invariant coordinate frame [28], [34], [35]; the drawback of this approach is that it is global, as the entire curve is needed to correctly compute the mapping.

Features can be used to divide the curves into shape elements or primitives. If a single curve can be decomposed into shape primitives, the matching algorithm should be constrained to preserve their order. But, in the absence of any ordering information (as in stereo matching of many small fragments of curves), the matching algorithm may be called "spread primitive matching." In this category, we find algorithms that seek isomorphism between attributed relational graphs [6], [9], [26] and algorithms that look for the largest set of mutually compatible matches. Here, compatibility means an agreement on the induced coordinate transformation, and a few techniques exist to find the largest set of mutually compatible matches (e.g., clustering in Hough space [39], geometrical hashing [25], and clique finding in an association graph [7], [11], [18], [23]). Note that, at the application level, finding isomorphism between attributed relational graphs is the same problem as finding isomorphism between shock graphs (discussed above), although, in the last case, an additional constraint may apply [32].

For our purpose of matching complex outlines, it is advantageous to use the natural order of primitives. This results in a great simplification, and there is no need to solve the difficult graph isomorphism problem. Moreover, the relations encoded by the attributed relational graphs need to be invariant with respect to 2D image transformations and, as a result, they are usually nonlocal.

A syntactical representation of a curve is an ordered list of shape elements, having attributes like length, orientation, bending angle, etc. Hence, many syntactical matching methods are inspired by efficient and well-known string comparison algorithms, which use edit operations (substitution, deletion, and insertion) to transform one string to the other [17], [29], [46]. The vision problem is different from the string matching problem in two major aspects, however: First, in vision, invariance to certain geometrical transformations is desired; second, a resolution degradation (or smoothing) may create a completely different list of elements in the syntactical representation.
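The string edit machinery these methods build on can be sketched in a few lines. This is textbook dynamic programming over the three edit operations, not the paper's cost model (whose operation costs may be negative and which adds merging and interruption penalties):

```python
def edit_distance(s, t, sub_cost=lambda a, b: 0 if a == b else 1, indel=1):
    """Classic dynamic-programming edit distance between two sequences,
    using substitution, deletion, and insertion operations."""
    n, m = len(s), len(t)
    # cost[i][j] = cheapest edit of prefix s[:i] into prefix t[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * indel
    for j in range(1, m + 1):
        cost[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitute
                cost[i - 1][j] + indel,                              # delete
                cost[i][j - 1] + indel)                              # insert
    return cost[n][m]
```

Backtracking through the same table recovers the edit sequence, which is the role played by the TRACE procedure later in the paper.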

There are no syntactic algorithms available which satisfactorily solve both of these problems. If invariant attributes are used, the first problem is immediately addressed, but then the resolution problem either remains unsolved [1], [16], [27] or may be addressed by constructing, for each curve, a cascade of representations at different scales [3], [31], [44]. Moreover, invariant attributes are either nonlocal (e.g., length that is measured in units of the total curve length) or they are noninterruptible (see discussion in Section 5.7). Using variant attributes is less efficient, but provides the possibility of defining a merge operator which can handle noise [33], [41], [42] and might be useful (if correctly defined) in handling resolution change. However, the methods using variant attributes could not ensure rotation and scale invariance.

3 FLEXIBLE SYNTACTIC CURVE MATCHING: ALGORITHM

In this section, we present a local syntactic matching method which can cope with both occlusion and irrelevant changes due to image transformation, while using variant attributes. These attributes support a simple smoothing mechanism; hence, we can handle true scale (resolution) changes. The algorithm is outlined in Section 3.1, while the missing details are given in Section 5. We are primarily concerned with the amount of flexibility that our method achieves since we aim to apply it to weakly similar curves. Section 3.2 shows extensive experiments with real images, where excellent matching is obtained between weakly similar shapes. We demonstrate silhouette matching under partial occlusion, under substantial change of viewpoint, and even when the occluding contours describe different (but related) objects, like two different cars or mammals. Our method is efficient and fast, taking only a few seconds to match two curves.

    3.1 The Proposed Matching Method

The occluding contours of objects are first extracted from the image and a syntactic representation is constructed, whose primitives are line segments and whose attributes are length and absolute orientation. Our algorithm then uses a variant of the edit matching procedure combined with heuristic search. Thus, we define a novel similarity measure between primitives to assign cost to each edit operation, a novel merge operation, and introduce a penalty for interrupting a contour (in addition to the regular deletion/insertion penalty).

More specifically, let A and A' be two syntactic representations of two contours; A = {a_1, a_2, ..., a_N} is a cyclically ordered list of N line segments and A' = {a'_1, a'_2, ..., a'_N'} is another cyclic list of N' segments.
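As an illustration, the sketch below builds such a representation from a closed polygonal contour, producing a cyclic list of primitives carrying the two attributes named above (length and absolute orientation); the contour points are assumed to be already extracted, and the toy square is invented data:

```python
import math

def syntactic_representation(points):
    """Closed polygonal contour -> cyclic list of line-segment primitives,
    each a (length, absolute orientation in radians) attribute pair."""
    prims = []
    for k in range(len(points)):
        # Wrap around: the last segment closes the contour.
        (x0, y0), (x1, y1) = points[k], points[(k + 1) % len(points)]
        prims.append((math.hypot(x1 - x0, y1 - y0),
                      math.atan2(y1 - y0, x1 - x0)))
    return prims

# Toy contour: a unit square traversed counterclockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Because the orientations are absolute (variant under rotation), rotation invariance must come from the matching stage rather than from the attributes themselves, which is exactly the design choice discussed above.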

Let a_i be a segment of A and a'_j be a segment of A'. Matching these segments uniquely determines the relative global rotation and scale (2D alignment transformation) between the curves. We assume that the optimal alignment is well approximated by at least one of the N·N' possible selections of a_i and a'_j. In fact, we will discuss in Section 5.2 a method to prune many of them, leaving us with a set Ω ⊆ A × A' of candidate global alignments, such that usually |Ω| ≪ N·N'.

A member {a_i, a'_j} of Ω (abbreviated {i, j} for convenience) denotes a starting point for our syntactic matching algorithm. The algorithm uses edit operations to extend the correspondence between the remaining unmatched segments, preserving their cyclic order. Total cost is minimized using dynamic programming, where the cost of a match between every two segments depends on their attributes, as well as on the attributes of the initial chosen pair (a_i and a'_j). This implicitly takes into account the global alignment operation. The various edit operations and their cost functions are described in Section 5.3; the cost of the edit operations can be either negative or positive.1

We are searching for the member {i, j} ∈ Ω for which the extended correspondence list has minimal edit cost. Brute force implementation of this search is computationally infeasible. The syntactic matching process is, therefore, interlaced with a heuristic search for the best initial pair in Ω. Namely, a single dynamic programming extension step is performed for the best candidate in Ω (possibly a different candidate in each extension matching step) while maintaining the lowest cost achieved by any of the sequences. When no candidate in Ω has the potential to achieve a lower cost, the search is stopped (see below).
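The control flow of this interleaved search can be sketched generically with a priority queue keyed on a lower bound. In the sketch below, `extend` and `potential` are hypothetical stand-ins for the paper's SYNTACTIC-STEP and PREDICT-COST procedures, and the toy candidates are invented:

```python
import heapq

def best_first_match(candidates, extend, potential):
    """Repeatedly apply one extension step to the candidate with the lowest
    potential (a lower bound on its final cost); stop when no candidate's
    potential can beat the best cost achieved so far."""
    best_cost = float("inf")
    heap = [(potential(c), n, c) for n, c in enumerate(candidates)]
    heapq.heapify(heap)
    tie = len(heap)                          # tiebreaker for equal potentials
    while heap and heap[0][0] < best_cost:
        _, _, state = heapq.heappop(heap)
        state, cost, done = extend(state)    # one extension step
        best_cost = min(best_cost, cost)     # partial matches count too
        if not done:
            heapq.heappush(heap, (potential(state), tie, state))
            tie += 1
    return best_cost

# Toy candidates: each carries a list of per-step edit costs and a cursor.
def make(steps):
    return {"steps": steps, "k": 0, "acc": 0.0}

def extend(s):
    s["acc"] += s["steps"][s["k"]]
    s["k"] += 1
    return s, s["acc"], s["k"] == len(s["steps"])

def potential(s):
    # Admissible bound: accumulated cost plus every remaining negative step.
    return s["acc"] + sum(c for c in s["steps"][s["k"]:] if c < 0)
```

The stopping rule relies on the potential never overestimating how good a candidate can get, mirroring the guarantee stated for the paper's potential values below.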

The pseudocode in Fig. 1 integrates the components of our matching algorithm into a procedure which gets two syntactic representations and returns a segment correspondence and a dissimilarity value. The arrays which support the dynamic programming are not referenced in this pseudocode, to increase its readability. We note that the procedure CURVE-MATCHING minimizes the edit cost, which is typically negative; thus, in effect, CURVE-MATCHING is maximizing the "gain" of matching.

The procedure INITIALIZE performs the initial pruning of pairs of starting points and returns the set Ω of candidates sorted by increasing potential values (see below). A full description is given in Section 5.2. Its implementation performs a few syntactic matching steps for all N·N' possible pairs and computes the intermediate edit cost corresponding to this partial matching. The minimal intermediate edit cost achieved by any of the candidates is returned as cost*, and the candidate pair which achieves cost* is returned as {i*, j*}.

The procedure PICK-CANDIDATE selects a particular member of Ω to be fed into the syntactic algorithm. To understand its operation, we need to define the concept of potential: For each candidate {i, j} that has been extended to a correspondence list of some length, we compute a lower bound on its final edit cost. This bound is based on the intermediate edit cost which has already been achieved and the cost of the best (lowest cost) possible matching of the remaining unmatched segments. We call this bound the potential of the candidate {i, j}. The procedure PICK-CANDIDATE returns the member of Ω which has the best (minimal) potential.

1. Negative cost can be interpreted as positive "gain" or "reward." Every matching prefix has a total (accumulated) cost, which should be as negative as possible. A prefix is never extended by a suffix with positive cost since this would increase the cost; hence, partial matching is achieved by leaving the last segments unmatched. Note that if all costs are negative, then the minimal cost must be obtained by matching all the segments of the shorter sequence, and if all costs are positive, then the minimal cost is trivially obtained by leaving all segments unmatched. The average cost value determines the asymptotic matching length for random sequences [13].

Technically, we store Ω as an ordered list sorted by increasing potential value. Each member of Ω is a candidate {i, j} together with its current potential. The procedure PICK-CANDIDATE then returns the first member of Ω. The list is initially sorted by INITIALIZE and its order is maintained by the procedure PREDICT-COST discussed below.

The search for the best candidate {i*, j*} continues as long as there exists a candidate whose potential is lower than the best cost achieved so far (cost*). It is implemented by the loop, which iterates as long as cost* > potential. Note that cost* cannot increase and potential cannot decrease during the search.

The procedure SYNTACTIC-STEP is the core of our algorithm. It is given as input two cyclically ordered sequences A = {a_1, ..., a_N} and A' = {a'_1, ..., a'_N'}, which are partially matched from position i of A and onward and from position j of A' and onward. It uses dynamic programming to extend the edit transformation between A and A' by one step. Since our editing cost operation typically takes negative values, the edit cost of {i, j} could become better (lower) than cost*. In this case, {i*, j*} is set equal to {i, j} and cost* is set equal to the newly achieved edit cost (otherwise, {i*, j*} and cost* remain unchanged). Sections 5.3 and 5.4 give the full description of the procedure SYNTACTIC-STEP.

Extending the editing sequence of {i, j} is likely to increase its potential, making it a less attractive candidate. This is because the potential of {i, j} is partially determined by a lower bound on the final edit distance between the yet unmatched segments, and the edit operation just added can only tighten this bound by decreasing the number of unmatched segments. The procedure PREDICT-COST reestimates the final cost and corrects the potential of {i, j}. Since Ω is kept as an ordered list, PREDICT-COST pushes the candidate {i, j} down to maintain the order of the list. Section 5.5 gives full details of the potential estimation.

Assuming that the reader is familiar with conventional dynamic programming implementations, it is sufficient to describe the procedure TRACE as the procedure which reads the lowest cost path from the dynamic programming array.2 When this procedure is applied to the array associated with the best candidate {i*, j*}, the lowest cost editing sequence is obtained. In our implementation, in order to keep space complexity low, we keep just the last few rows and columns of each array. Hence, the procedure TRACE needs to repeat the syntactic matching for the best pair {i*, j*}. See Section 5.4 for a description of the dynamic programming implementation.

Finally, using the correspondence we found, we refine the global 2D alignment by minimizing the sum of residual distances between matched segments' endpoints. The procedure DISSIMILARITY performs the minimization and uses the residual distances to define a robust measure of dissimilarity between the curves (details in Section 5.6).
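The alignment refinement is a standard least-squares problem. One common closed form, shown below as a sketch (not necessarily the formulation of Section 5.6), treats 2D points as complex numbers so that a single complex coefficient encodes rotation and scale together; the example point sets are invented:

```python
def fit_similarity(P, Q):
    """Least-squares 2D similarity transform (rotation, scale, translation)
    mapping point list P onto Q: minimise sum |a*p + b - q|^2 over complex
    a (scale times rotation) and complex b (translation)."""
    p = [complex(x, y) for x, y in P]
    q = [complex(x, y) for x, y in Q]
    mp = sum(p) / len(p)                     # centroids
    mq = sum(q) / len(q)
    num = sum((qi - mq) * (pi - mp).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - mp) ** 2 for pi in p)
    a = num / den                            # optimal rotation+scale
    b = mq - a * mp                          # optimal translation
    residuals = [abs(a * pi + b - qi) for pi, qi in zip(p, q)]
    return a, b, residuals
```

The residual list is exactly the raw material a robust dissimilarity measure needs: large residuals flag outlying matches, which can be down-weighted or iteratively eliminated as described in Section 5.6.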

Our approach thus combines syntactic matching with a proximity measure (in this sense, our method resembles that of [1]). That is, we establish feature correspondence using syntactic matching and then evaluate the dissimilarity according to the residual distances between matched points. We do not use the edit distance as a measure of dissimilarity, mainly due to the fact that this quantity depends on the somewhat arbitrary parameters of the edit operation and segment similarity, whereas, typically, the best matching result is not sensitive to these exact parameters. That is, the same matching is obtained for a range of edit parameter values, although the edit distance may be different. Another advantage to combining syntactic and proximity criteria is that, in many cases, the combination provides a mechanism for outlier removal, as is demonstrated in Section 5.6.

    3.2 Matching Results

We now present a few image pairs and triplets together with the matching results, which demonstrate perceptually appealing matching. In Section 3.3 below, we apply our matching algorithm to a database of 31 silhouettes given to us by Sharvit and Kimia and compare our dissimilarity values to those reported in [36]. Additional classification results will be presented in Section 4.2, using the matching of a few thousand image pairs, to provide indirect objective examination of the matching quality.

In all the experiments reported in this paper, we use the same parameter values (defined in Section 5): w1 = 1, w2 = 0.8, w3 = 8.0, and K = 4 (with the exception of Fig. 6, where K = 5). Each matching of an image pair took only a few seconds (see Section 4.2).

Fig. 2 shows two images of different objects. There is a geometrical similarity between the two silhouettes which has nothing to do with the semantic similarity between them. The geometrical similarity includes five approximately vertical swellings or lumps (which describe the four legs and the tail). In other words, there are many places where the two contours may be considered locally similar. This local similarity is captured by our matching algorithm.

Fig. 1. Pseudocode for the CURVE-MATCHING procedure.

2. Note, however, that using both positive and negative costs allows for partial matching; hence, the path can terminate at any entry of the dynamic programming array and not necessarily at the last row or column.

The two occluding contours of the two animals and the feature points were automatically extracted in the preprocessing stage. Corresponding points are marked in Fig. 2 by the same numbers. Hence, the tails and feet are nicely matched, although the two shapes are only weakly similar. The same matching result is obtained under arbitrarily large rotation and scaling of one image relative to the other.

Fig. 3 demonstrates the local nature of our algorithm, namely, that partial matching can be found when objects are occluded. Since our method does not require global image normalization, the difference in length between the silhouette outlines does not impede the essentially perfect matching of the common parts. Moreover, the common parts are not identical (note the distance between the front legs and the number of ears) due to a small difference in viewpoint; this also does not impede the performance of our algorithm.

Fig. 3 also demonstrates outlier pruning using three images. In Fig. 3b, there is a shadow between two of the leaves (pointed to by the arrow) and, as a result, the outline penetrates inward. The feature points along the penetration are (mistakenly) matched with features along the tail in Fig. 3a, since the two parts are locally similar. However, we use the procedure of mapping the points of Fig. 3a to Fig. 3b, then to Fig. 3c, and back to Fig. 3a. Only points which are mapped back to themselves are accepted as correct matches; these matches are marked by common numbers in Fig. 3.
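This three-image consistency check has a compact form. In the sketch below the maps are dictionaries from feature identifiers on one outline to the next; the identifiers (legs, tail, shadow) are invented for illustration:

```python
def cycle_consistent(map_ab, map_bc, map_ca):
    """Accept a match for a feature of image A only if following the chain
    A -> B -> C -> A returns to the same feature."""
    keep = []
    for a, b in map_ab.items():
        c = map_bc.get(b)             # features unmatched in B -> C drop out
        if c is not None and map_ca.get(c) == a:
            keep.append(a)
    return keep

# The tail of A is wrongly matched to a shadow in B; the shadow has no
# match in C, so the chain breaks and the tail match is rejected.
map_ab = {"leg1": "b1", "leg2": "b2", "tail": "shadow"}
map_bc = {"b1": "c1", "b2": "c2"}
map_ca = {"c1": "leg1", "c2": "leg2"}
```

As the text notes, the order of the three images does not matter: any broken link anywhere along the cycle removes the match.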

    Figs. 4 and 6 show results when matching images takenfrom very different points of view. In Fig. 4, two differentviews of the same object are matched and the method ofiterative elimination of distances is demonstrated (seeSection 5.6). Fig. 6 shows matching between three differentcars, viewed from very different viewpoints and subjectedto occlusion. Matching under a large perturbation ofviewpoint can be successful as long as the silhouettesremain similar ªenough.º Note that preservation of shapeunder change of viewpoint is a quality that definesªcanonicalº or ªstableº views. Stable images of 3D objects

    1316 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 21, NO. 12, DECEMBER 1999

Fig. 2. Qualitative matching between pictures of toy models of a horse and a wolf. Note the correct correspondence between the feet of the wolf to those of the horse and the correspondence between the tails. The results are shown without outlier pruning. In this example, all the features to which no number is attached had been merged, e.g., the segment 9-10 on the horse outline was matched with three segments on the wolf outline.

Fig. 3. Dealing with occlusion: partial matching between three images. Points are mapped from (a) to (b) to (c) and back to (a). Only points which are mapped back to themselves are accepted (order is not important). The points on the tail in (a) are matched with the shadow (pointed to by the arrow) in (b), but matching (b) with (c) leaves the shadow unmatched. Hence, the tail is not matched back to itself and the correspondence with the shadow is rejected.

were proposed as the representative images in an appearance-based approach to object representation [47].

The last example (Fig. 5) shows results with an articulated object, matching human limbs at different body configurations.

    3.3 Dissimilarity Measurements: Comparison

In this section, we use our matching algorithm to compute a dissimilarity value, as will be explained in Section 5.6. Good matching is essential for correct dissimilarity estimation, whose values we use for quantitative comparisons with other methods. We use the image database created by Sharvit et al. (see [36]), which consists of 31 silhouettes of six classes of objects (including fish, airplanes, and tools).

In [36], 25 images were selected out of this database and their pairwise similarities were computed. The measure of quality was the number of instances (out of 25) in which the first, second, and third nearest neighbor of an image was found in its own class. We follow the same procedure and show, in Table 1, the dissimilarity values which we obtain for 25 silhouettes.3

We find that the fraction of times (out of 25) that the first nearest neighbor of an image belongs to its own class is 25/25, namely, it is always the case. For the second and third nearest neighbors, the results are 21/25 and 19/25.4

In comparison, the results reported in [36] are 23/25, 21/25, and 20/25, respectively, whereas our results for their choice of 25 images are the fractions 25/25, 20/25, and 17/25, respectively. It is to be noted, however, that, in the framework of [36], one can use additional information, specifically, whether the two graphs that represent a pair of shapes have similar topology.
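The nearest-neighbor evaluation protocol used above can be sketched as follows. The function name and the toy dissimilarity matrix are illustrative, not taken from the paper:

```python
import numpy as np

def knn_class_agreement(D, labels, k=3):
    """For each item, check whether its 1st..kth nearest neighbors
    (by dissimilarity matrix D, self excluded) share its class label.
    Returns one fraction per neighbor rank."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    hits = np.zeros(k, dtype=int)
    for i in range(n):
        order = [j for j in np.argsort(D[i]) if j != i][:k]  # drop the self-match
        for rank, j in enumerate(order):
            if labels[j] == labels[i]:
                hits[rank] += 1
    return [h / n for h in hits]

# toy example: two well-separated classes
D = np.array([[0.0, 1.0, 5.0, 6.0],
              [1.0, 0.0, 6.0, 5.0],
              [5.0, 6.0, 0.0, 1.0],
              [6.0, 5.0, 1.0, 0.0]])
labels = ['a', 'a', 'b', 'b']
print(knn_class_agreement(D, labels, k=1))  # [1.0]
```

Applied to the 25-silhouette matrix of Table 1, this protocol yields the 25/25, 21/25, and 19/25 fractions quoted above.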

We conclude that the two methods are comparable in quality when isolated silhouettes are matched. This is in spite of our using boundary representation, whereas symmetry representation (shock graph) is used in [36]. The shock graph representation is inherently more sensitive to occlusions, while shock graph matching requires solving the difficult NP-complete graph isomorphism problem (see Section 1). Moreover, our method can easily be adjusted to handle open curves by avoiding the assumption that the syntactic representation is cyclic. On the other hand, shock graphs must distinguish between interior and exterior.

Recently, progress has been made toward computing the edit distance between shock graphs [40] using a polynomial time algorithm that exploits their special structure. So far, however, the algorithm is not capable of dealing with invariance to image transformations and no quantitative measures have been reported.

    GDALYAHU AND WEINSHALL: FLEXIBLE SYNTACTIC MATCHING OF CURVES AND ITS APPLICATION TO AUTOMATIC HIERARCHICAL... 1317

Fig. 4. Matching two views of an object subjected to large foreshortening. Rejected pairs (in circles) were detected in four iterations of eliminating the 10 percent most distant pairs and realigning the others. Thirty-three and 35 features were extracted on the two outlines; 32 pairs were initially matched and nine pairs were rejected (28 percent).

Fig. 5. Matching of human limbs at different body configurations. In this case, the outlines were extracted with snakes rather than by gray level clustering (see acknowledgments). Original images are not shown.

Fig. 6. Combination of various sources of difficulty: different models, different viewpoints, and occlusion. The merging utility is used to overcome the different number of feature points around the wheels; gap insertion is utilized to ignore the large irrelevant part.

3. As Table 1 shows, we have at least four images in each class. In [36], the same images were chosen with the exception that one of the classes consisted of only three images, while the fish class contained five images. However, for members of a class consisting of only three images, the three nearest neighbors can no longer be all in the same class. Hence, we slightly modified the choice of selected images.

4. This demonstrates the limitation of syntactic curve matching for shape recognition as, in some cases, objects have similar bounding curves but different interior regions.

4 CLUSTERING OF SILHOUETTES

The next step in our approach involves feeding a graph of image similarity values, computed by the silhouette matching algorithm, to a similarity-based clustering algorithm. By hierarchically dividing the images into distinct clusters, we discover structure in the data. To this end, we developed a stochastic clustering algorithm [14], whose full description is beyond the scope of this paper. Instead, we give below a brief review of the algorithm, showing clustering results which demonstrate the usefulness of our approach and the quality of the matching results.

    4.1 Stochastic Clustering Algorithm

We represent the image database as a graph, with nodes representing the images and similarity values assigning weights to the edges. Every partition of the nodes into r disjoint sets is an r-way cut in the graph and the edges which connect between different sets of nodes are said to cross the cut. The value c of a cut is the sum of the weights of all crossing edges.
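The cut value of a given partition is straightforward to compute. A minimal sketch; the graph encoding and function name are ours, not the paper's:

```python
def cut_value(weights, partition):
    """Value of an r-way cut: the sum of the weights of all edges whose
    endpoints lie in different parts of the partition.
    `weights` maps frozenset({u, v}) -> edge weight;
    `partition` maps node -> part index."""
    return sum(w for e, w in weights.items()
               if len({partition[u] for u in e}) > 1)

# toy graph: a tight triangle plus one weakly attached node
w = {frozenset({0, 1}): 3.0, frozenset({1, 2}): 2.5,
     frozenset({0, 2}): 2.0, frozenset({2, 3}): 0.4}
print(cut_value(w, {0: 0, 1: 0, 2: 0, 3: 1}))  # 0.4 (only edge (2,3) crosses)
```

Cutting off the weakly attached node costs 0.4, while splitting the triangle costs at least 4.5, which is why low-value cuts track the natural grouping.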

Our clustering method induces a probability distribution over the set of all r-way cuts in the graph, with a known lower bound on the probability of the minimal cut. Under this distribution over cuts, we compute, for every two nodes u and v in the graph, their probability p^r_uv of being in the same component of a random r-way cut.

The partition of the nodes into disjoint sets which satisfies p^r_uv < 0.5 for every crossing edge (u, v) is the output of our clustering algorithm for scale level r (r = 1, ..., N). At level r = 1, all the nodes must be in one cluster and, as r is increased, the partitions undergo a series of bifurcations. The "interesting" bifurcation points, which account for meaningful data clustering, can be clearly distinguished from the others. This is a major advantage of our algorithm over deterministic agglomerative methods.
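The output rule above implies that every pair with p^r_uv ≥ 0.5 must share a cluster, so the partition is given by the connected components of the thresholded pair-probability graph. The following is a minimal sketch of that final step only (it assumes the probabilities have already been estimated; the sampling procedure of [14] is not reproduced here):

```python
def components_from_probs(n, same_prob, threshold=0.5):
    """Partition nodes 0..n-1 into clusters such that every pair with
    same-cluster probability >= threshold ends up in one cluster.
    `same_prob` maps (u, v) pairs with u < v to probabilities; missing
    pairs count as 0. Implemented with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), p in same_prob.items():
        if p >= threshold:
            parent[find(u)] = find(v)      # merge the two components

    roots = [find(x) for x in range(n)]
    return [[x for x in range(n) if roots[x] == r] for r in sorted(set(roots))]

probs = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
print(components_from_probs(5, probs))  # [[0, 1, 2], [3, 4]]
```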

Our clustering method is robust and efficient, running in O(N log² N) time for sparse graphs and O(N² log N) for complete graphs. It does not require that the similarity values obey metric relations; hence, it is suitable for the present application, where the similarity provided


TABLE 1
The Dissimilarity Values Computed between 25 Silhouettes (Multiplied by 1,000 and Rounded)

For each line, the columns that correspond to the three nearest neighbors (and the self-zero distance) are highlighted. The first, second, and third nearest neighbor are in the same class a fraction of 25/25, 21/25, and 19/25 of the times, respectively.

by our matching algorithm does not necessarily satisfy the triangle inequality.5

    4.2 Clustering Results

We now integrate the two steps of similarity estimation and pairwise clustering into one experiment of image database categorization. The database contains 121 images of 12 different objects; 90 of the images were collected by placing six toy models on a turntable so that the objects could be viewed from different viewpoints. The other images are the 31 silhouettes discussed in Section 3.3. For each of the six toy models, we collected 15 images by rotating them in azimuth (θ = −20°, −10°, 0°, 10°, 20°) and elevation (φ = −10°, 0°, 10°). We used models of a cow, wolf, hippopotamus, two different cars, and a child.

The central images (θ = φ = 0°) in each of the three groups of pictures of animal models are side views (i.e., four legs, head, and tail are visible). All the different 15 images of each animal model are somewhat similar in that the same parts are visible (though, in some pictures, some parts, such as two legs or a leg and a tail, are merged into one in the silhouette). Thus, there is weak geometrical similarity between all the 45 silhouettes of the three mammals and there is weak geometrical similarity between

the 30 different silhouettes of the two cars. A desirable shape categorization procedure should reveal this hidden hierarchical structure.

All the images were automatically preprocessed to extract the silhouettes of the objects and represent them syntactically (see Section 5.1). The dissimilarities between the silhouettes are estimated using the algorithm described in Section 3.1. In order to compare all the image pairs in our database of 121 images, we performed 7,260 matching assignments; this took about 10 hours on an INDY R4400 175MHz workstation (about 5 seconds per image pair, on average).

The dissimilarity matrix constitutes the input to the clustering algorithm outlined above. When the scale parameter r is varied, the hierarchical classification shown in Fig. 7 is obtained. At the highest level (r = 1), all the images belong to a single cluster. As r is increased, finer structure emerges. Note that related clusters (like the two car clusters) split at higher r values, which means that our dissimilarity measure is continuous, assigning low (but reliable) values to weakly similar shapes.

Since humans can do so, we assume that an ideal shape classifier can put the images of every object in a different class. It is hard to test this hypothesis since, as humans, we cannot ignore the semantic meaning of the shapes. Nevertheless, comparing with the ideal human perceptual classification, our finest resolution level is almost perfect, with only two classification errors (in the boxes marked by ∗) and the undesirable split of the fish cluster.


5. This would also be the case had we used the generalized Hausdorff distance or the normalized edit distance [29]. The violation of metric properties is also known to exist in the function underlying our human notion of similarity between both semantic and perceptual stimuli [43].

Fig. 7. The classification tree (dendrogram) obtained for an image database consisting of 121 images. The finest classification level is shown by putting each cluster of silhouettes in a box. For the large clusters representing our own toy models (see text), the figure shows only five exemplars, but the other 10 are classified correctly as well. Note that the lower levels of the tree correspond to meaningful hierarchies, where similar classes (like the two cars or the three sets of mammals) are grouped together. The vertical axis is not to scale.

Our categorization is obtained using only intrinsic shape information. The relative size, orientation, and position of the silhouettes within each category is arbitrary. Moreover, global information, like the length of the occluding contour or the area it encloses, is not used. Hence, we expect that moderate occlusion will not affect the classification.

    5 FLEXIBLE SYNTACTIC MATCHING: DETAILS

In this section, we give the details of the various procedures and steps involved in the curve matching algorithm, outlined in Section 3.1. Contour representation is discussed in Section 5.1. The initial pruning of candidate global alignments is discussed in Section 5.2, where we describe the procedure INITIALIZE. The syntactic edit operations which are used by SYNTACTIC-STEP, and their respective costs, are discussed in Section 5.3. The details of the dynamic programming procedure, which SYNTACTIC-STEP uses to minimize the edit distance, are given in Section 5.4. In Section 5.5, we define the potentials which are used to guide the search for the best starting point and describe the procedure PREDICT-COST. Finally, the procedure DISSIMILARITY is discussed in Section 5.6.

    5.1 Preprocessing and Contour Representation

In the examples shown in this paper, objects appear on a dark background and segmentation is successfully accomplished by a commercial k-means segmentation tool. A syntactic representation of the occluding contour is then automatically extracted: It is a polygon whose vertices are either points of extreme curvature or points which are added to refine the polygonal approximation. Thus, the primitives of our syntactic representation are line segments and the attributes are length and absolute orientation. The number of segments depends on the chosen scale and the shape of the contour, but typically it is around 50. Coarser scale descriptions may be obtained using merge operations.

Feature points (vertices) are initially identified at points of high curvature, according to the following procedure: At every contour pixel p, an angle θ is computed between two vectors u and v. The vector u is the vectorial sum of m vectors connecting p to its m neighboring pixels on the left and the vector v is similarly defined to the right. Points where θ is locally minimal are defined as feature points. The polygonal approximation (obtained by connecting these points by straight lines) is compared with the original outline. If the distance between the contour points to the polygonal segments is larger than a few pixels, more feature points are added to refine the approximation.
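The corner-angle computation can be sketched as follows; the function names are ours, and the refinement step (adding points where the polygonal approximation strays from the contour) is omitted:

```python
import math

def corner_angles(contour, m=5):
    """For each point on a closed contour (a list of (x, y) pixels, cyclic
    order), compute the angle between u = sum of vectors to the m preceding
    points and v = sum of vectors to the m following points.
    Sharp corners give small angles; straight runs give angles near pi."""
    n = len(contour)
    angles = []
    for i in range(n):
        px, py = contour[i]
        ux = sum(contour[(i - j) % n][0] - px for j in range(1, m + 1))
        uy = sum(contour[(i - j) % n][1] - py for j in range(1, m + 1))
        vx = sum(contour[(i + j) % n][0] - px for j in range(1, m + 1))
        vy = sum(contour[(i + j) % n][1] - py for j in range(1, m + 1))
        dot = ux * vx + uy * vy
        norm = math.hypot(ux, uy) * math.hypot(vx, vy)
        angles.append(math.acos(max(-1.0, min(1.0, dot / norm))))
    return angles

def feature_points(contour, m=5):
    """Indices where the corner angle is a (cyclic) local minimum."""
    a = corner_angles(contour, m)
    n = len(a)
    return [i for i in range(n)
            if a[i] < a[(i - 1) % n] and a[i] < a[(i + 1) % n]]
```

On a clean square outline, for instance, only the four corner pixels are returned.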

    5.2 Global Alignment: Pruning the Starting Points

The procedure INITIALIZE receives two cyclic sequences A and A′ of lengths N and N′, respectively, as defined in Section 3.1. Initially, there are NN′ possible starting points for the syntactic matching procedure; recall that each starting point corresponds to the matching of one segment a_i in A to another segment a′_j in A′, thus defining the global 2D alignment between the two curves. However, the number of successful starting points is much smaller than NN′ and they tend to correspond to similar global 2D alignment transformations, for two reasons: 1) Low cost transformations tend to be similar since any pair of segments {a_i, a′_j} which belongs to a good correspondence list is likely to be a good starting point for the syntactic algorithm. 2) The overall number of good starting points tends to be small since most of the pairs in A × A′ cannot be extended to a low cost correspondence sequence; the reason is that it might be possible to find a random match of short length, but it is very unlikely to successfully match long random sequences.

These observations are used by the procedure INITIALIZE to significantly reduce the number of candidate starting points. The procedure uses as a parameter the number t of edit operations which are performed for every one of the NN′ possible starting points (in our experiments we use t = 5 or t = 10). The pruning proceeds using the relation between starting points and global 2D alignments, as follows:

Every starting point {a_i, a′_j} is associated with a global 2D alignment and, in particular, with a certain rotation angle that maps the direction of a_i to that of a′_j. Let n be min(N, N′), and observe the distribution of the n rotation angles which achieved the best n edit distances after t steps. If these angles are distributed sharply enough around some central value c, we conclude that c is a good estimator for the global rotation. Then, we discard every starting point in A × A′ whose associated rotation is too far from c. The remaining candidates form the pruned candidate set (see Fig. 8).

For each candidate that remains in the set, the procedure INITIALIZE computes its future potential, as discussed in Section 5.5, and sorts the list by increasing potential values. The minimal edit distance (cost) that has been achieved during the first t steps is returned as cost* and the candidate possessing this cost is returned as {i*, j*}.

5.3 Syntactic Operations Which Determine the Edit Distance

The goal of classical string edit algorithms is to find a sequence of elementary edit operations which transform one string into another at a minimal cost. The elementary operations are substitution, deletion, and insertion of string symbols. Converting the algorithm to the domain of vision, symbol substitution is interpreted as matching two shape primitives and the substitution cost is replaced by the dissimilarity between the matched primitives. The dissimilarity measure is discussed in Section 5.3.1. Novel operations involving gap insertion and the merging of primitives are discussed in Sections 5.3.2 and 5.3.3.

    5.3.1 Similarity Between Primitives

We now define the similarity between line segments a_k and a′_l. The cost of a substitution operation is this value with a minus sign. Hence, the more similar the segments are, the lower their substitution cost is. We denote the attributes of a_k and a′_l (orientation and length) by (φ, ℓ) and (φ′, ℓ′), respectively. The ratio between the length attributes is denoted the relative scale c = ℓ/ℓ′.

The term "reference segments" refers to the starting point segments which implicitly determine the global rotation and scale that aligns the two curves (as discussed above). The reference segments are specified by the argument {i, j} in the call to the procedure SYNTACTIC-STEP and are

    1320 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 21, NO. 12, DECEMBER 1999

denoted here a_0 and a′_0. The segment similarity also depends on the corresponding attributes (orientation, length, and relative scale) of the reference segments: (φ_0, ℓ_0), (φ′_0, ℓ′_0), and c_0 = ℓ_0/ℓ′_0.

We first define the component of similarity which is determined by the length (or relative scale) attribute of two matching segments. We map the two matched pairs of length values {ℓ, ℓ′} and {ℓ_0, ℓ′_0} to two corresponding directions in the (ℓ, ℓ′)-plane and measure the angle between these two directions. The cosine of twice this angle is the length-dependent component of our measure of segment similarity (Fig. 9). This measure is numerically stable. It is not sensitive to small scale changes, nor does it diverge when c_0 = ℓ_0/ℓ′_0 is small. It is measured in intrinsic units between −1 and 1. The measure is symmetric, so that the labeling of the contours as "first" and "second" has no effect.

Let α be the angle between the vectors (ℓ, ℓ′) and (ℓ_0, ℓ′_0). Our scale similarity measure is:

S_ℓ(ℓ, ℓ′ | ℓ_0, ℓ′_0) = cos 2α = [4cc_0 + (c² − 1)(c_0² − 1)] / [(c² + 1)(c_0² + 1)].   (1)

Thus, S_ℓ depends explicitly on the scale values c and c_0 rather than on their ratio; hence, it cannot be computed from the invariant attributes ℓ/ℓ_0 and ℓ′/ℓ′_0. The irrelevance of labeling can be readily verified since S_ℓ(c, c_0) = S_ℓ(c⁻¹, c_0⁻¹).

We next define the orientation similarity S_φ between two line segments whose attributes are φ and φ′, respectively. The relative orientation between them is measured in the trigonometric direction (denoted φ → φ′) and compared with the reference rotation (φ_0 → φ′_0):

S_φ(φ, φ′ | φ_0, φ′_0) = cos((φ → φ′) − (φ_0 → φ′_0)).   (2)

As with the scale similarity measure, the use of the cosine introduces nonlinearity; we are not interested in fine similarity measurement when the two segments are close to being parallel or antiparallel. Our matching algorithm is designed to be flexible in order to match curves that are only weakly similar; hence, we want to encourage segment matching even if there is a small discrepancy between their orientations. Similarly, the degree of dissimilarity between two nearly opposite directions should not depend too much on the exact angle between them. On the other hand, the point of transition from acute to obtuse angle between the two orientations seems to have a significant effect on the degree of similarity and, therefore, the derivative of S_φ is maximal when the line segments are perpendicular.

Finally, the combined similarity measure is defined as the weighted sum:

S = w_1 S_ℓ + S_φ.   (3)

The positive weight w_1 (which equals 1 in all our experiments) controls the coupling of scale and orientation similarity.

    5.3.2 Gap Opening

In string matching, the null symbol ε serves to define the deletion and insertion operations using a → ε and ε → a, respectively, where a denotes a symbol. In our case, a denotes a line segment and ε is interpreted as a "gap element." Thus, a → ε means that the second curve is interrupted and a gap element ε is inserted into it to be matched with a. Customarily, we define the same cost for both operations, making the insertion of a into one sequence equivalent to its deletion from the other.

The cost of interrupting a contour and inserting β connected gap elements into it (that are matched with β consecutive segments on the other curve) is defined as w_3 − w_2·β, where w_2, w_3 are positive parameters. Thus, we assign a penalty of magnitude w_3 for each single interruption, which is discounted by w_2 for every individual element insertion or deletion. This predefined quantity competes with the lowest cost (or best reward) that can be achieved by β substitutions. A match of β segments whose cost is higher (less negative) than w_3 − w_2·β is considered to be poor and, in this case, the interruption and gap insertion is preferred.

In all our experiments, we used w_2 = 0.8 and w_3 = 8.0. (These values were determined in an ad hoc fashion and not by systematic parameter estimation, which is left for future research.) These numbers make it possible to match a gap with a long sequence of segments, as required when curves are partially occluded. On the other hand, isolated gaps are discouraged due to the high interruption cost. Our algorithm, therefore, uses deletions in cases of occlusion,


Fig. 9. Length similarity is measured by comparing the corresponding reference length values (ℓ_0, ℓ′_0) with the corresponding length values of the current segment (ℓ, ℓ′). Each length pair is mapped to a direction in the plane and similarity is defined as cos 2α. This value is bounded between −1 and 1.

Fig. 8. An example of initial pruning. Two curves with n = min(N, N′) = 50 are matched syntactically using t edit steps (see text for details). The 50 candidates which achieve the best (minimal) edit cost are examined to see whether the distribution of their associated rotations shows central tendency. Here, we sample the distribution after 1, 5, and 10 syntactic steps. In this example, 5-10 steps are sufficient for a reliable estimation of the global rotation angle c. We proceed by eliminating the candidates whose associated global rotation is too far from c.

while, for local mismatches, it uses the merging operation, which is described in Section 5.3.3.

    5.3.3 Segment Merging

One advantage of using variant attributes (length and absolute orientation) is that segment merging becomes possible. We use segment merging as the syntactic homologue of curve smoothing, accomplishing noise reduction by local resolution degradation. Segment merging, if defined correctly, should simplify a contour representation by being able to locally and adaptively change its scale from fine to coarse.

We define the following merging rule: Two adjacent line segments are replaced by the line connecting their two furthest endpoints. If the line segments are viewed as vectors oriented in the direction of propagation along the contour, then the merging operation of any number of segments is simply their vectorial sum.6 The cost of a merge operation is defined as the dissimilarity between the merged segment and the one it is matched with. A comparison of this rule with the literature is given in the appendix.
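The vectorial-sum rule is a one-liner in code; the (length, orientation) encoding of a segment follows the attributes defined in Section 5.1:

```python
import math

def merge_segments(segments):
    """Merge a run of adjacent contour segments, each given as
    (length, orientation) with orientation in radians, by summing them as
    vectors. Returns the (length, orientation) of the merged segment, i.e.,
    the line connecting the run's two furthest endpoints."""
    x = sum(l * math.cos(phi) for l, phi in segments)
    y = sum(l * math.sin(phi) for l, phi in segments)
    return math.hypot(x, y), math.atan2(y, x)
```

For example, merging two unit segments at orientations 0 and π/2 yields a segment of length √2 at orientation π/4, exactly the chord across the corner.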

    5.4 Minimizing Edit Cost via Dynamic Programming

Procedure SYNTACTIC-STEP is given as input two cyclically ordered sequences A = {a_1, ..., a_N} and A′ = {a′_1, ..., a′_N′}, which are partially matched from position i of A and onward, and from position j of A′ and onward. It uses dynamic programming to extend the edit transformation between A and A′ by one step. The edit operations and their cost functions were discussed above in Section 5.3.

The notation used in Section 3.1 hides, for simplicity, the workspace arrays which are used for path planning. Let us assume that the procedure SYNTACTIC-STEP can access an array R which corresponds with the starting point {i, j}. The entry R(k, l) holds the minimal cost that can be achieved when the k segments of A following a_i are matched with the l segments of A′ following a′_j. We will see later that it is not necessary to keep the complete array in memory.

A syntactic step is the operation of getting an array R which is partially filled, and extending it by filling some of the missing entries. Our choice of extension is called "block completion." Let us assume for simplicity that the reference segments {i, j} are the first ones. There is no loss of generality here since the order in A, A′ is cyclic. Initially, the computed portion of the array R corresponds to a diagonal block whose two corners are R(1, 1) and R(δ, δ). When the procedure SYNTACTIC-STEP is applied to R, it extends R by filling in another row (of length δ) and/or another column (of length δ).

The decision whether a row or a column is to be added depends on the previous values of R and is related to the potential computation that is discussed below in Section 5.5. Roughly speaking, we add a row (column) if the minimum of R is in the last computed row (column) and we add both if the minimum is obtained in the last computed corner.

When extending the block of size δ × δ, every new entry R(k, l) is computed according to the following rule:

R(k, l) = min{r_1, r_2, r_3},   (4)

where r_1 is the minimum, over the allowed merges of up to K segments on either curve, of R(μ, ν) − S(μ → k, ν → l), with S(μ → k, ν → l) the similarity (Section 5.3.1) between the segment obtained by merging a_{μ+1}, ..., a_k and the one obtained by merging a′_{ν+1}, ..., a′_l, and r_2 and r_3 are the corresponding minima over the gap operations of Section 5.3.2, extending or opening a gap in one curve or the other.
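A much-simplified sketch of such a recursion may clarify how substitutions and gaps compete. This is not the paper's procedure: the sequences are treated as open rather than cyclic, merging is omitted, and the gap cost w_3 − w_2·β is handled with an explicit gap-state matrix in the style of affine-gap alignment:

```python
def edit_cost(A, B, sim, w2=0.8, w3=8.0):
    """Minimal edit cost between open sequences A and B of primitives.
    sim(a, b) is the similarity of two primitives (substitution cost -sim).
    A gap of beta elements costs w3 - w2 * beta:
      M[i][j] = best cost of a prefix alignment ending in a substitution,
      G[i][j] = best cost of a prefix alignment ending inside a gap.
    Merging (Section 5.3.3) and cyclic order are omitted for brevity."""
    INF = float('inf')
    n, m = len(A), len(B)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    G = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i and j:  # substitute A[i-1] with B[j-1]
                M[i][j] = min(M[i-1][j-1], G[i-1][j-1]) - sim(A[i-1], B[j-1])
            # open a gap (pay w3 - w2) or extend an existing one (pay -w2)
            moves = []
            if i:
                moves += [M[i-1][j] + w3 - w2, G[i-1][j] - w2]
            if j:
                moves += [M[i][j-1] + w3 - w2, G[i][j-1] - w2]
            G[i][j] = min(moves)
    return min(M[n][m], G[n][m])
```

With a reward of 2 per perfect substitution, matching three identical primitives costs −6, and appending three unmatched primitives to one curve adds a single gap of cost 8 − 3·0.8 = 5.6, as the cost model of Section 5.3.2 prescribes.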

We use a tight lower bound on the edit cost estimation, termed "potential." The optimality of the search is thus guaranteed. In this section, we define the potential function and explain how it is used by the procedure PREDICT-COST.

Let us consider an entry R(k, l) in the dynamic programming array R. Without loss of generality, due to the cyclic segment order, we can assume that the forward path leaving this entry lies in the rectangular block defined by the corners R(k, l) and R(N, N′). We define δ and β to be the dimensions of this rectangle: δ = min(N − k, N′ − l) and β = max(N − k, N′ − l). We also define σ to be the maximal similarity between segments (σ = 1 + w_1 from (3)). The lowest cost substitution thus has the negative cost −σ, which acts as a reward.

The values of the parameters w_1, w_2, w_3 are chosen to guarantee that a diagonal path consisting of the lowest cost substitutions has the minimal possible cost.7 Hence, the lowest cost forward path originating from R(k, l) must contain a diagonal segment of maximal length, namely of length δ. The cost assigned to this part is −σδ. In addition to the diagonal part, the forward path may contain a vertical or horizontal part of length β − δ. In general, this part can decrease the cost only if w_3 − w_2(β − δ) is negative. However, in the case where the value of R(k, l) is set equal to r_2 or r_3 in (4), the entry (k, l) is already considered to be inside a gap. In this case, we can first extend the gap by β − δ moves and only then continue with δ diagonal steps. Thus, the interruption penalty is avoided, and the forward path cost becomes −σδ − w_2(β − δ). Combining this bound with the value of R(k, l), we get a lower bound f_kl on the minimal edit cost which can be achieved passing through R(k, l):

f_kl = R(k, l) − σδ + min{0, w_3 − w_2(β − δ)}   if R(k, l) = r_1 in (4),
f_kl = R(k, l) − σδ − w_2(β − δ)                 if R(k, l) = r_2 or r_3 in (4).   (5)
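The lower bound is cheap to evaluate per entry. A sketch, with σ = 2 corresponding to the paper's w_1 = 1 and the `in_gap` flag recording whether the entry was set by r_2 or r_3:

```python
def potential(R_kl, N, Np, k, l, in_gap, sigma=2.0, w2=0.8, w3=8.0):
    """Lower bound f_kl on the edit cost of any forward path through entry
    (k, l), following Eq. (5): sigma = 1 + w1 is the maximal per-step reward;
    delta and beta are the dimensions of the remaining block."""
    delta = min(N - k, Np - l)
    beta = max(N - k, Np - l)
    if in_gap:
        # the gap can be extended without paying the interruption penalty
        return R_kl - sigma * delta - w2 * (beta - delta)
    return R_kl - sigma * delta + min(0.0, w3 - w2 * (beta - delta))
```

Since the bound never overestimates the achievable cost, expanding candidates in order of increasing potential preserves the optimality of the search.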

the following: Features are matched when the local pieces of curve around them have similar shape; if, after alignment, they are also proximal, meaning that they agree with the global alignment, then the match is likely to be correct.

The other pruning technique can be used when three related images are available (rather than two). Assume that a feature point p on contour 1 is matched with point p′ on contour 2 and p′ is matched with p″ on contour 3. If the matching between contours 1 and 3 supports the mapping between p and p″, then the correspondence list (p ↔ p′, p′ ↔ p″, p ↔ p″) is accepted.

5.7 Discussion and Comparison to Other Methods

Below we discuss some issues relating to complexity, invariance, and direct proximity computation. A detailed comparison of our syntactic operations to other methods is given in the appendix.

    5.7.1 Complexity

The algorithm develops one dynamic programming array for each candidate starting point that survives the initial pruning and, when it terminates, each array has been completed up to a block of some size. Let n² be the average number of entries in a completed block. The total number of computed entries is, therefore, the number of candidates times n². Clearly, n is a fraction of N. From the considerations discussed in Section 5.2, it follows that the number of candidates is typically of the order of N as well. It is possible, although not used here, to constrain the procedure INITIALIZE to return a candidate set of size min(N, N′) exactly. The overall number of computed entries is, therefore, O(N³).

Every single entry computation is of complexity O(K²) since K(K − 1)/2 alternative evaluations are required to compute r_1 in (4). Note that K is usually a small constant (we used K = 4). An entry computation is followed by updating the array potential. We maintain for each row and column the best score achieved there; hence, the array potential is computed in O(K) time after a block is completed.

    5.7.2 How to Achieve Invariance

Available syntactic matching methods usually achieve scale and rotation invariance (if at all) by using invariant attributes. The benefit of using invariant attributes is efficiency. The drawback is that invariant attributes cannot be smoothed by merging and they are either nonlocal or noninterruptible. For example, in [27], the orientation of a line segment is measured with respect to its successor; hence, the opening of a gap between segments introduces ambiguity into the representation (see Fig. 11). In [16], the attributes which describe curve fragments are Fourier coefficients and, in [1], an attribute called "sphericity" is defined. Both are invariant attributes but noninterruptible.8

Moreover, it seems to be impossible to find operators on invariant attributes that are equivalent to smoothing in real space. Instead, a cascade of different scale representations must be used [44], where a few fragments may be replaced by a single one which is their "ancestor" in a scale space description. This requires massive preprocessing, building a cascade of syntactical representations for each curve with consistent fragment hierarchy.

In contrast, our algorithm is invariant with respect to scaling, rotation, and translation without relying on invariant attributes, while remaining efficient and capable of comparing complex real image curves in a few seconds. Furthermore, a novel merging operation was defined which accomplishes curve simplification and helps in noise reduction and resolution change.

    5.7.3 Direct Proximity Minimization

The conversion of residual distances into a dissimilarity measure is explained in Section 5.6. The reader may wonder why we cannot choose a proximity criterion, like the average distance between matched points or the Hausdorff distance, and minimize it directly. The Hausdorff distance may appear to be especially attractive since it is not based on any prior feature pairing [2], [19].

We claim that direct minimization, which is heavily studied in the literature, is not adequate when the curves are only weakly similar. The reason is that proximity methods treat curves as two sets of points and ignore more qualitative "structural" information.

An example is shown in Fig. 12. In this example, we compare two weakly similar curves, where a permissible alignment transformation includes translation, rotation, and uniform scaling. Assuming that two sets of feature points have been extracted from the two curves, we investigate the proximity measure which is defined as the average distance between matched points. This measure is larger for the alignment shown in Fig. 12c than for the one shown in Fig. 12d. Hence, a proximity algorithm, which seeks the optimal alignment that achieves minimal residual distances, will consider Fig. 12d a better alignment than Fig. 12c.9

    1324 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 21, NO. 12, DECEMBER 1999

8. The Fourier coefficients are normalized individually, which means that if every fragment undergoes a different rigid or scaling transformation, the representation remains unchanged. The sphericity representation behaves in the same way. The relative size and orientation information is preserved as long as the sequence is not interrupted, since overlapping fragments are used. Note that, in spite of this property, the algorithms are applied to partial matching in the framework of model-based recognition, since the solution that preserves the correct relative size and orientation information between primitives remains a valid solution and the danger of finding an undesired solution (as is demonstrated in Fig. 11) is small.

9. Note that the lower proximity value in Fig. 12d is not the result of the use of different global scaling, since the shorter curve appears in Fig. 12c and Fig. 12d at the same size. Moreover, the lower proximity measure of Fig. 12d is obtained even though many-to-one matches are avoided and the order of points is kept. Without these constraints, it is easy to get arbitrarily small proximity values for an arbitrary correspondence by shrinking one set of points and matching it with only a few points (or even a single point) from the other set.

Fig. 11. When the primitive attribute is measured relative to a preceding primitive, interrupting the sequence creates problems. Here, for example, the orientation information is lost when the dotted segment is matched with a gap element. As a result, the two contours may be matched almost perfectly to each other and considered as very similar.

We continue with the previous example and investigate instead the use of the directed Hausdorff distance as the proximity criterion. The directed Hausdorff distance from a point set P to a point set Q is defined (with the Euclidean norm || · ||) as h(P, Q) = max_{p∈P} min_{q∈Q} ||p − q||; it is equal to the largest distance from some point in P to its nearest neighbor in Q. This measure is asymmetric and, therefore, the symmetric expression max{h(P, Q), h(Q, P)} is often preferred. However, when there is large image clutter, as in our case, the symmetric distance is not useful since its value is determined by the irrelevant part of the curve. Thus, we use the directed distance, measured from the shorter curve to the longer one; this distance is larger for the alignment shown in Fig. 12c than for the one shown in Fig. 12d. A proximity algorithm that is based on minimizing the directed Hausdorff distance will consider Fig. 12d a better alignment than Fig. 12c.
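As a sketch (hypothetical helper names, brute-force O(|P||Q|) Python, not the paper's implementation), the directed and symmetric distances defined above can be computed as:

```python
import math

def directed_hausdorff(P, Q):
    """h(P, Q) = max over p in P of the distance from p to its
    nearest neighbor in Q (Euclidean norm)."""
    return max(min(math.dist(p, q) for q in Q) for p in P)

def symmetric_hausdorff(P, Q):
    """The symmetrized variant max{h(P, Q), h(Q, P)}."""
    return max(directed_hausdorff(P, Q), directed_hausdorff(Q, P))
```

For example, with P = [(0, 0), (3, 0)] and Q = [(0, 0)], h(P, Q) = 3 while h(Q, P) = 0, which illustrates the asymmetry; under clutter, the text measures h from the shorter curve to the longer one.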

In contrast, because it uses local structure, our syntactic matching algorithm provides the correct matching of the curves in Fig. 12c.

    6 DISCUSSION

This paper deals with the inherently ill-posed problem of matching weakly similar curves for the purpose of similarity estimation. Our flexible syntactic matching is based on a simple heuristic, guided by the principle that matched features should lie on locally similar pieces of curve. Naturally, this principle cannot guarantee that the results would always agree with our human intuition for "good" matching, but our examples demonstrate that satisfactory and intuitive results are usually obtained. In addition, we successfully passed two more objective tests: 1) In a database of 31 images, we showed that nearest neighbors, computed according to our distance, are always of the same type; 2) in a database of 121 images, clustering based on our similarity results gave a perceptually veridical hierarchical structure. Note that "successful" matching depends on the application at hand. Our method is not suitable for recovering depth from stereo, but it is well-suited for more qualitative tasks, such as the organization of an image database, the selection of prototypical shapes, and image morphing for graphics or animation.

In order to achieve large flexibility, we introduced a nonlinear measure of similarity between line segments which is not sensitive to either very small or very large differences in their scale and orientation. Our specific choice of segment similarity, combined with a novel merging mechanism and an improved interruption operation, adds up to a robust and successful algorithm. The most important properties of our algorithm, which make it advantageous over other successful matching algorithms, are its relatively low complexity, its locality, which allows us to deal with occlusions, and its invariance to 2D image transformations.

We demonstrated excellent results: matching similar curves under partial occlusion, matching similar curves where the curves depict the occluding contours of objects observed from different viewpoints, and matching different but related curves (like the silhouettes of different mammals or cars). Furthermore, in a classification task using our algorithm to precompute image pair similarity, the clustering algorithm detected all the relevant partitions. This kind of database structuring could not have emerged without the reliable estimation of the dissimilarities between weakly similar images. Hence, the classification tree is indirect and objective evidence for the quality of our matching method.

Note that, according to the appearance-based approach to object recognition, an object is represented by a collection of images which, in some sense, span the image space of that object. Our method can be used to divide the appearances of an object into clusters of similar views in order to assist the construction of an appearance-based representation.

    APPENDIX

    SYNTACTIC OPERATIONS: COMPARISONS

    A How to Measure Similarity

Local and scale invariant matching methods usually use the normalized length ℓ/ℓ₀. For example, the ratio between normalized lengths (ℓ/ℓ₀)/(ℓ′/ℓ₀′) is used in [26], [27] (with global normalization, the difference |ℓ/L − ℓ′/L′| can be used [41], [44]). The ratio between normalized lengths may be viewed as the ratio between the relative scale c = ℓ/ℓ′ and the reference relative scale c₀ = ℓ₀/ℓ₀′. While this scale ratio is invariant, unlike our measure it is not bounded and is thus less stable.


Fig. 12. (a), (b) Two weakly similar curves. The points are extracted automatically from polygonal approximations that do not depart from the original curves by more than eight pixels. (c) The desirable matching result, as obtained by our matching algorithm. The average distance between matched feature points is 25 pixels. The directed Hausdorff distance is 34 pixels. (d) A nondesirable pairing of points which yields a better proximity value. The average distance between matched feature points is only 23 pixels. The directed Hausdorff distance is only 29 pixels.

We are familiar with only one other definition of a symmetric, bounded, and scale invariant measure for segment length similarity [26]. However, their matching algorithm is not syntactic and is very different from ours. In addition, there is an important qualitative difference between the two definitions (see Fig. 13), where our measure is more suitable for flexible matching.

As for our measure of orientation difference, we note that a linear measure of orientation differences has been widely used by others [9], [27], [41], [42]. The nonlinear measure used by [26] differs from ours in exactly the same way as discussed above concerning length.

Reminiscent of our combined similarity measure (3), a coupled measure is used in [33]: The segments are superimposed at one end and their dissimilarity is proportional to the distance between their other ends. However, this measure is too complicated for our case, and it has the additional drawback that it is sensitive to the arbitrary reference scale and orientation (in the character recognition task of [33], it is assumed that characters are of the same scale and properly aligned).

    B How to Open Gaps

All the syntactic shape matching algorithms that we are familiar with make use of deletions and insertions as purely local operations, as in classical string matching. That is, the cost of inserting a sequence of gaps into a contour is equal to the cost of spreading the same number of gap elements in different places along the contour. We distinguish the two cases, since the first typically arises from occlusion or partial matching, while the second typically arises from curve dissimilarity. In order to make the distinction, we adopt a technique frequently used in protein sequence comparison, namely, we assign a cost to any event of contour interruption, in addition to the (negative) cost of deleting/inserting any single element.
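This is the affine gap penalty familiar from sequence alignment. A minimal sketch (hypothetical names and illustrative costs, not the paper's actual parameters) of how charging one interruption event per gap run, plus a per-element cost, distinguishes the two cases:

```python
def gap_cost(run_lengths, interruption_cost=3.0, element_cost=1.0):
    """Total cost of a set of gap runs: each run is charged one
    contour-interruption event plus a per-element deletion/insertion
    cost for every gap element it contains."""
    return sum(interruption_cost + element_cost * k for k in run_lengths)

# Five gap elements in one run (typical of occlusion) cost less than
# five scattered single-element runs (typical of curve dissimilarity):
one_run = gap_cost([5])        # 3 + 5*1 = 8
scattered = gap_cost([1] * 5)  # 5 * (3 + 1) = 20
```

With a purely local per-element cost (interruption_cost = 0), the two configurations would be indistinguishable; the interruption term is what encodes the distinction.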

    C How to Merge Segments

A similar approach to segment merging (Section 5.3.3) was taken in [44], but their use of invariant attributes made it impossible to realize the merge operator as an operation on attributes. Specifically, there is no analytical relation between the attributes being merged and the attributes of the new primitive. Instead, a cascade of alternative representations was used, each one obtained by a different Gaussian smoothing of the two-dimensional curve; a primitive sequence is replaced by its "ancestor" in the scale space description.10

Compare the polygonal approximation after merging with the polygon that would have been obtained had the curve been first smoothed and then approximated. The two polygons are not identical, since smoothing may cause displacement of features (vertices). However, a displaced vertex cannot be too far from some feature at the finest scale; the location error caused by "freezing" the feature points is clearly bounded by the length of the longest fragment in the initial (finest scale) representation. To ensure good multiscale feature matching, our suboptimal polygonal approximation is sufficient and the expensive generation of the multiscale cascade is not necessary. Instead, the attributes of the coarse scale representation may be computed directly from the attributes of the finer scale.

Merging was defined as an operation on attributes in [41], which also applied the technique to Chinese character recognition [42]. Their algorithm suffers from some drawbacks concerning invariance and locality;11 below, we concentrate on their merging mechanism and compare it to our own.

Assume that two line segments characterized by (ℓ₁, θ₁) and (ℓ₂, θ₂) are to be merged into one segment (ℓ, θ). In [41], ℓ = ℓ₁ + ℓ₂ and θ is the weighted average between θ₁ and θ₂, with weights ℓ₁/(ℓ₁ + ℓ₂) and ℓ₂/(ℓ₁ + ℓ₂) and with the necessary cyclic correction.12 Usually, the polygonal shape that is obtained using this simple ad hoc merging scheme cannot approximate the smoothed contour very well.
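A sketch of such a length-weighted angular average (hypothetical helper; the vector-sum form below is one standard way to realize the cyclic correction and may differ in detail from the exact rule in [41]):

```python
import math

def merge_segments(l1, t1, l2, t2):
    """Merge segments (l1, t1) and (l2, t2) into (l, t): lengths add,
    and the orientation is the length-weighted circular mean, computed
    by averaging unit vectors so that wrap-around at +/-pi is handled."""
    l = l1 + l2
    x = l1 * math.cos(t1) + l2 * math.cos(t2)
    y = l1 * math.sin(t1) + l2 * math.sin(t2)
    return l, math.atan2(y, x)
```

For example, averaging 0.9π (almost "west") and −0.9π (also almost "west") with equal lengths gives ±π ("west") and not 0 ("east"), as footnote 12 requires.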


Fig. 13. Scale similarity: The scale c = ℓ/ℓ′ is compared with the reference scale c₀ = ℓ₀/ℓ₀′. Left: The binary relation used by Li [26] to measure scale similarity is exp(−|log(c/c₀)|/σ) with σ = 0.5. Right: Our measure function (1) is not sensitive to small scale changes, since it is flat near the line c = c₀.
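Li's relation from the caption above can be written down directly (illustrative Python; σ = 0.5 as in the figure):

```python
import math

def li_scale_similarity(c, c0, sigma=0.5):
    """Li's bounded scale-similarity relation exp(-|log(c/c0)|/sigma):
    equals 1 when c == c0 and decays toward 0 as the scale ratio
    c/c0 departs from 1 in either direction."""
    return math.exp(-abs(math.log(c / c0)) / sigma)
```

The function is symmetric in c and c₀ (since |log(c/c₀)| = |log(c₀/c)|) and bounded in (0, 1], but its peak at c = c₀ is sharp, whereas measure (1) in the paper is flat there, making it less sensitive to small scale changes.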

Fig. 14. Comparison between merging rules: (a) A polygonal approximation of a curve with two dotted segments which are to be merged. (b) Merging result according to our scheme. A coarser approximation is obtained. (c) Merging according to Tsai and Yu [41]. The new polygon does not appear to give a good approximation.

10. The primitive elements used in [44] are convex and concave fragments, which are bounded by inflection points. The attributes are the fragment length divided by the total curve length (a nonlocal attribute) and the accumulated tangent angle along the fragment (a noninterruptible attribute). The algorithm cannot handle occlusions or partial distortions and massive preprocessing is required to prepare the cascade of syntactic representations for each curve with a consistent fragment hierarchy.

11. The primitives used by [41] are line segments; the attributes are relative length (with respect to the total length) and absolute orientation (with respect to the first segment). The relative length is, of course, a nonlocal attribute and, in addition, the algorithm uses the total number of segments, meaning that the method cannot handle occlusions. The problem of attribute variance due to rotation remains, in fact, unsolved. The authors assume that the identity of the first segment is known. They comment that if this information is missing, one may try to hypothesize an initial match by labeling the segment that is nearest the most salient feature as segment number one.

12. For example, an equal weight average between 0.9π (almost "west") and −0.9π (almost "west" as well) is π ("west") and not zero ("east").

Satisfactory noise reduction is only achieved in one of the following two extreme cases: Either one segment is dominant (much longer than the other one) or the two segments have similar orientation. If two or more segments with large orientation variance are merged, the resulting curve may be very different from the original curve (see Fig. 14). Hence, by performing segment merging on the polygonal approximation at a fine scale, one typically does not obtain an acceptable coarse approximation of the shape.

    ACKNOWLEDGMENTS

The authors would like to thank Ben Kimia and Daniel Sharvit for the 31 silhouette database and Davi Geiger for the human limbs data. This research is partially funded by the Israeli Ministry of Science.

REFERENCES

[1] N. Ansari and E. Delp, "Partial Shape Recognition: A Landmark Based Approach," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, pp. 470-489, 1990.
[2] E. Arkin, L.P. Chew, D. Huttenlocher, K. Kedem, and J. Mitchell, "An Efficiently Computable Metric for Comparing Polygonal Shapes," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, pp. 209-216, 1991.
[3] H. Asada and M. Brady, "The Curvature Primal Sketch," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 8, pp. 2-14, 1986.
[4] N. Ayache and O. Faugeras, "HYPER: A New Approach for the Recognition and Positioning of Two Dimensional Objects," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 8, pp. 44-54, 1986.
[5] R. Basri, L. Costa, D. Geiger, and D. Jacobs, "Determining the Similarity of Deformable Shapes," IEEE Workshop Physics Based Modeling in Computer Vision, pp. 135-143, 1995.
[6] B. Bhanu and O. Faugeras, "Shape Matching of Two Dimensional Objects," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, pp. 137-155, 1984.
[7] R. Bolles and R. Cain, "Recognizing and Locating Partially Visible Objects: The Local-Feature-Focus Method," Int'l J. Robotics Research, vol. 1, pp. 57-81, 1982.
[8] A. Brint and M. Brady, "Stereo Matching of Curves," Image and Vision Computing, vol. 8, pp. 50-56, 1990.
[9] W. Christmas, J. Kittler, and M. Petrou, "Structural Matching in Computer Vision Using Probabilistic Relaxation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, pp. 749-764, 1995.
[10] I. Cohen, N. Ayache, and P. Sulger, "Tracking Points on Deformable Objects Using Curvature Information," Proc. European Conf. Computer Vision, pp. 458-466, 1992.
[11] L. Davis, "Shape Matching Using Relaxation Techniques," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 1, pp. 60-72, 1979.
[12] A. Del Bimbo and P. Pala, "Visual Image Retrieval by Elastic Matching of User Sketches," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 121-132, 1997.
[13] A. Dembo, S. Karlin, and O. Zeitouni, "Critical Phenomena for Sequence Matching with Scoring," Annals of Probability, vol. 22, pp. 1993-2021, 1994.
[14] Y. Gdalyahu, D. Weinshall, and M. Werman, "Stochastic Image Segmentation by Typical Cuts," Proc. IEEE Conf. Computer Vision and Pattern Recognition, Fort Collins, Colo., 1999.
[15] D. Geiger, A. Gupta, L. Costa, and J. Vlontzos, "Dynamic Programming for Detecting, Tracking and Matching Deformable Contours," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, pp. 294-302, 1995.
[16] J. Gorman, O. Mitchell, and F. Kuhl, "Partial Shape Recognition Using Dynamic Programming," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 10, pp. 257-266, 1988.
[17] J. Gregor and M. Thomason, "Dynamic Programming Alignment of Sequences Representing Cyclic Patterns," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 129-135, 1993.
[18] R. Horaud and T. Skordas, "Stereo Correspondence Through Feature Grouping and Maximal Cliques," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, pp. 1168-1180, 1989.
[19] D. Huttenlocher, G. Klanderman, and W. Rucklidge, "Comparing Images Using the Hausdorff Distance," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 850-863, 1993.
[20] D. Huttenlocher and S. Ullman, "Object Recognition Using Alignment," Proc. Int'l Conf. Computer Vision, pp. 102-111, London, 1987.
[21] D.W. Jacobs, D. Weinshall, and Y. Gdalyahu, "Condensing Image Databases when Retrieval Is Based on Nonmetric Distances," Proc. Sixth Int'l Conf. Computer Vision, Bombay, 1998.
[22] B. Kamgar-Parsi, M. Margalit, and A. Rosenfeld, "Matching General Polygonal Arcs," Computer Vision, Graphics, and Image Processing: Image Understanding, vol. 53, pp. 227-234, 1991.
[23] M. Koch and R. Kashyap, "Using Polygons to Recognize and Locate Partially Occluded Objects," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, pp. 483-494, 1987.
[24] L.J. Latecki and R. Lakämper, "Shape Similarity Measure for Image Database of Occluding Contours," Proc. Fourth IEEE Workshop Applications of Computer Vision, Princeton, N.J., Oct. 1998.
[25] Y. Lamdan, J. Schwartz, and H. Wolfson, "Affine Invariant Model Based Object Recognition," IEEE Trans. Robotics and Automation, vol. 6, pp. 578-589, 1990.
[26] S. Li, "Matching: Invariant to Translations, Rotations and Scale Changes," Pattern Recognition, vol. 25, pp. 583-594, 1992.
[27] H. Liu and M. Srinath, "Partial Shape Classification Using Contour Matching in Distance Transformation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, pp. 1072-1079, 1990.
[28] C. Lu and J. Dunham, "Shape Matching Using Polygon Approximation and Dynamic Alignment," Pattern Recognition Letters, vol. 14, pp. 945-949, 1993.
[29] A. Marzal and E. Vidal, "Computation of Normalized Edit Distance and Applications," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 926-932, 1993.
[30] R. McConnell, R. Kwok, J. Curlander, W. Kober, and S. Pang, "Ψ-S Correlation and Dynamic Time Warping: Two Methods for Tracking Ice Floes in SAR Images," IEEE Trans. Geoscience and Remote Sensing, vol. 29, pp. 1004-1012, 1991.
[31] F. Mokhtarian and A. Mackworth, "Scale-Based Description and Recognition of Planar Curves and Two-Dimensional Shapes," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 8, pp. 34-44, 1986.
[32] M. Pelillo, K. Siddiqi, and S. Zucker, "Matching Hierarchical Structures Using Association Graphs," Proc. European Conf. Computer Vision, pp. 3-16, 1998.
[33] J. Rocha and T. Pavlidis, "A Shape Analysis Model with Applications to a Character Recognition System," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, pp. 393-404, 1994.
[34] S. Sclaroff and A. Pentland, "Modal Matching for Correspondence and Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, pp. 545-561, 1995.
[35] L. Shapiro and M. Brady, "Feature Based Correspondence: An Eigenvector Approach," Image and Vision Computing, vol. 10, pp. 283-288, 1992.
[36] D. Sharvit, J. Chan, H. Tek, and B. Kimia, "Symmetry Based Indexing of Image Databases," J. Visual Comm. and Image Representation, 1998.
[37] A. Shokoufandeh, S. Dickinson, K. Siddiqi, and S. Zucker, "Indexing Using a Spectral Encoding of Topological Structure," Proc. Computer Vision and Pattern Recognition, pp. 491-497, 1999.
[38] K. Siddiqi, A. Shokoufandeh, S. Dickinson, and S. Zucker, "Shock Graphs and Shape Matching," Proc. Int'l Conf. Computer Vision, pp. 222-229, 1998.
[39] G. Stockman, S. Kopstein, and S. Benett, "Matching Images to Models for Registration and Object Detection via Clustering," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 4, pp. 229-241, 1982.
[40] S. Tirthapura, D. Sharvit, P. Klein, and B. Kimia, "Indexing Based on Edit Distance Matching of Shape Graphs," SPIE Proc. Multimedia Storage and Archiving Systems III, pp. 25-36, 1998.
[41] W. Tsai and S. Yu, "Attributed String Matching with Merging for Shape Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 7, pp. 453-462, 1985.
[42] Y. Tsay and W. Tsai, "Attributed String Matching by Split and Merge for On-Line Chinese Character Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 180-185, 1993.
[43] A. Tversky, "Features of Similarity," Psychological Review, vol. 84, pp. 327-352, 1977.
[44] N. Ueda and S. Suzuki, "Learning Visual Models from Shape Contours Using Multiscale Convex/Concave Structure Matching," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 337-352, 1993.
[45] S. Umeyama, "Parameterized Point Pattern Matching and Its Application to Recognition of Object Families," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 136-144, 1993.
[46] Y. Wang and T. Pavlidis, "Optimal Correspondence of String Subsequences," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, pp. 1080-1086, 1990.
[47] D. Weinshall and M. Werman, "On View Likelihood and Stability," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 97-108, 1997.
[48] M. Werman and D. Weinshall, "Similarity and Affine Invariant Distance Between Point Sets," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, pp. 810-814, 1995.

Yoram Gdalyahu received the BSc degree in physics and mathematics from the Hebrew University, Jerusalem, Israel. He received the MSc degree in physics from the Weizmann Institute of Science, Rehovot, Israel, for his work on resonant tunneling and inelastic scattering in Gallium Arsenide heterostructures. His PhD thesis was in computer vision, submitted to the senate of the Hebrew University in October 1999. His research combines computer vision and machine learning. Upon graduation, he received the Eshkol scholarship given by the Israeli Ministry of Science to the best PhD students. He is currently with the IBM Research Laboratory, Haifa, Israel.

Daphna Weinshall received the BSc degree in mathematics and computer science from Tel-Aviv University, Tel-Aviv, Israel, in 1982. She received the MSc and PhD degrees in mathematics and statistics from Tel-Aviv University in 1985 and 1986, respectively, for her work on models of evolution and population genetics. Between 1987 and 1992, she visited the Center for Biological Information Processing at MIT and the IBM T.J. Watson Research Center. In 1993, she joined the Institute of Computer Science at the Hebrew University of Jerusalem, where she is now an associate professor. Her research interests include computer and biological vision, machine and human learning. She has published papers on learning in machine and human vision, qualitative vision, visual psychophysics, Bayesian vision, invariants, multipoint and multiframe geometry, image and model point-based metrics, motion and structure from motion. She is a member of the IEEE.
