Many-to-Many Feature Matching for Structural Pattern Recognition · 2009-02-27

Many-to-Many Feature Matching for

Structural Pattern Recognition

Muhammed Fatih Demirci

Technical Report DU-CS-05-13

Department of Computer Science

Drexel University

Philadelphia, PA 19104

December, 2005


Many-to-Many Feature Matching for Structural Pattern Recognition

A Thesis

Submitted to the Faculty

of

Drexel University

by

Muhammed Fatih Demirci

in partial fulfillment of the

requirements for the degree

of

Doctor of Philosophy

December 2005

© Copyright 2005 Muhammed Fatih Demirci. All Rights Reserved.


Dedications

To my family


Acknowledgements

This research would not have been possible without a number of people. First and foremost, I would like to express my deepest gratitude to my advisor, Dr. Ali Shokoufandeh, for his invaluable, friendly guidance, trust, patience, and constant encouragement. Being his first Ph.D. student has been a great honor for me. I will always walk through my academic life the way he taught me.

I would also like to express my gratitude to Dr. Sven Dickinson of the University of Toronto for his collaboration, for providing timely advice and encouragement, and for serving on my Ph.D. committee. I would also like to thank Dr. Ko Nishino for his advice, for his tireless effort in reviewing my thesis, and for spending his precious time on the Ph.D. committee. Special thanks are due to Dr. Dario Salvucci and Dr. Kim Boyer of the Ohio State University for reading this thesis and for serving on my committee. I thank Dr. Wei Sun for his generous support and thoughtful feedback. Thanks are also due to Dr. Yakov Keselman, Dr. Lars Bretzner, Bram Platel, and Nicu Cornea for their helpful collaboration.

I would also like to thank the members of the Applied Algorithms Lab, Trip Denton, Jeff Abrahamson, and John Novatnack, for taking the time to proofread most of my publications, including this thesis. The discussions I had in AAL provided valuable input to this dissertation. I also thank Craig Schroeder for proofreading many parts of this thesis. I would like to thank my friends Kemal Birtek, Suleyman Teke, Necati Anaz, and Yucel Savran for being so friendly and patient, and for keeping me company during my stay in Philadelphia. I am also thankful to my brother-in-law, Dr. Sinan Akgul, and my sister, Aysen Akgul, for their help and hospitality during the first year of my study in Delaware.

Finally, my endless thanks are due to my parents, Keziban and Musa Demirci, and my fiancee, Elmashan, for their support, patience, and encouragement during these many years. I would never have dreamed of pursuing my career as a researcher without them.


Table of Contents

List of Tables
List of Figures
Abstract
1 Introduction
    1.1 The Problem
    1.2 Objective
    1.3 Thesis Overview
2 Review of Previous Work
    2.1 Graph Representations and Basic Terminology
    2.2 Graph Matching
    2.3 Embedding Techniques
    2.4 Dimensionality Reduction Techniques
    2.5 Conclusions
3 Metric Embedding of Graphs
    3.1 Introduction
    3.2 Construction of a Tree Metric from a Distance Matrix (Numerical Taxonomy Problem)
    3.3 Embedding into Graph-Dependent Dimensionality
        3.3.1 Path Partition of a Graph
        3.3.2 Construction of the Embedding
        3.3.3 Bringing Point Distributions into the Same Normed Space
    3.4 Embedding through Spherical Coding
        3.4.1 Construction of the Embedding
    3.5 Conclusions
4 Encoding Directed Edges
    4.1 Qualitative Shape Representation Using a Blob/Ridge Decomposition
    4.2 Conclusions
5 Distribution-Based Many-to-Many Matching
    5.1 Choosing an Appropriate Transformation
    5.2 The Final Algorithm
    5.3 Conclusions
6 View-Based 3-D Object Recognition
    6.1 Many-to-Many Matching using Silhouettes
    6.2 Many-to-Many Matching using Ridge-and-Blob Decomposition Graphs
    6.3 Comparison to Other Approaches
    6.4 Conclusions
7 Face Recognition Experiments
    7.1 Discrete Representation of Top Points via Scale Space Tessellation
    7.2 Catastrophe Theory
    7.3 Construction of the Graph
    7.4 Experimental Results
    7.5 Conclusions
8 3D Object Retrieval using Many-to-Many Matching of Curve Skeletons
    8.1 Introduction
    8.2 Approach
        8.2.1 The Curve-Skeleton
    8.3 Experimental Results
        8.3.1 Base Classification and Object Retrieval
        8.3.2 Part Matching
    8.4 Conclusions
9 Conclusions
    9.1 Summary
    9.2 Contributions
    9.3 Discussion and Future Work
Bibliography
Vita


List of Tables

6.1 Recognition rate as a function of increasing perturbation. Note that the baseline recognition rate (with no perturbation) is 98.0% for the COIL-20 and 98.5% for the ETH-80 datasets.

7.1 Recognition rate as a function of Gaussian noise at different signal levels.


List of Figures

1.1 The need for many-to-many matching. In the two images, the two objects are similar, but the extracted features are not necessarily one-to-one. Specifically, the ends of the fingers of the hand in the left image have been over-segmented in the hand of the right image.

1.2 Object Recognition Domains Used in the Framework. From left to right: Silhouette, Multi-Scale Qualitative Shape Description, Top Point in Scale Space, 3-D Skeleton.

1.3 Overview of Many-to-Many Matching Procedure.

1.4 A hierarchical relation between two features in a directed graph.

1.5 Left: the silhouette and its shock graph. Right: the shock tree constructed from the shock graph. Darker nodes reflect larger radii.

2.1 An example graph whose vertices represent different image regions and whose edges represent relations between the regions.

2.2 One-to-one feature correspondences computed by Siddiqi et al. [99].

2.3 Matching results between two pairs of objects computed by Sebastian et al. [86].

2.4 Representing a 256 × 256 pixel image as a point in a 65,536-dimensional space. Each pixel shown by a square in (a) corresponds to an entry in the 65,536-size vector in (b).

3.1 Metric tree representation of the Euclidean distances between nodes in a graph. The gesture image (a) consists of 6 regions (the region representing the entire hand is not shown). The complete graph in (b) captures the Euclidean distances between the centroids of the regions, while (c) is the metric tree representation of the multi-scale decomposition (with additional vertices).

3.2 Path partition of a tree.

3.3 (a) A sample tree with edge weights. (b) Embedded vertices are shown in 3-dimensional space. The Cartesian coordinates of the points are: a = (0, 0, 0), b = (1, 0, 0), c = (1.5, 0, 0), d = (0, 2, 0), e = (0, 3.5, 0), f = (0, 2.23, 1).

3.4 The minimum distance d and minimum angle θ between 2 points.

3.5 An edge weighted tree and its spherical code in 2D. The Cartesian coordinates of the vertices are: a = (0, 0), b = (0, 1.0), c = (0, 1.5), d = (2.0, 0), e = (2.5, 0.87), f = (3.5, 0), g = (3.93, 0.25), and h = (4.5, 0).

3.6 Trade-off between distortion and dimension for a given set of graphs.

4.1 Feature Extraction: Extracted blobs and ridges at appropriate scales.

4.2 Extracted blobs and ridges after removing multiple responses and ridge linking.

4.3 The four edge relations: (a, b) two normalized distance measures, (c) relative orientation, and (d) bearing.

4.4 Histogram creation for each directed graph relation.

4.5 Part (a) shows a vertex and its neighbors with their attributes. Histograms created for each attribute are presented in parts (b) and (c).

6.1 Left: the silhouette and its medial axis. Right: the medial axis tree constructed from the medial axis. Darker nodes reflect larger radii.

6.2 Sample views of the 9 objects.

6.3 Summary of many-to-many matchings of object silhouettes. Every entry of Table 1 corresponds to a set of 19 × 19 matching results between the views of the two objects associated with the row and the column. The shade of gray in each cell denotes the average matching distance of each 19 × 19 block, with black and white representing smallest and largest distances, respectively. Table 2 shows a close-up look at the matching results for four views of TEAPOT. Table 3 depicts a subset of results from three separate blocks.

6.4 Illustration of the many-to-many correspondences computed for two adjacent views of the TEAPOT. Matched point clusters are shaded with the same color.

6.5 The result of matching skeleton graphs for some shapes in the Rutgers Tools Database. Same colors indicate corresponding segments. Observe that the correspondence is intuitive in all cases.

6.6 Applying our algorithm to the images in Figure 1.1. Many-to-many feature correspondences have been colored the same.

6.7 Views of sample objects from the Columbia University Image Library (COIL-20) and the ETH Zurich (ETH-80) Image Set.

6.8 Sample matching results for object 9 of the COIL-20 database, in which rows and columns can be interleaved to form the set of sequential views. The diagonal and next lower diagonal therefore represent the neighboring views of the query (row). Only one query, entry (10,8), was incorrectly matched.

6.9 The matching results for the COIL-20 database. The rows represent the query views (36 views per object), and the columns represent the model views (36 views per object). Each row represents the matching results for a query view against the whole database. The intensity of entries represents the quality of the matching, with black representing maximum similarity between the views and white minimum similarity.

6.10 Sample views of objects from the Rutgers Tools Database.

6.11 Comparison to two leading graph matching algorithms: Pelillo et al. [76] (left), Sebastian et al. [87] (center), and our algorithm (right). In each case, the top seven matched database objects are sorted by their similarity to the query. Correct matches are colored yellow, while mismatched entries are colored red.

7.1 The generic catastrophes in isotropic scale space. Left: an annihilation event. Right: a creation event. A positive charge (+) denotes an extremum and a negative charge (−) denotes a saddle; the remaining marker indicates the singular point.

7.2 Visualization of the DAG construction algorithm. Left: the Delaunay triangulations at the scales of the nodes. Right: the resulting DAG (edge directions not shown).

7.3 The right image shows the DAG obtained from applying Algorithm 7 to the critical paths and top points of the face on the left.

7.4 Sample faces from 20 people.

7.5 Ten face images of one person from the database.

7.6 Computing similarity between two given faces. (Matched point clusters are shaded with the same color.) See text.

7.7 Table 1: Matching results of 20 people. The rows represent the queries and the columns represent the database faces (query and database sets are non-intersecting). Each row represents the matching results for the set of 10 query faces corresponding to a single individual matched against the entire database. The intensity of the table entries indicates matching results, with black representing maximum similarity between two faces and white representing minimum similarity. Table 2: Subset of the matching results with the pairwise distances shown. Table 3: Effect of presence or absence of glasses in the matching for the same person.

7.8 Sample face image after adding Gaussian noise at different signal levels. Part (a) shows the original image. Parts (b), (c), (d), (e), and (f) show how the image looks after adding 1%, 2%, 4%, 8%, and 16% of Gaussian noise, respectively.

8.1 Some examples of 3D shapes and their computed skeletons.

8.2 Computing similarity between two given objects.

8.3 Precision/Recall for many-to-many matching algorithm in object retrieval experiment.

8.4 Models are sorted by the similarity to the query object.

8.5 Part Matching Example: computed distances between a query part (torso) versus several simple and composite objects.

8.6 Correspondences in Part Matching: The query object in (a) is matched against each of the objects in (b). The correspondences between their skeletons are shown in red in (c).


Abstract

Many-to-Many Feature Matching for Structural Pattern Recognition

Muhammed Fatih Demirci

Advisor: Ali Shokoufandeh, Ph.D.

Graph matching is an important component in many object recognition algorithms. Although most graph matching algorithms seek a one-to-one correspondence between nodes, it is often the case that a more meaningful correspondence exists between a subset of nodes in one graph and a subset of nodes in the other. In this thesis we develop a framework for establishing many-to-many correspondences between the nodes of two noisy, vertex-labeled weighted graphs. The difficulty of providing such correspondences is due to the fact that any subset of nodes in one graph may correspond to any subset of nodes in another. To overcome this combinatorial challenge, we transform the graphs into an alternative domain in which many-to-many graph matching becomes the problem of matching point sets. Our interest in this transformation is motivated by the fact that a number of algorithms have proven useful in establishing such correspondences in geometric space in polynomial time. Our goal is to use one such algorithm to approximate the solution for the original graph representations. The algorithm is based on recent developments in efficient low-distortion metric embedding of graphs into normed vector spaces. We present two such embedding algorithms, beginning with Matousek’s algorithm [66], in which the dimensionality of a graph’s embedding is graph-dependent. Two graphs to be matched may yield embeddings with different dimensionality, requiring a projection step to bring them to the same space. We overcome this problem by introducing a novel embedding technique, using a spherical encoding of graph structure, that embeds both graphs into a single space of prescribed dimensionality. By embedding weighted graphs into normed vector spaces, we reduce the problem of many-to-many graph matching to the problem of computing a distribution-based distance measure between graph embeddings. We use a specific measure, the Earth Mover’s Distance, to compute distances between sets of weighted vectors. The computed mass flows yield a set of many-to-many node correspondences between the original graphs. Empirical evaluation of the algorithm on an extensive set of recognition trials, including a comparison with competing graph matching approaches, demonstrates both the robustness and efficiency of the overall approach.


1. Introduction

1.1 The Problem

Humans show a remarkable ability to recognize objects without effort, despite the fact that objects may vary in color, texture, or size. We are even able to describe and recognize objects that we have not seen before. While people can accomplish such recognition tasks quickly and accurately, building computer-based recognition systems with this capability remains difficult. Given a database and a query, one way of defining the problem of object recognition is to classify the query as an instance of a particular category from the database.

More formally, the object recognition problem is often formulated as the process of extracting object features, such as silhouettes, corners, and skeletons, and finding correspondences between them. For recognition purposes, objects are often represented as attributed graphs whose nodes represent their features (or their abstractions) and whose edges represent relations (or constraints) between the features. These graph representations allow us to express many perceptually significant object properties, such as geometric or hierarchical part structures. We will use the terms features, nodes, and vertices interchangeably throughout the rest of this thesis.

In the computer vision community, many algorithms designed to solve the object recognition problem use graph representations. One of the most important reasons for this is that graphs capture hierarchical feature relations in a way that is invariant to viewpoint changes. When graphs are used to represent objects, the object recognition problem can be transformed into that of graph matching. Given two graphs, the objective of graph matching algorithms is to establish correspondences between nodes. To evaluate the quality of a match, one defines an overall distance measure whose value depends on both node and edge similarity.

Because the recognition problem can be formulated in terms of graph matching, there has been a growing interest in developing efficient algorithms for matching graphs and measuring the similarity of objects using graph representations. Previous work on graph matching has typically focused on the problem of finding a one-to-one correspondence between the vertex sets of two graphs. However, the assumption of one-to-one correspondence is a very restrictive one, as it assumes that the primitive features (nodes) in the two graphs agree in their level of abstraction. Unfortunately, a variety of conditions may lead to graphs that represent visually similar image feature configurations yet have no one-to-one correspondence between their vertex sets.

The limitations of the one-to-one assumption are illustrated in Figure 1.1. In this example an object is decomposed into a set of ridges and blobs extracted at appropriate scales [95]. The ridges and blobs map to nodes in a directed graph, with parent/child edges directed from coarser scale nodes to overlapping finer scale nodes, and sibling edges between nodes that share a parent. Although the two images clearly contain similar objects, the decompositions are not identical. Specifically, the ends of the fingers in the right hand have been over-segmented with respect to the left hand. Due to noise or segmentation errors inherent in any feature extraction method, it is quite common for a single feature (node) in one graph to correspond to a collection of broken features (nodes) in another graph. Also, due to scale differences, a single, coarse-grained feature in one graph can correspond to a collection of fine-grained features in the other.

Figure 1.1: The need for many-to-many matching. In the two images, the two objects are similar, but the extracted features are not necessarily one-to-one. Specifically, the ends of the fingers of the hand in the left image have been over-segmented in the hand of the right image.

1.2 Objective

The principal objective of the research reported in this thesis is to develop a novel framework for establishing many-to-many feature correspondences between pairs of graphs. The difficulty of finding such correspondences is due to the fact that any subset of nodes in one graph may correspond to any subset of nodes in another. To overcome this combinatorial challenge, we transform the graphs into an alternative domain in which many-to-many feature matching becomes the problem of matching point sets. Our interest in this transformation is motivated by the fact that a number of algorithms have proven useful in establishing such correspondences in geometric space in polynomial time. Our goal is to use one such algorithm to approximate the solution for the original graphs. More specifically, we draw on recent low-distortion graph embedding techniques, which map the nodes of a graph to points in a low-dimensional geometric space. The points (which represent graph nodes) in this new space are positioned in such a way that the Euclidean distance between pairs of points reflects the shortest path distance between their corresponding nodes in the original graph.

It must be pointed out that these embedding methods are applicable only to undirected graphs, in which a metric (symmetric) distance can be defined between every pair of nodes. In most attributed graphs, however, such as scale-space structures, edges are directed and information is encoded by hierarchical, non-metric relations, such as parent/child or sibling relations. To extend our framework to directed graphs, we move such non-metric relational information into nodes as feature histograms. This allows embedded nodes to encode neighborhood information about their relations in the original graph representations.

Armed with a low-dimensional vector representation of an input graph’s structure, many-to-many graph matching can now be reduced to the much simpler problem of matching weighted distributions of points in a normed vector space. A number of algorithms have been developed for matching weighted points in normed vector spaces and computing similarities between them. We consider one such similarity measure, known as the Earth Mover’s Distance (EMD) [82]. Intuitively, the EMD approach can be described as follows. Given a pair of weighted point sets, consider the first set as a mass of earth spread in space and the other as a collection of holes in the same space. The EMD then computes the minimum amount of work needed to fill the holes with earth. Work here is the sum, over all transfers, of the mass moved from a point in one set to a point in the other multiplied by the distance over which it travels. Our goal is to show that the many-to-many vector mapping that realizes the minimum Earth Mover’s Distance corresponds to the desired many-to-many matching between nodes of the original graphs. The result is a more efficient approach to many-to-many graph matching that, in fact, includes the special case of one-to-one graph matching. To demonstrate the effectiveness of the approach to shape retrieval, we apply it to four different object recognition domains: silhouettes, multi-scale qualitative shape descriptions, top points in scale space, and 3-D skeletons. Figure 1.2 shows images from each of these domains. A comparative study using silhouettes and 3-D skeletons shows that our method outperforms all existing techniques reported for the same databases.

Figure 1.2: Object Recognition Domains Used in the Framework. From left to right: Silhouette, Multi-Scale Qualitative Shape Description, Top Point in Scale Space, 3-D Skeleton.
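The earth-and-holes description of the EMD above can be made concrete with a small sketch. The function name `emd`, its arguments, and the successive-shortest-path solver are choices made for this illustration only, not the implementation used in the thesis; it assumes small point sets with Euclidean ground distance and normalizes the work by the total mass moved.

```python
import math

def emd(points_a, weights_a, points_b, weights_b, eps=1e-12):
    """Illustrative Earth Mover's Distance for small weighted point sets.

    Returns (distance, flow), where flow[i][j] is the mass moved from
    points_a[i] to points_b[j]. Solved as a min-cost flow by successive
    shortest paths; intended for toy inputs, not for large problems.
    """
    n, m = len(points_a), len(points_b)
    cost = [[math.dist(p, q) for q in points_b] for p in points_a]
    supply, demand = list(weights_a), list(weights_b)
    flow = [[0.0] * m for _ in range(n)]
    total = min(sum(supply), sum(demand))
    while True:
        sources = [i for i in range(n) if supply[i] > eps]
        targets = [j for j in range(m) if demand[j] > eps]
        if not sources or not targets:
            break
        # Bellman-Ford on the residual graph. Nodes 0..n-1 are points of A,
        # nodes n..n+m-1 are points of B.
        dist = [0.0 if supply[i] > eps else math.inf for i in range(n)]
        dist += [math.inf] * m
        prev = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(n):
                for j in range(m):
                    # Forward edge i -> j: more mass can always be sent.
                    if dist[i] + cost[i][j] < dist[n + j] - eps:
                        dist[n + j] = dist[i] + cost[i][j]
                        prev[n + j] = i
                        changed = True
                    # Backward edge j -> i: undo mass that was already sent.
                    if flow[i][j] > eps and dist[n + j] - cost[i][j] < dist[i] - eps:
                        dist[i] = dist[n + j] - cost[i][j]
                        prev[i] = n + j
                        changed = True
            if not changed:
                break
        # Augment along the cheapest path to a hole that still has room.
        target = min(targets, key=lambda j: dist[n + j])
        path = [n + target]
        while prev[path[-1]] is not None:
            path.append(prev[path[-1]])
        path.reverse()
        amount = min(supply[path[0]], demand[target])
        for u, v in zip(path, path[1:]):
            if u >= n:  # backward edge limits how much can be rerouted
                amount = min(amount, flow[v][u - n])
        for u, v in zip(path, path[1:]):
            if u < n:
                flow[u][v - n] += amount
            else:
                flow[v][u - n] -= amount
        supply[path[0]] -= amount
        demand[target] -= amount
    work = sum(flow[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return (work / total if total > 0 else 0.0), flow

# One heavy point on the left, two lighter points on the right.
distance, flow = emd([(0.0, 0.0)], [2.0], [(1.0, 0.0), (-1.0, 0.0)], [1.0, 1.0])
print(distance, flow)
```

In the usage example the single point of the first set sends mass to both points of the second set, which is the kind of one-to-many correspondence that a one-to-one matcher cannot express. The nonzero entries of `flow` play the role of the many-to-many correspondences.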

An overview of the approach is presented in Figure 1.3. A pair of views is first represented by attributed graphs (Transition 1). The graphs are then mapped into a low-dimensional vector space using a low-distortion graph embedding (Transition 2). Finally, a many-to-many point (embedded graph node) correspondence is computed by the Earth Mover’s Distance (Transition 3).


Figure 1.3: Overview of Many-to-Many Matching Procedure.

1.3 Thesis Overview

Having defined the problem and the objectives of the research in this chapter, we review related work in Chapter 2. The related work covers several existing graph representations, graph matching algorithms, embedding techniques, and dimensionality reduction methods. Chapter 2 also introduces basic graph terminology.

Chapter 3 presents two low-distortion graph embedding techniques. Distortion, in this context, is defined as the maximum factor by which any distance is changed by the embedding algorithm. The first embedding technique is inspired by the general framework proposed by Matousek [66]. The framework begins by transforming a graph into a metric tree, which is then embedded into a normed vector space. Although low-distortion embedding is achieved, the approach suffers from the significant limitation that each graph is embedded into a vector space whose dimensionality is graph-dependent. Thus, before the embeddings can be matched, a dimensionality reduction technique such as Principal Component Analysis (PCA) is required. The aim of dimensionality reduction methods is to represent high-dimensional data in lower dimensions without significant loss of information. Since the original high-dimensional data cannot be represented exactly in lower dimensions, dimensionality reduction methods introduce error. The second embedding technique is a deterministic variation of the spherical coding algorithm [43]. This novel linear-time procedure embeds metric trees into normed vector spaces of prescribed dimensionality with minimal distortion. Since both embeddings are in the same space, they can be matched directly without the need for a dimensionality reduction step.
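As a rough illustration of the distortion measure described above, the following sketch computes the path distances in a small weighted tree and compares them with the Euclidean distances of a candidate embedding. The helper names `tree_distances` and `distortion`, and the toy tree, are assumptions made for this example. Distortion is read here as the largest factor by which any pairwise distance is stretched or shrunk, which is one common reading of the definition and may differ in detail from the one used in Chapter 3.

```python
import math
from itertools import combinations

def tree_distances(edges):
    """All-pairs path lengths in a weighted tree given as (u, v, weight) triples."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    dist = {}
    for root in adjacency:
        dist[root] = {root: 0.0}
        stack = [root]
        while stack:  # depth-first traversal; paths in a tree are unique
            u = stack.pop()
            for v, w in adjacency[u]:
                if v not in dist[root]:
                    dist[root][v] = dist[root][u] + w
                    stack.append(v)
    return dist

def distortion(edges, embedding):
    """Largest factor by which the embedding changes any tree distance.

    Assumes every node is embedded and no two nodes share the same point.
    """
    dist = tree_distances(edges)
    worst = 1.0
    for u, v in combinations(embedding, 2):
        ratio = math.dist(embedding[u], embedding[v]) / dist[u][v]
        worst = max(worst, ratio, 1.0 / ratio)
    return worst

# A star with two unit-weight leaves, embedded on perpendicular axes.
star = [("o", "x", 1.0), ("o", "y", 1.0)]
points = {"o": (0.0, 0.0), "x": (1.0, 0.0), "y": (0.0, 1.0)}
print(distortion(star, points))
```

In this toy case the two edges are preserved exactly, but the leaf-to-leaf distance shrinks from 2 in the tree to the length of the diagonal in the plane, so the embedding is not distortion-free.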

Graph embedding methods approximate the distance metric defined on undirected edges of the original graphs with minimal distortion. However, they fail to encode any oriented relations, such as parent/child or sibling relations common to scale-space or coarse-to-fine structures. This is due to the fact that oriented relations do not satisfy the symmetry property of a metric. In Figure 1.4, for example, while the relative scale from feature B to feature A is 0.5, it is 2.0 the other way around. To encode relational information in the vector space, the embedding procedure would have to represent each input graph node as a point in an asymmetric metric space. Due to the limited number of algorithms defined on asymmetric metric distances, we instead propose a method to encode relational information in the metric space.

Figure 1.4: A hierarchical relation between two features in a directed graph.

More specifically, after using a graph embedding algorithm to represent nodes as points in the metric space, we encode non-metric relational information into nodes as feature distributions over the values of incident oriented edges. For one node, encoding the attributes of its oriented edges requires computing distributions on the attributes and assigning them to the node. The resulting attribute provides a contextual signature for the node. This allows the framework to be applied to hierarchical structures represented as hierarchical graphs. This process will be explained in more detail in Chapter 4.
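The idea of moving oriented edge attributes into node signatures can be sketched as follows. The function `edge_histograms`, the bin boundaries, and the rule that each directed edge contributes to the histogram of its source node are all assumptions made for this illustration; Chapter 4 describes the attributes and histograms actually used.

```python
def edge_histograms(nodes, directed_edges, bin_edges):
    """Normalized histogram of outgoing edge values for every node.

    directed_edges holds (source, target, value) triples. Values that fall
    outside [bin_edges[0], bin_edges[-1]) are ignored in this sketch.
    """
    n_bins = len(bin_edges) - 1
    counts = {v: [0] * n_bins for v in nodes}
    for source, _target, value in directed_edges:
        for b in range(n_bins):
            if bin_edges[b] <= value < bin_edges[b + 1]:
                counts[source][b] += 1
                break
    histograms = {}
    for v, c in counts.items():
        total = sum(c)
        histograms[v] = [x / total for x in c] if total else [0.0] * n_bins
    return histograms

# Relative scale is asymmetric: 0.5 from B to A, but 2.0 from A to B.
edges = [("A", "B", 2.0), ("B", "A", 0.5)]
signatures = edge_histograms(["A", "B"], edges, [0.0, 1.0, 2.0, 4.0])
print(signatures)
```

Because the two directions of the relation land in different bins, nodes A and B receive different signatures even though they are joined by the same pair of edges. The asymmetry is thus kept at the nodes instead of being forced into a symmetric distance.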

Chapter 5 presents an overview of the Earth Mover’s Distance (EMD) framework [82] for matching point sets in a geometric space. The EMD approach computes the minimum amount of work, defined in terms of displacements of the point masses (weights), that it takes to transform one distribution into another. The mass of each point is a function of the histograms describing the node in the original graph. Here it is important to consider the possibility that one point set may undergo a transformation with respect to the other. To handle this, Cohen and Guibas [19] extended the definition of EMD, originally applicable to pairs of fixed sets of points, to allow one of the sets to undergo a transformation. In Chapter 5 we show how to choose an appropriate transformation when matching pairs of weighted point sets.
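For reference, the EMD between two fixed weighted point sets is commonly written as the transportation problem below. The symbols are notation assumed for this sketch and may differ from those used in Chapter 5: $P = \{(p_i, w_{p_i})\}_{i=1}^{m}$ and $Q = \{(q_j, w_{q_j})\}_{j=1}^{n}$ are the two weighted point sets, $d_{ij}$ is the ground distance between $p_i$ and $q_j$, and $f_{ij}$ is the mass moved from $p_i$ to $q_j$.

```latex
\begin{align*}
\min_{f_{ij}} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{subject to} \quad & f_{ij} \ge 0, \\
& \sum_{j=1}^{n} f_{ij} \le w_{p_i}, \qquad \sum_{i=1}^{m} f_{ij} \le w_{q_j}, \\
& \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\Big( \sum_{i=1}^{m} w_{p_i},\; \sum_{j=1}^{n} w_{q_j} \Big), \\[4pt]
\mathrm{EMD}(P, Q) \;=\; & \frac{\sum_{i,j} f^{*}_{ij}\, d_{ij}}{\sum_{i,j} f^{*}_{ij}},
\end{align*}
```

Here $f^{*}$ denotes an optimal flow. Its nonzero entries indicate which points, and hence which graph nodes, are put into correspondence.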

In Chapter 6 we evaluate the framework for each of the embedding techniques in two

different view-based recognition domains: silhouettes and multi-scale qualitative shape

descriptions. In the first domain, an object’s silhouette is represented by an undirected,

rooted, weighted graph in which nodes represent shocks [99] (also called skeleton points)


Figure 1.5: Left: the silhouette and its shock graph. Right: the shock tree constructed from the shock graph. Darker nodes reflect larger radii.

and edges connect adjacent shock points. Each point $p$ on the discrete skeleton is labeled
by a 4-dimensional vector $v(p) = (x, y, r, \alpha)$, where $(x, y)$ are the Euclidean coordinates
of the point, $r$ is the radius of the maximal bi-tangent circle centered at the point, and $\alpha$
is the angle between the normal to either bi-tangent and the linear approximation to the
skeleton curve at the point. The right of Figure 1.5 shows the shock graph constructed for

the left image. The second domain uses multi-scale qualitative shape descriptions in which

an image is decomposed into a set of blobs and ridges with automatic scale selection using

the algorithm described in [95]. We will explain these domains in more detail in Chapter 6.

In addition to these experiments, we show the applicability of the framework in two

other recognition domains: top points in scale-space and 3D skeletons in Chapter 7 and

Chapter 8, respectively. In Chapter 7 we describe a set of face recognition tests on a small

face database using scale-space top points. The embedding method used in this experiment

is based on spherical coding. The choice of spherical embedding is motivated by its better

performance than that of Matousek’s embedding. Each image in the database is represented

as a directed acyclic graph (DAG), where vertices represent the top points, and the edges

represent neighborhood structure between them. A DAG for each face image is constructed


using the algorithm described in [77].

In Chapter 8 we then apply the framework for 3D object retrieval using skeletal rep-

resentations of 3D volumetric objects. Each 3D object is represented as a curve skeleton,

which consists of a set of connected 1D curves (1 voxel thick). This representation has

a number of advantages: intuitiveness, part/component matching, registration, and
articulated transformation invariance. One important contribution of this chapter is to show
that our matching framework can perform part matching. More specifically, our goal is to

match a part within a complex whole in 3-dimensional space. This type of matching is par-

ticularly useful for CAD-type databases and also for recognition in laser-scanned images,

which tend to cluster objects together. It is also central to medical applications in which a

particular biological configuration is to be found somewhere in a larger object such as an

organ. At the end of Chapter 8 we present our preliminary part matching results.

Chapter 9 draws some conclusions and presents the potential of the proposed method

in a variety of computer vision and pattern recognition domains. In an object recognition

framework, the quality of feature extraction methods has a significant impact on both the

correctness and effectiveness of the recognition system. Hence, the experimental results
also reflect the quality of the feature extraction methods used in the framework. In

Chapter 9 we will also discuss the limitations of the approach and identify some directions

for future work.


2. Review of Previous Work

In this chapter we present a review of previous work relevant to the research in this

thesis. Specifically, this chapter discusses a number of different techniques including graph

representations, graph matching algorithms, embedding techniques, and dimensionality re-

duction methods.

2.1 Graph Representations and Basic Terminology

A finite graph $G$ is a pair $(V, E)$, where $V$ is a finite set of vertices and $E$ is a set
of edges between the vertices. An edge $e = (u, v)$ consists of two vertices $u, v \in V$. A
graph $G = (V, E)$ is edge-weighted if each edge $e \in E$ has a weight $\mathcal{W}(e) \in \mathbb{R}$. The size
of a graph $G$ is defined by its number of vertices, $|V|$, and its number of edges, $|E|$. Edges
are undirected when their corresponding relations are unordered, and a graph that contains
these types of edges is called undirected. Similarly, for ordered relations, i.e., $(u, v) \neq (v, u)$,
the graph is called directed. Directed edges are usually used to represent non-symmetric
relations. In our framework, we use graphs that are either directed or undirected. A graph
$G = (V, E)$ is said to be complete if for any two vertices $u, v \in V$, where $u \neq v$, there exists
an edge $(u, v) \in E$. In some graphs, vertices and edges contain additional information. In an
attributed graph, for example, scale, orientation, and anisotropy may be associated with
each vertex, while an edge may carry the scale ratio, relative orientation, and normalized
distance between its two vertices.
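The terminology above can be made concrete with a small Python sketch. It stores a directed, edge-weighted graph as a vertex set and a dictionary from ordered vertex pairs to real weights; this representation, the helper names `size` and `is_complete`, and the toy data are assumptions chosen for illustration, not the data structures used in the thesis.

```python
from itertools import combinations

# Hypothetical toy graph: vertex names and weights are illustrative only.
V = {"a", "b", "c", "d"}
# Each ordered pair (u, v) maps to a real-valued weight.
E = {("a", "b"): 1.0, ("b", "c"): 0.5, ("c", "d"): 0.25}


def size(vertices, edges):
    """Return the size of the graph as the pair (|V|, |E|)."""
    return len(vertices), len(edges)


def is_complete(vertices, edges, directed=True):
    """Check that every pair of distinct vertices is joined by an edge.

    For a directed graph both (u, v) and (v, u) are required; for an
    undirected graph either orientation of the stored pair is accepted.
    """
    for u, v in combinations(sorted(vertices), 2):
        if directed:
            if (u, v) not in edges or (v, u) not in edges:
                return False
        elif (u, v) not in edges and (v, u) not in edges:
            return False
    return True


if __name__ == "__main__":
    print(size(V, E))         # number of vertices and edges
    print(is_complete(V, E))  # this toy graph is a path, hence not complete
```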

Graphs have proven to be useful for object representations. A wide range of scientific


areas such as computer vision, computational and molecular biology, linguistics, computer

networks, etc., use graph representations for their applications. When graphs are used

to represent objects, vertices typically represent features (or regions) of an object, while

edges represent relations (or constraints) between features. To give an example, in Fig-

ure 2.1, each image region is shown by either circles or ellipses and relations between them

are shown by red lines. (The smallest circles represent the centers of regions.) The graph,

the result of a feature extraction process, contains four vertices and three edges. The first

vertex in this graph is a virtual feature corresponding to the whole image, while the others

represent the palm of the hand, the index finger, and tip of the finger. Edges show hierar-

chical information between the vertices. This type of graph representation is explained in

more detail in Chapter 4.

Many researchers encode structural representations of objects in graphs. To name a few,

Dickinson et al. [30] and Cyr et al. [25] used aspect graphs for 3D object representation.

Ioffe and Forsyth [49] employed trees to model people and for human tracking. Authors

in [57, 97, 98] used the notion of shock graphs to represent 2D shapes.

Due to their common use in many fields, graph representations have received significant

attention for indexing into large databases. Messmer and Bunke [68] proposed a decision

tree mechanism for hierarchically partitioning a database. The decision tree is constructed

from the database graphs in a preprocessing step. A query graph is first matched to the

root and depending on the result of this match, the process is applied recursively to one of

the subtrees. The objective here is to determine if there is a subgraph isomorphism from a

query graph to one of the database graphs. Sengupta and Boyer [90] partitioned a database

of 3D models in a spectral graph decomposition framework, where the nodes in the graph


Figure 2.1: An example graph whose vertices represent different image regions and whose edges represent relations between the regions.

represented 3D patches.

A related approach to the partition framework is clustering, where the database is or-

ganized into a set of prototypes and one representative element is selected in each group.

Shapiro and Haralick [93] used a clustering approach based on a relational distance metric to
organize a database of relational models. Sengupta and Boyer [89] presented a framework

for organizing large structural modelbases using an information theoretic criterion. The au-

thors constructed the hierarchical structure via clustering and computed the representative

elements of each cluster.

Recently, Sebastian et al. [88] proposed an indexing mechanism for retrieving can-

didate graphs from a large database. The framework was based on the use of a coarse-


scale distance along with coarse-scale sampling. The authors showed that a coarse-scale

distance measure resulted in a 50 to 100 times speed-up in distance computations and that,
overall, the framework reduced the computational requirements in retrieving candidate graphs.

Shokoufandeh et al. [96] proposed a framework for indexing hierarchical image structures

that embedded the topological structure of a directed acyclic graph (DAG) into a low-

dimensional vector space. Encoding a DAG’s topology was derived from an eigenvalue

characterization of a DAG’s adjacency matrix. Costa and Shapiro [23] developed an ap-

proach where small relational subgraphs were used to retrieve model graphs from a large

database. Sossa and Horaud [100] proposed a scheme that used the coefficients of the

$d_2$-polynomial corresponding to the Laplacian matrix of a graph. Irniger and Bunke [52]

presented a method based on decision trees to filter a database of graphs for a given query

graph. The method extended their previous work on graph matching performance and graph

database filtering [50, 51] and it was used to tackle both graph and subgraph isomorphism

problems.

A related problem to indexing in the information management community is that of

query processing over data that conforms to labeled graph data models. In this community,
a number of techniques have focused on extracting structural summaries from the
data [34, 38, 39, 71, 72]. These structural summaries play an important role during
query evaluation for graph-structured data [2, 15].

2.2 Graph Matching

The problem of finding the similarity between pairs of objects using their graph
representations has been the focus of many researchers for over twenty years in the computer


vision and pattern recognition communities. In this thesis we consider model-based object

recognition problems, where query and database objects are represented as two different

graphs. When objects are represented as graphs, the problem of object recognition can

be reformulated as that of graph matching. Graph matching has been used in a number

of applications, such as image analysis [64], document processing [63], and video
analysis [17]. Given a pair of graphs, graph matching techniques are often required to compute
the distance between them via a variety of functions (see [16, 31, 83, 92, 103]).

Previous work on graph matching has usually focused on finding one-to-one corre-

spondences between graph nodes. Barrow and Burstall [7] used association subgraphs

as an auxiliary graph to locate maximum common subgraphs. Vertices in an association

graph represent node-to-node correspondences between two input graphs. The goal, in this

work, was to find the maximal clique of an association graph to locate node correspon-

dences. Pelillo et al. [76] used a similar approach to match pairs of trees. Their method

first constructed an association graph using the concept of graph connectivity and obtained

a maximal subtree isomorphism through a maximal clique formulation. They proved that

there was a one-to-one correspondence between maximal clique and maximal subtree iso-

morphism.
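The association-graph idea can be sketched in a few lines of Python. In the sketch below, each vertex of the association graph is a candidate node pair, two pairs are joined when they are mutually consistent (distinct nodes whose adjacency agrees in both input graphs), and a largest clique yields a one-to-one correspondence for a common induced subgraph. The function names, the undirected unlabeled inputs, and the exhaustive clique search are simplifying assumptions for illustration; they are not the procedures of [7] or [76], and the search is exponential in the worst case.

```python
from itertools import combinations, product


def association_graph(nodes1, edges1, nodes2, edges2):
    """Vertices are candidate pairs (u, v); edges join compatible pairs."""
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    pairs = list(product(nodes1, nodes2))
    adjacent = {p: set() for p in pairs}
    for (u1, v1), (u2, v2) in combinations(pairs, 2):
        if u1 == u2 or v1 == v2:
            continue  # a one-to-one mapping cannot reuse a node
        # Compatible when adjacency in the first graph mirrors the second.
        if (frozenset((u1, u2)) in e1) == (frozenset((v1, v2)) in e2):
            adjacent[(u1, v1)].add((u2, v2))
            adjacent[(u2, v2)].add((u1, v1))
    return adjacent


def maximum_clique(adjacent):
    """Exhaustive search for a largest clique; for tiny inputs only."""
    best = []

    def extend(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for i, p in enumerate(candidates):
            # Keep later candidates adjacent to p, so each clique is built once.
            rest = [q for q in candidates[i + 1:] if q in adjacent[p]]
            extend(clique + [p], rest)

    extend([], list(adjacent))
    return best


if __name__ == "__main__":
    # Two hypothetical three-node paths: a-b-c and 1-2-3.
    graph = association_graph(["a", "b", "c"], [("a", "b"), ("b", "c")],
                              [1, 2, 3], [(1, 2), (2, 3)])
    print(maximum_clique(graph))
```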

Shapiro and Haralick [91] proposed a framework to find a common subgraph isomor-

phism between two attributed graphs. The algorithm was based on comparing weighted

primitives (weighted attributes and weighted relation tuples) using a normalized distance

for each primitive property that was inexactly matched. The drawback of this method was

its computational cost, which was exponential in the number of graph nodes. To improve

the complexity of the framework, Grimson et al. [42] used a heuristic search technique that


terminated the search as soon as a solution meeting some minimum requirement was found

(near-optimal).

Gold and Rangarajan [37] developed a graduated assignment algorithm for one-to-one

graph matching by combining graduated non-convexity (deterministic annealing), two-way

(assignment) constraints, and sparsity. The technique is based on efficiently finding solu-

tions for optimization problems that use a match matrix denoting an assignment (corre-

spondence) between graph nodes.

A number of approaches have been developed for solving the stereo correspondence
problem in vision. One of the most important works on this problem was proposed by Boyer and Kak [13].
The authors first extract structural descriptions of two two-dimensional scenes through a
low-level process. These descriptions are derived from the radial-valued skeleton of binary
images and are used in a stereo matching procedure formulated as a consistent labeling problem [45].
The number of features in such a matching is shown to be much smaller than in more traditional
feature-based frameworks.

Another class of graph matching techniques, known as spectral methods, represents

structural properties of graphs using eigenvalues and eigenvectors of graph adjacency
matrices. One of the most important advantages of these techniques is that they are
less computationally expensive than general combinatorial search procedures.

One work on spectral abstraction of hierarchical graph structures was proposed by Siddiqi

et al. [99]. In this work, the authors combined a bipartite matching framework with a spec-

tral decomposition of graph structure to match shock graphs. Shocks are organized into

a directed acyclic shock graph, which they characterize by a shock grammar defining the

process of reducing the shock graph into a rooted shock tree. The method then matches


Figure 2.2: One-to-one feature correspondences computed by Siddiqi et al. [99]

shock trees to locate the best set of corresponding nodes in polynomial time. Figure 2.2

shows one-to-one feature correspondences computed by their matching algorithm.

Shokoufandeh et al. [95] extended this framework to directed acyclic graphs that arise

in multi-scale image representations. The algorithm computes both topological and
geometric similarity as well as node correspondence between two given graphs. The
similarity and node correspondence were formulated as a function of structural,
contextual, and node context similarities. Although the algorithm can be used to find explicit

one-to-one node correspondences further down the hierarchies, one-to-one node correspon-

dences at higher levels effectively define a many-to-many matching between their underly-

ing nodes. Belongie et al. [9] used a similar idea to encode the qualitative shape occupancy

characteristics of a neighborhood surrounding a point. In a bipartite matching framework,

correspondences were formed between points with similar shape contexts, despite the fact

that the neighbors could have differing numbers of points.


The problem of many-to-many graph matching has been addressed most often in the

context of edit-distance. The idea of edit-distance was originally introduced for graph

matching by Sanfeliu and Fu [84]. The authors defined a distance between attributed rela-

tional graphs based on a descriptive graph grammar. Messmer and Bunke [67] presented

a general error-tolerant matching algorithm for finding subgraph isomorphisms between

given pairs of graphs. Their goal was to modify the input graphs during the matching pro-

cess using edit operations. Liu and Geiger [62] proposed a framework for matching trees

on a many-to-many basis. The algorithm first represented each shape contour (silhouette)

as a tree structure derived from a shape axis model. An edit-distance based tree matching

schema was then used to find the best approximate match and a matching cost.

Myers, Wilson, and Hancock [69] used the edit-distance to model the probability dis-

tributions for structural errors in the graph-matching problem. The probability distribution

was used to locate matchings between graph nodes. Sebastian et al. [86] matched shock

graphs of 2D shapes by first representing each shape as a point in the shape space and defin-

ing the distance between shapes as the minimum cost of deformation path connecting one

shape to another. They present an efficient graph-edit distance algorithm for finding glob-

ally optimal paths between shapes. Sample matching results between two pairs of objects,

computed by their framework [86], are shown in Figure 2.3.

Despite the fact that many edit-distance based matching frameworks exist, they all share

the same objective: finding the minimal set of re-labelings, additions, deletions, merges,

and splits of nodes and edges that transform one graph into another. As a result, applying a

sequence of edit operations with minimum total cost will make two input graphs isomorphic

with one another. Although powerful, the edit-distance approach has its drawbacks: 1) it is


Figure 2.3: Matching results between two pairs of objects computed by Sebastian et al. [86]

computationally expensive (polynomial-time algorithms are available only for trees); 2) the

method, in its current form, does not accommodate edge weights (most approaches used in

this context are heuristic in nature); 3) the method does not deal well with occlusion and

scene clutter, resulting in much effort spent in “editing out” extraneous graph structure; and

4) the cost of an editing operation often fails to reflect the underlying visual information

(for example, the visual similarity of a contour and its corresponding broken fragments

should not be penalized by the high cost of merging the fragments).

In the context of line and segment matching, Beveridge and Riseman [10] proposed

a framework to find the optimal many-to-many correspondence mapping between a line

segment model and image line segments through exhaustive local search. Performance

of the local search was presented in the presence of increasing model complexity, image

clutter, and additional model instances. Although their method found good matches reliably

and efficiently (due to their choice of the objective function and a small neighborhood

size), it is unclear how the approach can be generalized to other types of feature graphs and


objective functions.

Scott and Longuet-Higgins [85] presented an algorithm that maximized the inner prod-

uct of two matrices, pairing matrix and proximity matrix, to find feature correspondences.

Elements of the proximity matrix described Gaussian weighted distances between pairs

of features. They showed that the eigenvectors of this matrix could be used to determine

correspondences between two given feature sets.

Kosinov and Caelli [58] used a similar approach, showing how inexact graph matching

could be solved using the renormalization of projections of vertices into the eigenspaces

of graphs combined with a form of relational clustering. In this framework, the authors’

goal was to formulate graph matching as clustering, which groups common local relational

structures between different graphs. Our framework differs from their approach in that (1)

it can handle information encoded in a graph’s nodes, which is desirable in many vision ap-

plications; (2) it does not require an explicit clustering step; (3) it provides a well-bounded,

low-distortion metric representation of graph structure; (4) it encodes both local and global

structure, allowing it to deal with noise and occlusion; and (5) it can accommodate multi-

scale representations.

2.3 Embedding Techniques

Low-distortion embedding techniques have received much attention in theoretical com-

puter science and have proven to be useful in a number of graph algorithms, including

clustering and, most recently, on-line algorithms. Indyk [47] provides a comprehensive

survey of recent advances and applications of low-distortion graph embedding. The main

applications can be grouped into the following classes.


General metrics into low-dimensional normed spaces: The goal here is to obtain a

low-dimensional representation of the original metric. This embedding technique enables

us to represent data points in the original metric with fewer bits. One of the applications of

this approach was given by Linial, London, and Rabinovich [60]. The authors introduced

metric embedding to obtain an approximation algorithm for the sparsest cut problem in

which the objective was to maximize the weighted number of pairs while minimizing the

cost of the cut. Using the notion of the embedding, an $O(\log k)$-approximation algorithm

was obtained.

General metrics into tree metrics: This type of embedding enables us to embed finite

metrics into tree metrics instead of normed spaces. Applications of this approach include

both on-line and off-line algorithms. In a typical on-line algorithm, the objective is to

perform a set of requests without knowing future requests. Bartal et al. [8] presented a

randomized on-line algorithm for the Metrical Task System problem that is
$O(\log^2 n)$-competitive. Several researchers have also used this embedding

technique to find approximation algorithms for NP-hard problems. The studies have been

motivated by the fact that many problems that are NP-hard for general metrics have polyno-

mial time solutions for trees. As a result, this embedding approach enables the development

of good approximation algorithms.

Tree metrics into low-dimensional normed spaces: Instead of working with gen-

eral metrics, some embedding algorithms take tree metrics and embed them into low-

dimensional normed spaces. While many of these methods do not allow the dimension

of the target space to be specified, some of them embed tree metrics into normed spaces

with prescribed dimensionality. We will study two such embedding techniques, one in each


group, in Chapter 3. For recent approaches related to tree-metric embedding, see [43, 61,

66].

Specific metrics into normed spaces: Embedding specific metrics, such as Hausdorff

or edit-distance, into normed spaces allows us to use some well-known algorithms (cluster-

ing, for example) in the normed space to solve the problems in the original metric. Indyk

and Thaper [48] developed an embedding procedure in support of image retrieval. Image

feature distributions, such as color histograms, do not provide a convenient mechanism for

indexing into large image databases. In a two-step procedure, they first embed the feature

distribution in a vector and then use the Locality Sensitive Hashing (LSH) algorithm of

Gionis et al. [35] to retrieve nearby candidates. The embedding method was designed so

that the distance between two such embeddings mimics the Earth Mover's Distance (EMD)

between their respective feature distributions. Grauman and Darrell [41] applied the frame-

work to match 2D contours as shape context-like distributions. Our earlier work that com-

bines low-distortion embedding and EMD was reported in [28], [29], and [56].

Low-distortion embedding continues to be a focal point in the theoretical computer

science community. Recent results related to properties of low-distortion embedding may

be found in [3, 61, 66].

2.4 Dimensionality Reduction Techniques

The goal of dimensionality reduction techniques is to map a set of points in a high-

dimensional space to a lower-dimensional space, with the aim of preserving important fea-

tures of the pointset (pairwise distances between data points, for example). Alternatively,

the goal can be defined as finding meaningful low-dimensional structures hidden in the


high dimensional data.

Dimensionality reduction techniques are useful tools especially for designing efficient

algorithms. One of the most important reasons for this is that the running
times of many algorithms grow with the dimensionality of the space. Thus,
reducing the dimensionality of the space helps improve the running times. These techniques

are also used in clustering problems, where the objective is to find a set of representative

(or canonical) points minimizing a certain function defined on the input set, such as find-

ing the k-mean or k-center. The main idea of using a dimensionality reduction technique

for clustering is motivated by the following: Data points that are close to each other (and

thus, should be grouped together) in $d$-dimensional space become closer in dimension
$d - 1$. This, in turn, makes it easier to solve the clustering problem in lower dimensions.

Dimensionality reduction methods are also used to estimate the number of clusters.

Other application areas of dimensionality reduction techniques include visualization,

image processing, data compression, pattern recognition, data analysis, some biological

and physical sciences, and data mining. To give an example, let us assume that given a

pair of $256 \times 256$ pixel images, our goal is to compute the similarity between them. It is
clear that each image corresponds to a point in a 65,536-dimensional space, as depicted in

Figure 2.4. Assume that we are also given some similarity measure in this space. To gain

accuracy and to speed up computation time, one needs only to extract relevant information

and discard unnecessary details, which correspond to dimensions that do not provide any

useful information. The process of reducing such unnecessary details and selecting useful

features from a high-dimensional data set can be done through a dimensionality reduction

process.
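The correspondence between an image and a point in a high-dimensional space can be shown with a short Python sketch. It flattens a pixel grid row by row into a single vector; the row-major ordering, the helper name `flatten`, and the blank test image are assumptions made for illustration only.

```python
def flatten(image):
    """Concatenate the rows of a 2D pixel grid into one feature vector."""
    return [pixel for row in image for pixel in row]


# Hypothetical blank 256 x 256 grey-level image.
image = [[0] * 256 for _ in range(256)]
vector = flatten(image)

if __name__ == "__main__":
    print(len(vector))  # one coordinate per pixel
```

A dimensionality reduction method would then map such vectors to far fewer coordinates while trying to preserve the chosen similarity measure.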


Figure 2.4: Representing a $256 \times 256$ pixel image as a point in a 65,536-dimensional space. Each pixel shown by a square in (a) corresponds to an entry in the 65,536-size vector in (b).

Many popular methods exist for dimensionality reduction of data distributions. Two

well-known techniques in this family include principal components analysis (PCA) [53]

and multidimensional scaling (MDS) [24]. A standard multidimensional scaling technique

takes an $(n \times n)$ matrix, operates by means of eigenvector analysis, and produces a layout

based on a linear combination of dimensions. On the other hand, a typical PCA-based

dimensionality reduction technique is based on a linear projection that maximizes the vari-

ance in the projected space. In other words, the goal of a PCA-based algorithm is to find

a linear lower-dimensional representation of the data such that the variance of the recon-

structed data is preserved. We will study one PCA-based dimensionality reduction ap-

proach in Section 3.3.3.

PCA and MDS-based dimensionality reduction techniques are linear in nature, and are

widely used for classification and learning (via clustering) purposes. The linearity of these


methods is, in fact, considered to be one of their shortcomings. The high dimensional

representations of many experimental problems have more compact descriptions in terms

of lower dimensional manifolds. This makes the straight-line measurement of distances in

source space restrictive and will directly affect the results of classification and quality of

the resulting learning algorithms.

To address these shortcomings, Tenenbaum et al. [102] introduced the notion of ISOMAP

as a more sophisticated variation of multidimensional scaling. Here, the distances are mea-

sured based on geodesic shortest-paths along manifolds (or their approximations) of the

input data. To avoid global pairwise distance computation, Roweis and Saul [81] pro-

posed an eigenvalue-based method known as locally linear embedding (LLE). The method

characterizes each point in the data set by its local representation in terms of its neighbor-

hood patches that capture the local geometry of the manifold. The LLE then constructs a

neighborhood-preserving mapping based on the invariance properties of these local neigh-

borhoods. In the final step of the algorithm, each high-dimensional data point in the metric

space is mapped to a low-dimensional vector representing global internal coordinates on

the manifold.

2.5 Conclusions

Graphs have proven to be useful for object representations. When objects are repre-

sented as graphs, the problem of object recognition is reformulated as that of graph match-

ing. A number of approaches have been developed for matching pairs of graphs. Most of

these approaches, however, have focused on finding one-to-one feature (node) correspon-

dences. Due to limitations of these approaches mentioned above, they cannot be used in


more realistic cases, where a cluster of features of one graph correspond to a cluster of

features of another.

The problem of many-to-many graph matching has also been studied mostly in the

context of edit-distance. The general idea behind edit-distance is to find a minimal set of

re-labelings, additions, deletions, merges, and splits of nodes and edges that transform one

graph into another. Although the method has important potential for matching features

on a many-to-many basis, it suffers from a number of drawbacks, such as computational

complexity and inability to handle underlying visual information, while providing the cor-

respondences.

The development of an efficient and reliable many-to-many matching framework that
is also stable with respect to noise is still an open issue. Our goal in this thesis is to
develop a framework for establishing many-to-many feature correspondences between pairs

of attributed graphs.

We have also seen that graph embedding techniques have proven to be useful in a num-

ber of graph algorithms, such as clustering, online algorithms, etc. Broadly speaking, they

reduce problems defined over “difficult” metric spaces, to problems over “easier” normed

spaces. In our framework, we will use graph embedding techniques to reduce the
many-to-many feature matching problem to that of many-to-many point matching, for which a

number of existing matching approaches are available.


3. Metric Embedding of Graphs

In this chapter we introduce the concept of graph embedding, review some notation

and definitions that will be useful in the rest of this thesis, and present two low-distortion

embedding algorithms. Distortion is defined as the maximum factor by which any distance

between any two vertices is changed by the embedding algorithm. The formal definition

of distortion is given below. Both embedding algorithms begin by transforming a graph

into a metric tree that is then embedded into a normed vector space. In the first embedding

technique, which is inspired by the general framework proposed by Matousek [66], each

graph is embedded into a vector space whose dimensionality is graph-dependent. Thus, be-

fore the embeddings can be matched, a dimensionality reduction technique (such as PCA)

is required. Since high dimensional data cannot be represented exactly in lower dimen-

sions, dimensionality reduction techniques introduce error. We overcome this problem by

introducing the second embedding technique, which is the deterministic variation of the

spherical coding algorithm [43]. This novel procedure embeds metric trees into normed

vector spaces of prescribed dimensionality, precluding the need for dimensionality reduc-

tion techniques.

3.1 Introduction

The difficulty with establishing many-to-many node correspondences is due to the fact

that any subgraph of one graph can be assigned to any subgraph of another, which makes

the problem intractable. Our interest in low-distortion graph embedding is motivated by its


ability to transform graphs to an alternative space in which establishing many-to-many cor-

respondences between embedded graph nodes is computationally tractable. To ensure that

the solution of the many-to-many point matching problem in the embedded space reflects

a meaningful solution to the many-to-many graph matching problem in the original graph

space, the geometric structure of the points must somehow reflect the topological structure

of the graph.

During the last decade, low-distortion embedding has become recognized as a very

powerful tool for designing efficient algorithms. In low-distortion embeddings of metric

spaces into normed spaces, we consider mappings $f : V \to \mathcal{B}$, where $V$ is a set of points in the original metric space, with distance function $\mathcal{D}(\cdot,\cdot)$, $\mathcal{B}$ is a set of points in the $d$-dimensional normed space $\|\cdot\|_k$, and for any pair $p, q \in V$ we have

$$\frac{1}{c}\,\mathcal{D}(p, q) \le \|f(p) - f(q)\|_k \le \mathcal{D}(p, q) \qquad (3.1)$$

where $c$ is known as the distortion. Intuitively, such an embedding will enable us to reduce problems defined over difficult metric spaces, $(V, \mathcal{D})$, to problems over easier normed spaces, $(\mathcal{B}, \|\cdot\|_k)$. Clearly, the closer $c$ is to 1, the better the target set $\mathcal{B}$ mimics the original set $V$. Consequently, the distortion parameter $c$ is a critical characteristic of the embedding $f$.
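The distortion of a concrete embedding can be measured directly from the pairwise distances. The following is a minimal sketch, not part of the original text: it assumes a Euclidean target space, distinct points, and a hypothetical helper name `distortion`. It returns the product of the largest expansion and the largest contraction, which is the smallest $c$ for which the two-sided bound (3.1) holds after a global rescaling of the embedded points.

```python
import math
from itertools import combinations

def distortion(dist, points):
    """Smallest c such that, after a global rescaling of the points,
    (1/c) * D(p, q) <= ||f(p) - f(q)||_2 <= D(p, q) for every pair."""
    max_expansion = 0.0    # largest ratio embedded / original
    max_contraction = 0.0  # largest ratio original / embedded
    for p, q in combinations(range(len(points)), 2):
        original = dist[p][q]                        # assumed > 0
        embedded = math.dist(points[p], points[q])   # assumed > 0
        max_expansion = max(max_expansion, embedded / original)
        max_contraction = max(max_contraction, original / embedded)
    return max_expansion * max_contraction

# Shortest-path metric of a star with three unit edges (vertex 0 is the centre).
star = [[0, 1, 1, 1],
        [1, 0, 2, 2],
        [1, 2, 0, 2],
        [1, 2, 2, 0]]
# Hypothetical embedding: centre at the origin, leaves on the coordinate axes.
axes = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
c = distortion(star, axes)
```

In this toy case the centre-to-leaf distances are preserved while the leaf-to-leaf distances shrink from 2 to $\sqrt{2}$, so the measured distortion is $\sqrt{2}$.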

The most fundamental existence result in computational embedding is due to Bour-

gain [11].

Lemma 1. Any finite metric space $(V, \mathcal{D})$ can be embedded into a finite normed space $\|\cdot\|_2$ of dimension at most $\log |V|$ with distortion $O(\log |V|)$.

Matousek [65] further extended this lemma for embedding finite metrics into $\ell_\infty^d$.

Lemma 2. For any positive integer $q$, any finite metric space $(V, \mathcal{D})$ can be embedded into $\ell_\infty^d$ with distortion $2q - 1$, where $d = O(q\,n^{1/q} \log n)$ and $n = |V|$.

These results are important since even an exponential matching algorithm, in terms of the number of dimensions of the target space, may be tractable. However, $O(\log |V|)$ is too large a distortion, and we seek an embedding with a much lower distortion.

The above definition of a low-distortion embedding maps a set of points in the original

metric space to a set of points in the target space. Since in our framework the original space

is based on graph representations, we must choose a suitable metric for our graphs, i.e., we

must define a nonnegative function describing the distance between any two vertices.

Given a graph $G = (V, E)$ and any three vertices $u, v, w \in V$, a metric $\mathcal{D}$ for the graph satisfies the following properties:

1. $\mathcal{D}(u, v) = \mathcal{D}(v, u) \ge 0$

2. $\mathcal{D}(u, u) = 0$

3. $\mathcal{D}(u, v) \le \mathcal{D}(u, w) + \mathcal{D}(w, v)$

In general, there are many ways to define metric distances on a weighted graph. The best-known metric is the shortest-path metric $\delta(\cdot,\cdot)$, i.e., $\mathcal{D}(u, v) = \delta(u, v)$, the shortest path distance between $u$ and $v$ for all $u, v \in V$.
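As an illustration of the shortest-path metric, the sketch below computes all-pairs distances with the Floyd-Warshall algorithm and checks the three metric properties. The function names and the small example graph are assumptions made for this illustration; the text does not prescribe a particular shortest-path algorithm.

```python
import math
from itertools import product

def shortest_path_metric(n, weighted_edges):
    """All-pairs shortest-path distances (Floyd-Warshall), undirected graph."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in weighted_edges:
        d[u][v] = d[v][u] = min(d[u][v], w)
    # product() varies its first component slowest, so k is the outer loop.
    for k, i, j in product(range(n), repeat=3):
        if d[i][k] + d[k][j] < d[i][j]:
            d[i][j] = d[i][k] + d[k][j]
    return d

def is_metric(d, tol=1e-12):
    """Check symmetry/non-negativity, zero diagonal and triangle inequality."""
    n = len(d)
    for u, v in product(range(n), repeat=2):
        if d[u][v] < 0 or abs(d[u][v] - d[v][u]) > tol:   # property 1
            return False
    if any(d[u][u] != 0 for u in range(n)):                # property 2
        return False
    return all(d[u][v] <= d[u][w] + d[w][v] + tol          # property 3
               for u, v, w in product(range(n), repeat=3))

# Hypothetical 4-vertex graph given as (u, v, weight) triples.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.5), (3, 0, 1.0), (0, 2, 4.0)]
D = shortest_path_metric(4, edges)
```

Here the direct edge between vertices 0 and 2 has weight 4.0, but the route through vertex 3 is shorter, so the metric records the length of that route instead.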

The problem of low-distortion embedding has a long history for the case of planar

graphs, in general, and trees, in particular. The following conjecture posits the existence of an $O(1)$-distortion embedding of planar graphs.

Conjecture 1. [44] Let $G = (V, E)$ be a planar graph, and let $M = (V, \mathcal{D})$ be the shortest-path metric for the graph $G$. Then there is an embedding of $M$ into $\|\cdot\|_1$ with $O(1)$ distortion.

This conjecture has only been proven for the case in which G is a tree. Although the

existence of such a distortion-free embedding under $\|\cdot\|_k$-norms was established in [60], no deterministic construction was provided. Several researchers have also studied the possibility of embedding a tree into the $\|\cdot\|_2$ norm with $O(1)$ distortion. Bourgain [12] showed that a complete binary tree cannot be embedded into $\|\cdot\|_2$ with less than $O(\sqrt{\log\log n})$ distortion. Matousek [66] then showed that Bourgain's bound is tight for all trees. More generally, he proved that any tree can be embedded into $\|\cdot\|_d$ with $O\big((\log\log n)^{\min\{1/2,\,1/d\}}\big)$ distortion. One deterministic algorithm to embed a tree into a vector space is given by Matousek [66].

His framework suggests that if we can somehow map our graphs into trees, with small dis-

tortion, we can then embed the resulting trees into a vector space. In the following section

our goal is to compute tree metrics from graph representations.

3.2 Construction of a Tree Metric from a Distance Matrix

(Numerical Taxonomy Problem)

Let $G = (V, E)$ denote an edge-weighted graph and $\mathcal{D}$ denote a shortest-path metric for $G$, i.e., $\mathcal{D}(u, v) = \delta(u, v)$, for all $u, v \in V$. The problem of approximating (or fitting) an $n \times n$ distance matrix $\mathcal{D}$ by a tree metric $\mathcal{T}$ is known as the Numerical Taxonomy problem.

In many fields such as paleontology and evolutionary biology, approximating a distance

matrix by a tree metric plays an important role. Recall that a tree metric $\mathcal{T}$ is a metric
induced by an edge-weighted tree on its vertex set, where the distance between any pair of

vertices u and v is the length of the unique path between them.

The numerical taxonomy problem has received significant attention over many years

with work going as far back as the beginning of the 20th century [6]. Waterman et al. [105] showed that if there is a tree metric $\mathcal{T}$ coinciding exactly with distance matrix $\mathcal{D}$, then it is unique and can be constructed in linear time. Day [27] showed that for $L_1$ and $L_2$, the numerical taxonomy problem is NP-hard. Since the numerical taxonomy problem is an open problem for general distance metrics, we must explore approximation methods. The numerical taxonomy problem can be approximated by converting the distance matrix $\mathcal{D}$ to

the weaker ultra-metric distance matrix.

An ultra-metric is a special type of tree metric defined on rooted trees, where the distance to the root is the same for all leaves in the tree, an approximation that introduces small distortion. A metric $\mathcal{D}$ is an ultra-metric if, for all points $x, y, z$, we have

$$\mathcal{D}[x, y] \le \max\{\mathcal{D}[x, z],\ \mathcal{D}[y, z]\}.$$

An ultra-metric can also be represented by a weighted tree such that $\mathcal{D}[x, y]$ is the maximum edge weight on the path between points $x$ and $y$. Unfortunately, an ultra-metric does not satisfy all the properties of a tree metric distance. To create a general tree metric from an ultra-metric, we need to satisfy the 4-point condition (see [14]), defined as

$$\mathcal{D}[x, y] + \mathcal{D}[z, w] \le \max\{\mathcal{D}[x, z] + \mathcal{D}[y, w],\ \mathcal{D}[x, w] + \mathcal{D}[y, z]\}$$

for all $x, y, z, w$. A metric that satisfies the 4-point condition is called an additive metric, and a metric $\mathcal{D}$ is additive if and only if it is a tree metric (see [14]).
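Both conditions can be checked by brute force on small distance matrices. The sketch below is illustrative only; the function names and the two example matrices are assumptions, and the exhaustive loops are meant for small inputs.

```python
from itertools import product

def is_ultrametric(d, tol=1e-12):
    """D[x,y] <= max(D[x,z], D[y,z]) for all x, y, z."""
    idx = range(len(d))
    return all(d[x][y] <= max(d[x][z], d[y][z]) + tol
               for x, y, z in product(idx, repeat=3))

def is_additive(d, tol=1e-12):
    """4-point condition:
    D[x,y] + D[z,w] <= max(D[x,z] + D[y,w], D[x,w] + D[y,z])."""
    idx = range(len(d))
    return all(d[x][y] + d[z][w]
               <= max(d[x][z] + d[y][w], d[x][w] + d[y][z]) + tol
               for x, y, z, w in product(idx, repeat=4))

# Path with three unit edges: a tree metric, but not an ultra-metric.
path = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
# 4-cycle with unit edges: a shortest-path metric that no tree realises.
cycle = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
```

The path metric passes the 4-point test and fails the ultra-metric test, while the cycle metric fails the 4-point test, which is why a general graph metric has to be approximated before it can be treated as a tree metric.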

Our construction of a tree metric consists of: 1) constructing an ultra-metric from $\mathcal{D}$, and 2) modifying the ultra-metric to satisfy the 4-point condition. One such approximation framework, called the centroid metric tree $\mathcal{T}$, has been given by Agarwala et al. [3]. The construction of a tree metric in their algorithm is achieved by transforming the general tree metric problem to that of ultra-metrics. Given a graph $G = (V, E)$ and a metric $\mathcal{D}$ defined over $G$, the construction of an ultra-metric starts by computing the minimum spanning tree $\mathcal{T}_{mst}$ of $G$. Let $e = (u, v)$ be the maximum-weight edge of $\mathcal{T}_{mst}$. Clearly, removing $e$ from the tree $\mathcal{T}_{mst}$ results in two distinct subtrees $\mathcal{T}_1$ and $\mathcal{T}_2$. The ultra-metric $U$ has root at height $\mathcal{D}[u, v]/2$ and the subtrees of the root are the ultra-metric trees $U_1$ and $U_2$ recursively defined on $\mathcal{T}_1$ and $\mathcal{T}_2$, respectively.
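The recursive split on the heaviest spanning-tree edge can be sketched as follows. This illustrates only the first of the two steps (the ultra-metric) and is not the full algorithm of [3]; the function names, the use of Prim's algorithm, and the example matrix are assumptions. Two leaves separated by a root at height $\mathcal{D}[u, v]/2$ are at tree distance $\mathcal{D}[u, v]$, which is what the sketch records.

```python
def minimum_spanning_tree(d):
    """Prim's algorithm on the complete graph given by distance matrix d."""
    n = len(d)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        w, u, v = min((d[u][v], u, v)
                      for u in in_tree for v in range(n) if v not in in_tree)
        in_tree.add(v)
        edges.append((u, v, w))
    return edges

def ultrametric_from_mst(vertices, edges, u_dist):
    """Split on the heaviest edge, recurse on both sides, fill u_dist.
    Returns the height of the root of the ultra-metric tree."""
    if not edges:
        return 0.0                      # a single vertex is a leaf at height 0
    u, v, w = max(edges, key=lambda e: e[2])
    rest = [e for e in edges if e != (u, v, w)]
    side = {u}                          # component containing u once e is removed
    changed = True
    while changed:
        changed = False
        for a, b, _ in rest:
            if (a in side) != (b in side):
                side.update((a, b))
                changed = True
    other = set(vertices) - side
    for x in side:
        for y in other:
            u_dist[x][y] = u_dist[y][x] = w   # leaves meet at a root of height w/2
    ultrametric_from_mst(side, [e for e in rest if e[0] in side], u_dist)
    ultrametric_from_mst(other, [e for e in rest if e[0] in other], u_dist)
    return w / 2

# Hypothetical 4-point distance matrix.
D = [[0, 2, 5, 6],
     [2, 0, 4, 7],
     [5, 4, 0, 3],
     [6, 7, 3, 0]]
mst = minimum_spanning_tree(D)
U = [[0] * len(D) for _ in D]
root_height = ultrametric_from_mst(set(range(len(D))), mst, U)
```

In this example the heaviest spanning-tree edge has weight 4, so the root sits at height 2 and every pair of points on opposite sides of that edge is assigned distance 4.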

The algorithm presented by Agarwala et al. [3] follows the two-step procedure outlined

above, and generates an approximate tree metric $\mathcal{T}$ to an optimal additive metric in time $O(n^2)$. It should be noted that this construction does not necessarily maintain the vertex set

of G invariant. The embedding process may add extra vertices generated during the metric

tree construction that must be removed prior to matching.

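More specifically, let $\mathcal{D}$ be an $n \times n$ distance matrix and $\mathcal{T}$ be a tree that approximates $\mathcal{D}$. The algorithm presented in [3] finds an additive tree $\mathcal{T}$ such that $\|\mathcal{T} - \mathcal{D}\|_\infty \le 3\varepsilon$, where $\varepsilon$ is the distance from $\mathcal{D}$ to the closest tree metric under the $L_\infty$ norm. Moreover, the authors showed that it is NP-hard to find a tree $\mathcal{T}$ such that $\|\mathcal{T} - \mathcal{D}\|_\infty < \frac{9}{8}\varepsilon$. The results were then generalized to other norms.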

An example of constructing a metric tree from a graph is shown in Figure 3.1, in which a hierarchical blob decomposition of an image, shown in (a), yields a graph whose edge weights reflect the Euclidean distances between the nodes (centroids of their corresponding regions), shown in (b). The metric tree representation of the graph is shown in (c); note the additional vertices (white) introduced by the construction, which will be later removed.

Figure 3.1: Metric tree representation of the Euclidean distances between nodes in a graph. The gesture image (a) consists of 6 regions (the region representing the entire hand is not shown). The complete graph in (b) captures the Euclidean distances between the centroids of the regions, while (c) is the metric tree representation of the multi-scale decomposition (with additional vertices).

3.3 Embedding into Graph-Dependent Dimensionality

Given a metric tree approximation of our original graph, we can now proceed with

the first embedding algorithm, which is inspired by the general framework proposed by Matousek [66]. The algorithm maps the nodes in the metric tree to points in some low-dimensional Euclidean space. The dimension of the Euclidean space is graph-dependent.

The construction of the embedding depends on the notion of a path partition of a graph. In

the following subsection, we introduce the concept of path partition and then later on we

use it to construct the embedding.

3.3.1 Path Partition of a Graph

The process for the embedding of a particular node is based upon the path from the

tree’s root to that particular node. Parts of that path will be unique to that node, while

other parts will be shared by paths to other nodes. A partitioning of these paths, called

a caterpillar decomposition, yields a set of “basis” paths defining the dimensionality of

the vector embedding. The path from the root to any node will traverse some weighted

combination of these basis paths, yielding the components of the vector, with the weights

reflecting how much of the basis path is traversed.

Specifically, given a weighted graph $G = (V, E)$ with metric distance $\mathcal{D}(\cdot,\cdot)$, let $\mathcal{T} = (V, \mathcal{E})$ denote a tree representation of $G$, whose vertex distances are consistent with $\mathcal{D}(\cdot,\cdot)$. In the event that $G$ is a tree, $\mathcal{T} = G$; otherwise $\mathcal{T}$ is the centroid metric tree of $G$. To construct the embedding, we will assume that $\mathcal{T}$ is a rooted tree. It will be clear from the construction that the choice of the root does not affect distortion of the embedding.

The dimensionality of the embedding of $\mathcal{T}$ depends on the caterpillar dimension [66], denoted by $\mathrm{cdim}(\mathcal{T})$, and is recursively defined as follows. If $\mathcal{T}$ consists of a single vertex, we set $\mathrm{cdim}(\mathcal{T}) = 0$. For a tree $\mathcal{T}$ with at least 2 vertices, $\mathrm{cdim}(\mathcal{T}) \le k + 1$ if there exist paths $P_1, \ldots, P_r$ beginning at the root and otherwise pairwise disjoint, such that each component $\mathcal{T}_j$ of $\mathcal{T} \setminus \mathcal{E}(P_1) \setminus \mathcal{E}(P_2) \setminus \cdots \setminus \mathcal{E}(P_r)$ satisfies $\mathrm{cdim}(\mathcal{T}_j) \le k$. Here $\mathcal{T} \setminus \mathcal{E}(P_1) \setminus \mathcal{E}(P_2) \setminus \cdots \setminus \mathcal{E}(P_r)$ denotes the tree $\mathcal{T}$ with the edges of the $P_i$'s removed, and the components $\mathcal{T}_j$ are rooted at the single vertex lying on some $P_i$. The caterpillar dimension can be determined in linear time for a rooted tree $\mathcal{T}$, and it is known that $\mathrm{cdim}(\mathcal{T}) \le \log(|V|)$ (see Lemma 3).

The construction of vectors $f(v)$, for $v \in V$, depends on the notion of a path partition of $\mathcal{T}$. The path partition $\mathcal{P}$ of $\mathcal{T}$ is empty if $\mathcal{T}$ is a single vertex; otherwise $\mathcal{P}$ consists of a set of paths $P_1, \ldots, P_r$ as in the definition of $\mathrm{cdim}(\mathcal{T})$, plus the union of path partitions of the components of $\mathcal{T} \setminus \mathcal{E}(P_1) \setminus \mathcal{E}(P_2) \setminus \cdots \setminus \mathcal{E}(P_r)$. The paths $P_1, \ldots, P_r$ have level 1, and the paths of level $k \ge 2$ are the paths of level $k - 1$ in the corresponding path partitions of the components of $\mathcal{T} \setminus \mathcal{E}(P_1) \setminus \mathcal{E}(P_2) \setminus \cdots \setminus \mathcal{E}(P_r)$. Note that the paths in a path partition are edge-disjoint and their union covers the edge-set of $\mathcal{T}$.

To illustrate these concepts, consider the tree shown in Figure 3.2. The three darkened

paths from the root represent three level 1 paths. Following the removal of the level 1 paths,

we are left with 6 connected components that, in turn, induce seven level 2 paths, shown

with lightened edges.¹ Following the removal of the seven level 2 paths, we are left with an empty graph. Hence, the caterpillar dimension, $\mathrm{cdim}(\mathcal{T})$, is 2.
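The recursive definition of the caterpillar dimension can be turned into a short bottom-up computation. The sketch below is one reading of that definition and is not taken from the text: it assumes the tree is given as a hypothetical `children` dictionary, and it relies on the observation that an optimal path through a vertex continues into the child whose own path is most expensive, leaving the remaining children to hang off that vertex as a single component.

```python
def caterpillar_dimension(children, root):
    """cdim of the tree rooted at `root`; `children` maps vertex -> child list."""
    def path_cost(v):
        # Smallest k such that a downward path starting at v leaves only
        # hanging components of caterpillar dimension <= k.
        costs = sorted((path_cost(c) for c in children.get(v, [])), reverse=True)
        if not costs:
            return 0                    # v is a leaf: nothing hangs off the path
        if len(costs) == 1:
            return costs[0]             # the path simply continues downwards
        # Follow the most expensive child; the others form one component at v.
        return max(costs[0], 1 + costs[1])

    kids = children.get(root, [])
    if not kids:
        return 0                        # a single vertex has dimension 0
    return 1 + max(path_cost(c) for c in kids)

# Complete binary tree of height 3 (15 vertices), stored heap-style.
binary = {i: [2 * i + 1, 2 * i + 2] for i in range(7)}
# A small tree with one branching vertex below the root.
small = {"a": ["b", "d"], "b": ["c"], "d": ["e", "f"]}
```

For the complete binary tree with 15 vertices the sketch yields 3, which is consistent with the logarithmic bound referred to above.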

Lemma 3. Given a rooted tree $\mathcal{T}$, $\mathrm{cdim}(\mathcal{T}) \le \log(|V|)$.

Proof. It is known that among all trees, the complete binary tree $B_n$ has the largest caterpillar dimension [66]. Thus, it is sufficient to show that $\mathrm{cdim}(B_n) \le \log(|V|)$. Let us remove a root-leaf path $P_1$ from $B_n$. Note that $P_1$ is a level one path in the caterpillar decomposition. After the removal of $P_1$, we will have at most $\log |V|$ subtrees. When we recursively construct the caterpillar decomposition for each subtree, the longest root-leaf path of each subtree will have a level which is at most one greater than the level of the path that has just been removed. Therefore, for complete binary trees (and thus for other trees), the caterpillar dimension is bounded by $\log |V|$.

¹ Note that the third node from the root in the middle level 1 branch is the root of a tree-component consisting of four nodes that will generate two level 2 paths.

Figure 3.2: Path partition of a tree.

Given a rooted tree $\mathcal{T}$, we give the construction of its caterpillar decomposition in Algorithm 1.

Algorithm 1 Caterpillar Decomposition Construction ($\mathcal{T}$: Edge-weighted Tree)

 1: root $\leftarrow$ root of $\mathcal{T}$.
 2: Call Breadth-First-Search (BFS) on $\mathcal{T}$.
 3: for each vertex $v \in \mathcal{T}$ do
 4:     color[v] $\leftarrow$ WHITE
 5:     level[v] $\leftarrow$ 0
 6: end for
 7: color[root] $\leftarrow$ BLACK
 8: for each edge $e \in \mathcal{T}$ do
 9:     level[e] $\leftarrow$ 0
10: end for
11: for each leaf $v \in \mathcal{T}$ do
12:     u $\leftarrow$ predecessor of v
13:     create an empty array $E' = \{\}$
14:     no-edges $\leftarrow$ 0
15:     while color[u] == WHITE do
16:         $E'$[no-edges++] $\leftarrow$ edge(u, v)
17:         v $\leftarrow$ u
18:         u $\leftarrow$ predecessor of v
19:     end while
20:     if u = root then
21:         $E'$[no-edges++] $\leftarrow$ edge(root, v)
22:     end if
23:     last-vertex $\leftarrow$ u
24:     for (i $\leftarrow$ 0; i < no-edges; i++) do
25:         let e $\leftarrow$ $E'$[i] be an edge between x and y, i.e., e = (x, y)
26:         level[e] = level[last-vertex] + 1
27:         level[x] = level[last-vertex] + 1
28:         level[y] = level[last-vertex] + 1
29:         color[x] $\leftarrow$ BLACK
30:     end for
31:     add the path specified by the edges in $E'$ into set $\mathcal{P}$
32: end for
33: return $\mathcal{P}$

Complexity Analysis of Algorithm 1

Since the embedding algorithms use the notion of the caterpillar decomposition, we first analyze the running time of its construction as given in Algorithm 1. It is easy to see that Steps 1 through 11 take $O(|V| + |E|) + O(|V|) + O(|E|)$, where $|E| = |V| - 1$. To find the edge-disjoint paths in the caterpillar decomposition along with their levels, each edge in the tree is visited exactly twice. This implies that the running time of Steps 11 through 32 is linear in terms of the number of edges. The total running time of the algorithm then becomes:

$$T(n) = O(|V| + |E|) + O(|V|) + O(|E|) + O(|E|)$$

$$T(n) = O(|V|)$$

3.3.2 Construction of the Embedding

Given a path partition $\mathcal{P}$ of $\mathcal{T}$, we use $m$ to denote the number of levels (or caterpillar dimension) in $\mathcal{T}$, and let $P(v)$ represent the unique path between the root and a vertex $v \in V$. The first segment of $P(v)$ of weight $l_1$ follows some path $P_1$ of level 1 in $\mathcal{P}$, the second segment of weight $l_2$ follows a path $P_2$ of level 2, and the last segment of weight $l_\alpha$ follows a path $P_\alpha$ of level $\alpha \le m$. The sequences $\langle P_1, \ldots, P_\alpha \rangle$ and $\langle l_1, \ldots, l_\alpha \rangle$ will be referred to as the decomposition sequence and the weight sequence of $P(v)$, respectively.

To define the embedding $f : V \to \mathcal{B}$ under $\|\cdot\|_2$, we let the relevant coordinates in $\mathcal{B}$ be indexed by the paths in $\mathcal{P}$. The vector $f(v)$, $v \in V$, has non-zero coordinates corresponding to the paths in the decomposition sequence of $P(v)$. Returning to Figure 3.2, the vector $f(v)$ will have 10 components (defined by three level 1 paths and seven level 2 paths). Furthermore, every vector $f(v)$ will have at most two non-zero components. Consider, for

example, the second lowest leaf node in the middle branch. Its path to the root will traverse

two level 2 edges corresponding to the fourth level 2 path, as well as three level 1 edges

corresponding to the second level 1 path.

Such embedding functions have become fairly standard in the metric space representation of weighted graphs [61, 66]. In fact, Matousek [66] has proven that setting the $k$-th coordinate of $f(v)$, corresponding to path $P_k$, $1 \le k \le \alpha$, in decomposition sequence $\langle P_1, \ldots, P_\alpha \rangle$, to

$$f(v)_{P_k} = \sqrt{\, l_k \left[\, l_k + \sum_{j=k+1}^{\alpha} \max\!\left(0,\ l_j - \frac{l_k}{2m}\right) \right]}$$

will result in a small distortion of at most $\sqrt{\log\log |V|}$. It should be mentioned that although the choice of path decomposition $\mathcal{P}$ is not unique, the resulting embeddings are isomorphic up to a transformation that preserves ratios of distances. Given a metric tree $\mathcal{T}$, the construction of this embedding is summarized in Algorithm 2. Figure 3.3 shows an example of embedding a tree into 3-dimensional space using this algorithm.

Algorithm 2 Embedding into Graph-Dependent Dimensionality

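1: Construct the path partition $\mathcal{P}$ of $\mathcal{T}$ according to Section 3.3.1.
2: $m \leftarrow$ number of levels in $\mathcal{P}$ (caterpillar dimension)
3: for all $v \in \mathcal{T}$ do
4:     Compute its decomposition sequence $\langle P_1, \ldots, P_\alpha \rangle$ and weight sequence $\langle l_1, \ldots, l_\alpha \rangle$
5:     for $k \leftarrow 1$ to $\alpha$ do
6:         $f(v)_{P_k} \leftarrow \sqrt{\, l_k \left[\, l_k + \sum_{j=k+1}^{\alpha} \max\left(0,\ l_j - \frac{l_k}{2m}\right) \right]}$
7:     end for
8: end for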
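The two steps of Algorithm 2 can be sketched in a few lines of Python. Several points are assumptions made for illustration rather than details given in the text: the tree is passed as hypothetical `parent` and `weight` dictionaries, the path partition is built by a simple greedy leaf-to-root sweep that does not necessarily minimise the number of levels, and the inner sum is taken over the later segments $j = k+1, \ldots, \alpha$, which is the reading that reproduces the coordinates listed for Figure 3.3. The example tree itself is inferred from those coordinates.

```python
import math

def path_partition(parent, root):
    """Greedy leaf-to-root sweep. The edge (parent[v], v) is keyed by v and
    mapped to (path_id, level)."""
    has_child = set(parent.values())
    leaves = [v for v in parent if v not in has_child]
    vertex_level = {root: 0}
    edge_path = {}
    for path_id, leaf in enumerate(leaves):
        chain, v = [], leaf
        while v not in vertex_level:     # climb until an already covered vertex
            chain.append(v)
            v = parent[v]
        level = vertex_level[v] + 1
        for x in chain:
            vertex_level[x] = level
            edge_path[x] = (path_id, level)
    return edge_path

def embed(parent, weight, root):
    """Coordinates f(v), one coordinate per path, for every vertex."""
    edge_path = path_partition(parent, root)
    m = max(level for _, level in edge_path.values())
    n_paths = 1 + max(pid for pid, _ in edge_path.values())
    coords = {root: [0.0] * n_paths}
    for v in parent:
        seg, x = {}, v                   # weight of each segment of the root-v path
        while x != root:
            pid, level = edge_path[x]
            seg[(level, pid)] = seg.get((level, pid), 0.0) + weight[x]
            x = parent[x]
        keys = sorted(seg)               # ordered by level: l_1, ..., l_alpha
        l = [seg[k] for k in keys]
        f = [0.0] * n_paths
        for k, (_, pid) in enumerate(keys):
            tail = sum(max(0.0, lj - l[k] / (2 * m)) for lj in l[k + 1:])
            f[pid] = math.sqrt(l[k] * (l[k] + tail))
        coords[v] = f
    return coords

# weight[v] is the weight of the edge (parent[v], v).
parent = {"b": "a", "c": "b", "d": "a", "e": "d", "f": "d"}
weight = {"b": 1.0, "c": 0.5, "d": 2.0, "e": 1.5, "f": 1.0}
coords = embed(parent, weight, "a")
```

With these inputs the sketch places $d$ at $(0, 2, 0)$ and $f$ at $(0, \sqrt{5}, 1) \approx (0, 2.23, 1)$, in agreement with the coordinates reported for Figure 3.3.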


As shown in the previous section, the construction of $\mathcal{P}$ has computational complexity $O(|V|)$. Since we embed each vertex in $\mathcal{T}$ to a point in Euclidean space, Steps 4-7 are executed $O(|V|)$ times. The number of paths in $\mathcal{P}$ is bounded by $O(|E|)$. Thus, the total running time of the algorithm is $O(|V|) + O(|V| \cdot |E|) = O(|V| \cdot |E|)$.

Figure 3.3: (a) A sample tree with edge weights. (b) Embedded vertices are shown in 3-dimensional space. The Cartesian coordinates of the points are: $a = (0, 0, 0)$, $b = (1, 0, 0)$, $c = (1.5, 0, 0)$, $d = (0, 2, 0)$, $e = (0, 3.5, 0)$, $f = (0, 2.23, 1)$.

3.3.3 Bringing Point Distributions into the Same Normed Space

It is important to note that embeddings produced by the above algorithm may be in

different dimensions and are defined only up to a distance-preserving transformation. Note

that a translated and rotated version of a graph embedding will also be a graph embedding.

Therefore, in order to match two embeddings, we must first perform a “registration” step

to project the two distributions into the same normed space.

Our transformation is based on Principal Components Analysis (PCA). Specifically,

the projection of the original vectors onto the subspace spanned by the first $K$ right singular vectors of the covariance matrix retains the maximum information about the original vectors among all projections onto subspaces of dimension $K$. Hence, projecting the two distributions onto the first $K$ right singular vectors of their covariance matrices will equalize their dimensions while losing minimal information. Specifically, our PCA-based transformation is contained in the following theorem:

Theorem 4. Let $X = \{(x_1, w_1), \ldots, (x_n, w_n)\}$ and $Y = \{(y_1, w_1), \ldots, (y_m, w_m)\}$ be a pair of weighted distributions in two different dimensions, $d$ and $d'$, and let $K$ be $\min(d, d')$. Suppose moreover that

$$\mu_x \leftarrow \Big(\sum_i w_i x_i\Big) \Big/ \sum_i w_i \qquad (3.1)$$

$$\mu_y \leftarrow \Big(\sum_i w_i y_i\Big) \Big/ \sum_i w_i \qquad (3.2)$$

$$\sigma_x^2 \leftarrow \Big(\sum_i w_i \|x_i - \mu_x\|^2\Big) \Big/ \sum_i w_i \qquad (3.3)$$

$$\sigma_y^2 \leftarrow \Big(\sum_i w_i \|y_i - \mu_y\|^2\Big) \Big/ \sum_i w_i \qquad (3.4)$$

$$\Sigma_{xx} \leftarrow \Big(\sum_i w_i (x_i - \mu_x)(x_i - \mu_x)^T\Big) \Big/ \sum_i w_i \qquad (3.5)$$

$$\Sigma_{xx} = U_x D_x V_x^T \ \text{is the SVD of}\ \Sigma_{xx} \qquad (3.6)$$

$$W_x \leftarrow \text{first } K \text{ columns of } V_x \qquad (3.7)$$

$$\Sigma_{yy} \leftarrow \Big(\sum_i w_i (y_i - \mu_y)(y_i - \mu_y)^T\Big) \Big/ \sum_i w_i \qquad (3.8)$$

$$\Sigma_{yy} = U_y D_y V_y^T \ \text{is the SVD of}\ \Sigma_{yy} \qquad (3.9)$$

$$W_y \leftarrow \text{first } K \text{ columns of } V_y \qquad (3.10)$$

Then the embeddings $P_x(x_i) = W_x^T (x_i - \mu_x)/\sigma_x$ and $P_y(y_i) = W_y^T (y_i - \mu_y)/\sigma_y$ equalize their dimensions² while losing minimal information.

² In the literature this is also known as whitening.

Proof. We represent the point sets $X$ and $Y$ by $d \times n$ and $d' \times m$ matrices, respectively. Here $d$ and $d'$ reflect the dimensionality of the point sets. Without loss of generality let us assume that the dimensionality of set $X$ is greater than that of set $Y$, i.e., the value of $K$ used in equations 3.7 and 3.10 is equal to the dimensionality of set $Y$ ($d > d'$ and $K = d'$). For simplicity suppose each point in sets $X$ and $Y$ has a uniform weight, i.e., $w_i = 1.0$. Then clearly,

$$\sum_{i=1}^{n} w_i = n \qquad (3.11)$$

for $X$, and likewise $\sum_{i=1}^{m} w_i = m$ for $Y$. Using equation 3.11, equations 3.1 through 3.5 and equation 3.8 can be written as follows:

$$\mu_x \leftarrow \frac{1}{n} \sum_i x_i \qquad (3.12)$$

$$\mu_y \leftarrow \frac{1}{m} \sum_i y_i \qquad (3.13)$$

$$\sigma_x^2 \leftarrow \frac{1}{n} \sum_i \|x_i - \mu_x\|^2 \qquad (3.14)$$

$$\sigma_y^2 \leftarrow \frac{1}{m} \sum_i \|y_i - \mu_y\|^2 \qquad (3.15)$$

$$\Sigma_{xx} \leftarrow \frac{1}{n} \sum_i (x_i - \mu_x)(x_i - \mu_x)^T \qquad (3.16)$$

$$\Sigma_{yy} \leftarrow \frac{1}{m} \sum_i (y_i - \mu_y)(y_i - \mu_y)^T \qquad (3.17)$$

As shown in equation 3.6, computing the Singular Value Decomposition (SVD) of $\Sigma_{xx}$ yields three matrices, i.e., $\Sigma_{xx} = U_x D_x V_x^T$, where the columns of $U_x$ are the eigenvectors of $\Sigma_{xx}\Sigma_{xx}^T$, the diagonal entries of $D_x$ are the square roots of the eigenvalues of both $\Sigma_{xx}\Sigma_{xx}^T$ and $\Sigma_{xx}^T\Sigma_{xx}$, and the columns of $V_x$ are the eigenvectors of $\Sigma_{xx}^T\Sigma_{xx}$. Note that $V_x$ contains the right singular vectors of $\Sigma_{xx}$ and thus $W_x$ consists of the first $K$ right singular vectors of the covariance matrix, $\Sigma_{xx}$. Similarly, after computing the Singular Value Decomposition of $\Sigma_{yy}$, $W_y$ consists of the first $K$ right singular vectors of $\Sigma_{yy}$. Since based on our assumption $K = d'$, $W_y$ contains all right singular vectors of $\Sigma_{yy}$.

It is clear that both $\Sigma_{xx}$ and $V_x$ are $d \times d$ matrices. Since $W_x$ includes the first $d'$ (or $K$) columns of $V_x$, $W_x$ is a $d \times d'$ matrix. Similarly, $W_y$ is represented as a $d' \times d'$ matrix. This, in turn, makes the final embeddings computed by both $P_x(x_i)$ and $P_y(y_i)$ have the same dimension $d'$ ($P_x$ and $P_y$ are $d' \times n$ and $d' \times m$ matrices, respectively). Hence, this proves the theorem for uniform-weight point sets.

Note that the proof can easily be generalized for arbitrary point weights. Assuming that point sets have integer weights (in case of rational weights, we can multiply each weight by their least common denominator), we replace each weighted point $(x_i, w_i)$ by $w_i$ uniform-weight pairs. This results in two uniform-weight point sets $X'$ and $Y'$ with $n'$ and $m'$ elements, where $n' \ge n$ and $m' \ge m$ ($X'$ and $Y'$ are $d \times n'$ and $d' \times m'$ matrices, respectively). We can then use the first part of the proof for sets $X'$ and $Y'$. This concludes the proof of the theorem.
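The registration step can be sketched with NumPy as follows. This is an illustrative reading of the theorem, not code from the original work: the function name `register`, the row-per-point layout, and the sample point sets are assumptions, and the quantities computed are the weighted mean, the scalar spread $\sigma$, the weighted covariance matrix and its SVD.

```python
import numpy as np

def register(points, weights, K):
    """Project weighted d-dimensional points (one per row) onto the first K
    right singular vectors of their covariance matrix and rescale by sigma."""
    X = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    mu = (w[:, None] * X).sum(axis=0) / total                 # weighted mean
    centred = X - mu
    sigma = np.sqrt((w * (centred ** 2).sum(axis=1)).sum() / total)
    cov = (w[:, None] * centred).T @ centred / total          # weighted covariance
    _, _, Vt = np.linalg.svd(cov)                             # rows of Vt: right singular vectors
    W = Vt[:K].T                                              # first K columns of V
    return centred @ W / sigma                                # row i is P(x_i)

# Hypothetical point sets of dimension 3 and 2 with uniform weights.
X = [[0, 0, 0], [1, 0, 0], [1.5, 0, 0], [0, 2, 0], [0, 3.5, 0], [0, 2.236, 1]]
Y = [[0, 0], [0, 1], [0, 1.5], [2, 0]]
K = min(3, 2)
PX = register(X, np.ones(len(X)), K)
PY = register(Y, np.ones(len(Y)), K)
```

After the call both point sets live in the same $K$-dimensional space, are centred at the origin, and have been rescaled by their own spread, so they can be compared directly.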

The above embedding has preserved both graph structure and edge weights, but has not

accounted for node information. To accommodate node information in our embedding, we

will associate a weight $w_v$ to each vector $f(v)$, for all $v \in V$. These weights will be defined

in terms of vertex labels which, in turn, encode image feature values. Note that nodes with

multiple feature values give rise to a vector of weights assigned to every point. We will

present an example of one such distribution in Chapter 6.

3.4 Embedding through Spherical Coding

The previous embedding procedure suffers from a significant drawback. Namely, each

graph is embedded into a vector space whose dimensionality is dependent on the graph

structure. Before the embeddings can be matched, a dimensionality reduction technique is

required. Since the original high-dimensional data cannot be represented exactly in lower

dimensions, dimensionality reduction methods introduce error. In this section we intro-

duce a novel, linear-time method to embed trees into vector spaces of prescribed dimen-

sionality, thereby avoiding the need for a dimensionality reduction step. As in the previous

embedding, this embedding is based on the caterpillar decomposition of the metric tree.

The paths of this decomposition will be embedded along maximally spaced rays in some

fixed-dimension metric space. In this construction the set of rays share the origin as their

end-points. The main step of the embedding is to identify the principal direction for each

ray to guarantee that the rays are maximally apart. In practice, this can be achieved by

placing maximally spaced points on the surface of a unit sphere and using the unit-length

vectors between the origin and these points as the principal directions of the rays. One

may observe that the first embedding method is a special case of this embedding when the

dimension of the embedding space is equal to the number of levels in the decomposition

(caterpillar dimension) and the corresponding rays form an orthogonal basis for the embedding space. This new embedding algorithm is a deterministic version of the

algorithm presented in [43]. Given an object and its rotated view on a plane, this determin-

istic embedding algorithm guarantees that the principal directions of unit-length vectors

assigned to nodes that have the maximum distance from the root are the same.

Figure 3.4: The minimum distance d and minimum angle θ between 2 points.

A spherical code is a finite set of n points on the surface of a multi-dimensional unit

hypersphere. Given n, one may arrange the points on the sphere so as to minimize or

maximize a number of objective functions, such as the minimum distance between any

two points, the kissing number, the integration error, the indexing complexity, etc. The

choice of the objective function depends on the application. In this particular example, we

are interested in positioning the points on the sphere to maximize the minimum distance

between any pair of points [20]. Equivalently, one can try to minimize the radius r of a

multi-dimensional sphere such that n points can be placed on the surface, where any two

of the points are at angular distance 2 from each other. Recall that the angular distance

between two points is the acute angle subtended by them at the origin. Figure 3.4 shows

the relationship between the minimum distance and minimum angle between two points.

The minimum distance of a spherical code indicates the quality of the code.

Figure 3.5: An edge-weighted tree and its spherical code in 2D. The Cartesian coordinates of the vertices are: $a = (0, 0)$, $b = (0, 1.0)$, $c = (0, 1.5)$, $d = (2.0, 0)$, $e = (2.5, 0.87)$, $f = (3.5, 0)$, $g = (3.93, 0.25)$, and $h = (4.5, 0)$.

3.4.1 Construction of the Embedding

The embedding framework is best illustrated with an example where a weighted tree is

embedded into $\mathbb{R}^2$, as shown in Figure 3.5. To ease visualization, we will limit the discussion to the first quadrant. The weighted tree contains 4 paths $\langle a, b, c \rangle$, $\langle a, d, f, h \rangle$, $\langle d, e \rangle$, and $\langle f, g \rangle$ in its caterpillar decomposition. In the embedding, the root is assigned to the origin.

Next, we seek a set of four vectors, one for each path in the caterpillar decomposition, such

that their inner products are minimized, i.e., their endpoints are maximally apart. These

vectors define the general directions in which the vertices on each path in the caterpillar

decomposition are embedded.

Three of the four vectors will be used by the caterpillar paths belonging to the subtree

rooted at vertex $d$, and one vector will be used by the path belonging to the subtree rooted at vertex $b$. This effectively subdivides the first quadrant into two cones, $C_b$ and $C_d$. The volume of these cones is a function of the number of caterpillar paths belonging to the subtrees rooted at $b$ and $d$. The cone $C_d$, in turn, is divided into two smaller cones, $C_e$ and $C_f$, corresponding to the subtrees rooted at $e$ and $f$, respectively. The extreme rays of sub-cones $C_b$, $C_e$, and $C_f$ correspond to the four directions defining the embedding. To complete the embedding, we translate the sub-cones away from the origin along their directional rays to positions defined by the path lengths in the tree. For example, to embed point $b$, we will move along the extremal ray of $C_b$ and will embed $b$ at $(0, 1.0)$. Similarly, the sub-cone $C_d$ will be translated along the other extremal ray, embedding $d$ at $(2.0, 0)$.
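The two-dimensional example can be reproduced with a short sketch. Everything specific in it is an assumption inferred from the coordinates listed for Figure 3.5 rather than stated in the text: the four rays are taken at 0, 30, 60 and 90 degrees (evenly spaced in the first quadrant), each caterpillar path is assigned one of these rays, and the edge weights are the ones implied by the listed vertex positions.

```python
import math

# Caterpillar paths and the ray (in degrees) that each path follows.
paths = {
    ("a", "d", "f", "h"): 0.0,
    ("f", "g"): 30.0,
    ("d", "e"): 60.0,
    ("a", "b", "c"): 90.0,
}
# Edge weights keyed by (parent, child).
weight = {("a", "b"): 1.0, ("b", "c"): 0.5, ("a", "d"): 2.0, ("d", "f"): 1.5,
          ("f", "h"): 1.0, ("d", "e"): 1.0, ("f", "g"): 0.5}

def embed_2d(paths, weight, root="a"):
    """Place each vertex at its parent's position plus the edge weight times
    the unit direction of the path that contains the edge."""
    position = {root: (0.0, 0.0)}
    pending = list(paths.items())
    while pending:
        for item in pending:
            if item[0][0] in position:   # path starts at an embedded vertex
                break
        else:
            raise ValueError("no remaining path starts at an embedded vertex")
        pending.remove(item)
        path, degrees = item
        dx = math.cos(math.radians(degrees))
        dy = math.sin(math.radians(degrees))
        for u, v in zip(path, path[1:]):
            x, y = position[u]
            w = weight[(u, v)]
            position[v] = (x + w * dx, y + w * dy)
    return position

position = embed_2d(paths, weight)
```

Up to rounding, the resulting positions match the caption: for instance $e$ lands at roughly $(2.5, 0.87)$ and $g$ at roughly $(3.93, 0.25)$.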

In $d$-dimensional Euclidean space $\mathbb{R}^d$, computing the embedding $f : V \to \mathcal{B}$ under $\|\cdot\|_2$ is more involved. Let $L$ denote the number of paths in the caterpillar decomposition. The embedding procedure defines $L$ vectors in $\mathbb{R}^d$ that have a large angle with respect to each other on the surface of a hypersphere $S_d$ of radius $r$. These vectors are chosen in such a way that any two of their endpoints on the surface $\Sigma_d$ are at least spherical distance 2 from each other. We refer to such vectors as well-separated. Consider the set of hyperplanes $H_i = (0, 2, 4, \ldots, 2i)$, and let $\Sigma_d(i) = H_i \cap \Sigma_d$. Since each of the $\Sigma_d(i)$ are hypercircles, i.e., surfaces of spheres in dimension $d - 1$, we can recursively construct well-separated vectors on each hypercircle $\Sigma_d(i)$. Our construction stops when the sphere becomes a circle and the surface becomes a point in two dimensions. It is known that taking $r$ to be $O(d L^{1/(d-1)})$, and the minimum angle between two vectors to be $2/r$, provides us with $L$ well-separated vectors [20]. In Figure 3.5, we have four such vectors emanating from the origin.

Now that the embedding directions have been established, we can proceed with the em-

bedding of the vertices. The embedding procedure starts from the root (always embedded

at the origin) and embeds vertices following the embedding of their parents. For each vertex in the metric tree $\mathcal{T}$, we associate with every subtree $\mathcal{T}_v$ a set of vectors $C_v$, such that the number of vectors in $C_v$ equals the number of paths in the caterpillar decomposition of $\mathcal{T}_v$. Initially, the root has the entire set of $L$ vectors. Consider a subtree rooted at vertex $v$, and let us assume that vertex $v$ has $k$ children, $v_1, \ldots, v_k$. We partition the set of vectors into $k$ subsets, such that the number of vectors in each subset, $S_v$, equals the number of leaves in $\mathcal{T}_v$. We then embed the vertex $v_l$ ($1 \le l \le k$) at the position $f(v) + w_l \cdot x_l$, where $w_l$ is the length of the edge $(v, v_l)$ and $x_l$ is some vector in $C_v$. We recursively repeat the same process for each subtree rooted at every child of $v$, and stop when there are no more subtrees to consider.

It must be noted that the above algorithm is randomized since the vectors are chosen

arbitrarily from each $C_v$. We will, however, use the non-randomized version of this embedding in our framework. More specifically, after embedding vertex $v$ into $f(v)$, we consider the subtree rooted at $v$ ($\mathcal{T}_v$) and compute the length from each leaf to the root, $v$. Among the children of the root, we first start the embedding process from the vertex $v_l$, which lies on the maximum-length root-leaf path in $\mathcal{T}_v$. We then continue embedding vertices in the same fashion, i.e., the $i$th vertex to be embedded is the one which lies on the $i$th maximum-length root-leaf path in $\mathcal{T}_v$. The embedding procedure is summarized in Algorithms 3 and 4.

Algorithm 3 Embedding through Spherical Coding

1: Construct the path partition $\mathcal{P}$ of $\mathcal{T}$ according to Section 3.3.1.
2: $L \leftarrow$ number of paths in $\mathcal{P}$.
3: $r \leftarrow O(d L^{1/(d-1)})$
4: $\delta = \rho / r$, where $\rho$ is at least 2.
5: SphericalEmbedding(root)

Algorithm 4 SphericalEmbedding($u$)

1: if $u =$ root then
2:     $f(u) \leftarrow 0$
3: end if
4: Compute the set of vectors $C_u$ using $r$, $\delta$, and $\rho$ according to Section 3.4.
5: for all $v \in Adj[u]$ do
6:     $f(v) \leftarrow f(u) + w(u, v) \cdot x_u$, where $x_u \in C_u$
7:     SphericalEmbedding($v$)
8: end for

Complexity Analysis of Algorithms 3 and 4

Since the running time of Algorithm 3 depends on that of Algorithm 4, we first analyze the complexity of Algorithm 4. One may notice that Algorithm 4 is called recursively for each vertex in the tree. Since all the other steps of this algorithm take constant time, its running time becomes linear in terms of the number of vertices in the tree. Returning to Algorithm 3, we have shown previously that given a tree, the construction of its caterpillar decomposition takes $O(|V|)$. Since Steps 2, 3 and 4 are constant-time operations, the running time of Algorithm 3 is linear, $O(|V|)$.

Since the dimensionality of the Euclidean space is an input to the spherical embedding algorithm, a natural question is what dimensionality should be chosen so that the embedding approximates pairwise distances with minimum distortion. To answer this question, we conducted a set of experiments in which 200 trees constructed for some of the Columbia University Image Library (COIL-20) objects (see [70]) were embedded into Euclidean spaces of varying dimensions, and we measured the average distortion in each dimension. For a given tree, we used the following method to measure its distortion in one particular dimension. First, we computed all of its pairwise node distances before and after the embedding. We then measured the maximum factor by which any pairwise distance was changed by the embedding algorithm. After repeating this procedure for all trees, the average distortion for that dimension was calculated. The trade-off between distortion and dimension is shown in Figure 3.6. It should be noted that, while increasing the dimensionality of the embedding space improves the quality of the embedding by decreasing the distortion, this trend does not continue indefinitely to produce isometric embeddings. This can be attributed to the fact that the original distances are non-additive, making an isometric embedding impossible.
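The distortion measurement described above can be sketched in a few lines. This is an illustrative sketch only: the input layout (a dictionary of original pairwise distances keyed by sorted vertex pairs, and a dictionary of embedded coordinates) and the function names are assumptions, and "maximum factor" is interpreted here as the larger of the expansion and contraction ratios over all pairs.

```python
import math
from itertools import combinations


def distortion(tree_dist, points):
    """Largest factor by which any pairwise distance changes.

    tree_dist[(u, v)] holds the original distance for u < v (assumed
    layout); points[u] is the embedded coordinate tuple of u. Embedded
    points are assumed to be pairwise distinct.
    """
    worst = 1.0
    for u, v in combinations(sorted(points), 2):
        original = tree_dist[(u, v)]
        embedded = math.dist(points[u], points[v])
        ratio = embedded / original
        worst = max(worst, ratio, 1.0 / ratio)
    return worst


def average_distortion(trees):
    """Mean distortion over a collection of (tree_dist, points) pairs."""
    return sum(distortion(d, p) for d, p in trees) / len(trees)


# Toy example (hypothetical data): the path a - b - c with unit edges,
# embedded on a line with the second edge stretched to length 1.5.
tree_d = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 2.0}
pts = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.5, 0.0)}
print(distortion(tree_d, pts))  # 1.5
```

Repeating `average_distortion` for embeddings of the same trees in each target dimension would give one point per dimension of a curve like the one in Figure 3.6.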


Figure 3.6: Trade-off between distortion and dimension for a given set of graphs.

3.5 Conclusions

This chapter introduced the notion of graph embedding and presented two low-distortion

graph embedding algorithms. Each algorithm takes a tree metric and embeds it into some

low-dimensional vector space with the aim of preserving pairwise distances between the

nodes. While the distances cannot be preserved exactly, we would like to approximate them

with minimal distortion.

The first embedding technique is inspired by the general framework proposed by Matousek [66]. The framework begins by transforming a graph into a metric tree, which is then embedded into a normed vector space. Although low-distortion embedding is achieved, the approach suffers from the significant limitation that each graph is embedded into a vector space whose dimensionality is a property of the graph. Thus, before the embeddings can be matched, a dimensionality reduction technique (such as PCA) is required. Since high-dimensional data cannot, in general, be represented exactly in lower dimensions, dimensionality reduction techniques are prone to error.

The limitations of the first embedding technique lead us to a second embedding method, which enables us to embed input trees into normed vector spaces of prescribed dimensionality. The main advantage of this technique is that, because the nodes of all trees are embedded into the same space, it avoids the need for a dimensionality reduction step. This novel linear-time embedding technique is based on a deterministic version of the spherical coding algorithm [43]. The starting point of this embedding technique comes from the first embedding method, where we use a new dimension orthogonal to all previous ones for each path in the caterpillar decomposition. However, to restrict the dimension of the target space, we need to relax the orthogonality constraint and compute $L$ vectors that have small inner products with each other, where $L$ is the number of paths in the caterpillar decomposition. Such vectors form good spherical codes as defined in [43], and the cost of relaxing the orthogonality constraint is $O(L^{1/(d-1)})$.

As presented in this chapter, since the dimension of the target space is given as an input to the algorithm, an interesting question arises: what dimensionality of the target space should be chosen to approximate pairwise distances with minimum distortion? We tried to answer this question by conducting an experiment in which a number of trees were embedded into Euclidean spaces of different dimensionality and the average distortion in each dimension was computed. The experimental results indicate that increasing the dimensionality of the embedding space improves the quality of the embedding, but this trend does not continue indefinitely to produce isometric embeddings.

To compare these two embedding methods to each other, we will develop two different

variations of the many-to-many matching framework and demonstrate the effectiveness of

each variation for shape retrieval and pose estimation in the experimental section of this

thesis. Since two different embedding techniques are used, graph nodes will be positioned

at different Cartesian coordinates in the vector space. We will also study the stability of the

matching framework for each variation in the presence of noise/occlusion in the following

chapters.


4. Encoding Directed Edges

Graph embedding methods approximate the distance metric defined on undirected edges

of the original graphs with minimal distortion. However, they fail to encode any oriented

relations, such as parent/child or sibling relations common to scale-space or coarse-to-

fine structures. This is due to the fact that oriented relations do not satisfy the symmetry

property of a metric. To encode relational information in the vector space, the embedding procedure should represent each input graph node as a point in an asymmetric metric space. Because few algorithms are defined on asymmetric metric distances, we will instead propose a method to encode relational information in a metric space.

Our method retains this important information by moving it into the nodes as node

attributes, a technique used in the encoding of directed topological structure [99], directed

geometric structure [95], and shape context [9]. Encoding in a node the attributes of the

oriented edges incident to the node requires computing distributions on the attributes and

assigning them to the node. For example, a node with a single parent at a coarser scale and

two children at a finer scale might encode a relative scale distribution (histogram) as a node

attribute. The resulting attribute provides a contextual signature for the node which will be

used by the matching framework (Chapter 5) to reduce matching ambiguity.

4.1 Qualitative Shape Representation Using a Blob/Ridge Decomposition

We will motivate this encoding in the context of directed graphs for qualitative shape representation using a blob/ridge decomposition; details can be found in [95]. Two examples are shown in Figure 1.1. A blob (compact region) is graphically represented by a circle defining a support region whose radius is proportional to its scale ($\sqrt{t}$). Blobs are detected as local maxima in scale space of the square of the normalized Laplacian operator,

$$\nabla^2_{norm} L = t\,(L_{xx} + L_{yy}). \tag{4.1}$$

Ridges (elongated structures) are represented as ellipses, each defining a support region whose width is proportional to its scale ($\sqrt{t}$). These elongated structures are localized where the multi-scale ridge detector,

$$\mathcal{R}_{norm} L = t^{3/2} (L_{pp} - L_{qq})^2 = t^{3/2} \left( (L_{xx} - L_{yy})^2 + 4 L_{xy}^2 \right), \tag{4.2}$$

assumes a local maximum in scale-space. For color images, the feature detection is performed in the R, G, and B channels, respectively. To represent the spatial extent of a detected image structure, a windowed second moment matrix,

$$\Sigma = \int_{\eta \in \mathbb{R}^2} \begin{pmatrix} L_x^2 & L_x L_y \\ L_x L_y & L_y^2 \end{pmatrix} g(\eta; t_{int}) \, d\eta, \tag{4.3}$$

is computed at the detected feature position and at an integration scale $t_{int}$ proportional to the scale $t_{det}$ of the detected image feature. The orientation and the anisotropy of the feature are estimated from the eigenvalues of $\Sigma$ and the corresponding eigenvectors. The spatial extent of the feature is thus given by the scale, the anisotropy, and the orientation. Figure 4.1 shows an image of a hand with the extracted features superimposed.

Figure 4.1: Feature Extraction: Extracted blobs and ridges at appropriate scales.

The feature detection process may receive multiple overlapping responses originating

from the same image structure. Therefore, features are merged to remove such overlapping

responses. To detect overlapping features, we need a measure of inter-feature similarity.

For this purpose, each feature is associated with a 2-D Gaussian kernel $g(x, \Sigma)$. When two features are positioned near each other, their Gaussian functions will intersect. The similarity measure between two such features can be defined as the disjunct volume $D$ of the two Gaussians, and is computed as

$$D(A, B) = \frac{\sqrt{|\Sigma_A| + |\Sigma_B|}}{2} \int_{\mathbb{R}^2} (g_A - g_B)^2 \, dx. \tag{4.4}$$

Similarly, the ridge detection will produce multiple responses on a ridge structure that is long compared to its width. These ridges are linked together to form one long ridge, as shown in Figure 4.2. Ridges are linked based on overlap and alignment. After the linking is performed, we re-calculate the anisotropy and support region for the resulting ridge. The anisotropy is re-calculated as $1 - (w/l)$, where $w$ is the width of the structure and $l$ is the length of the structure.

Figure 4.2: Extracted blobs and ridges after removing multiple responses and ridge linking.

Once we construct the feature map, we then assemble the component features into a

directed acyclic graph. Algorithm 5 shows the graph construction.

Algorithm 5 Ridge and Blob Decomposition Graph Construction

1: Extract features, merge multiple feature responses, and link ridges.

2: Choose the coarsest scale feature as the root.

3: Recursively define child nodes of the root based on spatial overlap; find parental and sibling edges.

4: Compute relations between the nodes in the graph; these serve as attributes of the edges.

5: For features not included in the graph, go to step 3.

As outlined in the algorithm, after linking spatially overlapping aligned ridges and merging spatially overlapping blobs, we build directed acyclic graphs in a coarse-to-fine manner. Specifically, let $G = (V, E)$ be a graph to be embedded. Each feature is represented

as a node in the graph and has a number of attributes, including position, orientation, and

support region. A feature at the coarsest scale is chosen as the root. Next, finer-scale

features that overlap with the root become its children through hierarchical edges. These

children, in turn, select overlapping features at finer scales to be their children, etc. From

the unassigned features, the feature at the coarsest scale is chosen as a new root. Children

of this root are selected from unassigned as well as assigned features and the process is

repeated until all features are assigned to a graph. This process creates the possibility that

a node may have multiple parents. In order to create one rooted graph which is needed in

the matching step, a virtual top root node is inserted as the parent of all root nodes in the

image.

There are a number of important geometric attributes associated with each edge. For an edge $E$, directed from a vertex $V_A$ representing feature $F_A$ to a vertex $V_B$ representing feature $F_B$, we define the following attributes, as shown in Figure 4.3:

• Distance. Two measures of inter-feature distance are associated with the edge: 1) the smallest distance $d$ from the support region of $F_A$ to the support region of $F_B$, normalized to the largest of the radii $r_A$ and $r_B$; and 2) the distance between their centers normalized to the radius $r_A$ of $F_A$ in the direction of the distance vector between their centers.

• Relative orientation. The relative orientation between $F_A$ and $F_B$.

• Bearing. The bearing of a feature $F_B$, as seen from a feature $F_A$, is defined as the angle of the distance vector $x_B - x_A$ with respect to the orientation of $A$, measured counter-clockwise.

• Scale ratio. The scale-invariant relation between $F_A$ and $F_B$ is a ratio between scales $t_{F_A}$ and $t_{F_B}$.

Figure 4.3: The four edge relations: (a, b) two normalized distance measures, (c) relative orientation, and (d) bearing.

Examples of graphs for hand images, showing hierarchical edges, are shown in Figure 1.1.

For every pair of vertices $(u, v)$, we let $R_{u,v}$ denote the attribute vector associated with the pair. The entries of each vector represent the set of oriented relations $R$ between $u$ and $v$. For a vertex $u \in V$, we let $N(u)$ denote the set of vertices $v \in V$ adjacent to $u$. For a relation $p \in R$, we denote by $\mathcal{P}(u, p)$ the set of values for relation $p$ between $u$ and all vertices in $N(u)$, i.e., $\mathcal{P}(u, p)$ corresponds to entry $p$ of vector $R_{u,v}$ for $v \in N(u)$. The feature vector $\mathcal{P}_u$ for point $u$ is the set of all $\mathcal{P}(u, p)$’s for $p \in R$. Observe that every entry $\mathcal{P}(u, p)$ of vector $\mathcal{P}_u$ can be considered as a local distribution (histogram) of feature $p$ in the neighborhood $N(u)$ of $u$ (see Figure 4.4). We adopt the method of [95], in which the distance function for two such vectors $\mathcal{P}_u$ and $\mathcal{P}_{u'}$ is computed through a weighted combination of Hausdorff distances between $\mathcal{P}(u, p)$ and $\mathcal{P}(u', p)$ for all values of $p$.
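A weighted combination of Hausdorff distances between the per-relation value sets of two nodes can be sketched as follows. This is only an illustration: the relation names, the example values, and the equal weights are hypothetical, the attribute values are assumed to be scalars, and every value set is assumed to be non-empty. The weighting actually used is the one described in [95].

```python
def hausdorff(a, b):
    """Hausdorff distance between two non-empty finite sets of numbers."""
    def directed(s, t):
        # Farthest any element of s is from its nearest element of t.
        return max(min(abs(x - y) for y in t) for x in s)
    return max(directed(a, b), directed(b, a))


def node_attribute_distance(feat_u, feat_v, weights):
    """Weighted sum of per-relation Hausdorff distances.

    feat_u[p] is the collection of values of relation p between node u
    and its neighbours; weights[p] is the (assumed) weight of relation p.
    """
    return sum(weights[p] * hausdorff(feat_u[p], feat_v[p]) for p in weights)


# Hypothetical attribute sets for two nodes with two relations each.
feat_u = {"scale_ratio": [0.5, 2.0, 2.0], "bearing": [0.0, 1.0]}
feat_v = {"scale_ratio": [0.5, 2.5], "bearing": [0.1]}
weights = {"scale_ratio": 0.5, "bearing": 0.5}
print(node_attribute_distance(feat_u, feat_v, weights))
```

Because the Hausdorff distance compares sets rather than ordered lists, two nodes with different numbers of neighbours can still be compared, which is what makes it a natural fit for local distributions of edge attributes.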

In Part (a) of Figure 4.5, we illustrate an example graph where a vertex and its neighbors with their attributes are shown. Assuming there are only two attributes associated with every vertex, Parts (b) and (c) show the two histograms, one for each of these attributes.

Figure 4.4: Histogram creation for each directed graph relation

4.2 Conclusions

As presented in Chapter 3, graph embedding methods approximate the distance metric defined by undirected, weighted edges of the original graphs. Given two nodes $u$ and $v$ in a tree, the embedding methods guarantee that the undirected (symmetric) distance between them is within a bounded factor of their distance in the target space. However, since oriented relations, such as parent/child or sibling relations common to scale-space structures, are directed (asymmetric), they cannot be encoded by embedding algorithms. To overcome this problem, we moved such hierarchical relations into nodes as their node attributes. More specifically, for every incoming and outgoing edge adjacent to one vertex, we created a local histogram. For one particular vertex in a given graph, we used its histograms along with its geometric location in the vector space to find its corresponding node(s) in the second graph. The main advantage of this method, along with a low-distortion tree embedding method, is that it will enable us to encode both geometric and topological structure of input graphs during the matching process.

Figure 4.5: Part (a) shows a vertex and its neighbors with their attributes. Histograms created for each attribute are presented in parts (b) and (c).

In our many-to-many matching framework, the steps that we presented so far represent

graph nodes as a set of points in a high-dimensional vector space. Our next goal is to

match these point sets and find the correspondences between them. Given two point sets,

the algorithm establishing such correspondences should also compute a similarity score

between them. Using this similarity value, we will gain information about how similar the original graphs, and therefore their corresponding shapes, are. In the next chapter we will

first define the point matching algorithm used in the framework. We then show by a set of

experiments that many-to-many point correspondences using embedded node histograms

in the vector space yield meaningful many-to-many node correspondences in the original

graphs.


5. Distribution-Based Many-to-Many Matching

By embedding vertex-labeled graphs into normed spaces, we have reduced the problem

of many-to-many matching of graphs to that of many-to-many matching of weighted dis-

tributions of points in normed spaces. Given a pair of weighted distributions in the same

normed space, the Earth Mover’s Distance (EMD) framework [82] is then applied to find

an optimal match between the distributions. The EMD approach computes the minimum

amount of work (defined in terms of displacements of the masses associated with points)

it takes to transform one distribution into another. The EMD approach assumes that a dis-

tance measure between single features, called the ground distance, is given. The EMD

then “lifts” this distance from individual features to full distributions. The main advantage

of using EMD lies in the fact that it subsumes many histogram distances and permits par-

tial matches. This important property allows the similarity measure to deal with uneven

clusters and noisy datasets.

Computing the EMD is based on a solution to the well-known transportation problem [4], whose optimal value determines the minimum amount of “work” required to transform one distribution into the other. More formally, let $P = \{(p_1, w_{p_1}), \ldots, (p_m, w_{p_m})\}$ be the first distribution with $m$ points, and let $Q = \{(q_1, w_{q_1}), \ldots, (q_n, w_{q_n})\}$ be the second distribution with $n$ points. Let $D = [d_{ij}]$ be the ground distance matrix, where $d_{ij}$ is the ground distance between points $p_i$ and $q_j$. Our objective is to find a flow matrix $F = [f_{ij}]$, with $f_{ij}$ being the flow between points $p_i$ and $q_j$, that minimizes the overall cost:

$$\mathrm{Work}(P, Q, F) = \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}$$

subject to the following list of constraints:

$$f_{ij} \geq 0, \quad 1 \leq i \leq m, \; 1 \leq j \leq n$$

$$\sum_{j=1}^{n} f_{ij} \leq w_{p_i}, \quad 1 \leq i \leq m$$

$$\sum_{i=1}^{m} f_{ij} \leq w_{q_j}, \quad 1 \leq j \leq n$$

$$\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\left( \sum_{i=1}^{m} w_{p_i}, \; \sum_{j=1}^{n} w_{q_j} \right)$$

The optimal value of the objective function $\mathrm{Work}(P, Q, F)$ defines the Earth Mover’s Distance between the two distributions.
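The transportation problem above can be solved on small inputs with a textbook minimum-cost flow routine. The sketch below is illustrative only and is not the solver used in this thesis: it uses successive shortest augmenting paths with Bellman-Ford, assumes integer point weights and non-negative ground distances, and the function name `emd` is hypothetical. It returns the optimal value of Work together with the flow matrix.

```python
def emd(weights_p, weights_q, ground):
    """Return (work, flow) for the transportation problem with supplies
    weights_p, demands weights_q, and ground distance matrix `ground`."""
    m, n = len(weights_p), len(weights_q)
    src, snk = m + n, m + n + 1
    # Each edge is [target, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(m + n + 2)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, w in enumerate(weights_p):
        add_edge(src, i, w, 0.0)            # supply of p_i
    for j, w in enumerate(weights_q):
        add_edge(m + j, snk, w, 0.0)        # demand of q_j
    big = sum(weights_p) + sum(weights_q)   # effectively unbounded capacity
    for i in range(m):
        for j in range(n):
            add_edge(i, m + j, big, ground[i][j])

    work = 0.0
    while True:
        # Bellman-Ford shortest path from src in the residual graph.
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        dist[src] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        if prev[snk] is None:
            break                           # no augmenting path is left
        push, v = float("inf"), snk
        while v != src:                     # bottleneck along the path
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = snk
        while v != src:                     # augment along the path
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        work += push * dist[snk]

    flow = [[0] * n for _ in range(m)]
    for i in range(m):
        for v, _, _, rev in graph[i]:
            if m <= v < m + n:
                flow[i][v - m] = graph[v][rev][1]  # reverse capacity = flow
    return work, flow


# Partial match: total supply 3, total demand 2, so only 2 units move.
work, flow = emd([2, 1], [1, 1], [[1.0, 4.0], [3.0, 1.0]])
print(work, flow)  # 2.0 [[1, 0], [0, 1]]
```

Because the total flow equals the smaller of the two total weights, one unit of the first distribution remains unmatched in the example, which is the partial-matching behaviour described above.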

The above formulation assumes that the two distributions have been aligned. However, recall that a translated and rotated version of a graph embedding will also be a graph embedding. To accommodate pairs of distributions that are “not rigidly embedded”, Cohen and Guibas [19] extended the definition of EMD, originally applicable to pairs of fixed sets of points, to allow one of the sets to undergo a transformation. Assuming that a transformation $T$ from a set of allowable transformations is applied to the second distribution, distances $d^T_{ij}$ are defined as $d^T_{ij} = d(p_i, T(q_j))$, and the objective function becomes $\mathrm{Work}(P, Q, F, T) = \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d^T_{ij}$. The minimal value of the objective function $\mathrm{Work}(P, Q, F, T)$ defines the Earth Mover’s Distance between the two distributions that are allowed to undergo a transformation from this set.

Cohen and Guibas [19] also suggested an iterative process (which they call FT, short for “an optimal Flow and an optimal Transformation”) that achieves a local minimum of the objective function. Starting with an initial allowable transformation $T^{(0)}$, from a given $T^{(k)}$ they compute the optimal flow $F = F^{(k)}$ that minimizes the objective function $\mathrm{Work}(P, T^{(k)}(Q), F)$, and from a given optimal flow $F^{(k)}$ they compute an optimal transformation $T = T^{(k+1)}$ that minimizes the objective function $\mathrm{Work}(P, T(Q), F^{(k)})$. The iterative process stops when the improvement in the objective function value falls below a threshold. The resulting optimal pair $(F, T)$ depends on the initial transformation $T^{(0)}$. Starting the iteration from several initial transformations increases the likelihood of obtaining a global minimum.

5.1 Choosing an Appropriate Transformation

For our framework, the set of allowable transformations consists of only those transformations that preserve distances. Therefore, we use a weighted version of the Least

Squares Estimation algorithm [104] to compute an optimal distance-preserving transfor-

mation given a flow between the distributions. Specifically, the following theorem shows

how to compute the transformation parameters.

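Theorem 5. Given a set of pairings $\{(x_i, y_i, w_i)\}$ (the flow of weight $w_i$ is sent from point $x_i$ to point $y_i$), the optimal transformation $T(x) = cRx + t$ is defined as follows:

$$\mu_x = \Big( \sum_i w_i x_i \Big) \Big/ \sum_i w_i \tag{5.1}$$

$$\mu_y = \Big( \sum_i w_i y_i \Big) \Big/ \sum_i w_i \tag{5.2}$$

$$\sigma_x^2 = \Big( \sum_i w_i \| x_i - \mu_x \|^2 \Big) \Big/ \sum_i w_i \tag{5.3}$$

$$\sigma_y^2 = \Big( \sum_i w_i \| y_i - \mu_y \|^2 \Big) \Big/ \sum_i w_i \tag{5.4}$$

$$\Sigma_{xy} = \Big( \sum_i w_i (y_i - \mu_y)(x_i - \mu_x)^T \Big) \Big/ \sum_i w_i \tag{5.5}$$

$$R = UV^T, \text{ where } UDV^T \text{ is the SVD of } \Sigma_{xy} \tag{5.6}$$

$$c = \sigma_y / \sigma_x \tag{5.7}$$

$$t = \mu_y - cR\mu_x \tag{5.8}$$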

Proof. The original proof of optimality of the transformation [104] is easily adapted to the weighted case. Namely, assume that the flows from the $x_i$’s to the $y_i$’s are integers, and replace each weighted pairing $\{(x_i, y_i, w_i)\}$ by $w_i$ unweighted pairings $\{(x_i^j, y_i^j)\}$, which makes the original proof applicable. Collecting appropriate terms, we get weighted versions of the original equations. Fractional flows are reduced to integer flows by multiplying all fractions by their least common denominator. More formally, the proof can be stated as follows.

Consider a set of pairs $S$ and its weight set $W$:

$$S = \{(x_1, y_1), \ldots, (x_n, y_n)\}, \qquad W = \{w_1, \ldots, w_n\}$$

Let us first assume that each pair $(x_i, y_i)$ has a uniform weight, i.e., $w_i = 1.0$. Then clearly

$$\sum_{i=1}^{n} w_i = n \tag{5.9}$$

Using equation 5.9, the quantities in 5.1-5.5 can be written as follows:

$$\mu_x = \frac{1}{n} \sum_i x_i \tag{5.10}$$

$$\mu_y = \frac{1}{n} \sum_i y_i \tag{5.11}$$

$$\sigma_x^2 = \frac{1}{n} \sum_i \| x_i - \mu_x \|^2 \tag{5.12}$$

$$\sigma_y^2 = \frac{1}{n} \sum_i \| y_i - \mu_y \|^2 \tag{5.13}$$

$$\Sigma_{xy} = \frac{1}{n} \sum_i (y_i - \mu_y)(x_i - \mu_x)^T \tag{5.14}$$

One may notice that equations (5.10)-(5.14) are the same as the ones in [104]. Thus, we can follow the original proof for the uniformly weighted case. More generally, suppose that each pair has an integer weight, i.e., $w_i \geq 1$. (Note that in the case of rational weights, we multiply each weight by their least common denominator.) After replacing each weighted pair $(x_i, y_i, w_i)$ with $w_i$ uniform-weight pairs, $S$ and $W$ can be written as follows:

$$S' = \{(x_1, y_1), \ldots, (x_n, y_n), \ldots, (x_m, y_m)\}, \qquad W' = \{1, \ldots, 1\}$$

In $S'$ some of the pairings $(x_i, y_i)$ are repeated more than once. It is then easy to see that the first part of the proof applies to $S'$. This concludes the proof.
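The quantities in Theorem 5 can be computed directly from a list of weighted pairings. The sketch below is a simplified illustration rather than the general algorithm: it is restricted to 2-D points and to proper rotations, for which the rotation has a closed form through `atan2`, so it stands in for the SVD of equation (5.6) only in that special case. The function name and the input layout are assumptions.

```python
import math


def weighted_similarity_2d(pairs):
    """pairs: list of ((x1, x2), (y1, y2), w).

    Returns (c, theta, t) such that T(x) = c * R(theta) * x + t, following
    equations (5.1)-(5.8) with R restricted to planar rotations.
    """
    total = sum(w for _, _, w in pairs)
    mu_x = [sum(w * x[k] for x, _, w in pairs) / total for k in range(2)]
    mu_y = [sum(w * y[k] for _, y, w in pairs) / total for k in range(2)]

    var_x = var_y = 0.0
    m = [[0.0, 0.0], [0.0, 0.0]]  # Sigma_xy, equation (5.5)
    for x, y, w in pairs:
        dx = (x[0] - mu_x[0], x[1] - mu_x[1])
        dy = (y[0] - mu_y[0], y[1] - mu_y[1])
        var_x += w * (dx[0] ** 2 + dx[1] ** 2) / total
        var_y += w * (dy[0] ** 2 + dy[1] ** 2) / total
        for a in range(2):
            for b in range(2):
                m[a][b] += w * dy[a] * dx[b] / total

    # Planar rotation that maximises trace(R^T Sigma_xy).
    theta = math.atan2(m[1][0] - m[0][1], m[0][0] + m[1][1])
    c = math.sqrt(var_y / var_x)                      # equation (5.7)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    r_mu = (cos_t * mu_x[0] - sin_t * mu_x[1],
            sin_t * mu_x[0] + cos_t * mu_x[1])
    t = (mu_y[0] - c * r_mu[0], mu_y[1] - c * r_mu[1])  # equation (5.8)
    return c, theta, t


# Hypothetical pairings generated by y = 2 * R(90 degrees) * x + (1, 1).
pairs = [((0, 0), (1, 1), 1), ((1, 0), (1, 3), 2), ((0, 1), (-1, 1), 1)]
c, theta, t = weighted_similarity_2d(pairs)
print(c, theta, t)
```

For the example pairings, the recovered parameters are approximately a scale of 2, an angle of π/2, and a translation of (1, 1), regardless of the weights, because the pairings are related by an exact similarity transformation.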

5.2 The Final Algorithm

Our algorithm for many-to-many matching is a combination of the previous procedures.

Specifically, given two vertex-labeled edge-weighted graphs G1 and G2, we first find low-

distortion embeddings of the graphs into low-dimensional normed spaces, obtaining two

weighted distributions. Depending on which embedding method is used, a dimensionality reduction technique (such as PCA) may be required to bring the embeddings into the same space.

We then “register” one distribution with respect to the other so as to minimize the (original)

EMD between them. Next, we apply the FT iteration of the transformation version of the

EMD framework [19] to minimize the (extended) EMD. The pairing of points minimizing

the EMD corresponds to a weighted many-to-many pairing of nodes. We summarize our

approach in Algorithm 6.

Algorithm 6 Many-to-many graph matching

1: Compute the metric tree $\mathcal{T}_i$ corresponding to $G_i$ according to Chapter 3 (see [3] for details).

2: Construct low-distortion embeddings $f_i(\mathcal{T}_i)$ of $\mathcal{T}_i$ into $(\mathbb{R}^{d_i}, \|\cdot\|_2)$ according to one of the algorithms presented in Sections 3.3 and 3.4.

3: Compute the EMD between the embeddings by applying the FT iteration, computing the optimal transformation $T$ according to Chapter 5 (see [56] for details).

4: Interpret the resulting optimal flow between the embeddings as a many-to-many vertex matching between the $G_i$’s.


As we showed in Section 3.2, computing the metric tree $\mathcal{T}_i$ for a given graph $G_i$ takes $O(|V|^2)$. The complexity of Step 2 depends on which embedding algorithm is used. While this may take $O(|V| \cdot |E|)$ using graph-dependent dimensionality, it can also be done in linear time through spherical coding. Since computing the EMD is based on the transportation problem, it can be solved using a network flow algorithm in $O(|V|^3)$. The FT iteration alternates between finding the optimum transformation for a given flow and the optimum flow for a given transformation. In our experiments, this procedure converged after five or six iterations. Finally, Step 4, the mapping of the EMD solution back to the graph solution, is $O(|V|)$. The overall complexity of the algorithm is therefore $O(|V|^3)$. Note that the total running time can be further improved by using efficient algorithms for the transportation problem. For example, Atkinson and Vaidya [5] presented an $O(n^{2.5} \log n \log W)$ algorithm for solving the transportation problem, where $W$ is the magnitude of the largest supply or demand in the EMD formulation and $n$ is the total number of nodes in $G_1$ and $G_2$.

5.3 Conclusions

After reducing the many-to-many feature matching problem into the point matching

problem, we use one existing framework, Earth Mover’s Distance, to find many-to-many

point correspondences in a vector space. Recall that pairwise distances between points in

the vector space reflect the shortest path distances between their corresponding nodes in

the original graphs. As mentioned in the previous chapters, we create local histograms


to encode directed relations in the original graphs. Given a point in the first embedding,

we use its local histograms and geometric coordinates in the EMD approach to locate its

corresponding point(s) from the second embedding.

During the matching process, it is important to consider the possibility that one point

set may undergo a transformation with respect to the other. To handle this, we use the EMD

under transformation to further minimize the objective function of the EMD formulation.

An important property of the EMD approach is that it subsumes many histogram dis-

tances and permits partial matches. This property is particularly useful when the total

weights (masses) of two distributions are not equal. In the experimental sections of the

thesis, we will use this property to locate a query object in a scene.

To experimentally verify that the many-to-many point correspondences reflect mean-

ingful many-to-many feature (node) correspondences in the original graphs and also to

show that the similarity score between point sets can be used as the similarity of input

graphs, we will perform a set of recognition and matching experiments on different do-

mains in the following chapters.


6. View-Based 3-D Object Recognition

To demonstrate the effectiveness of our many-to-many matching framework, we apply it to the problem of view-based 3-D object recognition using two different graph-based shape representations: silhouettes and ridge-and-blob decomposition graphs. In addition, in this chapter we compare our matching results to two leading graph matching algorithms: a one-to-one matching algorithm proposed by Pelillo et al. [76] (using association graphs) and a many-to-many matching algorithm proposed by Sebastian et al. [87] (using graph-edit distance).

6.1 Many-to-Many Matching using Silhouettes

We first turn to the domain of view-based object recognition using silhouettes. For a

given view, an object’s silhouette is first represented by an undirected, rooted, weighted

graph, in which nodes represent shocks [99] (or, equivalently, skeleton points) and edges

connect adjacent shock points. Note that this representation is closely related to Siddiqi et

al.’s shock graph [99], except that our nodes (shock points) are neither clustered nor are

our edges directed. We will assume that each point $p$ on the discrete skeleton is labeled by a 4-dimensional vector $v(p) = (x, y, r, \alpha)$, where $(x, y)$ are the Euclidean coordinates of the point, $r$ is the radius of the maximal bi-tangent circle centered at the point, and $\alpha$ is the angle between the normal to either bitangent and the linear approximation to the skeleton curve at the point.¹ This 4-tuple can be thought of as encoding local shape information of the silhouette.

¹ Note that this 4-tuple is slightly different from Siddiqi et al.’s shock point 4-tuple, where the latter’s radius is assumed normal to the axis.

Skeletons with many points lead to graphs with many nodes. To reduce the size of the

graph, we first subdivide the skeleton into a number of small fragments of approximately 5

shock points each. Since the fragments are small, we can compute well-defined vector (4-

tuple) averages over the fragments. These averages become the labels of the corresponding

graph nodes. We define the distance between two nodes as the Euclidean distance between

their vector labels. For those pairs of nodes that correspond to adjacent skeleton fragments,

we define an edge whose weight is defined by the Euclidean distance between the pair. We

should mention here that the fragment size was chosen arbitrarily, and we expect that other

choices of similar magnitudes will work equally well.

To convert our shock graphs to shock trees, we compute the minimum spanning tree of the weighted shock graph. Since the edges of the shock graph are weighted based on Euclidean distances of corresponding nodes, the minimum spanning tree yields a suitable tree approximation of the shock graph. The root of the tree is the node that minimizes the sum of distances to all other nodes. Finally, each node is weighted proportionally to its average radius, with the total tree weight being 1. An illustration of the procedure was given in Figure 1.5 and is shown again in Figure 6.1. The left portion shows the initial

silhouette and its shock points (skeleton). The right portion depicts the constructed shock

tree. Darker, heavier nodes correspond to fragments whose average radii are larger.
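The construction of node labels, node weights, and the spanning tree can be sketched as follows. This is a simplified illustration under several assumptions: the skeleton is given as a single ordered list of shock points $(x, y, r, \alpha)$ (real skeletons branch), fragment adjacency is supplied explicitly as index pairs, Prim's algorithm is used for the minimum spanning tree, and all function names are hypothetical. Root selection is not covered.

```python
import math


def fragment_labels(skeleton, size=5):
    """Average consecutive runs of `size` shock points (x, y, r, alpha)."""
    labels = []
    for start in range(0, len(skeleton), size):
        chunk = skeleton[start:start + size]
        labels.append(tuple(sum(p[k] for p in chunk) / len(chunk)
                            for k in range(4)))
    return labels


def node_weights(labels):
    """Weights proportional to the average radius, normalised to sum to 1."""
    total = sum(lab[2] for lab in labels)
    return [lab[2] / total for lab in labels]


def minimum_spanning_tree(labels, edges):
    """Prim's algorithm. `edges` lists (i, j) pairs of adjacent fragments;
    an edge weighs the Euclidean distance between the two 4-tuple labels."""
    adj = {i: set() for i in range(len(labels))}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    in_tree, tree = {0}, []
    while len(in_tree) < len(labels):
        best = min(((math.dist(labels[i], labels[j]), i, j)
                    for i in in_tree for j in adj[i] if j not in in_tree),
                   default=None)
        if best is None:          # graph is disconnected; stop early
            break
        tree.append((best[1], best[2]))
        in_tree.add(best[2])
    return tree


# Hypothetical skeleton: ten collinear shock points, radius 1 then radius 3.
skeleton = ([(float(i), 0.0, 1.0, 0.0) for i in range(5)]
            + [(float(i), 0.0, 3.0, 0.0) for i in range(5, 10)])
labels = fragment_labels(skeleton)
print(labels)                # [(2.0, 0.0, 1.0, 0.0), (7.0, 0.0, 3.0, 0.0)]
print(node_weights(labels))  # [0.25, 0.75]
```

The second fragment receives three times the weight of the first because its average radius is three times larger, matching the description of darker, heavier nodes for fragments with larger radii.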

We tested our many-to-many matching algorithm on a database of 1620 silhouettes

of 9 objects, with 180 views per object. A representative view of each object is shown

in Figure 6.2. For the experiments, we compute the shock tree representation of every silhouette, and embed each tree into a normed space with low distortion. This procedure results in a database of weighted point-sets, each representing an embedded graph.

Figure 6.1: Left: the silhouette and its medial axis. Right: the medial axis tree constructed from the medial axis. Darker nodes reflect larger radii.

Figure 6.2: Sample views of the 9 objects.

To test our approach, we randomly selected 19 equidistant views of each object and

computed distances between these views and each of the remaining database entries (the

distance between a view and itself is always zero). To compute the distance between ob-

jects A and B, for one view of object A, we first find the sum of its total distances to object

B. After repeating this process for the other views of object A, we compute the average

distance between them. These object distances are summarized in Table 1, Figure 6.3. The

magnitudes of the distances are denoted by shades of gray, with black and white repre-

senting the smallest and largest distance, respectively. Due to symmetry of the resulting

distances, we only included the upper triangle of results. Intra-object distances, shown

along the main diagonal, are very close to zero. According to the table, inter-object distances were near intra-object distances in only 3 out of 36 cases (BINOCULAR and CLOCK, CAMERA and PHONE, and CAR and TEAPOT).

To better understand the differences in the recognition rates for different objects, we

have selected a subset of the matching results among the 4 views of TEAPOT, taken at

20°, 30°, 60°, and 90°, respectively, as shown in Table 2. Due to the highly symmetric

structure of the object, implying that neighboring views are more likely to be similar, the

distance between a view of TEAPOT and its neighboring view is closer than its distance

to other objects’ views. Conversely, Table 3 illustrates the fact that due to a low view

sampling resolution, certain views of certain objects are more similar to certain views of

other objects than they are to neighboring views of the same object. For example, the best

(non-identical) match for the third view of CUP is the first view of PHONE. Upon closer

inspection of these two degenerate views, it turns out that there is considerable similarity

in their shock tree representations. On the other hand, the first two views of CUP have been

optimally matched to each other, along with the last two views of PHONE.

Figure 6.4 illustrates the many-to-many correspondences that our matching algorithm

yields for two adjacent views (30° and 40°) of the TEAPOT. Corresponding clusters (many-to-many

mappings) have been shaded with the same color. Note that the extraneous branch

in the left view was not matched in the right view, reflecting the method’s ability to deal

with noise. More examples showing that the many-to-many feature matching results in an

intuitive pairing of shock segments are presented in Figure 6.5.

Based on the overall matching statistics, we observed that in 5.74% of the experiments,

the closest match selected by our algorithm was not a neighboring view of the correct ob-

ject. We expect that with increased view sampling resolution, ensuring that for each object

Figure 6.3: Summary of many-to-many matchings of object silhouettes. Every entry of Table 1 corresponds to a set of 19 × 19 matching results between the views of the two objects associated with the row and the column. The shade of gray in each cell denotes the average matching distance of each 19 × 19 block, with black and white representing the smallest and largest distances, respectively. Table 2 shows a close-up look at the matching results for four views of TEAPOT. Table 3 depicts a subset of results from three separate blocks.

Figure 6.4: Illustration of the many-to-many correspondences computed for two adjacent views of the TEAPOT. Matched point clusters are shaded with the same color.

view there exists a similar neighboring view, this error rate would decrease significantly.

We repeated the experiment using the spherical embedding, resulting in a 4.9% error rate.

This is a clear improvement in performance, at a reduced computational cost.

It should be noted that both the embedding and matching procedures can accommodate

perturbation, such as noise and occlusion. This is due to the fact that the path partitions

for unperturbed portions of the graph are unaffected by perturbation. Moreover, the projec-

tions of unperturbed nodes will also be unaffected by perturbation. Finally, the matching

procedure is an iterative process driven by flow optimization which, in turn, depends only

on local features, whose local attributes can act as matching constraints.

Figure 6.5: The result of matching skeleton graphs for some shapes in the Rutgers Tools Database. Same colors indicate corresponding segments. Observe that the correspondence is intuitive in all cases.

To test the sensitivity of the matching algorithm to perturbation of the query, we per-

formed the following experiment for each of the 9 objects. Each view, in turn, was used

as a query (with replacement) and perturbed by deleting a randomly selected connected

subset of the skeleton points whose size was chosen randomly to fall between 5% and 25%

of the total number of skeleton points. If the closest view to the query was the unperturbed

view, matching was scored as correct. Using Matousek's embedding, the average correct score

over the 9 objects was 89%, reflecting the algorithm's stability to missing data, a form of

occlusion. For the spherical embedding, we observed an average correct score of 91.4%.
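
The query perturbation used in this experiment can be sketched as follows. The sketch assumes the skeleton is given as an undirected adjacency dictionary and grows the deleted region by breadth-first search so that it stays connected; the function name, the data layout, and the growth strategy are illustrative assumptions, not the implementation used for the reported results.

```python
import random
from collections import deque


def delete_connected_subset(adjacency, rng, low=0.05, high=0.25):
    """Remove a random connected set of nodes from a graph.

    adjacency: dict mapping node -> set of neighbouring nodes (undirected).
    Returns (perturbed_adjacency, removed_nodes).
    """
    nodes = sorted(adjacency)
    # Target size drawn uniformly between 5% and 25% of the points.
    target = max(1, round(rng.uniform(low, high) * len(nodes)))
    seed = rng.choice(nodes)
    removed, frontier = {seed}, deque([seed])
    # Breadth-first growth keeps the removed set connected.
    while frontier and len(removed) < target:
        node = frontier.popleft()
        for nb in sorted(adjacency[node]):
            if nb not in removed and len(removed) < target:
                removed.add(nb)
                frontier.append(nb)
    # Drop the removed nodes and every edge that touched them.
    kept = {n: adjacency[n] - removed for n in nodes if n not in removed}
    return kept, removed


# Toy skeleton: a path graph with 40 points.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 40} for i in range(40)}
perturbed, removed = delete_connected_subset(path, random.Random(0))
```

Passing an explicit `random.Random` instance keeps a perturbation reproducible, which is convenient when the same perturbed queries are matched under both embeddings.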

6.2 Many-to-Many Matching using Ridge-and-Blob Decomposition Graphs

We now turn to the domain of blob graphs, a brief overview of which was presented

in Chapter 4. Let us first return to the example shown in Figure 1.1, where we observed

the need for many-to-many matching. The results of applying our method to these two

images are shown in Figure 6.6, in which many-to-many feature correspondences have

been colored the same. For example, a set of blobs and ridges describing a finger in the left

image is mapped to a set of blobs and ridges on the corresponding finger in the right image.

To provide a more comprehensive evaluation, we tested our framework on two separate

image libraries, the Columbia University COIL-20 (20 objects, 72 views per object) and

the ETH Zurich ETH-80 (8 categories, 10 exemplars per category, 41 views per exemplar).

A representative view of each object is shown in Figure 6.7. For each view, we compute a

multi-scale blob decomposition, using the algorithm described in [95]. Next, we compute

the tree metric corresponding to the complete edge-weighted graph defined on the regions

of the scale-space decomposition of the view. The edge weights are computed as a function

Figure 6.6: Applying our algorithm to the images in Figure 1.1. Many-to-many feature correspondences have been colored the same.

of the distances between the centroids of the regions in the scale-space representation. Fi-

nally, each tree is embedded into a normed space of prescribed dimension. This procedure

results in two databases of weighted point sets, each point set representing an embedded

graph.

For the COIL-20 database, we begin by removing 36 (of the 72) representative views

of each object (every other view), and use these removed views as queries to the remaining

view database (the other 36 views for each of the 20 objects). We then compute the distance

between each “query” view and each of the remaining database views, using our proposed

matching algorithm. Ideally, for any given query view i of object j, $v_{i:j}$, the matching

algorithm should return either $v_{i-1:j}$ or $v_{i+1:j}$ as the closest view. We will classify this as a

correct matching. Figure 6.8 presents a subset of the matching experiments for object 9 of

the COIL-20 database, with a correct matching in almost all cases.
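
The correctness criterion above can be written down compactly. In this sketch a view is identified by a hypothetical `(object_id, view_index)` pair, and the view indices are assumed to wrap around (the 72 views are taken to cover a full turn of the object), which is an assumption the text does not state explicitly.

```python
def closest_view(distances):
    """distances: dict mapping (object_id, view_index) -> distance to the query."""
    return min(distances, key=distances.get)


def is_correct_match(query, match, views_per_object=72):
    """True if `match` is view i-1 or i+1 of the same object as `query`.

    View indices wrap around (assumption: the views cover a full turn).
    """
    (q_obj, q_idx), (m_obj, m_idx) = query, match
    if q_obj != m_obj:
        return False
    return (m_idx - q_idx) % views_per_object in (1, views_per_object - 1)


# Toy example: query view 5 of object 9 against three database views.
toy_distances = {(9, 4): 3.1, (9, 6): 2.7, (3, 5): 9.8}
best = closest_view(toy_distances)
correct = is_correct_match((9, 5), best)
```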

Based on the overall matching statistics, we observe that in all but 4.8% of the experiments,

the closest match selected by our algorithm was a neighboring view. Moreover,

Figure 6.7: Views of sample objects from the Columbia University Image Library (COIL-20) and the ETH Zurich (ETH-80) Image Set.

among the mismatches, the closest view belonged to the same object in 81.02% of the

cases. In comparison, Matousek’s embedding yielded a 10.74% matching error where,

among the mismatches, the closest view belonged to the same object in 80.0% of the cases.

Figure 6.9 presents the result of this experiment, with darker points representing the closer

matches.

For the ETH-80 database, we chose a subset of 32 objects (4 from each of the 8 cate-

gories) with full sampling (41 views) per object. For each object, we removed each of its

41 views from the database, one view at a time, and used the removed view as a query to

the remaining view database. We then computed the distance between each query view and

each of the remaining database views. The criterion for correct classification was similar

to that of the COIL-20 experiment. Our experiments showed that in all but 6.2% of the experiments,

the closest match selected by our algorithm was a neighboring view. Among the

Query (rows) versus Model (columns)

2.70 7.05 6.54 7.78 10.37 5.89 13.41 12.30 20.34 13.90 19.60 19.53

4.14 5.56 5.18 7.98 10.30 4.24 12.34 11.23 20.01 12.24 18.40 17.73

6.39 2.34 2.68 4.17 5.97 5.94 17.03 15.87 25.28 16.74 22.99 22.17

6.07 4.04 4.04 3.17 4.44 6.64 17.26 15.92 25.82 17.17 23.53 22.80

7.27 6.39 6.55 5.31 3.88 8.26 18.20 16.88 26.74 17.85 24.57 23.81

5.31 4.20 5.25 5.67 3.21 5.63 17.08 15.86 25.20 17.07 23.49 22.79

9.61 11.65 11.21 13.81 16.00 6.80 7.07 8.20 14.92 9.05 13.74 14.65

13.64 15.32 14.85 17.35 19.28 11.80 2.69 3.70 14.20 6.75 10.93 12.19

14.34 16.03 15.23 17.92 19.90 12.21 5.28 3.54 14.61 4.61 8.96 10.33

13.50 14.90 14.41 17.39 19.32 11.44 6.56 4.13 15.00 5.25 8.97 9.98

17.16 18.97 18.34 21.28 23.11 15.70 7.95 7.85 13.52 4.23 4.73 6.17

20.53 22.30 21.25 24.17 26.18 18.77 11.46 11.59 14.48 7.14 3.02 2.75

20.19 20.90 19.92 22.89 24.87 18.18 12.19 12.27 14.91 7.94 6.53 3.24

Figure 6.8: Sample matching results for object 9 of the COIL-20 database, in which rows and columns can be interleaved to form the set of sequential views. The diagonal and next lower diagonal therefore represent the neighboring views of the query (row). Only one query, entry (10,8), was incorrectly matched.

mismatches, the closest view belonged to the same object in 77.19% of the cases, and the

same category in 96.27% of the cases. For Matousek's embedding, in all but 17.5% of

the experiments, the closest match was a neighboring view; among the mismatches, the closest view belonged to the correct object in 67.4% of the cases, and

Figure 6.9: The matching results for the COIL-20 database. The rows represent the query views (36 views per object), and the columns represent the model views (36 views per object). Each row represents the matching results for a query view against the whole database. The intensity of entries represents the quality of the matching, with black representing maximum similarity between the views and white minimum similarity.

PERTURBATION                5%        10%       15%       20%
RECOGNITION RATE COIL-20    91.07%    88.13%    83.68%    77.72%
RECOGNITION RATE ETH-80     93.2%     90.1%     86.3%     82.2%

Table 6.1: Recognition rate as a function of increasing perturbation. Note that the baseline recognition rate (with no perturbation) is 98.0% for the COIL-20 and 98.5% for the ETH-80 datasets.

the same category in 81.3% of the cases. The results clearly demonstrate the improved

performance offered by the spherical embedding technique.

To demonstrate the framework’s robustness, we performed four perturbation experi-

ments on the COIL-20 and ETH-80 databases. The experiments are identical to the COIL-

20 and ETH-80 experiments described above, except that the query graph was perturbed

by adding/deleting 5%, 10%, 15%, and 20% of its nodes (and their adjoining edges). The

choice of spherical embedding was motivated by its better performance over that of Ma-

tousek’s embedding. The results are shown in Table 6.1, and reveal that, like our skeleton

tree matching example, the error rates increase gracefully as a function of increased pertur-

bation.

It should be pointed out that both skeleton tree and blob graph experiments can be con-

sidered worst case for two reasons. First, the sampling resolutions of the viewing sphere

were high in each case, meaning that more than the immediate neighbors of a particular

view may be similar to it. Given the high similarity among neighboring views, it could be

argued that our matching criterion is overly harsh, and that perhaps a measure of “view-

point distance”, i.e., “how many views away was the closest match” would be less severe.

In any case, we anticipate that with fewer samples per object, neighboring views would be

more dissimilar, and our matching results would improve. Second, and perhaps more im-

portantly, many of the objects are symmetric, and if a query neighbor has an identical view

elsewhere on the object, that view might be chosen (with equal distance) and scored as an

error. Many of the objects in the database are rotationally symmetric, yielding identical

views from each viewpoint.

6.3 Comparison to Other Approaches

In addition to demonstrating the effectiveness of our many-to-many matching algorithm

applied to shape retrieval, we compare our matching results to two leading graph matching

algorithms: a one-to-one matching algorithm proposed by Pelillo et al. [76] (using asso-

ciation graphs) and a many-to-many matching algorithm proposed by Sebastian et al. [87]

(using graph-edit distance). For the comparison, we use the Rutgers Tool Database [99],

which consists of 25 shapes organized into eight classes: brush, hammer, pliers, screw-

driver, wrench, hand, profile, and horse. Four of these classes, namely, hammer, pliers,

screwdriver, and wrench, can be further grouped into a broader “tools” category. Sam-

ple views from each class are shown in Figure 6.10. In the experiment, we remove the

first shape (the query) from the database and compare it to all remaining database shapes.

The shape is then put back in the database, and the procedure is repeated with the second

database shape, etc., until all 25 shapes have been used as a query. After computing the

similarity values between every database pair, we look at the top matches to see how many

of the within-category shapes belong to the same class as the query. Ideally, if an object

has n shapes in the database, the top n - 1 entries should belong to the same class as the

query.
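
The retrieval criterion above can be sketched as a simple count over the ranked list. The function name and the toy ranking are hypothetical; they only illustrate how the top n - 1 entries are scored for a class with n shapes.

```python
def within_class_hits(ranked_labels, query_label, class_size):
    """Count same-class shapes among the top (class_size - 1) matches.

    ranked_labels: class labels of the database shapes, most similar
    first, with the query itself already removed.
    """
    top = ranked_labels[: class_size - 1]
    return sum(1 for label in top if label == query_label)


# Toy ranking for a "hammer" query whose class has 4 shapes in total.
toy_ranking = ["hammer", "hammer", "pliers", "hammer", "brush"]
hits = within_class_hits(toy_ranking, "hammer", class_size=4)
ideal = 4 - 1
```

In this toy ranking two of the ideal three within-class slots are filled correctly, so the query contributes one error to the within-category count.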

Our results, along with those reported in [76] and [87], are presented in Figure 6.11,

Figure 6.10: Sample views of objects from the Rutgers Tools Database.

where correct matches retrieved from the database are colored yellow, while the mismatched

entries are colored red. Considering only the best matches, we observe that while in Pelillo

et al.’s shock tree approach there is a total of 3 mismatched entries, both Sebastian et al.’s

graph-edit distance framework and our approach yield only 1 mismatched entry. In addi-

tion, considering all within-category matches, both the shock tree and graph-edit distance

approaches yield a total of 5 errors, while our approach yields only 3 errors. Moreover, if

we further group the hammer, pliers, screwdriver, and wrench shapes into the same “tools”

category, our many-to-many matching approach produces a 100% correct matching, while

the other two approaches still have mismatched entries. One would expect that as the de-

gree to which correspondences are many-to-many increases, both the graph-edit distance

algorithm as well as our algorithm would yield improved scores.

Overall, the results show that our approach outperforms both the shock tree and

graph-edit distance approaches on the Rutgers Tools Database.

Figure 6.11: Comparison to two leading graph matching algorithms: Pelillo et al. [76] (left), Sebastian et al. [87] (center), and our algorithm (right). In each case, the top seven matched database objects are sorted by their similarity to the query. Correct matches are colored yellow, while mismatched entries are colored red.

6.4 Conclusions

In this chapter we experimentally verified that our matching framework yields mean-

ingful many-to-many feature correspondences between pairs of graphs representing 2D

shapes. The distance between graphs can be used as a dissimilarity measure between original

shapes. The effectiveness of the approach in the context of shape retrieval using two

different recognition domains was demonstrated. Since we presented two embedding al-

gorithms in Chapter 3, we tested our matching framework using each of these embedding

techniques. In both domains, the experimental results clearly demonstrate the improved

performance offered by the spherical embedding technique over that of Matousek. This

can be attributed to the fact that embedded graph nodes can be directly matched in the

target space without the need for a dimensionality reduction process. As mentioned in

Chapter 3, an important trade-off exists between the distortion of the spherical embedding

and the dimension of the target space. The value of the dimension affects the distortion of

the embedding as well as the recognition rate. However, we should note that the higher the

dimensionality of the target space, the longer it takes the EMD to solve the transportation

problem. For practical purposes, we set the value of the dimension to 40.

We also tested the robustness of the framework by a set of perturbation experiments

in which the query graph was perturbed by adding/deleting 5%, 10%, 15%, and 20% of

its nodes and adjacent edges. According to the results, error rates increase gracefully as a

function of increased perturbation, which, in turn, shows the ability of the framework to

accommodate perturbation.

In addition to the recognition tests, we also performed a set of pose estimation experi-

ments, where the objective was to retrieve one of the neighboring views of the query. The

results show that for a given query, in more than 93% of the experiments, the algorithm

selects a correct neighboring view.

After demonstrating the effectiveness of our many-to-many matching algorithm applied

to shape retrieval, we will extend our framework to work with different feature extraction

algorithms and graph types. Here, our objective is to show the matching potential of our

framework in two different domains: face recognition and 3D object retrieval. We will

also compare our results to some existing approaches presented for these domains in the

experimental sections of the following chapters.

7. Face Recognition Experiments

In this chapter we evaluate our framework on a set of face recognition experiments.

We begin by introducing a new feature extraction process and graph types. We then

show how we apply our matching framework to compute the similarity between pairs of

graphs of the new type. At the end of the chapter, we present the recognition performance

of our algorithm on a face database of 20 people with 10 faces per person for a total of

200 images. We also examine the stability of both the graph construction and matching

approaches in the experimental section.

7.1 Discrete Representation of Top Points via Scale Space Tessellation

Top points (singular points in the scale space representation

of generic images) have proven to be valuable sparse image descriptors that can be used

for image reconstruction [54, 73] and image matching [55, 78]. In this section, we take

an unstructured set of top points and impose a neighborhood structure on them. Inspired

by the work of Lifshitz and Pizer [59], we will encode the scale space structure of a set

of top points in a directed acyclic graph (DAG). Specifically, we combine the position-

based grouping of the top points provided by a Delaunay triangulation with the scale space

ordering of the top points to yield a directed acyclic graph. This new representation allows

us to utilize powerful graph matching algorithms to compare images represented in terms

of top point configurations, rather than using point matching algorithms to compare sets of

isolated top points. Specifically, we draw on our work in many-to-many graph matching

which reduces the matching problem to that of computing a distribution-based distance

measure between embeddings of labeled graphs.

We describe our construction by first elaborating on those basics of catastrophe theory

required to introduce the concept of a top point. Next, we formally define a top point,

and introduce a measure for its stability that will be later utilized in the matching algo-

rithm. Section 7.3 describes the construction of the DAG through a Delaunay triangulation

scheme. The details of this construction process can be found in [77].

7.2 Catastrophe Theory

Critical points are points at any fixed scale in which the gradient vanishes (∇u = 0).

The study of how these critical points change as certain control parameters change is called

catastrophe theory. A Morse critical point will move along a critical path when a control

parameter is continuously varied. In principle, the single control parameter in the models

considered here can be identified as the scale of the blurring filter. The only generic morsifications

in Gaussian scale space are creations and annihilations of pairs of Morse hypersaddles

of opposite Hessian signature¹ [26, 32]. An example of this is given in Figure 7.1.

The points at which creation and annihilation events take place are often referred to

as top points². A top point is a critical point at which the determinant of the Hessian

degenerates:

\[ \begin{cases} \nabla u = 0 \\ \det(H) = 0 \end{cases} \qquad (7.1) \]

¹ The Hessian signature is the sign of the determinant evaluated at the location of the critical point.

² The terminology is reminiscent of the 1D case, in which only annihilations occur generically.

Figure 7.1: The generic catastrophes in isotropic scale space (axes: space and scale). Left: an annihilation event. Right: a creation event. A positive charge (+) denotes an extremum, a negative charge (-) denotes a saddle, and 0 indicates the singular point.
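
As a worked illustration of the top point condition (7.1), consider the following standard fold-type example. It is not taken from the experiments in this thesis, and it assumes that the scale space image satisfies the heat equation $\partial_t u = \Delta u$ with $t$ playing the role of the scale parameter.

```latex
\begin{align*}
u(x, y; t) &= x^3 + 6xt + y^2 + 2t,
  & \partial_t u &= 6x + 2 = \Delta u, \\
\nabla u &= \bigl(3x^2 + 6t,\; 2y\bigr) = 0
  & &\Longrightarrow\quad y = 0,\quad x = \pm\sqrt{-2t} \quad (t \le 0), \\
H &= \begin{pmatrix} 6x & 0 \\ 0 & 2 \end{pmatrix},
  & \det H &= 12x = \pm 12\sqrt{-2t}.
\end{align*}
```

For $t < 0$ there are two critical points: a minimum at $x = +\sqrt{-2t}$ (where $\det H > 0$) and a saddle at $x = -\sqrt{-2t}$ (where $\det H < 0$), that is, a pair of opposite Hessian signature. They approach each other as $t$ increases and merge at $(x, y; t) = (0, 0; 0)$, where $\nabla u = 0$ and $\det H = 0$ hold simultaneously; this is the top point of an annihilation event, and no critical points remain for $t > 0$.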

An easy way to find these top points is by means of zero-crossings in scale space. This

involves derivatives up to second order and yields sub-pixel results. Other, more elaborate

methods can be used to find or refine the top point positions. For details, the reader is

referred to [32].

It is obvious that the positions of extrema at very fine scales are sensitive to noise. This,

in most cases, is not a problem. Most of these extrema are blurred away at coarse scales

and won’t affect our matching scheme. However, problems do arise in areas in the image

that consist of almost constant intensity (genericity implies that flat plateaus do not occur

in the image). One can imagine that the positions of the extrema (and thus the critical paths

and top points) are very sensitive to small perturbations in these areas. These unstable

critical paths and top points can continue up to very high scales since there is no structure

in the vicinity to interact with. To account for these unstable top points, we need to have a

measure of stability, so that we can either give unstable points a low weight in our matching

scheme, or disregard them completely.

7.3 Construction of the Graph

The goal of our construction is two-fold. First, we want to encode the neighborhood

structure of a set of points, explicitly relating nearby points to each other in a way that

is invariant to minor perturbations in point location. Moreover, when local neighborhood

structure does indeed change, it is essential that such changes not affect the encoded struc-

ture elsewhere in the graph (image). The Delaunay triangulation imposes a position-based

neighborhood structure with exactly these properties [79]. It represents a triangulation of

the points which is equivalent to the nerve of the cells in a Voronoi tessellation, i.e., that

triangulation of the convex hull of the points in the diagram in which every circumcircle

of a triangle is an empty circle [74]. The edge set of our resulting graph will be based on

the edges of the triangulation. Our second goal is to capture the scale space ordering of

the points to yield a directed acyclic graph, with coarser scale top points directed to nearby

finer scale top points.

A summary of this procedure is presented in Algorithm 7, and it is illustrated for a

simple image in Fig. 7.2. In the top two frames in the left figure, we show the transition

in the triangulation from v2 (point 2) to v3 (point 3); the root is shown as point 1. In the

upper right frame, the triangulation consists of three edges; correspondingly, G has three

edges: (1,2), (1,3), (2,3), where (x,y) denotes an edge directed from node x to node y. In

the lower left figure, point 4 is added to the triangulation, and the triangulation recomputed;

correspondingly, we add edges (1,4), (2,4), (3,4) to G (note that (1,2) is no longer in the

triangulation, but remains in G). Finally, in the lower right frame, point 5 is added, and

Figure 7.2: Visualization of the DAG construction algorithm. Left: the Delaunay triangulations at the scales of the nodes. Right: the resulting DAG (edge directions not shown).

the triangulation recomputed. The new edges in the triangulation yield new edges in G:

(2,5),(4,5),(1,5). The right side of Figure 7.2 illustrates the resulting graph (note that the

directions of the edges are not shown). In Figure 7.3 the right image shows the result of

applying this construction to the left image.
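
The incremental construction and the five-point example above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used in this work: each node keeps a fixed 2D position (the projection of earlier extrema to the scale of the current node is omitted), the Delaunay edges are found with a brute-force empty-circumcircle test that is only practical for small point sets, and the coordinates are hypothetical values chosen to reproduce the edge list of the example.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Brute-force Delaunay edges; points maps node id -> (x, y)."""
    ids = sorted(points)
    if len(ids) == 2:
        return {tuple(ids)}  # two points are joined by a single edge
    edges = set()
    for tri in combinations(ids, 3):
        circle = circumcircle(*(points[i] for i in tri))
        if circle is None:
            continue
        ux, uy, r2 = circle
        # Keep the triangle only if no other point lies inside its circumcircle.
        if all((points[k][0] - ux) ** 2 + (points[k][1] - uy) ** 2 >= r2 - 1e-9
               for k in ids if k not in tri):
            edges.update(combinations(tri, 2))
    return edges


def build_top_point_dag(positions):
    """positions[0] is the root v1; the rest are ordered from high to low scale."""
    dag = []
    for i in range(2, len(positions) + 1):
        current = {k: positions[k - 1] for k in range(1, i + 1)}
        # Store every triangulation edge incident to the new node v_i,
        # directed from the coarser node to v_i.
        dag.extend((u, v) for u, v in sorted(delaunay_edges(current)) if v == i)
    return dag


# Hypothetical coordinates chosen to reproduce the five-point example.
example = [(0, 0), (4, 0), (2, 3), (2, -1), (2, -3)]
dag_edges = build_top_point_dag(example)
```

Because edges are only ever added, an edge such as (1,2) stays in G even after a later insertion removes it from the triangulation, which matches the behaviour noted in the example.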

7.4 Experimental Results

We conduct our experiments using a subset of the Olivetti Research Laboratory face

database. The database consists of faces of 20 people with 10 faces per person, for a total

of 200 images; each image in the database is 112 × 92 pixels. The face images are in

frontal view and differ by various factors such as gender, facial expression, hair style, and

presence or absence of glasses. A representative view of each face and all 10 face images of

one person from the database are shown in Figure 7.4 and Figure 7.5, respectively. Our goal

is to evaluate our proposed many-to-many matching framework on a set of face recognition

Figure 7.3: The right image shows the DAG obtained from applying Algorithm 7 to the critical paths and top points of the face in the left.

Figure 7.4: Sample faces from 20 people.

experiments, where the objective is to select a correct face image belonging to the same

person as the query.

Fig. 7.6 presents an overview of the approach for these experiments. For a given face,

we first create its DAG according to Section 7.3 (Transition 1), and embed each vertex of

the DAG into a vector space of prescribed dimensionality using a deterministic spherical

coding (Transition 2). The choice of spherical coding is motivated by its better performance

over that of Matousek's embedding (see Chapter 6). Finally (Transition 3), we compute the distance

Algorithm 7 Top point graph construction procedure

1: Detect the critical paths.

2: Extract the top points from the critical paths.

3: Label the extremum path continuing up to infinity as v1.

4: Label the rest of the nodes (critical paths, together with their top points) according to the scale of their top points, from high scale to low, as v2, ..., vn.

5: For i = 2 to n, evaluate node vi:

6:     Project the previous extrema into the scale of the considered node vi.

7:     Calculate the 2D Delaunay triangulation of all the extrema at that scale.

8:     All connections to vi in the Delaunay triangulation are stored as directed edges in G.

between the two distributions by the modified Earth Mover's Distance under transformation.

The dimension of the target space in Transition 2 has a direct effect on the quality of

the embedding. Specifically, as the dimensionality of the target space increases, the quality

of the embedding will improve. As mentioned in Chapter 3, there exists an asymptotic

bound beyond which increasing the dimensionality will no longer improve the quality of

the embedding.

For the experiments, we first group the faces in the database by individual; these will

represent our categories. Next, we remove the first image (face) from each group and

compare it (the query) to all remaining database images. The image is then put back in

Figure 7.5: Ten face images of one person from the database.

Figure 7.6: Computing similarity between two given faces. (Matched point clusters are shaded with the same color.) See text.

the database, and the procedure is repeated with the second image from each group, etc.,

until all 10 face images of each of the 20 individuals have been used as a query. We say

the matching is correct if a query from one individual matches closest to another image

from the same individual, rather than an image from another individual. The results are

summarized in Table 1, Fig. 7.7. The magnitudes of the distances are denoted by shades

of gray, with black and white representing the smallest and largest distances, respectively.

Due to symmetry, only the lower half of the distance matrix is presented. Intra-object

distances, shown along the main diagonal, are very close to zero.
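
The leave-one-out protocol above amounts to a nearest-neighbor test on a matrix of pairwise distances. The following sketch assumes such a matrix is already available; the function name and the toy matrices are hypothetical and only illustrate the scoring rule.

```python
def leave_one_out_accuracy(dist, labels):
    """Fraction of queries whose closest other image shows the same person.

    dist: square matrix (list of lists) of pairwise distances between images.
    labels: labels[i] is the person shown in image i.
    """
    correct = 0
    for q in range(len(labels)):
        # Exclude the query itself, then take the closest remaining image.
        others = [j for j in range(len(labels)) if j != q]
        best = min(others, key=lambda j: dist[q][j])
        correct += labels[best] == labels[q]
    return correct / len(labels)


# Toy example: two people (A and B) with two images each.
toy_labels = ["A", "A", "B", "B"]
good = [[0, 1, 5, 6], [1, 0, 4, 5], [5, 4, 0, 2], [6, 5, 2, 0]]
bad = [[0, 1, 5, 6], [3, 0, 2, 5], [5, 4, 0, 2], [6, 5, 2, 0]]
acc_good = leave_one_out_accuracy(good, toy_labels)
acc_bad = leave_one_out_accuracy(bad, toy_labels)
```

In the second toy matrix the second image of person A is closest to an image of person B, so one of the four queries is scored as incorrect.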

To better understand the differences in the recognition rates for different people, we

randomly selected a subset of the matching results among three people in the database,

as shown in Table 2, Fig. 7.7. Here, the (i,j)-th entry shows the actual distance between

face i and face j. It is important to note that the distance between two faces of the same

person is smaller than that of different people, as is the case for all query faces. In our

Figure 7.7: Table 1: Matching results of 20 people. The rows represent the queries and the columns represent the database faces (query and database sets are non-intersecting). Each row represents the matching results for the set of 10 query faces corresponding to a single individual matched against the entire database. The intensity of the table entries indicates matching results, with black representing maximum similarity between two faces and white representing minimum similarity. Table 2: Subset of the matching results with the pairwise distances shown. Table 3: Effect of presence or absence of glasses in the matching for the same person.

experiments, one of our objectives was to see how various factors, such as the presence or

absence of glasses, affect the matching results for a single person. Accordingly, we took

a set of images from the database of one person, half with the same factor, and computed

the distances between each image pair. Our results show that images with the same factors

are more similar to each other than to others. Table 3 of Fig. 7.7 presents a subset of our

results. As can be seen from the table, images of the same person with glasses are more

similar than those of the same person with and without glasses. Still, in terms of categorical

matching, the closest face always belongs to the same person.

We also examine the stability of the proposed matching framework under additive Gaus-

sian noise at different signal levels applied to the original face images. For this experiment,

the database consists of the original 200 unperturbed images, while the query set consists

of noise-perturbed versions of the database images. Specifically, for each of the 200 im-

ages in the database, we create a set of query images by adding 1%, 2%, 4%, 8%, and

16% Gaussian noise. Figure 7.8 shows how an image looks after adding Gaussian noise

at different signal levels. Next, we compute the similarity between each query (perturbed

database image) and each image in the database, and score the trial as correct if its dis-

tance to the face from which it was perturbed is minimal across all database images. This

amounts to 40,000 similarity measurements for each noise level, for a total of 200,000 sim-

ilarity measurements. Our results show that the recognition rate decreases down to 96.5%,

93%, 87%, 83.5%, and 74% for 1%, 2%, 4%, 8%, and 16% of Gaussian noise, respectively

(see Table 7.1).
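
The noise experiment can be sketched as follows. The text does not define what "x% Gaussian noise" means, so this sketch assumes, purely for illustration, that the percentage is the noise standard deviation expressed as a fraction of the 8-bit intensity range; the function name and the constant gray test image are hypothetical. The last lines reproduce the count of similarity measurements stated above.

```python
import random


def add_gaussian_noise(image, percent, rng, max_value=255):
    """Return a noisy copy of a grayscale image (list of rows of ints).

    Assumption: `percent` is the noise standard deviation as a percentage
    of the intensity range; the source does not define it.
    """
    sigma = percent / 100.0 * max_value
    noisy = []
    for row in image:
        # Add zero-mean Gaussian noise and clip back to the valid range.
        noisy.append([
            min(max_value, max(0, round(p + rng.gauss(0.0, sigma))))
            for p in row
        ])
    return noisy


# A constant gray image with the database resolution (112 rows, 92 columns).
clean = [[128] * 92 for _ in range(112)]
noisy = add_gaussian_noise(clean, 16, random.Random(0))

noise_levels = [1, 2, 4, 8, 16]
num_images = 200
# Every perturbed query is compared with every database image.
comparisons_per_level = num_images * num_images
total_comparisons = comparisons_per_level * len(noise_levels)
```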

Figure 7.8: Sample face image after adding Gaussian noise at different signal levels. Part (a) shows the original image. Parts (b), (c), (d), (e), and (f) show how the image looks after adding 1%, 2%, 4%, 8%, and 16% of Gaussian noise, respectively.

GAUSSIAN NOISE      1%       2%       4%       8%       16%
RECOGNITION RATE    96.5%    93.0%    87.0%    83.5%    74.0%

Table 7.1: Recognition rate as a function of Gaussian noise at different signal levels.

7.5 Conclusions

In this chapter we first presented a method for imposing neighborhood structure on a

set of scale space top points. Drawing on the Delaunay triangulation of a set of points,

we generated a directed acyclic graph (DAG) whose edges were directed from top points at

coarser scales to nearby top points at finer scales. We then applied our matching framework

on the resulting DAGs to compute similarities between them. The approach was used in

face recognition for a database which consists of faces of 20 people with 10 faces per

person, for a total of 200 images. We computed an average similarity score between each

pair of people. Our experimental results show that the similarity score between a person

from the database and himself/herself is always greater than the similarities with the others.

In other words, using average pairwise similarity values our algorithm resulted in 100%

accuracy. One of our objectives in the experiments was to see how various factors, such

as the presence or absence of glasses, affect the matching results for a single person. Our

results showed that images with the same factors are more similar to each other than to

others. We also studied the stability of the overall recognition approach with respect to

additive Gaussian noise at different signal levels. Generally, the matching scores indicate

the robustness of the framework against increasing levels of noise.

Overall, the experimental results demonstrate the performance of our matching frame-

work in a face recognition domain using singular points in the scale space representation

of generic images (top points). In the next chapter we will adapt our matching approach to

work with a different shape representation in a different recognition domain. Specifically,

we will use our matching algorithm to retrieve 3D volumetric objects using their skeletal

representations in a database of 1081 objects.

8. 3D Object Retrieval using Many-to-Many Matching of Curve Skeletons

In this chapter, we will adapt our many-to-many matching framework to 3D object re-

trieval. The objects used in this work are volumetric and are represented as 3D skeletons.

We demonstrate the performance of the approach on a large database of 3D objects con-

taining more than 1000 exemplars. The method is especially suited to matching objects

with distinct part structure and is invariant to part articulation. Skeletal matching has an

intuitive quality that helps in defining the search and visualizing the results. In particular,

the matching algorithm produces a direct correspondence between two skeletons and their

parts, which can be used for registration and juxtaposition.

One important contribution of this study is to show that our matching framework

supports part matching. More specifically, our goal is to match a part within a complex

whole in 3-dimensional space. This type of matching is particularly useful for CAD-type

databases and also for recognition in laser-scanned images, which tend to cluster objects

together. It is also central to medical applications in which a particular biological configu-

ration is to be found somewhere in a larger object such as an organ.

8.1 Introduction

3D object models are now widespread and are used in many diverse applications, such

as computer graphics, scientific visualization, CAD, computer vision, medical imaging,

etc. Large databases of 3D models are publicly available, such as the Princeton Shape

Benchmark Database [94] or the 3D Cafe repository [1], with datasets contributed by the

CAD community, computer graphic artists, or scientific visualization community. Such

models include both polygonal representations (CAD objects, computer graphics imagery)

and volumetric data (medical images, scientific visualization datasets). The problem of

searching for a specific shape in a large database of 3D models is an important area of

research. Text descriptors associated with the 3D shapes can be used to drive the search

process, as is done for 2D images [40,80]. However, text descriptions may not be available

and, furthermore, cannot be used for part matching or similarity-based matching.

Matching 3D objects is a difficult problem, with a complex relation to the 2D shape-

matching problem. While the 3D nature of the representation helps to remove some of

the viewpoint, lighting, and occlusion problems in computer vision, other issues arise. Of

course, the added dimension and the inherent increase in data size make the matching pro-

cess more computationally expensive. Furthermore, many of the models are degenerate,

containing holes, intersecting polygons, overly thin regions, etc. And there are many dif-

ferent types of matching that may be desirable. Given a query object, one may want to

search an entire database for a matching exemplar, if one exists. On the other hand, if the

database contains categorical models, one may want to find the category to which the query

exemplar belongs.

In this chapter we use the skeleton of a 3D shape for matching. The skeleton used here

is a stick-like simplification of the 3D object, which preserves the main topological fea-

tures of the original object and provides information about the local structure in the form

of the distance between the skeleton point and the surface point. It is an intuitive shape

representation, which captures the notion of parts or components of an object. This allows

the user to understand the nature of the match and to influence the matching process by em-

phasizing or de-emphasizing certain features of the object. We demonstrate the efficiency

of the matching framework on a database of about 1100 examples. While the performance

of our algorithm is comparable to that of other existing 3D matching methods (e.g., [75,94]),

the locality of our skeletal representation and matching algorithm has some other benefits,

such as enabling part matching and articulated matching.

8.2 Approach

The main steps of the skeleton matching process are as follows. First, we determine the

curve skeleton of the object. An overview of this step is presented in Section 8.2.1. Next,

we match the exemplar skeleton against all other skeletons in the database. Finally, we

rank the results and visualize the best match. Details of the approach can be found in [22].

A skeleton is a useful shape abstraction that captures the essential topology of an ob-

ject in both two and three dimensions. It provides the following characteristics that are not

present in global shape descriptors.

Part/Component Matching: In contrast to a global shape measure, skeleton matching can

accommodate part matching, where the object to be matched is part of a larger object, or

vice versa. This feature can potentially give the user more control over the matching algo-

rithm, allowing them to specify what part of the object they would like to match or whether

the matching algorithm should weight one part of the object more than the rest.

Registration and visualization: The skeleton can be used to register the two matched objects

and visualize the result in a common space. This is very important in scientific applications

where one is interested in both finding a similar object and understanding the extent of the

similarity [101].

Figure 8.1: Some examples of 3D shapes and their computed skeletons.

Intuitiveness: The skeleton is an intuitive representation of shape and can be easily under-

stood by the user, providing more control in the matching process.

Articulated transformation invariance: The method presented here can be used for articulated

object matching because, within limits, the skeleton topology does not change as a

result of articulated motion. An example was shown in [101]. Note that most global shape

descriptors cannot accommodate such changes in object configuration.

8.2.1 The Curve-Skeleton

We utilize a curve skeleton for the matching. The curve skeleton is a concise represen-

tation of the object which is easy to understand and is used in many CAD and Computer

Graphics modeling programs. The curve skeleton is not unique in 3D and its determination

is based upon the application for which it is being used. A full description and explanation

can be found in [21].

Our curve-skeleton extraction algorithm works on a volumetric representation of the 3D

object. It is based on the method presented by Chuang et al. [18], which uses a generalized

Newtonian potential field generated by charges placed on the surface of the object to extract

a 1D curve-skeleton from a 3D shape. The generalized potential at a point due to a nearby

point charge is defined as a repulsive force, pushing the point away from the charge with a

strength that is inversely proportional to some power of the distance between the point and

the charge. This step produces a vector field.
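As an illustration of the repulsive force described above, the following minimal sketch sums the contributions of a set of point charges at a single point. Unit charges, the function name `repulsive_force`, and the default exponent are assumptions made for this example; they are not taken from [18] or [21].

```python
import math

def repulsive_force(point, charges, power=2):
    """Sum of repulsive forces exerted on `point` by unit point charges.

    Each charge pushes the point away with magnitude 1 / r**power,
    where r is the point-to-charge distance (unit charges are an assumption).
    """
    fx = fy = fz = 0.0
    for cx, cy, cz in charges:
        dx, dy, dz = point[0] - cx, point[1] - cy, point[2] - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # the point coincides with a charge; skip it
        # magnitude 1 / r**power times the unit direction (d / r)
        scale = 1.0 / r ** (power + 1)
        fx += dx * scale
        fy += dy * scale
        fz += dz * scale
    return (fx, fy, fz)

# Two charges placed symmetrically about the origin cancel each other there.
print(repulsive_force((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

Evaluating this function at every object voxel, with the charges placed on the object surface, would give a discrete vector field of the kind used in the rest of this section.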

Given a 3D vector field, we use concepts from vector field visualization to identify two

types of seed points that we will use to construct a curve-skeleton: critical points and high

divergence points. At critical points, the magnitude of the vector vanishes, which is why

they are also called zeros of the vector field. A full discussion of the visualization of vector-

field topology and the different types of critical points can be found in [36] and [46]. In

addition to critical points, we also use the divergence of the vector field to select new seed

points. We compute the divergence at each voxel inside the object and the user specifies

the percentage of the highest divergence points that will be used as seeds [21]. By varying

this parameter, one can generate an entire hierarchy of skeletons of various complexities

and select the best one for a given application. In the experiments presented in Section 8.3,

we used 40% of the highest divergence points as seeds for all our skeletons.
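The seed selection step can be sketched as follows. This is only an illustration under stated assumptions: the field is stored as a dictionary from voxel indices to vectors, the divergence is estimated by central differences, and voxels are ranked by the magnitude of the divergence. The sign convention and the exact ranking used in [21] may differ, and the names `divergence` and `select_seeds` are hypothetical.

```python
def divergence(field, voxel):
    """Central-difference divergence at `voxel`; None if a neighbour is missing."""
    i, j, k = voxel
    try:
        dvx = (field[(i + 1, j, k)][0] - field[(i - 1, j, k)][0]) / 2.0
        dvy = (field[(i, j + 1, k)][1] - field[(i, j - 1, k)][1]) / 2.0
        dvz = (field[(i, j, k + 1)][2] - field[(i, j, k - 1)][2]) / 2.0
    except KeyError:
        return None  # boundary voxel: not all six neighbours exist
    return dvx + dvy + dvz

def select_seeds(field, percentage):
    """Return the `percentage` percent of voxels with the largest |divergence|."""
    scored = []
    for voxel in field:
        d = divergence(field, voxel)
        if d is not None:
            scored.append((abs(d), voxel))
    scored.sort(reverse=True)  # strongest divergence first
    keep = int(round(len(scored) * percentage / 100.0))
    return [voxel for _, voxel in scored[:keep]]
```

Calling `select_seeds(field, 40)` corresponds to the 40% setting mentioned above; lowering the percentage yields fewer seeds and therefore a coarser skeleton.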

Skeleton segments are discovered using a force-following algorithm on the underlying

vector field, starting at each of the identified seed points. The force following process

evaluates the vector (force) value at the current point in the vector field and moves in the

direction of the vector with a small pre-defined step. For more details of this procedure,

see [21]. Figure 8.1 shows a few examples of 3D objects and their respective skeletons.
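A force-following trace of this kind can be sketched in a few lines. The sketch assumes a callable `force_at` (a hypothetical name) that returns the force vector at an arbitrary point, a fixed step length, and a stopping rule based on a vanishing force; the actual procedure in [21] may use different termination criteria.

```python
import math

def follow_force(seed, force_at, step=0.1, max_steps=1000, eps=1e-6):
    """Trace a path from `seed` by repeatedly stepping along the force direction.

    `force_at` is a hypothetical callable returning the force vector at a point.
    Tracing stops where the force (nearly) vanishes or after `max_steps` steps.
    """
    point = tuple(seed)
    path = [point]
    for _ in range(max_steps):
        f = force_at(point)
        norm = math.sqrt(sum(c * c for c in f))
        if norm < eps:
            break  # reached a critical point of the field
        # move a fixed distance `step` in the direction of the force
        point = tuple(p + step * c / norm for p, c in zip(point, f))
        path.append(point)
    return path
```

Running `follow_force` from every seed point and collecting the resulting paths gives the sampled skeleton segments.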

To summarize, the algorithm starts by computing the generalized potential function at each

object voxel, producing a 3D vector field. Next, the critical and high divergence points of the

vector field are detected and used as seeds for skeleton segments. Finally, the curve-skeleton

is extracted using the force-following algorithm initiated at every seed point.

Figure 8.2: Computing similarity between two given objects.

The skeleton obtained using the above algorithm consists of a set of points sampled

by the force following algorithm. Each skeleton point is then equipped with a distance-

transform value [33], a real number specifying the distance to the closest point on the

surface of the object. This additional information is used by the many-to-many matching

process.

Figure 8.2 shows an example of matching between two objects: in step 1, the curve-

skeleton for each object is computed while in step 2, the many-to-many matching estab-

lishes the distance and the correspondence between the two skeletal representations. The

skeleton regions that were matched to each other are shown in the same color in Figure

8.2.

8.3 Experimental Results

To evaluate the utility of our skeletal representation and many-to-many matching algorithm,

we performed two sets of experiments: 3D base classification and part matching.

8.3.1 Base Classification and Object Retrieval

We first tested our proposed approach to retrieving similar objects on a subset of 1081

objects from the Princeton Shape Benchmark Database [94], grouped into 99 non-empty

classes from both the test and train classifications [94]. In our experiments, we first created

3D skeletons for each object. We used 40% of the highest divergence points as seeds for all

our skeletons. We then computed the distance from each object to the remaining database

entries using our many-to-many matching algorithm. If the conceptual classes correspond

to bodies which vary only in scale, or by articulated transformation, our algorithm should

return an object that belongs to the same class as the query. We will classify this as a

“correct matching”. Based on the overall matching statistics, we observe that in 71.1% of

the experiments, the overall best match selected by our algorithm belonged to the same

class as the query (also known as the nearest neighbor criterion [94]). In 74.3% of the

experiments, the best match belonged to the same parent class as that of the query.

In a second experiment, we asked how many of the models in the query’s class appear

within the top T − 1 matches, where T is the size of the query’s class (first tier [94]). This

number was 17.2%. Repeating the same experiment with the top 2 × T − 1

matches (second tier [94]) covers 22.7% of the members of the class.
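The nearest neighbor and tier criteria can be made concrete with a short sketch. It assumes a precomputed matrix `distances` of pairwise matching distances and a list `labels` of class names, both hypothetical inputs for this example. The number of inspected matches is passed in as `window`, so the caller decides which tier is evaluated.

```python
def retrieval_scores(distances, labels, query, window):
    """Nearest-neighbour hit and tier score for one query.

    distances[i][j] is the matching distance between objects i and j and
    labels[i] is the class of object i (both are assumed inputs). `window` is
    the number of top matches inspected, e.g. T - 1 for the first tier,
    where T is the size of the query's class.
    """
    others = [j for j in range(len(labels)) if j != query]
    ranked = sorted(others, key=lambda j: distances[query][j])
    class_size = sum(1 for lab in labels if lab == labels[query])
    nn_hit = labels[ranked[0]] == labels[query]
    found = sum(1 for j in ranked[:window] if labels[j] == labels[query])
    # fraction of the other class members retrieved within the window
    tier = found / (class_size - 1) if class_size > 1 else 0.0
    return nn_hit, tier
```

Averaging `nn_hit` and `tier` over all queries gives database-level figures of the kind reported above.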

Comparing these results with those reported by Shilane et al. [94] in Table 4 of their

work, it should be noted that our method outperforms all methods on the nearest neighbor

criterion, but does not do as well on the first and second tier criteria. This is evident in the

precision-recall plot in Figure 8.3. The precision-recall plot shows the relation between

recall (the ratio of models from the class of the query returned within the top N matches)

and the precision (the ratio of the top N matches that belong to the query class) [94]. Figure

8.3 shows the precision-recall plot averaged over all models and looking at the first 20 best

matches only.
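The two quantities plotted in a precision-recall curve follow directly from the definitions above. In this minimal sketch, `ranked_labels` is the list of class labels of the retrieved objects in order of increasing distance, with the query itself excluded. That exclusion, and hence the use of `class_size - 1` in the recall, is an assumption of the example.

```python
def precision_recall(ranked_labels, query_label, class_size, n):
    """Precision and recall after the top `n` matches (query itself excluded)."""
    hits = sum(1 for lab in ranked_labels[:n] if lab == query_label)
    precision = hits / n               # share of the top n that is correct
    recall = hits / (class_size - 1)   # share of the class retrieved so far
    return precision, recall
```

Evaluating the function for n = 1, ..., 20 and averaging over all queries would produce a curve of the kind shown in Figure 8.3.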

In Figure 8.4, we have presented the matching results for a small subset of objects.

The first column of each row shows the query object; the remaining elements of each row

represent the top 10 closest objects of the database determined by our matching algorithm.

Observe that in most cases, the closest object is an object from a similar class. In some

cases, while the algorithm has identified an object with similar structure as the best match,

it was still penalized for selecting an object from an incorrect category. The query object

(race car) in row one and its best-matched object are an example of such a case. Such cases

can be attributed to the particular hierarchy of categories used by the Princeton Shape

Benchmark Database [94]. When similarity of shape is desired, a method which relies

on shape would help retrieve objects not normally associated with the exemplar and not

typically categorized with it.

8.3.2 Part Matching

Matching of a part within a complex whole is useful for CAD-type databases and also

for recognition in laser-scanned images, which tend to cluster objects together. It is also

central to medical applications in which a particular biological configuration is to be found

somewhere in a larger object such as an organ. Specifically, given a part of an object as a

query, one attempts to locate objects containing similar subparts. Here, the difficulty lies

in the fact that none of the database objects contains an exact copy of the query.

An important aspect of the part matching approach is the computation of correspon-

dence between the matched objects. Our many-to-many matching algorithm provides a

direct correspondence between the skeleton points of the query object and the skeleton

points in the matched objects. This allows one to register the query part into the composite

object. Global shape descriptors perform poorly at this task because global information

cannot preserve local correspondences.

In our next experiment, we used a query part (a torso) and matched it against several

simple objects in the database, some containing the query part. Aside from the simple

objects in the database, we have created a number of composite objects obtained by a

union operation applied to two simple objects – the kind of composition one would expect

to encounter in laser-scanned scenes. The query objects and some of the database objects

together with distance values computed by our matching algorithm are shown in Figure 8.5.

For every database object, we also show its corresponding parts with the query object in

Figure 8.6.

8.4 Conclusions

In this chapter we applied our matching framework to 3D object retrieval using skeletal

representations of volumetric objects. We demonstrated the performance of the method

on a database of over 1000 objects, with retrieval results comparable to the global shape

descriptor methods presented in [94].

The skeleton-based approach has a number of advantages over the global shape de-

scriptor methods. It is an intuitive representation of 3D objects that can be easily used to

understand the similarities present in the matched objects. Since our many-to-many match-

ing algorithm provides a direct correspondence between skeleton points in two matched

objects, one can use this correspondence for registration and juxtaposition. The skeleton

captures both global and local properties of the shape, so it can be used for many different

matching tasks.

One important contribution of this chapter is to show that our matching framework

supports part matching, where only a portion of the skeleton is matched. Part matching is

also useful in a CAD environment, where a user may be interested in retrieving objects that

contain a certain part or component. Laser-scanned scenes also tend to merge together all

elements in the environment; in this situation, part matching can be used for segmentation.

Our part matching examples showed that many-to-many matching can be used to locate

a part in a database of composite objects. The inverse problem is also of interest, where

given a composite object, one would like to identify its component parts among the objects

of a database. We will focus on this problem in the future.

Figure 8.3: Precision/Recall for many-to-many matching algorithm in object retrieval experiment.

Figure 8.4: Models are sorted by the similarity to the query object. In each row, the query is followed by its top 10 matched objects. The values printed in the figure for the four rows are:

Row 1: 2.4 17.9 18.0 20.4 20.5 20.9 21.0 21.1 21.8 21.9

Row 2: 1.5 10.4 12.8 14.0 14.2 14.7 14.8 15.3 15.4 15.5

Row 3: 25.8 30.4 35.5 36.2 38.1 43.7 44.1 44.8 44.9 45.3

Row 4: 1.3 34.3 34.7 35.3 35.5 35.9 39.8 40.2 40.5 40.6

Figure 8.5: Part Matching Example: computed distances between a query part (torso) versus several simple and composite objects. Distances shown in the figure: 42.0, 158.2, 189.5, 206.2, 212.9.

Figure 8.6: Correspondences in Part Matching: The query object in (a) is matched against each of the objects in (b). The correspondences between their skeletons are shown in red in (c).

9. Conclusions

9.1 Summary

There is a growing trend towards research in feature matching, often formulated as a

graph matching problem, whose goal is to establish node correspondences between pairs of

graphs. Depending on the way that these correspondences are established, graph matching

algorithms can be divided into two groups: one-to-one and many-to-many. Although pow-

erful, algorithms providing one-to-one feature correspondences suffer from the significant

limitation that one-to-one correspondences between graphs of similar objects must exist.

However, due to noise, segmentation or articulation errors, such correspondences may not

exist.

In this thesis we presented an efficient (polynomial time) novel matching algorithm that

established many-to-many correspondences between the nodes of two noisy, vertex-labeled

weighted graphs. To match two graphs, we began by constructing metric tree representa-

tions of the graphs. Next, we embedded them into a geometric space with low distortion

using a novel encoding of the graph’s vertices with the aim of preserving pairwise vertex

distances. While the distances could not be preserved exactly, they were approximated

with low distortion. We presented two low-distortion embedding algorithms, beginning

with one that was inspired by the general framework of Matousek [65]. In this algorithm,

the dimensionality of a graph’s embedding is a function of the graph. Specifically, the

number of paths in the caterpillar decomposition of the graph defines the dimension of the

target space. Two graphs to be matched may yield embeddings with different dimensionality,

requiring a projection step to bring them to the same space. We overcame this problem

by introducing a second embedding technique, using a novel spherical encoding of graph

structure, which embedded both graphs into a single space of prescribed dimensionality.

The second embedding algorithm is a deterministic variation of the embedding technique

presented in [43].

By embedding weighted graphs into normed vector spaces, we reduced the problem

of many-to-many graph matching to that of many-to-many geometric point matching, for

which the Earth Mover’s Distance algorithm is ideally suited. Moreover, by mapping a

node’s geometric and structural “context” in the graph to an attribute vector assigned to its

corresponding point, we extended the technique to deal with hierarchical graphs that repre-

sent multi-scale structure. The many-to-many point matching computed by the EMD yields

a set of many-to-many node correspondences between the original graphs. Despite the fact

that our framework was designed to establish many-to-many feature correspondences, it,

in fact, includes one-to-one matching as a special case.
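For reference, the transportation problem underlying the Earth Mover's Distance can be stated compactly. The formulation below is the standard one; the symbols (point sets P and Q with point masses w, ground distances d, and flows f) are notation assumed for this summary and may differ from the notation used in earlier chapters.

```latex
\begin{align*}
\mathrm{EMD}(P, Q) &= \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij}}
                           {\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}},
\quad \text{where } F = [f_{ij}] \text{ minimizes } \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij} \\
\text{subject to} \quad
& f_{ij} \ge 0, \qquad 1 \le i \le m,\ 1 \le j \le n, \\
& \sum_{j=1}^{n} f_{ij} \le w_{p_i}, \qquad 1 \le i \le m, \\
& \sum_{i=1}^{m} f_{ij} \le w_{q_j}, \qquad 1 \le j \le n, \\
& \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\Bigl( \sum_{i=1}^{m} w_{p_i},\ \sum_{j=1}^{n} w_{q_j} \Bigr).
\end{align*}
```

A point p_i whose mass is split across several points q_j (several nonzero f_ij in row i), or a point q_j that receives mass from several points p_i, gives rise to a many-to-many correspondence. When each row and column of F has a single nonzero entry, the flow reduces to a one-to-one matching.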

We evaluated the framework using each embedding technique on two different object

recognition domains: silhouettes and multi-scale ridge and blob decompositions. The ex-

perimental results demonstrated the effectiveness of the approach for finding many-to-many

feature correspondences. Given a query and a database of more than one thousand entries,

a more comprehensive evaluation of our framework for shape retrieval demonstrated the

ability of our approach to estimate correct pose and to select the correct object in more

than 93% and 97% of the cases, respectively. A set of perturbation experiments showed the

stability of the overall framework. We also compared our approach to two leading graph

matching algorithms and presented the recognition rates of each approach and their top

seven matches. Considering within-category matches, these comparison tests showed that

our framework resulted in better recognition rates than the others.

In addition to these experiments, we presented the applicability of the framework to

two other recognition domains: face recognition, and 3D object retrieval using skeletal

representations of 3D volumetric objects. These experiments also produced encouraging

results, showing the potential of the developed method in a variety of computer vision

and pattern recognition domains. The stability of the overall approach against increasing

levels of Gaussian noise and our preliminary part matching results were also presented in

these works. In 3D object retrieval experiments, our matching framework outperformed

all existing frameworks on a database of more than one thousand objects for the nearest

neighbor criterion.

9.2 Contributions

There are many contributions of this work that would be valuable to many fields of

computer vision. The specific contributions of the thesis are as follows:

1. We developed a novel framework for graph matching with many-to-many

node correspondences. Specifically, we showed that the many-to-many

graph matching problem could be reduced to that of many-to-many

point matching in vector space. This contribution is important

because this step enables us to transform an intractable problem in

graph space into a tractable one in vector space with some approximation.

2. We showed that the deterministic variation of the spherical embed-

ding method is a powerful technique that enables us to embed tree

metrics into a vector space of prescribed dimensionality. The main

advantage of this technique is due to the fact that the embedded

nodes can be matched directly without the need for a dimensionality

reduction process.

3. We showed that directed edge relations, such as hierarchies between

graph nodes, could be represented as node attributes. Encoding such

relations in node attributes allowed us to express the mass of each

node as a function of its local histograms, which, in turn, enabled

us to use the graph structure, while establishing correspondences.

More specifically, this process extended the technique to deal with

hierarchical graphs representing multi-scale structures.

4. Through a set of experiments, we showed that the many-to-many vector

mapping that realizes the minimum Earth Mover’s Distance corre-

sponds to the desired many-to-many matching between nodes of the

original graphs. In addition, shape retrieval experiments in various

computer vision domains showed that the distance computed by the

EMD could be used as a dissimilarity value between the original

graphs representing objects.

9.3 Discussion and Future Work

Our matching framework can be applied to any many-to-many graph matching problem,

whether the graphs are directed or undirected. Still, the approach has its limitations. Finding

meaningful feature correspondences depends on appropriate edge weights in the original

graphs. Since these edge weights ultimately govern the proximity of the embedded points

and hence their propensity to be combined during the EMD step, the edge weights

(distances) are effectively a perceptual grouping or abstraction heuristic between features.

If they are chosen or defined poorly, the EMD step may not converge on a meaningful

solution.

The EMD is a global distance that tries to account for all the points. Although we

showed in our experiments that the framework is robust to perturbation of the graphs in

terms of missing/spurious features, overall the method is global. If a graph includes a node

representing an occluder with large mass, its presence will have an adverse effect on the

computed flows, because the algorithm cannot selectively exclude the node. Note, however, that

if there are unique attributes shared by nodes to be matched, these attributes can act as

constraints on the EMD matching, ensuring that a pile of dirt with a particular “color” can

flow to holes of the same color.

It should also be noted that in an object recognition problem, the type of representation

used in describing the objects has a significant impact on both the correctness and effec-

tiveness of the recognition system. Hence, our recognition results show the quality of our

matching approach, as well as the feature extraction and representation methods used in

the framework.

The results of comparing our approach to existing frameworks are rather promising and

require further exploration of the matching algorithm. We will study the effectiveness of

our algorithm on much larger datasets and compare our results to additional leading matching

frameworks based on both one-to-one and many-to-many matchings. Although our algo-

rithm finds many-to-many matchings in polynomial time, it takes about one minute on

an Intel(R) Xeon(TM) CPU 1.50GHz computer to match two graphs having around 2000

nodes, which limits the number of graphs that can be practically matched. We plan to

improve the efficiency of the algorithm by optimizing the matching code and revising the

algorithm itself.

One way to revise the algorithm is to use a distance-preserving embedding algorithm

(isometric embedding) in the framework. For instance, one may embed tree metrics into l1

and compute the correspondences under this norm. Given tree metrics, this technique will

enable us to embed their nodes with no distortion. Thus, pairwise distances in the vector

space will be equal to those in the tree metrics. Another distance-preserving embedding

algorithm can be obtained by embedding metrics defined for input graphs into l∞. Since

any metric space can be embedded isometrically into l∞, pairwise distances in the target

space will be equal to the original ones. In both of these approaches, our goal will be to

solve for the transportation problem under each of the l1 and l∞ norms, while accounting

for the transformation.
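Both isometric embeddings mentioned above have simple textbook constructions, sketched here for illustration. A rooted weighted tree is embedded into l1 with one coordinate per edge, and an arbitrary finite metric is embedded into l∞ by mapping each point to its vector of distances to all points. The function names and the parent-pointer tree encoding are assumptions of this example, not part of the framework.

```python
def tree_to_l1(parent, weight):
    """Embed a rooted weighted tree isometrically into l1.

    parent[v] is the parent of node v (None for the root); weight[v] is the
    weight of the edge (parent[v], v). One coordinate is used per edge: a node's
    coordinate for edge e is weight(e) if e lies on its root path, else 0.
    """
    edges = [v for v in parent if parent[v] is not None]  # edge named by its child
    coords = {}
    for v in parent:
        on_path = set()
        u = v
        while parent[u] is not None:
            on_path.add(u)
            u = parent[u]
        coords[v] = [weight[e] if e in on_path else 0.0 for e in edges]
    return coords

def frechet_to_linf(dist):
    """Embed an n-point metric (distance matrix) isometrically into l_inf."""
    # point i is mapped to the vector (d(i, 0), ..., d(i, n-1))
    return [list(row) for row in dist]

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def linf(a, b):
    return max(abs(x - y) for x, y in zip(a, b))
```

In the tree case, the root paths of two nodes differ exactly on the edges of the path joining them, so the l1 distance equals the tree distance. In the l∞ case, the triangle inequality bounds every coordinate difference by d(i, j), and the coordinate indexed by j attains it. Note that both constructions use as many coordinates as there are edges or points, so the dimensionality grows with the size of the graph.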

Another way of revising the algorithm is to define a different metric distance

on given weighted graphs. Recall that only the shortest-path metric was used in the frame-

work to compute distances between vertices of the input graphs. Since finding meaningful

feature correspondences depends on appropriate edge weights in the original graphs, the

type of the metric distance will have a direct effect on the overall performance of the algo-

rithm. When trying different types of metric distances, one interesting question is to find

out which types will result in better recognition scores than the others. In various recogni-

tion domains, this question also involves finding best metric distances for different feature

extraction methods.
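For concreteness, the shortest-path metric referred to above can be computed with the Floyd-Warshall algorithm, as in the following minimal sketch. It assumes an undirected graph with non-negative edge weights and vertices numbered from 0; the function name is hypothetical, and the framework's own implementation may compute these distances differently.

```python
def shortest_path_metric(n, edges):
    """All-pairs shortest-path distances of an undirected weighted graph.

    `edges` is a list of (u, v, w) with non-negative weights w; vertices are 0..n-1.
    """
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    # Floyd-Warshall: allow vertex k as an intermediate stop, one k at a time
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

Trying an alternative metric would amount to replacing this function while leaving the embedding and EMD steps unchanged.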

As presented by a set of experiments, our framework can be used to locate a part in a

database for part matching. Although promising, these results are preliminary and require

further exploration. The inverse problem, namely, given a composite shape, identifying its

component parts from a part database, is also of interest. We will focus on these problems

in the future as well.

One of the objectives of our future work is also to develop an indexing mechanism

for improving the overall efficiency and effectiveness of the algorithm. One may observe

that such mechanisms may be constructed for both original graph representations and for

embedded point sets. A comparison study between these two methods is also of interest.

A key component to many research problems (such as feature tracking, morphing) is

robust feature matching. In the future we will design new vision algorithms based on our

many-to-many feature matching framework. We believe that our framework will have an

immediate impact on other computer vision problems.

Bibliography

[1] 3D Cafe. http://www.3dcafe.com/asp/freestuff.asp.

[2] S. Abiteboul. Querying semi-structured data. In ICDT, pages 1–18, 1997.

[3] R. Agarwala, V. Bafna, M. Farach, M. Paterson, and M. Thorup. On the approximability of numerical taxonomy (fitting distances by tree metrics). SIAM Journal on Computing, 28(2):1073–1085, 1999.

[4] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications, pages 4–7. Prentice Hall, Englewood Cliffs, New Jersey, 1993.

[5] D. S. Atkinson and P. M. Vaidya. Using geometry to solve the transportation problem in the plane. Algorithmica, 13(5):442–461, 1995.

[6] R. Baire. Leçons sur les fonctions discontinues. Paris, 1905.

[7] H. G. Barrow and R. M. Burstall. Subgraph isomorphism, matching relational structures and maximal cliques. Information Processing Letters, E76-A(4):83–84, 1975.

[8] Y. Bartal, A. Blum, C. Burch, and A. Tomkins. A polylog(n)-competitive algorithm for metrical task systems. In STOC ’97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 711–719, New York, NY, USA, 1997. ACM Press.

[9] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):509–522, April 2002.

[10] R. Beveridge and E. M. Riseman. How easy is matching 2D line models using local search? IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(6):564–579, June 1997.

[11] J. Bourgain. On Lipschitz embedding of finite metric spaces into Hilbert space. Israel Journal of Mathematics, 52:46–52, 1985.

[12] J. Bourgain. The metrical interpretation of superreflexivity in Banach spaces. Israel Journal of Mathematics, 56:222–230, 1986.

[13] K. L. Boyer and A. C. Kak. Structural stereopsis for 3-D vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(2):144–166, March 1988.

[14] P. Buneman. The recovery of trees from measures of dissimilarity. In F. Hodson,D. Kendall, and P. Tautu, editors, Mathematics in the Archaeological and HistoricalSciences, pages 387–395. Edinburgh University Press, Edinburgh, 1971.

[15] P. Buneman, M. F. Fernandez, and D. Suciu. UnQL: a query language and algebrafor semistructured data based on structural recursion. VLDB Journal: Very LargeData Bases, 9(1):76–110, 2000.

[16] H. Bunke and K. Shearer. A graph distance metric based on the maximal commonsubgraph. Pattern Recognition Letters, 19(3-4):255–259, 1998.

[17] H.T. Chen, H. H. Lin, and T.L. Liu. Multi-object tracking using dynamical graphmatching. In Proceedings, IEEE Conference on Computer Vision and Pattern Recog-nition, pages II:210–217, 2001.

[18] J.H. Chuang, C. Tsai, and M.C. Ko. Skeletonization of three-dimensional object using generalized potential field. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1241–1251, 2000.

[19] S. D. Cohen and L. J. Guibas. The earth mover’s distance under transformation sets. In Proceedings, 7th International Conference on Computer Vision, pages 1076–1083, Kerkyra, Greece, 1999.

[20] J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups. Springer-Verlag, New York, 1998.

[21] N. Cornea, D. Silver, X. Yuan, and R. Balasubramanian. Computing hierarchical curve-skeletons of 3D objects. CAIP Technical Report CAIP-TR275, Nov 2004.

[22] N. D. Cornea, M. F. Demirci, D. Silver, A. Shokoufandeh, Y. Keselman, S. J. Dickinson, and P. B. Kantor. 3D object retrieval using many-to-many matching of curve skeletons. In Shape Modeling and Applications, 2005.

[23] M. S. Costa and L. G. Shapiro. Relational indexing. In SSPR, pages 130–139, 1996.

[24] T. Cox and M. Cox. Multidimensional Scaling. Chapman and Hall, London, 1994.

[25] C. M. Cyr and B. B. Kimia. A similarity-based aspect-graph approach to 3D object recognition. Int. J. Comput. Vision, 57(1):5–22, 2004.

[26] J. Damon. Local Morse theory for solutions to the heat equation and Gaussian blurring. Journal of Differential Equations, 115(2):386–401, 1995.

[27] W.H.E. Day. Computational complexity of inferring phylogenies from dissimilarity matrices. Bulletin of Mathematical Biology, 49(4):461–467, 1987.

[28] M. F. Demirci, A. Shokoufandeh, S. J. Dickinson, Y. Keselman, and L. Bretzner. Many-to-many feature matching using spherical coding of directed graphs. In ECCV (1), pages 322–335, 2004.

[29] M. F. Demirci, A. Shokoufandeh, Y. Keselman, S. J. Dickinson, and L. Bretzner. Many-to-many matching of scale-space feature hierarchies using metric embedding. In Scale-Space, pages 17–32, 2003.

[30] S. Dickinson, A. Pentland, and A. Rosenfeld. 3-D shape recovery using distributed aspect matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):174–198, 1992.

[31] M. A. Eshera and K. S. Fu. A graph distance measure for image analysis. IEEE Trans. SMC, 14:398–408, May 1984.

[32] L. Florack and A. Kuijper. The topological structure of scale-space images. J. Math. Imaging Vis., 12(1):65–79, 2000.

[33] N. Gagvani and D. Silver. Parameter controlled volume thinning. Graphical Models and Image Processing, 61(3):149–164, 1999.

[34] M. N. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri, and K. Shim. Xtract: A system for extracting document type descriptors from XML documents. In SIGMOD Conference, pages 165–176, 2000.

[35] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In The VLDB Journal, pages 518–529, 1999.

[36] A. Globus, C. Levit, and T. Lasinski. Tool for visualizing the topology of three-dimensional vector fields. In IEEE Visualization, pages 33–40, 1991.

[37] S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377–388, 1996.

[38] R. Goldman and J. Widom. Dataguides: Enabling query formulation and optimization in semistructured databases. In VLDB’97, Proceedings of 23rd International Conference on Very Large Data Bases, pages 436–445. Morgan Kaufmann, 1997.

[39] R. Goldman and J. Widom. Approximate DataGuides, 1999.

[40] Google Image Search. http://www.google.com.

[41] K. Grauman and T.J. Darrell. Fast contour matching using approximate earth mover’s distance. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition (CVPR04), pages I: 220–227, 2004.

[42] W.E.L. Grimson, T. Lozano-Perez, and D.P. Huttenlocher. Object Recognition by Computer: The Role of Geometric Constraints. MIT Press, 1990.

[43] A. Gupta. Embedding tree metrics into low dimensional Euclidean spaces. In Proceedings of the thirty-first annual ACM symposium on Theory of computing, pages 694–700, 1999.

[44] A. Gupta, I. Newman, Y. Rabinovich, and A. Sinclair. Cuts, trees and l1 embeddings. Proceedings of Symposium on Foundations of Computer Science, 1999.

[45] R. M. Haralick and L. G. Shapiro. The consistent labeling problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1:173–184, 1979.

[46] J.L. Helman and L. Hesselink. Visualizing vector field topology in fluid flows. IEEE Computer Graphics and Applications, 11(3):36–46, 1991.

[47] P. Indyk. Algorithmic aspects of geometric embeddings. In Proceedings, 42nd Annual Symposium on Foundations of Computer Science, 2001.

[48] P. Indyk and N. Thaper. Fast image retrieval via embeddings. In 3rd Intl. Workshop on Statistical and Computational Theories of Vision, 2003.

[49] S. Ioffe and D.A. Forsyth. Human tracking with mixtures of trees. In ICCV01, pages I: 690–695, 2001.

[50] C. Irniger and H. Bunke. Graph matching: Filtering large databases of graphs using decision trees. IAPR-TC15 Workshop on Graph-based Representation in Pattern Recognition, pages 239–249, 2001.

[51] C. Irniger and H. Bunke. Graph database filtering using decision trees. In Proceedings, 12th International Conference on Pattern Recognition, pages 383–388, 2004.

[52] C. Irniger and H. Bunke. Decision trees for error-tolerant graph database filtering. In GbRPR, pages 301–311, 2005.

[53] I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.

[54] F. Kanters, L. Florack, B. Platel, and B. M. ter Haar Romeny. Image reconstruction from multiscale critical points. In Scale-Space, pages 464–478, 2003.

[55] F. Kanters, B. Platel, L. Florack, and B. M. ter Haar Romeny. Content based image retrieval using multiscale top points. In Scale-Space, pages 33–43, 2003.

[56] Y. Keselman, A. Shokoufandeh, M. F. Demirci, and S. Dickinson. Many-to-many graph matching via low-distortion embedding. In Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, June 2003.

[57] B. B. Kimia, A. Tannenbaum, and S. W. Zucker. Shape, shocks, and deformations I: The components of two-dimensional shape and the reaction-diffusion space. Int. J. Computer Vision, 15:189–224, 1995.

[58] S. Kosinov and T. Caelli. Inexact multisubgraph matching using graph eigenspace and clustering models. In Proceedings of SSPR/SPR, volume 2396, pages 133–142. Springer, 2002.

[59] L. M. Lifshitz and S. M. Pizer. A multiresolution hierarchical approach to image segmentation based on intensity extrema. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(6):529–540, 1990.

[60] N. Linial, E. London, and Y. Rabinovich. The geometry of graphs and some of its algorithmic applications. Proceedings of 35th Annual Symposium on Foundations of Computer Science, pages 557–591, 1994.

[61] N. Linial, A. Magen, and M. E. Saks. Trees and Euclidean metrics. Proceedings of the Thirtieth Annual ACM Symposium on the Theory of Computing, pages 169–175, 1998.

[62] T.-L. Liu and D. Geiger. Approximate tree matching and shape similarity. In Proceedings, 7th International Conference on Computer Vision, pages 456–462, Kerkyra, Greece, 1999.

[63] J. Llados, E. Marti, and J. Villanueva. Symbol recognition by error-tolerant subgraph matching between region adjacency graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1137–1143, 2001.

[64] B. Luo and E. R. Hancock. Structural matching using the EM algorithm and singular value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:1120–1136, 2001.

[65] J. Matousek. On the distortion required for embedding finite metric spaces into normed spaces. Israel Journal of Mathematics, 93:333–344, 1996.

[66] J. Matousek. On embedding trees into uniformly convex Banach spaces. Israel Journal of Mathematics, 237:221–237, 1999.

[67] B. Messmer and H. Bunke. Efficient error-tolerant subgraph isomorphism detection. In D. Dori and A. Bruckstein, editors, Shape, Structure and Pattern Recognition, pages 231–240. World Scientific Publ. Co., 1995.

[68] B. T. Messmer and H. Bunke. Subgraph isomorphism in polynomial time. Technical Report IAM 95-003, 1995.

[69] R. Myers, R. Wilson, and E. Hancock. Bayesian graph edit distance. IEEE PAMI, 22(6):628–635, 2000.

[70] S. A. Nene, S. K. Nayar, and H. Murase. Columbia object image library (COIL-20). Technical Report CUCS-005-96, February 1996.

[71] S. Nestorov, S. Abiteboul, and R. Motwani. Extracting schema from semistructured data. pages 295–306, 1998.

[72] S. Nestorov, J. D. Ullman, J. L. Wiener, and S. S. Chawathe. Representative objects: Concise representations of semistructured, hierarchial data. In ICDE, pages 79–90, 1997.

[73] M. Nielsen and M. Lillholm. What do features tell about images? In Scale-Space ’01: Proceedings of the Third International Conference on Scale-Space and Morphology in Computer Vision, pages 39–50, London, UK, 2001. Springer-Verlag.

[74] A. Okabe and B. Boots. Spatial tessellations: Concepts and applications of Voronoi diagrams. John Wiley and Sons, New York, 1992.

[75] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin. Shape distributions. ACM Transactions on Graphics, 21(4):807–832, Oct. 2002.

[76] M. Pelillo, K. Siddiqi, and S. Zucker. Matching hierarchical structures using association graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(11):1105–1120, November 1999.

[77] B. Platel, M. F. Demirci, A. Shokoufandeh, L. Florack, F. Kanters, B. M. ter Haar Romeny, and S. J. Dickinson. Discrete representation of top points via scale space tessellation. In Scale-Space, pages 73–84, 2005.

[78] B. Platel, F. Kanters, L. Florack, and E. Balmachnova. Using multiscale top points in image matching. In 11th International Conference on Image Processing, 2004.

[79] F. Preparata and M. Shamos. Computational Geometry. Springer-Verlag, New York, NY, 1985.

[80] Princeton Shape Retrieval and Analysis, 3D Model Search. http://shape.cs.princeton.edu/search.html.

[81] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, December 2000.

[82] Y. Rubner, C. Tomasi, and L. J. Guibas. The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 2000.

[83] A. Sanfeliu and K. S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, 13:353–362, May 1983.

[84] A. Sanfeliu and K.S. Fu. A distance measure between attributed relational graphs for pattern recognition. SMC, 13(3):353–362, May 1983.

[85] G. Scott and H. Longuet-Higgins. An algorithm for associating the features of two patterns. Proceedings of Royal Society of London, B244:21–26, 1991.

[86] T. Sebastian, P. Klein, and B. Kimia. Recognition of shapes by editing shock graphs. In IEEE International Conference on Computer Vision, pages 755–762, 2001.

[87] T. Sebastian, P. N. Klein, and B. Kimia. Recognition of shapes by editing their shock graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5):550–571, 2004.

[88] T. B. Sebastian, P. N. Klein, and B. B. Kimia. Shock-based indexing into large shape databases. In ECCV ’02: Proceedings of the 7th European Conference on Computer Vision-Part III, pages 731–746, London, UK, 2002. Springer-Verlag.

[89] K. Sengupta and K. L. Boyer. Organizing large structural modelbases. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(4):321–332, 1995.

[90] K. Sengupta and K. L. Boyer. Modelbase partitioning using property matrix spectra. Computer Vision Image Understanding, 70(2):177–196, 1998.

[91] L. G. Shapiro and R. M. Haralick. Structural descriptions and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3:504–519, 1981.

[92] L. G. Shapiro and R. M. Haralick. A metric for comparing relational descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7:90–94, January 1985.

[93] L. G. Shapiro and R.M. Haralick. Organization of relational models for scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4(6):595–602, November 1982.

[94] P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser. The Princeton shape benchmark. In Shape Modeling International, Genoa, Italy, June 2004.

[95] A. Shokoufandeh, S.J. Dickinson, C. Jonsson, L. Bretzner, and T. Lindeberg. On the representation and matching of qualitative shape at multiple scales. In Proceedings, 7th European Conference on Computer Vision, volume 3, pages 759–775, 2002.

[96] A. Shokoufandeh, D. Macrini, S.J. Dickinson, K. Siddiqi, and S.W. Zucker. Indexing hierarchical structures using graph spectra. PAMI, 27(7):1125–1140, July 2005.

[97] K. Siddiqi and B. B. Kimia. A shock grammar for recognition. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 507–513, 1996.

[98] K. Siddiqi, A. Shokoufandeh, S. Dickinson, and S. Zucker. Shock graphs and shape matching. In Proceedings, IEEE International Conference on Computer Vision, pages 222–229, Bombay, January 1998.

[99] K. Siddiqi, A. Shokoufandeh, S. Dickinson, and S. Zucker. Shock graphs and shape matching. International Journal of Computer Vision, 30:1–24, 1999.

[100] H. Sossa and R. Horaud. Model indexing: the graph-hashing approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Urbana-Champaign, Illinois, USA, June 1992.

[101] H. Sundar, D. Silver, N. Gagvani, and S. Dickinson. Skeleton based shape matching and retrieval. In Shape Modelling and Applications Conference, SMI 2003, Seoul, Korea, May 2003.

[102] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.

[103] A. Torsello, D. Hidovic, and M. Pelillo. Four metrics for efficiently comparing attributed trees. In Proceedings, International Conference on Pattern Recognition, pages 467–470, 2004.

[104] S. Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4):376–380, April 1991.

[105] M.S. Waterman, T.F. Smith, M. Singh, and W.A. Beyer. Additive evolutionary trees. J. Theoret. Biol., 64:199–213, 1977.

Vita

Muhammed Fatih Demirci was born in Turkey, in 1978. He received his B.S. from the Department of Computer Engineering, Selcuk University, Turkey, in 1999 and his M.S. in Computer Science from Drexel University in Philadelphia in 2002. Mr. Demirci is the recipient of the College of Engineering Outstanding Graduate Student Research Award, Drexel University (2003 - 2004). His research interests include computer vision, statistical and structural pattern recognition, feature tracking, and graph theory.

Selected Publications:

Cornea, Demirci, Silver, Shokoufandeh, Dickinson, Kantor. 3D Object Retrieval using Many-to-Many Matching of Curve Skeletons. IEEE International Conference on Shape Modeling and Applications 2005.

Platel, Demirci, Shokoufandeh, Florack, Kanters, Romeny, Dickinson. Discrete Representation of Top Points via Scale Space Tessellation. 5th International Conference on Scale-Space 2005: 73-84.

Demirci, Shokoufandeh, Dickinson, Keselman, Bretzner. Many-to-Many Feature Matching Using Spherical Coding of Directed Graphs. The 8th European Conference on Computer Vision - ECCV (1) 2004: 322-335.

Demirci, Shokoufandeh, Keselman, Dickinson, Bretzner. Many-to-Many Matching of Scale-Space Feature Hierarchies Using Metric Embedding. 4th International Conference on Scale-Space 2003: 17-32.

Keselman, Shokoufandeh, Demirci, Dickinson. Many-to-Many Graph Matching via Metric Embedding. IEEE Conference on Computer Vision and Pattern Recognition - CVPR (1) 2003: 850-857.
