A COMPARATIVE STUDY OF SIMILARITY MEASURES FOR CONTENT-BASED MULTIMEDIA RETRIEVAL

Christian Beecks, Merih Seran Uysal, Thomas Seidl

Data Management and Data Exploration Group, RWTH Aachen University, Germany
{beecks, uysal, seidl}@cs.rwth-aachen.de

ABSTRACT

Determining similarities among data objects is a core task of content-based multimedia retrieval systems. Approximating data object contents via flexible feature representations, such as feature signatures, multimedia retrieval systems frequently determine similarities among data objects by applying distance functions. In this paper, we compare major state-of-the-art similarity measures applicable to flexible feature signatures with respect to their effectiveness and efficiency. Furthermore, we study the behavior of the similarity measures by discussing their properties. Our findings can be used to guide the development of content-based retrieval applications for numerous domains.

Keywords — similarity measures, content-based multimedia retrieval, evaluation

1. INTRODUCTION

Today, the world is interconnected by digital data whose volume has been increasing at a high rate. It is immensely desirable for multimedia systems to enable users to access, search, and explore the tremendous amounts of data arising from applications generating images, videos, audio, or other non-text documents stored in large multimedia databases.

Similarity search is an increasingly important task in multimedia retrieval systems and is widely used in both commercial and scientific applications in various areas, such as copy and near-duplicate detection of images and videos [1, 2, 3, 4, 5] or content-based image, video, and audio retrieval [6, 7, 8, 9, 10, 11]. In order to meet the system requirements and user needs regarding adaptable multimedia retrieval, content-based similarity models have been developed and widely applied in the aforementioned areas. A key challenge is that the inherent properties of data objects are captured via feature representations, which are then used to compare the corresponding data objects depending on their contents.

Given a query object, which is either manually specified by a user or automatically generated by an application, and a possibly large multimedia database, an appropriate similarity model can determine the similarity between the query and each object in the database by computing the distance between their corresponding feature representations. By making use of the computed distance values, it is possible to retrieve the most similar objects with regard to the query. The retrieved objects can then be processed within the corresponding applications or explored and evaluated by users with respect to the human perception of similarity.

As the concept of determining similarities among data objects is of crucial importance in multimedia retrieval systems and for the aforementioned applications, we provide insights into major state-of-the-art similarity measures applicable to feature signatures, a flexible content-based representation of multimedia data objects. This is carried out by extensive experiments and evaluations taking into account both the effectiveness and the efficiency of the corresponding similarity measures, whose performance qualities are compared with each other. Our findings can be used to guide the development of content-based retrieval applications for numerous domains.

We structure the paper as follows: we first provide preliminary information about feature extraction and common feature representations in Section 2. Then, in Section 3, we survey major state-of-the-art similarity measures applicable to feature signatures for content-based multimedia retrieval. Section 4 is devoted to the experimental evaluation of the similarity measures regarding both effectiveness and efficiency. The discussion of the experimental evaluation results is given in Section 5. We end this work with the conclusion in Section 6.

2. FEATURE EXTRACTION AND REPRESENTATION

In this section, we describe the common feature extraction process and the frequently used feature representation form, the so-called feature signature [12]. The extraction of data object features and their aggregation aim at digitizing and compactly storing the data objects' inherent properties.

The extraction of features and the aggregation of these features for each individual data object are illustrated in Figure 1. In the feature extraction step, each data object is mapped into a set of features in an appropriate feature space FS. In the field of content-based image retrieval [6, 8, 9], the feature space frequently comprises position, color, or texture dimensions [13, 14], where each image pixel is mapped to a single feature in the corresponding feature space. In the figure, the features shown in the same color in the feature space belong to the same data object. In this way, the content of each data object is exhibited via its feature distribution in the feature space.

978-1-4244-7493-6/10/$26.00 © 2010 IEEE    ICME 2010


Fig. 1. The feature extraction process in two steps: the objects' properties are extracted and mapped into some feature space and then represented in a more compact form.

In order to compute the similarity between two data objects efficiently, the features are aggregated into a compact feature representation form, which is depicted as the feature aggregation step in Figure 1. There are two common feature representation forms, feature histograms and feature signatures, which arise from a global partitioning of the feature space and a local clustering of each data object's features, respectively [12]. The global partitioning of the feature space is carried out regardless of the feature distributions of single data objects. For each data object, a single feature histogram is generated where each entry of the histogram corresponds to the number of its features located in the corresponding global partition. In contrast, local clustering of a data object's features results in a feature signature, also known as an adaptive-binning histogram [15], which comprises clusters of the object's features with the corresponding weights and centroids. In the figure, each plus sign denotes the centroid of an individual cluster, which can be identified via its color. For instance, the red plus signs denote the centroids of the clusters which correspond to the feature distribution of the red data object in the database.

In this work, we focus on similarity measures applicable to feature signatures, which achieve a better balance between expressiveness and efficiency than feature histograms [16]. Furthermore, feature signatures aggregate the data objects' inherent properties more flexibly than feature histograms and preserve the local appearance of features better. The definition of feature signatures is given below.

Definition 1 (Feature Signature)
Given a feature space $FS$ and a local clustering $C = \{C_1, \dots, C_n\}$ of the features $f_1, \dots, f_k \in FS$ of object $o$, the feature signature $S^o$ is defined as a set of tuples from $FS \times \mathbb{R}^+$ as $S^o = \{\langle c_i^o, w_i^o \rangle \mid i = 1, \dots, n\}$, where $c_i^o = \frac{\sum_{f \in C_i} f}{|C_i|}$ and $w_i^o = \frac{|C_i|}{k}$ represent the centroid and the weight, respectively.

Intuitively, a feature signature $S^o$ of object $o$ is a set of centroids $c_i^o \in FS$ with the corresponding weights $w_i^o \in \mathbb{R}^+$ of the clusters $C_i$. Carrying out the feature clustering individually for each data object, we recognize that each feature signature reflects the feature distribution more meaningfully than any feature histogram. In addition, feature histograms can be regarded as a special case of feature signatures where the clustering is replaced with a global partitioning of the feature space.

Fig. 2. Three example images in the top row and the corresponding feature signature visualizations in the bottom row.

We depict three example images with their corresponding feature signatures in Figure 2. We visualize the feature signatures' centroids from a seven-dimensional feature space (two position, three color, and two texture dimensions) as circles in a two-dimensional position space. The circles' colors and diameters reflect the colors and weights of the centroids, respectively.

Throughout the present work, we apply an adaptive variant of the k-means clustering algorithm to generate adjustable feature signatures, similar to the one proposed in [15]. This clustering algorithm behaves like k-means without the number k of clusters being specified in advance. Instead, the following parameters have to be defined: the maximum radius R of a cluster, the nominal cluster separation S, and the minimum number M of points in a cluster. For the choice of these parameters, we refer to Section 4.
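As an illustration, the following Python sketch shows one simplified form such an adaptive clustering could take; it is not the exact algorithm of [15]. Features are assigned sequentially to the nearest centroid within radius R, the separation parameter S is omitted for brevity, and clusters with fewer than M points are discarded. Function and parameter names are ours.

```python
import numpy as np

def adaptive_cluster(features, R=30.0, M=4):
    # Assign each feature to the nearest existing centroid if it lies
    # within radius R; otherwise open a new cluster.  Centroids are
    # updated as the running mean of their members.
    centroids, members = [], []
    for f in features:
        f = np.asarray(f, dtype=float)
        if centroids:
            dists = [np.linalg.norm(f - c) for c in centroids]
            i = int(np.argmin(dists))
            if dists[i] <= R:
                members[i].append(f)
                centroids[i] = np.mean(members[i], axis=0)
                continue
        centroids.append(f)
        members.append([f])
    k = len(features)
    # Keep only clusters with at least M points; each signature entry is
    # a (centroid, weight) tuple with weight |C_i| / k as in Definition 1.
    return [(c, len(m) / k) for c, m in zip(centroids, members) if len(m) >= M]
```

Unlike true k-means, a single sequential pass such as this is order-dependent; it merely illustrates how a signature of (centroid, weight) tuples can be produced without fixing k in advance.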

In the following section, we include a short survey of state-of-the-art similarity measures applicable to feature signatures.

3. A SHORT SURVEY OF ADAPTIVE SIMILARITY MEASURES

In this section, we summarize major state-of-the-art similarity measures for feature signatures exhibiting content-based feature representations of multimedia objects. Furthermore, we discuss the theoretical fundamentals and ideas behind each similarity measure in order to contribute to their understanding.

Adaptive similarity measures apply a so-called ground distance function to determine distances among the feature signatures' centroids in the feature space. Applicable ground distance functions are, for instance, those evaluated in [17].

The first similarity measure we present is the Hausdorff Distance [18], which measures the maximum nearest-neighbor distance among the centroids of both feature signatures. The formal definition of the Hausdorff Distance is given below.

Definition 2 (Hausdorff Distance)
Given two feature signatures $S^q$ and $S^o$ and a ground distance function $d$, the Hausdorff Distance $HD_d$ between $S^q$ and $S^o$ is defined as:

$HD_d(S^q, S^o) = \max\{h(S^q, S^o), h(S^o, S^q)\}$, where $h(S^q, S^o) = \max_{c^q \in S^q} \min_{c^o \in S^o} \{d(c^q, c^o)\}$.


The Hausdorff Distance is based only on the centroid structure of both feature signatures and does not consider their weights. On top of this, it does not take the whole structure of the feature signatures into account, which limits its expressiveness when computing the distance between two feature signatures.
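A minimal Python sketch of the Hausdorff Distance, assuming feature signatures are represented as lists of (centroid, weight) tuples (a representation we adopt here for illustration):

```python
import numpy as np

def hausdorff(sig_q, sig_o):
    # Feature signatures are lists of (centroid, weight) tuples; the
    # weights are ignored by the Hausdorff Distance.
    def h(s1, s2):
        # directed Hausdorff distance over the centroids only
        return max(min(np.linalg.norm(c1 - c2) for c2, _ in s2)
                   for c1, _ in s1)
    return max(h(sig_q, sig_o), h(sig_o, sig_q))
```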

As an extension for color-based image retrieval, the Perceptually Modified Hausdorff Distance [19] was proposed, which uses the information of both the weights and the centroids of the feature signatures. Below, the formal definition of the Perceptually Modified Hausdorff Distance is given.

Definition 3 (Perceptually Modified Hausdorff Distance)
Given two feature signatures $S^q$ and $S^o$ and a ground distance function $d$, the Perceptually Modified Hausdorff Distance $PMHD_d$ between $S^q$ and $S^o$ is defined as:

$PMHD_d(S^q, S^o) = \max\{h_w(S^q, S^o), h_w(S^o, S^q)\}$, where

$h_w(S^q, S^o) = \frac{\sum_i w_i^q \cdot \min_j \left\{ \frac{d(c_i^q, c_j^o)}{\min\{w_i^q, w_j^o\}} \right\}}{\sum_i w_i^q}$.

The computation of the distance between $S^q$ and $S^o$ via the Perceptually Modified Hausdorff Distance requires determining, for each centroid $c_i^q$, the centroid $c_j^o$ located as near as possible with the highest possible weight, and vice versa. In other words, despite considering the weight and position information, the whole structures of the feature signatures are still not taken into account.
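Under the same assumed (centroid, weight) list representation and a Euclidean ground distance, the Perceptually Modified Hausdorff Distance can be sketched as:

```python
import numpy as np

def pmhd(sig_q, sig_o):
    # h_w weights each query centroid's nearest-neighbor term by its
    # weight and penalizes matches between clusters of unequal weight.
    def h_w(s1, s2):
        num = sum(w1 * min(np.linalg.norm(c1 - c2) / min(w1, w2)
                           for c2, w2 in s2)
                  for c1, w1 in s1)
        return num / sum(w for _, w in s1)
    return max(h_w(sig_q, sig_o), h_w(sig_o, sig_q))
```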

Another similarity measure is the well-known Earth Mover's Distance [12, 16], which originated in the computer vision domain. Its successful utilization gave rise to numerous applications in different domains. This similarity measure describes the cost of transforming one feature signature into another. Similarity is considered to be a transportation problem and, thus, the distance is the solution to a minimization problem which can be solved via a specialized simplex algorithm.

Definition 4 (Earth Mover's Distance)
Given two feature signatures $S^q$ and $S^o$ and a ground distance function $d$, the Earth Mover's Distance $EMD_d$ between $S^q$ and $S^o$ is defined as a minimum cost flow over all possible flows $f_{ij} \in \mathbb{R}$ as:

$EMD_d(S^q, S^o) = \min_{f_{ij}} \left\{ \frac{\sum_i \sum_j f_{ij} \cdot d(c_i^q, c_j^o)}{\min\{\sum_i w_i^q, \sum_j w_j^o\}} \right\}$,

under the constraints: $\forall i: \sum_j f_{ij} \le w_i^q$, $\forall j: \sum_i f_{ij} \le w_j^o$, $\forall i,j: f_{ij} \ge 0$, and $\sum_i \sum_j f_{ij} = \min\{\sum_i w_i^q, \sum_j w_j^o\}$.

The constraints guarantee a feasible solution, i.e., all flows are non-negative and do not exceed the limits given by the weights of both feature signatures. However, as a minimization problem has to be solved, the run-time complexity is considerably high.
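The minimization can be sketched as a linear program. The following Python example uses SciPy's general-purpose LP solver rather than the specialized simplex algorithm mentioned above, and assumes the (centroid, weight) list representation:

```python
import numpy as np
from scipy.optimize import linprog

def emd(sig_q, sig_o):
    # Earth Mover's Distance (Definition 4) as a linear program over the
    # flattened flow variables f_ij (n * m of them).
    cq, wq = zip(*sig_q)
    co, wo = zip(*sig_o)
    wq, wo = np.array(wq), np.array(wo)
    n, m = len(wq), len(wo)
    # ground distances d(c_i^q, c_j^o), flattened to match the flows
    D = np.array([[np.linalg.norm(a - b) for b in co] for a in cq]).ravel()
    total = min(wq.sum(), wo.sum())
    # inequality constraints: row sums <= w_i^q, column sums <= w_j^o
    A_ub = np.zeros((n + m, n * m))
    for i in range(n):
        A_ub[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_ub[n + j, j::m] = 1.0
    b_ub = np.concatenate([wq, wo])
    # equality constraint: total flow equals the smaller total weight
    res = linprog(D, A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, n * m)), b_eq=[total],
                  bounds=(0, None), method="highs")
    return res.fun / total
```

For large signatures, a dedicated transportation-problem solver is far more efficient than this generic LP formulation; the sketch only makes the structure of the optimization explicit.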

The succeeding Weighted Correlation Distance [15] follows a slightly different approach. Instead of taking only the distances between the feature signatures' centroids into the computation, it measures the intersection among the clusters represented by the corresponding centroids via their distance d and the maximum cluster radius R, which has to be specified in the feature extraction process, as described in Section 2.

Definition 5 (Weighted Correlation Distance)
Given two feature signatures $S^q$ and $S^o$, a ground distance function $d$, and the maximum cluster radius $R$, the Weighted Correlation Distance $WCD_{d,R}$ between $S^q$ and $S^o$ is defined as:

$WCD_{d,R}(S^q, S^o) = 1 - \sum_i \sum_j s(c_i^q, c_j^o) \cdot \frac{w_i^q}{\sqrt{S^q \cdot S^q}} \cdot \frac{w_j^o}{\sqrt{S^o \cdot S^o}}$,

where $S^q \cdot S^o = \sum_i \sum_j s(c_i^q, c_j^o) \cdot w_i^q \cdot w_j^o$ and

$s(c_i, c_j) = \begin{cases} 1 - \frac{3}{4}\frac{d}{R} + \frac{1}{16}\left(\frac{d}{R}\right)^3 & \text{if } 0 \le \frac{d}{R} \le 2, \\ 0 & \text{otherwise,} \end{cases}$

with $d = d(c_i, c_j)$.

Based on the intersection $s(c_i, c_j)$ between two centroids $c_i$ and $c_j$, the weighted correlation $S^q \cdot S^o$ between both feature signatures is normalized and used to determine the corresponding distance value.
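A sketch of the Weighted Correlation Distance under the same assumptions as before (Euclidean ground distance, (centroid, weight) lists):

```python
import numpy as np

def wcd(sig_q, sig_o, R=30.0):
    def s(ci, cj):
        # cubic intersection kernel from Definition 5
        x = np.linalg.norm(ci - cj) / R
        return 1.0 - 0.75 * x + (x ** 3) / 16.0 if x <= 2.0 else 0.0
    def corr(s1, s2):
        # weighted correlation S1 . S2
        return sum(s(c1, c2) * w1 * w2
                   for c1, w1 in s1 for c2, w2 in s2)
    nq = np.sqrt(corr(sig_q, sig_q))
    no = np.sqrt(corr(sig_o, sig_o))
    return 1.0 - sum(s(c1, c2) * (w1 / nq) * (w2 / no)
                     for c1, w1 in sig_q for c2, w2 in sig_o)
```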

The last similarity measure we present is the Signature Quadratic Form Distance [20, 21, 22], which bridges the gap between the traditional Quadratic Form Distance [23] and feature signatures. Its formal definition is given below.

Definition 6 (Signature Quadratic Form Distance)
Given two feature signatures $S^q$ and $S^o$, the Signature Quadratic Form Distance $SQFD_A$ between $S^q$ and $S^o$ is defined as:

$SQFD_A(S^q, S^o) = \sqrt{(w^q \mid -w^o) \cdot A \cdot (w^q \mid -w^o)^T}$,

where $A \in \mathbb{R}^{(n+m) \times (n+m)}$ is the similarity matrix, $w^q = (w_1^q, \dots, w_n^q)$ and $w^o = (w_1^o, \dots, w_m^o)$ are the weight vectors, and $(w^q \mid -w^o) = (w_1^q, \dots, w_n^q, -w_1^o, \dots, -w_m^o)$ denotes the concatenation of $w^q$ and $-w^o$.

The similarity matrix $A$, which is determined dynamically for each comparison of two feature signatures, reflects cross-similarities among the feature signatures' centroids. The entries of $A$ depend on the order in which the centroids appear in the feature signatures and can be obtained via similarity functions [20, 21], for instance $a_{ij} = e^{-\alpha \cdot L_2^2(c_i, c_j)}$. Moreover, the Signature Quadratic Form Distance computes the distance between $S^q$ and $S^o$ by considering the weights and positions of all centroids $c^q$ and $c^o$. Thus, the Signature Quadratic Form Distance takes into account the whole structure of both feature signatures.
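A sketch of the Signature Quadratic Form Distance with the exponential similarity function mentioned above, again over (centroid, weight) lists:

```python
import numpy as np

def sqfd(sig_q, sig_o, alpha=1.0):
    cents = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    # concatenated weight vector (w^q | -w^o)
    w = np.array([wq for _, wq in sig_q] + [-wo for _, wo in sig_o])
    # similarity matrix with a_ij = exp(-alpha * L2^2(c_i, c_j))
    A = np.array([[np.exp(-alpha * float(np.sum((ci - cj) ** 2)))
                   for cj in cents] for ci in cents])
    # clip tiny negative values caused by floating-point round-off
    return float(np.sqrt(max(w @ A @ w, 0.0)))
```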

All aforementioned similarity measures are applicable to feature signatures of different size and structure; their properties are summarized in Table 1. The Hausdorff Distance and the Perceptually Modified Hausdorff Distance exhibit the highest efficiency: both consider the weight and position information of the feature signatures only partially, which results in low computation times. The remaining similarity measures take into account the complete weight and position information given by the feature signatures, which decreases their efficiency.

Table 1. The similarity measures' properties.

        efficiency  weight information  position information
HD      ++++        -                   partially
PMHD    +++         partially           partially
EMD     +           completely          completely
WCD     ++          completely          completely
SQFD    ++          completely          completely

We evaluate the retrieval performance of the similarity measures in the forthcoming section.

4. EXPERIMENTAL EVALUATION

In this section, we conduct experiments on different multimedia databases. First, we describe the experimental setup. Second, we evaluate the retrieval performance of the similarity measures in terms of effectiveness and efficiency.

4.1. Experimental Setup

In order to thoroughly evaluate the presented similarity measures, we measure their retrieval performance on the following databases: the Wang database [24], the Coil100 database [25], the MIR Flickr database [26], and the 101objects database [27]. We depict example images from these databases in Figure 3.

The Wang database comprises 1,000 images which are classified into ten themes. The themes cover a multitude of topics, such as beaches, flowers, buses, food, etc. The Coil100 database consists of 7,200 images classified into 100 different classes; each class depicts one object photographed from 72 different directions. The MIR Flickr database contains 25,000 images, including textual annotations, downloaded from http://flickr.com/. The 101objects database contains 9,196 images which are classified into 101 categories.

The themes, classes, textual annotations, and categories are used as ground truth to measure precision and recall values [28] after each returned object. For the MIR Flickr database, we define virtual classes, each containing all images sharing at least two common textual annotations, which serve as ground truth.
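As an illustration of this protocol, a hypothetical helper for the average precision of a single ranked result list might look as follows; the paper's actual evaluation code is not given, so the names and details here are ours:

```python
def average_precision(ranking, relevant):
    # Precision is evaluated at the rank of each returned relevant
    # object; the mean of these precisions is the average precision.
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, obj in enumerate(ranking, start=1):
        if obj in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

Averaging this value over all queries yields the mean average precision reported in the result tables.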

We extracted feature signatures for the images in the aforementioned databases by applying the adaptive variant of the k-means clustering algorithm described in Section 2 in different feature spaces comprising up to seven dimensions: three color (L, a, b), two position (x, y), and two texture dimensions (χ, η) consisting of contrast χ and coarseness η. In this way, image pixels are first mapped to features in the corresponding feature space and then clustered accordingly. As the color of an image is its fundamental perceptual property, we evaluated the performance of the similarity measures on the following feature spaces: color only (L, a, b), color + texture (L, a, b, χ, η), color + position (L, a, b, x, y), and color + position + texture (L, a, b, x, y, χ, η). We fixed the parameters of the clustering algorithm to R = 30, S = 50, and M = 40. Furthermore, we excluded the black surroundings of objects in images of the Coil100 database by filtering out features which are almost black.

Fig. 3. Example images of the (a) Wang, (b) Coil100, (c) MIR Flickr, and (d) 101objects databases.

All similarity measures were evaluated using the Euclidean ground distance function, as it turned out that different ground distance functions did not influence the performance ratios of the similarity measures. For the Signature Quadratic Form Distance, we found that an exponential similarity function $a_{ij} = e^{-\alpha \cdot L_2^2(c_i, c_j)}$ exhibits the best results and, therefore, we used this similarity function throughout our experimental evaluation. We determine the best value of α dynamically with respect to the corresponding database.

We ran all experiments on a 2.33 GHz Intel machine and measured the performance of the presented similarity measures based on a Java implementation.

4.2. Retrieval Results

Tables 2 - 5 show the detailed mean average precision values for each combination of database, feature space, and similarity measure. Each value is obtained by taking the mean of the average precision values of 2,000 randomized queries which vary in their size and structure, i.e., the databases are queried with different images and each query image is also represented by feature signatures of different sizes. We highlight the highest mean average precision values of each row.

Comparing the retrieval results of the similarity measures, we observe that the mean average precision values of each similarity measure depend on the underlying feature space and database. For the Wang, Coil100, and MIR Flickr databases, the Signature Quadratic Form Distance (SQFD) exhibits the highest mean average precision values; when only color features are used, the Perceptually Modified Hausdorff Distance (PMHD) and the Earth Mover's Distance (EMD) show the best retrieval performance in terms of mean average precision. The EMD exhibits the highest mean average precision values for the 101objects database, regardless of the selected features. While the Hausdorff Distance (HD) achieves the lowest retrieval performance, the Weighted Correlation Distance's mean average precision values lie in between those of the similarity measures mentioned before.

Table 2. Mean average precision values for the Wang database.

features              SQFD   HD     PMHD   WCD    EMD    α
color                 0.586  0.317  0.439  0.567  0.580  2.9
color + texture       0.620  0.345  0.476  0.604  0.599  1.5
color + position      0.568  0.391  0.461  0.536  0.563  1.2
color + pos. + text.  0.613  0.308  0.476  0.591  0.598  0.9
mean:                 0.597  0.340  0.463  0.575  0.585

Table 3. Mean average precision values for the Coil100 database.

features              SQFD   HD     PMHD   WCD    EMD    α
color                 0.811  0.731  0.828  0.772  0.809  2.9
color + texture       0.770  0.477  0.696  0.728  0.750  1.5
color + position      0.802  0.498  0.664  0.743  0.706  0.7
color + pos. + text.  0.776  0.425  0.606  0.726  0.710  0.6
mean:                 0.790  0.533  0.699  0.742  0.744

Table 4. Mean average precision values for the MIR Flickr database.

features              SQFD   HD     PMHD   WCD    EMD    α
color                 0.322  0.318  0.322  0.321  0.322  0.7
color + texture       0.343  0.317  0.331  0.338  0.337  1.0
color + position      0.321  0.316  0.319  0.317  0.314  0.2
color + pos. + text.  0.343  0.307  0.322  0.335  0.333  0.6
mean:                 0.332  0.315  0.324  0.328  0.327

Table 5. Mean average precision values for the 101objects database.

features              SQFD   HD     PMHD   WCD    EMD    α
color                 0.106  0.060  0.076  0.097  0.109  0.4
color + texture       0.113  0.064  0.110  0.109  0.118  0.7
color + position      0.113  0.081  0.120  0.097  0.130  1.6
color + pos. + text.  0.139  0.072  0.105  0.117  0.141  1.6
mean:                 0.118  0.069  0.103  0.105  0.125

The computation time values needed to generate the retrieval results are depicted in Table 6. As the ratio of computation times is the same for all databases, we measure the similarity measures' efficiency on the MIR Flickr database, the largest one. The Hausdorff Distance and the Perceptually Modified Hausdorff Distance exhibit the lowest computation time values; their average computation times are approximately 1.3 and 2.1 seconds, respectively. While the Weighted Correlation Distance computes the results in 5.1 seconds, the Signature Quadratic Form Distance needs approximately twice as long, 10 seconds. The lowest efficiency is shown by the Earth Mover's Distance, which needs 57.2 seconds for the retrieval process.

Concluding the experiments, we state that in general the Signature Quadratic Form Distance exhibits the highest mean average precision values, while the Hausdorff Distance and the Perceptually Modified Hausdorff Distance exhibit the lowest computation times. We briefly discuss these results with a closer look at the similarity measures' properties in the next section.

Table 6. Computation time values in milliseconds for the MIR Flickr database.

features              SQFD    HD     PMHD   WCD     EMD
color                  2,374    346    502   1,253   24,281
color + texture       10,409  1,429  2,201   5,333   58,706
color + position       6,561    882  1,349   3,283   35,369
color + pos. + text.  20,666  2,657  4,168  10,526  110,527
mean:                 10,003  1,329  2,055   5,099   57,221

5. DISCUSSION

According to Table 1 in Section 3, we can classify the similarity measures into two groups. The first group includes the Hausdorff Distance and the Perceptually Modified Hausdorff Distance. While the Hausdorff Distance completely ignores the weight information of the feature signatures, the latter includes this information. Their computations gather the position information of the feature signatures only partially; thus, they are very fast to compute. In our detailed experimental evaluation, we found that the Hausdorff Distances achieve their best results when the size of the query feature signatures is approximately the same as that of the feature signatures in the database. As mentioned in the previous section, we vary the query feature signature size to obtain a realistic real-world experimental evaluation.

The second group, which handles this discrepancy between query and database more successfully, includes the Weighted Correlation Distance, the Earth Mover's Distance, and the Signature Quadratic Form Distance. These measures completely take into account the weight and position information of the feature signatures to be compared. This comes at the cost of higher computation times, which holds particularly for the Earth Mover's Distance, as it solves a costly optimization problem. We found that the similarity measures of the second group achieve good results even if the size of the query feature signature is much smaller than that of any feature signature corresponding to a data object in the database. The computation of the Weighted Correlation Distance is limited by the parameter R, which needs to be specified in advance according to the individual clusters. The Signature Quadratic Form Distance ensures that the complete position and weight information is used in the computation; it achieves the highest retrieval performance in terms of effectiveness.

6. CONCLUSIONS

In this paper, we presented new insights into state-of-the-art similarity measures applicable to feature signatures for content-based multimedia retrieval. We compared the Hausdorff Distance, the Perceptually Modified Hausdorff Distance, the Weighted Correlation Distance, the Earth Mover's Distance, and the Signature Quadratic Form Distance, and evaluated the experimental results with respect to their effectiveness as well as their efficiency. We observed that the Signature Quadratic Form Distance comes up with the highest retrieval performance in terms of effectiveness and outperforms the aforementioned similarity measures.


7. REFERENCES

[1] X. Wu, C.W. Ngo, A.G. Hauptmann, and H.K. Tan, “Real-time near-duplicate elimination for web video search withcontent and context,” IEEE Transactions on Multimedia,vol. 11, no. 2, pp. 196–207, 2009.

[2] X. Yang, Q. Zhu, and K.-T. Cheng, “Near-duplicate detec-tion for images and videos,” in Proc. of ACM Workshopon Large-scale Multimedia Retrieval and Mining, 2009,pp. 73–80.

[3] J. Law-To, L. Chen, A. Joly, I. Laptev, O. Buisson,V. Gouet-Brunet, N. Boujemaa, and F. Stentiford, “Videocopy detection: a comparative study,” in Proc. of ACM Int.Conf. on Image and Video Retrieval, 2007, pp. 371–378.

[4] C. Kim, “Content-based image copy detection,” SignalProcess.: Image Commun., vol. 18(3), pp. 169–184, 2003.

[5] A. Hampapur and RM Bolle, “Comparison of distancemeasures for video copy detection,” in Proc. of IEEE Int.Conf. on Multimedia and Expo, 2001, pp. 737–740.

[6] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas, Influences, and Trends of the New Age,” ACM Computing Surveys, vol. 40, no. 2, pp. 1–60, 2008.

[7] P. Geetha and V. Narayanan, “A Survey of Content-Based Video Retrieval,” Journal of Computer Science, vol. 4, no. 6, pp. 474–486, 2008.

[8] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-Based Image Retrieval at the End of the Early Years,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349–1380, 2000.

[9] N. Sebe, M. Lew, X. Zhou, T. Huang, and E. Bakker, “The state of the art in image and video retrieval,” Image and Video Retrieval, pp. 7–12, 2003.

[10] J. Foote, “An overview of audio information retrieval,” Multimedia Systems, vol. 7, no. 1, pp. 2–10, 1999.

[11] Michael S. Lew, Nicu Sebe, Chabane Djeraba, and Ramesh Jain, “Content-Based Multimedia Information Retrieval: State of the Art and Challenges,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 2, no. 1, pp. 1–19, 2006.

[12] Y. Rubner, C. Tomasi, and L. J. Guibas, “The Earth Mover’s Distance as a Metric for Image Retrieval,” Int. Journal of Computer Vision, vol. 40, no. 2, pp. 99–121, 2000.

[13] T. Deselaers, D. Keysers, and H. Ney, “Features for Image Retrieval: An Experimental Comparison,” Information Retrieval, vol. 11, no. 2, pp. 77–107, 2008.

[14] Remco C. Veltkamp, Mirela Tanase, and Danielle Sent, “Features in content-based image retrieval systems: a survey,” in State-of-the-Art in Content-Based Image and Video Retrieval, 1999, pp. 97–124.

[15] W. K. Leow and R. Li, “The analysis and applications of adaptive-binning color histograms,” Computer Vision and Image Understanding, vol. 94, no. 1-3, pp. 67–91, 2004.

[16] Y. Rubner, Perceptual metrics for image database navigation, Ph.D. thesis, 1999.

[17] R. Hu, S. M. Ruger, D. Song, H. Liu, and Z. Huang, “Dissimilarity measures for content-based image retrieval,” in Proc. of IEEE Int. Conf. on Multimedia and Expo, 2008, pp. 1365–1368.

[18] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, “Comparing images using the Hausdorff Distance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 9, pp. 850–863, 1993.

[19] B. G. Park, K. M. Lee, and S. U. Lee, “Color-based image retrieval using perceptually modified Hausdorff distance,” Journal on Image and Video Processing, vol. 2008, pp. 1–10, 2008.

[20] C. Beecks, M. S. Uysal, and T. Seidl, “Signature Quadratic Form Distances for Content-Based Similarity,” in Proc. of ACM Int. Conf. on Multimedia, 2009, pp. 697–700.

[21] C. Beecks, M. S. Uysal, and T. Seidl, “Signature Quadratic Form Distance,” in Proc. of ACM Int. Conf. on Image and Video Retrieval, 2010, to appear.

[22] C. Beecks, M. S. Uysal, and T. Seidl, “Efficient k-Nearest Neighbor Queries with the Signature Quadratic Form Distance,” in Proc. of Int. Workshop on Ranking in Databases, 2010, pp. 10–15.

[23] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz, “Efficient and Effective Querying by Image Content,” Journal of Intelligent Information Systems, vol. 3, no. 3/4, pp. 231–262, 1994.

[24] J. Z. Wang, Jia Li, and G. Wiederhold, “SIMPLIcity: semantics-sensitive integrated matching for picture libraries,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 9, pp. 947–963, 2001.

[25] S. Nene, S. K. Nayar, and H. Murase, “Columbia Object Image Library (COIL-100),” Tech. Rep., Department of Computer Science, Columbia University, 1996.

[26] M. J. Huiskes and M. S. Lew, “The MIR Flickr retrieval evaluation,” in Proc. of ACM Int. Conf. on Multimedia Information Retrieval, 2008, pp. 39–43.

[27] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories,” in Workshop on Generative-Model Based Vision, 2004.

[28] S. Marchand-Maillet and M. Worring, “Benchmarking Image and Video Retrieval: an Overview,” in Proc. of ACM Int. Workshop on Multimedia Information Retrieval, 2006, pp. 297–300.
