IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 4, NO. 2, JUNE 2002

An Image Retrieval System With Automatic Query Modification

Gaurav Aggarwal, Ashwin T. V., and Sugata Ghosal

Abstract—Most interactive “query-by-example” based image retrieval systems utilize relevance feedback from the user for bridging the gap between the user’s implied concept and the low-level image representation in the database. However, traditional relevance feedback usage in the context of content-based image retrieval (CBIR) may not be very efficient due to a significant overhead in database search and image download time in client-server environments. In this paper, we propose a CBIR system that efficiently addresses the inherent subjectivity in user perception during a retrieval session by employing a novel idea of intra-query modification and learning. The proposed system generates an object-level view of the query image using a new color segmentation technique. Color, shape and spatial features of individual segments are used for image representation and retrieval. The proposed system automatically generates a set of modifications by manipulating the features of the query segment(s). An initial estimate of user perception is learned from the user feedback provided on the set of modified images. This largely improves the precision in the first database search itself and alleviates the overheads of database search and image download. Precision-to-recall ratio is improved in further iterations through a new relevance feedback technique that utilizes both positive as well as negative examples. Extensive experiments have been conducted to demonstrate the feasibility and advantages of the proposed system.

Index Terms—Color image segmentation, content-based image retrieval, intra-query learning, intra-query modification, query refinement, relevance feedback.

I. INTRODUCTION

RAPID growth in the number and size of image databases has created the need for more efficient search and retrieval techniques, since conventional database search based on textual queries can only provide at best a partial solution to the problem. Either the database images are often not annotated with textual descriptions, or the vocabulary needed to describe the user’s implied concept does not exist (or, at least, is not known to the user). Moreover, there is rarely a unique description that can be associated with a particular image. Thus, recently there has been immense activity in building direct content-based image search engines. In content-based search engines, each image is represented using features such as color, texture, shape or position. A database consisting of feature vectors of all images is created.

Manuscript received April 12, 2001; revised February 26, 2002. The associate editor coordinating the review of this paper and approving it for publication was Dr. Ahmed Tewfik.

G. Aggarwal was with the IBM India Research Laboratory, New Delhi 110016, India. He is now with Broadcom, Bangalore 560025, India (e-mail: [email protected]).

Ashwin T. V. and S. Ghosal are with the IBM India Research Laboratory, New Delhi 110016, India (e-mail: [email protected]; [email protected]).

Publisher Item Identifier S 1520-9210(02)04869-1.

Images in the database that are Nearest Neighbors to the query image according to a similarity metric in the feature space are retrieved. There are two prevalent approaches to measuring similarity: a) geometric distance-based [29] and b) probabilistic likelihood-based [21].

The first step in designing a CBIR system is the selection of an appropriate feature space so that images that are “close” in feature space are also perceptually close to the user. However, a fully automatic, rigid approach to image retrieval cannot satisfy the information need of a diverse user population [29]. Therefore, relevance feedback during a retrieval session has emerged as a de facto standard methodology in recent CBIR systems for bridging the gap between the user’s high-level concept (e.g., sunset) and the low-level representation of images in the feature space (e.g., dominant orange-yellow color distribution). Given the user’s preferences to a set of retrieved images, the goal is to learn his notion of similarity by adjusting the parameters of the chosen similarity metric, and improve the relevance of the retrieved images to that user in successive iterations. This process of repeated database search, however, becomes a bottleneck with increase in database size. Furthermore, when the database is located remotely, say over the World Wide Web, downloading irrelevant images in each search iteration significantly slows down the retrieval speed of the image of interest. It is, therefore, desirable to have a capability of understanding user perception from the query image itself at the client-site. This increases the relevance of the images retrieved from the database, thereby reducing the time required to search the images of interest.

In this paper, we propose a new CBIR system called iPURE (Perceptual and User-friendly REtrieval) that incorporates a novel methodology of intra-query modification and learning of user perception at the client-site in addition to relevance feedback in successive iterations. An object-level view of the query image is first obtained using image segmentation. Once the user selects segment(s) of interest, a set of modified images is automatically generated by the system at the client-site. Initial user perception is learned based on the user feedback on the set of modifications. A new color image segmentation algorithm has been developed that is reasonably accurate and at the same time fast for an interactive application. In addition, a novel relevance feedback technique that explicitly uses both positive and negative examples to improve retrieval performance is incorporated in this system.

A. Relationship With Existing Retrieval Systems

All CBIR systems essentially perform search and retrieval operations in image databases using image-centric features like

color, texture, shape, position, etc., rather than the traditional way of searching annotated databases using keywords. Individual systems have typically focused on some key issues of this multidisciplinary problem. In order to put the proposed iPURE system in the right perspective, we present a high-level categorization of various image retrieval systems in Fig. 1 based on three criteria, viz., segmentation, relevance feedback, and query modification.¹

Research activities in CBIR have progressed in three main directions. Initial systems, built using carefully selected global image features and fixed similarity metrics, perform well for retrieving images containing “stuff” and “scenes” [12], [26], [34] where the entire image is relevant. However, these systems are not suited for searching objects where large parts of the image are irrelevant. Thus, a second class of systems has been built around image segmentation. These systems are designed to retrieve “things” instead, by extracting local segment features [31], [14], [7], [20] and matching the object-level view of database images using some predetermined similarity metric in terms of segment features. Both region-based and contour-based shape features have been investigated for object-based image retrieval. While region-based features are more robust, contour-based features can result in more precise retrieval, especially in the presence of occlusion [31], [14]. CBIR systems with a predefined notion of similarity, irrespective of the features they employ, are efficient for searching homogeneous image databases, where the notion of perceptual similarity during database querying is implicitly obvious. An example query image always leads to the same set of retrieved images even though different users may have different requirements.

The performance of image-centric retrieval systems with a fixed similarity metric is not satisfactory essentially due to the gap between the user’s implied concept and low-level visual features, and also due to the inherent subjectivity of human perception. MARS [29] is one of the first retrieval systems to employ relevance feedback, and estimates the user perception of the query image by weighting low-level features. PicHunter [9], FourEyes [22], SurfImage [23], and ImageRover [32] are some of the other CBIR engines that employ relevance feedback. All these systems use global features for database indexing and retrieval.

We propose a novel methodology here that uses automatic modifications of the query image itself and learning of user perception through intra-query learning on this set of modified images. Since the images are generated at the client-end itself, it saves the database search and image download time, which is a significant overhead in the current relevance feedback based image retrieval paradigm. In comparison, VisualSEEk [34] only allows a user to manually modify the global features of the image. Our technique modifies the retrieval parameters and synthesizes training images automatically. User responses on this set of modified images are used to learn the initial weights of retrieval parameters. In comparison, systems like Blobworld [7] and QBIC [12] depend on the user’s assignment of weights to different image properties, e.g., color, texture, shape, position, etc.

¹Note that distances of various systems from the origin do not bear any special significance, and Fig. 1 is for bringing out the key features in some of the current systems and our relationship to these systems.

Fig. 1. High-level classification of CBIR systems.

A typical user is expected to be oblivious to these image-centric terms. Actually, it is not easy even for an image processing expert to properly specify the feature weights without a priori knowledge of the retrieval strategy and feature distribution in the database. Furthermore, our system uses an object-level view of the image via image segmentation and hence, allows object-level modifications and retrieval of “things.”

The rest of this paper is organized as follows. The overview of the proposed iPURE system is presented in Section II. The iPURE segmentation technique and feature extraction are described in Section III. We present the iPURE intra-query modification methodology in Section IV. Section V deals with the iPURE intra-query learning and relevance feedback technique. The effectiveness and performance of the iPURE system is reported in Section VI. Finally, we summarize the present study and mention some future research issues in Section VII.

II. iPURE SYSTEM ARCHITECTURE

The iPURE system is a segmentation-based image retrieval system that works in a client-server environment. The system architecture is shown in Fig. 2. Database creation and updating is an offline task where images are first segmented using a new reasonably accurate yet fast color segmentation scheme. Feature-based segment descriptors as well as spatial segment descriptors are extracted for each segment in the image. These feature vectors are ingested along with the segmentation results into the database.

The server segments the query image given by the user in real-time (in an interactive sense) and sends the segmentation results to the client. The user can select one or more segments of interest and can then either proceed directly to searching the database, or go through the intra-query modification and learning. Our retrieval strategy for multiple segments is similar to that of Blobworld [7] and VisualSEEk [34] and consists of two steps. In the first step, images containing the most similar segments in terms of individual similarity are identified. Each segment is equally weighted to compute the rank of the image. In the second step, spatial relationship among the segments is used to rank the images identified in the first step. For a single query segment, only the first step is performed.
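As an illustration of this two-step strategy, the following Python sketch ranks database images by matching each query segment to its nearest database segment with equal weights, and then re-ranks the shortlist by a spatial-consistency score. The data structures, the helper spatial_score, and the distance function are assumptions made here for illustration, not the authors' implementation.

import numpy as np

def segment_distance(q_feat, s_feat):
    # Placeholder segment-level distance; Eq. (5) would be used in practice.
    return float(np.linalg.norm(np.asarray(q_feat, float) - np.asarray(s_feat, float)))

def rank_images(query_segments, database, spatial_score, shortlist=50):
    # Step 1: match every query segment to its closest segment in each image,
    # weighting all query segments equally.
    scored = []
    for image_id, segments in database.items():
        d = sum(min(segment_distance(q, s) for s in segments)
                for q in query_segments) / len(query_segments)
        scored.append((image_id, d))
    shortlisted = sorted(scored, key=lambda t: t[1])[:shortlist]
    # Step 2: re-rank the shortlist by agreement of the spatial layout
    # (top-down / left-right relations) with the query layout.
    return sorted(shortlisted, key=lambda t: -spatial_score(t[0]))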

Fig. 2. Architecture of the iPURE image retrieval system.

One of the key innovations in the iPURE system is the intra-query modification and learning phase in which feature re-weighting and query redefinition are performed before searching the database. The client Java applet generates a set of images by modifying the values of the retrieval features of the query segments. The user feedback on these modified images is sent to the server, which employs relevance feedback techniques to estimate the retrieval parameters of the similarity metric used for ranking segments in the database. Modifications also enable the user to redefine the query point by accepting a modified image and giving a negative feedback on the original query image itself. The learned retrieval parameters are used during the database search and the top-k nearest neighbors to the query point are sent to the client. The user may again give feedback on the retrieved images that will be used to further refine the retrieval parameters and search the database. Successive iterations of this relevance feedback loop help the user to converge to her requirements.

III. IMAGE SEGMENTATION IN iPURE

The proposed iPURE system employs a novel color image segmentation scheme that is fast and reasonably accurate, in order to support query-time generation of the object-level view of a user-provided initial query image and modification of the same during an interactive retrieval session, by leveraging the strengths of region-based and edge-based approaches. This gives the user the capability to start the search from his own query image.

Edge-based segmentation alone suffers due to the presence of spurious and missing edge pixels, whereas stand-alone region-based approaches, in general, offer limited performance due to the difficulty in finding initial seed points in image space or modes in feature space. Thus, integrated segmentation techniques that incorporate both edge-based and region-based information have been proposed for both gray-level and color images. One of the earliest integrated segmentation techniques for gray-level images was proposed by Pavlidis et al. [25]. In this approach, segments are obtained by using a region-growing approach, and then edges between regions are eliminated or modified based on contrast, gradient and shape of the boundary. Interested readers are referred to [24] and [33] for relatively recent surveys of image segmentation techniques. In the context of color image retrieval, Saber and Tekalp proposed a new integrated method for combined color image segmentation and edge linking [30]. In their proposed method, an initial segmentation is obtained using color information alone. Spatial contiguity of the resulting segments is ensured by employing a Gibbs random field model based segmentation map. Next, spatial color edge locations are determined. Finally, regions in the segmentation map are split and merged by a region-labeling procedure to enforce their consistency with the edge map. Fan et al. have also recently reported an integrated color segmentation method for automatic face detection. Color edges are first obtained by combining an isotropic edge detector and an entropic thresholding technique. Significant geometric structures in the edge map are analyzed for generating initial seeds for seeded region growing. The results of edge processing and region growing are finally integrated to provide homogeneous image regions with accurate and closed boundaries.

The basic idea behind the iPURE approach is to generate an initial over-segmentation of the image by detecting the dominant color features, and then merge nonobvious contiguous segments using edge information. In order to generate perceptually acceptable segments and to aid query modification, the segment boundaries are further regularized by solving a quadratic optimization problem.

A. Feature-Driven Color Region Segmentation

The iPURE system first generates a region-based segmentation of the image by analyzing the global histogram of the intensities in the LUV space. The LUV color space is chosen for its approximate uniformity in a perceptual sense, and its ability to decouple illumination and color information. A simple nonparametric procedure using the mean shift algorithm [8] is used for estimating density gradients and finally to robustly determine the dominant color modes of the histogram. Segmentation parameters are chosen to generate an over-segmentation of the input image.
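A minimal sketch of mean-shift mode seeking on LUV color samples is given below, assuming a flat kernel of bandwidth h; the bandwidth, tolerance, and mode-merging rule are illustrative choices and not the exact procedure of [8].

import numpy as np

def mean_shift_modes(luv_pixels, h=8.0, tol=1e-3, max_iter=50):
    # luv_pixels: (N, 3) array of L, U, V samples (e.g., a subsampled image).
    luv_pixels = np.asarray(luv_pixels, dtype=float)
    converged = []
    for x in luv_pixels:
        m = x.copy()
        for _ in range(max_iter):
            window = luv_pixels[np.linalg.norm(luv_pixels - m, axis=1) <= h]
            if window.size == 0:
                break
            new_m = window.mean(axis=0)        # shift to the local mean
            if np.linalg.norm(new_m - m) < tol:
                break
            m = new_m
        converged.append(m)
    # Merge convergence points that are close together into distinct color modes.
    modes = []
    for m in converged:
        if all(np.linalg.norm(m - d) > h / 2 for d in modes):
            modes.append(m)
    return modes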

B. Edge-Based Color Region Merging

Since a histogram essentially captures the global characteristics of an image, a relatively large segment with a gradual change in color is frequently segmented into multiple perceptually nonobvious segments. There is an inherent difficulty in finding modes of the histogram along with preset values of minimum and maximum segment-size that can affect the image segmentation process. Sufficient under-segmentation of the image can merge these segments, but only at the expense of unacceptable merging of segments with distinct colors with one another. iPURE performs edge-based post-processing of the over-segmented results to properly merge the segments.

First, a histogram of gradient magnitudes, obtained using Sobel operators, is computed at segment boundary pixels for each of $L$, $U$, and $V$. A high threshold $T_h$ is obtained so that about 10% of the total boundary pixels have gradient magnitudes greater than $T_h$ for each of $L$, $U$, and $V$. Pixels with gradient magnitude greater than this high threshold indeed correspond to perceptually obvious strong edges in a wide variety of images. Now a low threshold $T_l$ is chosen to be moderate, around 40% of the high threshold $T_h$. A lower gradient-magnitude threshold is needed only in poorly captured images with very wide local contrast variations (i.e., partially over- and under-exposed). Next, for each segment $i$, the fraction $f_{ij}$ of the boundary pixels with gradient magnitude greater than $T_l$ is computed for each of its contiguous segments $j$. If $f_{ij}$ is less than, say, 50% for each of $L$, $U$, and $V$, it is inferred that region-based and edge-based boundaries do not substantiate each other and, hence, the segments $i$ and $j$ are merged. This idea of threshold selection is very similar in spirit to hysteresis thresholding, originally proposed in [6].

Perceptual quality of the segmentation is further improved using results from vision psychology. It has been established that the human eye cannot distinguish subtle color differences when the luminance value is either too high or too low. Hence, if the average luminance values of contiguous segments $i$ and $j$ are both less than or greater than some low and high empirical thresholds, respectively, the fractional threshold for $L$, $U$, and $V$ is increased from, say, 50% to 65%.

Note that $T_h$ and $T_l$ cannot be selected very reliably for two contiguous segments that are quite close in terms of their color properties. Thus, in order to improve the robustness of the segmentation process, it is ensured while merging two segments that the vector difference $\Delta C$ of the mean color vectors of the candidate segments does not exceed, say, 20. In addition, if the difference between the mean color vectors of the candidate segments is less than 10, then the segments are merged irrespective of the edge strengths between them. The vector difference $\Delta C$ is calculated using the following formula:

$$\Delta C = \sqrt{(L_i - L_j)^2 + (U_i - U_j)^2 + (V_i - V_j)^2}$$

where $(L_i, U_i, V_i)$ and $(L_j, U_j, V_j)$ are the mean colors of two contiguous segments $i$ and $j$. Usually one iteration is sufficient for removing most of the perceptually nonobvious segments from an image. Our choice of thresholds has been tested with over 2000 images in the Corel dataset.

C. Regularization of Segment Shapes

The human eye generally tends to perceive smooth shapes as opposed to sudden local variations in shape. In essence, the human eye searches for “regularity” in an image segment more often than not. However, segments often have complicated contours. Inherent imaging noise, as well as difficulty in histogram mode finding and threshold setting during edge-based post-processing, further leads to even less perceptually desirable image segmentation. In the iPURE system, segment boundaries are regularized by formulating a quadratic optimization problem that minimizes the total length of segment boundaries, while constraining that the segments themselves are minimally modified.

Let $p$ denote an image pixel, and $S_k$ represent a segment. Then, $x_{p,k} = 1$ if the pixel $p$ belongs to the $k$-th segment $S_k$, and $x_{p,k} = 0$ otherwise. This constraint corresponds to the constraint function

(1)

The first term guarantees that a pixel belongs to only one segment, while the second term ensures that $x_{p,k}$ is either 0 or 1. Now, let $n_k$ be the number of pixels in the original $k$-th segment. Thus we need to minimize the cost function:

(2)

where $\mathcal{N}(p)$ denotes the eight-connected neighbors of pixel $p$. The second term is minimized if the original segment sizes are not changed, whereas the third term is minimized if the neighbors of each pixel belong to the same segment. Thus, minimization of the third term alone would lead to a single segment in the image, which in turn results in a high penalty associated with the second term. We solve this optimization problem, described by (2), by mapping it onto a Hopfield-type neural network as described in [27].
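A form of the constraint function (1) and the cost function (2) that is consistent with the description above, written with binary membership variables $x_{p,k}$ and assumed positive weighting constants $\lambda_1$ and $\lambda_2$ (so it is a sketch of the formulation rather than necessarily the authors' exact equations), is:

$$E_{c} = \sum_{p}\Big(\sum_{k} x_{p,k} - 1\Big)^{2} + \sum_{p}\sum_{k} x_{p,k}\,\big(1 - x_{p,k}\big)$$

$$E = E_{c} + \lambda_1 \sum_{k}\Big(\sum_{p} x_{p,k} - n_k\Big)^{2} + \lambda_2 \sum_{p}\sum_{k}\sum_{q \in \mathcal{N}(p)} x_{p,k}\,\big(1 - x_{q,k}\big)$$

Here the first sum of $E_{c}$ vanishes only when each pixel is assigned to exactly one segment and the second only for binary assignments; the $\lambda_1$ term penalizes deviation from the original segment sizes $n_k$, and the $\lambda_2$ term counts eight-connected neighbor pairs assigned to different segments, i.e., it approximates the total boundary length.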

Note that the edge-based processing and the shape regularization process may be combined in the cost function by properly incorporating the image gradients in the third term of (2).

Fig. 3. Post-processing of color-based segmentation results for an image containing horses. (a) Original image, (b) over-segmented, (c) region-merged, and (d) shape regularized.

A sparse implementation of the Hopfield network, proposed in [13], [27], is employed to reduce the complexity of the Hopfield state updating to $O(N_b)$, where $N_b$ is the number of segment boundary pixels. The states of the network are initialized according to the initial edge-processed segmentation.

Fig. 3(a) shows the original image of multiple horses of different colors and sizes, taken from the “horses” category of Corel stock pictures. Fig. 3(b) shows the twelve segments generated by the mean-shift algorithm. Over-segmentation has resulted in the background area being split into eight segments due to the shading and illumination effects. Because of a strong shadowing effect, part of the larger horse has been merged with the head part of the smaller horse segment. Edge-based post-processing merges the nonobvious background segments into one background segment that actually corresponds to human perception, as shown in Fig. 3(c). Finally, the segment boundaries are regularized, i.e., smoothed, as shown in Fig. 3(d). This regularization of the segment boundaries aids in feature extraction and shape matching. The effectiveness of the proposed integrated segmentation technique and post-processing is further demonstrated on other complex outdoor scenes, as shown in Figs. 4 and 5.

The entire segmentation process, including the mean-shift algorithm and edge-based post-processing, takes around 5 s for a 128 × 192 image on a standard workstation. Edge-based post-processing and shape regularization are dependent on the number of segment boundaries in the region-based segmentation. For a wide variety of stock photography images, it takes 1–2 s for edge-based post-processing, and around 1–2 s for 20 iterations of the Hopfield network. Note that without the initial segmentation and the sparse implementation of the Hopfield network, it takes tens of minutes of execution time for the Hopfield network to generate an acceptable segmentation.

D. Feature Extraction and Database Creation

Segmentation provides an object-level view of an image, i.e., an image can be represented by the union of the segments generated by the iPURE segmentation module.

Fig. 4. Segmentation results for a beach scene. (a) Original image, (b) over-segmented, (c) region-merged, and (d) shape regularized.

Fig. 5. Color-based segmentation of a mountain image. (a) Original image, (b) over-segmented, (c) region-merged, and (d) shape regularized.

Thus, an image in the database is represented by a set of segment descriptors associated with its constituent segments.

Matching and retrieval are performed based on the segment descriptors. The feature-based segment descriptor in the iPURE system currently comprises the average LUV color of the segment, position ($x$ and $y$ centroid coordinates w.r.t. the image), size (number of pixels in the segment), orientation axis, and three shape moment invariants. The shape moments in iPURE are calculated using the normalized central moments, which are defined as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \text{where} \quad \gamma = \frac{p+q}{2} + 1 \quad \text{for} \quad p + q \ge 2,$$

using the central moments $\mu_{pq}$, which are defined as

$$\mu_{pq} = \sum_{x}\sum_{y} (x - \bar{x})^{p}\,(y - \bar{y})^{q}$$

where $\bar{x}$ and $\bar{y}$ are the coordinates of the segment centroid and the summation is over the pixels of the segment.

The three shape moments $\phi_1$, $\phi_2$, and $\phi_3$ are computed using these normalized central moments as follows [18]:

$$\phi_1 = \eta_{20} + \eta_{02}, \qquad \phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2, \qquad \phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \qquad (3)$$

This set of moments is invariant to translation, rotation and scale change. The orientation angle $\theta$, defined as the angle of the axis of the least moment of inertia, is computed as follows [18]:

$$\theta = \frac{1}{2}\tan^{-1}\!\left(\frac{2\mu_{11}}{\mu_{20} - \mu_{02}}\right) \qquad (4)$$

The spatial segment descriptor comprises the spatial extent (coordinates of the minimum bounding rectangle) and contiguity information (which segments touch the segment at the boundary). Minimum bounding rectangles are used to determine top-down and left-right relationships.

Normalization is performed for each feature dimension independently, based on characteristics of its histogram, to make the ranges approximately equal. We are currently investigating more sophisticated feature extraction and representation techniques for improving the retrieval performance [10].

The iPURE system uses the Mahalanobis distance as the similarity metric for matching segment descriptors, which is defined as

$$D(\mathbf{x}, \mathbf{q}) = (\mathbf{x} - \mathbf{q})^{T}\,\mathbf{M}\,(\mathbf{x} - \mathbf{q}), \quad \text{where} \quad \mathbf{M} = \mathbf{C}^{-1} \qquad (5)$$

and $\mathbf{x}$ and $\mathbf{q}$ denote the feature vectors of a database segment and the query segment, respectively, and $\mathbf{C}^{-1}$ is the inverse of the correlation matrix.
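A minimal sketch of evaluating (5) for one database segment follows; the variable names are illustrative and the correlation matrix is assumed to have been estimated offline.

import numpy as np

def quadratic_distance(x, q, correlation):
    # x: database segment features, q: query segment features.
    M = np.linalg.inv(np.asarray(correlation, float))   # M = C^{-1} in (5)
    d = np.asarray(x, float) - np.asarray(q, float)
    return float(d @ M @ d)

With a diagonal correlation matrix this reduces to the weighted Euclidean distance used for intra-query learning in Section V-A.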

IV. iPURE QUERY MODIFICATION

Traditionally, relevance feedback based retrieval systems have used images retrieved from the database to obtain user feedback. In contrast, the iPURE system generates a set of training examples from the query image itself at the client-site by modifying the retrieval features of the query segment(s). The user feedback on this set of synthetic images is used to generate an initial estimate of the matrix $\mathbf{M}$ and query point $\mathbf{q}$ in (5), which are then used to search the database. This substantially improves the retrieval results since the estimated ellipsoid approximates the user’s requirements. The user may give feedback on the retrieved results that will be used to further refine the ellipsoid to the user requirements through relevance feedback techniques. The techniques used to generate the modified images are described in this section.

A. Single Segment Modification

When the user selects a single segment in the query image to start the search process, modifications are made in the color, position, size, and orientation features of the selected segment, and the corresponding images are synthesized automatically. The retrieval parameters are then estimated from the corresponding user responses to the modified segments. To effectively prune the number of modifications, a large change is made in each independent dimension of the feature vector. Acceptance of an image with a large modification in a feature dimension implies low relevance of the corresponding feature in user perception.

A perceptually large change in average color is achieved by simply bit flipping (taking the 1’s complement of the original) the image in the RGB domain. The LUV transformation of the new RGB value becomes the color component of the feature vector of the modified segment. The bit-flipping procedure ensures that the original texture is preserved in the modified segment. If the original query and the color-flipped modified image are both acceptable to the user, then the importance of color during retrieval is reduced.
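A minimal sketch of the color-flip modification is shown below; the mask-based interface is an assumption made for illustration.

import numpy as np

def color_flip(rgb_image, segment_mask):
    # rgb_image: HxWx3 uint8 array; segment_mask: HxW boolean array selecting
    # the pixels of the query segment. Bit flipping (1's complement) changes
    # the average color drastically while preserving the original texture.
    modified = rgb_image.copy()
    modified[segment_mask] = 255 - modified[segment_mask]
    return modified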

Position modification for the chosen segment includes displacing it to extreme horizontal and vertical positions within the image. The selected segment is displaced so that it remains entirely within the image. If a user finds such an image acceptable, then it implies that segment position has less importance in the user perception, and consequently the weights for the centroid dimensions are lowered.

Size modification includes scaling the segment about the segment centroid by predetermined scale-up and scale-down factors. For the largest scale factor, the modified segment fills one of the dimensions of the image. If the selected segment touches either the horizontal or vertical boundaries, no scaling up is possible. In this case, the segment is only scaled down to a fraction of its original size. Different scaling factors can also be used for the $x$- and $y$-dimensions. Acceptability of size-modified images implies irrelevance of the size dimension for the user.

Orientation modification includes rotating the selected segment about the segment centroid by a fixed angle. If the modified segment is acceptable to the user, the weight of the orientation dimension is lowered during retrieval.

The system assumes orthogonality of features in order to prune the modification space. Strictly speaking, if the features are integral then independent modifications may not be sufficient to capture the user perception accurately. In such a case, additional modified images in which two or more features are simultaneously modified would be needed. Obviously, there is a tradeoff between the amount of user feedback desired for better

understanding of user perception and user patience for giving feedback on modified images.

B. Multiple Segment Modification

Typically, segmentation-based CBIR systems treat multiple segment queries as a simple Boolean AND/OR of the query segments. However, simple Boolean AND/OR alone is not sufficient to describe multiple segments. Often, there is a semantic relationship between the segments of interest, since the segmentation algorithm generally does not segment semantic objects in an image as a single segment. A user may, thus, select multiple segments to capture his notion of the query object. For example, a color-based segmentation will segment a multicolor national flag into multiple segments, thereby destroying the object semantics.

The iPURE system performs multiple-segment modifications to estimate the relative importance of the spatial relationships, and to perform query expansion, if needed. Independent modifications of the features of each segment generate a large number of modified images. The concept of a semantic object is employed in iPURE for pruning the set of modifications. Frequently, when the user selects multiple segments of interest, these form a semantic object, where different segments satisfy some spatial constraints. A semantic object represents a group of segments whose relative shape, size, and spatial organization as a whole defines the user’s notion of the query object. For example, when a user selects the multiple segments of a multicolor flag, generating independent modifications for these segments is not at all efficient for learning the retrieval parameters.

Our proposed solution is to infer from the user’s feedback whether the multiple segments of interest form 1) a semantic object where relative shape, size, and spatial organization of segments are strictly preserved, e.g., a multicolor flag, 2) multiple objects of interest, e.g., an apple and an orange, or 3) an object and background, e.g., sun and sky. Thus, when multiple segments are of interest to the user, an initial modification is made to verify the hypothesis that the segments belong to one of these three cases.

1) Manipulation of Segment Contiguity: Let $n$ be the number of segments selected by the user. Define an incidence matrix $A$ of size $n \times n$, where $A_{ij}$ is the number of adjacent pixels between segment $i$ and segment $j$. Note that $A$ is a symmetric matrix and $A_{ii}$ is equal to the size of segment $i$. A scheme for reducing contiguity between the segments of interest is given as follows (a code sketch of this scheme is given after the steps below).

Step 1) Identify any noncontiguous segment $i$, i.e., one with $A_{ij} = 0$ for all $j \ne i$. Remove the $i$-th row and column of the incidence matrix $A$. Decrement $n$. If $n \le 1$, stop.

Step 2) Find the maximally contiguous segment $j$, i.e., the segment with the largest total adjacency $\sum_{k \ne j} A_{jk}$. If more than one exist, choose the larger segment. In case of a tie, choose a segment randomly.

Step 3) Compute the convex hull of the remaining segments. Move the maximally contiguous segment outside the convex hull.

Step 4) Remove the $j$-th row and column from the matrix $A$. Decrement $n$. If $n \le 1$, stop. Else go to Step 2.

Fig. 6. Distorting the semantics of an object. (a) Segments of interest form a triangular shape in the query image. (b) The maximally contiguous segment is moved in the modified image. (c) Selected segments are scaled down about their centroids. (d) Segments scaled down about their contact points with the maximally contiguous segment.
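The code sketch referred to above is given here; the convex-hull test and the routine that actually relocates a segment within the image are left as placeholder callables, and the index bookkeeping is an illustrative rendering of Steps 1–4, not the authors' implementation.

import numpy as np

def reduce_contiguity(A, segment_sizes, move_outside_hull):
    # A: n x n incidence matrix (A[i, j] = number of adjacent pixels between
    # segments i and j); segment_sizes: sizes of the selected segments;
    # move_outside_hull(j, others): places segment j outside the convex hull
    # of the segments listed in `others`.
    A = np.asarray(A, dtype=float)
    active = list(range(A.shape[0]))
    while len(active) > 1:
        # Step 1: drop any segment with no contiguous neighbor.
        for i in [i for i in active
                  if all(A[i, j] == 0 for j in active if j != i)]:
            active.remove(i)
        if len(active) <= 1:
            break
        # Step 2: pick the maximally contiguous segment (ties broken by size).
        j = max(active, key=lambda i: (sum(A[i, k] for k in active if k != i),
                                       segment_sizes[i]))
        # Step 3: move it outside the convex hull of the remaining segments.
        move_outside_hull(j, [i for i in active if i != j])
        # Step 4: remove it and continue with the remaining segments.
        active.remove(j)
    return active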

One limitation with the above scheme is that in some cases, the maximally contiguous segment cannot be moved outside the convex hull while keeping it entirely within the image. Another approach guaranteed to reduce contiguity is to scale down the selected segments about their centroids, as shown in Fig. 6(c). However, scaling does not preserve the initial size of the selected segments. A third alternative is to keep the maximally contiguous segment intact and scale down the other segments about a contact point with the maximally contiguous segment, as shown in Fig. 6(d). This maintains contiguity but distorts the semantic shape of the multiple segments.

If the modified images shown in Fig. 6 are not acceptable to the user, a semantic segment is created, and the single segment modification scheme is applied to the semantic segment. Additionally, a modification is performed to estimate the importance of the shape of the semantic segment for retrieval. The average color of the semantic segment is modified to be the average of the colors of the selected segments. The importance of the semantic shape is increased over the individual colors of the segments if the user is insensitive to this modification. If the original query is also relevant to the user, then query expansion is performed. In case the color-modified image is not acceptable to the user, the importance of individual colors is maintained for retrieval. When the modified image shown in Fig. 6(b) or (c) is acceptable to the user, it is inferred that the selected segments are independent, and the single segment modification scheme is applied to each of these. The importance of contiguity is reduced during retrieval. The importance of the top-down, left-right relationship is estimated by interchanging the positions of the selected segments. The importance of enclosure during retrieval is estimated by moving the enclosed segment outside the segment that encloses it. The iPURE system also prunes the modification space by not performing scaling, translation, and rotation for segments that are labeled as background regions using a set of heuristics [2].

V. iPURE INTRA-QUERY LEARNING AND RELEVANCE FEEDBACK

Efficient CBIR systems employ learning techniques based on user feedback to improve retrieval performance. MindReader [17] formulated learning of user perception as an optimization problem to estimate parameters of the distance metric to minimize the sum of distances of relevant examples from the query. Using the quadratic distance metric

$$D(\mathbf{x}_i, \mathbf{q}) = (\mathbf{x}_i - \mathbf{q})^{T}\,\mathbf{M}\,(\mathbf{x}_i - \mathbf{q}) \qquad (6)$$

the problem is written as

$$\min_{\mathbf{q},\,\mathbf{M}} \; \sum_{i} \pi_i\, D(\mathbf{x}_i, \mathbf{q}) \quad \text{subject to} \quad \det(\mathbf{M}) = 1$$

where $\pi_i$ represent the relevance scores that a user associates with the relevant images. It has been shown that the minimum volume ellipsoid is the optimal solution of the above problem. The optimal parameters are obtained as

$$\mathbf{q} = \frac{\sum_i \pi_i\, \mathbf{x}_i}{\sum_i \pi_i} \qquad (7)$$

$$\mathbf{M} = \big[\det(\mathbf{C})\big]^{1/d}\, \mathbf{C}^{-1} \qquad (8)$$

$$C_{jk} = \frac{\sum_i \pi_i\,(x_{ij} - q_j)(x_{ik} - q_k)}{\sum_i \pi_i} \qquad (9)$$

where $\mathbf{C}$ is the weighted covariance matrix and $d$ is the dimension of the feature space. MARS [29], one of the earliest relevance feedback based CBIR systems, assumes feature independence, i.e., a diagonal $\mathbf{M}$, to reduce the similarity metric to a weighted Euclidean distance. Clearly, a MindReader based approach would require a large number of images on which feedback would need to be provided to compute the values for the complete matrix $\mathbf{M}$. Hence, Rui [28] proposed a hierarchy of features that limits the interaction between features. This significantly reduces the number of parameters to be learned, which results in more robust parameter estimates.

However, these systems learn only from the images marked as relevant and ignore the information provided by the nonrelevant images; e.g., the MindReader system requires that the relevance scores $\pi_i$ necessarily be positive, i.e., the examples need to be relevant. Typically, these algorithms converge rapidly without adequate exploration of the feature space, i.e., the number of relevant examples retrieved saturates in a few iterations and at the same time some nonrelevant images continue to be retrieved in successive iterations. Further, they expect a user to assign the relevance score to each relevant image. We believe that this is similar to asking the user to coherently rank the images, which may be a nontrivial task even for an experienced user. Others, like Nastar [21] and Brunelli [5], also incorporate nonrelevant examples along with the relevant examples for learning user perception using nonparametric models.

We propose a new relevance feedback algorithm in the iPURE system that uses both relevant as well as nonrelevant examples to learn user perception. To the best of our knowledge, iPURE is among the few systems that explicitly use nonrelevant examples for estimating parameters of parametric models. Further, the iPURE system requires the user to provide only “good,” i.e., relevant, and “bad,” i.e., nonrelevant, labels. The iPURE relevance feedback algorithm computes the relevance scores $\pi_i$ using the labeled examples to ensure that the new parameter estimates are pushed away from the nonrelevant examples.

In the iPURE system, the user provides feedback either on a set of images generated automatically by modifications of the query image at the client side, or on the set of top-k images retrieved from the database. Section V-A describes the algorithm used to learn from the user’s feedback on modified query images. Section V-B describes the algorithm used to estimate the similarity metric when the user provides feedback on database images.

A. Intra-Query Learning on Modified Images

The number of relevant images is expected to be small during user feedback on intra-query modifications. Hence, the iPURE system employs a special case of (5) with a diagonal $\mathbf{M}$. Thus, the similarity metric reduces to a weighted Euclidean distance, i.e., $D(\mathbf{x}, \mathbf{q}) = \sum_j w_j (x_j - q_j)^2$, where the weights $w_j$ denote the relative importance of the $j$-th feature. The system adapts $w_j$ to capture the variation in the relevant examples along each of the feature dimensions. These weights are updated as $w_j = c/\sigma_j$, where $\sigma_j$ is the standard deviation of the $j$-th feature dimension for the relevant segments and $c$ is an empirically determined constant. The query point $\mathbf{q}$ is redefined as the mean of the feature vectors of the relevant segments.
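A minimal sketch of this weight update is shown below, assuming the inverse standard-deviation form $w_j = c/\sigma_j$ stated above and adding a small epsilon (an assumption here) to guard against zero variance.

import numpy as np

def update_weights_and_query(relevant_feats, c=1.0, eps=1e-6):
    # relevant_feats: (n_relevant, n_features) feature vectors of the segments
    # the user accepted among the modified images.
    X = np.asarray(relevant_feats, dtype=float)
    sigma = X.std(axis=0)
    weights = c / (sigma + eps)   # low variation along a dimension -> high weight
    query = X.mean(axis=0)        # the new query point is the mean of relevant examples
    return weights, query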

These updated weights and query point are used as a starting point to search the database. This quick estimate of the retrieval parameters, even before the database is searched, increases the precision of the images retrieved during the first database search. This is especially the case when different users may have different perceptions about the same query, e.g., a user searching for “round” objects may start with a query image that has a “white-round” object in it, as shown in the first image of Fig. 7(a). In the absence of any intra-query modifications, the database search would retrieve all “white-round” objects since all feature dimensions, i.e., color, shape, size, would be weighted equally. However, when intra-query modifications are used as shown in Fig. 7(b) and the user finds the color modification acceptable, the system learns the relative unimportance of color. The system reduces the weight of the color features and hence retrieves more “round” objects during the first database search, as shown in Fig. 7(c). The number of “round” objects in the first database search increases to nineteen from thirteen. The improvement in retrieval performance is demonstrated in Table I. Intra-query modification and learning reduces the time required to retrieve 22 relevant objects in the top-25 retrieved images from 0.15 s (three iterations) to 0.09 s (one iteration).

When the query image contains multiple segments of interest, the iPURE system learns the notion of a semantic object, as demonstrated when the user is searching for “mushrooms” starting from the query image shown in Fig. 8(a). When the user selects the two segments that comprise the mushroom, the iPURE system distorts the object semantics by scaling down the two segments about their centroids, as shown in Fig. 8(b).

Fig. 7. Retrieval of a “white-round” object with and without intra-query learning. (a) Retrieval results without intra-query learning when the user is looking for “roundness” but starts from a “white-round” object. (b) Automatic modifications for the “white-round” segment. (c) More “round” objects retrieved after intra-query modification and learning.

TABLE I
PERFORMANCE COMPARISON OF INTRA-QUERY LEARNING AND RELEVANCE FEEDBACK. (a) NUMBER, N, OF RELEVANT RETRIEVED IMAGES (IN TOP 25) WHEN THE USER IS ACTUALLY INTERESTED IN ROUNDNESS. (b) TIME SPENT DURING RETRIEVAL (IN SEC).

The system infers that the two segments form a semantic object since the user rejects the modified image. Subsequently, the system generates a set of modified images that includes assigning a uniform color (the average of the two segment colors) to both segments, individual color-flips, combined translation in both the $x$- and $y$-directions, combined scale-up and scale-down, and combined rotation. Since the user finds both the original and the uniformly gray colored objects relevant, but not the color-flipped bluish one, a query expansion is performed. The database search is done on both the original two-segment query mushroom and the single-segment gray mushroom. The precision improves substantially since the system matches the shape of the two-segment semantic object, i.e., the mushroom, to the retrieved segments and not just two independent segments.

Fig. 8. Semantic object modifications. (a) Query image with two query segments. (b) Contiguity broken to distort object semantics. (c) Color flipped. (d) Color average. (e) Horizontal translation. (f) Vertical translation. (g) Scale up. (h) Scale down. (i) Rotation.

B. Relevance Feedback on Images Retrieved From the Database

In further iterations, when the database has been searched for the top-k matches and the number of training examples is larger, the iPURE system does not place the diagonal restriction on the matrix $\mathbf{M}$ in (5). The nonzero off-diagonal elements in $\mathbf{M}$ may be robustly estimated as described in [17], [28]. In a heterogeneous collection of images, the features employed to represent the image segments do not accurately capture the visual perception of all users. This results in some relevant segments appearing close to the nonrelevant segments in the feature space even though the user perceives them as being far apart. To address this limitation, the iPURE algorithm selects a fraction of the relevant examples for estimating the retrieval parameters. Relevant examples that are farther from the nonrelevant examples are chosen over those which are closer to nonrelevant examples. This selection process ensures that the new estimates do not capture the nonrelevant examples. The sampling of relevant examples is achieved by estimating the relevance scores $\pi_i$ used in (7)–(9) so that the resulting similarity metric best represents the relevant examples; the details of the algorithm are presented in the next section.

1) Selecting Optimal Subset of Relevant Examples and Computing Their Relevance Scores: Selection of the right subset of relevant examples to estimate parameters has been explored in the literature; Jolion [19] proposed a random sampling based approach for Minimum Volume Ellipsoid estimation for clustering problems. The iPURE system uses a greedy algorithm that selects the examples and explores the feature space by changing the scores associated with the relevant examples. The relevance scores $\pi_i$, along with the similarity metric parameters $(\mathbf{q}, \mathbf{M})$, are obtained through an iterative process. Points in the feature space at a fixed distance from the query $\mathbf{q}$ lie on an ellipsoid

Fig. 9. Proposed relevance feedback algorithm for estimating the similarity metric.

owing to the quadratic nature of the distance metric. The algorithm updates the parameter estimates to ensure that the ellipsoids representing the similarity metric better capture the relevant examples while excluding nonrelevant examples.

The pseudo code for the algorithm is shown in Fig. 9. The scores $\pi_i$ are initialized to 1, i.e., all the relevant examples are considered to be equally important. In each iteration, the similarity metric parameters $(\mathbf{q}, \mathbf{M})$ are determined using the current scores $\pi_i$. The distances of the relevant and the nonrelevant examples from the target concept $\mathbf{q}$ are determined using (6). Let $\mathbf{x}_f$ denote the farthest relevant example that has a nonzero score and $r$ be its distance from $\mathbf{q}$. Let $E$ be the ellipsoid defined by $(\mathbf{q}, \mathbf{M})$ and having a radius $r$. In most cases it is observed that some nonrelevant examples fall inside $E$. These are the examples which are most likely to be retrieved in the next iteration if the current estimates of $(\mathbf{q}, \mathbf{M})$ are used. Therefore, the parameters are adjusted to move $E$ away from such examples. Let $B$ represent the set of nonrelevant examples inside $E$. The algorithm modifies the parameters in each iteration to reduce the number of examples in $B$. This is achieved as follows. The score of the farthest positive example $\mathbf{x}_f$ is set to 0. The scores of the other positive examples with nonzero scores are updated as the sum of their quadratic distances from the examples in $B$. The updating of the scores causes the new $E$ to move away from the examples in $B$. Assigning a zero score to the farthest example leads to a shrinking of $E$, ensuring termination.

Fig. 10. Retrieval results for retrieving horse images from the Corel dataset. Segments relevant to the user are marked with a dark gray border, nonrelevant segments with a light gray border. (a) Without intra-query modification, only brown horses are retrieved in the first database search. (b) User feedback on automatic modifications of a horse segment. The user accepts color modifications and translations (marked with a dark gray border) but rejects the rotated horse (marked with a light gray border). (c) With intra-query modification and learning, both white and brown horses are retrieved. (d) Higher precision after one iteration of relevance feedback on the horse query.

The updated scores are then used to obtain a new estimate of the parameters $(\mathbf{q}, \mathbf{M})$, and the iteration proceeds. The iteration stops when the number of examples in $B$ reduces to zero. The final estimates of the parameters are used to rank the images in the database, and the top-k images are shown to the user.
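A minimal sketch of this loop is given below, assuming a helper estimate_parameters that returns (q, M) from the relevant examples and their scores (e.g., via the hierarchical model of [28]); the helper and the data layout are assumptions for illustration.

import numpy as np

def quadratic_distance(x, y, M):
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(d @ M @ d)

def sampling_relevance_feedback(relevant, nonrelevant, estimate_parameters):
    relevant = [np.asarray(x, float) for x in relevant]
    scores = np.ones(len(relevant))            # all relevant examples start equal
    while True:
        q, M = estimate_parameters(relevant, scores)
        dists = [quadratic_distance(x, q, M) for x in relevant]
        nonzero = [i for i, s in enumerate(scores) if s > 0]
        if not nonzero:
            break
        far = max(nonzero, key=lambda i: dists[i])   # farthest scored relevant example
        radius = dists[far]
        inside = [y for y in nonrelevant             # nonrelevant examples inside E
                  if quadratic_distance(y, q, M) <= radius]
        if not inside:                               # E excludes all negatives: done
            break
        scores[far] = 0.0                            # shrink E by dropping the farthest example
        for i in nonzero:
            if i != far:
                # push the metric away from the enclosed nonrelevant examples
                scores[i] = sum(quadratic_distance(relevant[i], y, M) for y in inside)
    return q, M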

In [3], we demonstrate the improved performance of the proposed sampling-based learning algorithm for a two-dimensional synthetic dataset. In the next section, we present results for the Corel dataset.

Fig. 11. Precision and recall for the horse and eagle categories, with and without intra-query learning of feature weights (IQM stands for Intra Query Modification).

VI. RESULTS

The iPURE system is tested on a database of 2200 Corel stock photographs of size 128 × 192 pixels from varied categories such as sunsets, horses, flowers, mountains, everyday objects, highway signs, beaches, eagles, etc. The entire database is segmented offline to create a database with 18 547 labeled segments. We present the retrieval performance of the iPURE system in this section and compare the results with those of the existing algorithms. In all experiments, the hierarchical similarity model proposed by Rui et al. [28] is used as the distance function for the proposed relevance feedback algorithm. We choose Rui’s metric since the number of parameters to be estimated is considerably smaller than that of MindReader [17]; also, the segment-level descriptors we have employed naturally fall into distinct uncorrelated groups (e.g., color, shape, position, size).

In the absence of intra-query modification and learning, each feature dimension is equally weighted. In such a scenario, when the user presents a centered brown "horse" segment as the query, the system retrieves only brown-colored segments centered in the image, as shown in the top-18 retrieval results of Fig. 10(a). However, a user looking for "horses" would find white horse segments, horse segments that are not centered in the image, or horse segments of varying size acceptable. In contrast, if intra-query modification is employed, the system generates a set of modified images as shown in Fig. 10(b). Since the user finds the color-modified horse acceptable (shown in dark gray), the system learns the relative unimportance of color in user perception and reduces the weights for the color features. The user also accepts the position- and size-modified images, which results in a reduction of the weights for the position and size features. Thus, in response to such user feedback on the set of intra-query modifications shown in Fig. 10(b), the weights for the color, position, and size feature components are reduced. The weight for the orientation feature remains unchanged, but its effective importance increases because the other feature weights have been reduced. Hence, after feature re-weighting, the system retrieves both brown and white colored horse segments with moderate variations in position and size relative to the query segment, as shown in Fig. 10(c). Further, the number of nonrelevant images reduces to just one after a single iteration of the iPURE relevance feedback algorithm, as shown in Fig. 10(d).
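A minimal sketch of this kind of re-weighting is shown below, assuming a simple fixed reduction factor; the factor alpha and the function name are our own illustrative choices, not values from the paper.

```python
def reweight(groups, accepted, alpha=0.5):
    """Down-weight feature groups whose automatic modifications the user accepted.

    groups   : list of feature-group names
    accepted : set of groups whose modified images were marked relevant
    alpha    : assumed reduction factor (illustrative, not from the paper)
    """
    weights = {g: 1.0 for g in groups}
    for g in accepted:
        weights[g] *= alpha           # accepted modification => feature matters less
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Horse query of Fig. 10(b): color, position, and size modifications accepted,
# the rotated horse rejected, so orientation gains relative importance.
weights = reweight(["color", "shape", "position", "size", "orientation"],
                   accepted={"color", "position", "size"})
```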

Precision and recall values have been used in the literature to measure the performance of image retrieval systems. Recall is the ratio of the number of relevant images returned to the total number of relevant images in the database. Precision is the ratio of the number of relevant images returned to the total number of images returned.
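In symbols, with R the set of relevant images in the database and A the set of images returned by a search:

```latex
% R : set of relevant images in the database;  A : set of images returned
\[
  \mathrm{recall} \;=\; \frac{|A \cap R|}{|R|},
  \qquad
  \mathrm{precision} \;=\; \frac{|A \cap R|}{|A|}.
\]
```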


Fig. 12. Number of images retrieved in successive iterations of relevance feedback (PRF represents the Proposed Relevance Feedback algorithm; IQM stands for Intra-Query Modification).

Fig. 13. Precision and recall after one iteration of relevance feedback (PRF represents the Proposed Relevance Feedback algorithm; IQM stands for Intra-Query Modification).

The precision and recall performance of the iPURE system for two classes of natural images, horses and eagles, is shown in Fig. 11. Clearly, there is a significant improvement in both precision and recall when intra-query modification and learning are employed.

The iPURE system further improves the retrieval performance in successive iterations of relevance feedback as compared to existing algorithms. The number of relevant images in the top-50 retrieved images for queries from four different categories, including horses, eagles, and highway signs, is shown in Fig. 12 for successive iterations. The iPURE algorithm "PRF with IQM" uses feature re-weighting through intra-query modification and learning in the zeroth iteration and the proposed relevance feedback algorithm from the first iteration onwards. "PRF without IQM" uses only the proposed relevance feedback algorithm, without intra-query modification. "Rui" implements the similarity model proposed in [28] and uses only relevant examples. "MARS" assumes feature independence and estimates the feature weights from the feature variances of the relevant examples [29]. These results demonstrate the effectiveness of the proposed intra-query modifications and the explicit use of nonrelevant examples during relevance feedback. We further illustrate the benefits by comparing the precision and recall of the various algorithms after one relevance feedback iteration for these category queries in Fig. 13. In particular, intra-query learning improves the recall in the initial iteration, and thereafter the sampling-based learning algorithm further improves the precision and recall. Further, the recall does not reach unity for all category queries even when 500 images are retrieved, since for some category queries our chosen segment descriptors cannot completely capture the high-level visual characteristics.
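As a point of comparison, a common formulation of the variance-based re-weighting attributed to MARS [29] is sketched below; the exact normalization used in [29] may differ.

```python
import numpy as np

def mars_weights(relevant, eps=1e-6):
    """Variance-based feature weights computed from the relevant examples only.

    relevant : (N, D) array of feature vectors of the relevant examples.
    Features that vary little across the relevant examples get large weights.
    """
    sigma = relevant.std(axis=0)      # per-feature standard deviation
    w = 1.0 / (sigma + eps)           # inverse-deviation weighting
    return w / w.sum()                # normalize to sum to 1
```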

VII. CONCLUSION

A new CBIR system that addresses the subjectivity of user perception during image retrieval is proposed, and its effectiveness is experimentally demonstrated in this paper. Our approach fundamentally differs from traditional approaches in that it employs a methodology of (client-side) intra-query modification and learning to reduce the need for the traditional compute- and bandwidth-intensive relevance feedback mechanism. In addition, intra-query modification and learning provides a more effective mechanism for initializing the similarity metric before searching the image database. This methodology can readily be extended to other "query-by-example" based multimedia search scenarios as well.

The proposed system is built around image segmentation to enable "object-level" search within the image. We have developed a reasonably accurate and fast color segmentation technique that leverages the strengths of region-based and edge-based segmentation. Also, a new parametric relevance feedback algorithm is proposed that explicitly utilizes information about nonrelevant examples.

The modification strategies described in this paper are based on simple heuristics. More effective modification schemes may be developed if the feature distribution in the image database is taken into account. Further research is also needed to extend the proposed paradigm to global feature-based image retrieval.

Note that while fundamentally similar in nature to well-studied classification problems, the interactive "query-by-example" based image retrieval paradigm poses new challenges, since the training points are not representative of the database feature vectors. Thus, incremental classification with very few training samples and good generalization needs to be addressed in the context of image retrieval.

REFERENCES

[1] G. Aggarwal, P. Dubey, S. Ghosal, A. Kulshreshtha, and A. Sarkar, "iPURE: Perceptual and user-friendly REtrieval of images," Proc. IEEE ICME, Aug. 2000.

[2] G. Aggarwal, S. Ghosal, and P. Dubey, "Efficient query modification for image retrieval," Proc. IEEE CVPR, June 2000.

[3] T. V. Ashwin, N. Jain, and S. Ghosal, "Improving image retrieval performance with negative relevance feedback," Proc. IEEE ICASSP, Mar. 2001.

[4] S. Aksoy and R. Haralick, "Feature normalization and likelihood-based similarity measures for image retrieval," Pattern Recognit. Lett., Special Issue on Image and Video Retrieval, 2000.

[5] R. Brunelli and O. Mich, "Image retrieval by examples," IEEE Trans. Multimedia, vol. 2, pp. 164–171, Sept. 2000.

[6] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-8, Nov. 1986.

[7] C. Carson, S. Belongie, and H. Greenspan, "Region-based image querying," Proc. IEEE CAIVL, 1997.

[8] D. Comaniciu and P. Meer, "Robust analysis of feature spaces: Color image segmentation," presented at the IEEE CVPR'97, San Juan, PR, 1997.

[9] I. Cox, M. Miller, T. Minka, T. Papathomas, and P. Yianilos, "The Bayesian image retrieval system, PicHunter: Theory, implementation and psychophysical experiments," IEEE Trans. Image Processing, vol. 9, pp. 20–37, Jan. 2000.

[10] A. Del Bimbo, Visual Image Retrieval. San Francisco, CA: Morgan Kaufmann, 1999.

[11] J. Fan, D. K. Y. Yau, A. K. Elmagarmid, and W. G. Aref, "Automatic image segmentation by integrating color-edge extraction and seeded region growing," IEEE Trans. Image Processing, vol. 10, pp. 1454–1466, Oct. 2001.

[12] M. Flickner, H. Sawhney, and W. Niblack, "Query by image and video content: The QBIC system," IEEE Computer, vol. 28, no. 9, 1995.

[13] S. Ghosal, J. Mandel, and R. Tezaur, "Automatic substructuring for domain decomposition using neural networks," Proc. IEEE ICNN, June 1994.

[14] B. Gunsel and A. M. Tekalp, "Shape similarity matching for query-by-example," Pattern Recognit., vol. 31, no. 7, pp. 931–944, 1998.

[15] P. Hong, Q. Tian, and T. Huang, "Incorporate support vector machines to content-based image retrieval with relevance feedback," Proc. IEEE ICIP, 2000.

[16] J. Hopfield and D. Tank, "Neural computation of decisions in optimization problems," Biol. Cybern., vol. 52, 1985.

[17] Y. Ishikawa, R. Subramanya, and C. Faloutsos, "MindReader: Querying databases through multiple examples," Proc. VLDB, Aug. 1998.

[18] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1997.

[19] J. Jolion, P. Meer, and S. Bataouche, "Robust clustering with applications in computer vision," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, pp. 791–802, Aug. 1991.

[20] M. Ma and B. Manjunath, "NETRA: A toolbox for navigating large image databases," Multimedia Syst., vol. 7, no. 3, 1999.

[21] C. Meilhac and C. Nastar, "Relevance feedback and category search in image databases," Proc. IEEE Multimedia Computing and Systems, June 1999.

[22] T. P. Minka and R. Picard, "Interactive learning using a society of models," Pattern Recognit., vol. 30, no. 4, 1997.

[23] C. Nastar, M. Mitschke, and C. Meilhac, "Efficient query refinement for image retrieval," Proc. IEEE CVPR, 1998.

[24] N. Pal and S. Pal, "A review of image segmentation techniques," Pattern Recognit., vol. 26, pp. 1277–1294, 1993.

[25] T. Pavlidis and Y. T. Liow, "Integrating region growing and edge detection," IEEE Trans. Pattern Anal. Machine Intell., vol. 12, pp. 225–233, Mar. 1990.

[26] A. P. Pentland, R. W. Picard, and S. Sclaroff, "Photobook: Content-based manipulation of image databases," Int. J. Comput. Vis., vol. 18, no. 3, pp. 233–254, 1996.

[27] A. Ramamurthy and S. Ghosal, "An integrated segmentation technique for interactive image retrieval," Proc. IEEE ICIP, 2000.

[28] Y. Rui and T. Huang, "Optimizing learning in image retrieval," Proc. IEEE CVPR, June 2000.

[29] Y. Rui, T. Huang, M. Ortega, and S. Mehrotra, "Relevance feedback: A power tool for interactive content-based image retrieval," IEEE Trans. Circuits Syst. Video Technol., vol. 8, pp. 644–655, Sept. 1998.

[30] E. Saber, A. M. Tekalp, and G. Bozdagi, "Fusion of color and edge information for improved segmentation and edge linking," J. Image Vis. Comput., vol. 15, no. 10, pp. 769–780, 1997.

[31] E. Saber and A. M. Tekalp, "Region-based affine shape matching for automatic image annotation and query-by-example," Vis. Commun. Image Represent., Mar. 1997.

[32] S. Sclaroff, L. Taycher, and M. La Cascia, "ImageRover: A content-based image browser for the World Wide Web," Proc. IEEE CAIVL, 1997.

[33] W. Skarbek and A. Koschan, "Color image segmentation: A survey," Dept. Comput. Sci., Tech. Univ. Berlin, Germany, Tech. Rep. 94-32, Oct. 1994.

[34] J. Smith and S. Chang, "VisualSEEk: A fully automated content-based image query system," presented at the ACM Multimedia Conf., MA, 1996.

Gaurav Aggarwal received the B.Tech. degree in computer science and engineering from the Indian Institute of Technology, Delhi, India, in 1996 and the M.S. degree in information and computer science from the University of California, Irvine, in 1998.

He was a Research Staff Member at IBM India Research Laboratory, New Delhi, during 1998–2001. His research interests include image processing, digital video standards, and content-based retrieval systems. He is currently a Senior Design Engineer at Broadcom India, Bangalore, working on various multimedia standards.


Ashwin T. V. received the B.E. degree from Karnataka Regional Engineering College, Surathkal, India, in 1998, and the M.S. degree in system science and automation from the Indian Institute of Science, Bangalore, India, in 2000.

Since 2000, he has been a Research Staff Member at the IBM India Research Laboratory, New Delhi, where he has worked on image processing tasks like crop area determination and mosaicing of remote sensing images. Currently, he is pursuing content-based retrieval techniques for IBM products. His research interests include relevance feedback for content-based information retrieval, multimedia databases, and pattern classification.

Sugata Ghosal received the B.E. degree in electronics and telecommunication engineering from Jadavpur University, Calcutta, India, in 1988 and the Ph.D. degree from the University of Kentucky, Lexington, in 1993.

He is a Research Staff Member and a Manager at IBM India Research Laboratory in New Delhi. He conducts research in image analysis and multimedia information systems. Prior to joining IBM, he was a Researcher at the Algorithm Research Center of Sony Electronics in San Jose, CA, and a principal investigator in several U.S. DoD-sponsored foveal vision projects at Amherst Systems (currently a subsidiary of Northrop Grumman, Inc.), Buffalo, NY. He has published over 35 papers and holds over 15 U.S. patents in error-resilient video compression, coding, and reconstruction.