Hybrid Content Based Image Retrieval combining Multi-objective Interactive Genetic Algorithm and SVM

Romaric Pighetti, Denis Pallez, Frédéric Precioso
[email protected], [email protected], [email protected]

University Nice-Sophia Antipolis, I3S Lab (UMR UNS/CNRS 7271)

Abstract

The amount of images contained in repositories or available on the Internet has exploded over the last years. In order to retrieve efficiently one or several images in a database, the development of Content-Based Image Retrieval (CBIR) systems has become an intensively active research area. However, most proposed systems are keyword-based and few involve the end-user during the search (through relevance feedback). Visual low-level descriptors are then substituted for keywords, but there is a gap between visual description and user expectations. We propose a new framework which combines a multi-objective interactive genetic algorithm, allowing a trade-off between image features and user evaluations, and a support vector machine to learn the user relevance feedback. We test our system on the SIMPLIcity database, commonly used in the literature to evaluate CBIR systems using a genetic algorithm, and it outperforms recent frameworks.

1 Introduction

The last decade has seen a massive increase of multimedia content, owing to cameras embedded into standard mobile phones, to online shared repositories easily available on the Internet, and to the growing role of daily-life multimedia content on social networks. Current Content-Based Image Retrieval (CBIR) systems have thus to cope with huge multimedia databases. Among all possible approaches, image retrieval is classically considered as a binary classification problem: the class relevant, i.e. the set of images corresponding to the user query concept, and the class irrelevant, the remainder of the database. Active learning is a CBIR framework [8] where the user refines his query by iteratively providing annotations for some images carefully selected by the system, in a so-called Relevance Feedback (RF) loop. The selection strategies of images to be annotated are particularly critical for interactive image retrieval systems. Indeed, the training set has to be small, since only few annotations should be required from the user [7]; hence the annotated image set should lead to a higher classification rate [2].

Most existing approaches first quickly retrieve nearest neighbors of the user query. These data are supposed to be rather similar to the query and thus relevant candidates for the user. Gathering the query nearest neighbors is based on unsupervised clustering techniques. The candidates are then classified more precisely with, for instance, a Support Vector Machine (SVM) [11, 3]. Recent works focus on approximate nearest neighbor schemes, which have a lower computational cost than exact nearest neighbor search; the candidates obtained are then refined with an SVM [9, 6].

In this article, we want to avoid the unsupervised clustering step in order to explore the feature space more optimally, looking for candidates with more adequate algorithms such as Evolutionary Computation (EC). These algorithms maintain a set of candidates and recombine them using genetic operators (selection, crossover, mutation...) in order to introduce diversity, so as to enlarge the exploration and to converge towards global optima rather than local optima. As we want to involve the user in this space exploration, we focus on interactive evolutionary computation algorithms [14].

Recent works combine Genetic Algorithms (GA) or Interactive Genetic Algorithms (IGA) to explore the solution space with an SVM [17, 5] or with a Nearest Neighbor classifier [1] to integrate user feedback. In these works, the user does not provide any query image but browses the database to collect content of interest. In other recent works, CBIR systems requiring the user to provide an image query have also been discussed. In [4], two content-based image retrieval frameworks with relevance feedback based on genetic programming are presented. The first framework exploits only the user indication of relevant images, while the second also considers the images indicated as non-relevant.

Tran et al. describe a multi-objective genetic algorithm (MOGA), but the method has not been implemented [15]. The very recent work of Lai et al. presents an IGA addressing the CBIR context [10], from which our method is inspired; however, we prefer to combine a multi-objective IGA with an SVM to retrieve the images of interest more precisely with fewer annotations.

The paper is organized as follows: in the next section, we briefly present the approach of Lai et al. [10] and its limitations, before describing our hybrid method in detail; in the third section, we present experiments with our method, combining the power of EC and the power of SVM, on the SIMPLIcity project database; we then conclude and present future works.

2 A Hybrid Interactive CBIR system

2.1 Related work

Lai et al. use basic low-level MPEG-7 descriptors with a single-objective IGA to address interactive CBIR [10]. They consider, as fitness function, a linear combination between the similarity of images with the image query based on descriptors, and user evaluations represented by δ: F(q, C) = w1 × sim(q, C) + w2 × δ. The authors have empirically set w1 = w2 = 0.5. In order to improve their framework, we have decided to adapt it into a multi-objective optimization problem, as suggested by [15] in another context. We thus get rid of setting w1 and w2 and instead allow any combination. The best combinations define the Pareto front towards which the MOGA tends to converge. [15] suggested using NSGA-II without implementing it nor conducting tests of their idea.
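As a concrete illustration, this weighted fitness can be written as follows. The candidate values are made-up examples of our own, chosen to show how the final ranking depends on the choice of (w1, w2), which is exactly the tuning the multi-objective formulation removes.

```python
def fitness(sim, delta, w1=0.5, w2=0.5):
    """Lai et al.'s single-objective fitness: F(q, C) = w1*sim(q, C) + w2*delta."""
    return w1 * sim + w2 * delta

# Hypothetical candidates: A is visually closer to the query,
# B received a better user evaluation.
sim_a, delta_a = 0.9, 0.4
sim_b, delta_b = 0.4, 0.8

# With equal weights, A is ranked first (0.65 > 0.60); weighting user
# feedback more heavily (w1=0.2, w2=0.8) reverses the ranking.
a_first_equal = fitness(sim_a, delta_a) > fitness(sim_b, delta_b)
a_first_user = fitness(sim_a, delta_a, 0.2, 0.8) > fitness(sim_b, delta_b, 0.2, 0.8)
```

Since both candidates are optimal for some weighting, a Pareto-based selection keeps both instead of committing to one fixed trade-off.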

Another limitation of their system lies in the number of images that can be retrieved. Indeed, with their IGA the user needs to evaluate all the individuals of the population; hence, if we want to retrieve, say, 100 images at a time, the user will have to annotate all those images.

2.2 Our framework

In order to ensure the scalability of our system, we combine the IGA approach with an interactive SVM which iteratively learns user evaluations according to the IGA results at each iteration. SVM is well known to be efficient with few training annotations. Furthermore, in interactive learning the training set is augmented at each generation, starting from the query. This leads to the system presented in Fig. 1, whose steps are detailed below:

(1) The user provides an image query;

(2) The descriptors of this image are computed and the n most similar images are retrieved, using only image similarity on the visual descriptors; they form the initial population of the MOGA;

Figure 1. Flowchart of our framework

(3) The initial training set of the SVM is made of the query only;

(4) The SVM is trained, providing a fitness value to the whole population;

(5) For each image in the population, the signed distance to the SVM decision boundary and the similarity measure are used as fitness values for the MOGA. This algorithm generates n new encoding vectors from the previous generation using genetic operators such as selection, crossover and possibly mutation;

(6) As those new vectors may not match any image of the database, they must be replaced by the encoding vectors of the nearest images from the database, using the similarity measure. If, for some vectors, the nearest image has already been matched by another vector, we use the second nearest image, etc.;

(7) These n images are displayed to the user as the results of the search. They are sorted using the user interest learnt by the SVM. If the user is satisfied, he can stop; otherwise the system continues;

(8) As the user is not able to evaluate all n images, k images (with k ≤ n) that are difficult for the SVM to classify are selected and presented to the user for evaluation. The selected images are thus the k images closest to the SVM decision boundary. However, images already evaluated in previous generations are not selected for new evaluations;

(9) The user annotates the k images, which are added to the SVM training set;

(10) The process continues from step 4 until the user is satisfied.
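Steps (6) and (8) can be sketched as follows. This is a minimal illustration with helper names of our own choosing (match_to_database, select_for_annotation) and a generic distance function; it is not the authors' implementation.

```python
def match_to_database(vectors, db, dist):
    """Step (6): snap each generated vector to its nearest database image,
    skipping images already claimed by an earlier vector."""
    matched, taken = [], set()
    for v in vectors:
        # database indices ordered by distance to the generated vector
        for i in sorted(range(len(db)), key=lambda j: dist(v, db[j])):
            if i not in taken:  # take the nearest still-free image
                taken.add(i)
                matched.append(i)
                break
    return matched

def select_for_annotation(decision_values, already_labeled, k):
    """Step (8): pick the k not-yet-labeled images whose signed distance
    to the SVM decision boundary is smallest in absolute value."""
    pool = [i for i in range(len(decision_values)) if i not in already_labeled]
    return sorted(pool, key=lambda i: abs(decision_values[i]))[:k]
```

The greedy first-come matching in step (6) guarantees that the n displayed images are all distinct, at the price of a quadratic scan that a nearest-neighbor index would avoid at larger scale.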

The different parameters of our method are: NSGA-II for the GA, with no mutation and the same crossover as [10]; an SMO-SVM with a polynomial kernel of degree 5. We used the same descriptors and similarity measure as [10], so that the results reveal the changes brought by the learning-exploring framework and not by new, more efficient features.
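One plausible instantiation of this classifier configuration uses scikit-learn, whose SVC is backed by libsvm's SMO solver; the paper does not name a library, and the toy feature vectors below are our own, so this is a hedged sketch rather than the authors' setup.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-ins for image descriptor vectors: two labeled examples per class.
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [3.0, 3.0], [2.8, 3.1]])
y_train = np.array([0, 0, 1, 1])  # 0 = irrelevant, 1 = relevant

# SMO-trained SVM with a polynomial kernel of degree 5, as in the text.
svm = SVC(kernel="poly", degree=5)
svm.fit(X_train, y_train)

# Signed distance to the decision boundary: used both as a MOGA objective
# and to select the most ambiguous images for user annotation.
scores = svm.decision_function(np.array([[1.1, 1.0], [2.9, 3.0]]))
```

The decision_function values play a double role in the framework: their sign propagates the user's relevance judgment, while their magnitude drives the uncertainty-based selection of step (8).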

3 Experiments

In order to evaluate our system and to compare it to existing frameworks, we focus on the SIMPLIcity project database [16], which is composed of 1000 images from the Corel database. This database does not correspond to Computer Vision community standards for evaluating CBIR systems, but it has been considered in several recent papers [13, 10, 5] and, most of all, in the method we take for reference [10]. We have thus decided to use this database with the same experimental protocol as in [10].

One experiment consists in randomly picking 10 images from each category. For each image, a run of 9 generations with this image as query is issued with our system. Then, the worst and best runs are removed for each category, and the average is computed on the 8 remaining runs for each category. Experiments are run five times and the average is presented. An image is considered relevant to the query if it is in the same category as the query. In our evaluations, the user is considered as a perfect evaluator and thus annotates each image correctly with respect to its category. In addition, the annotation is binary: an image is annotated with 1 if it is in the same category as the query and 0 otherwise. We measure the performance of the different systems by counting the number of relevant results in the top 20 images.
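The scoring side of this protocol can be summarized in code; the helper names are ours and the values in the usage comments are illustrative, not the paper's results.

```python
def precision_at_20(result_categories, query_category):
    """A result is relevant when it belongs to the query's category;
    return the fraction of relevant results among the top 20."""
    top = result_categories[:20]
    return sum(1 for c in top if c == query_category) / 20.0

def category_average(run_scores):
    """Aggregate the 10 runs of one category: drop the best and the
    worst run, then average the 8 remaining ones."""
    trimmed = sorted(run_scores)[1:-1]
    return sum(trimmed) / len(trimmed)
```

Trimming one run at each extreme before averaging reduces the variance introduced by particularly lucky or unlucky query images within a category.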

After reimplementing the method of Lai et al. [10], we have conducted the same experiments, summarized in Fig. 2, whose results are unfortunately far from those published in the original paper. Despite some mail exchanges with the authors, who provided us with more implementation details than those in their paper, we have not been able to reproduce their results. They still have not provided some remaining implementation details which could explain the differences: the normalization of low-level descriptors and the precise parameter settings for the genetic algorithm. In this context, in the following experiments, we compare our method against our own implementation of the method of Lai et al. [10].

As mentioned in the previous section, in [10] the authors have averaged the image similarity and the user RF evaluation into the fitness function. We instead consider this combination as a multi-objective optimization, using the NSGA-II algorithm.
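The contrast can be made concrete with the core of NSGA-II's selection, the non-dominated (Pareto) front. This is a generic textbook sketch, not the authors' code, assuming both objectives (image similarity and user evaluation) are to be maximized.

```python
def dominates(a, b):
    """a dominates b when a is no worse on every objective and strictly
    better on at least one (maximization on all objectives)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """First non-dominated front: the candidates NSGA-II ranks highest,
    i.e. every surviving trade-off between the two objectives."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]
```

Unlike the fixed-weight average, this selection keeps both a visually similar candidate and a highly user-rated one as long as neither dominates the other.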

Figure 2. Our implementation of [10]

Figure 3. Comparison of all mentioned methods on the same figure

We thus combined NSGA-II with an SVM, which allows us to increase the number of individuals without requiring the user to evaluate all the images contained in the population. Indeed, based on the distance between an image and the SVM decision boundary, we can propagate the user evaluation to all images in the database. Results of our framework are shown in Fig. 3. It outperforms the one proposed in [10], as it finds more than 80% of relevant images in only 3 generations for all categories, while they assert it is possible in 6 generations. Images of generation #1 are retrieved using only low-level descriptors, since the user has not yet made any evaluations. Relevant images retrieved in generation #2 are not good because the fitness is based on the learning process of the SVM, which learnt nothing relevant from generation #1. It is only at generation #3 that the power of the SVM combined with the MOGA can be seen. Moreover, our evaluation protocol is simpler for the user, as only binary evaluations are required (relevant or irrelevant), while in [10] each image must be evaluated with 10 nuances, which is a much more complex cognitive task. While this is true for an image categorization task (the user is supposed to look for specific categories of objects present or not in images from a database), we may consider a more complex annotation process for other tasks involving user feelings, as considered in [17] or, even more recently, in [12]. In Fig. 3, we also reduce the number of image evaluations from 20 to 5 per generation. Our framework requires just 30 user evaluations to retrieve 90% of the images (8 generations), rather than 120 user evaluations (6 generations) in [10]. Finally, as each category of the database contains 100 images, we tested our framework on finding the 100 best images rather than 20. We also report the results of a standard SVM approach aiming at image database ranking [7].

Figure 4. Our retrieval process

In Fig. 4, a screenshot of our application is presented, showing the first 3 generations of a 20-image retrieval process.

4 Conclusion

After presenting a short survey on CBIR, we present a hybrid method combining a multi-objective IGA, for its capability to converge towards global optima, with an SVM, for its capability to learn the user evaluations required by the IGA. We tested our framework on a standard database considered in all recent works describing CBIR systems based on GA. Results show that our hybrid method outperforms previous works, retrieving the relevant images present in the database faster with respect to the image query provided by the user, while requiring fewer user interactions. Based on these promising preliminary results, we are currently conducting experiments on larger databases more commonly used in the computer vision community (PASCAL VOC 2007, VOC 2009...). We are also investigating other multi-objective evolutionary algorithms, based on differential evolution, which should converge faster while preserving retrieval precision.

References

[1] M. Arevalillo-Herráez, F. J. Ferri, and S. Moreno-Picot. Distance-based relevance feedback using a hybrid interactive genetic algorithm for image retrieval. Applied Soft Computing, 11(2):1782–1791, 2011.

[2] E. Chang, S. Tong, K. Goh, and C. Chang. Support vector machine concept-dependent active learning for image retrieval. IEEE Trans. on Multimedia, 2, 2005.

[3] M. Crucianu, D. Estevez, V. Oria, and J.-P. Tarel. Hyperplane queries in a feature-space M-tree for speeding up active learning. In BDA, 2007.

[4] C. D. Ferreira, J. A. Santos, R. da S. Torres, M. A. Gonçalves, R. C. Rezende, and W. Fan. Relevance feedback based on genetic programming for image retrieval. Pattern Recogn. Lett., 32(1):27–37, Jan. 2011.

[5] S. Ganesh, K. Ramar, D. Manimegalai, and M. Sivakumar. Image retrieval using heuristic approach and genetic algorithm. Journal of Computational Information Systems, 8(4):1563–1571, 2012.

[6] D. Gorisse, M. Cord, and F. Precioso. SALSAS: Sub-linear active learning strategy with approximate k-NN search. Pattern Recognition, 44(10–11):2343–2357, 2011.

[7] P. Gosselin and M. Cord. Active learning methods for interactive image retrieval. Image Processing, IEEE Transactions on, 17(7):1200–1211, July 2008.

[8] S. C. Hoi, R. Jin, J. Zhu, and M. R. Lyu. Semi-supervised SVM batch mode active learning for image retrieval. In IEEE CVPR, pages 1–7, 2008.

[9] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In IEEE ICCV, pages 2130–2137, 2009.

[10] C.-C. Lai and Y.-C. Chen. A user-oriented image retrieval system based on interactive genetic algorithm. Instrumentation and Measurement, IEEE Transactions on, 60(10):3318–3325, Oct. 2011.

[11] N. Panda, K. Goh, and E. Y. Chang. Active learning in very large databases. MTAP, 31(3):249–267, 2006.

[12] D. Parikh and K. Grauman. Relative attributes. In International Conference on Computer Vision (ICCV), 2011.

[13] M. Saadatmand-Tarzjan and H. Moghaddam. A novel evolutionary approach for optimizing content-based image indexing algorithms. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 37(1):139–153, Feb. 2007.

[14] H. Takagi. New IEC research and frameworks. In Aspects of Soft Computing, Intelligent Robotics and Control, volume 241 of Studies in Computational Intelligence, pages 65–76. Springer, 2009.

[15] K. D. Tran. Content-based retrieval using a multi-objective genetic algorithm. In IEEE SoutheastCon, pages 561–569, April 2005.

[16] J. Wang, J. Li, and G. Wiederhold. SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE PAMI, 23(9):947–963, Sept. 2001.

[17] S.-F. Wang, X.-F. Wang, and J. Xue. An improved interactive genetic algorithm incorporating relevant feedback. In IEEE ICMLC, volume 5, pages 2996–3001, Aug. 2005.