published in the Proceedings of IJCNN’99, Washington, DC, July 1999
available at <URL:http://www.cis.hut.fi/picsom/publications.html>

PicSOM: Self-Organizing Maps for Content-Based Image Retrieval

Jorma Laaksonen, Markus Koskela, and Erkki Oja
Laboratory of Computer and Information Science,
Helsinki University of Technology, P.O.BOX 5400, Fin-02015 HUT, Finland
email: {jorma.laaksonen,markus.koskela,erkki.oja}@hut.fi

Abstract

Digital image libraries are becoming more common and widely used as more visual information is produced at a rapidly growing rate. Content-based image retrieval is an important approach to the problem of processing this increasing amount of data. It is based on features automatically extracted from the content of the images, such as color, texture, shape, and structure. We have started a project to study methods for content-based image retrieval using the Self-Organizing Map (SOM) as the image similarity scoring method. Our image retrieval system, named PicSOM, can be seen as a SOM-based approach to relevance feedback, which is a form of supervised learning to adjust the subsequent queries based on the user’s responses during the information retrieval session. In PicSOM, a separate Tree Structured SOM (TS-SOM) is trained for each feature vector type in use. The system then adapts to the user’s preferences by returning her more images from those SOMs where her responses have been most densely mapped.

Introduction

Content-based image retrieval (CBIR) utilizes the visual content of the images in the process of searching and retrieving images from an image database. The aim is to obtain discriminants which correspond as well as possible with the human judgment of image similarity and can be used in conducting image queries. In CBIR, the images are indexed by features directly derived from the visual content of the image. These features usually contain rather low-level information such as the colors, textures, shapes, and spatial relations the image contains.

Most traditional applications in computer vision are automatic and self-contained. In CBIR, however, the user is an inseparable part of the process. As retrieval systems are usually not capable of returning satisfactory images in their first response to the user, the image query becomes an iterative and interactive process towards the desired image or images. Relevance feedback is a common term in text-based retrieval to describe a form of supervised learning to adjust the subsequent queries using the information gathered from the user feedback [12]. This helps the following rounds of the retrieval process to better approximate the present need of the user.

We have started to study techniques for utilizing the strong self-organizing power of the Self-Organizing Map (SOM) [7] to facilitate content-based retrieval from image databases. As a part of the project, we have implemented an experimental image retrieval system called PicSOM. PicSOM uses a hierarchical version of the SOM algorithm called the Tree Structured Self-Organizing Map (TS-SOM) [8, 9] as the method for retrieving images similar to a given set of reference images. PicSOM supports multiple parallel features and, with a technique introduced in the PicSOM system, the responses from multiple TS-SOMs are combined automatically. The goal is to autonomously adapt to the user’s preferences regarding the similarity of images in the database.

Related Work

The best-known system for content-based image retrieval is probably IBM’s Query By Image Content (QBIC) [5]. Recently, a number of other CBIR systems, both academic and commercial, have emerged, including MIT Media Lab’s Photobook [11], NETRA [10], developed in the Alexandria Digital Library project, Virage [1] by Virage Technologies Inc., and Columbia University’s VisualSEEk [13].

PicSOM bears resemblance to the WEBSOM [6] document browsing and exploration tool developed at the Neural Network Research Centre at Helsinki University of Technology. WEBSOM is a means for organizing text documents into meaningful maps for exploration and search. It automatically organizes the documents into a two-dimensional grid so that related documents appear close to each other.




Some previous experiments on using the SOM as an indexing tool in content-based image retrieval have been made in [14]. In [4], the SOM was used to classify regions of similar texture in astronomical images in an image retrieval system called ASPECT.

Tree-Structured Self-Organizing Map

The Self-Organizing Map (SOM) [7] is an unsupervised, self-organizing neural algorithm which is widely used to visualize and interpret large high-dimensional data sets. The SOM defines an elastic net of points that are fitted to the input space. It can thus be used to visualize multidimensional data, usually on a two-dimensional grid.

To speed up the search of the best-matching unit (BMU), a variant of the SOM called the Tree Structured Self-Organizing Map (TS-SOM) was introduced in [8, 9]. The TS-SOM is a tree-structured vector quantization algorithm that uses normal SOMs at each of its hierarchical levels.

The TS-SOM is loosely based on the traditional tree-search algorithm. Due to the tree structure, the number of map units increases when moving downwards on the SOM layers of the TS-SOM. The search space for the best-matching vector on the underlying SOM layer is restricted to a predefined portion just below the best-matching unit of the SOM above. Unlike in most tree-structured algorithms, the search space is not limited to the children of the BMU on the upper layer but can be set to include also neighboring nodes having different parents on the upper layer. The tree structure reduces the time complexity of the search from $O(n)$ to $O(\log n)$. The complexity of the searches is thus remarkably lower than if the whole bottommost SOM level had been accessed without the tree structure.
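The restricted search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TS-SOM implementation of [8, 9]: the names `ts_som_bmu`, `best_unit`, `demo_levels`, and the `margin` parameter are hypothetical, each level is assumed to be a square grid whose side is an integer multiple of the side of the level above, and the toy weight vectors are simply the normalized grid coordinates.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_unit(level, x, rows, cols):
    """Return the (row, col) of the unit closest to x within the given window."""
    return min(((r, c) for r in rows for c in cols),
               key=lambda rc: squared_distance(level[rc[0]][rc[1]], x))


def ts_som_bmu(levels, x, margin=1):
    """Find the BMU on the bottom level by descending through the tree.

    levels: list of square grids (top first); levels[k][r][c] is a weight vector.
    margin: assumed number of extra units searched around the children of the
            upper-level BMU, so that neighbors with other parents are included.
    """
    top = levels[0]
    # The small top level is searched exhaustively.
    best = best_unit(top, x, range(len(top)), range(len(top[0])))
    for upper, lower in zip(levels, levels[1:]):
        scale = len(lower) // len(upper)           # children per side
        r0, c0 = best[0] * scale, best[1] * scale  # first child of the BMU
        rows = range(max(0, r0 - margin), min(len(lower), r0 + scale + margin))
        cols = range(max(0, c0 - margin), min(len(lower[0]), c0 + scale + margin))
        best = best_unit(lower, x, rows, cols)
    return best


def demo_levels(sides=(4, 16, 64)):
    """Toy TS-SOM (assumption): each unit's weight is its normalized grid center."""
    return [[[((r + 0.5) / s, (c + 0.5) / s) for c in range(s)]
             for r in range(s)] for s in sides]
```

With three levels of sides 4, 16, and 64 and `margin=1`, the sketch evaluates at most 16 + 36 + 36 distances instead of the 4096 needed by an exhaustive search of the bottom level.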

Principle of PicSOM

In PicSOM, the image queries are performed through the World Wide Web. The queries are iteratively refined as the system exposes more images from its database to the user. During the process, PicSOM tries to adapt to the user’s preferences regarding the similarity of images. This is accomplished with the use of separate TS-SOMs for every type of feature vector extracted from the images. The closer to each other the images accepted by the user are mapped on a particular SOM layer, the more the system favors the images proposed by that SOM.

A typical image query with PicSOM begins with the following steps. First, an interested user connects with her web browser to the WWW server providing the search engine. The system presents the user the first set of reference images, which are uniformly picked from the top levels of the TS-SOMs in use. The user then selects the subset of images which best match her expectations and are to some degree relevant to her purposes. Then, she hits the “Continue Query” button and her browser sends the information on the accepted images back to the search engine.

The system marks the images selected by the user with a positive value and the non-selected images with a negative value in its internal data structure. These values are then summed up in their best-matching SOM units in each of the TS-SOM maps. Each SOM layer is then treated as a two-dimensional matrix formed of values describing the user’s responses to the contents of the map unit. Finally, the map matrices are low-pass filtered with symmetrical convolution masks in order to spread the user’s responses to the neighboring units which, by presumption, contain images that are to some extent similar to the present ones. Starting from the SOM unit having the largest convolved response value, PicSOM retrieves from the database the image whose feature vector is nearest to the weight vector in that unit. If that image has not been shown to the user, it is marked to be shown on the next round. This process is continued with the second largest value and so on until a preset number of new images have been selected. This set is then presented to the user. The iteration is continued until the user is satisfied with one or more images returned by the system.
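One round of this feedback loop might be sketched as below. The sketch rests on several assumptions that the text does not fix: the marks are taken as +1 and -1, the mask is an arbitrary symmetric 3x3 kernel with zero padding at the map borders, and each map is described by two hypothetical dictionaries, `bmu_of` (image to its best-matching unit) and `image_of` (unit to the image nearest to its weight vector). The function names are illustrative only.

```python
def convolve_map(values, mask):
    """Low-pass filter a 2-D map with a symmetric odd-sized mask (zero padding)."""
    n_rows, n_cols = len(values), len(values[0])
    half = len(mask) // 2
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            total = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        total += mask[dr + half][dc + half] * values[rr][cc]
            out[r][c] = total
    return out


def next_round(maps, selected, rejected, shown, n_new, mask):
    """Pick up to n_new unseen images, best convolved response first.

    maps: list of dicts with keys "shape", "bmu_of", "image_of" (all assumed).
    """
    candidates = []
    for m in maps:
        rows, cols = m["shape"]
        values = [[0.0] * cols for _ in range(rows)]
        for image_id in selected:          # positive mark for accepted images
            r, c = m["bmu_of"][image_id]
            values[r][c] += 1.0
        for image_id in rejected:          # negative mark for the others
            r, c = m["bmu_of"][image_id]
            values[r][c] -= 1.0
        smooth = convolve_map(values, mask)
        for (r, c), image_id in m["image_of"].items():
            candidates.append((smooth[r][c], image_id))
    # Units from all maps compete in one ranking; the sort is stable.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    picked, seen = [], set(shown)
    for _, image_id in candidates:
        if image_id not in seen:
            picked.append(image_id)
            seen.add(image_id)
        if len(picked) == n_new:
            break
    return picked
```

Ranking the units of all maps in one list is one simple way to let a map with densely clustered positive responses contribute more images; how PicSOM balances the maps in detail is not specified here.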

Figure 1: An example of convolved TS-SOMs for color (left) and texture (right) features. Black corresponds to positive and white to negative convolved values.

Figure 1 illustrates two convolved TS-SOMs. The three images on the left represent three map levels of a SOM for RGB color features, whereas the convolutions on the right are calculated on a map of simple textural features. The sizes of the SOM layers are $4 \times 4$, $16 \times 16$, and $64 \times 64$, from top to bottom.


Figure 2: The WWW-based user interface of PicSOM.

The distinct features thus do not need to be weighted by the user or administrator of the system, as is required in many other CBIR systems. Instead, PicSOM automatically adapts to the user’s particular needs and impression of the similarity of images. Another intrinsic feature of our system is its ability to use any number of reference images, while many other current systems are based on using a single reference image.

The WWW-based user interface is illustrated in Figure 2. Below the convolved SOMs, the first set of images consists of images selected on the previous rounds of the retrieval process. These images may be unselected on any subsequent round, thus changing their value from positive to negative. This example shows a query with three images of buildings selected. The next images, separated by a horizontal line, are the 16 best-scoring new images obtained from the convolved units in the TS-SOMs.

Experiments

We evaluated the usefulness of the PicSOM approach with a set of experiments. We used an image database of 4350 images containing mainly color photographs in JPEG format. This set of images was downloaded from location ftp://ftp.sunet.se/pub/pictures/. Three different feature extraction methods were applied to this data and the corresponding TS-SOMs were created.

In the first feature vector set, named color, average RGB values were calculated in five zones of the images, corresponding to the left, right, top, bottom, and center parts of the image area. This resulted in 15-dimensional vectors. The second set, named texture, was formed likewise in the five zones. This time, each pixel’s YIQ luminance was compared to that of its eight neighbors. The estimated probability that the center pixel has a larger value than the neighbor was used as a feature. The feature vector was thus 40-dimensional. In the last set [2], named shape, the images were first transformed to binarized edge images by using $3 \times 3$-sized Sobel masks. The edge directions were then discretized to eight values. An 8-bin histogram was formed from the directions in the same five zones as for the other two feature types. This gave rise to a 40-dimensional feature vector.
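As an illustration of the color feature, the sketch below computes average RGB values in five zones and concatenates them into a 15-dimensional vector. The exact zone geometry is not given in the text, so the layout used here (outer thirds of the image for left, right, top, and bottom, and the central third for center) is purely an assumption, as are the function names and the image representation as a list of rows of (R, G, B) tuples.

```python
ZONE_NAMES = ("left", "right", "top", "bottom", "center")


def zone_bounds(height, width):
    """Assumed zone layout: (row_start, row_end, col_start, col_end) per zone."""
    h3, w3 = height // 3, width // 3
    return {
        "left": (0, height, 0, w3),
        "right": (0, height, width - w3, width),
        "top": (0, h3, 0, width),
        "bottom": (height - h3, height, 0, width),
        "center": (h3, height - h3, w3, width - w3),
    }


def color_feature(image):
    """Average R, G, B in each of the five zones: 5 zones x 3 channels = 15 values.

    image: list of rows, each row a list of (r, g, b) tuples; at least 3x3 pixels.
    """
    height, width = len(image), len(image[0])
    bounds = zone_bounds(height, width)
    feature = []
    for name in ZONE_NAMES:
        r0, r1, c0, c1 = bounds[name]
        pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        for channel in range(3):
            feature.append(sum(p[channel] for p in pixels) / len(pixels))
    return feature
```

The texture and shape features described above would replace the per-zone RGB average with eight neighbor-comparison probabilities or an 8-bin edge-direction histogram, which is how five zones lead to 40 dimensions in both cases.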

The TS-SOMs for all features were sized $4 \times 4$, $16 \times 16$, and $64 \times 64$, from top to bottom. During the training, each vector was used 100 times in the adaptation. Each of the TS-SOMs was first used alone and then all the three maps together. A quantitative measure for the image retrieval performance was obtained as follows. First, we selected from our database the subset of images which could be regarded as portraying an aircraft. There were 348 such images and thus an a priori probability of 8.0 percent. We then implemented an “ideal screener”, a computer program which examined the output of the system and marked the images presented by PicSOM either as selected or non-selected according to the hand-picked ground truth. The query processing could thus be simulated and performance data collected without any human intervention.

For each of the 348 aircraft images we recorded the total number of images presented by the system until that particular image was shown. From this data, we formed histograms and calculated the average number of shown images needed before a hit. After division by 4350, this figure yielded a value $\tau \in [0, 1]$. For values $\tau < 0.5$, the performance of the system was thus better than random picking of images and, in general, the smaller the $\tau$ value the better the performance. In our experiments, the system always presented 16 new images to the screening program. The selection of this parameter naturally has an effect on the resulting $\tau$ value.
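The performance figure can be computed from a simulated query log roughly as follows. This is a sketch under assumptions: the function name `retrieval_tau` is hypothetical, images are counted one at a time, and the count for a target image includes the image itself, which the text leaves open.

```python
def retrieval_tau(shown_order, targets, database_size):
    """Average number of images shown before each target, divided by database size.

    shown_order: image ids in the order the system presented them.
    targets: ids of the images in the class of interest (all must appear).
    """
    position = {}
    for index, image_id in enumerate(shown_order, start=1):
        position.setdefault(image_id, index)   # keep the first appearance only
    counts = [position[image_id] for image_id in targets]
    return sum(counts) / len(counts) / database_size


# Toy example with a 10-image database presented in the order 0, 1, ..., 9.
order = list(range(10))
early = retrieval_tau(order, [0, 1], 10)   # targets shown first
late = retrieval_tau(order, [8, 9], 10)    # targets shown last
```

In the toy example, targets that are shown early give a small value and targets that are shown late give a value close to one, which matches the reading that smaller values mean better retrieval.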

As the above-described procedure was repeated for image sets containing buildings (492 items) and human faces (361 items), the results in Table 1 were obtained. The last row gives the unweighted average of the results for the three image sets. The results show that the shape features clearly yield better performance than the two other feature types in the case of the aircraft images. For the building and face image classes the performances of the three individual TS-SOMs are more even.

             feature types and TS-SOMs used
images       color    texture  shape    all
aircraft     0.307    0.436    0.220    0.261
buildings    0.319    0.312    0.309    0.300
faces        0.372    0.356    0.364    0.362
average      0.333    0.368    0.298    0.308

Table 1: Retrieval performances $\tau$ for three distinct features alone and the combined use of all three together in PicSOM.

The results for the PicSOM system with the three TS-SOMs used together are in every case either the best or the second best ones. The PicSOM system is thus to some extent able to benefit from the existence of multiple feature types. As it is not known beforehand which TS-SOM would perform best for the user’s query, the PicSOM approach provides a robust method for using a set of image maps in parallel.

We have also experimented with other types of feature vectors. Also in the light of those results, it seems that if one feature vector type has clearly better retrieval performance $\tau$ than the others, it is more beneficial to use that particular TS-SOM alone than to use also the worse-performing maps. Therefore, it is necessary for the proper operation of the PicSOM system that the used features are well balanced, i.e., they should on average perform quite similarly by themselves.

Conclusions

In this paper, we have introduced the PicSOM method for content-based image retrieval and a preliminary quantitative evaluation of its performance. The results of our experiments show that the PicSOM system is able to effectively select from a set of parallel TS-SOMs a combination which either nearly matches the best individual image map or slightly outperforms it.

The features used in this study are certainly still quite tentative. Therefore, we have started a large series of experiments for selecting a proper and well-balanced set of features to be used in the PicSOM system and our future assessments.

Our future plans also include the use of the Corel [3] database of 60 000 photograph images. Also, we have started to collect images from the World Wide Web and to index them by using PicSOM. The PicSOM system will later be publicly available for demonstration purposes at http://www.cis.hut.fi/picsom/.

References

[1] J. R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R. Jain, and C.-F. Shu. The Virage image search engine: An open framework for image management. In I. K. Sethi and R. J. Jain, editors, Storage and Retrieval for Image and Video Databases IV, volume 2670 of Proceedings of SPIE, pages 76–87, 1996.

[2] S. Brandt. Use of shape features in content-based image retrieval. Master’s thesis, Helsinki University of Technology, 1999. To appear.

[3] The Corel Corporation World Wide Web home page, http://www.corel.com.

[4] A. Csillaghy. Neural network-generated indexing features and retrieval effectiveness. In Proceedings of the Converging Computing Methodologies in Astronomy (CCMA) Conference, Sonthofen, Bavaria, September 1997.

[5] M. Flickner, H. Sawhney, W. Niblack, et al. Query by image and video content: The QBIC system. IEEE Computer, pages 23–31, September 1995.

[6] T. Honkela, S. Kaski, K. Lagus, and T. Kohonen. WEBSOM—self-organizing maps of document collections. In Proceedings of WSOM’97, Workshop on Self-Organizing Maps, Espoo, Finland, June 4-6, pages 310–315. Helsinki University of Technology, Neural Networks Research Centre, Espoo, Finland, 1997.

[7] T. Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer-Verlag, 1997. Second Extended Edition.

[8] P. Koikkalainen. Progress with the tree-structured self-organizing map. In A. G. Cohn, editor, 11th European Conference on Artificial Intelligence. European Committee for Artificial Intelligence (ECCAI), John Wiley & Sons, Ltd., 1994.

[9] P. Koikkalainen and E. Oja. Self-organizing hierarchical feature maps. In Proceedings of 1990 International Joint Conference on Neural Networks, volume II, pages 279–284, San Diego, CA, 1990. IEEE, INNS.

[10] W. Y. Ma and B. S. Manjunath. NETRA: A toolbox for navigating large image databases. In IEEE International Conference on Image Processing (ICIP), Santa Barbara, California, October 1997.

[11] A. Pentland, R. W. Picard, and S. Sclaroff. Photobook: Tools for content-based manipulation of image databases. In Storage and Retrieval for Image and Video Databases II, volume 2185 of SPIE Proceedings Series, San Jose, CA, USA, 1994.

[12] G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.

[13] J. R. Smith and S.-F. Chang. VisualSEEk: A fully automated content-based image query system. In Proceedings of the ACM Multimedia 1996, Boston, MA, 1996.

[14] H. Zhang and D. Zhong. A scheme for visual feature based image indexing. In Storage and Retrieval for Image and Video Databases III (SPIE), volume 2420 of SPIE Proceedings Series, San Jose, CA, February 1995.