PicSOM: Self-Organizing Maps for Content-Based Image Retrieval · research.cs.aalto.fi/cbir/papers/ijcnn99.pdf · 2006-05-29

Published in the Proceedings of IJCNN'99, Washington, DC, July 1999. Available at <URL:http://www.cis.hut.fi/picsom/publications.html>

PicSOM: Self-Organizing Maps for Content-Based Image Retrieval

Jorma Laaksonen, Markus Koskela, and Erkki Oja

Laboratory of Computer and Information Science, Helsinki University of Technology, P.O.BOX 5400, Fin-02015 HUT, Finland

email: {jorma.laaksonen,markus.koskela,erkki.oja}@hut.fi

Abstract

Digital image libraries are becoming more common and widely used as more visual information is produced at a rapidly growing rate. Content-based image retrieval is an important approach to the problem of processing this increasing amount of data. It is based on automatically extracted features from the content of the images, such as color, texture, shape, and structure. We have started a project to study methods for content-based image retrieval using the Self-Organizing Map (SOM) as the image similarity scoring method. Our image retrieval system, named PicSOM, can be seen as a SOM-based approach to relevance feedback, which is a form of supervised learning to adjust the subsequent queries based on the user's responses during the information retrieval session. In PicSOM, a separate Tree Structured SOM (TS-SOM) is trained for each feature vector type in use. The system then adapts to the user's preferences by returning her more images from those SOMs where her responses have been most densely mapped.

Introduction

Content-based image retrieval (CBIR) utilizes the visual content of the images in the process of searching and retrieving images from an image database. The aim is to obtain discriminants which correspond as well as possible with the human judgment for image similarity and can be used in conducting image queries. In CBIR, the images are indexed by features directly derived from the visual content of the image. These features usually contain rather low-level information such as the colors, textures, shapes, and spatial relations the image contains.

Most traditional applications in computer vision are automatic and self-contained. In CBIR, however, the user is an inseparable part of the process. As the retrieval systems are usually not capable of returning the satisfactory images in their first response to the user, the image query becomes an iterative and interactive process towards the desired image or images. Relevance feedback is a common term in text-based retrieval to describe a form of supervised learning to adjust the subsequent queries using the information gathered from the user feedback [12]. This helps the following rounds of the retrieval process to approximate better the present need of the user.

We have started to study the techniques for utilizing the strong self-organizing power of the Self-Organizing Map (SOM) [7] to facilitate content-based retrieval from image databases. As a part of the project, we have implemented an experimental image retrieval system called PicSOM. PicSOM uses a hierarchical version of the SOM algorithm called Tree Structured Self-Organizing Map (TS-SOM) [8, 9] as the method for retrieving images similar to a given set of reference images. PicSOM supports multiple parallel features and, with a technique introduced in the PicSOM system, the responses from multiple TS-SOMs are combined automatically. The goal is to autonomously adapt to the user's preferences regarding the similarity of images in the database.

Related Work

The best-known system for content-based image retrieval is probably IBM's Query By Image Content (QBIC) [5]. Recently, a number of other CBIR systems, both academic and commercial, have emerged, including MIT Media Lab's Photobook [11], NETRA [10], developed in the Alexandria Digital Library project, Virage [1] by Virage Technologies Inc., and Columbia University's VisualSEEk [13].

PicSOM bears resemblance to the WEBSOM [6] document browsing and exploration tool developed at the Neural Networks Research Centre at Helsinki University of Technology. WEBSOM is a means for organizing text documents into meaningful maps for exploration and search. It automatically organizes the documents into a two-dimensional grid so that related documents appear close to each other.

Some previous experiments on using the SOM as an indexing tool in content-based image retrieval have been made in [14]. In [4], the SOM was used to classify regions of similar texture in astronomical images in an image retrieval system called ASPECT.

Tree-Structured Self-Organizing Map

The Self-Organizing Map (SOM) [7] is an unsupervised, self-organizing neural algorithm which is widely used to visualize and interpret large high-dimensional data sets. The SOM defines an elastic net of points that are fitted to the input space. It can thus be used to visualize multidimensional data, usually on a two-dimensional grid.

To speed up the search of the best-matching unit (BMU), a variant of SOM called the Tree Structured Self-Organizing Map (TS-SOM) was introduced in [8, 9]. TS-SOM is a tree-structured vector quantization algorithm that uses normal SOMs at each of its hierarchical levels.

The TS-SOM is loosely based on the traditional tree-search algorithm. Due to the tree structure, the number of map units increases when moving downwards on the SOM layers of the TS-SOM. The search space for the best-matching vector on the underlying SOM layer is restricted to a predefined portion just below the best-matching unit of the above SOM. Unlike in most tree-structured algorithms, the search space is not limited to the children of the BMU on the upper layer but can be set to include also neighboring nodes having different parents on the upper layer. The tree structure reduces the time complexity of the search from $O(n)$ to $O(\log n)$. The complexity of the searches is thus remarkably lower than if the whole bottommost SOM level had been accessed without the tree structure.
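The restricted search described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the maps are assumed square, each level is stored as a nested list of weight vectors, and the names `ts_som_bmu`, `make_grid`, and the `margin` parameter (how many neighboring parents on the upper layer are expanded) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Grid = List[List[Vector]]  # Grid[row][col] is the weight vector of one map unit


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ts_som_bmu(levels: List[Grid], x: Vector, margin: int = 1) -> Tuple[int, int]:
    """Return (row, col) of the best-matching unit on the bottom level.

    levels[0] is the smallest (top) map and is searched exhaustively.
    Each lower level is searched only below the previous BMU and below
    its neighbors within `margin` units (assumed square maps).
    """
    top = levels[0]
    candidates = [(r, c) for r in range(len(top)) for c in range(len(top))]
    bmu = min(candidates, key=lambda rc: sq_dist(top[rc[0]][rc[1]], x))
    for upper, lower in zip(levels, levels[1:]):
        f = len(lower) // len(upper)  # side-length ratio between the layers
        r0, c0 = bmu
        # Parents to expand: the BMU and its neighbors on the upper layer.
        rows = range(max(0, r0 - margin), min(len(upper), r0 + margin + 1))
        cols = range(max(0, c0 - margin), min(len(upper), c0 + margin + 1))
        candidates = [
            (pr * f + i, pc * f + j)
            for pr in rows for pc in cols
            for i in range(f) for j in range(f)
        ]
        bmu = min(candidates, key=lambda rc: sq_dist(lower[rc[0]][rc[1]], x))
    return bmu


def make_grid(n: int) -> Grid:
    """Toy n-by-n map whose weights are the normalized unit coordinates."""
    return [[((r + 0.5) / n, (c + 0.5) / n) for c in range(n)] for r in range(n)]


# Toy three-level hierarchy; real TS-SOM weights come from training.
toy_levels = [make_grid(2), make_grid(4), make_grid(8)]
toy_bmu = ts_som_bmu(toy_levels, (0.9, 0.1), margin=0)
```

With `margin=0` only the children of the upper-level BMU are examined; a larger `margin` widens the window to children of neighboring parents, which is the relaxation the text describes.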

Principle of PicSOM

In PicSOM, the image queries are performed through the World Wide Web. The queries are iteratively refined as the system exposes more images from its database to the user. During the process, PicSOM tries to adapt to the user's preferences regarding the similarity of images. This is accomplished with the use of separate TS-SOMs for every type of feature vectors extracted from the images. The closer to each other the images accepted by the user are mapped onto a particular SOM layer, the more the system favors the images proposed by that very SOM.

A typical image query with PicSOM begins with the following steps. First, an interested user connects with her web browser to the WWW server providing the search engine. The system presents the user the first set of reference images, which are uniformly picked from the top levels of the TS-SOMs in use. The user then selects the subset of images which match her expectations best and to some degree of relevance fit to her purposes. Then, she hits the "Continue Query" button and her browser sends the information on the accepted images back to the search engine.

The system marks the images selected by the user with a positive value and the non-selected images with a negative value in its internal data structure. These values are then summed up in their best-matching SOM units in each of the TS-SOM maps. Each SOM layer is then treated as a two-dimensional matrix formed of values describing the user's responses to the contents of the map unit. Finally, the map matrices are low-pass filtered with symmetrical convolution masks in order to spread the user's responses to the neighboring units which, by presumption, contain images that are to some extent similar to the present ones. Starting from the SOM unit having the largest convolved response value, PicSOM retrieves from the database the image whose feature vector is nearest to the weight vector in that unit. If that image has not been shown to the user, it is marked to be shown on the next round. This process is continued with the second largest value and so on until a preset number of new images have been selected. This set is then presented to the user. The iteration is continued until the user is satisfied with one or more images returned by the system.
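The scoring round described above can be sketched for a single SOM layer as follows. This is a minimal illustration under assumptions, not the PicSOM code: the values +1 and -1, the triangular 3-by-3 mask, zero padding at the map borders, and the names `response_matrix`, `low_pass`, and `next_round` are all hypothetical. The mapping `unit_image` is assumed to hold, for each map unit, the precomputed image whose feature vector is nearest to that unit's weight vector. How responses from several TS-SOMs are merged is not modeled here.

```python
from typing import Dict, List, Set, Tuple

Unit = Tuple[int, int]
Matrix = List[List[float]]


def response_matrix(size: int, bmu_of: Dict[str, Unit],
                    selected: Set[str], rejected: Set[str]) -> Matrix:
    """Sum +1 for selected and -1 for non-selected images in their BMUs."""
    grid = [[0.0] * size for _ in range(size)]
    for img in selected:
        r, c = bmu_of[img]
        grid[r][c] += 1.0
    for img in rejected:
        r, c = bmu_of[img]
        grid[r][c] -= 1.0
    return grid


def low_pass(grid: Matrix, mask: Matrix) -> Matrix:
    """Convolve with a symmetric odd-sized mask, zero padding at borders."""
    n, k = len(grid), len(mask) // 2
    out = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < n:
                        out[r][c] += mask[dr + k][dc + k] * grid[rr][cc]
    return out


def next_round(convolved: Matrix, unit_image: Dict[Unit, str],
               shown: Set[str], count: int) -> List[str]:
    """Walk units from largest convolved value down, collecting unseen images."""
    units = sorted(unit_image, key=lambda u: convolved[u[0]][u[1]], reverse=True)
    picked: List[str] = []
    for u in units:
        img = unit_image[u]
        if img not in shown and img not in picked:
            picked.append(img)
        if len(picked) == count:
            break
    return picked


# Toy 3x3 layer: images "a" and "b" were accepted, "c" was not.
bmu_of = {"a": (0, 0), "b": (0, 1), "c": (2, 2)}
mask = [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]
conv = low_pass(response_matrix(3, bmu_of, {"a", "b"}, {"c"}), mask)
unit_image = {(0, 0): "a", (0, 1): "b", (0, 2): "d", (1, 0): "e", (2, 2): "c"}
picks = next_round(conv, unit_image, shown={"a", "b", "c"}, count=2)
```

In the toy run the unseen images mapped next to the accepted ones ("e" and "d") are proposed first, while the unit of the rejected image ends up with the lowest convolved value.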

Figure 1: An example of convolved TS-SOMs for color (left) and texture (right) features. Black corresponds to positive and white to negative convolved values.

Figure 1 illustrates two convolved TS-SOMs. The three images on the left represent three map levels of a SOM for RGB color features, whereas the convolutions on the right are calculated on a map of simple textural features. The sizes of the SOM layers are $4 \times 4$, $16 \times 16$, and $64 \times 64$, from top to bottom.

Figure 2: The WWW-based user interface of PicSOM.

The distinct features thus do not need to be weighted by the user or administrator of the system, as is required in many other CBIR systems. Instead, PicSOM automatically adapts to the user's particular needs and impression of the similarity of images. Another intrinsic feature of our system is its ability to use any number of reference images, while many other current systems are based on using a single reference image.

The WWW-based user interface is illustrated in Figure 2. Below the convolved SOMs, the first set of images consists of images selected on the previous rounds of the retrieval process. These images may be unselected on any subsequent round, thus changing their value from positive to negative. This example shows a query with three images of buildings selected. The next images, separated by a horizontal line, are the 16 best-scoring new images obtained from the convolved units in the TS-SOMs.

Experiments

We evaluated the usefulness of the PicSOM approach with a set of experiments. We used an image database of 4350 images containing mainly color photographs in JPEG format. This set of images was downloaded from location ftp://ftp.sunet.se/pub/pictures/. Three different feature extraction methods were applied to this data and the corresponding TS-SOMs were created.

In the first feature vector set, named color, average RGB values were calculated in five zones of the images, corresponding to the left, right, top, bottom, and center parts of the image area. This resulted in 15-dimensional vectors. The second set, named texture, was formed likewise in the five zones. This time, each pixel's YIQ luminance was compared to that of its eight neighbors. The estimated probability that the center pixel has a larger value than the neighbor was used as a feature. The feature vector was thus 40-dimensional. In the last set [2], named shape, the images were first transformed to binarized edge images by using $3 \times 3$-sized Sobel masks. The edge directions were then discretized to eight values. An 8-bin histogram was formed from the directions in the same five zones as for the other two feature types. This gave rise to a 40-dimensional feature vector.
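The color and texture features described above can be sketched as follows. This is an illustration under assumptions rather than the feature extractors actually used: the paper does not give the exact zone geometry, so the layout in `zones` (four overlapping image halves plus a central rectangle) is hypothetical, as are the function names. The texture sketch also assumes that a neighbor only has to lie inside the image, not inside the zone. The luminance weights are the standard Y component of the YIQ color space.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B)
Image = List[List[Pixel]]           # Image[row][col]

# Offsets of the eight neighbors of a pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def zones(h: int, w: int) -> Dict[str, Callable[[int, int], bool]]:
    """Hypothetical five-zone layout: left, right, top, bottom, center."""
    return {
        "left": lambda r, c: c < w / 2,
        "right": lambda r, c: c >= w / 2,
        "top": lambda r, c: r < h / 2,
        "bottom": lambda r, c: r >= h / 2,
        "center": lambda r, c: h / 4 <= r < 3 * h / 4 and w / 4 <= c < 3 * w / 4,
    }


def color_feature(img: Image) -> List[float]:
    """Average R, G, B in each of the five zones: 5 x 3 = 15 values."""
    h, w = len(img), len(img[0])
    feature: List[float] = []
    for inside in zones(h, w).values():
        pixels = [img[r][c] for r in range(h) for c in range(w) if inside(r, c)]
        for channel in range(3):
            feature.append(sum(p[channel] for p in pixels) / len(pixels))
    return feature


def luminance(p: Pixel) -> float:
    """Y component of YIQ."""
    return 0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]


def texture_feature(img: Image) -> List[float]:
    """Per zone and neighbor offset, the fraction of pixels that are
    brighter than that neighbor: 5 x 8 = 40 values."""
    h, w = len(img), len(img[0])
    y = [[luminance(p) for p in row] for row in img]
    feature: List[float] = []
    for inside in zones(h, w).values():
        for dr, dc in OFFSETS:
            larger = total = 0
            for r in range(h):
                for c in range(w):
                    rr, cc = r + dr, c + dc
                    if inside(r, c) and 0 <= rr < h and 0 <= cc < w:
                        total += 1
                        larger += y[r][c] > y[rr][cc]
            feature.append(larger / total if total else 0.0)
    return feature


# Toy 4x4 image: left half pure red, right half pure blue.
toy_image = [[(255.0, 0.0, 0.0)] * 2 + [(0.0, 0.0, 255.0)] * 2 for _ in range(4)]
toy_color = color_feature(toy_image)
toy_texture = texture_feature(toy_image)
```

The shape feature would follow the same per-zone pattern with an 8-bin histogram of edge directions in place of the brightness comparisons; it is left out here because the binarization details are not given in the text.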

The TS-SOMs for all features were sized $4 \times 4$, $16 \times 16$, and $64 \times 64$, from top to bottom. During the training, each vector was used 100 times in the adaptation. Each of the TS-SOMs was first used alone and then all three maps were used together. A quantitative measure for the image retrieval performance was obtained as follows. First, we selected from our database the subset of images which could be regarded as portraying an aircraft. There were 348 such images and thus an a priori probability of 8.0 percent. We then implemented an "ideal screener", a computer program which examined the output of the system and marked the images presented by PicSOM either as selected or non-selected according to the hand-picked ground truth. The query processing could thus be simulated and performance data collected without any human intervention.

For each of the 348 aircraft images, we recorded the total number of images presented by the system until that particular image was shown. From this data, we formed histograms and calculated the average number of shown images needed before a hit. After division by 4350, this figure yielded a value $\tau \in [0, 1]$. For values $\tau < 0.5$, the performance of the system was thus better than random picking of images and, in general, the smaller the $\tau$ value, the better the performance. In our experiments, the system always presented 16 new images to the screening program. The selection of this parameter naturally has an effect on the resulting $\tau$ value.
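The performance figure described above can be computed along these lines. This is a sketch under assumptions: the function name `tau_from_order` is hypothetical, and each image is assumed to count individually by its 1-based position in the presentation order, whereas the paper does not say whether a whole round of 16 images is counted at once.

```python
from typing import Hashable, Iterable, List


def tau_from_order(order: List[Hashable], targets: Iterable[Hashable],
                   database_size: int) -> float:
    """Average number of images shown before each target image is hit,
    divided by the database size."""
    # 1-based position at which each image was presented (assumption).
    position = {img: i + 1 for i, img in enumerate(order)}
    hits = [position[t] for t in targets]
    return sum(hits) / len(hits) / database_size


# Toy database of 10 images presented in the order 0..9;
# the relevant images are 1, 3, and 5 (shown 2nd, 4th, and 6th).
toy_tau = tau_from_order(list(range(10)), [1, 3, 5], database_size=10)
```

In the toy case the average hit position is 4, so the value is 0.4, slightly better than the roughly 0.5 expected when images are presented in random order.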

When the above-described procedure was repeated for image sets containing buildings (492 items) and human faces (361 items), the results in Table 1 were obtained. The last row gives the unweighted average of the results for the three image sets. The results show that the shape features clearly yield better performance than the two other feature types in the case of the aircraft images. For the building and face image classes, the performances of the three individual TS-SOMs are more even.

             feature types and TS-SOMs used
images       color    texture  shape    all
aircraft     0.307    0.436    0.220    0.261
buildings    0.319    0.312    0.309    0.300
faces        0.372    0.356    0.364    0.362
average      0.333    0.368    0.298    0.308

Table 1: Retrieval performances $\tau$ for three distinct features alone and the combined use of all three together in PicSOM.

The results for the PicSOM system with the three TS-SOMs used together are in every case either the best or the second best ones. The PicSOM system is thus to some extent able to benefit from the existence of multiple feature types. As it is not known beforehand which TS-SOM would perform best for the user's query, the PicSOM approach provides a robust method for using a set of image maps in parallel.
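The figures in Table 1 can be cross-checked with a few lines of code. The sketch below recomputes the unweighted averages of the last row and the rank of the combined "all" column within each image class (smaller values are better). The values are copied from the table; the variable and function names are hypothetical.

```python
from typing import Dict

# Values copied from Table 1.
table1: Dict[str, Dict[str, float]] = {
    "aircraft": {"color": 0.307, "texture": 0.436, "shape": 0.220, "all": 0.261},
    "buildings": {"color": 0.319, "texture": 0.312, "shape": 0.309, "all": 0.300},
    "faces": {"color": 0.372, "texture": 0.356, "shape": 0.364, "all": 0.362},
}


def unweighted_average(table: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Column-wise mean over the image classes, rounded to three decimals."""
    columns = ["color", "texture", "shape", "all"]
    return {
        col: round(sum(row[col] for row in table.values()) / len(table), 3)
        for col in columns
    }


def rank_of_all(table: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """1-based rank of the combined result in each row (1 = best)."""
    return {
        name: sorted(row.values()).index(row["all"]) + 1
        for name, row in table.items()
    }


averages = unweighted_average(table1)
ranks = rank_of_all(table1)
```

The recomputed averages agree with the last row of the table, and the combined result ranks second, first, and second for the aircraft, building, and face classes respectively.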

We have also experimented with other types of feature vectors. Also in the light of those results, it seems that if one feature vector type has clearly better retrieval performance $\tau$ than the others, it is more beneficial to use that particular TS-SOM alone than to use also the worse-performing maps. Therefore, it is necessary for the proper operation of the PicSOM system that the used features are well balanced, i.e., they should on the average perform quite similarly by themselves.

Conclusions

We have in this paper introduced the PicSOM method for content-based image retrieval and a preliminary quantitative evaluation of its performance. The results of our experiments show that the PicSOM system is able to effectively select from a set of parallel TS-SOMs a combination which either nearly matches the best individual image map or slightly outperforms it.

The features used in this study are certainly still quite tentative. Therefore, we have started a large series of experiments for selecting a proper and well-balanced set of features to be used in the PicSOM system and our future assessments.

Our future plans also include the use of the Corel [3] database of 60 000 photograph images. Also, we have started to collect images from the World Wide Web and to index them by using PicSOM. The PicSOM system will later be publicly available for demonstration purposes at http://www.cis.hut.fi/picsom/.

References

[1] J. R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R. Jain, and C.-F. Shu. The Virage image search engine: An open framework for image management. In I. K. Sethi and R. J. Jain, editors, Storage and Retrieval for Image and Video Databases IV, volume 2670 of Proceedings of SPIE, pages 76–87, 1996.

[2] S. Brandt. Use of shape features in content-based image retrieval. Master's thesis, Helsinki University of Technology, 1999. To appear.

[3] The Corel Corporation World Wide Web home page, http://www.corel.com.

[4] A. Csillaghy. Neural network-generated indexing features and retrieval effectiveness. In Proceedings of the Converging Computing Methodologies in Astronomy (CCMA) Conference, Sonthofen, Bavaria, September 1997.

[5] M. Flickner, H. Sawhney, W. Niblack, et al. Query by image and video content: The QBIC system. IEEE Computer, pages 23–31, September 1995.

[6] T. Honkela, S. Kaski, K. Lagus, and T. Kohonen. WEBSOM—self-organizing maps of document collections. In Proceedings of WSOM'97, Workshop on Self-Organizing Maps, Espoo, Finland, June 4-6, pages 310–315. Helsinki University of Technology, Neural Networks Research Centre, Espoo, Finland, 1997.

[7] T. Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer-Verlag, 1997. Second Extended Edition.

[8] P. Koikkalainen. Progress with the tree-structured self-organizing map. In A. G. Cohn, editor, 11th European Conference on Artificial Intelligence. European Committee for Artificial Intelligence (ECCAI), John Wiley & Sons, Ltd., 1994.

[9] P. Koikkalainen and E. Oja. Self-organizing hierarchical feature maps. In Proceedings of 1990 International Joint Conference on Neural Networks, volume II, pages 279–284, San Diego, CA, 1990. IEEE, INNS.

[10] W. Y. Ma and B. S. Manjunath. NETRA: A toolbox for navigating large image databases. In IEEE International Conference on Image Processing (ICIP), Santa Barbara, California, October 1997.

[11] A. Pentland, R. W. Picard, and S. Sclaroff. Photobook: Tools for content-based manipulation of image databases. In Storage and Retrieval for Image and Video Databases II, volume 2185 of SPIE Proceedings Series, San Jose, CA, USA, 1994.

[12] G. Salton and M. J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.

[13] J. R. Smith and S.-F. Chang. VisualSEEk: A fully automated content-based image query system. In Proceedings of the ACM Multimedia 1996, Boston, MA, 1996.

[14] H. Zhang and D. Zhong. A scheme for visual feature based image indexing. In Storage and Retrieval for Image and Video Databases III (SPIE), volume 2420 of SPIE Proceedings Series, San Jose, CA, February 1995.
