Image Database Indexing using a Combination of Invariant Shape and Color Descriptions

Ronald Alferez and Yuan-Fang Wang
Computer Science Department
University of California
Santa Barbara, CA, 93106 U.S.A.

Supported in part by a grant from the National Science Foundation, IRI-94-11330.

Abstract

Image and video library applications are becoming increasingly popular. The increasing popularity calls for software tools to help the user query and retrieve database images efficiently and effectively. In this paper, we present a technique which combines shape and color descriptors for invariant, within-a-class retrieval of images from digital libraries. We demonstrate the technique on a real database containing airplane images of similar shape and query images that appear different from those in the database because of lighting and perspective. We were able to achieve a very high retrieval rate.

Keywords: images, video, libraries, features

1 Introduction

Image and video library applications are becoming increasingly popular, as witnessed by many national and international research initiatives in these areas, an exploding number of professional meetings devoted to image/video/multi-media, and the emergence of commercial companies and products. The advent of high-speed networks and inexpensive storage devices has enabled the construction of large electronic image and video archives and greatly facilitated their access on the Internet. In line with this, however, is the need for software tools to help the user query and retrieve database images efficiently and effectively.

Querying an image library can be difficult, and one of the main difficulties lies in designing powerful features or descriptors to represent and organize images in a library. Many existing image database indexing and retrieval systems are only capable of between-classes retrieval (e.g., distinguishing fish from airplanes). However, these systems do not allow the user to retrieve images that are more specific. In other words, they are unable to perform within-a-class retrieval (e.g., distinguishing different types of airplanes or different species of fish). This is because the aggregate features adopted by many current systems (such as color histograms and low-order moments) capture only the general shape of a class and are not descriptive enough to distinguish objects within a particular class.

The within-a-class retrieval problem is further complicated if query images depicting objects, though belonging to the class of interest, look different due to non-essential or incidental environment changes, such as rigid-body or articulated motion, shape deformation, and change in illumination and viewpoint. In this paper, we address the problem of invariant, within-a-class retrieval of images by using a combination of invariant shape and color descriptors. By analyzing the shape of the object's contour as well as the color and texture characteristics of the enclosed area, information from multiple sources is fused for a more robust description of an object's appearance. This places our technique at an advantage over most current approaches that exploit either geometric information or color information exclusively.

The analysis involves projecting the shape and color information onto basis functions of finite, local support (e.g., splines and wavelets). The projection coefficients, in general, are sensitive to changes induced by rigid motion, shape deformation, and change in illumination and perspective. We derive expressions by massaging these sets of projection coefficients to cancel out the environmental factors and achieve invariance of the descriptors. Based on these features, we have conducted preliminary experiments to recognize different types of airplanes (many of them having very similar shape) under varying illumination and viewing conditions, and have achieved good recognition rates. We show that information fusion has helped to improve the accuracy in retrieval and shape discrimination.

2 Technical Description

In this section, we present the theoretical foundation of our image-derived, invariant shape and color features. Invariant features form a compact, intrinsic description of an object, and can be used to design retrieval and indexing algorithms that are potentially more efficient than, say, aspect-based approaches.

The search for invariant features (e.g., algebraic and projective invariants) is a classical problem in mathematics dating back to the 18th century. The need for invariant image descriptors has long been recognized in computer vision. Invariant features can be designed based on many different methods and made invariant to rigid-body motion, affine shape deformation, scene illumination, occlusion, and perspective projection. Invariants can be computed either globally, as is the case for invariants based on moments or Fourier transform coefficients, or based on local properties such as curvature and arc length. See [3, 4, 5] for surveys and discussion on the subject of invariants.

As mentioned before, our invariant features are derived from a localized analysis of an object's shape and color. The basic idea is to project an object's exterior contour or interior region onto localized bases such as wavelets and splines. The coefficients are then normalized to eliminate changes induced by non-essential environmental factors such as viewpoint and illumination. We will illustrate the mathematical framework using a specific scenario where invariants for curves are sought. The particular basis functions used in the illustration will be the wavelet bases and spline functions. Interested readers are referred to [1] for more details.

Several implementation issues arise in this invariant framework, which we will briefly discuss before describing the invariant expressions themselves. (A word on the notational convention: matrices and vectors will be represented by bold-face characters and scalar quantities by plain-face characters. 2D quantities will be in small letters, while 3D and higher-dimensional quantities will be in capital letters. For example, the coordinates of a 2D curve will be denoted by a bold, lower-case symbol such as $\mathbf{c}$.)

1. How are contours extracted?

Or, stated slightly differently, how is the problem of segmentation (separating objects from background) addressed? Segmentation turns out to be an extremely difficult problem and, as fundamental a problem as segmentation is, there is no fail-proof solution. A "perfect" segmentation scheme is like the holy grail of low-level computer vision and a panacea to many high-level vision problems as well.

We are not in search of this holy grail, which, we believe, is untenable in the foreseeable future. In an image database application, the problem of object segmentation is simplified because:

• Database images can usually be acquired under standard imaging conditions, which allows the ingest and catalog operations to be automated or semi-automated. For example, to construct a database of airplane images, many books on civil and military aircraft are available with standard front, side, and top views taken against a uniform or uncluttered background. (The above is also true for applications in botany and marine biology.) This allows the contours of the objects of interest to be extracted automatically or with the aid of standard tools such as the flood-fill mask in Photoshop (a minimal sketch of this extraction step appears after this list). Furthermore, the cataloging operations are usually done off-line and done only once. Hence, a semi-automated scheme will suffice.

• On the other hand, query images are usually taken under different lighting and viewing conditions. Objects of interest can be embedded deeply in cluttered background, which makes their extraction difficult. However, we can enlist the help of the user to specify the object of interest instead of asking the system to attempt the impossible task of automated segmentation. A query-by-sketch or "human-in-the-loop" type of solution, with an easy-to-use graphics interface and segmentation aids such as the flood-fill mask, is perfectly adequate here and does not impose undue burden on the user. This proved to be feasible in our experiments.
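As a concrete illustration of the ingest-side extraction, the following is a minimal sketch assuming a database image shot against a light, uncluttered background and an OpenCV 4.x installation; the threshold value and function names are our own illustrative choices, not the tools used in the original system.

import cv2
import numpy as np

def extract_object_contour(image_path, background_thresh=240):
    """Return the outer contour (N x 2 array of x, y points) of an object
    photographed against a light, uncluttered background (an assumption)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Pixels darker than the background threshold are treated as the object.
    _, mask = cv2.threshold(gray, background_thresh, 255, cv2.THRESH_BINARY_INV)
    # OpenCV 4.x returns (contours, hierarchy); keep the largest outer contour.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea)
    return contour.reshape(-1, 2).astype(np.float64)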

2. How are contours parameterized?

For a contour-based description, a common frame of reference is usually needed that allows point correspondences to be established between two contours for comparison. The common frame of reference comprises a common starting point of traversal, a common direction of traversal, and a parameterization scheme that traverses to the corresponding points in the two contours at the same parameter setting. We will first discuss the parameterization issue and then address the issues of point correspondence and traversal direction.

When defining a parameterized curve $\mathbf{c}(\lambda) = [x(\lambda), y(\lambda)]^T$, most prefer the use of the intrinsic arc length parameter because of its simplicity and the fact that it is either invariant or transforms linearly under rigid-body motion and uniform scaling. However, under more general scenarios where shape deformation is allowed (e.g., deformation induced in an oblique view), the intrinsic arc length parameter is no longer invariant. Such deformation can stretch and compress different portions of an object's shape, and a parameterization based on intrinsic arc length will result in wrong point correspondences.

It is well known that many shape deformations and distortions resulting from imaging can be modeled as an affine transform, through which the intrinsic arc length is nonlinearly transformed [2]. An alternative parameterization is thus required. There are at least two possibilities. The first, called the affine arc length, is defined [2] as
$$\tau = \int_a^b \left(\dot{x}\,\ddot{y} - \ddot{x}\,\dot{y}\right)^{1/3} dt,$$
where $\dot{x}, \dot{y}$ are the first and $\ddot{x}, \ddot{y}$ the second derivatives with respect to any parameter $t$ (possibly the intrinsic arc length), and $(a, b)$ delimits the path along a segment of the curve.
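To make the definition concrete, here is a small numerical sketch of our own, under the assumption of a densely sampled closed contour; the finite differences and the absolute value (used to keep the cube root real) are illustrative choices, not the paper's implementation.

import numpy as np

def normalized_affine_arc_length(points):
    """Normalized affine arc length tau in [0, 1] for an (N, 2) array of
    contour samples, following the integral definition above with simple
    finite differences standing in for the derivatives."""
    x, y = points[:, 0], points[:, 1]
    dx, dy = np.gradient(x), np.gradient(y)        # first derivatives
    ddx, ddy = np.gradient(dx), np.gradient(dy)    # second derivatives
    # Cube root of the (absolute) affine "speed" at each sample.
    speed = np.abs(dx * ddy - ddx * dy) ** (1.0 / 3.0)
    tau = np.cumsum(speed)
    return tau / tau[-1]   # normalize by the total affine arc length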

Another possibility [2] is to use the enclosed area parameter
$$\sigma = \frac{1}{2} \int_a^b \left| x\,\dot{y} - \dot{x}\,y \right| dt.$$
One can interpret the enclosed area parameter as the area of the triangular region enclosed by the two line segments from the centroid of an object to two points $a$ and $b$ on the contour. It can be shown that both these parameters transform linearly under a general affine transform [2]. Hence, they can easily be made absolutely invariant by normalizing them with respect to the total affine arc length or the total enclosed area of the whole contour, respectively. We use these parameterizations in our experiments.

3. How are identical traversal direction and starting point guaranteed?

It will be shown that the invariant signatures (to be defined later) of two contours are phase-shifted versions of each other when only the starting point of traversal differs. Furthermore, the same contour parameterized in opposite directions produces invariant signatures that are flipped and inverted images of each other. Hence, a match can be chosen that maximizes certain cross-correlation relations between the two signatures.

Allowing an arbitrary change of origin and traversal direction, together with the use of an affine-invariant parameterization, implies that no point correspondence is required in computing our invariants.
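The alignment step can be pictured with the sketch below; the FFT-based circular correlation and the normalization are illustrative assumptions, standing in for the cross-correlation relations referred to above.

import numpy as np

def signature_match_score(sig_a, sig_b):
    """Best normalized circular cross-correlation between two equal-length
    signatures, tried in both traversal directions."""
    def best_circular_corr(a, b):
        a = (a - a.mean()) / (a.std() + 1e-12)
        b = (b - b.mean()) / (b.std() + 1e-12)
        # One correlation value per candidate starting-point shift.
        corr = np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))).real / len(a)
        return corr.max()
    # Reversing the traversal direction flips the signature, so test both.
    return max(best_circular_corr(sig_a, sig_b),
               best_circular_corr(sig_a, sig_b[::-1]))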

Now we are ready to introduce the invariant expressions themselves. Our invariant framework is very general and considers variations in an object's image induced by rigid-body motion, affine deformation, and changes in parameterization, scene illumination, and viewpoint. Each formulation can be used alone, or in conjunction with others. Due to the page limitation, we can only give a brief discussion of the invariants under rigid-body and affine transform, and summarize the invariant expressions under change of illumination and viewpoint. Interested readers are referred to [1] for more details.

Invariants under Rigid-Body Motion and Affine Transform. Consider a 2D curve $\mathbf{c}(\lambda) = [x(\lambda), y(\lambda)]^T$, where $\lambda$ denotes a parameterization which is invariant under affine transform (as described above), and its expansion onto the wavelet basis $\psi_{m,n}(\lambda)$ (where $\psi$ is the mother wavelet) as $\mathbf{c}(\lambda) = \sum_{m,n} \mathbf{u}_{m,n}\, \psi_{m,n}(\lambda)$, where the $\mathbf{u}_{m,n}$ are the projection coefficients.
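As an illustration of computing such projection coefficients, the sketch below decomposes the two coordinate functions of a contour (assumed already resampled uniformly in the affine-invariant parameter) onto a compactly supported wavelet basis with PyWavelets; the particular wavelet, depth, and boundary handling are assumptions made for the example.

import numpy as np
import pywt

def contour_wavelet_coefficients(points, wavelet="db4", level=4):
    """Wavelet projection coefficients of x(lambda) and y(lambda) for a
    contour given as an (N, 2) array, returned per decomposition scale."""
    x, y = points[:, 0], points[:, 1]
    coeffs_x = pywt.wavedec(x, wavelet, mode="periodization", level=level)
    coeffs_y = pywt.wavedec(y, wavelet, mode="periodization", level=level)
    # Pair the x and y coefficients scale by scale; each pair plays the role
    # of a 2D coefficient vector u_{m,n} in the expansion above.
    return [np.stack([cx, cy], axis=1) for cx, cy in zip(coeffs_x, coeffs_y)]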

[Table: invariant expressions for each scenario — rigid-body motion (using the spline basis); affine transform (using the wavelet basis); affine transform (using the spline basis); perspective transform (using the rational spline basis), where the expression relates the observed image curve to the database curve in rational spline form; and change of illumination.]

Figure 1: A database of airplane models (models (1)-(16)).

We used a two-stage approach in information fusion. Features invariant to affine deformation and perspective projection were first used to match the silhouette of the query airplane with the silhouettes of those in the database. We then employed the illumination invariants computed on the objects' interiors to disambiguate among models with similar shape but different colors. The results show that we were able to achieve 100% accuracy using our invariants formulation for a database comprising very similar models, presented with query images of large perspective shape distortion and change in illumination.
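The two-stage fusion strategy can be summarized in the following sketch; shape_similarity and illumination_similarity are placeholder callables standing in for the invariant-signature comparisons described above, and the shortlist size is an assumption.

def two_stage_retrieve(query, models, shape_similarity, illumination_similarity,
                       shortlist_size=3):
    """Stage 1: rank all database models by silhouette similarity under
    affine/perspective invariants.  Stage 2: re-rank the shortlist by
    illumination-invariant similarity of the objects' interiors."""
    ranked = sorted(models, key=lambda m: shape_similarity(query, m), reverse=True)
    shortlist = ranked[:shortlist_size]
    shortlist.sort(key=lambda m: illumination_similarity(query, m), reverse=True)
    return shortlist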

Table 2 shows the performance of using affine and perspective invariants for shape matching under a large change of viewpoint. For each query image (A through K), the affine and perspective invariant signatures were computed and compared with the signatures of all models in the database. Correlation coefficients as described in Sec. 2 were used to determine the similarity between each pair of signatures. Each row in Table 2 refers to a query image. Each of the ten columns represents the rank given to an airplane model from the database (shown in parentheses). The columns are ordered from left to right, with the leftmost column being the best match found. Only the top ten matches are shown. The values (not in parentheses) are the correlation coefficients. Entries printed in boldface are the expected (correct) matches.

Figure 2: The same airplanes in varying poses and illumination (query images (A)-(K)).

As can be seen, all query images were identified correctly. Fig. 3 shows a sample result. The leftmost image is the query image. The top three matches are in the next three columns; for each, the query image (solid) and the estimated image (dashed, using perspective invariants) are shown with the corresponding database model.

For this experiment, all query images were correctly matched with the models from the database, using affine and perspective invariants. However, the error values of the top two matches for, say, airplane K were very close to each other. This is because the top two matches have similar shapes and both are similar to the query image. The confidence in the selected matches can be strengthened by testing whether the interior regions of the objects are also consistent. Illumination invariants readily apply here.

For illumination invariants, a characteristic curve was uniquely defined on the surface of each airplane model in the database (performed off-line), so that its superimposition over the model emphasizes important (or interesting) color patterns in the image. Our perspective invariants scheme computed the transformation parameters that best match the two given contours. The same parameters were used to transform the characteristic curve defined for each model to its assumed pose in the query image. Hence, the colors defined by the characteristic curve in the model should match the colors defined by the transformed curve in the query image (except for changes due to illumination). Illumination invariant signatures for the query images were then computed and compared with the signatures stored in the database using Eq. 4.
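A rough sketch of this interior check is given below; it samples colors along a characteristic curve already transformed into the query image and compares two color profiles after a crude per-channel normalization. This is an illustrative stand-in only and is not the illumination invariant of Eq. 4.

import numpy as np

def sample_colors_along_curve(image, curve_xy):
    """Sample the RGB values of `image` at the (x, y) points of a curve,
    using nearest-pixel lookup; returns an (N, 3) array."""
    xs = np.clip(np.round(curve_xy[:, 0]).astype(int), 0, image.shape[1] - 1)
    ys = np.clip(np.round(curve_xy[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[ys, xs, :].astype(np.float64)

def color_profile_score(colors_a, colors_b):
    """Correlation of two color profiles after dividing out each channel's
    mean, a crude way to discount a global illumination scaling."""
    a = colors_a / (colors_a.mean(axis=0, keepdims=True) + 1e-12)
    b = colors_b / (colors_b.mean(axis=0, keepdims=True) + 1e-12)
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])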

We show one result of illumination invariants, where the (perspective invariant) errors of the top matches were very close (see Fig. 4).

Table 2: Rank (using affine and perspective invariants) of each database model for each query image.

Figure 3: Query image A, with the top three matches from the database, using perspective invariants. (Solid for the query image, dashed for the estimated image using perspective invariants.)

Figure 4: (a),(d) Airplane models with the characteristic curves superimposed; (b),(e) query image with the transformed characteristic curves superimposed; and (c),(f) illumination invariant signatures for query image K (solid) and for models 14 and 3 (dashed).