An Image Retrieval Method Based on a Genetic Algorithm Controlled by User’s Mind

Syuko KATO

This paper describes an image retrieval method based on an interactive Genetic Algorithm controlled by the minds of individual users. The method dynamically enhances the ability to reflect the mind of the user in the retrieval process and improves the retrieval efficiency. By using this method, without explicit representation of the desired image, the image is nevertheless induced through feature selection based on the diverse, fragmentary, and ambiguous subjective intent of the user. This points the way to a powerful system that is capable of guessing the mind of the user and is not dependent on overt representation of images.

Keywords: Image retrieval, Genetic Algorithm, Interaction, Subjectivity, Feature selection

1 Introduction

We live in a world flooded with an ever-rising tide of information, making information retrieval technologies more important than ever before in helping users find their way in a user-friendly and efficient manner. Since users see a variety of implications in an image when searching for images, a method for making non-superficial inferences regarding each user’s intentions is required, particularly when a system handles abundant and diversified images.

Statistical and linear algebraic methods have often been used in conventional research on image retrieval methods that seek to assess user intentions. One approach involves entering language keys, as in picture retrieval[1] based on the entry of “impressional words”; another involves entering image keys, as in similar-image retrieval[2][3] for trademark designs and butterfly patterns. However, those methods entail calculating the relations between language keys and image features and weighting image features a priori based on training images in order to reflect the user’s mind. Such approaches fare poorly when user intentions are unclear or change during image retrieval.

In recent years, techniques intended to facilitate interactions between the user and the retrieval system have received increasing attention. Some reports[4][5][6][7] indicate that it is effective to integrate the user’s subjective valuation directly into the retrieval system; this approach is not limited to image retrieval. In particular, genetic algorithms (GA), used as an interactive evolutionary method, can automatically provide effective solutions by assigning fitness values to images based on user estimations, without requiring the definition of complex estimation functions. This technique has been explored for image retrieval applications such as photomontages of wanted crime suspects[8], nature image retrieval[9], and line-drawn portrait retrieval[10]. It is regarded as an effective method for users who cannot articulate their intentions in the form of keys before image retrieval, but can estimate each presented image.

Nevertheless, conventional image retrieval methods have at least the following two problems in terms of the expression and handling of image features from large-scale, diversified image databases.

First, they limit the range of image features according to application area. In most cases, they use only physical and graphical image features[11] such as color, geometry, and position that can be directly derived from the image; even when they use feature expressions of the semantic layer[11], such expressions are primarily limited to so-called Kansei words, or adjectives. However, an individual user’s impressions of a desired image clearly cannot be depicted by limited sets of feature expressions. For example, a picture of a lying dog may suggest a sense of relaxation to the user, or a picture of a falling apple may bring to mind a typhoon or “flash,” due to associations unrelated to color or Kansei words. We must not neglect the role of feature expressions of the semantic layer that indicate meanings suggested by and derived from an image indirectly. Despite the technological challenges of automatically acquiring feature expressions of the semantic layer, the retrieval mechanism must be able to handle a variety of features involving the physical, graphical, and semantic layers if we are to develop an image retrieval method capable of deep inferences into complex human minds.

Second, conventional methods fail to address image features as fragmentary expressions typical of complex mental states. In most cases, the relationship between an image and an image feature is uniquely determined with uniform feature expressions; all explicit image features are counted, and distances are calculated in a feature space reflecting subjective amounts (quantified subjective weights) to infer the desired image. However, large-scale and diversified image databases are more likely to have non-uniform feature expressions for each image, including those involving the semantic layer. The calculation of distance in a large feature space reduces retrieval efficiency, with no offsetting advantages. In addition, this approach ignores the implications of images that are not successfully described, and the range of inferences based only on explicit feature expressions is necessarily limited. During image retrieval, the user may not have a clear image in mind; may express the desired image poorly; may experience sensory associations different from those registered in the system; or may have more than one solution in mind, with images that are ambiguous, fragmented, diversified, and multimodal, that is, complex. The image features expressed by the system are only fragments reflecting limited aspects of the user’s mental state. Although we believe it is possible for the user to consciously control the image retrieval process, we also believe that the complexities of mental states cannot be regarded as static quantities and are not subject to precise expression. After all, we constantly have a “change of mind.” To approach the human mind, we need a method that can dynamically infer the user’s intentions based on information derived from an interactive process between the user and the system, while leaving ambiguities in place. In this sense, we call such an image retrieval method an “image retrieval method controlled by user’s mind.”

Keeping these aspects in mind, we have proposed an image retrieval method that tolerates the imperfection of feature expressions, based on an interactive GA[12][13][14][15] that can address the image features of the physical, graphical, and semantic layers. In our proposals, we show that the results of feature selection respond flexibly to the user’s implicit expectations[12] when the user estimates the images presented by the system. In the body of research concerned with exploring various levels of feature expressions, there is a layered model[16] that is primarily based on the perspective of the semantic layer. Our approach differs from that model in that we do not conduct separate matching operations for each level of feature expressions, but implement matching at an integrated level. Even our approach has difficulty providing the matching images required for user estimations, so image searches can be quite time-consuming, and performance depends greatly on user estimations made after perusing many images. Furthermore, our approach relies on the emergent retrieval capability of GA to guess the user’s mind, reflecting user estimations only in image fitness.

Journal of the Communications Research Laboratory Vol.48 No.2 2001

In this paper, we propose a technique that improves on the method of reference [12]. To control the image retrieval process based on the user’s mind, we increase the interaction between the user and the system by allowing the user to change his/her estimation of the presented images and the number of presentable images at any time, even during an image search. The user’s intentions are faithfully tracked by applying the information obtained from such interaction to a crossover technique. Our method also boosts retrieval efficiency by using information obtained during the retrieval history and interactions involving the acquisition and presentation of matching images.

In the following sections, Section 2 outlines the image retrieval method and describes its effectiveness in reflecting user intentions. Section 3 describes a proposed method for improving retrieval efficiency, while Section 4 discusses computer simulations and results. In this paper, we will refer to the method we have developed thus far as the “basic method,” while referring to the improved basic method as the “proposed method.” When no distinction needs to be made, we will refer to both simply as “this method.”

2 Image Retrieval Method Based on a Genetic Algorithm

This technique executes image retrieval while selecting features, using an interactive GA technique to control the image retrieval process based on the user’s mind. Each user’s intention is interactively taken into the system as the “environment,” while GA controls the selection results through numerous generations. These results, internally represented as “individuals,” provide clues to the user’s intention. This makes possible the retrieval of images that match the user’s intentions. Concretely, the user only answers questions about the images presented (YES, NO, UNDECIDED) during the retrieval process. This method is a browsing-type image retrieval technique in which the presented images become the retrieval results. The goal of this retrieval process is not to search for a single best image, but to find more than one appropriate image. If the user wishes to enter search conditions at the start of a search, he or she may enter an arbitrary feature or a combination of features (logical product) defined by the system as an initial key. If the user has no search condition in mind, he/she is not required to enter any.

This method emphasizes the following two points, as described in Section 1. First, it is capable of handling all the features of the physical, graphical, and semantic layers; second, the feature selection results are matched with images based not on distance, but on a set relationship. Determining correlation based on a set relationship allows imperfections in expression and does not restrict implicit meanings. The following is a brief explanation of the design method[12].

(1) Coding

As each emerging chromosome describes the result of feature selection in GA, the gene, the smallest operational unit in GA, is defined as a feature, namely, a pair of a feature axis and a feature value. When n (n≥1) genes have been collected, a chromosome is formed and becomes the feature selection result. The chromosome length n is determined when the user enters the initial key. One generation consists of as many as m (m≥1) chromosomes. The population size m may change.
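The coding can be pictured with ordinary tuples and lists. The sketch below is only an illustration under assumed names: the integer labels and the helpers `make_chromosome` and `selected_features` are hypothetical and are not taken from the design method[12].

```python
from typing import FrozenSet, List, Tuple

# A gene is one feature: a (feature axis, feature value) pair of integer labels.
Gene = Tuple[int, int]
# A chromosome is a list of n genes and stands for one feature selection result.
Chromosome = List[Gene]

# Hypothetical integer labels, invented for this illustration.
AXIS_SUBJECT, AXIS_IMPRESSION = 1, 2
DOG, RELAXED = 10, 20


def make_chromosome(genes: List[Gene], n: int) -> Chromosome:
    """Build a chromosome of length n (n >= 1) from exactly n genes."""
    if n < 1 or len(genes) != n:
        raise ValueError("a chromosome must hold exactly n >= 1 genes")
    return list(genes)


def selected_features(chromosome: Chromosome) -> FrozenSet[Gene]:
    """The feature selection result: the set of features on the chromosome."""
    return frozenset(chromosome)


# One generation holds up to m chromosomes; here m = 2 and n = 2.
population: List[Chromosome] = [
    make_chromosome([(AXIS_SUBJECT, DOG), (AXIS_IMPRESSION, RELAXED)], n=2),
    make_chromosome([(AXIS_SUBJECT, DOG), (AXIS_SUBJECT, DOG)], n=2),
]
```

Because every feature is just a pair of integers, features from the physical, graphical, and semantic layers can share one representation.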

In this method, since a feature is defined as a pair of a feature axis and a feature value, each labeled with an integer, the features of all the physical, graphical, and semantic layers can be treated as genes. This facilitates handling a variety of features: new features are labeled with integers and added to the already defined features, although in practice this approach is limited by computing resources. Moreover, the contents of the feature definition can be highly flexible: reflecting the combined intentions of the system designer and index designer and the characteristics of each feature, a single feature axis allows one or more feature values. A feature value defined for one feature may be defined as the feature axis of another feature. Although we have determined that the chromosome length is a constant from a physical viewpoint, all the defined features need not occur on a single chromosome. Since a chromosome may contain the same gene more than once, chromosome length is variable from a logical viewpoint.

(2) Initial population

An initial population is composed from the initial key entered by the user at the start of the search, as a retrieval condition can be described by the initial population in GA. If the user does not enter an initial key, up to m chromosomes are randomly generated and used as the initial population. If the user does enter an initial key, the initial population consists of up to m chromosomes, made up of chromosomes reflecting the arbitrary number of entered features (pairs of a feature axis and a feature value) and randomly generated chromosomes.

(3) Genotype and phenotype

The relationship between the genotype and phenotype in GA is analogous to that between feature expression and image in image retrieval. Although an image is an object/entity of which a feature can be uniquely described as a collection of pixels, this method is based on the assumption that an image can be endowed with infinite meanings by the user’s complex mind, and that the feature expression and image are related by set theory. The genotype chromosome is regarded as a logical product of features, which becomes the feature selection result providing clues to user intention. Subsequently, all images whose features include such a feature selection result are regarded as phenotype images and provided as matching images corresponding to user intention.

(4) Fitness

Fitness is assigned to each chromosome. GA forms populations, placing priority on the chromosomes of the highest fitness, and proceeds to evolve generations. To select features that provide clues for inferring the user’s subjective intentions, fitness must reflect the user’s mind. Thus, in assigning fitness, an interactive method is adopted to obtain the user’s direct estimation of the system output during retrieval. At this time, it is important for the system to present output in the form of an image (phenotype). Images whose features include all the features represented in a given chromosome are shown to the user, and the user estimates each image to determine image fitness. Note that fitness must be defined for the case in which more than one image matches a single chromosome. For convenience, the average of the fitness values assigned to the matched images is adopted as the fitness of the corresponding chromosome. Also for convenience, the fitness value is defined so that, whatever the user estimation, an image more likely to match receives a higher value. Table 1 lists six types of fitness corresponding to each condition. Introducing the concept of image fitness degree allows us to automatically provide intermediate fitness. Since image fitness is set to 25% when the user estimation is NO, the same effects are obtained as with the bias fitness[17], which does not provide explicit differences based on estimations.

(5) Genetic operation

Chromosomes in the population (of size m) to which fitness has been assigned are subject to genetic operations such as selection, crossover, and mutation. For selection, the elite chromosomes (e in number) are first taken in descending order of fitness. For the remaining chromosomes (m - e), the roulette method (selection with probability proportional to fitness; overlap permitted) is applied. Chromosomes other than elite chromosomes are subjected to crossover. Pairs are taken in the order adopted during selection, and crossover operations are performed according to the crossover rate. Adopting the general idea of uniform crossover allows equal handling of all the genes in each chromosome. In the basic method, mutations occur only in chromosomes containing pairs of contradictory features that cannot arise together (contradictory chromosomes). A mask having the length of such a chromosome is created, and whether each locus mutates is determined according to the mutation rate. The mutation is then performed at the locus of each gene selected for mutation by randomly generating the feature axis and the feature value. The mutation operation is repeated until the features that appear in the chromosome no longer contradict.
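The selection step described above can be sketched as follows. This is a minimal illustration, not the implementation of the basic method: the function name `select` is hypothetical, and the text does not say whether the roulette wheel spans the whole population or only the non-elite chromosomes, so the sketch assumes the whole population.

```python
import random
from typing import List, Sequence, Tuple

Gene = Tuple[int, int]
Chromosome = List[Gene]


def select(
    population: Sequence[Chromosome],
    fitness: Sequence[float],
    e: int,
    rng: random.Random,
) -> List[Chromosome]:
    """Return m chromosomes: e elites, then m - e roulette picks."""
    m = len(population)
    if len(fitness) != m or not 0 <= e <= m:
        raise ValueError("need one fitness per chromosome and 0 <= e <= m")
    # Elite selection: the e chromosomes with the highest fitness.
    order = sorted(range(m), key=lambda i: fitness[i], reverse=True)
    elites = [population[i] for i in order[:e]]
    if e == m:
        return elites
    # Roulette selection with replacement ("overlap permitted").
    # Assumption: the wheel spans the whole population, elites included.
    if sum(fitness) > 0:
        picks = rng.choices(population, weights=fitness, k=m - e)
    else:
        picks = rng.choices(population, k=m - e)  # all-zero fitness: uniform
    return elites + picks
```

A chromosome with fitness 0, such as a contradictory chromosome in Table 1, is never drawn by the wheel as long as some other chromosome has positive fitness.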

In the genetic operations adopted in this image retrieval method, selection is performed to select features that match user intentions. Crossovers are performed to obtain new pairs of features, while mutations are performed to retain features in pairs that do not contradict.
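The role of mutation in the basic method, repairing contradictory chromosomes, can be illustrated with a short sketch. How contradictions are defined is not specified in this paper, so the sketch assumes a hypothetical set `forbidden` of feature pairs that cannot co-occur, and it draws replacement genes from a hypothetical list `features` of defined features; all names are illustrative.

```python
import random
from typing import FrozenSet, List, Sequence, Set, Tuple

Gene = Tuple[int, int]
Chromosome = List[Gene]


def is_contradictory(chromosome: Chromosome, forbidden: Set[FrozenSet[Gene]]) -> bool:
    """True if the chromosome holds both features of any forbidden pair."""
    genes = set(chromosome)
    return any(pair <= genes for pair in forbidden)


def repair_by_mutation(
    chromosome: Chromosome,
    features: Sequence[Gene],
    forbidden: Set[FrozenSet[Gene]],
    mutation_rate: float,
    rng: random.Random,
    max_rounds: int = 10_000,
) -> Chromosome:
    """Mutate a contradictory chromosome until no forbidden pair remains."""
    result = list(chromosome)
    rounds = 0
    while is_contradictory(result, forbidden):
        if rounds == max_rounds:
            raise RuntimeError("no non-contradictory chromosome found")
        rounds += 1
        # One pass over the chromosome plays the role of the mutation mask:
        # each locus mutates with probability mutation_rate.
        for locus in range(len(result)):
            if rng.random() < mutation_rate:
                result[locus] = rng.choice(features)  # random axis/value pair
    return result
```

A chromosome that is not contradictory is returned unchanged, which matches the statement that mutations occur only in contradictory chromosomes.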

3 Proposal of Improved Basic Method

3.1 Outline

To achieve an image retrieval method controlled by the user’s mind, we propose a technique that improves the degree of “subjectivity reflection” and the retrieval efficiency of the basic method described in Section 2. In the proposed method, the information obtained through interactions with the user is exploited extensively by a more interactive method. The outline of the proposed algorithm and its mechanism are shown in Fig.1. This image retrieval system consists of four major units: the image database unit, user interface unit, initial input processing unit, and image retrieval unit.

The image database unit stores/manages image data and extracts/manages information (image information) on the image data, including defined features. It exchanges the required information with the initial input processing unit and the image retrieval unit. Large-scale image databases are built by procuring/controlling image data through networks and are realized by expanding the image database unit.

The user interface unit is the point at which the system and the user interact directly. In the proposed method, the system accepts the initial key entered by the user and the historical information described in Section 3.3.1 as search conditions, as well as the user estimation of the images presented and the number of presentable images (browsing threshold) at any time during retrieval, to achieve flexible interactive operation. In addition to similar images or unregistered words, systems at some future date may be able to accept nonverbal information such as the user’s viewpoint and facial expressions. This would enhance user input flexibility and allow such a system to acquire even more mind information, which is the central factor for any control based on the user’s mind. Future challenges will include further upgrades of the user interface and more sophisticated handling of knowledge bases.

The initial input processing unit converts the search conditions entered through the user interface into a form used in GA. The proposed method generates an initial population based on the search conditions, determines chromosome length, and provides the image retrieval unit with a normal population consisting of pairs of non-contradicting features.

Table 1 Type of fitness

No.  Type of fitness  User estimation  Situation                                            Value (%)
1    Image fitness    Present          Image for which the user estimation is YES           100
2    Image fitness    Present          Image for which the user estimation is NO            25
3    Image fitness    Present          Image for which the user estimation is UNDECIDED     75
4    Image fitness    Absent           Image for which the user estimation is thinned out   50
5    Fitness          Absent           Chromosome with no matching image                    50
6    Fitness          Absent           Contradictory chromosome                             0
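The values of Table 1 and the averaging rule of Section 2 (4) can be combined into a short sketch of how a chromosome might receive its fitness. The names `matching_images` and `chromosome_fitness`, the string labels, and the image identifiers are hypothetical. The sketch also assumes that a matching image without a user estimation corresponds to the thinned-out case (No. 4); that is one reading of the table rather than something the text states.

```python
from typing import Dict, List, Mapping, Sequence, Set, Tuple

Gene = Tuple[int, int]  # (feature axis, feature value) as integer labels

# Image fitness (%) by user estimation, Table 1 No. 1-4.
IMAGE_FITNESS: Dict[str, float] = {
    "YES": 100.0,
    "NO": 25.0,
    "UNDECIDED": 75.0,
    "THINNED_OUT": 50.0,  # assumed: matching image with no user estimation
}
NO_MATCH_FITNESS = 50.0       # Table 1 No. 5
CONTRADICTORY_FITNESS = 0.0   # Table 1 No. 6


def matching_images(
    chromosome: Sequence[Gene], image_features: Mapping[str, Set[Gene]]
) -> List[str]:
    """Images whose feature set contains every feature on the chromosome."""
    wanted = set(chromosome)
    return [name for name, feats in image_features.items() if wanted <= feats]


def chromosome_fitness(
    chromosome: Sequence[Gene],
    image_features: Mapping[str, Set[Gene]],
    estimations: Mapping[str, str],
    contradictory: bool = False,
) -> float:
    """Average image fitness over matching images, with Table 1 fallbacks."""
    if contradictory:
        return CONTRADICTORY_FITNESS
    matches = matching_images(chromosome, image_features)
    if not matches:
        return NO_MATCH_FITNESS
    values = [IMAGE_FITNESS[estimations.get(name, "THINNED_OUT")] for name in matches]
    return sum(values) / len(values)
```

The subset test in `matching_images` is the set relationship of Section 2: an image matches when it carries at least the features on the chromosome, whatever else it carries.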

The image retrieval unit consists of a matching image control unit, feature selection unit, and mind-inferring unit, and executes feature selection reflecting the user’s mind by repeating generation processing and performing image retrieval. The proposed method defines the feature reference amount described in Section 3.2.1 to improve the way the user’s mind is reflected, and applies it to thinning out the matching images in the matching image control unit and to the crossover and mutation operations in the feature selection unit. The matching image control unit repeats a series of chromosome processes, consisting of the chromosome-image correlating process, the matching image thinning process, and the image presenting process, in proportion to the size of the population. The information provided by the user through interactive exchanges is processed as control information reflecting the user’s mind, and the calculated fitness and feature reference amount are managed by the mind-inferring unit. The mind-inferring unit only manages and provides the fitness and feature reference amount; the proposed method does not require a specific model of mind. After chromosome processing of one generation, the feature selection unit performs selection, crossovers, and mutations, referring to the fitness and feature reference amount managed by the mind-inferring unit, in order to improve the subjectivity reflection and retrieval efficiency, and generates the next-generation population.

The following sections describe ways to improve subjectivity reflection (Section 3.2) and retrieval efficiency (Section 3.3).

Fig.1 Outline of the proposed method

3.2 Improvement of subjectivity reflection

3.2.1 Definition of feature reference amount

In order to improve the accuracy of subjectivity reflection in the basic method and control image retrieval with the user’s mind, the feature reference amount has been newly defined as a quantity that reflects the user’s mind. The feature reference amount is a statistical quantity through which the user estimation of each image presented by the system is reflected not just in the current chromosome, in the form of fitness, but also strongly in the genes comprising the chromosome. A feature reference amount, with an initial value of zero, is attached to each feature defined in the feature space; these features emerge as genes across generations. Each time a feature emerges as a gene, its feature reference amount is renewed with reference to the fitness assigned to the chromosome in which it emerged. In the research for this paper, the following two methods of calculating the feature reference amount were studied. One adopts the average of the fitness values obtained each time the feature has emerged as a gene as the feature reference amount of the feature (hereafter, Fa-value); the other adopts the largest of those fitness values as the feature reference amount of the feature (hereafter, Fm-value).
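The two calculation methods can be sketched as running statistics kept per feature. The class below is a hedged illustration with hypothetical names (`FeatureReference`, `update`, `fa`, `fm`); it additionally assumes that a gene repeated within one chromosome counts as a single emergence, which the text does not specify.

```python
from typing import Dict, Iterable, Tuple

Gene = Tuple[int, int]


class FeatureReference:
    """Running Fa-value (mean) and Fm-value (max) for every feature."""

    def __init__(self) -> None:
        self._total: Dict[Gene, float] = {}
        self._count: Dict[Gene, int] = {}
        self._largest: Dict[Gene, float] = {}

    def update(self, chromosome: Iterable[Gene], fitness: float) -> None:
        """Renew the amounts of the features that emerged on this chromosome."""
        # Assumption: a gene repeated on one chromosome counts as one emergence.
        for gene in set(chromosome):
            self._total[gene] = self._total.get(gene, 0.0) + fitness
            self._count[gene] = self._count.get(gene, 0) + 1
            self._largest[gene] = max(self._largest.get(gene, fitness), fitness)

    def fa(self, gene: Gene) -> float:
        """Fa-value: average fitness over emergences (0 before any emergence)."""
        count = self._count.get(gene, 0)
        return self._total[gene] / count if count else 0.0

    def fm(self, gene: Gene) -> float:
        """Fm-value: largest fitness over emergences (0 before any emergence)."""
        return self._largest.get(gene, 0.0)
```

For example, a feature that emerged on chromosomes with fitness 100 and 50 would have an Fa-value of 75 and an Fm-value of 100.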

In reflecting the user’s mind, combinations of features often have greater significance than a single feature. This is because the user changes the priority placed on each feature within a combination, and because each feature plays one of three roles: supported feature, unsupported feature, or irrelevant feature. Moreover, a combination of features is not limited to features explicitly described; it also includes features that have not been explicitly described, which makes the user’s mind hard to understand under such circumstances. Fitness is general information on the user’s mind obtained through combinations of features dependent on the individuality of each image database, while the feature reference amount is statistical information depicting the user’s mind, which depends on combinations of features and the emergence of each feature. In the proposed method, the fitness and the feature reference amount are utilized as simple parameters to control the image retrieval process based on the user’s mind.

3.2.2 Dominant conditioned uniform crossover
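In brief, the operator defined in this subsection is a uniform crossover with one added condition: at each locus where the random binary mask is 1, the two parents exchange genes only if the feature reference amount of Parent 1’s gene is equal to or less than that of Parent 2’s gene. The following minimal sketch illustrates that condition. The function names are hypothetical, `reference` stands for either the Fa-values or the Fm-values, and features with no recorded amount are assumed to have the initial value of zero.

```python
import random
from typing import List, Mapping, Sequence, Tuple

Gene = Tuple[int, int]
Chromosome = List[Gene]


def random_mask(length: int, rng: random.Random) -> List[int]:
    """Binary random mask with one bit per gene locus."""
    return [rng.randint(0, 1) for _ in range(length)]


def dominant_conditioned_crossover(
    parent1: Chromosome,
    parent2: Chromosome,
    reference: Mapping[Gene, float],
    mask: Sequence[int],
) -> Tuple[Chromosome, Chromosome]:
    """Swap genes where mask is 1 and Parent 1's amount <= Parent 2's amount."""
    if not len(parent1) == len(parent2) == len(mask):
        raise ValueError("parents and mask must have the same length")
    child1, child2 = list(parent1), list(parent2)
    for locus, bit in enumerate(mask):
        amount1 = reference.get(parent1[locus], 0.0)
        amount2 = reference.get(parent2[locus], 0.0)
        if bit == 1 and amount1 <= amount2:
            child1[locus], child2[locus] = parent2[locus], parent1[locus]
    return child1, child2
```

At every masked locus the first child ends up with the gene whose reference amount is the larger (or equal), so one child gathers superior genes and the other inferior genes. Dropping the comparison gives the simple uniform crossover of the basic method.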

A dominant conditioned uniform crossover using the feature reference amount is proposed to improve the subjectivity reflection in the feature selection results arising through generations. In general, crossovers are performed upon pairs of chromosomes (Parent 1, Parent 2) determined to be subject to crossover according to the crossover rate. The uniform crossover is a crossover technique that generates a binary random mask the length of the chromosome and determines the locations of crossover according to the mask values (a crossover is performed where the mask value is “1”). This is the crossover technique employed in the basic method, referred to in this paper as a simple uniform crossover. In the dominant conditioned uniform crossover proposed here, an additional condition is considered in determining the location of the crossover. When the mask value is 1, the feature reference amounts assigned to the features that have emerged in the gene pair at that locus are compared, and a crossover is performed if the feature reference amount of Parent 1 is equal to or less than that of Parent 2. If the gene with the larger feature reference amount is called the superior gene and the other the inferior gene, the above operation effectively sorts chromosomes into a type containing many superior genes and a type containing many inferior genes.

3.2.3 Ways of enhancing situation dependency

In this method, because an interactive process has been installed in GA, the feature selection reflecting the user’s mind proceeds in parallel with image retrieval. This method is appropriate for gradually accepting changes in the user’s mind, that is, for depending on circumstances. This is an advantage of adopting GA to realize an image retrieval method controlled by the user’s mind. However, in the basic method, since no further estimation is permitted for an image that has already been estimated by the user, changes of mind cannot be quickly reflected in the search process. To improve the flexibility of that method, we propose a technique that enhances situation dependency to improve subjectivity reflection.

The central part of Fig.2 is a window used for user estimations as implemented in the proposed method. When an image is presented by the system in the estimation-waiting window (right), the user estimates the image using a mouse. Images for which estimation has been completed are moved to the estimation-completed window (left), where the latest estimation results (YES, UNDECIDED, NO) can be identified as far as scrolling of the window permits. A change in the user’s mind is accepted at any time: the user moves the image whose estimation he or she wants to change out of the estimation-completed window with the mouse. At this time, the system reflects an appropriate feature selection result that has influenced the estimation of the image, i.e., a chromosome, in determining the population of the next GA generation. As shown in Fig.1, for each image that matches chromosomes, a chromosome pair fulfilling the following condition is constantly held and renewed during image retrieval. One chromosome of the pair (Cfmax) has the largest fitness among those that have emerged so far, and the other (Cfmin) has the lowest fitness; each is the most recently emerged such chromosome with the fewest matching images. If the user changes his or her estimation, either Cfmax or Cfmin is temporarily added as a potential chromosome for the next-generation population, according to the direction of the change, and the fitness of each chromosome is determined by the matching process between the chromosome and the image. The population is then restored to the predetermined size by selection. Four directions of change are permitted in this case, as shown below. When the user changes the estimation from YES to NO, or from UNDECIDED to NO, the Cfmax chromosome is added. For a change from NO to YES, or from UNDECIDED to YES, the Cfmin chromosome is added. This rule is defined so that fitness changes the most in response to the change in user estimation.

The proposed method improves the degree of interaction with the user: the user is permitted to change his or her past estimation at any time during image retrieval, enabling the user to interact with the system easily and naturally, for user-friendly image retrieval. In addition, changes in the user’s mind can be quickly reflected in the feature selection and thus in the subsequent trend of presented images. This refines and improves the subjectivity reflection in response to changes in the user’s mind.
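The rule for the four permitted directions of change can be written down compactly. The sketch below uses hypothetical names (`ExtremePair`, `observe`, `chromosome_for_change`) and simplifies the bookkeeping: ties in fitness are resolved in favor of the most recently emerged chromosome, while the additional preference for chromosomes with the fewest matching images is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Gene = Tuple[int, int]
Chromosome = List[Gene]


@dataclass
class ExtremePair:
    """Cfmax and Cfmin held for one image during retrieval."""

    cfmax: Optional[Chromosome] = None
    cfmin: Optional[Chromosome] = None
    fmax: float = float("-inf")
    fmin: float = float("inf")

    def observe(self, chromosome: Chromosome, fitness: float) -> None:
        """Renew the pair with a chromosome that matched this image."""
        # ">=" and "<=" let the most recently emerged chromosome win ties.
        if fitness >= self.fmax:
            self.cfmax, self.fmax = list(chromosome), fitness
        if fitness <= self.fmin:
            self.cfmin, self.fmin = list(chromosome), fitness


TO_CFMAX = {("YES", "NO"), ("UNDECIDED", "NO")}
TO_CFMIN = {("NO", "YES"), ("UNDECIDED", "YES")}


def chromosome_for_change(old: str, new: str, pair: ExtremePair) -> Optional[Chromosome]:
    """Chromosome to add temporarily to the next-generation candidates."""
    if (old, new) in TO_CFMAX:
        return pair.cfmax
    if (old, new) in TO_CFMIN:
        return pair.cfmin
    raise ValueError(f"change {old} -> {new} is not one of the four permitted")
```

After the returned chromosome is added, fitness would be recomputed by the usual matching process and the population brought back to its predetermined size by selection.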

3.3 Improving retrieval efficiency

3.3.1 Use of historical information

As generations evolve, this method increases the number of chromosomes featuring high fitness, that is, the feature selection results that reflect the user’s mind. We propose a technique of utilizing the feature selection results obtained for images in past searches as historical information to raise retrieval efficiency, saving high-fitness chromosomes as historical information along with appropriate examples of matching images and their estimation results. When the user makes a request to use this historical information, the historical information is added to the initial population unchanged. Note that one set of historical information is provided for each image retrieval process. For convenience, a set consists of an appropriate population size of chromosomes, selected in descending order of fitness, going back in time from the latest generation, with no overlap in order of appearance. For the several sets of historical information saved in the system, the images and their estimations saved with the historical information are displayed in a suitable manner during image retrieval. Considering this information, the user determines whether to use the historical information and reflects this decision in the initial population. More than one set of historical information may be specified for use at this time.

Fig.2 Windows for user estimation

In the proposed method, the construction of the initial population is extended so that historical information can be entered in addition to the initial key (feature) entered by the user. In the basic method, the initial input permitted at the start of image retrieval was limited to the initial key or search condition, and a population-size set of chromosomes corresponding to the initial key was defined as the initial population. The concept of a sub-population is introduced here to allow historical information in the initial input. The population corresponding to the initial key is kept unchanged as one sub-population, and the chromosomes of each set of saved historical information form one sub-population reflecting that historical information. As a result, when the use of s (s≥0) sets of historical information is requested, up to s sub-populations of historical information are created. The union of these sub-populations becomes the initial population, and its size is the population size. In the generations that follow, no specific sub-population is recognized; generation processing is performed on the population as a whole. Note that, since sub-populations of historical information may have different chromosome lengths, the longest chromosome length among the sub-populations is taken as the representative chromosome length for the population. Sub-populations whose chromosomes are shorter than the representative length make up the shortage of genes by randomly selecting features from those that already appear in those chromosomes.
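A minimal sketch of this construction is given below, assuming chromosomes are non-empty tuples of feature names and that at least one chromosome is supplied. The function names and the optional `rng` parameter are hypothetical; the padding step simply repeats features already present in the short chromosome, which is how the last sentence above is read here.

```python
import random
from typing import List, Optional, Sequence, Tuple

Chromosome = Tuple[str, ...]  # genes are feature names (assumed encoding)


def pad_chromosome(chromosome: Chromosome, length: int, rng: random.Random) -> Chromosome:
    """Fill a short chromosome with features drawn from its own genes."""
    genes = list(chromosome)
    while len(genes) < length:
        genes.append(rng.choice(chromosome))
    return tuple(genes)


def build_initial_population(
    key_subpopulation: Sequence[Chromosome],
    history_sets: Sequence[Sequence[Chromosome]],
    rng: Optional[random.Random] = None,
) -> List[Chromosome]:
    """Concatenate the initial-key sub-population and s historical sub-populations."""
    rng = rng or random.Random()
    subpopulations = [list(key_subpopulation)] + [list(s) for s in history_sets]
    # The longest chromosome sets the representative length for the population.
    target = max(len(c) for sub in subpopulations for c in sub)
    return [pad_chromosome(c, target, rng) for sub in subpopulations for c in sub]
```

Because the padding only repeats existing features, the set of features selected by each chromosome is unchanged even though its physical length grows.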

Introducing the concept of the sub-population also allows natural handling of an initial key, entered by the user, that is a search condition consisting of as many as s logical sums, through the generation of s sub-populations, one for each logical unit. This definition of the initial population has a natural form: with many initial inputs of search conditions combined by logical product, the chromosome lengthens and the range of matching images narrows, while with many initial inputs combined by logical sum, the population size grows and the search range expands.

3.3.2 Control of matching images

Two problems related to matching images, i.e., the images corresponding to chromosomes, affect retrieval efficiency. One concerns the acquisition of matching images; the other concerns the thinning of matching images. Each problem is discussed below, and a solution is proposed. Both problems are solved by controlling matching images with feature reference amounts and their variance, which are statistical quantities reflecting the user’s mind.

(1) Problem relevant to matching images and its solution

In this method, the chromosomes created through generations express the feature selection results that have been controlled by the user’s mind. However, since most of the emerging chromosomes become feature selection results that do not match images in the database, images are not presented quickly to the user. Of course, this problem arises from the characteristics of each database: feature selection results that happen to be insignificant in the current database may be significant in another database. Reducing the emergence of chromosomes that do not match images, by considering the characteristics of each database, improves retrieval efficiency.

We propose a method of controlling insignificant feature selection results. As shown in Fig.1, the chromosomes (feature selection results) that have not matched images are marked first. Then, if the marked chromosomes still remain in the next generation after selection and crossover, they are subjected to the following mutation using the feature reference amount:

- Replace the genes of small feature reference amounts and variance with those of large feature reference amounts and variance in the target chromosomes (if the target chromosome holds only a single feature, a number of other features equal to the chromosome length are randomly created).
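The mutation rule above can be sketched as follows, under several assumptions that the text does not settle: each feature carries a pair (reference amount, variance), features are ranked by that pair, and only the single weakest feature in the chromosome is overwritten by the strongest one. The names `mutate_unmatched`, `Reference`, and `feature_pool` are hypothetical, and the feature pool is assumed to contain at least one feature other than the one being replaced.

```python
import random
from typing import Dict, Optional, Sequence, Tuple

Chromosome = Tuple[str, ...]
Reference = Dict[str, Tuple[float, float]]  # feature -> (reference amount, variance)


def mutate_unmatched(
    chromosome: Chromosome,
    reference: Reference,
    feature_pool: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Chromosome:
    """Mutate a marked chromosome that matched no image."""
    rng = rng or random.Random()
    # Rank the distinct features from weakest to strongest (name breaks ties).
    distinct = sorted(set(chromosome), key=lambda f: (reference.get(f, (0.0, 0.0)), f))
    if len(distinct) == 1:
        # Single-feature case: regenerate the whole chromosome from other features.
        others = [f for f in feature_pool if f != distinct[0]]
        return tuple(rng.choice(others) for _ in chromosome)
    weakest, strongest = distinct[0], distinct[-1]
    # Overwriting the weakest feature lowers the number of distinct features.
    return tuple(strongest if gene == weakest else gene for gene in chromosome)
```

In this sketch the physical chromosome length stays fixed while the number of distinct features drops by one per mutation.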

Since the above operation reduces the logical chromosome length while reflecting the user’s mind, we can expect the rate of emergence of significant feature selection results, i.e., those matching images, to improve. In addition, since the replacing genes have large variance, genes of high unpredictability are passed on to later generations, contributing to the creation of various feature selection results that provide indications of the user’s mind.

(2) A problem associated with thinning of matching images and a proposed solution

Since the basic method assumes a large-scale image database, not all matching images are shown to the user; they are randomly thinned to reduce the load on the user during estimation (the images thinned out are not presented to the user, but are assigned the image fitness listed in Table 1). However, thinning out images randomly is inefficient, since the number of matching images varies from chromosome to chromosome and there is a risk of missing a number of images wanted by the user. Retrieval efficiency improves when thinning that better reflects the user’s mind is performed, depending on the size of the database and the user’s tolerance for estimation.

We propose a method for controlling image browsing in which the user enters a value k, a browsing threshold indicating how many images are permitted on screen per browsing session, as shown in Fig.1. The user may alter k at any time during image retrieval. Only when the number of images matching a chromosome during chromosome processing exceeds k are the excess images thinned out and the remaining images presented to the user on screen. The order of presentation for images adheres to the following rule, based on the feature reference amount:

- Focusing on the features of the matching images other than those defined in the target chromosome, the system presents images in descending order of feature reference amount and ascending order of variance (presented randomly when exceeding k).

The above operation maintains user interaction and helps the user browse images that reflect his or her mind.
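The browsing control with threshold k can be sketched as below. The text does not say how the reference amounts of several features are combined per image, so this sketch sums them, which is an assumption; it also replaces the random ordering mentioned in the rule with a deterministic tie-break on the image identifier so that the result is repeatable. All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Reference = Dict[str, Tuple[float, float]]  # feature -> (reference amount, variance)


def select_for_browsing(
    matching: Dict[str, Set[str]],  # image id -> features defined for that image
    chromosome: Tuple[str, ...],
    reference: Reference,
    k: int,
) -> List[str]:
    """Return the image ids to show for one chromosome, at most k when thinning applies."""
    if len(matching) <= k:
        return list(matching)  # no thinning needed

    selected_features = set(chromosome)

    def sort_key(image_id: str):
        # Only features outside the target chromosome are considered.
        extra = matching[image_id] - selected_features
        amount = sum(reference.get(f, (0.0, 0.0))[0] for f in extra)
        variance = sum(reference.get(f, (0.0, 0.0))[1] for f in extra)
        return (-amount, variance, image_id)  # amount descending, variance ascending

    return sorted(matching, key=sort_key)[:k]
```

Because k is passed in on every call, a caller can change it between browsing sessions, in line with the statement that the user may alter k at any time.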

4 Computer Simulations

4.1 Outline of the simulations

Computer simulations have been performed to examine the improvements in subjectivity reflection and retrieval efficiency obtained by the proposed method. For qualitative estimation of the flexibility of the interface unit, some 2300 color images such as illustrations and photographs were used. In the other, quantitative experiments, for comparison against the basic method, 100 binary images (32×32 pixels) were used as stored images, as in Reference[12]. The quantitative results shown in this section are those obtained without entering any initial key, driven only by the user’s implicit intention. For simplicity, no UNDECIDED estimation was made. The implicit user intention was taken to be a “horizontal line.” The feature definition used in the experiment did not include an explicit feature for such a “horizontal line.” However, as related features, in the semantic layer we find a “line” feature along the “geometry name” feature axis, while in the graphical layer we find a “black pixel” feature for each area of the 64-divided images along the “black pixel area” feature axis. Black pixel areas consisting of horizontally arrayed black pixels can thus be expressed. Among the images that match the user’s expectation, manually drawn line images are defined with the “other” feature along the “geometry name” feature axis. The images defined by the “line” feature value include “vertical line” images that do not match user intentions. We have prepared tricks such as these in the feature definition for the stored images used in the experiment in order to provide diversity in the feature expressions of images that meet user expectations, and to simulate complex situations in which the same feature expression may have different meanings to the system and the user, while the system and the user may agree, or cross at a point, through different feature expressions. We assumed the thin-out rate of matching images in the case of a large-scale image database to be 90% (the 10% given in the experiments in References[12][13][14][15] is incorrect; 90% is correct). However, since the number of stored images used in the quantitative experiment is merely 100, the matching image control method described in Section 3.3.2 was not implemented, and the thinning was performed randomly.

4.2 Experimental Results and Discussion

4.2.1 Subjectivity reflection

The degree of subjectivity reflection can be determined from five requisite properties: individuality, situation-dependence, autonomy, diversity, and reproducibility.

Individuality and situation-dependence are central properties for reflecting one’s mind. In the proposed method, through interactions between the user and the system, the user can enter search conditions representing his/her mind and estimate the presented images on a real-time basis at any time during image retrieval. The image that fits user intentions can be provided on each occasion by incorporating the user estimation into the GA fitness and the feature reference amount. Individuality and situation-dependence are thereby reflected in the image retrieval process. In the experiment, the user specified historical information while looking at past retrieved images, and entered logical sum search conditions in addition to logical product conditions as initial keys. It was confirmed that the interface grew more flexible and that the reflection of individuality and situation-dependence became more accurate. Because changes in the user’s mind were constantly taken into account in an interactive manner, the user could change estimations occasionally, which produced a breakthrough when the feature selection had frozen and no longer provided new matching images.

Autonomy is the property of finding an image that matches the user intention without being restricted by the initial keys entered by the user. The proposed method is provided with autonomy. GA is characterized by emergence, with a built-in mechanism that ensures a high probability that high-fitness chromosomes survive into the next generation. Although the user did not enter initial keys in the experiment, the desired retrieval results were nevertheless obtained, and the feature selection results providing clues to the user’s mind during the retrieval process were obtained with high fitness, confirming the quality of autonomy. We also confirmed that the feature selection results (series of black pixels) reflecting the difference in user intention according to either implicit expectation (anticipation of a “horizontal line” or a “vertical line”) emerged with high fitness. Even when the user entered initial keys, feature selection results, including those superficially contradicting the initial keys, emerged with high fitness, and images matching the user’s mind were provided. Thus, the proposed method was able to reconcile differences in “sensitivity” between the user and the system.

Diversity is a latent ability indicating system adaptability, appearing when various users try to reflect their minds on the system. Considering that the present method reflects the user’s mind upon the system in the form of feature selection, the degree of diversity of the feature combinations seen in the feature selection results should be related to the readiness to reflect one’s mind. This draws heavily on the diversified search ability of GA, and the crossover method is an important factor in determining that ability. We have assessed the diversity of the proposed method based on the frequency of fitness maximum feature patterns (the rate of emergence of chromosomes that have a fitness of 100% and differ in the combination of features). Fig.3 shows the cumulative frequency of fitness maximum feature patterns plotted against the consecutive number of chromosomes (chromosome consecutive number) that have emerged through all generations, comparing the following four types of crossover, T0-T3.

T0: No selection or crossover. Chromosomes are randomly generated through all generations.
T1: No feature reference amount is set in each feature. Simple uniform crossover is performed. (Basic method)
T2: Fa is adopted as the feature reference amount, and a dominant conditioned uniform crossover is carried out. (Proposed method)
T3: Fm is adopted as the feature reference amount, and a dominant conditioned uniform crossover is carried out. (Proposed method)

Note that the imperfection in expression was taken into account in all types and that the relationship between chromosomes and images is defined as an inclusion relation. The experiment was performed at population size 50, chromosome length 10, elite selection rate 50%, crossover rate 100%, and mutation rate 100%. As generations evolve, the cumulative frequency increases in every type. In comparison, T3 shows slightly higher values than the others in early generations, while T2 starts to increase significantly at a consecutive chromosome number of around 50000 (1000 generations). From the viewpoint of diversity, T1 is inferior even to T0, in which genes are generated randomly. The results indicate that the T2 type of the proposed dominant conditioned uniform crossover is significantly improved in terms of diversity, although it requires some time for startup.
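The diversity measure plotted in Fig.3 can be illustrated with a short sketch. It assumes that fitness is normalized so that 100% corresponds to 1.0 and that two chromosomes share a pattern when they contain the same set of features; the function name is hypothetical. The input is the sequence of all chromosomes in order of emergence, so with a population size of 50 the entry at index 49999 corresponds to the end of generation 1000.

```python
from typing import Iterable, List, Tuple

Chromosome = Tuple[str, ...]


def cumulative_max_fitness_patterns(
    stream: Iterable[Tuple[Chromosome, float]],
    max_fitness: float = 1.0,
) -> List[int]:
    """For each consecutive chromosome, count the distinct max-fitness patterns seen so far."""
    seen = set()
    counts: List[int] = []
    for chromosome, fitness in stream:
        if fitness >= max_fitness:
            seen.add(frozenset(chromosome))  # gene order is ignored
        counts.append(len(seen))
    return counts
```

A curve that keeps rising indicates that new feature combinations with maximum fitness are still being found, while a flat curve indicates that the search is revisiting known patterns.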

Fig.3 Diversity (comparison between crossover modes)

Reproducibility refers to the ability to provide images matching the user’s mind. In this method, we focus on the analogy between the relationship of the GA phenotype and genotype, which are expressed but not correlated one-to-one owing to a shortage of information, and the implicit relationship between the image and the user’s mind. We define the relationship between the genotype and phenotype as an ambiguous inclusion relation between the feature selection results shown by the chromosomes and image features. The reproducibility of this user-controlled method is maintained by allocating the image fitness derived from the user’s mind to phenotype images. Fig.4 shows the changes in reproducibility (the ratio of retrieved images to the total images that match the user intention). The experimental conditions were the same as those of Fig.3. Although the potential for obtaining 100% reproducibility within a predetermined generation range is highly contingent on variables, this value improves as generations evolve regardless of type. In the early stages, T3 and T1 show higher reproducibility, while T2 shows lower reproducibility. However, as generations evolve, T2 and T0 attain high reproducibility, accumulating short step-ups, while T1 and T3 are likely to settle at local levels. This is also seen from the average multiplexing degrees of the maximum fitness patterns, where T1>T3>T2>T0. T1 and T3 show sharp increases in early generations because they produce many feature selection results that overlap; frequent re-matching operations are thus carried out on images that have been thinned out and removed and have thereby eluded user estimations. The reproducibility of T0 fluctuates significantly from trial to trial. This is because the randomness of T0 produces many feature selection results that do not match images, as is evident from its average fitness of around 47% throughout all generations. The results also imply that T0 is a mode that does not learn the relations between the features defined for the stored images and user intentions, even through interactive exchanges.

The above results on subjectivity reflection demonstrate that the proposed method improves upon the basic method. The T3 crossover mode obtains simplified image retrieval results quickly. The T2 mode is appropriate when the user wants diversified image retrieval results that reflect his/her mind and processing speed is not a requirement. With the proposed method, it is easy to change crossover modes from T3 to T2 partway through the evolution of generations, which should increase the degree of subjectivity reflection.

4.2.2 Retrieval efficiency

Improving retrieval efficiency ordinarily calls for improving the efficiency of the GA itself, as with pipeline processing[18], which eliminates the concept of generations. In this paper, however, we propose a method of raising efficiency in a semantic manner, that is, by accurately reflecting the user’s mind.

Our experiment demonstrates the usefulness of historical information. Fig.5 shows the cumulative frequency of fitness maximum feature patterns for different population sizes, comparing the results of use and non-use of historical information. The T2 crossover mode was adopted, and the horizontal axis represents generations. The sub-population size (SUB-SIZE) in the case of using the historical information was set to be the same for both the initial keys and the historical information. Compared at the same population size (SIZE) of 100, the frequency of fitness maximum feature patterns is larger with historical information than without, by an amount corresponding to the startup savings. In the experiment, where an equal sub-population size of 50 was set for the initial keys, the use of historical information proved effective. Fig.6 shows the changes in reproducibility, while Fig.7 shows the changes in conformity rate in the early stages of generation. The conformity rate is the ratio of images matching the user’s mind to the total retrieved images; since the retrieved images are those that have already appeared in the retrieval process, they do not include images that have eluded user estimations. These figures demonstrate that the use of historical information helps improve reproducibility and conformity rates in the early stages of generation.
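The two ratios reported in Fig.6 and Fig.7 can be written down as a small sketch. Reproducibility is read here as the share of all intention-matching images that have been retrieved, and conformity rate as the share of retrieved images that match the intention; this reading, the use of image-identifier sets, and the function names are assumptions made for illustration.

```python
from typing import Set


def reproducibility(retrieved: Set[str], wanted: Set[str]) -> float:
    """Share of the images matching the user's intention that were retrieved."""
    if not wanted:
        return 0.0
    return len(retrieved & wanted) / len(wanted)


def conformity_rate(retrieved: Set[str], wanted: Set[str]) -> float:
    """Share of the retrieved images that match the user's intention."""
    if not retrieved:
        return 0.0
    return len(retrieved & wanted) / len(retrieved)
```

Under this reading the two measures correspond to what information retrieval usually calls recall and precision, so a mode can score well on one while lagging on the other.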

Fig.4 Reproducibility (comparison between crossover modes)

These studies demonstrate that the use of historical information, through the introduction of sub-populations, improves retrieval efficiency in the early search stages without losing the diversity associated with the degree of the user’s subjectivity reflection. These results also imply that, when the user relies on historical information, it is unnecessary to search all stored images. The proposed method can be a handy technique for quickly finding likely images.

5 Conclusions

This paper positions our GA-based image retrieval as a method of controlling image retrieval by each user’s mind. We have also proposed techniques for improving the degree of subjectivity reflection and retrieval efficiency. To enhance subjectivity reflection, we increased the level of interactivity and proposed a dominant conditioned uniform crossover, based on a newly defined feature reference amount, that provides a variety of feature selection results matching the user’s mind, as well as a technique for enhancing situation-dependence that can interactively reflect changes in the user’s mind in the system at any time. The effectiveness of these approaches has been confirmed by computer simulations. To improve retrieval efficiency, we also proposed a technique that draws on historical information based on past data, and confirmed the effectiveness of this approach in the early search stages. In addition, we proposed a matching image control method by which retrieval efficiency can be improved by acquiring matching images and controlling image presentation based on the feature reference amount that reflects the user’s mind.

The proposed method is quite useful for finding images that match user intentions without precisely expressing the user’s mind. Since our image retrieval is controlled only by the user’s mind, the information obtained through interactions between the user and the system is central to this method. As a result, the method is greatly dependent on user estimations, and the complexity of user estimations is a current problem. Future challenges include improving the function for inferring the user’s mind in order to reduce the complexity of user estimations, adapting the method to large-scale image databases, and further evaluating its results and implications.

Fig.5 Diversity (comparison between use and non-use of historical information and population sizes)

Fig.6 Reproducibility (comparison between use and non-use of historical information and population sizes)

Fig.7 Conformity rate (comparison between use and non-use of historical information and population sizes)

Acknowledgments

We are grateful to Mr. Katsuhiro Fujimoto and Mr. Yutaka Ohkoshi at M-KEN Co., Ltd., Dr. Shun-ichi Iisaku, former Intelligent Communications Division Director, and staff members at the former Visual Communication Section for their assistance and for numerous helpful discussions.


References

1 T. KURITA, T. KATO, I. FUKUDA, AND A. SAKAKURA, "SENCE RETRIEVAL ON A IMAGE DATABASE OF FULL COLOR PAINTINGS", TRANS. OF IPSJ, D-II, Vol.33, No.11, pp.1373-1383, Nov., 1992 (in Japanese).

2 H. NAGASHIMA AND Y. HIJIKATA, "ON RETRIEVAL OF SIMILAR TRADEMARKS CONSIDERING SUBJECTIVITY", TRANS. OF IEICE, D-II, Vol.J74-D-II, No.3, pp.311-320, Mar., 1991 (in Japanese).

3 K. TANABE, J. OHYA, AND K. ISHII, "SIMILARITY RETRIEVAL METHOD USING MULTIDIMENSIONAL PSYCHOLOGICAL SPACE", TRANS. OF IEICE, D-II, Vol.J75-D-II, No.11, pp.1856-1865, Nov., 1992 (in Japanese).

4 T. YAGYU, Y. HISAMORI, Y. YAGI, AND M. YACHIDA, "COLOR COORDINATE SUPPORTING SYSTEM WITH NAVIGATING STATE OF USER'S MIND", TRANS. OF IEICE, A, Vol.J79-A, No.2, pp.261-270, Feb., 1996 (in Japanese).

5 M. ODA, S. AKAMATSU, AND H. FUKAMACHI, "ADAPTABILITY OF K-L EXPANSION TECHNIQUE FOR AMBIGUOUS FACE IMAGE RETRIEVAL", TRANS. OF IEICE, A, Vol.J79-A, No.2, pp.288-297, Feb., 1996 (in Japanese).

6 H. TAKAGI AND K. AOKI, "INTERACTIVE EVOLUTIONARY COMPUTATION: FROM CREATIVITY SUPPORT TO ENGINEERING APPLICATIONS", ET AL, WORKSHOPS:INTERACTIVE EVOLUTIONARY COMPUTATION, Mar., 1998 (in Japanese).

7 G. Venturini, M. Slimane, F. Morin, and J. -P. Asselin de Beauville, "On using interactive Genetic Algorithms for knowledge discovery in databases", Proc. of 7th Int. Conf. on Genetic Algorithms, pp.696-703, 1997.

8 C. Caldwell, and V. S. Johnston, "Tracking a criminal suspect through “Face-Space” with a Genetic Algorithm", Proc. of 4th Int. Conf. on Genetic Algorithms, pp.416-421, 1991.

9 H. Shiraki and H. Saito, "An interactive image retrieval system using Genetic Algorithms", Int. Conf. on Virtual Systems and Multimedia '96 in Gifu, 1996.

10 M. NAGAO, M. YAMAMOTO, K. SUZUKI, AND A. OHUCHI, "EVALUATION OF THE IMAGE RETRIEVAL SYSTEM USING INTERACTIVE GENETIC ALGORITHM", JOURNAL OF JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, Vol.13, No.5, pp.720-727, Sep., 1998 (in Japanese).

11 S. KATO, "INFORMATION SPACE FOR IMAGE RETRIEVAL", SPRING CONF. OF IEICE, D-499, pp.7-235, Mar., 1994 (in Japanese).

12 S. KATO, "FEATURE SELECTION IN FLEXIBLE IMAGE RETRIEVAL", TRANS. OF IEICE, D-II, Vol.J80-D-II, No.2, pp.598-606, Feb., 1997 (in Japanese).

13 S. KATO AND S. IISAKU, "AN IMAGE RETRIEVAL TECHNIQUE USING GENETIC ALGORITHMS AS A METHOD OF REFLECTING USER SUBJECTIVITY", PROC. OF THE 2ND SYMPOSIUM INTELLIGENT INFORMATION MEDIA, pp.91-98, Dec., 1996 (in Japanese).

14 S. KATO AND S. IISAKU, "A PROPOSAL OF HOW TO MAKE GOOD USE OF THE PERSONAL HISTORY INFORMATION REFLECTING USER SUBJECTIVITY IN AN IMAGE RETRIEVAL TECHNIQUE USING GAS", TECHNICAL REPORT OF IEICE, AI97-10, pp.71-78, May, 1997 (in Japanese).

15 S. Kato and S. Iisaku, "An image retrieval method based on a genetic algorithm", Proc. The 12th Int. Conf. on Information Networking, pp.333-336, 1998.

16 A. KITAMOTO AND M. TAKAGI, "HIERACHICAL MODEL AS A FRAMEWORK FOR CONSTRUCTING SIMILARITY-BASED IMAGE RETRIEVAL SYSTEMS", TECHNICAL REPORT OF IEICE, IE97-27(PRMU97-58, MVE97-43), pp.25-32, Jul., 1997 (in Japanese).

17 K. Nishino, M. Murakami, E. Mizutani, and N. Honda, "Efficient fuzzy fitness assignment strategies in an interactive Genetic Algorithm for cartoon face search", Proc. of 6th Int. Fuzzy Systems Association World Congress, pp.173-176, 1995.

18 A. KITAMOTO AND M. TAKAGI, "INTERACTIVE IMAGE BROWSING USING QUEUE-BASED GENETIC ALGORITHM", JOURNAL OF JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, Vol.13, No.5, pp.728-738, Sep., 1998 (in Japanese).

Syuko KATO
Researcher, Next Generation Internet Group, Information and Network Systems Division
Next Generation Media