A Novel Image Auto-annotation Based on
Blobs Annotation
Mahdia Bakalem1, Nadjia Benblidia2, and Sami Ait-Aoudia3
1 Laboratory Research for the Development of Computing Systems, Saad Dahlab University, Blida, Algeria; Laboratory Research On the Image Processing, High Computing School - ESI, Oued Smar, Algeria. [email protected]
2 Laboratory Research for the Development of Computing Systems, Saad Dahlab University, Blida, Algeria. [email protected]
3 Laboratory Research On the Image Processing, High Computing School - ESI, Oued Smar, Algeria. [email protected]
Summary. At present, vast amounts of digital media are available on the Web. In Web image retrieval, capturing the semantics of an image is a major problem: search engines generally index the text associated with the image in Web pages, but this text does not really correspond to the image content.
Image annotation is an effective technology for improving Web image retrieval. Indeed, it assigns semantics to an image by attributing to images keywords corresponding to the meanings conveyed by them. To improve automatic image annotation (AIA), one strategy consists in correlating the textual and visual information of the images. In this work, we propose an image auto-annotation system based on the AnnotB-LSA algorithm, which integrates the LSA model.
The main focus of this paper is two-fold. First, in the training stage, we cluster regions into classes of visually similar regions called blobs, according to their visual features. This clustering builds a visual space by learning from the annotated image corpus and makes it possible to annotate the blobs by running the AnnotB-LSA algorithm. Second, in the new-image annotation stage, we annotate a new image by selecting the keywords of the blobs to which its regions belong. Experimental results show that our proposed system performs well.
1 Introduction
On the Web, a software robot collects information by scrutinizing web pages. This collected information is used by an information retrieval system (IRS) when users start the search process. Web information retrieval relies on search engines based on automatic information collection and on indexation.
The searched information can reside in various kinds of documents: textual, multimedia, image, etc. Our work focuses on image documents. An image presents two
R.S. Choraś (Ed.): Image Processing & Communications Challenges 3, AISC 102, pp. 113– 122.springerlink.com © Springer-Verlag Berlin Heidelberg 2011
levels: syntactic and semantic. The syntactic or visual level represents the image's visual features such as color, texture, and shape. The semantic level gives a sense to the image through keywords.
Information retrieval systems comprise two phases: indexation and search.
We will focus on the indexation. With the development of domains such as medicine, satellite imagery and mapping (cartography), the number of digital images on the Web keeps increasing. Thus the need for a new domain, Web image retrieval, arises.
Image indexation is a process that consists in extracting information from the images and representing it in a structure called an index. There are two types of indexation: indexation based on the image visual contents and indexation based on the textual contents.
The first type extracts information about the image based upon visual features and represents it by a descriptor vector. However, this indexation does not capture the semantics of an image. Textual indexation can be made from the text associated with the image; however, this text rarely corresponds to the real sense of the image.
Because of the semantic gap between image visual features and human concepts, most users prefer textual queries based on semantics. To allow this, a current strategy consists in extracting and merging the textual and visual image information.
One of the promising strategies is image visual contents indexation, also called image annotation. It assigns semantics to an image by attributing keywords corresponding to the meaning conveyed by the image. The automatic annotation of an image is also known as auto-annotation. Unfortunately, image auto-annotation raises some problems:
• The problem of visual and textual synonymy and polysemy.
• Two images can be semantically similar even though the words that annotate them or their visual features are not identical.
• The annotation can be based on the entire image or on parts of the image called regions. However, image segmentation does not provide an exact correspondence between a concept and a region. Thus segmentation plays a very important role, and a good segmentation leads to a good annotation.
• The choice of the visual features (color, texture, shape, . . . ) is not easy. The question is: which visual features are relevant? This choice is a very important parameter in image annotation.
Therefore, the main question is: how can we automatically annotate images on the Web by correlating the visual and semantic aspects?
In this work, we try to improve image auto-annotation by proposing the AnnotB-LSA algorithm, which annotates blobs of visually similar image segments (regions) in order to annotate new images; the algorithm integrates the LSA model in order to extract the latent contextual interrelationships between the keywords that annotate the blobs. This
article is organized as follows: a state of the art is presented in Sec. 2. Sec. 3 focuses on our auto-annotation system. Sec. 4 presents the experiments done using the Corel corpus. A conclusion of this work and some recommendations are given in Sec. 5.
2 State of the Art
Different domains and technologies are integrated into the image auto-annotation process in order to improve its performance.
Y. Zhao et al. [5] propose a novel annotation scheme based on a neural network (NN) for characterizing the hidden association between the two modalities, visual and textual. This scheme integrates latent semantic analysis (noted LSA-NN) for discovering the latent contextual interrelationships between the keywords. The LSA-NN based annotation scheme is built at image level to avoid prior image segmentation.
W. Jin et al. [10] present a new semi-naive Bayesian approach incorporating categorization by pair-wise constraints for clustering. This approach performs auto image annotation in two stages, a learning stage and an annotation stage. The first stage clusters the regions into region clusters by incorporating pair-wise constraints derived from the language model underlying the annotations assigned to the training images. The second stage uses a semi-naive Bayes model to compute the posterior probability of concepts given the region clusters.
X-J. Wang et al. [8] present a novel way to annotate images using search
and data mining technologies, AnnoSearch. Leveraging Web-scale image collections, the authors solve this problem in two steps: first, searching for semantically and visually similar images on the Web; second, mining annotations from them. To improve the performance of image annotation systems, J. Lu et al. [9] use a real-coded chromosome genetic algorithm with k-nearest neighbor (k-NN) classification accuracy as fitness function to optimize the weights of MPEG-7 image feature descriptors. A binary-coded one, whose fitness function combines k-NN classification accuracy with the size of the feature descriptor subset, is used to select an optimal MPEG-7 feature descriptor subset. Furthermore, a bi-coded chromosome genetic algorithm, with the same fitness function as the binary one, is used for simultaneous weight optimization and descriptor subset selection.
W. Liu and X. Tang [15] formulate the Nonlinear Latent Space model to reveal the latent variables of words and visual features more precisely. Instead of the basic propagation strategy, the authors present a novel inference strategy for image annotation via Image-Word Embedding (IWE). IWE simultaneously embeds images and words and captures the dependencies between them from a probabilistic viewpoint.
F. Monay and D. Gatica-Perez use auto-annotation by LSA propagation [1] and by PLSA inference [2] in a linear latent space. The annotation
is propagated from classified documents by PLSA inference, which is based on a probabilistic computation using the posterior distribution of vocabulary terms.
Tollari [3] proposes annotation by a probabilistic model inspired by the space-partitioning VA-Files technique [4]. After partitioning the visual space into visual clusters with the VA-Files technique, the joint description table between the visual clusters and the words is constructed in order to estimate the distribution of the words for a blob (region) of a new image. The developed system, 'DIMATEX', achieves substantial annotation scores. L. Khan [6] proposes an ontology and semantic web approach to represent the semantic aspect of an image. The low-level concepts of the ontology are linked to images by using SVM (Support Vector Machine) classifiers. The high-level concepts are represented by Bayesian networks. The image annotation task is decomposed into a low-level atomic classification and a high-level classification of the concepts in a specific ontology domain. The SVM is used in the first classification and the Bayesian network in the second.
Y. Xiao et al. [7] propose a concept-centered approach that combines region- and image-level analysis for automatic image annotation (AIA). At the region level, the authors group regions into separate concept groups and perform concept-centered region clustering separately. The key idea is to make use of the inter- and intra-concept region distribution to eliminate unreliable region clusters and identify the main region clusters for each concept. The correspondence between the image region clusters and the concepts is then derived. To further enhance the accuracy of the AIA task, the authors employ a multi-stage kNN classification using the global features at the image level. Finally, the authors fuse region- and image-level analysis to obtain the final annotations.
J. Jeon et al. [11] propose an automatic approach to annotating and retrieving images based on a training set of images. Regions in an image can be described using a small vocabulary of blobs. Blobs are generated from image features using clustering. Given a training set of images with annotations, the authors show that probabilistic models allow predicting the probability of generating a word given the blobs in an image. This may be used to automatically annotate and retrieve images given a word as a query. The authors show that relevance models allow deriving these probabilities in a natural way.
3 Description of Image Auto-Annotation System
In order to annotate a new image, the proposed image auto-annotation system learns from a corpus of annotated images. The annotated corpus images are segmented into regions. Visually similar regions can be grouped into classes called blobs.
Our system is based upon the AnnotB-LSA algorithm, which annotates blobs. Once the blobs are constructed and annotated, we can annotate a collected image (for example, a new image from the Web). To do this, the
image is segmented and its regions are assigned to the annotated blobs. Then the new image can be annotated by the keywords of the blobs to which its regions belong.
The auto-annotation process consists of two main stages: the training stage and the new-image processing stage.
3.1 Training Stage
In this stage, the system learns from a corpus of annotated images in order to prepare for annotating a new image.
The training stage comprises two phases: visual space preparation and visual-textual correlation.
Visual Space Preparation
This phase constructs the blobs. We first segment each image of the corpus into regions, then extract the visual features of each region, and finally cluster the regions into blobs according to their visual features. The visual space is represented by the set of blobs.
Visual - Textual Correlation
This phase annotates the blobs constructed in the first phase. We correlate the visual aspect (blobs of regions) and the textual aspect (corpus annotations) through a new algorithm called AnnotB-LSA. This algorithm annotates the blobs from the keywords of the corpus images. The latent semantic analysis (LSA) model [12] is used by the AnnotB-LSA algorithm in order to extract the latent semantic relationships in the space of textual keywords and to minimize the ambiguity (polysemy, synonymy) between the keywords that annotate the blobs.
Algorithm Preparation
LSA - Latent Space Process
In order to prepare the latent space using the LSA model, we first initialize a vector space from the term (keyword) vectors associated with the corpus images, by constructing the keyword × image co-occurrence matrix A of size m × n, where m indexes the keyword rows and n the image columns. Each cell A(i, j) = w_ij records the occurrence of word i in image j (the possible values are 0 or 1). We then apply the singular value decomposition (SVD) algorithm to the matrix A.
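As an illustration, the co-occurrence matrix can be built as follows. This is a minimal Python sketch; the toy corpus, image ids and keywords are invented for the example and are not taken from the Corel data:

```python
import numpy as np

# Hypothetical mapping from image ids to keyword annotations;
# the real system would read these from the annotated corpus.
corpus = {
    "img1": ["tree", "water"],
    "img2": ["tree", "sky", "clouds"],
    "img3": ["water", "fish"],
}

vocab = sorted({w for words in corpus.values() for w in words})
images = sorted(corpus)

# A[i, j] = 1 if keyword i annotates image j, else 0 (binary co-occurrence).
A = np.zeros((len(vocab), len(images)))
for j, img in enumerate(images):
    for w in corpus[img]:
        A[vocab.index(w), j] = 1.0
```

Each row of A then corresponds to a keyword and each column to an image, exactly as in the m × n matrix described above.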
SVD algorithm
The SVD algorithm is applied to the matrix A in two steps. First, we decompose the matrix A through its singular values into three matrices:

A = U Σ V^T = sum_{i=1}^{r} σ_i u_i v_i^T

• Σ: diagonal matrix of the singular values of A, of size r × r
• U: orthonormal matrix of the term (keyword) vectors u_i, of size m × r
• V^T: orthonormal matrix of the images, where V has size n × r

Second, we reduce the space by selecting the first k singular values, eliminating the singular values close to zero and their corresponding vectors in the U and V^T matrices: A_k = U_k Σ_k V_k^T with k ≤ r.
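The decomposition and rank-k truncation described above can be sketched with NumPy; the matrix values below are invented purely for illustration:

```python
import numpy as np

# A small, made-up term-image matrix (5 keywords x 3 images).
A = np.array([[1., 0., 1.],
              [0., 1., 0.],
              [1., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])

# Full decomposition: A = U diag(s) Vt, singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # keep only the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

A_k is the best rank-k approximation of A; the columns of U[:, :k] span the k-dimensional latent space into which keyword vectors are later projected.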
AnnotB-LSA Algorithm
Once the latent space is prepared, the annotation of the blobs is performedby the AnnotB-LSA algorithm. This algorithm consists of the following steps:
Algorithm AnnotB-LSA (blob annotation)
Input: set of blobs
Output: annotated blobs

for each blob b_i do
    Extract the keyword vectors of the blob's regions.
    Project the extracted keyword vectors into the latent space.
    Compute the similarity between the projected vectors.
    Compare the keyword vectors of the most similar regions.
    Annotate the blob.
end for
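A loose Python reading of this loop might look as follows. The helper names, the binary keyword vectors, and the choice to keep the keywords shared by the two most similar regions are our own illustrative assumptions about details the paper leaves unspecified:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity; the small epsilon guards against zero vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def annotate_blob(region_word_vectors, U_k):
    """Annotate one blob: project each region's binary keyword vector into
    the latent space (via U_k from the truncated SVD), find the two most
    similar projected regions, and keep the keywords those regions share."""
    projected = [v @ U_k for v in region_word_vectors]
    best, pair = -1.0, (0, 1)
    for i in range(len(projected)):
        for j in range(i + 1, len(projected)):
            s = cosine(projected[i], projected[j])
            if s > best:
                best, pair = s, (i, j)
    i, j = pair
    # Element-wise product of binary vectors = intersection of keyword sets.
    return region_word_vectors[i] * region_word_vectors[j]
```

Running this once per blob yields a binary keyword vector per blob, i.e. the annotated blobs that the new-image stage relies on.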
3.2 New Image Processing Stage
Once the training stage is achieved, the system has a learning base of annotated blobs and can annotate a new image. For example, this stage permits automatic textual indexing of any image collected from the Web.
The annotation process of a new image using the learning base of annotated blobs proceeds as follows:
1. Segmentation of the image into regions.
2. Extraction of the visual features of each region.
3. Assignment of the regions to the blobs defined in the training stage according to the Euclidean distance (computed between the visual features of a region and the visual features of the blob center).
4. Selection of the keywords of the image by selective inheritance from the blobs to which its regions belong.
5. Deletion of the syntactically repeated words.
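Steps 3 to 5 can be sketched as below; all names are illustrative, and blob keywords are assumed to be plain word lists:

```python
import numpy as np

def annotate_new_image(region_features, blob_centers, blob_keywords):
    """Assign each region of a new image to its nearest blob (Euclidean
    distance to the blob center), inherit that blob's keywords, and delete
    repeated words. Names and data shapes are illustrative assumptions."""
    keywords = []
    for f in region_features:
        dists = [np.linalg.norm(np.asarray(f) - np.asarray(c))
                 for c in blob_centers]
        keywords.extend(blob_keywords[int(np.argmin(dists))])
    # Step 5: drop duplicate words while preserving first-seen order.
    seen, result = set(), []
    for w in keywords:
        if w not in seen:
            seen.add(w)
            result.append(w)
    return result
```

The returned word list is the textual index assigned to the collected image.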
4 Experimentation and Results
We test the AnnotB-LSA algorithm on the Corel data set from Barnard et al. [13], which is extensively used as a comparative baseline in recent image annotation research. The experimental data set comprises 16,000 images; each image is manually annotated with 1 to 5 keywords, taken from an English vocabulary of 267 words, of which 157 are the most frequently used. As a test, we used sample 009 of the Corel image corpus, in which 5,239 images serve as the training set and 1,801 images as the testing set.
For the training process, the 5,239 images have been used to construct the annotated blobs. The images have been segmented in order to extract their visual features and to categorize them into blobs, which are annotated thereafter. For the test process, the 1,801 images have been examined. To annotate an image belonging to this set, it is necessary to segment the image into regions, to extract their visual features and finally to assign these regions to the already constructed blobs.
Each image is segmented by the Normalized-Cuts algorithm [14] into five regions. For each region, a visual feature vector is defined by texture parameters; in this setting we considered the variance, the local homogeneity, the correlation, the entropy and the contrast. The sets of visually similar regions are clustered into blobs by the K-Means algorithm.
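A minimal, self-contained K-Means sketch over such five-dimensional texture vectors is shown below; it uses deterministic initialization (the first k points) for brevity, where a library implementation with proper random restarts would normally be preferred:

```python
import numpy as np

def kmeans_blobs(features, k, iters=20):
    """Cluster region texture vectors (variance, local homogeneity,
    correlation, entropy, contrast) into k blobs. `features` is an
    (n_regions, 5) NumPy array; init and loop count are simplified."""
    centers = features[:k].astype(float).copy()
    for _ in range(iters):
        # Distance of every region to every blob center, then assignment.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):   # avoid emptying a cluster
                centers[c] = features[labels == c].mean(axis=0)
    return labels, centers
```

Each resulting cluster label identifies a blob, and each center is the blob center used later for the Euclidean-distance assignment of new-image regions.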
The constructed blobs are annotated by the AnnotB-LSA algorithm using the cosine similarity between the projected contextual keyword vectors of the two images to which two regions belong. This is given by Equation (1):
Sim(V1, V2) = cos(V1, V2) = [ sum_{i=1..n} (v1_i × v2_i) ] / [ sqrt(sum_{i=1..n} v1_i²) × sqrt(sum_{i=1..n} v2_i²) ]   (1)
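Equation (1) translates directly into code; a minimal sketch:

```python
import math

def sim(v1, v2):
    """Cosine similarity between two projected keyword vectors,
    term by term as in Equation (1)."""
    num = sum(a * b for a, b in zip(v1, v2))
    den = (math.sqrt(sum(a * a for a in v1)) *
           math.sqrt(sum(b * b for b in v2)))
    return num / den
```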
where V_i is a projected vector.
Fig. 1 presents examples of images with the manual annotation (Corel) and the annotation provided by our system. Note that annotation 1 represents the Corel annotation and annotation 2 the annotation produced by the AnnotB-LSA algorithm.
71007.jpeg. 1: arch, flowers, gardens. 2: flowers, people, railroad, water, flight, plants, reef, tree, vegetables, cactus, coast, display, leaves, pattern, polar, sky, snow, town, walls, fish, rapids.
46078.jpeg. 1: tree, water. 2: cactus, coast, display, flowers, leaves, pattern, people, polar, sky, snow, town, tree, walls, clouds, fish, plants, vegetables, cars, locomotive, pumpkin.
195008.jpeg. 1: hunter, people. 2: building, cactus, cliffs, clouds, flowers, leaves, mountains, ocean, rabbit, snow, stone, temple, water, flight, people, plants, reef, tree, vegetables, fence, fungus, pattern, polar, rocks, walls, flag.
37016.jpeg. 1: ground, jet, plane. 2: cactus, coast, display, flowers, leaves, pattern, people, polar, sky, snow, town, tree, walls, water, field, flag, food, petals, plants, reef, vegetables, pumpkin, plane, trunk.
53024.jpeg. 1: people. 2: fence, flowers, fungus, pattern, people, polar, rocks, tree, walls, water, cars, locomotive, pumpkin, sky, snow, flight, plants, reef, vegetables, railroad.
326017.jpeg. 1: cat, feline, ground. 2: flag, flowers, people, plants, polar, vegetables, walls, pumpkin, tree, water, close-up, designs, flight, mountains, snow, street, village.
22010.jpeg. 1: arch, bridge, stone. 2: currency, mountains, people, street, tree, cars, flowers, locomotive, pumpkin, sky, snow, water, flag, plants, polar, vegetables, walls, close-up, clouds, designs, fish, ocean, plane, temple, trunk.
276021.jpeg. 1: flowers, grass, mountains, snow. 2: field, flight, flowers, people, plain, plants, polar, trunk, village, walls, close-up, coral, mountains, pattern, sky, street, tracks, pumpkin, water, food, reef.
191028.jpeg. 1: clouds, sky, sun, tree. 2: cactus, coast, display, flowers, leaves, pattern, people, polar, sky, snow, town, tree, walls, food, reef, water, flag, plants, vegetables.
46008.jpeg. 1: beach, people, water. 2: flowers, people, polar, walls, close-up, designs, flight, mountains, plants, snow, street, tree, village, clouds, fish, ocean, plane, pumpkin, temple, trunk, water.
209050.jpeg. 1: close-up, fish, water. 2: flowers, people, polar, walls, railroad, water, close-up, clouds, designs, fish, mountains, ocean, plane, pumpkin, snow, street, temple, tree, trunk.
384054.jpeg. 1: beach, sky, tree. 2: flowers, people, polar, walls, flight, plants, reef, tree, vegetables, water.

Fig. 1. AnnotB-LSA algorithm annotation results
We have compared the results of our system with the corpus annotation. We notice that the annotation provided by the system is of average quality. These results concern only a small sample and are based upon the texture dimension alone. We are currently integrating color into the program, and we think that this additional parameter should improve the results.
The results obtained on this sample of images are encouraging. Therefore, we consider that the proposed image auto-annotation system is promising, especially if we refine the learning process and the image visual feature extraction process (by adding the color dimension).
5 Conclusion
In this paper, we have presented an image auto-annotation system to improve image retrieval. Our system learns from a corpus and is based on a new algorithm, AnnotB-LSA, which annotates the blobs through the correlation between blobs of visually similar image segments and the annotations of the image corpus. In order to extract the latent semantic relations in the textual keyword space and to minimize the ambiguity between the annotations of the corpus images, the algorithm integrates the latent semantic analysis (LSA) model. A new image can then be annotated by assigning its regions to the blobs annotated by the algorithm.
The experimentation was done on sample 009 of the Corel image corpus and only takes into account the texture of the image. The results are encouraging, but they could be improved by integrating other visual features such as color and shape, and by refining the training process.
References
1. Monay, F., Gatica-Perez, D.: On Image Auto-Annotation with Latent Space Models. In: Proc. ACM Int. Conf. on Multimedia (ACM MM), Berkeley, California, USA (2003)
2. Monay, F., Gatica-Perez, D.: PLSA-based Image Auto-Annotation: Constraining the Latent Space. In: Proc. ACM Int. Conf. on Multimedia (ACM MM), New York, USA (2004)
3. Tollari, S.: Indexing and Retrieval of Images by Fusion of Textual and Visual Information. Doctoral thesis, University of the South Toulon-Var (2006)
4. Weber, R., Schek, H.-J., Blott, S.: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. In: Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 194–205 (1998)
5. Zhao, Y., Zhao, Y., Zhu, Z., Flap, S.: A Novel Image Annotation Scheme Based on Neural Network. In: Eighth International Conference on Intelligent Systems Design and Applications. IEEE, Los Alamitos (2008), 978-0-7695-3382-7/08, doi:10.1109/ISDA.55
6. Khan, L.: Standards for Image Annotation using Semantic Web. Computer Standards & Interfaces 29, 169–204 (2007)
7. Xiao, Y., Chua, T.-S., Lee, C.-H.: Fusion of region and image-based techniquesfor automatic image annotation. In: Cham, T.-J., Cai, J., Dorai, C., Rajan,D., Chua, T.-S., Chia, L.-T. (eds.) MMM 2007. LNCS, vol. 4351, pp. 247–258.Springer, Heidelberg (2007)
8. Wang, X.-J., Zhang, L., Jing, F., Ma, W.-Y.: AnnoSearch: Image Auto-Annotation by Search. In: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 17–22 (2006)
9. Lu, J., Zhao, T., Zhang, Y.: Feature Selection Based on Genetic Algorithm for Image Annotation. Knowledge-Based Systems (2008), 0950-7051
10. Jin, W., Shi, R., Chua, T.-S.: A Semi-Naive Bayesian Method Incorporating Clustering with Pair-Wise Constraints for Auto Image Annotation. In: MM 2004, October 10–16. ACM, New York (2004)
11. Jeon, J., Lavrenko, V., Manmatha, R.: Automatic Image Annotation and Retrieval using Cross-Media Relevance Models. In: SIGIR 2003. ACM Press, Toronto (2003), 1-58113-646-3/03/0007
12. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
13. Barnard, K., Duygulu, P., de Freitas, N., Forsyth, D., Blei, D., Jordan, M.I.: Matching words and pictures. Journal of Machine Learning Research 3, 1107–1135 (2003)
14. Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
15. Liu, W., Tang, X.: Learning an Image-Word Embedding for Image Auto-Annotation on the Nonlinear Latent Space. In: MM 2005, November 6–11. ACM, Singapore (2005), 1-59593-044-2/05/0011