Combining Text and Image Queries at ImageCLEF 2005: A Corpus-Based Relevance-Feedback Approach
Yih-Cheng Chang
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan
ImageCLEF 2005
Hsin-Hsi Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan
Wen-Cheng Lin
Department of Medical Informatics
Tzu Chi University
Hualien, Taiwan
Why Combine Text and Image Queries in Cross-Language Image Retrieval?
Text-based image retrieval
- Translation errors in cross-language image retrieval
- Annotation errors in automatic annotation
- Easy to capture semantic meanings
- Easy to construct a textual query
Content-based image retrieval (CBIR)
- Semantic meanings are hard to represent
- Users have to find or draw example images
- Avoids translation in cross-language image retrieval
- Annotation is not necessary
How to Combine Text and Image Features in Cross-Language Image Retrieval?
Parallel approach: conduct text-based and content-based retrieval separately and merge the retrieval results (a score-merging sketch follows below).
Pipeline approach: use either textual or visual information to perform an initial retrieval, and then employ the other feature to filter out irrelevant images.
Transformation-based approach: mine the relations between images and text, and employ the mined relations to transform textual information into visual information, and vice versa.
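As an illustration of the parallel approach, here is a minimal sketch that merges separately obtained text-based and content-based result lists by min-max normalizing their scores and ranking by a weighted sum; the weighting scheme is an assumption, since the slides only name a normalized merge.

```python
# Sketch of the parallel approach: min-max normalize the scores of the two
# result lists and rank by a weighted sum (weighting is an assumption).
def normalize(results):
    """results: dict of image_id -> raw score; map scores into [0, 1]."""
    lo, hi = min(results.values()), max(results.values())
    span = (hi - lo) or 1.0
    return {img: (score - lo) / span for img, score in results.items()}

def merge(text_results, visual_results, alpha=0.5):
    """Rank images by a weighted sum of normalized text and visual scores."""
    t, v = normalize(text_results), normalize(visual_results)
    images = set(t) | set(v)
    return sorted(images,
                  key=lambda i: alpha * t.get(i, 0.0) + (1 - alpha) * v.get(i, 0.0),
                  reverse=True)
```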
Approach at ImageCLEF 2004
Automatically transform textual queries into visual representations:
- Mine the relationships between text and images
  - Divide an image into several smaller parts
  - Link the words in the caption to the corresponding parts
  - Analogous to word alignment in a sentence-aligned parallel corpus
  - Build a transmedia dictionary
- Transform a textual query into a visual one using the transmedia dictionary (see the sketch below)
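One way the transformation step could look in code, assuming the transmedia dictionary maps each word to the visual feature vectors of the image blocks it was aligned with; the data layout and function name are illustrative, not the exact structure used in the system.

```python
# Sketch of query transformation via a transmedia dictionary
# (word -> list of visual block feature vectors).
import numpy as np

def transform_query(query_words, transmedia_dict):
    """Average the visual vectors of all blocks aligned with the query words."""
    vectors = [vec for w in query_words for vec in transmedia_dict.get(w, [])]
    if not vectors:
        return None  # no visual evidence for any query word
    return np.mean(np.stack(vectors), axis=0)
```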
System at ImageCLEF 2004
[System diagram: the source-language textual query is translated, using language resources, into a target-language textual query for text-based image retrieval over the textual index; text-image correlation learning on a training collection of images and image captions builds the transmedia dictionary, which query transformation uses to turn the textual query into a visual query for content-based image retrieval over the visual index of the target collection; result merging of the two runs produces the retrieved images.]
Learning Correlation
[Example: the image captioned "Mare and foal in field, slopes of Clatto Hill, Fife" is segmented into blocks (B01, B02, B03, B04), and the caption words hill, mare, foal, field, and slope are linked to the corresponding blocks. A co-occurrence-counting sketch follows below.]
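A rough illustration of how such word-block correlations could be collected, assuming each training image provides its caption words and its segmented blocks, and that blocks have been mapped to visual "terms" (e.g., cluster labels); the actual alignment procedure behind the transmedia dictionary is not detailed on this slide.

```python
# Sketch of collecting word-block co-occurrence counts as raw material
# for a transmedia dictionary (data layout is an assumption).
from collections import defaultdict

def learn_correlation(training_pairs, cluster_of_block):
    """training_pairs: iterable of (caption_words, block_ids) per image.
    cluster_of_block: maps a block id to its visual term (cluster label).
    Returns co-occurrence counts word -> visual term -> count."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, block_ids in training_pairs:
        visual_terms = {cluster_of_block[b] for b in block_ids}
        for word in words:
            for term in visual_terms:
                counts[word][term] += 1
    return counts
```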
Text-Based Image Retrieval at ImageCLEF 2004

Run       Query Translation             Backward Transliteration   Mean Average Precision
WCO       WCO                           No                         0.2920
WCO+NT    WCO                           Yes                        0.3276
F2hf      First-two-highest-frequency   No                         0.4015
F2hf+NT   First-two-highest-frequency   Yes                        0.4395
Mono      -                             -                          0.6304

Using similarity-based backward transliteration improves performance; the best cross-lingual run (F2hf+NT) reaches 69.71% of the monolingual MAP.
Cross-Language Experiments at ImageCLEF 2004

Query Type                                                      Mean Average Precision
Textual Query (F2hf+NT)                                         0.4395
Generated Visual Query (18 topics)                              0.0110 (poor)
Textual Query + Generated Visual Query (N+V+A, n=30, t=0.02)    0.4441 (+0.46%: insignificant increase)
Analyses of These Approaches
Parallel and pipeline approaches
- Simple and useful
- Do not exploit the relations between visual and textual features
Transformation-based approach
- Textual and visual queries can be translated into each other using the relations between visual and textual features
- Hard to learn all relations between all visual and textual features
- The degree of ambiguity of the relations is usually high
Our Approach at ImageCLEF 2005: A Corpus-Based Relevance-Feedback Method
A corpus-based relevance-feedback approach:
- Initiate a content-based retrieval
- Treat the retrieved images and their text descriptions as aligned documents
- Adopt a corpus-based method to select key terms from the text descriptions and generate a new query (a sketch of the term-selection step follows below)
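A minimal sketch of the key-term selection step, assuming the English descriptions of the top CBIR results are available as token lists and that terms are scored TF-IDF-style against the whole collection; the slides do not fix the exact selection criterion, so the scoring and names below are illustrative.

```python
# Sketch of corpus-based feedback: pick terms that are frequent in the
# descriptions of the top CBIR results but rare in the whole collection.
import math
from collections import Counter

def select_key_terms(feedback_docs, collection_df, n_docs, top_n=10,
                     stopwords=frozenset()):
    """feedback_docs: list of token lists from the top retrieved descriptions.
    collection_df: term -> document frequency over the whole collection."""
    tf = Counter(t for doc in feedback_docs for t in doc if t not in stopwords)
    scores = {t: freq * math.log(n_docs / (1 + collection_df.get(t, 0)))
              for t, freq in tf.items()}
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]
```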
Fundamental Concepts of a Corpus-Based Relevance-Feedback Approach
[Diagram: the example image for the topic "Aircraft on the ground" is submitted to the CBIR system; the English descriptions of the retrieved images (e.g., "3511 Aux Squadron off to Germany. Air-force personnel boarding aeroplane at military air base; technician, cyclist and aircraft hangars in background. July 1952. George Middlemass Cowie. Fife, Scotland. GMC-4-29-16 pc/mb" and ""Grouville Bay" Jersey Airways. Twin-engined passenger aeroplane on grassy airfield with aircraft hangars in background. Registered 2 July 1935. J Valentine & Co. Jersey, Channel Isles. JV-G2933 pc/mb") supply key terms such as aeroplane, military air base, and airfield, which form the textual query for the text-based retrieval system and produce the textual retrieval result.]
[System diagram for ImageCLEF 2005: the Chinese query is translated into an English query and run against the image database (images with English descriptions) to produce the textual run; the example image is submitted to the CBIR system (VIPER) to produce the initial visual run; by pseudo relevance feedback, the English descriptions of the top visually retrieved images form a new textual query whose retrieval result is the feedback run; a normalized merge of the runs yields the final retrieval result, and the images are shown to the user.]
Bilingual Ad hoc Retrieval Task
- 28,133 photographs from St. Andrews University Library's photographic collection
- The collection is in English and the queries are in different languages; in our experiments, the queries are in Chinese
- All images are accompanied by a textual description written in English by librarians working at St. Andrews Library
- The test set contains 28 topics; each topic has a text description and an example image
An Example – An Image and Its Description
An Example – A Topic in Chinese
[Topic example showing a Chinese title and the corresponding English title]
Some Models in Formal Runs
Experiment Results at ImageCLEF 2005
[Chart: average precision (%) per query over the 28 topics for the CE, EE, EX, CE+EX, and EE+EX runs, with relative improvements of +25.96%, +15.78%, and +11.01% marked.]
Performance: EE+EX > CE+EX > EE > EX > CE > visual run
Lessons Learned
- Compared with the initial visual retrieval, average precision increases from 8.29% to 34.25% after the feedback cycle.
- Combining textual and visual information can improve performance.
Example: Aircraft on the Ground ( )
Text only (monolingual)
Text only (cross-lingual)
The top 2 images in the cross-lingual run are non-relevant because of query translation problems: clear ( ), above ( ), floor ( )
Example: Aircraft on the Ground (after integration)
Text (monolingual) + Visual
The Text+Visual run is better than the monolingual run because it expands the query with some useful words, e.g., aeroplane, military air base, airfield.
ImageCLEF 2004 vs. ImageCLEF 2005
- Text-based IR (monolingual case): 0.6304 (2004) vs. 0.3952 (2005); this year's topics are a little harder
- Text+Image IR (monolingual case): 0.6591 (2004) vs. 0.5053 (2005)
- Text+Image IR (cross-lingual case): 0.4441 (2004) vs. 0.3977 (2005), i.e., 70.45% vs. 100.63% of the corresponding monolingual text-based IR
Automatic Annotation Task
- The automatic annotation task in ImageCLEF 2005 can be seen as a classification task, since each image can only be annotated with one word (i.e., a category).
- We propose several methods to measure the similarity between a test image and a category; a test image is classified into the most similar category.
- The proposed methods use the same image features but different classification approaches.
Image Feature Extraction
- Resize images to 256 x 256 pixels.
- Segment each image into 32 x 32 blocks (each block is 8 x 8 pixels).
- Compute the average gray value of each block to construct a vector with 1,024 elements.
- The similarity between two images is measured by the cosine formula (see the sketch below).
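A minimal sketch of this feature extraction and similarity computation, assuming Pillow and NumPy; the function names are ours.

```python
# Sketch of the block-averaging feature and cosine similarity.
import numpy as np
from PIL import Image

def extract_feature(path):
    """Resize to 256x256 gray, average each 8x8 block -> 1,024-element vector."""
    img = Image.open(path).convert("L").resize((256, 256))
    pixels = np.asarray(img, dtype=np.float64)                 # shape (256, 256)
    blocks = pixels.reshape(32, 8, 32, 8).mean(axis=(1, 3))    # 32x32 block means
    return blocks.reshape(-1)                                   # 1,024 elements

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
```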
Some Models and Experimental Results
NTU-annotate05-1NN
- Baseline model. It uses the 1-NN method to classify each image.
NTU-annotate05-Top2
- Compute the similarity between a test image and a category using the top 2 nearest images in each category, and classify the test image into the most similar category.
NTU-annotate05-SC
- The training data is clustered using the k-means algorithm (k = 1000). We compute the centroid of each category in each cluster, and classify a test image into the category of the nearest centroid.
A sketch of the 1-NN and top-2 classifiers follows below.
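A minimal sketch of the 1-NN and top-2 classifiers over the block-average features above; how the two highest similarities are aggregated in the top-2 method is an assumption (sum), since the slides do not specify sum vs. average.

```python
# Sketch of the 1-NN and top-2 classifiers using cosine similarity.
from collections import defaultdict
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def classify_1nn(test_vec, train_vecs, train_labels):
    """Assign the category of the single most similar training image."""
    sims = [cosine(test_vec, v) for v in train_vecs]
    return train_labels[int(np.argmax(sims))]

def classify_top2(test_vec, train_vecs, train_labels):
    """Score each category by its two most similar training images."""
    per_category = defaultdict(list)
    for vec, label in zip(train_vecs, train_labels):
        per_category[label].append(cosine(test_vec, vec))
    scores = {c: sum(sorted(s, reverse=True)[:2]) for c, s in per_category.items()}
    return max(scores, key=scores.get)
```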
Conclusion: Bilingual Ad hoc Retrieval Task
- An approach combining textual and image features, a corpus-based feedback cycle from CBIR, is proposed for Chinese-English image retrieval.
- Compared with the performance of monolingual IR (0.3952), integrating visual and textual queries achieves better performance in cross-language image retrieval (0.3977), resolving part of the translation errors.
- The integration of visual and textual queries also improves the performance of the monolingual IR from 0.3952 to 0.5053 by providing more information.
- The improvement is the best among all the groups; our result reaches 78.2% of the best monolingual text retrieval.
Conclusion: Automatic Annotation Task
- A feature extraction algorithm is proposed, and several classification approaches are explored using the same image features.
- The 1-NN and top-2 approaches, both with an error rate of 21.7%, outperform the centroid-based approach (error rate 22.5%).
- Our method is about 9 percentage points worse than the best-performing group (error rate 12.6%), but better than most of the groups in this task.
Thank You and Comments