1SVCL
Content based Image Retrieval
(at SVCL)
Nikhil Rasiwasia, Nuno Vasconcelos
Statistical Visual Computing Laboratory
University of California, San Diego
ECE271A – Fall 2007
2SVCL
Why image retrieval?
• Helps you find the images you want.
Source: http://www.bspcn.com/2007/11/02/25-photographs-taken-at-the-exact-right-time/
3SVCL
But there is Google, right?
• Metadata-based retrieval systems
– text, click-rates, etc.
– Google Images
– Clearly not sufficient
• What if computers understood images?
– Content-based image retrieval (early 90’s)
– search based on the image content
[Figure: top 12 retrieval results for the query ‘Mountain’ from a metadata-based retrieval system]
4SVCL
Early understanding of images
• Query by Visual Example (QBVE)
– user provides a query image
– system extracts image features (texture, color, shape)
– returns nearest neighbors using a suitable similarity measure
[Figure: examples of texture similarity, color similarity, and shape similarity]
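The QBVE loop above (extract features, then return nearest neighbors) can be sketched in a few lines. This is only an illustration, not the system used at SVCL: the color-histogram feature, the L1 distance, and the names color_histogram and query_by_visual_example are all assumptions made for the example.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Feature vector: concatenated per-channel histograms of an RGB image
    (H x W x 3, values 0..255), normalized to sum to 1. Assumed feature."""
    channels = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
                for c in range(3)]
    hist = np.concatenate(channels).astype(float)
    return hist / hist.sum()

def query_by_visual_example(query, database, k=3):
    """Return indices of the k database images closest to the query,
    using the L1 distance between histograms (an assumed similarity measure)."""
    q = color_histogram(query)
    dists = [np.abs(q - color_histogram(img)).sum() for img in database]
    return np.argsort(dists)[:k]

# Toy database of solid-color images (hypothetical data).
def solid(channel, value=200):
    img = np.zeros((16, 16, 3), dtype=np.uint8)
    img[..., channel] = value
    return img

database = [solid(1), solid(0), solid(2)]   # green, red, blue
query = solid(0, value=210)                 # a slightly different red
ranking = query_by_visual_example(query, database)
print(ranking[0])  # index of the best match
```

Any other feature (texture, shape) or distance could be swapped in without changing the retrieval loop.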
5SVCL
This is a graduate class, so: details!
• Bag-of-features representation
– No spatial information
– Yet performs well
• Each feature is represented by DCT coefficients
– Other people use SIFT, Gabor filters, etc.
[Figure: image patch → dct( ) → one feature point “+”]
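A minimal sketch of how a bag of DCT vectors might be extracted from a grayscale image. The slide does not give the patch size or the sampling step, so the 8x8 block and the step of 4 pixels are assumptions, as are the function names; the DCT-II basis is built by hand to keep the example dependent on NumPy only.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalization
    return m

def bag_of_dct(image, block=8, step=4):
    """Slide a block x block window over a 2-D image and return one
    flattened 2-D DCT coefficient vector per window (order is discarded,
    so no spatial information is kept)."""
    d = dct_matrix(block)
    h, w = image.shape
    feats = []
    for r in range(0, h - block + 1, step):
        for c in range(0, w - block + 1, step):
            patch = image[r:r + block, c:c + block].astype(float)
            feats.append((d @ patch @ d.T).ravel())
    return np.array(feats)

# Hypothetical 16 x 16 constant image: only the DC coefficient is non-zero.
features = bag_of_dct(np.ones((16, 16)))
print(features.shape)
```

Each row of the result corresponds to one “+” in the bag; the rows can then be modeled by a mixture density.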
6SVCL
Image representation
[Figure: bag of DCT vectors, shown as a scatter of “+” points, fitted with a GMM]
7SVCL
Query by visual example
Query Image
Candidate Images
Probability under various models
Ranking
p1 > p2 > … > pn
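The ranking step can be sketched as follows: the query's feature vectors are scored under each candidate model and the candidates are sorted by that score. This is a simplified illustration under stated assumptions: diagonal-covariance GMMs whose parameters are already known, the average log-likelihood as the score, and the hypothetical names gmm_log_likelihood and rank_candidates.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Average log-likelihood of feature vectors x (N x D) under a
    diagonal-covariance GMM with K components
    (weights: K, means: K x D, variances: K x D)."""
    diff = x[:, None, :] - means[None, :, :]                         # N x K x D
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)      # K
    log_comp = log_norm - 0.5 * (diff ** 2 / variances).sum(axis=2)  # N x K
    log_weighted = log_comp + np.log(weights)
    # log-sum-exp over components for numerical stability
    m = log_weighted.max(axis=1, keepdims=True)
    log_px = m[:, 0] + np.log(np.exp(log_weighted - m).sum(axis=1))
    return log_px.mean()

def rank_candidates(query_features, candidate_models):
    """Return candidate indices sorted from most to least likely."""
    scores = [gmm_log_likelihood(query_features, *model)
              for model in candidate_models]
    return np.argsort(scores)[::-1]

# Two hypothetical single-component models: one far from, one near the query.
w = np.array([1.0])
far = (w, np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]]))
near = (w, np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]]))
order = rank_candidates(np.zeros((4, 2)), [far, near])
print(order)
```

In practice the mixture parameters would be estimated from each image's bag of features, for example with EM; that step is omitted here.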
8SVCL
Query by visual example (QBVE)
[Figure: query images and their top matches]
9SVCL
What can go wrong?
[Figure: query images and their top matches]
10SVCL
This can go wrong!
• Visual similarity does not always correlate with “semantic” similarity
[Figure: two failure cases. In one, both images have visually dissimilar sky. In the other, the semantic notion of a train disagrees with the visual notion of an arch.]
11SVCL
Intelligent researchers (like you)…
• Semantic Retrieval (SR)
– The user provides a text query (keywords)
– Find images that contain the associated semantic concept
– Emerged around the year 2000
– Model semantic classes, learn to annotate images
– Provides a higher level of abstraction and supports natural language queries
[Figure: text query example, query: “people, beach”]
12SVCL
Semantic Class Modeling
[Figure: bags of DCT vectors from the training images with w_i = mountain → GMM (efficient hierarchical estimation) → semantic class model $P_{X|W}(x \mid \text{mountain})$]
• “Formulating Semantic Image Annotation as a Supervised Learning Problem” [G. Carneiro, IEEE Trans. PAMI, 2007]
13SVCL
Semantic Retrieval
Query Image
Candidate Words: Mountain, Sky, Sexy, Girl, house, … and so on
Probability under various models
Ranking: p1 > p2 > … > pn
14SVCL
15SVCL
16SVCL
First Five Ranked Results
• Query: mountain
• Query: pool
• Query: tiger
17SVCL
First Five Ranked Results
• Query: horses
• Query: plants
• Query: blooms
18SVCL
First Five Ranked Results
• Query: clouds
• Query: field
• Query: flowers
19SVCL
First Five Ranked Results
• Query: jet
• Query: leaf
• Query: sea
20SVCL
But: Semantic Retrieval (SR)
• Problem of lexical ambiguity
– multiple meanings of the same word
• Anchor: TV anchor or ship's anchor?
• Bank: financial institution or river bank?
• Multiple semantic interpretations of an image
• Boating or Fishing or People?
• Limited by vocabulary size
– What if the system was not trained for ‘Fishing’?
– In other words, the query is outside the space of trained semantic concepts
[Figure: example image. Lake? Fishing? Boating? People? Fishing! What if it is not in the vocabulary?]
21SVCL
In Summary
• SR: higher level of abstraction
– Better generalization inside the space of trained semantic concepts
– But suffers from
• Lexical ambiguity
• Multiple semantic interpretations
• Limited vocabulary size
• QBVE is unrestricted by language
– Better generalization outside the space of trained semantic concepts
• a query image of ‘Fishing’ would retrieve visually similar images
– But weakly correlated with the human notion of similarity
[Figure: QBVE (image query) vs. SR (text query), with the earlier examples: “Both have visually dissimilar sky” and “Lake? Fishing? Boating? People? Fishing! What if not in the vocabulary?”]
The two systems are in many respects complementary!
22SVCL
Query by Semantic Example (QBSE)
• Suggests an alternative query-by-example paradigm.
– The user provides an image.
– The image is mapped to a vector of weights over all the semantic concepts in the vocabulary, using a semantic labeling system.
– This can be thought of as a projection onto an abstract space, called the semantic space.
– To retrieve images, this weight vector is matched against the database using a suitable similarity function.
[Figure: mapping to an abstract space of semantic concepts (the semantic space). The image becomes a vector of weights, or semantic multinomial:]
Lake   Water   People   Sky   …   Boat
.2     .3      .2       .1    …   …
23SVCL
Query by Semantic Example (QBSE)
• As an extension of SR
– The query is not specified as a set of a few words,
– but as a vector of weights over all the semantic concepts in the vocabulary.
– Can eliminate
• Problem of lexical ambiguity: Bank + ’more’
• Multiple semantic interpretations: Boating, People
• Outside the ‘semantic space’: Fishing
• As an enrichment of QBVE
– The query still follows the query-by-example paradigm.
– But the feature space is semantic.
• A mapping of the image to an abstract space.
– Similarity is measured at a higher level of abstraction.
[Figure: both query types as weight vectors in the semantic space, with axes such as Lake, Water, Boating]
Concept:                      Lake   Water   People   Boating   …   Boat
(SR) query: water, boating    0      .5      0        .5        …   0
(QBVE) query: image           .1     .2      .1       .3        …   .2
24SVCL
QBSE System
[Figure: QBSE system block diagram. Query image → any semantic labeling system (Concept 1, Concept 2, Concept 3, …, Concept L) → posterior probabilities (1, 2, 3, …, L) forming the weight vector → suitable similarity measure against the database (Weight Vector 1, Weight Vector 2, …, Weight Vector N) → ranked retrieval]
25SVCL
QBSE System
[Figure: QBSE system block diagram, repeated from the earlier slide]
26SVCL
Semantic Class Modeling
[Figure: bags of DCT vectors from the training images with w_i = mountain → Gaussian Mixture Model (efficient hierarchical estimation) → semantic class model $P_{X|W}(x \mid \text{mountain})$]
• “Formulating Semantic Image Annotation as a Supervised Learning Problem” [G. Carneiro, CVPR 2005]
27SVCL
QBSE System
[Figure: QBSE system block diagram, repeated from the earlier slide]
28SVCL
Semantic Multinomial
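• Posterior probabilities under a series of L independent class models
[Figure: the image features x are evaluated under each class model, $P_{X|W}(x \mid \text{sky})$, $P_{X|W}(x \mid \text{mountain})$, …, $P_{X|W}(x \mid \text{clouds})$, giving the weights $\pi_1, \pi_2, \pi_3, \ldots, \pi_L$ with $\sum_{i=1}^{L} \pi_i = 1$]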
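A minimal sketch of how a semantic multinomial could be computed from per-concept likelihoods with Bayes' rule. The uniform prior over concepts and the function name semantic_multinomial are assumptions made for the example, and the log-likelihood values are made up.

```python
import numpy as np

def semantic_multinomial(log_likelihoods, priors=None):
    """Turn per-concept log-likelihoods log P(x | w_i) into posterior
    concept probabilities pi_i that sum to 1 (Bayes' rule).
    A uniform prior over concepts is assumed when none is given."""
    ll = np.asarray(log_likelihoods, dtype=float)
    if priors is None:
        priors = np.full(ll.shape, 1.0 / len(ll))
    log_post = ll + np.log(priors)
    log_post -= log_post.max()   # avoid underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Hypothetical log-likelihoods of one image under three concept models
# (say sky, mountain, clouds).
smn = semantic_multinomial(np.log([0.2, 0.3, 0.5]))
print(smn, smn.sum())
```

Working in the log domain matters in practice, because the likelihood of a whole bag of feature vectors is usually far too small to represent directly.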
29SVCL
Semantic Multinomial
30SVCL
QBSE System
[Figure: QBSE system block diagram, repeated from the earlier slide]
31SVCL
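Query using QBSE
• Note that SMNs are probability distributions
• A natural similarity function is the Kullback-Leibler divergence
$$f(\pi) = \arg\min_i KL(\pi \,\|\, \pi_i) = \arg\min_i \sum_{j=1}^{L} \pi_j \log \frac{\pi_j}{\pi_{ij}}$$
where $\pi$ is the query SMN and $\pi_i$ is the SMN of database image $i$.
[Figure: the query image is mapped to its query SMN, which is compared with the SMNs of the database images]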
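A small sketch of retrieval with the Kullback-Leibler divergence: database SMNs are sorted by increasing divergence from the query SMN, so the closest match comes first. The function names, the clipping constant eps (used only to avoid log(0)), and the example vectors are assumptions.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_j p_j * log(p_j / q_j) for two discrete distributions."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    return float(np.sum(p * np.log(p / q)))

def rank_by_kl(query_smn, database_smns):
    """Indices of database SMNs, ordered from smallest to largest
    divergence from the query SMN."""
    divs = [kl_divergence(query_smn, s) for s in database_smns]
    return np.argsort(divs)

# Hypothetical SMNs over a three-concept vocabulary.
query_smn = [0.2, 0.3, 0.5]
database_smns = [[0.5, 0.3, 0.2],
                 [0.2, 0.3, 0.5],
                 [0.3, 0.3, 0.4]]
kl_order = rank_by_kl(query_smn, database_smns)
print(kl_order)
```

Note that the KL divergence is not symmetric, so the order of the two arguments matters; here the query SMN is always the first argument.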
32SVCL
Semantic Feature Space
• The space is the simplex of posterior concept probabilities
• Each image/SMN is thus represented as a point in this simplex
[Figure: Query by Semantic Example. Semantic simplex with axes mountain, sky, vegetation, rocks; o = query, x = database images. The query is an SMN with weights such as .2, .4, .2 on several concepts, and the closest matches are the database points nearest to it.]
[Figure: Traditional Text Based Query “mountains + sky”. Same semantic simplex; o = query, x = database images. The query vector is 0 0 0.5 .5 0 …, and the closest matches are the database points nearest to it.]
34SVCL
Evaluation – Precision : Recall : Scope
[Figure: the image collection is split into irrelevant images A and relevant images B; the retrieved images consist of Ar (irrelevant) and Br (relevant)]
Precision = |Br| / (|Ar| + |Br|): the proportion of retrieved images that are relevant.
Recall = |Br| / |B|: the proportion of all available relevant images that are retrieved.
Scope = |Ar| + |Br|: the number of images that are retrieved.
35SVCL
Query ranking:
Relevant   #retrieved   Precision   Recall
Yes        1            1/1         1/3
No         2            1/2         1/3
Yes        3            2/3         2/3
No         4            2/4         2/3
Yes        5            3/5         3/3
[Figure: precision-recall curve for this ranking; x-axis: Recall (0.33, 0.66, 1), y-axis: Precision (0 to 1)]
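The table above can be reproduced with a short script. It also computes the average precision, taken here as the mean of the precision values at the ranks where a relevant image occurs. The function names are illustrative.

```python
def precision_recall_at_each_rank(relevance, total_relevant):
    """For a ranked list of relevance flags, return one
    (scope, precision, recall) tuple per rank."""
    rows, hits = [], 0
    for n, rel in enumerate(relevance, start=1):
        hits += int(rel)
        rows.append((n, hits / n, hits / total_relevant))
    return rows

def average_precision(relevance, total_relevant):
    """Mean of the precision values at the ranks where recall changes,
    i.e. where a relevant item occurs."""
    hits, total = 0, 0.0
    for n, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / n
    return total / total_relevant

# The ranking from the table: Yes, No, Yes, No, Yes with 3 relevant images.
relevance = [True, False, True, False, True]
rows = precision_recall_at_each_rank(relevance, total_relevant=3)
ap = average_precision(relevance, total_relevant=3)
for scope, precision, recall in rows:
    print(scope, round(precision, 2), round(recall, 2))
print("average precision:", round(ap, 4))
```

Averaging this quantity over all queries gives the mean average precision.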
36SVCL
Experimental Setup
• Evaluation procedure [Feng, 04]
– Precision-recall (scope) curves: calculate precision at various levels of recall (scope).
– Mean average precision: precision averaged over the points where recall changes (i.e. where relevant items occur), then averaged over all queries.
• Training the semantic space
– Images: Corel Stock Photo CDs (Corel50)
• 5,000 images from 50 CDs, 4,500 used for training the space
– Semantic concepts
• Total of 371 concepts
• Each image has a caption of 1-5 concepts
• A semantic concept model is learned for each concept
37SVCL
Experimental Setup
• Retrieval inside the semantic space
– Images: Corel Stock Photo CDs, same as Corel50
• 4,500 used as the retrieval database
• 500 used to query the database
• Retrieval outside the semantic space
– Corel15: another 15 Corel Photo CDs (not used previously)
• 1,200: retrieval database; 300: query database
– Flickr18: 1,800 images downloaded from www.flickr.com
• 1,440: retrieval database; 360: query database
• harder than the Corel images, as they were shot by non-professional Flickr users
38SVCL
Inside the Semantic Space
• Precision of QBSE is significantly higher at most levels of recall
[Figure: QBSE vs. QBVE comparison]
39SVCL
• MAP scores for all 50 classes (QBSE vs. QBVE)
40SVCL
Inside the Semantic Space
[Figure: retrieval results for QBSE and QBVE. Annotation: same colors, different semantics]
41SVCL
Inside the Semantic Space
[Figure: retrieval results for QBSE and QBVE. Annotations: “whitish + darkish”, “train + railroad”]
42SVCL
Outside the Semantic Space
43SVCL
[Figure: QBVE vs. QBSE retrieval results for a Commercial Construction query. Semantic concept weights shown with five of the images:]
• People 0.09, Buildings 0.07, Street 0.07, Statue 0.05, Tables 0.04, Water 0.04, Restaurant 0.04
• Buildings 0.06, People 0.06, Street 0.06, Statue 0.04, Tree 0.04, Boats 0.04, Water 0.03
• People 0.08, Statue 0.07, Buildings 0.06, Tables 0.05, Street 0.05, Restaurant 0.04, House 0.03
• People 0.1, Statue 0.08, Buildings 0.07, Tables 0.06, Street 0.06, Door 0.05, Restaurant 0.04
• People 0.12, Restaurant 0.07, Sky 0.06, Tables 0.06, Street 0.05, Buildings 0.05, Statue 0.05
44SVCL
QBSE vs QBVE
• Nearest-neighbor search in this space is significantly more robust
• both in terms of
– metrics
– subjective matching quality
[Figure: semantic simplex with axes locomotive, sky, bridge, train; o = query, x = database images; the closest matches are the database points nearest to the query]
• “Query by semantic example” [N. Rasiwasia, IEEE Trans. Multimedia 2007]
45SVCL
Structure of the Semantic Space
• Is the gain really due to the semantic structure of the SMN space?
• This can be tested by comparing to a space where the probabilities are relative to random image groupings
[Figure: images with w_i = random images → efficient hierarchical estimation → semantic class model $P_{X|W}(x \mid w_i)$]
46SVCL
The semantic gain
• With random groupings, performance is quite poor, indeed worse than QBVE
• There seems to be an intrinsic gain from relying on a space where the features are semantic
49SVCL
But what about this image ;)?
50SVCL
Questions?
51SVCL
Flickr18
• Automobiles
• Building Landscapes
• Facial Close Up
• Flora
• Flowers Close Up
• Food and Fruits
• Frozen
• Hills and Valley
• Horses and Foal
• Jet Planes
• Sand
• Sculpture and Statues
• Sea and Waves
• Solar
• Township
• Train
• Underwater
• Water Fun
Corel15
• Autumn
• Adventure Sailing
• Barnyard Animals
• Caves
• Cities of Italy
• Commercial Construction
• Food
• Greece
• Helicopters
• Military Vehicles
• New Zealand
• People of World
• Residential Interiors
• Sacred Places
• Soldier
52SVCL
Content based image retrieval
• Query by Visual Example (QBVE)
– Color, shape, texture, spatial layout
– The image is represented as a multidimensional feature vector
– Suitable similarity measure
[Figure: query image and a visually similar image]
• Semantic Retrieval (SR)
– Given a keyword w, find images that contain the associated semantic concept.
[Figure: text query example, query: “people, beach”]