Download ppt - SVCL 1 Content based Image Retrieval (at SVCL) Nikhil Rasiwasia, Nuno Vasconcelos Statistical Visual Computing Laboratory University of California, San

1SVCL

Content based Image Retrieval

(at SVCL)

Nikhil Rasiwasia, Nuno VasconcelosStatistical Visual Computing Laboratory

University of California, San Diego

ECE271A – Fall 2007

2SVCL

Why image retrieval?

• Help in finding you the images you want.

Source: http://www.bspcn.com/2007/11/02/25-photographs-taken-at-the-exact-right-time/

3SVCL

But there is Google right?

• Metadata based retrieval systems– text, click-rates, etc.– Google Images– Clearly not sufficient

• what if computers understood images? – Content based image

retrieval (early 90’s)– search based on

the image content

Top 12 retrieval results for the query ‘Mountain’

Metadata based retrieval systems

4SVCL

Early understanding of images. • Query by Visual Example(QBVE)

– user provides query image– system extracts image features (texture, color, shape)– returns nearest neighbors using suitable similarity

measure

Texturesimilarity

Colorsimilarity

Shapesimilarity

5SVCL

• Bag of features representation– No spatial information ()– Yet performs good ()

• Each Feature represented by DCT coefficients – Other people use SIFT, Gabor filters etc

This is a graduate class, so! Details

dct( ) +

6SVCL

Image representation

Bag of DCT vectors

+ +

+

++

++

+

+ +

+

++

++

+++

+

++

+ ++

+++

++ +

+

+ +

++

+++

+++

+

++

+ +

++

+++

++

++

+ +

++

++

+

+++

+ ++

+ +

+ +++

++

+

+ ++

+

+++++

++

+++

+

+

+

+

+

++

++

+

+ ++

+

+

+

+

++

++

+

+

+

+

+

++

+

++

+

+

+

+

+

+

+++

++

+

+

+

+

+ ++

++

+

+

+

+

+

++

++

++

+

+

+

+

+

+

++

+

++

+

+

+

+

+

++ +++ +

+

+

+

++

+ ++++ +

+

+ +

+

+

+

+

+++

+++

+ ++

+ +

+

+

+

++

++

+ + ++ ++

+

+

+

+++ ++

+++

+++++

+

+

++++

+

+ +++++

+++

++

+

++

+

+

+

+

+

+

++

+

++

+

+

+

+

+

+

++

++

+ ++

+

++

+

+

+

++

+

++++

+

+

+

+

+

++

++

++

+

+

+

+

+

+++

++

++

+

+

+

+

+

+

++

+

++

+

+

+

+

+

+++

+

++

++++

+++

+

++

+ +

++

+

+

+++

+

+ +

+ +++

++

++

+

+

+

+++

+

++

+ +

++

+

+

+++

++

+ +

+ +++

++

+

+ ++

+++

++ +

+

+ +

++

+

+ +

+ +

+ +++

++

++

+

++

+

+

GMM

7SVCL

Query by visual example

Query Image

Candidate Images

Probability under various models

Ranking

p1 >

p2

.

.

> pn

8SVCL

Query by visual example (QBVE)QUERY TOP MATCHES

9SVCL

What can go wrong?QUERY TOP MATCHES

10SVCL

• visual similarity does not always correlate with “semantic” similarity

This can go wrong!

Both have visually dissimilar sky

Disagreement of the semantic notions of train with the visual notions of arch.

11SVCL

Intelligent Researchers (like u)…• Semantic Retrieval (SR)

– User provided a query text (keywords)– find images that contains the associated semantic concept.

– around the year 2000,– model semantic classes, learn to annotate images– Provides higher level of abstraction, and supports natural

language queries

abc

query: “people, beach”

12SVCL

Semantic Class ModelingBag of DCT vectors

+ +

+

++

++

+

+ +

+

++

++

+++

++

+

+ ++

+++

++ +

+

+ +

++

++ +

+++

+ ++

+ +

++

+++

++

++

+ +

++

++

+ +++

+ ++

+ +

+ +++

++

+

+ ++

+

+++++

+++++

+

+

+

+

+

++

++++ ++

+

+

+

+

+++

++

+

+

+

+

++++ ++

+

+

+

+

+

+++ ++

+

+

+

+

+ ++++

+

+

+

+

+

+++

++ ++

+

+

+

+

+

+++++

+

+

+

+

+

++ +++ +

+

+

+

++

+ ++++ ++

+ +

++

++

+++

+++

+ ++

+ +

++

+

++

+++ + ++ +

+

+

+

+

++++

++ +++++++

+

++

+++

++ ++

+++

+++

+++++

+

+

+

+

+

+

+++++

+

+

+

+

+

+

++++

+ ++

+

+++

+

+

++

+

++++

+

+

+

+

+

+ +++++

+

+

+

+

+

++++

+++

+

+

+

+

+

+

+++++

+

+

+

+

+

+++

+

++

+

+++

+++

+ +++ +

++

+

+

++++

+ +

+ +++

+++

+

+

+

+

+++

+ ++

+ ++

+

+

+

+++ +

+

+ +

+ +++

++

+

+ ++

+++

++ +

++ +

++

+

+ +

+ +

+ +++

++ +

+

++

+

+

+

GMM

wi = mountain mountain

mountainxP WX ||

Semantic Class Model

Efficient Hierarchical Estimation

•“Formulating Semantics Image Annotation as a Supervised Learning Problem” [G. Carneiro, IEEE Trans. PAMI, 2007]

13SVCL

Semantic Retrieval

Query Image

Candidate Words

Probability under various models

Ranking

p1 >

p2

.

.

> pn

Mountian

Sky

Sexy

Girl

… so on

house

14SVCL

15SVCL

16SVCL

First Five Ranked Results• Query: mountain

• Query: pool

• Query: tiger

17SVCL

First Five Ranked Results• Query: horses

• Query: plants

• Query: blooms

18SVCL

First Five Ranked Results• Query: clouds

• Query: field

• Query: flowers

19SVCL

First Five Ranked Results• Query: jet

• Query: leaf

• Query: sea

20SVCL

But: Semantic Retrieval (SR)• Problem of lexical ambiguity

– multiple meaning of the same word•Anchor - TV anchor or for Ship?•Bank - Financial Institution or River

bank?

• Multiple semantic interpretations of an image

•Boating or Fishing or People?

• Limited by Vocabulary size – What if the system was not trained

for ‘Fishing’ – In other words, it is outside the

space of trained semantic concepts

Lake? Fishing? Boating? People?

Fishing! what if not in the vocabulary?

abc

21SVCL

In Summary• SR Higher level of abstraction

– Better generalization inside the space of trained semantic concepts

– But problem of • Lexical ambiguity • Multiple semantic interpretations• Vocabulary size

• QBVE is unrestricted by language. – Better Generalization outside the space of

trained semantic concepts • a query image of ‘Fishing’ would retrieve

visually similar images.

– But weakly correlated with human notion of similarity

VSabc

Both have visually dissimilar sky

Fishing! what if not in the vocabulary?

Lake? Fishing? Boating? People?

The two systems in many respects are complementary!

22SVCL

Query by Semantic Example (QBSE)

• Suggests an alternate query by example paradigm.

– The user provides an image.– The image is mapped to vector of weights of all the

semantic concepts in the vocabulary, using a semantic labeling system.

– Can be thought as an projection to an abstract space, called as the semantic space

– To retrieve an image, this weight vector is matched to database, using a suitable similarity function

Lake

Wat

erPeo

ple

Sky … Boat

.2 .3 .2 .1 … …

Semantic Space

Mapping to an abstract space of

semantic conceptsSemantic multinomial

vector of weights or

23SVCL

Query by Semantic Example (QBSE)

• As an extension of SR– Query specification not as set of few words.– But a vector of weights of all the semantic

concept in the vocabulary.– Can eliminat

• Problem of lexical ambiguity- Bank+’more’• Multiple semantic interpretation– Boating,

People• Outside the ‘semantic space’ – Fishing.

• As an enrichment of QBVE– The query is still by an example paradigm.– But feature space is Semantic.

• A mapping of the image to an abstract space.

– Similarity measure at a higher level of abstraction.

.1 .2 .1 .3 … … .2

Lake

Wat

erPeo

ple

Boatin

g …

Boat

0 .5 0 .5 … … 0

(SR) query: water, boating=

(QBVE) query: image

Lake

Wat

erPeo

ple

Boatin

g …

Boat

Semantic Space

Boating

Water

Lake

24SVCL

QBSE System

Concept 1Query Image

Any Semantic Labeling System

Concept 2

Concept 3

Concept L

. . .

Database

Weight Vector 1

Weight Vector 2

Weight Vector 3

Weight Vector 4

Weight Vector 5

Weight Vector N

. . . Suitable

Similarity Measure

. . .

Ranked Retrieval

Posterior probability

Weight Vector

123

L

25SVCL

QBSE System



Concept 2

Concept 3

Concept L

. . .

Database

Weight Vector 1

Weight Vector 2

Weight Vector 3

Weight Vector 4

Weight Vector 5

Weight Vector N

. . . Suitable

Similarity Measure

. . .

Ranked Retrieval

123

L


Weight Vector

26SVCL

Semantic Class ModelingBag of DCT vectors

+ +

+

++

++

+

+ +

+

++

++

+++

++

+

+ ++

+++

++ +

+

+ +

++

++ +

+++

+ ++

+ +

++

+++

++

++

+ +

++

++

+ +++

+ ++

+ +

+ +++

++

+

+ ++

+

+++++

+++++

+

+

+

+

+

++

++++ ++

+

+

+

+

+++

++

+

+

+

+

++++ ++

+

+

+

+

+

+++ ++

+

+

+

+

+ ++++

+

+

+

+

+

+++

++ ++

+

+

+

+

+

+++++

+

+

+

+

+

++ +++ +

+

+

+

++

+ ++++ ++

+ +

++

++

+++

+++

+ ++

+ +

++

+

++

+++ + ++ +

+

+

+

+

++++

++ +++++++

+

++

+++

++ ++

+++

+++

+++++

+

+

+

+

+

+

+++++

+

+

+

+

+

+

++++

+ ++

+

+++

+

+

++

+

++++

+

+

+

+

+

+ +++++

+

+

+

+

+

++++

+++

+

+

+

+

+

+

+++++

+

+

+

+

+

+++

+

++

+

+++

+++

+ +++ +

++

+

+

++++

+ +

+ +++

+++

+

+

+

+

+++

+ ++

+ ++

+

+

+

+++ +

+

+ +

+ +++

++

+

+ ++

+++

++ +

++ +

++

+

+ +

+ +

+ +++

++ +

+

++

+

+

+

GaussianMixture Model

wi = mountain mountain

mountainxP WX ||



•“Formulating Semantics Image Annotation as a Supervised Learning Problem” [G. Carneiro, CVPR 2005]

27SVCL

QBSE System



Concept 2

Concept 3

Concept L

. . .

Database

Weight Vector 1

Weight Vector 2

Weight Vector 3

Weight Vector 4

Weight Vector 5

Weight Vector N

. . . Suitable

Similarity Measure

. . .

Ranked Retrieval

123

L


Weight Vector

28SVCL

Semantic Multinomial

• Posterior Probabilities under series of L independent class models

. . .

skyxP WX ||

mountainxP WX ||

cloudsxP WX ||

L

3

2

1

L

ii

1

1

29SVCL

Semantic Multinomial

30SVCL

QBSE System



Concept 2

Concept 3

Concept L

. . .

Database

Weight Vector 1

Weight Vector 2

Weight Vector 3

Weight Vector 4

Weight Vector 5

Weight Vector N

. . . Suitable

Similarity Measure

. . .

Ranked Retrieval

123

L


Weight Vector

31SVCL

Query using QBSE• Note that SMNs are probability

distributions• A natural similarity function is

the Kullback-Leibler divergence

Query Query SMN

)||(maxarg)( ii KLf

j

ij

L

jji

logmaxarg1

<>

<>

<>

<>

Database

32SVCL

Semantic Feature Space• The space is the simplex of posterior concept probabilities• Each image/SMN is thus represented as a point in this

simplexTraditional Text

Based QueryQuery by Semantic

Example

VS abc

vegetation

.2

x

xox

x

x

xx

x

x

x

x x

xx

x

xx

x

x

x

xx

xx

xx

x

xx

x

xx

x

xx

x

x

xx

xx

xx

x

xx

xx

x

xx

Semantic simplex

o - query

x – database images

closest

matches

mountain

sky

rocks

mountain

sky

vegetation

… .4 … .2 … … …

x

o

xx

x

x

x

x x

x

x

x

xx

x

x

x

x

xx

xx

x

x

x xx

xx

Semantic simplex

o - query

x

xx

x

x

x

x

x

x

x

x

xx

xx

x

x

xx

mountain

sky

rocks

vegetation

…

“mountains + sky”

mountain

sky

vegetation

x – database images

0 0 0.5 .5 0 …

Closest matches

34SVCL

Evaluation – Precision:Recall:Scope

Irrelevant Images |A|

Relevant Images |B|

Retrieved Images |Ar|+|Br|

|A| |B||Ar| |Br|

|Br||B|Recall =

|Br||Ar| + |Br|

Precision = |Ar| + |Br|Scope =

The proportion of retrieved and relevant images to all the images retrieved.

The proportion of relevant images that are retrieved, out of all relevant images available.

The number of images that are retrieved.

35SVCL

Relevant #retrieved Precision Recall

Yes 1 1/1 1/3

No 2 1/2 1/3

Yes 3 2/3 2/3

No 4 2/4 2/3

Yes 5 3/5 3/3

Query

Ranking

0

0.2

0.4

0.6

0.8

1

0.33 0.66 1

Recall

Pre

cisi

on

36SVCL

Experimental Setup

• Evaluation Procedure [Feng,04]

– Precision-Recall(scope) Curves : Calculate precision at various recalls(scopes).

– Mean Average Precision: Average precision over all queries, where recall changes (i.e. where relevant items occur)

• Training the Semantic Space– Images – Corel Stock Photo CD’s – Corel50

•5,000 images from 50 CD’s, 4,500 used for training the space

– Semantic Concepts•Total of 371 concepts•Each Image has caption of 1-5 concepts•Semantic concept model learned for each concept.

37SVCL

Experimental Setup

• Retrieval inside the Semantic Space.– Images – Corel Stock Photo CD’s – same as Corel50

•4,500 used as retrieval database•500 used to query the database

• Retrieval outside the Semantic Space– Corel15 - Another 15 Corel Photo CD’s, (not used previously)

•1200: retrieval database, 300: query database

– Flickr18 - 1800 images Downloaded from www.flickr.com•1440: retrieval database, 360: query database•harder than Corel images as shot by non-professional flickr users

38SVCL

Inside the Semantic Space

• Precision of QBSE is significantly higher at most levels of recall

VS

39SVCL

• MAP score for all the 50 classes VS

40SVCL

QBSE QBVE

Inside the Semantic Space same colors

different semantics

41SVCL

QBSE QBVE

“whitish + darkish”

“train + railroad”

Inside the Semantic Space

42SVCL

Outside the Semantic Space

43SVCL

People 0.09Buildings 0.07Street 0.07Statue 0.05Tables 0.04Water 0.04Restaurant 0.04

Buildings 0.06People 0.06Street 0.06Statue 0.04Tree 0.04Boats 0.04Water 0.03

People 0.08Statue 0.07Buildings 0.06Tables 0.05Street 0.05Restaurant 0.04House 0.03

People 0.1Statue 0.08Buildings 0.07Tables 0.06Street 0.06Door 0.05Restaurant 0.04

People 0.12Restaurant 0.07Sky 0.06Tables 0.06Street 0.05Buildings 0.05Statue 0.05

QBVE

QBSE

VSCommercial Construction

44SVCL

• nearest neighbors in this space is significantlymore robust

QBSE vs QBVE

x

xo x

x

x

xx

x

x

x

x xx

x

x

xx

x

xx

xxx

x

xx

x

xx

x

xx x

xx

x

x

xx

xx

xx

xx

x

xx

x

xx

Semantic simplex

o - query

x – database images closest

matches

locomotive

skybridge

train

2242

• both in terms of– metrics– subjective matching

quality

•“Query by semantic example” [N. Rasiawasia, IEEE Trans. Multimedia 2007]

45SVCL

Structure of the Semantic Space• is the gain really due to the semantic structure of

the SMN space?• this can be tested by comparing to a space where

the probabilities are relative to random image groupings

wi = random imgs wi

iWX wxP ||



46SVCL

• with random groupings performance is– quite poor, indeed worse than QBVE

– there seems to be an intrinsic gain of relying on a space where the features are semantic

The semantic gain

49SVCL

But what about this image ;)?

50SVCL

Questions?VS VS abc

51SVCL

Flickr18 Corel15• Automobiles• Building Landscapes• Facial Close Up• Flora• Flowers Close Up• Food and Fruits• Frozen• Hills and Valley• Horses and Foal• Jet Planes• Sand• Sculpture and Statues• Sea and Waves• Solar• Township• Train• Underwater• Water Fun

• Autumn• Adventure Sailing• Barnyard Animals• Caves• Cities of Italy• Commercial

Construction• Food• Greece• Helicopters• Military Vehicles• New Zealand• People of World• Residential Interiors• Sacred Places• Soldier

52SVCL

Content based image retrieval

• Query by Visual Example(QBVE)

– Color, Shape, Texture, Spatial Layout.

– Image is represented as multidimensional feature vector

– Suitable similarity measure

query: “people, beach”• Semantic Retrieval (SR)

– Given keyword w, find images that contains the associated semantic concept.

Query image Visually Similar Image

abc