Fast Image Search. Presented by: Uri Shabi, Shiri Chechik. For the Advanced Topics in Computer Vision course, Spring 2007.


Page 1: Fast Image Search

Fast Image Search

Presented by: Uri Shabi, Shiri Chechik

For the Advanced Topics in Computer Vision course, Spring 2007

Page 2: Fast Image Search

Recognition:
• Given a database of images and an input query image, we wish to find an image in the database that represents the same object as in the query image.

Classification:
• Given a database of images, the algorithm:
– Divides the images into groups
– Given a query image, returns the group that the image belongs to

Introduction – The tasks

Page 3: Fast Image Search

Introduction – The tasks

Query Image



D Nister, H Stewenius. Scalable Recognition with a Vocabulary Tree. CVPR’06. 2006.

Page 4: Fast Image Search

Introduction – The tasks

Query Image

Database

Page 5: Fast Image Search

Introduction – The problem

D Nister, H Stewenius. Scalable Recognition with a Vocabulary Tree. CVPR’06. 2006.

Page 6: Fast Image Search

Biederman 1987

Introduction – How many object categories are there?

Page 7: Fast Image Search

Challenges 1: viewpoint variation

Michelangelo 1475-1564

Introduction – The Challenges

Adapted with permission from Fei Fei Li - http://people.csail.mit.edu/torralba/iccv2005/

Page 8: Fast Image Search

Challenges 2: illumination

Introduction – The Challenges

Adapted with permission from Fei Fei Li - http://people.csail.mit.edu/torralba/iccv2005/

Page 9: Fast Image Search

Challenges 3: occlusion

Magritte, 1957

Introduction – The Challenges

Adapted with permission from Fei Fei Li - http://people.csail.mit.edu/torralba/iccv2005/

Page 10: Fast Image Search

Challenges 4: scale

Introduction – The Challenges

Adapted with permission from Fei Fei Li - http://people.csail.mit.edu/torralba/iccv2005/

Page 11: Fast Image Search

Challenges 5: deformation

Xu, Beihong 1943

Introduction – The Challenges

Adapted with permission from Fei Fei Li - http://people.csail.mit.edu/torralba/iccv2005/

Page 12: Fast Image Search

To sum up, we face several challenges:
• Viewpoint variation
• Illumination
• Occlusion
• Scale
• Deformation

Introduction – The Challenges

Page 13: Fast Image Search

• A document can be represented by a collection of words
• Common words (the, an, etc.) can be ignored
– This is called a ‘stop list’
• Words are represented by their stems
– ‘walk’, ‘walking’, ‘walks’ → ‘walk’
• A topic can be recognized by word frequencies

Introduction – Bag of Words (Documents)

J Sivic, A Zisserman. Video Google: a text retrieval approach to object matching in videos. Computer Vision, 2003.
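A minimal Python sketch of this document pipeline: drop stop-list words, reduce each word to its stem, and count frequencies. The stop list and the suffix-stripping `stem` function are toy assumptions made for illustration; a real system would use a full stop list and a proper stemmer (e.g. Porter's).

```python
from collections import Counter
import re

# Toy stop list (assumption): common words that carry little topical meaning.
STOP_LIST = {"the", "a", "an", "of", "and", "to", "is", "in", "on"}

def stem(word):
    """Hypothetical crude stemmer: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def bag_of_words(text):
    """Return stem frequencies for a document, ignoring stop-list words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(w) for w in words if w not in STOP_LIST)

print(bag_of_words("The dog walks; the dogs walked while walking."))
# → Counter({'walk': 3, 'dog': 2, 'while': 1})
```

Word order is discarded entirely: only the counts survive, which is exactly what makes this a "bag".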

Page 14: Fast Image Search

Analogy to documents

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.

sensory, brain, visual (×2), perception, retinal, cerebral cortex, eye (×2), cell, optical, nerve, image, Hubel, Wiesel

Introduction – Bag of Words (Documents)

Page 15: Fast Image Search

• Images can be represented by visual words
• An object in an image can be recognized by visual word frequencies

Introduction – Bag of Words

J Sivic, A Zisserman. Video Google: a text retrieval approach to object matching in videos. Computer Vision, 2003.

Page 16: Fast Image Search

• We could use a feature as a visual word, but:
– There are too many features
– Two features of the same object will never look exactly the same

• A visual word is a “visual stem”, represented by a descriptor

• What is a good code word (visual word)?
– Invariant to different viewpoints, illumination, scale, shift and transformation

Introduction – Visual Word

Page 17: Fast Image Search

Figure: an object and its bag of ‘words’

Introduction – Bag of Words

Adapted with permission from Fei Fei Li - http://people.csail.mit.edu/torralba/iccv2005/

Page 18: Fast Image Search

Adapted with permission from Fei Fei Li - http://people.csail.mit.edu/torralba/iccv2005/

Page 19: Fast Image Search

• Because we use only the frequencies of visual words, and not their positions, this method is translation invariant.

• This is why it is called a ‘Bag of Words’: two images with the same word frequencies are identified as the same image.

Introduction – Bag of Words
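A small NumPy sketch of the image-side analogue, assuming descriptors have already been extracted: each descriptor is mapped to its nearest code word by a sequential scan, and the image becomes a normalized word-frequency vector. The random codebook and descriptors are toy data. Because positions never enter the vector, rearranging the features (here, shuffling them) leaves it unchanged, which is the invariance claimed above.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Index of the nearest code word for each descriptor (row)."""
    # Squared Euclidean distance between every descriptor and every code word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, codebook):
    """L1-normalized visual-word frequencies; feature positions are never used."""
    words = quantize(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.normal(size=(5, 8))       # 5 visual words, 8-D descriptors (toy sizes)
descs = rng.normal(size=(20, 8))         # descriptors extracted from one image
scrambled = descs[rng.permutation(20)]   # same features, different spatial layout

h = bow_histogram(descs, codebook)
print(np.allclose(h, bow_histogram(scrambled, codebook)))   # True
```

The flip side of this invariance is the drawback discussed later: any arrangement of the same words gets the same vector.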

Page 20: Fast Image Search

• Feature Detection
• Feature Description
• Feature Recognition – how to find words similar to a query feature in a database of code words
• Image Recognition/Classification – how to find images similar to the query image, or how to classify our image

Breaking down the problem

Page 21: Fast Image Search

Figure: image recognition pipeline: feature detection & representation → codewords dictionary → image representation → category models (and/or) classifiers

Adapted with permission from Fei Fei Li - http://people.csail.mit.edu/torralba/iccv2005/

Page 22: Fast Image Search

• We can use any feature detection algorithm
• We can combine several feature detectors to capture more types of features

• What is a good detector?
– Invariant to rotation, illumination, scale, shift and transformation

Feature Detection

Page 23: Fast Image Search

• What is a good descriptor?
– Invariant to different viewpoints, illumination, scale, shift and transformation

• Image recognition is rotation or scale invariant if the detector and the descriptor are as well.

Feature Description

Page 24: Fast Image Search

• SIFT descriptor
• Local Frames of Reference

Feature Description

Page 25: Fast Image Search

• We determine a local orientation according to the dominant gradient
• This defines a native coordinate system

• We take a 16×16 window and divide it into 16 windows of 4×4 pixels

• We then compute a gradient orientation histogram over 8 main directions for each window

Feature Description – SIFT
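A simplified sketch of the descriptor described above, assuming the 16×16 patch has already been rotated to its dominant gradient orientation and rescaled by the detector. It omits the Gaussian weighting, interpolation between bins and clipping of the full SIFT descriptor; the 4×4 windows with 8 orientation bins give 4·4·8 = 128 values.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Simplified SIFT-style descriptor of a 16x16 grayscale patch (128 values)."""
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    gy, gx = np.gradient(patch)                       # derivatives along rows, columns
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.arctan2(gy, gx) % (2 * np.pi)            # orientation in [0, 2*pi)
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8    # 8 main directions
    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            # Pixel (r, c) votes with its magnitude in the histogram of its 4x4 window.
            desc[r // 4, c // 4, bins[r, c]] += mag[r, c]
    desc = desc.ravel()                               # 4 * 4 * 8 = 128 values
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc          # unit length: illumination gain

ramp = np.tile(np.arange(16.0), (16, 1))              # brightness increases left to right
d = sift_like_descriptor(ramp)
print(d.shape)                                        # (128,)
```

For the ramp every gradient points in direction 0, so each window puts all its mass in its first bin.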

Page 26: Fast Image Search

• Properties:
– Rotation invariant

Feature Description – SIFT

Page 27: Fast Image Search

• Works together with the Distinguished Regions detector

• Assumption: two frames of the same object are related by an affine transformation

• Idea: find an affine transformation that best normalizes the frame

• Two normalized frames of the same object will look similar

Feature Description – Local Affine Frames of Reference

Štěpán Obdržálek, Jiří Matas. Object Recognition using Local Affine Frames on Distinguished Regions.

Page 28: Fast Image Search

• Properties:
– Rotation invariant (depending on the shape)
– Brings different features of the same object into a similar form: a great advantage!
– Similarity of features can be tested with great efficiency

Feature Description – Local Frames of Reference

Page 29: Fast Image Search

• An affine transformation has 6 degrees of freedom, so we can enforce 6 constraints

• An example of a constraint:
– Rotate the object so that the line from the center of gravity to the most extreme point has a fixed direction

Feature Description – How to normalize?
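One way to spend the six degrees of freedom, sketched below under our own assumptions (it is not necessarily the exact construction used by Obdržálek and Matas): move the centroid to the origin (2 constraints), whiten the second-moment matrix so the region becomes "round" (3 constraints), and rotate so that the most extreme point lies on the positive x-axis (1 constraint). A region and an affinely transformed copy of it should then map to the same frame.

```python
import numpy as np

def normalize_affine_frame(points):
    """Map the 2-D points of a region to a canonical (normalized) frame.

    Uses the 6 affine degrees of freedom as follows (our own choice):
      translation (2): centroid -> origin
      scale/shear (3): covariance -> identity ("whitening")
      rotation    (1): most extreme point -> positive x-axis
    """
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(centered.T))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T    # cov^(-1/2)
    white = centered @ inv_sqrt.T
    far = white[np.argmax(np.linalg.norm(white, axis=1))]
    theta = np.arctan2(far[1], far[0])
    rot = np.array([[np.cos(-theta), -np.sin(-theta)],
                    [np.sin(-theta),  np.cos(-theta)]])
    return white @ rot.T

rng = np.random.default_rng(1)
region = rng.normal(size=(50, 2))                  # toy region points
A = np.array([[2.0, 0.5], [0.3, 1.5]])             # affine map, det(A) > 0
warped = region @ A.T + np.array([5.0, -3.0])      # same region, another view
print(np.allclose(normalize_affine_frame(region),
                  normalize_affine_frame(warped)))  # True
```

Whitening leaves an unknown rotation, which the extreme-point constraint removes; a mirror image (det(A) < 0) is not removed by this construction.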

Page 30: Fast Image Search

A reminder:
• Given a database of code words and a query feature, we find the closest code word to the feature

Fast Feature Recognition

Query Feature

Database

Page 31: Fast Image Search

• Each visual word is associated with a list of the documents (images) containing it

Fast Feature Recognition – Inverted File

Page 32: Fast Image Search

• Each image in the database is scored according to how many features it has in common with the query image.

• The image with the best score is selected.
• Note also that, in order for the object to be recognized successfully (to compete with background regions), it needs to be large enough (at least ¼ of the image area).

Fast Feature Recognition – Inverted File
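A minimal sketch of the inverted file and the voting score described above, assuming each image has already been reduced to a list of visual-word ids. Scoring by the number of distinct shared words is the plain count from the slide; the `InvertedFile` class and its method names are illustrative, not taken from the source.

```python
from collections import defaultdict

class InvertedFile:
    """Maps each visual word id to the set of image ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_image(self, image_id, words):
        for w in set(words):
            self.postings[w].add(image_id)

    def query(self, words):
        """Score images by the number of distinct words shared with the query.

        Only the posting lists of the query's words are visited, so images
        sharing no word with the query are never touched.
        """
        scores = defaultdict(int)
        for w in set(words):
            for image_id in self.postings.get(w, ()):
                scores[image_id] += 1
        # Best score first; ties broken by image id for a deterministic order.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

index = InvertedFile()
index.add_image("img_a", [1, 4, 4, 7])
index.add_image("img_b", [2, 4, 9])
index.add_image("img_c", [3, 5])
print(index.query([4, 7, 8]))   # [('img_a', 2), ('img_b', 1)]
```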

Page 33: Fast Image Search

• Why do we need different approaches?
• Why can’t we just use a table?
• There could be too many visual words, and we want a fast solution!

Fast Feature Recognition

Page 34: Fast Image Search

Three different approaches:
• A small number of words
• Vocabulary Tree
• Decision Tree

Fast Feature Recognition

Page 35: Fast Image Search

Construction of the vocabulary:
• We take a large training set of images from many categories
• Then form a codebook containing W words using the K-means algorithm

Recognition phase:
• We sequentially find the nearest neighbor of the query feature

Fast Feature Recognition – A small number of words

R Fergus, L Fei-Fei, P Perona, A Zisserman. Learning Object Categories from Google’s Image Search. ICCV 2005.

Page 36: Fast Image Search

Fast Feature Recognition – A small number of words

Page 37: Fast Image Search

Fast Feature Recognition – A small number of words

Page 38: Fast Image Search

Pros:
• Going sequentially over the words leads to high accuracy
• Space efficiency: we save only a small number of words

Cons:
• A small number of words doesn’t capture all the features

Fast Feature Recognition – A small number of words

Page 39: Fast Image Search

• Input
– A set of n points {x1, x2, …, xn} in a d-dimensional feature space (the descriptors)
– Number of clusters: K

• Objective
– To find a partition of the points into K non-empty disjoint subsets
– So that each group consists of the descriptors closest to a particular center

A Small Detour – K-Means Clustering

Page 40: Fast Image Search

Step 1:
• Randomly partition the points into K sets of equal size and calculate their centers

A Small Detour – K-Means Clustering

Page 41: Fast Image Search

Step 2:

• For each xi:

Assign xi to the cluster with the closest center

A Small Detour – K-Means Clustering

Page 42: Fast Image Search

Step 3: Repeat until “no update”:
• Compute the mean (mass center) of each cluster
• For each xi:
Assign xi to the cluster with the closest center

A Small Detour – K-Means Clustering

Page 43: Fast Image Search

• The final result:

A Small Detour – K-Means Clustering
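The three steps above can be sketched as Lloyd's algorithm in NumPy. Following Step 1, the initial clusters are a random partition into K (nearly) equal-size sets; keeping an empty cluster's previous center is our own assumption for a case the slides do not discuss.

```python
import numpy as np

def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm. points: (n, d) with n >= k. Returns (centers, labels)."""
    n = len(points)
    # Step 1: random partition into k (nearly) equal-size sets; take their centers.
    labels = rng.permutation(n) % k
    centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the closest center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):      # Step 3: stop on "no update"
            break
        labels = new_labels
        # Step 3: recompute the mean (mass center) of every cluster.
        for j in range(k):
            if np.any(labels == j):                 # empty cluster keeps old center
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in ([0, 0], [5, 5], [0, 5])])
centers, labels = kmeans(blobs, 3, rng)            # 3 centers, one label per point
```

Lloyd's algorithm only reaches a local optimum, so in practice it is usually restarted from several initializations and the partition with the smallest total squared distance is kept.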

Page 44: Fast Image Search

Idea:
• Use many visual words to capture all features
• But since we can’t sequentially go over a large number of words, we’ll use a tree!

Fast Feature Recognition – Vocabulary Tree

D Nister, H Stewenius. Scalable Recognition with a Vocabulary Tree. CVPR’06. 2006.

Page 45: Fast Image Search

Construction of the vocabulary:
• Input: a large set of descriptor vectors
• Partition the training data into K groups, where each group consists of the descriptors closest to a particular center
• Continue recursively for each group, up to L levels

Recognition phase:
• Traverse the tree down to the leaves, at each level following the closest of the K cluster centers; the leaf reached will hopefully contain the closest word

Fast Feature Recognition – Vocabulary Tree
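A compact sketch of hierarchical k-means and the greedy descent, assuming Euclidean distance between descriptors. Each node runs a few Lloyd iterations seeded with randomly chosen training points (a shortcut of ours, not the random partition of the detour); the `build_tree`/`lookup` names and the dictionary node layout are illustrative. The descent compares the query with only K centers per level, about K·L distances instead of K^L for a flat scan over all leaves, which is also why the leaf it reaches is only hopefully the closest word.

```python
import numpy as np

def _kmeans(x, k, rng, iters=10):
    """A few Lloyd iterations, seeded with k distinct random training points."""
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    labels = ((x[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
    return centers, labels

def build_tree(x, k, levels, rng):
    """Recursively split descriptors x (n x d) into k groups, `levels` deep."""
    if levels == 0 or len(x) <= k:
        return {"word": x.mean(axis=0)}           # leaf: one visual word
    centers, labels = _kmeans(x, k, rng)
    children = []
    for j in range(k):
        members = x[labels == j]
        if len(members) == 0:                     # empty branch: leaf at its center
            children.append({"word": centers[j]})
        else:
            children.append(build_tree(members, k, levels - 1, rng))
    return {"centers": centers, "children": children}

def lookup(tree, q):
    """Greedy descent: at each level follow the closest of the k centers.
    Returns the path of branch indices and the visual word at the leaf."""
    path = []
    while "word" not in tree:
        j = int(((tree["centers"] - q) ** 2).sum(axis=1).argmin())
        path.append(j)
        tree = tree["children"][j]
    return path, tree["word"]

rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 16))                  # toy training descriptors
tree = build_tree(train, k=4, levels=3, rng=rng)     # at most 4**3 = 64 words
path, word = lookup(tree, train[0])
```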

Page 46: Fast Image Search

Fast Feature Recognition – Vocabulary Tree

Page 47: Fast Image Search

Fast Feature Recognition – Vocabulary Tree

Page 48: Fast Image Search

Fast Feature Recognition – Vocabulary Tree

Page 49: Fast Image Search

Fast Feature Recognition – Vocabulary Tree

Page 50: Fast Image Search

Fast Feature Recognition – Vocabulary Tree

Page 51: Fast Image Search

Fast Feature Recognition – Vocabulary Tree

Pages 52-67: Fast Image Search (figure-only slides; no text)

Page 68: Fast Image Search

Pros:
• We can save many visual words and thus capture more features

Cons:
• We can’t go sequentially over all words, so the search is not perfectly accurate
• Space: we need to save many words

Fast Feature Recognition – Vocabulary Tree

Page 69: Fast Image Search

• Idea: use a tree, but make a very simple check at each non-terminal node

• To overcome accuracy problems:
– We can save some frames in both subtrees
– We need to recheck similarity when we reach the leaves

Fast Feature Recognition – Decision Tree

S Obdrzalek, J Matas. Sub-linear indexing for large scale object recognition. Proc. British Machine Vision Conference, 2005.

Page 70: Fast Image Search

• Assume the Local Frames of Reference descriptor was used

• A very simple check at each non-terminal node: we check only one pixel and compare it to some threshold

• The leaves are associated with a list of frames
• The affine transformation is not perfect:
– Frames close to the threshold are saved in both subtrees
– We must recheck similarity of the frames in the leaves

Fast Feature Recognition – Decision Tree

Page 71: Fast Image Search

The tree construction:
– Every node gets a set of frames
– If the number of frames is below some threshold, or the frames are indistinguishable, create a leaf
– Else, find a weak classifier (a pixel x and a threshold):
• All frames below or close to the threshold in pixel x are added to the left list
• All frames above or close to the threshold in pixel x are added to the right list
– Continue recursively

Fast Feature Recognition – Decision Tree

Page 72: Fast Image Search

Weak Classifier
• The goal: minimize the expected recall time for a query, i.e., on average we should reach the leaves in minimal time

• Two requirements:
– The tree is balanced
– The number of ambiguous frames that are stored in both subtrees is minimized

Fast Feature Recognition – Decision Tree

Page 73: Fast Image Search

• Recognition phase:
– Traverse the tree according to the weak classifiers
– Check similarity to the frames in the leaf

Fast Feature Recognition – Decision Tree
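A toy sketch of the construction and recognition steps, assuming each normalized frame has been resampled into a fixed-length vector of pixel intensities. The split rule (threshold at the median, a fixed `margin` defining "close to the threshold", and choosing the pixel that minimizes the larger child) is our own simple stand-in for the paper's weak-classifier search: keeping the larger child small favors both a balanced tree and few ambiguous frames, the two requirements listed above.

```python
import numpy as np

def build(frames, ids, margin, leaf_size=4, depth=0, max_depth=20):
    """Build a decision tree over normalized frames.

    frames: (n, p) array, one row of p pixel values per normalized frame.
    ids:    indices (into frames) of the frames reaching this node.
    margin: frames within `margin` of the threshold go to both subtrees.
    """
    if len(ids) <= leaf_size or depth == max_depth:
        return {"leaf": list(ids)}
    best = None
    for x in range(frames.shape[1]):                             # candidate pixel
        t = float(np.median(frames[ids, x]))                     # candidate threshold
        left = [i for i in ids if frames[i, x] <= t + margin]    # below or close
        right = [i for i in ids if frames[i, x] >= t - margin]   # above or close
        cost = max(len(left), len(right))   # small if balanced and few duplicates
        if best is None or cost < best[0]:
            best = (cost, x, t, left, right)
    cost, x, t, left, right = best
    if cost == len(ids):                    # no pixel separates them: make a leaf
        return {"leaf": list(ids)}
    return {"pixel": x, "thr": t,
            "left": build(frames, left, margin, leaf_size, depth + 1, max_depth),
            "right": build(frames, right, margin, leaf_size, depth + 1, max_depth)}

def recognize(tree, frames, query, tol):
    """Follow one-pixel tests to a leaf, then recheck similarity there."""
    while "leaf" not in tree:
        branch = "left" if query[tree["pixel"]] <= tree["thr"] else "right"
        tree = tree[branch]
    dist = {i: float(np.abs(frames[i] - query).max()) for i in tree["leaf"]}
    return sorted((i for i in dist if dist[i] <= tol), key=dist.get)

rng = np.random.default_rng(0)
frames = rng.uniform(0, 1, size=(200, 16))        # toy normalized frames
tree = build(frames, list(range(200)), margin=0.05)
noisy = frames[17] + rng.uniform(-0.04, 0.04, size=16)
print(17 in recognize(tree, frames, noisy, tol=0.05))   # True
```

With this rule, a stored frame that differs from the query by at most `margin` in every pixel is always on the query's side of each test, so it is guaranteed to be in the leaf that is reached; the leaf check then only has to filter out the other frames.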

Page 74: Fast Image Search

Pros:
• We can save many visual words and thus capture more features
• A very efficient test at each non-terminal node

Cons:
• Since the normalizing transformation is not perfect, it won’t work on all frames
• Ambiguous frames are saved in both subtrees
• We need to make another check in the leaves

Fast Feature Recognition – Decision Tree

Page 75: Fast Image Search

Fast Image Recognition

• Inverted Files

• Probabilistic approach

• Implementation of pLSA – Google Image Search

Page 76: Fast Image Search

Fast Image Recognition
• After recognizing the features in our query image, our aim is to find the image in the database, or the group of images (classification), that is most similar to our query image.

Page 77: Fast Image Search

Inverted Files

• Each visual word is associated with a list of the documents (images) containing it, along with the frequency and positions of the word

• Analogous to text search engines

Page 78: Fast Image Search

Inverted Files – cont.
• Results are sorted according to a complex scoring method; Google’s best-known ranking component is called ‘PageRank’

Page 79: Fast Image Search

Inverted Files – Recognizing an Image

• The list of features in our query image provides us with a list of corresponding images.

• The task is to rank those images according to their similarity to the query image

• Similarity is based on the words common to the query image and the DB image.

Page 80: Fast Image Search

Inverted Files – Voting
• Each word independently votes on the relevance of images.

• To improve recognition, we add weights to the different words.

Page 81: Fast Image Search

Inverted Files – Possible Scoring of Words

• Considers how frequent the word is in the database: the rarer the word, the higher its weight (because it is more discriminative).

• This is the inverse document frequency (idf) weight; it is related to the entropy (information content) of a word.

$$q_i = n_i \ln\frac{N}{N_i}$$

where q_i is the weight of word i in the query image, n_i is the frequency of word i in the query image, N is the number of documents in the database, and N_i is the number of documents containing word i.

Page 82: Fast Image Search

Inverted Files – Scoring of images

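$$s(q,d) = \|q - d\|_2^2 = \sum_i |q_i - d_i|^2$$

• q and d are the vectors of weighted frequencies of all words in the query image and in the database image (document), respectively. The database weights are computed in the same way:

$$d_i = m_i \ln\frac{N}{N_i}$$

where m_i is the frequency of word i in the database image.

• This is the similarity score between a DB image and the query image.
• A lower score is a better match.
• Rare words contribute much more to the score when their frequencies do not match.
• We need to go over all words, so the implementation with inverted files is not obvious.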

Page 83: Fast Image Search

Inverted Files – Scoring – Fast

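$$\|q-d\|_2^2 = \sum_{i \mid d_i = 0} |q_i|^2 + \sum_{i \mid q_i = 0} |d_i|^2 + \sum_{i \mid q_i \neq 0,\, d_i \neq 0} |q_i - d_i|^2$$

$$= \|q\|_2^2 + \|d\|_2^2 + \sum_{i \mid q_i \neq 0,\, d_i \neq 0} \left( |q_i - d_i|^2 - |q_i|^2 - |d_i|^2 \right)$$

Vectors are normalized (‖q‖ = ‖d‖ = 1), so:

$$\|q-d\|_2^2 = 2 + \sum_{i \mid q_i \neq 0,\, d_i \neq 0} \left( |q_i - d_i|^2 - |q_i|^2 - |d_i|^2 \right)$$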

• The score is a function of common words only.
• Scoring is now straightforward with inverted files
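A sketch that checks this derivation numerically, assuming idf-weighted vectors (n_i ln(N/N_i)) that are L2-normalized and stored sparsely as `{word: weight}` dictionaries; the function names and toy counts are ours. For the squared L2 distance each common-word term simplifies further, since |q_i − d_i|² − |q_i|² − |d_i|² = −2 q_i d_i, so the score equals 2 − 2 Σ q_i d_i over common words. The sketch keeps the general form and compares it with a brute-force sum over all words.

```python
import math

def weighted_vector(counts, doc_freq, n_docs):
    """idf weights n_i * ln(N / N_i), L2-normalized; sparse {word: weight}."""
    v = {w: n * math.log(n_docs / doc_freq[w]) for w, n in counts.items()}
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {w: x / norm for w, x in v.items()} if norm > 0 else v

def score_common_words(q, d):
    """Inverted-file form: 2 plus a correction for each word present in both."""
    s = 2.0
    for w in q.keys() & d.keys():        # only words the two images share
        s += (q[w] - d[w]) ** 2 - q[w] ** 2 - d[w] ** 2
    return s

def score_brute_force(q, d):
    """Reference: ||q - d||^2 summed over every word of either image."""
    return sum((q.get(w, 0.0) - d.get(w, 0.0)) ** 2 for w in q.keys() | d.keys())

doc_freq = {0: 1, 1: 3, 2: 2, 3: 4}   # N_i: database images containing word i (toy)
N = 4                                 # database size (toy)
q = weighted_vector({0: 2, 1: 1, 2: 1}, doc_freq, N)
d = weighted_vector({1: 2, 2: 3, 3: 5}, doc_freq, N)
print(math.isclose(score_common_words(q, d), score_brute_force(q, d)))   # True
```

In a real inverted file, the loop over `q.keys() & d.keys()` becomes a walk over the posting lists of the query's words, accumulating the correction for every database image encountered.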

Page 84: Fast Image Search

Inverted Files – Scoring – Implications

• This part can be done fast
• Allows the database to scale without linear growth in search time, because (with a big vocabulary) non-matching images rarely share words with the query

• The score is in [0, 2], with 0 the best match and 2 for images with no common words

$$\|q-d\|_2^2 = 2 + \sum_{i \mid q_i \neq 0,\, d_i \neq 0} \left( |q_i - d_i|^2 - |q_i|^2 - |d_i|^2 \right)$$

Page 85: Fast Image Search

• 1400 image database (small)

• Inverted files
• Detector – MSER (“Maximally Stable Extremal Regions”)

• Descriptor – SIFT
• Hierarchical k-means clustering

Experiment – Effect of Vocabulary Size

David Nister & Henrik Stewenius 2006

Page 86: Fast Image Search

Experiment – Effect of Vocabulary Size

Best parameters:
• 1 million words, 6-level tree
• L1 normalization scheme
• 90% success on first hit

Non-hierarchical:
• 10,000 words, one level
• Much slower (linear)
• Only 86% correct on first hit

Good performance requires a large vocabulary!

David Nister & Henrik Stewenius 2006

Page 87: Fast Image Search

Experiment – Scalability

• Performance scales well with database size

• Losing <5% hit rate while expanding the database 100-fold (the green curve is the more realistic scenario)

David Nister & Henrik Stewenius 2006

Page 88: Fast Image Search

Demonstration – Robustness
• A CD cover is photographed using a digital camera.
• Severe occlusions, specularities.
• Viewpoint, rotation and scale are different.

David Nister & Henrik Stewenius 2006

• The CD cover is identified in real time from a database of 50,000 images.

Page 89: Fast Image Search

• A big vocabulary can be used effectively and quickly for high-performance image identification.

• Using inverted files, the database can be scaled relatively easily.

• The ‘Bag of Words’ model allows identification under noisy conditions.


David Nister & Henrik Stewenius 2006

Page 90: Fast Image Search

Bag of Words – Drawbacks

• Positional information is not taken into account.

• These two images have the same word frequencies. Is this the best we can do?

• In real images, the spatial relationships between features are important.

Page 91: Fast Image Search

Probabilistic approach – TSI-pLSA
• Translation and Scale Invariant pLSA
• We’ll try to guess a window surrounding our object.
• We’ll use the position, as well as the frequency, of a word to identify the object.
• We’ll do all of this in an unsupervised manner.

First, what is pLSA?

R. Fergus et al. 2005

Page 92: Fast Image Search

Probabilistic approach – pLSA
• A probabilistic, unsupervised approach to object classification.
• Probabilistic – instead of identifying a single topic in the image, we view an image as a collection of topics in different proportions, e.g., 50% motorbike, 50% house.

• Unsupervised – topics are not defined in advance; rather, they are learned from the database.

R. Fergus et al. 2005

Page 93: Fast Image Search

Probabilistic approach – Image is made of Topics

Figure: D = 7 images, each described by its visual words W


Page 94: Fast Image Search

Probabilistic approach – Image is made of Topics

Figure: the same D = 7 images, described by their visual words W, explained by Z = 2 topics



Page 95: Fast Image Search

Probabilistic approach – The model – pLSA

• The distribution of words in documents is linked through a single latent variable z that represents the topic.

• The assumption is that the distribution of words does not depend on the specific document directly; it is a function of the topics in the document.

$$P(w,d) = P(w \mid d)\,P(d)$$

$$P(w,d) = \sum_z P(w \mid z)\,P(z \mid d)\,P(d)$$

Page 96: Fast Image Search

Probabilistic approach – Topics and Words

• Each topic z is characterized by its own frequencies of visual words, P(w|z)

• Each image d contains a mixture of topics, P(z|d)

$$P(w,d) = \sum_z P(w \mid z)\,P(z \mid d)\,P(d)$$

Page 97: Fast Image Search

Probabilistic approach – Learning

• In learning, we compute the best values for P(z|d) and P(w|z).

• EM is used to maximize the likelihood of the model over the data.

Page 98: Fast Image Search

Probabilistic approach – Topics
• Note that learning is unsupervised.
• Topics can be thought of as groups of features that tend to appear together in an image.
• Only the number of topics is provided in advance.

Results of 8-topic learning with images of



Page 99: Fast Image Search

Probabilistic approach – Topics – Example
• The parts of a motorbike tend to appear together in an image and will therefore most likely be grouped under one topic.

• When images are sorted by their most prominent topic, Topic 7 contains the most images of motorbikes.

• We’ll name Topic 7 ‘Motorbikes’: this is our classifier.


Page 100: Fast Image Search

Probabilistic approach – Recognition

• In recognition, P(w|z) is locked after the learning phase.

• Using EM, P(z|d) is estimated for the new image.

$$P(w,d) = \sum_z P(w \mid z)\,P(z \mid d)\,P(d)$$
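A minimal NumPy sketch of both phases on a word-by-image count matrix n(w, d): EM learning of P(w|z) and P(z|d), and recognition, where P(w|z) stays locked and only P(z|d) is re-estimated for a new image (often called "folding in"). Random initialization, the fixed iteration count and the function name are our assumptions. P(d) is not part of EM here: its maximum-likelihood value is simply proportional to the number of words in d.

```python
import numpy as np

def plsa(counts, n_topics, iters=100, rng=None, p_w_z=None):
    """EM for pLSA on a word-by-document count matrix n(w, d) of shape (W, D).

    Returns P(w|z) with shape (W, Z) and P(z|d) with shape (Z, D).
    If p_w_z is given, it is kept fixed and only P(z|d) is re-estimated
    (the recognition step for new images).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_words, n_docs = counts.shape
    learn_topics = p_w_z is None
    if learn_topics:
        p_w_z = rng.random((n_words, n_topics))
        p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((n_topics, n_docs))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    tiny = 1e-300                                   # guards against 0/0
    for _ in range(iters):
        # E-step: P(z|w,d) is proportional to P(w|z) P(z|d); shape (W, Z, D).
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), tiny)
        expected = counts[:, None, :] * post        # n(w,d) P(z|w,d)
        # M-step: re-normalize the expected counts.
        if learn_topics:
            p_w_z = expected.sum(axis=2)
            p_w_z /= np.maximum(p_w_z.sum(axis=0, keepdims=True), tiny)
        p_z_d = expected.sum(axis=0)
        p_z_d /= np.maximum(p_z_d.sum(axis=0, keepdims=True), tiny)
    return p_w_z, p_z_d

counts = np.array([[4, 5, 0, 0],
                   [4, 5, 0, 1],
                   [0, 0, 6, 3],
                   [0, 1, 6, 3]], dtype=float)          # W=4 words x D=4 images (toy)
p_w_z, p_z_d = plsa(counts, n_topics=2)                # learning
new_image = np.array([[3.0], [1.0], [0.0], [0.0]])     # word counts of a query image
_, p_z_new = plsa(new_image, 2, iters=50, p_w_z=p_w_z) # recognition: P(w|z) locked
```

EM only guarantees that the likelihood never decreases, so different initializations can yield different (locally optimal) topics.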

Page 101: Fast Image Search

Probabilistic approach – Recognition – Example

• In recognition, we identify the most likely distribution of topics.

• If topic 4 were our classifier for faces, we would classify this image as ‘Face’.

Figure: P(w|z), topics 1-5

Page 102: Fast Image Search

• The number of topics is chosen in advance as a parameter.

• It need not match the actual number of object classes in the data.

• We still need to pick the topic that best describes our object class.

• One option is to pick the best topic by hand.

Probabilistic approach – Choosing Topics

Page 103: Fast Image Search

Probabilistic approach – Validation set
• Alternatively, we can use a validation set (a few high-quality images) and automatically pick the single topic that best classifies this validation set.

• In the future, a combination of topics could be used to give superior classification.
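A sketch of automatic topic selection, assuming P(z|d) has already been estimated for every validation image with P(w|z) locked. Choosing the topic with the highest average P(z|d) over the validation images is one plausible criterion, used here purely for illustration; the slides do not state the exact rule.

```python
import numpy as np

def pick_topic(p_z_val):
    """p_z_val: (Z, V) array whose column v is P(z | validation image v).
    Returns the topic with the highest average weight on the validation set."""
    return int(np.argmax(p_z_val.mean(axis=1)))

p_z_val = np.array([[0.1, 0.2, 0.0],
                    [0.7, 0.6, 0.9],
                    [0.2, 0.2, 0.1]])   # toy values: 3 topics x 3 validation images
print(pick_topic(p_z_val))              # 1
```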

Page 104: Fast Image Search

pLSA - Summary

• Word frequency is a function of the topics in the image.

• Topics are learned and recognized using EM

• Positional information is still not taken into account.

Page 105: Fast Image Search

Probabilistic approach – TSI-pLSA

• We’ll guess a window surrounding our object.
• We’ll use the position of a word, relative to the object, in the model.

Page 106: Fast Image Search

• c describes the boundaries of an object.

• P(c) is calculated by fitting a mixture of Gaussians to the features.

• The center of the object is given by the mean of a Gaussian.

• The scale is given by its variance.

• Features are weighted by P(w|z) for a given topic.

• This is repeated with k = 1, 2, …, K Gaussians, giving all the candidate bounding boxes.

Image of two planes

Fitting a ‘Mixture of Gaussians’ with k = 2 to the features, weighted by their color, would give the two centers of the planes.

Probabilistic approach – Object Boundaries
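A sketch of the weighted mixture-of-Gaussians fit, assuming isotropic Gaussians, a farthest-point initialization and a fixed number of EM iterations (all our own choices). Each feature location enters the EM updates multiplied by its weight, e.g. P(w|z) of its word for the chosen topic; each fitted mean is then a candidate object center and its standard deviation a candidate scale.

```python
import numpy as np

def weighted_gmm(xy, weights, k, iters=50):
    """EM for a mixture of k isotropic 2-D Gaussians with per-point weights.

    xy: (n, 2) feature locations; weights: (n,) per-feature weights.
    Returns means (k, 2), standard deviations (k,), mixing proportions (k,).
    """
    xy = np.asarray(xy, dtype=float)
    wts = np.asarray(weights, dtype=float)
    # Farthest-point initialization (our choice): heaviest point first.
    means = [xy[np.argmax(wts)]]
    while len(means) < k:
        d2 = np.min([((xy - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(xy[np.argmax(d2)])
    means = np.array(means)
    var = np.full(k, xy.var(axis=0).mean() + 1e-6)
    mix = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for stability.
        d2 = ((xy[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(mix) - np.log(2 * np.pi * var) - d2 / (2 * var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: every point counts with its weight.
        rw = resp * wts[:, None]
        mass = rw.sum(axis=0) + 1e-12
        means = (rw.T @ xy) / mass[:, None]
        d2 = ((xy[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (rw * d2).sum(axis=0) / (2 * mass) + 1e-6   # 2 = dimensions
        mix = mass / mass.sum()
    return means, np.sqrt(var), mix

rng = np.random.default_rng(0)
xy = np.vstack([rng.normal([10, 10], 2, size=(200, 2)),    # features on object A
                rng.normal([50, 50], 2, size=(200, 2))])   # features on object B
means, sds, mix = weighted_gmm(xy, np.ones(len(xy)), k=2)
```

Down-weighting the features of one object (small P(w|z)) pulls the fitted centers toward the other one, which is how the topic steers the proposed bounding boxes.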

Page 107: Fast Image Search

Probabilistic approach – Word Position

• x describes the position of a word in relation to the object.

• Locations are quantized to 36 internal positions and one background position.
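A sketch of the position quantization, assuming the 36 internal positions form a 6×6 grid over the candidate bounding box (as the parameter slide later states), with index 36 for the background. The box format (x0, y0, x1, y1) and the function name are illustrative assumptions.

```python
def quantize_position(x, y, box, grid=6):
    """Return a position index: 0 .. grid*grid - 1 for the cells of a grid x grid
    partition of `box`, or grid*grid (the background bin) outside the box.

    box = (x0, y0, x1, y1) with x0 < x1 and y0 < y1 (hypothetical format).
    """
    x0, y0, x1, y1 = box
    if not (x0 <= x < x1 and y0 <= y < y1):
        return grid * grid                                   # background position
    col = min(int((x - x0) / (x1 - x0) * grid), grid - 1)
    row = min(int((y - y0) / (y1 - y0) * grid), grid - 1)
    return row * grid + col

box = (100, 50, 220, 170)                # toy candidate object window
print(quantize_position(100, 50, box))   # 0  (top-left cell)
print(quantize_position(219, 169, box))  # 35 (bottom-right cell)
print(quantize_position(10, 10, box))    # 36 (background)
```

Because the grid is attached to the box rather than to the image, the same object part falls in the same cell wherever the object appears and whatever its size.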

Page 108: Fast Image Search

Probabilistic approach – Word Position

• Word positions (x) are used in the model

$$P(w,x,d) = \sum_z P(w,x \mid z)\,P(z \mid d)\,P(d)$$

Page 109: Fast Image Search

Probabilistic approach – Robustness

• Hopefully, estimating object centroids will allow TSI-pLSA to be scale invariant while preserving its translation invariance.

Page 110: Fast Image Search

Probabilistic approach – Results

• Z = 2 (Number of topics)

• K = 2 (maximum number of Gaussians)

• Airplanes

R. Fergus et al. 2005

Page 111: Fast Image Search

Probabilistic approach – Centroid selection
• Red is the first topic. It was found to correspond to airplanes.

• Green is the second topic.

• Bounding boxes are suggested centroids.

• Solid rectangles are the centroids with the highest likelihood.

Page 112: Fast Image Search

• We need to specify the number of topics; there is no theory on the optimal number

• The model has many parameters, for example:
– 350 visual words
– 37 discrete positions (6x6 grid + background position)
– 8 topics (irrespective of how we divide the dataset)
– 350 × 37 × 8 = 103,600 parameters to learn

• We need to provide many data points:
– 500 images
– 700 features in each image
– Only 350,000 data points, i.e., only ~3 data points per parameter

Probabilistic approach – Parameters

Page 113: Fast Image Search

Probabilistic approach – Summary

• Number of topics needs to be specified in advance.

• Images do not have to be annotated individually.

• The group of images should contain only a few object classes; otherwise the topics would be meaningless.

Page 114: Fast Image Search


• So, we could use a database of airplanes and faces without needing to know which image is which.

• How does that help us?
• TSI-pLSA can be used to improve Google image search.

Page 115: Fast Image Search

Google image search

• Returns many thousands of images.

• Currently, many are low quality (bad viewpoint, small objects, junk)

Page 116: Fast Image Search

• This is for the entire set of images returned for each keyword

• Green: good images

• Intermediate: some relevance (like a cartoon)

• Bad: junk images, unrelated

Google image search – Quality images

Page 117: Fast Image Search

Google image search – Validation set

• We need a set of images to serve as the ground truth for identifying the best topics.

• We could choose some 20 images by hand.

• Empirically, the top 5 images returned by Google search are usually of good quality.

• We could use Google Translate to translate the keyword and obtain the top 5 images in every language.

• For example, 30 images of airplanes in 6 languages

Page 118: Fast Image Search

Google image search + Probabilistic Approach – Results (TSI-pLSA)

• 7 keywords (datasets):

• ~600 images/keyword
• Validation set: ~30 images/keyword
• Z = 8 (number of topics)
• A single topic was chosen as the classifier from the 8 topics for each keyword, using the validation set.
• Descriptor – SIFT
• Vocabulary – 350 words.

R. Fergus et al. 2005

Page 119: Fast Image Search

• Tested on a set annotated by hand.

• Each labeled image had to be classified according to the pre-determined classifier for each category.

• Notable problems: half of the (A)irplanes were classified as (C)ars Rear.

• Both (F)aces and (W)ristwatches were classified as (G)uitars.

Results – cont.

Page 120: Fast Image Search

Google image search – Improving
• TSI-pLSA was used to reorder the images returned by Google according to the chosen topic.
• The graph shows how many of the top 15% of hits are relevant.

• Note that since this method is completely unsupervised, it could be used immediately to improve the results of Google image search, although at a high computational cost.
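A sketch of the reordering step and of one reading of the evaluation, assuming P(z|d) of the chosen topic is available for every returned image and a hand-annotated set of relevant images exists. We interpret "how many of the top 15% of hits are relevant" as the fraction of relevant images among the top 15% of the new ranking; the function name and toy scores are illustrative.

```python
def rerank(p_topic, relevant, top_fraction_pct=15):
    """Reorder images by P(z_chosen | image), best first.

    p_topic:  {image_id: P(z_chosen | image)} for the chosen topic.
    relevant: set of image ids judged relevant (e.g. by hand annotation).
    Returns the new order and the fraction of relevant images among the
    top `top_fraction_pct` percent of that order.
    """
    order = sorted(p_topic, key=p_topic.get, reverse=True)
    n_top = max(1, (top_fraction_pct * len(order) + 99) // 100)   # integer ceil
    top = order[:n_top]
    precision = sum(i in relevant for i in top) / n_top
    return order, precision

scores = [0.05, 0.9, 0.2, 0.85, 0.1, 0.7, 0.3, 0.15, 0.6, 0.4,
          0.02, 0.8, 0.25, 0.33, 0.5, 0.12, 0.07, 0.45, 0.95, 0.01]
p_topic = dict(enumerate(scores))         # toy P(z|d) for 20 returned images
order, precision = rerank(p_topic, relevant={1, 3, 5, 8, 11, 18})
print(order[:3], precision)               # [18, 1, 3] 1.0
```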

Page 121: Fast Image Search

Google image search – Automatic Classification

• Give a keyword.
• Use Google image search to find somewhat related images.
• Use TSI-pLSA to classify the images by topics.
• Use a validation set to choose the best topic.
• Classify any image in the world.

Page 122: Fast Image Search

Summary
• The probabilistic method uses the results returned from an image search engine, despite their being extremely noisy, to automatically construct a classifier for objects.

• This classifier uses the frequency and position of visual words to identify the most likely topic of the object in the image.

• This topic can be further used to improve the relevance of such an image search.

• Positional data is not always helpful (results not shown): in some cases, using only the word frequencies gives better performance.

Page 123: Fast Image Search


• R Fergus, L Fei-Fei, P Perona, A Zisserman. Learning Object Categories from Google’s Image Search. ICCV 2005.

• D Nister, H Stewenius. Scalable Recognition with a Vocabulary Tree. CVPR 2006.

• J Sivic, A Zisserman. Video Google: a text retrieval approach to object matching in videos. Computer Vision, 2003.

• S Obdrzalek, J Matas. Sub-linear indexing for large scale object recognition. Proc. British Machine Vision Conference, 2005.

• L Fei-Fei, R Fergus, A Torralba. Recognizing and Learning Object Categories. ICCV 2005 short courses. http://people.csail.mit.edu/torralba/iccv2005/