Andrew Zisserman Talk - Part 1b

Page 1: Andrew Zisserman Talk - Part 1b

Demo

Page 2: Andrew Zisserman Talk - Part 1b

Example

Page 3: Andrew Zisserman Talk - Part 1b
Page 4: Andrew Zisserman Talk - Part 1b
Page 5: Andrew Zisserman Talk - Part 1b

Spatial re-ranking

• improves precision
• but not recall …

Page 6: Andrew Zisserman Talk - Part 1b

[Precision-recall curves for the query images]

• high precision at low recall (like Google)
• variation in performance across queries
• none retrieve all instances

Page 7: Andrew Zisserman Talk - Part 1b

Why aren’t all objects retrieved?

[Pipeline: query image → set of SIFT descriptors (Hessian-Affine regions + SIFT descriptors) → clustered and quantized to visual words → sparse frequency vector]

Obtaining visual words is like a sensor measuring the image.

“Noise” in the measurement process means that some visual words are missing or incorrect, e.g. due to:

• Missed detections
• Changes beyond built-in invariance
• Quantization effects

Consequence: a visual word in the query is missing in the target image.

Two remedies:

1. Query expansion
2. Better quantization
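As a concrete illustration of the quantization step in this pipeline (my own sketch, not code from the talk; the vocabulary size, descriptor dimension and variable names are assumptions), hard assignment of descriptors to visual words and the resulting frequency vector might look like this:

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Hard-assign each descriptor to its nearest visual word (Euclidean distance)."""
    # ||d - c||^2 = ||d||^2 - 2 d.c + ||c||^2, minimized over cluster centres c
    d2 = ((descriptors ** 2).sum(1)[:, None]
          - 2 * descriptors @ vocabulary.T
          + (vocabulary ** 2).sum(1)[None, :])
    return d2.argmin(axis=1)

def frequency_vector(word_ids, vocab_size):
    """Histogram of visual word occurrences (dense here for brevity)."""
    return np.bincount(word_ids, minlength=vocab_size)

# Toy data: 128-D SIFT-like descriptors and a small random "vocabulary"
rng = np.random.default_rng(0)
vocab = rng.normal(size=(1000, 128))     # cluster centres = visual words
desc = rng.normal(size=(500, 128))       # descriptors from one image

words = quantize(desc, vocab)
hist = frequency_vector(words, vocab.shape[0])
print(hist.nonzero()[0][:10])            # a few of the visual words present
```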

Page 8: Andrew Zisserman Talk - Part 1b

Query expansion

Page 9: Andrew Zisserman Talk - Part 1b

Query Expansion

In text:
• Reissue top N results as queries
• Pseudo/blind relevance feedback
• Danger of topic drift – this is a big problem for text

Page 10: Andrew Zisserman Talk - Part 1b

Query Expansion: Text

Original query: Hubble Telescope Achievements

Query expansion: select the top 20 terms from the top 20 documents according to tf-idf.

Added terms: telescope, hubble, space, nasa, ultraviolet, shuttle, mirror, telescopes, earth, discovery, orbit, flaw, scientists, launch, stars, universe, mirrors, light, optical, species

Example from: Jimmy Lin, University of Maryland
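A minimal sketch of this style of text query expansion (an illustration, not Lin's code; the example documents and parameter values are assumptions): compute tf-idf over the top-ranked documents and keep the highest-weighted terms.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def expansion_terms(top_docs, n_terms=20):
    """Return the highest tf-idf terms aggregated over the top-ranked documents."""
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(top_docs)                  # (documents, terms)
    scores = np.asarray(X.sum(axis=0)).ravel()       # aggregate weight per term
    terms = vec.get_feature_names_out()
    return [terms[i] for i in scores.argsort()[::-1][:n_terms]]

# Hypothetical top documents for the query "Hubble Telescope Achievements"
docs = [
    "The Hubble space telescope was carried into orbit by the shuttle Discovery.",
    "NASA scientists corrected the flaw in the telescope's mirror after launch.",
]
print(expansion_terms(docs, n_terms=10))
```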

Page 11: Andrew Zisserman Talk - Part 1b

Query Expansion

In text:
• Reissue top N results as queries
• Pseudo/blind relevance feedback
• Danger of topic drift – this is a big problem for text

In vision:
• Reissue spatially verified image regions as queries
• Spatial verification acts like an oracle of truth

Page 12: Andrew Zisserman Talk - Part 1b

Visual query expansion - overview

1. Original query
2. Initial retrieval set
3. Spatial verification (acts as the “oracle”)
4. New enhanced query
5. Additional retrieved images
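To make the flow concrete, here is a rough sketch of one expansion round, assuming hypothetical helpers `retrieve` and `spatially_verify` that stand in for the inverted-file search and the RANSAC-style verification step (none of this is the authors' implementation):

```python
import numpy as np

def query_expansion(query_hist, index, top_n=50, min_inliers=20):
    """One round of visual query expansion (illustrative only).

    query_hist : bag-of-visual-words histogram of the query region
    index      : hypothetical object exposing
                 retrieve(hist, top_n) -> list of image ids, and
                 spatially_verify(hist, image_id) -> (num_inliers, region_hist)
    """
    verified = [query_hist]

    # 2. Initial retrieval set
    for image_id in index.retrieve(query_hist, top_n):
        # 3. Spatial verification acts as the "oracle"
        inliers, region_hist = index.spatially_verify(query_hist, image_id)
        if inliers >= min_inliers:
            verified.append(region_hist)

    # 4. New enhanced query: average the verified histograms
    enhanced = np.mean(verified, axis=0)

    # 5. Additional retrieved images
    return index.retrieve(enhanced, top_n)
```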

Page 13: Andrew Zisserman Talk - Part 1b

Query Image Originally retrieved image Originally not retrieved

What Query Expansion Adds

Page 14: Andrew Zisserman Talk - Part 1b

What Query Expansion Adds

Page 15: Andrew Zisserman Talk - Part 1b

What Query Expansion Adds

Page 16: Andrew Zisserman Talk - Part 1b

What Query Expansion Adds

Page 17: Andrew Zisserman Talk - Part 1b

Visual query expansion - overview

1. Original query
2. Initial retrieval set
3. Spatial verification (acts as the “oracle”)
4. New enhanced query
5. Additional retrieved images

Page 18: Andrew Zisserman Talk - Part 1b

Bag of visual words particular object retrieval

[Pipeline: query image → set of SIFT descriptors (Hessian-Affine regions + SIFT descriptors) → quantization against centroids (visual words) → sparse frequency vector with tf-idf weighting → querying the inverted file → ranked image short-list → geometric verification [Lowe 04, Chum & al 2007] → query expansion [Chum & al 2007]]
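A toy sketch of the inverted-file scoring at the heart of this pipeline (an illustration only; real systems normalize the vectors and use far larger vocabularies): each visual word maps to the images containing it, and candidate images are scored by accumulating tf-idf weights.

```python
from collections import defaultdict
import math

class InvertedFile:
    """Toy inverted index over bag-of-visual-words image representations."""

    def __init__(self):
        self.postings = defaultdict(dict)   # visual word -> {image_id: term frequency}
        self.num_images = 0

    def add(self, image_id, word_counts):
        """Index one image given its {visual word: count} histogram."""
        self.num_images += 1
        for word, tf in word_counts.items():
            self.postings[word][image_id] = tf

    def query(self, word_counts, top_k=10):
        """Score images by an (unnormalized) tf-idf dot product with the query."""
        scores = defaultdict(float)
        for word, q_tf in word_counts.items():
            docs = self.postings.get(word, {})
            if not docs:
                continue
            idf = math.log(self.num_images / len(docs))
            for image_id, tf in docs.items():
                scores[image_id] += (q_tf * idf) * (tf * idf)
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# Usage: index three toy images, then query with a small word histogram
index = InvertedFile()
index.add("img_a", {3: 2, 17: 1})
index.add("img_b", {17: 3, 42: 1})
index.add("img_c", {3: 1, 99: 4})
print(index.query({17: 1, 42: 2}))
```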

Page 19: Andrew Zisserman Talk - Part 1b

Demo

Page 20: Andrew Zisserman Talk - Part 1b

Query image Originally retrieved Retrieved only after expansion

Query Expansion

Page 21: Andrew Zisserman Talk - Part 1b

Query image

[Precision-recall curves: original results (good) vs. expanded results (better)]

Page 22: Andrew Zisserman Talk - Part 1b

Better Quantization

Page 23: Andrew Zisserman Talk - Part 1b

Problems arising from quantization

• Typically, quantization has a significant impact on the final performance of the system [Sivic 03, Nister 06, Philbin 07]

• Quantization errors split features that should be grouped together and confuse features that should be separated

[Figure: descriptor space partitioned into Voronoi cells]

Page 24: Andrew Zisserman Talk - Part 1b

And more …

i. Points 1, 2 and 3 are matched equally

ii. Points 3 and 4 are close, but never matched

Page 25: Andrew Zisserman Talk - Part 1b

Overcoming quantization errors

• Soft-assign each descriptor to multiple cluster centers
• Assignment weight according to a Gaussian on the distance
• Normalize weights to sum to one

[Philbin et al. CVPR 2008, Van Gemert et al. ECCV 2008]

Hard assignment: B: 1.0
Soft assignment: A: 0.1, B: 0.5, C: 0.4

Learning a vocabulary to overcome quantization errors [Mikulik et al. ECCV 2010, Philbin et al. ECCV 2010]
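As a sketch of the soft-assignment step above (illustrative parameter values; not the published implementation), assigning a descriptor to its r nearest visual words with Gaussian weights could look like this:

```python
import numpy as np

def soft_assign(descriptor, vocabulary, r=3, sigma=100.0):
    """Assign a descriptor to its r nearest visual words with Gaussian weights.

    Returns (word_ids, weights), with the weights normalized to sum to one.
    """
    d2 = ((vocabulary - descriptor) ** 2).sum(axis=1)   # squared distances to centres
    nearest = np.argsort(d2)[:r]
    weights = np.exp(-d2[nearest] / (2.0 * sigma ** 2))
    weights /= weights.sum()
    return nearest, weights

# Toy example: a 128-D descriptor against a small random vocabulary
rng = np.random.default_rng(0)
vocab = rng.normal(size=(1000, 128))
desc = rng.normal(size=128)
print(soft_assign(desc, vocab))
```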

Page 26: Andrew Zisserman Talk - Part 1b

Several other solutions are possible …

e.g. Hamming embedding [Jegou & Schmid ECCV 2008]
• Standard quantization using bag-of-visual-words
• Additional localization in the Voronoi cell by a binary signature

• More on methods of soft assignment tomorrow
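A very rough sketch of the Hamming-embedding idea (heavily simplified; the random projection, signature length and thresholds are my assumptions, not the published method's learned settings): descriptors quantized to the same visual word are projected to a short binary signature, and two features are accepted as a match only if their signatures are close in Hamming distance.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(64, 128))          # fixed random projection: 128-D SIFT -> 64 bits

def cell_thresholds(cell_descriptors):
    """Per-dimension medians of projected descriptors falling in one Voronoi cell."""
    return np.median(cell_descriptors @ P.T, axis=0)

def signature(descriptor, thresholds):
    """64-bit binary signature of a descriptor within its cell."""
    return (descriptor @ P.T) > thresholds

def matches(sig_query, sig_db, max_hamming=24):
    """Accept a match only if the two signatures are close in Hamming distance."""
    return np.count_nonzero(sig_query != sig_db) <= max_hamming

# Toy usage: learn thresholds from descriptors assigned to one visual word
cell = rng.normal(size=(200, 128))
t = cell_thresholds(cell)
print(matches(signature(cell[0], t), signature(cell[1], t)))
```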

Page 27: Andrew Zisserman Talk - Part 1b

Soft Assignment: Implementation

Bag of words: score a match between two features by the scalar product of their weight vectors

Spatial re-ranking: also score the number of inliers using this measure
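For instance (a minimal illustration with made-up weights), two features soft-assigned over the same vocabulary could be scored by the scalar product of their weight vectors like this:

```python
import numpy as np

def match_score(words_a, weights_a, words_b, weights_b, vocab_size=1000):
    """Scalar product of two sparse soft-assignment weight vectors."""
    va = np.zeros(vocab_size)
    vb = np.zeros(vocab_size)
    va[words_a] = weights_a
    vb[words_b] = weights_b
    return float(va @ vb)

# Feature 1 is soft-assigned to words {3, 7, 12}; feature 2 to words {7, 12, 40}
print(match_score([3, 7, 12], [0.5, 0.3, 0.2],
                  [7, 12, 40], [0.6, 0.3, 0.1]))   # 0.3*0.6 + 0.2*0.3 = 0.24
```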

Page 28: Andrew Zisserman Talk - Part 1b

Soft Assignment: Results
Benefit 1: Helping Query Expansion

Query

Hard Assignment

Only one good initial result – QE doesn't significantly improve results

Page 29: Andrew Zisserman Talk - Part 1b

Soft Assignment: Results
Benefit 1: Helping Query Expansion

Query

Soft Assignment

4 good results – allows query expansion to return these results in addition to the ones above

Page 30: Andrew Zisserman Talk - Part 1b

Soft Assignment: Results
Benefit 2: Better spatial localization

Hard Assignment

Soft Assignment

Page 31: Andrew Zisserman Talk - Part 1b

Results: Baseline to State of the Art (Mean Average Precision)

1. Baseline Method (K = 10K)    0.389
2. Large Vocabulary (K = 1M)    0.618
3. Spatial Re-ranking           0.653
4. Soft Assignment (SA)         0.731
5. Query Expansion (QE)         0.801
6. SA & QE                      0.825

Disadvantages of soft assignment?

Page 32: Andrew Zisserman Talk - Part 1b

Outline

1. Object recognition cast as nearest neighbour matching

2. Object recognition cast as text retrieval

3. Large scale search and improving performance

4. Applications

• accessing expert knowledge, data mining, inpainting, location search, large scale reconstruction, mobile apps, …

5. The future and challenges

Page 33: Andrew Zisserman Talk - Part 1b

Application

Accessing expert knowledge:

• Use an image query to access an annotated dataset

• Search with a query image → retrieve the annotation

Page 34: Andrew Zisserman Talk - Part 1b

Visual Access to Classical Art Archives

Currently: 111 thousand Greek vase images

Page 35: Andrew Zisserman Talk - Part 1b

http://explore.clarosnet.org/XDB/ASP/clarosHome/

Page 36: Andrew Zisserman Talk - Part 1b
Page 37: Andrew Zisserman Talk - Part 1b
Page 38: Andrew Zisserman Talk - Part 1b

Application:

Object Mining in Large Datasets

Page 39: Andrew Zisserman Talk - Part 1b

Objective …

Automatically find and group images of the same object/scene

Page 40: Andrew Zisserman Talk - Part 1b

Motivation

Applications:

• Dataset summarization

• Efficient retrieval

• Efficient pre-processing for automatic 3-D reconstruction (e.g. PhotoSynth)

Page 41: Andrew Zisserman Talk - Part 1b

Matching Graph

Build a ‘matching graph’ over all the images in the dataset

Each image is a node and a link represents two images having some object in common

Given this graph structure, apply various clustering algorithms to group the data

Page 42: Andrew Zisserman Talk - Part 1b

Finding Commonly Occurring Objects

Simple idea: strong spatial constraints give a 'link' between two images

Edge strength = # inliers

Page 43: Andrew Zisserman Talk - Part 1b

Finding Commonly Occurring Objects

Use these links to build up a graph over all images in the dataset

Nodes = images, edges = spatially verified matches

Page 44: Andrew Zisserman Talk - Part 1b

Building the Matching Graph

• Use each image to query the dataset

• Each query gives a list of results scored by a measure of the spatial consistency to the query

• Threshold this consistency measure to determine the links in the matching graph (see the sketch below)
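A compact sketch of this construction (illustrative only; the `search` callable and the inlier threshold are assumptions standing in for the spatially verified retrieval described above):

```python
def build_matching_graph(image_ids, search, min_inliers=20):
    """Query the dataset with every image and keep spatially verified links.

    search(image_id) is assumed to return a list of (other_id, num_inliers).
    Returns an undirected edge set {(id_a, id_b)} with id_a < id_b.
    """
    edges = set()
    for qid in image_ids:
        for rid, inliers in search(qid):
            if rid != qid and inliers >= min_inliers:
                edges.add((min(qid, rid), max(qid, rid)))
    return edges
```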

Page 45: Andrew Zisserman Talk - Part 1b

Connected Components

In a collection of images of multiple disjoint objects, we expect the matching graph to also be disjoint

A simple first step is to take connected components of the matching graph and examine the clusters returned
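For completeness, a small union-find sketch (my own illustration, not from the talk) that extracts connected components from an edge set like the one built above:

```python
from collections import defaultdict

def connected_components(image_ids, edges):
    """Group images into clusters = connected components of the matching graph."""
    parent = {i: i for i in image_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for i in image_ids:
        clusters[find(i)].append(i)
    return sorted(clusters.values(), key=len, reverse=True)
```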

Page 46: Andrew Zisserman Talk - Part 1b

Connected Components

Example: five connected components from the Oxford dataset

56 images 71 images 26 images 25 images 56 images

Page 47: Andrew Zisserman Talk - Part 1b

Connected Components

A problem with connected components is that 'connecting images' can sometimes join two disjoint objects

[Figure: linking images]

This can be overcome by a divide-and-merge strategy

Page 48: Andrew Zisserman Talk - Part 1b

Datasets

Statue of Liberty dataset (37,034 images)
• Crawled from Flickr by querying for 'statue of liberty'
• Lots of images of the Statue of Liberty, but also of New York and other sites

Rome dataset (1,021,986 images) [1]
• Again, crawled from Flickr
• Contains too much stuff to mention

[1] Photo tourism: Exploring photo collections in 3D, Noah Snavely, Steven M. Seitz, Richard Szeliski

Page 49: Andrew Zisserman Talk - Part 1b

Results: Statue of Liberty

Largest cluster – 8461 images of the Statue of Liberty

Page 50: Andrew Zisserman Talk - Part 1b

2nd largest – 276 aerial views of New York

Results: Statue of Liberty

Page 51: Andrew Zisserman Talk - Part 1b

3rd largest – 80 American flags

Results: Statue of Liberty

Page 52: Andrew Zisserman Talk - Part 1b

Results: Statue of Liberty

Smaller clusters

Lego Statue of Liberty 59 images

Staten Island 52 images

Page 53: Andrew Zisserman Talk - Part 1b

Results: Rome

18676 images

15818 images

9632 images

4869 images

Page 54: Andrew Zisserman Talk - Part 1b

Timings

21,339 high resolution images from Flickr tagged with 'statue of liberty'

Querying with every image in the database to build the graph takes ~2 hours

Finding connected components (very quick) using a threshold of 20 spatially verified inliers gives 11 clusters with more than 20 images

Page 55: Andrew Zisserman Talk - Part 1b

As an aside … Better matching with fewer features [Turcot & Lowe, ICCV Workshop 2009]

• Build the matching graph
• Augment image bag-of-words histograms using their neighbours
• Like query expansion, but done in advance on the 'server side'
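In that spirit, a minimal sketch (an illustration of the idea, not the paper's implementation) of augmenting each image's bag-of-words histogram with its matching-graph neighbours' histograms:

```python
import numpy as np

def augment_histograms(histograms, edges):
    """Add each image's matching-graph neighbours' histograms to its own.

    histograms : dict image_id -> visual word histogram (np.ndarray)
    edges      : iterable of (id_a, id_b) spatially verified links
    """
    neighbours = {i: [] for i in histograms}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    augmented = {}
    for i, hist in histograms.items():
        total = hist.copy()
        for j in neighbours[i]:
            total = total + histograms[j]
        augmented[i] = total
    return augmented
```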

Page 56: Andrew Zisserman Talk - Part 1b

Application: Internet-based inpainting

Photo-editing using images of the same place [Whyte, Sivic and Zisserman, 2009], but see also [Hays and Efros, 2007].

Page 57: Andrew Zisserman Talk - Part 1b

Application: place recognition (retrieval in a database structured on a map)

[Knopp, Sivic, Pajdla, ECCV 2010] http://www.di.ens.fr/willow/research/confusers/

[System diagram: image database (Panoramio, Flickr, …) with query expansion and confuser suppression using only negative training data (from geotags) → optimized image database; query → image indexing with spatial verification → best match]

Page 58: Andrew Zisserman Talk - Part 1b


Correctly recognized examples

Page 59: Andrew Zisserman Talk - Part 1b

More correctly recognized examples

Page 60: Andrew Zisserman Talk - Part 1b

Application: Matching and 3D reconstruction in large unstructured datasets

Building Rome in a Day, Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz and Richard Szeliski, International Conference on Computer Vision, 2009
http://grail.cs.washington.edu/rome/

See also [Havlena, Torii, Knopp and Pajdla, CVPR 2009].
Figure: N. Snavely

Page 61: Andrew Zisserman Talk - Part 1b

The Old City of Dubrovnik

Example of the final 3D point cloud and cameras
57,845 downloaded images, 11,868 registered images. This video: 4,619 images.

Page 62: Andrew Zisserman Talk - Part 1b

Application: Mobile visual search apps

Bing visual scan, and others … Snaptell.com, Moodstocks.com

Page 63: Andrew Zisserman Talk - Part 1b

Example

Slide credit: I. Laptev

Page 64: Andrew Zisserman Talk - Part 1b
Page 65: Andrew Zisserman Talk - Part 1b

Papers and Demos

Sivic, J. and Zisserman, A.
Video Google: A Text Retrieval Approach to Object Matching in Videos
Proceedings of the International Conference on Computer Vision (2003)
http://www.robots.ox.ac.uk/~vgg/publications/papers/sivic03.pdf
Demo: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/

Chum, O., Philbin, J., Isard, M., Sivic, J. and Zisserman, A.
Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval
Proceedings of the International Conference on Computer Vision (2007)
http://www.robots.ox.ac.uk/~vgg/publications/papers/chum07b.pdf
Demo: http://www.robots.ox.ac.uk/~vgg/research/oxbuildings/

Philbin, J. and Zisserman, A.
Object Mining using a Matching Graph on Very Large Image Collections
Proc. of the Indian Conference on Vision, Graphics and Image Processing (2008)
http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin08b.pdf