I KNOW WHAT YOU DID LAST SUMMER: OBJECT-LEVEL AUTO-ANNOTATION OF HOLIDAY SNAPS Stephan Gammeter, Lukas Bossard, Till Quack, Luc Van Gool


Outline

Introduction

Automatic object mining

Scalable object cluster retrieval

Object knowledge from the wisdom of crowds

Object-level auto-annotation

Experiments and Results

Conclusions


Introduction

Most photo organization tools allow tagging (labeling) with keywords

Tagging is a tedious process

Automated annotation


Auto-annotation steps

First step : Build a database by large-scale crawling of community photo collections

Second step : Recognize objects using the database


Step detail

The crawling stage :

Create a large database of object models; each object is represented as a cluster of images (object cluster)

Tell us what each cluster contains (labels, GPS location, related content)

The retrieval stage :

Consists of a large-scale retrieval system based on local image features

Optimize this stage


Step detail (2)

The annotation stage :

Estimates the position of the object within the image (bounding box)

Annotates it with text, location, and related content from the database


Resulting method differs

Not a general annotation of images with words

The annotation happens at the object level and includes textual labels, related web sites, and GPS location

The annotation of a query image happens within seconds

Example : a general annotation would be "Building"; an object-level annotation is "Taipei 101"


Automatic object mining

A geospatial grid is overlaid on the earth; Flickr is queried for each grid cell to retrieve geo-tagged photos (GPS location)
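
The grid overlay can be sketched as follows. This is a minimal illustration, assuming square cells measured in degrees of longitude and latitude. The helper name `grid_cells`, the cell size, and the example region are hypothetical and not taken from the slides. The actual Flickr query is left out.

```python
import math
from typing import Iterator, Tuple

# (min_lon, min_lat, max_lon, max_lat) in degrees
BBox = Tuple[float, float, float, float]


def grid_cells(region: BBox, cell_deg: float) -> Iterator[BBox]:
    """Yield the cells of a regular lon/lat grid covering `region`."""
    min_lon, min_lat, max_lon, max_lat = region
    n_cols = max(1, math.ceil((max_lon - min_lon) / cell_deg))
    n_rows = max(1, math.ceil((max_lat - min_lat) / cell_deg))
    for row in range(n_rows):
        south = min_lat + row * cell_deg
        north = min(south + cell_deg, max_lat)  # clip the last row
        for col in range(n_cols):
            west = min_lon + col * cell_deg
            east = min(west + cell_deg, max_lon)  # clip the last column
            yield (west, south, east, north)


# Hypothetical usage: each cell would become one geo-bounded photo query.
cells = list(grid_cells((8.0, 47.0, 9.0, 48.0), cell_deg=0.5))
```

Each yielded cell is a bounding box that a crawler could pass to a geo-bounded photo search, one request per cell.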


Scalable object cluster retrieval

Visual vocabulary technique : Created by clustering the descriptor vectors of local visual features such as SIFT or SURF

Candidate images are ranked using TF*IDF

RANSAC is used to estimate a homography between each candidate and the query image

A candidate is retained only when the number of inliers exceeds a given threshold
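
Two of the steps above can be sketched in a few lines: mapping descriptors to visual words, and the inlier-count filter. This is a rough illustration, assuming the vocabulary is already given as a list of cluster centres and that the RANSAC homography estimation has been run elsewhere, so only its inlier counts are used here. All function names and the threshold value are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, vocabulary: List[Vector]) -> int:
    """Return the index of the vocabulary centre closest to the descriptor."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda k: sq_dist(descriptor, vocabulary[k]))


def bag_of_words(descriptors: List[Vector], vocabulary: List[Vector]) -> Counter:
    """Quantize all descriptors of one image into visual-word counts."""
    return Counter(nearest_word(d, vocabulary) for d in descriptors)


def filter_by_inliers(inlier_counts: Dict[str, int], min_inliers: int) -> List[str]:
    """Keep only candidates whose inlier count exceeds the threshold."""
    return [image for image, n in inlier_counts.items() if n > min_inliers]
```

A real system would use an approximate nearest-neighbour search instead of the linear scan in `nearest_word`, since the vocabulary is large.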


TF*IDF

D : candidate document (candidate image), containing a set of visual words

v : visual word (local feature)

df(v) : document frequency of visual word v

Note : we want to know which object is present in the query image, so we return a ranked list of object clusters instead of images
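
The ranking idea can be sketched as below. This is an illustration only: it uses one common TF*IDF variant (normalized term frequency times log(N / df(v))), which is an assumption and not necessarily the weighting used on the slide. The names `idf_table`, `tfidf_score`, `rank_clusters`, and the mapping `cluster_of` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_table(docs: Dict[str, Counter]) -> Dict[int, float]:
    """idf(v) = log(N / df(v)), with df(v) the number of images containing v."""
    df = Counter()
    for words in docs.values():
        df.update(set(words))
    return {v: math.log(len(docs) / count) for v, count in df.items()}


def tfidf_score(query: Counter, doc: Counter, idf: Dict[int, float]) -> float:
    """Sum of query count * normalized document tf * idf over shared words."""
    total = sum(doc.values())
    return sum(query[v] * (doc[v] / total) * idf.get(v, 0.0)
               for v in query if v in doc)


def rank_clusters(query: Counter, docs: Dict[str, Counter],
                  cluster_of: Dict[str, str]) -> List[str]:
    """Rank images by score, then report each parent cluster once, best first."""
    idf = idf_table(docs)
    scores = {name: tfidf_score(query, words, idf) for name, words in docs.items()}
    ranked = sorted((n for n in scores if scores[n] > 0),
                    key=scores.get, reverse=True)
    clusters: List[str] = []
    for name in ranked:
        if cluster_of[name] not in clusters:
            clusters.append(cluster_of[name])
    return clusters
```

Returning clusters rather than images matches the note above: the caller learns which object is present, not which single photo matched best.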


Object knowledge from the wisdom of crowds

Database : Not organized by individual images but by object clusters

We can use this partly redundant information to :

Obtain a better understanding of the object appearance

Segment objects

Create more compact inverted indices


Object-specific feature confidence score

Using the feature matches from pair-wise image matching, we can derive a confidence score for each feature

Only features which match many of their counterparts in other images will receive a high score

Since many of the photos are taken from varying viewpoints around the object, the background will receive fewer matches
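
The intuition above can be sketched with a simplified score: the fraction of the other cluster images in which a feature has an inlying match. This is a stand-in for illustration, not the confidence formula from the slides. The function names, the input layout, and the threshold are all assumptions.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def feature_confidence(matches: Dict[str, Set[int]], n_images: int) -> Dict[int, float]:
    """matches maps each other image j to the feature ids of image i that are
    inliers in the pair (i, j). Score = share of other images matched."""
    counts = Counter()
    for inliers in matches.values():
        counts.update(inliers)
    others = max(1, n_images - 1)
    return {f: c / others for f, c in counts.items()}


def bounding_box(positions: Dict[int, Point], confidence: Dict[int, float],
                 threshold: float) -> Optional[Box]:
    """Box around all features whose confidence reaches the threshold."""
    points = [positions[f] for f, c in confidence.items() if c >= threshold]
    if not points:
        return None
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))
```

Background features that match in only a few images fall below the threshold and so do not stretch the box.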


Object-specific feature confidence score

f : feature, i : image

: set of inlying feature matches for image pair ij

: number of images in the current object cluster o

, : parameters, set to 1 and 1/3

Note : The bounding box is drawn around all features with confidence higher than


Better indices through object-specific feature sampling

Estimated bounding boxes help to compact our inverted index of visual words

Object clusters whose photos were all taken by a single user are removed
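
Both reductions can be sketched as a filter applied while building the inverted index. This is an illustration under assumptions: features are indexed only when they lie inside the estimated bounding box of their image, and a cluster is skipped when it has fewer than two distinct users. The record layout and the name `build_index` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def inside(point: Tuple[float, float], box: Box) -> bool:
    """True when the point lies within the box (borders included)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def build_index(images: List[dict], cluster_users: Dict[str, Set[str]]) -> Dict[int, Set[str]]:
    """Map visual word -> image ids, keeping only bounding-box features of
    images whose cluster was photographed by more than one user."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for image in images:
        if len(cluster_users[image["cluster"]]) < 2:
            continue  # single-user cluster: dropped entirely
        for word, position in image["features"]:
            if inside(position, image["bbox"]):
                index[word].add(image["id"])
    return dict(index)
```

Fewer postings per visual word mean a smaller index and less work per query.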


Last step of retrieval stage

Select the best object cluster as the final result

Simple voting : retrieved images vote for their parent clusters

Normalizing by cluster size is not feasible

Only the votes of the 5 images per cluster with the highest retrieval scores are counted
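
The capped voting can be sketched as follows. One assumption is made loudly here: each vote is weighted by the retrieval score of the image, which the slides do not state. The name `best_cluster` and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def best_cluster(retrieved: List[Tuple[str, float]], cluster_of: Dict[str, str],
                 votes_per_cluster: int = 5) -> Optional[str]:
    """Pick the cluster with the largest sum of its top-k retrieval scores."""
    scores_by_cluster: Dict[str, List[float]] = defaultdict(list)
    for image_id, score in retrieved:
        scores_by_cluster[cluster_of[image_id]].append(score)
    if not scores_by_cluster:
        return None
    # Assumption: votes are score-weighted; only the k best images count.
    votes = {cluster: sum(sorted(scores, reverse=True)[:votes_per_cluster])
             for cluster, scores in scores_by_cluster.items()}
    return max(votes, key=votes.get)
```

The cap keeps very large clusters from winning on image count alone, which is the role normalization by cluster size would otherwise play.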


Object-level auto-annotation

Consists of two steps : bounding box estimation and labelling

Bounding box estimation

Estimated in the same way as for database images

The query image is matched to a number of images in the cluster returned at the top

Labelling

Simply copy the information from the object cluster to serve as labels for the query image


Experiments

Conducted on a large dataset collected from Flickr

Collected a challenging test set of 674 images from Picasa Web Albums

Estimated bounding boxes cover on average 52% of each image


Efficiency and Precision of Recognition

Compared configurations :

baseline : TF*IDF ranking on a 500K visual vocabulary, as used in other work

bounding box features + no single-user clusters

all features + no single-user clusters

66% random feature subset + no single-user clusters

66% random feature subset


67%


Annotation precision

We evaluate how well our system localizes bounding boxes by measuring the intersection-over-union (IoU) overlap between the ground-truth and hypothesis boxes

76.1%
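
For reference, the IoU measure for two axis-aligned boxes can be computed as in this short sketch. The box layout (x_min, y_min, x_max, y_max) and the function name `iou` are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

The value is 1 for identical boxes, 0 for boxes that do not overlap, and falls in between otherwise.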


Results


Conclusions

Presented a full auto-annotation pipeline for holiday snaps

Object-level annotation with bounding box, relevant tags, Wikipedia articles and GPS location


Thanks!!!!