I KNOW WHAT YOU DID LAST SUMMER: OBJECT-LEVEL AUTO-ANNOTATION OF HOLIDAY SNAPS
Stephan Gammeter, Lukas Bossard, Till Quack, Luc Van Gool
Outline
Introduction
Automatic object mining
Scalable object cluster retrieval
Object knowledge from the wisdom of crowds
Object-level auto-annotation
Experiments and Results
Conclusions
Introduction
Most photo organization tools allow tagging (labeling) photos with keywords
Tagging is a tedious process
Goal: automated annotation
Auto-annotation steps
First step: build a database by large-scale crawling of community photo collections
Second step: recognize query images against this database
Steps in detail
The crawling stage: creates a large database of object models; each object is represented as a cluster of images (an object cluster)
Meta-data tells us what each cluster contains (labels, GPS location, related content)
The retrieval stage: a large-scale retrieval system based on local image features; the proposed optimizations focus on this stage
Steps in detail (2)
The annotation stage: estimates the position of the object within the image (bounding box) and annotates it with text, location, and related content from the database
How the resulting method differs
Not a general annotation of the whole image with words
The annotation happens at the object level and includes textual labels, related websites, and GPS location
The annotation of a query image happens within seconds
[Example slide: a holiday snap auto-annotated with the labels "Building" and "Taipei 101"]
Automatic object mining
A geospatial grid is overlaid over the earth, and Flickr is queried for geo-tagged photos (by GPS location) in each grid tile
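A minimal sketch of this crawling step, assuming a Flickr API key and an illustrative tile size; the grid resolution and query parameters below are placeholders, not the settings used in the paper.

```python
# Sketch of the crawling stage: overlay a geospatial grid on the earth and
# query Flickr for geo-tagged photos in each tile.
# FLICKR_API_KEY and the 0.05-degree tile size are illustrative assumptions.
import requests

FLICKR_API_KEY = "your-api-key"
API_URL = "https://api.flickr.com/services/rest/"

def grid_tiles(lat_step=0.05, lon_step=0.05):
    """Yield (min_lon, min_lat, max_lon, max_lat) tiles covering the earth."""
    lat = -90.0
    while lat < 90.0:
        lon = -180.0
        while lon < 180.0:
            yield (lon, lat, lon + lon_step, lat + lat_step)
            lon += lon_step
        lat += lat_step

def photos_in_tile(bbox, per_page=250):
    """Query Flickr for geo-tagged photos inside one grid tile."""
    params = {
        "method": "flickr.photos.search",
        "api_key": FLICKR_API_KEY,
        "bbox": ",".join(str(c) for c in bbox),
        "has_geo": 1,
        "extras": "geo,tags,owner_name",
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    return resp.json()["photos"]["photo"]
```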
Scalable object cluster retrieval
Visual vocabulary technique: created by clustering the descriptor vectors of local visual features such as SIFT or SURF
Candidate images are ranked using TF*IDF
RANSAC is used to estimate a homography between each candidate and the query image; a candidate is retained only when the number of inliers exceeds a given threshold (a sketch of this verification step follows below)
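A sketch of the spatial verification step using OpenCV; SIFT, the ratio test, and the inlier threshold value are stand-ins, not necessarily the paper's exact choices.

```python
# Sketch of spatial verification: match local features between query and
# candidate, estimate a homography with RANSAC, and keep the candidate only
# if the inlier count exceeds a threshold. Images are grayscale numpy arrays.
import cv2
import numpy as np

MIN_INLIERS = 15  # illustrative threshold

def spatially_verified(query_img, cand_img):
    sift = cv2.SIFT_create()
    kq, dq = sift.detectAndCompute(query_img, None)
    kc, dc = sift.detectAndCompute(cand_img, None)
    if dq is None or dc is None:
        return False, 0

    # Ratio-test matching of descriptors.
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(dq, dc, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    if len(good) < 4:  # at least 4 correspondences needed for a homography
        return False, 0

    src = np.float32([kq[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kc[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    inliers = int(mask.sum()) if mask is not None else 0
    return inliers >= MIN_INLIERS, inliers
```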
TF*IDF
D: candidate document (candidate image), containing a set of visual words
v: a visual word (local feature)
df(v): document frequency of visual word v
Note: we want to know which object is present in the query image, so we return a ranked list of object clusters instead of images (a scoring sketch follows below)
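The scoring equation itself did not survive extraction; the sketch below uses the standard tf-idf weighting (idf = log(N / df(v))) over bag-of-visual-words histograms, which may differ in detail from the paper's exact variant.

```python
# Sketch of TF*IDF ranking over bag-of-visual-words image representations.
# Standard tf-idf weighting; images are represented as dicts {visual_word: count}.
import math
from collections import defaultdict

def idf_weights(database):
    """database: dict image_id -> {visual_word: count}."""
    N = len(database)
    df = defaultdict(int)
    for words in database.values():
        for v in words:
            df[v] += 1
    return {v: math.log(N / df_v) for v, df_v in df.items()}

def tfidf_score(query_words, cand_words, idf):
    """Score one candidate document D against the query."""
    n_q = sum(query_words.values()) or 1
    n_d = sum(cand_words.values()) or 1
    score = 0.0
    for v, q_count in query_words.items():
        if v in cand_words:
            score += (q_count / n_q) * (cand_words[v] / n_d) * idf.get(v, 0.0) ** 2
    return score

def rank_candidates(query_words, database):
    idf = idf_weights(database)
    scores = {i: tfidf_score(query_words, w, idf) for i, w in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```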
Object knowledge from the wisdom of crowds
The database is not organized by individual images but by object clusters
We can use the partly redundant information within a cluster to:
obtain a better understanding of the object's appearance
segment objects
create more compact inverted indices
Object-specific feature confidence score
From the pair-wise feature matches within a cluster we can derive a score for each feature
Only features that match many of their counterparts in the other images receive a high score
Since many of the photos are taken from varying viewpoints around the object, background features receive fewer matches
Object-specific feature confidence score
f: a feature; i: an image; the score is computed from the sets of inlying feature matches between image i and the other images in the cluster
|o|: number of images in the current object cluster o
The two parameters are set to 1 and 1/3
Note: the bounding box is drawn around all features with confidence higher than the threshold (a simplified sketch follows below)
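The exact formula was lost in extraction; the sketch below is a simplified stand-in that scores each feature by the fraction of other cluster images to which it has an inlying match, then draws the box around features above a threshold. The threshold value and data structures are assumptions.

```python
# Simplified sketch of the object-specific feature confidence score:
# a feature scores high when it has inlying matches to many other images
# in the object cluster; background features match rarely.
# Not the paper's exact formula; the threshold below is illustrative.
import numpy as np

def feature_confidences(n_features, inlier_matches, n_cluster_images):
    """
    n_features:       number of local features in image i
    inlier_matches:   dict image_j -> set of feature indices of image i
                      that have an inlying match to image j
    n_cluster_images: number of images in the object cluster o
    """
    conf = np.zeros(n_features)
    for matched_features in inlier_matches.values():
        for f in matched_features:
            conf[f] += 1.0
    return conf / max(n_cluster_images - 1, 1)  # fraction of other images matched

def bounding_box(keypoints, conf, threshold=1.0 / 3.0):
    """Box around all features with confidence above the threshold."""
    pts = np.array([kp for kp, c in zip(keypoints, conf) if c > threshold])
    if len(pts) == 0:
        return None
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return (x0, y0, x1, y1)
```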
Better indices through object-specific feature sampling
Estimated bounding boxes help to compact our inverted index of visual words
Object clusters contributed by only a single user are removed (a sketch of both ideas follows below)
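A sketch of both ideas over hypothetical data structures; the image fields ('owner', 'bbox', 'features') are assumptions, not the paper's implementation.

```python
# Sketch of index compaction: (1) index only features inside the estimated
# bounding box, (2) drop object clusters whose photos all come from one user.
from collections import defaultdict

def in_box(pt, box):
    x0, y0, x1, y1 = box
    return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1

def build_inverted_index(clusters):
    """
    clusters: dict cluster_id -> list of images, each image a dict with
              'id', 'owner', 'bbox' and 'features' = [(visual_word, (x, y)), ...]
    Returns visual_word -> list of (cluster_id, image_id) postings.
    """
    index = defaultdict(list)
    for cid, images in clusters.items():
        owners = {img["owner"] for img in images}
        if len(owners) < 2:                  # single-user cluster: skip entirely
            continue
        for img in images:
            for word, pt in img["features"]:
                if in_box(pt, img["bbox"]):  # keep only bounding-box features
                    index[word].append((cid, img["id"]))
    return index
```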
Last step of the retrieval stage
Select the best object cluster as the final result: retrieved images cast simple votes for their parent clusters
Normalizing by cluster size is not feasible
Only the votes of the 5 images per cluster with the highest retrieval scores are counted (see the sketch below)
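A sketch of this voting scheme; whether votes are weighted by retrieval score and how ties are broken are assumptions, not details given on the slide.

```python
# Sketch of cluster selection: retrieved images vote for their parent
# clusters, but only the 5 highest-scoring images per cluster may vote.
from collections import defaultdict

def select_best_cluster(retrieved, max_votes_per_cluster=5):
    """
    retrieved: list of (image_id, cluster_id, retrieval_score).
    Returns the cluster_id with the most votes (ties broken arbitrarily).
    """
    votes = defaultdict(int)
    # Consider images in order of decreasing retrieval score.
    for image_id, cluster_id, score in sorted(retrieved, key=lambda r: r[2], reverse=True):
        if votes[cluster_id] < max_votes_per_cluster:
            votes[cluster_id] += 1  # one vote per image, capped per cluster
    return max(votes, key=votes.get) if votes else None
```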
Object-level auto-annotation
Consists of two steps: bounding box estimation and labelling
Bounding box estimation: the query image is matched to a number of images in the top-ranked cluster, and the box is estimated in the same way as for database images
Labelling: the information from the object cluster is simply copied to serve as labels for the query image (a sketch follows below)
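A sketch of the annotation step over hypothetical data structures; the meta-data field names ('labels', 'gps', 'urls') are assumptions.

```python
# Sketch of query annotation: estimate the query's bounding box from its
# inlying matches to the top-ranked cluster, then copy that cluster's
# meta-data as the annotation.
import numpy as np

def annotate_query(query_keypoints, inlier_feature_ids, cluster_metadata):
    """
    query_keypoints:    list of (x, y) positions of the query's local features
    inlier_feature_ids: indices of query features with inlying matches to
                        images of the top-ranked object cluster
    cluster_metadata:   dict with e.g. 'labels', 'gps', 'urls' for that cluster
    """
    pts = np.array([query_keypoints[i] for i in inlier_feature_ids])
    bbox = None
    if len(pts) > 0:
        x0, y0 = pts.min(axis=0)
        x1, y1 = pts.max(axis=0)
        bbox = (float(x0), float(y0), float(x1), float(y1))
    return {
        "bbox": bbox,
        "labels": cluster_metadata.get("labels", []),
        "gps": cluster_metadata.get("gps"),
        "related": cluster_metadata.get("urls", []),
    }
```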
Experiments
Experiments were conducted on a large dataset collected from Flickr
A challenging test set of 674 images was collected from Picasa Web Albums
Estimated bounding boxes cover on average 52% of each image
Efficiency and Precision of Recognition
[Plot comparing efficiency and precision of recognition for five configurations: baseline TF*IDF ranking on a 500K visual vocabulary as used in other work; bounding-box features + no single-user clusters; all features + no single-user clusters; 66% random feature subset + no single-user clusters; 66% random feature subset. The value 67% is marked in the plot.]
Annotation precision
Evaluate how well our system localizes bounding boxes by measuring the intersection-over-union (IoU) overlap between the ground-truth and hypothesized boxes
76.1%
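A short sketch of the IoU measure used for this evaluation, on axis-aligned boxes.

```python
# Intersection-over-union of two axis-aligned bounding boxes:
# area of overlap divided by area of union.
def iou(box_a, box_b):
    """Boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```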
Results
Conclusions
Presented a full auto-annotation pipeline for holiday snaps
Object-level annotation with bounding boxes, relevant tags, Wikipedia articles, and GPS location
Thanks!!!!