I KNOW WHAT YOU DID LAST SUMMER: OBJECT-LEVEL AUTO-ANNOTATION OF HOLIDAY SNAPS
Stephan Gammeter, Lukas Bossard, Till Quack, Luc Van Gool
Outline
Introduction
Automatic object mining
Scalable object cluster retrieval
Object knowledge from the wisdom of crowds
Object-level auto-annotation
Experiments and Results
Conclusions
Introduction
Most photo organization tools allow tagging (labeling) photos with keywords
Tagging is a tedious process
Goal: automated annotation
Auto-annotation steps
First step: build a database by large-scale crawling of community photo collections
Second step: recognize query images against this database
Steps in detail
The crawling stage: creates a large database of object models; each object is represented as a cluster of images (an object cluster)
Meta-data tells us what each cluster contains (labels, GPS location, related content)
The retrieval stage: a large-scale retrieval system based on local image features; the proposed optimizations focus on this stage
Steps in detail (2)
The annotation stage: estimates the position of the object within the image (bounding box) and annotates it with text, location, and related content from the database
How the resulting method differs
Not a general annotation of the whole image with words
The annotation happens at the object level and includes textual labels, related websites, and GPS location
The annotation of a query image happens within seconds
[Example slide: a holiday snap auto-annotated with the labels "Building" and "Taipei 101"]
Automatic object mining
A geospatial grid is overlaid over the earth, and Flickr is queried for geo-tagged photos (by GPS location) in each grid tile
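A minimal sketch of this crawling step, assuming a Flickr API key and an illustrative tile size; the grid resolution and query parameters below are placeholders, not the settings used in the paper.

```python
# Sketch of the crawling stage: overlay a geospatial grid on the earth and
# query Flickr for geo-tagged photos in each tile.
# FLICKR_API_KEY and the 0.05-degree tile size are illustrative assumptions.
import requests

FLICKR_API_KEY = "your-api-key"
API_URL = "https://api.flickr.com/services/rest/"

def grid_tiles(lat_step=0.05, lon_step=0.05):
    """Yield (min_lon, min_lat, max_lon, max_lat) tiles covering the earth."""
    lat = -90.0
    while lat < 90.0:
        lon = -180.0
        while lon < 180.0:
            yield (lon, lat, lon + lon_step, lat + lat_step)
            lon += lon_step
        lat += lat_step

def photos_in_tile(bbox, per_page=250):
    """Query Flickr for geo-tagged photos inside one grid tile."""
    params = {
        "method": "flickr.photos.search",
        "api_key": FLICKR_API_KEY,
        "bbox": ",".join(str(c) for c in bbox),
        "has_geo": 1,
        "extras": "geo,tags,owner_name",
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    return resp.json()["photos"]["photo"]
```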
Scalable object cluster retrieval
Visual vocabulary technique: created by clustering the descriptor vectors of local visual features such as SIFT or SURF
Candidate images are ranked using TF*IDF
RANSAC is used to estimate a homography between each candidate and the query image; a candidate is retained only when the number of inliers exceeds a given threshold (a sketch of this verification step follows below)
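A sketch of the spatial verification step using OpenCV; SIFT, the ratio test, and the inlier threshold value are stand-ins, not necessarily the paper's exact choices.

```python
# Sketch of spatial verification: match local features between query and
# candidate, estimate a homography with RANSAC, and keep the candidate only
# if the inlier count exceeds a threshold. Images are grayscale numpy arrays.
import cv2
import numpy as np

MIN_INLIERS = 15  # illustrative threshold

def spatially_verified(query_img, cand_img):
    sift = cv2.SIFT_create()
    kq, dq = sift.detectAndCompute(query_img, None)
    kc, dc = sift.detectAndCompute(cand_img, None)
    if dq is None or dc is None:
        return False, 0

    # Ratio-test matching of descriptors.
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(dq, dc, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    if len(good) < 4:  # at least 4 correspondences needed for a homography
        return False, 0

    src = np.float32([kq[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kc[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    inliers = int(mask.sum()) if mask is not None else 0
    return inliers >= MIN_INLIERS, inliers
```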
TF*IDF
D: candidate document (candidate image), containing a set of visual words
v: a visual word (local feature)
df(v): document frequency of visual word v
Note: we want to know which object is present in the query image, so we return a ranked list of object clusters instead of images (a scoring sketch follows below)
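The scoring equation itself did not survive extraction; the sketch below uses the standard tf-idf weighting (idf = log(N / df(v))) over bag-of-visual-words histograms, which may differ in detail from the paper's exact variant.

```python
# Sketch of TF*IDF ranking over bag-of-visual-words image representations.
# Standard tf-idf weighting; images are represented as dicts {visual_word: count}.
import math
from collections import defaultdict

def idf_weights(database):
    """database: dict image_id -> {visual_word: count}."""
    N = len(database)
    df = defaultdict(int)
    for words in database.values():
        for v in words:
            df[v] += 1
    return {v: math.log(N / df_v) for v, df_v in df.items()}

def tfidf_score(query_words, cand_words, idf):
    """Score one candidate document D against the query."""
    n_q = sum(query_words.values()) or 1
    n_d = sum(cand_words.values()) or 1
    score = 0.0
    for v, q_count in query_words.items():
        if v in cand_words:
            score += (q_count / n_q) * (cand_words[v] / n_d) * idf.get(v, 0.0) ** 2
    return score

def rank_candidates(query_words, database):
    idf = idf_weights(database)
    scores = {i: tfidf_score(query_words, w, idf) for i, w in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```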
Object knowledge from the wisdom of crowds
The database is not organized by individual images but by object clusters
We can use the partly redundant information within a cluster to:
obtain a better understanding of the object's appearance
segment objects
create more compact inverted indices
Object-specific feature confidence score
From the pair-wise feature matches within a cluster we can derive a score for each feature
Only features that match many of their counterparts in the other images receive a high score
Since many of the photos are taken from varying viewpoints around the object, background features receive fewer matches
Object-specific feature confidence score
f: a feature; i: an image; the score is computed from the sets of inlying feature matches between image i and the other images in the cluster
|o|: number of images in the current object cluster o
The two parameters are set to 1 and 1/3
Note: the bounding box is drawn around all features with confidence higher than the threshold (a simplified sketch follows below)
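The exact formula was lost in extraction; the sketch below is a simplified stand-in that scores each feature by the fraction of other cluster images to which it has an inlying match, then draws the box around features above a threshold. The threshold value and data structures are assumptions.

```python
# Simplified sketch of the object-specific feature confidence score:
# a feature scores high when it has inlying matches to many other images
# in the object cluster; background features match rarely.
# Not the paper's exact formula; the threshold below is illustrative.
import numpy as np

def feature_confidences(n_features, inlier_matches, n_cluster_images):
    """
    n_features:       number of local features in image i
    inlier_matches:   dict image_j -> set of feature indices of image i
                      that have an inlying match to image j
    n_cluster_images: number of images in the object cluster o
    """
    conf = np.zeros(n_features)
    for matched_features in inlier_matches.values():
        for f in matched_features:
            conf[f] += 1.0
    return conf / max(n_cluster_images - 1, 1)  # fraction of other images matched

def bounding_box(keypoints, conf, threshold=1.0 / 3.0):
    """Box around all features with confidence above the threshold."""
    pts = np.array([kp for kp, c in zip(keypoints, conf) if c > threshold])
    if len(pts) == 0:
        return None
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return (x0, y0, x1, y1)
```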
Better indices through object-specific feature sampling
Estimated bounding boxes help to compact our inverted index of visual words
Object clusters contributed by only a single user are removed (a sketch of both ideas follows below)
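A sketch of both ideas over hypothetical data structures; the image fields ('owner', 'bbox', 'features') are assumptions, not the paper's implementation.

```python
# Sketch of index compaction: (1) index only features inside the estimated
# bounding box, (2) drop object clusters whose photos all come from one user.
from collections import defaultdict

def in_box(pt, box):
    x0, y0, x1, y1 = box
    return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1

def build_inverted_index(clusters):
    """
    clusters: dict cluster_id -> list of images, each image a dict with
              'id', 'owner', 'bbox' and 'features' = [(visual_word, (x, y)), ...]
    Returns visual_word -> list of (cluster_id, image_id) postings.
    """
    index = defaultdict(list)
    for cid, images in clusters.items():
        owners = {img["owner"] for img in images}
        if len(owners) < 2:                  # single-user cluster: skip entirely
            continue
        for img in images:
            for word, pt in img["features"]:
                if in_box(pt, img["bbox"]):  # keep only bounding-box features
                    index[word].append((cid, img["id"]))
    return index
```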
Last step of the retrieval stage
Select the best object cluster as the final result: retrieved images cast simple votes for their parent clusters
Normalizing by cluster size is not feasible
Only the votes of the 5 images per cluster with the highest retrieval scores are counted (see the sketch below)
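A sketch of this voting scheme; whether votes are weighted by retrieval score and how ties are broken are assumptions, not details given on the slide.

```python
# Sketch of cluster selection: retrieved images vote for their parent
# clusters, but only the 5 highest-scoring images per cluster may vote.
from collections import defaultdict

def select_best_cluster(retrieved, max_votes_per_cluster=5):
    """
    retrieved: list of (image_id, cluster_id, retrieval_score).
    Returns the cluster_id with the most votes (ties broken arbitrarily).
    """
    votes = defaultdict(int)
    # Consider images in order of decreasing retrieval score.
    for image_id, cluster_id, score in sorted(retrieved, key=lambda r: r[2], reverse=True):
        if votes[cluster_id] < max_votes_per_cluster:
            votes[cluster_id] += 1  # one vote per image, capped per cluster
    return max(votes, key=votes.get) if votes else None
```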
Object-level auto-annotation
Consists of two steps: bounding box estimation and labelling
Bounding box estimation: the query image is matched to a number of images in the top-ranked cluster, and the box is estimated in the same way as for database images
Labelling: the information from the object cluster is simply copied to serve as labels for the query image (a sketch follows below)
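A sketch of the annotation step over hypothetical data structures; the meta-data field names ('labels', 'gps', 'urls') are assumptions.

```python
# Sketch of query annotation: estimate the query's bounding box from its
# inlying matches to the top-ranked cluster, then copy that cluster's
# meta-data as the annotation.
import numpy as np

def annotate_query(query_keypoints, inlier_feature_ids, cluster_metadata):
    """
    query_keypoints:    list of (x, y) positions of the query's local features
    inlier_feature_ids: indices of query features with inlying matches to
                        images of the top-ranked object cluster
    cluster_metadata:   dict with e.g. 'labels', 'gps', 'urls' for that cluster
    """
    pts = np.array([query_keypoints[i] for i in inlier_feature_ids])
    bbox = None
    if len(pts) > 0:
        x0, y0 = pts.min(axis=0)
        x1, y1 = pts.max(axis=0)
        bbox = (float(x0), float(y0), float(x1), float(y1))
    return {
        "bbox": bbox,
        "labels": cluster_metadata.get("labels", []),
        "gps": cluster_metadata.get("gps"),
        "related": cluster_metadata.get("urls", []),
    }
```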
Experiments
Experiments were conducted on a large dataset collected from Flickr
A challenging test set of 674 images was collected from Picasa Web Albums
Estimated bounding boxes cover on average 52% of each image
Efficiency and Precision of Recognition
[Plot comparing efficiency and precision of recognition for five configurations: baseline TF*IDF ranking on a 500K visual vocabulary as used in other work; bounding-box features + no single-user clusters; all features + no single-user clusters; 66% random feature subset + no single-user clusters; 66% random feature subset. The value 67% is marked in the plot.]
Annotation precision
Evaluate how well our system localizes bounding boxes by measuring the intersection-over-union (IoU) overlap between the ground-truth and hypothesized boxes
76.1%
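A short sketch of the IoU measure used for this evaluation, on axis-aligned boxes.

```python
# Intersection-over-union of two axis-aligned bounding boxes:
# area of overlap divided by area of union.
def iou(box_a, box_b):
    """Boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```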
Results
Conclusions
Presented a full auto-annotation pipeline for holiday snaps
Object-level annotation with bounding boxes, relevant tags, Wikipedia articles, and GPS location
Thanks!!!!