BEYOND SLIDING WINDOW: Object Localization by Efficient Subwindow Search
Christoph H. Lampert, Matthew B. Blaschko, and Thomas Hofmann
MOTIVATIONS
To localize the object without exhaustive search
  Observation: often, only a small portion of the image contains the object of interest
To find a global optimum in a huge search space
Object detection and retrieval
CONTRIBUTIONS
Efficient (n^2 vs. n^4)
  an n x n image contains on the order of n^4 rectangles:
  n x n possible centers, n possible choices for the width, and n for the height
Optimal
Versatile
  arbitrary objects vs. simple parametric objects in line drawings [4]
  flexible in the choice of the cost function vs. L2 distance [13]
Challenge
  to find optimal and tight bounds
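As a small worked check of the n^4 figure above, the sketch below counts the axis-aligned rectangles in an n x n pixel grid exactly. It assumes a rectangle is defined by its first and last row and its first and last column; the function names are illustrative and not taken from the slides.

```python
def count_rectangles(n: int) -> int:
    """Exact number of axis-aligned rectangles in an n x n pixel grid."""
    # Along one axis there are n * (n + 1) / 2 ways to pick start <= end.
    pairs = n * (n + 1) // 2
    # Rows and columns are chosen independently, so the count grows like n^4 / 4.
    return pairs * pairs


def count_rectangles_brute(n: int) -> int:
    """Same count by explicit enumeration; only practical for tiny n."""
    return sum(
        1
        for top in range(n)
        for bottom in range(top, n)
        for left in range(n)
        for right in range(left, n)
    )
```

Under this counting, the total grows proportionally to n^4, which is why exhaustive evaluation of every subwindow becomes impractical for realistic image sizes.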
BRANCH AND BOUND
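First proposed by A. H. Land and A. G. Doig in 1960 for discrete programming
A “divide and conquer” approach to optimizing some cost function f(x) over a search space S
Recursively branching and bounding
  branching: split S into subsets S_i, so that min f(x) over S equals min_i v_i, where v_i is the minimum of f(x) over S_i
  bounding: compute the lower and upper bounds of f(x) within each S_i
  pruning: discard any S_i whose lower bound exceeds the best upper bound found so far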
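The branch, bound, and prune steps above can be sketched as a best-first search over a priority queue. This is a minimal illustration under stated assumptions, not the authors' implementation: the search space is a range of integers, the toy cost function and its interval bound are invented for the example, and the bound is assumed to be exact on single-element sets.

```python
import heapq
from itertools import count


def branch_and_bound(initial, lower_bound, split, is_single):
    """Best-first branch and bound for minimization.

    Assumes lower_bound(s) never exceeds the true minimum over s and is
    exact when s contains a single element.
    """
    tie = count()  # tie-breaker so the heap never has to compare the sets
    heap = [(lower_bound(initial), next(tie), initial)]
    while heap:
        bound, _, subset = heapq.heappop(heap)
        if is_single(subset):
            # Every other set left in the heap has a bound at least this
            # large, so none of them can contain a better solution.
            return subset, bound
        for part in split(subset):
            heapq.heappush(heap, (lower_bound(part), next(tie), part))
    raise ValueError("empty search space")


# Toy problem (hypothetical): minimize f(x) = x^2 - 14x + 60 over 0..100.
def toy_lower_bound(interval):
    lo, hi = interval
    # For 0 <= lo <= x <= hi: x^2 >= lo^2 and -14x >= -14 * hi.
    return lo * lo - 14 * hi + 60


def toy_split(interval):
    lo, hi = interval
    mid = (lo + hi) // 2
    return [(lo, mid), (mid + 1, hi)]


def toy_is_single(interval):
    return interval[0] == interval[1]


best, value = branch_and_bound((0, 100), toy_lower_bound, toy_split, toy_is_single)
print(best, value)  # prints (7, 7) 11
```

Pruning is implicit here: subsets with poor bounds stay in the queue and are never expanded. For a maximization problem, the same loop can be used with negated upper bounds as priorities.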
BOUNDING I
A bag of visual words for non-rigid objects
  histograms of SIFT prototypes
  SVM decision function
Bounds
  for a set of rectangles, take the maximal amount of positive (+) contributions and the minimal amount of negative (–) contributions
  integral images make each evaluation O(1)
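The bound above can be sketched as follows, assuming each pixel carries the summed classifier weight of the visual words found there, and a set of rectangles is given by intervals T, B, L, R for the top, bottom, left, and right coordinates. The positive weights are summed over the largest rectangle in the set and the negative weights over the smallest one. All names are illustrative, and the tiny weight map is made up for the example.

```python
def integral_image(grid):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(grid), len(grid[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = grid[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def box_sum(ii, top, bottom, left, right):
    """Sum over rows top..bottom and columns left..right (inclusive) in O(1)."""
    if top > bottom or left > right:
        return 0  # empty rectangle
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])


def upper_bound(pos_ii, neg_ii, T, B, L, R):
    """Upper bound on the score of any rectangle in the set T x B x L x R."""
    largest = (T[0], B[1], L[0], R[1])   # union of all rectangles in the set
    smallest = (T[1], B[0], L[1], R[0])  # intersection; may be empty
    return box_sum(pos_ii, *largest) + box_sum(neg_ii, *smallest)


# Hypothetical per-pixel weights for a 3 x 3 image.
weights = [[1, -2, 0],
           [0, 3, -1],
           [-1, 0, 2]]
pos_ii = integral_image([[max(v, 0) for v in row] for row in weights])
neg_ii = integral_image([[min(v, 0) for v in row] for row in weights])

print(upper_bound(pos_ii, neg_ii, (0, 1), (1, 2), (0, 1), (1, 2)))  # prints 6
```

When every interval shrinks to a single value, the largest and smallest rectangles coincide and the bound equals the true score, which is the property branch and bound needs to terminate at the optimum.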
RESULTS
PASCAL VOC 2006
  5,304 images with 9,507 objects from 10 categories
  1,000 visual words from 50,000 SURF descriptors
  a match is claimed when the overlap between the detected bounding box and the ground truth exceeds 50%
PASCAL VOC 2007
  9,963 images with 24,640 objects
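The 50% overlap criterion above can be sketched as below, assuming overlap is measured as the intersection area divided by the union area of the two boxes, as in the PASCAL VOC detection evaluation. Boxes are taken to be (x_min, y_min, x_max, y_max) corner coordinates; this convention and the function names are assumptions of the example.

```python
def area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max); 0 if degenerate."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def overlap(a, b):
    """Intersection area divided by union area of two boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def is_match(detected, ground_truth, threshold=0.5):
    """A detection counts as a match only if the overlap exceeds the threshold."""
    return overlap(detected, ground_truth) > threshold
```

With this definition, a detection covering only half of the ground-truth box (overlap exactly 0.5) is not counted as a match, because the comparison is strict.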
BOUNDING II
Spatial pyramid for rigid objects
  histograms with spatial information
  extensions with ESS (fine-grained pyramids)
  SVM decision function
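To illustrate what "histograms with spatial information" means, the sketch below builds one visual-word histogram per grid cell of a candidate box, at several grid resolutions. It is a simplified illustration only: the feature format (x, y, word_id), the grid sizes, and the function name are assumptions, and the bounding step for ESS is not shown.

```python
def pyramid_histograms(features, box, grid_sizes, vocab_size):
    """Per-cell visual-word histograms for a box at several grid resolutions.

    features:   iterable of (x, y, word_id) tuples
    box:        (x_min, y_min, x_max, y_max)
    grid_sizes: e.g. [1, 2] for a 1 x 1 and a 2 x 2 grid
    Returns a dict mapping (grid_size, row, col) to a histogram list.
    """
    x0, y0, x1, y1 = box
    hists = {
        (g, row, col): [0] * vocab_size
        for g in grid_sizes
        for row in range(g)
        for col in range(g)
    }
    for x, y, word in features:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # feature lies outside the candidate box
        for g in grid_sizes:
            # Map the position inside the box to a grid cell index.
            col = min(g - 1, int((x - x0) * g / (x1 - x0)))
            row = min(g - 1, int((y - y0) * g / (y1 - y0)))
            hists[(g, row, col)][word] += 1
    return hists
```

The 1 x 1 level reproduces the plain bag-of-visual-words histogram, while finer grids record where in the box each word occurs, which suits rigid objects whose parts appear in consistent positions.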
RESULTS
UIUC Car database (side-view, one car per image)
  1,050 training images (550 positive images)
  277 test images (170 single-scale + 107 multi-scale)
  1,000 visual words from 50,000 SURF descriptors
RESULTS
10,143 keyframes of a movie
  return the 100 most relevant images for a query
  2 s per returned image