The Duality of Object Retrieval: Unsupervised and Supervised Approaches
Tuan Nguyen Anh, The University of Tokyo
Index
• Part 1: Basic Object Retrieval
Ø Unsupervised approaches
• Part 2: State-of-the-art results
• Part 3: Future attempts
Ø Duality & supervised approaches
• Conclusion
Part 1: Basic Object Retrieval
Object Retrieval
[Figure: a query image ("?") and its ranked retrieval results (1st, 2nd, 3rd, 4th).]
[Screenshot: image search returning similar images and related info. Source: https://www.yandex.com/images]
[Screenshot: image search returning keywords for images, similar images, and related info. Source: https://www.google.com/imghp]
Pinterest: Zoom-in Search
[Screenshot. Source: https://www.pinterest.com/]
Overview of the system
[Diagram: the query and database images are described by features, which are matched to produce the ranked results.]
Features in object retrieval
[Same diagram, highlighting the feature-extraction stage.]
Local features
• SIFT [Lowe, 1999, 2004]
• HOG [Dalal & Triggs, 2005]
Global and deep features
• GIST features [Oliva et al., 2001]
Ø Describe the image by spectral information
• Deep features [Krizhevsky et al., 2012]
Ø Extracted from neural networks
Aggregated features
• BoF [Sivic et al., 2003]
• Hamming Embedding [Jégou et al., 2008]
• Fisher Vector [Perronnin et al., 2007]
• VLAD [Jégou et al., 2012]
Bag of Features (BoF) [Sivic et al., 2003]
• Cluster local descriptors to build a dictionary of visual words.
• Compute the BoF vector as a histogram of visual words (sketch below).
[Diagram: images → local descriptors → dictionary (c1, c2, c3, ...) → BoF histogram]
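A minimal BoF sketch in Python, assuming precomputed local descriptors and scikit-learn's KMeans; names like `build_dictionary` are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(all_descriptors, k=1000):
    """Cluster local descriptors (n x d) into k visual words."""
    return KMeans(n_clusters=k, n_init=1).fit(all_descriptors)

def bof_vector(image_descriptors, kmeans):
    """BoF vector: histogram of visual-word assignments, L2-normalized."""
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)
```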
Hamming Embedding [Jégou et al., 2008]
• Each local descriptor of an image is additionally encoded by a binary signature that refines its position inside its visual-word cell (sketch below).
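A rough sketch of the binary-signature idea: project the descriptor, then threshold each component against per-cell medians; the `projection` matrix and `medians` learned offline are assumed given.

```python
import numpy as np

def hamming_signature(descriptor, projection, medians):
    """Binary signature: project the descriptor (d,) with a (b, d)
    matrix and compare each component to its learned median."""
    z = projection @ descriptor
    return (z > medians).astype(np.uint8)

def hamming_distance(sig_a, sig_b):
    """Number of differing bits between two signatures."""
    return int(np.count_nonzero(sig_a != sig_b))
```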
Fisher Vector (FV) [Perronnin et al., 2007]
• Cluster the local descriptors with a GMM.
• Derive the Fisher kernel from the GMM, and represent each image by its Fisher vector (formula below).
[Diagram: images → local descriptors → GMM → Fisher vector]
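For reference, the standard definition: with local descriptors $X = \{x_1, \dots, x_T\}$ and a GMM density $u_\lambda$,

```latex
G_\lambda^X = \frac{1}{T} \sum_{t=1}^{T} \nabla_\lambda \log u_\lambda(x_t),
\qquad
\mathcal{G}_\lambda^X = L_\lambda \, G_\lambda^X,
\qquad
F_\lambda^{-1} = L_\lambda^\top L_\lambda,
```

where $F_\lambda$ is the Fisher information matrix of $u_\lambda$; the normalized gradient $\mathcal{G}_\lambda^X$ is the Fisher vector.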
VLAD [Jégou et al., 2012]
• Replace the GMM in FV by k-means clustering.
• Approximate the FV by accumulating, per cluster, the residuals between descriptors and their nearest centroid (sketch below).
[Diagram: images → local descriptors → k-means → VLAD vector]
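A minimal VLAD aggregation sketch, assuming k-means centroids are precomputed; the power and L2 normalizations follow common practice:

```python
import numpy as np

def vlad_vector(descriptors, centroids):
    """Accumulate residuals to each descriptor's nearest centroid."""
    k, d = centroids.shape
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i, x in zip(assign, descriptors):
        v[i] += x - centroids[i]
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))        # power normalization
    return v / (np.linalg.norm(v) + 1e-12)     # L2 normalization
```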
Overview of the system
[Same diagram, highlighting the matching stage.]
Distances and similarities
• Euclidean distance
• Hamming distance
• Inner product
• Approximated distance (ADC) [Jégou et al., 2011]
Ø Distance between the query vector and a compressed database vector (sketch below).
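A minimal asymmetric distance computation (ADC) sketch with product quantization; the per-subspace codebooks are assumed trained, and all shapes are illustrative:

```python
import numpy as np

def adc_distances(query, codebooks, codes):
    """query: (d,); codebooks: (m, 256, d // m) PQ codebooks;
    codes: (n, m) uint8 PQ codes of the database vectors.
    Returns approximate squared distances to all n vectors."""
    m = codes.shape[1]
    sub = query.reshape(m, -1)                             # m query subvectors
    tables = ((codebooks - sub[:, None, :]) ** 2).sum(-1)  # (m, 256) lookup tables
    return tables[np.arange(m), codes].sum(axis=1)         # (n,)
```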
Nearest neighbor search
[Same diagram, with matching cast as nearest-neighbor search between the query features and the database.]
Indexing and compressing data [Jégou et al., 2011]
• Coarse-to-fine strategy
Ø Use quantization techniques to build an inverted file (IVF); see the sketch after the figure.
[Figure: inverted file with cells c1, c2, c3; each list entry stores an image id and an m-byte compressed code. Compressed vectors give faster search and a better memory footprint.]
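A minimal IVF-ADC search sketch: a coarse quantizer assigns each database vector to a cell, and only the `nprobe` cells closest to the query are scanned with `adc_distances` from above (all names are illustrative):

```python
import numpy as np

def ivf_search(query, coarse_centroids, inverted_lists, codebooks, nprobe=8):
    """inverted_lists[c] = (ids, codes) for cell c."""
    cell_d = ((coarse_centroids - query) ** 2).sum(-1)
    candidates = []
    for c in np.argsort(cell_d)[:nprobe]:
        ids, codes = inverted_lists[c]
        if len(ids) == 0:
            continue
        # residual ADC: distances are computed to the query residual in this cell
        d = adc_distances(query - coarse_centroids[c], codebooks, codes)
        candidates.extend(zip(d, ids))
    candidates.sort()
    return candidates[:10]   # top-10 (distance, id) pairs
```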
Quantization techniques [Jégou et al., 2011]
• Compress the data for a better memory footprint.
• Search accuracy remains acceptable with appropriate parameters: recall = 95% with a 64-bit code.
Feature processing
• Square rooting [Arandjelovic & Zisserman, 2012]
• L2-normalization [Jain et al., 2012]
• Centralization [Tolias et al., 2013]
• Down-weighting of highly populated cells in aggregation [Jégou et al., 2009]
• Whitening [Jégou et al., 2010]
(A sketch of the first two steps and of whitening follows below.)
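A small illustration of square rooting plus L2-normalization, and of PCA whitening fitted on a sample of database vectors (a sketch; the output dimensionality is an arbitrary choice):

```python
import numpy as np
from sklearn.decomposition import PCA

def ssr(v):
    """Signed square rooting followed by L2-normalization."""
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + 1e-12)

def fit_whitener(X, n_components=128):
    """Learn PCA whitening from database vectors X (n x d);
    apply with whitener.transform() to queries and database alike."""
    return PCA(n_components=n_components, whiten=True).fit(X)
```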
Image processing: re-ranking [Philbin et al., 2007]
• Estimate a transformation between the query region and each target image (sketch below).
• Target images are re-ranked based on the discriminability of the spatially verified visual words.
• On the Oxford Buildings dataset, mAP with BoF improves from 0.618 to 0.645.
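The verification step is commonly implemented as RANSAC on matched keypoints; a sketch using OpenCV, where the inlier count drives the re-ranking (the threshold value is illustrative):

```python
import cv2
import numpy as np

def spatial_inliers(query_pts, target_pts):
    """query_pts, target_pts: (n, 2) matched keypoint coordinates.
    Returns the number of RANSAC inliers under a homography."""
    if len(query_pts) < 4:
        return 0
    H, mask = cv2.findHomography(np.float32(query_pts), np.float32(target_pts),
                                 cv2.RANSAC, ransacReprojThreshold=5.0)
    return int(mask.sum()) if mask is not None else 0
```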
Image processing: query expansion [Chum et al., 2007]
• Re-query after reconstructing the original query.
• The new query is built from the spatially verified results of the first retrieval round (sketch below).
• On the Oxford Buildings dataset, mAP with BoF improves from 0.645 to 0.696.
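A minimal sketch of the simplest variant, average query expansion: average the query vector with its verified top results and search again.

```python
import numpy as np

def expand_query(query_vec, verified_vecs):
    """Mean of the query and its spatially verified top
    results, L2-normalized, used as the new query."""
    q = np.vstack([query_vec] + list(verified_vecs)).mean(axis=0)
    return q / (np.linalg.norm(q) + 1e-12)
```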
Part 2: State-of-the-art results
Nearest neighbor search
• Datasets: 1M–1B vectors with ground-truth data
Ø BIGANN dataset: http://corpus-texmex.irisa.fr/
• Evaluation
Ø recall@R = the proportion of queries whose true nearest neighbor is ranked in the top-R results (sketch below).
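Computing recall@R is straightforward; a sketch assuming the search returns ranked id lists:

```python
import numpy as np

def recall_at_r(rankings, true_nn, r=100):
    """rankings: (q, k) array of returned ids per query (k >= r);
    true_nn: (q,) id of the true nearest neighbor per query."""
    hits = [true_nn[i] in rankings[i, :r] for i in range(len(true_nn))]
    return float(np.mean(hits))
```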
Quantization techniques
• Additive Quantization [Babenko et al., 2014]
Ø Approximate a vector by the sum of codewords.
Ø Learn the codewords by an iterative optimization.
• Composite Quantization [Zhang et al., 2014]
Ø Constrain the cross-terms (inner products) between codewords from different codebooks so that distances stay cheap to compute.
Indexing techniques
• Multi-indexing [Babenko et al., 2012, 2015]
• Performance on a dataset of one billion SIFT vectors:
Ø Memory: 12 GB
Ø Search time: 2 ms/query
Ø recall@100 = 70%
Image search
• Dataset: Oxford Buildings dataset [Philbin et al., 2007]
• Evaluation
Ø mAP: mean average precision over a set of queries, i.e. the mean of the per-query average precision scores (sketch below).
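A sketch of mAP evaluation; this is the plain AP definition, without the junk-image handling of the official Oxford protocol:

```python
import numpy as np

def average_precision(ranked_ids, relevant):
    """AP for one query: ranked_ids is the ranked result list,
    relevant is the set of ground-truth positive ids."""
    hits, score = 0, 0.0
    for rank, rid in enumerate(ranked_ids, start=1):
        if rid in relevant:
            hits += 1
            score += hits / rank          # precision at this recall point
    return score / max(len(relevant), 1)

def mean_average_precision(all_ranked, all_relevant):
    return float(np.mean([average_precision(r, g)
                          for r, g in zip(all_ranked, all_relevant)]))
```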
Selective Match Kernel [Tolias et al., 2013]
• Apply a power normalization to each VLAD component to improve accuracy (sketch below).
• Use hashing to reduce the memory footprint.
• mAP = 0.817 on the Oxford5K dataset [Philbin et al., 2007]
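The non-linearity at the heart of the kernel can be sketched as a thresholded power function applied to descriptor similarities (parameter values here are illustrative, not the paper's):

```python
import numpy as np

def selectivity(u, alpha=3.0, tau=0.0):
    """Thresholded power law: down-weights weak, noisy matches
    and emphasizes strong ones."""
    return np.where(u > tau, np.sign(u) * np.abs(u) ** alpha, 0.0)
```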
Neural Codes [Babenko et al., 2014]
• Use features extracted from a neural network for object retrieval.
• The features are fine-tuned.
• mAP = 0.435 with fc6 features on the Oxford5K dataset.
Sum-pooled convolutional features (SPoC) [Babenko et al., 2015]
• Deep convolutional features are sum-pooled with a centered Gaussian weighting to improve accuracy (sketch below).
• mAP = 0.657 on the Oxford5K dataset.
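A minimal SPoC-style sketch: sum-pool a convolutional feature map under a centered Gaussian weight, then L2-normalize (the width parameter `sigma` is an assumption; the paper's exact choice may differ):

```python
import numpy as np

def spoc(feature_map, sigma_ratio=3.0):
    """feature_map: (h, w, c) conv activations of one image."""
    h, w, c = feature_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    sigma = min(h, w) / sigma_ratio
    weight = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    v = (feature_map * weight[:, :, None]).sum(axis=(0, 1))
    return v / (np.linalg.norm(v) + 1e-12)
```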
Summary of image retrieval results
• Search frameworks with deep features in object retrieval still need to be improved.

Method                               | Feature       | Framework | mAP
ASMK [Tolias et al., 2013]           | SIFT          | VLAD      | 0.817
Neural codes [Babenko et al., 2014]  | Deep features | -         | 0.435
SPoC [Babenko et al., 2015]          | Deep features | SPoC      | 0.657
Part 3: Future attempts
Attempts on current topics
• Improve the features:
Ø Feature fusion
Ø Find new match kernels
Ø Improve the system with deep features?
• Improve the distance metrics and NN search.
Dual-process system [Stanovich et al., 1999, 2004]
• System 1: fast, high capacity, implicit knowledge and basic emotions only.
• System 2: slow, limited capacity, explicit knowledge and complicated emotions.
Supervised Object Retrieval?
• More than just applying deep features to retrieval.
• Learning while searching?
• Learning with feedback?
The Duality of Object Retrieval
• The collaboration between unsupervised learning and supervised learning in object retrieval, by analogy with the dual-process system [Stanovich et al., 1999, 2004].
Conclusion
• Basic Object Retrieval
Ø Features: SIFT, HOG, GIST, deep features
Ø Distance metrics and NN search
Ø Hamming Embedding and aggregation
Ø Pre-processing and post-processing
• State-of-the-art results
• Future attempts: duality of supervised & unsupervised approaches?
Thank you for listening