The Duality of Object Retrieval: Unsupervised and Supervised Approaches
Tuan Nguyen Anh, The University of Tokyo
Index
• Part 1: Basic Object Retrieval
Ø Unsupervised approaches
• Part 2: State-of-the-art results
• Part 3: Future attempts
Ø Duality & supervised approaches
• Conclusion
Part 1: Basic Object Retrieval
Object Retrieval
[Figure: a query image ("?") and its ranked retrieval results (1st, 2nd, 3rd, 4th).]
[Screenshot: image search returning similar images and related info. Source: https://www.yandex.com/images]
[Screenshot: image search returning keywords for images, similar images, and related info. Source: https://www.google.com/imghp]
Pinterest: Zoom-in Search
[Screenshot. Source: https://www.pinterest.com/]
Overview of the system
[Diagram: the query and database images are described by features, which are matched to produce the ranked results.]
Features in object retrieval
[Same diagram, highlighting the feature-extraction stage.]
Local features
• SIFT [Lowe, 1999, 2004]
• HOG [Dalal & Triggs, 2005]
Global and deep features
• GIST features [Oliva et al., 2001]
Ø Describe the image by spectral information
• Deep features [Krizhevsky et al., 2012]
Ø Extracted from neural networks
Aggregated features
• BoF [Sivic et al., 2003]
• Hamming Embedding [Jégou et al., 2008]
• Fisher Vector [Perronnin et al., 2007]
• VLAD [Jégou et al., 2012]
Bag of Features (BoF) [Sivic et al., 2003]
• Cluster local descriptors to build a dictionary of visual words.
• Compute the BoF vector as a histogram of visual words (sketch below).
[Diagram: images → local descriptors → dictionary (c1, c2, c3, ...) → BoF histogram]
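A minimal BoF sketch in Python, assuming precomputed local descriptors and scikit-learn's KMeans; names like `build_dictionary` are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(all_descriptors, k=1000):
    """Cluster local descriptors (n x d) into k visual words."""
    return KMeans(n_clusters=k, n_init=1).fit(all_descriptors)

def bof_vector(image_descriptors, kmeans):
    """BoF vector: histogram of visual-word assignments, L2-normalized."""
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / (np.linalg.norm(hist) + 1e-12)
```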
Hamming Embedding [Jégou et al., 2008]
• Each local descriptor of an image is additionally encoded by a binary signature that refines its position inside its visual-word cell (sketch below).
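A rough sketch of the binary-signature idea: project the descriptor, then threshold each component against per-cell medians; the `projection` matrix and `medians` learned offline are assumed given.

```python
import numpy as np

def hamming_signature(descriptor, projection, medians):
    """Binary signature: project the descriptor (d,) with a (b, d)
    matrix and compare each component to its learned median."""
    z = projection @ descriptor
    return (z > medians).astype(np.uint8)

def hamming_distance(sig_a, sig_b):
    """Number of differing bits between two signatures."""
    return int(np.count_nonzero(sig_a != sig_b))
```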
Fisher Vector (FV) [Perronnin et al., 2007]
• Cluster the local descriptors with a GMM.
• Derive the Fisher kernel from the GMM, and represent each image by its Fisher vector (formula below).
[Diagram: images → local descriptors → GMM → Fisher vector]
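For reference, the standard definition: with local descriptors $X = \{x_1, \dots, x_T\}$ and a GMM density $u_\lambda$,

```latex
G_\lambda^X = \frac{1}{T} \sum_{t=1}^{T} \nabla_\lambda \log u_\lambda(x_t),
\qquad
\mathcal{G}_\lambda^X = L_\lambda \, G_\lambda^X,
\qquad
F_\lambda^{-1} = L_\lambda^\top L_\lambda,
```

where $F_\lambda$ is the Fisher information matrix of $u_\lambda$; the normalized gradient $\mathcal{G}_\lambda^X$ is the Fisher vector.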
VLAD [Jégou et al., 2012]
• Replace the GMM in FV by k-means clustering.
• Approximate the FV by accumulating, per cluster, the residuals between descriptors and their nearest centroid (sketch below).
[Diagram: images → local descriptors → k-means → VLAD vector]
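A minimal VLAD aggregation sketch, assuming k-means centroids are precomputed; the power and L2 normalizations follow common practice:

```python
import numpy as np

def vlad_vector(descriptors, centroids):
    """Accumulate residuals to each descriptor's nearest centroid."""
    k, d = centroids.shape
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i, x in zip(assign, descriptors):
        v[i] += x - centroids[i]
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))        # power normalization
    return v / (np.linalg.norm(v) + 1e-12)     # L2 normalization
```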
Overview of the system
[Same diagram, highlighting the matching stage.]
Distances and similarities
• Euclidean distance
• Hamming distance
• Inner product
• Approximated distance (ADC) [Jégou et al., 2011]
Ø Distance between the query vector and a compressed database vector (sketch below).
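A minimal asymmetric distance computation (ADC) sketch with product quantization; the per-subspace codebooks are assumed trained, and all shapes are illustrative:

```python
import numpy as np

def adc_distances(query, codebooks, codes):
    """query: (d,); codebooks: (m, 256, d // m) PQ codebooks;
    codes: (n, m) uint8 PQ codes of the database vectors.
    Returns approximate squared distances to all n vectors."""
    m = codes.shape[1]
    sub = query.reshape(m, -1)                             # m query subvectors
    tables = ((codebooks - sub[:, None, :]) ** 2).sum(-1)  # (m, 256) lookup tables
    return tables[np.arange(m), codes].sum(axis=1)         # (n,)
```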
Nearest neighbor search
[Same diagram, with matching cast as nearest-neighbor search between the query features and the database.]
Indexing and compressing data [Jégou et al., 2011]
• Coarse-to-fine strategy
Ø Use quantization techniques to build an inverted file (IVF); see the sketch after the figure.
[Figure: inverted file with cells c1, c2, c3; each list entry stores an image id and an m-byte compressed code. Compressed vectors give faster search and a better memory footprint.]
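A minimal IVF-ADC search sketch: a coarse quantizer assigns each database vector to a cell, and only the `nprobe` cells closest to the query are scanned with `adc_distances` from above (all names are illustrative):

```python
import numpy as np

def ivf_search(query, coarse_centroids, inverted_lists, codebooks, nprobe=8):
    """inverted_lists[c] = (ids, codes) for cell c."""
    cell_d = ((coarse_centroids - query) ** 2).sum(-1)
    candidates = []
    for c in np.argsort(cell_d)[:nprobe]:
        ids, codes = inverted_lists[c]
        if len(ids) == 0:
            continue
        # residual ADC: distances are computed to the query residual in this cell
        d = adc_distances(query - coarse_centroids[c], codebooks, codes)
        candidates.extend(zip(d, ids))
    candidates.sort()
    return candidates[:10]   # top-10 (distance, id) pairs
```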
Quantization techniques [Jégou et al., 2011]
• Compress the data for a better memory footprint.
• Search accuracy remains acceptable with appropriate parameters: recall = 95% with a 64-bit code.
Feature processing
• Square rooting [Arandjelovic & Zisserman, 2012]
• L2-normalization [Jain et al., 2012]
• Centralization [Tolias et al., 2013]
• Down-weighting of highly populated cells in aggregation [Jégou et al., 2009]
• Whitening [Jégou et al., 2010]
(A sketch of the first two steps and of whitening follows below.)
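A small illustration of square rooting plus L2-normalization, and of PCA whitening fitted on a sample of database vectors (a sketch; the output dimensionality is an arbitrary choice):

```python
import numpy as np
from sklearn.decomposition import PCA

def ssr(v):
    """Signed square rooting followed by L2-normalization."""
    v = np.sign(v) * np.sqrt(np.abs(v))
    return v / (np.linalg.norm(v) + 1e-12)

def fit_whitener(X, n_components=128):
    """Learn PCA whitening from database vectors X (n x d);
    apply with whitener.transform() to queries and database alike."""
    return PCA(n_components=n_components, whiten=True).fit(X)
```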
Image processing: re-ranking [Philbin et al., 2007]
• Estimate a transformation between the query region and each target image (sketch below).
• Target images are re-ranked based on the discriminability of the spatially verified visual words.
• On the Oxford Buildings dataset, mAP with BoF improves from 0.618 to 0.645.
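The verification step is commonly implemented as RANSAC on matched keypoints; a sketch using OpenCV, where the inlier count drives the re-ranking (the threshold value is illustrative):

```python
import cv2
import numpy as np

def spatial_inliers(query_pts, target_pts):
    """query_pts, target_pts: (n, 2) matched keypoint coordinates.
    Returns the number of RANSAC inliers under a homography."""
    if len(query_pts) < 4:
        return 0
    H, mask = cv2.findHomography(np.float32(query_pts), np.float32(target_pts),
                                 cv2.RANSAC, ransacReprojThreshold=5.0)
    return int(mask.sum()) if mask is not None else 0
```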
Image processing: query expansion [Chum et al., 2007]
• Re-query after reconstructing the original query.
• The new query is built from the spatially verified results of the first retrieval round (sketch below).
• On the Oxford Buildings dataset, mAP with BoF improves from 0.645 to 0.696.
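A minimal sketch of the simplest variant, average query expansion: average the query vector with its verified top results and search again.

```python
import numpy as np

def expand_query(query_vec, verified_vecs):
    """Mean of the query and its spatially verified top
    results, L2-normalized, used as the new query."""
    q = np.vstack([query_vec] + list(verified_vecs)).mean(axis=0)
    return q / (np.linalg.norm(q) + 1e-12)
```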
Part 2: State-of-the-art results
Nearest neighbor search
• Datasets: 1M–1B vectors with ground-truth data
Ø BIGANN dataset: http://corpus-texmex.irisa.fr/
• Evaluation
Ø recall@R = the proportion of queries whose true nearest neighbor is ranked in the top-R results (sketch below).
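Computing recall@R is straightforward; a sketch assuming the search returns ranked id lists:

```python
import numpy as np

def recall_at_r(rankings, true_nn, r=100):
    """rankings: (q, k) array of returned ids per query (k >= r);
    true_nn: (q,) id of the true nearest neighbor per query."""
    hits = [true_nn[i] in rankings[i, :r] for i in range(len(true_nn))]
    return float(np.mean(hits))
```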
Quantization techniques
• Additive Quantization [Babenko et al., 2014]
Ø Approximate a vector by the sum of codewords.
Ø Learn the codewords by an iterative optimization.
• Composite Quantization [Zhang et al., 2014]
Ø Constrain the cross-terms (inner products) between codewords from different codebooks so that distances stay cheap to compute.
Indexing techniques
• Multi-indexing [Babenko et al., 2012, 2015]
• Performance on a dataset of one billion SIFT vectors:
Ø Memory: 12 GB
Ø Search time: 2 ms/query
Ø recall@100 = 70%
Image search
• Dataset: Oxford Buildings dataset [Philbin et al., 2007]
• Evaluation
Ø mAP: mean average precision over a set of queries, i.e. the mean of the per-query average precision scores (sketch below).
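A sketch of mAP evaluation; this is the plain AP definition, without the junk-image handling of the official Oxford protocol:

```python
import numpy as np

def average_precision(ranked_ids, relevant):
    """AP for one query: ranked_ids is the ranked result list,
    relevant is the set of ground-truth positive ids."""
    hits, score = 0, 0.0
    for rank, rid in enumerate(ranked_ids, start=1):
        if rid in relevant:
            hits += 1
            score += hits / rank          # precision at this recall point
    return score / max(len(relevant), 1)

def mean_average_precision(all_ranked, all_relevant):
    return float(np.mean([average_precision(r, g)
                          for r, g in zip(all_ranked, all_relevant)]))
```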
Selective Match Kernel [Tolias et al., 2013]
• Apply a power normalization to each VLAD component to improve accuracy (sketch below).
• Use hashing to reduce the memory footprint.
• mAP = 0.817 on the Oxford5K dataset [Philbin et al., 2007]
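The non-linearity at the heart of the kernel can be sketched as a thresholded power function applied to descriptor similarities (parameter values here are illustrative, not the paper's):

```python
import numpy as np

def selectivity(u, alpha=3.0, tau=0.0):
    """Thresholded power law: down-weights weak, noisy matches
    and emphasizes strong ones."""
    return np.where(u > tau, np.sign(u) * np.abs(u) ** alpha, 0.0)
```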
Neural Codes [Babenko et al., 2014]
• Use features extracted from a neural network for object retrieval.
• The features are fine-tuned.
• mAP = 0.435 with fc6 features on the Oxford5K dataset.
Sum-pooled convolutional features (SPoC) [Babenko et al., 2015]
• Deep convolutional features are sum-pooled with a centered Gaussian weighting to improve accuracy (sketch below).
• mAP = 0.657 on the Oxford5K dataset.
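A minimal SPoC-style sketch: sum-pool a convolutional feature map under a centered Gaussian weight, then L2-normalize (the width parameter `sigma` is an assumption; the paper's exact choice may differ):

```python
import numpy as np

def spoc(feature_map, sigma_ratio=3.0):
    """feature_map: (h, w, c) conv activations of one image."""
    h, w, c = feature_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    sigma = min(h, w) / sigma_ratio
    weight = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    v = (feature_map * weight[:, :, None]).sum(axis=(0, 1))
    return v / (np.linalg.norm(v) + 1e-12)
```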
Summary of image retrieval results
• Search frameworks with deep features in object retrieval still need to be improved.

Method                               | Feature       | Framework | mAP
ASMK [Tolias et al., 2013]           | SIFT          | VLAD      | 0.817
Neural codes [Babenko et al., 2014]  | Deep features | -         | 0.435
SPoC [Babenko et al., 2015]          | Deep features | SPoC      | 0.657
Part 3: Future attempts
Attempts on current topics
• Improve the features:
Ø Feature fusion
Ø Find new match kernels
Ø Improve the system with deep features?
• Improve the distance metrics and NN search.
Dual-process system [Stanovich et al., 1999, 2004]
• System 1: fast, high capacity, implicit knowledge and basic emotions only.
• System 2: slow, limited capacity, explicit knowledge and complicated emotions.
Supervised Object Retrieval?
• More than just applying deep features to retrieval.
• Learning while searching?
• Learning with feedback?
The Duality of Object Retrieval
• The collaboration between unsupervised learning and supervised learning in object retrieval, by analogy with the dual-process system [Stanovich et al., 1999, 2004].
Conclusion
• Basic Object Retrieval
Ø Features: SIFT, HOG, GIST, deep features
Ø Distance metrics and NN search
Ø Hamming Embedding and aggregation
Ø Pre-processing and post-processing
• State-of-the-art results
• Future attempts: duality of supervised & unsupervised approaches?
Thank you for listening