Yimeng Zhang, Zhaoyin Jia and Tsuhan Chen Cornell University Image Retrieval with Geometry-Preserving Visual Phrases

Page 1

Image Retrieval with Geometry-Preserving Visual Phrases

Yimeng Zhang, Zhaoyin Jia and Tsuhan Chen
Cornell University

Page 2

Similar Image Retrieval

Ranked relevant images

Image Database

Page 3

Bag-of-Visual-Words (BoW)

Images are represented as histograms of visual words

Similarity of two images: cosine similarity of their histograms

Histogram length: dictionary size
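As a minimal sketch of the BoW similarity described on this slide, the snippet below builds sparse word histograms and compares them with cosine similarity. The function names and the use of integer word ids are assumptions made for illustration; no term weighting is applied.

```python
import math
from collections import Counter


def bow_histogram(word_ids):
    """Count how often each visual word id occurs in one image."""
    return Counter(word_ids)


def cosine_similarity(hist_a, hist_b):
    """Cosine similarity of two sparse histograms stored as Counters."""
    # Only words present in both images contribute to the dot product.
    dot = sum(count * hist_b[word] for word, count in hist_a.items() if word in hist_b)
    norm_a = math.sqrt(sum(c * c for c in hist_a.values()))
    norm_b = math.sqrt(sum(c * c for c in hist_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Storing only the non-zero bins keeps the cost proportional to the number of words in the image rather than to the dictionary size.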

Page 4

Geometry-Preserving Visual Phrases

Length-k phrase: k words in a certain spatial layout

Bag of Phrases (length-2 phrases)

Page 5

Phrases vs. Words

[Figure: Word, Length-2 and Length-3 panels, shown for an irrelevant image and a relevant image]

Page 6

Previous Works

Page 7

Geometry Verification

Searching Step with BoW

Post-processing (Geometry Verification)

Only on top ranked images

Encode Spatial Info

Page 8

Modeling relationships between words

Co-occurrences in the entire image [L. Torresani et al., CVPR 2009]

No spatial information

Phrases in local neighborhoods [J. Yuan et al., CVPR 2007] [Z. Wu et al., CVPR 2010] [C. L. Zitnick, Tech. Report 2007]

No long-range interactions, weak geometry

Select a subset of phrases [J. Yuan et al., CVPR 2007]

Discards a large portion of phrases

[Figure: length-2 phrase example]

Dimension: exponential in the # of words in a phrase

Previous works: reduce the number of phrases

Our work: all phrases, linear computation time

Page 9

Approach

Page 10

Overview

1. Similarity measure: BoW; BoP [Zhang and Chen, 09]

2. Large-scale retrieval: BoW with inverted files and min-hash; BoP with inverted files and min-hash (this paper)

Page 11

Co-occurring Phrases

[Figure: two images sharing the word groups A B C, D F, A, and E F]

[Zhang and Chen, 09]

Only consider the translation difference

Page 12

Co-occurring Phrase Algorithm

[Zhang and Chen, 09]

Offset space: $\Delta x = x - x'$, $\Delta y = y - y'$

[Figure: the two example images and their offset space; the word groups A B C, D F and E F each fall into a single offset cell]

# of co-occurring length-2 phrases: $1 + 1 + \binom{3}{2} = 5$
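The offset-space counting on this slide can be sketched as follows, assuming translation-only invariance and word locations that are already quantized to integer grid cells. The function name and the `(word_id, x, y)` tuple layout are illustrative assumptions, not taken from the slides.

```python
from collections import Counter
from math import comb


def count_cooccurring_phrases(words_a, words_b, k=2):
    """Count co-occurring length-k phrases under translation only.

    words_a, words_b: lists of (word_id, x, y) with quantized coordinates.
    """
    # Index the second image by word id so matches are found directly.
    by_word = {}
    for word, x, y in words_b:
        by_word.setdefault(word, []).append((x, y))

    # Every pair of corresponding words casts one vote at its offset.
    offset_space = Counter()
    for word, x, y in words_a:
        for xb, yb in by_word.get(word, []):
            offset_space[(x - xb, y - yb)] += 1

    # n words sharing one offset form C(n, k) length-k phrases.
    return sum(comb(n, k) for n in offset_space.values())
```

With three words sharing one offset and two further pairs each sharing another, this yields 3 + 1 + 1 = 5 length-2 phrases, the same total as the slide's example.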

Page 13

Relation with the feature vector

[Figure: length-k phrase feature vectors of the two images]

Inner product of the feature vectors

# of co-occurring length-k phrases)|||||(| 11 kkk YXO

M: # of corresponding pairs; in practice, linear in the number of local features

$O(M)$, same as BoW!
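One way to read the O(M) claim, as a sketch: let n_c be the number of corresponding word pairs that fall into offset cell c (a symbol introduced here for illustration, not taken from the slides). The number of co-occurring length-k phrases can then be written as a sum over the occupied cells:

```latex
\[
  \#\{\text{co-occurring length-}k\text{ phrases}\}
  \;=\; \sum_{c} \binom{n_c}{k},
  \qquad \sum_{c} n_c = M .
\]
```

Filling the cells takes one vote per corresponding pair and at most M cells are occupied, which is consistent with the linear cost stated on this slide. For the length-2 example with cell counts 3, 2 and 2, the sum is 3 + 1 + 1 = 5.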

Page 14

Inverted Index with BoW

Avoid comparing with every image

Score table

Image ID: I1 I2 … In
Score: +1 added to each image listed under the query word

Inverted Index
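A minimal sketch of the inverted-index voting on this slide, under simplifying assumptions: the score is the plain count of shared distinct words, without the histogram normalization or term weighting a full BoW system would add. All names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(database):
    """database: dict mapping image_id -> iterable of visual word ids."""
    index = defaultdict(set)
    for image_id, words in database.items():
        for word in words:
            index[word].add(image_id)
    return index


def score_images(index, query_words):
    """Add +1 to an image's score for every distinct query word it contains."""
    scores = defaultdict(int)
    for word in set(query_words):
        # Only images listed under this word are touched.
        for image_id in index.get(word, ()):
            scores[image_id] += 1
    return dict(scores)
```

Images that share no word with the query never appear in the score table, which is how the index avoids comparing against every image.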

Page 15

Inverted Index with Word Location

[Figure: inverted index entries that store the word location for each image, e.g. I1]

Assumption: the same word occurs only once in an image; memory usage is then the same as BoW

Page 16

Score Table

BoW: one score per image

Image ID: I1 I2 … In
Score

BoP: compute the offset space for each image (I1, I2, …, In), then compute the # of co-occurring phrases

Page 17

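Inverted Files with Phrases

[Figure: the inverted index entry of word wi lists images (I1, I10, I8, I5, …); each listed image receives +1 in a cell of its own offset space, with cells indexed by offsets such as (0,0), (1,0), (0,1), (0,-1), (1,-1), (-1,-1), (-1,0)]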
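The phrase-based inverted file can be sketched as below, assuming (as the earlier slide does) that a word occurs at most once per image and that locations are quantized integers. One offset space is kept per database image; the function names and data layout are illustrative assumptions.

```python
from collections import Counter, defaultdict
from math import comb


def build_location_index(database):
    """database: image_id -> {word_id: (x, y)}, one location per word."""
    index = defaultdict(list)
    for image_id, words in database.items():
        for word, (x, y) in words.items():
            index[word].append((image_id, x, y))
    return index


def score_with_phrases(index, query, k=2):
    """query: {word_id: (x, y)}. Returns image_id -> # of co-occurring length-k phrases."""
    offset_spaces = defaultdict(Counter)  # one offset space per database image
    for word, (qx, qy) in query.items():
        for image_id, x, y in index.get(word, ()):
            # The shared word votes +1 at its translation offset.
            offset_spaces[image_id][(qx - x, qy - y)] += 1
    return {
        image_id: sum(comb(n, k) for n in space.values())
        for image_id, space in offset_spaces.items()
    }
```

Each posting is visited once per query word, so the traversal cost matches the BoW index; the extra work is the per-image offset bookkeeping.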

Page 18

Final Score

[Figure: offset spaces of images I1, I2, …, In with example vote counts, reduced to a score table (Image ID, Score) of final similarity scores]

Page 19

Overview

BoW: inverted files, min-hash

BoP: inverted files, min-hash

Less storage and time complexity

Page 20

Min-hash with BoW

Probability of min-hash collision (same word) = image similarity

[Figure: images I and I' compared under one min-hash function]
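A small sketch of min-hash over word sets, to illustrate the collision property stated on this slide. It assumes each min-hash function is a random permutation of the vocabulary and that "image similarity" means the set overlap (intersection over union) of the two word sets; the names and the seeding scheme are illustrative.

```python
import random


def make_minhash_functions(num_functions, vocabulary_size, seed=0):
    """Each min-hash function is a random ranking of the vocabulary."""
    rng = random.Random(seed)
    functions = []
    for _ in range(num_functions):
        ranks = list(range(vocabulary_size))
        rng.shuffle(ranks)  # ranks[word] is the rank of that word
        functions.append(ranks)
    return functions


def minhash_sketch(words, functions):
    """For each function, keep the word of the image with the smallest rank."""
    return [min(words, key=ranks.__getitem__) for ranks in functions]


def estimated_similarity(sketch_a, sketch_b):
    """Fraction of min-hash functions on which both images pick the same word."""
    collisions = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return collisions / len(sketch_a)
```

With more min-hash functions the collision rate concentrates around the set overlap, at the price of a longer sketch per image.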

Page 21

Min-hash with Phrases

Probability of k min-hash collisions with consistent geometry (details are in the paper)

[Figure: images I and I' under two min-hash functions, with the collisions placed in the offset space]

Offset space: $\Delta x = x - x'$, $\Delta y = y - y'$
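The slide defers the details to the paper, so the following is only a rough sketch of the idea of "k collisions with consistent geometry", not the authors' estimator. It assumes each min-hash entry stores the selected word together with its quantized location, and it counts groups of k colliding words that share one offset. All names are hypothetical.

```python
from collections import Counter
from math import comb


def count_consistent_collisions(sketch_a, sketch_b, k=2):
    """sketch_a, sketch_b: one (word_id, x, y) entry per min-hash function."""
    votes = set()
    for (word_a, xa, ya), (word_b, xb, yb) in zip(sketch_a, sketch_b):
        if word_a == word_b:  # min-hash collision on this function
            # A set removes repeats when several functions pick the same word.
            votes.add((word_a, xa - xb, ya - yb))

    # Group the distinct colliding words by their offset.
    offset_space = Counter((dx, dy) for _, dx, dy in votes)
    return sum(comb(n, k) for n in offset_space.values())
```

Deduplicating the votes is a design choice made here so that one word selected by several functions is not counted as a phrase with itself.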

Page 22

Other Invariances

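[Figure: points p1, p2, p3 at (x, y) in image I and at (x', y') in image I'; the offset equations involve a scale term $\hat{s}$, and $\log(\hat{s})$ appears as an additional axis]

Add a dimension to the offset space

Increases the memory usage

[Zhang and Chen, 10]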

Page 23

Variant Matching

Local histogram matching

Page 24

Evaluation

1. BoW + Inverted Index vs. BoP + Inverted Index

2. BoW + Min-hash vs. BoP + Min-hash

Post-processing methods: complementary to our work

Page 25

Experiments - Inverted Index

5K Oxford dataset (55 queries)

1M Flickr distractors

[J. Philbin et al., 07]

Page 26

Example Precision-recall curve

Higher precision at lower recall

[Figure: two example precision-recall curves comparing BoW and BoP; axes: Recall, Precision]

Page 27

Comparison

Mean average precision (mAP): mean of the AP on 55 queries

[Figure: mAP (0.450 to 0.700) vs. Vocabulary Size (0 to 1000 K) for BoW, BoP, BoW+RANSAC and BoP+RANSAC]

Outperform BoW (similar computation)

Outperform BoW+RANSAC (10 times slower on 150 top images)

Larger improvement on smaller vocabulary size
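For reference, mean average precision can be computed as sketched below. This uses the common rank-based definition of AP (mean of the precision values at the ranks of the relevant images); the official Oxford evaluation may differ in detail, for example in how it treats ambiguous images, so treat this as an assumption rather than the exact protocol behind the reported numbers.

```python
def average_precision(ranked_ids, relevant_ids):
    """AP of one ranked result list against the set of relevant image ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    # Relevant images that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(results):
    """results: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)
```

For the experiments on this slide, `results` would hold one entry for each of the 55 queries.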

Page 28

+Flickr 1M Dataset

Computational Complexity

Method        Memory   Quantization   Search
BoW           8.1G     0.89s          0.137s
BoP           8.5G                    0.215s
BoW+RANSAC    -        0.89s          4.137s

(Runtime in seconds)

RANSAC: 4s on top 300 images

[Figure: mAP (0.4 to 0.65) vs. Number of Images (0 to 1000) for BoW and BoP]

Page 29

Experiment - min-hash

University of Kentucky dataset

Min-hash with BoW: [O. Chum et al., BMVC08]

[Figure: BoW vs. BoP for 200, 500 and 800 min-hash functions; y-axis from 2.80 to 3.30]

Page 30

Conclusion

Encode more spatial information into the BoW

Can be applied to all images in the database at the searching step

Same computational complexity as BoW

Better Retrieval Precision than BoW+RANSAC