Graph Regularised Hashing (ECIR'15 Talk)

Graph Regularised Hashing

Sean Moran and Victor Lavrenko

Institute of Language, Cognition and ComputationSchool of Informatics

University of Edinburgh

ECIR’15 Vienna, March 2015

Graph Regularised Hashing (GRH)

Overview

GRH

Evaluation

Conclusion


Overview

GRH

Evaluation

Conclusion

Locality Sensitive Hashing

H

DATABASE


110101

010111

H

010101

111101

.....

DATABASE

HASH TABLE


110101

010111

H

H

QUERY

DATABASE

HASH TABLE

010101

111101

.....


110101

010111

H

H

COMPUTE SIMILARITY

NEARESTNEIGHBOURS

QUERY

010101

111101

.....

H

DATABASE

QUERY

HASH TABLE


110101

010111

111101

H

H

Content Based IR

Image: Imense Ltd

Image: Doersch et al.

Image: Xu et al.

Location Recognition

Near duplicate detection

010101

111101

.....

H

QUERY

DATABASE

QUERY

NEARESTNEIGHBOURS

HASH TABLE

COMPUTE SIMILARITY

Previous work

I Data-independent: Locality Sensitive Hashing (LSH) [Indyk.98]

I Data-dependent (unsupervised): Anchor Graph Hashing(AGH) [Liu et al. ’11], Spectral Hashing (SH) [Weiss ’08]

I Data-dependent (supervised): Self Taught Hashing (STH)[Zhang ’10], Supervised Hashing with Kernels (KSH) [Liu etal. ’12], ITQ + CCA [Gong and Lazebnik ’11], BinaryReconstructive Embedding (BRE) [Kulis and Darrell. ’09]

Previous work

Method Data-Dependent Supervised Scalable Effectiveness

LSH X LowSH X LowSTH X X MediumBRE X X MediumITQ+CCA X X MediumKSH X X High

GRH X X X High


Overview

GRH

Evaluation

Conclusion


I Two step iterative hashing model:

I Step A: Graph Regularisation

Lm ← sgn(α SD−1Lm−1 + (1−α)L0

)I Step B: Data-Space Partitioning

for k = 1. . .K : min ||hk ||2 + C∑N

i=1 ξik

s.t. Lik(hkᵀxi + bk) ≥ 1− ξik for i = 1. . .N

I Repeat for a set number of iterations (M)


I Step A: Graph Regularisation [Diaz ’07][1]


)

I S: Affinity (adjacency) matrix

I D: Diagonal degree matrix

I L: Binary bits at specified iteration

I α: Interpolation parameter (0 ≤ α ≤ 1)

[1] Diaz, F.: Regularizing query-based retrieval scores. In: IR(2007)


I Step A: Graph Regularisation [Diaz ’07]


)








)








)








)






-1 1 1

-1 -1 -1ba

c

1 1 1

S a b c

a 1 1 0b 1 1 1c 0 1 1

D−1 a b c

a 0.5 0 0b 0 0.33 0c 0 0 0.5

L0 b1 b2 b3

a −1 −1 −1b −1 1 1c 1 1 1


-1 1 1

-1 -1 -1ba

c

1 1 1

L1 = sgn

−1 0 0−0.33 0.33 0.33

0 1 1


-1 1 1

-1 1 1ba

c

1 1 1

L1 =

b1 b2 b3

a −1 1 1b −1 1 1c 1 1 1


I Step B: Data-Space Partitioning

for k = 1. . .K : min ||hk ||2 + C∑N

i=1 ξik


I hk : Hyperplane k

I bk : bias of hyperplane k

I xi : data-point i

I Lik : bit k of data-point i

ξik : slack variable ij

K : # bits

N: # data-points



for k = 1. . .K : min ||hk ||2 + C∑N

i=1 ξik


I hk : Hyperplane k


I xi : data-point i



K : # bits

N: # data-points



for k = 1. . .K : min ||hk ||2 + C∑N

i=1 ξik


I hk : Hyperplane k


I xi : data-point i



K : # bits

N: # data-points



for k = 1. . .K : min ||hk ||2 + C∑N

i=1 ξik


I hk : Hyperplane k


I xi : data-point i



K : # bits

N: # data-points



for k = 1. . .K : min ||hk ||2 + C∑N

i=1 ξik


I hk : Hyperplane k


I xi : data-point i



K : # bits

N: # data-points


ba

c

e

f

g

h

d


ba

c

e

f

g

h

d


-1 1 1

1 1 1

-1 -1 -1

1 1 -1

ba

c

e

f

g

h

d

-1 1 1

1 -1 -1

1 -1 -1

1 1 1


1 1 1

-1 1 1

-1 -1 -1

1 1 -1

ba

c

e

f

g

h

d

-1 1 1

1 1 1

1 -1 -1

1 -1 -1


-1 1 1

-1 -1 -1

1 1 -1

ba

c

e

f

g

h

d

-1 1 1

-1 1 1

1 1 1

First bit flipped

1 1 -1 Second bit flipped

1 -1 -1


-1 1 1

1 1 1

-1 -1 -1

1 1 -1

ba

c

e

f

g

h

d

-1 1 1h1 . x−b1=0

h1

Negative (-1)half space

Positive (+1)half space

1 1 -1

1 -1 -1

-1 1 1


-1 1 1

1 1 1

-1 -1 -1

1 1 -11 -1 -1

ba

c

e

f

g

h

1 1 -1

d

-1 1 1

-1 1 1

h2

Positive (+1)half space

h2 . x−b2=0 Negative (-1) half space

Evaluation

Overview

GRH

Evaluation

Conclusion

Datasets/Features

I Standard evaluation datasets [Liu et al. ’12], [Gong andLazebnik ’11]:

I CIFAR-10: 60K images, GIST descriptors, 10 classes1

I MNIST: 70K images, grayscale pixels, 10 classes2

I NUSWIDE: 270K images, GIST descriptors, 21 classes3

I True NNs: images that share at least one class in common[Liu et al. ’12]

1http://www.cs.toronto.edu/~kriz/cifar.html2http://yann.lecun.com/exdb/mnist/3http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

http://www.cs.toronto.edu/~kriz/cifar.html

http://yann.lecun.com/exdb/mnist/

http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm

Evaluation Metrics

I Hamming ranking evaluation paradigm [Liu et al. ’12], [Gongand Lazebnik ’11]

I Standard evaluation metrics [Liu et al. ’12], [Gong andLazebnik ’11]:

I Mean average precison (mAP)

I Precision at Hamming radius 2 (P@R2)

GRH vs Literature (CIFAR-10 @ 32 bits)

LSH BRE STH KSH GRH (Linear) GRH (RBF)0.10

0.15

0.20

0.25

0.30

0.35

mA

P LinearGRH

Non-linear GRH

GRH vs Literature (CIFAR-10 @ 32 bits)

LSH BRE STH KSH GRH (Linear)GRH (RBF)0.10

0.15

0.20

0.25

0.30

0.35

mA

P

GRH's straightforwardobjective outperforms

more complexobjectives

GRH vs Literature (CIFAR-10)

16 24 32 40 48 56 640.10

0.15

0.20

0.25

0.30

0.35

0.40LSHBREKSHGRH

# Bits

mAP

GRH vs Literature (CIFAR-10)

Small amount of supervision

required

16 24 32 40 48 56 640.10

0.15

0.20

0.25

0.30

0.35

0.40LSHBREKSHGRH

# Bits

mAP

+25-30%

GRH vs. Initialisation Strategy (CIFAR-10 @ 32 bits)

GRH (Linear) GRH (RBF)0.00

0.05

0.10

0.15

0.20

0.25

0.30 LSH

ITQ+CCA

mAP

Linear GRH

Non-Linear GRH

GRH vs. Initialisation Strategy (CIFAR-10 @ 32 bits)

GRH (Linear) GRH (RBF)0.00

0.05

0.10

0.15

0.20

0.25

0.30 LSH

ITQ+CCA

mAP

Eigendecompositionnot necessary -saves O(d^3)

GRH vs # Supervisory Data-Points (CIFAR-10)

Linear, T=1K Linear, T=2K RBF, T=1K RBF, T=2K0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

mA

P

LinearGRH

Non-linear GRH

LinearGRH

Non-linear GRH

GRH vs # Supervisory Data-Points (CIFAR-10)

Linear, T=1K Linear, T=2K RBF, T=1K RBF, T=2K0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

mAP

Small amount of supervision

required

GRH Timing (CIFAR-10 @ 32 bits)

Timings (s)Method Train Test TotalGRH 42.68 0.613 43.29KSH [1] 81.17 0.103 82.27

BRE [2] 231.1 0.370 231.4

[1] Liu, W.: Supervised Hashing with Kernels. In: CVPR (2012)[2] Kulis, B.: Binary Reconstructive Embedding. In: NIPS (2009)

Conclusion

Overview

GRH

Evaluation

Conclusion

Conclusions and Future Work

I Supervised hashing model that is both accurate and easilyscalable

I Take-home messages:

I Regularising bits over a graph is effective (and efficient) forhashcode learning

I An intermediate eigendecomposition step is not necessary

I Hyperplanes (linear hypersurfaces) can achieve a very goodretrieval accuracy

I Future work: extend to the cross-modal hashing scenario (e.g.Image ↔ Text, English ↔ Spanish)

Thank you for your attention

Sean Moran

Code and datasets available at:

[email protected]

Engineering

Graph Regularised Hashing (ECIR'15 Talk)