Efficient Image Search and Retrieval using Compact Binary Codes

Rob Fergus (NYU), Antonio Torralba (MIT), Yair Weiss (Hebrew U.)

Large scale image search
The Internet contains many billions of images
How can we search them, based on visual content?
The challenge:
Need a way of measuring similarity between images
Needs to scale to the Internet

Existing approaches to Content-Based Image Retrieval
Focus on scaling rather than on understanding the image
Variety of simple/hand-designed cues: color and/or texture histograms, shape, PCA, etc.
Various distance metrics, e.g. Earth Mover's Distance (Rubner et al. 98)

Most recognition approaches are slow (~1 sec/image)

Our Approach
Learn the metric from training data
Use compact binary codes for speed
DO BOTH TOGETHER

Large scale image/video search
Representation must fit in memory (disk too slow)
Facebook has ~10 billion images (10^10)
PC has ~10 Gbytes of memory (10^11 bits)
Budget of 10^1 bits/image
YouTube has ~a trillion video frames (10^12)
Big cluster of PCs has ~10 Tbytes (10^14 bits)
Budget of 10^2 bits/frame
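The budget figures follow from dividing the available memory by the number of items. A minimal sketch of that arithmetic, using the slide's rounded powers of ten (the function name `bits_per_item` is illustrative only):

```python
def bits_per_item(memory_bits: int, n_items: int) -> int:
    """Bits available per item if the whole index must fit in memory."""
    return memory_bits // n_items

# Rounded figures taken from the slide above.
facebook_budget = bits_per_item(10**11, 10**10)  # ~10 Gbytes, ~10 billion images
youtube_budget = bits_per_item(10**14, 10**12)   # ~10 Tbytes, ~1 trillion frames
print(facebook_budget, youtube_budget)
```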

Binary codes for images
Want images with similar content to have similar binary codes
Use Hamming distance between codes: the number of bit flips
E.g.:
Ham_Dist(10001010, 10001110) = 1
Ham_Dist(10001010, 11101110) = 3
Semantic Hashing [Salakhutdinov & Hinton, 2007]: text documents
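The two Hamming distances in the example can be checked with a few lines of code. This is a minimal sketch that assumes codes are stored as Python integers; the name `hamming_distance` is illustrative and not from the talk.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two integer-encoded codes differ."""
    # XOR leaves a 1 exactly where the codes disagree; count those 1s.
    return bin(a ^ b).count("1")

print(hamming_distance(0b10001010, 0b10001110))  # 1
print(hamming_distance(0b10001010, 0b11101110))  # 3
```

Because the comparison reduces to an XOR and a bit count, it is cheap compared with a distance between real-valued descriptors.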

Semantic Hashing
[Salakhutdinov & Hinton, 2007] for text documents
Quite different from a (conventional) randomizing hash
[Figure: a semantic hash function maps the query image to a binary code, which is used as the query address in the address space; semantically similar images in the database sit at nearby addresses]

Semantic Hashing
Each image code is a memory address
Find neighbors by exploring Hamming ball around query address

[Figure: address space showing the query address and the images in the database]
Choose: code length, radius
Lookup time is independent of # of data points
Depends on radius of ball & length of code:
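The dependence on radius and code length can be made concrete. A ball of radius r around an n-bit address contains the sum of C(n, k) for k = 0..r addresses, and each one is probed once. The sketch below assumes the database is a plain dictionary from address to image ids. The names `hamming_ball`, `ball_size`, and `lookup` and the toy table are illustrative, not from the talk.

```python
from itertools import combinations
from math import comb

def hamming_ball(code: int, n_bits: int, radius: int):
    """Yield every n_bits-long address within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            neighbor = code
            for p in positions:
                neighbor ^= 1 << p  # flip bit p
            yield neighbor

def ball_size(n_bits: int, radius: int) -> int:
    """Number of addresses probed: sum over k <= radius of C(n_bits, k)."""
    return sum(comb(n_bits, k) for k in range(radius + 1))

def lookup(table: dict, query_code: int, n_bits: int, radius: int) -> list:
    """Collect the ids stored at every address in the Hamming ball."""
    hits = []
    for address in hamming_ball(query_code, n_bits, radius):
        hits.extend(table.get(address, []))
    return hits

# Toy database: address -> list of image ids (hypothetical data).
table = {0b10001010: ["a"], 0b10001110: ["b"], 0b11101110: ["c"]}
print(lookup(table, 0b10001010, n_bits=8, radius=1))  # ['a', 'b']
print(ball_size(30, 2))                               # 466
```

The number of probes depends only on `n_bits` and `radius`, never on how many images are stored. It also grows quickly with the radius, so short codes and small radii are what keep the lookup fast.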

Code requirements
Similar images should have similar codes
Very compact (

Input image representation: Gist vectors
Pixels are not a convenient representation
Use Gist descriptor instead (Oliva & Torralba, IJCV 2001)
512 dimensions/image (real-valued, 16,384 bits)
L2 distance btw. Gist vectors is not a bad substitute for human perceptual distance
NO COLOR INFORMATION

1. Locality Sensitive Hashing
Gionis, A. & Indyk, P. & Motwani, R. (1999)
Take random projections of data
Quantize each projection with few bits
No learning involved
[Figure: Gist descriptor mapped to example bits 101]
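A minimal sketch of the random-projection idea. It assumes Gaussian random directions and the simplest quantization, one bit per projection taken from the sign of the dot product. The slide allows a few bits per projection, so this is one possible instance, not the exact scheme used in the talk. The 4-dimensional toy vectors stand in for 512-dimensional Gist descriptors.

```python
import random

def make_projections(n_bits: int, dim: int, seed: int = 0) -> list:
    """Draw n_bits random Gaussian directions in `dim`-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_code(x: list, projections: list) -> int:
    """One bit per projection: 1 if the dot product is positive, else 0."""
    code = 0
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code

projections = make_projections(n_bits=16, dim=4)
x = [0.2, -0.1, 0.4, 0.3]       # toy descriptor (hypothetical values)
x_near = [0.21, -0.1, 0.4, 0.3]  # a slightly perturbed copy
print(bin(lsh_code(x, projections)), bin(lsh_code(x_near, projections)))
```

Nearby vectors fall on the same side of most random hyperplanes, so their codes tend to agree in most bits. Nothing is fitted to the data, which is the "no learning" point on the slide.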

2. Boosting
Modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003]
Positive examples are pairs of similar images
Negative examples are pairs of unrelated images
Learn threshold & dimension for each bit (weak classifier)
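A sketch of one such weak classifier. It assumes each bit thresholds a single input dimension, and that a pair is predicted "similar" when both images get the same bit. The exhaustive search, the toy pairs, and the name `best_weak_classifier` are illustrative assumptions. The full boosting loop, which reweights the pairs and repeats once per bit, is omitted.

```python
def bit(x: list, dim: int, threshold: float) -> int:
    """One bit of the code: threshold a single input dimension."""
    return 1 if x[dim] > threshold else 0

def best_weak_classifier(pairs, labels, weights):
    """Pick the (dim, threshold) with the highest weighted accuracy.

    pairs   : list of (x1, x2) feature-vector pairs
    labels  : +1 for a similar pair, -1 for an unrelated pair
    weights : current boosting weight of each pair
    """
    n_dims = len(pairs[0][0])
    best = (None, None, -1.0)
    for dim in range(n_dims):
        # Candidate thresholds: every value observed in this dimension.
        candidates = sorted({x[dim] for pair in pairs for x in pair})
        for threshold in candidates:
            score = 0.0
            for (x1, x2), y, w in zip(pairs, labels, weights):
                same_bit = bit(x1, dim, threshold) == bit(x2, dim, threshold)
                prediction = 1 if same_bit else -1
                if prediction == y:
                    score += w
            if score > best[2]:
                best = (dim, threshold, score)
    return best

# Toy data: two similar pairs and one unrelated pair (hypothetical values).
pairs = [([0.1, 0.9], [0.2, 0.1]), ([0.8, 0.5], [0.9, 0.4]), ([0.1, 0.5], [0.9, 0.5])]
labels = [1, 1, -1]
weights = [1.0, 1.0, 1.0]
print(best_weak_classifier(pairs, labels, weights))  # (0, 0.2, 3.0)
```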

3. Restricted Boltzmann Machine (RBM)
Type of Deep Belief Network
Hinton & Salakhutdinov, Science 2006
Single RBM layer: attempts to reconstruct input at visible layer from activation of hidden layer
[Figure: visible and hidden layers connected by weights W]

Multi-Layer RBM: non-linear dimensionality reduction
Input Gist vector (512 dimensions)
Layer 1: 512 -> 512 units (weights w1)
Layer 2: 512 -> 256 units (weights w2)
Layer 3: 256 -> N units (weights w3)
Output binary code (N dimensions)
Linear units at first layer
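Once trained, the stack is used as a feed-forward encoder. The sketch below shows only that forward pass, under several assumptions: the weights are untrained random placeholders, every hidden layer is logistic, the real-valued input feeds straight into the first layer (one reading of "linear units at first layer"), and the top layer is thresholded at 0.5. N = 32 is an arbitrary choice for illustration.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(x, weights, biases):
    """Activation probability of each hidden unit given the layer input."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def encode(x, layers):
    """Pass x through every layer, then threshold the top layer at 0.5."""
    h = x
    for weights, biases in layers:
        h = layer_forward(h, weights, biases)
    return [1 if p > 0.5 else 0 for p in h]

def random_layer(n_in, n_out, rng):
    """Untrained placeholder weights; a real model would learn these."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
sizes = [512, 512, 256, 32]  # 512 -> 512 -> 256 -> N, with N = 32 (assumed)
layers = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
gist = [rng.random() for _ in range(512)]  # stand-in for a real Gist vector
code = encode(gist, layers)
print(len(code), code[:8])
```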

Training RBM models
1st Phase: Pre-training
  Unsupervised
  Can use unlabeled data (unlimited quantity)
  Learn parameters greedily per layer
  Gets them to right ballpark
2nd Phase: Fine-tuning
  Supervised
  Requires labeled data (limited quantity)
  Back propagate gradients of chosen error function
  Moves parameters to local minimum
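The slides do not name the update rule used for pre-training. RBMs are commonly trained with contrastive divergence, so the sketch below assumes a single CD-1 step on a tiny RBM with binary visible units. The layer sizes, learning rate, and toy patterns are illustrative. The real first layer takes real-valued Gist input, which this sketch does not model.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(v, W, b_hid):
    """P(h_j = 1 | v); W[j][i] connects visible unit i to hidden unit j."""
    return [sigmoid(sum(w * x for w, x in zip(W[j], v)) + b_hid[j])
            for j in range(len(W))]

def visible_probs(h, W, b_vis):
    """P(v_i = 1 | h), i.e. the reconstruction of the input."""
    return [sigmoid(sum(W[j][i] * h[j] for j in range(len(h))) + b_vis[i])
            for i in range(len(b_vis))]

def cd1_update(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One CD-1 step on a single binary vector; returns reconstruction error."""
    p_h0 = hidden_probs(v0, W, b_hid)
    h0 = [1 if rng.random() < p else 0 for p in p_h0]  # sample hidden states
    v1 = visible_probs(h0, W, b_vis)                   # reconstruct the input
    p_h1 = hidden_probs(v1, W, b_hid)
    for j in range(len(W)):
        for i in range(len(v0)):
            # data-driven statistics minus reconstruction-driven statistics
            W[j][i] += lr * (p_h0[j] * v0[i] - p_h1[j] * v1[i])
        b_hid[j] += lr * (p_h0[j] - p_h1[j])
    for i in range(len(v0)):
        b_vis[i] += lr * (v0[i] - v1[i])
    return sum((a - b) ** 2 for a, b in zip(v0, v1))

rng = random.Random(0)
n_vis, n_hid = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n_vis)] for _ in range(n_hid)]
b_vis, b_hid = [0.0] * n_vis, [0.0] * n_hid
data = [[1, 1, 0, 0], [0, 0, 1, 1]]  # toy training patterns (hypothetical)
for epoch in range(200):
    errors = [cd1_update(v, W, b_vis, b_hid, rng) for v in data]
print(errors)
```

In greedy layer-wise training, the hidden activations of a trained layer become the training data for the next layer. Fine-tuning then adjusts all layers jointly.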

Greedy pre-training (Unsupervised)
Layer 1 (weights w1): input Gist vector (512 real dimensions) -> 512 hidden units
Layer 2 (weights w2): activations of hidden units from layer 1 (512 binary dimensions) -> 256 hidden units
Layer 3 (weights w3): activations of hidden units from layer 2 (256 binary dimensions) -> N hidden units

Fine-tuning: back-propagation of Neighborhood Components Analysis objective
[Figure: input Gist vector (512 real dimensions) -> Layer 1 (512 units, w1) -> Layer 2 (256 units, w2) -> Layer 3 (N units, w3) -> output binary code (N dimensions)]

Neighborhood Components Analysis
Goldberger, Roweis, Salakhutdinov & Hinton, NIPS 2004
Tries to preserve neighborhood structure of input space
Assumes this structure is given (will explain later)
Toy example with 2 classes & N=2 units at top of network:
[Figure: points in output space (coordinate is activation probability of unit)]

Neighborhood Components Analysis
Adjust network parameters (weights and biases) to move:
Points of SAME class closer
Points of DIFFERENT class away
Points close in input space (Gist) will be close in output code space
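The quantity that this movement increases can be written down directly. The sketch below uses the standard NCA formulation: each point picks a neighbor with probability proportional to exp(-squared distance), and the objective is the expected number of points whose chosen neighbor has the same class. The toy points mirror the 2-class, 2-unit example, but the coordinates are made up. The gradient step that back-propagation would take is omitted.

```python
import math

def nca_objective(points, labels):
    """Expected number of points whose stochastically chosen neighbor
    shares their class label (higher is better)."""
    total = 0.0
    for i, yi in enumerate(points):
        # Unnormalized neighbor weights exp(-squared distance), excluding i.
        weights = {}
        for k, yk in enumerate(points):
            if k != i:
                d2 = sum((a - b) ** 2 for a, b in zip(yi, yk))
                weights[k] = math.exp(-d2)
        z = sum(weights.values())
        same = sum(w for k, w in weights.items() if labels[k] == labels[i])
        total += same / z
    return total

labels = [0, 0, 1, 1]
mixed = [(0.4, 0.5), (0.6, 0.5), (0.5, 0.4), (0.5, 0.6)]      # classes interleaved
separated = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (0.8, 0.9)]  # classes pulled apart
print(nca_objective(mixed, labels), nca_objective(separated, labels))
```

The separated configuration scores higher than the mixed one, which is the direction in which fine-tuning pushes the top-layer activations.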

Simple Binarization Strategy
Set threshold, e.g. use median
Deliberately add noise
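A minimal sketch of the median-threshold option. It assumes the thresholds are computed per unit from top-layer activations over a training set. The activation values and function names are hypothetical, and the noise-based alternative is not shown.

```python
from statistics import median

def median_thresholds(activations):
    """Per-unit threshold: the median activation over the training set."""
    n_units = len(activations[0])
    return [median(row[j] for row in activations) for j in range(n_units)]

def binarize(activation, thresholds):
    """Set a bit to 1 where the activation exceeds that unit's threshold."""
    return [1 if a > t else 0 for a, t in zip(activation, thresholds)]

# Hypothetical top-layer activation probabilities for four images, two units.
train = [[0.10, 0.80], [0.20, 0.90], [0.70, 0.85], [0.90, 0.95]]
thresholds = median_thresholds(train)
codes = [binarize(a, thresholds) for a in train]
print(codes)  # [[0, 0], [0, 1], [1, 0], [1, 1]]
```

With a median threshold each bit splits the training set roughly in half. That holds even for a unit like the second one here, whose activations all lie above 0.5.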

Overall Query Scheme
Query image -> compute Gist -> Gist descriptor -> RBM -> binary code -> semantic hash -> retrieved images

Retrieval Experiments

Test set 1: LabelMe
22,000 images (20,000 train | 2,000 test)
Ground truth segmentations for all
Can define ground truth distance btw. images using these segmentations

Defining ground truth
Boosting and NCA back-propagation require ground truth distance between images
Define this using labeled images from LabelMe

Defining ground truth
Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)
Varying spatial resolution to capture approximate spatial correspondence
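A sketch of a spatial pyramid match between two labeled images, using the level weights from Lazebnik et al. It assumes each image is reduced to a small square grid of object labels taken from its segmentation. That representation, the grids, and the function names are assumptions for illustration. The slides do not give the exact histogram setup.

```python
from collections import Counter

def cell_histograms(grid, level):
    """Label histogram of each cell when the grid is split 2^level x 2^level."""
    size = len(grid)      # assumes a square grid
    cells = 2 ** level
    step = size // cells  # assumes size is divisible by 2^level
    hists = []
    for cy in range(cells):
        for cx in range(cells):
            hist = Counter()
            for y in range(cy * step, (cy + 1) * step):
                for x in range(cx * step, (cx + 1) * step):
                    hist[grid[y][x]] += 1
            hists.append(hist)
    return hists

def intersection(grid_a, grid_b, level):
    """Sum of histogram intersections over all cells at one level."""
    total = 0
    pairs = zip(cell_histograms(grid_a, level), cell_histograms(grid_b, level))
    for ha, hb in pairs:
        total += sum(min(ha[label], hb[label]) for label in ha)
    return total

def pyramid_match(grid_a, grid_b, max_level=2):
    """Weighted sum of intersections; finer levels get larger weights."""
    score = intersection(grid_a, grid_b, 0) / 2 ** max_level
    for level in range(1, max_level + 1):
        score += intersection(grid_a, grid_b, level) / 2 ** (max_level - level + 1)
    return score

sky_over_road = [["sky"] * 4] * 2 + [["road"] * 4] * 2
road_over_sky = [["road"] * 4] * 2 + [["sky"] * 4] * 2
print(pyramid_match(sky_over_road, sky_over_road))  # 16.0
print(pyramid_match(sky_over_road, road_over_sky))  # 4.0
```

The two toy images contain the same labels in the same amounts, so they match fully at the coarsest level only. The finer levels penalize the different spatial layout.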

Examples of LabelMe retrieval
12 closest neighbors under different distance metrics

LabelMe Retrieval
[Plot: % of 50 true neighbors in retrieval set vs. size of retrieval set (0 to 20,000)]
[Plot: % of 50 true neighbors in first 500 retrieved vs. number of bits]
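The y-axis quantity in these plots is a recall-style measure. A minimal sketch of how it could be computed for one query, with a hypothetical ranked list and only four true neighbors instead of 50:

```python
def percent_true_neighbors(retrieved_ids, true_neighbor_ids, retrieval_size):
    """Percentage of the true neighbors found among the first
    `retrieval_size` retrieved items."""
    top = set(retrieved_ids[:retrieval_size])
    found = sum(1 for i in true_neighbor_ids if i in top)
    return 100.0 * found / len(true_neighbor_ids)

retrieved = [7, 3, 9, 1, 4, 8]  # ranked by Hamming distance (hypothetical)
truth = [3, 4, 5, 9]            # ground-truth neighbors (hypothetical)
print(percent_true_neighbors(retrieved, truth, 3))  # 50.0
print(percent_true_neighbors(retrieved, truth, 6))  # 75.0
```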

Test set 2: Web images
12.9 million images
Collected from Internet
No labels, so use Euclidean distance between Gist vectors as ground truth distance
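With Euclidean distance as ground truth, the true neighbors of a query are its nearest database vectors. A brute-force sketch on toy 2-dimensional vectors standing in for 512-dimensional Gist descriptors (the function name and data are illustrative):

```python
import math

def true_neighbors(query, database, k):
    """Indices of the k database vectors closest to `query` in L2 distance."""
    distances = [(math.dist(query, x), idx) for idx, x in enumerate(database)]
    return [idx for _, idx in sorted(distances)[:k]]

database = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.1, 0.1)]
print(true_neighbors((0.0, 0.0), database, k=2))  # [0, 3]
```

This exhaustive scan costs time linear in the database size per query, which is the cost the binary codes are meant to avoid at search time.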

Web images retrieval
[Plots: % of 50 true neighbors in retrieval set vs. size of retrieval set]

Examples of Web retrieval
12 neighbors using different distance metrics

Retrieval Timings

Summary
Explored various approaches to learning binary codes for hashing-based retrieval
Very quick, with performance comparable to complex descriptors

More recent work on binarization
Spectral Hashing (Weiss, Torralba, Fergus NIPS 2009)
