Efficient Image Search and Retrieval using Compact Binary Codes
Rob Fergus (NYU), Antonio Torralba (MIT), Yair Weiss (Hebrew U.)
Large scale image search
- Internet contains many billions of images
- How can we search them, based on visual content?
The Challenge:
- Need a way of measuring similarity between images
- Needs to scale to the Internet
Existing approaches to Content-Based Image Retrieval
- Focus on scaling rather than on understanding the image
- Variety of simple/hand-designed cues: color and/or texture histograms, shape, PCA, etc.
- Various distance metrics, e.g. Earth Mover's Distance (Rubner et al. 98)
- Most recognition approaches are slow (~1 sec/image)
Our Approach
- Learn the metric from training data
- Use compact binary codes for speed
- DO BOTH TOGETHER
Large scale image/video search
- Representation must fit in memory (disk too slow)
- Facebook has ~10 billion images (10^10); a PC has ~10 Gbytes of memory (10^11 bits); budget of 10^1 bits/image
- YouTube has ~a trillion video frames (10^12); a big cluster of PCs has ~10 Tbytes (10^14 bits); budget of 10^2 bits/frame
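The bit budgets above come from a single division. The sketch below redoes that arithmetic under two assumptions: decimal units (1 GB = 10^9 bytes) and 8 bits per byte. The helper name `bits_per_item` is hypothetical. It yields 8 and 80 bits, which the slide rounds to the nearest power of ten.

```python
import math

def bits_per_item(memory_bytes: float, num_items: float) -> float:
    """Bits of memory available per stored item (hypothetical helper)."""
    return memory_bytes * 8 / num_items  # 8 bits per byte

# ~10 GB of RAM on one PC, ~10 billion Facebook images
facebook = bits_per_item(10e9, 10e9)
# ~10 TB across a cluster, ~1 trillion YouTube frames
youtube = bits_per_item(10e12, 1e12)

for name, bits in [("Facebook", facebook), ("YouTube", youtube)]:
    # Report the exact figure and its order of magnitude (power of ten).
    print(f"{name}: {bits:g} bits/item, order 10^{round(math.log10(bits))}")
```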
Binary codes for images
- Want images with similar content to have similar binary codes
- Use Hamming distance between codes (number of bit flips), e.g.:
  Ham_Dist(10001010, 10001110) = 1
  Ham_Dist(10001010, 11101110) = 3
- Semantic Hashing [Salakhutdinov & Hinton, 2007] for text documents
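Hamming distance counts the positions where two codes differ. In the minimal Python sketch below, the function name is our own and does not come from the talk. It XORs the codes and counts the set bits, which reproduces both examples above.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length binary codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    # XOR leaves a 1 exactly where the codes differ; count those bit flips.
    return bin(int(a, 2) ^ int(b, 2)).count("1")

print(hamming_distance("10001010", "10001110"))  # 1
print(hamming_distance("10001010", "11101110"))  # 3
```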
Semantic Hashing
[Figure: Query Image → Semantic Hash Function → Binary code (Query address) in the Address Space; Images in database; Semantically similar images]
- [Salakhutdinov & Hinton, 2007] for text documents
- Quite different from a (conventional) randomizing hash
Semantic Hashing
- Each image code is a memory address
- Find neighbors by exploring a Hamming ball around the query address
[Figure: Address Space with Query address and Images in database; choose Code length and Radius]
- Lookup time is independent of the number of data points
- It depends on the radius of the ball and the length of the code
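The following sketch shows Hamming-ball lookup. It makes two assumptions: codes are stored as Python integers, and a dict stands in for the address space (a real system would index memory directly). The helper names are hypothetical. For an N-bit code, the number of addresses probed is the sum over k = 0..r of C(N, k). That count depends on N and r but not on the database size.

```python
from collections import defaultdict
from itertools import combinations
from math import comb

def hamming_ball(address, n_bits, radius):
    """Yield every n_bits-bit address within `radius` bit flips of `address`."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = address
            for p in positions:
                flipped ^= 1 << p  # flip bit p
            yield flipped

def build_table(codes):
    """Hypothetical index: each code is used as a key (address) for the ids stored there."""
    table = defaultdict(list)
    for image_id, code in enumerate(codes):
        table[code].append(image_id)
    return table

def lookup(table, query, n_bits, radius):
    """Ids of all database images whose code lies inside the query's Hamming ball."""
    return [i for addr in hamming_ball(query, n_bits, radius) for i in table.get(addr, [])]

n_bits, radius = 8, 2
# Addresses probed: C(8,0) + C(8,1) + C(8,2) = 1 + 8 + 28, whatever the database size.
print(sum(comb(n_bits, k) for k in range(radius + 1)))  # 37

table = build_table([0b10001010, 0b10001110, 0b11101110])
print(lookup(table, 0b10001010, n_bits, radius))  # [0, 1]
```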
Input Image representation: Gist vectors
- Pixels are not a convenient representation
- Use the Gist descriptor instead (Oliva & Torralba, IJCV 2001)
- 512 dimensions/image (real-valued, i.e. 16,384 bits at 32 bits per dimension)
- L2 distance between Gist vectors is not a bad substitute for human perceptual distance
- NO COLOR INFORMATION
1. Locality Sensitive Hashing
Gionis, A., Indyk, P. & Motwani, R. (1999)
- Take random projections of data
- Quantize each projection with few bits
- No learning involved
[Figure: Gist descriptor → random projections → bits 1 0 1]
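Random-projection LSH takes only a few lines of NumPy. This sketch rests on several assumptions: Gaussian projection directions, one bit per projection (the sign after centering the data), and a fixed seed. The slide says only "few bits" per projection, so one bit is a simplification. Centering uses the data mean, which is the only data-dependent step.

```python
import numpy as np

def lsh_codes(X: np.ndarray, n_bits: int, seed: int = 0) -> np.ndarray:
    """Random-projection hash: one bit per projection, no learning involved.

    X: (n_images, d) array of descriptors, e.g. 512-D Gist vectors.
    Returns an (n_images, n_bits) array of 0/1 codes.
    """
    rng = np.random.default_rng(seed)
    # Random Gaussian projection directions (data-independent).
    W = rng.standard_normal((X.shape[1], n_bits))
    # Center the data so the sign threshold splits it roughly in half.
    projections = (X - X.mean(axis=0)) @ W
    # Quantize each projection to a single bit by its sign.
    return (projections > 0).astype(np.uint8)

X = np.random.default_rng(1).random((1000, 512))  # stand-in for 512-D Gist vectors
codes = lsh_codes(X, n_bits=32)
print(codes.shape)  # (1000, 32)
```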
2. Boosting
- Modified form of BoostSSC [Shakhnarovich, Viola & Darrell, 2003]
- Positive examples are pairs of similar images
- Negative examples are pairs of unrelated images
- Learn threshold & dimension for each bit (weak classifier)
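Below is a simplified, AdaBoost-style sketch of this idea. It follows the spirit of BoostSSC and is not the authors' modified version. Each weak classifier is a (dimension, threshold) pair that produces one bit. It votes "similar" for a pair when both images receive the same bit. The candidate thresholds at data quantiles and the reweighting rule are our own assumptions.

```python
import numpy as np

def fit_stump_bit(X1, X2, y, w, n_thresholds=16):
    """Pick the (dimension, threshold) whose bit best agrees with the pair labels.

    X1, X2: (n_pairs, d) descriptors of the two images in each pair.
    y: +1 for similar pairs, -1 for unrelated pairs; w: pair weights summing to 1.
    Bit b(x) = [x[dim] > thr]; predict +1 when both images get the same bit.
    """
    best = (np.inf, None, None, None)
    for dim in range(X1.shape[1]):
        values = np.concatenate([X1[:, dim], X2[:, dim]])
        for thr in np.quantile(values, np.linspace(0.05, 0.95, n_thresholds)):
            h = np.where((X1[:, dim] > thr) == (X2[:, dim] > thr), 1, -1)
            err = w[h != y].sum()
            if err < best[0]:
                best = (err, dim, thr, h)
    return best

def boost_bits(X1, X2, y, n_bits):
    """AdaBoost-style loop: each round adds one bit and reweights the pairs."""
    w = np.full(len(y), 1.0 / len(y))
    bits = []
    for _ in range(n_bits):
        err, dim, thr, h = fit_stump_bit(X1, X2, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * h)  # upweight pairs this bit gets wrong
        w /= w.sum()
        bits.append((dim, thr))
    return bits
```

Each learned `(dim, thr)` defines one bit of the code: image x gets bit `x[dim] > thr`.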
3. Restricted Boltzmann Machine (RBM)
- Stacked RBMs form a Deep Belief Network (Hinton & Salakhutdinov, Science 2006)
- Single RBM layer with weights W: attempts to reconstruct the input at the visible layer from the activation of the hidden layer
Multi-Layer RBM: non-linear dimensionality reduction
[Figure: Input Gist vector (512 dimensions) → Layer 1 (w1: 512 → 512) → Layer 2 (w2: 512 → 256) → Layer 3 (w3: 256 → N) → Output binary code (N dimensions)]
- Linear units at first layer
Training RBM models
1st Phase: Pre-training
- Unsupervised
- Can use unlabeled data (unlimited quantity)
- Learn parameters greedily per layer
- Gets them to the right ballpark
2nd Phase: Fine-tuning
- Supervised
- Requires labeled data (limited quantity)
- Back-propagate gradients of the chosen error function
- Moves parameters to a local minimum
Greedy pre-training (Unsupervised)
- Layer 1 (w1: 512 → 512): trained on the input Gist vectors (512 real dimensions)
- Layer 2 (w2: 512 → 256): trained on the activations of hidden units from layer 1 (512 binary dimensions)
- Layer 3 (w3: 256 → N): trained on the activations of hidden units from layer 2 (256 binary dimensions)
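The sketch below covers both the single-layer RBM and the greedy, layer-by-layer pre-training described above. It is a simplification with the following assumptions:
- One contrastive-divergence (CD-1) update per epoch on the full batch.
- Binary-style visible units everywhere. The talk uses linear units in the first layer.
- Hidden activations thresholded at 0.5 to give the "binary dimensions" fed to the next layer.

The learning rate, epoch count and code length N = 32 are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=10, lr=0.05, seed=0):
    """Train one RBM layer with CD-1 on the full batch; return (weights, hidden biases)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis, b_hid = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        h_prob = sigmoid(data @ W + b_hid)                 # up: hidden given data
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h_sample @ W.T + b_vis)          # down: reconstruct input
        h_recon = sigmoid(v_recon @ W + b_hid)             # up again from reconstruction
        # CD-1 rule: data correlations minus reconstruction correlations.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_hid

def greedy_pretrain(X, layer_sizes):
    """Train layers bottom-up; each layer's binary hidden activations train the next."""
    layers, inputs = [], X
    for i, n_hidden in enumerate(layer_sizes):
        W, b = train_rbm(inputs, n_hidden, seed=i)
        layers.append((W, b))
        inputs = (sigmoid(inputs @ W + b) > 0.5).astype(float)
    return layers

def encode(X, layers):
    """Forward pass; returns the top layer's activation probabilities (before binarization)."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X

# Hypothetical sizes following the slide: 512 -> 512 -> 256 -> N, here N = 32.
X = np.random.default_rng(0).random((100, 512))   # stand-in for Gist vectors
layers = greedy_pretrain(X, [512, 256, 32])
codes = encode(X, layers)
print(codes.shape)  # (100, 32)
```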
Fine-tuning: back-propagation of Neighborhood Components Analysis objective
[Figure: Input Gist vector (512 real dimensions) → Layer 1 (w1: 512 → 512) → Layer 2 (w2: 512 → 256) → Layer 3 (w3: 256 → N) → Output binary code (N dimensions)]
Neighborhood Components Analysis
Goldberger, Roweis, Hinton & Salakhutdinov, NIPS 2004
- Tries to preserve the neighborhood structure of the input space
- Assumes this structure is given (will explain later)
[Figure: toy example with 2 classes & N=2 units at top of network; points in output space, where each coordinate is the activation probability of a unit]
Neighborhood Components Analysis
- Adjust network parameters (weights and biases) to move:
  - Points of SAME class closer
  - Points of DIFFERENT class away
- Points close in input space (Gist) will be close in output code space
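The NCA objective is short to write down. For output-space points y_i, NCA defines the probability that point i picks point j as its neighbor:

p_ij = exp(-||y_i - y_j||^2) / sum_{k != i} exp(-||y_i - y_k||^2)

It then maximizes sum_i sum_{j in class(i)} p_ij, the expected number of same-class neighbors. The NumPy sketch below only evaluates this objective, using class labels as in the toy example. Fine-tuning would back-propagate its gradient through the network, e.g. with an autodiff library. In the experiments, the neighborhood structure comes from a ground-truth distance between images rather than from class labels.

```python
import numpy as np

def nca_objective(Y, labels):
    """Expected number of same-class neighbors under NCA's soft neighbor choice."""
    sq_dists = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dists, np.inf)       # a point never picks itself (k != i)
    logits = -sq_dists
    # Shift each row by its largest logit (smallest distance) for numerical stability.
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)        # row i holds p_ij over j
    same = labels[:, None] == labels[None, :]
    return float((P * same).sum())

Y = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]])
print(nca_objective(Y, np.array([0, 0, 1, 1])))  # close to 4.0, the maximum for 4 points
```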
Simple Binarization Strategy
- Set threshold, e.g. use median
- Deliberately add noise
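Thresholding at the median gives each bit a 50/50 split over the training set. This minimal sketch assumes `activations` holds the top-layer activation probabilities, one row per image. The noise-injection step is not shown.

```python
import numpy as np

def fit_median_thresholds(activations):
    """Per-unit thresholds: the median activation of each code unit over training data."""
    return np.median(activations, axis=0)

def binarize(activations, thresholds):
    """Bit is 1 where a unit's activation exceeds its threshold."""
    return (activations > thresholds).astype(np.uint8)

acts = np.random.default_rng(0).random((1000, 32))  # stand-in for code-layer activations
bits = binarize(acts, fit_median_thresholds(acts))
print(bits.mean(axis=0)[:4])  # each bit is 1 for half of the training images
```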
Retrieval Experiments
Test set 1: LabelMe
- 22,000 images (20,000 train | 2,000 test)
- Ground truth segmentations for all
- Can define ground truth distance between images using these segmentations
Defining ground truth
- Boosting and NCA back-propagation require a ground truth distance between images
- Define this using labeled images from LabelMe
- Pyramid Match (Lazebnik et al. 2006, Grauman & Darrell 2005)
- Varying spatial resolution to capture approximate spatial correspondence
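A spatial pyramid match in the spirit of Lazebnik et al. (2006) can be computed on per-pixel label maps. At each level l, the image is split into a 2^l × 2^l grid and the per-cell object-class histograms of the two images are intersected, giving I_l. Matches found at finer levels count more:

score = I_L + sum_{l=0}^{L-1} (I_l - I_{l+1}) / 2^(L-l)

How the segmentations are used here is our assumption; the slides do not give the exact variant. The result is a similarity (larger means closer), so it would still need converting to a distance.

```python
import numpy as np

def level_histograms(labels, num_classes, level):
    """Per-class pixel counts in each cell of a 2^level x 2^level grid."""
    n = 2 ** level
    H, W = labels.shape
    assert H % n == 0 and W % n == 0, "map size must be divisible by the grid"
    return np.stack([
        (labels == c).reshape(n, H // n, n, W // n).sum(axis=(1, 3))
        for c in range(num_classes)
    ])  # shape (num_classes, n, n)

def pyramid_match(a, b, num_classes, levels):
    """Weighted histogram intersection over pyramid levels 0..levels."""
    I = [np.minimum(level_histograms(a, num_classes, l),
                    level_histograms(b, num_classes, l)).sum()
         for l in range(levels + 1)]
    score = float(I[levels])
    for l in range(levels):
        # New matches at coarser level l, down-weighted by 1 / 2^(levels - l).
        score += (I[l] - I[l + 1]) / 2 ** (levels - l)
    return score

a = np.zeros((4, 4), dtype=int)
a[:, :2] = 1          # class 1 on the left half
b = 1 - a             # same classes, mirrored layout
print(pyramid_match(a, a, 2, 2), pyramid_match(a, b, 2, 1))  # 16.0 8.0
```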
Examples of LabelMe retrieval
- 12 closest neighbors under different distance metrics
LabelMe Retrieval
[Plot: % of 50 true neighbors in retrieval set vs. size of retrieval set (0 to 20,000)]
[Plot: % of 50 true neighbors in first 500 retrieved vs. number of bits]
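The retrieval curves report the percentage of an image's 50 true neighbors that appear in its retrieval set. This small sketch of that measure assumes the database is ranked by Hamming distance to the query code, with ties broken by index. The function names are hypothetical.

```python
import numpy as np

def rank_by_hamming(query_code, db_codes):
    """Database indices sorted by Hamming distance to the query (0/1 arrays)."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists, kind="stable")  # stable sort: ties broken by index

def percent_true_neighbors(ranked_ids, true_neighbor_ids, retrieval_size):
    """Percent of the true neighbors found in the first `retrieval_size` results."""
    retrieved = {int(i) for i in ranked_ids[:retrieval_size]}
    hits = sum(1 for i in true_neighbor_ids if int(i) in retrieved)
    return 100.0 * hits / len(true_neighbor_ids)

db = np.array([[0, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1]])
ranking = rank_by_hamming(np.array([0, 0, 0]), db)
print(percent_true_neighbors(ranking, [0, 1], retrieval_size=2))  # 50.0
```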
Test set 2: Web images
- 12.9 million images
- Collected from the Internet
- No labels, so use Euclidean distance between Gist vectors as ground truth distance
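For the web set, the ground-truth neighbors are simply the nearest Gist vectors in Euclidean distance. The sketch below is a brute-force version for one query. At 12.9 million × 512 dimensions, a real run would need to process the database in chunks. The function name is hypothetical.

```python
import numpy as np

def true_neighbors(gist_db, query, k=50):
    """Indices of the k database vectors closest to `query` in Euclidean (L2) distance."""
    dists = np.linalg.norm(gist_db - query, axis=1)
    return np.argsort(dists, kind="stable")[:k]

gist_db = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
print(true_neighbors(gist_db, np.array([0.0, 0.0]), k=2))  # [0 2]
```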
Web images retrieval
[Plots: % of 50 true neighbors in retrieval set vs. size of retrieval set]
Examples of Web retrieval
- 12 neighbors using different distance metrics
Retrieval Timings
Summary
- Explored various approaches to learning binary codes for hashing-based retrieval
- Very quick, with performance comparable to complex descriptors
More recent work on binarization
- Spectral Hashing (Weiss, Torralba, Fergus, NIPS 2009)