Semi-Supervised Hashing for Scalable Image Retrieval

Jun Wang (1), Sanjiv Kumar (2), and Shih-Fu Chang (1)
(1) Department of Electrical Engineering, Columbia University, New York, USA
(2) Google Research, New York, USA

Introduction and Motivation

The emerging need for fast and accurate indexing in large-scale content-based image retrieval (CBIR):
- Exhaustive search is infeasible due to its computational and storage cost.
- Instead of exact nearest neighbors, approximate nearest neighbor (ANN) search is more practical for large databases.
- Compared with tree-based ANN methods, the hashing approach is more scalable and efficient.

Main issues:
- Existing hashing methods mostly rely on random or principal projections, which are not very compact or accurate.
- Simple metrics are usually not enough to express semantic similarity.

Goal: Given a large unlabeled set with a few pair-wise labeled points, learn data-dependent compact hash codes. (Figure: a good partition separates dissimilar pairs while keeping similar pairs together; a poor partition does not.)

Related Work

- Locality Sensitive Hashing (LSH), Indyk et al., 1998: hash functions are random projections sampled from a suitable distribution, combined with a random shift and a bucket width tied to the data range.
- Spectral Hashing (SH), Weiss et al., 2008: assumes zero-centered data and uses a mean threshold; bits come from sinusoidal binarization over PCA projections (the principal projections of the data), with bit assignment governed by spatial frequency.
- Restricted Boltzmann Machines (RBMs), Hinton et al., 2006: learn deep belief networks to obtain compact binary codes; training has two steps, unsupervised pre-training and supervised fine-tuning; estimating a large number of weights requires lots of labeled data and long training time.
- (Figure: conceptual comparison of the proposed SSH method with LSH, SH, and RBMs.)
- The other semi-supervised hashing work in this year's CVPR: Y. Mu et al., Weakly-Supervised Hashing in Kernel Space.

A toy sketch of the LSH and SH baselines follows this section.
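To make the two unsupervised baselines above concrete, here is a minimal NumPy sketch. It is not the authors' code, and several details are simplifying assumptions: a Gaussian projection distribution and fixed bucket width stand in for LSH's data-range-dependent quantization, and a single sinusoid per PCA direction stands in for SH's selection of eigenfunctions by spatial frequency.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(X, n_bits=16, bucket_width=1.0):
    """Toy LSH: random projections plus a random shift, quantized into buckets."""
    d = X.shape[1]
    W = rng.normal(size=(d, n_bits))            # projections sampled from a Gaussian
    b = rng.uniform(0, bucket_width, n_bits)    # random shift
    return np.floor((X @ W + b) / bucket_width).astype(int)

def sh_hash(X, n_bits=16):
    """Toy Spectral-Hashing-style codes: mean threshold (zero-centering),
    PCA projections, then sinusoidal binarization."""
    Xc = X - X.mean(axis=0)                     # zero-centered data / mean threshold
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:n_bits].T                   # principal projections of the data
    scale = np.abs(proj).max(axis=0) + 1e-12    # simplified per-bit frequency choice
    return (np.sin(np.pi * proj / scale) > 0).astype(np.uint8)

X = rng.normal(size=(1000, 64))                 # stand-in for image descriptors
print(lsh_hash(X).shape, sh_hash(X).shape)      # (1000, 16) (1000, 16)
```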
Semi-Supervised Formulation

Incorporate both metric similarity and semantic similarity for fast and accurate indexing.

Given pair-wise labeled data (a small set of similar pairs and dissimilar pairs) together with the large unlabeled set:
- Empirical fitness: hash bits should agree on similar pairs and disagree on dissimilar pairs.
- Regularization of maximum variance: each bit should have maximum variance over the full data set.
- Final objective function: empirical fitness over the labeled pairs plus the maximum-variance regularizer.

A hedged reconstruction of the objective in equation form follows this section.
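The poster only names the pieces of the formulation; the equations below are a hedged reconstruction based on the surrounding descriptions and the accompanying CVPR 2010 paper. The notation is assumed: hash functions h_k(x) = sgn(w_k^T x) on mean-centered data, labeled points X_l with pair-wise labels S_ij = +1 for similar and -1 for dissimilar pairs, all data X, projection matrix W = [w_1, ..., w_K], and a weighting coefficient eta.

```latex
% Empirical fitness over the labeled pairs (reconstruction, notation assumed):
J(W) \;=\; \sum_{k=1}^{K} \sum_{(i,j)\ \text{labeled}} S_{ij}\, h_k(x_i)\, h_k(x_j)

% Relaxed matrix form plus the maximum-variance regularizer over all data:
J(W) \;=\; \tfrac{1}{2}\,\mathrm{tr}\!\big(W^{\top} X_l S X_l^{\top} W\big)
        \;+\; \tfrac{\eta}{2}\,\mathrm{tr}\!\big(W^{\top} X X^{\top} W\big)
      \;=\; \tfrac{1}{2}\,\mathrm{tr}\!\big(W^{\top} M W\big),
\qquad M \;=\; X_l S X_l^{\top} + \eta\, X X^{\top}
```

Maximizing tr(W^T M W) subject to W^T W = I is what the next section calls the standard eigen-problem on the "adjusted" covariance matrix M.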
Orthogonal Hash Codes

- Simplified objective function in matrix form: find maximum-variance projections over the data covariance matrix "adjusted" by incorporating the pair-wise labeled data.
- By imposing orthogonality constraints, the objective function can be converted to a standard eigen-problem.
- Orthogonal solution: the principal projections of the "adjusted" covariance matrix.
- Issues: most real datasets have their variance concentrated in the top few projections; the orthogonality constraints force the method to progressively pick low-variance directions, substantially reducing the quality of the Hamming embedding.

Non-Orthogonal Hash Codes

- Relax the orthogonality constraints and combine them into the objective function as a penalty term.
- Can be solved by a simple modification of the previous orthogonal solution: choose a proper coefficient to guarantee a positive-definite matrix, then use a Cholesky factorization to obtain the non-orthogonal solution.
- (A numerical sketch of both solvers follows the Conclusions.)

Experiments and Conclusions

- MNIST (70K images, 2K training labels, 1K test samples).
- One million GIST descriptors (10K training labels, 2K test samples).
- (Figures: retrieval performance on both datasets, comparing supervised and unsupervised methods.)

Conclusions:
- A semi-supervised formulation for hash function learning.
- Orthogonal and non-orthogonal solutions by simple SVD.
- Easy implementation, fast training, and superior performance.
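The NumPy sketch below illustrates the two solvers just described, using the reconstructed objective from the formulation section. It is an illustration rather than the authors' implementation: the adjusted matrix M, the weighting eta, the penalty coefficient rho, and especially the way the Cholesky factor is combined with the orthogonal solution are assumptions; the paper gives the exact derivation.

```python
import numpy as np

def adjusted_covariance(X, X_l, S, eta=1.0):
    """Matrix form from the previous section, written for row-major data
    (rows of X and X_l are points): M = X_l^T S X_l + eta * X^T X."""
    return X_l.T @ S @ X_l + eta * (X.T @ X)            # (d, d), symmetric

def orthogonal_solution(M, n_bits):
    """Orthogonal hash codes: top eigenvectors (principal projections) of M."""
    eigvals, eigvecs = np.linalg.eigh(M)                 # ascending eigenvalues
    return eigvecs[:, ::-1][:, :n_bits]                  # (d, n_bits)

def non_orthogonal_solution(M, n_bits, rho=None):
    """Non-orthogonal relaxation (assumed form): pick rho so that I + M/rho is
    positive-definite, Cholesky-factorize it, and use the factor to modify the
    orthogonal solution. The exact transform is given in the paper."""
    if rho is None:
        rho = abs(np.linalg.eigvalsh(M).min()) + 1.0     # guarantees positive-definiteness
    L = np.linalg.cholesky(np.eye(M.shape[0]) + M / rho)
    return L @ orthogonal_solution(M, n_bits)            # assumed combination step

# Toy usage: random data with a handful of similar (+1) / dissimilar (-1) pairs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))
X_l = X[:20]
S = np.sign(rng.normal(size=(20, 20)))
S = (S + S.T) / 2                                        # symmetric toy label matrix
W = orthogonal_solution(adjusted_covariance(X, X_l, S), n_bits=16)
codes = (X @ W > 0).astype(np.uint8)                     # sign binarization -> 16-bit codes
```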

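For completeness, here is a small sketch of how such codes would be used at query time: rank database items by Hamming distance to the query code. This mirrors the usual hashing evaluation setup; the exact evaluation metrics used in the experiments are described in the paper, and the helper below is only illustrative.

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Indices of database items sorted by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

# Example with random 16-bit codes; in practice these come from (X @ W > 0).
rng = np.random.default_rng(1)
db_codes = rng.integers(0, 2, size=(1000, 16), dtype=np.uint8)
query_code = db_codes[0]
print(hamming_rank(query_code, db_codes)[:10])           # ten nearest database items
```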
