Similarity Search in Visual Data
Ph.D. Thesis Defense
Anoop Cherian*
Department of Computer Science and Engineering, University of Minnesota, Twin Cities
Adviser: Prof. Nikolaos Papanikolopoulos
*Contact: [email protected]
Slide 2
Talk Outline
- Introduction
- Problem Statement
- Algorithms for Similarity Search in Matrix Valued Data
- Algorithms for Similarity Search in High Dimensional Vector Data
- Conclusion
- Future Work
Slide 3
Thesis Related Publications

Journals
1. A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Jensen-Bregman LogDet Divergence with Application to Efficient Similarity Search for Covariance Matrices. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), [accepted with minor revisions]. (Chapter 3)
2. A. Cherian, V. Morellas, and N. Papanikolopoulos. Efficient Nearest Neighbor Retrieval via Sparse Coding. Pattern Recognition Journal, [being submitted]. (Chapters 5, 7)

Conference Publications
1. A. Cherian, S. Sra, A. Banerjee, and N. Papanikolopoulos. Efficient Similarity Search for Covariance Matrices via the Jensen-Bregman LogDet Divergence. Intl. Conf. on Computer Vision (ICCV), 2011. (Chapter 3)
2. A. Cherian, V. Morellas, N. Papanikolopoulos, and S. Bedros. Dirichlet Process Mixture Models on Symmetric Positive Definite Matrices for Appearance Clustering in Video Surveillance Applications. Computer Vision and Pattern Recognition (CVPR), 2011. (Chapter 4)
Slide 4
Thesis Related Publications (continued)

3. A. Cherian, J. Andersh, V. Morellas, N. Papanikolopoulos, and B. Mettler. Motion Estimation of a Miniature Helicopter using a Single Onboard Camera. American Control Conference (ACC), 2010. (Chapter 5)
4. A. Cherian, S. Sra, and N. Papanikolopoulos. Denoising Sparse Noise via Online Dictionary Learning. Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2011. (Chapter 6)
5. A. Cherian, V. Morellas, and N. Papanikolopoulos. Robust Sparse Hashing. Intl. Conf. on Image Processing (ICIP), 2012. (Chapter 6) [Best Student Paper Award]
6. A. Cherian, V. Morellas, and N. Papanikolopoulos. Approximate Nearest Neighbors via Dictionary Learning. Proceedings of SPIE, 2011. (Chapters 5, 6, 7)
7. S. Sra and A. Cherian. Generalized Dictionary Learning for Symmetric Positive Definite Matrices with Application to Nearest Neighbor Retrieval. European Conference on Machine Learning (ECML), 2011. (Chapter 8)
8. A. Cherian and N. Papanikolopoulos. Large Scale Image Search via Sparse Coding. Minnesota Supercomputing Institute (MSI) Poster Presentation, 2012. [Best Poster Award]
Slide 5
Talk Outline
- Introduction
- Motivation
- Problem Statement
- Algorithms for Similarity Search in Matrix Valued Data
- Algorithms for Similarity Search in High Dimensional Vector Data
- Conclusion
- Future Work
Slide 6
[Figure courtesy of Intel]
Slide 7
Big-Data Challenge
How to connect the information seeker to the right content?

Solution: similarity search.

Three fundamental steps in similarity search:
1. Represent the data
2. Describe the query
3. Retrieve the data most similar to the query
Slide 8
Visual Data Challenges
[Art courtesy of Thomas Kinkade, Pastoral House]

"Never express yourself more clearly than you are able to think." (Niels Bohr)

It is sometimes difficult to describe precisely, in words, what data is to be retrieved. This is especially true in visual content retrieval, where similarity is judged by an unconscious process. Characterizing what we see is therefore hard, and teaching a machine visual similarity is harder still.
Slide 9
A Few Applications using Visual Similarity Search
- Content-based image retrieval
- Medical image analysis
- 3D reconstruction
- Visual surveillance
- Human-machine interaction
Slide 10
3D Scene Reconstruction: Technical Analysis
[Images courtesy: Google Street View]

Goal: 3D street view
Input: a set of images

Algorithm (a two-view sketch follows below):
1. Find point correspondences between pairs of images
2. Estimate camera parameters
3. Estimate camera motion
4. Estimate 3D point locations
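As an illustration only (not the pipeline used in the thesis), here is a minimal two-view version of these four steps using OpenCV; the images `img1`, `img2` and the 3x3 camera intrinsics matrix `K` are assumed to be given:

```python
import cv2
import numpy as np

# 1. Find point correspondences via SIFT + ratio-test matching
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# 2-3. Estimate camera geometry and relative motion
E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)

# 4. Triangulate 3D point locations from the two camera poses
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)   # 4xN homogeneous
X = (X_h[:3] / X_h[3]).T                              # Nx3 Euclidean points
```

Step 1 is exactly where similarity search enters: matching descriptors between images is a nearest neighbor problem, and its cost dominates at scale, as the next slide quantifies.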
Slide 11
3D Scene Reconstruction: Technical Analysis
[Images courtesy: Google Street View]

- SIFT descriptors (128-D) are typically used as the point descriptors.
- Each image produces several thousand SIFT descriptors (say 10K SIFTs/image).
- A reliable reconstruction requires several thousand images (assume 1K images).
- Thus there are approximately 10K x 1K = 10^7 SIFTs, and pairwise matching requires on the order of (10^7)^2 = 10^14 comparisons!
- And this is for only one scene; think of millions of scenes in a Street View application.

Computational bottleneck: efficient similarity computation!
Slide 12
Talk Outline
- Introduction
- Motivation
- Problem Statement
- Algorithms for Similarity Search in Matrix Valued Data
- Algorithms for Similarity Search in High Dimensional Vector Data
- Conclusion
- Future Work
Slide 13
Problem Statement: Approximate Nearest Neighbor (ANN) retrieval (stated below).
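For reference, the standard (1+epsilon)-ANN formulation:

```latex
% Given a database X = \{x_1, \dots, x_N\} in a metric space (M, d)
% and a query q, return a point x \in X such that
\[
  d(q, x) \;\le\; (1+\epsilon)\, \min_{x' \in X} d(q, x'), \qquad \epsilon > 0.
\]
```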
Slide 14
Problem Challenges
- High dimensional data
  - Poses the curse of dimensionality: it becomes difficult to distinguish near from far points
  - Examples: SIFT (128-D), GIST (960-D)
- Large scale datasets
  - A needle in the haystack: petabytes of visual data and billions of data descriptors

Desired Similarity Search Algorithm Properties
- High retrieval accuracy
- Fast retrieval
- Low memory footprint
- Scalability to large datasets
- Scalability to high dimensional data
- Robustness to data perturbations
- Generalizability to various data descriptors

[Figure: unit ball inside a unit hypercube (see the numerical check below)]
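The figure's point can be checked numerically: the volume of the ball inscribed in the unit hypercube vanishes as the dimension grows, so almost all of the cube's volume sits in its corners. A quick sketch:

```python
import numpy as np
from scipy.special import gammaln

def inscribed_ball_volume(d, r=0.5):
    """Volume of the radius-r ball in d dimensions:
    V = pi^(d/2) / Gamma(d/2 + 1) * r^d  (computed in log-space)."""
    return np.exp(0.5 * d * np.log(np.pi) - gammaln(d / 2 + 1) + d * np.log(r))

for d in [2, 10, 128, 960]:        # e.g. SIFT is 128-D, GIST is 960-D
    print(d, inscribed_ball_volume(d))   # fraction of the unit cube filled
# already ~2.5e-3 at d=10, and astronomically small at d=128
```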
Slide 15
Thesis Contributions
We propose NN retrieval algorithms for two different data modalities:

Matrix valued data (as symmetric positive definite matrices)
- A new similarity distance: the Jensen-Bregman LogDet Divergence
- An unsupervised clustering algorithm

High dimensional vector valued data
- A novel connection between sparse coding and hashing
- A fast and accurate hashing algorithm for NN retrieval

We also provide theoretical analysis of our algorithms, and experimental validation against the state-of-the-art techniques in NN retrieval on several computer vision datasets.
Slide 16
Talk Outline
- Introduction
- Motivation
- Problem Statement
- Algorithms for Similarity Search in Matrix Valued Data
- Algorithms for Similarity Search in High Dimensional Vector Data
- Conclusion
- Future Work
Slide 17
Matrix (Covariance) Valued Data
Pipeline: appearance silhouette -> per-pixel features (color + gradient + curvature) -> covariance of features -> covariance descriptor. (A small sketch of computing such a descriptor follows below.)

Advantages:
- Multi-feature fusion
- Compact
- Computable in real time
- Robust to static noise
- Robust to illumination
- Robust to affine transforms
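As a sketch (the feature set here is illustrative, not the exact one used in the thesis), a region covariance descriptor is simply the covariance matrix of per-pixel feature vectors collected over a region:

```python
import numpy as np

def covariance_descriptor(region):
    """region: HxWx3 float RGB patch. Build a per-pixel feature vector
    (x, y, R, G, B, |dI/dx|, |dI/dy|) and return its covariance matrix."""
    H, W, _ = region.shape
    ys, xs = np.mgrid[0:H, 0:W]
    gray = region.mean(axis=2)
    gy, gx = np.gradient(gray)              # first-order image gradients
    feats = np.stack([xs, ys,
                      region[..., 0], region[..., 1], region[..., 2],
                      np.abs(gx), np.abs(gy)], axis=-1).reshape(-1, 7)
    return np.cov(feats, rowvar=False)      # 7x7 SPD descriptor
```

With 8 features one gets the 8x8 descriptors of the texture experiments later in the talk; the descriptor size depends only on the number of features, not on the region size, which is why it is so compact.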
Slide 18
Importance of Covariance Valued Data in Vision
- Diffusion Tensor Imaging (DT-MRI) (3x3)
- Object tracking (5x5), Tuzel et al., 2006
- Activity recognition (12x12), Guo et al., 2009
- Emotion recognition (30x30), Zheng et al., 2010
- Face recognition (40x40), Pang et al., 2008
- 3D object recognition (8x8), Fehr et al., 2012
Slide 19
Geometry of Covariances
- By their positive definiteness, p x p covariance matrices form a Riemannian manifold, the open cone $\mathcal{S}_{++}^p$ inside Euclidean space.
- Distances between two points X and Y on this manifold are not straight lines, but curves (geodesics)!
- Incorporating this curvature makes distance computation expensive.

[Figure: geodesic between X and Y on the cone $\mathcal{S}_{++}^p$]
Slide 20
Similarity Metrics on Covariances
- Affine Invariant Riemannian Metric (AIRM): the natural metric induced by the Riemannian geometry
- Log-Euclidean Riemannian Metric (LERM): induced by approximating the manifold of covariances by a flat geometry
- Kullback-Leibler Divergence Metric (KLDM): treats each covariance as the covariance of an associated zero-mean Gaussian distribution
- Matrix Frobenius Distance (FROB): treats covariances as vectors in Euclidean space

(The standard formulas are given below.)
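For reference, the standard definitions of these four measures (stated here for completeness):

```latex
\begin{align*}
d_{\mathrm{AIRM}}(X,Y) &= \big\| \log\!\big(X^{-1/2}\, Y\, X^{-1/2}\big) \big\|_F \\
d_{\mathrm{LERM}}(X,Y) &= \big\| \log X - \log Y \big\|_F \\
d_{\mathrm{KLDM}}(X,Y) &= \sqrt{\tfrac{1}{2}\,\mathrm{tr}\!\big(X^{-1}Y + Y^{-1}X - 2I\big)} \\
d_{\mathrm{FROB}}(X,Y) &= \| X - Y \|_F
\end{align*}
```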
Slide 21
Our Distance: Jensen-Bregman LogDet Divergence (JBLD)

Let f be a strictly convex function. The Bregman divergence
\[ d_f(X, Y) = f(X) - f(Y) - \langle \nabla f(Y),\, X - Y \rangle \]
measures the deviation of f at X from the tangent to f at Y (see figure). The Jensen-Bregman divergence is the average deviation from the midpoint of X and Y:
\[ J_f(X, Y) = \tfrac{1}{2}\, d_f\!\Big(X, \tfrac{X+Y}{2}\Big) + \tfrac{1}{2}\, d_f\!\Big(Y, \tfrac{X+Y}{2}\Big) = \tfrac{f(X) + f(Y)}{2} - f\!\Big(\tfrac{X+Y}{2}\Big). \]
Our new measure is obtained by substituting f = -log det(.), the negative logdet function, where X and Y are covariances:
\[ J_{\ell d}(X, Y) = \log\det\!\Big(\tfrac{X+Y}{2}\Big) - \tfrac{1}{2}\log\det(XY). \]
A short code sketch of computing JBLD and AIRM follows.
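A minimal NumPy/SciPy sketch of the two measures (illustrative, not an optimized implementation):

```python
import numpy as np
from scipy.linalg import eigh

def jbld(X, Y):
    """J_ld(X, Y) = logdet((X+Y)/2) - 0.5 * logdet(X Y): only
    determinants are needed, no eigendecompositions or matrix logs."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)

def airm(X, Y):
    """AIRM needs the generalized eigenvalues of (X, Y), i.e. the
    eigenvalues of Y^{-1} X, which is substantially more expensive."""
    lam = eigh(X, Y, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

The contrast previews the FLOPS column on the next slide: JBLD costs a few determinant (Cholesky-style) factorizations, while AIRM needs a full generalized eigendecomposition.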
Slide 22
Properties of JBLD

Notation for the properties compared:
- M: Does it satisfy all metric properties?
- G: Are gradient computations fast?
- NI: Is the measure invariant to inversion?
- AI: Affine invariance?
- NE: Are negative eigenvalues at infinity?
- O: Will it not overestimate AIRM?
- FLOPS: computational complexity

Metric  | FLOPS
FROB    | d(d+1)/2
AIRM    | 4d^3
LERM    | (8/3) d^3
KLDM    | (8/3) d^3
JBLD    | d^3
Slide 23
Computational Speedup using JBLD
[Plot: speedup in computing AIRM vs. JBLD for increasing matrix dimensionality]
[Plot: speedup in computing gradients of AIRM vs. JBLD for increasing matrix dimensionality]
Nearest Neighbors using JBLD
Desiderata for NN retrieval on any metric space: scalability, ease of exact NN retrieval, and ease of approximate NN retrieval.

We decided to use a Metric Tree (MT) on JBLD for NN retrieval (a build sketch follows below):
- The square root of JBLD is a metric.
- The construction is essentially hierarchical k-means: starting from the root (the entire dataset), the data is recursively bipartitioned.
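A toy sketch of the recursive bipartitioning (illustrative only: real metric trees refine the two pivots k-means-style and store ball radii for branch-and-bound pruning at query time). Here `metric` would be the square root of JBLD:

```python
import numpy as np

def build_metric_tree(items, metric, leaf_size=8, rng=None):
    """Recursively bipartition `items` (e.g. a list of SPD matrices)
    into a binary metric tree using two pivots under `metric`."""
    rng = rng or np.random.default_rng(0)
    if len(items) <= leaf_size:
        return {"leaf": items}
    i, j = rng.choice(len(items), size=2, replace=False)  # two pivots
    left, right = [], []
    for x in items:        # assign each item to its nearer pivot
        (left if metric(x, items[i]) <= metric(x, items[j]) else right).append(x)
    if not left or not right:          # degenerate split: stop recursing
        return {"leaf": items}
    return {"pivots": (items[i], items[j]),
            "left": build_metric_tree(left, metric, leaf_size, rng),
            "right": build_metric_tree(right, metric, leaf_size, rng)}
```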
Slide 26
Experimental Results using JBLD
Slide 27
Experiments: Evaluation Datasets
- Weizmann Actions dataset
- ETH Tracking dataset
- Brodatz Texture dataset
- Faces in the Wild dataset

Dataset  | Covariance size | Dataset size | Ground truth
Actions  | 12x12           | 65K          | Available
Textures | 8x8             | 27K          | Available
Faces    | 40x40           | 31K          | Available
Tracking | 8x8             | 10K          | AIRM
Slide 28
Experimental Results using JBLD
[Plot: metric tree creation time]
[Plot: NN via metric tree]
[Plot: ANN via metric tree]
Slide 29
Unsupervised Clustering of Covariances
- Clustering is an important step in NN retrieval.
- K-means-type clustering needs the number of clusters (K) to be known, and finding K is non-trivial in practice.
- We therefore propose an unsupervised clustering algorithm on covariances:
  - An extension of the Dirichlet Process Mixture Model (DPMM)
  - Uses the Wishart-Inverse-Wishart (WIW) conjugate pair
- We also investigate other DPMM models, such as:
  - Gaussian on log-Euclidean covariance vectors
  - Gaussian on vectorized covariances
Slide 30
Experimental Results
Purity is synonymous with accuracy. Legend: le: LERM, f: FROB, l: KLDM, g: AIRM.
[Plot: Faces, 40x40, 900 matrices, 110 clusters]
[Plot: simulation results for increasing true number of clusters]
[Plot: DPMM computational expense against k-means (using AIRM) and EM (using MoW)]
[Plot: Appearances, 5x5, 758 matrices, 31 clusters]
Slide 31
Talk Outline
- Introduction
- Motivation
- Problem Statement
- Algorithms for Similarity Search in Matrix Valued Data
- Algorithms for Similarity Search in High Dimensional Vector Data
- Conclusion
- Future Work
Slide 32
Importance of Vector Valued Data in Vision
A fundamental data type in several applications:
- As histogram-based descriptors. Examples: SIFT, Spin Images, etc.
- As feature descriptors. Example: image patches
- As filter outputs. Example: GIST descriptor

[Figures: GIST, texture patches, SIFT]
Slide 33
Related Work
- KD-Trees: partition space along fixed hyperplanes
- Locality Sensitive Hashing (LSH), Indyk et al., 2008: generates hash codes by projecting data onto random hyperplanes
- Spectral Hashing, Torralba et al., 2008: projection planes derived from orthogonal subspaces of PCA
- Kernelized Hashing, Kulis et al., 2010: projection planes derived from PCA over a kernel matrix learned from data
- Shift Invariant Kernel Hashing, Lazebnik et al., 2009: spectral hashing with a cosine-based kernel
- Product Quantization, Jegou et al., 2011: k-means sub-vector clustering followed by standard LSH
- FLANN, Lowe et al., 2009: not a hashing algorithm, but a hybrid of hierarchical k-means and KD-trees

[Figure: KD-tree space partitions; LSH hash code: 11010]
Slide 34
Our Approach
Based on Dictionary Learning (DL) and Sparse Coding (SC). Algorithm steps (a sketch follows below):

Indexing: for each data vector v,
1. Represent v as a sparse vector w using a dictionary B
2. Encode w as a hash code T
3. Store w at H(T), where H is a hash table indexed by T

Querying: given a query vector q,
1. Generate the sparse vector w_q and hash code T_q
2. Find ANN(q) in H(T_q)
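A minimal sketch of this scheme (names and data here are synthetic, and orthogonal matching pursuit stands in for whichever sparse coder the thesis actually uses). The hash code T is taken to be the tuple of active-atom indices:

```python
import numpy as np
from collections import defaultdict
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
d, n, k = 64, 256, 4                       # signal dim, atoms, sparsity
B = rng.standard_normal((d, n))
B /= np.linalg.norm(B, axis=0)             # unit-norm dictionary atoms
data = [rng.standard_normal(d) for _ in range(1000)]

def hash_key(v):
    """Sparse-code v over B; the sorted support (the indices of the
    active atoms) serves as the hash code T."""
    w = orthogonal_mp(B, v, n_nonzero_coefs=k)
    return tuple(sorted(np.flatnonzero(w)))

H = defaultdict(list)                      # hash table indexed by T
for v in data:
    H[hash_key(v)].append(v)

q = data[17] + 1e-3 * rng.standard_normal(d)   # slightly perturbed query
bucket = H[hash_key(q)]                        # candidates sharing T_q
nn = min(bucket, key=lambda v: np.linalg.norm(v - q)) if bucket else None
```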
Slide 35
Dictionary Learning and Sparse Coding
- Dictionary learning: an algorithm to learn atoms from data.
- Sparse coding: an algorithm to represent data in terms of a few atoms of the dictionary.

An Analogy (dictionary learning): data -> dictionary of basic atoms.
Slide 36
Dictionary Learning and Sparse Coding (continued)
An Analogy (dictionary learning): image data -> dictionary of basic atoms.
Slide 37
Dictionary Learning and Sparse Coding (continued)
An Analogy (sparse coding): a molecule (2 x H + 1 x O, i.e., water) expressed over the dictionary of chemical elements: 0 x Na, 0 x Li, 0 x Be, ..., 2 x H, 1 x O, ..., 0 x Xe, 0 x Rn. Sparse atom selection turns the data vector into a sparse representation (lots of zeros).
Slide 38
Dictionary Learning and Sparse Coding (continued)
An Analogy (sparse coding, on images): an image is expressed as a weighted combination of a few dictionary atoms (0.0 x ..., 1.2 x ..., 0.4 x ..., 0.0 x ...), yielding a sparse representation (lots of zeros). In code, the two operations might look like the sketch below.
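A scikit-learn sketch of the pair of operations (illustrative; the thesis's own learning algorithm and parameters may differ):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 128))        # e.g. 1000 SIFT-like vectors

# dictionary learning: fit an overcomplete dictionary (256 atoms, 128-D)
dl = MiniBatchDictionaryLearning(n_components=256, random_state=0)
dl.fit(X)

# sparse coding: represent a vector with at most 5 active atoms
w = sparse_encode(X[:1], dl.components_, algorithm="omp", n_nonzero_coefs=5)
print(np.flatnonzero(w))                    # indices of the active atoms
```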
Sparse Coding & NN Retrieval Connection
[Figure: with high probability, a new data point falls into the same partition as its nearest neighbors, and hence selects the same dictionary atoms]
Slide 41
Sparse Coding & NN Retrieval Connection
Slide 42
Advantages of Sparse Coding for NN Retrieval
- Hashing efficiency: a large number of possible hash codes, $2^k \binom{n}{k}$ for k-sparse codes, against the $2^k$ codes of LSH
- Storage efficiency: only the sparse coefficients need to be stored, against entire data vectors as in LSH
- Query efficiency: linear search on low dimensional sparse vectors; no curse of dimensionality
- Sparse coding complexity: O(ndk) for a dictionary of n atoms, each of dimension d, generating k-sparse codes

[Figure: 1-sparse and 2-sparse partitions of the data space]
Slide 43
Disadvantage: Sensitivity to Data Perturbation!
- Sparse coding fits hyperplanes to dense regions of the data.
- There are $2^k \binom{n}{k}$ partitions for a k-sparse code and an n-atom dictionary.
- Example: for n = 1024 and k = 10, that is on the order of 10^30 partitions, so individual data partitions can be too small!
- Small data perturbations can push a data point into a different partition; different partitions imply different hash codes, and hashing fails!
Slide 44
Robust NN Retrieval
Two ingredients: Robust Dictionary Learning and Robust Sparse Coding.

Robust Dictionary Learning: align the dictionary atoms to compensate for data perturbations. Approaches:
- Treat the perturbations as noise and develop a denoising model
- Make the data immune to the worst-case perturbation

Robust Sparse Coding: hierarchical data space partitioning.
- Larger partitions subsume smaller partitions
- Generate multiple hash codes, one for each partition
Slide 45
Robust Dictionary Learning

Denoising approach:
- Data has both large and small perturbations.
- Assume Gaussian noise for the small perturbations, and a Laplacian for the large but sparse perturbations.
- Denoise for Gaussian + Laplacian noise: subtract off the Laplacian noise and the Gaussian noise, then learn the basis. (A sketch follows below.)
- The resulting denoised data should produce the same SCT (hash code)!

Robust optimization approach:
- No assumptions on the noise distribution.
- Learn the worst-case perturbation from a training set.
- Project every data point as if perturbed by the worst-case noise, then learn the basis on the perturbed data.
- The resulting immunized data should produce the same SCT!
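One way to realize the Gaussian + Laplacian model is to give the sparse noise its own identity dictionary and solve a single lasso. A sketch under that assumption (the thesis's exact formulation and regularization weights may differ; `B` and `x` are assumed given):

```python
import numpy as np
from sklearn.linear_model import Lasso

def robust_sparse_code(B, x, lam=0.05):
    """min_{w,s} ||x - B w - s||^2 + lam (||w||_1 + ||s||_1):
    stacking [B | I] turns the problem into one lasso. The variable s
    absorbs the large-but-sparse (Laplacian) noise, while the residual
    of the least-squares term accounts for the small Gaussian noise."""
    d, n = B.shape
    A = np.hstack([B, np.eye(d)])
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    model.fit(A, x)
    return model.coef_[:n], model.coef_[n:]   # sparse code w, sparse noise s
```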
Slide 46
Robust Dictionary Learning: Experimental Results
[Plots: denoising approach vs. robust optimization, on the INRIA Copydays dataset and the Graf, Bike, Bark, Boat, Wall, Leuven, UBC, and Trees sequences]
Slide 47
Robust Sparse Coding
- Based on the regularization path of sparse coding: similar data points have similar regularization paths.
- [Figure: similar data points share basis activations; dissimilar data points do not]
- Main idea: generate multiple SCTs, one for each of a sequence of increasing regularizations: the Multi-Regularization Sparse Coding (MRSC) algorithm. (A sketch follows below.)
- Increasing regularization means bigger data partitions and hence more robustness.
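A sketch of the idea using the lasso path (illustrative; how MRSC selects its regularization levels in the thesis may differ). Each emitted code is the support set at one point on the path, from coarse (few atoms) to fine:

```python
import numpy as np
from sklearn.linear_model import lars_path

def mrsc_codes(B, x, n_codes=3):
    """Walk the lasso regularization path of x over dictionary B and
    emit one hash code (the active-atom support) at several levels.
    Larger regularization -> fewer atoms -> coarser, more robust
    partitions that subsume the finer ones."""
    alphas, _, coefs = lars_path(B, x, method="lasso")
    idx = np.linspace(1, len(alphas) - 1, n_codes, dtype=int)
    return [tuple(sorted(np.flatnonzero(coefs[:, i]))) for i in idx]
```

At query time the same multi-level codes are generated for the query, so a perturbed point that changes its finest partition can still collide with its neighbors at a coarser level.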
Talk Outline
- Introduction
- Motivation
- Problem Statement
- Algorithms for Similarity Search in Matrix Valued Data
- Algorithms for Similarity Search in High Dimensional Vector Data
- Conclusion
- Future Work
Slide 53
Conclusion
We considered NN problems on two different data types: covariance data and high dimensional vector data.

For covariance data, we:
- proposed an efficient similarity metric, the Jensen-Bregman LogDet Divergence;
- proposed novel unsupervised clustering algorithms with high clustering accuracy.

For vector data, we:
- established a connection between LSH and sparse coding;
- proposed efficient algorithms for robust NN retrieval.

We also proposed a framework for sparse coding covariances: Generalized Dictionary Learning.
Slide 54
Talk Outline
- Introduction
- Motivation
- Problem Statement
- Algorithms for Similarity Search in Matrix Valued Data
- Algorithms for Similarity Search in High Dimensional Vector Data
- Conclusion
- Future Work
Slide 55
Future Work
Covariance data:
- Application of JBLD to DT-MRI
- Semi-supervised Dirichlet process mixture models
- Metric learning on covariance manifolds
- Locality sensitive hashing on covariances

High dimensional vector data:
- Hamming embedding via dictionary learning
- Dictionary learning under constraints
- Bulk sparse coding
- Large scale dictionary learning