Transcript

Similarity Learning for High Dimensional Sparse Data

Motivation

Measuring distance or similarity is a key component of AI, machine learning, pattern recognition, data mining, etc. Examples: nearest neighbor classification, clustering, information retrieval...

How to define good distances between objects?

Metric learning [1]: learn a distance or similarity function automatically from data (e.g., from must-link/cannot-link relations).

[Figure: illustration of metric learning]

Main contributions

A similarity model that efficiently learns high-dimensional similarities in the original space by parameterizing the similarity measure as a convex combination of rank-one matrices with specific sparsity structures (the general form is sketched just after this list).

Derivation of scalable algorithms for the proposed formulations, with time/memory cost independent of the data dimensionality.

Appealing optimization and generalization guarantees.

Experiments on high-dimensional real data showing the potential of the approach for classification, dimensionality reduction and data exploration.
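To make the parameterization above concrete, this is the general form it takes; the weights $w_k$ and bases $B_k$ are generic notation introduced only for this illustration (each basis is rank-one and touches only two features, consistent with the sparsity counts quoted in the theoretical analysis):

    $M \;=\; \sum_{k} w_k\, B_k, \qquad w_k \ge 0, \quad \sum_k w_k = 1, \qquad \mathrm{rank}(B_k) = 1, \quad \|B_k\|_0 \le 4.$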

Experiments

Classification; Feature Selection and Sparsity; Dimension Reduction. More experiments in the paper + MATLAB code available!

Feature selection and sparsity: only a limited number of features is selected, and the number of selected features tends to converge as the iterations proceed. The learned similarity matrix is extremely sparse: only 0.0006% of its entries are non-zero.

Dimension reduction: a single run of HDSL serves as a dimension reduction step, outperforming PCA and random projection in terms of k-NN classification test error.
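A sketch of how the learned similarity can play this role, assuming M is stored as a convex combination of the rank-one bases from the approach panel, represented as a dictionary mapping each basis (feature pair and sign) to its weight. Since each basis is PSD, M factors as L^T L and the map x -> Lx is the reduced representation; the name reduce_dim and the storage format are assumptions of this sketch, not the released code.

    import numpy as np

    def reduce_dim(X, M, lam=1.0):
        # X: n x d data matrix.
        # M: {(i, j, s): weight}, i.e. a convex combination of the bases
        # lam * (e_i + s*e_j)(e_i + s*e_j)^T. Each active basis contributes one
        # output coordinate sqrt(w*lam) * (x_i + s*x_j), so the reduced vectors
        # reproduce the learned similarity exactly: (Lx)^T (Lx') = x^T M x'.
        cols = [np.sqrt(w * lam) * (X[:, i] + s * X[:, j])
                for (i, j, s), w in M.items()]
        return np.stack(cols, axis=1)  # n x (number of active bases)

    # The selected features are simply those touched by the active bases:
    # selected = sorted({f for (i, j, _s) in M for f in (i, j)})

The number of output dimensions equals the number of active bases, and only the features they touch are ever read, which is what ties the dimension reduction and feature selection results together.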

References
[1] A. Bellet, A. Habrard, and M. Sebban. A Survey on Metric Learning for Feature Vectors and Structured Data. Technical report, arXiv:1306.6709, 2013.
[2] M. Jaggi. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. ICML, 2013.
[3] X. Gao et al. SOML: Sparse Online Metric Learning with Application to Image Retrieval. AAAI, 2014.
[4] M. Schultz and T. Joachims. Learning a Distance Metric from Relative Comparisons. NIPS, 2004.
[5] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood Components Analysis. NIPS, 2004.
[6] K. Q. Weinberger and L. K. Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification. JMLR, 2009.
[7] D. Kedem, S. Tyree, K. Weinberger, F. Sha, and G. Lanckriet. Non-linear Metric Learning. NIPS, 2012.
[8] C. Shen et al. Positive Semidefinite Metric Learning Using Boosting-like Algorithms. JMLR, 2012.
[9] Y. Shi, A. Bellet, and F. Sha. Sparse Compositional Metric Learning. AAAI, 2014.

Kuan Liu, Aurélien Bellet, Fei Sha

Approach

Formulation

Setting: assume the data are high dimensional and sparse, i.e. $x \in \mathbb{R}^d$ with $d$ very large and only a few non-zero entries per instance.

Goal: learn a bilinear similarity $S_M(x, x') = x^\top M x'$ from a set of triplet constraints $\{(x_t, y_t, z_t)\}_{t=1}^{T}$, each stating that $x_t$ should be more similar to $y_t$ than to $z_t$.

Similarity basis: given a scale parameter $\lambda > 0$ and a pair of features $(i, j)$, the bases are the rank-one, 4-sparse PSD matrices $\lambda (e_i + e_j)(e_i + e_j)^\top$ and $\lambda (e_i - e_j)(e_i - e_j)^\top$; the matrix $M$ is constrained to the convex hull $\mathcal{D}_\lambda$ of this basis set.

Learning objective:

    $\min_{M \in \mathcal{D}_\lambda} \; f(M) = \frac{1}{T} \sum_{t=1}^{T} \ell\big( x_t^\top M y_t - x_t^\top M z_t \big)$    (1)

Smoothed hinge loss: $\ell$ is a smoothed (differentiable) variant of the hinge loss, penalizing triplets for which $x_t$ is not more similar to $y_t$ than to $z_t$ by a sufficient margin.

Optimization

Frank-Wolfe algorithm [2]: learn the sparse similarity efficiently with FW. Only one similarity basis (i.e., two features) is added or removed at each iteration, which gives compact storage of $M$ and efficient computation of the objective function, active constraints, etc. Time/memory cost is independent of the data dimensionality $d$.
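A minimal, illustrative Frank-Wolfe loop for objective (1), written under several assumptions: the bases are the $\lambda (e_i \pm e_j)(e_i \pm e_j)^\top$ matrices described above, $\ell$ is taken to be a standard smoothed hinge (the paper may use a slightly different smoothing), and the linear oracle below scans a dense d x d gradient for clarity, whereas the actual algorithm exploits data sparsity and never forms such a matrix. The names fw_hdsl, smoothed_hinge_grad and the dictionary storage of M are illustrative, not the released MATLAB code.

    import numpy as np

    def smoothed_hinge_grad(m):
        # Derivative of a smoothed hinge loss with margin 1 (assumed surrogate):
        # 0 beyond the margin, -1 for violated triplets, linear in between.
        if m >= 1.0:
            return 0.0
        if m <= 0.0:
            return -1.0
        return m - 1.0

    def fw_hdsl(triplets, d, lam=1.0, n_iters=50):
        # triplets: list of (x, y, z) arrays of length d, meaning "x should be
        # more similar to y than to z" under S_M(x, x') = x^T M x'.
        # M is kept as {(i, j, s): weight}: a convex combination of the bases
        # lam * (e_i + s*e_j)(e_i + s*e_j)^T with s in {+1, -1}.
        T = len(triplets)
        M = {}

        def sim(x, xp):
            # S_M(x, x') evaluated through the active bases only.
            return sum(w * lam * (x[i] + s * x[j]) * (xp[i] + s * xp[j])
                       for (i, j, s), w in M.items())

        for k in range(n_iters):
            # Gradient of (1): G = (1/T) * sum_t l'(m_t) * x_t (y_t - z_t)^T
            G = np.zeros((d, d))
            for x, y, z in triplets:
                c = smoothed_hinge_grad(sim(x, y) - sim(x, z))
                if c != 0.0:
                    G += (c / T) * np.outer(x, y - z)

            # Linear minimization oracle over the basis set:
            # <G, lam*(e_i + s*e_j)(e_i + s*e_j)^T>
            #   = lam * (G[i,i] + G[j,j] + s * (G[i,j] + G[j,i]))
            diag, sym = np.diag(G), G + G.T
            best, best_score = None, np.inf
            for i in range(d):
                for j in range(i + 1, d):
                    for s in (+1, -1):
                        score = lam * (diag[i] + diag[j] + s * sym[i, j])
                        if score < best_score:
                            best, best_score = (i, j, s), score

            # Plain FW step; the paper additionally uses away steps,
            # which is what allows a basis (two features) to be removed.
            gamma = 2.0 / (k + 2.0)
            M = {key: (1.0 - gamma) * w for key, w in M.items()}
            M[best] = M.get(best, 0.0) + gamma
        return M

The returned dictionary is exactly the sparse representation the theoretical analysis refers to: after k iterations it holds at most k+1 bases, hence at most 2(k+1) distinct features.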

Theoretical analysis

Sparsity: at any iteration $k$, $M^{(k)}$ has rank at most $k+1$, with at most $4(k+1)$ non-zero entries, using at most $2(k+1)$ distinct features.

Convergence: let $M^*$ be an optimal solution to (1); for each iteration $k$, the iterates satisfy $f(M^{(k)}) - f(M^*) = O(1/k)$.

Generalization: the number of iterations $k$ gives a tradeoff between optimization error and model complexity. For each iteration $k$, the expected risk is bounded by the empirical risk plus a complexity term that grows with $k$ and decreases with the number of training triplets $T$.
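For completeness, the generic Frank-Wolfe rate from [2] has the form below, where $C_f$ is the curvature constant of $f$ over the feasible domain $\mathcal{D}_\lambda$; read this as the shape of the convergence statement above rather than its exact constant, which may be instantiated differently in the paper:

    $f\big(M^{(k)}\big) - f\big(M^{*}\big) \;\le\; \frac{2\, C_f}{k + 2}.$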

Existing methods

Most algorithms learn a Mahalanobis distance $d_M(x, x') = \sqrt{(x - x')^\top M (x - x')}$, where $M$ is a $D \times D$ positive semi-definite (PSD) matrix. This is expensive in high dimensions: one must learn $D^2$ parameters and ensure that $M$ stays PSD, which takes $O(D^3)$ time.

Current approaches: learn a diagonal matrix [3][4]; low-dimensional projection based methods, via explicit low-rank decomposition [5][6][7] or rank-one matrix decomposition [8][9].
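To make the $O(D^3)$ bottleneck concrete, here is a minimal sketch (not any particular method's code) of the PSD projection step that most Mahalanobis learners perform after each update; the eigendecomposition is what dominates the cost:

    import numpy as np

    def project_psd(M):
        # Project a symmetric D x D matrix onto the PSD cone.
        # The eigendecomposition costs O(D^3) time and O(D^2) memory,
        # which is what becomes prohibitive in high dimensions.
        M = (M + M.T) / 2.0                      # symmetrize
        eigvals, eigvecs = np.linalg.eigh(M)     # O(D^3)
        eigvals = np.clip(eigvals, 0.0, None)    # drop negative eigenvalues
        return (eigvecs * eigvals) @ eigvecs.T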

[Figure (formulation panel): M written as a convex combination of sparse rank-one bases, e.g. M = 0.125 $B_1$ + 0.25 $B_2$ + ... + 0.125 $B_K$]

In k-NN classification on datasets with d up to ..., our method achieves lower test errors than other state-of-the-art similarity learning approaches.
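A sketch of this evaluation protocol, assuming the learned similarity $S_M(x, x') = x^\top M x'$, a dense matrix M for clarity, and integer class labels; the name knn_predict is illustrative.

    import numpy as np

    def knn_predict(X_train, y_train, X_test, M, k=3):
        # k-NN classification that ranks neighbours by the learned bilinear
        # similarity x^T M x' instead of Euclidean distance.
        # Assumes y_train contains integer class labels 0..C-1.
        S = X_test @ M @ X_train.T           # n_test x n_train similarity scores
        nn = np.argsort(-S, axis=1)[:, :k]   # k most similar training points
        votes = y_train[nn]                  # their labels
        return np.array([np.bincount(row).argmax() for row in votes])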