Metric Learning for Clustering
SCC5945 - Semi-Supervised and Unsupervised Pattern Analysis in Data
(Seminar)
Sidgley Camargo de Andrade
PhD student in Computer Science
Institute of Computer Science and Mathematics
University of São Paulo
June 2016
1 / 12
Agenda
Constraint-based algorithms
Motivation
Metrics
Metric learning for clustering
MPCK-means algorithm
References
2 / 12
Constraint-based algorithms
How can we help unsupervised algorithms find better solutions?
▶ Constraint-based methods – e.g., background knowledge through pairwise constraints (Wagstaff et al., 2001)
Con_= ⊆ D × D : must-link constraints
Con_≠ ⊆ D × D : cannot-link constraints
▶ Active- and self-learning
▶ Other . . .
Are there “problems” related to the algorithms above?
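As a concrete illustration of pairwise constraints, the sketch below (hypothetical toy data and helper names, not from the cited papers) encodes must-link and cannot-link sets as index pairs and counts how many constraints a given clustering violates:

```python
import numpy as np

# Hypothetical toy dataset D: four points in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# Con_= and Con_!= as lists of index pairs into X.
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2)]

def count_violations(labels, must_link, cannot_link):
    """Count how many pairwise constraints a cluster labeling violates."""
    v = sum(1 for i, j in must_link if labels[i] != labels[j])
    v += sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return v

print(count_violations([0, 0, 1, 1], must_link, cannot_link))  # -> 0, satisfies all
print(count_violations([0, 1, 1, 1], must_link, cannot_link))  # -> 1, breaks (0, 1) must-link
```

Constraint-based methods differ in how they use such violation counts: some forbid violations outright (as in COP-k-means), others penalize them in the objective.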
3 / 12
Motivation
Figure: (Basu et al., 2008). Legend: [–] must-link, [- -] cannot-link.
4 / 12
Metrics
The metrics depict the relationships between the data (e.g., Euclidean distance, Mahalanobis distance, etc.).
What is the right metric?
There are few systematic mechanisms for tweaking distance metrics, and they are often tuned by hand (Xing et al., 2003).
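To make the metric choice concrete, here is a minimal sketch (hypothetical points and matrix A) comparing the Euclidean distance with a Mahalanobis-style distance ||x − y||_A:

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])

# Euclidean distance: the special case A = I.
d_euclidean = np.sqrt((x - y) @ (x - y))  # sqrt(1 + 4) ~= 2.236

# A hypothetical PSD matrix A that stretches axis 0 and shrinks axis 1.
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
d_A = np.sqrt((x - y) @ A @ (x - y))      # sqrt(2*1 + 0.5*4) = 2.0

print(d_euclidean, d_A)
```

Choosing A amounts to reweighting (and, off the diagonal, correlating) the features – which is exactly the knob that metric learning turns automatically.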
5 / 12
Metric learning for clustering
Assumption: keeping dissimilar points far from each other and similar points close to each other reduces the risk of errors.
Xing et al. (2003): Suppose a user indicates that certain points in an input space (say, ℝⁿ) are considered by them to be “similar” (or “dissimilar”). Can we automatically learn a distance metric over ℝⁿ that respects these relationships, i.e., one that assigns small distances between the similar pairs and greater distances otherwise?

Learn a metric d : ℝⁿ × ℝⁿ → ℝ over the input space.
6 / 12
Problem
A simple way is to require that similar pairs (must-link) have a small distance between them, whereas dissimilar pairs (cannot-link) have a greater distance between them:
d(x, y) = d_A(x, y) = ||x − y||_A = √((x − y)ᵀ A (x − y))

min_A   Σ_{(x_i, x_j) ∈ S} ||x_i − x_j||²_A
s.t.    Σ_{(x_i, x_j) ∈ D} ||x_i − x_j||²_A ≥ c
        A ⪰ 0

where A ⪰ 0 is the constraint that the symmetric matrix A must be positive semi-definite – giving a “pseudo-metric” – and c is any positive constant ≥ 1.
¹ Question for class – Why is the constant c positive?
² Question for class – How can this be transformed into a maximization problem?
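For intuition, the sketch below runs plain projected gradient descent on the equivalent objective g(a) = Σ_S ||x_i − x_j||²_a − log(Σ_D ||x_i − x_j||_a) for a diagonal metric a, on hypothetical toy data. This is only a rough stand-in for the paper’s method (Xing et al. use Newton-Raphson for the diagonal case and projected gradient ascent for full A), but it shows the learned metric upweighting the coordinate that separates the constrained pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: two tight clusters separated along coordinate 0.
X = np.vstack([rng.normal((0.0, 0.0), 0.1, size=(10, 2)),
               rng.normal((3.0, 0.0), 0.1, size=(10, 2))])
S = [(i, j) for i in range(10) for j in range(i + 1, 10)]  # similar pairs (first cluster)
D = [(i, 10 + j) for i in range(10) for j in range(10)]    # dissimilar pairs (across clusters)

def grad_g(a):
    """Gradient of g(a) = sum_S ||x_i - x_j||_a^2 - log(sum_D ||x_i - x_j||_a)
    for a diagonal metric with nonnegative entries a."""
    g_S = sum((X[i] - X[j]) ** 2 for i, j in S)
    dists = [np.sqrt(((X[i] - X[j]) ** 2 * a).sum()) for i, j in D]
    g_D = sum((X[i] - X[j]) ** 2 / (2.0 * d) for (i, j), d in zip(D, dists))
    return g_S - g_D / sum(dists)

a = np.ones(2)  # diagonal of A, initialized to the Euclidean metric
for _ in range(200):
    a = np.maximum(a - 0.1 * grad_g(a), 1e-6)  # gradient step, projected onto a >= 0

# Coordinate 0 separates the clusters, so it ends up with far more weight.
print(a)
```

The irrelevant coordinate is driven toward zero: under the learned metric, the similar pairs collapse together while the dissimilar pairs stay apart.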
7 / 12
Example – Xing et al. (2003)
8 / 12
Metric Pairwise Constraint K-means(MPCK-means)
Assumes a matrix A_h (a metric) for each cluster h.

Permits the specification of an individual weight for each constraint (f_M and f_C); the penalty for a constraint violation is proportional to the violated constraint’s weight.
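A rough sketch of the objective this algorithm minimizes (hypothetical helper names and toy data; constant penalties stand in for the paper’s f_M and f_C, which actually scale with the distance under the learned per-cluster metrics):

```python
import numpy as np

def mpck_objective(X, labels, centers, A_list, must_link, cannot_link,
                   w=1.0, w_bar=1.0):
    """Sketch of the MPCK-means objective: per-cluster metric distortion
    (with the log det A_h normalizer) plus weighted violation penalties."""
    J = 0.0
    for i, x in enumerate(X):
        h = labels[i]
        d = x - centers[h]
        J += d @ A_list[h] @ d - np.log(np.linalg.det(A_list[h]))
    for i, j in must_link:
        if labels[i] != labels[j]:
            J += w       # simplified stand-in for w_ij * f_M(x_i, x_j)
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            J += w_bar   # simplified stand-in for w_ij * f_C(x_i, x_j)
    return J

# Hypothetical toy data: a labeling that honors the constraints scores lower.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers = np.array([[0.05, 0.0], [5.05, 5.0]])
A_list = [np.eye(2), np.eye(2)]  # one (here trivial) metric per cluster
good = mpck_objective(X, [0, 0, 1, 1], centers, A_list, [(0, 1)], [(0, 2)])
bad = mpck_objective(X, [0, 1, 1, 1], centers, A_list, [(0, 1)], [(0, 2)])
print(good < bad)  # -> True
```

MPCK-means alternates assignment, mean-update, and metric-update steps so that this combined distortion-plus-penalty objective decreases.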
9 / 12
MPCK-means algorithm – Bilenko et al. (2004)
10 / 12
MPCK-means algorithm – Bilenko et al. (2004)
11 / 12
References
Basu, S., Davidson, I., and Wagstaff, K. (2008). Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC, 1st edition.

Bilenko, M., Basu, S., and Mooney, R. J. (2004). Integrating constraints and metric learning in semi-supervised clustering. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, pages 11–, New York, NY, USA. ACM.

Wagstaff, K., Cardie, C., Rogers, S., and Schrödl, S. (2001). Constrained k-means clustering with background knowledge. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, pages 577–584, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Xing, E. P., Ng, A. Y., Jordan, M. I., and Russell, S. (2003). Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems, pages 505–512. MIT Press.
12 / 12