Metric Learning for Clustering
SCC5945 - Semi-Supervised and Unsupervised Pattern Analysis in Data
(Seminar)
Sidgley Camargo de Andrade
PhD student in Computer Science
Institute of Computer Science and Mathematics
University of São Paulo
June 2016
1 / 12
Agenda
Constraint-based algorithms
Motivation
Metrics
Metric learning for clustering
MPCK-means algorithm
References
2 / 12
Constraint-based algorithms
How can we help unsupervised algorithms find better solutions?
▶ Constraint-based methods – e.g., background knowledge through pairwise constraints (Wagstaff et al., 2001)
Con_= ⊆ D × D : must-link constraints
Con_≠ ⊆ D × D : cannot-link constraints
▶ Active- and self-learning
▶ Other . . .
Are there “problems” related to the algorithms above?
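As a concrete illustration of pairwise constraints, the sketch below (hypothetical toy data and helper names, not from the cited papers) encodes must-link and cannot-link sets as index pairs and counts how many constraints a given clustering violates:

```python
import numpy as np

# Hypothetical toy dataset D: four points in 2-D.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# Con_= and Con_!= as lists of index pairs into X.
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2)]

def count_violations(labels, must_link, cannot_link):
    """Count how many pairwise constraints a cluster labeling violates."""
    v = sum(1 for i, j in must_link if labels[i] != labels[j])
    v += sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return v

print(count_violations([0, 0, 1, 1], must_link, cannot_link))  # -> 0, satisfies all
print(count_violations([0, 1, 1, 1], must_link, cannot_link))  # -> 1, breaks (0, 1) must-link
```

Constraint-based methods differ in how they use such violation counts: some forbid violations outright (as in COP-k-means), others penalize them in the objective.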
3 / 12
Motivation
Figure: (Basu et al., 2008). Legend: [–] must-link, [- -] cannot-link.
4 / 12
Metrics
The metrics depict the relationships between the data (e.g., Euclidean distance, Mahalanobis distance, etc.).
What is the right metric?
There are few systematic mechanisms for tweaking distance metrics, and they are often tuned by hand (Xing et al., 2003).
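To make the metric choice concrete, here is a minimal sketch (hypothetical points and matrix A) comparing the Euclidean distance with a Mahalanobis-style distance ||x − y||_A:

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])

# Euclidean distance: the special case A = I.
d_euclidean = np.sqrt((x - y) @ (x - y))  # sqrt(1 + 4) ~= 2.236

# A hypothetical PSD matrix A that stretches axis 0 and shrinks axis 1.
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
d_A = np.sqrt((x - y) @ A @ (x - y))      # sqrt(2*1 + 0.5*4) = 2.0

print(d_euclidean, d_A)
```

Choosing A amounts to reweighting (and, off the diagonal, correlating) the features – which is exactly the knob that metric learning turns automatically.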
5 / 12
Metric learning for clustering
Assumption: keeping dissimilar points far from each other and similar points close to each other reduces the risk of errors.
Xing et al. (2003): Suppose a user indicates that certain points in an input space (say, ℝⁿ) are considered by them to be “similar” (or “dissimilar”). Can we automatically learn a distance metric over ℝⁿ that respects these relationships, i.e., one that assigns small distances between the similar pairs and greater distances otherwise?

Learn a metric d : ℝⁿ × ℝⁿ → ℝ over the input space.
6 / 12
Problem
A simple way is to require that similar pairs (must-link) have a small distance between them, whereas dissimilar pairs (cannot-link) have a greater distance between them:
d(x, y) = d_A(x, y) = ||x − y||_A = √((x − y)ᵀ A (x − y))

min_A   Σ_{(x_i, x_j) ∈ S} ||x_i − x_j||²_A
s.t.    Σ_{(x_i, x_j) ∈ D} ||x_i − x_j||²_A ≥ c
        A ⪰ 0

where A ⪰ 0 is the constraint that the symmetric matrix A must be positive semi-definite – giving a “pseudo-metric” – and c is any positive constant ≥ 1.
¹ Question for class – Why is the constant c positive?
² Question for class – How can this be transformed into a maximization problem?
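For intuition, the sketch below runs plain projected gradient descent on the equivalent objective g(a) = Σ_S ||x_i − x_j||²_a − log(Σ_D ||x_i − x_j||_a) for a diagonal metric a, on hypothetical toy data. This is only a rough stand-in for the paper’s method (Xing et al. use Newton-Raphson for the diagonal case and projected gradient ascent for full A), but it shows the learned metric upweighting the coordinate that separates the constrained pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: two tight clusters separated along coordinate 0.
X = np.vstack([rng.normal((0.0, 0.0), 0.1, size=(10, 2)),
               rng.normal((3.0, 0.0), 0.1, size=(10, 2))])
S = [(i, j) for i in range(10) for j in range(i + 1, 10)]  # similar pairs (first cluster)
D = [(i, 10 + j) for i in range(10) for j in range(10)]    # dissimilar pairs (across clusters)

def grad_g(a):
    """Gradient of g(a) = sum_S ||x_i - x_j||_a^2 - log(sum_D ||x_i - x_j||_a)
    for a diagonal metric with nonnegative entries a."""
    g_S = sum((X[i] - X[j]) ** 2 for i, j in S)
    dists = [np.sqrt(((X[i] - X[j]) ** 2 * a).sum()) for i, j in D]
    g_D = sum((X[i] - X[j]) ** 2 / (2.0 * d) for (i, j), d in zip(D, dists))
    return g_S - g_D / sum(dists)

a = np.ones(2)  # diagonal of A, initialized to the Euclidean metric
for _ in range(200):
    a = np.maximum(a - 0.1 * grad_g(a), 1e-6)  # gradient step, projected onto a >= 0

# Coordinate 0 separates the clusters, so it ends up with far more weight.
print(a)
```

The irrelevant coordinate is driven toward zero: under the learned metric, the similar pairs collapse together while the dissimilar pairs stay apart.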
7 / 12
Example – Xing et al. (2003)
8 / 12
Metric Pairwise Constraint K-means(MPCK-means)
Assumes a matrix A_h (a metric) for each cluster h.

Permits the specification of an individual weight for each constraint (f_M and f_C); the penalty for a constraint violation is proportional to the violated constraint’s weight.
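A rough sketch of the objective this algorithm minimizes (hypothetical helper names and toy data; constant penalties stand in for the paper’s f_M and f_C, which actually scale with the distance under the learned per-cluster metrics):

```python
import numpy as np

def mpck_objective(X, labels, centers, A_list, must_link, cannot_link,
                   w=1.0, w_bar=1.0):
    """Sketch of the MPCK-means objective: per-cluster metric distortion
    (with the log det A_h normalizer) plus weighted violation penalties."""
    J = 0.0
    for i, x in enumerate(X):
        h = labels[i]
        d = x - centers[h]
        J += d @ A_list[h] @ d - np.log(np.linalg.det(A_list[h]))
    for i, j in must_link:
        if labels[i] != labels[j]:
            J += w       # simplified stand-in for w_ij * f_M(x_i, x_j)
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            J += w_bar   # simplified stand-in for w_ij * f_C(x_i, x_j)
    return J

# Hypothetical toy data: a labeling that honors the constraints scores lower.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers = np.array([[0.05, 0.0], [5.05, 5.0]])
A_list = [np.eye(2), np.eye(2)]  # one (here trivial) metric per cluster
good = mpck_objective(X, [0, 0, 1, 1], centers, A_list, [(0, 1)], [(0, 2)])
bad = mpck_objective(X, [0, 1, 1, 1], centers, A_list, [(0, 1)], [(0, 2)])
print(good < bad)  # -> True
```

MPCK-means alternates assignment, mean-update, and metric-update steps so that this combined distortion-plus-penalty objective decreases.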
9 / 12
MPCK-means algorithm – Bilenko et al. (2004)
10 / 12
MPCK-means algorithm – Bilenko et al. (2004)
11 / 12
References
Basu, S., Davidson, I., and Wagstaff, K. (2008). Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC, 1st edition.

Bilenko, M., Basu, S., and Mooney, R. J. (2004). Integrating constraints and metric learning in semi-supervised clustering. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, pages 11–, New York, NY, USA. ACM.

Wagstaff, K., Cardie, C., Rogers, S., and Schrödl, S. (2001). Constrained k-means clustering with background knowledge. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, pages 577–584, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Xing, E. P., Ng, A. Y., Jordan, M. I., and Russell, S. (2003). Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems, pages 505–512. MIT Press.
12 / 12