
Data Mining Course 2007

Eric Postma

Clustering

Overview

Three approaches to clustering

1. Minimization of reconstruction error

• PCA, nlPCA, k-means clustering

2. Distance preservation

• Sammon mapping, Isomap, SPE

3. Maximum likelihood density estimation

• Gaussian Mixtures

• These datasets have identical statistics up to 2nd order (identical means and covariances), yet they clearly differ in structure

1. Minimization of reconstruction error

Illustration of PCA (1)

• Face dataset (Rice database)

Illustration of PCA (2)

• Average face

Illustration of PCA (3)

• Top 10 Eigenfaces

Each 39-dimensional data item describes different aspects of the welfare and poverty of one country.

2D PCA projection
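To make these projections concrete, here is a minimal PCA sketch in Python (not from the course; names and data are illustrative): it computes the data mean (the "average face" for image data), the top principal directions (the eigenfaces), and a 2D projection.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    X: (n_samples, n_features); rows are data items, e.g. faces as
    pixel vectors or 39-dimensional country welfare/poverty records.
    """
    mean = X.mean(axis=0)              # the "average face" for image data
    Xc = X - mean                      # centre the data
    # SVD of the centred data: rows of Vt are the principal
    # directions (the eigenfaces, when the rows are face images)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean

# illustrative data: 100 random 39-dimensional items projected to 2D
X = np.random.randn(100, 39)
Y, components, mean = pca_project(X)
print(Y.shape)  # (100, 2)
```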

Non-linear PCA

• Using neural networks (to be discussed tomorrow)

2. Distance preservation

Sammon mapping

• Given a data set X, the distance between any two samples i and j is defined as D_ij

• We consider the projection onto a two-dimensional plane in which the projected points are separated by distances d_ij

• Define an error (stress) function:

E = \frac{1}{\sum_{i<j} D_{ij}} \sum_{i<j} \frac{(D_{ij} - d_{ij})^2}{D_{ij}}
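A minimal sketch of minimizing this stress (an assumed implementation: Sammon's original method uses a pseudo-Newton step; plain gradient descent is used here for brevity):

```python
import numpy as np

def sammon_map(X, n_iter=500, lr=0.1, eps=1e-9, seed=0):
    """Map X to 2D by gradient descent on the Sammon stress."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # input-space distances D_ij and the normalizing constant
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    c = D[np.triu_indices(n, 1)].sum()
    Y = rng.standard_normal((n, 2)) * 1e-2      # random 2D start
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]
        d = np.linalg.norm(diff, axis=-1)       # output distances d_ij
        W = (d - D) / (D * d + eps)             # per-pair stress weight
        np.fill_diagonal(W, 0.0)
        grad = (2.0 / c) * (W[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad                          # descend the gradient
    return Y

Y = sammon_map(np.random.randn(50, 5))          # toy 5D data to 2D
```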

Sammon mapping

Main limitations of Sammon

• The Sammon mapping procedure is a gradient descent method

• Main limitation: local minima

• Classical MDS may be preferred because it finds the global minimum in closed form (being based on an eigendecomposition, like PCA)

• Both methods have difficulty with “curved or curly subspaces”

Isomap

• Tenenbaum, de Silva, and Langford (2000)

• Build a graph in which each node represents a data point

• Compute shortest distances along the graph (e.g., Dijkstra’s algorithm)

• Store all distances in a matrix D

• Perform MDS on the matrix D (see the sketch below)
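A compact sketch of these steps (an assumed implementation; it builds a k-nearest-neighbour graph and assumes that graph is connected):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform

def isomap(X, k=7, n_components=2):
    """Isomap: k-NN graph, graph shortest paths, classical MDS."""
    n = X.shape[0]
    E = squareform(pdist(X))                  # Euclidean distances
    G = np.full((n, n), np.inf)               # inf marks "no edge"
    for i in range(n):
        nn = np.argsort(E[i])[1:k + 1]        # k nearest neighbours
        G[i, nn] = E[i, nn]
    G = np.minimum(G, G.T)                    # symmetrize the graph
    D = shortest_path(G, method='D')          # Dijkstra's algorithm
    # classical MDS on D via double centring
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:n_components]  # largest eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

Y = isomap(np.random.randn(200, 3), k=7)      # toy data, K=7 as above
```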

Illustration of Isomap (1)

• For two arbitrary points on the manifold, the Euclidean distance does not always reflect similarity (cf. the dashed blue line)

Illustration of Isomap (2)

• Isomap finds the appropriate shortest path along the graph (red curve, for K=7, N=1000)

Illustration of Isomap (3)

• Two-dimensional embedding (red line is the shortest path along the graph, blue line is the true distance in the embedding)

Illustration of Isomap (4)

• Isomap’s (●) ability to find the intrinsic dimensionality, compared to PCA and MDS (∆ and o)

Illustration of Isomap (5)

Illustration of Isomap (6)

Illustration of Isomap (7)

• Interpolation along a straight line

Stochastic Proximity Embedding

• SPE algorithm

• Agrafiotis, D.K. and Xu, H. (2002). A self-organizing principle for learning nonlinear manifolds. Proceedings of the National Academy of Sciences U.S.A.

Stress function

f(d_{ij}, r_{ij}) = (d_{ij} - r_{ij})^2

where d_{ij} is the output proximity between points i and j, and r_{ij} is the input proximity between points i and j
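A minimal sketch of the SPE procedure built on this stress (an assumed implementation of the basic update rule; the published algorithm adds a neighbourhood cutoff for learning manifolds, omitted here):

```python
import numpy as np

def spe(X, n_components=2, n_cycles=100, n_updates=5000,
        lam=1.0, lam_min=0.01, eps=1e-10, seed=0):
    """Stochastic Proximity Embedding: pairwise stochastic updates."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Y = rng.random((n, n_components))         # random initial embedding
    step = (lam - lam_min) / n_cycles         # learning-rate schedule
    for _ in range(n_cycles):
        for _ in range(n_updates):
            i, j = rng.choice(n, size=2, replace=False)
            r = np.linalg.norm(X[i] - X[j])   # input proximity r_ij
            d = np.linalg.norm(Y[i] - Y[j])   # output proximity d_ij
            # move the pair to reduce (d_ij - r_ij)^2
            delta = lam * 0.5 * (r - d) / (d + eps) * (Y[i] - Y[j])
            Y[i] += delta
            Y[j] -= delta
        lam -= step                           # anneal the learning rate
    return Y
```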

Swiss roll data set

Original 3D set and the 2D embedding obtained by SPE

Stress as a function of embedding dimension (averaged over 30 runs)

Scalability (# steps for four set sizes): linear scaling

Conformations of methylpropylether (C1–C2–C3–O4–C5)

Diamine combinatorial library

Clustering

• Minimize the total within-cluster variance (reconstruction error)

E = \sum_{c=1}^{C} \sum_{i=1}^{N} k_{ic} \, \lVert x_i - w_c \rVert^2

• k_ic = 1 if data point i belongs to cluster c, and 0 otherwise

K-means clustering

1. Random selection of C cluster centres
2. Partition the data by assigning each point to the nearest cluster centre
3. The mean of each partition is the new cluster centre

A distance threshold may be used…

• Effect of distance threshold on the number of clusters
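A minimal sketch of steps 1–3 above (an assumed implementation, without the distance-threshold variant):

```python
import numpy as np

def kmeans(X, C, n_iter=100, seed=0):
    """Basic k-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    # 1. random selection of C cluster centres (from the data)
    centres = X[rng.choice(len(X), size=C, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # 2. partition: assign each point to its nearest centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # 3. the mean of each partition is the new cluster centre
        new = np.array([X[labels == c].mean(axis=0)
                        if np.any(labels == c) else centres[c]
                        for c in range(C)])
        if np.allclose(new, centres):         # converged
            break
        centres = new
    return centres, labels

centres, labels = kmeans(np.random.randn(300, 2), C=3)
```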

Main limitation of k-means clustering

• The final partitioning and cluster centres depend on the initial configuration

• Discrete partitioning may introduce errors

• Instead of minimizing the reconstruction error, we may maximize the likelihood of the data (given some probabilistic model)

Neural algorithms related to k-means

• Kohonen self-organizing feature maps

• Competitive learning networks

3. Maximum likelihood

Gaussian Mixtures

• Model the pdf of the data using a mixture of distributions

p(x) = \sum_{i=1}^{K} p(x \mid i) \, P(i)

• K is the number of kernels (K << the number of data points)

• Common choice for the component densities p(x \mid i) is the spherical Gaussian:

p(x \mid i) = \frac{1}{(2\pi\sigma_i^2)^{d/2}} \exp\!\left( -\frac{\lVert x - \mu_i \rVert^2}{2\sigma_i^2} \right)

Illustration of EM applied to GM model

The solid line gives the initialization of the EM algorithm: two kernels, P(1) = P(2) = 0.5, μ1 = 0.0752, μ2 = 1.0176, σ1 = σ2 = 0.2356

Convergence after 10 EM steps.
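A minimal sketch of EM for a one-dimensional two-kernel mixture like the one illustrated (an assumed implementation; initialization here is random rather than the exact values above):

```python
import numpy as np

def em_gmm_1d(x, K=2, n_steps=10, seed=0):
    """EM for a 1D Gaussian mixture with K kernels."""
    rng = np.random.default_rng(seed)
    P = np.full(K, 1.0 / K)                    # mixing weights P(i)
    mu = rng.choice(x, size=K, replace=False)  # initial means
    sigma = np.full(K, x.std())                # initial std devs
    for _ in range(n_steps):
        # E-step: responsibilities p(i | x_n)
        dens = (np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
                / (np.sqrt(2 * np.pi) * sigma))
        resp = dens * P
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate P(i), mu_i, sigma_i
        Nk = resp.sum(axis=0)
        P = Nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return P, mu, sigma

# two well-separated 1D kernels, similar in spirit to the slide
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 0.24, 200), rng.normal(1.0, 0.24, 200)])
print(em_gmm_1d(x, K=2, n_steps=10))
```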

Relevant literature

• L.J.P. van der Maaten, E.O. Postma, and H.J. van den Herik (submitted). Dimensionality Reduction: A Comparative Review.

• http://www.cs.unimaas.nl/l.vandermaaten

Recommended