Unsupervised Clustering - sites.cs.ucsb.edu


Unsupervised Clustering

PR, ANN, & ML

Unsupervised Clustering

Training samples are not labeled

May not know
  how many classes c there are
  the prior probabilities P(ω_i)
  the state-conditional densities p(x | ω_i)

Automatic discovery of structure

Intuitively, objects in the same class stick together and form clusters

Locating groups (clusters) having similar

measurements

Given n unlabelled samples {x_1, x_2, ..., x_n}, partition them into c clusters ω_1, ω_2, ..., ω_c

Unsupervised Clustering (cont)


Similarity Measure

Need a similarity measure s(x, x')
(e.g., a distance between x and x')

Minkowski metric:
$$d(\mathbf{x},\mathbf{y}) = \Big(\sum_{k=1}^{d} |x_k - y_k|^q\Big)^{1/q}, \qquad q \ge 1$$

Mahalanobis distance:
$$d(\mathbf{x},\mathbf{y}) = (\mathbf{x}-\mathbf{y})^t\,\boldsymbol\Sigma^{-1}\,(\mathbf{x}-\mathbf{y})$$

Normalized inner product:
$$s(\mathbf{x},\mathbf{y}) = \frac{\mathbf{x}^t\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}$$

Commonality (for binary features):
$$s(\mathbf{x},\mathbf{y}) = \frac{\mathbf{x}^t\mathbf{y}}{d}$$
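These measures can be sketched directly in NumPy (an illustrative sketch; the function names are mine, and the Mahalanobis version returns the squared distance):

```python
import numpy as np

def minkowski(x, y, q=2):
    # Minkowski metric: (sum_k |x_k - y_k|^q)^(1/q), q >= 1; q = 2 is Euclidean
    return float(np.sum(np.abs(x - y) ** q) ** (1.0 / q))

def mahalanobis_sq(x, y, cov):
    # squared Mahalanobis distance: (x - y)^T Sigma^{-1} (x - y)
    d = x - y
    return float(d @ np.linalg.inv(cov) @ d)

def cosine(x, y):
    # normalized inner product: x^T y / (||x|| ||y||)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(minkowski(x, y))                  # sqrt(2), the Euclidean distance
print(mahalanobis_sq(x, y, np.eye(2)))  # identity covariance: squared Euclidean, 2.0
print(cosine(x, x))                     # 1.0 for identical directions
```

With the identity covariance the Mahalanobis distance reduces to the (squared) Euclidean distance, which is a quick sanity check.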


Similarity Measure (cont.)

How to set the threshold?
  too large: all samples are assigned to a single cluster
  too small: each sample ends up in its own cluster

How to properly weight each feature?
  e.g., how do the brightness of a fish and the length of a fish relate?

How to scale (or normalize) the different measures?

Axis Scaling

Axis Scaling (cont.)

Threshold

Criteria for Clustering

A criterion function for clustering: the sum-of-squared-error criterion

$$J_e = \sum_{i=1}^{c} \sum_{\mathbf{x}\in\mathcal{D}_i} \|\mathbf{x}-\mathbf{m}_i\|^2, \qquad \mathbf{m}_i = \frac{1}{n_i}\sum_{\mathbf{x}\in\mathcal{D}_i} \mathbf{x}$$

A related form in terms of pairwise distances within each cluster:

$$J = \sum_{i=1}^{c} \frac{1}{n_i} \sum_{\mathbf{x},\mathbf{x}'\in\mathcal{D}_i} \|\mathbf{x}-\mathbf{x}'\|^2$$
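The sum-of-squared-error criterion J_e is straightforward to compute; a small NumPy sketch with made-up data:

```python
import numpy as np

def sum_squared_error(X, labels):
    # J_e = sum over clusters i of sum over x in D_i of ||x - m_i||^2
    J = 0.0
    for c in np.unique(labels):
        cluster = X[labels == c]
        m = cluster.mean(axis=0)          # cluster mean m_i
        J += float(np.sum((cluster - m) ** 2))
    return J

X = np.array([[0.0], [2.0], [10.0], [12.0]])
labels = np.array([0, 0, 1, 1])
print(sum_squared_error(X, labels))   # 4.0: each point is 1 away from its mean
```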


Within and Between group variance

Minimize the within-group scatter and maximize the between-group scatter, e.g., via the trace or determinant of the appropriate scatter matrix

trace: square of the scattering radius
determinant: square of the scattering volume

$$\mathbf{S}_W = \sum_{i=1}^{c} \sum_{\mathbf{x}\in\mathcal{D}_i} (\mathbf{x}-\mathbf{m}_i)(\mathbf{x}-\mathbf{m}_i)^t$$

$$\mathbf{S}_B = \sum_{i=1}^{c} n_i\,(\mathbf{m}_i-\mathbf{m})(\mathbf{m}_i-\mathbf{m})^t$$

Criteria for Clustering (cont.)


Pathology: when class sizes differ, minimizing

$$J_e = \sum_{i=1}^{c} \sum_{\mathbf{x}\in\mathcal{D}_i} \|\mathbf{x}-\mathbf{m}_i\|^2$$

can favor the wrong partition:

Relaxed constraint: allows the large class to grow into the smaller class
Stringent constraint: disallows the large class and splits it


K-Means Algorithm

(fixed # of clusters)

1. Arbitrarily pick c cluster centers and assign each sample to the nearest center
2. Compute the sample mean of each cluster
3. Reassign every sample to the cluster with the nearest mean
4. Repeat from 2 if any assignment changed; otherwise stop
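The steps above can be sketched in NumPy (an illustration; the deterministic spread-out initialization is my choice, the slide only says to pick centers arbitrarily):

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    # simple deterministic initialization: k samples spread across the data
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # assignment step: each sample goes to the nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: recompute each cluster's sample mean
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # no change: stop
            break
        centers = new_centers
    return labels, centers

# two well-separated blobs
X = np.vstack([np.zeros((10, 2)), 10 * np.ones((10, 2))])
labels, centers = kmeans(X, 2)
```

On this toy data the two blobs are recovered exactly and the centers converge to the blob means.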


K-Means

In some sense, it is a simplified version of case II of learning a parametric form:

$$P(\omega_i \mid \mathbf{x}_k, \hat{\boldsymbol\theta}) = \frac{p(\mathbf{x}_k \mid \omega_i, \hat{\boldsymbol\theta}_i)\,\hat{P}(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x}_k \mid \omega_j, \hat{\boldsymbol\theta}_j)\,\hat{P}(\omega_j)}$$

Instead of fractional (multiple) membership, give each sample entirely to the class whose mean is closest to it:

$$\hat{P}(\omega_i \mid \mathbf{x}_k) = \begin{cases} 1 & \text{for the class with the closest mean} \\ 0 & \text{otherwise} \end{cases}$$


Fuzzy K-Means

In Matlab, you can find k-means (there called c-means) under the Fuzzy Logic Toolbox

It implements fuzzy k-means, where the membership function is not 1 or 0 but can be fractional
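A minimal sketch of the fractional-membership idea (my own illustration with the common fuzzifier m = 2, not the Toolbox code):

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=50):
    # deterministic initialization: c samples spread across the data
    centers = X[np.linspace(0, len(X) - 1, c).astype(int)].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # fractional memberships: u_ik in (0, 1), each row sums to 1
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # centers are membership-weighted means
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
    return u, centers

X = np.vstack([np.zeros((5, 2)), 10 * np.ones((5, 2))])
u, centers = fuzzy_cmeans(X, 2)
print(u.sum(axis=1))   # every row sums to 1 (fractional membership)
```

Unlike hard k-means, every sample carries a degree of membership in every cluster; for well-separated blobs the memberships are close to 0/1 anyway.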


Iterative K-means

Iterative refinement of the clusters to minimize

$$J_e = \sum_{i=1}^{c} \sum_{\mathbf{x}\in\mathcal{D}_i} \|\mathbf{x}-\mathbf{m}_i\|^2, \qquad \mathbf{m}_i = \frac{1}{n_i}\sum_{\mathbf{x}\in\mathcal{D}_i} \mathbf{x}$$

Each step: move one sample $\hat{\mathbf{x}}$ from cluster $\mathcal{D}_i$ to $\mathcal{D}_j$. The means and the partial criteria change by

$$\mathbf{m}_j^* = \mathbf{m}_j + \frac{\hat{\mathbf{x}}-\mathbf{m}_j}{n_j+1}, \qquad J_j^* = J_j + \frac{n_j}{n_j+1}\,\|\hat{\mathbf{x}}-\mathbf{m}_j\|^2$$

$$\mathbf{m}_i^* = \mathbf{m}_i - \frac{\hat{\mathbf{x}}-\mathbf{m}_i}{n_i-1}, \qquad J_i^* = J_i - \frac{n_i}{n_i-1}\,\|\hat{\mathbf{x}}-\mathbf{m}_i\|^2$$

The transfer decreases $J_e$ whenever

$$\frac{n_j}{n_j+1}\,\|\hat{\mathbf{x}}-\mathbf{m}_j\|^2 < \frac{n_i}{n_i-1}\,\|\hat{\mathbf{x}}-\mathbf{m}_i\|^2$$


1. Select an initial partition of the n samples into c clusters; compute J_e and the means m_1, ..., m_c
2. Select the next candidate sample $\hat{\mathbf{x}}$ (currently in cluster $\mathcal{D}_i$)
3. If the current cluster has more than one sample, compute
$$\rho_j = \frac{n_j}{n_j+1}\,\|\hat{\mathbf{x}}-\mathbf{m}_j\|^2 \;\;(j\ne i), \qquad \rho_i = \frac{n_i}{n_i-1}\,\|\hat{\mathbf{x}}-\mathbf{m}_i\|^2$$
4. Transfer $\hat{\mathbf{x}}$ to $\mathcal{D}_k$ if $\rho_k \le \rho_j$ for all j
5. Update J_e, m_i, and m_k
6. If J_e has not changed in n attempts, stop; otherwise go back to 2
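The transfer updates used in steps 3 and 5 can be verified numerically (a sketch with made-up one-dimensional data):

```python
import numpy as np

def sse(X):
    # sum of squared distances of the rows of X to their mean
    return float(np.sum((X - X.mean(axis=0)) ** 2))

# adding xhat to a cluster: J* = J + n/(n+1) * ||xhat - m||^2
cluster = np.array([[0.0], [2.0]])          # n = 2, mean m = 1
xhat = np.array([[7.0]])
n, m = len(cluster), cluster.mean(axis=0)
predicted = sse(cluster) + n / (n + 1) * float(np.sum((xhat - m) ** 2))
actual = sse(np.vstack([cluster, xhat]))
print(predicted, actual)                    # both 26.0

# removing xhat again: J* = J - n/(n-1) * ||xhat - m||^2
grown = np.vstack([cluster, xhat])          # n = 3, mean m = 3
n, m = len(grown), grown.mean(axis=0)
back = sse(grown) - n / (n - 1) * float(np.sum((xhat - m) ** 2))
print(back)                                 # 2.0, the sse of the original cluster
```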


Hierarchical Clustering

K-means assumes a "flat" description of the data

Hierarchical descriptions are frequently more natural


Hierarchical clustering (cont.)

Samples in the same cluster will remain so

at a higher level

(Dendrogram: samples x1 ... x6 merged successively across levels 1 to 5)


Bottom-up: agglomerative

Top-down: divisive

Agglomerative procedure:
1. Let ĉ = n and D_i = {x_i}, i = 1, ..., n
2. If ĉ ≤ c, stop
3. Find the nearest pair of distinct clusters, D_i and D_j
4. Merge D_i and D_j, delete D_j, and decrement ĉ by 1
5. Go to 2

Hierarchical clustering Options


At a certain step, we have c clusters, with

$$\mathbf{m}_i = \frac{1}{n_i}\sum_{\mathbf{x}\in\mathcal{D}_i} \mathbf{x}$$

Merging two clusters i and j gives

$$n_{ij} = n_i + n_j, \qquad \mathbf{m}_{ij} = \frac{n_i\mathbf{m}_i + n_j\mathbf{m}_j}{n_i + n_j} = \frac{1}{n_{ij}}\sum_{\mathbf{x}\in\mathcal{D}_i\cup\mathcal{D}_j} \mathbf{x}$$

Hierarchical clustering Procedure


The sum-of-squared-error criterion then increases:

$$\sum_{\mathbf{x}\in\mathcal{D}_{ij}} \|\mathbf{x}-\mathbf{m}_{ij}\|^2 = \sum_{\mathbf{x}\in\mathcal{D}_i} \|\mathbf{x}-\mathbf{m}_i\|^2 + \sum_{\mathbf{x}\in\mathcal{D}_j} \|\mathbf{x}-\mathbf{m}_j\|^2 + \frac{n_i n_j}{n_i+n_j}\,\|\mathbf{m}_i-\mathbf{m}_j\|^2$$

i*, j* are chosen to minimize this increase:

$$(i^*, j^*) = \arg\min_{i,j} \frac{n_i n_j}{n_i+n_j}\,\|\mathbf{m}_i-\mathbf{m}_j\|^2 = \arg\min_{i,j} d(i,j)$$

Merging continues until no pair remains whose merger increases the criterion function by less than a preset threshold

Hierarchical clustering Procedure (cont.)


In fact, the criterion function can be chosen in many different ways, resulting in different clustering behaviors:

Nearest neighbor (elongated classes):
$$d_{\min}(\mathcal{D}_i, \mathcal{D}_j) = \min_{\mathbf{x}\in\mathcal{D}_i,\,\mathbf{y}\in\mathcal{D}_j} \|\mathbf{x}-\mathbf{y}\|$$

Furthest neighbor (compact classes):
$$d_{\max}(\mathcal{D}_i, \mathcal{D}_j) = \max_{\mathbf{x}\in\mathcal{D}_i,\,\mathbf{y}\in\mathcal{D}_j} \|\mathbf{x}-\mathbf{y}\|$$

Criteria Function


•Start with every sample in its own cluster
•Merge clusters according to some criterion function
•Until two clusters remain
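The bottom-up procedure with the two linkage options can be sketched as follows (a brute-force illustration, fine for small n; `linkage="min"` is nearest neighbor, `"max"` is furthest neighbor):

```python
import numpy as np

def agglomerative(X, n_clusters=2, linkage="min"):
    # start with every sample in its own cluster
    clusters = [[i] for i in range(len(X))]
    agg = min if linkage == "min" else max
    while len(clusters) > n_clusters:
        best = None
        # find the nearest pair of distinct clusters under the chosen linkage
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = agg(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)   # merge the pair, delete one cluster
    return clusters

X = np.array([[0.0], [0.5], [10.0], [10.5]])
print(agglomerative(X, 2))   # the two close pairs end up together
```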


In Reality

With n objects, the distance matrix is n × n

For large databases, it is computationally expensive to compute and store this matrix

Solution: store only the k nearest clusters for each object; the cost drops to k × n


Divisive Clustering

Less often used

Have to be careful: the criterion function usually decreases monotonically (the clusters become purer with each split)

The natural grouping is the one just before a large drop in impurity occurs

(Figure: impurity vs. iteration, with the natural grouping at the large drop)


ISODATA Algorithm

Iterative Self-Organizing Data Analysis Technique

A tried-and-true clustering algorithm

Dynamically updates the number of clusters

Can go in both top-down (split) and bottom-up (merge) directions


Notation:

T: threshold on the number of samples in a cluster
N_D: approximate (desired) number of clusters
σ_s: maximum spread for splitting
D_m: maximum distance for merging
N_max: maximum number of clusters that can be merged


Algorithm

1. Cluster the existing data into N_c clusters, eliminating any cluster with fewer than T members and decreasing N_c accordingly. Exit when the classification of the samples no longer changes.
2. If the iteration is odd and N_c ≤ N_D/2 (too few clusters): split clusters whose samples are sufficiently spread out, increasing N_c. If any cluster has been split, go to 1.
3. Otherwise (iteration even, or N_c ≥ 2N_D): merge any pair of clusters whose samples are sufficiently close, decreasing N_c.
4. Go to step 1.


Step 1 (parameter: T)

1. Classify the samples according to the nearest mean
2. Discard clusters with fewer than T samples; decrease N_c
3. Recompute the sample mean of each cluster
4. If the classification changed, go back to 1; otherwise exit


Step 2

Compute for each cluster k:

$$d_k = \frac{1}{N_k}\sum_{\mathbf{y}\in\mathcal{D}_k} \|\mathbf{y}-\boldsymbol\mu_k\| \qquad \text{(average distance to the mean)}$$

$$\sigma_k^2 = \max_j \frac{1}{N_k}\sum_{\mathbf{y}\in\mathcal{D}_k} (y_j-\mu_{k,j})^2 \qquad \text{(largest per-coordinate variance)}$$

Compute globally:

$$\bar{d} = \frac{1}{N}\sum_{k=1}^{N_c} N_k\, d_k$$


Split cluster k (setting N_c = N_c + 1) if

$$\sigma_k^2 > \sigma_s^2 \;\text{ and }\; \big[\,(d_k > \bar{d} \text{ and } N_k > 2(T+1)) \;\text{ or }\; N_c \le N_D/2\,\big]$$

The two new cluster centers are displaced in opposite directions along the axis of largest variance

(Parameters: σ_s, N_D, T)


Step 3

Compute for each pair of clusters

$$d_{ij} = \|\boldsymbol\mu_i - \boldsymbol\mu_j\|$$

and sort the distances from smallest to largest, keeping those less than D_m


For each such pair (i, j) with d_ij < D_m, if neither cluster i nor cluster j has already been merged in this pass and the number of merges is below N_max:

Merge i and j (N_c = N_c − 1), with the combined mean

$$\boldsymbol\mu' = \frac{N_i\boldsymbol\mu_i + N_j\boldsymbol\mu_j}{N_i + N_j}$$


Spectral Clustering

A graph-theoretical approach

(Advanced) linear algebra is essential

A field by itself; we cover only the basics here


Graph notations

Undirected graph G = (V, E)

V = {v_1, ..., v_n} are the nodes (samples)

E = {e_{ij} | 0 ≤ i, j < n} are the edges, with weights (similarities) w_{ij}

W: adjacency matrix with entries w_{ij}

D: degree matrix (diagonal) with entries d_i = Σ_j w_{ij}

A: a subset of the vertices; \bar{A} = V \ A is its complement

1_A = (f_1, ..., f_n)^T, with f_i = 1 if v_i ∈ A and 0 otherwise


More graph notations

Size of a subset A of V
  based on its number of vertices: |A|
  based on its connections: vol(A) = Σ_{i∈A} d_i

Connected component A
  any two vertices in A can be joined by a path whose intermediate points all lie in A
  no connection between A and \bar{A}

A_1, ..., A_k form a partition if they are pairwise disjoint and their union is V


Similarity Definitions

RBF (fully connected or thresholded):
  e.g., a Gaussian kernel gives w_{ij} = exp(−‖x_i − x_j‖² / (2σ²))

ε-neighborhood graph:
  connect all points whose pairwise distance is less than ε

k-nearest neighbor:
  v_i and v_j are neighbors if either one is among the k nearest neighbors of the other

Mutual k-nearest neighbor:
  v_i and v_j are neighbors if each is among the k nearest neighbors of the other


44

Graph Laplacian: L = D - W
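A key property used on the next slides: the multiplicity of the eigenvalue 0 of L equals the number of connected components. A small NumPy check on a made-up 4-node graph with two components:

```python
import numpy as np

# adjacency of two disconnected pairs: {0, 1} and {2, 3}
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W                              # unnormalized graph Laplacian
eigvals = np.linalg.eigvalsh(L)        # real, ascending (L is symmetric PSD)
n_components = np.sum(np.isclose(eigvals, 0.0))
print(n_components)                    # 2 zero eigenvalues = 2 components
```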

One connected component

Multiple connected components

Normalized Laplacian

Unnormalized Spectral Clustering

Normalized Spectral Clustering


Toy Example

Random sample of 200 points drawn from 4 Gaussians

Similarity graph: fully connected, or 10 nearest neighbors


Min-Cut Formulation

If the edge weights represent degree of similarity, the optimal bi-partitioning of the graph minimizes the cut (the min-cut problem):

$$\mathrm{cut}(A,B) = \sum_{u\in A,\,v\in B} w(u,v)$$

Intuition: cut the connections between dissimilar samples, so the edge weights (similarities) crossing the cut should be small


Problem with Min-cut

Tends to cut off small regions

The total edge weight (green + red + blue) is constant

Min-cut minimizes the summed weight of the blue edges only, with no regard to the green and red edges (the other half of the picture)


Remedy: distribute the total weight so that
  the summed weight of the blue edges is minimized (max between-group variance)
  the summed weight of the red (or green) edges is maximized (min within-group variance)


Normalized Cut

Penalize cutting out small, isolated clusters:

$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)} = \frac{\text{blue}}{\text{total}-\text{green}} + \frac{\text{blue}}{\text{total}-\text{red}}$$

where

$$\mathrm{cut}(A,B) = \sum_{u\in A,\,v\in B} w(u,v) \qquad \text{(blue)}$$

$$\mathrm{assoc}(A,V) = \sum_{u\in A,\,v\in V} w(u,v) \qquad \text{(red + blue = total − green)}$$

$$\mathrm{assoc}(B,V) = \sum_{u\in B,\,v\in V} w(u,v) \qquad \text{(green + blue = total − red)}$$

V: the full vertex set


Normalized Cut (cont.)

Penalize cutting out small, isolated clusters:

$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)} = \frac{\text{blue}}{\text{total}-\text{green}} + \frac{\text{blue}}{\text{total}-\text{red}}$$

A small blue cut alone no longer wins: a small red (or green) side makes its denominator small and the corresponding ratio large


Intuition

$$\mathrm{assoc}(A,V) = \sum_{u\in A,\,v\in V} w(u,v) = \sum_{u\in A,\,v\in A} w(u,v) + \sum_{u\in A,\,v\in B} w(u,v) = \mathrm{assoc}(A,A) + \mathrm{cut}(A,B)$$

Define the normalized association

$$\mathrm{Nassoc}(A,B) = \frac{\mathrm{assoc}(A,A)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{assoc}(B,B)}{\mathrm{assoc}(B,V)} = \frac{\text{red}}{\text{red}+\text{blue}} + \frac{\text{green}}{\text{green}+\text{blue}}$$

Then

$$\mathrm{Ncut}(A,B) = \left(1-\frac{\mathrm{assoc}(A,A)}{\mathrm{assoc}(A,V)}\right) + \left(1-\frac{\mathrm{assoc}(B,B)}{\mathrm{assoc}(B,V)}\right) = 2 - \mathrm{Nassoc}(A,B)$$

Nassoc reflects the intra-class connections, which should be maximized

Ncut represents the inter-class connections, which should be minimized

Minimizing Ncut therefore simultaneously maximizes Nassoc


Solution

How do you define similarity?
  Multiple measurements (i.e., a feature vector) can be used, e.g., a weighted combination of per-feature differences:

$$d(i,j) = \sum_k w_k\, |v_i^{(k)} - v_j^{(k)}|$$

How do you find the minimal normalized cut?
  The solution turns out to be a generalized eigenvalue problem!


Encode the partition with an indicator vector $\mathbf{x}\in\mathbb{R}^N$, N = |V|:

$$x_i = \begin{cases} 1 & i\in A \\ -1 & i\in B \end{cases}$$

Let $d_i = \sum_j w(i,j)$, let $\mathbf{D} = \mathrm{diag}(d_1,\ldots,d_N)\in\mathbb{R}^{N\times N}$ (zeros off the diagonal), and let $\mathbf{W}\in\mathbb{R}^{N\times N}$ have entries $w(i,j)$:

$$\mathbf{W} = \begin{pmatrix} w(1,1) & w(1,2) & \cdots & w(1,N) \\ w(2,1) & w(2,2) & \cdots & w(2,N) \\ \vdots & & \ddots & \vdots \\ w(N,1) & w(N,2) & \cdots & w(N,N) \end{pmatrix}$$

Also let $k = \sum_{x_i>0} d_i \big/ \sum_i d_i$, the fraction of the total degree on the A side


With this notation, the normalized cut can be written in matrix form:

$$\mathrm{Ncut}(A,B) = \frac{(\mathbf{1}+\mathbf{x})^T(\mathbf{D}-\mathbf{W})(\mathbf{1}+\mathbf{x})}{4k\,\mathbf{1}^T\mathbf{D}\mathbf{1}} + \frac{(\mathbf{1}-\mathbf{x})^T(\mathbf{D}-\mathbf{W})(\mathbf{1}-\mathbf{x})}{4(1-k)\,\mathbf{1}^T\mathbf{D}\mathbf{1}}$$

since $(\mathbf{1}+\mathbf{x})^T(\mathbf{D}-\mathbf{W})(\mathbf{1}+\mathbf{x}) = 4\,\mathrm{cut}(A,B)$ and $k\,\mathbf{1}^T\mathbf{D}\mathbf{1} = \mathrm{assoc}(A,V)$ (and similarly for the B term with $\mathbf{1}-\mathbf{x}$ and $1-k$)


Setting $b = k/(1-k)$ and $\mathbf{y} = (\mathbf{1}+\mathbf{x}) - b(\mathbf{1}-\mathbf{x})$, minimizing Ncut reduces to

$$\min_{\mathbf{x}} \mathrm{Ncut}(\mathbf{x}) = \min_{\mathbf{y}} \frac{\mathbf{y}^T(\mathbf{D}-\mathbf{W})\mathbf{y}}{\mathbf{y}^T\mathbf{D}\mathbf{y}}, \qquad \text{subject to } \mathbf{y}^T\mathbf{D}\mathbf{1} = 0$$

which is a generalized eigenvalue problem:

$$(\mathbf{D}-\mathbf{W})\mathbf{y} = \lambda\mathbf{D}\mathbf{y} \;\Longleftrightarrow\; \mathbf{D}^{-1/2}(\mathbf{D}-\mathbf{W})\mathbf{D}^{-1/2}\mathbf{z} = \lambda\mathbf{z}, \quad \mathbf{z} = \mathbf{D}^{1/2}\mathbf{y}$$

$\mathbf{D}^{-1/2}(\mathbf{D}-\mathbf{W})\mathbf{D}^{-1/2}$ is a symmetric positive semi-definite matrix:
  real, non-negative eigenvalues
  orthogonal eigenvectors
  $\mathbf{z}_0 = \mathbf{D}^{1/2}\mathbf{1}$ is an eigenvector with eigenvalue 0


Hence, the second smallest eigenvector contains the minimal-cut solution (in floating-point form; it is thresholded to recover the discrete partition)

Even though computing all eigenvectors/eigenvalues is expensive, O(n³), computing a small number of them is not (Lanczos method)

Recursive application of the procedure yields more than two clusters
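Putting the pieces together, a minimal sketch on a made-up toy graph (the split follows the sign of the second-smallest generalized eigenvector):

```python
import numpy as np

# toy graph: two tight pairs {0, 1} and {2, 3} joined by one weak edge
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
d = W.sum(axis=1)
D_isqrt = np.diag(1.0 / np.sqrt(d))
# normalized Laplacian D^{-1/2}(D - W)D^{-1/2}: symmetric PSD
L_sym = np.eye(len(W)) - D_isqrt @ W @ D_isqrt
vals, vecs = np.linalg.eigh(L_sym)      # ascending eigenvalues
# map the second-smallest eigenvector back: y solves (D - W)y = lambda D y
y = D_isqrt @ vecs[:, 1]
labels = (y > 0).astype(int)
print(labels)   # separates {0, 1} from {2, 3} (orientation may flip)
```

Thresholding at 0 is the simplest way to discretize the floating-point eigenvector; searching over splitting points for the best Ncut value is a common refinement.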

Results

Recommended