48
Cut-based clustering algorithms Pasi Fränti Winter workshop on Data Mining and Pattern Recognition Mekrijärvi 4.3.2013 Speech and Image Processing Unit School of Computing University of Eastern Finland

Cut-based clustering algorithms

  • Upload
    ozzy

  • View
    38

  • Download
    0

Embed Size (px)

DESCRIPTION

Speech and Image Processing Unit School of Computing University of Eastern Finland. Winter workshop on Data Mining and Pattern Recognition Mekrijärvi 4.3.2013. Cut-based clustering algorithms. Pasi Fränti. Part I: Clustering problem. Cut-based clustering. What is cut? - PowerPoint PPT Presentation

Citation preview

Page 1: Cut-based clustering algorithms

Cut-based clustering algorithmsPasi Fränti

Winter workshop on Data Mining and Pattern Recognition Mekrijärvi 4.3.2013

Speech and Image Processing UnitSchool of Computing

University of Eastern Finland

Page 2: Cut-based clustering algorithms

Part I:Clustering problem

Page 3: Cut-based clustering algorithms

Cut-based clustering

• What is cut?

• Can we used graph theory in clustering?

• Is normalized-cut useful?

• Are cut-based algorithms efficient?

Page 4: Cut-based clustering algorithms

Application exampleLocation markers

Page 5: Cut-based clustering algorithms

Application exampleClustered

Page 6: Cut-based clustering algorithms

Definitions and data

Set of N data points:X={x1, x2, …, xN}

Set of M cluster prototypes (centroids):C={c1, c2, …, cM},

P={p1, p2, …, pM},

Partition of the data:

Page 7: Cut-based clustering algorithms

Cluster prototypesPartition of data

Clustering solution

Page 8: Cut-based clustering algorithms

Cluster prototypesPartition of data

Centroids as prototypes

Partition by nearestprototype mapping

Duality of partition and centroids

Page 9: Cut-based clustering algorithms

Clustering problem

Where are the clusters?– Given data set (X) of N data points, and number of

clusters (M), find the clusters.

– Result is partition of cluster model (prototypes).

How many clusters?– Define cost function f.

– Repeat the clustering algorithm for several M.

– Select the best result according to f.

How to solve efficiently?

Algorithmic problemM

athematical

problem

Computer science

problem

Page 10: Cut-based clustering algorithms

Cluster missingClusters missing

Too m

any clusters

Incorrect cluster allocation

Incorrect number of clusters

Challenges in clustering

Page 11: Cut-based clustering algorithms

• Clustering method = defines the problem

• Clustering algorithm = solves the problem

• Problem defined as cost function– Goodness of one cluster– Similarity vs. distance – Global vs. local cost function (what is “cut”)

• Solution: algorithm to solve the problem

Clustering method

Page 12: Cut-based clustering algorithms

Minimize intra-cluster variance

K

k

kj

kiji cxcxd

1

2),(

N

ipi i

cxN

PCMSE1

21),(

Euclidean distance to centroid:

Mean square error:

Page 13: Cut-based clustering algorithms

Part II:Cut-based clustering

Page 14: Cut-based clustering algorithms

• Usually assumes graph

• Based on concept of cut

• Includes implicit assumptions which are often:– Even false ones!– No difference than clustering in vector space– Implies sub-optimal heuristics

Cut-based clustering

Page 15: Cut-based clustering algorithms

• Minimum-spanning tree based clustering (single link)

• Split-and-merge (Lin&Chen TKDE 2005): Split the data set using K-means, then merge similar clusters based on Gaussian distribution cluster similarity.

• Split-and-merge (Li, Jiu, Cot, PR 2009): Splits data into a large number of subclusters, then remove and add prototypes until no change.

• DIVFRP (Zhong et al, PRL 2008): Dividing according to furthest point heuristic.

• Normalized-cut (Shi&Malik, PAMI-2000): Cut-based, minimizing the disassociation between the groups and maximizing the association within the groups.

• Ratio-Cut (Hagen&Kahng, 1992)

• Mcut (Ding et al, ICDM 2001)

• Max k-cut (Frieze&Jerrum 1997)

• Feng et al, PRL 2010. Particle Swarm Optimization for selecting the hyperplane.

Cut-based clustering methods

Methods that I did not

find to process more

detailed…

Page 16: Cut-based clustering algorithms

Clustering a graph

But where we get this…?

Page 17: Cut-based clustering algorithms

Distance graph

Distance graph

37

3

2

6

5

74

2

5

34

7

Calculate from vector space!

Page 18: Cut-based clustering algorithms

Space complexity of graph

Distance graph Complete graph

N∙(N-1)/2 edges = O(N2)

37

3

2

6

5

74

2

5

34

7

But…

Page 19: Cut-based clustering algorithms

Minimum spanning tree (MST)

Distance graph MST

Works with simple examples like this

37

3

2

6

5

74

2

5

34

7 4

23

52

3

Page 20: Cut-based clustering algorithms

Cut

Graph cut Resulted clusters

Cost function is to maximize the weight of edges cut

This equals to minimizing the within cluster edge weights

Page 21: Cut-based clustering algorithms

Cut

Graph cut Resulted clusters

Equivalent to minimizing MSE!

Page 22: Cut-based clustering algorithms

Stopping criterionEnds up to a local minimum

21.2

21.3

21.4

21.5

21.6

21.7

21.8

50 60 70 80 90

Number of clusters

SC

RepeatedK-means

RLS

Divisive

Agglomerativ

e

Page 23: Cut-based clustering algorithms

Clustering method

0

5

10

15

20

25

30

35

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Number of cluster

MS

E

0

2

4

6

8

10

12

DB

I & F

-tes

t

MSEDBIF-test

Minimum point

Page 24: Cut-based clustering algorithms

Part III:Efficient divisive clustering

Page 25: Cut-based clustering algorithms

Motivation

• Efficiency of divide-and-conquer approach• Hierarchy of clusters as a result• Useful when solving the number of clusters

Challenges• Design problem 1: What cluster to split?• Design problem 2: How to split?• Sub-optimal local optimization at best

Divisive approach

Page 26: Cut-based clustering algorithms

Split(X, M) C, P m 1; REPEAT

Select cluster to be split; Split the cluster; m m+1; UpdateDataStructures;

UNTIL m=M;

Split-based (divisive) clustering

Page 27: Cut-based clustering algorithms

• Heuristic choices:– Cluster with highest variance (MSE)

– Cluster with most skew distribution (3rd moment)

• Optimal choice:– Tentatively split all clusters

– Select the one that decreases MSE most!

• Complexity of choice:– Heuristics take the time to compute the measure

– Optimal choice takes only twice (2) more time!!!

– The measures can be stored, and only two new clusters appear at each step to be calculated.

Select cluster to be split

Use this !

Page 28: Cut-based clustering algorithms

Selection example

8.2

6.5

7.5

4.3

11.2

11.6

Biggest MSE…

… but dividing this decreases MSE more

Page 29: Cut-based clustering algorithms

Selection example

4.1

6.3

Only two new values need to be calculated

7.5

8.2

4.3

6.5

11.6

Page 30: Cut-based clustering algorithms

How to split

• Centroid methods:– Heuristic 1: Replace C by C- and C+– Heuristic 2: Two furthest vectors.– Heuristic 3: Two random vectors.

• Partition according to principal axis:

– Calculate principal axis– Select dividing point along the axis– Divide by a hyperplane– Calculate centroids of the two sub-clusters

Page 31: Cut-based clustering algorithms

Splitting along principal axispseudo code

Step 1: Calculate the principal axis.

Step 2: Select a dividing point.

Step 3: Divide the points by a hyper plane.

Step 4: Calculate centroids of the new clusters.

Page 32: Cut-based clustering algorithms

Example of dividing

Dividing hyper plane

Prin

cipa

l axi

s

Page 33: Cut-based clustering algorithms

Optimal dividing pointpseudo code of Step 2

Step 2.1: Calculate projections on the principal axis.

Step 2.2: Sort vectors according to the projection.

Step 2.3: FOR each vector xi DO:- Divide using xi as dividing point.- Calculate distortion of subsets D1 and D2.

Step 2.4: Choose point minimizing D1+D2.

Page 34: Cut-based clustering algorithms

Finding dividing point

2

22

22

11

1

11' ii vc

n

nvc

n

nDD

1'

1

111

n

vcnc i

1'

2

222

n

vcnc i

• Calculating error for next dividing point:

• Update centroids:

Can be done in O(1) time!!!

Page 35: Cut-based clustering algorithms

Sub-optimality of the split

optimal partitionboundary for2-level clustering

Page 36: Cut-based clustering algorithms

Example of splitting processD

ivid

ing

hype

r pl

ane

Principal axis

2 clusters 3 clusters

Page 37: Cut-based clustering algorithms

4 clusters 5 clusters

Example of splitting process

Page 38: Cut-based clustering algorithms

6 clusters 7 clusters

Example of splitting process

Page 39: Cut-based clustering algorithms

8 clusters 9 clusters

Example of splitting process

Page 40: Cut-based clustering algorithms

10 clusters 11 clusters

Example of splitting process

Page 41: Cut-based clustering algorithms

12 clusters 13 clusters

Example of splitting process

Page 42: Cut-based clustering algorithms

14 clusters 15 clusters

MSE = 1.94

Example of splitting process

Page 43: Cut-based clustering algorithms

K-means refinement

Result afterre-partition:MSE = 1.39

Result after K-means: MSE = 1.33

Result directly

after split: MSE = 1.94

Page 44: Cut-based clustering algorithms

2

...2

...444422

M

N

M

NNNNNNNNni

MNM

NMNNN log

22...

44

22

n p nmin max

mp

Nn

npmnmnnnN m

max

maxmin21 ...

Time complexity

Number of processed vectors, assuming that clusters are always split into two equal halves:

Assuming inequal split to nmax and nmin sizes:

Page 45: Cut-based clustering algorithms

Number of vectors processed:

MNmp

N

Mp

N

p

N

p

N

p

Nn

M

m

i

log

...32

1

At each step, sorting the vectors is bottleneck:

MNNmp

N

p

N

mp

N

mp

NnnNT

M

m

M

mii

logloglog

loglog

1

1

Time complexity

Page 46: Cut-based clustering algorithms

Comparison of results

4.0

4.5

5.0

5.5

6.0

6.5

7.0

7.5

8.0

0 20 40 60 80 100 120 140 160

Time (seconds)

MS

E

Split 2

Split 1

SLR(KM)

S+KMSLR

SLR+KM S(KM)

Birch1

Page 47: Cut-based clustering algorithms

Conclusions

• Cut Same as partition• Cut-based method Empty concept• Cut-based algorithm Same as divisive• Graph-based clustering Flawed concept• Clustering of graph more relevant topic

Page 48: Cut-based clustering algorithms

References1. P Fränti, T Kaukoranta and O Nevalainen, "On the splitting method for

vector quantization codebook generation", Optical Engineering, 36 (11), 3043-3051, November 1997.

2. C-R Lin and M-S Chen, “ Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging”, TKDE, 17(2), 2005.

3. M Liu, X Jiang, AC Kot, “A multi-prototype clustering algorithm”, Pattern Recognition, 42(2009) 689-698.

4. J Shi and J Malik, “Normalized cuts and image segmentation”, TPAMI, 22(8), 2000.

5. L Feng, M-H Qiu, Y-X Wang, Q-L Xiang, Y-F Yang, K Liu, ”A fast divisive clustering algorithm using an improved discrete particle swarm optimizer”, Pattern Recognition Letters, 2010.

6. C Zhong, D Miao, R Wang, X Zhou, “DIVFRP: An automatic divisive hierarchical clustering method based on the furthest reference points”, Pattern Recognition Letters, 29 (2008) 2067–2077.