
Partitioning Algorithms: Basic Concepts


Page 1: Partitioning Algorithms: Basic Concepts

1

Partition n objects into k clusters

Optimize the chosen partitioning criterion; for example, minimize the Squared Error.

Squared Error of a cluster, where $m_i$ is the mean (centroid) of $C_i$:

$Error(C_i) = \sum_{p \in C_i} d(p, m_i)^2$

Squared Error of a clustering:

$Error = \sum_{i=1}^{k} Error(C_i) = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$

Page 2: Partitioning Algorithms: Basic Concepts

2

Example of Squared Error of a Cluster

[Figure: points P1, P2, P3 and the centroid m_i plotted on a 10 x 10 grid]

C_i = {P1, P2, P3}

P1 = (3, 7), P2 = (2, 3), P3 = (7, 5), m_i = (4, 5)

|d(P1, m_i)|^2 = (3-4)^2 + (7-5)^2 = 5
|d(P2, m_i)|^2 = 8
|d(P3, m_i)|^2 = 9

Error(C_i) = 5 + 8 + 9 = 22
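As a quick check of the arithmetic above, here is a minimal Python sketch (an illustration added for this write-up, not part of the original slides) that computes the squared error of a cluster from its points and centroid:

def squared_error(points, centroid):
    # Sum of squared Euclidean distances from each point to the centroid.
    cx, cy = centroid
    return sum((x - cx) ** 2 + (y - cy) ** 2 for (x, y) in points)

C_i = [(3, 7), (2, 3), (7, 5)]
m_i = (4, 5)
print(squared_error(C_i, m_i))   # 5 + 8 + 9 = 22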

Page 3: Partitioning Algorithms: Basic Concepts

3

Example of Squared Error of a Cluster

[Figure: points P4, P5, P6 and the centroid m_j plotted on a 10 x 10 grid]

C_j = {P4, P5, P6}

P4 = (4, 6), P5 = (5, 5), P6 = (3, 4), m_j = (4, 5)

|d(P4, m_j)|^2 = (4-4)^2 + (6-5)^2 = 1
|d(P5, m_j)|^2 = 1
|d(P6, m_j)|^2 = (3-4)^2 + (4-5)^2 = 2

Error(C_j) = 1 + 1 + 2 = 4

Page 4: Partitioning Algorithms: Basic Concepts

4

Partitioning Algorithms: Basic Concepts

Global optimum: examining all possible partitions (k^n of them) is too expensive!

Heuristic methods: k-means and k-medoids
k-means (MacQueen’67): each cluster is represented by the center (mean) of the cluster
k-medoids (Kaufman & Rousseeuw’87): each cluster is represented by one of the objects (the medoid) in the cluster

Page 5: Partitioning Algorithms: Basic Concepts

5

K-means

Initialization: arbitrarily choose k objects as the initial cluster centers (centroids)
Iterate until no change:
  For each object O_i:
    Calculate the distances between O_i and the k centroids
    (Re)assign O_i to the cluster whose centroid is closest to O_i
  Update the cluster centroids based on the current assignment
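A minimal Python sketch of this loop (an illustration, not code from the slides; the two-dimensional tuple data format and the stopping test are assumptions):

import random

def dist2(p, q):
    # Squared Euclidean distance between two coordinate tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    # Component-wise mean of a non-empty list of tuples.
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def kmeans(points, k, max_iter=100):
    # Arbitrarily choose k objects as the initial centroids.
    centroids = random.sample(points, k)
    for _ in range(max_iter):
        # (Re)assign each object to the cluster whose centroid is closest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update the centroids based on the current assignment.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:      # no change: stop
            break
        centroids = new_centroids
    return centroids, clusters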

Page 6: Partitioning Algorithms: Basic Concepts

6

k-Means Clustering Method

[Figure: four 10 x 10 scatter plots showing successive k-means iterations; in each iteration the cluster means are recomputed, objects are relocated to the nearest mean, and new clusters replace the current clusters]

Page 7: Partitioning Algorithms: Basic Concepts

7

Example

For simplicity, use one-dimensional objects and k = 2.

Objects: 1, 2, 5, 6, 7

K-means:
Randomly select 5 and 6 as the initial centroids => two clusters {1, 2, 5} and {6, 7}
mean_C1 = 8/3, mean_C2 = 6.5 => {1, 2}, {5, 6, 7}
mean_C1 = 1.5, mean_C2 = 6 => no change
Aggregate dissimilarity = 0.5^2 + 0.5^2 + 1^2 + 0^2 + 1^2 = 2.5
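The same run can be replayed with a few lines of Python (an illustration, not from the slides; the fixed initial centroids 5 and 6 follow the example above):

points = [1, 2, 5, 6, 7]
centroids = [5.0, 6.0]                     # initial centroids from the example

while True:
    # Assign each point to its nearest centroid.
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Recompute the centroids; stop when they no longer change.
    new_centroids = [sum(c) / len(c) for c in clusters]
    if new_centroids == centroids:
        break
    centroids = new_centroids

error = sum(min((p - c) ** 2 for c in centroids) for p in points)
print(clusters, centroids, error)          # [[1, 2], [5, 6, 7]] [1.5, 6.0] 2.5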

Page 8: Partitioning Algorithms: Basic Concepts

8

Variations of k-Means Method

Aspects of variants of k-means:
Selection of the initial k centroids, e.g., choose the k farthest points
Dissimilarity calculations, e.g., use Manhattan distance
Strategies to calculate cluster means, e.g., update the means incrementally

Page 9: Partitioning Algorithms: Basic Concepts

9

Strengths of k-Means Method

Relatively efficient for large data sets: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t << n
Often terminates at a local optimum; the global optimum may be found using techniques such as deterministic annealing and genetic algorithms

Page 10: Partitioning Algorithms: Basic Concepts

10

Weaknesses of k-Means Method

Applicable only when a mean is defined; what about categorical data? => k-modes algorithm
Unable to handle noisy data and outliers => k-medoids algorithm
Need to specify k, the number of clusters, in advance => hierarchical algorithms, density-based algorithms

Page 11: Partitioning Algorithms: Basic Concepts

11

k-modes Algorithm

Handling categorical data: k-modes (Huang’98)

Replacing means of clusters with modes

Given the n records in a cluster, the mode is the record made up of the most frequent attribute values

age      income   student   credit_rating
<=30     high     no        fair
<=30     high     no        excellent
31…40    high     no        fair
>40      medium   no        fair
>40      low      yes       fair
>40      low      yes       excellent
31…40    low      yes       excellent
<=30     medium   no        fair
<=30     low      yes       fair
>40      medium   yes       fair
<=30     medium   yes       excellent
31…40    medium   no        excellent
31…40    high     yes       fair

In the example cluster, mode = (<=30, medium, yes, fair)

Using new dissimilarity measures to deal with categorical objects
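A small Python sketch (an illustration, not from the slides) of how the mode record of a cluster of categorical tuples can be computed:

from collections import Counter

def mode_record(records):
    # For each attribute (column), take the most frequent value in the cluster.
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

cluster = [
    ("<=30", "high", "no", "fair"),
    ("<=30", "high", "no", "excellent"),
    ("31…40", "high", "no", "fair"),
    (">40", "medium", "no", "fair"),
    (">40", "low", "yes", "fair"),
    (">40", "low", "yes", "excellent"),
    ("31…40", "low", "yes", "excellent"),
    ("<=30", "medium", "no", "fair"),
    ("<=30", "low", "yes", "fair"),
    (">40", "medium", "yes", "fair"),
    ("<=30", "medium", "yes", "excellent"),
    ("31…40", "medium", "no", "excellent"),
    ("31…40", "high", "yes", "fair"),
]
print(mode_record(cluster))   # ('<=30', 'medium', 'yes', 'fair')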

Page 12: Partitioning Algorithms: Basic Concepts

12

A Problem of K-means

Sensitive to outliers

Outlier: objects with extremely large (or small) values

May substantially distort the distribution of the data

[Figure: a cluster whose mean (+) is pulled away from the bulk of the objects by a single outlier]

Page 13: Partitioning Algorithms: Basic Concepts

13

k-Medoids Clustering Method

k-medoids: Find k representative objects, called medoids

PAM (Partitioning Around Medoids, 1987)
CLARA (Kaufmann & Rousseeuw, 1990)
CLARANS (Ng & Han, 1994): randomized sampling

[Figure: two 10 x 10 scatter plots contrasting a k-means result, whose cluster centers are means, with a k-medoids result, whose cluster centers are actual objects]

Page 14: Partitioning Algorithms: Basic Concepts

14

PAM (Partitioning Around Medoids) (1987)

PAM (Kaufman and Rousseeuw, 1987):
Arbitrarily choose k objects as the initial medoids
Until no change, do:
  (Re)assign each object to the cluster with the nearest medoid
  Improve the quality of the k medoids (randomly select a non-medoid object O_random and compute the total cost of swapping a medoid with O_random)

Works for small data sets (e.g., 100 objects in 5 clusters)
Not efficient for medium and large data sets

Page 15: Partitioning Algorithms: Basic Concepts

15

Swapping Cost

For each pair of a medoid m and a non-medoid object h, measure whether h is better than m as a medoid
Use the squared-error criterion:

$E = \sum_{i=1}^{k} \sum_{p \in C_i} d(p, m_i)^2$

Compute E_h - E_m
Negative: swapping brings a benefit; choose the swap with the minimum swapping cost
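The swapping cost can be evaluated directly from this definition; a minimal Python sketch (an illustration assuming Euclidean distance over coordinate tuples, not code from the slides):

import math

def squared_error(data, medoids):
    # E: each object contributes the squared distance to its nearest medoid.
    return sum(min(math.dist(p, m) for m in medoids) ** 2 for p in data)

def swapping_cost(data, medoids, m, h):
    # E_h - E_m: a negative value means replacing medoid m with object h improves the clustering.
    swapped = [h if x == m else x for x in medoids]
    return squared_error(data, swapped) - squared_error(data, medoids)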

Page 16: Partitioning Algorithms: Basic Concepts

16

Four Swapping Cases

When a medoid m is to be swapped with a non-medoid object h, check each of the other non-medoid objects j:

j is in the cluster of m => reassign j
  Case 1: j is closer to some other medoid k than to h; after swapping m and h, j relocates to the cluster represented by k
  Case 2: j is closer to h than to any other medoid k; after swapping m and h, j is in the cluster represented by h

j is in the cluster of some other medoid k, not m => compare k with h
  Case 3: j is closer to k than to h; after swapping m and h, j remains in the cluster represented by k
  Case 4: j is closer to h than to k; after swapping m and h, j moves to the cluster represented by h

Page 17: Partitioning Algorithms: Basic Concepts

17

PAM Clustering: total swapping cost TC_mh = Σ_j C_jmh

[Figure: four 10 x 10 scatter plots, one per swapping case, showing the object j, the medoids m and k, and the candidate h]

Case 1: C_jmh = d(j, k) - d(j, m) >= 0
Case 2: C_jmh = d(j, h) - d(j, m), may be positive or negative
Case 3: C_jmh = d(j, k) - d(j, k) = 0
Case 4: C_jmh = d(j, h) - d(j, k) < 0
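To make the four cases concrete, here is a small Python sketch (an illustration, not code from the slides) that accumulates C_jmh over all non-selected objects to obtain TC_mh:

import math

def tc_mh(points, medoids, m, h):
    # Total cost of swapping medoid m with non-medoid h: TC_mh = sum over j of C_jmh.
    others = [x for x in medoids if x != m]            # the remaining medoids ("k")
    total = 0.0
    for j in points:
        if j == h or j in medoids:
            continue
        d_m = math.dist(j, m)
        d_h = math.dist(j, h)
        d_k = min(math.dist(j, k) for k in others)     # nearest of the other medoids
        if d_m <= d_k:                                 # j currently belongs to m's cluster
            total += min(d_k, d_h) - d_m               # Case 1 (j moves to k) or Case 2 (j moves to h)
        else:                                          # j currently belongs to some other medoid k
            total += min(d_h - d_k, 0.0)               # Case 4 if h is now closer, else Case 3 (cost 0)
    return total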

Page 18: Partitioning Algorithms: Basic Concepts

18

Complexity of PAM

Arbitrarily choose k objects as the initial medoids                  O(1)
Until no change, do (cost per iteration)                             O((n-k)^2 * k)
  (Re)assign each object to the cluster with the nearest medoid      O((n-k) * k)
  Improve the quality of the k medoids                               O((n-k)^2 * k)
    For each pair of medoid m and non-medoid object h                (n-k) * k times
      Calculate the swapping cost TC_mh = Σ_j C_jmh                  O(n-k)

Page 19: Partitioning Algorithms: Basic Concepts

19

Strength and Weakness of PAM

PAM is more robust than k-means in the presence of outliers because a medoid is less influenced by outliers or other extreme values than a mean

PAM works efficiently for small data sets but does not scale well for large data sets

O(k(n-k)^2) for each iteration

where n is # of data objects, k is # of clusters

Can we find the medoids faster?

Page 20: Partitioning Algorithms: Basic Concepts

20

CLARA (Clustering Large Applications) (1990)

CLARA (Kaufmann and Rousseeuw, 1990)
Built into statistical analysis packages, such as S+
Draws multiple samples of the data set, applies PAM to each sample, and returns the best clustering as the output
Handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)
Efficiency and effectiveness depend on the sampling

Page 21: Partitioning Algorithms: Basic Concepts

21

CLARA - Algorithm

Set mincost to MAXIMUM;
Repeat q times                                   // draw q samples
    Create S by drawing s objects randomly from D;
    Generate the set of medoids M from S by applying the PAM algorithm;
    Compute cost(M, D);
    If cost(M, D) < mincost
        mincost = cost(M, D);
        bestset = M;
    Endif;
Endrepeat;
Return bestset;
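In Python, the same loop might look like the sketch below (an illustration, not code from the slides; pam_fn stands in for a PAM implementation, and the default sample size 40 + 2k is an assumption based on the usual CLARA recommendation):

import math
import random

def cost(medoids, data):
    # cost(M, D): total distance from each object in D to its nearest medoid.
    return sum(min(math.dist(p, m) for m in medoids) for p in data)

def clara(data, k, pam_fn, q=5, s=None):
    # pam_fn(sample, k) is assumed to return k medoids for the sample; PAM itself is not reproduced here.
    s = s or min(len(data), 40 + 2 * k)
    mincost, bestset = float("inf"), None
    for _ in range(q):                      # Repeat q times
        sample = random.sample(data, s)     # Create S by drawing s objects randomly from D
        medoids = pam_fn(sample, k)         # Generate the set of medoids M from S with PAM
        c = cost(medoids, data)             # Compute cost(M, D) over the full data set
        if c < mincost:
            mincost, bestset = c, medoids
    return bestset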

Page 22: Partitioning Algorithms: Basic Concepts

22

Complexity of CLARA

Set mincost to MAXIMUM;                                              O(1)
Repeat q times
    Create S by drawing s objects randomly from D;                   O(1)
    Generate the set of medoids M from S by applying PAM;            O((s-k)^2 * k)
    Compute cost(M, D);                                              O((n-k) * k)
    If cost(M, D) < mincost
        mincost = cost(M, D); bestset = M;                           O(1)
    Endif;
Endrepeat;
Return bestset;

Cost per sample: O((s-k)^2 * k + (n-k) * k)

Page 23: Partitioning Algorithms: Basic Concepts

23

Strengths and Weaknesses of CLARA

Strength:
Handles larger data sets than PAM (e.g., 1,000 objects in 10 clusters)

Weaknesses:
Efficiency depends on the sample size
A good clustering based on samples will not necessarily represent a good clustering of the whole data set if the samples are biased

Page 24: Partitioning Algorithms: Basic Concepts

24

CLARANS (“Randomized” CLARA) (1994)

CLARANS (A Clustering Algorithm based on Randomized Search) (Ng and Han’94)
CLARANS draws samples in the solution space dynamically
A solution is a set of k medoids
The solution space contains $\binom{n}{k}$ solutions in total
The solution space can be represented by a graph where every node is a potential solution, i.e., a set of k medoids
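For a sense of scale, Python's math.comb gives the size of this solution space (an illustrative computation, not from the slides):

import math

# Number of possible sets of k medoids chosen from n objects, here n = 1,000 and k = 10.
print(math.comb(1000, 10))   # about 2.6e23 candidate solutions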

Page 25: Partitioning Algorithms: Basic Concepts

25

Graph Abstraction

Every node is a potential solution (a set of k medoids)
Every node is associated with a squared error
Two nodes are adjacent if they differ by exactly one medoid
Every node has k(n-k) adjacent nodes

Example: the node {O_1, O_2, ..., O_k} has neighbors {O_{k+1}, O_2, ..., O_k}, ..., {O_n, O_2, ..., O_k}: replacing one medoid gives n-k neighbors, so each node has k(n-k) neighbors in total

Page 26: Partitioning Algorithms: Basic Concepts

26

Graph Abstraction: CLARANS

Start with a randomly selected node and randomly check at most m of its neighbors
If a better adjacent node is found, move to it and continue; otherwise, the current node is a local optimum; restart from another randomly selected node to search for another local optimum
When h local optima have been found, return the best result as the overall result

Page 27: Partitioning Algorithms: Basic Concepts

27

CLARANS

[Figure: the current node C is compared with no more than maxneighbor randomly chosen neighbors N; each descent ends in a local minimum, the search is restarted numlocal times, and the best node over all local minima is returned]

Page 28: Partitioning Algorithms: Basic Concepts

28

CLARANS - Algorithm

Set mincost to MAXIMUM;
For i = 1 to h do                                // find h local optima
    Randomly select a node as the current node C in the graph;
    J = 1;                                       // counter of neighbors checked
    Repeat
        Randomly select a neighbor N of C;
        If Cost(N, D) < Cost(C, D)
            Assign N as the current node C;
            J = 1;
        Else
            J++;
        Endif;
    Until J > m;
    Update mincost (and bestnode) with Cost(C, D) if applicable;
End for;
Return bestnode;
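A compact Python sketch of the same search (an illustration, not code from the slides; the squared-error cost and the tuple representation of objects are assumptions):

import math
import random

def cost(medoids, data):
    # Squared error of the clustering induced by this set of medoids.
    return sum(min(math.dist(p, m) for m in medoids) ** 2 for p in data)

def clarans(data, k, numlocal=2, maxneighbor=50):
    mincost, bestnode = float("inf"), None
    for _ in range(numlocal):                      # find numlocal local optima
        current = random.sample(data, k)           # a random node: one set of k medoids
        j = 1
        while j <= maxneighbor:
            # A neighbor differs from the current node in exactly one medoid.
            neighbor = list(current)
            neighbor[random.randrange(k)] = random.choice([p for p in data if p not in current])
            if cost(neighbor, data) < cost(current, data):
                current, j = neighbor, 1           # move to the better node, reset the counter
            else:
                j += 1
        if cost(current, data) < mincost:          # keep the best local optimum found so far
            mincost, bestnode = cost(current, data), current
    return bestnode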

Page 29: Partitioning Algorithms: Basic Concepts

29

Graph Abstraction (k-means, k-modes, k-medoids)

Each vertex is a set of k representative objects (means, modes, or medoids)
Each iteration produces a new set of k representative objects with lower overall dissimilarity
The iterations correspond to a hill-descent process in a landscape (graph) of vertices

Page 30: Partitioning Algorithms: Basic Concepts

30

Comparison with PAM

PAM searches for the minimum in the graph (landscape):
At each step, all adjacent vertices are examined; the one with the deepest descent is chosen as the next set of k medoids
The search continues until the minimum is reached
For large n and k (e.g., n = 1,000, k = 10), examining all k(n-k) adjacent vertices is time consuming; inefficient for large data sets

CLARANS vs PAM:
For large and medium data sets, CLARANS is clearly much more efficient than PAM
For small data sets, CLARANS outperforms PAM significantly

Page 31: Partitioning Algorithms: Basic Concepts

31

When n=80, CLARANS is 5 times faster than PAM, while the cluster quality is the same.

Page 32: Partitioning Algorithms: Basic Concepts

32

Comparison with CLARA

CLARANS vs CLARA:
CLARANS is always able to find clusterings of better quality than those found by CLARA, but CLARANS may use much more time than CLARA
When the time used is the same, CLARANS is still better than CLARA


Page 34: Partitioning Algorithms: Basic Concepts

34

Hierarchies of Co-expressed Genes and Coherent Patterns

The interpretation of co-expressed genes and coherent patterns mainly depends on domain knowledge.

Page 35: Partitioning Algorithms: Basic Concepts

35

A Subtle Situation

To split or not to split? That is the question.

[Figure: a group A that could either be kept as one cluster or split into two subgroups, A1 and A2]