
Page 1:

Data Mining

Lecture # 11: Clustering

Page 2:

Cluster Analysis

Page 3:

What is Cluster Analysis?

• Cluster: a collection of data objects

– Similar to one another within the same cluster

– Dissimilar to the objects in other clusters

• Cluster analysis

– Finding similarities between data according to the characteristics found in the data, and grouping similar data objects into clusters

Page 4:

What is Cluster Analysis?

• Clustering analysis is an important human activity

• Early in childhood, we learn how to distinguish between cats and dogs

• Unsupervised learning: no predefined classes

• Typical applications

– As a stand-alone tool to get insight into data distribution

– As a preprocessing step for other algorithms

Page 5:

Clustering

• Hard vs. Soft

– Hard: each object can belong to only a single cluster

– Soft: the same object can belong to multiple clusters

Page 6:

Clustering

• Flat vs. Hierarchical

– Flat: a single, unnested partition of the data

– Hierarchical: clusters form a tree

• Agglomerative

• Divisive

Page 7:

Clustering: Rich Applications and Multidisciplinary Efforts

• Pattern Recognition

• Spatial Data Analysis

– Create thematic maps in GIS by clustering feature spaces

– Detect spatial clusters, or use clustering for other spatial mining tasks

• Image Processing

• Economic Science (especially market research)

• WWW

– Document classification

– Cluster Weblog data to discover groups of similar access patterns

Page 8:

Quality: What Is Good Clustering?

• A good clustering method will produce high quality clusters with

– high intra-class similarity

(Similar to one another within the same cluster)

– low inter-class similarity

(Dissimilar to the objects in other clusters)

Page 9:

What is Cluster Analysis?

• Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups

(figure: two groups of points; inter-cluster distances are maximized, intra-cluster distances are minimized)

Page 10:

Similarity and Dissimilarity Between Objects

• Distances are normally used to measure the similarity or dissimilarity between two data objects

• Some popular ones include the Minkowski distance:

d(i,j) = \left( |x_{i1}-x_{j1}|^q + |x_{i2}-x_{j2}|^q + \cdots + |x_{ip}-x_{jp}|^q \right)^{1/q}

where i = (x_{i1}, x_{i2}, …, x_{ip}) and j = (x_{j1}, x_{j2}, …, x_{jp}) are two p-dimensional data objects, and q is a positive integer

• If q = 1, d is the Manhattan distance:

d(i,j) = |x_{i1}-x_{j1}| + |x_{i2}-x_{j2}| + \cdots + |x_{ip}-x_{jp}|

Page 11:

Similarity and Dissimilarity Between Objects (Cont.)

• If q = 2, d is the Euclidean distance:

d(i,j) = \sqrt{ |x_{i1}-x_{j1}|^2 + |x_{i2}-x_{j2}|^2 + \cdots + |x_{ip}-x_{jp}|^2 }

• Also, one can use weighted distance, parametric Pearson correlation, or other dissimilarity measures
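As a quick illustration (not from the slides; the helper name is a placeholder), both special cases follow from one Minkowski function:

    import numpy as np

    def minkowski(x, y, q):
        # d(i,j) = (sum_k |x_k - y_k|^q)^(1/q)
        return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** q) ** (1.0 / q)

    x, y = [1, 1], [5, 4]
    print(minkowski(x, y, 1))  # Manhattan: |1-5| + |1-4| = 7
    print(minkowski(x, y, 2))  # Euclidean: sqrt(16 + 9) = 5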

Page 12:

Major Clustering Approaches

• Partitioning approach:

– Construct various partitions and then evaluate them by some criterion, e.g., minimizing

the sum of square errors

– Typical methods: k-means, k-medoids, CLARANS

• Hierarchical approach:

– Create a hierarchical decomposition of the set of data (or objects) using some criterion

– Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON

• Density-based approach:

– Based on connectivity and density functions

– Typical methods: DBSCAN, OPTICS, DenClue

Page 13:

Clustering Approaches

1. Partitioning Methods

2. Hierarchical Methods

3. Density-Based Methods

Page 14:

Partitioning Algorithms: Basic Concept

• Given a k, find a partition of k clusters that optimizes the chosen

partitioning criterion

• k-means and k-medoids algorithms

• k-means (MacQueen’67): Each cluster is represented by the center of the cluster

• k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw'87): Each cluster is represented by one of the objects in the cluster

Page 15:

The K-Means Clustering Method

• Given k, the k-means algorithm is implemented in four steps:

1. Partition objects into k nonempty subsets

2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster)

3. Assign each object to the cluster with the nearest seed point

4. Go back to Step 2; stop when there are no new assignments
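A minimal NumPy sketch of these four steps (the names are illustrative, not from the lecture; it assumes no cluster becomes empty):

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), k, replace=False)]  # step 1: arbitrary seeds
        for _ in range(n_iter):
            # step 3: assign each object to the nearest seed point
            labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None, :], axis=2), axis=1)
            # step 2: recompute each centroid as the mean of its cluster
            new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new, centroids):  # step 4: stop when nothing moves
                break
            centroids = new
        return centroids, labels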

Pages 16–20: K-means Clustering

(figures only: successive iterations of k-means on a 2-D point set)

Page 21:

The K-Means Clustering Method

(figure: five scatter plots on 0–10 axes tracing the algorithm with K = 2)

K = 2. Arbitrarily choose K objects as the initial cluster centers. Assign each object to the most similar center. Update the cluster means, reassign the objects, and update the means again; repeat until assignments no longer change.

Page 22:

Example

• Run K-means clustering with 3 clusters (initial centroids: 3, 16, 25) for at least 2 iterations on the points 2, 3, 4, 7, 9, 10, 11, 12, 16, 18, 19, 23, 24, 25, 30 (as listed on the next slide)

Page 23:

Example

• Centroids:

3 → {2, 3, 4, 7, 9}, new centroid: 5

16 → {10, 11, 12, 16, 18, 19}, new centroid: 14.33

25 → {23, 24, 25, 30}, new centroid: 25.5

Page 24:

Example

• Centroids:

5 → {2, 3, 4, 7, 9}, new centroid: 5

14.33 → {10, 11, 12, 16, 18, 19}, new centroid: 14.33

25.5 → {23, 24, 25, 30}, new centroid: 25.5

(no assignments change, so the algorithm has converged)
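A small script (illustrative; the data set is taken from the memberships above) reproduces both iterations:

    import numpy as np

    X = np.array([2, 3, 4, 7, 9, 10, 11, 12, 16, 18, 19, 23, 24, 25, 30])
    c = np.array([3.0, 16.0, 25.0])
    for it in range(2):
        labels = np.argmin(np.abs(X[:, None] - c[None, :]), axis=1)  # nearest centroid
        c = np.array([X[labels == j].mean() for j in range(3)])      # recompute means
        print(it + 1, np.round(c, 2))  # [5. 14.33 25.5] after iteration 1, unchanged after 2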

Page 25:

In-class Practice

• Run K-means clustering with 3 clusters (initial centroids: 3, 12, 19) for at least 2 iterations

Page 26:

Example

• Problem: Suppose we have 4 types of medicines and each has two attributes (weight and pH index). Our goal is to group these objects into K = 2 groups of medicine.

Medicine   Weight   pH-Index
A          1        1
B          2        1
C          4        3
D          5        4

(figure: the four points A, B, C, D plotted in the weight/pH plane)

Page 27:

Example

• Step 1: Use initial seed points for partitioning: c_1 = A = (1, 1), c_2 = B = (2, 1)

Euclidean distance, e.g. for object D = (5, 4):

d(D, c_1) = \sqrt{(5-1)^2 + (4-1)^2} = 5

d(D, c_2) = \sqrt{(5-2)^2 + (4-1)^2} = 4.24

Assign each object to the cluster with the nearest seed point.

Page 28:

Example

• Step 2: Compute new centroids of the current partition. Knowing the members of each cluster (cluster 1 = {A}, cluster 2 = {B, C, D}), we compute the new centroid of each group based on these new memberships:

c_1 = (1, 1)

c_2 = \left( \frac{2+4+5}{3}, \frac{1+3+4}{3} \right) = \left( \frac{11}{3}, \frac{8}{3} \right)

Page 29:

Example

• Step 2: Renew membership based on new centroids

Compute the distance of all objects to the new centroids

Assign the membership to objects

Page 30:

Example

• Step 3: Repeat the first two steps until convergence. Knowing the members of each cluster (now cluster 1 = {A, B}, cluster 2 = {C, D}), we compute the new centroid of each group based on these new memberships:

c_1 = \left( \frac{1+2}{2}, \frac{1+1}{2} \right) = (1.5, 1)

c_2 = \left( \frac{4+5}{2}, \frac{3+4}{2} \right) = (4.5, 3.5)

Page 31:

Example

• Step 3: Repeat the first two steps until convergence. Compute the distance of all objects to the new centroids.

Stop: there are no new assignments, and membership in each cluster no longer changes.

Page 32:

Exercise

For the medicine data set, use K-means with the Manhattan distance metric for clustering analysis by setting K = 2 and initialising the seeds as C1 = A and C2 = C. Answer three questions as follows:

1. How many steps are required for convergence?

2. What are the memberships of the two clusters after convergence?

3. What are the centroids of the two clusters after convergence?

Medicine   Weight   pH-Index
A          1        1
B          2        1
C          4        3
D          5        4

Page 33:

Hierarchical Clustering

• Two main types of hierarchical clustering:

– Agglomerative:
  • Start with the points as individual clusters
  • At each step, merge the closest pair of clusters until only one cluster (or k clusters) is left
  (Matlab: Statistics Toolbox: clusterdata, which performs all these steps: pdist, linkage, cluster)

– Divisive:
  • Start with one, all-inclusive cluster
  • At each step, split a cluster until each cluster contains a single point (or there are k clusters)

• Traditional hierarchical algorithms use a similarity or distance matrix
– Merge or split one cluster at a time
– Image segmentation mostly uses simultaneous merge/split

Page 34:

Hierarchical clustering

• Agglomerative (Bottom-up)

– Compute all pairwise pattern-pattern similarity coefficients

– Place each of the n patterns into a class of its own

– Merge the two most similar clusters into one
  • Replace the two clusters by the new cluster
  • Re-compute inter-cluster similarity scores w.r.t. the new cluster

– Repeat the above step until there are k clusters left (k can be 1)

Page 35:

Hierarchical clustering

• Agglomerative (Bottom up)

Page 36:

Hierarchical clustering

• Agglomerative (Bottom-up)

• 1st iteration (figure: merge 1)

Page 37:

Hierarchical clustering

• Agglomerative (Bottom-up)

• 2nd iteration (figure: merges 1–2)

Page 38:

Hierarchical clustering

• Agglomerative (Bottom-up)

• 3rd iteration (figure: merges 1–3)

Page 39:

Hierarchical clustering

• Agglomerative (Bottom-up)

• 4th iteration (figure: merges 1–4)

Page 40:

Hierarchical clustering

• Agglomerative (Bottom-up)

• 5th iteration (figure: merges 1–5)

Page 41:

Hierarchical clustering

• Agglomerative (Bottom-up)

• Finally, k clusters are left (figure: merges 1–9)

Page 42:

Hierarchical clustering

• Divisive (Top-down)

– Start at the top with all patterns in one cluster

– The cluster is split using a flat clustering algorithm

– This procedure is applied recursively until each pattern is in its own singleton cluster

Page 43:

Hierarchical clustering

• Divisive (Top-down)

Page 44:

Hierarchical Clustering: The Algorithm

• Hierarchical clustering takes as input a set of points
• It creates a tree in which the points are leaves and the internal nodes reveal the similarity structure of the points.
– The tree is often called a "dendrogram."
• The method is summarized below:

    Place all points into their own clusters
    While there is more than one cluster, do
        Merge the closest pair of clusters

The behavior of the algorithm depends on how "closest pair of clusters" is defined
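For reference, a SciPy analogue of the Matlab pdist/linkage/cluster pipeline mentioned earlier (a sketch; the six points are made up):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    X = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [9, 9], [9, 8]])
    Z = linkage(pdist(X), method='single')           # repeatedly merge the closest pair
    labels = fcluster(Z, t=3, criterion='maxclust')  # cut the dendrogram into 3 clusters
    print(labels)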

Page 45:

Hierarchical Clustering: Example

(figure: six points A–F in the plane and the resulting dendrogram over leaves A, B, E, F, C, D)

This example illustrates single-link clustering in Euclidean space on 6 points.

Page 46:

Hierarchical Clustering

• Produces a set of nested clusters organized as a hierarchical tree

• Can be visualized as a dendrogram

– A tree like diagram that records the sequences of merges or splits

(figure: six points in the plane with merge order 1–5 marked, and the corresponding dendrogram over leaves 1 3 2 5 4 6 with merge heights up to about 0.2)

Page 47:

Strengths of Hierarchical Clustering

• Do not have to assume any particular number of clusters

– Any desired number of clusters can be obtained by 'cutting' the dendrogram at the proper level

Page 48:

Hierarchical Clustering: Merging Clusters

Single Link: Distance between two clusters is the distance between the closest points. Also called "neighbor joining."

Average Link: Distance between clusters is the average pairwise distance between points in the two clusters. (Using the distance between the cluster centroids instead is centroid linkage.)

Complete Link: Distance between clusters is the distance between the farthest pair of points.

Page 49:

How to Define Inter-Cluster Similarity

(figure: two clusters of points p1–p5 and their proximity matrix)

• MIN

• MAX

• Group Average

• Distance Between Centroids

• Other methods driven by an objective function
– Ward's Method uses squared error

Pages 50–53: How to Define Inter-Cluster Similarity

(repeats of the previous slide; the figure highlights MIN, MAX, Group Average, and Distance Between Centroids in turn)

Page 54:

An example

Let us consider a gene measured in a set of 5 experiments: A,B,C,D and E. The values measured in the 5 experiments are:

A=100 B=200 C=500 D=900 E=1100

We will construct the hierarchical clustering of these values using Euclidean distance, centroid linkage and an agglomerative approach.


Page 55:

An example

SOLUTION:

• The closest two values are 100 and 200 => the centroid of these two values is 150.

• Now we are clustering the values: 150, 500, 900, 1100.

• The closest two values are 900 and 1100 => the centroid of these two values is 1000.

• The remaining values to be joined are: 150, 500, 1000.

• The closest two values are 150 and 500 => the centroid of these two values is 325.

• Finally, the two resulting subtrees are joined in the root of the tree.
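A quick numeric check of this solution (a sketch). Note that the slide averages the two merged centroids directly (150 and 500 give 325), which matches SciPy's 'median' (WPGMC) linkage rather than its mass-weighted 'centroid' (UPGMC) linkage:

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    vals = np.array([[100.], [200.], [500.], [900.], [1100.]])  # A, B, C, D, E
    Z = linkage(vals, method='median')
    print(Z[:, 2])  # merge distances: [100. 200. 350. 675.]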


Page 56:

An example: Two hierarchical clusterings of the expression values of a single gene measured in 5 experiments.

(figure: two dendrograms over the leaves A=100, B=200, C=500, D=900, E=1100, drawn with different leaf orders)

The dendrograms are identical: both diagrams show that
• A is most similar to B
• C is most similar to the group (A, B)
• D is most similar to E

In the left dendrogram A and E are plotted far from each other; in the right dendrogram A and E are immediate neighbors.

THE PROXIMITY IN A HIERARCHICAL CLUSTERING DOES NOT NECESSARILY CORRESPOND TO SIMILARITY

Page 57:

What Is the Problem of the K-Means Method?

• The k-means algorithm is sensitive to outliers!

– An object with an extremely large value may substantially distort the distribution of the data.

• K-Medoids: Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used: the most centrally located object in a cluster.

(figure: two 2-D plots on 0–10 axes contrasting the mean-based center with the medoid)

Page 58:

Limitations of K-means: Differing Sizes

Original Points K-means (3 Clusters)

Page 59:

Limitations of K-means: Differing Density

Original Points K-means (3 Clusters)

Page 60:

Limitations of K-means: Non-globular Shapes

Original Points K-means (2 Clusters)

Page 61:

The K-Medoids Clustering Method

• Find representative objects, called medoids, in clusters

• Medoids are located in the center of the clusters.

– Given data points, how to find the medoid?

(figure: a 2-D point set on 0–10 axes with its medoid marked)

Page 62:

A Typical K-Medoids Algorithm (PAM)

(figure: 2-D scatter plots on 0–10 axes tracing PAM with K = 2)

Total Cost = 20. Arbitrarily choose k objects as the initial medoids, and assign each remaining object to the nearest medoid. Randomly select a non-medoid object O_random and compute the total cost of swapping (here, Total Cost = 26). Swap O and O_random only if the quality is improved. Loop until no change.

Page 63:

The K-Medoid Clustering Method

• K-Medoids Clustering: Find representative objects (medoids) in clusters

– PAM (Partitioning Around Medoids, Kaufmann & Rousseeuw 1987)

• Starts from an initial set of medoids and iteratively replaces one of the medoids by one of the non-medoids if it improves the total distance of the resulting clustering (a sketch of this swap loop follows below)

• PAM works effectively for small data sets, but does not scale well for

large data sets (due to the computational complexity)

• Efficiency improvement on PAM

– CLARA (Kaufmann & Rousseeuw, 1990): PAM on samples

– CLARANS (Ng & Han, 1994): Randomized re-sampling
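A compact sketch of PAM's swap loop (illustrative names; Manhattan distance, to match the worked example that follows; note an exhaustive swap search may end up with a cheaper pair than the single swap tested on the next slides):

    import numpy as np

    def pam(X, medoid_idx):
        # Swap-based PAM on a small (n, d) data set, starting from given medoid indices.
        medoid_idx = list(medoid_idx)
        D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)  # pairwise Manhattan distances
        cost = lambda meds: D[:, meds].min(axis=1).sum()       # total distance to nearest medoid
        best = cost(medoid_idx)
        improved = True
        while improved:
            improved = False
            for mi in range(len(medoid_idx)):
                for o in range(len(X)):            # candidate non-medoid to swap in
                    if o in medoid_idx:
                        continue
                    trial = medoid_idx.copy()
                    trial[mi] = o                  # swap one medoid for the non-medoid
                    if cost(trial) < best:         # keep the swap only if quality improves
                        best, medoid_idx, improved = cost(trial), trial, True
        return medoid_idx, best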


Page 64:

K-medoids example

Point   X   Y
X1      2   6
X2      3   4
X3      3   8
X4      4   7
X5      6   2
X6      6   4
X7      7   3
X8      7   4
X9      8   5
X10     7   6

Page 65:

• Initialize k medoids

• Let us assume c1 = (3,4) and c2 = (7,4)

• Calculate distances so as to associate each data object with its nearest medoid.

Page 66:

Distances to c1 = (3,4):

i    c1      Data object (Xi)   Cost (Manhattan distance)
1    (3,4)   (2,6)              3
3    (3,4)   (3,8)              4
4    (3,4)   (4,7)              4
5    (3,4)   (6,2)              5
6    (3,4)   (6,4)              3
7    (3,4)   (7,3)              5
9    (3,4)   (8,5)              6
10   (3,4)   (7,6)              6

Distances to c2 = (7,4):

i    c2      Data object (Xi)   Cost (Manhattan distance)
1    (7,4)   (2,6)              7
3    (7,4)   (3,8)              8
4    (7,4)   (4,7)              6
5    (7,4)   (6,2)              3
6    (7,4)   (6,4)              1
7    (7,4)   (7,3)              1
9    (7,4)   (8,5)              2
10   (7,4)   (7,6)              2

Cluster1 = {(3,4), (2,6), (3,8), (4,7)}
Cluster2 = {(7,4), (6,2), (6,4), (7,3), (8,5), (7,6)}

Page 67:

• Select one of the non-medoids O′. Let us assume O′ = (7,3).

• Now the medoids are c1 = (3,4) and O′ = (7,3).

Distances to c1 = (3,4):

i    c1      Data object (Xi)   Cost (Manhattan distance)
1    (3,4)   (2,6)              3
3    (3,4)   (3,8)              4
4    (3,4)   (4,7)              4
5    (3,4)   (6,2)              5
6    (3,4)   (6,4)              3
8    (3,4)   (7,4)              4
9    (3,4)   (8,5)              6
10   (3,4)   (7,6)              6

Distances to O′ = (7,3):

i    O′      Data object (Xi)   Cost (Manhattan distance)
1    (7,3)   (2,6)              8
3    (7,3)   (3,8)              9
4    (7,3)   (4,7)              7
5    (7,3)   (6,2)              2
6    (7,3)   (6,4)              2
8    (7,3)   (7,4)              1
9    (7,3)   (8,5)              3
10   (7,3)   (7,6)              3

• The new total cost is 22 versus 20 before, so the swap cost is S = 22 − 20 = 2.

• Do not change the medoids, since S > 0.
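This swap test can be verified numerically (a sketch recomputing both totals):

    import numpy as np

    X = np.array([[2,6],[3,4],[3,8],[4,7],[6,2],[6,4],[7,3],[7,4],[8,5],[7,6]])

    def total_cost(meds):
        # Manhattan distance of every point to its nearest medoid
        D = np.abs(X[:, None] - np.array(meds)[None, :]).sum(axis=2)
        return D.min(axis=1).sum()

    before = total_cost([[3, 4], [7, 4]])   # 20
    after  = total_cost([[3, 4], [7, 3]])   # 22
    print(after - before)                   # S = 2 > 0, so the swap is rejected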

Page 68:

Fuzzy C-means Clustering

• Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters.

• This method (developed by Dunn in 1973 and improved by Bezdek in 1981) is frequently used in pattern recognition.

Pages 69–76: Fuzzy C-means Clustering

(figure/derivation slides; only the title survives, plus a linked tutorial: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html)

Page 77:

Fuzzy C-means Clustering

• For example: we have initial centroids 3 & 11 (with m = 2)

• With fuzzifier m, the membership of point x_i in cluster j is

u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{|x_i - c_j|}{|x_i - c_k|} \right)^{2/(m-1)}}

so for m = 2 each ratio is squared.

• For node 2 (1st element):

U_{11} = \frac{1}{\left(\frac{|2-3|}{|2-3|}\right)^2 + \left(\frac{|2-3|}{|2-11|}\right)^2} = \frac{1}{1 + 1/81} = \frac{81}{82} = 98.78\%

(the membership of the first node to the first cluster)

U_{12} = \frac{1}{\left(\frac{|2-11|}{|2-3|}\right)^2 + \left(\frac{|2-11|}{|2-11|}\right)^2} = \frac{1}{81 + 1} = \frac{1}{82} = 1.22\%

(the membership of the first node to the second cluster)

Page 78:

Fuzzy C-means Clustering

• For example: we have initial centroids 3 & 11 (with m = 2)

• For node 3 (2nd element):

U_{21} = 100\% (the membership of the second node to the first cluster; node 3 coincides with centroid 3)

U_{22} = 0\% (the membership of the second node to the second cluster)

Page 79:

Fuzzy C-means Clustering

• For example: we have initial centroids 3 & 11 (with m = 2)

• For node 4 (3rd element):

U_{31} = \frac{1}{\left(\frac{|4-3|}{|4-3|}\right)^2 + \left(\frac{|4-3|}{|4-11|}\right)^2} = \frac{1}{1 + 1/49} = \frac{49}{50} = 98\%

(the membership of the third node to the first cluster)

U_{32} = \frac{1}{\left(\frac{|4-11|}{|4-3|}\right)^2 + \left(\frac{|4-11|}{|4-11|}\right)^2} = \frac{1}{49 + 1} = \frac{1}{50} = 2\%

(the membership of the third node to the second cluster)

Page 80:

Fuzzy C-means Clustering

• For example: we have initial centroids 3 & 11 (with m = 2)

• For node 7 (4th element):

U_{41} = \frac{1}{\left(\frac{|7-3|}{|7-3|}\right)^2 + \left(\frac{|7-3|}{|7-11|}\right)^2} = \frac{1}{1 + 1} = 50\%

(the membership of the fourth node to the first cluster)

U_{42} = \frac{1}{\left(\frac{|7-11|}{|7-3|}\right)^2 + \left(\frac{|7-11|}{|7-11|}\right)^2} = \frac{1}{1 + 1} = 50\%

(the membership of the fourth node to the second cluster)

Page 81:

Fuzzy C-means Clustering

• The centroid update weights each point by its squared membership (m = 2):

C_1 = \frac{(98.78\%)^2 \cdot 2 + (100\%)^2 \cdot 3 + (98\%)^2 \cdot 4 + (50\%)^2 \cdot 7 + \ldots}{(98.78\%)^2 + (100\%)^2 + (98\%)^2 + (50\%)^2 + \ldots}
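A minimal sketch of one FCM pass on this 1-D example (m = 2, initial centroids 3 and 11; array names are illustrative, and only the four nodes shown on the previous slides are used, although the slide's "…" suggests there were more):

    import numpy as np

    X = np.array([2.0, 3.0, 4.0, 7.0])
    C = np.array([3.0, 11.0])
    m = 2
    d = np.abs(X[:, None] - C[None, :]) + 1e-12              # distances to each centroid
    U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    print(np.round(U * 100, 2))   # [[98.78 1.22] [100. 0.] [98. 2.] [50. 50.]]
    C_new = (U.T ** m @ X) / np.sum(U.T ** m, axis=1)        # membership-weighted centroids
    print(np.round(C_new, 2))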

Page 82:

FCM Soft Clustering

Page 83:

Gaussian Mixture Model

Pages 84–86: Gaussian Mixture Model

(figures: two Gaussian component densities, panels "Component 1" / "Component 2" / "Component Models", and the resulting "Mixture Model" density p(x))

Page 87:

The EM Algorithm

• Dempster, Laird, and Rubin

– general framework for likelihood-based parameter estimation with missing data

• start with initial guesses of parameters

• E step: estimate memberships given params

• M step: estimate params given memberships

• Repeat until convergence

– converges to a (local) maximum of likelihood

– E step and M step are often computationally simple

– generalizes to maximum a posteriori (with priors)
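A tiny 1-D, two-component illustration of these E and M steps (a sketch; the data and initial guesses are made up, not taken from the anemia example that follows):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 100)])
    w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 6.0]), np.array([1.0, 1.0])

    for _ in range(50):
        # E step: posterior membership of each point in each component
        p = w * norm.pdf(x[:, None], mu, sd)
        r = p / p.sum(axis=1, keepdims=True)
        # M step: re-estimate weights, means, and standard deviations
        n = r.sum(axis=0)
        w, mu = n / len(x), (r * x[:, None]).sum(axis=0) / n
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)

    print(np.round(w, 2), np.round(mu, 2))  # close to the true (2/3, 1/3) and (0, 5)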

Pages 88–94: ANEMIA PATIENTS AND CONTROLS

(figures: scatter plots of Red Blood Cell Hemoglobin Concentration versus Red Blood Cell Volume, showing the fitted mixture at EM iterations 1, 3, 5, 10, 15, and 25)

Page 95:

Gaussian Mixture

• The Gaussian mixture architecture estimates probability density functions (PDFs) for each class, and then performs classification based on Bayes' rule:

P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}

where P(X | C_i) is the PDF of class i, evaluated at X, P(C_i) is the prior probability for class i, and P(X) is the overall PDF, evaluated at X.

Page 96:

Gaussian Mixture

• Unlike the unimodal Gaussian architecture, which assumes P(X | C_j) to be in the form of a single Gaussian, the Gaussian mixture model estimates P(X | C_j) as a weighted average of multiple Gaussians:

P(X \mid C_j) = \sum_{k=1}^{N_c} w_k G_k

where w_k is the weight of the k-th Gaussian G_k and the weights sum to one. One such PDF model is produced for each class.

Page 97:

Gaussian Mixture

• Each Gaussian component is defined as:

G_k = \frac{1}{(2\pi)^{n/2}\, |V_k|^{1/2}}\; e^{-\frac{1}{2}(X - M_k)^T V_k^{-1} (X - M_k)}

where M_k is the mean of the Gaussian and V_k is the covariance matrix of the Gaussian.

Page 98:

Gaussian Mixture

• Free parameters of the Gaussian mixture model consist of the means and covariance matrices of the Gaussian components and the weights indicating the contribution of each Gaussian to the approximation of P(X | Cj).

Page 99:

Composition of Gaussian Mixture

(figure: Class 1 modeled as five weighted components G_1,w_1 … G_5,w_5)

P(C_j \mid X) = \frac{P(X \mid C_j)\, P(C_j)}{P(X)}, \qquad P(X \mid C_j) = \sum_{k=1}^{N_c} w_k G_k

p(X \mid G_i) = \frac{1}{(2\pi)^{d/2}\, |V_i|^{1/2}}\; e^{-\frac{1}{2}(X - \mu_i)^T V_i^{-1} (X - \mu_i)}

Variables: \mu_i, V_i, w_k

We use the EM (estimate-maximize) algorithm to approximate these variables.

Page 100:

Gaussian Mixture

• These parameters are tuned using a complex iterative procedure called the estimate-maximize (EM) algorithm, which aims at maximizing the likelihood of the training set generated by the estimated PDF.

Page 101:

Gaussian Mixture Training Flow Chart (1)

Initialize the initial Gaussian means μ_i, i = 1, …, G using the k-means clustering algorithm.
Initialize the covariance matrices V_i to the distance to the nearest cluster.
Initialize the weights π_i = 1/G so that all Gaussians are equally likely.

Present each pattern X of the training set and model each of the classes K as a weighted sum of Gaussians:

p(X \mid s) = \sum_{i=1}^{G} \pi_i \, p(X \mid G_i)

where G is the number of Gaussians, the π_i's are the weights, and

p(X \mid G_i) = \frac{1}{(2\pi)^{d/2}\, |V_i|^{1/2}}\; e^{-\frac{1}{2}(X - \mu_i)^T V_i^{-1} (X - \mu_i)}

where V_i is the covariance matrix.

Page 102:

Gaussian Mixture Training Flow Chart (2)

Compute (E-step for EM):

\tau_{ip} = P(G_i \mid X_p, C_i) = \frac{\pi_i \, p(X_p \mid G_i, C_i)}{p(X_p)} = \frac{\pi_i \, p(X_p \mid G_i, C_i)}{\sum_{j=1}^{G} \pi_j \, p(X_p \mid G_j, C_j)}

Iteratively update the weights, means and covariances (M-step for EM):

\pi_i(t+1) = \frac{1}{N_c} \sum_{p=1}^{N_c} \tau_{ip}(t)

\mu_i(t+1) = \frac{1}{N_c \, \pi_i(t+1)} \sum_{p=1}^{N_c} \tau_{ip}(t)\, X_p

V_i(t+1) = \frac{1}{N_c \, \pi_i(t+1)} \sum_{p=1}^{N_c} \tau_{ip}(t)\, (X_p - \mu_i(t+1))(X_p - \mu_i(t+1))^T

Note: this is the PE we did in class.

Page 103:

Gaussian Mixture Training Flow Chart (3)

Recompute τ_ip using the new weights, means and covariances. Stop training if

\left| \tau_{ip}(t+1) - \tau_{ip}(t) \right| \le \text{threshold}

or the number of epochs reaches the specified value. Otherwise, continue the iterative updates.

Page 104:

Gaussian Mixture Test Flow Chart

Present each input pattern X and compute the confidence for each class j:

P(C_j)\, P(X \mid C_j)

where P(C_j) = \frac{N_{C_j}}{N} is the prior probability of class C_j, estimated by counting the number of training patterns. Classify pattern X as the class with the highest confidence.

Gaussian Mixture End

Page 105:

Acknowledgements

Material in these slides has been taken from the following resources:

• Introduction to Machine Learning, Alpaydın

• "Pattern Classification" by Duda et al., John Wiley & Sons.

• Read GMM from "Automated Detection of Exudates in Colored Retinal Images for Diagnosis of Diabetic Retinopathy", Applied Optics, Vol. 51 No. 20, 4858-4866, 2012.