Data Mining Techniques Clustering

Purpose

• In clustering analysis, there is no pre-classified data

• Instead, clustering analysis is a process where a set of objects is partitioned into several clusters

• All members in one cluster are similar to each other and different from the members of other clusters, according to some similarity metric (e.g., the opposite of distance between objects)
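As a minimal sketch of "similarity as the opposite of distance", the snippet below uses Euclidean distance and simply negates it. The customer tuples and the function names are hypothetical, chosen only for illustration; other distance measures would work the same way.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """One possible similarity: the negated distance (larger means more alike)."""
    return -euclidean(a, b)

# Hypothetical customers described by (income, age).
c1, c2, c3 = (30, 25), (33, 29), (90, 60)

print(euclidean(c1, c2))                        # 5.0
print(similarity(c1, c2) > similarity(c1, c3))  # True: c1 is more like c2 than c3
```

Under this measure, c1 and c2 would be candidates for the same cluster, while c3 would fall elsewhere.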

Cluster Analysis

[Figure: customers (objects) plotted on two variables, X (Income) and Y (Age), with a cluster marked]

Data Matrix (n objects × p variables)

Dissimilarity Matrix (n × n)
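The relationship between the two matrices can be sketched as follows: each row of the n × p data matrix is an object, and entry (i, j) of the n × n dissimilarity matrix is the distance between objects i and j. Euclidean distance and the sample data are assumptions made for this illustration.

```python
import math

def dissimilarity_matrix(data):
    """Build the n x n matrix of pairwise Euclidean distances
    from an n x p data matrix (one row per object)."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The matrix is symmetric with a zero diagonal,
            # so each pair is computed once and mirrored.
            d[i][j] = d[j][i] = math.dist(data[i], data[j])
    return d

# Hypothetical data matrix with n = 3 objects and p = 2 variables.
data = [(0, 0), (3, 4), (6, 8)]
D = dissimilarity_matrix(data)
for row in D:
    print(row)
# [0.0, 5.0, 10.0]
# [5.0, 0.0, 5.0]
# [10.0, 5.0, 0.0]
```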

Attribute Types Involved in Cluster Analysis

• Interval Variables
– An interval variable contains continuous measurements (e.g., height, weight, temperature, cost, etc.) which follow a linear scale
– It is essential that intervals keep the same importance throughout the scale
• Nominal Variables
– A nominal variable takes on more than two states. For example, the eye color of a person can be blue, brown, green or grey
– These states may be coded as 1, 2, ..., M; however, their order and the interval between any two states do not have any meaning

• Ordinal Variables
– An ordinal variable takes on more than two states. For example, you may ask someone to rate his/her appreciation of some paintings using the following categories: 1=detest, 2=dislike, 3=indifferent, 4=like and 5=admire
– The states of an ordinal variable are ordered in a meaningful sequence. However, the intervals between consecutive states are not necessarily equal
• Binary Variables
– Binary variables have only two possible states. For example, the gender of a person is either female or male
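The attribute type determines how dissimilarity can reasonably be computed. The sketch below shows two commonly used conventions, which are assumptions here and not necessarily the formulas from the original slides: simple matching for nominal (and binary) values, and rank scaling to [0, 1] for ordinal values. The function names are hypothetical.

```python
def nominal_dissim(a, b):
    """Simple matching: states either agree (0) or differ (1).
    Order and spacing of the state codes carry no meaning."""
    return 0.0 if a == b else 1.0

def ordinal_dissim(rank_a, rank_b, num_states):
    """Map ranks 1..M onto [0, 1], then take the absolute difference.
    This treats consecutive states as equally spaced, which is an
    approximation for ordinal data."""
    za = (rank_a - 1) / (num_states - 1)
    zb = (rank_b - 1) / (num_states - 1)
    return abs(za - zb)

print(nominal_dissim("blue", "brown"))   # 1.0
print(nominal_dissim("female", "female"))  # 0.0 (binary handled the same way)
print(ordinal_dissim(2, 4, 5))           # 0.5 (2=dislike vs 4=like on the 5-state scale)
```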

Dissimilarity (Distance) Measure

Categorization of Clustering Methods

• Exclusive vs. Non-Exclusive (Overlapping)
• Hierarchical Methods vs. Partitioning Methods
• Hierarchical Methods
– Single Link Method
– Complete Link Method
• Partitioning Methods
– Kohonen Self-Organizing Feature Maps
– K-Means Methods
– K-Medoids Methods (PAM, CLARA, CLARANS)
– Density-Based Methods
– …

Hierarchical Methods

Dissimilarity Matrix (5 × 5)
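A minimal sketch of agglomerative hierarchical clustering on a 5 × 5 dissimilarity matrix follows. The matrix values are hypothetical (the original slide's matrix is not recoverable), and the function name is an assumption. Single link takes the distance between two clusters to be the smallest pairwise distance; complete link takes the largest.

```python
def agglomerative(D, num_clusters, linkage=min):
    """Repeatedly merge the two closest clusters until num_clusters remain.
    linkage=min gives single link; linkage=max gives complete link."""
    clusters = [[i] for i in range(len(D))]  # start with one object per cluster
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster-to-cluster distance under the chosen linkage.
                d = linkage(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical symmetric 5 x 5 dissimilarity matrix (objects indexed 0..4).
D = [
    [0, 3, 7, 12, 18],
    [3, 0, 4, 9, 15],
    [7, 4, 0, 5, 11],
    [12, 9, 5, 0, 6],
    [18, 15, 11, 6, 0],
]

print(agglomerative(D, 2, linkage=min))  # [[0, 1, 2, 3], [4]]
print(agglomerative(D, 2, linkage=max))  # [[0, 1], [2, 3, 4]]
```

On this matrix, single link chains the first four objects together, while complete link produces two more compact groups.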

K-Means Methods

Sensitive to outliers!
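The outlier sensitivity can be illustrated with a small sketch of the standard K-Means iteration (assign each object to the nearest centroid, then recompute each centroid as the mean of its members). The data, the initial centroids and the function name are hypothetical; the original slides' worked example is not recoverable.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and mean-update steps until the centroids stop moving."""
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            groups[nearest].append(p)
        # New centroid = coordinate-wise mean; an empty cluster keeps its old centroid.
        new = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[k]
            for k, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, groups

# Two well-separated hypothetical groups of four points each.
points = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
start = [(1, 1), (9, 9)]

clean, _ = k_means(points, start)
print(clean)    # [(1.5, 1.5), (8.5, 8.5)]

# Adding a single far-away object changes the result completely.
skewed, _ = k_means(points + [(50, 50)], start)
print(skewed)   # [(5.0, 5.0), (50.0, 50.0)]
```

Because each centroid is a mean, the outlier first drags its centroid away from the second group and then ends up with a cluster of its own, leaving both natural groups merged.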

Exercise 7

Use Single Link, Complete Link and K-Means to cluster the following data:

Object   X    Y
1        22   60
2        40   25
3        60   30
4        64   66
5        80   30
6        82   55

Number of clusters = 2