
Intelligent Database Systems Lab

National Yunlin University of Science and Technology (國立雲林科技大學)
Department of Information Management

Advisor: Dr. Hsu
Graduate: Sheng-Hsuan Wang
Authors: Sitao Wu, Tommy W.S. Chow

Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density

Pattern Recognition Volume: 37, Issue: 2, February, 2004, pp. 175-188.


Outline

- Motivation
- Objective
- Introduction
- SOM and Clustering
- Clustering of the SOM using local clustering validity index and preprocessing of the SOM for filtering
- Experimental results
- Conclusions
- Personal opinion
- Review


Motivation

Classical clustering methods based on the SOM.


Objective

- Preprocessing techniques.
  - Filtering out noise and outliers.
- A new two-level SOM-based clustering algorithm.
  - Clustering validity index based on inter-cluster and intra-cluster density.


Introduction

- Self-Organizing Map (SOM).
- Clustering algorithms.
- Two-level SOM-based clustering.
- In this paper, a new two-level algorithm for clustering of the SOM is proposed.
  - SOM.
  - Agglomerative hierarchical clustering.


SOM and Clustering

- SOM and visualization.
- Clustering algorithms.
- Clustering of the SOM.


SOM and visualization

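- Initial step.
- Training step.
  - Find the winner from (1).
  - Update the winner and its neighborhood according to (2).

$$c(t) = \arg\min_i \{ \| x(t) - m_i(t) \| \} \tag{1}$$

$$m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, [x(t) - m_i(t)] \tag{2}$$

$$h_{ci}(t) = \exp\left( -\frac{\| r_c - r_i \|^2}{2\sigma^2(t)} \right) \tag{3}$$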
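The winner search and neighborhood update in (1) to (3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `som_step`, the 3x3 toy map, and the values of `alpha` (learning rate) and `sigma` (neighborhood width) are assumptions chosen for the example, and `grid` is assumed to hold the map positions r_i of the neurons.

```python
import numpy as np

def som_step(weights, grid, x, alpha, sigma):
    """One SOM update: winner search (1), then neighborhood update (2)-(3).

    weights: (n_units, dim) array of m_i(t); grid: (n_units, 2) array of r_i;
    x: (dim,) input x(t); alpha, sigma: learning rate and neighborhood width.
    """
    # (1) winner c(t) = arg min_i ||x(t) - m_i(t)||
    c = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # (3) Gaussian neighborhood h_ci(t) centered on the winner's map position
    d2 = np.sum((grid - grid[c]) ** 2, axis=1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    # (2) move every unit toward x, scaled by alpha(t) * h_ci(t)
    new_weights = weights + alpha * h[:, None] * (x - weights)
    return c, new_weights

# Toy usage: a 3x3 map on 2D inputs (all values are illustrative).
rng = np.random.default_rng(0)
grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
weights = rng.random((9, 2))
x = np.array([0.5, 0.5])
c, updated = som_step(weights, grid, x, alpha=0.5, sigma=1.0)
```

Because h_ci is 1 at the winner itself, the winner moves a fraction `alpha` of the way toward the input, while units farther away on the map move less.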



Clustering algorithms

The categories of clustering methods:

- Hierarchical
- Partitioning
- Density-based
- Grid-based
- Model-based


Clustering of the SOM

- Agglomerative hierarchical clustering of the SOM.
  - Merging criterion: inter-cluster distance, or inter-cluster and intra-cluster density.
- Filtering noise and outliers before clustering of the SOM.


Clustering of the SOM using local clustering validity index and preprocessing of the SOM for filtering

- Global clustering validity index for different clustering algorithms.
- Merging criterion using the CDbw.
- Preprocessing before clustering of the SOM.
- Clustering of the SOM.
- The algorithm of clustering of the SOM.


Global clustering validity index for different clustering algorithms

- Three types of methods are used to assess cluster validity:
  - External criteria.
  - Internal criteria.
  - Relative criteria.
- Compact and well-separated clusters.
- The newly proposed multi-representation clustering validity index.


CDbw

The notations in the clustering validity index:

- A set of representation points $V_i = \{v_{i1}, v_{i2}, \ldots, v_{ir_i}\}$ represents the $i$-th cluster.
- $\mathrm{stdev}(i)$ is the standard deviation vector of the $i$-th cluster. The $p$-th component of $\mathrm{stdev}(i)$ is defined by

$$\mathrm{stdev}^p(i) = \sqrt{\sum_{k=1}^{n_i} (x_k^p - m_i^p)^2 / (n_i - 1)}$$

- The average standard deviation is given by

$$\mathrm{stdev} = \sqrt{\sum_{i=1}^{c} \| \mathrm{stdev}(i) \|^2 / c}$$
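As a small worked example of the two standard-deviation definitions, the sketch below computes stdev(i) per cluster and the average standard deviation over all clusters. It assumes that n_i is the number of data points in cluster i and that m_i is the cluster mean; the function names and the toy clusters are hypothetical. `ddof=1` gives the (n_i - 1) denominator, so every cluster needs at least two points.

```python
import numpy as np

def cluster_stdev(points):
    """Standard deviation vector stdev(i) of one cluster, with n_i - 1 in the denominator."""
    return np.std(points, axis=0, ddof=1)

def average_stdev(clusters):
    """Average standard deviation: sqrt(sum_i ||stdev(i)||^2 / c)."""
    sq_norms = [np.sum(cluster_stdev(p) ** 2) for p in clusters]
    return float(np.sqrt(np.sum(sq_norms) / len(clusters)))

# Two toy clusters in 2D (illustrative values only).
clusters = [
    np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]]),
    np.array([[10.0, 10.0], [12.0, 14.0]]),
]
stdev_first = cluster_stdev(clusters[0])
stdev_all = average_stdev(clusters)
```

For these clusters the squared norms of stdev(1) and stdev(2) are 4 and 10, so the average standard deviation is the square root of (4 + 10) / 2 = 7.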


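CDbw – Intra_den & Inter_den

Intra-cluster density:

$$\mathrm{Intra\_den}(c) = \frac{1}{c} \sum_{i=1}^{c} \sum_{j=1}^{r_i} \mathrm{density}(v_{ij}), \quad c > 1 \tag{4}$$

$$\mathrm{density}(v_{ij}) = \sum_{l=1}^{n_i} f(x_l, v_{ij})$$

$$f(x_l, v_{ij}) = \begin{cases} 1, & \| x_l - v_{ij} \| \le \mathrm{stdev}, \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$

Inter-cluster density:

$$\mathrm{Inter\_den}(c) = \sum_{i=1}^{c} \sum_{\substack{j=1 \\ j \ne i}}^{c} \frac{\| \mathrm{close\_rep}(i) - \mathrm{close\_rep}(j) \|}{\| \mathrm{stdev}(i) \| + \| \mathrm{stdev}(j) \|}\, \mathrm{density}(u_{ij}), \quad c > 1 \tag{6}$$

$$\mathrm{density}(u_{ij}) = \sum_{k=1}^{n_i + n_j} f(x_k, u_{ij})$$

$$f(x_k, u_{ij}) = \begin{cases} 1, & \| x_k - u_{ij} \| \le (\| \mathrm{stdev}(i) \| + \| \mathrm{stdev}(j) \|)/2, \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$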
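The two densities in (4) to (7) can be sketched as follows. Several points are assumptions, because the slide does not spell them out: close_rep(i) and close_rep(j) are taken to be the closest pair of representation points of clusters i and j, u_ij is taken to be the midpoint between them, and the toy example uses each cluster's centroid as its only representation point. All function names are hypothetical.

```python
import numpy as np

def stdev_vec(points):
    # stdev(i), with n_i - 1 in the denominator
    return np.std(points, axis=0, ddof=1)

def avg_stdev(clusters):
    # average standard deviation over all clusters
    return np.sqrt(sum(np.sum(stdev_vec(p) ** 2) for p in clusters) / len(clusters))

def count_within(points, center, radius):
    """density(.): number of points within `radius` of `center`, as in (5) and (7)."""
    return int(np.sum(np.linalg.norm(points - center, axis=1) <= radius))

def intra_den(clusters, reps):
    """Eq. (4): (1/c) * sum_i sum_j density(v_ij), counted over the points of cluster i."""
    radius = avg_stdev(clusters)
    total = sum(count_within(pts, v, radius)
                for pts, vs in zip(clusters, reps) for v in vs)
    return total / len(clusters)

def closest_reps(reps_i, reps_j):
    """Assumed meaning of close_rep(i), close_rep(j): the closest pair of representation points."""
    d = np.linalg.norm(reps_i[:, None, :] - reps_j[None, :, :], axis=2)
    a, b = np.unravel_index(np.argmin(d), d.shape)
    return reps_i[a], reps_j[b]

def inter_den(clusters, reps):
    """Eq. (6), summed over ordered pairs i != j."""
    norms = [np.linalg.norm(stdev_vec(p)) for p in clusters]
    total = 0.0
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            if i == j:
                continue
            ri, rj = closest_reps(reps[i], reps[j])
            u = (ri + rj) / 2.0  # assumed: u_ij is the midpoint of the closest pair
            both = np.vstack([clusters[i], clusters[j]])
            dens = count_within(both, u, (norms[i] + norms[j]) / 2.0)  # Eq. (7)
            total += np.linalg.norm(ri - rj) / (norms[i] + norms[j]) * dens
    return total

# Two compact, well-separated toy clusters; centroids serve as representation points.
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
clusters = [square, square + 10.0]
reps = [c.mean(axis=0, keepdims=True) for c in clusters]
```

With these toy clusters every point lies within the average standard deviation of its centroid, so Intra_den is 4, and no point lies near the midpoint between the clusters, so Inter_den is 0.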


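CDbw

The definition of the clusters' separation:

$$\mathrm{Sep}(c) = \sum_{i=1}^{c} \sum_{\substack{j=1 \\ j \ne i}}^{c} \frac{\| \mathrm{close\_rep}(i) - \mathrm{close\_rep}(j) \|}{1 + \mathrm{Inter\_den}(c)}, \quad c > 1 \tag{8}$$

The overall clustering validity index, which is called “Composing Density Between and With clusters” (CDbw):

$$\mathrm{CDbw}(c) = \mathrm{Intra\_den}(c) \times \mathrm{Sep}(c) \tag{9}$$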
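Given values for Intra_den(c) and Inter_den(c), the separation (8) and the overall index (9) reduce to a few lines. The sketch below assumes a precomputed matrix `close_rep_dist` whose entry (i, j) holds ||close_rep(i) - close_rep(j)||; the function names and the numbers are illustrative only.

```python
import numpy as np

def sep(close_rep_dist, inter_den_value):
    """Eq. (8): sum over i != j of ||close_rep(i) - close_rep(j)|| / (1 + Inter_den(c))."""
    d = np.asarray(close_rep_dist, dtype=float)
    off_diag = d[~np.eye(d.shape[0], dtype=bool)]  # drop the i == j entries
    return float(np.sum(off_diag / (1.0 + inter_den_value)))

def cdbw(intra_den_value, close_rep_dist, inter_den_value):
    """Eq. (9): CDbw(c) = Intra_den(c) * Sep(c)."""
    return intra_den_value * sep(close_rep_dist, inter_den_value)

# Illustrative numbers: two clusters whose closest representation points
# are 5.0 apart, with Intra_den = 4.0 and Inter_den = 0.0.
dist = [[0.0, 5.0], [5.0, 0.0]]
value = cdbw(4.0, dist, 0.0)
```

A larger Inter_den (more data in the region between two clusters) shrinks Sep and therefore the index, while denser clusters raise Intra_den and therefore the index.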


Merging criterion using the CDbw

Find the pair of clusters with the minimal value of the CDbw; this pair is merged.


Preprocessing before clustering of the SOM

1. Labeling.
2. Compute the distance deviation dev_j = ||w_j - m_j||, together with mean_dev and std_dev.
3. If dev_j > mean_dev + std_dev, exclude neuron j.
4. Compute the distances dis_j(x_i) = ||x_i - w_j||, together with mean_dis_j and std_dis_j.
5. If dis_j(x_i) > mean_dis_j + std_dis_j, filter out the input vector x_i.
6. Compute the number of data belonging to the j-th cluster (neuron j), num_j, together with mean_num and std_num.
7. If num_j < mean_num - std_num, exclude neuron j.
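The seven preprocessing steps can be sketched as below. This is one possible reading of the slide, not the authors' code. The assumptions are: labeling assigns each input to its nearest neuron, m_j is the mean of the inputs labeled to neuron j, neurons with no labeled inputs are excluded from the start, the population standard deviation is used, num_j is counted after step 5, and inputs labeled to an excluded neuron are dropped too. The function name `preprocess` and the toy data are hypothetical.

```python
import numpy as np

def preprocess(data, weights):
    """Sketch of steps 1-7. Returns (keep_unit, keep_data) boolean masks."""
    n_units = len(weights)
    # Step 1: label each input with its nearest neuron.
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    keep_unit = np.array([np.any(labels == j) for j in range(n_units)])
    keep_data = np.ones(len(data), dtype=bool)
    used = np.flatnonzero(keep_unit)

    # Steps 2-3: dev_j = ||w_j - m_j||; exclude neurons with a large deviation.
    dev = np.array([np.linalg.norm(weights[j] - data[labels == j].mean(axis=0))
                    for j in used])
    keep_unit[used[dev > dev.mean() + dev.std()]] = False

    # Steps 4-5: per neuron, filter out inputs unusually far from w_j.
    for j in used:
        idx = np.flatnonzero(labels == j)
        dis = dists[idx, j]
        keep_data[idx[dis > dis.mean() + dis.std()]] = False

    # Steps 6-7: exclude neurons that hold unusually few of the remaining inputs.
    num = np.array([np.sum(keep_data[labels == j]) for j in used])
    keep_unit[used[num < num.mean() - num.std()]] = False

    # Assumption: inputs labeled to an excluded neuron are dropped as well.
    keep_data &= keep_unit[labels]
    return keep_unit, keep_data

# Toy example: three neurons; the fifth input is an outlier of neuron 0,
# and neuron 2 attracts only a single input.
weights = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
data = np.array([
    [0.5, 0.0], [-0.5, 0.0], [0.0, 0.5], [0.0, -0.5], [0.0, 3.0],
    [8.5, 0.0], [7.5, 0.0], [8.0, 0.5], [8.0, -0.5],
    [0.0, 8.5],
])
keep_unit, keep_data = preprocess(data, weights)
```

In the toy example the outlier at (0, 3) is filtered in step 5, and the sparsely populated third neuron is excluded in step 7 together with its single input.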


Clustering of the SOM


The algorithm of Clustering of the SOM

1. Train the SOM on the input data.
2. Preprocess the SOM before clustering.
3. Cluster the SOM using agglomerative hierarchical clustering, with the CDbw as the merging criterion.
4. Find the optimal partition of the input data according to the CDbw.
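Steps 3 and 4 can be outlined as an agglomerative loop with two pluggable criteria. The sketch below is deliberately generic and simplified. `pair_score` stands for the local merging criterion (the pair with the minimal value is merged) and `partition_score` for the global criterion used to pick the final partition; taking the partition with the maximal global value is an assumption. The two example criteria, `centroid_distance` and `separation_over_spread`, are simple hypothetical stand-ins and are not the CDbw. The loop also works on prototype vectors only, whereas the described method evaluates the index on the input data.

```python
import numpy as np
from itertools import combinations

def agglomerate(prototypes, pair_score, partition_score):
    """Merge clusters of SOM prototypes step by step; return the best-scoring partition."""
    clusters = [[i] for i in range(len(prototypes))]
    best, best_value = None, -np.inf
    while len(clusters) > 1:
        # Score the current partition before merging (only defined for c > 1).
        value = partition_score([prototypes[c] for c in clusters])
        if value > best_value:
            best, best_value = [list(c) for c in clusters], value
        # Merge the pair of clusters with the minimal local criterion.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: pair_score(prototypes[clusters[p[0]]],
                                     prototypes[clusters[p[1]]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return best

# Hypothetical stand-in criteria (NOT the CDbw), used only to make the loop runnable.
def centroid_distance(a, b):
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))

def separation_over_spread(clusters):
    centroids = [c.mean(axis=0) for c in clusters]
    gap = min(np.linalg.norm(p - q) for p, q in combinations(centroids, 2))
    spread = max(np.linalg.norm(c - c.mean(axis=0), axis=1).mean() for c in clusters)
    return gap / (1.0 + spread)

prototypes = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
best = agglomerate(prototypes, centroid_distance, separation_over_spread)
```

Keeping the criteria as arguments makes it possible to swap in a local and a global CDbw computation without changing the merging loop.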


Experimental results

- 2D synthetic data set (200 data points).
  - With some noise and outliers.
  - Uses k-means, four hierarchical clustering algorithms (HCA), and the proposed algorithm.
- Iris data set (150 data points).
  - Three classes with 50 points each.
  - Uses single-linkage and the proposed clustering algorithm.
- 15D synthetic data set (1780 data points).
  - Generating 20 uniformly distributed random 15D points.
- Wine data set (178 data points).
  - Three classes with 59, 71, and 48 points, respectively.


2D synthetic data set



Iris data set


15D synthetic data set


Wine data set


Conclusions

In this paper, we propose a new SOM-based clustering algorithm.

The clustering validity index is used locally to determine which pair of clusters to merge.

The preprocessing steps filter out noise and outliers.

The experimental results are better than those of other clustering algorithms on the SOM.


Personal opinion

- This method is more precise than the others.
- We can consider entropy or other indices besides distance and density.


Review

- Self-Organizing Map (SOM).
- Clustering methods.
- Two-level clustering.
- Clustering validity index: CDbw.
- The preprocessing steps.