A Robust and Efficient Clustering Algorithm based on Cohesion Self-Merging

Advisor: Dr. Hsu
Graduate: Sheng-Hsuan Wang
Authors: Cheng-Ru Lin, Ming-Syan Chen



Outline

Motivation
Objective
Introduction
Preliminaries
Cohesion-Based Self-Merging Algorithm
Performance Studies
Conclusion
Personal Opinion

Motivation

The dissimilarity measures between two clusters are vulnerable to outliers, and removing the outliers precisely is itself a difficult task.

Objective

We propose a new similarity measurement, referred to as “cohesion”, to measure the inter-cluster distances.

Introduction

Hierarchical clustering algorithms: good clustering quality.

Partitional clustering algorithms: good execution time and space requirements.

Hybrid clustering algorithms: combine the features of partitional and hierarchical clustering methods.

Preliminaries

Hierarchical Clustering Algorithms. Hierarchical Clustering Algorithm. Single-link and Complete-link. Algorithm CURE.
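
To make the motivation concrete, here is a minimal sketch of the two classical inter-cluster distances named on this slide, using their standard textbook definitions (single-link: smallest pairwise distance; complete-link: largest pairwise distance). The function names and the toy points are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import product

def single_link(a, b):
    """Smallest distance between any point of a and any point of b."""
    return min(math.dist(p, q) for p, q in product(a, b))

def complete_link(a, b):
    """Largest distance between any point of a and any point of b."""
    return max(math.dist(p, q) for p, q in product(a, b))

# Two well-separated toy clusters (hypothetical data).
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(10.0, 0.0), (11.0, 0.0)]

# The same cluster b with one stray point lying close to a.
b_with_outlier = b + [(2.0, 0.0)]

print(single_link(a, b), single_link(a, b_with_outlier))
print(complete_link(a, b), complete_link(a, b_with_outlier))
```

In this toy case a single stray point shrinks the single-link distance from 9 to 1 while the complete-link distance stays at 11, which illustrates how one outlier can dominate a measure based on extreme pairs.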

Preliminaries

Partitional Clustering Algorithms. The K-means algorithm. Algorithm CLARA and CLARANS.

Preliminaries

Hybrid Clustering Algorithms. Phase 1: Partition. Phase 2: Merge. Algorithm BIRCH.

Cohesion-Based Self-Merging Algorithm

We propose a new similarity measurement, namely cohesion, based on the joinability of a data point to another cluster.

Cohesion-Based Self-Merging Algorithm

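Definition 1: Given a cluster $C_l$ consisting of n data points $p_1, p_2, \ldots, p_n$, the radius $r$ of $C_l$ is defined as

$$ r = \left( \frac{\sum_{i=1}^{n} \bigl( d(p_i, c) \bigr)^2}{n} \right)^{1/2} $$

where $c$ is the centroid of $C_l$ and $d(p_i, c)$ is the distance between $p_i$ and $c$.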
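
A minimal sketch of Definition 1, assuming Euclidean distance and taking c to be the mean of the cluster's points: the radius is the root of the mean squared distance from the points to c. The helper names `centroid` and `radius` are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[k] for p in points) / n for k in range(dim))

def radius(points):
    """Root of the mean squared Euclidean distance to the centroid."""
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))

# Four corners of a 2 x 2 square (hypothetical data); the centroid is (1, 1).
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(centroid(cluster))
print(radius(cluster))  # roughly 1.414, since every corner is sqrt(2) from (1, 1)
```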

Cohesion-Based Self-Merging Algorithm

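Definition 2: Given a data point $p_i$ of a cluster $C_i$ and another cluster $C_j$, the joinability of $p_i$ to $C_j$ is defined as

$$ \mathrm{join}(p_i, C_j) = \mathrm{Exp}\left( -\frac{\lvert d(p_i, c_j) - d(p_i, c_i) \rvert}{r_i} \right) $$

where $c_i$ and $c_j$ are the centroids of $C_i$ and $C_j$, and $r_i$ is the radius of $C_i$.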
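
The following sketch evaluates Definition 2 under one reading of the slide's formula: join(p_i, C_j) = Exp(-|d(p_i, c_j) - d(p_i, c_i)| / r_i), with Euclidean distance, mean centroids, and r_i the radius of the point's own cluster. The function and parameter names are hypothetical, and the sign inside the exponential is an assumption chosen so that joinability falls as the point gets relatively farther from the other cluster.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def radius(points):
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))

def join(p, own_cluster, other_cluster):
    """Joinability of point p (a member of own_cluster) to other_cluster.

    Assumes own_cluster has a non-zero radius; a single-point cluster
    would need special handling, which this sketch does not attempt.
    """
    c_i = centroid(own_cluster)
    c_j = centroid(other_cluster)
    r_i = radius(own_cluster)
    return math.exp(-abs(math.dist(p, c_j) - math.dist(p, c_i)) / r_i)

# Two neighbouring square clusters (hypothetical data).
c_left = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
c_right = [(4.0, 0.0), (6.0, 0.0), (4.0, 2.0), (6.0, 2.0)]

near = join((2.0, 0.0), c_left, c_right)  # point on the side facing c_right
far = join((0.0, 0.0), c_left, c_right)   # point on the opposite side
print(near, far)
```

Under this reading the value lies in (0, 1], and the point facing the neighbouring cluster scores higher than the point on the far side.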

Cohesion-Based Self-Merging Algorithm

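Definition 3: The cohesion of two clusters $C_i$ and $C_j$ is defined as

$$ \mathrm{chs}(C_i, C_j) = \frac{\sum_{p \in C_i} \mathrm{join}(p, C_j) + \sum_{p \in C_j} \mathrm{join}(p, C_i)}{\lvert C_i \rvert + \lvert C_j \rvert} $$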
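
As an illustration of Definition 3, the sketch below averages joinability over all points of both clusters. It assumes Euclidean distance, mean centroids, and the reading join(p_i, C_j) = Exp(-|d(p_i, c_j) - d(p_i, c_i)| / r_i) for Definition 2; all function names and the toy data are hypothetical.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def radius(points):
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))

def join(p, own_cluster, other_cluster):
    # Assumes own_cluster has a non-zero radius.
    c_i, c_j = centroid(own_cluster), centroid(other_cluster)
    return math.exp(-abs(math.dist(p, c_j) - math.dist(p, c_i)) / radius(own_cluster))

def cohesion(a, b):
    """Mean joinability of every point in a to b and every point in b to a."""
    total = sum(join(p, a, b) for p in a) + sum(join(p, b, a) for p in b)
    return total / (len(a) + len(b))

# Three square clusters: b sits next to a, c is far away (hypothetical data).
a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
b = [(3.0, 0.0), (5.0, 0.0), (3.0, 2.0), (5.0, 2.0)]
c = [(20.0, 0.0), (22.0, 0.0), (20.0, 2.0), (22.0, 2.0)]

print(cohesion(a, b), cohesion(a, c))
```

Because every point contributes to the average, a few stray points change the value only slightly, which is the intuition behind using cohesion as an outlier-resistant similarity.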

Cohesion-Based Self-Merging Algorithm

Algorithm CSM

Input: The input data set, n. The number of subclusters, m. The desired number of clusters, k.

Output: The hierarchical structure of the k clusters.
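
The slides give only the inputs and output of algorithm CSM, so the following is a rough sketch of the two-phase idea (partition into m subclusters, then merge down to k clusters), not the authors' procedure. It assumes a plain k-means pass for the partition phase, a greedy merge of the pair with the highest cohesion, Euclidean distance, and the reading join = Exp(-|d(p, c_j) - d(p, c_i)| / r_i). Every function name, the fixed seed, and the zero-radius guard are hypothetical choices.

```python
import math
import random

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def radius(points):
    c = centroid(points)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in points) / len(points))

def join(p, own, other):
    r = radius(own) or 1e-12  # guard for single-point clusters (assumption)
    diff = abs(math.dist(p, centroid(other)) - math.dist(p, centroid(own)))
    return math.exp(-diff / r)

def cohesion(a, b):
    total = sum(join(p, a, b) for p in a) + sum(join(p, b, a) for p in b)
    return total / (len(a) + len(b))

def partition(data, m, iters=20, seed=0):
    """Phase 1 (assumed): plain k-means producing up to m subclusters."""
    centers = random.Random(seed).sample(data, m)
    for _ in range(iters):
        groups = [[] for _ in range(m)]
        for p in data:
            nearest = min(range(m), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return [g for g in groups if g]

def csm_sketch(data, m, k):
    """Phase 2 (assumed): merge the most cohesive pair until k clusters remain."""
    clusters = partition(data, m)
    merges = []  # record of merged pairs, oldest first
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda ij: cohesion(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Two 3 x 3 grids of points, ten units apart (hypothetical data).
data = [(float(x), float(y)) for x in range(3) for y in range(3)]
data += [(float(x + 10), float(y)) for x in range(3) for y in range(3)]

clusters, merges = csm_sketch(data, m=4, k=2)
print([len(c) for c in clusters])
```

The returned merge record plays the role of the hierarchical structure mentioned in the output description; recomputing cohesion for every pair on each pass keeps the sketch short but is far from efficient.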

Performance Studies

Experiment 1: Clustering Quality of Algorithm CSM.

Performance Studies

Experiment 2: Efficiency of Algorithm CSM.

Conclusion

Algorithm CSM not only resists outliers but also produces clustering results similar to those of algorithm CURE, with a much shorter execution time.

Personal Opinion

The paper includes examples that help the reader understand the method.

Cohesion: a good method to resist outliers. Weakness: how should the number of subclusters, m, and the desired number of clusters, k, be chosen?