Determining the number of clusters using information entropy for mixed data
Presenter: Hong-Yi, Cai
Authors: Jiye Liang, Xingwang Zhao, Deyu Li, Fuyuan Cao, Chuangyin Dang
Pattern Recognition (PR), 2012
Motivation
• Determining the initial parameters of a clustering algorithm, such as the number of clusters, is among the most difficult problems in clustering.
• No existing clustering algorithm clusters mixed (numerical and categorical) data sets effectively.
Objectives
• To propose a generalized mechanism for mixed data sets by integrating Rényi entropy and complement entropy.
• To improve the k-prototypes algorithm by using the new generalized mechanism.
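For context, the k-prototypes algorithm (Huang) commonly measures the distance between a mixed-type object and a cluster prototype as the squared Euclidean distance on the numerical attributes plus a weighted count of mismatches on the categorical attributes. The sketch below illustrates that standard measure only; the function name and the argument layout are assumptions made for illustration, and it is not the improved algorithm proposed in the paper.

```python
def k_prototypes_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma):
    """Mixed-type dissimilarity between one object and one prototype.

    x_num / proto_num: sequences of numerical attribute values.
    x_cat / proto_cat: sequences of categorical attribute values.
    gamma: weight balancing the categorical part against the numerical part
           (hypothetical parameter name; chosen by the user).
    """
    if len(x_num) != len(proto_num) or len(x_cat) != len(proto_cat):
        raise ValueError("object and prototype must have the same attributes")
    # Numerical part: squared Euclidean distance.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: number of attributes whose values differ.
    categorical = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric + gamma * categorical
```

The weight gamma controls how strongly categorical mismatches count relative to numerical distance, so it usually has to be tuned to the scale of the numerical attributes.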
Methodology
• A generalized mechanism for numerical data…
Renyi Entropy :
Parzen window density estimation:
By the convolution theorem…
Within-Cluster Entropy:
Between-Cluster Entropy:
Improved Entropy for numerical data:
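The slide formulas did not survive extraction, so the following is only a minimal sketch built on standard definitions. The quadratic Rényi entropy is H = -log ∫ f(x)² dx; with a Gaussian Parzen window estimate of f, the convolution theorem reduces the integral to the average of Gaussian kernels of variance 2σ² over all pairs of points. The function names, the isotropic kernel, and the choice of a single bandwidth `sigma` are assumptions, and the exact within-cluster and between-cluster definitions used in the paper may differ.

```python
import numpy as np

def information_potential(x, y, sigma):
    """Average Gaussian kernel G(x_i - y_j, 2*sigma^2) over all pairs.

    x, y: arrays of shape (n_samples, n_features).
    sigma: Parzen window bandwidth (assumed shared by all attributes).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = x.shape[1]
    var = 2.0 * sigma ** 2  # variance after convolving two Gaussians
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * var) ** (-d / 2.0)
    return float(np.mean(norm * np.exp(-sq_dist / (2.0 * var))))

def within_cluster_entropy(cluster, sigma):
    """Quadratic Renyi entropy estimate of a single cluster."""
    return -np.log(information_potential(cluster, cluster, sigma))

def between_cluster_entropy(cluster_a, cluster_b, sigma):
    """Cross-entropy style separation estimate between two clusters."""
    return -np.log(information_potential(cluster_a, cluster_b, sigma))
```

Under these assumptions, a compact cluster yields a low within-cluster entropy, while two well-separated clusters yield a high between-cluster entropy.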
• A generalized mechanism for categorical data…
Indiscernibility relation…
Complement Entropy:
Within-Cluster Entropy:
Improved Entropy for categorical data:
Between-Cluster Entropy:
Huang Dissimilarity for categorical data:
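As the formulas are missing from the extracted slides, here is a minimal sketch using the standard definitions. The complement entropy of a partition with class proportions p_i is Σ p_i (1 - p_i), and Huang's dissimilarity between two categorical objects is the number of attributes on which they differ. Summing the complement entropy over attributes to obtain a within-cluster entropy is an assumption made for illustration; all function names are hypothetical.

```python
from collections import Counter

def complement_entropy(values):
    """Complement entropy of the partition induced by one categorical attribute."""
    n = len(values)
    counts = Counter(values)
    return sum((c / n) * (1.0 - c / n) for c in counts.values())

def huang_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(u != v for u, v in zip(a, b))

def within_cluster_entropy_categorical(cluster):
    """Sum of per-attribute complement entropies (assumed aggregation).

    cluster: list of equal-length tuples of categorical values.
    """
    columns = list(zip(*cluster))
    return sum(complement_entropy(col) for col in columns)

def mean_pairwise_dissimilarity(cluster):
    """Mean Huang dissimilarity over all ordered pairs, self-pairs included."""
    n = len(cluster)
    total = sum(huang_dissimilarity(x, y) for x in cluster for y in cluster)
    return total / (n * n)
```

With this aggregation, the within-cluster entropy equals the mean Huang dissimilarity over all ordered pairs of objects in the cluster, which ties the entropy view to the distance view.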
• Cluster validity index for mixed data…
For numerical data…
For categorical data…
For mixed data…
Conclusions
• The generalized mechanism and the improved k-prototypes algorithm cluster mixed data sets effectively and determine the optimal number of clusters.