Intelligent Database Systems Lab
N.Y.U.S.T. I. M.
Modified global k-means algorithm for
minimum sum-of-squares clustering problems
Pattern Recognition (PR, 2008)
Presenter : Lin, Shu-Han
Authors : Adil M. Bagirov
Outline
Motivation, Objectives, Methodology, Experiments, Conclusions, Comments
Motivation
k-means algorithm: sensitive to the choice of starting points, and inefficient for solving clustering problems in large data sets.
Global k-means (GKM) algorithm: an incremental algorithm that dynamically adds one cluster center at a time, using each data point as a candidate for the k-th cluster center.
Objectives
Propose a new version of GKM in which the starting point for the k-th cluster center is computed by minimizing an auxiliary cluster function.
Methodology – k-Means
The k-means algorithm is sensitive to the choice of starting points.
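A minimal sketch of the standard (Lloyd-style) k-means iteration in plain Python, for illustration only; the helper names (`sq_dist`, `centroid`, `kmeans`) and the toy data are hypothetical and not taken from the paper. Running it on the same one-dimensional data from two different sets of starting centers shows the sensitivity noted above: one start reaches the natural three-cluster solution, the other stops in a much worse local minimum.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(data: List[Point], centers: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], float]:
    """Lloyd-style k-means from the given starting centers.

    Returns the final centers and the sum of squared distances (SSE)
    from each point to its nearest center.
    """
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [centroid(cl) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in data)
    return centers, sse


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
    _, good = kmeans(data, [(0.0,), (10.0,), (20.0,)])
    _, bad = kmeans(data, [(0.0,), (1.0,), (10.0,)])
    print(good, bad)  # prints: 1.5 101.0
```

The bad start places two centers inside the left pair, so the remaining center is pulled to 15.5 and never recovers; the objective (SSE) differs by almost two orders of magnitude between the two runs.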
Methodology – The GKM algorithm
Objective function
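A minimal sketch of the global k-means idea as summarized in the Motivation: start with k = 1 at the centroid of the data, then, for each new k, run k-means from the previous centers plus every data point as a candidate for the new center and keep the best result. This is plain Python with hypothetical helper names and no attempt at efficiency; the m k-means runs per step are the cost that faster variants try to avoid.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sse(data: List[Point], centers: List[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data)


def kmeans(data: List[Point], centers: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], float]:
    """Lloyd-style k-means from the given starting centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        new_centers = [centroid(cl) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, sse(data, centers)


def global_kmeans(data: List[Point],
                  k_max: int) -> Tuple[List[Point], List[float]]:
    """Incremental global k-means: one new center per step, every point tried."""
    centers = [centroid(data)]  # k = 1: centroid of the whole data set
    history = [sse(data, centers)]
    for _ in range(2, k_max + 1):
        best_centers, best_err = centers, float("inf")
        for a in data:  # each data point is a candidate for the new center
            cand, err = kmeans(data, centers + [a])
            if err < best_err:
                best_centers, best_err = cand, err
        centers = best_centers
        history.append(best_err)
    return centers, history
```

On the six points 0, 1, 10, 11, 20, 21 with `k_max = 3`, the recorded SSE values are 401.5, 101.5 and 1.5, i.e. the incremental scheme reaches the three-pair solution that a badly started k-means can miss.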
Methodology – Objective function
Old version
Reformulated version
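For reference, the two standard ways of writing the minimum sum-of-squares clustering problem are sketched below, assuming a data set A = {a^1, ..., a^m} in R^n and k cluster centers x^1, ..., x^k. The notation (including the 1/m normalization) is our assumption and may differ in detail from the slide. The first form uses 0/1 assignment variables w_{ij}; the second removes them by charging each point the distance to its nearest center, which leaves a nonsmooth, nonconvex function of the centers only, with the same optimal value.

```latex
% Mixed-integer form: w_{ij} = 1 iff point a^i is assigned to center x^j
\min_{x,\,w}\ \psi_k(x, w) = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{ij} \, \lVert x^j - a^i \rVert^2
\quad \text{s.t.} \quad \sum_{j=1}^{k} w_{ij} = 1, \quad w_{ij} \in \{0, 1\}, \quad i = 1, \dots, m

% Nonsmooth form: each point is charged the distance to its nearest center
\min_{x^1, \dots, x^k \in \mathbb{R}^n} f_k(x^1, \dots, x^k) = \frac{1}{m} \sum_{i=1}^{m} \min_{j = 1, \dots, k} \lVert x^j - a^i \rVert^2
```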
Methodology – fast GKM algorithm
Old version
Proposed version (auxiliary cluster function)
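The relationship between the two selection rules can be sketched as follows (plain Python, hypothetical helper names, toy data). With d_i the squared distance from point a^i to its nearest existing center, the fast GKM score of a candidate y is the guaranteed reduction b(y) = Σ_i max(0, d_i − ||y − a^i||²), and the auxiliary cluster function is f̄(y) = (1/m) Σ_i min(d_i, ||y − a^i||²). Since min(d, s) = d − max(0, d − s), we get f̄(y) = (Σ_i d_i − b(y)) / m, so over data-point candidates both rules pick the same point. As we read the slides, the gain of the auxiliary function is that it is defined for any y in R^n, so it can also be minimized away from the data points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_sq_dists(data: List[Point], centers: List[Point]) -> List[float]:
    """d_i: squared distance from each data point to its closest existing center."""
    return [min(sq_dist(p, c) for c in centers) for p in data]


def guaranteed_reduction(y: Point, data: List[Point], d: List[float]) -> float:
    """Fast GKM score: SSE decrease from adding y while other centers stay fixed."""
    return sum(max(0.0, di - sq_dist(y, p)) for p, di in zip(data, d))


def auxiliary_cluster_function(y: Point, data: List[Point],
                               d: List[float]) -> float:
    """Auxiliary cluster function: (1/m) * sum_i min(d_i, ||y - a^i||^2)."""
    return sum(min(di, sq_dist(y, p)) for p, di in zip(data, d)) / len(data)


def best_by_reduction(data: List[Point], centers: List[Point]) -> Point:
    """Old rule: the data point with the largest guaranteed reduction."""
    d = nearest_sq_dists(data, centers)
    return max(data, key=lambda a: guaranteed_reduction(a, data, d))


def best_by_auxiliary(data: List[Point], centers: List[Point]) -> Point:
    """Auxiliary-function rule, restricted here to data points."""
    d = nearest_sq_dists(data, centers)
    return min(data, key=lambda a: auxiliary_cluster_function(a, data, d))


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (10.0,), (11.0,), (13.0,)]
    centers = [(0.5,)]
    print(best_by_reduction(data, centers), best_by_auxiliary(data, centers))
    # prints: (11.0,) (11.0,)
```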
Methodology – modified GKM algorithm
Proposed version
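A simplified sketch of our reading of the modified algorithm, in plain Python: the starting point for the new center is obtained by locally minimizing the auxiliary cluster function (each data point proposes the centroid of the points it would attract, the best proposal is then refined by a k-means-like iteration), after which k-means refines all k centers. The stopping rule (f_{k−1} − f_k)/f_1 < tol is how we understand the "given tolerance". The helper names, the screening and refinement details, and the default `tol` are assumptions and may differ from the paper's exact procedure.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sse(data: List[Point], centers: List[Point]) -> float:
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data)


def kmeans(data: List[Point], centers: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], float]:
    """Lloyd-style k-means from the given starting centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        new_centers = [centroid(cl) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, sse(data, centers)


def auxiliary_start(data: List[Point], centers: List[Point],
                    max_iter: int = 100) -> Point:
    """Hypothetical helper: start for the new center, found by locally
    minimizing the auxiliary cluster function."""
    d = [min(sq_dist(p, c) for c in centers) for p in data]

    def attracted(y: Point) -> List[Point]:
        # Points strictly closer to y than to their current nearest center.
        return [p for p, di in zip(data, d) if sq_dist(y, p) < di]

    def aux(y: Point) -> float:
        # Auxiliary cluster function: (1/m) * sum_i min(d_i, ||y - a^i||^2).
        return sum(min(di, sq_dist(y, p)) for p, di in zip(data, d)) / len(data)

    # Screening: every data point proposes the centroid of the points it attracts.
    best: Point = data[0]
    best_val = float("inf")
    for a in data:
        group = attracted(a)
        c = centroid(group) if group else a
        val = aux(c)
        if val < best_val:
            best, best_val = c, val
    # Refinement: move y to the centroid of its attracted set until it stops
    # moving; no step increases the auxiliary function.
    y = best
    for _ in range(max_iter):
        group = attracted(y)
        if not group:
            break
        new_y = centroid(group)
        if new_y == y:
            break
        y = new_y
    return y


def modified_global_kmeans(data: List[Point], k_max: int,
                           tol: float = 1e-3) -> Tuple[List[Point], List[float]]:
    """Incremental scheme: add one center at a time from an auxiliary start."""
    centers = [centroid(data)]  # k = 1: centroid of the whole data set
    f1 = sse(data, centers)
    history = [f1]
    if f1 == 0.0:  # all points coincide: nothing left to split
        return centers, history
    for _ in range(2, k_max + 1):
        y = auxiliary_start(data, centers)
        centers, fk = kmeans(data, centers + [y])  # refine all k centers
        improvement = (history[-1] - fk) / f1
        history.append(fk)
        if improvement < tol:  # relative decrease below the tolerance: stop
            break
    return centers, history
```

On the points 0, 1, 10, 11, 20, 21 with `k_max = 3` this sketch ends with centers 0.5, 10.5 and 20.5 (SSE 1.5); with a large tolerance such as `tol = 0.8` it stops after two centers, because the relative improvement (401.5 − 101.5)/401.5 ≈ 0.75 is already below 0.8. This variant keeps the solution computed at the stopping step; stopping at k − 1 instead would be an equally plausible reading.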
Experiments
MS k-means: multi-start k-means; GKM: fast global k-means; MGKM: modified global k-means
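For readers unfamiliar with the baseline, a minimal sketch of multi-start k-means in plain Python: run k-means from several random starts and keep the result with the lowest SSE. The slides do not say how many starts the experiments used or how starting points were drawn; here each start is k distinct data points sampled uniformly, and the `starts` and `seed` parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(data: List[Point], centers: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], float]:
    """Lloyd-style k-means; returns final centers and SSE."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        new_centers = [centroid(cl) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, sum(min(sq_dist(p, c) for c in centers) for p in data)


def multistart_kmeans(data: List[Point], k: int, starts: int = 20,
                      seed: int = 0) -> Tuple[List[Point], float]:
    """Run k-means from several random starts and keep the lowest-SSE result."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    best_centers: List[Point] = []
    best_err = float("inf")
    for _ in range(starts):
        init = rng.sample(data, k)  # k distinct data points as starting centers
        centers, err = kmeans(data, init)
        if err < best_err:
            best_centers, best_err = centers, err
    return best_centers, best_err
```

Each extra start costs a full k-means run, and nothing guarantees that any start lands in the basin of a good solution, which is the weakness that the incremental GKM and MGKM schemes target.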
Overall (14 data sets, 140 results): the best known (or near best known) solutions were found
42 times (33.3%) by the MS k-means algorithm,
76 times (60.3%) by the GKM algorithm, and
102 times (81.0%) by the MGKM algorithm.
Large k in large data sets (large m): the MS k-means algorithm failed to find the best known (or near best known) solutions; the GKM algorithm found such solutions 22 times (45.8%) and the MGKM algorithm 42 times (87.5%).
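As an arithmetic check on the reported counts, the overall percentages correspond to a denominator of 126 results and the large-k percentages to a denominator of 48; with the stated total of 140, the first overall figure would be 30.0% rather than 33.3%. The slide does not say which total is intended.

```latex
% Overall: MS k-means, GKM, MGKM
\frac{42}{126} \approx 33.3\%, \qquad \frac{76}{126} \approx 60.3\%, \qquad \frac{102}{126} \approx 81.0\%
% For comparison, the stated total of 140 would give
\frac{42}{140} = 30.0\%
% Large k in large data sets: GKM, MGKM
\frac{22}{48} \approx 45.8\%, \qquad \frac{42}{48} = 87.5\%
```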
Conclusions
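A new version of GKM that changes how starting points are computed: the starting point for the k-th cluster center is found by minimizing the auxiliary cluster function.
It requires a given tolerance, used in the stopping criterion.
It is more effective than GKM, especially on large data sets.
The choice of starting points in k-means is crucial.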
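The role of the tolerance can be written as a stopping rule. Under our reading of the paper (treat the exact form as an assumption), with f_k the objective value of the k-partition solution, f_1 the value for a single center at the centroid of the data, and ε > 0 the given tolerance, the algorithm stops adding centers once the relative improvement falls below ε:

```latex
% f_k: objective value of the k-partition solution
% f_1: objective value for one center (the centroid of the data)
% \varepsilon > 0: the given tolerance
\text{stop if}\quad \frac{f_{k-1} - f_k}{f_1} < \varepsilon
```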
Comments
Advantage: theoretical analysis.
Drawback: the authors do not explain why the parts they chose to modify are important, or why modifying them is necessary.
Application: GKM outperforms the k-means algorithm.