Clustering: Density-Based Methods
Elsayed Hemayed
Data Mining Course
Outline
Density-based Clustering Methods
- Density-Based Clustering Methods
- Density-Based Clustering: Background
- Terminology
- How does DBSCAN find clusters?
- DBSCAN
Clustering Methods
- Partitioning methods
  - K-Means
- Hierarchical methods
  - Agglomerative hierarchical clustering
  - Divisive hierarchical clustering
- Density-based methods
  - DBSCAN: A Density-Based Spatial Clustering of Applications with Noise
- Grid-based methods
  - STING: A Statistical Information Grid Approach to Spatial Data Mining
- Model-based methods
  - Expectation-Maximization
  - Neural network approach
- High-dimensional data clustering
  - CLIQUE: A Dimension-Growth Subspace Clustering Method
DBSCAN
Density-Based Clustering Methods
- Clustering is based on density, such as density-connected points, instead of a distance metric.
- Cluster = set of "density-connected" points.
- Major features:
  - Discovers clusters of arbitrary shape
  - Handles noise
  - Needs "density parameters" as a termination condition (clustering stops when no new objects can be added to the cluster)
- Examples:
  - DBSCAN (Ester et al., 1996)
  - OPTICS (Ankerst et al., 1999)
  - DENCLUE (Hinneburg & Keim, 1998)
Density-Based Clustering: Background
- Eps-neighborhood: the neighborhood within a radius Eps of a given object.
- MinPts: the minimum number of points in the Eps-neighborhood of an object.
- Core object: an object whose Eps-neighborhood contains at least MinPts points.
- Directly density-reachable: a point p is directly density-reachable from a point q with respect to Eps and MinPts if
  1) p is within the Eps-neighborhood of q, and
  2) q is a core object.
[Figure: points p and q, with MinPts = 5 and Eps = 1]
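The definitions above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and sample coordinates are hypothetical, Euclidean distance is assumed, and a point is counted as a member of its own Eps-neighborhood (a common convention that the slides do not state).

```python
from math import dist  # Euclidean distance, Python 3.8+

def eps_neighborhood(points, q, eps):
    """Return all points within distance eps of q (q itself included)."""
    return [p for p in points if dist(p, q) <= eps]

def is_core(points, q, eps, min_pts):
    """q is a core object if its Eps-neighborhood holds at least min_pts points."""
    return len(eps_neighborhood(points, q, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """p is directly density-reachable from q if p lies in q's
    Eps-neighborhood and q is a core object."""
    return dist(p, q) <= eps and is_core(points, q, eps, min_pts)

# Hypothetical sample data, using MinPts = 5 and Eps = 1 as on the slide.
q = (0.0, 0.0)
p = (0.9, 0.0)
points = [q, p, (0.0, 0.9), (-0.9, 0.0), (0.0, -0.9), (3.0, 3.0)]

print(is_core(points, q, eps=1.0, min_pts=5))                        # q has 5 neighbors
print(directly_density_reachable(points, p, q, eps=1.0, min_pts=5))  # p from q
print(directly_density_reachable(points, q, p, eps=1.0, min_pts=5))  # q from p
```

In this sample, q is a core object and p is not, so p is directly density-reachable from q but not the other way round. The relation is symmetric only between two core objects.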
Density Reachability and Density Connectivity
- M, P, O and R are core objects, since the Eps-neighborhood of each contains at least 3 points.
[Figure: circles of radius Eps drawn around the points; MinPts = 3]
Directly density reachable
- Q is directly density-reachable from M.
- M is directly density-reachable from P, and vice versa.
Indirectly density reachable
- Q is indirectly density-reachable from P, since Q is directly density-reachable from M and M is directly density-reachable from P.
- However, P is not density-reachable from Q, since Q is not a core object.
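The asymmetry described here can be reproduced on a small example. The sketch below is illustrative only: the one-dimensional coordinates are hypothetical (they are not read off the slide's figure), the helper names are assumptions, and each point counts toward its own Eps-neighborhood. Density-reachability is computed as a breadth-first search that only expands through core objects.

```python
from collections import deque

def neighbors(xs, i, eps):
    """Indices of all points within eps of point i (i itself included)."""
    return [j for j in range(len(xs)) if abs(xs[i] - xs[j]) <= eps]

def density_reachable(xs, src, dst, eps, min_pts):
    """True if point dst is density-reachable from point src, i.e. there is
    a chain of directly density-reachable steps leading from src to dst."""
    seen = {src}
    queue = deque([src])
    while queue:
        i = queue.popleft()
        nbrs = neighbors(xs, i, eps)
        if len(nbrs) < min_pts:
            continue  # only core objects can extend the chain
        for j in nbrs:
            if j == dst:
                return True
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False

# Hypothetical points on a line: A, P, M, Q at unit spacing.
xs = [-1.0, 0.0, 1.0, 2.0]
A, P, M, Q = range(4)

print(density_reachable(xs, P, Q, eps=1.0, min_pts=3))  # P -> M -> Q
print(density_reachable(xs, Q, P, eps=1.0, min_pts=3))  # Q is not core
```

With Eps = 1 and MinPts = 3, P and M are core objects while Q has only two points in its neighborhood, so the chain can run from P to Q but cannot start at Q.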
Core, border, and noise points
- DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.
- Density = number of points within a specified radius (Eps).
- A core point has at least a specified number of points (MinPts) within Eps. Core points lie in the interior of a cluster.
- A border point has fewer than MinPts points within Eps, but lies in the neighborhood of a core point.
- A noise point is any point that is neither a core point nor a border point.
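The three point types can be assigned mechanically once every Eps-neighborhood is known. The following is a rough sketch under stated assumptions: the function name `classify` and the sample coordinates are hypothetical, distance is Euclidean, and a point counts toward its own neighborhood.

```python
from math import dist  # Euclidean distance, Python 3.8+

def classify(points, eps, min_pts):
    """Label each point as 'core', 'border' or 'noise'."""
    # Eps-neighborhood of every point, as lists of indices (self included).
    nbrs = [[j for j, b in enumerate(points) if dist(a, b) <= eps]
            for a in points]
    core = {i for i, n in enumerate(nbrs) if len(n) >= min_pts}
    labels = []
    for i, n in enumerate(nbrs):
        if i in core:
            labels.append("core")
        elif any(j in core for j in n):
            labels.append("border")   # not dense itself, but near a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical sample: one dense centre, four points around it,
# one point that is close only to a border point, and one far outlier.
points = [(0.0, 0.0), (0.9, 0.0), (0.0, 0.9), (-0.9, 0.0), (0.0, -0.9),
          (1.7, 0.0), (5.0, 5.0)]

for pt, label in zip(points, classify(points, eps=1.0, min_pts=5)):
    print(pt, label)
```

Note that the point at (1.7, 0.0) lies within Eps of a border point but not of any core point, so it is classified as noise. Proximity to a border point is not enough.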
How does DBSCAN find clusters?
- DBSCAN searches for clusters by checking the Eps-neighborhood of each point in the database.
- If the Eps-neighborhood of a point p contains at least MinPts points, a new cluster with p as a core object is created.
- DBSCAN then iteratively collects directly density-reachable objects from these core objects, which may involve merging a few density-reachable clusters.
- The process terminates when no new point can be added to any cluster.
DBSCAN Algorithm
- Arbitrarily select a point p.
- Retrieve all points density-reachable from p with respect to Eps and MinPts.
- If p is a core point, a cluster is formed.
- If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
- Continue the process until all of the points have been processed.
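The steps above can be sketched in a few lines of Python. This is a simplified illustration rather than a reference implementation: the names `dbscan` and `region_query` and the sample data are assumptions, the neighborhood search is a brute-force scan (quadratic in the number of points), distance is Euclidean, and a point counts toward its own Eps-neighborhood. Points are visited in index order instead of being picked at random.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of point i (i itself included)."""
    return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)          # None = not yet processed
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region_query(points, i, eps)
        if len(nbrs) < min_pts:
            labels[i] = NOISE              # may turn out to be a border point later
            continue
        labels[i] = cluster_id             # i is a core point: start a new cluster
        seeds = list(nbrs)
        k = 0
        while k < len(seeds):              # collect everything density-reachable from i
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id     # previously "noise", now a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = region_query(points, j, eps)
            if len(j_nbrs) >= min_pts:     # only core points extend the cluster
                seeds.extend(j_nbrs)
        cluster_id += 1
    return labels

# Hypothetical sample: two small square groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))
```

On this sample the two squares become clusters 0 and 1, and the isolated point is labelled as noise. A border point that lies within Eps of core points from two different clusters is assigned to whichever cluster reaches it first, so the result can depend on the visiting order.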
DBSCAN Summary
- DBSCAN is a density-based clustering method based on connected regions with sufficiently high density.
- The algorithm grows regions with sufficiently high density into clusters and discovers clusters of arbitrary shape in spatial databases with noise.
- It defines a cluster as a maximal set of density-connected points. Density connectivity, rather than distance, is therefore the clustering criterion, unlike in hierarchical methods.
Summary
- Density-Based Clustering Methods
- Density-Based Clustering: Background
- Terminology
- How does DBSCAN find clusters?
- DBSCAN