12
Density-Based Clustering Methods

Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

Embed Size (px)

Citation preview

Page 1: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

Density-Based Clustering Methods

Page 2: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

• Clustering based on density (local cluster criterion), such as density-connected points

• Major features:– Discover clusters of arbitrary shape– Handle noise– One scan– Need density parameters as termination condition

• Several interesting studies:– DBSCAN: Ester, et al. (KDD’96)– OPTICS: Ankerst, et al (SIGMOD’99).

Page 3: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

Density-Based Clustering: Definitions

• Two parameters:

– Eps: Maximum radius of the neighbourhood

– MinPts: Minimum number of points in an Eps-neighbourhood of that point

• NEps(p): {q belongs to D | dist(p,q) <= Eps}

Page 4: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

•Directly density-reachable: A point p is directly density-reachable from a point q wrt. Eps, MinPts if

–1) p belongs to NEps(q)

–2) core point condition:

|NEps (q)| >= MinPts

pq

MinPts = 5

Eps = 1 cm

Page 5: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

Density-Based Clustering: Definitions

• Density-reachable:

– A point p is density-reachable from a point q wrt. Eps, MinPts if there is a chain of points p1, …, pn, p1 = q, pn = p such that pi+1 is directly density-reachable from pi

• Density-connected

– A point p is density-connected to a point q wrt. Eps, MinPts if there is a point o such that both, p and q are density-reachable from o wrt. Eps and MinPts.

p

qp1

p q

o

Page 6: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

Density Based Cluster: Definition• Relies on a density-based notion of cluster: A

cluster is defined as a maximal set of density-connected points

• A cluster C is a subset of D satisfying– For all p,q if p is in C, and q is density reachable from

p, then q is also in C – For all p,q in C: p is density connected to q

Core

Border

Outlier

Eps = 1cm

MinPts = 5

Page 7: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

• Lemma 1: If p is a core point, and O is the set of points density reachable from p, then O is a cluster

• Lemma 2: Let C be a cluster and p be any core point of C, then C equals the set of density reachable points from p

• Implication: Finding density reachable point of an arbitrary point generates a cluster. A cluster is unique determined by any of its core points

Page 8: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

DBSCAN: The Algorithm

– Arbitrary select a point p

– Retrieve all points density-reachable from p wrt Eps

and MinPts.

– If p is a core point, a cluster is formed.

– If p is a border point, no points are density-reachable

from p and DBSCAN visits the next point of the

database.

– Continue the process until all of the points have been

processed.

Page 9: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

• Selection of Eps and MinPts: Chosen corresponding to the thinnest cluster.

• Complexity: O(n*log n)

Page 10: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

OPTICS: A Cluster-Ordering Method

• OPTICS: Ordering Points To Identify the Clustering Structure– Ankerst, Breunig, Kriegel, and Sander (SIGMOD’99)– Produces a special order of the database wrt its

density-based clustering structure – This cluster-ordering contains info equiv to the

density-based clusterings corresponding to a broad range of parameter settings

– Good for both automatic and interactive cluster analysis, including finding intrinsic clustering structure

– Can be represented graphically or using visualization techniques

Page 11: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

OPTICS: Some Extension from DBSCAN

• Index-based: • k = number of dimensions • N = 20• p = 75%• M = N(1-p) = 5

– Complexity: O(kN2)• Core Distance

• Reachability Distance

D

p2

MinPts = 5

= 3 cm

Max (core-distance (o), d (o, p))

r(p1, o) = 2.8cm. r(p2,o) = 4cm

o

o

p1

Page 12: Density-Based Clustering Methods. Clustering based on density (local cluster criterion), such as density-connected points Major features: –Discover clusters

Reachability-distance

Cluster-order

of the objects

undefined