pritinaidu
8/8/2019 Dwd Seminar
http://slidepdf.com/reader/full/dwd-seminar 1/8
Density-based scan algorithm
• DBSCAN is a clustering algorithm that defines a cluster as a maximal set of density-connected points.
• The algorithm is explained in terms of a few definitions:
- k-neighborhood
- core objects
- density-reachable
- density-connected
• The neighborhood within a radius 'k' of a given object is called the k-neighborhood of the object. (DBSCAN descriptions usually write this radius as ε or Eps.)
• If the k-neighborhood of an object contains at least a minimum number, minpts, of objects, then the object is called a core object.
• In a given set of objects D, 'p' is directly density-reachable (ddr) from 'q' if 'p' is within the k-neighborhood of 'q' and 'q' is a core object.
• An object 'p' is density-reachable from object 'q' with respect to k and minpts in a set of objects D if there is a chain of objects $p_1, \dots, p_n$, where $p_1 = q$ and $p_n = p$, such that $p_{i+1}$ is ddr from $p_i$.
• An object 'p' is density-connected to object 'q' with respect to k and minpts in a set of objects D if there is an object 'o' in D such that both 'p' and 'q' are density-reachable from 'o' with respect to k and minpts.
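The first two definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `neighborhood` and `is_core`, the Euclidean distance, and the sample points are not from the slides.

```python
from math import dist  # Euclidean distance, Python 3.8+

def neighborhood(points, p, k):
    """Return every point within distance k of p (p itself included)."""
    return [q for q in points if dist(p, q) <= k]

def is_core(points, p, k, minpts):
    """p is a core object if its k-neighborhood holds at least minpts points."""
    return len(neighborhood(points, p, k)) >= minpts

# Hypothetical sample data: four nearby points and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.4, 0.4), (3.0, 3.0)]

print(neighborhood(points, (0.0, 0.0), k=1.0))       # the four nearby points
print(is_core(points, (0.0, 0.0), k=1.0, minpts=4))  # dense region
print(is_core(points, (3.0, 3.0), k=1.0, minpts=4))  # isolated point
```

Note that this sketch counts the object itself as a member of its own k-neighborhood, which is the usual convention.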
• DBSCAN does not require you to know the number of clusters in the data a priori, as opposed to k-means.
• DBSCAN has a notion of noise.
• DBSCAN can find arbitrarily shaped clusters. It can even find clusters completely surrounded by a different cluster.
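These properties can be seen in a compact sketch of the clustering loop. The slides give no pseudocode, so the function `dbscan`, the `NOISE` label, and the ring-plus-blob test data below are assumptions made for illustration, not the presenter's implementation.

```python
from math import cos, dist, pi, sin

NOISE = -1  # assumed label for points that belong to no cluster

def dbscan(points, k, minpts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= k]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue                      # already visited
        seeds = neighbors(i)
        if len(seeds) < minpts:
            labels[i] = NOISE             # may be relabelled as a border point later
            continue
        cluster += 1                      # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster       # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= minpts:         # only core points extend the cluster
                queue.extend(nj)
    return labels

# A ring of 40 points that completely surrounds a small blob, plus one outlier.
ring = [(5 * cos(2 * pi * t / 40), 5 * sin(2 * pi * t / 40)) for t in range(40)]
blob = [(0.0, 0.0)] + [(0.5 * cos(2 * pi * t / 8), 0.5 * sin(2 * pi * t / 8)) for t in range(8)]
points = ring + blob + [(20.0, 20.0)]

labels = dbscan(points, k=1.0, minpts=3)
print(labels)
```

The number of clusters is never passed in; it falls out of the data. The ring and the blob it encloses receive different labels, and the far-away point is marked as noise.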
• A DBSCAN clustering can only be as good as the distance measure used in the function getNeighbors(P, epsilon). The most common distance metric is the Euclidean distance. Especially for high-dimensional data, this metric can be rendered almost useless.
• DBSCAN does not respond well to data sets with varying densities (called hierarchical data sets).
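One way to see the high-dimensional problem is to measure how different the nearest and farthest points are from a query point. The sketch below assumes uniformly random data and a hypothetical helper `relative_contrast`; neither comes from the slides, and real data may behave differently.

```python
import random
from math import dist

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest Euclidean distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)

# The contrast typically shrinks as the dimension grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

When the contrast is small, almost every point lies at nearly the same distance from every other point, so a single radius threshold in getNeighbors separates neighbours from non-neighbours poorly.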
• Two parameters:
k: maximum radius of the neighbourhood
MinPts: minimum number of points in the k-neighbourhood of a point
• $N_k(p) = \{\, q \in D \mid \mathrm{dist}(p, q) \le k \,\}$
• Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. k, MinPts if
- p belongs to $N_k(q)$
- core point condition: $|N_k(q)| \ge \mathrm{MinPts}$
[Figure: points p and q, with MinPts = 5 and k = 1 cm]
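The two conditions translate directly into code, and a small example shows that the relation is not symmetric. The function name `directly_density_reachable` and the coordinates are assumptions for illustration; only k = 1 and MinPts = 5 are taken from the figure.

```python
from math import dist

def neighborhood(points, q, k):
    """N_k(q): all points within distance k of q, q included."""
    return [x for x in points if dist(q, x) <= k]

def directly_density_reachable(points, p, q, k, minpts):
    """True if p belongs to N_k(q) and q meets the core point condition."""
    n_q = neighborhood(points, q, k)
    return p in n_q and len(n_q) >= minpts

# q sits in a dense patch; p lies near the edge of q's neighbourhood.
q = (0.0, 0.0)
p = (0.9, 0.0)
points = [q, (-0.3, 0.0), (-0.2, 0.2), (-0.2, -0.2), (-0.15, 0.1), p]

print(directly_density_reachable(points, p, q, k=1.0, minpts=5))  # p from q
print(directly_density_reachable(points, q, p, k=1.0, minpts=5))  # q from p
```

Here p is directly density-reachable from q because q is a core point, but q is not directly density-reachable from p, because the neighbourhood of p holds only p and q.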
• Density-reachable:
A point p is density-reachable from a point q w.r.t. k, MinPts if there is a chain of points $p_1, \dots, p_n$, with $p_1 = q$ and $p_n = p$, such that $p_{i+1}$ is directly density-reachable from $p_i$.
• Density-connected:
A point p is density-connected to a point q w.r.t. k, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. k and MinPts.
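[Figure: a chain from q through p1 to p (density-reachable); points p and q both reached from o (density-connected)]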
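Both relations can be checked with a breadth-first search that only extends chains through core points. This is a minimal sketch: the names `density_reachable_set` and `density_connected` and the sample points are assumptions, and the search counts only chains with at least one directly-density-reachable step.

```python
from collections import deque
from math import dist

def neighborhood(points, q, k):
    return [x for x in points if dist(q, x) <= k]

def density_reachable_set(points, q, k, minpts):
    """All points density-reachable from q via at least one step."""
    reached = set()
    queue = deque([q])
    while queue:
        current = queue.popleft()
        n_cur = neighborhood(points, current, k)
        if len(n_cur) < minpts:
            continue                  # only core points can extend a chain
        for x in n_cur:
            if x not in reached:
                reached.add(x)
                queue.append(x)
    return reached

def density_connected(points, p, q, k, minpts):
    """True if some o exists from which both p and q are density-reachable."""
    return any(
        {p, q} <= density_reachable_set(points, o, k, minpts) for o in points
    )

# A row of five points (the two ends are border points) and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.0, 0.0), (10.0, 10.0)]
p, q = (0.0, 0.0), (2.0, 0.0)

print(q in density_reachable_set(points, p, k=0.6, minpts=3))
print(density_connected(points, p, q, k=0.6, minpts=3))
print(density_connected(points, p, (10.0, 10.0), k=0.6, minpts=3))
```

The two end points are not density-reachable from each other, since neither is a core point, yet they are density-connected through the core points between them. That is why clusters are defined through density-connectivity rather than density-reachability.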
Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points
Discovers clusters of arbitrary shape in spatial databases with noise
[Figure: core, border, and outlier points, with Eps = 1 cm and MinPts = 5]
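The three point types in the figure can be told apart with a short sketch. The function `classify` and the coordinates are hypothetical; Eps = 1 and MinPts = 5 follow the figure, and "border" is taken here to mean a non-core point that lies within Eps of some core point.

```python
from math import dist

def classify(points, eps, minpts):
    """Return a dict mapping each point to 'core', 'border', or 'outlier'."""
    def neighbors(p):
        return [q for q in points if dist(p, q) <= eps]

    core = {p for p in points if len(neighbors(p)) >= minpts}
    kinds = {}
    for p in points:
        if p in core:
            kinds[p] = "core"
        elif any(q in core for q in neighbors(p)):
            kinds[p] = "border"     # not dense itself, but next to a core point
        else:
            kinds[p] = "outlier"    # noise: no core point within eps
    return kinds

points = [
    (0.0, 0.0), (0.3, 0.0), (-0.3, 0.0), (0.0, 0.3), (0.0, -0.3),  # dense patch
    (1.2, 0.0),                                                     # on the edge
    (5.0, 5.0),                                                     # far away
]
kinds = classify(points, eps=1.0, minpts=5)
for point, kind in kinds.items():
    print(point, kind)
```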