Density-based scan algorithm

Dwd Seminar

Page 1: Dwd Seminar

8/8/2019 Dwd Seminar

http://slidepdf.com/reader/full/dwd-seminar 1/8

• DBSCAN is a clustering algorithm that defines a cluster as a maximal set of density-connected points.

• The algorithm is explained in terms of a few definitions:

- k-neighborhood

- core objects

- density reachable

- density connected

• The neighborhood within a radius k of a given object is called the k-neighborhood of the object.

• If the k-neighborhood of an object contains at least a minimum number, minpts, of objects, then the object is called a core object.

• In a given set of objects D, we say that p is directly density-reachable from q if p is within the k-neighborhood of q and q is a core object.

• An object p is density-reachable from an object q with respect to k and minpts in a set of objects D if there is a chain of objects p1, …, pn, where p1 = q and pn = p, such that pi+1 is directly density-reachable from pi.

• An object p is density-connected to an object q with respect to k and minpts in a set of objects D if there is an object o in D such that both p and q are density-reachable from o with respect to k and minpts.
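The five definitions above translate almost line by line into code. Below is a minimal, unoptimized Python sketch, assuming points are tuples, Euclidean distance (math.dist), and that an object counts toward its own k-neighborhood; the function names are hypothetical, not taken from the seminar or from any library.

```python
import math
from collections import deque

# Assumptions: D is a list of tuples, distance is Euclidean (math.dist),
# and an object counts toward its own k-neighborhood.

def k_neighborhood(D, p, k):
    """All objects of D within distance k of p (p itself included)."""
    return [q for q in D if math.dist(p, q) <= k]

def is_core(D, p, k, minpts):
    """p is a core object if its k-neighborhood holds at least minpts objects."""
    return len(k_neighborhood(D, p, k)) >= minpts

def directly_density_reachable(D, p, q, k, minpts):
    """p is directly density-reachable from q: p is in N_k(q) and q is core."""
    return is_core(D, q, k, minpts) and math.dist(p, q) <= k

def density_reachable(D, p, q, k, minpts):
    """True if a chain q = p1, ..., pn = p exists in which every step
    is a direct density-reachability step (breadth-first search from q)."""
    seen = {q}
    frontier = deque([q])
    while frontier:
        current = frontier.popleft()
        if current == p:
            return True  # the last object of the chain need not be core
        if not is_core(D, current, k, minpts):
            continue  # only core objects can extend the chain
        for nxt in k_neighborhood(D, current, k):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

def density_connected(D, p, q, k, minpts):
    """p and q are density-connected if some o in D reaches both."""
    return any(
        density_reachable(D, p, o, k, minpts) and density_reachable(D, q, o, k, minpts)
        for o in D
    )

# Hypothetical data: four points spaced 1 apart on a line, plus one far away.
D = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
k, minpts = 1.0, 3
print(is_core(D, (1, 0), k, minpts))                     # True
print(density_reachable(D, (3, 0), (1, 0), k, minpts))   # True
print(density_connected(D, (0, 0), (10, 0), k, minpts))  # False
```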

• Unlike k-means, DBSCAN does not require you to know the number of clusters in the data a priori.

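• DBSCAN has a notion of noise: points that are not density-reachable from any core object are labeled as noise instead of being forced into a cluster.

• DBSCAN can find arbitrarily shaped clusters. It can even find a cluster completely surrounded by (but not connected to) a different cluster.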
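A compact DBSCAN sketch makes these properties concrete: no cluster count is passed in, an isolated point ends up labeled as noise, and a ring-shaped cluster that surrounds another cluster is kept separate from it. This is an illustrative O(n²) Python version assuming Euclidean distance; the data set and the values k = 1.0, minpts = 3 are made up for the example.

```python
import math

NOISE = -1  # label for points that belong to no cluster

def dbscan(points, k, minpts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.
    Assumes Euclidean distance; a point counts toward its own neighborhood."""
    labels = [None] * len(points)  # None = not visited yet

    def neighbors(i):
        # Indices of all points within radius k of points[i], i included.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= k]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < minpts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point becomes border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= minpts:  # j is also core: keep expanding
                queue.extend(j_seeds)
        cluster += 1
    return labels

# Hypothetical data: a small blob, a ring of radius 5 around it, one far-away point.
inner = [(0.5 * x, 0.5 * y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
ring = [(5 * math.cos(2 * math.pi * i / 40), 5 * math.sin(2 * math.pi * i / 40))
        for i in range(40)]
points = inner + ring + [(20.0, 20.0)]

labels = dbscan(points, k=1.0, minpts=3)
print(sorted(set(labels)))  # [-1, 0, 1]: two clusters plus noise
```

The number of clusters (two) falls out of the density parameters rather than being supplied up front, and the blob stays separate from the ring that encloses it.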

• A DBSCAN clustering can only be as good as the distance measure used in its neighborhood query, getNeighbors(P, epsilon). The most common distance metric is the Euclidean distance. Especially for high-dimensional data, this metric can be rendered almost useless, because distances between points tend to become nearly uniform (the "curse of dimensionality").

• DBSCAN does not respond well to data sets with varying densities (called hierarchical data sets), because a single choice of k and minpts cannot suit every cluster.
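The high-dimensional problem can be observed by measuring the relative contrast (max - min) / min of the distances from one query point to a set of random points. As the dimension grows this ratio shrinks, so the nearest and farthest points look almost equally far away and a single radius k separates little. A sketch, assuming uniformly random data in the unit hypercube; relative_contrast is a hypothetical helper.

```python
import math
import random

def relative_contrast(dim, n=200, seed=0):
    """(max - min) / min of the Euclidean distances from one random query point
    to n random points, all drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    # Expected trend: the ratio shrinks as dim grows.
    print(dim, round(relative_contrast(dim), 3))
```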
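The varying-density problem can be reproduced with a small 1-D example: two dense groups separated by a small gap, plus one sparse group. The sketch below (hypothetical data, a compact O(n²) DBSCAN with Euclidean distance, minpts = 3) tries a small and a large radius k; neither finds all three groups.

```python
import math

NOISE = -1

def dbscan(points, k, minpts):
    """Compact DBSCAN: cluster id 0, 1, ... or NOISE (-1) for each point."""
    labels = [None] * len(points)

    def near(i):
        # Indices within radius k of points[i], i itself included.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= k]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = near(i)
        if len(seeds) < minpts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = near(j)
            if len(j_seeds) >= minpts:  # core point: keep expanding
                queue.extend(j_seeds)
        cluster += 1
    return labels

# Hypothetical 1-D data: two dense groups (spacing 0.1, gap 0.8), one sparse group (spacing 1.0).
dense_a = [(0.1 * i,) for i in range(11)]        # 0.0 .. 1.0
dense_b = [(1.8 + 0.1 * i,) for i in range(11)]  # 1.8 .. 2.8
sparse = [(10.0 + i,) for i in range(11)]        # 10.0 .. 20.0
points = dense_a + dense_b + sparse

def summarize(k, minpts=3):
    """Return (number of clusters, number of noise points) for radius k."""
    labels = dbscan(points, k, minpts)
    return len(set(labels) - {NOISE}), labels.count(NOISE)

for k in (0.15, 1.5):
    clusters, noise = summarize(k)
    print(f"k={k}: {clusters} clusters, {noise} noise points")
```

With k = 0.15 the two dense groups are separated but every sparse point is noise; with k = 1.5 the sparse group becomes a cluster but the two dense groups merge into one.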

• Two parameters:

- k: maximum radius of the neighborhood (often written Eps or ε)

- minpts: minimum number of points in the k-neighborhood of a point

• $N_k(p) = \{q \in D \mid \mathrm{dist}(p, q) \le k\}$

• Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. k, MinPts if

- $p \in N_k(q)$, and

- core point condition: $|N_k(q)| \ge \mathit{MinPts}$

[Figure: p directly density-reachable from core point q; MinPts = 5, k = 1 cm]
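The formula for N_k(q) and the core point condition can be checked on a tiny example using the slide's values k = 1 (cm) and MinPts = 5. Under the formula q counts toward its own neighborhood, since dist(q, q) = 0 ≤ k. The sketch makes the distance function a parameter, so swapping Euclidean for Manhattan distance shows how the metric alone can change whether q is a core point. The coordinates and helper names are hypothetical.

```python
import math

def n_k(D, p, k, dist=math.dist):
    """N_k(p) = {q in D | dist(p, q) <= k}; p itself is included when p is in D."""
    return [q for q in D if dist(p, q) <= k]

def manhattan(p, q):
    """Hypothetical alternative metric: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def satisfies_core_condition(D, q, k, minpts, dist=math.dist):
    """Core point condition: |N_k(q)| >= MinPts."""
    return len(n_k(D, q, k, dist)) >= minpts

# Hypothetical data (units: cm): q at the origin, four diagonal neighbors, one far point.
q = (0.0, 0.0)
D = [q, (0.6, 0.6), (-0.6, 0.6), (0.6, -0.6), (-0.6, -0.6), (3.0, 0.0)]
k, minpts = 1.0, 5  # the slide's k = 1 cm and MinPts = 5

print(len(n_k(D, q, k)))                                     # 5: q plus the four diagonal points
print(satisfies_core_condition(D, q, k, minpts))             # True (Euclidean)
print(satisfies_core_condition(D, q, k, minpts, manhattan))  # False (each diagonal is 1.2 away)
```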

• Density-reachable:

A point p is density-reachable from a point q w.r.t. k, MinPts if there is a chain of points p1, …, pn with p1 = q and pn = p such that pi+1 is directly density-reachable from pi.

• Density-connected:

A point p is density-connected to a point q w.r.t. k, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. k and MinPts.

[Figures: p density-reachable from q via p1; p and q density-connected through o]
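A search over core points can return the chain p1, …, pn itself as a witness of density-reachability. The sketch below (hypothetical helper names and data, Euclidean distance, k = 1, MinPts = 3) also shows that density-reachability is not symmetric: a border point is reachable from a core point but not the other way round, while two border points can still be density-connected through a core point o.

```python
import math
from collections import deque

# Assumptions: points are tuples, Euclidean distance, a point counts in its own neighborhood.

def neighbors(D, p, k):
    return [x for x in D if math.dist(p, x) <= k]

def is_core(D, p, k, minpts):
    return len(neighbors(D, p, k)) >= minpts

def reachability_chain(D, q, p, k, minpts):
    """Return a chain [p1 = q, ..., pn = p] in which each p(i+1) is directly
    density-reachable from p(i), or None if p is not density-reachable from q."""
    parent = {q: None}
    frontier = deque([q])
    while frontier:
        cur = frontier.popleft()
        if cur == p:
            chain = []
            while cur is not None:  # walk back from p to q
                chain.append(cur)
                cur = parent[cur]
            return chain[::-1]
        if not is_core(D, cur, k, minpts):
            continue  # a non-core point cannot extend the chain
        for nxt in neighbors(D, cur, k):
            if nxt not in parent:
                parent[nxt] = cur
                frontier.append(nxt)
    return None

def connecting_object(D, p, q, k, minpts):
    """Return some o in D from which both p and q are density-reachable, else None."""
    for o in D:
        if reachability_chain(D, o, p, k, minpts) and reachability_chain(D, o, q, k, minpts):
            return o
    return None

# Hypothetical data: (1, 0) and (2, 0) are core, (0, 0) and (3, 0) are border points.
D = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(reachability_chain(D, (1, 0), (3, 0), 1.0, 3))  # [(1, 0), (2, 0), (3, 0)]
print(reachability_chain(D, (3, 0), (1, 0), 1.0, 3))  # None: not symmetric
print(connecting_object(D, (0, 0), (3, 0), 1.0, 3))   # (1, 0)
```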

Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points.

Discovers clusters of arbitrary shape in spatial databases with noise.

[Figure: core, border, and outlier points; Eps = 1 cm, MinPts = 5]
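The three point types can be assigned directly from the definitions: a core point has at least MinPts points within Eps, a border point is not core but lies within Eps of some core point, and every other point is an outlier (noise). A minimal Python sketch with hypothetical coordinates, assuming Euclidean distance and that a point counts toward its own neighborhood.

```python
import math

def classify(points, eps, minpts):
    """Label each point 'core', 'border' or 'outlier'.
    Assumes Euclidean distance; a point counts toward its own Eps-neighborhood."""
    def neighborhood(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    hoods = [neighborhood(i) for i in range(len(points))]
    core = {i for i, h in enumerate(hoods) if len(h) >= minpts}
    labels = []
    for i, h in enumerate(hoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in h):
            labels.append("border")  # not core, but within Eps of a core point
        else:
            labels.append("outlier")  # no core point within Eps
    return labels

# Hypothetical data: a center with four arms at distance 0.9, a point beyond one arm, a far point.
points = [(0.0, 0.0), (0.9, 0.0), (-0.9, 0.0), (0.0, 0.9), (0.0, -0.9), (1.8, 0.0), (5.0, 5.0)]
print(classify(points, eps=1.0, minpts=5))
# ['core', 'border', 'border', 'border', 'border', 'outlier', 'outlier']
```

The point at (1.8, 0) is within Eps of a border point but of no core point, so it is an outlier rather than a border point.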