K-MEANS CLUSTERING
PRESENTED BY-
DEEPAK VERMA (14052140019)
AJAY (1405214003)
OVERVIEW
INTRODUCTION
TYPE OF CLUSTERING
IMPLEMENTATION DETAILS
FLOW CHART
ADVANTAGE AND DISADVANTAGE OF CLUSTERING
APPLICATIONS
REFERENCES
INTRODUCTION
A cluster is a group of similar objects.
Clustering refers to a method by which large sets of data are grouped into clusters of smaller sets of similar data.
Let us consider an example: balls of three different colors are sorted into three different groups.
The balls of the same color are clustered into one group, as shown in the slide's figure.
Types of Clustering:-
• Hard clustering
• Soft clustering
Types of Clusters
Exclusive Clustering:
Data is grouped in an exclusive way, so that if a datum belongs to one cluster, it cannot be included in another cluster. E.g. K-means.
Overlapping Clustering:
Overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership.
Hierarchical Clustering:
"A set of nested clusters organized as a hierarchical tree"
• Hierarchical methods produce a set of nested clusters in which each pair of objects or clusters is progressively nested in a larger cluster until only one cluster remains.
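The difference between exclusive and overlapping membership described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the membership formula is the one commonly used in fuzzy c-means with a fuzzifier m (an assumption, since the slides do not name a specific fuzzy method).

```python
import math

def hard_assignment(point, centroids):
    """Exclusive clustering: return the index of the single closest centroid."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def soft_membership(point, centroids, m=2.0):
    """Overlapping clustering: membership degrees in [0, 1] that sum to 1.

    Assumption: fuzzy c-means style weights, 1 / d^(2 / (m - 1)), normalised.
    """
    distances = [math.dist(point, c) for c in centroids]
    zeros = [d == 0.0 for d in distances]
    if any(zeros):
        # A point lying exactly on a centroid belongs fully to it
        # (shared equally if several centroids coincide with the point).
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    weights = [1.0 / d ** exponent for d in distances]
    total = sum(weights)
    return [w / total for w in weights]
```

With hypothetical centroids (0, 0) and (4, 0), the point (1, 0) is assigned only to the first cluster under hard assignment, while soft membership gives it a large degree in the first cluster and a small one in the second.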
Clustering Algorithms:-
A clustering algorithm attempts to find natural groups of components (or data) based on some measure of similarity.
A centroid-based clustering algorithm finds the centroid of each group of data points.
To determine cluster membership, most such algorithms evaluate the distance between a point and the cluster centroids.
RAW DATA -> CLUSTERING ALGORITHM -> CLUSTERS OF DATA
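The distance evaluation mentioned above is often the Euclidean distance, although the slides do not fix a particular metric. A minimal sketch, assuming Euclidean distance and using hypothetical helper names:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point (first one wins on ties)."""
    return min(range(len(centroids)),
               key=lambda i: euclidean_distance(point, centroids[i]))
```

For example, `euclidean_distance((0, 0), (3, 4))` evaluates to 5.0, and a point at (1, 1) is nearer to a centroid at (0, 0) than to one at (5, 5).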
K-Means Algorithm:-
It is a distance-based, partitional clustering algorithm.
"K" stands for the number of clusters; it is a user input to the algorithm.
It is an unsupervised algorithm.
Each cluster is associated with a centroid.
Each point is assigned to the cluster with the closest centroid.
The algorithm is iterative in nature.
Procedure:-
1) Select K points as the initial centroids.
2) Repeat:
3) Form K clusters by assigning all points to the closest centroid.
4) Re-compute the centroid of each cluster.
5) Until the centroids don't change.
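The procedure above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name, the `max_iterations` safety cap, the `seed` parameter, and the handling of empty clusters are all assumptions that the slides do not specify.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Return (centroids, clusters) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # 1) Select K points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    # 2) Repeat (capped at max_iterations as a safety net).
    for _ in range(max_iterations):
        # 3) Form K clusters by assigning every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # 4) Re-compute the centroid of each cluster as the mean of its points.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        # 5) Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters
```

Calling `k_means(points, 2)` on two well-separated groups of points returns one centroid near the mean of each group. Because the initial centroids are chosen at random, different seeds can lead to different results on less clearly separated data.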
FLOW CHART:
START
-> Number of clusters K
-> Centroid
-> Distance of objects to centroids
-> Grouping based on minimum distance
-> No object moves group? If so, END; otherwise return to the Centroid step.
(Each step below is illustrated in the slides by a scatter plot on X and Y axes showing the cluster centers k1, k2 and k3.)
STEP 1: Pick k = 3 initial cluster centers (randomly).
STEP 2: Assign each point to the closest cluster center.
STEP 3: Move each cluster center to the mean of its cluster.
STEP 4: Reassign the points that are now closest to a different cluster center.
Q: Which points are reassigned?
A: Three points.
STEP 5: Re-compute the cluster means.
STEP 6: Move the cluster centers to the cluster means.
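The question in step 4 (which points are reassigned?) can be answered programmatically by comparing labels before and after the centers move. The sketch below uses hypothetical helper names and made-up data, with two centers for brevity rather than the three shown in the slides.

```python
import math

def assign(points, centers):
    """Steps 2 and 4: label each point with the index of its closest center."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]

def update(points, labels, centers):
    """Steps 3, 5 and 6: move each center to the mean of its assigned points."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new_centers.append(tuple(sum(a) / len(members) for a in zip(*members)))
        else:
            new_centers.append(old)  # assumption: an empty cluster keeps its center
    return new_centers

def reassigned(old_labels, new_labels):
    """Indices of the points whose cluster changed between two assignments."""
    return [i for i, (a, b) in enumerate(zip(old_labels, new_labels)) if a != b]

# Made-up data for illustration only.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 1.0), (5.0, 1.0)]
centers = [(1.0, 1.0), (2.0, 1.0)]           # step 1: initial centers
labels = assign(points, centers)              # step 2: [0, 1, 1, 1]
centers = update(points, labels, centers)     # step 3: centers move to the means
new_labels = assign(points, centers)          # step 4: [0, 0, 1, 1]
moved = reassigned(labels, new_labels)        # [1]: only the second point changes cluster
centers = update(points, new_labels, centers) # steps 5 and 6
```

Repeating the assign and update calls until `reassigned` returns an empty list corresponds to the "no object moves group" exit condition in the flow chart.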
ADVANTAGES
1. With a large number of variables, K-Means is usually computationally faster than hierarchical clustering, provided k is kept small.
2. K-Means can produce tighter clusters than hierarchical clustering, especially if the clusters are globular.
DISADVANTAGES
1. It is difficult to predict the value of k.
2. It does not work well with clusters of different sizes and different densities.
Applications
• Wireless sensor networks
• City planning
• Search engines
• Email filtering
References:-
SlideShare: "Clustering Using K-Means Implementation" by Kartik Rao
Wikipedia: https://en.wikipedia.org/wiki/K-means_clustering
Saurabh Singh: https://www.youtube.com/watch?v=rjm4slbER_M