K-means clustering @jax

K-MEANS CLUSTERING

PRESENTED BY-

DEEPAK VERMA (14052140019), AJAY (1405214003)

OVERVIEW

INTRODUCTION

TYPE OF CLUSTERING

IMPLEMENTATION DETAILS

FLOW CHART

ADVANTAGE AND DISADVANTAGE OF CLUSTERING

APPLICATIONS

REFERENCES

INTRODUCTION

A cluster is a group of similar objects.

Clustering refers to a method by which large sets of data are grouped into clusters of smaller sets of similar data.

Let us consider an example: grouping balls of three different colors into three different groups.

The balls of the same color are clustered into one group.

Types of Clustering:-

• Hard clustering

• Soft clustering

Types of Clusters

Exclusive Clustering:

Data is grouped in an exclusive way, so that if a certain datum belongs to a definite cluster then it cannot be included in another cluster. E.g. K-means.

Overlapping Clustering:

Overlapping clustering uses fuzzy sets to cluster data, so that each point may belong to two or more clusters with different degrees of membership.
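
The difference between exclusive and overlapping membership can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the membership formula is the weighting used by fuzzy c-means with fuzzifier m, which is an assumption since the slides do not name a specific fuzzy method.

```python
import math

def hard_assignment(point, centers):
    """Exclusive clustering: index of the single nearest center."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

def soft_memberships(point, centers, m=2.0):
    """Overlapping clustering: one membership degree per center, summing to 1.

    Assumption: fuzzy c-means style weighting with fuzzifier m > 1.
    """
    distances = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if 0.0 in distances:
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    weights = [d ** -power for d in distances]  # closer centers weigh more
    total = sum(weights)
    return [w / total for w in weights]

centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
print(hard_assignment(point, centers))   # one cluster index
print(soft_memberships(point, centers))  # one degree per cluster
```

The hard version returns a single cluster index, while the soft version returns a degree for every cluster, so the same point can count partly toward several clusters.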

Hierarchical clustering:

“A set of nested clusters organized as a hierarchical tree”

• The hierarchical methods produce a set of nested clusters in which each pair of objects or clusters is progressively nested in a larger cluster until only one cluster remains.
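
A rough sketch of how nested clusters can be built bottom-up is given below. The slides do not say how the distance between two clusters is measured, so single linkage (distance between the closest pair of members) is an assumption, and the function name `agglomerate` is hypothetical.

```python
import math

def agglomerate(points):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns the merges in order. Each merge is a pair of clusters, and each
    cluster is a tuple of indices into points.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Assumption: single linkage, i.e. the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Score every pair of current clusters and pick the closest pair.
        pairs = [(linkage(a, b), a, b)
                 for idx, a in enumerate(clusters)
                 for b in clusters[idx + 1:]]
        _, a, b = min(pairs, key=lambda pair: pair[0])
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a + b)  # the merged cluster nests both children
        merges.append((a, b))
    return merges

# Hypothetical data for illustration only.
for left, right in agglomerate([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]):
    print(left, "+", right)
```

Reading the merges from first to last traces the hierarchical tree from the leaves up to the single remaining cluster.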

Clustering Algorithms:-

A clustering algorithm attempts to find natural groups of components (or data) based on some similarity.

The clustering algorithm finds the centroid of each group of data points.

Most algorithms evaluate the distance between a point and the cluster centroids.
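
The two building blocks mentioned above, the centroid of a group and the distance from a point to each centroid, might look like this in Python. Euclidean distance is assumed because the slides do not name a distance measure, and the function names are illustrative.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def nearest_centroid(point, centroids):
    """Index of the centroid with the smallest Euclidean distance to point."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

group = [(0, 0), (2, 0), (1, 3)]
print(centroid(group))
print(nearest_centroid((4, 4), [(0, 0), (5, 5)]))
```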

RAW DATA -> CLUSTERING ALGORITHM -> CLUSTERS OF DATA

K-Means Algorithm:-

It is a distance-based, partitional clustering algorithm.

“K” stands for the number of clusters; it is a user input to the algorithm.

It is an unsupervised algorithm.

Each cluster is associated with a centroid.

Each point is assigned to the cluster with the closest centroid.

This algorithm is iterative in nature.

Procedure:-

1) Select K points as the initial centroids.

2) Repeat:

3) Form K clusters by assigning all points to the closest centroid.

4) Re-compute the centroid of each cluster.

5) Until the centroids do not change.
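
The procedure above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the `max_iters` cap, the `seed` parameter, random choice of the initial points, Euclidean distance, and keeping the old centroid when a cluster ends up empty are all assumptions added here.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Return (centroids, clusters) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # 1) Select K points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iters):  # 2) Repeat
        # 3) Form K clusters by assigning every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # 4) Re-compute the centroid of each cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(c) / len(cluster) for c in zip(*cluster))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        # 5) Until the centroids do not change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
print(centroids)
print(clusters)
```

Because the starting centroids are picked at random, different seeds can lead to different final clusters on less clearly separated data.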

FLOW CHART:

START

Number of clusters K

Centroid

Distance of objects to centroids

Grouping based on minimum distance

No object moves group? If objects still move, return to Centroid; otherwise continue.

END

(Each step's figure plots the data on X and Y axes with cluster centers k1, k2 and k3.)

STEP 1: Pick k=3 initial cluster centers (randomly).

STEP 2: Assign each point to the closest cluster center.

STEP 3: Move each cluster center to the mean of each cluster.

STEP 4: Reassign points that are now closest to a different new cluster center.

Q: Which points are reassigned?

STEP 4 (continued): A: Three points (shown with animation).

STEP 5: Re-compute cluster means.

STEP 6: Move cluster centers to cluster means.
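
To make the question in STEP 4 concrete, the sketch below runs the first four steps once and lists the points whose closest center changes. The data and starting centers are hypothetical, not the points from the slides, and the helper names are made up for this illustration.

```python
import math

def assign(points, centers):
    """Label each point with the index of its closest cluster center."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]

def move_centers(points, labels, centers):
    """Move each center to the mean of its points; empty clusters stay put."""
    moved = []
    for i, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            moved.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            moved.append(old)
    return moved

# Hypothetical data and starting centers.
points = [(1, 1), (2, 1), (4, 3), (5, 4), (8, 8), (9, 9)]
centers = [(1, 1), (2, 1), (8, 8)]                # STEP 1: pick k=3 centers

labels = assign(points, centers)                  # STEP 2: assign points
centers = move_centers(points, labels, centers)   # STEP 3: move centers
new_labels = assign(points, centers)              # STEP 4: reassign points

reassigned = [p for p, old, new in zip(points, labels, new_labels) if old != new]
print(reassigned)
```

With this made-up data only the point (2, 1) changes cluster after the centers move. Repeating STEP 3 and STEP 4 until `reassigned` is empty gives the stopping condition from the flow chart.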

ADVANTAGES

1. With a large number of variables, K-Means is usually computationally faster than hierarchical clustering, provided K is kept small.

2. K-Means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.
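
One common way to put a number on how "tight" a set of clusters is uses the within-cluster sum of squares, the quantity that K-Means tries to reduce. The sketch below is illustrative: the function name and the sample data are assumptions, and every cluster is assumed to be non-empty.

```python
def within_cluster_ss(clusters):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        mean = [sum(c) / len(cluster) for c in zip(*cluster)]
        total += sum((x - m) ** 2 for p in cluster for x, m in zip(p, mean))
    return total

# Two hypothetical ways of splitting the same four points into two clusters.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
print(within_cluster_ss(tight), within_cluster_ss(loose))
```

A smaller value means the points sit closer to their cluster means, so the first grouping is the tighter of the two.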

DISADVANTAGES

1. It is difficult to predict the value of K.

2. It does not work well with clusters of different size and different density.

Applications

• Wireless sensor networks

• City planning

• Search engines

• Email filtering

References:-

SlideShare: “Clustering Using K-Means Implementation” by Kartik Rao

Wikipedia: https://en.wikipedia.org/wiki/K-means_clustering

Saurabh Singh: https://www.youtube.com/watch?v=rjm4slbER_M