Machine learning clustering

Mauritius JEDI Machine Learning

&Big Data

Clustering Algorithms

Nadeem Oozeer

Machine learning:

• Supervised vs Unsupervised.

– Supervised learning - the presence of the outcome variable is available to guide the learning process.

• there must be a training data set in which the solution is already known.

– Unsupervised learning - the outcomes are unknown.

• cluster the data to reveal meaningful partitions and hierarchies

Clustering:

• Clustering is the task of gathering samples into groups of similar samples according to some predefined similarity or dissimilarity measure

sample Cluster/group

• In this case clustering is carried out using the Euclidean distance as a measure.

Clustering:

• What is clustering good for

– Market segmentation - group customers into different market segments

– Social network analysis - Facebook "smartlists"

– Organizing computer clusters and data centers for network layout and location

– Astronomical data analysis - Understanding galaxy formation

Galaxy Clustering:

• Multi-wavelength data obtained for galaxy clusters– Aim: determine robust criteria for the inclusion of a galaxy into

a cluster galaxy– Note: physical parameters of the galaxy cluster can be heavily

influenced by wrong candidate

Credit:HST

Clustering Algorithms :

• Hierarchy methods

– statistical method used to build a cluster by arranging elements at various levels

Dendogram:

• Each level will then represent a possible cluster.

• The height of the dendrogram shows the level of similarity that any two clusters are joined

• The closer to the bottom they are the more similar the clusters are

• Finding of groups from a dendrogram is not simple and is very often subjective

• Partitioning methods

– make an initial division of the database and then use an iterative strategy to further divide it into sections

– here each object belongs to exactly one cluster

Credit:Legodi, 2014

K-means:

K-means algorithm:

1. Given n objects, initialize k cluster centers

2. Assign each object to its closest cluster centre

3. Update the center for each cluster

4. Repeat 2 and 3 until no change in each cluster center

• Experiment: Pack of cards, dominoes

• Apply the K-means algorithm to the Shapley data– Change the number of potential cluster and find how the

clustering differ

K Nearest Neighbors (k-NN):

• One of the simplest of all machine learning classifiers

• Differs from other machine learning techniques, in that it doesn't produce a model.

• It does however require a distance measure and the selection of K.

• First the K nearest training data points to the new observation are investigated.

• These K points determine the class of the new observation.

• Simple idea: label a new point the same as the closest known point

Label it red.

1-NN Aspects of anInstance-Based Learner

1. A distance metric– Euclidian

2. How many nearby neighbors to look at?– One

3. A weighting function (optional)– Unused

4. How to fit with the local points?– Just predict the same output as the nearest

neighbor.

• Generalizes 1-NN to smooth away noise in the labels

• A new point is now assigned the most frequent label of its knearest neighbors

Label it red, when k = 3

Label it blue, when k = 7

Machine learning clustering

Documents

Clustering: Similarity-Based Clustering · Clustering: Similarity-Based Clustering CS4780/5780 – Machine Learning Fall 2013 Thorsten Joachims Cornell University Reading: Manning/Raghavan/Schuetze,

10701 Machine Learning Clustering - cs.cmu.eduepxing/Class/10701/slides/clustering.pdf · Clustering 10701 Machine Learning • Organizing data into clusters ... • Clustering methods

machine learning - Clustering in R

28 Machine Learning Unsupervised Hierarchical Clustering

Machine Learning with MapReduce. K-Means Clustering 3

Machine Learning with Weka: Filters and Clustering...Machine Learning with Weka: Filters and Clustering PU Tools, Ressourcen, Infrastruktur Nils Reiter, nils.reiter@uni-koeln.de December

Machine Learning Problems Unsupervised Learning – Clustering – Density estimation – Dimensionality Reduction Supervised Learning – Classification – Regression

Chapter 9 Classification and Clustering. Classification and Clustering n Classification/clustering are classical pattern recognition/ machine learning

Machine learning hands on clustering

Machine Learning with MATLAB · 2 Agenda Machine Learning Overview Machine Learning with MATLAB –Unsupervised Learning Clustering –Supervised Learning Classification Regression

Artificial Intelligence for Cybersecurity · Artificial Intelligence and Machine Learning. Machine Learning. Unsupervised Learning. Clustering. Clustering (2) •Can be aggregative

Hierarchical Clustering Ke Chen COMP24111 Machine Learning

Machine Learning Queens College Lecture 7: Clustering

Machine Learning for Malware Classification and Clustering

Machine Learning - K-means Clustering

Wireless Sensor Network Clustering with Machine Learning

Clustering UNSUPERVISED LEARNING INTRODUCTION Machine Learning

Machine Learning : Clustering, Self-Organizing Mapsds24/lehre/ml_ws_2013/ml_09_clust.pdf · 12/12/2013 Machine Learning : Clustering, Self-Organizing Maps 11 SOM-s (usually) consist

Andrew Rosenberg- Lecture 11: Clustering Introduction and Projects Machine Learning

Apache Mahout. Mahout Introduction Machine Learning Clustering K-means Canopy Clustering Fuzzy K-Means Conclusion