
Clustering algorithms
Machine Learning

Hamid Beigy

Sharif University of Technology

Fall 1394


Table of contents

1 Supervised vs. Unsupervised Learning

2 Hierarchical Clustering

3 Non-Hierarchical Clustering


Supervised vs. Unsupervised Learning

Supervised Learning

Classification: partition examples into groups according to pre-defined categories.
Regression: assign a value to each feature vector.
Requires labeled data for training.

Unsupervised Learning

Clustering: partition examples into groups when no pre-defined categories/classes are available.
Novelty detection: find new classes in the data.
Concept drift detection: find changes in the distribution of the data.
Outlier detection: find unusual events (e.g. hackers).
Only instances are required, no labels.


Why do Unsupervised Learning?

Cost

Raw data is cheap.
Labeled data is expensive.

Save memory/computation

Reduce noise in high-dimensional data

Useful in exploratory data analysis

Often pre-processing step for supervised learning


Clustering

Partition unlabeled examples into disjoint subsets (clusters), such that:

Examples within a cluster are similar.
Examples in different clusters are different.

Discover new categories in an unsupervised manner

(no sample category labels provided)


Similarity Measures

Euclidean distance ($L_2$ norm):

$L_2(\vec{x}, \vec{x}\,') = \sqrt{\sum_{i=1}^{N} (x_i - x'_i)^2}$

$L_1$ norm:

$L_1(\vec{x}, \vec{x}\,') = \sum_{i=1}^{N} |x_i - x'_i|$

Cosine similarity:

$\cos(\vec{x}, \vec{x}\,') = \dfrac{\vec{x} \cdot \vec{x}\,'}{\|\vec{x}\|\,\|\vec{x}\,'\|}$

Kernels
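
As a quick illustration (a sketch of my own, not from the slides), each of these measures is a few lines of NumPy:

```python
import numpy as np

def l2_distance(x, y):
    # Euclidean (L2) distance: square root of the sum of squared differences
    return np.sqrt(np.sum((x - y) ** 2))

def l1_distance(x, y):
    # L1 (Manhattan) distance: sum of absolute differences
    return np.sum(np.abs(x - y))

def cosine_similarity(x, y):
    # Cosine of the angle between x and y (1 = same direction, 0 = orthogonal)
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 0.0, 2.0])
y = np.array([0.0, 1.0, 2.0])
print(l2_distance(x, y), l1_distance(x, y), cosine_similarity(x, y))
```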


Applications of Clustering

Cluster retrieved documents

to present more organized and understandable results to the user → "diversified retrieval"

Detecting near duplicates
Entity resolution

E.g. "Thorsten Joachims" == "Thorsten B Joachims"

Exploratory data analysis

Automated (or semi-automated) creation of taxonomies

E.g. Yahoo, DMOZ

Comparison


Applications of Clustering

Text Mining


Applications of Clustering

Image Segmentation

http://people.cs.uchicago.edu/~pff/segment


Clustering Example

[Figures: a worked clustering example, developed step by step across several slides]

Categories of Clustering Algorithms

Hierarchical clustering

E.g. Linkage clustering

Centroid-based clustering

E.g. K-means clustering

Distribution-based clustering

E.g. EM algorithm

Density-based clustering

E.g. DBSCAN

Other methods


Hierarchical Clustering

Organize the clusters in a hierarchical way

Produces a rooted tree (dendrogram):

Animal
├── Vertebrate: Fish, Reptile, Amphibian, Mammal
└── Invertebrate: Worm, Insect, Crustacean

Recursive application of a standard clustering algorithm can produce a hierarchical clustering.


Agglomerative vs. Divisive Clustering

Agglomerative (bottom-up) methods start with each example in its own cluster and iteratively combine clusters to form larger and larger ones.

Divisive (top-down) methods recursively split the set of all examples into smaller and smaller clusters.


Agglomerative (bottom-up)

Assumes a similarity function for determining the similarity of two clusters.

Starts with each instance in a separate cluster and then repeatedly joins the two clusters that are most similar, until there is only one cluster.

The history of merging forms a binary tree, or hierarchy.

Basic algorithm:

Start with all instances in their own cluster.
Until there is only one cluster:
  Among the current clusters, determine the two clusters $c_i$ and $c_j$ that are most similar.
  Replace $c_i$ and $c_j$ with the single cluster $c_i \cup c_j$.
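
In practice this loop is available off the shelf; a minimal sketch using SciPy (the toy data below is invented):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D data: two well-separated groups
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])

# linkage() performs exactly the agglomerative loop above: start with
# singletons and repeatedly merge the two closest clusters.
# Z records the merge history (the binary tree / dendrogram).
Z = linkage(X, method='single')      # 'complete' and 'average' also work

# Cut the tree to obtain a flat clustering with two clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)                        # e.g. [1 1 1 2 2 2]
```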


Cluster Similarity

How can we compute the similarity of two clusters, each possibly containing multiple instances?

Single Linkage: similarity of the two most similar members.
Complete Linkage: similarity of the two least similar members.
Group Average: average similarity between members.


Single-Link (bottom-up)

$\mathrm{sim}(c_i, c_j) = \max_{x \in c_i,\ y \in c_j} \mathrm{sim}(x, y)$

(With a distance measure in place of the similarity, the max becomes a min.)


Complete-Link (bottom-up)

$\mathrm{sim}(c_i, c_j) = \min_{x \in c_i,\ y \in c_j} \mathrm{sim}(x, y)$


Computational Complexity of HAC

In the first iteration, all HAC methods need to compute the similarity of all pairs of the $n$ individual instances, which is $O(n^2)$.

In each of the subsequent $O(n)$ merging iterations, we must find the closest pair of clusters; maintaining a heap makes the overall cost $O(n^2 \log n)$.

In each of the subsequent $O(n)$ merging iterations, we must also compute the similarity between the most recently created cluster and all other existing clusters. Can this be done in constant time per cluster, so that the overall cost stays $O(n^2 \log n)$?


Computing Cluster Similarity

After merging $c_i$ and $c_j$, the similarity of the resulting cluster to any other cluster $c_k$ can be computed by:

Single Linkage:

$\mathrm{sim}((c_i \cup c_j), c_k) = \max(\mathrm{sim}(c_i, c_k),\ \mathrm{sim}(c_j, c_k))$

Complete Linkage:

$\mathrm{sim}((c_i \cup c_j), c_k) = \min(\mathrm{sim}(c_i, c_k),\ \mathrm{sim}(c_j, c_k))$
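
This is exactly the constant-time update asked about on the previous slide. A minimal sketch over a similarity matrix (the function and its conventions are mine, not from the slides):

```python
import numpy as np

def update_after_merge(sim, i, j, linkage='single'):
    # The merged cluster c_i ∪ c_j reuses row/column i; each entry is
    # recomputed in O(1) from the old rows i and j, per the rules above.
    if linkage == 'single':
        sim[i, :] = np.maximum(sim[i, :], sim[j, :])
    else:  # 'complete'
        sim[i, :] = np.minimum(sim[i, :], sim[j, :])
    sim[:, i] = sim[i, :]
    # Invalidate row/column j so cluster j is never picked again.
    sim[j, :] = -np.inf
    sim[:, j] = -np.inf
    return sim
```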


Group Average Bottom-up Clustering

Use the average similarity across all pairs within the merged cluster to measure the similarity of two clusters.

$\mathrm{sim}(c_i, c_j) = \dfrac{1}{|c_i \cup c_j|\,(|c_i \cup c_j| - 1)} \sum_{\vec{x} \in c_i \cup c_j}\ \sum_{\substack{\vec{y} \in c_i \cup c_j \\ \vec{y} \neq \vec{x}}} \mathrm{sim}(\vec{x}, \vec{y})$

Compromise between single and complete link.
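
For a sanity check, the formula can be evaluated directly in quadratic time; a small sketch with invented points and cosine similarity as the pairwise measure:

```python
import numpy as np

def group_average_sim(ci, cj, sim):
    # Average pairwise similarity over all ordered pairs of distinct
    # points in the merged cluster ci ∪ cj.
    merged = ci + cj
    m = len(merged)
    total = sum(sim(x, y) for x in merged for y in merged if y is not x)
    return total / (m * (m - 1))

cos = lambda x, y: x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
ci = [np.array([1.0, 0.0]), np.array([1.0, 0.2])]
cj = [np.array([0.8, 0.1])]
print(group_average_sim(ci, cj, cos))
```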


Non-Hierarchical Clustering

K-means clustering ("hard" assignments)

Mixtures of Gaussians trained via the Expectation-Maximization (EM) algorithm ("soft" assignments)


Clustering Criterion

An evaluation function that assigns a (usually real-valued) score to a clustering. The clustering criterion is typically a function of:

within-cluster similarity and
between-cluster dissimilarity

Optimization: find the clustering that maximizes the criterion.

Global optimization (often intractable)
Greedy search
Approximation algorithms


Centroid-Based Clustering

Assumes instances are real-valued vectors.

Clusters are represented by centroids, i.e. the average of the points in cluster $c$:

$\vec{\mu}(c) = \dfrac{1}{|c|} \sum_{\vec{x} \in c} \vec{x}$

Reassignment of instances to clusters is based on distance to the current cluster centroids.


K-Means Algorithm

Input: $k$ (the number of clusters) and a distance measure $d$.

Select $k$ random instances $s_1, s_2, \dots, s_k$ as seeds.

Until the clustering converges (or another stopping criterion holds):

For each instance $x_i$: assign $x_i$ to the cluster $c_j$ such that $d(x_i, s_j)$ is minimal.

For each cluster $c_j$: update its centroid, $s_j = \vec{\mu}(c_j)$.
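
A compact NumPy sketch of exactly these two alternating steps (the convergence test on centroid movement and the empty-cluster guard are my choices, not prescribed by the slide):

```python
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Select k random instances as the initial seeds.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (keep the old seed if a cluster happens to go empty).
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.linalg.norm(new_centroids - centroids) < tol:   # converged
            break
        centroids = new_centroids
    return labels, centroids
```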


Page 30: Machine Learning Hamid Beigyce.sharif.edu/courses/94-95/1/ce717-1/resources/... · Table of contents 1 Supservised vs. Unsupervised Learning 2 Hierarchical Clustering 3 Non-Hierarchical

Time Complexity

Assume computing the distance between two instances is $O(D)$, where $D$ is the dimensionality of the vectors.

Reassigning clusters for $N$ points: $O(kN)$ distance computations, i.e. $O(kDN)$.

Computing centroids: each instance gets added once to some centroid: $O(DN)$.

Assume these two steps are each done once in each of $i$ iterations: $O(ikDN)$.

This is linear in all relevant factors, so for a fixed number of iterations, K-means is more efficient than HAC.


Seed Choice

Problem

Results can vary based on random seed selection, especially for high-dimensional data.

Some seeds can result in a poor convergence rate, or in convergence to sub-optimal clusterings.

Idea: Combine HAC and K-means clustering.

First, randomly take a sample of instances of size $\sqrt{n}$.

Run group-average HAC on this sample.

Use the results of HAC as the initial seeds for K-means.

The overall algorithm is efficient and avoids the problems of bad seed selection.
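
A sketch of this combination, assuming scikit-learn is available (this HAC-seeded K-means scheme is known in the IR literature as the Buckshot algorithm; the parameter choices below are illustrative):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def hac_seeded_kmeans(X, k, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Randomly sample ~sqrt(n) instances (at least k of them).
    n = len(X)
    sample = X[rng.choice(n, size=max(k, int(np.sqrt(n))), replace=False)]
    # 2. Run group-average HAC on the sample.
    hac = AgglomerativeClustering(n_clusters=k, linkage='average').fit(sample)
    # 3. Use the HAC cluster centroids as the initial K-means seeds.
    seeds = np.array([sample[hac.labels_ == j].mean(axis=0) for j in range(k)])
    return KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
```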


Gaussian Mixtures and EM

Gaussian Mixture Models

Assume each point is generated by first choosing a component $Y \in \{1, \dots, k\}$ with probability $P(Y = j)$ and then drawing $\vec{x}$ from the Gaussian $\mathcal{N}(\vec{\mu}_j, \Sigma)$.

EM Algorithm

Assume $P(Y)$ and $k$ are known, and $\Sigma = I$.

REPEAT:

M-step (re-estimate the means):

$\vec{\mu}_j = \dfrac{\sum_{i=1}^{n} P(Y=j \mid X=\vec{x}_i, \vec{\mu}_1, \dots, \vec{\mu}_k)\ \vec{x}_i}{\sum_{i=1}^{n} P(Y=j \mid X=\vec{x}_i, \vec{\mu}_1, \dots, \vec{\mu}_k)}$

E-step (re-estimate the responsibilities):

$P(Y=j \mid X=\vec{x}_i, \vec{\mu}_1, \dots, \vec{\mu}_k) = \dfrac{P(X=\vec{x}_i \mid Y=j, \vec{\mu}_j)\ P(Y=j)}{\sum_{l=1}^{k} P(X=\vec{x}_i \mid Y=l, \vec{\mu}_l)\ P(Y=l)} = \dfrac{e^{-\frac{1}{2}\|\vec{x}_i - \vec{\mu}_j\|^2}\ P(Y=j)}{\sum_{l=1}^{k} e^{-\frac{1}{2}\|\vec{x}_i - \vec{\mu}_l\|^2}\ P(Y=l)}$
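
A minimal NumPy sketch of this loop under the same assumptions (identity covariance, fixed $P(Y)$); the toy data and initialization are invented:

```python
import numpy as np

def em_means(X, mu, prior, n_iters=50):
    for _ in range(n_iters):
        # E-step: responsibilities r[i, j] = P(Y = j | x_i, mu_1..mu_k)
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-0.5 * sq) * prior          # Gaussian likelihood times prior
        r = w / w.sum(axis=1, keepdims=True)
        # M-step: each mean is the responsibility-weighted average of the data
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu, r

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
mu0 = X[rng.choice(len(X), size=2, replace=False)]
mu, r = em_means(X, mu0, prior=np.array([0.5, 0.5]))
print(mu)    # should approach the two true component means
```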
