9
Unsupervised Unsupervised Learning: Learning: Clustering Clustering Some material adapted from slides by Andrew Some material adapted from slides by Andrew Moore, CMU. Moore, CMU. Visit Visit http://www.autonlab.org/tutorials/ for for Andrew Andrew s repository of Data Mining s repository of Data Mining tutorials. tutorials.

Unsupervised Learning: Clustering

Embed Size (px)

DESCRIPTION

Unsupervised Learning: Clustering. Some material adapted from slides by Andrew Moore, CMU. Visit http://www.autonlab.org/tutorials/ for Andrew ’ s repository of Data Mining tutorials. Unsupervised Learning. Supervised learning used labeled data pairs (x, y) to learn a function f : X→Y. - PowerPoint PPT Presentation

Citation preview

Page 1: Unsupervised Learning: Clustering

UnsuperviseUnsupervised Learning: d Learning: ClusteringClusteringSome material adapted from slides by Andrew Some material adapted from slides by Andrew

Moore, CMU.Moore, CMU.

Visit Visit http://www.autonlab.org/tutorials/ for forAndrewAndrew’’s repository of Data Mining tutorials.s repository of Data Mining tutorials.

Page 2: Unsupervised Learning: Clustering

Unsupervised LearningUnsupervised Learning Supervised learning used labeled data pairs Supervised learning used labeled data pairs

(x, y) to learn a function f : X→Y.(x, y) to learn a function f : X→Y. But, what if we donBut, what if we don’’t have labels?t have labels?

No labels = No labels = unsupervised learningunsupervised learning Only some points are labeled = Only some points are labeled = semi-semi-

supervised learningsupervised learning Labels may be expensive to obtain, so we only get Labels may be expensive to obtain, so we only get

a few.a few.

ClusteringClustering is the unsupervised grouping of is the unsupervised grouping of data points. It can be used for data points. It can be used for knowledge knowledge discoverydiscovery..

Page 3: Unsupervised Learning: Clustering

Clustering DataClustering Data

Page 4: Unsupervised Learning: Clustering

K-Means ClusteringK-Means Clustering

K-Means ( k , data )• Randomly choose k

cluster center locations (centroids).

• Loop until convergence

• Assign each point to the cluster of the closest centroid.

• Reestimate the cluster centroids based on the data assigned to each.

Page 5: Unsupervised Learning: Clustering

K-Means ClusteringK-Means Clustering

K-Means ( k , data )• Randomly choose k

cluster center locations (centroids).

• Loop until convergence

• Assign each point to the cluster of the closest centroid.

• Reestimate the cluster centroids based on the data assigned to each.

Page 6: Unsupervised Learning: Clustering

K-Means ClusteringK-Means Clustering

K-Means ( k , data )• Randomly choose k

cluster center locations (centroids).

• Loop until convergence

• Assign each point to the cluster of the closest centroid.

• Reestimate the cluster centroids based on the data assigned to each.

Page 7: Unsupervised Learning: Clustering

K-Means AnimationK-Means Animation

Example generated by Andrew Moore using Dan Pelleg’s super-duper fast K-means system:

Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning.Proc. Conference onKnowledge Discovery in Databases 1999.

Page 8: Unsupervised Learning: Clustering

Problems with K-MeansProblems with K-Means VeryVery sensitive to the initial points. sensitive to the initial points.

Do many runs of k-Means, each with Do many runs of k-Means, each with different initial centroids.different initial centroids.

Seed the centroids using a better Seed the centroids using a better method than random. (e.g. Farthest-method than random. (e.g. Farthest-first sampling)first sampling)

Must manually choose k.Must manually choose k. Learn the optimal k for the clustering. Learn the optimal k for the clustering.

(Note that this requires a performance (Note that this requires a performance measure.)measure.)

Page 9: Unsupervised Learning: Clustering

Problems with K-MeansProblems with K-Means How do you tell it which clustering you How do you tell it which clustering you

want?want?

Constrained clustering techniquesConstrained clustering techniques

Same-cluster constraint(must-link)

Different-cluster constraint(cannot-link)