Artificial Intelligence (Kunstmatige Intelligentie) / RuG
KI2 - 7
Clustering Algorithms
Johan Everts
What is Clustering?
Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)
The Goals of Clustering
Determine the intrinsic grouping in a set of unlabeled data.
What constitutes a good clustering? All clustering algorithms will produce clusters, regardless of whether the data actually contains them.
There is no gold standard; it depends on the goal: data reduction, “natural” clusters, “useful” clusters, outlier detection.
Stages in clustering
Taxonomy of Clustering Approaches
Hierarchical Clustering
Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.
Single link
Agglomerative Clustering
In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.
Complete link
Agglomerative Clustering
In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
Example – Single Link AC
BA FI MI NA RM TO
BA 0 662 877 255 412 996
FI 662 0 295 468 268 400
MI 877 295 0 754 564 138
NA 255 468 754 0 219 869
RM 412 268 564 219 0 669
TO 996 400 138 869 669 0
Example – Single Link AC
BA FI MI/TO NA RM
BA 0 662 877 255 412
FI 662 0 295 468 268
MI/TO 877 295 0 754 564
NA 255 468 754 0 219
RM 412 268 564 219 0
Example – Single Link AC
BA FI MI/TO NA/RM
BA 0 662 877 255
FI 662 0 295 268
MI/TO 877 295 0 564
NA/RM 255 268 564 0
Example – Single Link AC
BA/NA/RM FI MI/TO
BA/NA/RM 0 268 564
FI 268 0 295
MI/TO 564 295 0
Example – Single Link AC
BA/FI/NA/RM MI/TO
BA/FI/NA/RM 0 295
MI/TO 295 0
Example – Single Link AC
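The merge sequence in the tables above can be replayed with a short single-link agglomerative sketch (illustrative code, not from the slides):

```python
CITIES = ["BA", "FI", "MI", "NA", "RM", "TO"]
DIST = [  # the distance matrix from the first table
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]

def single_link(c1, c2):
    """Single-link distance: the distance of the two closest members."""
    return min(DIST[i][j] for i in c1 for j in c2)

clusters = [{i} for i in range(len(CITIES))]
merge_dists = []
while len(clusters) > 1:
    # Merge the pair of clusters with the smallest single-link distance.
    a, b = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda p: single_link(clusters[p[0]], clusters[p[1]]),
    )
    d = single_link(clusters[a], clusters[b])
    merged = clusters[a] | clusters[b]
    merge_dists.append(d)
    print("/".join(CITIES[i] for i in sorted(merged)), "merged at distance", d)
    clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
```

This reproduces the merges at distances 138 (MI/TO), 219 (NA/RM), 255, 268, and 295, matching the tables.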
Taxonomy of Clustering Approaches
Squared error
K-Means
Step 0: Start with a random partition into K clusters
Step 1: Generate a new partition by assigning each pattern to its closest cluster center
Step 2: Compute new cluster centers as the centroids of the clusters.
Step 3: Repeat Steps 1 and 2 until the membership no longer changes (the cluster centers then remain the same as well).
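The steps above can be sketched as follows (a minimal illustration with toy 2-D data; the code and initialization choice are not from the slides):

```python
import random

def kmeans(points, k, seed=0):
    """Plain K-means on 2-D points; returns (centers, assignment)."""
    rng = random.Random(seed)
    # Step 0: random initial partition, here via k distinct points as centers.
    centers = list(rng.sample(points, k))
    assignment = None
    while True:
        # Step 1: assign each pattern to its closest cluster center.
        new_assignment = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Step 3: stop when the membership no longer changes.
        if new_assignment == assignment:
            return centers, assignment
        assignment = new_assignment
        # Step 2: recompute each center as the centroid of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster emptied out
                centers[c] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))

# Two well-separated blobs; K-means should recover them.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, k=2)
```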
K-Means
K-Means – How many K’s ?
Locating the ‘knee’
The knee of a curve is defined as the point of maximum curvature.
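One simple way to locate the knee, approximating maximum curvature by the largest discrete second difference of the error-vs-K curve (an illustrative sketch; the error values are made up):

```python
def knee(errors):
    """Index of the interior point with the largest second difference."""
    second_diff = [errors[i - 1] - 2 * errors[i] + errors[i + 1]
                   for i in range(1, len(errors) - 1)]
    return 1 + max(range(len(second_diff)), key=second_diff.__getitem__)

# Hypothetical squared error for K = 1..6, with a sharp bend at K = 3.
sse = [100.0, 60.0, 20.0, 15.0, 12.0, 10.0]
print("knee at K =", knee(sse) + 1)  # indices are 0-based, K starts at 1
```

This prints `knee at K = 3`: the error drops steeply up to K = 3 and flattens afterwards.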
Leader-Follower
Online algorithm: specify a threshold distance.
Find the closest cluster center.
Distance above the threshold? Create a new cluster.
Otherwise, add the instance to that cluster and update the cluster center.
(Figure walkthrough: when the distance < threshold the instance joins the existing cluster; when the distance > threshold a new cluster is created.)
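The leader-follower procedure above can be sketched as follows (illustrative code, not from the slides; the learning rate and 1-D data are toy assumptions):

```python
def leader_follower(stream, threshold, lr=0.5):
    """Cluster a stream of 1-D values online; return the cluster centers."""
    centers = []
    for x in stream:
        if centers:
            # Find the closest existing cluster center.
            j = min(range(len(centers)), key=lambda i: abs(centers[i] - x))
            if abs(centers[j] - x) <= threshold:
                # Distance below threshold: add the instance to the cluster
                # and move its center toward the instance.
                centers[j] += lr * (x - centers[j])
                continue
        # Distance above threshold (or no clusters yet): create a new cluster.
        centers.append(x)
    return centers

# Two well-separated groups in the stream yield two clusters.
print(leader_follower([0.0, 0.2, 10.0, 10.4, 0.1], threshold=1.0))
```

Unlike K-means, the result depends on the order of the stream, which is one source of the instability noted later in the performance analysis.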
Kohonen SOM’s
The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm: a compromise between biological modeling and statistical data processing.
Kohonen SOM’s
Each weight is representative of a certain input. Input patterns are shown to all neurons simultaneously. Competitive learning: the neuron with the largest response is chosen.
Kohonen SOM’s
Initialize weights.
Repeat until convergence:
Select the next input pattern.
Find the Best Matching Unit (BMU).
Update the weights of the winner and its neighbours.
Decrease the learning rate and the neighbourhood size.
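The training loop above can be sketched for a 1-D map on scalar data (an illustrative sketch; the grid size, decay schedules, and data are toy assumptions, not from the slides):

```python
import math
import random

def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D SOM on scalar data in [0, 1]; returns the unit weights."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_units)]   # initialize weights
    order = list(range(len(data)))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                    # decrease learning rate
        sigma = max(sigma0 * (1 - t / epochs), 0.5)    # shrink neighbourhood
        rng.shuffle(order)                             # select patterns in random order
        for k in order:
            x = data[k]
            # Find the Best Matching Unit: the unit whose weight is closest.
            bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
            # Update the winner and its neighbours; the Gaussian factor h
            # decays with grid distance from the BMU.
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                weights[i] += lr * h * (x - weights[i])
    return weights

# Toy run: the map should spread its units over the input range.
weights = train_som([i / 19 for i in range(20)])
```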
Learning rate & neighbourhood size
Kohonen SOM’s
Distance-related learning
Kohonen SOM’s
Some nice illustrations
Kohonen SOM’s
Kohonen SOM demo (from ai-junkie.com): mapping a 3D colorspace onto a 2D Kohonen map.
Performance Analysis
K-Means: depends heavily on a priori knowledge (the choice of K); very stable.
Leader-Follower: depends heavily on a priori knowledge (the threshold); faster, but unstable.
Performance Analysis
Self-Organizing Map: stability and convergence assured (principle of self-ordering). Slow: many iterations needed for convergence; computationally intensive.
Conclusion
No Free Lunch theorem: any elevated performance over one class of problems is exactly paid for in performance over another class.
Ensemble clustering? Use a SOM and the basic leader-follower algorithm to identify clusters, then refine them with k-means clustering.
Any Questions ?