8/14/2019 Ki2 s07 Clustering Algorithms
Kunstmatige Intelligentie / RuG
KI2 - 7
Clustering Algorithms
Johan Everts
What is Clustering?
Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)
The Goals of Clustering
Determine the intrinsic grouping in a set of unlabeled data.
What constitutes a good clustering? All clustering algorithms will produce clusters, regardless of whether the data contains them.
There is no gold standard; it depends on the goal:
- data reduction
- natural clusters
- useful clusters
- outlier detection
Stages in clustering
Taxonomy of Clustering Approaches
Hierarchical Clustering
Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.
Single link
Agglomerative Clustering
In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.
Complete link
Agglomerative Clustering
In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
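The two merge criteria differ only in how a candidate merger is scored. The following is a minimal Python sketch, assuming a symmetric pairwise lookup `dist[a][b]`; the function names and the toy points are hypothetical and not taken from the slides.

```python
from itertools import combinations

def single_link_distance(a, b, dist):
    """Single link: distance between the two closest members of a and b."""
    return min(dist[x][y] for x in a for y in b)

def merged_diameter(a, b, dist):
    """Complete link: largest pairwise distance inside the merged cluster."""
    members = list(a) + list(b)
    return max(dist[x][y] for x, y in combinations(members, 2))

def best_merge(clusters, dist, criterion):
    """Return the pair of clusters with the smallest criterion value."""
    return min(combinations(clusters, 2),
               key=lambda pair: criterion(pair[0], pair[1], dist))

# Hypothetical toy data: four points on a line.
pos = {"p": 0, "q": 4, "r": 9, "s": 16}
dist = {x: {y: abs(pos[x] - pos[y]) for y in pos} for x in pos}
clusters = [("p", "q"), ("r",), ("s",)]

single_choice = best_merge(clusters, dist, single_link_distance)
complete_choice = best_merge(clusters, dist, merged_diameter)
print(single_choice)    # single link merges (p, q) with r
print(complete_choice)  # complete link prefers merging r with s
```

On this toy input the two criteria pick different mergers: single link only looks at the nearest pair (q and r), while complete link penalises the long span from p to r.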
Example Single Link AC
      BA   FI   MI   NA   RM   TO
BA     0  662  877  255  412  996
FI   662    0  295  468  268  400
MI   877  295    0  754  564  138
NA   255  468  754    0  219  869
RM   412  268  564  219    0  669
TO   996  400  138  869  669    0
Example Single Link AC
        BA   FI  MI/TO   NA   RM
BA       0  662    877  255  412
FI     662    0    295  468  268
MI/TO  877  295      0  754  564
NA     255  468    754    0  219
RM     412  268    564  219    0
Example Single Link AC
        BA   FI  MI/TO  NA/RM
BA       0  662    877    255
FI     662    0    295    268
MI/TO  877  295      0    564
NA/RM  255  268    564      0
Example Single Link AC
           BA/NA/RM   FI  MI/TO
BA/NA/RM          0  268    564
FI              268    0    295
MI/TO           564  295      0
Example Single Link AC
              BA/FI/NA/RM  MI/TO
BA/FI/NA/RM             0    295
MI/TO                 295      0
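The merge sequence in the tables above can be reproduced with a short script. This is a sketch in Python that uses the distance matrix from the first table; the function names are illustrative, and the naive pair search is chosen for readability, not speed.

```python
from itertools import combinations

labels = ["BA", "FI", "MI", "NA", "RM", "TO"]
rows = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]
dist = {a: dict(zip(labels, row)) for a, row in zip(labels, rows)}

def single_link(a, b):
    """Distance between the closest members of clusters a and b."""
    return min(dist[x][y] for x in a for y in b)

def agglomerate(items):
    """Merge clusters until one remains; return (merged name, distance) per step."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: single_link(*p))
        merged = tuple(sorted(a + b))
        merges.append(("/".join(merged), single_link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return merges

merges = agglomerate(labels)
for name, d in merges:
    print(f"{name}: merged at distance {d}")
```

The merge distances (138, 219, 255, 268, 295) are the heights at which the corresponding branches join in the dendrogram.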
Taxonomy of Clustering Approaches
Square error
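For reference, a common form of the squared-error criterion is given below. The notation is an assumption, not taken from the slides: K clusters C_j, patterns x_i, and centroids c_j.

```latex
E = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2,
\qquad
c_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A partition with a lower E has patterns that lie closer to their own cluster centroid.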
K-Means
Step 0: Start with a random partition into K clusters.
Step 1: Generate a new partition by assigning each pattern to its closest cluster center.
Step 2: Compute new cluster centers as the centroids of the clusters.
Step 3: Steps 1 and 2 are repeated until there is no change in the membership (also cluster centers remain the same).
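The steps above can be sketched in plain Python. This is a minimal illustration, assuming patterns are tuples of numbers and squared Euclidean distance; the round-robin initial partition, the `max_iter` cap, and the rule of keeping the old center for an empty cluster are assumptions added here.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def compute_centers(points, labels, k, previous):
    """Step 2: centroid of each cluster (an empty cluster keeps its old center)."""
    centers = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            centers.append(previous[j])
    return centers

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 0: random partition; dealing shuffled indices round-robin
    # guarantees that every cluster starts non-empty (requires k <= len(points)).
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = rank % k
    centers = compute_centers(points, labels, k, None)
    for _ in range(max_iter):
        # Step 1: assign each pattern to its closest cluster center.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        # Step 3: stop when the membership no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        centers = compute_centers(points, labels, k, centers)
    return labels, centers

# Hypothetical data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, k=2)
print(labels, centers)
```

Which group receives label 0 depends on the random initial partition, so only the grouping itself is meaningful.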
K-Means: How many Ks?
Locating the knee
The knee of a curve is defined as the point of maximum curvature.
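One way to locate the knee numerically is to estimate the curvature at each interior point with finite differences and take the maximum. The sketch below assumes evenly spaced K values and rescales both axes to [0, 1] first, since curvature depends on the axis units; the error values are made up for illustration.

```python
def knee_of_curve(xs, ys):
    """Return the x value with the largest discrete curvature.

    Assumes xs are evenly spaced and ys are not all equal.
    """
    if len(xs) < 3:
        raise ValueError("need at least three points")
    lo, hi = min(ys), max(ys)
    u = [(x - xs[0]) / (xs[-1] - xs[0]) for x in xs]
    v = [(y - lo) / (hi - lo) for y in ys]
    h = u[1] - u[0]
    best_x, best_kappa = None, -1.0
    for i in range(1, len(xs) - 1):
        d1 = (v[i + 1] - v[i - 1]) / (2 * h)              # first derivative
        d2 = (v[i + 1] - 2 * v[i] + v[i - 1]) / (h * h)   # second derivative
        kappa = abs(d2) / (1 + d1 * d1) ** 1.5            # curvature of y(x)
        if kappa > best_kappa:
            best_x, best_kappa = xs[i], kappa
    return best_x

# Hypothetical squared-error values for K = 1..8.
ks = [1, 2, 3, 4, 5, 6, 7, 8]
errors = [100, 60, 30, 12, 10, 9, 8.5, 8]
knee = knee_of_curve(ks, errors)
print(knee)  # 4 for this made-up curve
```

Finite differences amplify noise, so on real error curves some smoothing may be needed before applying this.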
Leader - Follower
- Online
- Specify threshold distance
- Find the closest cluster center
  - Distance above threshold? Create new cluster
  - Or else, add instance to cluster
Leader - Follower
- Find the closest cluster center
  - Distance above threshold? Create new cluster
  - Or else, add instance to cluster and update cluster center
Distance < Threshold
Leader - Follower
- Find the closest cluster center
  - Distance above threshold? Create new cluster
  - Or else, add instance to cluster and update cluster center
Distance > Threshold
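The online procedure can be sketched as a single pass over the data. This Python version is an illustration under stated assumptions: Euclidean distance, and a running mean as the center update (the slides only say "update cluster center"; variants with a fixed learning rate are also common).

```python
import math

def leader_follower(stream, threshold):
    """Cluster instances one at a time; return (centers, labels)."""
    centers, counts, labels = [], [], []
    for x in stream:
        nearest = None
        if centers:
            # Find the closest cluster center.
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(x, centers[i]))
        if nearest is None or math.dist(x, centers[nearest]) > threshold:
            # Distance above threshold (or no cluster yet): create a new cluster.
            centers.append(tuple(float(v) for v in x))
            counts.append(1)
            labels.append(len(centers) - 1)
        else:
            # Otherwise add the instance and move the center to the new mean.
            counts[nearest] += 1
            centers[nearest] = tuple(
                c + (v - c) / counts[nearest]
                for c, v in zip(centers[nearest], x)
            )
            labels.append(nearest)
    return centers, labels

# Hypothetical stream of 2-D instances.
stream = [(0, 0), (1, 0), (10, 10), (0, 1), (11, 10)]
centers, labels = leader_follower(stream, threshold=3.0)
print(labels)   # [0, 0, 1, 0, 1]
print(centers)
```

Because each instance is seen once, the result depends on the presentation order as well as on the threshold.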
Kohonen SOMs
The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.
Kohonen SOMs
- Each weight is representative of a certain input.
- Input patterns are shown to all neurons simultaneously.
- Competitive learning: the neuron with the largest response is chosen.
Kohonen SOMs
Initialize weights
Repeat until convergence:
    Select next input pattern
    Find Best Matching Unit
    Update weights of winner and neighbours
    Decrease learning rate & neighbourhood size
Learning rate & neighbourhood size
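The training loop can be sketched for a one-dimensional map in plain Python. Several choices here are assumptions rather than part of the slides: a fixed number of epochs stands in for "until convergence", the learning rate and neighbourhood radius decay linearly, and the neighbourhood function is Gaussian.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))

def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = n_units / 2
    # Initialize weights with random values in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):   # select next input pattern
            frac = step / total_steps
            lr = lr0 * (1 - frac)                        # decreasing learning rate
            radius = max(radius0 * (1 - frac), 0.5)      # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            # Update the winner and its neighbours on the map.
            for j, w in enumerate(weights):
                influence = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (x[d] - w[d])
            step += 1
    return weights

# Hypothetical 2-D inputs scaled to [0, 1].
data = [(0.1, 0.1), (0.15, 0.2), (0.9, 0.9), (0.85, 0.8)]
weights = train_som(data, n_units=4)
for w in weights:
    print([round(v, 2) for v in w])
```

A two-dimensional map works the same way, with the unit distance `(j - bmu) ** 2` replaced by the squared grid distance between the two units.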
Some nice illustrations
Kohonen SOMs
Kohonen SOM Demo (from ai-junkie.com): mapping a 3D colorspace on a 2D Kohonen map
http://somdemo/SOMDemo/executable/SOMDemo.exe
Performance Analysis
K-Means
- Depends a lot on a priori knowledge (K)
- Very stable
Leader Follower
- Depends a lot on a priori knowledge (threshold)
- Faster but unstable
Performance Analysis
Self Organizing Map
- Stability and convergence assured
- Principle of self-ordering
- Slow and many iterations needed for convergence
- Computationally intensive
Conclusion
No Free Lunch theorem
- Any elevated performance over one class is exactly paid for in performance over another class.
Ensemble clustering?
- Use SOM and Basic Leader Follower to identify clusters and then use k-means clustering to refine.
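A reduced version of this idea can be sketched in Python. Note the simplification: only the leader-follower pass is used to propose initial centers (the SOM stage is left out), and the running-mean update, the Euclidean distance, and the example data are assumptions made for illustration.

```python
import math

def leader_follower_centers(points, threshold):
    """One online pass that proposes cluster centers."""
    centers, counts = [], []
    for x in points:
        nearest = None
        if centers:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(x, centers[i]))
        if nearest is None or math.dist(x, centers[nearest]) > threshold:
            centers.append(tuple(float(v) for v in x))
            counts.append(1)
        else:
            counts[nearest] += 1
            centers[nearest] = tuple(
                c + (v - c) / counts[nearest]
                for c, v in zip(centers[nearest], x)
            )
    return centers

def refine_with_k_means(points, initial_centers, max_iter=100):
    """K-means iterations started from the proposed centers."""
    centers = list(initial_centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical data: two well-separated groups.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
seeds = leader_follower_centers(points, threshold=3.0)
labels, centers = refine_with_k_means(points, seeds)
print(len(seeds), labels)
```

In this arrangement the threshold pass decides how many clusters there are, so K no longer has to be supplied to k-means by hand.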
Any Questions?