Ki2 s07 Clustering Algorithms

  • 8/14/2019 Ki2 s07 Clustering Algorithms

    1/42

    Artificial Intelligence (Kunstmatige Intelligentie) / RuG

    KI2 - 7

    Clustering Algorithms

    Johan Everts

    What is Clustering?

    Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other, whereas objects of different clusters are dissimilar. (Bacher 1996)

    The Goals of Clustering

    Determine the intrinsic grouping in a set of unlabeled data.

    What constitutes a good clustering? All clustering algorithms will produce clusters, regardless of whether the data contains them.

    There is no gold standard; it depends on the goal:
      • data reduction
      • natural clusters
      • useful clusters
      • outlier detection

    Stages in clustering

    Taxonomy of Clustering Approaches

    Hierarchical Clustering

    Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.

    Single link

    Agglomerative Clustering

    In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.

    Complete link

    Agglomerative Clustering

    In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
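
    The two linkage criteria can be compared side by side. The following is a minimal sketch, assuming Euclidean distance and clusters stored as lists of coordinate tuples; the function names and the three toy clusters are illustrative and not taken from the slides.

```python
from itertools import combinations
from math import dist


def diameter(points):
    """Largest pairwise distance within a set of points (0 for a single point)."""
    return max((dist(p, q) for p, q in combinations(points, 2)), default=0.0)


def single_link(a, b):
    """Distance between the two closest members of clusters a and b."""
    return min(dist(p, q) for p in a for q in b)


def complete_link(a, b):
    """Diameter of the cluster that would result from merging a and b."""
    return diameter(a + b)


# Hypothetical 1-D clusters, written as lists of coordinate tuples.
left = [(0.0,), (1.0,)]
middle = [(2.5,), (8.0,)]
right = [(9.8,), (10.0,)]

# Single link prefers left + middle (closest members are 1.5 apart),
# complete link prefers middle + right (merged diameter 7.5 instead of 8.0).
print(single_link(left, middle), single_link(middle, right))
print(complete_link(left, middle), complete_link(middle, right))
```

    On this toy data the two criteria disagree about which pair to merge first, which is the practical difference between them.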

    Example Single Link AC

           BA    FI    MI    NA    RM    TO
    BA      0   662   877   255   412   996
    FI    662     0   295   468   268   400
    MI    877   295     0   754   564   138
    NA    255   468   754     0   219   869
    RM    412   268   564   219     0   669
    TO    996   400   138   869   669     0

    Example Single Link AC

              BA    FI MI/TO    NA    RM
    BA         0   662   877   255   412
    FI       662     0   295   468   268
    MI/TO    877   295     0   754   564
    NA       255   468   754     0   219
    RM       412   268   564   219     0

    Example Single Link AC

              BA    FI MI/TO NA/RM
    BA         0   662   877   255
    FI       662     0   295   268
    MI/TO    877   295     0   564
    NA/RM    255   268   564     0

    Example Single Link AC

                BA/NA/RM    FI MI/TO
    BA/NA/RM           0   268   564
    FI               268     0   295
    MI/TO            564   295     0

    Example Single Link AC

                   BA/FI/NA/RM MI/TO
    BA/FI/NA/RM              0   295
    MI/TO                  295     0
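
    The merge sequence in the distance tables above can be reproduced with a short sketch. It is a naive single-link implementation that recomputes cluster distances from the original matrix at every step; the function names are illustrative, and only the labels and distances come from the slides.

```python
CITIES = ["BA", "FI", "MI", "NA", "RM", "TO"]
DIST = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]


def name(cluster, labels):
    """Readable cluster name such as 'NA/RM'."""
    return "/".join(sorted(labels[i] for i in cluster))


def single_link_merges(labels, dist):
    """Return the merge sequence as (cluster_a, cluster_b, distance) tuples."""
    # Each cluster is a frozenset of point indices; start with singletons.
    clusters = [frozenset([i]) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        merges.append((name(a, labels), name(b, labels), d))
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a | b]
    return merges


for a, b, d in single_link_merges(CITIES, DIST):
    print(f"merge {a} + {b} at distance {d}")
```

    The merges happen at distances 138, 219, 255, 268 and 295, matching the smallest off-diagonal entry of each successive table.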

    Taxonomy of Clustering Approaches

    Square error
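
    As a minimal sketch, assuming the usual definition of the square-error criterion (the sum of squared Euclidean distances from each point to the centroid of its cluster), the quantity can be computed as follows. The function names and the two example partitions are illustrative only.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def squared_error(clusters):
    """Sum of squared distances from every point to its cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum((x - m) ** 2 for p in points for x, m in zip(p, c))
    return total


# Two hypothetical partitions of the same four points.
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
mixed = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(squared_error(tight))  # compact clusters give a small error
print(squared_error(mixed))  # stretched clusters give a large error
```

    A partition that groups nearby points scores lower, which is why partitional methods such as K-means try to minimise this value.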

    K-Means

    Step 0: Start with a random partition into K clusters.
    Step 1: Generate a new partition by assigning each pattern to its closest cluster center.
    Step 2: Compute new cluster centers as the centroids of the clusters.
    Step 3: Steps 1 and 2 are repeated until there is no change in the membership (the cluster centers then also remain the same).
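
    The steps above can be sketched in plain Python. This is an illustrative implementation under a few assumptions that the slides do not state: Euclidean distance, a balanced random initial partition, an empty cluster being re-seeded with a random point, and an iteration cap (`max_iter`) as a safety net.

```python
import random
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, k, seed=0, max_iter=100):
    """Return (labels, centers); assumes len(points) >= k."""
    rng = random.Random(seed)
    # Step 0: random partition into k clusters (every cluster gets a point).
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    centers = []
    for _ in range(max_iter):
        # Step 2: cluster centers are the centroids of the current clusters.
        centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random point.
            centers.append(centroid(members) if members else rng.choice(points))
        # Step 1: assign each pattern to its closest cluster center.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        # Step 3: stop when the membership no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(data, k=2)
print(labels)
print(centers)
```

    The loop computes centroids before reassigning so that the random partition of Step 0 has centers to compare against; after the first pass the order is the same Step 1 / Step 2 alternation as on the slide.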

    K-Means: how many Ks?

    Locating the knee

    The knee of a curve is defined as the point of maximum curvature.
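
    One way to apply this definition to an error-versus-K curve is to estimate the curvature with finite differences. The sketch below is only an approximation under stated assumptions: evenly spaced K values, both axes rescaled to [0, 1] (curvature depends on axis scale), and made-up error values that are not from the slides.

```python
def knee(xs, ys):
    """Return the x value where a discrete curvature estimate is largest.

    Assumes at least three evenly spaced x values and a non-constant curve.
    """
    # Rescale both axes to [0, 1] so the result does not depend on units.
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    u = [(x - x0) / (x1 - x0) for x in xs]
    v = [(y - y0) / (y1 - y0) for y in ys]

    best_x, best_curvature = None, -1.0
    for i in range(1, len(xs) - 1):
        h = (u[i + 1] - u[i - 1]) / 2                      # grid spacing
        d1 = (v[i + 1] - v[i - 1]) / (2 * h)               # first derivative
        d2 = (v[i + 1] - 2 * v[i] + v[i - 1]) / (h * h)    # second derivative
        curvature = abs(d2) / (1 + d1 * d1) ** 1.5
        if curvature > best_curvature:
            best_x, best_curvature = xs[i], curvature
    return best_x


# Hypothetical square-error values for K = 1..6.
ks = [1, 2, 3, 4, 5, 6]
errors = [100, 40, 10, 8, 7, 6]
print(knee(ks, errors))
```

    For these made-up values the error stops dropping sharply after K = 3, and that is the point the estimate selects.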

    Leader - Follower

    Online
    Specify threshold distance
    Find the closest cluster center
    Distance above threshold? Create new cluster
    Or else, add instance to cluster

    Leader - Follower

    Find the closest cluster center
    Distance above threshold? Create new cluster
    Or else, add instance to cluster and update cluster center

    Distance < Threshold: add the instance to the cluster and update the cluster center

    Distance > Threshold: create a new cluster
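
    The online procedure can be sketched as follows. The slides do not say how the cluster center is updated, so this sketch assumes a running mean of the instances assigned so far; Euclidean distance and the function name are assumptions as well.

```python
from math import dist


def leader_follower(stream, threshold):
    """Cluster instances one at a time; return (centers, labels)."""
    centers, counts, labels = [], [], []
    for x in stream:
        nearest = None
        if centers:
            # Find the closest cluster center.
            nearest = min(range(len(centers)), key=lambda c: dist(x, centers[c]))
        if nearest is None or dist(x, centers[nearest]) > threshold:
            # Distance above threshold (or no cluster yet): create a new cluster.
            centers.append(tuple(float(v) for v in x))
            counts.append(1)
            labels.append(len(centers) - 1)
        else:
            # Otherwise add the instance and update the center
            # (assumption: running mean of all members).
            counts[nearest] += 1
            n = counts[nearest]
            centers[nearest] = tuple(
                m + (v - m) / n for m, v in zip(centers[nearest], x)
            )
            labels.append(nearest)
    return centers, labels


# Hypothetical stream of 2-D instances.
stream = [(0, 0), (1, 0), (10, 10), (0, 1), (11, 10)]
centers, labels = leader_follower(stream, threshold=3.0)
print(labels)
print(centers)
```

    Because each instance is processed once and never revisited, the result depends on the threshold and on the order in which instances arrive.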

    Kohonen SOMs

    The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.

    Kohonen SOMs

    Each weight is representative of a certain input.
    Input patterns are shown to all neurons simultaneously.
    Competitive learning: the neuron with the largest response is chosen.

    Kohonen SOMs

    Initialize weights
    Repeat until convergence:
      Select next input pattern
      Find Best Matching Unit
      Update weights of winner and neighbours
      Decrease learning rate & neighbourhood size

    [Figure: learning rate & neighbourhood size]
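
    The training loop above can be sketched in plain Python. Several details are assumptions, since the slides leave them open: the map is one-dimensional, the Best Matching Unit is the neuron with the smallest Euclidean distance to the input, the neighbourhood is Gaussian, both learning rate and neighbourhood size decay linearly, and a fixed number of epochs stands in for a convergence test.

```python
import random
from math import dist, exp


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: dist(weights[i], x))


def train_som(data, n_units=5, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D map of n_units neurons; returns the weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialize weights at random in [0, 1).
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        # Decrease learning rate and neighbourhood size (linear decay).
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = max(radius0 * frac, 1e-3)
        for x in data:  # select next input pattern
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Neurons close to the winner on the map are updated more.
                influence = exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (x[d] - w[d])
    return weights


# Hypothetical 1-D inputs spread over [0, 1].
data = [(0.0,), (0.25,), (0.5,), (0.75,), (1.0,)]
weights = train_som(data)
print(sorted(round(w[0], 2) for w in weights))
```

    Early epochs use a wide neighbourhood, so the whole map moves together; as the radius shrinks, each neuron settles on its own region of the input space.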

    Some nice illustrations

    Kohonen SOMs

    Kohonen SOM Demo (from ai-junkie.com): mapping a 3D colorspace onto a 2D Kohonen map

    http://somdemo/SOMDemo/executable/SOMDemo.exe

    Performance Analysis

    K-Means
      • Depends a lot on a priori knowledge (K)
      • Very stable

    Leader Follower
      • Depends a lot on a priori knowledge (Threshold)
      • Faster but unstable

    Performance Analysis

    Self Organizing Map
      • Stability and convergence assured
      • Principle of self-ordering
      • Slow and many iterations needed for convergence
      • Computationally intensive

    Conclusion

    No Free Lunch theorem
      • Any elevated performance over one class is exactly paid for in performance over another class.

    Ensemble clustering?
      • Use SOM and Basic Leader Follower to identify clusters and then use k-means clustering to refine.

    Any Questions?