Unsupervised Learning and Clustering: k-means clustering, Sum-of-Squared Errors, Competitive Learning SOM, Pre-processing and Post-processing techniques


Page 1

Unsupervised Learning and Clustering

k-means clustering

Sum-of-Squared Errors

Competitive Learning SOM

Pre-processing and Post-processing techniques

Page 2

K-means clustering

This is an elementary but very popular method for clustering.

Our goal is to find the k mean vectors or “cluster centers”.

Initialize k and the means m1, m2, …, mk
Repeat
    Classify each sample according to its nearest mi
    Recompute each mi
Until there is no change in any mi
Return m1, m2, …, mk
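A minimal sketch of this loop in Python with NumPy (the function name k_means and its arguments are illustrative, not from the slides):

import numpy as np

def k_means(X, k, max_iter=100, rng=np.random.default_rng(0)):
    # Initialize the k means by picking k distinct examples at random.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Classify each sample according to its nearest mean m_i.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each mean from the samples currently assigned to it.
        new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                              else means[i] for i in range(k)])
        # Stop when there is no change in the means.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels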

Page 3

Complexity

The computational complexity of the algorithm is defined as follows:

O( n d c T )

Where d is the number of features, n is the number of examples, c is the number of clusters, and T is the number of iterations.

The number of iterations is normally much less than the number of examples.

Page 4

Figure 10.3

Page 5

K-means clustering

Disadvantage 1: Prone to falling into local minima.

This can be mitigated, at the cost of more computation, by running the algorithm many times with different initial means.

Disadvantage 2: Susceptible to outliers.

One solution is to replace the mean with the median.

Page 6

K-means clustering

Hugo Steinhaus

Born on January 14, 1887, in Austria-Hungary.

Professor at the University of Wroclaw, Notre Dame, and Sussex.

Authored over 170 works in mathematics.

He was the first to use k-means clustering.

Page 7

Unsupervised Learning and Clustering

k-means clustering

Sum-of-Squared Errors

Competitive Learning SOM

Pre-processing and Post-processing techniques

Page 8

The Sum-of-Squared Error

We can now define the goal of clustering:

Goal: To divide a dataset of examples into c disjoint subsets D1, D2, …, Dc, so that the distance between examples within the same partition is small compared to the distance between examples in different partitions.

To achieve this, we define the c means so as to minimize a metric.

Page 9

Metric

Let mi be the mean of the examples in partition Di:

mi = (1 / ni) Σ x (for all x in Di)

Then the metric to minimize is the sum-of-squared errors:

Je = Σi Σx || x – mi ||²

Here the inner sum runs over all x in Di, and the index i ranges over the clusters.
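As an illustration, the criterion can be computed directly from a given partition; this hypothetical helper assumes X is an array of examples and labels assigns each example to a cluster:

import numpy as np

def sum_of_squared_error(X, labels):
    # Je = sum over clusters i of sum over x in Di of ||x - mi||^2
    Je = 0.0
    for i in np.unique(labels):
        Di = X[labels == i]        # examples in partition Di
        mi = Di.mean(axis=0)       # cluster mean mi
        Je += ((Di - mi) ** 2).sum()
    return Je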

Page 10

Figure 10.10

Page 11

Others

Hierarchical clustering: Clusters have subclusters, which in turn have subclusters, and so on.

Online clustering: As time goes on, new information may call for restructuring the clusters (plasticity), but we do not want this to happen too often (stability).

Page 12

Figure 10.11

Page 13

Unsupervised Learning and Clustering

k-means clustering

Sum-of-Squared Errors

Competitive Learning SOM

Pre-processing and Post-processing techniques

Page 14

Vector Quantisation

Data will be represented with prototype vectors.

Page 15

Feature Mapping

Input Nodes

Page 16

Feature Mapping

[Diagram: input nodes carrying x = [x1, x2, x3, x4]T, connected to a map neuron through its weight vector w = [w1, w2, w3, w4]T]

Page 17

Feature Mapping

The weight vector will be mapped into the feature space.

[Diagram: the weight vector [w1, w2, w3, w4]T plotted as a point in the same feature space as the input vector [x1, x2, x3, x4]T]

Page 18

SOM Algorithm

Initialization

Select the number of neurons in the map

Choose random values for all weights

Learning

Repeat: for each example, find the neuron closest to the point:

min || x - w ||

Page 19

SOM Algorithm

Winner takes all

Input Nodes

Update the weights of the winner only (and of its neighbors).

Page 20

SOM Algorithm

Update Weights

Update weights for the closest neuron and neighbors:

wt+1 = wt + η A(x, w) (x – w)

where η is the learning rate and A is a neighborhood function.
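Putting the pieces together, here is a minimal sketch of SOM training for a one-dimensional map; the Gaussian neighborhood and the linear decay of the learning rate and radius are common choices assumed here, not prescribed by the slides:

import numpy as np

def train_som(X, n_neurons, n_epochs=20, lr=0.5, sigma=2.0, rng=np.random.default_rng(0)):
    # Initialization: choose random values for all weights.
    W = rng.random((n_neurons, X.shape[1]))
    positions = np.arange(n_neurons)            # neuron coordinates on the 1-D map
    for epoch in range(n_epochs):
        # Shrink the learning rate and the neighborhood radius over time.
        eta = lr * (1.0 - epoch / n_epochs)
        sig = sigma * (1.0 - epoch / n_epochs) + 1e-3
        for x in X:
            # Winner takes all: the neuron whose weight vector is closest to x.
            winner = np.linalg.norm(W - x, axis=1).argmin()
            # Gaussian neighborhood A centered on the winning neuron.
            A = np.exp(-((positions - winner) ** 2) / (2 * sig ** 2))
            # w_{t+1} = w_t + eta * A * (x - w), applied to the winner and its neighbors.
            W += eta * A[:, None] * (x - W)
    return W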

Page 21

SOM Algorithm

The neighborhood function A:

Page 22

SOM Algorithm

Usage

For every test point

Select the closest neuron using minimum Euclidean distance:

min || x - w ||
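For instance, with the trained weights W from the sketch above, mapping test points is just a nearest-neighbour lookup (illustrative only):

import numpy as np

def map_to_neurons(X_test, W):
    # For each test point, select the neuron with minimum Euclidean distance ||x - w||.
    return np.array([np.linalg.norm(W - x, axis=1).argmin() for x in X_test])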

Page 23

Mapping a Grid to a Grid

Page 24

SOM Algorithm

Comments

Neighborhoods should be large at the beginning but shrink as the nodes gain a specific ordering.

Global ordering comes naturally (complexity theory)

Architecture of the map: few nodes lead to underfitting; many nodes lead to overfitting.

Page 25

Teuvo Kohonen

• Born in 1934, Finland
• Author of several books and over 300 papers
• His most famous work is on Self-Organizing Maps
• Member of the Academy of Finland

Awards:
• IEEE Neural Networks Council Pioneer Award, 1991
• IEEE Technical Achievement Award, 1995
• IEEE Frank Rosenblatt Technical Field Award, 2008

Page 26

Unsupervised Learning and Clustering

k-means clustering

Sum-of-Squared Errors

Competitive Learning SOM

Pre-processing and Post-processing techniques

Page 27

Cluster Tendency

Cluster tendency analysis is a preprocessing step that indicates whether the data objects exhibit a clustering structure. It advises against clustering when the data appear to have been generated at random, i.e., uniformly over a sampling window of interest in the attribute space.

Page 28

Example Cluster Tendency

Clustering captures inherent data groups.

Clustering does not capture groups; the results come from random variations.

Page 29

Example Cluster Tendency

Problem: How do we choose the sampling window?

Rule of thumb: Create a window centered at the mean that captures half the total number of examples.
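One way to read this rule of thumb is sketched below; the choice of a hypercube window and the max-norm for "capturing half the examples" is an assumption, not stated on the slide:

import numpy as np

def sampling_window(X):
    # Hypercube centered at the sample mean that captures half of the examples:
    # the half-width is the median max-norm distance from the mean.
    mean = X.mean(axis=0)
    half_width = np.median(np.abs(X - mean).max(axis=1))
    return mean - half_width, mean + half_width

def uniform_reference(X, rng=np.random.default_rng(0)):
    # Uniform data over the sampling window, to compare against the real data.
    low, high = sampling_window(X)
    return rng.uniform(low, high, size=X.shape)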

Page 30

Cluster Validation

Cluster validation is used to assess the value of the output of a clustering algorithm.

Internal: Statistics are devised to capture the quality of the induced clusters using only the available data objects.

External: The validation is performed by gathering statistics that compare the induced clusters against an external, independent classification of the objects.

Page 31

Example Cluster Validation

Page 32

Metrics for Cluster Validation

One family of statistical metrics is defined in terms of a 2 × 2 table, where each entry counts the number of object pairs that agree or disagree on the class and on the cluster to which they belong:

                    Same cluster        Different cluster
Same class          E11                 E12
Different class     E21                 E22

Page 33

Example Metrics for Cluster Validation

Rand:

[ E11 + E22 ] / [ E11 + E12 + E21 + E22 ]

Jaccard:

E11 / [ E11 + E12 + E21 ]
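These counts and both statistics can be computed pairwise in a few lines; the helper below is hypothetical and assumes two label arrays, classes and clusters, with one entry per object:

from itertools import combinations

def pair_counts(classes, clusters):
    # Count object pairs by whether they share a class and/or a cluster.
    E11 = E12 = E21 = E22 = 0
    for i, j in combinations(range(len(classes)), 2):
        same_class = classes[i] == classes[j]
        same_cluster = clusters[i] == clusters[j]
        if same_class and same_cluster:
            E11 += 1
        elif same_class:
            E12 += 1   # same class, different cluster
        elif same_cluster:
            E21 += 1   # different class, same cluster
        else:
            E22 += 1   # different class, different cluster
    return E11, E12, E21, E22

def rand_index(classes, clusters):
    E11, E12, E21, E22 = pair_counts(classes, clusters)
    return (E11 + E22) / (E11 + E12 + E21 + E22)

def jaccard_index(classes, clusters):
    E11, E12, E21, _ = pair_counts(classes, clusters)
    return E11 / (E11 + E12 + E21)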