Unsupervised Learning and Clustering
k-means clustering
Sum-of-Squared Errors
Competitive Learning SOM
Pre-processing and Post-processing techniques
K-means clustering
This is an elementary but very popular method for clustering.
Our goal is to find the k mean vectors or “cluster centers”.
Initialize k, m1, m2, …, mk
Repeat
    Classify each sample according to its nearest mi
    Recompute each mi
Until there is no change in mi
Return m1, m2, …, mk
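A minimal sketch of this loop in Python with NumPy (the function name k_means and its parameters are illustrative, not from the slides):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Basic k-means: X is an (n, d) array of examples, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialize the k means with k distinct examples chosen at random.
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Classify each sample according to its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each mean from the samples currently assigned to it.
        new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                              else means[i] for i in range(k)])
        # Stop when the means no longer change.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```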
Complexity
The computational complexity of the algorithm is defined as follows:
O(n · d · c · T)
where d is the number of features, n is the number of examples, c is the number of clusters (the k above), and T is the number of iterations.
The number of iterations is normally much less than the number of examples.
Figure 10.3
K-means clustering
Disadvantage 1: Prone to fall into local minima.
This can be mitigated, at extra computational cost, by running the algorithm many times with different initial means and keeping the best result.
Disadvantage 2: Susceptible to outliers.
One solution is to replace the mean with the median.
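As a hedged illustration (building on the k_means sketch above, which is not from the slides), the recompute step could use a coordinate-wise median instead of the mean:

```python
import numpy as np

# In the recompute step of k_means above, swap the mean for the
# coordinate-wise median to reduce the influence of outliers:
def robust_center(points):
    """Coordinate-wise median of the points assigned to one cluster."""
    return np.median(points, axis=0)
```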
K-means clustering
Hugo Steinhaus
Born on January 14, 1887, in Austria-Hungary.
Professor at the University of Wroclaw, Notre Dame, and Sussex.
Authored over 170 works in mathematics.
He was the first to use k-means clustering.
Unsupervised Learning and Clustering
k-means clustering
Sum-of-Squared Errors
Competitive Learning SOM
Pre-processing and Post-processing techniques
The Sum-of-Squared Error
We can now define the goal of clustering:
Goal: To divide a dataset of examples into c disjoint subsets D1, D2, …, Dc, so that the distance between examples within the same partition is small compared to the distance between examples in different partitions.
To achieve this, we define the c means by seeking to minimize a metric.
Metric
Let mi be the mean of examples on partition Di:
mi = (1 / ni) Σ_{x ∈ Di} x
Then the metric to minimize is the sum-of-squared errors:
Je = Σ_{i=1}^{c} Σ_{x ∈ Di} || x − mi ||²
where the inner sum runs over the examples x in Di and the index i runs over the c clusters.
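As a sketch (assuming NumPy arrays and k_means-style outputs as above; the function name is illustrative), the criterion can be computed as:

```python
import numpy as np

def sum_of_squared_error(X, labels, means):
    """Je = sum over clusters i of the sum over x in Di of ||x - mi||^2."""
    return sum(np.sum((X[labels == i] - m) ** 2) for i, m in enumerate(means))
```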
Figure 10.10
Others
Hierarchical clustering: clusters have subclusters, which in turn have subclusters, and so on.
Online clustering: as time goes on, new information may call for restructuring the clusters (plasticity), but we do not want this to happen too often (stability).
Figure 10.11
Unsupervised Learning and Clustering
k-means clustering
Sum-of-Squared Errors
Competitive Learning SOM
Pre-processing and Post-processing techniques
Vector Quantisation
Data will be represented with prototype vectors.
Feature Mapping
[Diagram: each input vector [x1, x2, x3, x4]^T is connected to a map neuron through a weight vector [w1, w2, w3, w4]^T.]
Feature Mapping
Each neuron's weight vector [w1, w2, w3, w4]^T is mapped into the same feature space as the inputs [x1, x2, x3, x4]^T.
SOM Algorithm
Initialization
Select the number of neurons in the map
Choose random values for all weights
Learning
Repeat: for each example, find the neuron whose weight vector is closest to the point:
min || x - w ||
SOM Algorithm
Winner takes all
Update the weights of the winning neuron only (and its neighbors).
SOM Algorithm
Update Weights
Update weights for the closest neuron and neighbors:
wt+1 = wt + η A(x, w) (x – w)
where η is the learning rate
Function A defines a neighboring function.
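A minimal sketch of one such update for a one-dimensional map, assuming a Gaussian neighborhood and using eta for the learning rate and sigma for the neighborhood width (these symbols and the function name are assumptions, not from the slides):

```python
import numpy as np

def som_step(x, W, eta=0.1, sigma=1.0):
    """One update of a 1-D map: x is a (d,) input, W an (m, d) array of weights."""
    # Find the winning neuron: the one whose weight vector is closest to x.
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    # Neighborhood function A: 1 at the winner, decaying with distance on the map.
    map_dist = np.abs(np.arange(len(W)) - winner)
    A = np.exp(-(map_dist ** 2) / (2 * sigma ** 2))
    # Move the winner and its neighbors toward the input.
    return W + eta * A[:, None] * (x - W)
```

In practice both eta and sigma are decayed over time, which matches the comment below that neighborhoods should start large and then shrink.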
SOM Algorithm
The neighboring function A gives the largest update to the winning neuron and smaller updates to neurons farther away on the map.
SOM Algorithm
Usage
For every test point
Select the closest neuron using minimum Euclidean distance:
min || x - w ||
Mapping a Grid to a Grid
SOM Algorithm
Comments
Neighborhoods should be large at the beginning but shrink as the nodes gain a specific ordering.
Global ordering emerges naturally (complexity theory).
Architecture of the map: too few nodes lead to underfitting; too many nodes lead to overfitting.
Teuvo Kohonen
• Born in 1934, Finland
• He has written several books and over 300 papers
• His most famous work is in Self-Organizing Maps
• Member of the Academy of Finland
Awards:
• IEEE Neural Networks Council Pioneer Award, 1991
• Technical Achievement Award of the IEEE, 1995
• Frank Rosenblatt Technical Field Award, 2008
Unsupervised Learning and Clustering
k-means clustering
Sum-of-Squared Errors
Competitive Learning SOM
Pre-processing and Post-processing techniques
Cluster Tendency
Cluster tendency is a preprocessing step that indicates whether the data objects exhibit a clustering structure. It precludes using clustering when the data appears to be randomly generated under a uniform distribution over a sampling window of interest in the attribute space.
Example Cluster Tendency
Clustering captures inherent data groups.
Clustering does not capture groups; the results come from random variations.
Example Cluster Tendency
Problem: How do we choose the sampling window?
Rule of thumb: Create a window centered at the mean that captures half the total number of examples.
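One way this rule of thumb might be implemented, as a sketch assuming an axis-aligned window centered at the mean (the function name and the per-example radius measure are assumptions, not from the slides):

```python
import numpy as np

def sampling_window(X):
    """Axis-aligned box centered at the mean that captures half the examples."""
    mean = X.mean(axis=0)
    # Per-example distance from the mean, taken as the largest coordinate offset.
    radii = np.max(np.abs(X - mean), axis=1)
    # Grow the box until it contains half of the examples.
    half_width = np.quantile(radii, 0.5)
    lo, hi = mean - half_width, mean + half_width
    inside = np.all((X >= lo) & (X <= hi), axis=1)
    return lo, hi, X[inside]
```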
Cluster Validation
Cluster validation is used to assess the value of the output of a clustering algorithm.
Internal: Statistics are devised to capture the quality of the induced clusters using only the available data objects.
External: If the validation is performed by gathering statistics that compare the induced clusters against an external and independent classification of the objects, the validation is called external.
Example Cluster Validation
Metrics for Cluster Validation
One type of statistical metric is defined in terms of a 2 × 2 table, where each entry counts the number of object pairs that agree or disagree on the class and the cluster to which they belong:
E11: same class, same cluster
E12: same class, different cluster
E21: different class, same cluster
E22: different class, different cluster
Example Metrics for Cluster Validation
Rand:
[ E11 + E22 ] / [ E11 + E12 + E21 + E22 ]
Jaccard:
E11 / [ E11 + E12 + E21 ]
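A sketch of these two statistics computed directly from the pair counts E11, E12, E21, E22 defined in the table above (function names are illustrative):

```python
from itertools import combinations

def pair_counts(classes, clusters):
    """Count object pairs that agree or disagree on class and on cluster."""
    E11 = E12 = E21 = E22 = 0
    for (c1, k1), (c2, k2) in combinations(zip(classes, clusters), 2):
        same_class, same_cluster = (c1 == c2), (k1 == k2)
        if same_class and same_cluster:
            E11 += 1
        elif same_class:
            E12 += 1
        elif same_cluster:
            E21 += 1
        else:
            E22 += 1
    return E11, E12, E21, E22

def rand_index(classes, clusters):
    E11, E12, E21, E22 = pair_counts(classes, clusters)
    return (E11 + E22) / (E11 + E12 + E21 + E22)

def jaccard_index(classes, clusters):
    E11, E12, E21, _ = pair_counts(classes, clusters)
    return E11 / (E11 + E12 + E21)
```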