Unsupervised Learning and Clustering
k-means clustering
Sum-of-Squared Errors
Competitive Learning SOM
Pre-processing and Post-processing techniques
K-means clustering
This is an elementary but very popular method for clustering.
Our goal is to find the k mean vectors or “cluster centers”.
Initialize k, m1, m2, …, mk
Repeat
    Classify each sample according to its nearest mi
    Recompute each mi
Until there is no change in mi
Return m1, m2, …, mk
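A minimal sketch of this loop in Python with NumPy (the function name k_means and its parameters are illustrative, not from the slides):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Basic k-means: X is an (n, d) array of examples, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialize the k means with k distinct examples chosen at random.
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Classify each sample according to its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each mean from the samples currently assigned to it.
        new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                              else means[i] for i in range(k)])
        # Stop when the means no longer change.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```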
Complexity
The computational complexity of the algorithm is defined as follows:
O(n · d · c · T)
where d is the number of features, n is the number of examples, c is the number of clusters (the k above), and T is the number of iterations.
The number of iterations is normally much less than the number of examples.
Figure 10.3
K-means clustering
Disadvantage 1: Prone to fall into local minima.
This can be mitigated, at extra computational cost, by running the algorithm many times with different initial means and keeping the best result.
Disadvantage 2: Susceptible to outliers.
One solution is to replace the mean with the median.
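As a hedged illustration (building on the k_means sketch above, which is not from the slides), the recompute step could use a coordinate-wise median instead of the mean:

```python
import numpy as np

# In the recompute step of k_means above, swap the mean for the
# coordinate-wise median to reduce the influence of outliers:
def robust_center(points):
    """Coordinate-wise median of the points assigned to one cluster."""
    return np.median(points, axis=0)
```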
K-means clustering
Hugo Steinhaus
Born on January 14, 1887, in Austria-Hungary.
Professor at the University of Wroclaw, Notre Dame, and Sussex.
Authored over 170 works in mathematics.
He was the first to use k-means clustering.
Unsupervised Learning and Clustering
k-means clustering
Sum-of-Squared Errors
Competitive Learning SOM
Pre-processing and Post-processing techniques
The Sum-of-Squared Error
We can now define the goal of clustering:
Goal: To divide a dataset of examples into c disjoint subsets D1, D2, …, Dc, so that the distance between examples within the same partition is small compared to the distance between examples in different partitions.
To achieve this, we define the c means by seeking to minimize a metric.
Metric
Let mi be the mean of examples on partition Di:
mi = (1 / ni) Σ_{x ∈ Di} x
Then the metric to minimize is the sum-of-squared errors:
Je = Σ_{i=1}^{c} Σ_{x ∈ Di} || x − mi ||²
where the inner sum runs over the examples x in Di and the index i runs over the c clusters.
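As a sketch (assuming NumPy arrays and k_means-style outputs as above; the function name is illustrative), the criterion can be computed as:

```python
import numpy as np

def sum_of_squared_error(X, labels, means):
    """Je = sum over clusters i of the sum over x in Di of ||x - mi||^2."""
    return sum(np.sum((X[labels == i] - m) ** 2) for i, m in enumerate(means))
```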
Figure 10.10
Others
Hierarchical clustering: clusters have subclusters, which in turn have subclusters, and so on.
Online clustering: as time goes on, new information may call for restructuring the clusters (plasticity), but we do not want this to happen too often (stability).
Figure 10.11
Unsupervised Learning and Clustering
k-means clustering
Sum-of-Squared Errors
Competitive Learning SOM
Pre-processing and Post-processing techniques
Vector Quantisation
Data will be represented with prototype vectors.
Feature Mapping
[Diagram: each input vector [x1, x2, x3, x4]^T is connected to a map neuron through a weight vector [w1, w2, w3, w4]^T.]
Feature Mapping
Each neuron's weight vector [w1, w2, w3, w4]^T is mapped into the same feature space as the inputs [x1, x2, x3, x4]^T.
SOM Algorithm
Initialization
Select the number of neurons in the map
Choose random values for all weights
Learning
Repeat: for each example, find the neuron whose weight vector is closest to the point:
min || x - w ||
SOM Algorithm
Winner takes all
Update the weights of the winning neuron only (and its neighbors).
SOM Algorithm
Update Weights
Update weights for the closest neuron and neighbors:
wt+1 = wt + η A(x, w) (x – w)
where η is the learning rate
Function A defines a neighboring function.
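A minimal sketch of one such update for a one-dimensional map, assuming a Gaussian neighborhood and using eta for the learning rate and sigma for the neighborhood width (these symbols and the function name are assumptions, not from the slides):

```python
import numpy as np

def som_step(x, W, eta=0.1, sigma=1.0):
    """One update of a 1-D map: x is a (d,) input, W an (m, d) array of weights."""
    # Find the winning neuron: the one whose weight vector is closest to x.
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    # Neighborhood function A: 1 at the winner, decaying with distance on the map.
    map_dist = np.abs(np.arange(len(W)) - winner)
    A = np.exp(-(map_dist ** 2) / (2 * sigma ** 2))
    # Move the winner and its neighbors toward the input.
    return W + eta * A[:, None] * (x - W)
```

In practice both eta and sigma are decayed over time, which matches the comment below that neighborhoods should start large and then shrink.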
SOM Algorithm
The neighboring function A gives the largest update to the winning neuron and smaller updates to neurons farther away on the map.
SOM Algorithm
Usage
For every test point
Select the closest neuron using minimum Euclidean distance:
min || x - w ||
Mapping a Grid to a Grid
SOM Algorithm
Comments
Neighborhoods should be large at the beginning but shrink as the nodes gain a specific ordering.
Global ordering emerges naturally (complexity theory).
Architecture of the map: too few nodes lead to underfitting; too many nodes lead to overfitting.
Teuvo Kohonen
• Born in 1934, Finland
• He has written several books and over 300 papers
• His most famous work is in Self-Organizing Maps
• Member of the Academy of Finland
Awards:
• IEEE Neural Networks Council Pioneer Award, 1991
• Technical Achievement Award of the IEEE, 1995
• Frank Rosenblatt Technical Field Award, 2008
Unsupervised Learning and Clustering
k-means clustering
Sum-of-Squared Errors
Competitive Learning SOM
Pre-processing and Post-processing techniques
Cluster Tendency
Cluster tendency is a preprocessing step that indicates whether the data objects exhibit a clustering structure. It precludes using clustering when the data appears to be randomly generated under a uniform distribution over a sampling window of interest in the attribute space.
Example Cluster Tendency
Clustering captures inherent data groups.
Clustering does not capture groups; the results come from random variations.
Example Cluster Tendency
Problem: How do we choose the sampling window?
Rule of thumb: Create a window centered at the mean that captures half the total number of examples.
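One way this rule of thumb might be implemented, as a sketch assuming an axis-aligned window centered at the mean (the function name and the per-example radius measure are assumptions, not from the slides):

```python
import numpy as np

def sampling_window(X):
    """Axis-aligned box centered at the mean that captures half the examples."""
    mean = X.mean(axis=0)
    # Per-example distance from the mean, taken as the largest coordinate offset.
    radii = np.max(np.abs(X - mean), axis=1)
    # Grow the box until it contains half of the examples.
    half_width = np.quantile(radii, 0.5)
    lo, hi = mean - half_width, mean + half_width
    inside = np.all((X >= lo) & (X <= hi), axis=1)
    return lo, hi, X[inside]
```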
Cluster Validation
Cluster validation is used to assess the value of the output of a clustering algorithm.
Internal: Statistics are devised to capture the quality of the induced clusters using only the available data objects.
External: If the validation is performed by gathering statistics that compare the induced clusters against an external and independent classification of the objects, the validation is called external.
Example Cluster Validation
Metrics for Cluster Validation
One type of statistical metric is defined in terms of a 2 × 2 table, where each entry counts the number of object pairs that agree or disagree on the class and the cluster to which they belong:
E11: same class, same cluster
E12: same class, different cluster
E21: different class, same cluster
E22: different class, different cluster
Example Metrics for Cluster Validation
Rand:
[ E11 + E22 ] / [ E11 + E12 + E21 + E22 ]
Jaccard:
E11 / [ E11 + E12 + E21 ]
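A sketch of these two statistics computed directly from the pair counts E11, E12, E21, E22 defined in the table above (function names are illustrative):

```python
from itertools import combinations

def pair_counts(classes, clusters):
    """Count object pairs that agree or disagree on class and on cluster."""
    E11 = E12 = E21 = E22 = 0
    for (c1, k1), (c2, k2) in combinations(zip(classes, clusters), 2):
        same_class, same_cluster = (c1 == c2), (k1 == k2)
        if same_class and same_cluster:
            E11 += 1
        elif same_class:
            E12 += 1
        elif same_cluster:
            E21 += 1
        else:
            E22 += 1
    return E11, E12, E21, E22

def rand_index(classes, clusters):
    E11, E12, E21, E22 = pair_counts(classes, clusters)
    return (E11 + E22) / (E11 + E12 + E21 + E22)

def jaccard_index(classes, clusters):
    E11, E12, E21, _ = pair_counts(classes, clusters)
    return E11 / (E11 + E12 + E21)
```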