Data Mining
Lecture # 11: Clustering
Cluster Analysis
What is Cluster Analysis?
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Finding similarities between data according to the
characteristics found in the data and grouping similar data
objects into clusters
What is Cluster Analysis?
• Cluster analysis is an important human activity
• Early in childhood, we learn how to distinguish between cats and dogs
• Unsupervised learning: no predefined classes
• Typical applications
– As a stand-alone tool to get insight into data distribution
– As a preprocessing step for other algorithms
Clustering
• Hard vs. Soft
– Hard: each object belongs to exactly one cluster
– Soft: an object can belong to more than one cluster, with a degree of membership in each
Clustering
• Flat vs. Hierarchical
– Flat: clusters form a single, unnested partition of the data
– Hierarchical: clusters form a tree
• Agglomerative
• Divisive
Clustering: Rich Applications and Multidisciplinary Efforts
• Pattern Recognition
• Spatial Data Analysis
– Create thematic maps in GIS by clustering feature spaces
– Detect spatial clusters and support other spatial mining tasks
• Image Processing
• Economic Science (especially market research)
• WWW
– Document classification
– Cluster Weblog data to discover groups of similar access patterns
Quality: What Is Good Clustering?
• A good clustering method will produce high quality clusters with
– high intra-class similarity
(Similar to one another within the same cluster)
– low inter-class similarity
(Dissimilar to the objects in other clusters)
What is Cluster Analysis?
• Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
[Figure: groups of points in which inter-cluster distances are maximized and intra-cluster distances are minimized]
Similarity and Dissimilarity Between Objects
• Distances are normally used to measure the similarity or
dissimilarity between two data objects
• Some popular ones include the Minkowski distance:

d(i, j) = (|x_i1 − x_j1|^q + |x_i2 − x_j2|^q + … + |x_ip − x_jp|^q)^(1/q)

where i = (x_i1, x_i2, …, x_ip) and j = (x_j1, x_j2, …, x_jp) are two p-dimensional data objects, and q is a positive integer
• If q = 1, d is the Manhattan distance:

d(i, j) = |x_i1 − x_j1| + |x_i2 − x_j2| + … + |x_ip − x_jp|
Similarity and Dissimilarity Between Objects (Cont.)
• If q = 2, d is Euclidean distance:
d(i, j) = sqrt(|x_i1 − x_j1|² + |x_i2 − x_j2|² + … + |x_ip − x_jp|²)

• Also, one can use weighted distance, parametric Pearson correlation, or other dissimilarity measures
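A minimal sketch of these distance measures in Python (NumPy assumed; the function name `minkowski` is illustrative, not part of the lecture):

```python
import numpy as np

def minkowski(x, y, q=2):
    """Minkowski distance between two p-dimensional points; q=1 gives Manhattan, q=2 Euclidean."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) ** q) ** (1.0 / q))

# Example on the medicine objects A = (1, 1) and D = (5, 4) used later in the lecture
print(minkowski([1, 1], [5, 4], q=1))   # Manhattan distance: 7.0
print(minkowski([1, 1], [5, 4], q=2))   # Euclidean distance: 5.0
```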
Major Clustering Approaches
• Partitioning approach:
– Construct various partitions and then evaluate them by some criterion, e.g., minimizing
the sum of square errors
– Typical methods: k-means, k-medoids, CLARANS
• Hierarchical approach:
– Create a hierarchical decomposition of the set of data (or objects) using some criterion
– Typical methods: AGNES, DIANA, BIRCH, ROCK, CHAMELEON
• Density-based approach:
– Based on connectivity and density functions
– Typical methods: DBSCAN, OPTICS, DenClue
Clustering Approaches
1. Partitioning Methods
2. Hierarchical Methods
3. Density-Based Methods
Partitioning Algorithms: Basic Concept
• Given k, find a partition into k clusters that optimizes the chosen
partitioning criterion
• k-means and k-medoids algorithms
• k-means (MacQueen’67): Each cluster is represented by the center of the cluster
• k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw’87): Each
cluster is represented by one of the objects in the cluster
The K-Means Clustering Method
• Given k, the k-means algorithm is implemented in four steps (a Python sketch follows below):
1. Partition objects into k nonempty subsets
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster)
3. Assign each object to the cluster with the nearest seed point
4. Go back to Step 2; stop when no new assignments are made
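A minimal from-scratch sketch of these four steps (NumPy assumed; `kmeans` and its arguments are illustrative names, not taken from the lecture):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: arbitrary initial partition, then alternate centroid update and reassignment."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = rng.integers(0, k, size=len(X))      # Step 1: partition objects into k subsets
    for _ in range(n_iter):
        # Step 2: compute seed points as the centroids (mean points) of the current clusters
        # (a fuller version would guard against empty clusters)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 3: assign each object to the cluster with the nearest seed point
        new_labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1)
        if np.array_equal(new_labels, labels):    # Step 4: stop when no assignments change
            break
        labels = new_labels
    return centroids, labels
```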
The K-Means Clustering Method
[Figure: K-means iterations on a 2-D example with K = 2 — arbitrarily choose K objects as the initial cluster centers, assign each object to the most similar center, update the cluster means, reassign objects, and repeat until assignments stop changing.]
Example
• Run K-means clustering with 3 clusters (initial centroids: 3, 16, 25) for at least 2 iterations
Example
• Centroids:
3 → {2, 3, 4, 7, 9}, new centroid: 5
16 → {10, 11, 12, 16, 18, 19}, new centroid: 14.33
25 → {23, 24, 25, 30}, new centroid: 25.5
Example
• Centroids:
5 → {2, 3, 4, 7, 9}, new centroid: 5
14.33 → {10, 11, 12, 16, 18, 19}, new centroid: 14.33
25.5 → {23, 24, 25, 30}, new centroid: 25.5
(the assignments no longer change, so the algorithm has converged)
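A quick check of these two iterations in Python; the data values and initial centroids are those listed in the example above:

```python
import numpy as np

data = np.array([2, 3, 4, 7, 9, 10, 11, 12, 16, 18, 19, 23, 24, 25, 30], dtype=float)
centroids = np.array([3.0, 16.0, 25.0])     # initial centroids

for it in range(2):
    labels = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)   # nearest centroid
    centroids = np.array([data[labels == j].mean() for j in range(3)])       # recompute the means
    print(f"iteration {it + 1}: centroids = {np.round(centroids, 2)}")

# iteration 1: centroids = [ 5.   14.33 25.5 ]
# iteration 2: centroids = [ 5.   14.33 25.5 ]
```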
In-class Practice
• Run K-means clustering with 3 clusters (initial centroids: 3, 12, 19) for at least 2 iterations
• Problem
Example
Suppose we have 4 types of medicines and each has two attributes (pH and
weight index). Our goal is to group these objects into K=2 groups of medicine.
Medicine  Weight  pH-Index
A         1       1
B         2       1
C         4       3
D         5       4

[Figure: scatter plot of medicines A, B, C, D in the weight–pH plane]
Example
• Step 1: Use initial seed points for partitioning: c1 = A, c2 = B
• Assign each object to the cluster with the nearest seed point (Euclidean distance), e.g. for D = (5, 4):

d(D, c1) = sqrt((5 − 1)² + (4 − 1)²) = 5
d(D, c2) = sqrt((5 − 2)² + (4 − 1)²) = 4.24

so D is assigned to cluster 2, giving the clusters {A} and {B, C, D}.
Example
• Step 2: Compute new centroids of the current partition. Knowing the members of each cluster, we compute the new centroid of each group based on these new memberships:

c1 = (1, 1)
c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3)
Example
• Step 2: Renew membership based on new centroids
Compute the distance of all objects to the new centroids
Assign the membership to objects
Example
• Step 3: Repeat the first two steps until convergence. Knowing the members of each cluster, we compute the new centroid of each group based on these new memberships:

c1 = ((1 + 2)/2, (1 + 1)/2) = (1½, 1)
c2 = ((4 + 5)/2, (3 + 4)/2) = (4½, 3½)
Example
• Step 3 (continued): Compute the distance of all objects to the new centroids and renew the memberships.
• Stop: there is no new assignment, since the membership in each cluster no longer changes.
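A short Python sketch that reruns this worked example end to end (the points and the seeds c1 = A, c2 = B come from the slides; variable names are illustrative):

```python
import numpy as np

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)   # A, B, C, D as (weight, pH-index)
centroids = X[[0, 1]].copy()                                   # initial seeds c1 = A, c2 = B

while True:
    # assign each object to the nearest centroid (Euclidean distance)
    labels = np.argmin(np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2), axis=1)
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])
    if np.allclose(new_centroids, centroids):                  # converged: centroids stop moving
        break
    centroids = new_centroids

print(labels)       # [0 0 1 1]  -> clusters {A, B} and {C, D}
print(centroids)    # [[1.5 1. ] [4.5 3.5]]
```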
Exercise
For the medicine data set, use K-means with the Manhattan distance metric for clustering analysis by setting K=2 and initialising seeds as C1 = A and C2 = C. Answer three questions as follows:
1. How many steps are required for convergence?
2. What are memberships of two clusters after convergence?
3. What are centroids of two clusters after convergence?
Medicine  Weight  pH-Index
A         1       1
B         2       1
C         4       3
D         5       4

[Figure: scatter plot of medicines A, B, C, D in the weight–pH plane]
Hierarchical Clustering
• Two main types of hierarchical clustering
– Agglomerative:
• Start with the points as individual clusters
• At each step, merge the closest pair of clusters until only one cluster (or k clusters) left
Matlab: Statistics Toolbox: clusterdata,
which performs all these steps: pdist, linkage, cluster
– Divisive:
• Start with one, all-inclusive cluster
• At each step, split a cluster until each cluster contains a point (or there are k clusters)
• Traditional hierarchical algorithms use a similarity or distance matrix
– Merge or split one cluster at a time
– Image segmentation mostly uses simultaneous merge/split
Hierarchical clustering
• Agglomerative (Bottom-up)
– Compute all pair-wise pattern-pattern similarity
coefficients
– Place each of n patterns into a class of its own
– Merge the two most similar clusters into one
• Replace the two clusters with the new cluster
• Re-compute inter-cluster similarity scores w.r.t. the new cluster
– Repeat the above step until there are k clusters left (k can be 1)
Hierarchical clustering
• Agglomerative (Bottom up)
[Figures: successive agglomerative iterations 1–5, each merging the two closest clusters, until finally k clusters are left]
Hierarchical clustering
• Divisive (Top-down)
– Start at the top with all patterns in one cluster
– The cluster is split using a flat clustering algorithm
– This procedure is applied recursively until each pattern is in its own singleton cluster
Hierarchical Clustering: The Algorithm
• Hierarchical clustering takes as input a set of points
• It creates a tree in which the points are leaves and the internal nodes reveal the similarity structure of the points.
– The tree is often called a “dendrogram.”
• The method is summarized below:
Place all points into their own clusters
While there is more than one cluster, do
Merge the closest pair of clusters
The behavior of the algorithm depends on how “closest pair
of clusters” is defined
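A compact sketch of this loop using SciPy's hierarchical-clustering routines (single linkage here; the 2-D points are made up purely for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# six toy points standing in for A..F
points = np.array([[0, 0], [0, 1], [4, 0], [4, 1], [9, 0], [9, 1]], dtype=float)

Z = linkage(points, method='single')               # repeatedly merge the closest pair of clusters
labels = fcluster(Z, t=3, criterion='maxclust')    # 'cut' the tree into 3 clusters
print(labels)                                      # e.g. [1 1 2 2 3 3]
# scipy.cluster.hierarchy.dendrogram(Z) would draw the merge tree (the "dendrogram")
```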
Hierarchical Clustering: Example
[Figure: six points A–F in Euclidean space and the dendrogram produced by merging them]
• This example illustrates single-link clustering in Euclidean space on 6 points.
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
– A tree like diagram that records the sequences of merges or splits
[Figure: a dendrogram over points 1–6 with merge heights between 0.05 and 0.2, alongside the corresponding nested clusters]
Strengths of Hierarchical Clustering
• Do not have to assume any particular number of clusters
– Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level
Hierarchical Clustering: Merging Clusters
Single Link: Distance between two clusters
is the distance between the closest points.
Also called “neighbor joining.”
Average Link: Distance between clusters is the average distance between all pairs of points, one from each cluster (centroid linkage instead uses the distance between the cluster centroids).
Complete Link: Distance between clusters
is distance between farthest pair of points.
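A small sketch of these three merging criteria as distances between two sets of points (NumPy assumed; function names are illustrative):

```python
import numpy as np

def pairwise(A, B):
    """All Euclidean distances between points of cluster A and points of cluster B."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_link(A, B):   return pairwise(A, B).min()    # closest pair of points
def complete_link(A, B): return pairwise(A, B).max()    # farthest pair of points
def average_link(A, B):  return pairwise(A, B).mean()   # average over all pairs

A = [[1, 1], [2, 1]]     # e.g. medicines A and B from the earlier example
B = [[4, 3], [5, 4]]     # medicines C and D
print(single_link(A, B), complete_link(A, B), average_link(A, B))   # ≈ 2.83, 5.0, 3.92
```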
How to Define Inter-Cluster Similarity
[Figure: candidate clusters over points p1…p5 and the corresponding proximity matrix]
Similarity?
• MIN
• MAX
• Group Average
• Distance Between Centroids
• Other methods driven by an objective function
– Ward’s Method uses squared error
An example
Let us consider a gene measured in a set of 5 experiments: A,B,C,D and E. The values measured in the 5 experiments are:
A=100 B=200 C=500 D=900 E=1100
We will construct the hierarchical clustering of these values using Euclidean distance, centroid linkage and an agglomerative approach.
An example
SOLUTION:
• The closest two values are 100 and 200 => the centroid of these two values is 150.
• Now we are clustering the values: 150, 500, 900, 1100.
• The closest two values are 900 and 1100 => the centroid of these two values is 1000.
• The remaining values to be joined are: 150, 500, 1000.
• The closest two values are 150 and 500 => the centroid of these two values is 325.
• Finally, the two resulting subtrees are joined in the root of the tree.
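A sketch that reproduces this agglomeration in Python (values A–E from the example; at each step the two clusters whose centroids are closest are merged, and the new centroid is the midpoint of the two merged centroids, as in the solution above):

```python
# each cluster is a (centroid, members) pair; start with every value in its own cluster
clusters = [(v, [name]) for name, v in zip("ABCDE", [100, 200, 500, 900, 1100])]

while len(clusters) > 1:
    # find the pair of clusters whose centroids are closest
    i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
               key=lambda ij: abs(clusters[ij[0]][0] - clusters[ij[1]][0]))
    centroid = (clusters[i][0] + clusters[j][0]) / 2          # centroid of the merged pair
    members = clusters[i][1] + clusters[j][1]
    print(f"merge {clusters[i][1]} + {clusters[j][1]} -> centroid {centroid}")
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [(centroid, members)]

# merge ['A'] + ['B'] -> centroid 150.0
# merge ['D'] + ['E'] -> centroid 1000.0
# merge ['C'] + ['A', 'B'] -> centroid 325.0
# merge ['D', 'E'] + ['C', 'A', 'B'] -> centroid 662.5
```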
An example: Two hierarchical clusterings of the expression values of a single gene measured in 5 experiments.
[Figure: two dendrograms over the leaves A=100, B=200, C=500, D=900, E=1100, drawn with different leaf orderings]
The dendrograms are identical: both diagrams show that:
• A is most similar to B
• C is most similar to the group (A, B)
• D is most similar to E
In the left dendrogram A and E are plotted far from each other; in the right dendrogram A and E are immediate neighbors.
THE PROXIMITY IN A HIERARCHICAL CLUSTERING DOES NOT NECESSARILY CORRESPOND TO SIMILARITY
What Is the Problem of the K-Means Method?
• The k-means algorithm is sensitive to outliers !
– Since an object with an extremely large value may substantially distort
the distribution of the data.
• K-Medoids: Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in the cluster.
Limitations of K-means: Differing Sizes
Original Points K-means (3 Clusters)
Limitations of K-means: Differing Density
Original Points K-means (3 Clusters)
Limitations of K-means: Non-globular Shapes
Original Points K-means (2 Clusters)
The K-Medoids Clustering Method
• Find representative objects, called medoids, in
clusters
• Medoids are located in the center of the
clusters.
– Given data points, how to find the medoid?
A Typical K-Medoids Algorithm (PAM)
[Figure: PAM with K = 2 — arbitrarily choose k objects as initial medoids; assign each remaining object to the nearest medoid (total cost = 20); randomly select a non-medoid object O_random; compute the total cost of swapping (total cost = 26); swap O and O_random if the quality is improved; repeat the loop until no change.]
The K-Medoid Clustering Method
• K-Medoids Clustering: Find representative objects (medoids) in clusters
– PAM (Partitioning Around Medoids, Kaufmann & Rousseeuw 1987)
• Starts from an initial set of medoids and iteratively replaces one of the
medoids by one of the non-medoids if it improves the total distance of
the resulting clustering
• PAM works effectively for small data sets, but does not scale well for
large data sets (due to the computational complexity)
• Efficiency improvement on PAM
– CLARA (Kaufmann & Rousseeuw, 1990): PAM on samples
– CLARANS (Ng & Han, 1994): Randomized re-sampling
K-medoids example (data points):

Xi    x   y
X1    2   6
X2    3   4
X3    3   8
X4    4   7
X5    6   2
X6    6   4
X7    7   3
X8    7   4
X9    8   5
X10   7   6
• Initialize k medoids
• Let us assume c1 = (3,4) and c2 = (7,4)
• Calculate distances so as to associate each data object with its nearest medoid.
Costs (Manhattan distances) to medoid c1 = (3,4):

i    Xi      Cost
1    (2,6)   3
3    (3,8)   4
4    (4,7)   4
5    (6,2)   5
6    (6,4)   3
7    (7,3)   5
9    (8,5)   6
10   (7,6)   6

Costs (Manhattan distances) to medoid c2 = (7,4):

i    Xi      Cost
1    (2,6)   7
3    (3,8)   8
4    (4,7)   6
5    (6,2)   3
6    (6,4)   1
7    (7,3)   1
9    (8,5)   2
10   (7,6)   2

Cluster1 = {(3,4), (2,6), (3,8), (4,7)}
Cluster2 = {(7,4), (6,2), (6,4), (7,3), (8,5), (7,6)}
Costs (Manhattan distances) to candidate medoid O′ = (7,3):

i    Xi      Cost
1    (2,6)   8
3    (3,8)   9
4    (4,7)   7
5    (6,2)   2
6    (6,4)   2
8    (7,4)   1
9    (8,5)   3
10   (7,6)   3

Costs (Manhattan distances) to medoid c1 = (3,4):

i    Xi      Cost
1    (2,6)   3
3    (3,8)   4
4    (4,7)   4
5    (6,2)   5
6    (6,4)   3
8    (7,4)   4
9    (8,5)   6
10   (7,6)   6
• Select one of the non-medoids O′. Let us assume O′ = (7,3).
• Now the candidate medoids are c1 = (3,4) and O′ = (7,3); the two tables above give the corresponding costs.
• Total cost with the swap = 3+4+4+2+2+1+3+3 = 22, versus 20 before, so S = 22 − 20 = 2.
• Do not change the medoid, as S > 0.
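A small sketch that recomputes this swap test in Python (points X1–X10 and the medoids come from the example; `total_cost` is an illustrative helper name):

```python
import numpy as np

points = np.array([[2, 6], [3, 4], [3, 8], [4, 7], [6, 2],
                   [6, 4], [7, 3], [7, 4], [8, 5], [7, 6]])    # X1 .. X10

def total_cost(medoids):
    """Sum over all points of the Manhattan distance to their nearest medoid."""
    d = np.abs(points[:, None, :] - np.asarray(medoids)[None, :, :]).sum(axis=2)
    return int(d.min(axis=1).sum())

before = total_cost([(3, 4), (7, 4)])    # initial medoids c1, c2
after = total_cost([(3, 4), (7, 3)])     # candidate swap: replace c2 with O' = (7, 3)
print(before, after, after - before)     # 20 22 2  -> S > 0, so keep the original medoids
```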
Fuzzy C-means Clustering
• Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters.
• This method (developed by Dunn in 1973 and improved by Bezdek in 1981) is frequently used in pattern recognition.
Fuzzy C-means Clustering
Reference: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html
Fuzzy C-means Clustering
• For example: we have initial centroids 3 & 11 (with m = 2)
• For node 2 (1st element):

U11 = 1 / ((|2−3|/|2−3|)² + (|2−3|/|2−11|)²) = 1 / (1 + 1/81) = 81/82 ≈ 98.78%
(the membership of the first node to the first cluster)

U12 = 1 / ((|2−11|/|2−3|)² + (|2−11|/|2−11|)²) = 1 / (81 + 1) = 1/82 ≈ 1.22%
(the membership of the first node to the second cluster)
Fuzzy C-means Clustering
• For example: we have initial centroids 3 & 11 (with m=2)
• For node 3 (2nd element):
U21 = 100% The membership of second node to first cluster
U22 = 0%
The membership of second node to second cluster
Fuzzy C-means Clustering
• For example: we have initial centroids 3 & 11 (with m = 2)
• For node 4 (3rd element):

U31 = 1 / ((|4−3|/|4−3|)² + (|4−3|/|4−11|)²) = 1 / (1 + 1/49) = 49/50 = 98%
(the membership of the third node to the first cluster)

U32 = 1 / ((|4−11|/|4−3|)² + (|4−11|/|4−11|)²) = 1 / (49 + 1) = 1/50 = 2%
(the membership of the third node to the second cluster)
Fuzzy C-means Clustering
• For example: we have initial centroids 3 & 11 (with m = 2)
• For node 7 (4th element):

U41 = 1 / ((|7−3|/|7−3|)² + (|7−3|/|7−11|)²) = 1 / (1 + 1) = 50%
(the membership of the fourth node to the first cluster)

U42 = 1 / ((|7−11|/|7−3|)² + (|7−11|/|7−11|)²) = 1 / (1 + 1) = 50%
(the membership of the fourth node to the second cluster)
Fuzzy C-means Clustering
• Updating the first centroid with the membership-weighted mean (m = 2):

C1 = (98.78%² × 2 + 100%² × 3 + 98%² × 4 + 50%² × 7 + …) / (98.78%² + 100%² + 98%² + 50%² + …)
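A minimal sketch of these fuzzy C-means updates in Python. Only the four values 2, 3, 4, 7 worked through above are used (the slide's "…" hints at further data points), so the resulting centroids are illustrative; a tiny epsilon avoids division by zero when a point coincides with a centroid:

```python
import numpy as np

x = np.array([2.0, 3.0, 4.0, 7.0])        # the four nodes worked through above
c = np.array([3.0, 11.0])                 # initial centroids
m = 2                                     # fuzzifier

def memberships(x, c, m):
    d = np.abs(x[:, None] - c[None, :]) + 1e-12                 # distances to each centroid
    return 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)

u = memberships(x, c, m)
print(np.round(u * 100, 2))               # rows ≈ [98.78, 1.22], [100, 0], [98, 2], [50, 50]

for _ in range(10):                       # alternate membership and centroid updates
    u = memberships(x, c, m)
    c = (u ** m).T @ x / (u ** m).sum(axis=0)
```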
FCM Soft Clustering
Gaussian Mixture Model
[Figures: two Gaussian component densities (Component 1 and Component 2) and the mixture model density p(x) they form; a further panel shows the individual component models alongside the resulting mixture.]
The EM Algorithm
• Dempster, Laird, and Rubin
– general framework for likelihood-based parameter estimation with missing data
• start with initial guesses of parameters
• E step: estimate memberships given params
• M step: estimate params given memberships
• Repeat until convergence
– converges to a (local) maximum of likelihood
– E step and M step are often computationally simple
– generalizes to maximum a posteriori (with priors)
[Figures: “ANEMIA PATIENTS AND CONTROLS” — scatter plots of red blood cell volume vs. red blood cell hemoglobin concentration, showing the Gaussian-mixture fit after EM iterations 1, 3, 5, 10, 15, and 25.]
Gaussian Mixture
• The Gaussian mixture architecture estimates probability density functions (PDF) for each class, and then performs classification based on Bayes’ rule:
P(Ci | X) = P(X | Ci) P(Ci) / P(X)

where P(X | Ci) is the PDF of class i, evaluated at X, P(Ci) is the prior probability for class i, and P(X) is the overall PDF, evaluated at X.
Gaussian Mixture
• Unlike the unimodal Gaussian architecture, which assumes P(X | Cj) to be in the form of a Gaussian, the Gaussian mixture model estimates P(X | Cj) as a weighted average of multiple Gaussians.
P(X | Cj) = Σ_{k=1..Nc} wk Gk

where wk is the weight of the k-th Gaussian Gk and the weights sum to one. One such PDF model is produced for each class.
Gaussian Mixture
• Each Gaussian component is defined as:
Gk = (1 / ((2π)^(n/2) |Vk|^(1/2))) · exp(−½ (X − Mk)ᵀ Vk⁻¹ (X − Mk))

where Mk is the mean of the Gaussian and Vk is the covariance matrix of the Gaussian.
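A brief sketch evaluating such a weighted sum of Gaussian components in Python (SciPy assumed; the means, covariances, and weights below are illustrative, not from the lecture):

```python
import numpy as np
from scipy.stats import multivariate_normal

# two illustrative components G1, G2 with weights w1, w2 (the weights sum to one)
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
weights = [0.6, 0.4]

def class_pdf(X):
    """P(X | Cj) = sum_k w_k G_k(X) for one class Cj."""
    return sum(w * multivariate_normal(mean=m, cov=V).pdf(X)
               for w, m, V in zip(weights, means, covs))

print(class_pdf(np.array([1.0, 1.0])))    # mixture density evaluated at a single point
```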
Gaussian Mixture
• Free parameters of the Gaussian mixture model consist of the means and covariance matrices of the Gaussian components and the weights indicating the contribution of each Gaussian to the approximation of P(X | Cj).
Composition of Gaussian Mixture
[Figure: one class (Class 1) modeled as five weighted Gaussian components G1,w1 … G5,w5]

P(Cj | X) = P(X | Cj) P(Cj) / P(X)

P(X | Cj) = Σ_{k=1..Nc} wk Gk

p(X | Gi) = (1 / ((2π)^(d/2) |Vi|^(1/2))) · exp(−½ (X − μi)ᵀ Vi⁻¹ (X − μi))

Variables: μi, Vi, wk
We use the EM (estimate-maximize) algorithm to approximate these variables.
Gaussian Mixture
• These parameters are tuned using a complex iterative procedure called the estimate-maximize (EM) algorithm, that aims at maximizing the likelihood of the training set generated by the estimated PDF.
Gaussian Mixture Training Flow Chart (1)
• Initialize the Gaussian means μi, i = 1, …, G using the K-means clustering algorithm.
• Initialize the covariance matrices, Vi, to the distance to the nearest cluster.
• Initialize the weights πi = 1/G so that all Gaussians are equally likely.
• Present each pattern X of the training set and model each of the classes K as a weighted sum of Gaussians:

p(X | s) = Σ_{i=1..G} πi p(X | Gi)

where G is the number of Gaussians, the πi's are the weights, and

p(X | Gi) = (1 / ((2π)^(d/2) |Vi|^(1/2))) · exp(−½ (X − μi)ᵀ Vi⁻¹ (X − μi))

where Vi is the covariance matrix.
Gaussian Mixture Training Flow Chart (2)
Compute (E-step for EM):

τip = p(Gi | Xp, Ck) = πi p(Xp | Gi, Ck) / Σ_{j=1..G} πj p(Xp | Gj, Ck)

Iteratively update the weights, means and covariances (M-step for EM):

πi(t+1) = (1/Nc) Σ_{p=1..Nc} τip(t)

μi(t+1) = (1 / (Nc πi(t+1))) Σ_{p=1..Nc} τip(t) Xp

Vi(t+1) = (1 / (Nc πi(t+1))) Σ_{p=1..Nc} τip(t) (Xp − μi(t)) (Xp − μi(t))ᵀ
Note: Here is the PE which we did in the class
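A compact sketch of these E-step/M-step updates for a two-component, 1-D mixture in Python (the data values and initialization are illustrative; for brevity the variance update uses the freshly re-estimated mean, as is standard):

```python
import numpy as np

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

x = np.array([2.0, 3.0, 4.0, 7.0, 10.0, 11.0, 12.0])   # illustrative 1-D training patterns
pi = np.array([0.5, 0.5])                               # weights
mu = np.array([3.0, 11.0])                              # means
var = np.array([1.0, 1.0])                              # variances (1-D "covariances")

for _ in range(25):
    # E-step: tau[p, i] = pi_i p(x_p | G_i) / sum_j pi_j p(x_p | G_j)
    tau = pi * gauss(x[:, None], mu, var)
    tau /= tau.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means and variances from the memberships
    Nk = tau.sum(axis=0)
    pi = Nk / len(x)
    mu = (tau * x[:, None]).sum(axis=0) / Nk
    var = (tau * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(np.round(pi, 2), np.round(mu, 2), np.round(var, 2))
```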
Gaussian Mixture Training Flow Chart (3)
Recompute τip using the new weights, means and covariances. Stop training if

|τip(t+1) − τip(t)| < threshold

or the number of epochs reaches the specified value. Otherwise, continue the iterative updates.
Gaussian Mixture Test Flow Chart
Present each input pattern X and compute the confidence for each class j:

P(Cj) p(X | Cj)

where P(Cj) = Ncj / N is the prior probability of class Cj, estimated by counting the number of training patterns in each class. Classify pattern X as the class with the highest confidence.
Gaussian Mixture End
Acknowledgements
Material in these slides has been taken from the following resources:
Introduction to Machine Learning, Alpaydin
“Pattern Classification” by Duda et al., John Wiley & Sons.
Read GMM from “Automated Detection of Exudates in Colored Retinal Images for Diagnosis of Diabetic Retinopathy”, Applied Optics, Vol. 51 No. 20, 4858-4866, 2012.