

ECE471-571 – Pattern Recognition

Lecture 12 – Unsupervised Learning (Clustering)

Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science
University of Tennessee, Knoxville
http://www.eecs.utk.edu/faculty/qi

Email: [email protected]

Pattern Classification

• Statistical Approach
  – Supervised
    • Basic concepts: Bayesian decision rule (MPP, LR, Discri.)
    • Parameter estimation (ML, BL)
    • Non-parametric learning (kNN)
    • LDF (Perceptron)
    • NN (BP)
    • Support Vector Machine
    • Deep Learning (DL)
  – Unsupervised
    • Basic concepts: Distance
    • Agglomerative method
    • k-means
    • Winner-takes-all
    • Kohonen maps
    • Mean-shift
  – Dimensionality Reduction: FLD, PCA
  – Performance Evaluation: ROC curve (TP, TN, FN, FP), cross validation
  – Classifier Fusion: majority voting, NB, BKS
  – Stochastic Methods: local opt (GD), global opt (SA, GA)
• Non-Statistical Approach
  – Decision-tree
  – Syntactic approach

Review - Bayes Decision Rule

Maximum Posterior Probability:

$P(\omega_j | x) = \frac{p(x|\omega_j) P(\omega_j)}{p(x)}$

For a given x, if P(ω1|x) > P(ω2|x), then x belongs to class 1, otherwise 2.

Likelihood Ratio:

If $\frac{p(x|\omega_1)}{p(x|\omega_2)} > \frac{P(\omega_2)}{P(\omega_1)}$, then x belongs to class 1, otherwise 2.

Discriminant Function: the classifier will assign a feature vector x to class $\omega_i$ if $g_i(x) > g_j(x)$.
• Case 1: Minimum Euclidean Distance (Linear Machine), $\Sigma_i = \sigma^2 I$
• Case 2: Minimum Mahalanobis Distance (Linear Machine), $\Sigma_i = \Sigma$
• Case 3: Quadratic classifier, $\Sigma_i$ = arbitrary

Estimate Gaussian (Maximum Likelihood Estimation, MLE), Two-modal Gaussian

Dimensionality reduction

Performance evaluation: ROC curve

Non-parametric kNN: for a given x, if $k_1/k > k_2/k$, then x belongs to class 1, otherwise 2.
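As a quick illustration of the MPP rule reviewed above, the sketch below classifies a scalar sample using two 1-D Gaussian class-conditional densities; the means, variances, and priors are assumed illustration values, not from the lecture.

```python
# A minimal sketch of the maximum posterior probability (MPP) rule for two classes.
from scipy.stats import norm

def classify_mpp(x, mu=(0.0, 2.0), sigma=(1.0, 1.0), prior=(0.5, 0.5)):
    """Return 1 or 2 depending on which posterior P(w_j|x) is larger."""
    # Compare p(x|w_j) * P(w_j); the evidence p(x) cancels in the comparison.
    post1 = norm.pdf(x, mu[0], sigma[0]) * prior[0]
    post2 = norm.pdf(x, mu[1], sigma[1]) * prior[1]
    return 1 if post1 > post2 else 2

print(classify_mpp(0.3))   # closer to the class-1 mean -> 1
print(classify_mpp(1.8))   # closer to the class-2 mean -> 2
```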


Unsupervised Learning

• What's unknown?
  – In the training set, which class does each sample belong to?
  – For the problem in general, how many classes (clusters) are appropriate?

Clustering Algorithm

• Agglomerative clustering
  – Step 1: assign each data point in the training set to a separate cluster
  – Step 2: merge the two "closest" clusters
  – Step 3: repeat Step 2 until you get the number of clusters you want or the appropriate cluster number
• The result is highly dependent on the measure of cluster distance
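A minimal sketch of this agglomerative procedure, assuming Euclidean distance between cluster centroids as the "closeness" measure; the single- and complete-link variants on the following slides would simply swap in a different cluster-distance function.

```python
# Naive agglomerative clustering: repeatedly merge the two clusters with the
# closest centroids until k clusters remain. For illustration only (O(n^3)-ish).
import numpy as np

def agglomerative(points, k):
    clusters = [[p] for p in points]              # Step 1: one cluster per point
    while len(clusters) > k:                      # Step 3: repeat until k clusters
        best = (None, None, np.inf)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(np.mean(clusters[i], axis=0) -
                                   np.mean(clusters[j], axis=0))
                if d < best[2]:
                    best = (i, j, d)
        i, j, _ = best                            # Step 2: merge the two closest
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

data = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9]])
print(agglomerative(list(data), 2))
```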

Distance from a Point to a Cluster

• Euclidean distance: $d_{euc}(x, A) = \| x - \mu_A \|$
• City block distance
• Squared Mahalanobis distance: $d_{mah}(x, A) = (x - \mu_A)^T \Sigma_A^{-1} (x - \mu_A)$
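A small sketch of these point-to-cluster distances, using the cluster's sample mean and sample covariance (the covariance is assumed nonsingular):

```python
import numpy as np

def d_euc(x, A):
    """Euclidean distance from point x to the centroid of cluster A (rows of A are points)."""
    return np.linalg.norm(x - A.mean(axis=0))

def d_city(x, A):
    """City block (L1) distance from x to the centroid of cluster A."""
    return np.abs(x - A.mean(axis=0)).sum()

def d_mah2(x, A):
    """Squared Mahalanobis distance; assumes cov(A) is invertible."""
    mu = A.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(A, rowvar=False))
    diff = x - mu
    return float(diff @ cov_inv @ diff)

A = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [1.5, 1.5]])
x = np.array([2.0, 2.0])
print(d_euc(x, A), d_city(x, A), d_mah2(x, A))
```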


Distance between Clusters

• The centroid distance: $d_{mean}(A, B) = \| \mu_A - \mu_B \|$
• Nearest neighbor measure: $d_{min}(A, B) = \min_{a, b} d_{euc}(a, b)$ for $a \in A,\ b \in B$
• Furthest neighbor measure: $d_{max}(A, B) = \max_{a, b} d_{euc}(a, b)$ for $a \in A,\ b \in B$
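The same three measures written as short functions; any of them could be dropped into the agglomerative sketch above in place of the centroid distance:

```python
import numpy as np
from scipy.spatial.distance import cdist

def d_mean(A, B):
    """Centroid distance between clusters A and B (rows are points)."""
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

def d_min(A, B):
    """Nearest-neighbor (single-link) distance."""
    return cdist(A, B).min()

def d_max(A, B):
    """Furthest-neighbor (complete-link) distance."""
    return cdist(A, B).max()

A = np.array([[0.0, 0.0], [1.0, 1.0]])
B = np.array([[4.0, 4.0], [6.0, 5.0]])
print(d_mean(A, B), d_min(A, B), d_max(A, B))
```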

Example

[Figure: three clusters A, B, and C, illustrating the nearest-neighbor distance dmin and the furthest-neighbor distance dmax between them]

Minimum Spanning Tree

Step 1: compute all edges in the graph
Step 2: sort the edges by length
Step 3: beginning with the shortest edge, for each edge between nodes u and v, perform the following operations:
  – Step 3.1: A = find(u) (A is the cluster where u is in)
  – Step 3.2: B = find(v) (B is the cluster where v is in)
  – Step 3.3: if (A != B), C = union(A, B), and erase sets A and B
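A minimal sketch of this Kruskal-style procedure with a simple union-find; stopping once the desired number of clusters remains is an assumed stopping rule, since the slide only describes the merge loop.

```python
# Single-linkage clustering via MST construction (Kruskal) with union-find.
import numpy as np
from itertools import combinations

def mst_clusters(points, k):
    parent = list(range(len(points)))

    def find(i):                      # find the cluster (root) containing node i
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Steps 1-2: all pairwise edges, sorted by length
    edges = sorted(combinations(range(len(points)), 2),
                   key=lambda e: np.linalg.norm(points[e[0]] - points[e[1]]))

    n_clusters = len(points)
    for u, v in edges:                # Step 3: shortest edge first
        a, b = find(u), find(v)       # Steps 3.1-3.2
        if a != b:                    # Step 3.3: union the two clusters
            parent[b] = a
            n_clusters -= 1
            if n_clusters == k:       # assumed stopping rule
                break
    return [find(i) for i in range(len(points))]   # cluster label per point

pts = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 9]])
print(mst_clusters(pts, 3))
```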


Comparison of Shape of Clusters

• dmin tends to choose clusters which are ??
• dmax tends to choose clusters which are ??


The k-means Algorithm

Step 1: Begin with an arbitrary assignment of samples to clusters, or begin with an arbitrary set of cluster centers and assign samples to the nearest clusters
Step 2: Compute the sample mean of each cluster
Step 3: Reassign each sample to the cluster with the nearest mean
Step 4: If the classification of all samples has not changed, stop; else go to Step 2
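A compact sketch of these four steps, starting from k randomly chosen data points as the initial centers (one of the two initializations the slide allows); empty clusters are not handled, as this is for illustration only.

```python
import numpy as np

def kmeans(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # Step 1: arbitrary centers
    labels = None
    while True:
        # Step 3 (and the initial assignment): label each sample by the nearest mean
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return centers, new_labels                        # Step 4: no change -> stop
        labels = new_labels
        # Step 2: recompute the sample mean of each cluster
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centers, labels = kmeans(X, k=2)
print(centers)
```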


Winner-take-all Approach

• Begin with an arbitrary set of cluster centers wi
• For each sample x, find the nearest cluster center wa, which is called the winner
• Modify wa using $w_a^{new} = w_a^{old} + \varepsilon (x - w_a^{old})$
  – ε is known as a "learning parameter"
  – Typical values of this parameter are small, on the order of 0.01
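A short sketch of this update rule, streaming over the samples for a few passes; the number of passes and the default learning rate of 0.01 are illustration choices.

```python
import numpy as np

def winner_take_all(X, k, eps=0.01, passes=20, seed=0):
    rng = np.random.default_rng(seed)
    w = X[rng.choice(len(X), size=k, replace=False)].copy()   # arbitrary centers w_i
    for _ in range(passes):
        for x in X:
            a = np.linalg.norm(x - w, axis=1).argmin()        # the winner w_a
            w[a] += eps * (x - w[a])                          # w_a_new = w_a_old + eps*(x - w_a_old)
    return w

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 4])
print(winner_take_all(X, k=2))
```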

Winner-take-all

[Figure: the winner wa is moved toward the sample x along the vector x − wa]

*Kohonen Feature Maps (NN)

• An extension of the winner-take-all algorithm. Also called self-organizing feature maps
• A problem-dependent topological distance is assumed to exist between each pair of the cluster centers
• When the winning cluster center is updated, so are its neighbors in the sense of this topological distance


*SOM – A Demo

[Figure: cluster centers w1–w4 arranged along a topological chain; the winner wa and its neighbors are pulled toward the sample x along x − wa]

SOM – The Algorithm

The winning cluster center and its neighbors are trained based on the following formula:

$\omega_r^{k+1} = \omega_r^{k} + \Phi\, \varepsilon_k \,(x - \omega_r^{k})$

Learning rate (as k increases, ε decreases, so training becomes more stable):

$\varepsilon_k = \varepsilon_{max} \left( \frac{\varepsilon_{min}}{\varepsilon_{max}} \right)^{k / k_{max}}$

Neighborhood function (the closer the neighbor in the topological sense, the more it will be affected):

$\Phi = \exp\!\left( - \frac{\| \omega_{g_r} - \omega_{g_{winner}} \|^2}{2 \sigma_k^2} \right)$

where $\omega_r$ are the cluster centers, $\omega_{g_r}$ are the coordinates of the cluster centers in the topological map, and $\omega_{g_{winner}}$ is the coordinate of the winner.
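A sketch of one SOM training step under these formulas, for centers indexed on a 1-D chain so that the topological coordinate of center r is simply its index; the grid layout, the fixed σ, and the ε bounds are assumptions for illustration.

```python
import numpy as np

def som_step(w, x, k, k_max, eps_min=0.01, eps_max=0.5, sigma=1.0):
    """One update of a 1-D SOM: w is (n_centers, dim), x is one sample."""
    eps_k = eps_max * (eps_min / eps_max) ** (k / k_max)       # decaying learning rate
    winner = np.linalg.norm(x - w, axis=1).argmin()            # index of the winner
    grid = np.arange(len(w))                                   # topological coordinates
    phi = np.exp(-((grid - winner) ** 2) / (2 * sigma ** 2))   # neighborhood weights
    return w + (phi * eps_k)[:, None] * (x - w)                # move winner and neighbors

w = np.random.randn(5, 2)
for k, x in enumerate(np.random.randn(200, 2) + 3):
    w = som_step(w, x, k, k_max=200)
print(w)
```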

SOM - An Example


http://www.ai-junkie.com/ann/som/som1.html


*Mean Shift Clustering

• Originally proposed in 1975 by Fukunaga and Hostetler for mode detection
• Cheng's generalization to solve clustering problems in 1995
• Non-parametric – no prior knowledge needed about the number of clusters
• Key parameter: window size
• Challenging issues:
  – How to determine the right window size?
  – Slow convergence

Mean Shift Clustering

1. Initialization: Choose a window/kernel of size h, e.g., a flat kernel,

$K(x) = \begin{cases} 1 & \text{if } \|x\| \le h \\ 0 & \text{if } \|x\| > h \end{cases}$

and apply the window on each data point x.

2. Mean calculation: Within each window centered at x, compute the mean of the data,

$m(x) = \frac{\sum_{s \in W_x} K(s - x)\, s}{\sum_{s \in W_x} K(s - x)}$

where $W_x$ is the set of data points enclosed within the window of size h.

3. Mean shift: Shift the window to the mean, i.e., x = m(x), where the difference m(x) − x is referred to as the mean shift.

4. If $\| m(x) - x \| > \epsilon$, go back to Step 2.
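A minimal sketch of this procedure with the flat kernel: each point is shifted to its window mean until convergence, and points whose modes land close together are grouped. The final grouping tolerance is an assumed post-processing detail, not part of the slide.

```python
import numpy as np

def mean_shift(X, h, eps=1e-3, max_iter=100):
    modes = X.copy()
    for i, x in enumerate(X):
        for _ in range(max_iter):
            in_window = np.linalg.norm(X - x, axis=1) <= h    # flat kernel K of size h
            m = X[in_window].mean(axis=0)                     # Step 2: window mean m(x)
            if np.linalg.norm(m - x) <= eps:                  # Step 4: converged
                break
            x = m                                             # Step 3: shift to the mean
        modes[i] = x
    # Group points whose modes coincide (tolerance h/2 is an assumed choice)
    labels, centers = [], []
    for m in modes:
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < h / 2:
                labels.append(j)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return np.array(labels), np.array(centers)

X = np.vstack([np.random.randn(60, 2), np.random.randn(60, 2) + 6])
labels, centers = mean_shift(X, h=2.0)
print(len(centers), "clusters found")
```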


[Figure: color-clustering example – original image and RGB plot; k-means with k = 10, 24, 64; random vs. uniform initialization with k = 6; mean shift with h = 30 (14 clusters) and h = 20 (35 clusters)]