7/31/2019 Chap10.Part1
Unsupervised Learning and Clustering
Why consider unlabeled samples?

1. Collecting and labeling a large set of samples is costly: getting recorded speech is free, but labeling it is time consuming.
2. A classifier could be designed on a small set of labeled samples and tuned on a large unlabeled set.
3. Train on a large unlabeled set and use supervision on the groupings found.
4. Characteristics of patterns may change with time.
5. Unsupervised methods can be used to find useful features.
6. Exploratory data analysis may discover the presence of significant subclasses affecting the design.
Gradient Ascent for Mixtures
Mixture density:
$$p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j)$$

Likelihood of the observed samples:
$$p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)$$

Log-likelihood:
$$l = \sum_{k=1}^{n} \ln p(x_k \mid \theta)$$

Gradient w.r.t. $\theta_i$:
$$\nabla_{\theta_i} l = \sum_{k=1}^{n} \frac{1}{p(x_k \mid \theta)} \nabla_{\theta_i} \left[ \sum_{j=1}^{c} p(x_k \mid \omega_j, \theta_j)\, P(\omega_j) \right] = \sum_{k=1}^{n} P(\omega_i \mid x_k, \theta)\, \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \theta_i)$$

The MLE $\hat\theta$ must satisfy:
$$\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\theta)\, \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \hat\theta_i) = 0$$
Gaussian Mixture
With unknown mean vectors $\hat\mu = (\hat\mu_1, \ldots, \hat\mu_c)^t$, this yields
$$\hat\mu_i = \frac{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\mu)\, x_k}{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\mu)}$$

leading to an iterative scheme for improving the estimates:
$$\hat\mu_i(j+1) = \frac{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\mu(j))\, x_k}{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat\mu(j))}$$
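The iterative mean-update scheme can be sketched directly in NumPy. This is a minimal illustration, assuming unit-covariance Gaussian components and known priors (both assumptions are mine, made so that only the means are updated, as on the slide):

```python
import numpy as np

def update_means(X, mu, priors, n_iter=50):
    """Iterate the mixture mean-update equation, assuming unit-covariance
    Gaussian components and known priors P(omega_i)."""
    for _ in range(n_iter):
        # Squared distances from each sample to each mean: shape (n, c).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        # Proportional to p(x_k | omega_i, mu_i) P(omega_i); the Gaussian
        # normalizing constant cancels when we normalize below.
        lik = np.exp(-0.5 * d2) * priors
        post = lik / lik.sum(axis=1, keepdims=True)   # P(omega_i | x_k, mu)
        # mu_i(j+1) = sum_k P(omega_i|x_k) x_k / sum_k P(omega_i|x_k)
        mu = (post.T @ X) / post.sum(axis=0)[:, None]
    return mu

# Two 1-D clumps near 0 and 5; means pull toward the clump centers.
X = np.array([[-0.1], [0.0], [0.1], [4.9], [5.0], [5.1]])
mu = update_means(X, np.array([[1.0], [4.0]]), np.array([0.5, 0.5]))
```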
k-means clustering
The Gaussian case with all parameters unknown leads to the formulation:

begin initialize n, c, μ1, μ2, ..., μc
    do classify the n samples according to the nearest μi
       recompute the μi
    until no change in the μi
end
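The procedure above translates almost line for line into code. A minimal sketch (the random initialization and the empty-cluster fallback are my additions, not part of the slide):

```python
import numpy as np

def k_means(X, c, mu0=None, max_iter=100):
    """Plain k-means: classify each sample by its nearest mean,
    recompute the means, repeat until no change."""
    if mu0 is None:
        # Initialize with c distinct samples chosen at random.
        mu = X[np.random.default_rng(0).choice(len(X), size=c, replace=False)]
    else:
        mu = np.asarray(mu0, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Classify the n samples according to the nearest mu_i.
        labels = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1), axis=1)
        # Recompute mu_i; keep the old mean if a cluster lost all samples.
        new_mu = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else mu[i]
                           for i in range(c)])
        if np.allclose(new_mu, mu):   # until no change in mu_i
            break
        mu = new_mu
    return mu, labels
```

For example, two well-separated clumps are recovered as two clusters when the initial means start near them.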
k-means clustering with one feature

One-dimensional example: six of the starting points lead to local maxima, whereas the two for which μ1(0) = μ2(0) lead to a saddle point.
k-means clustering with two features

Two-dimensional example: there are three means and three steps in the iteration. Voronoi tessellations based on the means are shown.
Data sets having identical statistics up to second order, i.e., the same μ and Σ
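Second-order statistics do not pin down a data set. A tiny hand-picked illustration (the specific values are mine): two 1-D samples with the same mean and variance but visibly different structure.

```python
import numpy as np

# Two 1-D "data sets" with identical first- and second-order statistics
# (mean 0, variance 1) but different shapes: two tight clumps versus a
# heavy center with two outlying points.
A = np.array([-1.0, -1.0, 1.0, 1.0])
B = np.array([-np.sqrt(2), 0.0, 0.0, np.sqrt(2)])

print(A.mean(), B.mean())   # both 0.0
print(A.var(), B.var())     # both 1.0
```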
Similarity Measures
Two issues:
1. How to measure similarity between samples?
2. How to evaluate a partitioning?

If distance is a good measure of dissimilarity, the distance between samples in the same cluster must be smaller than the distance between samples in different clusters.

Two samples belong to the same cluster if the distance between them is less than a threshold d0.

The distance threshold affects the number and size of clusters.
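The threshold rule above can be sketched as grouping samples into connected components, linking any pair closer than d0 (the union-find bookkeeping is my choice of implementation, not from the slides):

```python
import numpy as np

def threshold_clusters(X, d0):
    """Link two samples whenever their distance is below d0, and take
    connected components as clusters. d0 controls both the number and
    the size of the resulting clusters."""
    n = len(X)
    parent = list(range(n))

    def find(i):
        # Find the component representative, with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(X[i] - X[j]) < d0:
                parent[find(i)] = find(j)   # merge the two components
    return [find(i) for i in range(n)]
```

With a small d0 the four points below form two clusters; with a large d0 they collapse into one, illustrating the threshold's effect.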
Binary Feature Similarity Measures

$$s(x, x') = \frac{x^t x'}{\|x\|\,\|x'\|}$$

Numerator: the number of attributes possessed by both $x$ and $x'$. Denominator: $(x^t x \; x'^t x')^{1/2}$ is the geometric mean of the number of attributes possessed by $x$ and by $x'$.

$$s(x, x') = \frac{x^t x'}{d}$$

Fraction of the $d$ attributes shared.

Tanimoto coefficient: the ratio of the number of shared attributes to the number possessed by $x$ or $x'$:

$$s(x, x') = \frac{x^t x'}{x^t x + x'^t x' - x^t x'}$$
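The three measures are one line each for 0/1 attribute vectors. A small sketch (function name and packaging are mine):

```python
import numpy as np

def binary_similarities(x, xp):
    """The three binary-feature similarity measures for 0/1 vectors x, x'."""
    shared = float(x @ xp)          # attributes possessed by both x and x'
    d = len(x)
    cosine = shared / np.sqrt((x @ x) * (xp @ xp))   # x^t x' / (||x|| ||x'||)
    fraction = shared / d                            # x^t x' / d
    tanimoto = shared / (x @ x + xp @ xp - shared)   # shared / possessed by either
    return cosine, fraction, tanimoto
```

For x = (1,1,0,0) and x' = (1,0,1,0) there is one shared attribute, giving 1/2, 1/4, and 1/3 respectively.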
Hierarchical Clustering
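Only the heading survived extraction here, so as a hedged sketch of the standard agglomerative procedure (start from singleton clusters and repeatedly merge the two nearest, here measured by single linkage; the implementation details are mine):

```python
import numpy as np

def agglomerate(X, c_target):
    """Agglomerative hierarchical clustering sketch: begin with n singleton
    clusters and merge the two nearest clusters (single linkage: minimum
    pairwise distance) until c_target clusters remain."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > c_target:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)   # merge the two nearest clusters
    return clusters
```

Cutting the merge sequence at different cluster counts gives the levels of the hierarchy.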
Nearest Neighbor Algorithm
How to determine nearest clusters
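The usual cluster-distance choices can be written down directly; a sketch of the four common ones (minimum, maximum, average pairwise distance, and distance between means):

```python
import numpy as np

# D_i, D_j are arrays of samples, one row per sample.
def d_min(Di, Dj):   # nearest neighbor / single linkage
    return min(np.linalg.norm(x - y) for x in Di for y in Dj)

def d_max(Di, Dj):   # farthest neighbor / complete linkage
    return max(np.linalg.norm(x - y) for x in Di for y in Dj)

def d_avg(Di, Dj):   # average of all pairwise distances
    return np.mean([np.linalg.norm(x - y) for x in Di for y in Dj])

def d_mean(Di, Dj):  # distance between the cluster means
    return np.linalg.norm(Di.mean(axis=0) - Dj.mean(axis=0))
```

Using d_min in the merge step gives the nearest-neighbor algorithm above; swapping in d_max, d_avg, or d_mean changes which clusters are considered nearest.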