
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Memorial University of Newfoundland

Pattern Recognition Lecture 3

May 9, 2006

http://www.engr.mun.ca/~charlesr

Office Hours: Tuesdays & Thursdays 8:30 - 9:30 PM

EN-3026


9881 Project Deliverable #1

• Report topic due tonight via email!

• Two to three sentences regarding the nature of the problem you will address in your project

• You will be penalized for missing deadlines



Recap - Univariate Normal Distribution

Review (Week 2):

p(x) = p(x_1, x_2, \ldots, x_n)

mean: E[x] = \mu

covariance: E[(x - \mu)(x - \mu)^T] = \Sigma, with entries \Sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]

Univariate normal distribution:

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}

[Figure 2.7 from Duda, Hart, and Stork, Pattern Classification (Wiley, 2001): a univariate normal distribution has roughly 95% of its area in the range |x - \mu| \le 2\sigma; the peak of the distribution has value p(\mu) = 1/(\sqrt{2\pi}\,\sigma).]


Recap - Multivariate Normal Distributions

Multivariate normal distribution:

p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}
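To make the recap concrete, here is a minimal sketch (not from the slides; the names univariate_normal, multivariate_normal, mu, sigma, and Sigma are my own) that evaluates both densities with NumPy:

import numpy as np

def univariate_normal(x, mu, sigma):
    # p(x) = 1 / (sqrt(2*pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def multivariate_normal(x, mu, Sigma):
    # p(x) = exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu)) / ((2*pi)^(n/2) * |Sigma|^(1/2))
    n = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

# The peak of the univariate density is p(mu) = 1 / (sqrt(2*pi) * sigma), as in Figure 2.7.
print(univariate_normal(0.0, mu=0.0, sigma=2.0))
print(1 / (np.sqrt(2 * np.pi) * 2.0))

# A 2-D example with uncorrelated features (diagonal covariance).
print(multivariate_normal(np.array([1.0, 1.0]), np.array([0.0, 0.0]), np.diag([1.0, 4.0])))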


Transformations of Random Variables

Distance Based Classification

• Distance based classification is the most common type of pattern recognition technique

• Concepts are a basis for other classification techniques

[Figure: two classes C1 and C2 in the (x1, x2) feature space, with a new pattern x to be classified ("?").]

Distance Based Classification

• First we will look at choosing a class prototype

• A prototype is a sample or pattern which represents the class

• Then we will look at how to calculate the distance from a new pattern that we are trying to classify to the class using the prototype

Distance Based Classification

We use a pattern-to-class distance measure:

x \in C_1 \quad \text{iff} \quad d(x, C_1) < d(x, C_2)

where d(x, C_1) is the distance from x to C_1. To find the distance, use a class prototype z_i (a pattern):

d(x, C_1) = d(x, z_1)

During training we have labeled samples, such that

C_1 = \{x_i \mid x_i \in C_1;\; i = 1, 2, \ldots, N_1\}

C_2 = \{x_j \mid x_j \in C_2;\; j = 1, 2, \ldots, N_2\}

Assume that N_1, N_2 \gg d, where d is the number of dimensions (and features!). Basic rule of thumb: require that N > 10d.

Consider a 2 feature example (d = 2) with two well-separated classes. This is very idealized compared with real practical problems.

[Figure: two well-separated classes C1 and C2 in the (x1, x2) feature space.]

Choosing a Prototype

1. Sample Mean

For class C_i the sample mean is

m_i = \frac{1}{N_i} \sum_{x \in C_i} x

The advantage of the mean is that it minimizes the representation error of the class.

The mean probably does not correspond to the location of any collected sample.
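As an illustration (my own sketch, not from the slides), the sample-mean prototype in NumPy, using the two training sets that appear in the MED example later in this lecture:

import numpy as np

# Labeled training samples for the two classes (the example used later in the lecture).
C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
C2 = np.array([[4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)

# Sample-mean prototype m_i = (1/N_i) * sum of the samples in class C_i.
m1 = C1.mean(axis=0)   # [4., 0.]
m2 = C2.mean(axis=0)   # [8., 2.]
print(m1, m2)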

Minimizing representation error

The mean minimizes the representation error

err = \sum_{x \in C_i} |x - z_i|^2 = \sum_{x \in C_i} (x - z_i)^T (x - z_i)

\frac{\partial\, err}{\partial z_i} = -2 \sum_{x \in C_i} (x - z_i)

To minimize, set it to 0:

\sum_{x \in C_i} (x - z_i) = 0

\sum_{x \in C_i} x - N_i z_i = 0

z_i = \frac{1}{N_i} \sum_{x \in C_i} x

which is the mean.

Note that the mean square representation error is

MSE = \frac{1}{N_i} \sum_{x \in C_i} |x - m_i|^2 = \frac{1}{N_i} \sum_{x \in C_i} \sum_{j=1}^{d} (x_j - m_{ij})^2 = \frac{1}{N_i} \sum_{j=1}^{d} \sum_{x \in C_i} (x_j - m_{ij})^2 = \frac{1}{N_i} \sum_{j=1}^{d} N_i s_j = \sum_{j=1}^{d} s_j

where s_j comes from the sample covariance matrix.

What is the sample covariance matrix?
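A quick numerical check of this derivation (my own sketch, reusing the class C1 defined above): the summed squared error is smallest at the sample mean, and the mean square representation error equals the sum of the per-feature variances s_j.

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
m1 = C1.mean(axis=0)

def representation_error(samples, z):
    # err = sum over the class of |x - z|^2 for a candidate prototype z
    return np.sum(np.sum((samples - z) ** 2, axis=1))

# The error at the mean is smaller than at a perturbed prototype.
print(representation_error(C1, m1))                 # 34.0
print(representation_error(C1, m1 + [1.0, 0.5]))    # 39.0

# MSE = err(mean) / N_i equals the sum of the (1/N) per-feature variances.
mse = representation_error(C1, m1) / len(C1)
print(mse, C1.var(axis=0).sum())                    # 8.5, 8.5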

The sample covariance matrix is

S = \frac{1}{N} \sum_{k=1}^{N} (x_k - m)(x_k - m)^T

or

S = \frac{1}{N} \sum_{x \in C_i} (x - m)(x - m)^T

with i, j entry

s_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{ki} - m_i)(x_{kj} - m_j) = \frac{1}{N} \sum_{k=1}^{N} x_{ki} x_{kj} - m_i m_j

and

s_j \equiv s_{jj} = \text{the feature variances}
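A small sketch (mine, not from the notes) of the sample covariance matrix with the 1/N normalization used above; np.cov defaults to 1/(N-1), so bias=True is passed to match.

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
m = C1.mean(axis=0)
N = len(C1)

# S = (1/N) * sum_k (x_k - m)(x_k - m)^T
S = (C1 - m).T @ (C1 - m) / N
print(S)                        # diagonal entries s_jj are the feature variances
print(np.cov(C1.T, bias=True))  # same matrix via NumPy (bias=True -> 1/N)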

Choosing a Prototype

2. Most Typical Sample

The sample which is most similar to the class mean.

Choose z_i = x \in C_i such that d(x, m_i) is minimized.

Choosing a Prototype

3. Nearest Neighbour

Choose z_i = x such that d(y, x) is minimized. Here x is a sample from the class, and y is the new sample we are trying to classify. Thus the prototype depends on the location of the pattern we are classifying.

If x \in C_1 then classify y as C_1. Or if x \in C_2 then classify y as C_2.

Nearest neighbour prototypes are sensitive to noise and outliers in the training set.

Choosing a Prototype

4. k-Nearest Neighbours

The pattern y is classified in the class of its k nearest neighbours from the training samples. The chosen distance determines how 'near' is defined.

This gives some protection against noise, but is more computationally expensive.

[Figure: a new pattern y with its k = 3 nearest neighbours, drawn from classes C_i and C_j.]
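To illustrate the nearest-neighbour and k-nearest-neighbour rules (a brute-force sketch with my own function names, not course code):

import numpy as np

def knn_classify(y, samples, labels, k=1):
    # Majority vote among the k training samples nearest to y (Euclidean distance).
    # With k = 1 this is the nearest-neighbour rule; ties go to the lowest class index.
    dists = np.linalg.norm(samples - y, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.argmax(np.bincount(labels[nearest]))

samples = np.array([[0, 0], [4, 1], [4, -1], [8, 0],
                    [4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 0 -> C1, 1 -> C2

print(knn_classify(np.array([4.0, 3.0]), samples, labels, k=1))   # 1, i.e. C2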

Summary

• First step of distance-based classification is to choose a prototype

• Options include:
  • Sample mean
  • Most typical sample
  • Nearest neighbour
  • k-Nearest neighbours

Distance Measures

Most familiar distance metric is the Euclidean distance:

d_E(x, z_i) = \left( \sum_{j=1}^{d} (x_j - z_{ij})^2 \right)^{1/2}

There are many possibilities for distance measurements. Another example is the Manhattan distance:

d_M(x, z_i) = \sum_{j=1}^{d} |x_j - z_{ij}|
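A minimal sketch (my own, not from the slides) of the two distances written exactly as the sums above:

import numpy as np

def euclidean(x, z):
    # d_E(x, z) = ( sum_j (x_j - z_j)^2 )^(1/2)
    return np.sqrt(np.sum((x - z) ** 2))

def manhattan(x, z):
    # d_M(x, z) = sum_j |x_j - z_j|
    return np.sum(np.abs(x - z))

x = np.array([4.0, 3.0])
z = np.array([8.0, 2.0])
print(euclidean(x, z))   # sqrt(17), about 4.123
print(manhattan(x, z))   # 5.0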

Anything that fulfills the following four properties can be a metric:

1. Identity: d(x, z) = 0 iff x = z

2. Non-negativity: d(x, z) \ge 0

3. Symmetry: d(x, z) = d(z, x)

4. Triangle inequality: d(x, z) \le d(x, y) + d(y, z)

Clearly the Euclidean distance is a metric, but so is a more general weighted metric:

d_w(x, z) = \left( \sum_{j=1}^{d} (w_j (x_j - z_j))^2 \right)^{1/2}

where the difference in the jth feature is weighted by w_j.

Minimum Euclidean Distance (MED) Classifier

Given classes C1 and C2 with prototypes z1 and z2 respectively:

x \in C_1 \quad \text{if} \quad d_E(x, z_1) < d_E(x, z_2)

Equivalently:

(x - z_1)^T (x - z_1) < (x - z_2)^T (x - z_2)
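A minimal MED classifier sketch (my own wording of the rule above; the function and variable names are illustrative):

import numpy as np

def med_classify(x, z1, z2):
    # Assign x to C1 if d_E(x, z1) < d_E(x, z2), otherwise to C2.
    # Comparing squared distances (x - z)^T (x - z) gives the same decision.
    d1 = (x - z1) @ (x - z1)
    d2 = (x - z2) @ (x - z2)
    return "C1" if d1 < d2 else "C2"

# Class means from the example that follows, used here as prototypes.
z1 = np.array([4.0, 0.0])
z2 = np.array([8.0, 2.0])
print(med_classify(np.array([4.0, 3.0]), z1, z2))   # C1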

[Figure: classes C1 and C2 in the (x1, x2) feature space with means m1 and m2; the new pattern y lies closer to m1.]

d_E(y, m_1) < d_E(y, m_2)

Classify pattern y to class C1.

Decision Boundaries

Given a prototype and a distance metric, it is possible to find the decision boundary between classes.

[Figures 1.4 and 1.6 from Pattern Classification by Duda, Hart, and Stork (Wiley, 2001): the lightness and width features of salmon and sea bass, with a decision boundary separating the two classes; the boundary in Figure 1.6 represents a tradeoff between performance on the training set and simplicity of the classifier.]

For the MED, the decision boundary between the two classes is a straight line.

[Figure: prototypes z1 and z2 with the MED decision boundary between them.]

In general the decision boundary for the MED is a hyperplane which is a perpendicular bisector of the line joining the class prototypes.

[Figure: the two classes C1 and C2 in the (x1, x2) feature space with means m1 and m2.]

MED Example

C_1 = \{(0, 0), (4, 1), (4, -1), (8, 0)\} \qquad C_2 = \{(4, 2), (8, 3), (8, 1), (12, 2)\}

Classify point (4, 3) using the class means as prototypes and the MED metric. (Worked on the board.)

In the example, using the MED and the class mean as a prototype does not give a good classification.

Can we fix this by using a different kind of prototype?

Nearest neighbour:
→ the nearest training sample to (4, 3) is (4, 2), which is in C2, so the new pattern is classified as C2.

k-Nearest neighbours:
→ k = 1: same as nearest neighbour
→ k = 2: a sample from both classes, needs more neighbours...
→ k = 3: two samples are the same distance!
→ k = 4: now two samples from each class

→ kNN can require a lot of computation
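To check the example numerically (a sketch I added, reusing the data above):

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
C2 = np.array([[4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)
y = np.array([4.0, 3.0])

# MED with the class means as prototypes picks C1.
m1, m2 = C1.mean(axis=0), C2.mean(axis=0)
print((y - m1) @ (y - m1), (y - m2) @ (y - m2))   # 9.0 vs 17.0 -> C1

# Nearest neighbour over all training samples: the closest sample is (4, 2), in C2.
samples = np.vstack([C1, C2])
dists = np.linalg.norm(samples - y, axis=1)
print(samples[np.argmin(dists)])                  # [4. 2.] -> classify y as C2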

In general, the MED is sub-optimal for classes with unequal feature variances, even if the features are uncorrelated.

One solution is to modify the metric!

Equivariance Feature Weighting

Weight each feature when calculating the distance:

d_w^2(x, z) = \sum_{j=1}^{d} (w_j (x_j - z_j))^2

If the variance of feature j is \sigma_j^2, then the variance of w_j x_j is

E[(w_j (x_j - \mu_j))^2] = w_j^2 \sigma_j^2

Thus choose w_j = 1/\sigma_j to scale the features.

Equivariance Feature Weighting

But we only have the sample mean and variance, not the real mean and variance of the class population.

Choose w_j = 1/s_j, where

s_j^2 = \frac{1}{N} \sum_{x \in C} (x_j - m_j)^2

which is the average squared difference from the sample mean.

So the metric then is

d_w^2(x, z) = \sum_{j=1}^{d} \left( \frac{x_j - z_j}{s_j} \right)^2

which is like measuring the distance using standard deviation units (on a per-class basis).

This weighting is equivalent to a transformation of features:

x' = Wx = \begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}

New features are just scaled versions of the original ones, all with unit variance.

Back to the example:

For C1,

s_1^2 = \frac{1}{4}\left[(0 - 4)^2 + (4 - 4)^2 + (4 - 4)^2 + (8 - 4)^2\right] = 8

and

s_2^2 = \frac{1}{4}\left[(0)^2 + (1)^2 + (1)^2 + (0)^2\right] = \frac{1}{2}

The class C2 has the same variances as C1.

d_w^2(x, m_1) = \frac{(4 - 4)^2}{8} + \frac{(3 - 0)^2}{1/2} = 18

d_w^2(x, m_2) = \frac{(4 - 8)^2}{8} + \frac{(3 - 2)^2}{1/2} = 4

\Rightarrow x \in C_2

This is only good if the features are uncorrelated.
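A short sketch (mine) reproducing the weighted-distance numbers above; the sample variances use the 1/N definition from the notes:

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
C2 = np.array([[4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)
y = np.array([4.0, 3.0])

m1, m2 = C1.mean(axis=0), C2.mean(axis=0)
s2 = C1.var(axis=0)            # per-feature variances s_j^2 = [8, 0.5] (same for C2)

def weighted_sq_dist(x, z, s2):
    # d_w^2(x, z) = sum_j ((x_j - z_j) / s_j)^2, i.e. distance in standard-deviation units
    return np.sum((x - z) ** 2 / s2)

print(weighted_sq_dist(y, m1, s2))   # 18.0
print(weighted_sq_dist(y, m2, s2))   # 4.0  -> classify y as C2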

Summary

• Euclidean distance is a common distance measure, but not the only one.

• Metrics must meet 4 constraints: identity, non-negative, symmetric, triangle inequality.

• Between classes there exist decision boundaries.

• Minimum Euclidean Distance is not always a good classifier.

• Weighting features by the inverse of the sample variance gives better classification, but only good if the features are uncorrelated.