
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Memorial University of Newfoundland

Pattern Recognition Lecture 3

May 9, 2006

http://www.engr.mun.ca/~charlesr

Office Hours: Tuesdays & Thursdays 8:30 - 9:30 PM

EN-3026


9881 Project Deliverable #1

• Report topic due tonight via email!

• Two to three sentences regarding the nature of the problem you will address in your project

• You will be penalized for missing deadlines



Recap - Univariate Normal Distribution

Review (Week 2):

p(x) = p(x_1, x_2, \ldots, x_n)

mean: E[x] = \mu

covariance: E[(x - \mu)(x - \mu)^T] = \Sigma, with entries \Sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)]

Univariate normal distribution:

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}

[Figure 2.7 from Duda, Hart, and Stork, Pattern Classification (Wiley, 2001): a univariate normal distribution has roughly 95% of its area in the range |x - \mu| \le 2\sigma; the peak of the distribution has value p(\mu) = 1/(\sqrt{2\pi}\,\sigma).]


Recap - Multivariate Normal Distributions

Multivariate normal distribution:

p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}
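To make the recap concrete, here is a minimal sketch (not from the slides; the names univariate_normal, multivariate_normal, mu, sigma, and Sigma are my own) that evaluates both densities with NumPy:

import numpy as np

def univariate_normal(x, mu, sigma):
    # p(x) = 1 / (sqrt(2*pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def multivariate_normal(x, mu, Sigma):
    # p(x) = exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu)) / ((2*pi)^(n/2) * |Sigma|^(1/2))
    n = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

# The peak of the univariate density is p(mu) = 1 / (sqrt(2*pi) * sigma), as in Figure 2.7.
print(univariate_normal(0.0, mu=0.0, sigma=2.0))
print(1 / (np.sqrt(2 * np.pi) * 2.0))

# A 2-D example with uncorrelated features (diagonal covariance).
print(multivariate_normal(np.array([1.0, 1.0]), np.array([0.0, 0.0]), np.diag([1.0, 4.0])))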


Transformations of Random Variables

Distance Based Classification

• Distance based classification is the most common type of pattern recognition technique

• Concepts are a basis for other classification techniques

[Figure: two classes C1 and C2 in the (x1, x2) feature space, with a new pattern x to be classified ("?").]

Distance Based Classification

• First we will look at choosing a class prototype

• A prototype is a sample or pattern which represents the class

• Then we will look at how to calculate the distance from a new pattern that we are trying to classify to the class using the prototype

Distance Based Classification

We use a pattern-to-class distance measure:

x \in C_1 \quad \text{iff} \quad d(x, C_1) < d(x, C_2)

where d(x, C_1) is the distance from x to C_1. To find the distance, use a class prototype z_i (a pattern):

d(x, C_1) = d(x, z_1)

During training we have labeled samples, such that

C_1 = \{x_i \mid x_i \in C_1;\; i = 1, 2, \ldots, N_1\}

C_2 = \{x_j \mid x_j \in C_2;\; j = 1, 2, \ldots, N_2\}

Assume that N_1, N_2 \gg d, where d is the number of dimensions (and features!). Basic rule of thumb: require that N > 10d.

Consider a 2 feature example (d = 2) with two well-separated classes. This is very idealized compared with real practical problems.

[Figure: two well-separated classes C1 and C2 in the (x1, x2) feature space.]

Choosing a Prototype

1. Sample Mean

For class C_i the sample mean is

m_i = \frac{1}{N_i} \sum_{x \in C_i} x

The advantage of the mean is that it minimizes the representation error of the class.

The mean probably does not correspond to the location of any collected sample.
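As an illustration (my own sketch, not from the slides), the sample-mean prototype in NumPy, using the two training sets that appear in the MED example later in this lecture:

import numpy as np

# Labeled training samples for the two classes (the example used later in the lecture).
C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
C2 = np.array([[4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)

# Sample-mean prototype m_i = (1/N_i) * sum of the samples in class C_i.
m1 = C1.mean(axis=0)   # [4., 0.]
m2 = C2.mean(axis=0)   # [8., 2.]
print(m1, m2)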

Minimizing representation error

The mean minimizes the representation error

err = \sum_{x \in C_i} |x - z_i|^2 = \sum_{x \in C_i} (x - z_i)^T (x - z_i)

\frac{\partial\, err}{\partial z_i} = -2 \sum_{x \in C_i} (x - z_i)

To minimize, set it to 0:

\sum_{x \in C_i} (x - z_i) = 0

\sum_{x \in C_i} x - N_i z_i = 0

z_i = \frac{1}{N_i} \sum_{x \in C_i} x

which is the mean.

Note that the mean square representation error is

MSE = \frac{1}{N_i} \sum_{x \in C_i} |x - m_i|^2 = \frac{1}{N_i} \sum_{x \in C_i} \sum_{j=1}^{d} (x_j - m_{ij})^2 = \frac{1}{N_i} \sum_{j=1}^{d} \sum_{x \in C_i} (x_j - m_{ij})^2 = \frac{1}{N_i} \sum_{j=1}^{d} N_i s_j = \sum_{j=1}^{d} s_j

where s_j comes from the sample covariance matrix.

What is the sample covariance matrix?
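A quick numerical check of this derivation (my own sketch, reusing the class C1 defined above): the summed squared error is smallest at the sample mean, and the mean square representation error equals the sum of the per-feature variances s_j.

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
m1 = C1.mean(axis=0)

def representation_error(samples, z):
    # err = sum over the class of |x - z|^2 for a candidate prototype z
    return np.sum(np.sum((samples - z) ** 2, axis=1))

# The error at the mean is smaller than at a perturbed prototype.
print(representation_error(C1, m1))                 # 34.0
print(representation_error(C1, m1 + [1.0, 0.5]))    # 39.0

# MSE = err(mean) / N_i equals the sum of the (1/N) per-feature variances.
mse = representation_error(C1, m1) / len(C1)
print(mse, C1.var(axis=0).sum())                    # 8.5, 8.5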

The sample covariance matrix is

S = \frac{1}{N} \sum_{k=1}^{N} (x_k - m)(x_k - m)^T

or

S = \frac{1}{N} \sum_{x \in C_i} (x - m)(x - m)^T

with i, j entry

s_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{ki} - m_i)(x_{kj} - m_j) = \frac{1}{N} \sum_{k=1}^{N} x_{ki} x_{kj} - m_i m_j

and

s_j \equiv s_{jj} = \text{the feature variances}
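A small sketch (mine, not from the notes) of the sample covariance matrix with the 1/N normalization used above; np.cov defaults to 1/(N-1), so bias=True is passed to match.

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
m = C1.mean(axis=0)
N = len(C1)

# S = (1/N) * sum_k (x_k - m)(x_k - m)^T
S = (C1 - m).T @ (C1 - m) / N
print(S)                        # diagonal entries s_jj are the feature variances
print(np.cov(C1.T, bias=True))  # same matrix via NumPy (bias=True -> 1/N)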

Choosing a Prototype

2. Most Typical Sample

The sample which is most similar to the class mean.

Choose z_i = x \in C_i such that d(x, m_i) is minimized.

Choosing a Prototype

3. Nearest Neighbour

Choose z_i = x such that d(y, x) is minimized. Here x is a sample from the class, and y is the new sample we are trying to classify. Thus the prototype depends on the location of the pattern we are classifying.

If x \in C_1 then classify y as C_1. Or if x \in C_2 then classify y as C_2.

Nearest neighbour prototypes are sensitive to noise and outliers in the training set.

Choosing a Prototype

4. k-Nearest Neighbours

The pattern y is classified in the class of its k nearest neighbours from the training samples. The chosen distance determines how 'near' is defined.

This gives some protection against noise, but is more computationally expensive.

[Figure: a new pattern y with its k = 3 nearest neighbours, drawn from classes C_i and C_j.]
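To illustrate the nearest-neighbour and k-nearest-neighbour rules (a brute-force sketch with my own function names, not course code):

import numpy as np

def knn_classify(y, samples, labels, k=1):
    # Majority vote among the k training samples nearest to y (Euclidean distance).
    # With k = 1 this is the nearest-neighbour rule; ties go to the lowest class index.
    dists = np.linalg.norm(samples - y, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.argmax(np.bincount(labels[nearest]))

samples = np.array([[0, 0], [4, 1], [4, -1], [8, 0],
                    [4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 0 -> C1, 1 -> C2

print(knn_classify(np.array([4.0, 3.0]), samples, labels, k=1))   # 1, i.e. C2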

Summary

• First step of distance-based classification is to choose a prototype

• Options include:
  • Sample mean
  • Most typical sample
  • Nearest neighbour
  • k-Nearest neighbours

Distance Measures

Most familiar distance metric is the Euclidean distance:

d_E(x, z_i) = \left( \sum_{j=1}^{d} (x_j - z_{ij})^2 \right)^{1/2}

There are many possibilities for distance measurements. Another example is the Manhattan distance:

d_M(x, z_i) = \sum_{j=1}^{d} |x_j - z_{ij}|
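A minimal sketch (my own, not from the slides) of the two distances written exactly as the sums above:

import numpy as np

def euclidean(x, z):
    # d_E(x, z) = ( sum_j (x_j - z_j)^2 )^(1/2)
    return np.sqrt(np.sum((x - z) ** 2))

def manhattan(x, z):
    # d_M(x, z) = sum_j |x_j - z_j|
    return np.sum(np.abs(x - z))

x = np.array([4.0, 3.0])
z = np.array([8.0, 2.0])
print(euclidean(x, z))   # sqrt(17), about 4.123
print(manhattan(x, z))   # 5.0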

Anything that fulfills the following four properties can be a metric:

1. Identity: d(x, z) = 0 iff x = z

2. Non-negativity: d(x, z) \ge 0

3. Symmetry: d(x, z) = d(z, x)

4. Triangle inequality: d(x, z) \le d(x, y) + d(y, z)

Clearly the Euclidean distance is a metric, but so is a more general weighted metric:

d_w(x, z) = \left( \sum_{j=1}^{d} (w_j (x_j - z_j))^2 \right)^{1/2}

where the difference in the jth feature is weighted by w_j.

Minimum Euclidean Distance (MED) Classifier

Given classes C1 and C2 with prototypes z1 and z2 respectively:

x \in C_1 \quad \text{if} \quad d_E(x, z_1) < d_E(x, z_2)

Equivalently:

(x - z_1)^T (x - z_1) < (x - z_2)^T (x - z_2)
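A minimal MED classifier sketch (my own wording of the rule above; the function and variable names are illustrative):

import numpy as np

def med_classify(x, z1, z2):
    # Assign x to C1 if d_E(x, z1) < d_E(x, z2), otherwise to C2.
    # Comparing squared distances (x - z)^T (x - z) gives the same decision.
    d1 = (x - z1) @ (x - z1)
    d2 = (x - z2) @ (x - z2)
    return "C1" if d1 < d2 else "C2"

# Class means from the example that follows, used here as prototypes.
z1 = np.array([4.0, 0.0])
z2 = np.array([8.0, 2.0])
print(med_classify(np.array([4.0, 3.0]), z1, z2))   # C1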

[Figure: classes C1 and C2 in the (x1, x2) feature space with means m1 and m2; the new pattern y lies closer to m1.]

d_E(y, m_1) < d_E(y, m_2)

Classify pattern y to class C1.

Decision Boundaries

Given a prototype and a distance metric, it is possible to find the decision boundary between classes.

[Figures 1.4 and 1.6 from Pattern Classification by Duda, Hart, and Stork (Wiley, 2001): the lightness and width features of salmon and sea bass, with a decision boundary separating the two classes; the boundary in Figure 1.6 represents a tradeoff between performance on the training set and simplicity of the classifier.]

For the MED, the decision boundary between the two classes is a straight line.

[Figure: prototypes z1 and z2 with the MED decision boundary between them.]

In general the decision boundary for the MED is a hyperplane which is a perpendicular bisector of the line joining the class prototypes.

[Figure: the two classes C1 and C2 in the (x1, x2) feature space with means m1 and m2.]

MED Example

C_1 = \{(0, 0), (4, 1), (4, -1), (8, 0)\} \qquad C_2 = \{(4, 2), (8, 3), (8, 1), (12, 2)\}

Classify point (4, 3) using the class means as prototypes and the MED metric. (Worked on the board.)

In the example, using the MED and the class mean as a prototype does not give a good classification.

Can we fix this by using a different kind of prototype?

Nearest neighbour:
→ the nearest training sample to (4, 3) is (4, 2), which is in C2, so the new pattern is classified as C2.

k-Nearest neighbours:
→ k = 1: same as nearest neighbour
→ k = 2: a sample from both classes, needs more neighbours...
→ k = 3: two samples are the same distance!
→ k = 4: now two samples from each class

→ kNN can require a lot of computation
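To check the example numerically (a sketch I added, reusing the data above):

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
C2 = np.array([[4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)
y = np.array([4.0, 3.0])

# MED with the class means as prototypes picks C1.
m1, m2 = C1.mean(axis=0), C2.mean(axis=0)
print((y - m1) @ (y - m1), (y - m2) @ (y - m2))   # 9.0 vs 17.0 -> C1

# Nearest neighbour over all training samples: the closest sample is (4, 2), in C2.
samples = np.vstack([C1, C2])
dists = np.linalg.norm(samples - y, axis=1)
print(samples[np.argmin(dists)])                  # [4. 2.] -> classify y as C2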

In general, the MED is sub-optimal for classes with unequal feature variances, even if the features are uncorrelated.

One solution is to modify the metric!

Equivariance Feature Weighting

Weight each feature when calculating the distance:

d_w^2(x, z) = \sum_{j=1}^{d} (w_j (x_j - z_j))^2

If the variance of feature j is \sigma_j^2, then the variance of w_j x_j is

E[(w_j (x_j - \mu_j))^2] = w_j^2 \sigma_j^2

Thus choose w_j = 1/\sigma_j to scale the features.

Equivariance Feature Weighting

But we only have the sample mean and variance, not the real mean and variance of the class population.

Choose w_j = 1/s_j, where

s_j^2 = \frac{1}{N} \sum_{x \in C} (x_j - m_j)^2

which is the average squared difference from the sample mean.

So the metric then is

d_w^2(x, z) = \sum_{j=1}^{d} \left( \frac{x_j - z_j}{s_j} \right)^2

which is like measuring the distance using standard deviation units (on a per-class basis).

This weighting is equivalent to a transformation of features:

x' = Wx = \begin{bmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}

New features are just scaled versions of the original ones, all with unit variance.

Back to the example:

For C1,

s_1^2 = \frac{1}{4}\left[(0 - 4)^2 + (4 - 4)^2 + (4 - 4)^2 + (8 - 4)^2\right] = 8

and

s_2^2 = \frac{1}{4}\left[(0)^2 + (1)^2 + (1)^2 + (0)^2\right] = \frac{1}{2}

The class C2 has the same variances as C1.

d_w^2(x, m_1) = \frac{(4 - 4)^2}{8} + \frac{(3 - 0)^2}{1/2} = 18

d_w^2(x, m_2) = \frac{(4 - 8)^2}{8} + \frac{(3 - 2)^2}{1/2} = 4

\Rightarrow x \in C_2

This is only good if the features are uncorrelated.
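A short sketch (mine) reproducing the weighted-distance numbers above; the sample variances use the 1/N definition from the notes:

import numpy as np

C1 = np.array([[0, 0], [4, 1], [4, -1], [8, 0]], dtype=float)
C2 = np.array([[4, 2], [8, 3], [8, 1], [12, 2]], dtype=float)
y = np.array([4.0, 3.0])

m1, m2 = C1.mean(axis=0), C2.mean(axis=0)
s2 = C1.var(axis=0)            # per-feature variances s_j^2 = [8, 0.5] (same for C2)

def weighted_sq_dist(x, z, s2):
    # d_w^2(x, z) = sum_j ((x_j - z_j) / s_j)^2, i.e. distance in standard-deviation units
    return np.sum((x - z) ** 2 / s2)

print(weighted_sq_dist(y, m1, s2))   # 18.0
print(weighted_sq_dist(y, m2, s2))   # 4.0  -> classify y as C2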

Summary

• Euclidean distance is a common distance measure, but not the only one.

• Metrics must meet 4 constraints: identity, non-negative, symmetric, triangle inequality.

• Between classes there exist decision boundaries.

• Minimum Euclidean Distance is not always a good classifier.

• Weighting features by the inverse of the sample variance gives better classification, but only good if the features are uncorrelated.