
Page 1: COSC 522 – Machine Learning Discriminant Functions

COSC 522 – Machine Learning

Discriminant Functions

Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science
University of Tennessee, Knoxville
https://www.eecs.utk.edu/people/hairong-qi/
Email: [email protected]

Course Website: http://web.eecs.utk.edu/~hqi/cosc522/

Page 2: COSC 522 – Machine Learning Discriminant Functions

Recap from Previous Lecture
• Definition of supervised learning (vs. unsupervised learning)
• The difference between the training set and the test set
• The difference between classification and regression
• Definitions of "features", "samples", and "dimension"
• From histogram to probability density function (pdf)
• In Bayes' formula, what is the conditional pdf? The prior probability? The posterior probability?
• What does the normalization factor (or evidence) do?
• What is the Bayesian decision rule, or MPP?
• What are decision regions?
• How to calculate the conditional probability of error and the overall probability of error?
• What are the cost function (or objective function) and the optimization method used in MPP?

Page 3: COSC 522 – Machine Learning Discriminant Functions

Recap

Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 | x) > P(\omega_2 | x)$, then $x$ belongs to class 1; otherwise, class 2.

$$P(\omega_j | x) = \frac{p(x | \omega_j)\, P(\omega_j)}{p(x)}$$

Overall probability of error:

$$P(\text{error}) = \int_{\mathcal{R}_1} P(\omega_2 | x)\, p(x)\, dx + \int_{\mathcal{R}_2} P(\omega_1 | x)\, p(x)\, dx$$
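
As a concrete illustration of the MPP rule above, here is a minimal Python sketch for two classes with 1-D Gaussian class-conditional pdfs; the means, standard deviations, and priors are made-up illustration values, not from the lecture.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class problem with 1-D Gaussian class-conditional pdfs.
mu = [0.0, 2.0]        # class means (illustration values)
sigma = [1.0, 1.0]     # class standard deviations (illustration values)
prior = [0.6, 0.4]     # prior probabilities P(w1), P(w2)

def posteriors(x):
    """Bayes' formula: P(wj|x) = p(x|wj) P(wj) / p(x)."""
    joint = np.array([norm.pdf(x, mu[j], sigma[j]) * prior[j] for j in range(2)])
    return joint / joint.sum()   # the evidence p(x) normalizes the posteriors

x = 0.8
post = posteriors(x)
print(post, "-> class", 1 if post[0] > post[1] else 2)   # MPP: larger posterior wins
```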

Page 4: COSC 522 – Machine Learning Discriminant Functions

Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal) density function?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions, and what is the optimization method used to find the best solution?

Multi-variate Gaussian
Linear and Quadratic Machines and their assumptions

Page 5: COSC 522 – Machine Learning Discriminant Functions

Discriminant Function

One way to represent a pattern classifier is to use discriminant functions $g_i(x)$: the classifier assigns a feature vector $x$ to class $\omega_i$ if

$$g_i(x) > g_j(x) \quad \text{for all } j \neq i$$

For two-class cases, a single discriminant suffices:

$$g(x) = g_1(x) - g_2(x) = P(\omega_1 | x) - P(\omega_2 | x)$$
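
A minimal sketch of the discriminant-function view of a classifier: pick the class whose $g_i(x)$ is largest. The toy discriminants below are hypothetical placeholders, not the lecture's.

```python
import numpy as np

def classify(x, discriminants):
    """Assign x to the class omega_i whose discriminant g_i(x) is largest."""
    return int(np.argmax([g(x) for g in discriminants]))

# Toy two-class example (hypothetical discriminants): the sign of
# g(x) = g1(x) - g2(x) decides between class 1 (index 0) and class 2 (index 1).
g1 = lambda x: -(x - 0.0) ** 2
g2 = lambda x: -(x - 2.0) ** 2
print(classify(0.8, [g1, g2]))   # prints 0, i.e. class 1
```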

Page 6: COSC 522 – Machine Learning Discriminant Functions

Multivariate Normal Density

$$p(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right]$$

where
• $x$: $d$-component column vector, $x = (x_1, \ldots, x_d)^T$
• $\mu$: $d$-component mean vector, $\mu = (\mu_1, \ldots, \mu_d)^T$
• $\Sigma$: $d$-by-$d$ covariance matrix, $\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1d} \\ \vdots & \ddots & \vdots \\ \sigma_{d1} & \cdots & \sigma_{dd} \end{pmatrix}$
• $|\Sigma|$: its determinant; $\Sigma^{-1}$: its inverse

When $d = 1$, this reduces to the univariate Gaussian:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$
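
A direct transcription of the density formula above into Python (numpy assumed); the 2-D mean and covariance are illustration values. scipy's multivariate_normal can serve as a cross-check.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """(2pi)^{-d/2} |Sigma|^{-1/2} exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu))."""
    d = len(mu)
    diff = x - mu
    coef = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    return coef * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 1.0])                    # illustration values
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])   # must be symmetric positive definite
print(mvn_pdf(np.array([0.5, 0.5]), mu, Sigma))
# Cross-check: from scipy.stats import multivariate_normal
#              multivariate_normal(mu, Sigma).pdf([0.5, 0.5]) should agree.
```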

Page 7: COSC 522 – Machine Learning Discriminant Functions

Discriminant Function for Normal Density

$$p(x | \omega_i) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right]$$

$$\begin{aligned}
g_i(x) &= \ln p(x | \omega_i) + \ln P(\omega_i) \\
&= -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) \\
&= -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)
\end{aligned}$$

(The $\frac{d}{2}\ln 2\pi$ term is the same for every class, so it can be dropped.)
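
The log-discriminant above transcribes directly into code. A minimal sketch (numpy assumed; the two-class parameter values are hypothetical illustrations):

```python
import numpy as np

def g_i(x, mu_i, Sigma_i, prior_i):
    """g_i(x) = -1/2 (x-mu_i)^T Sigma_i^{-1} (x-mu_i) - d/2 ln(2pi) - 1/2 ln|Sigma_i| + ln P(w_i)."""
    d = len(mu_i)
    diff = x - mu_i
    return (-0.5 * diff @ np.linalg.inv(Sigma_i) @ diff
            - 0.5 * d * np.log(2 * np.pi)          # constant across classes; may be dropped
            - 0.5 * np.log(np.linalg.det(Sigma_i))
            + np.log(prior_i))

# Hypothetical parameters for two classes:
mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([2.0, 2.0]), np.eye(2) * 2.0
x = np.array([1.0, 1.0])
print("class 1" if g_i(x, mu1, S1, 0.5) > g_i(x, mu2, S2, 0.5) else "class 2")
```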

Page 8: COSC 522 – Machine Learning Discriminant Functions

Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal) density function?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions, and what is the optimization method used to find the best solution?

Multi-variate Gaussian
Linear and Quadratic Machines and their assumptions

Page 9: COSC 522 – Machine Learning Discriminant Functions

Case 1: $\Sigma_i = \sigma^2 I$

The features are statistically independent and have the same variance. Geometrically, the samples fall in equal-size hyperspherical clusters. Decision boundary: a hyperplane of $d-1$ dimensions.

$$\Sigma = \begin{pmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{pmatrix}, \quad |\Sigma| = \sigma^{2d}, \quad \Sigma^{-1} = \begin{pmatrix} 1/\sigma^2 & & 0 \\ & \ddots & \\ 0 & & 1/\sigma^2 \end{pmatrix}$$

Page 10: COSC 522 – Machine Learning Discriminant Functions

Linear Discriminant Function and Linear Machine

$$\begin{aligned}
g_i(x) &= -\frac{\| x - \mu_i \|^2}{2\sigma^2} + \ln P(\omega_i) \\
&= -\frac{1}{2\sigma^2}\left( x^T x - 2\mu_i^T x + \mu_i^T \mu_i \right) + \ln P(\omega_i) \\
&= \frac{\mu_i^T}{\sigma^2} x - \frac{\mu_i^T \mu_i}{2\sigma^2} + \ln P(\omega_i)
\end{aligned}$$

where $\| x - \mu_i \|$ is the Euclidean norm (distance), $\| x - \mu_i \|^2 = (x - \mu_i)^T (x - \mu_i)$. The $x^T x$ term is the same for every class and is dropped, leaving a discriminant that is linear in $x$.
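
A minimal sketch of the resulting linear machine (numpy assumed; the means, $\sigma^2$, and priors are illustration values). With equal priors it reduces to the nearest-mean rule.

```python
import numpy as np

def linear_machine(x, mus, sigma2, priors):
    """Case 1: g_i(x) = (mu_i^T / sigma^2) x - mu_i^T mu_i / (2 sigma^2) + ln P(w_i)."""
    scores = [(m @ x) / sigma2 - (m @ m) / (2 * sigma2) + np.log(p)
              for m, p in zip(mus, priors)]
    return int(np.argmax(scores))

mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # illustration means
x = np.array([1.0, 1.0])
print(linear_machine(x, mus, sigma2=1.0, priors=[0.5, 0.5]))   # 0: x is nearer mu_1
```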

Page 11: COSC 522 – Machine Learning Discriminant Functions

Minimum-Distance Classifier

When the priors $P(\omega_i)$ are the same for all $c$ classes, the discriminant function effectively measures the distance from $x$ to each of the $c$ mean vectors, assigning $x$ to the class with the nearest mean:

$$g_i(x) = -\frac{\| x - \mu_i \|^2}{2\sigma^2}$$

Page 12: COSC 522 – Machine Learning Discriminant Functions

Case 2: $\Sigma_i = \Sigma$

The covariance matrices for all the classes are identical, but not a scalar multiple of the identity matrix. Geometrically, the samples fall in hyperellipsoidal clusters of equal size and shape. Decision boundary: a hyperplane of $d-1$ dimensions.

$$\begin{aligned}
g_i(x) &= \ln p(x | \omega_i) + \ln P(\omega_i) \\
&= -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1} (x-\mu_i) + \ln P(\omega_i) \\
&= \mu_i^T \Sigma^{-1} x - \frac{1}{2}\mu_i^T \Sigma^{-1} \mu_i + \ln P(\omega_i)
\end{aligned}$$

The term $(x-\mu_i)^T \Sigma^{-1} (x-\mu_i)$ is the squared Mahalanobis distance from $x$ to $\mu_i$.
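
A minimal sketch of the minimum-Mahalanobis-distance machine (numpy assumed; the shared covariance, means, and priors are illustration values):

```python
import numpy as np

def mahalanobis_machine(x, mus, Sigma, priors):
    """Case 2: g_i(x) = -1/2 (x-mu_i)^T Sigma^{-1} (x-mu_i) + ln P(w_i), shared Sigma."""
    Sinv = np.linalg.inv(Sigma)
    scores = [-0.5 * (x - m) @ Sinv @ (x - m) + np.log(p)
              for m, p in zip(mus, priors)]
    return int(np.argmax(scores))

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])           # shared covariance (illustration)
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
print(mahalanobis_machine(np.array([1.0, 1.0]), mus, Sigma, [0.5, 0.5]))  # 0
```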

Page 13: COSC 522 – Machine Learning Discriminant Functions

Case 3: $\Sigma_i$ = arbitrary

The covariance matrices are different for each category, which yields a quadratic classifier. Decision boundary: hyperquadric; for 2-D Gaussians, a general quadratic curve.

$$\begin{aligned}
g_i(x) &= \ln p(x | \omega_i) + \ln P(\omega_i) \\
&= -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) \\
&= -\frac{1}{2} x^T \Sigma_i^{-1} x + \mu_i^T \Sigma_i^{-1} x - \frac{1}{2}\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)
\end{aligned}$$
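
To make the quadratic case concrete, here is a hedged sketch that estimates per-class means, covariances, and priors from labeled training data (numpy assumed; the class and method names are mine, not from the lecture):

```python
import numpy as np

class QuadraticClassifier:
    """Case 3 sketch: each class gets its own mean, covariance, and prior."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            self.params_.append((Xc.mean(axis=0),           # mu_i
                                 np.cov(Xc, rowvar=False),  # Sigma_i
                                 len(Xc) / len(y)))         # P(w_i)
        return self

    def predict_one(self, x):
        scores = [(-0.5 * (x - mu) @ np.linalg.inv(S) @ (x - mu)
                   - 0.5 * np.log(np.linalg.det(S)) + np.log(p))
                  for mu, S, p in self.params_]
        return self.classes_[int(np.argmax(scores))]
```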

Page 14: COSC 522 – Machine Learning Discriminant Functions

Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal) density function?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions, and what is the optimization method used to find the best solution?

Multi-variate Gaussian
Linear and Quadratic Machines and their assumptions

Page 15: COSC 522 – Machine Learning Discriminant Functions

Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal) density function?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions, and what is the optimization method used to find the best solution?

Page 16: COSC 522 – Machine Learning Discriminant Functions

Bayes Decision Rule

$$P(\omega_j | x) = \frac{p(x | \omega_j)\, P(\omega_j)}{p(x)}$$

Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 | x) > P(\omega_2 | x)$, then $x$ belongs to class 1; otherwise, class 2.

Discriminant Function: the classifier assigns a feature vector $x$ to class $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$.

Case 1: Minimum Euclidean Distance (Linear Machine), $\Sigma_i = \sigma^2 I$

Case 2: Minimum Mahalanobis Distance (Linear Machine), $\Sigma_i = \Sigma$

Case 3: Quadratic classifier, $\Sigma_i$ = arbitrary

All assume a Gaussian pdf.
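
As a closing illustration (not from the slides), a self-contained sketch that draws synthetic data from two Gaussians with different covariances and applies the Case 3 discriminant; the data and accuracy depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-class data with unequal covariances (illustration values).
X1 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 200)
X2 = rng.multivariate_normal([2.0, 2.0], [[1.5, 0.7], [0.7, 0.8]], 200)
X = np.vstack([X1, X2])
y = np.array([0] * 200 + [1] * 200)

def g(x, mu, Sigma, prior):
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior))

params = [(X1.mean(axis=0), np.cov(X1, rowvar=False), 0.5),
          (X2.mean(axis=0), np.cov(X2, rowvar=False), 0.5)]
pred = np.array([np.argmax([g(x, *p) for p in params]) for x in X])
print("training accuracy:", (pred == y).mean())   # high for well-separated classes
```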