
Page 1: COSC 522 – Machine Learning Discriminant Functions

COSC 522 – Machine Learning

Discriminant Functions

Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science
University of Tennessee, Knoxville
https://www.eecs.utk.edu/people/hairong-qi/
Email: [email protected]

Course Website: http://web.eecs.utk.edu/~hqi/cosc522/

Page 2: COSC 522 – Machine Learning Discriminant Functions

Recap from Previous Lecture
• Definition of supervised learning (vs. unsupervised learning)
• The difference between the training set and the test set
• The difference between classification and regression
• Definitions of "features", "samples", and "dimension"
• From histogram to probability density function (pdf)
• In Bayes' formula, what is the conditional pdf? The prior probability? The posterior probability?
• What does the normalization factor (or evidence) do?
• What is the Bayesian decision rule, or MPP?
• What are decision regions?
• How to calculate the conditional probability of error and the overall probability of error?
• What are the cost function (or objective function) and the optimization method used in MPP?

Page 3: COSC 522 – Machine Learning Discriminant Functions

Recap

Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 | x) > P(\omega_2 | x)$, then $x$ belongs to class 1; otherwise, class 2.

$$P(\omega_j | x) = \frac{p(x | \omega_j)\, P(\omega_j)}{p(x)}$$

Overall probability of error:

$$P(\text{error}) = \int_{\mathcal{R}_1} P(\omega_2 | x)\, p(x)\, dx + \int_{\mathcal{R}_2} P(\omega_1 | x)\, p(x)\, dx$$
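
As a concrete illustration of the MPP rule above, here is a minimal Python sketch for two classes with 1-D Gaussian class-conditional pdfs; the means, standard deviations, and priors are made-up illustration values, not from the lecture.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class problem with 1-D Gaussian class-conditional pdfs.
mu = [0.0, 2.0]        # class means (illustration values)
sigma = [1.0, 1.0]     # class standard deviations (illustration values)
prior = [0.6, 0.4]     # prior probabilities P(w1), P(w2)

def posteriors(x):
    """Bayes' formula: P(wj|x) = p(x|wj) P(wj) / p(x)."""
    joint = np.array([norm.pdf(x, mu[j], sigma[j]) * prior[j] for j in range(2)])
    return joint / joint.sum()   # the evidence p(x) normalizes the posteriors

x = 0.8
post = posteriors(x)
print(post, "-> class", 1 if post[0] > post[1] else 2)   # MPP: larger posterior wins
```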

Page 4: COSC 522 – Machine Learning Discriminant Functions

Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal) density function?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions, and what is the optimization method used to find the best solution?

Multi-variate Gaussian
Linear and Quadratic Machines and their assumptions

Page 5: COSC 522 – Machine Learning Discriminant Functions

Discriminant Function

One way to represent a pattern classifier is to use discriminant functions $g_i(x)$: the classifier assigns a feature vector $x$ to class $\omega_i$ if

$$g_i(x) > g_j(x) \quad \text{for all } j \neq i$$

For two-class cases, a single discriminant suffices:

$$g(x) = g_1(x) - g_2(x) = P(\omega_1 | x) - P(\omega_2 | x)$$
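
A minimal sketch of the discriminant-function view of a classifier: pick the class whose $g_i(x)$ is largest. The toy discriminants below are hypothetical placeholders, not the lecture's.

```python
import numpy as np

def classify(x, discriminants):
    """Assign x to the class omega_i whose discriminant g_i(x) is largest."""
    return int(np.argmax([g(x) for g in discriminants]))

# Toy two-class example (hypothetical discriminants): the sign of
# g(x) = g1(x) - g2(x) decides between class 1 (index 0) and class 2 (index 1).
g1 = lambda x: -(x - 0.0) ** 2
g2 = lambda x: -(x - 2.0) ** 2
print(classify(0.8, [g1, g2]))   # prints 0, i.e. class 1
```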

Page 6: COSC 522 – Machine Learning Discriminant Functions

Multivariate Normal Density

$$p(x) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right]$$

where
• $x$: $d$-component column vector, $x = (x_1, \ldots, x_d)^T$
• $\mu$: $d$-component mean vector, $\mu = (\mu_1, \ldots, \mu_d)^T$
• $\Sigma$: $d$-by-$d$ covariance matrix, $\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1d} \\ \vdots & \ddots & \vdots \\ \sigma_{d1} & \cdots & \sigma_{dd} \end{pmatrix}$
• $|\Sigma|$: its determinant; $\Sigma^{-1}$: its inverse

When $d = 1$, this reduces to the univariate Gaussian:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$
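
A direct transcription of the density formula above into Python (numpy assumed); the 2-D mean and covariance are illustration values. scipy's multivariate_normal can serve as a cross-check.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """(2pi)^{-d/2} |Sigma|^{-1/2} exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu))."""
    d = len(mu)
    diff = x - mu
    coef = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    return coef * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 1.0])                    # illustration values
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])   # must be symmetric positive definite
print(mvn_pdf(np.array([0.5, 0.5]), mu, Sigma))
# Cross-check: from scipy.stats import multivariate_normal
#              multivariate_normal(mu, Sigma).pdf([0.5, 0.5]) should agree.
```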

Page 7: COSC 522 – Machine Learning Discriminant Functions

Discriminant Function for Normal Density

$$p(x | \omega_i) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right]$$

$$\begin{aligned}
g_i(x) &= \ln p(x | \omega_i) + \ln P(\omega_i) \\
&= -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) \\
&= -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)
\end{aligned}$$

(The $\frac{d}{2}\ln 2\pi$ term is the same for every class, so it can be dropped.)
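
The log-discriminant above transcribes directly into code. A minimal sketch (numpy assumed; the two-class parameter values are hypothetical illustrations):

```python
import numpy as np

def g_i(x, mu_i, Sigma_i, prior_i):
    """g_i(x) = -1/2 (x-mu_i)^T Sigma_i^{-1} (x-mu_i) - d/2 ln(2pi) - 1/2 ln|Sigma_i| + ln P(w_i)."""
    d = len(mu_i)
    diff = x - mu_i
    return (-0.5 * diff @ np.linalg.inv(Sigma_i) @ diff
            - 0.5 * d * np.log(2 * np.pi)          # constant across classes; may be dropped
            - 0.5 * np.log(np.linalg.det(Sigma_i))
            + np.log(prior_i))

# Hypothetical parameters for two classes:
mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([2.0, 2.0]), np.eye(2) * 2.0
x = np.array([1.0, 1.0])
print("class 1" if g_i(x, mu1, S1, 0.5) > g_i(x, mu2, S2, 0.5) else "class 2")
```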

Page 8: COSC 522 – Machine Learning Discriminant Functions

Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal) density function?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions, and what is the optimization method used to find the best solution?

Multi-variate Gaussian
Linear and Quadratic Machines and their assumptions

Page 9: COSC 522 – Machine Learning Discriminant Functions

Case 1: $\Sigma_i = \sigma^2 I$

The features are statistically independent and have the same variance. Geometrically, the samples fall in equal-size hyperspherical clusters. Decision boundary: a hyperplane of $d-1$ dimensions.

$$\Sigma = \begin{pmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{pmatrix}, \quad |\Sigma| = \sigma^{2d}, \quad \Sigma^{-1} = \begin{pmatrix} 1/\sigma^2 & & 0 \\ & \ddots & \\ 0 & & 1/\sigma^2 \end{pmatrix}$$

Page 10: COSC 522 – Machine Learning Discriminant Functions

Linear Discriminant Function and Linear Machine

$$\begin{aligned}
g_i(x) &= -\frac{\| x - \mu_i \|^2}{2\sigma^2} + \ln P(\omega_i) \\
&= -\frac{1}{2\sigma^2}\left( x^T x - 2\mu_i^T x + \mu_i^T \mu_i \right) + \ln P(\omega_i) \\
&= \frac{\mu_i^T}{\sigma^2} x - \frac{\mu_i^T \mu_i}{2\sigma^2} + \ln P(\omega_i)
\end{aligned}$$

where $\| x - \mu_i \|$ is the Euclidean norm (distance), $\| x - \mu_i \|^2 = (x - \mu_i)^T (x - \mu_i)$. The $x^T x$ term is the same for every class and is dropped, leaving a discriminant that is linear in $x$.
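
A minimal sketch of the resulting linear machine (numpy assumed; the means, $\sigma^2$, and priors are illustration values). With equal priors it reduces to the nearest-mean rule.

```python
import numpy as np

def linear_machine(x, mus, sigma2, priors):
    """Case 1: g_i(x) = (mu_i^T / sigma^2) x - mu_i^T mu_i / (2 sigma^2) + ln P(w_i)."""
    scores = [(m @ x) / sigma2 - (m @ m) / (2 * sigma2) + np.log(p)
              for m, p in zip(mus, priors)]
    return int(np.argmax(scores))

mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # illustration means
x = np.array([1.0, 1.0])
print(linear_machine(x, mus, sigma2=1.0, priors=[0.5, 0.5]))   # 0: x is nearer mu_1
```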

Page 11: COSC 522 – Machine Learning Discriminant Functions

Minimum-Distance Classifier

When the priors $P(\omega_i)$ are the same for all $c$ classes, the discriminant function effectively measures the distance from $x$ to each of the $c$ mean vectors, assigning $x$ to the class with the nearest mean:

$$g_i(x) = -\frac{\| x - \mu_i \|^2}{2\sigma^2}$$

Page 12: COSC 522 – Machine Learning Discriminant Functions

Case 2: $\Sigma_i = \Sigma$

The covariance matrices for all the classes are identical, but not a scalar multiple of the identity matrix. Geometrically, the samples fall in hyperellipsoidal clusters of equal size and shape. Decision boundary: a hyperplane of $d-1$ dimensions.

$$\begin{aligned}
g_i(x) &= \ln p(x | \omega_i) + \ln P(\omega_i) \\
&= -\frac{1}{2}(x-\mu_i)^T \Sigma^{-1} (x-\mu_i) + \ln P(\omega_i) \\
&= \mu_i^T \Sigma^{-1} x - \frac{1}{2}\mu_i^T \Sigma^{-1} \mu_i + \ln P(\omega_i)
\end{aligned}$$

The term $(x-\mu_i)^T \Sigma^{-1} (x-\mu_i)$ is the squared Mahalanobis distance from $x$ to $\mu_i$.
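
A minimal sketch of the minimum-Mahalanobis-distance machine (numpy assumed; the shared covariance, means, and priors are illustration values):

```python
import numpy as np

def mahalanobis_machine(x, mus, Sigma, priors):
    """Case 2: g_i(x) = -1/2 (x-mu_i)^T Sigma^{-1} (x-mu_i) + ln P(w_i), shared Sigma."""
    Sinv = np.linalg.inv(Sigma)
    scores = [-0.5 * (x - m) @ Sinv @ (x - m) + np.log(p)
              for m, p in zip(mus, priors)]
    return int(np.argmax(scores))

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])           # shared covariance (illustration)
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
print(mahalanobis_machine(np.array([1.0, 1.0]), mus, Sigma, [0.5, 0.5]))  # 0
```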

Page 13: COSC 522 – Machine Learning Discriminant Functions

Case 3: $\Sigma_i$ = arbitrary

The covariance matrices are different for each category, which yields a quadratic classifier. Decision boundary: hyperquadric; for 2-D Gaussians, a general quadratic curve.

$$\begin{aligned}
g_i(x) &= \ln p(x | \omega_i) + \ln P(\omega_i) \\
&= -\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) \\
&= -\frac{1}{2} x^T \Sigma_i^{-1} x + \mu_i^T \Sigma_i^{-1} x - \frac{1}{2}\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)
\end{aligned}$$
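
To make the quadratic case concrete, here is a hedged sketch that estimates per-class means, covariances, and priors from labeled training data (numpy assumed; the class and method names are mine, not from the lecture):

```python
import numpy as np

class QuadraticClassifier:
    """Case 3 sketch: each class gets its own mean, covariance, and prior."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            self.params_.append((Xc.mean(axis=0),           # mu_i
                                 np.cov(Xc, rowvar=False),  # Sigma_i
                                 len(Xc) / len(y)))         # P(w_i)
        return self

    def predict_one(self, x):
        scores = [(-0.5 * (x - mu) @ np.linalg.inv(S) @ (x - mu)
                   - 0.5 * np.log(np.linalg.det(S)) + np.log(p))
                  for mu, S, p in self.params_]
        return self.classes_[int(np.argmax(scores))]
```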

Page 14: COSC 522 – Machine Learning Discriminant Functions

Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal) density function?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions, and what is the optimization method used to find the best solution?

Multi-variate Gaussian
Linear and Quadratic Machines and their assumptions

Page 15: COSC 522 – Machine Learning Discriminant Functions

Questions
• What is a discriminant function?
• What is a multivariate Gaussian (or normal) density function?
• What is the covariance matrix and what is its dimension?
• What would the covariance matrix look like if the features are independent from each other?
• What would the covariance matrix look like if the features are independent from each other AND have the same spread in each dimension?
• What is the minimum (Euclidean) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Euclidean) distance classifier?
• What is the minimum (Mahalanobis) distance classifier? Is it a linear or quadratic classifier (machine)? What does the decision boundary look like?
• What are the assumptions made when using a minimum (Mahalanobis) distance classifier?
• What does the decision boundary look like for a quadratic classifier?
• What are the cost functions for the discriminant functions, and what is the optimization method used to find the best solution?

Page 16: COSC 522 – Machine Learning Discriminant Functions

Bayes Decision Rule

$$P(\omega_j | x) = \frac{p(x | \omega_j)\, P(\omega_j)}{p(x)}$$

Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 | x) > P(\omega_2 | x)$, then $x$ belongs to class 1; otherwise, class 2.

Discriminant Function: the classifier assigns a feature vector $x$ to class $\omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$.

Case 1: Minimum Euclidean Distance (Linear Machine), $\Sigma_i = \sigma^2 I$

Case 2: Minimum Mahalanobis Distance (Linear Machine), $\Sigma_i = \Sigma$

Case 3: Quadratic classifier, $\Sigma_i$ = arbitrary

All assume a Gaussian pdf.
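
As a closing illustration (not from the slides), a self-contained sketch that draws synthetic data from two Gaussians with different covariances and applies the Case 3 discriminant; the data and accuracy depend on the random seed.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic two-class data with unequal covariances (illustration values).
X1 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 200)
X2 = rng.multivariate_normal([2.0, 2.0], [[1.5, 0.7], [0.7, 0.8]], 200)
X = np.vstack([X1, X2])
y = np.array([0] * 200 + [1] * 200)

def g(x, mu, Sigma, prior):
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior))

params = [(X1.mean(axis=0), np.cov(X1, rowvar=False), 0.5),
          (X2.mean(axis=0), np.cov(X2, rowvar=False), 0.5)]
pred = np.array([np.argmax([g(x, *p) for p in params]) for x in X])
print("training accuracy:", (pred == y).mean())   # high for well-separated classes
```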