
Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 2 (Part 3): Bayesian Decision Theory

(Sections 2.6, 2.9)

• Discriminant Functions for the Normal Density

• Bayes Decision Theory – Discrete Features


Discriminant Functions for the Normal Density

• We saw that minimum error-rate classification can be achieved by the discriminant function

$$g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$

• Case of multivariate normal

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1}(x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$
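As a concrete illustration, here is a minimal Python/NumPy sketch of this discriminant; the function and parameter names are my own, not from the slides:

```python
import numpy as np

def gaussian_discriminant(x, mu_i, sigma_i, prior_i):
    # g_i(x) for a multivariate normal class-conditional density,
    # following the expression above term by term.
    d = len(mu_i)
    diff = x - mu_i
    inv = np.linalg.inv(sigma_i)
    return (-0.5 * diff @ inv @ diff          # Mahalanobis (quadratic) term
            - 0.5 * d * np.log(2 * np.pi)     # constant in x
            - 0.5 * np.log(np.linalg.det(sigma_i))
            + np.log(prior_i))                # log prior
```

A pattern x is then assigned to the class whose g_i(x) is largest.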


Case Σi = σ²I (I stands for the identity matrix)

• What does "Σi = σ²I" say about the dimensions? (The features are statistically independent.)

• What about the variance of each dimension? (Each dimension has the same variance, σ².)

• Note: both |Σi| = σ^{2d} and the (d/2) ln 2π term are independent of i. Thus, starting from

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1}(x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

we can simplify to:

$$g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)$$

where $\|\cdot\|$ denotes the Euclidean norm.
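A hedged sketch of this simplified discriminant (assumed names, NumPy arrays):

```python
import numpy as np

def g_spherical(x, mu_i, sigma2, prior_i):
    # When Sigma_i = sigma^2 * I, only the squared Euclidean distance
    # from x to the class mean and the log prior matter.
    return -np.sum((x - mu_i) ** 2) / (2 * sigma2) + np.log(prior_i)
```

With equal priors this reduces to a minimum-distance (nearest-mean) classifier.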


• We can further simplify by recognizing that the quadratic term $x^t x$ implicit in the Euclidean norm is the same for all i.

$$g_i(x) = w_i^t x + w_{i0} \qquad \text{(linear discriminant function)}$$

where:

$$w_i = \frac{1}{\sigma^2}\mu_i; \qquad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t\mu_i + \ln P(\omega_i)$$

($w_{i0}$ is called the threshold for the i-th category!)
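A minimal sketch of computing these weights (illustrative names):

```python
import numpy as np

def linear_weights(mu_i, sigma2, prior_i):
    # Weight vector and threshold of the linear discriminant
    # g_i(x) = w_i^t x + w_i0, for the Sigma_i = sigma^2 * I case.
    w_i = mu_i / sigma2
    w_i0 = -(mu_i @ mu_i) / (2 * sigma2) + np.log(prior_i)
    return w_i, w_i0
```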


• A classifier that uses linear discriminant functions is called "a linear machine"

• The decision surfaces for a linear machine are pieces of hyperplanes defined by:

gi(x) = gj(x)

The equation can be written as: $w^t(x - x_0) = 0$


• The hyperplane separating Ri and Rj is always orthogonal to the line linking the means!

$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2}\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\mu_i - \mu_j)$$

If $P(\omega_i) = P(\omega_j)$, then $x_0 = \frac{1}{2}(\mu_i + \mu_j)$: the hyperplane passes through the midpoint of the means.
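A short NumPy sketch of this boundary point (names are mine):

```python
import numpy as np

def boundary_point_spherical(mu_i, mu_j, sigma2, prior_i, prior_j):
    # x0 lies on the separating hyperplane w^t(x - x0) = 0,
    # whose normal is w = mu_i - mu_j in this case.
    diff = mu_i - mu_j
    shift = sigma2 / (diff @ diff) * np.log(prior_i / prior_j)
    return 0.5 * (mu_i + mu_j) - shift * diff
```

Unequal priors shift x0 away from the more probable class's mean, enlarging that class's decision region.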


• Case Σi = Σ (the covariance matrices of all classes are identical but otherwise arbitrary!)

The hyperplane separating Ri and Rj

(the hyperplane separating Ri and Rj is generally not orthogonal to the line between the means!)

has the equation $w^t(x - x_0) = 0$, where:

$$w = \Sigma^{-1}(\mu_i - \mu_j)$$

and

$$x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i - \mu_j)^t\,\Sigma^{-1}(\mu_i - \mu_j)}\,(\mu_i - \mu_j)$$
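A minimal sketch of these two quantities (assumed names, NumPy):

```python
import numpy as np

def boundary_common_cov(mu_i, mu_j, sigma, prior_i, prior_j):
    # w and x0 for the shared-covariance case; because
    # w = Sigma^{-1}(mu_i - mu_j), the hyperplane is generally
    # not orthogonal to the line between the means.
    inv = np.linalg.inv(sigma)
    diff = mu_i - mu_j
    w = inv @ diff
    x0 = (0.5 * (mu_i + mu_j)
          - np.log(prior_i / prior_j) / (diff @ inv @ diff) * diff)
    return w, x0
```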


• Case Σi = arbitrary

• The covariance matrices are different for each category

The decision surfaces are hyperquadrics (hyperquadrics include: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, and hyperhyperboloids)

$$g_i(x) = x^t W_i x + w_i^t x + w_{i0}$$

where:

$$W_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad w_i = \Sigma_i^{-1}\mu_i$$

$$w_{i0} = -\frac{1}{2}\mu_i^t\Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$
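A sketch of the quadratic discriminant assembled from these pieces (illustrative names):

```python
import numpy as np

def quadratic_discriminant(x, mu_i, sigma_i, prior_i):
    # g_i(x) = x^t W_i x + w_i^t x + w_i0 with W_i, w_i, w_i0 as above.
    inv = np.linalg.inv(sigma_i)
    W_i = -0.5 * inv
    w_i = inv @ mu_i
    w_i0 = (-0.5 * (mu_i @ inv @ mu_i)
            - 0.5 * np.log(np.linalg.det(sigma_i))
            + np.log(prior_i))
    return x @ W_i @ x + w_i @ x + w_i0
```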


Bayes Decision Theory – Discrete Features

• Components of x are binary or integer-valued; x can take on only one of m discrete values:

v1, v2, …, vm

• We are then concerned with probabilities rather than probability densities in Bayes' formula:

$$P(\omega_j \mid x) = \frac{P(x \mid \omega_j)P(\omega_j)}{P(x)}, \quad \text{where } P(x) = \sum_{j=1}^{c} P(x \mid \omega_j)P(\omega_j)$$
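As a sketch, the posterior computation is just a normalized product (names assumed):

```python
import numpy as np

def posteriors(likelihoods, priors):
    # P(omega_j | x) from the likelihoods P(x | omega_j) and the priors
    # P(omega_j); the evidence P(x) is the normalizing sum.
    joint = np.asarray(likelihoods) * np.asarray(priors)
    return joint / joint.sum()
```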


Bayes Decision Theory – Discrete Features

• Conditional risk is defined as before: R(α|x)

• Approach is still to minimize risk:

$$\alpha^* = \arg\min_i R(\alpha_i \mid x)$$
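A one-line sketch of this rule (hypothetical helper name):

```python
import numpy as np

def bayes_action(conditional_risks):
    # Index of the action minimizing the conditional risk R(alpha_i | x).
    return int(np.argmin(conditional_risks))
```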


Bayes Decision Theory – Discrete Features

• Case of independent binary features in the two-category problem. Let x = [x1, x2, …, xd]t, where each xi is either 0 or 1, with probabilities:

pi = P(xi = 1 | ω1)

qi = P(xi = 1 | ω2)


Bayes Decision Theory – Discrete Features

• Assuming conditional independence, P(x|ωi) can be written as a product of component probabilities:

$$P(x \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i}(1 - p_i)^{1 - x_i}$$

and

$$P(x \mid \omega_2) = \prod_{i=1}^{d} q_i^{x_i}(1 - q_i)^{1 - x_i}$$

yielding a likelihood ratio given by:

$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} = \prod_{i=1}^{d}\left(\frac{p_i}{q_i}\right)^{x_i}\left(\frac{1 - p_i}{1 - q_i}\right)^{1 - x_i}$$
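A hedged sketch of these Bernoulli products (names are mine):

```python
import numpy as np

def bernoulli_likelihood(x, theta):
    # P(x | class) = prod_i theta_i^{x_i} * (1 - theta_i)^{1 - x_i},
    # where theta is p for omega_1 and q for omega_2.
    x, theta = np.asarray(x), np.asarray(theta, dtype=float)
    return float(np.prod(theta ** x * (1 - theta) ** (1 - x)))
```

Calling it once with theta = p and once with theta = q, then dividing, reproduces the likelihood ratio above.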


Bayes Decision Theory – Discrete Features

• Taking our likelihood ratio and plugging it into Eq. 31:

$$g(x) = \ln\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

yields:

$$g(x) = \sum_{i=1}^{d}\left[x_i\ln\frac{p_i}{q_i} + (1 - x_i)\ln\frac{1 - p_i}{1 - q_i}\right] + \ln\frac{P(\omega_1)}{P(\omega_2)}$$


• The discriminant function in this case is:

$$g(x) = \sum_{i=1}^{d} w_i x_i + w_0$$

where:

$$w_i = \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}, \quad i = 1, \ldots, d$$

and:

$$w_0 = \sum_{i=1}^{d}\ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

Decide ω1 if g(x) > 0 and ω2 if g(x) ≤ 0.
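Putting the weights and the decision rule together, a minimal sketch (illustrative names):

```python
import numpy as np

def classify_binary_features(x, p, q, prior1, prior2):
    # Linear discriminant for independent binary features:
    # decide omega_1 if g(x) > 0, omega_2 otherwise.
    x, p, q = (np.asarray(a, dtype=float) for a in (x, p, q))
    w = np.log(p * (1 - q) / (q * (1 - p)))
    w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(prior1 / prior2)
    g = w @ x + w0
    return 'omega_1' if g > 0 else 'omega_2'
```

Note that the magnitude of wi grows with the disagreement between pi and qi: a feature that behaves identically under both classes (pi = qi) gets zero weight.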
