Lectures 5 & 6: Classifiers

Hilary Term 2007, A. Zisserman

Bayesian Decision Theory
- Bayes decision rule
- Loss functions
- Likelihood ratio test

Classifiers and Decision Surfaces
- Discriminant function
- Normal distributions

Linear Classifiers
- The Perceptron
- Logistic Regression

Decision Theory

Suppose we wish to make measurements on a medical image and classify it as showing evidence of cancer or not.

[Diagram: image → image processing → measurement x → decision rule → C1 (cancer) or C2 (no cancer)]

We want to base this decision on the learnt joint distribution

$$p(x, C_i) = p(x \mid C_i)\, p(C_i)$$

How do we make the best decision?

Classification

Assign an input vector x to one of two or more classes Ck.

Any decision rule divides the input space into decision regions separated by decision boundaries.

Example: two class decision depending on a 2D vector measurement

We would also like a confidence measure (how sure are we that the input belongs to the chosen category?).

Decision Boundary for average error

Consider a two class decision depending on a scalar variable x

[Figure: joint densities p(x, C1) and p(x, C2) plotted against x, with decision regions R1 and R2, a decision boundary x̂, and the point x0]

The number of misclassifications is minimized if the decision boundary is at x0.

Bayes Decision rule

Assign x to the class Ci for which p(x, Ci) is largest.

Since p(x, Ci) = p(Ci|x) p(x), this is equivalent to:

Assign x to the class Ci for which p(Ci|x) is largest.

$$p(\text{error}) = \int_{-\infty}^{+\infty} p(\text{error}, x)\, dx = \int_{R_1} p(x, C_2)\, dx + \int_{R_2} p(x, C_1)\, dx$$
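The error integral can be evaluated numerically to see where the best boundary falls. The following is a minimal sketch under assumed values that are not from the lecture: two 1D Normal class densities with means 0 and 2, unit standard deviation, and equal priors. The names `p_error`, `best_boundary` and the constants are illustrative only.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1D Normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mean, std):
    """Cumulative distribution of a 1D Normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Assumed toy model (not from the lecture): equal priors, unit-variance classes.
PRIOR_C1, PRIOR_C2 = 0.5, 0.5
MEAN_C1, MEAN_C2, STD = 0.0, 2.0, 1.0

def p_error(boundary):
    """p(error) when R1 = (-inf, boundary) and R2 = [boundary, +inf).

    Integral over R1 of p(x, C2) dx plus integral over R2 of p(x, C1) dx.
    """
    c2_mass_in_r1 = PRIOR_C2 * normal_cdf(boundary, MEAN_C2, STD)
    c1_mass_in_r2 = PRIOR_C1 * (1.0 - normal_cdf(boundary, MEAN_C1, STD))
    return c2_mass_in_r1 + c1_mass_in_r2

# Scan candidate boundaries on a grid and keep the one with the smallest error.
candidates = [i / 100.0 for i in range(-200, 401)]
best_boundary = min(candidates, key=p_error)

# At the best boundary the two joint densities p(x, C1) and p(x, C2) cross.
joint_gap = abs(PRIOR_C1 * normal_pdf(best_boundary, MEAN_C1, STD)
                - PRIOR_C2 * normal_pdf(best_boundary, MEAN_C2, STD))
```

For this symmetric toy model the scan settles midway between the two means, which is also where the joint densities are equal, in agreement with the rule of assigning x to the class with the largest p(x, Ci).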

Bayes error

A classifier is a mapping from a vector x to class labels {C1, C2}

The Bayes error is the probability of misclassification

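$$p(\text{error}) = \int_{-\infty}^{+\infty} p(\text{error}, x)\, dx$$

$$= \int_{R_1} p(x, C_2)\, dx + \int_{R_2} p(x, C_1)\, dx$$

$$= \int_{R_1} p(C_2 \mid x)\, p(x)\, dx + \int_{R_2} p(C_1 \mid x)\, p(x)\, dx$$

[Figure: joint densities p(x, C1) and p(x, C2) plotted against x, with decision regions R1 and R2, a decision boundary x̂, and the point x0]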

Example: Iris recognition

John Daugman, "How Iris Recognition Works", IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, January 2004.

Posteriors

[Figure, two panels: class densities p(x|C1) and p(x|C2) against x; posterior probabilities p(C1|x) and p(C2|x) against x]

Assign x to the class Ci for which p(Ci|x) is largest, i.e. class i if p(Ci|x) > 0.5.

The posteriors sum to 1: $p(C_1 \mid x) + p(C_2 \mid x) = 1$, so $p(C_2 \mid x) = 1 - p(C_1 \mid x)$.

Reject option

Avoid making decisions if unsure: reject if the posterior probability p(Ci|x) is less than a threshold $\theta$.

[Figure: posterior probabilities p(C1|x) and p(C2|x) against x, with the reject region marked]
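The reject rule can be sketched in a few lines. This is an illustration only: the function name `classify_with_reject` and the parameter name `threshold` are hypothetical, and rejection is signalled here by returning `None`.

```python
def classify_with_reject(posteriors, threshold):
    """Return the index of the most probable class, or None to reject.

    posteriors: list of p(C_i | x) for one input x, assumed to sum to 1.
    threshold:  reject when the largest posterior falls below this value
                (hypothetical parameter name).
    """
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    if posteriors[best] < threshold:
        return None  # too uncertain: make no decision
    return best

confident = classify_with_reject([0.9, 0.1], threshold=0.8)     # class index 0
uncertain = classify_with_reject([0.55, 0.45], threshold=0.8)   # rejected
```

For two classes a threshold of 0.5 or less never rejects, because the larger posterior is always at least 0.5. Raising the threshold widens the reject region around the point where the two posteriors cross.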

Example: skin detection in video

Objective: label skin pixels (as a means to detect humans)

Two stages:

1. Training: learn likelihood for pixel colour, given skin and non-skin pixels

2. Testing: classify a new image into skin regions

[Figure, three panels: training image; training skin pixel mask; masked pixels]

Choice of colour space

- chromaticity colour space: r = R/(R+G+B), g = G/(R+G+B)
- invariant to scaling of R, G, B, plus 2D for visualisation

[Figure: skin pixels in chromaticity space, axes r = R/(R+G+B) and g = G/(R+G+B), both from 0 to 1]
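The chromaticity mapping is a one-line computation per pixel. The sketch below is illustrative: the function name `chromaticity` is hypothetical, and the handling of a black pixel (R+G+B = 0), where the ratio is undefined, is an assumption and not something the slides specify.

```python
def chromaticity(R, G, B):
    """Map an RGB triple to chromaticity coordinates (r, g).

    r = R / (R + G + B), g = G / (R + G + B).
    """
    total = R + G + B
    if total == 0:
        # Assumption: a black pixel has no defined chromaticity, so return
        # the neutral point (1/3, 1/3) rather than dividing by zero.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (R / total, G / total)

dim_pixel = chromaticity(100, 50, 50)
bright_pixel = chromaticity(200, 100, 100)  # same colour, twice the intensity
```

Both pixels map to the same (r, g) point, which is the invariance to scaling of R, G, B noted above.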

Represent likelihood as Normal Distribution

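$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right\}$$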

[Figure: Gaussian fitted to background pixels, p(x|background), over the chromaticity plane with axes r = R/(R+G+B) and g = G/(R+G+B)]

[Figure: Gaussian fitted to skin pixels, p(x|skin), over the same chromaticity plane]

[Figure, two panels: contours of the two Gaussians p(x|skin) and p(x|background); 3D view of the two Gaussians, where the vertical axis is likelihood]
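Fitting a Normal distribution to the masked pixels amounts to estimating a mean and a covariance. The following is a minimal sketch for the 2D chromaticity case, assuming maximum-likelihood estimates (division by the number of points); the slides do not say which estimator was used. The function names and the four sample points are made up for illustration.

```python
import math

def fit_gaussian_2d(points):
    """Maximum-likelihood mean and covariance of a list of (r, g) points."""
    n = len(points)
    mean_r = sum(p[0] for p in points) / n
    mean_g = sum(p[1] for p in points) / n
    s_rr = sum((p[0] - mean_r) ** 2 for p in points) / n
    s_gg = sum((p[1] - mean_g) ** 2 for p in points) / n
    s_rg = sum((p[0] - mean_r) * (p[1] - mean_g) for p in points) / n
    return (mean_r, mean_g), ((s_rr, s_rg), (s_rg, s_gg))

def gaussian_pdf_2d(x, mean, cov):
    """N(x | mean, cov) for n = 2, using the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dr, dg = x[0] - mean[0], x[1] - mean[1]
    # (x - mu)^T cov^{-1} (x - mu), with cov^{-1} = [[d, -b], [-c, a]] / det
    maha = (d * dr * dr - (b + c) * dr * dg + a * dg * dg) / det
    return math.exp(-0.5 * maha) / (2.0 * math.pi * math.sqrt(det))

# Made-up chromaticity samples standing in for masked skin pixels.
sample_points = [(0.45, 0.30), (0.47, 0.31), (0.44, 0.32), (0.46, 0.29)]
mean, cov = fit_gaussian_2d(sample_points)
peak = gaussian_pdf_2d(mean, mean, cov)           # density at the mean
off_peak = gaussian_pdf_2d((0.50, 0.35), mean, cov)
```

The same two functions would be applied once to the skin pixels and once to the background pixels to obtain the two likelihoods. Because these are densities over a small region of the plane, values far above 1 are expected.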

Posterior probability of skin given pixel colour

Assume equal prior probabilities, i.e. the probability of a pixel being skin is 0.5.

Posterior probability of skin is defined by Bayes rule:

$$P(\text{skin} \mid x) = \frac{p(x \mid \text{skin})\, P(\text{skin})}{p(x)}$$

where

$$p(x) = p(x \mid \text{skin})\, P(\text{skin}) + p(x \mid \text{background})\, P(\text{background})$$

i.e. the marginal pdf of x.

NB: the posterior depends on both foreground and background likelihoods, i.e. it involves both distributions.
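Bayes rule for the skin posterior can be written directly from the two likelihood values at a pixel. This is a sketch only: the function name `posterior_skin` is hypothetical and the likelihood values in the usage lines are invented to illustrate the NB above.

```python
def posterior_skin(lik_skin, lik_background, prior_skin=0.5):
    """P(skin | x) from the two class likelihoods evaluated at x.

    lik_skin:       p(x | skin)
    lik_background: p(x | background)
    prior_skin:     P(skin); P(background) is taken as 1 - P(skin).
    The marginal p(x) must be positive, i.e. at least one likelihood is nonzero.
    """
    prior_background = 1.0 - prior_skin
    evidence = lik_skin * prior_skin + lik_background * prior_background
    return lik_skin * prior_skin / evidence

# Same skin likelihood, different background likelihoods (invented values).
weak_background = posterior_skin(3.0, 1.0)
strong_background = posterior_skin(3.0, 9.0)
```

The two calls use the same p(x|skin) yet give different posteriors, because the background likelihood enters through the marginal p(x).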

Assess performance on training image

[Figure panels: input; P(x|background); P(x|skin); P(skin|x)]

[Figure panels: likelihood P(x|skin); posterior P(skin|x)]

The posterior depends on the likelihoods (Gaussians) of both classes.

[Figure panels: P(skin|x); P(skin|x) > 0.5]

Test data

[Figure panels: p(x|background); p(x|skin); p(skin|x); p(skin|x) > 0.5]

Test performance on other frames

Receiver Operating Characteristic (ROC) Curve

In many algorithms there is a threshold that affects performance, e.g.

- true positive: skin pixel classified as skin
- false positive: background pixel classified as skin

[Figure: ROC curve plotting true positives against false positives, both axes from 0 to 1, annotated "threshold decreasing" and "worse performance"]
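An ROC curve can be traced by sweeping the threshold and counting the two kinds of positive at each setting. The sketch below assumes that a pixel is classified as skin when its score (for example P(skin|x)) exceeds the threshold, and that both axes are rates normalised to the range 0 to 1. The function name and the six scores and labels are made up.

```python
def roc_points(scores, labels, thresholds):
    """Return a list of (false positive rate, true positive rate) pairs.

    scores:     one score per pixel, e.g. P(skin | x)
    labels:     True for a skin pixel, False for a background pixel
    thresholds: values to sweep; a pixel is called skin if score > threshold
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y and s > t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s > t)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Invented scores and ground-truth labels for six pixels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]
curve = roc_points(scores, labels, thresholds=[0.95, 0.7, 0.5, 0.2, 0.0])
```

As the threshold decreases, more pixels are labelled skin, so both rates can only grow and the curve runs from (0, 0) towards (1, 1).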

Loss function revisited

Consider again the cancer diagnosis example. The consequences for an incorrect classification vary for the following cases:

False positive: does not have cancer, but is classified as having it → distress, plus unnecessary further investigation

False negative: does have cancer, but is classified as not having it → no treatment, premature death

The two other cases are true positive and true negative.

Because the consequences of a false negative far outweigh the others, we minimize a loss function rather than simply the number of mistakes.

Loss matrix

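Risk:

$$R(C_i \mid x) = \sum_j L_{ij}\, p(C_j \mid x)$$

$$L_{ij} = \begin{pmatrix} 0 & 1 \\ 1000 & 0 \end{pmatrix}$$

Rows are the classification, columns are the truth (cancer, normal):

- classified cancer, truth cancer: true +ve, loss 0
- classified cancer, truth normal: false +ve, loss 1
- classified normal, truth cancer: false -ve, loss 1000
- classified normal, truth normal: true -ve, loss 0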

Bayes Risk

The class conditional risk of an action is

$$R(a_i \mid x) = \sum_j L(a_i \mid C_j)\, p(C_j \mid x)$$

where $a_i$ is the action, $x$ is the measurement, and $L(a_i \mid C_j)$ is the loss incurred if action i is taken and the true state is j.

Bayes decision rule: select the action for which R(ai|x) is minimum

$$a^{*} = \arg\min_{a_i} R(a_i \mid x)$$

Minimize Bayes risk

This decision minimizes the expected loss.
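The minimum-risk rule can be tried out with the loss matrix from the cancer example. This sketch reads entry (i, j) of the matrix as the loss for classification i when the truth is j, with index 0 for cancer and index 1 for normal; that orientation is inferred from the risk formula, and the function names and posterior values are invented.

```python
# Loss matrix of the cancer example: rows = classification, columns = truth.
# Index 0 is "cancer", index 1 is "normal" (orientation inferred, see lead-in).
LOSS = [[0, 1],
        [1000, 0]]

def conditional_risks(posteriors, loss=LOSS):
    """R(a_i | x) = sum_j L(a_i | C_j) p(C_j | x), one value per action."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posteriors))
            for row in loss]

def min_risk_action(posteriors, loss=LOSS):
    """Index of the action with the smallest conditional risk."""
    risks = conditional_risks(posteriors, loss)
    return min(range(len(risks)), key=lambda i: risks[i])

# Invented posteriors [p(cancer | x), p(normal | x)].
likely_normal = min_risk_action([0.01, 0.99])      # risks are 0.99 and 10
very_likely_normal = min_risk_action([0.0005, 0.9995])
```

With a 1% posterior for cancer the rule still chooses "cancer", since a missed case is weighted 1000 times more heavily than a false alarm; only when the cancer posterior drops below roughly 0.1% does the decision switch to "normal".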

Likelihood ratio

Two category classification with loss function

Conditional risk

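$$R(a_1 \mid x) = L_{11}\, p(C_1 \mid x) + L_{12}\, p(C_2 \mid x)$$

$$R(a_2 \mid x) = L_{21}\, p(C_1 \mid x) + L_{22}\, p(C_2 \mid x)$$

Thus for minimum risk, decide C1 if

$$L_{11}\, p(C_1 \mid x) + L_{12}\, p(C_2 \mid x) < L_{21}\, p(C_1 \mid x) + L_{22}\, p(C_2 \mid x)$$

$$p(C_2 \mid x)\,(L_{12} - L_{22}) < p(C_1 \mid x)\,(L_{21} - L_{11})$$

$$p(x \mid C_2)\, p(C_2)\,(L_{12} - L_{22}) < p(x \mid C_1)\, p(C_1)\,(L_{21} - L_{11})$$

Assuming $L_{21} - L_{11} > 0$, then decide C1 if

$$\frac{p(x \mid C_1)}{p(x \mid C_2)} > \frac{p(C_2)\,(L_{12} - L_{22})}{p(C_1)\,(L_{21} - L_{11})}$$

i.e. the likelihood ratio exceeds a threshold that is independent of x.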
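As a worked instance, the threshold can be evaluated for the loss matrix of the cancer example, taking C1 = cancer and C2 = normal, so that L11 = 0, L12 = 1, L21 = 1000, L22 = 0. The equal priors in the last step are an assumption made only for illustration.

```latex
\frac{p(x \mid C_1)}{p(x \mid C_2)}
  > \frac{p(C_2)\,(L_{12} - L_{22})}{p(C_1)\,(L_{21} - L_{11})}
  = \frac{p(C_2)\,(1 - 0)}{p(C_1)\,(1000 - 0)}
  = \frac{1}{1000}\,\frac{p(C_2)}{p(C_1)}
% With equal priors, p(C_1) = p(C_2), the threshold is 1/1000 = 0.001:
% decide "cancer" unless the measurement is over 1000 times more
% likely under "normal" than under "cancer".
```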

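Discriminant functions

A two category classifier can often be written in the form

$$g(x) \begin{cases} > 0 & \text{assign } x \text{ to } C_1 \\ < 0 & \text{assign } x \text{ to } C_2 \end{cases}$$

where $g(x)$ is a discriminant function, and $g(x) = 0$ is a discriminant surface.

In 2D $g(x) = 0$ is a set of curves.

[Figure: regions C1 and C2 separated by the curves g(x) = 0]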


Example

In the minimum average error classifier, the assignment rule is: decide C1 if the posterior p(C1|x) > p(C2|x).

The equivalent discriminant function is

$$g(x) = p(C_1 \mid x) - p(C_2 \mid x)$$

or

$$g(x) = \ln \frac{p(C_1 \mid x)}{p(C_2 \mid x)}$$

Note, these two functions are not equal, but the decision boundaries are the same.

Developing this further

$$g(x) = \ln \frac{p(C_1 \mid x)}{p(C_2 \mid x)} = \ln \frac{p(x \mid C_1)}{p(x \mid C_2)} + \ln \frac{p(C_1)}{p(C_2)}$$
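The intermediate step, spelled out: applying Bayes rule to each posterior puts the same marginal p(x) in numerator and denominator, so it cancels and the logarithm of the remaining product splits into two terms.

```latex
g(x) = \ln \frac{p(C_1 \mid x)}{p(C_2 \mid x)}
     = \ln \frac{p(x \mid C_1)\, p(C_1) / p(x)}{p(x \mid C_2)\, p(C_2) / p(x)}
     = \ln \frac{p(x \mid C_1)\, p(C_1)}{p(x \mid C_2)\, p(C_2)}
     = \ln \frac{p(x \mid C_1)}{p(x \mid C_2)} + \ln \frac{p(C_1)}{p(C_2)}
```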

Decision surfaces for Normal distributions

Suppose that the likelihoods are Normal:

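$$p(x \mid C_1) \sim \mathcal{N}(\mu_1, \Sigma_1), \qquad p(x \mid C_2) \sim \mathcal{N}(\mu_2, \Sigma_2)$$

Then

$$g(x) = \ln \frac{p(x \mid C_1)}{p(x \mid C_2)} + \ln \frac{p(C_1)}{p(C_2)}$$

$$= \ln p(x \mid C_1) - \ln p(x \mid C_2) + \ln \frac{p(C_1)}{p(C_2)}$$

$$= -\frac{1}{2} (x - \mu_1)^\top \Sigma_1^{-1} (x - \mu_1) + \frac{1}{2} (x - \mu_2)^\top \Sigma_2^{-1} (x - \mu_2) + c_0$$

where

$$c_0 = \ln \frac{p(C_1)}{p(C_2)} - \frac{1}{2} \ln |\Sigma_1| + \frac{1}{2} \ln |\Sigma_2|.$$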
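The quadratic form of g(x) can be checked numerically against the log posterior ratio it was derived from. The sketch below does this for the 1D special case, where each covariance reduces to a variance and the determinant to the variance itself; the function names and parameter values are illustrative assumptions.

```python
import math

def log_normal_pdf(x, mean, var):
    """ln N(x | mean, var) for a 1D Normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def discriminant(x, mean1, var1, mean2, var2, prior1=0.5):
    """g(x) in quadratic form, 1D case: Sigma_i becomes the scalar var_i."""
    prior2 = 1.0 - prior1
    c0 = math.log(prior1 / prior2) - 0.5 * math.log(var1) + 0.5 * math.log(var2)
    return (-0.5 * (x - mean1) ** 2 / var1
            + 0.5 * (x - mean2) ** 2 / var2
            + c0)

def log_posterior_ratio(x, mean1, var1, mean2, var2, prior1=0.5):
    """ln p(x|C1) - ln p(x|C2) + ln p(C1)/p(C2), computed directly."""
    prior2 = 1.0 - prior1
    return (log_normal_pdf(x, mean1, var1) - log_normal_pdf(x, mean2, var2)
            + math.log(prior1 / prior2))

# Invented parameters: unequal variances and unequal priors.
params = dict(mean1=0.0, var1=1.0, mean2=2.0, var2=4.0, prior1=0.3)
largest_gap = max(abs(discriminant(x, **params) - log_posterior_ratio(x, **params))
                  for x in [-3.0, -1.0, 0.0, 0.5, 1.0, 2.5, 6.0])
```

The two expressions agree up to rounding because the term with 2π is the same for both classes and cancels. With equal variances and equal priors c0 vanishes and g(x) = 0 lies midway between the two means.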

Case 1: $\Sigma_i = \sigma^2 I$

g(