
Page 1: 240-650: Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation

Montri Karnjanadecha
montri@coe.psu.ac.th
http://fivedots.coe.psu.ac.th/~montri
240-650 Principles of Pattern Recognition

Page 2: Chapter 3

Maximum-Likelihood and Bayesian Parameter Estimation

Page 3: Introduction

• We could design an optimal classifier if we knew the priors P(ωi) and the class-conditional densities p(x|ωi)
• In practice we rarely have complete knowledge of the probabilistic structure of the problem
• We therefore estimate P(ωi) and p(x|ωi) from training data (design samples)

Page 4: Maximum-Likelihood Estimation

• ML estimation
• Nearly always has good convergence properties as the number of training samples increases
• Simpler than many alternative methods

Page 5: The General Principle

• Suppose we separate a collection of samples according to class, so that we have c data sets D1, …, Dc, with the samples in Dj drawn independently according to the probability law p(x|ωj)
• We say such samples are i.i.d. (independent and identically distributed random variables)

Page 6: The General Principle

• We assume that p(x|ωj) has a known parametric form and is determined uniquely by the value of a parameter vector θj
• For example,

$$p(\mathbf{x}|\omega_j) \sim N(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$$

• To show this dependence explicitly, we write p(x|ωj) as p(x|ωj, θj)

Page 7: Problem Statement

• To use the information provided by the training samples to obtain good estimates of the unknown parameter vectors θ1, …, θc associated with each category

Page 8: Simplified Problem Statement

• If the samples in Di give no information about θj when i ≠ j
• Then we have c separate problems of the following form:

To use a set D of training samples drawn independently from the probability density p(x|θ) to estimate the unknown parameter vector θ.

Page 9:

• Suppose that D contains n samples, x1, …, xn
• Because the samples were drawn independently, we have

$$p(D|\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k|\boldsymbol{\theta})$$

• Viewed as a function of θ, p(D|θ) is called the likelihood of θ with respect to the set of samples
• The maximum-likelihood estimate $\hat{\boldsymbol{\theta}}$ of θ is the value of θ that maximizes p(D|θ)


Page 11:

• Let θ = (θ1, …, θp)^t
• Let ∇θ be the gradient operator

$$\nabla_{\boldsymbol{\theta}} = \begin{bmatrix} \dfrac{\partial}{\partial\theta_1} \\ \vdots \\ \dfrac{\partial}{\partial\theta_p} \end{bmatrix}$$

Page 12: Log-Likelihood Function

• We define l(θ) as the log-likelihood function:

$$l(\boldsymbol{\theta}) \equiv \ln p(D|\boldsymbol{\theta})$$

• We can write our solution as

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta})$$
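As an illustration of these definitions, the sketch below (not part of the original slides) evaluates l(θ) = ln p(D|θ) on a grid of candidate means for a univariate Gaussian with known σ = 1 and picks the arg max; the synthetic data and the grid are assumptions made purely for this example.

```python
# A minimal sketch (not from the slides): maximizing the log-likelihood
# l(theta) = ln p(D|theta) by brute-force grid search, assuming a
# univariate Gaussian with known sigma = 1.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=2.0, scale=1.0, size=100)       # training samples x_1..x_n

def log_likelihood(mu, x, sigma=1.0):
    """l(mu) = sum_k ln p(x_k | mu) for a Gaussian with known sigma."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - 0.5 * ((x - mu) / sigma)**2)

grid = np.linspace(0.0, 4.0, 401)                  # candidate values of theta
l_values = np.array([log_likelihood(mu, D) for mu in grid])
theta_hat = grid[np.argmax(l_values)]              # arg max of l(theta)

# Because ln is monotonic, this is also the arg max of p(D|theta);
# it should land very close to the sample mean.
print(theta_hat, D.mean())
```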

Page 13: MLE

• From

$$p(D|\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k|\boldsymbol{\theta})$$

• We have

$$l(\boldsymbol{\theta}) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k|\boldsymbol{\theta})$$

• And

$$\nabla_{\boldsymbol{\theta}}\, l = \sum_{k=1}^{n} \nabla_{\boldsymbol{\theta}} \ln p(\mathbf{x}_k|\boldsymbol{\theta})$$

• Necessary condition for the MLE:

$$\nabla_{\boldsymbol{\theta}}\, l = \mathbf{0}$$
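A minimal numerical sketch of the necessary condition ∇θ l = 0 (not from the slides), assuming a univariate Gaussian with known variance: plain gradient ascent on l(μ), using the per-sample gradients summed as above. The data, step size, and stopping threshold are illustrative assumptions.

```python
# Gradient ascent on l(mu) for a univariate Gaussian with known sigma = 1;
# we stop when the summed gradient is (numerically) zero, i.e. when the
# necessary condition grad l = 0 is met.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=-1.5, scale=1.0, size=200)

mu, sigma2, step = 0.0, 1.0, 0.001
for _ in range(10_000):
    grad = np.sum((x - mu) / sigma2)   # sum_k grad ln p(x_k | mu)
    if abs(grad) < 1e-8:               # grad l ~ 0: necessary condition met
        break
    mu += step * grad

print(mu, x.mean())                    # the two should agree closely
```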

Page 14: The Gaussian Case: Unknown μ

• Suppose that the samples are drawn from a multivariate normal population with mean μ and covariance matrix Σ
• Let μ be the only unknown
• Consider a sample point xk and find

$$\ln p(\mathbf{x}_k|\boldsymbol{\mu}) = -\frac{1}{2}\ln\!\left[(2\pi)^d|\boldsymbol{\Sigma}|\right] - \frac{1}{2}(\mathbf{x}_k-\boldsymbol{\mu})^t\boldsymbol{\Sigma}^{-1}(\mathbf{x}_k-\boldsymbol{\mu})$$

• and

$$\nabla_{\boldsymbol{\mu}} \ln p(\mathbf{x}_k|\boldsymbol{\mu}) = \boldsymbol{\Sigma}^{-1}(\mathbf{x}_k-\boldsymbol{\mu})$$

Page 15:

• The MLE of μ must satisfy

$$\sum_{k=1}^{n} \boldsymbol{\Sigma}^{-1}(\mathbf{x}_k - \hat{\boldsymbol{\mu}}) = \mathbf{0}$$

• After rearranging:

$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k$$

Page 16: Sample Mean

• The MLE for the unknown population mean is just the arithmetic average of the training samples (the sample mean)
• If we think of the n samples as a cloud of points, the sample mean is the centroid of the cloud

Page 17: The Gaussian Case: Unknown μ and Σ

• This is the more typical case, where both the mean and the covariance matrix are unknown
• Consider the univariate case with θ1 = μ and θ2 = σ²

$$\ln p(x_k|\boldsymbol{\theta}) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}(x_k-\theta_1)^2$$
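The log-density expression above can be sanity-checked numerically. The sketch below is an illustration, not part of the slides; the sample value and the parameter values are made up, and it simply compares the formula with scipy.stats.norm.logpdf.

```python
# Check ln p(x_k | theta), with theta1 = mu and theta2 = sigma^2,
# against the library implementation.
import numpy as np
from scipy.stats import norm

x_k, theta1, theta2 = 0.7, 2.0, 4.0    # one sample, mu = 2, sigma^2 = 4

manual = -0.5 * np.log(2 * np.pi * theta2) - (x_k - theta1)**2 / (2 * theta2)
library = norm.logpdf(x_k, loc=theta1, scale=np.sqrt(theta2))

print(manual, library)                  # the two values should match
```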

Page 18:

• Considering a single sample, l = ln p(x_k|θ), and its derivative is

$$\nabla_{\boldsymbol{\theta}}\, l = \nabla_{\boldsymbol{\theta}} \ln p(x_k|\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{1}{\theta_2}(x_k-\theta_1) \\[2ex] -\dfrac{1}{2\theta_2} + \dfrac{(x_k-\theta_1)^2}{2\theta_2^2} \end{bmatrix}$$

• Setting the gradient of the full log-likelihood to zero gives

$$\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2}(x_k - \hat{\theta}_1) = 0$$

• and

$$-\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^2} = 0$$

Page 19:

• With a little rearranging, we have

$$\hat{\mu} = \hat{\theta}_1 = \frac{1}{n}\sum_{k=1}^{n} x_k$$

$$\hat{\sigma}^2 = \hat{\theta}_2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})^2$$
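A minimal sketch of these two univariate estimators on synthetic data (the data are an assumption made only for illustration):

```python
# Univariate ML estimates: mu_hat = (1/n) sum x_k,
# sigma2_hat = (1/n) sum (x_k - mu_hat)^2.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=3.0, size=1000)
n = x.size

mu_hat = x.sum() / n                        # (1/n) sum_k x_k
sigma2_hat = ((x - mu_hat)**2).sum() / n    # (1/n) sum_k (x_k - mu_hat)^2

print(mu_hat, sigma2_hat)                   # close to 5 and 9 for this sample
```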

Page 20: MLE for the Multivariate Case

$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k$$

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^t$$
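A minimal sketch of the multivariate estimates, assuming synthetic 2-D Gaussian data; it also cross-checks Σ̂ against numpy's biased sample covariance (np.cov with bias=True), which likewise divides by n.

```python
# Multivariate ML estimates: mu_hat is the sample mean, Sigma_hat is the
# average of the outer products (x_k - mu_hat)(x_k - mu_hat)^t.
import numpy as np

rng = np.random.default_rng(3)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=500)   # shape (n, d)
n = X.shape[0]

mu_hat = X.sum(axis=0) / n
diff = X - mu_hat                      # row k holds x_k - mu_hat
Sigma_hat = diff.T @ diff / n          # sum of outer products, divided by n

print(mu_hat)
print(Sigma_hat)
print(np.cov(X, rowvar=False, bias=True))   # should agree with Sigma_hat
```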

Page 21: Bias

• The MLE for the variance σ² is biased
• The expected value, over all data sets of size n, of the sample variance is not equal to the true variance:

$$E\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$$

• An unbiased estimator for Σ is given by

$$\mathbf{C} = \frac{1}{n-1}\sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^t$$
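A small Monte-Carlo sketch (not from the slides) illustrating the bias: assuming n = 5 samples per data set and many synthetic data sets, the average of the 1/n estimate comes out near (n-1)/n of the true variance, while the 1/(n-1) estimate averages near the true variance. The sample size, number of trials, and true variance are arbitrary illustrative choices.

```python
# Compare the expected value of the biased (1/n) and unbiased (1/(n-1))
# variance estimators over many data sets of size n.
import numpy as np

rng = np.random.default_rng(4)
n, trials, true_var = 5, 200_000, 1.0

mle_vars, unbiased_vars = [], []
for _ in range(trials):
    x = rng.normal(0.0, np.sqrt(true_var), size=n)
    m = x.mean()
    mle_vars.append(((x - m)**2).sum() / n)           # biased MLE
    unbiased_vars.append(((x - m)**2).sum() / (n - 1))

print(np.mean(mle_vars))       # about (n-1)/n * true_var = 0.8
print(np.mean(unbiased_vars))  # about true_var = 1.0
```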