240-650 Principles of Pattern Recognition
Montri [email protected]
http://fivedots.coe.psu.ac.th/~montri

Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation
Introduction
• We could design an optimal classifier if we knew the prior probabilities P(ωi) and the class-conditional densities p(x|ωi)
• We rarely have such complete knowledge of the probabilistic structure of the problem
• Instead, we estimate P(ωi) and p(x|ωi) from training data (design samples)
Maximum-Likelihood Estimation
• ML estimation
• Has good convergence properties as the number of training samples increases
• Is often simpler than alternative methods
The General Principle
• Suppose we separate a collection of samples according to class, so that we have c data sets D1, …, Dc, with the samples in Dj drawn independently according to the probability law p(x|ωj)
• We say such samples are i.i.d. – independent and identically distributed random variables
• We assume that p(x|ωj) has a known parametric form and is determined uniquely by the value of a parameter vector θj
• For example, a Gaussian class-conditional density:

$$p(\mathbf{x}\,|\,\omega_j) \sim N(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$$

• We explicitly write p(x|ωj) as p(x|ωj, θj) to show the dependence on θj
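As a quick illustration of such a parametric form, one can draw i.i.d. samples from a class-conditional Gaussian with NumPy; the parameter values below are invented for the example, not taken from the course.

```python
import numpy as np

# Hypothetical parameters theta_j = (mu_j, Sigma_j) for one class omega_j
mu_j = np.array([1.0, -2.0])
Sigma_j = np.array([[2.0, 0.3],
                    [0.3, 1.0]])

rng = np.random.default_rng(0)
# n = 500 i.i.d. samples drawn from p(x | omega_j) = N(mu_j, Sigma_j)
D_j = rng.multivariate_normal(mu_j, Sigma_j, size=500)

print(D_j.shape)  # one row per sample, one column per feature dimension
```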
Problem Statement
• Use the information provided by the training samples to obtain good estimates of the unknown parameter vectors θ1, …, θc associated with each category
Simplified Problem Statement
• If samples in Di give no information about θj for i ≠ j,
• then we have c separate problems of the following form:
To use a set D of training samples drawn independently from the probability density p(x|θ) to estimate the unknown parameter vector θ.
• Suppose that D contains n samples, x1, …, xn.
• Then, because the samples were drawn independently, we have

$$p(D\,|\,\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k\,|\,\boldsymbol{\theta})$$

• Viewed as a function of θ, p(D|θ) is called the likelihood of θ with respect to the set of samples
• The maximum-likelihood estimate $\hat{\boldsymbol{\theta}}$ of θ is the value of θ that maximizes p(D|θ)
• Let θ = (θ1, …, θp)^t
• Let ∇θ be the gradient operator

$$\nabla_{\boldsymbol{\theta}} = \begin{bmatrix} \dfrac{\partial}{\partial\theta_1} \\ \vdots \\ \dfrac{\partial}{\partial\theta_p} \end{bmatrix}$$
Log-Likelihood Function
• We define l(θ) as the log-likelihood function:

$$l(\boldsymbol{\theta}) = \ln p(D\,|\,\boldsymbol{\theta})$$

• We can write our solution as

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}}\; l(\boldsymbol{\theta})$$
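The argmax definition can be checked numerically. Below is a minimal sketch, assuming univariate Gaussian data with known unit variance, so that θ is just the unknown mean; the data and the search grid are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: 200 draws from N(3, 1); the true mean 3.0 is the unknown theta
x = rng.normal(3.0, 1.0, size=200)

def log_likelihood(theta, x):
    # l(theta) = sum_k ln p(x_k | theta) for a N(theta, 1) density
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2)

# theta_hat = argmax_theta l(theta), found here by a coarse grid search
grid = np.linspace(0.0, 6.0, 601)
theta_hat = grid[np.argmax([log_likelihood(t, x) for t in grid])]
```

The grid maximizer agrees with the sample mean up to the grid resolution, which previews the closed-form result derived in the following slides.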
MLE
• From

$$p(D\,|\,\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k\,|\,\boldsymbol{\theta})$$

• we have

$$l(\boldsymbol{\theta}) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k\,|\,\boldsymbol{\theta})$$

• and

$$\nabla_{\boldsymbol{\theta}}\, l = \sum_{k=1}^{n} \nabla_{\boldsymbol{\theta}} \ln p(\mathbf{x}_k\,|\,\boldsymbol{\theta})$$

• A necessary condition for the MLE:

$$\nabla_{\boldsymbol{\theta}}\, l = \mathbf{0}$$
The Gaussian Case: Unknown μ
• Suppose that the samples are drawn from a multivariate normal population with mean μ and covariance matrix Σ
• Let μ be the only unknown
• Consider a sample point xk and find

$$\ln p(\mathbf{x}_k\,|\,\boldsymbol{\mu}) = -\frac{1}{2}\ln\!\left[(2\pi)^d |\boldsymbol{\Sigma}|\right] - \frac{1}{2}(\mathbf{x}_k - \boldsymbol{\mu})^t\, \boldsymbol{\Sigma}^{-1} (\mathbf{x}_k - \boldsymbol{\mu})$$

• and

$$\nabla_{\boldsymbol{\mu}} \ln p(\mathbf{x}_k\,|\,\boldsymbol{\mu}) = \boldsymbol{\Sigma}^{-1}(\mathbf{x}_k - \boldsymbol{\mu})$$
• The MLE of μ must satisfy

$$\sum_{k=1}^{n} \boldsymbol{\Sigma}^{-1}(\mathbf{x}_k - \hat{\boldsymbol{\mu}}) = \mathbf{0}$$

• After rearranging,

$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k$$
Sample Mean
• The MLE for the unknown population mean is just the arithmetic average of the training samples (the sample mean)
• If we think of the n samples as a cloud of points, the sample mean is the centroid of the cloud
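A minimal numerical sketch confirming that the sample mean satisfies the necessary condition above; the distribution parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
X = rng.multivariate_normal([0.0, 1.0], Sigma, size=100)

# Sample mean: the MLE of mu, and the centroid of the cloud of points
mu_hat = X.mean(axis=0)

# Check the necessary condition: sum_k Sigma^{-1} (x_k - mu_hat) = 0
Sigma_inv = np.linalg.inv(Sigma)
residual = Sigma_inv @ (X - mu_hat).sum(axis=0)  # should be ~0 up to rounding
```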
The Gaussian Case: Unknown μ and Σ
• This is the more typical case, in which both the mean and the covariance matrix are unknown
• Consider the univariate case with θ1 = μ and θ2 = σ²

$$\ln p(x_k\,|\,\boldsymbol{\theta}) = -\frac{1}{2}\ln 2\pi\theta_2 - \frac{1}{2\theta_2}(x_k - \theta_1)^2$$
• And its derivative is

$$\nabla_{\boldsymbol{\theta}}\, l = \nabla_{\boldsymbol{\theta}} \ln p(x_k\,|\,\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{1}{\theta_2}(x_k - \theta_1) \\[2mm] -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^2} \end{bmatrix}$$

• Setting it to 0 gives

$$\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2}\,(x_k - \hat{\theta}_1) = 0$$

• and

$$-\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^2} = 0$$
• With a little rearranging, we have

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$$

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})^2$$
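These two estimates are easy to compute directly. A minimal sketch with synthetic data (the true parameters 5.0 and 2.0 are invented); note that NumPy's `np.var` with its default `ddof=0` computes exactly this ML estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(5.0, 2.0, size=1000)  # synthetic univariate Gaussian data
n = len(x)

mu_hat = x.sum() / n                        # (1/n) sum_k x_k
sigma2_hat = ((x - mu_hat) ** 2).sum() / n  # (1/n) sum_k (x_k - mu_hat)^2
```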
MLE for the Multivariate Case

$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k$$

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^t$$
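The multivariate estimates can be sketched the same way on synthetic data (parameters invented); the sum of outer products is written as a single matrix product over the centered data.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 0.0],
                            [[3.0, 1.0],
                             [1.0, 2.0]], size=2000)
n = X.shape[0]

mu_hat = X.mean(axis=0)          # (1/n) sum_k x_k
centered = X - mu_hat
# Sigma_hat = (1/n) sum_k (x_k - mu_hat)(x_k - mu_hat)^t
Sigma_hat = centered.T @ centered / n
```

This matches `np.cov(X.T, bias=True)`, NumPy's covariance routine with the 1/n normalization.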
Bias
• The MLE for the variance σ² is biased: the expected value over all data sets of size n of the sample variance is not equal to the true variance:

$$\mathcal{E}\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$$

• An unbiased estimator for Σ is given by

$$\mathbf{C} = \frac{1}{n-1}\sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^t$$
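The bias can be demonstrated by simulation: averaging the ML variance estimate over many small data sets approaches ((n-1)/n)σ², not σ². The sample size and variance below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
true_var = 4.0
n = 5           # a small n makes the bias clearly visible
trials = 20000  # many independent data sets of size n

samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
mle_var = samples.var(axis=1, ddof=0)       # biased: divides by n
unbiased_var = samples.var(axis=1, ddof=1)  # unbiased: divides by n - 1

# Empirically, mle_var.mean() is close to ((n-1)/n) * true_var = 3.2,
# while unbiased_var.mean() is close to true_var = 4.0
```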