ECE 8443 – Pattern Recognition
LECTURE 10: MAXIMUM LIKELIHOOD ESTIMATION
• Objectives: Overview, General Case, Gaussian Cases
• Resources:
DHS – Chap. 3 (Part 1)
AM – Tutorial
AM – Links
BGIM – Primer
CSRN – Unbiased
DM – Bias
• URL: .../publications/courses/ece_8443/lectures/current/lecture_10.ppt
• In Chapter 2, we learned how to design an optimal classifier if we knew the prior probabilities, P(ωi), and the class-conditional densities, p(x|ωi).
• What can we do if we do not have this information?
• What limitations do we face?
• There are two common approaches to parameter estimation: maximum-likelihood and Bayesian estimation.
• Maximum likelihood: treat the parameters as quantities whose values are fixed but unknown.
• Bayes: treat the parameters as random variables having some known prior distribution. Observations of samples convert this prior to a posterior density.
• Bayesian learning: sharpen the a posteriori density causing it to peak near the true value.
10: MAXIMUM LIKELIHOOD ESTIMATION – INTRODUCTION
10: MAXIMUM LIKELIHOOD ESTIMATION – GENERAL PRINCIPLE
• I.I.D.: c data sets, D1, ..., Dc, where Dj is drawn independently according to p(x|ωj).
• Assume p(x|ωj) has a known parametric form and is completely determined by the parameter vector θj (e.g., p(x|ωj) ~ N(μj, Σj), where θj = [μ1, ..., μd, σ11, σ12, ..., σdd]).
• p(x|ωj) has an explicit dependence on θj: p(x|ωj, θj).
• Use training samples to estimate θ1, θ2, ..., θc.
• Functional independence: assume Di gives no useful information about θj for i ≠ j.
• This simplifies notation to a single set D of training samples (x1, ..., xn) drawn independently from p(x|θ), used to estimate θ.
• Because the samples were drawn independently:

$$p(D|\theta) = \prod_{k=1}^{n} p(x_k|\theta)$$
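The product above is usually evaluated in log form so that it becomes a sum. A minimal sketch (not from the slides; the Gaussian density and the sample data are assumed for illustration):

```python
import math

def gaussian_log_pdf(x, mu, sigma2):
    """Log-density of a univariate Gaussian N(mu, sigma2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)

def log_likelihood(data, mu, sigma2):
    """ln p(D|theta) = sum_k ln p(x_k|theta): because the samples are
    drawn independently, the product of densities becomes a sum of logs."""
    return sum(gaussian_log_pdf(x, mu, sigma2) for x in data)

data = [1.2, 0.8, 1.5, 1.1]
# The (log-)likelihood of theta = (mu, sigma^2) with respect to this data:
print(log_likelihood(data, mu=1.0, sigma2=0.25))
```

Working in log space also avoids numerical underflow when n is large.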
10: MAXIMUM LIKELIHOOD ESTIMATION – EXAMPLE
• p(D|) is called the likelihood of with respect to the data.
• Given several training points
• Top: candidate source distributions are shown
• Which distribution is the ML estimate?
• Middle: an estimate of the likelihood of the data as a function of μ (the mean)
• Bottom: log likelihood
• The value of θ that maximizes this likelihood, denoted $\hat{\theta}$, is the maximum likelihood (ML) estimate of θ.
10: MAXIMUM LIKELIHOOD ESTIMATION – GENERAL MATHEMATICS
• Let $\theta = (\theta_1, \theta_2, \ldots, \theta_p)^t$ denote the parameter vector.

• Let $\nabla_\theta = \left[\dfrac{\partial}{\partial\theta_1}, \ldots, \dfrac{\partial}{\partial\theta_p}\right]^t$ denote the gradient operator.

• Define the log-likelihood:

$$l(\theta) = \ln p(D|\theta) = \sum_{k=1}^{n} \ln p(x_k|\theta)$$

$$\hat{\theta} = \arg\max_{\theta}\, l(\theta)$$

• The ML estimate is found by solving this equation:

$$\nabla_\theta\, l = \sum_{k=1}^{n} \nabla_\theta \ln p(x_k|\theta) = 0$$
• The solution to this equation can be a global maximum, a local maximum, or even an inflection point.
• Under what conditions is it a global maximum?
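When the gradient equation cannot be solved in closed form, the maximum can be located numerically. A minimal sketch (assumed, not from the slides): maximize the Gaussian log-likelihood over a grid of candidate means, and check that the maximizer matches the sample mean, as the gradient condition predicts for a Gaussian.

```python
import math

def log_likelihood(data, mu, sigma2=1.0):
    """ln p(D|mu) for a univariate Gaussian with known variance sigma2."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

data = [2.1, 1.9, 2.4, 2.0, 1.6]
candidates = [i / 1000.0 for i in range(0, 4001)]   # grid over [0, 4]
mu_hat = max(candidates, key=lambda mu: log_likelihood(data, mu))

# For a Gaussian the log-likelihood is concave in mu, so this grid
# maximum is the global maximum and coincides with the sample mean.
print(mu_hat, sum(data) / len(data))
```

Concavity of the log-likelihood is what guarantees the stationary point is a global maximum here; in general, a zero gradient can also be a local maximum or an inflection point.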
10: MAXIMUM LIKELIHOOD ESTIMATION – MAXIMUM A POSTERIORI (MAP)
• A class of estimators – maximum a posteriori (MAP) – maximize $l(\theta) + \ln p(\theta)$, where $p(\theta)$ describes the prior probability of different parameter values.
• An ML estimator is a MAP estimator for uniform priors.
• A MAP estimator finds the peak, or mode, of a posterior density.
• MAP estimators are not transformation invariant (if we perform a nonlinear transformation of the input data, the estimator is no longer optimum in the new space). This observation will be useful later in the course.
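A minimal sketch (assumed, not from the slides): the MAP estimate of a Gaussian mean under a Gaussian prior N(mu0, tau2), with known likelihood variance sigma2. Maximizing $l(\mu) + \ln p(\mu)$ in closed form yields a weighted average of the sample mean and the prior mean, and a nearly uniform (very broad) prior recovers the ML estimate.

```python
def map_mean(data, mu0, tau2, sigma2):
    """MAP estimate of mu: Gaussian likelihood N(mu, sigma2),
    Gaussian prior N(mu0, tau2) on mu (closed-form posterior mode)."""
    n = len(data)
    xbar = sum(data) / n
    return (n * tau2 * xbar + sigma2 * mu0) / (n * tau2 + sigma2)

data = [2.1, 1.9, 2.4, 2.0, 1.6]   # sample mean = 2.0
# A very broad prior (tau2 large): MAP approaches the ML estimate.
print(map_mean(data, mu0=0.0, tau2=1e6, sigma2=1.0))
# A tight prior pulls the estimate toward mu0 = 0.
print(map_mean(data, mu0=0.0, tau2=0.1, sigma2=1.0))
```

This illustrates the statement above: an ML estimator is a MAP estimator with a uniform prior.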
10: MAXIMUM LIKELIHOOD ESTIMATION – GAUSSIAN CASE: UNKNOWN MEAN
• Consider the case where only the mean, $\theta = \mu$, is unknown:

$$\ln p(x_k|\mu) = \ln\!\left[\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\!\left(-\tfrac{1}{2}(x_k-\mu)^t\Sigma^{-1}(x_k-\mu)\right)\right] = -\tfrac{1}{2}\ln\!\left[(2\pi)^d|\Sigma|\right] - \tfrac{1}{2}(x_k-\mu)^t\Sigma^{-1}(x_k-\mu)$$

• Setting the gradient of the log-likelihood to zero:

$$\nabla_\mu\, l = \sum_{k=1}^{n}\nabla_\mu \ln p(x_k|\mu) = 0$$

which implies:

$$\nabla_\mu \ln p(x_k|\mu) = \Sigma^{-1}(x_k-\mu)$$

because:

$$\nabla_\mu\!\left[-\tfrac{1}{2}\ln\!\left[(2\pi)^d|\Sigma|\right] - \tfrac{1}{2}(x_k-\mu)^t\Sigma^{-1}(x_k-\mu)\right] = \Sigma^{-1}(x_k-\mu)$$

• Rearranging terms:

$$\sum_{k=1}^{n}\Sigma^{-1}(x_k-\hat{\mu}) = 0$$

• Significance???
10: MAXIMUM LIKELIHOOD ESTIMATION – GAUSSIAN CASE: UNKNOWN MEAN
• Substituting into the expression for the total likelihood:

$$\nabla_\mu\, l = \sum_{k=1}^{n}\Sigma^{-1}(x_k-\hat{\mu}) = 0$$

• Multiplying through by $\Sigma$ and rearranging terms:

$$\sum_{k=1}^{n}(x_k-\hat{\mu}) = 0 \;\Rightarrow\; \sum_{k=1}^{n}x_k - n\hat{\mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{k=1}^{n}x_k$$

• The ML estimate of the mean is simply the sample mean of the training data.
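A quick numerical check (assumed, not from the slides): the gradient condition $\sum_k (x_k - \mu)/\sigma^2 = 0$ is solved exactly by the sample mean, since the positive factor $1/\sigma^2$ scales out.

```python
data = [2.1, 1.9, 2.4, 2.0, 1.6]
mu_hat = sum(data) / len(data)          # sample mean = ML estimate

# Gradient of the log-likelihood at mu_hat (up to the 1/sigma^2 factor):
gradient = sum(x - mu_hat for x in data)
print(gradient)  # zero up to floating-point rounding
```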
10: MAXIMUM LIKELIHOOD ESTIMATION – UNKNOWN MEAN AND VARIANCE
• Let $\theta = [\mu, \sigma^2]^t$. For the univariate Gaussian:

$$\ln p(x_k|\theta) = -\frac{1}{2}\ln\!\left[2\pi\sigma^2\right] - \frac{(x_k-\mu)^2}{2\sigma^2}$$

$$\nabla_\theta \ln p(x_k|\theta) = \begin{bmatrix} \dfrac{x_k-\mu}{\sigma^2} \\[2mm] -\dfrac{1}{2\sigma^2} + \dfrac{(x_k-\mu)^2}{2\sigma^4} \end{bmatrix}$$

• The full likelihood leads to:

$$\sum_{k=1}^{n}\frac{x_k-\hat{\mu}}{\hat{\sigma}^2} = 0 \qquad\text{and}\qquad -\sum_{k=1}^{n}\frac{1}{2\hat{\sigma}^2} + \sum_{k=1}^{n}\frac{(x_k-\hat{\mu})^2}{2\hat{\sigma}^4} = 0$$
10: MAXIMUM LIKELIHOOD ESTIMATION – UNKNOWN MEAN AND VARIANCE
• This leads to these equations:

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n}x_k \qquad\qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k-\hat{\mu})^2$$

• In the multivariate case:

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n}x_k \qquad\qquad \hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n}(x_k-\hat{\mu})(x_k-\hat{\mu})^t$$

• The true covariance is the expected value of the matrix $(x_k-\hat{\mu})(x_k-\hat{\mu})^t$, which is a familiar result.
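The multivariate estimates above can be computed directly. A minimal sketch (assumed, not from the slides); note the 1/n factor on the covariance (the biased ML form), not the 1/(n-1) sample covariance:

```python
import numpy as np

def ml_estimates(X):
    """X is an (n, d) array of n i.i.d. samples; returns (mu_hat, Sigma_hat),
    the ML estimates of the mean vector and covariance matrix."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)
    centered = X - mu_hat
    Sigma_hat = (centered.T @ centered) / n   # sum of outer products / n
    return mu_hat, Sigma_hat

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
mu_hat, Sigma_hat = ml_estimates(X)
# Matches numpy's biased estimator, np.cov(X.T, bias=True)
print(mu_hat)
print(Sigma_hat)
```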
10: MAXIMUM LIKELIHOOD ESTIMATION – CONVERGENCE OF THE MEAN
• Does the maximum likelihood estimate of the variance converge to the true value of the variance? Let’s start with a few simple results we will need later.
• Expected value of the ML estimate of the mean:
$$E[\hat{\mu}] = E\!\left[\frac{1}{n}\sum_{i=1}^{n}x_i\right] = \frac{1}{n}\sum_{i=1}^{n}E[x_i] = \frac{1}{n}\sum_{i=1}^{n}\mu = \mu$$

• Variance of the ML estimate of the mean:

$$\mathrm{var}[\hat{\mu}] = E[\hat{\mu}^2] - \left(E[\hat{\mu}]\right)^2$$

$$E[\hat{\mu}^2] = E\!\left[\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right)\!\left(\frac{1}{n}\sum_{j=1}^{n}x_j\right)\right] = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}E[x_i x_j]$$
10: MAXIMUM LIKELIHOOD ESTIMATION – VARIANCE OF ML ESTIMATE OF THE MEAN
• The expected value of $x_i x_j$ will be $\mu^2$ for $i \ne j$, since the two random variables are independent.
• The expected value of $x_i^2$ will be $\mu^2 + \sigma^2$.
• Hence, in the summation above, we have $n^2 - n$ terms with expected value $\mu^2$ and $n$ terms with expected value $\mu^2 + \sigma^2$.
• Thus,

$$\mathrm{var}[\hat{\mu}] = \frac{1}{n^2}\left[(n^2-n)\mu^2 + n\left(\mu^2+\sigma^2\right)\right] - \mu^2 = \frac{\sigma^2}{n}$$
• We see that the variance of the estimate goes to zero as n goes to infinity, so our estimate converges to the true mean (the error goes to zero).
which implies:

$$E[\hat{\mu}^2] = \mathrm{var}[\hat{\mu}] + \left(E[\hat{\mu}]\right)^2 = \frac{\sigma^2}{n} + \mu^2$$

• Similarly, for a single sample: $E[x_i^2] = \mathrm{var}[x_i] + \left(E[x_i]\right)^2 = \sigma^2 + \mu^2$.

• Note that this implies:

$$E\!\left[\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right)^{\!2}\right] = \frac{\sigma^2}{n} + \mu^2$$
• Now we can combine these results. Recall our expression for the ML estimate of the variance:

$$E[\hat{\sigma}^2] = E\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2\right]$$
10: MAXIMUM LIKELIHOOD ESTIMATION – VARIANCE RELATIONSHIPS
• We will need one more result:
10: MAXIMUM LIKELIHOOD ESTIMATION – COVARIANCE EXPANSION
$$E[\hat{\sigma}^2] = E\!\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat{\mu})^2\right] = \frac{1}{n}\sum_{i=1}^{n}E\!\left[x_i^2 - 2x_i\hat{\mu} + \hat{\mu}^2\right]$$

$$= \frac{1}{n}\sum_{i=1}^{n}\left(E[x_i^2] - 2E[x_i\hat{\mu}] + E[\hat{\mu}^2]\right) = \frac{1}{n}\sum_{i=1}^{n}\left(\left(\sigma^2+\mu^2\right) - 2E[x_i\hat{\mu}] + \left(\frac{\sigma^2}{n}+\mu^2\right)\right)$$
• Expand the covariance and simplify:

$$E[x_i\hat{\mu}] = E\!\left[x_i\cdot\frac{1}{n}\sum_{j=1}^{n}x_j\right] = \frac{1}{n}\sum_{j=1}^{n}E[x_i x_j] = \frac{1}{n}\left(E[x_i^2] + \sum_{j\ne i}E[x_i x_j]\right)$$

$$= \frac{1}{n}\left(\left(\sigma^2+\mu^2\right) + (n-1)\mu^2\right) = \frac{\sigma^2}{n} + \mu^2$$
• One more intermediate term to derive:
10: MAXIMUM LIKELIHOOD ESTIMATION – BIASED VARIANCE ESTIMATE
• Substitute our previously derived expression for the second term:

$$E[\hat{\sigma}^2] = \frac{1}{n}\sum_{i=1}^{n}\left(\left(\sigma^2+\mu^2\right) - 2\left(\frac{\sigma^2}{n}+\mu^2\right) + \left(\frac{\sigma^2}{n}+\mu^2\right)\right)$$

$$= \frac{1}{n}\sum_{i=1}^{n}\left(\sigma^2 - \frac{\sigma^2}{n}\right) = \frac{(n-1)}{n}\,\sigma^2$$
• An unbiased estimator is:

$$C = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\hat{\mu})(x_i-\hat{\mu})^t$$

• These are related by:

$$\hat{\Sigma} = \frac{(n-1)}{n}\,C$$

which is asymptotically unbiased. See Burl, AJWills and AWM for excellent examples and explanations of the details of this derivation.
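The biased/unbiased relationship can be verified numerically. A minimal sketch (assumed, not from the slides), using numpy's `ddof` parameter to select the divisor (the data values are illustrative):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

sigma2_ml = np.var(x, ddof=0)        # ML (biased) estimate: divides by n
sigma2_unbiased = np.var(x, ddof=1)  # unbiased estimate: divides by n - 1

# Sigma_hat = ((n - 1) / n) * C, the relation stated above
print(sigma2_ml, sigma2_unbiased, (n - 1) / n * sigma2_unbiased)
```

As n grows, the factor (n-1)/n approaches 1, which is why the ML estimate is asymptotically unbiased.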
10: MAXIMUM LIKELIHOOD ESTIMATION – EXPECTATION SIMPLIFICATION
• Therefore, the ML estimate of the variance is biased:

$$E[\hat{\sigma}^2] = \frac{(n-1)}{n}\,\sigma^2 \ne \sigma^2$$

However, the bias vanishes as $n \to \infty$: the ML estimate is asymptotically unbiased and converges in the mean-square sense.