7/30/2019 Error analysis lecture 8
Physics 509: Introduction to
Parameter Estimation
Scott Oser
Lecture #8
September 30, 2008
Outline
Last time: we reviewed multi-dimensional Gaussians, the Central Limit Theorem, and talked in a general way about Gaussian error ellipses.

Today:
1) Introduction to parameter estimation
2) Some useful basic estimators
3) Error estimates on estimators
4) Maximum likelihood estimators
5) Estimating errors in the ML method
6) Extended ML method
What is an estimator?
Quite simple, really ... an estimator is a procedure you apply to a data set to estimate some property of the parent distribution from which the data is drawn.

This could be a recognizable parameter of a distribution (e.g. the p value of a binomial distribution), or it could be a more general property of the distribution (e.g. the mean of the parent distribution).

The procedure can be anything you do with the data to generate a numerical result. Take an average, take the median value, multiply them all and divide by the GDP of Mongolia ... all of these are estimators. You are free to make up any estimator you care to, and aren't restricted to standard choices. (Whether an estimator you make yourself is a useful estimator or not is a completely separate question!)
Bayesian estimators
You've already seen the Bayesian solution to parameter estimation ... if your data is distributed according to a PDF depending on some parameter a, then Bayes' theorem gives you a formula for the PDF of a:

P(a|D,I) = \frac{P(a|I)\, P(D|a,I)}{\int da\, P(a|I)\, P(D|a,I)} = \frac{P(a|I)\, P(D|a,I)}{P(D|I)}

The PDF P(a|D,I) contains all the information there is to have about the true value of a. You can report it any way you like: preferably by publishing the PDF itself, or else, if you want to report just a single number, you can calculate the most likely value of a, or the mean of its distribution, or whatever you want.

There's no special magic: Bayesian analysis directly converts the observed data into a PDF for any free parameters.
Frequentist estimators
Frequentists have a harder time of it ... they say that the parameters of the parent distribution have some fixed albeit unknown values. It doesn't make sense to talk about the probability of a fixed parameter having some other value; all we can talk about is how likely or unlikely it was that we would observe the data we did, given some value of the parameter. Let's try to come up with estimators that are as close as possible to the true value of the parameter.

Most of what we will be talking about in this lecture is frequentist methodology, although I'll try to relate it to Bayesian language as much as possible.
Desired properties of estimators
What makes a good estimator? Consider some estimator of a:

\hat{a} = \hat{a}(x_1, x_2, \ldots, x_n)

1) Consistent: a consistent estimator will tend to the true value as the amount of data approaches infinity:

\lim_{N \to \infty} \hat{a} = a

2) Unbiased: the expectation value of the estimator is equal to the true value, so its bias b is zero:

b = \langle \hat{a} \rangle - a = \int dx_1 \ldots dx_n\, P(x_1 \ldots x_n | a)\, \hat{a}(x_1 \ldots x_n) - a = 0

3) Efficient: the variance of the estimator is as small as possible (as we'll see, there are limitations on how small it can be):

V(\hat{a}) = \int dx_1 \ldots dx_n\, P(x_1 \ldots x_n | a)\, \left[ \hat{a}(x_1 \ldots x_n) - \langle \hat{a} \rangle \right]^2

It's not always possible to satisfy all three of these requirements. Note how the bias and variance combine:

(\text{mean square error}) = \langle (\hat{a} - a)^2 \rangle = b^2 + V(\hat{a})
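The mean-square-error decomposition is easy to verify numerically. Below is a minimal stdlib-Python sketch (the Gaussian parent, sample size, and trial count are arbitrary illustrative choices), using the biased "1/N" sample variance as the estimator under test:

```python
import random, statistics

random.seed(1)
N = 10           # sample size per experiment
TRIALS = 20000   # number of simulated experiments
TRUE_VAR = 4.0   # variance of the parent Gaussian (sigma = 2)

# Estimator under test: the biased 1/N sample variance.
def var_hat(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

estimates = [var_hat([random.gauss(0.0, 2.0) for _ in range(N)])
             for _ in range(TRIALS)]

bias = statistics.mean(estimates) - TRUE_VAR   # expect about -TRUE_VAR/N = -0.4
variance = statistics.variance(estimates)      # spread of the estimator itself
mse = statistics.mean((e - TRUE_VAR) ** 2 for e in estimates)

# Check the decomposition: MSE = b^2 + V(a-hat)
print(bias, variance, mse, bias ** 2 + variance)
```

With N = 10 the bias is about -TRUE_VAR/N = -0.4, and the decomposition holds to Monte Carlo precision.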
Common estimators
1) Mean of a distribution: the obvious choice is to use the average:

\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i = \bar{x}

Consistent and unbiased if measurements are independent. Not necessarily the most efficient: its variance depends on the distribution under consideration, and is given by

V(\hat{\mu}) = \frac{\sigma^2}{N}

There may be more efficient estimators, especially if the parent distribution has big tails. But in many circumstances the sample mean is the most efficient.
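As an illustration of the "big tails" point, the sketch below (stdlib Python; the sample size and trial count are arbitrary) compares the sample mean and sample median as estimators of the center of a Gaussian parent and of a heavy-tailed Cauchy-like parent:

```python
import random, math

random.seed(2)
N, TRIALS = 25, 2000

def rms(vals):
    return math.sqrt(sum(v * v for v in vals) / len(vals))

def median(xs):
    s = sorted(xs)
    return s[len(s) // 2]   # N is odd, so take the middle element

mean_errs_gauss, med_errs_gauss = [], []
mean_errs_cauchy, med_errs_cauchy = [], []
for _ in range(TRIALS):
    g = [random.gauss(0.0, 1.0) for _ in range(N)]
    # A heavy-tailed (Cauchy) sample, generated as a ratio of Gaussians:
    c = [random.gauss(0.0, 1.0) / random.gauss(0.0, 1.0) for _ in range(N)]
    mean_errs_gauss.append(sum(g) / N)
    med_errs_gauss.append(median(g))
    mean_errs_cauchy.append(sum(c) / N)
    med_errs_cauchy.append(median(c))

# For a Gaussian parent the sample mean is more efficient than the median...
print(rms(mean_errs_gauss), rms(med_errs_gauss))
# ...but for the heavy-tailed parent the median wins decisively.
print(rms(mean_errs_cauchy), rms(med_errs_cauchy))
```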
Estimating the variance
If you know the true mean \mu of a distribution, one useful estimator (consistent and unbiased) of the variance is

\hat{V}(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

What if \mu is also unknown?

A biased estimator:

\hat{V}(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad \langle \hat{V}(x) \rangle = \frac{N-1}{N} V(x)

An unbiased estimator:

\hat{V}(x) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

But its square root is a biased estimator of \sigma!
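A quick numerical check of the 1/N vs. 1/(N-1) bias (stdlib Python; the parameters are illustrative, with a deliberately small N so the bias is obvious):

```python
import random

random.seed(3)
N, TRIALS, SIGMA = 5, 50000, 1.0

biased, unbiased = [], []
for _ in range(TRIALS):
    xs = [random.gauss(0.0, SIGMA) for _ in range(N)]
    xbar = sum(xs) / N
    ss = sum((x - xbar) ** 2 for x in xs)
    biased.append(ss / N)          # 1/N estimator: expectation (N-1)/N * sigma^2
    unbiased.append(ss / (N - 1))  # 1/(N-1) estimator: expectation sigma^2

print(sum(biased) / TRIALS)    # about 0.8 for N = 5, sigma^2 = 1
print(sum(unbiased) / TRIALS)  # about 1.0
```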
Estimating the standard deviation
The square root of an estimate of the variance is the obvious thing to use as an estimate of the standard deviation:

\hat{V}(x) = s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

We can use s as our estimator for \sigma. It will generally be biased; we don't worry a lot about this because we're more interested in having an unbiased estimate of \sigma^2.

For samples from a Gaussian distribution, the RMS on our estimate for \sigma is given by

\sigma_s = \frac{\sigma}{\sqrt{2(N-1)}}

Think of this as the error estimate on our error bar.
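A stdlib-Python sketch checking both claims by simulation (the choices N = 10, sigma = 2 are illustrative): the mean of s falls below sigma (the bias), and the observed RMS of s matches sigma / sqrt(2(N-1)) to a few percent:

```python
import random, math

random.seed(4)
N, TRIALS, SIGMA = 10, 20000, 2.0

s_values = []
for _ in range(TRIALS):
    xs = [random.gauss(0.0, SIGMA) for _ in range(N)]
    xbar = sum(xs) / N
    s_values.append(math.sqrt(sum((x - xbar) ** 2 for x in xs) / (N - 1)))

mean_s = sum(s_values) / TRIALS
rms_s = math.sqrt(sum((s - mean_s) ** 2 for s in s_values) / TRIALS)
predicted = SIGMA / math.sqrt(2 * (N - 1))  # the Gaussian formula above

print(mean_s)            # slightly below sigma = 2: s is a biased estimator
print(rms_s, predicted)  # the "error on the error bar"
```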
Likelihood function and the minimum variance bound
Likelihood function: the probability of the data given the parameters,

L(x_1 \ldots x_n | a) = \prod_i P(x_i | a)

(The likelihood is actually one of the factors in the numerator of Bayes' theorem.)

A remarkable result: for any unbiased estimator of a, the variance of the estimator satisfies

V(\hat{a}) \ge \frac{1}{\left\langle -\dfrac{d^2 \ln L}{da^2} \right\rangle}

If the estimator is biased with bias b, then this becomes

V(\hat{a}) \ge \frac{\left( 1 + \dfrac{db}{da} \right)^2}{\left\langle -\dfrac{d^2 \ln L}{da^2} \right\rangle}
The minimum variance bound
The minimum variance bound is a remarkable result: it says that there is some best-case estimator which, when averaged over thousands of experiments, will give parameter estimates closer to the true value, as measured by the RMS error, than any other estimator.

An estimator that achieves the minimum variance bound is maximally efficient.

Information theory is the science of how much information is encoded in a data set. The MVB comes out of this science.
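As a concrete check of the bound (a sketch, not from the slides): for a Gaussian with known sigma, -d^2 ln L / d mu^2 = N / sigma^2 independent of the data, so the MVB for the mean is sigma^2 / N, and the sample mean saturates it. Stdlib Python, with arbitrary illustrative parameters:

```python
import random

random.seed(5)
N, TRIALS, SIGMA = 20, 20000, 3.0

# For a Gaussian with known sigma, -d^2 ln L / d mu^2 = N / sigma^2
# for every data set, so the minimum variance bound is sigma^2 / N.
mvb = SIGMA ** 2 / N

means = []
for _ in range(TRIALS):
    xs = [random.gauss(1.0, SIGMA) for _ in range(N)]
    means.append(sum(xs) / N)

avg = sum(means) / TRIALS
var_of_mean = sum((m - avg) ** 2 for m in means) / TRIALS
print(var_of_mean, mvb)   # the sample mean saturates the bound
```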
Maximum likelihood estimators
By far the most useful estimator is the maximum likelihood method. Given your data set x_1 ... x_N and a set of unknown parameters \alpha, calculate the likelihood function

L(x_1 \ldots x_N | \alpha) = \prod_{i=1}^{N} P(x_i | \alpha)

It's more common (and easier) to calculate \ln L instead:

\ln L(x_1 \ldots x_N | \alpha) = \sum_{i=1}^{N} \ln P(x_i | \alpha)

The maximum likelihood estimator \hat{\alpha} is that value of \alpha which maximizes L as a function of \alpha. It can be found by minimizing -\ln L over the unknown parameters.
Simple example of an ML estimator
Suppose that our data sample is drawn from two different distributions. We know the shapes of the two distributions, but not what fraction of our population comes from distribution A vs. B. We have 20 random measurements of X from the population.

P_A(x) = \frac{2}{1 - e^{-2}}\, e^{-2x} \qquad P_B(x) = 3x^2 \qquad (0 \le x \le 1)

P_{tot}(x) = f\, P_A(x) + (1 - f)\, P_B(x)
Form for the log likelihood and the ML estimator

Suppose that our data sample is drawn from two different distributions. We know the shapes of the two distributions, but not what fraction of our population comes from distribution A vs. B. We have 20 random measurements of X from the population.

P_{tot}(x) = f\, P_A(x) + (1 - f)\, P_B(x)

Form the negative log likelihood:

-\ln L(f) = -\sum_{i=1}^{N} \ln P_{tot}(x_i | f)

Minimize -\ln L with respect to f. Sometimes you can solve this analytically by setting the derivative equal to zero. More often you have to do it numerically.
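The fit can be sketched in a few lines of stdlib Python. This is not the slides' actual data set: the data here are regenerated from the same mixture model with an assumed true f = 0.3 and N = 20, and the minimization is a simple grid scan rather than a numerical optimizer:

```python
import random, math

random.seed(6)
N, F_TRUE = 20, 0.3
NORM = 1.0 - math.exp(-2.0)

def pdf_a(x):           # P_A(x) = 2 e^{-2x} / (1 - e^{-2}),  0 <= x <= 1
    return 2.0 * math.exp(-2.0 * x) / NORM

def pdf_b(x):           # P_B(x) = 3 x^2,  0 <= x <= 1
    return 3.0 * x * x

def draw():
    u = random.random()
    if random.random() < F_TRUE:
        return -0.5 * math.log(1.0 - u * NORM)  # invert the CDF of P_A
    return u ** (1.0 / 3.0)                     # invert the CDF of P_B

data = [draw() for _ in range(N)]

def nll(f):
    return -sum(math.log(f * pdf_a(x) + (1.0 - f) * pdf_b(x)) for x in data)

# Minimize -ln L by a fine grid scan over the physical range 0 <= f <= 1.
f_hat = min((i / 1000.0 for i in range(1001)), key=nll)
print(f_hat)
```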
Graph of the log likelihood

[Figure: -\ln L vs. f] The graph shows the shape of the negative log likelihood function vs. the unknown parameter f.

The minimum is at f = 0.415. This is the ML estimate.

As we'll see, the 1\sigma error range is defined by \Delta \ln L = 0.5 above the minimum.

The data set was actually drawn from a distribution with a true value of f = 0.3.
Properties of ML estimators
Besides its intrinsic intuitiveness, the ML method has some nice (and some not-so-nice) properties:

1) The ML estimator is usually consistent.

2) ML estimators are usually biased, although if also consistent then the bias approaches zero as N goes to infinity.

3) Estimators are invariant under parameter transformations:

\widehat{f(a)} = f(\hat{a})

4) In the asymptotic limit, the estimator is efficient. The Central Limit Theorem kicks in on the sum of the terms in the log likelihood, making it Gaussian:

\sigma_{\hat{a}}^2 = \frac{1}{\left\langle -\dfrac{d^2 \ln L}{da^2} \right\rangle \Big|_{a_0}}
Relation to Bayesian approach
There is a close relation between the ML method and theBayesian approach.
The Bayesian posterior PDF for the parameter is the product ofthe likelihood function P(D|a,I) and the prior P(a|I).
So the ML estimator is actually the peak location for the Bayesianposterior PDF assuming a flat prior P(a|I)=1.
The log likelihood is related to the Bayesian PDF by:
P(a|D,I) \propto \exp[\, \ln L(a)\, ]

This way of viewing the log likelihood as the logarithm of a Bayesian PDF with uniform prior is an excellent way to intuitively understand many features of the ML method.
Errors on ML estimators
In the limit of large N, the log likelihood becomes parabolic (by the CLT). Comparing to \ln L for a simple Gaussian,

\ln L = \ln L_{max} - \frac{1}{2} \left( \frac{a - \hat{a}}{\sigma_{\hat{a}}} \right)^2

it is natural to identify the 1\sigma range on the parameter by the points at which \Delta \ln L = \frac{1}{2}.

2\sigma range: \Delta \ln L = \frac{1}{2}(2)^2 = 2
3\sigma range: \Delta \ln L = \frac{1}{2}(3)^2 = 4.5

This is done even when the likelihood isn't parabolic (although at some peril).
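The Delta ln L = 1/2 prescription is easy to carry out by scanning outward from the minimum. Below is a stdlib-Python sketch using a hypothetical exponential-lifetime model (not the slides' mixture example), chosen because its ML estimate is analytic; the skewed likelihood produces visibly asymmetric errors:

```python
import random, math

random.seed(7)
N, TAU_TRUE = 50, 2.0
data = [random.expovariate(1.0 / TAU_TRUE) for _ in range(N)]
S = sum(data)

def nll(tau):
    # -ln L for the exponential PDF (1/tau) e^{-x/tau}
    return N * math.log(tau) + S / tau

tau_hat = S / N              # analytic ML estimate for this model
target = nll(tau_hat) + 0.5  # Delta ln L = 1/2 marks the 1-sigma points

# Walk outward from the minimum until -ln L crosses the target.
step = 0.001
lo = tau_hat
while nll(lo - step) < target:
    lo -= step
hi = tau_hat
while nll(hi + step) < target:
    hi += step

print(tau_hat, tau_hat - lo, hi - tau_hat)  # asymmetric -/+ errors
```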
Parabolicity of the log likelihood
In general the log likelihood becomes more parabolic as N gets larger. The graphs at the right show the negative log likelihoods for our example problem for N=20 and N=500. The red curves are parabolic fits around the minimum.

How large does N have to be before the parabolic approximation is good? That depends on the problem: try graphing -\ln L vs. your parameter to see how parabolic it is.
Asymmetric errors from ML estimators

Even when the log likelihood is not Gaussian, it's nearly universal to define the 1\sigma range by \Delta \ln L = \frac{1}{2}. This can result in asymmetric error bars, such as:

f = 0.41^{+0.17}_{-0.15}

The justification often given for this is that one could always reparameterize the estimated quantity into one which does have a parabolic likelihood. Since ML estimators are supposed to be invariant under reparameterizations, you could then transform back to get asymmetric errors.

Does this procedure actually work?
Coverage of ML estimator errors
What do we really want the ML error bars to mean? Ideally, the 1\sigma range would mean that the true value has a 68% chance of being within that range.

Fraction of time the 1\sigma range includes the true value:

  N     fraction
  5     56.7%
  10    64.8%
  20    68.0%
  500   67.0%

[Figure: distribution of the ML estimator for two values of N]
Errors on ML estimators
Simulation is the best way to estimate the true error range on an ML estimator: assume a true value for the parameter, simulate a few hundred experiments, then calculate ML estimates for each.

N=20: range from likelihood function: -0.16 / +0.17; RMS of simulation: 0.16
N=500: range from likelihood function: -0.030 / +0.035; RMS of simulation: 0.030
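The simulation procedure can be sketched as follows (stdlib Python; 300 toy experiments of the N = 20 mixture problem with an assumed true f = 0.3, and a grid-scan ML fit for each toy):

```python
import random, math

random.seed(8)
N, F_TRUE, EXPTS = 20, 0.3, 300
NORM = 1.0 - math.exp(-2.0)

def pdf_tot(x, f):
    pa = 2.0 * math.exp(-2.0 * x) / NORM   # P_A(x) on [0, 1]
    pb = 3.0 * x * x                        # P_B(x) on [0, 1]
    return f * pa + (1.0 - f) * pb

def simulate():        # one toy experiment of N events
    out = []
    for _ in range(N):
        u = random.random()
        if random.random() < F_TRUE:
            out.append(-0.5 * math.log(1.0 - u * NORM))  # from P_A
        else:
            out.append(u ** (1.0 / 3.0))                 # from P_B
    return out

GRID = [i / 200.0 for i in range(201)]

def ml_fit(data):      # grid-scan ML estimate of f
    return min(GRID, key=lambda f: -sum(math.log(pdf_tot(x, f)) for x in data))

fits = [ml_fit(simulate()) for _ in range(EXPTS)]
mean_f = sum(fits) / EXPTS
rms = math.sqrt(sum((f - mean_f) ** 2 for f in fits) / EXPTS)
print(mean_f, rms)     # the slide quotes an RMS of about 0.16 for N = 20
```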
Likelihood functions of multiple parameters
Often there is more than one free parameter. To handle this, we simply minimize the negative log likelihood over all free parameters:

\frac{\partial \ln L(x_1 \ldots x_N | a_1 \ldots a_m)}{\partial a_j} = 0

Errors are determined by (in the Gaussian approximation):

\left( \text{cov}^{-1} \right)_{ij} = -\frac{\partial^2 \ln L}{\partial a_i\, \partial a_j} \qquad \text{evaluated at the minimum}
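A sketch of this Hessian-inversion recipe for a hypothetical two-parameter Gaussian fit (mu and sigma), where the expected covariance is diagonal with entries sigma-hat^2/N and sigma-hat^2/(2N); the second derivatives are computed by central finite differences (stdlib Python):

```python
import random, math

random.seed(9)
N, MU, SIGMA = 200, 5.0, 2.0
data = [random.gauss(MU, SIGMA) for _ in range(N)]

def neg_lnL(mu, sigma):
    # -ln L for N Gaussian measurements (dropping the constant term)
    return sum(math.log(sigma) + 0.5 * ((x - mu) / sigma) ** 2 for x in data)

# The ML estimates for a Gaussian are analytic:
mu_hat = sum(data) / N
sig_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / N)
p = [mu_hat, sig_hat]

h = 1e-3
def second_deriv(i, j):
    # central-difference estimate of d^2(-ln L) / dp_i dp_j at the minimum
    def f(di, dj):
        q = list(p)
        q[i] += di * h
        q[j] += dj * h
        return neg_lnL(q[0], q[1])
    if i == j:
        return (f(1, 0) - 2 * f(0, 0) + f(-1, 0)) / h ** 2
    return (f(1, 1) - f(1, -1) - f(-1, 1) + f(-1, -1)) / (4 * h ** 2)

H = [[second_deriv(i, j) for j in range(2)] for i in range(2)]

# Invert the 2x2 Hessian to get the covariance matrix.
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
cov = [[H[1][1] / det, -H[0][1] / det],
       [-H[1][0] / det, H[0][0] / det]]

print(cov[0][0], sig_hat ** 2 / N)        # variance of mu-hat
print(cov[1][1], sig_hat ** 2 / (2 * N))  # variance of sigma-hat
```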
Error contours for multiple parameters
We can also find the errors on parameters by drawing contours on \ln L.

1\sigma range on a single parameter a: the smallest and largest values of a that give \Delta \ln L = \frac{1}{2}, minimizing \ln L over all other parameters.

But to get joint error contours for m parameters, you must use different values of \Delta \ln L (see Num Rec Sec 15.6):

  confidence   m=1    m=2    m=3
  68.3%        0.5    1.15   1.77
  90.0%        1.36   2.31   3.13
  95.4%        2.0    3.09   4.01
  99.0%        3.32   4.61   5.65
Extended maximum likelihood estimators
Sometimes the number of observed events is not fixed, but also contains information about the unknown parameters. For example, maybe we want to fit for the rate. For this purpose we can use the extended maximum likelihood method.

Normal ML method:

\int P(x|a)\, dx = 1

Extended ML method:

\int Q(x|a)\, dx = \nu = \text{predicted number of events}
Extended maximum likelihood estimators
\text{Likelihood} = \frac{e^{-\nu}\, \nu^N}{N!} \prod_{i=1}^{N} P(x_i|a) \qquad [\text{note that } \langle N \rangle = \nu]

\ln L = \sum_{i=1}^{N} \ln[\, \nu\, P(x_i|a)\, ] - \nu \qquad (\text{dropping the constant } \ln N!)

The argument of the logarithm is the number density of events predicted at x_i. The second term (outside the summation sign) is the total predicted number of events.
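A minimal stdlib-Python sketch of the extended ML idea for a single rate parameter nu with the event shape held fixed, so that -ln L reduces to nu - N ln(nu) plus constants and the minimum sits at nu-hat = N (the true rate chosen here is an arbitrary assumption):

```python
import random, math

random.seed(10)
NU_TRUE = 40.0

def poisson(lam):
    # stdlib-only Poisson sampler (Knuth's multiplication method)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

N_obs = poisson(NU_TRUE)   # the observed event count fluctuates

# With the event-position shape known and fixed, the extended
# -ln L reduces to  nu - N ln(nu)  plus constants.
def nll(nu):
    return nu - N_obs * math.log(nu)

nu_hat = min((0.1 * i for i in range(1, 1000)), key=nll)
print(N_obs, nu_hat)
```

Here nu-hat = N_obs follows from d(-ln L)/d nu = 1 - N/nu = 0.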
Example of the extended maximum likelihood in action: SNO flux fits

P(E, R, \theta, \beta) = N_{CC}\, P_{CC}(E, R, \theta, \beta) + N_{ES}\, P_{ES}(E, R, \theta, \beta) + N_{NC}\, P_{NC}(E, R, \theta, \beta)

Fit for the numbers of CC, ES, and NC events.

Careful: because every event must be either CC, ES, or NC, the three event totals are anti-correlated with each other.