Error analysis lecture 8


Physics 509: Introduction to Parameter Estimation

Scott Oser
Lecture #8

    September 30, 2008


    Outline

Last time: we reviewed multi-dimensional Gaussians, the Central Limit Theorem, and talked in a general way about Gaussian error ellipses.

    Today:

1) Introduction to parameter estimation
2) Some useful basic estimators
3) Error estimates on estimators
4) Maximum likelihood estimators
5) Estimating errors in the ML method
6) Extended ML method


    What is an estimator?

Quite simple, really ... an estimator is a procedure you apply to a data set to estimate some property of the parent distribution from which the data is drawn.

This could be a recognizable parameter of a distribution (e.g. the p value of a binomial distribution), or it could be a more general property of the distribution (e.g. the mean of the parent distribution).

The procedure can be anything you do with the data to generate a numerical result. Take an average, take the median value, multiply them all and divide by the GDP of Mongolia ... all of these are estimators. You are free to make up any estimator you care to, and aren't restricted to standard choices. (Whether an estimator you make yourself is a useful estimator or not is a completely separate question!)


    Bayesian estimators

You've already seen the Bayesian solution to parameter estimation ... if your data are distributed according to a PDF depending on some parameter a, then Bayes' theorem gives you a formula for the PDF of a:

P(a|D,I) = \frac{P(a|I)\,P(D|a,I)}{\int da\, P(a|I)\,P(D|a,I)} = \frac{P(a|I)\,P(D|a,I)}{P(D|I)}

The PDF P(a|D,I) contains all the information there is to have about the true value of a. You can report it any way you like---preferably by publishing the PDF itself, or else if you want to report just a single number you can calculate the most likely value of a, or the mean of its distribution, or whatever you want.

There's no special magic: Bayesian analysis directly converts the observed data into a PDF for any free parameters.
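As a concrete illustration of "data in, PDF out" (my own sketch, not from the slides), here is a minimal Python example that builds the posterior on a grid for an assumed binomial problem: k successes in n trials with a flat prior. The data values and variable names are made up for illustration.

    # Minimal sketch (assumed example): posterior for a binomial success
    # probability p, given k successes in n trials, with a flat prior.
    import numpy as np

    k, n = 7, 20                            # assumed observed data
    p = np.linspace(0.0, 1.0, 1001)         # grid of parameter values
    likelihood = p**k * (1 - p)**(n - k)    # P(D|p,I), up to a constant
    prior = np.ones_like(p)                 # flat prior P(p|I) = 1
    posterior = likelihood * prior
    posterior /= np.trapz(posterior, p)     # normalize: divide by P(D|I)

    p_mode = p[np.argmax(posterior)]        # most probable value
    p_mean = np.trapz(p * posterior, p)     # posterior mean
    print(f"mode = {p_mode:.3f}, mean = {p_mean:.3f}")

Either summary (mode or mean) is a legitimate single-number report; the full posterior array is the complete answer.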


    Frequentist estimators

Frequentists have a harder time of it ... they say that the parameters of the parent distribution have some fixed albeit unknown values. It doesn't make sense to talk about the probability of a fixed parameter having some other value---all we can talk about is how likely or unlikely it was that we would observe the data we did given some value of the parameter. Let's try to come up with estimators that are as close as possible to the true value of the parameter.

Most of what we will be talking about in this lecture is frequentist methodology, although I'll try to relate it to Bayesian language as much as possible.


    Desired properties of estimators

What makes a good estimator? Consider some estimator \hat{a} = \hat{a}(x_1, x_2, \ldots, x_N), a function of the data. We would like it to be:

1) Consistent: a consistent estimator will tend to the true value as the amount of data approaches infinity:

\lim_{N\to\infty} \hat{a} = a

2) Unbiased: the expectation value of the estimator is equal to the true value, so its bias b is zero:

b = \langle \hat{a} \rangle - a = \int dx_1 \cdots dx_N\, P(x_1 \ldots x_N | a)\, \hat{a}(x_1 \ldots x_N) - a = 0

3) Efficient: the variance of the estimator is as small as possible (as we'll see, there are limitations on how small it can be):

V(\hat{a}) = \int dx_1 \cdots dx_N\, P(x_1 \ldots x_N | a)\, \left[ \hat{a}(x_1 \ldots x_N) - \langle \hat{a} \rangle \right]^2

It's not always possible to satisfy all three of these requirements. Bias and variance combine in the mean square error:

\text{mean square error} = \langle (\hat{a} - a)^2 \rangle = b^2 + V(\hat{a})


    Common estimators

1) Mean of a distribution---the obvious choice is to use the average:

\hat{\mu} = \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i

Consistent and unbiased if the measurements are independent. Not necessarily the most efficient---its variance depends on the distribution under consideration, and is given by

V(\bar{x}) = \frac{\sigma^2}{N}

There may be more efficient estimators, especially if the parent distribution has big tails. But in many circumstances the sample mean is the most efficient.
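A quick numerical check of these two statements (my own sketch, not from the slides), using an assumed exponential parent distribution with mean 1 and hence σ = 1:

    # Monte Carlo check that the sample mean is unbiased with variance sigma^2/N.
    import numpy as np

    rng = np.random.default_rng(1)
    N, n_experiments = 20, 100_000
    # Assumed parent distribution: exponential with mean 1 (so sigma = 1).
    data = rng.exponential(scale=1.0, size=(n_experiments, N))

    means = data.mean(axis=1)                       # one sample mean per experiment
    print("average of sample means:", means.mean())  # ~1.0  (unbiased)
    print("variance of sample means:", means.var())  # ~sigma^2/N = 0.05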


    Estimating the variance

If you know the true mean \mu of a distribution, one useful estimator (consistent and unbiased) of the variance is

\hat{V}(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

What if \mu is also unknown?

A biased estimator:

\hat{V}(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad \langle \hat{V}(x) \rangle = \frac{N-1}{N}\, V(x)

An unbiased estimator:

\hat{V}(x) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

But its square root is a biased estimator of \sigma!
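A short Monte Carlo check of these bias statements (my own sketch, with an assumed Gaussian parent of true variance 4):

    # Compare the 1/N and 1/(N-1) variance estimators over many pseudo-experiments.
    import numpy as np

    rng = np.random.default_rng(2)
    N, n_experiments, sigma = 5, 200_000, 2.0
    x = rng.normal(0.0, sigma, size=(n_experiments, N))

    v_biased = x.var(axis=1, ddof=0)      # (1/N) * sum (x_i - xbar)^2
    v_unbiased = x.var(axis=1, ddof=1)    # (1/(N-1)) * sum (x_i - xbar)^2

    print("true variance:            ", sigma**2)                     # 4.0
    print("mean of 1/N estimator:    ", v_biased.mean())              # ~ (N-1)/N * 4 = 3.2
    print("mean of 1/(N-1) estimator:", v_unbiased.mean())            # ~ 4.0
    print("mean of sqrt(unbiased):   ", np.sqrt(v_unbiased).mean())   # < 2.0: biased for sigma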


    Estimating the standard deviation

The square root of an estimate of the variance is the obvious thing to use as an estimate of the standard deviation:

\hat{V}(x) = s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

We can use s as our estimator for \sigma. It will generally be biased---we don't worry a lot about this because we're more interested in having an unbiased estimate of s^2.

For samples from a Gaussian distribution, the RMS on our estimate for \sigma is given by

\sigma_s = \frac{\sigma}{\sqrt{2(N-1)}}

Think of this as the error estimate on our error bar.


    Likelihood function and the minimum variance bound

Likelihood function: the probability of the data given the parameters:

L(x_1 \ldots x_N | a) = \prod_{i=1}^{N} P(x_i | a)

(The likelihood is actually one of the factors in the numerator of Bayes' theorem.)

A remarkable result---for any unbiased estimator of a, the variance of the estimator satisfies:

V(\hat{a}) \ge \frac{1}{\left\langle -\dfrac{d^2 \ln L}{da^2} \right\rangle}

If the estimator is biased with bias b, then this becomes:

V(\hat{a}) \ge \frac{\left(1 + \dfrac{db}{da}\right)^2}{\left\langle -\dfrac{d^2 \ln L}{da^2} \right\rangle}
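As a quick worked check of the bound (my own example, not from the slides), take N independent Gaussian measurements x_i with known \sigma and unknown mean \mu:

\ln L(\mu) = -\sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2} + \text{const}
\qquad\Rightarrow\qquad
-\frac{d^2 \ln L}{d\mu^2} = \frac{N}{\sigma^2}
\qquad\Rightarrow\qquad
V(\hat{\mu}) \ge \frac{\sigma^2}{N}

The sample mean has variance exactly \sigma^2/N, so in this case the obvious estimator saturates the bound.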


    The minimum variance bound

The minimum variance bound is a remarkable result---it says that there is some best-case estimator which, when averaged over thousands of experiments, will give parameter estimates closer to the true value, as measured by the RMS error, than any other.

An estimator that achieves the minimum variance bound is maximally efficient.

Information theory is the science of how much information is encoded in a data set. The MVB comes out of this science.


    Maximum likelihood estimators

By far the most useful estimator is the maximum likelihood method. Given your data set x_1 \ldots x_N and a set of unknown parameters a, calculate the likelihood function

L(x_1 \ldots x_N | a) = \prod_{i=1}^{N} P(x_i | a)

It's more common (and easier) to calculate -\ln L instead:

\ln L(x_1 \ldots x_N | a) = \sum_{i=1}^{N} \ln P(x_i | a)

The maximum likelihood estimator is that value of a which maximizes L as a function of a. It can be found by minimizing -\ln L over the unknown parameters.


    Simple example of an ML estimator

Suppose that our data sample is drawn from two different distributions. We know the shapes of the two distributions, but not what fraction of our population comes from distribution A vs. B. We have 20 random measurements of X from the population.

The two component PDFs (both normalized on 0 \le x \le 1) are

P_A(x) = \frac{2}{1 - e^{-2}}\, e^{-2x}, \qquad P_B(x) = 3x^2

and the total PDF is

P_{tot}(x) = f\, P_A(x) + (1 - f)\, P_B(x)


Form for the log likelihood and the ML estimator


Form the negative log likelihood:

-\ln L(f) = -\sum_{i=1}^{N} \ln P_{tot}(x_i | f)

Minimize -\ln L with respect to f. Sometimes you can solve this analytically by setting the derivative equal to zero. More often you have to do it numerically.
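Here is a minimal numerical sketch of this fit (my own implementation, not the lecture's code). It assumes the PDFs above live on 0 ≤ x ≤ 1 and uses an assumed true fraction of f = 0.3, as in the lecture's example; the helper names are mine.

    # Sketch of the ML fit for the mixture fraction f (assumes x is on [0, 1]).
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(3)

    def sample(n, f_true=0.3):
        """Draw n events from f*P_A + (1-f)*P_B by picking a component per event."""
        from_A = rng.random(n) < f_true
        u = rng.random(n)
        # P_A: truncated exponential on [0,1], sampled by inverting its CDF.
        x_A = -0.5 * np.log(1.0 - u * (1.0 - np.exp(-2.0)))
        # P_B(x) = 3x^2  ->  CDF = x^3, so x = u^(1/3).
        x_B = rng.random(n) ** (1.0 / 3.0)
        return np.where(from_A, x_A, x_B)

    def neg_log_L(f, x):
        p_A = 2.0 * np.exp(-2.0 * x) / (1.0 - np.exp(-2.0))
        p_B = 3.0 * x**2
        return -np.sum(np.log(f * p_A + (1.0 - f) * p_B))

    x = sample(20)
    result = minimize_scalar(neg_log_L, bounds=(0.0, 1.0), args=(x,), method="bounded")
    print("ML estimate of f:", result.x)

With only 20 events the estimate will scatter quite a bit around the true value from one pseudo-experiment to the next, which is exactly the point of the next few slides.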


Graph of the log likelihood

The graph to the left shows the shape of the negative log likelihood function vs. the unknown parameter f.

The minimum is at f = 0.415. This is the ML estimate.

As we'll see, the 1\sigma error range is defined by \Delta\ln(L) = 0.5 above the minimum.

The data set was actually drawn from a distribution with a true value of f = 0.3.


    Properties of ML estimators

Besides its intrinsic intuitiveness, the ML method has some nice (and some not-so-nice) properties:

1) The ML estimator is usually consistent.

2) ML estimators are usually biased, although if also consistent then the bias approaches zero as N goes to infinity.

3) Estimators are invariant under parameter transformations:

\widehat{f(a)} = f(\hat{a})

4) In the asymptotic limit, the estimator is efficient. The Central Limit Theorem kicks in for the sum of the terms in the log likelihood, making it Gaussian:

\sigma_{\hat{a}}^2 = \left( -\frac{d^2 \ln L}{da^2} \bigg|_{a_0} \right)^{-1}


    Relation to Bayesian approach

There is a close relation between the ML method and the Bayesian approach.

The Bayesian posterior PDF for the parameter is the product of the likelihood function P(D|a,I) and the prior P(a|I).

So the ML estimator is actually the peak location for the Bayesian posterior PDF assuming a flat prior P(a|I) = 1.

The log likelihood is related to the Bayesian PDF (for a flat prior) by:

P(a|D,I) \propto \exp[\,\ln L(a)\,]

This way of viewing the log likelihood as the logarithm of a Bayesian PDF with uniform prior is an excellent way to intuitively understand many features of the ML method.


    Errors on ML estimators

In the limit of large N, the log likelihood becomes parabolic (by the CLT). Comparing to \ln L for a simple Gaussian:

\ln L(f) = \ln L_{\max} - \frac{1}{2} \left( \frac{f - \hat{f}}{\sigma_f} \right)^2

it is natural to identify the 1\sigma range on the parameter by the points at which \Delta\ln(L) = 1/2.

2\sigma range: \Delta\ln(L) = \frac{1}{2}(2)^2 = 2
3\sigma range: \Delta\ln(L) = \frac{1}{2}(3)^2 = 4.5

This is done even when the likelihood isn't parabolic (although at some peril).
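A minimal sketch of reading off an asymmetric 1σ interval this way (my own code, continuing the mixture example; x is the array of 20 measurements and neg_log_L the function from the earlier sketch, repeated here so the block stands alone):

    # Scan -ln L over f and read off the 1-sigma range from delta(-ln L) = 0.5.
    import numpy as np

    def neg_log_L(f, x):
        p_A = 2.0 * np.exp(-2.0 * x) / (1.0 - np.exp(-2.0))
        p_B = 3.0 * x**2
        return -np.sum(np.log(f * p_A + (1.0 - f) * p_B))

    def one_sigma_interval(x):
        f_grid = np.linspace(0.001, 0.999, 2000)
        nll = np.array([neg_log_L(f, x) for f in f_grid])
        f_hat = f_grid[np.argmin(nll)]
        inside = f_grid[nll <= nll.min() + 0.5]   # points with delta(-ln L) <= 0.5
        return f_hat, f_hat - inside.min(), inside.max() - f_hat

    # f_hat, err_minus, err_plus = one_sigma_interval(x)

The lower and upper distances from the minimum need not be equal, which is where asymmetric error bars come from.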


    Parabolicity of the log likelihood

In general the log likelihood becomes more parabolic as N gets larger. The graphs at the right show the negative log likelihoods for our example problem for N=20 and N=500. The red curves are parabolic fits around the minimum.

How large does N have to be before the parabolic approximation is good? That depends on the problem---try graphing -ln(L) vs. your parameter to see how parabolic it is.


Asymmetric errors from ML estimators

Even when the log likelihood is not Gaussian, it's nearly universal to define the 1\sigma range by \Delta\ln(L) = 1/2. This can result in asymmetric error bars, such as:

f = 0.41^{+0.17}_{-0.15}

The justification often given for this is that one could always reparameterize the estimated quantity into one which does have a parabolic likelihood. Since ML estimators are supposed to be invariant under reparameterizations, you could then transform back to get asymmetric errors.

Does this procedure actually work?


    Coverage of ML estimator errors

What do we really want the ML error bars to mean? Ideally, the 1\sigma range would mean that the true value has a 68% chance of being within that range.

Fraction of the time the 1\sigma range includes the true value:

N      coverage
5      56.7%
10     64.8%
20     68.0%
500    67.0%

[Figure: distribution of ML estimators for two N values]


    Errors on ML estimators

Simulation is the best way to estimate the true error range on an ML estimator: assume a true value for the parameter, simulate a few hundred experiments, then calculate ML estimates for each.

N=20: range from likelihood function: -0.16 / +0.17; RMS of simulation: 0.16

N=500: range from likelihood function: -0.030 / +0.035; RMS of simulation: 0.030
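A minimal sketch of this kind of toy Monte Carlo study (my own code; it reuses the sample() and one_sigma_interval() helpers defined in the earlier sketches, and the numbers it produces are illustrative rather than the slide's):

    # Toy Monte Carlo: simulate many experiments at a fixed true f,
    # fit each one, and look at the spread of the ML estimates.
    import numpy as np

    def toy_study(n_events, n_toys=300, f_true=0.3):
        estimates = []
        for _ in range(n_toys):
            x = sample(n_events, f_true)          # helper from the earlier sketch
            f_hat, _, _ = one_sigma_interval(x)   # ML estimate via the grid scan
            estimates.append(f_hat)
        estimates = np.array(estimates)
        return estimates.mean(), estimates.std()  # bias check and RMS error

    # mean_f, rms_f = toy_study(20)   # compare rms_f to the likelihood-based range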


    Likelihood functions of multiple parameters

Often there is more than one free parameter. To handle this, we simply minimize the negative log likelihood over all free parameters:

\frac{\partial \ln L(x_1 \ldots x_N | a_1 \ldots a_m)}{\partial a_j} = 0

Errors are determined by (in the Gaussian approximation):

\left(\mathrm{cov}^{-1}\right)_{ij} = -\left. \frac{\partial^2 \ln L}{\partial a_i\, \partial a_j} \right|_{\text{minimum}}
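A sketch of putting this formula to work numerically (my own example, not the lecture's): fit the mean and width of assumed Gaussian data, then invert a finite-difference Hessian of -ln L at the minimum to get the covariance matrix.

    # Covariance of ML estimates from the numerical Hessian of -ln L.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(4)
    data = rng.normal(5.0, 2.0, size=200)      # assumed data set

    def nll(params):
        mu, sigma = params
        # -ln L for Gaussian data (dropping constants).
        return np.sum(0.5 * ((data - mu) / sigma) ** 2 + np.log(sigma))

    fit = minimize(nll, x0=[data.mean(), data.std()], method="Nelder-Mead")

    def hessian(f, p, eps=1e-4):
        """Symmetric finite-difference Hessian of f at point p."""
        p = np.asarray(p, dtype=float)
        n = len(p)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                pp = p.copy(); pp[i] += eps; pp[j] += eps
                pm = p.copy(); pm[i] += eps; pm[j] -= eps
                mp = p.copy(); mp[i] -= eps; mp[j] += eps
                mm = p.copy(); mm[i] -= eps; mm[j] -= eps
                H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4 * eps**2)
        return H

    cov = np.linalg.inv(hessian(nll, fit.x))   # cov^-1 = Hessian of -ln L at minimum
    print("mu, sigma =", fit.x)
    print("errors    =", np.sqrt(np.diag(cov)))

In practice a fitting package (e.g. MINUIT) does this Hessian inversion for you, but the underlying formula is exactly the one above.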


    Error contours for multiple parameters

We can also find the errors on parameters by drawing contours on \Delta\ln L.

1\sigma range on a single parameter a: the smallest and largest values of a that give \Delta\ln L = 1/2, minimizing \ln L over all other parameters.

But to get joint error contours for m parameters, you must use different values of \Delta\ln L (see Num Rec Sec 15.6):

Confidence level    m=1     m=2     m=3
68.00%              0.5     1.15    1.77
90.00%              1.36    2.31    3.13
95.40%              2       3.09    4.01
99.00%              3.32    4.61    5.65


    Extended maximum likelihood estimators

Sometimes the number of observed events is not fixed, but also contains information about the unknown parameters. For example, maybe we want to fit for the rate. For this purpose we can use the extended maximum likelihood method.

Normal ML method: the PDF is normalized to unity:

\int P(x|a)\, dx = 1

Extended ML method: the distribution is normalized to the predicted number of events:

\int Q(x|a)\, dx = \nu = \text{predicted number of events}


    Extended maximum likelihood estimators

The likelihood now includes the Poisson probability of observing N events when \nu are predicted:

L = \frac{e^{-\nu}\, \nu^N}{N!} \prod_{i=1}^{N} P(x_i) \qquad [\text{note that } Q(x) = \nu\, P(x)]

Taking the negative logarithm (and dropping the constant \ln N!):

-\ln L = -\sum_{i=1}^{N} \ln[\nu\, P(x_i)] + \nu

The argument of the logarithm is the number density of events predicted at x_i. The second term (outside the summation sign) is the total predicted number of events.
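A minimal sketch of an extended ML fit (my own example, not the lecture's): the events follow an assumed exponential shape P(x|τ) on x ≥ 0, and we fit both the expected number of events ν and the shape parameter τ.

    # Extended ML fit:  -ln L = -sum ln[nu * P(x_i)] + nu
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(5)
    true_nu, true_tau = 50.0, 2.0
    N_obs = rng.poisson(true_nu)                  # the event count itself fluctuates
    x = rng.exponential(true_tau, size=N_obs)

    def extended_nll(params):
        nu, tau = params
        if nu <= 0 or tau <= 0:
            return np.inf
        log_P = -np.log(tau) - x / tau            # ln P(x_i | tau), normalized PDF
        return nu - np.sum(np.log(nu) + log_P)    # nu - sum ln[nu * P(x_i)]

    fit = minimize(extended_nll, x0=[float(N_obs), x.mean()], method="Nelder-Mead")
    print("nu_hat, tau_hat =", fit.x)
    # In this simple model nu and tau decouple, so nu_hat comes out equal to N_obs;
    # when the shape of P depends on the rate, the event count genuinely constrains it.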



Example of the extended maximum likelihood in action: SNO flux fits

Q(E, R, \ldots) = N_{CC}\, P_{CC}(E, R, \ldots) + N_{ES}\, P_{ES}(E, R, \ldots) + N_{NC}\, P_{NC}(E, R, \ldots)

Fit for the numbers of CC, ES, and NC events.

Careful: because every event must be either CC, ES, or NC, the three event totals are anti-correlated with each other.