Error analysis lecture 8


Physics 509: Introduction to Parameter Estimation

Scott Oser
Lecture #8

    September 30, 2008


    Outline

Last time: we reviewed multi-dimensional Gaussians, the Central Limit Theorem, and talked in a general way about Gaussian error ellipses.

    Today:

1) Introduction to parameter estimation
2) Some useful basic estimators
3) Error estimates on estimators
4) Maximum likelihood estimators
5) Estimating errors in the ML method
6) Extended ML method


    What is an estimator?

Quite simple, really ... an estimator is a procedure you apply to a data set to estimate some property of the parent distribution from which the data is drawn.

This could be a recognizable parameter of a distribution (e.g. the p value of a binomial distribution), or it could be a more general property of the distribution (e.g. the mean of the parent distribution).

The procedure can be anything you do with the data to generate a numerical result. Take an average, take the median value, multiply them all and divide by the GDP of Mongolia ... all of these are estimators. You are free to make up any estimator you care to, and aren't restricted to standard choices. (Whether an estimator you make yourself is a useful estimator or not is a completely separate question!)


    Bayesian estimators

You've already seen the Bayesian solution to parameter estimation ... if your data are distributed according to a PDF depending on some parameter a, then Bayes' theorem gives you a formula for the PDF of a:

P(a|D,I) = \frac{P(a|I)\,P(D|a,I)}{\int da\, P(a|I)\,P(D|a,I)} = \frac{P(a|I)\,P(D|a,I)}{P(D|I)}

The PDF P(a|D,I) contains all the information there is to have about the true value of a. You can report it any way you like---preferably by publishing the PDF itself, or else if you want to report just a single number you can calculate the most likely value of a, or the mean of its distribution, or whatever you want.

There's no special magic: Bayesian analysis directly converts the observed data into a PDF for any free parameters.
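As a concrete illustration of "data in, PDF out" (my own sketch, not from the slides), here is a minimal Python example that builds the posterior on a grid for an assumed binomial problem: k successes in n trials with a flat prior. The data values and variable names are made up for illustration.

    # Minimal sketch (assumed example): posterior for a binomial success
    # probability p, given k successes in n trials, with a flat prior.
    import numpy as np

    k, n = 7, 20                            # assumed observed data
    p = np.linspace(0.0, 1.0, 1001)         # grid of parameter values
    likelihood = p**k * (1 - p)**(n - k)    # P(D|p,I), up to a constant
    prior = np.ones_like(p)                 # flat prior P(p|I) = 1
    posterior = likelihood * prior
    posterior /= np.trapz(posterior, p)     # normalize: divide by P(D|I)

    p_mode = p[np.argmax(posterior)]        # most probable value
    p_mean = np.trapz(p * posterior, p)     # posterior mean
    print(f"mode = {p_mode:.3f}, mean = {p_mean:.3f}")

Either summary (mode or mean) is a legitimate single-number report; the full posterior array is the complete answer.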


    Frequentist estimators

Frequentists have a harder time of it ... they say that the parameters of the parent distribution have some fixed albeit unknown values. It doesn't make sense to talk about the probability of a fixed parameter having some other value---all we can talk about is how likely or unlikely it was that we would observe the data we did given some value of the parameter. Let's try to come up with estimators that are as close as possible to the true value of the parameter.

Most of what we will be talking about in this lecture is frequentist methodology, although I'll try to relate it to Bayesian language as much as possible.


    Desired properties of estimators

What makes a good estimator? Consider some estimator \hat{a} = \hat{a}(x_1, x_2, \ldots, x_N), a function of the data. We would like it to be:

1) Consistent: a consistent estimator will tend to the true value as the amount of data approaches infinity:

\lim_{N\to\infty} \hat{a} = a

2) Unbiased: the expectation value of the estimator is equal to the true value, so its bias b is zero:

b = \langle \hat{a} \rangle - a = \int dx_1 \cdots dx_N\, P(x_1 \ldots x_N | a)\, \hat{a}(x_1 \ldots x_N) - a = 0

3) Efficient: the variance of the estimator is as small as possible (as we'll see, there are limitations on how small it can be):

V(\hat{a}) = \int dx_1 \cdots dx_N\, P(x_1 \ldots x_N | a)\, \left[ \hat{a}(x_1 \ldots x_N) - \langle \hat{a} \rangle \right]^2

It's not always possible to satisfy all three of these requirements. Bias and variance combine in the mean square error:

\text{mean square error} = \langle (\hat{a} - a)^2 \rangle = b^2 + V(\hat{a})


    Common estimators

1) Mean of a distribution---the obvious choice is to use the average:

\hat{\mu} = \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i

Consistent and unbiased if the measurements are independent. Not necessarily the most efficient---its variance depends on the distribution under consideration, and is given by

V(\bar{x}) = \frac{\sigma^2}{N}

There may be more efficient estimators, especially if the parent distribution has big tails. But in many circumstances the sample mean is the most efficient.
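A quick numerical check of these two statements (my own sketch, not from the slides), using an assumed exponential parent distribution with mean 1 and hence σ = 1:

    # Monte Carlo check that the sample mean is unbiased with variance sigma^2/N.
    import numpy as np

    rng = np.random.default_rng(1)
    N, n_experiments = 20, 100_000
    # Assumed parent distribution: exponential with mean 1 (so sigma = 1).
    data = rng.exponential(scale=1.0, size=(n_experiments, N))

    means = data.mean(axis=1)                       # one sample mean per experiment
    print("average of sample means:", means.mean())  # ~1.0  (unbiased)
    print("variance of sample means:", means.var())  # ~sigma^2/N = 0.05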


    Estimating the variance

If you know the true mean \mu of a distribution, one useful estimator (consistent and unbiased) of the variance is

\hat{V}(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

What if \mu is also unknown?

A biased estimator:

\hat{V}(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad \langle \hat{V}(x) \rangle = \frac{N-1}{N}\, V(x)

An unbiased estimator:

\hat{V}(x) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

But its square root is a biased estimator of \sigma!
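A short Monte Carlo check of these bias statements (my own sketch, with an assumed Gaussian parent of true variance 4):

    # Compare the 1/N and 1/(N-1) variance estimators over many pseudo-experiments.
    import numpy as np

    rng = np.random.default_rng(2)
    N, n_experiments, sigma = 5, 200_000, 2.0
    x = rng.normal(0.0, sigma, size=(n_experiments, N))

    v_biased = x.var(axis=1, ddof=0)      # (1/N) * sum (x_i - xbar)^2
    v_unbiased = x.var(axis=1, ddof=1)    # (1/(N-1)) * sum (x_i - xbar)^2

    print("true variance:            ", sigma**2)                     # 4.0
    print("mean of 1/N estimator:    ", v_biased.mean())              # ~ (N-1)/N * 4 = 3.2
    print("mean of 1/(N-1) estimator:", v_unbiased.mean())            # ~ 4.0
    print("mean of sqrt(unbiased):   ", np.sqrt(v_unbiased).mean())   # < 2.0: biased for sigma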


    Estimating the standard deviation

The square root of an estimate of the variance is the obvious thing to use as an estimate of the standard deviation:

\hat{V}(x) = s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

We can use s as our estimator for \sigma. It will generally be biased---we don't worry a lot about this because we're more interested in having an unbiased estimate of s^2.

For samples from a Gaussian distribution, the RMS on our estimate for \sigma is given by

\sigma_s = \frac{\sigma}{\sqrt{2(N-1)}}

Think of this as the error estimate on our error bar.


    Likelihood function and the minimum variance bound

Likelihood function: the probability of the data given the parameters:

L(x_1 \ldots x_N | a) = \prod_{i=1}^{N} P(x_i | a)

(The likelihood is actually one of the factors in the numerator of Bayes' theorem.)

A remarkable result---for any unbiased estimator of a, the variance of the estimator satisfies:

V(\hat{a}) \ge \frac{1}{\left\langle -\dfrac{d^2 \ln L}{da^2} \right\rangle}

If the estimator is biased with bias b, then this becomes:

V(\hat{a}) \ge \frac{\left(1 + \dfrac{db}{da}\right)^2}{\left\langle -\dfrac{d^2 \ln L}{da^2} \right\rangle}
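As a quick worked check of the bound (my own example, not from the slides), take N independent Gaussian measurements x_i with known \sigma and unknown mean \mu:

\ln L(\mu) = -\sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2} + \text{const}
\qquad\Rightarrow\qquad
-\frac{d^2 \ln L}{d\mu^2} = \frac{N}{\sigma^2}
\qquad\Rightarrow\qquad
V(\hat{\mu}) \ge \frac{\sigma^2}{N}

The sample mean has variance exactly \sigma^2/N, so in this case the obvious estimator saturates the bound.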


    The minimum variance bound

The minimum variance bound is a remarkable result---it says that there is some best-case estimator which, when averaged over thousands of experiments, will give parameter estimates closer to the true value, as measured by the RMS error, than any other.

An estimator that achieves the minimum variance bound is maximally efficient.

Information theory is the science of how much information is encoded in a data set. The MVB comes out of this science.


    Maximum likelihood estimators

By far the most useful estimator is the maximum likelihood method. Given your data set x_1 \ldots x_N and a set of unknown parameters a, calculate the likelihood function

L(x_1 \ldots x_N | a) = \prod_{i=1}^{N} P(x_i | a)

It's more common (and easier) to calculate -\ln L instead:

\ln L(x_1 \ldots x_N | a) = \sum_{i=1}^{N} \ln P(x_i | a)

The maximum likelihood estimator is that value of a which maximizes L as a function of a. It can be found by minimizing -\ln L over the unknown parameters.


    Simple example of an ML estimator

Suppose that our data sample is drawn from two different distributions. We know the shapes of the two distributions, but not what fraction of our population comes from distribution A vs. B. We have 20 random measurements of X from the population.

The two component PDFs (both normalized on 0 \le x \le 1) are

P_A(x) = \frac{2}{1 - e^{-2}}\, e^{-2x}, \qquad P_B(x) = 3x^2

and the total PDF is

P_{tot}(x) = f\, P_A(x) + (1 - f)\, P_B(x)


Form for the log likelihood and the ML estimator


Form the negative log likelihood:

-\ln L(f) = -\sum_{i=1}^{N} \ln P_{tot}(x_i | f)

Minimize -\ln L with respect to f. Sometimes you can solve this analytically by setting the derivative equal to zero. More often you have to do it numerically.
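Here is a minimal numerical sketch of this fit (my own implementation, not the lecture's code). It assumes the PDFs above live on 0 ≤ x ≤ 1 and uses an assumed true fraction of f = 0.3, as in the lecture's example; the helper names are mine.

    # Sketch of the ML fit for the mixture fraction f (assumes x is on [0, 1]).
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(3)

    def sample(n, f_true=0.3):
        """Draw n events from f*P_A + (1-f)*P_B by picking a component per event."""
        from_A = rng.random(n) < f_true
        u = rng.random(n)
        # P_A: truncated exponential on [0,1], sampled by inverting its CDF.
        x_A = -0.5 * np.log(1.0 - u * (1.0 - np.exp(-2.0)))
        # P_B(x) = 3x^2  ->  CDF = x^3, so x = u^(1/3).
        x_B = rng.random(n) ** (1.0 / 3.0)
        return np.where(from_A, x_A, x_B)

    def neg_log_L(f, x):
        p_A = 2.0 * np.exp(-2.0 * x) / (1.0 - np.exp(-2.0))
        p_B = 3.0 * x**2
        return -np.sum(np.log(f * p_A + (1.0 - f) * p_B))

    x = sample(20)
    result = minimize_scalar(neg_log_L, bounds=(0.0, 1.0), args=(x,), method="bounded")
    print("ML estimate of f:", result.x)

With only 20 events the estimate will scatter quite a bit around the true value from one pseudo-experiment to the next, which is exactly the point of the next few slides.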


Graph of the log likelihood

The graph to the left shows the shape of the negative log likelihood function vs. the unknown parameter f.

The minimum is at f = 0.415. This is the ML estimate.

As we'll see, the 1\sigma error range is defined by \Delta\ln(L) = 0.5 above the minimum.

The data set was actually drawn from a distribution with a true value of f = 0.3.


    Properties of ML estimators

Besides its intrinsic intuitiveness, the ML method has some nice (and some not-so-nice) properties:

1) The ML estimator is usually consistent.

2) ML estimators are usually biased, although if also consistent then the bias approaches zero as N goes to infinity.

3) Estimators are invariant under parameter transformations:

\widehat{f(a)} = f(\hat{a})

4) In the asymptotic limit, the estimator is efficient. The Central Limit Theorem kicks in for the sum of the terms in the log likelihood, making it Gaussian:

\sigma_{\hat{a}}^2 = \left( -\frac{d^2 \ln L}{da^2} \bigg|_{a_0} \right)^{-1}


    Relation to Bayesian approach

There is a close relation between the ML method and the Bayesian approach.

The Bayesian posterior PDF for the parameter is the product of the likelihood function P(D|a,I) and the prior P(a|I).

So the ML estimator is actually the peak location for the Bayesian posterior PDF assuming a flat prior P(a|I) = 1.

The log likelihood is related to the Bayesian PDF (for a flat prior) by:

P(a|D,I) \propto \exp[\,\ln L(a)\,]

This way of viewing the log likelihood as the logarithm of a Bayesian PDF with uniform prior is an excellent way to intuitively understand many features of the ML method.


    Errors on ML estimators

In the limit of large N, the log likelihood becomes parabolic (by the CLT). Comparing to \ln L for a simple Gaussian:

\ln L(f) = \ln L_{\max} - \frac{1}{2} \left( \frac{f - \hat{f}}{\sigma_f} \right)^2

it is natural to identify the 1\sigma range on the parameter by the points at which \Delta\ln(L) = 1/2.

2\sigma range: \Delta\ln(L) = \frac{1}{2}(2)^2 = 2
3\sigma range: \Delta\ln(L) = \frac{1}{2}(3)^2 = 4.5

This is done even when the likelihood isn't parabolic (although at some peril).
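A minimal sketch of reading off an asymmetric 1σ interval this way (my own code, continuing the mixture example; x is the array of 20 measurements and neg_log_L the function from the earlier sketch, repeated here so the block stands alone):

    # Scan -ln L over f and read off the 1-sigma range from delta(-ln L) = 0.5.
    import numpy as np

    def neg_log_L(f, x):
        p_A = 2.0 * np.exp(-2.0 * x) / (1.0 - np.exp(-2.0))
        p_B = 3.0 * x**2
        return -np.sum(np.log(f * p_A + (1.0 - f) * p_B))

    def one_sigma_interval(x):
        f_grid = np.linspace(0.001, 0.999, 2000)
        nll = np.array([neg_log_L(f, x) for f in f_grid])
        f_hat = f_grid[np.argmin(nll)]
        inside = f_grid[nll <= nll.min() + 0.5]   # points with delta(-ln L) <= 0.5
        return f_hat, f_hat - inside.min(), inside.max() - f_hat

    # f_hat, err_minus, err_plus = one_sigma_interval(x)

The lower and upper distances from the minimum need not be equal, which is where asymmetric error bars come from.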


    Parabolicity of the log likelihood

In general the log likelihood becomes more parabolic as N gets larger. The graphs at the right show the negative log likelihoods for our example problem for N=20 and N=500. The red curves are parabolic fits around the minimum.

How large does N have to be before the parabolic approximation is good? That depends on the problem---try graphing -ln(L) vs. your parameter to see how parabolic it is.


Asymmetric errors from ML estimators

Even when the log likelihood is not Gaussian, it's nearly universal to define the 1\sigma range by \Delta\ln(L) = 1/2. This can result in asymmetric error bars, such as:

f = 0.41^{+0.17}_{-0.15}

The justification often given for this is that one could always reparameterize the estimated quantity into one which does have a parabolic likelihood. Since ML estimators are supposed to be invariant under reparameterizations, you could then transform back to get asymmetric errors.

Does this procedure actually work?


    Coverage of ML estimator errors

What do we really want the ML error bars to mean? Ideally, the 1\sigma range would mean that the true value has a 68% chance of being within that range.

Fraction of the time the 1\sigma range includes the true value:

N      coverage
5      56.7%
10     64.8%
20     68.0%
500    67.0%

[Figure: distribution of ML estimators for two N values]


    Errors on ML estimators

Simulation is the best way to estimate the true error range on an ML estimator: assume a true value for the parameter, simulate a few hundred experiments, then calculate ML estimates for each.

N=20: range from likelihood function: -0.16 / +0.17; RMS of simulation: 0.16

N=500: range from likelihood function: -0.030 / +0.035; RMS of simulation: 0.030
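A minimal sketch of this kind of toy Monte Carlo study (my own code; it reuses the sample() and one_sigma_interval() helpers defined in the earlier sketches, and the numbers it produces are illustrative rather than the slide's):

    # Toy Monte Carlo: simulate many experiments at a fixed true f,
    # fit each one, and look at the spread of the ML estimates.
    import numpy as np

    def toy_study(n_events, n_toys=300, f_true=0.3):
        estimates = []
        for _ in range(n_toys):
            x = sample(n_events, f_true)          # helper from the earlier sketch
            f_hat, _, _ = one_sigma_interval(x)   # ML estimate via the grid scan
            estimates.append(f_hat)
        estimates = np.array(estimates)
        return estimates.mean(), estimates.std()  # bias check and RMS error

    # mean_f, rms_f = toy_study(20)   # compare rms_f to the likelihood-based range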


    Likelihood functions of multiple parameters

Often there is more than one free parameter. To handle this, we simply minimize the negative log likelihood over all free parameters:

\frac{\partial \ln L(x_1 \ldots x_N | a_1 \ldots a_m)}{\partial a_j} = 0

Errors are determined by (in the Gaussian approximation):

\left(\mathrm{cov}^{-1}\right)_{ij} = -\left. \frac{\partial^2 \ln L}{\partial a_i\, \partial a_j} \right|_{\text{minimum}}
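A sketch of putting this formula to work numerically (my own example, not the lecture's): fit the mean and width of assumed Gaussian data, then invert a finite-difference Hessian of -ln L at the minimum to get the covariance matrix.

    # Covariance of ML estimates from the numerical Hessian of -ln L.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(4)
    data = rng.normal(5.0, 2.0, size=200)      # assumed data set

    def nll(params):
        mu, sigma = params
        # -ln L for Gaussian data (dropping constants).
        return np.sum(0.5 * ((data - mu) / sigma) ** 2 + np.log(sigma))

    fit = minimize(nll, x0=[data.mean(), data.std()], method="Nelder-Mead")

    def hessian(f, p, eps=1e-4):
        """Symmetric finite-difference Hessian of f at point p."""
        p = np.asarray(p, dtype=float)
        n = len(p)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                pp = p.copy(); pp[i] += eps; pp[j] += eps
                pm = p.copy(); pm[i] += eps; pm[j] -= eps
                mp = p.copy(); mp[i] -= eps; mp[j] += eps
                mm = p.copy(); mm[i] -= eps; mm[j] -= eps
                H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4 * eps**2)
        return H

    cov = np.linalg.inv(hessian(nll, fit.x))   # cov^-1 = Hessian of -ln L at minimum
    print("mu, sigma =", fit.x)
    print("errors    =", np.sqrt(np.diag(cov)))

In practice a fitting package (e.g. MINUIT) does this Hessian inversion for you, but the underlying formula is exactly the one above.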


    Error contours for multiple parameters

We can also find the errors on parameters by drawing contours on \Delta\ln L.

1\sigma range on a single parameter a: the smallest and largest values of a that give \Delta\ln L = 1/2, minimizing \ln L over all other parameters.

But to get joint error contours for m parameters, you must use different values of \Delta\ln L (see Num Rec Sec 15.6):

Confidence level    m=1     m=2     m=3
68.00%              0.5     1.15    1.77
90.00%              1.36    2.31    3.13
95.40%              2       3.09    4.01
99.00%              3.32    4.61    5.65


    Extended maximum likelihood estimators

Sometimes the number of observed events is not fixed, but also contains information about the unknown parameters. For example, maybe we want to fit for the rate. For this purpose we can use the extended maximum likelihood method.

Normal ML method: the PDF is normalized to unity:

\int P(x|a)\, dx = 1

Extended ML method: the distribution is normalized to the predicted number of events:

\int Q(x|a)\, dx = \nu = \text{predicted number of events}


    Extended maximum likelihood estimators

The likelihood now includes the Poisson probability of observing N events when \nu are predicted:

L = \frac{e^{-\nu}\, \nu^N}{N!} \prod_{i=1}^{N} P(x_i) \qquad [\text{note that } Q(x) = \nu\, P(x)]

Taking the negative logarithm (and dropping the constant \ln N!):

-\ln L = -\sum_{i=1}^{N} \ln[\nu\, P(x_i)] + \nu

The argument of the logarithm is the number density of events predicted at x_i. The second term (outside the summation sign) is the total predicted number of events.
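A minimal sketch of an extended ML fit (my own example, not the lecture's): the events follow an assumed exponential shape P(x|τ) on x ≥ 0, and we fit both the expected number of events ν and the shape parameter τ.

    # Extended ML fit:  -ln L = -sum ln[nu * P(x_i)] + nu
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(5)
    true_nu, true_tau = 50.0, 2.0
    N_obs = rng.poisson(true_nu)                  # the event count itself fluctuates
    x = rng.exponential(true_tau, size=N_obs)

    def extended_nll(params):
        nu, tau = params
        if nu <= 0 or tau <= 0:
            return np.inf
        log_P = -np.log(tau) - x / tau            # ln P(x_i | tau), normalized PDF
        return nu - np.sum(np.log(nu) + log_P)    # nu - sum ln[nu * P(x_i)]

    fit = minimize(extended_nll, x0=[float(N_obs), x.mean()], method="Nelder-Mead")
    print("nu_hat, tau_hat =", fit.x)
    # In this simple model nu and tau decouple, so nu_hat comes out equal to N_obs;
    # when the shape of P depends on the rate, the event count genuinely constrains it.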



Example of the extended maximum likelihood in action: SNO flux fits

Q(E, R, \ldots) = N_{CC}\, P_{CC}(E, R, \ldots) + N_{ES}\, P_{ES}(E, R, \ldots) + N_{NC}\, P_{NC}(E, R, \ldots)

Fit for the numbers of CC, ES, and NC events.

Careful: because every event must be either CC, ES, or NC, the three event totals are anti-correlated with each other.