
Page 1: MT2004

MT2004

Olivier GIMENEZ

Telephone: 01334 461827

E-mail: [email protected]

Website: http://www.creem.st-and.ac.uk/olivier/OGimenez.html

Page 2: MT2004

11. Maximum Likelihood Estimation

• So far, we have constructed confidence intervals for, and tested hypotheses about, model parameters (e.g. the mean of a normally distributed population)

• Objective here: estimating the parameters of a model, using data

• Example: We want to estimate the probability θ of getting a head upon flipping a particular coin.

• We flip the coin ‘independently’ 10 times (i.e. we sample n = 10 flips), obtaining the following result: H H T H H H T T H H

Page 3: MT2004

11. Maximum Likelihood Estimation

• We flip the coin ‘independently’ 10 times (i.e. we sample n = 10 flips), obtaining the following result: H H T H H H T T H H

• The probability of obtaining this sequence – in advance of collecting the data – is a function of the unknown parameter θ:

• Pr(data | parameter) = Pr(H H T H H H T T H H | θ)

= θ θ (1 − θ) θ θ θ (1 − θ) (1 − θ) θ θ

= θ^7 (1 − θ)^3

• But the data for our particular sample are fixed: we have already collected them!

• The parameter θ also has a fixed value, but this value is unknown. We know only that it lies between 0 and 1.

Page 4: MT2004

11. Maximum Likelihood Estimation

• The value of θ varies between 0 and 1

• We shall treat the probability of the observed data as a function of θ

• This function is called the likelihood function:

• L(parameter | data) = Pr(H H T H H H T T H H | θ)

= L(θ | H H T H H H T T H H)

= L(θ | data)

= θ^7 (1 − θ)^3

• The probability function and the likelihood function are the same equation. But the probability function is a function of the data with the value of the parameter fixed, while the likelihood function is a function of the parameter with the data fixed.

Page 5: MT2004

11. Maximum Likelihood Estimation

• Here are some representative values of the likelihood for different values of θ

[Figure: likelihood of observing 7 heads and 3 tails, for different values of the probability θ of observing a head]
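To see the shape of this function, one can plot it directly; a minimal R sketch using base graphics (the dashed line at 0.7 anticipates the MLE derived below):

    # Likelihood of observing 7 heads and 3 tails, as a function of theta
    lik <- function(theta) theta^7 * (1 - theta)^3

    # Plot L(theta) over [0, 1]; the curve peaks at theta = 0.7
    curve(lik, from = 0, to = 1, xlab = "theta", ylab = "likelihood")
    abline(v = 0.7, lty = 2)  # dashed line at the maximum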

Page 6: MT2004

11. Maximum Likelihood Estimation

• The probability of obtaining the sample of data that we have in hand, H H T H H H T T H H, is small regardless of the true value of θ.

• This is usually the case; any specific sample result – including the one that is realised – will have low probability

• Nevertheless, the likelihood contains useful information about the unknown parameter θ

• E.g. θ cannot be exactly zero or one (either value would give the observed data probability 0), and is unlikely to be close to zero or one.

• Reversing this reasoning, the value of θ that is most supported by the data is the one for which the likelihood is largest

• This value is the maximum-likelihood estimate (MLE) of θ

Page 7: MT2004

11. Maximum Likelihood Estimation

• More generally, for n independent flips of the coin, producing a particular sequence that includes x heads and n − x tails,

L(θ | data) = Pr(data | θ) = θ^x (1 − θ)^(n−x)

• We want the value of θ that maximises L(θ | data), which we often abbreviate L(θ)

• It is simpler – and equivalent – to find the value of θ that maximises the log of the likelihood

log L(θ) = x log θ + (n − x) log(1 − θ)

Page 8: MT2004

11. Maximum Likelihood Estimation

• Differentiating log L(θ) with respect to θ produces

d log L(θ)/dθ = x/θ − (n − x)/(1 − θ)

• Setting the derivative to 0 and solving produces the MLE which, as before, is the sample proportion:

θ̂ = x/n
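As a numerical check, here is a small R sketch that maximises this log-likelihood with the base function optimize; the answer should agree with x/n = 0.7 for our data:

    # Log-likelihood for x heads in n independent flips
    loglik <- function(theta, x = 7, n = 10) {
      x * log(theta) + (n - x) * log(1 - theta)
    }

    # One-dimensional maximisation over (0, 1)
    optimize(loglik, interval = c(0, 1), maximum = TRUE)$maximum  # about 0.7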

Page 9: MT2004

11. Maximum Likelihood Estimation

• In greater generality: consider a set of observations x1,…,xn which are modelled as observations of independent discrete random variables with probability function f(x; θ) which depends on some (vector of) parameters θ.

• According to the model, the probability of obtaining the observed data is the product of the probability functions for each observation, i.e.

L(θ; x1,…,xn) = f(x1; θ) × … × f(xn; θ)

• We seek the parameters of the model that make the data look most probable; in other words, we seek to maximise the likelihood L(θ; x1,…,xn) (a function of the parameters with the data fixed) with respect to θ

Page 10: MT2004

11. Maximum Likelihood Estimation

• Equivalently, we seek to maximise the log-likelihood

l(θ) = log L(θ; x1,…,xn) = Σ log f(xi; θ)

• Recall that log(a b) = log(a) + log(b)

• Example: suppose that you have n observations x1,…,xn on independent Poisson-distributed random variables, each with probability function

f(x; λ) = e^(−λ) λ^x / x!,  x = 0, 1, 2, …

1. Form the likelihood and then the corresponding log-likelihood

2. Maximise the log-likelihood w.r.t. λ and obtain its MLE

Page 11: MT2004

11. Maximum Likelihood Estimation

1. Form the likelihood…

L(λ; x1,…,xn) = Π e^(−λ) λ^(xi)/xi! = e^(−nλ) λ^(Σ xi) / Π xi!

… and then the corresponding log-likelihood

l(λ) = −nλ + (Σ xi) log λ − Σ log(xi!)

Page 12: MT2004

11. Maximum Likelihood Estimation

2. Maximise the log-likelihood w.r.t. λ and obtain its MLE

dl(λ)/dλ = −n + (Σ xi)/λ

so the MLE of λ is:

λ̂ = (Σ Xi)/n = X̄

• The expression that we’ve just derived is an estimator, i.e. a function of the random variables X1,…, Xn

• The value of this function which is obtained by evaluating it on observation values x1,…, xn is an estimate.

Page 13: MT2004

11. Maximum Likelihood Estimation

The MLE of λ is:

λ̂ = (Σ Xi)/n = X̄

• The expression that we’ve just derived is an estimator, i.e. a function of the random variables X1,…, Xn

• The value of this function which is obtained by evaluating it on observation values x1,…, xn is an estimate.

• Suppose that in this case we have 4 observations 1, 3, 8 and 2. What is the maximum likelihood estimate?

Page 14: MT2004

11. Maximum Likelihood Estimation

The MLE of λ is:

λ̂ = (Σ Xi)/n = X̄

• Suppose that in this case we have 4 observations 1, 3, 8 and 2. What is the maximum likelihood estimate?

• The maximum likelihood estimate is:

λ̂ = (1 + 3 + 8 + 2)/4 = 3.5

• Note that, in general, we should check that we have actually obtained a maximum, so we should calculate the second derivative d²l(λ)/dλ² and check that it is negative (which it is in this example)… In other words, l(λ) is concave.
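The same answer can be checked numerically in R; a minimal sketch using the built-in Poisson density dpois:

    x <- c(1, 3, 8, 2)

    # Closed-form MLE: the sample mean
    mean(x)  # 3.5

    # Numerical check: maximise the Poisson log-likelihood over lambda
    loglik <- function(lambda) sum(dpois(x, lambda, log = TRUE))
    optimize(loglik, interval = c(0.01, 20), maximum = TRUE)$maximum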

Page 15: MT2004

11. Maximum Likelihood Estimation

A more complicated example…

Suppose that you have a series of measurements y1,…,yn of radioactive emission counts from samples of caesium of masses x1,…,xn, respectively. You wish to model the counts as Poisson random variables, where each Yi has mean λxi. Obtain the maximum likelihood estimator of λ (the radioactivity per unit mass)

1. Form the likelihood and then the corresponding log-likelihood

2. Maximise the log-likelihood w.r.t. λ and obtain its MLE

Page 16: MT2004

11. Maximum Likelihood Estimation

Suppose that you have a series of measurements y1,…,yn of radioactive emission counts from samples of caesium of masses x1,…,xn, respectively. You wish to model the counts as Poisson random variables, where each Yi has mean λxi. Obtain the maximum likelihood estimator of λ (the radioactivity per unit mass)

1. Form the likelihood and then the corresponding log-likelihood

L(λ) = Π e^(−λxi) (λxi)^(yi)/yi!

l(λ) = −λ Σ xi + (Σ yi) log λ + Σ yi log xi − Σ log(yi!)

Page 17: MT2004

11. Maximum Likelihood Estimation

Suppose that you have a series of measurements y1,…,yn of radioactive emission counts from samples of caesium of masses x1,…,xn, respectively. You wish to model the counts as Poisson random variables, where each Yi has mean λxi. Obtain the maximum likelihood estimator of λ (the radioactivity per unit mass)

2. Maximise the log-likelihood w.r.t. λ and obtain its MLE

dl(λ)/dλ = −Σ xi + (Σ yi)/λ

and setting this to zero gives:

λ̂ = (Σ Yi)/(Σ xi), the total count divided by the total mass
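A numerical illustration in R; the masses and counts below are invented for the sketch, not taken from the slides:

    # Hypothetical data, for illustration only
    x <- c(1.2, 0.8, 2.5, 1.0)  # masses
    y <- c(14, 9, 31, 12)       # emission counts

    # Closed-form MLE: total count over total mass
    sum(y) / sum(x)

    # Numerical check: each Y_i ~ Poisson(lambda * x_i)
    loglik <- function(lambda) sum(dpois(y, lambda * x, log = TRUE))
    optimize(loglik, interval = c(0.01, 100), maximum = TRUE)$maximum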

Page 18: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

• So far, MLE for a single parameter, using discrete data (Binomial, Poisson)

• Maximum likelihood estimation works as well for continuous random variables

• The likelihood is the product of the p.d.f.’s of the r.v.’s

• BUT, the likelihood (or log-likelihood) can no longer be interpreted as a probability of getting the observed data, given θ, only as a probability density of getting the observed data.

Page 19: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

• BUT, the likelihood (or log-likelihood) can no longer be interpreted as a probability of getting the observed data, given θ, only as a probability density of getting the observed data.

• In practice, this makes no difference. We maximise the likelihood w.r.t. the parameters as usual.

Page 20: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

Example 1: The following data are a small part of a dataset on coal mining disasters; the numbers are times in days between major disasters: 157, 33, 186, 78, 538, 3.

One model for such data assumes that the times are independent random variables T1,…,T6, all with the same p.d.f. f(t) = λ exp(−λt), t ≥ 0

1. Form the likelihood and then the corresponding log-likelihood

2. Maximise the log-likelihood w.r.t. λ and obtain the MLEs (estimator and estimate)

Page 21: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

Example 1: The following data are a small part of a dataset on coal mining disasters; the numbers are times in days between major disasters: 157, 33, 186, 78, 538, 3.

One model for such data assumes that the times are independent random variables T1,…,T6, all with the same p.d.f. f(t) = λ exp(−λt)

1. Form the likelihood and then the corresponding log-likelihood

L(λ) = Π λ e^(−λti) = λ^6 exp(−λ Σ ti)

l(λ) = 6 log λ − λ Σ ti

Page 22: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

Example 1: The following data are a small part of a dataset on coal mining disasters; the numbers are times in days between major disasters: 157, 33, 186, 78, 538, 3.

One model for such data assumes that the times are independent random variables T1,…,T6, all with the same p.d.f. f(t) = λ exp(−λt)

2. Maximise the log-likelihood w.r.t. λ and obtain the MLEs (estimator and estimate)

dl(λ)/dλ = 6/λ − Σ ti

and setting this to zero gives the maximum-likelihood estimator:

λ̂ = 6/(Σ Ti)

Page 23: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

Example 1: The following data are a small part of a dataset on coal mining disasters; the numbers are times in days between major disasters: 157, 33, 186, 78, 538, 3.

One model for such data assumes that the times are independent random variables T1,…,T6, all with the same p.d.f. f(t) = λ exp(−λt)

2. Maximise the log-likelihood w.r.t. λ and obtain the MLEs (estimator and estimate)

Plugging in the observed values for T1,…,T6, we get an estimate:

λ̂ = 6/(157 + 33 + 186 + 78 + 538 + 3) = 6/995 ≈ 0.00603 disasters per day
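The same calculation in R, as a quick check; dexp is the built-in exponential density:

    t <- c(157, 33, 186, 78, 538, 3)

    # Closed-form MLE: n over the sum of the observed times
    length(t) / sum(t)  # 6/995, about 0.00603 disasters per day

    # Numerical check using the exponential log-likelihood
    loglik <- function(lambda) sum(dexp(t, rate = lambda, log = TRUE))
    optimize(loglik, interval = c(1e-5, 1), maximum = TRUE)$maximum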

Page 24: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

• Example 2: Suppose that we have some observations x1,…,xn which we wish to model as observations of i.i.d. r.v.’s from a normal distribution with unknown mean μ and unknown variance σ², to be estimated.

1. Form the likelihood and then the corresponding log-likelihood

2. Maximise the log-likelihood w.r.t. μ and σ² and obtain the MLEs

Page 25: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

1. Form the likelihood and then the corresponding log-likelihood

L(μ, σ²) = Π (2πσ²)^(−1/2) exp(−(xi − μ)²/(2σ²))

l(μ, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ (xi − μ)²

Page 26: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

2. Maximise the log-likelihood w.r.t. μ and σ² and obtain the MLEs

First, we find the partial derivative w.r.t. μ

∂l/∂μ = (1/σ²) Σ (xi − μ)

and setting this to zero gives:

Σ (xi − μ) = 0

so,

μ̂ = (1/n) Σ xi = x̄

Page 27: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

2. Maximise the log-likelihood w.r.t. μ and σ² and obtain the MLEs

Then, we find the partial derivative w.r.t. σ²

∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ (xi − μ)²

and setting this to zero gives:

nσ² = Σ (xi − μ)²

so, substituting μ̂ = x̄,

σ̂² = (1/n) Σ (xi − x̄)²

Page 28: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

To sum up, we have that:

μ̂ = x̄

and

σ̂² = (1/n) Σ (xi − x̄)²

• Note that the maximum likelihood estimator of the variance σ² is NOT the sample variance s² (which divides by n − 1 rather than n)

• In general, MLEs are biased (but the bias tends to zero as the sample size gets larger)

• The MLEs do have the advantage of being consistent
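A short R sketch, on simulated data, contrasting the MLE of σ² with the sample variance returned by var (which divides by n − 1):

    set.seed(1)
    x <- rnorm(50, mean = 10, sd = 2)  # simulated data, for illustration
    n <- length(x)

    mu_hat     <- mean(x)                  # MLE of mu
    sigma2_hat <- sum((x - mu_hat)^2) / n  # MLE of sigma^2 (divides by n)

    # The sample variance divides by n - 1, so the two differ slightly
    c(mle = sigma2_hat, sample = var(x))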

Page 29: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

• Example 3: Suppose that we have some observations x1,…,xn which we wish to model as observations of i.i.d. r.v.’s from a Weibull distribution with unknown parameters α and β, to be estimated.

1. Form the log-likelihood

2. Maximise the log-likelihood w.r.t. α and β and obtain the MLEs

Page 30: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

• Example 3: Suppose that we have some observations x1,…,xn which we wish to model as observations of i.i.d. r.v.’s from a Weibull distribution with unknown parameters α and β, to be estimated.

1. Form the log-likelihood

Page 31: MT2004

11. Maximum Likelihood Estimation

Likelihood for continuous distributions

• Example 3: Suppose that we have some observations x1,…,xn which we wish to model as observations of i.i.d. r.v.’s from a Weibull distribution with unknown parameters α and β, to be estimated.

2. Maximise the log-likelihood w.r.t. α and β and obtain the MLEs

One ends up with a nonlinear equation in one of the parameters that cannot be solved in closed form.

We need to use numerical optimisation routines, e.g. optim in R, to find the maximum of the log-likelihood (or, equivalently, the minimum of the negative log-likelihood), as in the sketch below.
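A minimal R sketch of this approach on simulated data; it uses optim to minimise the negative log-likelihood, and R's dweibull parametrisation (shape and scale), which may differ from the notation used on the slides:

    set.seed(1)
    x <- rweibull(100, shape = 1.5, scale = 2)  # simulated data, for illustration

    # Negative log-likelihood; optim minimises by default
    negloglik <- function(par) {
      -sum(dweibull(x, shape = par[1], scale = par[2], log = TRUE))
    }

    # Box-constrained minimisation keeps both parameters positive
    optim(par = c(1, 1), fn = negloglik,
          method = "L-BFGS-B", lower = c(0.01, 0.01))$par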

Page 32: MT2004

11. Maximum Likelihood Estimation

Invariance of MLEs

The invariance property of maximum likelihood estimators: if θ̂ is the MLE of θ and g is a one-to-one function, then g(θ̂) is the MLE of g(θ).

Example 1: suppose that x1,…,xn are observations on N(μ, σ²) random variables. Find the MLE of σ.

We saw that the MLE for σ² is:

σ̂² = (1/n) Σ (xi − x̄)²

If we consider the one-to-one function:

g(σ²) = √(σ²) = σ

Then the invariance property says that the MLE for σ is:

σ̂ = √((1/n) Σ (xi − x̄)²)
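A quick numerical illustration of this in R, on simulated data: maximising the log-likelihood over σ directly gives the same answer as taking the square root of σ̂²:

    set.seed(1)
    x <- rnorm(30, mean = 5, sd = 3)  # simulated data, for illustration

    # MLE of sigma^2, then sigma via invariance
    sigma2_hat <- sum((x - mean(x))^2) / length(x)
    sqrt(sigma2_hat)

    # Direct check: maximise the log-likelihood over sigma itself
    # (mu is fixed at its MLE, the sample mean)
    loglik <- function(sigma) sum(dnorm(x, mean = mean(x), sd = sigma, log = TRUE))
    optimize(loglik, interval = c(0.1, 20), maximum = TRUE)$maximum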

Page 33: MT2004

11. Maximum Likelihood Estimation

Invariance of MLEs

The invariance property of maximum likelihood estimators:

Example 2: suppose that x1,…,xk are observations on independent binomial r.v.’s, each with n trials and unknown probability p. The likelihood of p is:

L(p) = Π (n choose xi) p^(xi) (1 − p)^(n − xi)

Find the MLE of p, and deduce the MLE of the mean of the Bin(n,p) distribution

Page 34: MT2004

11. Maximum Likelihood Estimation

Invariance of MLEs

Example: suppose that x1,…, xk are observations on independent binomial r.v.’s, each with n trials and unknown probability p. Find the MLE of p, and deduce the MLE of the mean of the Bin(n,p) distribution

Page 35: MT2004

11. Maximum Likelihood Estimation

Invariance of MLEs

Example: suppose that x1,…,xk are observations on independent binomial r.v.’s, each with n trials and unknown probability p. Find the MLE of p, and deduce the MLE of the mean of the Bin(n,p) distribution

dl(p)/dp = (Σ xi)/p − (nk − Σ xi)/(1 − p)

and setting this to zero gives:

p̂ = (Σ xi)/(nk)

Page 36: MT2004

11. Maximum Likelihood Estimation

Invariance of MLEs

Example: suppose that x1,…,xk are observations on independent binomial r.v.’s, each with n trials and unknown probability p. Find the MLE of p, and deduce the MLE of the mean of the Bin(n,p) distribution

By the invariance property, the MLE of the mean np is:

n p̂ = n (Σ xi)/(nk) = (Σ xi)/k = x̄, the sample mean