6 Point Estimation

Copyright © Cengage Learning. All rights reserved.



6.2 Methods of Point Estimation

Methods of Point Estimation

The definition of unbiasedness does not in general indicate how unbiased estimators can be derived. We now discuss two “constructive” methods for obtaining point estimators: the method of moments and the method of maximum likelihood. By constructive we mean that the general definition of each type of estimator suggests explicitly how to obtain the estimator in any specific problem.

Although maximum likelihood estimators are generally preferable to moment estimators because of certain efficiency properties, they often require significantly more computation than do moment estimators. It is sometimes the case that these methods yield unbiased estimators.

The Method of Moments

The basic idea of this method is to equate certain sample characteristics, such as the mean, to the corresponding population expected values. Then solving these equations for the unknown parameter values yields the estimators.

Definition
Let X1, . . . , Xn be a random sample from a pmf or pdf f(x). For k = 1, 2, 3, . . . , the kth population moment, or kth moment of the distribution f(x), is E(X^k). The kth sample moment is (1/n) Σ Xi^k.

Thus the first population moment is E(X) = µ, and the first sample moment is Σ Xi/n = X̄. The second population and sample moments are E(X^2) and Σ Xi^2/n, respectively. The population moments will be functions of any unknown parameters θ1, θ2, . . . .

Definition
Let X1, X2, . . . , Xn be a random sample from a distribution with pmf or pdf f(x; θ1, . . . , θm), where θ1, . . . , θm are parameters whose values are unknown. Then the moment estimators θ̂1, . . . , θ̂m are obtained by equating the first m sample moments to the corresponding first m population moments and solving for θ1, . . . , θm.

If, for example, m = 2, E(X) and E(X^2) will be functions of θ1 and θ2. Setting E(X) = (1/n) Σ Xi (= X̄) and E(X^2) = (1/n) Σ Xi^2 gives two equations in θ1 and θ2. The solution then defines the estimators.
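To make the m = 2 case concrete, here is a short Python sketch (an addition, not from the slides) using a normal model, where E(X) = µ and E(X^2) = µ^2 + σ^2, so the two moment equations solve to µ̂ = X̄ and σ̂^2 = (1/n) Σ Xi^2 – X̄^2. The distribution parameters and sample size are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=10.0, scale=3.0, size=1000)  # illustrative N(10, 9) sample

    m1 = x.mean()              # first sample moment
    m2 = (x ** 2).mean()       # second sample moment
    mu_hat = m1                # from E(X) = mu
    sigma2_hat = m2 - m1 ** 2  # from E(X^2) = mu^2 + sigma^2
    print(mu_hat, sigma2_hat)  # should be near 10 and 9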

Example 12
Let X1, X2, . . . , Xn represent a random sample of service times of n customers at a certain facility, where the underlying distribution is assumed exponential with parameter λ. Since there is only one parameter to be estimated, the estimator is obtained by equating E(X) to X̄. Since E(X) = 1/λ for an exponential distribution, this gives 1/λ = X̄, or λ = 1/X̄. The moment estimator of λ is then λ̂ = 1/X̄.
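For instance (a hypothetical illustration, not from the slides), given a handful of observed service times the estimate is just the reciprocal of the sample mean:

    x = [2.0, 1.5, 3.0, 0.5, 1.0]  # hypothetical service times
    lam_hat = len(x) / sum(x)       # moment estimator: 1 / x-bar = n / sum(xi)
    print(lam_hat)                  # 5 / 8.0 = 0.625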

Maximum Likelihood Estimation

The method of maximum likelihood was first introduced by R. A. Fisher, a geneticist and statistician, in the 1920s. Most statisticians recommend this method, at least when the sample size is large, since the resulting estimators have certain desirable efficiency properties.

Example 15
A sample of ten new bike helmets manufactured by a certain company is obtained. Upon testing, it is found that the first, third, and tenth helmets are flawed, whereas the others are not. Let p = P(flawed helmet), i.e., p is the proportion of all such helmets that are flawed. Define (Bernoulli) random variables X1, X2, . . . , X10 by

Xi = 1 if the ith helmet is flawed, and Xi = 0 otherwise (i = 1, . . . , 10).

Then for the obtained sample, X1 = X3 = X10 = 1 and the other seven Xi's are all zero.

The probability mass function of any particular Xi is p^xi (1 – p)^(1 – xi), which becomes p if xi = 1 and 1 – p when xi = 0.

Now suppose that the conditions of the various helmets are independent of one another. This implies that the Xi's are independent, so their joint probability mass function is the product of the individual pmf's.

Thus the joint pmf evaluated at the observed xi's is

f(x1, . . . , x10; p) = p(1 – p)p . . . p = p^3 (1 – p)^7          (6.4)

Suppose that p = .25. Then the probability of observing the sample that we actually obtained is (.25)^3 (.75)^7 = .002086. If instead p = .50, then this probability is (.50)^3 (.50)^7 = .000977. For what value of p is the obtained sample most likely to have occurred? That is, for what value of p is the joint pmf (6.4) as large as it can be? What value of p maximizes (6.4)?
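The two probabilities above, and the shape of the likelihood as a whole, are easy to reproduce numerically; the following sketch (an addition, not part of the slides) evaluates (6.4) on a grid of p values:

    import numpy as np

    def likelihood(p):
        # joint pmf (6.4): three flawed helmets and seven unflawed ones
        return p**3 * (1 - p)**7

    print(likelihood(0.25))  # about .002086
    print(likelihood(0.50))  # about .000977

    grid = np.linspace(0.01, 0.99, 99)
    print(grid[np.argmax(likelihood(grid))])  # peak at p = 0.30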

Figure 6.5(a) shows a graph of the likelihood (6.4) as a function of p. It appears that the graph reaches its peak above p = .3, the proportion of flawed helmets in the sample.

[Figure 6.5(a): Graph of the likelihood (joint pmf) (6.4) from Example 15]

Figure 6.5(b) shows a graph of the natural logarithm of (6.4); since ln[g(u)] is a strictly increasing function of g(u), finding u to maximize the function g(u) is the same as finding u to maximize ln[g(u)].

[Figure 6.5(b): Graph of the natural logarithm of the likelihood]

We can verify our visual impression by using calculus to find the value of p that maximizes (6.4). Working with the natural log of the joint pmf is often easier than working with the joint pmf itself, since the joint pmf is typically a product and so its logarithm will be a sum. Here

ln[f(x1, . . . , x10; p)] = ln[p^3 (1 – p)^7] = 3 ln(p) + 7 ln(1 – p)          (6.5)

Thus

d/dp {ln[f(x1, . . . , x10; p)]} = d/dp [3 ln(p) + 7 ln(1 – p)] = 3/p + (7/(1 – p))(–1)

[the (–1) comes from the chain rule in calculus].

Equating this derivative to 0 and solving for p gives 3(1 – p) = 7p, from which 3 = 10p and so p = 3/10 = .30, as conjectured. That is, our point estimate is p̂ = .30. It is called the maximum likelihood estimate because it is the parameter value that maximizes the likelihood (joint pmf) of the observed sample. In general, the second derivative should be examined to make sure a maximum has been obtained, but here this is obvious from Figure 6.5.
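As a cross-check (an added sketch assuming SciPy is available; none of this is from the slides), a numerical optimizer applied to the negative of (6.5) lands on the same value:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def neg_log_lik(p):
        # negative of the log-likelihood (6.5)
        return -(3 * np.log(p) + 7 * np.log(1 - p))

    res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    print(res.x)  # approximately 0.30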

Suppose that rather than being told the condition of every helmet, we had only been informed that three of the ten were flawed. Then we would have the observed value of a binomial random variable X = the number of flawed helmets. The pmf of X is

f(x; p) = C(10, x) p^x (1 – p)^(10 – x)

For x = 3, this becomes

f(3; p) = C(10, 3) p^3 (1 – p)^7

The binomial coefficient C(10, 3) is irrelevant to the maximization, so again p̂ = .30.

Maximum Likelihood Estimation

The likelihood function tells us how likely the observed sample is as a function of the possible parameter values. Maximizing the likelihood gives the parameter values for which the observed sample is most likely to have been generated, that is, the parameter values that “agree most closely” with the observed data.

Example 16
Suppose X1, X2, . . . , Xn is a random sample from an exponential distribution with parameter λ. Because of independence, the likelihood function is a product of the individual pdf's:

f(x1, . . . , xn; λ) = (λe^(–λx1)) · · · (λe^(–λxn)) = λ^n e^(–λ Σxi)

The natural logarithm of the likelihood function is

ln[f(x1, . . . , xn; λ)] = n ln(λ) – λ Σxi

Equating (d/dλ)[ln(likelihood)] to zero results in n/λ – Σxi = 0, or λ = n/Σxi = 1/x̄. Thus the mle is λ̂ = 1/X̄; it is identical to the method of moments estimator [but it is not an unbiased estimator, since E(1/X̄) ≠ 1/E(X̄)].
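A quick simulation (an addition with illustrative values, not from the slides) makes the bias visible: for small n, the average of λ̂ = 1/X̄ over many samples noticeably exceeds λ. In fact E(λ̂) = nλ/(n – 1) for this model.

    import numpy as np

    rng = np.random.default_rng(1)
    lam, n, reps = 2.0, 5, 100_000  # illustrative rate, sample size, replications

    samples = rng.exponential(scale=1 / lam, size=(reps, n))
    lam_hats = 1 / samples.mean(axis=1)  # mle for each simulated sample
    print(lam_hats.mean())  # about n * lam / (n - 1) = 2.5, not 2.0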

Example 17
Let X1, . . . , Xn be a random sample from a normal distribution. The likelihood function is

f(x1, . . . , xn; µ, σ^2) = Π(i = 1 to n) (2πσ^2)^(–1/2) e^(–(xi – µ)^2/(2σ^2)) = (2πσ^2)^(–n/2) e^(–Σ(xi – µ)^2/(2σ^2))

so

ln[f(x1, . . . , xn; µ, σ^2)] = –(n/2) ln(2πσ^2) – (1/(2σ^2)) Σ(xi – µ)^2

To find the maximizing values of µ and σ^2, we must take the partial derivatives of ln(f) with respect to µ and σ^2, equate them to zero, and solve the resulting two equations. Omitting the details, the resulting mle's are

µ̂ = X̄          σ̂^2 = Σ(Xi – X̄)^2 / n

The mle of σ^2 is not the unbiased estimator S^2 (which divides by n – 1 rather than n), so two different principles of estimation (unbiasedness and maximum likelihood) yield two different estimators.
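The following sketch (an addition; the data are simulated under assumed parameter values) computes both estimates side by side:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=10.0, scale=3.0, size=50)  # illustrative N(10, 9) sample

    mu_hat = x.mean()                        # mle of mu: the sample mean
    sigma2_hat = ((x - mu_hat) ** 2).mean()  # mle of sigma^2: divide by n
    s2 = x.var(ddof=1)                       # unbiased S^2: divide by n - 1
    print(mu_hat, sigma2_hat, s2)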

Estimating Functions of Parameters

In Example 17, we obtained the mle of σ^2 when the underlying distribution is normal. The mle of σ = √(σ^2), as well as many other mle's, can be easily derived using the following proposition.

Proposition (The Invariance Principle)
Let θ̂1, θ̂2, . . . , θ̂m be the mle's of the parameters θ1, θ2, . . . , θm. Then the mle of any function h(θ1, θ2, . . . , θm) of these parameters is the function h(θ̂1, θ̂2, . . . , θ̂m) of the mle's.

Example 20 (Example 17 continued)
In the normal case, the mle's of µ and σ^2 are µ̂ = X̄ and σ̂^2 = Σ(Xi – X̄)^2/n. To obtain the mle of the function h(µ, σ^2) = √(σ^2) = σ, substitute the mle's into the function:

σ̂ = √(σ̂^2) = [(1/n) Σ(Xi – X̄)^2]^(1/2)

The mle of σ is not the sample standard deviation S, though they are close unless n is quite small.
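Numerically (an added sketch with simulated data), the invariance principle just means taking the square root of the σ^2 mle, which differs slightly from S:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(loc=10.0, scale=3.0, size=20)  # illustrative sample

    sigma_hat = np.sqrt(((x - x.mean()) ** 2).mean())  # mle of sigma, by invariance
    s = x.std(ddof=1)                                  # sample standard deviation S
    print(sigma_hat, s)  # close, with sigma_hat slightly smaller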

Large Sample Behavior of the MLE

Although the principle of maximum likelihood estimation has considerable intuitive appeal, the following proposition provides additional rationale for the use of mle's.

Proposition
Under very general conditions on the joint distribution of the sample, when the sample size n is large, the maximum likelihood estimator of any parameter θ is approximately unbiased and has variance that is either as small as or nearly as small as can be achieved by any estimator. Stated another way, the mle is approximately the MVUE of θ.
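To see the proposition at work (an added simulation with illustrative values, not from the slides), the exponential mle λ̂ = 1/X̄ from Example 16 is biased for small n, yet its average approaches λ and its variance approaches λ^2/n, the smallest variance achievable by an unbiased estimator in this model, as n grows:

    import numpy as np

    rng = np.random.default_rng(4)
    lam, reps = 2.0, 10_000  # illustrative true rate and replication count

    for n in (5, 50, 500):
        samples = rng.exponential(scale=1 / lam, size=(reps, n))
        lam_hats = 1 / samples.mean(axis=1)
        # mean approaches lam; variance approaches lam**2 / n
        print(n, lam_hats.mean(), lam_hats.var(), lam**2 / n)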

Because of this result and the fact that calculus-based techniques can usually be used to derive the mle's (though often numerical methods, such as Newton's method, are necessary), maximum likelihood estimation is the most widely used estimation technique among statisticians. Many of the estimators used in the remainder of the book are mle's. Obtaining an mle, however, does require that the underlying distribution be specified.