people.math.umass.edu/~daeyoung/Stat516/Chapter10.pdf

[Chapter 10. Hypothesis Testing]

10.1 Introduction

10.2 Elements of a Statistical Test

10.3 Common Large-Sample Tests

10.4 Calculating Type II Error Probabilities and Finding the Sample Size for Z Tests

10.5 Relationship between Hypothesis-Testing Procedures and Confidence Intervals

10.6 Another Way to Report the Results of a Statistical Test

10.7 Some Comments on the Theory of Hypothesis Testing

10.8 Small-Sample Hypothesis Testing for µ and µ1 − µ2

10.9 Testing Hypotheses concerning Variances

10.10 Power of Tests and the Neyman-Pearson Lemma

10.11 Likelihood Ratio Tests

10.1 Introduction

• Purpose of statistics: estimation and testing

- make inferences about the population by using the information contained in a sample taken from the population of interest

- (point/interval) estimators (or estimates) of unknown target parameters characterizing the population

- tests of hypotheses about the values of those parameters

• The formal procedure for hypothesis testing is similar to the scientific method:

i) one poses a hypothesis concerning one or more population parameters

ii) one samples the population and compares the observations with the hypothesis

iii) if the observations disagree with the hypothesis, one rejects it. Otherwise, one concludes that the sample did not provide sufficient evidence to reject the hypothesis.

• Testing a hypothesis requires making a decision when comparing the observed sample with theory

• Questions to be answered in statistical hypothesis testing:

- How do we decide whether the sample disagrees with the scientist's hypothesis?

- When should we reject the hypothesis, not reject it, or withhold judgement?

- What is the probability that we will make the wrong decision?

- What function of the sample measurements should be used to make the decision?

10.2 Elements of a Statistical Test

(Insomnia Example)

An experimenter has prepared a drug dosage level that she claims will induce sleep for 80% of people suffering from insomnia. After examining the dosage, we feel that her claims regarding the effectiveness of the dosage level are inflated. In an attempt to prove that her claim is false, we administer her prescribed dosage to 20 insomniacs, and we observe Y, the number for whom the drug dose induces sleep. If Y is less than or equal to 12, we will conclude that her claims are inflated.

• Elements of a Statistical Test

1. Hypotheses: the null hypothesis, H0, and the alternative hypothesis, Ha

2. Test statistic and an associated Rejection Region (RR)

3. Type I error and Type II error

1. Hypotheses: the null hypothesis, H0, and the alternative hypothesis, Ha

- Alternative hypothesis, Ha: a research hypothesis about the parameter(s) that we wish to support on the basis of the information contained in the sample

- Null hypothesis, H0: the converse of Ha and the hypothesis to be tested

- Support for one hypothesis is obtained by showing lack of support for its converse

(Insomnia Example) We seek support for the hypothesis that her claims are inflated, so we want to test her claims:

H0 : p = .8 vs. Ha : p < .8

where p = P(the drug dosage level induces sleep for a person suffering from insomnia).

(Q) How do we use the observed data to decide between H0 and Ha?

2. Test statistic and an associated Rejection Region (RR)

- test statistic (like an estimator): a function of the sample measurements on which the statistical decision will be based

- rejection region (RR): specifies the values of the test statistic for which H0 is to be rejected in favor of Ha

- rule: compute the value of the test statistic for a particular sample. Then

i) if the computed value of the test statistic falls in the RR, reject H0 and accept Ha

ii) if the value of the test statistic does not fall in the RR, do not reject H0

(Insomnia Example)

- test statistic: Y = the number among the 20 insomniacs for whom the drug dose induces sleep (small values of Y are contradictory to H0, but favorable to Ha)

- RR: RR = {y : y ≤ 12}

(Q) How do we find a good RR?

3. Type I error and Type II error

(Insomnia Example) What value should we choose for k in RR = {y : y ≤ k}?

- We seek some objective criteria for deciding on a good RR: for any fixed RR determined by a particular value of k, there are two possible erroneous decisions resulting from a test.

(Def 10.1)

i) A type I error is made if H0 is rejected when H0 is true: α = P(Type I error) (the level of the test, or significance level)

ii) A type II error is made if H0 is accepted when Ha is true: β = P(Type II error)

                          True State of H0
Statistical Decision      H0 is True       H0 is False
Reject H0                 Type I error     Correct
Do not Reject H0          Correct          Type II error

- α and β, the probabilities of making the two types of errors, measure the risks associated with the two possible erroneous decisions, and provide a practical way to measure the goodness of a test

(Insomnia Example)

i) making a type I error: her claims are true, but we conclude that the drug dosage level will not induce sleep for 80% of people suffering from insomnia (reject H0 : p = .8 and accept Ha : p < .8 when H0 : p = .8 is true).

ii) making a type II error: we conclude that the drug dosage level will induce sleep for 80% of people suffering from insomnia when, in fact, her claims are not true (accept H0 : p = .8 when Ha : p < .8 is true).

(Insomnia Example)

- find α

- find β when p = .6

- find β when p = .4
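These probabilities can be checked numerically from the Binomial(20, p) distribution. A minimal sketch using only the standard library (the helper binom_cdf is written here for illustration, not a library function):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

n, k = 20, 12  # 20 insomniacs; reject H0 if Y <= 12

# alpha = P(Y <= 12 | p = .8): reject H0 although H0 is true
alpha = binom_cdf(k, n, 0.8)

# beta = P(Y > 12 | p = pa): fail to reject H0 although Ha is true
beta_06 = 1 - binom_cdf(k, n, 0.6)
beta_04 = 1 - binom_cdf(k, n, 0.4)

print(round(alpha, 4), round(beta_06, 4), round(beta_04, 4))
```

α comes out near .03, while β is large (around .4) when p = .6 and small when p = .4 — alternatives far from p = .8 are easier to detect.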

[Note] We would like to have a test using RR = {y ≤ k} that guarantees a low risk of making a type I error and offers adequate protection against a type II error.

How? Examine the behavior of α and β as the RR changes.

Suppose we enlarge RR into a new rejection region RR* such that RR ⊂ RR*. Then

i) α* = P(test statistic is in RR* when H0 is true) ≥ P(test statistic is in RR when H0 is true) = α

ii) β* = P(test statistic is not in RR* when Ha is true) ≤ P(test statistic is not in RR when Ha is true) = β

iii) shrinking (enlarging) the rejection region decreases (increases) α and results in an increase (decrease) in β

(Insomnia Example) We consider two more rejection regions, RR*1 = {y ≤ 9} and RR*2 = {y ≤ 15}. Find the following values for RR*1 and RR*2, respectively:

- find α

- find β when p = .6

- find β when p = .4

(Q) We found that α and β are inversely related. Can we reduce both α and β?

: We need to increase the sample size. For most statistical tests, β will decrease as the sample size increases if α is fixed at some acceptably small value.

(Reading HW) (Example 10.1-10.4)

10.3 Common Large-Sample Tests (Z Tests)

• Suppose we want to test a set of hypotheses concerning a parameter θ based on a random sample Y1, . . . , Yn. We develop hypothesis-testing procedures based on an estimator θ̂ that has an (approximately) normal sampling distribution with mean θ and standard error σθ̂, so that (θ̂ − θ)/σθ̂ ∼ N(0,1). Note that Ȳ, p̂, Ȳ1 − Ȳ2, and p̂1 − p̂2 in Table 8.1 are candidates for the large-sample estimator θ̂.

• The summary of the large-sample α-level hypothesis test is

     H0        Ha        Test statistic   Rejection region (RR)
i)   θ = θ0    θ > θ0    Z                z > zα (upper-tail RR)
ii)  θ = θ0    θ < θ0    Z                z < −zα (lower-tail RR)
iii) θ = θ0    θ ≠ θ0    Z                |z| > zα/2 (two-tailed RR)

where Z = (θ̂ − θ0)/σθ̂ under H0, zα satisfies α = P(Z > zα), and zα/2 satisfies α/2 = P(Z > zα/2).

• The summary of the large-sample α-level hypothesis test (continued)

i) Upper-tail test: test H0 : θ = θ0 vs. Ha : θ > θ0, where θ0 is a specific value of θ

- accept H0 if θ̂ is close to θ0 (i.e., (θ̂ − θ0)/σθ̂ is close to zero)

- favor rejection of H0 and acceptance of Ha if θ̂ exceeds θ0 (i.e., (θ̂ − θ0)/σθ̂ is larger than zero) by a suitable amount

- the test statistic is Z = (θ̂ − θ0)/σθ̂, which measures the number of standard errors between θ̂ and θ0

- RR = {θ̂ > k′} = {(θ̂ − θ0)/σθ̂ > (k′ − θ0)/σθ̂} = {z > k} for some choice of k = (k′ − θ0)/σθ̂

- How do we decide the k in the RR? Fix the type I error (α, the level of the test) using the fact that Z ∼ N(0,1) under H0: if we wish an α-level test (α = P(Type I error)), take k = zα where α = P(Z > zα). Then RR = {z > zα}.

(Example 10.5)

(Example 10.6)
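The upper-tail procedure can be sketched in code. A minimal example with hypothetical data and a hypothetical null value µ0 = 50; the estimator is the sample mean with estimated standard error, valid for large n by the CLT:

```python
from statistics import NormalDist, mean, stdev

def z_test_upper(sample, theta0, alpha=0.05):
    """Large-sample upper-tail Z test of H0: theta = theta0 vs. Ha: theta > theta0.

    Uses theta_hat = sample mean and estimated standard error s/sqrt(n)."""
    n = len(sample)
    theta_hat = mean(sample)
    se = stdev(sample) / n**0.5
    z = (theta_hat - theta0) / se
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # alpha = P(Z > z_alpha)
    return z, z > z_alpha  # (test statistic, reject H0?)

# hypothetical data: n = 36 observations, testing H0: mu = 50 vs. Ha: mu > 50
data = [52.1, 49.8, 55.0, 51.2, 48.9, 53.4] * 6
z, reject = z_test_upper(data, 50.0)
```

With these numbers z is well above z.05 ≈ 1.645, so H0 is rejected.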

• The summary of the large-sample α-level hypothesis test (continued)

iii) Two-tailed test: test H0 : θ = θ0 vs. Ha : θ ≠ θ0, where θ0 is a specific value of θ

- accept H0 if θ̂ is close to θ0 (i.e., (θ̂ − θ0)/σθ̂ is close to zero)

- favor rejection of H0 and acceptance of Ha if θ̂ > θ0 or θ̂ < θ0 (i.e., (θ̂ − θ0)/σθ̂ is either larger or smaller than zero) by a suitable amount

- the test statistic is Z = (θ̂ − θ0)/σθ̂

- RR = {θ̂ > k′1 or θ̂ < k′2} = {z > k1 or z < −k2} for some choice of k1 > 0 and k2 > 0

- How do we decide k1 and k2? Fix the type I error (α, the level of the test) using the fact that Z ∼ N(0,1) under H0 and its symmetry: if we wish an α-level test (α = P(Type I error)), take k1 = k2 = zα/2 where α = P(|Z| > zα/2). Then RR = {|z| > zα/2}.

(Example 10.7)

10.4 Calculating β and Finding n for Large-Sample Tests (Z Tests)

• For H0 : θ = θ0 vs. Ha : θ > θ0, we can calculate the probability β of a type II error only for specific values of θ in Ha.

- For Ha : θ = θa (> θ0) and RR = {θ̂ > k},

β = P(H0 is accepted when Ha is true)
  = P(θ̂ is not in RR when Ha is true)
  = P(θ̂ ≤ k when θ = θa)
  = P((θ̂ − θa)/σθ̂ ≤ (k − θa)/σθ̂ when θ = θa)

where (θ̂ − θa)/σθ̂ ∼ N(0,1) for θa, the true value of θ.

• For a fixed n, the size of β is a function of θa and θ0:

- if θa is close to θ0, the true value of θ (θ0 or θa) is difficult to detect, so β tends to be large

- if θa is far from θ0, the true value of θ (θ0 or θa) is relatively easy to detect, so β tends to be small

• Graphical depiction of the relation between α and β

(Example 10.8)

• We can reduce the value of β by increasing n (see Example 10.8)

• Given specified values of α and β, suppose we want to test either "H0 : µ = µ0 vs. Ha : µ = µa < µ0" or "H0 : µ = µ0 vs. Ha : µ = µa > µ0". Then the sample-size formula for a one-tailed α-level test is

n = (zα + zβ)² σ² / (µa − µ0)²

(Proof)

(Example 10.9)
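The β calculation and the sample-size formula can be sketched numerically; NormalDist supplies Φ and its inverse. The numbers used here (µ0 = 50, µa = 52, σ = 5) are hypothetical:

```python
from math import ceil
from statistics import NormalDist

Phi = NormalDist().cdf
Phi_inv = NormalDist().inv_cdf

def beta_upper(mu0, mua, sigma, n, alpha=0.05):
    """beta = P(Ybar <= k | mu = mua) for the upper-tail Z test,
    where k = mu0 + z_alpha * sigma / sqrt(n)."""
    se = sigma / n**0.5
    k = mu0 + Phi_inv(1 - alpha) * se
    return Phi((k - mua) / se)

def sample_size(mu0, mua, sigma, alpha=0.05, beta=0.10):
    """n = (z_alpha + z_beta)^2 sigma^2 / (mua - mu0)^2 for a one-tailed test."""
    z_a, z_b = Phi_inv(1 - alpha), Phi_inv(1 - beta)
    return ceil((z_a + z_b)**2 * sigma**2 / (mua - mu0)**2)

n = sample_size(50, 52, 5)    # smallest n with alpha = .05 and beta <= .10
b = beta_upper(50, 52, 5, n)  # achieved beta at mu = 52
```

Increasing n beyond this value drives β lower still, matching the remark above about reducing β by increasing n.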

10.5 Relationship between Hypothesis-Testing Procedures and Confidence Intervals

Given (θ̂ − θ)/σθ̂ ∼ N(0,1) and α/2 = P(Z > zα/2),

1. Large-sample confidence intervals (Sec 8.6): a two-sided confidence interval for θ with confidence coefficient 1 − α is θ̂ ± zα/2 σθ̂.

2. Large-sample test (Sec 10.3): a two-sided α-level test of H0 : θ = θ0 vs. Ha : θ ≠ θ0 uses the Z test based on Z = (θ̂ − θ0)/σθ̂ and rejects H0 if z falls in the RR = {|z| > zα/2}.

Duality between [1] and [2]

- The complement of the RR associated with any test is the "acceptance region". For any of the large-sample two-tailed α-level tests, the acceptance region is

Acceptance region ≡ {−zα/2 ≤ z ≤ zα/2}
                  = {−zα/2 ≤ (θ̂ − θ0)/σθ̂ ≤ zα/2}
                  = {θ̂ − zα/2 σθ̂ ≤ θ0 ≤ θ̂ + zα/2 σθ̂}

- Do not reject H0 : θ = θ0 in favor of Ha : θ ≠ θ0 if the value of θ0 lies inside the 100(1 − α)% confidence interval for θ, [θ̂ − zα/2 σθ̂, θ̂ + zα/2 σθ̂].

- [θ̂ − zα/2 σθ̂, θ̂ + zα/2 σθ̂] is the set of all values of θ0 for which H0 : θ = θ0 is "acceptable" at level α. Any value inside the confidence interval is an acceptable value of θ.

- Since many values of θ in this confidence interval are acceptable, we usually do not accept a single θ value as being the true value.

Duality between one-sided tests and one-sided confidence bounds

- A large-sample α-level test for H0 : θ = θ0 vs. Ha : θ > θ0 rejects H0 if (θ̂ − θ0)/σθ̂ > zα
⇔ do not reject H0 if θ0 ≥ θ̂ − zα σθ̂ (the 100(1 − α)% lower confidence bound for θ).

- A large-sample α-level test for H0 : θ = θ0 vs. Ha : θ < θ0 rejects H0 if (θ̂ − θ0)/σθ̂ < −zα
⇔ do not reject H0 if θ0 ≤ θ̂ + zα σθ̂ (the 100(1 − α)% upper confidence bound for θ).
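The two-sided duality can be verified numerically: the Z test fails to reject exactly when θ0 lies inside the confidence interval. A small sketch with hypothetical summary numbers (θ̂ = 10.4, σθ̂ = 0.5):

```python
from statistics import NormalDist

z_crit = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for alpha = .05

# hypothetical large-sample summary: estimate and its standard error
theta_hat, se = 10.4, 0.5

def reject_by_test(theta0):
    """Two-tailed Z test decision: reject H0 iff |z| > z_{alpha/2}."""
    return abs((theta_hat - theta0) / se) > z_crit

def outside_ci(theta0):
    """True iff theta0 falls outside the 95% confidence interval."""
    lo, hi = theta_hat - z_crit * se, theta_hat + z_crit * se
    return theta0 < lo or theta0 > hi

# the two decisions agree for every theta0 tried
for theta0 in [9.0, 9.5, 10.0, 10.4, 11.0, 11.5, 12.0]:
    assert reject_by_test(theta0) == outside_ci(theta0)
```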

10.6 Another Way to Report the Results of a Statistical Test: P-values

• α = P(Type I error): the level of the test, or significance level

- a small value of α is recommended, but the choice of α in data analysis is somewhat arbitrary

- it is possible that H0 is rejected at α = 0.05, but not at α = 0.01

• P-value (associated with a test statistic)

(Def 10.2) If W is a test statistic, the p-value is the smallest value of α for which the observed data indicate that H0 should be rejected.

- P-value calculation for a one-tailed test

: If one rejects H0 in favor of Ha for small values of W (i.e., RR = {w ≤ k}), then the P-value associated with an observed value w0 of W is P-value = P(W ≤ w0 when H0 is true)

: If one rejects H0 in favor of Ha for large values of W (i.e., RR = {w ≥ k}), then the P-value associated with an observed value w0 of W is P-value = P(W ≥ w0 when H0 is true)

- P-value calculation for a two-tailed test

: If one rejects H0 in favor of Ha for either small or large values of W (i.e., RR = {|w| ≥ k}), then the P-value associated with an observed value w0 of W is P-value = P(|W| ≥ |w0| when H0 is true)

- The smaller the P-value becomes, the more compelling is the evidence that H0 should be rejected

- If researchers have a value of α in mind, the P-value can be used to implement an α-level test: reject H0 when P-value ≤ α

- The P-value allows us to evaluate the extent to which the observed data disagree with H0

(Example 10.10) (see Example 10.1)

(Example 10.11) (see Example 10.7)

(Exercise 10.51)

(Exercise 10.55)
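The one- and two-tailed P-value formulas above, specialized to a large-sample Z statistic, can be sketched as follows (the observed value z0 = −2.1 is hypothetical):

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def p_value_z(z0, tail):
    """P-value for an observed large-sample Z statistic z0.

    tail: 'lower' -> P(Z <= z0)     (RR of the form {z <= k})
          'upper' -> P(Z >= z0)     (RR of the form {z >= k})
          'two'   -> P(|Z| >= |z0|) (two-tailed RR)
    """
    if tail == "lower":
        return Phi(z0)
    if tail == "upper":
        return 1 - Phi(z0)
    return 2 * (1 - Phi(abs(z0)))  # two-tailed, by symmetry of N(0,1)

p = p_value_z(-2.1, "two")  # between .01 and .05: reject at alpha = .05, not at .01
```

This observed value illustrates the earlier remark: H0 would be rejected at α = 0.05 but not at α = 0.01.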

10.7 Some Comments on the Theory of Hypothesis Testing

• The choice between a one-tailed and a two-tailed test depends on the alternative value of θ that the researcher would like to detect

• A statistical test is a procedure for rejecting or accepting H0, with measured errors α and β

- We need to know how to select the test with the smallest possible value of β when α is fixed by the researcher

• But this framework might not be adequate for all practical situations:

- we should be cautious about drawing conclusions when there is not sufficient evidence to reject H0

- β: the selection of a practically meaningful value for θ might not be easy, and the calculation of β might be tedious

- If the calculated value of β is small and the value of the test statistic is not in the RR, it is reasonable to say that H0 is accepted

- If the value of β cannot be calculated and the value of the test statistic is not in the RR, we "fail to reject" rather than "accept" H0, and report the P-value associated with the test

- If H0 is rejected for a small P-value, this does not imply that H0 is "wrong by a large amount"; it means that H0 can be rejected based on a test procedure with a small α

- "Statistical" significance should not be equated with "practical" significance. To assess practical significance, construction of a confidence interval is recommended.

• Is it possible to use the null hypothesis in the form "H0 : θ ≥ θ0 vs. Ha : θ < θ0" instead of "H0 : θ = θ0 vs. Ha : θ < θ0"?

- the alternative hypothesis is our primary concern

- calculation of the level α of the test is relatively simple for H0 : θ = θ0

: for H0 : θ = θ0, α = P(test statistic is in RR when θ = θ0)

: for H0 : θ ≥ θ0, the definition of α needs to be changed to α = max over θ ≥ θ0 of P(test statistic is in RR)

- using H0 : θ = θ0 still leads to the correct testing procedure and the correct calculation of α

: the maximum over θ ≥ θ0 of P(test statistic is in RR) typically occurs when θ = θ0

10.8 Small-Sample Hypothesis Testing for µ and µ1 − µ2

• Large-sample hypothesis testing needs large n so that (θ̂ − θ0)/σθ̂ ∼ N(0,1) (approximately)

• A small-sample test for µ (see [Case 1] in Section 8.8)

Assumptions: Y1, . . . , Yn ∼ N(µ, σ²) with unknown µ and σ²

     H0        Ha        Test statistic   Rejection region (RR)
i)   µ = µ0    µ > µ0    T                t > tα
ii)  µ = µ0    µ < µ0    T                t < −tα
iii) µ = µ0    µ ≠ µ0    T                |t| > tα/2

where T = (Ȳ − µ0)/(S/√n) ∼ t(n − 1) under H0 and tα satisfies α = P(T > tα) (see Table 5 in Appendix 3)

(Example 10.12 and 10.13)
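A sketch of the one-sample t test with hypothetical data. The standard library has no t distribution, so the critical value for n − 1 = 7 df is taken from the t table (Table 5):

```python
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """T = (Ybar - mu0) / (S / sqrt(n)), which has a t(n-1)
    distribution under H0 when the Yi are normal."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / n**0.5)

# hypothetical data: n = 8, test H0: mu = 10 vs. Ha: mu > 10 at alpha = .05
y = [10.8, 11.2, 9.9, 10.5, 11.0, 10.3, 10.9, 11.4]
t = t_statistic(y, 10.0)
t_crit = 1.895  # t_{.05} with 7 df, from Table 5 in Appendix 3
reject = t > t_crit
```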

• A small-sample test for µ1 − µ2 (see [Case 2] in Section 8.8)

Assumptions: independent random samples from each of two normal populations:
Y11, . . . , Y1n1 ∼ N(µ1, σ1²) and Y21, . . . , Y2n2 ∼ N(µ2, σ2²) with unknown µ1, µ2, and common variance σ² = σ1² = σ2²

     H0               Ha               Test statistic   Rejection region (RR)
i)   µ1 − µ2 = D0     µ1 − µ2 > D0     T                t > tα
ii)  µ1 − µ2 = D0     µ1 − µ2 < D0     T                t < −tα
iii) µ1 − µ2 = D0     µ1 − µ2 ≠ D0     T                |t| > tα/2

where T = (Ȳ1 − Ȳ2 − D0)/(Sp √(1/n1 + 1/n2)) ∼ t(n1 + n2 − 2) under H0, Sp is the pooled sample standard deviation, and tα satisfies α = P(T > tα) (see Table 5 in Appendix 3)

(Example 10.14 and 10.15)
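The pooled two-sample statistic can be sketched the same way; the samples below are hypothetical, and the critical value for n1 + n2 − 2 = 10 df comes from the t table:

```python
from statistics import mean, stdev

def pooled_t(sample1, sample2, d0=0.0):
    """T = (Ybar1 - Ybar2 - D0) / (Sp * sqrt(1/n1 + 1/n2)), where
    Sp^2 = ((n1-1)S1^2 + (n2-1)S2^2) / (n1 + n2 - 2); t(n1+n2-2) under H0."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)
    sp = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(sample1) - mean(sample2) - d0) / (sp * (1/n1 + 1/n2) ** 0.5)

# hypothetical samples; H0: mu1 - mu2 = 0 vs. Ha: mu1 - mu2 != 0 at alpha = .05,
# df = 6 + 6 - 2 = 10, so reject when |t| > t_{.025} = 2.228 (Table 5)
y1 = [24.1, 25.3, 23.8, 24.9, 25.0, 24.4]
y2 = [22.9, 23.5, 23.1, 22.4, 23.8, 23.0]
t = pooled_t(y1, y2)
```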

[Note]

- It is not easy to verify the assumptions mentioned above in most cases

- The t-tests for µ and for µ1 − µ2 are robust relative to the assumption of normality

- The t-test for µ1 − µ2 is also robust relative to the assumption that σ1² = σ2² when n1 = n2

- The duality between tests and confidence intervals considered in Section 10.5 still holds for the tests based on the t distribution

10.9 Testing Hypotheses concerning Variances

• Test of hypotheses concerning σ² (see Section 8.9)

Assumptions: Y1, . . . , Yn ∼ N(µ, σ²) with unknown µ and σ²

     H0         Ha         Test statistic   Rejection region (RR)
i)   σ² = σ0²   σ² > σ0²   χ²               χ² > χ²α
ii)  σ² = σ0²   σ² < σ0²   χ²               χ² < χ²(1−α)
iii) σ² = σ0²   σ² ≠ σ0²   χ²               χ² > χ²(α/2) or χ² < χ²(1−α/2)

where χ² = (n − 1)S²/σ0² ∼ χ²(n − 1) under H0 and χ²α satisfies α = P(χ² > χ²α) (see Figure 10.10 and Table 6 in Appendix 3)

(Example 10.16 and 10.17)
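A sketch of the variance test with hypothetical data; the critical value for n − 1 = 9 df is taken from the χ² table (Table 6):

```python
from statistics import variance

def chisq_statistic(sample, sigma0_sq):
    """chi2 = (n - 1) S^2 / sigma0^2, distributed chi2(n-1) under H0
    when the sample comes from a normal population."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma0_sq

# hypothetical data: n = 10; H0: sigma^2 = 0.25 vs. Ha: sigma^2 > 0.25
y = [4.1, 5.2, 3.6, 4.9, 5.8, 3.9, 4.4, 5.5, 3.3, 5.0]
chi2 = chisq_statistic(y, 0.25)
chi2_crit = 16.919  # chi^2_{.05} with 9 df, from Table 6 in Appendix 3
reject = chi2 > chi2_crit
```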

• Test of the hypothesis σ1² = σ2² (see Def 7.3 in pp. 362-363)

Assumptions:

: independent random samples from each of two normal populations

: Y11, . . . , Y1n1 ∼ N(µ1, σ1²) and Y21, . . . , Y2n2 ∼ N(µ2, σ2²) with unknown µ1, µ2, σ1², and σ2²

H0          Ha          Test statistic   Rejection region (RR)
σ1² = σ2²   σ1² > σ2²   F                F > Fα(n1 − 1, n2 − 1)
σ1² = σ2²   σ1² ≠ σ2²   F                F > Fα/2(n1 − 1, n2 − 1) or F < 1/Fα/2(n2 − 1, n1 − 1)

where F = S1²/S2² ∼ F(n1 − 1, n2 − 1) under H0 and Fα satisfies α = P(F > Fα(n1 − 1, n2 − 1)) (see Table 7 in Appendix 3)

(Example 10.19, 10.20, 10.21)

[Note]

Regarding the test of the hypothesis σ1² = σ2²:

- The sample variances S1² and S2² are independent, and they are estimators of σ1² and σ2².

- Theorem 7.3: (n1 − 1)S1²/σ1² ∼ χ²(n1 − 1) and (n2 − 1)S2²/σ2² ∼ χ²(n2 − 1).

- Def 7.3: F = {[(n1 − 1)S1²/σ1²]/(n1 − 1)} / {[(n2 − 1)S2²/σ2²]/(n2 − 1)} = (S1²/σ1²)/(S2²/σ2²) ∼ F(n1 − 1, n2 − 1)

- How do we determine the lower-tail critical value for σ1² ≠ σ2²?

: If F = S1²/S2² ∼ F(n1 − 1, n2 − 1), then 1/F = S2²/S1² ∼ F(n2 − 1, n1 − 1). Then α/2 = P(F > Fα/2(n1 − 1, n2 − 1)) = P(F < 1/Fα/2(n2 − 1, n1 − 1))

- Both the χ² and F tests presented in this section are not robust: they are very sensitive to departures from the assumption of normality of the underlying population(s)
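A sketch of the F test with hypothetical samples; the critical value F.05(6, 6) is taken from the F table (Table 7), and the lower-tail cutoff uses the reciprocal relationship derived above:

```python
from statistics import variance

def f_statistic(sample1, sample2):
    """F = S1^2 / S2^2, distributed F(n1-1, n2-1) under H0: sigma1^2 = sigma2^2."""
    return variance(sample1) / variance(sample2)

# hypothetical samples with n1 = n2 = 7; two-tailed test at alpha = .10:
# reject if F > F_{.05}(6, 6) or F < 1/F_{.05}(6, 6)
y1 = [3.1, 2.8, 3.5, 2.6, 3.9, 2.4, 3.2]
y2 = [3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.1]
F = f_statistic(y1, y2)
f_crit = 4.28  # F_{.05}(6, 6) from Table 7 in Appendix 3
reject = F > f_crit or F < 1 / f_crit
```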

10.10 Power of Tests and the Neyman-Pearson Lemma

• We have learned specific tests for a number of practical hypothesis-testing situations.

- Why did we choose those particular tests?

- How do we decide on those test statistics?

- How do we know whether the associated rejection regions are best?

• Goodness of a test

- the predetermined α = P(Type I error) decides the location of the RR

- β = P(Type II error) is related to the power of the test

(Def 10.3) Suppose that W is the test statistic and RR is the rejection region for a test of a hypothesis involving the value of θ. Then the power of the test, power(θ), is power(θ) = P(W in RR at θ) = P(reject H0 at θ).

• Power at θ, power(θ)

- measures the test's ability to detect that H0 is false

- Suppose we want to test H0 : θ = θ0 vs. Ha : θ = θa (θa is a value of θ under Ha). Then

i) power(θ0) = P(reject H0 under H0) = α

ii) power(θa) = P(reject H0 under Ha) = 1 − P(do not reject H0 under Ha) = 1 − β(θa)

• Power curve: a graph of power(θ) for H0 : θ = θ0 vs. Ha : θ ≠ θ0

- ideal shape (see Figure 10.14)

: power(θ0) = 0 and power(θa) = 1 (i.e., β(θa) = 0) for all θa in Ha. But both α and β cannot be made arbitrarily small for fixed n

- typical shape (see Figure 10.13)

: for a fixed n, choose a small α under H0 and select a RR that has power(θa) > α at each θa in Ha
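The power function of the upper-tail Z test for a mean can be sketched directly; the numbers (µ0 = 50, σ = 5, n = 25) are hypothetical:

```python
from statistics import NormalDist

Phi = NormalDist().cdf
z_alpha = NormalDist().inv_cdf(0.95)  # alpha = .05

def power(mu_a, mu0, sigma, n):
    """power(mu_a) = P(reject H0 | mu = mu_a) for the upper-tail Z test of
    H0: mu = mu0 vs. Ha: mu > mu0, RR = {ybar > mu0 + z_alpha * sigma/sqrt(n)}."""
    se = sigma / n**0.5
    k = mu0 + z_alpha * se
    return 1 - Phi((k - mu_a) / se)

# power(mu0) equals alpha; power rises toward 1 as mu_a moves away from mu0
p0 = power(50, 50, 5, 25)
p2 = power(52, 50, 5, 25)
p4 = power(54, 50, 5, 25)
```

This traces the typical power-curve shape described above: power(θ0) = α, with power(θa) > α at every θa in Ha.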

• Best testing procedure

- From among all tests with a significance level of α, seek the test whose power function comes closest to the ideal power function (if such a test exists): for a fixed n, choose a small α and select a RR that maximizes power(θa) (i.e., minimizes β(θa)) at each θa in Ha

- How do we find such a testing procedure? The Neyman-Pearson Lemma (Theorem 10.1)

• Simple and composite hypotheses (Def 10.4)

Suppose we take a random sample from a distribution with parameter θ. Then a hypothesis is said to be a simple hypothesis if it uniquely specifies the distribution of the population from which the sample is taken. Any hypothesis that is not a simple hypothesis is called a composite hypothesis.

(Example) Y1, . . . , Yn ∼ f(y) = (1/λ)e^(−y/λ), y > 0. Then 'H : λ = 2' is a simple hypothesis and 'H : λ > 2' is a composite hypothesis.

(Example) Y1, . . . , Yn ∼ N(µ, σ²).

i) if σ² is known, then 'H : µ = 5' is a simple hypothesis

ii) if σ² is unknown, then 'H : µ = 5' is a composite hypothesis

• Most Powerful (MP) Test for H0 : θ = θ0 vs. Ha : θ = θa

: The Neyman-Pearson Lemma (Thm 10.1). Suppose that we wish to test the simple null hypothesis H0 : θ = θ0 vs. the simple alternative hypothesis Ha : θ = θa, based on a random sample Y1, . . . , Yn from a distribution with parameter θ. Let L(θ) denote the likelihood of the sample at θ. Then, for a given α, the test that maximizes the power at θa has the rejection region

RR = {L(θ0)/L(θa) < k}.

The value of k is chosen so that the test has the desired value for α. Such a test is a most powerful α-level test for H0 vs. Ha.

→ Idea of the Neyman-Pearson Lemma

i) since we are concerned with two values of θ, θ0 and θa, we want to choose a RR so that α = power(θ0) is fixed and β(θa) is as small as possible (i.e., we seek a test with maximum power, power(θa) = 1 − β(θa))

ii) L(θ) = L(y1, . . . , yn | θ) gives the likelihood of observing the sample (Y1 = y1, . . . , Yn = yn) when the value of the parameter is θ. If the sample is from a distribution with θa, then L(θ0) tends to be smaller than L(θa).

(Example 10.22)

[note]

i) The forms of the test statistic and of the RR depend on both H0 and Ha.

ii) The Neyman-Pearson Lemma gives the form of the RR; the actual RR depends on the specified value of α.

iii) For discrete distributions, we specify the test to be one for which the probability of a type I error is closest to the predetermined value of α without exceeding it.
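A minimal sketch of the Neyman-Pearson ratio for N(µ, σ²) data with σ known (the samples and parameter values are hypothetical). With µa > µ0 the ratio L(µ0)/L(µa) is a decreasing function of ȳ, so {L(µ0)/L(µa) < k} coincides with an upper-tail region {ȳ > c}:

```python
from math import exp
from statistics import mean

def np_ratio(sample, mu0, mua, sigma):
    """L(mu0)/L(mua) for N(mu, sigma^2) data with sigma known.
    The normalizing constants cancel, and since
    sum((y-mu)^2) = sum((y-ybar)^2) + n*(ybar-mu)^2, the spread term
    cancels too, leaving exp(n*((ybar-mua)^2 - (ybar-mu0)^2) / (2*sigma^2))."""
    n = len(sample)
    ybar = mean(sample)
    return exp(n * ((ybar - mua)**2 - (ybar - mu0)**2) / (2 * sigma**2))

# a larger ybar makes the ratio smaller, i.e. favors rejecting H0: mu = mu0
s_small = [4.9, 5.1, 5.0]   # ybar = 5.0
s_large = [5.9, 6.1, 6.0]   # ybar = 6.0
r1 = np_ratio(s_small, 5.0, 6.0, 1.0)
r2 = np_ratio(s_large, 5.0, 6.0, 1.0)
```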

• Uniformly Most Powerful (UMP) Test for H0 : θ = θ0 vs. Ha : θ > θ0

Suppose we sample from a population whose distribution is completely specified except for the value of a single parameter θ. We would like to test H0 : θ = θ0 (simple) vs. Ha : θ > θ0 (composite). Then,

i) no general theorem is directly applicable, because Ha : θ > θ0 is composite.

ii) but the Neyman-Pearson Lemma (Thm 10.1) can be applied to obtain a most powerful (MP) test for H0 : θ = θ0 (simple) vs. Ha : θ = θa (simple), where θa > θ0.

iii) when a test obtained by the Neyman-Pearson Lemma actually maximizes the power for every value of θ greater than θ0, it is said to be a uniformly most powerful (UMP) test for H0 : θ = θ0 vs. Ha : θ > θ0.

iv) one can derive a UMP test for H0 : θ = θ0 vs. Ha : θ < θ0 in a similar way.

(Example 10.23)

[Note]

i) In many situations, the actual RR for the MP test depends only on the value of θ0 (and does not depend on the particular choice of θa).

ii) In most cases there does not exist a UMP two-tailed test, such as for H0 : θ = θ0 vs. Ha : θ ≠ θ0.

→ In Example 10.23, the associated RR of the UMP test for H0 : µ = µ0 vs. Ha : µ > µ0 was ȳ > µ0 + zα σ/√n. The associated RR of the UMP test for H0 : µ = µ0 vs. Ha : µ < µ0 is ȳ < µ0 − zα σ/√n.

→ Thus if we wish to test H0 : µ = µ0 vs. Ha : µ ≠ µ0, no single rejection region yields the MP test for all values of µa ≠ µ0.

iii) The Neyman-Pearson Lemma (Thm 10.1) is useless if we wish to test a hypothesis about a single parameter θ when the sampled distribution contains other unspecified parameters (called nuisance parameters).

→ Suppose we want to test H0 : µ = µ0 when the sample is taken from a normal distribution with unknown variance σ². In this case, H0 : µ = µ0 is composite, as σ² could be any positive number.

→ See Section 10.11 for a very general and widely used testing method.

10.11 (Large-Sample) Likelihood Ratio Tests

• Motivation

- When the distribution of the observations is known except for a single unknown parameter, the Neyman-Pearson Lemma (Thm 10.1) is useful under either simple hypotheses or one-sided composite hypotheses.

- But, in many cases,

i) the distribution of interest has more than one unknown parameter,

ii) we are interested in a two-sided test, H0 : θ = θ0 vs. Ha : θ ≠ θ0.

• Four steps for Likelihood Ratio (LR) Tests

[Step 1] Write the likelihood function, which is a function of the observed sample and the parameters.

: Suppose a random sample is selected from a distribution containing k parameters, Θ = (θ1, . . . , θk). Then the likelihood function is L(Θ) = L(y1, . . . , yn | θ1, . . . , θk). In some cases, we are interested in testing hypotheses about only some of the parameters.

(Example) One takes n samples y1, . . . , yn from N(µ, σ²) with unknown µ and σ². Then Θ = (µ, σ²) and L(Θ) = L(y1, . . . , yn | µ, σ²). If one is interested in testing hypotheses about only µ, then we call µ the parameter of interest and σ² a nuisance parameter.

[Step 2] Write H0 and Ha down clearly, and specify the sets of parameter values under H0 and Ha.

: Suppose we want to test H0 : Θ ∈ Ω0 vs. Ha : Θ ∈ Ωa, where Ω0 is a particular set of possible values under H0 and Ωa is another set of possible values under Ha. Note that Ω0 ∩ Ωa is empty and Ω0 ∪ Ωa = Ω (the union of Ω0 and Ωa).

(Example continued) We want to test H0 : µ = µ0 vs. Ha : µ ≠ µ0. Then Θ = (µ, σ²), Ω0 = {(µ, σ²) : µ = µ0, σ² > 0} and Ωa = {(µ, σ²) : µ ≠ µ0, σ² > 0}, and hence Ω = Ω0 ∪ Ωa = {(µ, σ²) : −∞ < µ < ∞, σ² > 0}.

[Step 3] Given the observed sample, calculate L(Ω0) = max over Θ ∈ Ω0 of L(Θ) under H0 : Θ ∈ Ω0, and L(Ω) = max over Θ ∈ Ω of L(Θ) under Ω = Ω0 ∪ Ωa, respectively.

: L(Ω0) is the maximum of L(Θ) (which is attained at the same point as the maximum of ℓ(Θ) = ln L(Θ)) over all Θ ∈ Ω0 (see Section 9.7)

: L(Ω) is the maximum of L(Θ) (which is attained at the same point as the maximum of ℓ(Θ) = ln L(Θ)) over all Θ ∈ Ω = Ω0 ∪ Ωa (see Section 9.7)

: L(Ω0) and L(Ω) represent the best explanation of the observed sample y1, . . . , yn over all Θ ∈ Ω0 and over all Θ ∈ Ω, respectively.

: If L(Ω0) = L(Ω), the best explanation of the observed data can be found inside Ω0, and we should not reject H0 : Θ ∈ Ω0.

: If L(Ω0) < L(Ω), a better explanation of the observed data can be found inside Ωa, and we should consider rejecting H0 in favor of Ha.

: Thus, the ratio L(Ω0)/L(Ω) (or the difference ℓ(Ω0) − ℓ(Ω) = ln[L(Ω0)/L(Ω)]) is a test statistic.

(Example continued) Derive L(Ω0)/L(Ω) under H0 : µ = µ0 vs. Ha : µ ≠ µ0.

[Step 4] Define λ by

λ = L(Ω0)/L(Ω) = [max over Θ ∈ Ω0 of L(Θ)] / [max over Θ ∈ Ω of L(Θ)].

Then one can test H0 : Θ ∈ Ω0 vs. Ha : Θ ∈ Ωa by using one of the following two methods.

M1) If λ is a function of a test statistic whose distribution is known, one can use that test statistic for testing H0 : Θ ∈ Ω0.

M2) If one cannot represent λ as a function of a test statistic whose distribution is known, one can use the following large-sample test:

- For large n, −2 ln(λ) has approximately a χ² distribution, with degrees of freedom equal to the difference between the number of free parameters under Θ ∈ Ω and the number of free parameters under H0 : Θ ∈ Ω0.

- If we desire an α-level test, the large-sample LR test has the rejection region −2 ln(λ) > χ²α. So, for the given observed data, −2 ln(λ) > χ²α implies that one may reject H0 at approximately the α level of significance.

: 0 ≤ λ ≤ 1

: a value of λ close to zero means that the likelihood of the observed sample is much smaller under H0 than under Ha; thus the observed data suggest favoring Ha over H0

: RR = {λ ≤ k} where α = P(λ ≤ k)

: Under M2),

RR = {λ < k} = {−2 ln(λ) > −2 ln(k) = k*} ≈ {−2 ln(λ) > χ²α}

where −2 ln(λ) is approximately χ²-distributed and α = P(−2 ln(λ) ≥ χ²α).

(Example continued)

(Example 10.24)

(Example 10.25)
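The four steps can be sketched for the running example, H0 : µ = µ0 vs. Ha : µ ≠ µ0 with σ² an unknown nuisance parameter. Here λ reduces to (σ̂²/σ̂0²)^(n/2), where σ̂0² maximizes the likelihood over Ω0 and σ̂² over Ω; the data and µ0 are hypothetical, and χ².05(1) = 3.841 since H0 fixes one parameter:

```python
from math import log
from statistics import mean

def minus2lnlambda(sample, mu0):
    """-2 ln(lambda) for H0: mu = mu0 vs. Ha: mu != mu0, sampling from
    N(mu, sigma^2) with sigma^2 unknown (a nuisance parameter).

    Under Omega0 the MLE of sigma^2 is sum((y - mu0)^2)/n; under Omega it
    is sum((y - ybar)^2)/n, so lambda = (s2_hat / s2_hat0)^(n/2)."""
    n = len(sample)
    ybar = mean(sample)
    s2_hat = sum((y - ybar)**2 for y in sample) / n
    s2_hat0 = sum((y - mu0)**2 for y in sample) / n
    lam = (s2_hat / s2_hat0) ** (n / 2)
    return -2 * log(lam)

# hypothetical data; reject at approximately alpha = .05 if -2 ln(lambda) > 3.841
y = [10.8, 11.2, 9.9, 10.5, 11.0, 10.3, 10.9, 11.4]
stat = minus2lnlambda(y, 10.0)
reject = stat > 3.841
```

When µ0 equals the sample mean, λ = 1 and the statistic is 0, as expected: H0 then provides the best possible explanation of the data.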