Chapter 9: TESTS OF HYPOTHESES FOR A SINGLE SAMPLE

Source: homepage.stat.uiowa.edu/~rdecook/stat2020/notes/ch9_pt1.pdf

Part 1: Intro to Hypothesis Testing
Sections 9-1, 9-2, 9-3

Statistical Inference

We infer something about the population as a whole from the information in a sample.

[Diagram: Sample and Population]

- Point estimation ✓

- Confidence intervals ✓

- Hypothesis testing (introduced in chapter 9)

Hypothesis Testing
Sections 9-1, 9-2, 9-3

We’ll start with an illustration...

• Example: Reduction of car emissions

A certain automobile engine emits 100 mg of nitrogen oxides per second on average. A modification to the engine has been proposed that may reduce the emissions.

The new design will be put into production IF it can be demonstrated that its mean emission rate is less than 100 mg/s.

To make a decision, a random sample of n = 50 modified engines is taken and emission measurements are recorded.

The sample mean is x̄ = 92 mg/s and the sample standard deviation is s = 21 mg/s.

A normal probability plot suggests emissions follow a normal distribution.

Isn’t 92 far enough below 100 for us to say the modified engine is better?

Is there enough evidence to completely change the manufacturing line and switch which engine is produced?

STATISTICAL QUESTION:

Could we have gotten a sample mean emission x̄ this low even if the modified engine WASN’T any better than the first (i.e. its population mean was actually 100)?

Could we have grabbed a sample that happened to have many low emission values even though the population mean was 100?

To make a decision on the engines, we want to quantify the above question with a probability:

“Given that the true population mean emission is 100 mg/s, what is the probability of observing a sample mean emission x̄ this low or lower?”

Recall from the last chapter:

If we assume µ = 100 and n large, we have

$$\bar{X} \sim N\left(100, \frac{\sigma^2}{n}\right).$$

This is a known behavior of the sample mean.

Probability of interest:

Given µ = 100 (engine not any better), P(X̄ ≤ 92) = ?

Since σ² is unknown in this case, we have

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

where S is the sample standard deviation and T has a t distribution with n − 1 degrees of freedom (and n = 50 in this example).

$$P(\bar{X} \le 92) = P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} \le \frac{92 - 100}{21/\sqrt{50}}\right) = P(T \le -2.69) = 0.0049$$

because $T \sim t_{49}$.

[Figure: density of the t distribution with 49 df, plotted for T from −3 to 3]

NOT VERY LIKELY...

The probability of observing a sample mean emission x̄ this low or lower, given that the true population mean is 100 mg/s, is

0.0049
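The tail probability above can be reproduced numerically. The following is a minimal sketch using only the Python standard library: the helper names `t_pdf` and `t_cdf` are illustrative (not from the notes), and the t CDF is approximated with Simpson's rule rather than taken from a statistics package. In practice a library routine such as `scipy.stats.t.cdf` would normally be used instead.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] plus symmetry.

    `steps` must be even; 2000 is an arbitrary choice for this sketch.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

# Numbers from the emissions example
xbar, mu0, s, n = 92.0, 100.0, 21.0, 50
t0 = (xbar - mu0) / (s / sqrt(n))  # observed T statistic
p_value = t_cdf(t0, n - 1)         # P(T <= t0) with 49 df

print(f"t0 = {t0:.2f}, P(T <= t0) = {p_value:.4f}")
```

The result should land close to the 0.0049 quoted on the slide; a small difference in the last digit can come from the slide rounding the statistic to −2.69 before looking up the probability.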

This suggests that our initial assumption in the calculation, that the true mean was 100, is perhaps incorrect.

For this reason, we reject the assumption of µ = 100 in favor of the ‘alternative’, that the true mean emissions IS LESS THAN 100 mg/s.

We don’t know FOR SURE, but there’s strong evidence against someone saying that the mean of the modified engine is 100 mg/s.

If it were 100 mg/s, we would very rarely see an x̄ this low (could happen, but not likely).

What’s unlikely enough to actually reject the initial assumption (that the two engine models were equal)?

There’s some opinion here, but we often use 0.05 as a threshold. Anything less than this is considered rather unlikely.

----------------------------------------

We have essentially just performed a hypothesis test; now we will formalize the procedure...

• General set-up for testing a hypothesis for µ

1. State your null H0 and alternative H1 hypotheses.
(The null is what we assume to be true.)

H0 : µ = µ0

(The subscript on µ0 is used to emphasize that this value is the assumed mean under the null hypothesis being true.)

There are 3 choices for the alternative, either...

* H1 : µ ≠ µ0 (two-sided alternative)

* H1 : µ < µ0 (one-sided alternative)

* H1 : µ > µ0 (one-sided alternative)

2. Calculate the test statistic (either a Z or T) under the null being true.
(In this example, the test statistic was a T; we’ll make a conclusion based on this.)

3. Compute the probability of observing a test statistic this extreme, or more extreme, under the null being true.
(This probability is called a p-value.)

4. State your conclusion with respect to the problem:

Either... ‘Reject the null’ or ‘Fail to reject the null’.

5. Verify any assumptions that were needed.
(This is usually a normal probability plot for verifying normality, which is needed to have $T \sim t_{n-1}$.)
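Steps 2 through 4 can be sketched in code. The example below is an illustration only: it uses the Z statistic (the known-σ or very-large-sample case) so that the standard library's `NormalDist` suffices, and the function names `z_test` and `decide`, the `alternative` labels, and the input numbers are all hypothetical, not taken from the notes. The point is how each of the three alternatives maps to a tail of the null distribution.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z0, p_value) for H0: mu = mu0.

    `alternative` is 'less', 'greater', or 'two-sided' (labels chosen
    for this sketch).
    """
    z0 = (xbar - mu0) / (sigma / sqrt(n))      # step 2: test statistic
    std_normal = NormalDist()                  # N(0, 1)
    if alternative == "less":                  # H1: mu < mu0, lower tail
        p_value = std_normal.cdf(z0)
    elif alternative == "greater":             # H1: mu > mu0, upper tail
        p_value = 1.0 - std_normal.cdf(z0)
    elif alternative == "two-sided":           # H1: mu != mu0, both tails
        p_value = 2.0 * std_normal.cdf(-abs(z0))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z0, p_value                         # step 3: p-value

def decide(p_value, alpha=0.05):
    """Step 4: compare the p-value with the chosen threshold."""
    return "Reject the null" if p_value < alpha else "Fail to reject the null"

# Hypothetical numbers, purely for illustration
z0, p = z_test(xbar=10.3, mu0=10.0, sigma=1.2, n=64, alternative="greater")
print(f"z0 = {z0:.2f}, p-value = {p:.4f}, conclusion: {decide(p)}")
```

Step 1 (stating the hypotheses) and step 5 (checking assumptions) remain judgment calls that the code does not make for you.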


• Example: Formalizing the emissions hypothesis test

1. State your null H0 and alternative H1 hypotheses.

H0 : µ = 100
H1 : µ < 100
(this is a one-sided hypothesis test with µ0 = 100)

2. Calculate the observed test statistic under the null being true.

$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{92 - 100}{21/\sqrt{50}} = -2.69$$

(The subscript on t0 is used to emphasize the fact that we’re assuming the mean to be µ0.)

3. Compute the probability of observing a test statistic this extreme, or more extreme, under the null being true (i.e. compute the p-value).

Under H0 true,

$$T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{49}, \quad \text{and} \quad P(T_0 \le -2.69) = 0.0049$$

[Figure: density of the t distribution with 49 df, plotted for T from −3 to 3]

Thus, because this is a one-sided hypothesis test, the p-value = 0.0049.

p-value = 0.0049...

“If the true mean is really µ = 100, then the probability of observing a sample mean (from a sample of size n = 50) this far below 100 (or even farther) is only 0.0049.”

4. State your conclusion for the hypothesis test:

Using 0.05 or 5/100 as a threshold for ‘unlikeliness’, we have

p-value = 0.0049 < 0.05

and we reject the null in favor of the alternative, which is that µ < 100.

5. Be sure to verify any assumptions that were needed.

As stated earlier, we checked the normal probability plot of the emission values and it was OK, so the requirement for $T_0 \sim t_{49}$ (that the parent population was normally distributed) was fulfilled.

When we reject H0, we say the test was significant.

For this example, we say there was significant statistical evidence that the modified engine has a mean emissions lower than 100 mg/s.

So, there was strong evidence that the modified engine is better.
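The p-value interpretation from step 3 ("if the true mean is really 100, how often would a sample look this extreme?") can also be checked by simulation. The sketch below is an illustration under stated assumptions: it draws many samples of size 50 from a normal population with mean 100, computes the T statistic for each, and counts how often it falls at or below −2.69. The population standard deviation of 21 used for the draws is an assumption made only so that samples can be generated (under normality the distribution of T does not depend on it); the function name, seed, and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt

def simulate_tail_fraction(mu0=100.0, sigma=21.0, n=50,
                           t_obs=-2.69, reps=20000, seed=1):
    """Fraction of simulated samples (H0 true) whose T is <= t_obs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # sample sd
        t = (xbar - mu0) / (s / sqrt(n))
        if t <= t_obs:
            hits += 1
    return hits / reps

fraction = simulate_tail_fraction()
print(f"Simulated P(T <= -2.69 | mu = 100) is about {fraction:.4f}")
```

The simulated fraction should be in the neighborhood of the theoretical 0.0049, with some Monte Carlo noise: out of 20,000 samples generated with H0 true, only around a hundred are expected to look as extreme as the observed one.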


Some comments on terminology...

• The Null Hypothesis:

– It is what we assume to be true upon entering the hypothesis test.

In many formal arguments, we often assume something to be true, and then see if we can contradict this assumption later.

We’re not looking to prove something here, but we may find that the data were not very likely to have occurred under the null being true, which was the assumption we made (in which case we reject the null).

– Often, the null is the less interesting statement to the researcher.

– Innocent until proven guilty.
We’re being cautious; we’re giving the status quo the benefit of the doubt.

– The situation is assumed uninteresting until evidence can show (beyond reasonable doubt) that something interesting is going on.

– Symbolized by H0.

– It is a statement about a population parameter, not a statistic.

– Example: the modified engine data,

H0 : µ = 100

• P-value:

– The p-value represents the probability of obtaining a test statistic as extreme as (or more extreme than) the observed test statistic, under H0 true

– If you perform a two-sided hypothesis test
H0 : µ = µ0 vs. H1 : µ ≠ µ0,
the p-value is the probability in both tails (example on slide p.23)

– Large test statistic (in absolute value) ⇔ small p-value

– Small p-values are evidence against the null hypothesis (as are large test statistics)

– When we make a decision to reject H0 it is because the p-value is small

– A small p-value says we would have been very unlikely to have gotten a sample with data like this if H0 were true

– The p-value is not the probability that H0 is true

– We use the calculated p-value to make a conclusion or decision on the hypothesis test based on a chosen significance level α (on next slide):

∗ Reject the null hypothesis

∗ Fail to reject the null hypothesis (sometimes loosely called ‘accepting’ the null hypothesis)

– We do not prove the null hypothesis true; this is not how things are set up. We assume it to be true right from the start of the procedure.

• The significance level α:

– How low must a p-value be to reject the null?

– We set a threshold that will control our chance of making a particular mistake.

What mistake?
REJECTING H0 WHEN H0 IS ACTUALLY TRUE.

This is called a type I error.

This is often seen as a big mistake.

In the emissions example, the company would completely redo their engine manufacturing set-up if they reject. This would be a big waste if the modified engine actually wasn’t any better.

– We set the chance of such a mistake to be α, which is often set at 0.05 (though 0.01 and others are also seen).

We simply accept a 5% chance that we make a type I error. For most situations, this chance of a mistake is considered low enough.

– By only rejecting when the p-value is less than α we control the type I error at the α level.

α = P(type I error)

= P(reject H0 when H0 is true)

= P(reject H0 | H0 is true)

= P(a false positive occurring)
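The claim that rejecting only when the p-value is below α holds the type I error rate at α can be illustrated with a small simulation. This is a sketch under assumptions of my own choosing, not from the notes: the data are generated with H0 actually true (normal population, mean 0, known σ = 1), a two-sided Z test is run on each sample, and the function name, sample size, seed, and repetition count are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, mu0=0.0, sigma=1.0,
                        n=30, reps=5000, seed=7):
    """Fraction of samples, generated with H0 true, that get rejected."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(reps):
        # H0 is true here: the population mean really is mu0
        xbar = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z0 = (xbar - mu0) / (sigma / sqrt(n))
        p_value = 2 * std_normal.cdf(-abs(z0))  # two-sided p-value
        if p_value < alpha:                     # a false positive
            rejections += 1
    return rejections / reps

rate = type_one_error_rate()
print(f"Observed type I error rate: {rate:.3f} (alpha = 0.05)")
```

Every rejection in this simulation is a mistake by construction, and the fraction of such mistakes should hover near 0.05. Lowering `alpha` to 0.01 should pull the rate down to roughly 1%.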


• Example: An example where σ² is known or you have a very large sample

If σ² is known, or you have a very large sample, the test statistic will be the Z test statistic, instead of the T.

An inspector measured the fill volume of a simple random sample of n = 100 cans of juice that were labeled as containing 12 oz. The sample had a mean volume of 11.98 oz and a standard deviation of 0.19 oz.

Let µ represent the mean fill volume for all cans of juice recently filled by the machine.

Perform a hypothesis test that µ = 12 versus µ ≠ 12 at the α = 0.05 significance level.

ANS:
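The slide leaves the answer to be worked out. One possible way to carry out the calculation is sketched below, assuming that with n = 100 the sample standard deviation 0.19 can stand in for σ in the Z statistic (the large-sample case described above). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values given in the juice-can example
n, xbar, s = 100, 11.98, 0.19
mu0, alpha = 12.0, 0.05

# Z statistic under H0: mu = 12 (s used in place of sigma, large n)
z0 = (xbar - mu0) / (s / sqrt(n))

# Two-sided alternative: probability in both tails of N(0, 1)
p_value = 2 * NormalDist().cdf(-abs(z0))

conclusion = "Reject the null" if p_value < alpha else "Fail to reject the null"
print(f"z0 = {z0:.2f}, p-value = {p_value:.3f}: {conclusion}")
```

Under these assumptions the statistic is about −1.05 and the two-sided p-value is roughly 0.29, well above 0.05, so the sketch fails to reject H0: the sample does not give significant evidence that the mean fill volume differs from 12 oz.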
