MATH& 146 Lesson 20 Section 2.7 Applying the Normal Model for Hypothesis Testing



Applying the Normal Model

Earlier in this course, we used simulation techniques

to form conclusions about the population.

An alternate, more common approach is to use the

normal distribution to form conclusions.

When the sample size is sufficiently large, this

approximation generally provides us with the same

conclusions.


Standard Error

Point estimates vary from sample to sample, and

we quantify this variability with what is called the

standard error (SE).

The standard error is equal to the standard

deviation associated with the estimate.
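To make the idea concrete, here is a minimal sketch that draws many samples, computes a point estimate (a sample proportion) from each, and takes the standard deviation of those estimates. The population proportion of 0.5, the sample size of 100, the number of samples, and the seed are all hypothetical values chosen for illustration; they do not come from the lesson.

```python
import random
from statistics import stdev

random.seed(20)  # arbitrary seed so the sketch is repeatable

true_prop = 0.5     # hypothetical population proportion (assumption)
sample_size = 100   # hypothetical sample size (assumption)
n_samples = 5_000   # number of repeated samples (assumption)

def sample_proportion(n, p):
    """Draw one sample of size n and return the proportion of successes."""
    return sum(random.random() < p for _ in range(n)) / n

# One point estimate per sample
estimates = [sample_proportion(sample_size, true_prop) for _ in range(n_samples)]

# The standard error is the standard deviation of the point estimates
standard_error = stdev(estimates)
print(round(standard_error, 3))
```

The printed value depends on the seed, but it describes how much the point estimate typically varies from one sample to the next.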


Standard Error

The way we determine the standard error varies

from one situation to the next. However, typically it

is determined using a formula based on the

Central Limit Theorem.

For the time being, the standard error will just be

given. Formulas will be introduced in upcoming

lessons.


Opportunity Cost

In Section 2.2 (Lesson 13) we were introduced to

the opportunity cost study, which found that

students became thriftier when they were

reminded that not spending money now means the

money can be spent on other things in the future.

Let's re-analyze the data in the context of the

normal distribution and compare the results.


Opportunity Cost

The figure below summarizes the null distribution

as determined using the randomization method.

The best fitting normal distribution for the null

distribution has a mean of 0.


Opportunity Cost

The standard error is given as SE = 0.078. Recall

that the point estimate of the difference was 0.20,

as shown in the plot.


Opportunity Cost

Now, we'll use the normal distribution approach to

compute the two-tailed p-value.


Opportunity Cost

It is helpful to draw and shade a picture of the

normal distribution so we know precisely what we

want to calculate. Here we want to find the area of

the two tails representing the p-value.


P-Values Method 1

There are two approaches you can take to find the

p-value. The most direct approach is to use the

normalcdf command in your calculator, along with

the point estimate, standard error, and mean of the

null distribution.


P-Values Method 1

(Pt Est > mean)

For a two-tail test,

p-value = normalcdf(Pt Est, BIG, mean, SE) × 2

Pt Est: The point estimate, or observed value, assuming it is larger than the mean.

Pt Est, BIG: Go "Pt Est, BIG" if the point estimate is larger than the mean.

mean: The mean (null value) of the null distribution.

SE: Standard error of the point estimate.

× 2: Multiply the final result by two to account for the other tail.


P-Values Method 1

(Pt Est < mean)

For a two-tail test,

p-value = normalcdf(-BIG, Pt Est, mean, SE) × 2

-BIG, Pt Est: Go "-BIG, Pt Est" if the point estimate is smaller than the mean.

Pt Est: The point estimate, or observed value, assuming it is smaller than the mean.

mean: The mean (null value) of the null distribution.

SE: Standard error of the point estimate.

× 2: Multiply the final result by two to account for the other tail.


P-Values Method 1

For the opportunity cost study, the point estimate =

0.20, mean = 0, and SE = 0.078. Therefore,

p-value = normalcdf(0.20, 999, 0, 0.078) × 2 = 0.0103
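For readers without a calculator, Method 1 can be sketched in Python. The `normalcdf` function below is a stand-in written to mirror the calculator command, built on `statistics.NormalDist` from the standard library; the helper name `two_tail_p_value` and the constant `BIG = 999` are illustrative choices, not part of the lesson.

```python
from statistics import NormalDist

BIG = 999  # arbitrarily large bound, as on the slides

def normalcdf(lower, upper, mean, sd):
    """Area under a normal curve between lower and upper
    (a stand-in for the calculator's normalcdf command)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

def two_tail_p_value(pt_est, mean, se):
    """Method 1: pick the tail on the point estimate's side, then double it."""
    if pt_est >= mean:
        one_tail = normalcdf(pt_est, BIG, mean, se)    # "Pt Est, BIG"
    else:
        one_tail = normalcdf(-BIG, pt_est, mean, se)   # "-BIG, Pt Est"
    return 2 * one_tail

# Opportunity cost study: point estimate 0.20, null mean 0, SE 0.078
p_value = two_tail_p_value(0.20, 0, 0.078)
print(round(p_value, 4))  # approximately 0.0103, in line with the slide
```

Choosing the tail inside the function covers both the "Pt Est > mean" and "Pt Est < mean" cases with one call.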


P-Values Method 2

The second approach to calculating p-values also

uses the normalcdf command, but we compute the

Z-score for the point estimate (called a test

statistic) first, then make use of the fact that Z-

scores always have a mean of 0 and standard

deviation of 1.


P-Values Method 2

For a two-tail test,

p-value = normalcdf(|Z|, BIG, 0, 1) × 2

|Z|: The absolute value of the Z score. That is, remove any negative sign.

BIG: Use some arbitrarily big number, such as 999.

0: Z scores always have a mean of 0.

1: Z scores always have a standard deviation of 1.

× 2: Multiply the final result by two to account for the other tail.


P-Values Method 2

For the opportunity cost study, the point estimate =

0.20, mean = 0, and SE = 0.078. Therefore the Z

score is

Z = (point estimate − mean) / standard error = (0.20 − 0) / 0.078 = 2.56

The p-value then is

p-value = normalcdf(2.56, 999, 0, 1) × 2 = 0.0105
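The same Method 2 calculation can be sketched in Python using the standard normal distribution from `statistics.NormalDist`. The variable names are illustrative, and rounding the Z score to two decimals is done here only to follow the slide's arithmetic.

```python
from statistics import NormalDist

pt_est, null_mean, se = 0.20, 0.0, 0.078  # values from the opportunity cost study

# Test statistic: how many standard errors the estimate lies from the null mean
z_exact = (pt_est - null_mean) / se
z = round(z_exact, 2)  # 2.56, rounded as on the slide

# Z scores follow the standard normal distribution (mean 0, standard deviation 1)
std_normal = NormalDist()
p_value = 2 * (1 - std_normal.cdf(abs(z)))

print(z, round(p_value, 4))  # p-value is approximately 0.0105
```

Using `abs(z)` means the upper tail is always the one computed, so the same two lines work whether the point estimate falls above or below the null mean.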


Opportunity Cost

Notice that the two methods give nearly the same

p-value (0.0103 for Method 1 vs. 0.0105 for

Method 2).

Technically, these two methods should give

identical p-values, but the extra rounding of the Z

score changed the answer slightly.

Either way, both values are about the same as we

got from the randomization approach (two-tail p-value

was about 0.012).
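The reason the two methods agree in principle can be sketched in one line. The symbols here are introduced only for this sketch and do not appear on the slides: X is the point estimate under the normal model, x is its observed value (assumed larger than the null mean μ), and Z is the standardized estimate.

```latex
\text{p-value}
  = 2\,P(X \ge x)
  = 2\,P\!\left(\frac{X - \mu}{SE} \ge \frac{x - \mu}{SE}\right)
  = 2\,P(Z \ge z),
  \qquad z = \frac{x - \mu}{SE}
```

For the opportunity cost study, z = 0.20 / 0.078 ≈ 2.5641. Rounding it down to 2.56 moves the cutoff slightly toward the center and enlarges the tail area, which is why Method 2 gave 0.0105 rather than 0.0103.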


Opportunity Cost

As before, since the p-value is less than 0.05, we

conclude that the treatment did indeed impact

students' spending.


Medical Consultant

In Section 2.4 (Lesson 15) we learned about a medical

consultant who reported that only 3 of her 62 clients

who underwent a liver transplant had complications,

which is less than the more common complication rate

of 0.10.

As in the other case studies, we identified a suitable

null distribution using a simulation approach, as shown

below.
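A simulation of this kind can be sketched as follows. This is an illustrative reconstruction rather than the code used for the lesson: the seed, the 10,000 repetitions, and the choice to double the one-tail proportion to get a two-tail p-value are all assumptions.

```python
import random

random.seed(146)  # arbitrary seed so the sketch is repeatable

n_clients = 62      # clients in the consultant's record
null_rate = 0.10    # complication rate assumed under the null hypothesis
observed = 3        # complications actually observed
n_sims = 10_000     # number of simulated samples (assumption)

def simulate_complications(n, rate):
    """Simulate one set of n clients and count how many have complications."""
    return sum(random.random() < rate for _ in range(n))

# Null distribution: complication counts when the true rate is 0.10
sim_counts = [simulate_complications(n_clients, null_rate) for _ in range(n_sims)]

# One-tail p-value: share of simulations with 3 or fewer complications
one_tail = sum(count <= observed for count in sim_counts) / n_sims
two_tail = 2 * one_tail  # doubling for the other tail (assumption)

print(one_tail, two_tail)
```

The exact numbers change with the seed and the number of repetitions, so two runs of a simulation like this rarely give the same p-value to four decimals.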


Medical Consultant

Here we have added the best-fitting normal curve to

the figure, which has a mean of 0.10. Borrowing a

formula that we'll encounter in Chapter 3 (Lesson 22),

the standard error of the distribution was also

computed: SE = 0.038.


Medical Consultant

In the previous analysis, we obtained a p-value of

0.2444, and we will try to reproduce that p-value using

the normal distribution approach.

However, before we begin, we want to point out a

simple detail that is easy to overlook: the null

distribution we earlier generated is slightly skewed,

and the distribution isn't that smooth.


Medical Consultant

In fact, the normal distribution only sort-of fits this

model. We'll discuss this discrepancy more in a

moment, but for the time being we will continue

with a normal distribution.

We'll again begin by creating a picture. Here we draw a

normal distribution centered at 0.10 with a

standard error of 0.038.


P-Values Method 1

Again, for this study, the point estimate = 3/62,

mean = 0.10, and SE = 0.038. Since the point

estimate is less than the mean,

p-value = normalcdf(-999, 3/62, 0.10, 0.038) × 2 = 0.1744


P-Values Method 2

For the medical consultant study, the Z score is

Z = (point estimate − mean) / standard error = (3/62 − 0.10) / 0.038 = −1.358

The p-value then is

p-value = normalcdf(1.358, 999, 0, 1) × 2 = 0.1745
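Both calculations for the medical consultant study can be checked with a short Python sketch. It uses `statistics.NormalDist` in place of the calculator; taking the lower tail directly from the cumulative distribution function, instead of entering a lower bound of -999, is a convenience of this sketch and gives the same area.

```python
from statistics import NormalDist

pt_est = 3 / 62    # observed complication rate
null_mean = 0.10   # null value
se = 0.038         # standard error given in the lesson

# Method 1: lower-tail area on the original scale, doubled
p_method1 = 2 * NormalDist(mu=null_mean, sigma=se).cdf(pt_est)

# Method 2: standardize first, then use the standard normal curve with |Z|
z = (pt_est - null_mean) / se  # about -1.358
p_method2 = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(p_method1, 4), round(p_method2, 4))
```

Because the Z score is not rounded here, the two methods agree to many decimal places.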


Conditions for Inference

Both methods give a p-value of about 0.1744. This is

the estimated p-value for the hypothesis test.

However, there's a problem: this is very different from

the earlier (simulated) p-value we computed: 0.2444.

The discrepancy is explained by the normal model's poor

representation of the null distribution. As noted earlier,

the null distribution from the simulations is not very

smooth, and the distribution itself is slightly skewed.

That's the bad news.


Conditions for Inference

The good news is that we can foresee these problems

using some simple checks.

Previously, we noted that the two common

requirements to apply the Central Limit Theorem are

(1) independence, and (2) a large enough sample size

(in this case, at least 10 successes and 10 failures).

The guidelines for this particular situation would have

alerted us that the normal model was a poor

approximation.
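The sample size check is easy to automate. The sketch below assumes the counts are the expected successes and failures under the null value (n × p and n × (1 − p)); the function name and its `minimum` parameter are hypothetical, chosen for illustration.

```python
def success_failure_check(n, p_null, minimum=10):
    """Return the expected successes and failures under the null value,
    plus whether both reach the minimum (assumed here to be 10)."""
    expected_successes = n * p_null
    expected_failures = n * (1 - p_null)
    ok = expected_successes >= minimum and expected_failures >= minimum
    return expected_successes, expected_failures, ok

# Medical consultant study: 62 clients, null complication rate 0.10
successes, failures, ok = success_failure_check(62, 0.10)
print(round(successes, 1), round(failures, 1), ok)  # 6.2 55.8 False
```

With only about 6 expected complications, the condition fails, which flags the normal model as unreliable for this study.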


Conditions for Inference

The success story in this section was the application of

the normal model in the context of the opportunity cost

data.

However, the biggest lesson comes from our failed

attempt to use the normal approximation in the

medical consultant case study.

Statistical techniques are like a carpenter's tools.

When used responsibly, they can produce amazing

and precise results.


Conditions for Inference

However, if the tools are applied irresponsibly or under

inappropriate conditions, they will produce unreliable

results.

For this reason, with every statistical method that we

introduce in future lessons, we will carefully outline

the conditions under which the method can reasonably be used.

These conditions should be checked in each

application of the technique.
