INTRODUCTION TO INFERENCE

BPS - 5th Ed., Chapter 14

Statistical Inference

Provides methods for drawing conclusions about a population from sample data.

Confidence Intervals: What is the population mean?

Tests of Significance: Is the population mean larger than 66.5?

Inference about a Mean: Simple Conditions

1. SRS from the population of interest
2. Variable has a Normal distribution N(μ, σ) in the population
3. Although the value of μ is unknown, the value of the population standard deviation σ is known

Confidence Interval

A level C confidence interval has two parts:

1. An interval calculated from the data, usually of the form:

estimate ± margin of error

2. The confidence level C, which is the probability that the interval will capture the true parameter value in repeated samples; that is, C is the success rate for the method.


Body Mass Index of Young Women

The Study:
654 women ages 20-29
Sample mean is 26.8
Population standard deviation is 7.5

Based on this information, answer the following:

1. Estimate the population mean
2. Find the sampling distribution
3. Based on the Normal distribution, what is the expected BMI of 95% of all samples?
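As a rough sketch of how these three questions could be worked, the Python snippet below uses only the numbers on this slide together with the 68-95-99.7 rule. It reads question 3 as asking for the range around the estimate that holds about 95% of sample means, which is an interpretation rather than something the slide states. The variable names are illustrative.

```python
import math

n = 654          # women ages 20-29 in the study
x_bar = 26.8     # sample mean BMI
sigma = 7.5      # population standard deviation (treated as known)

# 1. The sample mean is the point estimate of the population mean.
estimate = x_bar

# 2. The sampling distribution of the sample mean is Normal with
#    standard deviation sigma / sqrt(n).
sd_x_bar = sigma / math.sqrt(n)

# 3. By the 68-95-99.7 rule, about 95% of sample means fall within
#    two standard deviations of the population mean.
margin = 2 * sd_x_bar
low, high = estimate - margin, estimate + margin

print(f"estimate = {estimate}")
print(f"sd of sample mean = {sd_x_bar:.3f}")
print(f"approximate 95% range: {low:.2f} to {high:.2f}")
```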


Case Study

NAEP Quantitative Scores (National Assessment of Educational Progress)

Rivera-Batiz, F. L., “Quantitative literacy and the likelihood of employment among young adults,” Journal of Human Resources, 27 (1992), pp. 313-328.

What is the average score for all young adult males?

Case Study

NAEP Quantitative Scores

The NAEP survey includes a short test of quantitative skills, covering mainly basic arithmetic and the ability to apply it to realistic problems. Scores on the test range from 0 to 500, with higher scores indicating greater numerical abilities. It is known that NAEP scores have standard deviation σ = 60.

Case Study

NAEP Quantitative Scores

In a recent year, 840 men 21 to 25 years of age were in the NAEP sample. Their mean quantitative score was 272.

On the basis of this sample, estimate the mean score in the population of all 9.5 million young men of these ages.

Case Study

NAEP Quantitative Scores

1. To estimate the unknown population mean μ, use the sample mean x̄ = 272.

2. The law of large numbers suggests that x̄ will be close to μ, but there will be some error in the estimate.

3. The sampling distribution of x̄ has the Normal distribution with mean μ and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{60}{\sqrt{840}} \approx 2.1$$


Case Study

NAEP Quantitative Scores

4. The 68-95-99.7 rule indicates that x̄ and μ are within two standard deviations (4.2) of each other in about 95% of all samples.

$$\bar{x} - 4.2 = 272 - 4.2 = 267.8$$
$$\bar{x} + 4.2 = 272 + 4.2 = 276.2$$
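The arithmetic in steps 3 and 4 can be checked with a few lines of Python. This is only a sketch using the figures from the slides (σ = 60, n = 840, x̄ = 272). It rounds the standard deviation of x̄ to one decimal place, as the slides do, and the variable names are illustrative.

```python
import math

sigma = 60     # known population standard deviation of NAEP scores
n = 840        # sample size
x_bar = 272    # observed sample mean

# Standard deviation of the sample mean: about 2.07, shown as 2.1 on the slide.
sd_x_bar = sigma / math.sqrt(n)

# Two standard deviations, using the rounded value 2.1 as the slide does.
margin = 2 * round(sd_x_bar, 1)
interval = (x_bar - margin, x_bar + margin)

print(f"sd of sample mean = {sd_x_bar:.2f}")
print(f"approximate 95% interval: {interval[0]:.1f} to {interval[1]:.1f}")
```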

Case Study

NAEP Quantitative Scores

So, if we estimate that μ lies within 4.2 of x̄, we’ll be right about 95% of the time.

Confidence Interval: Mean of a Normal Population

Take an SRS of size n from a Normal population with unknown mean μ and known standard deviation σ. A level C confidence interval for μ is:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$


Case Study

NAEP Quantitative Scores

Using the 68-95-99.7 rule gave an approximate 95% confidence interval. A more precise 95% confidence interval can be found using the appropriate value of z* (1.960) with the previous formula.

$$\bar{x} - (1.960)(2.1) = 272 - 4.116 = 267.884$$
$$\bar{x} + (1.960)(2.1) = 272 + 4.116 = 276.116$$

We are 95% confident that the average NAEP quantitative score for all adult males is between 267.884 and 276.116.
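A minimal sketch of the same interval in Python, assuming the standard library's `statistics.NormalDist` is used to look up z* instead of a table. The function name `z_confidence_interval` is hypothetical and is not part of the slides.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, level):
    """Level-C confidence interval for a Normal mean when sigma is known."""
    # z* leaves area (1 - level) / 2 in each tail of the standard Normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_confidence_interval(x_bar=272, sigma=60, n=840, level=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Because this sketch keeps the unrounded standard deviation (about 2.07 rather than 2.1), its endpoints fall slightly inside the 267.884 and 276.116 shown on the slide.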

Careful Interpretation of a Confidence Interval

“We are 95% confident that the mean NAEP score for the population of all adult males is between 267.884 and 276.116.”

(We feel that plausible values for the population of males’ mean NAEP score are between 267.884 and 276.116.)

** This does not mean that 95% of all males will have NAEP scores between 267.884 and 276.116. **

Statistically: 95% of all samples of size 840 from the population of males should yield a sample mean within two standard errors of the population mean; i.e., in repeated samples, 95% of the C.I.s should contain the true population mean.
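The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below assumes a made-up "true" mean of 272 purely so that there is something to capture. It draws many samples of size 840 from a Normal population with σ = 60 and counts how often the 95% interval contains that mean. The function name `coverage` and the seed are illustrative choices.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, level, repetitions, seed=0):
    """Fraction of simulated level-C intervals that contain the true mean."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        # Draw one SRS of size n from the Normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # Check whether this sample's interval captures the true mean.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / repetitions

# mu = 272 is an assumed "true" mean chosen only for the simulation.
rate = coverage(mu=272, sigma=60, n=840, level=0.95, repetitions=1000)
print(f"intervals containing mu: {rate:.1%}")
```

The printed rate should land near 95%, with some run-to-run variation if the seed is changed.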

Reasoning of Tests of Significance

What would happen if we repeated the sample or experiment many times?

How likely would it be to see the results we saw if the claim of the test were true?

Do the data give evidence against the claim?

Stating Hypotheses: Null Hypothesis, H0

The statement being tested in a statistical test is called the null hypothesis.

The test is designed to assess the strength of evidence against the null hypothesis.

Usually the null hypothesis is a statement of “no effect” or “no difference”, or it is a statement of equality.

When performing a hypothesis test, we assume that the null hypothesis is true until we have sufficient evidence against it.

Stating Hypotheses: Alternative Hypothesis, Ha

The statement we are trying to find evidence for is called the alternative hypothesis.

Usually the alternative hypothesis is a statement of “there is an effect” or “there is a difference”, or it is a statement of inequality.

The alternative hypothesis should express the hopes or suspicions we bring to the data. It is cheating to first look at the data and then frame Ha to fit what the data show.

The Hypotheses for Means

Null: H0: μ = μ0

One-sided alternatives

Ha: μ > μ0

Ha: μ < μ0

Two-sided alternative

Ha: μ ≠ μ0

Test Statistic: Testing the Mean of a Normal Population

Take an SRS of size n from a Normal population with unknown mean μ and known standard deviation σ. The test statistic for hypotheses about the mean (H0: μ = μ0) of a Normal distribution is the standardized version of x̄:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

Case Study I

Sweetening Colas

Diet colas use artificial sweeteners to avoid sugar. These sweeteners gradually lose their sweetness over time. Trained tasters sip the cola and assign a “sweetness score” of 1 to 10. The cola is then retested after some time and the two scores are compared to determine the difference in sweetness after storage. Bigger differences indicate bigger loss of sweetness.

Case Study I

Sweetening Colas

Suppose we know that for any cola, the sweetness loss scores vary from taster to taster according to a Normal distribution with standard deviation σ = 1.

The mean μ for all tasters measures loss of sweetness.

The sweetness losses for a new cola, as measured by 10 trained tasters, yield an average sweetness loss of x̄ = 1.02. Do the data provide sufficient evidence that the new cola lost sweetness in storage?

Case Study I

Sweetening Colas

If the claim that μ = 0 is true (no loss of sweetness, on average), the sampling distribution of x̄ from 10 tasters is Normal with μ = 0 and standard deviation

$$\frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{10}} = 0.316$$

The data yielded x̄ = 1.02, which is more than three standard deviations from μ = 0. This is strong evidence that the new cola lost sweetness in storage.

If the data had yielded x̄ = 0.3, which is less than one standard deviation from μ = 0, there would be no evidence that the new cola lost sweetness in storage.


Case Study I

Sweetening Colas

The null hypothesis is that no average sweetness loss occurs, while the alternative hypothesis (that which we want to show is likely to be true) is that an average sweetness loss does occur.

H0: μ = 0    Ha: μ > 0

This is considered a one-sided test because we are interested only in determining if the cola lost sweetness (gaining sweetness is of no consequence in this study).

Case Study I

Sweetening Colas

If the null hypothesis of no average sweetness loss is true, the test statistic would be:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{1.02 - 0}{1 / \sqrt{10}} = 3.23$$

Because the sample result is more than 3 standard deviations above the hypothesized mean 0, it gives strong evidence that the mean sweetness loss is not 0, but positive.
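The test statistic for the cola data can be reproduced with a short helper. This is a sketch under the slide's conditions (Normal population, σ known), and the function name `z_statistic` is hypothetical.

```python
import math

def z_statistic(x_bar, mu_0, sigma, n):
    """Standardize the sample mean under H0: mu = mu_0 (sigma known)."""
    return (x_bar - mu_0) / (sigma / math.sqrt(n))

# Cola data from the slides: x_bar = 1.02, sigma = 1, n = 10, H0: mu = 0.
z = z_statistic(x_bar=1.02, mu_0=0, sigma=1, n=10)
print(f"z = {z:.2f}")  # prints "z = 3.23"
```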

Case Study II

Studying Job Satisfaction

Does the job satisfaction of assembly workers differ when their work is machine-paced rather than self-paced? A matched pairs study was performed on a sample of workers, and each worker’s satisfaction was assessed after working in each setting. The response variable is the difference in satisfaction scores, self-paced minus machine-paced.

Case Study II

Studying Job Satisfaction

The null hypothesis is that there is no average difference in scores in the population of assembly workers, while the alternative hypothesis (that which we want to show is likely to be true) is that there is an average difference in scores in the population of assembly workers.

H0: μ = 0    Ha: μ ≠ 0

This is considered a two-sided test because we are interested in determining if a difference exists (the direction of the difference is not of interest in this study).

Professors’ Salaries

A researcher reports that the average salary of assistant professors is more than $42,000. A sample of 30 assistant professors has a mean salary of $43,260. At α = 0.05, test the claim that assistant professors earn more than $42,000 a year. The standard deviation of the population is $5230.

Cost of Men’s Athletic Shoes

A researcher claims that the average cost of men’s athletic shoes is less than $80 with a standard deviation of 19.2. He selects a random sample of 36 pairs of shoes from a catalog and finds a mean price of $75. (The costs have been rounded to the nearest dollar.) Is there enough evidence to support the researcher’s claim at α = 0.10?

Cost of Rehabilitation

The Medical Rehabilitation Education Foundation reports that the cost of rehabilitation for stroke victims is $24,672. To see if the average cost of rehabilitation is different at a particular hospital, a researcher selects a random sample of 35 stroke victims at the hospital and finds that the average cost of their rehabilitation is $25,226. The standard deviation of the population is $3251. At α = 0.01, can it be concluded that the average cost of stroke rehabilitation at a particular hospital is different from $24,672?

P-value

Assuming that the null hypothesis is true, the probability that the test statistic would take a value as extreme or more extreme than the value actually observed is called the P-value of the test.

The smaller the P-value, the stronger the evidence the data provide against the null hypothesis. That is, a small P-value indicates a small likelihood of observing the sampled results if the null hypothesis were true.

P-value for Testing Means

Ha: μ > μ0
P-value is the probability of getting a value as large or larger than the observed test statistic (z) value.

Ha: μ < μ0
P-value is the probability of getting a value as small or smaller than the observed test statistic (z) value.

Ha: μ ≠ μ0
P-value is two times the probability of getting a value as large or larger than the absolute value of the observed test statistic (z) value.
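These three rules map directly onto the standard Normal cumulative distribution function. The sketch below assumes Python's `statistics.NormalDist` in place of a printed table. The function name `p_value` and the labels "greater", "less", and "two-sided" are illustrative, and the two z values are the ones from the case studies in this chapter.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value of a z test for alternative 'greater', 'less', or 'two-sided'."""
    std_normal = NormalDist()          # standard Normal: mean 0, sd 1
    if alternative == "greater":       # Ha: mu > mu0
        return 1 - std_normal.cdf(z)
    if alternative == "less":          # Ha: mu < mu0
        return std_normal.cdf(z)
    if alternative == "two-sided":     # Ha: mu != mu0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")

print(f"{p_value(3.23, 'greater'):.4f}")     # Sweetening Colas
print(f"{p_value(1.20, 'two-sided'):.4f}")   # Studying Job Satisfaction
```

The first value comes out at about 0.0006. The second is about 0.2301, which differs in the last digit from a table-based 0.2302 only because the table probability is rounded before doubling.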

Case Study I

Sweetening Colas

For test statistic z = 3.23 and alternative hypothesis Ha: μ > 0, the P-value would be:

P-value = P(Z > 3.23) = 1 – 0.9994 = 0.0006

If H0 is true, there is only a 0.0006 (0.06%) chance that we would see results at least as extreme as those in the sample. Since we saw results that are unlikely if H0 is true, we have evidence against H0 and in favor of Ha.


Case Study II

Studying Job Satisfaction

Suppose job satisfaction scores follow a Normal distribution with standard deviation σ = 60. Data from 18 workers gave a sample mean score of 17. If the null hypothesis of no average difference in job satisfaction is true, the test statistic would be:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{17 - 0}{60 / \sqrt{18}} = 1.20$$

Case Study II

Studying Job Satisfaction

For test statistic z = 1.20 and alternative hypothesis Ha: μ ≠ 0, the P-value would be:

P-value = P(Z < -1.20 or Z > 1.20)

= 2 P(Z < -1.20) = 2 P(Z > 1.20)

= (2)(0.1151) = 0.2302

If H0 is true, there is a 0.2302 (23.02%) chance that we would see results at least as extreme as those in the sample. Since we saw results that are likely if H0 is true, we do not have good evidence against H0 and in favor of Ha.


Statistical Significance

If the P-value is as small as or smaller than the significance level α (i.e., P-value ≤ α), then we say that the data give results that are statistically significant at level α.

If we choose α = 0.05, we are requiring that the data give evidence against H0 so strong that it would occur no more than 5% of the time when H0 is true.

If we choose α = 0.01, we are insisting on stronger evidence against H0, evidence so strong that it would occur only 1% of the time when H0 is true.

Tests for a Population Mean

The four steps in carrying out a significance test:

1. State the null and alternative hypotheses.
2. Calculate the test statistic.
3. Find the P-value.
4. State your conclusion in the context of the specific setting of the test.

The procedure for Steps 2 and 3 is on the next page.
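One way to see the four steps working together is to run them on the three exercises posed earlier (professors' salaries, men's athletic shoes, stroke rehabilitation). The sketch below is an illustration, not a worked solution from the slides. The helper `z_test` is hypothetical, and the choice of alternative for each exercise ("greater", "less", "two-sided") is an assumption read off the wording of each claim.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu_0, sigma, n, alternative, alpha):
    """One-sample z test (sigma known): returns (z, P-value, significant?)."""
    # Step 2: test statistic.
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    # Step 3: P-value under the standard Normal distribution.
    phi = NormalDist().cdf
    if alternative == "greater":
        p = 1 - phi(z)
    elif alternative == "less":
        p = phi(z)
    elif alternative == "two-sided":
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    # Step 4 (numeric part): compare the P-value with the significance level.
    return z, p, p <= alpha

# Step 1 is encoded in mu_0 and the alternative chosen for each exercise.
exercises = {
    "Professors' salaries": z_test(43260, 42000, 5230, 30, "greater", 0.05),
    "Men's athletic shoes": z_test(75, 80, 19.2, 36, "less", 0.10),
    "Stroke rehabilitation": z_test(25226, 24672, 3251, 35, "two-sided", 0.01),
}
for name, (z, p, significant) in exercises.items():
    verdict = "significant" if significant else "not significant"
    print(f"{name}: z = {z:.2f}, P-value = {p:.4f}, {verdict}")
```

Under these assumptions only the shoe-price result is significant at its stated level (P-value about 0.06, below 0.10). The salary result (about 0.09, above 0.05) and the rehabilitation result (about 0.31, above 0.01) are not. The final step, stating the conclusion in context, still has to be written in words.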


Case Study I

Sweetening Colas

1. Hypotheses: H0: μ = 0, Ha: μ > 0

2. Test Statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{1.02 - 0}{1 / \sqrt{10}} = 3.23$$

3. P-value: P-value = P(Z > 3.23) = 1 – 0.9994 = 0.0006

4. Conclusion: Since the P-value is smaller than α = 0.01, there is very strong evidence that the new cola loses sweetness on average during storage at room temperature.

Case Study II

Studying Job Satisfaction

1. Hypotheses: H0: μ = 0, Ha: μ ≠ 0

2. Test Statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{17 - 0}{60 / \sqrt{18}} = 1.20$$

3. P-value: P-value = 2P(Z > 1.20) = (2)(1 – 0.8849) = 0.2302

4. Conclusion: Since the P-value is larger than α = 0.10, there is not sufficient evidence that mean job satisfaction of assembly workers differs when their work is machine-paced rather than self-paced.

Confidence Intervals & Two-Sided Tests

A level α two-sided significance test rejects the null hypothesis H0: μ = μ0 exactly when the value μ0 falls outside a level (1 – α) confidence interval for μ.

Case Study II

Studying Job Satisfaction

A 90% confidence interval for μ is:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 17 \pm 1.645 \frac{60}{\sqrt{18}} = 17 \pm 23.26 = -6.26 \text{ to } 40.26$$

Since μ0 = 0 is in this confidence interval, it is plausible that the true value of μ is 0; thus, there is not sufficient evidence (at α = 0.10) that the mean job satisfaction of assembly workers differs when their work is machine-paced rather than self-paced.
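The link between the 90% interval and the two-sided test at α = 0.10 can be checked numerically. The sketch below uses the job satisfaction figures from the slides (x̄ = 17, σ = 60, n = 18). The variable names are illustrative, and z* is computed with `statistics.NormalDist` rather than read from a table.

```python
import math
from statistics import NormalDist

x_bar, mu_0, sigma, n, alpha = 17, 0, 60, 18, 0.10
se = sigma / math.sqrt(n)              # standard deviation of the sample mean
std_normal = NormalDist()

# Level (1 - alpha) confidence interval for mu.
z_star = std_normal.inv_cdf(1 - alpha / 2)      # about 1.645 for 90%
low, high = x_bar - z_star * se, x_bar + z_star * se

# Two-sided P-value for H0: mu = mu_0.
z = (x_bar - mu_0) / se
p = 2 * (1 - std_normal.cdf(abs(z)))

rejects = p <= alpha                   # decision of the level-alpha test
outside = not (low <= mu_0 <= high)    # is mu_0 outside the interval?
print(f"90% CI: {low:.2f} to {high:.2f}")
print(f"P-value = {p:.4f}, reject H0: {rejects}, mu_0 outside CI: {outside}")
```

Both checks agree: 0 lies inside the interval and the test does not reject H0 at α = 0.10.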