1 Inference about Comparing Two Populations Chapter 13



13.1 Introduction

• In previous discussions we presented methods designed to make an inference about characteristics of a single population. For example, we estimated the population mean and tested hypotheses about the value of the standard deviation.

• In practice, however, we often need to study the relationship between two populations.
– For example, to study the effect of a new drug on blood pressure, we can test the relationship between the mean blood pressure of two groups of individuals: those who take the drug and those who don't.
– Or, as part of an election campaign, we may be interested in the effect a certain ad has on voters' preferences. In this case we can estimate the difference in the proportion of voters who prefer one candidate before and after the ad is televised.


• This chapter presents a variety of techniques whose objective is to compare two populations.
• These techniques are designed to study the:
– difference between two means,
– ratio of two variances,
– difference between two proportions.


13.2 Inference about the Difference between Two Means: Independent Samples

• We look at the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$, because it is strongly related to a normal distribution whose mean is $\mu_1 - \mu_2$ (details next).
• Two random samples are therefore drawn from the two populations of interest, and their means $\bar{x}_1$ and $\bar{x}_2$ are calculated.
• We'll look at the relationship between the two population means by analyzing the value of $\bar{x}_1 - \bar{x}_2$.

The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

• $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.
• $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal but both sample sizes are sufficiently large (greater than 30).
• The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
• The variance of $\bar{x}_1 - \bar{x}_2$ is the sum of the two variances of the sample means (they add because the two samples are independent):

$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
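The three properties above can be checked numerically. The following is a minimal Python sketch (standard library only) that simulates many pairs of independent normal samples and compares the empirical mean and variance of $\bar{x}_1 - \bar{x}_2$ with $\mu_1 - \mu_2$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2$. The population parameters and the helper name `simulate_mean_differences` are hypothetical, chosen only for illustration.

```python
import random
import statistics


def simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps, seed=0):
    """Return `reps` simulated values of x̄1 - x̄2 from independent normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs


# Hypothetical population parameters (illustration only).
mu1, sigma1, n1 = 50.0, 10.0, 40
mu2, sigma2, n2 = 45.0, 15.0, 60

diffs = simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=5000)

theory_mean = mu1 - mu2                       # expected value: mu1 - mu2 = 5.0
theory_var = sigma1**2 / n1 + sigma2**2 / n2  # variance: 2.5 + 3.75 = 6.25

print("simulated mean:", statistics.fmean(diffs), "theory:", theory_mean)
print("simulated variance:", statistics.variance(diffs), "theory:", theory_var)
```

With 5,000 replications the simulated mean and variance should land close to 5.0 and 6.25. Increasing `reps` tightens the agreement at the cost of run time.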

Making an inference about $\mu_1 - \mu_2$

• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal, we can write:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

• Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.
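When $\sigma_1^2$ and $\sigma_2^2$ are known, the Z statistic and the matching confidence interval $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ can be computed directly. Below is a minimal sketch using only Python's standard library, with `statistics.NormalDist` supplying $z_{\alpha/2}$. The function names and the summary numbers in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, delta0=0.0):
    """Z statistic for H0: mu1 - mu2 = delta0 when the population variances are known."""
    se = math.sqrt(var1 / n1 + var2 / n2)  # standard deviation of x̄1 - x̄2
    return ((xbar1 - xbar2) - delta0) / se


def two_sample_z_interval(xbar1, xbar2, var1, var2, n1, n2, conf=0.95):
    """Confidence interval for mu1 - mu2 when the population variances are known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}, about 1.96 for 95%
    se = math.sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical summary data: sigma1^2 = 100, sigma2^2 = 225, n1 = 40, n2 = 60.
z = two_sample_z(52.0, 48.0, 100.0, 225.0, 40, 60)
lcl, ucl = two_sample_z_interval(52.0, 48.0, 100.0, 225.0, 40, 60)
print(z)  # 1.6
print(lcl, ucl)
```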

• In practice, the Z statistic is rarely used, because the population variances $\sigma_1^2$ and $\sigma_2^2$ are usually unknown.
• Instead, we construct a t statistic by replacing them with the sample variances $S_1^2$ and $S_2^2$.

• Two cases are considered when producing the t statistic:
– The two unknown population variances are equal.
– The two unknown population variances are not equal.

Inference about $\mu_1 - \mu_2$: Equal variances

• If the two variances $\sigma_1^2$ and $\sigma_2^2$ are equal to one another, then their estimates $S_1^2$ and $S_2^2$ estimate the same value.
• Therefore, we can pool the two sample variances and obtain a better estimate of the common population variance, based on a larger amount of information.
• This is done by forming the pooled variance estimate, described next.

• Calculate the pooled variance estimate by:

$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

To get some intuition about this pooled estimate, note that we can rewrite it as

$$S_p^2 = \frac{n_1 - 1}{n_1 + n_2 - 2}\,S_1^2 + \frac{n_2 - 1}{n_1 + n_2 - 2}\,S_2^2$$

which has the form of a weighted average of the two sample variances. The weights are the relative sample sizes (ignoring the '-1' and '-2' adjustments may make this structure easier to see), and they sum to 1. A larger sample receives a larger weight and thus influences the pooled estimate more.

Example: $S_1^2 = 25$, $S_2^2 = 30$, $n_1 = 10$, $n_2 = 15$. Then

$$S_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = \frac{645}{23} \approx 28.0435$$
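The pooled-variance formula and the worked example can be reproduced in a few lines of Python. This is a minimal sketch. `pooled_variance` is a hypothetical helper name, and the weights `w1`, `w2` show the weighted-average form.

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled estimate S_p^2 of the common variance (equal-variances case)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Example from the text: S1^2 = 25, S2^2 = 30, n1 = 10, n2 = 15.
sp2 = pooled_variance(25, 10, 30, 15)
print(round(sp2, 4))  # 28.0435

# Weighted-average view: weights (n_i - 1) / (n1 + n2 - 2) sum to 1.
w1 = (10 - 1) / (10 + 15 - 2)
w2 = (15 - 1) / (10 + 15 - 2)
print(round(w1 * 25 + w2 * 30, 4))  # 28.0435
```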

• Construct the t statistic as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{d.f.} = n_1 + n_2 - 2$$

Note how $S_p^2$ replaces both $S_1^2$ and $S_2^2$.
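Here is a minimal sketch of the equal-variances t statistic computed from summary statistics. The function name `pooled_t` is hypothetical. The usage lines plug in the summary values reported for Examples 13.2 and 13.3 later in this chapter. Critical values are not computed here because Python's standard library has no t quantile function (a library such as SciPy offers `scipy.stats.t.ppf`), so compare the result with a t table.

```python
import math


def pooled_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Equal-variances t statistic for H0: mu1 - mu2 = delta0, with its d.f."""
    sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                       # S_p^2 replaces both variances
    t = ((xbar1 - xbar2) - delta0) / se
    return t, n1 + n2 - 2


# Example 13.2 (chair assembly times): t should be close to 0.93 with 48 d.f.
t_chairs, df_chairs = pooled_t(6.288, 6.016, 0.8478, 1.3031, 25, 25)

# Example 13.3 (MBA salaries): t should be close to 1.04 with 48 d.f.
t_salaries, df_salaries = pooled_t(65624, 60423, 360433294, 262228559, 25, 25)
print(round(t_chairs, 4), df_chairs)
print(round(t_salaries, 4), df_salaries)
```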

Inference about $\mu_1 - \mu_2$: Unequal variances

• When the two variances are not assumed equal, construct the t statistic and its degrees of freedom as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$\text{d.f.} = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
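The unequal-variances statistic and its degrees of freedom (the Welch-Satterthwaite approximation) are easy to mis-key by hand. Below is a minimal Python sketch with a hypothetical function name, using the Example 13.1 summary values that appear later in this section. It also shows that the resulting d.f. is smaller than the $n_1 + n_2 - 2$ of the equal-variances case.

```python
import math


def welch_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Unequal-variances t statistic and its (non-integer) degrees of freedom."""
    a = s1_sq / n1  # estimated variance of x̄1
    b = s2_sq / n2  # estimated variance of x̄2
    t = ((xbar1 - xbar2) - delta0) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Example 13.1 summary values (high-fiber cereal consumers vs. non-consumers).
t, df = welch_t(604.02, 633.23, 4103, 10670, 43, 107)
print(round(t, 2), round(df, 1))  # -2.09 122.6
print(df < 43 + 107 - 2)          # True: fewer d.f. than the pooled test's 148
```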

Which case to use: equal variances or unequal variances?

• Whenever there is insufficient evidence that the variances are unequal, it is preferable to run the equal-variances t-test.
• This is because, for any two given samples, the number of degrees of freedom for the equal-variances case is greater than or equal to the number of degrees of freedom for the unequal-variances case, and more degrees of freedom give smaller critical values.


Example: Making an inference about $\mu_1 - \mu_2$

• Example 13.1
– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?
– A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.
– For each person, the number of calories consumed at lunch was recorded.

Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
...         ...

Solution:
• The data are quantitative.
• The parameter to be tested is the difference between two means.
• The claim to be tested is: the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).

• The hypotheses are:

H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 < 0$

where $\mu_1$ = mean caloric intake for fiber consumers and $\mu_2$ = mean caloric intake for fiber non-consumers.

– To check the relationship between the variances, we use computer output to find the sample variances (Xm13-1.xls). From the data we have $S_1^2 = 4103$ and $S_2^2 = 10{,}670$.
– It appears that the variances are unequal.

• Solving by hand
– From the data we have:

$$\bar{x}_1 = 604.02,\quad \bar{x}_2 = 633.23,\quad s_1^2 = 4{,}103,\quad s_2^2 = 10{,}670$$

$$\text{d.f.} = \frac{\left(\dfrac{4103}{43} + \dfrac{10670}{107}\right)^2}{\dfrac{(4103/43)^2}{43 - 1} + \dfrac{(10670/107)^2}{107 - 1}} = 122.6 \approx 123$$

• Solving by hand
– H1: $\mu_1 - \mu_2 < 0$. The rejection region is $t < -t_{\alpha,\text{d.f.}} = -t_{.05,123} \approx -1.658$.

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(604.02 - 633.23) - 0}{\sqrt{\dfrac{4103}{43} + \dfrac{10670}{107}}} = -2.09$$

t-Test: Two-Sample Assuming Unequal Variances (Xm13-1.xls)

                               Consumers   Nonconsumers
Mean                           604.023     633.234
Variance                       4102.98     10669.8
Observations                   43          107
Hypothesized Mean Difference   0
df                             123
t Stat                         -2.09107
P(T<=t) one-tail               0.01929
t Critical one-tail            1.65734
P(T<=t) two-tail               0.03858
t Critical two-tail            1.97944

At the 5% significance level there is sufficient evidence to reject the null hypothesis: -2.09107 < -1.65734, and equivalently the p-value .01929 < .05.

• Solving by hand
The confidence interval estimator for the difference between two means when the variances are unequal is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (604.02 - 633.23) \pm 1.9796\sqrt{\frac{4103}{43} + \frac{10670}{107}} = -29.21 \pm 27.65 = [-56.86,\ -1.56]$$

Note that the confidence interval for the difference between the two means falls entirely in the negative region, [-56.86, -1.56]. Even at best the difference between the two means is $\mu_1 - \mu_2 = -1.56$, so we can be 95% confident that $\mu_1$ is smaller than $\mu_2$.

This conclusion agrees with the results of the test performed before.
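The interval can be reproduced from the summary statistics. This is a minimal sketch: `welch_interval` is a hypothetical name, and `t_crit` = 1.9796 is the table value used in the hand calculation, since the standard library has no t quantile function.

```python
import math


def welch_interval(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2 with unequal variances; t_crit = t_{alpha/2} taken from a table."""
    se = math.sqrt(s1_sq / n1 + s2_sq / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se


# Example 13.1: 95% interval with t_{.025} = 1.9796 (value used in the text).
lcl, ucl = welch_interval(604.02, 633.23, 4103, 10670, 43, 107, t_crit=1.9796)
print(round(lcl, 2), round(ucl, 2))  # -56.86 -1.56
```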

Example: Making an inference about $\mu_1 - \mu_2$

• Example 13.2
– An ergonomic chair can be assembled using two different sets of operations (Method A and Method B).
– The operations manager would like to know whether the assembly times under the two methods differ.

• Example 13.2 (continued)
– Two samples are randomly and independently selected:
• A sample of 25 workers assembled the chair using design A.
• A sample of 25 workers assembled the chair using design B.
• The assembly times were recorded.
– Do the assembly times of the two methods differ?

Assembly times in minutes

Design-A   Design-B
6.8        5.2
5.0        6.7
7.9        5.7
5.2        6.6
7.6        8.5
5.0        6.5
5.9        5.9
5.2        6.7
6.5        6.6
...        ...

Solution
• The data are quantitative.
• The parameter of interest is the difference between two population means.
• The claim to be tested is whether a difference between the two designs exists.

• Solving by hand
– The hypotheses are:

H0: $\mu_1 - \mu_2 = 0$
H1: $\mu_1 - \mu_2 \neq 0$

– To check the relationship between the two variances, we calculate the values of $S_1^2$ and $S_2^2$ (Xm13-02.xls).
– From the data we have $S_1^2 = 0.8478$ and $S_2^2 = 1.3031$, so $\sigma_1^2$ and $\sigma_2^2$ appear to be equal.

• Solving by hand (continued)
– To calculate the t statistic we have:

$$\bar{x}_1 = 6.288,\quad \bar{x}_2 = 6.016,\quad s_1^2 = 0.8478,\quad s_2^2 = 1.3031$$

$$S_p^2 = \frac{(25 - 1)(0.848) + (25 - 1)(1.303)}{25 + 25 - 2} = 1.076$$

$$t = \frac{(6.288 - 6.016) - 0}{\sqrt{1.076\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 0.93, \qquad \text{d.f.} = 25 + 25 - 2 = 48$$

• For $\alpha = 0.05$, the two-tail rejection region is $t < -t_{\alpha/2,\nu} = -t_{.025,48} \approx -2.009$ or $t > t_{\alpha/2,\nu} = t_{.025,48} \approx 2.009$.
• The test: since $-2.009 < 0.93 < 2.009$, there is insufficient evidence to reject the null hypothesis.

[Figure: rejection regions below -2.009 and above 2.009; t = 0.93 falls between them]

t-Test: Two-Sample Assuming Equal Variances (Xm13-02.xls)

                               Design-A      Design-B
Mean                           6.288         6.016
Variance                       0.847766667   1.3030667
Observations                   25            25
Pooled Variance                1.075416667
Hypothesized Mean Difference   0
df                             48
t Stat                         0.927332603
P(T<=t) one-tail               0.179196744
t Critical one-tail            1.677224191
P(T<=t) two-tail               0.358393488
t Critical two-tail            2.01063358

-2.0106 < .9273 < +2.0106, and the two-tail p-value .35839 > .05.

• Conclusion: From this experiment, there is insufficient evidence at the 5% significance level to infer that the two assembly methods differ in mean assembly time.

Example: Making an inference about $\mu_1 - \mu_2$: Constructing a Confidence Interval

A 95% confidence interval for $\mu_1 - \mu_2$ when the two variances are equal is calculated as follows:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176,\ 0.8616]$$

Thus, at the 95% confidence level, $-0.3176 < \mu_1 - \mu_2 < 0.8616$.

Notice: zero is included in the confidence interval, and therefore the two mean values could be equal.
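The equal-variances interval can be sketched the same way. This is a minimal sketch: `pooled_interval` is a hypothetical name, and `t_crit` = 2.0106 is taken from the Excel output rather than computed. Because the code does not round $S_p^2$ to 1.075, it gives about [-0.318, 0.862], slightly different in the fourth decimal from the hand calculation.

```python
import math


def pooled_interval(xbar1, xbar2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2 with equal variances; t_crit = t_{alpha/2, n1+n2-2} from a table."""
    sp2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    half_width = t_crit * math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# Example 13.2: t_{.025,48} = 2.0106 (from the Excel output).
lcl, ucl = pooled_interval(6.288, 6.016, 0.8478, 1.3031, 25, 25, t_crit=2.0106)
print(round(lcl, 4), round(ucl, 4))
print(lcl < 0 < ucl)  # True: zero lies inside the interval
```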

Checking the required conditions for the equal variances case (example 13.2)

The data appear to be approximately normal.

[Figure: histograms of assembly times for Design A (bins 5 to 8.2 minutes) and Design B (bins 4.2 to 7.4 minutes)]

13.4 Matched Pairs Experiment: Dependent Samples

• What is a matched pairs experiment?
• A matched pairs experiment is a sampling design in which every two observations share some characteristic. For example, suppose we are interested in increasing workers' productivity. We establish a compensation program and want to study its effectiveness. We could select two groups of workers, measure productivity before and after the program is established, and run a test as we did before.
• But if we believe workers' age is a factor that may affect changes in productivity, we can divide the workers into different age groups, select a worker from each age group, and measure his or her productivity twice: once before and once after the program is established. Each two observations constitute a matched pair, and because both come from the same worker (and therefore the same age group), they are not independent.

Why are matched pairs experiments needed?

The following example demonstrates a situation where a matched pairs experiment is the correct approach to testing the difference between two population means.

13.4 Matched Pairs Experiment: Additional example

Example 13.3
– To investigate the job offers obtained by MBA graduates, a study focusing on salaries was conducted.
– In particular, the salaries offered to finance majors were compared to those offered to marketing majors.
– Two random samples of 25 graduates in each discipline were selected, and the highest salary offer was recorded for each one.
– From the data, can we infer that, among MBAs, finance majors obtain higher salary offers than marketing majors?

• Solution
– Compare two populations of quantitative data.
– The parameter tested is $\mu_1 - \mu_2$, where
$\mu_1$ = the mean of the highest salary offered to Finance MBAs
$\mu_2$ = the mean of the highest salary offered to Marketing MBAs
– H0: $\mu_1 - \mu_2 = 0$; H1: $\mu_1 - \mu_2 > 0$

Finance    Marketing
61,228     73,361
51,836     36,956
20,620     63,627
73,356     71,069
84,186     40,203
...        ...

• Solution (continued)
From Xm13-3.xls we have:

$$\bar{x}_1 = 65{,}624,\quad \bar{x}_2 = 60{,}423,\quad s_1^2 = 360{,}433{,}294,\quad s_2^2 = 262{,}228{,}559$$

• Let us assume equal variances.

t-Test: Two-Sample Assuming Equal Variances

                               Finance     Marketing
Mean                           65624       60423
Variance                       360433294   262228559
Observations                   25          25
Pooled Variance                311330926
Hypothesized Mean Difference   0
df                             48
t Stat                         1.04215119
P(T<=t) one-tail               0.15128114
t Critical one-tail            1.67722419
P(T<=t) two-tail               0.30256227
t Critical two-tail            2.01063358

There is insufficient evidence to conclude that Finance MBAs are offered higher salaries than Marketing MBAs (t = 1.042 < 1.677; one-tail p-value .151 > .05).

The effect of large sample variability

• Question
– The difference between the sample means is 65,624 - 60,423 = 5,201.
– So why could we not reject H0 and favor H1?

• Answer:
– $S_p^2$ is large (because the sample variances are large): $S_p^2 = 311{,}330{,}926$.
– A large variance reduces the value of the t statistic, and it becomes more difficult to reject H0.

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Recall that rejection of the null hypothesis occurs when t is sufficiently large ($t > t_\alpha$). A large $S_p^2$ reduces t, and therefore it does not fall in the rejection region.

The matched pairs experiment

• We are looking for a hypothesis formulation in which the variability of the two samples has been reduced.
• By taking matched pair observations and testing the differences per pair, we achieve two goals:
– We still test $\mu_1 - \mu_2$ (see explanation next).
– The variability used to calculate the t statistic is usually smaller (see explanation next).

The matched pairs experiment: Are we still testing $\mu_1 - \mu_2$?

• Note that the difference between the two means is equal to the mean of the differences of the paired observations.
• A short example:

Group 1   Group 2   Difference
10        12        -2
15        11        +4

Mean1 = 12.5, Mean2 = 11.5; Mean1 - Mean2 = 1; Mean of differences = 1
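Here is a quick check of this claim with the two-pair example, in Python (standard library only):

```python
from statistics import fmean

group1 = [10, 15]
group2 = [12, 11]

# Difference within each matched pair.
differences = [a - b for a, b in zip(group1, group2)]

print(differences)                    # [-2, 4]
print(fmean(group1) - fmean(group2))  # 1.0
print(fmean(differences))             # 1.0
```

The equality holds for any paired data because the mean is linear: averaging $x_{1i} - x_{2i}$ over the pairs gives $\bar{x}_1 - \bar{x}_2$.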

The matched pairs experiment: Reducing the variability

Observations might markedly differ...

[Figure: the range of observations in sample A and the range of observations in sample B]

...but the differences between pairs of observations might have much smaller variability.

[Figure: the range of the differences, shown on an axis marked at 0]

The matched pairs experiment

• Example 13.4 (13.3 part II)
– It was suspected that salary offers were affected by students' GPA, which caused $S_1^2$ and $S_2^2$ to increase.
– To reduce this variability, the following procedure was used:
• 25 ranges of GPAs were predetermined.
• Students from each major were randomly selected, one from each GPA range.
• The highest salary offer for each student was recorded.
– From the data presented, can we conclude that Finance majors are offered higher salaries?

The matched pairs hypothesis test

• Solution (by hand)
– The parameter tested is $\mu_D$ (= $\mu_1 - \mu_2$).
– The hypotheses:

H0: $\mu_D = 0$
H1: $\mu_D > 0$

– The t statistic:

$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n_D}}, \qquad \text{d.f.} = n_D - 1$$

The rejection region is $t > t_{.05,25-1} = 1.711$.

• Solution (by hand), continued
– From the data (Xm13-4.xls), calculate the difference for each GPA group:

GPA Group   Finance   Marketing   Difference
1           95171     89329       5842
2           88009     92705       -4696
3           98089     99205       -1116
4           106322    99003       7319
5           74566     74825       -259
6           87089     77038       10051
7           88664     78272       10392
8           71200     59462       11738
9           69367     51555       17812
10          82618     81591       1027
...         ...       ...         ...

Using Descriptive Statistics in Excel on the differences we get:

Mean                 5064.52
Standard Error       1329.3791
Median               3285
Mode                 #N/A
Standard Deviation   6646.8953
Sample Variance      44181217
Kurtosis             -0.659419
Skewness             0.359681
Range                23533
Minimum              -5721
Maximum              17812
Sum                  126613
Count                25

• Solution (by hand), continued
– Calculate t:

$$\bar{x}_D = 5{,}065,\qquad s_D = 6{,}647$$

$$t = \frac{\bar{x}_D - \mu_D}{s_D/\sqrt{n}} = \frac{5065 - 0}{6647/\sqrt{25}} = 3.81$$

See the conclusion later.
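Only 10 of the 25 pairs are listed above, so this minimal sketch computes the matched pairs t statistic from the reported summary ($\bar{x}_D = 5065$, $s_D = 6647$, $n = 25$). The function names are hypothetical. `paired_t` shows the same computation starting from raw paired observations, and the toy pairs in the check are the two-pair example from the earlier slide.

```python
import math
import statistics


def paired_t_from_summary(mean_d, sd_d, n, mu_d0=0.0):
    """Matched pairs t statistic (x̄_D - mu_D) / (s_D / sqrt(n)) and its n - 1 d.f."""
    return (mean_d - mu_d0) / (sd_d / math.sqrt(n)), n - 1


def paired_t(sample1, sample2, mu_d0=0.0):
    """Same statistic computed from raw paired observations (one difference per pair)."""
    diffs = [a - b for a, b in zip(sample1, sample2)]
    return paired_t_from_summary(statistics.fmean(diffs), statistics.stdev(diffs), len(diffs), mu_d0)


# Example 13.4 summary of the 25 differences: x̄_D = 5065, s_D = 6647.
t, df = paired_t_from_summary(5065, 6647, 25)
print(round(t, 2), df)  # 3.81 24
```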

Using Data Analysis in Excel (Xm13-4.xls):

t-Test: Paired Two Sample for Means

                               Finance    Marketing
Mean                           65438.2    60373.68
Variance                       4.45E+08   4.69E+08
Observations                   25         25
Pearson Correlation            0.952025
Hypothesized Mean Difference   0
df                             24
t Stat                         3.809688
P(T<=t) one-tail               0.000426
t Critical one-tail            1.710882
P(T<=t) two-tail               0.000851
t Critical two-tail            2.063898

Recall: the rejection region is $t > t_\alpha$. Indeed, 3.809 > 1.7108, and equivalently the p-value .000426 < .05.

Conclusion: There is sufficient evidence to infer at the 5% significance level that the Finance MBAs' highest salary offer is, on average, higher than that of the Marketing MBAs.

The matched pairs mean difference estimation

Confidence interval estimator of $\mu_D$:

$$\bar{x}_D \pm t_{\alpha/2,\,n-1}\frac{s_D}{\sqrt{n}}$$

Example 13.5: The 95% confidence interval of the mean difference in example 13.4 is

$$5065 \pm 2.064\,\frac{6647}{\sqrt{25}} = 5{,}065 \pm 2{,}744$$
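This minimal sketch reproduces both the hand calculation and the Data Analysis Plus limits. `paired_interval` is a hypothetical name, and the critical values 2.064 and 2.063898 are the table and Excel values for $t_{.025,24}$, supplied by hand rather than computed. The small gap between the hand result and the software result comes from rounding the mean and standard deviation.

```python
import math


def paired_interval(mean_d, sd_d, n, t_crit):
    """CI for mu_D: x̄_D ± t_{alpha/2, n-1} * s_D / sqrt(n); t_crit comes from a table."""
    half_width = t_crit * sd_d / math.sqrt(n)
    return mean_d - half_width, mean_d + half_width


# Hand calculation from the text: rounded summary values, t_{.025,24} = 2.064.
lcl, ucl = paired_interval(5065, 6647, 25, t_crit=2.064)

# Unrounded Excel summary (mean 5064.52, s.d. 6646.8953) and t_{.025,24} = 2.063898.
lcl2, ucl2 = paired_interval(5064.52, 6646.8953, 25, t_crit=2.063898)
print(round(lcl), round(ucl))    # 2321 7809
print(round(lcl2), round(ucl2))  # 2321 7808
```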

Using Data Analysis Plus (Xm13-4.xls): first calculate the differences for each pair, then run the confidence interval procedure (t-Estimate: Mean) in Data Analysis Plus.

                     Difference
Mean                 5065
Standard Deviation   6647
LCL                  2321
UCL                  7808

Checking the required conditions for the paired observations case

• The validity of the results depends on the normality of the differences.

[Figure: histogram of the differences, with bins from -3000 to 18000]

13.5 Inferences about the ratio of two variances

• In this section we draw inferences about the relationship between two population variances.
• This question is interesting because:
– Variances can be used to evaluate the consistency of processes.
– The relationship between the variances determines the technique used to test relationships between mean values.

Parameter tested and statistic

• The parameter tested is $\sigma_1^2/\sigma_2^2$.
• The statistic used is

$$F = \frac{s_1^2}{s_2^2}$$

• The sampling distribution of $s_1^2/s_2^2$:
– The statistic $(s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$ follows the F distribution with numerator d.f. $= n_1 - 1$ and denominator d.f. $= n_2 - 1$ (provided the two populations are normal).

– Our null hypothesis is always

H0: $\sigma_1^2/\sigma_2^2 = 1$

– Under this null hypothesis the F statistic

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \quad\text{becomes}\quad F = \frac{S_1^2}{S_2^2}$$


Testing the ratio of two population variances

Example 13.6 (revisiting 13.1): Calories intake at lunch

In order to test whether having a high-fiber breakfast reduces caloric intake at lunch (see example 13.1), we need to decide whether the variances are equal or not.

The hypotheses are:

H0: $\sigma_1^2/\sigma_2^2 = 1$
H1: $\sigma_1^2/\sigma_2^2 \neq 1$

Testing the ratio of two population variances: Solving by hand

– The rejection region is

$$F > F_{\alpha/2,\nu_1,\nu_2} = F_{.025,42,106} \approx F_{.025,40,120} = 1.61$$

or

$$F < F_{1-\alpha/2,\nu_1,\nu_2} = \frac{1}{F_{\alpha/2,\nu_2,\nu_1}} = \frac{1}{F_{.025,106,42}} \approx \frac{1}{F_{.025,120,40}} = \frac{1}{1.72} = .58$$

– The F statistic value is $F = S_1^2/S_2^2 = .3845$.
– Conclusion: Because .3845 < .58, we can reject the null hypothesis in favor of the alternative hypothesis and conclude that there is sufficient evidence in the data to argue, at the 5% significance level, that the variances of the two groups differ.
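Here is a minimal sketch of the two-tail F test. The helper name is hypothetical. The standard library has no F quantile function, so the critical values are passed in from the F table exactly as in the hand calculation ($1/F_{.025,120,40} = 1/1.72$ and $F_{.025,40,120} = 1.61$).

```python
def f_test_equal_variances(s1_sq, s2_sq, f_lower, f_upper):
    """Two-tail F test of H0: sigma1^2 / sigma2^2 = 1.

    f_lower and f_upper are the critical values F_{1-alpha/2, v1, v2} and
    F_{alpha/2, v1, v2}, read from an F table.
    """
    f = s1_sq / s2_sq
    reject = f < f_lower or f > f_upper
    return f, reject


# Example 13.6: lower critical value 1 / F_{.025,120,40} = 1 / 1.72, upper F_{.025,40,120} = 1.61.
f, reject = f_test_equal_variances(4102.98, 10669.77, f_lower=1 / 1.72, f_upper=1.61)
print(round(f, 4), reject)  # 0.3845 True
```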

Example 13.6 (revisiting 13.1): from Data Analysis (see Xm13-1.xls)

F-Test Two-Sample for Variances

                      Consumers     Nonconsumers
Mean                  604.0232558   633.2336449
Variance              4102.975637   10669.76565
Observations          43            107
df                    42            106
F                     0.384542245
P(F<=f) one-tail      0.000368433
F Critical one-tail   0.637072617

Data Analysis reports a one-tail result. For the two-tail test of H0: $\sigma_1^2/\sigma_2^2 = 1$ against H1: $\sigma_1^2/\sigma_2^2 \neq 1$, the p-value is 2(0.000368) ≈ 0.00074 < .05.

Estimating the Ratio of Two Population Variances

• From the statistic $F = (s_1^2/\sigma_1^2)/(s_2^2/\sigma_2^2)$ we can isolate $\sigma_1^2/\sigma_2^2$ and build the following confidence interval:

$$\left(\frac{s_1^2}{s_2^2}\right)\frac{1}{F_{\alpha/2,\nu_1,\nu_2}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \left(\frac{s_1^2}{s_2^2}\right)F_{\alpha/2,\nu_2,\nu_1}, \qquad \text{where } \nu_1 = n_1 - 1 \text{ and } \nu_2 = n_2 - 1$$

• Example 13.7
– Determine the 95% confidence interval estimate of the ratio of the two population variances in example 13.1.
– Solution
• We find $F_{\alpha/2,\nu_1,\nu_2} = F_{.025,40,120} = 1.61$ (approximately) and $F_{\alpha/2,\nu_2,\nu_1} = F_{.025,120,40} = 1.72$ (approximately).
• LCL = $(s_1^2/s_2^2)\,[1/F_{\alpha/2,\nu_1,\nu_2}] = (4102.98/10{,}669.77)[1/1.61] = .2388$
• UCL = $(s_1^2/s_2^2)\,[F_{\alpha/2,\nu_2,\nu_1}] = (4102.98/10{,}669.77)[1.72] = .6614$
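The same limits can be computed in Python. This is a minimal sketch with a hypothetical function name. The two F values are the table approximations used in Example 13.7, passed in by hand.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_v1_v2, f_v2_v1):
    """CI for sigma1^2 / sigma2^2.

    f_v1_v2 = F_{alpha/2, v1, v2} and f_v2_v1 = F_{alpha/2, v2, v1}, both from an F table.
    """
    ratio = s1_sq / s2_sq
    return ratio / f_v1_v2, ratio * f_v2_v1


# Example 13.7: F_{.025,40,120} ≈ 1.61 and F_{.025,120,40} ≈ 1.72 (approximations used in the text).
lcl, ucl = variance_ratio_interval(4102.98, 10669.77, 1.61, 1.72)
print(round(lcl, 4), round(ucl, 4))  # 0.2388 0.6614
```

The whole interval lies below 1, which is consistent with the F test's conclusion that the two variances differ.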

13.6 Inference about the difference between two population proportions

• In this section we deal with two populations whose data are nominal.
• For nominal data we compare the population proportions of the occurrence of a certain event.
• Examples:
– Comparing the effectiveness of a new drug vs. an old one
– Comparing market share before and after an advertising campaign
– Comparing defective rates between two machines

Parameter tested and statistic

• Parameter
– When the data are nominal, we can only count the occurrences of a certain event in the two populations and calculate proportions.
– The parameter tested is therefore $p_1 - p_2$.
• Statistic
– An unbiased estimator of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$ (the difference between the sample proportions).

Sampling distribution of $\hat{p}_1 - \hat{p}_2$

• Two random samples are drawn from two populations.
• The number of successes in each sample is recorded.
• The sample proportions are computed.

Sample 1: sample size $n_1$, number of successes $x_1$, sample proportion $\hat{p}_1 = x_1/n_1$
Sample 2: sample size $n_2$, number of successes $x_2$, sample proportion $\hat{p}_2 = x_2/n_2$

• The statistic $\hat{p}_1 - \hat{p}_2$ is approximately normally distributed if $n_1p_1$, $n_1(1 - p_1)$, $n_2p_2$, $n_2(1 - p_2)$ are all greater than or equal to 5.
• The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$.
• The variance of $\hat{p}_1 - \hat{p}_2$ is $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$.

The z-statistic

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$

Because $p_1$ and $p_2$ are unknown, we use their estimates instead. Thus $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1 - \hat{p}_2)$ should all be greater than or equal to 5.

Testing $p_1 - p_2$

• There are two cases to consider:

Case 1: H0: $p_1 - p_2 = 0$
Calculate the pooled proportion

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

Then

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Case 2: H0: $p_1 - p_2 = D$ ($D$ is not equal to 0)
Do not pool the data; use $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$$
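Below is a minimal Python sketch of both cases, plus the sample-size check for the normal approximation. The function names are hypothetical. The usage reproduces Example 13.8, where $n_1 = 580 + 324 = 904$ and $n_2 = 604 + 442 = 1046$. The one-tail p-value uses `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist


def normal_approx_ok(x1, n1, x2, n2):
    """n1*p̂1, n1*(1-p̂1), n2*p̂2, n2*(1-p̂2) all >= 5 (these equal x1, n1-x1, x2, n2-x2)."""
    return all(count >= 5 for count in (x1, n1 - x1, x2, n2 - x2))


def two_proportion_z(x1, n1, x2, n2, d0=0.0):
    """Z statistic for H0: p1 - p2 = d0 (Case 1 pools when d0 == 0, Case 2 does not)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    if d0 == 0:
        p_pool = (x1 + x2) / (n1 + n2)  # Case 1: pooled proportion
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    else:
        # Case 2: separate sample proportions in the standard error
        se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - d0) / se


# Example 13.8: 580 of 904 buyers chose design A; 604 of 1046 chose design B.
print(normal_approx_ok(580, 904, 604, 1046))  # True
z = two_proportion_z(580, 904, 604, 1046)
p_value = 1 - NormalDist().cdf(z)  # one-tail p-value for H1: p1 - p2 > 0
print(round(z, 2))  # 2.89
```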

Testing $p_1 - p_2$ (Case 1)

• Example 13.8
– Management needs to decide which of two new packaging designs to adopt to help improve sales of a certain soap.
– A study is performed in two communities:
• Design A is distributed in Community 1.
• Design B is distributed in Community 2.
• The old design package is still offered in both communities.
– Design A is more expensive; therefore, to be financially viable it has to outsell design B.


Testing p1 – p2 (Case I)

• Summary of the experiment results
  – Community 1: 580 packages with new design A sold, 324 packages with old design sold
  – Community 2: 604 packages with new design B sold, 442 packages with old design sold
  – Use a 5% significance level and perform a test to find which type of packaging to use.


Testing p1 – p2 (Case I)

• Solution
  – The problem objective is to compare the sales of the two packaging designs (two populations).
  – The data are qualitative (for each customer: yes/no, was the new design purchased).
  – Population 1: purchases of Design A; Population 2: purchases of Design B
  – The hypotheses to test are
      H0: p1 – p2 = 0
      H1: p1 – p2 > 0
  – This is case 1 (the hypothesized difference is 0).


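Testing p1 – p2 (Case I)

• Solving by hand
  – For a 5% significance level the rejection region is $z > z_\alpha = z_{.05} = 1.645$.
  – From Xm13-08.xls we have $n_1 = 580 + 324 = 904$ and $n_2 = 604 + 442 = 1046$.

The sample proportions are

$$\hat{p}_1 = \frac{580}{904} = .6416, \qquad \hat{p}_2 = \frac{604}{1046} = .5774$$

The pooled proportion is

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{580 + 604}{904 + 1046} = .6072$$

The z statistic becomes

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.6416 - .5774}{\sqrt{.6072(1 - .6072)\left(\dfrac{1}{904} + \dfrac{1}{1046}\right)}} = 2.89$$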
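The hand calculation for Example 13.8 can be reproduced in Python to check the arithmetic. A minimal sketch: the counts come from the example, while the upper-tail p-value is computed with `math.erfc`, using $P(Z > z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$, rather than read from a normal table; this is a swapped-in method, not the Data Analysis Plus routine.

```python
import math

# Counts from Example 13.8.
x1, n1 = 580, 580 + 324        # Community 1: design A sales out of all packages sold
x2, n2 = 604, 604 + 442        # Community 2: design B sales out of all packages sold
z_crit = 1.645                 # z_.05 for the one-tail test

p1_hat = x1 / n1               # about .6416
p2_hat = x2 / n2               # about .5774
p_pool = (x1 + x2) / (n1 + n2) # about .6072

# Case 1 (H0: p1 - p2 = 0): pooled standard error.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

# Upper-tail p-value from the standard normal distribution.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"p1_hat={p1_hat:.4f} p2_hat={p2_hat:.4f} pooled={p_pool:.4f}")
print(f"z={z:.2f} p-value={p_value:.4f} reject H0: {z > z_crit}")
```

The computed p-value, about 0.0019, agrees with the one-tail probability in the Excel output for this example.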


Testing p1 – p2 (Case I)

• Conclusion: At the 5% significance level there is sufficient evidence to infer that the proportion of sales with design A is greater than the proportion of sales with design B (since 2.89 > 1.645).


Testing p1 – p2 (Case I)

• Excel (Data Analysis Plus), Xm13-08.xls

z-Test: Two Proportions

                              Community 1   Community 2
sample proportions                 0.6416        0.5774
Observations                          904          1046
Hypothesized Difference                 0
z Stat                               2.89
P(Z<=z) one tail                   0.0019
z Critical one-tail                1.6449
P(Z<=z) two-tail                   0.0038
z Critical two-tail                  1.96

• Conclusion: Since 2.89 > 1.645, there is sufficient evidence in the data to conclude, at the 5% significance level, that design A will outsell design B.



Testing p1 – p2 (Case II)

• Example 13.9 (Revisit Example 13.8)
  – Management needs to decide which of two new packaging designs to adopt to help improve sales of a certain soap.
  – A study is performed in two communities:
    • Design A is distributed in Community 1.
    • Design B is distributed in Community 2.
    • The old package design is still offered in both communities.
  – For design A to be financially viable, it has to outsell design B by at least 3%.


Testing p1 – p2 (Case II)

• Summary of the experiment results
  – Community 1: 580 packages with new design A sold, 324 packages with old design sold
  – Community 2: 604 packages with new design B sold, 442 packages with old design sold
• Use a 5% significance level and perform a test to find which type of packaging to use.


Testing p1 – p2 (Case II)

• Solution
  – The hypotheses to test are
      H0: p1 – p2 = .03
      H1: p1 – p2 > .03
  – This is case 2 of the test for the difference between proportions (the hypothesized difference is not equal to zero).


Testing p1 – p2 (Case II)

• Solving by hand

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}} = \frac{\left(\dfrac{580}{580 + 324} - \dfrac{604}{604 + 442}\right) - .03}{\sqrt{\dfrac{.642(1 - .642)}{904} + \dfrac{.577(1 - .577)}{1046}}} = 1.58$$

The rejection region is $z > z_\alpha = z_{.05} = 1.645$.

Conclusion: Since 1.58 < 1.645, do not reject the null hypothesis. There is insufficient evidence to infer that packaging with Design A will outsell that of Design B by 3% or more.
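The case 2 calculation can be checked the same way in Python. This sketch assumes nothing beyond the example's counts; the p-value uses `math.erfc` for the upper normal tail (a swapped-in method, not the Excel routine), and the second statistic repeats the hand solution's step of rounding the proportions to three decimals before computing.

```python
import math

# Counts from Example 13.9 (same data as Example 13.8).
x1, n1 = 580, 580 + 324    # Community 1: design A
x2, n2 = 604, 604 + 442    # Community 2: design B
D = 0.03                   # hypothesized difference under H0
z_crit = 1.645             # z_.05

# Case 2: do not pool; each sample proportion supplies its own variance term.
p1_hat, p2_hat = x1 / n1, x2 / n2
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z = ((p1_hat - p2_hat) - D) / se
p_value = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail P(Z > z)

# Same statistic with the proportions rounded to .642 and .577 first.
r1, r2 = 0.642, 0.577
se_r = math.sqrt(r1 * (1 - r1) / n1 + r2 * (1 - r2) / n2)
z_rounded = ((r1 - r2) - D) / se_r

print(f"z = {z:.4f} (unrounded), z = {z_rounded:.3f} (rounded proportions)")
print(f"one-tail p-value = {p_value:.3f}, reject H0: {z > z_crit}")
```

With unrounded proportions the statistic is about 1.5467, matching the Excel output; rounding to .642 and .577 first gives about 1.585, which the hand solution reports as 1.58. Both are below 1.645, so the conclusion does not change.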


Testing p1 – p2 (Case II)

• Using Excel (Data Analysis Plus), Xm13-08.xls

z-Test: Two Proportions

                              Community 1   Community 2
Sample Proportion                  0.6416        0.5774
Observations                          904          1046
Hypothesized Difference              0.03
z stat                             1.5467
P(Z<=z) one-tail                    0.061
z Critical one-tail                1.6449
P(Z<=z) two-tail                    0.122
z Critical two-tail                  1.96


Estimating p1 – p2

• Example (estimating the cost of a life saved)
  – Two drugs are used to treat heart attack victims:
    • Streptokinase (available since 1959, costs $460)
    • t-PA (genetically engineered, costs $2900)
  – The maker of t-PA claims that its drug outperforms Streptokinase.
  – An experiment was conducted in 15 countries:
    • 20,500 patients were given t-PA.
    • 20,500 patients were given Streptokinase.
    • The number of deaths from heart attacks was recorded.


Estimating p1 – p2

• Experiment results
  – A total of 1497 patients treated with Streptokinase died.
  – A total of 1292 patients treated with t-PA died.

• Estimate the cost per life saved by using t-PA instead of Streptokinase.


Estimating p1 – p2

• Solution
  – The problem objective is to compare the outcomes of two treatments.
  – The data are nominal (a patient lived or died).
  – The parameter estimated is p1 – p2, where
    • p1 = death rate with Streptokinase
    • p2 = death rate with t-PA


Estimating p1 – p2

• Solving by hand
  – Sample proportions:

$$\hat{p}_1 = \frac{1497}{20500} = .0730, \qquad \hat{p}_2 = \frac{1292}{20500} = .0630$$

  – The 95% confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

$$= (.0730 - .0630) \pm 1.96\sqrt{\frac{.0730(1 - .0730)}{20500} + \frac{.0630(1 - .0630)}{20500}} = .0100 \pm .0049$$

LCL = .0051, UCL = .0149
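The interval can be reproduced in a few lines of Python. A minimal sketch: $\hat{p}_1$ is the Streptokinase death rate and $\hat{p}_2$ the t-PA death rate, matching the hand calculation; the variable names are illustrative.

```python
import math

# Counts from the t-PA / Streptokinase experiment.
n = 20500                      # patients per treatment group
deaths_strep, deaths_tpa = 1497, 1292
z_half_alpha = 1.96            # z_.025 for a 95% interval

p1_hat = deaths_strep / n      # death rate with Streptokinase
p2_hat = deaths_tpa / n        # death rate with t-PA

# Unpooled standard error: the interval does not assume p1 = p2.
se = math.sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)
diff = p1_hat - p2_hat
margin = z_half_alpha * se
lcl, ucl = diff - margin, diff + margin
print(f"{diff:.4f} ± {margin:.4f}: LCL = {lcl:.4f}, UCL = {ucl:.4f}")
```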


Estimating p1 – p2

• Interpretation
  – We estimate that between .51% and 1.49% more heart attack victims will survive because of the use of t-PA.
  – The difference in cost per patient is 2900 - 460 = $2440.
  – Saving one additional life requires treating about 1/(p1 – p2) patients with t-PA, so the cost per life saved is 2440/(p1 – p2). It is estimated to be between 2440/.0149 = $163,758 and 2440/.0051 = $478,431.
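The cost-per-life arithmetic can be checked in Python as well. This sketch plugs the rounded confidence limits into $2440/(p_1 - p_2)$ and, like the example, counts only the difference in drug prices.

```python
extra_cost = 2900 - 460        # extra drug cost per patient treated with t-PA
lcl, ucl = 0.0051, 0.0149      # 95% limits for p1 - p2 (lives saved per patient)

# About 1/(p1 - p2) patients must be treated to save one extra life,
# so cost per life saved = extra_cost / (p1 - p2). The upper limit of
# p1 - p2 gives the lowest cost and the lower limit gives the highest.
cost_low = extra_cost / ucl
cost_high = extra_cost / lcl
print(f"Cost per life saved: ${cost_low:,.0f} to ${cost_high:,.0f}")
```

Because cost per life is a reciprocal of p1 – p2, the limits swap roles: the narrow interval for the death-rate difference becomes a wide interval for the cost.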