Copyright (c) Bani K. Mallick


STAT 651

Lecture 9

Topics in Lecture #9

Comparing two population means

Output: detailed look

The t-test


Book Sections Covered in Lecture #9

Chapter 6.2

Relevant SPSS Tutorials

Transformations of Data

2-sample t-test

Paired t-test


Lecture 8 Review: Comparing Two Populations

There are two populations

Take a sample from each population

The sample sizes need not be the same

Population 1: $n_1$

Population 2: $n_2$

Lecture 8 Review: Comparing Two Populations

Each will have a sample standard deviation

Population 1: $s_1$

Population 2: $s_2$

Lecture 8 Review: Comparing Two Populations

Each sample will have a sample mean

Population 1: $\bar{X}_1$

Population 2: $\bar{X}_2$

Those are the statistics. What are the parameters?

Lecture 8 Review: Comparing Two Populations

Each population will have a population standard deviation

Population 1: $\sigma_1$

Population 2: $\sigma_2$

Lecture 8 Review: Comparing Two Populations

Each population will have a population mean

Population 1: $\mu_1$

Population 2: $\mu_2$

Lecture 8 Review: Comparing Two Populations

How do we compare the population means $\mu_1$ and $\mu_2$?

The usual way is to take their difference: $\mu_1 - \mu_2$

If the population means are equal, what is their difference?

Lecture 8 Review: Comparing Two Populations

The usual way is to take their difference: $\mu_1 - \mu_2$

If the population means are equal, their difference = 0

Suppose we form a confidence interval for the difference. From this we learn whether 0 is in the confidence interval, and hence can make decisions about the hypothesis that the population means are equal.

NHANES Comparison

Group Statistics: Log(Saturated Fat)

Health Status    N     Mean      Std. Deviation    Std. Error Mean
Healthy          60    2.9905    .6173             7.969E-02
Cancer           59    2.6969    .6423             8.362E-02
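The "Std. Error Mean" column of the Group Statistics output can be checked by hand. The sketch below assumes the usual definition (standard deviation divided by the square root of N); the dictionary and function names are illustrative only and are not part of the SPSS output.

```python
import math

# Values copied from the Group Statistics output: (N, mean, std. deviation).
groups = {
    "Healthy": (60, 2.9905, 0.6173),
    "Cancer": (59, 2.6969, 0.6423),
}


def std_error_mean(sd, n):
    """Standard error of a sample mean: SD divided by sqrt(N)."""
    return sd / math.sqrt(n)


# One standard error per group, keyed by health status.
se = {name: std_error_mean(sd, n) for name, (n, _mean, sd) in groups.items()}

for name, value in se.items():
    print(f"{name}: std. error of mean = {value:.5f}")
```

Both values agree with the table (7.969E-02 and 8.362E-02) to the digits shown.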

NHANES Comparison: what the output looks like

Independent Samples Test: Log(Saturated Fat)

Levene's Test for Equality of Variances: F = .186, Sig. = .667

t-test for Equality of Means

                               t        df         Sig. (2-tailed)    Mean Difference    Std. Error Difference    95% CI Lower    95% CI Upper
Equal variances assumed        2.543    117        .012               .2937              .1155                    6.497E-02       .5223
Equal variances not assumed    2.542    116.627    .012               .2937              .1155                    6.488E-02       .5224

NHANES Comparison: the variable

In the Independent Samples Test output, the row label names the variable being compared: Log(Saturated Fat).

NHANES Comparison: the method

The rows “Equal variances assumed” and “Equal variances not assumed” of the Independent Samples Test output name the method. If you think the variances are wildly different, try a transformation.
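The slides do not give a formula for the "Equal variances not assumed" row. A minimal sketch follows, assuming that row uses the usual unpooled standard error with the Welch-Satterthwaite degrees of freedom (an assumption, not something the lecture states). The variable names are illustrative.

```python
import math

# Summary statistics from the Group Statistics output.
n1, mean1, s1 = 60, 2.9905, 0.6173   # Healthy
n2, mean2, s2 = 59, 2.6969, 0.6423   # Cancer

# Per-group variance of the sample mean.
v1 = s1**2 / n1
v2 = s2**2 / n2

# Unpooled standard error of the difference in sample means.
se_unpooled = math.sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom (assumed formula).
welch_df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# t statistic using the unpooled standard error.
t_unpooled = (mean1 - mean2) / se_unpooled

print(f"SE = {se_unpooled:.4f}, df = {welch_df:.2f}, t = {t_unpooled:.3f}")
```

Under that assumption the results land close to the second row of the output (df 116.627, t 2.542), which is why the two rows barely differ here: the two sample standard deviations are nearly equal.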

NHANES Comparison: the p-value.

The p-value is in the “Sig. (2-tailed)” column of the Independent Samples Test output: .012

NHANES Comparison: the difference in sample means

The difference in sample means is in the “Mean Difference” column of the Independent Samples Test output: .2937

NHANES Comparison: the standard error of difference in sample means

The standard error of the difference in sample means is in the “Std. Error Difference” column of the Independent Samples Test output: .1155

NHANES Comparison: the 95% confidence interval

The 95% confidence interval is in the “95% Confidence Interval of the Difference” columns of the Independent Samples Test output. With equal variances assumed, Lower = 6.497E-02 (that is, 0.0650) and Upper = .5223

NHANES Comparison

The “Mean Difference” is 0.2937. Since the healthy cases had a higher mean, this is

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What is this a CI for? The difference in population mean log(saturated fat) intake between healthy controls and cancer cases:

$\mu$(Healthy) – $\mu$(Cancer)

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

The null hypothesis of interest is that the population means are equal, i.e.,

$\mu$(Healthy) – $\mu$(Cancer) = 0

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.05 or p > 0.05?

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

[Number line: the hypothesized value 0 lies to the left of the confidence interval, which runs from 0.0650 to 0.5223]

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.05 or p > 0.05?

Answer: p < 0.05 since the 95% CI does not cover zero.
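The link between a 95% CI and a test at the 0.05 level can be written as a one-line check. This is only a sketch of the reasoning on the slide; the function name is hypothetical, and the interval endpoints are the ones printed in the equal-variances row of the SPSS output.

```python
def rejects_equal_means(ci_lower, ci_upper, hypothesized=0.0):
    """Return True when the hypothesized difference lies outside the CI.

    For a 95% CI this corresponds to a two-sided p-value below 0.05.
    """
    return not (ci_lower <= hypothesized <= ci_upper)


# Equal-variances row of the SPSS output: Lower = 6.497E-02, Upper = .5223
reject_at_5_percent = rejects_equal_means(6.497e-02, 0.5223)
print(reject_at_5_percent)
```

The check says nothing about other levels such as 0.01; for that you need either a 99% CI or the p-value itself.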

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

Is the p-value p < 0.01 or p > 0.01?

Answer: You cannot tell from a 95% CI. However, from the SPSS output, p = 0.012 (see next slide).

NHANES Comparison: the 95% confidence interval

In the Independent Samples Test output, “Sig. (2-tailed)” is .012, and with equal variances assumed the 95% confidence interval of the difference runs from 0.0650 (6.497E-02) to .5223

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What do we conclude from this confidence interval?

NHANES Comparison

Mean(Healthy) – Mean(Cancer)

The 95% CI is from 0.0650 to 0.5223

What do we conclude from this confidence interval?

With 95% confidence, the population mean log(saturated fat) intake is greater in the Healthy cases by between 0.0650 and 0.5223 (exponentiate to express this on the original scale of grams of saturated fat).
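To see what exponentiating does, the sketch below back-transforms the endpoints printed in the SPSS output. It assumes natural logarithms were used (the slides do not say which base). Note that the exponential of a difference in mean logs is a ratio (of geometric means), not a difference in grams.

```python
import math

# 95% CI for mu(Healthy) - mu(Cancer) on the log scale, from the SPSS output.
log_lower, log_upper = 6.497e-02, 0.5223

# Back-transform, assuming natural logs: exp(difference of logs) is a ratio.
ratio_lower = math.exp(log_lower)
ratio_upper = math.exp(log_upper)

print(f"Healthy/Cancer ratio: {ratio_lower:.3f} to {ratio_upper:.3f}")
```

Under that assumption the interval for the ratio runs from roughly 1.07 to 1.69, so the typical saturated fat intake in the healthy group is estimated to be about 7% to 69% higher.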

Comparing Two Population Means: the Formulas

The data: $\bar{X}_1, s_1, n_1$ and $\bar{X}_2, s_2, n_2$

The populations: $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$

The aim: CI for $\mu_1 - \mu_2$

Comparing Two Populations

Does it matter which one you call population 1 and which one you call population 2?

Not at all. The key is to interpret the difference properly.


Comparing Two Populations

The aim: CI for $\mu_1 - \mu_2$

This is the difference in population means

The estimate of the difference in population means is the difference in sample means, $\bar{X}_1 - \bar{X}_2$

This is a random variable: it has sample to sample variability

Comparing Two Populations

Difference of sample means: $\bar{X}_1 - \bar{X}_2$

“Population” mean from repeated sampling is $\mu_1 - \mu_2$

The s.d. from repeated sampling is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$

Comparing Two Populations

Difference of sample means: $\bar{X}_1 - \bar{X}_2$

The s.d. from repeated sampling is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$

You need reasonably large samples from BOTH populations
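"Repeated sampling" can be imitated on a computer. The simulation below is a sketch under made-up assumptions: the population means and standard deviations are hypothetical values chosen for illustration, the populations are taken to be normal, and the seed is arbitrary.

```python
import math
import random
import statistics

random.seed(651)  # arbitrary seed so the run is repeatable

# Hypothetical population values, chosen only for illustration.
mu1, sigma1, n1 = 3.0, 0.6, 60
mu2, sigma2, n2 = 2.7, 0.6, 59


def one_difference():
    """Draw one sample from each population; return the difference in sample means."""
    x1 = [random.gauss(mu1, sigma1) for _ in range(n1)]
    x2 = [random.gauss(mu2, sigma2) for _ in range(n2)]
    return statistics.fmean(x1) - statistics.fmean(x2)


# Repeat the two-sample experiment many times.
diffs = [one_difference() for _ in range(4000)]

simulated_mean = statistics.fmean(diffs)
simulated_sd = statistics.stdev(diffs)
theoretical_sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

print(f"mean of differences: {simulated_mean:.3f} (theory {mu1 - mu2:.3f})")
print(f"s.d. of differences: {simulated_sd:.3f} (theory {theoretical_sd:.3f})")
```

The simulated mean and s.d. of the differences should sit close to the theoretical values, with some run-to-run noise.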

Comparing Two Populations

If you can reasonably believe that the population sd’s are nearly equal, it is customary to pick the equal variance assumption and estimate the common standard deviation by

$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
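As a worked instance of the pooled standard deviation, the sketch below plugs in the NHANES summary statistics from the Group Statistics output. The function name is illustrative.

```python
import math


def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation under the equal variance assumption."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


# Healthy: n = 60, s = .6173; Cancer: n = 59, s = .6423
s_p = pooled_sd(60, 0.6173, 59, 0.6423)
print(f"pooled s.d. = {s_p:.4f}")
```

The pooled value always falls between the two sample standard deviations, here about 0.63.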

Comparing Two Populations

The standard error of $\bar{X}_1 - \bar{X}_2$ is then the value $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

The number of degrees of freedom is $n_1 + n_2 - 2$
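The standard error and degrees of freedom can be reproduced from the NHANES summary statistics. This is a sketch with illustrative variable names; the inputs are the values printed in the Group Statistics output.

```python
import math

# Summary statistics from the Group Statistics output.
n1, s1 = 60, 0.6173   # Healthy
n2, s2 = 59, 0.6423   # Cancer

# Pooled standard deviation (equal variance assumption).
s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Standard error of the difference in sample means, and its degrees of freedom.
std_error_diff = s_p * math.sqrt(1 / n1 + 1 / n2)
df = n1 + n2 - 2

print(f"std. error of difference = {std_error_diff:.4f}, df = {df}")
```

These match the "Std. Error Difference" (.1155) and "df" (117) entries in the equal-variances row of the Independent Samples Test output.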

Comparing Two Populations

A $(1-\alpha)100\%$ CI for $\mu_1 - \mu_2$ is

$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1+n_2-2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Note how the sample sizes determine the CI length
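Plugging the NHANES numbers into the CI formula gives a quick check on the SPSS interval. The sketch takes the mean difference and standard error from the SPSS output and uses the approximate critical value 1.98 for 117 degrees of freedom quoted later in the lecture; the function name is illustrative.

```python
def pooled_t_ci(mean_diff, std_error, t_crit):
    """CI for mu1 - mu2: estimate plus or minus critical value times standard error."""
    half_width = t_crit * std_error
    return mean_diff - half_width, mean_diff + half_width


# From the SPSS output: Mean Difference = .2937, Std. Error Difference = .1155
# t_{.025}(117) is approximately 1.98 (rounded value from the lecture).
lower, upper = pooled_t_ci(0.2937, 0.1155, 1.98)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

The result agrees with the SPSS limits 6.497E-02 and .5223 up to rounding of the inputs.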

Comparing Two Populations

Generally, you should make your sample sizes nearly equal, or at least not wildly unequal. Consider a total sample size of 100

$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ = 1 (approximately) if n1 = 1, n2 = 99

$\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ = 0.20 if n1 = 50, n2 = 50

Thus, in the former case, your CI would be about 5 times longer!

$\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}(n_1+n_2-2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
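The effect of the split between the two samples is easy to tabulate. The sketch below evaluates the sample-size factor for the two splits on the slide and then searches every split of a total of 100; the function name is illustrative.

```python
import math


def size_factor(n1, n2):
    """The factor sqrt(1/n1 + 1/n2) that scales the CI length."""
    return math.sqrt(1 / n1 + 1 / n2)


unbalanced = size_factor(1, 99)
balanced = size_factor(50, 50)
ratio = unbalanced / balanced

# Which split of 100 observations gives the shortest CI?
best_n1 = min(range(1, 100), key=lambda n1: size_factor(n1, 100 - n1))

print(f"n1=1, n2=99:  {unbalanced:.3f}")
print(f"n1=50, n2=50: {balanced:.3f}")
print(f"ratio: {ratio:.2f}, best split: n1 = {best_n1}")
```

For a fixed total, the equal split minimizes the factor, which is the reason for the advice to keep the sample sizes nearly equal.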

Comparing Two Populations

The CI can of course be used to test hypotheses

$H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$

This is the same as

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$

So we just need to check whether 0 is in the interval, just as we have done

Comparing Two Populations: The t-test

There is something called a t-test, which gives you the information as to whether 0 is in the CI.

It does not tell you where the means lie however, so it is of limited use. P-values tell you the same thing.

$H_0: \mu_1 - \mu_2 = 0$ vs $H_a: \mu_1 - \mu_2 \neq 0$

Comparing Two Populations: The t-test

The t-statistic is defined by

$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
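The t-statistic for the NHANES comparison can be computed directly from the summary statistics. This is a sketch with an illustrative function name; the inputs come from the Group Statistics output.

```python
import math


def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic using the pooled standard deviation."""
    s_p = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    std_error = s_p * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / std_error


# Healthy first, Cancer second, so the sign matches Mean(Healthy) - Mean(Cancer).
t_stat = pooled_t_statistic(2.9905, 0.6173, 60, 2.6969, 0.6423, 59)
print(f"t = {t_stat:.3f}")
```

The value agrees with the t = 2.543 reported by SPSS, up to rounding of the inputs. Swapping the two groups flips the sign of t but not its absolute value.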

Comparing Two Populations: The t-test

You reject equality of means if $|t| > t_{\alpha/2}(n_1+n_2-2)$

In this case, is p < $\alpha$ or is p > $\alpha$?

Comparing Two Populations: The t-test

You reject equality of means if $|t| > t_{\alpha/2}(n_1+n_2-2)$

In this case, p < $\alpha$

NHANES Comparison: the t-test

In the Independent Samples Test output (equal variances assumed), t = 2.543 with df = 117.

$t_{\alpha/2}(n_1+n_2-2) = t_{.025}(117) \approx 1.98$

$t = 2.543 > t_{\alpha/2}(n_1+n_2-2) \approx 1.98$, hence reject the hypothesis that the population means are equal, for $\alpha$ = 0.05
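The p-value behind this decision can be approximated without a statistics package. The sketch below integrates the t density numerically with Simpson's rule; this is a stand-in chosen to keep the example free of external libraries, not the method SPSS uses, and the function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| > |t|), via Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3        # P(0 < T < |t|)
    return 1.0 - 2.0 * central_half


p_value = two_sided_p(2.543, 117)       # t and df from the SPSS output
p_at_critical = two_sided_p(1.98, 117)  # should be close to 0.05
reject = p_value < 0.05

print(f"p = {p_value:.4f}, reject at alpha = 0.05: {reject}")
```

The computed p-value rounds to the .012 shown in the "Sig. (2-tailed)" column, and the p-value at the critical value 1.98 is close to 0.05, which is the sense in which the t-test and the p-value tell you the same thing.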

Comparing Two Populations

SPSS Demonstrations: bluebonnets and Framingham Heart Disease and Blood Pressure, as time permits
