Copyright © 2013 Pearson Education, Inc. All rights reserved Chapter 10 Inferring Population Means

Chapter 10

Inferring Population Means

Learning Objectives

Understand when a goodness-of-fit test is needed and appropriate, and know how to perform the test and interpret results.

Distinguish between tests of homogeneity and tests of independence.

Understand when it is appropriate to use a chi-square statistic to test whether two categorical variables are associated; know how to perform this test and interpret the results.


10.1 The Basic Ingredients for Testing with Categorical Variables

Volunteers Representative of Student Body?

                                        White   Asian   Hispanic   Other
Ethnicities for UCLA Student Body         34%     32%      13%      21%
Random Sample of 400 UCLA Volunteers      150     120       45       85

Is the ethnic distribution of volunteers the same as the ethnic distribution for the student body?

What Would We Expect?


                                     White   Asian   Hispanic   Other
Ethnicities for UCLA Student Body      34%     32%      13%      21%
Observed                               150     120       45       85
Expected                               136     128       52       84

Expected count = proportion × sample size:

0.34 × 400 = 136    0.32 × 400 = 128    0.13 × 400 = 52    0.21 × 400 = 84
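The expected counts can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names are illustrative assumptions, while the proportions and the sample size of 400 come from the slide.

```python
# Student-body proportions and volunteer sample size, taken from the slide.
proportions = {"White": 0.34, "Asian": 0.32, "Hispanic": 0.13, "Other": 0.21}
sample_size = 400

# Expected count = population proportion x sample size.
# round() only removes floating-point noise such as 136.00000000000003.
expected = {group: round(p * sample_size, 2) for group, p in proportions.items()}

for group, count in expected.items():
    print(f"{group}: {count}")
```

The expected counts add up to the sample size because the proportions add up to 1.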

Questions on Goodness of Fit

The observed counts are not the same as the expected counts.

Are they far enough from expected to conclude that the distribution of all UCLA volunteers differs from the student body distribution?


Observed 150 120 45 85

Expected 136 128 52 84


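χ² Test Statistic

χ² measures how far the observed counts are from the expected counts.

Observed 150 120 45 85

Expected 136 128 52 84

\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \]

\[ \frac{(150-136)^2}{136} \approx 1.441, \quad \frac{(120-128)^2}{128} = 0.500, \quad \frac{(45-52)^2}{52} \approx 0.942, \quad \frac{(85-84)^2}{84} \approx 0.012 \]

χ² ≈ 1.441 + 0.500 + 0.942 + 0.012 = 2.895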
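The statistic can also be computed directly from the observed and expected counts on this slide. The sketch below is one way to do it in Python; the function name chi_square_statistic is an illustrative assumption, not part of any statistics package.

```python
# Observed and expected counts for White, Asian, Hispanic, Other (from the slide).
observed = [150, 120, 45, 85]
expected = [136, 128, 52, 84]

def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Per-category contributions, then the total.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = chi_square_statistic(observed, expected)

print([round(t, 3) for t in terms])
print(round(chi2, 3))
```

Printing the individual terms shows which category contributes most to the distance between observed and expected.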

Political Affiliation and Music Preference

Is Political Affiliation associated with Music Preference?


Democrat Republican

Pop 70 52

Classic Rock 34 57

Other 21 16

Survey of 250 People

Finding Expected Counts

If they are independent, then the number of Republicans who listen to Pop would be


Democrat Republican

Pop 85 52

Classic Rock 34 57

Other 21 16


Pop: 137/250 × 100% = 54.8%

Rep: 125/250 × 100% = 50%

Expected = 0.548 × 0.5 × 250 = 68.5

Finding Expected Counts

Same test statistic; using a computer is easier than computing by hand.

\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \]

χ² ≈ 14.2

DF = (Rows – 1)(Columns – 1) = (3 – 1)(2 – 1) = 2

p-value = 0.0008

Observed counts, with expected counts in parentheses:

               Democrat      Republican
Pop            85 (69.75)    52 (68.50)
Classic Rock   34 (46.33)    57 (44.67)
Other          21 (23.93)    16 (23.07)

Using the χ²

All expected counts must be 5 or higher.

Data is qualitative.

Can be used to test if an unknown distribution is the same as a known distribution.

Can be used to test if two variables are independent or associated.



10.2 The Chi-Square Test for Goodness of Fit

Chi-Square Test for Goodness of Fit

Used to see if an unknown distribution is different from a given distribution.

Always the same null and alternative hypotheses:

H0: The population distribution of the variable is the same as the proposed distribution.

Ha: The population distributions are different.

Uses a χ² test statistic.

The rest follows the standard procedure.


Chi-Square Test for Goodness of Fit

To find the expected count:

Percent of the population times the sample size

For a uniform distribution: Sample Size / Number of Possibilities

Uses a χ² test statistic.

The degrees of freedom (DF): number of categories – 1

All expected counts must be greater than 5.

The rest follows the standard procedure.

Goodness of Fit: Rolling a Die

You are playing a game that involves rolling a die and suspect that the die is not fair.

1. Hypothesize

H0: The die is fair (1, 2, 3, 4, 5, and 6 are equally likely to occur).

Ha: The die is not fair.

You roll it 300 times and get:


Goodness of Fit: Rolling a Die

2. Prepare

Use α = 0.01 and the χ² statistic. All expected counts are greater than 5.

If all numbers are equally likely to occur, then we would expect to get 50 of each value: 300/6 = 50.


Outcome 1 2 3 4 5 6

Observed 35 45 69 52 43 56

Expected 50 50 50 50 50 50

3. Compute to Compare

StatCrunch: Stat → Goodness-of-fit → Chi-Square test


4. Interpret

p-value = 0.0156 > 0.01 = α

Fail to reject H0.

There is insufficient evidence to support the claim that the die is not fair.
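For readers without StatCrunch, the die test can be sketched in plain Python using only the standard library. The helper chi2_sf is a hypothetical name introduced here; it assumes whole-number degrees of freedom and uses a standard recurrence for the chi-square upper tail, so treat it as an illustration rather than a replacement for statistical software.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # exact tail for df = 1
    # Each step of the recurrence raises the degrees of freedom by 2.
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Observed counts for outcomes 1..6 from 300 rolls (from the slide).
observed = [35, 45, 69, 52, 43, 56]
expected = [sum(observed) / len(observed)] * len(observed)  # 300 / 6 = 50 each

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2, df)
alpha = 0.01

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

With these counts the statistic is 14.0 on 5 degrees of freedom, which agrees with the p-value of 0.0156 reported above.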


Facts about Goodness of Fit

The test statistic χ² will always be non-negative.

If χ² is close to 0, then we will fail to reject H0.

If χ² is large, then we will reject H0.

Can conclude that the unknown distribution differs from the known.

Cannot conclude that the unknown distribution is the same as the known.


Counts Must Be Used

If proportions are given instead of counts: multiply each proportion by the sample size to obtain the count.

If percents are given instead of counts: convert the percents to decimals by dividing by 100, then multiply each decimal by the sample size to obtain the count.



10.3 Chi-Square Tests for Associations between Categorical Variables

Test for Independence

One sample, two categorical variables.

Answers whether there is an association between two categorical variables.

Random, independent collection.

All expected counts greater than 5.

H0: The two variables are independent.

Ha: There is an association between the two variables.

Is type of business associated with US region?

A random sample of 558 businesses was studied.

Manufacturing Retail Financial

East 47 92 67

Central 23 40 18

North 19 28 14

South 39 40 47

West 25 43 16


1. Hypothesize

H0: Business type and region are independent.

Ha: Business type and region are associated.

2. Prepare

α = 0.05, χ² test for independence, all expected counts greater than 5.

StatCrunch: Stat → Tables → Contingency → with summary


3. Compute to Compare

χ² ≈ 17.38

p-value = 0.0263


4. Interpret

p-value = 0.0263 < 0.05 = α

Reject H0. Accept Ha.

There is statistically significant evidence to support the claim that business type and region are associated.
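The business-by-region test can be checked without StatCrunch. The following Python sketch uses only the standard library; the function names expected_counts and chi2_sf are illustrative assumptions, and chi2_sf assumes whole-number degrees of freedom.

```python
import math

# Observed counts from the slide. Columns: Manufacturing, Retail, Financial.
observed = {
    "East":    [47, 92, 67],
    "Central": [23, 40, 18],
    "North":   [19, 28, 14],
    "South":   [39, 40, 47],
    "West":    [25, 43, 16],
}

def expected_counts(rows):
    """Expected count per cell under independence: row total x column total / grand total."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    return [[rt * ct / grand for ct in col_totals] for rt in row_totals]

def chi2_sf(x, df):
    """Upper-tail probability for a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

rows = list(observed.values())
expected = expected_counts(rows)
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(rows, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(rows) - 1) * (len(rows[0]) - 1)   # (5 - 1)(3 - 1) = 8
p_value = chi2_sf(chi2, df)
smallest_expected = min(min(r) for r in expected)

print(f"smallest expected count = {smallest_expected:.2f}")
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The smallest expected count is checked first because the χ² test is only appropriate when every expected count is large enough.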


Test for Homogeneity

Two samples, one categorical question.

Tests whether the answer to the question is associated with which population a subject comes from.

Is the distribution for the first population the same as for the second population?

Differs from Goodness of Fit in that there are two samples instead of one sample and one known population.

Differs from Test for Independence in that there are two samples and one variable instead of one sample and two variables.

Do freshmen and sophomores have different opinions about spending a year abroad?

Is spending a year abroad a good idea?

             Strongly Agree   Agree   Disagree   Strongly Disagree
Freshmen                 45      33         18                   7
Sophomores               32      28         20                   6

1. Hypothesize

H0: The distributions of opinions for freshmen and sophomores are the same.

Ha: The distributions of opinions for freshmen and sophomores are not the same.

2. Prepare

α = 0.05

χ² test for homogeneity

All expected counts are greater than 5.


StatCrunch: Stat → Tables → Contingency → with summary

4. Interpret


p-value = 0.7367 > 0.05 = α

Fail to reject H0.

There is insufficient evidence to conclude that the distributions of opinions for freshmen and sophomores are not the same.

Comparing Test for Independence and Difference Between Proportions

For testing two variables, each with two possible outcomes, the test for independence gives the same result as a two-tailed test for the difference between proportions.

To show that one answer occurs with higher probability for one group than for another, only the one-tailed test for a difference between proportions can be used.
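The equivalence for 2 × 2 tables can be illustrated numerically: the χ² statistic equals the square of the pooled two-proportion z statistic, so the two-tailed p-values match. The counts in this Python sketch are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import math

# Hypothetical data (not from the slides): successes out of n in two groups.
successes = [30, 45]
totals = [100, 100]

# Two-proportion z statistic with the pooled proportion.
p1, p2 = successes[0] / totals[0], successes[1] / totals[1]
pooled = sum(successes) / sum(totals)
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / totals[0] + 1 / totals[1]))

# The same data as a 2 x 2 table: rows are groups, columns are success / failure.
table = [[s, n - s] for s, n in zip(successes, totals)]
row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
grand = sum(row_totals)
chi2 = sum((table[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
           / (row_totals[i] * col_totals[j] / grand)
           for i in range(2) for j in range(2))

# Two-tailed z p-value and chi-square p-value with 1 degree of freedom.
p_z = math.erfc(abs(z) / math.sqrt(2))
p_chi2 = math.erfc(math.sqrt(chi2 / 2))

print(f"z^2 = {z ** 2:.4f}, chi-square = {chi2:.4f}")
print(f"two-tailed z p-value = {p_z:.4f}, chi-square p-value = {p_chi2:.4f}")
```

The sign of z is lost when it is squared, which is why the χ² test cannot answer a one-tailed question.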



10.4 Hypothesis Tests When Sample Sizes Are Small

Small Sample Sizes: Consolidation

Were hospitalization rates from the swine flu different for different ages?

With expected counts less than 5, the χ² test cannot be used.

Instead, consolidate into just young, middle, and old.

Age Category

Under 15 15 – 29 30 and Older Totals

Yes 16 9 10 35

No 239 241 104 584

Totals 255 250 114

Now the sample sizes are large enough.

p-value = 0.12 is large.


Age Category

Under 15 15 – 29 30 and Older Totals

Yes 16 9 10 35

No 239 241 104 584

Totals 255 250 114
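The consolidated table can be checked with a short Python sketch. It first confirms that every expected count is now at least 5 and then computes the p-value. With (2 – 1)(3 – 1) = 2 degrees of freedom the χ² upper tail has the simple closed form exp(–χ²/2), which is what the code uses; the variable names are illustrative assumptions.

```python
import math

# Consolidated counts from the slide. Columns: Under 15, 15-29, 30 and Older.
hospitalized = [16, 9, 10]
not_hospitalized = [239, 241, 104]
table = [hospitalized, not_hospitalized]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
grand = sum(row_totals)

# Expected counts under the null hypothesis of no difference between age groups.
expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]
smallest_expected = min(min(r) for r in expected)

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(table, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(table) - 1) * (len(table[0]) - 1)

# Closed form valid only for df = 2.
p_value = math.exp(-chi2 / 2)

print(f"smallest expected count = {smallest_expected:.2f}")
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.2f}")
```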

Fail to reject the null hypothesis. There is insufficient evidence to conclude that hospitalization rates from the swine flu were different for different ages.

Problems with this approach:

Grouping infants and young teens may not make sense.

Grouping middle-aged people with senior citizens may not make sense.


Fisher’s Exact Test

Used to compare two proportions (or more proportions with advanced techniques).

Can be used with small sample sizes.

Too advanced without the use of technology such as StatCrunch.

For larger sample sizes, use a test for independence, homogeneity, or difference between proportions.
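To give a sense of what the technology computes, here is a minimal Python sketch of a two-sided Fisher's exact test for a 2 × 2 table. It is one common formulation (sum the probabilities of all tables with the same margins that are no more likely than the observed one), not necessarily the exact procedure StatCrunch uses. The function name and the example counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    low = max(0, col1 - row2)
    high = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical small-sample data (not from the slides).
table = [[3, 1],
         [1, 3]]
p_value = fisher_exact_two_sided(table)
print(f"p-value = {p_value:.4f}")
```

Because the probabilities are computed exactly from the counts, no condition on the expected counts is needed.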



Chapter 10

Case Study

Is Oil Amount Associated With Successful Popcorn?

Success means at least half the kernels popped in 75 seconds or less.

H0: The quality of popcorn and the amount of oil are independent.

Ha: The quality of popcorn and the amount of oil are associated.

Is Oil Amount Associated With Successful Popcorn?

All expected counts are at least 5.

p-value = 0.006 is very small.

Reject H0, accept Ha.

There is statistically significant evidence to support the claim that oil amount and popcorn success are associated.



Chapter 10

Guided Exercise 1

Are Humans Like Random Number Generators?

38 students were asked to pick a “random” number from 1 to 5.

Test the hypothesis that humans are not like random number generators. Use a significance level of 0.05, and assume the data were collected from a random sample of students.

Integer One Two Three Four Five

Frequency 3 5 14 11 5

Are Humans Like Random Number Generators?

1. Hypothesize

H0: Humans are like random number generators and produce numbers in equal quantities.

Ha: Humans do not produce numbers in equal quantities.

2. Prepare

Use the χ² statistic.

Why are all Expected = 7.6? Under H0 each of the 5 integers is equally likely, so each expected count is 38/5 = 7.6.


Integer One Two Three Four Five

Freq. 3 5 14 11 5

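3. Compute to Compare

\[ \chi^2 = \frac{(3-7.6)^2}{7.6} + \frac{(5-7.6)^2}{7.6} + \frac{(14-7.6)^2}{7.6} + \frac{(11-7.6)^2}{7.6} + \frac{(5-7.6)^2}{7.6} \approx 11.47 \]

p-value = 0.0217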

4. Interpret

p-value = 0.0217 < 0.05 = α

Reject H0. Accept Ha.

Conclusion: Humans have been shown to be different from random number generators.



Chapter 10

Guided Exercise 2

Obesity and Relationship

In a study reported in the medical journal Obesity the research subjects were categorized in terms of whether or not they were obese and whether they were dating, cohabiting, or married.

Test the hypothesis that the variables Relationship Status and Obesity are associated, using a significance level of 0.05.


1. Hypothesize

Calculate the row, column and grand totals.

H0: Relationship status and obesity are independent.

Ha: Relationship status and obesity are associated.

Dating Cohabitating Married Total

Obese 81 103 147 331

Not Obese 359 326 277 962

Total 440 429 424 1293

2. Prepare

We choose the chi-square test for independence because the data were from one random sample in which the people were classified two different ways.

Find the smallest expected value and report it. Is it more than 5?

The smallest expected value is 108.5. Since it is much bigger than 5, the χ² test can be used.


χ² ≈ 30.83

p-value < 0.001


4. Interpret

p-value < 0.001 < 0.05 = α

Reject H0. Accept Ha.

There is statistically significant evidence to conclude that relationship status and obesity are associated.


Causality

Can we conclude from these data that living with someone is making some people obese and that marrying is making even more people obese?

No. We can only conclude that obesity and relationship status are associated.

Can we conclude that obesity affects your relationship status?

No. Cause and effect cannot be concluded based on just looking at the data. A controlled experiment would have to be done, if possible.


Percentages

Find and compare the percentages obese in the three relationship statuses.

In StatCrunch, select Column Percent.

We see that the percent obese for the married category (34.67%) is much higher than the percent obese for the dating category (18.41%). The percent obese for cohabiting couples (24.01%) is in the middle.
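The column percents can be reproduced from the counts in the obesity table. This Python sketch is only an illustration of what the Column Percent option reports; the variable names are assumptions.

```python
# Counts from the obesity table in this exercise.
statuses = ["Dating", "Cohabiting", "Married"]
obese = [81, 103, 147]
not_obese = [359, 326, 277]

# Column percent: obese count divided by the column total, times 100.
percent_obese = {
    status: round(100 * o / (o + n), 2)
    for status, o, n in zip(statuses, obese, not_obese)
}

for status, pct in percent_obese.items():
    print(f"{status}: {pct}% obese")
```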

