Copyright © 2013 Pearson Education, Inc. All rights reserved
Chapter 10
Associations between Categorical Variables
Learning Objectives
Understand when a goodness-of-fit test is needed and appropriate, and know how to perform the test and interpret results.
Distinguish between tests of homogeneity and tests of independence.
Understand when it is appropriate to use a chi-square statistic to test whether two categorical variables are associated; know how to perform this test and interpret the results.
10.1
The Basic Ingredients for Testing with Categorical Variables
Volunteers Representative of Student Body?
Ethnicities for UCLA Student Body
White  Asian  Hispanic  Other
34%    32%    13%       21%

Random Sample of 400 UCLA Volunteers
White  Asian  Hispanic  Other
150    120    45        85

Is the ethnic distribution of volunteers the same as the ethnic distribution of the student body?
What Would We Expect?
Random Sample of 400 UCLA Volunteers
          White  Asian  Hispanic  Other
Observed  150    120    45        85
Expected  136    128    52        84

Each expected count is the student-body percentage times the sample size:
0.34 × 400 = 136    0.32 × 400 = 128    0.13 × 400 = 52    0.21 × 400 = 84
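The expected counts above take one multiplication per category. A minimal Python sketch of that step, using the slide's student-body proportions and observed counts (the variable names are illustrative only):

```python
# Expected count for each category = population proportion x sample size.
categories = ["White", "Asian", "Hispanic", "Other"]
population_props = [0.34, 0.32, 0.13, 0.21]  # UCLA student-body distribution
observed = [150, 120, 45, 85]                 # random sample of volunteers

n = sum(observed)  # sample size: 400
expected = [p * n for p in population_props]

for name, obs, exp in zip(categories, observed, expected):
    print(f"{name:<9} observed={obs:>4} expected={exp:6.1f}")
```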
Questions on Goodness of Fit
The observed counts are not the same as the expected counts.
Are they far enough from the expected counts to conclude that the ethnic distribution of all UCLA volunteers differs from the student-body distribution?
χ² Test Statistic
χ² measures how far the observed counts are from the expected counts:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

For the random sample of 400 UCLA volunteers:

$$\chi^2 = \frac{(150-136)^2}{136} + \frac{(120-128)^2}{128} + \frac{(45-52)^2}{52} + \frac{(85-84)^2}{84} \approx 1.441 + 0.500 + 0.942 + 0.012 = 2.895$$
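The same sum can be checked in a few lines of Python; the sketch below simply applies the χ² formula to the slide's observed and expected counts. With SciPy installed, `scipy.stats.chisquare(observed, expected)` should report the same statistic along with a p-value.

```python
# Chi-square statistic: sum of (Observed - Expected)^2 / Expected over all categories.
observed = [150, 120, 45, 85]
expected = [136, 128, 52, 84]

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

print([round(c, 3) for c in contributions])  # [1.441, 0.5, 0.942, 0.012]
print(round(chi2, 3))                        # 2.895
```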
Political Affiliation and Music Preference
Is Political Affiliation associated with Music Preference?
Survey of 250 People
              Democrat  Republican
Pop           70        52
Classic Rock  34        57
Other         21        16
Finding Expected Counts
If the two variables are independent, then the expected number of Republicans who listen to Pop is the proportion who prefer Pop, times the proportion who are Republican, times the sample size.

Survey of 250 People
              Democrat  Republican  Total
Pop           70        52          122
Classic Rock  34        57          91
Other         21        16          37
Total         125       125         250

Pop: 122/250 × 100% = 48.8%
Rep: 125/250 × 100% = 50%
Expected = 0.488 × 0.5 × 250 = 61
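The row-percent × column-percent × grand-total rule simplifies to (row total × column total) / grand total. A short sketch of both forms, using the survey counts (Pop/Democrat = 70, grand total 250); the variable names are illustrative:

```python
# Expected count under independence for one cell:
# (row proportion) x (column proportion) x (grand total)
#   = row total x column total / grand total.
# Rows: Pop, Classic Rock, Other; columns: Democrat, Republican.
table = [
    [70, 52],
    [34, 57],
    [21, 16],
]

row_totals = [sum(row) for row in table]        # [122, 91, 37]
col_totals = [sum(col) for col in zip(*table)]  # [125, 125]
n = sum(row_totals)                             # 250

# Pop / Republican cell, computed the way the slide does it.
pop_share = row_totals[0] / n  # 0.488
rep_share = col_totals[1] / n  # 0.5
print(pop_share * rep_share * n)  # about 61, up to floating-point rounding

# The same rule applied to every cell at once.
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)
```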
Finding Expected Counts
The test statistic is the same as before; computing it with software is easier than by hand.

Survey of 250 People: Observed (Expected)
              Democrat     Republican
Pop           70 (61.0)    52 (61.0)
Classic Rock  34 (45.5)    57 (45.5)
Other         21 (18.5)    16 (18.5)

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \approx 9.14$$

DF = (Rows − 1)(Columns − 1) = (3 − 1)(2 − 1) = 2
p-value ≈ 0.0103
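A minimal sketch of the whole test of independence, assuming the survey table with Pop/Democrat = 70. The function name `chi2_independence` is hypothetical. For DF = 2 the χ² upper-tail probability has the exact closed form exp(−x/2), so no statistics library is needed; with SciPy, `scipy.stats.chi2_contingency(music)` should give the same result for this 3×2 table.

```python
import math


def chi2_independence(table):
    """Chi-square test of independence for a table of observed counts.

    Returns (statistic, degrees of freedom, expected-count table).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Rows: Pop, Classic Rock, Other; columns: Democrat, Republican.
music = [[70, 52], [34, 57], [21, 16]]
stat, df, expected = chi2_independence(music)

# For DF = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-stat / 2)
print(f"chi2 = {stat:.2f}, DF = {df}, p-value = {p_value:.4f}")
# chi2 = 9.14, DF = 2, p-value = 0.0103
```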
Using the χ² Test
All expected counts must be 5 or higher.
The data are categorical (qualitative).
It can be used to test whether an unknown distribution differs from a known distribution.
It can be used to test whether two variables are independent or associated.
10.2
The Chi-Square Test for Goodness of Fit
Chi-Square Test for Goodness of Fit
Used to test whether an unknown distribution differs from a given distribution.
The null and alternative hypotheses always have the same form:
H0: The population distribution of the variable is the same as the proposed distribution.
Ha: The population distribution of the variable is different from the proposed distribution.
Uses the χ² test statistic. The rest follows the standard procedure.
Chi-Square Test for Goodness of Fit
To find the expected counts, multiply each population proportion by the sample size. For a uniform distribution (all categories equally likely), every expected count is

$$\text{Expected} = \frac{\text{Sample Size}}{\text{Number of Possibilities}}$$

Uses the χ² test statistic, with degrees of freedom DF = number of categories − 1.
All expected counts must be at least 5. The rest follows the standard procedure.
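The recipe above (expected counts from proportions, or sample size / number of possibilities for a uniform model, plus DF = categories − 1) fits in one small function. This is a sketch; `goodness_of_fit` is a hypothetical name, and it reports the smallest expected count so the "at least 5" condition can be checked.

```python
def goodness_of_fit(observed, proportions=None):
    """Return (chi-square statistic, DF, smallest expected count).

    proportions: hypothesized population proportions, one per category.
    If omitted, a uniform distribution is assumed, so every expected
    count is sample size / number of possibilities.
    """
    n = sum(observed)
    k = len(observed)
    if proportions is None:
        expected = [n / k] * k
    else:
        expected = [p * n for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1, min(expected)


# UCLA volunteers vs. the student-body proportions
print(goodness_of_fit([150, 120, 45, 85], [0.34, 0.32, 0.13, 0.21]))
# Uniform case: 300 rolls of a die, 300 / 6 = 50 expected per face
print(goodness_of_fit([35, 45, 69, 52, 43, 56]))
```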
Goodness of Fit: Rolling a Die
You are playing a game that involves rolling a die and suspect that the die is not fair.
1. Hypothesize
H0: The die is fair (1, 2, 3, 4, 5, and 6 are equally likely to occur).
Ha: The die is not fair.
You roll it 300 times and record how many times each outcome occurs.
Goodness of Fit: Rolling a Die
2. Prepare
Use α = 0.01 and the χ² statistic. If all numbers are equally likely to occur, we would expect to get 300/6 = 50 of each value, so all expected counts are at least 5.
Outcome 1 2 3 4 5 6
Observed 35 45 69 52 43 56
Expected 50 50 50 50 50 50
3. Compute to Compare
In StatCrunch: Stat → Goodness-of-fit → Chi-Square test
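Outside StatCrunch, the die test can be reproduced with the standard library alone. The p-value helper `chi2_sf` is hypothetical and swapped in for illustration: it evaluates the χ² upper tail exactly for integer DF using the incomplete-gamma recurrence, which is not necessarily how StatCrunch computes it. With SciPy, `scipy.stats.chi2.sf(stat, df)` should agree.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with a = df/2, y = x/2.
    """
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0             # start from df = 2: Q(1, y) = exp(-y)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5  # start from df = 1: Q(1/2, y) = erfc(sqrt(y))
    while 2 * a < df:                        # step a up by 1 until a = df / 2
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q


observed = [35, 45, 69, 52, 43, 56]  # counts of faces 1..6 in 300 rolls
expected = sum(observed) / 6         # 50 per face if the die is fair
stat = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1               # 6 categories - 1 = 5
p_value = chi2_sf(stat, df)

alpha = 0.01
print(f"chi2 = {stat:.2f}, DF = {df}, p-value = {p_value:.4f}")
# chi2 = 14.00, DF = 5, p-value = 0.0156
print("reject H0" if p_value < alpha else "fail to reject H0")
```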
4. Interpret
p-value = 0.0156 > 0.01 = α. Fail to reject H0.
There is insufficient evidence to support the claim that the die is not fair.
Facts about Goodness of Fit
The test statistic χ² is always non-negative.
If χ² is close to 0, we will fail to reject H0.
If χ² is large, we will reject H0.
We can conclude that the unknown distribution differs from the known distribution.
We cannot conclude that the unknown distribution is the same as the known distribution.
Counts Must Be Used
If proportions are given instead of counts, multiply each proportion by the sample size to obtain the count.
If percents are given instead of counts, first convert each percent to a decimal by dividing by 100, then multiply each decimal by the sample size to obtain the count.
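A small sketch of the percent-to-count conversion. The input is hypothetical: the 400-volunteer sample from Section 10.1 re-expressed as percents, so the conversion should recover the original counts.

```python
# Hypothetical input: the 400-volunteer sample reported as percents.
# (These percents are the counts 150, 120, 45, 85 divided by 400.)
percents = [37.5, 30.0, 11.25, 21.25]
sample_size = 400

# Dividing by 100 and multiplying by the sample size gives the counts.
# Multiplying first keeps these particular values exact in floating point;
# round() turns the results into whole-number counts.
counts = [round(pct * sample_size / 100) for pct in percents]
print(counts)  # [150, 120, 45, 85]
```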
10.3
Chi-Square Tests for Associations between Categorical Variables
Test for Independence
One sample, two categorical variables.
Answers whether there is an association between two categorical variables.
The data must be collected randomly and independently.
All expected counts must be at least 5.
H0: The two variables are independent.
Ha: There is an association between the two variables.
Is type of business associated with US region? A random sample of 558 businesses was studied.

         Manufacturing  Retail  Financial
East     47             92      67
Central  23             40      18
North    19             28      14
South    39             40      47
West     25             43      16

1. Hypothesize
H0: Business type and region are independent.
Ha: Business type and region are associated.
2. Prepare
α = 0.05, χ² test for independence; all expected counts are at least 5.
In StatCrunch: Stat → Tables → Contingency → with summary
3. Compute to Compare
χ² ≈ 17.39, DF = (5 − 1)(3 − 1) = 8, p-value = 0.0263
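To reproduce these numbers without StatCrunch, here is a standard-library sketch. Both helpers are hypothetical: `chi2_independence` computes the statistic, DF, and smallest expected count, and `chi2_sf` is a swapped-in exact upper-tail formula for integer DF (SciPy's `scipy.stats.chi2_contingency` should give the same statistic and p-value for this table).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), a = df/2, y = x/2.
    """
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0             # df = 2: Q(1, y) = exp(-y)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5  # df = 1: Q(1/2, y) = erfc(sqrt(y))
    while 2 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q


def chi2_independence(table):
    """Return (chi-square statistic, DF, smallest expected count) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, min(min(row) for row in expected)


business = [       # columns: Manufacturing, Retail, Financial
    [47, 92, 67],  # East
    [23, 40, 18],  # Central
    [19, 28, 14],  # North
    [39, 40, 47],  # South
    [25, 43, 16],  # West
]
stat, df, smallest = chi2_independence(business)
p_value = chi2_sf(stat, df)
print(f"chi2 = {stat:.2f}, DF = {df}, p-value = {p_value:.4f}")
print(f"smallest expected count = {smallest:.1f}")
```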
4. Interpret
p-value = 0.0263 < 0.05 = α. Reject H0.
Accept Ha.
There is statistically significant evidence to support the claim that business type and region are associated.
Test for Homogeneity
Two samples, one categorical variable. Tests whether the two populations have the same distribution of that variable.
Is the distribution for the first population the same as for the second population?
Differs from goodness of fit in that there are two samples instead of one sample and one known population.
Differs from the test for independence in that there are two samples and one variable instead of one sample and two variables.
Do freshmen and sophomores have different opinions about spending a year abroad? Is spending a year abroad a good idea?

            Strongly Agree  Agree  Disagree  Strongly Disagree
Freshmen    45              33     18        7
Sophomores  32              28     20        6

1. Hypothesize
H0: The distributions of opinions for freshmen and sophomores are the same.
Ha: The distributions of opinions for freshmen and sophomores are not the same.
2. Prepare
α = 0.05, χ² test for homogeneity. All expected counts are at least 5.
3. Compute to Compare
In StatCrunch: Stat → Tables → Contingency → with summary
4. Interpret
p-value = 0.7367 > 0.05 = α. Fail to reject H0. There is not enough evidence to conclude that the distributions of opinions for freshmen and sophomores are different.
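The test for homogeneity uses the same arithmetic as the test for independence; only the sampling design and the wording of the hypotheses differ. A standard-library sketch for the freshmen/sophomore table, with the same hypothetical helpers as a sketch of the computation (not StatCrunch's internals):

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), a = df/2, y = x/2.
    """
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0             # df = 2: Q(1, y) = exp(-y)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5  # df = 1: Q(1/2, y) = erfc(sqrt(y))
    while 2 * a < df:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q


# Rows: Freshmen, Sophomores; columns: Strongly Agree, Agree, Disagree, Strongly Disagree.
opinions = [
    [45, 33, 18, 7],
    [32, 28, 20, 6],
]
row_totals = [sum(row) for row in opinions]
col_totals = [sum(col) for col in zip(*opinions)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(opinions, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(opinions) - 1) * (len(opinions[0]) - 1)  # (2 - 1)(4 - 1) = 3
p_value = chi2_sf(stat, df)
smallest = min(min(row) for row in expected)
print(f"chi2 = {stat:.3f}, DF = {df}, p-value = {p_value:.4f}, smallest expected = {smallest:.2f}")
```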
Comparing Test for Independence and Difference Between Proportions
For two variables that each have two possible outcomes, the test for independence gives the same result as a two-tailed test for the difference between two proportions.
To show that one outcome occurs with higher probability in one group than in another, only the one-tailed test for a difference between proportions can be used.
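The equivalence holds because, for a 2×2 table, the χ² statistic equals the square of the pooled two-proportion z statistic, and P(|Z| ≥ |z|) = P(χ²₁ ≥ z²). The sketch below checks this on a 2×2 piece of the Obesity data from Guided Exercise 2 (Dating vs. Married only); that subset is chosen here purely for illustration. Note that SciPy's `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so the match with z² holds there only with `correction=False`.

```python
import math

# 2x2 subset of the Obesity study data: Dating vs. Married.
obese = {"Dating": 81, "Married": 147}
group_size = {"Dating": 440, "Married": 424}

# Two-proportion z statistic using the pooled proportion.
p1 = obese["Dating"] / group_size["Dating"]
p2 = obese["Married"] / group_size["Married"]
pooled = (obese["Dating"] + obese["Married"]) / (group_size["Dating"] + group_size["Married"])
se = math.sqrt(pooled * (1 - pooled) * (1 / group_size["Dating"] + 1 / group_size["Married"]))
z = (p1 - p2) / se

# Chi-square statistic for the same 2x2 table (no continuity correction).
table = [[81, 147], [440 - 81, 424 - 147]]  # rows: obese, not obese
row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

print(f"z = {z:.3f}, z^2 = {z * z:.3f}, chi2 = {chi2:.3f}")
```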
10.4
Hypothesis Tests When Sample Sizes Are Small
Small Sample Sizes: Consolidation
Were hospitalization rates from the swine flu different for different ages?
When some expected counts are less than 5, the χ² test cannot be used.
Instead, consolidate the age groups into just three: young, middle, and old.
Small Sample Sizes: Consolidation
Were hospitalization rates from the swine flu different for different ages?
Age Category
        Under 15  15–29  30 and Older  Totals
Yes     16        9      10            35
No      239       241    104           584
Totals  255       250    114           619
Small Sample Sizes: Consolidation
Were hospitalization rates from the swine flu different for different ages?
Now all expected counts are at least 5 (the smallest is about 6.4), so the χ² test can be used.
The p-value ≈ 0.12 is large.
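A short sketch confirming that the consolidated table meets the expected-count condition and reproducing the p-value. With 2 rows and 3 columns, DF = 2, so the upper tail is exactly exp(−χ²/2) and no statistics library is needed.

```python
import math

# Consolidated swine-flu table; columns: Under 15, 15-29, 30 and Older.
flu = [
    [16, 9, 10],      # hospitalized: Yes
    [239, 241, 104],  # hospitalized: No
]
row_totals = [sum(r) for r in flu]
col_totals = [sum(c) for c in zip(*flu)]
n = sum(row_totals)  # 619
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(flu, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(flu) - 1) * (len(flu[0]) - 1)  # (2 - 1)(3 - 1) = 2
p_value = math.exp(-stat / 2)            # exact upper tail when DF = 2

smallest = min(min(r) for r in expected)
print(f"smallest expected = {smallest:.2f}, chi2 = {stat:.2f}, p-value = {p_value:.2f}")
# smallest expected = 6.45, chi2 = 4.24, p-value = 0.12
```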
Were hospitalization rates from the swine flu different for different ages?
Fail to reject the null hypothesis. There is not enough evidence to conclude that hospitalization rates from the swine flu were different for different ages.
Problems with this approach:
Grouping infants with young teens may not make sense.
Grouping middle-aged people with senior citizens may not make sense.
Fisher’s Exact Test
Used to compare two proportions (or more proportions, with advanced techniques).
Can be used with small sample sizes.
Too advanced to carry out without technology such as StatCrunch.
For larger sample sizes, use a test for independence, a test for homogeneity, or a test for the difference between proportions.
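For a 2×2 table the idea behind Fisher's exact test is short: with all margins fixed, the top-left count follows a hypergeometric distribution, so exact probabilities replace the χ² approximation. The sketch below (hypothetical function name) uses one common two-sided convention, adding the probabilities of all tables no more likely than the observed one; StatCrunch's convention may differ. The example table is Fisher's classic tea-tasting experiment, not data from these slides. SciPy users can compare with `scipy.stats.fisher_exact`.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    With the row and column totals fixed, the top-left count follows a
    hypergeometric distribution. The p-value adds the probabilities of every
    possible table that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # probability that the top-left cell equals x, given the fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so ties are not lost to floating-point rounding
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))


# Illustration only: 8 cups, 4 milk-first; the taster labels 3 of the 4 correctly.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```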
Chapter 10
Case Study
Is Oil Amount Associated With Successful Popcorn?
Success means at least half the kernels popped in 75 seconds or less.
H0: The quality of popcorn and the amount of oil are independent.
Ha: The quality of popcorn and the amount of oil are associated.
Is Oil Amount Associated With Successful Popcorn?
All expected counts are at least 5.
The p-value = 0.006 is very small. Reject H0, accept Ha.
There is statistically significant evidence to support the claim that oil amount and popcorn success are associated.
Chapter 10
Guided Exercise 1
Are Humans Like Random Number Generators?
38 students were asked to pick a “random” number from 1 to 5.
Test the hypothesis that humans are not like random number generators. Use a significance level of 0.05, and assume the data were collected from a random sample of students.
Integer One Two Three Four Five
Frequency 3 5 14 11 5
Are Humans Like Random Number Generators?
1. Hypothesize
H0: Humans are like random number generators and produce numbers in equal quantities.
Ha: Humans do not produce numbers in equal quantities.
2. Prepare
Why are all expected counts equal to 7.6? Under H0 each of the 5 integers is equally likely, so each expected count is 38/5 = 7.6, which is at least 5.
Use the χ² statistic.
3. Compute to Compare
$$\chi^2 = \frac{(3-7.6)^2}{7.6} + \frac{(5-7.6)^2}{7.6} + \frac{(14-7.6)^2}{7.6} + \frac{(11-7.6)^2}{7.6} + \frac{(5-7.6)^2}{7.6} \approx 11.47$$

p-value = 0.0217
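A quick standard-library check of this computation. With 5 categories, DF = 4, and for DF = 4 the χ² upper tail has the exact closed form exp(−x/2)(1 + x/2), used here instead of a statistics library.

```python
import math

observed = [3, 5, 14, 11, 5]  # how often 1..5 was picked by 38 students
n, k = sum(observed), len(observed)
expected = n / k              # 38 / 5 = 7.6 under H0 (all integers equally likely)

stat = sum((o - expected) ** 2 / expected for o in observed)
df = k - 1                    # 4
# For DF = 4 the chi-square upper tail is exp(-x/2) * (1 + x/2).
p_value = math.exp(-stat / 2) * (1 + stat / 2)
print(f"chi2 = {stat:.2f}, DF = {df}, p-value = {p_value:.4f}")
# chi2 = 11.47, DF = 4, p-value = 0.0217
```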
4. Interpret
p-value = 0.0217 < 0.05 = α. Reject H0. Accept Ha.
Conclusion: There is statistically significant evidence that humans are different from random number generators.
Chapter 10
Guided Exercise 2
Obesity and Relationship
In a study reported in the medical journal Obesity the research subjects were categorized in terms of whether or not they were obese and whether they were dating, cohabiting, or married.
Test the hypothesis that the variables Relationship Status and Obesity are associated, using a significance level of 0.05.
1. Hypothesize
Calculate the row, column, and grand totals.
H0: Relationship status and obesity are independent.
Ha: Relationship status and obesity are associated.

           Dating  Cohabiting  Married  Total
Obese      81      103         147      331
Not Obese  359     326         277      962
Total      440     429         424      1293
2. Prepare
We choose the chi-square test for independence because the data were from one random sample in which the people were classified in two different ways. Find the smallest expected value and report it. Is it at least 5?
The smallest expected value is 108.5. Since it is much larger than 5, the χ² test can be used.
3. Compute to Compare
χ² ≈ 30.83, p-value < 0.001
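A sketch reproducing the smallest expected value, the statistic, and the p-value for the obesity table. For a 2×3 table DF = 2, so the exact upper tail is exp(−χ²/2).

```python
import math

# Rows: Obese, Not Obese; columns: Dating, Cohabiting, Married.
obesity = [
    [81, 103, 147],
    [359, 326, 277],
]
row_totals = [sum(r) for r in obesity]
col_totals = [sum(c) for c in zip(*obesity)]
n = sum(row_totals)  # 1293
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(obesity, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(obesity) - 1) * (len(obesity[0]) - 1)  # (2 - 1)(3 - 1) = 2
p_value = math.exp(-stat / 2)                    # exact upper tail for DF = 2
smallest = min(min(r) for r in expected)
print(f"smallest expected = {smallest:.1f}, chi2 = {stat:.2f}, p-value = {p_value:.1e}")
```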
4. Interpret
p-value < 0.001 < 0.05 = α. Reject H0. Accept Ha. There is statistically significant evidence to conclude that relationship status and obesity are associated.
Causality
Can we conclude from these data that living with someone is making some people obese and that marrying is making even more people obese? No. We can only conclude that obesity and relationship status are associated.
Can we conclude that obesity affects your relationship status? No. Cause and effect cannot be concluded just by looking at the data. A controlled experiment would have to be done, if that is possible.
Percentages
Find and compare the percentages obese in the three relationship statuses.
In StatCrunch, select Column Percent.
We see that the percent obese for the married category (34.67%) is much higher than the percent obese for the dating category (18.41%). The percent obese for cohabiting couples (24.01%) is in the middle.
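The column percents are each column's obese count divided by that column's total. A minimal sketch using the table's counts (the dictionary layout is just for readability):

```python
# Column percents: the percentage of each relationship group that is obese.
groups = ["Dating", "Cohabiting", "Married"]
obese = [81, 103, 147]
totals = [440, 429, 424]  # column totals from the table

percent_obese = {g: round(100 * o / t, 2) for g, o, t in zip(groups, obese, totals)}
print(percent_obese)  # {'Dating': 18.41, 'Cohabiting': 24.01, 'Married': 34.67}
```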