Section 4.3: 2-Way Tables and Marginal Distributions
AP Statistics
October 20, 2010
AP Statistics, Section 4.3, Part 1 2
What about Categorical Data?
Whenever we have looked at data so far, it has been quantitative vs. quantitative.
What tools do we use if we are faced with categorical data?
2-Way Tables
First, think of data where each member of the distribution belongs to only one category.
Education Level:
Did not complete high school
Completed high school
1 to 3 years of college
4 or more years of college
2-Way Tables
Next, think of a second variable where each member of the distribution also belongs to only one category.
Age:
25 to 34 years
35 to 54 years
55+ years
Sample 2-Way Table
25 to 34 35 to 54 55+ Total
Did not complete HS 4,474 9,155 14,224 27,853
Completed HS 11,546 26,481 20,060 58,087
1 to 3 years of college 10,700 22,618 11,127 44,445
4+ years of college 11,066 23,183 10,596 44,845
Total 37,786 81,435 56,008 175,230
Each number represents a “count”: the number of data points that meet both criteria (one education level and one age group).
Age is the “column variable”.
Education is the “row variable”.
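As a rough illustration of the row/column idea, the table can be stored so that each row label maps to a list of counts, one per column. This is only a sketch: the names `AGE_GROUPS`, `COUNTS`, `cell`, and `row_totals` are made up for this example, and the counts are copied from the table above.

```python
# Counts (thousands of persons) copied from the sample 2-way table.
# Columns are age groups (the column variable);
# dictionary keys are education levels (the row variable).
AGE_GROUPS = ["25 to 34", "35 to 54", "55+"]
COUNTS = {
    "Did not complete HS": [4474, 9155, 14224],
    "Completed HS": [11546, 26481, 20060],
    "1 to 3 years of college": [10700, 22618, 11127],
    "4+ years of college": [11066, 23183, 10596],
}


def cell(education, age):
    """Return the count for one education level and one age group."""
    return COUNTS[education][AGE_GROUPS.index(age)]


# Each row total is the sum of the counts in that row.
row_totals = {education: sum(row) for education, row in COUNTS.items()}

print(cell("Completed HS", "35 to 54"))  # 26481
print(row_totals["Completed HS"])        # 58087
```

Every cell answers a question about both variables at once, while each row total answers a question about education alone.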
Calculating Marginal Distributions
When we compute a marginal distribution, we look only at the totals (the values found in the right margin or the bottom margin).
In a marginal distribution, we divide each total by the “Grand Total” (the table total in the bottom-right corner).
25 to 34 35 to 54 55+ Total
Did not complete HS 4,474 9,155 14,224 27,853
Completed HS 11,546 26,481 20,060 58,087
1 to 3 years of college 10,700 22,618 11,127 44,445
4+ years of college 11,066 23,183 10,596 44,845
Total 37,786 81,435 56,008 175,230
Calculating Marginal Distributions
Years of school completed, by age, 2000
(thousands of persons)
25 to 34 35 to 54 55+ Total
Did not complete HS 4,474 9,155 14,224 27,853
Completed HS 11,546 26,481 20,060 58,087
1 to 3 years of college 10,700 22,618 11,127 44,445
4+ years of college 11,066 23,183 10,596 44,845
Total 37,786 81,435 56,008 175,230
(total with 4+ years of college) / (table total) = 44,845 / 175,230 = 25.6%
Calculating Marginal Distributions
Years of school completed, by age, 2000
(thousands of persons)
25 to 34 35 to 54 55+ Total
Did not complete HS 4,474 9,155 14,224 27,853
Completed HS 11,546 26,481 20,060 58,087
1 to 3 years of college 10,700 22,618 11,127 44,445
4+ years of college 11,066 23,183 10,596 44,845
Total 37,786 81,435 56,008 175,230
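Did not complete HS: 27,853 / 175,230 = 15.9%
Completed HS: 58,087 / 175,230 = 33.1%
1 to 3 years of college: 44,445 / 175,230 = 25.4%
4+ years of college: 44,845 / 175,230 = 25.6%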
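The four divisions above can be reproduced with a short script. This is a minimal sketch, not part of the original slides: the names `ROW_TOTALS`, `GRAND_TOTAL`, and `marginal_distribution` are chosen for this example, and the numbers are the right-margin totals from the table.

```python
# Right-margin totals (thousands of persons) from the table.
ROW_TOTALS = {
    "Did not complete HS": 27853,
    "Completed HS": 58087,
    "1 to 3 years of college": 44445,
    "4+ years of college": 44845,
}
GRAND_TOTAL = 175230  # table total, bottom-right corner


def marginal_distribution(totals, grand_total):
    """Divide each margin total by the grand total; return percents to 1 decimal."""
    return {label: round(100 * total / grand_total, 1)
            for label, total in totals.items()}


marginal = marginal_distribution(ROW_TOTALS, GRAND_TOTAL)
for label, percent in marginal.items():
    print(f"{label}: {percent}%")
```

The same function would give the marginal distribution of age if it were handed the bottom-margin totals instead.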
Graph of Marginal Distribution
[Bar graph. Vertical axis: Percent of Adults (0 to 35). Bars: No HS Degree, High School Only, Some College, 4 or more years]
Calculating Conditional Distributions
The “conditional” part sounds like:
“on the condition the respondents are 35 to 54”
“among those who have completed high school but did not go to college”
“for those respondents over 55 years of age”
25 to 34 35 to 54 55+ Total
Did not complete HS 4,474 9,155 14,224 27,853
Completed HS 11,546 26,481 20,060 58,087
1 to 3 years of college 10,700 22,618 11,127 44,445
4+ years of college 11,066 23,183 10,596 44,845
Total 37,786 81,435 56,008 175,230
Calculating Conditional Distributions
When we look at a conditional distribution, we restrict ourselves to one particular row or column of counts (not the “margins”).
In a conditional distribution, we divide each count by the “Total” of that row or column.
25 to 34 35 to 54 55+ Total
Did not complete HS 4,474 9,155 14,224 27,853
Completed HS 11,546 26,481 20,060 58,087
1 to 3 years of college 10,700 22,618 11,127 44,445
4+ years of college 11,066 23,183 10,596 44,845
Total 37,786 81,435 56,008 175,230
Calculating Conditional Distributions
Years of school completed, by age, 2000 (thousands of persons)
25 to 34 35 to 54 55+ Total
Did not complete HS 4,474 9,155 14,224 27,853
Completed HS 11,546 26,481 20,060 58,087
1 to 3 years of college 10,700 22,618 11,127 44,445
4+ years of college 11,066 23,183 10,596 44,845
Total 37,786 81,435 56,008 175,230
(25 to 34 with HS diploma only) / (total with HS diploma only) = 11,546 / 58,087 = 19.9%
Calculating Conditional Distributions
Years of school completed, by age, 2000 (thousands of persons)
25 to 34 35 to 54 55+ Total
Did not complete HS 4,474 9,155 14,224 27,853
Completed HS 11,546 26,481 20,060 58,087
1 to 3 years of college 10,700 22,618 11,127 44,445
4+ years of college 11,066 23,183 10,596 44,845
Total 37,786 81,435 56,008 175,230
25 to 34: 11,546 / 58,087 = 19.9%
35 to 54: 26,481 / 58,087 = 45.6%
55+: 20,060 / 58,087 = 34.5%
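The conditional distribution of age among those who completed high school only can be checked the same way. This is a sketch under the assumption that we condition on the “Completed HS” row; the names `COMPLETED_HS` and `conditional_distribution` are invented for this example.

```python
# Counts (thousands of persons) from the "Completed HS" row of the table.
AGE_GROUPS = ["25 to 34", "35 to 54", "55+"]
COMPLETED_HS = [11546, 26481, 20060]


def conditional_distribution(labels, counts):
    """Divide each count by the total of its own row; return percents to 1 decimal."""
    total = sum(counts)  # 58,087 for this row, not the grand total
    return {label: round(100 * count / total, 1)
            for label, count in zip(labels, counts)}


conditional = conditional_distribution(AGE_GROUPS, COMPLETED_HS)
for age, percent in conditional.items():
    print(f"{age}: {percent}%")
```

The only difference from a marginal distribution is the denominator: the row (or column) total replaces the grand total.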
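Round-off Error
Occurs when each entry in a table has been rounded, so the printed totals do not exactly match the sums of the printed entries.
For example, the entries in the 35 to 54 column add up to 81,437, but the printed column total is 81,435.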
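One way to spot round-off error in the education-by-age table is to add up each column and compare the result with the printed total. The sketch below does that; the names `COLUMNS`, `PRINTED_COLUMN_TOTALS`, and `discrepancies` are chosen for this example only.

```python
# Column entries and printed column totals (thousands of persons) from the table.
COLUMNS = {
    "25 to 34": [4474, 11546, 10700, 11066],
    "35 to 54": [9155, 26481, 22618, 23183],
    "55+": [14224, 20060, 11127, 10596],
}
PRINTED_COLUMN_TOTALS = {"25 to 34": 37786, "35 to 54": 81435, "55+": 56008}

# Difference between the sum of the printed entries and the printed total.
discrepancies = {
    age: sum(entries) - PRINTED_COLUMN_TOTALS[age]
    for age, entries in COLUMNS.items()
}

for age, difference in discrepancies.items():
    print(f"{age}: sum of entries differs from printed total by {difference}")
```

Differences this small (a few thousand persons out of tens of millions) are what rounding each entry separately would be expected to produce.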
Graph of Conditional Distribution
[Bar graph. Vertical axis: Percent who only completed high school (0 to 50). Bars: 25 to 34, 35 to 54, 55 and over]
Simpson’s Paradox
A paradox is “a statement that is seemingly contradictory or opposed to common sense and yet is perhaps true”
Simpson’s Paradox involves percentages…
Which Hospital Will You Choose?
Hospital A Hospital B Total
Died 63 16 79
Survived 2037 784 2821
Total 2100 800 2900
On the condition you were admitted to Hospital A, what is your chance of surviving?
Survival rate (SR) = 2037 / 2100 = 97%
Among the patients of Hospital B, what percentage survived?
SR = 784 / 800 = 98%
What if you are in “good” condition?
Hospital A Hospital B Total
Died 6 8 14
Survived 594 592 1186
Total 600 600 1200
If you were admitted to Hospital A in good condition, what is your chance of surviving?
SR = 594 / 600 = 99.0%
Among Hospital B patients in good condition, what percentage lived?
SR = 592 / 600 = 98.7%
What if you are in “poor” condition?
Hospital A Hospital B Total
Died 57 8 65
Survived 1443 192 1635
Total 1500 200 1700
If you were admitted to Hospital A in poor condition, what is your chance of living?
SR = 1443 / 1500 = 96.2%
Among Hospital B patients in poor condition, what percentage lived?
SR = 192 / 200 = 96.0%
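Simpson’s Paradox
Refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group.
In the hospital example, Hospital A has the higher survival rate among patients in good condition and among patients in poor condition, yet Hospital B has the higher survival rate when the two groups are combined.
The lurking variables in Simpson’s Paradox are categorical (here, the patient’s condition).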
This happens very rarely!
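The reversal in the hospital tables can be verified with a few lines of arithmetic. The sketch below is an illustration only: the names `DATA`, `survival_rate`, and `combined` are invented here, and the counts are copied from the “good” and “poor” condition tables.

```python
# (died, survived) counts for each hospital, split by patient condition.
DATA = {
    "good": {"A": (6, 594), "B": (8, 592)},
    "poor": {"A": (57, 1443), "B": (8, 192)},
}


def survival_rate(died, survived):
    """Percent of patients who survived."""
    return 100 * survived / (died + survived)


def combined(hospital):
    """Add the good-condition and poor-condition counts for one hospital."""
    died = sum(DATA[condition][hospital][0] for condition in DATA)
    survived = sum(DATA[condition][hospital][1] for condition in DATA)
    return died, survived


# Within each condition, Hospital A has the higher survival rate.
for condition, hospitals in DATA.items():
    rate_a = survival_rate(*hospitals["A"])
    rate_b = survival_rate(*hospitals["B"])
    print(f"{condition}: A = {rate_a:.1f}%, B = {rate_b:.1f}%")

# After combining the two conditions, Hospital B has the higher rate.
overall_a = survival_rate(*combined("A"))
overall_b = survival_rate(*combined("B"))
print(f"overall: A = {overall_a:.1f}%, B = {overall_b:.1f}%")
```

The counts suggest why the direction flips: 1500 of Hospital A’s 2100 patients arrived in poor condition, compared with 200 of Hospital B’s 800, so Hospital A’s overall rate is pulled toward its poor-condition rate.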
Assignment
Complete Worksheets 4.2A and 4.2B