Module 1 – Proposed Rearrangements and Additions
New Order Topic Comment Mathematics Required Software? Stage Corrections, Changes to Existing Material
BN1.1 Now Showing: Basic Numeracy Five questions None No Encounter NA
BN1.2 Background Bugaboos Currently 1.1 Percentages, addition, division No Encounter None
BN1.3 Times Table Troubles Currently 1.2 Multiplication No Encounter Question 1 refers to Table 1.1 when it should refer to Table 1.2
BN1.4 Now Showing: Computations, Benchmarks Five Questions Percentages No Engage NA (suggestion to update benchmarks on video)
BN1.5 Perceptions, Pictures, Pcts Currently 1.3 Percentages No Engage None
BN1.6 Computation and Common Sense Currently 1.4 Division No Engage None
BN1.7 Really Random Reasoning Currently 1.12 Counting No Reflect None
BN1.8 Hardwired to Slippery Thinking Currently 1.13 None No Reflect None
BN1.9 Why Numeracy Matters New Division No Reflect NA
BN1.10 Mean versus Median New Addition, division Yes Extend NA
BN1.11 Variation Matters New Addition, division, counting Optional Extend NA
BN1.12 Computing the Standard Deviation New Addition, division, square root Yes Extend NA
BN1.13 Now Showing: Expers - Introduction Five questions None Encounter NA
BN1.14 Slippery Evidence and Confounding Currently 1.6 None No Encounter None
BN1.15 Confounding Confusion New None No Encounter NA
BN1.16 Now Showing: Compare and Rand Five Questions Percentages, counting No Engage NA
BN1.17 Experimentation Takes Flight Currently 1.8 Addition, division, counting No Engage None
BN1.18 Catching on to Experimentation New None No Engage NA
BN1.19 Now Showing: Stat Sig Five Questions None No Reflect NA
BN1.20 Questionable Evidence New None No Reflect NA
BN1.21 Random Reflections New None No Reflect NA
BN1.22 Assessing Statistical Significance New Addition, division, square root Optional Extend NA
BN1.23 Designer Thoughts New Addition, division Yes Extend NA
BN1.24 What to Believe? New None No Extend NA
BN1.25 Now Showing: Scatterplots Five Questions None Encounter NA
BN1.26 Scatterplot – Part I New None No Encounter NA
BN1.27 Scatterplot –Part II New None Yes Encounter NA
BN1.28 Now Showing: Corr Coef Five Questions Addition, division, square root No Engage NA
BN1.29 Corr – Part I New Addition, division, square root No Engage NA
BN1.30 Corr – Part II New Addition, division, square root Yes Engage NA
BN1.31 Now Showing: Causation Five Questions None No Reflect NA
BN1.32 Association and Causation Currently 1.9 None No Reflect Replace the graph in Exhibit 2 with one I generated for BN 1.27
BN1.33 Association and Causation Revisited Currently 1.10 None No Reflect None
BN1.34 Simpson Currently 1.14 Fractions, percentages No Extend None
BN1.35 Simpson Revisited Currently 1.15 Fractions, percentages No Extend None
BN1.36 Correlation and Outliers New Addition, division, square root Yes Extend NA
BC1.1 A Very Lucky Project Currently BC 1.1 None No Beyond the Class None
BC1.2 I Got Your Simpson Right Here Currently BC 1.2 Fractions Yes Beyond the Class Replace first bullet link (no longer exists) with http://blog.revolutionanalytics.com/2013/10/an-interactive-tool-to-explain-simpsons-paradox.html
BC1.3 Watch My Slippery Evidence Currently BC 1.3 None No Beyond the Class None
REMOVE Now Showing Number Sense Currently BN1.5 Old single-page video questions. Being broken into many different pages now.
BEYOND THE NUMBERS 1.1_ LEARNING OUTCOME _
Name: Section Number:
To be graded, all assignments must be completed and submitted on the original book page.
BEYOND THE NUMBERS 1.4_ LEARNING OUTCOME _
Exhibit 1
Nursing Knowledge Needed
Questions
2. The nurse had injected the patient four times with a full 0.9 milliliter syringe. What was the nurse’s
mistake? Defend your answer.
Exhibit 2
Statistical Citizenship
Questions
1. Briefly list three “features of the Constitution that suggest a numerical approach to
governance.”
2. Initially the government was reluctant to collect more than the most basic census information of
race, sex, and age. Why? During which of the three time periods addressed by Cohen did this
attitude change?
3. In the colonies, if you had 48 pounds of soap, how many firkins of soap did you have?
4. Cohen writes that “the post-Civil War era finally brought a full melding of statistical data with
the functioning of representative government.” List three facts supporting this claim.
BEYOND THE NUMBERS 1.10_ LEARNING OUTCOME _
Exhibit 1
Winging It
Exhibit 2
A BIG Visit
Exhibit 3
Gates Proof Inference
BEYOND THE NUMBERS 1.11_ LEARNING OUTCOME _
Variation Matters
Exhibit 1
Uncommon Reach
The wingspans of 18 persons are recorded in the table below, nine being ordinary folks and nine being current or former NBA players.
Questions
1. Use a software package (e.g., Excel or Numbers) to compute the mean and the median of the entire 18-person data set (“Data Set I”).
Mean ________
Median ________
2. How do the mean and median compare?
Non-NBA Person   Wingspan (in)   NBA Person   Wingspan (in)
P1 70 Ike Diogu 88
P2 73 Anthony Davis 88
P3 72 Shelden Williams 88
P4 69 Elton Brand 90
P5 69 Shawn Bradley 90
P6 68 Bismack Biyombo 90
P7 68 Saer Sene 93
P8 64 Gheorghe Muresan 94
P9 73 Manute Bol 102
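For instructors who want a quick check of Question 1 outside Excel or Numbers, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative only; the values are copied from the table above.

```python
from statistics import mean, median

# Wingspans (inches) copied from the Exhibit 1 table.
non_nba = [70, 73, 72, 69, 69, 68, 68, 64, 73]
nba = [88, 88, 88, 90, 90, 90, 93, 94, 102]

# "Data Set I" is the full 18-person data set.
data_set_1 = non_nba + nba

ds1_mean = mean(data_set_1)
ds1_median = median(data_set_1)  # average of the 9th and 10th sorted values

print("Mean:", ds1_mean)
print("Median:", ds1_median)
```

Both values come out to 80.5, even though no individual wingspan is near 80.5. That sets up the histogram discussion in Exhibit 2.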
Exhibit 2
Middle Muddle
Questions
1. Construct a histogram of all 18 of the wingspans in the table in Exhibit 1. Use the intervals shown in the table to the right, and plot the histogram on the axes below. Your instructor may require you to use a software package for this exercise, so follow her lead. The label “60” on the plot below denotes the first bin (Wingspan ≤ 60), the label “75” denotes the fourth bin (70 < Wingspan ≤ 75), and so on.
2. Locate the mean on the plot by drawing a vertical line segment there. How useful is the mean at describing these 18 wingspans? Explain.
Interval (Bin)          Frequency
Wingspan ≤ 60
60 < Wingspan ≤ 65
65 < Wingspan ≤ 70
70 < Wingspan ≤ 75
75 < Wingspan ≤ 80
80 < Wingspan ≤ 85
85 < Wingspan ≤ 90
90 < Wingspan ≤ 95
95 < Wingspan ≤ 100
100 < Wingspan ≤ 105
[Blank histogram axes: Frequency (0 to 7) versus Wingspan (inches), bin labels 60 through 105 and More; title: Wingspan Data]
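If software is required for the histogram, the bin frequencies can be tabulated before plotting. The sketch below assumes the bin convention described in Question 1 (each bin holds values greater than the previous label and at most its own label). The helper name `bin_frequencies` is hypothetical and not part of any package.

```python
def bin_frequencies(values, upper_edges):
    """Count values per bin; bin i holds previous_edge < value <= upper_edges[i].

    The final slot of the returned list is the "More" bin.
    """
    counts = [0] * (len(upper_edges) + 1)
    for v in values:
        for i, edge in enumerate(upper_edges):
            if v <= edge:
                counts[i] += 1
                break
        else:
            # No edge was large enough, so the value lands in "More".
            counts[-1] += 1
    return counts


# Bin labels from the plot axis: 60, 65, ..., 105.
upper_edges = list(range(60, 110, 5))

# Data Set I: the 18 wingspans from Exhibit 1.
data_set_1 = [70, 73, 72, 69, 69, 68, 68, 64, 73,
              88, 88, 88, 90, 90, 90, 93, 94, 102]

ds1_counts = bin_frequencies(data_set_1, upper_edges)
for edge, count in zip(upper_edges + ["More"], ds1_counts):
    print(edge, count)
```

The same function can be reused for Data Set II by passing a different list of values.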
3. Let’s add an additional data set. Now suppose you have a data set of 18 persons, 8 with a wingspan of 80.5 inches, 5 with a wingspan of 75.5 and 5 with a wingspan of 85.5. We’ll call this “Data Set II.” Find the mean and median of these 18 data points.
4. Construct a histogram of all 18 of the wingspans from Question 3. Use the intervals shown in the table to the right, and plot the histogram on the axes below. Your instructor may require you to use a software package for this exercise, so follow her lead. The label “60” on the plot below denotes the first bin (Wingspan ≤ 60), the label “75” denotes the fourth bin (70 < Wingspan ≤ 75), and so on.
Interval (Bin)          Frequency
Wingspan ≤ 60
60 < Wingspan ≤ 65
65 < Wingspan ≤ 70
70 < Wingspan ≤ 75
75 < Wingspan ≤ 80
80 < Wingspan ≤ 85
85 < Wingspan ≤ 90
90 < Wingspan ≤ 95
95 < Wingspan ≤ 100
100 < Wingspan ≤ 105
[Blank histogram axes: Frequency (0 to 7) versus Wingspan (inches), bin labels 60 through 105 and More; title: Wingspan Data]
Exhibit 3
The Spice of Life
Questions
1. Compare the data sets from Exhibit 1 and Exhibit 2. How close are their average values?
2. Compare the histograms from Exhibits 1 and 2. Specify at least two ways the histograms are notably different.
3. Let’s play a game. It costs you $1000 to play. Here are the rules. You get to pick two wingspans at random, eyes closed, out of either Data Set I or Data Set II. Your choice. Call your choices x1 and x2. You will receive a reward of (80.5 − x1)² + (80.5 − x2)² dollars. Suppose you decided to pick from Data Set I and chose P2 and Saer Sene. How much money did you make?
4. Think back to the game in Question 3. If you truly get to pick which Data Set you want to choose your two wingspans from, then which Data Set would you always be safest (in terms of anticipated profit) to choose and why?
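The arithmetic behind the game in Questions 3 and 4 can be sketched as follows. This assumes the reward is the sum of the two squared distances from 80.5, in dollars, and that "anticipated" reward means the long-run average over random picks. The function names are illustrative.

```python
COST = 1000      # dollars to play
CENTER = 80.5    # the common mean of both data sets


def reward(x1, x2):
    """Reward in dollars for two picked wingspans."""
    return (CENTER - x1) ** 2 + (CENTER - x2) ** 2


def average_reward(data):
    """Long-run average reward for two random picks from `data`.

    Each pick contributes, on average, the mean squared distance from 80.5.
    """
    return 2 * sum((CENTER - x) ** 2 for x in data) / len(data)


data_set_1 = [70, 73, 72, 69, 69, 68, 68, 64, 73,
              88, 88, 88, 90, 90, 90, 93, 94, 102]
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5

# Question 3: P2 has a wingspan of 73 and Saer Sene has 93.
example_reward = reward(73, 93)
example_net = example_reward - COST

avg_1 = average_reward(data_set_1)
avg_2 = average_reward(data_set_2)

print("Reward for P2 and Saer Sene:", example_reward)
print("Net after the cost to play:", example_net)
print("Average reward, Data Set I:", round(avg_1, 2))
print("Average reward, Data Set II:", round(avg_2, 2))
```

The average reward is driven entirely by how spread out each data set is around 80.5, which is the point of the exercise.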
FYI – Some plots and facts.
Mean of Data Set 1 is 80.5
Mean of Data Set 2 is 80.5
Histograms are below
[Histogram: Wingspan Data, Frequency (0 to 7) versus Wingspan (inches)]
[Histogram: Wingspan Data, Frequency (0 to 9) versus Wingspan (inches)]
BEYOND THE NUMBERS 1.12_ LEARNING OUTCOME _
Background
Standard Deviation = √( Σ(x − mean)² / (n − 1) )
Uncommon Reach Revisited
Variance 140.2647059
Standard Deviation 11.84334015
Variance 14.70588
Standard Deviation 3.834825
3.088365219
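The figures listed above can be reproduced with the sketch below. It assumes that the two variance/standard deviation pairs refer to Data Set I and Data Set II from BN 1.11, and that the sample formulas (dividing by n − 1) were used. Under those assumptions the last figure matches the ratio of the two standard deviations.

```python
from statistics import stdev, variance

# Data Set I: the 18 wingspans from BN 1.11, Exhibit 1.
data_set_1 = [70, 73, 72, 69, 69, 68, 68, 64, 73,
              88, 88, 88, 90, 90, 90, 93, 94, 102]
# Data Set II: 8 at 80.5 inches, 5 at 75.5 and 5 at 85.5.
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5

# statistics.variance and statistics.stdev use the n - 1 divisor.
var_1, sd_1 = variance(data_set_1), stdev(data_set_1)
var_2, sd_2 = variance(data_set_2), stdev(data_set_2)
sd_ratio = sd_1 / sd_2

print("Data Set I:  variance", var_1, "standard deviation", sd_1)
print("Data Set II: variance", var_2, "standard deviation", sd_2)
print("Ratio of standard deviations:", sd_ratio)
```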
BEYOND THE NUMBERS 1.13_ LEARNING OUTCOME _
Exhibit 1
Brains and Beats
Exhibit 2
Fuzzy Quasi is a Bear
BEYOND THE NUMBERS 1.16_ LEARNING OUTCOME _
A Measured Response
Distance (cm)   Time (sec)   Distance (cm)   Time (sec)
1 0.045 16 0.181
2 0.064 17 0.186
3 0.078 18 0.192
4 0.090 19 0.197
5 0.101 20 0.202
6 0.111 21 0.207
7 0.119 22 0.212
8 0.128 23 0.217
9 0.135 24 0.221
10 0.143 25 0.226
11 0.150 26 0.230
12 0.156 27 0.235
13 0.163 28 0.239
14 0.169 29 0.243
15 0.175 30 0.247
Group R   Time (sec)   Group L   Time (sec)
1 1
2 2
3 3
4 4
5 5
6 6
4. Find the mean of both groups. Based on those two values, is there evidence of a difference
between the reaction times of Group L and Group R? Defend your answer.
5. What role would the variance of the measurements in each group have in making the decision in
Question 4 more precise? Explain.
BEYOND THE NUMBERS 1.19_ LEARNING OUTCOME _
Exhibit 1
Cancer Carafe
Exhibit 2
Of Mice and People
Exhibit 1
What's Random?
Exhibit 2
Random Opposition
B A B A B A A A
B A B A A A B B
A B A B B A B B
Badge of Big
Table B
Group R   Time (sec)   Group L   Time (sec)
1 0.090 1 0.111
2 0.119 2 0.181
3 0.143 3 0.090
4 0.169 4 0.186
5 0.064 5 0.045
6 0.150 6 0.143
5. Compute
√
6. It turns out that the difference between the left-hand and right-hand reaction times fails to be
statistically significant if -2.23 < Z < 2.23, where Z is what you computed in Question 5. Do the
results shown in Table B support a statistically significant difference in reaction times? Why or
why not?
Instructor’s Note: The notation Z was used instead of “t” to avoid confusion later in the workbook. Also, the degrees of freedom were
computed using the simple estimate of 6 + 6 - 2.
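The formula in Question 5 did not survive in this copy, so the sketch below is an assumption. It takes Z to be the usual two-sample statistic with a pooled variance, which is consistent with the 6 + 6 - 2 degrees of freedom and the 2.23 cutoff mentioned in the note. With equal group sizes, the unpooled version gives the same value.

```python
from math import sqrt
from statistics import mean, variance

# Reaction times (sec) copied from Table B.
group_r = [0.090, 0.119, 0.143, 0.169, 0.064, 0.150]
group_l = [0.111, 0.181, 0.090, 0.186, 0.045, 0.143]

n_r, n_l = len(group_r), len(group_l)

# Pooled sample variance with n_r + n_l - 2 degrees of freedom (assumed form).
pooled_var = ((n_r - 1) * variance(group_r)
              + (n_l - 1) * variance(group_l)) / (n_r + n_l - 2)

# Difference in means divided by its estimated standard error.
z = (mean(group_l) - mean(group_r)) / sqrt(pooled_var * (1 / n_r + 1 / n_l))

CUTOFF = 2.23  # from Question 6
significant = abs(z) >= CUTOFF

print("Z =", round(z, 3))
print("Statistically significant:", significant)
```

Under these assumptions Z lands well inside the interval from -2.23 to 2.23, so Table B would not support a statistically significant difference.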
BEYOND THE NUMBERS 1.23_ LEARNING OUTCOME _
Pairing Profits
Reaction Times (sec)
Left Right
1 1.05
0.74 0.76
0.66 0.71
0.78 0.79
0.68 0.69
0.65 0.72
0.75 0.75
0.69 0.72
0.94 0.99
0.79 0.8
0.81 0.82
0.62 0.67
(24-1) x Variance of All 24 Reaction Times = Variance Attributed to Hand (L/R) + Variance Explained by Pairing + Variance Left Unexplained
(24-1) x Variance of All 24 Reaction Times = Variance Attributed to Hand (L/R) + Variance Left Unexplained
Time
Source DF Sum of Squares Mean Square F Value Pr > F
Model 12 0.30148333 0.02512361 95.30 <.0001
Error 11 0.00290000 0.00026364
Corrected Total 23 0.30438333
R-Square Coeff Var Root MSE Time Mean
0.990473 2.097337 0.016237 0.774167
Source DF Type I SS Mean Square F Value Pr > F
Subject 11 0.29608333 0.02691667 102.10 <.0001
Hand 1 0.00540000 0.00540000 20.48 0.0009
Source DF Type III SS Mean Square F Value Pr > F
Subject 11 0.29608333 0.02691667 102.10 <.0001
Hand 1 0.00540000 0.00540000 20.48 0.0009
Time
Source DF Sum of Squares Mean Square F Value Pr > F
Model 1 0.00540000 0.00540000 0.40 0.5350
Error 22 0.29898333 0.01359015
Corrected Total 23 0.30438333
R-Square Coeff Var Root MSE Time Mean
0.017741 15.05836 0.116577 0.774167
Source DF Type I SS Mean Square F Value Pr > F
Hand 1 0.00540000 0.00540000 0.40 0.5350
Source DF Type III SS Mean Square F Value Pr > F
Hand 1 0.00540000 0.00540000 0.40 0.5350
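The sums of squares in the two outputs above can be reproduced directly from the Left/Right table. This is a minimal sketch, assuming each row of the table is one subject measured with both hands. Variable names are illustrative and nothing here is SAS-specific.

```python
# Reaction times (sec) copied from the Pairing Profits table, one row per subject.
left = [1.00, 0.74, 0.66, 0.78, 0.68, 0.65, 0.75, 0.69, 0.94, 0.79, 0.81, 0.62]
right = [1.05, 0.76, 0.71, 0.79, 0.69, 0.72, 0.75, 0.72, 0.99, 0.80, 0.82, 0.67]

n = len(left)                      # 12 subjects, 24 reaction times
all_times = left + right
grand_mean = sum(all_times) / len(all_times)

# Total: (24 - 1) x variance of all 24 reaction times.
ss_total = sum((t - grand_mean) ** 2 for t in all_times)

# Attributed to hand: each hand mean versus the grand mean.
mean_l, mean_r = sum(left) / n, sum(right) / n
ss_hand = n * ((mean_l - grand_mean) ** 2 + (mean_r - grand_mean) ** 2)

# Explained by pairing: each subject's mean versus the grand mean.
ss_subject = 2 * sum(((l + r) / 2 - grand_mean) ** 2 for l, r in zip(left, right))

# What is left unexplained, with and without the pairing term.
ss_error_paired = ss_total - ss_hand - ss_subject
ss_error_unpaired = ss_total - ss_hand

# F statistics for Hand: error has 11 df when paired, 22 df when not.
f_paired = ss_hand / (ss_error_paired / (n - 1))
f_unpaired = ss_hand / (ss_error_unpaired / (2 * n - 2))

print("Total SS:", round(ss_total, 8))
print("Hand SS:", round(ss_hand, 8), " Subject SS:", round(ss_subject, 8))
print("F for Hand, paired:", round(f_paired, 2), " unpaired:", round(f_unpaired, 2))
```

The Hand sum of squares is the same in both analyses. Only the unexplained part changes, which is why pairing moves the F value for Hand from about 0.40 to about 20.48.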
BEYOND THE NUMBERS 1.24_ LEARNING OUTCOME _
Exhibit 1
Piltdown Meltdown - 1912
Exhibit 2
Marker Mice - 1974
Exhibit 3
Doing the Dishes - 2010
1.
2.
3.
4.
5.
BEYOND THE NUMBERS 1.26_ LEARNING OUTCOME _
Scatterplots --- Part I
Exhibit 1
Anscombe's Activity
These data were created by F.J. Anscombe* in 1973 to remind us of the importance of plotting our data. You will see these data again later on in this workbook.
Questions
1. Create a scatterplot of y1 vs. x1. Does the plot show a positive
association or a negative association? How do you know?
2. Create a scatterplot of y4 vs. x4. Does the plot show a positive association or a negative
association? How do you know? Make sure you turn in your plots with this assignment.
Obs x1 y1 x4 y4
1 10 8.04 8 6.58
2 8 6.95 8 5.76
3 13 7.58 8 7.71
4 9 8.81 8 8.84
5 11 8.33 8 8.47
6 14 9.96 8 7.04
7 6 7.24 8 5.25
8 4 4.26 19 12.5
9 12 10.84 8 5.56
10 7 4.82 8 7.91
11 5 5.68 8 6.89
*Edward R. Tufte, The Visual Display of Quantitative Information (Cheshire, Connecticut: Graphics Press, 1983), pp. 14‐15. F.J. Anscombe, "Graphs in Statistical Analysis," American Statistician, vol. 27 (Feb 1973), pp. 17‐21.
[Blank scatterplot axes: Y1 (0 to 15) versus X1 (0 to 20)]
[Blank scatterplot axes: Y4 (0 to 15) versus X4 (0 to 20)]
Exhibit 2
Vaccines and Risk
There is an ongoing debate over possible links between vaccines containing thimerosal and the onset of autism. The data set below records the percentages of California children who had received 4 doses of DTP by their 2nd birthday and the number of autism cases in California’s Department of Developmental Services’ regional service center system*.
Questions
1. Create a scatterplot of Autism Cases versus DTP Coverage. Does the plot show a positive association
or a negative association? How do you know? Make sure you turn in your plot with this
assignment.
2. Is the association weak or strong? Defend your reasoning.
*Dr. Loring Dales from the Immunization Branch, California Department of Health Service made these data publicly available at http://www.putchildrenfirst.org/media/4.6.pdf. See also http://www.ncbi.nlm.nih.gov/pubmed/11231748
Year   DTP Coverage (%)   Number of Autism Cases
1980   50.9   176
1981   55.4   201
1982   52.1   212
1983   47.7   229
1984   48.9   246
1985   54.3   293
1986   54.1   357
1987   55.3   347
1988   60.9   436
1989   62.2   522
1990   65.9   663
1991   67.3   823
1992   69.8   1042
1993   73.6   1090
1994   75.7   1182
[Blank scatterplot axes: Autism Cases (0 to 1400) versus DTP Coverage (%) (40 to 80); title: California 1980‐1994]
FYI‐ Plots, so you can see what the students should see.
Part 1 Exhibit 1
[Scatterplots: x1-y1 and x4-y4]
Part 1 Exhibit 2
[Scatterplot: Autism Cases versus DTP Coverage (%), California 1980‐1994]
BEYOND THE NUMBERS 1.27_ LEARNING OUTCOME _
Scatterplots --- Part II
Exhibit 1
Mortality and Global Warming
In this exercise we want you to construct a scatterplot of “Child Mortality” versus “CO2 Emissions” for
192 countries, from 2006 data, archived by Dr. Hans Rosling*. These data are available at
http://www.heretheyarenow . You must use a computer software package (e.g. Excel or Numbers), or
an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is
required. Make sure you label your axes and provide a professional plot. Answer the questions below.
Save your computer work. You may need it for another Beyond the Numbers later on.
Questions
1. What computer software did you use to construct your plot? Make sure you turn in your plot
with this assignment.
2. Does the scatterplot show a positive association or a negative association? How do you know?
3. Is the association weak or strong? Defend your reasoning.
*Hans Rosling is Professor of International Health at Karolinska Institute and the co‐founder and chairman of the Gapminder Foundation. Dr.
Rosling is committed to making important public data available for easy plotting and analysis with his Gapminder software.
Exhibit 2
Mortality and Global Warming Transformed
Save your computer work for this Exhibit. You may need it for another Beyond the Numbers later on.
Questions
1. Redo the scatterplot from Exhibit 1. Same rules on required use of a computer package and
professional‐looking result. This time plot log10(Child Mortality) versus log10(CO2 Emissions).
How does this plot compare to the one you did in Exhibit 1? Make sure you turn in your plot
with this assignment.
2. Does the scatterplot show a positive association or a negative association? How do you know?
3. Is the association weak or strong? Defend your reasoning.
FYI‐ Plots, so you can see what the students should see.
Part 2 Exhibit 1 – I’d like this to replace the plot on page 20 of current workbook. Same data, this is
lin/lin scale and should not be proprietary. For reference, India (blue), China (red) and US are
highlighted dots.
[Scatterplot: Child mortality (0-5 year old dying per 1,000 born), 0 to 200, versus CO2 emissions (tonnes per person), 0 to 70]
Part 2 Exhibit 2 – Nice linear outcome. Will compare r in Exhibit 1 and Exhibit 2 in next BN
[Scatterplot: Child mortality (0-5 year old dying per 1,000 born), 0 to 3, versus CO2 emissions (tonnes per person), -2 to 2]
BEYOND THE NUMBERS 1.28_ LEARNING OUTCOME _
1.
2.
3.
4.
5.
BEYOND THE NUMBERS 1.29_ LEARNING OUTCOME _
Computing Correlations --- Part I
Exhibit 1
Anscombe's Activity Revisited
Recall the Anscombe’s data from an earlier Beyond the Numbers. In this activity you will be asked to compute the correlation coefficient for each pair of variables and compare.
Questions
1. Compute r for the (x1,y1) pairs.
2. Compute r for the (x4,y4) pairs.
3. Compare the two r values you found in light of the scatterplots of these data (which you plotted earlier). What note of inferential caution does this exercise sound?
Obs   x1   y1   x1·y1   x1²   y1²
1 10 8.04
2 8 6.95
3 13 7.58
4 9 8.81
5 11 8.33
6 14 9.96
7 6 7.24
8 4 4.26
9 12 10.84
10 7 4.82
11 5 5.68
Σx =   Σy =   Σxy =   Σx² =   Σy² =
Obs   x4   y4   x4·y4   x4²   y4²
1 8 6.58
2 8 5.76
3 8 7.71
4 8 8.84
5 8 8.47
6 8 7.04
7 8 5.25
8 19 12.5
9 8 5.56
10 8 7.91
11 8 6.89
Σx =   Σy =   Σxy =   Σx² =   Σy² =
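The column sums in the two tables above feed the computational formula for r. A minimal sketch, assuming the standard form r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²)); the function name `correlation_from_sums` is hypothetical.

```python
from math import sqrt


def correlation_from_sums(xs, ys):
    """Correlation coefficient r built from the worksheet's column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Anscombe's data as listed in the tables above.
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]

r1 = correlation_from_sums(x1, y1)
r4 = correlation_from_sums(x4, y4)

print("r for (x1, y1):", round(r1, 2))
print("r for (x4, y4):", round(r4, 2))
```

Both values round to 0.82 even though the two scatterplots look nothing alike, which is the caution Question 3 asks about.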
Exhibit 2
Vaccines and Risk Revisited
Dr. Loring Dales of the Immunization Branch, California Department of Health Service writes “here are the data we have on (a) percentages of California children who had received 4 doses of DTP by their 2nd
Questions
1. Fill out all the entries in the table that are missing. Your instructor may have you retype the
table if you are not required to turn in this actual page.
2. Compute the correlation coefficient between DTP Coverage and Autism Prevalence
Year   x = DTP Coverage (%)   y = Number of Autism Cases   xy   x²   y²
1980   50.9   176
1981   55.4   201
1982   52.1   212
1983   47.7   229
1984   48.9   246
1985   54.3   293
1986   54.1   357
1987   55.3   347
1988   60.9   436
1989   62.2   522
1990   65.9   663
1991   67.3   823
1992   69.8   1042
1993   73.6   1090
1994   75.7   1182
Σx =   Σy =   Σxy =   Σx² =   Σy² =
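For Questions 1 and 2, the missing columns and the correlation can be generated as sketched below. The layout of the `rows` list is only an illustration of the table above, and the computational formula for r is assumed to be the one intended by the Σ columns.

```python
from math import sqrt

# (year, x = DTP coverage %, y = number of autism cases) from the table above.
data = [
    (1980, 50.9, 176), (1981, 55.4, 201), (1982, 52.1, 212),
    (1983, 47.7, 229), (1984, 48.9, 246), (1985, 54.3, 293),
    (1986, 54.1, 357), (1987, 55.3, 347), (1988, 60.9, 436),
    (1989, 62.2, 522), (1990, 65.9, 663), (1991, 67.3, 823),
    (1992, 69.8, 1042), (1993, 73.6, 1090), (1994, 75.7, 1182),
]

# Question 1: fill in the xy, x^2 and y^2 columns for each year.
rows = [(year, x, y, x * y, x * x, y * y) for year, x, y in data]

# Column totals for the bottom row of the table.
n = len(rows)
sum_x = sum(r[1] for r in rows)
sum_y = sum(r[2] for r in rows)
sum_xy = sum(r[3] for r in rows)
sum_x2 = sum(r[4] for r in rows)
sum_y2 = sum(r[5] for r in rows)

# Question 2: the correlation coefficient from the column totals.
r_dtp = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print("Sum x:", round(sum_x, 1), " Sum y:", sum_y)
print("r =", round(r_dtp, 4))
```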
FYI‐ Key bit, so you can see what the students should see.
Part 1 Exhibit 1
r = 0.82 for (x1, y1)
r = 0.82 for (x4, y4), YIKES reaction
Part 1 Exhibit 2
R = 0.9616552 (“strong” correlation, will address later in causation section)
Part 2 Exhibit 1
R = ‐0.438960093 (seems a bit small – because of curvature)
Part 2 Exhibit 2
R = ‐0.802859716 (much more reasonable after transformation)
[Scatterplots: x1-y1 and x4-y4]
BEYOND THE NUMBERS 1.30_ LEARNING OUTCOME _
Computing Correlations --- Part II
Exhibit 1
Mortality and Global Warming Revisited
Refer to BN1.27: Scatterplots – Part II. Hans Rosling is Professor of International Health at Karolinska
Institute and the co‐founder and chairman of the Gapminder Foundation. Dr. Rosling is committed to
making important public data available for easy plotting and analysis with his Gapminder software.
In this exercise we want you to compute the correlation coefficient between “Child Mortality” and
“CO2 Emissions” for 192 countries, from 2006 data archived by Dr. Rosling. These data are available at
http://www.heretheyarenow . You must use a computer software package such as Excel or Numbers, or
an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is
required.
Questions
1. What is the value of r?
2. Does the value of r suggest the association between Child Mortality and CO2 Emissions is strong
or weak? How do you know?
3. Do you think the computation of r is appropriate for these data? Why or why not? (You
constructed a scatterplot of these data in BN_1.27).
Exhibit 2
Transformations Revisited
Refer to BN1.27: Scatterplots – Part II. In this exercise we want you to compute the correlation
coefficient between log10(Child Mortality) and log10(CO2 Emissions) for the 192 countries, from 2006, in
the data archived by Dr. Rosling. The raw data are available at http://www.heretheyarenow . You must
use a computer software package such as Excel or Numbers, or an online applet. Your instructor will tell
you which package she requires, if, indeed, a particular one is required.
Questions
1. What is the value of r?
2. How does this value of r compare to the one found in Exhibit 1?
3. Does the value of r suggest the association between Child Mortality and CO2 Emissions is strong
or weak? How do you know?
4. Do you think the computation of r is appropriate for these data? Why or why not? (You
constructed a scatterplot of these data in BN_1.27.) See page kjlkjl also.
FYI‐ Key bit, so you can see what the students should see.
Part 2 Exhibit 1
R = ‐0.438960093 (seems a bit small – because of curvature)
Part 2 Exhibit 2
R = ‐0.802859716 (much more reasonable after transformation)
1.
2.
3.
4.
5.
BEYOND THE NUMBERS 1.36_ LEARNING OUTCOME _
Outliers and Leverage Points
Exhibit 1
Heptathletes
Finish data for two events from the 1992 Olympic Heptathlon are shown below. A scatterplot of the data is shown just to the right of the table. Chouaa is the green data point and Barber is the red one.
Questions
1. What kind of association do you see in the scatterplot –
positive, negative, neither? Support your answer.
[Scatterplot: Heptathlon Results from 1992, Javelin Distance (Meters), 0 to 60, versus Hurdles Time (Seconds), 12 to 17]
Name   Hurdles (seconds)   Javelin (meters)
Joyner‐Kersee 12.85 44.98
Nastase 12.86 41.3
Dimitrova 13.23 44.48
Belova 13.25 41.9
Braun 13.25 51.12
Beer 13.48 48.1
Court 13.48 52.12
Kamrowska 13.48 44.12
Wlodarczyk 13.57 43.46
Greiner 13.59 40.78
Kaljurand 13.64 47.42
Zhu 13.64 45.12
Skjaeveland 13.73 35.42
Lesage 13.75 41.28
Nazaroviene 13.75 44.42
Aro 13.87 45.42
Marxer 13.94 41.08
Rattya 13.96 49.02
Carter 13.97 37.58
Atroshchenko 14.03 45.18
Vaidianu 14.04 49
Teppe 14.06 52.58
Clarius 14.1 45.14
Bond‐Mills 14.31 43.3
Barber 14.79 0
Chouaa 16.62 44.4
2. Compute the correlation coefficient “r” for the entire data set. You should use a software package or
an online applet as required by your instructor. Is this value of “r” consistent with what you
answered in Question 1? Why or why not?
Exhibit 2
Language
“Outliers” in a scatterplot are data pairs that are not spatially close to the bulk of the data. Outliers are
not necessarily a problem for the human inference that arises from a correlation coefficient. However,
if the removal of a single outlier causes a distinct change in the correlation, then that outlier would be
called an “influence point” and influence points can disguise the essence of an association.
Questions
1. Looking at the scatterplot above, which athletes are outliers?
2. Compute the correlation coefficient “r” for the data set with Barber removed. Is Barber an
influence point? Why?
3. Compare the values of “r” that you computed for the entire data set and for the data set with
Barber removed. Which one best reflects the association seen in the scatterplot? Why?
FYI – Easy computations show the following:
a) r overall is ‐0.25213, not large but notably incongruous with the plot
b) r w/o Barber is 0.00061, definitely making her a leverage point
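The two values in the FYI can be checked with a short script. This is a sketch only: the `correlation` helper is hypothetical and uses the deviation form of r, and the data are copied from the Exhibit 1 table.

```python
from math import sqrt

# (name, hurdles time in seconds, javelin distance in meters) from Exhibit 1.
results = [
    ("Joyner-Kersee", 12.85, 44.98), ("Nastase", 12.86, 41.3),
    ("Dimitrova", 13.23, 44.48), ("Belova", 13.25, 41.9),
    ("Braun", 13.25, 51.12), ("Beer", 13.48, 48.1),
    ("Court", 13.48, 52.12), ("Kamrowska", 13.48, 44.12),
    ("Wlodarczyk", 13.57, 43.46), ("Greiner", 13.59, 40.78),
    ("Kaljurand", 13.64, 47.42), ("Zhu", 13.64, 45.12),
    ("Skjaeveland", 13.73, 35.42), ("Lesage", 13.75, 41.28),
    ("Nazaroviene", 13.75, 44.42), ("Aro", 13.87, 45.42),
    ("Marxer", 13.94, 41.08), ("Rattya", 13.96, 49.02),
    ("Carter", 13.97, 37.58), ("Atroshchenko", 14.03, 45.18),
    ("Vaidianu", 14.04, 49.0), ("Teppe", 14.06, 52.58),
    ("Clarius", 14.1, 45.14), ("Bond-Mills", 14.31, 43.3),
    ("Barber", 14.79, 0.0), ("Chouaa", 16.62, 44.4),
]


def correlation(pairs):
    """Correlation coefficient r from deviations about the two means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


all_pairs = [(h, j) for _, h, j in results]
without_barber = [(h, j) for name, h, j in results if name != "Barber"]

r_all = correlation(all_pairs)
r_without_barber = correlation(without_barber)

print("r, all 26 athletes:", round(r_all, 5))
print("r, Barber removed:", round(r_without_barber, 5))
```

Filtering on a different name (for example "Chouaa") shows how much each outlier moves r on its own.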