Name: Section Numberweb.as.uky.edu/statistics/users/rayens/STA210... · BN1.1 Now Showing: Basic Numeracy Five questions None No Encounter NA BN1.2 Background Bugaboos Currently 1.1

Module 1 – Proposed Rearrangements and Additions

New Order Topic Comment Mathematics Required Software? Stage Corrections, Changes to Existing Material

BN1.1 Now Showing: Basic Numeracy Five questions None No Encounter NA

BN1.2 Background Bugaboos Currently 1.1 Percentages, addition, division No Encounter None

BN1.3 Times Table Troubles Currently 1.2 Multiplication No Encounter Question 1 refers to Table 1.1 when it should refer to Table 1.2

BN1.4 Now Showing: Computations, Benchmarks Five Questions Percentages No Engage NA (suggestion to update benchmarks on video)

BN1.5 Perceptions, Pictures, Pcts Currently 1.3 Percentages No Engage None

BN1.6 Computation and Common Sense Currently 1.4 Division No Engage None

BN1.7 Really Random Reasoning Currently 1.12 Counting No Reflect None

BN1.8 Hardwired to Slippery Thinking Currently 1.13 None No Reflect None

BN1.9 Why Numeracy Matters New Division No Reflect NA

BN1.10 Mean versus Median New Addition, division Yes Extend NA

BN1.11 Variation Matters New Addition, division, counting Optional Extend NA

BN1.12 Computing the Standard Deviation New Addition, division, square root Yes Extend NA

BN1.13 Now Showing: Expers - Introduction Five questions None Encounter NA

BN1.14 Slippery Evidence and Confounding Currently 1.6 None No Encounter None

BN1.15 Confounding Confusion New None No Encounter NA

BN1.16 Now Showing: Compare and Rand Five Questions Percentages, counting No Engage NA

BN1.17 Experimentation Takes Flight Currently 1.8 Addition, division, counting No Engage None

BN1.18 Catching on to Experimentation New None No Engage NA

BN1.19 Now Showing: Stat Sig Five Questions None No Reflect NA

BN1.20 Questionable Evidence New None No Reflect NA

BN1.21 Random Reflections New None No Reflect NA

BN1.22 Assessing Statistical Significance New Addition, division, square root Optional Extend NA

BN1.23 Designer Thoughts New Addition, division Yes Extend NA

BN1.24 What to Believe? New None No Extend NA

BN1.25 Now Showing: Scatterplots Five Questions None Encounter NA

BN1.26 Scatterplot – Part I New None No Encounter NA

BN1.27 Scatterplot –Part II New None Yes Encounter NA

BN1.28 Now Showing: Corr Coef Five Questions Addition, division, square root No Engage NA

BN1.29 Corr – Part I New Addition, division, square root No Engage NA

BN1.30 Corr – Part II New Addition, division, square root Yes Engage NA

BN1.31 Now Showing: Causation Five Questions None No Reflect NA

BN1.32 Association and Causation Currently 1.9 None No Reflect Replace the graph in Exhibit 2 with one I generated for BN 1.27

BN1.33 Association and Causation Revisited Currently 1.10 None No Reflect None

BN1.34 Simpson Currently 1.14 Fractions, percentages No Extend None

BN1.35 Simpson Revisited Currently 1.15 Fractions, percentages No Extend None

BN1.36 Correlation and Outliers New Addition, division, square root Yes Extend NA

BC1.1 A Very Lucky Project Currently BC 1.1 None No Beyond the Class None

BC1.2 I Got Your Simpson Right Here Currently BC 1.2 Fractions Yes Beyond the Class Replace first bullet link (no longer exists) with http://blog.revolutionanalytics.com/2013/10/an-interactive-tool-to-explain-simpsons-paradox.html

BC1.3 Watch My Slippery Evidence Currently BC 1.3 None No Beyond the Class None

REMOVE Now Showing Number Sense Currently BN1.5 Old single-page video questions. Being broken into many different pages now.


BEYOND THE NUMBERS 1.1_ LEARNING OUTCOME _

Name: Section Number:

To be graded, all assignments must be completed and submitted on the original book page.






Exhibit 1

Nursing Knowledge Needed

Questions

2. The nurse had injected the patient four times with a full 0.9 milliliter syringe. What was the nurse’s mistake? Defend your answer.

Exhibit 2

Statistical Citizenship

http://www.maa.org/external_archive/QL/pgs7_20.pdf

Questions

1. Briefly list three “features of the Constitution that suggest a numerical approach to governance.”

2. Initially the government was reluctant to collect more than the most basic census information of race, sex, and age. Why? During which of the three time periods addressed by Cohen did this attitude change?

3. In the colonies, if you had 48 pounds of soap, how many firkins of soap did you have?

4. Cohen writes that “the post-Civil War era finally brought a full melding of statistical data with the functioning of representative government.” List three facts supporting this claim.




Exhibit 1

Winging It

http://www.robertniles.com/stats/

http://hypertextbook.com/facts/2006/bodyproportions.shtml

Exhibit 2

A BIG Visit

Exhibit 3

Gates Proof Inference


Variation Matters



Exhibit 1

Uncommon Reach

The wingspans of 18 persons are recorded in the table below, nine being ordinary folks and nine being current or former NBA players.

Questions

1. Use a software package (e.g. Excel or Numbers) to compute the mean and the median of the entire 18-person data set (“Data Set I”).

Mean ________

Median ________

2. How do the mean and median compare?

Non-NBA Person   Wingspan (in)   NBA Person          Wingspan (in)
P1               70              Ike Diogu           88
P2               73              Anthony Davis       88
P3               72              Shelden Williams    88
P4               69              Elton Brand         90
P5               69              Shawn Bradley       90
P6               68              Bismack Biyombo     90
P7               68              Saer Sene           93
P8               64              Gheorghe Muresan    94
P9               73              Manute Bol          102

Exhibit 2

Middle Muddle

Questions

1. Construct a histogram of all 18 of the wingspans in the table in Exhibit 1. Use the intervals shown in the table to the right, and plot the histogram on the axes below. Your instructor may require you to use a software package to do this exercise, so follow her lead. The label of “60” on the plot below denotes the first bin – Wingspan ≤ 60. The label of “75” denotes the fourth bin – 70 < Wingspan ≤ 75 – and so on.

2. Locate the mean on the plot by drawing in a vertical line segment there. How useful is the mean at describing these 18 wingspans? Explain.

Interval (Bin)          Frequency
Wingspan ≤ 60
60 < Wingspan ≤ 65

[Blank histogram axes titled “Wingspan Data”: horizontal axis Wingspan (inches) with bin labels 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, More; vertical axis Frequency, 0 to 7]

3. Let’s add an additional data set. Now suppose you have a data set of 18 persons, 8 with a wingspan of 80.5 inches, 5 with a wingspan of 75.5 and 5 with a wingspan of 85.5. We’ll call this “Data Set II.” Find the mean and median of these 18 data points.

4. Construct a histogram of all 18 of the wingspans from Question 3. Use the intervals shown in the table to the right, and plot the histogram on the axes below. Your instructor may require you to use a software package to do this exercise, so follow her lead. The label of “60” on the plot below denotes the first bin – Wingspan ≤ 60. The label of “75” denotes the fourth bin – 70 < Wingspan ≤ 75 – and so on.

Interval (Bin)          Frequency
Wingspan ≤ 60

[Blank histogram axes titled “Wingspan Data”: horizontal axis Wingspan (inches) with bin labels 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, More; vertical axis Frequency, 0 to 7]

Exhibit 3

The Spice of Life

Questions

1. Compare the data sets from Exhibit 1 and Exhibit 2. How close are their average values?

2. Compare the histograms from Exhibits 1 and 2. Specify at least two ways the histograms are notably different.

3. Let’s play a game. It costs you $1000 to play. Here are the rules. You get to pick two wingspans at random, eyes closed, out of either Data Set I or Data Set II. Your choice. Call your choices x1 and x2. You will receive a reward, in dollars, of (80.5 - x1)^2 + (80.5 - x2)^2. Suppose you decided to pick from Data Set I and chose P2 and Saer Sene. How much money did you make?

4. Think back to the game in Question 3. If you truly get to pick which Data Set you want to choose your two wingspans from, then which Data Set would you always be safest (in terms of anticipated profit) to choose and why?
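The arithmetic behind Questions 3 and 4 can be sketched as follows. This is only an illustration: it assumes the reward is the sum of the two squared distances from 80.5, in dollars, and that profit is the reward minus the $1000 entry fee. The names `reward` and `expected_reward` are made up for this sketch.

```python
# Wingspans (inches): Data Set I from Exhibit 1, Data Set II from Exhibit 2, Question 3.
data_set_1 = [70, 73, 72, 69, 69, 68, 68, 64, 73,
              88, 88, 88, 90, 90, 90, 93, 94, 102]
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5

COST = 1000.0   # entry fee in dollars
CENTER = 80.5   # value used in the reward rule


def reward(x1, x2):
    """Reward in dollars: (80.5 - x1)^2 + (80.5 - x2)^2."""
    return (CENTER - x1) ** 2 + (CENTER - x2) ** 2


def expected_reward(data):
    """Anticipated reward for two random picks: twice the mean squared distance from 80.5."""
    return 2 * sum((CENTER - x) ** 2 for x in data) / len(data)


# Question 3: P2 has a 73-inch wingspan, Saer Sene a 93-inch wingspan.
payout = reward(73, 93)
profit = payout - COST
print(payout, profit)

# Question 4: compare the anticipated reward for each data set.
print(expected_reward(data_set_1), expected_reward(data_set_2))
```

Under these assumptions the data set with the larger spread has the larger anticipated reward, which is the point the game is meant to raise.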

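FYI – Some plots and facts.

Mean of Data Set I is 80.5

Mean of Data Set II is 80.5

Histograms are below

[Histogram of Data Set I, titled “Wingspan Data”: horizontal axis Wingspan (inches) with bin labels 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, More; vertical axis Frequency, 0 to 7]

[Histogram of Data Set II, titled “Wingspan Data”: horizontal axis Wingspan (inches) with bin labels 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, More; vertical axis Frequency, 0 to 9]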
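For instructors who want to check the histograms with software, the bin frequencies can be tallied with a short script. This is a sketch, assuming the binning rule stated in Exhibit 2 (each label is the upper edge of its bin, with the upper edge included). The helper name `bin_counts` is hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges of the bins; anything above 105 falls into "More".
EDGES = [60, 65, 70, 75, 80, 85, 90, 95, 100, 105]
LABELS = [str(edge) for edge in EDGES] + ["More"]

data_set_1 = [70, 73, 72, 69, 69, 68, 68, 64, 73,
              88, 88, 88, 90, 90, 90, 93, 94, 102]
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5


def bin_counts(data):
    """Count values per bin; a value x goes in the first bin whose upper edge is >= x."""
    counts = Counter(LABELS[bisect_left(EDGES, x)] for x in data)
    return {label: counts.get(label, 0) for label in LABELS}


counts_1 = bin_counts(data_set_1)
counts_2 = bin_counts(data_set_2)
for label in LABELS:
    print(label, counts_1[label], counts_2[label])
```

`bisect_left` is used so that a value sitting exactly on an edge, such as 70, lands in the bin that ends at that edge (65 < Wingspan ≤ 70).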




Background

Standard Deviation = √( Σ(x − mean)² / (n − 1) )

Uncommon Reach Revisited

http://www.robertniles.com/stats/stdev.shtml

Data Set I: Variance 140.2647059, Standard Deviation 11.84334015

Data Set II: Variance 14.70588, Standard Deviation 3.834825

Ratio of the standard deviations (Data Set I to Data Set II): 3.088365219
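The values above can be reproduced with Python's standard `statistics` module. This is a minimal sketch, assuming the sample (n − 1) form of the variance, which is the form that matches the quoted numbers.

```python
import statistics

# Wingspans (inches) for the two 18-person data sets used in "Variation Matters".
data_set_1 = [70, 73, 72, 69, 69, 68, 68, 64, 73,
              88, 88, 88, 90, 90, 90, 93, 94, 102]
data_set_2 = [80.5] * 8 + [75.5] * 5 + [85.5] * 5

var_1 = statistics.variance(data_set_1)  # sample variance, divides by n - 1
sd_1 = statistics.stdev(data_set_1)      # square root of the sample variance
var_2 = statistics.variance(data_set_2)
sd_2 = statistics.stdev(data_set_2)

ratio = sd_1 / sd_2  # how many times more spread out Data Set I is

print(var_1, sd_1)
print(var_2, sd_2)
print(ratio)
```

Both data sets have a mean of 80.5, so the difference in these numbers comes entirely from how far the values sit from that shared center.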






Exhibit 1

Brains and Beats

Exhibit 2

Fuzzy Quasi is a Bear






A Measured Response

Distance (cm)   Time (sec)   Distance (cm)   Time (sec)
1               0.045        16              0.181
2               0.064        17              0.186
3               0.078        18              0.192
4               0.090        19              0.197
5               0.101        20              0.202
6               0.111        21              0.207
7               0.119        22              0.212
8               0.128        23              0.217
9               0.135        24              0.221
10              0.143        25              0.226
11              0.150        26              0.230
12              0.156        27              0.235
13              0.163        28              0.239
14              0.169        29              0.243
15              0.175        30              0.247

Group R   Time (sec)   Group L   Time (sec)
1                      1
2                      2
3                      3
4                      4
5                      5
6                      6

4. Find the mean of both groups. Based on those two values, is there evidence of a difference between the reaction times of Group L and Group R? Defend your answer.

5. What role would the variance of the measurements in each group have in making this decision in Question 3 more precise? Explain.






Exhibit 1

Cancer Carafe

http://www.nytimes.com/1981/03/12/us/study-links-coffee-use-to-pancreas-cancer.html

http://www.nytimes.com/1981/06/30/science/critics-say-coffee-study-was-flawed.html


Exhibit 2

Of Mice and People

http://www.nature.com/news/misleading-mouse-studies-waste-medical-resources-1.14938


http://www.cnn.com/2014/02/06/health/subway-bread-chemical/



Exhibit 1

What's Random?

Exhibit 2

Random Opposition

B A B A B A A A

B A B A A A B B

A B A B B A B B



Badge of Big

Table B

Group R   Time (sec)   Group L   Time (sec)
1         0.090        1         0.111
2         0.119        2         0.181
3         0.143        3         0.090
4         0.169        4         0.186
5         0.064        5         0.045
6         0.150        6         0.143

5. Compute

√

6. It turns out that one can say that the difference between the left-hand reaction times and right-hand reaction times fails to be statistically significant if -2.23 < Z < 2.23, where Z is what you computed in Question 5. Do the results shown in Table B support a statistically significant difference in reaction times? Why or why not?

Instructor’s Note: The notation Z was used instead of “t” to avoid confusion later in the workbook. Also, the degrees of freedom were computed using the simple estimate of 6 + 6 - 2.
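The formula for Question 5 did not survive in this copy, so the sketch below rests on an assumption: that Z is the usual two-sample statistic, the difference in group means divided by the square root of (variance of L)/6 + (variance of R)/6. With six measurements per group, the pooled and unpooled versions of that statistic give the same number. The function name `two_sample_z` is illustrative only.

```python
import math
import statistics

# Reaction times (sec) from Table B.
group_r = [0.090, 0.119, 0.143, 0.169, 0.064, 0.150]
group_l = [0.111, 0.181, 0.090, 0.186, 0.045, 0.143]

CUTOFF = 2.23  # cutoff quoted in Question 6


def two_sample_z(a, b):
    """Difference in means divided by sqrt(var_a / n_a + var_b / n_b)."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


z = two_sample_z(group_l, group_r)
significant = abs(z) >= CUTOFF
print(round(z, 3), significant)
```

Under this assumed form, Z for Table B falls well inside the interval from -2.23 to 2.23.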


Pairing Profits

Reaction Times (sec)

Left Right

1 1.05

0.74 0.76

0.66 0.71

0.78 0.79

0.68 0.69

0.65 0.72

0.75 0.75

0.69 0.72

0.94 0.99

0.79 0.8

0.81 0.82

0.62 0.67

(24-1) x Variance of All 24 Reaction Times = Variance Attributed to Hand (L/R) + Variance Left Unexplained

(24-1) x Variance of All 24 Reaction Times = Variance Attributed to Hand (L/R) + Variance Explained by Pairing + Variance Left Unexplained

Time

Source DF Sum of Squares Mean Square F Value Pr > F

Model 12 0.30148333 0.02512361 95.30 <.0001

Error 11 0.00290000 0.00026364

Corrected Total 23 0.30438333

R-Square Coeff Var Root MSE Time Mean

0.990473 2.097337 0.016237 0.774167

Source DF Type I SS Mean Square F Value Pr > F

Subject 11 0.29608333 0.02691667 102.10 <.0001

Hand 1 0.00540000 0.00540000 20.48 0.0009

Source DF Type III SS Mean Square F Value Pr > F

Subject 11 0.29608333 0.02691667 102.10 <.0001

Hand 1 0.00540000 0.00540000 20.48 0.0009

Time

Source DF Sum of Squares Mean Square F Value Pr > F

Model 1 0.00540000 0.00540000 0.40 0.5350

Error 22 0.29898333 0.01359015

Corrected Total 23 0.30438333

R-Square Coeff Var Root MSE Time Mean

0.017741 15.05836 0.116577 0.774167

Source DF Type I SS Mean Square F Value Pr > F

Hand 1 0.00540000 0.00540000 0.40 0.5350

Source DF Type III SS Mean Square F Value Pr > F

Hand 1 0.00540000 0.00540000 0.40 0.5350
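The sums of squares in the two output tables above can be rebuilt directly from the 12 pairs of reaction times. The sketch assumes each row of the Left/Right table is one subject's pair. The variable names are illustrative and are not taken from the software that produced the tables.

```python
# Reaction times (sec); row i of each list is assumed to belong to the same subject.
left = [1.00, 0.74, 0.66, 0.78, 0.68, 0.65, 0.75, 0.69, 0.94, 0.79, 0.81, 0.62]
right = [1.05, 0.76, 0.71, 0.79, 0.69, 0.72, 0.75, 0.72, 0.99, 0.80, 0.82, 0.67]

all_times = left + right
n = len(all_times)                      # 24 measurements
grand_mean = sum(all_times) / n

# (24 - 1) x variance of all 24 reaction times
ss_total = sum((t - grand_mean) ** 2 for t in all_times)

# Variation attributed to hand (L/R)
mean_left = sum(left) / len(left)
mean_right = sum(right) / len(right)
ss_hand = (len(left) * (mean_left - grand_mean) ** 2
           + len(right) * (mean_right - grand_mean) ** 2)

# Variation explained by pairing (subject-to-subject differences)
ss_subject = sum(2 * ((l + r) / 2 - grand_mean) ** 2 for l, r in zip(left, right))

# Variation left unexplained, without and with pairing
ss_error_unpaired = ss_total - ss_hand
ss_error_paired = ss_total - ss_hand - ss_subject

# F ratios for Hand, using 22 and 11 error degrees of freedom respectively
f_unpaired = ss_hand / (ss_error_unpaired / 22)
f_paired = ss_hand / (ss_error_paired / 11)

print(ss_total, ss_hand, ss_subject, ss_error_paired)
print(f_unpaired, f_paired)
```

The variation attributed to hand is identical in both analyses. What changes is how much is left unexplained, which is why the same 0.0054 looks negligible without pairing and substantial with it.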




Exhibit 1

Piltdown Meltdown - 1912

Exhibit 2

Marker Mice - 1974

Exhibit 3

Doing the Dishes - 2010



1.

2.

3.

4.

5.


Scatterplots --- Part I



Exhibit 1

Anscombe's Activity

These data were created by F.J. Anscombe* in 1973 to remind us of the importance of plotting our data. You will see these data again later on in this workbook.

Questions

1. Create a scatterplot of y1 vs. x1. Does the plot show a positive association or a negative association? How do you know?

2. Create a scatterplot of y4 vs. x4. Does the plot show a positive association or a negative association? How do you know? Make sure you turn in your plots with this assignment.

Obs x1 y1 x4 y4

1 10 8.04 8 6.58

2 8 6.95 8 5.76

3 13 7.58 8 7.71

4 9 8.81 8 8.84

5 11 8.33 8 8.47

6 14 9.96 8 7.04

7 6 7.24 8 5.25

8 4 4.26 19 12.5

9 12 10.84 8 5.56

10 7 4.82 8 7.91

11 5 5.68 8 6.89

Edward R. Tufte, The Visual Display of Quantitative Information (Cheshire, Connecticut: Graphics Press, 1983), pp. 14‐15. F.J. Anscombe, "Graphs in Statistical Analysis," American Statistician, vol. 27 (Feb 1973), pp. 17‐21

[Blank plotting axes: Y1 versus X1 and Y4 versus X4; horizontal axes 0 to 20, vertical axes 0 to 15]

Exhibit 2

Vaccines and Risk

There is an on‐going debate over possible links between vaccines with thimerosal and onset of autism. The data set below records the percentages of California children who had received 4 doses of DTP by their 2nd birthday and the number of autism cases in California’s Department of Developmental Services’ regional service center system*.

Questions

1. Create a scatterplot of Autism Cases versus DTP Coverage. Does the plot show a positive association or a negative association? How do you know? Make sure you turn in your plot with this assignment.

2. Is the association weak or strong? Defend your reasoning.

*Dr. Loring Dales from the Immunization Branch, California Department of Health Service made these data publicly available at http://www.putchildrenfirst.org/media/4.6.pdf. See also http://www.ncbi.nlm.nih.gov/pubmed/11231748

Year   DTP Coverage (%)   Number of Autism Cases
1980   50.9               176
1981   55.4               201
1982   52.1               212
1983   47.7               229
1984   48.9               246
1985   54.3               293
1986   54.1               357
1987   55.3               347
1988   60.9               436
1989   62.2               522
1990   65.9               663
1991   67.3               823
1992   69.8               1042
1993   73.6               1090
1994   75.7               1182

[Plot axes titled “California 1980‐1994”: Autism Cases (0 to 1400) versus DTP Coverage (%) (40 to 80)]

FYI‐ Plots, so you can see what the students should see.

Part 1 Exhibit 1

[Scatterplots of x1-y1 and x4-y4: horizontal axes 0 to 20, vertical axes 0 to 14]

Part 1 Exhibit 2

[Scatterplot titled “California 1980‐1994”: Autism Cases (0 to 1400) versus DTP Coverage (%) (40 to 80)]


Scatterplots --- Part II



Exhibit 1

Mortality and Global Warming

In this exercise we want you to construct a scatterplot of “Child Mortality” versus “CO2 Emissions” for 192 countries, from 2006 data, archived by Dr. Hans Rosling*. These data are available at http://www.heretheyarenow . You must use a computer software package (e.g. Excel or Numbers), or an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is required. Make sure you label your axes and provide a professional plot. Answer the questions below. Save your computer work. You may need it for another Beyond the Numbers later on.

Questions

1. What computer software did you use to construct your plot? Make sure you turn in your plot with this assignment.

2. Does the scatterplot show a positive association or a negative association? How do you know?

*Hans Rosling is Professor of International Health at Karolinska Institute and the co‐founder and chairman of the Gapminder Foundation. Dr. Rosling is committed to making important public data available for easy plotting and analysis with his Gapminder software.

Exhibit 2

Mortality and Global Warming Transformed

Save your computer work for this Exhibit. You may need it for another Beyond the Numbers later on.

Questions

1. Redo the scatterplot from Exhibit 1. Same rules on required use of a computer package and professional‐looking result. This time plot log10(Child Mortality) versus log10(CO2 Emissions). How does this plot compare to the one you did in Exhibit 1? Make sure you turn in your plot with this assignment.

2. Does the scatterplot show a positive association or a negative association? How do you know?


FYI‐ Plots, so you can see what the students should see.

Part 2 Exhibit 1 – I’d like this to replace the plot on page 20 of current workbook. Same data, this is lin/lin scale and should not be proprietary. For reference, India (blue), China (red) and US are highlighted dots.

[Scatterplot: Child mortality (0-5 year old dying per 1,000 born), vertical axis 0 to 200, versus CO2 emissions (tonnes per person), horizontal axis 0 to 70]

Part 2 Exhibit 2 – Nice linear outcome. Will compare r in Exhibit 1 and Exhibit 2 in next BN

[Scatterplot: Child mortality (0-5 year old dying per 1,000 born), vertical axis 0 to 3, versus CO2 emissions (tonnes per person), horizontal axis -2 to 2]




1.

2.

3.

4.

5.


Computing Correlations --- Part I



Exhibit 1

Anscombe's Activity Revisited

Recall Anscombe’s data from an earlier Beyond the Numbers. In this activity you will be asked to compute the correlation coefficient for each pair of variables and compare.

Questions

1. Compute r for the (x1,y1) pairs.

2. Compute r for the (x4,y4) pairs.

3. Compare the two r values you found in light of the scatterplots of these data (which you plotted earlier). What note of inferential caution does this exercise sound?

Obs x1 y1 x1y1 x1² y1²

1 10 8.04

2 8 6.95

3 13 7.58

4 9 8.81

5 11 8.33

6 14 9.96

7 6 7.24

8 4 4.26

9 12 10.84

10 7 4.82

11 5 5.68

Σx =   Σy =   Σxy =   Σx² =   Σy² =

Obs x4 y4 x4y4 x4² y4²

1 8 6.58

2 8 5.76

3 8 7.71

4 8 8.84

5 8 8.47

6 8 7.04

7 8 5.25

8 19 12.5

9 8 5.56

10 8 7.91

11 8 6.89


Σx =   Σy =   Σxy =   Σx² =   Σy² =

Exhibit 2

Vaccines and Risk Revisited

Dr. Loring Dales of the Immunization Branch, California Department of Health Service writes “here are the data we have on (a) percentages of California children who had received 4 doses of DTP by their 2nd

Questions

1. Fill out all the entries in the table that are missing. Your instructor may have you retype the table if you are not required to turn in this actual page.

2. Compute the correlation coefficient between DTP Coverage and Autism Prevalence.

Year   X = DTP Coverage (%)   Y = Number of Autism Cases   xy   x²   y²
1980   50.9                   176
1981   55.4                   201
1982   52.1                   212
1983   47.7                   229
1984   48.9                   246
1985   54.3                   293
1986   54.1                   357
1987   55.3                   347
1988   60.9                   436
1989   62.2                   522
1990   65.9                   663
1991   67.3                   823
1992   69.8                   1042
1993   73.6                   1090
1994   75.7                   1182

Σx =   Σy =   Σxy =   Σx² =   Σy² =

FYI‐ Key bit, so you can see what the students should see.

Part 1 Exhibit 1

r = 0.82

r = 0.82 YIKES reaction

Part 1 Exhibit 2

R = 0.9616552 (“strong” correlation, will address later in causation section)

Part 2 Exhibit 1

R = ‐0.438960093 (seems a bit small – because of curvature)

Part 2 Exhibit 2

R = ‐0.802859716 (much more reasonable after transformation)
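The Part 1 values in this key can be checked with a few lines of code. The sketch uses the same column sums the worksheet tables ask for (Σx, Σy, Σxy, Σx², Σy²) and the computational form of the correlation coefficient. The function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from the column sums used in the worksheet tables."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Anscombe's data (Part I, Exhibit 1)
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]

# California data, 1980 to 1994 (Part I, Exhibit 2)
dtp = [50.9, 55.4, 52.1, 47.7, 48.9, 54.3, 54.1, 55.3,
       60.9, 62.2, 65.9, 67.3, 69.8, 73.6, 75.7]
autism = [176, 201, 212, 229, 246, 293, 357, 347,
          436, 522, 663, 823, 1042, 1090, 1182]

r_1 = pearson_r(x1, y1)
r_4 = pearson_r(x4, y4)
r_dtp = pearson_r(dtp, autism)
print(round(r_1, 2), round(r_4, 2), round(r_dtp, 4))
```

The two Anscombe pairs give essentially the same r even though their scatterplots look nothing alike, which is the caution Question 3 of Exhibit 1 is after.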

[Scatterplots of x1-y1 and x4-y4: horizontal axes 0 to 20, vertical axes 0 to 14]


Computing Correlations --- Part II



Exhibit 1

Mortality and Global Warming Revisited

Refer to BN1.B: Scatterplots – Part II. Hans Rosling is Professor of International Health at Karolinska Institute and the co‐founder and chairman of the Gapminder Foundation. Dr. Rosling is committed to making important public data available for easy plotting and analysis with his Gapminder software.

In this exercise we want you to compute the correlation coefficient between “Child Mortality” and “CO2 Emissions” for 192 countries, from 2006 data archived by Dr. Rosling. These data are available at http://www.heretheyarenow . You must use a computer software package such as Excel or Numbers, or an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is required.

Questions

1. What is the value of r?

2. Does the value of r suggest the association between Child Mortality and CO2 Emissions is strong or weak? How do you know?

3. Do you think the computation of r is appropriate for these data? Why or why not? (You constructed a scatterplot of these data in BN_1.27).

Exhibit 2

Transformations Revisited

Refer to BN1.B: Scatterplots – Part II. In this exercise we want you to compute the correlation coefficient between log10(Child Mortality) and log10(CO2 Emissions) for the 192 countries, from 2006, in the data archived by Dr. Rosling. The raw data are available at http://www.heretheyarenow . You must use a computer software package such as Excel or Numbers, or an online applet. Your instructor will tell you which package she requires, if, indeed, a particular one is required.

Questions

1. What is the value of r?

2. How does this value of r compare to the one found in Exhibit 1?

3. Does the value of r suggest the association between Child Mortality and CO2 Emissions is strong or weak? How do you know?

4. Do you think the computation of r is appropriate for these data? Why or why not? (You constructed a scatterplot of these data in BN_Ab. See page kjlkjl also.)

FYI‐ Key bit, so you can see what the students should see.

Part 2 Exhibit 1

R = ‐0.438960093 (seems a bit small – because of curvature)

Part 2 Exhibit 2

R = ‐0.802859716 (much more reasonable after transformation)



1.

2.

3.

4.

5.


Outliers and Leverage Points



Exhibit 1

Heptathletes

Finish data for two events from the 1992 Olympic Heptathlon are shown below. A scatterplot of the data is shown just to the right of the table. Chouaa is the green data point and Barber is the red one.

Questions

1. What kind of association do you see in the scatterplot – positive, negative, neither? Support your answer.

[Scatterplot titled “Heptathlon Results from 1992”: Javelin Distance (Meters), 0 to 60, versus Hurdles Time (Seconds), 12 to 17]

Name   Hurdles (seconds)   Javelin (meters)

Joyner‐Kersee 12.85 44.98

Nastase 12.86 41.3

Dimitrova 13.23 44.48

Belova 13.25 41.9

Braun 13.25 51.12

Beer 13.48 48.1

Court 13.48 52.12

Kamrowska 13.48 44.12

Wlodarczyk 13.57 43.46

Greiner 13.59 40.78

Kaljurand 13.64 47.42

Zhu 13.64 45.12

Skjaeveland 13.73 35.42

Lesage 13.75 41.28

Nazaroviene 13.75 44.42

Aro 13.87 45.42

Marxer 13.94 41.08

Rattya 13.96 49.02

Carter 13.97 37.58

Atroshchenko 14.03 45.18

Vaidianu 14.04 49

Teppe 14.06 52.58

Clarius 14.1 45.14

Bond‐Mills 14.31 43.3

Barber 14.79 0

Chouaa 16.62 44.4

2. Compute the correlation coefficient “r” for the entire data set. You should use a software package or an online applet as required by your instructor. Is this value of “r” consistent with what you answered in Question 1? Why or why not?

Exhibit 2

Language

“Outliers” in a scatterplot are data pairs that are not spatially close to the bulk of the data. Outliers are not necessarily a problem for the human inference that arises from a correlation coefficient. However, if the removal of a single outlier causes a distinct change in the correlation, then that outlier would be called an “influence point” and influence points can disguise the essence of an association.

Questions

1. Looking at the scatterplot above, which athletes are outliers?

2. Compute the correlation coefficient “r” for the data set with Barber removed. Is Barber an influence point? Why?

3. Compare the values of “r” that you computed for the entire data set and for the data set with Barber removed. Which one best reflects the association seen in the scatterplot? Why?

FYI – Easy computations show the following:

a) r overall is ‐0.25213, not large but notably incongruous with the plot

b) r without Barber is 0.00061, definitely making her an influence point
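The two correlations quoted above can be reproduced from the Exhibit 1 table. In this sketch the function name `pearson_r` and the list layout are illustrative; the data are copied from the table.

```python
import math

# (name, hurdles time in seconds, javelin distance in meters) from Exhibit 1
results = [
    ("Joyner-Kersee", 12.85, 44.98), ("Nastase", 12.86, 41.3),
    ("Dimitrova", 13.23, 44.48), ("Belova", 13.25, 41.9),
    ("Braun", 13.25, 51.12), ("Beer", 13.48, 48.1),
    ("Court", 13.48, 52.12), ("Kamrowska", 13.48, 44.12),
    ("Wlodarczyk", 13.57, 43.46), ("Greiner", 13.59, 40.78),
    ("Kaljurand", 13.64, 47.42), ("Zhu", 13.64, 45.12),
    ("Skjaeveland", 13.73, 35.42), ("Lesage", 13.75, 41.28),
    ("Nazaroviene", 13.75, 44.42), ("Aro", 13.87, 45.42),
    ("Marxer", 13.94, 41.08), ("Rattya", 13.96, 49.02),
    ("Carter", 13.97, 37.58), ("Atroshchenko", 14.03, 45.18),
    ("Vaidianu", 14.04, 49.0), ("Teppe", 14.06, 52.58),
    ("Clarius", 14.1, 45.14), ("Bond-Mills", 14.31, 43.3),
    ("Barber", 14.79, 0.0), ("Chouaa", 16.62, 44.4),
]


def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_for(rows):
    """Correlation between hurdles time and javelin distance for the given rows."""
    return pearson_r([h for _, h, _ in rows], [j for _, _, j in rows])


r_all = r_for(results)
r_without_barber = r_for([row for row in results if row[0] != "Barber"])
print(round(r_all, 5), round(r_without_barber, 5))
```

Dropping a single row moves r from about −0.25 to essentially zero, which is the kind of distinct change that Exhibit 2 uses to define an influence point.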
