Association, and Correlation - University of Iowahomepage.stat.uiowa.edu/~rdecook/stat1010/notes/Chapter7_correlation_pt2.pdfCorrelation does not imply Causality 5 Possible Explanations

1

Chapter 7 (part 2)

 Scatterplots, Association, and Correlation

Word of Caution in Correlation: Beware of Outliers

2

 Outliers can greatly

 For all n=10 data points,

 For the n=9 clustered data points,

impact correlation.

r = 0.880

r = 0.000

Don’t just remove outliers   It is not appropriate (or ethical) to just remove

a data point because it does not follow the general trend (or the expected trend).

 Start by making sure it is not a mistake.  Often, outliers can show us an interesting phenomenon that leads to further research.

3

Perhaps the most important caution about interpreting correlation is: Correlation does not necessarily imply causality.

4

Correlation does not imply Causality

5

Possible Explanations for a Correlation 1.  The correlation may be a coincidence.

2.  Both correlation variables might be directly influenced by some common underlying cause.

3.  One of the correlated variables may actually be a cause of the other. But note that, even in this case, it may be just one of several causes.

Correlation and Causality (Concepts & Applications)

 For each example that follows, state whether the correlation is most likely due to …

  (Choose which reason is most likely)  Coincidence

 A common underlying cause

 A direct cause (i.e. changing X causes a change in Y)

6


 Greater damage was observed when more firefighters were at a fire.

  (Which reason is most likely?)  Coincidence


 A direct cause (i.e. changing X causes a change in Y)

7

Number of firefighters at scene

fire

dam

age

(dol

lars

$)


 There is a positive correlation between practice time and skill among piano players; that is, those who practice more tend to be more skilled.



 A direct cause (i.e. changing X causes a change in Y) 8


 Amount of ice cream bought at the beach, and the number of swimmers requiring help from the lifeguards was found to be positively correlated.



 A direct cause (i.e. changing X causes a change in Y) 9

Example: SAT score vs. public $$$ spent on education

  In setting public policy, we may often hear something like…

10

“State spending on education is positively correlated with SAT scores and therefore we should increase our state’s spending on education.”

Example: SAT score vs. public $$$ spent on education  Data was taken from the 1997 Digest of

Education Statistics, an annual publication of the U.S. Department of Education.

11 4 6 8 10

800

900

1000

1100

1200

SAT score vs. money spent on students by state

Expenditures per Pupil in $1000s

Tot

al S

AT

sco

re

Are you surprised by the relationship in this plot? What could be going on here?

r = −0.3805

Example: SAT score vs. public $$$ spent on education  First, just looking at this scatterplot, would we

12 4 6 8 10

800

900

1000

1100

1200



Tot

al S

AT

sco

re

interpret it as “Increasing expenditures causes a decrease in SAT scores?”

NO.

r = −0.3805

Simpson’s Paradox

 A statistical relationship between two variables can be reversed by including additional factors in the analysis.

13

4 6 8 10

800

900

1000

1100

1200



Tot

al S

AT

sco

re

Alaska

Iowa

Minnesota

New Jersey

North Dakota

Utah Wisconsin

Connecticut

Idaho

New York

South Carolina

Example: SAT score vs. public $$$ spent on education   Let’s ask… do ALL students in these states

14

take the SAT? It turns out that the answer is ‘no’ and this REALLY matters…

r = −0.3805

4 6 8 10

800

900

1000

1100

1200



Tot

al S

AT

sco

re

% of students taking SAT<9%9-10%11%-20%21%-40%41%-60%61%-69%70%-81%


15

Does the percent of students

taking the SAT in a

state help explain the paradox?

YES!

Low percentage

high percentage

4 6 8 10

800

900

1000

1100

1200



Tot

al S

AT

sco

re



16

Within states with the same %

taking the SAT, we

actually see a positive

relationship!!

Low percentage

high percentage

17

When only a few students take the SAT in a state, who are these students? (best students)

4 6 8 10

800

900

1000

1100

1200



Tot

al S

AT

sco

re



If you have ALL students in a state taking the SAT, then you won’t be grabbing just the ‘good students’… and the average SAT will be lower compared to states with a small percentage of their ‘best’ students taking the SAT (is this a fair state-to-state comparison?)

18

Within similar states (i.e. similar % taking the SAT) we see that spending more $$$ is associated with higher SAT scores.

4 6 8 10

800

900

1000

1100

1200



Tot

al S

AT

sco

re



Simpson’s Paradox

 A statistical relationship between two variables can be reversed by including additional factors in the analysis.

 By including the variable called “% of students taking the SAT”, we saw a reversal of the relationship shown in the original scatterplot between SAT score and $$$ spent per student.

19

Interpret correlation with caution

 Remember that correlation is a simple summary of a sometimes complex situation.

 Scatterplots are useful, but they do have limitations when many variables impact each other in a complex manner.

20

21

The SAT score and expenditure information was a modification of material available at:

www.stat.ucla.edu/labs/pdflabs/sat.pdf

Food for thought… Should we reward school teachers based on student’s standardized test scores?

22

0 20 40 60 80 100

0200

400

600

800

1000

1200

Test Scores vs. teaching ability and effort

Teacher's genuine ability and effort

Stu

dent

's n

atio

nal t

est s

core

s What might this scatterplot look like? If students score high, was the teacher good? If students score low, was the teacher bad?

Documents

Association, and Correlation - University of Iowahomepage.stat.uiowa.edu/~rdecook/stat1010/notes/Chapter7_correlation_pt2.pdfCorrelation does not imply Causality 5 Possible Explanations