Chapter 7 (part 2)

Scatterplots, Association, and Correlation

Source: University of Iowa, homepage.stat.uiowa.edu/~rdecook/stat1010/notes/Chapter7_correlation_pt2.pdf


Word of Caution in Correlation: Beware of Outliers

• Outliers can greatly impact correlation.

• For all n = 10 data points, r = 0.880.

• For the n = 9 clustered data points, r = 0.000.
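The effect is easy to reproduce. The sketch below uses made-up data, not the points from the slide's plot: nine points on a 3×3 grid (which have zero correlation by construction) plus one distant point. The helper `pearson_r` is written here only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical cluster: a 3x3 grid, so x and y are uncorrelated.
cluster_x = [x for x in (1, 2, 3) for _ in (1, 2, 3)]
cluster_y = [y for _ in (1, 2, 3) for y in (1, 2, 3)]

# One hypothetical outlier far away from the cluster.
all_x = cluster_x + [10]
all_y = cluster_y + [10]

r_cluster = pearson_r(cluster_x, cluster_y)
r_all = pearson_r(all_x, all_y)

print(f"n = 9 clustered points: r = {r_cluster:.3f}")
print(f"n = 10 with outlier:    r = {r_all:.3f}")
```

With these invented numbers the single added point moves r from 0 to roughly 0.91, the same kind of jump the slide describes.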


Don’t just remove outliers

• It is not appropriate (or ethical) to remove a data point just because it does not follow the general trend (or the expected trend).

• Start by making sure it is not a mistake.

• Often, outliers can show us an interesting phenomenon that leads to further research.


Perhaps the most important caution about interpreting correlation is: Correlation does not necessarily imply causality.


Correlation does not imply Causality

Possible Explanations for a Correlation

1. The correlation may be a coincidence.

2. Both correlated variables might be directly influenced by some common underlying cause.

3. One of the correlated variables may actually be a cause of the other. But note that, even in this case, it may be just one of several causes.
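To get a feel for the first explanation (coincidence), one can simulate pairs of variables that are unrelated by construction and count how often a sizable correlation shows up anyway. This is only a sketch: the sample size of 10, the cutoff of 0.6, the number of trials, and the seed are all arbitrary choices made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)      # arbitrary seed so the run is repeatable
n_points = 10       # small samples make chance correlations more likely
n_trials = 2000
threshold = 0.6

strong = 0
for _ in range(n_trials):
    xs = [random.gauss(0, 1) for _ in range(n_points)]
    ys = [random.gauss(0, 1) for _ in range(n_points)]  # drawn independently of xs
    if abs(pearson_r(xs, ys)) > threshold:
        strong += 1

print(f"{strong} of {n_trials} unrelated pairs had |r| > {threshold}")
```

Even though no pair has any real relationship, a small share of them should clear the cutoff purely by chance.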

Correlation and Causality (Concepts & Applications)

• For each example that follows, state whether the correlation is most likely due to …

(Choose which reason is most likely)

• Coincidence

• A common underlying cause

• A direct cause (i.e. changing X causes a change in Y)

Correlation and Causality (Concepts & Applications)

• Greater damage was observed when more firefighters were at a fire.

(Which reason is most likely?)

• Coincidence

• A common underlying cause

• A direct cause (i.e. changing X causes a change in Y)

[Figure: scatterplot of fire damage (dollars $) vs. number of firefighters at scene]

Correlation and Causality (Concepts & Applications)

• There is a positive correlation between practice time and skill among piano players; that is, those who practice more tend to be more skilled.

(Which reason is most likely?)

• Coincidence

• A common underlying cause

• A direct cause (i.e. changing X causes a change in Y)

Correlation and Causality (Concepts & Applications)

• The amount of ice cream bought at the beach and the number of swimmers requiring help from the lifeguards were found to be positively correlated.

(Which reason is most likely?)

• Coincidence

• A common underlying cause

• A direct cause (i.e. changing X causes a change in Y)
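A common underlying cause can be sketched in a few lines. The simulation below assumes, purely for illustration, that daily temperature drives both ice cream sales and lifeguard rescues, and that neither of those two affects the other. All coefficients, noise levels, and variable names are invented.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # arbitrary seed so the run is repeatable

# Hypothetical common cause: daily temperature at the beach.
temps = [random.uniform(15, 35) for _ in range(500)]

# Each outcome depends on temperature plus its own noise.
# Neither outcome appears in the formula for the other.
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temps]
rescues = [0.5 * t + random.gauss(0, 2) for t in temps]

r = pearson_r(ice_cream, rescues)
print(f"r(ice cream, rescues) = {r:.3f}")
```

The two outcomes come out clearly positively correlated even though, by construction, changing one would do nothing to the other.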

Example: SAT score vs. public $$$ spent on education

• In setting public policy, we may often hear something like…

“State spending on education is positively correlated with SAT scores and therefore we should increase our state’s spending on education.”

Example: SAT score vs. public $$$ spent on education

• Data was taken from the 1997 Digest of Education Statistics, an annual publication of the U.S. Department of Education.

[Figure: scatterplot "SAT score vs. money spent on students by state"; x-axis: Expenditures per Pupil in $1000s; y-axis: Total SAT score]

r = −0.3805

Are you surprised by the relationship in this plot? What could be going on here?

Example: SAT score vs. public $$$ spent on education

• First, just looking at this scatterplot, would we interpret it as “Increasing expenditures causes a decrease in SAT scores”?

NO.

[Figure: scatterplot "SAT score vs. money spent on students by state"; x-axis: Expenditures per Pupil in $1000s; y-axis: Total SAT score]

r = −0.3805

Simpson’s Paradox

• A statistical relationship between two variables can be reversed by including additional factors in the analysis.

Example: SAT score vs. public $$$ spent on education

• Let’s ask… do ALL students in these states take the SAT? It turns out that the answer is ‘no’ and this REALLY matters…

[Figure: scatterplot "SAT score vs. money spent on students by state"; x-axis: Expenditures per Pupil in $1000s; y-axis: Total SAT score. Labeled states: Alaska, Iowa, Minnesota, New Jersey, North Dakota, Utah, Wisconsin, Connecticut, Idaho, New York, South Carolina]

r = −0.3805

Example: SAT score vs. public $$$ spent on education

Does the percent of students taking the SAT in a state help explain the paradox?

YES!

[Figure: scatterplot "SAT score vs. money spent on students by state"; x-axis: Expenditures per Pupil in $1000s; y-axis: Total SAT score. Points are coded by % of students taking SAT: <9%, 9-10%, 11%-20%, 21%-40%, 41%-60%, 61%-69%, 70%-81%, with annotations "Low percentage" and "high percentage"]

Example: SAT score vs. public $$$ spent on education

Within states with the same % taking the SAT, we actually see a positive relationship!!

[Figure: scatterplot "SAT score vs. money spent on students by state"; x-axis: Expenditures per Pupil in $1000s; y-axis: Total SAT score. Points are coded by % of students taking SAT: <9%, 9-10%, 11%-20%, 21%-40%, 41%-60%, 61%-69%, 70%-81%, with annotations "Low percentage" and "high percentage"]

Example: SAT score vs. public $$$ spent on education

When only a few students take the SAT in a state, who are these students? (best students)

[Figure: scatterplot "SAT score vs. money spent on students by state"; x-axis: Expenditures per Pupil in $1000s; y-axis: Total SAT score. Points are coded by % of students taking SAT: <9%, 9-10%, 11%-20%, 21%-40%, 41%-60%, 61%-69%, 70%-81%]

If you have ALL students in a state taking the SAT, then you won’t be grabbing just the ‘good students’… and the average SAT will be lower compared to states with a small percentage of their ‘best’ students taking the SAT (is this a fair state-to-state comparison?)
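The selection effect described here can be illustrated with a rough sketch. It makes a strong simplifying assumption that is not in the slides: in every state, exactly the highest-scoring students are the ones who sit the test. The score distribution, the participation fractions, and the helper `mean_of_top` are all hypothetical.

```python
import random

random.seed(2)  # arbitrary seed so the run is repeatable

# Hypothetical scores that every student in a state would earn if tested.
scores = [random.gauss(1000, 150) for _ in range(10_000)]

def mean_of_top(values, fraction):
    """Average score when only the top `fraction` of students take the test."""
    k = max(1, round(len(values) * fraction))
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)

for fraction in (0.05, 0.25, 0.75, 1.0):
    avg = mean_of_top(scores, fraction)
    print(f"{fraction:>4.0%} of students tested: average score = {avg:.0f}")
```

The same student population produces a lower reported average as the tested share grows, which is why comparing state averages without accounting for participation is not a like-for-like comparison.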

Example: SAT score vs. public $$$ spent on education

Within similar states (i.e. similar % taking the SAT) we see that spending more $$$ is associated with higher SAT scores.

[Figure: scatterplot "SAT score vs. money spent on students by state"; x-axis: Expenditures per Pupil in $1000s; y-axis: Total SAT score. Points are coded by % of students taking SAT: <9%, 9-10%, 11%-20%, 21%-40%, 41%-60%, 61%-69%, 70%-81%]

Simpson’s Paradox

• A statistical relationship between two variables can be reversed by including additional factors in the analysis.

• By including the variable called “% of students taking the SAT”, we saw a reversal of the relationship shown in the original scatterplot between SAT score and $$$ spent per student.
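The reversal can be reproduced with a tiny invented data set. The numbers below are not the Digest of Education Statistics data; they are hypothetical values chosen so that spending and score rise together within each participation group, while groups with higher participation have both higher spending and lower scores.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical states: (spending per pupil in $1000s, average SAT score),
# grouped by the share of students who take the SAT.
groups = {
    "low % taking SAT": ([4, 5, 6], [1050, 1080, 1100]),
    "medium % taking SAT": ([6, 7, 8], [950, 980, 1000]),
    "high % taking SAT": ([8, 9, 10], [850, 880, 900]),
}

# Ignore the grouping: pool every state into one scatterplot.
all_spend = [x for xs, _ in groups.values() for x in xs]
all_score = [y for _, ys in groups.values() for y in ys]
r_overall = pearson_r(all_spend, all_score)

# Account for the grouping: one correlation per participation level.
r_within = {name: pearson_r(xs, ys) for name, (xs, ys) in groups.items()}

print(f"all states pooled: r = {r_overall:.3f}")
for name, r in r_within.items():
    print(f"{name}: r = {r:.3f}")
```

Pooled, the correlation is negative; within every group it is strongly positive, mirroring the pattern in the SAT example.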


Interpret correlation with caution

• Remember that correlation is a simple summary of a sometimes complex situation.

• Scatterplots are useful, but they do have limitations when many variables impact each other in a complex manner.

The SAT score and expenditure information was a modification of material available at:

www.stat.ucla.edu/labs/pdflabs/sat.pdf

Food for thought… Should we reward school teachers based on students’ standardized test scores?

[Figure: plot titled "Test Scores vs. teaching ability and effort"; x-axis: Teacher's genuine ability and effort (0 to 100); y-axis: Student's national test scores (0 to 1200)]

What might this scatterplot look like? If students score high, was the teacher good? If students score low, was the teacher bad?