
Chapter 7: Correlation


Page 1: Chapter 7:  Correlation

Bivariate distribution: a distribution that shows the relation between two variables.

[Figure: scatter plot of Visual Acuity (y-axis, 0.4 to 1) against Area of primary visual cortex (x-axis, -2 to -1.3), with separate markers for the left hemisphere and the right hemisphere.]

This graph is called a scatter plot or scatter diagram.

Page 2: Chapter 7:  Correlation

How do we quantify the strength of the relationship between the two variables in a bivariate distribution?


Page 4: Chapter 7:  Correlation

Example from the book: Two measures made for each subject: stress level and eating difficulties (E.D.)

Stress   E.D.
  17       9
   8      13
   8       7
  20      18
  14      11
   7       1
  21       5
  22      15
  19      26
  30      28

[Figure: scatter plot of Eating Difficulties (y-axis) against Stress (x-axis) for the ten subjects.]

Page 5: Chapter 7:  Correlation

The most common way to quantify the relation between the two variables in a bivariate distribution is the Pearson correlation coefficient, labeled r. r is always between -1 and 1. The z-score formula is the most intuitive formula:

$$r = \frac{\sum z_x z_y}{n}$$

Example: use the z-score formula to calculate r:

 raw scores         z scores
   X     Y       z_x     z_y    z_x z_y
  17     9      0.06   -0.52    -0.03
   8    13     -1.23   -0.04     0.04
   8     7     -1.23   -0.76     0.93
  20    18      0.48    0.57     0.27
  14    11     -0.37   -0.28     0.10
   7     1     -1.37   -1.48     2.03
  21     5      0.63   -1.00    -0.63
  22    15      0.77    0.21     0.16
  19    26      0.34    1.53     0.52
  30    28      1.91    1.77     3.39

m_x = 16.60, s_x = 7.02
m_y = 13.30, s_y = 8.28

$$\sum z_x z_y = 6.80$$

$$r = \frac{\sum z_x z_y}{n} = \frac{6.80}{10} = 0.68$$
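As a rough check on the worked example, the z-score formula can be sketched in a few lines of Python. The function names here are illustrative, not from the book, and the sketch assumes the standard deviations are population values (dividing by n), which is what reproduces s_x = 7.02 and s_y = 8.28.

```python
from statistics import mean, pstdev

# Stress (X) and eating-difficulties (Y) scores from the example.
stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def z_scores(values):
    """Convert raw scores to z-scores using the population SD (divide by n)."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def pearson_r_from_z(x, y):
    """Pearson r as the mean of the products of paired z-scores."""
    zx = z_scores(x)
    zy = z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


r = pearson_r_from_z(stress, eating)
print(round(r, 2))  # 0.68
```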

Page 6: Chapter 7:  Correlation

How does each data point contribute to the correlation value?

Points in the upper right or lower left quadrants (relative to the means m_x and m_y) add to the correlation value. Points in the upper left or lower right quadrants subtract from the correlation value.

[Figure: scatter plot of Eating Difficulties against Stress, divided into quadrants at m_x and m_y, shown next to the raw scores and the z_x, z_y, and z_x z_y columns from the previous slide; r = 0.68.]
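One way to see the quadrant rule is to label each point by its position relative to the means and print the sign of its z-score product. This is a minimal sketch on the stress data; the helper `quadrant` is a hypothetical name introduced only for illustration.

```python
from statistics import mean, pstdev

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]

mx, my = mean(stress), mean(eating)
sx, sy = pstdev(stress), pstdev(eating)


def quadrant(x, y):
    """Name the quadrant of (x, y) relative to the point of means (mx, my)."""
    vertical = "upper" if y > my else "lower"
    horizontal = "right" if x > mx else "left"
    return f"{vertical} {horizontal}"


# Each entry: (x, y, quadrant name, product of z-scores).
contributions = []
for x, y in zip(stress, eating):
    product = ((x - mx) / sx) * ((y - my) / sy)
    contributions.append((x, y, quadrant(x, y), product))

for x, y, quad, product in contributions:
    print(f"({x:2d}, {y:2d})  {quad:<11}  z_x*z_y = {product:+.2f}")
```

Only the two points that fall in the lower right quadrant, (17, 9) and (21, 5), have negative products; every other point adds to the sum.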

Page 7: Chapter 7:  Correlation

Fun fact about the Pearson correlation statistic

Since the z-scores do not change when you add a constant to the raw scores or multiply them by a positive constant, the Pearson correlation doesn't change either.

[Figure: two scatter plots of Eating Difficulties against Stress. First panel: the original scores, r = 0.68. Second panel: after multiplying y by 2 and adding 100, r = 0.68.]

$$r = \frac{\sum z_x z_y}{n}$$
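The invariance can be checked numerically. The sketch below recomputes r after multiplying y by 2 and adding 100, as on the slide, and also after negating y, a case not on the slide that is included only to show that a negative multiplier flips the sign of r. The function name `pearson_r` is an illustrative choice.

```python
from statistics import mean, pstdev

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]


def pearson_r(x, y):
    """Pearson r via the z-score formula (population SDs)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / len(x)


r_original = pearson_r(stress, eating)

# Multiply y by 2 and add 100: the z-scores, and therefore r, are unchanged.
eating_rescaled = [2 * y + 100 for y in eating]
r_rescaled = pearson_r(stress, eating_rescaled)

# Multiplying by a negative constant flips the sign of every z_y, so r changes sign.
r_flipped = pearson_r(stress, [-y for y in eating])

print(round(r_original, 2), round(r_rescaled, 2), round(r_flipped, 2))
```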

Page 8: Chapter 7:  Correlation

Similarly, the correlation stays the same no matter how you stretch your axes.

As a rule, you should plot your axes with an equal scale.

[Figure: three scatter plots of Eating Difficulties against Stress, each drawn with differently stretched axes; r = 0.68 in all three.]

Page 9: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of student's height (in) against average of parents' height (in); n = 90, r = 0.34.]

Page 10: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of male student's height (in) against father's height (in); n = 21, r = 0.34.]

Page 11: Chapter 7:  Correlation

[Figure: scatter plot of female student's height (in) against mother's height (in); n = 70, r = 0.68.]

Page 12: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of UW GPA against High School GPA, both axes from 2.5 to 4; n = 90, r = 0.19.]

Page 13: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of Sleep (hours/night) against Caffeine (cups/day); n = 91, r = -0.12.]

Page 14: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of Drinks (per week) against Caffeine (cups/day); n = 91, r = 0.01.]

Page 15: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot relating Facebook friends and Drinks (per week); n = 91, r = 0.10.]

Page 16: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of Video game playing (hours/week) against Favorite outdoor temperature (F); n = 91, r = -0.19.]

Page 17: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x; r = -0.56.]

Page 18: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x; r = 0.94.]

Page 19: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x; r = 0.08.]

Page 20: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x; r = -1.00.]

Page 21: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x; r = -0.08.]

Page 22: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x; r = 0.49.]

Page 23: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x; r = -0.92.]

Page 24: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x; r = -0.77.]

Page 25: Chapter 7:  Correlation

r is a measure of the linear relation between two variables

[Figure: scatter plot of y (0 to 4) against x (-2 to 2); r = 0.01.]
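To illustrate why "linear" matters, here is a small sketch with made-up data: y is completely determined by x through y = x², yet r comes out at essentially zero. The data are hypothetical and are not the points plotted on the slide.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson r via the z-score formula (population SDs)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / len(x)


# Hypothetical data: x evenly spaced from -2 to 2, y an exact function of x.
x = [i / 10 for i in range(-20, 21)]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(abs(r) < 1e-9)  # True: no linear relation despite a perfect curved one
```

The positive products on the right half of the parabola are cancelled by the negative products on the left half, so the sum of z_x z_y is zero.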

Page 26: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x; r = 0.00.]

Page 27: Chapter 7:  Correlation

Guess that correlation!

[Figure: scatter plot of y against x, both axes from -1 to 1; r = 0.91.]

Page 28: Chapter 7:  Correlation

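Z-score formula for calculating r (intuitive, but not very practical):

$$r = \frac{\sum z_x z_y}{n}$$

Substituting the formula for z:

$$z_x = \frac{X - \bar{X}}{S_X}, \qquad z_y = \frac{Y - \bar{Y}}{S_Y}$$

Deviation-score formula for calculating r (somewhat intuitive, somewhat more practical):

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n S_X S_Y}$$

Computational formula for calculating r (less intuitive, more practical). Since $SS_X = \sum (X - \bar{X})^2 = n S_X^2$, the denominator $n S_X S_Y$ equals $\sqrt{SS_X SS_Y}$:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{SS_X SS_Y}}$$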
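A quick numerical check that the deviation-score and computational formulas agree, sketched on the stress and eating-difficulties data. Variable names are illustrative, and S_X and S_Y are taken to be population standard deviations (dividing by n).

```python
import math
from statistics import mean, pstdev

stress = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
eating = [9, 13, 7, 18, 11, 1, 5, 15, 26, 28]

n = len(stress)
mx, my = mean(stress), mean(eating)

# Numerator shared by both formulas: sum of cross-products of deviations.
sum_products = sum((x - mx) * (y - my) for x, y in zip(stress, eating))

# Deviation-score formula: divide by n * S_X * S_Y.
r_deviation = sum_products / (n * pstdev(stress) * pstdev(eating))

# Computational formula: divide by sqrt(SS_X * SS_Y).
ss_x = sum((x - mx) ** 2 for x in stress)
ss_y = sum((y - my) ** 2 for y in eating)
r_computational = sum_products / math.sqrt(ss_x * ss_y)

print(round(r_deviation, 2), round(r_computational, 2))  # 0.68 0.68
```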

Page 29: Chapter 7:  Correlation

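Computational formula for calculating r (less intuitive, more practical):

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{SS_X SS_Y}}$$

A little algebra shows that:

$$\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}$$

Computational raw-score formula for calculating r (least intuitive, most practical):

$$r = \frac{\sum XY - \dfrac{\left(\sum X\right)\left(\sum Y\right)}{n}}{\sqrt{SS_X SS_Y}}$$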
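The "little algebra" can be spelled out as follows. This is one standard route, using only the definitions of the means, $\bar{X} = \sum X / n$ and $\bar{Y} = \sum Y / n$; the slides do not show the intermediate steps.

```latex
\begin{align*}
\sum (X - \bar{X})(Y - \bar{Y})
  &= \sum XY - \bar{Y} \sum X - \bar{X} \sum Y + n \bar{X} \bar{Y} \\
  &= \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}
             - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}
             + \frac{\left(\sum X\right)\left(\sum Y\right)}{n} \\
  &= \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}
\end{align*}
```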

Page 30: Chapter 7:  Correlation

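Using the computational raw-score formula (n = 10):

            X      Y     X²     Y²     XY
           17      9    289     81    153
            8     13     64    169    104
            8      7     64     49     56
           20     18    400    324    360
           14     11    196    121    154
            7      2     49      4     14
           21      5    441     25    105
           22     15    484    225    330
           19     26    361    676    494
           30     28    900    784    840
Totals    166    134   3248   2458   2610

$$SS_X = \sum X^2 - \frac{\left(\sum X\right)^2}{n} = 492.4$$

$$SS_Y = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n} = 662.4$$

$$r = \frac{\sum XY - \dfrac{\left(\sum X\right)\left(\sum Y\right)}{n}}{\sqrt{SS_X SS_Y}} = 0.675$$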
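The raw-score calculation on this slide can be reproduced with running totals only. This sketch uses the values exactly as listed in this slide's table, in which the sixth Y score is 2; variable names are illustrative.

```python
import math

# Scores as listed in this slide's table (sixth Y value is 2 here).
x = [17, 8, 8, 20, 14, 7, 21, 22, 19, 30]
y = [9, 13, 7, 18, 11, 2, 5, 15, 26, 28]
n = len(x)

# The five totals from the bottom row of the table.
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Sums of squares from the raw totals.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n

# Computational raw-score formula for r.
r = (sum_xy - sum_x * sum_y / n) / math.sqrt(ss_x * ss_y)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 166 134 3248 2458 2610
print(round(r, 3))  # 0.675
```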

Page 31: Chapter 7:  Correlation

A second measure of correlation, called the Spearman Rank-Order Coefficient, is appropriate for ordinal scores. It is calculated by:

$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$

where D is the difference between each pair of ranks.

Most often used when:

a) At least one variable is measured on an ordinal scale
b) One of the distributions is very skewed or has outliers

Page 32: Chapter 7:  Correlation

Fact: (According to Wikipedia anyway)

In 1995, National Pax had planned to replace the "Sir Isaac Lime" flavor with "Scarlett O'Cherry," until a group of Orange County, California fourth-graders created a petition in opposition and picketed the company's headquarters in early 1996. The crusade also included an e-mail campaign, in which a Stanford professor reportedly accused the company of "Otter-cide." After meeting with the children, company executives relented and retained the Sir Isaac Lime flavor.[1]

Example: Is there a correlation between your preference for Otter Pops® flavors and mine?

Page 33: Chapter 7:  Correlation

Example: Suppose two wine experts were asked to rank-order their preference for eight wines. How can we measure the similarity of their rankings?

X    Y    Rank X    Rank Y     D    D²
1    2       1         2      -1     1
2    1       2         1       1     1
3    5       3         5      -2     4
4    3       4         3       1     1
5    4       5         4       1     1
6    7       6         7      -1     1
7    8       7         8      -1     1
8    6       8         6       2     4

n = 8, $\sum D^2 = 14$

$$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{(6)(14)}{8(8^2 - 1)} = .833$$
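The wine example can be checked with a short sketch of the rank-difference formula. The function name is illustrative, and the formula in this form assumes there are no tied ranks, which holds for these rankings.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman r_s from two lists of ranks, using 1 - 6*sum(D^2) / (n(n^2 - 1)).

    Assumes no tied ranks.
    """
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Rankings of the eight wines by the two experts.
expert_x = [1, 2, 3, 4, 5, 6, 7, 8]
expert_y = [2, 1, 5, 3, 4, 7, 8, 6]

sum_d2 = sum((a - b) ** 2 for a, b in zip(expert_x, expert_y))
r_s = spearman_from_ranks(expert_x, expert_y)

print(sum_d2)         # 14
print(round(r_s, 3))  # 0.833
```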

Page 34: Chapter 7:  Correlation

Pearson correlation is much more sensitive to outlying values than the Spearman coefficient.

From: http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

Page 35: Chapter 7:  Correlation

Pearson correlation is much more sensitive to outlying values than the Spearman coefficient.

[Figure: two scatter plots of Sleep (hours/night) against Caffeine (cups/day). First panel: n = 91, Pearson's r = -0.12, Spearman's r_s = 0.02. Second panel: n = 89, Pearson's r = 0.06, Spearman's r_s = 0.07.]
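The difference in sensitivity can be sketched with a small made-up data set; these are not the caffeine and sleep data from the slide. Eight points lie exactly on a line, and a ninth extreme point is then added. The helper names are illustrative, and `ranks` assumes no ties.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson r via the z-score formula (population SDs)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / len(x)


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(x, y):
    """Spearman r_s from the rank-difference formula."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical data: a perfect line, then the same line plus one outlier.
x_clean = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [1, 2, 3, 4, 5, 6, 7, 8]
x_outlier = x_clean + [9]
y_outlier = y_clean + [-40]

print(pearson_r(x_clean, y_clean), spearman_rs(x_clean, y_clean))
print(pearson_r(x_outlier, y_outlier), spearman_rs(x_outlier, y_outlier))
```

In this sketch the single outlier turns Pearson's r negative, while Spearman's r_s drops to 0.4 and stays positive, because the outlier can move at most to the lowest rank no matter how extreme its value is.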

Page 36: Chapter 7:  Correlation

Only the rank order matters for the Spearman coefficient

[Figure: scatter plot of Y against X; Pearson r = 0.92, Spearman r_s = 1.00.]
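A minimal sketch of the same idea with made-up data: Y increases with X everywhere but not along a straight line, so the ranks match perfectly while the raw scores do not. The choice of y = x³ is an assumption made for illustration and is not necessarily the curve plotted on the slide.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson r via the z-score formula (population SDs)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / len(x)


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(x, y):
    """Spearman r_s from the rank-difference formula."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical monotonic but nonlinear relation: y = x cubed.
x = [i / 10 for i in range(-5, 6)]
y = [v ** 3 for v in x]

r_pearson = pearson_r(x, y)
r_spearman = spearman_rs(x, y)

print(r_pearson < 1, r_spearman == 1)  # True True
```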