Linear Correlation

PSYC 6130, PROF. J. ELDER

Perfect Correlation

• Two variables x and y are perfectly correlated if they are related by an affine transform with nonzero slope:

y = ax + b, a ≠ 0

• The correlation is positive if a > 0 and negative if a < 0.

• As a corollary, two variables are perfectly positively correlated if and only if each pair of corresponding values has the same z-score.

• If the two variables are perfectly negatively correlated, corresponding z-scores will be equal in magnitude but opposite in sign.
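The z-score property can be checked numerically. The following is a minimal sketch, assuming made-up scores and two arbitrary affine transforms (one with a > 0, one with a < 0); none of these numbers come from the course data.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return population z-scores: (value - mean) / standard deviation."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

x = [2.0, 4.0, 5.0, 7.0, 10.0]            # hypothetical scores
y_pos = [3.0 * xi + 1.0 for xi in x]      # affine transform with a > 0
y_neg = [-0.5 * xi + 8.0 for xi in x]     # affine transform with a < 0

zx, zy_pos, zy_neg = z_scores(x), z_scores(y_pos), z_scores(y_neg)

# a > 0: corresponding z-scores are identical
same = all(abs(a - b) < 1e-9 for a, b in zip(zx, zy_pos))
# a < 0: corresponding z-scores have equal magnitude and opposite sign
opposite = all(abs(a + b) < 1e-9 for a, b in zip(zx, zy_neg))
print(same, opposite)
```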


Pearson’s r

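Pearson's correlation coefficient

for a population:

$$\rho = \frac{\sum z_x z_y}{N}, \quad \text{where } z_x = \frac{X - \mu_X}{\sigma_X} \text{ and } z_y = \frac{Y - \mu_Y}{\sigma_Y}$$

for a sample:

$$r = \frac{\sum z_x z_y}{N - 1}, \quad \text{where } z_x = \frac{X - \bar{X}}{s_X} \text{ and } z_y = \frac{Y - \bar{Y}}{s_Y}$$

In both cases the coefficient equals 1 if X and Y are perfectly positively correlated and -1 if X and Y are perfectly negatively correlated.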
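As a quick check of the two z-score definitions, the sketch below computes the coefficient both ways on a small made-up data set (the values and function names are illustrative assumptions). When applied to the same numbers, the population form (N, σ) and the sample form (N - 1, s) give the same value, because the factors of N and N - 1 cancel.

```python
from statistics import mean, pstdev, stdev

def rho_population(x, y):
    """Sum of z-score products over N, using population standard deviations."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)

def r_sample(x, y):
    """Sum of z-score products over N - 1, using sample standard deviations."""
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical scores
y = [2.0, 1.0, 4.0, 3.0, 5.0]   # hypothetical scores

rho = rho_population(x, y)
r = r_sample(x, y)
r_perfect = r_sample(x, [2.0 * a + 1.0 for a in x])   # y = 2x + 1
print(rho, r, r_perfect)
```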


Scatterplots

[Figure: scatterplot of Assignment 2 grade against Assignment 1 grade, PSYC 6130 Section A 2005-2006. Fitted line: y = 0.9678x + 0.86, r² = 0.5576, r = 0.75.]


Pearson’s r only measures linear dependence

• Two variables can have low correlation and still be highly dependent.
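A minimal sketch of this point, assuming a made-up example in which y is completely determined by x (y = x²) over a range symmetric about zero. The dependence is perfect, yet the linear correlation is zero.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from sums of deviation products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # hypothetical, symmetric about 0
y = [a ** 2 for a in x]                      # y depends entirely on x

r_quadratic = pearson_r(x, y)
print(r_quadratic)
```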


[Figure: scatterplot of Assignment 2 (%) against Assignment 1 (%), CSE 3101 2006F. Linear fit: y = 0.74x + 20, r² = 0.40. Quadratic fit: y = 0.0065x² - 0.15x + 48, r² = 0.410.]

Higher-Order Models

[Figure: scatterplot of Final Exam (%) against Fall Exam (%), PSYC 6130A 2005. Linear fit: y = 0.53x + 0.45, r² = 0.38. Quadratic fit: y = -1.4x² + 2.3x - 0.11, r² = 0.46.]


Pearson’s r depends on the range of the variables under study

• r² measures the proportion of variance in one variable accounted for by the other.

• If the range of variable X is restricted, X will account for less of the variance in Y, so the magnitude of r will typically be smaller.
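The effect of range restriction can be sketched with the 13 assignment-mark pairs tabulated later in these notes. The cutoff of 0.91 on both assignments is an assumption made for illustration; the slides do not state how the "A+ Grades Only" subset was chosen, so the restricted value here need not match the plotted one.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from sums of deviation products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Assignment 1 (X) and Assignment 2 (Y) marks from the example table
a1 = [0.8671, 0.8150, 0.8497, 0.8555, 0.9017, 0.9538, 0.9191,
      0.9306, 0.9480, 0.9364, 0.9480, 0.9422, 0.9480]
a2 = [0.8176, 0.8239, 0.8428, 0.8679, 0.8365, 0.8742, 0.9308,
      0.9308, 0.9182, 0.9371, 0.9308, 0.9434, 0.9560]

CUTOFF = 0.91   # hypothetical cutoff, not taken from the slides
kept = [(a, b) for a, b in zip(a1, a2) if a >= CUTOFF and b >= CUTOFF]

r_full = pearson_r(a1, a2)
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])
print(round(r_full, 2), round(r_restricted, 2), len(kept))
```

With the full range the correlation is strong; within the restricted range the same students show a much weaker linear relationship.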


[Figure: scatterplot of Assignment 2 grade against Assignment 1 grade, full range. Fitted line: y = 0.867x + 0.105, r² = 0.640, r = 0.80.]

[Figure: the same scatterplot for PSYC 6130 Section A 2005-2006 (A+ Grades Only). Fitted line: y = 0.144x + 0.799, r² = 0.020, r = 0.14.]



Pearson’s r is Sensitive to Outliers

[Figure: scatterplot of Assignment 2 grade against Assignment 1 grade, without the outlier. Fitted line: y = 0.867x + 0.105, r² = 0.640, r = 0.80.]

[Figure: the same scatterplot with one outlier (a fake student) added. Fitted line: y = 0.40x + 0.539, r² = 0.164, r = 0.40.]
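The sensitivity to a single point can be sketched with the 13 assignment-mark pairs tabulated later in these notes. The coordinates of the fake student are not given in the slides, so the outlier below (1.00, 0.76) is a hypothetical choice and the resulting r will differ from the plotted 0.40.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from sums of deviation products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

a1 = [0.8671, 0.8150, 0.8497, 0.8555, 0.9017, 0.9538, 0.9191,
      0.9306, 0.9480, 0.9364, 0.9480, 0.9422, 0.9480]
a2 = [0.8176, 0.8239, 0.8428, 0.8679, 0.8365, 0.8742, 0.9308,
      0.9308, 0.9182, 0.9371, 0.9308, 0.9434, 0.9560]

# Hypothetical outlier: top mark on Assignment 1, low mark on Assignment 2
outlier_x, outlier_y = 1.00, 0.76

r_clean = pearson_r(a1, a2)
r_outlier = pearson_r(a1 + [outlier_x], a2 + [outlier_y])
print(round(r_clean, 2), round(r_outlier, 2))
```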


Standard Definition of Correlation (Population)

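Recall that the population variance $\sigma_X^2$ of $X$ is

$$\sigma_X^2 = E\left[(X - \mu_X)^2\right] = \frac{1}{N}\sum (X - \mu_X)^2$$

We define the population covariance $\sigma_{XY}$ of $X$ and $Y$ as

$$\sigma_{XY} = E\left[(X - \mu_X)(Y - \mu_Y)\right] = \frac{1}{N}\sum (X - \mu_X)(Y - \mu_Y)$$

The Pearson correlation $\rho$ between $X$ and $Y$ is then given by

$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$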


Standard Definition of Correlation (Sample)

Recall that the sample variance $s_X^2$ of $X$ is

$$s_X^2 = \frac{1}{N-1}\sum (X - \bar{X})^2$$

We define the sample covariance $s_{XY}$ of $X$ and $Y$ as

$$s_{XY} = \frac{1}{N-1}\sum (X - \bar{X})(Y - \bar{Y})$$

The Pearson correlation $r$ between $X$ and $Y$ is then given by

$$r = \frac{s_{XY}}{s_X s_Y}$$


Computational Formula

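For a population:

$$\rho = \frac{\sum z_x z_y}{N} = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{N \sigma_X \sigma_Y} = \frac{\frac{\sum XY}{N} - \mu_X \mu_Y}{\sigma_X \sigma_Y}$$

The numerator of the last expression is the covariance.

For a sample:

$$r = \frac{\sum z_x z_y}{N-1} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N-1)\, s_X s_Y} = \frac{\frac{1}{N-1}\left(\sum XY - N \bar{X}\bar{Y}\right)}{s_X s_Y}$$

The numerator of the last expression is the unbiased covariance.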


Example: 6130A 2005-2006 Assignment Marks

                    Assignment 1 (X)   Assignment 2 (Y)   XY
                    0.8671             0.8176             0.7089
                    0.8150             0.8239             0.6715
                    0.8497             0.8428             0.7161
                    0.8555             0.8679             0.7425
                    0.9017             0.8365             0.7543
                    0.9538             0.8742             0.8338
                    0.9191             0.9308             0.8555
                    0.9306             0.9308             0.8663
                    0.9480             0.9182             0.8705
                    0.9364             0.9371             0.8775
                    0.9480             0.9308             0.8824
                    0.9422             0.9434             0.8889
                    0.9480             0.9560             0.9062

Mean                0.9088             0.8931             0.8134
Pop. Std. Dev.      0.04473            0.04847
Sample Std. Dev.    0.04655            0.05045
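The table stops at the summary statistics, so the following sketch carries the computation through to the correlation itself, using the computational formulas for a population and for a sample. The variable names are illustrative; the marks are copied from the table.

```python
from statistics import mean, pstdev, stdev

x = [0.8671, 0.8150, 0.8497, 0.8555, 0.9017, 0.9538, 0.9191,
     0.9306, 0.9480, 0.9364, 0.9480, 0.9422, 0.9480]   # Assignment 1
y = [0.8176, 0.8239, 0.8428, 0.8679, 0.8365, 0.8742, 0.9308,
     0.9308, 0.9182, 0.9371, 0.9308, 0.9434, 0.9560]   # Assignment 2

n = len(x)
mean_x, mean_y = mean(x), mean(y)
sum_xy = sum(a * b for a, b in zip(x, y))
mean_xy = sum_xy / n

# Population form: (mean of XY - product of means) / (sigma_X * sigma_Y)
rho = (mean_xy - mean_x * mean_y) / (pstdev(x) * pstdev(y))

# Sample form: unbiased covariance / (s_X * s_Y)
cov_unbiased = (sum_xy - n * mean_x * mean_y) / (n - 1)
r = cov_unbiased / (stdev(x) * stdev(y))

print(round(mean_x, 4), round(mean_y, 4), round(mean_xy, 4))
print(round(rho, 3), round(r, 3))
```

Both forms give the same coefficient, about 0.80, which agrees with the r² = 0.640 reported on the scatterplots of these marks. Working from the rounded table entries instead of the raw marks gives a slightly different value because the covariance is a small difference of two nearly equal numbers.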

End of Lecture 7

Wed, Oct 29 2008

Correlation and the Power of Matched Tests


Correlation and the Power of Matched t-tests

• Now that we understand correlation, we can better understand the power of matched t-tests when scores in the two conditions are correlated.


[Figure: scatterplot of Assignment 2 grade against Assignment 1 grade. Fitted line: y = 0.867x + 0.105, r² = 0.640.]


Recall formulae for standard error for independent and matched tests

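• Independent t-test (equal sample sizes n):

$$s^2_{\bar{X}_1 - \bar{X}_2} = \frac{s_1^2 + s_2^2}{n}$$

• Matched t-test:

$$s^2_{\bar{D}} = \frac{s_1^2 + s_2^2 - 2 r s_1 s_2}{n}$$

For the purpose of power calculations, assume homogeneity of variance:

$$E[s_1^2] = E[s_2^2] = \sigma^2.$$

Then:

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{2\sigma^2}{n}$$

$$\sigma^2_{\bar{D}} = \frac{2\sigma^2}{n} - \frac{2\rho\sigma^2}{n} = \frac{2\sigma^2(1 - \rho)}{n}$$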
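The matched standard error rests on the identity that the variance of the difference scores equals s₁² + s₂² - 2 r s₁ s₂. The sketch below checks this on made-up paired scores (the data and variable names are assumptions for illustration) and compares the two standard errors.

```python
from math import sqrt
from statistics import mean, stdev, variance

x1 = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0]   # hypothetical condition 1 scores
x2 = [10.0, 14.0, 10.0, 15.0, 13.0, 13.0]   # hypothetical condition 2 scores
n = len(x1)

s1, s2 = stdev(x1), stdev(x2)
m1, m2 = mean(x1), mean(x2)
r = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / ((n - 1) * s1 * s2)

# Variance of the difference scores, computed directly and via the formula
var_d_direct = variance([a - b for a, b in zip(x1, x2)])
var_d_formula = s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2

se_independent = sqrt((s1 ** 2 + s2 ** 2) / n)   # ignores the pairing
se_matched = sqrt(var_d_formula / n)             # uses the pairing
print(round(r, 3), round(se_independent, 3), round(se_matched, 3))
```

Because these scores are positively correlated, the matched standard error comes out smaller than the independent one.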


Knowing the expected std error, we can estimate the expected t-value

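Let $\delta$ denote the expected t-value and $d = (\mu_1 - \mu_2)/\sigma$ the standardized effect size.

• Independent t-test:

$$E[t] \approx \delta_{independent} = \frac{\mu_1 - \mu_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\mu_1 - \mu_2}{\sigma\sqrt{2/n}} = \sqrt{\frac{n}{2}}\, d$$

• Matched t-test:

$$E[t] \approx \delta_{matched} = \frac{\mu_1 - \mu_2}{\sigma_{\bar{D}}} = \frac{\mu_1 - \mu_2}{\sigma\sqrt{2(1 - \rho)/n}} = \frac{1}{\sqrt{1 - \rho}}\sqrt{\frac{n}{2}}\, d$$

Thus

$$\delta_{matched} = \frac{1}{\sqrt{1 - \rho}}\,\delta_{independent}$$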


The power of matched t-tests

• Large positive correlations between scores in the two conditions will mean a greater expected t-score for the matched design.

• But keep in mind that the critical value for the matched design will be somewhat larger as well, due to a smaller df.

• Which test is more powerful is decided by the exact tradeoff between these two effects.

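With $\delta$ denoting the expected t-value:

$$\delta_{matched} = \frac{1}{\sqrt{1 - \rho}}\,\delta_{independent}$$

$$df_{independent} = 2(n - 1), \qquad df_{matched} = n - 1$$
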
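The first of these two effects can be sketched as follows. The function names, the effect size d = 0.5, and the sample size n = 20 are assumptions chosen for illustration; `delta` stands for the expected t-value. Critical values are left out because they would require a t table or a statistics library.

```python
from math import sqrt

def delta_independent(d, n):
    """Expected t for an independent-groups design with n scores per group."""
    return d * sqrt(n / 2)

def delta_matched(d, n, rho):
    """Expected t for a matched design with n pairs and correlation rho."""
    return delta_independent(d, n) / sqrt(1 - rho)

def df_independent(n):
    return 2 * (n - 1)

def df_matched(n):
    return n - 1

d, n = 0.5, 20   # hypothetical effect size and sample size
for rho in (0.0, 0.5, 0.8):
    gain = delta_matched(d, n, rho) / delta_independent(d, n)
    print(rho, round(delta_matched(d, n, rho), 3), round(gain, 3))
print(df_independent(n), df_matched(n))
```

At ρ = 0 the two designs have the same expected t, so the matched design only loses degrees of freedom; as ρ grows, the gain factor 1/√(1 - ρ) increases.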

Applying Correlation Analysis


Adjusted Correlation Coefficient

Although the sample covariance $s_{XY}$ is an unbiased estimator of the population covariance $\sigma_{XY}$, the sample correlation coefficient $r$ is not an unbiased estimator of the population correlation coefficient $\rho$.

A less-biased estimate of $\rho$ is given by the adjusted correlation coefficient $r_{adj}$:

$$r_{adj} = \sqrt{1 - \frac{(1 - r^2)(N - 1)}{N - 2}}$$
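A minimal sketch of the adjustment, applied to r = 0.80 with N = 13 (the assignment-marks example). The function name is an assumption, and so is the choice to return 0 when the quantity under the square root is negative, which can happen for small r and small N; the slides do not say how that case is handled. Note that the formula returns a magnitude, so the sign of r has to be carried separately.

```python
from math import sqrt

def adjusted_r(r, n):
    """Adjusted correlation magnitude: sqrt(1 - (1 - r^2)(N - 1)/(N - 2))."""
    radicand = 1 - (1 - r ** 2) * (n - 1) / (n - 2)
    # Assumption: clamp to zero when the radicand is negative
    return sqrt(max(radicand, 0.0))

r_adj = adjusted_r(0.80, 13)
r_adj_small = adjusted_r(0.10, 5)   # radicand is negative here
print(round(r_adj, 3), r_adj_small)
```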


Testing Pearson’s r for Significance

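When $\rho = 0$, $r$ is approximately t-distributed on $N - 2$ degrees of freedom, with standard deviation

$$s_r = \sqrt{\frac{1 - r^2}{N - 2}}$$

Thus

$$t = \frac{r}{s_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{N - 2}}} = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$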
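As a worked instance, the sketch below applies the test to r = 0.80 with N = 13 (the assignment-marks example). The function name is an assumption, and the critical value 2.201 is the two-tailed 5% value for 11 degrees of freedom read from a standard t table rather than computed.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing rho = 0: t = r * sqrt(N - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r, n = 0.80, 13
t = t_for_r(r, n)
df = n - 2

T_CRIT = 2.201   # two-tailed, alpha = .05, df = 11 (from a standard t table)
significant = abs(t) > T_CRIT
print(round(t, 3), df, significant)
```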


Underlying Assumptions (For Inference)

• Independent random sampling

• Bivariate normal distribution

[Figure: a bivariate normal distribution of X and Y, shown as a scatterplot and as a probability density surface.]

Applications of Pearson’s r

• Measuring reliability and validity

– Examples:

• Test-retest reliability

• Split-half reliability

• Inter-rater reliability

• Criterion validity of self-report (correlate self-report against behavioural measure)

• Correlation between tests that are supposed to measure the same thing.

• Correlation between algorithmic model and human responses in behavioural studies.

• Measuring relationships between variables (correlational studies)

– e.g., frequency of cannabis and alcohol use

• Measuring relationships between IVs and DVs (experimental studies, when the IV is on an interval/ratio scale)

– e.g., exam performance as a function of alcohol consumption on previous night.


Power Analysis for Pearson’s r

Let $\rho_A$ = expected correlation under the alternate hypothesis.

Then

$$E[t] \approx \rho_A \sqrt{N - 1}$$
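The relation can be used in both directions: to find the expected t for a planned sample size, or to solve for the sample size that reaches a target expected t. In this sketch the function names, ρ_A = 0.5, and the target value 2.8 are assumptions for illustration; 2.8 is the value commonly tabulated for power of about .80 at α = .05, two-tailed.

```python
from math import ceil, sqrt

def expected_t(rho_a, n):
    """Expected t under the alternative: rho_A * sqrt(N - 1)."""
    return rho_a * sqrt(n - 1)

def required_n(rho_a, target_t):
    """Smallest N whose expected t reaches target_t: N = (target / rho_A)^2 + 1."""
    return ceil((target_t / rho_a) ** 2 + 1)

t_planned = expected_t(0.5, 26)   # expected t with 26 participants
n_needed = required_n(0.5, 2.8)   # participants needed for the target
print(t_planned, n_needed)
```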


Confidence Intervals for Pearson’s r

• Pearson’s r is bounded on [-1, 1].

• Consequently, the sampling distribution for r is not normal.

• The sampling distribution for ρ > 0 is negatively skewed.

• The sampling distribution for ρ < 0 is positively skewed.

• Thus confidence intervals are generally not symmetric.


Fisher Transform

• Fisher transform (Appendix r′): Method for symmetrizing r to facilitate calculation of confidence interval using standard normal table.

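$$r' = \frac{1}{2}\log\left(\frac{1 + r}{1 - r}\right)$$

where log is the natural logarithm. $r'$ is approximately normally distributed around $\rho'$ (the Fisher transform of $\rho$) with standard deviation

$$s_{r'} = \frac{1}{\sqrt{N - 3}}$$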


Confidence Intervals on r

The confidence limits on the transformed scale are

$$r' \pm z_{\alpha/2}\, s_{r'}$$

Note that since $s_{r'}$ does not depend on statistics computed from the sample, $r'$ approximates a normal distribution, not a t-distribution, and we can use the z-table for our calculations.

The inverse transform can then be applied to convert the confidence limits back to r:

$$r = \frac{e^{2r'} - 1}{e^{2r'} + 1}$$

These transforms can also be done directly using Appendix r
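The transform and its inverse are the inverse hyperbolic tangent and the hyperbolic tangent, so the whole procedure can be sketched with the standard `math` module. The function name is an assumption; the inputs r = 0.80 and N = 13 are the assignment-marks example, and 1.96 is the usual z value for a 95% interval.

```python
from math import atanh, sqrt, tanh

def fisher_ci(r, n, z_crit=1.96):
    """Confidence interval for a correlation via the Fisher transform."""
    r_prime = atanh(r)                  # 0.5 * log((1 + r) / (1 - r))
    s_r_prime = 1 / sqrt(n - 3)
    lo_prime = r_prime - z_crit * s_r_prime
    hi_prime = r_prime + z_crit * s_r_prime
    # tanh(v) = (e^(2v) - 1) / (e^(2v) + 1) is the inverse transform
    return tanh(lo_prime), tanh(hi_prime)

r, n = 0.80, 13
lo, hi = fisher_ci(r, n)
print(round(lo, 3), round(hi, 3))
```

The interval is not symmetric about r: the lower limit lies much further from 0.80 than the upper limit, as expected for a negatively skewed sampling distribution.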

End of Lecture 8

Nov 5 2008


Testing Difference of Pearson Correlations from 2 Independent Samples

• Converting the skewed r distribution to an (approximately) normal distribution allows straightforward two-sample testing:

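$$z = \frac{r'_1 - r'_2}{s_{r'_1 - r'_2}}$$

where $r'_1$ and $r'_2$ are the Fisher-transformed sample correlations and

$$s_{r'_1 - r'_2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}$$

(Remember that variances add.)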


Example

[Figure: scatterplot of Final Exam Grade (%) against Midterm Exam Grade (%), COSC 3101 2003-04 Section N. r = 0.303.]

[Figure: scatterplot of Final Exam Grade (%) against Midterm Exam Grade (%), COSC 3101 2003-04 Section M. r = 0.4486.]

Sample sizes, in the order the panels are listed: N = 43, N = 44.
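The slides give the two correlations and sample sizes but not the outcome of the test, so the sketch below carries the calculation through. The function name is an assumption, as is the pairing of N = 43 with Section N and N = 44 with Section M (taken from the order in which they are listed); swapping them would not change the result because the standard error is symmetric in the two sample sizes.

```python
from math import atanh, sqrt

def z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    r1_prime, r2_prime = atanh(r1), atanh(r2)       # Fisher transforms
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))          # variances add
    return (r1_prime - r2_prime) / se

z = z_difference(0.303, 43, 0.4486, 44)
Z_CRIT = 1.96   # two-tailed, alpha = .05
significant = abs(z) > Z_CRIT
print(round(z, 3), significant)
```

The statistic is about -0.77, well inside ±1.96, so at the .05 level these data do not show that the midterm-final correlation differs between the two sections.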
