Linear Correlation
PSYC 6130, PROF. J. ELDER 2
Perfect Correlation
• Two variables x and y are perfectly correlated if they are related by an affine transform y = ax + b (with a ≠ 0).
• The correlation is positive if a > 0 and negative if a < 0.
• As a corollary, two variables are perfectly positively correlated if and only if each pair of corresponding values has the same z-score.
• If the two variables are perfectly negatively correlated, corresponding z-scores are equal in magnitude but opposite in sign.
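The z-score corollary can be checked numerically. The sketch below uses small hypothetical data (not from the course) and population z-scores; the helper name `z_scores` is illustrative only.

```python
def z_scores(values):
    """Population z-scores: (v - mean) / population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

x = [2.0, 4.0, 5.0, 7.0, 9.0]          # hypothetical data
y_pos = [3 * v + 1 for v in x]         # a = 3 > 0: perfect positive correlation
y_neg = [-2 * v + 10 for v in x]       # a = -2 < 0: perfect negative correlation

zx = z_scores(x)
same = all(abs(a - b) < 1e-12 for a, b in zip(zx, z_scores(y_pos)))
opposite = all(abs(a + b) < 1e-12 for a, b in zip(zx, z_scores(y_neg)))
print(same, opposite)  # True True
```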
Pearson’s r
Pearson's correlation coefficient
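For a population:
$$\rho = \frac{\sum z_X z_Y}{N}, \quad \text{where } z_X = \frac{X - \mu_X}{\sigma_X} \text{ and } z_Y = \frac{Y - \mu_Y}{\sigma_Y}$$
For a sample:
$$r = \frac{\sum z_x z_y}{N - 1}, \quad \text{where } z_x = \frac{X - \bar{X}}{s_x} \text{ and } z_y = \frac{Y - \bar{Y}}{s_y}$$
$$\rho, r = \begin{cases} 1 & \text{if } X \text{ and } Y \text{ are perfectly positively correlated} \\ -1 & \text{if } X \text{ and } Y \text{ are perfectly negatively correlated} \end{cases}$$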
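As a sketch, both versions of the z-score formula can be coded directly: population z-scores (σ) summed and divided by N, or sample z-scores (s) summed and divided by N - 1. The data are hypothetical; the point is that the two conventions give the same r because the divisors cancel.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v, ddof):
    """ddof=0 gives sigma (population SD); ddof=1 gives s (sample SD)."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - ddof))

def r_from_z(x, y, ddof):
    """Sum of z-score products divided by N (ddof=0) or N - 1 (ddof=1)."""
    mx, my = mean(x), mean(y)
    sx, sy = sd(x, ddof), sd(y, ddof)
    total = sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y))
    return total / (len(x) - ddof)

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 1.0, 4.0, 3.0, 5.0]
rho_formula = r_from_z(x, y, ddof=0)
r_formula = r_from_z(x, y, ddof=1)
print(round(rho_formula, 6), round(r_formula, 6))  # 0.8 0.8
```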
Scatterplots
[Figure: PSYC 6130 Section A 2005-2006. Scatterplot of Assignment 2 Grade vs. Assignment 1 Grade with a fitted line; r² = 0.5576, r = 0.75.]
Pearson’s r only measures linear dependence
• Two variables can have low correlation and still be highly dependent.
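A minimal illustration, using hypothetical values: below, y is completely determined by x (y = x²), yet Pearson's r is exactly zero because the relationship is symmetric rather than linear.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation sums."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]   # perfectly dependent on x, but not linearly
r = pearson_r(x, y)
print(r)  # 0.0
```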
[Figure, two panels. Left: CSE 3101 2006F, Assignment 2 (%) vs. Assignment 1 (%); linear fit y = 0.74x + 20 (r² = 0.40), quadratic fit y = 0.0065x² - 0.15x + 48 (r² = 0.410). Right, "Higher-Order Models": PSYC 6130A 2005, Final Exam (%) vs. Fall Exam (%); linear fit y = 0.53x + 0.45 (r² = 0.38), quadratic fit y = -1.4x² + 2.3x - 0.11 (r² = 0.46).]
Pearson’s r depends on the range of the variables under study
• r² measures the proportion of variance in one variable accounted for by the other.
• If the range of variable X is restricted, it will typically account for less of the variance in Y.
[Figure, two panels, Assignment 2 Grade vs. Assignment 1 Grade. Left: PSYC 6130 Section A 2005-2006, all students; y = 0.867x + 0.105, r² = 0.640, r = 0.80. Right: same class, A+ grades only; y = 0.144x + 0.799, r² = 0.020, r = 0.14.]
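The range-restriction effect can be reproduced with simulated data. This sketch assumes a bivariate normal population with ρ = 0.8 (simulated, not the actual grades) and keeps only cases more than 1 SD above the mean of X, loosely analogous to the "A+ grades only" panel. The exact restricted r depends on the cut-off chosen.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r from deviation sums."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_bivariate_normal(rho, n, seed):
    """Standard bivariate normal pairs with population correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1 - rho ** 2) * noise)
    return xs, ys

xs, ys = simulate_bivariate_normal(rho=0.8, n=20000, seed=1)
r_full = pearson_r(xs, ys)

# Restrict the range of X to the upper tail (X > 1 SD above the mean).
kept = [(a, b) for a, b in zip(xs, ys) if a > 1.0]
r_restricted = pearson_r([a for a, _ in kept], [b for _, b in kept])

print(f"full range: r = {r_full:.2f}; restricted range: r = {r_restricted:.2f}")
```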
Pearson’s r is Sensitive to Outliers
[Figure, two panels, PSYC 6130 Section A 2005-2006, Assignment 2 Grade vs. Assignment 1 Grade. Left: original data; y = 0.867x + 0.105, r² = 0.640, r = 0.80. Right: the same data plus one outlier (a fake student); y = 0.40x + 0.539, r² = 0.164, r = 0.40.]
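The same effect appears in a small sketch with hypothetical, nearly linear data: adding a single point far off the trend cuts r from nearly 1 to below 0.5.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviation sums."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, nearly linear data (not the class grades).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1]
r_clean = pearson_r(x, y)

# Add one hypothetical outlier far below the trend.
r_outlier = pearson_r(x + [9], y + [0.0])

print(f"without outlier: r = {r_clean:.3f}; with one outlier: r = {r_outlier:.3f}")
```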
Standard Definition of Correlation (Population)
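Recall that the population variance of X is $\sigma_X^2$:
$$\sigma_X^2 = E\left[(X - \mu_X)^2\right] = \frac{1}{N}\sum (X - \mu_X)^2$$
We define the population covariance of X and Y as $\sigma_{XY}$:
$$\sigma_{XY} = E\left[(X - \mu_X)(Y - \mu_Y)\right] = \frac{1}{N}\sum (X - \mu_X)(Y - \mu_Y)$$
The Pearson correlation between X and Y is then given by
$$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$$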
Standard Definition of Correlation (Sample)
Recall that the sample variance of X is $s_X^2$:
$$s_X^2 = \frac{1}{N - 1}\sum (X - \bar{X})^2$$
We define the sample covariance of X and Y as $s_{XY}$:
$$s_{XY} = \frac{1}{N - 1}\sum (X - \bar{X})(Y - \bar{Y})$$
The Pearson correlation between X and Y is then given by
$$r = \frac{s_{XY}}{s_X s_Y}$$
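A short sketch of the two covariance definitions, on hypothetical data: the population and sample covariances differ by the factor N/(N - 1), but the correlation is identical as long as the same divisor is used for the covariance and both variances.

```python
import math

def covariance(x, y, ddof):
    """ddof=0: population covariance (divide by N); ddof=1: sample covariance (divide by N - 1)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - ddof)

def pearson(x, y, ddof):
    """Covariance divided by the product of SDs, all with the same divisor."""
    return covariance(x, y, ddof) / math.sqrt(covariance(x, x, ddof) * covariance(y, y, ddof))

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.0, 1.0, 4.0, 3.0, 5.0]

sigma_xy = covariance(x, y, ddof=0)
s_xy = covariance(x, y, ddof=1)
print(sigma_xy, s_xy)                              # 1.6 2.0
print(pearson(x, y, ddof=0), pearson(x, y, ddof=1))  # 0.8 0.8
```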
Computational Formula
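For a population:
$$\rho = \frac{\sum z_X z_Y}{N} = \frac{\dfrac{\sum XY}{N} - \mu_X \mu_Y}{\sigma_X \sigma_Y}$$
where the numerator is the covariance.
For a sample:
$$r = \frac{\sum z_x z_y}{N - 1} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(N - 1)\, s_x s_y} = \frac{\dfrac{1}{N - 1}\left(\sum XY - N\bar{X}\bar{Y}\right)}{s_x s_y}$$
where the numerator is the unbiased covariance.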
Example: 6130A 2005-2006 Assignment Marks
                    Assignment 1 (X)   Assignment 2 (Y)   XY
                    0.8671             0.8176             0.7089
                    0.8150             0.8239             0.6715
                    0.8497             0.8428             0.7161
                    0.8555             0.8679             0.7425
                    0.9017             0.8365             0.7543
                    0.9538             0.8742             0.8338
                    0.9191             0.9308             0.8555
                    0.9306             0.9308             0.8663
                    0.9480             0.9182             0.8705
                    0.9364             0.9371             0.8775
                    0.9480             0.9308             0.8824
                    0.9422             0.9434             0.8889
                    0.9480             0.9560             0.9062
Mean                0.9088             0.8931             0.8134
Pop. Std. Dev.      0.04473            0.04847
Sample Std. Dev.    0.04655            0.05045
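The computational formulas can be applied to the 13 pairs in the example table. This sketch recomputes ΣXY from X and Y rather than from the rounded XY column; because the tabled marks are themselves rounded, recomputed summary values may differ from the table in the last digit.

```python
import math

# Paired marks from the 6130A 2005-2006 example table (proportions).
x = [0.8671, 0.8150, 0.8497, 0.8555, 0.9017, 0.9538, 0.9191,
     0.9306, 0.9480, 0.9364, 0.9480, 0.9422, 0.9480]  # Assignment 1
y = [0.8176, 0.8239, 0.8428, 0.8679, 0.8365, 0.8742, 0.9308,
     0.9308, 0.9182, 0.9371, 0.9308, 0.9434, 0.9560]  # Assignment 2

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Population form: rho = (sum(XY)/N - mu_X * mu_Y) / (sigma_X * sigma_Y)
pop_sd_x = math.sqrt(sum_x2 / n - mean_x ** 2)
pop_sd_y = math.sqrt(sum_y2 / n - mean_y ** 2)
rho = (sum_xy / n - mean_x * mean_y) / (pop_sd_x * pop_sd_y)

# Sample form: r = [(sum(XY) - N * Xbar * Ybar) / (N - 1)] / (s_x * s_y)
s_x = math.sqrt((sum_x2 - n * mean_x ** 2) / (n - 1))
s_y = math.sqrt((sum_y2 - n * mean_y ** 2) / (n - 1))
r = ((sum_xy - n * mean_x * mean_y) / (n - 1)) / (s_x * s_y)

print(f"means: {mean_x:.4f}, {mean_y:.4f}")
print(f"pop. SDs: {pop_sd_x:.5f}, {pop_sd_y:.5f}; sample SDs: {s_x:.5f}, {s_y:.5f}")
print(f"rho (population formula) = {rho:.4f}, r (sample formula) = {r:.4f}")
```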
End of Lecture 7
Wed, Oct 29 2008
Correlation and the Power of Matched Tests
Correlation and the Power of Matched t-tests
• Now that we understand correlation, we can better understand the power of matched t-tests when scores in the two conditions are correlated.
[Figure: PSYC 6130 Section A 2005-2006, Assignment 2 Grade vs. Assignment 1 Grade; y = 0.867x + 0.105, r² = 0.640.]
Recall formulae for standard error for independent and matched tests
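• Independent t-test:
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2 + s_2^2}{n}}$$
• Matched t-test:
$$s_{\bar{D}} = \sqrt{\frac{s_1^2 + s_2^2 - 2 r s_1 s_2}{n}}$$
For the purpose of power calculations, assume homogeneity of variance: $E[s_1^2] = E[s_2^2] = \sigma^2$. Then:
$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \frac{2\sigma^2}{n}, \qquad \sigma_{\bar{D}}^2 = \frac{\sigma^2 + \sigma^2 - 2\rho\sigma^2}{n} = \frac{2\sigma^2(1 - \rho)}{n}$$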
Knowing the expected std error, we can estimate the expected t-value
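• Independent t-test:
$$E[t] \approx \frac{\mu_1 - \mu_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\mu_1 - \mu_2}{\sigma\sqrt{2/n}} = d\sqrt{\frac{n}{2}}$$
• Matched t-test:
$$E[t] \approx \frac{\mu_1 - \mu_2}{\sigma_{\bar{D}}} = \frac{\mu_1 - \mu_2}{\sigma\sqrt{2(1 - \rho)/n}} = d\sqrt{\frac{n}{2(1 - \rho)}}$$
where $d = (\mu_1 - \mu_2)/\sigma$. Thus
$$E[t]_{\text{matched}} = \frac{E[t]_{\text{independent}}}{\sqrt{1 - \rho}}$$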
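A sketch of these expected-t formulas, assuming equal n per condition and homogeneous variance as above. The effect size d = 0.5 and n = 20 are arbitrary illustrative values.

```python
import math

def expected_t_independent(d, n):
    """E[t] ~ d * sqrt(n / 2) for two independent groups of n each."""
    return d * math.sqrt(n / 2)

def expected_t_matched(d, n, rho):
    """E[t] ~ d * sqrt(n / (2 * (1 - rho))) for n matched pairs."""
    return d * math.sqrt(n / (2 * (1 - rho)))

d, n = 0.5, 20  # illustrative values only
for rho in (0.0, 0.5, 0.8):
    ind = expected_t_independent(d, n)
    mat = expected_t_matched(d, n, rho)
    print(f"rho = {rho:.1f}: independent E[t] = {ind:.3f}, matched E[t] = {mat:.3f}")
```

Larger positive ρ inflates the matched expected t by the factor 1/√(1 - ρ); with ρ = 0 the two expected values coincide.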
The power of matched t-tests
• Large positive correlations between scores in the two conditions will mean a greater expected t-score for the matched design.
• But keep in mind that the critical value for the matched design will be somewhat larger as well, due to a smaller df.
• Which test is more powerful is decided by the exact tradeoff between these two effects.
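$$E[t]_{\text{matched}} = \frac{E[t]_{\text{independent}}}{\sqrt{1 - \rho}}, \qquad df_{\text{independent}} = 2(n - 1), \qquad df_{\text{matched}} = n - 1$$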
Applying Correlation Analysis
Adjusted Correlation Coefficient
Although the sample covariance $s_{XY}$ is an unbiased estimator of the population covariance $\sigma_{XY}$, the sample correlation coefficient $r$ is not an unbiased estimator of the population correlation coefficient $\rho$.
A less-biased estimate of $\rho$ is given by the adjusted correlation coefficient $r_{adj}$:
$$r_{adj} = \sqrt{1 - \frac{(1 - r^2)(N - 1)}{N - 2}}$$
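A minimal sketch of the adjustment. Two choices here are assumptions not stated on the slide: the sign of r is carried over to r_adj, and a negative value under the square root (which can occur for small |r|) is clipped to 0.

```python
import math

def adjusted_r(r, n):
    """r_adj = sqrt(1 - (1 - r^2)(N - 1)/(N - 2)).

    Assumptions: sign of r is kept; a negative radicand is clipped to 0.
    """
    radicand = 1 - (1 - r ** 2) * (n - 1) / (n - 2)
    return math.copysign(math.sqrt(max(radicand, 0.0)), r)

print(round(adjusted_r(0.80, 13), 4))  # 0.7793
```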
Testing Pearson’s r for Significance
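When $\rho = 0$, r divided by its estimated standard deviation
$$s_r = \sqrt{\frac{1 - r^2}{N - 2}}$$
is approximately t-distributed on $N - 2$ degrees of freedom. Thus
$$t = \frac{r}{s_r} = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}} = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$$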
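As a worked sketch, the t statistic for r = 0.80 with N = 13 (illustrative values) follows directly from the formula; the result would then be compared with a tabled critical t on N - 2 df.

```python
import math

def r_significance_t(r, n):
    """Return (t, df) for testing H0: rho = 0, using s_r = sqrt((1 - r^2)/(N - 2))."""
    s_r = math.sqrt((1 - r ** 2) / (n - 2))
    return r / s_r, n - 2

t, df = r_significance_t(0.80, 13)
print(round(t, 2), df)  # 4.42 11
# For comparison, the two-tailed critical t at alpha = .05 with df = 11 is about 2.201.
```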
Underlying Assumptions (For Inference)
• Independent random sampling
• Bivariate normal distribution
[Figure: bivariate normal distribution. Left: scatterplot of Y vs. X. Right: surface plot of the probability density over X and Y.]
Applications of Pearson’s r
• Measuring reliability and validity
– Examples:
• Test-retest reliability
• Split-half reliability
• Inter-rater reliability
• Criterion validity of self-report (correlate self-report against behavioural measure)
• Correlation between tests that are supposed to measure the same thing.
• Correlation between algorithmic model and human responses in behavioural studies.
• Measuring relationships between variables (correlational studies)
– e.g., frequency of cannabis and alcohol use
• Measuring relationships between IVs and DVs (experimental studies, when the IV is on an interval/ratio scale)
– e.g., exam performance as a function of alcohol consumption on previous night.
Power Analysis for Pearson’s r
Let $\rho_A$ = expected correlation under the alternative hypothesis.
Then
$$E[t] \approx \rho_A \sqrt{N - 1}$$
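The relation can be inverted to estimate a sample size. This sketch assumes the normal approximation δ ≈ z(1 - α/2) + z(power) that underlies common power tables (about 2.80 for power .80 at α = .05, two-tailed); the value ρ_A = 0.30 is illustrative.

```python
import math
from statistics import NormalDist

def expected_t(rho_a, n):
    """E[t] ~ rho_A * sqrt(N - 1)."""
    return rho_a * math.sqrt(n - 1)

def required_n(rho_a, power=0.80, alpha=0.05):
    """Smallest N with rho_A * sqrt(N - 1) >= delta (normal approximation, two-tailed)."""
    z = NormalDist()
    delta = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil((delta / rho_a) ** 2 + 1)

n = required_n(0.30)
print(n)  # 89
```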
Confidence Intervals for Pearson’s r
• Pearson’s r is bounded on [-1, 1].
• Consequently, the sampling distribution of r is not normal.
• The sampling distribution for ρ > 0 is negatively skewed.
• The sampling distribution for ρ < 0 is positively skewed.
• Thus confidence intervals are generally not symmetric.
Fisher Transform
• Fisher transform (Appendix r′): a method for symmetrizing r so that a confidence interval can be calculated using the standard normal table.
$$r' = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right)$$
$r'$ is approximately normally distributed around $\rho'$ with standard deviation
$$s_{r'} = \frac{1}{\sqrt{N - 3}}$$
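A sketch of the transform and its standard error. The formula ½ ln((1 + r)/(1 - r)) is the inverse hyperbolic tangent, so it can be cross-checked against `math.atanh`.

```python
import math

def fisher_r_prime(r):
    """r' = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def fisher_se(n):
    """Standard deviation of r': 1 / sqrt(N - 3)."""
    return 1 / math.sqrt(n - 3)

print(round(fisher_r_prime(0.5), 4))                         # 0.5493
print(math.isclose(fisher_r_prime(0.5), math.atanh(0.5)))    # True
```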
Confidence Intervals on r
$$\rho' = r' \pm z_{\alpha/2}\, s_{r'}$$
Note that since $s_{r'}$ does not depend on statistics computed from the sample, $(r' - \rho')/s_{r'}$ approximates a normal distribution, not a t-distribution, and we can use the z-table for our calculations.
The inverse transform can then be applied to convert the confidence limits back to r:
$$r = \frac{e^{2r'} - 1}{e^{2r'} + 1}$$
These transforms can also be done directly using Appendix r′.
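The full confidence-interval procedure can be sketched as: transform r, add and subtract z(α/2)·s_r′, then back-transform each limit. The inputs r = 0.80 and N = 13 are illustrative; the resulting interval is asymmetric around r, as the skewness argument predicts.

```python
import math
from statistics import NormalDist

def fisher_r_prime(r):
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher(r_prime):
    """r = (e^(2r') - 1) / (e^(2r') + 1)."""
    e2 = math.exp(2 * r_prime)
    return (e2 - 1) / (e2 + 1)

def r_confidence_interval(r, n, conf=0.95):
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    r_prime = fisher_r_prime(r)
    s_r_prime = 1 / math.sqrt(n - 3)
    lower = inverse_fisher(r_prime - z_crit * s_r_prime)
    upper = inverse_fisher(r_prime + z_crit * s_r_prime)
    return lower, upper

lo, hi = r_confidence_interval(0.80, 13)
print(f"95% CI for r = 0.80, N = 13: [{lo:.3f}, {hi:.3f}]")
```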
End of Lecture 8
Nov 5 2008
Testing Difference of Pearson Correlations from 2 Independent Samples
• Converting the skewed r distribution to an (approximately) normal distribution allows straightforward two-sample testing:
$$z = \frac{r'_1 - r'_2}{s_{r'_1 - r'_2}}, \quad \text{where } s_{r'_1 - r'_2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}$$
(Remember that variances add.)
Example
[Figure, two panels, Final Exam Grade (%) vs. Midterm Exam Grade (%). Left: COSC 3101 2003-04 Section N, r = 0.303. Right: COSC 3101 2003-04 Section M, r = 0.4486. Sample sizes shown: N = 43 and N = 44.]
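Applying the two-sample test to the correlations in this example gives the sketch below. It assumes N = 43 belongs to Section N and N = 44 to Section M, as the slide layout suggests; because the standard error sums 1/(N₁ - 3) and 1/(N₂ - 3), swapping the two sample sizes would not change z. The two-tailed p-value uses the normal approximation.

```python
import math
from statistics import NormalDist

def compare_independent_rs(r1, n1, r2, n2):
    """z test for the difference between two independent Pearson correlations."""
    z1 = 0.5 * math.log((1 + r1) / (1 - r1))   # Fisher r'
    z2 = 0.5 * math.log((1 + r2) / (1 - r2))
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))       # two-tailed
    return z, p

z, p = compare_independent_rs(0.303, 43, 0.4486, 44)
print(f"z = {z:.2f}, two-tailed p = {p:.2f}")
```

Here |z| falls well short of 1.96, so these data give no evidence that the two sections' correlations differ.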