Upload
jayme
View
33
Download
0
Embed Size (px)
DESCRIPTION
Correlation and Regression. Dharshan Kumaran Hanneke den Ouden. Aims. Is there a relationship between x and y? What is the strength of this relationship? Pearson’s r Can we describe this relationship and use this to predict y from x? y=ax+b - PowerPoint PPT Presentation
Citation preview
Correlation and Regression
Dharshan Kumaran
Hanneke den Ouden
Aims
Is there a relationship between x and y? What is the strength of this relationship?
– Pearson’s r Can we describe this relationship and use this to predict y
from x? – y=ax+b
Is the relationship we have described statistically significant?
– t test Relevance to SPM
– GLM
Relation between x and y
Correlation: is there a relationship between 2 variables?
Regression: how well a certain independent variable predict dependent variable?
CORRELATION CAUSATION– In order to infer causality: manipulate independent
variable and observe effect on dependent variable
Observation ‘clouds’
Y
X
Y
X
Y
X
YY Y
Positive correlation Negative correlation No correlation
Variance vs Covariance
Do two variables change together?
n
yyxxyx
i
n
ii ))((
),cov( 1
Covariance ~
X * Y
n
xxS
n
ii
x
2
12
)(
Variance ~
X * X
Covariance
When X and Y : cov (x,y) = pos. When X and Y : cov (x,y) = neg. When no constant relationship: cov (x,y) = 0
n
yyxxyx
i
n
ii ))((
),cov( 1
Example Covariance
0
1
2
3
4
5
6
7
0 1 2 3 4 5 6 7
x y xxi yyi ( xix )( yiy
)
0 3 -3 0 0 2 2 -1 -1 1 3 4 0 1 0 4 0 1 -3 -3 6 6 3 3 9
3x 3y 7
4.15
7)))((
),cov( 1
n
yyxxyx
i
n
ii What does this
number tell us?
Pearson’s R
Covariance does not really tell us anything– Solution: standardise this measure
Pearson’s R: standardise by adding std to equation:
),cov( yx
yxxy ss
yxr
),cov(
Pearson’s R
n
yyxxyx
i
n
ii ))((
),cov( 1
yx
i
n
ii
xy sns
yyxxr
))((1
11 r
n
ZZr
n
iyx
xy
ii 1
*
Limitations of r
When r = 1 or r = -1:– We can predict y from x with certainty– all data points are on a straight line: y = ax + b
r is actually – r = true r of whole population– = estimate of r based on data
r is very sensitive to extreme values:
n
yyxxyx
i
n
ii ))((
),cov( 1
0
1
2
3
4
5
0 1 2 3 4 5 6
r
r
In the real world…
r is never 1 or –1 find best fit of a line in a cloud of observations:
Principle of least squares
ε
y = ax + bε = residual error
= , true value= , predicted value
iyy
min)ˆ(
1
2
n
yyn
ii
The relationship between x and y (1) : Finding a and b
Population:
Model:
Solution least squares minimisation:
iii baxy bxay iiˆˆˆ
x
y
s
sra
ˆˆ x
s
sryb
x
yˆˆ
The relationship between x and y (2)
xs
sryx
s
srbxay
x
yi
x
yii
ˆˆˆˆˆ
yxxs
sry i
x
yi )(
ˆˆ
What can the model explain?
y
yS2
S2y
S2(yi - i)y
Total variance = predicted variance + error variance
)ˆ(2ˆ
2
ii yyyy sss 2
22
2
2ˆ ˆ...
)ˆ(y
iy sr
n
yys
)ˆ(2ˆ
2
ii yyyy sss
2
2ˆ2222
ˆ ˆˆy
yyy s
srsrs
predicted
totalExplained variance =
predicted variance:
2
)ˆ(2ˆ
2
ii yyyy sss
Substitute this into equation above
Error variance:
222ˆ ˆ yy srs
22222 )ˆ1(ˆ yyy srsrs
)ˆ( ii yys2ys 22ˆ ysr
22 )ˆ1( ysr
2
Is the model significant?
We’ve determined the form of the relationship (y = ax + b) and it’s strength (r).
Does a prediction based on this model do a better job that just predicting the mean?
Analogy with ANOVA
Total variance = predicted variance + error variance
In a one-way ANOVA, we have
SSTotal = SSBetween + SSWithin
22222 )ˆ1(ˆ yyy srsrs
F statistic (for our model)
),mod(
Eff
MSErr
MSF dferroreldf
MSEff=SSbg/dF
MSErr=SSwg/dF
22ˆ ysr /1
22 )ˆ1( ysr / (N-2)
F and t statistic
2
2),mod(
ˆ1
)2(ˆ
r
NrF dferroreldf
Alternatively (as F is the square of t):
2)2(ˆ1
)2(ˆ
r
Nrt N
So all we need toknow is N and r!!
Basic assumptions
Linear relationship Homogeneity of variance (Y) ~ N(0,2) No errors in measurement of X Independent sampling
SPM- GLM
Y1 = x11 1 + x12 +…+ x1n n + 1
Y2 = x21 1 + x22 +…+ x2n n + 2
:
Ym = xm1 1 + xm2 +…+ xmn n+ m
.
iii baxy Regression model
m
2
1
m1
21
11
m
2
1
xxx
yyy
n
12x22x 2nx
1nx
m2x mnx
Multiple regression model
In matrix notation
m
2
1
m1
21
11
m
2
1
x
x
x
y
y
y
n
12x
22x 2nx1nx
m2x mnx
SPM !!!!
Observed data = design matrix * parameters + residuals
XY
The End
Any questions?*
*See Will, Dan and Lucy