24
Correlation and Regression Dharshan Kumaran Hanneke den Ouden

Correlation and Regression

  • Upload
    jayme

  • View
    33

  • Download
    0

Embed Size (px)

DESCRIPTION

Correlation and Regression. Dharshan Kumaran Hanneke den Ouden. Aims. Is there a relationship between x and y? What is the strength of this relationship? Pearson’s r Can we describe this relationship and use this to predict y from x? y=ax+b - PowerPoint PPT Presentation

Citation preview

Page 1: Correlation and Regression

Correlation and Regression

Dharshan Kumaran

Hanneke den Ouden

Page 2: Correlation and Regression

Aims

Is there a relationship between x and y? What is the strength of this relationship?

– Pearson’s r Can we describe this relationship and use this to predict y

from x? – y=ax+b

Is the relationship we have described statistically significant?

– t test Relevance to SPM

– GLM

Page 3: Correlation and Regression

Relation between x and y

Correlation: is there a relationship between 2 variables?

Regression: how well a certain independent variable predict dependent variable?

CORRELATION CAUSATION– In order to infer causality: manipulate independent

variable and observe effect on dependent variable

Page 4: Correlation and Regression

Observation ‘clouds’

Y

X

Y

X

Y

X

YY Y

Positive correlation Negative correlation No correlation

Page 5: Correlation and Regression

Variance vs Covariance

Do two variables change together?

n

yyxxyx

i

n

ii ))((

),cov( 1

Covariance ~

X * Y

n

xxS

n

ii

x

2

12

)(

Variance ~

X * X

Page 6: Correlation and Regression

Covariance

When X and Y : cov (x,y) = pos. When X and Y : cov (x,y) = neg. When no constant relationship: cov (x,y) = 0

n

yyxxyx

i

n

ii ))((

),cov( 1

Page 7: Correlation and Regression

Example Covariance

0

1

2

3

4

5

6

7

0 1 2 3 4 5 6 7

x y xxi yyi ( xix )( yiy

)

0 3 -3 0 0 2 2 -1 -1 1 3 4 0 1 0 4 0 1 -3 -3 6 6 3 3 9

3x 3y 7

4.15

7)))((

),cov( 1

n

yyxxyx

i

n

ii What does this

number tell us?

Page 8: Correlation and Regression

Pearson’s R

Covariance does not really tell us anything– Solution: standardise this measure

Pearson’s R: standardise by adding std to equation:

),cov( yx

yxxy ss

yxr

),cov(

Page 9: Correlation and Regression

Pearson’s R

n

yyxxyx

i

n

ii ))((

),cov( 1

yx

i

n

ii

xy sns

yyxxr

))((1

11 r

n

ZZr

n

iyx

xy

ii 1

*

Page 10: Correlation and Regression

Limitations of r

When r = 1 or r = -1:– We can predict y from x with certainty– all data points are on a straight line: y = ax + b

r is actually – r = true r of whole population– = estimate of r based on data

r is very sensitive to extreme values:

n

yyxxyx

i

n

ii ))((

),cov( 1

0

1

2

3

4

5

0 1 2 3 4 5 6

r

r

Page 11: Correlation and Regression

In the real world…

r is never 1 or –1 find best fit of a line in a cloud of observations:

Principle of least squares

ε

y = ax + bε = residual error

= , true value= , predicted value

iyy

min)ˆ(

1

2

n

yyn

ii

Page 12: Correlation and Regression

The relationship between x and y (1) : Finding a and b

Population:

Model:

Solution least squares minimisation:

iii baxy bxay iiˆˆˆ

x

y

s

sra

ˆˆ x

s

sryb

x

yˆˆ

Page 13: Correlation and Regression

The relationship between x and y (2)

xs

sryx

s

srbxay

x

yi

x

yii

ˆˆˆˆˆ

yxxs

sry i

x

yi )(

ˆˆ

Page 14: Correlation and Regression

What can the model explain?

y

yS2

S2y

S2(yi - i)y

Total variance = predicted variance + error variance

)ˆ(2ˆ

2

ii yyyy sss 2

Page 15: Correlation and Regression

22

2

2ˆ ˆ...

)ˆ(y

iy sr

n

yys

)ˆ(2ˆ

2

ii yyyy sss

2

2ˆ2222

ˆ ˆˆy

yyy s

srsrs

predicted

totalExplained variance =

predicted variance:

2

Page 16: Correlation and Regression

)ˆ(2ˆ

2

ii yyyy sss

Substitute this into equation above

Error variance:

222ˆ ˆ yy srs

22222 )ˆ1(ˆ yyy srsrs

)ˆ( ii yys2ys 22ˆ ysr

22 )ˆ1( ysr

2

Page 17: Correlation and Regression

Is the model significant?

We’ve determined the form of the relationship (y = ax + b) and it’s strength (r).

Does a prediction based on this model do a better job that just predicting the mean?

Page 18: Correlation and Regression

Analogy with ANOVA

Total variance = predicted variance + error variance

In a one-way ANOVA, we have

SSTotal = SSBetween + SSWithin

22222 )ˆ1(ˆ yyy srsrs

Page 19: Correlation and Regression

F statistic (for our model)

),mod(

Eff

MSErr

MSF dferroreldf

MSEff=SSbg/dF

MSErr=SSwg/dF

22ˆ ysr /1

22 )ˆ1( ysr / (N-2)

Page 20: Correlation and Regression

F and t statistic

2

2),mod(

ˆ1

)2(ˆ

r

NrF dferroreldf

Alternatively (as F is the square of t):

2)2(ˆ1

)2(ˆ

r

Nrt N

So all we need toknow is N and r!!

Page 21: Correlation and Regression

Basic assumptions

Linear relationship Homogeneity of variance (Y) ~ N(0,2) No errors in measurement of X Independent sampling

Page 22: Correlation and Regression

SPM- GLM

Y1 = x11 1 + x12 +…+ x1n n + 1

Y2 = x21 1 + x22 +…+ x2n n + 2

:

Ym = xm1 1 + xm2 +…+ xmn n+ m

.

iii baxy Regression model

m

2

1

m1

21

11

m

2

1

xxx

yyy

n

12x22x 2nx

1nx

m2x mnx

Multiple regression model

In matrix notation

Page 23: Correlation and Regression

m

2

1

m1

21

11

m

2

1

x

x

x

y

y

y

n

12x

22x 2nx1nx

m2x mnx

SPM !!!!

Observed data = design matrix * parameters + residuals

XY

Page 24: Correlation and Regression

The End

Any questions?*

*See Will, Dan and Lucy