Amsterdam Rehabilitation Research Center | Reade

Correlation and linear regression analysis

Martin van der Esch, PhD



Content

Correlation and linear regression analysis

Association research; however, also used in experimental studies


Correlation and regression

- Interested in relationship/association/correlation
- Direction and magnitude of the relationship
- Dependent or independent variables
- Association does not imply a ‘cause and effect’ relationship


Correlation


Correlation

Expressed as the Pearson product-moment correlation coefficient (r) when the data are not skewed, or as the Spearman rank-order correlation coefficient (rs) when the data are ordinal or skewed, or when outliers are present.

- Dimensionless
- Range between –1 and +1 (0 = no correlation)
- Magnitude indicates how close the points are to a straight line (the strength of the association)
- +1 or –1: perfect correlation; all points lie on the line
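A minimal Python sketch (standard library only) of why Spearman is preferred when outliers are present. Spearman's rs is computed here as Pearson's r applied to the ranks, with tied values given their average rank. The helper names (`pearson_r`, `ranks`, `spearman_rs`) and the data are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values 1..n; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rs(x, y):
    """Spearman rank-order correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y rises steadily with x, except for one extreme outlier
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 5, 6, 8, 9, 11, 60]
print(f"Pearson r   = {pearson_r(x, y):.3f}")
print(f"Spearman rs = {spearman_rs(x, y):.3f}")
```

Because y never decreases as x increases, the ranks agree perfectly and rs = 1, while the single outlier pulls Pearson's r down to about 0.70.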



The correlation coefficient ranges between –1 and +1.


[Figure: scatter plot of systolic blood pressure (Syst. bloeddruk, 100 to 220) against age (Leeftijd, 10 to 70) with a linear fit; R² Linear = 0.432]


[Figure: scatter plot of systolic blood pressure (Syst. bloeddruk, 100 to 220) against age (Leeftijd, 10 to 70) with a linear fit; R² Linear = 0.712]


Correlation coefficient

Range: -1 ≤ r ≤ 1.

In SPSS: Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .844(a) | .712 | .702 | 9.563

a. Predictors: (Constant), Leeftijd (age)

Coefficients(a)

Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig.
(Constant) | 97.077 | 5.528 | | 17.562 | .000
Leeftijd (age) | .949 | .116 | .844 | 8.174 | .000

a. Dependent Variable: Syst. bloeddruk (systolic blood pressure)
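A quick consistency check on this output, as a short Python sketch (variable names are illustrative). In simple linear regression R is the absolute value of r, so r = √(R Square) here (positive because B is positive), and the standardized Beta equals r. Each t value is B divided by its standard error; small differences arise because SPSS works with unrounded values.

```python
import math

# Values copied from the SPSS output (age -> systolic blood pressure)
r_square = 0.712
beta_std = 0.844            # standardized coefficient for Leeftijd
b_age, se_age = 0.949, 0.116
b_const, se_const = 97.077, 5.528

r = math.sqrt(r_square)     # R = |r|; r is positive because B is positive
t_age = b_age / se_age      # t = B / SE(B)
t_const = b_const / se_const

print(f"r from R Square: {r:.3f} (Beta = {beta_std})")
print(f"t for Leeftijd:  {t_age:.2f} (SPSS: 8.174)")
print(f"t for constant:  {t_const:.2f} (SPSS: 17.562)")
```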


Formula correlation

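$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where x̄ and ȳ are the means of x and y, and n is the number of observations.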
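The formula translates term by term into code. A minimal Python sketch (standard library only); the function name `correlation` and the five data points are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson r, following the formula term by term."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means
    s_xy = sum((x[i] - x_bar) * (y[i] - y_bar) for i in range(n))
    # Denominator: square root of the product of the two sums of squares
    s_xx = sum((x[i] - x_bar) ** 2 for i in range(n))
    s_yy = sum((y[i] - y_bar) ** 2 for i in range(n))
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(correlation(x, y))
```

For these values the cross-product sum is 6 and the two sums of squares are 10 and 6, so r = 6/√60 ≈ 0.775.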


Regression analysis


Statistical analysis

Data were analyzed with SPSS for Windows 16.0 (SPSS Inc). According to their distribution, the various parameters are expressed as mean (± standard deviation) or median (interquartile range). Data with a non-Gaussian distribution were log transformed for analysis if possible. To compare the groups, Student’s t-test or Mann-Whitney U test was used when appropriate. Furthermore, correlations between variables were analyzed by using Pearson correlation or Spearman’s rho tests. Univariate linear regression analyses were performed on log-transformed data to investigate the influence of possible confounders (i.e. sex, smoking status, systolic blood pressure and body mass index (BMI)) on the results. Wilcoxon signed-rank test was used to investigate the differences in values at baseline and at 8 weeks in the prospectively followed subgroup of patients (n=9). P-values less than 0.05 were considered statistically significant.

I C van Eijk, M E Tushuizen, A Sturk, B A C Dijkmans, M Boers, A E Voskuyl, M Diamant, G J Wolbink, R Nieuwland and M T Nurmohamed. Circulating microparticles remain associated with complement activation despite intensive anti-inflammatory therapy in early rheumatoid arthritis. Ann Rheum Dis, published online 16 Nov 2009.


Typical association question

Research question: is there an association between age and pain in patients with …?

Hypothesis: pain increases in older patients

Y = a + bX + e


[Figure: scatter plot of pain against age]


Simple (uni) linear regression analysis

Difference from correlation analysis: regression gives a prediction line that best describes the scatter plot
- The best-fitting line is difficult to draw by hand
- Solution: describe the line with a mathematical equation


Simple (uni) linear regression analysis

We use the ‘Method of Least Squares’ to fit the best line

The best line minimizes the sum of the squared vertical distances (residuals) between the data points and the fitted line
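For one predictor, the least-squares line has a closed-form solution: b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and a = ȳ − b·x̄. A minimal Python sketch, with hypothetical function names and data, that fits the line and confirms that a different slope gives a larger sum of squared residuals.

```python
def least_squares(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances between the points and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
best = sum_squared_residuals(x, y, a, b)

# A slightly different slope, with the best intercept for that slope, fits worse
b_alt = b - 0.1
a_alt = sum(y) / len(y) - b_alt * sum(x) / len(x)
worse = sum_squared_residuals(x, y, a_alt, b_alt)
print(f"a = {a:.2f}, b = {b:.2f}, SSR = {best:.2f}; with b - 0.1: SSR = {worse:.2f}")
```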


[Figure: scatter plot of pain against age]


Simple regression analysis

Pain = β0 + β1 * age

What is β0? β0 = (predicted) pain at age 0

What is β1? β1 = b (the unstandardized regression coefficient, B in SPSS)

β1 = difference in pain between age 0 and age 1 = difference between age 1 and age 2 = … = difference between age 30 and age 31


Mathematical equation to describe the relationship

y = a + b*x

y is called the dependent (outcome) variable

x is called the independent (predictor, explanatory) variable

a is the intercept: value of y when x=0

b (unstandardized beta) is the ‘slope’: it represents the amount by which y increases on average when x increases by one unit

a and b are called regression coefficients


Simple regression analysis

The regression coefficient equals the difference in the outcome variable when the determinant changes by one unit


[Figure: regression line of pain against age, with a one-unit change marked]


Simple regression analysis

pain = –20 + 0.5 * age

What is –20? What is 0.5?
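Reading the equation: –20 is the intercept (the predicted pain at age 0, usually an extrapolation outside the observed age range) and 0.5 is the slope (pain is on average 0.5 points higher per additional year of age). A minimal Python sketch of using the equation for prediction; the function name is hypothetical.

```python
def predicted_pain(age, a=-20.0, b=0.5):
    """Predicted pain from the example equation pain = a + b * age."""
    return a + b * age


for age in (40, 60, 61, 70):
    print(age, predicted_pain(age))

# One year older: 0.5 points more pain; ten years older: 5 points more
print(predicted_pain(61) - predicted_pain(60))
print(predicted_pain(70) - predicted_pain(60))
```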


Simple regression analysis can also be used to analyse the difference between two groups.


Back to the two groups and the analysis of pain

Pain, mean (SD)
Group | Pre | After
medication | 75.8 (6.8) | 65.8 (10.1)
placebo | 75.4 (7.1) | 68.2 (9.0)


Group Statistics (VERSCHIL = difference in pain, after – pre)

groep (group) | N | Mean | Std. Deviation
placebo | 100 | -7.2000 | 3.75513
nieuwe medicatie (new medication) | 100 | -10.0000 | 6.81650

Independent Samples Test: t-test for Equality of Means

Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference, Lower | Upper
VERSCHIL | 3.598 | 198 | .000 | 2.8000 | 1.26530 | 4.33470
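The t statistic can be reproduced from the group statistics alone. The integer df = 198 (= 100 + 100 – 2) indicates the equal-variances (Student) version of the test, so this minimal Python sketch uses the pooled variance. Variable names are illustrative.

```python
import math

# Group statistics for VERSCHIL (after - pre) from the SPSS output
n1, mean1, sd1 = 100, -7.2, 3.75513     # placebo
n2, mean2, sd2 = 100, -10.0, 6.81650    # new medication

# Pooled variance, assuming equal population variances (Student's t-test)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

mean_diff = mean1 - mean2
t = mean_diff / se_diff
print(f"mean difference = {mean_diff:.4f}, SE = {se_diff:.3f}, t = {t:.3f}, df = {df}")
```

The standard error, about 0.778, is the same value that appears for the group coefficient when this comparison is run as a regression.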


Now analysed by simple regression analysis

[Figure: pain (continuous outcome) by group, placebo vs. medication]


[Figure: pain (continuous outcome) by group, placebo vs. medication]


Simple regression analysis

The regression coefficient equals the difference in means between two comparable groups


Simple (uni) linear regression analysis

Pain = β0 + β1 * group

placebo = 0; medication = 1

β0 = mean in the control (placebo) group

β1 = mean difference between medication and placebo (medication – placebo)


Coefficients(a)

Model 1 | B (unstandardized) | Std. Error | t | Sig.
(Constant), β0 | -7.200 | .550 | -13.084 | .000
groep (group), β1 | -2.800 | .778 | -3.598 | .000

a. Dependent Variable: VERSCHIL (difference in pain, after – pre)
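A minimal Python sketch of why this works: regressing the outcome on a 0/1 group indicator gives an intercept equal to the mean of the group coded 0 and a slope equal to the difference in group means. The change scores below are hypothetical, not the study data.

```python
def least_squares(x, y):
    """Intercept and slope of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Hypothetical change scores; group: placebo = 0, medication = 1
group = [0, 0, 0, 0, 1, 1, 1, 1]
change = [-6.0, -8.0, -7.0, -9.0, -12.0, -9.0, -11.0, -10.0]

b0, b1 = least_squares(group, change)
mean_placebo = sum(c for g, c in zip(group, change) if g == 0) / group.count(0)
mean_medication = sum(c for g, c in zip(group, change) if g == 1) / group.count(1)
print(f"b0 = {b0:.2f} (placebo mean {mean_placebo:.2f})")
print(f"b1 = {b1:.2f} (difference {mean_medication - mean_placebo:.2f})")
```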


Hypothesis test for β

$$ t = \frac{\beta}{SE(\beta)} $$

with N – 2 degrees of freedom


P value?

$$ t = \frac{-2.800}{0.778} \approx -3.598 $$
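The p value is the two-sided tail area of a t distribution with N – 2 degrees of freedom beyond |t|. Here the two groups have 100 patients each, so N = 200 and df = 198. Python's standard library has no t distribution, so this minimal sketch integrates the t density numerically with Simpson's rule; the function names are hypothetical, and in practice SPSS or a package such as SciPy (`scipy.stats.t.sf`) does this.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution (log-gamma avoids overflow)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """2 * P(T > |t|), using Simpson's rule on the density from 0 to |t|."""
    lo, hi = 0.0, abs(t)
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    total = t_pdf(lo, df) + t_pdf(hi, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(lo + i * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)  # the density is symmetric around 0


t = -2.800 / 0.778
p = two_sided_p(t, df=198)
print(f"t = {t:.3f}, two-sided p = {p:.5f}")
```

The result is well below 0.001, which SPSS displays as Sig. = .000.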


Back to the example
- Experimental design
- Including another medicine
- Three comparable groups


Pain, mean (SD)

Group | T0 | T1
medication1 | 75.8 (6.8) | 65.8 (10.1)
medication2 | 76.8 (7.5) | 61.9 (11.7)
placebo | 75.4 (7.1) | 68.2 (9.0)


[Figure: continuous outcome by group: placebo, medication1, medication2]


Coefficients(a)

Model 1 | B (unstandardized) | Std. Error | t | Sig.
(Constant) | -6.850 | .586 | -11.681 | .000
groep (group) | -3.850 | .454 | -8.475 | .000

a. Dependent Variable: VERSCHIL (difference in pain, T1 – T0)

Group analysed as a continuous variable


But group is not a continuous variable; it is a categorical variable. Therefore it needs to be analysed using dummy variables.
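Why a single slope for group is a problem: it forces the predicted change to be the same from each group to the next. A minimal Python sketch, assuming `groep` was coded 0 = placebo, 1 = medication1 (new), 2 = medication2 (alternative) and that the three groups are equally sized. Both are assumptions not stated on the slides, but with the mean changes (T1 – T0) from the pain table for the three groups they reproduce the SPSS coefficients –6.850 and –3.850.

```python
# Mean change in pain (T1 - T0) per group, from the pain table for three groups
codes = [0, 1, 2]                                   # assumed coding of groep
means = [68.2 - 75.4, 65.8 - 75.8, 61.9 - 76.8]     # placebo, medication1, medication2

# With equal group sizes, the least-squares line through the individual data
# has the same slope and intercept as the line through the group means
x_bar = sum(codes) / 3
m_bar = sum(means) / 3
slope = (sum((x - x_bar) * (m - m_bar) for x, m in zip(codes, means))
         / sum((x - x_bar) ** 2 for x in codes))
intercept = m_bar - slope * x_bar
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")

# The straight line forces equal steps between consecutive groups
for x, m in zip(codes, means):
    print(f"group {x}: observed mean {m:.2f}, predicted {intercept + slope * x:.2f}")
```

The predicted means (–6.85, –10.70, –14.55) differ from the observed ones (–7.2, –10.0, –14.9) because the coding imposes equal spacing that the data do not have.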


Dummy variables

Categorical Variables Codings

GROEP (group) | Parameter coding (1) | (2)
placebo | .000 | .000
nieuwe medicatie (new medication) | 1.000 | .000
alternatieve medicatie (alternative medication) | .000 | 1.000

Dummy 1: new medication vs. placebo

Dummy 2: alternative medication vs. placebo

Placebo: control (reference) group
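The coding table can be produced mechanically: one 0/1 variable per non-reference category, with placebo as the reference (all zeros), so three categories need two dummies. A minimal Python sketch; the function name and the example records are illustrative.

```python
def dummy_code(values, categories, reference):
    """Return one 0/1 column per non-reference category."""
    levels = [c for c in categories if c != reference]
    return {level: [1 if v == level else 0 for v in values] for level in levels}


groups = ["placebo", "new medication", "alternative medication",
          "placebo", "new medication"]
dummies = dummy_code(groups,
                     categories=["placebo", "new medication", "alternative medication"],
                     reference="placebo")
for level, column in dummies.items():
    print(level, column)
```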


Simple regression analysis

Pain = β0 + β1 * medicationgroup1 + β2 * medicationgroup2

What is β0? β0 = mean of the placebo group

What is β1? β1 = difference between medication1 and placebo (medication1 – placebo)

What is β2? β2 = difference between medication2 and placebo (medication2 – placebo)


[Figure: continuous outcome by group: placebo, medication1, medication2]


Coefficients(a)

Model 1 | B (unstandardized) | Std. Error | t | Sig.
(Constant) | -7.200 | .642 | -11.222 | .000
DUMMIE1 | -2.800 | .907 | -3.086 | .002
DUMMIE2 | -7.700 | .907 | -8.487 | .000

a. Dependent Variable: VERSCHIL (difference in pain, T1 – T0)

Pain = β0 + β1 * medicationgroup1 + β2 * medicationgroup2
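Plugging the dummy codes into the fitted equation gives back each group's mean change, which matches T1 – T0 in the pain table for the three groups. A short Python check using the coefficients from this output; the function name is illustrative.

```python
# Coefficients from the SPSS output (dependent variable VERSCHIL)
b0, b1, b2 = -7.200, -2.800, -7.700


def predicted_change(dummie1, dummie2):
    """Fitted mean change for a given combination of dummy codes."""
    return b0 + b1 * dummie1 + b2 * dummie2


print(f"placebo:     {predicted_change(0, 0):.1f}")
print(f"medication1: {predicted_change(1, 0):.1f}")
print(f"medication2: {predicted_change(0, 1):.1f}")
```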


Intermezzo

A little exercise

Is there a relationship between your height (cm) and your shoe size (European size)?

• Estimate the correlation coefficient
• What does it mean?
• Estimate the formula: Height = ? + ? * shoe size
• Group: men/women
• Groups: occupational therapy, physiotherapy, other


Assumptions of linear regression analysis

- Linear relationship between x and y
  • Check: scatter diagram (otherwise: logarithmic transformation, next week)
- For each value of x, there is a distribution of values of y in the population; this distribution is Normal
  • Check: analysis of the residuals
- The variability of the distribution of y values in the population is the same for all values of x, i.e. the variance is constant (s² / SD)
  • Check: analysis of the residuals
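Both residual-based checks start from the residuals eᵢ = yᵢ – (a + b·xᵢ). A minimal Python sketch that computes them and makes a rough comparison of their spread in the lower and upper half of x. The data and the half-split comparison are illustrative only; in practice one inspects a plot of residuals against x (or fitted values) and a histogram or normal probability plot of the residuals.

```python
import statistics


def residuals(x, y):
    """Residuals of the least-squares line y = a + b*x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data, x sorted ascending
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.2, 8.8, 11.3, 12.6, 15.2, 16.9]
e = residuals(x, y)

# Residuals of a least-squares line with an intercept always average zero
print(f"mean residual = {statistics.mean(e):.2e}")

# Rough look at constant variance: residual SD in the two halves of x
half = len(e) // 2
print(f"SD lower half = {statistics.stdev(e[:half]):.2f}, "
      f"SD upper half = {statistics.stdev(e[half:]):.2f}")
```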


Checking for linearity

Scatterplot

Adding a quadratic term

Splitting exposure variable into groups (4-5)


Adding a quadratic term

[Figure: pain against age]


Checking for linearity

Splitting exposure variable into groups


Splitting exposure variable into groups

[Figure: pain against age, with age split into groups 1 to 4]


Example in SPSS

Examine the association between age and pain score at baseline.

- Scatterplot
- Linear regression analysis
- Checking for linearity
  • Adding a quadratic term
  • Splitting exposure variable into groups


Scatter plot

[Figure: scatter plot of pain (65 to 80) against age (40 to 90)]


Linear regression analysis

Pain (at baseline) = 56.2 + 0.23 * age

Coefficients(a)

Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig.
(Constant) | 56.239 | 2.131 | | 26.394 | .000
age | .234 | .033 | .523 | 7.005 | .000

a. Dependent Variable: pain


Adding a quadratic term

Coefficients(a)

Model | Term | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig.
1 | (Constant) | 56.239 | 2.131 | | 26.394 | .000
1 | age | .234 | .033 | .523 | 7.005 | .000
2 | (Constant) | 49.128 | 13.116 | | 3.746 | .000
2 | age | .456 | .405 | 1.020 | 1.124 | .263
2 | age2 | -.002 | .003 | -.499 | -.549 | .584

a. Dependent Variable: pain
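Adding a quadratic term means fitting pain = b0 + b1·age + b2·age² and looking at b2. In Model 2 above the age2 coefficient has p = .584, so by the usual 0.05 criterion there is no evidence that a curved relation fits better than a straight line. A minimal Python sketch of the least-squares fit with a squared term, solving the 3 × 3 normal equations by Gaussian elimination. The function names and data are hypothetical; the data lie exactly on a known parabola so the result can be checked.

```python
def solve(A, v):
    """Solve A w = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_quadratic(x, y):
    """Least-squares b0, b1, b2 for y = b0 + b1*x + b2*x**2."""
    X = [[1.0, xi, xi ** 2] for xi in x]                 # design matrix
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]
    return solve(XtX, Xty)


x = [0, 1, 2, 3, 4, 5]
y = [1 + 2 * xi + 0.5 * xi ** 2 for xi in x]            # exact parabola
b0, b1, b2 = fit_quadratic(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```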


Splitting exposure variable into groups

- Create a categorical age variable
- Recode it into dummy variables
- Perform linear regression analysis with the dummies
- Are the B’s increasing in a linear order, with comparable distances between the dummies?
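The first two steps can be sketched in Python with the standard library: `statistics.quantiles` gives cut points for four groups of roughly equal size, and each person gets a group number 1 to 4 plus three dummy variables with group 1 as the reference. Using quartiles, and the ages below, are assumptions for illustration; the slides do not say how the age groups were formed.

```python
import bisect
import statistics


def age_groups(ages, n_groups=4):
    """Assign each age to group 1..n_groups using quantile cut points.

    An age exactly equal to a cut point goes to the lower group.
    """
    cuts = statistics.quantiles(ages, n=n_groups)
    return [bisect.bisect_left(cuts, age) + 1 for age in ages]


def group_dummies(groups, n_groups=4):
    """One 0/1 column per group 2..n_groups; group 1 is the reference."""
    return [[1 if g == level else 0 for g in groups] for level in range(2, n_groups + 1)]


ages = [42, 47, 55, 58, 63, 69, 74, 81]   # hypothetical ages
groups = age_groups(ages)
print("groups:", groups)
for number, column in enumerate(group_dummies(groups), start=1):
    print(f"dummy{number}:", column)
```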


Splitting exposure variable into groups

Coefficients(a)

Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig.
(Constant) | 68.382 | .535 | | 127.936 | .000
dummy1 | 1.924 | .774 | .223 | 2.486 | .014
dummy2 | 3.346 | .750 | .404 | 4.459 | .000
dummy3 | 5.446 | .768 | .638 | 7.094 | .000

a. Dependent Variable: pain
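The last step in the list above asks whether the B’s increase in a linear order with comparable distances. A short Python calculation of the steps between consecutive age groups, using the B values from this output (the reference group has B = 0 by definition).

```python
b = [0.0, 1.924, 3.346, 5.446]   # reference group, dummy1, dummy2, dummy3
steps = [round(b[k + 1] - b[k], 3) for k in range(len(b) - 1)]
print(steps)
```

All steps are positive and of broadly similar size (about 1.4 to 2.1 points); whether that counts as "comparable" remains a judgment call.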


Questions?
