CORRELATION AND REGRESSION

Vanja Radišić Biljak

Department of medical laboratory diagnostics, University Hospital „Sveti Duh”, Zagreb, Croatia

MedCalc

Example 1

Is there an association between White Blood Cell (WBC) count and concentration of C-reactive protein (CRP) in a group of University Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute appendicitis?

What is the question about the data?

• Are these groups different? → Tests for statistical differences
• Are these groups associated? → Correlation
• Can I predict one variable by knowing the other? → Regression

Correlation

• Statistical procedure applied to investigate the association between two variables

• Numerically expressed as coefficient of correlation (r)

• Level of significance (P)
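As a minimal sketch of how r is obtained, the Pearson coefficient can be computed directly from its definition (covariance of x and y divided by the product of their spreads). The function name `pearson_r` and the small data sets are illustrative assumptions, not values from the appendicitis data.

```python
import math

def pearson_r(x, y):
    """Pearson coefficient of correlation for two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y_up rises with x, y_down falls with x
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]
y_down = [6, 4, 5, 4, 2]

r_up = pearson_r(x, y_up)      # positive r
r_down = pearson_r(x, y_down)  # negative r of the same size
print(round(r_up, 3), round(r_down, 3))
```

The level of significance (P) is not computed here; statistical software such as MedCalc reports it together with r.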

Positive correlation

• Increase of x → Increase of y

• Decrease of x → Decrease of y

0 < r ≤ 1

Negative correlation

• Increase of x → Decrease of y

• Decrease of x → Increase of y

-1 ≤ r < 0

No correlation

• Increase of x → ? of y

• Decrease of x → ? of y

Association but no correlation

• Pearson correlation describes only LINEAR association; a clear non-linear (curved) association can still give r close to 0
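A short illustration of association without linear correlation, as a sketch with made-up numbers: y is fully determined by x (y = x²), yet the Pearson coefficient is 0 because the relationship is not linear. The helper `pearson_r` is an illustrative name.

```python
import math

def pearson_r(x, y):
    """Pearson coefficient of correlation computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect U-shaped (quadratic) relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r_curved = pearson_r(x, y)
print(r_curved)  # the rising and falling halves cancel out
```

Plotting the data before calculating r is the simplest way to notice such a pattern.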

Types of correlation

• Pearson (rp) and Spearman correlation (rs)

1. Sample size: < 30 → Spearman correlation
2. Type of data: at least one ordinal data → Spearman correlation
3. Normality: both variables do not follow normal distribution → Spearman correlation

Otherwise → Pearson correlation
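The three criteria above can be sketched as a small decision function. The function and parameter names are assumptions made for illustration; in particular, `normality_rejected` stands for the outcome of the normality check in criterion 3, which has to be done separately (for example in MedCalc).

```python
def choose_correlation(sample_size, any_ordinal, normality_rejected):
    """Return which coefficient the three slide criteria point to.

    sample_size        -- number of paired observations
    any_ordinal        -- True if at least one variable is ordinal
    normality_rejected -- True if the normality check was rejected
    """
    if sample_size < 30:
        return "Spearman"
    if any_ordinal:
        return "Spearman"
    if normality_rejected:
        return "Spearman"
    return "Pearson"

# Hypothetical inputs for illustration
large_numeric_nonnormal = choose_correlation(120, False, True)
large_numeric_normal = choose_correlation(120, False, False)
small_sample = choose_correlation(20, False, False)
print(large_numeric_nonnormal, large_numeric_normal, small_sample)
```

Any single criterion is enough to select Spearman; Pearson is chosen only when none of them applies.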

Example 1

• Pearson (rp) or Spearman correlation (rs)?

1. Sample size = 120, not < 30
2. Numerical data, no ordinal data
3. Reject normality: the variables do not follow normal distribution → Spearman correlation
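For readers who want to see what the Spearman coefficient does, a minimal sketch: it is the Pearson coefficient calculated on ranks instead of raw values, so it responds to any monotonic association. The helper names (`ranks`, `pearson_r`, `spearman_rs`) and the data are illustrative assumptions; tied values receive the average of their ranks.

```python
import math

def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman coefficient: Pearson coefficient of the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y always rises with x, but not linearly
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
rs = spearman_rs(x, y)
print(rs, ranks([10, 20, 20, 30]))
```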

MedCalc

Coefficient of correlation (r)

Coefficient of correlation (r)    Interpretation
0-0.24                            No association
0.25-0.49                         Poor association
0.50-0.74                         Moderate to good association
0.75-1.00                         Very good to excellent association

The ranges apply to the absolute value of r; the sign of r gives the direction (positive or negative).

r can be interpreted only if P < level of significance (0.05)
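The interpretation table can be written as a small lookup, sketched below. The function name, the use of the absolute value for negative coefficients, and the treatment of values that fall between the printed ranges (for example 0.245 is read as below 0.25) are assumptions; the r and P values in the usage lines are made up.

```python
def interpret_r(r, p, alpha=0.05):
    """Translate r and its P value into the wording of the table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if p >= alpha:
        # The table applies only to statistically significant coefficients
        return "not interpretable (P is not significant)"
    size = abs(r)
    if size < 0.25:
        return "no association"
    if size < 0.50:
        strength = "poor association"
    elif size < 0.75:
        strength = "moderate to good association"
    else:
        strength = "very good to excellent association"
    direction = "positive" if r > 0 else "negative"
    return f"{direction} {strength}"

# Hypothetical values for illustration
print(interpret_r(0.46, 0.001))
print(interpret_r(-0.80, 0.001))
print(interpret_r(0.60, 0.20))
```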

There is a statistically significant, positive, poor association between WBC and CRP in a group of University Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute appendicitis.

Coefficient of determination (D)

• D = r²
• Indicates how well data fit a statistical model
• r = 0.85; D = 0.7225
• About 72% of the variability in one variable is explained by the other
• r = 0.25; D = 0.0625
• Only 6.25% of the variability is explained
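The arithmetic behind the two cases above can be checked with a few lines. This is only a sketch; the function name is an assumption, and rounding is used because floating-point squaring is not exact.

```python
def coefficient_of_determination(r):
    """D is the square of the coefficient of correlation."""
    return r * r

for r in (0.85, 0.25):
    d = coefficient_of_determination(r)
    print(f"r = {r}; D = {round(d, 4)} ({round(d * 100, 2)}%)")
```

Because r is squared, a coefficient that looks moderate corresponds to a much smaller explained share.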

r² = 0.2162

About 22% of the variability is shared between WBC and CRP

Example 2

Is there an association between White Blood Cell (WBC) count and Mean Platelet Volume (MPV) in a group of University Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute appendicitis?

Example 2

• Pearson (rp) or Spearman correlation (rs)?

1. Sample size = 120, not < 30
2. Numerical data, no ordinal data
3. Accept normality → Pearson correlation

MedCalc

We cannot interpret the association between WBC and MPV in a group of University Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute appendicitis, because the P value is not significant.

Example 3

Is there an association between platelet (PLT) count and Mean Platelet Volume (MPV) in a group of University Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute appendicitis?

Example 3

• Pearson (rp) or Spearman correlation (rs)?

1. Sample size = 120, not < 30
2. Numerical data, no ordinal data
3. Accept normality → Pearson correlation

There is a statistically significant, negative, poor association between PLT and MPV in a group of University Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute appendicitis.

r² = 0.248

About 25% of the variability is shared between PLT and MPV

Example 4

Can we predict MPV values if we know PLT count in a group of University Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute appendicitis?

Regression

Linear
• Dependent variable is numeric
• Calculating value of dependent variable

Logistic
• Dependent variable is categorical (binomial)
• Calculating the odds of an event
• Presence/Existence of disease
• Cut-off value

Example 4

The dependent variable (MPV) is quantitative (numerical) data → linear regression

Linear regression

• Linear regression is meaningful ONLY if there is correlation between the variables

• Independent variable (x)

• Dependent variable (y)

• Dependent variable (y) is calculated from the independent variable (x) using a mathematical equation

Linear regression equation

y = a + bx

• a = intercept
• b = slope
• x = independent variable
• y = dependent variable

[Figure: regression line showing intercept and slope; source: www.mathisfun.com]
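A minimal sketch of how a and b in y = a + bx can be estimated by ordinary least squares. The function names and the data are illustrative assumptions (the points were chosen to lie exactly on y = 1 + 2x); this is not the output of the MedCalc analysis.

```python
def fit_line(x, y):
    """Least-squares estimates of intercept a and slope b for y = a + bx."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_new):
    """Calculate the dependent variable from the independent variable."""
    return intercept + slope * x_new

# Hypothetical data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
a, b = fit_line(x, y)
print(a, b, predict(a, b, 5))
```

With real measurements the points scatter around the line, which is why the confidence interval for slope and intercept and the coefficient of determination are reported alongside the equation.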

MedCalc

Example 4

Confidence limits: 95% confidence interval for slope and intercept

y = a + bx

Only about 25% of the variability in MPV is explained by the calculated equation

Residuals

• Difference between measured and calculated value: y – f(x)
• If equation describes the data well, residuals are low

[Figure: residual plot of y – f(x) against x, with points above (> 0) and below (< 0) the zero line]
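Residuals are simple to compute once an equation is available. The sketch below uses a made-up line (intercept 1, slope 2) and made-up measurements; the function name and all numbers are assumptions for illustration.

```python
def residuals(x, y, intercept, slope):
    """Measured value minus value calculated from y = intercept + slope * x."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical measurements scattered around y = 1 + 2x
x = [1, 2, 3, 4]
y = [3.5, 4.5, 7.5, 8.5]
res = residuals(x, y, intercept=1.0, slope=2.0)

# Positive residual: measured point lies above the line; negative: below it
sum_of_squares = sum(r ** 2 for r in res)
print(res, sum_of_squares)
```

A small sum of squared residuals relative to the spread of y indicates that the equation describes the data well.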

Example 4

The equation doesn’t describe the data well: the coefficient of determination is low and the residuals are high.

Example 5

• Can we predict CRP concentrations if we know WBC count and MPV values in a group of University Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute appendicitis?

Multiple regression

Example 5

There is no association between MPV and CRP in a group of University Hospital „Sveti Duh” Emergency Department patients with a suspicion of acute appendicitis.

MedCalc

Example 5

Independent variables

Dependent

variable

Example 6

We wanted to predict an OUTCOME: the dependent variable is categorical (binomial) → logistic regression

Logistic regression

• We can analyze more than two groups of data

• Dependent variable (Y) is categorical and binomial

• Independent variables (x1, x2, x3...) can be numerical or categorical
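To illustrate how a fitted logistic model turns independent variables into the odds of an event, here is a minimal sketch. The intercept, coefficients and input values are hypothetical placeholders, not estimates from the appendicitis data; fitting the model itself is left to statistical software.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Probability of the event from a logistic model.

    log-odds = intercept + b1*x1 + b2*x2 + ...
    probability = 1 / (1 + exp(-log-odds))
    """
    if len(coefficients) != len(values):
        raise ValueError("one value is needed per coefficient")
    log_odds = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds(probability):
    """Odds of the event: probability of yes divided by probability of no."""
    return probability / (1.0 - probability)

# Hypothetical model with two predictors (x1 numerical, x2 coded 0/1)
p = predicted_probability(intercept=-4.0, coefficients=[0.03, 1.2], values=[50, 1])
print(round(p, 3), round(odds(p), 3))
```

A categorical predictor enters the model as a 0/1 code, which is how numerical and categorical variables can be combined in one equation.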

Example 6

Dependent variable: OUTCOME (APPENDICITIS YES/NO)

Independent variables:

• CLINICAL (categorical): Appetite, Vomiting, Diarrhea, Dysuria, Rebound tenderness, Pain migration
• LABORATORY (numerical): CRP, WBC, RBC, RDW, PLT, MPV
• LABORATORY (categorical): Urine test strip

Stepwise analysis

1. Univariate analysis
• Analyze all variables separately to identify which are significantly associated with the outcome

2. Multivariate analysis
• Include significantly associated variables
• Include other variables for adjustment
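The selection step between the univariate and the multivariate analysis can be sketched as a filter on P values. Everything here is hypothetical: the variable names `var_a` to `var_d`, their P values, and the function name are placeholders, and the P values would in practice come from the univariate analyses.

```python
def select_for_multivariate(univariate_p, alpha=0.05, adjust_for=()):
    """Pick variables for the multivariate model.

    univariate_p -- mapping of variable name to its univariate P value
    adjust_for   -- extra variables included for adjustment
    """
    significant = [name for name, p in univariate_p.items() if p < alpha]
    extras = [name for name in adjust_for if name not in significant]
    return significant + extras

# Hypothetical univariate results (placeholder names and P values)
univariate_p = {"var_a": 0.001, "var_b": 0.003, "var_c": 0.40, "var_d": 0.20}
selected = select_for_multivariate(univariate_p, adjust_for=["var_d"])
print(selected)
```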

MedCalc

Example 6

Logistic regression

The goal of logistic regression is to find the best fitting (yet biologically reasonable) model to describe the relationship between the dependent variable and the set of independent (predictor or explanatory) variables.

https://www.biochemia-medica.com
