Introduction to Biostatistics and Bioinformatics Regression and Correlation

Learning Objectives

Regression – estimation of the relationship between variables
• Linear regression
• Assessing the assumptions
• Non-linear regression

Correlation
• Correlation coefficient quantifies the association strength
• Sensitivity to the distribution

Relationships

[Figure panels: Relationship; No Relationship]

[Figure panels: Linear Relationships; Non-Linear Relationship]

[Figure panels: Linear, Strong; Linear, Weak]

Linear Regression

[Figure panels: Linear, Strong; Linear, Weak; Non-Linear]

Linear Regression - Residuals

[Figure panels: residuals for Linear, Strong; Linear, Weak; Non-Linear]

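Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• $Y_i$: dependent variable
• $X_i$: independent variable
• $\beta_0$: intercept
• $\beta_1$: slope
• $\beta_0 + \beta_1 X_i$: linear component
• $\varepsilon_i$: random error component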

Linear Regression Assumptions

The relationship between the variables is linear.

Errors are independent, normally distributed with mean zero and constant variance.
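The error assumptions can be checked on the residuals of a fitted line. The following is a minimal sketch on synthetic data; the data, the seed, and the choice of the Shapiro-Wilk test and a lag-1 autocorrelation as checks are illustrative assumptions, not part of the lecture.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
x = np.linspace(-1, 1, 200)
y = 0.5 + 2.0 * x + 0.1 * rng.normal(size=x.size)  # synthetic linear data

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# With an intercept in the model, least-squares residuals average to zero.
mean_residual = residuals.mean()

# Shapiro-Wilk test: a small p-value would suggest non-normal residuals.
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# A lag-1 autocorrelation near zero is consistent with independent errors.
lag1_autocorr = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

print(f"mean={mean_residual:.2e}  Shapiro p={shapiro_p:.3f}  lag-1 r={lag1_autocorr:.3f}")
```

These checks complement, rather than replace, a visual inspection of the residual plot.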

[Figure: residual plots; panels: Linear, Non-Linear]

[Figure: residual plots; panels: Constant Variance, Variable Variance]
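One rough way to tell constant from variable variance is to compare the spread of the residuals at low and high values of the independent variable. The sketch below uses synthetic data; `spread_ratio` is a hypothetical helper introduced here for illustration, not a library function.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)  # fixed seed, for reproducibility only
x = np.linspace(0.1, 2.0, 400)  # sorted, so the array splits into low and high x
y_const = x + 0.2 * rng.normal(size=x.size)    # constant variance
y_var = x + 0.2 * x * rng.normal(size=x.size)  # noise grows with x

def spread_ratio(x, y):
    """Residual standard deviation in the upper half of x divided by the lower half."""
    fit = stats.linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)
    half = x.size // 2
    return residuals[half:].std() / residuals[:half].std()

ratio_const = spread_ratio(x, y_const)  # expected to be close to 1
ratio_var = spread_ratio(x, y_var)      # expected to be clearly above 1

print(f"constant variance: {ratio_const:.2f}, variable variance: {ratio_var:.2f}")
```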

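Linear Regression – Estimating the Line

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

• $\hat{Y}_i$: estimated value
• $\hat{\beta}_0$: estimated intercept
• $\hat{\beta}_1$: estimated slope
• $X_i$: independent variable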

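Least Squares Method

Find the slope and intercept, given measurements $X_i, Y_i$, $i = 1..N$, that minimize the sum of the squares of the residuals:

$$S = \sum_{i=1}^{N} \hat{\varepsilon}_i^2 = \sum_{i=1}^{N} (Y_i - \hat{Y}_i)^2$$

At the minimum:

$$\frac{\partial S}{\partial \hat{\beta}_0} = 0, \qquad \frac{\partial S}{\partial \hat{\beta}_1} = 0$$

The first condition gives $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, where $\bar{X}$ and $\bar{Y}$ are the sample means. Substituting this into the second condition (all sums run over $i = 1..N$):

$$
\begin{aligned}
\frac{\partial S}{\partial \hat{\beta}_1}
&= \frac{\partial}{\partial \hat{\beta}_1} \sum_i (Y_i - \hat{Y}_i)^2 \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum_i \left(Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)\right)^2 \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum_i \left(Y_i^2 - 2 Y_i (\hat{\beta}_0 + \hat{\beta}_1 X_i) + (\hat{\beta}_0 + \hat{\beta}_1 X_i)^2\right) \\
&= \frac{\partial}{\partial \hat{\beta}_1} \sum_i \left(Y_i^2 - 2 Y_i (\hat{\beta}_0 + \hat{\beta}_1 X_i) + \hat{\beta}_0^2 + 2 \hat{\beta}_0 \hat{\beta}_1 X_i + \hat{\beta}_1^2 X_i^2\right) \\
&= \sum_i \left(-2 X_i Y_i + 2 \hat{\beta}_0 X_i + 2 \hat{\beta}_1 X_i^2\right) \\
&= \sum_i \left(-2 X_i Y_i + 2 (\bar{Y} - \hat{\beta}_1 \bar{X}) X_i + 2 \hat{\beta}_1 X_i^2\right) \\
&= -2 \sum_i X_i Y_i + 2 \bar{Y} \sum_i X_i - 2 \hat{\beta}_1 \bar{X} \sum_i X_i + 2 \hat{\beta}_1 \sum_i X_i^2 \\
&= 2 \left( \hat{\beta}_1 \left( \sum_i X_i^2 - \bar{X} \sum_i X_i \right) - \left( \sum_i X_i Y_i - \bar{Y} \sum_i X_i \right) \right) = 0
\end{aligned}
$$

Result:

$$\hat{\beta}_1 = \frac{\sum_i X_i Y_i - \bar{Y} \sum_i X_i}{\sum_i X_i^2 - \bar{X} \sum_i X_i}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$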
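The closed-form least-squares estimates, slope = (ΣXY − Ȳ·ΣX) / (ΣX² − X̄·ΣX) and intercept = Ȳ − slope·X̄, can be computed directly. The sketch below does so and compares the result with `scipy.stats.linregress`; the function name `least_squares_line` and the small data set are hypothetical, chosen only for illustration.

```python
import numpy as np
import scipy.stats as stats

def least_squares_line(x, y):
    """Return (slope, intercept) from the closed-form least-squares formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    slope = (np.sum(x * y) - y_bar * np.sum(x)) / (np.sum(x**2) - x_bar * np.sum(x))
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Made-up measurements, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

slope, intercept = least_squares_line(x, y)
reference = stats.linregress(x, y)

print(f"closed form: slope={slope:.4f}, intercept={intercept:.4f}")
print(f"linregress:  slope={reference.slope:.4f}, intercept={reference.intercept:.4f}")
```

Both approaches should agree to within floating-point error, since `linregress` fits the same least-squares line.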

Linear Regression in Python

import scipy.stats as stats

slope,intercept,r_value,p_value,std_err = stats.linregress(x,y)
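For reference, a self-contained call might look like the sketch below. The data values are made up for illustration. `linregress` returns a result object whose fields (`slope`, `intercept`, `rvalue`, `pvalue`, `stderr`) can also be read by name instead of by tuple unpacking.

```python
import numpy as np
import scipy.stats as stats

# Made-up measurements, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

result = stats.linregress(x, y)
slope, intercept = result.slope, result.intercept
r_squared = result.rvalue ** 2     # fraction of the variance in y explained by the line
y_hat = intercept + slope * x      # estimated values on the fitted line

print(f"slope={slope:.3f} intercept={intercept:.3f} "
      f"r^2={r_squared:.4f} p={result.pvalue:.2g} stderr={result.stderr:.3f}")
```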

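Linear Regression Example

[Figure panels: Linear, Strong; Residuals]

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

points = 100  # assumed sample size; the slide does not define `points`

# Strong linear relationship: y = x plus a small amount of normal noise
x = np.linspace(-1, 1, points)
y = x + 0.1 * np.random.normal(size=points)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_line = slope * x + intercept

# Scatter plot of the data with the fitted line
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
ax1.plot(x, y_line, color='red', lw=2)
fig.savefig('linear.png')

# Residuals: observed minus fitted values
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
fig.savefig('linear-residuals.png')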

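Linear Regression Example

[Figure panels: Linear, Weak; Residuals]

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

points = 100  # assumed sample size; the slide does not define `points`

# Weak linear relationship: same line, four times the noise
x = np.linspace(-1, 1, points)
y = x + 0.4 * np.random.normal(size=points)
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
y_line = slope * x + intercept

# Scatter plot of the data with the fitted line
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y, color='#4D0132', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
ax1.plot(x, y_line, color='red', lw=2)
fig.savefig('linear-weak.png')

# Residuals: observed minus fitted values
fig, ax1 = plt.subplots(1, figsize=(4, 4))
ax1.scatter(x, y - y_line, color='#963725', lw=0, s=60)
ax1.set_xlim([-1.5, 1.5])
ax1.set_ylim([-1.5, 1.5])
fig.savefig('linear-weak-residuals.png')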

Linear Regression Example

[Figure: linear regression on data containing an outlier]
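The effect of a single outlier on a least-squares fit can be sketched as follows. The data are synthetic and the outlier position (1.0, -3.0) is an arbitrary assumption chosen to make the effect visible.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)  # fixed seed, for reproducibility only
x = np.linspace(-1, 1, 30)
y = x + 0.1 * rng.normal(size=x.size)  # strong linear relationship, true slope 1
fit_clean = stats.linregress(x, y)

# Add one hypothetical outlier far below the line.
x_out = np.append(x, 1.0)
y_out = np.append(y, -3.0)
fit_outlier = stats.linregress(x_out, y_out)

print(f"without outlier: slope={fit_clean.slope:.2f}, r={fit_clean.rvalue:.2f}")
print(f"with outlier:    slope={fit_outlier.slope:.2f}, r={fit_outlier.rvalue:.2f}")
```

Because the residuals are squared, one distant point can pull the fitted line noticeably away from the bulk of the data.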

Regression – Non-linear data

Solution 1: Transformation

Solution 2: Non-linear Regression

$$\hat{Y}_i = f(X_i, \hat{\beta}_0, \hat{\beta}_1, \ldots)$$
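Both solutions can be sketched on the same synthetic data. The exponential model y = a·exp(b·x), the parameter values, and the multiplicative noise are assumptions made for this illustration; `scipy.optimize.curve_fit` is one common way to do non-linear least squares, not necessarily the one used in the lecture.

```python
import numpy as np
import scipy.stats as stats
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)  # fixed seed, for reproducibility only
x = np.linspace(0, 2, 50)
# Multiplicative noise keeps y positive, so the logarithm is defined.
y = 2.0 * np.exp(1.5 * x) * np.exp(0.05 * rng.normal(size=x.size))

# Solution 1: transformation. log(y) = log(a) + b*x is linear in x.
fit = stats.linregress(x, np.log(y))
a_transform, b_transform = np.exp(fit.intercept), fit.slope

# Solution 2: non-linear regression on the untransformed data.
def model(x, a, b):
    return a * np.exp(b * x)

(a_nonlinear, b_nonlinear), _ = curve_fit(model, x, y, p0=(1.0, 1.0))

print(f"transformation: a={a_transform:.2f}, b={b_transform:.2f}")
print(f"non-linear fit: a={a_nonlinear:.2f}, b={b_nonlinear:.2f}")
```

The two approaches weight the data differently: the log transform treats relative errors equally, while the direct fit is dominated by the largest values of y.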

Correlation Coefficient

• A measure of the correlation between the two variables
• Quantifies the association strength

Pearson correlation coefficient:

$$r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_i (X_i - \bar{X})^2 \sum_i (Y_i - \bar{Y})^2}}$$
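As a sketch, the Pearson coefficient can be computed from its definition, the sum of products of deviations from the means divided by the square root of the product of the summed squared deviations, and compared with `scipy.stats.pearsonr`. The function name `pearson_r` and the data are hypothetical.

```python
import numpy as np
import scipy.stats as stats

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Made-up measurements, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

r_manual = pearson_r(x, y)
r_scipy, p_value = stats.pearsonr(x, y)

print(f"manual r={r_manual:.4f}, scipy r={r_scipy:.4f}, p={p_value:.2g}")
```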

[Figures: correlation coefficient examples. Source: Wikipedia]

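Coefficient of Variation

Sample: $x_1, x_2, \ldots, x_n$

Mean: $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$

Variance: $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$

Coefficient of Variation (CV): $\mathrm{CV} = \sigma / \mu$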
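A minimal sketch of the CV calculation, taking the CV as the standard deviation divided by the mean. The variance here divides by n; using n − 1 instead is also common, and which one the lecture intends is an assumption. The sample values are made up.

```python
import numpy as np

def coefficient_of_variation(sample):
    """Standard deviation divided by the mean (variance computed with divisor n)."""
    sample = np.asarray(sample, dtype=float)
    mean = sample.mean()
    variance = np.mean((sample - mean) ** 2)
    return np.sqrt(variance) / mean

# Made-up sample with mean 5 and standard deviation 2.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
cv = coefficient_of_variation(sample)

print(f"CV = {cv:.2f}")  # 0.40 for this sample
```

The CV is only meaningful for quantities measured on a ratio scale with a non-zero mean.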

Correlation Coefficient and CV

[Figure panels: Uniform distribution; Normal distribution; Lognormal distribution]

Correlation Coefficient - Outliers

[Figure: correlation coefficient for data containing an outlier]
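The sensitivity of the Pearson coefficient to outliers can be sketched with uncorrelated synthetic data plus one extreme point. The outlier position (10, 10) is an arbitrary assumption chosen for illustration.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(4)  # fixed seed, for reproducibility only
x = rng.normal(size=50)
y = rng.normal(size=50)         # generated independently of x
r_clean, _ = stats.pearsonr(x, y)

# Add one hypothetical point far from the rest of the data.
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
r_outlier, _ = stats.pearsonr(x_out, y_out)

print(f"without outlier: r={r_clean:.2f}")
print(f"with outlier:    r={r_outlier:.2f}")
```

A single extreme point can create an apparent correlation where the remaining data show none.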

Correlation Coefficient – Non-linear

Solutions:
• Transformation
• Rank correlation (Spearman, r=0.93)
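Both remedies can be sketched on a monotonic but non-linear relationship. The noise-free exponential data below are an assumption chosen for clarity and do not reproduce the r=0.93 example from the slide.

```python
import numpy as np
import scipy.stats as stats

x = np.linspace(0, 5, 40)
y = np.exp(x)  # monotonic, strongly non-linear, no noise

r_pearson, _ = stats.pearsonr(x, y)        # understates the association
rho_spearman, _ = stats.spearmanr(x, y)    # rank correlation: depends only on ordering
r_log, _ = stats.pearsonr(x, np.log(y))    # transformation makes the relationship linear

print(f"Pearson r={r_pearson:.3f}")
print(f"Spearman rho={rho_spearman:.3f}")
print(f"Pearson r after log transform={r_log:.3f}")
```

Spearman's coefficient is the Pearson coefficient applied to the ranks of the data, so any strictly increasing relationship gives a value of 1.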

Correlation Coefficient and p-value

Hypothesis: Is there a correlation?

[Figure: three panels, each annotated with its correlation coefficient r and p-value p]
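The p-value for the hypothesis of no correlation depends on the sample size as well as on r. A minimal sketch, assuming the standard t-test for a Pearson coefficient (which presumes approximately normal data); the helper `correlation_p_value` and the data are hypothetical.

```python
import numpy as np
import scipy.stats as stats

def correlation_p_value(r, n):
    """Two-sided p-value for H0: no correlation, given r and sample size n."""
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    return 2.0 * stats.t.sf(abs(t), df=n - 2)

# The same r is weak evidence with few points and strong evidence with many.
p_small_n = correlation_p_value(0.5, 10)
p_large_n = correlation_p_value(0.5, 100)

# Cross-check against scipy on a small made-up data set.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 1.8, 3.9, 3.2, 5.5, 4.9, 7.4, 6.6])
r, p_scipy = stats.pearsonr(x, y)
p_formula = correlation_p_value(r, x.size)

print(f"r=0.5: p={p_small_n:.3f} (n=10), p={p_large_n:.2g} (n=100)")
print(f"data:  r={r:.3f}, p={p_scipy:.2g}")
```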

Application: Analytical Measurements

[Figure: Measured Concentration vs. Theoretical Concentration]

A Few Characteristics of Analytical Measurements

Accuracy: Closeness of agreement between a test result and an accepted reference value.

Precision: Closeness of agreement between independent test results.

Robustness: Test precision given small, deliberate changes in test conditions (preanalytic delays, variations in storage temperature).

Lower limit of detection: The lowest amount of analyte that is statistically distinguishable from background or a negative control.

Limit of quantification: Lowest and highest concentrations of analyte that can be quantitatively determined with suitable precision and accuracy.

Linearity: The ability of the test to return values that are directly proportional to the concentration of the analyte in the sample.
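A few of these characteristics can be estimated from replicate and calibration data. The sketch below uses made-up numbers; expressing accuracy as bias against a reference value, precision as a percent CV of replicates, and linearity as the slope and r² of a regression line are common conventions assumed here, not definitions taken from the lecture.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical replicate measurements of a sample with accepted reference value 10.0.
reference_value = 10.0
replicates = np.array([10.2, 9.9, 10.1, 10.3, 10.0])

# Accuracy: mean deviation from the reference value (bias).
bias = replicates.mean() - reference_value

# Precision: percent CV of the replicates (sample standard deviation, ddof=1, assumed).
cv_percent = 100.0 * replicates.std(ddof=1) / replicates.mean()

# Linearity: regress measured on theoretical concentration (hypothetical calibration series).
theoretical = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
measured = np.array([1.1, 2.0, 5.2, 9.8, 20.3])
fit = stats.linregress(theoretical, measured)

print(f"bias={bias:.2f}, CV={cv_percent:.1f}%")
print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.3f}, r^2={fit.rvalue**2:.4f}")
```

A slope near 1, an intercept near 0, and an r² near 1 would be consistent with a proportional, linear response over the tested range.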

Limit of Detection and Linearity

[Figure: Measured Concentration vs. Theoretical Concentration]

Precision and Accuracy

[Figure, two panels: Measured Concentration vs. Theoretical Concentration]

Summary - Regression

Source: http://xkcdsw.com/content/img/2274.png

Summary - Correlation

Next Lecture: Experimental Design & Analysis

Experimental Design by Christine Ambrosino, www.hawaii.edu/fishlab/Nearside.htm
