
Chapter 2: Simple Linear Regression

• 2.1 The Model
• 2.2-2.3 Parameter Estimation
• 2.4 Properties of Estimators
• 2.5 Inference
• 2.6 Prediction
• 2.7 Analysis of Variance
• 2.8 Regression Through the Origin
• 2.9 Related Models


2.1 The Model

• The measurement of y (the response) changes in a linear fashion with the setting of the variable x (the predictor):

  y = β0 + β1x + ε

  where β0 + β1x is the linear relation and ε is the noise.

◦ The linear relation is deterministic (non-random).

◦ The noise or error is random.

• Noise accounts for the variability of the observations about the straight line. No noise ⇒ the relation is deterministic. Increased noise ⇒ increased variability.


• Experiment with this simulation program:

simple.sim <- function(intercept = 0, slope = 1, x = seq(1, 10), sigma = 1) {
    noise <- rnorm(length(x), sd = sigma)        # random errors
    y <- intercept + slope*x + noise             # simulated responses
    title1 <- paste("sigma = ", sigma)
    plot(x, y, pch = 16, main = title1)
    abline(intercept, slope, col = 4, lwd = 2)   # overlay the true line
}

• > source("simple.sim.R")

> simple.sim(sigma=.01)

> simple.sim(sigma=.1)

> simple.sim(sigma=1)

> simple.sim(sigma=10)


Simulation Examples

[Figure: four simulated scatterplots of y against x (x = 1, …, 10) with the true line overlaid, for sigma = 0.01, 0.1, 1, and 10; the scatter about the line increases with sigma.]


The Setup

• Assumptions:
  1. E[y|x] = β0 + β1x.
  2. Var(y|x) = Var(β0 + β1x + ε|x) = σ².

• Data: Suppose data y1, y2, …, yn are obtained at settings x1, x2, …, xn, respectively. Then the model on the data is

yi = β0 + β1xi + εi

(εi i.i.d. N(0, σ²), so E[yi|xi] = β0 + β1xi.) Either

  1. the x's are fixed values, measured without error (controlled experiment), OR

  2. the analysis is conditional on the observed values of x (observational study).


2.2-2.3 Parameter Estimation, Fitted Values and Residuals

1. Maximum Likelihood Estimation

Distributional assumptions are required

2. Least Squares Estimation

Distributional assumptions are not required


2.2.1 Maximum Likelihood Estimation

◦ Normal assumption is required:

  f(yi|xi) = (1/(√(2π) σ)) exp(−(1/(2σ²))(yi − β0 − β1xi)²)

Likelihood:

  L(β0, β1, σ) = ∏ f(yi|xi) ∝ (1/σⁿ) exp(−(1/(2σ²)) ∑_{i=1}^n (yi − β0 − β1xi)²)

◦ Maximize with respect to β0, β1, and σ².

◦ (β0, β1): equivalent to minimizing ∑_{i=1}^n (yi − β0 − β1xi)², i.e. least squares.

◦ σ²: the MLE is SSE/n, which is biased.


2.2.2 Least Squares Estimation

◦ Assumptions:

1. E[εi] = 0

2. Var(εi) = σ²

3. εi’s are independent.

• Note that normality is not required.


Method

• Minimize

  S(β0, β1) = ∑_{i=1}^n (yi − β0 − β1xi)²

  with respect to the parameters (regression coefficients) β0 and β1; the minimizers are the least-squares estimators β̂0 and β̂1.

◦ Justification: We want the fitted line to pass as close to all of the points as possible.

◦ Aim: small residuals (observed − fitted response values):

  ei = yi − β̂0 − β̂1xi


Look at the following plots:

> source("roller2.plot")

> roller2.plot(a=14,b=0)

> roller2.plot(a=2,b=2)

> roller2.plot(a=12, b=1)

> roller2.plot(a=-2,b=2.67)


[Figure: four plots of depression in lawn (mm) against roller weight (t), each showing the data values, a candidate fitted line, and the positive and negative residuals, for (a, b) = (14, 0), (2, 2), (12, 1), and (−2, 2.67).]

◦ The first three lines above do not pass as close to the plotted points as the fourth, even though the sum of the residuals is about the same in all four cases.

◦ Negative residuals cancel out positive residuals.

The Key: minimize squared residuals

> source("roller3.plot.R")
> roller3.plot(14,0); roller3.plot(2,2); roller3.plot(12,1)
> roller3.plot(a=-2,b=2.67)   # small SS

[Figure: the corresponding four plots from roller3.plot for (a, b) = (14, 0), (2, 2), (12, 1), and (−2, 2.67); the fourth line gives the smallest sum of squared residuals.]


Unbiased estimate of σ²

  (1/(n − # parameters estimated)) ∑_{i=1}^n ei²  =  (1/(n − 2)) ∑_{i=1}^n ei²

* n observations ⇒ n degrees of freedom

* 2 degrees of freedom are required to estimate the parameters

* the residuals retain n − 2 degrees of freedom


Alternative Viewpoint

* y = (y1, y2, . . . , yn) is a vector in n-dimensional space.(n degrees of freedom)

* The fitted values yi = β0 + β1xi also form a vector inn-dimensional space:y = (y1, y2, . . . , yn)

(2 degrees of freedom)* Least-squares seeks to minimize the distance between y and y.* The distance between n-dimensional vectors u and v is the

square root ofn∑i=1

(ui − vi)2

* Thus, the squared distance between y and y isn∑i=1

(yi − yi)2 =n∑i=1

e2i


Regression Coefficient Estimators

• The minimizers of

  S(β0, β1) = ∑_{i=1}^n (yi − β0 − β1xi)²

  are

  β̂0 = ȳ − β̂1x̄   and   β̂1 = Sxy/Sxx,

  where

  Sxy = ∑_{i=1}^n (xi − x̄)(yi − ȳ)   and   Sxx = ∑_{i=1}^n (xi − x̄)².


Home-made R Estimators

> ls.est <- function(data)
{
    x <- data[,1]
    y <- data[,2]
    xbar <- mean(x)
    ybar <- mean(y)
    Sxy <- S(x,y,xbar,ybar)
    Sxx <- S(x,x,xbar,xbar)
    b <- Sxy/Sxx          # slope estimate
    a <- ybar - xbar*b    # intercept estimate
    list(a = a, b = b, data = data)
}

> S <- function(x,y,xbar,ybar)
{
    sum((x-xbar)*(y-ybar))   # corrected sum of cross-products
}


Calculator Formulas and Other Properties

  Sxy = ∑_{i=1}^n xiyi − (∑_{i=1}^n xi)(∑_{i=1}^n yi)/n

  Sxx = ∑_{i=1}^n xi² − (∑_{i=1}^n xi)²/n

• Linearity Property:

  Sxy = ∑_{i=1}^n yi(xi − x̄),

  so β̂1 is linear in the yi's.

• Gauss-Markov Theorem: Among linear unbiased estimators for β1 and β0, β̂1 and β̂0 are best (i.e. they have the smallest variance).

• Exercise: Find the expected value and variance of (y1 − ȳ)/(x1 − x̄). Is it unbiased for β1? Is there a linear unbiased estimator with smaller variance?
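A quick simulation sketch related to this exercise (not from the notes; the settings and variable names are illustrative). It compares the crude estimator (y1 − ȳ)/(x1 − x̄) with the least-squares slope, illustrating the Gauss-Markov theorem empirically:

# Compare Var[(y1 - ybar)/(x1 - xbar)] with Var of the least-squares slope by simulation.
set.seed(1)
x <- seq(1, 10); n <- length(x)
beta0 <- 0; beta1 <- 1; sigma <- 1
crude <- lsq <- numeric(5000)
for (i in 1:5000) {
    y <- beta0 + beta1*x + rnorm(n, sd = sigma)
    crude[i] <- (y[1] - mean(y))/(x[1] - mean(x))                       # estimator from the exercise
    lsq[i]   <- sum((x - mean(x))*(y - mean(y)))/sum((x - mean(x))^2)   # least-squares slope
}
c(mean(crude), mean(lsq))   # both near beta1 = 1: both appear unbiased
c(var(crude), var(lsq))     # the least-squares slope has the smaller variance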


Residuals

  ε̂i = ei = yi − β̂0 − β̂1xi
         = yi − ŷi
         = yi − ȳ − β̂1(xi − x̄)

• In R:

> res
function (ls.object)
{
    a <- ls.object$a
    b <- ls.object$b
    x <- ls.object$data[,1]
    y <- ls.object$data[,2]
    resids <- y - a - b*x
    resids
}


2.3.1 Consequences of Least-Squares

1. ei = yi − ȳ − β̂1(xi − x̄)   (follows from the intercept formula)

2. ∑_{i=1}^n ei = 0   (follows from 1.)

3. ∑_{i=1}^n ŷi = ∑_{i=1}^n yi   (follows from 2.)

4. The regression line passes through the centroid (x̄, ȳ)   (follows from the intercept formula)

5. ∑ xiei = 0   (set the partial derivative of S(β0, β1) with respect to β1 to 0)

6. ∑ ŷiei = 0   (follows from 2. and 5.)


2.3.2 Estimation of σ²

• The residual sum of squares or error sum of squares is given by

  SSE = ∑_{i=1}^n ei² = Syy − β̂1Sxy

  Note: SSE = SS_Res and Syy = SST.

• An unbiased estimator for the error variance is

  σ̂² = MSE = SSE/(n − 2) = (Syy − β̂1Sxy)/(n − 2)

  Note: MSE = MS_Res.

• σ̂ = Residual Standard Error = √MSE


Example

• roller data

weight depression

1 1.9 2

2 3.1 1

3 3.3 5

4 4.8 5

5 5.3 20

6 6.1 20

7 6.4 23

8 7.6 10

9 9.8 30

10 12.4 25


Hand Calculation

• (y = depression, x = weight)

◦ ∑_{i=1}^{10} xi = 1.9 + 3.1 + ··· + 12.4 = 60.7

◦ x̄ = 60.7/10 = 6.07

◦ ∑_{i=1}^{10} yi = 2 + 1 + ··· + 25 = 141

◦ ȳ = 141/10 = 14.1

◦ ∑_{i=1}^{10} xi² = 1.9² + 3.1² + ··· + 12.4² = 461

◦ ∑_{i=1}^{10} yi² = 4 + 1 + 25 + ··· + 625 = 3009

◦ ∑_{i=1}^{10} xiyi = (1.9)(2) + ··· + (12.4)(25) = 1103

◦ Sxx = 461 − (60.7)²/10 = 92.6


◦ Sxy = 1103 − (60.7)(141)/10 = 247

◦ Syy = 3009 − (141)²/10 = 1021

◦ β̂1 = Sxy/Sxx = 247/92.6 = 2.67

◦ β̂0 = ȳ − β̂1x̄ = 14.1 − 2.67(6.07) = −2.11

◦ σ̂² = (1/(n − 2))(Syy − β̂1Sxy) = (1/8)(1021 − 2.67(247)) = 45.2 = MSE

• Example Summary: the fitted regression line relating depression (y) to weight (x) is

  ŷ = −2.11 + 2.67x

  The error variance is estimated as MSE = 45.2.

R commands (home-made version)

> roller.obj <- ls.est(roller)

> roller.obj[-3] # intercept and slope estimate

$a

[1] -2.09

$b

[1] 2.67

> res(roller.obj) # residuals

[1] -0.98 -5.18 -1.71 -5.71 7.95

[6] 5.82 8.02 -8.18 5.95 -5.98

> sum(res(roller.obj)^2)/8 # error variance (MSE)

[1] 45.4


R commands (built-in version)

> attach(roller)

> roller.lm <- lm(depression ~ weight)

> summary(roller.lm)

Call:

lm(formula = depression ~ weight)

Residuals:

Min 1Q Median 3Q Max

-8.180 -5.580 -1.346 5.920 8.020

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -2.0871 4.7543 -0.439 0.67227

weight 2.6667 0.7002 3.808 0.00518


Residual standard error: 6.735 on 8 degrees of freedom

Multiple R-squared: 0.6445, Adjusted R-squared: 0.6001

F-statistic: 14.5 on 1 and 8 DF, p-value: 0.005175

> detach(roller)

• or

> roller.lm <- lm(depression ~ weight, data = roller)

> summary(roller.lm)

Using Extractor Functions

• Partial Output:

> coef(roller.lm)

(Intercept) weight

-2.09 2.67

> summary(roller.lm)$sigma

[1] 6.74

• From the output,

◦ the slope estimate is β̂1 = 2.67.

◦ the intercept estimate is β̂0 = −2.09.

◦ the Residual standard error is the square root of the MSE: 6.74² = 45.4.


Other R commands

◦ fitted values: predict(roller.lm)

◦ residuals: resid(roller.lm)

◦ diagnostic plots: plot(roller.lm) (these include a plot of the residuals against the fitted values and a normal probability plot of the residuals)

◦ Also plot(roller); abline(roller.lm)

(this gives a plot of the data with the fitted line overlaid)


2.4: Properties of Least Squares Estimates

◦ E[β̂1] = β1

◦ E[β̂0] = β0

◦ Var(β̂1) = σ²/Sxx

◦ Var(β̂0) = σ²(1/n + x̄²/Sxx)


Standard Error Estimators

◦ The estimated variance of β̂1 is MSE/Sxx, so the standard error (s.e.) of β̂1 is estimated by √(MSE/Sxx).

  roller e.g.: MSE = 45.2, Sxx = 92.6

  ⇒ s.e. of β̂1 is √(45.2/92.6) = .699

◦ The estimated variance of β̂0 is MSE(1/n + x̄²/Sxx), so the standard error (s.e.) of β̂0 is estimated by √(MSE(1/n + x̄²/Sxx)).

  roller e.g.: MSE = 45.2, Sxx = 92.6, x̄ = 6.07, n = 10

  ⇒ s.e. of β̂0 is √(45.2(1/10 + 6.07²/92.6)) = 4.74
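These standard errors are easy to reproduce from the summary quantities. A minimal R sketch (assuming the roller data frame used above is available; the variable names are illustrative):

x <- roller$weight; y <- roller$depression
n <- length(x)
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x))*(y - mean(y)))
b1 <- Sxy/Sxx                           # slope estimate
b0 <- mean(y) - b1*mean(x)              # intercept estimate
MSE <- sum((y - b0 - b1*x)^2)/(n - 2)   # error variance estimate
sqrt(MSE/Sxx)                           # s.e. of the slope, about 0.70
sqrt(MSE*(1/n + mean(x)^2/Sxx))         # s.e. of the intercept, about 4.75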


Distributions of β̂1 and β̂0

◦ yi is N(β0 + β1xi, σ²), and

  β̂1 = ∑_{i=1}^n aiyi   (ai = (xi − x̄)/Sxx)

  ⇒ β̂1 is N(β1, σ²/Sxx).

  Also, SSE/σ² is χ²_{n−2} (independent of β̂1), so

  (β̂1 − β1)/√(MSE/Sxx) ∼ t_{n−2}

◦ β̂0 = ∑_{i=1}^n ciyi   (ci = 1/n − x̄(xi − x̄)/Sxx)

  ⇒ β̂0 is N(β0, σ²(1/n + x̄²/Sxx)) and

  (β̂0 − β0)/√(MSE(1/n + x̄²/Sxx)) ∼ t_{n−2}


2.5 Inferences about the Regression Parameters

• Tests

• Confidence Intervals

for β1, and β0 + β1x0


2.5.1 Inference for β1

H0: β1 = β10 vs. H1: β1 ≠ β10

Under H0,

  t0 = (β̂1 − β10)/√(MSE/Sxx)

has a t-distribution on n − 2 degrees of freedom.

  p-value = P(|t_{n−2}| > |t0|)

e.g. testing significance of regression for roller:

  H0: β1 = 0 vs. H1: β1 ≠ 0

  t0 = (2.67 − 0)/.699 = 3.82

  p-value = P(|t8| > 3.82) = 2(1 − P(t8 < 3.82)) = .00509

R command: 2*(1-pt(3.82,8))
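The same test can also be read off the fitted model object. A small sketch (roller.lm is the fit from earlier; the printed numbers are approximate):

> summary(roller.lm)$coefficients["weight", ]   # estimate, s.e., t value, p-value
#  Estimate Std. Error    t value   Pr(>|t|)
#     2.667      0.700      3.808    0.00518
> 2*pt(abs(3.808), df = 8, lower.tail = FALSE)  # two-sided p-value by hand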


(1 − α) Confidence Intervals

◦ slope:

  β̂1 ± t_{n−2,α/2} s.e.   or   β̂1 ± t_{n−2,α/2} √(MSE/Sxx)

  roller e.g. (95% confidence interval):

  2.67 ± t_{8,.025}(.699)   or   2.67 ± 2.31(.699) = 2.67 ± 1.61

  R command: qt(.975, 8)

◦ intercept:

  β̂0 ± t_{n−2,α/2} s.e.

  e.g. −2.11 ± 2.31(4.74) = −2.11 ± 10.9
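With the built-in fit, both intervals come from confint. A sketch (roller.lm as fitted earlier; the values shown are approximate):

> confint(roller.lm, level = 0.95)
#                 2.5 %   97.5 %
# (Intercept)  about -13.0 to 8.9
# weight       about  1.05 to 4.28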


2.5.2 Confidence Interval for Mean Response

◦ E[y|x0] = β0 + β1x0, where x0 is a possible value that the predictor could take.

◦ Ê[y|x0] = β̂0 + β̂1x0 is a point estimate of the mean response at that value.

◦ To find a 1 − α confidence interval for E[y|x0], we need the variance of ŷ0 = Ê[y|x0]:

  Var(ŷ0) = Var(β̂0 + β̂1x0)

  β̂0 = ∑_{i=1}^n ciyi,   β̂1x0 = ∑_{i=1}^n bix0yi


Therefore,

  ŷ0 = ∑_{i=1}^n (ci + bix0)yi

so

  Var(ŷ0) = ∑_{i=1}^n (ci + bix0)² Var(yi)

          = σ² ∑_{i=1}^n (ci + bix0)²

          = σ² ∑_{i=1}^n (1/n + (x0 − x̄)(xi − x̄)/Sxx)²

          = σ²(1/n + (x0 − x̄)²/Sxx)

◦ The estimated variance is MSE(1/n + (x0 − x̄)²/Sxx).

◦ The confidence interval is then

  ŷ0 ± t_{n−2,α/2} √(MSE(1/n + (x0 − x̄)²/Sxx))

◦ e.g. Compute a 95% confidence interval for the expected depression when the weight is 5 tonnes.

  x0 = 5, ŷ0 = −2.11 + 2.67(5) = 11.2

  n = 10, x̄ = 6.07, Sxx = 92.6, MSE = 45.2

  Interval:

  11.2 ± t_{8,.025} √(45.2(1/10 + (5 − 6.07)²/92.6))

  = 11.2 ± 2.31√5.08 = 11.2 ± 5.21 = (5.99, 16.41)

◦ R code:

> predict(roller.lm,
          newdata = data.frame(weight = 5),
          interval = "confidence")

fit lwr upr

[1,] 11.2 6.04 16.5

◦ Ex. Write your own R function to compute this interval.
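One possible solution sketch for this exercise (the function name ci.mean.response and its arguments are illustrative, not from the notes):

ci.mean.response <- function(x, y, x0, level = 0.95) {
    n    <- length(x)
    Sxx  <- sum((x - mean(x))^2)
    b1   <- sum((x - mean(x))*(y - mean(y)))/Sxx
    b0   <- mean(y) - b1*mean(x)
    MSE  <- sum((y - b0 - b1*x)^2)/(n - 2)
    y0   <- b0 + b1*x0                         # point estimate of E[y|x0]
    half <- qt(1 - (1 - level)/2, df = n - 2) *
              sqrt(MSE*(1/n + (x0 - mean(x))^2/Sxx))
    c(fit = y0, lwr = y0 - half, upr = y0 + half)
}
# e.g. ci.mean.response(roller$weight, roller$depression, x0 = 5)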

2.6: Predicted Responses

◦ If a new observation were to be taken at x0, we would predict it to be y0 = β0 + β1x0 + ε, where ε is independent noise.

◦ ŷ0 = β̂0 + β̂1x0 is a point prediction of the response at that value.

◦ To find a 1 − α prediction interval for y0, we need the variance of y0 − ŷ0:

  Var(y0 − ŷ0) = Var(β0 + β1x0 + ε − β̂0 − β̂1x0)

              = Var(−β̂0 − β̂1x0) + Var(ε)

              = σ²(1/n + (x0 − x̄)²/Sxx) + σ²


              = σ²(1 + 1/n + (x0 − x̄)²/Sxx)

◦ The estimated variance is MSE(1 + 1/n + (x0 − x̄)²/Sxx).

◦ The prediction interval is then

  ŷ0 ± t_{n−2,α/2} √(MSE(1 + 1/n + (x0 − x̄)²/Sxx))

◦ e.g. Compute a 95% prediction interval for the depression for a single new observation where the weight is 5 tonnes.

  x0 = 5, ŷ0 = −2.11 + 2.67(5) = 11.2

  n = 10, x̄ = 6.07, Sxx = 92.6, MSE = 45.2

  Interval:

  11.2 ± t_{8,.025} √(45.2(1 + 1/10 + (5 − 6.07)²/92.6))

  = 11.2 ± 2.31√50.3 = 11.2 ± 16.4 = (−5.2, 27.6)

R code for Prediction Intervals

> predict(roller.lm,
          newdata = data.frame(weight = 5),
          interval = "prediction")

fit lwr upr

[1,] 11.2 -5.13 27.6

◦ Write your own R function to produce this interval.


Degrees of Freedom

• A random sample Y1, …, Yn of size n from a normal population with mean µ and variance σ² has n degrees of freedom.

• Each linearly independent restriction reduces the number of degrees of freedom by 1.

• ∑_{i=1}^n (Yi − µ)²/σ² has a χ²(n) distribution.

• ∑_{i=1}^n (Yi − Ȳ)²/σ² has a χ²(n−1) distribution. (Calculating Ȳ imposes one linear restriction.)

• ∑_{i=1}^n (Yi − Ŷi)²/σ² has a χ²(n−2) distribution. (Calculating β̂0 and β̂1 imposes two linearly independent restrictions.)

• Given the xi's, there are 2 degrees of freedom in the quantity β̂0 + β̂1xi.

• Given xi and Ȳ, there is one degree of freedom: ŷi − Ȳ = β̂1(xi − x̄).


2.7: Analysis of Variance: Breaking down (analyzing) Variation

◦ The variation in the data (responses) is summarized by

  Syy = ∑_{i=1}^n (yi − ȳ)² = TSS   (Total sum of squares)

◦ 2 sources of variation in the responses:

1. variation due to the straight-line relationship with the predictor

2. deviation from the line (noise)

◦ yi − ȳ (deviation from the data centre) = yi − ŷi (residual) + ŷi − ȳ (difference between line and centre), i.e.

  yi − ȳ = ei + ŷi − ȳ

so

  Syy = ∑(yi − ȳ)²

      = ∑(ei + ŷi − ȳ)²

      = ∑ei² + ∑(ŷi − ȳ)²

since ∑ eiŷi = 0 and ȳ ∑ ei = 0. Hence

  Syy = SSE + ∑(ŷi − ȳ)² = SSE + SSR

The last term is the regression sum of squares.

Relation between SSR and β̂1

◦ We saw earlier that

  SSE = Syy − β̂1Sxy

◦ Therefore,

  SSR = β̂1Sxy = β̂1²Sxx

  Note that, for a given set of x's, SSR depends only on β̂1.

  MSR = SSR/d.f. = SSR/1   (1 degree of freedom for the slope parameter)


Expected Sums of Squares

E[SSR] = E[Sxxβ̂1²]

       = Sxx(Var(β̂1) + (E[β̂1])²)

       = Sxx(σ²/Sxx + β1²)

       = σ² + β1²Sxx

       = E[MSR]

Therefore, if β1 = 0, then SSR is an unbiased estimator for σ².

◦ Development of E[MSE]:

  E[Syy] = E[∑(yi − ȳ)²] = E[∑yi² − nȳ²] = ∑E[yi²] − nE[ȳ²]


Development of E[MSE] (cont’d)

Consider the 2 terms on the RHS separately:

1st term:

  E[yi²] = Var(yi) + (E[yi])² = σ² + (β0 + β1xi)²

  ∑E[yi²] = nσ² + nβ0² + 2nβ0β1x̄ + β1²∑xi²

2nd term:

  E[ȳ²] = Var(ȳ) + (E[ȳ])² = σ²/n + (β0 + β1x̄)²

  nE[ȳ²] = σ² + nβ0² + 2nβ0β1x̄ + nβ1²x̄²


Development of E[MSE] (cont’d)

E[Syy] = (n − 1)σ² + β1²∑(xi − x̄)² = E[SST]

E[SSE] = E[Syy] − E[SSR]

       = (n − 1)σ² + β1²∑(xi − x̄)² − (σ² + β1²Sxx)

       = (n − 2)σ²

E[MSE] = E[SSE/(n − 2)] = σ²


Another approach to testing H0 : β1 = 0

◦ Under the null hypothesis, both MSE and MSR estimate σ².

◦ Under the alternative, only MSE estimates σ²:

  E[MSR] = σ² + β1²Sxx > σ²

◦ A reasonable test is

  F0 = MSR/MSE ∼ F_{1,n−2}

◦ Large F0 ⇒ evidence against H0.

◦ Note t²_ν = F_{1,ν}, so this is really the same test as

  t0² = (β̂1/√(MSE/Sxx))² = β̂1²Sxx/MSE = MSR/MSE


The ANOVA table

Source   df      SS             MS            F
Reg.     1       β̂1²Sxx         β̂1²Sxx        MSR/MSE
Error    n − 2   Syy − β̂1²Sxx   SSE/(n − 2)
Total    n − 1   Syy

roller data example:

> anova(roller.lm) # R code

Analysis of Variance Table

Response: depression
          Df Sum Sq Mean Sq F value Pr(>F)
weight     1    658     658    14.5 0.0052
Residuals  8    363      45

(Recall that the t-statistic for testing β1 = 0 had been 3.81 = √14.5.)

◦ Ex. Write an R function to compute these ANOVA quantities.
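A possible sketch for this exercise (the function name my.anova and its return layout are illustrative only):

my.anova <- function(x, y) {
    n   <- length(x)
    Sxx <- sum((x - mean(x))^2)
    Sxy <- sum((x - mean(x))*(y - mean(y)))
    Syy <- sum((y - mean(y))^2)
    b1  <- Sxy/Sxx
    SSR <- b1^2*Sxx            # regression sum of squares
    SSE <- Syy - SSR           # error sum of squares
    MSE <- SSE/(n - 2)
    F0  <- (SSR/1)/MSE         # F statistic on 1 and n-2 df
    c(SSR = SSR, SSE = SSE, MSE = MSE, F = F0,
      p.value = pf(F0, 1, n - 2, lower.tail = FALSE))
}
# my.anova(roller$weight, roller$depression)   # F close to 14.5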


Confidence Interval for σ²

◦ SSE/σ² ∼ χ²_{n−2}, so

  P(χ²_{n−2,1−α/2} ≤ SSE/σ² ≤ χ²_{n−2,α/2}) = 1 − α

  and hence

  P(SSE/χ²_{n−2,α/2} ≤ σ² ≤ SSE/χ²_{n−2,1−α/2}) = 1 − α

e.g. roller data:

  SSE = 363

  χ²_{8,.975} = 2.18   (R code: qchisq(.025, 8))

  χ²_{8,.025} = 17.5   (R code: qchisq(.975, 8))

  (363/17.5, 363/2.18) = (20.7, 166.5)
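The same interval can be computed directly from the fitted model. A short sketch (roller.lm as before; values approximate):

SSE <- sum(resid(roller.lm)^2)        # about 363
df  <- roller.lm$df.residual          # 8
c(lower = SSE/qchisq(0.975, df),      # divide by the upper chi-square quantile
  upper = SSE/qchisq(0.025, df))      # divide by the lower chi-square quantile
# roughly (20.7, 166.5)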


2.7.1 R² - Coefficient of Determination

◦ R² is the fraction of the response variability explained by the regression:

  R² = SSR/Syy

◦ 0 ≤ R² ≤ 1. Values near 1 imply that most of the variability is explained by the regression.

◦ roller data: SSR = 658 and Syy = 1021, so

  R² = 658/1021 = .644


R output

> summary(roller.lm)

...

Multiple R-Squared: 0.644, ...

◦ Ex. Write an R function which computes R2.
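One short sketch for this exercise (the name r.squared is illustrative):

r.squared <- function(x, y) {
    Sxx <- sum((x - mean(x))^2)
    Sxy <- sum((x - mean(x))*(y - mean(y)))
    Syy <- sum((y - mean(y))^2)
    (Sxy^2/Sxx)/Syy            # SSR/Syy, since SSR = Sxy^2/Sxx
}
# r.squared(roller$weight, roller$depression)   # about 0.644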

◦ Another interpretation:

  E[R²] ≈ E[SSR]/E[Syy] = (β1²Sxx + σ²)/((n − 1)σ² + β1²Sxx)

        = (β1²Sxx/(n − 1) + σ²/(n − 1))/(σ² + β1²Sxx/(n − 1))

        ≈ (β1²Sxx/(n − 1))/(σ² + β1²Sxx/(n − 1))

  for large n. (Note: this differs from the textbook.)


Properties of R²

• Thus, R² increases as

1. Sxx increases (x's become more spread out)

2. σ² decreases

◦ Cautions

1. R² does not measure the magnitude of the regression slope.

2. R² does not measure the appropriateness of the linear model.

3. A large value of R² does not imply that the regression model will be an accurate predictor.


Hazards of Regression

• Extrapolation: predicting y values outside the range of observed x values. There is no guarantee that a future response would behave in the same linear manner outside the observed range. e.g. Consider an experiment with a spring. The spring is stretched to several different lengths x (in cm) and the restoring force F (in Newtons) is measured:

x F

3 5.1

4 6.2

5 7.9

6 9.5

> spring.lm <- lm(F ~ x - 1, data = spring)

> summary(spring.lm)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

x 1.5884 0.0232 68.6 6.8e-06

The fitted model relating F to x is

F = 1.58x

Can we predict the restoring force for the spring, if it has been extended to a length of 15 cm?

• High leverage observations: x values at the extremes of the range have more influence on the slope of the regression than observations near the middle of the range.

• Outliers can distort the regression line. Outliers may be incorrectly recorded OR may be an indication that the linear relation or constant variance assumption is incorrect.

• A regression relationship does not mean that there is a cause-and-effect relationship. e.g. The following data give the number of lawyers and number of homicides in a given year for a number of towns:

no. lawyers no. homicides

1 0

2 0

7 2

10 5

12 6

14 6

15 7

18 8

Note that the number of homicides increases with the number of lawyers. Does this mean that in order to reduce the number of homicides, one should reduce the number of lawyers?

• Beware of nonsense relationships. e.g. It is possible to show that the area of some lakes in Manitoba is related to elevation. Do you think there is a real reason for this? Or is the apparent relation just a result of chance?

2.9 - Regression through the Origin

◦ intercept = 0:

  yi = β1xi + εi

◦ Maximum likelihood and least squares ⇒ minimize ∑_{i=1}^n (yi − β1xi)²:

  β̂1 = ∑xiyi/∑xi²

  ei = yi − β̂1xi,   SSE = ∑ei²

◦ Maximum likelihood ⇒

  σ̂² = SSE/n


◦ Unbiased estimator:

  σ̂² = MSE = SSE/(n − 1)

◦ Properties of β̂1:

  E[β̂1] = β1,   Var(β̂1) = σ²/∑xi²

◦ 1 − α C.I. for β1:

  β̂1 ± t_{n−1,α/2} √(MSE/∑xi²)

◦ 1 − α C.I. for E[y|x0]:

  ŷ0 ± t_{n−1,α/2} √(MSE x0²/∑xi²)

  since Var(β̂1x0) = σ²x0²/∑xi²

◦ 1 − α P.I. for y, given x0:

  ŷ0 ± t_{n−1,α/2} √(MSE(1 + x0²/∑xi²))

◦ R code:

> roller.lm <- lm(depression ~ weight - 1,
                  data = roller)

> summary(roller.lm)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

weight 2.392 0.299 7.99 2.2e-05

Residual standard error: 6.43

on 9 degrees of freedom

Multiple R-Squared: 0.876,

Adjusted R-squared: 0.863

F-statistic: 63.9 on 1 and 9 DF,

p-value: 2.23e-005

> predict(roller.lm,
          newdata = data.frame(weight = 5),
          interval = "prediction")

fit lwr upr

[1,] 12.0 -2.97 26.9

Ch. 2.9.2 Correlation

• Bivariate Normal Distribution:

  f(x, y) = (1/(2πσ1σ2√(1 − ρ²))) exp(−(1/(2(1 − ρ²)))(Y² − 2ρXY + X²))

  where

  X = (x − µ1)/σ1   and   Y = (y − µ2)/σ2

◦ correlation coefficient:

  ρ = σ12/(σ1σ2)

◦ µ1 and σ1² are the mean and variance of x

◦ µ2 and σ2² are the mean and variance of y

◦ σ12 = E[(x − µ1)(y − µ2)], the covariance


Conditional distribution of y given x

  f(y|x) = (1/(√(2π) σ1.2)) exp(−(1/2)((y − β0 − β1x)/σ1.2)²)

  where

  β0 = µ2 − µ1ρσ2/σ1

  β1 = (σ2/σ1)ρ

  σ²1.2 = σ2²(1 − ρ²)


An Explanation:

◦ Suppose

  y = β0 + β1x + ε

  where ε and x are independent N(0, σ²) and N(µ1, σ1²) random variables.

◦ Define µ2 = E[y] and σ2² = Var(y):

  µ2 = β0 + β1µ1

  σ2² = β1²σ1² + σ²

◦ Define σ12 = Cov(x − µ1, y − µ2):

  σ12 = E[(x − µ1)(y − µ2)]

      = E[(x − µ1)(β1x + ε − β1µ1)]

      = β1E[(x − µ1)²] + E[(x − µ1)ε]

      = β1σ1²


◦ Define ρ = σ12/(σ1σ2):

  ρ = β1σ1²/(σ1σ2) = β1(σ1/σ2)

  Therefore,

  β1 = (σ2/σ1)ρ

  and

  β0 = µ2 − β1µ1 = µ2 − µ1(σ2/σ1)ρ

◦ What is the conditional distribution of y, given x = x?

  y|x=x = β0 + β1x + ε

  must be normal with mean

  E[y|x = x] = β0 + β1x = µ2 + ρ(σ2/σ1)(x − µ1)

  and variance

  Var(y|x = x) = σ²1.2 = σ² = σ2² − β1²σ1²

              = σ2² − ρ²(σ2²/σ1²)σ1²

              = σ2²(1 − ρ²)

Estimation

◦ Maximum Likelihood Estimation (using bivariate normal) gives

  β̂0 = ȳ − β̂1x̄

  β̂1 = Sxy/Sxx

  r = ρ̂ = Sxy/√(SxxSyy)

◦ Note that

  r = β̂1√(Sxx/Syy),

  so

  r² = β̂1²(Sxx/Syy) = R²


i.e. the coefficient of determination is the square of the correlation coefficient.

Testing H0: ρ = 0 vs. H1: ρ ≠ 0

◦ Conditional approach (equivalent to testing β1 = 0):

  t0 = β̂1/√(MSE/Sxx)

     = β̂1/√((Syy − β̂1²Sxx)/((n − 2)Sxx))

     = √(β̂1²(n − 2)/(Syy/Sxx − β̂1²))

     = √((n − 2)r²/(1 − r²))

  where we have used β̂1² = r²Syy/Sxx.


◦ The above statistic is a t statistic on n − 2 degrees of freedom ⇒ conclude ρ ≠ 0 if the p-value P(|t_{n−2}| > |t0|) is small.
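In R, cor.test carries out this test directly. A sketch using the roller data from earlier (the printed values are approximate):

> cor.test(roller$weight, roller$depression)
# t = 3.81 on 8 df, p-value about 0.0052, sample correlation r about 0.80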

Testing H0 : ρ = ρ0

  Z = (1/2) log((1 + r)/(1 − r))

has an approximate normal distribution with mean

  µZ = (1/2) log((1 + ρ)/(1 − ρ))

and variance

  σ²Z = 1/(n − 3)

for large n.

◦ Thus,

  Z0 = [log((1 + r)/(1 − r)) − log((1 + ρ0)/(1 − ρ0))] / (2√(1/(n − 3)))

has an approximate standard normal distribution when the null hypothesis is true.
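A small sketch of this test written as a function (the name fisher.z.test and the example numbers are illustrative only):

fisher.z.test <- function(r, n, rho0) {
    z0 <- (0.5*log((1 + r)/(1 - r)) - 0.5*log((1 + rho0)/(1 - rho0))) /
            sqrt(1/(n - 3))
    c(Z0 = z0, p.value = 2*pnorm(-abs(z0)))   # two-sided p-value
}
# e.g. fisher.z.test(r = 0.8, n = 40, rho0 = 0.5)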


Confidence Interval for ρ

• Confidence interval for (1/2) log((1 + ρ)/(1 − ρ)):

  Z ± z_{α/2} √(1/(n − 3))

• Find the endpoints (l, u) of this confidence interval and solve for ρ:

  ((e^{2l} − 1)/(1 + e^{2l}), (e^{2u} − 1)/(1 + e^{2u}))


R code for fossum example

• Find a 95% confidence interval for the correlation between total length and head length:

> source("fossum.R")

> attach(fossum)

> n <- length(totlngth) # sample size

> r <- cor(totlngth,hdlngth)# correlation est.

> zci <- .5*log((1+r)/(1-r)) +

qnorm(c(.025,.975))*sqrt(1/(n-3))

> ci <- (exp(2*zci)-1)/(1+exp(2*zci))

> ci                   # transformed conf. interval
[1] 0.62 0.83          # conf. interval for the true correlation

> detach(fossum)


Recommended