

Module 4: Bayesian Methods
Lecture 5: Linear regression

Peter Hoff

Departments of Statistics and Biostatistics, University of Washington


Outline

The linear regression model

Bayesian estimation


Regression models

How does an outcome Y vary as a function of x = {x1, . . . , xp}?

• Which xj's have an effect?

• What are the effect sizes?

• Can we predict Y as a function of x?

These questions can be assessed via a regression model p(y |x).


Regression data

Parameters in a regression model can be estimated from data:

    y1    x1,1  · · ·  x1,p
    ⋮      ⋮             ⋮
    yn    xn,1  · · ·  xn,p

These data are often expressed in matrix/vector form:

    y = [ y1 ]      X = [ x1 ]   [ x1,1  · · ·  x1,p ]
        [  ⋮ ]          [  ⋮ ] = [   ⋮             ⋮  ]
        [ yn ]          [ xn ]   [ xn,1  · · ·  xn,p ]
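As a concrete illustration (hypothetical toy numbers; written in Python here, whereas the slides' own code later on is R), the stacked data are simply a vector y and a matrix X whose ith row is xi:

```python
import numpy as np

# Hypothetical toy data: n = 4 observations, p = 2 predictors.
# Each row of X is one subject's predictor vector x_i.
y = np.array([2.0, 3.5, 4.1, 5.0])      # outcome vector (y1, ..., yn)^T
X = np.array([[1.0, 0.5],
              [1.0, 1.0],
              [1.0, 1.5],
              [1.0, 2.0]])              # n x p design matrix

n, p = X.shape
print(n, p)  # 4 2
```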


FTO experiment

FTO gene is hypothesized to be involved in growth and obesity.

Experimental design:

• 10 fto+/− mice

• 10 fto−/− mice

• Mice are sacrificed at the end of 1-5 weeks of age.

• Two mice in each group are sacrificed at each age.


FTO Data

[Figure: scatterplot of weight (g) versus age (weeks), ages 1–5, with separate points for the fto−/− and fto+/− mice.]


Data analysis

• y = weight

• xg = fto heterozygote ∈ {0, 1} = number of “+” alleles

• xa = age in weeks ∈ {1, 2, 3, 4, 5}

How can we estimate p(y |xg , xa)?

Cell means model:

genotype   age 1   age 2   age 3   age 4   age 5
  −/−      θ0,1    θ0,2    θ0,3    θ0,4    θ0,5
  +/−      θ1,1    θ1,2    θ1,3    θ1,4    θ1,5

Problem: Only two observations per cell.


Linear regression

Solution: Assume smoothness as a function of age. For each group,

y = α0 + α1xa + ε

This is a linear regression model. Linearity means "linear in the parameters".

We could also try the model

y = α0 + α1xa + α2xa^2 + α3xa^3 + ε,

which is also a linear regression model.
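To see why the cubic model is still "linear", note that once the powers of xa are treated as columns of a design matrix, ordinary least squares applies unchanged. A minimal Python sketch (simulated data; the slides' own code is R):

```python
import numpy as np

rng = np.random.default_rng(0)
xa = np.linspace(1, 5, 20)                          # hypothetical ages
y = 1.0 + 2.0 * xa - 0.1 * xa**2 + rng.normal(0, 0.5, 20)

# Columns are 1, xa, xa^2, xa^3: the mean is nonlinear in xa but linear
# in the coefficients (alpha0, ..., alpha3), so OLS applies directly.
Xc = np.column_stack([xa**k for k in range(4)])
alpha, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print(alpha.shape)  # (4,)
```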


Multiple linear regression

We can estimate the regressions for both groups simultaneously:

Yi = β1xi,1 + β2xi,2 + β3xi,3 + β4xi,4 + εi , where

xi,1 = 1 for each subject i

xi,2 = 0 if subject i is homozygous, 1 if heterozygous

xi,3 = age of subject i

xi,4 = xi,2 × xi,3

Under this model,

E[Y |x] = β1 + β3 × age if x2 = 0, and

E[Y |x] = (β1 + β2) + (β3 + β4)× age if x2 = 1.
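This design matrix can be sketched in Python (a hypothetical layout with two mice per genotype at each age, mirroring the FTO design; coefficient values are illustrative):

```python
import numpy as np

age = np.tile(np.arange(1, 6), 4)     # ages 1..5, two mice per cell
geno = np.repeat([0, 1], 10)          # 0 = fto-/-, 1 = fto+/-

X = np.column_stack([np.ones(20),     # x1: intercept
                     geno,            # x2: genotype indicator
                     age,             # x3: age
                     geno * age])     # x4: genotype-by-age interaction

beta = np.array([-0.07, 2.94, 2.84, 1.73])   # illustrative coefficients
Ey = X @ beta

# Group -/- follows beta1 + beta3*age; group +/- follows
# (beta1 + beta2) + (beta3 + beta4)*age.
```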


Multiple linear regression

[Figure: panels of weight versus age showing the fitted model under the restrictions β3 = 0, β4 = 0 (no age effect); β2 = 0, β4 = 0 (no genotype effect); and β4 = 0 (no interaction).]


Normal linear regression

How does each Yi vary around E[Yi |β, xi ] ?

Assumption of normal errors:

ε1, . . . , εn ∼ i.i.d. normal(0, σ2)

Yi = βTxi + εi .

This completely specifies the probability density of the data:

p(y1, . . . , yn | x1, . . . , xn, β, σ2) = ∏_{i=1}^n p(yi | xi, β, σ2)

    = (2πσ2)^{−n/2} exp{ −(1/(2σ2)) ∑_{i=1}^n (yi − βTxi)^2 }.
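As a numerical sanity check (a Python sketch with simulated data; the names are illustrative), the log of this density is easy to evaluate, and it grows as the residuals shrink:

```python
import numpy as np

def log_lik(y, X, beta, s2):
    # log p(y | X, beta, sigma^2) for the normal regression model
    resid = y - X @ beta
    return -0.5 * len(y) * np.log(2 * np.pi * s2) - resid @ resid / (2 * s2)

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(0, 1, 50)

# The likelihood is larger at the data-generating coefficients than at a
# clearly wrong value, because the residual sum of squares is smaller.
print(log_lik(y, X, beta_true, 1.0) > log_lik(y, X, np.array([5.0, -3.0]), 1.0))  # True
```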


Matrix form

• Let y be the n-dimensional column vector (y1, . . . , yn)T ;

• Let X be the n × p matrix whose ith row is xi .

Then the normal regression model is that

{y|X,β, σ2} ∼ multivariate normal (Xβ, σ2I),

where I is the n × n identity matrix and

    Xβ = [ — x1 — ] [ β1 ]   [ β1x1,1 + · · · + βpx1,p ]   [ E[Y1|β, x1] ]
         [    ⋮    ] [  ⋮ ] = [            ⋮            ] = [      ⋮      ]
         [ — xn — ] [ βp ]   [ β1xn,1 + · · · + βpxn,p ]   [ E[Yn|β, xn] ]


Ordinary least squares estimation

What values of β are consistent with our data y,X?

Recall

p(y1, . . . , yn | x1, . . . , xn, β, σ2) = (2πσ2)^{−n/2} exp{ −(1/(2σ2)) ∑_{i=1}^n (yi − βTxi)^2 }.

This is big when SSR(β) = ∑ (yi − βTxi)^2 is small.

SSR(β) = ∑_{i=1}^n (yi − βTxi)^2 = (y − Xβ)T(y − Xβ)
       = yTy − 2βTXTy + βTXTXβ .

What value of β makes this the smallest?


Calculus

Recall from calculus that

1. a minimum of a function g(z) occurs at a value z such that d/dz g(z) = 0;

2. the derivative of g(z) = az is a and the derivative of g(z) = bz^2 is 2bz.

d/dβ SSR(β) = d/dβ ( yTy − 2βTXTy + βTXTXβ ) = −2XTy + 2XTXβ ,

Therefore,

d/dβ SSR(β) = 0  ⇔  −2XTy + 2XTXβ = 0
             ⇔  XTXβ = XTy
             ⇔  β = (XTX)−1XTy .

β̂ols = (XTX)−1XTy is the OLS estimator of β.
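The derivation can be checked numerically. A Python sketch with simulated data: solving the normal equations XTXβ = XTy agrees with a generic least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.3, 30)

# beta_ols = (X^T X)^{-1} X^T y; solve() avoids forming the inverse explicitly
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# the same fit via a least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_ols, beta_lstsq))  # True
```

In practice a QR- or SVD-based solver is preferred numerically over forming XTX, but both give the same estimator here.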


OLS estimation in R

### OLS estimate
beta.ols<- solve( t(X)%*%X )%*%t(X)%*%y
c(beta.ols)

## [1] -0.06822  2.94485  2.84421  1.72948

### using lm
fit.ols<-lm(y~ X[,2] + X[,3] + X[,4] )
summary(fit.ols)$coef

##              Estimate Std. Error  t value  Pr(>|t|)
## (Intercept) -0.06822     1.4223 -0.04796 9.623e-01
## X[, 2]       2.94485     2.0114  1.46406 1.625e-01
## X[, 3]       2.84421     0.4288  6.63235 5.761e-06
## X[, 4]       1.72948     0.6065  2.85171 1.154e-02


OLS estimation

[Figure: weight versus age with the fitted OLS regression lines for the two genotype groups.]

summary(fit.ols)$coef

##              Estimate Std. Error  t value  Pr(>|t|)
## (Intercept) -0.06822     1.4223 -0.04796 9.623e-01
## X[, 2]       2.94485     2.0114  1.46406 1.625e-01
## X[, 3]       2.84421     0.4288  6.63235 5.761e-06
## X[, 4]       1.72948     0.6065  2.85171 1.154e-02


Bayesian inference for regression models

yi = β1xi,1 + · · ·+ βpxi,p + εi

Motivation:

• Posterior probability statements: Pr(βj > 0|y,X)

• OLS tends to overfit when p is large; Bayesian estimates are more conservative.

• Model selection and averaging


Prior and posterior distribution

prior            β ∼ mvn(β0, Σ0)
sampling model   y ∼ mvn(Xβ, σ2I)
posterior        β | y, X ∼ mvn(βn, Σn)

where

Σn = Var[β | y, X, σ2] = (Σ0−1 + XTX/σ2)−1

βn = E[β | y, X, σ2] = (Σ0−1 + XTX/σ2)−1 (Σ0−1β0 + XTy/σ2).

Notice:

• If Σ0−1 ≪ XTX/σ2 (weak prior), then βn ≈ β̂ols.

• If Σ0−1 ≫ XTX/σ2 (strong prior), then βn ≈ β0.
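Both limiting cases are easy to verify numerically. A Python sketch (simulated data, illustrative names) computing Σn and βn from the formulas above:

```python
import numpy as np

def posterior_beta(y, X, s2, beta0, Sigma0):
    """Posterior mean and covariance of beta for known sigma^2."""
    Prec0 = np.linalg.inv(Sigma0)
    Sigma_n = np.linalg.inv(Prec0 + X.T @ X / s2)
    beta_n = Sigma_n @ (Prec0 @ beta0 + X.T @ y / s2)
    return beta_n, Sigma_n

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(40), rng.normal(size=40)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, 40)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Weak prior (tiny prior precision): posterior mean is close to OLS.
bn_weak, _ = posterior_beta(y, X, 1.0, np.zeros(2), 1e8 * np.eye(2))
# Strong prior (huge prior precision): posterior mean is close to beta0 = 0.
bn_strong, _ = posterior_beta(y, X, 1.0, np.zeros(2), 1e-8 * np.eye(2))
```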


The g-prior

How to pick β0,Σ0?

g-prior: β ∼ mvn(0, gσ2(XTX)−1)

Idea: The variance of the OLS estimate β̂ols is

Var[β̂ols] = σ2(XTX)−1 = (σ2/n)(XTX/n)−1

This is roughly the uncertainty in β from n observations.

Var[β]gprior = gσ2(XTX)−1 = (σ2/(n/g))(XTX/n)−1

The g-prior can roughly be viewed as the uncertainty from n/g observations.

For example, g = n means the prior has the same amount of info as 1 obs.


Posterior distributions under the g-prior

{β | y, X, σ2} ∼ mvn(βn, Σn)

Σn = Var[β | y, X, σ2] = (g/(g+1)) σ2(XTX)−1

βn = E[β | y, X, σ2] = (g/(g+1)) (XTX)−1XTy

Notes:

• The posterior mean estimate βn is simply (g/(g+1)) β̂ols.

• The posterior variance of β is simply (g/(g+1)) Var[β̂ols].

• g shrinks the coefficients and can prevent overfitting to the data.

• If g = n, then as n increases, inference approximates that using β̂ols.
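The shrinkage factor g/(g+1) is worth checking directly. A Python sketch with the unit-information choice g = n (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, n)

g = n                                           # "unit information" choice
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_n = (g / (g + 1)) * beta_ols               # posterior mean under g-prior

# g/(g+1) < 1, so every coefficient is pulled toward the prior mean 0;
# as n (and hence g) grows, g/(g+1) -> 1 and beta_n -> beta_ols.
print(round(g / (g + 1), 3))  # 0.952
```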


Monte Carlo simulation

What about the error variance σ2?

prior            1/σ2 ∼ gamma(ν0/2, ν0σ0^2/2)
sampling model   y ∼ mvn(Xβ, σ2I)
posterior        1/σ2 | y, X ∼ gamma([ν0 + n]/2, [ν0σ0^2 + SSRg]/2)

where SSRg is somewhat complicated.

Simulating the joint posterior distribution:

joint distribution   p(σ2, β | y, X) = p(σ2 | y, X) × p(β | y, X, σ2)
simulation           {σ2, β} ∼ p(σ2, β | y, X)  ⇔  σ2 ∼ p(σ2 | y, X), β ∼ p(β | y, X, σ2)

To simulate {σ2,β} ∼ p(σ2,β|y,X),

1. First simulate σ2 from p(σ2|y,X)

2. Use this σ2 to simulate β from p(β|y,X, σ2)

Repeat 1000’s of times to obtain MC samples: {σ2,β}(1), . . . , {σ2,β}(S).
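This two-step composition sampler can be sketched in Python (numpy; the slides implement the same idea in R on the next page, and the simulated data here are illustrative):

```python
import numpy as np

def sample_posterior(y, X, g, nu0, s20, S=1000, seed=0):
    """Monte Carlo draws of (sigma^2, beta): first sigma^2 from its
    marginal posterior, then beta given that sigma^2 (g-prior setup)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    Hg = (g / (g + 1)) * X @ XtX_inv @ X.T
    SSRg = y @ (np.eye(n) - Hg) @ y
    Vfac = (g / (g + 1)) * XtX_inv          # Sigma_n / sigma^2
    Ebeta = Vfac @ X.T @ y
    # numpy's gamma is parameterized by shape and *scale* (= 1/rate)
    s2 = 1 / rng.gamma((nu0 + n) / 2, 2 / (nu0 * s20 + SSRg), size=S)
    beta = np.array([rng.multivariate_normal(Ebeta, s * Vfac) for s in s2])
    return s2, beta

rng = np.random.default_rng(10)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1, 20)
s2_post, beta_post = sample_posterior(y, X, g=20, nu0=1, s20=1.0)
print(s2_post.shape, beta_post.shape)  # (1000,) (1000, 2)
```

Because each β draw conditions on its own σ2 draw, the pairs are exact samples from the joint posterior, not from the two marginals independently.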


FTO example

Priors:

1/σ2 ∼ gamma(1/2, 3.6781/2)

β|σ2 ∼ mvn(0, g × σ2(XTX)−1)

Posteriors:

{1/σ2|y,X} ∼ gamma((1 + 20)/2, (3.6781 + 251.7753)/2)

{β|Y,X, σ2} ∼ mvn(.952× β̂ols, .952× σ2(XTX)−1)

where

(XTX)−1 = [  0.55  −0.55  −0.15   0.15
           −0.55   1.10   0.15  −0.30
           −0.15   0.15   0.05  −0.05
            0.15  −0.30  −0.05   0.10 ]

β̂ols = (−0.0682, 2.9449, 2.8442, 1.7295)T


R-code

## data dimensions
n<-dim(X)[1] ; p<-dim(X)[2]

## prior parameters
nu0<-1
s20<-summary(lm(y~-1+X))$sigma^2
g<-n

## posterior calculations
Hg<- (g/(g+1)) * X%*%solve(t(X)%*%X)%*%t(X)
SSRg<- t(y)%*%( diag(1,nrow=n) - Hg )%*%y

Vbeta<- g*solve(t(X)%*%X)/(g+1)
Ebeta<- Vbeta%*%t(X)%*%y

## simulate sigma^2 and beta
## (rmvnorm is from the mvtnorm package)
library(mvtnorm)
s2.post<-beta.post<-NULL
for(s in 1:5000){
  s2.post<-c(s2.post, 1/rgamma(1, (nu0+n)/2, (nu0*s20+SSRg)/2 ) )
  beta.post<-rbind(beta.post, rmvnorm(1, Ebeta, s2.post[s]*Vbeta))
}


MC approximation to posterior

s2.post[1:5]

## [1] 9.737 13.002 15.284 14.528 14.818

beta.post[1:5,]

##        [,1]    [,2]  [,3]  [,4]
## [1,]  1.701  1.2066 1.649 2.841
## [2,] -1.868  1.2554 3.216 1.975
## [3,]  1.032  1.5555 1.909 2.338
## [4,]  3.351 -1.3819 2.401 2.364
## [5,]  1.486 -0.6652 2.032 2.977


MC approximation to posterior

quantile(s2.post,probs=c(.025,.5,.975))

##   2.5%    50%  97.5%
##  7.163 12.554 24.774

quantile(sqrt(s2.post),probs=c(.025,.5,.975))

##  2.5%   50% 97.5%
## 2.676 3.543 4.977

apply(beta.post,2,quantile,probs=c(.025,.5,.975))

##           [,1]   [,2]  [,3]    [,4]
## 2.5%  -5.26996 -4.840 1.065 -0.5929
## 50%   -0.01051  2.698 2.678  1.6786
## 97.5%  5.20650  9.992 4.270  3.9071


OLS/Bayes comparison

apply(beta.post,2,mean)

## [1] 0.0133 2.7080 2.6796 1.6736

apply(beta.post,2,sd)

## [1] 2.6637 3.7726 0.8055 1.1429

summary(fit.ols)$coef

##      Estimate Std. Error  t value  Pr(>|t|)
## X   -0.06822     1.4223 -0.04796 9.623e-01
## Xxg  2.94485     2.0114  1.46406 1.625e-01
## Xxa  2.84421     0.4288  6.63235 5.761e-06
## X    1.72948     0.6065  2.85171 1.154e-02


Posterior distributions

[Figure: marginal posterior densities of β2 and β4, and a scatterplot of their joint posterior samples.]


Summarizing the genetic effect

Genetic effect = E[y | age, +/−] − E[y | age, −/−]
               = [(β1 + β2) + (β3 + β4) × age] − [β1 + β3 × age]
               = β2 + β4 × age
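Given Monte Carlo draws of β like the beta.post samples above, posterior summaries of the genetic effect at each age follow directly. A Python sketch with hypothetical draws standing in for the actual samples:

```python
import numpy as np

rng = np.random.default_rng(5)
# Stand-in posterior draws for (beta1, ..., beta4); a real analysis would
# reuse the beta.post samples simulated earlier instead.
beta_post = rng.normal([0.0, 2.7, 2.7, 1.7],
                       [2.7, 3.8, 0.8, 1.1], size=(5000, 4))

ages = np.arange(1, 6)
# genetic effect beta2 + beta4 * age, one column per age
effect = beta_post[:, 1:2] + beta_post[:, 3:4] * ages
bands = np.quantile(effect, [0.025, 0.5, 0.975], axis=0)   # pointwise bands
print(bands.shape)  # (3, 5)
```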

[Figure: posterior distribution of the genetic effect β2 + β4 × age plotted against age (1–5 weeks).]
