Ridge regression and Bayesian linear regression Kenneth D. Harris 6/5/15




Page 1

Ridge regression and Bayesian linear regression

Kenneth D. Harris

6/5/15

Page 2

Multiple linear regression

What are you predicting? Continuous data, dimensionality 1.

What are you predicting it from? Continuous data, dimensionality p.

How many data points do you have? Enough.

What sort of prediction do you need? A single best guess.

What sort of relationship can you assume? Linear.

Page 3

Multiple linear regression

What are you predicting? Continuous data, dimensionality 1.

What are you predicting it from? Continuous data, dimensionality p.

How many data points do you have? Not enough.

What sort of prediction do you need? A single best guess.

What sort of relationship can you assume? Linear.

Page 4

Multiple predictors, one predicted variable

• Choose w to minimize the sum-squared error: E = |y − Xw|²

Optimal weight vector: ŵ = (XᵀX)⁻¹Xᵀy (in MATLAB: w = X\y)
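A minimal numpy sketch of this least-squares solution (not from the slides; the data and names here are illustrative). It checks that the normal-equations formula ŵ = (XᵀX)⁻¹Xᵀy agrees with a direct least-squares solve, which is what MATLAB's backslash does:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))            # N = 100 points, p = 3 predictors
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(100)

# Normal-equations solution: w = (X'X)^{-1} X'y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, numerically safer least-squares solve (MATLAB's w = X\y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal)
```

With plenty of data and little noise, both routes recover weights close to w_true.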

Page 5

Too many predictors

• If p ≥ N, you can fit the training data perfectly

• y = Xw is N equations in p unknowns

• If p > N, the solution is underconstrained (XᵀX is not invertible)

• But even if p < N, you can have problems with too many predictors

Page 6

[Figure: fit with N = 40, p = 30, y = x₁]

Page 7

[Figure: fit with N = 40, p = 30, y = x₁ + noise]

Page 8

[Figure: fit with N = 40, p = 30, y = x₁ + noise]
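The experiment on these slides can be sketched in numpy. This is an assumed reconstruction of the setup (the noise level and seed are my choices, not from the slides): N = 40 samples, p = 30 random predictors, target y = x₁ + noise. The training fit looks good, but least squares spreads weight over many irrelevant predictors:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 40, 30
X = rng.standard_normal((N, p))
y = X[:, 0] + 0.5 * rng.standard_normal(N)   # only x_1 matters

# Ordinary least squares with almost as many predictors as points
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Training error is small, but weights on the 29 irrelevant
# predictors are far from zero: the model is fitting noise.
train_err = np.mean((y - X @ w) ** 2)
print(train_err, np.abs(w[1:]).max())
```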

Page 9

Geometric interpretation

[Figure: the target vector decomposed into signal and noise components, shown in the plane spanned by the predictors x₁ and x₂]

The target can be fit exactly, by putting a massive positive weight on one of the predictors and a massive negative weight on the other.

It would be better to just fit the signal.


Page 11

Overfitting = large weight vectors

• Solution: add a weight vector penalty: E = |y − Xw|² + λ|w|²

Optimal weight vector: ŵ = (XᵀX + λI)⁻¹Xᵀy

The inverse can always be taken, even for p > N.
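A quick numpy sketch of this point (illustrative data, not from the slides): with p > N, XᵀX is singular, but XᵀX + λI is invertible and the ridge solution exists:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, lam = 20, 50, 3.0                      # more predictors than points
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

XtX = X.T @ X
# rank(X'X) <= N < p, so the plain inverse does not exist...
print(np.linalg.matrix_rank(XtX))

# ...but adding lam*I makes the matrix positive definite and solvable.
w_ridge = np.linalg.solve(XtX + lam * np.eye(p), X.T @ y)
print(np.linalg.norm(w_ridge))
```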

Page 12

Example

[Figure: fits with λ = 0 and λ = 3]

Page 13

Ridge regression introduces a bias

[Figure: fits with λ = 0 and λ = 50]

Page 14

A quick trick to do ridge regression

• Ordinary linear regression minimizes |y − Xw|². Define the augmented data

X̃ = [X; √λ I],  ỹ = [y; 0]

Then the ordinary least-squares solution for (X̃, ỹ) is the solution to ridge regression. (Why?)
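This trick is easy to verify numerically. A numpy sketch (illustrative data and names): stack √λ·I below X and zeros below y, run ordinary least squares on the padded data, and compare with the closed-form ridge solution:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, lam = 30, 10, 2.0
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

# Augmented data: |y_aug - X_aug w|^2 = |y - Xw|^2 + lam*|w|^2
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])

# Ordinary least squares on the padded data...
w_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

# ...equals the closed-form ridge solution.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(w_aug, w_ridge))
```

The "why" is visible in the comment: the padded rows contribute exactly the penalty term λ|w|² to the squared error.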

Page 15

Regression as a probability model

What are you predicting? Continuous data, dimensionality 1.

What are you predicting it from? Continuous data, dimensionality p.

How many data points do you have? Enough.

What sort of prediction do you need? A probability distribution.

What sort of relationship can you assume? Linear.

Page 16

Regression as a probability model

• Assume y is random, but X and w are just numbers.

Then the likelihood is

P(y|X, w) ∝ exp(−|y − Xw|² / 2σ²)

Maximum likelihood is the same as the least-squares fit.

Page 17

Bayesian linear regression

• Now consider w to also be random, with prior distribution

P(w) ∝ exp(−λ|w|² / 2σ²)

The posterior distribution is

P(w|y, X) ∝ P(y|X, w) P(w) ∝ exp(−(|y − Xw|² + λ|w|²) / 2σ²)

Page 18

Bayesian linear regression

This is all quadratic in w. So the posterior of w is Gaussian distributed.

Page 19

Bayesian linear regression

The mean of w, (XᵀX + λI)⁻¹Xᵀy, is exactly the same as in ridge regression. But we also get a covariance matrix for w: σ²(XᵀX + λI)⁻¹.
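A numpy sketch of this posterior (illustrative data; the prior scale σ²/λ is the assumption that makes the posterior mean coincide with ridge): the mean is the ridge estimate, and the covariance is the extra information the Bayesian view provides:

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, lam, sigma2 = 50, 5, 2.0, 0.5
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

# Posterior over w is Gaussian:
#   mean = (X'X + lam*I)^{-1} X'y   (exactly the ridge estimate)
#   cov  = sigma2 * (X'X + lam*I)^{-1}
A = X.T @ X + lam * np.eye(p)
w_mean = np.linalg.solve(A, X.T @ y)
w_cov = sigma2 * np.linalg.inv(A)
print(w_mean)
```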

Page 20

Bayesian predictions

• Given a training set (X, y), and a new value x*. Assume y is random but X and x* are fixed.

• To make a prediction of y*, integrate over all possible w:

P(y*|x*, X, y) = ∫ P(y*|x*, w) P(w|X, y) dw

The mean is the same as in ridge regression, but we also get a variance: σ² + σ² x*ᵀ(XᵀX + λI)⁻¹x*.

The variance does not depend on the training targets y. It is low when many of the training-set vectors xᵢ are collinear with x*.
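A numpy sketch of the predictive distribution under the same assumed setup (data and names are illustrative): the mean reuses the ridge weights, and the variance formula contains X and x* but never y:

```python
import numpy as np

rng = np.random.default_rng(5)
N, p, lam, sigma2 = 50, 5, 2.0, 0.5
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)
x_star = rng.standard_normal(p)              # new point to predict at

A = X.T @ X + lam * np.eye(p)
w_mean = np.linalg.solve(A, X.T @ y)
w_cov = sigma2 * np.linalg.inv(A)

pred_mean = x_star @ w_mean                  # same as the ridge prediction
pred_var = sigma2 + x_star @ w_cov @ x_star  # noise + weight uncertainty

# Note pred_var uses only X and x_star: changing y leaves it unchanged.
print(pred_mean, pred_var)
```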