
10-1

Lecture 10

Multiple Linear Regression

STAT 512

Spring 2011

Background Reading

KNNL: 6.1-6.5

10-2

Topic Overview

• Multiple Linear Regression Model

10-3

Data for Multiple Regression

• $Y_i$ is the response variable (as usual)

• $X_{i1}, X_{i2}, \ldots, X_{i,p-1}$ are the $p-1$ explanatory variables for cases $i = 1, 2, \ldots, n$

• Example – In HW #2, you considered predicting freshman GPA based on ACT scores. Perhaps we consider high school GPA and an intelligence test as well. Then there are three explanatory variables, so for this problem we would be using $p = 4$ (three slopes plus the intercept).

10-4

Multicollinearity

• Predictor variables are often correlated with each other.

• If predictor variables are highly correlated, they will be “fighting” to explain the same part of the variation in the response variable.

• Caution: Using highly correlated predictor variables in the same model tends to give unstable parameter estimates with large standard errors, so the individual estimates are hard to interpret. Be careful of this.
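
The effect of correlated predictors can be illustrated numerically. The sketch below assumes simulated data (the sample size, seed, and correlation values are arbitrary choices, not from the lecture) and uses the standard result that the variance of each estimated coefficient is $\sigma^2$ times a diagonal entry of $(\mathbf{X}'\mathbf{X})^{-1}$. The helper `xtx_inv_diag` is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical sample size


def xtx_inv_diag(rho):
    """Diagonal of (X'X)^{-1} when x1 and x2 have population correlation rho."""
    x1 = rng.standard_normal(n)
    # Build x2 so that corr(x1, x2) is approximately rho.
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x1, x2])
    return np.diag(np.linalg.inv(X.T @ X))


weak = xtx_inv_diag(0.10)
strong = xtx_inv_diag(0.99)

# Entries 1 and 2 are the variance factors for the two slopes.
print("slope variance factors, rho = 0.10:", weak[1:])
print("slope variance factors, rho = 0.99:", strong[1:])
```

With the highly correlated pair, the slope variance factors should come out much larger, which is one way to see why the estimates become unreliable.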

10-5

Multiple Regression Model

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$

• $i = 1, 2, \ldots, n$ observations

• Assumptions exactly as before: $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$

• $Y_i$ is the value of the response variable for the ith case.

• $X_{ik}$ is the value of the kth explanatory variable for the ith case.

10-6

Multiple Regression Model (2)

• $\beta_0$ is the intercept (think multidimensional).

• $\beta_1, \beta_2, \ldots, \beta_{p-1}$ are the regression (slope) coefficients for the explanatory variables.

• Parameters as usual include all of the $\beta$'s as well as $\sigma^2$. These need to be estimated from the data.

10-7

Regression Plane/Surface

10-8

Model in Matrix Form

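$$\mathbf{Y}_{n \times 1} = \mathbf{X}_{n \times p}\,\boldsymbol{\beta}_{p \times 1} + \boldsymbol{\varepsilon}_{n \times 1}$$

$$\boldsymbol{\varepsilon} \sim N\left(\mathbf{0}, \sigma^2 \mathbf{I}_{n \times n}\right)$$

$$\mathbf{Y} \sim N\left(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}\right)$$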

10-9

Design matrix X

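$$\mathbf{X}_{n \times p} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix}$$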
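
As a small illustration of the design matrix, the sketch below stacks a column of ones next to three predictor columns. The data values and variable names (`act`, `hs_gpa`, `iq`) are made up for illustration; they only mirror the GPA example with $p = 4$.

```python
import numpy as np

# Hypothetical data: n = 5 students, p - 1 = 3 explanatory variables.
act = np.array([21.0, 28.0, 25.0, 30.0, 19.0])
hs_gpa = np.array([3.1, 3.8, 3.4, 3.9, 2.7])
iq = np.array([101.0, 118.0, 110.0, 125.0, 97.0])

# The first column of ones corresponds to the intercept beta_0.
X = np.column_stack([np.ones(len(act)), act, hs_gpa, iq])

n, p = X.shape
print("n =", n, "p =", p)
print(X)
```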

10-10

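Coefficient matrix $\boldsymbol{\beta}$

$$\boldsymbol{\beta}_{p \times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}$$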

10-11

Least Squares Solution

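• Minimize the sum of squared vertical distances between the points and the response surface

• Find $\mathbf{b}$ to minimize

$$SSE = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b})$$

• Obtain normal equations as before:

$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}$$

• Least Squares Solution as before:

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$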
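
A minimal numerical sketch of the least squares solution follows, assuming simulated data with made-up true coefficients. It solves the normal equations directly and compares the result with NumPy's `np.linalg.lstsq`, which is usually the numerically safer route in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
beta = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients

# Design matrix: intercept column plus two simulated predictors.
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
Y = X @ beta + rng.normal(scale=0.3, size=n)

# Solve the normal equations X'X b = X'Y without forming the inverse.
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Same estimate from a least squares routine.
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("b (normal equations):", b)
print("b (lstsq):           ", b_lstsq)
```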

10-12

Fitted Values / Residuals

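• Fitted (predicted) values for the mean of Y are

$$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{H}\mathbf{Y}$$

• Residuals are

$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{H}\mathbf{Y} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$$

• Note formulas are same as before, with hat matrix:

$$\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$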
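
The hat matrix formulas can be checked numerically. This sketch assumes simulated data (sizes and coefficients are arbitrary) and verifies a few standard properties: $\mathbf{H}$ is idempotent, its trace equals $p$, and the residuals are orthogonal to the columns of $\mathbf{X}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

Y_hat = H @ Y                # fitted values
e = (np.eye(n) - H) @ Y      # residuals

b = np.linalg.solve(X.T @ X, X.T @ Y)
print("HY equals Xb:      ", np.allclose(Y_hat, X @ b))
print("H is idempotent:   ", np.allclose(H @ H, H))
print("trace(H) equals p: ", np.isclose(np.trace(H), p))
print("X'e is zero:       ", np.allclose(X.T @ e, 0.0, atol=1e-8))
```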

10-13

“Linear” Regression Models

• The term linear here refers to the

parameters, not the predictor variables.

• We can use linear regression models to deal with almost any “function” of a predictor variable (e.g. $X^2$, $\log(X)$, etc.)

• We cannot use linear regression models to

deal with nonlinear functions of the

parameters (unless we can find a

transformation that makes them linear).
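
To illustrate "linear in the parameters", the sketch below fits a model whose columns are $X^2$ and $\log(X)$. The data are simulated and the true coefficients are invented for the example; the point is only that transformed predictors enter the design matrix as ordinary columns.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1.0, 10.0, size=n)  # positive, so log(x) is defined

# Hypothetical truth: nonlinear in x, but linear in the betas.
Y = 2.0 + 0.5 * x**2 + 3.0 * np.log(x) + rng.normal(scale=0.2, size=n)

# Transformed predictors are just columns of the design matrix.
X = np.column_stack([np.ones(n), x**2, np.log(x)])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("estimates of (beta0, beta1, beta2):", b)
```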

10-14

Types of Predictors

• Continuous Predictors – we are used to

these.

• Qualitative Predictors

  ◦ Two possible outcomes (e.g. male/female) represented by 0 or 1

• Polynomial Regression

  ◦ Use squared or higher-order terms in the regression model.

  ◦ Typically include all lower-order terms as well.

  ◦ $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_{p-1} X_i^{p-1} + \varepsilon_i$
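
A polynomial regression can be set up by putting powers of the predictor in the design matrix. The sketch below assumes simulated data and a quadratic truth chosen for illustration, and uses `np.vander` with `increasing=True` so the columns are $1, X, X^2$ (lower-order terms included).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x = rng.uniform(-2.0, 2.0, size=n)

# Hypothetical quadratic relationship.
Y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=n)

degree = 2
# Columns are x^0, x^1, ..., x^degree.
X = np.vander(x, N=degree + 1, increasing=True)

b, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("estimates of (beta0, beta1, beta2):", b)
```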

10-15

Types of Predictors (2)

• Using Transformed Variables

  ◦ Transform one or more X’s

  ◦ Transform Y

• Interaction Effects

  ◦ Use the product of predictor variables as an additional variable.

  ◦ Each variable in the product is included by itself as well.

  ◦ $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$

• More on some of these models later...
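
The interaction model can be sketched with one continuous predictor and one 0/1 indicator. Everything here is simulated and the coefficients are invented; the example shows that $\beta_3$ acts as the difference in slope between the two groups.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 120
x1 = rng.uniform(0.0, 10.0, size=n)             # continuous predictor
x2 = rng.integers(0, 2, size=n).astype(float)   # 0/1 qualitative predictor

# Hypothetical truth: slope 0.8 when x2 = 0, slope 0.8 - 0.5 = 0.3 when x2 = 1.
Y = 1.0 + 0.8 * x1 + 2.0 * x2 - 0.5 * x1 * x2 + rng.normal(scale=0.3, size=n)

# Both main effects are included along with their product.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

slope_group0 = b[1]          # slope of x1 when x2 = 0
slope_group1 = b[1] + b[3]   # slope of x1 when x2 = 1
print("slope when x2 = 0:", slope_group0)
print("slope when x2 = 1:", slope_group1)
```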

10-16

Analysis of Variance

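Formulas for sums of squares (in matrix terms) are the same as before, where $\mathbf{J}$ is the $n \times n$ matrix of ones:

$$SSR = \sum (\hat{Y}_i - \bar{Y})^2 = \mathbf{b}'\mathbf{X}'\mathbf{Y} - \frac{1}{n}\mathbf{Y}'\mathbf{J}\mathbf{Y}$$

$$SSE = \sum (Y_i - \hat{Y}_i)^2 = \mathbf{e}'\mathbf{e} = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$$

$$SSTO = \sum (Y_i - \bar{Y})^2 = \mathbf{Y}'\mathbf{Y} - \frac{1}{n}\mathbf{Y}'\mathbf{J}\mathbf{Y}$$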
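
The summation and matrix forms of the sums of squares can be compared numerically. This sketch assumes simulated data and simply checks that each pair of expressions agrees and that $SSTO = SSR + SSE$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b
e = Y - Y_hat

J = np.ones((n, n))            # n x n matrix of ones
correction = Y @ J @ Y / n     # (1/n) Y'JY, equal to n * Ybar^2

# Summation forms
SSR_sum = np.sum((Y_hat - Y.mean()) ** 2)
SSE_sum = np.sum((Y - Y_hat) ** 2)
SSTO_sum = np.sum((Y - Y.mean()) ** 2)

# Matrix forms
SSR_mat = b @ X.T @ Y - correction
SSE_mat = Y @ Y - b @ X.T @ Y
SSTO_mat = Y @ Y - correction

print("SSR :", SSR_sum, SSR_mat)
print("SSE :", SSE_sum, SSE_mat)
print("SSTO:", SSTO_sum, SSTO_mat)
```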

10-17

Analysis of Variance (2)

• Degrees of Freedom depend on the model

• Always n – 1 total degrees of freedom

• Model degrees of freedom is equal to the number of terms in the model, $p-1$

  ◦ Each variable has at least one term

  ◦ May be additional terms for squares, interactions, etc.

• Error degrees of freedom is the difference between total and model degrees of freedom, $n-p$.

10-18

Analysis of Variance (3)

• Mean Squares obtained by dividing SS by

DF for each source.

• The mean square error (MSE) is still, always, and forever, the estimate of $\sigma^2$.

10-19

ANOVA Table

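| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression (Model) | $p-1$ | $\sum (\hat{Y}_i - \bar{Y})^2$ | $SSR / df_R$ | $MSR / MSE$ |
| Error | $n-p$ | $\sum (Y_i - \hat{Y}_i)^2$ | $SSE / df_E$ | |
| Total | $n-1$ | $\sum (Y_i - \bar{Y})^2$ | $SSTO / df_T$ | |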

10-20

F-test for model significance

• The ratio F = MSR / MSE is again used to

test for a regression relationship.

• Difference from SLR

  ◦ Null Hyp: $H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$

  ◦ Alt Hyp: $H_a:$ at least one $\beta_k \neq 0$

• Tests model significance, not individual

variables; gives no indication of which

variable(s) in particular are important

10-21

F-test for model significance (2)

• Under the null hypothesis, the statistic has an F-distribution with $p-1$ and $n-p$ degrees of freedom.

• Reject if the statistic is larger than the critical value for $\alpha = 0.05$, or if the p-value for the test (given in the SAS ANOVA table) is less than 0.05.

• If we reject, conclude that at least one of the explanatory variables is important.

• If we fail to reject, and the sample size is large enough (power), then there is no evidence that any of the explanatory variables are useful.
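
The ANOVA quantities and the F statistic can be computed directly from the fit. The sketch below assumes simulated data with invented coefficients and uses NumPy only, so it stops at the F statistic; obtaining a p-value would additionally require the F-distribution (for example from a statistics library or the SAS output).

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 60, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta = np.array([1.0, 0.5, 0.0, -0.3])  # hypothetical true coefficients
Y = X @ beta + rng.normal(scale=1.0, size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b

SSR = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
SSTO = np.sum((Y - Y.mean()) ** 2)

df_R, df_E, df_T = p - 1, n - p, n - 1
MSR = SSR / df_R
MSE = SSE / df_E   # estimate of sigma^2
F = MSR / MSE      # compare to an F(p - 1, n - p) critical value

print(f"Regression: df={df_R}, SS={SSR:.3f}, MS={MSR:.3f}, F={F:.3f}")
print(f"Error:      df={df_E}, SS={SSE:.3f}, MS={MSE:.3f}")
print(f"Total:      df={df_T}, SS={SSTO:.3f}")
```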

10-22

Coefficient of Multiple Determination

• $R^2 = \dfrac{SSR}{SSTO} = 1 - \dfrac{SSE}{SSTO}$

• Measures the proportion of variation in the response explained by the variables in the model.

• Adding variables can never make $R^2$ go down, so we cannot really use $R^2$ to determine whether a variable should be added.

10-23

Adjusted $R^2$

• $R_a^2 = 1 - \dfrac{MSE}{MSTO} = 1 - \left(\dfrac{n-1}{n-p}\right)\dfrac{SSE}{SSTO}$

• Recall mean squares are SS adjusted by degrees of freedom

• $R_a^2$ can increase or decrease when a new variable is introduced into the model, depending on whether the decrease in SSE is offset by the lost degree of freedom.

• $R_a^2$ can be used to decide if variables are important in a model.
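
The behavior of $R^2$ and $R_a^2$ when a variable is added can be seen in a small simulation. The sketch below is an assumption-laden illustration: the data are simulated, `junk` is a predictor generated independently of the response, and `r2_and_adjusted` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50
x1 = rng.standard_normal(n)
junk = rng.standard_normal(n)  # generated independently of Y
Y = 1.0 + 2.0 * x1 + rng.normal(scale=1.0, size=n)


def r2_and_adjusted(X, Y):
    """Return (R^2, adjusted R^2) for the least squares fit of Y on X."""
    n_obs, p = X.shape
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    SSE = np.sum((Y - X @ b) ** 2)
    SSTO = np.sum((Y - Y.mean()) ** 2)
    R2 = 1.0 - SSE / SSTO
    R2_adj = 1.0 - (n_obs - 1) / (n_obs - p) * SSE / SSTO
    return R2, R2_adj


X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, junk])

R2_small, R2a_small = r2_and_adjusted(X_small, Y)
R2_big, R2a_big = r2_and_adjusted(X_big, Y)

print("without junk: R2 =", R2_small, " adjusted =", R2a_small)
print("with junk:    R2 =", R2_big, " adjusted =", R2a_big)
```

$R^2$ for the larger model is never smaller than for the nested smaller one; whether $R_a^2$ rises or falls depends on the particular sample.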

10-24

Inference for INDIVIDUAL Regression Coefficients

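• We already have $\mathbf{b} \sim N\left(\boldsymbol{\beta}, \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\right)$, so define

$$\mathbf{s}^2\{\mathbf{b}\}_{p \times p} = MSE \times (\mathbf{X}'\mathbf{X})^{-1}$$

• For individual $b_k$, the estimated variance is the kth diagonal element of this matrix:

$$s^2\{b_k\} = \left[\mathbf{s}^2\{\mathbf{b}\}\right]_{k,k}$$

Note: $k = 0, 1, \ldots, p-1$.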
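
The estimated covariance matrix and the standard errors can be computed in a few lines. This is a sketch on simulated data (sizes and coefficients are arbitrary); the names `s2_b` and `se_b` are chosen here to mirror the notation above.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 80, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.7, size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b
MSE = e @ e / (n - p)

# Estimated covariance matrix of b: MSE * (X'X)^{-1}, a p x p matrix.
s2_b = MSE * np.linalg.inv(X.T @ X)

# Standard error of each b_k is the square root of the kth diagonal element.
se_b = np.sqrt(np.diag(s2_b))

for k in range(p):
    print(f"b_{k} = {b[k]:.4f}, s{{b_{k}}} = {se_b[k]:.4f}")
```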

10-25

Confidence Intervals for $\beta_k$

• CI for $\beta_k$ is $b_k \pm t_{crit}\, s\{b_k\}$

• Critical value comes from t-distribution with $n-p$ degrees of freedom (DF for error)

• If CI includes zero, then we cannot reject $H_0: \beta_k = 0$ (i.e. that variable is not significant when added to the model containing all of the other variables.)

10-26

Significance Test for $\beta_k$

• This is a $t$ test of $H_0: \beta_k = 0$; it tests whether the kth explanatory variable is important when added to all of the other variables in the model (i.e. it is a conditional test).

• Test statistic is as before: $t^* = b_k / s\{b_k\}$

• Compare to the t-critical value on $n - p$ degrees of freedom.

• This is the test given in the model

parameters section of SAS for PROC REG.
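
The confidence interval and the test statistic for each coefficient can be sketched together. The data are simulated, and the critical value `t_crit = 1.985` is an assumed approximate table value for a two-sided 95% interval with $n - p = 97$ degrees of freedom; in practice it would come from a t table or software.

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(scale=1.0, size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b
MSE = e @ e / (n - p)
se_b = np.sqrt(np.diag(MSE * np.linalg.inv(X.T @ X)))

# Assumed approximate t critical value (alpha = 0.05, two-sided, 97 df).
t_crit = 1.985

lower = b - t_crit * se_b
upper = b + t_crit * se_b
t_star = b / se_b  # test statistic for H0: beta_k = 0

for k in range(p):
    print(f"b_{k} = {b[k]:.3f}, CI = ({lower[k]:.3f}, {upper[k]:.3f}), "
          f"t* = {t_star[k]:.3f}")
```

The interval for $\beta_k$ contains zero exactly when $|t^*| \le t_{crit}$, so the two procedures lead to the same decision.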

10-27

Upcoming in Lecture 11

• Case Study: Computer Science Student

Data
