10-1
Lecture 10
Multiple Linear Regression
STAT 512
Spring 2011
Background Reading
KNNL: 6.1-6.5
10-2
Topic Overview
• Multiple Linear Regression Model
10-3
Data for Multiple Regression
• $Y_i$ is the response variable (as usual)
• $X_{i1}, X_{i2}, \ldots, X_{i,p-1}$ are the $p-1$ explanatory variables for cases $i = 1, 2, \ldots, n$
• Example – In HW #2, you considered predicting freshman GPA based on ACT scores. Perhaps we consider high school GPA and an intelligence test as well. Then we would have three explanatory variables, so p = 4 (the intercept counts as one of the p regression parameters).
10-4
Multicollinearity
• Predictor variables are often correlated with each other.
• If predictor variables are highly correlated, they will be “fighting” to explain the same part of the variation in the response variable.
• Caution: Using highly correlated predictor variables in the same model tends to give unstable parameter estimates with large standard errors, so the individual coefficients become hard to interpret. Be careful of this.
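A small simulation can make this caution concrete. The sketch below uses made-up data (the variable names, the sample size, and the 0.1 noise scale are all assumptions for illustration). It compares the diagonal entry of $(\mathbf{X}'\mathbf{X})^{-1}$ belonging to one slope, which is proportional to that slope's sampling variance, when the second predictor is nearly a copy of the first versus unrelated to it.

```python
import numpy as np

# Hypothetical data: x2_corr is almost a copy of x1, x2_indep is unrelated.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)
x2_corr = x1 + 0.1 * rng.normal(size=n)


def slope_variance_factor(x1, x2):
    """Return the (1, 1) entry of (X'X)^{-1}, i.e. Var(b1) / sigma^2."""
    X = np.column_stack([np.ones(len(x1)), x1, x2])
    return np.linalg.inv(X.T @ X)[1, 1]


v_indep = slope_variance_factor(x1, x2_indep)
v_corr = slope_variance_factor(x1, x2_corr)

print(f"corr(x1, x2_corr) = {np.corrcoef(x1, x2_corr)[0, 1]:.3f}")
print(f"variance factor, unrelated predictors:  {v_indep:.5f}")
print(f"variance factor, correlated predictors: {v_corr:.5f}")
```

The variance factor for the correlated pair should come out far larger, which is one way to see why the two predictors cannot both be estimated precisely.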
10-5
Multiple Regression Model
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i$$
• $i = 1, 2, \ldots, n$ observations
• Assumptions exactly as before: $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$
• $Y_i$ is the value of the response variable for the $i$th case.
• $X_{ik}$ is the value of the $k$th explanatory variable for the $i$th case.
10-6
Multiple Regression Model (2)
• $\beta_0$ is the intercept (think multidimensional).
• $\beta_1, \beta_2, \ldots, \beta_{p-1}$ are the regression (slope) coefficients for the explanatory variables.
• Parameters as usual include all of the $\beta$'s as well as $\sigma^2$. These need to be estimated from the data.
10-7
Regression Plane/Surface
10-8
Model in Matrix Form
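$$\mathbf{Y}_{n \times 1} = \mathbf{X}_{n \times p}\,\boldsymbol{\beta}_{p \times 1} + \boldsymbol{\varepsilon}_{n \times 1}$$
$$\boldsymbol{\varepsilon} \sim N\left(\mathbf{0}, \sigma^2 \mathbf{I}_{n \times n}\right)$$
$$\mathbf{Y} \sim N\left(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}\right)$$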
10-9
Design matrix X
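$$\mathbf{X}_{n \times p} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix}$$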
10-10
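Coefficient matrix $\boldsymbol{\beta}$
$$\boldsymbol{\beta}_{p \times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}$$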
10-11
Least Squares Solution
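• Minimize the squared vertical distances between the points and the response surface
• Find $\mathbf{b}$ to minimize
$$SSE = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b})$$
• Obtain normal equations as before:
$$\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}$$
• Least Squares Solution as before:
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$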
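To see the least squares solution in action, here is a minimal NumPy sketch. The data are synthetic and the "true" coefficients (1.0, 2.0, -0.5) are assumptions chosen only for illustration. It solves the normal equations directly and cross-checks the answer against `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical data: n = 50 cases and two predictors, so p = 3.
rng = np.random.default_rng(0)
n = 50
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + rng.normal(scale=0.3, size=n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), X1, X2])

# Solve the normal equations X'X b = X'Y (no explicit inverse needed).
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check with NumPy's built-in least-squares routine.
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("b from normal equations:", b)
print("b from lstsq:           ", b_lstsq)
```

Solving the linear system rather than inverting $\mathbf{X}'\mathbf{X}$ is a common numerical choice; both give the same $\mathbf{b}$ when $\mathbf{X}$ has full column rank.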
10-12
Fitted Values / Residuals
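• Fitted (predicted) values for the mean of Y are
$$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{H}\mathbf{Y}$$
• Residuals are
$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{H}\mathbf{Y} = (\mathbf{I} - \mathbf{H})\mathbf{Y}$$
• Note formulas are same as before, with hat matrix:
$$\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$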
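The hat matrix formulas can be checked numerically. The sketch below again uses synthetic data (an assumption for illustration). It forms $\mathbf{H}$ explicitly, which is fine for a small $n$, and verifies a few standard properties: $\mathbf{H}$ is symmetric and idempotent, its trace equals $p$, and the residuals are orthogonal to the columns of $\mathbf{X}$.

```python
import numpy as np

# Hypothetical data: n = 20 cases, p = 3 parameters.
rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

# Hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.inv(X.T @ X) @ X.T

Y_hat = H @ Y                  # fitted values
e = (np.eye(n) - H) @ Y        # residuals

print("H symmetric:   ", np.allclose(H, H.T))
print("H idempotent:  ", np.allclose(H @ H, H))
print("trace(H):      ", np.trace(H))
print("X'e close to 0:", np.allclose(X.T @ e, 0.0, atol=1e-8))
```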
10-13
“Linear” Regression Models
• The term linear here refers to the
parameters, not the predictor variables.
• We can use linear regression models to deal with almost any “function” of a predictor variable (e.g. $X^2$, $\log(X)$, etc.)
• We cannot use linear regression models to
deal with nonlinear functions of the
parameters (unless we can find a
transformation that makes them linear).
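As an illustration of "linear in the parameters", the following sketch fits a model whose predictors are $X^2$ and $\log(X)$. The data-generating coefficients (3.0, 0.5, 2.0) and the range of $X$ are assumptions made up for this example. Ordinary least squares still applies because each parameter enters the model linearly.

```python
import numpy as np

# Hypothetical data: Y depends on x only through x**2 and log(x).
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1.0, 5.0, size=n)
Y = 3.0 + 0.5 * x**2 + 2.0 * np.log(x) + rng.normal(scale=0.05, size=n)

# The transformed predictors simply become columns of the design matrix.
X = np.column_stack([np.ones(n), x**2, np.log(x)])

b, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("estimated coefficients:", b)
```

By contrast, a model such as $Y_i = \beta_0 e^{\beta_1 X_i} + \varepsilon_i$ is nonlinear in $\beta_1$ and cannot be written as a design matrix times a coefficient vector.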
10-14
Types of Predictors
• Continuous Predictors – we are used to
these.
• Qualitative Predictors
  ◦ Two possible outcomes (e.g. male/female) represented by 0 or 1
• Polynomial Regression
  ◦ Use squared or higher-order terms in the regression model.
  ◦ Typically include all lower-order terms as well.
  ◦ $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_{p-1} X_i^{p-1} + \varepsilon_i$
10-15
Types of Predictors (2)
• Using Transformed Variables
  ◦ Transform one or more X’s
  ◦ Transform Y
• Interaction Effects
  ◦ Use the product of predictor variables as an additional variable.
  ◦ Each variable in the product is also included by itself.
  ◦ $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$
• More on some of these models later...
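The predictor types above all come down to how the columns of the design matrix are built. As a rough sketch with synthetic data (the variable names and coefficient values are assumptions for illustration), here is an interaction model with one continuous predictor and one 0/1 indicator. With this coding, the slope for the continuous predictor is $\beta_1$ in the group coded 0 and $\beta_1 + \beta_3$ in the group coded 1.

```python
import numpy as np

# Hypothetical data: x1 is continuous, x2 is a qualitative predictor coded 0/1.
rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = rng.integers(0, 2, size=n)
Y = 1.0 + 2.0 * x1 + 0.5 * x2 - 1.5 * x1 * x2 + rng.normal(scale=0.1, size=n)

# Columns: intercept, x1, x2, and the product x1 * x2 (the interaction).
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

b, *_ = np.linalg.lstsq(X, Y, rcond=None)

slope_group0 = b[1]          # slope of x1 when x2 = 0
slope_group1 = b[1] + b[3]   # slope of x1 when x2 = 1
print("slope when x2 = 0:", slope_group0)
print("slope when x2 = 1:", slope_group1)
```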
10-16
Analysis of Variance
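Formulas for sums of squares (in matrix terms) are the same as before
$$SSR = \sum (\hat{Y}_i - \bar{Y})^2 = \mathbf{b}'\mathbf{X}'\mathbf{Y} - \frac{1}{n}\mathbf{Y}'\mathbf{J}\mathbf{Y}$$
$$SSE = \sum (Y_i - \hat{Y}_i)^2 = \mathbf{e}'\mathbf{e} = \mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$$
$$SSTO = \sum (Y_i - \bar{Y})^2 = \mathbf{Y}'\mathbf{Y} - \frac{1}{n}\mathbf{Y}'\mathbf{J}\mathbf{Y}$$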
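The summation and matrix forms of the sums of squares can be compared numerically. The sketch below assumes synthetic data and takes $\mathbf{J}$ to be the $n \times n$ matrix of ones. It computes each quantity both ways and confirms the decomposition $SSTO = SSR + SSE$.

```python
import numpy as np

# Hypothetical data: n = 30 cases, p = 3 parameters.
rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b
J = np.ones((n, n))  # n x n matrix of ones

# Summation forms.
SSR_sum = np.sum((Y_hat - Y.mean()) ** 2)
SSE_sum = np.sum((Y - Y_hat) ** 2)
SSTO_sum = np.sum((Y - Y.mean()) ** 2)

# Matrix forms.
SSR_mat = b @ X.T @ Y - (Y @ J @ Y) / n
SSE_mat = Y @ Y - b @ X.T @ Y
SSTO_mat = Y @ Y - (Y @ J @ Y) / n

print("SSR: ", SSR_sum, SSR_mat)
print("SSE: ", SSE_sum, SSE_mat)
print("SSTO:", SSTO_sum, SSTO_mat)
```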
10-17
Analysis of Variance (2)
• Degrees of Freedom depend on the model
• Always n – 1 total degrees of freedom
• Model degrees of freedom is equal to the number of terms in the model ($p-1$)
  ◦ Each variable has at least one term
  ◦ May be additional terms for squares, interactions, etc.
• Error degrees of freedom is the difference between total and model degrees of freedom ($n-p$).
10-18
Analysis of Variance (3)
• Mean Squares obtained by dividing SS by
DF for each source.
• The mean square error (MSE) is still, always, and forever, the estimate of $\sigma^2$.
10-19
ANOVA Table
Source               df     SS                              MS            F
Regression (Model)   p-1    $\sum (\hat{Y}_i - \bar{Y})^2$    $SSR/df_R$    $MSR/MSE$
Error                n-p    $\sum (Y_i - \hat{Y}_i)^2$        $SSE/df_E$
Total                n-1    $\sum (Y_i - \bar{Y})^2$          $SSTO/df_T$
10-20
F-test for model significance
• The ratio F = MSR / MSE is again used to
test for a regression relationship.
• Difference from SLR
  ◦ Null Hyp: $H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0$
  ◦ Alt Hyp: $H_a$: at least one $\beta_k \neq 0$
• Tests model significance, not individual
variables; gives no indication of which
variable(s) in particular are important
10-21
F-test for model significance (2)
• Under the null, the statistic has an F-distribution with $p-1$ and $n-p$ degrees of freedom.
• Reject if the statistic is larger than the critical value for $\alpha = 0.05$, or if the p-value for the test (given in the SAS ANOVA table) is less than 0.05.
• If we reject, conclude that at least one of the explanatory variables is important.
• If we fail to reject, and the sample size is large enough (power), conclude that there is no evidence that any of the explanatory variables is useful.
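The overall F-test can be carried out by hand as a check on software output. The sketch below is one way to do it in Python, assuming synthetic data and using `scipy.stats.f` for the critical value and p-value (the lecture itself uses SAS, so the library choice here is an assumption).

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 40 cases, p = 3 parameters (two predictors).
rng = np.random.default_rng(5)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 0.8, 0.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b
SSR = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)

# Mean squares: each SS divided by its degrees of freedom.
MSR = SSR / (p - 1)
MSE = SSE / (n - p)
F = MSR / MSE

alpha = 0.05
F_crit = stats.f.ppf(1 - alpha, p - 1, n - p)   # critical value
p_value = stats.f.sf(F, p - 1, n - p)           # upper-tail probability

print(f"F = {F:.3f}, critical value = {F_crit:.3f}, p-value = {p_value:.4g}")
print("reject H0" if F > F_crit else "fail to reject H0")
```

Comparing F to the critical value and comparing the p-value to $\alpha$ always lead to the same decision.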
10-22
Coefficient of Multiple Determination
• $R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$
• Measures the proportion of variation in the response explained by the variables in the model.
• Adding variables never makes $R^2$ go down, so $R^2$ alone cannot tell us whether a variable should be added.
10-23
Adjusted $R^2$
• $R_a^2 = 1 - \frac{MSE}{MSTO} = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO}$
• Recall mean squares are SS adjusted by degrees of freedom
• $R_a^2$ can increase or decrease when a new variable is introduced into the model, depending on whether the decrease in SSE is offset by the lost degree of freedom.
• $R_a^2$ can be used to decide if variables are important in a model.
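To illustrate the difference between the two measures, the sketch below fits a model with and without an extra predictor that is pure noise by construction. The data and the helper name `r2_and_adjusted` are assumptions for this example. $R^2$ cannot go down when the noise column is added, while $R_a^2$ is free to move in either direction.

```python
import numpy as np


def r2_and_adjusted(X, Y):
    """Return (R^2, adjusted R^2) for a least-squares fit of Y on X."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    SSE = np.sum((Y - X @ b) ** 2)
    SSTO = np.sum((Y - Y.mean()) ** 2)
    R2 = 1 - SSE / SSTO
    R2_adj = 1 - (n - 1) / (n - p) * SSE / SSTO
    return R2, R2_adj


# Hypothetical data: Y depends on x1 only; noise_var is unrelated to Y.
rng = np.random.default_rng(6)
n = 30
x1 = rng.normal(size=n)
noise_var = rng.normal(size=n)
Y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise_var])

R2_small, R2a_small = r2_and_adjusted(X_small, Y)
R2_big, R2a_big = r2_and_adjusted(X_big, Y)

print(f"without noise variable: R2 = {R2_small:.4f}, adj R2 = {R2a_small:.4f}")
print(f"with noise variable:    R2 = {R2_big:.4f}, adj R2 = {R2a_big:.4f}")
```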
10-24
Inference for INDIVIDUAL Regression Coefficients
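• We already have $\mathbf{b} \sim N\left(\boldsymbol{\beta}, \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\right)$, so define
$$\mathbf{s}^2\{\mathbf{b}\}_{p \times p} = MSE \times (\mathbf{X}'\mathbf{X})^{-1}$$
• For individual $b_k$, the estimated variance is the $k$th diagonal element of this matrix:
$$s^2\{b_k\} = \left[\mathbf{s}^2\{\mathbf{b}\}\right]_{k,k}$$
Note: $k = 0, 1, \ldots, p-1$.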
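As a concrete sketch of these definitions (synthetic data, assumed for illustration), the estimated covariance matrix of $\mathbf{b}$ and the standard errors $s\{b_k\}$ can be computed as follows. NumPy's zero-based indexing happens to match the convention $k = 0, 1, \ldots, p-1$.

```python
import numpy as np

# Hypothetical data: n = 50 cases, p = 3 parameters.
rng = np.random.default_rng(7)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = e @ e / (n - p)              # estimate of sigma^2

s2_b = MSE * XtX_inv               # p x p estimated covariance matrix of b
s_b = np.sqrt(np.diag(s2_b))       # standard errors s{b_k}, k = 0, ..., p-1

for k in range(p):
    print(f"b_{k} = {b[k]: .4f}, s{{b_{k}}} = {s_b[k]:.4f}")
```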
10-25
Confidence Intervals for $\beta_k$
• CI for $\beta_k$ is $b_k \pm t_{crit}\, s\{b_k\}$
• Critical value comes from t-distribution with $n-p$ degrees of freedom (DF for error)
• If CI includes zero, then we cannot reject $H_0: \beta_k = 0$ (i.e. that variable is not significant when added to the model containing all of the other variables.)
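A minimal sketch of the interval computation follows, assuming synthetic data, a 95% confidence level, and `scipy.stats.t` for the critical value (none of these choices come from the lecture).

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 50 cases, p = 3 parameters; the last slope is truly 0.
rng = np.random.default_rng(8)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = e @ e / (n - p)
s_b = np.sqrt(np.diag(MSE * XtX_inv))

# Two-sided critical value on n - p degrees of freedom.
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)

lower = b - t_crit * s_b
upper = b + t_crit * s_b

for k in range(p):
    includes_zero = lower[k] <= 0.0 <= upper[k]
    print(f"beta_{k}: [{lower[k]: .3f}, {upper[k]: .3f}]  includes 0: {includes_zero}")
```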
10-26
Significance Test for $\beta_k$
• Is known as a variable-added-last test; tests whether the $k$th explanatory variable is important when added to all of the other variables in the model (i.e. it is a conditional test).
• Test statistic is as before: $t^* = b_k / s\{b_k\}$
• Compare to t-critical value on $n-p$ degrees of freedom.
• This is the test given in the model parameters section of SAS for PROC REG.
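The same test can be reproduced outside SAS. The sketch below assumes synthetic data, a two-sided test at $\alpha = 0.05$, and `scipy.stats.t` for the reference distribution. It computes $t^*$ and a two-sided p-value for every coefficient at once.

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 50 cases, p = 3 parameters; the last slope is truly 0.
rng = np.random.default_rng(9)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
Y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
MSE = e @ e / (n - p)
s_b = np.sqrt(np.diag(MSE * XtX_inv))

# Test statistic t* = b_k / s{b_k} and two-sided p-value on n - p df.
t_star = b / s_b
p_values = 2 * stats.t.sf(np.abs(t_star), df=n - p)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)

for k in range(p):
    decision = "reject H0" if abs(t_star[k]) > t_crit else "fail to reject H0"
    print(f"b_{k}: t* = {t_star[k]: .3f}, p-value = {p_values[k]:.4g}, {decision}")
```

Rejecting when $|t^*|$ exceeds the critical value is equivalent to the confidence interval for $\beta_k$ excluding zero.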
10-27
Upcoming in Lecture 11
• Case Study: Computer Science Student
Data