15-1
Objectives of Multiple Regression
• Establish the linear equation that best predicts values of a dependent variable Y using more than one explanatory variable from a large set of potential predictors x1, x2, ... xk.
• Find that subset of all possible predictor variables that explains a significant and appreciable proportion of the variance of Y, trading off adequacy of prediction against the cost of measuring more predictor variables.
15-2
Expanding Simple Linear Regression

• Quadratic model:

Y = β0 + β1x1 + β2x1^2 + ε

• General polynomial model:

Y = β0 + β1x1 + β2x1^2 + β3x1^3 + ... + βkx1^k + ε

Any predictor that appears in the polynomial regression model as x1^k is called a kth-degree term.

Adding one or more polynomial terms to the model.
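A polynomial term enters the model simply as an extra column in the design matrix. A minimal numpy sketch with simulated (hypothetical) data, fitting the quadratic model by least squares:

```python
import numpy as np

# Simulated data (hypothetical): y depends quadratically on x1.
rng = np.random.default_rng(0)
x1 = np.linspace(0, 4, 50)
y = 1.0 + 0.5 * x1 - 0.3 * x1**2 + rng.normal(0, 0.1, size=x1.size)

# Quadratic model: columns for the intercept, x1, and x1^2.
X = np.column_stack([np.ones_like(x1), x1, x1**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta_hat holds the estimates (beta0_hat, beta1_hat, beta2_hat)
```

The same pattern extends to any kth-degree term: append the column x1**k.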
15-3
[Figure: polynomial model shapes. Two panels compare a linear fit and a quadratic fit to the same (x, y) data; adding one or more polynomial terms to the model significantly improves the model fit.]
15-4
Incorporating Additional Predictors
Simple additive multiple regression model:

y = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk + ε

Additive (Effect) Assumption: the expected change in y per unit increment in xj is constant and does not depend on the value of any other predictor. This change in y is equal to βj.
15-5
Additive regression models: For two independent variables, the response is modeled as a surface.
15-6
Interpreting Parameter Values (Model Coefficients)
• “Intercept” β0: the value of y when all predictors are 0.
• “Partial slopes” β1, β2, β3, ..., βk: βj describes the expected change in y per unit increment in xj when all other predictors in the model are held at a constant value.
15-8
Multiple Regression with Interaction Terms

Y = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk
    + β12x1x2 + β13x1x3 + ... + β1kx1xk + ... + βk−1,k xk−1xk + ε

The cross-product terms quantify the interaction among predictors.

Interactive (Effect) Assumption: The effect of one predictor, xi, on the response, y, will depend on the value of one or more of the other predictors.
15-9
Interpreting Interaction

β1 is no longer the expected change in Y per unit increment in X1!
β12 has no easy interpretation! The effect on y of a unit increment in X1 now depends on X2.

Interaction model:

yi = β0 + β1x1i + β2x2i + β12x1ix2i + εi

or, defining x3i = x1ix2i:

yi = β0 + β1x1i + β2x2i + β3x3i + εi

There is no difference between the two forms.
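In practice the interaction term is just a product column in the design matrix. A minimal numpy sketch with simulated (hypothetical) data:

```python
import numpy as np

# Simulated data (hypothetical) with a genuine interaction effect.
rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(0, 5, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 1.0 * x1 + 0.5 * x2 + 0.8 * x1 * x2 + rng.normal(0, 0.2, n)

# Define x3 = x1 * x2 and fit the additive model in (x1, x2, x3).
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta_hat[3] estimates the interaction coefficient beta12
```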
15-10
[Figure: y vs. x1 plotted at x2 = 0, 1, 2. No-interaction model: parallel lines with common slope β1 and intercepts β0, β0 + β2, β0 + 2β2. Interaction model: the slope changes with x2, e.g. β1 + 2β12 at x2 = 2.]
15-13
A Protocol for Multiple Regression
1. Identify all possible predictors.
2. Establish a method for estimating model parameters and their standard errors.
3. Develop tests to determine if a parameter is equal to zero (i.e. no evidence of association).
4. Reduce the number of predictors appropriately.
5. Develop predictions and associated standard errors.
15-14
Estimating Model Parameters: Least Squares Estimation

Assume a random sample of n observations (yi, xi1, xi2, ..., xik), i = 1, 2, ..., n. The best predicting equation

ŷi = β̂0 + β̂1xi1 + β̂2xi2 + ... + β̂kxik

is found by choosing the values β̂0, β̂1, ..., β̂k which minimize the expression:

SSE = Σi (yi − ŷi)² = Σi (yi − β0 − β1xi1 − β2xi2 − ... − βkxik)²
15-15
Normal Equations

Take the partial derivatives of the SSE function with respect to β0, β1, ..., βk, and equate each equation to 0. Solve this system of k+1 equations in k+1 unknowns to obtain the equations for the parameter estimates (all sums run over i = 1, ..., n):

Σ yi    = n β̂0 + β̂1 Σ xi1 + ... + β̂k Σ xik
Σ xi1yi = β̂0 Σ xi1 + β̂1 Σ xi1² + ... + β̂k Σ xi1xik
⋮
Σ xikyi = β̂0 Σ xik + β̂1 Σ xikxi1 + ... + β̂k Σ xik²
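In matrix form the normal equations are (XᵀX)β̂ = Xᵀy, which software solves directly. A minimal numpy sketch with simulated (hypothetical) data:

```python
import numpy as np

# Simulated data (hypothetical): n observations, k = 3 predictors.
rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + predictors
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(0, 0.1, n)

# Normal equations: (X'X) beta_hat = X'y, a (k+1) x (k+1) linear system.
XtX = X.T @ X
Xty = X.T @ y
beta_hat = np.linalg.solve(XtX, Xty)
```

(Numerically, `np.linalg.lstsq` is usually preferred over forming XᵀX explicitly, but solving the normal equations directly mirrors the derivation above.)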
15-16
An Overall Measure of How Well the Full Model Performs: the Coefficient of Multiple Determination

• Denoted R².
• Defined as the proportion of the variability in the dependent variable y that is accounted for by the independent variables, x1, x2, ..., xk, through the regression model.
• With only one independent variable (k = 1), R² = r², the square of the simple correlation coefficient.
15-17
Computing the Coefficient of Determination

R²_{y·x1x2...xk} = SSR/Syy = 1 − SSE/Syy,   0 ≤ R² ≤ 1

where

Syy = TSS = Σ (yi − ȳ)²

SSE = Σ (yi − ŷi)² = Σ yi² − β̂0 Σ yi − β̂1 Σ xi1yi − ... − β̂k Σ xikyi
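R² is easy to compute from the residuals of a fitted model; a numpy sketch with simulated (hypothetical) data:

```python
import numpy as np

# Simulated data (hypothetical): two predictors plus intercept.
rng = np.random.default_rng(3)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 1.5, -0.5]) + rng.normal(0, 0.3, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

sse = np.sum((y - y_hat) ** 2)          # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)       # total sum of squares, Syy
r2 = 1 - sse / tss                      # coefficient of multiple determination
```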
15-18
Multicollinearity

A further assumption in multiple regression (absent in SLR) is that the predictors (x1, x2, ..., xk) are statistically uncorrelated. That is, the predictors do not co-vary. When the predictors are significantly correlated (correlation greater than about 0.6), the multiple regression model is said to suffer from problems of multicollinearity.

[Figure: scatterplots of x2 vs. x1 for r = 0, r = 0.6, and r = 0.8.]
15-19
Effect of Multicollinearity on the Fitted Surface

[Figure: extreme collinearity. The data points fall along a line in the (x1, x2) plane, so the fitted surface above that plane is poorly determined.]
15-20
Multicollinearity leads to

• Numerical instability in the estimates of the regression parameters: wild fluctuations in these estimates if a few observations are added or removed.
• Loss of the simple interpretations of the regression coefficients in the additive model.

Ways to detect multicollinearity

• Scatterplots of the predictor variables.
• Correlation matrix for the predictor variables: the higher these correlations, the worse the problem.
• Variance Inflation Factors (VIFs) reported by software packages. Values larger than 10 usually signal a substantial amount of collinearity.

What can be done about multicollinearity

• Regression estimates are still OK, but the resulting confidence/prediction intervals are very wide.
• Choose explanatory variables wisely! (E.g. consider omitting one of two highly correlated variables.)
• More advanced solutions: principal components analysis; ridge regression.
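A VIF is computed by regressing each predictor on the remaining predictors: VIF_j = 1 / (1 − R²_j). A numpy sketch with a hypothetical helper function and simulated data:

```python
import numpy as np

def vif(X):
    """VIF for each column of X (n x p predictor matrix, no intercept column).
    Hypothetical helper: regresses x_j on the other predictors and returns
    1 / (1 - R_j^2) for each j."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])       # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2_j = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        vifs.append(1 / (1 - r2_j))
    return np.array(vifs)

# Simulated predictors (hypothetical): x2 nearly duplicates x1, x3 is independent.
rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.3, 100)
x3 = rng.normal(size=100)
vifs = vif(np.column_stack([x1, x2, x3]))
# Expect large VIFs for x1 and x2, a VIF near 1 for x3.
```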
15-21
Testing in Multiple Regression

• Testing individual parameters in the model.
• Computing predicted values and associated standard errors.

Model: Y = β0 + β1X1 + ... + βkXk + ε,   ε ~ N(0, σ²)

Overall AOV F-test

H0: none of the explanatory variables is a significant predictor of Y.

F = MSR/MSE = (SSR/k) / (SSE/(n − k − 1))

Reject H0 if F > F_{k, n−k−1, α}.
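The overall F statistic can be computed directly from the sums of squares; a sketch with simulated (hypothetical) data, using scipy for the F quantile:

```python
import numpy as np
from scipy.stats import f

# Simulated data (hypothetical): k = 2 genuinely predictive variables.
rng = np.random.default_rng(5)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, -0.4]) + rng.normal(0, 0.5, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
sse = np.sum((y - y_hat) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)

F = (ssr / k) / (sse / (n - k - 1))   # MSR / MSE
F_crit = f.ppf(0.95, k, n - k - 1)    # reject H0 at alpha = 0.05 if F > F_crit
```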
15-22
Standard Error for a Partial Slope Estimate

The estimated standard error for β̂j:

s_{β̂j} = σ̂ε √( 1 / [ S_{xjxj} (1 − R²_{xj·x1...x(j−1)x(j+1)...xk}) ] )

where

σ̂ε = √( SSE / (n − (k+1)) ),   S_{xjxj} = Σ (xij − x̄j)²

and R²_{xj·x1...x(j−1)x(j+1)...xk} is the coefficient of determination for the model with xj as the dependent variable and all other x variables as predictors.

What happens if all the predictors are truly independent of each other?
R²_{xj·x1...x(j−1)x(j+1)...xk} → 0 and s_{β̂j} → σ̂ε / √S_{xjxj}.

If there is high dependency?
R²_{xj·x1...x(j−1)x(j+1)...xk} → 1 and s_{β̂j} → ∞.
15-23
Confidence Interval

100(1 − α)% confidence interval for βj:

β̂j ± t_{(n−(k+1)), α/2} · s_{β̂j}

The degrees of freedom for SSE, n − (k+1), reflect the number of data points minus the number of parameters that have to be estimated.
15-24
Testing whether a partial slope coefficient is equal to zero.

H0: βj = 0

Alternatives:
Ha: βj > 0,   Ha: βj < 0,   Ha: βj ≠ 0

Test statistic:

t = β̂j / s_{β̂j}

Rejection regions (df = n − (k+1)), matching the alternatives in order:
t > t_α,   t < −t_α,   |t| > t_{α/2}
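An equivalent matrix form of the standard error is s_{β̂j} = σ̂ε √([(XᵀX)⁻¹]_{jj}), which makes the t statistics easy to compute; a sketch with simulated (hypothetical) data:

```python
import numpy as np
from scipy.stats import t as t_dist

# Simulated data (hypothetical): beta1 is real, beta2 is truly zero.
rng = np.random.default_rng(6)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.9, 0.0]) + rng.normal(0, 0.4, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - (k + 1))            # SSE / (n - (k+1))

# Standard errors from the diagonal of sigma2_hat * (X'X)^{-1}.
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta_hat / se
t_crit = t_dist.ppf(0.975, n - (k + 1))               # two-sided, alpha = 0.05
```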
15-25
Predicting Y

• We use the least squares fitted value, ŷ, as our predictor of a single value of y at a particular value of the explanatory variables (x1, x2, ..., xk).
• The corresponding interval about the predicted value of y is called a prediction interval.
• The least squares fitted value also provides the best predictor of E(y), the mean value of y, at a particular value of (x1, x2, ..., xk). The corresponding interval for the mean prediction is called a confidence interval.
• Formulas for these intervals are much more complicated than in the case of SLR; they cannot be calculated by hand (see the book).
15-26
Minimum R2 for a “Significant” Regression
Since we have formulas for R2 and F, in terms of n, k, SSE and TSS, we can relate these two quantities.
We can then ask the question: what is the min R2 which will ensure the regression model will be declared significant, as measured by the appropriate quantile from the F distribution?
The answer (below), shows that this depends on n, k, and SSE/TSS.
R²_min = (k / (n − k − 1)) · (SSE/TSS) · F_{k, n−k−1, α}
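Since SSE/TSS = 1 − R², solving F = (R²/k) / ((1 − R²)/(n − k − 1)) = F* for R² gives the closed form R²_min = kF* / (n − k − 1 + kF*). A quick numerical check (scipy assumed for the F quantile):

```python
from scipy.stats import f

n, k, alpha = 30, 3, 0.05
F_star = f.ppf(1 - alpha, k, n - k - 1)

# Closed-form minimum R^2 for the overall F-test to be significant.
r2_min = k * F_star / (n - k - 1 + k * F_star)

# Sanity check: plugging r2_min back into the F formula recovers F_star.
F_check = (r2_min / k) / ((1 - r2_min) / (n - k - 1))
```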