15-1
Objectives of Multiple Regression
• Establish the linear equation that best predicts values of a dependent variable Y using more than one explanatory variable from a large set of potential predictors {x1, x2, ... xk}.
• Find that subset of all possible predictor variables that explains a significant and appreciable proportion of the variance of Y, trading off adequacy of prediction against the cost of measuring more predictor variables.
15-2
Expanding Simple Linear Regression
• Quadratic model:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$$
• General polynomial model:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_1^3 + \cdots + \beta_k x_1^k + \varepsilon$$
Any independent variable $x_i$ which appears in the polynomial regression model as $x_i^k$ is called a kth-degree term.
Adding one or more polynomial terms to the model.
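To make this concrete, here is a minimal sketch (not part of the original slides) of fitting the quadratic model by least squares with numpy; the data and coefficients are invented for illustration:

```python
import numpy as np

# Hypothetical data that curve upward in x1, so a quadratic term should help.
rng = np.random.default_rng(0)
x1 = np.linspace(10, 40, 30)
y = 1.0 + 0.02 * x1 + 0.001 * x1**2 + rng.normal(scale=0.1, size=x1.size)

# Design matrix: intercept, linear term, and second-degree term.
X = np.column_stack([np.ones_like(x1), x1, x1**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates of beta_0, beta_1, beta_2
```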
15-3
Polynomial model shapes.
[Figure: two panels plotting y against x for x from 10 to 40, one showing a linear fit and one a quadratic fit.]
Adding one or more terms to the model can significantly improve the model fit.
15-4
Incorporating Additional Predictors
Simple additive multiple regression model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \varepsilon$$
Additive (Effect) Assumption - The expected change in y per unit increment in $x_j$ is constant and does not depend on the value of any other predictor. This change in y is equal to $\beta_j$.
15-5
Additive regression models: for two independent variables, the response is modeled as a surface (a plane, since the effects are additive).
15-6
Interpreting Parameter Values (Model Coefficients)
• “Intercept” ($\beta_0$) - value of y when all predictors are 0.
• “Partial slopes” ($\beta_1, \ldots, \beta_k$) - $\beta_j$ describes the expected change in y per unit increment in $x_j$ when all other predictors in the model are held at a constant value.
15-7
Graphical depiction of $\beta_j$: $\beta_1$ is the slope in the direction of $x_1$; $\beta_2$ is the slope in the direction of $x_2$.
15-8
Multiple Regression with Interaction Terms
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \cdots + \beta_{1k} x_1 x_k + \cdots + \beta_{k-1,k} x_{k-1} x_k + \varepsilon$$
The cross-product terms quantify the interaction among predictors.
Interactive (Effect) Assumption: The effect of one predictor, xi, on the response, y, will depend on the value of one or more of the other predictors.
15-9
Interpreting Interaction
$\beta_1$ – No longer the expected change in Y per unit increment in $X_1$!
$\beta_{12}$ – No easy interpretation! The effect on y of a unit increment in $X_1$ now depends on $X_2$.

Interaction model:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i1} x_{i2} + \varepsilon_i$$
or, defining $x_{i3} = x_{i1} x_{i2}$:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_{12} x_{i3} + \varepsilon_i$$
There is no difference between the two forms.
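A minimal sketch of this reparameterization, assuming numpy and invented data; the interaction column is literally the product of the two predictors:

```python
import numpy as np

# Hypothetical data generated with a true interaction effect.
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 5, size=40)
x2 = rng.integers(0, 3, size=40).astype(float)
y = 1.0 + 0.5 * x1 + 2.0 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.3, size=40)

x3 = x1 * x2  # define x_i3 = x_i1 * x_i2, as on the slide
X = np.column_stack([np.ones_like(x1), x1, x2, x3])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # last entry estimates beta_12, the interaction coefficient
```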
15-10
[Figure: two panels of y vs. $x_1$ at $x_2 = 0, 1, 2$. In the no-interaction panel the three lines are parallel, each with slope $\beta_1$ and intercepts $\beta_0$, $\beta_0 + \beta_2$, $\beta_0 + 2\beta_2$. In the interaction panel the slopes differ with $x_2$, e.g. $\beta_1 + 2\beta_{12}$ when $x_2 = 2$.]
15-11
Multiple regression models with interaction: depending on the sign of the interaction coefficient, the lines move apart or come together as $x_1$ increases.
15-12
Effect of the Interaction Term in Multiple Regression
Surface is twisted.
15-13
A Protocol for Multiple Regression
• Identify all possible predictors.
• Establish a method for estimating model parameters and their standard errors.
• Develop tests to determine if a parameter is equal to zero (i.e. no evidence of association).
• Reduce the number of predictors appropriately.
• Develop predictions and associated standard errors.
15-14
Estimating Model Parameters: Least Squares Estimation
Assume a random sample of n observations $(y_i, x_{i1}, x_{i2}, \ldots, x_{ik})$, $i = 1, 2, \ldots, n$. The best predicting equation
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \hat\beta_2 x_{i2} + \cdots + \hat\beta_k x_{ik}$$
is found by choosing the values $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ which minimize the expression
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \cdots - \beta_k x_{ik})^2.$$
15-15
Normal Equations
Take the partial derivatives of the SSE function with respect to $\beta_0, \beta_1, \ldots, \beta_k$, and set each derivative equal to 0. Solve this system of k+1 equations in k+1 unknowns to obtain the equations for the parameter estimates.
$$\sum_{i=1}^{n} y_i = n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_{i1} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik}$$
$$\sum_{i=1}^{n} x_{i1} y_i = \hat\beta_0 \sum_{i=1}^{n} x_{i1} + \hat\beta_1 \sum_{i=1}^{n} x_{i1}^2 + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{i1} x_{ik}$$
$$\vdots$$
$$\sum_{i=1}^{n} x_{ik} y_i = \hat\beta_0 \sum_{i=1}^{n} x_{ik} + \hat\beta_1 \sum_{i=1}^{n} x_{ik} x_{i1} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik}^2$$
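In matrix form the normal equations are $(X'X)\hat\beta = X'y$, where X carries a leading column of ones. A minimal numpy sketch with invented data:

```python
import numpy as np

# Solve the k+1 normal equations in k+1 unknowns directly.
rng = np.random.default_rng(2)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, -0.5, 0.3]) + rng.normal(scale=0.2, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X) beta = X'y
print(beta_hat)
```

In practice `np.linalg.lstsq(X, y, rcond=None)` is numerically safer than forming $X'X$ explicitly, but the solution is the same.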
15-16
An Overall Measure of How Well the Full Model Performs
• Denoted as R².
• Defined as the proportion of the variability in the dependent variable y that is accounted for by the independent variables $x_1, x_2, \ldots, x_k$ through the regression model.
• With only one independent variable (k = 1), R² = r², the square of the simple correlation coefficient.
Coefficient of Multiple Determination
15-17
Computing the Coefficient of Determination
$$R^2_{y \cdot x_1 x_2 \cdots x_k} = \frac{SSR}{S_{yy}} = 1 - \frac{SSE}{S_{yy}}, \qquad 0 \le R^2 \le 1$$
where
$$S_{yy} = TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
and
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} y_i^2 - \hat\beta_0 \sum_{i=1}^{n} y_i - \hat\beta_1 \sum_{i=1}^{n} x_{i1} y_i - \cdots - \hat\beta_k \sum_{i=1}^{n} x_{ik} y_i$$
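A small helper (naming is my own) that computes R² directly from the definition above:

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of multiple determination: R^2 = 1 - SSE / S_yy."""
    sse = np.sum((y - y_hat) ** 2)        # SSE
    s_yy = np.sum((y - np.mean(y)) ** 2)  # S_yy = TSS
    return 1.0 - sse / s_yy
```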
15-18
Multicollinearity
A further assumption in multiple regression (absent in SLR) is that the predictors $x_1, x_2, \ldots, x_k$ are statistically uncorrelated; that is, the predictors do not co-vary. When the predictors are substantially correlated (correlation greater than about 0.6), the multiple regression model is said to suffer from problems of multicollinearity.
[Figure: scatterplots of $x_2$ vs. $x_1$ for predictor correlations r = 0, r = 0.6, and r = 0.8.]
15-19
Effect of Multicollinearity on the Fitted Surface
[Figure: extreme collinearity — the observed $(x_1, x_2)$ points fall along a line in the predictor plane, so the fitted surface for y is poorly determined away from that line.]
15-20
Multicollinearity leads to
• Numerical instability in the estimates of the regression parameters – wild fluctuations in these estimates if a few observations are added or removed.
• No longer have simple interpretations for the regression coefficients in the additive model.
Ways to detect multicollinearity
• Scatterplots of the predictor variables.
• Correlation matrix for the predictor variables – the higher these correlations the worse the problem.
• Variance Inflation Factors (VIFs) reported by software packages. Values larger than 10 usually signal a substantial amount of collinearity (see the sketch after this list).
What can be done about multicollinearity
• Regression estimates are still OK, but the resulting confidence/prediction intervals are very wide.
• Choose explanatory variables wisely! (E.g. consider omitting one of two highly correlated variables.)
• More advanced solutions: principal components analysis; ridge regression.
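As a sketch of what a VIF measures, it can be computed directly from its definition, $VIF_j = 1/(1 - R^2_j)$, where $R^2_j$ comes from regressing $x_j$ on the remaining predictors; the function name and setup below are my own:

```python
import numpy as np

def vifs(X):
    """Variance inflation factors, one per column of the predictor matrix
    X (n rows, k columns, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress x_j on the other predictors (with an intercept).
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out
```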
15-21
Testing in Multiple Regression
• Testing individual parameters in the model.
• Computing predicted values and associated standard errors.
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \varepsilon, \qquad \varepsilon \sim N(0, \sigma_\varepsilon^2)$$
Overall AOV F-test
$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (none of the explanatory variables is a significant predictor of Y)
$$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$$
Reject $H_0$ if $F \ge F_{\alpha,\,k,\,n-k-1}$.
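A sketch of this overall F-test computed from the fitted values, assuming scipy is available (the function name is my own):

```python
import numpy as np
from scipy import stats

def overall_f_test(y, y_hat, k, alpha=0.05):
    """Overall AOV F-test: F = (SSR/k) / (SSE/(n-k-1)); reject H0 when
    F exceeds the upper-alpha quantile of F(k, n-k-1)."""
    n = len(y)
    sse = np.sum((y - y_hat) ** 2)
    ssr = np.sum((y_hat - np.mean(y)) ** 2)
    F = (ssr / k) / (sse / (n - k - 1))
    return F, F >= stats.f.ppf(1 - alpha, k, n - k - 1)
```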
15-22
Standard Error for Partial Slope Estimate
The estimated standard error for $\hat\beta_j$:
$$s_{\hat\beta_j} = \hat\sigma_\varepsilon \sqrt{\frac{1}{S_{x_j x_j}\left(1 - R^2_{x_j \cdot x_1 \cdots x_{j-1} x_{j+1} \cdots x_k}\right)}}$$
where
$$\hat\sigma_\varepsilon = \sqrt{\frac{SSE}{n-(k+1)}}, \qquad S_{x_j x_j} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2,$$
and $R^2_{x_j \cdot x_1 \cdots x_{j-1} x_{j+1} \cdots x_k}$ is the coefficient of determination for the model with $x_j$ as the dependent variable and all other x variables as predictors.
What happens if all the predictors are truly independent of each other? Then $R^2_{x_j \cdot x_1 \cdots x_{j-1} x_{j+1} \cdots x_k} = 0$ and the standard error reduces to
$$s_{\hat\beta_j} = \frac{\hat\sigma_\varepsilon}{\sqrt{S_{x_j x_j}}}.$$
If there is high dependency? Then $R^2_{x_j \cdot \cdots} \to 1$ and $s_{\hat\beta_j}$ blows up.
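For computation, the standard errors are usually obtained from the equivalent matrix form, the square roots of the diagonal of $\hat\sigma_\varepsilon^2 (X'X)^{-1}$. A numpy sketch (my own helper, algebraically the same as the scalar formula above):

```python
import numpy as np

def slope_standard_errors(X, y):
    """Estimated standard errors of beta_hat: sqrt of the diagonal of
    sigma_hat^2 * (X'X)^{-1}. X includes the intercept column, so it
    has k+1 columns."""
    n, p = X.shape                        # p = k + 1 parameters
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)  # SSE / (n - (k+1))
    cov = sigma2_hat * np.linalg.inv(X.T @ X)
    return beta_hat, np.sqrt(np.diag(cov))
```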
15-23
Confidence Interval
100(1−α)% confidence interval for $\beta_j$:
$$\hat\beta_j \pm t_{\alpha/2,\, n-(k+1)}\, s_{\hat\beta_j}$$
The degrees of freedom, n − (k+1), are the df for SSE: they reflect the number of data points minus the number of parameters that have to be estimated.
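A minimal sketch of this interval, assuming scipy (the helper name is my own):

```python
from scipy import stats

def slope_ci(beta_j_hat, se_j, n, k, alpha=0.05):
    """100(1-alpha)% CI for beta_j: beta_hat_j +/- t_{alpha/2, n-(k+1)} * se."""
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - (k + 1))
    return beta_j_hat - t_crit * se_j, beta_j_hat + t_crit * se_j
```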
15-24
Testing whether a partial slope coefficient is equal to zero.
$H_0$: $\beta_j = 0$
$H_a$: $\beta_j \ne 0$ (or $\beta_j > 0$, or $\beta_j < 0$)
Test statistic:
$$t = \frac{\hat\beta_j}{s_{\hat\beta_j}}$$
Rejection regions for the three alternatives:
$|t| \ge t_{\alpha/2,\, n-(k+1)}$, $\quad t \ge t_{\alpha,\, n-(k+1)}$, $\quad t \le -t_{\alpha,\, n-(k+1)}$
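A sketch of the two-sided version of this test, assuming scipy (the helper name is my own):

```python
from scipy import stats

def slope_t_test(beta_j_hat, se_j, n, k):
    """Two-sided test of H0: beta_j = 0; returns t = beta_hat_j / se and
    its p-value from the t distribution with n-(k+1) df."""
    t = beta_j_hat / se_j
    p = 2.0 * stats.t.sf(abs(t), df=n - (k + 1))
    return t, p
```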
15-25
Predicting Y
• We use the least squares fitted value, $\hat{y}$, as our predictor of a single value of y at a particular value of the explanatory variables (x1, x2, ..., xk).
• The corresponding interval about the predicted value of y is called a prediction interval.
• The least squares fitted value also provides the best predictor of E(y), the mean value of y, at a particular value of (x1, x2, ..., xk). The corresponding interval for the mean prediction is called a confidence interval.
• Formulas for these intervals are much more complicated than in the case of SLR; they cannot be calculated by hand (see the book).
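In practice software computes both intervals. A sketch assuming the statsmodels package is available (the data are invented):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data; statsmodels returns both interval types.
rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(40, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=40)

pred = sm.OLS(y, X).fit().get_prediction(X[:1])
print(pred.summary_frame(alpha=0.05))
# mean_ci_lower/upper: confidence interval for E(y);
# obs_ci_lower/upper: prediction interval for a single new y.
```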
15-26
Minimum R2 for a “Significant” Regression
Since we have formulas for R2 and F, in terms of n, k, SSE and TSS, we can relate these two quantities.
We can then ask the question: what is the min R2 which will ensure the regression model will be declared significant, as measured by the appropriate quantile from the F distribution?
The answer (below) shows that this depends on n, k, and SSE/TSS.
$$R^2_{min} = \frac{k}{n-k-1}\, F_{\alpha,\,k,\,n-k-1}\, \frac{SSE}{TSS}$$
Equivalently, since $SSE/TSS = 1 - R^2_{min}$ at the boundary, $R^2_{min} = \dfrac{k\,F_{\alpha,k,n-k-1}}{k\,F_{\alpha,k,n-k-1} + n - k - 1}$.
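A sketch of that solved form, assuming scipy (the function name is my own):

```python
from scipy import stats

def min_r_squared(n, k, alpha=0.05):
    """Smallest R^2 declared significant by the overall F-test: from
    R^2/(1 - R^2) = (k/(n-k-1)) * F_crit, so R^2_min = kF / (kF + n-k-1)."""
    F = stats.f.ppf(1.0 - alpha, k, n - k - 1)
    return k * F / (k * F + n - k - 1)
```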
15-27
Minimum R2 for Simple Linear Regression (k=1)