
Math 445, Chapter 11: Model Checking and Refinement

Rainfall data

In the rainfall data, we ended up leaving out case 28 (Death Valley) because it had a large residual and its altitude was the lowest in the data set. The resulting model is therefore not applicable to such low-altitude locations. If case 28 had not been unusual, we would not have been justified in omitting it.

Without #28 (Death Valley):

Coefficients (Model 1; dependent variable: Log10(Precipitation)):

                      Unstandardized              Standardized
                      B          Std. Error       Beta        t        Sig.
(Constant)           -2.074      .525                         -3.951   .001
Altitude (ft)         .000725    .000241          4.647        3.012   .006
Latitude (degrees)    .093924    .014285           .773        6.575   .000
Rainshadow           -.431176    .059929          -.662       -7.195   .000
Altitude*Latitude    -.000019    .000006         -4.620       -2.959   .007

R² = .80. Since there is an interaction between Altitude and Latitude, interpretation of the coefficients for these variables becomes a little complicated. However, we can interpret the effect of the Rainshadow variable in this model.
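One way to read the Rainshadow coefficient (a worked reading, not part of the original notes): since the response is $\log_{10}(\text{Precipitation})$, the coefficient has a multiplicative interpretation. Holding Altitude and Latitude fixed, a rainshadow location is estimated to receive

$$10^{-0.431} \approx 0.37$$

times as much precipitation as an otherwise comparable location, i.e., roughly a 63% reduction.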


Case 28 is an example of an outlier: a case for which the model does not fit well. Outliers have large residuals. We are also interested in influential cases: cases whose omission changes the fitted model substantially. Influential cases may not be outliers. Least squares is sensitive to unusual cases, and an influential case may "pull" the regression plane toward it so much that it does not have a large residual. In simple linear regression, we could often identify influential cases simply from a scatterplot. In multiple regression, it may not be possible to see influential cases in pairwise scatterplots, and we need additional tools.

Case-influence statistics

Leverage: The leverage of a case is based only on the values of the explanatory variables. It measures the distance of the case from the mean of the explanatory variables (in multidimensional space). For one explanatory variable, the leverage is

$$h_i = \frac{1}{n-1}\left[\frac{(X_i - \bar{X})^2}{s_X^2}\right] + \frac{1}{n} = \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n}(X_j - \bar{X})^2} + \frac{1}{n}$$

With more than one explanatory variable, the leverage is a measure of distance in higher-dimensional space. The distance takes into account the joint variability of the variables (see Display 11.10 on p. 316). High-leverage cases are easy to identify visually with only one explanatory variable, but become increasingly difficult to identify visually with more explanatory variables. Leverages are always between 1/n and 1. The average of all the leverages in a data set is always p/n, where p is the number of regression coefficients (the number of explanatory variables plus one for the intercept). SPSS computes centered leverages (under the Linear Regression…Save button), even though it calls them simply "leverages." The centered leverage is $h_i - 1/n$; therefore, the centered leverage is between 0 and $1 - 1/n$.

Leverage measures the potential influence of a case. High-leverage cases have the potential to change the least squares fit substantially.
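As a concrete check, here is a minimal sketch in Python (numpy only; the helper name and example data are mine, borrowing the five $X_i$ values from the simulation example later in these notes). It uses the standard fact that the leverages are the diagonal of the hat matrix $H = X(X^TX)^{-1}X^T$, where $X$ includes an intercept column:

```python
import numpy as np

def leverages(X):
    """Leverages: diagonal of the hat matrix H = X (X'X)^{-1} X',
    after adding a leading column of 1s for the intercept."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])       # add intercept column
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T    # hat matrix
    return np.diag(H)

# The five X values from the simulation example later in these notes
X = np.array([[1.0], [4.0], [5.0], [6.0], [14.0]])
h = leverages(X)
print(h)             # approx [0.466, 0.243, 0.211, 0.200, 0.881]
print(h.mean())      # 0.4 = p/n with p = 2 coefficients, n = 5
print(h - 1/5)       # centered leverages (what SPSS saves)
```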


Studentized residuals

While the true residuals (what we called the $\varepsilon_i$) all have the same standard deviation $\sigma$ in the regression model, the observed residuals $e_i$ don't. Why not? Consider simple linear regression:

• True residual: $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$

• Observed residual: $e_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$

First, we already know that the observed residuals tend to be smaller in size than the true residuals. That's why we divide by $n-2$ when we compute the standard deviation of the observed residuals to get an estimate of the standard deviation of the true residuals. The reason the observed residuals tend to be smaller is that the least squares line is the line that best fits the data, so the deviations from this line will tend to be smaller than the deviations from the true line.

What do we mean when we say that the residuals do not all have the same standard deviation? How can a single value have a standard deviation? What we mean is: what is the standard deviation of the residuals at each $X_i$ across many simulated sets of data from the linear regression model with a fixed set of $X_i$'s? To carry out this simulation we would use the following steps; the $X_i$'s remain the same for every simulation.

1. Generate a set of $Y_i$'s, where each $Y_i$ is drawn from a normal distribution with mean $\beta_0 + \beta_1 X_i$ and standard deviation $\sigma$. This gives a set of n pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$.
2. Fit the least squares line.
3. Compute the residuals.
4. Repeat steps 1-3 many times, with a new set of $Y_i$'s each time.

Now look at the distribution of observed residuals at each $X_i$ and, in particular, compute the standard deviation of the observed residuals at each $X_i$. You will find that the standard deviations are different: the standard deviation of the residuals at $X_i$'s far from $\bar{X}$ (high-leverage values) is smaller than at $X_i$'s near $\bar{X}$ (low-leverage values). In fact, it can be shown that the standard deviation of the residual at $X_i$ is

$$\text{SD}(\text{Residual}_i) = \sigma\sqrt{1 - h_i}$$

where $h_i$ is the leverage. This formula applies to any multiple regression model, not just the simple linear regression model.


Example (simple linear regression): Suppose $Y_i$ is normal with mean $\mu(Y_i) = 1 + 2X_i$, $i = 1, \ldots, 5$, and standard deviation $\sigma = 1$, and that the $X_i$'s are 1, 4, 5, 6, and 14. Here are the $Y_i$'s from one simulation: 3.42, 9.86, 10.05, 12.90, 27.38. The least squares line is $\hat{Y}_i = 1.73 + 1.83X_i$ and the residuals are: -0.145, 0.803, -0.844, 0.182, 0.004. Repeating the simulation 10,000 times, here are the mean and standard deviation of the residuals at each $X_i$:

$X_i$                    1       4       5       6       14
mean of residuals        0.008   0.002  -0.013   0.000    0.004
std. dev. of residuals   0.737   0.869   0.884   0.900    0.350
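A minimal sketch in Python (numpy only; the seed and variable names are mine) that reproduces this simulation and compares the results with the formula $\sigma\sqrt{1-h_i}$ above:

```python
import numpy as np

rng = np.random.default_rng(445)
X = np.array([1.0, 4.0, 5.0, 6.0, 14.0])
reps = 10_000

resids = np.empty((reps, len(X)))
for r in range(reps):
    Y = 1 + 2 * X + rng.standard_normal(len(X))  # Y_i ~ Normal(1 + 2 X_i, 1)
    b1, b0 = np.polyfit(X, Y, 1)                 # least squares fit
    resids[r] = Y - (b0 + b1 * X)                # observed residuals

print(resids.mean(axis=0))  # all near 0
print(resids.std(axis=0))   # close to the simulated values in the table

# Theoretical SD(residual_i) = sigma * sqrt(1 - h_i), with sigma = 1
h = 1 / len(X) + (X - X.mean()) ** 2 / ((X - X.mean()) ** 2).sum()
print(np.sqrt(1 - h))       # approx [0.731, 0.870, 0.889, 0.894, 0.345]
```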

Calculate the leverages for these 5 $X_i$'s. Use the formula above to calculate the standard deviation of the residuals at each $X_i$. How do they match the values estimated from the simulation? Why are we so concerned about the standard deviation of the residuals at different $X_i$'s?

• Because a big residual is more unusual at a high-leverage point than at a low-leverage point. Therefore, standardizing the residuals by an estimate of their standard deviation gives a better way to compare residuals. Since residuals always have mean 0, this means dividing each residual by an estimate of its standard deviation $\sigma\sqrt{1 - h_i}$. Since we don't know $\sigma$, we replace it by $\hat{\sigma}$ (the square root of the mean square residual in the ANOVA table).

• The studentized residual is

$$\text{studres}_i = \frac{\text{res}_i}{\hat{\sigma}\sqrt{1 - h_i}}\,.$$


• Studentized residuals are also sometimes called internally studentized residuals. In SPSS, they are called “studentized residuals” (under the Save button on the Linear Regression window).

• A potential problem with the studentized residuals is that $\hat{\sigma}$ may be inflated if a residual is an outlier. Therefore, a modified version of the studentized residual is the externally studentized residual, called the studentized deleted residual in SPSS. Here $\hat{\sigma}$ is replaced by $\hat{\sigma}_{(i)}$, the estimated standard deviation of the residuals from the model fit with the $i$th observation omitted:

$$\text{studres}_i^* = \frac{\text{res}_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_i}}$$
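Both versions are easy to compute outside SPSS. A minimal sketch in Python (numpy only; the helper is mine, not SPSS's, and the shortcut for $\hat{\sigma}_{(i)}$ is the standard deletion identity, which avoids refitting the model n times):

```python
import numpy as np

def studentized_residuals(Xd, y):
    """Internally and externally studentized residuals.
    Xd: (n, p) design matrix including the intercept column."""
    n, p = Xd.shape
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    res = y - Xd @ beta
    h = np.diag(Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T)   # leverages
    sigma2 = res @ res / (n - p)                        # mean square residual
    internal = res / np.sqrt(sigma2 * (1 - h))
    # sigma-hat^2 with case i deleted, via the deletion identity
    sigma2_del = (res @ res - res**2 / (1 - h)) / (n - p - 1)
    external = res / np.sqrt(sigma2_del * (1 - h))
    return internal, external
```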

Internally and externally studentized residuals can be used in just the same way as the raw residuals: in residual plots, normal probability plots, etc. In fact, they are preferred to the raw residuals because the nonconstant variance of the raw residuals has been corrected for. When examining studentized residuals, one should look for outliers. In addition, one can use the standard normal distribution as a rough guide for identifying unusual values: e.g., we expect about 5% of values to be less than -2 or greater than 2, and less than 1% to be outside the range -3 to 3.

Cook's Distance

A more direct measure of the influence of an observation is Cook's Distance, which measures how much the fitted values change when each observation is omitted. For case i,

$$D_i = \frac{\sum_{j=1}^{n} \left( \hat{Y}_{j(i)} - \hat{Y}_j \right)^2}{p\,\hat{\sigma}^2}$$

where p is the number of regression coefficients. The numerator of the above expression is what’s important; the denominator just standardizes the statistic.

$\hat{Y}_j$ is the fitted value for case j when the whole data set is used to fit the model. $\hat{Y}_{j(i)}$ is the fitted value for case j when case i is omitted in fitting the model. So, for example, to calculate $D_1$ we omit case 1, fit the model, and calculate the fitted values for all observations including case 1. We then calculate the sum of the squared differences between these fitted values and the fitted values from the model fit to all the data. A value of Cook's D close to or greater than 1 is often considered indicative of an observation with large influence. While Cook's D is a useful measure if the goal of the model is prediction, it is not as useful for seeing how a particular coefficient changes when an observation is omitted. However, it can be used to identify cases to check: omit a case with large Cook's D and see how the coefficients of interest change.
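A minimal sketch in Python (numpy only; the helper is mine) that computes Cook's D exactly as just described, by refitting with each case omitted in turn:

```python
import numpy as np

def cooks_distance(Xd, y):
    """Cook's D from its definition, refitting with each case omitted.
    Xd: (n, p) design matrix including the intercept column."""
    n, p = Xd.shape
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    fitted = Xd @ beta
    res = y - fitted
    sigma2 = res @ res / (n - p)                 # mean square residual
    D = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # omit case i
        beta_i = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)[0]
        # fitted values for ALL cases, from the fit without case i
        D[i] = ((Xd @ beta_i - fitted) ** 2).sum() / (p * sigma2)
    return D
```

In practice the n refits are avoided: the identity $D_i = \text{studres}_i^2\, h_i / \big(p(1 - h_i)\big)$, using the internally studentized residual, gives the same values in a single pass.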


Other measures of influence

A number of other measures of influence have been proposed. However, some of these measures are redundant and it is not necessary to look at all of them. Two others that SPSS computes are DfFits, which measures how much the predicted value for case i changes when case i is omitted, and DfBetas, which measures how much the omission of case i changes each of the coefficients in the model (hence, for each case, there is a separate DfBetas value for each variable).

Rainfall data: model Logprec = Altitude + Latitude + Rainshadow + Altitude*Latitude

Results with all cases included: [SPSS output not reproduced]


Without case #28: [SPSS output not reproduced; the same fit appears in the table at the start of these notes]

In Sec. 11.4.4, p. 320, Ramsey and Schafer suggest that if "the residual plot from a good inferential model fails to suggest any problems, there is generally no need to examine case influence statistics at all." I agree, except that I would suggest that the residual plot use the externally studentized residuals (= studentized deleted residuals).


We next examine two types of plots useful in refining models:

• Partial regression leverage plots (also called added-variable plots) are useful for visually identifying influential and high leverage points for each regression coefficient separately. These are not discussed in the text, but are easily available in SPSS.

• Partial residual plots (also called component-plus-residual plots) are useful for identifying nonlinear relationships in a multiple regression model. These are discussed in the text, but are not readily available in SPSS. They can be constructed in SPSS, but it's rather tedious.

It might seem that simply plotting the response variable Y versus each explanatory variable would be adequate for assessing the relationship between Y and each X variable in a multiple regression model. However, these plots can be misleading because they do not control for the values of the other X variables. For example, an apparently strong relationship between Y and $X_1$ may disappear when other variables are included in the model. If the scatterplot of Y versus $X_1$ looks curved, it does not necessarily mean that a squared term will be necessary with the other X variables in the model. Similarly, a case that appears influential in the Y versus $X_1$ scatterplot may not be influential with the other X variables in the model, and a case that doesn't appear influential may turn out to be so with the other X variables in the model.

Plots of the residuals versus each X variable are also inadequate. They are better than Y versus X plots because they show only the unexplained variation in Y on the y-axis. However, the X variables are not adjusted for relationships with each other.

Partial regression leverage plots (not in text)

• A partial regression leverage plot (or added-variable plot) attempts to separate out the relationship between the response and any explanatory variable after adjusting for the other explanatory variables in the model.

• The steps involved in creating the partial regression leverage plot for variable $X_1$ are:

1. Compute the residuals from the regression of Y on all the other X variables in the model except $X_1$.

2. Compute the residuals from the regression of $X_1$ on all the other X variables in the model.

3. Plot the first set of residuals on the y-axis against the second set on the x-axis.

Steps 1-3 are repeated for all the X variables in the model. The partial regression leverage plot for $X_1$ looks at the relationship between Y and $X_1$ after adjusting for the other X variables. It turns out that the slope of the least squares line for this plot is exactly equal to $\hat{\beta}_1$, the coefficient of $X_1$ in the regression model with all the X's in it. In addition, high-leverage and influential cases for $\hat{\beta}_1$ can be identified from this scatterplot. This is the primary use of the partial regression leverage plots. A code sketch of steps 1-3 appears below.

SPSS: Partial regression leverage plots for all X variables can be generated automatically in SPSS by selecting "Produce all partial plots" on the Plot menu for the Regression…Linear menu.
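A minimal sketch of steps 1-3 in Python (numpy and matplotlib; the function and names are mine):

```python
import numpy as np
import matplotlib.pyplot as plt

def added_variable_plot(y, x1, X_others):
    """Partial regression leverage (added-variable) plot for x1.
    X_others: (n, k-1) array of the remaining explanatory variables."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X_others])           # intercept + other X's
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]     # step 1
    rx = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]   # step 2
    plt.scatter(rx, ry)                                   # step 3
    plt.xlabel("x1 adjusted for the other X's")
    plt.ylabel("Y adjusted for the other X's")
    # Slope of the least squares line here equals beta1-hat in the full model
    print("slope =", (rx @ ry) / (rx @ rx))
    plt.show()
```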


Partial residual plots

• A partial residual plot (or component-plus-residual plot) is constructed differently from a partial regression leverage plot, but it also has the property that the slope of the least squares line through the plot is the coefficient for that variable in the multiple regression model with all the X variables included.

• Partial residual plots are better than partial regression leverage plots for identifying nonlinear relationships between Y and an X variable after adjusting for the other X variables in the model.

• If a clear nonlinear relationship is identified, possible solutions include adding the square of the X variable to the model, transforming the X variable, or transforming the Y variable.

• To construct the partial residual plot for $X_1$, use the following steps. For the sake of this example, assume there are three other X variables in the model: $X_2, X_3, X_4$.

1. Regress Y on all the X variables to obtain $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3 + \hat{\beta}_4 X_4$.

2. Compute the partial residuals for $X_1$ as $\text{pres} = Y - \hat{\beta}_0 - \hat{\beta}_2 X_2 - \hat{\beta}_3 X_3 - \hat{\beta}_4 X_4$.

3. Plot the partial residuals for $X_1$ (on the y-axis) against $X_1$ (on the x-axis).

Steps 1-3 should be repeated for $X_2, X_3, X_4$.

• Partial residual plots are also useful for identifying high-leverage and influential cases.

• SPSS does not automatically produce partial residual plots (recall that "partial plots" in the SPSS regression menu means "partial regression leverage plots"). It is somewhat of a hassle to produce these plots in SPSS manually, but it can be done by following steps 1-3. It is easier to replace step 2 with the equivalent calculation:

2. $\text{pres} = \text{res} + \hat{\beta}_1 X_1$, where res is the residual from the full model fit in step 1.

Thus, the steps are: fit the full model (step 1) and save the residuals as RES_1. Use Transform…Compute to compute the partial residuals as RES_1 + $\hat{\beta}_1 X_1$, where you type in the value of $\hat{\beta}_1$ from the model fit in step 1. Plot the partial residuals versus $X_1$. Repeat for the other variables.
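The same recipe outside SPSS, as a minimal Python sketch (numpy and matplotlib; the function and names are mine), using the shortcut form of step 2:

```python
import numpy as np
import matplotlib.pyplot as plt

def partial_residual_plot(y, X):
    """Partial residual (component-plus-residual) plot for column 0 of X.
    X: (n, k) array of explanatory variables, no intercept column."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])          # step 1: fit the full model
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]
    res = y - Xd @ beta
    pres = res + beta[1] * X[:, 0]                 # step 2 (shortcut form)
    plt.scatter(X[:, 0], pres)                     # step 3
    plt.xlabel("X1")
    plt.ylabel("partial residual for X1")
    plt.show()
```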

A loess smooth can be added to the partial residual plot to help identify non-linear relationships. Below are both partial regression leverage plots and partial residual plots for the rainfall data, where log10(Precip) is regressed on Altitude, Latitude, and Rainshadow with no interaction. It might be best to look for nonlinear relationships before considering interactions, but certainly these plots can also be used for models with interactions. Case #28 has been omitted.

Coefficients (Model 1; dependent variable: Log10(Precipitation)):

                      Unstandardized              Standardized
                      B          Std. Error       Beta        t        Sig.
(Constant)           -1.137      .479                         -2.372   .026
Altitude (ft)         .0000139   .0000167          .089        .832    .413
Latitude (degrees)    .06835     .01302            .562       5.250    .000
Rainshadow           -.40686     .06795           -.625      -5.988    .000


[Figures not reproduced: partial regression leverage plots and partial residual plots for the rainfall data]