Regression Several Explanatory Variables

Page 1: Regression

Regression

Several Explanatory Variables

Page 2: Regression

Example: Scottish hill races data.

These data are made available in R as

> library(MASS)
> data(hills)

They give record times (minutes) in 1984 of 35 Scottish hill races, against distance (miles) and total height climbed (feet).
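As a quick check of the description above, the data frame can be loaded and inspected. This is a minimal sketch; it assumes the MASS package is installed (it ships with standard R distributions).

```r
library(MASS)   # provides the hills data frame
data(hills)

dim(hills)      # number of races and number of variables
names(hills)    # column names: dist, climb, time
head(hills)     # first few races; the row names are the race names
```

Note that the distance column is called dist in the data frame, which is the name used in model formulas.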

Page 3: Regression
Page 4: Regression

We regard time as the response variable, and seek to model how its conditional distribution depends on the explanatory variables distance and climb.

Page 5: Regression

The R code pairs(hills) produces the plots shown.

Page 6: Regression

These show that the response variable time has a strong positive association with each of the explanatory variables distance and climb, although the dependence on distance is the stronger of the two.

However, the two explanatory variables distance and climb also have a strong positive association with each other, and this complicates the modelling.
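The associations seen in the scatterplots can also be summarised numerically. The sketch below computes the pairwise correlations on the full data set; the name hills.cor is only illustrative, and correlation is just one rough measure of the linear association visible in the plots.

```r
library(MASS)
data(hills)

# Pairwise correlations between dist, climb and time.
# The dist-climb entry measures the association between the two
# explanatory variables that complicates the modelling.
hills.cor <- cor(hills)
round(hills.cor, 2)
```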

Page 7: Regression

Preliminary analysis of the data suggests that the observation (number 18) corresponding to Knock Hill is almost certainly in error: the time is much too great for the given distance and climb, and it may have been misrecorded by 1 hour. We therefore omit Knock Hill from the analysis. (The plot and identify commands can be used to locate this observation.)
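One way to carry out this step is sketched below. The name hills.clean is an assumption introduced here for illustration; the interactive identify call is left as a comment because it needs a screen graphics device.

```r
library(MASS)
data(hills)

# Row 18 of the data frame is the race singled out in the text.
rownames(hills)[18]
hills[18, ]

# Interactively, points can be labelled by clicking on a scatterplot:
#   plot(hills$dist, hills$time)
#   identify(hills$dist, hills$time, labels = rownames(hills))

# Drop the suspect observation by name rather than by position.
hills.clean <- hills[rownames(hills) != "Knock Hill", ]
nrow(hills.clean)
```

Removing the row by name avoids relying on the row order staying fixed.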

Page 8: Regression

On physical grounds we attempt to find a model with zero intercept. We consider first a linear model (Model 1) involving both the explanatory variables distance and climb.

time = a x distance + b x climb + ε

Page 9: Regression
Page 10: Regression

The fitted model is

time = 5.47 x dist + 0.0106 x climb + ε
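A fit of this form can be obtained roughly as follows. This is a sketch under stated assumptions: hills.clean is an illustrative name for the data without Knock Hill, and the formula is one natural way to write a zero-intercept model in R rather than a record of the exact call used for the slides.

```r
library(MASS)
data(hills)
hills.clean <- hills[rownames(hills) != "Knock Hill", ]

# "-1" in the formula removes the intercept, so the fitted
# surface passes through the origin (zero distance and climb, zero time).
hills.model.1 <- lm(time ~ -1 + dist + climb, data = hills.clean)

summary(hills.model.1)   # coefficient table with significance stars
coef(hills.model.1)      # estimated coefficients of dist and climb
```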

Page 11: Regression

The “three stars” attached to each coefficient estimate in the R summary output (indicating a p-value below 0.001) show that distance and climb are both important explanatory variables.

(This can be confirmed by noting the very much poorer fits obtained if either of these variables is omitted).
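The comparison with the reduced models might be made along these lines. The model names full, dist.only and climb.only are hypothetical, and the residual sum of squares is used here as one simple measure of fit.

```r
library(MASS)
data(hills)
hills.clean <- hills[rownames(hills) != "Knock Hill", ]

# Zero-intercept fits with both variables and with each one alone.
full       <- lm(time ~ -1 + dist + climb, data = hills.clean)
dist.only  <- lm(time ~ -1 + dist,         data = hills.clean)
climb.only <- lm(time ~ -1 + climb,        data = hills.clean)

# Residual sum of squares for each fit: smaller means a closer fit.
rss <- c(full       = deviance(full),
         dist.only  = deviance(dist.only),
         climb.only = deviance(climb.only))
rss
```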

Page 12: Regression

> plot(hills.model.1) produces

Page 13: Regression
Page 14: Regression
Page 15: Regression
Page 16: Regression

The pattern of residuals leads us to suspect that there may be some nonlinear dependence on climb and/or distance. This would be physically quite natural. It seems reasonable here to introduce quadratic terms as a first attempt to model any nonlinearity.
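Alongside the diagnostic plots, the residuals can be inspected directly. The sketch below refits Model 1 (with hills.clean and res.1 as illustrative names) and lists the races with the largest absolute residuals.

```r
library(MASS)
data(hills)
hills.clean <- hills[rownames(hills) != "Knock Hill", ]
hills.model.1 <- lm(time ~ -1 + dist + climb, data = hills.clean)

# Residuals are named by race, so sorting shows which races fit worst.
res.1 <- residuals(hills.model.1)
head(sort(abs(res.1), decreasing = TRUE), 3)
```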

Page 17: Regression

We consider now the (quite elaborate) model (Model 2):

time = a0 x distance + b0 x (distance)^2 + c0 x climb + d0 x (climb)^2 + ε

Page 18: Regression
Page 19: Regression

The fitted model is now:

time = 5.62 x distance + 0.0323 x (distance)^2 + 0.000262 x climb + 0.00000180 x (climb)^2 + ε
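A quadratic model of this form can be fitted as sketched below. The name hills.model.2 is assumed by analogy with hills.model.1, and hills.clean is an illustrative name for the data without Knock Hill.

```r
library(MASS)
data(hills)
hills.clean <- hills[rownames(hills) != "Knock Hill", ]

# I() protects the arithmetic, so dist^2 and climb^2 are treated as
# squared terms rather than as formula operators.
hills.model.2 <- lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2),
                    data = hills.clean)

summary(hills.model.2)
coef(hills.model.2)
```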

Page 20: Regression

The analysis, most notably “star values” associated with the estimate of the coefficient of (climb)^2, shows that there is indeed evidence of nonlinearity in the dependence on climb, and (given also physical considerations) quite possibly in the dependence on distance.
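The row of the coefficient table for the squared climb term can be pulled out directly, as sketched below. The F test comparing the two nested models is an additional check not taken from the slides, and the object names are illustrative.

```r
library(MASS)
data(hills)
hills.clean <- hills[rownames(hills) != "Knock Hill", ]
hills.model.1 <- lm(time ~ -1 + dist + climb, data = hills.clean)
hills.model.2 <- lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2),
                    data = hills.clean)

# Estimate, standard error, t value and p-value for each term of Model 2.
coef.table <- summary(hills.model.2)$coefficients
coef.table["I(climb^2)", ]
coef.table["I(dist^2)", ]

# Model 1 is nested in Model 2, so the two quadratic terms
# can be tested jointly with an F test.
anova(hills.model.1, hills.model.2)
```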

Page 21: Regression
Page 22: Regression
Page 23: Regression

The pattern of residuals is now more randomly spread, indicating a better model than the first one.

Page 24: Regression

Finally, the residuals of model 1 can be plotted against those of model 2.

Page 25: Regression

This suggests that Model 2 is a considerable improvement, at least insofar as it reduces the large residuals associated with the three labelled observations.

The observations corresponding to Bens of Jura and Lairig Ghru remain moderately influential.
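The comparison of residuals described above can be sketched as follows. The object names are illustrative. The three races listed are simply those with the largest Model 1 residuals, which may or may not coincide with the labelled points on the slide.

```r
library(MASS)
data(hills)
hills.clean <- hills[rownames(hills) != "Knock Hill", ]
hills.model.1 <- lm(time ~ -1 + dist + climb, data = hills.clean)
hills.model.2 <- lm(time ~ -1 + dist + I(dist^2) + climb + I(climb^2),
                    data = hills.clean)

res.1 <- residuals(hills.model.1)
res.2 <- residuals(hills.model.2)

# Scatterplot of one set of residuals against the other; points close to
# the horizontal dashed line are races that Model 2 fits closely.
plot(res.1, res.2, xlab = "Model 1 residuals", ylab = "Model 2 residuals")
abline(h = 0, v = 0, lty = 2)

# Races with the largest Model 1 residuals, with both residuals side by side.
worst <- names(sort(abs(res.1), decreasing = TRUE))[1:3]
cbind(model.1 = res.1[worst], model.2 = res.2[worst])

# Residual sums of squares for the two fits.
c(model.1 = sum(res.1^2), model.2 = sum(res.2^2))
```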