Residuals Recall that the vertical distances from the points to the least-squares regression line are as small as possible. Because those vertical distances represent “left-over” variation in the response after fitting the regression line, these distances are called residuals.


Residuals

Recall that the vertical distances from the points to the least-squares regression line are as small as possible.

Because those vertical distances represent “left-over” variation in the response after fitting the regression line, these distances are called residuals.


In other words, a residual is the signed vertical distance from a point to the LSRL (the least-squares regression line): the observed y minus the predicted y.


Calculating a Residual

One subject's NEA rose by 135 calories and he gained 2.7 kg of fat. The predicted gain for 135 calories from the regression equation is:

$\hat{y} = 3.505 - 0.00344(135) = 3.04 \text{ kg}$

The residual for this subject is therefore:

residual = observed $y$ - predicted $y$ = $y - \hat{y}$

$2.7 - 3.04 = -0.34 \text{ kg}$
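The same arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the regression equation on the slide is predicted fat gain = 3.505 - 0.00344 × (NEA change), which reproduces the 3.04 kg prediction, and the function names are made up for this example.

```python
# Coefficients assumed from the slide's regression equation:
# predicted fat gain (kg) = 3.505 - 0.00344 * NEA change (calories).
INTERCEPT_KG = 3.505
SLOPE_KG_PER_CAL = -0.00344


def predict_fat_gain(nea_change_cal):
    """Predicted fat gain in kg for a given change in NEA (calories)."""
    return INTERCEPT_KG + SLOPE_KG_PER_CAL * nea_change_cal


def residual(observed, predicted):
    """Residual = observed y - predicted y."""
    return observed - predicted


predicted = predict_fat_gain(135)   # subject's NEA rose by 135 calories
res = residual(2.7, predicted)      # subject actually gained 2.7 kg

print(f"predicted = {predicted:.2f} kg, residual = {res:.2f} kg")
# prints: predicted = 3.04 kg, residual = -0.34 kg
```

A negative residual means the point lies below the line: this subject gained less fat than the line predicts.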


Fat Gain & NEA (yet again!)

Here are the residuals for all 16 data values from the NEA experiment:

Although residuals can be calculated from any model that is fitted to the data, the residuals from the least-squares line have a special property: the sum of the least-squares residuals is always zero. (Try adding the numbers above. Apart from rounding error, they add up to zero!)
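The zero-sum property is easy to check numerically. The sketch below uses a small made-up data set (not the NEA data) and a hand-written least-squares fit. The helper name `fit_least_squares` is an assumption for illustration. It also shows that a line other than the least-squares line generally does not have residuals that sum to zero.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_least_squares(xs, ys)

# Residuals from the least-squares line: observed - predicted.
ls_residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
ls_total = sum(ls_residuals)

# Residuals from a different line (same slope, intercept shifted up by 0.5).
other_residuals = [y - (intercept + 0.5 + slope * x) for x, y in zip(xs, ys)]
other_total = sum(other_residuals)

print(f"least-squares residuals sum to {ls_total:.6f}")
print(f"shifted-line residuals sum to {other_total:.6f}")
```

The first sum is zero up to floating-point error. The second is not, because shifting the line up by 0.5 lowers every residual by 0.5.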


In the residual plot, the horizontal line at residual = 0 corresponds to the regression line, and it also marks the mean of the residuals.

The residual plot magnifies the deviations from the line to make patterns easier to see.


Residual Plots

What to look for when examining a residual plot:

1. Residual plots should have no pattern.


A curved pattern shows that the relationship may not be linear.

Increasing spread about the line as x increases indicates the prediction will be less accurate for larger x values. Similarly, decreasing spread indicates the prediction will be less accurate for smaller x values.


2. The residuals should be relatively small in size.
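To see what a curved pattern looks like in the residuals, here is a minimal sketch that fits a straight line to made-up data that actually follow a curve (y = x²). The data and the helper name are assumptions for illustration only.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical curved data: y = x squared.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]

intercept, slope = fit_least_squares(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Print each x with its residual; the signs trace out a U shape.
for x, r in zip(xs, residuals):
    print(f"x = {x:.0f}, residual = {r:+.1f}")
```

The residuals are positive at both ends and negative in the middle. That systematic U shape is the kind of pattern that signals a straight line is not the right model, even though the residuals still sum to zero.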


The role of $r^2$ in regression

A residual plot is a graphical tool for evaluating how well a linear model fits the data.

Look at the residual plot first to see if a linear model is a good fit.

If the linear model is a good fit, then there is also a numerical quantity that tells us how well the LSRL does at predicting values of the response variable y. It is $r^2$, the coefficient of determination.


$r^2$ is actually the correlation squared, but there's more to the story...

The idea of $r^2$ is this: how much better is the least-squares line at predicting responses y than if we just used the mean of y as the prediction?


Is the LSRL better at predicting the data values than the mean? $r^2$ tells us how much better.

Here's the horizontal line that represents the mean of the y values in our data.

Here's our LSRL.


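Note: Remember we defined the variance back when we talked about standard deviation. $r^2$ compares the variation of the y values about their mean (the SST part of the equation) with the variation of the y values about the regression line, that is, the squared residuals (the SSE part of the equation).

Here's the formula:

$r^2 = \frac{SST - SSE}{SST}$, where $SST = \sum (y - \bar{y})^2$ and $SSE = \sum (y - \hat{y})^2$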
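As a rough sketch of how SST and SSE combine, the code below computes $r^2$ as (SST - SSE)/SST on a small made-up data set and checks that it matches the square of the correlation r. The data and helper names are assumptions for illustration, not the NEA values.

```python
from math import sqrt


def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def correlation(xs, ys):
    """Pearson correlation r between x and y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

intercept, slope = fit_least_squares(xs, ys)
mean_y = sum(ys) / len(ys)

# SST: squared errors when every prediction is the mean of y.
sst = sum((y - mean_y) ** 2 for y in ys)
# SSE: squared errors (squared residuals) when predicting with the LSRL.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = (sst - sse) / sst
r = correlation(xs, ys)

print(f"SST = {sst:.2f}, SSE = {sse:.2f}, r^2 = {r_squared:.3f}")
print(f"correlation squared = {r ** 2:.3f}")
```

For these made-up numbers SST is 6 and SSE is 2.4, so the line removes 60% of the squared prediction error that the mean alone leaves behind.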


For example, if $r^2 = 0.606$ (as it does in the NEA example), then about 61% of the variation in fat gain among the individual subjects is explained by the straight-line relationship between fat gain and NEA. The other 39% is individual variation among subjects that is not explained by the linear relationship.


When you report a regression, give $r^2$ as a measure of how successful the regression was in explaining the response. When you see a correlation, square it to get a better feel for the strength of the linear relationship.


Review

Facts About Least-Squares Regression

The distinction between explanatory and response variables is essential in regression. In the regression setting you must know clearly which variable is explanatory!


There is a close connection between correlation and the slope of the LSRL. The slope is

$b = r \frac{s_y}{s_x}$

This equation says that along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.
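The link between the slope and the correlation can be checked numerically. The sketch below, on made-up data, compares the slope from a direct least-squares fit with r × (s_y / s_x), using the standard library's `statistics.stdev` for the sample standard deviations. The data and the variable names are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

mean_x, mean_y = mean(xs), mean(ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Slope straight from the least-squares formula.
slope_direct = sxy / sxx

# Slope rebuilt from the correlation and the two standard deviations.
r = sxy / sqrt(sxx * syy)
slope_from_r = r * stdev(ys) / stdev(xs)

print(f"direct slope = {slope_direct:.4f}")
print(f"r * s_y / s_x = {slope_from_r:.4f}")
```

Both routes give the same slope, which is why a change of one standard deviation in x moves the prediction by r standard deviations in y.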


The least-squares regression line of y on x always passes through the point $(\bar{x}, \bar{y})$, that is, (mean of x values, mean of y values).
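A quick numerical check of this fact, again on made-up data: plugging the mean of x into the fitted line should return the mean of y. The data and the helper name are assumptions for illustration.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data, for illustration only.
xs = [2.0, 4.0, 5.0, 7.0, 10.0]
ys = [3.1, 4.8, 6.5, 7.9, 12.2]

intercept, slope = fit_least_squares(xs, ys)
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Prediction at the mean of x.
prediction_at_mean_x = intercept + slope * mean_x

print(f"mean of y = {mean_y:.4f}")
print(f"prediction at mean of x = {prediction_at_mean_x:.4f}")
```

The two printed values agree because the intercept is defined as the mean of y minus the slope times the mean of x.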


The correlation r describes the strength of a straight-line relationship. The square of the correlation, $r^2$, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.