Copyright © 2012 Pearson Education. All rights reserved. 17-1 Copyright © 2012 Pearson Education. All rights reserved. Chapter 17 Understanding Residuals


Chapter 17

Understanding Residuals

17.1 Examining Residuals for Groups

Consider the following study of the Sugar content vs. the Calorie content of breakfast cereals:

There is no obvious departure from the linearity assumption.

17.1 Examining Residuals for Groups

The histogram of residuals looks fairly normal…


17.1 Examining Residuals for Groups

…but the distribution shows signs of being a composite of three groups of cereal types.

The mean Calorie content may depend on some factor besides sugar content.

17.1 Examining Residuals for Groups

Examining the residuals of groups…

…suggests factors other than sugar content that may be important in determining Calorie content.

Puffing: replacing cereal with “air” lowers the Calorie content, even for high-sugar cereals

Fat/oil: Fats add to the Calorie content, even for low-sugar cereals

Puffed cereals (high air content per serving)

Cereals with fruits and/or nuts (high fat/oil content per serving)

All others

17.1 Examining Residuals for Groups

Conclusion:

It may be better to report three regressions, one for puffed cereals, one for high-fat cereals, and one for all others.
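A minimal sketch of the group-by-group idea, assuming invented cereal values (the group names, sugar amounts, and calorie counts below are hypothetical, not the study's data). It fits one pooled line, averages the pooled residuals within each group, then fits a separate line per group.

```python
# Hypothetical cereal data: (group, sugar in g, calories per serving).
# All values are invented for illustration only.
cereals = [
    ("puffed", 2, 56), ("puffed", 6, 64), ("puffed", 10, 76), ("puffed", 14, 84),
    ("other", 2, 94), ("other", 6, 106), ("other", 10, 114), ("other", 14, 126),
    ("fruit_nut", 2, 126), ("fruit_nut", 6, 134),
    ("fruit_nut", 10, 146), ("fruit_nut", 14, 154),
]


def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# One pooled regression that ignores the groups.
pooled_intercept, pooled_slope = fit_line(
    [s for _, s, _ in cereals], [c for _, _, c in cereals]
)

# Collect the rows belonging to each group.
groups = {}
for group, sugar, calories in cereals:
    groups.setdefault(group, []).append((sugar, calories))

mean_residual = {}  # average pooled residual within each group
group_fits = {}     # separate (intercept, slope) for each group
for group, rows in groups.items():
    residuals = [c - (pooled_intercept + pooled_slope * s) for s, c in rows]
    mean_residual[group] = sum(residuals) / len(residuals)
    group_fits[group] = fit_line([s for s, _ in rows], [c for _, c in rows])

for group in groups:
    b0, b1 = group_fits[group]
    print(f"{group}: mean pooled residual {mean_residual[group]:+.1f}, "
          f"own fit Calories = {b0:.1f} + {b1:.2f} Sugar")
```

Group means of pooled residuals that sit well above or below zero are the signal that one line is averaging over distinct groups.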


17.1 Examining Residuals for Groups

Example: Concert Venues

A concert production company examined its records and made the following scatterplot. The company places concerts in two venues, a smaller, more intimate theatre (plotted with blue circles) and a larger auditorium-style venue.

Describe the relationship between Talent Cost and Total Revenue. How are the results for the two venues similar? Different?

17.1 Examining Residuals for Groups

Example: Concert Venues (continued)

Describe the relationship between Talent Cost and Total Revenue.

Positive, linear, and moderately strong. As Talent Cost increases, Revenue also increases.

17.1 Examining Residuals for Groups

Example: Concert Venues (continued)

How are the results for the two venues similar? Both venues show an increase of revenue with talent cost.

Different? The larger venue has greater variability. Revenue for that venue is more difficult to predict.


17.2 Extrapolation and Prediction

Extrapolating – predicting a y value by extending the regression model to regions outside the range of the x-values of the data.

17.2 Extrapolation and Prediction

Why is extrapolation dangerous?

It introduces the questionable and untested assumption that the relationship between x and y does not change.

17.2 Extrapolation and Prediction

Cautionary Example: Oil Prices in Constant Dollars

Model Prediction (Extrapolation):

On average, the price of a barrel of oil will increase $7.39 per year from 1983 to 1998.

17.2 Extrapolation and Prediction

Cautionary Example: Oil Prices in Constant Dollars

Actual Price Behavior

Extrapolating the 1971-1982 model to the ’80s and ’90s led to grossly erroneous forecasts.

17.2 Extrapolation and Prediction

Remember: Linear models ought not be trusted beyond the span of the x-values of the data.

If you extrapolate far into the future, be prepared for the actual values to be (possibly quite) different from your predictions.
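One way to keep this warning in view is to have predictions carry a flag when x falls outside the fitted range. The sketch below assumes made-up (x, y) values and a hypothetical helper name `predict_with_flag`; it is an illustration, not a prescribed method.

```python
def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_with_flag(xs, ys, new_x):
    """Predict y at new_x and report whether new_x is an extrapolation."""
    intercept, slope = fit_line(xs, ys)
    is_extrapolation = not (min(xs) <= new_x <= max(xs))
    return intercept + slope * new_x, is_extrapolation


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

for new_x in (3, 10):
    prediction, flagged = predict_with_flag(xs, ys, new_x)
    note = "EXTRAPOLATION: treat with caution" if flagged else "within the data"
    print(f"x = {new_x}: predicted y = {prediction:.2f} ({note})")
```

The flag does not make the prediction at x = 10 any better; it only records that the model has not been tested there.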

17.3 Unusual and Extraordinary Observations

Outliers, Leverage, and Influence

In regression, an outlier can stand out in two ways. It can have…

1) a large residual:

17.3 Unusual and Extraordinary Observations

Outliers, Leverage, and Influence

In regression, an outlier can stand out in two ways. It can have…

2) a large distance from x̄, the mean of the x-values: “High-leverage point”

A high leverage point is influential if omitting it gives a regression model with a very different slope.
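A small sketch of both ideas, assuming invented data in which the last point sits far from the other x-values. Leverage uses the standard simple-regression formula h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², and influence is checked by refitting without the point.

```python
def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def leverages(xs):
    """Leverage of each case in a simple (one-predictor) regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]


# Hypothetical data: the last case is far from the other x-values.
xs = [1, 2, 3, 4, 5, 20]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

lev = leverages(xs)
_, slope_with = fit_line(xs, ys)
_, slope_without = fit_line(xs[:-1], ys[:-1])

print(f"leverage of last point: {lev[-1]:.3f}")
print(f"slope with the point:    {slope_with:.3f}")
print(f"slope without the point: {slope_without:.3f}")
```

Here the slope changes substantially when the point is omitted, so this high-leverage point is also influential.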


17.3 Unusual and Extraordinary Observations

Outliers, Leverage, and Influence

Tell whether the point is a high-leverage point, if it has a large residual, and if it is influential.

Not high-leverage

Large residual

Not very influential


17.3 Unusual and Extraordinary Observations

Outliers, Leverage, and Influence

Tell whether the point is a high-leverage point, if it has a large residual, and if it is influential.

High-leverage

Small residual

Not very influential


17.3 Unusual and Extraordinary Observations

Outliers, Leverage, and Influence

Tell whether the point is a high-leverage point, if it has a large residual, and if it is influential.

High-leverage

Medium residual

Very influential (omitting the red point will change the slope dramatically!)

17.3 Unusual and Extraordinary Observations

Outliers, Leverage, and Influence

What should you do with a high-leverage point?

Sometimes, these points are important. They can indicate that the underlying relationship is in fact nonlinear.

Other times, they simply do not belong with the rest of the data and ought to be omitted.

When in doubt, create and report two models: one with the outlier and one without.

17.3 Unusual and Extraordinary Observations

Outliers, Leverage, and Influence

WARNING:

Influential points do not necessarily have high residuals and therefore can hide in residual plots.

So, use scatterplots rather than residual plots to identify high-leverage outliers.

(Residual plots work well of course for identifying high-residual outliers.)


17.3 Unusual and Extraordinary Observations

Example: Hard Drive Prices

Prices for external hard drives are linearly associated with the Capacity (in GB). The least squares regression line without a 200 GB drive that sold for $299.00 was found to be

$\widehat{\text{Price}} = 18.64 + 0.104 \times \text{Capacity}$.

The regression equation with the original data is

$\widehat{\text{Price}} = 66.57 + 0.088 \times \text{Capacity}$.

How are the two equations different? The intercepts are different, but the slopes are similar.

Does the new point have a large residual? Explain. Yes. The hard drive’s price doesn’t fit the pattern since it pulled the line up but didn’t decrease the slope very much.
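The residual for the 200 GB drive can be checked directly from the two fitted equations. This sketch assumes the equations read Price = 18.64 + 0.104 Capacity and Price = 66.57 + 0.088 Capacity; the function name is illustrative.

```python
def predicted_price(intercept, slope, capacity_gb):
    """Price predicted by a fitted line at the given capacity."""
    return intercept + slope * capacity_gb


capacity_gb, actual_price = 200, 299.00

# Line fitted without the 200 GB drive.
pred_without = predicted_price(18.64, 0.104, capacity_gb)
# Line fitted with all of the original data.
pred_with = predicted_price(66.57, 0.088, capacity_gb)

resid_without = actual_price - pred_without
resid_with = actual_price - pred_with

print(f"without the drive: predicted ${pred_without:.2f}, residual ${resid_without:.2f}")
print(f"with the drive:    predicted ${pred_with:.2f}, residual ${resid_with:.2f}")
```

Under either line the drive sells for more than $200 above its predicted price, which is what "large residual" means here.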


17.4 Working with Summary Values

Scatterplots of summarized (averaged) data tend to show less variability than the un-summarized data.

Wind speeds at two locations, collected at 6AM, noon, 6PM, and midnight.

Raw data: R² = 0.736

Daily-averaged data: R² = 0.844

Monthly-averaged data: R² = 0.942
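The same effect can be reproduced with simulated readings. The sketch below assumes two hypothetical sites that share slow monthly and daily components plus independent reading-level noise; every number in it is invented, and it does not use the wind-speed data.

```python
import random


def r_squared(xs, ys):
    """Squared correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


def block_means(values, size):
    """Average consecutive blocks of `size` readings."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values), size)]


rng = random.Random(0)  # fixed seed so the run is repeatable
MONTHS, DAYS, PER_DAY = 24, 30, 4  # hypothetical: 4 readings a day

site_a, site_b = [], []
for _ in range(MONTHS):
    month_level = rng.gauss(0, 2)        # component shared by both sites
    for _ in range(DAYS):
        day_level = rng.gauss(0, 1)      # component shared by both sites
        for _ in range(PER_DAY):
            shared = 10 + month_level + day_level
            site_a.append(shared + rng.gauss(0, 2))  # site-specific noise
            site_b.append(shared + rng.gauss(0, 2))

r2_raw = r_squared(site_a, site_b)
r2_daily = r_squared(block_means(site_a, PER_DAY), block_means(site_b, PER_DAY))
r2_monthly = r_squared(block_means(site_a, DAYS * PER_DAY),
                       block_means(site_b, DAYS * PER_DAY))

print(f"raw: {r2_raw:.3f}  daily: {r2_daily:.3f}  monthly: {r2_monthly:.3f}")
```

Averaging removes reading-level noise but not the shared components, so R² rises even though the underlying relationship between individual readings has not changed.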


17.4 Working with Summary Values

WARNING:

Be suspicious of conclusions based on regressions of summary data.

Regressions based on summary data may look better than they really are!

In particular, the strength of the correlation will be misleading.


17.5 Autocorrelation

Time-series data are sometimes autocorrelated, meaning points near each other in time will be related.

First-order autocorrelation: Adjacent measurements are related

Second-order autocorrelation: Every other measurement is related

etc…

Autocorrelation violates the independence condition. Regression analysis of autocorrelated data can produce misleading results.


17.5 Autocorrelation

Autocorrelation can sometimes be detected by plotting residuals versus time.

Don’t rely on plots to detect autocorrelation. Rather, use the Durbin-Watson statistic.


17.5 Autocorrelation

$$D = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

The value of D will always be between 0 and 4, inclusive.

D = 0: perfect positive autocorrelation ($e_t = e_{t-1}$ for all points)

D = 2: no autocorrelation

D = 4: perfect negative autocorrelation ($e_t = -e_{t-1}$ for all points)

Durbin-Watson Statistic – estimates the autocorrelation by summing squares of consecutive differences and comparing the sum with its expected value under the null hypothesis of no autocorrelation.
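The statistic is short enough to compute by hand. A minimal sketch, assuming the residuals are supplied in time order (the function name and the two toy residual series are illustrative):

```python
def durbin_watson(residuals):
    """Sum of squared consecutive differences over the sum of squared residuals."""
    e = list(residuals)
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(value ** 2 for value in e)
    return numerator / denominator


# Toy residual series, invented for illustration only.
constant = [1.0, 1.0, 1.0, 1.0]        # each residual equals the previous one
alternating = [1.0, -1.0, 1.0, -1.0]   # each residual flips sign

print(durbin_watson(constant))     # 0.0
print(durbin_watson(alternating))  # 3.0
```

With only four alternating residuals the numerator has three terms and the denominator four, so D is 3 rather than 4; the ratio moves toward 4 as the series gets longer.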


17.5 Autocorrelation

Whether the calculated Durbin-Watson statistic D indicates significant autocorrelation depends on the sample size, n, and the number of predictors in the regression model, k.

Table W of Appendix C provides critical values for the Durbin-Watson statistic (dL and dU) based on n and k.


17.5 Autocorrelation

Testing for positive first-order autocorrelation:

If D < dL, then there is evidence of positive autocorrelation

If dL < D < dU, then test is inconclusive

If D > dU, then there is no evidence of positive autocorrelation

Testing for negative first-order autocorrelation:

If D > 4 – dL, then there is evidence of negative autocorrelation

If 4 – dU < D < 4 – dL, then test is inconclusive

If D < 4 – dU, then there is no evidence of negative autocorrelation


17.5 Autocorrelation

Dealing with autocorrelation:

Time series methods (Chapter 20) attempt to deal with the problem by modeling the errors.

Or, look for a predictor variable (Chapter 19) that removes the dependence in the residuals.

A simple solution: sample from the time series so that the values are farther apart in time, which will likely reduce first-order autocorrelation (sampling may do nothing to reduce higher-order autocorrelation, though).



17.5 Autocorrelation

Example: Monthly Orders

A company fits a regression to predict monthly Orders over a period of 48 months. The Durbin-Watson statistic of the residuals is 0.875.

At α = 0.01, what are the values of dL and dU? Using n = 50 from the table (the tabled sample size used for the 48 months), dL = 1.32 and dU = 1.40.

Is there evidence of positive autocorrelation? Yes. D = 0.875 < dL = 1.32.

Is there evidence of negative autocorrelation? No. D = 0.875 < 4 – dU = 4 – 1.40 = 2.60.
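The decision rules can be written as a small function and applied to the Monthly Orders values. This is a sketch: the function name and the wording of the returned labels are assumptions, and values falling exactly on a boundary are treated as inconclusive.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify D for positive and negative first-order autocorrelation."""
    # Positive autocorrelation: compare D with dL and dU.
    if d < d_lower:
        positive = "evidence"
    elif d <= d_upper:
        positive = "inconclusive"
    else:
        positive = "no evidence"

    # Negative autocorrelation: compare D with 4 - dL and 4 - dU.
    if d > 4 - d_lower:
        negative = "evidence"
    elif d >= 4 - d_upper:
        negative = "inconclusive"
    else:
        negative = "no evidence"

    return positive, negative


# Values from the Monthly Orders example: D = 0.875, dL = 1.32, dU = 1.40.
positive, negative = durbin_watson_decision(0.875, 1.32, 1.40)
print(f"positive autocorrelation: {positive}")
print(f"negative autocorrelation: {negative}")
```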


17.6 Transforming (Re-expressing) Data

Linearity

Some data show departures from linearity.

Example: Auto Weight vs. Fuel Efficiency

Linearity condition is not satisfied.

17.6 Transforming (Re-expressing) Data

Linearity

In cases involving upward bends of negatively-correlated data, try analyzing –1/y (negative reciprocal of y) vs. x instead.

Linearity condition now appears satisfied.


17.6 Transforming (Re-expressing) Data

The auto weight vs. fuel economy example illustrates the principle of transforming data.

There is nothing sacred about the way x-values or y-values are measured. From the standpoint of measurement, all of the following may be equally reasonable:

x vs. y

x vs. –1/y

x² vs. y

x vs. log(y)

One or more of these transformations may be useful for making data more linear, more normal, etc.
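One rough way to compare candidate re-expressions is to check how straight each one makes the relationship. The sketch below assumes invented weight and fuel-efficiency values generated from a reciprocal curve, so –1/y is exactly linear by construction; real data would also need a look at the residual plots.

```python
import math


def r_squared(xs, ys):
    """Squared correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: weight in thousands of pounds, efficiency in mpg.
weight = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
mpg = [1 / (0.01 + 0.012 * w) for w in weight]  # invented curved relationship

candidates = {
    "y": mpg,
    "log(y)": [math.log10(y) for y in mpg],
    "-1/y": [-1 / y for y in mpg],
}

r2 = {name: r_squared(weight, values) for name, values in candidates.items()}
for name, value in r2.items():
    print(f"x vs. {name}: R^2 = {value:.4f}")
```

The negative sign in –1/y keeps the direction of the relationship the same as in the original data, which makes the re-expressed plot easier to read.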


17.6 Transforming (Re-expressing) Data

Goals of Re-expression

Goal 1 Make the distribution of a variable more symmetric.

Panels: y vs. x and log y vs. x


17.6 Transforming (Re-expressing) Data

Goals of Re-expression

Goal 2 Make the spread of several groups more alike.

Panels: y vs. x and log y vs. x

We’ll see methods later in the book that can be applied only to groups with a common standard deviation.


17.6 Transforming (Re-expressing) Data

Goals of Re-expression

Goal 3 Make the form of a scatterplot more nearly linear.

Panels: y vs. x and log y vs. log x

17.6 Transforming (Re-expressing) Data

Goals of Re-expression

Goal 4 Make the scatter in a scatterplot or residual plot spread out evenly rather than following a fan shape.

Panels: y vs. x and log y vs. log x


17.7 The Ladder of Powers

Ladder of Powers – a collection of frequently useful re-expressions.



17.7 The Ladder of Powers

Example: Foreign Prices

You want to model the relationship between prices for various items in Paris and Hong Kong. The scatterplot of Hong Kong prices vs. Paris prices shows a generally straight pattern with a small amount of scatter. What re-expression (if any) of the Hong Kong prices might you start with?

No re-expression is needed to strengthen the linearity assumption. More information is needed to decide whether re-expression might strengthen the normality assumption or the equal-variance assumption.


17.7 The Ladder of Powers

Example: Population Growth

You want to model the population growth of the United States over the past 200 years with a percentage growth that’s nearly constant. The scatterplot shows a strongly upward-curving pattern.

What re-expression (if any) of the population values might you start with?

Try a “Power 0” (logarithmic) re-expression of the population values. This should strengthen the linearity assumption.
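Why the logarithm works here: constant percentage growth means P = P0(1 + r)^t, so log P = log P0 + t log(1 + r), which is a straight line in t. The sketch below assumes an invented starting population and growth rate, and the ladder rungs it lists (2, 1, 1/2, 0, –1/2, –1) are the commonly used ones, stated here as an assumption.

```python
import math


def reexpress(value, power):
    """Re-express a positive value at one rung of the ladder of powers."""
    if power == 0:
        return math.log10(value)      # "Power 0" stands for the logarithm
    if power < 0:
        return -(value ** power)      # negate to preserve direction
    return value ** power


def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


LADDER = [2, 1, 0.5, 0, -0.5, -1]  # assumed rungs, from top to bottom

# Hypothetical population (millions) growing 2% per year for 200 years.
years = list(range(0, 201, 10))
population = [5.0 * 1.02 ** t for t in years]

log_population = [reexpress(p, 0) for p in population]
intercept, slope = fit_line(years, log_population)
growth_rate = 10 ** slope - 1  # back-transform the slope to a yearly rate

print(f"fitted: log10(P) = {intercept:.4f} + {slope:.5f} * year")
print(f"implied growth rate: {growth_rate:.2%} per year")
```

Back-transforming the slope recovers the yearly percentage growth, which is often the quantity of interest.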


Make sure the relationship is straight enough to fit a regression model.

Be on guard for data that is a composite of values from different groups. If you find data subsets that behave differently, consider fitting a different model to each group.

Beware of extrapolating.

Beware of extrapolating far into the future.


Look for unusual points.

Beware of high-leverage points, especially those that are influential.

Consider setting aside outliers and re-running the regression.

Treat unusual points honestly. You must not eliminate points simply to “get a good fit”.


Be alert for autocorrelation. A Durbin-Watson test can be useful for revealing first-order autocorrelation.

Watch out when dealing with data that are summaries. These tend to inflate the impression of the strength of the correlation.

Re-express your data when necessary.


What Have We Learned?

Be skeptical of regression models. Always plot and examine the residuals for unexpected behavior. Be alert to a variety of possible violations of the standard regression assumptions and know what to do when you find them.

Be alert for subgroups in the data.
• Often these will turn out to be separate groups that should not be analyzed together in a single analysis.
• Often identifying subgroups can help us understand what is going on in the data.


What Have We Learned?

Be especially cautious about extrapolating beyond the data.

Look out for unusual and extraordinary observations.
• Cases that are extreme in x have high leverage and can affect a regression model strongly.
• Cases that are extreme in y have large residuals.
• Cases that have both high leverage and large residuals are influential. Setting them aside will change the regression model, so you should consider whether their influence on the model is appropriate or desirable.


What Have We Learned?

Notice when you are working with summary values.
• Summaries vary less than the data they summarize, so they may give the impression of greater certainty than your model deserves.

Diagnose and treat nonlinearity.
• If a scatterplot of y vs. x isn’t straight, a linear regression model isn’t appropriate.
• Re-expressing one or both variables can often improve the straightness of the relationship.
• The powers, roots, and the logarithm provide an ordered collection of re-expressions so you can search up and down the “ladder of powers” to find an appropriate one.