Regression Wisdom
Getting to Know Your Scatterplot and Residuals
Important Terms
- Extrapolation (203)
- Outlier (205)
- Leverage (206)
- Influential Point (206)
- Lurking Variable (208)
Residuals
- Recall: a residual is the difference between a data value and the corresponding value predicted by the regression model.
- Residual = Observed value - Predicted value, or $e = y - \hat{y}$ (page 172)
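To make the formula concrete, here is a minimal Python sketch that fits a least-squares line and computes each residual $e = y - \hat{y}$. The data values and the helper names `fit_line` and `residuals` are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return the least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys):
    """Residual e = observed y minus predicted y-hat, for each data point."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

e = residuals(xs, ys)
print([round(r, 2) for r in e])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

For a least-squares line the residuals always sum to zero (up to floating-point rounding), so the positive and negative misses balance out.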
When Residuals Aren’t Random
- We want our plot of residuals to be boring.
- It should have no structure, direction, or shape.
- When it does, something else is going on in the data, beyond the straight-line relationship, that helps explain the variation in the two variables.
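A quick way to see residuals that are not boring is to fit a straight line to data that actually curve. This Python sketch uses hypothetical data following $y = x^2$ and a hypothetical helper `fit_line` for the least-squares line. The line misses low at both ends and high in the middle, so the residuals trace a U shape instead of random scatter.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical curved data: y = x^2
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
e = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(e)  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]

# Crude text "residual plot": one row per x, marker shifted right by the residual
for x, r in zip(xs, e):
    print(f"{x:>3} | " + " " * int(r + 4) + "*")
```

The markers bend into a curve rather than scattering randomly, which is exactly the kind of structure the slide warns about.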
Sifting Residuals for Groups
- We can split the data into subsets (groups within the same population) to try to achieve a better analysis.
- Often the easiest way to spot such groups is to examine a plot or histogram of the residuals.
Sifting and Subsets
- You can perform a regression analysis on each subset of the larger population, noting the correlation and all appropriate summary statistics for each subset.
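Here is a minimal Python sketch of the subset approach: it runs the same simple regression separately on each group and reports the slope, intercept, and correlation for each. The group labels, the data, and the helper `summarize` are hypothetical; with real data you would form the groups from whatever the residual plot suggests.

```python
from math import sqrt


def summarize(xs, ys):
    """Return n, intercept b0, slope b1, and correlation r for one subset."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r = sxy / sqrt(sxx * syy)
    return n, b0, b1, r


# Hypothetical subsets of one larger data set
groups = {
    "A": ([1, 2, 3, 4], [2.0, 4.1, 5.9, 8.0]),
    "B": ([1, 2, 3, 4], [10.0, 9.0, 8.1, 6.9]),
}

results = {}
for name, (xs, ys) in groups.items():
    n, b0, b1, r = summarize(xs, ys)
    results[name] = (n, b0, b1, r)
    print(f"{name}: n = {n}, slope = {b1:.2f}, intercept = {b0:.2f}, r = {r:.3f}")
```

In this made-up example the two subsets have slopes of opposite sign, a difference that a single regression on the pooled data would blur into one compromise slope.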
Extrapolation
- Our linear model: $\hat{y} = b_0 + b_1 x$
- Plug in a new x, and the model gives you a predicted $\hat{y}$.
- But the farther the new x-value is from $\bar{x}$, the less trust we can place in the predicted y-value.
- Once we venture into new x territory (beyond the range of x-values in our data), such a prediction is called an extrapolation.
- Extrapolations require the very questionable assumption that nothing about the relationship between x and y changes, even at extreme values of x and beyond.
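One practical habit is to flag every prediction whose x lies outside the observed range of x-values. The Python sketch below does that and also reports how far the new x is from $\bar{x}$. The function name `predict_with_flag`, the coefficients, and the data range are hypothetical. Being inside the range does not guarantee a good prediction; it only means the prediction is not an extrapolation.

```python
def predict_with_flag(b0, b1, x_new, xs):
    """Predict y-hat = b0 + b1*x_new and flag it if x_new is outside the data's x-range."""
    x_bar = sum(xs) / len(xs)
    y_hat = b0 + b1 * x_new
    is_extrapolation = not (min(xs) <= x_new <= max(xs))
    distance = abs(x_new - x_bar)  # farther from x-bar means less trust
    return y_hat, is_extrapolation, distance


# Hypothetical fitted model y-hat = 2.2 + 0.6x, built from x-values 1 through 5
xs = [1, 2, 3, 4, 5]
for x_new in (3.5, 20):
    y_hat, extrap, dist = predict_with_flag(2.2, 0.6, x_new, xs)
    label = "EXTRAPOLATION" if extrap else "within data range"
    print(f"x = {x_new}: y-hat = {y_hat:.1f}, {dist:.1f} from x-bar, {label}")
```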
Extrapolation
- If your x-variable is time, extrapolation becomes a prediction about the future!
- Example: in the mid-1970s, oil cost $17 a barrel (in 2005 dollars).
- This is what it had cost for about 20 years!
- But suddenly, within a few years, the price skyrocketed to over $40 a barrel.
- If you fit your model to the years when prices were rising, you might predict today's oil prices at hundreds of dollars per barrel; if you had done your analysis before the spike, you might still be predicting around $17 a barrel.
Outliers, Leverage, Influence
- Outliers can have big impacts on your fitted regression line.
- Points with large residuals always deserve special attention.
- A data point whose x-value is unusually far from the mean of x is said to have high leverage.
- High leverage doesn't necessarily mean the point changes the overall picture.
- If the point lines up with the pattern of the other points, including it doesn't change our estimate of the line.
- But by sitting so far from $\bar{x}$, it may strengthen the relationship, inflating the correlation and $R^2$.
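The slides describe leverage informally. A standard way to quantify it in simple regression (not given in the slides, so treat it as a supplement) is the leverage, or hat value, $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$. The Python sketch below computes it for hypothetical x-values with a hypothetical helper `leverages`. Only the x-values enter the formula, which is why leverage alone cannot tell you whether a point changes the line.

```python
def leverages(xs):
    """Hat values h_i = 1/n + (x_i - x_bar)^2 / Sxx for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Hypothetical x-values: five ordinary points and one far from the mean
xs = [1, 2, 3, 4, 5, 20]
h = leverages(xs)
for x, hi in zip(xs, h):
    print(f"x = {x:>2}: leverage = {hi:.3f}")
```

The point at x = 20 has a leverage close to 1, far above the others. In simple regression the leverages always add up to 2 (the number of fitted coefficients), so one value near 1 stands out sharply.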
Outliers, Leverage, Influence
- A point is influential if omitting it from the analysis gives a very different model.
- Influence depends on both leverage and residual.
- A case with high leverage whose y-value sits right on the line is not influential.
- Removing such a point may not change the slope, but it may change $R^2$.
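To check this numerically, the Python sketch below (hypothetical data and helper name `fit`) fits a line, adds a far-out point placed exactly on that line, and refits. The slope stays the same but $R^2$ jumps, so removing that point would leave the slope alone while lowering $R^2$.

```python
def fit(xs, ys):
    """Return intercept b0, slope b1, and R^2 of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return b0, b1, r_squared


# Hypothetical data without the unusual point
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, r2_without = fit(xs, ys)

# Add a high-leverage point that sits exactly on the fitted line
x_far = 20
y_far = b0 + b1 * x_far
_, b1_with, r2_with = fit(xs + [x_far], ys + [y_far])

print(f"slope without: {b1:.3f}, with: {b1_with:.3f}")
print(f"R^2 without: {r2_without:.3f}, with: {r2_with:.3f}")
```

The new point adds nothing to the sum of squared residuals but a lot to the total variation in y, which is why $R^2$ rises while the line stays put.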
- A case with modest leverage but a very large residual can be influential.
- With enough leverage, a point can pull the regression line right to it. Then it is highly influential but will have a small residual.
- The only way to judge a point's influence is to do your analysis twice, once with the point and once omitting it, and then compare the two models.
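The "analysis twice" advice translates directly into code. This Python sketch (hypothetical data and helper name `fit_line`) fits the line with and without one high-leverage point whose y-value is far from the pattern of the others. The slope changes sign, so the point is influential; yet in the fit that includes it, the point's own residual is small, because it has pulled the line toward itself.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: five ordinary points plus one unusual point at x = 20
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 4, 5, 4, 5, 0]

b0_with, b1_with = fit_line(xs, ys)          # analysis 1: with the point
b0_wo, b1_wo = fit_line(xs[:-1], ys[:-1])    # analysis 2: omitting the point

# Residual of the unusual point under each fitted line
res_with = ys[-1] - (b0_with + b1_with * xs[-1])
res_wo = ys[-1] - (b0_wo + b1_wo * xs[-1])

print(f"slope with point: {b1_with:.3f}, without: {b1_wo:.3f}")
print(f"residual of unusual point: {res_with:.2f} (line fit with it), "
      f"{res_wo:.2f} (line fit without it)")
```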
Does the unusual point have high leverage, a large residual, and is it influential?
[Figure: three scatterplots, each with one unusual point]
- Plot 1: not high leverage, large residual, not influential
- Plot 2: high leverage, small residual, not influential
- Plot 3: high leverage, residual not large, influential
Lurking Variables, Causation
- No matter how strong the association, no matter how large the $R^2$ value, no matter how straight the line, there is NO way to conclude from regression alone that one variable causes the other.
- There may always be a lurking variable that causes the apparent association.
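A small simulation shows how a lurking variable can manufacture a strong association. In this Python sketch a hypothetical lurking variable `z` drives both `x` and `y`, while `y` is never computed from `x`. The correlation between `x` and `y` still comes out very strong. The helper `correlation`, the coefficients, the noise levels, and the seed are arbitrary choices for illustration.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(1)  # fixed seed so the run is reproducible
z = [random.uniform(0, 10) for _ in range(200)]  # the lurking variable
x = [2 * zi + random.gauss(0, 1) for zi in z]    # x depends only on z
y = [3 * zi + random.gauss(0, 1) for zi in z]    # y depends only on z, not on x

r = correlation(x, y)
print(f"r between x and y: {r:.3f}")
```

In this setup, changing `x` directly would do nothing to `y`, even though a regression of `y` on `x` would fit very well.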
Lurking Variable Example
- The scatterplot shows the life expectancy of men and women in 41 different countries.
- These values are plotted against the square root of doctors per person in each country.
Lurking Variable Example
- There is a strong positive correlation.
- This seems to confirm our expectation that more doctors per person improves health care, leading to longer lifetimes.
Lurking Variable Example
- Can we conclude, though, that doctors cause greater life expectancy? Perhaps, but increasing numbers of doctors and greater life expectancy may both be results of some larger change, a lurking variable.
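Lurking Variable Example
- Here is a similar-looking scatterplot, now comparing life expectancy to the square root of TVs per person.
- This is an even stronger association!
- Yet no one would claim that TVs make people live longer; the strength of an association says nothing about causation.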
A Final Note
- Beware of scatterplots of summary statistics (data that have already been summarized, such as averages).
- For example,
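Summary statistics such as group averages vary much less than individual measurements, so a scatterplot of averages can show a far stronger association than the individuals do. This Python sketch simulates hypothetical individuals in 10 groups and compares the correlation for individuals with the correlation for group means; the helper `correlation`, all numbers, and the seed are made up for illustration.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(2)  # fixed seed so the run is reproducible
xs, ys = [], []
for g in range(10):              # 10 hypothetical groups
    for _ in range(100):         # 100 individuals per group
        xs.append(g)
        ys.append(g + random.gauss(0, 3))  # individual y = group value + noise

# Summarize: one (mean x, mean y) pair per group
mean_x = [sum(xs[i:i + 100]) / 100 for i in range(0, 1000, 100)]
mean_y = [sum(ys[i:i + 100]) / 100 for i in range(0, 1000, 100)]

r_individuals = correlation(xs, ys)
r_means = correlation(mean_x, mean_y)
print(f"r for individuals: {r_individuals:.2f}")
print(f"r for group means: {r_means:.2f}")
```

Averaging 100 values shrinks the noise by a factor of 10, so the group means line up almost perfectly even though individuals scatter widely.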
Homework
Pg 214, #1, 3, 4, 8, 10