AP STATISTICS
Section 3.2 Least Squares Regression
Objective: To be able to derive the least squares regression line.
A regression line is a straight line that describes how a response variable changes as the explanatory variable changes.
The regression line as the model:
The idealized model: $y = a + bx$
• $y$ is the response variable
• $x$ is the explanatory variable
• $a$ is the y-intercept: “a is the predicted value of y when x = 0.”
• $b$ is the slope: “For every unit change in x there is a predicted change of b in y.”
Extrapolation: Using your model to make predictions outside the range of the x-values. RISKY!!
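A minimal sketch of how a prediction from the line y = a + bx can be flagged as an extrapolation. The coefficients and the range of x-values here are hypothetical, chosen only for illustration; `predict` is not a standard function.

```python
def predict(a, b, x, x_min, x_max):
    """Return (prediction, is_extrapolation) for the line y = a + b*x.

    x_min and x_max are the smallest and largest x-values in the data
    that were used to build the model.
    """
    y_hat = a + b * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation


# Hypothetical model: a = 1.0, b = 0.5, built from x-values between 2 and 10.
inside = predict(1.0, 0.5, 6, x_min=2, x_max=10)
outside = predict(1.0, 0.5, 20, x_min=2, x_max=10)

print(inside)   # (4.0, False): x = 6 is inside the observed range
print(outside)  # (11.0, True): x = 20 is outside the range, so treat with caution
```

The arithmetic works the same way in both cases; the flag is only a reminder that the linear pattern may not continue outside the observed x-values.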
The Least Squares Regression Line (LSRL): the line that minimizes the sum of the squares of the vertical distances between the points and the model.
Model: $\hat{y} = a + bx$
• $\hat{y}$ is the predicted response.
• $x$ is the explanatory variable.
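The least squares idea can be sketched in a few lines. This example uses the standard formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄, then checks that nudging the line makes the sum of squared vertical distances larger. The data are made up for illustration, and the function names `lsrl` and `sse` are assumptions, not part of the notes.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sse(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = lsrl(xs, ys)
print(f"y-hat = {a:.1f} + {b:.1f}x")

# Any other line gives a larger sum of squares than the LSRL.
print(sse(xs, ys, a, b) < sse(xs, ys, a + 0.1, b))
print(sse(xs, ys, a, b) < sse(xs, ys, a, b + 0.1))
```

A calculator's linear regression command should report the same a and b for the same data.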
Ex. Derive the LSRL for the number of TVs versus the number of rooms in someone’s house.
1. Define all variables in the model.
2. Interpret the slope and y-intercept in the context of the problem.
3. Does the y-intercept have meaning in the context of this problem? Why?
4. What is the predicted number of TVs when there are 20 rooms in a house? How do you feel about this prediction?
Residual: (error) the difference between the observed y-value and the predicted y-value.
residual = $y - \hat{y}$ = observed – predicted
• A residual value is positive if the point lies above the line.
• A residual value is negative if the point lies below the line.
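As a small illustration of residual = observed – predicted, the sketch below computes each residual and reports whether the point lies above or below the line. The data and the line's coefficients are made up; the line is simply assumed to have been fitted already.

```python
# Made-up data and an assumed, already-fitted line y-hat = a + b*x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

for x, y, res in zip(xs, ys, residuals):
    position = "above" if res > 0 else "below" if res < 0 else "on"
    print(f"x={x}, observed={y}, residual={res:+.1f}, point is {position} the line")
```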
A residual plot is a scatterplot of the residuals versus x.
• Used to determine how well the line fits the data.
• Magnifies the errors.
• Graphed around the line y = 0.
• Sum of the residuals is 0.
• Some residual plots plot residuals versus the predictions ($\hat{y}$). The pattern will be the same as residuals vs. x (mirrored left to right when the slope is negative).
• If most of the residuals are positive, then the model is underestimating: its predictions tend to be too low.
• If most of the residuals are negative, then the model is overestimating: its predictions tend to be too high.
• INTERPRETATION: If the residual plot is scattered, then the model is a good fit for the data.
• We do NOT want to see patterns in our residual plots.
BEWARE:
1. A curved pattern indicates that the data is nonlinear.
2. Watch for funnel/megaphone patterns.
3. Don’t look too hard for a pattern.
4. Be careful interpreting data sets where n is small.
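To see what a curved pattern looks like numerically, the sketch below fits a least squares line to data that are deliberately nonlinear (y = x², chosen only for illustration) and lists the signs of the residuals in order of x. The helper `lsrl` is a hypothetical name.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


# Deliberately curved data: y = x^2.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

a, b = lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = ["+" if r > 0 else "-" for r in residuals]

print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
print(signs)      # ['+', '-', '-', '-', '+']
```

The residuals still sum to 0, but they run positive, negative, positive across x instead of scattering randomly. That U-shape is the kind of pattern that signals a line is the wrong model.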
Ex. Use LSRL techniques to develop a model for shoe size versus height. Then calculate the residuals and create a residual plot.
Data:
1. Create a scatterplot of the data.
2. Describe the scatterplot.
3. Using your calculator, derive the LSRL and find correlation.
4. How does the value for correlation support your description in #2?
5. Define all variables in the model.
6. Interpret the y-intercept in the context of the problem. Does it have
meaning in this setting?
7. Interpret the slope in the context of the problem.
8. Add the LSRL to the scatterplot on your calculator.
9. What is the predicted shoe size when the height is 80 inches?
How do you feel about this prediction?
10. Calculate the residuals for all observations. Show how the first
residual was calculated.
11. Create a residual plot.
12. Interpret the residual plot.
Standard deviation of the residuals: $s = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
The larger this quantity is, the more variability there is in the points around the line.
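A short sketch of the calculation, assuming the usual AP Statistics definition s = √( Σ(residual²) / (n − 2) ). The data and fitted coefficients are made up for illustration, and `residual_std_dev` is a hypothetical helper name.

```python
import math


def residual_std_dev(xs, ys, a, b):
    """Standard deviation of the residuals about the line y-hat = a + b*x."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sum_sq = sum(r ** 2 for r in residuals)
    return math.sqrt(sum_sq / (len(xs) - 2))  # n - 2 in the denominator


# Made-up data with an assumed fitted line y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

s = residual_std_dev(xs, ys, a=2.2, b=0.6)
print(round(s, 3))  # 0.894
```

Here s is in the same units as y, so it can be read as the typical size of a prediction error.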
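Coefficient of Determination: ($r^2$) “The percentage of variation in y that is explained by the linear relationship of y on x.”
$r^2 = \dfrac{SST - SSE}{SST}$ where
Total Sum of Squares = $SST = \sum (y_i - \bar{y})^2$
Sum of the Squared Errors = $SSE = \sum (y_i - \hat{y}_i)^2$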
Ex. Find $r^2$ by hand for (0,0), (5,9) & (10,6).
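One way to check hand work on the three points above is the sketch below. It assumes the formula r² = (SST − SSE) / SST, with SST measured about the mean of y and SSE measured about the least squares line. Exact fractions are used so the output matches hand arithmetic.

```python
from fractions import Fraction

xs = [Fraction(0), Fraction(5), Fraction(10)]
ys = [Fraction(0), Fraction(9), Fraction(6)]
n = len(xs)

x_bar = sum(xs) / n  # 5
y_bar = sum(ys) / n  # 5

# Least squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))  # 30/50 = 3/5
a = y_bar - b * x_bar                      # 5 - 3 = 2

sst = sum((y - y_bar) ** 2 for y in ys)                      # 25 + 16 + 1 = 42
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))    # 4 + 16 + 4 = 24
r_squared = (sst - sse) / sst                                # 18/42 = 3/7

print(a, b, sst, sse, r_squared)  # 2 3/5 42 24 3/7
```

So about 43% of the variation in y is accounted for by the line ŷ = 2 + 0.6x for these three points.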
Q: If we only know r, can we be 100% certain what the value of $r^2$ is?
Q: If we only know $r^2$, can we be 100% certain what the value of r is?
Q: If we only know $r^2$ and the y-intercept, can we be 100% certain what r is?
Q: If we only know $r^2$ and the slope, can we be 100% certain what r is?
Q: Can $r^2$ be greater than r?
Q: 0 < $r^2$ < 1? True or False
Q: r < $r^2$ < 1? True or False
Q: 0 < $r^2$ < r? True or False
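For experimenting with the questions above, here is a small sketch built on two standard facts: r always has the same sign as the slope of the LSRL, and its size is √(r²). The function name `r_from_r2` is hypothetical and the numbers are made up.

```python
import math


def r_from_r2(r2, slope):
    """Recover r from r^2 using the sign of the LSRL slope."""
    if not 0 <= r2 <= 1:
        raise ValueError("r^2 must be between 0 and 1")
    magnitude = math.sqrt(r2)
    if slope > 0:
        return magnitude
    if slope < 0:
        return -magnitude
    return 0.0  # a slope of 0 means r = 0


print(r_from_r2(0.64, slope=3.0))   # positive slope, so r is positive
print(r_from_r2(0.64, slope=-2.0))  # same r^2, negative slope, so r is negative
```

Trying a few values of r² with different slopes and comparing r² with r each time is a quick way to test the True or False statements.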