4.2
Least Squares Regression
Best Fit
If we have two variables X and Y, we often want to model the relation between them as a line, so we draw a line through the scatter diagram.
We want to find the line that “best” describes the linear relationship: the regression line.
Linear Model
● We want to use a linear model.
● Linear models can be written in several different (equivalent) ways:
$y = mx + b$
$y - y_1 = m(x - x_1)$
$y = b_1 x + b_0$
● Because the slope and the intercept are both important to analyze, we will use $y = b_1 x + b_0$.
Residuals
● One difference between mathematics and statistics is that statistics assumes the measurements are not exact: there is an error, or residual.
● The formula for the residual is always
Residual = Observed − Predicted
● This relationship is not just for this chapter; it is the general way of defining error in statistics.
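The definition can be applied to every point for a given line, as in this minimal Python sketch. The function name `residuals`, the line $\hat{y} = 2x + 1$, and the three points are hypothetical and chosen only for illustration.

```python
def residuals(xs, ys, b1, b0):
    """Return Observed - Predicted for each point, using the line y-hat = b1*x + b0."""
    return [y - (b1 * x + b0) for x, y in zip(xs, ys)]

# Hypothetical data and line, for illustration only.
xs = [1, 2, 3]
ys = [4, 4, 7]
print(residuals(xs, ys, b1=2, b0=1))  # [1, -1, 0]
```

A positive residual means the observed point lies above the line. A negative residual means it lies below the line.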
Residual
What the residual is on the scatter diagram:
[Figure: scatter diagram with the model line. At the x value of interest, the residual is the vertical distance between the observed value y and the predicted value ŷ on the line.]
Least-Squares Regression Line
● We want to minimize the residuals, but we need to define what this means.
● We use the method of least squares:
1. Consider a possible linear model.
2. Calculate the residual for each point.
3. Add up the squares of the residuals.
● The line that has the smallest sum of squared residuals, Σ(residual)², is called the least-squares regression line.
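The three steps can be sketched in Python. The helper `sum_squared_residuals` and the toy data are hypothetical. The grid search is only a crude way to watch the minimization happen. It is not how the least-squares line is computed in practice.

```python
def sum_squared_residuals(xs, ys, b1, b0):
    """Sum of (observed - predicted)^2 for the candidate line y-hat = b1*x + b0."""
    return sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, for illustration only.
xs = [1, 2, 3]
ys = [4, 4, 7]

# Steps 1-3 for two candidate lines: residuals, squared, then added up.
print(sum_squared_residuals(xs, ys, 2, 1))    # line y = 2x + 1   -> 2
print(sum_squared_residuals(xs, ys, 1.5, 2))  # line y = 1.5x + 2 -> 1.5

# Brute-force search over a coarse grid of candidate lines (slopes and
# intercepts from -5 to 5 in steps of 0.5) for the smallest sum of squares.
grid = [i * 0.5 for i in range(-10, 11)]
best = min(((b1, b0) for b1 in grid for b0 in grid),
           key=lambda line: sum_squared_residuals(xs, ys, *line))
print(best)  # (1.5, 2.0)
```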
Least-Squares Regression Line
The equation for the least-squares regression line is given by
$\hat{y} = b_1 x + b_0$
$b_1$ is the slope of the least-squares regression line.
$b_0$ is the y-intercept of the least-squares regression line.
Tough Stuff
● Finding the values of $b_1$ and $b_0$ by hand is a very tedious process.
● You should use a calculator for this.
● Finding the coefficients $b_1$ and $b_0$ is only the first step of a regression analysis:
We need to interpret the slope $b_1$.
We need to interpret the y-intercept $b_0$.
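For readers curious about what the calculator computes, here is a sketch based on the standard textbook formulas $b_1 = r \cdot s_y / s_x$ and $b_0 = \bar{y} - b_1 \bar{x}$. In these formulas, r is the linear correlation coefficient and $s_x$, $s_y$ are sample standard deviations. The function name `least_squares_line` and the data are hypothetical.

```python
import math

def least_squares_line(xs, ys):
    """Return (b1, b0) for the least-squares line y-hat = b1*x + b0.

    Uses b1 = r * (s_y / s_x) and b0 = y_bar - b1 * x_bar.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Linear correlation coefficient r.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b1, b0

# Hypothetical data; rounding hides floating-point noise.
print(tuple(round(v, 4) for v in least_squares_line([1, 2, 3], [4, 4, 7])))  # (1.5, 2.0)
```

Python 3.10 and later also provide `statistics.linear_regression`, which returns the slope and intercept directly.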
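Slope
Interpreting the slope $b_1$
The slope is sometimes defined as $\frac{\text{rise}}{\text{run}}$.
The slope is also sometimes defined as $\frac{\text{change in } y}{\text{change in } x}$.
The slope relates changes in y to changes in x: if x increases by 1 unit, y changes by $b_1$ units, on average.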
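Evaluating the line at x and at x + 1 shows why the slope is read as the change in the predicted value for a one-unit increase in x:

```latex
\begin{aligned}
\hat{y}(x+1) - \hat{y}(x) &= \bigl[b_1 (x+1) + b_0\bigr] - \bigl[b_1 x + b_0\bigr] \\
                          &= b_1
\end{aligned}
```

More generally, a change of $\Delta x$ in x changes $\hat{y}$ by $b_1 \Delta x$.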
Y-intercept
Where the graph crosses the y-axis.
Interpreted as the predicted value of y when x = 0.
Think about what it means in the situation for x to be zero.
If the least-squares regression line is used to make predictions based on values of the explanatory variable that are much larger or much smaller than the observed values, we say the researcher is working outside the scope of the model. Never use a least-squares regression line to make predictions outside the scope of the model because we can’t be sure the linear relation continues to exist.
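One way to guard against predictions outside the scope of the model is to check x against the observed explanatory values before predicting. This sketch rests on an assumption: it treats the observed minimum and maximum as the boundary of the scope, which is stricter than "much larger or much smaller". The function `predict_within_scope` and its data are hypothetical.

```python
def predict_within_scope(b1, b0, x, observed_xs):
    """Return y-hat = b1*x + b0, but refuse x outside the observed range.

    Assumption: the scope of the model is taken to be [min, max] of the
    observed explanatory values.
    """
    lo, hi = min(observed_xs), max(observed_xs)
    if not lo <= x <= hi:
        raise ValueError(f"x = {x} is outside the scope of the model ({lo} to {hi})")
    return b1 * x + b0

observed_xs = [1, 2, 3]                               # hypothetical data
print(predict_within_scope(2, 1, 2.5, observed_xs))   # 6.0
try:
    predict_within_scope(2, 1, 10, observed_xs)
except ValueError as err:
    print(err)  # x = 10 is outside the scope of the model (1 to 3)
```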
© 2010 Pearson Prentice Hall. All rights reserved
Example
The club-head speed of a golf club and the distance the ball traveled were measured to look for a linear relationship.
Draw a scatter diagram of the data and the least-squares regression line. Write down the equation.
Data
Club Head Speed (mph)   Distance (yards)
100                     257
102                     264
103                     274
101                     266
105                     277
100                     263
99                      258
105                     275
• Using your equation, estimate how far a golf ball would travel if it was hit at 104 mph.
• Interpret the slope of the regression equation
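Here is a minimal Python sketch of the calculations for this example, using the club-head speed and distance data in the table. It uses $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1 \bar{x}$, which is algebraically equivalent to $b_1 = r \cdot s_y / s_x$. The variable names are illustrative, and the sketch does not draw the scatter diagram. A calculator's linear-regression routine should give the same coefficients up to rounding.

```python
club_speed = [100, 102, 103, 101, 105, 100, 99, 105]   # mph (explanatory, x)
distance = [257, 264, 274, 266, 277, 263, 258, 275]    # yards (response, y)

n = len(club_speed)
x_bar = sum(club_speed) / n
y_bar = sum(distance) / n
# Sums of cross-products and squared deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(club_speed, distance))
s_xx = sum((x - x_bar) ** 2 for x in club_speed)

b1 = s_xy / s_xx           # slope
b0 = y_bar - b1 * x_bar    # y-intercept
print(round(b1, 4), round(b0, 4))   # 3.1661 -55.7966
print(round(b1 * 104 + b0, 1))      # 273.5 (predicted yards at 104 mph)
```

With these numbers, ŷ ≈ 3.1661x − 55.7966. For these data, the slope says that each additional 1 mph of club-head speed is associated with about 3.17 more yards of distance, on average. Because 104 mph lies within the observed speeds (99 to 105 mph), the prediction of about 273.5 yards stays within the scope of the model.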
Sample Problem
The heights and weights of 11 men between the ages of 21 and 26 were measured. The data are presented in the table below.
A. Create a scatter diagram to confirm that an approximately linear relationship exists between x and y.
B. Find the least-squares regression line, treating height, x, as the explanatory variable and weight, y, as the response variable.
C. Interpret the slope.
D. Interpret the y-intercept. Is this interpretation appropriate?
E. Use the regression line to predict the weight of a man who is 73 inches tall.