Chapter 3
TOPIC
Linear Regression Defined
Regression Equation
The Slope or b
The Y-Intercept or a
What Value of the Y-Variable Should be Predicted When r = 0?
The Regression Line
The Point of Averages
Residuals
Extrapolation, Restricted Range, and Lurking Variables
Tutorials
• Obtaining a linear regression analysis in Excel 2007
CORRELATION AND REGRESSION
➊ The stronger the correlation, the more accurately
one variable can be predicted from another
variable
➋ By using the linear regression equation, we can
predict scores for one variable (the Y-variable)
from scores on a second variable (the X-variable)
The linear regression equation assumes that the relationship between the two variables follows a straight line, known as the regression line
Chapter 3
LINEAR REGRESSION
➊ The regression equation consists of four parts:
• The predicted value for the Y-variable or y’
• The slope of the regression line or b
• The known value of the X-variable or x
• The value for the y-intercept or a
$y' = bx + a$
➊ The slope of the regression line, b:
• Has the same sign (+ or -) as the correlation coefficient r
• Depends on the strength of the correlation and on the ratio of the standard deviations of the Y- and X-variables
$y' = bx + a$
$b = r \, \frac{SD_y}{SD_x}$
➊ The y-intercept, a:
• Is the point where the regression line crosses the y-axis
• Is the predicted value of y when the x-variable equals zero
• May sometimes be a strange value (for example, when x = 0 lies far outside the observed data), but remember that it is only a predicted value
$y' = bx + a$
➊ The y-intercept equals the overall mean of the y-variable ($\bar{Y}$) minus the slope of the regression equation (b) times the overall mean of the x-variable ($\bar{X}$)
$a = \bar{Y} - b\bar{X}$
➊ If the correlation is zero, the slope is also zero (because $b = r \, SD_y / SD_x$) and the regression line is flat (horizontal)
➋ If b = 0, the y-intercept formula simplifies to:
$a = \bar{Y}$
which means the regression equation simplifies to:
$y' = \bar{Y}$
Why?
➊ If there is no correlation between two variables, the best prediction for either variable is its mean
➋ The mean is the single value with the smallest total squared distance to all the values in a distribution
• In other words, if the mean is used to predict every score in a data set, the average squared error in prediction will be smaller than if any other single value were used
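A small Python illustration, with hypothetical scores, of why the mean is the best single prediction when errors are squared: it compares the average squared error of always guessing the mean with always guessing some other value.

```python
import statistics

# Hypothetical scores (illustrative only)
scores = [2, 4, 4, 5, 10]

def mean_squared_error(guess, data):
    """Average squared difference between one guessed value and every score."""
    return sum((v - guess) ** 2 for v in data) / len(data)

mean_score = statistics.mean(scores)  # 5 for these scores
for guess in [mean_score, 4, 6]:
    print(guess, mean_squared_error(guess, scores))
```

Guessing the mean (5) gives an average squared error of 7.2, versus 8.2 for guessing 4 or 6. If error were measured as absolute distance instead, the median would be the best single guess; the mean's advantage is specific to squared error, which is the criterion least-squares regression uses.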
➊ What values make the regression line?
• The values predicted by the regression
equation create the regression line
$y' = bx + a$
These predicted points all fall on the regression line
➊ The regression line represents the center of the points in a scatterplot
• The points in a scatterplot can be thought of as regressing toward this center
➋ It is the best-fitting line and is also known as the least-squares line
• Imagine all the different angles at which you could draw a straight line through a scatterplot
• The line with the smallest sum of squared vertical distances (residuals) to all the points is the regression line
Figure: Regression Equation
The blue line is the regression line. The points that make up this line are the predicted values from the regression equation.
➊ Every linear regression line passes through the point of averages
• The point of averages is located at the intersection of the overall mean of the x-variable and the overall mean of the y-variable
➋ Predictions made for x-values close to the point of averages are, on average, more accurate than predictions made further away from this point
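A short derivation shows why every regression line passes through the point of averages: substitute $x = \bar{X}$ into $y' = bx + a$ and use the intercept formula $a = \bar{Y} - b\bar{X}$.

```latex
y'\big|_{x=\bar{X}} = b\bar{X} + a = b\bar{X} + \left(\bar{Y} - b\bar{X}\right) = \bar{Y}
```

The $b\bar{X}$ terms cancel for any slope b, so the predicted value at the mean of X is always the mean of Y.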
Figure: Regression Equation
The black dot represents the point of averages, where the overall means of the x-variable (father's height, 69 inches) and the y-variable (son's height, 71.5 inches) intersect. This point always lies on a linear regression line.
➊ Excel can plot the regression line, but you can also plot it by hand using two points:
• The point of averages and
• The y-intercept
➋ You can also plot the regression line by plugging values of the x-variable into the regression equation and solving for the predicted value of the y-variable
Remember: the regression line is made up of all the predicted values of the y-variable, or y'
➊ The term residual refers to the amount of error in a prediction
• The regression equation produces a predicted value for the y-variable
• The difference between the actual value of Y and the predicted value of Y is known as the error, or residual
• Excel can calculate the residual for each predicted score; to obtain the residuals by hand, use the following formula:
• Formula for residuals: $y - y'$
Figure: Regression Equation (with residuals labeled)
The vertical distance between each actual point and the regression line is a residual, or error in prediction. The sum of the residuals is always equal to zero.
➊ Residuals can help identify outliers
• A very large residual may indicate an outlier
• Outliers can increase or decrease the slope of the regression line
• This means that outliers can also increase or decrease the correlation between two variables
• Depending on the size of the outlier, a researcher may want to run the regression analysis with and without the outlier to see how much that score affects the results
➊ The regression equation attempts to predict the mean of
the y-variable at each value of the x-variable – WHY?
• Suppose you have three fathers who are each 74
inches tall (or 6’2”)
• Each of these fathers has a son who is a different
height
• The value of the x-variable entered into the regression
equation will be the same for each of these three
fathers
• What value for sons’ heights should the equation try to
predict?
Figure: Regression Equation
The regression equation will try to predict the average height of the sons (y-variable) at each height of the fathers (x-variable).
What height should be predicted for the three sons who each have a father that is 74” tall?
➊ What is meant by extrapolation?
• Predicting values beyond the range of the data used to develop the regression equation
➋ What is meant by restricted (limited) range?
• When the regression equation is based on a much narrower range of data than the true range of the data in the population
➌ What is meant by lurking variables?
• Other variables that can account for the correlation between two variables
➊ The correlation coefficient can be obtained by hand using
the following formula:
$r = b \, \frac{SD_x}{SD_y}$
End of Chapter 3 – Part 2