Chapter 3 CORRELATION AND REGRESSIONnlucas/Stat 145/145 Powerpoint Files...Chapter 3 TOPIC SLIDE Linear Regression Defined 2 Regression Equation 3 The Slope or b 4 The Y-Intercept

Chapter 3

TOPIC

Linear Regression Defined

Regression Equation

The Slope or b

The Y-Intercept or a

What Value of the Y-Variable Should be Predicted When r = 0?

The Regression Line

The Point of Averages

Residuals

Extrapolation, Restricted Range, and Lurking Variables

Tutorials

• Obtaining a linear regression analysis in Excel 2007

CORRELATION AND REGRESSION

➊ The stronger the correlation, the more accurately one variable can be predicted from another variable

➋ By using the linear regression equation, we can predict scores for one variable (the Y-variable) from scores on a second variable (the X-variable)

The linear regression equation assumes the statistical relationship between two variables follows a straight line known as the regression line

Chapter 3

LINEAR REGRESSION

➊ The regression equation consists of four parts:

• The predicted value for the Y-variable or y’

• The slope of the regression line or b

• The known value of the X-variable or x

• The value for the y-intercept or a

$y' = bx + a$

➊ The slope of the regression line or b:

• Has the same sign (+ or -) as the correlation coefficient r

• Is a function of the strength of the correlation and the ratio of standard deviations for X and Y variables

$y' = bx + a$

$b = r \times \frac{SD_y}{SD_x}$
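
As a rough illustration of the slope formula, the sketch below computes r and then b = r × (SDy / SDx) using only Python's standard library. The height data and the helper name `pearson_r` are made up for this example and do not come from the slides.

```python
from statistics import mean, pstdev

# Hypothetical paired data (not from the slides):
# x = father's height, y = son's height, both in inches.
x = [65, 67, 69, 71, 73]
y = [68, 70, 71, 73, 75]

def pearson_r(xs, ys):
    """Correlation coefficient as the mean product of z-scores."""
    mx, my = mean(xs), mean(ys)
    sdx, sdy = pstdev(xs), pstdev(ys)
    return mean(((a - mx) / sdx) * ((b - my) / sdy) for a, b in zip(xs, ys))

r = pearson_r(x, y)
b = r * pstdev(y) / pstdev(x)  # slope = r * (SDy / SDx)

print(f"r = {r:.3f}, b = {b:.3f}")
```

Because both standard deviations are positive, b always takes the sign of r, which matches the first bullet above.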

➊ The value for the y-intercept or a:

• Is the point where the regression line crosses the y-axis

• Is the predicted value of y when the x-variable equals zero

• This value may sometimes look strange, but remember that it is only a predicted value

$y' = bx + a$

➊ The y-intercept equals:

• The slope of the regression equation (b) times the overall mean for the x-variable ($\bar{X}$), subtracted from

• The overall mean for the y-variable ($\bar{Y}$)

$a = \bar{Y} - b\bar{X}$
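
A minimal sketch of the intercept formula follows. The means of 69 and 71.5 inches are the ones quoted in the father/son height example in these slides. The slope of 0.5 and the function names `y_intercept` and `predict` are assumptions made purely for illustration.

```python
def y_intercept(mean_x, mean_y, b):
    """a = (mean of Y) minus b times (mean of X)."""
    return mean_y - b * mean_x

def predict(x, b, a):
    """Regression equation: y' = b*x + a."""
    return b * x + a

# Means from the father/son example; b = 0.5 is a made-up slope.
mean_x, mean_y, b = 69.0, 71.5, 0.5

a = y_intercept(mean_x, mean_y, b)
print(a)                  # 37.0
print(predict(74, b, a))  # 74.0
```

Under these assumed numbers the intercept is 37 inches, the predicted height of a son whose father is 0 inches tall. That is the kind of odd-looking intercept that is still just a predicted value.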

➊ If the correlation is zero, that means the value for the slope is zero and the regression line is flat (i.e., horizontal)

➋ If b = 0, then the y-intercept formula simplifies to:

$a = \bar{Y}$

Which means the regression equation simplifies to:

$y' = \bar{Y}$

Why?

➊ If there is no correlation between two variables, the best prediction for either variable is its mean

➋ On average, the mean is closer to all values in a distribution than any other score (in the least-squares sense)

• In other words, if the mean is used to predict each score in a data set, the average squared error in prediction will be smaller than when using some other score from the distribution
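
The claim about the mean can be checked numerically. The sketch below uses hypothetical scores and measures error as the average squared difference, which is the sense in which the mean is the best single guess (for average absolute error, the median would do at least as well). The function name `mean_squared_error` is chosen for this example only.

```python
from statistics import mean

def mean_squared_error(scores, guess):
    """Average squared error when every score is predicted by `guess`."""
    return mean((s - guess) ** 2 for s in scores)

scores = [2, 4, 4, 5, 10]  # hypothetical data
m = mean(scores)

# Compare the mean against each actual score used as the single prediction.
errors = {g: mean_squared_error(scores, g) for g in scores}
print("error using the mean:", mean_squared_error(scores, m))
print("error using each score:", errors)
```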

➊ What values make the regression line?

• The values predicted by the regression equation create the regression line

$y' = bx + a$

These predicted points all fall on the regression line

➊ The regression line represents a central point inside the points of a scatterplot

• The points in a scatterplot can be thought of as regressing to this central point

➋ It is the best-fitting line and is also known as the line of least squares

• Imagine the different angles at which you could plot a straight line through a scatterplot

• The line that would result in the smallest average squared vertical distance from all points would be the regression line

[Figure: Regression Equation]

The blue line is the regression line. The points that make up this line are the predicted values from the regression equation.

➊ Every linear regression line passes through the point of averages

• The point of averages is located at the intersection of the overall mean for the x-variable and the overall mean of the y-variable

➋ Points predicted closer to the point of averages are, on average, more accurate than points predicted farther away from this point
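
One way to see why the line always passes through the point of averages is to substitute the mean of the x-variable into the regression equation, using the y-intercept formula given earlier in these slides:

$$y'(\bar{X}) = b\bar{X} + a = b\bar{X} + (\bar{Y} - b\bar{X}) = \bar{Y}$$

So whatever the slope b is, the value predicted at $\bar{X}$ is $\bar{Y}$.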

[Figure: Regression Equation]

The black dot represents the point of averages, where the overall means for the x-variable (Father’s Height, 69 inches) and y-variable (Son’s Height, 71.5 inches) intersect. This point is always found on a linear regression line.

➊ The regression line can be plotted using Excel; however, you can also plot this line using two points:

• The point of averages and

• The y-intercept

➋ You can also plot the regression line by plugging values of the x-variable into the regression equation and solving for the predicted value of the y-variable

Remember: the regression line is made up of all the predicted values of the y-variable, or y’

➊ The term residuals refers to the amount of error in prediction

• In other words, the regression equation produces a predicted value for the y-variable

• The difference between the real value of Y and the predicted value of Y is known as error or the residual

• Excel can calculate the residuals for each predicted score; however, if we were to obtain the residuals by hand, the formula used is:

• Formula for Residuals: $y - y'$

[Figure: Regression Equation, with two of the distances labeled "Residual"]

The vertical distance between each real point and the regression line is a residual, or error in prediction. The sum of the residuals is always equal to zero.
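
The statement that the residuals sum to zero can be checked with a short sketch. The data are hypothetical, and the slope is computed with the usual least-squares formula, which gives the same value as r × (SDy / SDx).

```python
from statistics import mean

# Hypothetical paired data (not from the slides), heights in inches.
x = [65, 67, 69, 71, 73]
y = [68, 70, 71, 73, 75]

mx, my = mean(x), mean(y)

# Least-squares slope and intercept.
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

predicted = [b * xi + a for xi in x]                   # y' for each x
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # residual = y - y'

print([round(e, 2) for e in residuals])
print(abs(sum(residuals)) < 1e-9)  # zero up to floating-point rounding
```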

➊ Residuals can help identify outliers

• When a residual is very large, it may indicate an outlier

• Outliers can have the effect of increasing or decreasing the slope of the regression line

• This means that outliers can also increase or decrease the correlation between two variables

• Depending on the size of the outlier, a researcher may want to run the regression analysis with and without the outlier to see how much the score may affect the results
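
Running the analysis with and without a suspected outlier might look like the sketch below. The data are invented, with one outlier planted deliberately as the last pair, and `fit_line` is a hypothetical helper written for this example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (b, a) for the least-squares line y' = b*x + a."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

# Hypothetical heights in inches; the last pair is a planted outlier.
fathers = [65, 67, 69, 71, 73, 75]
sons = [68, 70, 71, 73, 75, 62]

b_all, a_all = fit_line(fathers, sons)              # with the outlier
b_trim, a_trim = fit_line(fathers[:-1], sons[:-1])  # without it

# Residuals (y - y') from the fit that includes every point.
residuals = [y - (b_all * x + a_all) for x, y in zip(fathers, sons)]
worst = max(range(len(residuals)), key=lambda i: abs(residuals[i]))

print(f"slope with outlier:    {b_all:.3f}")
print(f"slope without outlier: {b_trim:.3f}")
print(f"largest residual is at index {worst}")
```

With these made-up numbers a single extreme point is enough to change the sign of the slope, which is why comparing the two fits is worthwhile.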

➊ The regression equation attempts to predict the mean of the y-variable at each value of the x-variable. Why?

• Suppose you have three fathers who are each 74 inches tall (or 6’2”)

• Each of these fathers has a son who is a different height

• The value of the x-variable entered into the regression equation will be the same for each of these three fathers

• What value for sons’ heights should the equation try to predict?

[Figure: Regression Equation]

The regression equation will try to predict the average height of the sons (y-variable) at each height of the fathers (x-variable).

What height should be predicted for the three sons who each have a father who is 74” tall?

➊ What is meant by extrapolation?

• Predicting values beyond the range of the data used to develop the regression equation

➋ What is meant by limited (restricted) range?

• When the regression equation is based on a very narrow range of data compared to the true range of the data in the population

➌ What is meant by lurking variables?

• Other variables that can account for the correlation between two variables

➊ The correlation coefficient can be obtained by hand using the following formula:

$r = b \times \frac{SD_x}{SD_y}$
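
This formula is the slope formula from earlier in these slides solved for r. Starting from the definition of the slope and multiplying both sides by $\frac{SD_x}{SD_y}$ (this assumes $SD_y$ is not zero):

$$b = r \times \frac{SD_y}{SD_x} \quad\Longrightarrow\quad b \times \frac{SD_x}{SD_y} = r$$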

End of Chapter 3 – Part 2