Regression: Using Correlation To Make Predictions
Anthony Greene

Page 1

Regression

Using Correlation To Make Predictions

Page 2

Making a prediction

ẑ_y = r z_x

This gives the predicted value of y (as a z-score) from a known value of x and a known correlation r.

Note what happens for positive and negative values of r, for high and low values of r, and for values of r near zero.
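The z-score form of the prediction can be sketched in a few lines of Python. This is a minimal illustration with made-up numbers; `predict_y` is a hypothetical helper, not something from the slides:

```python
import statistics as stats

def predict_y(x_new, xs, ys, r):
    """Predict y from x via the z-score form: z_y_hat = r * z_x."""
    mx, my = stats.mean(xs), stats.mean(ys)
    sx, sy = stats.stdev(xs), stats.stdev(ys)
    z_x = (x_new - mx) / sx      # standardize the known x
    z_y_hat = r * z_x            # predicted z-score for y
    return my + z_y_hat * sy     # convert back to raw units

# With r = 0 the prediction is simply the mean of y;
# with r = 1 the prediction moves a full z-score of y per z-score of x.
```

Notice how a near-zero r pulls every prediction toward the mean of y, matching the note above.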

Page 3

Graph of y = 5 − 3x

Page 4

y-Intercept and Slope

For a linear equation y = a + bx, the constant a is the y-intercept and the constant b is the slope.

x and y are related variables

Page 5

Straight-line graphs of three linear equations

y = a + bx
a = y-intercept
b = slope (rise/run)

Page 6

Graphical Interpretation of Slope

The straight-line graph of the linear equation y = a + bx slopes upward if b > 0, slopes downward if b < 0, and is horizontal if b = 0.

Page 7

Graphical interpretation of slope

Page 8

Four data points

Page 9

Scatter plot

Page 10

Two possible straight-line fits to the data points

Page 11

Determining how well the data points are fit by Line A vs. Line B

Page 12

Least-Squares Criterion

The straight line that best fits a set of data points is the one having the smallest possible sum of squared errors. Recall that the sum of squared errors is error variance.
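The criterion can be illustrated directly: compute the sum of squared errors for two candidate lines and keep the smaller. A minimal sketch with four hypothetical points (not the data from these slides); here Line A happens to be the least-squares line for these points:

```python
def sse(points, a, b):
    """Sum of squared errors of the line y = a + b*x over (x, y) pairs."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)

pts = [(1, 2), (2, 3), (3, 5), (4, 6)]   # four hypothetical data points

err_a = sse(pts, 0.5, 1.4)   # Line A: the least-squares line for these points
err_b = sse(pts, 0.0, 2.0)   # Line B: a plausible-looking rival
# err_a < err_b, so Line A fits better under the least-squares criterion
```

Any line other than the least-squares line will produce a strictly larger sum of squared errors for this data.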

Page 13

Regression Line and Regression Equation

Regression line: The straight line that best fits a set of data points according to the least-squares criterion.

Regression equation: The equation of the regression line.

Page 14

The best-fit line minimizes the distance between the actual data and the predicted values.

Page 15

Residual, e, of a data point

Page 16

Notation Used in Regression and Correlation

We define SS_x, SS_y, and SP by

SS_x = Σ(x − M_x)²
SS_y = Σ(y − M_y)²
SP = Σ(x − M_x)(y − M_y)

or by the computational formulas

SS_x = Σx² − (Σx)²/n
SS_y = Σy² − (Σy)²/n
SP = Σxy − (Σx)(Σy)/n
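The definitional and computational formulas always agree, which is easy to check in code. A sketch with hypothetical numbers:

```python
def sums_of_squares(xs, ys):
    """SS_x, SS_y, and SP via the computational formulas."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return ss_x, ss_y, sp

def sums_of_squares_def(xs, ys):
    """The same three quantities from the deviation-score definitions."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return ss_x, ss_y, sp
```

Both functions return identical values (up to floating-point rounding) for any data set.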

Page 17

Regression Equation

The regression equation for a set of n data points is

ŷ = a + bx,  where  b = SP/SS_x  and  a = M_y − b·M_x = (1/n)(Σy − bΣx)
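As a sketch, the two constants can be computed straight from these formulas (hypothetical data, not the slides'):

```python
def regression_line(xs, ys):
    """Return (a, b) with b = SP/SS_x and a = M_y - b*M_x."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = sp / ss_x
    a = sum(ys) / n - b * sum(xs) / n
    return a, b

a, b = regression_line([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0])
y_hat = a + b * 2.5   # evaluate the regression equation at x = 2.5
```

Once a and b are in hand, ŷ = a + bx can be evaluated at any x of interest.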

Page 18

The relationship between b and r

b = r(s_y/s_x)

since

r = SP/√(SS_x·SS_y)  and  b = SP/SS_x,

which gives b = r·√(SS_y/SS_x) = r(s_y/s_x), because s_y and s_x are computed with the same n.

• That is, the regression slope is just the correlation coefficient scaled up to the right size for the variables x and y.
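A quick numerical check of b = r(s_y/s_x), using hypothetical data:

```python
import statistics as stats

xs = [1.0, 2.0, 4.0, 5.0, 7.0]
ys = [2.0, 3.0, 3.5, 6.0, 8.0]
n = len(xs)

ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b = sp / ss_x                                       # regression slope
r = sp / (ss_x * ss_y) ** 0.5                       # correlation coefficient
b_from_r = r * stats.stdev(ys) / stats.stdev(xs)    # same value as b
```

The ratio of standard deviations equals √(SS_y/SS_x) because the same n divides both, so the two expressions for the slope coincide exactly.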

Page 19

ŷ = bx + a
ŷ = bx + (M_y − b·M_x)           (recall a = M_y − b·M_x)
ŷ = M_y + b(x − M_x)
ŷ = M_y + r(s_y/s_x)(x − M_x)    (recall b = r·s_y/s_x)

Dividing through by s_y:

(ŷ − M_y)/s_y = r(x − M_x)/s_x,  that is,  ẑ_y = r z_x

Page 20

Criterion for Finding a Regression Line

Before finding a regression line for a set of data points, draw a scatter diagram. If the data points do not appear to be scattered about a straight line, do not determine a regression line.

Page 21

Linear regression requires linear data:
(a) Data points scattered about a curve
(b) An inappropriate straight line fit to those data
Higher-order regression equations exist, but they are beyond the scope of this course.

Page 22

Uniform Variance

[Figure: Math Proficiency By Grade — scores (0–100) for grades 1–5, illustrating roughly uniform variance across grades.]

Page 23

Assumptions for Regression Inferences

Page 24

Table for obtaining the three sums of squares for the used car data

Page 25

Regression line and data points for used car data

What is a fair asking price for a 2.5 year old car?

ŷ = 195.47 − 20.26x
ŷ = 195.47 − 20.26(2.5)
ŷ = 144.82

So since the price unit is $100s, the best prediction is about $14,482.
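Evaluating the slides' used-car regression equation at x = 2.5 is a one-liner; this sketch just re-runs that arithmetic:

```python
def predict_price(age_years):
    """Used-car regression from the slides: y_hat = 195.47 - 20.26*x (price in $100s)."""
    return 195.47 - 20.26 * age_years

y_hat = predict_price(2.5)      # 195.47 - 50.65 = 144.82 (in $100s)
dollars = round(y_hat * 100)    # convert from $100s to dollars
```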

Page 26

Extrapolation in the used car example

Page 27

Sums of Squares in Regression

Total sum of squares, SST: the variation in the observed values of the response variable:
SST = Σ(y − M_y)² = SS_y

Regression sum of squares, SSR: the variation in the observed values of the response variable that is explained by the regression:
SSR = Σ(ŷ − M_y)² = SP²/SS_x

Error sum of squares, SSE: the variation in the observed values of the response variable that is not explained by the regression:
SSE = Σ(y − ŷ)² = SS_y − SP²/SS_x

Page 28

Regression Identity

The total sum of squares equals the regression sum of squares plus the error sum of squares. In symbols,

SST = SSR + SSE.
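The identity is easy to verify numerically from the computational forms of the three sums; a sketch with hypothetical data:

```python
xs = [1.0, 2.0, 4.0, 5.0, 7.0]
ys = [2.0, 3.0, 3.5, 6.0, 8.0]
n = len(xs)

ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

sst = ss_y                    # total variation
ssr = sp ** 2 / ss_x          # variation explained by the regression
sse = ss_y - sp ** 2 / ss_x   # unexplained variation
# sst equals ssr + sse, up to floating-point rounding
```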

Page 29

Graphical portrayal of regression for used cars

y = a + bx

Page 30

What sort of things could regression be used for?

In any instance where a known correlation exists, regression can be used to predict a new score. Examples:

1. If you knew that there was a past correlation between the amount of study time and the grade on an exam, you could make a good prediction about the grade before it happened.

2. If you knew that certain features of a stock correlated with its price, you could use regression to predict the price before it happens.

Page 31

Regression Example: Low Correlation

[Figure: scatter plot of height (inches) vs. weight (pounds) for 10 male statistics students.]

Find the regression equation for predicting height based on knowledge of weight. The existing data are for 10 male statistics students.

Page 32

X       Y      XY         X²         Y²
287.00  75.00  21,525.00  82,369.00  5,625.00
300.00  71.00  21,300.00  90,000.00  5,041.00
255.00  80.00  20,400.00  65,025.00  6,400.00
180.00  69.00  12,420.00  32,400.00  4,761.00
130.00  70.00   9,100.00  16,900.00  4,900.00
215.00  77.00  16,555.00  46,225.00  5,929.00
165.00  71.00  11,715.00  27,225.00  5,041.00
240.00  71.00  17,040.00  57,600.00  5,041.00
160.00  72.00  11,520.00  25,600.00  5,184.00
150.00  65.00   9,750.00  22,500.00  4,225.00
Σ: 2,082.00  721.00  151,325.00  465,844.00  52,147.00


Page 35

Σ:  X = 2,082   Y = 721   XY = 151,325   X² = 465,844   Y² = 52,147

SS_x = Σx² − (Σx)²/n = 465,844 − 433,472.4 = 32,371.6
SP = Σxy − (Σx)(Σy)/n = 151,325 − 150,112.2 = 1,212.8
b = SP/SS_x, so b = 1,212.8/32,371.6 = 0.0375
a = (1/n)(Σy − bΣx), so a = 0.1(721 − 78.08) = 64.3

So, Ŷ = 0.0375x + 64.3
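The column totals can be plugged straight into the formulas to double-check the computation (a sketch; values taken from the table above):

```python
# Column totals from the height/weight table (n = 10 students)
n, sum_x, sum_y = 10, 2082.0, 721.0
sum_xy, sum_x2 = 151325.0, 465844.0

ss_x = sum_x2 - sum_x ** 2 / n      # 465,844 - 433,472.4 = 32,371.6
sp = sum_xy - sum_x * sum_y / n     # 151,325 - 150,112.2 = 1,212.8
b = sp / ss_x                       # about 0.0375
a = (sum_y - b * sum_x) / n         # about 64.3
```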

Page 36

Ŷ = 0.0375x + 64.3

[Figure: the height-vs-weight scatter plot with the fitted regression line superimposed.]

Page 37

Regression Example: High Correlation

Find the regression equation for predicting the probability of a teenage suicide attempt based on weekly heroin usage.

[Figure: bar chart of attempt probability (0–100%) vs. weekly heroin usage (1–7), with separate series for 2000, 2001, and 2002.]

Page 38

X  Y     XY    X²  Y²
1  0.20  0.20   1  0.04
1  0.31  0.31   1  0.0961
1  0.18  0.18   1  0.0324
2  0.27  0.54   4  0.0729
2  0.38  0.76   4  0.1444
2  0.46  0.92   4  0.2116
3  0.90  2.70   9  0.81
3  0.58  1.74   9  0.3364
3  0.45  1.35   9  0.2025
4  0.84  3.36  16  0.7056
4  0.74  2.96  16  0.5476
4  0.68  2.72  16  0.4624
5  0.85  4.25  25  0.7225
5  0.78  3.90  25  0.6084
5  0.73  3.65  25  0.5329
6  0.88  5.28  36  0.7744
6  0.82  4.92  36  0.6724
6  0.78  4.68  36  0.6084
7  0.92  6.44  49  0.8464
7  0.85  5.95  49  0.7225
7  0.91  6.37  49  0.8281
Σ: 84  13.51  63.18  420  9.9779


Page 41

n = 21

SS_x = Σx² − (Σx)²/n = 420 − 336 = 84
SP = Σxy − (Σx)(Σy)/n = 63.18 − 54.04 = 9.14
b = SP/SS_x, so b = 9.14/84 = 0.109
a = (1/n)(Σy − bΣx), so a = (1/21)(13.51 − 9.156) = 0.207

So, Ŷ = 0.109x + 0.207

Σ:  X = 84   Y = 13.51   XY = 63.18   X² = 420   Y² = 9.9779
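Plugging the Σ row into the same computational formulas confirms the result (a sketch; totals from the table above, n = 21):

```python
# Column totals from the heroin-usage table (n = 21 observations)
n, sum_x, sum_y = 21, 84.0, 13.51
sum_xy, sum_x2 = 63.18, 420.0

ss_x = sum_x2 - sum_x ** 2 / n      # 420 - 336 = 84
sp = sum_xy - sum_x * sum_y / n     # 63.18 - 54.04 = 9.14
b = sp / ss_x                       # about 0.109
a = (sum_y - b * sum_x) / n         # about 0.208
```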

Page 42

Why Is It Called Regression?

• For low correlations, the predicted value is close to the mean.
• For zero correlations, the prediction is the mean.
• Only for perfect correlations (r² = 1.0) do the predicted scores show as much variation as the actual scores.
• Since perfect correlations are rare, we say that the predicted scores show regression toward the mean.
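These bullets can be checked numerically: because ẑ_y = r z_x, the predicted scores have standard deviation |r|·s_y, which is smaller than s_y whenever |r| < 1. A sketch with hypothetical data:

```python
import statistics as stats

xs = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
ys = [2.0, 1.0, 4.0, 3.5, 6.0, 5.0]
n = len(xs)

ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

r = sp / (ss_x * ss_y) ** 0.5
b = sp / ss_x
a = sum(ys) / n - b * sum(xs) / n
y_hat = [a + b * x for x in xs]   # predicted scores

# stdev(y_hat) equals |r| * stdev(ys): since |r| < 1 here,
# the predictions vary less than the actual scores --
# they "regress" toward the mean.
```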