Class 22. Understanding Regression
EMBS Part of 12.7; Sections 1-3 and 7 of Pfeifer Regression note

Page 1:

Class 22. Understanding Regression

EMBS Part of 12.7

Sections 1-3 and 7 of Pfeifer Regression note

Page 2:

What is the regression line?

• It is a line drawn through a cloud of points.
• It is the line that minimizes the sum of squared errors.
– Errors are also known as residuals.
– Error = Actual – Predicted.
– The error is the vertical distance from a point (actual) to the line (predicted).
– Points above the line have positive errors.
• The average of the errors will always be zero.
• The regression line will always "go through" (average X, average Y).

(Error aka residual; Predicted aka fitted)

Page 3:

[Scatterplot of Y vs. X]

Can you draw the regression line?

Page 4:

[Scatterplot of Y vs. X with six candidate lines labeled A, B, C, D, E, and F]

Which is the regression line?

Page 5:

[The same scatterplot, now showing only line D]

Which is the regression line? Line D.

Page 6:

Which is the regression line?

The three data points are (1,1), (2,7), and (3,1). The regression line is the horizontal line Y = 3, with fitted values (1,3), (2,3), and (3,3).

Error at (2,7): 7 - 3 = 4
Error at (1,1): 1 - 3 = -2
Error at (3,1): 1 - 3 = -2

Sum of errors is 0! SSE = (-2)² + 4² + (-2)² = 24, which is smaller than for any other line. The line goes through (2,3), the average.
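The arithmetic on this slide is easy to verify. Below is a minimal sketch in Python (the course itself uses Excel) applying the textbook least-squares formulas to the three points:

```python
# Verify the slide's arithmetic with the textbook least-squares formulas.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 7.0, 1.0]
n = len(xs)
x_bar = sum(xs) / n            # 2.0
y_bar = sum(ys) / n            # 3.0
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx              # 0.0: the fitted line is flat
intercept = y_bar - slope * x_bar   # 3.0: the line Y = 3
errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in errors)
print(errors, sum(errors), sse)  # → [-2.0, 4.0, -2.0] 0.0 24.0
```

As the slide says: the errors sum to zero, the SSE is 24, and the line passes through the average point (2,3).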

Page 7:

Draw in the regression line…

[Four scatterplots of Y vs. X on different axis scales]

Page 8:

Draw in the regression line…

[The same four scatterplots of Y vs. X]

Page 9:

Two points determine a line… and regression can give you the equation.

Degrees C    Degrees F
0            32
100          212

[Chart: Degrees F vs. Degrees C]

Page 10:

Two points determine a line… and regression can give you the equation.

Degrees C    Degrees F
0            32
100          212

[The same chart, now with the fitted line: f(x) = 1.8 x + 32]
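As a sketch (again in Python rather than Excel): fitting the least-squares line through the two exact points recovers the familiar Celsius-to-Fahrenheit conversion, since with only two points the line passes through both.

```python
# Least-squares line through the two exact points (0, 32) and (100, 212).
cs = [0.0, 100.0]   # Degrees C
fs = [32.0, 212.0]  # Degrees F
c_bar = sum(cs) / len(cs)
f_bar = sum(fs) / len(fs)
slope = (sum((c - c_bar) * (f - f_bar) for c, f in zip(cs, fs))
         / sum((c - c_bar) ** 2 for c in cs))
intercept = f_bar - slope * c_bar
print(f"f(x) = {slope} x + {intercept}")  # → f(x) = 1.8 x + 32.0
```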

Page 11:

Four Sets of X,Y Data

Data Set A    Data Set B    Data Set C    Data Set D
X    Y        X    Y        X    Y        X    Y
10   9.14     10   8.04     10   7.47     19   12.08
8    8.14     8    6.95     8    6.47     19   11.26
13   8.74     13   7.58     13   8.97     19   13.21
9    8.77     9    8.81     9    6.97     19   14.34
11   9.25     11   8.33     11   10.87    19   13.97
14   8.1      14   9.96     14   9.47     19   12.54
6    6.13     6    7.24     6    5.47     19   10.75
4    3.1      4    4.26     4    4.47     8    7.00
12   9.13     12   10.84    12   8.47     19   11.06
7    7.26     7    4.82     7    8.87     19   13.41
5    4.74     5    5.68     5    4.97     19   12.39

Page 12:

Four Sets of X,Y Data

[Four scatterplots of the data sets, labeled A, B, C, and D]

Page 13:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.8166
R Square            0.6669
Adjusted R Square   0.6299
Standard Error      1.2357
Observations        11

ANOVA
            df   SS        MS        F         Significance F
Regression  1    27.5100   27.5100   18.0164   0.0022
Residual    9    13.7425   1.5269
Total       10   41.2525

            Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept   2.9993        2.1532          1.3929   0.1971   -1.8716    7.8702
X           0.5001        0.1178          4.2446   0.0022   0.2336     0.7666

Four Sets of X,Y Data (Data Analysis/Regression)

Identical Regression Output for A, B, C, and D!!!!!
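To see that the output really is identical, one can rerun the least-squares formulas on all four data sets from the table. A minimal Python sketch (the course itself uses Excel's Data Analysis/Regression tool):

```python
# Check that all four data sets from the table produce the same fitted line.
data = {
    "A": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [9.14, 8.14, 8.74, 8.77, 9.25, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]),
    "B": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "C": ([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          [7.47, 6.47, 8.97, 6.97, 10.87, 9.47, 5.47, 4.47, 8.47, 8.87, 4.97]),
    "D": ([19, 19, 19, 19, 19, 19, 19, 8, 19, 19, 19],
          [12.08, 11.26, 13.21, 14.34, 13.97, 12.54, 10.75, 7.00, 11.06, 13.41, 12.39]),
}

def fit(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
             / sum((x - xb) ** 2 for x in xs))
    return yb - slope * xb, slope

fits = {name: fit(xs, ys) for name, (xs, ys) in data.items()}
for name, (a, b) in fits.items():
    print(name, round(a, 2), round(b, 2))  # each set: intercept ≈ 3.0, slope ≈ 0.5
```

All four clouds give essentially the same line Y = 3.0 + 0.5 X, even though the clouds themselves look nothing alike. That is the point of the slide: chart the cloud.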

Page 14:

Assumptions

• Y is normal and we sample n independent observations.
– The sample mean ȳ is the estimate of μ.
– The sample standard deviation s is the estimate of σ.
– We use ȳ, s, and n to test hypotheses about μ, using the t-statistic and the t-distribution with n - 1 dof.
– We never forecasted "the next Y", although our point forecast for a new Y would be ȳ.

Page 15:

Example: Section 4 IQs

IQ
Mean                108.545   (ȳ)
Standard Error      3.448
Median              110
Mode                102
Standard Deviation  19.807    (s)
Sample Variance     392.318
Kurtosis            0.228
Skewness            -0.499
Range               85
Minimum             57
Maximum             142
Sum                 3582
Count               33        (n)

To test H0: μ=100

The CLT tells us this test works even if Y is not normal.
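The t-statistic behind this test can be reproduced from the summary statistics alone. A short Python sketch:

```python
import math

# Reproduce the t-test arithmetic from the summary statistics on the slide:
# mean 108.545, standard deviation 19.807, n = 33, testing H0: mu = 100.
y_bar, s, n, mu0 = 108.545, 19.807, 33, 100
se = s / math.sqrt(n)     # standard error of the mean
t = (y_bar - mu0) / se    # compare to the t-distribution with n - 1 = 32 dof
print(round(se, 3), round(t, 2))  # → 3.448 2.48
```

The standard error matches the 3.448 in the descriptive-statistics output, and t ≈ 2.48 is what gets compared to the t-distribution with 32 dof.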

Page 16:

Regression Assumptions

• Y│X is normal with mean a + bX and standard deviation σ, and we sample n independent observations.
– We use regression to estimate a, b, and σ.
• â, b̂, and the "standard error" are the appropriate estimates.
• Our point forecast for a new observation is â + b̂(X). (Plug X into the regression equation.)
– At some point, we will learn how to use regression output to test interesting hypotheses.
– What about a probability forecast of the new Y│X?

Page 17:

Summary: the key assumption of linear regression…

• Y ~ N(μ, σ) (no regression)
• Y│X ~ N(a + bX, σ) (with regression)
– In other words, μ = a + b(X), or E(Y│X) = a + b(X).

Without regression, we used data to estimate and test hypotheses about the parameter μ. With regression, we use (x,y) data to estimate and test hypotheses about the parameters a and b. In both cases, we use the t because we don't know σ. With regression, we also want to use X to forecast a new Y: the mean of Y given X is a linear function of X.

EMBS (12.14)

Page 18:

Example: Assignment 22

MSF     Hours
26      2
34.2    4.17
29      4.42
34.3    4.75
85.9    4.83
143.2   6.67
85.5    7
140.6   7.08
140.6   7.17
40.4    7.17
101     10
239.7   12
179.3   12.5
126.5   13.67
140.8   15.08

Regression Statistics
Multiple R          0.7260
R Square            0.5271
Adjusted R Square   0.4907
Standard Error      2.7736   (the estimate of σ)
Observations        15       (n)

ANOVA       df
Regression  1
Residual    13
Total       14

            Coefficients
Intercept   3.312316042   (â)
MSF         0.044489502   (b̂)

Page 19:

Forecasting Y│X=157.3

• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
– The point forecast is the mean of the probability distribution forecast.
• Under certain assumptions… GOOD METHOD:
– Pr(Y<8) = NORMDIST(8, 10.31, 2.77, TRUE) = 0.202

Assumes â, b̂, and "standard error" are a, b, and σ.
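The same GOOD-method calculation can be sketched in Python instead of Excel; the normal CDF that NORMDIST computes is available through the standard library's erf.

```python
import math

# The GOOD method in Python rather than Excel. NORMDIST(x, mean, sd, TRUE)
# is the normal CDF: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
a_hat, b_hat = 3.312316042, 0.044489502   # intercept and slope from the output
std_err = 2.7736                          # the regression "standard error"
x = 157.3
point = a_hat + b_hat * x                 # point forecast: the mean
z = (8 - point) / std_err
p_under_8 = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(round(point, 2), round(p_under_8, 3))  # → 10.31 0.202
```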

Page 20:

Example: Assignment 22 (same data and regression output as on the previous data slide)

                Job A     Job B
Intercept       1         1
MSF             157.3     64.7
Point Forecast  10.3105   6.1908
sigma           2.77      2.77
X               8         8
NORMDIST        0.2021    0.7432

Page 21:

Forecasting Y│X=157.3

• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
– The point forecast is the mean of the probability distribution forecast.
• Under certain assumptions… BETTER METHOD:
– t = (8 - 10.31)/2.77 = -0.83
– Pr(Y<8) = 1 - T.DIST.RT(-0.83, 13) = 0.210

Assumes â and b̂ are a and b… but accounts for the fact that "standard error" is not σ.

dof = n - 2
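The BETTER method needs a t CDF, which Python's standard library lacks. The sketch below integrates the t density numerically, which is one simple stand-in for Excel's T.DIST.RT; a statistics library would normally do this step.

```python
import math

# The BETTER method: same t-statistic as the slide, with the t CDF obtained
# by numerically integrating the t density (stdlib only, no scipy assumed).
def t_pdf(x, dof):
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return c * (1 + x * x / dof) ** (-(dof + 1) / 2)

def t_cdf(x, dof, steps=20000, lo=-40.0):
    # trapezoidal integration from far in the left tail up to x
    h = (x - lo) / steps
    area = 0.5 * (t_pdf(lo, dof) + t_pdf(x, dof))
    area += sum(t_pdf(lo + i * h, dof) for i in range(1, steps))
    return area * h

t_stat = (8 - 10.31) / 2.77     # ≈ -0.83
p = t_cdf(t_stat, dof=13)       # Pr(Y < 8); the slide's Excel version gives 0.210
print(round(t_stat, 2), round(p, 3))
```

Note the result is slightly larger than the GOOD method's 0.202: the t's fatter tails widen the forecast distribution.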

Page 22:

Forecasting Y│X=157.3

• Plug X=157.3 into the regression equation to get 10.31 as the point forecast.
– The point forecast is the mean of the probability distribution forecast.
• Under certain assumptions… PERFECT METHOD:
– t = (8 - 10.31)/2.93 = -0.79
– Pr(Y<8) = 1 - T.DIST.RT(-0.79, 13) = 0.222

To account for using â and b̂ to estimate a and b, we must increase the standard deviation used in the forecast. The "correct" standard deviation is called the "standard error of prediction", which here is 2.93.

dof = n - 2

Page 23:

Probability Forecasting with Regression: summary

• Plug X into the regression equation to calculate the point forecast.
– This becomes the mean.
• GOOD
– Use the normal with "standard error" in place of σ.
• BETTER
– Use the t (with n - 2 dof) to account for using "standard error" to estimate σ.
• PERFECT
– Use the t with the "standard error of prediction" to account for using â and b̂ to estimate a and b.

Page 24:

Probability Forecasting with Regression

• "Standard error of prediction" is larger than "standard error" and depends on
– 1/n (the larger the n, the smaller the "standard error of prediction")
– (X - X̄)² (the farther X is from the average X, the larger the "standard error of prediction")
• As n gets big, the "standard error of prediction" approaches "standard error".

Page 25:

standard error of prediction = standard error × sqrt( 1 + 1/n + (X - X̄)² / Σ(Xi - X̄)² )

The sum in the denominator is over the n data points; X is the value for which we predict Y.

The GOOD and BETTER methods ignore the 1/n and (X - X̄)² terms, which is okay the bigger n is.

(EMBS 12.26)
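Plugging the Assignment 22 data into this formula reproduces the 2.93 used by the PERFECT method. A Python sketch using the MSF values and the "standard error" from the earlier output:

```python
import math

# Standard error of prediction at X = 157.3 for the Assignment 22 example,
# computed from the formula above with the MSF column and the regression
# "standard error" from the output.
msf = [26, 34.2, 29, 34.3, 85.9, 143.2, 85.5, 140.6, 140.6, 40.4,
       101, 239.7, 179.3, 126.5, 140.8]
std_err = 2.773595935
n = len(msf)
x_bar = sum(msf) / n
sxx = sum((x - x_bar) ** 2 for x in msf)   # summed over the n data points
x_new = 157.3                              # the X for which we predict Y
se_pred = std_err * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
print(round(se_pred, 2))  # → 2.93, the value used by the PERFECT method
```

Since 157.3 is well above the average MSF (about 103), the (X - X̄)² term contributes noticeably here; at an X near the average, the correction would be smaller.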

Page 26:

BOTTOM LINE

• You will be asked to use the BETTER METHOD.
– Use the t with n - 2 dof.
– Just use "standard error".
• Know that "standard error" is smaller than the correct "standard error of prediction".
– As a result, your probability distribution is a little too narrow.
• Know that the "standard error of prediction" depends on 1/n and (X - X̄)², which means it approaches "standard error" as n gets big.

Page 27:

Much ado about nothing?

[Chart: 95% prediction intervals for Hours vs. MSF (MSF from 0 to 300). Perfect: widest and curved. Good: straight and narrowest. Better: in between.]

Page 28:

TODAY

• Got a better idea of how the "least squares" regression line goes through the cloud of points.
• Saw that several "clouds" can have exactly the same regression line… so chart the cloud.
• Practiced using a regression equation to calculate a point forecast (a mean).
• Saw three methods for creating a probability distribution forecast of Y│X.
– We will use the BETTER method.
– We will know that it understates the actual uncertainty… a problem that goes away as n gets big.

Page 29:

Next Class

• We will learn about "adjusted R square".
– (pp. 9-10, Pfeifer note)
– The most over-rated statistic of all time.
• We will learn the four assumptions required to use regression to make a probability forecast of Y│X.
– (Section 5, Pfeifer note; 12.4 EMBS)
– And how to check each of them.
• We will learn how to test H0: b = 0.
– (pp. 12-13, Pfeifer note; 12.5 EMBS)
– And why this is such an important test.