
12-1. 12-2 Chapter Twelve Multiple Regression and Model Building McGraw-Hill/Irwin Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved


Chapter Twelve

Multiple Regression and Model Building

McGraw-Hill/Irwin Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.


Multiple Regression

12.1 The Linear Regression Model
12.2 The Least Squares Estimates and Prediction
12.3 The Mean Squared Error and the Standard Error
12.4 Model Utility: R², Adjusted R², and the F Test
12.5 Testing the Significance of an Independent Variable
12.6 Confidence Intervals and Prediction Intervals
12.7 Dummy Variables
12.8 Model Building and the Effects of Multicollinearity
12.9 Residual Analysis in Multiple Regression


12.1 The Linear Regression Model

The linear regression model relating y to x1, x2, …, xk is

$$y = \mu_{y|x_1, x_2, \ldots, x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where

$\mu_{y|x_1, x_2, \ldots, x_k} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$ is the mean value of the dependent variable y when the values of the independent variables are x1, x2, …, xk.

β0, β1, β2, …, βk are the regression parameters relating the mean value of y to x1, x2, …, xk.

ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2, …, xk.


Example: The Linear Regression Model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

        Average Hourly                          Fuel Consumption
Week    Temperature, x1 (°F)   Chill Index, x2  y (MMcf)
1       28.0                   18               12.4
2       28.0                   14               11.7
3       32.5                   24               12.4
4       39.0                   22               10.8
5       45.9                    8                9.4
6       57.8                   16                9.5
7       58.1                    1                8.0
8       62.5                    0                7.5

Example 12.1: The Fuel Consumption Case


The Linear Regression Model Illustrated

Example 12.1: The Fuel Consumption Case


The Regression Model Assumptions

Model:

$$y = \mu_{y|x_1, x_2, \ldots, x_k} + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

Assumptions about the model error terms, the ε's:

Mean Zero: The mean of the error terms is equal to 0.

Constant Variance: The variance of the error terms is σ², the same for every combination of values of x1, x2, …, xk.

Normality: The error terms follow a normal distribution for every combination of values of x1, x2, …, xk.

Independence: The values of the error terms are statistically independent of each other.
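As a rough illustration of what these assumptions describe, the sketch below simulates observations from a two-predictor model whose errors are independent, normal, centered at 0, and of constant variance. The parameter values (β0 = 13.1, β1 = -0.09, β2 = 0.08, σ = 0.37) and the function name `simulate_y` are hypothetical choices made for this example, not values taken from the slides.

```python
import random

def simulate_y(x1, x2, beta=(13.1, -0.09, 0.08), sigma=0.37, rng=None):
    """Draw one y from y = b0 + b1*x1 + b2*x2 + eps, with eps ~ N(0, sigma^2).

    beta and sigma are hypothetical illustration values.
    """
    rng = rng or random.Random()
    mean_y = beta[0] + beta[1] * x1 + beta[2] * x2  # mean of y at (x1, x2)
    eps = rng.gauss(0.0, sigma)                     # independent normal error
    return mean_y + eps

rng = random.Random(12)  # fixed seed so the run is repeatable
draws = [simulate_y(40.0, 10.0, rng=rng) for _ in range(10000)]
avg = sum(draws) / len(draws)

# Because the errors have mean zero, the average of many draws should sit
# close to the model mean 13.1 - 0.09*40 + 0.08*10 = 10.3.
print(round(avg, 2))
```

Averaging many simulated values at one fixed (x1, x2) is only a check on the mean-zero assumption; the other three assumptions are usually examined through residual plots.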


12.2 Least Squares Estimates and Prediction

Estimation/Prediction Equation:

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$$

b0, b1, b2, …, bk are the least squares point estimates of the parameters β0, β1, β2, …, βk.

x01, x02, …, x0k are specified values of the independent predictor variables x1, x2, …, xk.

ŷ is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02, …, x0k. It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02, …, x0k.


Example: Least Squares Estimation

Example 12.3: The Fuel Consumption Case (Minitab Output)

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predictor       Coef      StDev       T       P
Constant     13.1087     0.8557   15.32   0.000
Temp        -0.09001    0.01408   -6.39   0.001
Chill        0.08249    0.02200    3.75   0.013

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source            DF       SS       MS       F       P
Regression         2   24.875   12.438   92.30   0.000
Residual Error     5    0.674    0.135
Total              7   25.549

Predicted Values (Temp = 40, Chill = 10)
   Fit   StDev Fit           95.0% CI           95.0% PI
10.333       0.170   ( 9.895, 10.771)   ( 9.293, 11.374)
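The least squares estimates in the Minitab output can be reproduced, at least approximately, by solving the normal equations (X'X)b = X'y on the fuel consumption data from Example 12.1. The sketch below uses only the Python standard library; the helper `solve` is a small Gauss-Jordan routine written for this illustration, not part of any package.

```python
def solve(a, b):
    """Solve the small linear system a z = b by Gauss-Jordan elimination
    with partial pivoting (illustrative helper)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Fuel consumption data (Example 12.1)
temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
chill = [18, 14, 24, 22, 8, 16, 1, 0]
fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]

# Design matrix rows are [1, x1, x2]; the leading 1 carries the intercept.
X = [[1.0, t, c] for t, c in zip(temp, chill)]

# Normal equations: (X'X) b = X'y
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(X, fuel)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

# Point estimate / point prediction at Temp = 40, Chill = 10
y_hat = b0 + b1 * 40 + b2 * 10
print(round(b0, 4), round(b1, 5), round(b2, 5), round(y_hat, 3))
```

The printed coefficients should agree with the Coef column above up to rounding. In practice a numerical library would be used instead of a hand-written solver.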


Example: Point Predictions and Residuals

Example 12.3: The Fuel Consumption Case

        Average Hourly                          Observed Fuel   Predicted Fuel
        Temperature,                            Consumption     Consumption                     Residual
Week    x1 (°F)          Chill Index, x2        y (MMcf)        13.1087 - .0900x1 + .0825x2     e = y - pred
1       28.0             18                     12.4            12.0733                          0.3267
2       28.0             14                     11.7            11.7433                         -0.0433
3       32.5             24                     12.4            12.1631                          0.2369
4       39.0             22                     10.8            11.4131                         -0.6131
5       45.9              8                      9.4             9.6372                         -0.2372
6       57.8             16                      9.5             9.2260                          0.2740
7       58.1              1                      8.0             7.9616                          0.0384
8       62.5              0                      7.5             7.4831                          0.0169
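The predicted values and residuals in this table can be recomputed from the fitted equation. The sketch below assumes the coefficients as printed in the Minitab Coef column (13.1087, -0.09001, 0.08249); the header's .0900 and .0825 are rounded versions and give slightly different fourth decimals.

```python
# Fuel consumption data (Example 12.1)
temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
chill = [18, 14, 24, 22, 8, 16, 1, 0]
fuel = [12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5]

# Least squares point estimates taken from the Minitab output
b0, b1, b2 = 13.1087, -0.09001, 0.08249

predicted = [b0 + b1 * t + b2 * c for t, c in zip(temp, chill)]
residuals = [y - p for y, p in zip(fuel, predicted)]  # e = y - pred

for week, (y, p, e) in enumerate(zip(fuel, predicted, residuals), start=1):
    print(f"{week}  {y:5.1f}  {p:8.4f}  {e:8.4f}")
```

With an intercept in the model, the least squares residuals sum to zero; here the sum is only approximately zero because the coefficients are rounded.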


12.3 Mean Square Error and Standard Error

Sum of Squared Errors:

$$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$$

Mean Square Error, the point estimate of the residual variance σ²:

$$s^2 = MSE = \frac{SSE}{n-(k+1)}$$

Standard Error, the point estimate of the residual standard deviation σ:

$$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-(k+1)}}$$

Example 12.3: The Fuel Consumption Case

$$s^2 = MSE = \frac{SSE}{n-(k+1)} = \frac{0.674}{8-3} = 0.1348, \qquad s = \sqrt{s^2} = \sqrt{0.1348} = 0.3671$$

Analysis of Variance
Source            DF       SS       MS       F       P
Regression         2   24.875   12.438   92.30   0.000
Residual Error     5    0.674    0.135
Total              7   25.549
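The arithmetic for SSE, MSE, and s can be checked directly from the residuals listed for the fuel consumption case. This is a minimal sketch; the residuals are the four-decimal values from the point prediction table, so the results match the Minitab figures only up to rounding.

```python
import math

# Residuals e = y - pred from the fuel consumption case (rounded to 4 decimals)
residuals = [0.3267, -0.0433, 0.2369, -0.6131, -0.2372, 0.2740, 0.0384, 0.0169]

n = len(residuals)  # 8 observations
k = 2               # two independent variables: Temp and Chill

sse = sum(e ** 2 for e in residuals)  # sum of squared errors
mse = sse / (n - (k + 1))             # s^2, estimate of the error variance
s = math.sqrt(mse)                    # standard error

print(round(sse, 3), round(mse, 4), round(s, 4))
```

The divisor n - (k + 1) = 5 is the Residual Error degrees of freedom shown in the Analysis of Variance table.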


12.4 Model Utility: Multiple Coefficient of Determination, R²

The multiple coefficient of determination R² is

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}$$

Total variation = $\sum (y_i - \bar{y})^2$ = Total Sum of Squares (SSTO)

Explained variation = $\sum (\hat{y}_i - \bar{y})^2$ = Regression Sum of Squares (SSR)

Unexplained variation = $\sum (y_i - \hat{y}_i)^2$ = Error Sum of Squares (SSE)

Total variation = Explained variation + Unexplained variation

R² is the proportion of the total variation in y explained by the linear regression model.

Multiple correlation coefficient R:

$$R = \sqrt{R^2}$$


12.4 Model Utility: Adjusted R2

The adjusted multiple coefficient of determination is

$$\bar{R}^2 = \left(R^2 - \frac{k}{n-1}\right)\left(\frac{n-1}{n-(k+1)}\right)$$

Fuel Consumption Case:

S = 0.3671   R-Sq = 97.4%   R-Sq(adj) = 96.3%

Analysis of Variance
Source            DF       SS       MS       F       P
Regression         2   24.875   12.438   92.30   0.000
Residual Error     5    0.674    0.135
Total              7   25.549

$$R^2 = \frac{24.875}{25.549} = 0.974, \qquad \bar{R}^2 = \left(0.974 - \frac{2}{8-1}\right)\left(\frac{8-1}{8-(2+1)}\right) = 0.963$$
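The R² and adjusted R² figures for the fuel consumption case can be reproduced from the sums of squares in the Analysis of Variance table. The sketch below also evaluates the algebraically equivalent form 1 - [SSE/(n-(k+1))]/[SSTO/(n-1)], which is not on the slide but follows from SSTO = SSR + SSE.

```python
# Sums of squares from the Analysis of Variance table
ssr = 24.875    # explained variation (Regression)
sse = 0.674     # unexplained variation (Residual Error)
ssto = 25.549   # total variation

n, k = 8, 2     # observations and independent variables

r2 = ssr / ssto
adj_r2 = (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))

# Equivalent form: one minus the ratio of the two variance estimates
adj_r2_alt = 1 - (sse / (n - (k + 1))) / (ssto / (n - 1))

print(round(r2, 3), round(adj_r2, 3), round(adj_r2_alt, 3))
```

Carrying the unrounded R² through the formula gives the 96.3% that Minitab reports as R-Sq(adj).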


12.4 Model Utility: F Test for Linear Regression Model

To test H0: β1 = β2 = … = βk = 0 versus

Ha: At least one of β1, β2, …, βk is not equal to 0

Test Statistic:

$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]}$$

Reject H0 in favor of Ha if F(model) > Fα or, equivalently, if p-value < α.

Fα is based on k numerator and n-(k+1) denominator degrees of freedom.


Example: F Test for Linear Regression

Example 12.5: The Fuel Consumption Case (Minitab Output)

Analysis of Variance
Source            DF       SS       MS       F       P
Regression         2   24.875   12.438   92.30   0.000
Residual Error     5    0.674    0.135
Total              7   25.549

Test Statistic:

$$F(\text{model}) = \frac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n-(k+1)]} = \frac{24.875/2}{0.674/(8-3)} = 92.30$$

F-test at the α = 0.05 level of significance:

Reject H0 at the α = 0.05 level of significance, since

$$F(\text{model}) = 92.30 > F_{.05} = 5.79 \quad \text{and} \quad p\text{-value} = 0.000 < 0.05$$

F.05 is based on 2 numerator and 5 denominator degrees of freedom.
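The overall F statistic follows from the same Analysis of Variance entries. In the sketch below, the rejection point 5.79 is the F.05 value quoted on the slide for 2 and 5 degrees of freedom; it is hard-coded rather than computed because the standard library has no F distribution.

```python
# Sums of squares from the Analysis of Variance table
ssr = 24.875   # explained variation
sse = 0.674    # unexplained variation
n, k = 8, 2

f_model = (ssr / k) / (sse / (n - (k + 1)))

f_crit = 5.79  # F.05 with (2, 5) degrees of freedom, as quoted on the slide
reject_h0 = f_model > f_crit

print(round(f_model, 2), reject_h0)
```

The result differs from Minitab's 92.30 in the second decimal, presumably because the printed sums of squares are rounded; the conclusion is the same.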


12.5 Testing Significance of the Independent Variable

Test Statistic:

$$t = \frac{b_j}{s_{b_j}}$$

If the regression assumptions hold, we can reject H0: βj = 0 at the α level of significance (probability of Type I error equal to α) if and only if the appropriate rejection point condition holds or, equivalently, if the corresponding p-value is less than α.

Alternative / Reject H0 if / p-Value:

Ha: βj ≠ 0: reject H0 if |t| > tα/2, that is, t > tα/2 or t < -tα/2. The p-value is twice the area under the t distribution to the right of |t|.

Ha: βj > 0: reject H0 if t > tα. The p-value is the area under the t distribution to the right of t.

Ha: βj < 0: reject H0 if t < -tα. The p-value is the area under the t distribution to the left of t.

tα, tα/2 and p-values are based on n - (k+1) degrees of freedom.

100(1 - α)% Confidence Interval for βj:

$$[\,b_j \pm t_{\alpha/2}\, s_{b_j}\,]$$


Example: Testing and Estimation for the β's

Example 12.6: The Fuel Consumption Case (Minitab Output)

Predictor       Coef      StDev       T       P
Constant     13.1087     0.8557   15.32   0.000
Temp        -0.09001    0.01408   -6.39   0.001
Chill        0.08249    0.02200    3.75   0.013

Test:

$$t = \frac{b_2}{s_{b_2}} = \frac{0.08249}{0.02200} = 3.75 > t_{.025} = 2.571$$

$$p\text{-value} = 2 \times P(t > 3.75) = 0.013$$

tα, tα/2 and p-values are based on 5 degrees of freedom.

Chill is significant at the α = 0.05 level, but not at α = 0.01.

Interval:

$$[\,b_2 \pm t_{\alpha/2}\, s_{b_2}\,] = [0.08249 \pm (2.571)(0.02200)] = [0.08249 \pm 0.05656] = [0.02593, 0.13905]$$
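The test and interval for the Chill coefficient reduce to a few lines of arithmetic. In this sketch the rejection point t.025 = 2.571 (5 degrees of freedom) is the value used on the slide; t.005 = 4.032 is the corresponding standard t-table value for a two-sided test at α = 0.01 and is an addition here, not something printed on the slide.

```python
# Chill coefficient and its standard error from the Minitab output
b2 = 0.08249
s_b2 = 0.02200

t_stat = b2 / s_b2

t_025 = 2.571  # two-sided rejection point for alpha = 0.05, 5 df (from the slide)
t_005 = 4.032  # two-sided rejection point for alpha = 0.01, 5 df (t table)

significant_at_05 = abs(t_stat) > t_025
significant_at_01 = abs(t_stat) > t_005

# 95% confidence interval for beta_2
half_width = t_025 * s_b2
ci = (b2 - half_width, b2 + half_width)

print(round(t_stat, 2), significant_at_05, significant_at_01)
print(round(ci[0], 5), round(ci[1], 5))
```

The interval excludes 0, which is the same conclusion as rejecting H0: β2 = 0 at the 0.05 level.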


12.6 Confidence and Prediction Intervals

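If the regression assumptions hold,

Prediction:

$$\hat{y} = b_0 + b_1 x_{01} + b_2 x_{02} + \cdots + b_k x_{0k}$$

100(1 - α)% confidence interval for the mean value of y:

$$[\,\hat{y} \pm t_{\alpha/2}\, s_{\hat{y}}\,], \qquad s_{\hat{y}} = s\sqrt{\text{Distance value}}$$

100(1 - α)% prediction interval for an individual value of y:

$$[\,\hat{y} \pm t_{\alpha/2}\, s_{(y-\hat{y})}\,], \qquad s_{(y-\hat{y})} = s\sqrt{1 + \text{Distance value}}$$

tα/2 is based on n-(k+1) degrees of freedom.

(Distance value requires matrix algebra)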
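For readers who want to see the matrix algebra, the distance value at a point x0 = (1, x01, …, x0k) is conventionally computed as x0'(X'X)⁻¹x0, where X is the design matrix with a leading column of ones. That formula is the standard textbook definition rather than something stated on the slide. The sketch below applies it to the fuel consumption data from Example 12.1 at Temp = 40, Chill = 10; `solve` is an illustrative helper written for this example.

```python
def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting
    (illustrative helper for small systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
chill = [18, 14, 24, 22, 8, 16, 1, 0]

X = [[1.0, t, c] for t, c in zip(temp, chill)]
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]

x0 = [1.0, 40.0, 10.0]   # intercept term, Temp = 40, Chill = 10
z = solve(xtx, x0)       # z = (X'X)^{-1} x0, without forming the inverse
distance_value = sum(u * v for u, v in zip(x0, z))

print(round(distance_value, 4))
```

The result should land close to the 0.2144515 used in the fuel consumption interval calculations. A small difference in the third decimal is plausible, since that figure can be recovered from the rounded Minitab output as (StDev Fit / s)² = (0.170/0.3671)².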


Example: Confidence and Prediction Intervals

Example 12.9: The Fuel Consumption Case (Minitab Output)

FuelCons = 13.1 - 0.0900 Temp + 0.0825 Chill

Predicted Values (Temp = 40, Chill = 10)
   Fit   StDev Fit           95.0% CI           95.0% PI
10.333       0.170    (9.895, 10.771)    (9.293, 11.374)

95% Confidence Interval:

$$[\,\hat{y} \pm t_{\alpha/2}\, s\sqrt{\text{Distance value}}\,] = [10.333 \pm (2.571)(0.3671)\sqrt{0.2144515}] = [10.333 \pm 0.438] = [9.895, 10.771]$$

95% Prediction Interval:

$$[\,\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \text{Distance value}}\,] = [10.333 \pm (2.571)(0.3671)\sqrt{1 + 0.2144515}] = [10.333 \pm 1.041] = [9.292, 11.374]$$
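Both intervals can be rebuilt from the printed Minitab values. The sketch below assumes the distance value is recovered as (StDev Fit / s)², which reproduces the 0.2144515 used in the calculation, and uses the t.025 = 2.571 rejection point for 5 degrees of freedom.

```python
import math

y_hat = 10.333     # Fit at Temp = 40, Chill = 10
s = 0.3671         # standard error S
stdev_fit = 0.170  # StDev Fit from the Minitab output
t_025 = 2.571      # t rejection point, 5 degrees of freedom

# Distance value recovered from the printed output
distance_value = (stdev_fit / s) ** 2

ci_half = t_025 * s * math.sqrt(distance_value)      # mean value of y
pi_half = t_025 * s * math.sqrt(1 + distance_value)  # individual value of y

ci = (y_hat - ci_half, y_hat + ci_half)
pi = (y_hat - pi_half, y_hat + pi_half)

print(round(distance_value, 7))
print([round(v, 3) for v in ci], [round(v, 3) for v in pi])
```

The prediction interval is wider because it accounts for the error term of an individual observation as well as the uncertainty in the estimated mean. Endpoints may differ from the printed ones in the third decimal because the inputs are rounded.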


12.7 Dummy Variables

Example 12.11: The Electronics World Case

         Number of                    Location     Sales
Store    Households, x   Location     Dummy, DM    Volume, y
1        161             Street       0            157.27
2         99             Street       0             93.28
3        135             Street       0            136.81
4        120             Street       0            123.79
5        164             Street       0            153.51
6        221             Mall         1            241.74
7        179             Mall         1            201.54
8        204             Mall         1            206.71
9        214             Mall         1            229.78
10       101             Mall         1            135.22

Location Dummy Variable:

$$D_M = \begin{cases} 1 & \text{if a store is in a mall location} \\ 0 & \text{otherwise} \end{cases}$$


Example: Regression with a Dummy Variable

Example 12.11: The Electronics World Case (Minitab Output)

Sales = 17.4 + 0.851 Households + 29.2 DM

Predictor       Coef      StDev       T       P
Constant      17.360      9.447    1.84   0.109
Househol     0.85105    0.06524   13.04   0.000
DM            29.216      5.594    5.22   0.001

S = 7.329   R-Sq = 98.3%   R-Sq(adj) = 97.8%

Analysis of Variance
Source            DF       SS       MS        F       P
Regression         2    21412    10706   199.32   0.000
Residual Error     7      376       54
Total              9    21788
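One way to read the fitted equation is that the dummy variable shifts the intercept while leaving the slope on Households unchanged. The sketch below encodes the location as DM and evaluates the fitted equation for a street store and a mall store with the same number of households. The function names and the household value 200 are illustrative choices; the coefficients come from the Minitab output.

```python
def mall_dummy(location):
    """Return DM: 1 if the store is in a mall location, 0 otherwise."""
    return 1 if location == "Mall" else 0

def predict_sales(households, location):
    """Point prediction from the fitted Electronics World equation."""
    return 17.360 + 0.85105 * households + 29.216 * mall_dummy(location)

street = predict_sales(200, "Street")
mall = predict_sales(200, "Mall")

# The gap between the two predictions equals the DM coefficient,
# whatever household value is used.
print(round(street, 3), round(mall, 3), round(mall - street, 3))
```

Because the model has no interaction term, the two locations correspond to parallel lines that differ only by the DM coefficient.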


12.8 Model Building and the Effects of Multicollinearity

Example: The Sales Territory Performance Case

Sales      Time     MktPoten    Adver      MktShare   Change   Accts    WkLoad   Rating
3669.88    43.10    74065.11    4582.88     2.51       0.34     74.86   15.05    4.9
3473.95   108.13    58117.30    5539.78     5.51       0.15    107.32   19.97    5.1
2295.10    13.82    21118.49    2950.38    10.91      -0.72     96.75   17.34    2.9
4675.56   186.18    68521.27    2243.07     8.27       0.17    195.12   13.40    3.4
6125.96   161.79    57805.11    7747.08     9.15       0.50    180.44   17.64    4.6
2134.94     8.94    37806.94     402.44     5.51       0.15    104.88   16.22    4.5
5031.66   365.04    50935.26    3140.62     8.54       0.55    256.10   18.80    4.6
3367.45   220.32    35602.08    2086.16     7.07      -0.49    126.83   19.86    2.3
6519.45   127.64    46176.77    8846.25    12.54       1.24    203.25   17.42    4.9
4876.37   105.69    42053.24    5673.11     8.85       0.31    119.51   21.41    2.8
2468.27    57.72    36829.71    2761.76     5.38       0.37    116.26   16.32    3.1
2533.31    23.58    33612.67    1991.85     5.43      -0.65    142.28   14.51    4.2
2408.11    13.82    21412.79    1971.52     8.48       0.64     89.43   19.35    4.3
2337.38    13.82    20416.87    1737.38     7.80       1.01     84.55   20.02    4.2
4586.95    86.99    36272.00   10694.20    10.34       0.11    119.51   15.26    5.5
2729.24   165.85    23093.26    8618.61     5.15       0.04     80.49   15.87    3.6
3289.40   116.26    26879.59    7747.89     6.64       0.68    136.58    7.81    3.4
2800.78    42.28    39571.96    4565.81     5.45       0.66     78.86   16.00    4.2
3264.20    52.84    51866.15    6022.70     6.31      -0.10    136.58   17.44    3.6
3453.62   165.04    58749.82    3721.10     6.35      -0.03    138.21   17.98    3.1
1741.45    10.57    23990.82     860.97     7.37      -1.63     75.61   20.99    1.6
2035.75    13.82    25694.86    3571.51     8.39      -0.43    102.44   21.66    3.4
1578.00     8.13    23736.35    2845.50     5.15       0.04     76.42   21.46    2.7
4167.44    58.54    34314.29    5060.11    12.88       0.22    136.58   24.78    2.8
2799.97    21.14    22809.53    3552.00     9.14      -0.74     88.62   24.96    3.9


Correlation Matrix

Example: The Sales Territory Performance Case


Multicollinearity

Multicollinearity refers to the condition where the independent variables (or predictors) in a model are dependent, related, or correlated with each other.

Effects

Hinders ability to use the bj's, t statistics, and p-values to assess the relative importance of predictors.

Does not hinder ability to predict the dependent (or response) variable.

Detection

Scatter Plot Matrix

Correlation Matrix

Variance Inflation Factors (VIF)
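As a small illustration of two of these detection tools, the sketch below computes the correlation between the two predictors of the fuel consumption case (Temp and Chill) and the corresponding variance inflation factor. It assumes the usual definition VIF_j = 1/(1 - R_j²), where R_j² comes from regressing predictor j on the other predictors; that formula is not spelled out on the slide. With only two predictors, R_j² is simply the squared correlation between them. The fuel data are used instead of the sales territory data because the latter would need a separate multiple regression for each of its eight predictors.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Predictors from the fuel consumption case (Example 12.1)
temp = [28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5]
chill = [18, 14, 24, 22, 8, 16, 1, 0]

r = pearson_r(temp, chill)

# With two predictors, each VIF equals 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)

print(round(r, 3), round(vif, 2))
```

A VIF of 1 would mean a predictor is uncorrelated with the others; larger values indicate that the variance of its coefficient estimate is inflated by multicollinearity.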


12.9 Residual Analysis in Multiple Regression

For an observed value yi, the residual is

$$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_{i1} + \cdots + b_k x_{ik})$$

If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance σ².

Residual Plots

Residuals versus each independent variable

Residuals versus predicted y's

Residuals in time order (if the response is a time series)

Histogram of residuals

Normal plot of the residuals
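The coordinates for a normal plot can be produced without any plotting library. The sketch below pairs the ordered residuals from the fuel consumption case with standard normal quantiles. It assumes the plotting positions (i - 0.5)/n, which is one common convention among several, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Residuals from the fuel consumption case (rounded to 4 decimals)
residuals = [0.3267, -0.0433, 0.2369, -0.6131, -0.2372, 0.2740, 0.0384, 0.0169]

n = len(residuals)
ordered = sorted(residuals)

# Standard normal quantile for each ordered residual,
# using the assumed plotting positions (i - 0.5) / n.
std_normal = NormalDist()
z_scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

mean_residual = sum(residuals) / n

for e, z in zip(ordered, z_scores):
    print(f"{z:7.3f}  {e:8.4f}")
print("mean residual:", round(mean_residual, 4))
```

If the normality assumption is reasonable, plotting the ordered residuals against these quantiles should give points that fall roughly along a straight line. With only eight observations, any such judgment is necessarily rough.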


Multiple Regression

Summary:

12.1 The Linear Regression Model
12.2 The Least Squares Estimates and Prediction
12.3 The Mean Squared Error and the Standard Error
12.4 Model Utility: R², Adjusted R², and the F Test
12.5 Testing the Significance of an Independent Variable
12.6 Confidence Intervals and Prediction Intervals
12.7 Dummy Variables
12.8 Model Building and the Effects of Multicollinearity
12.9 Residual Analysis in Multiple Regression