Multiple Regression Model
INSR 260, Spring 2009
Bob Stine

Wharton Statistics Department · stine/insr260_2009/lectures/mrm.pdf


Overview

Multiple Regression Model (MRM)
  • Estimators, terminology: similar to SRM
  • Assumptions: new plots
  • Inference: new test
  • Prediction: similar to SRM

Examples (from Bowerman, Ch 4)
  • Fuel consumption
  • Sales management


Multiple Regression Model

Equation has k explanatory variables
  • Mean: $E(Y \mid X) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k = \mu_{y|x}$
  • Observations: $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i$

Assumptions (as in SRM)
  • Independent observations
  • Equal variance $\sigma^2$
  • Normal distribution around the "line": $y_i \sim N(\mu_{y|x}, \sigma^2)$, equivalently $\varepsilon_i \sim N(0, \sigma^2)$

k+2 parameters identify the model: $\beta_0, \beta_1, \dots, \beta_k, \sigma^2$


Least Squares

Criterion
  • Find estimates that minimize the sum of squared deviations:
    $\min_{a} \sum_i (y_i - a_0 - a_1 x_{i1} - \dots - a_k x_{ik})^2$

Fitted values, residuals
  • Fitted values (on the line): $\hat{y}_i = b_0 + b_1 x_{i1} + \dots + b_k x_{ik}$
  • Residual deviations: $e_i = y_i - \hat{y}_i$

Standard error of regression
  • $s^2 = \sum_i e_i^2 / (n-k-1)$ estimates $\sigma^2$, with n-k-1 degrees of freedom
  • RMSE = $s$ = square root of $s^2$ (estimates $\sigma$)
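The least-squares criterion and the quantities built from it (fitted values, residuals, s2, RMSE) can be sketched in a few lines of plain Python. This is only an illustration: the data are made-up numbers, not the fuel or sales data, and the helper names solve and least_squares are hypothetical. It solves the normal equations directly, which is fine for a toy case but is not how JMP or other statistical software computes the fit.

```python
import math

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def least_squares(X, y):
    """Return (b, fitted, resid, s2) for a regression of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]          # prepend the constant term
    p = len(rows[0])                             # p = k + 1 coefficients
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(XtX, Xty)                          # normal equations: (X'X) b = X'y
    fitted = [sum(bj * rj for bj, rj in zip(b, r)) for r in rows]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    n, k = len(y), p - 1
    s2 = sum(e * e for e in resid) / (n - k - 1) # divide by the degrees of freedom
    return b, fitted, resid, s2

# Hypothetical data with k = 2 explanatory variables (not from the lecture)
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

b, fitted, resid, s2 = least_squares(X, y)
rmse = math.sqrt(s2)
print("b0, b1, b2 =", [round(v, 4) for v in b])
print("s2 =", round(s2, 4), " RMSE =", round(rmse, 4))
print("sum of residuals =", sum(resid))
```

With an intercept in the model, the residuals sum to zero (up to rounding) and are uncorrelated with each explanatory variable; the last print statement is a quick check of the first property.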


Goodness of Fit

R-squared statistic
  • Square of the correlation between Y and Ŷ
  • Percentage of "explained" variation
  • Never decreases as variables are added to the equation

  $R^2 = \dfrac{\text{Explained SS}}{\text{Total SS}}$

Adjusted R-squared
  • Will not increase unless $s^2$ gets smaller
  • Difference from $R^2$ increases as k increases

  $\bar{R}^2 = 1 - \dfrac{s^2}{\mathrm{var}(y)}$
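As a worked check of the two formulas, the short sketch below recomputes R-squared and adjusted R-squared from the sums of squares reported in the Analysis of Variance table for the fuel consumption example (n = 8, k = 2). The variable names are chosen here for illustration; var(y) is taken to be the sample variance, Total SS/(n-1).

```python
# Values copied from the fuel-consumption Analysis of Variance table
explained_ss = 24.875018   # Model sum of squares
residual_ss = 0.673732     # Error sum of squares
total_ss = 25.548750       # C. Total sum of squares
n, k = 8, 2

r2 = explained_ss / total_ss

s2 = residual_ss / (n - k - 1)   # estimate of sigma^2
var_y = total_ss / (n - 1)       # sample variance of the response
adj_r2 = 1 - s2 / var_y

print("R-squared          =", round(r2, 5))
print("Adjusted R-squared =", round(adj_r2, 6))
```

Both values agree with the RSquare and RSquare Adj entries in the Summary of Fit for that model to the precision reported there.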


Checking Assumptions

  • Scatterplots of Y on X1, Y on X2
  • Scatterplot matrix

Data for fuel consumption (n = 8)
  • y = weekly natural gas consumption (FUELCONS)
  • X1 = average temperature (TEMP)
  • X2 = chill index (wind, clouds, temp) (CHILL)

[Figure: scatterplots of FUELCONS versus TEMP and FUELCONS versus CHILL]
[Figure: Scatterplot Matrix of FUELCONS, TEMP, CHILL]

Correlations
            FUELCONS     TEMP    CHILL
FUELCONS      1.0000  -0.9484   0.8706
TEMP         -0.9484   1.0000  -0.7182
CHILL         0.8706  -0.7182   1.0000


Partial vs Marginal

SRM

[Figure: Bivariate Fit of FUELCONS By TEMP, with linear fit]
Linear Fit: FUELCONS = 15.837857 - 0.1279217*TEMP

Parameter Estimates
Term         Estimate   Std Error   t Ratio   Prob>|t|
Intercept   15.837857    0.801773     19.75    <.0001*
TEMP        -0.127922    0.017457     -7.33    0.0003*

[Figure: Bivariate Fit of FUELCONS By CHILL, with linear fit]
Linear Fit: FUELCONS = 7.8494238 + 0.1835399*CHILL

Parameter Estimates
Term         Estimate   Std Error   t Ratio   Prob>|t|
Intercept   7.8494238    0.652654     12.03    <.0001*
CHILL       0.1835399    0.042339      4.34    0.0049*

MRM

[Figure: FUELCONS Actual versus FUELCONS Predicted, P=0.0001, RSq=0.97, RMSE=0.3671]

Summary of Fit
RSquare                       0.97363
RSquare Adj                   0.963081
Root Mean Square Error        0.367078
Mean of Response              10.2125
Observations (or Sum Wgts)    8

Parameter Estimates
Term         Estimate   Std Error   t Ratio   Prob>|t|
Intercept   13.108737    0.855698     15.32    <.0001*
TEMP        -0.090014    0.014077     -6.39    0.0014*
CHILL        0.082495    0.022003      3.75    0.0133*

Slopes differ
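With only two explanatory variables, the gap between the marginal (SRM) and partial (MRM) slopes can be read off the correlation matrix. For X_j with the other predictor X_m, the ratio of partial to marginal slope is (r_yj - r_ym r_jm) / ((1 - r_jm^2) r_yj); this is a standard identity for two-predictor regression, not something stated on the slide. The sketch below applies it to the reported fuel consumption correlations and compares the result with the ratio of the fitted slopes. The function name is hypothetical.

```python
# Correlations from the fuel-consumption correlation matrix
r_fuel_temp = -0.9484
r_fuel_chill = 0.8706
r_temp_chill = -0.7182

def partial_over_marginal(r_yj, r_ym, r_jm):
    """Ratio of the partial (MRM) slope to the marginal (SRM) slope of X_j,
    in a regression with exactly two explanatory variables X_j and X_m."""
    return (r_yj - r_ym * r_jm) / ((1 - r_jm ** 2) * r_yj)

ratio_temp = partial_over_marginal(r_fuel_temp, r_fuel_chill, r_temp_chill)
ratio_chill = partial_over_marginal(r_fuel_chill, r_fuel_temp, r_temp_chill)

# Ratios of the fitted slopes reported in the Parameter Estimates tables
fitted_ratio_temp = -0.090014 / -0.1279217
fitted_ratio_chill = 0.082495 / 0.1835399

print("TEMP :", round(ratio_temp, 4), "vs fitted", round(fitted_ratio_temp, 4))
print("CHILL:", round(ratio_chill, 4), "vs fitted", round(fitted_ratio_chill, 4))
```

Both ratios are below 1: because TEMP and CHILL are correlated, each marginal slope absorbs part of the other variable's effect, and the partial slopes are smaller in magnitude.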


More Diagnostics

  • Overall plots (MRM version of SRM scatterplots)
  • Leverage plots (partial regression plots)
      - Simple regression view of MR slope, one for each slope

[Figure: FUELCONS Actual versus FUELCONS Predicted, P=0.0001, RSq=0.97, RMSE=0.3671]
[Figure: FUELCONS Residual versus FUELCONS Predicted]
[Figure: FUELCONS Leverage Residuals versus TEMP Leverage, P=0.0014]
[Figure: FUELCONS Leverage Residuals versus CHILL Leverage, P=0.0133]
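The "simple regression view" behind a partial regression plot can be sketched as follows: remove the other explanatory variable from both the response and the variable of interest, then regress residuals on residuals. The slope of that simple regression equals the multiple regression slope. The data here are made-up numbers (the lecture's raw data are not reproduced in these notes), the helper names are hypothetical, and JMP's leverage plots may scale or re-center the axes differently from this bare-bones version.

```python
def mean(values):
    return sum(values) / len(values)

def simple_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Residuals from the simple regression of y on x."""
    intercept, slope = simple_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def mrm_slopes(x1, x2, y):
    """Partial slopes (b1, b2) from the centered 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((c - m2) ** 2 for c in x2)
    s12 = sum((a - m1) * (c - m2) for a, c in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((c - m2) * (v - my) for c, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data (not from the lecture)
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

b1, b2 = mrm_slopes(x1, x2, y)

# Partial regression for X1: adjust both y and x1 for x2, then fit a line
y_adj = residuals(x2, y)
x1_adj = residuals(x2, x1)
_, partial_slope = simple_fit(x1_adj, y_adj)

print("MRM slope for x1:          ", b1)
print("Partial-regression slope:  ", partial_slope)
```

Plotting y_adj against x1_adj gives the scatterplot whose fitted line has the multiple regression slope, which is why outliers and curvature in that plot matter for the MRM estimate.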


Inference

  • Standard error of the slope is affected by correlation among explanatory variables
      - Variance inflation factor (Chap 5): Var(slope in MRM) ≈ Var(slope in SRM) × VIF

    $\mathrm{Var}(b_j) = \dfrac{\sigma^2}{\sum_i (x_{ij} - \bar{x}_j)^2} \left( \dfrac{1}{1 - R^2_{X_j \mid X_m,\, m \ne j}} \right)$

  • Three equivalent methods for each estimated slope and the intercept
      - Confidence interval
      - t-statistic
      - p-value

Parameter Estimates
Term         Estimate   Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%    VIF
Intercept   13.108737    0.855698     15.32    <.0001*   10.909095   15.308379      .
TEMP        -0.090014    0.014077     -6.39    0.0014*   -0.126201   -0.053827   2.07
CHILL        0.082495    0.022003      3.75    0.0133*   0.0259356   0.1390543   2.07
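The t ratios, 95% intervals, and VIF in the table can be reproduced from the estimates and standard errors alone. The sketch below assumes the usual two-sided 95% t multiplier for n-k-1 = 5 degrees of freedom, 2.5706, taken from a t table rather than computed, and uses the fact that with exactly two explanatory variables the VIF is 1/(1 - r^2), where r is the TEMP-CHILL correlation. Variable names are illustrative.

```python
T_CRIT = 2.5706  # 97.5th percentile of the t distribution with 5 df (from a t table)

# (estimate, standard error) from the Parameter Estimates table
estimates = {
    "Intercept": (13.108737, 0.855698),
    "TEMP": (-0.090014, 0.014077),
    "CHILL": (0.082495, 0.022003),
}

results = {}
for term, (b, se) in estimates.items():
    t_ratio = b / se
    half_width = T_CRIT * se
    results[term] = (t_ratio, b - half_width, b + half_width)
    print(f"{term:10s} t = {t_ratio:7.2f}   95% CI = ({b - half_width:.6f}, {b + half_width:.6f})")

# VIF for a model with two explanatory variables
r_temp_chill = -0.7182
vif = 1 / (1 - r_temp_chill ** 2)
print("VIF =", round(vif, 2))
```

An interval that excludes zero, a t ratio larger than the multiplier in absolute value, and a p-value below 0.05 are three ways of stating the same conclusion, which is the sense in which the three methods are equivalent.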


Overall F Test

  • Test both slopes simultaneously: H0: β1 = β2 = 0
  • Ratio of variance explained to remaining variation
  • Test of the size of the R2 statistic

  $F = \dfrac{R^2 / k}{(1 - R^2)/(n-k-1)}$

[Figure: FUELCONS Actual versus FUELCONS Predicted, P=0.0001, RSq=0.97, RMSE=0.3671]

Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        2        24.875018       12.4375   92.3031    0.0001*
Error        5         0.673732        0.1347
C. Total     7        25.548750
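A quick arithmetic check shows that the F ratio in the table is the same quantity whether it is computed from R-squared or from the mean squares. The sketch uses the reported fuel consumption values; the small difference between the two results comes from using the rounded R-squared. The p-value is not recomputed here, since that needs the F distribution.

```python
n, k = 8, 2

# From R-squared (rounded value in the Summary of Fit)
r2 = 0.97363
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))

# From the Analysis of Variance table: ratio of mean squares
model_ss, error_ss = 24.875018, 0.673732
model_ms = model_ss / k
error_ms = error_ss / (n - k - 1)
f_from_anova = model_ms / error_ms

print("F from R-squared   :", round(f_from_r2, 2))
print("F from mean squares:", round(f_from_anova, 2))
```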


Prediction

  • No simple plot
  • Extrapolation effect is more subtle
  • Software is needed to identify extrapolation
      - Options in Fit Model to save various standard errors as well as prediction and confidence intervals
      - Add an extra row (before fitting) to get JMP to predict a new case


Sales Example

Question
  • Evaluation of sales representatives
  • Response is annual company sales in territory
      - y measured in thousands of units
  • Data are a sample for n = 25 sales representatives

Several explanatory variables
  • Time (months) with the company
  • Total sales of company and rivals in territory (potential)
  • Advertising expenditure in territory
  • Company's market share in prior four years
  • Change in company's market share


Initial Graphical Analysis

[Figure: Scatterplot Matrix of Sales, Time, MktPoten, Adver, MktShare, Change]

Correlations
             Sales     Time   MktPoten    Adver   MktShare   Change
Sales       1.0000   0.6229     0.5978   0.5962     0.4835   0.4892
Time        0.6229   1.0000     0.4540   0.2492     0.1062   0.2515
MktPoten    0.5978   0.4540     1.0000   0.1741    -0.2107   0.2683
Adver       0.5962   0.2492     0.1741   1.0000     0.2645   0.3765
MktShare    0.4835   0.1062    -0.2107   0.2645     1.0000   0.0855
Change      0.4892   0.2515     0.2683   0.3765     0.0855   1.0000


Multiple Regression

Overall fit
[Figure: Actual by Predicted Plot, Sales Actual versus Sales Predicted, P<.0001, RSq=0.92, RMSE=430.23]
[Figure: Residual by Predicted Plot, Sales Residual versus Sales Predicted]

Leverage plots
[Figure: Leverage Plot, Sales Leverage Residuals versus Time Leverage, P=0.0065]
[Figure: Leverage Plot, Sales Leverage Residuals versus MktPoten Leverage, P<.0001]


Model Summary

  • Overall fit
  • Individual estimates
      - Interpretation of these estimates?
      - Why linear? Implications of model are very strong.

Summary of Fit
RSquare                       0.915009
RSquare Adj                   0.892643
Root Mean Square Error        430.2319
Mean of Response              3374.568
Observations (or Sum Wgts)    25

Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        5         37862659       7572532   40.9106    <.0001*
Error       19          3516890        185099
C. Total    24         41379549

Parameter Estimates
Term         Estimate   Std Error   t Ratio   Prob>|t|
Intercept   -1113.788    419.8869     -2.65    0.0157*
Time        3.6121012      1.1817      3.06    0.0065*
MktPoten    0.0420881    0.006731      6.25    <.0001*
Adver       0.1288568    0.037036      3.48    0.0025*
MktShare    256.95554    39.13607      6.57    <.0001*
Change      324.53345    157.2831      2.06    0.0530


Prediction

Conditions for another rep (not one of these 25)
  • Sales were 3082
  • Time with company: 85.42
  • Market potential: 35,182.73
  • Advertising: 7,281.65
  • Market share: 9.64
  • Change in share: 0.28

Prediction results
  • Plug values for explanatory variables into equation
  • Prediction: ŷ = 4182
  • Confidence interval for mean: 3884.9 to 4478.6
  • Prediction interval for rep: 3233.6 to 5129.9
  • Benchmarking implication: How is this rep doing?
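The "plug in" step can be sketched directly from the Parameter Estimates of the sales model. The code below multiplies each estimate by the new rep's value, adds the intercept, and then compares the rep's actual sales with the reported prediction interval. The interval endpoints are copied from the slide, not recomputed (that requires the standard errors JMP saves), and the dictionary keys are just the term names from the table.

```python
# Estimates from the Parameter Estimates table of the sales model
coef = {
    "Intercept": -1113.788,
    "Time": 3.6121012,
    "MktPoten": 0.0420881,
    "Adver": 0.1288568,
    "MktShare": 256.95554,
    "Change": 324.53345,
}

# Conditions for the new sales representative
new_rep = {
    "Time": 85.42,
    "MktPoten": 35182.73,
    "Adver": 7281.65,
    "MktShare": 9.64,
    "Change": 0.28,
}

y_hat = coef["Intercept"] + sum(coef[name] * value for name, value in new_rep.items())
print("Predicted sales:", round(y_hat, 1))

# Benchmark the rep against the reported 95% prediction interval
actual_sales = 3082
pi_low, pi_high = 3233.6, 5129.9
below_interval = actual_sales < pi_low
print("Actual sales below the prediction interval:", below_interval)
```

Since the rep's actual sales fall below the lower end of the prediction interval, the model suggests this rep is selling less than would be expected for a territory with these characteristics.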


Summary

Multiple Regression Model (MRM)
  • Estimators: partial (MRM) vs marginal (SRM)
  • Assumptions: leverage plots
  • Inference: F-test
  • Prediction: software
