Page 1: Stat 112: Lecture 18 Notes

Stat 112: Lecture 18 Notes

• Chapter 7.1: Using and Interpreting Indicator Variables.

• Visualizing polynomial regressions in multiple regression

• Review Problem for Quiz

Page 2: Stat 112: Lecture 18 Notes

Comparing Toy Factory Managers

• An analysis has shown that the time required to complete a production run in a toy factory increases with the number of toys produced. Data were collected for the time required to process 20 randomly selected production runs as supervised by three managers (A, B and C). Data in toyfactorymanager.JMP.

• How do the managers compare?

Page 3: Stat 112: Lecture 18 Notes

Including Categorical Variable in Multiple Regression: Right Approach

• Create an indicator (dummy) variable for each category.

• Manager[a] = 1 if Manager is A, 0 if Manager is not A

• Manager[b] = 1 if Manager is B, 0 if Manager is not B

• Manager[c] = 1 if Manager is C, 0 if Manager is not C
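The indicator coding above can be sketched in a few lines of Python. This is only an illustration: the helper name `make_indicators` and the five sample labels are hypothetical and are not taken from toyfactorymanager.JMP.

```python
def make_indicators(labels, categories):
    """Return one 0/1 indicator column per category."""
    return {
        f"Manager[{cat}]": [1 if label == cat else 0 for label in labels]
        for cat in categories
    }

# Hypothetical manager labels for five production runs (not the real data).
managers = ["a", "b", "c", "a", "c"]
indicators = make_indicators(managers, ["a", "b", "c"])

for name, column in indicators.items():
    print(name, column)
```

Each run gets a 1 in exactly one of the three columns, which is what makes the columns redundant with the intercept and why software needs a constraint on the manager coefficients.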

Page 4: Stat 112: Lecture 18 Notes

• For a run size of 100, the estimated mean run times for Managers A, B and C are

$\hat{E}(\text{Time} \mid \text{Runsize}=100, \text{Manager a}) = 176.71 + 0.24 \cdot 100 + 38.41 \cdot 1 - 14.65 \cdot 0 - 23.76 \cdot 0$

$\hat{E}(\text{Time} \mid \text{Runsize}=100, \text{Manager b}) = 176.71 + 0.24 \cdot 100 + 38.41 \cdot 0 - 14.65 \cdot 1 - 23.76 \cdot 0$

$\hat{E}(\text{Time} \mid \text{Runsize}=100, \text{Manager c}) = 176.71 + 0.24 \cdot 100 + 38.41 \cdot 0 - 14.65 \cdot 0 - 23.76 \cdot 1$

• For the same run size, Manager A is estimated to be on average 38.41-(-14.65)=53.06 minutes slower than Manager B and 38.41-(-23.76)=62.17 minutes slower than Manager C.

Response Time for Run
Expanded Estimates
Nominal factors expanded to all levels

Term         Estimate    Std Error  t Ratio  Prob>|t|
Intercept    176.70882   5.658644   31.23    <.0001
Run Size     0.243369    0.025076   9.71     <.0001
Manager[a]   38.409663   3.005923   12.78    <.0001
Manager[b]   -14.65115   3.031379   -4.83    <.0001
Manager[c]   -23.75851   2.995898   -7.93    <.0001
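The fitted equations for a run size of 100 can be evaluated directly from the Expanded Estimates table. The sketch below uses the unrounded estimates from the JMP output, so the results differ slightly from what the two-decimal coefficients would give; the names `estimated_time` and `MANAGER_EFFECT` are chosen here for illustration.

```python
# Expanded estimates copied from the JMP output.
INTERCEPT = 176.70882
RUN_SIZE_SLOPE = 0.243369
MANAGER_EFFECT = {"a": 38.409663, "b": -14.65115, "c": -23.75851}

def estimated_time(run_size, manager):
    """Estimated mean run time (minutes) for a given run size and manager."""
    return INTERCEPT + RUN_SIZE_SLOPE * run_size + MANAGER_EFFECT[manager]

times = {m: estimated_time(100, m) for m in MANAGER_EFFECT}
for m, t in times.items():
    print(f"Manager {m}: {t:.2f}")

# In this model the differences between managers do not depend on run size.
print(f"A - B: {times['a'] - times['b']:.2f}")
print(f"A - C: {times['a'] - times['c']:.2f}")
```

Calling `estimated_time` with a different run size changes all three estimates by the same amount, so the two differences stay at about 53.06 and 62.17 minutes.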

Page 5: Stat 112: Lecture 18 Notes

Effect Tests

• Effect test for manager: $H_0$: Manager[a] = Manager[b] = Manager[c] vs. $H_a$: not all of Manager[a], Manager[b], Manager[c] are equal. The null hypothesis is that all managers are the same (in terms of mean run time) when run size is held fixed; the alternative hypothesis is that not all managers are the same (in terms of mean run time) when run size is held fixed. This is a partial F test.

• p-value for Effect Test <.0001. Strong evidence that not all managers are the same when run size is held fixed.

• Note: $H_0$: Manager[a] = Manager[b] = Manager[c] is equivalent to Manager[a] = Manager[b] = Manager[c] = 0 because JMP has the constraint that Manager[a]+Manager[b]+Manager[c]=0.

• Effect test for Run Size tests the null hypothesis that the Run Size coefficient is 0 versus the alternative hypothesis that the Run Size coefficient isn't zero. Same p-value as t-test.

Effect Tests

Source     Nparm  DF  Sum of Squares  F Ratio   Prob > F
Run Size   1      1   25260.250       94.1906   <.0001
Manager    2      2   44773.996       83.4768   <.0001
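Three statements on this slide can be checked numerically from the reported output: the sum-to-zero constraint on the manager effects, the fact that both F ratios share one error mean square, and the agreement between the Run Size effect test and its t-test. The sketch below is a consistency check on the printed numbers only; it does not refit the model, and the variable names are illustrative.

```python
# Values copied from the Effect Tests and Expanded Estimates tables.
effect_tests = {
    "Run Size": {"df": 1, "ss": 25260.250, "f_ratio": 94.1906},
    "Manager": {"df": 2, "ss": 44773.996, "f_ratio": 83.4768},
}
manager_effects = [38.409663, -14.65115, -23.75851]
run_size_estimate, run_size_se = 0.243369, 0.025076

# JMP's coding constrains the manager effects to sum to zero.
effects_sum = sum(manager_effects)

# F ratio = (SS / DF) / MSE, so each row implies the same error mean square.
implied_mse = {
    name: (row["ss"] / row["df"]) / row["f_ratio"]
    for name, row in effect_tests.items()
}

# For a single-parameter effect, the F ratio equals the squared t ratio.
t_squared = (run_size_estimate / run_size_se) ** 2

print(f"sum of manager effects: {effects_sum:.6f}")
print({name: round(mse, 3) for name, mse in implied_mse.items()})
print(f"t^2 for Run Size: {t_squared:.2f}")
```

Because the squared t ratio matches the F ratio for Run Size up to rounding, the two tests give the same p-value, as the slide states.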

Page 6: Stat 112: Lecture 18 Notes

• The effect test shows that the managers are not all equal.

• For the same run size, Manager C is best (lowest mean run time), followed by Manager B and then Manager A.

• The above model assumes no interaction between Manager and run size: the difference between the mean run time of the managers is the same for all run sizes.


Page 7: Stat 112: Lecture 18 Notes

Testing for Differences Between Specific Managers

The effect test shows that the managers are not all equal. How about differences between specific managers? How does the mean time for Alex (Manager A) compare to Bob (Manager B) for fixed run sizes? We can test whether there is a difference between Alex and Bob, and can construct a confidence interval for the difference.

$H_0: \beta[\text{manager a}] = \beta[\text{manager b}]$

$H_1: \beta[\text{manager a}] \neq \beta[\text{manager b}]$

We will use the Custom Test provided by JMPIN:

Expanded Estimates
Nominal factors expanded to all levels

Term         Estimate    Std Error  t Ratio  Prob>|t|
Intercept    176.70882   5.658644   31.23    <.0001
Manager[a]   38.409663   3.005923   12.78    <.0001
Manager[b]   -14.65115   3.031379   -4.83    <.0001
Manager[c]   -23.75851   2.995898   -7.93    <.0001
Run Size     0.243369    0.025076   9.71     <.0001

Note: The t-tests for Managers are not useful here.

Page 8: Stat 112: Lecture 18 Notes

Inference for Differences of Coefficients in JMP

Consider a multiple regression

$E(Y \mid X_1, \ldots, X_k) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$

Suppose we want to test

$H_0: \beta_i = \beta_j$ vs. $H_a: \beta_i \neq \beta_j$.

This is equivalent to

$H_0: \beta_i - \beta_j = 0$ vs. $H_a: \beta_i - \beta_j \neq 0$

After Fit Model, click the red triangle next to Response. Then click Estimates and then Custom Test. Then enter a 1 for $\beta_i$ and a -1 for $\beta_j$.

Prob>|t| is the p-value for the two sided test. Std Error is the standard error of $b_i - b_j$, so that an approximate 95% confidence interval for $\beta_i - \beta_j$ is

$b_i - b_j \pm 2 \times \text{Std Error}$
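The Custom Test is useful because the standard error of a difference cannot, in general, be obtained from the two individual standard errors alone. This follows from the standard variance formula for a difference of two correlated estimates. A short derivation in the notation above, where $\widehat{\operatorname{Cov}}(b_i, b_j)$ denotes the estimated covariance of the two coefficient estimates (a quantity not shown in the output on these slides):

```latex
\begin{aligned}
\operatorname{Var}(b_i - b_j) &= \operatorname{Var}(b_i) + \operatorname{Var}(b_j) - 2\operatorname{Cov}(b_i, b_j), \\
\text{Std Error}(b_i - b_j) &= \sqrt{\text{Std Error}(b_i)^2 + \text{Std Error}(b_j)^2 - 2\,\widehat{\operatorname{Cov}}(b_i, b_j)}, \\
t &= \frac{b_i - b_j}{\text{Std Error}(b_i - b_j)}.
\end{aligned}
```

Only when the covariance is zero does the standard error reduce to $\sqrt{\text{Std Error}(b_i)^2 + \text{Std Error}(b_j)^2}$.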

Page 9: Stat 112: Lecture 18 Notes

Custom Test

Parameter
Intercept     0
Manager[a]    1
Manager[b]   -1
Run Size      0
= 0

Value       53.06
Std Error   5.24
t Ratio     10.12
Prob>|t|    2.929978e-14

Conclusion: We reject the null hypothesis and conclude that Manager a and Manager b perform differently.

2. The confidence interval for the difference: $53.06 \pm 2.003 \times 5.24 = 53.06 \pm 10.50$

Here $t^*_{56,.975} = 2.003$.

$53.06 = 38.41 - (-14.65)$, but $5.24^2 \neq 3.006^2 + 3.031^2$ since the coefficient estimates are dependent.
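The arithmetic on this slide can be reproduced from the reported numbers. The sketch below also backs out the covariance between the two manager estimates that the reported Std Error of 5.24 implies. That covariance is not printed on the slide, so treat it as an approximate, derived quantity (5.24 is itself rounded).

```python
import math

# Values copied from the JMP output on the slides.
b_a, se_a = 38.409663, 3.005923      # Manager[a]
b_b, se_b = -14.65115, 3.031379      # Manager[b]
se_diff = 5.24                       # Std Error reported by the Custom Test
t_star = 2.003                       # t*_{56,.975} as given on the slide

diff = b_a - b_b
margin = t_star * se_diff
ci = (diff - margin, diff + margin)

# Standard error one would get by wrongly treating the estimates as independent.
se_if_independent = math.sqrt(se_a**2 + se_b**2)

# Covariance implied by Var(b_a - b_b) = Var(b_a) + Var(b_b) - 2 Cov(b_a, b_b).
implied_cov = (se_a**2 + se_b**2 - se_diff**2) / 2

print(f"difference: {diff:.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"SE if independent: {se_if_independent:.3f} vs reported {se_diff}")
print(f"implied covariance: {implied_cov:.3f}")
```

The implied covariance is negative, which is why the reported standard error is larger than the value obtained by assuming independence.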

Page 10: Stat 112: Lecture 18 Notes

3. How do we test the difference between Manager a and Manager c?

$H_0: \beta[\text{manager a}] = \beta[\text{manager c}]$, $H_1: \beta[\text{manager a}] \neq \beta[\text{manager c}]$

Now $\beta[\text{manager c}] = -(\beta[\text{manager a}] + \beta[\text{manager b}])$

So $H_0: \beta[\text{manager a}] = \beta[\text{manager c}] = -(\beta[\text{manager a}] + \beta[\text{manager b}])$

It becomes: $H_0: 2\beta[\text{manager a}] + \beta[\text{manager b}] = 0$

We can use the Custom Test again with 2 and 1 as contrasts:

Custom Test

Parameter
Intercept    0
Manager[a]   2
Manager[b]   1
Run Size     0
= 0

Value       62.168
Std Error   5.180
t Ratio     12.002
Prob>|t|    4.098093e-17

It is significant.
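The substitution used above generalizes: any comparison written with weights on all three manager effects can be rewritten as weights on Manager[a] and Manager[b] only, using $\beta[\text{manager c}] = -(\beta[\text{manager a}] + \beta[\text{manager b}])$. The helper `reduce_contrast` below is a hypothetical name for that step, and the "b vs c" row is an extra illustration that does not appear on the slides.

```python
def reduce_contrast(w_a, w_b, w_c):
    """Rewrite w_a*a + w_b*b + w_c*c using c = -(a + b).

    Returns the weights to enter for Manager[a] and Manager[b].
    """
    return (w_a - w_c, w_b - w_c)

b_a, b_b = 38.409663, -14.65115  # estimates from the JMP output

comparisons = {
    "a vs b": (1, -1, 0),
    "a vs c": (1, 0, -1),
    "b vs c": (0, 1, -1),
}
results = {}
for name, weights in comparisons.items():
    k_a, k_b = reduce_contrast(*weights)
    results[name] = (k_a, k_b, k_a * b_a + k_b * b_b)
    print(name, results[name])
```

For "a vs c" this yields the weights 2 and 1 and a value of about 62.168, matching the Custom Test output. The standard error and t ratio still have to come from JMP, since they depend on the covariances of the estimates.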

Page 11: Stat 112: Lecture 18 Notes

Visualizing Polynomial Effects in Multiple Regression

• Fit Special provides a good plot of the effect of X on Y when using simple regression.

• When we use polynomials in multiple regression, how can we see the changes in the mean that are associated with changes in one variable?

• Solution: Use the Prediction Profiler. After Fit Model, click the red triangle next to Response, click Factor Profiling and then Profiler.

Page 12: Stat 112: Lecture 18 Notes

Polynomials in Multiple Regression Example

• Fast Food Locations. An analyst working for a fast food chain is asked to construct a multiple regression model to identify new locations that are likely to be profitable. For a sample of 25 locations, the analyst has the annual gross revenue of the restaurant (y), the mean annual household income and the mean age of children in the area. Data in fastfoodchain.jmp

Page 13: Stat 112: Lecture 18 Notes

Polynomial Regression for Fast Food Chain Data

Response Revenue
Parameter Estimates

Term                          Estimate    Std Error  t Ratio  Prob>|t|
Intercept                     1062.4317   72.9538    14.56    <.0001
Income                        5.4563847   2.162126   2.52     0.0202
Age                           1.6421762   5.413888   0.30     0.7648
(Income-24.2)*(Income-24.2)   -3.979104   0.570833   -6.97    <.0001
(Age-8.392)*(Age-8.392)       -4.112892   1.267459   -3.24    0.0041

Since quadratic coefficients for both income and age are significant, we will consider cubic terms for both income and age.

Response Revenue
Parameter Estimates

Term                                        Estimate    Std Error  t Ratio  Prob>|t|
Intercept                                   734.8803    146.2852   5.02     <.0001
Income                                      17.781924   5.237722   3.39     0.0032
Age                                         2.8688282   6.581383   0.44     0.6681
(Income-24.2)*(Income-24.2)                 -3.197419   0.607609   -5.26    <.0001
(Age-8.392)*(Age-8.392)                     -5.009426   1.501641   -3.34    0.0037
(Income-24.2)*(Income-24.2)*(Income-24.2)   -0.248737   0.099073   -2.51    0.0218
(Age-8.392)*(Age-8.392)*(Age-8.392)         0.2871644   0.330796   0.87     0.3968

Since cubic coefficient for age is not significant, we will use a second-order (quadratic) polynomial for age. Since cubic coefficient for income is significant, let's consider a fourth-order polynomial for income.

Page 14: Stat 112: Lecture 18 Notes

Response Revenue
Parameter Estimates

Term                                                      Estimate    Std Error  t Ratio  Prob>|t|
Intercept                                                 746.81501   146.7115   5.09     <.0001
Income                                                    16.32413    4.872106   3.35     0.0036
Age                                                       6.3972838   5.319759   1.20     0.2447
(Income-24.2)*(Income-24.2)                               -4.023468   1.240477   -3.24    0.0045
(Age-8.392)*(Age-8.392)                                   -4.240435   1.163926   -3.64    0.0019
(Income-24.2)*(Income-24.2)*(Income-24.2)                 -0.221251   0.090451   -2.45    0.0249
(Income-24.2)*(Income-24.2)*(Income-24.2)*(Income-24.2)   0.0092713   0.015497   0.60     0.5571

Since fourth order coefficient for income is not significant, we will use a third order (cubic) polynomial for income.

Final Model:

Response Revenue
Parameter Estimates

Term                                        Estimate    Std Error  t Ratio  Prob>|t|
Intercept                                   752.87321   143.8674   5.23     <.0001
Income                                      15.903936   4.739058   3.36     0.0033
Age                                         6.3016037   5.226739   1.21     0.2428
(Income-24.2)*(Income-24.2)                 -3.36781    0.571292   -5.90    <.0001
(Age-8.392)*(Age-8.392)                     -4.166012   1.137538   -3.66    0.0017
(Income-24.2)*(Income-24.2)*(Income-24.2)   -0.205342   0.084981   -2.42    0.0259

$\hat{E}(\text{Revenue} \mid \text{Income}=30, \text{Age}=10) = 752.87 + 15.90 \cdot 30 + 6.30 \cdot 10 - 3.37 \cdot (30-24.2)^2 - 4.17 \cdot (10-8.392)^2 - 0.21 \cdot (30-24.2)^3 = 1128.88$

(The value 1128.88 is obtained with the unrounded estimates from the table; the two-decimal coefficients shown in the formula give a slightly smaller number.)
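The prediction at Income = 30 and Age = 10 can be reproduced from the final-model estimates. A minimal sketch, assuming the centering values 24.2 and 8.392 shown in the JMP term names; the function name `predict_revenue` is illustrative.

```python
# Final-model estimates copied from the JMP output.
COEF = {
    "intercept": 752.87321,
    "income": 15.903936,
    "age": 6.3016037,
    "income_sq": -3.36781,
    "age_sq": -4.166012,
    "income_cu": -0.205342,
}
INCOME_MEAN, AGE_MEAN = 24.2, 8.392  # centering values in the JMP term names

def predict_revenue(income, age):
    """Estimated mean revenue from the final (cubic income, quadratic age) model."""
    di = income - INCOME_MEAN
    da = age - AGE_MEAN
    return (COEF["intercept"] + COEF["income"] * income + COEF["age"] * age
            + COEF["income_sq"] * di**2 + COEF["age_sq"] * da**2
            + COEF["income_cu"] * di**3)

print(round(predict_revenue(30, 10), 2))
```

With the full-precision coefficients this should agree with the 1128.88 reported on the slide. Note that the linear terms use the raw Income and Age, while only the higher-order terms are centered.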

Page 15: Stat 112: Lecture 18 Notes

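Prediction Profiler

[Figure: Prediction Profiler. Vertical axis: Revenue (777.065 to 1281), predicted value 1190.632 ±33.265. Horizontal axes: Income (15.6 to 33.6, current value 24.2) and Age (3.4 to 14.9, current value 8.392).]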

The Prediction Profiler graph for Income shows the mean revenue as income varies when the other variables are set equal to their sample means (i.e., Age = 8.392). The Prediction Profiler graph for Age shows the mean revenue as Age varies when the other variables are set equal to their sample means (i.e., Income = 24.2).
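What the profiler traces can be approximated outside JMP by evaluating the final model on a grid for one variable while the other is held at its sample mean. The sketch below is not JMP's implementation: the function names and grid size are arbitrary choices, and the ranges 15.6 to 33.6 for Income and 3.4 to 14.9 for Age are taken from the profiler axes.

```python
def predict_revenue(income, age):
    """Final-model estimate of mean revenue (coefficients from the JMP output)."""
    di, da = income - 24.2, age - 8.392
    return (752.87321 + 15.903936 * income + 6.3016037 * age
            - 3.36781 * di**2 - 4.166012 * da**2 - 0.205342 * di**3)

def profile(variable, low, high, steps=10, income=24.2, age=8.392):
    """Vary one variable over [low, high], holding the other at its sample mean."""
    grid = [low + (high - low) * k / steps for k in range(steps + 1)]
    if variable == "income":
        return [(x, predict_revenue(x, age)) for x in grid]
    return [(x, predict_revenue(income, x)) for x in grid]

income_trace = profile("income", 15.6, 33.6)
age_trace = profile("age", 3.4, 14.9)
center = predict_revenue(24.2, 8.392)

for x, y in income_trace:
    print(f"Income={x:5.1f}  Revenue={y:8.2f}")
print(f"At the sample means: {center:.3f}")
```

The value at the sample means agrees with the 1190.632 displayed by the profiler. The ± figure shown there is an interval that needs the fitted model's covariance matrix and is not reproduced here.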