Stat 112: Lecture 18 Notes
• Chapter 7.1: Using and Interpreting Indicator Variables.
• Visualizing polynomial regressions in multiple regression
• Review Problem for Quiz
Comparing Toy Factory Managers
• An analysis has shown that the time required to complete a production run in a toy factory increases with the number of toys produced. Data were collected for the time required to process 20 randomly selected production runs as supervised by three managers (A, B and C). Data in toyfactorymanager.JMP.
• How do the managers compare?
Including Categorical Variable in Multiple Regression: Right Approach
• Create an indicator (dummy) variable for each category.
• Manager[a] = 1 if Manager is A, 0 if Manager is not A
• Manager[b] = 1 if Manager is B, 0 if Manager is not B
• Manager[c] = 1 if Manager is C, 0 if Manager is not C
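As a minimal sketch of the indicator coding above, the following Python builds the three 0/1 variables from a manager label. The helper name `manager_indicators` and the example labels are hypothetical and are not part of the JMP workflow in these notes.

```python
# Hypothetical helper: build 0/1 indicator (dummy) variables for the
# three manager categories described in the slide.
MANAGERS = ["a", "b", "c"]

def manager_indicators(manager):
    """Return a dict with Manager[x] = 1 for the matching manager, else 0."""
    label = manager.lower()
    if label not in MANAGERS:
        raise ValueError(f"unknown manager: {manager!r}")
    return {f"Manager[{m}]": int(m == label) for m in MANAGERS}

# Example: indicators for three production runs (made-up labels).
for supervisor in ["A", "B", "C"]:
    print(supervisor, manager_indicators(supervisor))
```

Exactly one indicator equals 1 for every run, so the three indicators always sum to 1.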
• For a run size of 100, the estimated mean run times for Managers A, B and C are:
$\hat{E}(\text{Time} \mid \text{Run Size}=100, \text{Manager a}) = 176.71 + 0.24 \cdot 100 + 38.41 \cdot 1 - 14.65 \cdot 0 - 23.76 \cdot 0$
$\hat{E}(\text{Time} \mid \text{Run Size}=100, \text{Manager b}) = 176.71 + 0.24 \cdot 100 + 38.41 \cdot 0 - 14.65 \cdot 1 - 23.76 \cdot 0$
$\hat{E}(\text{Time} \mid \text{Run Size}=100, \text{Manager c}) = 176.71 + 0.24 \cdot 100 + 38.41 \cdot 0 - 14.65 \cdot 0 - 23.76 \cdot 1$
• For the same run size, Manager A is estimated to be on average 38.41-(-14.65)=53.06 minutes slower than Manager B and 38.41-(-23.76)=62.17 minutes slower than Manager C.
Response Time for Run
Expanded Estimates
Nominal factors expanded to all levels
Term          Estimate     Std Error   t Ratio   Prob>|t|
Intercept     176.70882    5.658644    31.23     <.0001
Run Size      0.243369     0.025076    9.71      <.0001
Manager[a]    38.409663    3.005923    12.78     <.0001
Manager[b]    -14.65115    3.031379    -4.83     <.0001
Manager[c]    -23.75851    2.995898    -7.93     <.0001
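The estimated times and the manager differences can be checked with a few lines of Python. This is only a sketch: the coefficients are copied from the Expanded Estimates output, and the function name `estimated_time` is an assumption made for illustration.

```python
# Estimates copied from the "Expanded Estimates" table.
INTERCEPT = 176.70882
RUN_SIZE_SLOPE = 0.243369
MANAGER_EFFECT = {"a": 38.409663, "b": -14.65115, "c": -23.75851}

def estimated_time(run_size, manager):
    """Estimated mean run time (minutes) for a run size and manager."""
    return INTERCEPT + RUN_SIZE_SLOPE * run_size + MANAGER_EFFECT[manager]

times = {m: estimated_time(100, m) for m in MANAGER_EFFECT}
for m, t in times.items():
    print(f"Manager {m.upper()}: {t:.2f} minutes")

# Differences between managers at the same run size.
print(f"A - B: {times['a'] - times['b']:.2f}")
print(f"A - C: {times['a'] - times['c']:.2f}")
```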
Effect Tests
• Effect test for manager: $H_0: \beta_{\text{Manager}[a]} = \beta_{\text{Manager}[b]} = \beta_{\text{Manager}[c]}$ vs. $H_a$: not all of manager[a], manager[b], manager[c] are equal. The null hypothesis is that all managers are the same (in terms of mean run time) when run size is held fixed; the alternative hypothesis is that not all managers are the same (in terms of mean run time) when run size is held fixed. This is a partial F test.
• p-value for Effect Test <.0001. Strong evidence that not all managers are the same when run size is held fixed.
• Note: equivalent to $H_0: \beta_{\text{Manager}[a]} = \beta_{\text{Manager}[b]} = \beta_{\text{Manager}[c]} = 0$ because JMP has the constraint that manager[a]+manager[b]+manager[c]=0.
• Effect test for Run Size tests the null hypothesis that the Run Size coefficient is 0 versus the alternative hypothesis that the Run Size coefficient isn't zero. Same p-value as the t-test.
Effect Tests
Source      Nparm   DF   Sum of Squares   F Ratio    Prob > F
Run Size    1       1    25260.250        94.1906    <.0001
Manager     2       2    44773.996        83.4768    <.0001
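Three facts used on this slide can be checked numerically from the JMP output: the manager effects sum to zero, the Run Size F ratio is the square of its t ratio, and both F ratios share the same mean squared error. The sketch below only re-uses numbers reported in the tables. The helper `implied_mse` is a hypothetical name, and the relation F = (Sum of Squares / DF) / MSE is the standard partial F form rather than something stated in the slides.

```python
# Values copied from the JMP output in the slides.
manager_effects = [38.409663, -14.65115, -23.75851]
run_size_estimate, run_size_se = 0.243369, 0.025076

# Sum-to-zero constraint: the three manager effects add up to
# (approximately, because of rounding) zero.
print(f"sum of manager effects: {sum(manager_effects):.6f}")

# For a single coefficient, the effect-test F ratio is the square of
# the t ratio, which is why the two p-values agree.
t_ratio = run_size_estimate / run_size_se
print(f"t ratio: {t_ratio:.2f}, t squared: {t_ratio ** 2:.2f}")

# Each F ratio is (Sum of Squares / DF) / MSE, so the MSE implied by
# each row of the Effect Tests table should be the same.
def implied_mse(sum_sq, df, f_ratio):
    return (sum_sq / df) / f_ratio

mse_run_size = implied_mse(25260.250, 1, 94.1906)
mse_manager = implied_mse(44773.996, 2, 83.4768)
print(f"implied MSE: {mse_run_size:.2f} (Run Size), {mse_manager:.2f} (Manager)")
```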
• The effect test shows that the managers are not all equal.
• For the same run size, Manager C is best (lowest mean run time), followed by Manager B and then Manager A.
• The above model assumes no interaction between Manager and run size: the difference between the mean run time of the managers is the same for all run sizes.
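A short derivation shows why the no-interaction model gives the same manager difference at every run size. Here $x$ denotes run size, and $\beta_0$, $\beta_1$ are notation introduced for the intercept and the Run Size coefficient; the same cancellation holds for any pair of managers.

```latex
\begin{align*}
E(\text{Time} \mid x, \text{Manager } a) - E(\text{Time} \mid x, \text{Manager } b)
  &= (\beta_0 + \beta_1 x + \beta_{\text{Manager}[a]})
   - (\beta_0 + \beta_1 x + \beta_{\text{Manager}[b]}) \\
  &= \beta_{\text{Manager}[a]} - \beta_{\text{Manager}[b]}
\end{align*}
```

The run size term cancels, so the estimated difference (53.06 minutes for A versus B) does not depend on $x$.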
Testing for Differences Between Specific Managers
The effect test shows that the managers are not all equal. How about differences between specific managers? How does the mean time for Alex (Manager A) compare to Bob (Manager B) for fixed run sizes? We can test whether there is a difference between Alex and Bob, and can construct a confidence interval for the difference.
$H_0: \beta_{\text{Manager}[a]} = \beta_{\text{Manager}[b]}$
$H_1: \beta_{\text{Manager}[a]} \neq \beta_{\text{Manager}[b]}$
We will use the Custom Test provided by JMPIN:
Expanded Estimates
Nominal factors expanded to all levels
Term          Estimate     Std Error   t Ratio   Prob>|t|
Intercept     176.70882    5.658644    31.23     <.0001
Manager[a]    38.409663    3.005923    12.78     <.0001
Manager[b]    -14.65115    3.031379    -4.83     <.0001
Manager[c]    -23.75851    2.995898    -7.93     <.0001
Run Size      0.243369     0.025076    9.71      <.0001
Note: The t-tests for Managers are not useful here. Under JMP's sum-to-zero coding, each one compares a manager with the average of the three managers, not with another specific manager.
Inference for Differences of Coefficients in JMP
Consider a multiple regression
$E(Y \mid X_1, \ldots, X_k) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$
Suppose we want to test
$H_0: \beta_i = \beta_j$ vs. $H_a: \beta_i \neq \beta_j$.
This is equivalent to
$H_0: \beta_i - \beta_j = 0$ vs. $H_a: \beta_i - \beta_j \neq 0$.
After Fit Model, click the red triangle next to Response. Then click Estimates and then Custom Test. Then enter a 1 for $\beta_i$ and a -1 for $\beta_j$.
Prob>|t| is the p-value for the two-sided test. Std Error is the standard error of $b_i - b_j$, so that an approximate 95% confidence interval for $\beta_i - \beta_j$ is $b_i - b_j \pm 2 \cdot \text{Std Error}$.
Custom Test
Parameter
Intercept     0
Manager[a]    1
Manager[b]    -1
Run Size      0
= 0
Value         53.06
Std Error     5.24
t Ratio       10.12
Prob>|t|      2.929978e-14
1. Conclusion: We reject the null hypothesis and conclude that Manager a and Manager b perform differently.
2. The confidence interval for the difference: $53.06 \pm 2.003 \times 5.24 = 53.06 \pm 10.50$. Here $t^*_{56,\,.975} = 2.003$.
Note that $53.06 = 38.41 - (-14.65)$, but $5.24^2 \neq 3.006^2 + 3.031^2$ since the coefficient estimates are dependent.
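The interval and the remark about dependence can be reproduced from the reported numbers. This is a sketch under stated assumptions: the standard error of the difference (5.24) and the critical value (2.003) are taken from the slide rather than computed, and the covariance is backed out with the identity Var(b_a - b_b) = Var(b_a) + Var(b_b) - 2 Cov(b_a, b_b).

```python
import math

# Values copied from the slides.
estimate_a, se_a = 38.409663, 3.005923
estimate_b, se_b = -14.65115, 3.031379
se_diff = 5.24   # Std Error reported by the Custom Test
t_crit = 2.003   # t critical value for 56 df, 97.5th percentile (from the slide)

diff = estimate_a - estimate_b
margin = t_crit * se_diff
print(f"difference: {diff:.2f}")
print(f"95% CI: ({diff - margin:.2f}, {diff + margin:.2f})")

# If the two estimates were independent, the standard error of the
# difference would be sqrt(se_a^2 + se_b^2). The reported value is
# larger, so the estimates must be (negatively) correlated.
naive_se = math.sqrt(se_a ** 2 + se_b ** 2)
implied_cov = (se_a ** 2 + se_b ** 2 - se_diff ** 2) / 2
print(f"naive SE: {naive_se:.2f} vs reported SE: {se_diff:.2f}")
print(f"implied covariance of the two estimates: {implied_cov:.2f}")
```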
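3. How do we test the difference between Manager a and Manager c?
$H_0: \beta_{\text{Manager}[a]} = \beta_{\text{Manager}[c]}$, $H_1: \beta_{\text{Manager}[a]} \neq \beta_{\text{Manager}[c]}$
Now $\beta_{\text{Manager}[c]} = -(\beta_{\text{Manager}[a]} + \beta_{\text{Manager}[b]})$
So $H_0: \beta_{\text{Manager}[a]} = \beta_{\text{Manager}[c]} = -(\beta_{\text{Manager}[a]} + \beta_{\text{Manager}[b]})$
It becomes: $H_0: 2\beta_{\text{Manager}[a]} + \beta_{\text{Manager}[b]} = 0$
We can use the Custom Test again with 2 and 1 as contrasts:
Custom Test
Parameter
Intercept     0
Manager[a]    2
Manager[b]    1
Run Size      0
= 0
Value         62.168
Std Error     5.180
t Ratio       12.002
Prob>|t|      4.098093e-17
The difference is significant: we reject the null hypothesis and conclude that Manager a and Manager c perform differently.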
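The Custom Test value is just a weighted sum of the fitted coefficients, which the sketch below reproduces for both contrasts. The dictionary layout and the helper `contrast_value` are assumptions made for illustration; the standard error 5.180 is copied from the JMP output, not computed.

```python
# Estimates copied from the Expanded Estimates table. Manager[c] is not a
# free parameter; under the sum-to-zero constraint it equals -(a + b).
estimates = {"Intercept": 176.70882, "Manager[a]": 38.409663,
             "Manager[b]": -14.65115, "Run Size": 0.243369}

def contrast_value(weights):
    """Linear combination of the estimates for the given contrast weights."""
    return sum(weights[name] * estimates[name] for name in estimates)

a_vs_b = {"Intercept": 0, "Manager[a]": 1, "Manager[b]": -1, "Run Size": 0}
a_vs_c = {"Intercept": 0, "Manager[a]": 2, "Manager[b]": 1, "Run Size": 0}

value_ab = contrast_value(a_vs_b)
value_ac = contrast_value(a_vs_c)
print(f"a vs b: {value_ab:.2f}")
print(f"a vs c: {value_ac:.3f}")

# t ratio for a vs c, using the Std Error reported by the Custom Test.
se_ac = 5.180
print(f"t ratio (a vs c): {value_ac / se_ac:.2f}")
```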
Visualizing Polynomial Effects in Multiple Regression
• Fit Special provides a good plot of the effect of X on Y when using simple regression.
• When we use polynomials in multiple regression, how can we see the changes in the mean that are associated with changes in one variable?
• Solution: Use the Prediction Profiler. After Fit Model, click the red triangle next to Response, click Factor Profiling and then Profiler.
Polynomials in Multiple Regression Example
• Fast Food Locations. An analyst working for a fast food chain is asked to construct a multiple regression model to identify new locations that are likely to be profitable. For a sample of 25 locations, the analyst has the annual gross revenue of the restaurant (y), the mean annual household income and the mean age of children in the area. Data in fastfoodchain.jmp.
Polynomial Regression for Fast Food Chain Data
Response Revenue
Parameter Estimates
Term                           Estimate     Std Error   t Ratio   Prob>|t|
Intercept                      1062.4317    72.9538     14.56     <.0001
Income                         5.4563847    2.162126    2.52      0.0202
Age                            1.6421762    5.413888    0.30      0.7648
(Income-24.2)*(Income-24.2)    -3.979104    0.570833    -6.97     <.0001
(Age-8.392)*(Age-8.392)        -4.112892    1.267459    -3.24     0.0041
Since the quadratic coefficients for both income and age are significant, we will consider cubic terms for both income and age.
Response Revenue
Parameter Estimates
Term                                         Estimate     Std Error   t Ratio   Prob>|t|
Intercept                                    734.8803     146.2852    5.02      <.0001
Income                                       17.781924    5.237722    3.39      0.0032
Age                                          2.8688282    6.581383    0.44      0.6681
(Income-24.2)*(Income-24.2)                  -3.197419    0.607609    -5.26     <.0001
(Age-8.392)*(Age-8.392)                      -5.009426    1.501641    -3.34     0.0037
(Income-24.2)*(Income-24.2)*(Income-24.2)    -0.248737    0.099073    -2.51     0.0218
(Age-8.392)*(Age-8.392)*(Age-8.392)          0.2871644    0.330796    0.87      0.3968
Since the cubic coefficient for age is not significant, we will use a second-order (quadratic) polynomial for age. Since the cubic coefficient for income is significant, let's consider a fourth-order polynomial for income.
Response Revenue
Parameter Estimates
Term                                                       Estimate     Std Error   t Ratio   Prob>|t|
Intercept                                                  746.81501    146.7115    5.09      <.0001
Income                                                     16.32413     4.872106    3.35      0.0036
Age                                                        6.3972838    5.319759    1.20      0.2447
(Income-24.2)*(Income-24.2)                                -4.023468    1.240477    -3.24     0.0045
(Age-8.392)*(Age-8.392)                                    -4.240435    1.163926    -3.64     0.0019
(Income-24.2)*(Income-24.2)*(Income-24.2)                  -0.221251    0.090451    -2.45     0.0249
(Income-24.2)*(Income-24.2)*(Income-24.2)*(Income-24.2)    0.0092713    0.015497    0.60      0.5571
Since the fourth-order coefficient for income is not significant, we will use a third-order (cubic) polynomial for income.
Final Model:
Response Revenue
Parameter Estimates
Term                                         Estimate     Std Error   t Ratio   Prob>|t|
Intercept                                    752.87321    143.8674    5.23      <.0001
Income                                       15.903936    4.739058    3.36      0.0033
Age                                          6.3016037    5.226739    1.21      0.2428
(Income-24.2)*(Income-24.2)                  -3.36781     0.571292    -5.90     <.0001
(Age-8.392)*(Age-8.392)                      -4.166012    1.137538    -3.66     0.0017
(Income-24.2)*(Income-24.2)*(Income-24.2)    -0.205342    0.084981    -2.42     0.0259
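The order-selection rule used above (keep raising the polynomial order for a variable while the newly added highest-order term is significant) can be sketched as follows. The 0.05 cutoff is an assumption, since the slides do not state a significance level, and `choose_degree` is a hypothetical helper. The p-values are read from the JMP tables, with "<.0001" entered as 0.0001.

```python
ALPHA = 0.05  # assumed significance level; the slides do not state one

def choose_degree(p_values_by_degree, alpha=ALPHA):
    """Raise the polynomial degree while the next higher-order term is significant.

    p_values_by_degree maps degree -> p-value of that degree's term in the
    model where it was the highest-order term for the variable.
    """
    degree = 1
    while degree + 1 in p_values_by_degree and p_values_by_degree[degree + 1] < alpha:
        degree += 1
    return degree

# p-values read from the JMP tables ("<.0001" entered as 0.0001).
income_p = {2: 0.0001, 3: 0.0218, 4: 0.5571}
age_p = {2: 0.0041, 3: 0.3968}

print("Income degree:", choose_degree(income_p))
print("Age degree:", choose_degree(age_p))
```

With these inputs the rule stops at a cubic for income and a quadratic for age, matching the final model.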
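$\hat{E}(\text{Revenue} \mid \text{Income}=30, \text{Age}=10) = 752.87 + 15.90 \cdot 30 + 6.30 \cdot 10 - 3.37 (30-24.2)^2 - 4.17 (10-8.392)^2 - 0.21 (30-24.2)^3 = 1128.88$
The value 1128.88 is obtained with the unrounded estimates from the table; the rounded coefficients shown here give about 1127.75.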
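The prediction can be reproduced from the final model's estimates. In this sketch the coefficients are copied from the final Parameter Estimates table, and the function name `estimated_revenue` is an assumption made for illustration.

```python
# Estimates copied from the final model table.
COEF = {
    "intercept": 752.87321,
    "income": 15.903936,
    "age": 6.3016037,
    "income_sq": -3.36781,
    "age_sq": -4.166012,
    "income_cu": -0.205342,
}
INCOME_MEAN, AGE_MEAN = 24.2, 8.392  # centering values used in the JMP terms

def estimated_revenue(income, age):
    """Estimated mean revenue from the final polynomial model."""
    ci = income - INCOME_MEAN  # centered income
    ca = age - AGE_MEAN        # centered age
    return (COEF["intercept"] + COEF["income"] * income + COEF["age"] * age
            + COEF["income_sq"] * ci ** 2 + COEF["age_sq"] * ca ** 2
            + COEF["income_cu"] * ci ** 3)

print(f"{estimated_revenue(30, 10):.2f}")
```

Note that the linear terms use the raw Income and Age values, while the squared and cubed terms use the centered values, as in the JMP term names.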
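Prediction Profiler
[Figure: Prediction Profiler for Revenue. Revenue axis from 777.065 to 1281, with predicted value 1190.632 ± 33.265; Income axis from 15.6 to 33.6, set at 24.2; Age axis from 3.4 to 14.9, set at 8.392.]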
The Prediction Profiler graph for Income shows the mean revenue as Income varies when the other variables are set equal to their sample means (i.e., Age = 8.392). The Prediction Profiler graph for Age shows the mean revenue as Age varies when the other variables are set equal to their sample means (i.e., Income = 24.2).
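A rough stand-in for the Income panel of the profiler is to evaluate the final model over a grid of Income values while holding Age at its sample mean. The sketch below is not JMP's implementation: the grid size and the names `estimated_revenue` and `income_profile` are assumptions, and the Income range 15.6 to 33.6 is taken from the profiler axis.

```python
# Final-model estimates copied from the JMP output.
INTERCEPT, B_INCOME, B_AGE = 752.87321, 15.903936, 6.3016037
B_INCOME_SQ, B_AGE_SQ, B_INCOME_CU = -3.36781, -4.166012, -0.205342
INCOME_MEAN, AGE_MEAN = 24.2, 8.392  # sample means used for centering

def estimated_revenue(income, age):
    """Estimated mean revenue from the final polynomial model."""
    ci, ca = income - INCOME_MEAN, age - AGE_MEAN
    return (INTERCEPT + B_INCOME * income + B_AGE * age
            + B_INCOME_SQ * ci ** 2 + B_AGE_SQ * ca ** 2 + B_INCOME_CU * ci ** 3)

def income_profile(low=15.6, high=33.6, n_points=10):
    """(income, revenue) pairs with Age held at its sample mean."""
    step = (high - low) / (n_points - 1)
    grid = [low + i * step for i in range(n_points)]
    return [(x, estimated_revenue(x, AGE_MEAN)) for x in grid]

trace = income_profile()
for income, revenue in trace:
    print(f"Income {income:5.1f} -> Revenue {revenue:8.2f}")

# Value at the profiler's default settings (both variables at their means).
center = estimated_revenue(INCOME_MEAN, AGE_MEAN)
print(f"at the means: {center:.3f}")
```

Evaluated at both sample means, the model returns the 1190.632 shown in the profiler. Swapping the roles of Income and Age in `income_profile` gives the Age panel.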