6-4 Other Aspects of Regression 6-4.1 Polynomial Models

Suppose that we wanted to test the contribution of the second-order terms to this model. In other words, what is the value of expanding the model to include the additional terms?

$Y = \beta_0 + \beta_1 (T - 1212.5) + \beta_2 (R - 12.444) + \epsilon$


Example 6-9

OPTIONS NOOVP NODATE NONUMBER;
DATA ex69;
  INPUT YIELD TEMP RATIO;
  /* Center both regressors at their sample means */
  TEMPC=TEMP-1212.5;
  RATIOC=RATIO-12.444;
  /* Second-order terms: interaction and pure quadratics */
  TEMRATC=TEMPC*RATIOC;
  TEMPCSQ=TEMPC**2;
  RATIOCSQ=RATIOC**2;
CARDS;
49.0 1300 7.5
50.2 1300 9.0
50.5 1300 11.0
48.5 1300 13.5
47.5 1300 17.0
44.5 1300 23.0
28.0 1200 5.3
31.5 1200 7.5
34.5 1200 11.0
35.0 1200 13.5
38.0 1200 17.0
38.5 1200 23.0
15.0 1100 5.3
17.0 1100 7.5
20.5 1100 11.0
29.5 1100 17.0
;
PROC REG DATA=EX69;
  MODEL YIELD=TEMPC RATIOC TEMRATC TEMPCSQ RATIOCSQ/VIF;
  TITLE 'QUADRATIC REGRESSION MODEL - FULL MODEL';

PROC REG DATA=EX69;
  MODEL YIELD=TEMPC RATIOC/VIF;
  TITLE 'LINEAR REGRESSION MODEL - REDUCED MODEL';
RUN; QUIT;


QUADRATIC REGRESSION MODEL - FULL MODEL

The REG Procedure
Model: MODEL1
Dependent Variable: YIELD

Number of Observations Read    16
Number of Observations Used    16

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5        2112.33724      422.46745     371.49    <.0001
Error              10          11.37214        1.13721
Corrected Total    15        2123.70937

Root MSE           1.06640    R-Square    0.9946
Dependent Mean    36.10625    Adj R-Sq    0.9920
Coeff Var          2.95351

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept     1              36.43394           0.55288      65.90      <.0001                     0
TEMPC         1               0.13048           0.00364      35.83      <.0001               1.13707
RATIOC        1               0.48005           0.05860       8.19      <.0001               1.45205
TEMRATC       1              -0.00733        0.00079928      -9.18      <.0001               1.37367
TEMPCSQ       1            0.00017820        0.00005854       3.04      0.0124               1.20061
RATIOCSQ      1              -0.02367           0.01019      -2.32      0.0425               1.71889


LINEAR REGRESSION MODEL - REDUCED MODEL

The REG Procedure
Model: MODEL1
Dependent Variable: YIELD

Number of Observations Read    16
Number of Observations Used    16

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2        1952.97853      976.48926      74.35    <.0001
Error              13         170.73085       13.13314
Corrected Total    15        2123.70937

Root MSE           3.62397    R-Square    0.9196
Dependent Mean    36.10625    Adj R-Sq    0.9072
Coeff Var         10.03695

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept     1              36.10634           0.90599      39.85      <.0001                     0
TEMPC         1               0.13396           0.01191      11.25      <.0001               1.05264
RATIOC        1               0.35106           0.16955       2.07      0.0589               1.05264


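Full Model: $Y = \beta_0 + \beta_1 (T - 1212.5) + \beta_2 (R - 12.444) + \beta_{12} (T - 1212.5)(R - 12.444) + \beta_{11} (T - 1212.5)^2 + \beta_{22} (R - 12.444)^2 + \epsilon$

Reduced Model: $Y = \beta_0 + \beta_1 (T - 1212.5) + \beta_2 (R - 12.444) + \epsilon$

$H_0: \beta_{12} = \beta_{11} = \beta_{22} = 0$

$F_0 = \frac{[SS_E(\text{reduced}) - SS_E(\text{full})]/3}{MS_E(\text{full})} = \frac{(170.73 - 11.37)/3}{1.137} = 46.72$

Tabled F = $f_{0.05}$ = 2.44 (p-value < 0.0001)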
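The partial F-test above can be reproduced directly from the two ANOVA tables. The following is a minimal sketch, not part of the original slides; the function name `partial_f` and its argument names are illustrative assumptions, and the numbers are copied from the PROC REG output for the full and reduced models.

```python
def partial_f(sse_reduced, sse_full, df_reduced, df_full):
    """Partial (extra sum of squares) F statistic for two nested models.

    df_reduced and df_full are the error degrees of freedom of each fit.
    """
    extra_terms = df_reduced - df_full   # number of coefficients being tested
    mse_full = sse_full / df_full        # error mean square of the full model
    return ((sse_reduced - sse_full) / extra_terms) / mse_full


# Error sums of squares and error DF from the two PROC REG runs of Example 6-9.
f0 = partial_f(sse_reduced=170.73085, sse_full=11.37214, df_reduced=13, df_full=10)
print(f"F0 = {f0:.2f} on 3 and 10 degrees of freedom")
```

With unrounded inputs the statistic agrees with the slide value of 46.72 up to rounding in the second decimal place.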


Example

OPTIONS NOOVP NODATE NONUMBER;
DATA BIDS;
  INFILE 'C:\users\myung\Documents\Teaching\ 학부과목 \imen214-stats\bids.dat';
  INPUT PRICE QUANTITY BIDS;
  LOGPRICE=LOG(PRICE);     /* log-transformed response */
  QUANSQ=QUANTITY**2;      /* quadratic term */

PROC REG DATA=BIDS;
  MODEL PRICE=QUANTITY/P CLM CLI XPX;
  PLOT PRICE*QUANTITY/PRED CONF;   /* SCATTER PLOT */
  PLOT R.*P.;                      /* residuals vs. predicted values */
  TITLE 'LINEAR REGRESSION OF PRICE VS. QUANTITY';

PROC REG DATA=BIDS;
  MODEL LOGPRICE=QUANTITY/P CLM CLI XPX;
  TITLE 'LINEAR REGRESSION OF LOGPRICE VS. QUANTITY';

PROC REG DATA=BIDS;
  MODEL PRICE=QUANTITY QUANSQ/P CLM CLI XPX;
  TITLE 'QUADRATIC REGRESSION OF PRICE VS. QUANTITY';
RUN; QUIT;


Residual Plots

(b) The variance of the observations may be increasing with time or with the magnitude of yi or xi. A data transformation on the response y is often used to eliminate this problem.

(c) Plots of residuals against ŷi and xi also indicate inequality of variance.

(d) Indicates model inadequacy; that is, higher-order terms should be added to the model, a transformation on the x-variable or the y-variable (or both) should be considered, or other regressors should be considered.


LINEAR REGRESSION OF PRICE VS. QUANTITY

The REG Procedure
Model: MODEL1

Model Crossproducts X'X X'Y Y'Y

Variable     Intercept     QUANTITY           PRICE
Intercept           30          273         2374.97
QUANTITY           273      3688.98        12492.46
PRICE          2374.97     12492.46     266887.1815

Dependent Variable: PRICE

Number of Observations Read    30
Number of Observations Used    30

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1             69039          69039     196.62    <.0001
Error              28        9831.89259      351.13902
Corrected Total    29             78871

Root MSE          18.73870    R-Square    0.8753
Dependent Mean    79.16567    Adj R-Sq    0.8709
Coeff Var         23.67024

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             148.05523           5.98682      24.73      <.0001
QUANTITY      1              -7.57028           0.53989     -14.02      <.0001


LINEAR REGRESSION OF PRICE VS. QUANTITY

The REG Procedure
Model: MODEL1
Dependent Variable: PRICE

Output Statistics

     Dependent Predicted Std Error      95% CL Mean        95% CL Predict
Obs   Variable     Value Mean Pred     Lower     Upper     Lower     Upper  Residual
  1   153.3200  140.4849    5.5523  129.1115  151.8584  100.4509  180.5190   12.8351
  2    74.1100   93.5492    3.5717   86.2330  100.8654   54.4737  132.6247  -19.4392
  3    29.7200   21.6315    5.3423   10.6883   32.5748  -18.2824   61.5455    8.0885
  4    54.6700   57.9689    3.7403   50.3072   65.6305   18.8272   97.1105   -3.2989
  5    68.3900   77.6516    3.4229   70.6401   84.6631   38.6320  116.6712   -9.2616
  6   119.0400  120.0452    4.4949  110.8378  129.2526   80.5718  159.5185   -1.0052
  7   116.1400  135.1858    5.2599  124.4114  145.9601   95.3178  175.0537  -19.0458
  8   146.4900  147.2982    5.9426  135.1253  159.4711  107.0298  187.5666   -0.8082
  9    81.8100   89.0070    3.4925   81.8531   96.1610   49.9616  128.0525   -7.1970
 10    19.5800    8.7620    6.0757   -3.6835   21.2076  -31.5897   49.1138   10.8180
 11   141.0800  126.1014    4.7863  116.2970  135.9058   86.4846  165.7183   14.9786
 12   101.7200  112.4749    4.1651  103.9432  121.0066   73.1537  151.7961  -10.7549
 13    24.8800   16.3323    5.6378    4.7838   27.8808  -23.7518   56.4165    8.5477
 14    19.4300    8.7620    6.0757   -3.6835   21.2076  -31.5897   49.1138   10.6680
 15    39.6300   63.2681    3.6042   55.8853   70.6509   24.1800  102.3561  -23.6381
 16   151.1300  135.9428    5.3010  125.0842  146.8013   96.0520  175.8336   15.1872
 17    79.1800   92.7922    3.5565   85.5069  100.0774   53.7224  131.8619  -13.6122
 18   204.9400  146.5412    5.8985  134.4586  158.6238  106.2999  186.7824   58.3988
 19    81.0600   96.5773    3.6396   89.1220  104.0327   57.4755  135.6791  -15.5173
 20    37.6200   61.7540    3.6396   54.2987   69.2094   22.6522  100.8558  -24.1340
 21    17.1300   -3.3504    6.8070  -17.2939   10.5931  -44.1890   37.4882   20.4804
 22    37.8100   46.6135    4.1345   38.1443   55.0826    7.3057   85.9212   -8.8035
 23   130.7200  134.4287    5.2190  123.7382  145.1193   94.5833  174.2741   -3.7087
 24    26.0700    8.0050    6.1204   -4.5321   20.5422  -32.3750   48.3851   18.0650
 25    39.5900   36.7721    4.5657   27.4197   46.1245   -2.7353   76.2795    2.8179
 26    66.2000   79.1657    3.4212   72.1576   86.1737   40.1467  118.1847  -12.9657
 27   160.2500  129.8866    4.9789  119.6878  140.0853   90.1703  169.6028   30.3634
 28    19.3900   17.0894    5.5950    5.6286   28.5501  -22.9696   57.1483    2.3006
 29    86.6000  113.9890    4.2276  105.3292  122.6487   74.6397  153.3382  -27.3890
 30    47.2700   60.2400    3.6778   52.7063   67.7736   21.1231   99.3568  -12.9700

Sum of Residuals                          0
Sum of Squared Residuals         9831.89259
Predicted Residual SS (PRESS)         11542


LINEAR REGRESSION OF LOGPRICE VS. QUANTITY

The REG Procedure
Model: MODEL1

Model Crossproducts X'X X'Y Y'Y

Variable        Intercept        QUANTITY        LOGPRICE
Intercept              30             273    123.86106074
QUANTITY              273         3688.98    990.08475122
LOGPRICE     123.86106074    990.08475122    527.52302023

Dependent Variable: LOGPRICE

Number of Observations Read    30
Number of Observations Used    30

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1          15.59165       15.59165     799.63    <.0001
Error              28           0.54596        0.01950
Corrected Total    29          16.13761

Root MSE          0.13964    R-Square    0.9662
Dependent Mean    4.12870    Adj R-Sq    0.9650
Coeff Var         3.38210

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1               5.16397           0.04461     115.75      <.0001
QUANTITY      1              -0.11377           0.00402     -28.28      <.0001


LINEAR REGRESSION OF LOGPRICE VS. QUANTITY

The REG Procedure
Model: MODEL1
Dependent Variable: LOGPRICE

Output Statistics

     Dependent Predicted Std Error      95% CL Mean        95% CL Predict
Obs   Variable     Value Mean Pred     Lower     Upper     Lower     Upper   Residual
  1     5.0325    5.0502    0.0414    4.9654    5.1350    4.7519    5.3485    -0.0177
  2     4.3056    4.3449    0.0266    4.2903    4.3994    4.0537    4.6360    -0.0393
  3     3.3918    3.2641    0.0398    3.1825    3.3456    2.9667    3.5615     0.1277
  4     4.0013    3.8102    0.0279    3.7531    3.8673    3.5185    4.1018     0.1912
  5     4.2252    4.1059    0.0255    4.0537    4.1582    3.8152    4.3967     0.1193
  6     4.7795    4.7430    0.0335    4.6744    4.8116    4.4489    5.0372     0.0364
  7     4.7548    4.9706    0.0392    4.8903    5.0509    4.6735    5.2677    -0.2158
  8     4.9870    5.1526    0.0443    5.0619    5.2433    4.8525    5.4527    -0.1656
  9     4.4044    4.2766    0.0260    4.2233    4.3299    3.9856    4.5676     0.1278
 10     2.9745    3.0707    0.0453    2.9779    3.1634    2.7700    3.3714    -0.0962
 11     4.9493    4.8340    0.0357    4.7610    4.9071    4.5388    5.1293     0.1153
 12     4.6222    4.6293    0.0310    4.5657    4.6928    4.3363    4.9223  -0.007046
 13     3.2141    3.1844    0.0420    3.0984    3.2705    2.8858    3.4831     0.0296
 14     2.9668    3.0707    0.0453    2.9779    3.1634    2.7700    3.3714    -0.1039
 15     3.6796    3.8898    0.0269    3.8348    3.9448    3.5985    4.1811    -0.2102
 16     5.0181    4.9819    0.0395    4.9010    5.0629    4.6847    5.2792     0.0362
 17     4.3717    4.3335    0.0265    4.2792    4.3878    4.0423    4.6246     0.0382
 18     5.3227    5.1412    0.0440    5.0512    5.2313    4.8413    5.4411     0.1815
 19     4.3952    4.3904    0.0271    4.3348    4.4459    4.0990    4.6817   0.004827
 20     3.6275    3.8670    0.0271    3.8115    3.9226    3.5757    4.1584    -0.2395
 21     2.8408    2.8887    0.0507    2.7848    2.9926    2.5843    3.1930    -0.0478
 22     3.6326    3.6395    0.0308    3.5764    3.7026    3.3466    3.9324  -0.006937
 23     4.8731    4.9592    0.0389    4.8795    5.0389    4.6623    5.2561    -0.0861
 24     3.2608    3.0593    0.0456    2.9659    3.1527    2.7584    3.3602     0.2015
 25     3.6786    3.4916    0.0340    3.4219    3.5613    3.1972    3.7860     0.1870
 26     4.1927    4.1287    0.0255    4.0765    4.1809    3.8379    4.4195     0.0640
 27     5.0767    4.8909    0.0371    4.8149    4.9669    4.5950    5.1869     0.1858
 28     2.9648    3.1958    0.0417    3.1104    3.2812    2.8973    3.4943    -0.2311
 29     4.4613    4.6520    0.0315    4.5875    4.7166    4.3588    4.9452    -0.1907
 30     3.8559    3.8443    0.0274    3.7881    3.9004    3.5528    4.1358     0.0116

Sum of Residuals                       0
Sum of Squared Residuals         0.54596
Predicted Residual SS (PRESS)    0.63012


QUADRATIC REGRESSION OF PRICE VS. QUANTITY

The REG Procedure
Model: MODEL1

Model Crossproducts X'X X'Y Y'Y

Variable     Intercept     QUANTITY          QUANSQ          PRICE
Intercept           30          273         3688.98        2374.97
QUANTITY           273      3688.98       57017.832       12492.46
QUANSQ         3688.98    57017.832     942040.4526    127145.9652
PRICE          2374.97     12492.46     127145.9652    266887.1815

Dependent Variable: PRICE

Number of Observations Read    30
Number of Observations Used    30

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2             74008          37004     205.45    <.0001
Error              27        4862.98599      180.11059
Corrected Total    29             78871

Root MSE          13.42053    R-Square    0.9383
Dependent Mean    79.16567    Adj R-Sq    0.9338
Coeff Var         16.95246

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1             169.38879           5.90606      28.68      <.0001
QUANTITY      1             -15.23747           1.51008     -10.09      <.0001
QUANSQ        1               0.39391           0.07500       5.25      <.0001


QUADRATIC REGRESSION OF PRICE VS. QUANTITY

The REG Procedure
Model: MODEL1
Dependent Variable: PRICE

Output Statistics

     Dependent Predicted Std Error      95% CL Mean        95% CL Predict
Obs   Variable     Value Mean Pred     Lower     Upper     Lower     Upper  Residual
  1   153.3200  154.5452    4.7936  144.7095  164.3809  125.3047  183.7857   -1.2252
  2    74.1100   80.0994    3.6195   72.6729   87.5259   51.5789  108.6199   -5.9894
  3    29.7200   24.7813    3.8728   16.8349   32.7278   -3.8790   53.4416    4.9387
  4    54.6700   43.8449    3.7956   36.0569   51.6328   15.2281   72.4616   10.8251
  5    68.3900   61.7498    3.8956   53.7568   69.7429   33.0766   90.4231    6.6402
  6   119.0400  118.4028    3.2344  111.7664  125.0392   90.0778  146.7279    0.6372
  7   116.1400  144.6235    4.1737  136.0599  153.1871  115.7860  173.4610  -28.4835
  8   146.4900  167.8690    5.7838  156.0016  179.7363  137.8840  197.8540  -21.3790
  9    81.8100   74.5022    3.7259   66.8572   82.1471   45.9240  103.0803    7.3078
 10    19.5800   22.3824    5.0655   11.9889   32.7759   -7.0504   51.8152   -2.8024
 11   141.0800  128.5129    3.4586  121.4166  135.6093  100.0766  156.9493   12.5671
 12   101.7200  106.4742    3.1943   99.9201  113.0283   78.1683  134.7801   -4.7542
 13    24.8800   23.5178    4.2632   14.7704   32.2652   -5.3748   52.4104    1.3622
 14    19.4300   22.3824    5.0655   11.9889   32.7759   -7.0504   51.8152   -2.9524
 15    39.6300   48.1415    3.8674   40.2062   56.0768   19.4843   76.7987   -8.5115
 16   151.1300  146.0172    4.2535  137.2897  154.7448  117.1306  174.9039    5.1128
 17    79.1800   79.1469    3.6383   71.6817   86.6120   50.6162  107.6775    0.0331
 18   204.9400  166.3570    5.6639  154.7357  177.9784  136.4685  196.2455   38.5830
 19    81.0600   83.9885    3.5410   76.7229   91.2541   55.5095  112.4676   -2.9285
 20    37.6200   46.8745    3.8496   38.9757   54.7733   18.2274   75.5216   -9.2545
 21    17.1300   22.2044    6.8875    8.0724   36.3365   -8.7469   53.1557   -5.0744
 22    37.8100   35.9376    3.5916   28.5683   43.3069    7.4320   64.4433    1.8724
 23   130.7200  143.2376    4.0968  134.8317  151.6435  114.4465  172.0287  -12.5176
 24    26.0700   22.3122    5.1608   11.7231   32.9013   -7.1903   51.8147    3.7578
 25    39.5900   30.5186    3.4799   23.3784   37.6588    2.0712   58.9659    9.0714
 26    66.2000   63.3477    3.8824   55.3817   71.3138   34.6820   92.0135    2.8523
 27   160.2500  135.0878    3.7008  127.4944  142.6812  106.5234  163.6522   25.1622
 28    19.3900   23.6747    4.1986   15.0598   32.2896   -5.1781   52.5275   -4.2847
 29    86.6000  108.7969    3.1850  102.2617  115.3320   80.4954  137.0984  -22.1969
 30    47.2700   45.6390    3.8296   37.7814   53.4967   17.0032   74.2748    1.6310

Sum of Residuals                          0
Sum of Squared Residuals         4862.98599
Predicted Residual SS (PRESS)    6362.25589


6-4.2 Categorical Regressors

• Many problems may involve qualitative or categorical variables.

• The usual method of accounting for the different levels of a qualitative variable is to use indicator variables.

• For example, to introduce the effect of two different operators into a regression model, we could define an indicator variable as follows:

    x = 0 if the observation is from operator 1
    x = 1 if the observation is from operator 2

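Example 6-10

Y = gas mileage, x1 = engine displacement, x2 = horsepower
x3 = 0 if automatic transmission, 1 if manual transmission

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$

if automatic (x3=0), then $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

if manual (x3=1), then $Y = (\beta_0 + \beta_3) + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

This model may be unreasonable because it forces the effects of x1 and x2 to be the same for both transmission types; the interactions of x1 and x2 with x3 are not in the model.

Interaction model: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3 + \epsilon$

if automatic (x3=0), then $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$

if manual (x3=1), then $Y = (\beta_0 + \beta_3) + (\beta_1 + \beta_{13}) x_1 + (\beta_2 + \beta_{23}) x_2 + \epsilon$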


Dummy Variables

Many times a qualitative variable seems to be needed in a regression model. This can be accomplished by creating dummy variables or indicator variables. If a qualitative variable has r levels, you will need r-1 dummy variables. Notice that in ANOVA, if a treatment had r levels it had r-1 degrees of freedom. The ith dummy variable is defined as

    D_i = 1 if the observation is from level i, 0 otherwise (i = 1, ..., r-1)

This can be done automatically in PROC GLM by using the CLASS statement as we did in ANOVA. Any dummy variables defined with respect to a qualitative variable must be treated as a group. Individual t-tests are not meaningful. Partial F-tests must be performed on the group of dummy variables.
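The dummy-variable coding described above can be sketched in a few lines. This is an illustration only, not from the original slides: the helper name `dummy_code`, the choice of reference level, and the short list of condition codes are all assumptions made for the example.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per non-reference level.

    A variable with r levels yields r-1 columns; the reference level is
    the one whose rows are all zeros.
    """
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical condition codes with three levels (E, F, G); E is the reference.
cond = ["F", "G", "G", "E", "G", "F"]
dummies = dummy_code(cond, reference="E")
for level, column in dummies.items():
    print(level, column)
```

Three levels give two indicator columns, matching the r-1 rule; which level serves as the reference only changes the interpretation of the coefficients, not the fit.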


Example 6-11

OPTIONS NOOVP NODATE NONUMBER;
DATA EX611;
  INPUT FORM SCENT COLOR RESIDUE REGION QUALITY @@;
  /* Indicator variable: 0 for region 1, 1 otherwise */
  IF REGION=1 THEN REGION1=0; ELSE REGION1=1;
  /* Interaction terms between the indicator and the regressors */
  FR=FORM*REGION1;
  RR=RESIDUE*REGION1;
CARDS;
6.3 5.3 4.8 3.1 1 91   4.4 4.9 3.5 3.9 1 87
3.9 5.3 4.8 4.7 1 82   5.1 4.2 3.1 3.6 1 83
5.6 5.1 5.5 5.1 1 83   4.6 4.7 5.1 4.1 1 84
4.8 4.8 4.8 3.3 1 90   6.5 4.5 4.3 5.2 1 84
8.7 4.3 3.9 2.9 1 97   8.3 3.9 4.7 3.9 1 93
5.1 4.3 4.5 3.6 1 82   3.3 5.4 4.3 3.6 1 84
5.9 5.7 7.2 4.1 2 87   7.7 6.6 6.7 5.6 2 80
7.1 4.4 5.8 4.1 2 84   5.5 5.6 5.6 4.4 2 84
6.3 5.4 4.8 4.6 2 82   4.3 5.5 5.5 4.1 2 79
4.6 4.1 4.3 3.1 2 81   3.4 5.0 3.4 3.4 2 83
6.4 5.4 6.6 4.8 2 81   5.5 5.3 5.3 3.8 2 84
4.7 4.1 5.0 3.7 2 83   4.1 4.0 4.1 4.0 2 80
;
PROC REG DATA=EX611;
  MODEL QUALITY=FORM RESIDUE REGION1/R;
  TITLE 'MODEL WITH DUMMY VARIABLE';
PROC REG DATA=EX611;
  MODEL QUALITY=FORM RESIDUE REGION1 FR RR/R;
  TITLE 'INTERACTION MODEL WITH DUMMY VARIABLE';
RUN; QUIT;


MODEL WITH DUMMY VARIABLE

The REG Procedure
Model: MODEL1
Dependent Variable: QUALITY

Number of Observations Read    24
Number of Observations Used    24

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3         339.74858      113.24953      23.05    <.0001
Error              20          98.25142        4.91257
Corrected Total    23         438.00000

Root MSE           2.21643    R-Square    0.7757
Dependent Mean    84.50000    Adj R-Sq    0.7420
Coeff Var          2.62300

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              89.80615           2.99018      30.03      <.0001
FORM          1               1.81923           0.32599       5.58      <.0001
RESIDUE       1              -3.37945           0.68582      -4.93      <.0001
REGION1       1              -3.40619           0.91941      -3.70      0.0014


INTERACTION MODEL WITH DUMMY VARIABLE

The REG Procedure
Model: MODEL1
Dependent Variable: QUALITY

Number of Observations Read    24
Number of Observations Used    24

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5         342.36626       68.47325      12.89    <.0001
Error              18          95.63374        5.31299
Corrected Total    23         438.00000

Root MSE           2.30499    R-Square    0.7817
Dependent Mean    84.50000    Adj R-Sq    0.7210
Coeff Var          2.72780

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              88.25694           4.83957      18.24      <.0001
FORM          1               1.98252           0.42919       4.62      0.0002
RESIDUE       1              -3.21529           0.95251      -3.38      0.0034
REGION1       1              -1.70670           6.57164      -0.26      0.7980
FR            1              -0.64190           0.94343      -0.68      0.5049
RR            1               0.43032           1.89360       0.23      0.8228
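The interaction model implies a separate fitted equation for each region, obtained by setting REGION1 to 0 or 1. The sketch below is not part of the original slides; the helper `region_equation` is a hypothetical name, and the coefficients are copied from the Parameter Estimates of the interaction model.

```python
# Estimates copied from the INTERACTION MODEL WITH DUMMY VARIABLE output.
coef = {
    "Intercept": 88.25694, "FORM": 1.98252, "RESIDUE": -3.21529,
    "REGION1": -1.70670, "FR": -0.64190, "RR": 0.43032,
}


def region_equation(coef, region1):
    """Intercept and slopes of the fitted equation for REGION1 = 0 or 1."""
    return {
        "intercept": coef["Intercept"] + coef["REGION1"] * region1,
        "FORM": coef["FORM"] + coef["FR"] * region1,
        "RESIDUE": coef["RESIDUE"] + coef["RR"] * region1,
    }


for region1 in (0, 1):
    print(region1, region_equation(coef, region1))
```

The REGION1 coefficient shifts the intercept, while FR and RR shift the FORM and RESIDUE slopes; with REGION1 = 0 the equation reduces to the base coefficients.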


Example

OPTIONS NOOVP NODATE NONUMBER;
DATA appraise;
  INPUT price units age size parking area cond$ @@;
  /* Two indicator variables for the three condition levels (F, G, E) */
  IF COND='F' THEN COND1=1; ELSE COND1=0;
  IF COND='G' THEN COND2=1; ELSE COND2=0;
CARDS;
 90300  4 82  4635  0  4266 F   384000 20 13 17798  0 14391 G
157500  5 66  5913  0  6615 G   676200 26 64  7750  6 34144 E
165000  5 55  5150  0  6120 G   300000 10 65 12506  0 14552 G
108750  4 82  7160  0  3040 G   276538 11 23  5120  0  7881 G
420000 20 18 11745 20 12600 G   950000 62 71 21000  3 39448 G
560000 26 74 11221  0 30000 G   268000 13 56  7818 13  8088 F
290000  9 76  4900  0 11315 E   173200  6 21  5424  6  4461 G
323650 11 24 11834  8  9000 G   162500  5 19  5246  5  3828 G
353500 20 62 11223  2 13680 F   134400  4 70  5834  0  4680 E
187000  8 19  9075  0  7392 G    93600  4 82  6864  0  3840 F
110000  4 50  4510  0  3092 G   573200 14 10 11192  0 23704 E
 79300  4 82  7425  0  3876 F   272000  5 82  7500  0  9542 E
;
PROC REG DATA=APPRAISE;
  MODEL PRICE=UNITS AGE AREA COND1 COND2/R;
  TITLE 'REDUCED MODEL WITH DUMMY VARIABLE';
RUN; QUIT;


REDUCED MODEL WITH DUMMY VARIABLE

The REG Procedure
Model: MODEL1
Dependent Variable: price

Number of Observations Read    24
Number of Observations Used    24

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5       1.040122E12    2.080244E11     253.02    <.0001
Error              18       14799036255      822168681
Corrected Total    23       1.054921E12

Root MSE           28673      R-Square    0.9860
Dependent Mean    296193      Adj R-Sq    0.9821
Coeff Var        9.68067

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1                176940             24044       7.36      <.0001
units         1            7256.44825        1185.16990       6.12      <.0001
age           1           -1155.95291         258.66820      -4.47      0.0003
area          1              11.86451           1.54837       7.66      <.0001
COND1         1                -61240             22434      -2.73      0.0138
COND2         1                -61572             18756      -3.28      0.0041

Model            R-Square    Adj R-Sq    Root MSE
Full model         0.9801      0.9746       34123
Reduced model      0.9771      0.9737       34721
With dummy         0.9860      0.9821       28673

REDUCED MODEL WITH DUMMY VARIABLE

The REG Procedure
Model: MODEL1
Dependent Variable: price

Output Statistics

     Dependent  Predicted  Std Error             Std Error    Student                    Cook's
Obs   Variable      Value  Mean Pred   Residual   Residual   Residual    -2-1 0 1 2          D
  1      90300     100552      13370     -10252      25365     -0.404   |      |      |   0.008
  2     384000     416212      11510     -32212      26262     -1.227   |    **|      |   0.048
  3     157500     153841      10976       3659      26489      0.138   |      |      |   0.001
  4     676200     696729      18464     -20529      21938     -0.936   |     *|      |   0.103
  5     165000     160684       9531       4316      27043      0.160   |      |      |   0.001
  6     300000     285448      13119      14552      25496      0.571   |      |*     |   0.014
  7     108750      85674      14143      23076      24943      0.925   |      |*     |   0.046
  8     276538     262106       9519      14432      27047      0.534   |      |*     |   0.006
  9     420000     389183      11460      30817      26284      1.172   |      |**    |   0.044
 10     950000     951227      26362      -1227      11278     -0.109   |      |      |   0.011
 11     560000     574431      19411     -14431      21104     -0.684   |     *|      |   0.066
 12     268000     241261      13868      26739      25097      1.065   |      |**    |   0.058
 13     290000     288643      14681       1357      24630     0.0551   |      |      |   0.000
 14     173200     187560      10295     -14360      26762     -0.537   |     *|      |   0.007
 15     323650     274227       9146      49423      27176      1.819   |      |***   |   0.062
 16     162500     175105      10710     -12605      26598     -0.474   |      |      |   0.006
 17     353500     351466      14254       2034      24879     0.0817   |      |      |   0.000
 18     134400     180575      17168     -46175      22966     -2.011   |  ****|      |   0.377
 19     187000     239159      10122     -52159      26827     -1.944   |   ***|      |   0.090
 20      93600      95497      13315      -1897      25395    -0.0747   |      |      |   0.000
 21     110000     123281       9601     -13281      27018     -0.492   |      |      |   0.005
 22     573200     548207      20456      24993      20093      1.244   |      |**    |   0.267
 23      79300      95924      13318     -16624      25393     -0.655   |     *|      |   0.020
 24     272000     231646      15028      40354      24420      1.653   |      |***   |   0.172

Sum of Residuals                           0
Sum of Squared Residuals         14799036255
Predicted Residual SS (PRESS)    26406332001


Analysis of Covariance

Suppose we have the following setup.

              Treatment
   1          2         ...        r
 X   Y      X   Y       ...      X   Y

Suppose X and Y are linearly related. We are interested in comparing the means of Y at the different levels of the treatment. Suppose a plot of the data looks like the following.

[Figure: scatter plot of Y against X; each point is plotted as its treatment number (1, 2, 3, or 4).]

Why Use Covariates?

Concomitant variables or covariates are used to adjust for factors that influence the Y measurements. In randomized block designs, we did the same thing, but there we could control the value of the block variable. Now we assume we can measure the variable, but not control it. The plot on the previous page demonstrates why we need covariates in some situations. If the covariate (X) were ignored, we would most likely conclude that treatment level 3 resulted in a larger mean than levels 1 and 4 but not different from level 2. If the linear relation is extended, we see that the value of Y in level 3 could very well be less than that of level 1, nearly equal to that of level 4, and surely less than that of level 2.

One assumption we need, equivalent to the no-interaction assumption in two-way ANOVA, is that the slope of the linear relationship between X and Y is the same in each treatment level.


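Checking for Equal Slopes

The model we fit first:

$Y = \beta_0 + \sum_{i=1}^{r-1} \alpha_i D_i + \beta_1 X + \sum_{i=1}^{r-1} \gamma_i D_i X + \epsilon$

where $D_i$ = 1 if the observation is from treatment i and 0 otherwise.

Treatment = 1:    Y-intercept = $\beta_0 + \alpha_1$,  slope = $\beta_1 + \gamma_1$

Treatment = r-1:  Y-intercept = $\beta_0 + \alpha_{r-1}$,  slope = $\beta_1 + \gamma_{r-1}$

Treatment = r:    Y-intercept = $\beta_0$,  slope = $\beta_1$

The test of equal slopes is $H_0: \gamma_1 = \gamma_2 = \cdots = \gamma_{r-1} = 0$.

If we fail to reject this, we refit the model without the interaction term and test $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_{r-1} = 0$.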


EXAMPLE

Four different formulations of an industrial glue are being tested. The tensile strength of the glue is also related to the thickness. Five observations on strength (Y) and thickness (X, in 0.01 inches) are obtained for each formulation. The data are shown in the following table.

                Glue Formulation
     1            2            3            4
  y     x      y     x      y     x      y     x
46.5   13    48.7   12    46.3   15    44.7   16
45.9   14    49.0   10    47.1   14    43.0   15
49.8   12    50.1   11    48.9   11    51.0   10
46.1   12    48.5   12    48.2   11    48.1   12
44.3   14    45.2   14    50.3   10    48.6   11


Example

OPTIONS NOOVP NODATE NONUMBER;
DATA GLUE;
  INPUT FORMULA STRENGTH THICK @@;
CARDS;
1 46.5 13  1 45.9 14  1 49.8 12  1 46.1 12  1 44.3 14
2 48.7 12  2 49.0 10  2 50.1 11  2 48.5 12  2 45.2 14
3 46.3 15  3 47.1 14  3 48.9 11  3 48.2 11  3 50.3 10
4 44.7 16  4 43.0 15  4 51.0 10  4 48.1 12  4 48.6 11
;
PROC GLM DATA=GLUE;
  CLASS FORMULA;
  MODEL thick=FORMULA;
  TITLE 'Test differences in covariate means';
PROC GLM DATA=GLUE;
  CLASS FORMULA;
  MODEL STRENGTH=FORMULA THICK FORMULA*THICK;
  TITLE 'ANALYSIS OF COVARIANCE WITH INTERACTION';
PROC GLM DATA=GLUE;
  CLASS FORMULA;
  MODEL STRENGTH=FORMULA THICK/SOLUTION;
  LSMEANS FORMULA/PDIFF STDERR;
  TITLE 'ANALYSIS OF COVARIANCE WITHOUT INTERACTION';
RUN; QUIT;

SOLUTION produces a solution to the normal equations (parameter estimates). PROC GLM displays a solution by default when your model involves no classification variables, so you need this option only if you want to see the solution for models with classification effects.

PDIFF requests that p-values for differences of the LS-means be produced.

STDERR produces the standard error of the LS-means and the probability level for the hypothesis H0: LS-mean=0


Test differences in covariate means

The GLM Procedure

Class Level Information

Class      Levels    Values
FORMULA         4    1 2 3 4

Number of Observations Read    20
Number of Observations Used    20

Dependent Variable: THICK

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3        4.55000000     1.51666667       0.42    0.7442
Error              16       58.40000000     3.65000000
Corrected Total    19       62.95000000

R-Square    Coeff Var    Root MSE    THICK Mean
0.072280     15.34536    1.910497      12.45000

Source      DF      Type I SS    Mean Square    F Value    Pr > F
FORMULA      3     4.55000000     1.51666667       0.42    0.7442

Source      DF    Type III SS    Mean Square    F Value    Pr > F
FORMULA      3     4.55000000     1.51666667       0.42    0.7442


ANALYSIS OF COVARIANCE WITH INTERACTION

The GLM Procedure
Dependent Variable: STRENGTH

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               7       74.01777794    10.57396828       7.22    0.0016
Error              12       17.56772206     1.46397684
Corrected Total    19       91.58550000

R-Square    Coeff Var    Root MSE    STRENGTH Mean
0.808182     2.546457    1.209949         47.51500

Source            DF       Type I SS    Mean Square    F Value    Pr > F
FORMULA            3     11.05750000     3.68583333       2.52    0.1076
THICK              1     59.56576027    59.56576027      40.69    <.0001
THICK*FORMULA      3      3.39451766     1.13150589       0.77    0.5312

Source            DF     Type III SS    Mean Square    F Value    Pr > F
FORMULA            3      2.80437055     0.93479018       0.64    0.6046
THICK              1     41.34340945    41.34340945      28.24    0.0002
THICK*FORMULA      3      3.39451766     1.13150589       0.77    0.5312


ANALYSIS OF COVARIANCE WITHOUT INTERACTION

The GLM Procedure
Dependent Variable: STRENGTH

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4       70.62326027    17.65581507      12.63    0.0001
Error              15       20.96223973     1.39748265
Corrected Total    19       91.58550000

R-Square    Coeff Var    Root MSE    STRENGTH Mean
0.771118     2.487955    1.182152         47.51500

Source      DF       Type I SS    Mean Square    F Value    Pr > F
FORMULA      3     11.05750000     3.68583333       2.64    0.0876
THICK        1     59.56576027    59.56576027      42.62    <.0001

Source      DF     Type III SS    Mean Square    F Value    Pr > F
FORMULA      3      1.77104066     0.59034689       0.42    0.7397
THICK        1     59.56576027    59.56576027      42.62    <.0001

Parameter          Estimate       Standard Error    t Value    Pr > |t|
Intercept       60.00712329 B         2.04941586      29.28      <.0001
FORMULA 1       -0.35801370 B         0.74829823      -0.48      0.6392
FORMULA 2        0.21006849 B         0.76349365       0.28      0.7870
FORMULA 3        0.47404110 B         0.75339742       0.63      0.5387
FORMULA 4        0.00000000 B          .                .         .
THICK           -1.00993151           0.15469162      -6.53      <.0001

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.
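As a check on the equal-slopes test, the F value for THICK*FORMULA in the interaction fit can be recovered from the error sums of squares of the two STRENGTH models. The worked equation below is a sketch using only numbers printed in the two GLM listings for this example; the labels "interaction" and "no interaction" are shorthand introduced here.

```latex
F_0 = \frac{\left[ SS_E(\text{no interaction}) - SS_E(\text{interaction}) \right] / 3}
           {MS_E(\text{interaction})}
    = \frac{(20.96223973 - 17.56772206)/3}{1.46397684}
    = \frac{1.13150589}{1.46397684}
    \approx 0.77
```

The numerator degrees of freedom (3) are the three extra slope terms, and the value matches the reported F of 0.77 (p = 0.5312), so the equal-slopes hypothesis is not rejected.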


ANALYSIS OF COVARIANCE WITHOUT INTERACTION

The GLM Procedure
Least Squares Means

FORMULA    STRENGTH LSMEAN    Standard Error    Pr > |t|    LSMEAN Number
1               47.0754623         0.5354766      <.0001                1
2               47.6435445         0.5381512      <.0001                2
3               47.9075171         0.5300869      <.0001                3
4               47.4334760         0.5314395      <.0001                4

Least Squares Means for effect FORMULA
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: STRENGTH

i/j         1         2         3         4
1                0.4722    0.2895    0.6392
2      0.4722              0.7298    0.7870
3      0.2895    0.7298              0.5387
4      0.6392    0.7870    0.5387

NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used.
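The LS-means above are the fitted strengths for each formulation evaluated at the overall mean thickness. The sketch below, which is not part of the original slides, reproduces them from the SOLUTION estimates of the no-interaction model; the variable and function names are illustrative assumptions, and 12.45 is the THICK mean reported by the first PROC GLM step.

```python
# Estimates copied from the SOLUTION output (FORMULA 4 is the reference level).
intercept = 60.00712329
formula_effect = {1: -0.35801370, 2: 0.21006849, 3: 0.47404110, 4: 0.0}
thick_slope = -1.00993151
thick_mean = 12.45  # overall mean of the covariate THICK


def ls_mean(level):
    """Adjusted mean strength for one formulation at the mean thickness."""
    return intercept + formula_effect[level] + thick_slope * thick_mean


ls_means = {level: ls_mean(level) for level in formula_effect}
for level, value in ls_means.items():
    print(level, round(value, 7))
```

Because every formulation is evaluated at the same thickness, differences between LS-means equal differences between the FORMULA estimates, which is why the pairwise p-values against formulation 4 repeat the t-test p-values in the parameter table.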


6-4.3 Variable Selection Procedures

Best Subsets Regressions

Selection Techniques

1) $R^2$
2) $MS_E$
3) $C_p$
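The three criteria can be computed from quantities printed in any regression ANOVA table. The following is a minimal sketch, not from the original slides: the function name `selection_criteria` is hypothetical, adjusted R-square is included alongside R-square, and the illustration reuses the full and reduced models of Example 6-9, with the full-model error mean square as the variance estimate in $C_p$.

```python
def selection_criteria(sse, sst, n, p, mse_full):
    """R^2, adjusted R^2, MS_E and Mallows' C_p for one candidate model.

    p counts all estimated coefficients, including the intercept;
    mse_full is the error mean square of the model with all regressors.
    """
    mse = sse / (n - p)
    return {
        "r2": 1 - sse / sst,
        "adj_r2": 1 - mse / (sst / (n - 1)),
        "mse": mse,
        "cp": sse / mse_full - n + 2 * p,
    }


# Sums of squares from the two PROC REG fits of Example 6-9 (n = 16).
full = selection_criteria(11.37214, 2123.70937, n=16, p=6, mse_full=1.13721)
reduced = selection_criteria(170.73085, 2123.70937, n=16, p=3, mse_full=1.13721)
print(full)
print(reduced)
```

A candidate subset is usually preferred when its $C_p$ is small and close to p; here the reduced model's $C_p$ is far above 3, consistent with the significant partial F-test for the second-order terms.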

Backward Elimination

1. Start with all regressors in the model.
2. t-test: the regressor with the smallest absolute t-value is eliminated first.
3. Minitab for cut-off
4. form, residue, region

Forward Selection

1. Start with no regressors in the model.
2. The regressor with the largest absolute t-value is added first.
3. Minitab for cut-off
4. form, residue, region, scent

Stepwise Regression

1. Begins with a forward step, then backward elimination.
2. $t_{in} = t_{out}$
3. Minitab for cut-off
4. form, residue, region

Example

OPTIONS NODATE NOOVP NONUMBER;
DATA SALES;
  INFILE 'C:\users\myung\Documents\Teaching\ 학부과목 \imen214-stats\ch06\sales.dat';
  INPUT SALES TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
PROC CORR DATA=SALES;
  VAR SALES;
  WITH TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
  TITLE 'CORRELATIONS OF DEPENDENT WITH INDENDENTS';
PROC CORR DATA=SALES;
  VAR TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
  TITLE 'CORRELATIONS BETWEEN INDEPENDENT VARIABLES';
PROC REG DATA=SALES;
  MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/VIF R;
  TITLE 'REGRESSION MODEL WITH ALL VARIABLES';
PROC RSQUARE DATA=SALES CP;
  MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/ADJRSQ RMSE SSE;
  TITLE 'ALL POSSIBLE REGRESSIONS';
PROC STEPWISE DATA=SALES;
  MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/FORWARD;
PROC STEPWISE DATA=SALES;
  MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING/BACKWARD;
  TITLE 'STEPWISE REGRESSION USING BACKWARD ELIMINATION';
PROC STEPWISE DATA=SALES;
  MODEL SALES=TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING;
  TITLE 'STEPWISE REGRESSION THE STEPWISE TECHNIQUE';
PROC REG DATA=SALES;
  MODEL SALES=POTENT ADVERT SHARE ACCOUNTS/R;
  MODEL SALES=POTENT ADVERT SHARE CHANGE ACCOUNTS/R;
  MODEL SALES=TIME POTENT ADVERT SHARE CHANGE/R;
  MODEL SALES=TIME POTENT ADVERT SHARE ACCOUNTS/R;
  MODEL SALES=TIME POTENT ADVERT SHARE CHANGE WORKLOAD/R;
RUN; QUIT;

Page 52: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

CORRELATIONS OF DEPENDENT WITH INDENDENTS

Pearson Correlation Coefficients, N = 25
Prob > |r| under H0: Rho=0

Variable       SALES   Prob > |r|
TIME         0.62292       0.0009
POTENT       0.59781       0.0016
ADVERT       0.59618       0.0017
SHARE        0.48351       0.0143
CHANGE       0.48014       0.0151
ACCOUNTS     0.75399       <.0001
WORKLOAD    -0.11722       0.5768
RATING       0.40188       0.0464

CORRELATIONS BETWEEN INDEPENDENT VARIABLES

Pearson Correlation Coefficients, N = 25
Prob > |r| under H0: Rho=0
(first line of each pair: correlation; second line: Prob > |r|)

               TIME    POTENT    ADVERT     SHARE    CHANGE  ACCOUNTS  WORKLOAD    RATING

TIME        1.00000   0.45397   0.24919   0.10621   0.27512   0.75782  -0.17932   0.10113
                       0.0226    0.2297    0.6133    0.1832    <.0001    0.3911    0.6305

POTENT      0.45397   1.00000   0.17410  -0.21067   0.22570   0.47864  -0.25884   0.35870
             0.0226              0.4052    0.3121    0.2780    0.0155    0.2115    0.0783

ADVERT      0.24919   0.17410   1.00000   0.26446   0.34826   0.20004  -0.27223   0.41146
             0.2297    0.4052              0.2014    0.0880    0.3377    0.1880    0.0410

SHARE       0.10621  -0.21067   0.26446   1.00000   0.14686   0.40301   0.34935  -0.02356
             0.6133    0.3121    0.2014              0.4836    0.0458    0.0870    0.9110

CHANGE      0.27512   0.22570   0.34826   0.14686   1.00000   0.32344  -0.29839   0.49418
             0.1832    0.2780    0.0880    0.4836              0.1148    0.1474    0.0120

ACCOUNTS    0.75782   0.47864   0.20004   0.40301   0.32344   1.00000  -0.19885   0.22861
             <.0001    0.0155    0.3377    0.0458    0.1148              0.3406    0.2717

WORKLOAD   -0.17932  -0.25884  -0.27223   0.34935  -0.29839  -0.19885   1.00000  -0.27691
             0.3911    0.2115    0.1880    0.0870    0.1474    0.3406              0.1802

RATING      0.10113   0.35870   0.41146  -0.02356   0.49418   0.22861  -0.27691   1.00000
             0.6305    0.0783    0.0410    0.9110    0.0120    0.2717    0.1802

Page 53: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

REGRESSION MODEL WITH ALL VARIABLES

The REG Procedure
Model: MODEL1
Dependent Variable: SALES

Number of Observations Read    25
Number of Observations Used    25

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              8   38108913     4763614     23.30   <.0001
Error             16    3270636      204415
Corrected Total   24   41379549

Root MSE           452.12251    R-Square   0.9210
Dependent Mean    3374.56760    Adj R-Sq   0.8814
Coeff Var           13.39794

Parameter Estimates

                  Parameter     Standard                         Variance
Variable    DF     Estimate        Error   t Value   Pr > |t|   Inflation
Intercept    1  -1642.62908    768.07059     -2.14     0.0482           0
TIME         1      1.66830      1.97176      0.85     0.4100     3.43888
POTENT       1      0.03684      0.00826      4.46     0.0004     1.97767
ADVERT       1      0.15912      0.04722      3.37     0.0039     1.89317
SHARE        1    183.52338     68.76742      2.67     0.0168     3.35939
CHANGE       1    289.66240    196.48626      1.47     0.1598     1.55819
ACCOUNTS     1      6.49811      4.81615      1.35     0.1960     5.65732
WORKLOAD     1     25.67997     34.65316      0.74     0.4694     1.89904
RATING       1     15.01902    128.57870      0.12     0.9085     1.78590
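As a quick check on the parameter table, each t value is the estimate divided by its standard error, and each variance inflation factor corresponds to VIF_j = 1/(1 - R_j^2), where R_j^2 is the R-square from regressing that regressor on the others. The following is a minimal Python sketch using a few rows copied from the output; the function names are illustrative and are not part of SAS.

```python
# Rows copied from the Parameter Estimates table: (estimate, standard error, VIF).
rows = {
    "TIME":     (1.66830, 1.97176, 3.43888),
    "POTENT":   (0.03684, 0.00826, 1.97767),
    "ADVERT":   (0.15912, 0.04722, 1.89317),
    "ACCOUNTS": (6.49811, 4.81615, 5.65732),
}


def t_value(estimate, std_error):
    """t statistic for H0: coefficient = 0."""
    return estimate / std_error


def implied_r2(vif):
    """R-square of one regressor on the others, solved from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif


for name, (est, se, vif) in rows.items():
    print(f"{name:9s} t = {t_value(est, se):5.2f}   R_j^2 = {implied_r2(vif):.3f}")
```

ACCOUNTS has the largest VIF, which implies that roughly 82% of its variation is explained by the other regressors; this is consistent with its 0.75782 correlation with TIME.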

Page 54: 6-4 Other Aspects of Regression

All Possible Regressions

This is the brute-force method of modeling. It is feasible if the number of independent variables is small (fewer than 10 or so) and the sample size is not too large. With k candidate regressors there are 2^k - 1 possible models (255 for the eight regressors in the sales example). Some of the common quantities to look at are:

1) R-square should be large, and it should show an adequate increase when an additional variable is added.

2) Adj R-square should not be much less than R-square. It should show an increase if a variable is added.

3) Mallows Cp should be approximately the number of parameters in the model (including the y-intercept). This is a good measure to use to narrow down the possible models quickly; then use 1) and 2) to pick the final models.

4) The model should make sense.

Note: Many of the better methods of model selection are too time-consuming to use on all possible regressions. A number of good models can be chosen first, and the better methods can then be applied to them.
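The three numerical criteria can all be computed from a model's error sum of squares (SSE). The sketch below assumes the usual textbook definitions (adjusted R-square = 1 - MSE/(SST/(n-1)) and Cp = SSE_p/MSE_full - (n - 2p), with p counting the intercept) and plugs in values taken from the SAS output for the sales example; the function names are illustrative.

```python
N = 25               # observations in the sales example
SST = 41379549       # corrected total sum of squares
SSE_FULL = 3270636   # SSE of the model with all 8 regressors
K_FULL = 8           # number of regressors in the full model


def r_square(sse):
    return 1 - sse / SST


def adj_r_square(sse, k):
    """k = number of regressors; p = k + 1 parameters including the intercept."""
    p = k + 1
    return 1 - (sse / (N - p)) / (SST / (N - 1))


def mallows_cp(sse, k):
    p = k + 1
    mse_full = SSE_FULL / (N - K_FULL - 1)
    return sse / mse_full - (N - 2 * p)


# SSE values copied from the all-possible-regressions listing.
models = {
    "ACCOUNTS": (17855475, 1),
    "POTENT ADVERT SHARE ACCOUNTS": (4119346, 4),
    "POTENT ADVERT SHARE CHANGE ACCOUNTS": (3644230, 5),
}

for name, (sse, k) in models.items():
    print(f"{name:38s} R2={r_square(sse):.4f}  "
          f"AdjR2={adj_r_square(sse, k):.4f}  Cp={mallows_cp(sse, k):.4f}")
```

For the four-variable model, Cp is about 5.15 against p = 5 parameters, which is why that model stands out under criterion 3).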

6-4 Other Aspects of Regression

Page 55: 6-4 Other Aspects of Regression

Example

ALL POSSIBLE REGRESSIONS

The RSQUARE Procedure
Model: MODEL1
Dependent Variable: SALES
R-Square Selection Method

Number in            Adjusted
  Model   R-Square   R-Square       C(p)    Root MSE        SSE   Variables in Model

 1  0.5685  0.5497   66.3492   881.09306  17855475  ACCOUNTS
 1  0.3880  0.3614  102.8808  1049.28700  25323074  TIME
 1  0.3574  0.3294  109.0853  1075.24242  26591364  POTENT
 1  0.3554  0.3274  109.4803  1076.87335  26672093  ADVERT
 1  0.2338  0.2005  134.1049  1174.09975  31705735  SHARE
 1  0.2305  0.1971  134.7633  1176.58902  31840319  CHANGE
 1  0.1615  0.1250  148.7358  1228.22772  34696497  RATING
 1  0.0137  -.0291  178.6477  1332.06192  40810946  WORKLOAD
------------------------------------------------------------------------
 2  0.7751  0.7547   26.5262   650.39250   9306229  ADVERT ACCOUNTS
 2  0.7461  0.7230   32.4045   691.10682  10507830  POTENT SHARE
 2  0.6413  0.6087   53.6096   821.37564  14842475  POTENT ACCOUNTS
 2  0.6308  0.5973   55.7287   833.27546  15275656  CHANGE ACCOUNTS
 2  0.6241  0.5899   57.0986   840.87809  15555671  ACCOUNTS RATING
 2  0.6071  0.5714   60.5323   859.63994  16257578  POTENT ADVERT
 2  0.6070  0.5713   60.5495   859.73297  16261097  SHARE ACCOUNTS
 2  0.5953  0.5586   62.9144   872.41841  16744505  TIME ADVERT
 2  0.5747  0.5361   67.0862   894.35859  17597300  TIME ACCOUNTS
 2  0.5696  0.5305   68.1238   899.73205  17809391  ACCOUNTS WORKLOAD
 2  0.5642  0.5246   69.2192   905.37075  18033316  TIME SHARE
 2  0.5130  0.4688   79.5768   957.04432  20150544  TIME POTENT
 2  0.5041  0.4590   81.3931   965.82101  20521825  TIME RATING
 2  0.4912  0.4449   84.0028   978.29397  21055300  TIME CHANGE
 2  0.4829  0.4359   85.6674   986.16706  21395560  POTENT CHANGE
 2  0.4696  0.4214   88.3710   998.82260  21948225  ADVERT SHARE
 2  0.4399  0.3890   94.3726  1026.35798  23175036  ADVERT CHANGE
 2  0.4049  0.3508  101.4741  1058.01462  24626688  SHARE CHANGE
 2  0.4047  0.3505  101.5122  1058.18173  24634469  SHARE RATING
 2  0.3977  0.3429  102.9229  1064.35711  24922833  POTENT RATING
 2  0.3881  0.3324  104.8745  1072.84183  25321771  TIME WORKLOAD
 2  0.3849  0.3290  105.5063  1075.57456  25450934  ADVERT RATING
 2  0.3589  0.3006  110.7800  1098.11716  26528948  POTENT WORKLOAD
 2  0.3576  0.2992  111.0360  1099.19970  26581280  ADVERT WORKLOAD
 2  0.3270  0.2659  117.2276  1125.06422  27846929  SHARE WORKLOAD
 2  0.2664  0.1997  129.5062  1174.67346  30356870  CHANGE RATING
 2  0.2313  0.1614  136.6126  1202.45031  31809508  CHANGE WORKLOAD
 2  0.1615  0.0853  150.7280  1255.80320  34694917  WORKLOAD RATING
------------------------------------------------------------------------
 3  0.8490  0.8274   13.5717   545.51455   6249309  POTENT ADVERT SHARE
 3  0.8277  0.8031   17.8740   582.63576   7128753  POTENT ADVERT ACCOUNTS
 3  0.8121  0.7853   21.0291   608.42176   7773718  TIME POTENT SHARE
 3  0.7999  0.7713   23.5119   627.96859   8281235  POTENT SHARE CHANGE
 3  0.7969  0.7679   24.1038   632.53902   8402218  ADVERT ACCOUNTS WORKLOAD
 3  0.7886  0.7584   25.7910   645.39107   8747122  ADVERT CHANGE ACCOUNTS
 3  0.7885  0.7583   25.8037   645.48671   8749715  POTENT SHARE ACCOUNTS
 3  0.7862  0.7557   26.2800   649.06760   8847064  ADVERT SHARE ACCOUNTS
 3  0.7795  0.7480   27.6296   659.11017   9122951  ADVERT ACCOUNTS RATING
 3  0.7752  0.7431   28.5071   665.55835   9302326  TIME ADVERT ACCOUNTS
 3  0.7735  0.7411   28.8570   668.11220   9373852  POTENT SHARE RATING
 3  0.7730  0.7405   28.9589   668.85366   9394670  POTENT SHARE WORKLOAD
 3  0.6991  0.6562   43.9012   769.94432  12449099  TIME POTENT ADVERT
 3  0.6959  0.6524   44.5686   774.15195  12585536  TIME ADVERT SHARE

Page 56: 6-4 Other Aspects of Regression

Example

Number in            Adjusted
  Model   R-Square   R-Square       C(p)    Root MSE        SSE   Variables in Model

 3  0.6931  0.6492   45.1331   777.69275  12700926  POTENT CHANGE ACCOUNTS
 3  0.6905  0.6463   45.6507   780.92497  12806720  TIME SHARE RATING
 3  0.6764  0.6302   48.4975   798.47034  13388653  SHARE ACCOUNTS RATING
 3  0.6683  0.6209   50.1523   808.49396  13726912  POTENT ACCOUNTS RATING
 3  0.6675  0.6200   50.3000   809.38278  13757110  SHARE CHANGE ACCOUNTS
 3  0.6496  0.5995   53.9337   830.94586  14499891  CHANGE ACCOUNTS RATING
 3  0.6488  0.5987   54.0842   831.82696  14530658  POTENT ADVERT CHANGE
 3  0.6488  0.5986   54.1030   831.93707  14534505  POTENT ACCOUNTS WORKLOAD
 3  0.6426  0.5916   55.3403   839.14431  14787427  TIME POTENT ACCOUNTS
 3  0.6408  0.5894   55.7197   841.34191  14864980  CHANGE ACCOUNTS WORKLOAD
 3  0.6379  0.5862   56.2905   844.63762  14981667  TIME SHARE CHANGE
 3  0.6354  0.5833   56.8050   847.59673  15086825  TIME ACCOUNTS RATING
 3  0.6353  0.5832   56.8220   847.69443  15090303  TIME CHANGE ACCOUNTS
 3  0.6336  0.5813   57.1602   849.63429  15159447  POTENT ADVERT WORKLOAD
 3  0.6328  0.5803   57.3345   850.63171  15195060  ACCOUNTS WORKLOAD RATING
 3  0.6327  0.5802   57.3508   850.72543  15198409  TIME ADVERT CHANGE
 3  0.6305  0.5777   57.7955   853.26569  15289309  TIME SHARE ACCOUNTS
 3  0.6251  0.5716   58.8861   859.46363  15512232  TIME ADVERT RATING
 3  0.6118  0.5563   61.5843   874.60988  16063791  SHARE ACCOUNTS WORKLOAD
 3  0.6073  0.5511   62.5034   879.70951  16251665  POTENT ADVERT RATING
 3  0.6066  0.5504   62.6310   880.41503  16277743  TIME ADVERT WORKLOAD
 3  0.5954  0.5376   64.9000   892.87068  16741579  TIME SHARE WORKLOAD
 3  0.5923  0.5340   65.5358   896.32931  16871531  TIME POTENT CHANGE
 3  0.5761  0.5155   68.8099   913.93492  17540818  TIME ACCOUNTS WORKLOAD
 3  0.5663  0.5043   70.8015   924.48011  17947933  TIME POTENT RATING
 3  0.5429  0.4776   75.5328   949.06129  18915064  ADVERT SHARE CHANGE
 3  0.5356  0.4693   77.0075   956.59421  19216522  TIME CHANGE RATING
 3  0.5203  0.4518   80.1009   972.20552  19848855  ADVERT SHARE RATING
 3  0.5176  0.4486   80.6586   974.99341  19962855  TIME POTENT WORKLOAD
 3  0.5117  0.4420   81.8408   980.87727  20204525  TIME WORKLOAD RATING
 3  0.5008  0.4295   84.0531   991.79381  20656754  POTENT CHANGE WORKLOAD
 3  0.4979  0.4261   84.6469   994.70351  20778137  TIME CHANGE WORKLOAD
 3  0.4849  0.4114   87.2636  1007.42515  21313014  POTENT CHANGE RATING
 3  0.4839  0.4102   87.4753  1008.44774  21356304  ADVERT SHARE WORKLOAD
 3  0.4613  0.3843   92.0552  1030.31412  22292491  SHARE CHANGE RATING
 3  0.4533  0.3752   93.6750  1037.93771  22623608  ADVERT CHANGE WORKLOAD
 3  0.4432  0.3637   95.7041  1047.40918  23038386  ADVERT CHANGE RATING
 3  0.4427  0.3631   95.8150  1047.92422  23061049  SHARE WORKLOAD RATING
 3  0.4309  0.3495   98.2115  1058.99621  23550933  SHARE CHANGE WORKLOAD
 3  0.4044  0.3193  103.5767  1083.37325  24647650  POTENT WORKLOAD RATING
 3  0.3914  0.3045  106.1918  1095.05858  25182219  ADVERT WORKLOAD RATING
 3  0.2697  0.1654  130.8362  1199.60125  30219906  CHANGE WORKLOAD RATING
------------------------------------------------------------------------
 4  0.9004  0.8805    5.1519   453.83623   4119346  POTENT ADVERT SHARE ACCOUNTS
 4  0.8960  0.8752    6.0599   463.94781   4304951  TIME POTENT ADVERT SHARE
 4  0.8711  0.8453   11.0938   516.42801   5333958  POTENT ADVERT SHARE CHANGE
 4  0.8641  0.8369   12.5172   530.32625   5624919  POTENT ADVERT ACCOUNTS WORKLOAD
 4  0.8513  0.8215   15.1086   554.73627   6154647  POTENT ADVERT SHARE WORKLOAD
 4  0.8512  0.8214   15.1311   554.94368   6159250  POTENT ADVERT SHARE RATING
 4  0.8496  0.8195   15.4486   557.85979   6224151  TIME POTENT SHARE CHANGE
 4  0.8480  0.8176   15.7640   560.74142   6288619  TIME POTENT SHARE RATING
 4  0.8382  0.8059   17.7462   578.52470   6693817  POTENT ADVERT CHANGE ACCOUNTS
 4  0.8287  0.7945   19.6712   595.28564   7087300  POTENT SHARE CHANGE ACCOUNTS
 4  0.8283  0.7940   19.7530   595.98795   7104033  TIME POTENT ADVERT ACCOUNTS
 4  0.8279  0.7934   19.8434   596.76238   7122507  TIME POTENT SHARE WORKLOAD

Page 57: 6-4 Other Aspects of Regression

Example

Number in            Adjusted
  Model   R-Square   R-Square       C(p)    Root MSE        SSE   Variables in Model

 4  0.8277  0.7933   19.8726   597.01202   7128467  POTENT ADVERT ACCOUNTS RATING
 4  0.8187  0.7825   21.6963   612.42408   7501265  ADVERT CHANGE ACCOUNTS WORKLOAD
 4  0.8134  0.7761   22.7777   621.38217   7722316  TIME POTENT SHARE ACCOUNTS
 4  0.8130  0.7757   22.8447   621.93296   7736012  POTENT SHARE ACCOUNTS RATING
 4  0.8080  0.7696   23.8737   630.33166   7946360  POTENT SHARE CHANGE WORKLOAD
 4  0.8055  0.7666   24.3696   634.33897   8047719  ADVERT ACCOUNTS WORKLOAD RATING
 4  0.8045  0.7654   24.5811   636.04064   8090954  POTENT SHARE CHANGE RATING
 4  0.8009  0.7611   25.3012   641.80035   8238154  ADVERT SHARE CHANGE ACCOUNTS
 4  0.7981  0.7577   25.8797   646.39079   8356421  POTENT SHARE ACCOUNTS WORKLOAD
 4  0.7977  0.7573   25.9475   646.92627   8370272  ADVERT SHARE ACCOUNTS WORKLOAD
 4  0.7971  0.7565   26.0796   647.96937   8397286  TIME ADVERT ACCOUNTS WORKLOAD
 4  0.7949  0.7539   26.5155   651.39768   8486379  ADVERT SHARE ACCOUNTS RATING
 4  0.7901  0.7481   27.4959   659.04431   8686788  POTENT SHARE WORKLOAD RATING
 4  0.7892  0.7470   27.6786   660.45942   8724133  ADVERT CHANGE ACCOUNTS RATING
 4  0.7890  0.7469   27.7029   660.64731   8729097  TIME ADVERT SHARE ACCOUNTS
 4  0.7887  0.7464   27.7729   661.18865   8743409  TIME ADVERT CHANGE ACCOUNTS
 4  0.7800  0.7361   29.5256   674.59966   9101694  TIME ADVERT ACCOUNTS RATING
 4  0.7455  0.6946   36.5127   725.60144  10529949  TIME ADVERT SHARE RATING
 4  0.7284  0.6741   39.9757   749.59485  11237849  TIME POTENT ADVERT WORKLOAD
 4  0.7271  0.6725   40.2410   751.40097  11292068  TIME ADVERT SHARE CHANGE
 4  0.7252  0.6703   40.6235   753.99809  11370262  TIME POTENT ADVERT CHANGE
 4  0.7166  0.6599   42.3759   765.78317  11728477  TIME SHARE ACCOUNTS RATING
 4  0.7142  0.6571   42.8474   768.92310  11824855  POTENT CHANGE ACCOUNTS WORKLOAD
 4  0.7030  0.6436   45.1179   783.86846  12288995  TIME POTENT ADVERT RATING
 4  0.7026  0.6431   45.2116   784.47872  12308137  TIME SHARE CHANGE RATING
 4  0.6979  0.6375   46.1447   790.53370  12498871  TIME ADVERT SHARE WORKLOAD
 4  0.6978  0.6373   46.1784   790.75213  12505779  POTENT CHANGE ACCOUNTS RATING
 4  0.6974  0.6369   46.2450   791.18232  12519389  TIME SHARE WORKLOAD RATING
 4  0.6960  0.6352   46.5312   793.02844  12577882  SHARE CHANGE ACCOUNTS RATING
 4  0.6939  0.6326   46.9700   795.85093  12667574  TIME POTENT CHANGE ACCOUNTS
 4  0.6914  0.6297   47.4613   798.99991  12768017  POTENT ADVERT CHANGE WORKLOAD
 4  0.6869  0.6243   48.3812   804.86195  12956055  TIME SHARE CHANGE ACCOUNTS
 4  0.6830  0.6196   49.1677   809.84021  13116823  POTENT ACCOUNTS WORKLOAD RATING
 4  0.6767  0.6120   50.4533   817.91293  13379631  SHARE ACCOUNTS WORKLOAD RATING
 4  0.6725  0.6070   51.2908   823.12861  13550814  TIME POTENT ACCOUNTS RATING
 4  0.6676  0.6011   52.2960   829.34587  13756292  SHARE CHANGE ACCOUNTS WORKLOAD
 4  0.6643  0.5971   52.9590   833.42108  13891814  CHANGE ACCOUNTS WORKLOAD RATING
 4  0.6575  0.5890   54.3346   841.81426  14173025  TIME CHANGE ACCOUNTS RATING
 4  0.6550  0.5860   54.8314   844.82452  14274569  TIME ADVERT CHANGE WORKLOAD
 4  0.6539  0.5846   55.0694   846.26320  14323228  POTENT ADVERT CHANGE RATING
 4  0.6502  0.5802   55.8143   850.74922  14475485  TIME POTENT ACCOUNTS WORKLOAD
 4  0.6461  0.5754   56.6326   855.65053  14642757  TIME SHARE CHANGE WORKLOAD

Page 58: 6-4 Other Aspects of Regression

Example

Number in            Adjusted
  Model   R-Square   R-Square       C(p)    Root MSE        SSE   Variables in Model

 4  0.6457  0.5749   56.7143   856.13857  14659465  TIME ACCOUNTS WORKLOAD RATING
 4  0.6457  0.5749   56.7188   856.16512  14660374  TIME CHANGE ACCOUNTS WORKLOAD
 4  0.6451  0.5742   56.8336   856.85057  14683858  TIME ADVERT WORKLOAD RATING
 4  0.6428  0.5714   57.2994   859.62382  14779062  TIME ADVERT CHANGE RATING
 4  0.6388  0.5666   58.1110   864.43504  14944959  TIME SHARE ACCOUNTS WORKLOAD
 4  0.6347  0.5617   58.9426   869.33742  15114951  POTENT ADVERT WORKLOAD RATING
 4  0.6122  0.5346   63.5103   895.78631  16048662  TIME POTENT CHANGE WORKLOAD
 4  0.6041  0.5250   65.1327   904.99453  16380302  TIME POTENT CHANGE RATING
 4  0.5803  0.4964   69.9543   931.82409  17365923  TIME POTENT WORKLOAD RATING
 4  0.5577  0.4692   74.5388   956.63636  18303063  ADVERT SHARE CHANGE RATING
 4  0.5494  0.4593   76.2164   965.55611  18645972  TIME CHANGE WORKLOAD RATING
 4  0.5449  0.4538   77.1345   970.40328  18833650  ADVERT SHARE CHANGE WORKLOAD
 4  0.5284  0.4341   80.4671   987.79743  19514875  ADVERT SHARE WORKLOAD RATING
 4  0.5042  0.4051   85.3565  1012.77689  20514341  POTENT CHANGE WORKLOAD RATING
 4  0.4791  0.3749   90.4505  1038.16256  21555630  SHARE CHANGE WORKLOAD RATING
 4  0.4582  0.3498   94.6757  1058.75708  22419331  ADVERT CHANGE WORKLOAD RATING
------------------------------------------------------------------------
 5  0.9119  0.8888    4.8276   437.95155   3644230  POTENT ADVERT SHARE CHANGE ACCOUNTS
 5  0.9108  0.8873    5.0560   440.74727   3690905  TIME POTENT ADVERT SHARE CHANGE
 5  0.9064  0.8817    5.9565   451.60487   3874992  TIME POTENT ADVERT SHARE ACCOUNTS
 5  0.9028  0.8772    6.6716   460.04338   4021158  POTENT ADVERT SHARE ACCOUNTS WORKLOAD
 5  0.9025  0.8768    6.7397   460.83914   4035082  TIME POTENT ADVERT SHARE RATING
 5  0.9013  0.8754    6.9716   463.53877   4082496  POTENT ADVERT SHARE ACCOUNTS RATING
 5  0.8965  0.8692    7.9550   474.81309   4283502  TIME POTENT ADVERT SHARE WORKLOAD
 5  0.8838  0.8532   10.5287   503.12779   4809614  POTENT ADVERT CHANGE ACCOUNTS WORKLOAD
 5  0.8713  0.8374   13.0623   529.52425   5327523  POTENT ADVERT SHARE CHANGE RATING
 5  0.8712  0.8373   13.0801   529.70500   5331160  POTENT ADVERT SHARE CHANGE WORKLOAD
 5  0.8648  0.8293   14.3598   542.54514   5592749  TIME POTENT ADVERT ACCOUNTS WORKLOAD
 5  0.8648  0.8292   14.3724   542.67015   5595327  POTENT ADVERT ACCOUNTS WORKLOAD RATING
 5  0.8619  0.8255   14.9645   548.50755   5716350  TIME POTENT SHARE CHANGE RATING
 5  0.8545  0.8162   16.4595   562.97889   6021959  TIME POTENT SHARE WORKLOAD RATING
 5  0.8539  0.8155   16.5677   564.01204   6044082  TIME POTENT SHARE CHANGE WORKLOAD
 5  0.8530  0.8143   16.7553   565.79809   6082422  POTENT ADVERT SHARE WORKLOAD RATING
 5  0.8500  0.8105   17.3653   571.56874   6207126  TIME POTENT SHARE CHANGE ACCOUNTS

Page 59: 6-4 Other Aspects of Regression

Example

Number in            Adjusted
  Model   R-Square   R-Square       C(p)    Root MSE        SSE   Variables in Model

 5  0.8481  0.8081   17.7558   575.23145   6286933  TIME POTENT SHARE ACCOUNTS RATING
 5  0.8398  0.7977   19.4247   590.63290   6628097  POTENT ADVERT CHANGE ACCOUNTS RATING
 5  0.8388  0.7964   19.6323   592.52025   6670525  TIME POTENT ADVERT CHANGE ACCOUNTS
 5  0.8342  0.7905   20.5677   600.95257   6861736  POTENT SHARE CHANGE ACCOUNTS RATING
 5  0.8301  0.7854   21.3845   608.22037   7028708  POTENT SHARE CHANGE ACCOUNTS WORKLOAD
 5  0.8283  0.7832   21.7507   611.45033   7103559  TIME POTENT ADVERT ACCOUNTS RATING
 5  0.8280  0.7827   21.8275   612.12593   7119265  TIME POTENT SHARE ACCOUNTS WORKLOAD
 5  0.8204  0.7732   23.3467   625.33360   7429800  ADVERT CHANGE ACCOUNTS WORKLOAD RATING
 5  0.8190  0.7714   23.6388   627.84103   7489503  ADVERT SHARE CHANGE ACCOUNTS WORKLOAD
 5  0.8188  0.7712   23.6725   628.12996   7496398  TIME ADVERT CHANGE ACCOUNTS WORKLOAD
 5  0.8170  0.7688   24.0496   631.35137   7573486  POTENT SHARE ACCOUNTS WORKLOAD RATING
 5  0.8115  0.7619   25.1542   640.69365   7799279  POTENT SHARE CHANGE WORKLOAD RATING
 5  0.8074  0.7568   25.9784   647.57682   7967759  ADVERT SHARE ACCOUNTS WORKLOAD RATING
 5  0.8064  0.7554   26.1959   649.38077   8012212  TIME ADVERT ACCOUNTS WORKLOAD RATING
 5  0.8040  0.7524   26.6788   653.36890   8110927  TIME ADVERT SHARE CHANGE ACCOUNTS
 5  0.8035  0.7517   26.7847   654.24073   8132588  ADVERT SHARE CHANGE ACCOUNTS RATING
 5  0.8023  0.7503   27.0163   656.14156   8179913  TIME ADVERT SHARE ACCOUNTS RATING
 5  0.7984  0.7454   27.8090   662.60858   8341952  TIME ADVERT SHARE ACCOUNTS WORKLOAD
 5  0.7894  0.7340   29.6353   677.27347   8715288  TIME ADVERT CHANGE ACCOUNTS RATING
 5  0.7676  0.7065   34.0407   711.40355   9615805  TIME POTENT ADVERT CHANGE WORKLOAD
 5  0.7534  0.6886   36.9113   732.78869  10202606  TIME ADVERT SHARE CHANGE RATING
 5  0.7458  0.6789   38.4645   744.10291  10520094  TIME ADVERT SHARE WORKLOAD RATING
 5  0.7358  0.6662   40.4916   758.61644  10934479  TIME POTENT ADVERT WORKLOAD RATING
 5  0.7287  0.6573   41.9196   768.67520  11226370  TIME SHARE CHANGE ACCOUNTS RATING
 5  0.7271  0.6553   42.2409   770.92084  11292060  TIME ADVERT SHARE CHANGE WORKLOAD
 5  0.7252  0.6529   42.6186   773.55140  11369254  TIME POTENT ADVERT CHANGE RATING
 5  0.7213  0.6480   43.4101   779.03619  11531050  POTENT CHANGE ACCOUNTS WORKLOAD RATING
 5  0.7179  0.6437   44.1046   783.81759  11673030  TIME SHARE ACCOUNTS WORKLOAD RATING

Page 60: 6-4 Other Aspects of Regression

Example

Number in            Adjusted
  Model   R-Square   R-Square       C(p)    Root MSE        SSE   Variables in Model

 5  0.7151  0.6401   44.6731   787.70913  11789228  TIME POTENT CHANGE ACCOUNTS WORKLOAD
 5  0.7060  0.6286   46.5186   800.21291  12166473  TIME SHARE CHANGE WORKLOAD RATING
 5  0.6997  0.6207   47.7928   808.73377  12426956  TIME POTENT CHANGE ACCOUNTS RATING
 5  0.6964  0.6165   48.4630   813.17902  12563942  SHARE CHANGE ACCOUNTS WORKLOAD RATING
 5  0.6949  0.6146   48.7594   815.13719  12624524  POTENT ADVERT CHANGE WORKLOAD RATING
 5  0.6881  0.6060   50.1343   824.16080  12905579  TIME POTENT ACCOUNTS WORKLOAD RATING
 5  0.6872  0.6049   50.3110   825.31348  12941704  TIME SHARE CHANGE ACCOUNTS WORKLOAD
 5  0.6735  0.5876   53.0929   843.25103  13510374  TIME CHANGE ACCOUNTS WORKLOAD RATING
 5  0.6691  0.5821   53.9771   848.87242  13691103  TIME ADVERT CHANGE WORKLOAD RATING
 5  0.6278  0.5298   62.3465   900.34880  15401931  TIME POTENT CHANGE WORKLOAD RATING
 5  0.5595  0.4435   76.1766   979.50095  18229020  ADVERT SHARE CHANGE WORKLOAD RATING
------------------------------------------------------------------------
 6  0.9181  0.8908    5.5848   433.98496   3390173  TIME POTENT ADVERT SHARE CHANGE ACCOUNTS
 6  0.9173  0.8897    5.7457   436.08518   3423065  POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD
 6  0.9121  0.8828    6.7968   449.56357   3637933  POTENT ADVERT SHARE CHANGE ACCOUNTS RATING
 6  0.9119  0.8825    6.8390   450.09635   3646561  TIME POTENT ADVERT SHARE CHANGE RATING
 6  0.9109  0.8812    7.0401   452.62529   3687654  TIME POTENT ADVERT SHARE CHANGE WORKLOAD
 6  0.9094  0.8792    7.3368   456.33215   3748303  TIME POTENT ADVERT SHARE ACCOUNTS RATING
 6  0.9072  0.8762    7.7935   461.98069   3841671  TIME POTENT ADVERT SHARE ACCOUNTS WORKLOAD
 6  0.9040  0.8720    8.4367   469.82008   3973156  POTENT ADVERT SHARE ACCOUNTS WORKLOAD RATING
 6  0.9026  0.8702    8.7075   473.08103   4028502  TIME POTENT ADVERT SHARE WORKLOAD RATING
 6  0.8845  0.8460   12.3772   515.24728   4778636  TIME POTENT ADVERT CHANGE ACCOUNTS WORKLOAD
 6  0.8845  0.8460   12.3789   515.26674   4778997  POTENT ADVERT CHANGE ACCOUNTS WORKLOAD RATING
 6  0.8713  0.8284   15.0485   543.89060   5324706  POTENT ADVERT SHARE CHANGE WORKLOAD RATING
 6  0.8653  0.8204   16.2738   556.53501   5575162  TIME POTENT ADVERT ACCOUNTS WORKLOAD RATING
 6  0.8647  0.8196   16.3875   557.69376   5598402  TIME POTENT SHARE CHANGE WORKLOAD RATING
 6  0.8619  0.8159   16.9579   563.47159   5715004  TIME POTENT SHARE CHANGE ACCOUNTS RATING
 6  0.8549  0.8065   18.3734   577.55979   6004356  TIME POTENT SHARE ACCOUNTS WORKLOAD RATING
 6  0.8539  0.8053   18.5666   579.45602   6043847  TIME POTENT SHARE CHANGE ACCOUNTS WORKLOAD

Page 61: 6-4 Other Aspects of Regression

Example

Number in            Adjusted
  Model   R-Square   R-Square       C(p)    Root MSE        SSE   Variables in Model

 6  0.8411  0.7881   21.1731   604.45835   6576658  TIME POTENT ADVERT CHANGE ACCOUNTS RATING
 6  0.8351  0.7801   22.3903   615.78608   6825465  POTENT SHARE CHANGE ACCOUNTS WORKLOAD RATING
 6  0.8211  0.7615   25.2080   641.24188   7401441  ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING
 6  0.8208  0.7611   25.2667   641.76159   7413443  TIME ADVERT CHANGE ACCOUNTS WORKLOAD RATING
 6  0.8194  0.7592   25.5526   644.28605   7471881  TIME ADVERT SHARE CHANGE ACCOUNTS WORKLOAD
 6  0.8109  0.7479   27.2749   659.29091   7823961  TIME ADVERT SHARE ACCOUNTS WORKLOAD RATING
 6  0.8092  0.7456   27.6175   662.23441   7893979  TIME ADVERT SHARE CHANGE ACCOUNTS RATING
 6  0.7677  0.6902   36.0298   730.81314   9613581  TIME POTENT ADVERT CHANGE WORKLOAD RATING
 6  0.7535  0.6713   38.9066   752.83334  10201645  TIME ADVERT SHARE CHANGE WORKLOAD RATING
 6  0.7288  0.6384   43.9036   789.62396  11223108  TIME SHARE CHANGE ACCOUNTS WORKLOAD RATING
 6  0.7236  0.6315   44.9415   797.05230  11435263  TIME POTENT CHANGE ACCOUNTS WORKLOAD RATING
------------------------------------------------------------------------
 7  0.9209  0.8883    7.0136   438.81025   3273425  TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD
 7  0.9182  0.8846    7.5492   446.08716   3382894  TIME POTENT ADVERT SHARE CHANGE ACCOUNTS RATING
 7  0.9174  0.8834    7.7159   448.32838   3416972  POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING
 7  0.9120  0.8757    8.8204   462.90380   3642759  TIME POTENT ADVERT SHARE CHANGE WORKLOAD RATING
 7  0.9102  0.8733    9.1733   467.46444   3714891  TIME POTENT ADVERT SHARE ACCOUNTS WORKLOAD RATING
 7  0.8858  0.8387   14.1223   527.28660   4726530  TIME POTENT ADVERT CHANGE ACCOUNTS WORKLOAD RATING
 7  0.8649  0.8092   18.3548   573.52034   5591735  TIME POTENT SHARE CHANGE ACCOUNTS WORKLOAD RATING
 7  0.8227  0.7496   26.9005   657.02511   7338594  TIME ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING
------------------------------------------------------------------------
 8  0.9210  0.8814    9.0000   452.12251   3270636  TIME POTENT ADVERT SHARE CHANGE ACCOUNTS WORKLOAD RATING

Page 62: 6-4 Other Aspects of Regression

Stepwise Regression

Forward Selection:

1) Begins with no variables in the model. Fits the simple linear model for each X and adds the most significant variable (if its p-value meets the stated entry limit).

2) Fits every model consisting of the already-added variables plus one variable not yet added. The most significant new variable is added (if its p-value meets the stated entry limit).

3) This process is continued until no variables can be added.

Backward Elimination:

4) The model with all variables is fit. The least significant term is removed (if its p-value is greater than the specified limit) and the model is refit without this variable.

5) The least significant remaining variable is removed (if its p-value is greater than the specified limit) and the model is refit without this variable.

6) This process is continued until no variables can be removed.
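The first two forward-selection steps can be reproduced from the SSE values in the all-possible-regressions listing, since the partial F for a candidate is (SSE_current - SSE_new) / MSE_new. The Python sketch below is illustrative only: it uses an F-to-enter cutoff (a hypothetical value of 4.0) as a stand-in for the p-value limit that SAS applies, and the function names are assumptions rather than anything SAS provides.

```python
N = 25             # observations in the sales example
SST = 41379549.0   # corrected total sum of squares (SSE of the empty model)

# SSE values copied from the all-possible-regressions listing.
SSE = {
    frozenset(): SST,
    frozenset({"ACCOUNTS"}): 17855475, frozenset({"TIME"}): 25323074,
    frozenset({"POTENT"}): 26591364, frozenset({"ADVERT"}): 26672093,
    frozenset({"SHARE"}): 31705735, frozenset({"CHANGE"}): 31840319,
    frozenset({"RATING"}): 34696497, frozenset({"WORKLOAD"}): 40810946,
    frozenset({"ACCOUNTS", "ADVERT"}): 9306229,
    frozenset({"ACCOUNTS", "POTENT"}): 14842475,
    frozenset({"ACCOUNTS", "CHANGE"}): 15275656,
    frozenset({"ACCOUNTS", "RATING"}): 15555671,
    frozenset({"ACCOUNTS", "SHARE"}): 16261097,
    frozenset({"ACCOUNTS", "TIME"}): 17597300,
    frozenset({"ACCOUNTS", "WORKLOAD"}): 17809391,
}

VARIABLES = ["TIME", "POTENT", "ADVERT", "SHARE",
             "CHANGE", "ACCOUNTS", "WORKLOAD", "RATING"]
F_TO_ENTER = 4.0   # hypothetical cutoff; SAS uses a significance level instead


def partial_f(current, candidate):
    """Partial F statistic for adding `candidate` to the model `current`."""
    bigger = current | {candidate}
    df_error = N - len(bigger) - 1
    mse = SSE[bigger] / df_error
    return (SSE[current] - SSE[bigger]) / mse


def forward_step(current, candidates, f_to_enter):
    """Pick the candidate with the largest partial F, if it clears the cutoff."""
    best = max(candidates, key=lambda v: partial_f(current, v))
    f = partial_f(current, best)
    return (best, f) if f >= f_to_enter else (None, f)


model = frozenset()
steps = []
for _ in range(2):   # only the first two steps have SSE values tabulated above
    remaining = [v for v in VARIABLES if v not in model]
    entered, f = forward_step(model, remaining, F_TO_ENTER)
    if entered is None:
        break
    model = model | {entered}
    steps.append((entered, f))
    print(f"entered {entered:9s} partial F = {f:.2f}")
```

The two picks, ACCOUNTS and then ADVERT, and their F values agree with Steps 1 and 2 of the forward selection output for this example.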

6-4 Other Aspects of Regression

Page 63: 6-4 Other Aspects of Regression

Stepwise Regression

Stepwise Technique:
This technique is a variation on the forward selection technique. After a variable is added, the least significant variable in the model is removed if it has a p-value greater than the specified limit. This accounts for multicollinearity to some degree.
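The removal check that follows each addition can be sketched as below. This is a simplified Python illustration that compares partial F values with a hypothetical F-to-remove cutoff of 4.0 instead of the p-value limit SAS uses; the F values are copied from the Type II tables in this example's output, and the function name is an assumption.

```python
def least_significant(f_values, f_to_remove):
    """Return the variable to drop, or None if every partial F clears the cutoff."""
    name = min(f_values, key=f_values.get)
    return name if f_values[name] < f_to_remove else None


F_TO_REMOVE = 4.0  # hypothetical cutoff, not a SAS default

# Partial F values after SHARE entered (forward selection, Step 4).
after_share_entered = {"POTENT": 22.95, "ADVERT": 22.48,
                       "SHARE": 14.61, "ACCOUNTS": 10.34}

# Partial F values for the model with all eight regressors (backward, Step 0).
full_model = {"TIME": 0.72, "POTENT": 19.90, "ADVERT": 11.35, "SHARE": 7.12,
              "CHANGE": 2.17, "ACCOUNTS": 1.82, "WORKLOAD": 0.55, "RATING": 0.01}

print(least_significant(after_share_entered, F_TO_REMOVE))  # None: nothing is removed
print(least_significant(full_model, F_TO_REMOVE))           # RATING
```

With the four-variable model every term clears the cutoff, so nothing would be removed; in the full model RATING is the weakest term, which matches the first variable dropped by backward elimination.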

Typically you do not run a stepwise procedure if you run all possible regressions, and vice versa. Stepwise procedures are more economical than all possible regressions in large data sets.

There is no guarantee that the stepwise procedures will end up with the same model or the “best” model.

6-4 Other Aspects of Regression

Page 64: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

ALL POSSIBLE REGRESSIONS

The STEPWISE Procedure
Model: MODEL1
Dependent Variable: SALES

Number of Observations Read    25
Number of Observations Used    25

Forward Selection: Step 1

Variable ACCOUNTS Entered: R-Square = 0.5685 and C(p) = 66.3492

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              1   23524074    23524074     30.30   <.0001
Error             23   17855475      776325
Corrected Total   24   41379549

              Parameter    Standard
Variable       Estimate       Error   Type II SS   F Value   Pr > F
Intercept     709.32383   515.24608      1471307      1.90   0.1819
ACCOUNTS       21.72177     3.94603     23524074     30.30   <.0001

Bounds on condition number: 1, 1
------------------------------------------------------------------------
Forward Selection: Step 2

Variable ADVERT Entered: R-Square = 0.7751 and C(p) = 26.5262

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              2   32073320    16036660     37.91   <.0001
Error             22    9306229      423010
Corrected Total   24   41379549

              Parameter    Standard
Variable       Estimate       Error   Type II SS   F Value   Pr > F
Intercept      50.29906   407.60969   6441.42281      0.02   0.9029
ADVERT          0.22653     0.05039      8549246     20.21   0.0002
ACCOUNTS       19.04825     2.97291     17365864     41.05   <.0001

Bounds on condition number: 1.0417, 4.1667
------------------------------------------------------------------------

Page 65: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

Forward Selection: Step 3

Variable POTENT Entered: R-Square = 0.8277 and C(p) = 17.8740

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              3   34250796    11416932     33.63   <.0001
Error             21    7128753      339464
Corrected Total   24   41379549

              Parameter    Standard
Variable       Estimate       Error   Type II SS   F Value   Pr > F
Intercept    -327.23339   394.40032       233687      0.69   0.4160
POTENT          0.02192     0.00866      2177476      6.41   0.0194
ADVERT          0.21607     0.04533      7713722     22.72   0.0001
ACCOUNTS       15.55392     2.99937      9128825     26.89   <.0001

Bounds on condition number: 1.3213, 11.039
------------------------------------------------------------------------
Forward Selection: Step 4

Variable SHARE Entered: R-Square = 0.9004 and C(p) = 5.1519

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              4   37260202     9315051     45.23   <.0001
Error             20    4119346      205967
Corrected Total   24   41379549

              Parameter    Standard
Variable       Estimate       Error   Type II SS   F Value   Pr > F
Intercept   -1441.93183   423.58170      2386786     11.59   0.0028
POTENT          0.03822     0.00798      4727717     22.95   0.0001
ADVERT          0.17499     0.03691      4630369     22.48   0.0001
SHARE         190.14430    49.74415      3009406     14.61   0.0011
ACCOUNTS        9.21390     2.86521      2129962     10.34   0.0043

Bounds on condition number: 1.9872, 26.842
------------------------------------------------------------------------
Forward Selection: Step 5

Variable CHANGE Entered: R-Square = 0.9119 and C(p) = 4.8276

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              5   37735319     7547064     39.35   <.0001
Error             19    3644230      191802
Corrected Total   24   41379549

              Parameter    Standard
Variable       Estimate       Error   Type II SS   F Value   Pr > F
Intercept   -1289.68098   420.04665      1808097      9.43   0.0063
POTENT          0.03771     0.00770      4593924     23.95   0.0001
ADVERT          0.15782     0.03725      3443070     17.95   0.0004
SHARE         191.43749    48.01009      3049587     15.90   0.0008
CHANGE        266.90116   169.58067       475117      2.48   0.1320
ACCOUNTS        8.36231     2.81737      1689728      8.81   0.0079

Bounds on condition number: 2.0633, 40.675
------------------------------------------------------------------------

Page 66: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

Forward Selection: Step 6

Variable TIME Entered: R-Square = 0.9181 and C(p) = 5.5848

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              6   37989376     6331563     33.62   <.0001
Error             18    3390173      188343
Corrected Total   24   41379549

              Parameter    Standard
Variable       Estimate       Error   Type II SS   F Value   Pr > F
Intercept   -1194.27096   424.27123      1492341      7.92   0.0115
TIME            1.98986     1.71330       254057      1.35   0.2606
POTENT          0.03831     0.00765      4720754     25.06   <.0001
ADVERT          0.14712     0.03804      2816953     14.96   0.0011
SHARE         212.55550    50.93154      3280352     17.42   0.0006
CHANGE        269.63915   168.06129       484819      2.57   0.1260
ACCOUNTS        5.04611     3.99339       300732      1.60   0.2225

Bounds on condition number: 4.2214, 80.718
------------------------------------------------------------------------
Forward Selection: Step 7

Variable WORKLOAD Entered: R-Square = 0.9209 and C(p) = 7.0136

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              7   38106124     5443732     28.27   <.0001
Error             17    3273425      192554
Corrected Total   24   41379549

              Parameter    Standard
Variable       Estimate       Error   Type II SS   F Value   Pr > F
Intercept   -1606.14928   681.04937      1070947      5.56   0.0306
TIME            1.59214     1.80607       149640      0.78   0.3903
POTENT          0.03699     0.00792      4198456     21.80   0.0002
ADVERT          0.16119     0.04249      2770422     14.39   0.0015
SHARE         181.69220    64.98520      1505210      7.82   0.0124
CHANGE        299.03549   174.07303       568246      2.95   0.1040
ACCOUNTS        6.63862     4.52621       414228      2.15   0.1607
WORKLOAD       26.06794    33.47799       116748      0.61   0.4469

Bounds on condition number: 5.3044, 128.04
------------------------------------------------------------------------
No other variable met the 0.5000 significance level for entry into the model.

Summary of Forward Selection

        Variable    Number    Partial      Model
Step    Entered     Vars In   R-Square   R-Square      C(p)   F Value   Pr > F
   1    ACCOUNTS          1     0.5685     0.5685   66.3492     30.30   <.0001
   2    ADVERT            2     0.2066     0.7751   26.5262     20.21   0.0002
   3    POTENT            3     0.0526     0.8277   17.8740      6.41   0.0194
   4    SHARE             4     0.0727     0.9004    5.1519     14.61   0.0011
   5    CHANGE            5     0.0115     0.9119    4.8276      2.48   0.1320
   6    TIME              6     0.0061     0.9181    5.5848      1.35   0.2606
   7    WORKLOAD          7     0.0028     0.9209    7.0136      0.61   0.4469

Page 67: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

STEPWISE REGRESSION USING BACKWARD ELIMINATION

The STEPWISE Procedure
Model: MODEL1
Dependent Variable: SALES

Backward Elimination: Step 0

All Variables Entered: R-Square = 0.9210 and C(p) = 9.0000

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              8   38108913     4763614     23.30   <.0001
Error             16    3270636      204415
Corrected Total   24   41379549

              Parameter    Standard
Variable       Estimate       Error   Type II SS   F Value   Pr > F
Intercept   -1642.62908   768.07059       934951      4.57   0.0482
TIME            1.66830     1.97176       146335      0.72   0.4100
POTENT          0.03684     0.00826      4067958     19.90   0.0004
ADVERT          0.15912     0.04722      2321099     11.35   0.0039
SHARE         183.52338    68.76742      1455893      7.12   0.0168
CHANGE        289.66240   196.48626       444255      2.17   0.1598
ACCOUNTS        6.49811     4.81615       372122      1.82   0.1960
WORKLOAD       25.67997    34.65316       112258      0.55   0.4694
RATING         15.01902   128.57870   2789.05629      0.01   0.9085

Bounds on condition number: 5.6573, 172.56
------------------------------------------------------------------------
Backward Elimination: Step 1

Variable RATING Removed: R-Square = 0.9209 and C(p) = 7.0136

Analysis of Variance

                         Sum of        Mean
Source            DF    Squares      Square   F Value   Pr > F
Model              7   38106124     5443732     28.27   <.0001
Error             17    3273425      192554
Corrected Total   24   41379549

              Parameter    Standard
Variable       Estimate       Error   Type II SS   F Value   Pr > F
Intercept   -1606.14928   681.04937      1070947      5.56   0.0306
TIME            1.59214     1.80607       149640      0.78   0.3903
POTENT          0.03699     0.00792      4198456     21.80   0.0002
ADVERT          0.16119     0.04249      2770422     14.39   0.0015
SHARE         181.69220    64.98520      1505210      7.82   0.0124
CHANGE        299.03549   174.07303       568246      2.95   0.1040
ACCOUNTS        6.63862     4.52621       414228      2.15   0.1607
WORKLOAD       26.06794    33.47799       116748      0.61   0.4469

Page 68: 6-4 Other Aspects of Regression

Example

Backward Elimination: Step 2
Variable WORKLOAD Removed: R-Square = 0.9181 and C(p) = 5.5848

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               6    37989376          6331563        33.62      <.0001
Error              18    3390173           188343
Corrected Total    24    41379549

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept    -1194.27096           424.27123         1492341       7.92       0.0115
TIME         1.98986               1.71330           254057        1.35       0.2606
POTENT       0.03831               0.00765           4720754       25.06      <.0001
ADVERT       0.14712               0.03804           2816953       14.96      0.0011
SHARE        212.55550             50.93154          3280352       17.42      0.0006
CHANGE       269.63915             168.06129         484819        2.57       0.1260
ACCOUNTS     5.04611               3.99339           300732        1.60       0.2225

Bounds on condition number: 4.2214, 80.718
-----------------------------------------------------------------------------------------------
Backward Elimination: Step 3
Variable TIME Removed: R-Square = 0.9119 and C(p) = 4.8276

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5    37735319          7547064        39.35      <.0001
Error              19    3644230           191802
Corrected Total    24    41379549

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept    -1289.68098           420.04665         1808097       9.43       0.0063
POTENT       0.03771               0.00770           4593924       23.95      0.0001
ADVERT       0.15782               0.03725           3443070       17.95      0.0004
SHARE        191.43749             48.01009          3049587       15.90      0.0008
CHANGE       266.90116             169.58067         475117        2.48       0.1320
ACCOUNTS     8.36231               2.81737           1689728       8.81       0.0079

-----------------------------------------------------------------------------------------------
Backward Elimination: Step 4
Variable CHANGE Removed: R-Square = 0.9004 and C(p) = 5.1519

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4    37260202          9315051        45.23      <.0001
Error              20    4119346           205967
Corrected Total    24    41379549

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept    -1441.93183           423.58170         2386786       11.59      0.0028
POTENT       0.03822               0.00798           4727717       22.95      0.0001
ADVERT       0.17499               0.03691           4630369       22.48      0.0001
SHARE        190.14430             49.74415          3009406       14.61      0.0011
ACCOUNTS     9.21390               2.86521           2129962       10.34      0.0043

Bounds on condition number: 1.9872, 26.842
-----------------------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1000 level.

Page 69: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

Summary of Backward Elimination

        Variable    Number    Partial     Model
Step    Removed     Vars In   R-Square    R-Square    C(p)      F Value    Pr > F
1       RATING      7         0.0001      0.9209      7.0136    0.01       0.9085
2       WORKLOAD    6         0.0028      0.9181      5.5848    0.61       0.4469
3       TIME        5         0.0061      0.9119      4.8276    1.35       0.2606
4       CHANGE      4         0.0115      0.9004      5.1519    2.48       0.1320
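The F values in this summary can be recovered, approximately, from the R-Square columns. The sketch below is in Python rather than SAS, and the function name `partial_f` is hypothetical. It uses the standard partial F relation F = ΔR² / ((1 − R²_larger) / df_error), where R²_larger and df_error belong to the larger model of each pair (the model before the variable is removed). Because the printed R-Squares are rounded to four decimals, agreement with the SAS values is only to about two decimals.

```python
def partial_f(delta_r2, r2_larger, df_error_larger, n_terms=1):
    """Partial F statistic for dropping n_terms variables from the larger model."""
    return (delta_r2 / n_terms) / ((1.0 - r2_larger) / df_error_larger)

# Step 4: CHANGE removed from the 5-variable model (R-Square 0.9119, error DF 19).
f_change = partial_f(delta_r2=0.0115, r2_larger=0.9119, df_error_larger=19)

# Step 3: TIME removed from the 6-variable model (R-Square 0.9181, error DF 18).
f_time = partial_f(delta_r2=0.0061, r2_larger=0.9181, df_error_larger=18)

print("F for CHANGE:", f_change)
print("F for TIME:  ", f_time)
```

The values land close to the 2.48 and 1.35 reported by SAS; the small gaps come from the rounded inputs, not from a different formula.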

Page 70: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

The STEPWISE Procedure Model: MODEL1 Dependent Variable: SALES

Stepwise Selection: Step 1 Variable ACCOUNTS Entered: R-Square = 0.5685 and C(p) = 66.3492

Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 1 23524074 23524074 30.30 <.0001 Error 23 17855475 776325 Corrected Total 24 41379549

Parameter Standard Variable Estimate Error Type II SS F Value Pr > F Intercept 709.32383 515.24608 1471307 1.90 0.1819 ACCOUNTS 21.72177 3.94603 23524074 30.30 <.0001

Bounds on condition number: 1, 1----------------------------------------------------------------------------------------------- Stepwise Selection: Step 2 Variable ADVERT Entered: R-Square = 0.7751 and C(p) = 26.5262

Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 2 32073320 16036660 37.91 <.0001 Error 22 9306229 423010 Corrected Total 24 41379549

Parameter Standard Variable Estimate Error Type II SS F Value Pr > F Intercept 50.29906 407.60969 6441.42281 0.02 0.9029 ADVERT 0.22653 0.05039 8549246 20.21 0.0002 ACCOUNTS 19.04825 2.97291 17365864 41.05 <.0001

Bounds on condition number: 1.0417, 4.1667-----------------------------------------------------------------------------------------------

Page 71: 6-4 Other Aspects of Regression

Example

Stepwise Selection: Step 3
Variable POTENT Entered: R-Square = 0.8277 and C(p) = 17.8740

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               3    34250796          11416932       33.63      <.0001
Error              21    7128753           339464
Corrected Total    24    41379549

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept    -327.23339            394.40032         233687        0.69       0.4160
POTENT       0.02192               0.00866           2177476       6.41       0.0194
ADVERT       0.21607               0.04533           7713722       22.72      0.0001
ACCOUNTS     15.55392              2.99937           9128825       26.89      <.0001

Bounds on condition number: 1.3213, 11.039
-----------------------------------------------------------------------------------------------
Stepwise Selection: Step 4
Variable SHARE Entered: R-Square = 0.9004 and C(p) = 5.1519

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4    37260202          9315051        45.23      <.0001
Error              20    4119346           205967
Corrected Total    24    41379549

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept    -1441.93183           423.58170         2386786       11.59      0.0028
POTENT       0.03822               0.00798           4727717       22.95      0.0001
ADVERT       0.17499               0.03691           4630369       22.48      0.0001
SHARE        190.14430             49.74415          3009406       14.61      0.0011
ACCOUNTS     9.21390               2.86521           2129962       10.34      0.0043

Bounds on condition number: 1.9872, 26.842
-----------------------------------------------------------------------------------------------
Stepwise Selection: Step 5
Variable CHANGE Entered: R-Square = 0.9119 and C(p) = 4.8276

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5    37735319          7547064        39.35      <.0001
Error              19    3644230           191802
Corrected Total    24    41379549

Variable     Parameter Estimate    Standard Error    Type II SS    F Value    Pr > F
Intercept    -1289.68098           420.04665         1808097       9.43       0.0063
POTENT       0.03771               0.00770           4593924       23.95      0.0001
ADVERT       0.15782               0.03725           3443070       17.95      0.0004
SHARE        191.43749             48.01009          3049587       15.90      0.0008
CHANGE       266.90116             169.58067         475117        2.48       0.1320
ACCOUNTS     8.36231               2.81737           1689728       8.81       0.0079

Bounds on condition number: 2.0633, 40.675
-----------------------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.

Page 72: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

Summary of Stepwise Selection

        Variable    Variable    Number    Partial     Model
Step    Entered     Removed     Vars In   R-Square    R-Square    C(p)       F Value    Pr > F
1       ACCOUNTS                1         0.5685      0.5685      66.3492    30.30      <.0001
2       ADVERT                  2         0.2066      0.7751      26.5262    20.21      0.0002
3       POTENT                  3         0.0526      0.8277      17.8740    6.41       0.0194
4       SHARE                   4         0.0727      0.9004      5.1519     14.61      0.0011
5       CHANGE                  5         0.0115      0.9119      4.8276     2.48       0.1320
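The C(p) column can be checked against the ANOVA tables in the listings. The sketch below is in Python rather than SAS and assumes the usual definition of Mallows' statistic, C(p) = SSE_p / MSE_full − (n − 2p), where p counts the intercept and MSE_full comes from the model with all eight candidate variables (SSE 3270636 on 16 error DF, from Step 0 of the backward elimination output). The function name `mallows_cp` is hypothetical.

```python
def mallows_cp(sse_subset, n_params_subset, sse_full, df_error_full, n_obs):
    """Mallows' C(p); n_params_subset counts the intercept."""
    mse_full = sse_full / df_error_full
    return sse_subset / mse_full - (n_obs - 2 * n_params_subset)

N_OBS = 25
SSE_FULL, DF_ERROR_FULL = 3270636.0, 16  # all eight candidate variables

# Four variables (POTENT, ADVERT, SHARE, ACCOUNTS): SSE 4119346, p = 5.
cp_four = mallows_cp(4119346.0, 5, SSE_FULL, DF_ERROR_FULL, N_OBS)

# Five variables (the above plus CHANGE): SSE 3644230, p = 6.
cp_five = mallows_cp(3644230.0, 6, SSE_FULL, DF_ERROR_FULL, N_OBS)

print("C(p), four variables:", cp_four)
print("C(p), five variables:", cp_five)
```

Both results agree with the 5.1519 and 4.8276 printed by SAS to about four decimals. Each is close to its own p (5 and 6), which is the usual sign that a subset model carries little bias.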

Page 73: 6-4 Other Aspects of Regression

PRESS Statistic

The main purpose of many regression analyses is to predict Y for a future set of X’s. The difficulty is that we have only the present Y’s and X’s with which to build a model, yet we would like to judge the model by how well it predicts Y at new X’s.

The PRESS statistic (labeled "Predicted Residual SS (PRESS)" in SAS output) tries to overcome this problem. It is similar to DFFITS in that one observation is removed at a time. The parameters are then re-estimated from the remaining observations, and $\hat{Y}$ is calculated at the X’s of the observation that was removed. Once the $\hat{Y}$’s have been calculated in this manner for each observation (call them $\hat{Y}_{(i)}$), the PRESS statistic can be calculated:

$$\mathrm{PRESS} = \sum_{i=1}^{n} \left( Y_i - \hat{Y}_{(i)} \right)^2$$

Notice that this is very similar to SSE, except that each prediction comes from a model fitted without that observation. It is very computation intensive, however. The PRESS statistic is obtained in SAS by using the R option on the MODEL statement.
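The leave-one-out definition can be sketched directly. The Python code below (not SAS) uses a small made-up data set and a one-predictor straight-line fit; all names and numbers in it are illustrative assumptions. It computes PRESS by refitting n times and also by the known least-squares identity PRESS = Σ (e_i / (1 − h_ii))², where e_i is the ordinary residual and h_ii the leverage, which avoids the refitting.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def sse(xs, ys):
    """Ordinary residual sum of squares from the full-data fit."""
    b0, b1 = fit_line(xs, ys)
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def press_by_refitting(xs, ys):
    """Remove each observation, refit, and predict the removed point."""
    total = 0.0
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (b0 + b1 * xs[i])) ** 2
    return total

def press_by_leverage(xs, ys):
    """Same quantity from one fit, using residuals and leverages."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = 1.0 / n + (x - xbar) ** 2 / sxx  # leverage of this point
        total += ((y - (b0 + b1 * x)) / (1.0 - h)) ** 2
    return total

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

press_refit = press_by_refitting(xs, ys)
press_lev = press_by_leverage(xs, ys)
sse_full = sse(xs, ys)
print(press_refit, press_lev, sse_full)
```

The two PRESS values coincide up to floating-point error, and PRESS is never smaller than SSE because each deleted residual is the ordinary residual divided by 1 − h_ii.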

6-4 Other Aspects of Regression

Page 74: 6-4 Other Aspects of Regression

Validation Data Split

1) Split data into a fitting portion and a validation portion. This should be done randomly.

2) Perform the model fitting routine as discussed earlier using data in the fitting portion only.

3) For each viable model compute the SSE using the observations in the validation data portion. The best model is the one that minimizes the SSE.

4) Recalculate the chosen model using the entire data set.

Notice this procedure requires a large enough data set to enable you to split a validation portion off and still have adequate data to evaluate models. The process is tedious in SAS, requiring multiple runs or fancy programming.
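The four steps can be sketched in a few lines. The example below is Python rather than SAS and is only an illustration: the data are simulated, the two candidate models (a mean-only model and a straight line) stand in for whatever "viable models" the fitting routine produces, and the 25% validation share is an arbitrary assumption.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1: predict the mean of Y, ignoring X."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2: least-squares straight line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    return lambda x: b0 + b1 * x

def sse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

# Simulated data (assumption): Y = 2 + 3X + noise.
rng = random.Random(0)
xs = [0.5 * i for i in range(20)]
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# 1) Random split into fitting and validation portions.
idx = list(range(len(xs)))
rng.shuffle(idx)
n_valid = len(idx) // 4
valid_idx, fit_idx = idx[:n_valid], idx[n_valid:]
x_fit, y_fit = [xs[i] for i in fit_idx], [ys[i] for i in fit_idx]
x_val, y_val = [xs[i] for i in valid_idx], [ys[i] for i in valid_idx]

# 2) Fit each candidate on the fitting portion only.
# 3) Score each candidate by SSE on the validation portion.
candidates = {"mean only": fit_mean, "straight line": fit_line}
valid_sse = {name: sse(fitter(x_fit, y_fit), x_val, y_val)
             for name, fitter in candidates.items()}
best = min(valid_sse, key=valid_sse.get)

# 4) Refit the chosen model on the entire data set.
final_model = candidates[best](xs, ys)
print(best, valid_sse)
```

Fixing the random seed makes the split reproducible; with real data the split would normally be repeated or stratified if the data set is small.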

6-4 Other Aspects of Regression

Page 75: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

The REG Procedure Model: MODEL1 Dependent Variable: SALES

Number of Observations Read 25 Number of Observations Used 25

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 4 37260202 9315051 45.23 <.0001 Error 20 4119346 205967 Corrected Total 24 41379549

Root MSE 453.83623 R-Square 0.9004 Dependent Mean 3374.56760 Adj R-Sq 0.8805 Coeff Var 13.44872

Parameter Estimates

Parameter Standard Variable DF Estimate Error t Value Pr > |t|

Intercept 1 -1441.93183 423.58170 -3.40 0.0028 POTENT 1 0.03822 0.00798 4.79 0.0001 ADVERT 1 0.17499 0.03691 4.74 0.0001 SHARE 1 190.14430 49.74415 3.82 0.0011 ACCOUNTS 1 9.21390 2.86521 3.22 0.0043

Sum of Residuals 0 Sum of Squared Residuals 4119346 Predicted Residual SS (PRESS) 5804450


Page 77: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

The REG Procedure Model: MODEL2 Dependent Variable: SALES

Number of Observations Read 25 Number of Observations Used 25

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 5 37735319 7547064 39.35 <.0001 Error 19 3644230 191802 Corrected Total 24 41379549

Root MSE 437.95155 R-Square 0.9119 Dependent Mean 3374.56760 Adj R-Sq 0.8888 Coeff Var 12.97801

Parameter Estimates

Parameter Standard Variable DF Estimate Error t Value Pr > |t|

Intercept 1 -1289.68098 420.04665 -3.07 0.0063 POTENT 1 0.03771 0.00770 4.89 0.0001 ADVERT 1 0.15782 0.03725 4.24 0.0004 SHARE 1 191.43749 48.01009 3.99 0.0008 CHANGE 1 266.90116 169.58067 1.57 0.1320 ACCOUNTS 1 8.36231 2.81737 2.97 0.0079

Sum of Residuals 0 Sum of Squared Residuals 3644230 Predicted Residual SS (PRESS) 5470022

Page 78: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

The REG Procedure Model: MODEL3 Dependent Variable: SALES

Number of Observations Read 25 Number of Observations Used 25

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 5 37688644 7537729 38.80 <.0001 Error 19 3690905 194258 Corrected Total 24 41379549

Root MSE 440.74727 R-Square 0.9108 Dependent Mean 3374.56760 Adj R-Sq 0.8873 Coeff Var 13.06085

Parameter Estimates

Parameter Standard Variable DF Estimate Error t Value Pr > |t|

Intercept 1 -1145.46066 429.09266 -2.67 0.0152 TIME 1 3.53781 1.21646 2.91 0.0090 POTENT 1 0.04287 0.00685 6.26 <.0001 ADVERT 1 0.13503 0.03739 3.61 0.0019 SHARE 1 252.99524 40.23829 6.29 <.0001 CHANGE 1 300.27943 168.89413 1.78 0.0914

Sum of Residuals 0 Sum of Squared Residuals 3690905 Predicted Residual SS (PRESS) 5681706

Page 79: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

The REG Procedure Model: MODEL4 Dependent Variable: SALES

Number of Observations Read 25 Number of Observations Used 25

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 5 37504557 7500911 36.78 <.0001 Error 19 3874992 203947 Corrected Total 24 41379549

Root MSE 451.60487 R-Square 0.9064 Dependent Mean 3374.56760 Adj R-Sq 0.8817 Coeff Var 13.38260

Parameter Estimates

Parameter Standard Variable DF Estimate Error t Value Pr > |t|

Intercept 1 -1349.90222 429.80275 -3.14 0.0054 TIME 1 1.95130 1.78268 1.09 0.2874 POTENT 1 0.03882 0.00796 4.88 0.0001 ADVERT 1 0.16468 0.03791 4.34 0.0004 SHARE 1 210.84008 52.98770 3.98 0.0008 ACCOUNTS 1 5.97052 4.11204 1.45 0.1628

Sum of Residuals 0 Sum of Squared Residuals 3874992 Predicted Residual SS (PRESS) 6339858

Page 80: 6-4 Other Aspects of Regression

Example

6-4 Other Aspects of Regression

The REG Procedure Model: MODEL5 Dependent Variable: SALES

Number of Observations Read 25 Number of Observations Used 25

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 6 37691895 6281983 30.66 <.0001 Error 18 3687654 204870 Corrected Total 24 41379549

Root MSE 452.62529 R-Square 0.9109 Dependent Mean 3374.56760 Adj R-Sq 0.8812 Coeff Var 13.41284

Parameter Estimates

Parameter Standard Variable DF Estimate Error t Value Pr > |t|

Intercept 1 -1204.48288 643.19486 -1.87 0.0775 TIME 1 3.55133 1.25385 2.83 0.0110 POTENT 1 0.04289 0.00704 6.09 <.0001 ADVERT 1 0.13656 0.04027 3.39 0.0033 SHARE 1 250.30061 46.53086 5.38 <.0001 CHANGE 1 306.09513 179.48471 1.71 0.1053 WORKLOAD 1 3.88068 30.80564 0.13 0.9011

Sum of Residuals 0 Sum of Squared Residuals 3687654 Predicted Residual SS (PRESS) 6286583

Page 81: 6-4 Other Aspects of Regression

6-4 Other Aspects of Regression

Model                                       R2        Adj R2    Root MSE    PRESS
Potent advert share accounts                0.9004    0.8805    453.8362    5804450
Potent advert share change accounts         0.9119    0.8888    437.9516    5470022
Time potent advert share change             0.9108    0.8873    440.7473    5681706
Time potent advert share accounts           0.9064    0.8817    451.6049    6339858
Time potent advert share change workload    0.9109    0.8812    452.6253    6286583

No single model is clearly best on every criterion. Confidence intervals, or a preference for parsimony, may help decide among them.
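As a cross-check on the comparison, the adjusted R-Square values can be recomputed from the SSE of each model and the candidates ranked by PRESS. The sketch below is Python, not SAS; the SSE, PRESS and total sum of squares are copied from the REG listings in this section, and the dictionary layout and function name are illustrative assumptions.

```python
SST = 41379549.0   # corrected total sum of squares
N_OBS = 25

# name: (number of predictors, SSE, PRESS), taken from the REG output.
models = {
    "potent advert share accounts": (4, 4119346.0, 5804450.0),
    "potent advert share change accounts": (5, 3644230.0, 5470022.0),
    "time potent advert share change": (5, 3690905.0, 5681706.0),
    "time potent advert share accounts": (5, 3874992.0, 6339858.0),
    "time potent advert share change workload": (6, 3687654.0, 6286583.0),
}

def adjusted_r2(sse, n_predictors, sst=SST, n_obs=N_OBS):
    """Adjusted R-Square = 1 - MSE / (SST / (n - 1))."""
    mse = sse / (n_obs - n_predictors - 1)
    return 1.0 - mse / (sst / (n_obs - 1))

adj = {name: adjusted_r2(sse, k) for name, (k, sse, press) in models.items()}
best_by_press = min(models, key=lambda name: models[name][2])
best_by_adj = max(adj, key=adj.get)

for name in models:
    print(f"{name:45s} adj R2 = {adj[name]:.4f}  PRESS = {models[name][2]:.0f}")
print("smallest PRESS:", best_by_press)
```

Here the smallest PRESS and the largest adjusted R-Square point to the same five-variable model, but the four-variable model is not far behind, which is why parsimony remains a reasonable tiebreaker.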

Page 82: 6-4 Other Aspects of Regression

OPTIONS NODATE NOOVP NONUMBER;

/* Read wind velocity (X) and DC output (Y), then build transformed variables */
DATA WINDMILL;
  INFILE 'C:\users\myung\Documents\Teaching\학부과목\imen214-stats\ch06\windmill.dat';
  INPUT X Y;
  XSQ=X**2;     /* quadratic term */
  XRC=1/X;      /* reciprocal of wind velocity */
  LOGY=LOG(Y);
  LABEL Y='DC OUTPUT' X='WIND VELOCITY';

PROC REG DATA=WINDMILL;
  MODEL Y=X;
  TITLE 'LINEAR REGRESSION MODEL';

PROC REG DATA=WINDMILL;
  MODEL Y=X XSQ;
  TITLE 'QUADRATIC REGRESSION MODEL';

PROC REG DATA=WINDMILL;
  MODEL Y=XRC;
  TITLE 'RECIPROCAL REGRESSION MODEL';
RUN; QUIT;

Question 6-44 on page 354

6-4 Other Aspects of Regression

Page 83: 6-4 Other Aspects of Regression

LINEAR REGRESSION MODEL

The REG Procedure Model: MODEL1 Dependent Variable: Y DC OUTPUT

Number of Observations Read 25 Number of Observations Used 25

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 1 8.92961 8.92961 160.26 <.0001 Error 23 1.28157 0.05572 Corrected Total 24 10.21119

Root MSE 0.23605 R-Square 0.8745 Dependent Mean 1.60960 Adj R-Sq 0.8690 Coeff Var 14.66526

Parameter Estimates

Parameter Standard Variable Label DF Estimate Error t Value Pr > |t|

Intercept Intercept 1 0.13088 0.12599 1.04 0.3097 X WIND VELOCITY 1 0.24115 0.01905 12.66 <.0001


Page 87: 6-4 Other Aspects of Regression

QUADRATIC REGRESSION MODEL

The REG Procedure Model: MODEL1 Dependent Variable: Y DC OUTPUT

Number of Observations Read 25 Number of Observations Used 25

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 2 9.88011 4.94006 328.27 <.0001 Error 22 0.33108 0.01505 Corrected Total 24 10.21119

Root MSE 0.12267 R-Square 0.9676 Dependent Mean 1.60960 Adj R-Sq 0.9646 Coeff Var 7.62140

Parameter Estimates

Parameter Standard Variable Label DF Estimate Error t Value Pr > |t|

Intercept Intercept 1 -1.15590 0.17465 -6.62 <.0001 X WIND VELOCITY 1 0.72294 0.06143 11.77 <.0001 XSQ 1 -0.03812 0.00480 -7.95 <.0001


Page 90: 6-4 Other Aspects of Regression

RECIPROCAL REGRESSION MODEL

The REG Procedure Model: MODEL1 Dependent Variable: Y DC OUTPUT

Number of Observations Read 25 Number of Observations Used 25

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 1 10.00722 10.00722 1128.43 <.0001 Error 23 0.20397 0.00887 Corrected Total 24 10.21119

Root MSE 0.09417 R-Square 0.9800 Dependent Mean 1.60960 Adj R-Sq 0.9792 Coeff Var 5.85061

Parameter Estimates

Parameter Standard Variable Label DF Estimate Error t Value Pr > |t|

Intercept Intercept 1 2.97886 0.04490 66.34 <.0001 XRC 1 -6.93455 0.20643 -33.59 <.0001
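To see how differently the three fitted windmill models behave, the parameter estimates from the listings can be plugged into small prediction functions. This is a Python sketch, not SAS; the wind velocity of 5 used below is an arbitrary illustrative value, not a point taken from the data file.

```python
def linear_model(x):
    """Y = 0.13088 + 0.24115 X (LINEAR REGRESSION MODEL)."""
    return 0.13088 + 0.24115 * x

def quadratic_model(x):
    """Y = -1.15590 + 0.72294 X - 0.03812 X^2 (QUADRATIC REGRESSION MODEL)."""
    return -1.15590 + 0.72294 * x - 0.03812 * x ** 2

def reciprocal_model(x):
    """Y = 2.97886 - 6.93455 (1/X) (RECIPROCAL REGRESSION MODEL)."""
    return 2.97886 - 6.93455 / x

wind_velocity = 5.0  # illustrative value only
y_lin = linear_model(wind_velocity)
y_quad = quadratic_model(wind_velocity)
y_rec = reciprocal_model(wind_velocity)
print(y_lin, y_quad, y_rec)
```

As wind velocity grows, the reciprocal model levels off toward its intercept, 2.97886, whereas the straight line keeps rising and the quadratic eventually turns downward.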


Page 95: 6-4 Other Aspects of Regression

Confidence Interval for E(Y)

The $100(1-\alpha)\%$ confidence interval for E(Y) for given values of the X’s has the form

$$\hat{Y} \pm t_{\alpha/2,\,\nu}\; s(\hat{Y})$$

where $\nu$ is the error degrees of freedom.

The standard error of $\hat{Y}$, when considered as an estimate of E(Y), is estimated by $s(\hat{Y})$. This quantity is output by SAS when the CLM option is used. It is labeled STD ERR PREDICTION (SEP); the PROC REG listings in this section print it as "Std Error Mean Predict".

Prediction Interval for a Y

The $100(1-\alpha)\%$ prediction interval for Y has the form

$$\hat{Y} \pm t_{\alpha/2,\,\nu}\; s(\text{pred})$$

The standard error for prediction can be obtained from the SEP as follows:

$$s(\text{pred}) = \sqrt{\mathrm{MSE} + \mathrm{SEP}^2}$$

The prediction interval is obtained in SAS with the CLI option on the MODEL statement. To obtain confidence and prediction intervals for X values not in the data set, enter an observation for each set of X’s and put a missing value for the Y. These observations will not be used in fitting the model, but the desired intervals will be constructed for them.
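The two interval formulas can be checked numerically against the SAS listing for the radiation-dose example that follows in these slides. The Python sketch below (not SAS) takes the predicted value, the standard error of the mean prediction and the MSE for the added observation 41 from that listing. The critical value 2.0262 for t with 37 degrees of freedom at the 95% level is a table value entered by hand and should be treated as an assumption; the function names are hypothetical.

```python
import math

def mean_confidence_interval(y_hat, sep, t_crit):
    """Interval for E(Y): y_hat +/- t * SEP."""
    half_width = t_crit * sep
    return y_hat - half_width, y_hat + half_width

def prediction_interval(y_hat, sep, mse, t_crit):
    """Interval for a new Y: y_hat +/- t * sqrt(MSE + SEP^2)."""
    se_pred = math.sqrt(mse + sep ** 2)
    half_width = t_crit * se_pred
    return y_hat - half_width, y_hat + half_width

# Observation 41 (X1 = 15, X2 = 5) from the listing.
y_hat = 187.22
sep = 47.0609      # "Std Error Mean Predict"
mse = 55563.0      # error mean square, 37 error DF
T_975_DF37 = 2.0262  # assumed table value of t(0.025, 37)

ci_low, ci_high = mean_confidence_interval(y_hat, sep, T_975_DF37)
pi_low, pi_high = prediction_interval(y_hat, sep, mse, T_975_DF37)
print("95% CL Mean:   ", ci_low, ci_high)
print("95% CL Predict:", pi_low, pi_high)
```

The results reproduce the SAS limits for observation 41 (about 91.87 to 282.57 for the mean and about −299.82 to 674.26 for a new Y) to within rounding of the inputs. The prediction interval is much wider because it includes the variance of a single new observation, not only the uncertainty in the fitted mean.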

Page 96: 6-4 Other Aspects of Regression

OPTIONS NODATE NOOVP NONUMBER;

DATA RADS;
  INFILE 'C:\users\myung\Documents\Teaching\학부과목\imen214-stats\ch06\rads.dat';
  INPUT Y X1 X2;
  LABEL Y='RADIATION DOSE' X1='CURRENT' X2='EXPOSURE TIME';

PROC REG DATA=RADS;
  MODEL Y=X1 X2/XPX CLM CLI;
  TITLE 'FULL REGRESSION MODEL';

/* New point (X1=15, X2=5); Y is left missing */
DATA RADS2;
  X1=15; X2=5; OUTPUT;

/* Append the new point to the original data */
DATA RADS3;
  SET RADS RADS2;

PROC REG DATA=RADS3;
  MODEL Y=X1 X2/ALPHA=0.10 CLM CLI;
  TITLE 'PREDICTION INTERVAL WITH ALPHA=0.10';
RUN; QUIT;

Question 6-57 on page 357

6-4 Other Aspects of Regression

Page 97: 6-4 Other Aspects of Regression

The REG Procedure Model: MODEL1

Model Crossproducts X'X X'Y Y'Y

Variable Label Intercept X1 X2 Y Intercept Intercept 40 920 258.75 17615.7 X1 CURRENT 920 25800 5951.25 494005.5 X2 EXPOSURE TIME 258.75 5951.25 3696.5625 251661.975 Y RADIATION DOSE 17615.7 494005.5 251661.975 20890132.29----------------------------------------------------------------------------------------------- FULL REGRESSION MODEL

The REG Procedure Model: MODEL1 Dependent Variable: Y RADIATION DOSE Number of Observations Read 40 Number of Observations Used 40

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F Model 2 11076473 5538237 99.67 <.0001 Error 37 2055837 55563 Corrected Total 39 13132310

Root MSE 235.71839 R-Square 0.8435 Dependent Mean 440.39250 Adj R-Sq 0.8350 Coeff Var 53.52461

Parameter Estimates Parameter Standard Variable Label DF Estimate Error t Value Pr > |t| Intercept Intercept 1 -440.39250 94.19757 -4.68 <.0001 X1 CURRENT 1 19.14750 3.46047 5.53 <.0001 X2 EXPOSURE TIME 1 68.08000 5.24107 12.99 <.0001
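The predicted value that SAS reports for the added observation (X1 = 15, X2 = 5) follows directly from these parameter estimates. A minimal Python sketch (not SAS), with a hypothetical function name:

```python
# Parameter estimates copied from the PROC REG listing.
INTERCEPT = -440.39250
B_CURRENT = 19.14750        # X1
B_EXPOSURE_TIME = 68.08000  # X2

def predict_dose(current, exposure_time):
    """Fitted radiation dose for given current and exposure time."""
    return INTERCEPT + B_CURRENT * current + B_EXPOSURE_TIME * exposure_time

y_new = predict_dose(15, 5)
print(y_new)
```

This matches the "Predicted Value" of 187.2200 shown for observation 41 in the Output Statistics table.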

6-4 Other Aspects of Regression

Page 98: 6-4 Other Aspects of Regression

FULL REGRESSION MODEL The REG Procedure Model: MODEL1 Dependent Variable: Y RADIATION DOSE

Output Statistics

Columns: Obs, Dependent Variable, Predicted Value, Std Error Mean Predict, 95% CL Mean (lower, upper), 95% CL Predict (lower, upper), Residual

1  7.4000  -231.8975  66.8963  -367.4424  -96.3526  -728.3696  264.5746  239.2975
2  14.8000  -214.8775  66.2678  -349.1489  -80.6061  -711.0034  281.2484  229.6775
3  29.6000  -180.8375  65.0718  -312.6855  -48.9895  -676.3130  314.6380  210.4375
4  59.2000  -112.7575  62.9394  -240.2848  14.7698  -607.1008  381.5858  171.9575
5  88.8000  -44.6775  61.1828  -168.6456  79.2906  -538.1146  448.7596  133.4775
6  296.0000  431.8825  61.2809  307.7156  556.0494  -61.6046  925.3696  -135.8825
7  444.0000  772.2825  73.5667  623.2221  921.3429  271.9515  1273  -328.2825
8  592.0000  1113  91.8815  926.5128  1299  600.0703  1625  -520.6825
9  11.1000  -136.1600  56.7253  -251.0965  -21.2235  -627.4059  355.0859  147.2600
10  22.2000  -119.1400  55.9828  -232.5719  -5.7081  -610.0360  371.7560  141.3400
11  44.4000  -85.1000  54.5617  -195.6526  25.4526  -575.3387  405.1387  129.5000
12  88.8000  -17.0200  52.0001  -122.3822  88.3422  -506.1144  472.0744  105.8200
13  133.2000  51.0600  49.8596  -49.9651  152.0851  -437.1184  539.2384  82.1400
14  444.0000  527.6200  49.9800  426.3510  628.8890  39.3910  1016  -83.6200
15  666.0000  868.0200  64.4570  737.4177  998.6223  372.8745  1363  -202.0200
16  888.0000  1208  84.7636  1037  1380  700.8678  1716  -320.4200
17  14.8000  -40.4225  50.5880  -142.9236  62.0786  -528.9085  448.0635  55.2225
18  29.6000  -23.4025  49.7539  -124.2136  77.4086  -511.5367  464.7317  53.0025
19  59.2000  10.6375  48.1494  -86.9225  108.1975  -476.8356  498.1106  48.5625
20  118.4000  78.7175  45.2261  -12.9192  170.3542  -407.6048  565.0398  39.6825
21  177.6000  146.7975  42.7477  60.1825  233.4125  -338.6036  632.1986  30.8025
22  592.0000  623.3575  42.8880  536.4582  710.2568  137.9056  1109  -31.3575
23  888.0000  963.7575  59.1278  843.9533  1084  471.3500  1456  -75.7575
24  1184  1304  80.7852  1140  1468  799.2760  1809  -120.1575
25  22.2000  151.0525  55.1193  39.3701  262.7349  -339.4422  641.5472  -128.8525
26  44.4000  168.0725  54.3548  57.9391  278.2059  -322.0718  658.2168  -123.6725
27  88.8000  202.1125  52.8901  94.9470  309.2780  -287.3735  691.5985  -113.3125
28  177.6000  270.1925  50.2433  168.3899  371.9951  -218.1474  758.5324  -92.5925
29  266.4000  338.2725  48.0245  240.9656  435.5794  -149.1500  825.6950  -71.8725
30  888.0000  814.8325  48.1495  717.2724  912.3926  327.3593  1302  73.1675
31  1332  1155  63.0483  1027  1283  660.8322  1650  176.7675
32  1776  1496  83.6973  1326  1665  988.8073  2002  280.3675
33  29.6000  342.5275  76.8902  186.7332  498.3218  -159.8508  844.9058  -312.9275
34  59.2000  359.5475  76.3440  204.8599  514.2351  -142.4887  861.5837  -300.3475
35  118.4000  393.5875  75.3081  240.9987  546.1763  -107.8060  894.9810  -275.1875
36  236.8000  461.6675  73.4734  312.7962  610.5388  -38.6072  961.9422  -224.8675
37  355.2000  529.7475  71.9744  383.9135  675.5815  30.3682  1029  -174.5475
38  1184  1006  72.0578  860.3045  1152  506.8788  1506  177.6925
39  1776  1347  82.7589  1179  1514  840.5153  1853  429.2925
40  2368  1687  99.3941  1486  1888  1169  2205  680.8925
41  .  187.2200  47.0609  91.8657  282.5743  -299.8165  674.2565  .

Sum of Residuals                   0
Sum of Squared Residuals           2055837
Predicted Residual SS (PRESS)      2683158


Page 101: 6-4 Other Aspects of Regression

PREDICTION INTERVAL WITH ALPHA=0.10

The REG Procedure Model: MODEL1 Dependent Variable: Y RADIATION DOSE

Number of Observations Read 41 Number of Observations Used 40 Number of Observations with Missing Values 1

Analysis of Variance

Sum of Mean Source DF Squares Square F Value Pr > F

Model 2 11076473 5538237 99.67 <.0001 Error 37 2055837 55563 Corrected Total 39 13132310

Root MSE 235.71839 R-Square 0.8435 Dependent Mean 440.39250 Adj R-Sq 0.8350 Coeff Var 53.52461

Parameter Estimates

Parameter Standard Variable Label DF Estimate Error t Value Pr > |t|

Intercept Intercept 1 -440.39250 94.19757 -4.68 <.0001 X1 CURRENT 1 19.14750 3.46047 5.53 <.0001 X2 EXPOSURE TIME 1 68.08000 5.24107 12.99 <.0001

6-4 Other Aspects of Regression

Page 102: 6-4 Other Aspects of Regression

PREDICTION INTERVAL WITH ALPHA=0.10 The REG Procedure Model: MODEL1 Dependent Variable: Y RADIATION DOSE

Output Statistics

Columns: Obs, Dependent Variable, Predicted Value, Std Error Mean Predict, 90% CL Mean (lower, upper), 90% CL Predict (lower, upper), Residual

1  7.4000  -231.8975  66.8963  -344.7579  -119.0371  -645.2812  181.4862  239.2975
2  14.8000  -214.8775  66.2678  -326.6775  -103.0775  -627.9729  198.2179  229.6775
3  29.6000  -180.8375  65.0718  -290.6197  -71.0553  -593.3914  231.7164  210.4375
4  59.2000  -112.7575  62.9394  -218.9422  -6.5728  -524.3687  298.8537  171.9575
5  88.8000  -44.6775  61.1828  -147.8986  58.5436  -455.5341  366.1791  133.4775
6  296.0000  431.8825  61.2809  328.4958  535.2692  20.9842  842.7808  -135.8825
7  444.0000  772.2825  73.5667  648.1685  896.3965  355.6857  1189  -328.2825
8  592.0000  1113  91.8815  957.6698  1268  685.8599  1540  -520.6825
9  11.1000  -136.1600  56.7253  -231.8610  -40.4590  -545.1921  272.8721  147.2600
10  22.2000  -119.1400  55.9828  -213.5882  -24.6918  -527.8808  289.6008  141.3400
11  44.4000  -85.1000  54.5617  -177.1508  6.9508  -493.2935  323.0935  129.5000
12  88.8000  -17.0200  52.0001  -104.7491  70.7091  -424.2607  390.2207  105.8200
13  133.2000  51.0600  49.8596  -33.0578  135.1778  -355.4180  457.5380  82.1400
14  444.0000  527.6200  49.9800  443.2991  611.9409  121.0999  934.1401  -83.6200
15  666.0000  868.0200  64.4570  759.2750  976.7650  455.7409  1280  -202.0200
16  888.0000  1208  84.7636  1065  1351  785.8106  1631  -320.4200
17  14.8000  -40.4225  50.5880  -125.7692  44.9242  -447.1566  366.3116  55.2225
18  29.6000  -23.4025  49.7539  -107.3421  60.5371  -429.8437  383.0387  53.0025
19  59.2000  10.6375  48.1494  -70.5951  91.8701  -395.2533  416.5283  48.5625
20  118.4000  78.7175  45.2261  2.4169  155.0181  -326.2150  483.6500  39.6825
21  177.6000  146.7975  42.7477  74.6782  218.9168  -257.3680  550.9630  30.8025
22  592.0000  623.3575  42.8880  551.0014  695.7136  219.1497  1028  -31.3575
23  888.0000  963.7575  59.1278  864.0034  1064  553.7582  1374  -75.7575
24  1184  1304  80.7852  1168  1440  883.7718  1725  -120.1575
25  22.2000  151.0525  55.1193  58.0610  244.0440  -257.3542  559.4592  -128.8525
26  44.4000  168.0725  54.3548  76.3708  259.7742  -240.0424  576.1874  -123.6725
27  88.8000  202.1125  52.8901  112.8820  291.3430  -205.4543  609.6793  -113.3125
28  177.6000  270.1925  50.2433  185.4273  354.9577  -136.4200  676.8050  -92.5925
29  266.4000  338.2725  48.0245  257.2506  419.2944  -67.5761  744.1211  -71.8725
30  888.0000  814.8325  48.1495  733.5998  896.0652  408.9417  1221  73.1675
31  1332  1155  63.0483  1049  1262  743.5739  1567  176.7675
32  1776  1496  83.6973  1354  1637  1074  1918  280.3675
33  29.6000  342.5275  76.8902  212.8066  472.2484  -75.7739  760.8289  -312.9275
34  59.2000  359.5475  76.3440  230.7480  488.3470  -58.4691  777.5641  -300.3475
35  118.4000  393.5875  75.3081  266.5356  520.6394  -23.8940  811.0690  -275.1875
36  236.8000  461.6675  73.4734  337.7109  585.6241  45.1176  878.2174  -224.8675
37  355.2000  529.7475  71.9744  408.3200  651.1750  113.9432  945.5518  -174.5475
38  1184  1006  72.0578  884.7392  1128  590.4621  1422  177.6925
39  1776  1347  82.7589  1207  1486  925.2304  1768  429.2925
40  2368  1687  99.3941  1519  1855  1256  2119  680.8925
41  .  187.2200  47.0609  107.8239  266.6161  -218.3072  592.7472  .

Sum of Residuals                   0
Sum of Squared Residuals           2055837
Predicted Residual SS (PRESS)      2683158

Page 103: 6-4 Other Aspects of Regression