34
IOWA STATE UNIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500 Lecture No. 10 October 5, 2010

I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

Embed Size (px)

Citation preview

Page 1: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTRET & Evaluating Regression

analyses With The Help of PROC RSQUARE

Animal Science 500

Lecture No. 10

October 5, 2010

Page 2: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREGu The purpose of robust regression is to detect outliers and provide

stable results in the presence of outliers. n In order to achieve this stability, robust regression limits the influence of outliers.

u Outliers can be classified as:n Problems with outliers in the y-direction (response direction)

n Problems with multivariate outliers in the x-space (i.e., outliers in the covariate space, which are also referred to as leverage points)

n Problems with outliers in both the y-direction and the x-space

Page 3: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREGu Two types of estimations methods

u M Estimation - is the method for outlier detection and robust regression when contamination is mainly

in the response direction (y)

u LTS Estimation - the method used when data contamination occurs in the x space.

Page 4: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG M-estimation

u The following ROBUSTREG statements analyze the data:

Proc Robustreg data=stack;

model y = x1 x2 x3 / diagnostics leverage;

id x1;

test x3;

run;

quit;

Page 5: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG M-estimation

Proc Robustreg data=stack;

model y = x1 x2 x3 / diagnostics leverage;

id x1;

test x3;

run;

quit;

The procedure does M estimation with the bisquare weight function (default), and it uses the median method for estimating the scale parameter.

The MODEL statement specifies the covariate effects.

The DIAGNOSTICS option requests a table for outlier diagnostics,

The LEVERAGE option adds leverage point diagnostic results to this table for continuous covariate effects.

The ID statement specifies that variable x1 is used to identify each observation in this table. If the ID statement is missing, the observation number is used to identify the observations (might even be better this way in some cases).

Tests of significance for the covariate effects are obtained using the test line with a variable(s) listed with the test term.

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect3.htm

Page 6: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG example output M-estimation

The ROBUSTREG Procedure

Model InformationData Set WORK.STACKDependent Variable yNumber of Covariates 3Number of Observations 21Method M Estimation

Summary Statistics

Variable Q1 Median Q3 Mean Standard MAD

Deviation x1 53.0000 58.0000 62.0000 60.4286 9.1683 5.9304x2 18.0000 20.0000 24.0000 21.0952 3.1608 2.9652x3 82.0000 87.0000 89.5000 86.2857 5.3586 4.4478y 10.0000 15.0000 19.5000 17.5238 10.1716 5.9304

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect3.htm

Page 7: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG example output

The ROBUSTREG Procedure

Parameter Estimates

Parameter DF EstimateStandard

Error 95% Confidence Limits Chi-Square Pr > ChiSq

Intercept 1 -42.2854 9.5045 -60.9138 -23.6569 19.79 <.0001

x1 1 0.9276 0.1077 0.7164 1.1387 74.11 <.0001

x2 1 0.6507 0.2940 0.0744 1.2270 4.90 0.0269

x3 1 -0.1123 0.1249 -0.3571 0.1324 0.81 0.3683

Scale 1 2.2819        

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect3.htm

Page 8: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG example output M-estimation

Diagnostics

Obs x1Mahalanobis

DistanceRobust MCD

DistanceLeverage

StandardizedRobust

ResidualOutlier

1 80.000000 2.2536 5.5284 * 1.0995  

2 80.000000 2.3247 5.6374 * -1.1409  

3 75.000000 1.5937 4.1972 * 1.5604  

4 62.000000 1.2719 1.5887   3.0381 *

21 70.000000 2.1768 3.6573 * -4.5733 *

The ROBUSTREG Procedure

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect3.htm

Page 9: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG example output M-estimation

Diagnostics Summary

Observation Type Proportion Cutoff

Outlier 0.0952 3.0000

Leverage 0.1905 3.0575

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect3.htm

Page 10: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG LTS-estimation

u The following statements invoke the ROBUSTREG procedure with the LTS estimation method.

Proc Robustreg data=hbk fwls method=lts;

model y = x1 x2 x3 / diagnostics leverage;

Id index;

run;

quit;

Page 11: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG LTS-estimation

u The following statements invoke the ROBUSTREG procedure with the LTS estimation method.

Proc Robustreg data=hbk fwls method=lts;

model y = x1 x2 x3 / diagnostics leverage;

Id index;

run;

quit;

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect4.htm

Page 12: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG Output LTS-estimation

The ROBUSTREG Procedure

Model Information

Data Set WORK.HBK

Dependent Variable y

Number of Covariates 3

Number of Observations 75

Method LTS Estimation

 

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect4.htm

Page 13: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG Output LTS-estimation

The ROBUSTREG Procedure

Model Information

Data Set WORK.HBK

Dependent Variable y

Number of Covariates 3

Number of Observations 75

Method LTS Estimation

Summary Statistics

Variable Q1 Median Q3 MeanStandardDeviation MAD

X1 0.8000 1.8000 3.1000 3.2067 3.6526 1.9274

X2 1.0000 2.2000 3.3000 5.5973 8.2391 1.6309

X3 0.9000 2.1000 3.0000 7.2307 11.7403 1.7791

Y -0.5000 0.1000 0.7000 1.2787 3.4928 0.8896

 

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect4.htm

Page 14: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG Output LTS-estimation

The ROBUSTREG Procedure

LTS Profile

Total Number of Observations 75

Number of Squares Minimized 57

Number of Coefficients 4

Highest Possible Breakdown Value

0.2533

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect4.htm

Page 15: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG Output LTS-estimation

The ROBUSTREG Procedure

LTS Parameter Estimates

Parameter DF Estimate

Intercept 1 -0.3431

x1 1 0.0901

x2 1 0.0703

x3 1 -0.0731

Scale (sLTS) 0 0.7451

Scale (Wscale) 0 0.5749

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect4.htm

Page 16: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG Output LTS-estimation

Diagnostics

Obs index Mahalanobis Distance

Robust MCD Distance Leverage

StandardizedRobust

ResidualOutlier

1 1 1.9168 29.4424 * 17.0868 *3 2 1.8558 30.2054 * 17.8428 *5 3 2.3137 31.8909 * 18.3063 *7 4 2.2297 32.8621 * 16.9702 *9 5 2.1001 32.2778 * 17.7498 *11 6 2.1462 30.5892 * 17.5155 *13 7 2.0105 30.6807 * 18.8801 *15 8 1.9193 29.7994 * 18.2253 *17 9 2.2212 31.9537 * 17.1843 *19 10 2.3335 30.9429 * 17.8021 *21 11 2.4465 36.6384 * 0.0406  23 12 3.1083 37.9552 * -0.0874  25 13 2.6624 36.9175 * 1.0776  27 14 6.3816 41.0914 * -0.7875  

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect4.htm

Page 17: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG Output LTS-estimation

Diagnostics Summary

Observation Type Proportion Cutoff

Outlier 0.1333 3.0000

Leverage 0.1867 3.0575

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect4.htm

Page 18: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC ROBUSTREG Output LTS-estimation

Parameter Estimates for Final Weighted Least Squares Fit

Parameter DF Estimate Standard Error 95% Confidence Limits Chi-

Square Pr > ChiSq

Intercept 1 -0.1805 0.1044 -0.3852 0.0242 2.99 0.0840

x1 1 0.0814 0.0667 -0.0493 0.2120 1.49 0.2222

x2 1 0.0399 0.0405 -0.0394 0.1192 0.97 0.3242

x3 1 -0.0517 0.0354 -0.1210 0.0177 2.13 0.1441

Scale 0 0.5572          

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/rreg_sect4.htm

The final weighted least squares estimates are shown. These estimates are least squares estimates computed after deleting the detected outliers.

Page 19: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE

u The RSQUARE procedure selects optimal subsets of independent variables in a multiple regression analysis.

u Regression coefficients and a variety of statistics useful for model selection can be printed or output to a SAS data set.

u In SAS Version 6+, the RSQUARE procedure is subsumed by PROC REG.

Page 20: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE

u General Formn PROC RSQUARE options;

l MODEL dependents=independents/options;l FREQ variable;l WEIGHT variable; l BY variables;

n Run;n Quit;

Page 21: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE

u There must be one or more MODEL statements.

u The FREQ, WEIGHT, and BY statements can appear only once.

u The MODEL, FREQ, WEIGHT, and BY statements can appear in any order.

Page 22: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Options

u The following options can be specified in the PROC statement;n DATA=SASdataset

l names the SAS data set to be used.l The data set can be an ordinary SAS data set or a

TYPE=CORR, COV, or SSCP data set. If the DATA= option is omitted, RSQUARE uses the most recently created SAS data set.

n SIMPLE|S l Prints means and standard deviations for every variable listed

in a MODEL statement.

Page 23: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Options

u The following options can be specified in the PROC statement;n CORR|C

l Pprints the correlation matrix for all variables in the analysis.

n NOINT l suppresses the intercept term from all models.

n NOPRINT l suppresses the regression printout

n OUTEST=SASdataset l creates a TYPE=EST data set containing model-selection

statistics and parameter estimates for the selected models.

Page 24: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Options

u The options listed in the MODEL Statement section can also be used in the PROC RSQUARE statement.

u Any option specified in the PROC statement applies to every MODEL statement except those in which you specify a different value of the option.

u Optional statistics will appear in the OUTEST= data set only if the corresponding options are specified in the PROC statement

Page 25: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Model Statement Options

u MODEL dependents=independents/options;

u Options are listed after a forward slash that follows the model statement

u SELECT=n

u Specifies the maximum number of subset models of each size to be printed or output to the OUTEST= data set.

u If SELECT= is used without the B option, the variables in each MODEL are listed in order of inclusion instead of the order in which they appear in the MODEL statement.

u If SELECT= is omitted and the number of regressors is less than 11, all possible subsets are evaluated.

u If SELECT= is omitted and the number of regressors is greater than 10, the number of subsets selected is at most equal to the number of regressors. A small value of SELECT= greatly reduces the CPU time required for large problems.

Page 26: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Model Statement Options

u MODEL dependents=independents/options;

u Options are listed after a forward slash that follows the model statement

n INCLUDE=i l Requests that the first i variables after the equal sign in the

MODEL statement be included in every regression model. l The default status = no variables are required to appear in

every model.

n START=n l Specifies the smallest number of regressors to be reported in a

subset model. The default value is one more than the value specified by the INCLUDE= option, or one if INCLUDE= is omitted.

Page 27: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Model Statement Options

u MODEL dependents=independents/options;

u Options are listed after a forward slash that follows the model statement

n STOP=n l Specifies the largest number of regressors to be reported in a

subset model. The default is the number of regressors listed in the MODEL statement.

n ADJRSQ l Computes r-square adjusted for degrees of freedom for each

model selected.

n CP l Computes Mallows' Cp statistic for each model selected.

Page 28: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Model Statement Options

u MODEL dependents=independents/options;

u Options are listed after a forward slash that follows the model statement

n JP l Computes Jp, the estimated mean square error of prediction for each

model selected assuming that the values of the regressors are fixed and that the model is correct.

l The Jp statistic is also called the final prediction error (FPE).

n MSE l Computes the mean square error for each model selected.

n SSE l Computes the error sum of squares for each model selected.

n B l Computes estimated regression coefficients for each model selected

Page 29: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Model Statement Options

u MODEL dependents=independents/options;

u The FREQ Statement can also be used in this syntax

u The use of FREQ in this sense treats the data set as if each observation appears n times where n is the value of the FREQ variable for the observation.

u The total number of observations will be considered equal to the sum of the FREQ variable when the procedure determines the df when calculating significance probabilities.

PROC RSQUARE options; MODEL dependents=independents/options;FREQ variable;WEIGHT variable; BY variables;

Run;Quit;

Page 30: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Model Statement Options

u MODEL dependents=independents/options;

u The FREQ Statement can also be used in this syntax

u If your data set includes a variable indicating the frequency of occurrence for other values in the observation, you would include this variables name beside the Freq statement.

PROC RSQUARE options; MODEL dependents=independents/options;FREQ variable;WEIGHT variable; BY variables;

Run;Quit;

Page 31: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUAREModel Statement Options

u MODEL dependents=independents/options;

u The WEIGHT Statement can also be used in this syntax

u The WEIGHT statement names a variable in the input data set whose values are relative weights for a weighted least-squares fit. If the weight value is proportional to the reciprocal of the variance for each observation, then the weighted estimates are the best linear unbiased estimates (BLUE).

u The WEIGHT and FREQ statements have similar effects, except in the calculation of degrees of freedom. BY Statement

PROC RSQUARE options; MODEL dependents=independents/options;FREQ variable;WEIGHT variable; BY variables;

Run;Quit;

Page 32: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Model Statement Options

u MODEL dependents=independents/options;

u The BY variable can be used in this syntax

u The BY statement can be used with PROC RSQUARE

u Will result in separate analyses on observations in groups defined by the BY variables.

u When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

u If the data has not been sorted previously in ascending order,

u Use PROC SORT procedure with a similar BY statement to sort the data,

u Or might be appropriate to use the option NOTSORTED

u or DESCENDING if data was previous sorted in the largest to smallest value for some other reason previously.

u Most likely you will need to sort the data

Page 33: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE Model Statement Options

PROC SORT DATA=New by variable1;Run;

Quit;

PROC RSQUARE options;

MODEL dependents=independents/options;

FREQ variable;

WEIGHT variable;

BY variables;

Run;

Quit;

Page 34: I OWA S TATE U NIVERSITY Department of Animal Science PROC ROBUSTRET & Evaluating Regression analyses With The Help of PROC RSQUARE Animal Science 500

IOWA STATE UNIVERSITYDepartment of Animal Science

PROC RSQUARE

u What we are building toward using PROC RSQUARE is building the best model or most predictive model.

u Topic of next lecture Model Development and Selection of Variables