Extend and reuse SAS's own procedures within DATA step code. Using PROC FCMP, we show how to create reusable functions that bring the power of many procedures into the DATA step and yield a much cleaner programming model.
Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Adding Statistical Functionality to the DATA Step with PROC FCMP
Stacey Christian and Jacques Rioux, SAS Institute Inc., Cary, NC
Paper 326-2010
Introduction/Motivation
Ever want to call a SAS procedure from the DATA step?
Ever want to encapsulate a complicated analytical algorithm in a reusable function?
This talk will demonstrate how to add statistical functionality to the DATA step through the definition of FCMP function wrappers.
Overview
RUN_MACRO function in FCMP
Recursive Technique
Iterative Technique/The Simulation
Meta Programming with FCMP
RUN_MACRO Function in FCMP
executes a predefined SAS macro
Syntax:
rc = run_macro('macro_name', var_1, var_2, …);
• rc : return code
• macro_name: name of SAS macro to run
• var_N: variables to pass to/from macro
See Macro Run
/* Create a macro called subtract_macro */
%macro subtract_macro;
   %let difference = %sysevalf(&a - &b);
%mend subtract_macro;

/* Use subtract_macro within a function */
proc fcmp outlib = sasuser.ds.functions;
   function subtract(a, b);
      rc = run_macro('subtract_macro', a, b, difference);
      if rc eq 0 then return(difference);
      else return(.);
   endsub;

   /* test the call */
   a = 5.3; b = 0.7;
   diff = subtract(a, b);
   put diff=;
run;
diff=4.6
See Macro Run in DATA Step
options cmplib = (sasuser.ds);
data _null_;
   a = 5.3; b = 0.7;
   diff = subtract(a, b);
   put diff=;
run;
diff=4.6
Recursive Technique: Segmenting Time Series Data
“Segmenting Time Series: A Survey and Novel Approach”, Keogh, Eamonn, et al.
reduce extremely large time series data sets
piecewise linear approximations
top-down recursive algorithm
Top Down Algorithm
SegmentTopDown ( currentSegment ) {
    error = run_linear_approximation( currentSegment );
    leftError = run_linear_approximation( leftSegment );
    rightError = run_linear_approximation( rightSegment );
    combinedError = leftError + rightError;
    if (combinedError < error) then {
        call SegmentTopDown( leftSegment );
        call SegmentTopDown( rightSegment );
    } else {
        keep_segment( currentSegment );
    }
}
Top Down Subroutine
subroutine segment_topdown(data $, segdata $, var $, start, end, threshold);
   /* fit one line to the whole segment, then to each half */
   error = linear_approximation(data, var, start, end);
   mid = start + floor((end - start) / 2);
   left_error = linear_approximation(data, var, start, mid);
   right_error = linear_approximation(data, var, mid + 1, end);
   /* relative reduction in error gained by splitting */
   improvement = (error - (left_error + right_error)) / error;
   if (improvement > threshold) then do;
      call segment_topdown(data, segdata, var, start, mid, threshold);
      call segment_topdown(data, segdata, var, mid + 1, end, threshold);
   end;
   else do;
      call append_segment(segdata, start, end, error);
   end;
endsub;
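The slides do not show append_segment. One way it could be written, following the same RUN_MACRO wrapper pattern, is sketched below. The macro name append_segment_macro, the scratch data set work._seg_, and the output column names are all assumptions made for illustration, not the authors' code.

```sas
/* Hypothetical macro: write one segment as a row and append it */
%macro append_segment_macro;
   /* RUN_MACRO passes character arguments with their quotation marks; remove them */
   %let segdata = %sysfunc(dequote(&segdata));
   data work._seg_;
      seg_start = &start;
      seg_end   = &end;
      seg_error = &error;
   run;
   proc append base=&segdata data=work._seg_;
   run;
%mend append_segment_macro;

proc fcmp outlib = sasuser.ds.functions;
   /* Hypothetical wrapper matching the call made in segment_topdown */
   subroutine append_segment(segdata $, start, end, error);
      rc = run_macro('append_segment_macro', segdata, start, end, error);
   endsub;
run;
```

PROC APPEND creates the base data set on the first call when it does not yet exist, so the recursion needs no separate initialization step.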
Linear Approximation Subroutine
function linear_approximation(ds_in $, var $, first_obs, last_obs);
rc = run_macro('linear_approximation_macro', ds_in, first_obs, last_obs, var, error);
return(error);
endsub;
Linear Approximation Macro
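%macro linear_approximation_macro;
   /* RUN_MACRO passes character arguments with their quotation marks; remove them */
   %let ds_in = %sysfunc(dequote(&ds_in));
   %let var = %sysfunc(dequote(&var));
   /* number the observations in the requested range */
   data _TEMP_;
      set &ds_in(firstobs=&first_obs obs=&last_obs);
      retain _TREND_ 0;
      _TREND_ = _TREND_ + 1;
   run;
   /* fit a straight line and keep its sum of squared errors */
   proc reg data=_TEMP_ outest=_EST_ noprint;
      model &var = _TREND_ / sse;
   run; quit;
   proc sql noprint;
      select _SSE_ into :ERROR from _EST_;
   quit;
%mend linear_approximation_macro;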
Recursive Technique: Results
data _NULL_;
call segment_topdown("sasuser.snp", "work.segds_20", "close", 1, 15116, 0.2);
call segment_topdown("sasuser.snp", "work.segds_15", "close", 1, 15116, 0.15);
run;
Recursive Technique: Graphic Results
42 Piecewise Linear Segments
Recursive Technique: Graphic Results
113 Piecewise Linear Segments
Iterative Technique
• "Minimum Quadratic Distance Estimation for the Proportional Hazards Regression Model with Grouped Data", Jacques Rioux and Andrew Luong
• Survival models/proportional hazards model
• PROC PHREG (maximum likelihood) versus minimum distance methods
• Iteratively reweighted least squares algorithm
Iteratively Reweighted Least Squares Algorithm
initialize_weights( weights );
params1 = run_regression( weights );
maxRelativeDifference = infinity;
while (maxRelativeDifference > criteria)
{
update_weights(weights);
params2 = run_regression( weights );
maxRelativeDifference = max( |params2 - params1| / |params1| );
params1 = params2;
}
Iterative Technique: DATA Step Code
subroutine fit_ph_model(indata $, parmData $, depVars $, weightVars $, indepVars $);
   array params1[3];
   array params2[3];
   call prepare_phdata(indata, "_prepdata_");
   call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params1);
   maxRelativeDifference = 1;
   do while (maxRelativeDifference > 0.0001);
      call update_weights("_prepdata_", weightVars, parmData);
      call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params2);
      maxRelativeDifference = calc_max_relative_diff(params1, params2);
      /* carry the latest estimates forward for the next comparison */
      do i = 1 to dim(params1);
         params1[i] = params2[i];
      end;
   end;
endsub;
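The helper calc_max_relative_diff is called in fit_ph_model but is not shown on the slides. A minimal sketch follows, assuming "relative difference" means the element-wise change |new - old| / |old| and that the convergence measure is its maximum over the parameters. The zero guard and the test values are illustrative assumptions.

```sas
proc fcmp outlib = sasuser.ds.functions;
   /* Assumed definition: largest relative change between two parameter vectors */
   function calc_max_relative_diff(p1[*], p2[*]);
      maxdiff = 0;
      do i = 1 to dim(p1);
         /* guard against dividing by a zero estimate */
         denom = max(abs(p1[i]), 1e-12);
         maxdiff = max(maxdiff, abs(p2[i] - p1[i]) / denom);
      end;
      return(maxdiff);
   endsub;

   /* quick check with made-up values */
   array a[3];
   array b[3];
   a[1] = 0.10; a[2] = 0.30; a[3] = 0.20;
   b[1] = 0.11; b[2] = 0.30; b[3] = 0.19;
   d = calc_max_relative_diff(a, b);
   put d=;
run;
```

Using the maximum rather than a sum means the loop stops only when every coefficient has stabilized, which matches the name used on the slide.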
Run_Regression Subroutine
subroutine run_regression(data $, dependent $, independent $, weight $, parmData $, parmArray[*]);
   outargs parmArray;
   array tmpArray[1] _temporary_;
   rc = RUN_MACRO('run_regression_macro', data, parmData, dependent, independent, weight);
   rc = read_array(parmData, tmpArray);
   do i = 1 to dim(parmArray);
      parmArray[i] = tmpArray[1, i];
   end;
endsub;
Run_Regression Macro
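%macro run_regression_macro;
   /* RUN_MACRO passes character arguments with their quotation marks; remove them */
   %let data = %sysfunc(dequote(&data));
   %let parmData = %sysfunc(dequote(&parmData));
   %let dependent = %sysfunc(dequote(&dependent));
   %let independent = %sysfunc(dequote(&independent));
   %let weight = %sysfunc(dequote(&weight));
   proc reg data=&data outest=&parmData noprint;
      model &dependent = &independent / noint;
      weight &weight;
   quit;
   /* keep only the coefficient columns */
   data &parmData;
      set &parmData;
      keep &independent;
   run;
%mend run_regression_macro;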
The True Glory of Reusable Functions: The Simulation
• We now have a “fitting routine” for the proportional hazards model (fit_ph_model)
• Create a function to generate PH data (called simulate_ph_data)
• Create a function to append fits to the results data set (called append_data)
The Simulation Study
proc fcmp;
   do i = 1 to 1000;
      call simulate_ph_data("work.simdata");
      call fit_ph_model("work.simdata", "work.params", "log_log_Pij", "Weight", "x1 x2 x3");
      call append_data("work.simresults", "work.params");
   end;
run;
Simulation Results
Coefficient   Real Value   Mean       StDev
X1            0.1          0.102454   0.036917
X2            0.3          0.307029   0.050375
X3            0.2          0.205464   0.017793
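A summary like the Mean and StDev columns can be produced from the accumulated fits with PROC MEANS. The sketch below assumes that work.simresults holds one row per simulated fit with the estimated coefficients in columns x1, x2, and x3, which is consistent with the `keep &independent` step in run_regression_macro but is not stated on the slides.

```sas
/* Assumes one row per replication, coefficients in x1, x2, x3 */
proc means data=work.simresults n mean std maxdec=6;
   var x1 x2 x3;
run;
```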
Simulation Graphs
Meta Programming
Create your own scoring function dynamically from a fitted model
subroutine create_score(data $, dependent $, independent $, scoreFunc $, library $);
   paramds = "work.params";
   rc = RUN_MACRO('run_regression_macro', data, paramds, dependent, independent);
   rc = RUN_MACRO('create_score_func_macro', paramds, independent, scoreFunc, library);
endsub;
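Score Function Macro
%macro create_score_func_macro;
   /* RUN_MACRO passes character arguments with their quotation marks; remove them */
   %let paramds = %sysfunc(dequote(&paramds));
   %let independent = %sysfunc(dequote(&independent));
   %let scoreFunc = %sysfunc(dequote(&scoreFunc));
   %let library = %sysfunc(dequote(&library));
   /* one row per coefficient: _NAME_ holds the variable, col1 the estimate */
   proc transpose data=&paramds out=&paramds._t;
      var &independent;
   run;
   proc sql noprint;
      select trim(_NAME_) || " * " || strip(put(col1, BEST12.))
         into :theScore separated by " + "
         from &paramds._t;
      select trim(_NAME_)
         into :theArgs separated by " , "
         from &paramds._t;
   quit;
   data _NULL_;
      set &paramds;
      call symputX("Intercept", intercept);
   run;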
<continued>
Score Function Macro - continued
   proc fcmp outlib=&library..score;
      function &scoreFunc(&theArgs);
         return(&Intercept + &theScore);
      endsub;
   quit;
%mend create_score_func_macro;
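The key step in this macro is PROC SQL's INTO ... SEPARATED BY clause, which collapses one row per coefficient into a single expression string. A minimal standalone sketch of that step is shown below. The table work.demo_t and its coefficient values are made up for illustration and are not results from the paper.

```sas
/* Made-up table in the shape PROC TRANSPOSE produces: one row per coefficient */
data work.demo_t;
   length _NAME_ $8;
   _NAME_ = "educ";  col1 = 0.5;  output;
   _NAME_ = "exper"; col1 = 0.25; output;
run;

proc sql noprint;
   /* build "name * value" terms joined with plus signs */
   select trim(_NAME_) || " * " || strip(put(col1, best12.))
      into :theScore separated by " + "
      from work.demo_t;
quit;

/* write the generated expression to the log */
%put &=theScore;
```

With these illustrative rows, theScore should resolve to an expression of the form `educ * 0.5 + exper * 0.25`, which is the text the macro pastes into the body of the generated FCMP function.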
Run Create Score Function
data _NULL_;
   call create_score("work.mroz", "lwage", "educ exper age kidslt6 kidsge6", "PredLWage_Full", "sasuser.score");
   call create_score("work.mroz", "lwage", "educ exper age", "PredLWage_NoKids", "sasuser.score");
run;

data _NULL_;
   educ = 15; exper = 5; age = 30; kidslt6 = 2; kidsge6 = 1;
   PredWage_Full = exp(PredLWage_Full(educ, exper, age, kidslt6, kidsge6));
   put PredWage_Full=;
   PredWage_NoKids = exp(PredLWage_NoKids(educ, exper, age));
   put PredWage_NoKids=;
run;

PredWage_Full=3.4199679212
PredWage_NoKids=3.787216653
Conclusions
Users can encapsulate preexisting analytical procedures as building blocks for larger, more complex statistical analysis methods.
PROC FCMP provides the vehicle to write reusable, independent program units (functions and subroutines).
These units can be written and tested independently.
Where to find more information
http://support.sas.com/saspresents
Paper in PDF form
Zip file containing all source code