30
Parallel processing SAS Batch jobs in Windows, check return codes and then aggregate information This post will discuss a method and present macros to achieve Parallel processing SAS Batch jobs in Windows OS. Scenario: You need to extract data from say a large database or multiple sources and then aggregate that information into one dataset in the least possible time without sacrificing much on the processing time and programming time. Let’s discuss this hypothetical scenario with an example…Imagine that there were 3 independent extract processes…and suppose each of those processes took the below min to process…See Appendix #1 below…The example is simple to understand but the macro might be a little bit overkill…but I am sure it is worth 100% to try…Thanks in advance for your patience…. Process 1 took 30 min Process 2 took 30 min Process 3 took take 30 min Aggregate process took 15 min Total process : 1 hr 45 min.

Parallel Processing SAS Batch Jobs in Windows

Embed Size (px)

DESCRIPTION

Parallel processing SAS Batch jobs in Windows,Learn SAS, programming, SAS slides, SAS tutorials, SAS Macros, SAS certification, SAS Sample Code, SAS Macro examples, SAS video tutorials, SAS ebooks, SAS tutorials, SAS tips and Techniques, Base SAS,Advanced SAS, SAS certification, SAS interview Questions and answers, Proc SQL, SAS syntax, Quick SAS links, SAS Documentation, ODS HTML, ODS, Clinical trials, Financial Industry, Question and Answers, SAS Resumes, SAS Blogs,http://sastechies.blogspot.com, http://www.sastechies.com

Citation preview

Page 1: Parallel Processing SAS Batch Jobs in Windows

Parallel processing SAS Batch jobs in Windows, check return codes and then aggregate information

This post will discuss a method and present macros to achieve Parallel processing SAS Batch jobs in Windows OS.

Scenario:

You need to extract data from say a large database or multiple sources and then aggregate that information into one dataset in the least possible time without sacrificing much on the processing time and programming time.

Let’s discuss this hypothetical scenario with an example…Imagine that there were 3 independent extract processes…and suppose each of those processes took the below min to process…See Appendix #1 below…The example is simple to understand but the macro might be a little bit overkill…but I am sure it is worth 100% to try…Thanks in advance for your patience….

Process 1 took 30 min

Process 2 took 30 min

Process 3 took take 30 min

Aggregate process took 15 min

Total process : 1 hr 45 min.

Now your task is to somehow decreasing the processing time..

The simple solution is to decrease the processing time by processing these independent extract jobs in parallel and then aggregate the info in a separate process. But how do we do that?

Page 2: Parallel Processing SAS Batch Jobs in Windows

Assuming that we have a directory structure like this for the Project to store permanent datasets(data), source information (sourcedata), logs (log), programs(progs) and output (out)

Here I present the steps to do that:

1. Break down the processes into separate .sas programs (Look at Programs in Appendix # 2 #3)

Put each of the extract processes into separate programs…For this example prog1, prog2 , prog 3 and aggregate(aggregation part)

SASStart.bat – It is a batch file (save the code in Appendix # 4 into C:\Project\bat\SASStart.bat)

Page 3: Parallel Processing SAS Batch Jobs in Windows

2. Kick off those separate programs using systask command with the SASStart.bat batch file

Use systask SAS statement to execute the Batch file SASStart.bat asynchronously i.e. concurrently.

More info http://support.sas.com/onlinedoc/913/getDoc/en/hostwin.hlp/win-stmt-systask.htm

Syntax briefly

systask command "SASStart.bat progsname" taskname="task1" status=rc_task1;

systask command – tells SAS to run the bat program asynchronously

taskname – identify the task

status= stores the return code value in the macro variable specified (rc_task1 in the example)

3. Check the return codes before aggregating

Page 4: Parallel Processing SAS Batch Jobs in Windows

We need to check the return codes of each of those process by looking at the values of the macro variables mentioned in the status=macvar.

4. Aggregate the code

Prepare the aggregation code and the process code to do all the above tasks.

Please look at the Appendices for the code and explanation of the code.

Download all the code and data from http://www.sastechies.com/blogs/parallel.zip

All the macros are generic and a little bit of customization in the highlighted areas is required to implement this technique to your programs.

1.Replace these macro variable values to your project specific paths%let progPath=C:\Project\progs; /* Location of the Programs*/%let BatchPath=C:\Project\bat; /* Location of the SAS Start Batch file*/%let prognames= prog1 prog2 prog3; /*name of the program names to execute*/

2. replace the contents of the Prog1-3.sas and aggregate.sas to your project specific programs.

Page 5: Parallel Processing SAS Batch Jobs in Windows

APPENDIX # 1

Code that processes sequentially

Libname datalib "C:\Project\data";Libname source "C:\Project\sourcedata";

/* Process 1: Extracting info starting with J */proc sql;create table datalib.data1 asselect *from source.class1where name like 'J%';quit;

/* Process 2: Extracting info starting with A */proc sql;create table datalib.data2 asselect *from source.class2where name like 'A%';quit;

/* Process 3: Extracting info starting with R */proc sql;create table datalib.data3 asselect *from source.class3

Page 6: Parallel Processing SAS Batch Jobs in Windows

where name like 'R%';quit;

/* aggregate Process: Put the information together */data datalib.aggregatedata; set datalib.data1 datalib.data2 datalib.data3;run;

/* sorting by name */proc sort data=datalib.aggregatedata;by name;quit;

Page 7: Parallel Processing SAS Batch Jobs in Windows

Appendix # 2:

Save the code into each of these .sas files

Program prog1.sas

Libname datalib "C:\Project\data";

/* Process 1: Extracting info starting with J */proc sql;create table datalib.data1 asselect *from source.class1where name like 'J%';quit;

Page 8: Parallel Processing SAS Batch Jobs in Windows

Program prog2.sas

Libname datalib "C:\Project\data";

/* Process 2: Extracting info starting with A */proc sql;create table datalib.data2 asselect *from source.class2where name like 'A%';quit;

Program prog3.sas

Libname datalib "C:\Project\data";

/* Process 3: Extracting info starting with R */proc sql;create table datalib.data3 asselect *from source.class3where name like 'R%';quit;

Page 9: Parallel Processing SAS Batch Jobs in Windows

Appendix # 3:

Program aggregate.sas

Description of each of these macros used in the program

%initiateParallelProcessing – This is the main macro which kicks off all the parallel programs with different SAS sessions. The SYSTASK statement within the Do loop here kicks off the batch program c:\Project\bat\SASStart.bat with program as the parameter for every task. It also has the name of the taskname and stores the return code of the batch process to the macro variable mentioned in the status= statement.

systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.;

the WAITFOR _ALL_ task1 task2 task3 – tells SAS to wait for all the processes to complete before executing all the statements after that i.e. %checkreturncodes and %aggregate etc..

waitfor _all_ %tasklist(&start.,&end.);; /* Wait for All of the tasks to complete */

%checkreturncodes(&start.,&end.,task,"flag_error=N");; - Print All the Error Codes to Log

%aggregatecode – this macro checks all the return codes of the parallel tasks..if 0 then completes the aggregating code after that.

%sleepsecs – sometimes it is better to sleep sas for couple of secs just to let the Windows OS settle with the new task…so I’ve used sleepsecs(2) so that Windows can recognize the batch file execution properly…this is optional

Page 10: Parallel Processing SAS Batch Jobs in Windows

%tasklist(start,end) – I’ve used this macro just to give me the list of the tasks that were spawned..

%Conditions(start,end,rctype)- I’ve used this macro piece to check the values for rc_task1 eq 0 or rc_task2 eq 0 etc for all the task id’s spawned…

%Checkreturncodes(start,end,rctype,cond)- This macro checks for the conditions rc_task1 eq 0 or rc_task2 eq 0 and then sets the condition that a user wants to set…in this case I wanted to set the condition flag_error=N…

%printerrorcodes(start,end,rctype)- This macro prints the error codes to the Log..just a listing of the values of the macro variables of all the tasks..

splitmacroval(macvar,prefix,splitcnt,dlm=' ')- Read more about this macro at-http://sastechies.blogspot.com/2009/11/sas-macro-to-split-macro-variables.html

Full Code

/* Macro to send SAS to a sleep mode for few seconds */%macro sleepsecs(sec); %put;%put =============Macro: sleepsecs(&sec);

data _null_; slept= sleep(&sec);/* argument to sleep must be expressed in seconds */run;%mend sleepsecs;

/* Macro to generate a tasklist task1 task2 etc... */%macro tasklist(start,end);

Page 11: Parallel Processing SAS Batch Jobs in Windows

%do i=&start %to &end.; task&i.%end;%mend tasklist;

/* Macro to check if all the returncodes for the Child sessions is 0 or not*/%macro Conditions(start,end,rctype);%put;%put =============Macro: Conditions(&start,&end,&rctype);

%do i=&start %to %eval(&end - 1); &&&rc_&rctype.&i. eq 0 or %end;%mend Conditions;

/* Macro to Check the return codes of all tasks */%macro Checkreturncodes(start,end,rctype,cond); %put;%put =============Macro: Checkreturncodes(&start,&end,&rctype,&cond);

%put;%put =============================================================================; %put The Error codes for the Above tasks are: %printerrorcodes(&start,&end,&rctype); %put =============================================================================;%put; %if %Conditions(&start,&end,&rctype) &&&rc_&rctype.&end. eq 0 %then %do;

Page 12: Parallel Processing SAS Batch Jobs in Windows

%let leftof=%scan(%sysfunc(compress(&cond,'"')),1,'=');%let rightof=%scan(%sysfunc(compress(&cond,'"')),2,'=');%global &leftof;

%let &leftof=&rightof;%put Flag Set: &leftof=&rightof;%put;

%end;

%mend Checkreturncodes;

/* Macro to print the error codes to the Log */%macro printerrorcodes(start,end,rctype); %put;%put =============Macro: printerrorcodes(&start,&end,&rctype);

%do i=&start %to %eval(&end); rc_&rctype.&i.=&&&rc_&rctype.&i. %end;%mend printerrorcodes;

/* Create a SAS session for each of the Filename Inputs */%macro initiateParallelProcessing; %let start=1;%let end=&progcnt;

options noxwait noxsync;x cd "&BatchPath";

%do i=&start. %to &end.; /* */

Page 13: Parallel Processing SAS Batch Jobs in Windows

/* Call the SASStart.bat with All the parameters in the Back ground mode with Systask command */

%put;%put *************************Task executing task&i.********************************************;

%put systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.;

systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.;

%sleepsecs(3); /* wait for 3 secs atleast for the system to stabilize with the new App*/%end;

waitfor _all_ %tasklist(&start.,&end.);; /* Wait for All of the tasks to complete */

%checkreturncodes(&start.,&end.,task,"flag_error=N");; /* Print All the Error Codes to Log */

%mend initiateParallelProcessing;

%macro splitmacroval(macvar,prefix,splitcnt,dlm=' '); %global &splitcnt; /*create the global variable for storing the splitcount*/ %let num=1; /*initialize a counter variable*/

%global &prefix# /*create the global variable for storing the first split*/ %let &prefix&num=%scan(&&&macvar,&num,&dlm); /*Scan the first value*/

%do %while(&&&prefix&num ne ); %let num=%eval(&num + 1);/*increment the counter*/

Page 14: Parallel Processing SAS Batch Jobs in Windows

%global &prefix# /*create the global variable*/ %let &prefix&num=%scan(&&&macvar,&num,&dlm); /*scan the next value*/ %end;

%let &splitcnt=%eval(&num - 1); /*Store the split count to the macro variable reqd*/%mend splitmacroval;

%let flag_error=Y; /* Initialize the error flag to Y */%let progPath=C:\Project\progs; /* Location of the Programs*/%let BatchPath=C:\Project\bat; /* Location of the SAS Start Batch file*/%let prognames= prog1 prog2 prog3; /*name of the program names to execute*/%splitmacroval(prognames,progs,progcnt,dlm=' ');/* Read more about this macro at-http://sastechies.blogspot.com/2009/11/sas-macro-to-split-macro-variables.html*/

%initiateParallelProcessing;/* Call the SAS Process to process those files */%put The value of flag_error=&flag_error; %put;

%macro aggregatecode;%if &flag_error eq N %then %do;

Libname datalib "C:\Project\data";data datalib.aggregatedata; set datalib.data1 datalib.data2 datalib.data3;run;

proc sort data=datalib.aggregatedata;

Page 15: Parallel Processing SAS Batch Jobs in Windows

by name;quit;

%end;%else %do; %put Errors in Processing...Please check the Logs...; %end;%mend aggregatecode;

%aggregatecode;

Page 16: Parallel Processing SAS Batch Jobs in Windows

Appendix # 4:

Save the code to SASstart.bat and put it in C:\projects\bat\

This windows batch file takes one parameter i.e. name of the sas program (extension .sas is not required). This batch file kicks a batch SAS session with the command

For eg. The macro initiateProcessing uses SASstart.bat to kick off prog2.sas with this command.

"C:\Program Files\SAS\SAS 9.1\sas.exe " -sysin "C:\Project\progs\prog2.sas.sas" -print "C:\Project\out\prog2 08DEC2009.lst" –log "C:\Project\Log\prog2 08DEC2009.log" -unbuflog -nodmsexp -nosplash

Read more about Running SAS in batch mode in Windows OS at

http://sastechies.blogspot.com/2009/12/running-sas-in-batch-mode-in-windows-os.html

tips to understand this program is to compare with SAS syntax…

The Set statement is like our SAS %let statement

%XXX% is like out SAS &XXXX macro resolution

%ERRORLEVEL% is like out SAS &sysrc macvariable

At SET ProgName=%1 – is the parameter that we send to the SASstart.bat program

SASStart.bat

@echo off

Page 17: Parallel Processing SAS Batch Jobs in Windows

if x==%1x goto CHKPARM

goto PROCESS

:PROCESS

echo Start Processing

SET ProgName=%1 SET SASEXEpath=C:\Program Files\SAS\SAS 9.1\sas.exe SET ProgPath=C:\Project\progs SET Prog2Start=%ProgPath%\%ProgName%.sas

for /f "tokens=1-2,* delims= " %%p in ('echo %DATE%') do SET Rundated=%%q echo Rundated=%Rundated%

for /f "tokens=1-3,* delims=/" %%f in ('echo %Rundated%') do SET Rundatet=%%f%%g%%h echo Rundatet=%Rundatet%

SET day=%Rundatet:~2,2% SET year=%Rundatet:~4,4%

SET Monthtemp=%Rundatet:~0,2%

echo Monthtemp=%Monthtemp%

if %Monthtemp%==01 set month=JAN if %Monthtemp%==02 set month=FEB if %Monthtemp%==03 set month=MAR

Page 18: Parallel Processing SAS Batch Jobs in Windows

if %Monthtemp%==04 set month=APR if %Monthtemp%==05 set month=MAY if %Monthtemp%==06 set month=JUN if %Monthtemp%==07 set month=JUL if %Monthtemp%==08 set month=AUG if %Monthtemp%==09 set month=SEP if %Monthtemp%==10 set month=OCT if %Monthtemp%==11 set month=NOV if %Monthtemp%==12 set month=DEC

SET Rundate=%day%%Month%%year%

SET ProjPath=C:\Project SET Logpath=%ProjPath%\Log SET OutputPath=%ProjPath%\out SET Logfile=%Logpath%\%ProgName% %Rundate%.log SET Outputfile=%OutputPath%\%ProgName% %Rundate%.lst

echo Command Executing is echo --------------------------------------------------- echo "%SASEXEpath%" -sysin "%Prog2Start%.sas" -print "%Outputfile%" -log "%Logfile%" -unbuflog -nodmsexp -nosplash echo ---------------------------------------------------

"%SASEXEpath%" -icon -sysin "%Prog2Start%" -print "%Outputfile%" -log "%Logfile%" -unbuflog -nodmsexp -nosplash SET Capterr=%ERRORLEVEL% echo SAS Execution Complete with Exit code %Capterr% goto TERM

Page 19: Parallel Processing SAS Batch Jobs in Windows

:CHKPARM

echo Please check your parameter list echo Correct Usage: DQS AppName Environment FileDate Filename echo Eg: DQS CAHSEE Production 05/10/2009 CAHSEE.DAT goto TERM

:TERMexit /b %Capterr%

Page 20: Parallel Processing SAS Batch Jobs in Windows

Appendix # 5

Log of the Aggregate process

535 /* Macro to send SAS to a sleep mode for few seconds */536 %macro sleepsecs(sec);537 %put;%put =============Macro: sleepsecs(&sec);538539 data _null_;540 slept= sleep(&sec);/* argument to sleep must be expressed in seconds */541 run;542 %mend sleepsecs;543544 /* Macro to generate a tasklist task1 task2 etc... */545 %macro tasklist(start,end);546 %do i=&start %to &end.;547 task&i.548 %end;549 %mend tasklist;550551 /* Macro to check if all the returncodes for the Child sessions is 0 or not*/552 %macro Conditions(start,end,rctype);553 %put;%put =============Macro: Conditions(&start,&end,&rctype);554555 %do i=&start %to %eval(&end - 1);556 &&&rc_&rctype.&i. eq 0 or557 %end;558 %mend Conditions;559560 /* Macro to Check the return codes of all tasks */561 %macro Checkreturncodes(start,end,rctype,cond);562 %put;%put =============Macro: Checkreturncodes(&start,&end,&rctype,&cond);563564 %put;%put =============================================================================;565 %put The Error codes for the Above tasks are: %printerrorcodes(&start,&end,&rctype);566 %put =============================================================================;%put;567568 %if %Conditions(&start,&end,&rctype) &&&rc_&rctype.&end. eq 0 %then569 %do;570 %let leftof=%scan(%sysfunc(compress(&cond,'"')),1,'=');571 %let rightof=%scan(%sysfunc(compress(&cond,'"')),2,'=');572 %global &leftof;573 %let &leftof=&rightof;574 %put Flag Set: &leftof=&rightof;575 %put;576 %end;

Page 21: Parallel Processing SAS Batch Jobs in Windows

577578 %mend Checkreturncodes;579 /* Macro to print the error codes to the Log */580 %macro printerrorcodes(start,end,rctype);581 %put;%put =============Macro: printerrorcodes(&start,&end,&rctype);582583 %do i=&start %to %eval(&end);584 rc_&rctype.&i.=&&&rc_&rctype.&i.585 %end;586 %mend printerrorcodes;587588 /* Create a SAS session for each of the Filename Inputs */589 %macro initiateParallelProcessing;590 %let start=1;591 %let end=&progcnt;592593 options noxwait noxsync;594 x cd "&BatchPath";595596 %do i=&start. %to &end.; /* */597 /* Call the SASStart.bat with All the parameters in the Back ground mode with Systask597! command */598 %put;%put *************************Task executing598! task&i.********************************************;599 %put systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.;600 systask command "SASStart.bat &&progs&i." taskname="task&i." status=rc_task&i.;601 %sleepsecs(3); /* wait for 3 secs atleast for the system to stabilize with the new601! App*/602 %end;603604 waitfor _all_ %tasklist(&start.,&end.);; /* Wait for All of the tasks to complete604! */605 %checkreturncodes(&start.,&end.,task,"flag_error=N");; /* Print All the Error605! Codes to Log */606607 %mend initiateParallelProcessing;608609 %macro splitmacroval(macvar,prefix,splitcnt,dlm=' ');610 %global &splitcnt; /*create the global variable for storing the splitcount*/611 %let num=1; /*initialize a counter variable*/612613 %global &prefix# /*create the global variable for storing the first split*/614 %let &prefix&num=%scan(&&&macvar,&num,&dlm); /*Scan the first value*/615616 %do %while(&&&prefix&num ne );617 %let num=%eval(&num + 1);/*increment the counter*/618 %global &prefix# /*create the global variable*/619 %let &prefix&num=%scan(&&&macvar,&num,&dlm); /*scan the next value*/620 %end;621

Page 22: Parallel Processing SAS Batch Jobs in Windows

622 %let &splitcnt=%eval(&num - 1); /*Store the split count to the macro variable reqd*/623 %mend splitmacroval;624625626 %let progPath=C:\Project\progs; /* Location of the Programs*/627 %let BatchPath=C:\Project\bat; /* Location of the Batch file*/628 %let prognames= prog1 prog2 prog3; /*name of the program names to execute*/629 %splitmacroval(prognames,progs,progcnt,dlm=' ');/* Read more about this macro at-630 http://sastechies.blogspot.com/2009/11/sas-macro-to-split-macro-variables.html*/631632 %let flag_error=Y;633 %initiateParallelProcessing; /* Call the SAS Process to process those files */

*************************Task executing task1********************************************systask command "SASStart.bat prog1" taskname="task1" status=rc_task1

=============Macro: sleepsecs(3)

NOTE: Task "task1" produced no LOG/Output.NOTE: DATA statement used (Total process time): real time 3.06 seconds cpu time 0.04 seconds

*************************Task executing task2********************************************systask command "SASStart.bat prog2" taskname="task2" status=rc_task2

=============Macro: sleepsecs(3)

NOTE: Task "task2" produced no LOG/Output.NOTE: DATA statement used (Total process time): real time 3.06 seconds cpu time 0.04 seconds

*************************Task executing task3********************************************systask command "SASStart.bat prog3" taskname="task3" status=rc_task3

=============Macro: sleepsecs(3)

NOTE: Task "task3" produced no LOG/Output.NOTE: DATA statement used (Total process time): real time 3.06 seconds cpu time 0.04 seconds

=============Macro: Checkreturncodes(1,3,task,"flag_error=N")

Page 23: Parallel Processing SAS Batch Jobs in Windows

=============================================================================

=============Macro: printerrorcodes(1,3,task)The Error codes for the Above tasks are: rc_task1=0 rc_task2=0 rc_task3=0=============================================================================

=============Macro: Conditions(1,3,task)Flag Set: flag_error=N

634 %put The value of flag_error=&flag_error; %put;The value of flag_error=N

635636 %macro aggregatecode;637 %if &flag_error eq N %then638 %do;639 Libname datalib "C:\Project\data";640 data datalib.aggregatedata;641 set datalib.data1 datalib.data2 datalib.data3;642 run;643644 proc sort data=datalib.aggregatedata;645 by name;646 quit;647648 %end;649 %else %do; %put Errors in Processing...Please check the Logs...; %end;650 %mend aggregatecode;651652 %aggregatecode;NOTE: Libref DATALIB was successfully assigned as follows: Engine: V9 Physical Name: C:\Project\data

NOTE: There were 7 observations read from the data set DATALIB.DATA1.NOTE: There were 2 observations read from the data set DATALIB.DATA2.NOTE: There were 2 observations read from the data set DATALIB.DATA3.NOTE: The data set DATALIB.AGGREGATEDATA has 11 observations and 5 variables.NOTE: DATA statement used (Total process time): real time 0.04 seconds cpu time 0.00 seconds

NOTE: There were 11 observations read from the data set DATALIB.AGGREGATEDATA.NOTE: The data set DATALIB.AGGREGATEDATA has 11 observations and 5 variables.NOTE: PROCEDURE SORT used (Total process time): real time 0.07 seconds

Page 24: Parallel Processing SAS Batch Jobs in Windows

cpu time 0.01 seconds

Page 25: Parallel Processing SAS Batch Jobs in Windows

Appendix # 6

Final Datasets

Download all the code and data from

http://www.sastechies.com/blogs/parallel.zip

Read more @ http://sastechies.blogspot.com/

Page 26: Parallel Processing SAS Batch Jobs in Windows