Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Never Lose a SAS Job
Not Again!!
Unexpected re-boot, system failures
Long running job didn’t complete
Must manually re-start job from step 1
It can drive you crazy!!!
SAS Grid Gets the Stars Aligned...
SAS checkpoint-restart features
+ LSF requeue capabilities
+ SASGSUB batch submission utility
---------------------------------------------------
Completion of SAS Jobs in Minimal Time
Ideal for critical long-running SAS jobs
SAS Checkpoint/Restart
Checkpoint mode
• Record info about data/proc steps in checkpoint library
Restart mode
• Global statements and macros re-executed
• SAS reads data in checkpoint library to determine which steps completed
• Program execution resumes with step that was executing when failure occurred
• Data/proc steps that completed successfully will not be re-executed
To Set Up for Checkpoint-Restart
Specify the following options on the batch SAS invocation:
• STEPCHKPT – enables checkpoint mode
• STEPRESTART – causes SAS to use checkpoint-restart data
• NOWORKINIT – does not initialize the WORK library when SAS starts
• NOWORKTERM – saves the WORK library when SAS exits
• ERRORCHECK STRICT – puts SAS in syntax check mode when an error occurs in LIBNAME, FILENAME, %INCLUDE, and LOCK statements
• ERRORABEND – causes SAS to terminate for most errors
The WORK Directory
WORK is the default location for the checkpoint library
• Can use STEPCHKPTLIB to point to a permanent library
• Must include the LIBNAME statement as the first statement in the batch program
WORK directory must be on shared storage
Example:
• sas92 -noworkinit -noworkterm -work abc
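As an illustration of the setup described on the last two slides, the following is a minimal Python sketch that assembles a batch command line carrying the checkpoint-restart options. The function name `build_sas_command`, the program name `nightly_load.sas`, and the use of `-sysin` to name the batch program are assumptions made for this example; the `sas92` command and the `-work abc` value are taken from the slide's own example. It only builds the argument list and does not run SAS.

```python
def build_sas_command(program, sas_cmd="sas92", work_dir=None):
    """Return an argument list for a batch SAS run with checkpoint-restart enabled.

    `program` and `work_dir` are caller-supplied paths (hypothetical values here).
    """
    args = [
        sas_cmd,
        "-sysin", program,        # batch program to run (assumed invocation style)
        "-stepchkpt",             # enable checkpoint mode
        "-steprestart",           # use checkpoint-restart data if present
        "-noworkinit",            # do not initialize WORK at startup
        "-noworkterm",            # keep WORK when SAS exits
        "-errorcheck", "strict",  # syntax check mode on LIBNAME/FILENAME/%INCLUDE/LOCK errors
        "-errorabend",            # terminate for most errors
    ]
    if work_dir is not None:
        # WORK must be on shared storage so another host can pick up the job
        args += ["-work", work_dir]
    return args


if __name__ == "__main__":
    print(" ".join(build_sas_command("nightly_load.sas", work_dir="abc")))
```

Keeping the options in one helper makes it harder to forget one of them, since checkpoint-restart depends on all of them being present on every invocation.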
Use of Both STEPCHKPT and STEPRESTART
Initial invocation
• Results in checkpoint mode only
• No data in checkpoint library
Subsequent invocations
• Use data from checkpoint library
• Continue checkpoint mode for remainder of program
SAS Grid Manager – Queues
[Diagram: SAS Application, SAS Grid Manager, Normal Queue, Host A, Host B, Host C]
Automatic Job Requeue
Configure a queue to automatically requeue a job that ends with specific exit values
• REQUEUE_EXIT_VALUES=all ~0 ~1
− A job that ends with any exit code other than 0 or 1 (success and warnings) will be requeued
• REQUEUE_EXIT_VALUES=EXCLUDE(all ~0 ~1)
− Run the requeued job on a different host
• Jobs are requeued 5 times by default
− MAX_JOB_REQUEUE lets you configure the requeue limit; it can be specified globally for all queues or on a per-queue basis
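The requeue rule above can be sketched in Python to make the decision explicit. This is only an illustrative model of the two forms shown on this slide (`all ~0 ~1` and `EXCLUDE(all ~0 ~1)`), not LSF's own parser; the names `RequeuePolicy`, `parse_policy`, and `should_requeue` are hypothetical, and the default limit of 5 is the figure stated on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class RequeuePolicy:
    match_all: bool = False                     # "all" was given
    included: set = field(default_factory=set)  # explicit exit codes
    negated: set = field(default_factory=set)   # codes written as ~N
    exclude_host: bool = False                  # EXCLUDE(...) form: avoid the same host

    def matches(self, exit_code):
        if exit_code in self.negated:
            return False
        return self.match_all or exit_code in self.included


def parse_policy(spec):
    """Parse a REQUEUE_EXIT_VALUES string in one of the two forms on the slide."""
    spec = spec.strip()
    policy = RequeuePolicy()
    if spec.upper().startswith("EXCLUDE(") and spec.endswith(")"):
        policy.exclude_host = True
        spec = spec[len("EXCLUDE("):-1]
    for token in spec.split():
        if token.lower() == "all":
            policy.match_all = True
        elif token.startswith("~"):
            policy.negated.add(int(token[1:]))
        else:
            policy.included.add(int(token))
    return policy


def should_requeue(policy, exit_code, requeue_count, max_job_requeue=5):
    """True if the exit code qualifies and the requeue limit is not yet reached."""
    return policy.matches(exit_code) and requeue_count < max_job_requeue


if __name__ == "__main__":
    policy = parse_policy("all ~0 ~1")
    for code in (0, 1, 2, 137):
        print(code, should_requeue(policy, code, requeue_count=0))
```

With `all ~0 ~1`, exit codes 0 and 1 end the job normally, and any other code sends it back to the queue until the limit is reached.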
Automatic Job Rerun
A job is automatically rerun when
• Execution host becomes unavailable while a job is running
• System fails while a job is running
• RERUNNABLE=yes
LSF Queue Definition
Begin Queue
QUEUE_NAME = sas_rerun
PRIORITY = 40
NICE = 10
RERUNNABLE = YES
REQUEUE_EXIT_VALUES = all ~0 ~1
DESCRIPTION = Jobs submitted to this queue will be requeued automatically and also rerunnable.
End Queue
• Jobs dispatched from this queue will be rerun after a system failure
• Jobs with a fatal exit code will be requeued
SASGSUB Capabilities
Standalone utility that allows the user to
• Submit SAS program to grid for processing
• Display status of user’s jobs on the grid
• Retrieve output from user’s jobs to local directory
• Kill jobs
Using SASGSUB
Advantages
• Submit and forget
• View job output while job is running
• Eliminate need for full SAS install on client
• Make use of SAS checkpoint/restart capability
NOTE: requires a shared file system between client and grid
Submitting a Job
Command line interface
• sasgsub -gridsubmitpgm <sas_pgm>
Example output
Job ID: 6772
Job directory: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm"
Job log file: "/CNT/sasgsub/gridwork/sascnn1/SASGSUB-2009-03-17_14.09.52.847_testPgm/testPgm.log"
Submitting a Job for Checkpoint-Restart
GRIDRESTARTOK
• Automatically adds the following options to the batch SAS invocation
− STEPCHKPT, STEPRESTART, ERRORCHECK STRICT, ERRORABEND, NOWORKINIT, NOWORKTERM
• Sets the RERUNNABLE parameter on the job
Command line interface
• sasgsub -gridsubmitpgm <sas_pgm> -gridrestartok
Getting Job Status
Command line interface
• sasgsub -gridgetstatus <job_id | _ALL_>
Example output
Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57, Started: 08Dec2008:10:28:57 on Host d15003, Ended: 08Dec2008:10:28:57
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57
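For scripted monitoring, the status lines shown on this slide could be parsed along the following lines. This is a rough Python sketch that assumes every entry follows the "Job N (name) is State:" layout of the example output; the real output may vary by release, and the names `parse_status` and `all_finished` are hypothetical.

```python
import re

# Matches entries such as "Job 1917 (testPgm) is Finished:"
STATUS_ENTRY = re.compile(r"Job (\d+) \((.*?)\) is (\w+):")


def parse_status(text):
    """Return {job_id: (program_name, state)} for every status entry found."""
    return {
        int(m.group(1)): (m.group(2), m.group(3))
        for m in STATUS_ENTRY.finditer(text)
    }


def all_finished(text):
    """True only if at least one job is listed and every job is Finished."""
    jobs = parse_status(text)
    return bool(jobs) and all(state == "Finished" for _, state in jobs.values())


if __name__ == "__main__":
    sample = (
        "Current Job Information\n"
        "Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:28:57\n"
        "Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:28:57\n"
    )
    print(parse_status(sample))
    print(all_finished(sample))
```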
Retrieving Results
Command line interface
• sasgsub -gridgetresults <job_id | _ALL_>
Example output
Current Job Information
Job 1917 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-21_21.52.57.130_testPgm
Job 1918 (testPgm) is Finished: Submitted: 08Dec2008:10:53:33, Started: 08Dec2008:10:53:33 on Host d15003, Ended: 08Dec2008:10:53:33
Moved job information to .\SASGSUB-2008-11-24_13.13.39.167_testPgm
Job 1925 (testPgm) is Submitted: Submitted: 08Dec2008:10:53:34
Putting It All Together
[Diagram: SAS Application, SAS Grid Manager, normal queue, sas_rerun queue, Host A, Host B, Host C]
A simple solution
Record a checkpoint number, save it in WORK
If restarting, skip PROC / DATA steps to there
Tokenize everything
Execute all global statements
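The idea on this slide can be modeled in a few lines of Python. This is a toy sketch, not how SAS implements checkpoint-restart: the program is assumed to be already split into entries tagged "global" or "step", and the file name `checkpoint.json`, the function `run_with_checkpoint`, and the `log` list are all hypothetical. On a first run no checkpoint exists, so nothing is skipped; on a restart, global statements run again and completed steps are skipped.

```python
import json
import os


def run_with_checkpoint(statements, work_dir, log):
    """Run statements in order, skipping steps already recorded as complete.

    statements: list of (kind, name, func), where kind is "global" or "step".
    work_dir:   directory that survives between runs (stands in for WORK).
    log:        list that receives (action, name) tuples for inspection.
    """
    path = os.path.join(work_dir, "checkpoint.json")
    completed = 0
    if os.path.exists(path):
        # Restart: read the number of the last step that completed
        with open(path) as f:
            completed = json.load(f)["completed_steps"]

    step_no = 0
    for kind, name, func in statements:
        if kind == "global":
            func()  # global statements are always executed
            log.append(("global", name))
            continue
        step_no += 1
        if step_no <= completed:
            log.append(("skipped", name))  # finished in an earlier run
            continue
        func()  # if this raises, the checkpoint still names the previous step
        log.append(("ran", name))
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"completed_steps": step_no}, f)
        os.replace(tmp, path)  # swap in the new checkpoint in one operation
```

Writing the checkpoint only after a step returns means a failure partway through a step causes that step to be rerun from its beginning on restart, which matches the restart behavior described earlier in the deck.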