MS Introduction to SAS Training

8/13/2019 MS Introduction to SAS Training

http://slidepdf.com/reader/full/ms-introduction-to-sas-training 1/31

Ground Rules

• Switch off all mobile phones
• No breaks are allowed during the training program
• No discussion among the trainees
• Questions are open for discussion so that everyone can learn, and they must be addressed to the Trainer
• Though the Trainer will try to answer all questions and provide all clarifications sought, in some specific cases he may decide to provide the necessary clarifications after the training and continue ahead with the training
• Every trainee must complete the exercises provided at the end of the training program individually. Training will be considered complete only upon satisfactory completion of the exercises
• All exercises submitted will be discussed with the trainee individually; another meeting might be set up to share the learning among the group

Overview of how the training is organized

Expectations from the Training Program

Numerous software packages are used in academia and industry for data management, statistical analysis and optimization

Statistical Analysis

Predictive Modeling

Decision Trees & Segmentation

Forecasting & Simulation

Optimization

Campaign Management

Win Cross

KnowledgeSeeker

CART

Unica

Evolver Risk Optimizer

SAS provides a complete set of solutions for enterprise-wide business users for data management and analysis

Brief History
• Stands for Statistical Analysis System
• Developed in the early 70's at North Carolina State University
• SAS Institute Inc. formed in 1976

SAS Products
• Base SAS – Data Management and Basic Procedures
• SAS/STAT – Statistical Analysis
• SAS/GRAPH – Presentation Quality Graphics
• SAS/OR – Operations Research
• SAS/ETS – Econometrics and Time Series Analysis
• SAS/IML – Interactive Matrix Language
• PROC SQL (part of Base SAS) – Structured Query Language

Applications of SAS
• Data Entry, Retrieval and Management
• Report Writing
• Statistical and Mathematical Analysis
• Business Planning, Forecasting and Decision Support
• Operations Research
• Quality Improvement
• Applications Development

There are five basic windows available in the SAS software, irrespective of whether it is Windows SAS or Unix SAS

• Editor – Write the SAS program
• Log – Check the log after running the SAS code
• Output – Check the output, if applicable, after SAS processing
• Explorer – Navigate and check libraries and datasets
• Results – Stores past results for review

Screenshot labels: Results, Explorer, Output, Log, Editor, SAS Help

"DATA" and "PROC" steps are the basic and most important data processing methods available in SAS

• The DATA step is used to create a SAS dataset,
  o either temporary or permanent data
  o from raw data or another SAS dataset

• A SAS dataset can be created in multiple ways
  o In-stream raw data

      DATA Employee;
        INPUT Name $ Id Dateofjoining mmddyy10.;
        DATALINES;
      Sundar 1012 09152005
      Indrajit 1000 06012005
      Anindya 1017 12262005
      ;
      RUN;

  o Existing SAS dataset

      DATA Employee2;
        SET Employee;
        Empperiod = today() - dateofjoining;
      RUN;

  o An external delimited file (details later)
  o Remote databases (Oracle, DB2, etc.)

• A SAS procedure (PROC) is used to perform an action on a SAS dataset, for example:
  o Sorting a SAS dataset by one or more variables
  o Running a frequency distribution on a variable in a SAS dataset
  o Fitting an ordinary least squares linear regression model (comes with SAS/STAT)
  o Creating a final report in a client-presentable format

Raw data files in any input format can be read into SAS using the INFILE and INPUT statements in a DATA step

• To create a SAS dataset from a raw data file
  o Start a DATA step and name the dataset being created
  o Mention the location of the raw data file to be read using an INFILE statement

      INFILE 'filename' <options>;

  o Mention the data fields from the raw data file using the INPUT statement

      INPUT varname <$> <var-specifications>;

Raw data in columns

  0031GOLDENBERG   DESIREE
  0040WILLIAMS     ARLENE M.
  0071PERRY        ROBERT A.
  0082MCGWIER-WATTSCHRISTINA

Delimited raw data

  0031,GOLDENBERG,DESIREE
  0040,WILLIAMS,ARLENE M.
  0071,PERRY,ROBERT A.
  0082,MCGWIER-WATTS,CHRISTINA

SAS Dataset
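
As a minimal sketch of the two layouts above, the same four records can be read with column input and with a comma delimiter. The dataset names (Staff_Columns, Staff_Delimited), the variable names and the column positions 1-4, 5-17 and 18-30 are assumptions made for illustration; the records are placed in-stream with DATALINES so that the example does not depend on an external file.

```sas
/* Column input: each field sits at a fixed position in the record. */
/* The column ranges below are assumed from the sample records.     */
DATA Staff_Columns;
  INPUT Id $ 1-4 LastName $ 5-17 FirstName $ 18-30;
  DATALINES;
0031GOLDENBERG   DESIREE
0040WILLIAMS     ARLENE M.
0071PERRY        ROBERT A.
0082MCGWIER-WATTSCHRISTINA
;
RUN;

/* Delimited input: fields are separated by commas, so DLM= is set  */
/* on the INFILE statement. LENGTH avoids the 8-byte default.       */
DATA Staff_Delimited;
  INFILE DATALINES DLM = ',' DSD TRUNCOVER;
  LENGTH Id $4 LastName $13 FirstName $13;
  INPUT Id $ LastName $ FirstName $;
  DATALINES;
0031,GOLDENBERG,DESIREE
0040,WILLIAMS,ARLENE M.
0071,PERRY,ROBERT A.
0082,MCGWIER-WATTS,CHRISTINA
;
RUN;
```

To read an external file instead, INFILE DATALINES would be replaced by INFILE followed by the quoted path of the raw data file.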

Lengths, Informats and Formats

• Every field in a SAS dataset has these three properties defined at the time of creation
• Length of a field is the number of bytes SAS allocates for storing the values of the field in the SAS dataset. By default the length of numeric and character variables is 8 bytes.
• Apart from an explicit LENGTH statement, the length of a character variable can be set in two ways
  o Using an appropriate informat in the INPUT statement
  o By assigning the variable a constant value, in which case the length is set by the first constant value encountered for the variable

  DATA CUSTOMER;
    LENGTH age 3;
    INPUT name $15. age 4.;
    …
  RUN;

  "AGE" is allocated 3 bytes

  DATA NAMES;
    INPUT name $;
    CARDS;
  Tony Hargis
  Dave Eagle
  ;
  RUN;

  "NAME" is allocated 8 bytes by default

Lengths, Informats and Formats… (contd.)

• Informat instructs SAS how to read raw data into a SAS dataset. If no informat is specified, then it is BEST12. for numeric and $w. for character variables, where 'w' is either 8 or the length of the first constant value encountered
• You can specify an informat
  o for numeric variables by using w.d, where 'w' is the total width and 'd' is the number of places after the decimal
  o for character variables by using $w., where 'w' is the maximum number of characters for the variable
• Format is a layout specification for how a variable should be printed or displayed. By default it is BEST12. for numeric and $w. for character variables
• The format of a variable can be changed by using the FORMAT statement in the DATA step.
  o Overrides the default setting of length for a variable when it is created by assignment to a character constant
  o Can be used to display numbers, dates, currency, etc. in a user-friendly manner
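
The difference between an informat (how a value is read) and a format (how it is displayed) can be sketched as follows. The dataset Payments, its variables and the sample records are hypothetical; the colon modifier is used so that each informat is applied to a blank-separated value.

```sas
DATA Payments;
  /* Informats: $10. reads text, MMDDYY10. reads a date,       */
  /* COMMA10. reads a number that may contain thousands commas */
  INPUT customer :$10. paid_on :mmddyy10. amount :comma10.;
  /* Formats: control only how the stored values are displayed */
  FORMAT paid_on DATE9. amount DOLLAR10.2;
  DATALINES;
Sundar 09/15/2005 1,250.50
Indrajit 06/01/2005 980
;
RUN;

/* paid_on is stored as a SAS date number and shown as, e.g., 15SEP2005 */
PROC PRINT DATA = Payments;
RUN;
```

Removing the FORMAT statement leaves the stored values unchanged; the date would simply print as the underlying number of days since 01JAN1960.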

If… Then… Else conditions can be used effectively to execute the SAS process conditionally

• Create a new variable in the same or a different dataset

  DATA Output_Data;
    SET Input_Data;
    IF Variable_1 > 500 THEN Variable_2 = 1;
    ELSE Variable_2 = 0;
  RUN;

• Create several new variables in the same or a different dataset using a DO group

  DATA Output_Data;
    SET Input_Data;
    IF Variable_1 > 500 THEN DO;
      Variable_2 = 1;
      Variable_3 = "Sundar";
    END;
  RUN;

• Create a different dataset based on a criterion

  DATA Output_Data;
    SET Input_Data;
    IF Variable_1 > 500 THEN OUTPUT Output_Data;
  RUN;

Q: How will you output to more than one SAS dataset using the IF statement?

Multiple conditions can be specified using a combination of Logical and Comparison operators

Comparison Operators

  Mnemonic   Operator
  EQ         =
  NE         ~=
  GT         >
  LT         <
  GE         >=
  LE         <=

Logical Operators

  Mnemonic   Operator
  AND        &
  OR         |
  NOT        ~
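
One possible way to route observations to more than one dataset, sketched under the assumption of a small hypothetical Input_Data with two numeric variables: every output dataset is named on the DATA statement, and each branch of the IF… ELSE chain names its target in an OUTPUT statement. The names High_Value, Mid_Value and Low_Value are illustrative only.

```sas
/* Hypothetical input data for the illustration */
DATA Input_Data;
  INPUT Variable_1 Variable_2;
  DATALINES;
250 1
750 0
500 1
900 1
;
RUN;

/* All target datasets are listed on the DATA statement */
DATA High_Value Mid_Value Low_Value;
  SET Input_Data;
  /* Comparison and logical operators combined in one condition */
  IF Variable_1 GT 500 AND Variable_2 EQ 1 THEN OUTPUT High_Value;
  ELSE IF Variable_1 >= 500 THEN OUTPUT Mid_Value;
  ELSE OUTPUT Low_Value;
RUN;
```

Because the branches are chained with ELSE, each observation is written to exactly one of the three datasets.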

The MERGE statement in a DATA step is used to combine multiple datasets based on values of specified common variables

• Common variables in the input datasets are used in the BY statement
• Datasets must be sorted by the common variable(s) prior to merging

  DATA Merged_data;
    MERGE Input_Data_1 (in = AA) Input_Data_2 (in = BB);
    /* Use combinations of AA and BB to control what is written to the output dataset. */
    BY <Common Variables>;
  RUN;

Q: How will you perform a Many-to-Many merge in SAS?
Q: What will happen if you don't use the "BY" statement while merging?

Illustrations: One-to-One Merge, One-to-Many Merge
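
A minimal sketch of a one-to-many merge that uses the IN= flags to separate matched and unmatched records. The datasets Customers and Orders, the key Cust_Id and the three output dataset names are assumptions made for illustration.

```sas
DATA Customers;
  INPUT Cust_Id Name $;
  DATALINES;
1 Sundar
2 Indrajit
3 Anindya
;
RUN;

DATA Orders;
  INPUT Cust_Id Amount;
  DATALINES;
1 100
1 250
3 75
4 60
;
RUN;

/* Both inputs must be sorted by the common variable before merging */
PROC SORT DATA = Customers; BY Cust_Id; RUN;
PROC SORT DATA = Orders;    BY Cust_Id; RUN;

DATA Matched Customers_Only Orders_Only;
  MERGE Customers (IN = AA) Orders (IN = BB);
  BY Cust_Id;
  IF AA AND BB THEN OUTPUT Matched;        /* key present in both inputs */
  ELSE IF AA THEN OUTPUT Customers_Only;   /* customer without an order  */
  ELSE OUTPUT Orders_Only;                 /* order without a customer   */
RUN;
```

With this sample data, customer 1 appears twice in Matched (once per order), customer 2 lands in Customers_Only and order key 4 lands in Orders_Only, so the observation counts before and after the merge can be reconciled.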

SAS Procedures

SAS Procedures – An Introduction

• A SAS procedure, or PROC step, always starts with the word PROC
• Some commonly used Base SAS procedures are listed below

Base SAS Procedures

  Report Writing Procedures: PRINT, FREQ, MEANS, SUMMARY, TABULATE, PLOT, SQL
  Statistical Procedures:    CHART, FREQ, MEANS, CORR, SQL, SUMMARY, UNIVARIATE
  Utility Procedures:        EXPORT, IMPORT, APPEND, CONTENTS, DATASETS, SORT, TRANSPOSE

A typical SAS procedure has a few keywords that are a part of the syntax

  PROC <PROCEDURE NAME> DATA = <DSN Name> OPTIONS;
    BY <Variable List>;
    CLASS <Variable List>;
    VAR <Variable List>;
    WHERE <Condition>;
  RUN;

• Most SAS procedures require an input dataset, which is specified using the DATA= option
• VAR specifies the variables to which the procedure is applied. If no variables are specified, SAS will automatically apply the procedure to all the variables.
• WHERE applies a filter criterion to the observations used by the procedure.
• "Sales" is the SAS dataset used to illustrate the SAS procedures going forward
• Only a few of the most frequently used SAS procedures are covered in the training
• Further, not all options available on each SAS procedure are covered in the training

SAS Procedure – PROC CONTENTS – displays the structure of the dataset

  PROC CONTENTS DATA = SALES OUT = VAR_LIST VARNUM;
  RUN;

• DATA = : Name of the input data set
• OUT = : Name of the output data set, which will contain the list of the variables with their formats
• VARNUM : Option lists all the variables in the same order as present in the data set

Q: What is the output if you don't use the option "varnum"?

The output shows
• # of observations in the dataset
• List of variables with their type, format, length and label
• Whether the dataset has been sorted or not
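
As a small sketch of the call above, the following creates a hypothetical Sales dataset (its variables and values are invented for illustration) and then inspects the dataset that OUT= produces.

```sas
/* Hypothetical Sales dataset used only for this illustration */
DATA Sales;
  INPUT Region $ Product $ Units Revenue;
  DATALINES;
East Pens 10 150
West Pens 5 75
East Books 3 600
;
RUN;

/* VARNUM lists variables in dataset order; OUT= stores the metadata */
PROC CONTENTS DATA = Sales
              OUT = Var_List (KEEP = NAME TYPE LENGTH VARNUM)
              VARNUM;
RUN;

/* One row per variable; TYPE is 1 for numeric and 2 for character */
PROC PRINT DATA = Var_List;
RUN;
```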

SAS Procedure – PROC PRINT – prints the observations of a SAS dataset in the SAS Output window

  PROC PRINT DATA = SALES (FIRSTOBS = X OBS = Y);
    VAR <Variable List>;
    WHERE <Condition>;
  RUN;

• DATA = : Name of the input data set
• FIRSTOBS = / OBS = : Option to print only sample records from row # "X" to row # "Y" of the SAS dataset
• VAR : Option to print only selected variables from the SAS dataset
• WHERE : Option to print sample observations satisfying the criteria

Q: What will the syntax be if you want to print the last 10 observations in the output?

Output of Proc Print Procedure
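
A minimal sketch of the options above on a hypothetical Sales dataset; the variable names, the row range 2 to 4 and the filter value are assumptions chosen for illustration.

```sas
/* Hypothetical Sales dataset used only for this illustration */
DATA Sales;
  INPUT Region $ Product $ Units Revenue;
  DATALINES;
East Pens 10 150
West Pens 5 75
East Books 3 600
North Books 8 1600
West Bags 2 900
;
RUN;

/* Print only rows 2 to 4 and only two of the variables */
PROC PRINT DATA = Sales (FIRSTOBS = 2 OBS = 4);
  VAR Region Revenue;
RUN;

/* Print only the observations that satisfy a condition */
PROC PRINT DATA = Sales;
  VAR Region Product Revenue;
  WHERE Revenue > 100;
RUN;
```

The two requests are kept in separate steps here so that the effect of the row range and the effect of the filter can be checked independently.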

SAS Procedure – PROC DATASETS – is a utility procedure that manages SAS files

• The DATASETS procedure helps to
  o Copy or append SAS files from one library to another
  o Rename, repair or delete SAS files
  o List the SAS files that are contained in a SAS library
  o Create or delete indexes

  PROC DATASETS MEMTYPE = DATA LIB = WORK NOLIST;
    APPEND BASE = <DSN Name> DATA = <DSN Name>;
    CHANGE old_name = new_name;
    COPY IN = libref-1 OUT = libref-2;
    SELECT sas_files;
    DELETE sas_files;
  RUN;
  QUIT;

• MEMTYPE = : Specifies the kind of files to process
• LIB = : Specifies the library
• NOLIST : Suppresses the listing of the library directory
• CHANGE : Specifies the dataset to be renamed
• COPY : Specifies the libraries to copy SAS datasets from and to
• SELECT : Only the specified datasets will be copied
• DELETE : Specifies the SAS datasets to be deleted
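
A short sketch of renaming and deleting datasets in the WORK library. The dataset names Old_Sales, New_Sales and Scratch are hypothetical and are created here only so that the example is self-contained.

```sas
/* Two throwaway datasets in WORK, created only for the illustration */
DATA Old_Sales;
  Revenue = 100;
RUN;

DATA Scratch;
  Dummy = 1;
RUN;

PROC DATASETS LIB = WORK MEMTYPE = DATA NOLIST;
  CHANGE Old_Sales = New_Sales;   /* rename Old_Sales to New_Sales */
  DELETE Scratch;                 /* remove the Scratch dataset    */
RUN;
QUIT;   /* PROC DATASETS keeps running until QUIT is submitted */
```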

SAS Procedure – PROC SORT – orders the SAS dataset observations by the values of one or more variables

• The SORT procedure can be used either to modify the original dataset or to create a new sorted dataset
• SAS, by default, sorts the datasets in ascending order, unless specified otherwise
• Variables should be mentioned in the same order as sorting is required
• Using the NODUPKEY option without the OUT = option overwrites your original dataset, and the removed duplicates will not be available for any future analysis

  PROC SORT DATA = SALES OUT = <DSN Name> NODUPKEY DUPOUT = <DSN Name>;
    BY <Variable List>;
    WHERE <Condition>;
  RUN;

• DATA = : Name of the input data set
• OUT = : Name of the output data set that receives the sorted observations
• NODUPKEY : Option to remove duplicates
• DUPOUT = : Option to store only the duplicates in a separate SAS dataset
• BY : Sort values by one or more variables
• WHERE : Filter criteria

Q: What option will be used if you want to remove duplicate records, when duplicates are to be identified using all the variables in the SAS dataset?
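
The following sketch shows NODUPKEY together with OUT= and DUPOUT=, so that the original dataset is left untouched and the removed duplicates remain available. The Sales data and the output names Sales_Sorted and Sales_Dups are assumptions for illustration.

```sas
/* Hypothetical Sales dataset containing one repeated record */
DATA Sales;
  INPUT Region $ Product $ Revenue;
  DATALINES;
West Pens 75
East Pens 150
East Books 600
East Pens 150
;
RUN;

/* Ascending by Region, then descending by Revenue within Region. */
/* Observations repeating a Region/Revenue combination go to      */
/* Sales_Dups instead of being lost.                              */
PROC SORT DATA = Sales OUT = Sales_Sorted NODUPKEY DUPOUT = Sales_Dups;
  BY Region DESCENDING Revenue;
RUN;
```

Here Sales_Sorted keeps three observations and Sales_Dups receives the second East/150 record, while Sales itself is unchanged.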

SAS Procedure – PROC FREQ – produces one-way to n-way frequency and cross-tabulation (contingency) tables

• For two-way tables, PROC FREQ can compute tests and measures of association

  PROC FREQ DATA = SALES;
    WEIGHT <Weight Variable>;
    TABLES <Variable List> / MISSING NOROW NOCOL NOPERCENT ALL;
    WHERE <Condition>;
  RUN;

• WEIGHT : Option to give different weights to the observations
• TABLES options
  o MISSING: Treats missing values as a separate category
  o NOROW: Removes row percentages
  o NOCOL: Removes column percentages
  o NOPERCENT: Removes cell percentages

Q: What option would you use to generate three-way tables?
Q: How would you output the results of the Freq procedure to a SAS dataset?

Sample Proc Freq Procedure Output; Output with Statistical Test
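
A minimal sketch of a one-way and a two-way table on a hypothetical Sales dataset. The variables Region, Product and Units are invented for illustration, and CHISQ is used here as one specific test in place of the broader ALL option.

```sas
/* Hypothetical Sales dataset used only for this illustration */
DATA Sales;
  INPUT Region $ Product $ Units;
  DATALINES;
East Pens 10
West Pens 5
East Books 3
North Books 8
West Bags 2
North Pens 4
;
RUN;

PROC FREQ DATA = Sales;
  WEIGHT Units;                     /* each record counts Units times */
  TABLES Region;                    /* one-way frequency table        */
  TABLES Region * Product
         / MISSING NOROW NOCOL NOPERCENT CHISQ;   /* two-way table    */
RUN;
```

With so few records the chi-square test is only a syntax illustration; SAS is likely to warn that the expected cell counts are too small for the test to be reliable.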

SAS Procedure – PROC MEANS – produces summary statistics

• PROC MEANS also
  o Computes descriptive statistics based on moments and quantiles
  o Calculates confidence intervals for the mean
  o Performs a t-test

  PROC MEANS DATA = SALES;
    CLASS <Variable List>;
    VAR <Variable List>;
    OUTPUT OUT = <DSN Name> <Summary Procedure>;
  RUN;

Example of <Summary Procedure>:
  SUM(Variable) = New Variable 1
  MEAN(Variable) = New Variable 2
  MIN(Variable) = New Variable 3
  MAX(Variable) = New Variable 4

Sample Proc Means Procedure Output

Q: Which SAS dataset will contain the results without the use of the "output out =" option?
Q: Which default variables will SAS create in the output SAS dataset?
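
The OUTPUT statement above can be sketched on a hypothetical Sales dataset as follows; the dataset Sales_Summary and the new variable names are assumptions made for illustration.

```sas
/* Hypothetical Sales dataset used only for this illustration */
DATA Sales;
  INPUT Region $ Revenue;
  DATALINES;
East 150
West 75
East 600
North 1600
West 900
;
RUN;

/* NOPRINT suppresses the printed report; results go to Sales_Summary */
PROC MEANS DATA = Sales NOPRINT;
  CLASS Region;
  VAR Revenue;
  OUTPUT OUT = Sales_Summary
         SUM(Revenue)  = Total_Revenue
         MEAN(Revenue) = Avg_Revenue
         MIN(Revenue)  = Min_Revenue
         MAX(Revenue)  = Max_Revenue;
RUN;

PROC PRINT DATA = Sales_Summary;
RUN;
```

Printing Sales_Summary is a convenient way to explore the second question above, since the automatically created variables appear next to the requested statistics.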

List of audit checks to keep in mind while working with SAS for data processing and analysis

1. Check the SAS log after every step of data processing
2. Use "Proc Print" after every data processing step
3. Ensure that meaningful names are given to SAS variables and SAS datasets
4. Comment your code – use comments judiciously to indicate the purpose of SAS processing and for future reference as well
5. Indent the code so that it is easy to read
6. Use a "BY" while merging SAS datasets. Check the # of observations pre and post data merge
7. Use "else if" when using a series of "if" statements in a Data Step
8. Be careful of Divide by Zero errors during Data Step processing!
9. Run your code on a sample of 10 observations before running it on the entire SAS dataset
10. Have your code audited and verified by someone else to confirm there are no logical issues
11. Check the processing for issues after creation of a new variable
12. Be careful of missing values while processing
13. Pay extra attention while reading external files into SAS. There is a separate list of audit checks to be followed to ensure there are no issues

ADD THE WORD "ALWAYS" IN FRONT OF EACH STATEMENT
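
Several of these checks (sample run, divide by zero, "else if", missing values) can be combined in one short DATA step. This is only a sketch: the Sales data, the variable names and the tier thresholds are assumptions made for illustration.

```sas
/* Hypothetical Sales dataset with a zero and a missing value */
DATA Sales;
  INPUT Region $ Units Revenue;
  DATALINES;
East 10 150
West 0 75
East 3 600
North 8 .
West 2 1900
;
RUN;

DATA Sales_Checked;
  SET Sales (OBS = 10);            /* check 9: trial run on the first 10 rows */
  LENGTH Tier $7;                  /* avoid truncation by the first constant  */

  /* check 8: guard against division by zero (and missing Units) */
  IF Units > 0 THEN Unit_Price = Revenue / Units;
  ELSE Unit_Price = .;

  /* checks 7 and 12: chained ELSE IF, with missing handled first */
  IF Revenue = . THEN Tier = "Unknown";
  ELSE IF Revenue >= 1000 THEN Tier = "High";
  ELSE IF Revenue >= 500 THEN Tier = "Medium";
  ELSE Tier = "Low";
RUN;

/* check 2: inspect the result after the processing step */
PROC PRINT DATA = Sales_Checked;
RUN;
```

Testing for the missing value first matters because SAS treats a missing numeric value as smaller than any number, so it would otherwise fall silently into the "Low" tier.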

SAS Help

• Other than the built-in SAS help, there are many websites which provide assistance
  o Website 1: http://v8doc.sas.com/sashtml/
  o Website 2: http://www.ats.ucla.edu/stat/sas/
  o Website 3: www.google.com

IF NONE OF THESE HELP, ASK YOUR COLLEAGUE!