Statistical Methods
Lynne Stokes
Department of Statistical Science
Lecture 7: Introduction to SAS Programming Language
Preliminaries
• Create a Folder: c:/Stat6337
– Send to the Desktop
• Access Blackboard
• Download the Eysenck Data File
• Download the lecture7Eysenck.sas File
• Download the lecture7class.sas File
• Download the lecture7SASSummary.doc File
Eysenck’s Data File
                           Recall Condition
Age Group  Counting  Rhyming  Adjective  Imagery  Intentional
old               9        7         11       12           10
old               8        9         13       11           19
old               6        6          8       16           14
old               8        6          6       11            5
old              10        6         14        9           10
old               4       11         11       23           11
old               6        6         13       12           14
old               5        3         13       10           15
old               7        8         10       19           11
old               7        7         11       11           11
young             8       10         14       20           21
young             6        7         11       16           19
young             4        8         18       16           17
young             6       10         14       15           15
young             7        4         13       18           22
young             6        7         22       16           16
young             5       10         17       20           22
young             7        6         16       22           22
young             9        7         12       14           18
young             7        7         11       19           21
Open the SAS Program
• Double-click the lecture7.sas File
– Press the Run Icon (Runner Image)
• Editor
– Create and Modify SAS Command Files
– Can Save in the Stat 6337 Folder: File / Save As …
• Log
– Messages about the Compilation and Execution of the SAS Program
– Contains Error Messages (in red), if any
– Can Save in the Stat 6337 Folder: File / Save As …
• Output
– Results of the Execution of the SAS Program
– Can Save in the Stat 6337 Folder: File / Save As …
To Erase the Contents of the Log or Output Files: Right Click, Select “Clear All”
SAS Structure
• DATA Step
– Describe the data, provide names for variables, define new or transformed variables
• PROCs: SAS Procedures
– Descriptive Statistics: Proc Univariate, Proc Means
– Graphics: Proc Chart, Proc Plot
– Regression: Proc Reg
– Two-sample t-tests: Proc Ttest
– Analysis of Variance: Proc Anova, Proc GLM, Proc Mixed
– Specialized Data Operations: Proc Sort
– etc.
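A minimal sketch of the two-part structure, assuming a small made-up data set: the data set name scores, the variable names, and the values are hypothetical and are not taken from the course files.

```sas
* DATA step: describe the data and name the variables ;
data scores;                 /* "scores" is a hypothetical data set name */
  input id score;
  logscore = log(score);     /* a transformed variable */
  datalines;
1 12
2 15
3 9
;
run;

* PROC step: analyze the data set created above ;
proc means data=scores;
  var score logscore;
run;
```

The DATA step only builds the data set; nothing is analyzed or printed until a PROC is run on it.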
SAS Syntax
• Every command MUST end with a semicolon
– Commands can continue over two or more lines
– This WILL be Your #1, #2 & #3 Mistakes !!!!
• Variable names are 1-8 characters (letters and numerals, beginning with a letter or underscore), but no blanks or special characters
– Note: SAS Version 7 and later accept variable names of up to 32 characters
– Note: values for character variables can exceed 8 characters
• Comments
– Begin with *, end with ;
– Can comment several lines: begin with /* and end with */
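The syntax rules above can be sketched as follows. The data set name demo and its values are hypothetical, chosen only to show where semicolons and comments go.

```sas
* A one-line comment begins with an asterisk and ends with a semicolon ;

/* A block comment can span
   several lines */

data demo;            /* hypothetical data set name */
  input x
        y;            /* one statement continued over two lines */
  total = x + y;
  datalines;
1 2
3 4
;
run;
```

If the semicolon after y were left off, SAS would read the next line as part of the Input statement, which is the kind of mistake the Log reports in red.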
Data Input in the SAS File
• Data fname ;
– creates temporary file with the data that are described in the data step
• Input name . . . name $ . . . ;
– list input: lists the variable names (1 – 8 characters/letters), name is assumed to be a quantitative variable
– name MUST be followed by $ if name is a character variable
– alternatives: comma separated, column specified
• Datalines (or Cards) ;
– indicates that the data follow, line by line
• ;
– indicates that the last line of data has been input, the semicolon is on a line by itself
• Example: lecture7class.sas
– Open lecture7class.sas
» Change filename, if necessary
– Clear output and log files; Run lecture7class.sas
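A short sketch of list input with in-stream data. This is not the content of lecture7class.sas; the data set name class, the variable names, and the three data lines are made up for illustration.

```sas
data class;                      /* hypothetical data set name */
  input name $ gender $ height;  /* $ marks name and gender as character */
  datalines;
Ann f 64
Bob m 70
Cara f 66
;
run;

proc print data=class;
run;
```

Values on each data line are separated by blanks and are read in the order the variables appear on the Input statement.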
Data Input with Multiple Responses on a Single Line of the Data File
• SAS Requires that Each Response Value be on a Separate Line of Data
• When n Responses are on One Line of Data
– Input y1 y2 … yn ;
– y = y1; output;
– y = y2; output;
– . . .
– y = yn; output;
– Creates n Data Lines with 1 Response Value on Each Line
• If y1 … yn Represent Responses for n Levels of a Factor
– Input y1 y2 … yn ;
– factor = 'Level 1'; y = y1; output;
– factor = 'Level 2'; y = y2; output;
– . . .
– factor = 'Level n'; y = yn; output;
– Creates n Data Lines with 1 Factor & Response Value on Each Line
• Example: lecture7.sas
– Data Flow2
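As an illustration of the second pattern, the sketch below restructures four lines of the Eysenck table (the first two old and first two young subjects) so that each recall count lands on its own line with its condition. It is not the Flow2 step from lecture7.sas; the names recall, group, and y1-y5 are assumptions.

```sas
data recall;                       /* hypothetical data set name */
  length group $ 11;               /* long enough for 'Intentional' */
  input age $ y1 y2 y3 y4 y5;
  group = 'Counting';    recall = y1; output;
  group = 'Rhyming';     recall = y2; output;
  group = 'Adjective';   recall = y3; output;
  group = 'Imagery';     recall = y4; output;
  group = 'Intentional'; recall = y5; output;
  drop y1-y5;                      /* keep only age, group, recall */
  datalines;
old 9 7 11 12 10
old 8 9 13 11 19
young 8 10 14 20 21
young 6 7 11 16 19
;
run;

proc print data=recall;
run;
```

Each of the 4 input lines produces 5 output lines, so the resulting data set has 20 observations. The Length statement is included because, without it, group would take its length from 'Counting' and longer condition names would be truncated.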
Data Input from an External File
• Filename fn 'complete directory/file specification' ;
– e.g., filename eysdata 'c:/Stat6337/EysenckRecall.dat' ;
– Be Careful with Spaces in Directories and File Names !!!
• Data fname ;
– creates temporary file with the data that are described in the data step
• Infile fn ;
– input the data from the file labeled fn
• Input name . . . name $ . . . ;
– lists the variable names (1 – 8 characters/letters), name is assumed to be a quantitative variable
– name MUST be followed by $ if name is a character variable
• Run ;
– indicates that the data step is completed
• Example: lecture7class.sas
– Data Recall
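A sketch of reading the Eysenck file with these statements. The layout of EysenckRecall.dat is an assumption here (one subject per line: age group followed by the five recall counts, blank separated), and the variable names are invented to fit the 8-character limit; the Recall step in lecture7class.sas may differ.

```sas
filename eysdata 'c:/Stat6337/EysenckRecall.dat';

data recall;                      /* hypothetical data set name */
  infile eysdata;                 /* add firstobs=2 if the file starts with one header line */
  input age $ counting rhyming adjectiv imagery intent;
run;

proc print data=recall;
run;
```

Printing the data set right after reading it is a quick check that the values landed in the intended variables.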
Program Data Vector
• One line of data is stored, as indicated on the Input statement of the Data Step
• Any calculations, deletions, etc. in the Data Step are performed on that line of data
• When the end of the Data Step is reached, the variables in the Program Data Vector are output to a temporary (work) file, and the next line of data is read
• Can force data lines to be written at any time with the Output statement
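One way to watch the Program Data Vector at work is to write it to the Log on every pass through the Data Step. The sketch below uses a hypothetical data set pdvdemo with made-up values.

```sas
data pdvdemo;                /* hypothetical data set name */
  input x y;
  z = x + y;                 /* computed on the line currently held */
  put _all_;                 /* writes the current Program Data Vector to the Log */
  datalines;
1 2
3 4
;
run;
```

The Log then shows one entry per data line, listing x, y, z and the automatic variables _N_ and _ERROR_ that SAS keeps alongside them.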
Operations in the Data Step
• Arithmetic Operations
– x = u + v ;
• Transformations
– x = log(y) ;
• Logical
– If x > 0 then z = y/x ;
• Recoding
– If gender = 'm' then gender = 'Male';
  else if gender = 'f' then gender = 'Female';
– Note: SAS sets a variable's type and length from the first value assigned to it
– To force a length (e.g., character variable), use length
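The operations above can be combined in one Data Step. In this sketch the data set name people, the new variable sex, and the data values are hypothetical; the recoded value is stored in a new variable so the effect of the Length statement is easy to see.

```sas
data people;                     /* hypothetical data set name */
  length sex $ 6;                /* room for 'Female' */
  input gender $ x y;
  if x > 0 then z = y / x;       /* z stays missing when x <= 0 */
  logy = log(y);                 /* natural logarithm */
  if gender = 'm' then sex = 'Male';
  else if gender = 'f' then sex = 'Female';
  datalines;
m 2 10
f 0 5
f 4 8
;
run;

proc print data=people;
run;
```

Without the Length statement, sex would take its length (4) from 'Male', the first value assigned in the program, and 'Female' would be stored as 'Fema'.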
Titles and Labels
• Title# '…' ;
– Up to 10 title lines: title# 'include your title here';
– Can be placed in Data Steps or Procs
– Changing Title# replaces that title and eliminates Titlex, where x > #
• Label name = '…' ;
– Can be in a Data Step or Proc Print
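A brief sketch of titles and labels together, using a hypothetical data set scores and made-up title text.

```sas
data scores;                     /* hypothetical data set name */
  input id score;
  label score = 'Test score (points)';
  datalines;
1 12
2 15
;
run;

title1 'Lecture 7 example';
title2 'Listing of the scores data';
proc print data=scores label;    /* the label option prints labels as column headings */
run;

title2 'Summary of the scores data';   /* replaces title2; title1 is unchanged */
proc means data=scores;
run;
```

Redefining title2 leaves title1 in place; redefining title1 instead would also have cleared title2.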
Some Useful PROCs
• Proc Chart
– vertical or horizontal bar charts
• Proc Freq
– frequency distributions, cross tabs
• Proc Means
– select summary statistics
• Proc Plot
– scatterplots
• Proc Print
– prints data files
• Proc Sort
– sorts data files by the values of one or more variables
• Proc Univariate
– a wide range of summary statistics, box plots
General Form of PROCs
PROC xxxx data=fname options;
  by groups;
  proc-specific statements;
  title . . . ;
  output out = fn . . . ;
run ;
Printing to the Output File
• Proc Print data = fname ;
– var . . . ; lists the variables to be printed (can be omitted)
– run ; indicates the print commands are complete
Group Analyses
• Sort the Groups
– Proc Sort data= … ;
– by group;
– run;
• Execute the Proc, by Group
– Proc xxx data= … ;
– by group;
– . . .
– run;
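The sort-then-analyze pattern might look like the sketch below. The four values are the Intentional counts for the first two old and first two young subjects in the Eysenck table, deliberately entered out of order; the data set and variable names are assumptions.

```sas
data recall;                 /* hypothetical data set name */
  input age $ recall;
  datalines;
young 21
old 10
young 19
old 19
;
run;

proc sort data=recall;
  by age;                    /* puts the old lines before the young lines */
run;

proc means data=recall;
  by age;                    /* separate summary for each age group */
  var recall;
run;
```

Running the second Proc with a By statement on unsorted data stops with an error in the Log, which is why the sort comes first.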
Summarize the Recall Data
Calculate frequencies for each condition/group and each age: Proc Freq
Graph a histogram of the recall data: Proc Chart
Calculate the average, standard deviation, minimum, and maximum to 2 decimal places: Proc Means
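A possible sketch of these three summaries, run on a small subset of the Eysenck table (Counting and Intentional counts for two old and two young subjects) rather than the full file. The data set and variable names are assumptions.

```sas
data recall;                 /* hypothetical data set name */
  length group $ 11;         /* long enough for 'Intentional' */
  input age $ group $ recall;
  datalines;
old Counting 9
old Intentional 10
old Counting 8
old Intentional 19
young Counting 8
young Intentional 21
young Counting 6
young Intentional 19
;
run;

proc freq data=recall;
  tables group age;          /* one frequency table per variable */
run;

proc chart data=recall;
  vbar recall;               /* vertical bar chart (histogram) of recall */
run;

proc means data=recall mean std min max maxdec=2;
  var recall;
run;
```

The maxdec=2 option limits the printed statistics to 2 decimal places; listing mean, std, min, and max restricts the output to those four statistics.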
Summarize the Recall Data
Calculate descriptive statistics for each condition/group: Proc Means, Proc Univariate
Note: Sort First, then Use the BY Command.
Graph Average Recall for All Combinations of Recall Condition/Group and Age: Proc Plot
Use a Group Identifier as the Plotting Symbol.
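One way these steps could be put together is sketched below on a subset of the Eysenck table. As an assumption made for plotting, the recall condition is stored as a numeric code cond (1 = Counting, 5 = Intentional); the data set names recall and avgs and the variable avgrec are also hypothetical.

```sas
data recall;                 /* hypothetical data set name */
  input age $ cond recall;   /* cond: 1 = Counting, 5 = Intentional (assumed coding) */
  datalines;
old 1 9
old 5 10
old 1 8
old 5 19
young 1 8
young 5 21
young 1 6
young 5 19
;
run;

proc sort data=recall;
  by cond;
run;

proc means data=recall;      /* descriptive statistics for each condition */
  by cond;
  var recall;
run;

proc means data=recall noprint nway;   /* one average per age-by-condition cell */
  class age cond;
  var recall;
  output out=avgs mean=avgrec;
run;

proc plot data=avgs;
  plot avgrec*cond=age;      /* plotting symbol: first character of age (o or y) */
run;
quit;
```

The second Proc Means prints nothing; it only saves the cell averages in avgs so that Proc Plot can graph them.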
Proc Anova
• Only for Complete Factorial Experiments in Completely Randomized Designs
– Otherwise: Proc GLM
• MUST have an Equal Number of Repeats for Each Factor-Level Combination
Proc Anova
• Proc Anova data = fn ;
– By … ;
» Separate ANOVA Fits for Each Value of the BY variable(s).
– Class … ;
» List all the factors.
– Model … / options;
» e.g., model recall = age group age*group ;
» factors: list individually; e.g., age group
» interactions: connect with asterisk(s); e.g., age*group
– Means … / options;
» e.g., means age group age*group / t bon;
– Run;
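A sketch of a two-factor Proc Anova run, using a balanced subset of the Eysenck table (two subjects per age group, Counting and Intentional conditions only, so 2 repeats per factor-level combination). The data set and variable names are assumptions, and the Means statement here lists only the main effects.

```sas
data recall;                       /* hypothetical data set name */
  length group $ 11;               /* long enough for 'Intentional' */
  input age $ y1 y2;
  group = 'Counting';    recall = y1; output;
  group = 'Intentional'; recall = y2; output;
  drop y1 y2;
  datalines;
old 9 10
old 8 19
young 8 21
young 6 19
;
run;

proc anova data=recall;
  class age group;                       /* both factors */
  model recall = age group age*group;    /* main effects and interaction */
  means age group / t bon;               /* t and Bonferroni comparisons */
run;
quit;
```

Proc Anova stays active after Run so that further statements can be submitted; Quit ends it.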
Eysenck’s Study of Incidental Learning
Make analysis of variance calculations, use only recall condition as factor.
Calculate factor-level averages, with the t option.
Effect of Cocaine Usage on Newborn Infant Body Lengths
Research Question: Do Mean Body Lengths (cm) Differ by Cocaine Usage?
Usage Groups: First Trimester, Throughout Pregnancy, Drug-Free
Effect of Cocaine Usage on Newborn Infant Body Lengths
Case     First Trimester  Throughout Pregnancy  Drug-Free
1                   45.1                  40.2       44.3
2                   45.7                  41.3       45.3
3                   45.8                  41.7       46.9
4                   46.7                  41.9       47.0
5                   47.3                  43.4       47.2
Average            46.12                 41.70      46.14
Assignment
• Create a Data File
• Input the Data File into a SAS Program
• Cocaine Usage Groups
– Calculate Averages and Standard Deviations
– Make Comparative Box Plots
– Test the Equality of the Group Means
• Email Me ONLY the FINAL .log File