
Statistical Methods
Lynne Stokes
Department of Statistical Science
Lecture 7: Introduction to SAS Programming Language


2

Preliminaries

• Create a Folder: c:/Stat6337
  – Send to the Desktop
• Access Blackboard
• Download the Eysenck Data File
• Download the lecture7Eysenck.sas File
• Download the lecture7class.sas File
• Download the lecture7SASSummary.doc File

3

Eysenck’s Data File

                        Recall Condition
Age Group   Counting   Rhyming   Adjective   Imagery   Intentional
old             9         7         11         12          10
old             8         9         13         11          19
old             6         6          8         16          14
old             8         6          6         11           5
old            10         6         14          9          10
old             4        11         11         23          11
old             6         6         13         12          14
old             5         3         13         10          15
old             7         8         10         19          11
old             7         7         11         11          11
young           8        10         14         20          21
young           6         7         11         16          19
young           4         8         18         16          17
young           6        10         14         15          15
young           7         4         13         18          22
young           6         7         22         16          16
young           5        10         17         20          22
young           7         6         16         22          22
young           9         7         12         14          18
young           7         7         11         19          21

4

Open the SAS Program

• Double-click the lecture7.sas File
  – Press the Run Icon (Runner Image)
• Editor
  – Create and Modify SAS Command Files
  – Can Save in the Stat 6337 Folder: File / Save As …
• Log
  – Messages about the Compilation and Execution of the SAS Program
  – Contains Error Messages (in red), if any
  – Can Save in the Stat 6337 Folder: File / Save As …
• Output
  – Results of the Execution of the SAS Program
  – Can Save in the Stat 6337 Folder: File / Save As …

To Erase the Contents of the Log or Output Files: Right Click, Select “Clear All”

5

SAS Structure

• DATA Step
  – Describe the data, provide names for variables, define new or transformed variables
• PROCs: SAS Procedures
  – Descriptive Statistics: Proc Univariate, Proc Means
  – Graphics: Proc Chart, Proc Plot
  – Regression: Proc Reg
  – Two-sample t-tests: Proc Ttest
  – Analysis of Variance: Proc Anova, Proc GLM, Proc Mixed
  – Specialized Data Operations: Proc Sort
  – etc.
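The two-part structure can be sketched as a DATA step followed by a PROC step. This is only an illustration: the data set name firsttri and the variable names caseno, bodylen, and loglen are assumptions, and the values are the First Trimester body lengths from the cocaine-usage table later in this lecture.

```sas
/* DATA step: describe the data and define a transformed variable */
data firsttri;                /* hypothetical data set name */
   input caseno bodylen;      /* bodylen: body length in cm */
   loglen = log(bodylen);     /* new variable created in the DATA step */
   datalines;
1 45.1
2 45.7
3 45.8
4 46.7
5 47.3
;
run;

/* PROC step: apply a procedure to the data set created above */
proc means data=firsttri;
   var bodylen loglen;
run;
```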

6

SAS Syntax

• Every command MUST end with a semicolon
  – Commands can continue over two or more lines
  – This WILL be Your #1, #2 & #3 Mistakes !!!!
• Variable names are 1-8 characters (letters and numerals, beginning with a letter or underscore), but no blanks or special characters
  – Note: values for character variables can exceed 8 characters
• Comments
  – Begin with *, end with ;
  – Can comment several lines: begin with /* and end with */
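A short sketch of these syntax rules, assuming a hypothetical data set named lengths; the two data lines are the first two Throughout Pregnancy values from the cocaine-usage table later in this lecture.

```sas
* A comment that begins with an asterisk and ends with a semicolon ;

/* A block comment
   can span several lines */

data lengths;                 /* hypothetical data set name */
   input caseno
         bodylen ;            /* one statement continued over two lines */
   datalines;
1 40.2
2 41.3
;
run;

proc print data=lengths;
run;
```

Note that an asterisk-style comment ends at the first semicolon it contains, so the comment text itself should not include one.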

7

Data Input in the SAS File

• Data fname ;
  – creates temporary file with the data that are described in the data step
• Input name . . . name $ . . . ;
  – list input: lists the variable names (1 – 8 characters/letters), name is assumed to be a quantitative variable
  – name MUST be followed by $ if name is a character variable
  – alternatives: comma separated, column specified
• Datalines (or Cards) ;
  – indicates that the data follow, line by line
• ;
  – indicates that the last line of data has been input, the semicolon is on a line by itself
• Example: lecture7class.sas
  – Open lecture7class.sas
    » Change filename, if necessary
  – Clear output and log files; Run lecture7class.sas
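As an illustration of list input with in-stream data, the sketch below reads four rows of Eysenck's data. The data set name and the variable names (shortened to 8 characters or fewer) are assumptions and need not match those used in lecture7class.sas.

```sas
data recall;                  /* hypothetical data set name */
   /* age is a character variable, so it is followed by $ */
   input age $ counting rhyming adjectiv imagery intent;
   datalines;
old 9 7 11 12 10
old 8 9 13 11 19
young 8 10 14 20 21
young 6 7 11 16 19
;
run;

proc print data=recall;
run;
```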

8

Data Input with Multiple Responses on a Single Line of the Data File

• SAS Requires that Each Response Value be on a Separate Line of Data
• When n Responses are on One Line of Data
  – Input y1 y2 … yn ;
  – y = y1; output;
  – y = y2; output;
  – . . .
  – y = yn; output;
  – Creates n DataLines with 1 Response Value on Each Line
• If y1 … yn Represent Responses for n Levels of a Factor
  – Input y1 y2 … yn ;
  – factor = 'Level 1'; y = y1; output;
  – factor = 'Level 2'; y = y2; output;
  – . . .
  – factor = 'Level n'; y = yn; output;
  – Creates n DataLines with 1 Factor & Response Value on Each Line
• Example: lecture7.sas
  – Data Flow2
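The pattern above could look as follows for Eysenck's data, where each input line holds five responses. This is a sketch, not the code in lecture7.sas: the names recall2, cond, and recall are assumptions, and the LENGTH statement is added so that the longest level name is not cut off.

```sas
data recall2;                 /* hypothetical data set name */
   length cond $ 11;          /* long enough for 'Intentional' */
   input age $ counting rhyming adjectiv imagery intent;
   cond = 'Counting';    recall = counting; output;
   cond = 'Rhyming';     recall = rhyming;  output;
   cond = 'Adjective';   recall = adjectiv; output;
   cond = 'Imagery';     recall = imagery;  output;
   cond = 'Intentional'; recall = intent;   output;
   keep age cond recall;      /* keep only the long-format variables */
   datalines;
old 9 7 11 12 10
young 8 10 14 20 21
;
run;

proc print data=recall2;
run;
```

Each of the two input lines produces five observations, so recall2 contains ten observations with one factor level and one response value each.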

9

Data Input from an External File

• Filename fn 'complete directory/file specification' ;
  – e.g., filename eysdata 'c:/Stat6337/EysenckRecall.dat' ;
  – Be Careful with Spaces in Directories and File Names !!!
• Data fname ;
  – creates temporary file with the data that are described in the data step
• Infile fn ;
  – input the data from the file labeled fn
• Input name . . . name $ . . . ;
  – lists the variable names (1 – 8 characters/letters), name is assumed to be a quantitative variable
  – name MUST be followed by $ if name is a character variable
• Run ;
  – indicates that the data step is completed
• Example: lecture7class.sas
  – Data Recall
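Put together, these statements might be used as in the sketch below. It assumes that EysenckRecall.dat sits in c:/Stat6337, contains only space-separated data lines laid out like Eysenck's data table (age group followed by five counts), and has no header line; the variable names are assumptions.

```sas
filename eysdata 'c:/Stat6337/EysenckRecall.dat';

data recall;
   infile eysdata;            /* read from the file labeled eysdata */
   input age $ counting rhyming adjectiv imagery intent;
run;

proc print data=recall;
run;
```

If the file did begin with a header line, the FIRSTOBS=2 option on the Infile statement would skip it.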

10

Program Data Vector

• One line of data is stored, as indicated on the Input statement of the Data Step
• Any calculations, deletions, etc. in the Data Step are performed on that line of data
• When the Data Step statements have been executed for that line, the variables in the Program Data Vector are output to a temporary (work) file
• Can force data lines to be written at any time with the Output statement
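One way to watch the Program Data Vector is to write it to the Log on every pass through the Data Step. The sketch below assumes a hypothetical data set named pdvdemo and uses the first two Drug-Free body lengths from the cocaine-usage table later in this lecture.

```sas
data pdvdemo;                 /* hypothetical data set name */
   input caseno bodylen;
   inches = bodylen / 2.54;   /* calculation on the current line only */
   put _all_;                 /* writes the current PDV contents to the Log */
   datalines;
1 44.3
2 45.3
;
run;
```

No Output statement appears here, so each line is written to pdvdemo automatically once the last statement has been executed for it. When an explicit Output statement is present, that automatic write no longer happens.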

11

Operations in the Data Step

• Arithmetic Operations
  – x = u + v ;
• Transformations
  – x = log(y) ;
• Logical
  – If x > 0 then z = y/x ;
• Recoding
  – If gender = 'm' then gender = 'Male';
    else if gender = 'f' then gender = 'Female';
  – Note: SAS sets the length of a character variable from the first value assigned to it
  – To force a length (e.g., character variable), use length
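The sketch below applies each kind of operation to two rows of Eysenck's data. All new variable names (total, logimg, ratio, agelab) are assumptions made for illustration; the Length statement shows how to avoid truncating a recoded value.

```sas
data recode;                  /* hypothetical data set name */
   length agelab $ 5;         /* without this, agelab would get length 3 from 'Old' */
   input age $ counting rhyming adjectiv imagery intent;
   total  = counting + rhyming;                      /* arithmetic */
   logimg = log(imagery);                            /* transformation (natural log) */
   if counting > 0 then ratio = imagery / counting;  /* logical */
   if age = 'old' then agelab = 'Old';               /* recoding */
   else if age = 'young' then agelab = 'Young';
   datalines;
old 9 7 11 12 10
young 8 10 14 20 21
;
run;

proc print data=recode;
run;
```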

12

Titles and Labels

• Title# '…' ;
  – Up to 10 title lines: title# 'include your title here';
  – Can be placed in Data Steps or Procs
  – Changing Title# replaces that title and eliminates Titlex, where x > #
• Label name = '…' ;
  – Can be in a Data Step or Proc Print
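A brief sketch of titles and labels, using two rows of Eysenck's data. The title text, label text, and variable names are assumptions chosen for illustration.

```sas
data recall;                  /* hypothetical data set name */
   input age $ counting rhyming adjectiv imagery intent;
   label adjectiv = 'Adjective'
         intent   = 'Intentional';
   datalines;
old 9 7 11 12 10
young 8 10 14 20 21
;
run;

title1 'Eysenck Recall Data';
title2 'All recall conditions';
proc print data=recall label;      /* LABEL option: print labels as column headings */
run;

title2 'Counting condition only';  /* replaces title2; title1 is kept */
proc print data=recall;
   var age counting;
run;
```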

13

Some Useful PROCs

• Proc Chart
  – vertical or horizontal bar charts
• Proc Freq
  – frequency distributions, cross tabs
• Proc Means
  – select summary statistics
• Proc Plot
  – scatterplots
• Proc Print
  – prints data files
• Proc Sort
  – sorts data files by the values of one or more variables
• Proc Univariate
  – a wide range of summary statistics, box plots

14

General Form of PROCs

PROC xxxx data=fname options;
   by groups;
   proc-specific statements;
   title . . . ;
   output out = fn . . . ;
run ;
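As one concrete instance of this general form, the sketch below runs Proc Means on four rows of Eysenck's data and saves the statistics in a new data set. The names stats, mcount, mimag, scount, and simag are assumptions; the By statement is left out because it requires sorted data.

```sas
data recall;                  /* hypothetical data set name */
   input age $ counting rhyming adjectiv imagery intent;
   datalines;
old 9 7 11 12 10
old 8 9 13 11 19
young 8 10 14 20 21
young 6 7 11 16 19
;
run;

proc means data=recall mean std maxdec=2;    /* options */
   var counting imagery;                     /* proc-specific statement */
   title 'Counting and Imagery Recall';
   output out=stats mean=mcount mimag std=scount simag;
run;

proc print data=stats;        /* the saved statistics, one line */
run;
```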

15

Printing to the Output File

• Proc Print data = fname ;
  – var . . . ; lists the variables to be printed (can be omitted)
  – run ; indicates the print commands are complete

16

Group Analyses

• Sort the Groups
  – Proc Sort data= … ;
  – by group;
  – run;
• Execute the Proc, by Group
  – Proc xxx data= … ;
  – by group;
  – . . .
  – run;
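The sort-then-BY pattern might look like this for Eysenck's data, with age as the grouping variable. The four rows are taken from the data table but entered out of group order on purpose, so that the sort has something to do; the variable names are assumptions.

```sas
data recall;                  /* hypothetical data set name */
   input age $ counting rhyming adjectiv imagery intent;
   datalines;
young 8 10 14 20 21
old 9 7 11 12 10
young 6 7 11 16 19
old 8 9 13 11 19
;
run;

proc sort data=recall;        /* BY processing needs sorted data */
   by age;
run;

proc means data=recall;       /* separate summaries for old and young */
   by age;
   var counting rhyming;
run;
```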

17

Summarize the Recall Data

• Calculate frequencies for each condition/group and each age
  – Proc Freq
• Graph a histogram of the recall data
  – Proc Chart
• Calculate the average, standard deviation, minimum, and maximum to 2 decimal places
  – Proc Means
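These three summaries could be sketched as follows. The code is an illustration under stated assumptions rather than the contents of lecture7class.sas: it uses only two rows of Eysenck's data, first rearranged to one response per line, and the names long, cond, and recall are hypothetical.

```sas
data long;                    /* hypothetical data set name */
   length cond $ 11;
   input age $ counting rhyming adjectiv imagery intent;
   cond = 'Counting';    recall = counting; output;
   cond = 'Rhyming';     recall = rhyming;  output;
   cond = 'Adjective';   recall = adjectiv; output;
   cond = 'Imagery';     recall = imagery;  output;
   cond = 'Intentional'; recall = intent;   output;
   keep age cond recall;
   datalines;
old 9 7 11 12 10
young 8 10 14 20 21
;
run;

proc freq data=long;          /* frequencies by age and by condition */
   tables age cond;
run;

proc chart data=long;         /* histogram of the recall values */
   vbar recall;
run;

proc means data=long mean std min max maxdec=2;
   var recall;
run;
```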

18

Summarize the Recall Data

• Calculate descriptive statistics for each condition/group
  – Proc Means, Proc Univariate
  – Note: Sort First, then Use the BY Command.
• Graph Average Recall for All Combinations of Recall Condition/Group and Age
  – Use a Group Identifier as the Plotting Symbol
  – Proc Plot

19

Proc Anova

• Only for Complete Factorial Experiments in Completely Randomized Designs
  – Otherwise: Proc GLM
• MUST have an Equal Number of Repeats for Each Factor-Level Combination

20

Proc Anova

• Proc Anova data = fn ;
  – By … ;
    » Separate ANOVA Fits for Each Value of the BY variable(s).
  – Class … ;
    » List all the factors.
  – Model … / options;
    » e.g., model recall = age group age*group ;
      • factors: list individually; e.g., age group
      • interactions: connect with asterisk(s); e.g., age*group
  – Means … / options;
    » e.g., means age group age*group / t bon;
  – Run;
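A two-factor fit along these lines is sketched below, using two old and two young rows of Eysenck's data so that every age-by-condition combination has two repeats. The data set and variable names (long, cond, recall) are assumptions, and cond plays the role that group plays in the example above.

```sas
data long;                    /* hypothetical data set name */
   length cond $ 11;
   input age $ counting rhyming adjectiv imagery intent;
   cond = 'Counting';    recall = counting; output;
   cond = 'Rhyming';     recall = rhyming;  output;
   cond = 'Adjective';   recall = adjectiv; output;
   cond = 'Imagery';     recall = imagery;  output;
   cond = 'Intentional'; recall = intent;   output;
   keep age cond recall;
   datalines;
old 9 7 11 12 10
old 8 9 13 11 19
young 8 10 14 20 21
young 6 7 11 16 19
;
run;

proc anova data=long;
   class age cond;                      /* list all the factors */
   model recall = age cond age*cond;    /* main effects and interaction */
   means age cond / t bon;              /* factor-level averages with comparisons */
run;
quit;                                   /* Proc Anova stays active until quit */
```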

21

Eysenck’s Study of Incidental Learning

Make analysis of variance calculations, using only recall condition as a factor.

Calculate factor-level averages, with the t option.

22

Effect of Cocaine Usage on Newborn Infant Body Lengths

Research Question: Do Mean Body Lengths (cm) Differ by Cocaine Usage?

Usage Groups: First Trimester, Throughout Pregnancy, Drug-Free

23

Effect of Cocaine Usage on Newborn Infant Body Lengths

           First       Throughout
Case       Trimester   Pregnancy    Drug-Free
1          45.1        40.2         44.3
2          45.7        41.3         45.3
3          45.8        41.7         46.9
4          46.7        41.9         47.0
5          47.3        43.4         47.2

Average    46.12       41.70        46.14

24

Assignment

• Create a Data File
• Input the Data File into a SAS Program
• Cocaine Usage Groups
  – Calculate Averages and Standard Deviations
  – Make Comparative Box Plots
  – Test the Equality of the Group Means
• Email Me ONLY the FINAL .log File