Getting Start SAS Class (1)

  • 8/12/2019 Getting Start SAS Class (1)

    How to start using SAS

    The topics

    An overview of the SAS system

    Reading raw data and creating SAS data sets

    Combining SAS data sets and match-merging SAS data sets

    Formatting data

    Introduction to some simple regression procedures

    Summary report procedures

    Basic Screen Navigation

    Main windows:

    Editor: contains the SAS program to be submitted.

    Log: contains information about the processing of the SAS program, including any warning and error messages.

    Output: contains reports generated by SAS procedures and DATA steps.

    Side windows:

    Explorer: navigate to other objects, such as libraries.

    Results: navigate your Output window.

    SAS programs

    A SAS program is a sequence of steps that the user submits for execution.

    DATA steps are typically used to create SAS data sets.

    PROC steps are typically used to process SAS data sets (that is, to generate reports and graphs, edit data, sort data, and analyze data).
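
    As a minimal sketch of the two kinds of steps, the program below uses one DATA step and one PROC step. The data set name scores, the variables, and the in-line data values are hypothetical and serve only as an illustration.

```sas
/* DATA step: create a temporary data set named scores from in-line data */
DATA scores;
    INPUT Name $ Score;
    DATALINES;
Ann 90
Bob 85
;
RUN;

/* PROC step: print a listing report of the data set */
PROC PRINT DATA=scores;
RUN;
```

    Submitting the program runs the steps in order: the DATA step builds the data set, and the PROC step reports on it.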

    SAS Data Libraries

    A SAS data library is a collection of SAS files that are recognized as a unit by SAS.

    A SAS data set is one type of SAS file stored in a data library.

    The Work library is a temporary library: when SAS is closed, all data sets in the Work library are deleted. To create a permanent SAS data set, store it in your own library.
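
    The difference between a temporary and a permanent data set can be sketched as follows. The libref mylib, the folder path, and the data set names are assumptions chosen for illustration; the folder must already exist on your system.

```sas
/* Hypothetical folder; replace it with a directory that exists on your system */
LIBNAME mylib 'C:\temp\sas class\mydata';

/* One-level name: stored in the Work library and deleted when SAS closes */
DATA scratch;
    x = 1;
RUN;

/* Two-level name (libref.name): stored in mylib and kept after SAS closes */
DATA mylib.saved;
    SET scratch;
RUN;
```

    A one-level name such as scratch is shorthand for work.scratch.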

    SAS Data Libraries

    Identify SAS data libraries by assigning each a library reference name (libref) with the LIBNAME statement:

    LIBNAME libref 'file-folder-location';

    Example: LIBNAME readData 'C:\temp\sas class\readData';

    Rules for naming a libref:

    The name must be 8 characters or less.

    The name must begin with a letter or underscore.

    The remaining characters must be letters, numbers, or underscores.

    Reading raw data set into SAS system

    In order to create a SAS data set from a raw data file, you must

    Start a DATA step and name the SAS data set being created (DATA statement)

    Identify the location of the raw data file to read (INFILE statement)

    Describe how to read the data fields from the raw data file (INPUT statement)

    Reading external raw data file into SAS system

    LIBNAME readData 'C:\temp\sas class\readData';
    DATA readData.wa80;
        INFILE 'k:\census\stf2_wa80.txt';
        INPUT @10 SUMRYLVL $2. @40 COUNTY $3.
              @253 TABA1 9.0 @271 TABA1 9.0;
    RUN;

    The LIBNAME statement assigns the libref readData to a data library.
    The DATA statement creates a permanent SAS data set named wa80.
    The INFILE statement points to a raw data file.
    The INPUT statement
    - names the SAS variables
    - identifies the variables as character or numeric ($ indicates character data)
    - specifies the locations of the fields in the raw data
    - can be specified as column, formatted, list, or named input
    The RUN statement marks the end of a step.

    Example 1

    Reading raw data separated by spaces

    /* Create a permanent SAS data set named HighLow1;
       read the data file temperature1.dat using list input */
    DATA readData.HighLow1;
        INFILE 'C:\sas class\readData\temperature1.dat';
        INPUT City $ State $ NormalHigh NormalLow
              RecordHigh RecordLow;
    RUN;

    /* The PROC PRINT step creates a listing report of the
       readData.HighLow1 data set */
    PROC PRINT DATA=readData.highlow1;
        TITLE 'High and Low Temperatures for July';
    RUN;

    temperature1.dat:

    Nome AK 55 44 88 29
    Miami FL 90 75 97 65
    Raleigh NC 88 68 105 50

    Example 2

    Reading multiple lines of raw data per observation

    /* Read the data file using the line pointers slash (/) and pound-n (#n).
       The slash (/) means go to the next line; #n means go to the nth line
       for that observation. The slash (/) can be replaced by #2 here. */
    DATA readData.highlow2;
        INFILE 'C:\sas class\readData\temperature2.dat';
        INPUT City $ State $
              / NormalHigh NormalLow
              #3 RecordHigh RecordLow;
    RUN;

    PROC PRINT DATA=readData.highlow2;
        TITLE 'High and Low Temperatures for July';
    RUN;

    temperature2.dat:

    Nome AK
    55 44
    88 29
    Miami FL
    90 75
    97 65
    Raleigh NC
    88 68
    105 50

    Example 3

    Reading multiple observations per line of raw data

    /* To read multiple observations per line of raw data, use double
       trailing at signs (@@) at the end of the INPUT statement */
    DATA readData.highlow3;
        INFILE 'C:\sas class\readData\temperature3.dat';
        INPUT City $ State $ NormalHigh NormalLow RecordHigh RecordLow @@;
    RUN;

    PROC PRINT DATA=readData.highlow3;
        TITLE 'High and Low Temperatures for July';
    RUN;

    temperature3.dat:

    Nome AK 55 44 88 29 Miami FL 90 75 97 65 Raleigh NC 88
    68 105 50

    Reading external raw data file into SAS system

    Reading raw data arranged in columns (column input):

    INPUT FILEID $ 1-5 RECTYP $ 6-9 SUMRYLVL $ 10-11
          URBARURL $ 12-13 SMSACOM $ 14-15;

    Reading raw data with mixed column and formatted input:

    INPUT FILEID $ 1-5 @10 SUMRYLVL $2. @253 TABA1 9.0
          @271 TABA1 9.0;

    /* @n is the column pointer, where n is the number of the column
       SAS should move to. $w. reads standard character data, and
       w.d reads standard numeric data, where w is the total width and
       d is the number of decimal places. */
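
    To try column and formatted input without an external file, the sketch below reads in-line data instead. The data set name demo, the variables LEVEL and AMOUNT, and the data values are hypothetical; the data lines must start in column 1 so that the column positions match.

```sas
DATA work.demo;
    /* Column input: FILEID comes from columns 1-5.
       Formatted input: @7 moves to column 7 and $2. reads two characters;
       @10 moves to column 10 and 7.2 reads a numeric field seven columns wide. */
    INPUT FILEID $ 1-5 @7 LEVEL $2. @10 AMOUNT 7.2;
    DATALINES;
STF2A 04 1234.50
STF2B 05 0099.25
;
RUN;

PROC PRINT DATA=work.demo;
RUN;
```

    Because the values already contain a decimal point, the d part of the w.d informat does not rescale them.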

    Reading Delimited or PC Database Files with the IMPORT Procedure

    If your data file has the proper extension, use the simplest form of the IMPORT procedure:

    PROC IMPORT DATAFILE='filename' OUT=data-set;

    Type of File                           Extension           DBMS Identifier
    Comma-delimited                        .csv                CSV
    Tab-delimited                          .txt                TAB
    Excel                                  .xls                EXCEL
    Lotus files                            .wk1, .wk3, .wk4    WK1, WK3, WK4
    Delimiters other than commas or tabs                       DLM

    Examples:

    1. PROC IMPORT DATAFILE='c:\temp\sale.csv' OUT=readData.money; RUN;

    2. PROC IMPORT DATAFILE='c:\temp\bands.xls' OUT=readData.music; RUN;

    Reading Files with the IMPORT Procedure

    If your file does not have the proper extension, or your file uses delimiters other than commas or tabs, then you must use the DBMS= option and the DELIMITER= statement:

    PROC IMPORT DATAFILE='filename' OUT=data-set DBMS=identifier;
        DELIMITER='delimiter-character';
    RUN;

    Example:

    PROC IMPORT DATAFILE='C:\sas class\readData\import2.txt'
                OUT=readData.sasfile DBMS=DLM;
        DELIMITER='&';
    RUN;

    Format in SAS data set

    Standard formats (selected):
    Character: $w.
    Date, time, and datetime: DATEw., MMDDYYw., TIMEw.d
    Numeric: COMMAw.d, DOLLARw.d

    Use the FORMAT statement:

    PROC PRINT DATA=sales;
        VAR Name DateReturned CandyType Profit;
        FORMAT DateReturned DATE9. Profit DOLLAR6.2;
    RUN;

    Format in SAS data set

    Create your own custom formats in two steps:
    1. Create the format using PROC FORMAT and the VALUE statement.
    2. Assign the format to the variable using the FORMAT statement.

    General form of a simple PROC FORMAT step:

    PROC FORMAT;
        VALUE name range-1 = 'formatted-text-1'
                   range-2 = 'formatted-text-2';
    RUN;

    The name in the VALUE statement is the name of the format you are creating. It cannot be longer than eight characters and must not start or end with a number. If the format is for character data, the name must start with a $.
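
    The two steps can be sketched end to end as follows. The format names yesno and $region, the data set survey, and all values are hypothetical and chosen only to show one numeric and one character format.

```sas
/* Step 1: create the formats */
PROC FORMAT;
    /* Numeric format: maps codes to labels */
    VALUE yesno 1 = 'Yes'
                0 = 'No';
    /* Character format: the name starts with $ */
    VALUE $region 'E' = 'East'
                  'W' = 'West';
RUN;

/* Hypothetical in-line data to apply the formats to */
DATA survey;
    INPUT Answer Area $;
    DATALINES;
1 E
0 W
;
RUN;

/* Step 2: assign the formats with a FORMAT statement.
   Note the period after each format name. */
PROC PRINT DATA=survey;
    FORMAT Answer yesno. Area $region.;
RUN;
```

    The stored values are unchanged; the formats only control how the values are displayed.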

    Format in SAS data set

    Example:

    /* Step 1: Create the format for certain variables */
    PROC FORMAT;
        VALUE genFmt 1 = 'Male'
                     2 = 'Female';
        VALUE money
            low-

    Format in SAS data set

    Permanently store formats in a SAS catalog by
    - creating a format catalog file with the LIB= option in the PROC FORMAT statement
    - setting the format search option (FMTSEARCH=)

    Example:

    LIBNAME fmtData 'C:\sas class\Format';
    OPTIONS FMTSEARCH=(fmtData.fmtvalue);

    PROC FORMAT LIB=fmtData.fmtvalue;
        VALUE genFmt 1 = 'Male' 2 = 'Female';
    RUN;

    Combining SAS Data Sets: Concatenating and Interleaving

    Use the SET statement in a DATA step to concatenate SAS data sets.

    Use the SET and BY statements in a DATA step to interleave SAS data sets.

    Combining SAS Data Sets: Concatenating and Interleaving

    General form of a DATA step concatenation:

    DATA SAS-data-set;
        SET SAS-data-set1 SAS-data-set2;
    RUN;

    Example:

    DATA stack.allEmp;
        SET stack.emp1 stack.emp2 stack.emp3;
    RUN;

    Combining SAS Data Sets: Concatenating and Interleaving

    General form of a DATA step interleave:

    DATA SAS-data-set;
        SET SAS-data-set1 SAS-data-set2;
        BY BY-variable;
    RUN;

    Sort all SAS data sets by the BY variable first, using PROC SORT.

    Example:

    PROC SORT DATA=stack.emp2 OUT=stack.emp2_sorted;
        BY Salary;
    RUN;

    DATA stack.allEmp;
        SET stack.emp1 stack.emp2_sorted stack.emp3;
        BY Salary;
    RUN;

    Match-Merging SAS Data Sets

    One-to-one match merge

    One-to-many match merge

    Many-to-many match merge

    The SAS statements for all three types of match merge are identical and have the following form:

    DATA new-data-set;
        MERGE data-set-1 data-set-2 data-set-3;
        BY by-variable(s);  /* indicates the variable(s) that control
                               which observations to match */
    RUN;

    Merging SAS Data Sets: A More Complex Example

    Example: Merge two data sets to acquire the names of the group team that is scheduled to fly next week.

    combData.employee

    EmpID   LastName
    E00632  Strauss
    E01483  Lee
    E01996  Nick
    E04064  Waschk

    combData.groupsched

    EmpID   FlightNum
    E04064  5105
    E00632  5250
    E01996  5501

    /* To match-merge the data sets by the common variable EmpID,
       the data sets must be ordered by EmpID */
    PROC SORT DATA=combData.groupsched;
        BY EmpID;
    RUN;

    Merging SAS Data Sets: A More Complex Example

    /* Simply merge the two data sets */
    DATA combData.nextweek;
        MERGE combData.employee combData.groupsched;
        BY EmpID;
    RUN;

    EmpID   LastName  FlightNum
    E00632  Strauss   5250
    E01483  Lee
    E01996  Nick      5501
    E04064  Waschk    5105

    Merging SAS Data Sets: A More Complex Example

    Eliminating Nonmatches

    Use the IN= data set option to determine which data set(s) contributed to the current observation.

    General form of the IN= data set option:

    SAS-data-set (IN=variable)

    variable is a temporary numeric variable that has two possible values:

    0 indicates that the data set did not contribute to the current observation.

    1 indicates that the data set did contribute to the current observation.

    Merging SAS Data Sets: A More Complex Example

    /* Keep only the employees who are scheduled to fly next week;
       exclude employees who are not scheduled. */
    LIBNAME combData 'K:\sas class\merge';

    DATA combData.nextweek;
        MERGE combData.employee
              combData.groupsched (IN=InSched);
        BY EmpID;
        IF InSched=1;   /* keep the observation when InSched is true */
    RUN;

    EmpID   LastName  FlightNum
    E00632  Strauss   5250
    E01996  Nick      5501
    E04064  Waschk    5105

    Merging SAS Data Sets: A More Complex Example

    /* Find employees who are not in the scheduled flight group. */
    LIBNAME combData 'K:\sas class\merge';

    DATA combData.nextweek;
        MERGE combData.employee (IN=InEmp)
              combData.groupsched (IN=InSched);
        BY EmpID;
        IF InEmp=1;     /* InEmp is true: the employee data set contributed */
        IF InSched=0;   /* InSched is false: the schedule data set did not contribute */
    RUN;

    EmpID   LastName  FlightNum
    E01483  Lee

    Different Types of Merges in SAS

    One-to-Many Merging

    DATA work.three;
        MERGE work.one work.two;
        BY X;
    RUN;

    Work.one

    X  Y
    1  A
    2  B
    3  C

    Work.two

    X  Z
    1  A1
    1  A2
    2  B1
    3  C1
    3  C2

    Work.three

    X  Y  Z
    1  A  A1
    1  A  A2
    2  B  B1
    3  C  C1
    3  C  C2
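
    The one-to-many merge on this slide can be reproduced with in-line data. The sketch below assumes that X is numeric and that Y and Z are character variables; the slide does not state the variable types.

```sas
/* The "one" side: one observation per value of X */
DATA work.one;
    INPUT X Y $;
    DATALINES;
1 A
2 B
3 C
;
RUN;

/* The "many" side: several observations can share a value of X */
DATA work.two;
    INPUT X Z $;
    DATALINES;
1 A1
1 A2
2 B1
3 C1
3 C2
;
RUN;

/* Both data sets are already in order by X, so no PROC SORT is needed */
DATA work.three;
    MERGE work.one work.two;
    BY X;
RUN;

PROC PRINT DATA=work.three;
RUN;
```

    Each observation in work.two picks up the Y value from the work.one observation with the same X.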

    Different Types of Merges in SAS

    Many-to-Many Merging

    DATA work.three;
        MERGE work.one work.two;
        BY X;
    RUN;

    Work.one

    X  Y
    1  A1
    1  A2
    2  B1
    2  B2

    Work.two

    X  Z
    1  AA1
    1  AA2
    1  AA3
    2  BB1
    2  BB2

    Work.three

    X  Y   Z
    1  A1  AA1
    1  A2  AA2
    1  A2  AA3
    2  B1  BB1
    2  B2  BB2

    Some simple regression analysis procedures

    The REG Procedure

    The LOGISTIC Procedure

    The REG procedure

    The REG procedure is one of many regression procedures in the SAS System.

    The REG procedure allows several MODEL statements and gives additional regression diagnostics, especially for detection of collinearity. It also creates plots of model summary statistics and regression diagnostics.

    PROC REG;
        MODEL dependents=independents;
        PLOT;
    RUN;

    An example

    PROC REG DATA=water;
        MODEL Water = Temperature Days Persons / VIF;
        MODEL Water = Temperature Production Days / VIF;
    RUN;

    PROC REG DATA=water;
        MODEL Water = Temperature Production Days;
        PLOT STUDENT.*PREDICTED.;
        PLOT STUDENT.*NPP.;
        PLOT NPP.*r.;
        PLOT r.*NQQ.;
    RUN;

    The LOGISTIC procedure

    Binary or ordinal responses with continuous independent variables:

    PROC LOGISTIC < options >;
        MODEL dependents=independents < / options >;
    RUN;

    Binary or ordinal responses with categorical independent variables:

    PROC LOGISTIC < options >;
        CLASS categorical-variables < / option >;
        MODEL dependents=independents < / options >;
    RUN;

    Example

    PROC LOGISTIC DATA=Neuralgia;
        CLASS Treatment Sex;
        MODEL Pain = Treatment Sex Treatment*Sex Age Duration;
    RUN;

    Overview: Summary Report Procedures

    PROC FREQ: produces frequency counts

    PROC TABULATE: produces one- and two-dimensional tabular reports

    PROC REPORT: produces flexible detail and summary reports

    The FREQ Procedure

    The FREQ procedure displays frequency counts of the data values in a SAS data set.

    General form of a simple PROC FREQ step:

    PROC FREQ DATA=SAS-data-set;
        TABLES SAS-variables;
    RUN;

    The FREQ Procedure

    Example:

    PROC FREQ DATA=class.crew;
        FORMAT JobCode $codefmt. Salary money.;
        TABLES JobCode*Salary / NOCOL NOROW OUT=freqTable;
    RUN;

    The TABULATE Procedure

    PROC TABULATE displays descriptive statistics in tabular format.

    General form of a simple PROC TABULATE step:

    PROC TABULATE DATA=SAS-data-set;
        CLASS class-variables;
        VAR analysis-variables;
        TABLE row-expression, column-expression;
    RUN;

    The TABULATE Procedure

    Example:

    TITLE 'Average Salary for Cary and Frankfurt';

    PROC TABULATE DATA=class.crew FORMAT=dollar12.;
        WHERE Location IN ('Cary','Frankfurt');
        CLASS Location JobCode;
        VAR Salary;
        TABLE JobCode, Location*Salary*mean;
    RUN;

    The REPORT procedure

    The REPORT procedure combines features of the PRINT, MEANS, and TABULATE procedures.

    It enables you to
    - create listing reports
    - create summary reports
    - enhance reports
    - request separate subtotals and grand totals

    The REPORT procedure

    Example:

    PROC REPORT DATA=class.crew NOWD HEADLINE HEADSKIP;
        COLUMN JobCode Location Salary;
        DEFINE JobCode / GROUP WIDTH=8 'Job Code';
        DEFINE Location / GROUP 'Home Base';
        DEFINE Salary / FORMAT=dollar10. 'Average Salary' MEAN;
        RBREAK AFTER / SUMMARIZE DOL;
    RUN;