99
SAS Environment and Concepts of Libraries SAS Training

SAS Training Day1

Embed Size (px)

DESCRIPTION

SAS Training Day1

Citation preview

  • SAS Environment and Concepts of Libraries

    SAS Training

  • Statistical Analysis System (SAS)

    Is a set of solutions for enterprise-wide business users for performing:

    Data Entry, Retrieval and Management

    Report writing and graphics

    Statistical and Mathematical Analysis

    Business planning, Forecasting and Decision support

    Operations research and Project management

    Quality improvement

    Applications development

  • The core of the SAS system is base SAS software, which consists of:

    SAS Language

    SAS Procedures

    SAS Macros

    Data Step debugger

    ODS

    Windowing Environment

  • The basic components of SAS language are:

    SAS Files

    Data Step

    Procedure Step

    SAS Informats

    SAS Formats

    Variables

    Functions

    Statements

    Miscellaneous(SAS Programs, Outputs, Log and Errors )

  • SAS GUI

    SAS Programming Environment Contains 6 Main Windows:

    1. Project Designer: Shows the Process Flow of a Project in Flow charts

    2. Project Explorer: Shows the Process Flow of a Project as Drop Down Menu

    3. Code Editor: Used to write and Edit codes

    4. Server List: Show the Physical Storage Locations of Data

    5. Log Window: Information about the execution of a program and Lists the errors while execution

    6. Output Window: Displays the output of execution of a program

  • SAS Programs

    SAS programs can be used to access, manage, analyze, or present your data

    Layout for SAS Programs

    SAS statements are in free format. This means that:

    they can begin and end anywhere on a line

    one statement can continue over several lines

    several statements can be on a line.

    SAS statements can be specified in uppercase or lowercase

    In most situations, text that is enclosed in quotation marks is case sensitive

  • SAS Libraries

    Every SAS file is stored in a SAS library

    SAS Library is a collection of SAS files

    A SAS data library is the highest level of

    organization for information within SAS

    In the Windows and UNIX environments, a

    library is typically a group of SAS files in the

    same folder or directory.

  • Storing Files Temporarily or Permanently:

    There are two types of libraries in SAS

    Temporary library

    Permanent library

    Depending on the library name that is used when create a file, we can store SAS files

    temporarily or permanently

  • Temporary Library:

    Its Temporary Storage Location of a data file

    They last only for the current SAS session

    Work is the temporary library in SAS

    When the session ends, the data files in the

    temporary library are deleted

    The file is stored in Work, when:

    No specific library name is used while

    creating a file

    Specify the library name as Work

  • Permanent Library:

    Its the Permanent storage location of data files

    Permanent SAS libraries are available in subsequent SAS sessions

    Permanent SAS data libraries are stored until delete them

    To store files permanently in a SAS data library:

    Specify a library name Other than the default library name Work

    Three Permanent Libraries provided by SAS are:

    Local

    SASuser

    SAShelp

  • Creating a Permanent Library:

    To create a permanent library use libname statement

    It creates a reference to the path where SAS files are stored

    The LIBNAME statement is global, which means that the librefs remain in effect until modify them , cancel them, or end your SAS session

    The LIBNAME statement assigns the libref for the current SAS session only

    Assign a libref to each permanent SAS data library each time a SAS session starts

    SAS no longer has access to the files in the library, once the libref is deleted or SAS session is ended.

    Contents of Permanent library exists in the path specified

  • Syntax :

    libname ;

    where,

    libref is the name of the library to be created

    It can be 1 to 8 characters long

    Begins with a letter or underscore

    Contains only letters, numbers, or underscores

    path is location in memory to store the SAS files

  • Example:

    libname Taxes C:\Documents and Settings\admin\Desktop\Training ;

    Here,

    Taxes - A library reference name

    libname - This keyword assigns the libref taxes to the folder called

    training in the path:

    C:\Documents and Settings\admin\Desktop\Training

    Path should be specified in single code

  • Data lib1.emp;

    Length name$ 12;

    Input id name$ doj sal;

    Informat doj mmddyy8. sal dollar7.;

    Format doj date9. sal dollar7.;

    Label id = Employee Id name = Employee Name doj = Date of Joining

    Sal = Salary;

    Cards;

    1076 abcasdayut 12/23/05 $10,000

    1983 aaaertgr 07/12/98 $40,000

    1723 xyzasdsf 04/15/98 $25,000

    ;

    Run;

  • SAS Data Sets

    SAS Data Set is a SAS file which holds Data

    Data must be in the form of a SAS data set to be processed

    Many of the data processing tasks access data in the form of a SAS data set and analyze, manage, or present the data

    A SAS data set also points to one or more indexes, which enable SAS to locate records in the data set more efficiently

    Rules for SAS Data Set Names:

    SAS data set names :

    can be 1 to 32 characters long

    must begin with a letter (AZ, either uppercase or lowercase) or an underscore (_) can continue with any combination of numbers, letters, or underscores.

    These are examples of valid data set names:

    Payroll

    LABDATA1995_1997

    _EstimatedTaxPayments3

  • SAS data set consists of two parts:

    Descriptor portion

    Data portion

  • Descriptor Portion:

    The descriptor portion of a SAS data set contains information about the data set, including:

    The name of the data set

    The date and time that the data set was created

    The number of observations

    The number of variables.

    Example: Descriptor portion of the data set Clinic.Insure

    Data Set Name: CLINIC.INSURE

    Member Type: DATA

    Engine: V8

    Created: 10:05 Tuesday, March 30, 1999

    Observations: 21

    Variables: 7

    Indexes: 0

    Observation Length: 64

  • Data Portion:

    Collection of data values that are arranged in a rectangular table

    Example:

    Here,

    Jones is a data value, the weight 158.3 is a data value, and so on

  • Observations:

    Rows are called observations in SAS

    It is a Collections of data values that usually relate to a single object in SAS Data Sets

    The values Jones, M, 48, and 128.6 constitute a single observation in the data set shown

    below

  • Variables:

    Columns are called variables in SAS

    It is a collection of values that describe a particular characteristic

    The values Jones, Laverne, Jaffe and Wilson contribute the variable Name in the data set

    shown below

  • Missing Values:

    If a data is unknown for a particular observation, a missing value is recorded

    . (called period) indicates missing value of a numeric variable

    (blank) indicates missing value of a character variable

  • Variable Attributes:

    In addition to general information about the data set, the descriptor portion contains information about the attributes of each variable in the data set

    The attribute information includes the variable's:

    Name Type Length Format Informat Label

    Example: Listing of the attribute information in the descriptor portion of the SAS data set Clinic.Insure

    Variable Type Length Format Informat Label

    Policy Num 8 Policy Number

    Total Num 8 DOLLAR8.2 COMMA10. Total Balance

    Name Char 20 Patient Name

  • Name:

    Each variable has a name that conforms to SAS naming conventions

    Variable names follow exactly the same rules as SAS data set names

    Like data set names, variable names:

    Can be 1 to 32 characters long

    Must begin with a letter (AZ, either uppercase or lowercase) or an underscore (_)

    Can continue with any combination of numbers, letters, or underscores.

  • Type:

    A variable's type is either character or numeric

    Character variables, such as Name (shown below), can contain any values

    Numeric variables, such as Policy and Total (shown below), can contain only numeric values

    (the digits 0 through 9, +, -, ., and E for scientific notation)

  • Length:

    A variable's length (the number of bytes used to store it) is related to its type

    Character variables can be up to 32,767 bytes long

    In the example below, Name has a length of 20 characters and uses 20 bytes of storage.

    All numeric variables have a default length of 8

    Numeric values (no matter how many digits they contain) are stored as floating-point numbers in 8 bytes of storage, unless specify a different length.

  • Format:

    A Format is an instruction that SAS uses

    to write data values

    Format is used to control the written

    appearance of data values, or in some

    cases, to group data values together for

    analysis

    SAS software offers a variety of

    character, numeric, and date and time

    formats

    Formats can be created and stored

    Can permanently assign a format to a

    variable in a SAS data set, or can

    temporarily specify a format in a PROC

    step to determine the way the data values

    appear in the output

  • Informat:

    Used to Read data values in certain formats

    into standard SAS values

    It determines how data values are read into

    a SAS data set

    Informats are used to read numeric values

    that contain letters or other special

    characters

  • Label:

    A variable can have a label consisting of descriptive text up to 256 characters long

    By default, many reports identify variables by their names

    To display more descriptive information about the variable assign a label to that variable

    Example:

    Label Policy as Policy Number, Total as Total Balance, and Name as Patient Name to

    display these labels in reports

  • Referencing Permanent SAS Files

    Two-Level Names:

    Two-level name are used to reference a permanent SAS file in SAS programs

    There are two parts in a Two-Level Name:

    1. Libref name

    2. Filename

    Libref Is the name of the SAS data library that contains the file

    Filename Is the name of the file itself

    A period separates the libref and filename

  • Example:

    Clinic.Admit is the two-level name for the SAS data set Admit

    Admit is assigned to the library named Clinic

  • Referencing Temporary SAS Files

    To reference temporary SAS files specify the default libref Work, a period, and the filename

    Example:

    Here,

    The two-level name Work.Test references the SAS data set named Test that is stored in the

    temporary SAS library Work

  • One-Level name

    One-level name (the filename only) can be used to reference a file in a temporary SAS library

    When a one-level name is used, the default libref Work is assumed

    Example:

    Here,

    The one-level name Test also references the SAS data set named Test that is stored in the temporary SAS library Work.

  • Components of SAS Programs

    SAS Programs contains only two steps:

    Data Step

    Proc Step

    A SAS Program may contain:

    A DATA step

    A PROC step

    Combination of DATA and PROC step

  • Data Step:

    Typically create or modify SAS data sets and they can also be used to produce custom-designed

    reports

    DATA steps are used to:

    Put data into a SAS data set

    Compute values

    Check for and correct errors in data

    Produce new SAS data sets by subsetting, merging, and updating existing data sets

  • Proc Step:

    They pre-written routines that enable us to analyze and process the data in a SAS data set and to present the data in the form of a report

    PROC steps sometimes create new SAS data sets that contain the results of the procedure

    PROC steps can list, sort, and summarize data

    PROC steps are used to:

    Create a report that lists the data

    Produce descriptive statistics

    Create a summary report

    Produce plots and charts

  • Importing Data for creating SAS Datasets

    SAS Data step concepts:

    DATA steps typically create or modify SAS data sets

    Can also be used to produce custom-designed reports.

    SAS DATA steps can be used to:

    put data into a SAS data set

    compute values

    check for and correct errors in your data

    produce new SAS data sets by subsetting, merging, and updating existing data sets.

  • A SAS data set can be created by:

    Entering data as input

    Reading existing raw data

    Accessing external files (files that were created by other software)

    The fig below shows how to design and write a DATA step program to create a SAS data set

    from raw data that is stored in an external file

  • Data step:

    Data & Set Statements:

    Data & Set statements are used to create a data set

    Syntax:

    DATA ;

    SET ;

    Where,

    dataset1 is the Destination Data Set

    dataset2 is the Source Data Set

  • Reading Instream Data using Cards and Datalines

    Data can be entered into SAS data set directly through SAS program

    Reading instream data is useful when to create data and test programming statements on a few observations

    To read instream data use:

    DATALINES statement as the last statement in the DATA step (except for the RUN statement) and immediately preceding the data lines

    a null statement (a single semicolon) to indicate the end of the input data

    Only one DATALINES statement can be used in a DATA step

    Use separate DATA steps to enter multiple sets of data

    If the data contains semicolons, use the DATALINES4 statement plus a null statement that consists of four semicolons (;;;;) to indicate the end of the input data

  • Syntax:

    DATA ;

    INPUT [$] [$] ;DATALINES;

    .

    .

    data lines go here

    .

    .

    ;

    run ;

    After the DATALINES statement specify the data values

    After typing in the values give a semicolon to indicate the end of the data values.

    Can also use Cards instead of datalines

  • Example:

    Data emp_details ;

    Input id name$ age ;

    Datalines ;

    2458 Murray, W 42

    2462 Almers, C 38

    2501 Bonaventure, T 48

    2523 Johnson, R 39

    2539 LaMance, K 45

    2544 Jones, M 49

    ;

    run ;

    Here,

    A dataset called emp_details is created with variables id, name & age, and having 6

    observations

    Name is a character variable which is indicated by $ sign after name

  • Importing Different File types

    SAS GUI can be used to import different file types data such as:

    Excel File

    Comma separated Files (CSV)

  • Importing Files using PROC IMPORT

    Proc import procedure step can be used to import an external file of different file types

    Syntax:

    proc import datafile = External file path out= dbms= replace;

    delimiter= special character ; getnames= ; datarow= n ;

    Where,

    External file path is the path of the external file to import

    Out= specifies the dataset to be created using the imported file

    dbms specifies the file type to be imported or dlm if delimited files are imported

    replace replaces already existing files

    getnames=yes tells SAS to read the variable names from the first line of the data file

    delimiter= specifies the delimiter in the external file. It is specified only when the dbms= dlm is specified

    datarow =n specifies the row from which the data has to read from the external file. Where, n is a number

  • Importing a comma separated file (.csv) :

    Example 1:

    Comma separated file is a special external file with file extension .csv (comma separated variables)

    proc import datafile="comma.csv" out= mydata dbms=csv replace;

    getnames=no;

    run;

    Here,

    A comma separated file called comma.csv is imported

    A new dataset called mydata is created

    getnames=no indicates that the first row in the file is not variable names

    replace indicated SAS to replace the existing file mydata

  • Example 2:

    Another way of reading a comma delimited file is to consider a comma as an ordinary delimiter

    Here is a program that shows how to use the dbms=dlm and delimiter=","

    proc import datafile="comma1.txt" out=mydata dbms=dlm replace;

    Delimiter =", ;

    Getnames =yes ;

    Datarow =5 ;

    Run ;

    Here,

    comma1.txt is a comma separated text file whose variable values are separated by commas

    dbms=dlm indicates that comma1.txt is a delimiter file

    delimiter=, indicates the delimiter as ,

    Datarow=5 tell SAS to read data from the 5th row

  • Import from Tab- Delimitated files (TXT File):

    Example:

    proc import datafile ="tab.txt" out=mydata dbms=tab replace ;

    getnames=no ;

    Run ;

    Here,

    tab.txt is a tab separated text file

    dbms=tab indicates tab.txt as tab separated file

  • Data Understanding

    Proc Contents Step:

    The CONTENTS procedure is used to create SAS output that describes either of the following:

    The contents of a library

    The descriptor information for an individual SAS data set

    Describes the structure of the data set rather than the data values

    Displays valuable information at the...

    Data set level

    Name

    Engine

    Creation date

    Number of observations

    Number of variables

    File size (bytes)

  • Variable level

    Name

    Type

    Length

    Formats

    Position

    Label

  • Syntax:

    Proc Contents Data = libref . _ALL_ NODETAILS; Run;

    Where,

    libref is the libref that has been assigned to the SAS library.

    _ALL_ requests a listing of all files in the library

    A period (.) is used to append _ALL_ to the libref

    NODETAILS (NODS) suppresses the printing of detailed information about each file when _ALL_ is specified.

    Specify NODS only when you specify _ALL_

  • Example:

    To view the contents of the Mylib library, submit the following PROC CONTENTS step:

    proc contents data = mylib ._all_ nods ;

    run ;

    The output from this step lists only the names, types, sizes, and modification dates for the SAS files in the Mylib library

    To view the descriptor information for the Mylib.Admit data set, submit the following PROC CONTENTS step:

    proc contents data = mylib .admit ;

    run ;

    The output from this step lists information for Mylib.Admit data set, including an alphabetic list of the variables in the data set

  • Proc Print:

    Prints a listing of the values of some or all of the variables in a SAS data set

    Syntax:

    proc print data = libref .Datasetname [ (firstobs = n obs = n) split = Special Character double label n noobs ] ;

    [ Id Variable list ;

    Var Variable list ;

    By Variable list ;

    Sum Varibale list

    ]

    Run ;

    Where,

    [ ] are optionals

    Libref is the library in which Datasetname is the dataset whose values are to be printed

  • Firstobs indicates the starting number of observation to be printed

    Obs indicates the ending number of observation to be printed

    Drop indicates the variables to be dropped

    Keep indicates the variables to be keep

    Split ='split character' - splits labels as column headings across multiple lines where split character appears

    Double - double spaces the printed output

    Label - uses variable labels as column headings (variable name is default heading)

    N - Lists no: of observations in the specified datasets

    Noobs - suppresses the observation number in the output.

  • Id -Identify observations by the formatted values of the variables which can be listed instead of

    observation numbers

    Var -Select variables that appear in the report and determine their order

    By - Produce a separate section of the report for each BY group

    Sum - Total values of numeric variables

  • Example:

    proc print data = candy_products (firstobs=1 obs=16 ) n noobs double label ;

    id Prodid ;

    var Prodid Product Category Retail_price ;

    by Category ;

    Sum Retail_price ;

    Run ;

    Here,

    Candy_products is the dataset which is present in work library

    First observation to 16thobservation are printed (firstobs=1 and obs=16)

    N gives the number of observation

    Double - Double spacing between observations printed (only in list input)

    Label - Prints the label of each variable instead of variable names

    Id - Prodid becomes the row identifier instead of observation no:

    Var - Only the variables indicated here are printed

    By - The outputs are grouped by category

    Sum - Sum of the Retail_price

  • Data Transfer from one library to another

    We can create a new data set from an existing SAS data set

    To create the new data set, read a data set using the DATA step and use the programming features of the DATA step to manipulate data

    Store the manipulated data to new data set or the same which will overwrite the existing data

    Syntax 1:

    Data SAS-data-set;

    Set SAS-data-set; Run;

    where ,

    SAS-data-set in the DATA statement is the name (libref.filename) of the SAS data set to be created (Destination Data Set)

    SAS-data-set in the SET statement is the name (libref.filename) of the SAS data set to be read (Source Data Set)

  • Example:

    libname lab23 c : \ drug\ allergy \ labtests ;

    libname research c : \ drug \ allergy ;

    data lab23.drug1h ;

    set research.cltrials ;

    Run ;

    Where

    Lab23 and research are two libraries which are created in two different locations

    The DATA statement creates the permanent SAS data set Drug1H

    Drug1H will be stored in a SAS data library to which the libref Lab23 has been

    assigned

    The SET statement below reads the permanent SAS data set Research.CLTrials.

  • Syntax 2: Data Transfer from one library to another using Proc Copy

    proc copy in = libref1 out = libref2 ;

    [ select Ds1 Ds2 . . . ; ]

    run ;

    Where,

    Libref1 is the library from which the data sets are to copied

    Libref2 is the library to which the data sets are to be copied

    Select is an option which selects the data sets Ds1, Ds2, etc form libref1 to libref2

    If Select is not used, all the data sets from libref1 is copied to libref2

  • Example:

    proc copy in = clinic out = work ;

    select admit ;

    run ;

    Here,

    Data Set admit is copied from clinic libref to temporary library work

  • Manipulating data during data transfers

    Some of the options for manipulating data are:

    Firstobs

    Obs

    Label

    Rename

    Delete

    Drop

    Keep

    by group

    point= option

    Output

    END= option

  • Firstobs & Obs Data Set Options:

    Firstobs and Obs options are used to select a range of observations from a data set

    It can be used in both Data step and proc step

    When used in Data step the selected observation remain in memory

    When used in proc print step the output displays the selected observations

    Firstobs specifies the starting no: of the observations to be selected

    Obs specifies the ending no: of the observations to be selected

    Firstobs and Obs can be used together to select a range of observations

    If only Firstobs is specified, observations from that position to the end of file are selected

    If only Obs is specified, observations from first to the specified no: are selected

  • Syntax:

    data SAS-Data-Set;

    Set SAS-Data-Set (firstobs = n obs = n);

    run;

    or

    data SAS-Data-Set (firstobs = n obs = n);

    Set SAS-Data-Set;

    run;Where,

    SAS-Data-Set in Data Step is the Destination Data set

    SAS-Data-Set in Set Step is the Source Data set

    N ;- Any numeric value

    Firstobs specifies the observation to start with

    Obs specifies the last observation

    Firstobs and Obs options can be used both in Data Step or Set Step

  • Example:

    data candy_products;

    set local.candy_products (firstobs=10 obs=100);

    run;

    Here,

    91 observations are copied from candy_products in local library to candy_products in

    work library

  • Label & Rename Statements:

    Label is a descriptive text given to a variable

    It can be up to 256 characters long

    Label can be assigned temporarily in proc step or permanently using data step

    Label assigned in data step remains in memory and will be shown when the data set is printed

    using proc print step

    Rename statement is used to rename a variable in the data set

    Rename statement in data step will permanently rename the variable in the data set

  • Syntax:

    Data libname .dataset-name ;

    Set libname .dataset-name ;

    Label Variable-Name = ;

    Rename Variable-Name = ;

    Run;

    or

    proc print data= libname . Dataset-name Label;

    Label Variable-Name = ;

    Run;

    Where,

    Variable Label is assigned to Variable specified by Variable-Name in the Label Statement

    New Variable Name is assigned to the Variable specified by Variable-Name in the Rename Statement

    Label in Data step will write the new label in memory for that variable and will be displayed when

    Label in proc step will only be displayed when that block of proc step is being executed

    Label option should be specified in proc when using label statement in proc step

  • Example:

    Data demo.class;

    Set demo.class ;

    Label sizehh = Size of household;Rename sizehh = sizehouse;

    Run;

    proc print data = demo1.class1 Label;

    label sizehh = Size of Household;run;

    Here,

    Size of household is assigned as label for the variable Sizehh in Data step

    Sizehh variable is renamed as Sizehouse in Data step

    Size of household label is assigned for the variable Sizehh temporarily using proc step which is effective only when that block of code is executed

    Rename Statement can be used only in Data step as it is data modification

  • DROP= and KEEP= Data Set Options:

    Drop= and Keep= options in data step can be used to drop and keep variables in that data set

    Drop=, omits all variables specified after it

    Keep=, keeps all variables specified after it

    Use the KEEP= option instead of the DROP= option if more variables are dropped than kept

    Specify drop and keep options in parentheses after a SAS data set name

    Syntax:

    (DROP = variable(s)) (KEEP = variable(s))

    where ,

    the DROP= or KEEP= option, in parentheses, follows the name of the data set that contains the variables to be dropped or kept

    variable(s) identifies the variables to drop or keep

  • Example:

    1. Timemin and Timesec are dropped from the data set clinic.stress

    data clinic.stress (drop= timemin timesec);

    Set clinic.stress;

    Run;

    2. Timemin and Timesec are Kept in the data set clinic.stress

    data clinic.stress (Keep= timemin timesec);

    Set clinic.stress;

    Run;

  • Drop and Keep Statements:

    Another way to exclude variables from data set is to use the DROP statement or the KEEP statement

    Like the DROP= and KEEP= data set options, these statements drop or keep variables

    The DROP statement differs from the DROP= data set option in the following ways:

    Cannot use the DROP statement in SAS procedure steps

    The DROP statement applies to all output data sets that are named in the DATA statement.

    To exclude variables from some data sets but not from others, place the appropriate DROP= data set option next to each data set name that is specified in the DATA statement.

    The KEEP statement is similar to the DROP statement, except that the KEEP statement specifies a list of variables to write to output data sets

    Use the KEEP statement instead of the DROP statement if the number of variables to keep is significantly smaller than the number to drop

  • Syntax:

    DROP variable(s);

    KEEP variable(s);

    Where,

    variable(s) identifies the variables to drop or keep

    Example:

    data clinic.stress;

    Set clinic.stress;

    drop timemin timesec;

    Run;

    Here,

    Drop statement omits variables timemin and timesec

  • Data Modifications using conditional statements

    Conditional Statement:- Where:

    Where statement can be used to select observations during proc step and data step

    There can be only one WHERE statement in a step

    Syntax:

    Where where-expression;

    Where,

    where-expression specifies a condition for selecting observations

    The where-expression can be any valid SAS expression

    The WHERE statement works for both character and numeric variables

    WHERE statement is observation level

  • To specify a condition based on the value of a character variable:

    enclose the value in quotation marks

    write the value with lowercase and uppercase letters exactly as it appears in the data

    set

    Following comparison operators can be used to express a condition in the WHERE statement:

    Symbol Meaning Example

    = or eq equal to where name='Jones, C.';

    ^= or ne not equal to where temp ne 212;

    > or gt greater than where income>20000;

    < or lt less than where partno lt "BG05";

    >= or ge greater than or equal to where id>='1543';

  • Contains operator in Where:

    The CONTAINS operator selects observations that include the specified substring.

    The mnemonic equivalent for the CONTAINS operator is ?

    Example:

    where firstname CONTAINS 'Jon';

    where firstname ? 'Jon';

    Here,

    Firstname is the variable name and Jon is the value

  • Compound WHERE Expressions:

    WHERE statements can be used to select observations that meet multiple conditions

    To link a sequence of expressions into compound expressions, use logical operators, including

    the following:

    Operator Meaning

    AND or & and, both. If both expressions are true, then the compound

    expression is true.

    OR or | or, either. If either expression is true, then the compound

    expression is true.

  • Example:

    1. Where with proc step

    proc print data = clinic.admit;

    var age height weight fee;

    where age > 30;

    run;

    2. Where with data step

    data clinic.admit;

    set clinic.admit;

    where age >30 and pulse >55;

    run;

    3. Some examples using logical operators:

    where ID>1050 and state='NC';

    where actlevel = 'LOW' or actlevel = 'MOD';

    where actlevel in ('LOW','MOD');

    where fee in (124.80,178.20);

    where (age75) or area='A';

  • Conditional Statement:- IF Then Else:

    The IF-THEN statement executes a SAS statement when the condition in the IF clause is true

    comparison and Logical operators can be used in IF conditional expression

    Any numeric value other than 0 or missing is true, and a value of 0 or missing is false

    Syntax:

    IF expression THEN statement;

    [

    else IF expression THEN statement;

    .

    .

    else statement;

    ]

    Where,

    expression is any valid SAS expression

    statement is any executable SAS statement

  • Example:

    Data clinic.stress;

    Set clinic.stress;

    if totaltime > 800 then TestLength = 'Long';

    else if 750

  • Deleting Unwanted Observations: Delete option

    If Then statement along with Delete option can be used to select observations in a data set and

    delete

    Syntax:

    IF expression THEN DELETE;

    If the expression is:

    true, the DELETE statement executes, and control returns to the top of the DATA step

    (the observation is deleted).

    false, the DELETE statement does not execute, and processing continues with the next

    statement in the DATA step

  • Example:

    Data clinic.stress;

    Set clinic.stress;

    if resthr < 70 then delete;

    Run;

    Here,

    The IF-THEN and DELETE statements below omit any observations whose values for

    RestHR are lower than 70

  • Assigning Values Conditionally Using SELECT Groups:

    Use IF-THEN/ELSE statements or SELECT groups based on the following criteria.:

    When a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is more efficient than using a series of IF-THEN or IF-THEN/ELSE statements because CPU time is reduced

    SELECT groups also make the program easier to read and debug.

    For programs with few conditions, use IF-THEN/ELSE statements

  • Syntax:

    SELECT ; WHEN-1 (when-expression-1 ) statement;WHEN-n (when-expression-1 ) statement; END;

    Where,

    SELECT begins a SELECT group

    The optional select-expression specifies any SAS expression that evaluates to a single value.

    WHEN identifies SAS statements that are executed when a particular condition is true.

    When-expression specifies any SAS expression, including a compound expression

    Must specify at least one when-expression

    Statement is any executable SAS statement.

    The optional OTHERWISE statement specifies a statement to be executed if no WHEN condition is met.

    END ends a SELECT group

  • Example:

    data emps (keep=salary group);

    set sasuser.payrollmaster;

    length Group $ 20;

    select (jobcode);

    when ("FA1") group="Flight Attendant I";

    when ("FA2") group="Flight Attendant II";

    when ("FA3") group="Flight Attendant III";

    when ("ME1") group="Mechanic I";

    when ("ME2") group="Mechanic II";

    when ("ME3") group="Mechanic III";

    when ("NA1") group="Navigator I";

    when ("NA2") group="Navigator II";

    when ("NA3") group="Navigator III";

    when ("TA1","TA2","TA3") group="Ticket Agents";

    otherwise group="Other";

    end;

    run;

    The SELECT group assigns values to the variable Group based on values of the variable JobCode

  • Appending Data Sets

    It is concatenation of two data sets which are already existing.

    The observation in each data set will stack together according to the order specified to form new

    data set

    Appends the observations from one data set to another data set

  • Syntax:

    DATA output-SAS-data-set;

    SET SAS-data-set-1 SAS-data-set-2;

    RUN;

    Where,

    output-SAS-data-set names the data set to be created

    SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read

    SAS-data-set-1 and SAS-data-set-2 gets appended and copies to output-SAS-data-set

  • Example:

    Data combined;

    Set A C;

    Run;

  • Appending Data Sets Using Proc Step

    Adding observations using append procedure

    The base file gets appended with observations from data file.

    No new data set is created

    Works only if the base file is having all the variables in the data file, otherwise use force option

  • Syntax:

    Proc Append base = data = [force];

    Run;

    Where,

    SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read

    SAS-data-set-2 gets appended to SAS-data-set-1999

    Force is an optional keyword, used when base file is having some variables missing

    compared to data file, to force SAS to append

  • Example:

    Proc Append base = A data = C;

    Run;

  • Merging

    A merge combines observations from two or more SAS data sets based on the values of

    specified common variables (one or more)

    It creates a new data set (the merged data set)

    Merging is done in a data step with the statements

    MERGE : to name the input data sets

    BY : to name the common variable(s) to be used for matching

    Prerequisites for a match-merge

    input data sets must have a common variable

    input data sets must be sorted by the common variable(s)

    It is also called "match-merge."

  • Syntax:

    DATA output-SAS-data-set;MERGE SAS-data-set-1 SAS-data-set-2;BY variable(s);

    RUN;

    Where,

    output-SAS-data-set names the data set to be created

    SAS-data-set-1 and SAS-data-set-2 specify the data sets to be read

    variable(s) in the BY statement specifies one or more variables whose values are used to match observations

    DESCENDING indicates that the input data sets are sorted in descending order by the variable that is specified

    If there are more than one variable in the BY statement, DESCENDING applies only to the variable that immediately follows it

    Each input data set in the MERGE statement must be sorted in order of the values of the BY variable(s)

    Each BY variable must have the same type in all data sets to be merged

  • Sorting of Data Set:

    Procedure sort can be used to sort the data sets either ascending or descending

    Syntax:

    Proc Sort Data = Data-Set-1 [out = Data-Set-2];

    By [Descending] Variabel1 [Variable2 ];

    Run;

    Here,

    Data-Set-1 will be sorted in either ascending or descending order

    If OUT= option is specified then a Data-Set-1 will be copied to Data-Set-2 and will get sorted there but the original data set (Data-Set-1) remains un sorted.

    By statement will sort the data set according to the variables specified

    Descending option will sort the data set in descending order by the variable just proceeding that.

  • Example:

    During match-merging SAS sequentially checks each observation of each data set to see whether

    the BY values match, then writes the combined observation to the new data set

    data merged;

    merge a b;

    by num;

    run;

  • Example: Sample Data Sets:

    1. Clinic.Demog

    proc sort data=clinic.demog;

    by id;

    run;

    proc print data=clinic.demog;

    Obs ID Age Sex Date

    1 A001 21 m 05/22/75

    2 A002 32 m 06/15/63

    3 A003 24 f 08/17/72

    4 A004 . 03/27/69

    5 A005 44 f 02/24/52

    6 A007 39 m 11/11/57

  • 2. Clinic.Visit

    proc sort data=clinic.visit;

    by id;

    run;

    proc print data=clinic.visit;

    run;

    Obs ID Visit SysBP DiasBP Weight Date

    1 A001 1 140 85 195 11/05/98

    2 A001 2 138 90 198 10/13/98

    3 A001 3 145 95 200 07/04/98

    4 A002 1 121 75 168 04/14/98

    5 A003 1 118 68 125 08/12/98

    6 A003 2 112 65 123 08/21/98

    7 A004 1 143 86 204 03/30/98

    8 A005 1 132 76 174 02/27/98

    9 A005 2 132 78 175 07/11/98

    10 A005 3 134 78 176 04/16/98

    11 A008 1 126 80 182 05/22/98

  • Example: Merging

    data clinic.merged;

    merge clinic.demog clinic.visit;

    by id;

    run;

    Obs ID Age Sex Date Visit SysBP DiasBP Weight

    1 A001 21 m 11/05/98 1 140 85 195

    2 A001 21 m 10/13/98 2 138 90 198

    3 A001 21 M 07/04/98 3 145 95 200

    4 A002 32 M 04/14/98 1 121 75 168

    5 A003 24 f 08/12/98 1 118 68 125

    6 A003 24 f 08/21/98 2 112 65 123

    7 A004 . 03/30/98 1 143 86 204

    8 A005 44 f 02/27/98 1 132 76 174

    9 A005 44 f 07/11/98 2 132 78 175

    10 A005 44 f 04/16/98 3 134 78 176

    11 A007 39 m 11/11/57 . . .

    12 A008 . 05/22/98 1 126 80 182

  • Excluding Unmatched Observations:

    By default, DATA step match-merging combines all observations in all input data sets

    To exclude unmatched observations from output data set, use the IN= data set option and the

    subsetting IF statement in DATA step.

    In this case, use

    the IN= data set option to create and name a variable that indicates whether the data set

    contributed data to the current observation

    the subsetting IF statement to check the IN= values and to write to the merged data set only those

    observations that appear in the data sets for which IN= is specified

  • Syntax:

    (IN=variable)

    Where,

    the IN= option, in parentheses, follows the data set name

    variable names the variable to be created

    Within the DATA step, the value of the variable is 1 if the data set contributed data to

    the current observation. Otherwise, its value is 0.

  • Example:

    To Match-merge the data sets Clinic.Demog and Clinic.Visit and select only observations that

    appear in both data sets :

    Use IN= to create two temporary variables, indemog and invisit

    The first IN= creates the temporary variable indemog, which is set to 1 when an observation from Clinic.Demog contributes to the current observation; otherwise, it is set to 0

    Likewise, the value of invisit depends on whether Clinic.Visit contributes to an observation or not

    IF statement is used to select only observations that appear in both Clinic.Demog and Clinic.Visit

    If the condition is met, the new observation is written to Clinic.Merged. Otherwise, the observation is deleted

    data clinic.merged;

    merge clinic.demog (in= indemog) clinic.visit (in=invisit);

    by id;

    if indemog=1 and invisit=1;

    run;

    proc print data=clinic.merged;

    run;

  • Output:

    Obs ID Age Sex BirthDate Visit SysBP DiasBP Weigh

    t

    VisitDate

    1 A001 21 m 05/22/75 1 140 85 195 11/05/98

    2 A001 21 m 05/22/75 2 138 90 198 10/13/98

    3 A001 21 m 05/22/75 3 145 95 200 07/04/98

    4 A002 32 m 06/15/63 1 121 75 168 04/14/98

    5 A003 24 f 08/17/72 1 118 68 125 08/12/98

    6 A003 24 f 08/17/72 2 112 65 123 08/21/98

    7 A004 . 03/27/69 1 143 86 204 03/30/98

    8 A005 44 f 02/24/52 1 132 76 174 02/27/98

    9 A005 44 f 02/24/52 2 132 78 175 07/11/98

    10 A005 44 f 02/24/52 3 134 78 176 04/16/98

  • Different Types Of Merge

    Join Condition Description

    Inner Join No condition Includes all the observations from both the dataset

    Right Inner Join If Y = 1 Includes all the observations from right dataset

    Left Inner Join If X = 1 Includes all the observations from left dataset

    Exact Join If X = 1 and Y = 1 Includes all the matching observations from both datasets

    Outer Join If X = 0 or Y = 0 Includes all the non matching observations from both datasets

    Right Outer Join If X = 0 and Y = 1 Includes all the non matching observations from right dataset

    Left Outer Join If X = 1 and Y = 0 Includes all the non matching observations from left dataset