SAS Programming TIPS to Be Used in Development


  • 8/8/2019 SAS Programming TIPS to Be Used in Development


    TIPS to be Used in Development:

    1.0 Dynamic concatenation of transposed variables

    The TRANSPOSE procedure in SAS turns multiple row values of a column into columns, and can also turn the values of multiple columns into multiple rows of a single column. The number of variables (columns) created in the output data set equals the largest number of observations (rows) in any BY group of the variable being transposed. If neither the PREFIX= option nor an ID statement is specified, the transposed variables are named Col1, Col2 and so on. For example, if a BY group has 4 observations, transposing produces 4 variables, Col1, Col2, Col3 and Col4, for each value of the BY variables (labparam and lab_id in the example below).

    In certain situations the columns obtained by transposing need to be concatenated. This can be done simply by reading the data set with a SET statement and joining the variables with the concatenation operator (||):

    e.g. Newvar = Col1||','||Col2||','||Col3||','||Col4;

    However, this may not give the desired result if any of the four columns has missing values. For example, if Col3 and Col4 are missing, Newvar ends with consecutive commas.
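    A minimal sketch of the problem, assuming one hypothetical row in which Col3 and Col4 are missing. TRIMN is used so that padding blanks do not hide the effect. CATX (SAS 9 and later) is shown only as an alternative that skips missing arguments together with their delimiters; it is not the method used in this tip.

```sas
/* Hypothetical row: Col3 and Col4 are missing */
data concat_demo;
  length col1-col4 $10 newvar newvar_catx $60;
  col1 = 'A';
  col2 = 'B';
  col3 = ' ';
  col4 = ' ';
  /* || keeps every delimiter, so empty columns leave commas behind. */
  /* TRIMN returns a zero-length string for a blank argument.        */
  newvar = trimn(col1)||','||trimn(col2)||','||trimn(col3)||','||trimn(col4);
  /* CATX strips blanks and ignores missing arguments and delimiters */
  newvar_catx = catx(',', of col1-col4);
run;

proc print data=concat_demo noobs;
  var newvar newvar_catx;   /* newvar = A,B,,   newvar_catx = A,B */
run;
```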

    The concatenation is therefore done by first determining the maximum number of Col variables in the transposed data set, and then concatenating only the non-missing variables to create the new variable for each combination of BY values.

    Consider the following example from a clinical trial analysis, in which laboratory values are collected at different laboratories and the reference ranges for a parameter differ from one laboratory to another. Such a dataset is shown below.

    When descriptive statistics are produced for the lab parameters, the lab ranges are to be presented in the report as a sub-header of the following form:

    e.g. Param1

    Lab-A: 0-35 g/L (Age 0-99)

    Lab-B: 0-34 g/L (Age 0-18), 0-35 g/L (Age 18-60), 0-36 g/L (Age 60-99)
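    As a minimal, hypothetical stand-in for the sample dataset, the RANGE dataset below is built from the Param1 sub-header example above. The variable lengths and the '|' delimiter are assumptions made so that ranges containing blanks can be read with list input.

```sas
/* Hypothetical RANGE dataset: one row per reference range */
data range;
  length labparam $8 lab_id $5 range $50;
  infile datalines dlm='|';      /* only '|' separates fields, not blanks */
  input labparam $ lab_id $ range $;
  datalines;
Param1|Lab-A|0-35 g/L (Age 0-99)
Param1|Lab-B|0-34 g/L (Age 0-18)
Param1|Lab-B|0-35 g/L (Age 18-60)
Param1|Lab-B|0-36 g/L (Age 60-99)
;
run;
```

    Run through the steps below, this data should give coln=3 and sub-header text of the form shown above.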


    The number of ranges differs between laboratories, so the ranges must be concatenated dynamically for each lab and each parameter. The steps involved are given below.

    1. Transpose the dataset containing the lab ranges.

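    *Transposing the data to get the reference ranges horizontally;
    *PROC TRANSPOSE with a BY statement needs the data sorted by the BY variables;
    proc sort data=range;
    by labparam lab_id;
    run;
    proc transpose data=range out=range_trn;
    by labparam lab_id;
    var range;
    run;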

    2. Use the sashelp.vcolumn view to find the number of Col variables in the transposed dataset, and store that maximum in a macro variable named coln.

    *Creating a macro variable with maximum number of columns;
    data _null_;
    set sashelp.vcolumn end=last;
    where trim(libname) = 'WORK' and trim(memname) = 'RANGE_TRN' and upcase(substr(name,1,3)) = 'COL';
    *With WHERE, _N_ counts only the qualifying COL variables;
    if last then call symput('coln', trim(left(put(_n_,6.))));
    run;

    3. Read the transposed dataset and define an array over the Col variables. Concatenate the non-missing ranges in a DO loop, then remove the leading comma by taking a substring of the concatenated variable. Finally, create the variable labrange by prefixing the result with lab_id.

    data range_con;
    length convar $400;
    array cols {&coln.} $50 col1-col&coln.;
    set range_trn;
    *Append each non-missing range, preceded by a comma;
    do i=1 to dim(cols);
    if not missing(cols[i]) then convar = trim(convar)||', '||cols[i];
    end;
    *Drop the leading comma, then prefix the lab identifier;
    convar = substr(left(convar),2);
    Labrange = trim(left(lab_id))||':'||compbl(convar);
    keep labparam lab_id labrange;
    run;

    The dataset range_con contains the concatenated labrange variable.

    *Transposing the dataset to present the ranges for different lab_id values in different variables;
    *With VALIDVARNAME=V7, an ID value such as Lab-A becomes the variable name Lab_A;
    proc transpose data=range_con out=lab_rng(drop=_name_);
    id lab_id;
    by labparam;
    var labrange;
    run;

    The output dataset obtained from this transpose has the following form:

    Merge this dataset with the descriptive statistics dataset, and use labparam and the variables containing the concatenated lab ranges (here LAB_A and LAB_B) as BY variables in PROC REPORT. This creates a report containing the descriptive statistics for one parameter per page. Using #BYVAL in the TITLE statement then gives sub-headers of the following type:

    Param1

    Lab-A: 0-35 g/L (Age 0-99)

    Lab-B: 0-34 g/L (Age 0-18), 0-35 g/L (Age 18-60), 0-36 g/L (Age 60-99)
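    A minimal sketch of this reporting step. LAB_RNG and STATS below are hypothetical stand-ins (the statistics values are illustrative only), and the statistics variables are named stat_n, stat_mean and stat_sd because N and MEAN are statistic keywords in PROC REPORT. NOBYLINE suppresses the default BY line so that the BY values appear only in the titles.

```sas
/* Hypothetical stand-in for LAB_RNG as produced by the transpose above */
data lab_rng;
  length labparam $8 lab_a lab_b $100;
  labparam = 'Param1';
  lab_a = 'Lab-A: 0-35 g/L (Age 0-99)';
  lab_b = 'Lab-B: 0-34 g/L (Age 0-18), 0-35 g/L (Age 18-60), 0-36 g/L (Age 60-99)';
run;

/* Hypothetical descriptive statistics, one row per visit */
data stats;
  length labparam $8 visit $10;
  input labparam $ visit $ stat_n stat_mean stat_sd;
  datalines;
Param1 Baseline 20 31.2 2.4
Param1 Week4 19 30.8 2.9
;
run;

/* Attach the range text to every statistics row of the parameter */
data report_in;
  merge stats lab_rng;
  by labparam;
run;

options nobyline;
title1 '#byval(labparam)';
title2 '#byval(lab_a)';
title3 '#byval(lab_b)';

proc report data=report_in nowd;
  by labparam lab_a lab_b;
  column visit stat_n stat_mean stat_sd;
  define visit     / display 'Visit';
  define stat_n    / display 'n';
  define stat_mean / display 'Mean' format=8.1;
  define stat_sd   / display 'SD'   format=8.1;
run;

title;
options byline;   /* restore the default BY line */
```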

    2.0 Project-specific SAS shortcuts

    As a SAS programmer, an important concept to know when opening and saving SAS programs is the SasInitialFolder: the folder in which SAS reads or stores files when no specific drive and path is given. Its path appears in a small panel in the lower right-hand corner of the SAS session window. It can be changed at any time by double-clicking that panel to open a dialog box, then either typing the drive and path of the desired folder or browsing to it as you would when searching for a file. However, this change is not permanent: the next time SAS starts, the current folder reverts to the permanently set default. In SAS v8 for Windows, the default folder may be something like C:\Documents and Settings\\My Documents\My SAS Files\V8, or a path on the drive where SAS is installed (e.g. E:\SAS Installation). Whenever a program is opened or saved, this default folder is shown, and navigating from it to the desired folder takes valuable time.

    Project-specific SAS shortcuts save this time: each shortcut opens and saves SAS programs in the SAS programs folder set for its project. The steps to set the current folder to a new location are given below. The folder to be used must be created before going through these steps.

    1. Create a shortcut to SAS: click the Start menu, select SAS from All Programs, right-click it and select Send To > Desktop. A new SAS shortcut icon is created on the desktop.

    2. Right-click the icon, select Rename and give it a name related to the project.

    3. Right-click the icon again, select Properties and click the Shortcut tab.

    4. In the Target box, press the End key to move to the far right of the existing text. Type one space and enter the following command exactly as shown: -sasInitialFolder="."

    5. The current (working) SAS folder will be the path specified in the 'Start in' field, located just below the 'Target' field. Enter the drive and path of the desired folder in that box, e.g. G:\Project1\SAS programs. This folder must already exist.

    6. Click Apply and then OK.

    The shortcut is now ready for use for that project, and File > Open and File > Save As will open the specified folder.


    3.0 INDEX, INDEXC and INDEXW functions

    The INDEX, INDEXC and INDEXW functions search a character value for matches, which makes them useful in conditional tests. INDEX and INDEXC indicate whether a string of characters is present in a target variable. Both return the position of the match in the target variable; 0 means the search argument is not present.

    The INDEX function searches for a specific string, such as a letter, a group of letters or special characters, and the search is case sensitive. The syntax is:

    INDEX (source, excerpt)

    The INDEXC function accepts multiple excerpt arguments and returns the position of the first occurrence of any character from any of them; otherwise it works like the INDEX function. The syntax is INDEXC (source, excerpt-1 <, ... excerpt-n>).

    [Example 1 and its results: not available]
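    A minimal illustration of INDEX and INDEXC on a made-up string (this is not the original Example 1). The expected positions are noted in the comments.

```sas
data idx_demo;
  length text $40;
  text = 'Blood Pressure: 120/80';
  pos_index  = index(text, 'Pressure');      /* 7: match starts at column 7     */
  pos_case   = index(text, 'pressure');      /* 0: INDEX is case sensitive      */
  pos_indexc = indexc(text, '/', ':');       /* 15: first ':' or '/' is the ':' */
  pos_digit  = indexc(text, '0123456789');   /* 17: first digit of "120"        */
run;

proc print data=idx_demo noobs;
run;
```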

    The INDEXW function searches source from left to right for the first occurrence of excerpt as a word and returns the position in source of the word's first character. If the word is not found in source, INDEXW returns 0. If there are multiple occurrences, INDEXW returns only the position of the first one.

    Like INDEX, INDEXW is case sensitive, and it performs exactly the same function with one significant exception: INDEXW searches for strings that are words, whereas INDEX finds patterns whether they appear as separate words or as parts of other words.

    [Example 2 and its results: not available]

    Result explanation

    The above program demonstrates the difference between the INDEX and INDEXW functions. In the first observation in the table above, INDEX returns 1 because the letters "the", as part of the word "there", begin the string. Because INDEXW requires the word to be delimited by blanks or by the beginning or end of the string, it returns 12, the position of the word "the" in the string. Observation 3 shows that a punctuation mark does not serve as a word separator. Finally, since the string "the" does not appear anywhere in the fourth observation, both functions return 0.

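    A small sketch in the same spirit, using made-up strings rather than the original data. With the default blank delimiter, INDEXW skips "the" inside "there" and "the.end", while INDEX reports the first occurrence anywhere.

```sas
data idxw_demo;
  length text $40;
  do text = 'there is the dog', 'the.end of the line', 'nothing here';
    pos_index  = index(text, 'the');    /* any occurrence, even inside a word */
    pos_indexw = indexw(text, 'the');   /* only "the" delimited by blanks     */
    output;
  end;
run;

proc print data=idxw_demo noobs;
  /* Expected: 1 and 10, then 1 and 12, then 0 and 0 */
run;
```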
    4.0 Read from multiple external files in one data step by using FILEVAR= option

    Prepared by Jose Abraham

    External files are usually read into SAS one at a time, with a separate DATA step for each file. However, multiple external files with the same structure can be read in a single DATA step by using the FILEVAR= and END= options in the INFILE statement. The following example shows how to read multiple external files whose locations are stored in another external file.

    Suppose demographic information for subjects from three different centers is stored in three external files. All three files have the same structure, as given below.


    The data values are aligned in columns and there are no missing values. The layout is as follows:

    A further external file (dmgfiles) contains the locations of these files. Suppose the three files are stored in the 'demog' folder on the E: drive, and dmgfiles is stored in the same folder.

    The following SAS program reads the three external files in a single DATA step. It reads the list of names in the external file 'dmgfiles' to determine which external files it should read.
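    A minimal sketch of such a DATA step, reconstructed from the step-by-step description that follows; it is not the author's original program. Assumptions: the list file is a hypothetical E:\demog\dmgfiles.txt holding one full path per line, the data lines are read with list input (the original column layout is not available), and the variable lengths are guesses.

```sas
/* Sketch only. Assumed: E:\demog\dmgfiles.txt lists one full path per */
/* line, and each data line holds blank-separated values.              */
data demogdat;
  length source $60 ctrn $3 subjid $8 sex $6 race $12;
  infile 'E:\demog\dmgfiles.txt';            /* file holding the file names     */
  input dmgfiles : $60.;                     /* modified list input, width 60   */
  infile dummy filevar=dmgfiles end=done;    /* DUMMY is only a placeholder     */
  do while (not done);                       /* END= is reset for each new file */
    input ctrn $ subjid $ age sex $ race $;  /* assumed list-input layout       */
    source = dmgfiles;                       /* FILEVAR= variable is not output */
    output;                                  /* one observation per data line   */
  end;
run;
```

    With the three files described here, this should produce the nine observations listed in step 8.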


    How the DATA step works:

    1. The first INFILE statement specifies the external file containing the list of file names that the DATA step should read.

    2. The first INPUT statement reads the name of an external file with modified list input. A width (60) sufficient to hold the name of the external file is specified.

    3. The second INFILE statement specifies the name dummy, which acts as a placeholder for the file specification that the INFILE statement always requires. The actual input file comes from the value of the variable named in the FILEVAR= option.

       a. The FILEVAR= option is set to dmgfiles, the variable that contains the name of the external file that the current iteration of the DATA step should read.

       b. The END= option defines a variable that SAS sets to 1 when it reads the last data line in the currently opened external file. The END= variable is initialized to 0 and keeps that value until SAS detects that the current input line is the last one in the file; SAS then sets it to 1.

       c. When the FILEVAR= option is included in the INFILE statement, SAS resets the END= variable to 0 whenever the value of the FILEVAR= variable changes. (If SAS did not reset the END= variable each time it opened a new external file, the DATA step would not read past the first external file.)

    4. The DO WHILE loop is controlled by testing the value of the END= variable. The loop stops after SAS reads the last data line in the currently opened external file.

    5. The name of the file from which the records are read (the source file name) is assigned to the variable Source.

    6. The DATA step iterates four times: once for each of the files dmg01, dmg02 and dmg03, and a fourth time in which it detects that there are no more lines in the external file containing the file names.

    7. By default, SAS writes an observation to a data set only at the end of each iteration of the DATA step. An explicit OUTPUT statement inside the loop overrides this, so every data line read from the external files is written out.

    8. The output dataset 'demogdat' is as follows:

    Source              Ctrn  Subjid  Age  Sex     Race
    E:\demog\dmg01.txt  001   001_01  29   Male    Caucasian
    E:\demog\dmg01.txt  001   001_02  28   Female  Caucasian
    E:\demog\dmg01.txt  001   001_03  25   Male    Caucasian
    E:\demog\dmg02.txt  002   002_01  27   Male    Asian
    E:\demog\dmg02.txt  002   002_02  28   Male    Asian
    E:\demog\dmg02.txt  002   002_03  25   Female  Asian
    E:\demog\dmg03.txt  003   003_01  27   Female  Asian
    E:\demog\dmg03.txt  003   003_02  28   Male    Asian
    E:\demog\dmg03.txt  003   003_03  25   Female  Asian

    5.0 Macro Variable Resolution Using Multiple Ampersands

    The SAS macro facility consists of two basic parts: macros and macro variables. References to macro variables are prefixed with an ampersand (&), while macro names are prefixed with a percent sign (%). Macro variable references are resolved before the SAS code executes, and the resolved values are substituted back into the code.

    A macro variable reference with a single ampersand before the name is a direct reference. In an indirect reference, more than one ampersand precedes the macro variable name, and the macro processor follows specific rules to resolve it.

    The rules that the macro processor uses to resolve macro variable references containing multiple ampersands are:

    1. Macro variable references are resolved from left to right.

    2. Two ampersands (&&) resolve to one ampersand (&).

    3. Multiple leading ampersands cause the macro processor to rescan the reference until no more ampersands can be resolved.

    Consider the example below.

    Options symbolgen;

    %let section4 =operating system;

    %let n=4;

    %put &&section&n;


    For the above code, on the first pass the two ampersands resolve to one and &n resolves to 4, yielding &section4. On the second pass the macro variable reference &section4 resolves to operating system.

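    The same two rules let a %DO loop step through a numbered series of macro variables. A minimal sketch, assuming hypothetical macro variables visit1 to visit3 and a hypothetical macro named list_visits:

```sas
/* Hypothetical series of macro variables: visit1, visit2, visit3 */
%let visit1 = Baseline;
%let visit2 = Week 4;
%let visit3 = Week 8;
%let nvis   = 3;

%macro list_visits;
  %local i;
  %do i = 1 %to &nvis;
    /* Pass 1: && -> &  and  &i -> 1, giving &visit1 */
    /* Pass 2: &visit1 -> Baseline                    */
    %put Visit &i is &&visit&i;
  %end;
%mend list_visits;

%list_visits
```

    The log should show "Visit 1 is Baseline", "Visit 2 is Week 4" and "Visit 3 is Week 8".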
    The following figure shows the process of resolving the macro variable reference in the program.

    %let a =freight;

    %let b=passenger;

    %let c=special;

    %let code=a;

    %put &code;

    %put &&code;

    %put &&&code;

    The following demonstrates how the macro variables with multiple ampersands are resolved.
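    The three %PUT statements in the program above write a, a and freight to the log. The sketch below repeats that program with the resolution steps written as comments:

```sas
%let a = freight;
%let b = passenger;
%let c = special;
%let code = a;

%put &code;     /* a: direct reference                                 */
%put &&code;    /* a: && -> & gives &code, which rescans to a          */
%put &&&code;   /* freight: && -> & and &code -> a give &a -> freight  */
```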

    6.0 Combining SAS data sets using UPDATE statement

    UPDATE is an executable SAS statement used in DATA steps. It replaces the values of variables in one data set with values from another data set. The data set containing the original information is the master data set, and the data set containing the new information is the transaction data set. UPDATE performs much the same function as MERGE, with two exceptions:

    1. Only two data sets can be combined with an UPDATE statement.

    2. If the value of a variable in the transaction data set (the data set containing the new information) is missing, the updated data set keeps the value from the master data set.

    Syntax

    DATA updated-data-set;

    UPDATE master-data-set transaction-data-set;

    BY variable-list;

    RUN;

    where master-data-set names the SAS data set used as the master file, transaction-data-set names the SAS data set containing the changes to be applied to the master data set, and variable-list specifies the variables by which observations are matched.

    Basic use of the UPDATE statement

    Consider two datasets, Lab1 and Lab2. Both contain the same four subjects, with blood pressure readings taken at two different times. Lab1 contains the subject ID, the subject's name and a blood pressure reading. Lab2 contains the latest blood pressure reading for the same subjects, but not their names. The blood pressure values in LAB1 are to be replaced with the latest values from LAB2. Here LAB1 is the master data set and LAB2 is the transaction data set.

    Lab 1

    Subject  Name  BP
    001      AAA   120
    002      BBB   130
    003      CCC   140
    004      DDD   150

    Lab 2

    Subject  BP
    001      160
    002      160
    003      160
    004      160

    The following program updates LAB1 (Master Dataset) with LAB2 (Transaction dataset).

    DATA lab1_updated;

    UPDATE lab1 lab2;

    BY Subject;

    RUN;

    Printed output of lab1_updated is given below:

    In this example, the latest blood pressure value from LAB2 is written to the updated data set, with observations matched on the BY variable Subject.
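    A self-contained version of this example, rebuilding the Lab 1 and Lab 2 tables shown above so it can be run as is. Both inputs are already in Subject order, as UPDATE with a BY statement requires.

```sas
data lab1;
  input Subject $ Name $ BP;
  datalines;
001 AAA 120
002 BBB 130
003 CCC 140
004 DDD 150
;
run;

data lab2;
  input Subject $ BP;
  datalines;
001 160
002 160
003 160
004 160
;
run;

data lab1_updated;
  update lab1 lab2;   /* master first, transaction second */
  by Subject;
run;

proc print data=lab1_updated noobs;   /* every BP is now 160; Name is kept */
run;
```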

    Suppose Lab1 contains a duplicate observation. For example,


    Lab 1

    Subject  Name  BP
    001      AAA   120
    002      BBB   130
    002      BBB   130
    003      CCC   140
    004      DDD   150

    DATA lab1_updated;

    UPDATE lab1 lab2;

    BY Subject;

    RUN;

    When this program runs, the update is not applied to every copy of a duplicate BY value: if the master data set contains two observations with the same value of the BY variable, the first observation is updated and the second is not. SAS writes a warning message to the log when the DATA step executes.

    Printed output of lab1_updated is given below:

    Observations two and three have the same value of the BY variable Subject. The blood pressure value was updated only in the first of these; the second occurrence keeps its original value.
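    A self-contained sketch of the duplicate case, using the tables above (the data set names lab1_dup and lab1_dup_updated are illustrative). Subject 002 appears twice in the master, so only its first observation receives the new value.

```sas
data lab1_dup;
  input Subject $ Name $ BP;
  datalines;
001 AAA 120
002 BBB 130
002 BBB 130
003 CCC 140
004 DDD 150
;
run;

data lab2;
  input Subject $ BP;
  datalines;
001 160
002 160
003 160
004 160
;
run;

/* The log receives a warning about the duplicate BY value */
data lab1_dup_updated;
  update lab1_dup lab2;
  by Subject;
run;

proc print data=lab1_dup_updated noobs;   /* BP: 160 160 130 160 160 */
run;
```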


    Missing values can occur in the master data set, in the transaction data set, or in both. The example below illustrates updating when the data sets contain missing values.

    Lab A

    Subject  Name  BP
    001      AAA   .
    002      BBB   130
    003      CCC   140
    004      DDD   150

    Lab B

    Subject  BP
    001      160
    002      .
    003      160
    004      160

    In these datasets, Lab A (the master data set) has a missing BP value for subject 001, and Lab B (the transaction data set) has a missing BP value for subject 002. The code below gives the latest known BP value for each subject.

    DATA lab1_updated;

    UPDATE LabA LabB;

    BY Subject;

    RUN;
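    A self-contained sketch of this case, rebuilding the Lab A and Lab B tables above as LabA and LabB. A missing BP in the master (subject 001) is filled from the transaction, while a missing BP in the transaction (subject 002) leaves the master value in place.

```sas
data laba;
  input Subject $ Name $ BP;
  datalines;
001 AAA .
002 BBB 130
003 CCC 140
004 DDD 150
;
run;

data labb;
  input Subject $ BP;
  datalines;
001 160
002 .
003 160
004 160
;
run;

data lab1_updated;
  update laba labb;
  by Subject;
run;

proc print data=lab1_updated noobs;   /* BP: 160 130 160 160 */
run;
```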


    Printed output is given below:

    The following is a more advanced example in which the UPDATE statement is used to flatten a dataset where the values of different variables for the same key are spread across several observations. The goal is to combine the non-missing values into one record per unique key value. The master data set's structure is read, but the OBS=0 option stops the DATA step from reading any data from it. The same data set is then used as the transaction data set, so the non-missing values for each unique key are collapsed into one observation.

    DATA lab;

    INPUT Subject $ Calcium Albumin Chloride;

    CARDS;

    001 2.25 . .

    001 . 49 .

    001 . . 100

    002 3.1 . .

    002 . 50 90

    002 . . .

    ;

    RUN;

    DATA lab;

    UPDATE lab (obs=0) lab;


    BY subject;

    RUN;

    Output is given below:
