    Introduction to SAS Programming for EXST 700x

    EXST700xSASHandout.lwp i Copyright 2009 LSU Dept. of Experimental Statistics

    Getting Started with SAS [1]

    References

    The Little SAS Book: A Primer, Fourth Edition. 2008. Lora D. Delwiche and Susan J. Slaughter. (This is a very valuable and useful compact reference book recommended for anyone programming in SAS.)

    SAS Programming 1: Essentials Course Notes. 2008. SAS Institute. (One of two textbooks used in EXST 4025: SAS Programming.)

    Useful links

    http://support.sas.com/ (especially the Knowledge Base section)

    http://www.ats.ucla.edu/stat/sas/faq/

    Note: Please report any mistakes, typos, discrepancies, errors, or needed clarifications in this handout to Elizabeth A. Swoope, [email protected], so the handout can be improved.

    If you are interested in learning more about SAS programming, you can take EXST 4025 SAS Programming, offered during the summer semester.

    Lab 1 Tutorial

    Objectives

    - To learn about the SAS user interface.
    - To learn how to create SAS data sets from raw data (1) included in the SAS program and (2) read from an external file.
    - To learn the rules for data set and variable names.
    - To learn best practices for formatting programs.
    - To learn how to use basic procedures to sort and print data and calculate summary statistics.
    - To learn how to get information about a SAS data set.

    Hands-on practice

    If you are working in an EXST lab, the data files for this tutorial are located in s:\EXST700x. If you want to work through the tutorial on your own computer, you can download the files at www.stat.lsu.edu/html/EXST700x.html. A video of the tutorial, in several formats (with TOC for viewing online, MP4 for viewing online or downloading, M4V [iPod] for viewing online or downloading), is also on that web page.


    [1] Screen captures from Base SAS software are copyright 2009 SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC.

    Getting into SAS

    1. Get into SAS: Click the SAS icon (for either SAS 9.1 or SAS 9.2, whichever is available) on the Windows desktop, or choose start > All programs > SAS > SAS 9.x (English), where x is either 1 or 2. This tutorial works for both versions 9.1 and 9.2.

    2. Maximize the SAS window so that it fills the screen. You need the largest possible workspace. Your screen will look like this:

    Opening a SAS program

    3. You will begin by opening and running an existing program: File > Open Program... (or Ctrl+O) > browse to s:\EXST700x > select Demo1.sas > Open (or double-click the file name).

    4. Maximize the window so that you can see the entire program. The program will look like this, although yours won't have the line numbers or horizontal lines between steps.

    Program Editor line numbers

    Optional: If you want to see line numbers and/or lines between program steps, as shown in the screen shot above:

    a. With your cursor in the Editor window, choose Tools > Options > Enhanced Editor... > General tab.

    b. Check Show line numbers.

    c. Check Collapse code sections, Show section lines in text, and Show section brackets in margin.

    (You can also change the tab size and specify whether you want to replace tabs with spaces.)

    d. Click OK.

    SAS program basics

    Let's look at this SAS program, which is a sequence of steps. SAS programs can include two types of steps: DATA steps and PROC steps.

    DATA steps read and modify data and create SAS data sets (the data packaged with variable names, labels, formats, etc., in the form that SAS uses). PROC steps perform functions (e.g., sorting or printing) or analyses (e.g., summary statistics or statistical analysis) and produce results (e.g., a data set sorted in a specific order) and/or reports.

    - SAS steps begin with either (1) a DATA statement or (2) a PROC statement.
    - SAS steps end with (1) a RUN statement (for most steps), (2) a QUIT statement (for some PROCs), or (3) the beginning of the next step (the next DATA or PROC statement).

    There are three steps in this sample SAS program: one DATA step and two PROC steps. SAS statements usually begin with an identifying keyword and always end with a semicolon (;). The screen shot below shows the three steps. The keywords and semicolons are highlighted, as are the raw data (the names and numbers).
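    As a rough illustration of the three-step structure, a program of this kind could look like the sketch below. The statements are the ones discussed in this tutorial, but the data values, the header comment, and the choice of Name as the sort variable are assumptions made for illustration; this is not the actual content of Demo1.sas.

```sas
* Demo1-style sketch with hypothetical data values;

/* Step 1: the DATA step reads the raw data and creates WORK.GRADES */
data grades;
    input Name $ Exam1a Exam1b;
    Diff = Exam1b - Exam1a;
    datalines;
Maria 78 85
Lee 91 88
Chris 64 79
;
run;

/* Step 2: PROC SORT orders the observations (sort variable assumed) */
proc sort data=grades;
    by Name;
run;

/* Step 3: PROC PRINT prints the sorted data set */
proc print data=grades;
run;
```

    Each step starts with a DATA or PROC statement and ends with RUN, and every statement ends with a semicolon.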

    You'll run this simple program, which reads raw data, sorts it, and prints it; you'll look at the contents of the various SAS windows; then you'll learn more about the statements in this program.

    5. Click the Submit (running man) button on the toolbar. You got output, and that's good, but the first thing you should do whenever you run a SAS program is look at the Log window. You can get output even when there are errors and warnings in your program; the fact that you get output does not mean that your program ran correctly.

    6. Click the Log window tab at the bottom of the screen, then scroll to the top of the window so that you can see the entire log. The screen shot below shows the relevant part of the log. When you look at the log, you see notes but no warnings or errors; this program ran properly, so you are almost ready to look at your output.

    Before doing so, though, look at the notes in the log and you will see that SAS reports how many observations and how many variables it processed in each step. This can be important information. Also, notice that SAS refers to the data set as WORK.GRADES rather than GRADES. That's because SAS uses two-part names. If you don't use a two-part name, SAS treats the data set as a temporary one, which exists only for the duration of your SAS session. When you exit from SAS, temporary data sets are deleted automatically. The first part of a temporary SAS data set name is WORK. (work dot).
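    A small sketch of the one-part versus two-part naming described above. The data set name scratch and its single value are hypothetical, chosen so that the example does not overwrite the grades data set from the tutorial.

```sas
/* Hypothetical one-variable data set, created only for this illustration */
data scratch;
    Exam1a = 90;
run;

/* One-part name: SAS looks for the data set in the WORK library */
proc print data=scratch;
run;

/* Two-part name: the same temporary data set, written out in full */
proc print data=work.scratch;
run;
```

    Both PROC PRINT steps refer to the same temporary data set; the log reports it as WORK.SCRATCH in either case.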

    7. Now that you've examined the log and confirmed that the program has run correctly, look at the Output window, which displays the output produced by the PROC PRINT step. The output resembles this:

    8. Clear the Output window by pressing Ctrl+E. (You'll want to remember this command because you'll probably use it often.)

    You don't have to print a SAS data set to see the data that are stored in it. You can use SAS Explorer to view the data set. Temporary (WORK.) SAS data sets exist from the time you create them until you exit from SAS.

    9. View the data in a SAS data set:

    a. Click the Explorer tab at the bottom of the pane on the left side of the screen.

    b. Double-click the Libraries icon (screen shot on right).

    c. Double-click the Work icon. (Work is where your data sets are stored during your SAS session. Remember that the names of temporary SAS data sets start with WORK.)

    d. Double-click Grades (the data set you created). This opens a VIEWTABLE window. Your screen will look like the screen shot below. The variables (columns, categories of information), observations (rows, information about one person or item), and data are shown.

    e. Important: Close the VIEWTABLE window by clicking the Close button on the menu line (not the red one on the title bar, which closes SAS). If you open a SAS data set in a VIEWTABLE window, you must close it before running any program that recreates or modifies the data set; while the SAS data set is open in a VIEWTABLE window, any step that attempts to modify it will not run.

    SAS program basics continued

    10. Look at the program again by viewing the program editor window (titled Demo1.sas).

    Comment statements

    This program begins with a comment statement that identifies the program name, the programmer, and the date the program was written. You can include comment statements anywhere in your program, and it's good practice to do so. Comments can include information about when a program was modified and a summary of the modifications, or explanations of what sections of code do.

    There are two types of comment statements:

    * comment ;

    /* comment */

    Comment statements can span several lines. Type * at the beginning of the first line and ; at the end of the last line, or type /* at the beginning and */ at the end.

    (You can also comment out lines of code when testing or running a program so that you don't have to delete them. They're still in the program but they won't run until you remove the comment code.)
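    The two comment styles can be sketched as follows. The data set name demo and the variables x and y are hypothetical and exist only to give the comments something to annotate.

```sas
* A one-line comment statement that ends at the semicolon;

* A comment statement can span
  several lines and still ends
  at the first semicolon;

/* A block comment can also
   span several lines. */

data demo;
    x = 1;
    /* y = 2;   commented out, so y is not created */
run;
```

    Removing the /* and */ around the second assignment would put that statement back into the program.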

    The DATA step

    This DATA step creates a SAS data set named grades from raw data that are included in the DATA step itself. This is the simplest way to create a SAS data set, but it is best used only for very small data sets.

    The DATA statement

    The DATA step begins with the statement data grades; and ends with the run; statement. data is the SAS keyword and grades is the name of the SAS data set you are creating.

    The INPUT statement

    INPUT statements begin with the keyword input and include information about variable names and types. The statement input Name $ Exam1a Exam1b; reads three variables (Name, Exam1a, and Exam1b) from the raw data. The $ that follows Name indicates that Name is a character variable. Exam1a and Exam1b are numeric variables.

    Two types of SAS variables

    - Character: can contain any printable character: letters, digits, special characters, punctuation marks, symbols, and spaces. Names of character variables are followed by $ in the INPUT statement.

    - Numeric: can contain only digits, a decimal point, the minus/negative sign (-), and E (indicating scientific notation). Numeric variables are measures or counts and can be used in mathematical operations.
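    A minimal sketch of the two variable types, assuming a hypothetical data set types_demo with made-up values. The $ after Name makes it a character variable; Score has no $ and is therefore numeric. The value 9.1E1 is scientific notation for 91.

```sas
data types_demo;
    /* Name is character because of the $, Score is numeric */
    input Name $ Score;
    datalines;
Ann 85
Bob -3.5
Cal 9.1E1
;
run;

/* PROC CONTENTS lists each variable together with its type */
proc contents data=types_demo;
run;
```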

    Rules for SAS data set and variable names

    - can be up to 32 characters long.
    - must start with a letter or underscore. Subsequent characters can be letters, underscores, or digits, but not spaces, punctuation marks, or symbols. (There is a way to include special characters but that is beyond the scope of this class.)
    - can be uppercase, lowercase, or mixed case.
    - are not case sensitive (e.g., NAME, name, and Name refer to the same variable).
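    The naming rules above can be illustrated with a short sketch. All names here are hypothetical; the invalid ones are left inside comments so that the step still runs.

```sas
data Exam_Scores_2009;        /* valid: letters, underscores, and digits */
    _total = 10;              /* valid: starts with an underscore */
    Exam1a = 5;               /* valid: starts with a letter */
    /* 1stExam = 5;    invalid: starts with a digit */
    /* Lab Avg = 5;    invalid: contains a space */
    EXAM1A = exam1a + 1;      /* same variable as Exam1a: names are not case sensitive */
run;
```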

    An assignment statement

    Diff = Exam1b - Exam1a; is an assignment statement. It creates a new variable Diff by subtracting Exam1a from Exam1b. Assignment statements are used to (1) create new variables or (2) change the value of existing variables.
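    Both uses of an assignment statement can be sketched in one small step. The data set name diff_demo and the exam values are hypothetical.

```sas
data diff_demo;
    Exam1a = 78;
    Exam1b = 85;
    Diff = Exam1b - Exam1a;    /* (1) creates the new variable Diff */
    Exam1b = Exam1b + 2;       /* (2) changes the value of an existing variable */
run;
```

    Statements execute in order, so Diff is computed from the original Exam1b (85 - 78 = 7) before Exam1b is changed to 87.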

    DATALINES; or CARDS;

    The datalines; statement immediately precedes the lines of raw data. You may use the cards; statement instead, although technically there aren't any cards in the program. The two statements have the same function.

    The raw data in this program are in list format. That is, each data item is separated from the previous data item by a space. The Exam1a score for one student may not be in the same columns as the Exam1a score for the other students.

    Note that a semicolon follows the last line of raw data.
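    A brief sketch of list-format data read with cards; instead of datalines;. The data set name and values are hypothetical. Because the names differ in length and the spacing varies, the scores do not line up in columns, which list input allows.

```sas
data grades_cards;
    input Name $ Exam1a Exam1b;
    cards;
Al 78 85
Benjamin   91   88
Chris 64 79
;
run;
```

    The semicolon on its own line after the last data line marks the end of the raw data.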

    PROC steps

    The two PROC steps begin with the PROC keyword followed by the name of the procedure (e.g., SORT or PRINT). DATA= specifies the SAS data set to be processed. If you do not specify the data set, SAS will use the last one created. However, it is good programming practice to explicitly specify the data set to be processed. If you don't, your program may not process the data set that you intended for it to use.

    The BY statement in the PROC SORT step tells SAS what variable to use to order the observations in the data set. (You can sort by more than one variable and in descending order.)
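    As a hedged sketch of sorting by more than one variable and in descending order, the example below uses a hypothetical data set sections with a made-up Section variable; neither is part of the tutorial files.

```sas
data sections;
    input Name $ Section Exam1a;
    datalines;
Maria 2 78
Lee 1 91
Chris 2 64
Pat 1 85
;
run;

/* Sort by Section, then by Exam1a from highest to lowest within each section */
proc sort data=sections;
    by Section descending Exam1a;
run;

/* DATA= names the data set explicitly instead of relying on the last one created */
proc print data=sections;
run;
```

    The DESCENDING keyword applies only to the variable that immediately follows it, so Section is still sorted in ascending order.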

    Standard code formatting

    Unlike some programming languages, SAS is very flexible in its requirements for program statements. However, the fact that SAS has few rules does not mean that you should not use standard formatting when writing your code. Using standard code formatting makes it easier to recognize and correct errors, and it makes your code more readable and easier to maintain.

    You can find more detailed information and example code in the Proper Code Formatting section, starting on page 77 of the handout.

    Allowed by SAS: Begin and end statements in any column.
    Best practice (recommended): Begin DATA, PROC, and RUN statements in column 1. Indent all other statements one or more tab stops for readability.

    Allowed by SAS: One or more spaces between words.
    Best practice (recommended): Exactly one space between words.

    Allowed by SAS: Multiple statements per line.
    Best practice (recommended): Only one statement per line.

    Allowed by SAS: Statements can extend over more than one physical line.
    Best practice (recommended): Indent second and subsequent lines.

    Allowed by SAS: Upper, lower, or mixed case.
    Best practice (recommended): Use exactly the capitalization you want when you create the data set and variables.

    Best practice (recommended): One blank line between steps.
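    To make the contrast concrete, here is the same hypothetical two-step program written twice: first in a form that SAS accepts but that is hard to read, then following the recommended practices. The data set name a and the variables x and y are placeholders.

```sas
/* Allowed by SAS, but hard to read */
data a; x=1; y=2;
run; proc print
data=a; run;

/* Recommended formatting */
data a;
    x = 1;
    y = 2;
run;

proc print data=a;
run;
```

    Both versions do the same work; only the second makes a missing semicolon or a misplaced statement easy to spot.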

    The second program

    You'll run a second program that reads data from a text file and runs a different PROC. The data file that you will be reading is ClassGrades.txt, and it contains the first name, exam score, and lab average for 19 students. You can open this file in Windows Notepad or another program to look at it if you'd like. The file looks like the screen shot to the right:

    1. Clear the program editor window: with your cursor anywhere in the editor window, press Ctrl+E.

    2. Open the program Demo2.sas:

    The INFILE statement lists the complete path for the external raw data file. Note that the drive letter, path, and filename (including extension) are enclosed in single quotation marks. This program includes the INFILE statement and does not include the DATALINES statement and raw data.

    Also note that the title is different. If there were no TITLE statement in this program, the title from the previous program would be used. TITLE statements stay in effect until a new one is used.
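    A program of the kind described here could be sketched as follows. The data set name grades2 and the variables Exam1 and LabAvg appear later in this tutorial; the variable name Name, the title text, and the assumption that the file holds one space-separated record per student with no header line are guesses for illustration, not the actual contents of Demo2.sas. The path combines the lab folder and file name given above.

```sas
/* Sketch of a Demo2-style program (details assumed, see note above) */
data grades2;
    infile 's:\EXST700x\ClassGrades.txt';
    input Name $ Exam1 LabAvg;
run;

/* A new TITLE statement replaces the title left over from the previous program */
title 'Exam 1 and lab averages (hypothetical title)';

proc univariate data=grades2;
run;
```

    If you work on your own computer, the path inside the quotation marks would need to point to wherever you saved ClassGrades.txt.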

    3. If necessary, edit the path in the INFILE statement. If you are in lab, that path is s:\EXST700x.

    4. Run the program.

    The log

    5. Look at the log window first. Because you did not clear that window, it contains the logs for both the first and second programs. The part of the log for the second program is shown below. The NOTEs give information about the raw data file.

    6. The log doesn't show any error or warning messages, so clear it.

    The output

    7. View the output window. Scroll through it and you'll see four pages of output, similar to those that follow. You can use the mouse and scroll bars or the Page Up and Page Down keys to move through the output.

    There are two numeric variables, Exam1 and LabAvg. Because there is no VAR statement to tell PROC UNIVARIATE which variables to process, it analyzes all numeric variables, so there are two pages for Exam1 and two for LabAvg. Note also that the page numbers do not restart with 1.

    8. Clear the Output window.

    9. Using the instructions in the lettered steps below, modify this program so that PROC UNIVARIATE analyzes only Exam1 and prints stem-and-leaf, box, and normal probability plots in addition to the summary statistics.

    a. Insert the word plot between grades2 and the semicolon in the PROC UNIVARIATE statement.

    b. Add the statement var Exam1; between the PROC UNIVARIATE and RUN statements. Use Tab, not the spacebar, to indent the statement.
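    After lettered steps a and b, the PROC UNIVARIATE step should look roughly like the sketch below. It assumes that the step originally read proc univariate data=grades2; and that the grades2 data set still exists in your SAS session.

```sas
/* PLOT requests the stem-and-leaf, box, and normal probability plots */
proc univariate data=grades2 plot;
    var Exam1;    /* analyze only Exam1 instead of all numeric variables */
run;
```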

    10. You don't have to rerun the entire program. The grades2 SAS data set still exists, so you don't need to recreate it. Instead, highlight just the PROC UNIVARIATE step as shown below, then click the Submit button.

    11. Check the log window to be sure that there aren't any error messages, then view the output window. The plots that you requested are on the last page of output; they are highlighted in the screen shot below:

    12. You will modify the program so that it clears the log and output windows automatically each time you run the program. In the program editor, insert a new blank line 1 above the comment line and type:

    dm 'log; clear; output; clear';

    dm refers to the Display Manager; the log and output commands switch to those windows, respectively; and clear clears each. Be sure to type both apostrophes and the semicolon.

    You will probably want to use the above statement as the first line of every program that you write.

    OPTIONS

    The OPTIONS statement is used to set parameters for the program output. You can include multiple options in a single OPTIONS statement, and the options stay in effect for the duration of the SAS session or until you submit another OPTIONS statement with different options.

    Option
        Description/comments

    PAGENO = 1 | NONUMBER | NUMBER
        The first option restarts page numbering of output at 1 each time you rerun the program. Otherwise, the page numbers are consecutive for the duration of your SAS session. (You can specify a number other than 1.) Use NONUMBER to turn page numbering off or NUMBER to turn page numbering back on.

    CENTER | NOCENTER
        Controls whether the output is centered (the default) or left-aligned. If you are printing directly from SAS, you may prefer CENTER. If you will be copying from the output window and pasting into a word processing program, you may prefer NOCENTER.

    DATE | NODATE
        Print the date and time (the default) or don't print the date and time.


    Option
        Description/comments

    Controlling margins: the materials for your specific class will tell you what method your professor prefers.

    LINESIZE = x or ls = x
    PAGESIZE = x or ps = x
        LINESIZE (or LS) specifies how many characters will fit on a single line; PAGESIZE (or PS) specifies how many lines will fit on a page. Use these options if you will be copying SAS output into a word processing program. You can set fairly large LINESIZE and PAGESIZE values then adjust the font size in the word processing document to make the output fit. LS=78 and PS=55 are common settings.

    RIGHTMARGIN=n
    LEFTMARGIN=n
    TOPMARGIN=n
    BOTTOMMARGIN=n
    (where n is in the form 0.75in or 2cm)
        Use these settings if you will be printing from SAS rather than copying/pasting into a word processing program.

    ORIENTATION=PORTRAIT | ORIENTATION=LANDSCAPE
        Use this setting with specific margin settings rather than LINESIZE and/or PAGESIZE, unless you know the specific LINESIZE/PAGESIZE values for the orientation setting.

    13. Right below the comment statement, add an OPTIONS statement to restart page numbering at 1 each time you run the program and suppress printing of the date:

    options pageno=1 nodate;

    As you are typing, notice how the colors of the words in the program editor change. Use the colors to help you be sure that you have ended statements with semicolons, spelled keywords properly, and used matching quotation marks where appropriate.

    14. Run the program again, then look at the log and output windows. You will see:

    a. the log contains only the statements for the current program execution; the lines from the previous program run were cleared by the DM statement.

    b. the output window contains only output from the current program execution.

    c. page numbering begins with 1. (OPTIONS)

    d. the date and time are not printed at the top of each page. (OPTIONS)

    15. Use SAS Explorer to look at the SAS data sets in the Work library. Both Grades and Grades2 are there.
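As a recap of steps 12 and 13, the DM and OPTIONS statements can sit together at the top of a program. The sketch below is only an illustration. It adds NOCENTER, LS=78, and PS=55 from the OPTIONS section, and it prints a few rows of sashelp.class, a sample data set that ships with SAS and is used here as an assumed stand-in for your own data.

```sas
/* Clear the log and output windows before each run */
dm 'log; clear; output; clear';

/* Restart page numbering at 1, suppress the date and time,
   left-align the output, and set the line and page sizes */
options pageno=1 nodate nocenter ls=78 ps=55;

/* sashelp.class is a stand-in; substitute your own data set */
proc print data=sashelp.class (obs=5);
run;
```

Because options stay in effect for the whole SAS session, a later program that needs centered output would have to submit options center; to switch back.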


    Output and printing

    Printing directly from SAS

    If you are printing (whether program, log, or output) directly from SAS, select the window that you want to print and click the Print button on the toolbar or press Ctrl+P. Output from SAS is serviceable but not especially attractive.

    You can also copy/paste from SAS into a word processing program, but you must format the pasted text using a monospaced (typewriter-like) font like Courier New, Letter Gothic, or SAS Monospace.

    Using ODS

    You can generate more attractive output using SAS's ODS (Output Delivery System), which can be used to create RTF (Rich Text Format), HTML, or PDF files. RTF files can be edited in a word processing program.

    When you use ODS, SAS routes output to both the regular output window and the file you specify. The ODS statements look like this:

    ods filetype file = 'filename-extension' options;

    ...SAS statements that produce output...

    ods filetype close;

    The file types you are most likely to use are RTF, HTML, and PDF.

    16. Modify the program to produce the output preferred by your professor using the appropriate steps below. The STYLE option is not necessary, but the HighContrast style produces relatively plain, simple output with no shading. Substitute your desired file type for filetype and extension in the above statements. You can view examples of the output produced by all the styles that are standard in SAS at www.stat.lsu.edu/SAS_ODS_styles/SAS_ODS_style.htm.

    Option 1: ODS RTF

    1. This example will create RTF output, which can be edited with a word processing program like Word:

    a. In the line above the PROC UNIVARIATE statement, add the code:

    ods rtf file='c:\temp\lab1.rtf' style=HighContrast;

    b. In the line below the last RUN statement, add the code:

    ods rtf close;


    (If you want to create either an HTML or PDF file, substitute HTML or PDF for RTF in both ODS statements and the extension on the file name.) The four lines of code you have added to the original program are highlighted in the screen shot below (lines 1, 3, 10, and 14):

    2. Run the program.

    3. When you are asked if you want to open or save the file, click Open and the file will open in Word. You can edit it, save it, etc. Once you've done whatever your instructor specifies, close Word. If you did not get output in Word, check the log, make corrections to your program, and run it again.

    4. In SAS, close the Results Viewer window. RTF output is displayed in Word, not the Results Viewer. You should look at the log to confirm that the program ran without errors.

    5. Close the program without saving it (or, if you want to keep the program, save it in My Documents or c:\temp).

    Option 2: ODS HTML

    1. This example will create HTML output, which can be edited in a word processing program or an HTML editor:

    a. In the line above the PROC UNIVARIATE statement, add the code:

    ods html file='c:\temp\lab1.html' style=HighContrast;

    b. In the line below the last RUN statement, add the code:

    ods html close;


    (If you want to create either an RTF or PDF file, substitute RTF or PDF for HTML in both ODS statements and the extension on the file name.) The four lines of code you have added to the original program are highlighted in the screen shot below (lines 1, 3, 10, and 14):

    2. Run the program.

    3. Click the Results Viewer tab to view the HTML output, then close the window. If you didn't get results, check the log, correct any errors in your program, and run the program again.

    4. Close the Results Viewer window.
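The same pattern works for the third file type mentioned above. The following is a minimal sketch of the PDF variant, assuming the folder c:\temp exists. It uses sashelp.class and its Height variable (a sample data set that ships with SAS) as a stand-in for the grades2 data set and Exam1 variable from the tutorial.

```sas
/* Open the PDF destination; output is still sent to the output window */
ods pdf file='c:\temp\lab1.pdf' style=HighContrast;

/* Stand-in analysis step; in the tutorial this would be the
   PROC UNIVARIATE step for grades2 and Exam1 */
proc univariate data=sashelp.class plot;
    var Height;
run;

/* Close the destination so the file is written and released */
ods pdf close;
```

If the closing ODS statement is left out, the file typically stays open and incomplete until the destination is closed or the SAS session ends, so the two ODS statements are best treated as a matched pair.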

    Getting information about a SAS data set

    A SAS data set is a file that contains both the actual data values and information referred to as the descriptor. The descriptor includes things like the number of observations; the number of variables; a list of variables with their types, formats, and labels; and other information. Think of it as an organized kit of parts with a packing slip.

    There are a couple of ways of viewing the descriptor information (packing slip).

    PROC CONTENTS

    1. In the program editor window, delete everything following the OPTIONS statement.

    2. Below the OPTIONS statement, add the code highlighted in the screen shot below:

    3. Submit the program then look at the log and then the output window. Note that even though this program does not include a TITLE statement, the output has a title because the program you submitted earlier included a TITLE statement. TITLE statements are in effect until a new TITLE statement is submitted.


    The most useful information is probably the number of observations and variables and the list of variables and attributes.

    You should be careful when you create variables to use consistent capitalization so that the list of variables is in the order that you expect. If you have the variables exam1 and LabAvg, you may expect to see exam1 listed before LabAvg, but SAS sorts capital letters first so LabAvg would be listed before exam1.

    If formats or labels were part of this SAS data set, the list of variables and attributes would include that information.

    4. You may want to see the list of variables in the order that they were created, so modify the program by adding the position option to the PROC CONTENTS statement, as shown in the screen shot below, then submit it again.

    5. After checking the log, look at the output window. The list of variables in creation order follows the list in alphabetical order.


    You haven't learned about permanent SAS data sets yet, but if someone sends you a permanent SAS data set, you can use PROC CONTENTS to find out what's in the data set.
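The PROC CONTENTS steps above can be sketched as follows. This is an illustration under stated assumptions rather than the exact program from the screen shots: the data set name demo and the two data lines are hypothetical, and the variable names exam1 and LabAvg are borrowed from the capitalization example.

```sas
/* Hypothetical two-row data set so the example runs on its own */
data demo;
    input Name $ exam1 LabAvg;
    datalines;
Ann 88 91.5
Bob 75 80
;
run;

/* POSITION adds a second variable list in creation order
   after the default alphabetical list */
proc contents data=demo position;
run;
```

In the tutorial itself you would replace demo with grades2, which already exists in the Work library.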

    SAS Explorer

    You have used SAS Explorer to look at the data stored in a SAS data set. You can also use it to get the descriptor information.

    1. In SAS Explorer, right-click on Grades2 to open the menu. The most useful options are View Columns and Properties.

    2. Click on Properties. On the General tab is information about the number of rows (observations) and columns (variables), as well as the date(s) that the file was created and last modified.

    3. Click on the Columns tab to view information about the variables.

    If you only want to see information about the variables (i.e., the information on the Columns tab), choose View Columns instead of Properties to go directly to the tab.

    4. You have completed the tutorial for the first lab so you may exit from SAS. Do not save Demo2.sas. If you want to work on your assignment, clear the program editor, log, and output windows before beginning. You will probably want to read through the next section on troubleshooting.

    For more information on using SAS Explorer, refer to page 75.

    Note: The tutorial for the first lab was designed to be completed in order, step by step. The topics covered in the remainder of the handout stand alone and are designed to be used as reference material.


    Troubleshooting

    The most common beginning programmer errors are omitting characters (specifically, semicolons and matching single quotes) and making typographical errors. There are many other things that cause errors and warning messages in the log, but you will see those more when you are working with your own data rather than class data.

    Where to start

    - Look at the SAS log for errors, warnings, and notes.
    - Start at the beginning of the log; scroll up! Often, errors at the end of the program are caused by errors at the beginning of the program. Just one missing semicolon at the end of one of the first statements in your program, or one missing quotation mark, can cause errors through the entire program.
    - If the error message doesn't make sense, work backwards (toward the beginning of the program) to see if you can find the problem.

    Common error messages

    Errors: Invalid Option, The Option Is Not Recognized, or Statement Is Not Valid

    Look for:

    - a misspelled SAS keyword
    - a missing semicolon
    - a statement in a step where it isn't valid (e.g., an assignment statement in a PROC step)
    - a RUN statement in the middle of a step
    - an option used with the wrong statement
    - an unmatched quotation mark
    - an unmatched comment designator (e.g., using /* without a matching */)

    Variable Is Uninitialized or Variable Not Found

    Look for:

    - a misspelled variable name (usually a typo)
    - using a variable that is no longer in the data set (e.g., it was dropped in an earlier step)
    - using the wrong data set (i.e., the one you are using doesn't contain that variable)
    - making a logic error, like using a variable before you've created it

    Note: Invalid Data

    In this class, the most common reason for getting an Invalid data note is that you forgot to use a $ after the variable name when reading a character variable. There are other reasons for getting that message, but they are beyond the scope of this course.
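As a small illustration of the Invalid data note, the sketch below reads a character variable correctly. The data set name quiz, the variable names, and the two data lines are hypothetical and are not taken from the class files.

```sas
data quiz;
    /* The $ marks Name as a character variable. Without the $,
       SAS would try to read Ann and Bob as numbers, write an
       "Invalid data for Name" note to the log for each line,
       and store a missing value for Name. */
    input Name $ Score;
    datalines;
Ann 88
Bob 75
;
run;
```

Removing the $ and resubmitting is a quick way to see what the note looks like in the log, which makes it easier to recognize later.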


    Correcting unpaired quotation marks

    If your program didn't run correctly because you have a missing quotation mark, you can't just fix the problem in the program editor and resubmit your program. You must perform the following steps to clear the error then submit your corrected code.

    1. Click the Break button on the toolbar to stop the program from executing.

    2. Select "1. Cancel Submitted Statements" then click OK.

    3. Select Y to cancel submitted statements, then click OK.

    4. Clear the log, add the missing quotation mark to your program, and resubmit.

    Otherwise, you'll just keep getting errors even if you've corrected your code.


    Getting Data Into SAS

    Note: The tutorial for the first lab was designed to be completed in order, step by step. The topics covered in the remainder of the handout stand alone and are designed to be used as needed.

    Reading raw data files

    In the lab 1 tutorial, you learned how to read raw data in list input form (data values separated by one or more spaces) when the raw data are included in the program and when they are stored in a separate file. This section demonstrates reading data in two other common formats.

    CSV (comma separated values)

    CSV is a very common and useful raw data format. The values for each variable are separated by commas. Values that include commas (e.g., Green, Kyle or Baton Rouge, LA) are enclosed in quotation marks so that the commas within the values aren't interpreted as separating variables. In some CSV files, all character variables, whether or not they include commas, are enclosed in quotation marks. (Numbers that include commas are also enclosed in quotation marks. SAS can read them, but the method for doing so is beyond the scope of this course.)

    Many programs can export data to CSV format, so if SAS can't read a program's native files, you can usually export the data to a CSV file and read that into SAS.

    The screen shot to the right shows the file ClassGrades3.csv, which contains the same data that were in ClassGrades.txt except that the values are separated by commas rather than spaces and the names are in the form Last, First so must be enclosed in quotation marks.

    If you want to view the file, open Windows Explorer (not Internet Explorer), right-click on ClassGrades3.csv > Open With > Notepad. Do not double-click the file name because that will open the file in Excel and that can cause problems with some types of data. If you are not going to read the raw data file using SAS, it's okay to open it in Excel. However, if you are going to use SAS to read the raw data file, it's best not to open it in Excel.

    1. In the SAS program editor, type the following program (using the appropriate path for the CSV file on the computer you are using):


    The DSD (Delimiter-Sensitive Data) option in the INFILE statement serves several functions:

    - It assumes that the delimiter is a comma. (If the delimiter is some other character, like a tab or ~ or /, you must also use the DLM= option.)
    - It ignores delimiters in data values enclosed in quotation marks. In this file, it ignores the comma separating the last and first names in the Name field. If the file were tab-delimited, it would ignore tabs within values enclosed in quotation marks.
    - It does not read quotation marks as part of the data values.
    - It treats two delimiters in a row as a missing value. That is, two commas with nothing between them represent a missing value for a variable.

    The MISSOVER option keeps SAS from going to a new data line if there is a missing value for the last variable.

    Even though there are no missing values in this data file, you'll use the MISSOVER option because it's a good programming practice when working with delimited files.

    2. Submit the program and check the log. If there are errors in the log, correct themand rerun the program.

    3. Once there are no errors in the log and it reports that WORK.GRADESCSV has 19 observations and 3 variables, use SAS Explorer to look at WORK.GRADESCSV. Note that the names are truncated (incomplete, chopped off). This is because the default length of a SAS character variable is 8, so SAS reads the first 8 characters of each name.

    4. Close the VIEWTABLE window.

    5. The LENGTH statement is one way to tell SAS to allow more than 8 characters in a character variable. It must come before the variable is created in the INPUT statement. You do need to know the length of the longest value for the variable. In this case, it is 13. Note that the LENGTH statement includes the variable name, a $ (because this is a character variable), and the number of characters. You can use a single LENGTH statement to set the length for more than one variable, just as you can list several variables in one INPUT statement.


    6. Submit the program, check the log, then view the data set. You'll see that the names are complete:

    7. Close the VIEWTABLE window. You may also wish to clear the log and program editor windows.
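The CSV steps above can be sketched as a single DATA step. This is a minimal sketch rather than the handout's exact program: it assumes the file is stored at c:\temp\ClassGrades3.csv, has no header row, and holds its three fields in the order Name, Exam1, LabAvg. The variable names and their order are assumptions.

```sas
data GradesCSV;
    /* LENGTH comes before INPUT so Name can hold up to 13
       characters instead of the default 8 */
    length Name $ 13;

    /* DSD: comma delimiter, quoted values kept whole, quotes stripped.
       MISSOVER: do not jump to the next line if the last value is missing. */
    infile 'c:\temp\ClassGrades3.csv' dsd missover;

    /* Assumed variable names and field order */
    input Name $ Exam1 LabAvg;
run;
```

For a file that uses some other delimiter, the INFILE statement would also need the DLM= option, as noted in the list of DSD functions.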

    Multiple observations per record (line of data)

    You may have raw data in which each line of data contains more than one observation, as shown in the screen shot of ClassGrades2.txt shown below. These are the same data for 19 students that are in ClassGrades.txt except that some lines of data in this file contain information for three students, some for two, and some for just one. If you use a standard INPUT statement, only the first observation will be read from each record (line of data).

    8. In the SAS program editor window, type the following program (with the correct path to the txt file) then run it.


    9. View the log. Notice that although the txt file contains data for 19 students, only 8 observations were read from the file. That's because there are 9 records (lines of data).

    10. Use SAS Explorer to view the data set and you will see that the only students in the data set are those whose data were at the beginning of each record (line of data) in the txt file:

    11. Close the VIEWTABLE window.

    12. Edit the program by adding two at-signs (@@) to the end of the INPUT statement. This is called a double trailing at sign and it tells SAS to keep reading data from a record (line of data) until all the data on the line have been read, then go to the next line and continue reading.

    13. Submit the program and view the log. There are two things you should note.

    First, there's a message "NOTE: SAS went to a new line when INPUT statement reached past the end of a line." When you use @@ in an INPUT statement to read multiple observations from a single record, this is a normal and expected message so it is not cause for concern. However, with most other forms of input, it is often a sign that your INPUT statement is not reading the data correctly and you have lost observations and/or the wrong data are being read for some variables.

    Second, the data set contains 19 observations and 3 variables, as it should.


    14. Use SAS Explorer to view the data set and you will see that all the observations are there:

    15. Close the VIEWTABLE window. You may also wish to clear the log and program editor windows.
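The effect of the double trailing at sign can be tried without the class file. The sketch below puts a few made-up records directly in the program with DATALINES; the data set name, variable names, and values are all hypothetical. The three data lines hold three, two, and one student respectively.

```sas
data GradesMulti;
    /* @@ holds the current line so the next pass through the
       DATA step keeps reading from where this one stopped */
    input Name $ Exam1 LabAvg @@;
    datalines;
Ann 88 91.5 Bob 75 80 Cal 93 89
Dee 67 72.5 Eve 81 85
Fay 90 94
;
run;

proc print data=GradesMulti;
run;
```

With @@ the data set should contain all six students. Deleting the @@ and resubmitting leaves only the first student from each of the three lines, which mirrors what happens in steps 9 and 10.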

    Reading other types of raw data

    SAS can read raw data in many other formats. For example, values may be aligned within columns and not separated by delimiters like spaces or commas. The delimiter may be something other than spaces or commas, like tabs or tildes (~) or slashes (/) or backslashes (\) or pipes (|).

    The raw data may contain dates (e.g., 12/17/2009 or 22JAN2010) or numbers that include commas and/or dollar or other currency symbols (e.g., 10,287 or $12.85 or $15,873).

    It may contain multiple types of records. For example, a raw data file may contain accident records, driver and vehicle records (one or two observations per record), and casualty records (up to five observations per casualty record), as is the case with Texas traffic accident data.

    Learning to read these types of raw data is beyond the scope of this course, but you should be aware that it is possible to read complex raw data with SAS.


    Reading Excel worksheets

    An Excel worksheet is one of the most common formats for data. There are two ways to import an Excel worksheet into SAS, both of which will be covered here. You can also read an Excel worksheet directly as a SAS data set without importing it, and that topic will be covered in a different section of this handout.

    There are a couple of guidelines for Excel worksheets that you want to use as SAS data sets. The screen shot to the right, of ClassGrades3.xls, is an example of a worksheet in the proper form for use in SAS.

    First, the data must be in a rectangular block. That is, there should not be blank rows, titles, or anything that isn't data mixed in with the data.

    Second, the first row should contain the variable names and the best practice is to make them SAS-compatible (up to 32 characters long, start with a letter or underscore, no spaces), although SAS will substitute the underscore character for spaces. If you don't include variable names in the first row, SAS assigns the variable names F1, F2, and so on.

    Import Wizard

    17. In SAS, choose File > Import Data... to open the Import Wizard.

    18. Select Microsoft Excel Workbook from the data source drop-down then Next >. (The screen shot above is from SAS 9.2. If you are using SAS 9.1, the choice will be slightly different and will not list *.xlsb, *.xlsm, or *.xlsx because SAS 9.1 can import Excel 2003 and earlier files but not Excel 2007 and later ones.)

    19. In the Connect to MS Excel dialog, click Browse..., navigate to the folder containing ClassGrades3.xls, and double-click the file name to select it.


    20. Click OK. In the next dialog, use the Select table drop-down to select the sheet that you want to import, which is Sheet1$ in this example. (Note that you can also import a range rather than an entire sheet.)

    21. Click the Options... button. Since the variable names are in the first row, "Use data in the first row as SAS variable names." should be checked. This is the default. (If the first row of the sheet contains data, rather than variable names, you would uncheck that option and SAS would assign variable names F1, F2, F3, and so on.)

    22. Click OK then Next >. You'll be creating a temporary SAS data set called Grades, so the library is WORK. Type Grades under Member:. (If this is a data set you'll need to use again, you'll probably want to make it a permanent SAS data set, rather than a temporary one, so you'd select a library other than WORK.)

    23. Click Next >.


    24. If you will need to import the worksheet again or include the code to import it in a program, type the path and name for the program file. If you won't need to perform this import again, you can leave this blank.

    25. Click Finish. The log should show NOTE: WORK.GRADES data set was successfully created.

    26. Use SAS Explorer to view the SAS data set Grades, which should look like the screen shot below, then close the VIEWTABLE window.


    PROC IMPORT

    The easiest way to see how PROC IMPORT works is to look at the program produced by the Import Wizard:
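A program of this kind generally has the following shape. This is a minimal sketch, not the wizard's exact output: the path C:\EXST700x\ClassGrades3.xls is an assumption, the wizard normally writes a few additional statements, and DBMS=EXCEL requires the SAS/ACCESS Interface to PC Files on Windows.

```sas
/* Import Sheet1 of the workbook into the temporary data set Grades */
proc import out=work.Grades
            datafile='C:\EXST700x\ClassGrades3.xls'  /* assumed path */
            dbms=excel replace;
    range='Sheet1$';   /* the sheet chosen in the Select table drop-down */
    getnames=yes;      /* first row holds the variable names */
run;
```

REPLACE lets the step overwrite WORK.Grades if it already exists, which is convenient when you have to correct the program and resubmit it.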

    Permanent SAS data sets

    Temporary SAS data sets vs. permanent SAS data sets

    So far, you have worked with temporary SAS data sets. These are data sets that you create during a SAS session, and those data sets are available to you only during that SAS session. SAS writes temporary data sets to the hard drive for your use during the session, but the files are deleted when you exit SAS. If you want to work with the same SAS data set during another session, you have to create it again each time.

    As you can imagine, having to create large, complex SAS data sets each time you need to use them is inefficient, especially if there is a lot of processing required to transform the data from raw data to data that is ready for analyses.

    Temporary SAS data sets have either one-step names (e.g., flights) or two-step names, with the first step being work (e.g., work.flights). It is your choice whether you want to use the one-step name (to save some typing) or the two-step name (so that work reminds you that the data set is temporary) when you refer to them in your programs.

    Permanent SAS data sets are stored on your hard drive or other storage device and are available to you at all times, as long as you've used LIBNAME statements to tell SAS the path for the folder where they are stored.

    The only difference between a temporary SAS data set and a permanent SAS data set is where the data set is stored. Permanent SAS data sets are stored in a folder other than the one that SAS uses for WORK (temporary data sets) so they are not deleted when you exit SAS.


    LIBNAME statements

    LIBNAME statements are used to associate a libref (an alias or nickname) with a path (drive letter and folder(s)). The libref can be up to eight characters long, must start with a letter or underscore, and cannot contain spaces. The LIBNAME statement includes a libref, which is the first step in the two-step name, and the path to the folder. Submit the LIBNAME statement just once during the session. If you include LIBNAME statements in a program, it's best to put them toward the beginning of the program (i.e., before or after the OPTIONS statement) even if you won't create or read the permanent SAS data set until later in the program.

    libname myperm 'c:\mysasdatasets';

    Note that the LIBNAME statement above includes only a path (drive letter and folder), not a file name. The permanent SAS data set flights2 (which is actually named flights2.sas7bdat) stored in C:\mysasdatasets would be referred to as myperm.flights2, as shown in the example below:

    proc print data=myperm.flights2;
    run;

    Accessing an existing permanent SAS data set

    You'll work with a SAS data set named classgrades4. The complete file name is classgrades4.sas7bdat. Before you begin, you need to know where this file is located. If you're working on an EXST lab computer, it's in S:\EXST700x.

    1. Get into SAS or clear the log, output, and program editor windows if you are already in SAS.

    2. In the program editor window, type a LIBNAME statement similar to the one shown below, but with the correct path for the file on your computer. (Note: The 1 at the beginning is a line number, not something that you type.) The libref is grades in this example, but you can use any word you like, and you can use a different libref for the same path during another SAS session.

    3. Check the log to verify that the libref was assigned correctly.

    4. In SAS Explorer:

    a. Navigate to Libraries and you will see Grades listed in addition to the libraries that you usually see (Maps, Sashelp, Sasuser, and Work).

    b. Double-click on Grades and you will see Classgrades4. You could view the data set now, but you're going to do that after you've run a short program. If there are other SAS data sets in this folder, you'll see them, too.


    5. Clear the program editor window. You've submitted the LIBNAME statement and it applies for the duration of your SAS session so you don't need it any longer.

    6. Type the following program, noting that you are using a two-step data set name and the first part is the libref grades that you just submitted. You will run PROC CONTENTS first because that's the first thing you should do when you receive a permanent SAS data set from someone, and you'll also run PROC PRINT. You are including the DM and OPTIONS statements because it's a good habit to develop and also in case you have to correct errors and resubmit the program.

    7. Submit the program then check the log. If there are errors, correct them and resubmit the program. Once the log is clean (shows no errors), look at the output window.

    The PROC CONTENTS output looks like this:


    The list of variables shows more information than you've seen before. The variables are formatted and have labels. Both Exam1 and LabAvg are formatted to print with one digit to the right of the decimal point and Ex1Date is formatted to print in the format ddMONyyyy.

    Also, this data set was created by sorting a temporary data set, so there is sort information.

    The PROC PRINT output looks like this:

    Notice that the observations are in alphabetical order. Exam1 and LabAvg print with one digit to the right of the decimal point, and Ex1Date prints as a date because these variables are formatted.

    8. View classgrades4 using SAS Explorer. Note that some of the column headings do not match the column headings in the PROC PRINT example above (e.g., Sec/Section, Exam1/Exam 1, LabAvg/Lab Avg., Ex1Date/Exam 1 Date). The actual variable names are used in the PROC PRINT output, while the labels are used in VIEWTABLE. Even though you haven't learned about labels, it's important that you learn to look for variable names and labels because you must use variable names, not labels, when you are programming. (For more information on labels, refer to page 53.)

    Also notice that ID and Section contain digits. When you look at the properties of these variables in PROC CONTENTS or View Columns, you'll see that these are character, not numeric, variables. That's because, although they contain digits, they are not measures or counts of anything and they aren't used in mathematical calculations. If ID were numeric, the leading zero on one ID would be lost.


    9. Right-click on classgrades4 then View Columns. You'll have to scroll right to see the labels.

    10. Close the Properties dialog and the VIEWTABLE window then clear all open windows.
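The programs for steps 2 and 6 appear as screen shots, so here is a minimal sketch of how they might look when combined into one program. It assumes the lab-computer path S:\EXST700x given above and the libref grades used in the steps; adjust the path for your own computer.

```sas
dm 'log; clear; output; clear';
options pageno=1 nodate;

/* Associate the libref grades with the folder holding classgrades4.sas7bdat.
   The path is the lab-computer location; yours may differ. */
libname grades 'S:\EXST700x';

/* Look at the descriptor first: variables, types, formats, labels */
proc contents data=grades.classgrades4;
run;

/* Then print the data values themselves */
proc print data=grades.classgrades4;
run;
```

In the tutorial the LIBNAME statement was submitted separately in step 2, so step 6 does not need to repeat it. Keeping it in the program simply makes the file runnable in a fresh SAS session.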

    Creating a permanent SAS data set

    You'll modify a program that you've already used so that it creates a permanent SAS data set.

    1. Open demo2.sas. You'll add a LIBNAME statement and edit the SAS data set names, as shown below. If you are working on a lab computer, you should use the path c:\temp. If you are working on your own computer, you may want to create a folder for your EXST homework. If you've done that, you can use that folder as your path. You can use any libref you like; this example uses perm.

    2. Look at the log. If you have errors, correct them and resubmit the program. When your program runs correctly, the log looks like this:


    3. In SAS Explorer, navigate up from the Grades library using the Up Folder button on the toolbar. You'll see the new library Perm. Navigate to it and open Grades2. There may be other SAS data sets in the library (folder).

    4. Close SAS.

    If you wanted to access this SAS data set in a later SAS session, you would submit a LIBNAME statement, then use the proper two-step name, as covered in the previous section.
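The idea behind these steps can be sketched with a short stand-alone program. This is not the edited demo2.sas: the variable names and the two data lines are hypothetical, and the path c:\temp is the lab-computer folder suggested above. Only the two-step name perm.grades2 and the LIBNAME statement carry the technique.

```sas
/* Associate the libref perm with an existing folder */
libname perm 'c:\temp';

/* The two-step name tells SAS to write grades2.sas7bdat to c:\temp
   instead of the temporary WORK library */
data perm.grades2;
    input Name $ Exam1 LabAvg;
    datalines;
Ann 88 91.5
Bob 75 80
;
run;

/* In a later session, resubmit the LIBNAME statement and
   refer to the data set by the same two-step name */
proc print data=perm.grades2;
run;
```

Running this on a computer that already holds a real grades2.sas7bdat in c:\temp would overwrite it, so choose a different data set name if you want to experiment.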

    Creating a library using SAS Explorer

    You can also create a libref in SAS Explorer. Use the New Library button on the toolbar to access the following screen:

    Type the libref in the Name: text box and provide the path in the Path: text box. If you want the library to be available to you during every SAS session without having to submit a statement, check Enable at startup.

    To remove the library, right-click it > Delete.

    Reading Excel worksheets as permanent SAS data sets

    SAS can read data directly from an Excel worksheet as if it were a SAS data set; you do not have to import the data into SAS.

    Note: this is an advanced technique. You must understand how to work with permanent SAS data sets before trying this.

    The following example code shows how to read the worksheet Sheet1 from an Excel workbook (file) named ClassGrades3.xls stored in C:\EXST700x.

    The LIBNAME statement is similar to the one used to access permanent SAS data sets except that it includes the complete file name and extension, in addition to the drive letter and path.

    The worksheet within the file is used as the data set name. Note that the worksheet name is followed by a $, that the name and the $ are enclosed in single quotes, and that the letter n follows the second quotation mark (for example, 'Sheet1$'n).
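    The pieces described above might fit together as in the following sketch. The libref xl is an arbitrary choice made for this illustration, and the code assumes that your SAS installation includes the component for reading Excel files (SAS/ACCESS Interface to PC Files).

```sas
/* The path includes the complete file name and extension */
libname xl 'C:\EXST700x\ClassGrades3.xls';

/* The worksheet Sheet1 is referenced with a name literal: 'Sheet1$'n */
proc print data=xl.'Sheet1$'n;
run;

/* Release the workbook so that Excel can open it again */
libname xl clear;
```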

    The graphic below should make this clearer:

    The log from the above program looks like this:

    The output from the program looks like this:

    Creating a New SAS Data Set from Existing SAS Data Sets

    Subsetting a SAS data set

    You may want to create a SAS data set that contains only some observations from a particular data set. The DATA statement tells SAS the name of the new SAS data set and the SET statement tells SAS the name of the existing SAS data set(s) from which to create the new one.

    WHERE

    The WHERE statement can be used to specify the characteristic(s) of the observations you want in the new SAS data set.

    This example uses the permanent SAS data set classgrades4 (shown at right).

    The following statement was submitted at the beginning of the program to associate the libref perm (short for permanent) with the path for this file.

    libname perm 'c:\EXST700x';

    The following code creates a new SAS data set named Section2 with only the observations from section 2 from the existing permanent SAS data set classgrades4.

    Code                          Explanation
    data Section2;                create new temporary SAS data set Section2
       set perm.classgrades4;     from the permanent SAS data set classgrades4
       where Sec='2';             include only observations for section 2
    run;

    The resulting data set, with only observations from section 2, is shown below:

    The WHERE statement is covered in more detail on page 43.

    The subsetting IF

    In a DATA step, you can use a subsetting IF to keep only certain observations.

    IF State='LA';

    There's an implied THEN in the above statement: IF State='LA' THEN OUTPUT;

    It does the same thing as the statement IF State ne 'LA' THEN DELETE;

    Which is better? It probably doesn't matter, but if it's a case of wanting to keep only one thing and deleting the rest, it's probably better to use the subsetting IF or WHERE. If you want to delete only the Louisiana records, then you'd probably want to use IF State ne 'LA'; or WHERE State ne 'LA'; rather than IF State='LA' THEN DELETE;
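    To see a subsetting IF in context, here is a minimal sketch of a complete DATA step. The data set names Customers and LAonly are hypothetical; State is the variable from the example above.

```sas
data LAonly;             /* hypothetical new data set */
   set Customers;        /* hypothetical existing data set */
   if State='LA';        /* keep only the Louisiana observations */
run;
```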

    More detailed information on IF can be found starting on page 45.

    Concatenating SAS data sets

    Data sets can be concatenated, which means combining them one after the other into a single data set. In this course, you will probably concatenate data sets with the same variables, but SAS allows you to combine data sets with different variables, as well.

    This example uses three data sets (one for each section, shown to the right) which are combined into a single data set. The DATA statement names the new data set to be created and the SET statement includes the names of the existing data sets to be combined. The new data set contains all the observations from the first data set in the SET statement, then all the observations from the second data set, and so on.

    data AllSecs;
       set Sec1 Sec2 Sec3;
    run;

    The resulting data set looks like this:

    Merging SAS data sets

    SAS data sets can be merged by values of one (or more) common variables. In this example, one data set contains information about students and the other contains information about sections. The common variable is Sec. When you merge data sets by a variable (or variables), the data sets must be in order by that variable (or variables). If the data are already in the proper order, you do not need to sort them. However, if they are not in the correct order, the data set must be sorted.

    In this example, the permanent SAS data set classgrades4 is not in order by Sec so it must be sorted. Because it is not good practice to sort a permanent SAS data set over itself, the data are sorted to a new temporary SAS data set SecOrder.

    libname perm 'c:\EXST700x';

    proc sort data=perm.classgrades4 out=SecOrder;
       by Sec;
    run;

    The sorted data set is shown below, as are the properties. Note that although the column heading is Section, the actual column (variable) name is Sec. That's because the variables in this data set have labels.

    The data set to be merged with SecOrder is Sections. This data set also contains the variable Sec. Because this data set is already in order by Sec, it does not have to be sorted before it can be merged with SecOrder.

    To merge the two data sets, there is a DATA statement which creates the new data set, a MERGE statement which names the data sets to be used to create the new data set, and a BY statement that lists the variable(s) whose values are to be compared.

    Code                            Explanation
    data SecMerged;                 create new temporary SAS data set SecMerged
       merge SecOrder Sections;     by merging SecOrder and Sections
       by Sec;                      using the values of the variable Sec
    run;

    Miscellaneous Topics

    This section covers miscellaneous topics that don't fit anywhere else. Most apply to the DATA step.

    Missing values

    SAS represents missing numeric values as . (a period) and missing character values as a space. Do not use ' ' to indicate a missing value for a numeric variable. Yes, SAS lets you get away with it by converting from character to numeric (or numeric to character) automatically and prints a message in the log (e.g., NOTE: Character values have been converted to numeric values at the places given by:), but it is sloppy coding.

    Variable type:    Correct:              Incorrect:
    Numeric           IF Exam1 = .;         IF Exam1 = ' ';
    Character         IF Course = ' ';      IF Course = .;
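    A short sketch of these comparisons in a DATA step follows. The data set names are hypothetical; Exam1 is assumed to be numeric and Course to be character, as in the table above.

```sas
data complete;                      /* hypothetical new data set */
   set grades;                      /* hypothetical existing data set */
   if Exam1 = . then delete;        /* numeric: compare to a period */
   if Course = ' ' then delete;     /* character: compare to a quoted blank */
run;
```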

    Specifying character vs. numeric values

    Character values

    When you specify the value of a character variable, whether in an assignment statement, WHERE, IF, or any other statement, the value is enclosed in quotation marks (either single or double). Examples:

    WHERE Animal='horse';
    IF Model="Murano";
    IF Model='Murano' THEN Make='Nissan';
    IF Section='1' THEN ClassTime='MW 1:30';
    Color='blue';

    (Note: Section was read as a character variable.)

    Numeric values

    When you specify the value of a numeric variable, the value is not enclosed in quotation marks:

    WHERE Year=2005;
    IF Age =65 THEN Deductions=2;
    IF Age >=50 THEN AARP='Eligible';
    Seats=25;

    When exact spelling and capitalization matter

    SAS is very forgiving about the order of statements, indentation of statements, and capitalization of variable names and data set names. Unlike other popular computer software, it is not at all forgiving about capitalization of the values of character variables (i.e., the actual data values). You must spell and capitalize variable values exactly the way they're stored in the SAS data set.

    If the value for the variable Species is OYSTER, EASTERN then you must spell and capitalize it exactly that way (all caps, exactly one space after the comma).

    IF Species='oyster';              (not spelled correctly)
    IF Species='oyster, eastern';     (not capitalized)
    IF Species='oysters, eastern';    (s at end of oysters)
    IF Species='OYSTER,EASTERN';      (no space after comma)
    IF Species='OYSTER,  EASTERN';    (two spaces after comma)

    will all be interpreted as false because the value is OYSTER, EASTERN (all caps, one space after the comma). Computers are stupid. They can't make judgments like humans do, and you wouldn't want them to. You want them to do exactly what you tell them to do, although sometimes what you actually tell them to do isn't what you thought you told them to do.
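    If you are not sure how values were capitalized when the data were entered, one common workaround is to compare an uppercased copy of the value. This sketch uses the UPCASE function, which is not otherwise covered in this section; the data set names are hypothetical. Note that it only helps with capitalization, not with spelling or spacing.

```sas
data oysters;                                /* hypothetical new data set */
   set catch;                                /* hypothetical existing data set */
   if upcase(Species)='OYSTER, EASTERN';     /* also matches oyster, eastern and Oyster, Eastern */
run;
```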

    WHERE

    The WHERE statement can be used in both DATA and PROC steps to control which observations are processed.

    DATA step example:

    The following code creates a new data set Section1 that contains only observations for students in section 1.

    libname perm 'c:\EXST700x';

    data Section1;
       set perm.classgrades4;
       where Sec='1';
    run;

    PROC step example:

    You do not have to create a new data set to run a PROC on selected data. You can also use a WHERE statement within a PROC step. The following example prints the observations only for students in section 2.

    libname perm 'c:\EXST700x';

    proc print data=perm.classgrades4;
       where Sec='2';
       title 'Students in Section 2';
    run;

    WHERE operators

    Symbol            Mnemonic          Definition                 Example

    Comparison operators

    =                 EQ                equal to                   where Sec = '3';
                                                                   where Exam1 EQ 90;
    ^= or ~= or ¬=    NE                not equal to               where Sec NE '1';
                                                                   where Exam1 ^= 90;
    >                 GT                greater than               where Name > 'G';
                                                                   where Exam1 GT 90;
    <                 LT                less than                  where Name LT 'G';
                                                                   where Exam1 < 90;
    >=                GE                greater than or equal to   where Name >= 'G';
                                                                   where Exam1 GE 90;
                      IN                equal to one of a list     where Year in('3','4');
                                                                   where College in('AGRI' 'A&S');

    Arithmetic operators

    **                                  exponentiation             where (Exam1 + LabAvg) / 2 < 70;
    *                                   multiplication             where (Exam1 + LabAvg) / 2 >= 90;
    /                                   division
    +                                   addition

    Logical operators

    &                 AND               logical and                where College = 'AGRI' and Year = '4';
    |                 OR                logical or                 where College = 'AGRI' or College = 'A&S';
    ^ or ~ or ¬       NOT               logical not                where College not in('AGRI' 'A&S');

    Selected special WHERE operators

                      BETWEEN x AND y   inclusive range            where Name between 'A' and 'J';
                                                                   where LabAvg between 95 and 100;
                      IS NULL           missing value              where LabAvg is null;
                      IS MISSING        missing value              where College is missing;
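    Operators can be combined in a single WHERE statement. The following sketch assumes that perm.classgrades4 contains the character variable Sec and the numeric variable Exam1, as in the examples in the table; check PROC CONTENTS for the actual variable names in your copy of the data.

```sas
libname perm 'c:\EXST700x';

proc print data=perm.classgrades4;
   /* students in section 1 or 2 with an Exam1 grade from 90 through 100 */
   where Sec in ('1','2') and Exam1 between 90 and 100;
   title 'Sections 1 and 2: Exam1 from 90 to 100';
run;
```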

    The mysteries of IF/THEN/ELSE/DO

    The subsetting IF

    In a DATA step, you can use a subsetting IF to keep only certain observations.

    IF State='LA';

    There's an implied THEN in the above statement: IF State='LA' THEN OUTPUT;

    It does the same thing as the statement IF State ne 'LA' THEN DELETE;

    Which is better? It probably doesn't matter, but if it's a case of wanting to keep only one thing and deleting the rest, it's probably better to use the subsetting IF. If you want to delete only the Louisiana records, then you'd probably want to use IF State ne 'LA'; rather than IF State='LA' THEN DELETE;

    IF/THEN

    Whenever you use an IF statement to do something other than subset, there must be a THEN. It's easy to understand why you may miss that because we are accustomed to using IF all the time in real life without including THEN: "If I finish my EXST 700x assignment this afternoon, I'm going to spend an hour reading a good book." The word THEN isn't in the statement, but we mean it: "IF I finish my EXST 700x assignment this afternoon, THEN I'm going to spend an hour reading a good book."

    One assignment statement

    If there is only one thing you need to do, like assign a value to one variable, use one statement. Whether you put it on one line or two is up to you. The course materials from SAS Institute show putting the entire statement on one line, but what you do is personal preference. If you do use two lines, indent the second one to make the code easier to read.

    Do not use a DO/END for one task. It works, but there's no reason to write three statements when one will do.

    Correct:

    IF Model='Murano' THEN Make='Nissan';

    or

    IF Model='Murano'
       THEN Make='Nissan';

    Incorrect:

    IF Model='Murano' THEN DO;
       Make='Nissan';
    END;

    One assignment statement but multiple choices

    You may need to assign values to one variable, but there are different values depending on the value of another variable. If only one statement can be true (the state can be TX or it can be LA or it can be MS, etc.), then use IF/ELSE IF/ELSE. If you know that there are more of some values than others, put the statements for the more common situations at the top and the least common situations at the end.

    The simplest set of statements is for two conditions, which calls for IF/ELSE. Gender can be either M or F and nothing else.

    IF Gender='M' THEN Dorm='Johnson';
    ELSE Dorm='Hester';

    IF State='LA' THEN Food='Spicy';
    ELSE Food='Bland';

    If there are more than two possibilities, each of which is mutually exclusive, use ELSE IF for all but the first and last choices:

    IF State='LA' THEN Food='Spicy';
    ELSE IF State='TX' THEN Food='Somewhat spicy';
    ELSE IF State='MS' THEN Food='Home cooking';
    ELSE Food='Bland';

    If you have three or more mutually exclusive possibilities, use IF/ELSE IF/ELSE. Minimize your code. The first option is IF. The last option is ELSE. Every option between the first and last is ELSE IF.

    Height (inches)    Cost multiplier
    less than 60       1.2
    60-71              1.4
    72 and above       1.6

    Correct:

    if height < 60 then cost = height * 1.2;
    else if height >= 60 and height < 72 then cost = height * 1.4;
    else if height >= 72 then cost = height * 1.6;
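    To try the height example yourself, the following sketch reads a few made-up heights with DATALINES and applies the cost multipliers from the table. The data set name and the data values are invented for illustration. The last branch is written as a plain ELSE, following the advice that the last option is ELSE.

```sas
data costs;
   input height;
   if height < 60 then cost = height * 1.2;
   else if height < 72 then cost = height * 1.4;   /* reached only when height >= 60 */
   else cost = height * 1.6;                       /* 72 and above */
   datalines;
58
60
71
72
;
run;

proc print data=costs;
run;
```

    Because each ELSE IF is evaluated only when the conditions before it were false, the second test does not need to repeat height >= 60.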

    Choices within choices

    You may have several subchoices within a main choice. In that case, use a main IF or ELSE IF or ELSE and a DO/END.

    IF State='LA' THEN DO;
       IF Region='south' THEN Food='Spicy';
       ELSE Food='Home cooking';
    END;
    ELSE IF State='TX' THEN Food='Somewhat spicy';
    ELSE IF State='MS' THEN Food='Home cooking';
    ELSE Food='Bland';

    Note that the first IF/DO/END could also be written:

    IF State='LA' AND Region='south' THEN Food='Spicy';
    IF State='LA' AND Region ne 'south' THEN Food='Home cooking';

    Both work. I'm not sure whether one method is better than the other, but I prefer the first set of code even though it's four lines compared to two for the second. Processing state and then region within state is more logical to me.

    Independent IFs

    There may be times when you assign several different variables using an IF statement. If an observation needs to be evaluated for each condition, you use IF with no ELSE or ELSE IF.

    Assignment statements

    IF State='LA' THEN Type='Southerner';
    IF Gender='M' THEN Dorm='Johnson';
    IF EyeColor='Blue' THEN Sunglasses='yes';

    In the code above, State, Gender, and EyeColor have nothing to do with each other so each IF statement needs to be evaluated for each observation.

    Subsetting

    If you need to delete observations that have missing values for any of several variables, you may want to use several IF/THEN DELETE statements. The example below deletes all observations that have missing values for height, weight, age, and/or group. That is, any observation that has a missing value for at least one of the variables will be deleted. Your data set will include only observations that have values for all four variables:

    IF height = . THEN DELETE;
    IF weight = . THEN DELETE;
    IF age = . THEN DELETE;
    IF group = ' ' THEN DELETE;

    The single statement below produces the same data set, but it's probably easier for most people to write, debug, and understand the four separate statements above than the single statement below:

    IF height NE . AND weight NE . AND age NE . AND group NE ' ';
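    Another way to write the same subset, sketched here only as an alternative, is the CMISS function, which counts the missing values among its arguments and accepts both numeric and character variables. This assumes a release of SAS that includes CMISS (9.2 or later); the data set names are hypothetical.

```sas
data complete;                                  /* hypothetical new data set */
   set measurements;                            /* hypothetical existing data set */
   if cmiss(height, weight, age, group) = 0;    /* keep observations with no missing values */
run;
```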

    Multiple choices

    You may want to use the same assignment statement for several values of a variable. You must include the variable name and value each time. You can't just list the variable name once with a series of values.

    Incorrect:    IF Model='Murano' OR 'Altima' THEN Make='Nissan';

    Correct:      IF Model='Murano' OR Model='Altima' THEN Make='Nissan';
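    When the list of values is long, the IN operator (listed in the WHERE operators table) is a shorter way to write the same test, and it can be used in an IF statement as well. A brief sketch, with hypothetical data set names:

```sas
data cars2;                          /* hypothetical new data set */
   set cars;                         /* hypothetical existing data set with the variable Model */
   length Make $ 10;
   /* equivalent to Model='Murano' OR Model='Altima' */
   if Model in ('Murano','Altima') then Make='Nissan';
run;
```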

    DROP/KEEP

    DROP and KEEP statements are used to reduce the number of variables in a data set. A DROP statement lists the variables to omit from a data set while a KEEP statement lists the variables to output to the data set.

    DROP list-of-variables;

    KEEP list-of-variables;

    DROP and KEEP statements are not executable, so it doesn't matter where in the DATA step they appear. Variables listed in the DROP statement are available for processing in the current DATA step; they just aren't output to the new data set.

    It doesn't matter which statement you use. You may choose to use whichever one requires a shorter list of variables (e.g., if you want to output ten of fifteen variables to a new data set, using DROP is shorter than using KEEP, but if you want to output three of twelve variables, using KEEP is shorter than using DROP) or you may want to use KEEP so that you explicitly specify the variables in the new data set.

    Example 1: dropping variables that must be read

    Sometimes you have raw data that contain variables you don't need, but you have to read the unneeded variables as well as the variables you do need. The following example shows a tab-delimited file that contains values for eight variables. Note that only the values for the first variable (ID) line up; the rest do not. All records include the same data (ID, name, course and section, date, time, number of questions, number of questions answered correctly, and score). In this case, you're interested in just ID, name, and score. However, in order to read the variables you do need, you also have to read the ones that you don't.

    The log below shows the DATA step that reads all eight variables from the tab-delimited raw data file (note the dlm='09'x option in the INFILE statement) then keeps only the three variables that are needed. This version of the INPUT statement lists one variable per line. (Note: This program demonstrates INPUT statement techniques that are beyond the scope of this course.)
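    A DATA step of this kind might look roughly like the sketch below. The file path, the data set name, the variable names, and the informat widths are all assumptions made for illustration; they are not taken from the original program. The date and time are read as character values here simply to keep the sketch short.

```sas
data scores;                                                /* hypothetical data set name */
   infile 'c:\EXST700x\quiz.txt' dlm='09'x dsd truncover;   /* hypothetical tab-delimited file */
   input ID           :$10.
         Name         :$40.
         CourseSec    :$20.
         QuizDate     :$10.
         QuizTime     :$10.
         NumQuestions
         NumCorrect
         Score;
   keep ID Name Score;      /* all eight variables are read; only three are output */
run;
```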

    Example 2: dropping temporary variables

    Sometimes you need to create new variables from existing variables. You can usually create one long assignment statement, but it's often easier to use several shorter assignment statements to create intermediate variables. However, you don't need the extra variables in the output data set.

    This example shows the process of converting character IDs, which consist of two letters and two digits (e.g., BH37), to a six-digit numeric ID (667237). The techniques shown in this program are beyond the scope of this course, but it demonstrates dropping variables that are used during the DATA step but which do not belong in the output data set. The actual problem was much more complicated than this example shows, but this is representative. In this case, the original character ID damID is dropped, as well as the temporary variables used to create the new numeric ID newdamID.

    libname franke 'c:\consulting\franke';

    data numericIDs;
       set franke.monkeys;
       char1 = rank(substr(damID,1,1));    /* numeric code of the first letter (B is 66) */
       char2 = rank(substr(damID,2,1));    /* numeric code of the second letter (H is 72) */
       digits = substr(damID,3,2);         /* the two digits, still character */
       newdamID = input(cats(char1,char2,digits), 6.);   /* join the pieces and convert to numeric */
       drop damID char1 char2 digits;
    run;

    NOTE: The following single statement could be substituted for the four assignment statements, but this single statement is more difficult to write, debug, and understand.

    newdamID = input(cats(rank(substr(damID,1,1)), rank(substr(damID,2,1)), substr(damID,3,2)), 6.);

    TITLE/FOOTNOTE

    TITLE statements print at the top of each page of output; FOOTNOTE statements print at the bottom. The default title is The SAS System, but there is no default footnote.

    TITLE and FOOTNOTE statements have the following form, where the title or footnote text is enclosed in either apostrophes (single quotes) or quotation marks:

    TITLEn 'title text';
    FOOTNOTEn "footnote text";

    where n is a number from 1 to 10. The numbers indicate which line the title (or footnote) appears on, so TITLE1 is the first line, TITLE4 is the fourth line, etc. You can omit titles with specific numbers (e.g., TITLE2 and TITLE3) to get blank lines between titles (or footnotes) in your output.

    There can be up to 10 TITLE statements (TITLE1 through TITLE10; TITLE is the same as TITLE1) and 10 FOOTNOTE statements. TITLE/FOOTNOTE statements remain in effect until they are changed (a new statement with the same number is submitted), canceled, or you exit from SAS.

    Submitting a TITLE (or FOOTNOTE) statement with a particular number (e.g., TITLE5) replaces the existing TITLE (or FOOTNOTE) with the same number and cancels all statements with higher numbers (e.g., TITLE6 and higher if the new statement is TITLE5).

    The null title or footnote statement (TITLE; or FOOTNOTE;) cancels all titles or footnotes.
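    The following sketch illustrates how titles and footnotes persist and are replaced across steps. The title and footnote text is made up for illustration, and the libref and data set are the ones used in earlier examples.

```sas
libname perm 'c:\EXST700x';

title1 'EXST 700x Class Grades';
title3 'Section 2';                 /* TITLE2 is skipped, leaving a blank line */
footnote 'Preliminary grades';

proc print data=perm.classgrades4;
   where Sec='2';
run;

title3 'Section 1';                 /* replaces TITLE3; TITLE1 and the footnote stay in effect */

proc print data=perm.classgrades4;
   where Sec='1';
run;

title;                              /* cancels all titles */
footnote;                           /* cancels all footnotes */
```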

    If you need to include an apostrophe in a title or footnote, use quotation marks around the title text. If you need to include quotation marks in the title/footnote text, use apostrophes (single quotes).

    title "Dr. Smith's Exam 1 Grades";

    produces the title Dr. Smith's Exam 1 Grades

    title 'Examples of "he said/she said" cases';

    produces the title Examples of "he said/she said" cases

    RENAME

    The RENAME statement is used in a DATA step to change the name of variables. It is not an executable statement, so it can go anywhere in the DATA step. The variable keeps its original name within the DATA step; it has the new name only in the new output data set.

    The following example shows a permanent SAS data set with short variable names for the three exam grades (E1, E2, and E3). The program creates a new variable ExamAvg that is the mean of the three exam grades and it renames the exam grade variables. Note that the original names are used in the assignment statement, not the new names. Also, one RENAME statement is used to rename several variables. You can list multiple variables on a single line in a RENAME statement, but it's easier to work with if each variable is listed on a separate line.
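    A program along these lines might look like the sketch below. The input data set name perm.exams and the new names Exam1, Exam2, and Exam3 are assumptions made for illustration; only E1, E2, E3, and ExamAvg come from the description above.

```sas
libname perm 'c:\EXST700x';

data grades;                         /* hypothetical new data set */
   set perm.exams;                   /* hypothetical data set containing E1, E2, and E3 */
   ExamAvg = mean(E1, E2, E3);       /* the original names are used within the DATA step */
   rename E1 = Exam1
          E2 = Exam2
          E3 = Exam3;
run;
```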

    Basic RENAME example

    Original data set with short variable names (first five observations):

    Log:

    New data set with renamed variables (first five observations):

    Excel data RENAME example

    When you import an Excel worksheet, the variables are imported with both variable names and labels. It's important to realize that because, if you use RENAME, the fact that the labels match the old variable/column names makes it look like RENAME did not work.

    Original imported data set. Note that there are both column names and labels, which are the same:

    First RENAME attempt log:

    In the screen shot of the new data set, it looks like the RENAME did not work. However, if you look at the column properties, you'll see that the column names have changed but the labels have not. VIEWTABLE uses labels rather than column (variable) names.

    RENAME and LABEL

    If you want to see the new variable names in VIEWTABLE view, you'll need to use a LABEL statement in addition to RENAME. You don't have to do this, though, as long as you are aware that the variable names and labels don't match.

    The screen shot of the data set and the properties show that the variable names and the labels now match.

    LABEL

    A LABEL statement can be used to assign longer, more descriptive information to variables. Labels can be up to 256 characters long (compared to 32 for variable names) and can contain spaces (variable names cannot).

  • Whether the label p