
1

Introduction to SAS

Available at http://brac.umd.edu/~raulcruz

2

What is SAS?

SAS = “Statistical Analysis System”, developed for both data manipulation and data analysis (SAS Institute was founded in 1976)

Visit the SAS website: http://www.sas.com

3

Basics of SAS: 5 Windows

EDITOR – file where you write code and comments for execution by SAS (save as .sas)

LOG – file where notes about the execution of the program are written, as well as errors (save as .log)

OUTPUT – file where results from the program are written (save as .lst)

Explorer Window

Results Window

4

The SAS interface consists of multiple windows designed for specific functions.

The following windows are open by default:

Enhanced Editor window: Type SAS programs here. The "enhanced" editor has more advanced features than the traditional "program editor" used in SAS 6.12.

Output window: View the results of SAS procedures, including tables and line charts. Graphs will be displayed in a separate Graph window.

Log window: View SAS programs as they execute, including error messages and warnings.

Explorer window: Browse your SAS tables (datasets) and libraries. Create new files and file shortcuts.

Results window: Displays a hierarchical outline of SAS results to simplify output navigation.

5

SAS Menus
• File: file input/output
• Edit: editing contents in every window (contents in the LOG and OUTPUT windows cannot be edited, but they can be deleted)
• View: view programs, log files, outputs, and data sets
• Tools: editors for graphics, reports, tables, etc.
• Solutions: analysis without writing code
• Window: navigating among windows
• Help: help information for SAS

6

SAS toolbar

The toolbar gives you quick access to commands that are already accessible through the pull down menus

Not all operating environments have a toolbar

7

SAS command bar

The command bar is a place where you can type SAS commands.

Most commands you can type in the command bar are accessible through the SAS menus or the toolbar

8

Controlling your windows

• Use the Window pull-down menu
• Type the name of the window in the command bar
• Click on the window

9

Basic Rules of SAS Code
• Every SAS statement ends with a semicolon (;)
• Lines of data are NOT separated by semicolons
• SAS statements can extend over multiple lines, provided you do not split a word of the statement across lines
• More than one statement can appear on a single line
• You can start a statement anywhere within a line (not recommended)
• SAS is case insensitive
• Words in a SAS statement are separated by blanks
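The rules above can be seen in a small sketch. The data set name rules_demo and its values are made up for illustration only.

```sas
/* One INPUT statement spread over two lines (no word is split) */
data rules_demo;
  input name $
        age;
  cards;
Ann 31
Bob 45
;
run;

/* Two statements on one line; mixed-case keywords work the same way */
PROC print DATA=rules_demo; Run;
```

The two data lines carry no semicolons; the single semicolon on its own line marks the end of the data.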

10

SAS Steps
Two main types of SAS steps:
• DATA step: reads in data, manipulates datasets, etc.
• PROC step: performs statistical analyses, etc.
DATA and PROC steps execute when
• a RUN, QUIT, or CARDS statement is entered
• another DATA or PROC statement is entered
• the ENDSAS statement is entered

11

SAS Comments
Two ways to comment:

/* …..comments…..*/
• good for long documentation
• good for commenting out sections of code

*……comments……;
• good for commenting out one line of code
• only commented until the first ‘;’

Comments are shown in green in the editor (SAS step keywords are blue)
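A minimal sketch of both comment styles in one step. The data set comment_demo and its variables are hypothetical.

```sas
/* Block comment: can span several lines
   and may contain semicolons; like this one */
data comment_demo;
  x = 1;
  * statement comment, it ends at the first semicolon;
  /* y = 2; */      /* a whole statement commented out */
  z = x + 1;
run;
```

Because the asterisk style stops at the first semicolon, the block style is the safer choice for disabling code that itself contains semicolons.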

Example 1

/* Data instructor contains information of several teachers */
data instructor;
  input name $ gender $ age;
  cards;
Jane F 30
Mary F 29
Mike M 28
;
run;

proc means;
  var age;
run;

SAS Dataset
Basic structure: a rectangular matrix

               Name   Sex   Age
Observation 1  Jane   F     30
Observation 2  Mary   F     29
Observation 3  Mike   M     28

Columns are variables; rows are observations.

14

SAS data types

(1) Numeric data: numbers
• Can be added and subtracted
• Can have decimal places
• Can be positive or negative

(2) Character data: contains letters, numerals, or special characters
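As an illustration of the two types, here is a short sketch in which the data set types_demo and its values are invented for the example. The $ after a variable name on the INPUT statement marks it as character; variables without it are numeric.

```sas
data types_demo;
  input id $ weight;     /* id is character, weight is numeric */
  half = weight / 2;     /* arithmetic is allowed on numeric variables */
  cards;
A01 61.5
B02 -3
;
run;

/* The Type column of the listing shows Char or Num for each variable */
proc contents data=types_demo;
run;
```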

SAS Dataset and variable names

Dataset name:
• Starts with a letter (A-Z) or the underscore character (_)
• Can contain only letters, numbers, or underscores
• Can contain upper- and lowercase letters
• Choose names that are easy to remember
• Can be longer than 8 characters in SAS 8.0+

Variable name: same rules as for dataset names

16

Examples: valid SAS names

Parts LastName First_Name _Null1_ X12 X1Y1

17

Examples: invalid SAS names

3Parts (starts with a digit)
Last Name (contains a blank)
First-Name (contains a hyphen)
_Null1$ (contains $)
Num% (contains %)

18

Submitting a program in SAS

First, get your program into the editor

• Type your program in the editor
• Open an existing SAS program: use Open from the File pull-down menu, use the Open icon, or just click your SAS program directly

19

Submitting a program in SAS

Make your editor window active, and submit your code in one of these ways:
• Click the Submit icon
• Enter submit in the command bar
• Select “Submit” from the Run pull-down menu

20

Submitting a program in SAS

Reading the SAS log window
• It starts with notes about the version of SAS and your SAS site number
• The original SAS code follows, with line numbers added on the left
• Notes contain information about the SAS data set and the computer resources used

21

Assessing errors in the .log file
• Non-error SAS messages begin with NOTE:
• SAS error messages begin with ERROR: or possibly WARNING:
• When creating a data set, NOTEs are important to read because they indicate whether the data set was created correctly. Many times there are no errors, yet the data set is not correct.
• ERROR messages sometimes give you hints about options or keywords in DATA/PROC steps

22

The output window

Viewing results from the output window

You can save and print contents in the output window

When you have a lot of output, one easy way to find the specific output is to use the list in the “results” window

23

Creating HTML output
• Tools > Options > Preferences
• Click on the “Results” tab
• Click the box next to “Create HTML”
• Once turned on, results will be shown in the “Results Viewer” window
• The Results Viewer window shows just one piece of output at a time
• To turn it off, just uncheck the box

24

SAS Data Libraries
• A SAS library is simply a location where SAS data sets are stored
• In the Explorer window, click on “Libraries”; there are at least three libraries: Sashelp, Sasuser, and Work
• Sashelp and Sasuser contain information that controls your SAS session
• Work is the default library; it is a temporary storage location for SAS data sets
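The same information can also be requested from code instead of the Explorer window. This is a small sketch, not part of the original slides; both statements write their listings to the LOG window.

```sas
/* Write every currently assigned libref and its physical path to the log */
libname _all_ list;

/* List the data sets currently stored in the temporary Work library */
proc datasets library=work;
run;
quit;
```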

25

Creating a new library

Make the “Active libraries” window active (click Explorer, then click libraries)

Choose “New” from the File menu or right click in the active libraries window and choose “New” from the pop-up menu

26

Creating a new library
• Type the name of the library in the box after “Name”. This name must be eight characters or fewer and contain only letters, numbers, and underscores.
• In the Path field, type the complete path to the folder or directory where you want to save your data (or use Browse).

27

Creating a new library

Another way to create a new library is to use the LIBNAME statement to associate the library with a directory accessible from your computer.

LIBNAME mylib 'E:/';

associates the directory E:/ with the name mylib. Mylib is known as a libref (a library reference).

Temporary/permanent SAS datasets
• Every SAS dataset is stored in a SAS data library.
• By default, all data sets created during a SAS session are temporary data sets and are deleted when you close SAS.
• All data sets associated with the library WORK are deleted at the end of the SAS session (they are temporary).
• A permanent data set is a data set that will not be deleted when SAS is exited.
• To create a permanent data set, simply use a different library name when you create the data set.

To create Permanent SAS datasets

Code to create permanent SAS datasets

libname yourlib 'E:/';

data yourlib.instructor;
  input name $ sex $ age;
  cards;
Mike M 30
Wendy F 29
Jane F 28
;
run;

30

To access Permanent SAS datasets
• When you start a new SAS session, the permanent datasets can be accessed directly using a libref.
• The name of the libref can be different from the name you used when creating the permanent data set.

libname mylib 'E:\';

proc print data=mylib.instructor;
run;

31

Viewing SAS data with SAS Explorer
• Click the Libraries icon in the Explorer window
• Click the library you want to see
• Click the data set name to open the SAS data set
• To go back to the previous window within Explorer, choose “Up One Level” from the View menu, or click the Up One Level button on the toolbar

32

Listing the properties of a SAS data set

• Right-click the SAS data set icon
• Select “Properties” from the pop-up menu
• If you choose Columns, SAS displays information about the columns (or variables) in the data set.

PROC contents
• PROC contents prints descriptive information about the data set and the variables in the data set
  • Data set information: name, number of observations, number of variables, and date created
  • Variable information: name, internal order, type, length, format/informat, and label
• Very useful for getting a snapshot of a data set

Syntax:

proc contents data=data_set_name;
run;
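A runnable instance of the syntax above, using SASHELP.CLASS, a sample data set that ships with SAS, as a stand-in for your own data. The VARNUM option in the second step is an addition not mentioned on the slide.

```sas
/* Default listing: variables are shown in alphabetical order */
proc contents data=sashelp.class;
run;

/* VARNUM lists the variables in their internal (creation) order instead */
proc contents data=sashelp.class varnum;
run;
```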

TITLES
• Titles are descriptive headers SAS places at the top of each page of the OUTPUT window.
• A title is set with the TITLE statement followed by a string of characters.
• The string must be enclosed in single or double quotes. The maximum length for a string is 200 characters.
• If you want multiple-line titles, use the TITLE statement where the word title is followed by a number:

title1 'EPIB 698A';
title2 'week1';

• To clear the title setting, simply execute

title;

35

PROC print
The PRINT procedure prints the observations in a SAS data set to the output window.

Features:
• Autoformatting
• Columns labeled with variable names or labels
• Automatic accumulation and printing of subtotals and totals

Syntax:

proc print data=data_set_name options;
  var var1 var2 var3 var4;
run;

The order of the variables in the var statement sets the order of the printed columns.

36

PROC print (cont.)
The var statement
• The var statement is used to specify the variables to process in a proc step. It is not unique to proc print.
• Variables are usually processed in the order listed in the var statement.
• It applies only to the proc step in which it appears (it is local, not global).
• If no var statement is used, the procedure will generally process all the variables (or all the numeric variables if a calculation is performed).

37

PROC print (cont.)
Useful options with PROC print:
• double: double-spaces the output
• noobs: suppresses observation numbers
• label: uses variable labels as column headings

Additional statement for use in PROC print:
• sum variable_list; sums the listed variables at the bottom of the output
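The options and the SUM statement can be combined in one step. This sketch uses the SASHELP.CLASS sample data set that ships with SAS; the label texts are made up for the example.

```sas
proc print data=sashelp.class noobs label double;
  var name sex height weight;           /* columns print in this order */
  label height = 'Height'
        weight = 'Body weight';         /* used as headings because of LABEL */
  sum weight;                           /* total of weight at the bottom */
run;
```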

38

Import/Export Data

To export SAS datasets:
• Go to the File menu and select “Export Data”
• Choose the data file (from the library Work)
• Locate and select the file type using the Browse button
• Save the data set and finish
• Check the log to make sure the data set was created

This method does not require a data step, but any modification may require a data step. It is convenient for Excel files.

Importing data into a SAS data set follows similar steps.
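The slides describe the menu-driven route. For reference, a rough code counterpart uses PROC EXPORT and PROC IMPORT, which the slides do not cover. The file path, the CSV format, and the data set names below are assumptions chosen for illustration.

```sas
/* Write WORK.INSTRUCTOR to a CSV file (hypothetical path) */
proc export data=work.instructor
            outfile='E:/instructor.csv'
            dbms=csv
            replace;
run;

/* Read the same file back into a new data set */
proc import datafile='E:/instructor.csv'
            out=work.instructor2
            dbms=csv
            replace;
  getnames=yes;    /* first row holds the variable names */
run;
```

As with the menu route, check the log afterwards to confirm that the file or data set was created.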

39

Home gardener's data

DATA homegarden;

INFILE 'E:\Garden.txt';

INPUT Name $ 1-7 Tomato Zucchini Peas grapes;

group = 14;

Type = 'home';

Zucchini_1= Zucchini * 10;

Total=tomato + zucchini_1 + peas + grapes;

PerTom = (Tomato / Total) * 100;

Run;

40

Modifying a data set with the SET statement

The SET statement
• The SET statement in the data step allows you to read a SAS data set so that you can add new variables, create a subset, or modify the data set
• The SET statement brings a SAS data set, one observation at a time, into a data step for processing

Syntax:

data new-data-set;
  set data-set;
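A sketch of the two uses named above, creating a subset and adding a variable, using the SASHELP.CLASS sample data set that ships with SAS. The output name tall, the cutoff of 60, and the variable ratio are hypothetical.

```sas
data tall;
  set sashelp.class;         /* read the input one observation at a time */
  if height > 60;            /* subsetting IF: keep only these observations */
  ratio = weight / height;   /* new variable computed for each kept row */
run;

proc print data=tall;
run;
```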

41

data new;
  input x y;
  cards;
1 2
3 4
;
run;

data new1;
  set new;
  z = x + y;
run;

Modifying a data set with the SET statement

GPLOT
• The GPLOT procedure plots the values of two or more variables on a set of coordinate axes (X and Y).
• The procedure produces a variety of two-dimensional graphs, including
  • simple scatter plots
  • overlay plots, in which multiple sets of data points display on one set of axes

Procedure syntax: PROC GPLOT

PROC GPLOT;
  PLOT y*x </option(s)>;
run;

Example: plot of systolic blood pressure (SBP) by diastolic blood pressure (DBP)

title "Scatter Plot of SBP by DBP";
proc gplot data=d.clinic;
  plot SBP * DBP;
run;
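The clinic data set is not included with these slides, so a version that can be run as-is is sketched here with the SASHELP.CLASS sample data set. It assumes SAS/GRAPH is installed, since GPLOT is part of that product.

```sas
title 'Scatter Plot of Weight by Height';
proc gplot data=sashelp.class;
  plot weight*height;    /* y*x: weight on the vertical axis */
run;
quit;
```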

44

data exponential_survival_function;
  lambda1 = .2;
  lambda2 = .002;
  lambda3 = .1;
  x = 0;
  do x = 0.01 to 15 by .01;
    S_x_1 = exp(-lambda1*x);
    S_x_2 = exp(-lambda2*x);
    S_x_3 = exp(-lambda3*x);
    output;
  end;
run;

data linetext(drop=S_x_1 S_x_2 S_x_3 lambda1 lambda2 lambda3);
  /* Creating variables "function"='label', xsys='2', ysys='2', hsys='3',
     position='6' and size=1.9 (size of text) */
  length text $ 24;  /* added: long enough that no label is truncated */
  retain function 'label' xsys ysys '2' hsys '3' position '6' size 1.9;
  set exponential_survival_function end=last;
  style = "'Albany AMT/bold'";
  /* Setting the variable y equal to the last values for each S_x_lambda
     and the text corresponding to each lambda */
  if last then do;
    y = S_x_1 - .01; text = 'lambda=.2 ';  output;
    y = S_x_2 - .01; text = 'lambda=.002'; output;
    y = S_x_3 - .01; text = 'lambda=.1';   output;
  end;
  x = x - .5;
run;

/* Add a title to the graph */
title1 'Exponential Survival Function ';

/* Create axis definitions */
/* offset = extra space for the labels of the curves */
axis1 offset=(1,10) label=('X (time)');
axis2 label=('Survival Function');

/* Produce the plot */
proc gplot data=exponential_survival_function;
  plot (S_x_1 S_x_2 S_x_3)*x / overlay annotate=linetext
                               haxis=axis1 vaxis=axis2;
run;
quit;

Plots Exponential Survival Function

45

46

data weibull_survival_function;
  lambda1 = .2;   alpha1 = .5;
  lambda2 = .1;   alpha2 = 1.0;
  lambda3 = .002; alpha3 = 3.0;
  x = 0;
  do x = 0.01 to 15 by .01;
    S_x_1 = exp(-lambda1*x**alpha1);
    S_x_2 = exp(-lambda2*x**alpha2);
    S_x_3 = exp(-lambda3*x**alpha3);
    output;
  end;
run;

data linetext(drop=S_x_1 S_x_2 S_x_3 lambda1 lambda2 lambda3);
  /* Creating variables "function"='label', xsys='2', ysys='2', hsys='3',
     position='6' and size=2 (size of text) */
  length text $ 24;  /* added: long enough that no label is truncated */
  retain function 'label' xsys ysys '2' hsys '3' position '6' size 2;
  set weibull_survival_function end=last;
  style = "'Albany AMT/bold'";
  /* Setting the variable y equal to the last values for each S_x_lambda
     and the text corresponding to each lambda */
  if last then do;
    y = S_x_1; text = 'lambda=.2, alpha=.50 ';   output;
    y = S_x_2; text = 'lambda=.1, alpha=1.0 ';   output;
    y = S_x_3; text = 'lambda=.002, alpha=3.0 '; output;
  end;
run;

/* Add a title to the graph */
title1 'Weibull Survival Function ';

/* Create axis definitions */
/* offset = extra space for the labels of the curves */
axis1 offset=(1,20) label=('X (time)');
axis2 label=('Survival Function');

/* Produce the plot */
proc gplot data=weibull_survival_function;
  plot (S_x_1 S_x_2 S_x_3)*x / overlay annotate=linetext
                               haxis=axis1 vaxis=axis2;
run;
quit;

Plots Weibull Survival Function

47

48

Plots Exponential Hazard Function

The hazard function is constant when the survival time is exponentially distributed
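A short check of this statement, assuming the exponential survival function S(x) = exp(-λx) used in the data steps of these slides and the standard definition of the hazard as the negative derivative of ln S(x):

```latex
S(x) = e^{-\lambda x}
\quad\Longrightarrow\quad
h(x) = -\frac{d}{dx}\,\ln S(x) = -\frac{d}{dx}\,(-\lambda x) = \lambda
```

The hazard does not depend on x, so each exponential curve has a flat hazard at its own λ.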

49

data weibull_hazard_function;
  lambda1 = .2;   alpha1 = .5;
  lambda2 = .1;   alpha2 = 1.0;
  lambda3 = .002; alpha3 = 3.0;
  x = 0;
  do x = 0.01 to 15 by .01;
    h_x_1 = lambda1*alpha1*(x**(alpha1-1));
    h_x_2 = lambda2*alpha2*(x**(alpha2-1));
    h_x_3 = lambda3*alpha3*(x**(alpha3-1));
    output;
  end;
run;

data linetext(drop=h_x_1 h_x_2 h_x_3 lambda1 lambda2 lambda3);
  /* Creating variables "function"='label', xsys='2', ysys='2', hsys='3',
     position='6' and size=2 (size of text) */
  length text $ 24;  /* added: long enough that no label is truncated */
  retain function 'label' xsys ysys '2' hsys '3' position '6' size 2;
  set weibull_hazard_function end=last;
  style = "'Albany AMT/bold'";
  /* Setting the variable y equal to the last values for each h_x curve
     and the text corresponding to each lambda */
  if last then do;
    y = h_x_1; text = 'lambda=.2, alpha=.50 ';   output;
    y = h_x_2; text = 'lambda=.1, alpha=1.0 ';   output;
    y = h_x_3; text = 'lambda=.002, alpha=3.0 '; output;
  end;
run;

/* Add a title to the graph */
title1 'Weibull hazard Function ';

/* Create axis definitions */
/* offset = extra space for the labels of the curves */
axis1 offset=(1,20) label=('X (time)');
axis2 label=('Hazard Function');

/* Produce the plot */
proc gplot data=weibull_hazard_function;
  plot (h_x_1 h_x_2 h_x_3)*x / overlay annotate=linetext
                               haxis=axis1 vaxis=axis2;
run;
quit;

Plots Weibull Hazard Function
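The hazard formula coded in the Weibull hazard data step follows from the Weibull survival function S(x) = exp(-λx^α) used in these slides, again assuming the standard definition of the hazard as the negative derivative of ln S(x):

```latex
S(x) = e^{-\lambda x^{\alpha}}
\quad\Longrightarrow\quad
h(x) = -\frac{d}{dx}\,\ln S(x) = \lambda\,\alpha\,x^{\alpha-1}
```

With the three parameter sets plotted here, the hazard decreases for α = 0.5, is constant at λ = 0.1 for α = 1 (the exponential case), and increases for α = 3.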

50

51

Checking Exponential Distribution

data exponential_survival_function;
  set exponential_survival_function;
  log_S_x = log(S_x_1);
run;

/* Add a title to the graph */
title1 'Check for Exponential Distribution Function ';

/* Create axis definitions */
/* offset = extra space for the labels of the curves */
axis1 label=('X (time)');
axis2 label=('ln[S(x)]');

proc gplot data=exponential_survival_function;
  plot log_S_x*x / haxis=axis1 vaxis=axis2;
run;
quit;

Add this code to the program in slide 44
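Why this plot works as a check, sketched under the same model S(x) = exp(-λx) as the data step: taking logs turns the survival curve into a straight line through the origin.

```latex
\ln S(x) = -\lambda x
```

For S_x_1, where lambda1 = .2, the plotted line should therefore have slope -0.2. Visible curvature in such a plot would suggest that the survival times are not exponential.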

52

53

Checking Weibull Distribution

data weibull_survival_function;
  set weibull_survival_function;
  log_S_x = log(-1*(log(S_x_1)));
  log_x = log(x);
run;

/* Add a title to the graph */
title1 'Check for Weibull Distribution Function ';

/* Create axis definitions */
/* offset = extra space for the labels of the curves */
axis1 label=('log(X) (time)');
axis2 label=('ln[-ln[S(x)]]');

proc gplot data=weibull_survival_function;
  plot log_S_x*log_x / haxis=axis1 vaxis=axis2;
run;
quit;

Add this code to the program in slide 46
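The reasoning behind this plot, sketched for the Weibull survival function S(x) = exp(-λx^α) used in the data step: applying the log twice gives a straight line in ln x.

```latex
-\ln S(x) = \lambda x^{\alpha}
\quad\Longrightarrow\quad
\ln\!\left[-\ln S(x)\right] = \ln\lambda + \alpha \ln x
```

For S_x_1 (lambda1 = .2, alpha1 = .5) the line has slope 0.5 and intercept ln(0.2) ≈ -1.61, so the slope of the plot estimates α and the intercept estimates ln λ.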

54

55

Lab with SAS
• Regents Drive Garage (Building #202), Room 0504. The lab is open 24 hours, 7 days per week: http://www.oit.umd.edu/as/cl/

Securing SAS outside the classroom
• Labs (http://www.oit.umd.edu/as/cl/)
• Desktop version from departments
• Room 1304 SPH Building

Recommended