
Computing Tutorial

STAT 6510

Winter 2012

1. SAS Overview

After you open SAS, you will note that there are lots and lots of windows for the program:

Explorer: Keeps track of your libraries and files.

Results: Keeps track of your output.

Output: Displays output.

Log: Lets you know what SAS has done.

Editor: A word processor that you use to write programs.

Before we start working with data, we need to get a dataset into SAS. There are many ways of doing this. For this class, I will discuss two different ways:

Using the Table Editor to enter data by hand

Importing data from a text file (including .txt or .csv)

Once we have the data in hand, we will want to work with it. Although SAS has begun to support point-and-click type approaches to data analysis (using menus), it is better practice to store everything you do to the dataset in a program. That way, when you make a mistake (and you will), it is simple to change one small part of what you have programmed, and re-run the program. If you rely on point-and-click, you will have to point-and-click all over again. SAS does allow you to save the program commands that are executed when you point-and-click, which is a good way to learn program commands to include in your overall program. Programs primarily consist of three types of commands.

SAS Variable definitions (which we will use to specify folder locations)

Data steps are used to read data into SAS, manipulate data, and format data.

Procedure steps (abbreviated proc) are used to perform calculations and produce plots using the data.

Here is an example of a program. It assumes that SAS already knows about the dataset "Work.Diamond". This program copies the data into a new data set called "Work.Large", keeping only the size of the diamonds (carat), and sorts the diamonds by size. Finally, the program creates a histogram of the diamond sizes. We will look more carefully at each statement as we go through the basics of SAS programming.


/*Data step that loads the dataset "Work.Diamond" into a new file named

"Work.Large", and keeps only the carat variable;*/

DATA Work.Large;

set Work.Diamond;

keep carat;

RUN;

*Proc step that sorts the data according to carat;

PROC sort data=Work.Large;

by carat;

RUN;

*Proc step that makes a histogram using the data;

PROC univariate data=work.large;

histogram carat;

RUN;

2. Getting Data Into SAS

a. SAS Libraries

SAS stores all its datasets in folders called “Libraries”. By default, SAS will use the “Work” library, which is a temporary library. When you quit SAS, everything in the “Work” library is erased. You can create permanent libraries, which are simply pointers from SAS to folders located on your computer.

For example, I can create a new library called Stat651 for a folder on my desktop.

i. Create a folder on your computer for the SAS Library. I created a folder called 651Computing.


ii. Navigate to your current libraries in SAS. To do this, in SAS, double-click on the “Libraries” icon in

the “Explorer” window.

It will look something like this:


iii. Add your permanent library. Choose File → New, then enter the name for your library (here I used Stat651), which can be at most 8 characters long, and the path to the folder on your computer (probably by “browsing” to it). Click OK. (Note that the name in SAS does not need to match the folder name!)

iv. Check that this worked. Now the explorer lists a new library.

You need to tell SAS where your library is every time you start a new SAS session! An easier way to do this is to declare the library location in the first line of your SAS program:

LIBNAME Stat651 'C:\Documents and Settings\651Computing';
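Once the LIBNAME statement has run, one way to confirm that SAS can see the folder is to list what the library contains. The following is a minimal sketch, assuming the Stat651 library name and the folder path used in the example above:

```sas
/* Point the Stat651 library at the folder (path as in the example above) */
LIBNAME Stat651 'C:\Documents and Settings\651Computing';

/* List the datasets stored in the library.
   NODS suppresses the detailed description of each dataset. */
PROC CONTENTS data=Stat651._ALL_ NODS;
RUN;
```

The Log window also reports whether the library was assigned successfully.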

PRACTICE: Create a Library to store your data in.

b. Enter Data By Hand Using the Table Editor

i. Choose Tools → Table Editor


ii. Click on the heading letters to re-name the variables

iii. Click on the table cells to enter data

iv. Save by choosing File → Save

v. Navigate to the Library where you want to save your data, and enter the name you’d like to give

your data. Here I’m saving the data set as “Diamond” in the “Work” library.
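A small table can also be typed directly into a program instead of the Table Editor. This is only a sketch: the dataset name Work.Tiny and the three rows of values are made up for illustration and do not come from the course data.

```sas
/* Enter a few observations by hand inside a DATA step.
   The values below are invented purely for illustration. */
DATA Work.Tiny;
  input carat total_price;   /* two numeric variables, read in this order */
  datalines;
0.50 1500
1.10 6200
2.05 21000
;
RUN;
```

Because the data live in the program, correcting a typing mistake only requires editing the line and re-running the step.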


c. Importing Data From a Text File

i. Choose File → Import Data

ii. Choose the file type (in this case .txt)

iii. Browse for the file location on your computer


iv. Choose the LIBRARY and File Name that SAS will use to identify the data set. Since we have not

defined any libraries, we will save our data in WORK for now. I can give the data any name I like.

v. If you want to, you can have SAS write the commands to import this data again without having

to go through the point-and-click importing. If so, you need to give SAS a file name in which to

save these commands. Otherwise, just click “Finish”.

vi. You can use the Explorer to check that your data is saved where you think it is. In this case, I can choose Libraries → Work → Diamond. Double-clicking on the dataset will open the data in the Data Viewer.
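The import steps above can also be written as a program, which is roughly what SAS saves if you ask it to write the import commands. The sketch below makes several assumptions that are not stated in this tutorial: the file path is hypothetical, the file is assumed to be tab-delimited, and its first row is assumed to hold the variable names.

```sas
/* Import a text file into the Work library.
   ASSUMPTIONS: hypothetical file path, tab-delimited file,
   variable names in the first row. */
PROC IMPORT datafile='C:\Documents and Settings\651Computing\DiamondsSRS.txt'
    out=Work.Diamond     /* library and name for the new dataset */
    dbms=tab             /* tab-delimited text */
    replace;             /* overwrite Work.Diamond if it already exists */
  getnames=yes;          /* read variable names from the first row */
RUN;
```

For a comma-separated file, dbms=csv would take the place of dbms=tab.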

PRACTICE: Download the DiamondsSRS.txt dataset from the course website. Load it into SAS.

3. SAS Programming Basics

As I mentioned earlier, it is best practice to write out a program, rather than “point and click”.

Some things to keep in mind:

Always end each part of your command with a semicolon;

You can write comments to yourself by using:

/*this is a comment*/

Or you can comment an entire line by using:

*this is a comment;


SAS does not care about capitalization

Use the command:

RUN;

to tell SAS that you are finished giving it commands, and it should go ahead and do what you

told it to.

SAS has a built-in help menu. Use it as best you can.

Missing values are indicated by a ‘.’
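The points above can be seen together in one short program. This is a sketch only; the dataset name work.basics and the variables x and y are hypothetical.

```sas
/* A block comment: SAS ignores everything between the markers */
* A statement comment: it ends at the next semicolon;

data work.basics;        /* lowercase keywords work ... */
  x = 1;
  y = .;                 /* a period assigns a missing numeric value */
RUN;                     /* ... and so do uppercase ones */

PROC PRINT DATA=WORK.BASICS;   /* same dataset, different capitalization */
run;
```

The printed dataset has a single observation, with y displayed as a period.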

4. DATA Steps

Data steps are used to create new datasets out of existing datasets. If you use the same name as an existing dataset, SAS will overwrite your data.

DO NOT RUN:

/*These commands erase all the data stored in the Diamonds dataset in the

Work library, or rather, replaces the current data with an empty data set;*/

Data Work.Diamond;

Run;

Data names consist of two parts, separated by a period. The part before the period specifies the library.

The part after the period specifies the specific name of the data set. Thus, Work.Diamond refers to

the dataset “Diamond” in the library “Work”.

Since we will be building new datasets by changing existing data sets (for example to create a data set

that only contains the carat variable), we need to tell SAS where to get the old datasets. This is what the

SET command is for.

/*Data step that saves the dataset "Work.Diamond" into a new file named

"STAT651.Diamond";*/

DATA STAT651.Diamond; /*name of the new dataset*/

set Work.Diamond; /*name of the current dataset*/

RUN;

Of course, we sometimes like to change the dataset. We might like to create a new variable, or keep

only a subset of variables.

/*Data step that loads the dataset "Work.Diamond" into a new file named

"Work.Money", changes the dollars to cents, and drops the dollar

variable;*/

DATA Work.Money; /*name of the new dataset*/

set Work.Diamond; /*name of the current dataset*/

cents = total_price*100; /*create a new variable out of an older one*/

keep carat cents; /*keep only the specified variables*/

RUN;
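After a DATA step like this, it can help to look at a few rows to check that the new variable was computed as intended. A minimal sketch, assuming the Work.Money dataset created above:

```sas
/* Print the first 5 observations of the new dataset */
PROC PRINT data=Work.Money (obs=5);
RUN;
```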


In this class, it might be useful to only keep a certain kind of observation.

/*Keep only those diamonds of 2 carats or more*/

DATA Work.Large; /*name of the new dataset*/

set Work.Diamond; /*name of the current dataset*/

where carat>= 2; /*keep only the specified observations*/

RUN;
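The WHERE statement can only refer to variables that already exist in the dataset being read. To filter on a variable created in the same DATA step, a subsetting IF can be used instead. The sketch below assumes the total_price variable used earlier; the dataset name Work.Pricey and the cutoff are hypothetical.

```sas
/* Keep only observations whose computed value passes a test */
DATA Work.Pricey;
  set Work.Diamond;
  cents = total_price*100;   /* new variable, not present in Work.Diamond */
  if cents > 600000;         /* subsetting IF: keep rows above $6,000 */
RUN;
```

A subsetting IF is the same construct as the `if first.Rnum;` statement used later in the random number section.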

In addition to creating variables that are a simple function of other variables, we might like to categorize

the observations according to one or more variables. One way to do this is with if…then…else

statements:

DATA Work.MoMoney; /*name of the new dataset*/

set Work.Diamond; /*name of the current dataset*/

if carat < 2 then size = 0; /*categorize by carat*/

else if carat < 4 then size = 1;

else size = 2;

/*categorize by carat AND price*/

if (carat < 2 & Total_Price > 6000) then expensive = 1;

else expensive = 0;

RUN;
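The codes 0, 1 and 2 are compact but hard to read in output. One option is to attach labels with a format. This is a sketch under assumptions: the format name sizefmt and the label wording are made up, and the cutoffs simply mirror the if…then…else statements above.

```sas
/* Define labels for the size codes (hypothetical format name) */
PROC FORMAT;
  value sizefmt 0 = 'Under 2 carats'
                1 = '2 to under 4 carats'
                2 = '4 carats or more';
RUN;

/* Show the labels instead of the raw codes in a frequency table */
PROC FREQ data=Work.MoMoney;
  tables size;
  format size sizefmt.;
RUN;
```

Note that SAS treats a missing numeric value as smaller than any number, so a diamond with a missing carat would be coded as size 0 by the statements above.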


PRACTICE: Create a dataset that contains only the values of diamonds less than 1 carat.

PRACTICE: Create a dataset that contains an indicator of a diamond priced more than $10,000.

5. PROC Steps

Proc steps are all about manipulating data that has already been created. There are many reasons to do so, including exploratory data analysis and statistical analysis. But, regardless of their purpose, PROC steps have the same general anatomy:

On each line, you might also have options, which are indicated by a slash ‘/’.

As an example, this code sorts your dataset by the indicated variable:

proc sort data=Work.Diamond;

By Carat;

RUN;
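By default, PROC SORT sorts in ascending order and replaces the dataset it is given. A variation that leaves the original untouched might look like the following sketch; the output name Work.DiamondSorted is hypothetical.

```sas
/* Sort from largest to smallest carat and write the result to a new dataset */
PROC SORT data=Work.Diamond out=Work.DiamondSorted;
  by descending carat;   /* DESCENDING applies to the variable that follows it */
RUN;
```

Here the options (data= and out=) sit on the PROC line before the semicolon, and the statement inside the step says what to do with which variable.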

a. Exploratory Data Analysis

This section covers only a few commands. We will look at more as we need them.

i. Univariate numerical summaries

PROC UNIVARIATE data= Work.Diamond;

VAR carat;

RUN;
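PROC UNIVARIATE produces a long report. When only a handful of summaries is wanted, PROC MEANS is a shorter alternative; it is not covered in this tutorial, so treat the following as a sketch.

```sas
/* Request only selected summary statistics for carat */
PROC MEANS data=Work.Diamond n mean std min max;
  VAR carat;
RUN;
```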

ii. Histograms

PROC UNIVARIATE data=Work.Diamond NOPRINT;

/*The NOPRINT option suppresses the univariate summary*/

Histogram carat/ midpoints=0 to 6 by 0.25;

Title 'Histogram of Diamond Carats';

RUN;


iii. Tables. The following code would create 3 tables, one for the size of the diamond, one

for whether or not the diamond is expensive and a 2-way table of size vs. expensive:

PROC FREQ data=Work.MoMoney;

TABLES size expensive size*expensive;

RUN;
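The 2-way table prints counts together with overall, row and column percentages. Options after a slash on the TABLES statement can trim this down. A sketch, assuming the Work.MoMoney dataset created earlier:

```sas
/* 2-way table showing counts only */
PROC FREQ data=Work.MoMoney;
  TABLES size*expensive / norow nocol nopercent;
RUN;
```

This is an instance of the slash convention for statement options described at the start of this section.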

b. Statistical Analysis

i. Estimation of a mean from a survey

PROC SURVEYMEANS data=Work.Diamond N=1000;

Var carat;

RUN;
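In the call above, N=1000 gives the population size, which SAS uses for the finite population correction. The statistics to report can be listed on the PROC line. A sketch that asks for the mean, its standard error and confidence limits, keeping the same assumed population size of 1000:

```sas
/* Estimate mean carat with its standard error and confidence limits */
PROC SURVEYMEANS data=Work.Diamond N=1000 mean stderr clm;
  Var carat;
RUN;
```

The confidence limits are computed at the 95% level unless a different level is requested with the alpha= option.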

PRACTICE: Describe the distribution of the value of diamonds less than 1 carat.

PRACTICE: Describe the distribution of the indicator of a diamond priced more than $10,000.

PRACTICE: Use finite population estimation (survey estimation) to estimate the proportion of

diamonds priced more than $10,000. Include a 95% CI.

6. Random Number Generation

a. Add random numbers to an existing data set

Data Work.Diamond;

Set Work.Diamond;

Rnum = uniform(-1); /*The minus one tells SAS to choose a random seed.

If you choose a positive number, you will get the same random numbers each

time. (This can be helpful if you are debugging or want to get the same

sample again.)*/

RUN;
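For a sample that can be reproduced, the same step can be run with a positive seed. In this sketch the seed value 6510 is arbitrary, and the output is written to a hypothetical new dataset, Work.DiamondSeeded, so that Work.Diamond is not overwritten.

```sas
/* Add reproducible random numbers to a copy of the dataset */
Data Work.DiamondSeeded;
  Set Work.Diamond;
  Rnum = uniform(6510);   /* positive seed: same random numbers on every run */
RUN;
```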

b. Create a dataset that only contains random numbers

Data Work.randnum;

DO ID=1 to 100 by 1;

Rnum = ceil(uniform(-1) * 1000); /* a random integer between 1 and

1000. (ceil=ceiling)*/

Output;

END;

RUN;


/*Check to see if there are any duplicates, by writing any duplicates to a

new dataset*/

proc sort data=work.Randnum;

by Rnum;

RUN;

data work.dupobs;

set work.randnum;

by Rnum;

if ^first.Rnum;

run;

/*Create a dataset with only unique values*/

data work.unique;

set work.randnum;

by Rnum;

if first.Rnum;

run;
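The sort and the two DATA steps above can also be combined into a single PROC SORT call using the NODUPKEY and DUPOUT= options. This is an alternative sketch, not the approach used above; the output names work.unique2 and work.dupobs2 are hypothetical.

```sas
/* Keep the first observation for each Rnum value in work.unique2
   and write the removed duplicates to work.dupobs2 */
proc sort data=work.randnum out=work.unique2 nodupkey dupout=work.dupobs2;
  by Rnum;
run;
```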

PRACTICE: Create a dataset that is a random sample of 10 diamonds.

7. SURVEYSELECT

SAS has a procedure that will help you select random samples from populations. The simplest way to select is to take a simple random sample. The code below takes a simple random sample of size 10 from the dataset Work.Diamond:

proc surveyselect data=work.diamond

method=srs n=10 out = work.diamondsample

seed = 32348340 stats;

run;
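Instead of a fixed sample size, the sample can be specified as a fraction of the population with the samprate= option. A sketch under assumptions: the 10% rate and the output name work.diamondtenth are chosen only for illustration.

```sas
/* Simple random sample of 10% of the observations */
proc surveyselect data=work.diamond
  method=srs samprate=0.10 out=work.diamondtenth
  seed=32348340;
run;
```

Keeping the seed= option makes the selection reproducible, just as with a positive seed in the random number functions.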

PRACTICE: Create a dataset with unique IDs for all 24 students enrolled in this class. Select a simple

random sample of 5 students to be in a group.