Working with Data in Windows

HRP223 – 2010
October 4th, 2010

Copyright © 1999-2010 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.


Sources of Data

• Toy data
  – For statistics classes, you may be able to type the data directly into a SAS code file in EG, as in TLSB for EG.

• Excel
  – For small amounts of HIPAA-safe data you can use Excel with validation.

• Text files with columns of numbers and text
  – Exports created by databases frequently provide a text file full of data and a program for loading it into SAS.

• SAS
  – Native SAS datasets created by somebody else.
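
As a minimal sketch of the toy-data case above, a small dataset can be typed straight into a code file with a DATA step. The dataset name (toy), the variable names, and the values are hypothetical and only illustrate the pattern.

```sas
/* Hypothetical toy data typed directly into the code file. */
data work.toy;
    /* id and score are numeric; the $ makes sex a character variable */
    input id sex $ score;
    datalines;
1 F 87
2 M 92
3 F 75
;
run;

/* List the rows to confirm they were read as intended. */
proc print data=work.toy;
run;
```

In EG this code would go in a program node (New > Program); the result lands in the temporary work library.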

Recognize File Types

• Windows adds a period and a suffix that is a couple of letters long to the names of files to indicate what program uses the file. By default, the suffix is hidden.

Follow these steps to show file extensions (suffixes) in Vista.

[Screenshot: five numbered steps; step 3 is labeled “Uncheck”.]

Show File Extensions (Suffixes) in XP

[Screenshot: five numbered steps; step 3 is labeled “Uncheck”.]

Types of Files

.pdf Adobe portable document format

.zip archives full of compressed data

.xls Excel prior to 2007

.xlsx Excel 2007 and later

.csv comma separated values (text which Excel likes)

.txt text files

.sas SAS code files

.egp Enterprise Guide projects

.sas7bdat SAS data files

.htm or .html web pages

SAS and EG files

• .sas files are text files full of instructions that a programmer can write and/or edit.

• .egp files are not.

Searching

• Because the contents of .egp files are incomprehensible (without special tools) you will have trouble searching for things inside of projects.

• This affects me when I can’t remember the name of a project and, to find it, I want to search for key words in the code (like the principal investigator’s name or the name of the source data file).
  – I cannot find a tool to search the contents of all the .egp files on my hard drive.

Files in Enterprise Guide

• You can (and should) save SAS code files outside of the EG project to make it easy to search.

• Most people create EG projects that reference data files that live outside of EG.
  – SAS datasets
  – Excel files
  – Text files full of data

[Screenshot callouts: “Converted to SAS format”; “Native Excel format”]

Shortcuts

• Windows indicates a “shortcut” to a file that lives elsewhere with an arrow in the bottom left corner of an icon.

• EG uses the same symbol to denote a shortcut to a file outside of the project.

What is in an EGP file?

• An EG project file (.egp) contains information and instructions, but it will also have links to many external files.

[Screenshot callouts: “Shortcut to a file NOT in the project.” (shown twice); “This is part of the project.”]

EG and Code

• You can write and store your “code” instructions to SAS inside of the EG project, or you can create a shortcut to a code file which lives outside of EG.

Right click and choose New > Program.

Look at the process flow.

No shortcut icon.

External SAS files

• You can easily save a code file outside of the project by choosing Save Program As… from the File menu, or by clicking Save or Save As… on the program tab (when the code is open).

Shortcut

Where are SAS Data Sets Stored?

• While SAS can refer to files using their Windows path, it is easier to type a short name instead of a long path.

• SAS calls the short names “libraries”.

• EG automatically knows about a couple of places where data can be stored.
  – It creates a temporary work folder whenever EG starts.
  – It creates a permanent sasuser folder when EG is installed.

• The locations for data are called libraries.
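
A rough sketch of how library names show up in code: a dataset is addressed as library.dataset, so the short name stands in for the folder path. The example uses sashelp.class, a sample table that ships with SAS; the output name class_copy is hypothetical.

```sas
/* Write the Windows folders behind two libraries to the log. */
%put NOTE: work is stored in %sysfunc(pathname(work));
%put NOTE: sasuser is stored in %sysfunc(pathname(sasuser));

/* Copy a sample table into the temporary work library.      */
/* work.class_copy disappears when the SAS session ends.     */
data work.class_copy;
    set sashelp.class;
run;
```

The folder shown for work will differ from session to session, because a new temporary folder is created each time.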

Libraries

• By default the data goes into the sasuser library. This is a very bad idea.

• You will end up with every file in one folder.

• Anybody using SAS can access that folder, so there are significant HIPAA issues.

• Right click on a file and pick Properties to see where it is stored.

Libraries

• You can see the contents of libraries by going to the Server List window and opening the local libraries “file drawer.”

If you previously closed the window use the View menu to select Server List.

Double click the dataset to browse it.

Change the Default File Location

• On every machine you use, you should change the default file location to the work library. Do this once per machine.

Click 1st

Click 2x

Permanent Store

• I suggest that you save your data into the temporary work library by default.

• If you have a huge file which you only want to import once, or if you want to keep a permanent copy of a SAS data file, you will want to set up a permanent library.
  – This is just a fancy way of specifying what folder SAS should use to save the .sas7bdat data files.
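
In code, a permanent library of this kind can be sketched with a LIBNAME statement. The library name hrp223 and the folder path are assumptions for illustration; the folder has to exist before the statement runs.

```sas
/* Hypothetical library name and path; the folder must already exist. */
libname hrp223 "C:\Projects\HRP223\data";

/* The LIBNAME statement only points at the folder. Data is written   */
/* there only when the library is named explicitly, as it is here.    */
data hrp223.class;
    set sashelp.class;   /* sample table that ships with SAS */
run;
```

After this runs, the folder should contain a file named class.sas7bdat, which remains after the session ends.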

Loading Data The Easy Way

• First fix the problematic registry entries that are described in the instructions on installing SAS.

www.stanford.edu/class/hrp223/2010/SAS92TS2M3.pptx

• If a column in Excel contains a mixture of character and numeric values, programs reading the data (including SAS) can drop the cells that have character data without warning.


Importing the Easy Way

• The most bulletproof way to import data with EG 4.2 is to use the import wizard.

Always check this on.

Double check that it guesses the right Type, especially for dates.

Tell SAS that there is a folder which can hold data by creating a library. This only makes it aware of the folder. It does not automatically put stuff in the folder.

It’s just a folder!

• When the library is created it is just a pointer to a preexisting folder. That folder can contain anything.

• When you want to use the folder you need to explicitly tell EG to store data in the folder.

• First rename the node and draw an arrow to indicate where the library is used. These changes are only aesthetic.

Now it looks good but the import is still into work.

1st rename the node to match the library name

2nd add a line to the flowchart connecting the library to the import. It just looks good.

Find your library here.

Notice it is in the library.

A “design feature” is that you have to Refresh the library to see the freshly added file.

Playing with Data

• Once the data is imported you can add code “nodes” to the flowchart or use the graphical user interface to tweak the data and do analyses.

Complex changes

Quick and easy subset and sorting

It gives you more options as you add in sort variables.

SQL is built behind the scenes.

Note the awful new name.
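
The kind of query EG builds for a subset-and-sort task might look roughly like the sketch below. It is not the code EG generates; it uses the sashelp.class sample table, and the output name tall_students and the height cutoff are hypothetical.

```sas
proc sql;
    /* Choosing the output name here avoids an auto-generated one. */
    create table work.tall_students as
    select t1.name, t1.sex, t1.height
    from sashelp.class t1
    where t1.height > 60                 /* subset the rows        */
    order by t1.sex, t1.height desc;     /* sort by two variables  */
quit;
```

The t1 alias mirrors the table alias that appears in the expression on the next slide.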

Convert to a 4 digit number with the input function:

input( t1.score , 4. )
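
To see that expression in context, here is a small sketch that assumes score was imported as character text. The table names (raw, scores), the new column name score_num, and the values are hypothetical.

```sas
/* Hypothetical data: score is stored as character text. */
data work.raw;
    input id score $;
    datalines;
1 0087
2 0092
3 0075
;
run;

proc sql;
    create table work.scores as
    select t1.id,
           /* read up to 4 characters of text as a number */
           input(t1.score, 4.) as score_num
    from work.raw t1;
quit;
```

The original character column is left untouched; score_num is a new numeric column that can be used in calculations and summary statistics.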

Context sensitive menus help you describe the data you are browsing.

Before After

Descriptive Statistics

drag