Good Data Management Practices, Patty Glynn, http://staff.washington.edu/glynn/, 10/31/05, http://staff.washington.edu/glynn/GoodDataManagement.ppt


Page 1: Good Data Management Practices Patty Glynn  10/31/05

Good Data Management Practices

Patty Glynn

http://staff.washington.edu/glynn/

10/31/05

http://staff.washington.edu/glynn/GoodDataManagement.ppt

Page 2: Good Data Management Practices Patty Glynn  10/31/05

Four Statistical Packages

• SPSS

• Stata

• R

• SAS

Page 3: Good Data Management Practices Patty Glynn  10/31/05

Three Ways to Work

• Point and Click

• Command Line

• Programs (the best way)

Page 4: Good Data Management Practices Patty Glynn  10/31/05

Outline

• Sermon on SYNTAX

• Cleaning data and creating variables

• Never overwrite original data

• Practices that will help you keep track of your work

• Safeguarding your work

Page 5: Good Data Management Practices Patty Glynn  10/31/05

A Sermon on SYNTAX

• Command line and Point and Click

– Advantages:

• Quick, may require less learning

– Disadvantages:

• Takes longer the second time: you must wade through the point and click menu rather than just change a word

• You do not have a record of what you have done

Page 6: Good Data Management Practices Patty Glynn  10/31/05

SPSS

The King of Point and Click

Page 7: Good Data Management Practices Patty Glynn  10/31/05

You can point and click to get files, create variables, change variable values, and do analysis, and end up without a record of what you have done. You will be sorry.

Page 8: Good Data Management Practices Patty Glynn  10/31/05

Or, you can use Point and Click as an aid as you write programs. You can copy syntax created by Point and Click into your program.

In SPSS, programs are written in a Syntax Window and are saved with the .sps extension.

Page 9: Good Data Management Practices Patty Glynn  10/31/05

You can modify SPSS defaults so that commands will be reflected in the log. This allows you to copy commands from your log into your program file. These changes also make debugging easier.

Page 10: Good Data Management Practices Patty Glynn  10/31/05

You will find information about how to modify SPSS at the following URL:

http://staff.washington.edu/glynn/config_spss.pdf

Page 11: Good Data Management Practices Patty Glynn  10/31/05

Stata

Page 12: Good Data Management Practices Patty Glynn  10/31/05

You can point and click, issue commands on the command line, or create .do files. “.do” files can store your programs.

Page 13: Good Data Management Practices Patty Glynn  10/31/05

R

Page 14: Good Data Management Practices Patty Glynn  10/31/05

With R you can point and click, issue commands on the command line, or create .R files. “.R” files store your programs.

The commands generated by point and click are echoed, so you can copy them into your program.

Page 15: Good Data Management Practices Patty Glynn  10/31/05

SAS

Page 16: Good Data Management Practices Patty Glynn  10/31/05

SAS allows some point and click, but immediately offers an editor where you can write your programs. SAS programs end with the .sas extension, and are text files.

SAS features an enhanced editor with cool color coding that makes it easier to write and debug programs.

Page 17: Good Data Management Practices Patty Glynn  10/31/05

Outline

• Sermon on SYNTAX

• Cleaning data and creating variables

• Never overwrite original data

• Practices that will help you keep track of your work

• Safeguarding your work

Page 18: Good Data Management Practices Patty Glynn  10/31/05

Never clean data in the data view

Page 19: Good Data Management Practices Patty Glynn  10/31/05

Scenario 1:

You get a data set and find errors in it.

You change the values in the data window.

You save it with point and click, over-writing your original data.

Later you try to recall what changes you made, when and why. Of course you can’t. You can’t even be sure that you made the “corrections” for the proper cases.

You can’t look back at older data sets to confirm what you did. You sit there sweating.

Page 20: Good Data Management Practices Patty Glynn  10/31/05

Scenario 2 (starts the same as Scenario 1):

You save it with point and click, over-writing your original data and, while you are saving the file,

1) Your computer goes down because of a power outage, OR

2) There is a brief interruption in the network.

HALF OF YOUR DATA SET IS LOST.

You cry.

Page 21: Good Data Management Practices Patty Glynn  10/31/05

Scenario 3:

You get a data set and find errors in it.

You write a program that:

1) Gets the original data

2) Makes changes in values with SYNTAX

3) Includes comments about the changes

4) Saves the new file under a different name

Science marches forward.
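
The four steps in Scenario 3 can also be sketched outside the four packages. The following is a minimal Python illustration, not the author's program. The function name `clean_copy`, the CSV layout with an `id` column, and the shape of `corrections` are all assumptions made for the example. It reads the original file, applies corrections that each carry a note, and refuses to write over the original.

```python
import csv
from pathlib import Path


def clean_copy(original, cleaned, corrections):
    """Read `original`, apply `corrections`, and write the result to `cleaned`.

    `corrections` is a list of (id, column, new_value, note) tuples. The note
    records who made the change, when, and why (hypothetical structure).
    Returns a log of (id, column, old_value, new_value, note) tuples.
    """
    original, cleaned = Path(original), Path(cleaned)
    # Step 4 guard: the output must not be the original data file.
    if cleaned.resolve() == original.resolve():
        raise ValueError("refusing to overwrite the original data file")

    # Step 1: get the original data.
    with original.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Steps 2 and 3: change values in code and keep the note with each change.
    applied = []
    for case_id, column, new_value, note in corrections:
        for row in rows:
            if row["id"] == str(case_id):
                applied.append((case_id, column, row[column], str(new_value), note))
                row[column] = str(new_value)

    # Step 4: save the new file under a different name.
    with cleaned.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return applied
```

A call might look like `clean_copy("dirty.csv", "CleanNew.csv", [(1, "educ", 16, "PJG, checked survey form, 10/9/05")])`. Because the corrections live in the program, the program itself is the record of what was changed and why.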

Page 22: Good Data Management Practices Patty Glynn  10/31/05

Creating Variables and Recoding is not the same as Cleaning Data

• You always want clean data

• You may not always want the recoded or created variables

• Make new variables, but keep the old ones (don’t over-write). Use the originals to check the new ones.

Page 23: Good Data Management Practices Patty Glynn  10/31/05

Examples of Recoding/Creating

• Creating a series of dummies from a categorical variable

• Creating an index from a series of scale variables

• Creating a dichotomous or categorical variable from a continuous variable

• Always consider MISSING VALUES
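
The three kinds of recoding listed above, each with explicit missing-value handling, can be sketched as follows. This is an illustrative Python sketch rather than code from any of the four packages. The function names, the sample record, the choice of `None` as the missing marker, and the rule that an index is missing when any item is missing are all assumptions.

```python
MISSING = None  # assumption: missing values are represented as None


def dummies(value, categories):
    """One 0/1 indicator per category; all indicators missing if the source is."""
    if value is MISSING:
        return {f"is_{c}": MISSING for c in categories}
    return {f"is_{c}": int(value == c) for c in categories}


def index_mean(items):
    """Mean of a series of scale items; missing if any item is missing."""
    if any(v is MISSING for v in items):
        return MISSING
    return sum(items) / len(items)


def dichotomize(value, cutoff):
    """1 if value >= cutoff, 0 otherwise; a missing value stays missing."""
    if value is MISSING:
        return MISSING
    return int(value >= cutoff)


# Hypothetical record: q3 was not answered.
record = {"region": "west", "q1": 4, "q2": 5, "q3": None, "educ": 16}

new_vars = {
    **dummies(record["region"], ["east", "west"]),
    "scale_index": index_mean([record["q1"], record["q2"], record["q3"]]),
    "college": dichotomize(record["educ"], 16),
}

# New variables sit alongside the originals, which are left untouched
# so they can be used to check the new ones.
recoded = {**record, **new_vars}
```

Each function decides what to do with a missing input before it computes anything, so a missing answer never silently turns into a 0.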

Page 24: Good Data Management Practices Patty Glynn  10/31/05

Sample SPSS Program

* CleanNew.sps .
* 10/10/05 created dummy for male .
Get file = 'dirty.sav' .
* Cleaning data, PJG, looked at survey form, educ for ID=1 should be 16, 10/9/05 .
If (id = 1) educ = 16 .
* Create a dummy variable from "gender" .
If (gender = 'm') male = 1 .
If (gender = 'f') male = 0 .
If (gender = '') male = -9 .
Missing values male (-9) .
Variable labels male 'Male' .
Value labels male 1 'Male' 0 'Female' .
Save outfile = 'CleanNew.sav' / drop = gender .

Page 25: Good Data Management Practices Patty Glynn  10/31/05

Summary for Cleaning and Creating variables

• Use syntax (programs) to create and clean variables

• Document when and why in your programs

• Save new file – do not over-write the old

Page 26: Good Data Management Practices Patty Glynn  10/31/05

Outline

• Sermon on SYNTAX

• Cleaning data and creating variables

• Never overwrite original data

• Practices that will help you keep track of your work

• Safeguarding your work

Page 27: Good Data Management Practices Patty Glynn  10/31/05

It may be months between the time that you finish a paper, submit it, and get to revise it for publication.

Page 28: Good Data Management Practices Patty Glynn  10/31/05

What you will need to know:

• The origin of your variables:

– What is the source for each variable?

– How was each one created?

• What programs created your final tables?

• What program files created the file you used for your final tables?

Page 29: Good Data Management Practices Patty Glynn  10/31/05

Create a Directory for the Project

• For example, c:\MA_Thesis

• Store all of the programs and data in that directory and subdirectories

Page 30: Good Data Management Practices Patty Glynn  10/31/05

Naming Conventions

• For every data file you have, you should have a program file with a corresponding name.

• When you have finished your paper, create a program file for each table. For example: table1.sas, table2.sas
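
The first convention can be checked mechanically. The sketch below is a hypothetical Python helper, not part of the original slides. It lists data files in a project directory that have no program file with the same base name. The function name and the sets of data-file extensions are assumptions. The program extensions are the ones named earlier for the four packages.

```python
from pathlib import Path

# Assumed data-file extensions for SPSS, Stata, SAS, and R.
DATA_EXTENSIONS = {".sav", ".dta", ".sas7bdat", ".rdata"}
# Program-file extensions for the same four packages.
PROGRAM_EXTENSIONS = {".sps", ".do", ".sas", ".r"}


def data_files_without_programs(project_dir):
    """Return data files that lack a program file with a corresponding name."""
    project = Path(project_dir)
    files = [p for p in project.rglob("*") if p.is_file()]
    program_stems = {
        p.stem.lower() for p in files if p.suffix.lower() in PROGRAM_EXTENSIONS
    }
    return sorted(
        p.relative_to(project).as_posix()
        for p in files
        if p.suffix.lower() in DATA_EXTENSIONS
        and p.stem.lower() not in program_stems
    )
```

Run against a directory such as c:\MA_Thesis, an empty result means every data file can be traced to a program with a matching name.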

Page 31: Good Data Management Practices Patty Glynn  10/31/05

Document your work

• Write comments in your program.

• Put a file in your directory called a_note, readme, or something similar that includes a brief description of the project and important information.

Page 32: Good Data Management Practices Patty Glynn  10/31/05

Outline

• Sermon on SYNTAX

• Cleaning data and creating variables

• Never overwrite original data

• Practices that will help you keep track of your work

• Safeguarding your work

Page 33: Good Data Management Practices Patty Glynn  10/31/05

Safeguarding your work

• Multiple backups – not all stored in the same basket

• Worry about the future

– Keep up with formats (cards, tapes, floppy disks, CDs, what next?)

– Store in portable formats
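
Making multiple backups in different places can be scripted so that it actually happens. The following Python sketch is only one way to do it and is not from the original slides. The function names are hypothetical, and the checksum verification is an added assumption, not something the slides prescribe. It needs Python 3.8 or later for `dirs_exist_ok`.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(project_dir, backup_roots):
    """Copy the project into each backup root and verify every copied file."""
    project = Path(project_dir)
    copies = []
    for root in backup_roots:
        target = Path(root) / project.name
        # Copy the whole project tree; existing backup folders are updated.
        shutil.copytree(project, target, dirs_exist_ok=True)
        # Compare each source file with its copy before trusting the backup.
        for src in project.rglob("*"):
            if src.is_file():
                dst = target / src.relative_to(project)
                if sha256(src) != sha256(dst):
                    raise IOError(f"backup of {src} to {dst} does not match")
        copies.append(target)
    return copies
```

The backup roots should be on different devices or in different locations. Two folders on the same disk are still one basket.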

Page 34: Good Data Management Practices Patty Glynn  10/31/05

Documents that may be helpful

http://staff.washington.edu/glynn/record.pdf

http://staff.washington.edu/glynn/debug.pdf

http://staff.washington.edu/glynn/debugsp.pdf

http://staff.washington.edu/glynn/config_spss.pdf

http://staff.washington.edu/glynn/

Page 35: Good Data Management Practices Patty Glynn  10/31/05

Computer Environments for the Social Sciences

CSSS 506

Winter Quarter 2006

See contents from 2005:

http://courses.washington.edu/glyclass/csss506/winter05/index.html

Page 36: Good Data Management Practices Patty Glynn  10/31/05
Page 37: Good Data Management Practices Patty Glynn  10/31/05

The End

Patty Glynn

http://staff.washington.edu/glynn/

10/31/05

http://staff.washington.edu/glynn/GoodDataManagement.ppt