Organizing Your Data for Statistical Analysis in SPSS
Edward A. Greenberg, PhD
ASU HEALTH SOLUTIONS DATA LAB
REVISED JANUARY 4, 2013
SPSS Data Sets
• Rows are cases or observations
• Columns are variables (measurements)
• Up to 2^31 - 1 columns (2,147,483,647)
• No limit on the number of cases
Variable Types
• Numeric (40 character maximum
length)
• Dates and times (various formats)
• Other variations of numeric (currency,
comma, scientific notation, etc.)
• String (32,767 maximum length)
Variable Names
• Variable names must be unique.
• Variable names may be up to 64
characters in length.
• Names can contain letters, numbers, or
special characters.
• Names must start with a letter or @, #,
or $.
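
The naming rules above can be checked mechanically before data entry. The following is a minimal Python sketch, not SPSS syntax. It assumes that "special characters" means the period, underscore, @, #, and $, that only ASCII letters are used, and that names are compared without regard to case. The function name name_problems is hypothetical, and the sketch does not cover every SPSS naming rule (reserved keywords, for example).

```python
import re

# Assumption: allowed "special characters" are . _ @ # $ and letters are ASCII.
NAME_PATTERN = re.compile(r"[A-Za-z@#$][A-Za-z0-9._@#$]*")
MAX_NAME_LENGTH = 64


def name_problems(names):
    """Return a dict mapping each problematic name to a list of reasons."""
    problems = {}
    seen = set()
    for name in names:
        reasons = []
        if len(name) > MAX_NAME_LENGTH:
            reasons.append("longer than 64 characters")
        if not NAME_PATTERN.fullmatch(name):
            reasons.append("bad first character or disallowed character")
        # Assumption: uniqueness is judged case-insensitively.
        key = name.lower()
        if key in seen:
            reasons.append("duplicate name")
        seen.add(key)
        if reasons:
            problems[name] = reasons
    return problems


print(name_problems(["CASENO", "SEX", "2ndChoice", "sex"]))
```

Here "2ndChoice" is flagged because it starts with a digit, and "sex" is flagged as a duplicate of "SEX".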
Unit of Analysis
What constitutes a “case”?
• A person
• A household
• An organization
• An experimental trial
Labeling Data
• Variable names may be short and
cryptic.
• Variable labels can be up to 255
characters.
• SPSS procedures display at least 40
characters of variable labels.
• Value labels can be up to 120
characters.
Order of Variables
• The order of variables in the SPSS data
file normally should be the same as the
order of items in the questionnaire.
• Use variable names that help you
identify the scale or instrument to which
they apply.
Case Numbers
• Each case in an SPSS file should
include a case number.
• Often this will be the first variable in the
file.
• The case number does not identify the
subject but it links the data record to
the subject’s questionnaire.
• Useful for correcting data entry errors
Create a Codebook
• When preparing to enter your data into SPSS, prepare a codebook for the data set.
• The codebook documents all of the items to be entered in the data set:
– Variable names and labels
– Variable types and formats
– Coded values for categorical items
– Missing values
Sample Codebook
VARIABLE   TYPE &    DESCRIPTION / VARIABLE LABEL /
NAME       LENGTH    CODED VALUE / VALUE LABEL

CASENO     NUM 3     Case number
                     Case number
SEX        STR 1     6. I am:
                       M  Male
                       F  Female
AGE        NUM 2     7. My age is:
                       (Code actual age in years)
EDUC       NUM 1     8. What is the highest level of education that you have completed?
                     Education level
                       1  No formal education
                       2  Some grade school
                       3  Completed grade school
                       4  Some high school
                       5  Completed high school
                       6  Some college
                       7  Completed college
                       8  Some graduate work
                       9  A graduate degree
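
A codebook like the sample can also be kept in machine-readable form and used to screen records as they are entered. The sketch below is Python, not SPSS syntax. The entries are transcribed from the sample codebook, while the dictionary layout and the check_record helper are hypothetical.

```python
# Entries transcribed from the sample codebook; the layout is an assumption.
CODEBOOK = {
    "CASENO": {"type": "NUM", "length": 3, "label": "Case number"},
    "SEX": {"type": "STR", "length": 1, "label": "6. I am:",
            "values": {"M": "Male", "F": "Female"}},
    "AGE": {"type": "NUM", "length": 2, "label": "7. My age is:"},
    "EDUC": {"type": "NUM", "length": 1, "label": "Education level",
             "values": {1: "No formal education", 2: "Some grade school",
                        3: "Completed grade school", 4: "Some high school",
                        5: "Completed high school", 6: "Some college",
                        7: "Completed college", 8: "Some graduate work",
                        9: "A graduate degree"}},
}


def check_record(record, codebook=CODEBOOK):
    """Return a list of problems found in one data record (a dict)."""
    errors = []
    for name, spec in codebook.items():
        value = record.get(name)
        if value is None:
            continue  # blank entries are left to missing-value handling
        expected = int if spec["type"] == "NUM" else str
        if not isinstance(value, expected):
            errors.append(f"{name}: {value!r} has the wrong type")
            continue
        if len(str(value)) > spec["length"]:
            errors.append(f"{name}: {value!r} is wider than {spec['length']}")
        if "values" in spec and value not in spec["values"]:
            errors.append(f"{name}: {value!r} is not a coded value")
    return errors


print(check_record({"CASENO": 17, "SEX": "M", "AGE": 34, "EDUC": 7}))
print(check_record({"CASENO": 1234, "SEX": "X", "AGE": 34, "EDUC": 0}))
```

The first record passes. The second is flagged for a case number wider than three digits and for two values that are not in the coded lists.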
Missing Data
Data may be missing for several reasons:
• Don’t know
• Refused to answer
• Not applicable
• Skipped a question
• Instrument problem
• Data entry omission
Missing Values
SPSS provides several ways of
designating numeric data as “missing
values.”
• A blank cell is treated as “system
missing,” represented by a dot (“.”) in
the SPSS Data Editor.
• Specific values can be declared as
“user missing” values.
Missing Values
• Up to three “user missing” values can
be declared for a variable.
• Or, a range of values plus one
additional value can be declared to be
missing.
Missing Values
In this example, variable AGEWED has
three labeled values that are to be treated
as missing.
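
The two ways of declaring user-missing values (up to three discrete values, or a range plus one additional value) can be sketched as a small Python model. This is an illustration of the rules, not SPSS syntax. The helper names are hypothetical, and the codes 97, 98, and 99 are placeholders, not the actual codes used for AGEWED.

```python
def make_missing_spec(discrete=(), value_range=None):
    """Build a missing-value declaration, enforcing the limits described above."""
    discrete = tuple(discrete)
    if value_range is None:
        if len(discrete) > 3:
            raise ValueError("at most three discrete missing values")
    else:
        low, high = value_range
        if low > high:
            raise ValueError("range must be given as (low, high)")
        if len(discrete) > 1:
            raise ValueError("a range allows only one additional value")
    return {"discrete": discrete, "range": value_range}


def is_missing(value, spec):
    """True for a blank (system-missing) or a declared user-missing value."""
    if value is None:  # None stands in for a blank cell
        return True
    if value in spec["discrete"]:
        return True
    if spec["range"] is not None:
        low, high = spec["range"]
        return low <= value <= high
    return False


# Hypothetical codes: three discrete user-missing values.
agewed_missing = make_missing_spec(discrete=(97, 98, 99))
print([is_missing(v, agewed_missing) for v in (25, 98, None)])

# Hypothetical alternative: a range plus one additional value.
range_spec = make_missing_spec(discrete=(0,), value_range=(97, 99))
print([is_missing(v, range_spec) for v in (0, 50, 97)])
```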
Missing Values
• Expressions handle missing values in
different ways.
• The result of (var1+var2+var3)/3 is
missing if any of the three variables is
missing.
• The result of MEAN(var1, var2, var3) is
missing only if all three of the variables are
missing; otherwise it is the mean of the
nonmissing values.
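
The difference between the two expressions can be sketched in Python, with None standing in for a missing value. The function names are hypothetical, and the code imitates the behavior described above without calling SPSS.

```python
def arithmetic_mean_of_three(var1, var2, var3):
    """Imitate (var1+var2+var3)/3: missing if any input is missing."""
    if None in (var1, var2, var3):
        return None
    return (var1 + var2 + var3) / 3


def mean_function(*values):
    """Imitate MEAN(...): average the nonmissing values; missing only if none."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


print(arithmetic_mean_of_three(4, None, 5))  # None: one input is missing
print(mean_function(4, None, 5))             # 4.5: mean of the two valid values
print(mean_function(None, None, None))       # None: nothing to average
```

The choice matters for scale scores: the arithmetic form drops any case with an unanswered item, while the function form computes a score from whatever items were answered.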
Missing Values in Procedures
The FREQUENCIES procedure excludes
cases with missing values from valid percentages
and summary statistics; the frequency table still
reports them as missing.
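
A minimal Python sketch of that exclusion: percentages are computed over the valid cases only. The function name valid_percents and the data are hypothetical; None stands in for a blank cell and 9 for a declared user-missing code.

```python
from collections import Counter


def valid_percents(values, user_missing=()):
    """Percentage of valid cases for each value, ignoring missing values."""
    valid = [v for v in values if v is not None and v not in user_missing]
    counts = Counter(valid)
    n_valid = len(valid)
    return {value: 100 * count / n_valid
            for value, count in sorted(counts.items())}


# Six cases, of which four are valid.
responses = [1, 2, 2, 2, None, 9]
print(valid_percents(responses, user_missing=(9,)))
```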
Multiple Responses
• Multiple-response items are questions that
can have more than one value for each
case.
• Two ways of coding:
– For each response, a variable can have one
of two values, e.g., 1=Yes and 2=No
(“multiple-dichotomy” method)
– Create a series of variables for 1st choice, 2nd
choice, etc. (“multiple categories” method)
MULT RESPONSE Procedure
• In the MULT RESPONSE procedure, multiple response variables are combined into groups.
• The MULT RESPONSE procedure counts responses in multiple response groups in frequency or crosstabulation tables.
• Percentages based on the number of cases generally will total more than 100%, because each case can give more than one response.
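
The following Python sketch tallies a multiple-dichotomy group and shows why percentages can total more than 100%. The item names and data are hypothetical, and the sketch assumes that percentages of cases are based on cases giving at least one counted response.

```python
# Hypothetical data: three yes/no items coded 1=Yes, 2=No.
cases = [
    {"news_tv": 1, "news_web": 1, "news_paper": 2},
    {"news_tv": 2, "news_web": 1, "news_paper": 2},
    {"news_tv": 1, "news_web": 1, "news_paper": 1},
    {"news_tv": 2, "news_web": 2, "news_paper": 2},
]


def tally_dichotomies(cases, items, counted_value=1):
    """Count 'Yes' responses per item, as percent of responses and of cases."""
    counts = {item: sum(1 for c in cases if c.get(item) == counted_value)
              for item in items}
    total_responses = sum(counts.values())
    # Assumption: the case base is cases with at least one counted response.
    responding = sum(1 for c in cases
                     if any(c.get(i) == counted_value for i in items))
    table = {}
    for item in items:
        n = counts[item]
        table[item] = {
            "count": n,
            "pct_of_responses": 100 * n / total_responses if total_responses else 0.0,
            "pct_of_cases": 100 * n / responding if responding else 0.0,
        }
    return table


table = tally_dichotomies(cases, ["news_tv", "news_web", "news_paper"])
for item, row in table.items():
    print(item, row)
print(sum(row["pct_of_cases"] for row in table.values()))
```

In this data three cases give six "Yes" responses, so the percentages of responses total 100% while the percentages of cases total about 200%.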
Repeated Measures
• Data that are recorded on more than
one occasion for each subject
• Some procedures, such as GLM,
require that all measurements for a
case be on the same data record.
• Other procedures, such as the MIXED
procedure, may expect one data record
per occasion.
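
The two layouts can be sketched in Python: a "wide" record holds all occasions for a case, and a "long" layout holds one record per occasion. The variable names and scores are hypothetical, and wide_to_long is an illustrative helper, not an SPSS command.

```python
def wide_to_long(records, id_name, occasion_names):
    """Turn one record per case into one record per case and occasion."""
    long_rows = []
    for record in records:
        for occasion, name in enumerate(occasion_names, start=1):
            long_rows.append({
                id_name: record[id_name],
                "occasion": occasion,
                "score": record[name],
            })
    return long_rows


# Hypothetical wide data: three measurements per case on one record.
wide = [
    {"CASENO": 1, "score1": 10, "score2": 12, "score3": 15},
    {"CASENO": 2, "score1": 8, "score2": 9, "score3": 11},
]

long_rows = wide_to_long(wide, "CASENO", ["score1", "score2", "score3"])
for row in long_rows:
    print(row)
```

Keeping the case number on every long record is what allows the occasions to be linked back to the same subject.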