Crash Course in Statistics
Introduction to SPSS
July 2014
Dr. Jürg Schwarz [email protected]
Neuroscience Center Zurich
Slide 2
Program 8 July 2014: Morning Lessons (09.00 – 12.00)
◦ First Part – Introduction
- A typical example
- Resources
- First steps
◦ Exercises
Program 8 July 2014: Afternoon Lessons (13.00 – 16.00)
◦ Second Part – Additional Topics
- Analysis functions
- Charts
◦ Exercises
Slide 3
Table of Contents
Introduction (slide 5)
- A typical example (slide 5)
- "How-to" in SPSS – First Impression (slide 10)
Resources (slide 11)
- Manuals (slide 11)
- Sample Files (slide 12)
- Using the Help System (Core System User’s Guide) (slide 14)
- Online-Resources (slide 17)
First Steps (slide 18)
- Change the Application Language (slide 18)
- Starting SPSS & Opening a Data File (slide 19)
- Data Editor & Data Organization (slide 20)
- Running an Analysis & Viewing Results (slide 22)
- Intermezzo: Alphabetical view of the variables in the dialog boxes (slide 23)
- Working with Syntax (slide 27)
- Modifying Data Values (slide 33)
- Select Cases & Split File (slide 37)
- Data Entry (slide 41)
- Data Editor: Defining Variables, Entering Data & Missing Values (slide 42)
- Importing Data (slide 47)
Exercise 01: First Part – Introduction (slide 52)
Slide 4
Analysis functions: Analyze (slide 53)
- Descriptive Statistics (slide 53)
- Inferential Statistics (slide 63)
Creating charts (slide 70)
- Manuals (slide 70)
- Creating and Editing Charts (slide 71)
- Bar Chart – Self-Study 45 minutes (slide 72)
Exercise 02: Second Part – Additional Topics (slide 73)
Slide 5
Introduction
A typical example
Medical research: What are the factors affecting body weight?
Data set (EXAMPLE00.SAV)
Sample of n = 198 men and women
Typical questions
Is there an impact of the factors …
◦ body size [cm]
◦ age [years]
◦ sex [0 / 1]
… on body weight?
How can this impact be modeled?
How strong is the impact of each factor?
[Figure: body weight [kg] plotted against body size [cm]]
Slide 6
A closer look: Joint representation of the relationships between the variables
[Figure: pairwise plots of body size, body weight, and age]
Slide 7
Questions
Question in everyday language:
How do individual characteristics influence body weight?
Research question:
Is there an impact of the factors …
◦ body size
◦ age
◦ sex
… on body weight?
How strong is the impact of the factors?
Is there a model?
Is linear regression analysis the right model?
Statistical question:
H0: "No model" (= No overall model and no significant coefficients)
HA: "Model" (= Overall model and significant coefficients)
Can we reject H0?
Slide 8
Solution
Multiple linear regression model with body weight as the dependent variable
$$\text{body weight} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{body size} + \beta_3 \cdot \text{sex} + u$$
body weight = dependent variable
age, ..., sex = independent variables
$\beta_0, \ldots, \beta_3$ = coefficients
$u$ = error term
"How-to" in SPSS
Scales
Dependent variable: metric
Independent variables: metric, categorical (coded as dummy variables)
SPSS: Analyze → Regression → Linear...
Method: Enter (All variables are entered into the model simultaneously)
Method: Stepwise (Each variable is individually tested for fitness and included)
Method: Blockwise (Variables are entered in a predefined sequence of blocks)
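The menu path above can also be written as command syntax. The following is a minimal sketch of the Enter method. It assumes the variables in EXAMPLE00.SAV are named weight, size, age and gender_d, as in the regression equation of this example; the names in the actual file may differ.

```spss
* Sketch only: variable names are assumed from the result equation of this example.
REGRESSION
  /STATISTICS COEFF R ANOVA
  /DEPENDENT weight
  /METHOD=ENTER size age gender_d.
```

Replacing ENTER with STEPWISE requests the stepwise method. Blockwise entry can be written as several /METHOD=ENTER subcommands in sequence, one per block.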
Slide 9
Result
Significant overall model (table not shown)
High value for "Adjusted R Square" (R2adj. ≤ 1)
Significant coefficients (Sig. < .05)
SPSS Output
weight = -25.295 + .417 ⋅ size + .476 ⋅ age + 8.345 ⋅ gender_d
Example interpretation: Each additional year of age is associated with an increase in predicted body weight of .476 kilograms,
holding all the other independent variables constant.
age has highest impact: Standardized Beta = .596
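To see what the fitted equation implies for individual cases, the predicted weight can be computed as a new variable. This is only an illustration: the coefficients are copied from the equation above, and weight_pred is a hypothetical variable name.

```spss
* Sketch only: applies the reported coefficients to each case.
* weight_pred is a hypothetical name for the new variable.
COMPUTE weight_pred = -25.295 + .417 * size + .476 * age + 8.345 * gender_d.
EXECUTE.
```

For a case with size = 180, age = 40 and gender_d = 1, the expression evaluates to -25.295 + 75.06 + 19.04 + 8.345 = 77.15 kg.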
Slide 10
"How-to" in SPSS – First Impression
Structure of SPSS
Output
Data File (in Data Editor)
Syntax File (in Syntax Editor)
Slide 11
Resources
Manuals
This introduction refers to the manual "IBM SPSS Statistics 22 Brief Guide"
Find this manual and also "IBM SPSS Statistics 22 Core System User’s Guide" here:
◦ www-01.ibm.com/support/docview.wss?uid=swg27038407#en
Slide 12
Sample Files
This introduction uses the data file demo.sav
◦ Find it here: www.schwarzpartners.ch/ZNZ_2014/Data_Files
Variable View
Data View
Slide 13
The data file demo.sav is a fictional survey of several thousand people (n = 6400),
containing basic demographic and consumer information.
Name      Label
age       Age in years
marital   Marital status
address   Years at current address
income    Household income in thousands
inccat    Income category in thousands
car       Price of primary vehicle
carcat    Primary vehicle price category
ed        Level of education
employ    Years with current employer
retire    Retired
empcat    Years with current employer
jobsat    Job satisfaction
gender    Gender
reside    Number of people in household
wireless  Wireless service
multline  Multiple lines
voice     Voice mail
pager     Paging service
internet  Internet
callid    Caller ID
callwait  Call waiting
owntv     Owns TV
ownvcr    Owns VCR
owncd     Owns stereo/CD player
ownpda    Owns PDA
ownpc     Owns computer
ownfax    Owns fax machine
news      Newspaper subscription
response  Response
Slide 14
Using the Help System (Core System User’s Guide)
Help is provided in many different forms:
Help menu (the most important)
◦ Topics: Provides access to the Contents, Index, and Search tabs, which you can use to find specific Help topics.
◦ Tutorial: Illustrated, step-by-step instructions on how to use many of the basic features.
◦ Case Studies: Hands-on examples of how to create various types of statistical analyses and how to interpret the results.
◦ Statistics Coach: A wizard-like approach to guide you through the process of finding the procedure that you want to use.
◦ Command Syntax Reference: Detailed command syntax reference information is available in two forms: integrated into the Help system and as a separate document in PDF form.
Context-sensitive Help
◦ Dialog box Help buttons: Most dialog boxes have a Help button that takes you directly to a Help topic for that dialog box.
◦ Pivot table context menu Help: Right-click on terms in an activated pivot table in the Viewer and choose What’s This? from the context menu to display definitions of the terms.
◦ Command syntax: In a command syntax window, position the cursor anywhere within a syntax block for a command and press F1 on the keyboard.
Slide 15
[Screenshots: Help menu; dialog box Help buttons]
Slide 16
[Screenshot: Tutorials]
Slide 17
Online-Resources
SPSS Solutions for Education
www-01.ibm.com/software/analytics/spss/academic/students/resources.html
IBM-ID [email protected] Password 7mydevelopper
SPSS Support (especially Knowledgebase Search)
http://support.spss.com/tech/default.asp
User spssswitzerland Password spssswitzerland
SPSS Support (resources for all levels of users and application developers)
www.spss.com/devcentral
User [email protected] Password 7mydevelopper
Other Resources / Forum / Discussion
www.ats.ucla.edu/stat/spss
http://spssx-discussion.1045642.n5.nabble.com
www.spssusers.co.uk
www.dynelytics.com/en
Slide 18
First Steps
Change the Application Language
The language can be selected through the Language tab under Edit → Options:
Slide 19
Starting SPSS & Opening a Data File
From the Start menu choose:
IBM SPSS Statistics
IBM SPSS Statistics 21
Other possibility:
Double-click an SPSS data file
Find data file demo.sav here: www.schwarzpartners.ch/ZNZ_2014/Data_Files
Slide 20
Data Editor & Data Organization
The Data Editor displays the contents of the active data file
◦ Data View: Columns represent variables and rows represent cases (observations)
◦ Variable View: Each row is a variable, each column is an attribute of that variable
Slide 21
SPSS data is organized by cases (rows) and variables (columns)
Variables (columns)
Each column in the data editor corresponds to a specific measurement.
In many areas of research, these measurements are called variables.
Cases (rows)
For a survey of individuals, each row would represent a respondent.
In an experiment, each row might correspond to a single recorded observation.
Data View
Slide 22
Running an Analysis & Viewing Results
The "Analyze" menu contains different methods of analysis.
For example, a simple frequency table with a histogram:
Analyze → Descriptive Statistics → Frequencies…
Slide 23
Intermezzo: Alphabetical view of the variables in the dialog boxes
The default settings of SPSS show labels for the variables in the dialog fields:
This could make the search for particular variables difficult.
Variables are shown with a label.
Slide 24
SPSS can be adjusted so that variables are displayed with their names and in
alphabetical order.
To do so, select the following setting under the General tab of Edit → Options:
Variables are displayed
alphabetically by names.
Place the cursor in the box that contains the
variables, and enter a character from the
keyboard. The first variable beginning with
this character will appear.
This allows you to quickly search through
the variable box to find a variable.
Slide 25
Create an additional histogram
Slide 26
Slide 27
Working with Syntax
Open a new syntax file through the menu: File → New → Syntax
Windows and their file types:
◦ Data Editor: *.sav files
◦ Output: *.spv files
◦ Syntax Editor: *.sps files
Slide 28
How do you get the command syntax?
Option I: Perform an analysis through the menu
Example: Analyze → Descriptive Statistics → Frequencies
[Screenshots: Output; Data Editor]
Slide 29
Where is the syntax for this analysis? => The syntax is displayed in the output.
Double-click the syntax part in the log, highlight and copy the syntax.
Paste the syntax into the Syntax Editor.
Slide 30
Option II: Paste the syntax directly from the dialog box ("Paste" button).
Option III: Write the syntax yourself.
Executing the Syntax
Place the cursor inside the syntax in the Syntax Editor and run the analysis through the menu
Run → Selection.
Slide 31
Typical Syntax File
Why should you use syntax?
Rapidly leads to greater efficiency.
◦ Documentation
◦ Reproducing the results
◦ Automatically process many commands
◦ Allows access to all commands
◦ Communication with other persons
◦ Opens the world of macros
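As an illustration of the documentation benefit, a small syntax file might look like the sketch below. The file path is a hypothetical placeholder and must be adjusted to wherever demo.sav was saved.

```spss
* Hypothetical example of a small, documented syntax file.
* The path is an assumption and must be adjusted to where demo.sav was saved.
GET FILE='C:\data\demo.sav'.

* Frequency table and histogram for age.
FREQUENCIES VARIABLES=age
  /HISTOGRAM
  /ORDER=ANALYSIS.
```

Comment lines start with an asterisk and end with a period, so the file records what was done as well as how.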
Slide 32
What if the syntax is not displayed in the output?
Through the menu Edit → Options… → Viewer, choose Display commands in the log
The syntax is now displayed in the output.
Slide 33
Modifying Data Values
The data may not always exist in a form that can be used for analysis or reporting.
For example, you may want to:
◦ convert a scale variable into a categorical variable.
◦ merge different response categories into a single category.
◦ calculate a new variable from the difference between two existing variables.
Slide 34
Computing a new variable
New variables can be computed based on existing ones, for example by averaging or summing scores.
For example, you may want to compute the equivalence income (based on the household income and the number of persons in the household).
Transform → Compute Variable…
Syntax
COMPUTE income_equiv = income / SQRT(reside).
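A slightly extended sketch of the same computation, which also labels the new variable and compares it with the original income. It assumes demo.sav is the active dataset; the label text is a hypothetical choice.

```spss
* Sketch only: household income divided by the square root of household size.
COMPUTE income_equiv = income / SQRT(reside).
VARIABLE LABELS income_equiv 'Equivalence income in thousands'.
EXECUTE.

* Compare the original and the computed variable.
DESCRIPTIVES VARIABLES=income income_equiv
  /STATISTICS=MEAN STDDEV MIN MAX.
```

For example, a household with income = 72 and reside = 4 gets income_equiv = 72 / 2 = 36.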
Slide 35
Recoding a variable
Example: creating a categorical variable from a scale variable.
For example, based on age in years we could build age categories.
Menu: Transform → Recode into Different Variables…
Slide 36
Syntax
RECODE age (Lowest thru 24=1) (25 thru 44=2) (45 thru 60=3) (61 thru Highest=4) INTO age_r.
FREQUENCIES VARIABLES=age age_r /ORDER ANALYSIS.
Result
Scale values (age) => Categorical values (age_r)
Categories:
1: up to 24 years
2: 25 - 44 years
3: 45 - 60 years
4: over 60 years
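The category descriptions can be attached to the recoded variable so that they appear in tables and charts. A minimal sketch, assuming age_r has been created as above; the variable label text is a hypothetical choice.

```spss
* Sketch only: labels for the recoded age categories.
VARIABLE LABELS age_r 'Age category'.
VALUE LABELS age_r
  1 'up to 24 years'
  2 '25 - 44 years'
  3 '45 - 60 years'
  4 'over 60 years'.
```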
Slide 37
Select Cases & Split File
Select cases
A particular subset of the data can be analyzed by selecting specific cases. Unselected cases are
either filtered out temporarily or deleted permanently from the data set.
For example, you may want to analyze only respondents who are older than 45 years.
Menu: Data → Select Cases…
Slide 38
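Syntax
* Keep all cases, then flag respondents older than 45 (1 = selected, 0 = filtered out).
USE ALL.
COMPUTE filter_$=(age > 45).
FILTER BY filter_$.
EXECUTE.
* Histogram of age for the selected cases only.
FREQUENCIES VARIABLES=age
  /FORMAT=NOTABLE
  /HISTOGRAM
  /ORDER=ANALYSIS.
* The next lines remove the filter for all analyses to come.
FILTER OFF.
USE ALL.
EXECUTE.
[Result: histogram of age for the selected cases]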
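Besides a filter variable, cases can also be selected with SELECT IF. The sketch below contrasts a temporary selection with a permanent one, assuming demo.sav is the active dataset. The permanent variant removes cases from the active dataset, so it is best run on a copy of the data.

```spss
* Temporary selection: applies only to the next procedure.
TEMPORARY.
SELECT IF (age > 45).
FREQUENCIES VARIABLES=age
  /FORMAT=NOTABLE
  /HISTOGRAM.

* Permanent selection: cases with age of 45 or less are removed from the active dataset.
SELECT IF (age > 45).
EXECUTE.
```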
Slide 39
Split File
Sometimes data in different categories should be analyzed separately.
To do this, the data can be split up, and the same analysis can be performed on two or more subsets.
For example, we could split the dataset by the variable age_r, which means we are
conducting separate analyses for each of the age categories.
Menu: Data → Split File…
Slide 40
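Syntax
* Sort by the grouping variable, then split the file into separate groups.
SORT CASES BY age_r.
SPLIT FILE SEPARATE BY age_r.
EXECUTE.
* Histogram of income, produced once per age category.
FREQUENCIES VARIABLES=income
  /FORMAT=NOTABLE
  /HISTOGRAM
  /ORDER=ANALYSIS.
* The next line removes the split for all analyses to come.
SPLIT FILE OFF.
[Result: one histogram of income per age category]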
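As a variation, the groups can be shown together in one table instead of in separate output blocks by using the LAYERED keyword. A minimal sketch, assuming age_r exists in the active dataset:

```spss
* Sketch only: compare the age categories side by side in one table.
SORT CASES BY age_r.
SPLIT FILE LAYERED BY age_r.
DESCRIPTIVES VARIABLES=income
  /STATISTICS=MEAN STDDEV MIN MAX.
SPLIT FILE OFF.
```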
Slide 41
Data Entry
There are different ways to enter data into SPSS.
Data can be directly entered into SPSS or can be imported from many different sources:
◦ Direct: SPSS Data Editor
◦ From a spreadsheet program (such as Excel)
◦ From a database program (such as Access)
◦ From other applications (such as a text editor)
Scanners may be efficient for entering large amounts of data.
Slide 42
Data Editor: Defining Variables, Entering Data & Missing Values
Entering (new) numerical data
Open a new data file (through the menu File → New → Data)
At the bottom of the Data Editor window, switch to Variable View.
◦ Enter age in the first row of the first column.
◦ Enter marital in the second row.
◦ Enter income in the third row.
New variables are automatically assigned the "Numeric" data type.
Slide 43
Switch to the Data View in order to enter values.
To suppress the decimal places for the variables age, marital and income:
◦ At the bottom of the Data Editor window, switch to Variable View.
◦ Select the Decimals column and enter a 0 for age.
◦ Select the Decimals column and enter a 0 for marital.
Slide 44
Adding variable labels and value labels
Enter "Age in years" into the age cell of the "Labels" column.
Do the same for "Marital Status", and so on.
Select the Values cell for marital and open the dialog box.
◦ For Value, enter 1.
◦ For Label, enter "single".
◦ Click on Add so that this designation is registered.
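The same variable definitions can be written as syntax, which is convenient when many variables have to be labeled. A minimal sketch, assuming the variables age and marital have been created as described above:

```spss
* Sketch only: display age and marital without decimals.
FORMATS age marital (F8.0).

* Variable labels and one value label, as entered in the dialog boxes.
VARIABLE LABELS age 'Age in years'
  /marital 'Marital Status'.
VALUE LABELS marital 1 'single'.
```

Further value labels for marital can be appended to the VALUE LABELS command as additional value and label pairs.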
Slide 45
Handling missing values
In general, missing or invalid data should not be ignored.
Sometimes survey participants refuse to answer particular questions.
They may not know an answer, or may respond in an unexpected way.
If these data are not identified or filtered out, your analysis may not yield correct results.
Empty data cells, or cells that contain invalid input, are converted to
missing values, which are displayed as a period.
Slide 46
The reason why data is missing could be important for your analysis.
For example, for a particular question, it could be useful to distinguish between those who refused
to answer and those for whom the question was not applicable.
In "Variable View" select the Missing cell for income and open the dialog box.
In this dialog box you can specify up to three discrete missing values, or a range of values plus
one additional discrete value.
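User-defined missing values can also be declared in syntax. The codes below are hypothetical and only illustrate the distinction between refusal and non-applicability; they are not part of demo.sav.

```spss
* Hypothetical codes: -9 means refused to answer, -8 means not applicable.
MISSING VALUES income (-9, -8).
ADD VALUE LABELS income -9 'Refused to answer' -8 'Not applicable'.
```

Cases with these codes are then excluded from statistics such as the mean of income, while the reason for missingness stays visible in frequency tables.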
Slide 47
Importing Data
Data can be imported from different sources.
◦ Reading an SPSS Data File
SPSS data files have a file extension of *.sav.
◦ Importing data from a spreadsheet
In addition to entering data into the data editor, you can import from programs such as Microsoft Excel. The column headings serve as variable names.
◦ Importing data from a text file
Text files are common sources of data. Many spreadsheet programs and databases can save their contents in text file format. For example, in CSV files, values are separated by commas; in tab-delimited files, by tabs.
◦ Importing data from a database (not in this course)
Data from a database can be imported with the help of a database wizard.
Slide 48
Importing data from Spreadsheets
Find the Excel file "demo.xls" in www.schwarzpartners.ch/ZNZ_2014/Data_Files
Column headings
are variable names.
Slide 49
Open the Excel file through the SPSS File menu (Excel file must be closed)
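The import can also be written as syntax. A minimal sketch, assuming the file was saved under the hypothetical path shown and that the data are on the first worksheet with variable names in the first row:

```spss
* Sketch only: the path is an assumption and must be adjusted.
GET DATA
  /TYPE=XLS
  /FILE='C:\data\demo.xls'
  /SHEET=INDEX 1
  /READNAMES=ON.
```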
Slide 50
Importing data from a text file
Find the text file "demo.txt" in www.schwarzpartners.ch/ZNZ_2014/Data_Files
Open the text file through the SPSS File menu (text file must be closed)
Slide 51
Slide 52
Exercise 01: First Part – Introduction
Resources => www.schwarzpartners.ch/ZNZ_2014 => Exercises SPSS => Exercise 01
Slide 53
Analysis functions: Analyze
Descriptive Statistics
Summary information about the distribution,
variability, and central tendency of variables.
Frequencies…
◦ Provides statistics and graphical displays for describing many types of variables.
◦ For a frequency report and bar chart, you can arrange the distinct values in ascending or descending order or order the categories by their frequencies. The frequencies report can be suppressed when a variable has many distinct values. You can label charts with frequencies (the default) or percentages.
Statistics and plots: Frequency counts, percentages, cumulative percentages, mean, median,
mode, sum, standard deviation, variance, range, minimum and maximum values, standard error
of the mean, skewness and kurtosis (both with standard errors), quartiles, user-specified percentiles,
bar charts, pie charts, and histograms.
Slide 54
Menu Frequencies…
Slide 55
Example Frequencies…
FREQUENCIES VARIABLES=age
/FORMAT=NOTABLE
/STATISTICS=STDDEV VARIANCE MINIMUM MAXIMUM MEAN MEDIAN SKEWNESS SESKEW KURTOSIS SEKURT
/HISTOGRAM
/ORDER=ANALYSIS.
Slide 56
Descriptives…
◦ Displays univariate summary statistics for several variables in a single table.
◦ Calculates standardized values (z scores).
◦ Variables can be ordered by the size of their means (in ascending or descending order), alphabetically, or by the order in which you select the variables (the default).
When z scores are saved, they are added to the data in the Data Editor.
Statistics: Sample size, mean, minimum, maximum, standard deviation, variance, range, sum,
standard error of the mean, and kurtosis and skewness with their standard errors.
Slide 57
Example Descriptives…
DESCRIPTIVES VARIABLES=age
/SAVE
/STATISTICS=MEAN STDDEV MIN MAX.
z scores are saved in the data editor
Slide 58
Explore…
◦ Produces summary statistics and graphical displays either for all of your cases or separately for groups of cases.
◦ Use for: data screening, outlier identification, description, assumption checking, and characterizing differences among subpopulations (groups of cases).
Slide 59
Example Explore…
EXAMINE VARIABLES=income BY gender
/PLOT BOXPLOT
/COMPARE GROUPS
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
Slide 60
Example Explore…
Numbers indicate cases in the dataset
Slide 61
Crosstabs…
◦ Forms two-way and multiway tables and provides a variety of tests and measures of association.
Slide 62
Example Crosstabs…
CROSSTABS
/TABLES=gender BY inccat
/FORMAT=AVALUE TABLES
/CELLS=COUNT ROW
/COUNT ROUND CELL.
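The example above only produces the table. To request a test and measures of association as well, a STATISTICS subcommand can be added. A minimal sketch, assuming demo.sav is the active dataset:

```spss
* Sketch only: chi-square test plus phi and Cramer's V for gender by income category.
CROSSTABS
  /TABLES=gender BY inccat
  /STATISTICS=CHISQ PHI
  /CELLS=COUNT ROW.
```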
Slide 63
Inferential Statistics
SPSS offers univariate and multivariate analysis techniques, including (among many others):
General linear models (GLM)
Survival analysis procedures
Non-parametric procedures
One-Sample t-Test
Use to test the claim that a population mean is equal to a specific value.
Example: Test the hypothesis that in the population the mean age is 45 years.
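In syntax, this test can be sketched as follows, assuming demo.sav is the active dataset:

```spss
* Sketch only: one-sample t-test of mean age against the test value 45.
T-TEST
  /TESTVAL=45
  /VARIABLES=age
  /CRITERIA=CI(.95).
```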
Slide 64
Hypothesis structure
H0: µ = 45 years
HA: µ ≠ 45 years
SPSS output
Mean of sample age = 42.06 years
Assuming the null hypothesis is true, the probability of the observed result is p < .001 (displayed by SPSS as .000).
H0 can be rejected:
The mean age in the population is significantly different from 45 years.
Slide 65
Two-Sample t-Test
Compares the means of two independent samples.
Example: Test the hypothesis that the mean income of men and women is different.
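A minimal syntax sketch for this comparison. It assumes that gender in demo.sav is a string variable coded 'm' and 'f'; if the variable is numeric in your file, the group values in parentheses must be changed accordingly.

```spss
* Sketch only: the group codes 'm' and 'f' are an assumption about the coding of gender.
T-TEST GROUPS=gender('m' 'f')
  /VARIABLES=income
  /CRITERIA=CI(.95).
```

The output contains Levene's test together with both the "Equal variances assumed" and the "Equal variances not assumed" rows.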
Slide 66
Hypothesis structure
H0: µmen = µwomen
HA: µmen ≠ µwomen
SPSS output
When comparing groups with the standard t-test, their variances must be relatively similar.
Levene's test checks for this. If the significance for Levene's test is
◦ ≤ 0.05, then the row "Equal variances not assumed" is used
◦ > 0.05, then the row "Equal variances assumed" is used
Men and women in the population do not differ significantly in terms of their mean income.
(t-test: df = 6398, t = .702, p = .483).
Mean of men's income = 70.1608 [1000 $]
Mean of women's income = 68.7798 [1000 $]
Slide 67
Paired Samples t-Test
Very often the two samples to be compared are not independent:
The second sample consists of the same subjects as the first, measured after some treatment has been applied.
Example: Influence of diet on body weight of overweight men.
Data set: body_weight.sav
◦ weight_0 = weight at beginning of diet
◦ weight_1 = weight after ¼ year
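In syntax, the paired comparison of the two measurements can be sketched as follows, assuming body_weight.sav is the active dataset:

```spss
* Sketch only: paired-samples t-test of weight before and after the diet.
T-TEST PAIRS=weight_0 WITH weight_1 (PAIRED)
  /CRITERIA=CI(.95)
  /MISSING=ANALYSIS.
```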
Slide 68
Hypothesis structure
H0: µbeginning = µafter
HA: µbeginning ≠ µafter
SPSS output
The mean weight at beginning of the diet is not significantly different from
the mean weight after ¼ year of diet. (t-test: df = 98, t = -1.210, p = .229).
Mean weight at beginning = 115.4586 [kg]
Mean weight after ¼ year = 116.3687 [kg]
Slide 69
How to choose a statistical test?
Use an inferential statistics decision-making tree!
http://en.wikiversity.org/wiki/Inferential_statistics_decision-making_tree
Tree from UZH (German) www.methodenberatung.uzh.ch
Slide 70
Creating charts
Manuals
This introduction refers to the manual "IBM SPSS Statistics 22 Brief Guide"
Find this manual and also "GPL Reference Guide for IBM SPSS Statistics" here:
◦ www-01.ibm.com/support/docview.wss?uid=swg27038407#en
Slide 71
Creating and Editing Charts
SPSS provides a large number of options for producing charts and diagrams.
The graphics options are available on the Graphs menu.
Use either the Chart Builder or the Legacy Dialogs (the older-style dialogs).
Slide 72
Bar Chart – Self-Study 45 minutes
Create a bar chart of mean income for different levels of job satisfaction.
From the menus choose: Graphs → Chart Builder...
Click OK
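For comparison, a similar chart can be requested with the older GRAPH command. This is a sketch of the legacy syntax, not the syntax that the Chart Builder itself pastes (the Chart Builder generates GGRAPH commands with GPL code). It assumes demo.sav is the active dataset.

```spss
* Sketch only: bar chart of mean income for each level of job satisfaction.
GRAPH
  /BAR(SIMPLE)=MEAN(income) BY jobsat.
```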
Slide 73
Exercise 02: Second Part – Additional Topics
Resources => www.schwarzpartners.ch/ZNZ_2014 => Exercises SPSS => Exercise 02