Statistical Package for the Social Sciences
• INTRODUCTION TO SPSS: SPSS for Windows, Version 16.0
• Its first version was released in 1968
• In 1975. SPSS Statistics
were designed
INTRODUCTION TO SPSS
Objective
• The four windows in SPSS
• The basics of managing data files
• The basic analysis in SPSS
Introduction: What is SPSS?
• Originally an acronym for Statistical Package for the Social Sciences, it now stands for Statistical Product and Service Solutions
• One of the most popular statistical packages; it can perform data manipulation and analysis with simple instructions
Data Classification, Level of Measurement, and Statistics/Tests
1. Categorical data
• Categories data, e.g. Gender: Male (M), Female (F); Religion/Nationality
• Level of measurement: 1. Nominal Level
– Categories data: Gender, Name of City, Political Party
– There is no distance property
• Statistics/tests: Qualitative Analysis (Non-parametric test)
2. Rank data
• Ranking or ordering data
– Income Level: 1 = Very Low Income, 2 = Low Income, 3 = Medium Income, 4 = High Income, 5 = Very High Income
– Education Level: 1 = Primary, 2 = Secondary, 3 = High School, 4 = College, 5 = University
• Level of measurement: 2. Ordinal Level
– Ranking data and Opinion/Attitude data: 1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree
• Statistics/tests: Median, Chi-square test (χ² test), Spearman Rank Correlation etc.; Bar Chart, Pie Chart
3. Metric data
• Certain unit of measurement: Income = $215, Expenditure = $234, Agri. Productivity = 245 kg/unit, Speed = 125.56 km/hr etc.
• Level of measurement: 3. Interval Level
– Fixed and equal units; distances between categories
– Zero point is not available
– Metric data: G.P.A. = 3.52; Thermometer: 40 °F, 0 °C etc.
• Level of measurement: 4. Ratio Level
– Defined zero point
– Metric data: real numbers such as Age, Income, Expenditure, Area, Product, Height, Weight, Volume etc.
• Statistics/tests: Quantitative Analysis (Parametric test)
– Mean (X̄), S.D.
– t-test (two sets of data)
– ANOVA (F-test) (more than two groups of data)
– Pearson’s Correlation, Multiple Regression, Factor Analysis, Correlation Matrix, Index Number etc.
– Histograms
Open SPSS for Windows
SPSS Windows
The Four Windows:
1. Data Editor
2. Output Viewer
3. Syntax Editor
4. Script Window
First Window: Data Editor
• Data Editor
Spreadsheet-like system for defining, entering, editing, and
displaying data. The saved file has the extension “*.sav”.
Second Window: Output Viewer
• Output Viewer
Displays output and errors. The saved file has the extension
“*.spv”.
Third Window: Syntax Editor
• Syntax Editor
Text editor for syntax composition. The saved file has the
extension “*.sps”.
Fourth Window: Script Window
• Script Window
Provides the opportunity to write full-blown programs in a
BASIC-like language. The saved file has the extension “*.sbs”.
The basics of managing
data files
Opening SPSS
• The default window will have the data editor
• There are two sheets in the window:
1. Data view 2. Variable view
Data View window
• The Data View window
This sheet is visible when you first open the Data Editor and this
sheet contains the data
• Click on the tab labeled Variable View
Types of Variables
• What variables would you consider when buying a second-hand bike?
• Brand (Trek, Raleigh)
• Type (road, mountain, racer)
• Components (Shimano, no name)
• Age
• Condition (Excellent, good, poor)
• Price
• Frame size
• Number of gears
Variable View window
• This sheet contains information about the data set that is stored with the dataset
• Name
– The first character of the variable name must be alphabetic
– Variable names must be unique and at most 64 characters long
– Spaces are NOT allowed.
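The naming rules above can be expressed as a small check. This is only an illustrative Python sketch, not part of SPSS: the function name `is_valid_variable_name`, the constant `MAX_NAME_LENGTH`, and the case-insensitive uniqueness comparison are assumptions made for the example, and the length limit is read as "at most 64 characters".

```python
MAX_NAME_LENGTH = 64  # assumption: the slide's limit, read as "at most 64 characters"


def is_valid_variable_name(name, existing_names=()):
    """Check a candidate name against the three rules listed on the slide."""
    # Rule 1: the first character must be alphabetic.
    if not name or not name[0].isalpha():
        return False
    # Rule 3: spaces are not allowed.
    if " " in name:
        return False
    # Rule 2 (length part): the name must not exceed the length limit.
    if len(name) > MAX_NAME_LENGTH:
        return False
    # Rule 2 (uniqueness part): compared case-insensitively here (an assumption).
    taken = {existing.lower() for existing in existing_names}
    return name.lower() not in taken


for candidate in ["height", "1height", "frame size", "gender"]:
    print(candidate, is_valid_variable_name(candidate, existing_names=["gender"]))
```

Only the three rules from the slide are checked; SPSS may apply further restrictions that are not covered here.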
Variable View window: Type
• Type
– Click on the ‘type’ box. The two basic types of variables that you
will use are numeric and string. This column enables you to
specify the type of variable.
Variable View window: Width
• Width
– Width allows you to determine the number of characters SPSS will allow to be entered for the variable
Variable View window: Decimals
• Decimals
– Number of decimals
– It has to be less than or equal to 16
3.14159265
Variable View window: Label
• Label
– You can specify the details of the variable
– You can write characters with spaces up to 256
characters
Variable View window: Values
• Values
– This is used and to suggest which
numbers represent which categories when
the variable represents a category
Defining the value labels
• Click the cell in the values column
• The value and the label can each be up to 60 characters.
• After defining the values, click ‘Add’ and then click OK.
Practice 1
• How would you put the following information into SPSS?
Value = 1 represents Male and Value = 2 represents Female
Name Gender Height
JAUNITA 2 5.4
SALLY 2 5.3
DONNA 2 5.6
SABRINA 2 5.7
JOHN 1 5.7
MARK 1 6
ERIC 1 6.4
BRUCE 1 5.9
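To make the coding scheme concrete, here is a rough Python sketch of the same table, with the value labels kept separate from the stored codes in the way the Values column does. The names `GENDER_LABELS`, `CASES`, and `with_labels` are invented for this illustration and are not SPSS features.

```python
# Value labels for the coded Gender variable, as stated in Practice 1.
GENDER_LABELS = {1: "Male", 2: "Female"}

# One tuple per case: (Name, Gender code, Height), copied from the table above.
CASES = [
    ("JAUNITA", 2, 5.4),
    ("SALLY", 2, 5.3),
    ("DONNA", 2, 5.6),
    ("SABRINA", 2, 5.7),
    ("JOHN", 1, 5.7),
    ("MARK", 1, 6.0),
    ("ERIC", 1, 6.4),
    ("BRUCE", 1, 5.9),
]


def with_labels(cases):
    """Replace each stored gender code by its value label for display."""
    return [(name, GENDER_LABELS[code], height) for name, code, height in cases]


for row in with_labels(CASES):
    print(row)
```

The stored data stay numeric; the labels are applied only when the rows are displayed.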
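Practice 1 (Solution Sample)
• In Variable View, define Name as a string variable and Gender and Height as numeric variables; give Gender the value labels 1 = Male and 2 = Female. Then enter the eight cases in Data View.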
Saving the data
• To save the data file you created, click ‘File’ and then ‘Save As.’
You can save the file in a different format by choosing from “Save as type.”
Import Files In SPSS
Three types of data:
1. SPSS files or SPSS data
2. Excel files or Excel data
3. Text files or text data
1. Opening SPSS Files
[Screenshots: numbered click sequence for opening an SPSS data file, and the resulting window]
2. Opening Excel Files
[Screenshots: numbered click sequences for opening an Excel file]
Selection of One or More Than One Variable
[Screenshots: numbered click sequence for selecting and changing variables, and the opened window]
3. Opening Text Files
[Screenshots: 1st to 4th steps of opening a text file]
Sorting the data
• Click ‘Data’ and then click Sort Cases
Sorting the data (cont’d)
• Double Click ‘Name of the students.’ Then click
ok.
Practice
• How would you sort the data by the
‘Height’ of students in descending order?
• Answer
– Click data, sort cases, double click ‘height of
students,’ click ‘descending,’ and finally click
ok.
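For readers who want to see what a descending sort does to the cases, the following Python sketch mimics it outside SPSS. The helper `sort_cases` is a made-up name, and the sample rows reuse the names and heights from Practice 1 purely as example input.

```python
# (Name, Height) pairs reused from Practice 1 as sample input.
students = [
    ("JAUNITA", 5.4), ("SALLY", 5.3), ("DONNA", 5.6), ("SABRINA", 5.7),
    ("JOHN", 5.7), ("MARK", 6.0), ("ERIC", 6.4), ("BRUCE", 5.9),
]


def sort_cases(cases, key_index, descending=False):
    """Return a new list of cases ordered by the chosen column."""
    return sorted(cases, key=lambda case: case[key_index], reverse=descending)


by_height = sort_cases(students, key_index=1, descending=True)
for name, height in by_height:
    print(name, height)
```

Like Sort Cases, this reorders whole rows, so each name stays attached to its height.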
Transforming data
• Click ‘Transform’ and then click ‘Compute Variable…’
Transforming data (cont’d)
• Example: Adding a new variable named ‘lnheight’ which is
the natural log of height
– Type in lnheight in the ‘Target Variable’ box. Then type in
‘ln(height)’ in the ‘Numeric Expression’ box. Click OK
Transforming data (cont’d)
• A new variable ‘lnheight’ is added to the table
Practice
• Create a new variable named “sqrtheight”
which is the square root of height.
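• Answer
– Click ‘Transform’ and ‘Compute Variable…’. Type in sqrtheight in the
‘Target Variable’ box and ‘sqrt(height)’ in the ‘Numeric Expression’ box. Click OK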
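The two transformations in this section can be sketched in Python as follows. This is an illustration of the arithmetic only, not SPSS syntax; `compute_variable` is a hypothetical helper, and the heights are the Practice 1 values used as sample input.

```python
import math

# Heights from Practice 1, used here only as sample input.
heights = [5.4, 5.3, 5.6, 5.7, 5.7, 6.0, 6.4, 5.9]


def compute_variable(values, expression):
    """Apply a numeric expression to every case, yielding a new column."""
    return [expression(value) for value in values]


lnheight = compute_variable(heights, math.log)     # natural log of height
sqrtheight = compute_variable(heights, math.sqrt)  # square root of height

for h, ln_h, sqrt_h in zip(heights, lnheight, sqrtheight):
    print(h, round(ln_h, 4), round(sqrt_h, 4))
```

Each new variable has one value per case, which is why it appears as an extra column in the data table.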
The basic analysis
The basic analysis of SPSS
• Frequencies – This analysis produces frequency tables showing
frequency counts and percentages of the values of individual variables.
• Descriptives – This analysis shows the maximum, minimum,
mean, and standard deviation of the variables
• Linear regression analysis – Linear Regression estimates the coefficients of
the linear equation
Frequency distributions and graphing
• Levels of Measurement
• Frequency distributions
• Graphing data
Stages in scientific investigation
Obtain your data:
Data usually come from a sample taken from a population.
Descriptive statistics:
Summarize the statistical information in the data.
Inferential statistics:
Use data from a sample to reveal characteristics of the
population from which the sample data were selected.
Levels (scales) of measurement
• Nominal Scale: Consists of a set of categories that have different names.
– Measurements on a nominal scale label and categorize observations, but do not make any quantitative distinctions between observations.
– Example: Eye color: blue, green, brown, hazel
• Ordinal Scale: Consists of a set of categories that are organized in an ordered sequence.
– Measurements on an ordinal scale rank observations in terms of size or magnitude.
– Example: T-shirt size: Small, Med, Lrg, XL, XXL
• Interval Scale: Consists of ordered categories where all of the categories are intervals of exactly the same size.
– With an interval scale, equal differences between numbers on the scale reflect equal differences in magnitude.
– Ratios of magnitudes are not meaningful.
– Example: Fahrenheit temperature scale. 40º is not twice as hot as 20º.
• Ratio Scale: An interval scale with the additional feature of an absolute zero point.
– With a ratio scale, ratios of numbers DO reflect ratios of magnitude.
SPSS doesn’t distinguish between interval and ratio scales; it collapses
them into ‘scale’ measurements
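A short numeric check may help show why ratios are not meaningful on an interval scale. The Python sketch below converts the two Fahrenheit readings to kelvin, a temperature scale with an absolute zero, which is brought in here only for comparison and is not part of the original slides. The function name is made up for the example.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit reading to kelvin (a scale with an absolute zero)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Ratio of the raw Fahrenheit numbers: suggests "twice as hot".
naive_ratio = 40.0 / 20.0

# Ratio of the same two temperatures measured from absolute zero.
kelvin_ratio = fahrenheit_to_kelvin(40.0) / fahrenheit_to_kelvin(20.0)

print("Ratio of Fahrenheit readings:", naive_ratio)
print("Ratio on the kelvin scale:", round(kelvin_ratio, 3))
```

The Fahrenheit ratio depends on where the arbitrary zero happens to sit, so only differences, not ratios, carry meaning on that scale.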
Distributions
• The data that we have entered into our SPSS
files form distributions.
• Each column of information in the data view
corresponds to the scores of a variable for the
individuals in our sample.
Levels of measurement
1. Nominal (categorical or frequency data):
When numbers are used as names.
e.g. street numbers, footballers' numbers.
All we can do with nominal data is count how often each
number occurs (i.e. get frequencies of categories).
2. Ordinal
When numbers are used as ranks.
e.g. order of finishing in a race: the first three finishers are
"1", "2" and "3", but the difference between "1" and "2" is
unlikely to be the same as between "2" and "3".
3. Interval
When measurements are made on a scale with equal intervals
between points on the scale
e.g. temperature on Celsius scale
4. Ratio
When measurements are made on a scale with equal intervals
between points on the scale, and the scale has a true zero
point.
e.g. height, weight, time, distance.
Nominal data masquerading as scale
measurements
SPSS uses numbers as codes for nominal data.
Here “1” = “male” and “2” = “female”. These are names, not
numbers.
Frequency distributions
50 scores on a statistics exam.
84 82 72 70 72
80 62 96 86 68
68 87 89 85 82
87 85 84 88 89
86 86 78 70 81
70 86 88 79 69
79 61 68 75 77
90 86 78 89 81
67 91 82 73 77
80 78 76 86 83
Raw (ungrouped) Frequency Distribution
Score Freq Score Freq Score Freq Score Freq
96 1 86 6 76 1 66 0
95 0 85 2 75 1 65 0
94 0 84 2 74 0 64 0
93 0 83 1 73 1 63 0
92 0 82 3 72 2 62 1
91 1 81 2 71 0 61 1
90 1 80 2 70 3
89 3 79 2 69 1
88 2 78 3 68 3
87 2 77 2 67 1
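The raw frequency table can be reproduced by counting each score. The Python sketch below is one possible way to do it; `SCORES` holds the 50 exam scores listed above, and `raw_frequency` is a hypothetical helper name.

```python
from collections import Counter

# The 50 statistics exam scores from the slide, row by row.
SCORES = [
    84, 82, 72, 70, 72,
    80, 62, 96, 86, 68,
    68, 87, 89, 85, 82,
    87, 85, 84, 88, 89,
    86, 86, 78, 70, 81,
    70, 86, 88, 79, 69,
    79, 61, 68, 75, 77,
    90, 86, 78, 89, 81,
    67, 91, 82, 73, 77,
    80, 78, 76, 86, 83,
]


def raw_frequency(scores):
    """List (score, frequency) from the highest to the lowest score, zeros included."""
    counts = Counter(scores)
    return [(s, counts.get(s, 0)) for s in range(max(scores), min(scores) - 1, -1)]


for score, freq in raw_frequency(SCORES):
    print(score, freq)
```

Scores that never occur are listed with a frequency of 0, matching the layout of the table.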
Grouped Frequency Distributions
Class interval width = 3
Score Frequency
94-96 1
91-93 1
88-90 6
85-87 10
82-84 6
79-81 6
76-78 6
73-75 2
70-72 5
67-69 5
64-66 0
61-63 2
Class interval width = 5
Score Frequency
95-99 1
90-94 2
85-89 15
80-84 10
75-79 9
70-74 6
65-69 5
60-64 2
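Grouping can be sketched as counting how many scores fall in each class interval. The Python example below is an illustration under stated assumptions: `grouped_frequency` is an invented helper, and the interval start points (61 for width 3, 60 for width 5) are taken from the tables in this section.

```python
# The 50 statistics exam scores used in this section.
SCORES = [
    84, 82, 72, 70, 72, 80, 62, 96, 86, 68,
    68, 87, 89, 85, 82, 87, 85, 84, 88, 89,
    86, 86, 78, 70, 81, 70, 86, 88, 79, 69,
    79, 61, 68, 75, 77, 90, 86, 78, 89, 81,
    67, 91, 82, 73, 77, 80, 78, 76, 86, 83,
]


def grouped_frequency(scores, low, high, width):
    """Count scores in consecutive intervals [start, start + width - 1], highest first."""
    table = []
    for start in range(low, high + 1, width):
        end = start + width - 1
        count = sum(1 for s in scores if start <= s <= end)
        table.append(((start, end), count))
    return table[::-1]


for (start, end), count in grouped_frequency(SCORES, 61, 96, 3):
    print(f"{start}-{end}", count)

for (start, end), count in grouped_frequency(SCORES, 60, 99, 5):
    print(f"{start}-{end}", count)
```

Changing the interval width changes how coarse the picture is, but every score is still counted exactly once.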
[Figure: Raw Frequency of Scores (Class Interval = 3); x-axis: Score, y-axis: Raw Frequency]
[Figure: Raw Frequency of Scores (Class Interval = 5); x-axis: Score, y-axis: Raw Frequency]
Cumulative Frequency Distributions
Raw freq. = total in each cell
Cumulative freq. = each cell total + all preceding cell totals
Cumulative % = cumulative freq. as % of total
Score   Raw Freq.   Cumulative freq.     Cumulative %
94-96   1           50                   100
91-93   1           49                   98
88-90   6           48                   96
85-87   10          42                   84
82-84   6           32                   64
79-81   6           26                   52
76-78   6           20                   40
73-75   2           14 (= 2+5+5+0+2)     28 (= (14/50)*100)
70-72   5           12 (= 5+5+0+2)       24 (= (12/50)*100)
67-69   5           7 (= 5+0+2)          14 (= (7/50)*100)
64-66   0           2 (= 0+2)            4 (= (2/50)*100)
61-63   2           2 (= 2)              4 (= (2/50)*100)
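The running totals can be checked with a few lines of Python. This sketch starts from the grouped raw frequencies (class interval width 3), listed from the lowest interval upward; the variable names are chosen for the example only.

```python
from itertools import accumulate

# Class intervals and raw frequencies from the table, lowest interval first.
intervals = ["61-63", "64-66", "67-69", "70-72", "73-75", "76-78",
             "79-81", "82-84", "85-87", "88-90", "91-93", "94-96"]
raw = [2, 0, 5, 5, 2, 6, 6, 6, 10, 6, 1, 1]

# Cumulative frequency: each cell total plus all preceding (lower) cell totals.
cumulative = list(accumulate(raw))
total = cumulative[-1]

# Cumulative frequency expressed as a percentage of the total.
cumulative_pct = [100 * c / total for c in cumulative]

for label, r, c, pct in zip(intervals, raw, cumulative, cumulative_pct):
    print(label, r, c, pct)
```

The last cumulative value always equals the total number of scores, so the last percentage is 100.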
[Figure: Cumulative frequency graph; x-axis: Score, y-axis: Frequency (% total)]
Relative Frequency Distributions
Useful for comparing groups with different totals.
Relative frequency = (cell total/overall total) x 100
Group A: N = 50
Score    Raw Freq.   Rel. Freq.
96-100   3           6 %
91-95    4           8 %
86-90    11          22 %
81-85    15          30 %
76-80    8           16 %
71-75    4           8 %
66-70    2           4 %
61-65    3           6 %
Total:   50          100 %
Group B: N = 80
Score    Raw Freq.   Rel. Freq.
96-100   3           3.75 %
91-95    4           5.00 %
86-90    18          22.50 %
81-85    24          30.00 %
76-80    11          13.75 %
71-75    9           11.25 %
66-70    5           6.25 %
61-65    6           7.50 %
Total:   80          100 %
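The formula can be applied to both groups in a few lines of Python. In this sketch the raw counts are copied from the Group A and Group B tables, highest interval first, and `relative_frequency` is a hypothetical helper name.

```python
def relative_frequency(raw_counts):
    """Relative frequency = (cell total / overall total) x 100, per cell."""
    total = sum(raw_counts)
    return [100 * count / total for count in raw_counts]


# Raw frequencies for the intervals 96-100 down to 61-65.
group_a = [3, 4, 11, 15, 8, 4, 2, 3]    # N = 50
group_b = [3, 4, 18, 24, 11, 9, 5, 6]   # N = 80

rel_a = relative_frequency(group_a)
rel_b = relative_frequency(group_b)

for a, b in zip(rel_a, rel_b):
    print(f"{a:.2f} %   {b:.2f} %")
```

Because each column is expressed as a share of its own total, the two groups can be compared even though their sizes differ.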
Raw Frequency and Relative Frequency Distributions
Only the scale of the graph changes - not the pattern of frequencies.
[Figure: Raw Frequencies of Scores (N = 50); x-axis: Score, y-axis: Raw frequency]
[Figure: Relative Frequencies of Scores (N = 50); x-axis: Score, y-axis: Relative frequency (%)]
Effects of aspect ratio and scale on graph appearance
[Figure, four panels, each titled “Frequency of accidents”; x-axis: Type of car driven (volvo, mini, porsche), y-axis: No. of accidents per year]
(a) A graph aimed at giving an accurate impression...
(b) A tall thin graph exaggerates apparent differences...
(c) A low wide graph minimises apparent differences...
(d) Starting the scale at a value other than zero can also
exaggerate apparent differences.
Graphing averages
If plotting averages, always include a measure of how scores are spread
out around the average.
[Figure: Mean annual insurance premium (+/- 1 S.D.) by type of driver (porsche drivers, skoda drivers)]
Opening the sample data
• Open ‘Employee data.sav’ from the SPSS “Samples” folder
– Go to “File,” “Open,” and click “Data”
• Go to the “Program Files,” “SPSSInc,” “SPSS17,” and
“Samples” folder.
• Open the “Employee Data.sav” file
Frequencies
• Click ‘Analyze,’ ‘Descriptive statistics,’ then
click ‘Frequencies’
• Click gender and put it into the variable box.
• Click ‘Charts.’
• Then click ‘Bar charts’ and click ‘Continue.’
• Finally click OK in the Frequencies box.
Using the Syntax editor
• Click ‘Analyze,’ ‘Descriptive statistics,’ then
click ‘Frequencies.’
• Put ‘Gender’ in the Variable(s) box.
• Then click ‘Charts,’ ‘Bar charts,’ and click
‘Continue.’
• Click ‘Paste.’
• Highlight the commands in the Syntax editor
and then click the run icon.
• You can do the same thing by right-clicking the
highlighted area and then clicking ‘Run
Current’
Practice
• Do a frequency analysis on the variable “minority”
• Create pie charts for it
• Do the same analysis using the syntax editor
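Answer
• Click ‘Analyze,’ ‘Descriptive statistics,’ and ‘Frequencies’; put ‘minority’ in the Variable(s) box; then click ‘Charts,’ ‘Pie charts,’ and ‘Continue.’
• Click OK, or click ‘Paste’ and run the pasted commands from the Syntax editor.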
Descriptives
• Click ‘Analyze,’ ‘Descriptive statistics,’ then
click ‘Descriptives…’
• Click ‘Educational level’ and ‘Beginning
Salary,’ and put them into the variable box.
• Click ‘Options.’
• The options allow you to request other
descriptive statistics besides the mean and standard deviation.
• Click ‘variance’ and ‘kurtosis’
• Click ‘Continue’
• Finally click OK in the Descriptives box. You will
be able to see the result of the analysis.
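As a rough illustration of what a descriptives table summarises, the Python sketch below computes the minimum, maximum, mean, standard deviation, and variance for a small sample. It is not SPSS output: `describe` is a made-up helper, the heights from Practice 1 stand in for the Employee data variables, and the n - 1 (sample) formulas are an assumption. Kurtosis is left out of the sketch.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for one numeric variable."""
    return {
        "n": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),      # sample SD (divides by n - 1)
        "variance": statistics.variance(values),  # sample variance (n - 1)
    }


# Sample input only: the heights from Practice 1, not the Employee data.
heights = [5.4, 5.3, 5.6, 5.7, 5.7, 6.0, 6.4, 5.9]
summary = describe(heights)

for name, value in summary.items():
    print(name, value)
```

The standard deviation is the square root of the variance, so the two rows always describe the same spread in different units.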
Regression Analysis
• Click ‘Analyze,’ ‘Regression,’ then click
‘Linear’ from the main menu.
• For example, let’s analyze the model $salbegin = \beta_0 + \beta_1 \cdot edu$
• Put ‘Beginning Salary’ as Dependent and ‘Educational Level’ as
Independent. (The window shown is from SPSS 16; SPSS 17 has a different window.)
• Clicking OK gives the result
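To show what estimating the coefficients of a linear equation involves, here is a minimal least-squares sketch in Python for one independent variable. It is not the SPSS procedure itself: `fit_line` is an invented helper, and the `edu` and `salbegin` values are hypothetical numbers constructed to lie exactly on a line, not values from the Employee data file.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: years of education and beginning salary.
edu = [8, 12, 12, 15, 16, 19]
salbegin = [9000, 13000, 13000, 16000, 17000, 20000]

b0, b1 = fit_line(edu, salbegin)
print("intercept:", b0)
print("slope:", b1)
```

The slope is the estimated change in beginning salary for one additional year of education, and the intercept is the predicted value when education is zero.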
Plotting the regression line
• Click ‘Graphs,’ ‘Legacy Dialogs,’
‘Interactive,’ and ‘Scatterplot’ from the
main menu.
• Drag ‘Current Salary’ into the vertical axis box
and ‘Beginning Salary’ into the horizontal axis
box.
• Click the ‘Fit’ tab. Make sure the Method is set to
Regression in the Fit box. Then click ‘OK’. (The window
shown is from SPSS 16; SPSS 17 has a different window.)
Practice
• Find out whether or not the previous
experience of workers has any effect
on their beginning salary.
– Take the variables “salbegin” and
“prevexp” as the dependent and independent
variables, respectively.
• Plot the regression line for the above
analysis using the “scatter plot” menu.
Answer
• Click on the “fit” tab to make
sure the method is regression