INTRODUCTION TO SPSS - University of...

Statistical Package for the Social Sciences

• SPSS for Windows, Version 16.0
• First version released in 1968
• In 1975, SPSS Statistics was designed

INTRODUCTION TO SPSS

Objective

• About the four windows in SPSS

• The basics of managing data files

• The basic analysis in SPSS

Introduction: What is SPSS?

• Originally an acronym for Statistical Package for the Social Sciences, it now stands for Statistical Product and Service Solutions
• One of the most popular statistical packages, which can perform data manipulation and analysis with simple instructions

Data Classification, Level of Measurement, and Statistics / Test

1. Categorical
• Data classification: category data, e.g. Gender: Male (M), Female (F); Religion/Nationality
• Level of measurement: 1. Nominal Level. Category data: Gender, Name of City, Political Party. There is no distance property.
• Statistics / test: Qualitative Analysis (non-parametric test)

2. Rank
• Data classification: ranking or ordering data
– Income Level: 1 = Very Low Income, 2 = Low Income, 3 = Medium Income, 4 = High Income, 5 = Very High Income
– Education Level: 1 = Primary, 2 = Secondary, 3 = High School, 4 = College, 5 = University
• Level of measurement: 2. Ordinal Level. Ranking data and opinion/attitude data: 1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree
• Statistics / test: Median; Chi-square test (χ² test); Spearman Rank Correlation etc.; Bar Chart; Pie Chart

3. Metric
• Data classification: data in a certain unit of measurement: Income = $215, Expenditure = $234, Agri. Productivity = 245 kg/unit, Speed = 125.56 km/hr etc.
• Level of measurement: 3. Interval Level. Fixed and equal units; distances between categories; zero point is not available. Metric data: G.P.A. = 3.52; Thermometer: 40 °F, 0 °C etc.
• Statistics / test: Quantitative Analysis (parametric test): Mean, S.D.; t-test (two sets of data); ANOVA (F-test) (more than two groups of data)
• Level of measurement: 4. Ratio Level. Defined zero point. Metric data: real numbers: Age, Income, Expenditure, Area, Product, Height, Weight, Volume etc.
• Statistics / test: Pearson’s Correlation; Multiple Regression; Factor Analysis; Correlation Matrix; Index Number etc.; Histogram

Opening SPSS for Windows

SPSS Windows

The Four Windows:
1. Data editor
2. Output viewer
3. Syntax editor
4. Script window

First Window: Data Editor

• Data Editor
Spreadsheet-like system for defining, entering, editing, and displaying data. The saved file has the extension “*.sav”.

Second Window: Output Viewer

• Output Viewer
Displays output and errors. The saved file has the extension “*.spv”.

Third Window: Syntax Editor

• Syntax Editor
Text editor for syntax composition. The saved file has the extension “*.sps”.

Fourth Window: Script Window

• Script Window
Provides the opportunity to write full-blown programs in a BASIC-like language. Text editor for syntax composition. The saved file has the extension “*.sbs”.

The basics of managing data files

Opening SPSS

• The default window will have the data editor

• There are two sheets in the window:
1. Data view
2. Variable view

Data View window

• The Data View window
This sheet is visible when you first open the Data Editor, and it contains the data.
• Click on the tab labeled Variable View

Types of Variables

• Which variables would you consider when buying a second-hand bike?
– Brand (Trek, Raleigh)
– Type (road, mountain, racer)
– Components (Shimano, no name)
– Age
– Condition (excellent, good, poor)
– Price
– Frame size
– Number of gears

Variable View window

• This sheet contains information about the data set that is stored with the dataset
• Name
– The first character of the variable name must be alphabetic
– Variable names must be unique and no longer than 64 characters
– Spaces are NOT allowed
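The naming rules above can be checked mechanically. The following is a minimal Python sketch, not part of SPSS: the function name `is_valid_name` is hypothetical, the length limit is read here as "at most 64 characters", and uniqueness is assumed to ignore upper/lower case. It covers only the rules listed on this slide.

```python
def is_valid_name(name, existing=()):
    """Check a candidate variable name against the rules listed on the slide."""
    if not name:
        return False
    starts_with_letter = name[0].isalpha()   # first character must be alphabetic
    short_enough = len(name) <= 64           # assumed limit: at most 64 characters
    no_spaces = " " not in name              # spaces are not allowed
    # Assumption: names that differ only by case count as duplicates.
    unique = name.lower() not in {other.lower() for other in existing}
    return starts_with_letter and short_enough and no_spaces and unique


for candidate in ["height", "ln height", "2ndscore", "Gender"]:
    print(candidate, is_valid_name(candidate, existing=["name", "gender"]))
```

Here "ln height" fails because of the space, "2ndscore" because of its first character, and "Gender" because "gender" is already in use.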

Variable View window: Type

• Type

– Click on the ‘Type’ box. This column lets you specify the type of variable. The two basic types of variables that you will use are numeric and string.

Variable View window: Width

• Width

– Width allows you to determine the number of characters SPSS will allow to be entered for the variable

Variable View window: Decimals

• Decimals

– Number of decimals

– It has to be less than or equal to 16

3.14159265

Variable View window: Label

• Label

– You can specify the details of the variable
– A label may contain spaces and can be up to 256 characters long

Variable View window: Values

• Values

– This is used to indicate which numbers represent which categories when the variable represents a category

Defining the value labels

• Click the cell in the values column as shown below

• The value and the label can each be up to 60 characters long.
• After defining each value, click ‘Add’, and then click ‘OK’.

Practice 1

• How would you put the following information into SPSS?

Value = 1 represents Male and Value = 2 represents Female

Name     Gender  Height
JAUNITA  2       5.4
SALLY    2       5.3
DONNA    2       5.6
SABRINA  2       5.7
JOHN     1       5.7
MARK     1       6
ERIC     1       6.4
BRUCE    1       5.9

Practice 1 (Solution Sample)
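As a rough analogue outside SPSS, the Practice 1 setup can be sketched in Python: Name as a string variable, Gender as a numeric variable with value labels, and Height as a numeric variable. The names `GENDER_LABELS`, `cases` and `counts` are illustrative only and are not SPSS identifiers.

```python
# Value labels: the numeric codes stay in the data; the labels only describe them.
GENDER_LABELS = {1: "Male", 2: "Female"}

# One tuple per case: (Name, Gender code, Height), copied from the practice table.
cases = [
    ("JAUNITA", 2, 5.4),
    ("SALLY", 2, 5.3),
    ("DONNA", 2, 5.6),
    ("SABRINA", 2, 5.7),
    ("JOHN", 1, 5.7),
    ("MARK", 1, 6),
    ("ERIC", 1, 6.4),
    ("BRUCE", 1, 5.9),
]

# Show each case with the label in place of the code.
for name, gender, height in cases:
    print(f"{name:<8} {GENDER_LABELS[gender]:<6} {height}")

# Count cases per labelled category.
counts = {label: 0 for label in GENDER_LABELS.values()}
for _, gender, _ in cases:
    counts[GENDER_LABELS[gender]] += 1
print(counts)
```

Keeping the codes and the labels separate mirrors the Values column: analyses run on 1 and 2, while output shows Male and Female.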


Saving the data

• To save the data file you created, click ‘File’ and then click ‘Save As’. You can save the file in different formats by choosing an option under “Save as type.”

Import Files In SPSS

Three types of Data

1. SPSS Files or SPSS Data

2. Excel files or Excel Data

3. Text files or Text data

1. SPSS Files or SPSS Data

Resulting Windows

2. Opening Excel Files

Selecting One or More Variables

Open window

3. Opening Text Files
• 1st Step
• 2nd Step
• 3rd Step
• 4th Step

Sorting the data

• Click ‘Data’ and then click ‘Sort Cases’

Sorting the data (cont’d)

• Double-click ‘Name of the students’, then click ‘OK’.

Practice

• How would you sort the data by the ‘Height’ of students in descending order?
• Answer
– Click ‘Data’, ‘Sort Cases’, double-click ‘Height of students’, click ‘Descending’, and finally click ‘OK’.
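For comparison, the same descending sort can be sketched in Python. This assumes the data being sorted are the names and heights from Practice 1; the variable names are illustrative.

```python
# (Name, Height) pairs, assumed to be the Practice 1 data.
students = [
    ("JAUNITA", 5.4), ("SALLY", 5.3), ("DONNA", 5.6), ("SABRINA", 5.7),
    ("JOHN", 5.7), ("MARK", 6), ("ERIC", 6.4), ("BRUCE", 5.9),
]

# Sort by the second field (height), largest first.
by_height_desc = sorted(students, key=lambda case: case[1], reverse=True)

for name, height in by_height_desc:
    print(name, height)
```

The tallest student (ERIC, 6.4) comes first and the shortest (SALLY, 5.3) last.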

Transforming data

• Click ‘Transform’ and then click ‘Compute Variable…’

Transforming data (cont’d)

• Example: adding a new variable named ‘lnheight’, which is the natural log of height
– Type lnheight in the ‘Target Variable’ box. Then type ‘ln(height)’ in the ‘Numeric Expression’ box. Click ‘OK’.

Transforming data (cont’d)

• A new variable ‘lnheight’ is added to the table
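What the computed column contains can be illustrated with a short Python sketch. The heights are assumed to be the Practice 1 values, and the list name `lnheight` simply mirrors the target variable.

```python
import math

# Heights assumed to be the Practice 1 values.
heights = [5.4, 5.3, 5.6, 5.7, 5.7, 6, 6.4, 5.9]

# One natural-log value per case, as in the new 'lnheight' column.
lnheight = [math.log(h) for h in heights]

for h, ln_h in zip(heights, lnheight):
    print(h, round(ln_h, 4))
```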

Practice

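• Create a new variable named “sqrtheight”, which is the square root of height.
• Answer
– Type sqrtheight in the ‘Target Variable’ box. Then type ‘sqrt(height)’ in the ‘Numeric Expression’ box. Click ‘OK’.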

The basic analysis

The basic analysis of SPSS

• Frequencies – produces frequency tables showing frequency counts and percentages of the values of individual variables.
• Descriptives – shows the maximum, minimum, mean, and standard deviation of the variables.
• Linear regression analysis – estimates the coefficients of the linear equation.

Frequency distributions and graphing

Levels of Measurement

Frequency distributions

Graphing data

Stages in scientific investigation

Obtain your data:

Data usually come from a sample taken from a population.

Descriptive statistics:

Summary statistical information about the data.

Inferential statistics:

Use data from a sample to reveal characteristics of the population from which the sample was selected.

Levels (scales) of measurement

• Nominal Scale: Consists of a set of categories that have different names.
– Measurements on a nominal scale label and categorize observations, but do not make any quantitative distinctions between observations.
– Example: eye color (blue, green, brown, hazel)

• Ordinal Scale: Consists of a set of categories that are organized in an ordered sequence.
– Measurements on an ordinal scale rank observations in terms of size or magnitude.
– Example: T-shirt size (Small, Med, Lrg, XL, XXL)

• Interval Scale: Consists of ordered categories where all of the categories are intervals of exactly the same size.
– With an interval scale, equal differences between numbers on the scale reflect equal differences in magnitude.
– Ratios of magnitudes are not meaningful.
– Example: Fahrenheit temperature scale. 40º is not “twice as hot” as 20º.

• Ratio Scale: An interval scale with the additional feature of an absolute zero point.
– With a ratio scale, ratios of numbers DO reflect ratios of magnitude.

SPSS does not distinguish between interval and ratio scales; it collapses them into ‘scale’ measurements.

Distributions

• The data that we have entered into our SPSS files form distributions.
• Each column of information in the data view corresponds to the scores of a variable for the individuals in our sample.

Levels of measurement

1. Nominal (categorical or frequency data):
When numbers are used as names, e.g. street numbers, footballers' numbers.
All we can do with nominal data is count how often each number occurs (i.e. get frequencies of categories).

2. Ordinal
When numbers are used as ranks, e.g. order of finishing in a race: the first three finishers are "1", "2" and "3", but the difference between "1" and "2" is unlikely to be the same as between "2" and "3".

3. Interval
When measurements are made on a scale with equal intervals between points on the scale, e.g. temperature on the Celsius scale.

4. Ratio
When measurements are made on a scale with equal intervals between points on the scale, and the scale has a true zero point, e.g. height, weight, time, distance.

Nominal data masquerading as scale measurements

SPSS uses numbers as codes for nominal data.

Here “1” = “male” and “2” = “female”. These are names, not numbers.

Frequency distributions

50 scores on a statistics exam.

84 82 72 70 72

80 62 96 86 68

68 87 89 85 82

87 85 84 88 89

86 86 78 70 81

70 86 88 79 69

79 61 68 75 77

90 86 78 89 81

67 91 82 73 77

80 78 76 86 83

Raw (ungrouped) Frequency Distribution

Score Freq Score Freq Score Freq Score Freq

96 1 86 6 76 1 66 0

95 0 85 2 75 1 65 0

94 0 84 2 74 0 64 0

93 0 83 1 73 1 63 0

92 0 82 3 72 2 62 1

91 1 81 2 71 0 61 1

90 1 80 2 70 3

89 3 79 2 69 1

88 2 78 3 68 3

87 2 77 2 67 1
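The raw frequency table above can be reproduced with a short Python sketch that tallies the 50 exam scores. The variable names are illustrative.

```python
from collections import Counter

# The 50 exam scores, row by row as listed above.
scores = [
    84, 82, 72, 70, 72, 80, 62, 96, 86, 68,
    68, 87, 89, 85, 82, 87, 85, 84, 88, 89,
    86, 86, 78, 70, 81, 70, 86, 88, 79, 69,
    79, 61, 68, 75, 77, 90, 86, 78, 89, 81,
    67, 91, 82, 73, 77, 80, 78, 76, 86, 83,
]

# Counter returns 0 for scores that never occur, matching the zero rows.
freq = Counter(scores)

# List every possible score from highest to lowest, as in the table.
for score in range(max(scores), min(scores) - 1, -1):
    print(score, freq[score])
```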

Class interval width = 3

Score Frequency

94-96 1

91-93 1

88-90 6

85-87 10

82-84 6

79-81 6

76-78 6

73-75 2

70-72 5

67-69 5

64-66 0

61-63 2

Class interval width = 5

Score Frequency

95-99 1

90-94 2

85-89 15

80-84 10

75-79 9

70-74 6

65-69 5

60-64 2
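Both grouped tables follow from the same scores by changing the class interval width. A minimal Python sketch is given below; the function name `grouped_frequencies` and its `start` parameter (the lower limit of the lowest interval) are assumptions made for illustration, and `start` must not exceed the smallest score.

```python
from collections import Counter

scores = [
    84, 82, 72, 70, 72, 80, 62, 96, 86, 68,
    68, 87, 89, 85, 82, 87, 85, 84, 88, 89,
    86, 86, 78, 70, 81, 70, 86, 88, 79, 69,
    79, 61, 68, 75, 77, 90, 86, 78, 89, 81,
    67, 91, 82, 73, 77, 80, 78, 76, 86, 83,
]


def grouped_frequencies(values, width, start):
    """Return (low, high, count) per class interval, lowest interval first."""
    # Map each value to the index of the interval it falls in.
    counts = Counter((v - start) // width for v in values)
    n_bins = (max(values) - start) // width + 1
    return [
        (start + i * width, start + i * width + width - 1, counts[i])
        for i in range(n_bins)
    ]


# Print highest interval first, as in the tables above.
for width, start in [(3, 61), (5, 60)]:
    print(f"Class interval width = {width}")
    for low, high, count in reversed(grouped_frequencies(scores, width, start)):
        print(f"{low}-{high} {count}")
```

A wider interval gives fewer, fuller classes: twelve classes at width 3 against eight at width 5.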

Grouped Frequency Distributions

[Figure: bar chart, “Raw Frequency of Scores (Class Interval = 3)”; x-axis: Score (94-96 down to 61-63), y-axis: Raw frequency (0 to 12)]

[Figure: bar chart, “Raw Frequency of Scores (Class Interval = 5)”; x-axis: Score (95-99 down to 60-64), y-axis: Raw frequency (0 to 16)]

Cumulative Frequency Distributions

Score   Raw freq.   Cumulative freq.     Cumulative freq. (%)
94-96   1           50                   100
91-93   1           49                   98
88-90   6           48                   96
85-87   10          42                   84
82-84   6           32                   64
79-81   6           26                   52
76-78   6           20                   40
73-75   2           14 (= 2+5+5+0+2)     28 (= (14/50)*100)
70-72   5           12 (= 5+5+0+2)       24 (= (12/50)*100)
67-69   5           7 (= 5+0+2)          14 (= (7/50)*100)
64-66   0           2 (= 0+2)            4 (= (2/50)*100)
61-63   2           2 (= 2)              4 (= (2/50)*100)

Raw freq. = total in each cell
Cumulative freq. = each cell total + all preceding cell totals
Cumulative freq. (%) = cumulative freq. as % of total
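The cumulative columns are running totals of the raw frequencies, starting from the lowest class. A minimal Python sketch using the width-3 frequencies follows; the variable names are illustrative.

```python
from itertools import accumulate

# Class intervals and raw frequencies, lowest interval first.
intervals = ["61-63", "64-66", "67-69", "70-72", "73-75", "76-78",
             "79-81", "82-84", "85-87", "88-90", "91-93", "94-96"]
raw = [2, 0, 5, 5, 2, 6, 6, 6, 10, 6, 1, 1]

# Each cumulative value is the cell total plus all preceding cell totals.
cumulative = list(accumulate(raw))
total = cumulative[-1]

# Cumulative frequency expressed as a percentage of the total.
cumulative_pct = [100 * c / total for c in cumulative]

for label, f, c, pct in zip(intervals, raw, cumulative, cumulative_pct):
    print(label, f, c, pct)
```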

Cumulative frequency graph

[Figure: cumulative frequency curve; x-axis: Score (62 to 95), y-axis: Frequency (% total), 0 to 100]

Relative Frequency Distributions

Useful for comparing groups with different totals.

Group A: N = 50

Score    Raw Freq.   Rel. Freq.
96-100   3           6 %
91-95    4           8 %
86-90    11          22 %
81-85    15          30 %
76-80    8           16 %
71-75    4           8 %
66-70    2           4 %
61-65    3           6 %
Total:   50          100 %

Group B: N = 80

Score    Raw Freq.   Rel. Freq.
96-100   3           3.75 %
91-95    4           5.00 %
86-90    18          22.50 %
81-85    24          30.00 %
76-80    11          13.75 %
71-75    9           11.25 %
66-70    5           6.25 %
61-65    6           7.50 %
Total:   80          100 %

Relative frequency = (cell total/overall total) x 100
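The formula can be applied to both groups with a few lines of Python. This is a sketch only; the function name `relative_frequencies` is illustrative.

```python
def relative_frequencies(raw):
    """Relative frequency = (cell total / overall total) x 100."""
    total = sum(raw)
    return [100 * f / total for f in raw]


# Raw frequencies for the classes 96-100 down to 61-65.
group_a = [3, 4, 11, 15, 8, 4, 2, 3]   # N = 50
group_b = [3, 4, 18, 24, 11, 9, 5, 6]  # N = 80

rel_a = relative_frequencies(group_a)
rel_b = relative_frequencies(group_b)

print(rel_a)
print(rel_b)
```

Although the raw counts in the top two classes are identical (3 and 4), the relative frequencies differ (6 % and 8 % in Group A, 3.75 % and 5 % in Group B) because the group sizes differ.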

Raw Frequency and Relative Frequency Distributions

[Figure: bar chart, “Raw Frequencies of Scores (N = 50)”; x-axis: Score (96-100 down to 61-65), y-axis: Raw frequency (0 to 16)]

[Figure: bar chart, “Relative Frequencies of Scores (N = 50)”; x-axis: Score (96-100 down to 61-65), y-axis: Relative frequency (%) (0 to 35)]

Only the scale of the graph changes, not the pattern of frequencies.

Effects of aspect ratio and scale on graph appearance

[Figures (a) to (d): bar charts titled “Frequency of accidents”; x-axis: Type of car driven (volvo, mini, porsche), y-axis: No. of accidents per year]

(a) A graph aimed at giving an accurate impression...
(b) A tall thin graph exaggerates apparent differences...
(c) A low wide graph minimises apparent differences...
(d) Starting the scale at a value other than zero can also exaggerate apparent differences.

Graphing averages

If plotting averages, always include a measure of how scores are spread out around the average.

[Figure: bar chart with error bars; x-axis: Type of driver (porsche drivers, skoda drivers), y-axis: Mean annual insurance premium (+/- 1 S.D.), 0 to 3000]

Opening the sample data

• Open ‘Employee data.sav’ from the SPSS Samples folder
– Go to “File,” “Open,” and click “Data”
• Go to the “Program Files,” “SPSSInc,” “SPSS17,” and “Samples” folders.
• Open the “Employee data.sav” file

Frequencies

• Click ‘Analyze,’ ‘Descriptive statistics,’ then click ‘Frequencies’
• Click gender and put it into the variable box.
• Click ‘Charts.’
• Then click ‘Bar charts’ and click ‘Continue.’
• Finally, click ‘OK’ in the Frequencies box.

Using the Syntax editor

• Click ‘Analyze,’ ‘Descriptive statistics,’ then click ‘Frequencies.’
• Put ‘Gender’ in the Variable(s) box.
• Then click ‘Charts,’ ‘Bar charts,’ and click ‘Continue.’
• Click ‘Paste.’
• Highlight the commands in the Syntax editor and then click the run icon.
• You can do the same thing by right-clicking the highlighted area and then clicking ‘Run Current’

Practice

• Do a frequency analysis on the variable “minority”

• Create pie charts for it

• Do the same analysis using the syntax editor

Answer


Descriptives

• Click ‘Analyze,’ ‘Descriptive statistics,’ then click ‘Descriptives…’
• Click ‘Educational level’ and ‘Beginning Salary,’ and put them into the variable box.
• Click ‘Options’
• The options allow you to request other descriptive statistics besides the mean and standard deviation.
• Click ‘Variance’ and ‘Kurtosis’
• Click ‘Continue’
• Finally, click ‘OK’ in the Descriptives box. You will be able to see the result of the analysis.
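To make the requested statistics concrete, the sketch below computes them in Python for a small set of hypothetical salaries; these values are invented for illustration and are not taken from ‘Employee data.sav’. The kurtosis function uses the common bias-adjusted sample excess kurtosis formula; that this matches SPSS output digit for digit is an assumption.

```python
import statistics


def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (needs at least 4 values)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    fourth = sum(((v - mean) / sd) ** 4 for v in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


# Hypothetical beginning salaries, NOT the real sample data.
salbegin = [12000, 13500, 15000, 15750, 18000, 21000, 27000]

summary = {
    "min": min(salbegin),
    "max": max(salbegin),
    "mean": statistics.mean(salbegin),
    "std": statistics.stdev(salbegin),
    "variance": statistics.variance(salbegin),
    "kurtosis": excess_kurtosis(salbegin),
}

for name, value in summary.items():
    print(name, round(value, 3))
```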

Regression Analysis

• Click ‘Analyze,’ ‘Regression,’ then click ‘Linear’ from the main menu.
• For example, let’s analyze the model
salbegin = β0 + β1 × edu
• Put ‘Beginning Salary’ as Dependent and ‘Educational Level’ as Independent. (The window shown is from SPSS 16; SPSS 17 has a different window.)
• Clicking ‘OK’ gives the result
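The coefficients reported for a one-predictor model are the ordinary least squares intercept and slope. The Python sketch below shows that calculation; the function name `fit_line` is illustrative and the data are hypothetical, not values from ‘Employee data.sav’.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: years of education and beginning salary.
edu = [8, 12, 12, 15, 16, 19]
salbegin = [10000, 12000, 13000, 16000, 18000, 25000]

intercept, slope = fit_line(edu, salbegin)
print("intercept:", round(intercept, 2))
print("slope:", round(slope, 2))

# Predicted beginning salary for each observed education level.
predicted = [intercept + slope * e for e in edu]
```

The slope is read as the estimated change in beginning salary for each additional year of education.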

Plotting the regression line

• Click ‘Graphs,’ ‘Legacy Dialogs,’ ‘Interactive,’ and ‘Scatterplot’ from the main menu.
• Drag ‘Current Salary’ into the vertical axis box and ‘Beginning Salary’ into the horizontal axis box.
• Click the ‘Fit’ tab. Make sure the Method is set to Regression in the Fit box, then click ‘OK’. (The window shown is from SPSS 16; SPSS 17 has a different window.)

Practice

• Find out whether the previous experience of workers has any effect on their beginning salary.
– Take the variables “salbegin” and “prevexp” as the dependent and independent variables, respectively.
• Plot the regression line for the above analysis using the “scatter plot” menu.

Answer

Click on the “Fit” tab to make sure the method is regression
