SPSS Tutorial

Harvard-MIT Data Center

This tutorial was written as an introductory guide to SPSS for social scientists and social science students, including scholars performing quantitative research and undergraduates working on their senior theses. A more general guide is provided with the Windows version of SPSS. (Once you have opened a data set, simply click on Help and then Tutorials, and then the purple book with Tutorials next to it; if you do not see the purple book, click on Contents on the top right.) For more information on SPSS, you can also go to the SPSS Homepage.

This tutorial will take you through the steps of testing a simple research question: Do voters' opinions on how the president is handling the economy influence which party they will vote for in House elections? If a voter believes that the Democratic president is handling the economy poorly, for example, is she more likely to vote for the Republican House candidate?

We can divide this research question into two variables. The dependent variable, or the variable we are trying to explain, is the vote in House elections. The independent variable, or the variable that is supposed to be influencing the dependent variable, is voter opinion on how the president is handling the economy. However, a relationship between opinions on the economy and the vote might be "spurious." Maybe one's party identification (whether one calls oneself a Republican or Democrat) drives both the vote in House elections and opinions on how the president is handling the economy. In this case, party identification is a control variable; it is another variable that might be influencing this relationship and that needs to be "held constant."

We will test this question by looking at the American National Election Study of 1998. The American National Election Study (often known by the acronyms "ANES" or just "NES") is a telephone survey of voting-age Americans conducted every two years by the University of Michigan. NES data sets are widely used by political scientists for studying American elections (more often than Gallup data, for example).

(The Harvard-MIT Data Center is located in the Government Department at Harvard University. Please check out our homepage.)


SPSS Tutorial http://www.hmdc.harvard.edu/projects/SPSS_Tutorial/spsstut.shtml

1 of 84 5/1/2013 2:03 PM


This tutorial is divided into a few sections. It begins with an introduction to why one would use SPSS. Then, it outlines how to read a data set and use it to produce basic statistics. It does so by looking at actual data, the American National Election Study of 1998, in an attempt to answer a real research question.

There are four main steps to manipulating data with SPSS:

1. Reading Data, or how to translate raw data or data in another form into SPSS;

2. Transforming Data, or how to either create new variables or change the values of existing variables;

3. Defining Variables, or how to put labels onto data so that people can understand it, and how to structure data so that SPSS knows how to read it properly;

4. Creating Tables.

Most SPSS users prefer to use its Windows graphic interface, that is, pointing with the mouse and clicking on the options they want. At Harvard, those who want the greater control of typing in commands tend to use other statistical packages. Nonetheless, SPSS provides a way not only to type commands but also to switch between this syntax editor and the Windows point-and-click method. While this tutorial will focus on the latter, the command code will be mentioned briefly as well.


SPSS is the statistical package most widely used by political scientists. There seem to be several reasons why:

1. Force of habit: SPSS has been around since the late 1960s. (Political scientist Norman Nie, who co-authored The Changing American Voter with Sidney Verba, developed it. "SPSS" originally stood for "Statistical Package for the Social Sciences," but the name has since been changed to reflect the marketing of SPSS outside the academic community.)

2. Of the major packages, it seems to be the easiest to use for the most widely used statistical techniques;

3. One can use it with either a Windows point-and-click approach or through syntax (i.e., writing out SPSS commands). Each has its own advantages, and the user can switch between the approaches;

4. Many of the widely used social science data sets come with an easy method to translate them into SPSS; this significantly reduces the preliminary work needed to explore new data.

There are also two important limitations that deserve mention at the outset:

1. SPSS users have less control over statistical output than, for example, Stata or Gauss users. For novice users, this hardly causes a problem. But once a researcher wants greater control over the equations or the output, she or he will need to either choose another package or learn techniques for working around SPSS's limitations;

2. SPSS has problems with certain types of data manipulations, and it has some built-in quirks that seem to reflect its early creation. The best-known limitation is its weak lag functions, that is, how it transforms data across cases. For new users working off of standard data sets, this is rarely a problem. But once a researcher begins wanting to significantly alter data sets, he or she will have to either learn a new package or develop greater skills at manipulating SPSS.
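
To make "transforming data across cases" concrete, here is a minimal sketch of the SPSS LAG function, which copies a value from an earlier case into the current one. The variable names (score, lag1, lag2) are hypothetical and do not come from the NES data, and the sketch assumes the cases are already sorted in the order that matters.

```spss
* Hypothetical variables: score is an existing numeric variable.
* lag1 holds the value of score from the previous case.
COMPUTE lag1 = LAG(score).
* lag2 holds the value of score from two cases back.
COMPUTE lag2 = LAG(score, 2).
EXECUTE.
```

The earliest cases have no prior case to draw from, so the result there is system-missing.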

Overall, SPSS is a good first statistical package for people wanting to perform quantitative research in social science because it is easy to use and because it can be a good starting point for learning more advanced statistical packages.


Originally, SPSS was written like a programming language. Users wrote SPSS syntax (often on a mainframe computer and even with key-punch cards) that performed the tasks they wanted. In SPSS for Windows, users can still use syntax through the syntax editor. They would:

a) Open the syntax window by clicking on File, dragging down to New, and choosing Syntax;

b) Type the SPSS syntax that they want to run;

c) Click on Run and drag down to All. (Alternatively, if they want to run only a few commands, they would highlight those commands, click on Run, and drag down to Selection.)


We will introduce the syntax for every step that we take in this tutorial, except the last one. We have included the complete syntax below as an example. (Please notice that every command ends with a period.)

If you want to learn more about SPSS syntax, your computer lab might have SPSS syntax guides; try the base guide first. You can also find online syntax guides by clicking on Help, dragging down to Syntax Guide, and choosing Base. Finally, by clicking on Help, dragging down to Topics, and then choosing Index, you can find the syntax for particular tasks.


* Create the three working variables from the original NES variables.
Compute clinteco = v980219.
Recode v980336 (1=1)(2=3)(3=2)(4=7)(5=2)(8=8)(9=9) into party3.
Compute hvote = v980313.

* Label the independent variable and flag its missing values.
Variable Label clinteco "Approval of Clinton, Economy".
Value Labels clinteco
   1 "Strongly Approve"  2 "Not Strongly Approve"  4 "Not Strongly Disapprove"
   5 "Strongly Disapprove"  8 "DK"  9 "NA"  0 "Inappropriate".
Missing Values clinteco (0,8,9).

* Label the control variable and flag its missing values.
Variable Label party3 "Party Identification".
Value Labels party3
   1 "Democrat"  2 "Ind./ No Pref."  3 "Republican"  7 "Other Party"
   8 "Don't Know"  9 "No Answer Given".
Missing Values party3 (7, 8, 9).

* Label the dependent variable and flag its missing values.
Variable Label hvote "Vote in House Election".
Value Labels hvote
   1 "Democratic Candidate"
   2 "Republican Candidate"
   3 "Third Party/Independent Cand."
   7 "Name given not on candidate list"
   8 "Don't Know or Refuse"
   9 "No Answer Given"
   0 "Inappropriate, Didn't Vote".
Missing Values hvote (7 thru 9, 0).

* Produce the frequency tables and the two crosstabulations.
Frequencies variables = clinteco hvote party3.

Crosstabs tables = hvote by clinteco
   / cells = count column.

Crosstabs tables = hvote by clinteco by party3
   / cells = count column.


Under most circumstances, a data set is not simply handed to you. You would have to search through an archive and then download the data that will most likely help you complete your research project. The Harvard-MIT Data Center has such an archive, but because of licensing agreements with other organizations, we can give access to most data sets only to Harvard and MIT users. Therefore, we have separate directions on how to download data for Harvard and MIT users and for others. (Special thanks to the Center for Political Studies at the University of Michigan, which has generously made the American National Election Study of 1998 publicly available, so that people outside of Harvard and MIT can also use this tutorial.)*

* Sapiro, Virginia, Steven J. Rosenstone, and the National Election Studies. NATIONAL ELECTION STUDIES, 1998: POST-ELECTION STUDY [dataset]. Ann Arbor, MI: University of Michigan, Center for Political Studies [producer and distributor], 1999.

These materials are based on work supported by the National Science Foundation under Grant Nos. SBR-9707741, SBR-9317631, SES-9209410, SES-9009379, SES-8808361, SES-8341310, SES-8207580, and SOC77-08885.


Any opinions, findings and conclusions or recommendations expressed in these materials are those of the author(s) and do not necessarily reflect those of the National Science Foundation.


Downloading Data: Harvard-MIT Users:

Our goal is to download the "American National Election Study, 1998." We can accomplish this in several easy steps:

1. Go to the Harvard-MIT Data Center home page;

2. Double click [Search Holdings] on the top left;

3. In the page that opens, there will be a search option in the top left window under "Harvard-MIT Data Center Catalog." (a) Type some part of the study title in the box after "Search for," for example, "1998." (b) Click "go".


4. In the middle-left window, click the title of the study ("American National Election Study, 1998: Post-Election Survey"). DO NOT choose the cumulative data file or the pilot study;

5. The study's description will appear on the right. Click [Data];


6. Scroll down until you see "da2684_LREC.por" and next to it "Subset/Crosstabs". ("da" means that it is data; 2684 is the study number; LREC means the length of each record, or line of data; and "por" means that it can be imported into SPSS.) Click on "Subset."

7. (a) Under "Choose an output format:" choose "SPSS Portable File". (b) Then click [Create Subset]. (Do not worry about the options of selecting cases or variables. It is usually better to download the entire data set. That way, if you later decide that you need a variable that you hadn't originally considered important, you won't have to download the data set a second time.)


8. After a few minutes, a dialogue box with the title "Unknown File Type" will appear. It will ask what you want to do with the file that you are downloading. Click [Save File].

9. Next, save the data set. It will be saved as an "SPSS portable file," which means that it can be imported into an "SPSS save file," the file type that you want.

a) Find a location for saving the file. If you are working at the Harvard-MIT Data Center lab, you may save it on the C drive, but it will be deleted after you log off;

b) Name the file nes1998. The complete name will be nes1998.por;

c) Click [Save].


Congratulations! You have just successfully downloaded a data set from the Harvard-MIT Data Center archives and are ready to open it as an SPSS save file. (If you plan to use this data set past the tutorial, or if you might search for another data set in the HMDC archives, you might want more information on how to search for and download data.) Otherwise, move on to reading data in SPSS.


More on Searching for Data Sets at HMDC:

If you plan to use NES 1998 or any other data set for your own research, you will need more information both to select the right data set and to choose among often hundreds of variables in the data set. Here are a few hints.

1. After you search the archives and (a) click on a study that you might be interested in, (b) a description will appear that might help you decide whether this data set is worth exploring.

2. You can also search the codebook with key words. For example, if I want to find out what variables this study has on parties, I would (a) click [Data], (b) type "Party" under "Codebook (search)," and (c) click go. (If you decide to use the study, you could download the entire codebook by (z) clicking on cb#### LREC, with #### as the study number.)


The codebook search shown above would produce the following information:


Finally, the study that you want might have to be ordered. This is not as big a deal as it might appear. Simply click [Order] or order the data, fill in the form that appears, and then click [Submit Order].

For more information about searching the archives and HMDC in general, see the Harvard-MIT Data Center Frequently Asked Questions. Otherwise, move on to reading data in SPSS.


Downloading Data: Other Users:

Simply download the data set by clicking the following:

After a few minutes, a dialogue box with the title "Unknown File Type" will appear. It will ask what you want to do with the file that you are downloading. Click [Save File].

Next, save the data set. It will be saved as an "SPSS portable file," which means that it can be imported into an "SPSS save file," the file type that you want.

a) Find a location for saving the file. If you are working at the Harvard-MIT Data Center lab, you may save it on the C drive, but it will be deleted after you log off;

b) Name the file nes1998. The complete name will be nes1998.por;

c) Click [Save].


Now that you've created your portable file, called nes1998.por, you want to translate it into an SPSS save file, which you can use to manipulate the data and create tables.

1. Open SPSS on your computer. You should see an Excel-like file. (A dialogue box titled "What would you like to do?" might appear. Hit [Cancel].) If you are using version 9.0 or earlier of SPSS for Windows, the file should look like this:

If you are using version 10.0, it should look like this:


The primary differences between 9.0 and 10.0 are related to the items circled above. We will discuss these differences when they become relevant. Otherwise, the graphics will show the simpler 9.0 version, which was still the most commonly used version when this tutorial was written.

2. (a) Click on File and drag down to Open. (b) In the dialogue box that appears, set "Files of type" on the bottom to "SPSS portable (*.por)." (c) Find nes1998 and double click on it;


3. Data with variable names should now appear. This is called an SPSS "save file," which means nothing more than that it is in a form that SPSS can read. It is also untitled. Go to File and Save, and then save it as NES98; its complete name will be NES98.sav.
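
For readers who prefer the syntax editor, opening the portable file and saving it can be sketched with the IMPORT and SAVE commands. The file names follow the ones used in this tutorial; the sketch assumes both files sit in the folder SPSS is currently working in, so add a full path if yours are elsewhere.

```spss
* Read the portable file downloaded earlier.
IMPORT FILE = 'nes1998.por'.
* Write the working data out as an SPSS save file.
SAVE OUTFILE = 'NES98.sav'.
```

Once the save file exists, later sessions can open it directly with GET FILE = 'NES98.sav'.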

As you can see, data in SPSS is structured in a matrix. Each row is a separate case. So, case 1 is a person interviewed for the survey, case 2 is another person interviewed, etc. The columns are separate variables. Each one is either the coded responses to a particular question, or some altered version of coded responses. (For this reason, data sets will often have more variables than questions asked on the survey, and they often will have multiple variables based on the same responses.)

In its initial form, data in SPSS can be hard to interpret. As you can see in this data set, each variable name is nothing more than a V and then usually a long number. Similarly, the code for each response can be meaningless to us. For example, it would be hard to remember the variable name for the party identification variable (V980336) or that a "1" means that the respondent refers to himself or herself as a "Democrat." So, our first step is to create variables with names that we can understand and labels to help us interpret their codes.
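
Before renaming anything, it can help to look up what a coded variable contains. The following is a minimal sketch using two variables named in this tutorial: DISPLAY DICTIONARY prints whatever labels the data set already carries for a variable, and LIST prints the raw values for a handful of cases. What you see will depend on how much labeling the downloaded file includes.

```spss
* Show the stored label and value labels, if any, for two variables.
DISPLAY DICTIONARY /VARIABLES = v980336 v980219.
* Print the raw codes for the first five cases.
LIST VARIABLES = v980336 v980219 /CASES = FROM 1 TO 5.
```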


Transforming data usually means creating new variables and/or changing old values into new values. For example, let's say that variable v980243 is a "feeling thermometer" on Al Gore, in which the respondent rates Gore from 0 to 100. In this form, this variable is unmanageable for making crosstabulation tables. So, in this case, you might create a new variable called Gore5, in which v980243 is collapsed into five categories: when v980243 equals 0 to 20, Gore5 becomes equal to 1; when v980243 equals 21 to 40, Gore5 becomes equal to 2; etc.

The two most common methods to transform data in SPSS are compute, which uses simple algebra, and recode, in which you define the transformation from value to value. Both will be shown as we transform the three variables in our mini-study:

1. The independent variable, the Clinton Economic Scale, using compute;

2. A control variable, party identification, using recode;

3. The dependent variable, the House vote.


We will create a new variable, clinteco, from a variable already in the data set, v980219. The values for the old variable and the new, "target" variable are as follows:

Meaning                      v980219    clinteco
Strongly Approve                1          1
Not Strongly Approve            2          2
Not Strongly Disapprove         4          4
Strongly Disapprove             5          5
Don't Know                      8          8
No Answer Given                 9          9
Inappropriate, Not Asked        0          0

Since the old and new variables have identical values, we can create the new variable through simple algebra (clinteco = v980219). Therefore we can use the compute function:

1. Click on Transform and then drag down to Compute;

2. Type "Clinteco" under "Target Variable:";

3. In the long list of variables, find v980219. (It might read "A12a/b. STRENGTH APPR/DISAPP CLINTON ECO [v980219].") Click on it once;

4. Click on the right arrow. v980219 should appear under "Numeric expression:";

5. Click [OK].


After SPSS stops "Running execute" and tells you that the "SPSS for Windows processor is ready," go all the way to the right of your data set; clinteco will be in the last column.


NOTE: If you choose to use the syntax editor, you could produce the same results with the following commands:

    Compute clinteco = v980219.
    Execute.

Do you want to learn a few more things about compute? If not, go on to recode.


SPSS users use the compute command primarily when they want to alter variables in some mathematical way. For example, variables v980260 and v980261 are respectively the feeling thermometers for the Democratic and Republican Parties. (They tell us how the respondent rated each party from 0 to 100.) Let's say that we wanted to create a new variable, Dem.Rep, which tells us the relative rating of each party. So, if the respondent rated the Democratic Party 60 and the Republican Party 40, Dem.Rep would equal 20. Conversely, if the respondent rated the Democratic Party 40 and the Republican Party 60, Dem.Rep would equal -20.

You can create the variable Dem.Rep with the point-and-click approach as follows:

1. Click on Transform and then drag down to Compute;

2. Type "Dem.Rep" under "Target Variable:";

3. In the long list of variables, find v980260. Click on it once;

4. Click on the right arrow. v980260 should appear under "Numeric expression:";

5. Click on the minus sign button on what appears to be a calculator. A minus sign should appear under "Numeric expression:";

6. Click on v980261 in the long list of variables;

7. Click on the right arrow. v980261 should appear under "Numeric expression:";

8. Click [OK].


But what if we wanted Dem.Rep to tell us the distance between the two party ratings? In that case, Dem.Rep would equal 20 if the respondent rated the Democratic Party 60 and the Republican Party 40, or if the respondent rated the Democratic Party 40 and the Republican Party 60. In other words, Dem.Rep equals the absolute value of v980260 minus v980261. In this case, we would have to use the absolute value function.

SPSS provides a long list of mathematical functions that can be used with the compute command. To create this absolute value function, you would add two steps between Steps 2 and 3 in the previous set of instructions:

2a) Click ABS(numexpr) in the long list of functions under "Functions:";

2b) Click the upward pointing arrow next to "Functions:".

SPSS will highlight the area between the parentheses. As you click on variables and functions, SPSS will place that information in the location of the highlighted area, in this case between the parentheses. Note that you could also simply type "abs(v980260 - v980261)," and it would produce exactly the same result.


NOTE: You could produce the same two computes with the following commands in the syntax window:

    Compute Dem.Rep = v980260 - v980261.
    Execute.

    Compute Dem.Rep = abs(v980260 - v980261).
    Execute.
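
Because both commands above write to Dem.Rep, running them in sequence leaves only the absolute-value version. If you want to keep both measures, one option is to give them separate names. The names DemRep and DemRepAb below are hypothetical, chosen only for this sketch, which also assumes that any non-substantive codes in the two thermometers have been dealt with beforehand.

```spss
* Signed difference: positive values favor the Democratic Party.
COMPUTE DemRep = v980260 - v980261.
* Unsigned distance between the two ratings.
COMPUTE DemRepAb = ABS(v980260 - v980261).
EXECUTE.
```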


For party identification, the given variable is v980336, and the target variable will be called party3, since we will be concerned with only three values: Democrats, Independents, and Republicans. Here, the old and new values are not identical:

Meaning                      v980336    party3
Democrat                        1          1
Republican                      2          3
Independent                     3          2
Other Party                     4          7
No Preference                   5          2
Don't Know                      8          8
No Answer Given                 9          9

1. Click on Transform, drag down to Recode, and choose Into Different Variables;

2. From the long list of variables, find v980336. Click on it;

3. Click the right arrow. v980336 should appear in the center box under "Numeric Variable -> Output Variable";

4. On the top right under "Output Variable" and "Name:" type the new variable name, party3;

5. Click [Change]. The center box should now read "v980336 -> party3";

6. Click on [Old and New Values]. A new dialogue box will open. See below on how to use it;

7. When you are finished with Step 6, click [OK]. After SPSS has stopped executing, go to the right of your data set to see your new variable.


After you click [Old and New Values...], a dialogue box will appear. These are the steps you would take to specify the old and new values:

6a. Type in the old value (under v980336 in the table above) under "Old Value" and next to "Value:";

6b. Type in a new value (under party3 in the table above) under "New Value" and next to "Value:";


6c. Click [Add], which will add it to the changes. Repeat 6a to 6c for each old value;

6d. Click [Continue] when you are finished.

NOTE: If you choose to use the syntax editor, you could produce the same results with the following commands:

    Recode v980336 (1=1)(2=3)(3=2)(4=7)(5=2)(8=8)(9=9) into party3.
    Execute.
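
A common way to check a recode, offered here as a suggestion rather than as part of the original steps, is to cross-tabulate the old variable against the new one. Each old code should fall into exactly one new code, matching the table of old and new values given earlier.

```spss
* Rows are the original codes; columns are the recoded values.
CROSSTABS /TABLES = v980336 BY party3.
```

If Independents (3) and No Preference (5) both land in column 2 and no unexpected cell is populated, the recode did what was intended. Codes that the data set already marks as missing may be left out of the table.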

Do you want to learn more about recode? If not, move on to transforming the House vote.


Let's go back to an earlier recode example. Let's say that you want to collapse the Al Gore feeling thermometer into five categories. The changes would be as follows:

Meaning                      v980243    Gore5
                               0-20        1
                              21-40        2
                              41-60        3
                              61-80        4
                              81-100       5
Don't Know                     997       997*
No Answer Given                998       998*
Inappropriate, Not Asked       999       999*

(*You would usually code these numbers as 7, 8, and 9 respectively. I am using a less common approach to help with the SPSS example.)

Trying to recode v980243 into Gore5 from value to value (0 -> 1, 1 -> 1, 2 -> 1, etc.) would take an exceptionally long time. SPSS for Windows provides many other combinations that can simplify this procedure. These options for specifying old and new values are shown graphically and then described below:


If the old value is:

a) A single value, such as "Value: [997]" in our current example;

b) A system-missing value. We'll discuss missing values below;

c) Any missing value. Again, we'll discuss missing values below;

d) A range of values, such as "Range: [21] through [40]" in our current example;

e) Within the lowest range of values, such as "Range: Lowest through [20]" in our current example;

f) Within the highest range of values;

g) Any old value not already in the box "Old --> New." (See the "Else --> Copy" setting in the graphic below, which transforms 997, 998, and 999 into the same respective value.)

If the new value is:

x) A single value, such as "Value: [1]" in our current example;

y) A system-missing value. We'll discuss missing values below;

z) The same as the old value. (See the "Else --> Copy" setting in the graphic below, which transforms 997, 998, and 999 into the same respective value.)

If you created the recodes with the point-and-click approach, the "Recode into Different Variables" dialogue box could look as follows just before you click [Continue].


NOTE: You could produce the same results with the following syntax:

Recode v980243 (lo thru 20=1)(21 thru 40=2)(41 thru 60=3)(61 thru 80=4)(81 thru 100=5)
   (else=copy) into Gore5.

Execute.
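Each dialogue option listed above has a counterpart keyword in Recode syntax. The sketch below is an illustration under assumptions, not part of the original tutorial: Gore5b is a hypothetical variable name used only here. For old values, sysmis matches a system-missing value (option b), missing would match any missing value (option c), 21 thru 40 and lo thru 20 are ranges (options d and e), and else matches anything not yet listed (option g). For new values, a number, sysmis, and copy correspond to options x, y, and z. The keyword hi (option f) is left out on purpose, because 81 thru hi would also catch 997, 998, and 999 in this variable.

```spss
* Sketch only. Gore5b is a hypothetical target variable, not part of the tutorial.
Recode v980243
   (sysmis=sysmis)
   (lo thru 20=1)
   (21 thru 40=2)
   (41 thru 60=3)
   (61 thru 80=4)
   (81 thru 100=5)
   (else=copy)
   into Gore5b.
Execute.
```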


The given variable is v980313, and the target variable would be called HVote. As you can see below, the old and new values are identical:

Meaning                              v980313    HVote
Democratic Candidate                 1          1
Republican Candidate                 2          2
Third Party/Independent Cand.        3          3
Name given not on candidate list     7          7
Don't Know or Refuse                 8          8
No Answer Given                      9          9
Inappropriate, Didn't Vote           0          0

Using either the compute or recode function, create HVote.
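For readers who want to check their work in the syntax editor, here is one possible sketch of each approach. The two options are alternatives, so you would run only one of them; both assume the variable names given above (v980313 and HVote).

```spss
* Option 1: Recode, copying every old value unchanged into the new variable.
Recode v980313 (else=copy) into HVote.
Execute.

* Option 2: Compute, setting the new variable equal to the old one.
Compute HVote = v980313.
Execute.
```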


With the data editor in front of you, go all the way to the right of your data set. The last three variables should be the three you created: clinteco, party3, and hvote.

You might notice that they are difficult to read. Who could remember after a few hours (or a month) what a "5" means in clinteco, or for that matter what clinteco itself is supposed to measure? Moreover, there would be a problem if you ran statistics on these variables: all the responses that should usually be ignored, like "I don't know" and "I refuse to answer," would be included in the tables and statistical calculations.

To resolve these problems, SPSS includes what is commonly called "data definition." There are four general types of variable definitions that most SPSS users use:

• The variable type tells SPSS what type of data it is (for example, numeric or string) and how many digits or characters it can hold;

• The variable label replaces the variable name with a variable description in output (e.g., "Approval of Clinton, Economy" instead of "clinteco");

• The value labels replace each value with a description of the value (e.g., "Strongly Disapprove" instead of "5");

• Missing values tell SPSS which values to ignore when it runs statistics or creates tables.


Defining data is done quite differently in SPSS 9.0 and SPSS 10.0. For this reason, our instructions depend on which version you use. Instructions for SPSS 9.0 (and earlier) begin on page 29, while instructions for SPSS version 10.0 begin on page 36. (There are hints below, in case you are not sure which version you are using.)

Hints: You can tell if you are using version 10.0 because:

a) The toolbar includes "S-PLUS";

b) The bottom left includes the options "Data View" and "Variable View."


Let's begin by defining the data for clinteco. Double click on the column header, where it says the variable's name. A dialogue box called "Define Variables" should appear.


We will define our data in four main steps:

1. Click [Type], for Variable types (page 30);

2. Click [Labels], for Variable and value labels (page 31);

3. Click [Missing Values] for Missing Values (page 33);

4. Click [OK] when you are finished.


Step 1: Defining variable types in 9.0:

As you can see in the dialogue box "Define Variable Type" in the graphic below, SPSS gives you eight options for variable types (numeric through string). For each type, SPSS gives you options for the structure. For the current setting, numeric, it gives you the option of width (in this case, "8") and the number of decimal places. Unless it is told otherwise, SPSS assigns new numeric variables the structure "f8.2", which means that it displays values in a width of eight characters (counting the decimal point) with two decimal places, such as the number 12345.78.

Normally, you would not need to change the variable type of a numeric variable (i.e., a variable made up of numbers), for several reasons. Changing the numeric structure has no influence on the calculation of statistics, and the numeric code does not show on tables when value labels are defined. However, the need could arise for a variable like year of birth, for which value labels are inappropriate and two decimal places (1975.00) make the data less clear.

So, for practice, we will change numeric variable clinteco from f8.2 to f8.0.

1a. Change the Decimal Places: from "2" to "0";

1b. Click [Continue].

After you click [Continue], SPSS will take you back to the "Define Variable" dialogue box. If you clicked [OK] and returned to the Data Editor, the data for clinteco would now have no decimal places. (1.00 would become 1, 2.00 would become 2, etc.)
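For readers working in the syntax editor, the same change of display format can be sketched with the Formats command. This is an illustrative addition rather than one of the original steps. Like the point-and-click change, it alters only how values are displayed, not the stored data.

```spss
* Display clinteco in a width of 8 with no decimal places.
Formats clinteco (f8.0).
```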

Step 2: Defining variable and value labels in 9.0:

You can define variable and value labels quite easily in SPSS-Windows. Looking at the dialogue box called "Define Variables," click [Labels...]. (If you don't know how to find this dialogue box, go back to the beginning of this section, where it says in large purple letters: "Step 3: Defining Variables.") A dialogue box called "Define Labels: clinteco" should appear.

Once you have this dialogue box on the screen, take the following steps:

2a. In the box next to "Variable label:" type "Approval of Clinton, Economy". You could give it a longer variable label, but since these labels come out on tables, a brief but clear variable label is usually best;

2b. Type "1" in the box next to "Value:";

2c. Type "Strongly Approve" in the box next to "Value Label:";

2d. Click [Add]. The bottom box should now read: 1 = "Strongly Approve";

Repeat 2b through 2d for each value and its label. The full list is as follows:

1 = "Strongly Approve"
2 = "Not Strongly Approve"
4 = "Not Strongly Disapprove"
5 = "Strongly Disapprove"
8 = "DK"
9 = "NA"
0 = "Inappropriate"

2e. When you have finished plugging in the value labels, click [Continue]. This will return you to the dialogue box called "Define Variables."

When you are done plugging in the variable and value labels, the dialogue box should look like this (before you click [Continue]):


Step 3: Defining missing values in 9.0:

The final step in data definition is to define missing values, or values that SPSS ignores when it runs statistics. (You usually don't want to include "don't know" and "no answer," for example, because they change the meaning of statistics and make many tables unreadable.) There are two types of missing values in SPSS:

• System-missing values, which means that SPSS assigned it this value. In your data editor, system-missing values have a dot (.);

• User-missing values, which means that either you or the person who constructed the data set assigned the value as missing. In your data editor, user-missing values are numbers that can have value labels.

User-missing values are easy to define in SPSS-Windows. Looking at the dialogue box called "Define Variables," click [Missing Values...]. (If you don't know how to find this dialogue box, go back to the beginning of this section, where it says in large purple letters: "Step 3: Defining Variables.") A dialogue box called "Define Missing Values: clinteco" should appear.

Clinteco is supposed to have three user-missing values: 0, 8, and 9. You have three options:

3a. Make 0, 8, and 9 three discrete missing values;

3b. Define it as a range of values. Had the missing values been 7, 8, and 9, you could define it as Low: [7] and High: [9];

3c. Define it as a discrete value and a range. You could define it Low: [8] and High: [9] and Discrete value: [0].

When you are done, click [Continue]. This will return you to the dialogue box called "Define Variables."


Step 4: Double-Checking your data definitions in 9.0:

If you are at the dialogue box with the title "Define Variable," click [OK]. This will return you to the data editor. Now you can double-check your work.

First, you should notice that the values for clinteco have been replaced by value labels. (If there are no value labels, point and click on View and drag down to Value Labels.) If you point the pointer over the header for clinteco, it should also show the variable label, as in this graphic.


You may also check the data definition by clicking on Utilities and dragging down to Variables. The dialogue box "Variables" should appear. Highlight clinteco in the list of variables. The white box in the middle should give you (a) the variable label ("Approval of Clinton, Economy"), (b) the variable type (F8, which used to be F8.2), (c) the missing values, and (d) value labels.
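A syntax route to the same check, offered here as an illustrative sketch rather than a step from the original tutorial, is the Display Dictionary command. It prints the stored definitions for the variables you name to the output window.

```spss
* List the label, format, missing values, and value labels stored for clinteco.
Display dictionary /variables = clinteco.
```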


NOTE: You could produce the variable labels, value labels, and missing values for clinteco with the following syntax:

Variable label clinteco "Approval of Clinton, Economy".

Value Labels clinteco
   1 "Strongly Approve"
   2 "Not Strongly Approve"
   4 "Not Strongly Disapprove"
   5 "Strongly Disapprove"
   8 "DK"
   9 "NA"
   0 "Inappropriate".

Missing Values clinteco (0,8,9).

Execute.

Please go to Continue Data to continue the tutorial.


Let's begin by defining the data for clinteco. Click "Variable View."

The following data editor should emerge, if you scroll down to the end of the variable list.


Now that we are in variable view, we can define clinteco in four main steps: (1) Variable types (page 37), (2) Variable labels (page 38), (3) Value labels

(page 39), and (4) Missing Values (page 41).


Step 1: Defining variable types in 10.0:

If you look at Variable View, you will see that the variable type for all three variables is "numeric," that the width of each is 8, and that the number of decimals is 2. This means that SPSS recognizes clinteco, party3, and hvote as numbers (one can do math with these variables, for example) displayed in a width of eight characters (counting the decimal point), including two decimal places, such as the number 12345.78. (Unless you specify otherwise, SPSS assigns all new numeric variables this structure, called "f8.2". Don't ask!) Normally, you would not need to change the variable type, since it does not influence statistical output, but the need does arise, such as if you want to avoid decimal places for years (e.g., 1975.00).

So, for practice, we will change numeric variable clinteco from f8.2 to f8.0.

1. Click on the box to the right of clinteco and under [Type]. Borders will appear around that box, and a smaller gray box will appear on the right side of the box;

2. Click on that gray box. A dialogue box will appear;

3. In the white box next to "Decimal Places:", change [2] to [0];

4. Click [OK].


Step 2: Defining variable labels in 10.0:

Variable labels are labels that tell you what the variable measures. In other words, if you ran a table on clinteco without a variable label, the top of the table would say "clinteco," which would force you to remember what clinteco is supposed to represent. With the variable label, you can briefly describe what the variable represents, such as with the phrase "Approval of Clinton, Economy."

In SPSS Windows version 10.0, variable labels are extremely easy to produce. In the box to the right of clinteco and under [Label], type "Approval of Clinton, Economy."


Step 3: Defining value labels in 10.0:

Value labels tell us the meaning behind specific values. In other words, when a data set is created, no one wants to type in "Strongly Approve" every time that a respondent answered "Strongly Approve." It would waste time and computer memory. So, instead, the keypuncher types in an arbitrary value to represent "Strongly Approve," in this case "1". However, it is very hard to remember what these arbitrary numbers represent. For that reason, in SPSS, one can type in a label for each value. So, for example, if you create a frequency table for clinteco, the values will read "Strongly Approve," "Not Strongly Approve," "Not Strongly Disapprove," etc. instead of "1," "2," "4," etc.

You could create these labels as follows:

1. Click on the box to the right of clinteco and under [Values]. Borders will appear around that box, and a smaller gray box will appear on the right side of the box;

2. Click on that gray box. A dialogue box will appear;

3. Type "1" in the box next to "Value:";

4. Type "Strongly Approve" in the box next to "Value Label:";

5. Click [Add]. The bottom box should now read: 1 = "Strongly Approve";

Repeat steps 3 through 5 for each value and its label. The full list is as follows:

1 = "Strongly Approve"
2 = "Not Strongly Approve"
4 = "Not Strongly Disapprove"
5 = "Strongly Disapprove"
8 = "DK"
9 = "NA"
0 = "Inappropriate"

6. When you have finished plugging in the value labels, click [OK].


Step 4: Defining missing values in 10.0:

The final step in data definition is to define missing values, or values that SPSS ignores when it runs statistics. (You usually don't want to include "don't know" and "no answer," for example, because they change the meaning of statistics and make many tables unreadable.) There are two types of missing values in SPSS:

• System-missing values, which means that SPSS assigned it this value. In your data editor, system-missing values have a dot (.);

• User-missing values, which means that either you or the person who constructed the data set assigned the value as missing. In your data editor, user-missing values are numbers that can have value labels.

Clinteco is supposed to have three user-missing values: 0, 8, and 9. You would define these missing values as follows:

1. Click on the box to the right of clinteco and under [Missing]. Borders will appear around that box, and a smaller gray box will appear on the right side of the box;

2. Click on that gray box. A dialogue box will appear;

3. You now have two options:

a) Define up to three discrete missing values. In this case, the discrete values would be 0, 8, and 9;

b) Define either a range of missing values, or a range of values and one discrete missing value. In this case, the range would be from 8 to 9 and the discrete value would be zero. (This would be filled into the white boxes as follows: Low [8], High [9], and Value [0].)

4. When you have finished plugging in the missing values, click [OK].
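The two options in step 3 can also be written in syntax. The following is a sketch for illustration. The two commands are alternatives that define the same three missing values, and each Missing Values command replaces whatever was defined before it, so you would run only one.

```spss
* Option a: three discrete missing values.
Missing values clinteco (0, 8, 9).

* Option b: a range (8 through 9) plus one discrete value (0).
Missing values clinteco (8 thru 9, 0).
```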


Step 5: Double-Checking your data definitions in 10.0:

Now that you have redefined clinteco, you might want to double-check your work before moving on to the next section. There are three main ways to check data definition. The first, and most obvious, is to look at Variable View. Your clinteco should look like the following graphic, with an "8" under Width, a "0" under Decimals, "Approval of Clinton, Economy" under Label, and "0, 8, 9" under Missing. The main problem with this approach is that it is hard to see the value labels.

A second approach is to examine the data by clicking Data View. You should be able to see the value labels for clinteco. (If you see no labels at all, click on View and then drag down to Value Labels.) You can also see the variable label for clinteco by moving your mouse so that the arrow points at the column header (i.e., where it says "clinteco").


The third and best way to check data definition is to click on Utilities and drag down to Variables. The dialogue box "Variables" should appear. If you highlight clinteco in the list of variables, the white box in the middle should give you (a) the variable label ("Approval of Clinton, Economy"), (b) the variable type (F8, which used to be F8.2), (c) the missing values, and (d) value labels.


NOTE: You could produce the variable labels, value labels, and missing values for clinteco with the following syntax:

Variable label clinteco "Approval of Clinton, Economy".

Value Labels clinteco
   1 "Strongly Approve"
   2 "Not Strongly Approve"
   4 "Not Strongly Disapprove"
   5 "Strongly Disapprove"
   8 "DK"
   9 "NA"
   0 "Inappropriate".

Missing Values clinteco (0,8,9).

Execute.


Using the previous case as your example, produce variable labels, value labels, and missing values for party3 and hvote with the following specifications:

Party3:

Variable Label: "Party Identification"
Missing Values: 7, 8, 9
Value Labels:
   1 = "Democrat"
   2 = "Ind./ No Pref."
   3 = "Republican"
   7 = "Other Party"
   8 = "Don't Know"
   9 = "No Answer Given"

HVote:

Variable Label: "Vote in House Election"
Missing Values: 7 - 9, 0
Value Labels:
   1 = "Democratic Candidate"
   2 = "Republican Candidate"
   3 = "Third Party/Independent Cand."
   7 = "Name given not on candidate list"
   8 = "Don't Know or Refuse"
   9 = "No Answer Given"
   0 = "Inappropriate, Didn't Vote"
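If you prefer the syntax editor, one way to meet these specifications is sketched below. It is modeled on the clinteco syntax in the NOTE above and is an illustration rather than the only correct answer. The hvote missing values are written as a range plus one discrete value to match "7 - 9, 0".

```spss
* Sketch of the party3 definitions, following the clinteco example.
Variable labels party3 "Party Identification".
Value labels party3
   1 "Democrat"
   2 "Ind./ No Pref."
   3 "Republican"
   7 "Other Party"
   8 "Don't Know"
   9 "No Answer Given".
Missing values party3 (7, 8, 9).

* Sketch of the hvote definitions.
Variable labels hvote "Vote in House Election".
Value labels hvote
   1 "Democratic Candidate"
   2 "Republican Candidate"
   3 "Third Party/Independent Cand."
   7 "Name given not on candidate list"
   8 "Don't Know or Refuse"
   9 "No Answer Given"
   0 "Inappropriate, Didn't Vote".
Missing values hvote (7 thru 9, 0).
```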


You are now prepared to begin creating tables that will help you test the hypothesis that voters' opinions on how the president is handling the economy influence which party they will vote for in House elections. We will be discussing two basic and widely used tables: the frequency table (page 47) and the crosstabulation table (page 49).

Please note that this tutorial is not intended to be a mini-course on statistics, nor is it a substitute for proper instruction in social science research methods. Our goal is to introduce you to SPSS. We will discuss how to read frequency and crosstabulation tables, but we will not attempt to demonstrate how you could use more interesting and powerful statistical methods for analyzing the relationship between variables. Instead, we encourage you to learn more about social science research methods through courses or books.


As the name implies, a frequency table tells you the frequency of each value of a variable. So, if there is a study of 1000 respondents, and 545 are female, a frequency table of the variable "sex" could look as follows:

Female       545      54.5%
Male         455      45.5%
Total       1000     100.0%

In SPSS, you would produce a frequency table as follows:

1. Click on Analyze, drag down to Descriptive Statistics, and then choose Frequencies;

2. Find clinteco, which is probably at the top or the bottom of your variable list. Click on it so that it is highlighted yellow;

3. Click the right arrow;

4. Click [OK].


After it finishes "Running Frequencies," SPSS will produce the table in an "output file." The frequency table is usually the second table.

NOTE: You could produce frequency tables with the following syntax:

Frequencies vars = clinteco hvote party3.


Column 1: The value labels for valid and missing values. (When there are no value labels, SPSS shows the values instead.)

Column 2: Frequency, or the number of cases that fall into each category. For example, 676 people responded "strongly approve," 1226 gave a valid response to this question, and 4 answered that they did not know. There are a total of 1281 cases.

Column 3: Percent. This column shows the percent of each value regardless of whether the case is valid or not. In this column you might check what percent of cases are valid (95.7%) or what percent of respondents answered that they don't know (0.3%).

Column 4: Valid percent. This is the most widely used percent column in the frequency table. If you are asked, "What percent of respondents strongly approve of Clinton's handling of the economy?" the person is usually concerned about the valid percent, or the percent of those who actually answered the question. 55% strongly approve of Clinton's performance while 9% strongly disapprove. (Please note that responses to survey questions are heavily influenced by how the question is worded. Contrary to what we all hear on TV and read in newspapers, it is questionable whether you can reach any concrete conclusions from these statistics alone.)

Column 5: Cumulative percent, which gives you the valid percent of that value added to the valid percent of the previous values. Notice how the numbers in this column increase from a low number (55.1%, the same number for that value in the valid percent column) to 100.0%.
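The difference between the Percent and Valid Percent columns comes down to the denominator: all cases for the former, valid cases only for the latter. As a worked illustration using only the counts quoted above (1281 total cases, 1226 valid responses, 676 "strongly approve," 4 "don't know"), and assuming a LaTeX setup with the amsmath package:

```latex
\begin{align*}
\text{Percent of cases that are valid} &= \frac{1226}{1281} \approx 95.7\% \\
\text{Percent answering ``don't know''} &= \frac{4}{1281} \approx 0.3\% \\
\text{Valid percent, ``strongly approve''} &= \frac{676}{1226} \approx 55.1\%
\end{align*}
```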

Now create frequency tables for hvote and party3. You can create frequencies for multiple variables by either double clicking on each of them or by highlighting all the variables you want and then clicking the right arrow.


Crosstabulation tables, usually called "crosstabs" for short, are probably the simplest and most descriptive way to see whether there is a relationship between two variables. Crosstabs are easier to illustrate than explain. Let's say that you want to know whether there is a gender gap in party identification: Are women more likely than men to be Democrats, and are men more likely than women to be Republicans? The following crosstab might help us answer this question. It shows that women are about 10 percentage points more likely than men to identify themselves as Democrats. Conversely, the table also suggests that men are slightly more likely than women to call themselves Republicans.

                 Female      Male        Total
Republican       271         273         544
                 28.6%       32.8%       30.5%
Independent      310         316         626
                 32.7%       37.9%       35.1%
Democrat         368         244         612
                 38.8%       29.3%       34.3%
Total            949         833         1782
                 100.0%      100.0%      100.0%

Please note that the independent variable (Gender) is the column while the dependent variable (Party) is the row, and that the percentages are based on the independent variable; they are "column percents." This is the standard structure of crosstabs. While the independent variable is sometimes put in the rows (usually for reasons of space), the percentage is still based on the independent variable; in this case, you would use "row percents." Also please note that the totals of any row or column should be at least 30; if any value for a variable has fewer than thirty cases, it should be combined, or "collapsed," with another value.
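To see where the column percents come from, each count is divided by the total of its own column (the independent variable), not by the grand total. A worked illustration for the Democrat row, using only the counts in the table above and assuming a LaTeX setup with the amsmath package:

```latex
\begin{align*}
\text{Democrats among women} &= \frac{368}{949} \approx 38.8\% \\
\text{Democrats among men} &= \frac{244}{833} \approx 29.3\% \\
\text{Gender gap} &= 38.8\% - 29.3\% = 9.5 \text{ percentage points}
\end{align*}
```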


Now let us produce a crosstabulation table in SPSS that might help us answer our research question:

1. Click Analyze, drag down to Descriptive Statistics and choose Crosstabs;

2. From the long list of variables choose the independent variable, clinteco, and put it into "Column(s):";

3. Choose the dependent variable, hvote, and put it into "Row(s):";

4. Click on [Cells]:

a) Under "Count" choose "Observed";

b) Under "Percentages" choose "Column";

c) Click [Continue].

5. Click [OK].


The following table should appear in your output file.� (I reduced some labels to make it easier to read in this context.)

NOTE: You could produce the same crosstab table with the following syntax:

Crosstab tables = hvote by clinteco

�� / cells = count column.

The natural inclination is to read the percents down the column, like with the following statement: �Of those who strongly approve of Clinton�s

economic performance, 65% percent voted for a Democratic House candidate but only 35% voted for a Republican House candidate.�� The problem

with this approach is that it is heavily influenced by the distribution of the dependent variable.� To give a silly, exaggerated example, let�s propose the

hypothesis that respondents who approve of the way Clinton handles the economy are more likely to answer the question, �Who did you vote for?� than

respondents who disapprove.� Following the approach mentioned above, you might read the table below as follows: �Of those who strongly approve of

Clinton�s economic performance, 98% answered the question while 2% did not answer.�� But, if you look at the percents for �strongly disapprove�,

the results are practically identical.� In other words, the correct comparison is across the table: ����98% of those who strongly approve answered the

question, and 97% of those who strongly disapprove also answered the question.����� No difference!

                     Strongly    Not Strongly   Not Strongly   Strongly
                     Approve     Approve        Disapp.        Disapp.      Total

Gave Answer          200         256            161            273          890
                     97.6%       96.2%          92.5%          97.2%        96.1%

No Answer            5           10             13             8            36
                     2.4%        3.8%           7.5%           2.8%         3.9%

Total                205         266            174            281          926
                     100.0%      100.0%         100.0%         100.0%       100.0%
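
To check the column percents in a small table like this one, the cell counts can be typed in directly and used as weights. The following is only a sketch: the variable names (answered, approve, n), the numeric codes, and the value labels are hypothetical and do not come from the NES file.

```spss
* Hypothetical variables: answered (row), approve (column), n (cell count).
DATA LIST FREE / answered approve n.
BEGIN DATA
1 1 200
1 2 256
1 3 161
1 4 273
2 1 5
2 2 10
2 3 13
2 4 8
END DATA.
VALUE LABELS answered 1 'Gave Answer' 2 'No Answer'
  /approve 1 'Strongly Approve' 2 'Not Strongly Approve'
           3 'Not Strongly Disapprove' 4 'Strongly Disapprove'.
* Treat each line of data as n identical respondents.
WEIGHT BY n.
CROSSTABS /TABLES = answered BY approve
  /CELLS = COUNT COLUMN.
```

Each column percent is the cell count divided by its column total; for example, 200 of the 205 respondents who strongly approve gave an answer, which is 97.6%.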

For this table as well, you would compare the percents across each row (i.e., each category of the dependent variable). So, you might read this table with the following statement: "Among people who voted, 65% of those who strongly approve of Clinton's economic performance voted for a Democratic House candidate, as compared to only 5% of those who strongly disapprove." (It is not necessary to read the percents in every row, just enough so that you describe the table accurately to your readers. Make sure that you include the table itself clearly and accurately. You can produce tables in Microsoft Word by clicking on Table and then dragging down to Insert Table.)

While there appears to be a strong relationship between the assessment of Clinton's performance on the economy and the House vote, this crosstab certainly does not end the discussion. A closer look at the table should raise some suspicions. First, the independent variable is very skewed: more than half of the respondents (303 of 542) strongly approve of Clinton's economic performance. Second, of those who approve "not strongly," only 36% voted for a Democrat. If approval of the president's economic performance decided the vote, one would expect most voters who approved, even marginally, to vote for a Democrat.

But even if there were no immediate reason for suspicion, one would have to wonder: Is it possible that some other factor is causing this relationship? Is the relationship between the independent and dependent variables spurious?

Is the relationship between the independent and dependent variables spurious? In other words, maybe there is another variable that influences both the independent and dependent variables in such a way that a causal relationship appears when it does not really exist.

Let us take the research question: "Does sending more fire trucks to a fire simply increase the amount of property damage?" If one measured both variables for a large number of fires, one would almost certainly find a strong relationship between them. But any conclusions at this point would surely be suspect, since the researcher has not yet considered a critical control variable: the size of the fire!

Hypothesis:  number of fire trucks -> amount of property damage

Maybe:       size of fire -> amount of property damage, &
             size of fire -> number of fire trucks

For our research question (whether voters' opinions on how the president is handling the economy influence which party they will vote for in House elections), the most obvious control variable is party identification. Maybe being Republican or Democratic influences both what candidate one votes for and what one thinks of Clinton's performance on the economy:

Hypothesis:  Opinion of Clinton's performance -> House vote

Maybe:       Party identification -> House vote, &
             Party identification -> Opinion of Clinton's performance

Reproduce the hvote by clinteco crosstab table, except move party3 into the third white box on the right, under "Layer 1 of 1." (If you don't remember how, return to the crosstabulation table section.)
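
The same layered table can also be requested with syntax. This is a sketch that simply extends the two-variable request: in CROSSTABS, a variable named after a second BY is treated as a control (layer) variable, so a separate hvote by clinteco table is produced for each category of party3.

```spss
* hvote = row, clinteco = column, party3 = layer (control) variable.
CROSSTABS /TABLES = hvote BY clinteco BY party3
  /CELLS = COUNT COLUMN.
```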

SPSS should produce the following table (not including the red circles):

One written interpretation could be as follows:

[Once one controls for party identification, the relationship between opinion on Clinton's performance with the economy and the vote for Democratic or Republican House candidates becomes considerably weaker. The relationship between these two variables becomes statistically insignificant for Independents (p = .072 with a chi-square of 7). While this relationship remains significant for Democratic and Republican identifiers (p = .001 with a chi-square of 16 and p = .002 with a chi-square of 14, respectively), it is also much weaker. While Somers' d was .34 for all voters regardless of party identification, it dropped to .11 for Democratic identifiers and .13 for Republican identifiers. At this stage of analysis, it appears that the relationship between voters' opinion on Clinton's economic performance and whether they voted for a Democratic or Republican House candidate is weak at best.]
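
The chi-square and Somers' d values quoted in this interpretation are the kind of output that the STATISTICS subcommand of CROSSTABS produces. As a sketch (it assumes the same variables as the layered table and is not necessarily how the figures above were generated), the request might look like this:

```spss
* CHISQ requests the chi-square test; D requests Somers' d.
* With party3 as the layer, statistics are reported for each party group.
CROSSTABS /TABLES = hvote BY clinteco BY party3
  /CELLS = COUNT COLUMN
  /STATISTICS = CHISQ D.
```

Dropping "BY party3" from the request would give the statistics for all voters together, which is the comparison the interpretation draws on.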

As the red circles show, we have just run into one of the most common problems with crosstabulation tables. By dividing the original table into multiple tables based on party identification, we have created columns with too few cases. We have no choice but to collapse them.

In this case, we will collapse (i.e., combine) "strongly disapprove" and "not strongly disapprove" into simply "disapprove." In order to save our previous work, we will create the new variable clintec2, a name that indicates to us that it was built from clinteco. (We can also create this variable quickly, as I will show you below.)

Recode clinteco into clintec2 as follows:

a) Value: [1]                ->  Value: [1];
b) Value: [2]                ->  Value: [2];
c) Range: [4] through [5]    ->  Value: [3];
d) All other values          ->  System-missing.
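
The four recode rules above correspond to a single RECODE command. This is a sketch of the equivalent syntax; because it uses INTO, clinteco itself is left unchanged and the collapsed values are written to the new variable clintec2.

```spss
* 1 and 2 keep their values; 4 and 5 are combined into 3.
* Every other value becomes system-missing.
RECODE clinteco (1=1) (2=2) (4 THRU 5=3) (ELSE=SYSMIS) INTO clintec2.
EXECUTE.
```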

Then produce the following variable and value labels:
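
Labels can also be assigned with syntax. In the sketch below, the label wording is an assumption based on the categories described in the text (the exact labels used in the tutorial are not reproduced here), so adjust it to match your own.

```spss
* Label text is assumed for illustration; only the codes 1, 2, 3 follow the recode.
VARIABLE LABELS clintec2 'Approval of Clinton on economy (collapsed)'.
VALUE LABELS clintec2
  1 'Strongly approve'
  2 'Not strongly approve'
  3 'Disapprove'.
```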

Now, rerun crosstabs with hvote as the row, clintec2 as the column, and party3 under "Layer 1 of 1." It should produce the following table:

This new table indicates that a relationship between the House vote and approving of Clinton's economic performance still exists even if we control for party identification, but the relationship is weaker. Among Republicans, 20% of those who strongly approved voted for a Democratic candidate, while 7% of those who disapproved voted for a Democrat. Similarly, among independents, 59% of those who strongly approved voted for a Democrat, while only 16% (4 of 25) of those who disapproved did so.

NOTE: You could produce the same crosstab table with the following syntax:

CROSSTABS /TABLES = hvote BY clintec2 BY party3
   /CELLS = COUNT COLUMN.

Congratulations! You have now completed a basic analysis in SPSS.

Of course, we have not ended the discussion on whether opinion on the president's job performance influences the House vote. For example, while we have looked at partisanship, we have not measured how the strength of partisanship influences the vote. We have also not looked at other possible intervening variables. Finally, there are more powerful (and fun) statistical approaches that would help us get a clearer picture of how our independent and dependent variables are related.

We at the Harvard-MIT Data Center encourage you to continue learning social science research methods. We also encourage you to explore the other social science statistical packages available at Harvard and other universities.

Please feel free to contact us with any questions, comments, complaints, or suggestions.
