11
220 0®o non © © 0 0 0 0 0 0 0 0 0 0 0 00o 0 00 00'x ° 00 00 0 o 0 00 000 O o Oo:o . . - 0 0 00()0 0 ova © ® o o ooo o O o0 D o ° O o O o © © © 0 0 0 0 0 I ntroduction Topics Appropriate to Experiments The Classical Experiment Independent and Dependent Variables Pretesting and Postesting Experimental and Control Groups The Double-Blind Experiment Selecting Subjects Probability Sampling Randomization Matching Matching or Randomization? Variations on Experimental Design Preexperimental Research Designs Validity Issues in Experimental Research 0 10 00 0 o0 00 O O CHAPTER 8 Experiments Chapter Overview An Illustration of Experimentation "Natural" Experiments Strengths and Weaknesses of the Experimental Method MAIN POINTS KEY TERMS REVIEW QUESTIONS AND EXERCISES ADDITIONAL READINGS RESOURCES ON THE INTERNET An experiment is a mode of observation that enables researchers to probe causal relation- ships. Many experiments in social research are conducted under the controlled conditions of a laboratory, but experimenters can also take advantage of natural occurrences to study the effects of events in the social world. Introduction This chapter addresses the research method most commonly associated with structured science in general: the experiment. Here we'll focus on the ex- periment as a mode of scientific observation in social research. At base, experiments involve (1) taking action and (2) observing the consequences of that action. Social researchers typically select a group of subjects, do something to them, and observe the ef- fect of what was done. In this chapter, we'll exam- ine the logic and some of the techniques of social scientific experiments. It's worth noting at the outset that we often use experiments in nonscientific inquiry. In preparing a stew, for example, we add salt, taste, add more salt, and taste again. In defusing a bomb, we clip the red wire, observe whether the bomb explodes, clip an- other, and .... We also experiment copiously in our attempts to develop generalized understandings about the world we live in. All skills are learned through ex- perimentation: eating, walking, talking, riding a bicycle, swimming, and so forth. Through experi- mentation, students discover how much studying is required for academic success. Through experi- mentation, professors learn how much prepara- tion is required for successful lectures. This chapter discusses how social researchers use experiments to develop generalized understandings. We'll see that, like other methods available to the social re- searcher, experimenting has its special strengths and weaknesses. Topics Appropriate to Experiments Experiments are more appropriate for some topics and research purposes than others. Experiments are especially well suited to research projects involving relatively limited and well-defined concepts and propositions. In terms of the traditional image of sci- ence, discussed earlier in this book, the experimen- tal model is especially appropriate for hypothesis testing. Because experiments focus on determining causation, they're also better suited to explanatory than to descriptive purposes. The Classical Experiment . 221 Let's assume, for example, that we want to dis- cover ways of reducing prejudice against African Americans. We hypothesize that learning about the contribution of African Americans to U.S . history will reduce prejudice, and we decide to test this hy- pothesis experimentally. To begin, we might test a group of experimental subjects to determine their levels of prejudice against African Americans. Next, we might show them a documentary film depicting the many important ways African Americans have contributed to the scientific, literary, political, and social development of the nation. Finally, we would measure our subjects' levels of prejudice against Af- rican Americans to determine whether the film has actually reduced prejudice. Experimentation has also been successful in the study of small group interaction. Thus, we might bring together a small group of experimental sub- jects and assign them a task, such as making recom- mendations for popularizing car pools. We observe, then, how the group organizes itself and deals with the problem. Over the course of several such ex- periments, we might systematically vary the nature of the task or the rewards for handling the task suc- cessfully. By observing differences in the way groups organize themselves and operate under these vary- ing conditions, we can learn a great deal about the nature of small group interaction and the factors that influence it. For example, attorneys sometimes present evidence in different ways to different mock juries, to see which method is the most effective. We typically think of experiments as being con- ducted in laboratories. Indeed, most of the examples in this chapter involve such a setting. This need not be the case, however. Social researchers often study what are called natural experiments: "experiments" that occur in the regular course of social events. The latter portion of this chapter deals with such research. The Classical Experiment In both the natural and the social sciences, the most conventional type of experiment involves three major pairs of components: (1) independent and dependent variables, (2) pretesting and posttesting , l li

Babbie Ch8 Experiments

Embed Size (px)

Citation preview

Page 1: Babbie Ch8 Experiments

220

0®o

non ©©

0 0 0 00•

0 0 0 00 000o 0

00 00'x °00 000 o 0

00000 Oo Oo:o

.. -0 000()0 0ova•

© ® o oooo

o O o0

D o ° Oo

O o© © © 0• 0 0

0 0

I ntroduction

Topics Appropriate to Experiments

The Classical ExperimentIndependent and Dependent VariablesPretesting and PostestingExperimental and Control GroupsThe Double-Blind Experiment

Selecting Subjects

Probability SamplingRandomizationMatchingMatching or Randomization?

Variations on Experimental Design

Preexperimental Research DesignsValidity Issues in Experimental Research

01000 0o0 00 O O

CHAPTER 8

Experiments

Chapter Overview

An Illustration of Experimentation

"Natural" Experiments

Strengths and Weaknessesof the Experimental Method

MAIN POINTS

KEY TERMS

REVIEW QUESTIONS AND EXERCISES

ADDITIONAL READINGS

RESOURCES ON THE INTERNET

An experiment is a mode of observation thatenables researchers to probe causal relation-

ships. Many experiments in social research areconducted under the controlled conditions of a

laboratory, but experimenters can also takeadvantage of natural occurrences to study the

effects of events in the social world.

IntroductionThis chapter addresses the research method mostcommonly associated with structured science ingeneral: the experiment. Here we'll focus on the ex-periment as a mode of scientific observation in socialresearch. At base, experiments involve (1) takingaction and (2) observing the consequences of thataction. Social researchers typically select a group ofsubjects, do something to them, and observe the ef-fect of what was done. In this chapter, we'll exam-ine the logic and some of the techniques of socialscientific experiments.

It's worth noting at the outset that we often useexperiments in nonscientific inquiry. In preparing astew, for example, we add salt, taste, add more salt,and taste again. In defusing a bomb, we clip the redwire, observe whether the bomb explodes, clip an-other, and ....

We also experiment copiously in our attemptsto develop generalized understandings about theworld we live in. All skills are learned through ex-perimentation: eating, walking, talking, riding abicycle, swimming, and so forth. Through experi-mentation, students discover how much studyingis required for academic success. Through experi-mentation, professors learn how much prepara-tion is required for successful lectures. This chapterdiscusses how social researchers use experimentsto develop generalized understandings. We'll seethat, like other methods available to the social re-searcher, experimenting has its special strengthsand weaknesses.

Topics Appropriate to ExperimentsExperiments are more appropriate for some topicsand research purposes than others. Experiments areespecially well suited to research projects involvingrelatively limited and well-defined concepts andpropositions. In terms of the traditional image of sci-ence, discussed earlier in this book, the experimen-tal model is especially appropriate for hypothesistesting. Because experiments focus on determiningcausation, they're also better suited to explanatorythan to descriptive purposes.

The Classical Experiment . 221

Let's assume, for example, that we want to dis-cover ways of reducing prejudice against AfricanAmericans. We hypothesize that learning about thecontribution of African Americans to U.S . historywill reduce prejudice, and we decide to test this hy-pothesis experimentally. To begin, we might test agroup of experimental subjects to determine theirlevels of prejudice against African Americans. Next,we might show them a documentary film depictingthe many important ways African Americans havecontributed to the scientific, literary, political, andsocial development of the nation. Finally, we wouldmeasure our subjects' levels of prejudice against Af-rican Americans to determine whether the film hasactually reduced prejudice.

Experimentation has also been successful in thestudy of small group interaction. Thus, we mightbring together a small group of experimental sub-jects and assign them a task, such as making recom-mendations for popularizing car pools. We observe,then, how the group organizes itself and deals withthe problem. Over the course of several such ex-periments, we might systematically vary the natureof the task or the rewards for handling the task suc-cessfully. By observing differences in the way groupsorganize themselves and operate under these vary-ing conditions, we can learn a great deal about thenature of small group interaction and the factorsthat influence it. For example, attorneys sometimespresent evidence in different ways to different mockjuries, to see which method is the most effective.

We typically think of experiments as being con-ducted in laboratories. Indeed, most of the examplesin this chapter involve such a setting. This need notbe the case, however. Social researchers often studywhat are called natural experiments: "experiments"that occur in the regular course of social events.The latter portion of this chapter deals with suchresearch.

The Classical ExperimentIn both the natural and the social sciences, the mostconventional type of experiment involves threemajor pairs of components: (1) independent anddependent variables, (2) pretesting and posttesting,

l

4'

li

Page 2: Babbie Ch8 Experiments

222 . Chapter8: Experiments

and (3) experimental and control groups. Thissection looks at each of these components and theway they're put together in the execution of theexperiment.

Independent and Dependent VariablesEssentially, an experiment examines the effect ofan independent variable on a dependent variable.Typically, the independent variable takes the formof an experimental stimulus, which is either pres-ent or absent. That is, the stimulus is a dichotomousvariable, having two attributes, present or not pres-ent. In this typical model, the experimenter com-pares what happens when the stimulus is presentto what happens when it is not.

In the example concerning prejudice againstAfrican Americans, prejudice is the dependent vari-able and exposure to African-American history is theindependent variable. The researcher's hypothesissuggests that prejudice depends, in part, on a lackof knowledge of African-American history. Thepurpose of the experiment is to test the validity ofthis hypothesis by presenting some subjects withan appropriate stimulus, such as a documentaryfilm. In other terms, the independent variable isthe cause and the dependent variable is the effect.Thus, we might say that watching the film caused achange in prejudice or that reduced prejudice wasan effect of watching the film.

The independent and dependent variables ap-propriate to experimentation are nearly limitless.Moreover, a given variable might serve as an inde-pendent variable in one experiment and as a de-pendent variable in another. For example, preju-dice is the dependent variable in our example, butit might be the independent variable in an experi-ment examining the effect of prejudice on votingbehavior.

pretesting The measurement of a dependent variableamong subjects.posttesting The remeasurement of a dependent vari-able among subjects after they've been exposed to anindependent variable.

To be used in an experiment, both independentand dependent variables must be operationally de-fined. Such operational definitions might involvea variety of observation methods. Responses to aquestionnaire, for example, might be the basis fordefining prejudice. Speaking to or ignoring AfricanAmericans, or agreeing or disagreeing with them,might be elements in the operational definition ofinteraction with African Americans in a small groupsetting.

Conventionally, in the experimental model,dependent and independent variables must be op-erationally defined before the experiment begins.However, as you'll see in connection with surveyresearch and other methods, it's sometimes appro-priate to make a wide variety of observations duringdata collection and then determine the most usefuloperational definitions of variables during lateranalyses. Ultimately, however, experimentation,like other quantitative methods, requires specificand standardized measurements and observations.

Pretesting and PosttestingIn the simplest experimental design, subjects aremeasured in terms of a dependent variable ( pre-testing), exposed to a stimulus representing an in-dependent variable, and then remeasured in termsof the dependent variable ( posttesting). Any dif-ferences between the first and last measurementson the dependent variable are then attributed tothe independent variable.

In the example of prejudice and exposure toAfrican-American history, we'd begin by pretestingthe extent of prejudice among our experimentalsubjects. Using a questionnaire asking about atti-tudes toward African Americans, for example, wecould measure both the extent of prejudice exhib-ited by each individual subject and the averageprejudice level of the whole group. After expos-ing the subjects to the African-American historyfilm, we could administer the same questionnaireagain. Responses given in this posttest would per-mit us to measure the later extent of prejudice foreach subject and the average prejudice level of thegroup as a whole. If we discovered a lower level ofprejudice during the second administration of the

questionnaire, we might conclude that the film hadindeed reduced prejudice.

In the experimental examination of attitudessuch as prejudice, we face a special practical prob-lem relating to validity. As you may already haveimagined, the subjects may respond differently tothe questionnaires the second time even if their at-titudes remain unchanged. During the first admin-istration of the questionnaire, the subjects may beunaware of its purpose. By the second measure-ment, they may have figured out that were inter-ested in measuring their prejudice. Because no onewishes to seem prejudiced, the subjects may "deanup" their answers the second time around. Thus,the film will seem to have reduced prejudice al-though, in fact, it has not.

This is an example of a more general problemthat plagues many forms of social research: Thevery act of studying something may change it. Thetechniques for dealing with this problem in the con-text of experimentation will be discussed in variousplaces throughout the chapter. The first techniqueinvolves the use of control groups.

Experimental and Control GroupsLaboratory experiments seldom, if ever, involve onlythe observation of an experimental group towhich a stimulus has been administered. In addi-tion, the researchers also observe a control group,which does not receive the experimental stimulus.

In the example of prejudice and African-American history, we might examine two groupsof subjects. To begin, we give each group a question-naire designed to measure their prejudice againstAfrican Americans. Then we show the film only tothe experimental group. Finally, we administer aposttest of prejudice to both groups. Figure 8-1 il-lustrates this basic experimental design.

Using a control group allows the researcherto detect any effects of the experiment itself. If theposttest shows that the overall level of prejudice ex-hibited by the control group has dropped as muchas that of the experimental group, then the appar-ent reduction in prejudice must be a function ofthe experiment or of some external factor ratherthan a function of the film. If, on the other hand,

The Classical Experiment . 223

prejudice is reduced only in the experimental group,this reduction would seem to be a consequence ofexposure to the film, because that's the only differ-ence between the two groups. Alternatively, if prej-udice is reduced in both groups but to a greater de-gree in the experimental group than in the controlgroup, that, too, would be grounds for assumingthat the film reduced prejudice.

The need for control groups in social researchbecame dear in connection with a series of studiesof employee satisfaction conducted by F. J. Roeth-lisberger and W. J. Dickson (1939) in the late 1920sand early 1930s. These two researchers were inter-ested in discovering what changes in working con-ditions would improve employee satisfaction andproductivity. To pursue this objective, they studiedworking conditions in the telephone "bank wiringroom" of the Western Electric Works in the Chi-cago suburb of Hawthorne, Illinois.

To the researchers' great satisfaction, they dis-covered that making working conditions better in-creased satisfaction and productivity consistently.As the workroom was brightened up through bet-ter lighting, for example, productivity went up.When lighting was further improved, productivitywent up again.

To further substantiate their scientific con-clusion, the researchers then dimmed the lights.Whoops-productivity improved again!

At this point it became evident that the wiring-room workers were responding more to the atten-tion given them by the researchers than to improvedworking conditions. As a result of this phenomenon,often called the Hawthorne effect, social researchershave become more sensitive to and cautious aboutthe possible effects of experiments themselves. In

experimental group in experimentation, a groupof subjects to whom an experimental stimulus isadministered.control group In experimentation, a group of sub-jects to whom no experimental stimulus is adminis-tered and who should resemble the experimentalgroup in all other respects. The comparison of the con-trol group and the experimental group at the end ofthe experiment points to the effect of the experimentalstimulus.

Page 3: Babbie Ch8 Experiments

224 . Chapter 8: Experiments

FIGURE 8-1Diagram of Basic Experimental Design

ExperimentalGroup

Measure dependentvariable

rAdminister experimental

stimulus (film)

Remeasure dependentvariable

Compare: Same?

Compare: Different?

ControlGroup

Measure dependentvariable

Remeasure dependentvariable

the wiring-room study, the use of a proper controlgroup-one that was studied intensively withoutany other changes in the working conditions-would have pointed to the existence of this effect.

The need for control groups in experimentationhas been nowhere more evident than in medicalresearch. Time and again, patients who participatein medical experiments have appeared to improve,and it has been unclear how much of the improve-ment has come from the experimental treatmentand how much from the experiment. In testing theeffects of new drugs, then, medical researchers fre-quently administer a placebo-a "drug" with norelevant effect, such as sugar pills-to a controlgroup. Thus, the control-group patients believe thatthey, like the experimental group, are receiving anexperimental drug. Often, they improve. If the newdrug is effective, however, those receiving the ac-tual drug will improve more than those receivingthe placebo.

In social scientific experiments, control groupsguard against not only the effects of the experi-ments themselves but also the effects of any eventsoutside the laboratory during the experiments. Inthe example of the study of prejudice, suppose thata popular African-American leader is assassinatedin the middle of, say, a week-long experiment.

Such an event may very well horrify the experi-mental subjects, requiring them to examine theirown attitudes toward African Americans, with theresult of reduced prejudice. Because such an effectshould happen about equally for members of thecontrol and experimental groups, a greater reduc-tion of prejudice among the experimental groupwould, again, point to the impact of the experi-mental stimulus: the documentary film.

Sometimes an experimental design requiresmore than one experimental or control group. Inthe case of the documentary film, for example, wemight also want to examine the impact of reading abook on African-American history. In that case, wemight have one group see the film and read thebook, another group only see the movie, still an-other group only read the book, and the controlgroup do neither. With this kind of design, we coulddetermine the impact of each stimulus separately,as well as their combined effect.

The Double-Blind ExperimentLike patients who improve when they merelythink they're receiving a new drug, sometimesexperimenters tend to prejudge results. In medical

research, the experimenters may be more likely to"observe" improvements among patients receivingthe experimental drug than among those receiv-ing the placebo. (This would be most likely, per-haps, for the researcher who developed the drug.)A double-blind experiment eliminates this pos-sibility, because in this design neither the subjectsnor the experimenters know which is the experi-mental group and which is the control. In themedical case, those researchers who were respon-sible for administering the drug and for noting im-provements would not be told which subjects werereceiving the drug and which the placebo. Con-versely, the researcher who knew which subjectswere in which group would not administer theexperiment.

In social scientific experiments, as in medicalexperiments, the danger of experimenter bias isfurther reduced to the extent that the operationaldefinitions of the dependent variables are dear andprecise. Thus, medical researchers would be lesslikely to unconsciously bias their reading of a pa-tient's temperature than they would be to bias theirassessment of how lethargic the patient was. Forthe same reason, the small group researcher wouldbe less likely to misperceive which subject spoke,or to whom he or she spoke, than whether thesubject's comments sounded cooperative or com-petitive, a more subjective judgment that's difficultto define in precise behavioral terms.

As I've indicated several times, seldom canwe devise operational definitions and measure-ments that are wholly precise and unambiguous.This is another reason why it can be appropriateto employ a double-blind design in social researchexperiments.

Selecting Subjectsin Chapter 7 we discussed the logic of sampling,which involves selecting a sample that is repre-sentative of some populations. Similar consider-ations apply to experiments. Because most socialresearchers work in colleges and universities, itseems likely that research laboratory experiments

Selecting Subjects . 225

would be conducted with college undergraduatesas subjects. Typically, the experimenter asks stu-dents enrolled in his or her classes to participate inexperiments or advertises for subjects in a collegenewspaper. Subjects may or may not be paid forparticipating in such experiments (recall also fromChapter 3 the ethical issues involved in asking stu-dents to participate in such studies).

In relation to the norm of generalizability inscience, this tendency dearly represents a poten-tial defect in social research. Simply put, collegeundergraduates are not typical of the public atlarge. There is a danger, therefore, that we maylearn much about the attitudes and actions of col-lege undergraduates but not about social attitudesand actions in general.

However, this potential defect is less signifi-cant in explanatory research than in descriptiveresearch. True, having noted the level of prejudiceamong a group of college undergraduates in ourpretesting, we would have little confidence that thesame level existed among the public at large. Onthe other hand, if we found that a documentaryfilm reduced whatever level of prejudice existedamong those undergraduates, we would have moreconfidence-without being certain-that it wouldhave a comparable effect in the community at large.Social processes and patterns of causal relationshipsappear to be more generalizable and more stablethan specific characteristics such as an individual'slevel of prejudice.

Aside from the question of generalizability, thecardinal rule of subject selection in experimenta-tion concerns the comparability of experimentaland control groups. Ideally, the control group rep-resents what the experimental group would belike if it had not been exposed to the experimentalstimulus. The logic of experiments requires, there-fore, that experimental and control groups be assimilar as possible. There are several ways to ac-complish this.

double-blind experiment An experimental designin which neither the subjects nor the experimentersknow which is the experimental group and which isthe control.

MW - 1

Page 4: Babbie Ch8 Experiments

226 . Chapter8: Experiments

Probability SamplingThe discussions of the logic and techniques of prob-ability sampling in Chapter 7 provide one methodfor selecting two groups of people that are similarto each other. Beginning with a sampling framecomposed of all the people in the population understudy, the researcher might select two probabilitysamples. If these samples each resemble the totalpopulation from which they're selected, they'll alsoresemble each other.

Recall also, however, that the degree of resem-blance (representativeness) achieved by probabilitysampling is largely a function of the sample size. Asa general guideline, probability samples of less than100 are not likely to be terribly representative, andsocial scientific experiments seldom involve thatmany subjects in either experimental or controlgroups. As a result, then, probability sampling isseldom used in experiments to select subjects froma larger population. Researchers do, however, usethe logic of random selection when they assignsubjects to groups.

RandomizationHaving recruited, by whatever means, a total groupof subjects, the experimenter may randomly assignthose subjects to either the experimental or thecontrol group. The researcher might accomplishsuch randomization by numbering all of the sub-jects serially and selecting numbers by means of arandom number table. Alternatively, the experi-menter might assign the odd-numbered subjects tothe experimental group and the even-numberedsubjects to the control group.

Let's return again to the basic concept of proba-bility sampling. If we recruit 40 subjects all to-

randomization A technique for assigning experi-mental subjects to experimental and control groupsrandomly.matching In connection with experiments, the proce-dure whereby pairs of subjects are matched on the ba-sis of their similarities on one or more variables, andone member of the pair is assigned to the experimentalgroup and the other to the control group.

gether, in response to a newspaper advertisement,for example, there's no reason to believe that the40 subjects represent the entire population fromwhich they have been drawn. Nor can we assumethat the 20 subjects randomly assigned to the ex-perimental group represent that larger population.We can have greater confidence, however, that the20 subjects randomly assigned to the experimentalgroup will be reasonably similar to the 20 assignedto the control group.

Following the logic of our earlier discussions ofsampling, we can see our 40 subjects as a popula-tion from which we select two probability samples-each consisting of half the population. Because eachsample reflects the characteristics of the total popu-lation, the two samples will mirror each other.

As we saw in Chapter 7, our assumption ofsimilarity in the two groups depends in part on thenumber of subjects involved. In the extreme case, ifwe recruited only two subjects and assigned, by theflip of a coin, one as the experimental subject andone as the control, there would be no reason to as-sume that the two subjects are similar to each other.With larger numbers of subjects, however, random-ization makes good sense.

MatchingAnother way to achieve comparability between theexperimental and control groups is through match-ing. This process is similar to the quota samplingmethods discussed in Chapter 7. If 12 of our sub-jects are young white men, we might assign 6 ofthem at random to the experimental group and theother 6 to the control group. if 14 are middle-agedAfrican-American women, we might assign 7 toeach group. We repeat this process for every rele-vant grouping of subjects.

The overall matching process could be most ef-ficiently achieved through the creation of a quotamatrix constructed of all the most relevant charac-teristics. Figure 8-2 provides a simplified illustrationof such a matrix. In this example, the experimenterhas decided that the relevant characteristics are race,age, and gender. Ideally, the quota matrix is con-structed to result in an even number of subjects ineach cell of the matrix. Then, half the subjects in

FIGURE 8-2Quota Matrix Illustration

each cell go into the experimental group and halfinto the control group.

Alternatively, we might recruit more subjectsthan our experimental design requires. We mightthen examine many characteristics of the large ini-tial group of subjects. Whenever we discover a pairof quite similar subjects, we might assign one atrandom to the experimental group and the otherto the control group. Potential subjects who are un-like anyone else in the initial group might be leftout of the experiment altogether.

Whatever method we employ, the desired re-sult is the same. The overall average description ofthe experimental group should be the same as thatof the control group. For example, on average bothgroups should have about the same ages, the samegender composition, the same racial composition,and so forth. This test of comparability should beused whether the two groups are created throughprobability sampling or through randomization.

Thus far I have referred to the "relevant" vari-ables without saying dearly what those variablesare. Of course, these variables cannot be specifiedin any definite way, any more than I could specify

Selecting Subjects . 227

in Chapter 7 which variables should be used instratified sampling. Which variables are relevant ul-timately depends on the nature and purpose of anexperiment. As a general rule, however, the controland experimental groups should be comparable interms of those variables that are most likely to berelated to the dependent variable under study. In astudy of prejudice, for example, the two groupsshould be alike in terms of education, ethnicity,and age, among other characteristics. In somecases, moreover, we may delay assigning subjectsto experimental and control groups until we haveinitially measured the dependent variable. Thus,for example, we might administer a questionnairemeasuring subjects' prejudice and then match theexperimental and control groups on this variableto assure ourselves that the two groups exhibit thesame overall level of prejudice.

Matching or Randomization?When assigning subjects to the experimental andcontrol groups, you should be aware of two argu-ments in favor of randomization over matching.

Men Women

AfricanAmerican White African

American White

Under 30 years 8 12 10 16

30 to 50 years 18 30 14 28

Over 50 years 12 20 12 22

Page 5: Babbie Ch8 Experiments

228 . Chapter8: Experiments

First, you may not be in a position to know in ad-vance which variables will be relevant for the match-ing process. Second, most of the statistics used toanalyze the results of experiments assume random-ization. Failure to design your experiment that way,then, makes your later use of those statistics lessmeaningful.

On the other hand, randomization only makessense if you have a fairly large pool of subjects, sothat the laws of probability sampling apply. Withonly a few subjects, matching would be a betterprocedure.

Sometimes researchers can combine matchingand randomization. When conducting an experi-ment on the educational enrichment of young ado-lescents, for example, J. Milton Yinger and his col-leagues (1977) needed to assign a large number ofstudents, aged 13 and 14, to several different ex-perimental and control groups to ensure the com-parability of students composing each of the groups.They achieved this goal by the following method.

Beginning with a pool of subjects, the research-ers first created strata of students nearly identical toone another in terms of some 15 variables. Fromeach of the strata, students were randomly assignedto the different experimental and control groups. Inthis fashion, the researchers actually improved onconventional randomization. Essentially, they hadused a stratified sampling procedure (Chapter 7),except that they had employed far more stratifica-tion variables than are typically used in, say, surveysampling.

Thus far I've described the classical experiment-the experimental design that best represents thelogic of causal analysis in the laboratory. In prac-tice, however, social researchers use a great varietyof experimental designs. Let's look at some now.

Variations on Experimental DesignDonald Campbell and Julian Stanley (1963), ina classic book on research design, describe some16 different experimental and quasi-experimentaldesigns. This section describes some of these varia-tions to better show the potential for experimenta-tion in social research.

Preexperimental Research DesignsTo begin, Campbell and Stanley discuss three "pre-experimental" designs, not to recommend thembut because they're frequently used in less-than-professional research. These designs are called "pre-experimental" to indicate that they do not meet thescientific standards of experimental designs. In thefirst such design-the one-shot case study-the re-searcher measures a single group of subjects on adependent variable following the administration ofsome experimental stimulus. Suppose, for example,that we show the African-American history filmmentioned earlier to a group of people and thenadminister a questionnaire that seems to measureprejudice against African Americans. Suppose fur-ther that the answers given to the questionnaireseem to represent a low level of prejudice. We mightbe tempted to conclude that the film reduced prej-udice. Lacking a pretest, however, we can't be sure.Perhaps the questionnaire doesn't really represent avery sensitive measure of prejudice, or perhaps thegroup we're studying was low in prejudice to beginwith. In either case, the film might have made nodifference, though our experimental results mighthave misled us into thinking it did.

The second preexperimental design discussedby Campbell and Stanley adds a pretest for the ex-perimental group but lacks a control group. Thisdesign-which the authors call the onegroup pretest-posttestdesign-suffers from the possibility that somefactor other than the independent variable mightcause a change between the pretest and posttestresults, such as the assassination of a respectedAfrican-American leader. Thus, although we cansee that prejudice has been reduced, we can't besure that it was the film that caused that reduction.

To round out the possibilities for preexperimen-tal designs, Campbell and Stanley point out thatsome research is based on experimental and con-trol groups but has no pretests. They call this designthe static group comparison. For example, we mightshow the African-American history film to onegroup and not to another and then measure preju-dice in both groups. If the experimental group hadless prejudice at the conclusion of the experiment,we might assume the film was responsible. But un-less we had randomized our subjects, we would

FIGURE 8-3Three Preexperimental Research Designs

One-Shot Case StudyA man who exercisesis observed to be intrim shape

One-Group Pretest-Posttest DesignAn overweight man whoexercises is later observedto be in trim shape

Static-Group ComparisonA man who exercises isobserved to be in trimshape while one whodoesn't is observed tobe overweight

Time 1

Time 1

Time 2

Comparison

Time 3

Time 2

Time 3

Some intuitivestandard of whatconstitutes atrim shape

n

have no way of knowing that the two groups hadthe same degree of prejudice initially; perhaps theexperimental group started out with less.

Figure 8-3 graphically illustrates these threepreexperimental research designs by using a differ-ent research question: Does exercise cause weightreduction? To make the several designs clearer, thefigure shows individuals rather than groups, butthe same logic pertains to group comparisons. Let's

Variations on Experimental Design . 229

review the three preexperimental designs in thisnew example.

The one-shot case study represents a commonform of logical reasoning in everyday life. Askedwhether exercise causes weight reduction, we maybring to mind an example that would seem to sup-port the proposition: someone who exercises and isthin. There are problems with this reasoning, how-ever. Perhaps the person was thin long before be-

Page 6: Babbie Ch8 Experiments

230 . Chapter8: Experiments

ginning to exercise. Or perhaps he became thin forsome other reason, like eating less or getting sick.The observations shown in the diagram do notguard against these other possibilities. Moreover,the observation that the man in the diagram is intrim shape depends on our intuitive idea of whatconstitutes trim and overweight body shapes. Alltold, this is very weak evidence for testing the rela-tionship between exercise and weight loss.

The one-group pretest-posttest design offerssomewhat better evidence that exercise producesweight loss. Specifically, we have ruled out the pos-sibility that the man was thin before beginning toexercise. However, we still have no assurance thatit was his exercising that caused him to lose weight.

Finally, the static-group comparison eliminatesthe problem of our questionable definition of whatconstitutes trim or overweight body shapes. In thiscase, we can compare the shapes of the man whoexercises and the one who does not. This design,however, reopens the possibility that the man whoexercises was thin to begin with.

Validity Issues in Experimental ResearchAt this point I want to present in a more systematicway the factors that affect the validity of experimen-tal research. First we'll look at what Campbell andStanley call the sources of internal invalidity, reviewedand expanded in a follow-up book by Thomas Cookand Donald Campbell (1979). Then we'll considerthe problem of generalizing experimental results tothe "real" world, referred to as external invalidity.Having examined these, we'll be in a position toappreciate the advantages of some of the more so-phisticated experimental and quasi-experimentaldesigns social science researchers sometimes use.

Sources of Internal InvalidityThe problem of internal invalidity refers to thepossibility that the conclusions drawn from experi-mental results may not accurately reflect what has

Internal invalidity Refers to the possibility that theconclusions drawn from experimental results may notaccurately reflect what went onn in the experiment itself.

gone on in the experiment itself. The threat of in-ternal invalidity is present whenever anything otherthan the experimental stimulus can affect the de-pendent variable.

Campbell and Stanley (1963:5-6) and Cookand Campbell (1979:51-55) point to severalsources of internal invalidity. Here are twelve:

1. History. During the course of the experiment,historical events may occur that will confoundthe experimental results. The assassination of anAfrican-American leader during the course of anexperiment on reducing anti-African-Americanprejudice is one example; the arrest of an African-American leader for some heinous crime, whichmight increase prejudice, is another.2. Maturation. People are continually growing andchanging, and such changes can affect the results ofthe experiment. In a long-term experiment, the factthat the subjects grow older (and wiser?) may havean effect. In shorter experiments, they may growtired, sleepy, bored, or hungry, or change in otherways that affect their behavior in the experiment.3. Testing. As we have seen, often the process oftesting and retesting influences people's behavior,thereby confounding the experimental results. Sup-pose we administer a questionnaire to a group as away of measuring their prejudice. Then we admin-ister an experimental stimulus and remeasure theirprejudice. By the time we conduct the posttest, thesubjects will probably have become more sensitiveto the issue of prejudice and will be more thought-ful in their answers. In fact, they may have figuredout that we're trying to find out how prejudicedthey are, and, because few people like to appearprejudiced, they may give answers that they thinkwe want or that will make them look good.4. Instrumentation. The process of measurement inpretesting and posttesting brings in some of the is-sues of conceptualization and operationalization dis-cussed earlier in the book. If we use different mea-sures of the dependent variable in the pretest andposttest (say, different questionnaires about preju-dice), how can we be sure they're comparable toeach other? Perhaps prejudice will seem to decreasesimply because the pretest measure was more sen-sitive than the posttest measure. Or if the measure-ments are being made by the experimenters, their

standards or their abilities may change over thecourse of the experiment.5. Statistical regression. Sometimes it's appropriate toconduct experiments on subjects who start out withextreme scores on the dependent variable. If youwere testing a new method for teaching math tohardcore failures in math, you'd want to conductyour experiment on people who previously havedone extremely poorly in math. But consider for aminute what's likely to happen to the math achieve-ment of such people over time without any experi-mental interference. They're starting out so low thatthey can only stay at the bottom or improve: Theycan't get worse. Even without any experimentalstimulus, then, the group as a whole is likely to showsome improvement over time. Referring to a regres-sion to the mean, statisticians often point out that ex-tremely tall people as a group are likely to havechildren shorter than themselves, and extremelyshort people as a group are likely to have childrentaller than themselves. There is a danger, then, thatchanges occurring by virtue of subjects starting outin extreme positions will be attributed erroneouslyto the effects of the experimental stimulus.6. Selection biases. We discussed selection bias ear-lier when we examined different ways of selectingsubjects for experiments and assigning them to ex-perimental and control groups. Comparisons don'thave any meaning unless the groups are compa-rable at the start of an experiment.7. Experimental mortality. Although some social ex-periments could, I suppose, kill subjects, experimen-tal mortality refers to a more general and less ex-treme problem. Often, experimental subjects willdrop out of the experiment before it's completed,and this can affect statistical comparisons and con-clusions. In the classical experiment involving anexperimental and a control group, each with apretest and posttest, suppose that the bigots in theexperimental group are so offended by the African-American history film that they tell the experi-menter to forget it, and they leave. Those subjectssticking around for the posttest will have been lessprejudiced to start with, so the group results willreflect a substantial "decrease" in prejudice.8. Causal time order. Though rare in social research,ambiguity about the ti me order of the experimen-

Variations on Experimental Design . 23 1

tal stimulus and the dependent variable can arise.Whenever this occurs, the research conclusionthat the stimulus caused the dependent variablecan be challenged with the explanation that the"dependent" variable actually caused changes inthe stimulus.9. Diffusion or imitation of treatments. When experi-mental and control-group subjects can communi-cate with each other, experimental subjects maypass on some elements of the experimental stimu-lus to the control group. For example, supposethere's a lapse of time between our showing of theAfrican-American history film and the posttest ad-ministration of the questionnaire. Members of theexperimental group might tell control-group sub-jects about the film. In that case, the control groupbecomes affected by the stimulus and is not a realcontrol. Sometimes we speak of the control groupas having been "contaminated."10. Compensation. As you'll see in Chapter 12, inexperiments in real-life situations-such as a spe-cial educational program-subjects in the controlgroup are often deprived of something consideredto be of value. In such cases, there may be pres-sures to offer some form of compensation. For ex-ample, hospital staff might feel sorry for control-group patients and give them extra "tender lovingcare." In such a situation, the control group is nolonger a genuine control group.11. Compensatory rivalry. In real-life experiments,the subjects deprived of the experimental stimulusmay try to compensate for the missing stimulus byworking harder. Suppose an experimental mathprogram is the experimental stimulus; the controlgroup may work harder than before on their mathin an attempt to beat the "special" experimentalsubjects.12. Demoralization. On the other hand, feelings ofdeprivation within the control group may result intheir giving up. In educational experiments, de-moralized control-group subjects may stop study-ing, act up, or get angry.

These, then, are some of the sources of internalinvalidity in experiments. Aware of these, experi-menters have devised designs aimed at handlingthem. The classical experiment, if coupled withproper subject selection and assignment, addresses

Page 7: Babbie Ch8 Experiments

FIGURE 8-4The Classical Experiment: Using an African-American History Film to Reduce Prejudice

each of these problems. Let's look again at thatstudy design, presented graphically in Figure 8-4.

If we use the experimental design shown inFigure 8-4, we should expect two findings. Forthe experimental group, the level of prejudicemeasured in their posttest should be less than wasfound in their pretest. In addition, when the twoposttests are compared, less prejudice should befound in the experimental group than in the con-trol group.

This design also guards against the problem ofhistory in that anything occurring outside the ex-periment that might affect the experimental groupshould also affect the control group. Consequently,there should still be a difference in the two posttestresults. The same comparison guards against prob-lems of maturation as long as the subjects havebeen randomly assigned to the two groups. Testingand instrumentation can't be problems, becauseboth the experimental and control groups are sub-ject to the same tests and experimenter effects. Ifthe subjects have been assigned to the two groupsrandomly, statistical regression should affect both

equally, even if people with extreme scores on prej-udice are being studied. Selection bias is ruled outby the random assignment of subjects. Experimen-tal mortality is more complicated to handle, but thedata provided in this study design offer several waysto deal with it. Slight modifications to the design-administering a placebo (such as a film havingnothing to do with African Americans) to the con-trol group, for example-can make the problemeven easier to manage.

The remaining five problems of internal inva-lidity are avoided through the careful administra-tion of a controlled experimental design. The ex-perimental design we've been discussing facilitatesthe dear specification of independent and depen-dent variables. Experimental and control subjectscan be kept separate, reducing the possibility ofdiffusion or imitation of treatments. Administra-tive controls can avoid compensations given to thecontrol group, and compensatory rivalry can bewatched for and taken into account in evaluatingthe results of the experiment, as can the problemof demoralization.

FIGURE 8-5The Solomon Four-Group Design

Sources of External Invalidity

internal invalidity accounts for only some of thecomplications faced by experimenters. In addition,there are problems of what Campbell and Stanleycall external invalidity, which relates to the gen-eralizability of experimental findings to the "real"world. Even if the results of an experiment are anaccurate gauge of what happened during that ex-periment, do they really tell us anything about lifein the wilds of society?

Campbell and Stanley describe four forms ofthis problem; I'll present one as an illustration.The generalizability of experimental findings isjeopardized, as the authors point out, if there's aninteraction between the testing situation and theexperimental stimulus (1963:18). Here's an ex-ample of what they mean.

Staying with the study of prejudice and the Af-rican-American history film, let's suppose that ourexperimental group-in the classical experiment-has less prejudice in its posttest than in its pretestand that its posttest shows less prejudice than that

Variations on Experimental Design . 233

Group 1

Group 2

Group 3

Group 4

Pretest

Pretest

Stimulus

Stimulus

TIME

Ot Posttest should show less prejudice than the pretest.O2 Posttest and pretest should show the same amount of prejudice.3O Group 1 should show less prejudice than Group 2.

® Group 3 should show less prejudice than Group 4.

Posttest

Posttest

Posttest4

Posttest

of the control group. We can be confident that thefilm actually reduced prejudice among our experi-mental subjects. But would it have the same effectif the film were shown in theaters or on television?We cant be sure, because the film might be effec-tive only when people have been sensitized to theissue of prejudice, as the subjects may have been intaking the pretest. This is an example of interactionbetween the testing and the stimulus. The classicalexperimental design cannot control for that possi-bility. Fortunately, experimenters have devisedother designs that can.

The Solomon four group design ( D. Campbell andStanley 1963:24-25) addresses the problem of test-ing interaction with the stimulus. As the name sug-gests, it involves four groups of subjects, assignedrandomly from a pool. Figure 8-5 presents this de-sign graphically.

external invalidity Refers to the possibility that cnn-dusions drawn from experimental results may not begeneralizable to the "real" world.

Page 8: Babbie Ch8 Experiments

234 , Chapter 8: Experiments

Notice that Groups 1 and 2 in Figure 8-5 com-pose the classical experiment, with Group 2 beingthe control group. Group 3 is administered the ex-perimental stimulus without a pretest, and Group 4is only posttested. This experimental design permitsfour meaningful comparisons, which are describedin the figure. If the African-American history filmreally reduces prejudice-unaccounted for by theproblem of internal validity and unaccounted forby an interaction between the testing and the stim-ulus-we should expect four findings:

1. In Group 1, posttest prejudice should be lessthan pretest prejudice.]

2. The Group 2 pretest and posttest should showthe same degree of prejudice.

3. There should be less prejudice evident in theGroup 1 posttest than in the Group 2 posttest.

4. The Group 3 posttest should show less preju-dice than the Group 4 posttest.

Notice that findings (3) and (4) rule out any in-teraction between the testing and the stimulus. Andremember that these comparisons are meaningfulonly if subjects have been assigned randomly to thedifferent groups, thereby providing groups of equalprejudice initially, even though their preexperimen-tal prejudice is only measured in Groups 1 and 2.

There is a side benefit to this research design, asthe authors point out. Not only does the Solomonfour-group design rule out interactions betweentesting and the stimulus, it also provides data forcomparisons that will reveal how much of this in-teraction has occurred in a classical experiment.This knowledge allows a researcher to review andevaluate the value of any prior research that usedthe simpler design.

The last experimental design I'll mention hereis what Campbell and Stanley (1963:25-26) callthe posttest-only control group design; it consists of thesecond half-Groups 3 and 4-of the Solomondesign. As the authors argue persuasively, withproper randomization, only Groups 3 and 4 areneeded for a true experiment that controls for theproblems of internal invalidity as well as for the in-teraction between testing and stimulus. With ran-domized assignment to experimental and controlgroups (which distinguishes this design from the

static-group comparison discussed earlier), the sub-jects will be initially comparable on the dependentvariable-comparable enough to satisfy the con-ventional statistical tests used to evaluate the re-sults -so it's not necessary to measure them. In-deed, Campbell and Stanley suggest that the onlyjustification for pretesting in this situation is tradi-tion. Experimenters have simply grown accustomedto pretesting and feel more secure with research de-signs that include it. Be clear, however, that thispoint applies only to experiments in which subjectshave been assigned to experimental and controlgroups randomly, because that's what justifies theassumption that the groups are equivalent withoutactually measuring them to find out.

This discussion has introduced the intricacies ofexperimental design, its problems, and some solu-tions. There are, of course, a great many other ex-perimental designs in use. Some involve more thanone stimulus and combinations of stimuli. Othersinvolve several tests of the dependent variable overtime and the administration of the stimulus at dif-ferent times for different groups. If you're interestedin pursuing this topic, you might look at the Camp-bell and Stanley book.

An Illustration of ExperimentationExperiments have been used to study a wide va-riety of topics in the social sciences. Some experi-ments have been conducted within laboratory situ-ations; others occur out in the "real world." Thefollowing discussion provides a glimpse of both.

Let's begin with a "real world" example. InGeorge Bernard Shaw's well-loved play, Pygmalion-the basis of the long-running Broadway musical,My Fair Lady-Eliza Doolittle speaks of the powersothers have in determining our social identity.Here's how she distinguishes the way she's treatedby her tutor, Professor Higgins, and by Higgins'sfriend, Colonel Pickering:

You see, really and truly, apart from the thingsanyone can pick up (the dressing and the properway of speaking, and so on), the difference be-tween a lady and a flower girl is not how she

behaves, but how she's treated. I shall alwaysbe a flower girl to Professor Higgins, because healways treats me as a flower girl, and alwayswill, but I know I can be a lady to you, becauseyou always treat me as a lady, and always will.

(Act 'vThe sentiment Eliza expresses here is basic

social science, addressed more formally by sociolo-gists such as Charles Horton Cooley (the "looking-glass self") and George Herbert Mead ("the gener-alized other"). The basic point is that who we thinkwe are-our self-concept-and how we behaveare largely a function of how others see and treatus. Related to this, the way others perceive us islargely conditioned by expectations they have in ad-vance. If they've been told we're stupid, for example,they're likely to see us that way-and we may cometo see ourselves that way and actually act stupidly.

This phenomenon has generally been called thePygmalion effect, and it's nicely suited to controlled ex-periments. In one of the best-known experimentalinvestigations of the Pygmalion effect, Robert Rosen-thal and Lenore Jacobson (1968) administered whatthey called a "Harvard Test of Inflected Acquisition"to students in a West Coast school. Subsequently,they met with the students' teachers to present theresults of the test. In particular, Rosenthal and Ja-cobson identified certain students as very likely toexhibit a sudden spurt in academic abilities duringthe coming year, based on the results of the test.

When IQ test scores were compared later, theresearchers' predictions proved accurate. The stu-dents identified as "sputters" far exceeded theirclassmates during the following year, suggestingthat the predictive test was a powerful one. In fact,the test was a hoax! The researchers had madetheir predictions randomly among both good andpoor students. What they told the teachers did notreally reflect students' test scores at all. The progressmade by the "sputters" was simply a result of theteachers expecting the improvement and payingmore attention to those students, encouragingthem, and rewarding them for achievements. (No-tice the similarity between this situation and theHawthorne effect discussed earlier in this chapter.)

The Rosenthal-Jacobson study attracted a greatdeal of popular as well as scientific attention. Sub-

An Illustration of Experimentation . 235

sequent experiments have focused on specific as-pects of what has become known as the attributionprocess, or the expectations communication model. Thisresearch, largely conducted by psychologists, paral-lels research primarily by sociologists, which takesa slightly different focus and is often gathered un-der the label expectations-states theory. Psychologicalstudies focus on situations in which the expecta-tions of a dominant individual affect the perfor-mance of subordinates-as in the case of a teacherand students, or a boss and employees. The socio-logical research has tended to focus more on therole of expectations among equals in small, task-oriented groups. In a jury, for example, how do ju-rors initially evaluate each other, and how do thoseinitial assessments affect their later interactions?(You can learn more about this phenomenon, in-cluding attempts to find practical applications, bysearching the Web for "The Pygmalion Effect.")

Here's an example of an experiment conductedto examine the way our perceptions of our abilitiesand those of others affect our willingness to acceptthe other person's ideas. Martha Foschi, G. KeithWarriner, and Stephen Hart (1985) were particu-larly interested in the role "standards" play in thatrespect:

In general terms, by "standards" we mean howwell or how poorly a person has to perform inorder for an ability to be attributed or deniedhim/her. In our view, standards are a key vari-able affecting how evaluations are processedand what expectations result. For example, de-pending on the standards used, the same levelof success may be interpreted as a major ac-complishment or dismissed as unimportant.

(1985:108-9)

To begin examining the role of standards, theresearchers designed an experiment involving fourexperimental groups and a control. Subjects weretold that the experiment involved something called"pattern recognition ability," which was an innateability some people had and others didn't. The re-searchers said subjects would be working in pairson pattern recognition problems.

In fact, of course, there's no such thing as pat-tern recognition ability. The object of the experiment

Page 9: Babbie Ch8 Experiments

236 . Chapter 8: Experiments

was to determine how information about this sup-posed ability affected subjects' subsequent behavior.

The first stage of the experiment was to "test"each subject's pattern recognition abilities. If youhad been a subject in the experiment, you wouldhave been shown a geometrical pattern for 8 sec-onds, followed by two more patterns, each of whichwas similar to but not the same as the first one. Yourtask would be to choose which of the subsequentset had a pattern closest to the first one you saw.You would be asked to do this 20 times, and a com-puter would print out your "score.' Half the sub-jects would be told that they had gotten 14 correct;the other half would be told that they had gottenonly 6 correct-regardless of which patterns theymatched with which. Depending on the luck ofthe draw, you would think you had done eitherquite well or quite badly. Notice, however, that youwouldn't really have any standard for judging yourperformance-maybe getting 4 correct would beconsidered a great performance.

At the same time you were given your score,however, you would also be given your "partner'sscore," although both the "partners" and their"scores" would also be computerized fictions. (Sub-jects were told they would be communicating withtheir partners via computer terminals but wouldnot be allowed to see each other.) If you were as-signed a score of 14, you'd be told your partner hada score of 6; if you were assigned 6, you'd be toldyour partner had 14.

This procedure meant that you would enterthe teamwork phase of the experiment believingeither (1) you had done better than your partneror (2) you had done worse than your partner. Thisinformation constituted part of the "standard" youwould be operating under in the experiment. Inaddition, half of each group was told that a score ofbetween 12 and 20 meant the subject definitely hadpattern recognition ability; the other subjects weretold that a score of 14 wasn't really high enough toprove anything definite. Thus, you would emergefrom this with one of the following beliefs:

1. You are definitely better at pattern recognitionthan your partner.

2. You are possibly better than your partner.

3. You are possibly worse than your partner.4. You are definitely worse than your partner.

The control group for this experiment was toldnothing about their own abflities or their partners'.In other words, they had no expectations.

The final step in the experiment was to set the"teams" to work. As before, you and your partnerwould be given an initial pattern, followed by acomparison pair to choose from. When you en-tered your choice in this round, however, youwould be told what your partner had answered;then you would be asked to choose again. In yourfinal choice, you could either stick with your origi-nal choice or switch. The "partner's" choice was, ofcourse, created by the computer, and as you canguess, there were often a disagreements in theteams: 16 out of 20 times, in fact.

The dependent variable in this experiment wasthe extent to which subjects would switch theirchoices to match those of their partners. The re-searchers hypothesized that the definitely bettergroup would switch least often, followed by theprobably better group, followed by the control group,followed by the probably worse group, followed bythe definitely worse group, who would switch mostoften.

The number of times subjects in the five groupsswitched their answers follows. Realize that eachhad 16 opportunities to do so. These data indicatethat each of the researchers' expectations was cor-rect-with the exception of the comparison be-t ween the possibly worse and definitely worse groups.Although the latter group was in fact the morelikely to switch, the difference was too small to betaken as a confirmation of the hypothesis. (Chap-ter 16 will discuss the statistical tests that let re-searchers make decisions like this.)

In more detailed analyses, it was found that thesame basic pattern held for both men and women,though it was somewhat clearer for women thanfor men. Here are the actual data:

Because specific research efforts like this onesometimes seem extremely focused in their scope,you might wonder about their relevance to any-thing. As part of a larger research effort, however,studies like this one add concrete pieces to our un-derstanding of more general social processes.

It's worth taking a minute or so to considersome of the life situations where "expectationstates" might have very real and important conse-quences. I've mentioned the case of jury delibera-tions. How about all forms of prejudice and dis-crimination? Or, consider how expectation statesfigure into job interviews or meeting your heart-throb's parents. If you think about it, you'll un-doubtedly see other situations where these labora-tory concepts apply in real life.

"Natural" ExperimentsAlthough we tend to equate the terms experiment

and laboratory experiment, many important socialscientific experiments occur outside controlled set-tings, often in the course of normal social events.Sometimes nature designs and executes experi-ments that we can observe and analyze; sometimessocial and political decision makers serve this natu-ral function.

Imagine, for example, that a hurricane hasstruck a particular town. Some residents of thetown suffer severe financial damages, and others

"Natural" Experiments . 237

escape relatively lightly. What, we might ask, arethe behavioral consequences of suffering a naturaldisaster? Are those who suffer most more likely totake precautions against future disasters than arethose who suffer least? To answer these questions,we might interview residents of the town sometime after the hurricane. We might question themregarding their precautions before the hurricaneand the ones they're currently taking, comparingthe people who suffered greatly from the hurricanewith those who suffered relatively little. In thisfashion, we might take advantage of a natural ex-periment, which we could not have arranged evenif we'd been perversely willing to do so.

A similar example comes from the annals ofsocial research concerning World War II. After thewar ended, social researchers undertook retrospec-tive surveys of wartime morale among civilians inseveral German cities. Among other things, theywanted to determine the effect of mass bombing onthe morale of civilians. They compared the reportsofwartime morale of residents in heavily bombedcities with reports from cities that received relativelylittle bombing. (Bombing did not reduce morale.)

Because the researcher must take things prettymuch as they occur, natural experiments raisemany of the validity problems discussed earlier.Thus when Stanislav Kasl, Rupert Chisolm, andBrenda Eskenazi (1981) chose to study the impactthat the Three Mile Island (TMI) nuclear accidentin Pennsylvania had on plant workers, they had tobe especially careful in the study design:

Disaster research is necessarily opportunistic,quasi-experimental, and after-the-fact. In theterminology of Campbell and Stanley's classicalanalysis of research designs, our study falls intothe "static-group comparison" category, consid-ered one of the weak research designs. How-ever, the weaknesses are potential and theiractual presence depends on the unique circum-stances of each study.

(1981:474)

The foundation of this study was a survey ofthe people who had been working at Three Mile Is-land on March 28, 1979, when the cooling systemfailed in the number 2 reactor and began melting

Group

Mean NumberofSwitches

Definitely better 5.05Possibly better 6.23Control group 7.95Possibly worse 9.23Definitely worse 9.28

Mean Numberof Switches

Women Men

Definitely better 4.50 5.66Possibly better 6.34 6.10Control group 7.68 8.34Possibly worse 9.36 9.09Definitely worse 10.00 8.70

Page 10: Babbie Ch8 Experiments

238 . Chapter8: Experiments

the uranium core. The survey was conducted fiveto six months after the accident. Among otherthings, the survey questionnaire measured work-ers' attitudes toward working at nuclear powerplants. B they had measured only the TMIworkers'attitudes after the accident, the researchers wouldhave had no idea whether attitudes had changed asa consequence of the accident. But they improvedtheir study design by selecting another, nearby-seemingly comparable-nuclear power plant (ab-breviated as PB) and surveyed workers there as acontrol group: hence their reference to a static-group comparison.

Even with an experimental and a control group,the authors were wary of potential problems in theirdesign. In particular, their design was based on theidea that the two sets of workers were equivalent toone another, except for the single fact of the acci-dent. The researchers could have assumed this ifthey had been able to assign workers to the twoplants randomly, but of course that was not the case.instead, they needed to compare characteristics ofthe two groups and infer whether they were equiv-alent. Ultimately, the researchers concluded thatthe two sets of workers were very much alike, andthe plant the employees worked at was merely afunction of where they lived.

Even granting that the two sets of workers wereequivalent, the researchers faced another problemof comparability. They could not contact all theworkers who had been employed at TMIat the timeof the accident. The researchers discussed the prob-lem as follows:

One special attrition problem in this study wasthe possibility that some of the no-contactnonrespondents among the TMI subjects, butnot PB subjects, had permanently left the areabecause of the accident. This biased attritionwould, most likely, attenuate the estimatedextent of the impact. Using the evidence ofdisconnected or "not in service" telephonenumbers, we estimate this bias to be negligible(1 percent).

(Kasl et al. 1981:475)

The TMI example points to both the specialproblems involved in natural experiments and the

possibility for taking those problems into account.Social research generally requires ingenuity and in-sight; natural experiments call for a little more thanthe average.

Earlier in this chapter, we used a hypotheticalexample of studying whether an African-Americanhistory film reduced prejudice. Sandra Ball-Rokeach, Joel Grube, and Milton Rokeach (1981)were able to address that topic in real life through anatural experiment. In 1977, the television drama-tization of Alex Haley's Roots, a historical saga aboutAfrican Americans, was presented by ABC on eightconsecutive nights. It garnered the largest audiencesin television history up to that time. Ball-Rokeachand her colleagues wanted to know whether Rootschanged white Americans' attitudes toward AfricanAmericans. Their opportunity arose in 1979, whena sequel-Roots: The Next Generation-was televised.Although it would have been nice (from a re-searcher's point of view) to assign random samplesof Americans either to watch or not to watch theshow, that wasn't possible. Instead, the researchersselected four samples in Washington State andmailed questionnaires that measured attitudes to-ward African Americans. Following the last epi-sode of the show, respondents were called andasked how many, if any, episodes they had watched.Subsequently, questionnaires were sent to respon-dents, remeasuring their attitudes toward AfricanAmericans.

By comparing attitudes before and after forboth those who watched the show and those whodidn't, the researchers reached several conclusions.For example, they found that people with alreadyegalitarian attitudes were much more likely towatch the show than were those who were moreprejudiced toward African Americans: a self-selec-tion phenomenon. Comparing the before and afterattitudes of those who watched the show, more-over, suggested the show itself had little or no ef-fect. Those who watched it were no more egalitar-ian afterward than they had been before.

This example anticipates the subject of Chap-ter 12, evaluation research, which can be seen as aspecial type of natural experiment. As you'll see,evaluation research involves taking the logic of ex-perimentation into the field to observe and evalu-

ate the effects of stimuli in real life. Because this isan increasingly important form of social research,an entire chapter is devoted to it.

Strengths and Weaknessesof the Experimental MethodExperiments are the primary tool for studyingcausal relationships. However, like all researchmethods, experiments have both strengths andweaknesses.

The chief advantage of a controlled experimentlies in the isolation of the experimental variableand its impact over time. This is seen most clearlyi n terms of the basic experimental model. A groupof experimental subjects are found, at the outset ofthe experiment, to have a certain characteristic; fol-lowing the administration of an experimental stimu-lus, they are found to have a different characteristic.To the extent that subjects have experienced noother stimuli, we may conclude that the change ofcharacteristics is attributable to the experimentalstimulus.

Further, since individual experiments are oftenrather limited in scope, requiring relatively littletime and money and relatively few subjects, we of-ten can replicate a given experiment several timesusing several different groups of subjects. (This isn'talways the case, of course, but it's usually easier torepeat experiments than, say, surveys.) As in allother forms of scientific research, replication of re-search findings strengthens our confidence in thevalidity and generalizability of those findings.

The greatest weakness of laboratory experi-ments lies in their artificiality. Social processes thatoccur in a laboratory setting might not necessarilyoccur in more natural social settings. For example,an African-American history film might genuinelyreduce prejudice among a group of experimentalsubjects. This would not necessarily mean, however,that the same film shown in neighborhood movietheaters throughout the country would reduce prej-udice among the general public. Artificiality is notas much of a problem, of course, for natural experi-ments as for those conducted in the laboratory.

Main Points . 239

In discussing several of the sources of internaland external invalidity mentioned by Campbell,Stanley, and Cook, we saw that we can create ex-perimental designs that logically control such prob-lems. This possibility points to one of the great ad-vantages of experiments: They lend themselves toa logical rigor that is often much more difficult toachieve in other modes of observation.

MAIN POINTS

Experiments are an excellent vehicle for thecontrolled testing of causal processes.

The classical experiment tests the effect of anexperimental stimulus (the independent vari-able) on a dependent variable through thepretesting and posttesting of experimental andcontrol groups.

It is generally less important that a group ofexperimental subjects be representative ofsome larger population than that experimen-tal and control groups be similar to eachother.

A double-blind experiment guards against ex-perimenter bias because neither the experi-menter nor the subject knows which subjectsare in the control and experimental groups.

Probability sampling, randomization, andmatching are all methods of achieving compa-rability in the experimental and control groups.Randomization is the generally preferredmethod. In some designs, it can be combinedwith matching.

Campbell and Stanley describe three formsof preexperiments: the one-shot case study,the one-group pretest-posttest design, and thestatic-group comparison. None of these designsfeatures all the controls available in a trueexperiment.

Campbell and Stanley list, among others,12 sources of internal invalidity in experimen-tal design. The classical experiment with ran-dom assignment of subjects guards against eachof these 12 sources of internal invalidity.

I

Page 11: Babbie Ch8 Experiments

240 . Chapter 8: Experiments

• Experiments also face problems of external in-validity: Experimental findings may not reflectreal life.

The interaction of testing and stimulus is an ex-ample of external invalidity that the classicalexperiment does not guard against.

The Solomon four-group design and other vari-ations on the classical experiment can safeguardagainst external invalidity.

Campbell and Stanley suggest that, givenproper randomization in the assignment of sub-jects to the experimental and control groups,there is no need for pretesting in experiments.

Natural experiments often occur in the courseof social life in the real world, and social re-searchers can implement them in somewhatthe same way they would design and conductlaboratory experiments.

Like all research methods, experiments havestrengths and weaknesses. Their primary weak-ness is artificiality: What happens in an experi-ment may not reflect what happens in theoutside world. Their strengths include the isola-tion of the independent variable, which permitscausal inferences; the relative ease of replica-tion; and scientific rigor.

KEY TERMS:,

The following terms are defined in context in thechapter and at the bottom of the page where the termis introduced, as well as in the comprehensive glossaryat the back of the book.pretesting

randomizationposttesting

matchingexperimental group

internal invaliditycontrol group

external invaliditydouble-blind

experiment

REVIEW 4UESTIONS.AND-EXERCISES

1. In the library or on the Web, locate a research re-port of an experiment. Identify the dependentvariable and the stimulus.

2. Pick 6 of the 12 sources of internal invalidity dis-cussed in this chapter and make up examples (notdiscussed in the chapter) to illustrate each.

3. Create a hypothetical experimental design that il-lustrates one of the problems of external invalidity.

4. Think of a recent natural disaster you've wit-nessed or read about. Frame a research questionthat might be studied by treating that disaster as anatural experiment. In two or three paragraphs,outline how the study might be done.

5. In this chapter, we looked briefly at the problemof "placebo effects." On the Web, find a study inwhich the placebo effect figured importantly.Write a brief report on the study, including thesource of your information. (Hint: you might wantto do a search on "placebo.')

ADDITIONAL READINGS:,,,%

Campbell, Donald, and Julian Stanley. 1963. Experi-mental and Quasi-Experimental Designs for Research.Chicago: Rand McNally. An excellent analysis ofthe logic and methods of experimentation in so-cial research. This book is especially useful in itsapplication of the logic of experiments to othersocial research methods. Though fairly old, thisbook has attained the status of a classic and isstill frequently cited.

Cook, Thomas D., and Donald T. Campbell. 1979.Quasi-Experimentation: Design and Analysis Issuesfor Field Settings. Chicago: Rand McNally. An ex-panded and updated version of Campbell andStanley.

Jones, Stephen R. G. 1990. "Worker Independenceand Output: The Hawthorne Studies Reevalu-ated." American Sociological Review 55:176-90.This article reviews these classical studies andquestions the traditional interpretation (whichwas presented in this chapter).

Martin, David W. 1996. Doing Psychology Experiments.4th ed. Monterey, CA: Brooks/Cole. With thor-ough explanations of the logic behind researchmethods, often in a humorous style, this bookemphasizes ideas of particular importance to thebeginning researcher, such as getting an idea foran experiment or reviewing the literature.

Ray, William J. 2000. Methods toward a Science ofBe-havior and Experience. 6th ed. Belmont, CA:Wadsworth. A comprehensive examination ofsocial science research methods, with a special

emphasis on experimentation. This book is es-pecially strong in the philosophy of science.

RESOURCES ON THE INTERNET

VIRTUAL SOCIETY'S COMPANION WEB SITE FOR THEPRACTICE OF SOCIAL RESEARCH, 10TH EDITION

http://www.wadsworth.com/sociologyOnce at the Virtual Society, dick on "Find CompanionSites" from the left navigation bar, click on "ResearchMethods and Statistics," and then click on your bookcover. On the companion site, you will find usefullearning resources for your course. Some of those re-sources include Tutorial Quizzes with feedback, Inter-net Exercises, Flashcards, and Chapter Tutorials forevery chapter, as well as Extended Projects, Social Re-search in Cyberspace, and Primers for using variousdata analysis software such as SPSS and NVivo.

WEB LINKS FOR THIS CHAPTER

Please realize that the Internet is an evolvingentity, subject to change. Nevertheless, thesefew Web sites should be fairly stable.

Yahoo! Directory: Tests and Experimentshttp://dir.yahoo.com/Social_Science/Psychology/Research/Tests-and-Experiments/Here you'll find an extensive list of Web sites relatingto various kinds of social science experiments. Sometell you about past or ongoing experiments, and some

Resources on the Internet . 241

i nvite you to participate, either as a subject or as anexperimenter.

William D. Hacker Social Science ExperimentalLaboratoryhttp://ssel.caltech.edu/This laboratory at the California Institute of Technol-ogy gives students the opportunity to participate assubjects in experiments online-for pay.

Stanford Prison Experimenthttp://www.prisonexp.org/This Web site provides a slide show relating a famoussocial science experiment that reveals some of theproblems that can occur in this type of study.

I NFOTRAC COLLEGE EDITION

http://www.infotrac-college.com/wadsworth/access.html

Access the latest news and journal articles with Info-Trac College Edition, an easy-to-use online database ofreliable, full-length articles from hundreds of top aca-demic journals. Conduct an electronic search usingthe following terms:Double-blind experiment Experiment ANDExperiment AND

randomizationcontrol group

Experiment ANDExperiment AND

stimulusmatching

Mock juryExperiment AND

Natural experimentplacebo

Pretest AND posttest