Cu as i Experimental

8/17/2019 Cu as i Experimental


EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS FOR GENERALIZED CAUSAL INFERENCE

William R. Shadish
The University of Memphis

Thomas D. Cook
Northwestern University

Donald T. Campbell

Houghton Mifflin Company
Boston New York



Experiments and Generalized Causal Inference

Ex·per·i·ment (ĭk-spĕr′ə-mənt): [Middle English from Old French from Latin experimentum, from experiri, to try; see per- in Indo-European Roots.] n. Abbr. exp., expt. 1. a. A test under controlled conditions that is made to demonstrate a known truth, examine the validity of a hypothesis, or determine the efficacy of something previously untried. b. The process of conducting such a test; experimentation. 2. An innovative act or procedure: "Democracy is only an experiment in government" (William Ralph Inge).

Cause (kôz): [Middle English from Old French from Latin causa, reason, purpose.] n. 1. a. The producer of an effect, result, or consequence. b. The one, such as a person, an event, or a condition, that is responsible for an action or a result. v. 1. To be the cause of or reason for; result in. 2. To bring about or compel by authority or force.

To many historians and philosophers, the increased emphasis on experimentation in the 16th and 17th centuries marked the emergence of modern science from its roots in natural philosophy (Hacking, 1983). Drake (1981) cites Galileo's 1612 treatise Bodies That Stay Atop Water, or Move in It as ushering in modern experimental science, but earlier claims can be made favoring William Gilbert's 1600 study On the Loadstone and Magnetic Bodies, Leonardo da Vinci's (1452-1519) many investigations, and perhaps even the 5th-century B.C. philosopher Empedocles, who used various empirical demonstrations to argue against Parmenides (Jones, 1969a, 1969b). In the everyday sense of the term, humans have been experimenting with different ways of doing things from the earliest moments of their history. Such experimenting is as natural a part of our life as trying a new recipe or a different way of starting campfires.



However, the scientific revolution of the 17th century departed in three ways from the common use of observation in natural philosophy at that time. First, it increasingly used observation to correct errors in theory. Throughout history, natural philosophers often used observation in their theories, usually to win philosophical arguments by finding observations that supported their theories. However, they still subordinated the use of observation to the practice of deriving theories from "first principles," starting points that humans know to be true by our nature or by divine revelation (e.g., the assumed properties of the four basic elements of fire, water, earth, and air in Aristotelian natural philosophy). According to some accounts, this subordination of evidence to theory degenerated in the 17th century: "The Aristotelian principle of appealing to experience had degenerated among philosophers into dependence on reasoning supported by casual examples and the refutation of opponents by pointing to apparent exceptions not carefully examined" (Drake, 1981, p. xxi). When some 17th-century scholars then began to use observation to correct apparent errors in theoretical and religious first principles, they came into conflict with religious or philosophical authorities, as in the case of the Inquisition's demands that Galileo recant his account of the earth revolving around the sun. Given such hazards, the fact that the new experimental science tipped the balance toward observation and away from dogma is remarkable. By the time Galileo died, the role of systematic observation was firmly entrenched as a central feature of science, and it has remained so ever since (Harré, 1981).

Second, before the 17th century, appeals to experience were usually based on passive observation of ongoing systems rather than on observation of what happens after a system is deliberately changed. After the scientific revolution in the 17th century, the word experiment (terms in boldface in this book are defined in the Glossary) came to connote taking a deliberate action followed by systematic observation of what occurred afterward. As Hacking (1983) noted of Francis Bacon: "He taught that not only must we observe nature in the raw, but that we must also 'twist the lion's tail', that is, manipulate our world in order to learn its secrets" (p. 149). Although passive observation reveals much about the world, active manipulation is required to discover some of the world's regularities and possibilities (Greenwood, 1989). As a mundane example, stainless steel does not occur naturally; humans must manipulate it into existence. Experimental science came to be concerned with observing the effects of such manipulations.

Third, early experimenters realized the desirability of controlling extraneous influences that might limit or bias observation. So telescopes were carried to higher points at which the air was clearer, the glass for microscopes was ground ever more accurately, and scientists constructed laboratories in which it was possible to use walls to keep out potentially biasing ether waves and to use (eventually sterilized) test tubes to keep out dust or bacteria. At first, these controls were developed for astronomy, chemistry, and physics, the natural sciences in which interest in science first bloomed. But when scientists started to use experiments in areas such as public health or education, in which extraneous influences are harder to control (e.g., Lind, 1753), they found that the controls used in natural science in the laboratory worked poorly in these new applications. So they developed new methods of dealing with extraneous influence, such as random assignment (Fisher, 1925) or adding a nonrandomized control group (Coover & Angell, 1907). As theoretical and observational experience accumulated across these settings and topics, more sources of bias were identified and more methods were developed to cope with them (Dehue, 2000).

Today, the key feature common to all experiments is still to deliberately vary something so as to discover what happens to something else later, that is, to discover the effects of presumed causes. As laypersons we do this, for example, to assess what happens to our blood pressure if we exercise more, to our weight if we diet less, or to our behavior if we read a self-help book. However, scientific experimentation has developed increasingly specialized substance, language, and tools, including the practice of field experimentation in the social sciences that is the primary focus of this book.

This chapter begins to explore these matters by (1) discussing the nature of causation that experiments test, (2) explaining the specialized terminology (e.g., randomized experiments, quasi-experiments) that describes social experiments, (3) introducing the problem of how to generalize causal connections from individual experiments, and (4) briefly situating the experiment within a larger literature on the nature of science.

EXPERIMENTS AND CAUSATION

A sensible discussion of experiments requires both a vocabulary for talking about causation and an understanding of key concepts that underlie that vocabulary.

Defining Cause, Effect, and Causal Relationships

Most people intuitively recognize causal relationships in their daily lives. For instance, you may say that another automobile's hitting yours was a cause of the damage to your car; that the number of hours you spent studying was a cause of your test grades; or that the amount of food a friend eats was a cause of his weight. You may even point to more complicated causal relationships, noting that a low test grade was demoralizing, which reduced subsequent studying, which caused even lower grades. Here the same variable (low grade) can be both a cause and an effect, and there can be a reciprocal relationship between two variables (low grades and not studying) that cause each other.

Despite this intuitive familiarity with causal relationships, a precise definition of cause and effect has eluded philosophers for centuries.1 Indeed, the definitions of terms such as cause and effect depend partly on each other and on the causal relationship in which both are embedded. So the 17th-century philosopher John Locke said: "That which produces any simple or complex idea, we denote by the general name cause, and that which is produced, effect" (1975, p. 324) and also: "A cause is that which makes any other thing, either simple idea, substance, or mode, begin to be; and an effect is that, which had its beginning from some other thing" (p. 325). Since then, other philosophers and scientists have given us useful definitions of the three key ideas (cause, effect, and causal relationship) that are more specific and that better illuminate how experiments work. We would not defend any of these as the true or correct definition, given that the latter has eluded philosophers for millennia; but we do claim that these ideas help to clarify the scientific practice of probing causes.

1. Our analysis reflects the use of the word causation in ordinary language, not the more detailed discussions of cause by philosophers. Readers interested in such detail may consult a host of works that we reference in this chapter, including Cook and Campbell (1979).

Cause

Consider the cause of a forest fire. We know that fires start in different ways: a match tossed from a car, a lightning strike, or a smoldering campfire, for example. None of these causes is necessary because a forest fire can start even when, say, a match is not present. Also, none of them is sufficient to start the fire. After all, a match must stay "hot" long enough to start combustion; it must contact combustible material such as dry leaves; there must be oxygen for combustion to occur; and the weather must be dry enough so that the leaves are dry and the match is not doused by rain. So the match is part of a constellation of conditions without which a fire will not result, although some of these conditions can usually be taken for granted, such as the availability of oxygen. A lighted match is, therefore, what Mackie (1974) called an inus condition: "an insufficient but nonredundant part of an unnecessary but sufficient condition" (p. 62; italics in original). It is insufficient because a match cannot start a fire without the other conditions. It is nonredundant only if it adds something fire-promoting that is uniquely different from what the other factors in the constellation (e.g., oxygen, dry leaves) contribute to starting a fire; after all, it would be harder to say whether the match caused the fire if someone else simultaneously tried starting it with a cigarette lighter. It is part of a sufficient condition to start a fire in combination with the full constellation of factors. But that condition is not necessary because there are other sets of conditions that can also start fires.
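The four parts of the inus definition can be checked mechanically on a toy version of the forest-fire example. The sketch below is only an illustration under invented assumptions: the rule in fire_starts, the choice of four factors, and all names are hypothetical and are not taken from Mackie or from this chapter.

```python
from itertools import product

def fire_starts(match, lightning, dry_leaves, oxygen):
    """Toy rule (an assumption): any ignition source plus dry fuel plus oxygen."""
    return (match or lightning) and dry_leaves and oxygen

# Enumerate every possible world: all True/False combinations of the factors.
names = ("match", "lightning", "dry_leaves", "oxygen")
worlds = [dict(zip(names, values))
          for values in product([False, True], repeat=len(names))]

# Insufficient: in some world the match is lit but no fire results.
insufficient = any(w["match"] and not fire_starts(**w) for w in worlds)

# Sufficient constellation: match + dry leaves + oxygen always yields a fire.
constellation_sufficient = all(
    fire_starts(**w) for w in worlds
    if w["match"] and w["dry_leaves"] and w["oxygen"])

# Nonredundant: the rest of the constellation, with no ignition source, yields no fire.
nonredundant = not fire_starts(match=False, lightning=False,
                               dry_leaves=True, oxygen=True)

# Unnecessary: a fire can start in a world that contains no match at all.
constellation_unnecessary = any(
    fire_starts(**w) and not w["match"] for w in worlds)

is_inus = (insufficient and nonredundant
           and constellation_sufficient and constellation_unnecessary)
print(is_inus)  # True under this toy rule
```

In this toy world the match passes all four checks. Real causal constellations are rarely known this completely, which is why the text treats most causes as inus conditions whose surrounding factors are only partly understood.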

A research example of an inus condition concerns a new potential treatment for cancer. In the late 1990s, a team of researchers in Boston headed by Dr. Judah Folkman reported that a new drug called Endostatin shrank tumors by limiting their blood supply (Folkman, 1996). Other respected researchers could not replicate the effect even when using drugs shipped to them from Folkman's lab. Scientists eventually replicated the results after they had traveled to Folkman's lab to learn how to properly manufacture, transport, store, and handle the drug and how to inject it in the right location at the right depth and angle. One observer labeled these contingencies the "in-our-hands" phenomenon, meaning "even we don't know which details are important, so it might take you some time to work it out" (Rowe, 1999, p. 732). Endostatin was an inus condition. It was an insufficient cause by itself, and its effectiveness required it to be embedded in a larger set of conditions that were not even fully understood by the original investigators.

Most causes are more accurately called inus conditions. Many factors are usually required for an effect to occur, but we rarely know all of them and how they relate to each other. This is one reason that the causal relationships we discuss in this book are not deterministic but only increase the probability that an effect will occur (Eells, 1991; Holland, 1994). It also explains why a given causal relationship will occur under some conditions but not universally across time, space, human populations, or other kinds of treatments and outcomes that are more or less related to those studied. To different degrees, all causal relationships are context dependent, so the generalization of experimental effects is always at issue. That is why we return to such generalizations throughout this book.

Effect

We can better understand what an effect is through a counterfactual model that goes back at least to the 18th-century philosopher David Hume (Lewis, 1973, p. 556). A counterfactual is something that is contrary to fact. In an experiment, we observe what did happen when people received a treatment. The counterfactual is knowledge of what would have happened to those same people if they simultaneously had not received treatment. An effect is the difference between what did happen and what would have happened.

We cannot actually observe a counterfactual. Consider phenylketonuria (PKU), a genetically-based metabolic disease that causes mental retardation unless treated during the first few weeks of life. PKU is the absence of an enzyme that would otherwise prevent a buildup of phenylalanine, a substance toxic to the nervous system. When a restricted phenylalanine diet is begun early and maintained, retardation is prevented. In this example, the cause could be thought of as the underlying genetic defect, as the enzymatic disorder, or as the diet. Each implies a different counterfactual. For example, if we say that a restricted phenylalanine diet caused a decrease in PKU-based mental retardation in infants who are phenylketonuric at birth, the counterfactual is whatever would have happened had these same infants not received a restricted phenylalanine diet. The same logic applies to the genetic or enzymatic version of the cause. But it is impossible for these very same infants simultaneously to both have and not have the diet, the genetic disorder, or the enzyme deficiency.

So a central task for all cause-probing research is to create reasonable approximations to this physically impossible counterfactual. For instance, if it were ethical to do so, we might contrast phenylketonuric infants who were given the diet with other phenylketonuric infants who were not given the diet but who were similar in many ways to those who were (e.g., similar race, gender, age, socioeconomic status, health status). Or we might (if it were ethical) contrast infants who were not on the diet for the first 3 months of their lives with those same infants after they were put on the diet starting in the 4th month. Neither of these approximations is a true counterfactual. In the first case, the individual infants in the treatment condition are different from those in the comparison condition; in the second case, the identities are the same, but time has passed and many changes other than the treatment have occurred to the infants (including permanent damage done by phenylalanine during the first 3 months of life). So two central tasks in experimental design are creating a high-quality but necessarily imperfect source of counterfactual inference and understanding how this source differs from the treatment condition.
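The gap between the true counterfactual and an approximation to it can be made concrete with a small simulation. What follows is a sketch under invented assumptions: the outcome scale, the sample size, the constant additive effect of 5.0, and all variable names are hypothetical and are not drawn from the text. Only a simulation can record both outcomes for every unit; a real study sees one outcome per unit, and here random assignment (one of the methods the chapter mentions) supplies the comparison group that stands in for the counterfactual.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 5.0  # hypothetical treatment effect, chosen for illustration
N = 10_000         # hypothetical number of units

# Each simulated unit carries BOTH outcomes: without and with treatment.
units = []
for _ in range(N):
    y_without = random.gauss(100, 15)   # outcome if the unit is not treated
    y_with = y_without + TRUE_EFFECT    # outcome if the same unit is treated
    units.append((y_without, y_with))

# The effect as defined in the text: what did happen minus what would have happened.
true_effect = mean(y_with - y_without for y_without, y_with in units)

# In practice each unit reveals only one outcome. A coin flip decides which,
# and the untreated group approximates the unobservable counterfactual.
treated, control = [], []
for y_without, y_with in units:
    if random.random() < 0.5:
        treated.append(y_with)
    else:
        control.append(y_without)

estimated_effect = mean(treated) - mean(control)
print(f"true effect:      {true_effect:.2f}")
print(f"estimated effect: {estimated_effect:.2f}")
```

The estimate should land close to the true effect but not exactly on it, because the treated and control groups contain different individuals. That residual difference is the sense in which the comparison group is a high-quality but necessarily imperfect source of counterfactual inference.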

This counterfactual reasoning is fundamentally qualitative because causal inference, even in experiments, is fundamentally qualitative (Campbell, 1975; Shadish, 1995a; Shadish & Cook, 1999). However, some of these points have been formalized by statisticians into a special case that is sometimes called Rubin's Causal Model (Holland, 1986; Rubin, 1974, 1977, 1978, 1986). This book is not about statistics, so we do not describe that model in detail (West, Biesanz, & Pitts [2000] do so and relate it to the Campbell tradition). A primary emphasis of Rubin's model is the analysis of cause in experiments, and its basic premises are consistent with those of this book.2 Rubin's model has also been widely used to analyze causal inference in case-control studies in public health and medicine (Holland & Rubin, 1988), in path analysis in sociology (Holland, 1986), and in a paradox that Lord (1967) introduced into psychology (Holland & Rubin, 1983); and it has generated many statistical innovations that we cover later in this book. It is new enough that critiques of it are just now beginning to appear (e.g., Dawid, 2000; Pearl, 2000). What is clear, however, is that Rubin's is a very general model with obvious and subtle implications. Both it and the critiques of it are required material for advanced students and scholars of cause-probing methods.

2. However, Rubin's model is not intended to say much about the matters of causal generalization that we address in this book.

Causal Relationship

How do we know if cause and effect are related? In a classic analysis formalized by the 19th-century philosopher John Stuart Mill, a causal relationship exists if (1) the cause preceded the effect, (2) the cause was related to the effect, and (3) we can find no plausible alternative explanation for the effect other than the cause. These three characteristics mirror what happens in experiments in which (1) we manipulate the presumed cause and observe an outcome afterward; (2) we see whether variation in the cause is related to variation in the effect; and (3) we use various methods during the experiment to reduce the plausibility of other explanations for the effect, along with ancillary methods to explore the plausibility of those we cannot rule out (most of this book is about methods for doing this). Hence experiments are well-suited to studying causal relationships. No other scientific method regularly matches the characteristics of causal relationships so well. Mill's analysis also points to the weakness of other methods. In many correlational studies, for example, it is impossible to know which of two variables came first, so defending a causal relationship between them is precarious. Understanding this logic of causal relationships and how its key terms, such as cause and effect, are defined helps researchers to critique cause-probing studies.

Causation, Correlation, and Confounds

A well-known maxim in research is: Correlation does not prove causation. This is so because we may not know which variable came first nor whether alternative explanations for the presumed effect exist. For example, suppose income and education are correlated. Do you have to have a high income before you can afford to pay for education, or do you first have to get a good education before you can get a better paying job? Each possibility may be true, and so both need investigation. But until those investigations are completed and evaluated by the scholarly community, a simple correlation does not indicate which variable came first. Correlations also do little to rule out alternative explanations for a relationship between two variables such as education and income. That relationship may not be causal at all but rather due to a third variable (often called a confound), such as intelligence or family socioeconomic status, that causes both high education and high income. For example, if high intelligence causes success in education and on the job, then intelligent people would have correlated education and incomes, not because education causes income (or vice versa) but because both would be caused by intelligence. Thus a central task in the study of experiments is identifying the different kinds of confounds that can operate in a particular research area and understanding the strengths and weaknesses associated with various ways of dealing with them.
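The education and income example can be mimicked with simulated data to show how a confound alone produces a correlation. This is a sketch under invented assumptions: the standardized scales, the equal noise levels, the sample size, and the helper pearson are all hypothetical choices made for illustration. By construction, neither education nor income influences the other; both depend only on intelligence plus independent noise.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

def pearson(xs, ys):
    """Pearson correlation coefficient, written out to keep the sketch self-contained."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

N = 20_000  # hypothetical number of people
intelligence = [random.gauss(0, 1) for _ in range(N)]

# Education and income each depend on intelligence plus their own noise;
# there is no arrow from education to income or from income to education.
education = [iq + random.gauss(0, 1) for iq in intelligence]
income = [iq + random.gauss(0, 1) for iq in intelligence]

raw_r = pearson(education, income)

# Hold the confound nearly constant: keep only people of very similar intelligence.
stratum = [(e, m) for iq, e, m in zip(intelligence, education, income)
           if abs(iq) < 0.1]
stratum_r = pearson([e for e, _ in stratum], [m for _, m in stratum])

print(f"correlation overall:            {raw_r:.2f}")
print(f"correlation within one stratum: {stratum_r:.2f}")
```

In this setup the overall correlation should come out near 0.5 even though no causal link between the two variables exists, and it should fall to roughly zero once intelligence is held nearly constant. Stratifying on a measured confound is only one way of dealing with confounds, and it works here only because the simulation knows exactly what the confound is.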

Manipulable and Nonmanipulable Causes

In the intuitive understanding of experimentation that most people have, it makes sense to say, "Let's see what happens if we require welfare recipients to work"; but it makes no sense to say, "Let's see what happens if I change this adult male into a three-year-old girl." And so it is also in scientific experiments. Experiments explore the effects of things that can be manipulated, such as the dose of a medicine, the amount of a welfare check, the kind or amount of psychotherapy, or the number of children in a classroom. Nonmanipulable events (e.g., the explosion of a supernova) or attributes (e.g., people's ages, their raw genetic material, or their biological sex) cannot be causes in experiments because we cannot deliberately vary them to see what then happens. Consequently, most scientists and philosophers agree that it is much harder to discover the effects of nonmanipulable causes.



To be clear, we are not arguing that all causes must be manipulable, only that experimental causes must be so. Many variables that we correctly think of as causes are not directly manipulable. Thus it is well established that a genetic defect causes PKU even though that defect is not directly manipulable. We can investigate such causes indirectly in nonexperimental studies or even in experiments by manipulating biological processes that prevent the gene from exerting its influence, as through the use of diet to inhibit the gene's biological consequences. Both the nonmanipulable gene and the manipulable diet can be viewed as causes: both covary with PKU-based retardation, both precede the retardation, and it is possible to explore other explanations for the gene's and the diet's effects on cognitive functioning. However, investigating the manipulable diet as a cause has two important advantages over considering the nonmanipulable genetic problem as a cause. First, only the diet provides a direct action to solve the problem; and second, we will see that studying manipulable agents allows a higher quality source of counterfactual inference through such methods as random assignment. When individuals with the nonmanipulable genetic problem are compared with persons without it, the latter are likely to be different from the former in many ways other than the genetic defect. So the counterfactual inference about what would have happened to those with the PKU genetic defect is much more difficult to make.

Nonetheless, nonmanipulable causes should be studied using whatever means are available and seem useful. This is true because such causes eventually help us to find manipulable agents that can then be used to ameliorate the problem at hand. The PKU example illustrates this. Medical researchers did not discover how to treat PKU effectively by first trying different diets with retarded children. They first discovered the nonmanipulable biological features of retarded children affected with PKU, finding abnormally high levels of phenylalanine and its associated metabolic and genetic problems in those children. Those findings pointed in certain ameliorative directions and away from others, leading scientists to experiment with treatments they thought might be effective and practical. Thus the new diet resulted from a sequence of studies with different immediate purposes, with different forms, and with varying degrees of uncertainty reduction. Some were experimental, but others were not.

Further, analogue experiments can sometimes be done on nonmanipulable causes, that is, experiments that manipulate an agent that is similar to the cause of interest. Thus we cannot change a person's race, but we can chemically induce skin pigmentation changes in volunteer individuals, though such analogues do not match the reality of being Black every day and everywhere for an entire life. Similarly, past events, which are normally nonmanipulable, sometimes constitute a natural experiment that may even have been randomized, as when the 1970 Vietnam-era draft lottery was used to investigate a variety of outcomes (e.g., Angrist, Imbens, & Rubin, 1996a; Notz, Staw, & Cook, 1971).

Although experimenting on manipulable causes makes the job of discovering their effects easier, experiments are far from perfect means of investigating causes. Sometimes experiments modify the conditions in which testing occurs in a way that reduces the fit between those conditions and the situation to which the results are to be generalized. Also, knowledge of the effects of manipulable causes tells nothing about how and why those effects occur. Nor do experiments answer many other questions relevant to the real world: for example, which questions are worth asking, how strong the need for treatment is, how a cause is distributed through society, whether the treatment is implemented with theoretical fidelity, and what value should be attached to the experimental results.

In addition, in experiments, we first manipulate a treatment and only then observe its effects; but in some other studies we first observe an effect, such as AIDS, and then search for its cause, whether manipulable or not. Experiments cannot help us with that search. Scriven (1976) likens such searches to detective work in which a crime has been committed (e.g., a robbery), the detectives observe a particular pattern of evidence surrounding the crime (e.g., the robber wore a baseball cap and a distinct jacket and used a certain kind of gun), and then the detectives search for criminals whose known method of operating (their modus operandi or m.o.) includes this pattern. A criminal whose m.o. fits that pattern of evidence then becomes a suspect to be investigated further. Epidemiologists use a similar method, the case-control design (Ahlbom & Norell, 1990), in which they observe a particular health outcome (e.g., an increase in brain tumors) that is not seen in another group and then attempt to identify associated causes (e.g., increased cell phone use). Experiments do not aspire to answer all the kinds of questions, not even all the types of causal questions, that social scientists ask.

Causal Description and Causal Explanation

The unique strength of experimentation is in describing the consequences attributable to deliberately varying a treatment. We call this causal description. In contrast, experiments do less well in clarifying the mechanisms through which and the conditions under which that causal relationship holds, which is what we call causal explanation. For example, most children very quickly learn the descriptive causal relationship between flicking a light switch and obtaining illumination in a room. However, few children (or even adults) can fully explain why that light goes on. To do so, they would have to decompose the treatment (the act of flicking a light switch) into its causally efficacious features (e.g., closing an insulated circuit) and its nonessential features (e.g., whether the switch is thrown by hand or a motion detector). They would have to do the same for the effect (either incandescent or fluorescent light can be produced, but light will still be produced whether the light fixture is recessed or not). For full explanation, they would then have to show how the causally efficacious parts of the treatment influence the causally affected parts of the outcome through identified mediating processes (e.g., the passage of electricity through the circuit, the excitation of photons).3 Clearly, the cause of the light going on is a complex cluster of many factors. For those philosophers who equate cause with identifying that constellation of variables that necessarily, inevitably, and infallibly results in the effect (Beauchamp, 1974), talk of cause is not warranted until everything of relevance is known. For them, there is no causal description without causal explanation. Whatever the philosophic merits of their position, though, it is not practical to expect much current social science to achieve such complete explanation.

The practical importance of causal explanation is brought home when the switch fails to make the light go on and when replacing the light bulb (another easily learned manipulation) fails to solve the problem. Explanatory knowledge then offers clues about how to fix the problem, for example, by detecting and repairing a short circuit. Or if we wanted to create illumination in a place without lights and we had explanatory knowledge, we would know exactly which features of the cause-and-effect relationship are essential to create light and which are irrelevant. Our explanation might tell us that there must be a source of electricity but that that source could take several different molar forms, such as a battery, a generator, a windmill, or a solar array. There must also be a switch mechanism to close a circuit, but this could also take many forms, including the touching of two bare wires or even a motion detector that trips the switch when someone enters the room. So causal explanation is an important route to the generalization of causal descriptions because it tells us which features of the causal relationship are essential to transfer to other situations.

This benefit of causal explanation helps elucidate its priority and prestige in all sciences and helps explain why, once a novel and important causal relationship is discovered, the bulk of basic scientific effort turns toward explaining why and how it happens. Usually, this involves decomposing the cause into its causally effective parts, decomposing the effects into their causally affected parts, and identifying the processes through which the effective causal parts influence the causally affected outcome parts.

These examples also show the close parallel between descriptive and explanatory causation and molar and molecular causation.4 Descriptive causation usually concerns simple bivariate relationships between molar treatments and molar outcomes, molar here referring to a package that consists of many different parts. For instance, we may find that psychotherapy decreases depression, a simple descriptive causal relationship between a molar treatment package and a molar outcome. However, psychotherapy consists of such parts as verbal interactions, placebo-generating procedures, setting characteristics, time constraints, and payment for services. Similarly, many depression measures consist of items pertaining to the physiological, cognitive, and affective aspects of depression. Explanatory causation breaks these molar causes and effects into their molecular parts so as to learn, say, that the verbal interactions and the placebo features of therapy both cause changes in the cognitive symptoms of depression, but that payment for services does not do so even though it is part of the molar treatment package.

3. However, the full explanation a physicist would offer might be quite different from this electrician's explanation, perhaps invoking the behavior of subparticles. This difference indicates just how complicated is the notion of explanation and how it can quickly become quite complex once one shifts levels of analysis.

4. By molar, we mean something taken as a whole rather than in parts. An analogy is to physics, in which molar might refer to the properties or motions of masses, as distinguished from those of molecules or atoms that make up those masses.

If experiments are less able to provide this highly prized explanatory causal knowledge, why are experiments so central to science, especially to basic social science, in which theory and explanation are often the coin of the realm? The answer is that the dichotomy between descriptive and explanatory causation is less clear in scientific practice than in abstract discussions about causation. First, many causal explanations consist of chains of descriptive causal links in which one event causes the next. Experiments help to test the links in each chain. Second, experiments help distinguish between the validity of competing explanatory theories, for example, by testing competing mediating links proposed by those theories. Third, some experiments test whether a descriptive causal relationship varies in strength or direction under Condition A versus Condition B (then the condition is a moderator variable that explains the conditions under which the effect holds). Fourth, some experiments add quantitative or qualitative observations of the links in the explanatory chain (mediator variables) to generate and study explanations for the descriptive causal effect.
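
The third point, testing for a moderator, can be illustrated with simulated data. The sketch below is only an illustration under stated assumptions: the study is hypothetical, the true treatment effect is set by hand to 2.0 under Condition A and 0.5 under Condition B, and every name and number is invented rather than taken from any study discussed here.

```python
import random
from statistics import mean

def simulate_unit(rng, condition):
    """Return (treated, outcome) for one hypothetical unit."""
    treated = rng.random() < 0.5                      # coin-toss assignment
    true_effect = 2.0 if condition == "A" else 0.5    # assumed values, for illustration
    outcome = rng.gauss(10.0, 1.0) + (true_effect if treated else 0.0)
    return treated, outcome

def effect_by_condition(n_per_condition=5000, seed=0):
    """Estimate the treatment effect separately within each condition."""
    rng = random.Random(seed)
    effects = {}
    for condition in ("A", "B"):
        units = [simulate_unit(rng, condition) for _ in range(n_per_condition)]
        treated = [y for t, y in units if t]
        control = [y for t, y in units if not t]
        effects[condition] = mean(treated) - mean(control)
    return effects

effects = effect_by_condition()
print(effects)  # the estimate for "A" should be clearly larger than for "B"
```

If the two estimates differ by more than sampling error would allow, the condition is behaving as a moderator in the sense used above. A real analysis would also need a formal test of the difference, which this sketch omits.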

Experiments are also prized in applied areas of social science, in which the identification of practical solutions to social problems has as great or even greater priority than explanations of those solutions. After all, explanation is not always required for identifying practical solutions. Lewontin (1997) makes this point about the Human Genome Project, a coordinated multibillion-dollar research program to map the human genome that it is hoped eventually will clarify the genetic causes of diseases. Lewontin is skeptical about aspects of this search:

What is involved here is the difference between explanation and intervention. Many disorders can be explained by the failure of the organism to make a normal protein, a failure that is the consequence of a gene mutation. But intervention requires that the normal protein be provided at the right place in the right cells, at the right time and in the right amount, or else that an alternative way be found to provide normal cellular function. What is worse, it might even be necessary to keep the abnormal protein away from the cells at critical moments. None of these objectives is served by knowing the DNA sequence of the defective gene. (Lewontin, 1997, p. 29)

Practical applications are not immediately revealed by theoretical advance. Instead, to reveal them may take decades of follow-up work, including tests of simple descriptive causal relationships. The same point is illustrated by the cancer drug Endostatin, discussed earlier. Scientists knew the action of the drug occurred through cutting off tumor blood supplies; but to successfully use the drug to treat cancers in mice required administering it at the right place, angle, and depth, and those details were not part of the usual scientific explanation of the drug's effects.


In the end, then, causal descriptions and causal explanations are in delicate balance in experiments. What experiments do best is to improve causal descriptions; they do less well at explaining causal relationships. But most experiments can be designed to provide better explanations than is typically the case today. Further, in focusing on causal descriptions, experiments often investigate molar events that may be less strongly related to outcomes than are more molecular mediating processes, especially those processes that are closer to the outcome in the explanatory chain. However, many causal descriptions are still dependable and strong enough to be useful, to be worth making the building blocks around which important policies and theories are created. Just consider the dependability of such causal statements as that school desegregation causes white flight, or that outgroup threat causes ingroup cohesion, or that psychotherapy improves mental health, or that diet reduces the retardation due to PKU. Such dependable causal relationships are useful to policymakers, practitioners, and scientists alike.

MODERN DESCRIPTIONS OF EXPERIMENTS

Some of the terms used in describing modern experimentation (see Table 1.1) are unique, clearly defined, and consistently used; others are blurred and inconsistently used. The common attribute in all experiments is control of treatment (though control can take many different forms). So Mosteller (1990, p. 225) writes, "In an experiment the investigator controls the application of the treatment"; and Yaremko, Harari, Harrison, and Lynn (1986, p. 72) write, "one or more independent variables are manipulated to observe their effects on one or more dependent variables." However, over time many different experimental subtypes have developed in response to the needs and histories of different sciences (Winston, 1990; Winston & Blais, 1996).

TABLE 1.1 The Vocabulary of Experiments

Experiment: A study in which an intervention is deliberately introduced to observe its effects.

Randomized Experiment: An experiment in which units are assigned to receive the treatment or an alternative condition by a random process such as the toss of a coin or a table of random numbers.

Quasi-Experiment: An experiment in which units are not assigned to conditions randomly.

Natural Experiment: Not really an experiment because the cause usually cannot be manipulated; a study that contrasts a naturally occurring event such as an earthquake with a comparison condition.

Correlational Study: Usually synonymous with nonexperimental or observational study; a study that simply observes the size and direction of a relationship among variables.


Randomized Experiment

The most clearly described variant is the randomized experiment, widely credited to Sir Ronald Fisher (1925, 1926). It was first used in agriculture but later spread to other topic areas because it promised control over extraneous sources of variation without requiring the physical isolation of the laboratory. Its distinguishing feature is clear and important: that the various treatments being contrasted (including no treatment at all) are assigned to experimental units5 by chance, for example, by coin toss or use of a table of random numbers. If implemented correctly, random assignment creates two or more groups of units that are probabilistically similar to each other on the average.6 Hence, any outcome differences that are observed between those groups at the end of a study are likely to be due to treatment, not to differences between the groups that already existed at the start of the study. Further, when certain assumptions are met, the randomized experiment yields an estimate of the size of a treatment effect that has desirable statistical properties, along with estimates of the probability that the true effect falls within a defined confidence interval. These features of experiments are so highly prized that in a research area such as medicine the randomized experiment is often referred to as the gold standard for treatment outcome research.7
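
The logic of random assignment described above can be sketched in a short simulation. This is a minimal illustration, not a recommended analysis: the baseline scores, the true effect of 1.5, the sample size, and all function names are assumptions invented for the example, and the interval uses a simple normal approximation.

```python
import random
from math import sqrt
from statistics import mean, variance

def run_randomized_experiment(n=4000, true_effect=1.5, seed=1):
    """Assign hypothetical units to conditions by coin toss."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(50.0, 10.0)      # pre-existing differences among units
        in_treatment = rng.random() < 0.5     # the coin toss
        outcome = baseline + rng.gauss(0.0, 5.0) + (true_effect if in_treatment else 0.0)
        (treated if in_treatment else control).append((baseline, outcome))
    return treated, control

def summarize(treated, control):
    """Return the baseline gap, the effect estimate, and an approximate 95% interval."""
    base_gap = mean(b for b, _ in treated) - mean(b for b, _ in control)
    y_t = [y for _, y in treated]
    y_c = [y for _, y in control]
    estimate = mean(y_t) - mean(y_c)
    se = sqrt(variance(y_t) / len(y_t) + variance(y_c) / len(y_c))
    return base_gap, estimate, (estimate - 1.96 * se, estimate + 1.96 * se)

treated, control = run_randomized_experiment()
base_gap, estimate, interval = summarize(treated, control)
print(base_gap, estimate, interval)
```

The baseline gap is small relative to the spread of baseline scores because assignment ignored everything about the units, which is the sense in which the groups are probabilistically similar on the average. Any single run can still show some imbalance by chance.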

Closely related to the randomized experiment is a more ambiguous and inconsistently used term, true experiment. Some authors use it synonymously with randomized experiment (Rosenthal & Rosnow, 1991). Others use it more generally to refer to any study in which an independent variable is deliberately manipulated (Yaremko et al., 1986) and a dependent variable is assessed. We shall not use the term at all given its ambiguity and given that the modifier true seems to imply restricted claims to a single correct experimental method.

Quasi-Experiment

Much of this book focuses on a class of designs that Campbell and Stanley (1963) popularized as quasi-experiments.8 Quasi-experiments share with all other experiments a similar purpose (to test descriptive causal hypotheses about manipulable causes) as well as many structural details, such as the frequent presence of control groups and pretest measures, to support a counterfactual inference about what would have happened in the absence of treatment. But, by definition, quasi-experiments lack random assignment. Assignment to conditions is by means of self-selection, by which units choose treatment for themselves, or by means of administrator selection, by which teachers, bureaucrats, legislators, therapists, physicians, or others decide which persons should get which treatment. However, researchers who use quasi-experiments may still have considerable control over selecting and scheduling measures, over how nonrandom assignment is executed, over the kinds of comparison groups with which treatment groups are compared, and over some aspects of how treatment is scheduled. As Campbell and Stanley note:

There are many natural social settings in which the research person can introduce something like experimental design into his scheduling of data collection procedures (e.g., the when and to whom of measurement), even though he lacks the full control over the scheduling of experimental stimuli (the when and to whom of exposure and the ability to randomize exposures) which makes a true experiment possible. Collectively, such situations can be regarded as quasi-experimental designs. (Campbell & Stanley, 1963, p. 34)

5. Units can be people, animals, time periods, institutions, or almost anything else. Typically in field experimentation they are people or some aggregate of people, such as classrooms or work sites. In addition, a little thought shows that random assignment of units to treatments is the same as assignment of treatments to units, so these phrases are frequently used interchangeably.

6. The word probabilistically is crucial, as is explained in more detail in Chapter 8.

7. Although the term randomized experiment is used this way consistently across many fields and in this book, statisticians sometimes use the closely related term random experiment in a different way to indicate experiments for which the outcome cannot be predicted with certainty (e.g., Hogg & Tanis, 1988).

8. Campbell (1957) first called these compromise designs but changed terminology very quickly; Rosenbaum (1995a) and Cochran (1965) refer to these as observational studies, a term we avoid because many people use it to refer to correlational or nonexperimental studies, as well. Greenberg and Shroder (1997) use quasi-experiment to refer to studies that randomly assign groups (e.g., communities) to conditions, but we would consider these group-randomized experiments (Murray, 1998).

In quasi-experiments, the cause is manipulable and occurs before the effect is measured. However, quasi-experimental design features usually create less compelling support for counterfactual inferences. For example, quasi-experimental control groups may differ from the treatment condition in many systematic (nonrandom) ways other than the presence of the treatment. Many of these ways could be alternative explanations for the observed effect, and so researchers have to worry about ruling them out in order to get a more valid estimate of the treatment effect. By contrast, with random assignment the researcher does not have to think as much about all these alternative explanations. If correctly done, random assignment makes most of the alternatives less likely as causes of the observed treatment effect at the start of the study.

In quasi-experiments, the researcher has to enumerate alternative explanations one by one, decide which are plausible, and then use logic, design, and measurement to assess whether each one is operating in a way that might explain any observed effect. The difficulties are that these alternative explanations are never completely enumerable in advance, that some of them are particular to the context being studied, and that the methods needed to eliminate them from contention will vary from alternative to alternative and from study to study. For example, suppose two nonrandomly formed groups of children are studied, a volunteer treatment group that gets a new reading program and a control group of nonvolunteers who do not get it. If the treatment group does better, is it because of treatment or because the cognitive development of the volunteers was increasing more rapidly even before treatment began? (In a randomized experiment, maturation rates would have been probabilistically equal in both groups.) To assess this alternative, the researcher might add multiple pretests to reveal maturational trend before the treatment, and then compare that trend with the trend after treatment.
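
The multiple-pretest idea can be made concrete with a small worked example. The sketch below assumes a simple linear maturation trend and uses invented group means for a hypothetical reading test; the function names are illustrative, and a real analysis would work with individual scores and uncertainty estimates rather than bare means.

```python
def fit_line(times, values):
    """Ordinary least-squares slope and intercept for one series of means."""
    n = len(times)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    slope = num / den
    return slope, v_bar - slope * t_bar

def departure_from_trend(pre_times, pre_means, post_time, post_mean):
    """Compare the observed posttest mean with the pretest trend projected forward."""
    slope, intercept = fit_line(pre_times, pre_means)
    projected = intercept + slope * post_time
    return slope, post_mean - projected

# Hypothetical group means at three pretests (waves 0-2) and one posttest (wave 3).
pre_times = [0, 1, 2]
vol_slope, vol_departure = departure_from_trend(pre_times, [40.0, 44.0, 48.0], 3, 57.0)
non_slope, non_departure = departure_from_trend(pre_times, [40.0, 42.0, 44.0], 3, 46.0)
print(vol_slope, vol_departure)   # pretest growth rate and gain beyond trend, volunteers
print(non_slope, non_departure)   # the same two quantities for nonvolunteers
```

With these invented numbers the volunteers were already gaining 4 points per wave against 2 for the nonvolunteers, so the raw posttest gap of 11 points shrinks to a 5-point departure from the volunteers' own pretest trend. The adjustment is only as credible as the assumption that maturation would have continued linearly.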

Another alternative explanation might be that the nonrandom control group included more disadvantaged children who had less access to books in their homes or who had parents who read to them less often. (In a randomized experiment, both groups would have had similar proportions of such children.) To assess this alternative, the experimenter may measure the number of books at home, parental time spent reading to children, and perhaps trips to libraries. Then the researcher would see if these variables differed across treatment and control groups in the hypothesized direction that could explain the observed treatment effect. Obviously, as the number of plausible alternative explanations increases, the design of the quasi-experiment becomes more intellectually demanding and complex, especially because we are never certain we have identified all the alternative explanations. The efforts of the quasi-experimenter start to look like attempts to bandage a wound that would have been less severe if random assignment had been used initially.

The ruling out of alternative hypotheses is closely related to a falsificationist logic popularized by Popper (1959). Popper noted how hard it is to be sure that a general conclusion (e.g., all swans are white) is correct based on a limited set of observations (e.g., all the swans I've seen were white). After all, future observations may change (e.g., some day I may see a black swan). So confirmation is logically difficult. By contrast, observing a disconfirming instance (e.g., a black swan) is sufficient, in Popper's view, to falsify the general conclusion that all swans are white. Accordingly, Popper urged scientists to try deliberately to falsify the conclusions they wish to draw rather than only to seek information corroborating them. Conclusions that withstand falsification are retained in scientific books or journals and treated as plausible until better evidence comes along. Quasi-experimentation is falsificationist in that it requires experimenters to identify a causal claim and then to generate and examine plausible alternative explanations that might falsify the claim.

However, such falsification can never be as definitive as Popper hoped. Kuhn (1962) pointed out that falsification depends on two assumptions that can never be fully tested. The first is that the causal claim is perfectly specified. But that is never the case. So many features of both the claim and the test of the claim are debatable: for example, which outcome is of interest, how it is measured, the conditions of treatment, who needs treatment, and all the many other decisions that researchers must make in testing causal relationships. As a result, disconfirmation often leads theorists to respecify part of their causal theories. For example, they might now specify novel conditions that must hold for their theory to be true and that were derived from the apparently disconfirming observations. Second, falsification requires measures that are perfectly valid reflections of the theory being tested. However, most philosophers maintain that all observation is theory-laden. It is laden both with intellectual nuances specific to the partially


Nonexperimental Designs

The terms correlational design, passive observational design, and nonexperimental design refer to situations in which a presumed cause and effect are identified and measured but in which other structural features of experiments are missing. Random assignment is not part of the design, nor are such design elements as pretests and control groups from which researchers might construct a useful counterfactual inference. Instead, reliance is placed on measuring alternative explanations individually and then statistically controlling for them. In cross-sectional studies in which all the data are gathered on the respondents at one time, the researcher may not even know if the cause precedes the effect. When these studies are used for causal purposes, the missing design features can be problematic unless much is already known about which alternative interpretations are plausible, unless those that are plausible can be validly measured, and unless the substantive model used for statistical adjustment is well-specified. These are difficult conditions to meet in the real world of research practice, and therefore many commentators doubt the potential of such designs to support strong causal inferences in most cases.
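
What "statistically controlling" for a measured alternative explanation means can be sketched with stratification, one simple form of adjustment. The example is deliberately favorable and entirely hypothetical: there is exactly one confounder (whether a household has many books), it is measured without error, the true effect is fixed at 1.0, and all names and numbers are invented.

```python
import random
from statistics import mean

def simulate_observational(n=20000, true_effect=1.0, seed=2):
    """Generate hypothetical nonexperimental data with one measured confounder."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        many_books = rng.random() < 0.5
        # Self-selection: book-rich households take up the program more often.
        treated = rng.random() < (0.8 if many_books else 0.2)
        outcome = ((3.0 if many_books else 0.0)
                   + (true_effect if treated else 0.0)
                   + rng.gauss(0.0, 1.0))
        rows.append((many_books, treated, outcome))
    return rows

def naive_difference(rows):
    """Treated mean minus untreated mean, ignoring the confounder."""
    return (mean(y for _, t, y in rows if t)
            - mean(y for _, t, y in rows if not t))

def stratified_difference(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    diffs, weights = [], []
    for level in (True, False):
        stratum = [r for r in rows if r[0] == level]
        diffs.append(naive_difference(stratum))
        weights.append(len(stratum))
    return sum(d * w for d, w in zip(diffs, weights)) / sum(weights)

rows = simulate_observational()
naive = naive_difference(rows)
adjusted = stratified_difference(rows)
print(naive, adjusted)  # the naive gap is inflated; the adjusted one is near 1.0
```

The adjustment recovers the effect here only because the single confounder is known, validly measured, and correctly modeled. Those are the conditions the paragraph above describes as difficult to meet in practice.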

EXPERIMENTS AND THE GENERALIZATION OF CAUSAL CONNECTIONS

The strength of experimentation is its ability to illuminate causal inference. The weakness of experimentation is doubt about the extent to which that causal relationship generalizes. We hope that an innovative feature of this book is its focus on generalization. Here we introduce the general issues that are expanded in later chapters.

Most Experiments Are Highly Local But Have General Aspirations

Most experiments are highly localized and particularistic. They are almost always conducted in a restricted range of settings, often just one, with a particular version of one type of treatment rather than, say, a sample of all possible versions. Usually they have several measures, each with theoretical assumptions that are different from those present in other measures, but far from a complete set of all possible measures. Each experiment nearly always uses a convenient sample of people rather than one that reflects a well-described population; and it will inevitably be conducted at a particular point in time that rapidly becomes history.

Yet readers of experimental results are rarely concerned with what happened in that particular, past, local study. Rather, they usually aim to learn either about theoretical constructs of interest or about a larger policy. Theorists often want to connect experimental results to theories with broad conceptual applicability, which requires generalization at the linguistic level of constructs rather than at the level of the operations used to represent these constructs in a given experiment. They nearly always want to generalize to more people and settings than are represented in a single experiment. Indeed, the value assigned to a substantive theory usually depends on how broad a range of phenomena the theory covers. Similarly, policymakers may be interested in whether a causal relationship would hold (probabilistically) across the many sites at which it would be implemented as a policy, an inference that requires generalization beyond the original experimental study context. Indeed, all human beings probably value the perceptual and cognitive stability that is fostered by generalizations. Otherwise, the world might appear as a buzzing cacophony of isolated instances requiring constant cognitive processing that would overwhelm our limited capacities.

In defining generalization as a problem, we do not assume that more broadly applicable results are always more desirable (Greenwood, 1989). For example, physicists who use particle accelerators to discover new elements may not expect that it would be desirable to introduce such elements into the world. Similarly, social scientists sometimes aim to demonstrate that an effect is possible and to understand its mechanisms without expecting that the effect can be produced more generally. For instance, when a "sleeper effect" occurs in an attitude change study involving persuasive communications, the implication is that change is manifest after a time delay but not immediately so. The circumstances under which this effect occurs turn out to be quite limited and unlikely to be of any general interest other than to show that the theory predicting it (and many other ancillary theories) may not be wrong (Cook, Gruder, Hennigan & Flay, 1979). Experiments that demonstrate limited generalization may be just as valuable as those that demonstrate broad generalization.

Nonetheless, a conflict seems to exist between the localized nature of the causal knowledge that individual experiments provide and the more generalized causal goals that research aspires to attain. Cronbach and his colleagues (Cronbach et al., 1980; Cronbach, 1982) have made this argument most forcefully and their works have contributed much to our thinking about causal generalization. Cronbach noted that each experiment consists of units that receive the experiences being contrasted, of the treatments themselves, of observations made on the units, and of the settings in which the study is conducted. Taking the first letter from each of these four words, he defined the acronym utos to refer to the "instances on which data are collected" (Cronbach, 1982, p. 78), that is, to the actual people, treatments, measures, and settings that were sampled in the experiment. He then defined two problems of generalization: (1) generalizing to the "domain about which [the] question is asked" (p. 79), which he called UTOS; and (2) generalizing to "units, treatments, variables, and settings not directly observed" (p. 83), which he called *UTOS.9

9. We oversimplify Cronbach's presentation here for pedagogical reasons. For example, Cronbach only used capital S, not small s, so that his system referred only to utoS, not utos. He offered diverse and not always consistent definitions of UTOS and *UTOS, in particular. And he does not use the word generalization in the same broad way we do here.


Our theory of causal generalization, outlined below and presented in more detail in Chapters 11 through 13, melds Cronbach's thinking with our own ideas about generalization from previous works (Cook, 1990, 1991; Cook & Campbell, 1979), creating a theory that is different in modest ways from both of these predecessors. Our theory is influenced by Cronbach's work in two ways. First, we follow him by describing experiments consistently throughout this book as consisting of the elements of units, treatments, observations, and settings,10 though we frequently substitute persons for units given that most field experimentation is conducted with humans as participants. We also often substitute outcome for observations given the centrality of observations about outcome when examining causal relationships. Second, we acknowledge that researchers are often interested in two kinds of generalization about each of these five elements, and that these two types are inspired by, but not identical to, the two kinds of generalization that Cronbach defined. We call these construct validity generalizations (inferences about the constructs that research operations represent) and external validity generalizations (inferences about whether the causal relationship holds over variation in persons, settings, treatment, and measurement variables).

Construct Validity: Causal Generalization as Representation

The first causal generalization problem concerns how to go from the particular units, treatments, observations, and settings on which data are collected to the higher order constructs these instances represent. These constructs are almost always couched in terms that are more abstract than the particular instances sampled in an experiment. The labels may pertain to the individual elements of the experiment (e.g., is the outcome measured by a given test best described as intelligence or as achievement?). Or the labels may pertain to the nature of relationships among elements, including causal relationships, as when cancer treatments are classified as cytotoxic or cytostatic depending on whether they kill tumor cells directly or delay tumor growth by modulating their environment.

Consider a randomized experiment by Fortin and Kirouac (1976). The treatment was a brief educational course administered by several nurses, who gave a tour of their hospital and covered some basic facts about surgery with individuals who were to have elective abdominal or thoracic surgery 15 to 20 days later in a single Montreal hospital. Ten specific outcome measures were used after the surgery, such as an activities of daily living scale and a count of the analgesics used to control pain. Now compare this study
