Cu as i Experimental


  • 8/17/2019 Cu as i Experimental

    1/81

    EXPERIMENTAL AND QUASI-EXPERIMENTAL DESIGNS FOR GENERALIZED CAUSAL INFERENCE

    William R. Shadish
    The University of Memphis

    Thomas D. Cook
    Northwestern University

    Donald T. Campbell

    Houghton Mifflin Company
    Boston New York


    Experiments and Generalized Causal Inference

    Ex·per·i·ment (ĭk-spĕr′ə-mənt): [Middle English from Old French from Latin experimentum, from experiri, to try; see per- in Indo-European Roots.] n. Abbr. exp., expt. 1. a. A test under controlled conditions that is made to demonstrate a known truth, examine the validity of a hypothesis, or determine the efficacy of something previously untried. b. The process of conducting such a test; experimentation. 2. An innovative act or procedure: "Democracy is only an experiment in government" (William Ralph Inge).

    Cause (kôz): [Middle English from Old French from Latin causa, reason, purpose.] n. 1. a. The producer of an effect, result, or consequence. b. The one, such as a person, an event, or a condition, that is responsible for an action or a result. v. 1. To be the cause of or reason for; result in. 2. To bring about or compel by authority or force.

    To many historians and philosophers, the increased emphasis on experimentation in the 16th and 17th centuries marked the emergence of modern science from its roots in natural philosophy (Hacking, 1983). Drake (1981) cites Galileo's 1612 treatise Bodies That Stay Atop Water, or Move in It as ushering in modern experimental science, but earlier claims can be made favoring William Gilbert's 1600 study On the Loadstone and Magnetic Bodies, Leonardo da Vinci's (1452-1519) many investigations, and perhaps even the 5th-century B.C. philosopher Empedocles, who used various empirical demonstrations to argue against Parmenides (Jones, 1969a, 1969b). In the everyday sense of the term, humans have been experimenting with different ways of doing things from the earliest moments of their history. Such experimenting is as natural a part of our life as trying a new recipe or a different way of starting campfires.


    However, the scientific revolution of the 17th century departed in three ways from the common use of observation in natural philosophy at that time. First, it increasingly used observation to correct errors in theory. Throughout history, natural philosophers often used observation in their theories, usually to win philosophical arguments by finding observations that supported their theories. However, they still subordinated the use of observation to the practice of deriving theories from "first principles," starting points that humans know to be true by our nature or by divine revelation (e.g., the assumed properties of the four basic elements of fire, water, earth, and air in Aristotelian natural philosophy). According to some accounts, this subordination of evidence to theory degenerated in the 17th century: "The Aristotelian principle of appealing to experience had degenerated among philosophers into dependence on reasoning supported by casual examples and the refutation of opponents by pointing to apparent exceptions not carefully examined" (Drake, 1981, p. xxi). When some 17th-century scholars then began to use observation to correct apparent errors in theoretical and religious first principles, they came into conflict with religious or philosophical authorities, as in the case of the Inquisition's demands that Galileo recant his account of the earth revolving around the sun. Given such hazards, the fact that the new experimental science tipped the balance toward observation and away from dogma is remarkable. By the time Galileo died, the role of systematic observation was firmly entrenched as a central feature of science, and it has remained so ever since (Harré, 1981).

    Second, before the 17th century, appeals to experience were usually based on passive observation of ongoing systems rather than on observation of what happens after a system is deliberately changed. After the scientific revolution in the 17th century, the word experiment (terms in boldface in this book are defined in the Glossary) came to connote taking a deliberate action followed by systematic observation of what occurred afterward. As Hacking (1983) noted of Francis Bacon: "He taught that not only must we observe nature in the raw, but that we must also 'twist the lion's tale', that is, manipulate our world in order to learn its secrets" (p. 149). Although passive observation reveals much about the world, active manipulation is required to discover some of the world's regularities and possibilities (Greenwood, 1989). As a mundane example, stainless steel does not occur naturally; humans must manipulate it into existence. Experimental science came to be concerned with observing the effects of such manipulations.

    Third, early experimenters realized the desirability of controlling extraneous influences that might limit or bias observation. So telescopes were carried to higher points at which the air was clearer, the glass for microscopes was ground ever more accurately, and scientists constructed laboratories in which it was possible to use walls to keep out potentially biasing ether waves and to use (eventually sterilized) test tubes to keep out dust or bacteria. At first, these controls were developed for astronomy, chemistry, and physics, the natural sciences in which interest in science first bloomed. But when scientists started to use experiments in areas such as public health or education, in which extraneous influences are harder to control (e.g., Lind, 1753), they found that the controls used in natural science in the laboratory worked poorly in these new applications. So they developed new methods of dealing with extraneous influence, such as random assignment (Fisher, 1925) or adding a nonrandomized control group (Coover & Angell, 1907). As theoretical and observational experience accumulated across these settings and topics, more sources of bias were identified and more methods were developed to cope with them (Dehue, 2000).

    Today, the key feature common to all experiments is still to deliberately vary something so as to discover what happens to something else later, that is, to discover the effects of presumed causes. As laypersons we do this, for example, to assess what happens to our blood pressure if we exercise more, to our weight if we diet less, or to our behavior if we read a self-help book. However, scientific experimentation has developed increasingly specialized substance, language, and tools, including the practice of field experimentation in the social sciences that is the primary focus of this book. This chapter begins to explore these matters by (1) discussing the nature of causation that experiments test, (2) explaining the specialized terminology (e.g., randomized experiments, quasi-experiments) that describes social experiments, (3) introducing the problem of how to generalize causal connections from individual experiments, and (4) briefly situating the experiment within a larger literature on the nature of science.

    EXPERIMENTS AND CAUSATION

    A sensible discussion of experiments requires both a vocabulary for talking about causation and an understanding of key concepts that underlie that vocabulary.

    Defining Cause, Effect, and Causal Relationships

    Most people intuitively recognize causal relationships in their daily lives. For instance, you may say that another automobile's hitting yours was a cause of the damage to your car; that the number of hours you spent studying was a cause of your test grades; or that the amount of food a friend eats was a cause of his weight. You may even point to more complicated causal relationships, noting that a low test grade was demoralizing, which reduced subsequent studying, which caused even lower grades. Here the same variable (low grade) can be both a cause and an effect, and there can be a reciprocal relationship between two variables (low grades and not studying) that cause each other.

    Despite this intuitive familiarity with causal relationships, a precise definition of cause and effect has eluded philosophers for centuries.[1] Indeed, the definitions of terms such as cause and effect depend partly on each other and on the causal relationship in which both are embedded. So the 17th-century philosopher John Locke said: "That which produces any simple or complex idea, we denote by the general name cause, and that which is produced, effect" (1975, p. 324) and also: "A cause is that which makes any other thing, either simple idea, substance, or mode, begin to be; and an effect is that, which had its beginning from some other thing" (p. 325). Since then, other philosophers and scientists have given us useful definitions of the three key ideas (cause, effect, and causal relationship) that are more specific and that better illuminate how experiments work. We would not defend any of these as the true or correct definition, given that the latter has eluded philosophers for millennia; but we do claim that these ideas help to clarify the scientific practice of probing causes.

    [1] Our analysis reflects the use of the word causation in ordinary language, not the more detailed discussions of cause by philosophers. Readers interested in such detail may consult a host of works that we reference in this chapter, including Cook and Campbell (1979).

    Cause

    Consider the cause of a forest fire. We know that fires start in different ways: a match tossed from a car, a lightning strike, or a smoldering campfire, for example. None of these causes is necessary because a forest fire can start even when, say, a match is not present. Also, none of them is sufficient to start the fire. After all, a match must stay "hot" long enough to start combustion; it must contact combustible material such as dry leaves; there must be oxygen for combustion to occur; and the weather must be dry enough so that the leaves are dry and the match is not doused by rain. So the match is part of a constellation of conditions without which a fire will not result, although some of these conditions can be usually taken for granted, such as the availability of oxygen. A lighted match is, therefore, what Mackie (1974) called an inus condition: "an insufficient but nonredundant part of an unnecessary but sufficient condition" (p. 62; italics in original). It is insufficient because a match cannot start a fire without the other conditions. It is nonredundant only if it adds something fire-promoting that is uniquely different from what the other factors in the constellation (e.g., oxygen, dry leaves) contribute to starting a fire; after all, it would be harder to say whether the match caused the fire if someone else simultaneously tried starting it with a cigarette lighter. It is part of a sufficient condition to start a fire in combination with the full constellation of factors. But that condition is not necessary because there are other sets of conditions that can also start fires.
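
    The four parts of the inus definition can be checked mechanically once the sufficient constellations are written down. The sketch below is only an illustration under stated assumptions: the factor names, the two constellations in FIRE_SETS, and the helper functions fire_occurs and is_inus are hypothetical and do not come from Mackie or from this chapter.

```python
# Hypothetical constellations: each frozenset is assumed to be jointly
# sufficient for a forest fire. Real fires involve many more factors.
FIRE_SETS = [
    frozenset({"lit_match", "dry_leaves", "oxygen", "dry_weather"}),
    frozenset({"lightning", "dry_leaves", "oxygen", "dry_weather"}),
]


def fire_occurs(present, sufficient_sets):
    """A fire occurs if every factor of at least one sufficient set is present."""
    return any(s <= present for s in sufficient_sets)


def is_inus(factor, sufficient_sets):
    """Check the four parts of the inus definition for one factor."""
    for s in sufficient_sets:
        if factor not in s:
            continue
        # Insufficient: the factor alone does not produce the effect.
        insufficient = not fire_occurs(frozenset({factor}), sufficient_sets)
        # Nonredundant: removing the factor from its set prevents the effect.
        nonredundant = not fire_occurs(s - {factor}, sufficient_sets)
        # Unnecessary: some other sufficient set works without this whole set.
        unnecessary = any(not s <= other for other in sufficient_sets)
        # Sufficient: s is sufficient by construction (it is in the list).
        if insufficient and nonredundant and unnecessary:
            return True
    return False


print(is_inus("lit_match", FIRE_SETS))
```

    If a second set containing only a cigarette lighter and dry leaves were added next to a set containing the match, the lighter, and dry leaves, the match would fail the nonredundancy check, which mirrors the cigarette lighter example in the text.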

    A research example of an inus condition concerns a new potential treatment for cancer. In the late 1990s, a team of researchers in Boston headed by Dr. Judah Folkman reported that a new drug called Endostatin shrank tumors by limiting their blood supply (Folkman, 1996). Other respected researchers could not replicate the effect even when using drugs shipped to them from Folkman's lab. Scientists eventually replicated the results after they had traveled to Folkman's lab to learn how to properly manufacture, transport, store, and handle the drug and how to inject it in the right location at the right depth and angle. One observer labeled these contingencies the "in-our-hands" phenomenon, meaning "even we don't know which details are important, so it might take you some time to work it out" (Rowe, 1999, p. 732). Endostatin was an inus condition. It was insufficient cause by itself, and its effectiveness required it to be embedded in a larger set of conditions that were not even fully understood by the original investigators.

    Most causes are more accurately called inus conditions. Many factors are usually required for an effect to occur, but we rarely know all of them and how they relate to each other. This is one reason that the causal relationships we discuss in this book are not deterministic but only increase the probability that an effect will occur (Eells, 1991; Holland, 1994). It also explains why a given causal relationship will occur under some conditions but not universally across time, space, human populations, or other kinds of treatments and outcomes that are more or less related to those studied. To different degrees, all causal relationships are context dependent, so the generalization of experimental effects is always at issue. That is why we return to such generalizations throughout this book.

    Effect

    We can better understand what an effect is through a counterfactual model that goes back at least to the 18th-century philosopher David Hume (Lewis, 1973, p. 556). A counterfactual is something that is contrary to fact. In an experiment, we observe what did happen when people received a treatment. The counterfactual is knowledge of what would have happened to those same people if they simultaneously had not received treatment. An effect is the difference between what did happen and what would have happened.
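
    The verbal definition of an effect can be written compactly. The notation below is an assumption introduced here for illustration (the chapter itself uses no symbols): Y_i(treated) stands for what happens to person i with the treatment, and Y_i(not treated) for what would have happened to that same person without it.

```latex
\[
\text{effect}_i \;=\; Y_i(\text{treated}) \;-\; Y_i(\text{not treated})
\]
```

    For any one person only one of the two terms on the right can be observed, so the difference can never be computed directly.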

    We cannot actually observe a counterfactual. Consider phenylketonuria (PKU), a genetically-based metabolic disease that causes mental retardation unless treated during the first few weeks of life. PKU is the absence of an enzyme that would otherwise prevent a buildup of phenylalanine, a substance toxic to the nervous system. When a restricted phenylalanine diet is begun early and maintained, retardation is prevented. In this example, the cause could be thought of as the underlying genetic defect, as the enzymatic disorder, or as the diet. Each implies a different counterfactual. For example, if we say that a restricted phenylalanine diet caused a decrease in PKU-based mental retardation in infants who are phenylketonuric at birth, the counterfactual is whatever would have happened had these same infants not received a restricted phenylalanine diet. The same logic applies to the genetic or enzymatic version of the cause. But it is impossible for these very same infants simultaneously to both have and not have the diet, the genetic disorder, or the enzyme deficiency.

    So a central task for all cause-probing research is to create reasonable approximations to this physically impossible counterfactual. For instance, if it were ethical to do so, we might contrast phenylketonuric infants who were given the diet with other phenylketonuric infants who were not given the diet but who were similar in many ways to those who were (e.g., similar race, gender, age, socioeconomic status, health status). Or we might (if it were ethical) contrast infants who were not on the diet for the first 3 months of their lives with those same infants after they were put on the diet starting in the 4th month. Neither of these approximations is a true counterfactual. In the first case, the individual infants in the treatment condition are different from those in the comparison condition; in the second case, the identities are the same, but time has passed and many changes other than the treatment have occurred to the infants (including permanent damage done by phenylalanine during the first 3 months of life). So two central tasks in experimental design are creating a high-quality but necessarily imperfect source of counterfactual inference and understanding how this source differs from the treatment condition.
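
    A small simulation can make the idea of an approximate counterfactual concrete. The sketch below is a hedged illustration, not a method from this chapter: the outcome scale, the constant effect of 5 points, the sample size, and the function name simulate are all assumptions chosen for convenience. It uses random assignment, mentioned earlier in the chapter, to build a comparison group that stands in for the unobservable counterfactual.

```python
import random
from statistics import mean


def simulate(n=10_000, true_effect=5.0, seed=0):
    """Estimate an effect from a comparison group built by random assignment."""
    rng = random.Random(seed)
    treated_obs, control_obs = [], []
    for _ in range(n):
        y0 = rng.gauss(50.0, 10.0)   # outcome if this unit is NOT treated
        y1 = y0 + true_effect        # outcome if this unit IS treated
        # Only one of the two outcomes is ever observed for a given unit;
        # the other one is that unit's counterfactual.
        if rng.random() < 0.5:
            treated_obs.append(y1)
        else:
            control_obs.append(y0)
    # The control group's mean stands in for what the treated units
    # would have shown without treatment.
    return mean(treated_obs) - mean(control_obs)


estimate = simulate()
print(f"estimated effect: {estimate:.2f} (assumed true effect: 5.0)")
```

    The estimate is close to the assumed effect but not equal to it, because the comparison group consists of different units than the treated group. That gap is the sense in which the source of counterfactual inference is necessarily imperfect.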

    This counterfactual reasoning is fundamentally qualitative because causal inference, even in experiments, is fundamentally qualitative (Campbell, 1975; Shadish, 1995a; Shadish & Cook, 1999). However, some of these points have been formalized by statisticians into a special case that is sometimes called Rubin's Causal Model (Holland, 1986; Rubin, 1974, 1977, 1978, 1986). This book is not about statistics, so we do not describe that model in detail (West, Biesanz, & Pitts [2000] do so and relate it to the Campbell tradition). A primary emphasis of Rubin's model is the analysis of cause in experiments, and its basic premises are consistent with those of this book.[2] Rubin's model has also been widely used to analyze causal inference in case-control studies in public health and medicine (Holland & Rubin, 1988), in path analysis in sociology (Holland, 1986), and in a paradox that Lord (1967) introduced into psychology (Holland & Rubin, 1983); and it has generated many statistical innovations that we cover later in this book. It is new enough that critiques of it are just now beginning to appear (e.g., Dawid, 2000; Pearl, 2000). What is clear, however, is that Rubin's is a very general model with obvious and subtle implications. Both it and the critiques of it are required material for advanced students and scholars of cause-probing methods.

    [2] However, Rubin's model is not intended to say much about the matters of causal generalization that we address in this book.

    Causal Relationship

    How do we know if cause and effect are related? In a classic analysis formalized by the 19th-century philosopher John Stuart Mill, a causal relationship exists if (1) the cause preceded the effect, (2) the cause was related to the effect, and (3) we can find no plausible alternative explanation for the effect other than the cause. These three characteristics mirror what happens in experiments in which (1) we manipulate the presumed cause and observe an outcome afterward; (2) we see whether variation in the cause is related to variation in the effect; and (3) we use various methods during the experiment to reduce the plausibility of other explanations for the effect, along with ancillary methods to explore the plausibility of those we cannot rule out (most of this book is about methods for doing this). Hence experiments are well-suited to studying causal relationships. No other scientific method regularly matches the characteristics of causal relationships so well. Mill's analysis also points to the weakness of other methods. In many correlational studies, for example, it is impossible to know which of two variables came first, so defending a causal relationship between them is precarious. Understanding this logic of causal relationships and how its key terms, such as cause and effect, are defined helps researchers to critique cause-probing studies.

    Causation, Correlation, and Confounds

    A well-known maxim in research is: Correlation does not prove causation. This is so because we may not know which variable came first nor whether alternative explanations for the presumed effect exist. For example, suppose income and education are correlated. Do you have to have a high income before you can afford to pay for education, or do you first have to get a good education before you can get a better paying job? Each possibility may be true, and so both need investigation. But until those investigations are completed and evaluated by the scholarly community, a simple correlation does not indicate which variable came first. Correlations also do little to rule out alternative explanations for a relationship between two variables such as education and income. That relationship may not be causal at all but rather due to a third variable (often called a confound), such as intelligence or family socioeconomic status, that causes both high education and high income. For example, if high intelligence causes success in education and on the job, then intelligent people would have correlated education and incomes, not because education causes income (or vice versa) but because both would be caused by intelligence. Thus a central task in the study of experiments is identifying the different kinds of confounds that can operate in a particular research area and understanding the strengths and weaknesses associated with various ways of dealing with them.
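
    The education and income example can be simulated to show how a confound alone produces a correlation. This is a minimal sketch under strong assumptions that are not in the text: all three variables are standardized scores, intelligence adds equally to education and income, and education has no effect on income at all. The helper names pearson and partial_corr are hypothetical; the partial correlation is one standard way to adjust for a measured third variable, swapped in here for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear influence of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(1)
# Assumed data-generating process: intelligence drives both variables,
# and education never enters the equation for income.
intelligence = [rng.gauss(0, 1) for _ in range(5000)]
education = [z + rng.gauss(0, 1) for z in intelligence]
income = [z + rng.gauss(0, 1) for z in intelligence]

raw = pearson(education, income)
adjusted = partial_corr(education, income, intelligence)
print(f"raw correlation: {raw:.2f}; adjusted for intelligence: {adjusted:.2f}")
```

    The raw correlation is clearly positive even though neither variable causes the other, and it shrinks toward zero once the confound is held constant. In real research the confound is often unmeasured or unknown, which is why this kind of statistical adjustment is not a general remedy.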

    Manipulable and Nonmanipulable Causes

    In the intuitive understanding of experimentation that most people have, it makes sense to say, "Let's see what happens if we require welfare recipients to work"; but it makes no sense to say, "Let's see what happens if I change this adult male into a three-year-old girl." And so it is also in scientific experiments. Experiments explore the effects of things that can be manipulated, such as the dose of a medicine, the amount of a welfare check, the kind or amount of psychotherapy, or the number of children in a classroom. Nonmanipulable events (e.g., the explosion of a supernova) or attributes (e.g., people's ages, their raw genetic material, or their biological sex) cannot be causes in experiments because we cannot deliberately vary them to see what then happens. Consequently, most scientists and philosophers agree that it is much harder to discover the effects of nonmanipulable causes.


    To be clear, we are not arguing that all causes must be manipulable, only that experimental causes must be so. Many variables that we correctly think of as causes are not directly manipulable. Thus it is well established that a genetic defect causes PKU even though that defect is not directly manipulable. We can investigate such causes indirectly in nonexperimental studies or even in experiments by manipulating biological processes that prevent the gene from exerting its influence, as through the use of diet to inhibit the gene's biological consequences. Both the nonmanipulable gene and the manipulable diet can be viewed as causes: both covary with PKU-based retardation, both precede the retardation, and it is possible to explore other explanations for the gene's and the diet's effects on cognitive functioning. However, investigating the manipulable diet as a cause has two important advantages over considering the nonmanipulable genetic problem as a cause. First, only the diet provides a direct action to solve the problem; and second, we will see that studying manipulable agents allows a higher quality source of counterfactual inference through such methods as random assignment. When individuals with the nonmanipulable genetic problem are compared with persons without it, the latter are likely to be different from the former in many ways other than the genetic defect. So the counterfactual inference about what would have happened to those with the PKU genetic defect is much more difficult to make.

    Nonetheless, nonmanipulable causes should be studied using whatever means are available and seem useful. This is true because such causes eventually help us to find manipulable agents that can then be used to ameliorate the problem at hand. The PKU example illustrates this. Medical researchers did not discover how to treat PKU effectively by first trying different diets with retarded children. They first discovered the nonmanipulable biological features of retarded children affected with PKU, finding abnormally high levels of phenylalanine and its associated metabolic and genetic problems in those children. Those findings pointed in certain ameliorative directions and away from others, leading scientists to experiment with treatments they thought might be effective and practical. Thus the new diet resulted from a sequence of studies with different immediate purposes, with different forms, and with varying degrees of uncertainty reduction. Some were experimental, but others were not.

    Further, analogue experiments can sometimes be done on nonmanipulable causes, that is, experiments that manipulate an agent that is similar to the cause of interest. Thus we cannot change a person's race, but we can chemically induce skin pigmentation changes in volunteer individuals, though such analogues do not match the reality of being Black every day and everywhere for an entire life. Similarly, past events, which are normally nonmanipulable, sometimes constitute a natural experiment that may even have been randomized, as when the 1970 Vietnam-era draft lottery was used to investigate a variety of outcomes (e.g., Angrist, Imbens, & Rubin, 1996a; Notz, Staw, & Cook, 1971).

    Although experimenting on manipulable causes makes the job of discovering their effects easier, experiments are far from perfect means of investigating causes. Sometimes experiments modify the conditions in which testing occurs in a way that reduces the fit between those conditions and the situation to which the results are to be generalized. Also, knowledge of the effects of manipulable causes tells nothing about how and why those effects occur. Nor do experiments answer many other questions relevant to the real world, for example, which questions are worth asking, how strong the need for treatment is, how a cause is distributed through society, whether the treatment is implemented with theoretical fidelity, and what value should be attached to the experimental results.

    In addition, in experiments, we first manipulate a treatment and only then observe its effects; but in some other studies we first observe an effect, such as AIDS, and then search for its cause, whether manipulable or not. Experiments cannot help us with that search. Scriven (1976) likens such searches to detective work in which a crime has been committed (e.g., a robbery), the detectives observe a particular pattern of evidence surrounding the crime (e.g., the robber wore a baseball cap and a distinct jacket and used a certain kind of gun), and then the detectives search for criminals whose known method of operating (their modus operandi or m.o.) includes this pattern. A criminal whose m.o. fits that pattern of evidence then becomes a suspect to be investigated further. Epidemiologists use a similar method, the case-control design (Ahlbom & Norell, 1990), in which they observe a particular health outcome (e.g., an increase in brain tumors) that is not seen in another group and then attempt to identify associated causes (e.g., increased cell phone use). Experiments do not aspire to answer all the kinds of questions, not even all the types of causal questions, that social scientists ask.

    Causal Description and Causal Explanation

    The unique strength of experimentation is in describing the consequences attributable to deliberately varying a treatment. We call this causal description. In contrast, experiments do less well in clarifying the mechanisms through which and the conditions under which that causal relationship holds, what we call causal explanation. For example, most children very quickly learn the descriptive causal relationship between flicking a light switch and obtaining illumination in a room. However, few children (or even adults) can fully explain why that light goes on. To do so, they would have to decompose the treatment (the act of flicking a light switch) into its causally efficacious features (e.g., closing an insulated circuit) and its nonessential features (e.g., whether the switch is thrown by hand or a motion detector). They would have to do the same for the effect (either incandescent or fluorescent light can be produced, but light will still be produced whether the light fixture is recessed or not). For full explanation, they would then have to show how the causally efficacious parts of the treatment influence the causally affected parts of the outcome through identified mediating processes (e.g., the passage of electricity through the circuit, the excitation of photons).³

    Clearly, the cause of the light going on is a complex cluster of many factors. For those philosophers who equate cause with identifying that constellation of variables that necessarily, inevitably, and infallibly results in the effect (Beauchamp, 1974), talk of cause is not warranted until everything of relevance is known. For them, there is no causal description without causal explanation. Whatever the philosophic merits of their position, though, it is not practical to expect much current social science to achieve such complete explanation.

    The practical importance of causal explanation is brought home when the switch fails to make the light go on and when replacing the light bulb (another easily learned manipulation) fails to solve the problem. Explanatory knowledge then offers clues about how to fix the problem, for example, by detecting and repairing a short circuit. Or if we wanted to create illumination in a place without lights and we had explanatory knowledge, we would know exactly which features of the cause-and-effect relationship are essential to create light and which are irrelevant. Our explanation might tell us that there must be a source of electricity but that that source could take several different molar forms, such as a battery, a generator, a windmill, or a solar array. There must also be a switch mechanism to close a circuit, but this could also take many forms, including the touching of two bare wires or even a motion detector that trips the switch when someone enters the room. So causal explanation is an important route to the generalization of causal descriptions because it tells us which features of the causal relationship are essential to transfer to other situations.

    This benefit of causal explanation helps elucidate its priority and prestige in all sciences and helps explain why, once a novel and important causal relationship is discovered, the bulk of basic scientific effort turns toward explaining why and how it happens. Usually, this involves decomposing the cause into its causally effective parts, decomposing the effects into their causally affected parts, and identifying the processes through which the effective causal parts influence the causally affected outcome parts.

    These examples also show the close parallel between descriptive and explanatory causation and molar and molecular causation.⁴ Descriptive causation usually concerns simple bivariate relationships between molar treatments and molar outcomes, molar here referring to a package that consists of many different parts. For instance, we may find that psychotherapy decreases depression, a simple descriptive causal relationship between a molar treatment package and a molar outcome. However, psychotherapy consists of such parts as verbal interactions, placebo-generating procedures, setting characteristics, time constraints, and payment for services. Similarly, many depression measures consist of items pertaining to the physiological, cognitive, and affective aspects of depression. Explanatory causation breaks these molar causes and effects into their molecular parts so as to learn, say, that the verbal interactions and the placebo features of therapy both cause changes in the cognitive symptoms of depression, but that payment for services does not do so even though it is part of the molar treatment package.

    3. However, the full explanation a physicist would offer might be quite different from this electrician's explanation, perhaps invoking the behavior of subparticles. This difference indicates just how complicated is the notion of explanation and how it can quickly become quite complex once one shifts levels of analysis.

    4. By molar, we mean something taken as a whole rather than in parts. An analogy is to physics, in which molar might refer to the properties or motions of masses, as distinguished from those of molecules or atoms that make up those masses.

    If experiments are less able to provide this highly prized explanatory causal knowledge, why are experiments so central to science, especially to basic social science, in which theory and explanation are often the coin of the realm? The answer is that the dichotomy between descriptive and explanatory causation is less clear in scientific practice than in abstract discussions about causation. First, many causal explanations consist of chains of descriptive causal links in which one event causes the next. Experiments help to test the links in each chain. Second, experiments help distinguish between the validity of competing explanatory theories, for example, by testing competing mediating links proposed by those theories. Third, some experiments test whether a descriptive causal relationship varies in strength or direction under Condition A versus Condition B (then the condition is a moderator variable that explains the conditions under which the effect holds). Fourth, some experiments add quantitative or qualitative observations of the links in the explanatory chain (mediator variables) to generate and study explanations for the descriptive causal effect.
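The third point, testing for a moderator, can be made concrete with a brief sketch. Everything numeric here is an assumption chosen for illustration: the treatment is taken to raise the outcome by 4 points under Condition A and by 1 point under Condition B, and the helper names `run_arm` and `estimated_effect` are hypothetical. The code generates treated and untreated arms within each condition directly rather than modeling the assignment step itself.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical true treatment effects under each condition.
EFFECT = {"A": 4.0, "B": 1.0}


def run_arm(condition, treated, n=3000):
    """Simulate outcome scores for one arm of the study in one condition."""
    shift = EFFECT[condition] if treated else 0.0
    return [rng.gauss(20, 6) + shift for _ in range(n)]


def estimated_effect(condition):
    """Treated-minus-control difference in mean outcome within a condition."""
    treated_mean = statistics.mean(run_arm(condition, treated=True))
    control_mean = statistics.mean(run_arm(condition, treated=False))
    return treated_mean - control_mean


effect_a = estimated_effect("A")
effect_b = estimated_effect("B")

# If the condition moderates the relationship, the two estimates differ.
moderation = effect_a - effect_b

print(f"effect under A: {effect_a:.2f}")
print(f"effect under B: {effect_b:.2f}")
print(f"difference (moderation): {moderation:.2f}")
```

A difference between the two within-condition estimates that is large relative to sampling error suggests that the descriptive causal relationship holds more strongly under one condition than the other.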

    Experiments are also prized in applied areas of social science, in which the identification of practical solutions to social problems has as great or even greater priority than explanations of those solutions. After all, explanation is not always required for identifying practical solutions. Lewontin (1997) makes this point about the Human Genome Project, a coordinated multibillion-dollar research program to map the human genome that it is hoped eventually will clarify the genetic causes of diseases. Lewontin is skeptical about aspects of this search:

    What is involved here is the difference between explanation and intervention. Many disorders can be explained by the failure of the organism to make a normal protein, a failure that is the consequence of a gene mutation. But intervention requires that the normal protein be provided at the right place in the right cells, at the right time and in the right amount, or else that an alternative way be found to provide normal cellular function. What is worse, it might even be necessary to keep the abnormal protein away from the cells at critical moments. None of these objectives is served by knowing the DNA sequence of the defective gene. (Lewontin, 1997, p. 29)

    Practical applications are not immediately revealed by theoretical advance. Instead, to reveal them may take decades of follow-up work, including tests of simple descriptive causal relationships. The same point is illustrated by the cancer drug Endostatin, discussed earlier. Scientists knew the action of the drug occurred through cutting off tumor blood supplies; but to successfully use the drug to treat cancers in mice required administering it at the right place, angle, and depth, and those details were not part of the usual scientific explanation of the drug's effects.


    In the end, then, causal descriptions and causal explanations are in delicate balance in experiments. What experiments do best is to improve causal descriptions; they do less well at explaining causal relationships. But most experiments can be designed to provide better explanations than is typically the case today. Further, in focusing on causal descriptions, experiments often investigate molar events that may be less strongly related to outcomes than are more molecular mediating processes, especially those processes that are closer to the outcome in the explanatory chain. However, many causal descriptions are still dependable and strong enough to be useful, to be worth making the building blocks around which important policies and theories are created. Just consider the dependability of such causal statements as that school desegregation causes white flight, or that outgroup threat causes ingroup cohesion, or that psychotherapy improves mental health, or that diet reduces the retardation due to PKU. Such dependable causal relationships are useful to policymakers, practitioners, and scientists alike.

    MODERN DESCRIPTIONS OF EXPERIMENTS

    Some of the terms used in describing modern experimentation (see Table 1.1) are unique, clearly defined, and consistently used; others are blurred and inconsistently used. The common attribute in all experiments is control of treatment (though control can take many different forms). So Mosteller (1990, p. 225) writes, "In an experiment the investigator controls the application of the treatment"; and Yaremko, Harari, Harrison, and Lynn (1986, p. 72) write, "one or more independent variables are manipulated to observe their effects on one or more dependent variables." However, over time many different experimental subtypes have developed in response to the needs and histories of different sciences (Winston, 1990; Winston & Blais, 1996).

    TABLE 1.1 The Vocabulary of Experiments

    Experiment: A study in which an intervention is deliberately introduced to observe its effects.

    Randomized Experiment: An experiment in which units are assigned to receive the treatment or an alternative condition by a random process such as the toss of a coin or a table of random numbers.

    Quasi-Experiment: An experiment in which units are not assigned to conditions randomly.

    Natural Experiment: Not really an experiment because the cause usually cannot be manipulated; a study that contrasts a naturally occurring event such as an earthquake with a comparison condition.

    Correlational Study: Usually synonymous with nonexperimental or observational study; a study that simply observes the size and direction of a relationship among variables.


    Randomized Experiment

    The most clearly described variant is the randomized experiment, widely credited to Sir Ronald Fisher (1925, 1926). It was first used in agriculture but later spread to other topic areas because it promised control over extraneous sources of variation without requiring the physical isolation of the laboratory. Its distinguishing feature is clear and important: that the various treatments being contrasted (including no treatment at all) are assigned to experimental units⁵ by chance, for example, by coin toss or use of a table of random numbers. If implemented correctly, random assignment creates two or more groups of units that are probabilistically similar to each other on the average.⁶ Hence, any outcome differences that are observed between those groups at the end of a study are likely to be due to treatment, not to differences between the groups that already existed at the start of the study. Further, when certain assumptions are met, the randomized experiment yields an estimate of the size of a treatment effect that has desirable statistical properties, along with estimates of the probability that the true effect falls within a defined confidence interval. These features of experiments are so highly prized that in a research area such as medicine the randomized experiment is often referred to as the gold standard for treatment outcome research.⁷
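A short simulation may help show what "probabilistically similar on the average" buys. The sketch below rests on assumptions of our own choosing: a hypothetical true effect of 2 points, normally distributed baseline scores, a simulated coin toss for assignment, and a large-sample normal approximation (the 1.96 multiplier) for the confidence interval. The helper `mean_diff_ci` is a name invented for this illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 2.0  # hypothetical treatment effect, chosen for illustration
n = 10000

# Each unit has a pre-existing baseline score, then is assigned by coin toss.
baseline = [rng.gauss(50, 10) for _ in range(n)]
treated = [rng.random() < 0.5 for _ in range(n)]
outcome = [b + (TRUE_EFFECT if t else 0.0) + rng.gauss(0, 5)
           for b, t in zip(baseline, treated)]


def mean_diff_ci(values, flags):
    """Treated-minus-control mean difference with an approximate 95% interval."""
    t_vals = [v for v, f in zip(values, flags) if f]
    c_vals = [v for v, f in zip(values, flags) if not f]
    diff = statistics.mean(t_vals) - statistics.mean(c_vals)
    se = (statistics.variance(t_vals) / len(t_vals)
          + statistics.variance(c_vals) / len(c_vals)) ** 0.5
    return diff, (diff - 1.96 * se, diff + 1.96 * se)


# Groups formed by chance should be close on the baseline measure...
baseline_diff, _ = mean_diff_ci(baseline, treated)

# ...so the outcome difference is a reasonable estimate of the effect.
effect_estimate, (ci_low, ci_high) = mean_diff_ci(outcome, treated)

print(f"baseline difference: {baseline_diff:.2f}")
print(f"estimated effect:    {effect_estimate:.2f} "
      f"(approx. 95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Any single randomization can still produce groups that differ somewhat by chance; the similarity holds in expectation, which is why the estimate comes with an interval rather than as a single certain value.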

    Closely related to the randomized experiment is a more ambiguous and inconsistently used term, true experiment. Some authors use it synonymously with randomized experiment (Rosenthal & Rosnow, 1991). Others use it more generally to refer to any study in which an independent variable is deliberately manipulated (Yaremko et al., 1986) and a dependent variable is assessed. We shall not use the term at all given its ambiguity and given that the modifier true seems to imply restricted claims to a single correct experimental method.

    Quasi-Experiment

    Much of this book focuses on a class of designs that Campbell and Stanley (1963) popularized as quasi-experiments.⁸ Quasi-experiments share with all other experiments a similar purpose (to test descriptive causal hypotheses about manipulable causes) as well as many structural details, such as the frequent presence of control groups and pretest measures, to support a counterfactual inference about what would have happened in the absence of treatment. But, by definition, quasi-experiments lack random assignment. Assignment to conditions is by means of self-selection, by which units choose treatment for themselves, or by means of administrator selection, by which teachers, bureaucrats, legislators, therapists, physicians, or others decide which persons should get which treatment. However, researchers who use quasi-experiments may still have considerable control over selecting and scheduling measures, over how nonrandom assignment is executed, over the kinds of comparison groups with which treatment groups are compared, and over some aspects of how treatment is scheduled. As Campbell and Stanley note:

    5. Units can be people, animals, time periods, institutions, or almost anything else. Typically in field experimentation they are people or some aggregate of people, such as classrooms or work sites. In addition, a little thought shows that random assignment of units to treatments is the same as assignment of treatments to units, so these phrases are frequently used interchangeably.

    6. The word probabilistically is crucial, as is explained in more detail in Chapter 8.

    7. Although the term randomized experiment is used this way consistently across many fields and in this book, statisticians sometimes use the closely related term random experiment in a different way to indicate experiments for which the outcome cannot be predicted with certainty (e.g., Hogg & Tanis, 1988).

    8. Campbell (1957) first called these compromise designs but changed terminology very quickly; Rosenbaum (1995a) and Cochran (1965) refer to these as observational studies, a term we avoid because many people use it to refer to correlational or nonexperimental studies, as well. Greenberg and Shroder (1997) use quasi-experiment to refer to studies that randomly assign groups (e.g., communities) to conditions, but we would consider these group-randomized experiments (Murray, 1998).

    There are many natural social settings in which the research person can introduce something like experimental design into his scheduling of data collection procedures (e.g., the when and to whom of measurement), even though he lacks the full control over the scheduling of experimental stimuli (the when and to whom of exposure and the ability to randomize exposures) which makes a true experiment possible. Collectively, such situations can be regarded as quasi-experimental designs. (Campbell & Stanley, 1963, p. 34)

    In quasi-experiments, the cause is manipulable and occurs before the effect is measured. However, quasi-experimental design features usually create less compelling support for counterfactual inferences. For example, quasi-experimental control groups may differ from the treatment condition in many systematic (nonrandom) ways other than the presence of the treatment. Many of these ways could be alternative explanations for the observed effect, and so researchers have to worry about ruling them out in order to get a more valid estimate of the treatment effect. By contrast, with random assignment the researcher does not have to think as much about all these alternative explanations. If correctly done, random assignment makes most of the alternatives less likely as causes of the observed treatment effect at the start of the study.

    In quasi-experiments, the researcher has to enumerate alternative explanations one by one, decide which are plausible, and then use logic, design, and measurement to assess whether each one is operating in a way that might explain any observed effect. The difficulties are that these alternative explanations are never completely enumerable in advance, that some of them are particular to the context being studied, and that the methods needed to eliminate them from contention will vary from alternative to alternative and from study to study.

    For example, suppose two nonrandomly formed groups of children are studied, a volunteer treatment group that gets a new reading program and a control group of nonvolunteers who do not get it. If the treatment group does better, is it because of treatment or because the cognitive development of the volunteers was increasing more rapidly even before treatment began? (In a randomized experiment, maturation rates would have been probabilistically equal in both groups.) To assess this alternative, the researcher might add multiple pretests to reveal maturational trend before the treatment, and then compare that trend with the trend after treatment.
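    One way to picture the multiple-pretest idea is to fit each group's pretreatment trend and project it forward to the posttest. The sketch below assumes that maturation is linear and uses made-up group means; the helper names `ols_line` and `maturation_adjusted_gap` are hypothetical, and this is not presented as the analysis the authors recommend.

```python
def ols_line(times, scores):
    """Return (intercept, slope) of the least-squares line through the points."""
    mean_t = sum(times) / len(times)
    mean_s = sum(scores) / len(scores)
    slope = (sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
             / sum((t - mean_t) ** 2 for t in times))
    return mean_s - slope * mean_t, slope


def maturation_adjusted_gap(pre_times, treat_pre, control_pre,
                            post_time, treat_post, control_post):
    """Split the posttest gap into a projected-maturation part and a remainder."""
    t_int, t_slope = ols_line(pre_times, treat_pre)
    c_int, c_slope = ols_line(pre_times, control_pre)
    # Gap expected at posttest if each group simply continued its pretest trend.
    projected_gap = (t_int + t_slope * post_time) - (c_int + c_slope * post_time)
    observed_gap = treat_post - control_post
    return observed_gap, projected_gap, observed_gap - projected_gap


# Hypothetical group means on a reading test: three pretests, one posttest.
observed, projected, remainder = maturation_adjusted_gap(
    pre_times=[0, 1, 2],
    treat_pre=[40.0, 44.0, 48.0],    # volunteers already gaining 4 points per wave
    control_pre=[40.0, 42.0, 44.0],  # nonvolunteers gaining 2 points per wave
    post_time=3, treat_post=55.0, control_post=46.0,
)
print(observed, projected, remainder)
```

    In this invented example the groups differ by 9 points at posttest, but 6 of those points are what the diverging pretest trends alone would predict, leaving 3 points that maturation does not account for.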

    Another alternative explanation might be that the nonrandom control group included more disadvantaged children who had less access to books in their homes or who had parents who read to them less often. (In a randomized experiment, both groups would have had similar proportions of such children.) To assess this alternative, the experimenter may measure the number of books at home, parental time spent reading to children, and perhaps trips to libraries. Then the researcher would see if these variables differed across treatment and control groups in the hypothesized direction that could explain the observed treatment effect.
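    Checking whether a measured variable differs across groups can be as simple as a standardized mean difference. The sketch below is one common way to do this, not a procedure named in the text; the function name and the book counts are hypothetical.

```python
import math
import statistics


def standardized_mean_difference(treated, control):
    """Mean difference divided by the pooled sample standard deviation."""
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


# Hypothetical counts of books at home for five children in each group.
books_treatment = [30, 45, 50, 60, 40]
books_control = [10, 20, 25, 15, 30]

smd = standardized_mean_difference(books_treatment, books_control)
print(f"standardized mean difference in books at home: {smd:.2f}")
```

    A large positive value, as in this invented data, would mean the treatment group had more books at home, which is the direction that could explain a better reading outcome without any treatment effect.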

    Obviously, as the number of plausible alternative explanations increases, the design of the quasi-experiment becomes more intellectually demanding and complex, especially because we are never certain we have identified all the alternative explanations. The efforts of the quasi-experimenter start to look like attempts to bandage a wound that would have been less severe if random assignment had been used initially.

    The ruling out of alternative hypotheses is closely related to a falsificationist logic popularized by Popper (1959). Popper noted how hard it is to be sure that a general conclusion (e.g., all swans are white) is correct based on a limited set of observations (e.g., all the swans I've seen were white). After all, future observations may change (e.g., some day I may see a black swan). So confirmation is logically difficult. By contrast, observing a disconfirming instance (e.g., a black swan) is sufficient, in Popper's view, to falsify the general conclusion that all swans are white. Accordingly, Popper urged scientists to try deliberately to falsify the conclusions they wish to draw rather than only to seek information corroborating them. Conclusions that withstand falsification are retained in scientific books or journals and treated as plausible until better evidence comes along. Quasi-experimentation is falsificationist in that it requires experimenters to identify a causal claim and then to generate and examine plausible alternative explanations that might falsify the claim.

    However, such falsification can never be as definitive as Popper hoped. Kuhn (1962) pointed out that falsification depends on two assumptions that can never be fully tested. The first is that the causal claim is perfectly specified. But that is never the case. So many features of both the claim and the test of the claim are debatable: for example, which outcome is of interest, how it is measured, the conditions of treatment, who needs treatment, and all the many other decisions that researchers must make in testing causal relationships. As a result, disconfirmation often leads theorists to respecify part of their causal theories. For example, they might now specify novel conditions that must hold for their theory to be true and that were derived from the apparently disconfirming observations. Second, falsification requires measures that are perfectly valid reflections of the theory being tested. However, most philosophers maintain that all observation is theory-laden. It is laden both with intellectual nuances specific to the partially


    Nonexperimental Designs

    The terms correlational design, passive observational design, and nonexperimental design refer to situations in which a presumed cause and effect are identified and measured but in which other structural features of experiments are missing. Random assignment is not part of the design, nor are such design elements as pretests and control groups from which researchers might construct a useful counterfactual inference. Instead, reliance is placed on measuring alternative explanations individually and then statistically controlling for them. In cross-sectional studies in which all the data are gathered on the respondents at one time, the researcher may not even know if the cause precedes the effect. When these studies are used for causal purposes, the missing design features can be problematic unless much is already known about which alternative interpretations are plausible, unless those that are plausible can be validly measured, and unless the substantive model used for statistical adjustment is well-specified. These are difficult conditions to meet in the real world of research practice, and therefore many commentators doubt the potential of such designs to support strong causal inferences in most cases.
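    To make "measuring alternative explanations individually and then statistically controlling for them" concrete, the sketch below compares a naive group difference with one computed within levels of a single measured confounder. Everything in it is an assumption made for illustration: the binary confounder, the probabilities, the effect sizes, and the function names. It works only because the simulated confounder is the sole source of bias and is measured without error, which are exactly the conditions described above as difficult to meet.

```python
import random
import statistics


def simulate_records(n=20000, true_effect=2.0, seed=1):
    """Return (confounder, treated, outcome) tuples from a hypothetical process."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        z = rng.random() < 0.5                    # measured confounder
        t = rng.random() < (0.8 if z else 0.2)    # exposure depends on z
        y = (true_effect if t else 0.0) + (10.0 if z else 0.0) + rng.gauss(0.0, 3.0)
        records.append((z, t, y))
    return records


def naive_difference(records):
    """Treated mean minus untreated mean, ignoring the confounder."""
    treated = [y for _, t, y in records if t]
    control = [y for _, t, y in records if not t]
    return statistics.mean(treated) - statistics.mean(control)


def stratified_difference(records):
    """Average of within-stratum differences, weighted by stratum size."""
    estimate = 0.0
    for level in (False, True):
        stratum = [r for r in records if r[0] == level]
        treated = [y for _, t, y in stratum if t]
        control = [y for _, t, y in stratum if not t]
        weight = len(stratum) / len(records)
        estimate += weight * (statistics.mean(treated) - statistics.mean(control))
    return estimate


data = simulate_records()
naive = naive_difference(data)
adjusted = stratified_difference(data)
print(f"true effect 2.0, naive {naive:.2f}, adjusted for confounder {adjusted:.2f}")
```

    If the confounder were unmeasured, mismeasured, or only one of several, the adjusted figure would not be expected to recover the true effect.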

    EXPERIMENTS AND THE GENERALIZATION OF CAUSAL CONNECTIONS

    The strength of experimentation is its ability to illuminate causal inference. The weakness of experimentation is doubt about the extent to which that causal relationship generalizes. We hope that an innovative feature of this book is its focus on generalization. Here we introduce the general issues that are expanded in later chapters.

    Most Experiments Are Highly Local But Have General Aspirations

    Most experiments are highly localized and particularistic. They are almost always conducted in a restricted range of settings, often just one, with a particular version of one type of treatment rather than, say, a sample of all possible versions. Usually they have several measures, each with theoretical assumptions that are different from those present in other measures, but far from a complete set of all possible measures. Each experiment nearly always uses a convenient sample of people rather than one that reflects a well-described population; and it will inevitably be conducted at a particular point in time that rapidly becomes history.

    Yet readers of experimental results are rarely concerned with what happened in that particular, past, local study. Rather, they usually aim to learn either about theoretical constructs of interest or about a larger policy. Theorists often want to connect experimental results to theories with broad conceptual applicability, which requires generalization at the linguistic level of constructs rather than at the level of the operations used to represent these constructs in a given experiment. They nearly always want to generalize to more people and settings than are represented in a single experiment. Indeed, the value assigned to a substantive theory usually depends on how broad a range of phenomena the theory covers. Similarly, policymakers may be interested in whether a causal relationship would hold (probabilistically) across the many sites at which it would be implemented as a policy, an inference that requires generalization beyond the original experimental study context. Indeed, all human beings probably value the perceptual and cognitive stability that is fostered by generalizations. Otherwise, the world might appear as a buzzing cacophony of isolated instances requiring constant cognitive processing that would overwhelm our limited capacities.

    In defining generalization as a problem, we do not assume that more broadly applicable results are always more desirable (Greenwood, 1989). For example, physicists who use particle accelerators to discover new elements may not expect that it would be desirable to introduce such elements into the world. Similarly, social scientists sometimes aim to demonstrate that an effect is possible and to understand its mechanisms without expecting that the effect can be produced more generally. For instance, when a "sleeper effect" occurs in an attitude change study involving persuasive communications, the implication is that change is manifest after a time delay but not immediately so. The circumstances under which this effect occurs turn out to be quite limited and unlikely to be of any general interest other than to show that the theory predicting it (and many other ancillary theories) may not be wrong (Cook, Gruder, Hennigan, & Flay, 1979). Experiments that demonstrate limited generalization may be just as valuable as those that demonstrate broad generalization.

    Nonetheless, a conflict seems to exist between the localized nature of the causal knowledge that individual experiments provide and the more generalized causal goals that research aspires to attain. Cronbach and his colleagues (Cronbach et al., 1980; Cronbach, 1982) have made this argument most forcefully, and their works have contributed much to our thinking about causal generalization. Cronbach noted that each experiment consists of units that receive the experiences being contrasted, of the treatments themselves, of observations made on the units, and of the settings in which the study is conducted. Taking the first letter from each of these four words, he defined the acronym utos to refer to the "instances on which data are collected" (Cronbach, 1982, p. 78), that is, to the actual people, treatments, measures, and settings that were sampled in the experiment. He then defined two problems of generalization: (1) generalizing to the "domain about which [the] question is asked" (p. 79), which he called UTOS; and (2) generalizing to "units, treatments, variables, and settings not directly observed" (p. 83), which he called *UTOS.⁹

    9. We oversimplify Cronbach's presentation here for pedagogical reasons. For example, Cronbach only used capital S, not small s, so that his system referred only to utoS, not utos. He offered diverse and not always consistent definitions of UTOS and *UTOS, in particular. And he does not use the word generalization in the same broad way we do here.


    Our theory of causal generalization, outlined below and presented in more detail in Chapters 11 through 13, melds Cronbach's thinking with our own ideas about generalization from previous works (Cook, 1990, 1991; Cook & Campbell, 1979), creating a theory that is different in modest ways from both of these predecessors. Our theory is influenced by Cronbach's work in two ways. First, we follow him by describing experiments consistently throughout this book as consisting of the elements of units, treatments, observations, and settings,¹⁰ though we frequently substitute persons for units given that most field experimentation is conducted with humans as participants. We also often substitute outcome for observations given the centrality of observations about outcome when examining causal relationships. Second, we acknowledge that researchers are often interested in two kinds of generalization about each of these five elements, and that these two types are inspired by, but not identical to, the two kinds of generalization that Cronbach defined. We call these construct validity generalizations (inferences about the constructs that research operations represent) and external validity generalizations (inferences about whether the causal relationship holds over variation in persons, settings, treatment, and measurement variables).

    Construct Validity: Causal Generalization as Representation

    The first causal generalization problem concerns how to go from the particular units, treatments, observations, and settings on which data are collected to the higher order constructs these instances represent. These constructs are almost always couched in terms that are more abstract than the particular instances sampled in an experiment. The labels may pertain to the individual elements of the experiment (e.g., is the outcome measured by a given test best described as intelligence or as achievement?). Or the labels may pertain to the nature of relationships among elements, including causal relationships, as when cancer treatments are classified as cytotoxic or cytostatic depending on whether they kill tumor cells directly or delay tumor growth by modulating their environment.

    Consider a randomized experiment by Fortin and Kirouac (1976). The treatment was a brief educational course administered by several nurses, who gave a tour of their hospital and covered some basic facts about surgery with individuals who were to have elective abdominal or thoracic surgery 15 to 20 days later in a single Montreal hospital. Ten specific outcome measures were used after the surgery, such as an activities of daily living scale and a count of the analgesics used to control pain.

    Now compare this study