Problems with the Design and Implementation of Randomized … · 2009-09-03 · VSM 6 actually required thinking through quite a few things that weren‟t necessarily obvious either

VSM 1

Inst i tute of Educat ion Sciences

Fourth Annual IES Research Conference

Concurrent Panel Session

“Problems with the Design and Implementat ion

of Randomized Experiments”

Monday

June 8, 2009

Marriot t Wardman Park Hotel

Thurgood Marshal l South

2660 Woodley Road NW

Washington , DC 20008

VSM 2

Contents

Moderator:

Al len Ruby

NCER 3

Presenter:

Larry Hedges

Northwestern Universi ty 5

Q&A 42

VSM 3

Proceedings

DR. RUBY: Good afternoon, everyone . I am Allen Ruby from

IES. Thank you for at tending, and just a reminder, i f you ask a quest ion,

because this session is being taped, please s tate your name and your

aff i l iat ion i f you ‟d l ike to .

Last year , I int roduced Dr. Larry Hedges at the IES Research

Conference . He was doing a talk on inst rumental variables , and at that t ime, I

noted he had joined the Northwestern facul ty in 2005 . He has appointments in

s tat is t ics , educat ion, and soc ial pol icy. He is one of the eight Board of

Trustees Professors at Northwestern.

Previously, he was the Stel la Rowley Professor at the Universi ty

of Chicago, and his research has involved many f ields including sociology,

psychology and educat ional pol icy.

He is widely known for his development of methods for meta -

analysis . He has authored numerous art icles and several books . He has been

elected a member or a fel low of many boards, associat ions, and professional

organizat ions including the Nat ional Academy of Educat ion; the American

Stat is t ical Associat ion—come on down. There are plenty of seats up front ,

please— American Stat is t ical Associat ion; the Society for Mult ivariate

Experimental Psychology.

He has served in an edi torial posi t ion on a number of jo urnals

including the American Educat ion Research Journal ; the American Journal of

Sociology ; the Journal of Educat ion and Behavioral Stat is t ics ; and the

Review of Educat ional Research .

VSM 4

So I don ‟ t want to go into the detai ls again this year . I real ly just

want to ask what ‟s he done for educat ion research lately?

[Laughter . ]

DR. RUBY: And to that end, I‟d l ike to note two areas of his

current work that might be of interest to some of the folks here.

When we evaluate educat ional intervent ions in schools , we often

act as i f those schools included in our sample are a probabi l i ty sample from a

populat ion of schools , when, in fact , they ‟re real ly not . They‟re usual ly a

convenient sample of schools that are wil l ing to work with us .

And so Larry is current ly wo rking on how to address the

nonprobabi l i ty sampling in order to improve our es t imates of populat ion

t reatment effects or to improve the general izab i l i ty of the resul ts from our

experiments .

He has publ ished one piece of this work in a chapter ent i t led

“General izabi l i ty of Treatment Effects , ” in the book Scale-Up in Principle,

2007, edi ted by Schneider and McDonald.

A second area is as we move to a greater use of cluster

randomized t r ials , there are implicat ions for what the effect s izes are and

mean; and he has been working on improving methods for defining effect

s izes in cluster randomized t r ials ; and he has publ ished two pieces of work in

this area: one in 2007 in the Journal of Educat ional and Behavioral

Stat is t ics ; and another one in the recent ly rele ased Handbook of Research

Synthesis ; and another piece wil l be coming out soon in JEBS on three -level

designs.

VSM 5

Third, and what he ‟s talking about today, he ‟s been considering a

set of issues that often crop up when designing and carrying out randomized

t r ials . These issues have important implicat ions for preserving the integri ty of

the experiment , doing the analysis , and interpret ing the resul ts .

So I‟d l ike everyone to please welcome Dr. Larry Hedges .

Thank you.

[Applause.]

MR. HEDGES: Well , i t ‟s always tough to l ive up to an

introduct ion l ike that , and I ‟ l l jus t begin this talk by saying that IES put me

up to i t , and they put me up to i t in the fol lowing way: that some folks on the

IES s taff , and that actual ly wasn ‟ t jus t Al len, said to me, you know , we keep

get t ing these quest ions from people that are real ly sort of s imple quest ions,

and I‟m surprised that folks don ‟ t know the answer to those quest ions . Maybe

you could just give a talk to clar i fy, you know, these things for people.

And that ‟s how I got put up to doing what I ‟m going to t ry to do

today. And when I was asked to sort of gin up a t i t le for the session, I came

up with the t i t le “Problems with the Design and Implementat ion of

Randomized Experiments .”

But a few days ago when I was talking to Allen about this session

in somewhat greater detai l , he helped me to real ize that a bet ter t i t le for this

talk would be “Hard Answers to Easy Quest ions. ”

[Laughter . ]

MR. HEDGES: Because the quest ions that kept cropping up were

in principle easy ques t ions, but in order to give a good answer to them, i t

VSM 6

actual ly required thinking through qui te a few things that weren ‟ t necessari ly

obvious ei ther to the person asking the quest ion or to the person answering

the quest ion.

I think I know, I see a lot of m y methodological ly incl ined fr iends

in the audience, and I know we ‟ve al l had the experience of somebody coming

up to you and saying I ‟ve got a real ly s imple quest ion, and they think i t has a

yes/no answer, and they think that enough information to answer the quest ion

is contained in the f i rs t s imple phrasing of i t , and then we have to s i t them

down for a few hours and talk through a lot of detai ls of what they actual ly

mean, and so this , in a sense, this talk has some of that character .

I‟m going to s tar t wi th an easy quest ion that has ar isen frequent ly

in my experience, and I guess in IES ‟s experience as wel l . People wil l say

isn ‟ t i t okay i f I just match something l ike schools on some variable before

randomizing? And then they often go on to say, you kno w, lots of people do

i t .

[Laughter . ]

MR. HEDGES: And this quest ion has been asked of me direct ly

by some fai r ly sophis t icated people, and, you know, when they say that , you

know, “ lots of people do i t ,” what they are real ly saying is , “come on, i t can ‟ t

make that much difference, and I know lots of people who do this and kind of

ignore the fact that they did i t later . You know, surely, this is okay; r ight? ”

And I see this as an example of a s imple quest ion, but giving i t

an answer requires some seriou s thinking about design and analysis , and in a

certain sense, this is an opportuni ty to raise people ‟s consciousness about

VSM 7

some things you probably al ready know.

Maybe for many of you, al l of the things are things you already

know. In that case, you don ‟ t need this talk, but i f there are some things that

you might not have thought about qui te this way, maybe i t wi l l be helpful .

Well , the f i rs t s tep for me in thinking about a quest ion l ike this is

t rying to understand exact ly what the quest ion means, and n ot the only way,

but one way to think about i t i s that adding a matching or a blocking variable

means adding another factor to the research design.

So what I‟m going to do today, I‟m going to rely heavi ly on kind

of a s tandard analysis of variance, s tanda rd experimental design approach to

thinking about these things, not because analysis of variance is necessari ly

how you would analyze the data from a real experiment , but because i t makes

certain things real ly clear and has a sort of universal vocabulary t hat may be

helpful .

So i f you see a lot of analysis of variance kind of approach to

things, that ‟s why i t ‟s there .

And then to move on, that s imple quest ion, you know, has

consequences that depend a l i t t le bi t on which design you s tar ted with . You

might have s tar ted with an individual ly randomized design, the sort of

completely randomized design, no clustering or anything l ike that .

I want to talk a l i t t le bi t about that because i t ‟s kind of the

paradigm for answering the quest ions with respect to more complex designs

that we may actual ly be more l ikely to use in educat ion ; things l ike cluster

randomized designs, which might also be cal led hierarchical designs i f you

VSM 8

read a book l ike Kirk ‟s Experimental Design book; and mult icenter or

matched designs, which might be cal led general ized randomized blocks

designs i f you look in a s tandard kind of experimental design book.

So i f we s tar ted with, i f the place we s tar ted was with an

individual ly randomized design where we basical ly ass igned t reatments to

individuals at random; adding a blocking factor ; using the blocks ; matching a

l i t t le bi t ; corresponds to essent ial ly changing the design from a s imple

individual ly randomized design to a randomized blocks design or what is

actual ly usual ly cal led a general ized randomized blocks design.

And you could, and this is a sort of sample depict ion, and I ‟m

going to make our discussion easy by imagining that we ‟ve picked the levels

of our blocking factor in such a way that we get sort of equal numbers and

miraculously equal numbers in both t reatment and control in each block.

So the f i rs t point that ‟s useful in thinking about what happens i f

we had a blocking factor is that i t depends on the design and that adding that

changes the design.

Now, i f we want to ask what th e impact on the analysis might be,

wel l , we could, again, think about this in analysis of variance terms, and i f

we imagine that we have got ten incredibly lucky in the choice of our

blocking, levels of our blocking factor , so that we get equal numbers of

s tudents in each block, 2n s tudents in each block, n in t reatment and n in

control , and p blocks, then we would have a s tandard part i t ioning of the sums

of squares , and that goes, that we talk about in analysis of variance.

And I think of the part i t ioning of the sums of squares in terms of

VSM 9

part i t ioning of the variabi l i ty of the data, and so the not ion is the total

amount of variabi l i ty gets part i t ioned into part that ‟s due to t reatment and

part that ‟s due to within t reatments in the original part i t ioning.

And the degrees of freedom get part i t ioned, and the original tes t

s tat is t ic compares the sum of squares due to t reatment , which is a funct ion of

mean difference, to the variance within t reatment groups, in other words, the

sum of squares within t reatment g roups divided by the degrees of freedom

there.

Now, by introducing this blocking factor , we ‟ve introduced the

possibi l i ty of part i t ioning the variat ion further . If we s tar t wi th our

individual ly randomized design, but now introduce a blocking factor , we h ave

a different part i t ioning of the total variat ion, one in which there is a certain

amount of variat ion between blocks, a certain amount of variat ion that is

associated with the block t reatment interact ion, the fact that wi thin each

block, we have both t r eatment and control individuals . So there ‟s a block-

specif ic t reatment effect that we could compute.

And the block t reatment interact ion corresponds to an amount of

heterogenei ty that has to do with heterogenei ty in t reatment effects .

And then there ‟s a sort of within cel ls , that is wi thin blocks by

t reatments component .

Now, once we real ize we have this new design with a new

part i t ioning of the variat ion, the quest ion arises : wel l , what ‟s the r ight tes t

s tat is t ic?

And I sort of mean this in two senses . You know one sense you

VSM 10

could mean i t i s how do I actual ly compute the tes t , but also, you know, how

is i t sensible? With what is i t sensible to compare the variat ion due to

t reatments in order to decide whether the variat ion due to t reatments r ises

above the level of the background noise?

Well , that depends on the inference model . Now, that term may

be— I‟ l l jus t— this is just a review, by the way, of the sums of squares , that in

essence, what used to be cal led within t reatments in the original design now

gets broken up into three components : a between blocks component ; a block

t reatment interact ion component ; and a within blocks by t reatments , a within

cel l component .

And that ‟s an important thing about this design, a potent ial

advantage of this design . But as I was about to say, in order to f igure out

what the r ight thing to do, now that we ‟ve added this blocking factor to our

design, we have to think about the inference model that we want to at tach to

this design . And I need to say a l i t t le bi t about th at because one of the things

that isn ‟ t discussed in qui te these terms in many experimental design courses

is the not ion of inference models .

It ‟s always talked about , but not exact ly in this way . And I‟d l ike

to make a dis t inct ion between two kinds of in ference models , one which I ‟m

going to cal l condi t ional inference model , and one which I ‟m going to cal l the

uncondi t ional inference model .

And the point of making this dis t inct ion is that the inference

model is associated with, determines, and is determ ined by, the type of

inference that you want to make from the experiment ; and once you ‟ve chosen

VSM 11

the inference model , that has implicat ions for the appropriate s tat is t ical

analysis procedure ; and to put i t , to relate i t to something else that probably

confuses a lot of folks in their f i rs t experimental design course, the inference

model determines what are the natural random and f ixed effects in the model .

So let me say a l i t t le bi t more about what I mean by inference

models . The condi t ional inference mode l is one in which the general izat ions

you hope to make are to the blocks that are actual ly in the experiment or ones

that are just l ike them.

And what does “ just l ike them” mean? It means they respond,

that in a sense they have exact ly the same effect pa rameters .

In the condi t ional inference model , blocks in the experiment are

the universe you want to general ize to . And the idea of general izat ion to other

blocks depends on extra -s tat is t ical considerat ions . You have to make an

argument that the resul ts o f this experiment apply to some other blocks

because those blocks, those schools— frequent ly blocks are schools or they ‟re

school dis t r icts—are just l ike the ones that are in the experiment , and that ‟s

an ex tra-s tat is t ical argument .

So i t ‟s clear that general izat ion from a condi t ional , in the context

of a condi t ional inference model to things outs ide the observed blocks, say

the observed school dis t r icts , i s something that can ‟ t be done in a model free

way.

By the way—well , let me go on and say a word abou t the

uncondi t ional inference model . So what ‟s the al ternat ive? Well , the

al ternat ive that I‟m posing, and i t may not be exact ly the only al ternat ive, is

VSM 12

that we have an uncondi t ional inference model , which means that the

inference is intended to be a gene ral izat ion to a universe of blocks which

expl ici t ly includes at least some blocks that are not in the experiment .

Therefore, we think of the blocks in the experiment as being a

sample of blocks from a universe or populat ion of potent ial blocks . There ‟s

space ei ther place . And the idea, the usual idea, in the uncondi t ional

inference model is that the blocks in the experiment are a representat ive

sample of some populat ion, and the inference that ‟s of interest is to the

populat ion of blocks that the experimen t are a representat ive sample of .

And the beauty of the uncondi t ional inference s t rategy is that

inference to blocks not contained in the experiment is by sampling theory,

and we have this wonderful theory of general izat ion based on sampling.

Now, i f the blocks aren ‟ t a probabi l i ty sample of any populat ion

that you can specify, the general izat ion gets t r icky because the quest ion

becomes what is the relevant universe and how would you know ; and, in a

sense, ex tra-s tat is t ical considerat ions, just l ike in th e condi t ional inference

case, come into play and are, in some ways, just as t r icky.

Now, I want to refer this s i tuat ion that I ‟m talking about to a

s i tuat ion that you ‟re al l famil iar wi th but may not seem exact ly the same to

you, and that is the case of s t rat i f icat ion and clustering in sample surveys.

If you think about the way in which we do inference in sample

surveys, many sample surveys pick intact groups, a sample of intact groups,

clusters , and we get the sample by sampling f i rs t the clusters and th en

individuals within the clusters .

VSM 13

Now, when we do this , what we ‟re doing is taking a sample of

clusters and t rying to make an uncondi t ional inference to the universe from

which those clusters have been sampled . If you happen to have al l of the

clusters in the universe you want to infer to , we use a different name for the

clusters in sample survey work ; we cal l them strata.

And when you have a sample that ‟s based on s t rata, that is al l the

subdivis ions of the populat ion, and not clusters , some of the sub divis ions of

the populat ion, i t has implicat ions for the inference . It has implicat ions for

the uncertainty of the inference, in part icular , and i t ‟s exact ly paral lel here in

the experimental s i tuat ion.

So you can think of the inference model as being l in ked in an

inextr icable way to the sampling model for blocks . If the blocks that you

observe are ideal ly a random sample of blocks, they ‟re a source of random

variat ion . If the blocks observed are the ent i re universe of relevant blocks,

they are not a source of random variat ion . Just l ike clusters are a source of

added variat ion in a sample survey but s t rata aren ‟ t .

Now, the reason I‟m making a big deal about inference models ,

and I haven ‟ t said anything about s tat is t ical analysis yet , i s that I think i t

actual ly— these two things are often conflated in people ‟s understanding of

what to do in experiments , and I think i t helps to separate these two

conceptual ly different things.

There is this inference in sampling model , and then there ‟s what

analysis you do to t ry and learn something from data you col lect under those

models , and the fact is you can do any s tat is t ical analysis with any sampling

VSM 14

model ; some of them won ‟ t be sound. But , nonetheless , these two things are

in principle separable.

So going back to our s imple example, i f we think of the blocks as

f ixed effects , i f we ‟re wil l ing to , i f we ‟re wil l ing to entertain a condi t ional

inference model , that is we ‟re interested in inferr ing to the universe of blocks

we have observed in the experiment , then i t ‟s ent i rely appropriate to think of

the blocks as being f ixed effects , and in that case, we get one tes t s tat is t ic

which basical ly is the one that I ‟ve l is ted here as F condi t ional .

And the only thing that winds up in the error term in the

denominator we compare to the variat ion due to t reatment for the purposes of

seeing whether the t reatment variat ion r ises above the noise is just the

within-cel l variat ion.

And there ‟s the exact dis t r ibut ion of this thing is known . It ‟s an F

with the degrees of freedom th at I‟ve given here .

But i f the blocks are random effects , in other words, i f we think

of the blocks we have introduced as not being a universe unto themselves , but

being a sample from a universe we want to infer to— for example, we have

school dis t r icts . We‟ve blocked by school dis t r icts . We‟re not interested in

making a general izat ion to the school dis t r icts we happen to observe . We

want to make a general izat ion to the universe of school dis t r icts from which

we could putat ively say ours are a representat i ve sample, then the appropriate

tes t s tat is t ic is di fferent .

And in this case, the appropriate error term is determined by the

block t reatment interact ion variance, the sum of the squares is due to the

VSM 15

block t reatment interact ion, and i t ‟s notable that that error term has a lot

fewer degrees of freedom.

By the way, one of the ways of thinking about why this F

uncondi t ional is the appropriate tes t s tat is t ic is to sort of go back to the

design and remember that in this design, we can compute a t reatment eff ect

wi thin each block, and what this analysis of variance tes t s tat is t ic is , i s

basical ly just the equivalent of doing a t - tes t on the block specif ic t reatment

effects , only i t ‟s an F tes t . You know, square of a t - tes t on the block specif ic

t reatment effec ts .

Now, you can see, i f you compare these two, that the error term in

the condi t ional inference tes t , the so -cal led “f ixed effects model ,” has a

whole lot more degrees of freedom than the one in the uncondi t ional model .

If p is the number of blocks, we have, you know, basical ly 2pn

minus 2p degrees of freedom under the condi t ional analysis and only p minus

1 under the uncondi t ional analysis .

And that ‟s a difference so you can al ready tel l the condi t ional tes t

i s going to be more sensi t ive .

What you can ‟ t see immediately from just comparing the tes t

s tat is t ics is that i f there is a t reatment effect , the average value of the F

s tat is t ic is going to be larger under the f ixed effects model . General ly, i t ‟s

going to be larger , and actual ly what I mean here is not qui te the average

value of the F s tat is t ic . What I mean is the so -cal led “noncentral i ty

parameter ,” but i t ‟s , that ‟s a dis t inct ion that probably isn ‟ t important for the

general point .

VSM 16

And, in fact , you can see that the rat io of the noncentral i ty

parameters under the two models with due al lowance for some l i t t le cheat ing

on notat ion depends on this factor . It depends . Well , f i rs t of al l , you can see

that i f al l these numbers are bigger than 1 in this fract ion, then, lo and

behold, the fract ion wil l b e bigger than 1.

The crucial issue is i f there is heterogenei ty in the t reatment

effects across blocks, the f ixed effects analysis is going to have, is going to

have a larger noncentral i ty parameter . It ‟s going to be more powerful .

This omega parameter here is going to crop up again . It ‟s going to

turn out that when we introduce blocking factors , and part icularly i f those

blocking factors are not considered f ixed effects , then the amount by which

the t reatment effect varies across blocks is going to enter into the sensi t ivi ty

of the analysis .

But , remember, that shouldn ‟ t surprise you . If you think again

about what this uncondi t ional analysis is in this design, i f you think about

computing block specif ic t reatment effects and then doing, asking whether th e

average of the block specif ic t reatment effects is di fferent than zero, the sort

of intui t ion is , wel l , how am I going to do that?

Well , I‟m going to take the average of the t reatment , the block

specif ic t reatment effects , and I ‟m going to compare i t to the s tandard

deviat ion of the block t reatment , at the block specif ic t reatment effects , and

so i f there ‟s heterogenei ty there, i t ‟s going to be harder to detect effects .

You know, the detai ls of the symbols don ‟ t mat ter , but this is sort

of a general idea here that I think is accessible . I hope i t ‟s accessible.

VSM 17

Now, remember, the quest ion that I was asked, essent ial ly, can I

kind of ignore the blocking, suggested that the s tat is t ical analysis was going

to be not l inked to the sampling model necessari ly, and so there are three

possible things you could imagine doing for the analysis of your now blocked

design.

One is you could ignore the blocking al l together , and that ‟s what

the impetus of the original quest ion is : can ‟ t I do this and then not pay

at tent ion to i t? You could include the blocks as f ixed effects . You can

include the blocks as random effects .

And the point is which of those is a good thing to do depends on

the kind of inference you ‟re hoping to make . If you ‟re interested in making

uncondi t ional inferences, that is inferences to blocks, let ‟s say school

dis t r icts beyond the ones you observed, then, wel l i t ‟s going to turn out

ignoring blocking is always a bad idea.

But ignoring blocking here is a part icularly bad idea because i t

real ly can inflate the s ignif icance levels in ways that you would f ind

surpris ing. Your .05 s ignif icance level , i f you ignore the blocking and carry

out a tes t at the .05 s ignif icance level , i t ‟s possible that 50 percent of the

t ime, you wil l reject the nul l hypothesi s by chance when the nul l hypothesis

is t rue.

So the actual s ignif icance level might be 50 percent or even

higher in plausible ci rcumstances . So i t ‟s a real ly terr i f ical ly bad idea to

ignore blocking in most cases .

Now, i f you ‟re interested in making uncondi t ional inferences,

VSM 18

including the blocks as f ixed effects and doing that analysis is also a bad

idea, and i t also leads you to ant i -conservat ive decis ions about hypothesis

tes ts for t reatment effects .

Including the blocks as random effects , you know, i s the r ight

thing, but i t ‟s an analysis that has less power than the condi t ional analysis .

Now, that doesn ‟ t mean i t ‟s wrong; i t ‟s an analysis that ‟s making a different

inference.

You ought to expect the analysis that is sui ted to the

uncondi t ional inference to be less sensi t ive than the analysis sui ted to the

condi t ional inference . Well , why is that? Because i t ‟s real . It ‟s a lot harder

to f igure out whether there ‟s a t reatment effect in a whole populat ion of

blocks, most of which you haven ‟ t observed, than i t i s to f igure out i f there ‟s

a t reatment effect in a specif ic set of blocks which you have happened to

observed al l of .

Okay. If you want to make condi t ional inferences in this case,

that is you real ly only care about inferr ing about t reatment effec ts in the set

of school dis t r icts you ‟ve observed, then i t ‟s s t i l l a bad idea to ignore

blocking because that actual ly has the reverse effect . It actual ly can lead to

deflat ing s ignif icance levels . In other words, you may have huge downward

impacts on the power.

You can include blocks as f ixed effects , which is the r ight thing

to do. You can include blocks as random effects , but this may also have the

effect of reducing power and making i t very hard to detect the effects when

they‟re there.

VSM 19

So the not ion that i t ‟s always the r ight thing to do, i f in doubt ,

assume that the blocks are random effects , i s not necessari ly a good pol icy.

Now, i f we move on to the designs— I‟m going to move on to the

cluster randomized design, the hierarchical design, and point o ut that the kind

of approach that I just out l ined can be fol lowed there, too . You have to think

about blocking in the cluster randomized design as adding another factor to

the design . The inference model is going to determine what the most

appropriate analysis is , and you can reason that ei ther way.

You can ei ther reason from analysis backwards inference model

or the other way around, but one important thing is that int roducing a

relat ively smal l number of blocks here, as wel l as in the completely

randomized design, may leave you with very reduced uncertainty to a larger

universe of blocks based on a s tat is t ical inference kind of , you know,

sampling theory general izat ion.

So what do I mean by i t al l works the same way for cluster

randomized designs? Well , i f the original design was that we have clusters

ass igned to t reatments , I t r ied to depict here a case in which we have, you

know, clusters l is ted across the top and some of them are assigned to

t reatment and some of them are assigned to control .

But because clusters are ass igned to one and only one t reatment—

you know, no cluster is ass igned to both t reatment and control—when we

introduce a blocking factor , what happens is , at least ideal ly, we get a set of

clusters within each block that are ass igned to t reatment , and another set of

clusters within that block that are ass igned to control , and so the dashes there

VSM 20

are intended to indicate that those are, you know, those are non -assignments .

In a sense, the f i rs t m clusters in the f i rs t block are ass igned to t reatment , and

the second m clusters in the f i rs t block are ass igned to control , et cetera.

So this is a knowable design . It ‟s a part ial ly hierarchical design .

In this design, we see that t reatments are crossed with blocks, but clusters are

nested within block t reatment combinat ions, and this has at least one

immediate implicat ion for , you know, what the design can reveal to us , and

that is that s ince each block has both t reatments , we can actual ly es t imate

block specif ic t reatment effects here.

We can ‟ t es t imate cluster specif ic t reatment effects , but we can

est imate block specif ic t reatment effects .

So we might imagine applying this design in a case where the

blocks are school dis t r icts and clusters are schools , and we have some schools

in each dis t r ic t ass igned to each t reatment .

Okay. Well , the usual part i t ioning, the original part i t ioning, i f we

want to ask how this impacts the analysis , the original part i t ioning would just

subdivide the total variat ion into part due to t reatment , part due to cluste rs ,

and part due to within -cluster t reatment combinat ion, wi thin, wel l , wi thin

clusters that are within t reatments .

And there ‟s a sort of s tandard analysis which would tel l us that

the r ight tes t s tat is t ic compares the t reatment variat ion to the variat ion due to

clusters .

Now, when we introduce this blocking factor , we ‟ve sort of

int roduced the possibi l i ty of a bigger part i t ioning of a total variat ion in the

VSM 21

experiment into part that ‟s due to t reatment , part that ‟s due to blocks, part

that ‟s due to the block t reatment interact ion . Remember, this is a case in

which each block has both t reatment and control , so we can est imate a

t reatment effect in each block, and therefore, there ‟s the possibi l i ty of

heterogenei ty of t reatment effects across blocks.

Then we also have a part that ‟s variat ion due to variat ion between

the clusters within block t reatment combinat ions, and then we have a part

that ‟s due to within clusters , and i f we ask, wel l , how should we evaluate the

variat ion due to t reatment , how should we, what should we compare the s ignal

to , wel l , i t ‟s sort of not immediately obvious, and i t depends on the inference

model .

If we think of the blocks as being f ixed, that is we ‟re interested

in a condi t ional inference about the blocks—now in this case, I‟m thinking

that we ‟re imagining that we don ‟ t have al l— that ei ther we don ‟ t have or we

don ‟ t wish to think of having al l the clusters , al l the schools within a dis t r ict

in our experiment .

But we are, but we might be interested in ei ther inferr ing to just

the set of dis t r icts in the experiment or some bigger set of dis t r icts .

Well , i f we ‟re interested in the condi t ional inference, that is an

inference that ‟s formal ly to the dis t r icts we happen to observe in our

experiment , then we get , you know, one tes t s ta t is t ic . We compare the

t reatment versus the variat ion across clusters within block t reatment

combinat ions.

If we ‟re interested in inferr ing to a larger col lect ion of blocks

VSM 22

than the ones we ‟ve observed, to a larger universe, an uncondi t ional

inference, then, the appropriate tes t—well , there actual ly turns out there isn ‟ t

an exact appropriate tes t and analysis of variance, but there ‟s an approximate

one, and that ‟s s l ight ly conservat ive, and that one would compare the

t reatment variat ion to the variat ion ac ross blocks.

Now, the key point here is that the exact ly appropriate thing to do

isn ‟ t the same in these two cases . And as you ‟d expect when we ‟re making a

condi t ional inference, there ‟s a lot more degrees of freedom for es t imat ing

the uncertainty of the t reatment effect , the more degrees of freedom in the

error term, than there is under our uncondi t ional inference . And so that would

lead you to say, okay, the condi t ional inference is probably going to be more

powerful than the uncondi t ional inference, and that shouldn ‟ t surprise us ,

again, for the same reason i t wasn ‟ t surpris ing before.

The thing that ‟s maybe a l i t t le less obvious is— than just the

degrees of freedom difference— i s there ‟s a difference in the noncentral i ty

parameters , a difference in the a verage s ize of the tes t s tat is t ic . And in

general , the average s ize of the tes t s tat is t ic is going to be bigger when

there‟s a t reatment effect under the condi t ional model than under the

uncondi t ional model ; and exact ly how much bigger i t ‟s going to be depends

on how heterogeneous the t reatment effects are across blocks and exact ly how

heterogeneous the clusters are within blocks.

And so this term actual ly, this omega rho B term sort of is a term

that describes the amount of heterogenei ty across blocks in the t reatment

effect , and row C here is the interclass correlat ion which describes—across

VSM 23

clusters—which describes the amount of variat ion that there is across

clusters .

Now, as before, there are different possible analyses . We could

ignore the blocking a l l together . We can include the blocks as f ixed effects .

We could include the blocks as random effects , and which is the r ight thing to

do depends on how we want to think about the inference model we want to

make, whether we want to make a condi t ional infe rence or an uncondi t ional

inference.

As before, ignoring blocks is a bad idea. It inf lates s ignif icance

levels of the tes t , and i t ‟s a big effect . Including the blocks as f ixed effects is

a bad idea i f we want to make an uncondi t ional inference because i t wi l l also

lead us to erroneous s ignif icance levels . And including blocks as random

effects is sort of the r ight thing to do i f we want to make uncondi t ional

inferences.

And we have a paral lel s i tuat ion that i f we want to make

condi t ional inferences to th e blocks in the experiment , ignoring blocking is

s t i l l a bad idea . Including blocks as f ixed effects is the r ight thing to do . In

this case, s t rangely enough, including blocks as random effects doesn ‟ t have,

i t doesn ‟ t have a big effect on the s ignif icance tes t , which is not something

you would have expected but turns out to be t rue.

Now, I should say something about these sort of general pieces of

advice here . When I s tar ted this talk, I thought i t would be a good idea to

give you some formulas for these things, and then I real ized that they aren ‟ t

easy to interpret . I wi l l say that this advice that I ‟ve given here about the

VSM 24

completely randomized design is predicated on typical values of some of the

parameters that affect things.

If you block on something that actual ly has absolutely no effect ,

i t turns out that you can ignore the blocks and i t won ‟ t hurt you . But you ‟re

probably not going to be blocking on things— I mean I hope you aren ‟ t

choosing things that have absolutely no effect to block on . You know, i t ‟s not

a good idea .

Okay. Now, one can do the same kind of analysis for a mult i -

center , for s tudies that s tar t out being mult i -center , randomized block s tudies .

And essent ial ly what winds up happening is that you, as before, int roduce a

new factor , a new blocking factor , and the way to sort out what goes on in the

analysis is to just , you know, sort of do the s tandard kind of procedure to sort

through what the appropriate tes t s tat is t ics are in the design that you ‟ve

created.

Now, i t ‟s interest ing that both in the case of the hierarchical

design and in the design that ‟s original ly hierarchical , the cluster randomized

design, and design that ‟s original ly randomized blocks s tudy, you get

di fferent variance of part ial ly hierarchical experimental designs . They are

known designs.

We know how to analyze them and al l this s tuff , but they aren ‟ t

exact ly the same, and, you know, I have material on this , but I don ‟ t think I‟m

going to go through the detai ls of what happens in the mult i -center design

when you add extra blocks because, in a way, the sort of message that came

from the f i rs t two cases we ‟ve talked about probably you ‟ve got ten.

VSM 25

Okay. I got to f igure out whether I want to make condi t ional

inferences or uncondi t ional inferences, and then I got to sort out the r ight

thing to do . Make a s tat is t ical analysis that corresponds to that .

So what I‟d actual ly l ike to do at this point , because this could

actual ly, this could actual ly lead to our having a l i t t le bi t of t ime for a

discussion, and i f I‟m left to my own devices and I go through al l of this ,

there wil l be no t ime.

So I‟m going to speed through this , and get to “another easy

quest ion.”

[Laughter . ]

MR. HEDGES: And this is a quest ion that comes up al l the t ime .

There was some at t r i t ion from my stud y after ass ignment . Does that cause a

serious problem? And, you know, this is another s imple quest ion, and the

answer is far from simple.

I have an answer, and I ‟m going to frame i t in terms of

experimental design, but before I do, let me just point out that there ‟s a

s imple quest ion that has an easy answer, a different s imple quest ion that has a

real ly s imple answer.

And that ‟s „does at t r i t ion cause a problem in principle? ‟ And the

answer to that is „yes ‟ . You know, randomized experiments with at t r i t io n no

longer give model free, unbiased est imates of the causal effect of t reatment .

Remember, the reason we love randomized experiments is because

they in principle give us model free es t imates of t reatment effects , causal

effects of t reatments . And we lose that i f there ‟s at t r i t ion.

VSM 26

Whether the bias is serious or not depends on the model that

generates the missing data . That ‟s an old refrain I‟m sure you ‟ve heard

before. But before we, before we go too far with this , let me just give you one

way of thinking, thinking about post -assignment at t r i t ion in terms of concepts

that you ‟re used to from thinking about experiments .

If we have a t reatment and a control group, we can think about

missing this as l ike introducing another factor . We have the observed dat a for

the people in the t reatment group and the control group, and then we have the

missing data for the people in the t reatment and the control group.

Now, by just making this picture, you can see that there ‟s a

problem in est imat ing the t reatment effect from only the observed part of the

design. The observed t reatment effect is only part of the total t reatment

effect . You could think in—another term sometimes used in experimental

design is that the observed data al lows you to es t imate the s imple t reatmen t

effect on the observed, but doesn ‟ t a l low you to es t imate the t reatment effect

on the missing, and the main effect of t reatment is a combinat ion of those

two.

But this is t r ickier . You know, there are things you can learn by

just thinking about the alge bra of this that are useful . Suppose, so now what

I‟m going to do now is talk about algebra, not talk about anything that has to

do with s tat is t ical inference or anything deep.

This is , the rest of this bi t i s just about , just about algebra . If we

think of our design, and now we ‟re taking the God ‟s-eye view of our design .

We‟re assuming we can see the populat ion means, not only for the observed

VSM 27

people, but for the unobserved people, and this helps us kind of understand

what ‟s going on.

If mu TO is the mean in the t reatment group for the observed

individuals , and mu CO is the mean for the control group of the observed

individuals , and we have mu TM is the mean of the missing folks in the

t reatment group, then mu CM is the mean for the control folks who are

missing, and again you don ‟ t have access to these quant i t ies . You don ‟ t have

access to any of these quant i t ies in a real experiment .

But for the purposes of understanding what post -assignment

at t r i t ion might do, we can imagine we had access to these quant i t ies . We

could imagine ourselves in the role of the dei ty in actual ly understanding

what ‟s going on beneath the sampling error .

Now, I‟ l l int roduce another quant i ty which is obviously

important . When that ‟s the proport ion of the total number in each grou p that

are observed or missing, so pi T is the proport ion of the t reatment group

that ‟s observed, and pi bar -T is one minus pi T . So, in other words, i t ‟s the

proport ion of the t reatment group that ‟s missing, and the same thing for the

control group . So pi C is the proport ion of the control group that ‟s observed,

and pi bar-C is the proport ion of the control group that ‟s missing.

It ‟s clear that al l together this is the informat ion you would need

to sort out the t reatment group, the actual t reatment effect on the total group

that was randomized, and a l i t t le bi t of algebra tel ls you that , wel l , pi T t imes

mu TO plus pi bar -T t imes mu TM, that ‟s just the mean of al l the people in

the t reatment group, both the observed and the unobserved.

VSM 28

And pi C mu CO, plus pi bar-C, mu CM is just the mean of al l of

the people who are in the control group . That is al l of the people who are

randomized, the missing and observed.

And so the difference between those two is just the t reatment

effect on al l individuals randomize d, and when the proport ion of dropouts is

equal , this s impli f ies further ; so that i f the proport ion missing in both

t reatment and control group is the same, and I ‟ l l cal l that pi , then we can

wri te the t reatment , the t reatment effect on everybody randomize d, in terms

of the t reatment effect amongst the observed and the t reatment effect amongst

the missing in just the form of the las t s l ide there.

In other words, we could wri te , i f we cal led del ta the t reatment

effect on al l the individuals , then del ta is j ust a l inear combinat ion of the

t reatment effect among the observed folks and the t reatment effect upon the

missing folks . And these two pieces are weighed proport ionately, you know,

their proport ion pi of the folks that are observed.

So the t reatment eff ect among the observed gets weighed pi , and

pi bar of the folks are unobserved , so there the t reatment effect among the

missing gets the weight pi bar .

Now, this immediately tel ls us that we ‟re not going to be able to

do any inference on del ta without mak ing some assumptions or something that

sneaks into the model , l ike assumptions, that constrains the possible values of

del ta m.

So the reason I said there was a s imple answer to the quest ion of

whether or not at t r i t ion was a problem in principle is that t he value of the

VSM 29

t reatment effect on the individuals randomized could be anything i f there ‟s

any missing data, and we have no knowledge of del ta m.

So let me just sort of say that again because i t ‟s an obvious but

profound point . If we think of our outcome variable as having an infini te

range or a pract ical ly infini te range, then I can make del ta anything by

varying del ta m by enough, i rrespect ive of what del ta observed is .

So, in principle, at t r i t ion is an enormous problem that de -

ident i f ies our experimen tal es t imates .

Now one way you can—now, of course, there ‟s a big, you know,

caveat in what I just said, which was that i f del ta m has an infini te range,

then dot -dot-dot , but we al l know that the variables we measure usual ly don ‟ t

have an infini te range . You know, we have tes t scores that are bounded in

some ways . We might even have some bounds we ‟d be wil l ing to set on the

t reatment effect among the missing based on plausibi l i ty arguments , which

would al low us to bound the total t reatment effect based on knowing the pi ‟s .

So one way that people have approached the problem of deal ing

with at t r i t ion is to think about arr iving at bounds on the things we don ‟ t

observe— this is sort of the Chuck Manski approach, you know, f igure out

bounds on the things you don ‟ t observe—and i f you can specify bounds, then

you can specify bounds on the thing you want to es t imate.

And so the key point is no est imate of the t reatment effect is

possible without an est imate of the t reatment effect among the missing in this

model , and you know, i t ‟s possible that we can improve by assuming a range

of plausible values . There are various ways to do this . One of the ways to do

VSM 30

i t i s to take absolute bounds . The tes t scores go from zero to a hundred, then

zero is the smal lest they can be , and a hundred is the biggest they can be, and

you could—well , I won ‟ t go on yet—and you can set up bounds in that way.

You might say there ‟s a no qual i tat ive interact ion hypothesis ,

which is i f the t reatment effect is posi t ive for somebody, i t can ‟ t be negat ive

for anybody, and that sets up the possibi l i ty that i f del ta o is posi t ive, then

del ta m can ‟ t be any smal ler than zero.

It ‟s an assumption, not data . It ‟s not guaranteed, but i t ‟s an

assumption that could be made and some individuals f ind that a very plausible

assumption in , say, medical s tudies al though you can easi ly think of examples

in which i t probably isn ‟ t t rue.

But I want to go on for a minute because I think there ‟s another ,

again, s imple point that is not , I think, completely understood by al l

scient is ts , and that is that when the at t r i t ion rate is not the same in the

t reatment and control groups, the analysis gets t r ickier . And one idea that

people have used occasional ly is to t ry to convince themselves about the

t reatment effect on those who drop out , those who are missing or unobserved,

and they want to come up with bounds on that t reatment effect .

And they say, wel l , you know, i f I can be pret ty sure about what

the t reatment effect would have been on the missing folks , then that wi l l

convince me that the at t r i t ion hasn ‟ t spoi led the analysis .

Now how could you do that? Well , i t ‟s maybe not so impossible

to do. In a longi tudinal s tudy, for example, you may have mult iple waves of

measurements . And so you might use the t reatment effec t , the t reatment effect

VSM 31

measure on the las t wave of observat ions you have, as kind of a proxy for the

t reatment effect later .

And/or you might have something you know to be s t rongly

correlated with the outcome . It ‟s not the outcome, but you bel ieve you ca n

predict the outcome from i t , and i f you predict the outcome from i t , then you

can get some kind of purchase on what the t reatment effect upon the missing

might be.

So i f you got a s i tuat ion l ike the one that I ‟ve depicted here, you

might be fai r ly sangu ine. So you look at this . I think this— i t ‟s made-up data,

but i t ‟s the kind of data you might— i t ‟s not too crazy. We have a t reatment

effect among the observed.

We have an est imated t reatment effect or a putat ive t reatment

effort , or maybe the dei ty whisp ered in our ear , and we know that this is the

t reatment effect among the missing . We see that the score, the average scores

among the missing are smal ler than the average scores among the observed,

and that sort of makes sense, you know.

The people who can ‟ t make i t to the post - tes t are sometimes not

able to make i t to lots of other things, and that may interfere with their

overal l achievement . But when we look at the t reatment effect among the

missing, i t ‟s just about the same— i t ‟s exact ly the same as the t reatment effect

among the observed.

So how many of you are sanguine about the fact that the t reatment

effect among the total group of individuals randomized is going to turn out to

be posi t ive and maybe the same value as we ‟ve seen here?

VSM 32

I won ‟ t ask you—you could raise your hands, but I won ‟ t ask you

to raise your hands—but I wi l l ask you to make a mental commitment to that .

Does i t seem plausible to you that the t reatment effect among everybody

randomized is , yeah, going to be posi t ive in 23 or somethi ng l ike that?

Pause. Do a l i t t le wai t t ime . Let you think about that . Everybody

convinced themselves they have an answer? Al l r ight . Well , i t ‟s not , you

know, obviously I set you up for this . I shouldn ‟ t have set you up for i t . I

should have al lowed you to say, to think what you might have thought .

But now I‟m going to show you that just because the t reatment

effect among the missing is the same as the t reatment effect among the

observed doesn ‟ t mean that ei ther of those things is the t reatment effect

among the ent i re group randomized.

Here‟s the rest of the data . Here are the numbers , and pay

at tent ion to the total on the far r ight . I rounded those to zero decimal places ,

but what you can see is that the t reatment effect is posi t ive and the same,

exact ly the same, for both the observed and the missing . And the t reatment

effect is exact ly negat ive that for the ent i re group pooled.

Now, this may be a l i t t le surpris ing . It ‟s not surpris ing probably

i f you remember Simpson ‟s paradox because Simpson ‟s paradox, remember,

usual ly isn ‟ t—we usual ly don ‟ t think about in terms of cont inuous data . We

usual ly think about i t in terms of discrete data, but this is just a s imple

example of Simpson ‟s paradox.

What ‟s going on? You know how can this be? The f i rs t t ime

people see Simpson ‟s paradox, the quest ion is always how can this be, and the

VSM 33

answer to how this can be is that you see that most of the folks we observed

in the t reatment group, wel l , most of the folks in the t reatment group didn ‟ t

get observed, and the unobserved folks had lower scores .

Most of the folks in the control group did get observed, and most

of the folks who were observed had higher scores , and even though there was

this apparent perfect agreement between the observed and the unobserved in

terms of t reatment effect , the maldis t r ibut ion of people in terms of observed

and missing made i t possible—and the difference between observed and

missing on the average sort of made i t possible for the average t reatment

effect among— the average score among t he t reated to be much lower than the

average score among the controls .

This is just another example of , when I say Simpson ‟s paradox,

Simpson ‟s paradox is why the death rate can be going up even though i t ‟s

going down in every age category. How can that be? The populat ion is get t ing

older or the fact that the graduate school can be admit t ing more women, more

of any category, in each department , but overal l , have fewer of that group

being admit ted.

So the point is i f you have unequal at t r i t ion rates , i t ‟s not enough

just to know what the t reatment effect would have been in the missing guys .

You got to know something about what the t reatment effect—you got to know

the individual means.

Now, obviously, you can t rade off informat ion in coming up with

clever bounds, and there ‟s a whole l i terature on that s tuff . One possibi l i ty for

helping you understand the potent ial effects of at t r i t ion is to sort of put

VSM 34

bounds on the t reatment group means in some way, lower and upper bounds.

And i f you do that , you can kind of reason through that i f you put

in the smal l—you put in the lower bound for the missing folks in the

t reatment group and the upper bound for the missing folks in the control

group, you can come up with an absolute lower bound for the possible

t reatment effect .

And i f you put in the upper bound for the missing t reatment folks

and a lower bound for the missing control folks , you can come up with an

upper bound for the possible t reatment effect on the ent i re populat ion.

And in general there are many permut at ions of work l ike this . But

I‟ l l jus t point out that nothing that I said in this l i t t le sect ion of the talk

involves sampling or es t imat ion error . It ‟s al l just algebra, and things get a

l i t t le bi t more complex i f we s tar t taking sampling into account , b ut real ly the

biggest sort of s tumbling blocks are the ones that I t r ied to ident i fy here.

And now I‟m at the s tage where I‟m going to s tar t wrapping up so

there wil l be a l i t t le t ime i f people want to ask quest ions so I don ‟ t go for the

whole t ime.

There are a lot of seemingly s imple quest ions that ar ise in

connect ion with f ield experiments , and I t r ied to cover one category of them

and one other category to some degree.

And the main conclusion that I have about this is that the answers

to these quest ions often are much t r ickier than they seem to be . They require

complex thinking about fundamental aspects of what the research design is ;

what you ‟re t rying to infer from that design, and connected to i t , what your

VSM 35

vision of the sampling model is , even i f i t i sn ‟ t a formal sampling model in

the sense that a survey sampler would recognize i t .

And i t depends cri t ical ly also on assumptions about missing data

and how seriously you take that . Now, i f there ‟s only a t iny amount of

missing data, obviously those pr oblems aren ‟ t so important . But i f a thi rd of

your observat ions are missing, a thi rd of the people disappear, then they are

not to be ignored.

And I think the crucial point that I ‟d l ike to leave you with is that

often these s imple quest ions don ‟ t have, you know, don ‟ t have s imple

answers . They require a lot of complex thinking, and when you go to a

methodological advisor , and they give you the answer that my fr iend Henry

ment ioned as the best answer to quest ions l ike this , “ i t a l l depends,” they‟re

not just t rying to be uncooperat ive .

[Laughter . ]

MR. HEDGES: Or i t i sn ‟ t that they don ‟ t know. They may not

know, but , you know, a fai r answer is „ i t a l l depends ‟ ; even then . I may know

that much. So I think that the general sort of entreaty here is to be ge nt le with

those of us who say I‟ve got to know more . You need to explain more to me

about what you ‟re doing and more to me about where you got your sample and

what you think i t means and what the design is because, for me, answering

some s imple quest ions, these and others , i t ‟s hard to do without that

addi t ional detai l and informat ion.

I think I wi l l s top, uncharacteris t ical ly, before the end of the

t ime, and maybe there wil l be a possibi l i ty for discussion . I know there are

VSM 36

several people in this room who know at least as much as I do about al l these

things, and i f I don ‟ t know the answer or i f I know the answer, but i t ‟s not

r ight , one of them wil l help me out .

So I‟ l l s top.

[Applause.]

MR. HEDGES: So are there quest ions or comments that people

would l ike to make—maybe another s imple quest ion? That would be enough

to f i l l up the t ime remaining, I ‟m sure.

I‟m supposed to tel l people to go to the microphone, and Henry

has got f i rs t mover advantage.

MR. BRAUN: Thanks very much, Larry.

In t rying to analyze observat ional s tudies , people often use

select ion models , sort of two -stage models , to t ry to capture the effects of

select ion in order to get unbiased t reatment effects .

MR. HEDGES: Yeah.

MR. BRAUN: Do you know if people have t r ied to do s imila r

things in the context of randomized experiments with non -random at t r i t ion to

t ry to model the at t r i t ion in some way, maybe using covariates and so on, and

have they been successful? Is there any way to tel l?

MR. HEDGES: I‟m glad i t was an easy quest io n. You know, there

are different views about that . My col league Tom Cook, for example, would

argue that not so much of Heckman s tyle select ion models , but various kinds

of matching models , you know, propensi ty score models and even just plain

old garden-variety analysis of covariance can do a good job.

VSM 37

But I think the problem is that any demonstrat ion that they can do

a good job, and this is my— I know some of those folks are in this room so

they may choose to get up and disagree and explain why I ‟m ful l of i t .

Demonstrat ions that they can do a good job in some circumstance always

seemed to me to be— I have no idea how to general ize to another s i tuat ion,

and so I f ind them total ly unsat isfying.

There ‟s another point of view which is I know my former

col league J im Heckman, when he was presented with some s tudy—well ,

Robert LaLonde and Rebecca Maynard ‟s work that did some select ion model

analyses of pret ty good randomized experiments , and they got different

answers . And J im ‟s response to that was, see, I told you experiments were

unrel iable.

[Laughter . ]

MR. HEDGES: So i t becomes, I think, so I ‟m not persuaded .

What I‟m persuaded of is that there are cases in which they can do a good job .

What I can ‟ t tel l i s which cases they are, and that ‟s the rub . I mean the reason

I l ike experiments is because—and I think i t ‟s probably the reason al l of us

l ike experiments—you don ‟ t have to be very smart to do a good job with

experiments . You don ‟ t have to know anything i f you can keep the experiment

intact . As soon as i t f al ls apart , then you have to know stuff to make

inferences . That ‟s much harder .

Anybody from—Jack, okay. You ‟re supposed to talk into the

microphone. See, I remembered.

DR. RUBY: Name.

VSM 38

MR. MOSTOW: Oh, sorry. Jack Mostow, Carnegie Mellon .

When you are designing an experiment where you have mult iple

data points for individual , ei ther s ingle subject randomized within subject or

randomized between subjects , i s i t exact ly analogous to everything you said

about clustering at the classroom level or do the r ules change? And i f so,

how?

MR. HEDGES: It ‟s almost exact ly analogous . The only twist is—

I mean this is actual ly one of the great things that Steve Raudenbush and

Tony Bryk kind of contr ibuted to our understanding of the world . I mean they

didn ‟ t exact ly invent this s tuff , but they actual ly were the f i rs t— they actual ly

told us about i t , and you know, that ‟s probably more important in some ways,

that mult iple observat ions within individuals are kind of a nest ing s t ructure

that ‟s very much l ike individual s within clusters .

And so much of what ‟s t rue about , in fact , most of what ‟s t rue,

about clustering within schools or classrooms, is also t rue about clustering of

observat ions within individuals . There ‟s one sort of different point , which is

that the correlat ion s t ructure among observat ions within individuals can be a

lot more complex than the correlat ion s t ructure within clusters of individuals ,

at least the correlat ion s t ructures we choose to model within clusters of

individuals .

By that , I mean i f you have mult iple measures over t ime, i t ‟s

highly plausible—well , let me back up and say the problem with, the problem

that cluster—cluster randomized t r ials present us with is that individuals

within clusters are correlated . They‟re correlated because i f ther e‟s an effect

VSM 39

of the cluster , everybody in the cluster has the same effect . They share i t so

their data is correlated.

But usual ly we take that correlat ion to be the same— that

correlat ion among people to be the same . Now, when you model data across

t ime within individuals , for example, i t ‟s possible that the correlat ion

between measurements at any two t ime points is the same, but i t ‟s certainly

plausible that measures that are closer together correlate more highly than

measures that are further apart .

And the exact s t ructure of that , and i t turns out that the s t ructure

of that correlat ion mat ters , and so that ‟s— the fact that the correlat ion

s t ructure is potent ial ly more complicated is the principal way in which things

differ .

And one could tear one ‟s hair out about those things . That ‟s , of

course, what I‟ve done, and—

[Laughter . ]

MR. HEDGES: — i f I hadn ‟ t encountered those problems, I ‟d

have a ful l head of hair .

[Laughter . ]

MR. HEDGES: So that ‟s what I have to say about that . And

again, I would encourage my methodological ly-incl ined col leagues I‟m not

the only source of knowledge in the room so—

MR. ALEXANDER: I‟ve got a s imple one for you.

MR. HEDGES: Uh-oh.

[Laughter . ]

VSM 40

MR. ALEXANDER: Karl Alexander, Hopkins.

Indeed, a s imple one . So I thought this was t remendously

informat ive, very useful , even for someone who doesn ‟ t l ive and breathe this

framework. So I am wondering i f there ‟s a way that we could get your

complete l is t of s imple quest ions.

[Laughter . ]

MR. ALEXANDER: Can we wri te to you and ask for i t? And

your notes perhaps to walk us through some of the answers to these s imple

quest ions.

MR. HEDGES: Oh . Okay. Well , I could work on that . I don ‟ t

know that I‟ l l ever have a complete l is t .

MR. ALEXANDER: I don ‟ t mean complete, but a mo re complete.

MR. HEDGES: More complete . Yeah, I could probably, I could

probably furnish a more complete l is t of s imple quest ions with a part ial ly

complete l is t of answers , and I ‟d be interested to do that .

Actual ly, I‟d also be interested in hearing f rom other people what

their s imple quest ions are, and actual ly, Karl , thank you for that , because i t ‟s ,

you know, anybody who has any of these quest ions is not the only person who

has i t , and I assume that I haven ‟ t been asked al l possible s imple quest ions .

So, but amongst al l of us , we could generate a good l is t of quest ions.

And in some cases , i t requires a l i t t le bi t of work to f igure out , to

think through the quest ions, and i t ‟s actual ly valuable . I f ind i t valuable, but

that ‟s the kind of s tuff I do.

Yeah.

VSM 41

MR. BORMAN: Hi . Geoffrey Borman, Universi ty of Wisconsin -

Madison.

Maybe I‟ l l have a s imple quest ion . One issue that comes up a lot

related to your discussion about blocking—maybe not a lot , but i t ‟s happened

to me more than once— i s this issue when you ‟re involved in a recrui t ing—

schools , for instance—a group of schools may come forward, or you may

act ively go out and f ind a cluster of schools that are wil l ing to part icipate in

your experiment , and those schools maybe don ‟ t necessari ly const i tute a real

cluster in the way that we think about them, in that they ‟re not real ly related

to one another . They may be, i f i t ‟s a nat ional s tudy from al l over the country,

and real ly not have much in common at al l .

What they do have in common, though, is that you were able to

round up this group of schools at one t ime, and you know they ‟re eager to

know what their s tatus is in your experiment , i f they ‟re t reatment or control ,

and perhaps your implementat ion schedule depends on kind of this rol l ing

cycle of randomizat ion.

I‟m just wondering how would you think about that general ly and

analyt ical ly where, you know, these clusters are more just bui l t out of

convenience rather than being associated analyt ical ly with the outcome

measure , and how would you think abo ut that?

MR. HEDGES: Well , I can tel l you i f i t turns out they ‟re not

associated with the outcome measure, you ‟re home free because they‟re not

real clusters . I mean they‟re not clusters in the s tat is t ical sense.

The fact that they arise over t ime is in a way not so much of a

VSM 42

problem, at least in my view. And I think this rol l ing recrui tment is real ly

more the rule than the except ion in a lot of s tudies . I mean some people

manage to recrui t al l at once, but i t ‟s something that is— I guess the crucial

thing there is to sort of ask yourself the quest ion— I would go back to kind of

the inference model quest ion:

You know am I happy to just infer to this group—and this group

of sort of temporal clusters , i f you wil l—and I might very wel l be—and t reat

these things as i f they‟re just , you know, they‟re f ixed effects i f they have

any effects and not worry about the heterogenei ty of t reatment effects .

Now, as soon as I say that , I think of one example in which i t

real ly mat tered in the early schools where i t had a much different t reatment

effect than the later schools , and then you were lef t wi th the puzzle about

what to do . If you think in the sort of condi t ional inference sense, you could

talk about the average effect of the early and the late.

I‟m caricaturing your example a l i t t le bi t , but i t does correspond

to one wel l -known example to some of us anyway. You know as soon as

your— there is a sort of a price for , intel lectual price for thinking in terms of

f ixed effects . You know you ‟re talking about the average t reatment effect in

the clusters , in the blocks you happen to choose.

And what i f the blocks have very different t reatment effects?

Well , the convent ional answer is take the average in some way . Maybe you

weight i t ; maybe you just take the unweighted ave rage. And that ‟s “ the

t reatment effect .”

But i f “ the t reatment effect” i s made up of very different

VSM 43

individual effects , then i t ‟s kind of unsat isfying . You know no effect plus

real ly huge effect—no effect in one block, huge effect in another block, the

average is somewhere in between, doesn ‟ t seem l ike a very good descript ion

of the data overal l .

So I think, I don ‟ t know, from my point of view, Geoff , your

quest ion is real ly easy i f there are no block t reatment interact ions in that case

and real ly hard i f there are . And i f there are, then I think they ‟re going to be,

there‟s going to be an essent ial ly contested quest ion about what the r ight

analysis is al though one can describe, one can describe the data and say, look,

the early adopters found a whopping t reatment effect in this case; the late

adopters didn ‟ t f ind much . And there ‟s a real di fference between the two.

And in the absence—and this kind of i l lust rates a weakness of

experiments— i f you don ‟ t have anything else than just the experimental data

to sort out the heterogenei ty of those effects , then i t ‟s real ly hard to interpret

them.

And I think al l of us who do this kind of work, even though we

may be proselyt izers for experiments at one level , are also proselyt izers for

col lect ing enough other kinds of data that you can interpret the experimental

data . And I know I‟m preaching to the choir in you, Geoff , because I know

you do that rout inely, but I think that ‟s probably essent ial .

Ah, now Vivian may very wel l want to take issue with what I said

about , in answer to Henry‟s quest ion because we know she works in this f ield .

So—

MS. WONG: Actual ly I was going to int roduce, hopeful ly, maybe

VSM 44

another s imple quest ion.

MR. HEDGES: Uh-oh.

MS. WONG: I was wondering— let ‟s say in the case of a s imple

random assignment s tudy, do you have comments or thoughts about doing the

analysis with a regression -adjusted resul ts or sort of s imple t reatment minus

control di fference?

I guess in some cases where I see that there ‟s no obvious reason

that randomizat ion didn ‟ t work, that a lot of t imes people s t i l l present sort of

these covariate adjusted resul ts for experiments , and to me that seems l ike i t

could introduce more bias , but—

MR. HEDGES: Aah . Well , you know, there is work on that .

There is Freedman ‟s work on how adjust ing for covariates can introduce bias

in the randomized experiment . But i t goes away asymptot ical ly, and i t ‟s not

very big. I guess one— I guess I would say I‟m sort of more comfortable with

centered analyses so that you ‟re not adjust ing the t reatmen t effect , but you ‟re

just adjust ing variat ion within t reatment groups.

But I think i f al l you ‟re doing is just increasing precis ion, I ‟m not

unhappy with that , Freedman aside.

If the resul t i s sensi t ive to that , then I ‟m a l i t t le more worried .

Not just , you know, that p goes from one s ide of 05 to the other , but you

actual ly get something that looks l ike a qual i tat ively different answer, then

I‟d be very worried about , but I wouldn ‟ t expect to get that in a randomized

experiment .

So I do think i t ‟s—so I‟m general ly pret ty comfortable with

VSM 45

covariate adjustments because in the real world we need them to get enough

power to do experiments with feasible numbers of schools . So I think i f we

decide we ‟re not going to do that , then we ‟re hoping IES gets a much b igger

budget .

[Laughter . ]

MR. HEDGES: And gives a lot of i t to us . Okay . Go ahead,

Carol .

MS. O ‟DONNELL: I won ‟ t say who I am in case this is a s tupid

quest ion.

[Laughter . ]

MS. O ‟DONNELL: But I think i t might bui ld off of what she just

asked, but how do you know, under what condi t ions do you use a variable as a

blocking variable versus a covariate?

MR. HEDGES: You know, there ‟s an old and a new l i terature on

this , you know. David Cox and Leonard Feldt and folks l ike that wrote about

i t . If you have a real ly high correlat ion, and basical ly i t depends— the answer,

one answer is that i t depends on the form of the relat ionship between the

covariate and the outcome, and i f you ‟re convinced you know i t , and you

know i t ‟s l inear , or you can l inearize i t , i f you can ‟ t do that , then using i t as

a l inear covariate is probably not possible.

If you do know that , then there ‟s s t i l l a quest ion of whether

blocking or analysis of covariance is more eff icient , and i t ‟s a l i t t le— that ‟s a

l i t t le bi t t r icky, but I thin k the general answer is , and I think there ‟s at least

one person in the room who might dispute this , I think the general answer is

VSM 46

that i f you have a real ly high correlat ion, the use of the l inear covariate

probably gives you a s l ight power advantage, and there ‟s a point at which

blocking is compet i t ive.

And so the sort of s imple answer, one answer would be you t ry to

f igure out what the power would be under ei ther ci rcumstance and then pick

the one that has the best power i f you ‟re not worried about the p otent ial bias

int roduced by using covariates .

And i t wasn ‟ t a s tupid quest ion . It ‟s a class ic quest ion, I should

say, because I can think of three or four papers on i t . It becomes t r icky. But

one of the things that I guess I should also ment ion is that i t does—knowing

quest ions about f ixed versus random effects also can, you know, and

condi t ional versus uncondi t ional inferences, what are you inferr ing to can

also enter into this . And I answered that kind of caval ier ly not using my own

rules .

You know if you think of the values you ‟ve observed, the values

of the potent ial blocking variable you ‟ve observed as being sampled in some

sense, then that has implicat ions for the analysis of covariance because the

classic analysis of covariance would argue that thos e things are f ixed, and i f

you got them by sample, and you ‟re serious about that , and i t ‟s one thing to

take a sample and say, okay, this is the universe I care about , i t ‟s another

thing to say I took a sample and I think of i t as al lowing me to help

general ize to a larger universe . So depending on how you think about the

covariate, there may be implicat ions for analysis .

Well , looks l ike I‟ve exhausted you . Oh, no. Jack has got another

VSM 47

quest ion.

MR. MOSTOW: When you have mult iple observat ions per

individual—sorry— I‟m st i l l Jack Mostow— i t hasn ‟ t changed.

[Laughter . ]

MR. MOSTOW: But I might want to .

MR. HEDGES: It would have more interest ing i f i t had changed.

MR. MOSTOW: When you have mult iple observat ions per

individual , you can control for indivi dual differences by actual ly throwing in

ident i ty, s tudent ident i ty as a variable, and then that controls for everything

whether you know what i t was or not .

MR. HEDGES: Yeah.

MR. MOSTOW: But you may hear a giant sucking sound of al l

your power going away, or you can put in the covariates for the things that

you suspect might be important but you might be missing something . How do

you decide which?

MR. HEDGES: Well , now I ‟m get t ing into terr i tory where I know

people wil l disagree . In a sense, making, ident i fying, put t ing in a dummy

variable for each individual is very much in the t radi t ion of making the

individuals f ixed effects .

I mean that would be the language economists would use, and

then the quest ion becomes are you interested in this set of ind ividuals and is

between-individual variabi l i ty to be—do you think—well , the quest ion

becomes is this a sample of individuals , and you want to general ize to a

broader set of individuals or—and you want that to be part of the s tat is t ical

VSM 48

r igmarole—or do you want to think of this set of individuals as being a set of

individuals about whom I can learn things?

And i f you want to think of the individuals as being, as being a

sample of individuals that you hope your resul ts general ize to , and you want

s tat is t ical—you want the sampling theory general izat ion model to be the way

to do i t , then my answer would be you ought to t reat those individuals as

having random effects .

Now many—now I know a s tandard approach to this analysis is to

make them al l f ixed— i s to , in a sense, to make them fixed effects , but you

don ‟ t have a sampling theory warrant any more for saying this should apply to

other people from the universe from whom these people were sampled, and

you have to make the ex tra -s tat is t ical kind of argument , a me chanis t ic kind of

argument , for general izat ion.

Now that may not be ; that may not be too big a price to pay

because you get a lot by creat ing individuals as f ixed effects . You get a lot of

precis ion for es t imat ing what ‟s going on with them, but , of course , you don ‟ t

get something for nothing . You got al l that precis ion by defining away

between-people variabi l i ty as being f ixed.

And i t ‟s the same, and i t ‟s not any different than I got a bunch of

schools and, boy, there ‟s a lot of between-school variabi l i ty; maybe I should

just , you know, make the schools f ixed effects , and at that point , you get a lot

more precis ion, but at the cost is that you ‟ve defined away a lot of the natural

variabi l i ty, and the s tat is t ics don ‟ t a l low you to bring i t back in .

Now, i t may be that i f you ‟re t rying to do something l ike a proof

VSM 49

of concept , you probably want to do— i t may be exact ly the r ight thing to do.

If you want to show that something works someplace under good

condi t ions, then the f ixed effects approach seems l ike a good one to me. If

you want to show that this is going to work i f we scale i t up, that sounds l ike

a bad approach.

I‟m expect ing somebody to disagree with this , wi th that , but

maybe I‟ l l be lucky and we ‟ l l run out of t ime before they get a chance to calm

down enough to say something.

Yes? No? Okay. Well , you know, I‟m here for another three

minutes so you can—

[Laughter . ]

MR. HEDGES: You can ask quest ions i f you l ike . Well , okay.

I‟ l l put in an advert isement then i f we ‟ve got another few minutes . This is a

great— the IES Research Conference is real ly a great gathering, and i t i s

t remendously, wel l , grat i fying to me to see how many good people there are

doing serious educat ion scient i f ic work.

One of the other places where you can f ind lots of people w ho are

interested in doing good scient i f ic work in educat ion is the Society for

Research on Educat ional Effect iveness , a new scient i f ic society that was

formed a couple of years ago, and i t was formed by people l ike you . Well ,

people l ike me in part icular .

[Laughter . ]

MR. HEDGES: And the idea was to have something kind of l ike

the IES Research Conference, at least to have the same kind of populat ion as

VSM 50

the IES Research Conference . It ‟s open to anybody. I mean i t ‟s not IES

specif ic , but , you know, people who are interested in doing r igorous research

on educat ion, part icularly research that t r ies to sort out causal effects .

We‟re not a bunch of folks who just do experiments . There ‟s a lot

of folks who do longi tudinal s tudies , who do qual i tat ive work, who d o

measurement , who do s tat is t ics , but the meet ings—we‟ve had a couple of

nat ional meet ings so far . They‟ve been a lot of fun, for me anyway, and, you

know, I had to do a lot of adminis t rat ive detai ls so i f they were fun for me, I

think they were more fun for other folks .

I would urge you to think about joining the Society for Research

on Educat ional Effect iveness or i f not doing that , at least maybe you want to

come to one of our conferences . The next conference is going to be roughly a

year from now in ea rly March of 2010 . It wi l l be in Washington . You can

expect there to be a few hundred, a smal l few hundred, of individuals that are

in fact a lot l ike you . In fact , some of them are you . But some of you probably

aren ‟ t part of i t .

We have a journal , the Journal for Research on Educat ional

Ef fect iveness , and I think one of the things that has been impressive to me

about the group so far is that when I go to the conference, there are always

more things that I want to go to than there ‟s , you know, copies of m e to go to

them.

And so I f ind that I want to go to the methodological talks , but I

also want to go to a lot of substant ive talks because they ‟re fascinat ing, and I

also f ind that the people that I meet there are interest ing and fun, and I learn

VSM 51

from them, and I would commend i t to al l of you to at least think about .

You can f ind informat ion about us at

www.educat ionaleffect iveness .org, and or you can just sort of Google Society

for Research on Educat ional Effect iveness . And anyway, I think that is

another place where you can not only present your own work, but you can get

a chance to see high qual i ty, uniformly high qual i ty educat ional research in a

scient i f ic mold, and i t ‟s a conference that ‟s smal l enough that you can talk to

people in the hal l , and th at ‟s real ly part of what we t ry to do.

We t ry to create a program that gives opportuni ty for interact ion,

and we also do things l ike we provide food for people so that people don ‟ t

run off at luncht ime and so they get a chance to s tay together , and we usu al ly

have a few events that are yet another excuse for mingl ing.

So I would urge you to at least consider ei ther joining the society

or coming to our meet ing or both and maybe even thinking about present ing

your own work there .

We wil l look forward to s eeing as many of you as we can . I guess

i t ‟s okay for me to make that commercial .

[Laughter . ]

MR. HEDGES: And i f there is no other quest ion, the fel low in

the back was holding up a s ign that was threatening to cut me off , so perhaps

we should end.

[Applause.]

[Whereupon, at 2:45 p.m. , the panel session concluded.]

Documents

Problems with the Design and Implementation of Randomized … · 2009-09-03 · VSM 6 actually required thinking through quite a few things that weren‟t necessarily obvious either