Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
VSM 1
Inst i tute of Educat ion Sciences
Fourth Annual IES Research Conference
Concurrent Panel Session
“Problems with the Design and Implementat ion
of Randomized Experiments”
Monday
June 8, 2009
Marriot t Wardman Park Hotel
Thurgood Marshal l South
2660 Woodley Road NW
Washington , DC 20008
VSM 2
Contents
Moderator:
Al len Ruby
NCER 3
Presenter:
Larry Hedges
Northwestern Universi ty 5
Q&A 42
VSM 3
Proceedings
DR. RUBY: Good afternoon, everyone . I am Allen Ruby from
IES. Thank you for at tending, and just a reminder, i f you ask a quest ion,
because this session is being taped, please s tate your name and your
aff i l iat ion i f you ‟d l ike to .
Last year , I int roduced Dr. Larry Hedges at the IES Research
Conference . He was doing a talk on inst rumental variables , and at that t ime, I
noted he had joined the Northwestern facul ty in 2005 . He has appointments in
s tat is t ics , educat ion, and soc ial pol icy. He is one of the eight Board of
Trustees Professors at Northwestern.
Previously, he was the Stel la Rowley Professor at the Universi ty
of Chicago, and his research has involved many f ields including sociology,
psychology and educat ional pol icy.
He is widely known for his development of methods for meta -
analysis . He has authored numerous art icles and several books . He has been
elected a member or a fel low of many boards, associat ions, and professional
organizat ions including the Nat ional Academy of Educat ion; the American
Stat is t ical Associat ion—come on down. There are plenty of seats up front ,
please— American Stat is t ical Associat ion; the Society for Mult ivariate
Experimental Psychology.
He has served in an edi torial posi t ion on a number of jo urnals
including the American Educat ion Research Journal ; the American Journal of
Sociology ; the Journal of Educat ion and Behavioral Stat is t ics ; and the
Review of Educat ional Research .
VSM 4
So I don ‟ t want to go into the detai ls again this year . I real ly just
want to ask what ‟s he done for educat ion research lately?
[Laughter . ]
DR. RUBY: And to that end, I‟d l ike to note two areas of his
current work that might be of interest to some of the folks here.
When we evaluate educat ional intervent ions in schools , we often
act as i f those schools included in our sample are a probabi l i ty sample from a
populat ion of schools , when, in fact , they ‟re real ly not . They‟re usual ly a
convenient sample of schools that are wil l ing to work with us .
And so Larry is current ly wo rking on how to address the
nonprobabi l i ty sampling in order to improve our es t imates of populat ion
t reatment effects or to improve the general izab i l i ty of the resul ts from our
experiments .
He has publ ished one piece of this work in a chapter ent i t led
“General izabi l i ty of Treatment Effects , ” in the book Scale-Up in Principle,
2007, edi ted by Schneider and McDonald.
A second area is as we move to a greater use of cluster
randomized t r ials , there are implicat ions for what the effect s izes are and
mean; and he has been working on improving methods for defining effect
s izes in cluster randomized t r ials ; and he has publ ished two pieces of work in
this area: one in 2007 in the Journal of Educat ional and Behavioral
Stat is t ics ; and another one in the recent ly rele ased Handbook of Research
Synthesis ; and another piece wil l be coming out soon in JEBS on three -level
designs.
VSM 5
Third, and what he ‟s talking about today, he ‟s been considering a
set of issues that often crop up when designing and carrying out randomized
t r ials . These issues have important implicat ions for preserving the integri ty of
the experiment , doing the analysis , and interpret ing the resul ts .
So I‟d l ike everyone to please welcome Dr. Larry Hedges .
Thank you.
[Applause.]
MR. HEDGES: Well , i t ‟s always tough to l ive up to an
introduct ion l ike that , and I ‟ l l jus t begin this talk by saying that IES put me
up to i t , and they put me up to i t in the fol lowing way: that some folks on the
IES s taff , and that actual ly wasn ‟ t jus t Al len, said to me, you know , we keep
get t ing these quest ions from people that are real ly sort of s imple quest ions,
and I‟m surprised that folks don ‟ t know the answer to those quest ions . Maybe
you could just give a talk to clar i fy, you know, these things for people.
And that ‟s how I got put up to doing what I ‟m going to t ry to do
today. And when I was asked to sort of gin up a t i t le for the session, I came
up with the t i t le “Problems with the Design and Implementat ion of
Randomized Experiments .”
But a few days ago when I was talking to Allen about this session
in somewhat greater detai l , he helped me to real ize that a bet ter t i t le for this
talk would be “Hard Answers to Easy Quest ions. ”
[Laughter . ]
MR. HEDGES: Because the quest ions that kept cropping up were
in principle easy ques t ions, but in order to give a good answer to them, i t
VSM 6
actual ly required thinking through qui te a few things that weren ‟ t necessari ly
obvious ei ther to the person asking the quest ion or to the person answering
the quest ion.
I think I know, I see a lot of m y methodological ly incl ined fr iends
in the audience, and I know we ‟ve al l had the experience of somebody coming
up to you and saying I ‟ve got a real ly s imple quest ion, and they think i t has a
yes/no answer, and they think that enough information to answer the quest ion
is contained in the f i rs t s imple phrasing of i t , and then we have to s i t them
down for a few hours and talk through a lot of detai ls of what they actual ly
mean, and so this , in a sense, this talk has some of that character .
I‟m going to s tar t wi th an easy quest ion that has ar isen frequent ly
in my experience, and I guess in IES ‟s experience as wel l . People wil l say
isn ‟ t i t okay i f I just match something l ike schools on some variable before
randomizing? And then they often go on to say, you kno w, lots of people do
i t .
[Laughter . ]
MR. HEDGES: And this quest ion has been asked of me direct ly
by some fai r ly sophis t icated people, and, you know, when they say that , you
know, “ lots of people do i t ,” what they are real ly saying is , “come on, i t can ‟ t
make that much difference, and I know lots of people who do this and kind of
ignore the fact that they did i t later . You know, surely, this is okay; r ight? ”
And I see this as an example of a s imple quest ion, but giving i t
an answer requires some seriou s thinking about design and analysis , and in a
certain sense, this is an opportuni ty to raise people ‟s consciousness about
VSM 7
some things you probably al ready know.
Maybe for many of you, al l of the things are things you already
know. In that case, you don ‟ t need this talk, but i f there are some things that
you might not have thought about qui te this way, maybe i t wi l l be helpful .
Well , the f i rs t s tep for me in thinking about a quest ion l ike this is
t rying to understand exact ly what the quest ion means, and n ot the only way,
but one way to think about i t i s that adding a matching or a blocking variable
means adding another factor to the research design.
So what I‟m going to do today, I‟m going to rely heavi ly on kind
of a s tandard analysis of variance, s tanda rd experimental design approach to
thinking about these things, not because analysis of variance is necessari ly
how you would analyze the data from a real experiment , but because i t makes
certain things real ly clear and has a sort of universal vocabulary t hat may be
helpful .
So i f you see a lot of analysis of variance kind of approach to
things, that ‟s why i t ‟s there .
And then to move on, that s imple quest ion, you know, has
consequences that depend a l i t t le bi t on which design you s tar ted with . You
might have s tar ted with an individual ly randomized design, the sort of
completely randomized design, no clustering or anything l ike that .
I want to talk a l i t t le bi t about that because i t ‟s kind of the
paradigm for answering the quest ions with respect to more complex designs
that we may actual ly be more l ikely to use in educat ion ; things l ike cluster
randomized designs, which might also be cal led hierarchical designs i f you
VSM 8
read a book l ike Kirk ‟s Experimental Design book; and mult icenter or
matched designs, which might be cal led general ized randomized blocks
designs i f you look in a s tandard kind of experimental design book.
So i f we s tar ted with, i f the place we s tar ted was with an
individual ly randomized design where we basical ly ass igned t reatments to
individuals at random; adding a blocking factor ; using the blocks ; matching a
l i t t le bi t ; corresponds to essent ial ly changing the design from a s imple
individual ly randomized design to a randomized blocks design or what is
actual ly usual ly cal led a general ized randomized blocks design.
And you could, and this is a sort of sample depict ion, and I ‟m
going to make our discussion easy by imagining that we ‟ve picked the levels
of our blocking factor in such a way that we get sort of equal numbers and
miraculously equal numbers in both t reatment and control in each block.
So the f i rs t point that ‟s useful in thinking about what happens i f
we had a blocking factor is that i t depends on the design and that adding that
changes the design.
Now, i f we want to ask what th e impact on the analysis might be,
wel l , we could, again, think about this in analysis of variance terms, and i f
we imagine that we have got ten incredibly lucky in the choice of our
blocking, levels of our blocking factor , so that we get equal numbers of
s tudents in each block, 2n s tudents in each block, n in t reatment and n in
control , and p blocks, then we would have a s tandard part i t ioning of the sums
of squares , and that goes, that we talk about in analysis of variance.
And I think of the part i t ioning of the sums of squares in terms of
VSM 9
part i t ioning of the variabi l i ty of the data, and so the not ion is the total
amount of variabi l i ty gets part i t ioned into part that ‟s due to t reatment and
part that ‟s due to within t reatments in the original part i t ioning.
And the degrees of freedom get part i t ioned, and the original tes t
s tat is t ic compares the sum of squares due to t reatment , which is a funct ion of
mean difference, to the variance within t reatment groups, in other words, the
sum of squares within t reatment g roups divided by the degrees of freedom
there.
Now, by introducing this blocking factor , we ‟ve introduced the
possibi l i ty of part i t ioning the variat ion further . If we s tar t wi th our
individual ly randomized design, but now introduce a blocking factor , we h ave
a different part i t ioning of the total variat ion, one in which there is a certain
amount of variat ion between blocks, a certain amount of variat ion that is
associated with the block t reatment interact ion, the fact that wi thin each
block, we have both t r eatment and control individuals . So there ‟s a block-
specif ic t reatment effect that we could compute.
And the block t reatment interact ion corresponds to an amount of
heterogenei ty that has to do with heterogenei ty in t reatment effects .
And then there ‟s a sort of within cel ls , that is wi thin blocks by
t reatments component .
Now, once we real ize we have this new design with a new
part i t ioning of the variat ion, the quest ion arises : wel l , what ‟s the r ight tes t
s tat is t ic?
And I sort of mean this in two senses . You know one sense you
VSM 10
could mean i t i s how do I actual ly compute the tes t , but also, you know, how
is i t sensible? With what is i t sensible to compare the variat ion due to
t reatments in order to decide whether the variat ion due to t reatments r ises
above the level of the background noise?
Well , that depends on the inference model . Now, that term may
be— I‟ l l jus t— this is just a review, by the way, of the sums of squares , that in
essence, what used to be cal led within t reatments in the original design now
gets broken up into three components : a between blocks component ; a block
t reatment interact ion component ; and a within blocks by t reatments , a within
cel l component .
And that ‟s an important thing about this design, a potent ial
advantage of this design . But as I was about to say, in order to f igure out
what the r ight thing to do, now that we ‟ve added this blocking factor to our
design, we have to think about the inference model that we want to at tach to
this design . And I need to say a l i t t le bi t about th at because one of the things
that isn ‟ t discussed in qui te these terms in many experimental design courses
is the not ion of inference models .
It ‟s always talked about , but not exact ly in this way . And I‟d l ike
to make a dis t inct ion between two kinds of in ference models , one which I ‟m
going to cal l condi t ional inference model , and one which I ‟m going to cal l the
uncondi t ional inference model .
And the point of making this dis t inct ion is that the inference
model is associated with, determines, and is determ ined by, the type of
inference that you want to make from the experiment ; and once you ‟ve chosen
VSM 11
the inference model , that has implicat ions for the appropriate s tat is t ical
analysis procedure ; and to put i t , to relate i t to something else that probably
confuses a lot of folks in their f i rs t experimental design course, the inference
model determines what are the natural random and f ixed effects in the model .
So let me say a l i t t le bi t more about what I mean by inference
models . The condi t ional inference mode l is one in which the general izat ions
you hope to make are to the blocks that are actual ly in the experiment or ones
that are just l ike them.
And what does “ just l ike them” mean? It means they respond,
that in a sense they have exact ly the same effect pa rameters .
In the condi t ional inference model , blocks in the experiment are
the universe you want to general ize to . And the idea of general izat ion to other
blocks depends on extra -s tat is t ical considerat ions . You have to make an
argument that the resul ts o f this experiment apply to some other blocks
because those blocks, those schools— frequent ly blocks are schools or they ‟re
school dis t r icts—are just l ike the ones that are in the experiment , and that ‟s
an ex tra-s tat is t ical argument .
So i t ‟s clear that general izat ion from a condi t ional , in the context
of a condi t ional inference model to things outs ide the observed blocks, say
the observed school dis t r icts , i s something that can ‟ t be done in a model free
way.
By the way—well , let me go on and say a word abou t the
uncondi t ional inference model . So what ‟s the al ternat ive? Well , the
al ternat ive that I‟m posing, and i t may not be exact ly the only al ternat ive, is
VSM 12
that we have an uncondi t ional inference model , which means that the
inference is intended to be a gene ral izat ion to a universe of blocks which
expl ici t ly includes at least some blocks that are not in the experiment .
Therefore, we think of the blocks in the experiment as being a
sample of blocks from a universe or populat ion of potent ial blocks . There ‟s
space ei ther place . And the idea, the usual idea, in the uncondi t ional
inference model is that the blocks in the experiment are a representat ive
sample of some populat ion, and the inference that ‟s of interest is to the
populat ion of blocks that the experimen t are a representat ive sample of .
And the beauty of the uncondi t ional inference s t rategy is that
inference to blocks not contained in the experiment is by sampling theory,
and we have this wonderful theory of general izat ion based on sampling.
Now, i f the blocks aren ‟ t a probabi l i ty sample of any populat ion
that you can specify, the general izat ion gets t r icky because the quest ion
becomes what is the relevant universe and how would you know ; and, in a
sense, ex tra-s tat is t ical considerat ions, just l ike in th e condi t ional inference
case, come into play and are, in some ways, just as t r icky.
Now, I want to refer this s i tuat ion that I ‟m talking about to a
s i tuat ion that you ‟re al l famil iar wi th but may not seem exact ly the same to
you, and that is the case of s t rat i f icat ion and clustering in sample surveys.
If you think about the way in which we do inference in sample
surveys, many sample surveys pick intact groups, a sample of intact groups,
clusters , and we get the sample by sampling f i rs t the clusters and th en
individuals within the clusters .
VSM 13
Now, when we do this , what we ‟re doing is taking a sample of
clusters and t rying to make an uncondi t ional inference to the universe from
which those clusters have been sampled . If you happen to have al l of the
clusters in the universe you want to infer to , we use a different name for the
clusters in sample survey work ; we cal l them strata.
And when you have a sample that ‟s based on s t rata, that is al l the
subdivis ions of the populat ion, and not clusters , some of the sub divis ions of
the populat ion, i t has implicat ions for the inference . It has implicat ions for
the uncertainty of the inference, in part icular , and i t ‟s exact ly paral lel here in
the experimental s i tuat ion.
So you can think of the inference model as being l in ked in an
inextr icable way to the sampling model for blocks . If the blocks that you
observe are ideal ly a random sample of blocks, they ‟re a source of random
variat ion . If the blocks observed are the ent i re universe of relevant blocks,
they are not a source of random variat ion . Just l ike clusters are a source of
added variat ion in a sample survey but s t rata aren ‟ t .
Now, the reason I‟m making a big deal about inference models ,
and I haven ‟ t said anything about s tat is t ical analysis yet , i s that I think i t
actual ly— these two things are often conflated in people ‟s understanding of
what to do in experiments , and I think i t helps to separate these two
conceptual ly different things.
There is this inference in sampling model , and then there ‟s what
analysis you do to t ry and learn something from data you col lect under those
models , and the fact is you can do any s tat is t ical analysis with any sampling
VSM 14
model ; some of them won ‟ t be sound. But , nonetheless , these two things are
in principle separable.
So going back to our s imple example, i f we think of the blocks as
f ixed effects , i f we ‟re wil l ing to , i f we ‟re wil l ing to entertain a condi t ional
inference model , that is we ‟re interested in inferr ing to the universe of blocks
we have observed in the experiment , then i t ‟s ent i rely appropriate to think of
the blocks as being f ixed effects , and in that case, we get one tes t s tat is t ic
which basical ly is the one that I ‟ve l is ted here as F condi t ional .
And the only thing that winds up in the error term in the
denominator we compare to the variat ion due to t reatment for the purposes of
seeing whether the t reatment variat ion r ises above the noise is just the
within-cel l variat ion.
And there ‟s the exact dis t r ibut ion of this thing is known . It ‟s an F
with the degrees of freedom th at I‟ve given here .
But i f the blocks are random effects , in other words, i f we think
of the blocks we have introduced as not being a universe unto themselves , but
being a sample from a universe we want to infer to— for example, we have
school dis t r icts . We‟ve blocked by school dis t r icts . We‟re not interested in
making a general izat ion to the school dis t r icts we happen to observe . We
want to make a general izat ion to the universe of school dis t r icts from which
we could putat ively say ours are a representat i ve sample, then the appropriate
tes t s tat is t ic is di fferent .
And in this case, the appropriate error term is determined by the
block t reatment interact ion variance, the sum of the squares is due to the
VSM 15
block t reatment interact ion, and i t ‟s notable that that error term has a lot
fewer degrees of freedom.
By the way, one of the ways of thinking about why this F
uncondi t ional is the appropriate tes t s tat is t ic is to sort of go back to the
design and remember that in this design, we can compute a t reatment eff ect
wi thin each block, and what this analysis of variance tes t s tat is t ic is , i s
basical ly just the equivalent of doing a t - tes t on the block specif ic t reatment
effects , only i t ‟s an F tes t . You know, square of a t - tes t on the block specif ic
t reatment effec ts .
Now, you can see, i f you compare these two, that the error term in
the condi t ional inference tes t , the so -cal led “f ixed effects model ,” has a
whole lot more degrees of freedom than the one in the uncondi t ional model .
If p is the number of blocks, we have, you know, basical ly 2pn
minus 2p degrees of freedom under the condi t ional analysis and only p minus
1 under the uncondi t ional analysis .
And that ‟s a difference so you can al ready tel l the condi t ional tes t
i s going to be more sensi t ive .
What you can ‟ t see immediately from just comparing the tes t
s tat is t ics is that i f there is a t reatment effect , the average value of the F
s tat is t ic is going to be larger under the f ixed effects model . General ly, i t ‟s
going to be larger , and actual ly what I mean here is not qui te the average
value of the F s tat is t ic . What I mean is the so -cal led “noncentral i ty
parameter ,” but i t ‟s , that ‟s a dis t inct ion that probably isn ‟ t important for the
general point .
VSM 16
And, in fact , you can see that the rat io of the noncentral i ty
parameters under the two models with due al lowance for some l i t t le cheat ing
on notat ion depends on this factor . It depends . Well , f i rs t of al l , you can see
that i f al l these numbers are bigger than 1 in this fract ion, then, lo and
behold, the fract ion wil l b e bigger than 1.
The crucial issue is i f there is heterogenei ty in the t reatment
effects across blocks, the f ixed effects analysis is going to have, is going to
have a larger noncentral i ty parameter . It ‟s going to be more powerful .
This omega parameter here is going to crop up again . It ‟s going to
turn out that when we introduce blocking factors , and part icularly i f those
blocking factors are not considered f ixed effects , then the amount by which
the t reatment effect varies across blocks is going to enter into the sensi t ivi ty
of the analysis .
But , remember, that shouldn ‟ t surprise you . If you think again
about what this uncondi t ional analysis is in this design, i f you think about
computing block specif ic t reatment effects and then doing, asking whether th e
average of the block specif ic t reatment effects is di fferent than zero, the sort
of intui t ion is , wel l , how am I going to do that?
Well , I‟m going to take the average of the t reatment , the block
specif ic t reatment effects , and I ‟m going to compare i t to the s tandard
deviat ion of the block t reatment , at the block specif ic t reatment effects , and
so i f there ‟s heterogenei ty there, i t ‟s going to be harder to detect effects .
You know, the detai ls of the symbols don ‟ t mat ter , but this is sort
of a general idea here that I think is accessible . I hope i t ‟s accessible.
VSM 17
Now, remember, the quest ion that I was asked, essent ial ly, can I
kind of ignore the blocking, suggested that the s tat is t ical analysis was going
to be not l inked to the sampling model necessari ly, and so there are three
possible things you could imagine doing for the analysis of your now blocked
design.
One is you could ignore the blocking al l together , and that ‟s what
the impetus of the original quest ion is : can ‟ t I do this and then not pay
at tent ion to i t? You could include the blocks as f ixed effects . You can
include the blocks as random effects .
And the point is which of those is a good thing to do depends on
the kind of inference you ‟re hoping to make . If you ‟re interested in making
uncondi t ional inferences, that is inferences to blocks, let ‟s say school
dis t r icts beyond the ones you observed, then, wel l i t ‟s going to turn out
ignoring blocking is always a bad idea.
But ignoring blocking here is a part icularly bad idea because i t
real ly can inflate the s ignif icance levels in ways that you would f ind
surpris ing. Your .05 s ignif icance level , i f you ignore the blocking and carry
out a tes t at the .05 s ignif icance level , i t ‟s possible that 50 percent of the
t ime, you wil l reject the nul l hypothesi s by chance when the nul l hypothesis
is t rue.
So the actual s ignif icance level might be 50 percent or even
higher in plausible ci rcumstances . So i t ‟s a real ly terr i f ical ly bad idea to
ignore blocking in most cases .
Now, i f you ‟re interested in making uncondi t ional inferences,
VSM 18
including the blocks as f ixed effects and doing that analysis is also a bad
idea, and i t also leads you to ant i -conservat ive decis ions about hypothesis
tes ts for t reatment effects .
Including the blocks as random effects , you know, i s the r ight
thing, but i t ‟s an analysis that has less power than the condi t ional analysis .
Now, that doesn ‟ t mean i t ‟s wrong; i t ‟s an analysis that ‟s making a different
inference.
You ought to expect the analysis that is sui ted to the
uncondi t ional inference to be less sensi t ive than the analysis sui ted to the
condi t ional inference . Well , why is that? Because i t ‟s real . It ‟s a lot harder
to f igure out whether there ‟s a t reatment effect in a whole populat ion of
blocks, most of which you haven ‟ t observed, than i t i s to f igure out i f there ‟s
a t reatment effect in a specif ic set of blocks which you have happened to
observed al l of .
Okay. If you want to make condi t ional inferences in this case,
that is you real ly only care about inferr ing about t reatment effec ts in the set
of school dis t r icts you ‟ve observed, then i t ‟s s t i l l a bad idea to ignore
blocking because that actual ly has the reverse effect . It actual ly can lead to
deflat ing s ignif icance levels . In other words, you may have huge downward
impacts on the power.
You can include blocks as f ixed effects , which is the r ight thing
to do. You can include blocks as random effects , but this may also have the
effect of reducing power and making i t very hard to detect the effects when
they‟re there.
VSM 19
So the not ion that i t ‟s always the r ight thing to do, i f in doubt ,
assume that the blocks are random effects , i s not necessari ly a good pol icy.
Now, i f we move on to the designs— I‟m going to move on to the
cluster randomized design, the hierarchical design, and point o ut that the kind
of approach that I just out l ined can be fol lowed there, too . You have to think
about blocking in the cluster randomized design as adding another factor to
the design . The inference model is going to determine what the most
appropriate analysis is , and you can reason that ei ther way.
You can ei ther reason from analysis backwards inference model
or the other way around, but one important thing is that int roducing a
relat ively smal l number of blocks here, as wel l as in the completely
randomized design, may leave you with very reduced uncertainty to a larger
universe of blocks based on a s tat is t ical inference kind of , you know,
sampling theory general izat ion.
So what do I mean by i t al l works the same way for cluster
randomized designs? Well , i f the original design was that we have clusters
ass igned to t reatments , I t r ied to depict here a case in which we have, you
know, clusters l is ted across the top and some of them are assigned to
t reatment and some of them are assigned to control .
But because clusters are ass igned to one and only one t reatment—
you know, no cluster is ass igned to both t reatment and control—when we
introduce a blocking factor , what happens is , at least ideal ly, we get a set of
clusters within each block that are ass igned to t reatment , and another set of
clusters within that block that are ass igned to control , and so the dashes there
VSM 20
are intended to indicate that those are, you know, those are non -assignments .
In a sense, the f i rs t m clusters in the f i rs t block are ass igned to t reatment , and
the second m clusters in the f i rs t block are ass igned to control , et cetera.
So this is a knowable design . It ‟s a part ial ly hierarchical design .
In this design, we see that t reatments are crossed with blocks, but clusters are
nested within block t reatment combinat ions, and this has at least one
immediate implicat ion for , you know, what the design can reveal to us , and
that is that s ince each block has both t reatments , we can actual ly es t imate
block specif ic t reatment effects here.
We can ‟ t es t imate cluster specif ic t reatment effects , but we can
est imate block specif ic t reatment effects .
So we might imagine applying this design in a case where the
blocks are school dis t r icts and clusters are schools , and we have some schools
in each dis t r ic t ass igned to each t reatment .
Okay. Well , the usual part i t ioning, the original part i t ioning, i f we
want to ask how this impacts the analysis , the original part i t ioning would just
subdivide the total variat ion into part due to t reatment , part due to cluste rs ,
and part due to within -cluster t reatment combinat ion, wi thin, wel l , wi thin
clusters that are within t reatments .
And there ‟s a sort of s tandard analysis which would tel l us that
the r ight tes t s tat is t ic compares the t reatment variat ion to the variat ion due to
clusters .
Now, when we introduce this blocking factor , we ‟ve sort of
int roduced the possibi l i ty of a bigger part i t ioning of a total variat ion in the
VSM 21
experiment into part that ‟s due to t reatment , part that ‟s due to blocks, part
that ‟s due to the block t reatment interact ion . Remember, this is a case in
which each block has both t reatment and control , so we can est imate a
t reatment effect in each block, and therefore, there ‟s the possibi l i ty of
heterogenei ty of t reatment effects across blocks.
Then we also have a part that ‟s variat ion due to variat ion between
the clusters within block t reatment combinat ions, and then we have a part
that ‟s due to within clusters , and i f we ask, wel l , how should we evaluate the
variat ion due to t reatment , how should we, what should we compare the s ignal
to , wel l , i t ‟s sort of not immediately obvious, and i t depends on the inference
model .
If we think of the blocks as being f ixed, that is we ‟re interested
in a condi t ional inference about the blocks—now in this case, I‟m thinking
that we ‟re imagining that we don ‟ t have al l— that ei ther we don ‟ t have or we
don ‟ t wish to think of having al l the clusters , al l the schools within a dis t r ict
in our experiment .
But we are, but we might be interested in ei ther inferr ing to just
the set of dis t r icts in the experiment or some bigger set of dis t r icts .
Well , i f we ‟re interested in the condi t ional inference, that is an
inference that ‟s formal ly to the dis t r icts we happen to observe in our
experiment , then we get , you know, one tes t s ta t is t ic . We compare the
t reatment versus the variat ion across clusters within block t reatment
combinat ions.
If we ‟re interested in inferr ing to a larger col lect ion of blocks
VSM 22
than the ones we ‟ve observed, to a larger universe, an uncondi t ional
inference, then, the appropriate tes t—well , there actual ly turns out there isn ‟ t
an exact appropriate tes t and analysis of variance, but there ‟s an approximate
one, and that ‟s s l ight ly conservat ive, and that one would compare the
t reatment variat ion to the variat ion ac ross blocks.
Now, the key point here is that the exact ly appropriate thing to do
isn ‟ t the same in these two cases . And as you ‟d expect when we ‟re making a
condi t ional inference, there ‟s a lot more degrees of freedom for es t imat ing
the uncertainty of the t reatment effect , the more degrees of freedom in the
error term, than there is under our uncondi t ional inference . And so that would
lead you to say, okay, the condi t ional inference is probably going to be more
powerful than the uncondi t ional inference, and that shouldn ‟ t surprise us ,
again, for the same reason i t wasn ‟ t surpris ing before.
The thing that ‟s maybe a l i t t le less obvious is— than just the
degrees of freedom difference— i s there ‟s a difference in the noncentral i ty
parameters , a difference in the a verage s ize of the tes t s tat is t ic . And in
general , the average s ize of the tes t s tat is t ic is going to be bigger when
there‟s a t reatment effect under the condi t ional model than under the
uncondi t ional model ; and exact ly how much bigger i t ‟s going to be depends
on how heterogeneous the t reatment effects are across blocks and exact ly how
heterogeneous the clusters are within blocks.
And so this term actual ly, this omega rho B term sort of is a term
that describes the amount of heterogenei ty across blocks in the t reatment
effect , and row C here is the interclass correlat ion which describes—across
VSM 23
clusters—which describes the amount of variat ion that there is across
clusters .
Now, as before, there are different possible analyses . We could
ignore the blocking a l l together . We can include the blocks as f ixed effects .
We could include the blocks as random effects , and which is the r ight thing to
do depends on how we want to think about the inference model we want to
make, whether we want to make a condi t ional infe rence or an uncondi t ional
inference.
As before, ignoring blocks is a bad idea. It inf lates s ignif icance
levels of the tes t , and i t ‟s a big effect . Including the blocks as f ixed effects is
a bad idea i f we want to make an uncondi t ional inference because i t wi l l also
lead us to erroneous s ignif icance levels . And including blocks as random
effects is sort of the r ight thing to do i f we want to make uncondi t ional
inferences.
And we have a paral lel s i tuat ion that i f we want to make
condi t ional inferences to th e blocks in the experiment , ignoring blocking is
s t i l l a bad idea . Including blocks as f ixed effects is the r ight thing to do . In
this case, s t rangely enough, including blocks as random effects doesn ‟ t have,
i t doesn ‟ t have a big effect on the s ignif icance tes t , which is not something
you would have expected but turns out to be t rue.
Now, I should say something about these sort of general pieces of
advice here . When I s tar ted this talk, I thought i t would be a good idea to
give you some formulas for these things, and then I real ized that they aren ‟ t
easy to interpret . I wi l l say that this advice that I ‟ve given here about the
VSM 24
completely randomized design is predicated on typical values of some of the
parameters that affect things.
If you block on something that actual ly has absolutely no effect ,
i t turns out that you can ignore the blocks and i t won ‟ t hurt you . But you ‟re
probably not going to be blocking on things— I mean I hope you aren ‟ t
choosing things that have absolutely no effect to block on . You know, i t ‟s not
a good idea .
Okay. Now, one can do the same kind of analysis for a mult i -
center , for s tudies that s tar t out being mult i -center , randomized block s tudies .
And essent ial ly what winds up happening is that you, as before, int roduce a
new factor , a new blocking factor , and the way to sort out what goes on in the
analysis is to just , you know, sort of do the s tandard kind of procedure to sort
through what the appropriate tes t s tat is t ics are in the design that you ‟ve
created.
Now, i t ‟s interest ing that both in the case of the hierarchical
design and in the design that ‟s original ly hierarchical , the cluster randomized
design, and design that ‟s original ly randomized blocks s tudy, you get
di fferent variance of part ial ly hierarchical experimental designs . They are
known designs.
We know how to analyze them and al l this s tuff , but they aren ‟ t
exact ly the same, and, you know, I have material on this , but I don ‟ t think I‟m
going to go through the detai ls of what happens in the mult i -center design
when you add extra blocks because, in a way, the sort of message that came
from the f i rs t two cases we ‟ve talked about probably you ‟ve got ten.
VSM 25
Okay. I got to f igure out whether I want to make condi t ional
inferences or uncondi t ional inferences, and then I got to sort out the r ight
thing to do . Make a s tat is t ical analysis that corresponds to that .
So what I‟d actual ly l ike to do at this point , because this could
actual ly, this could actual ly lead to our having a l i t t le bi t of t ime for a
discussion, and i f I‟m left to my own devices and I go through al l of this ,
there wil l be no t ime.
So I‟m going to speed through this , and get to “another easy
quest ion.”
[Laughter . ]
MR. HEDGES: And this is a quest ion that comes up al l the t ime .
There was some at t r i t ion from my stud y after ass ignment . Does that cause a
serious problem? And, you know, this is another s imple quest ion, and the
answer is far from simple.
I have an answer, and I ‟m going to frame i t in terms of
experimental design, but before I do, let me just point out that there ‟s a
s imple quest ion that has an easy answer, a different s imple quest ion that has a
real ly s imple answer.
And that ‟s „does at t r i t ion cause a problem in principle? ‟ And the
answer to that is „yes ‟ . You know, randomized experiments with at t r i t io n no
longer give model free, unbiased est imates of the causal effect of t reatment .
Remember, the reason we love randomized experiments is because
they in principle give us model free es t imates of t reatment effects , causal
effects of t reatments . And we lose that i f there ‟s at t r i t ion.
VSM 26
Whether the bias is serious or not depends on the model that
generates the missing data . That ‟s an old refrain I‟m sure you ‟ve heard
before. But before we, before we go too far with this , let me just give you one
way of thinking, thinking about post -assignment at t r i t ion in terms of concepts
that you ‟re used to from thinking about experiments .
If we have a t reatment and a control group, we can think about
missing this as l ike introducing another factor . We have the observed dat a for
the people in the t reatment group and the control group, and then we have the
missing data for the people in the t reatment and the control group.
Now, by just making this picture, you can see that there ‟s a
problem in est imat ing the t reatment effect from only the observed part of the
design. The observed t reatment effect is only part of the total t reatment
effect . You could think in—another term sometimes used in experimental
design is that the observed data al lows you to es t imate the s imple t reatmen t
effect on the observed, but doesn ‟ t a l low you to es t imate the t reatment effect
on the missing, and the main effect of t reatment is a combinat ion of those
two.
But this is t r ickier . You know, there are things you can learn by
just thinking about the alge bra of this that are useful . Suppose, so now what
I‟m going to do now is talk about algebra, not talk about anything that has to
do with s tat is t ical inference or anything deep.
This is , the rest of this bi t i s just about , just about algebra . If we
think of our design, and now we ‟re taking the God ‟s-eye view of our design .
We‟re assuming we can see the populat ion means, not only for the observed
VSM 27
people, but for the unobserved people, and this helps us kind of understand
what ‟s going on.
If mu TO is the mean in the t reatment group for the observed
individuals , and mu CO is the mean for the control group of the observed
individuals , and we have mu TM is the mean of the missing folks in the
t reatment group, then mu CM is the mean for the control folks who are
missing, and again you don ‟ t have access to these quant i t ies . You don ‟ t have
access to any of these quant i t ies in a real experiment .
But for the purposes of understanding what post -assignment
at t r i t ion might do, we can imagine we had access to these quant i t ies . We
could imagine ourselves in the role of the dei ty in actual ly understanding
what ‟s going on beneath the sampling error .
Now, I‟ l l int roduce another quant i ty which is obviously
important . When that ‟s the proport ion of the total number in each grou p that
are observed or missing, so pi T is the proport ion of the t reatment group
that ‟s observed, and pi bar -T is one minus pi T . So, in other words, i t ‟s the
proport ion of the t reatment group that ‟s missing, and the same thing for the
control group . So pi C is the proport ion of the control group that ‟s observed,
and pi bar-C is the proport ion of the control group that ‟s missing.
It ‟s clear that al l together this is the informat ion you would need
to sort out the t reatment group, the actual t reatment effect on the total group
that was randomized, and a l i t t le bi t of algebra tel ls you that , wel l , pi T t imes
mu TO plus pi bar -T t imes mu TM, that ‟s just the mean of al l the people in
the t reatment group, both the observed and the unobserved.
VSM 28
And pi C mu CO, plus pi bar-C, mu CM is just the mean of al l of
the people who are in the control group . That is al l of the people who are
randomized, the missing and observed.
And so the difference between those two is just the t reatment
effect on al l individuals randomize d, and when the proport ion of dropouts is
equal , this s impli f ies further ; so that i f the proport ion missing in both
t reatment and control group is the same, and I ‟ l l cal l that pi , then we can
wri te the t reatment , the t reatment effect on everybody randomize d, in terms
of the t reatment effect amongst the observed and the t reatment effect amongst
the missing in just the form of the las t s l ide there.
In other words, we could wri te , i f we cal led del ta the t reatment
effect on al l the individuals , then del ta is j ust a l inear combinat ion of the
t reatment effect among the observed folks and the t reatment effect upon the
missing folks . And these two pieces are weighed proport ionately, you know,
their proport ion pi of the folks that are observed.
So the t reatment eff ect among the observed gets weighed pi , and
pi bar of the folks are unobserved , so there the t reatment effect among the
missing gets the weight pi bar .
Now, this immediately tel ls us that we ‟re not going to be able to
do any inference on del ta without mak ing some assumptions or something that
sneaks into the model , l ike assumptions, that constrains the possible values of
del ta m.
So the reason I said there was a s imple answer to the quest ion of
whether or not at t r i t ion was a problem in principle is that t he value of the
VSM 29
t reatment effect on the individuals randomized could be anything i f there ‟s
any missing data, and we have no knowledge of del ta m.
So let me just sort of say that again because i t ‟s an obvious but
profound point . If we think of our outcome variable as having an infini te
range or a pract ical ly infini te range, then I can make del ta anything by
varying del ta m by enough, i rrespect ive of what del ta observed is .
So, in principle, at t r i t ion is an enormous problem that de -
ident i f ies our experimen tal es t imates .
Now one way you can—now, of course, there ‟s a big, you know,
caveat in what I just said, which was that i f del ta m has an infini te range,
then dot -dot-dot , but we al l know that the variables we measure usual ly don ‟ t
have an infini te range . You know, we have tes t scores that are bounded in
some ways . We might even have some bounds we ‟d be wil l ing to set on the
t reatment effect among the missing based on plausibi l i ty arguments , which
would al low us to bound the total t reatment effect based on knowing the pi ‟s .
So one way that people have approached the problem of deal ing
with at t r i t ion is to think about arr iving at bounds on the things we don ‟ t
observe— this is sort of the Chuck Manski approach, you know, f igure out
bounds on the things you don ‟ t observe—and i f you can specify bounds, then
you can specify bounds on the thing you want to es t imate.
And so the key point is no est imate of the t reatment effect is
possible without an est imate of the t reatment effect among the missing in this
model , and you know, i t ‟s possible that we can improve by assuming a range
of plausible values . There are various ways to do this . One of the ways to do
VSM 30
i t i s to take absolute bounds . The tes t scores go from zero to a hundred, then
zero is the smal lest they can be , and a hundred is the biggest they can be, and
you could—well , I won ‟ t go on yet—and you can set up bounds in that way.
You might say there ‟s a no qual i tat ive interact ion hypothesis ,
which is i f the t reatment effect is posi t ive for somebody, i t can ‟ t be negat ive
for anybody, and that sets up the possibi l i ty that i f del ta o is posi t ive, then
del ta m can ‟ t be any smal ler than zero.
It ‟s an assumption, not data . It ‟s not guaranteed, but i t ‟s an
assumption that could be made and some individuals f ind that a very plausible
assumption in , say, medical s tudies al though you can easi ly think of examples
in which i t probably isn ‟ t t rue.
But I want to go on for a minute because I think there ‟s another ,
again, s imple point that is not , I think, completely understood by al l
scient is ts , and that is that when the at t r i t ion rate is not the same in the
t reatment and control groups, the analysis gets t r ickier . And one idea that
people have used occasional ly is to t ry to convince themselves about the
t reatment effect on those who drop out , those who are missing or unobserved,
and they want to come up with bounds on that t reatment effect .
And they say, wel l , you know, i f I can be pret ty sure about what
the t reatment effect would have been on the missing folks , then that wi l l
convince me that the at t r i t ion hasn ‟ t spoi led the analysis .
Now how could you do that? Well , i t ‟s maybe not so impossible
to do. In a longi tudinal s tudy, for example, you may have mult iple waves of
measurements . And so you might use the t reatment effec t , the t reatment effect
VSM 31
measure on the las t wave of observat ions you have, as kind of a proxy for the
t reatment effect later .
And/or you might have something you know to be s t rongly
correlated with the outcome . It ‟s not the outcome, but you bel ieve you ca n
predict the outcome from i t , and i f you predict the outcome from i t , then you
can get some kind of purchase on what the t reatment effect upon the missing
might be.
So i f you got a s i tuat ion l ike the one that I ‟ve depicted here, you
might be fai r ly sangu ine. So you look at this . I think this— i t ‟s made-up data,
but i t ‟s the kind of data you might— i t ‟s not too crazy. We have a t reatment
effect among the observed.
We have an est imated t reatment effect or a putat ive t reatment
effort , or maybe the dei ty whisp ered in our ear , and we know that this is the
t reatment effect among the missing . We see that the score, the average scores
among the missing are smal ler than the average scores among the observed,
and that sort of makes sense, you know.
The people who can ‟ t make i t to the post - tes t are sometimes not
able to make i t to lots of other things, and that may interfere with their
overal l achievement . But when we look at the t reatment effect among the
missing, i t ‟s just about the same— i t ‟s exact ly the same as the t reatment effect
among the observed.
So how many of you are sanguine about the fact that the t reatment
effect among the total group of individuals randomized is going to turn out to
be posi t ive and maybe the same value as we ‟ve seen here?
VSM 32
I won ‟ t ask you—you could raise your hands, but I won ‟ t ask you
to raise your hands—but I wi l l ask you to make a mental commitment to that .
Does i t seem plausible to you that the t reatment effect among everybody
randomized is , yeah, going to be posi t ive in 23 or somethi ng l ike that?
Pause. Do a l i t t le wai t t ime . Let you think about that . Everybody
convinced themselves they have an answer? Al l r ight . Well , i t ‟s not , you
know, obviously I set you up for this . I shouldn ‟ t have set you up for i t . I
should have al lowed you to say, to think what you might have thought .
But now I‟m going to show you that just because the t reatment
effect among the missing is the same as the t reatment effect among the
observed doesn ‟ t mean that ei ther of those things is the t reatment effect
among the ent i re group randomized.
Here‟s the rest of the data . Here are the numbers , and pay
at tent ion to the total on the far r ight . I rounded those to zero decimal places ,
but what you can see is that the t reatment effect is posi t ive and the same,
exact ly the same, for both the observed and the missing . And the t reatment
effect is exact ly negat ive that for the ent i re group pooled.
Now, this may be a l i t t le surpris ing . It ‟s not surpris ing probably
i f you remember Simpson ‟s paradox because Simpson ‟s paradox, remember,
usual ly isn ‟ t—we usual ly don ‟ t think about in terms of cont inuous data . We
usual ly think about i t in terms of discrete data, but this is just a s imple
example of Simpson ‟s paradox.
What ‟s going on? You know how can this be? The f i rs t t ime
people see Simpson ‟s paradox, the quest ion is always how can this be, and the
VSM 33
answer to how this can be is that you see that most of the folks we observed
in the t reatment group, wel l , most of the folks in the t reatment group didn ‟ t
get observed, and the unobserved folks had lower scores .
Most of the folks in the control group did get observed, and most
of the folks who were observed had higher scores , and even though there was
this apparent perfect agreement between the observed and the unobserved in
terms of t reatment effect , the maldis t r ibut ion of people in terms of observed
and missing made i t possible—and the difference between observed and
missing on the average sort of made i t possible for the average t reatment
effect among— the average score among t he t reated to be much lower than the
average score among the controls .
This is just another example of , when I say Simpson ‟s paradox,
Simpson ‟s paradox is why the death rate can be going up even though i t ‟s
going down in every age category. How can that be? The populat ion is get t ing
older or the fact that the graduate school can be admit t ing more women, more
of any category, in each department , but overal l , have fewer of that group
being admit ted.
So the point is i f you have unequal at t r i t ion rates , i t ‟s not enough
just to know what the t reatment effect would have been in the missing guys .
You got to know something about what the t reatment effect—you got to know
the individual means.
Now, obviously, you can t rade off informat ion in coming up with
clever bounds, and there ‟s a whole l i terature on that s tuff . One possibi l i ty for
helping you understand the potent ial effects of at t r i t ion is to sort of put
VSM 34
bounds on the t reatment group means in some way, lower and upper bounds.
And i f you do that , you can kind of reason through that i f you put
in the smal l—you put in the lower bound for the missing folks in the
t reatment group and the upper bound for the missing folks in the control
group, you can come up with an absolute lower bound for the possible
t reatment effect .
And i f you put in the upper bound for the missing t reatment folks
and a lower bound for the missing control folks , you can come up with an
upper bound for the possible t reatment effect on the ent i re populat ion.
And in general there are many permut at ions of work l ike this . But
I‟ l l jus t point out that nothing that I said in this l i t t le sect ion of the talk
involves sampling or es t imat ion error . It ‟s al l just algebra, and things get a
l i t t le bi t more complex i f we s tar t taking sampling into account , b ut real ly the
biggest sort of s tumbling blocks are the ones that I t r ied to ident i fy here.
And now I‟m at the s tage where I‟m going to s tar t wrapping up so
there wil l be a l i t t le t ime i f people want to ask quest ions so I don ‟ t go for the
whole t ime.
There are a lot of seemingly s imple quest ions that ar ise in
connect ion with f ield experiments , and I t r ied to cover one category of them
and one other category to some degree.
And the main conclusion that I have about this is that the answers
to these quest ions often are much t r ickier than they seem to be . They require
complex thinking about fundamental aspects of what the research design is ;
what you ‟re t rying to infer from that design, and connected to i t , what your
VSM 35
vision of the sampling model is , even i f i t i sn ‟ t a formal sampling model in
the sense that a survey sampler would recognize i t .
And i t depends cri t ical ly also on assumptions about missing data
and how seriously you take that . Now, i f there ‟s only a t iny amount of
missing data, obviously those pr oblems aren ‟ t so important . But i f a thi rd of
your observat ions are missing, a thi rd of the people disappear, then they are
not to be ignored.
And I think the crucial point that I ‟d l ike to leave you with is that
often these s imple quest ions don ‟ t have, you know, don ‟ t have s imple
answers . They require a lot of complex thinking, and when you go to a
methodological advisor , and they give you the answer that my fr iend Henry
ment ioned as the best answer to quest ions l ike this , “ i t a l l depends,” they‟re
not just t rying to be uncooperat ive .
[Laughter . ]
MR. HEDGES: Or i t i sn ‟ t that they don ‟ t know. They may not
know, but , you know, a fai r answer is „ i t a l l depends ‟ ; even then . I may know
that much. So I think that the general sort of entreaty here is to be ge nt le with
those of us who say I‟ve got to know more . You need to explain more to me
about what you ‟re doing and more to me about where you got your sample and
what you think i t means and what the design is because, for me, answering
some s imple quest ions, these and others , i t ‟s hard to do without that
addi t ional detai l and informat ion.
I think I wi l l s top, uncharacteris t ical ly, before the end of the
t ime, and maybe there wil l be a possibi l i ty for discussion . I know there are
VSM 36
several people in this room who know at least as much as I do about al l these
things, and i f I don ‟ t know the answer or i f I know the answer, but i t ‟s not
r ight , one of them wil l help me out .
So I‟ l l s top.
[Applause.]
MR. HEDGES: So are there quest ions or comments that people
would l ike to make—maybe another s imple quest ion? That would be enough
to f i l l up the t ime remaining, I ‟m sure.
I‟m supposed to tel l people to go to the microphone, and Henry
has got f i rs t mover advantage.
MR. BRAUN: Thanks very much, Larry.
In t rying to analyze observat ional s tudies , people often use
select ion models , sort of two -stage models , to t ry to capture the effects of
select ion in order to get unbiased t reatment effects .
MR. HEDGES: Yeah.
MR. BRAUN: Do you know if people have t r ied to do s imila r
things in the context of randomized experiments with non -random at t r i t ion to
t ry to model the at t r i t ion in some way, maybe using covariates and so on, and
have they been successful? Is there any way to tel l?
MR. HEDGES: I‟m glad i t was an easy quest io n. You know, there
are different views about that . My col league Tom Cook, for example, would
argue that not so much of Heckman s tyle select ion models , but various kinds
of matching models , you know, propensi ty score models and even just plain
old garden-variety analysis of covariance can do a good job.
VSM 37
But I think the problem is that any demonstrat ion that they can do
a good job, and this is my— I know some of those folks are in this room so
they may choose to get up and disagree and explain why I ‟m ful l of i t .
Demonstrat ions that they can do a good job in some circumstance always
seemed to me to be— I have no idea how to general ize to another s i tuat ion,
and so I f ind them total ly unsat isfying.
There ‟s another point of view which is I know my former
col league J im Heckman, when he was presented with some s tudy—well ,
Robert LaLonde and Rebecca Maynard ‟s work that did some select ion model
analyses of pret ty good randomized experiments , and they got different
answers . And J im ‟s response to that was, see, I told you experiments were
unrel iable.
[Laughter . ]
MR. HEDGES: So i t becomes, I think, so I ‟m not persuaded .
What I‟m persuaded of is that there are cases in which they can do a good job .
What I can ‟ t tel l i s which cases they are, and that ‟s the rub . I mean the reason
I l ike experiments is because—and I think i t ‟s probably the reason al l of us
l ike experiments—you don ‟ t have to be very smart to do a good job with
experiments . You don ‟ t have to know anything i f you can keep the experiment
intact . As soon as i t f al ls apart , then you have to know stuff to make
inferences . That ‟s much harder .
Anybody from—Jack, okay. You ‟re supposed to talk into the
microphone. See, I remembered.
DR. RUBY: Name.
VSM 38
MR. MOSTOW: Oh, sorry. Jack Mostow, Carnegie Mellon .
When you are designing an experiment where you have mult iple
data points for individual , ei ther s ingle subject randomized within subject or
randomized between subjects , i s i t exact ly analogous to everything you said
about clustering at the classroom level or do the r ules change? And i f so,
how?
MR. HEDGES: It ‟s almost exact ly analogous . The only twist is—
I mean this is actual ly one of the great things that Steve Raudenbush and
Tony Bryk kind of contr ibuted to our understanding of the world . I mean they
didn ‟ t exact ly invent this s tuff , but they actual ly were the f i rs t— they actual ly
told us about i t , and you know, that ‟s probably more important in some ways,
that mult iple observat ions within individuals are kind of a nest ing s t ructure
that ‟s very much l ike individual s within clusters .
And so much of what ‟s t rue about , in fact , most of what ‟s t rue,
about clustering within schools or classrooms, is also t rue about clustering of
observat ions within individuals . There ‟s one sort of different point , which is
that the correlat ion s t ructure among observat ions within individuals can be a
lot more complex than the correlat ion s t ructure within clusters of individuals ,
at least the correlat ion s t ructures we choose to model within clusters of
individuals .
By that , I mean i f you have mult iple measures over t ime, i t ‟s
highly plausible—well , let me back up and say the problem with, the problem
that cluster—cluster randomized t r ials present us with is that individuals
within clusters are correlated . They‟re correlated because i f ther e‟s an effect
VSM 39
of the cluster , everybody in the cluster has the same effect . They share i t so
their data is correlated.
But usual ly we take that correlat ion to be the same— that
correlat ion among people to be the same . Now, when you model data across
t ime within individuals , for example, i t ‟s possible that the correlat ion
between measurements at any two t ime points is the same, but i t ‟s certainly
plausible that measures that are closer together correlate more highly than
measures that are further apart .
And the exact s t ructure of that , and i t turns out that the s t ructure
of that correlat ion mat ters , and so that ‟s— the fact that the correlat ion
s t ructure is potent ial ly more complicated is the principal way in which things
differ .
And one could tear one ‟s hair out about those things . That ‟s , of
course, what I‟ve done, and—
[Laughter . ]
MR. HEDGES: — i f I hadn ‟ t encountered those problems, I ‟d
have a ful l head of hair .
[Laughter . ]
MR. HEDGES: So that ‟s what I have to say about that . And
again, I would encourage my methodological ly-incl ined col leagues I‟m not
the only source of knowledge in the room so—
MR. ALEXANDER: I‟ve got a s imple one for you.
MR. HEDGES: Uh-oh.
[Laughter . ]
VSM 40
MR. ALEXANDER: Karl Alexander, Hopkins.
Indeed, a s imple one . So I thought this was t remendously
informat ive, very useful , even for someone who doesn ‟ t l ive and breathe this
framework. So I am wondering i f there ‟s a way that we could get your
complete l is t of s imple quest ions.
[Laughter . ]
MR. ALEXANDER: Can we wri te to you and ask for i t? And
your notes perhaps to walk us through some of the answers to these s imple
quest ions.
MR. HEDGES: Oh . Okay. Well , I could work on that . I don ‟ t
know that I‟ l l ever have a complete l is t .
MR. ALEXANDER: I don ‟ t mean complete, but a mo re complete.
MR. HEDGES: More complete . Yeah, I could probably, I could
probably furnish a more complete l is t of s imple quest ions with a part ial ly
complete l is t of answers , and I ‟d be interested to do that .
Actual ly, I‟d also be interested in hearing f rom other people what
their s imple quest ions are, and actual ly, Karl , thank you for that , because i t ‟s ,
you know, anybody who has any of these quest ions is not the only person who
has i t , and I assume that I haven ‟ t been asked al l possible s imple quest ions .
So, but amongst al l of us , we could generate a good l is t of quest ions.
And in some cases , i t requires a l i t t le bi t of work to f igure out , to
think through the quest ions, and i t ‟s actual ly valuable . I f ind i t valuable, but
that ‟s the kind of s tuff I do.
Yeah.
VSM 41
MR. BORMAN: Hi . Geoffrey Borman, Universi ty of Wisconsin -
Madison.
Maybe I‟ l l have a s imple quest ion . One issue that comes up a lot
related to your discussion about blocking—maybe not a lot , but i t ‟s happened
to me more than once— i s this issue when you ‟re involved in a recrui t ing—
schools , for instance—a group of schools may come forward, or you may
act ively go out and f ind a cluster of schools that are wil l ing to part icipate in
your experiment , and those schools maybe don ‟ t necessari ly const i tute a real
cluster in the way that we think about them, in that they ‟re not real ly related
to one another . They may be, i f i t ‟s a nat ional s tudy from al l over the country,
and real ly not have much in common at al l .
What they do have in common, though, is that you were able to
round up this group of schools at one t ime, and you know they ‟re eager to
know what their s tatus is in your experiment , i f they ‟re t reatment or control ,
and perhaps your implementat ion schedule depends on kind of this rol l ing
cycle of randomizat ion.
I‟m just wondering how would you think about that general ly and
analyt ical ly where, you know, these clusters are more just bui l t out of
convenience rather than being associated analyt ical ly with the outcome
measure , and how would you think abo ut that?
MR. HEDGES: Well , I can tel l you i f i t turns out they ‟re not
associated with the outcome measure, you ‟re home free because they‟re not
real clusters . I mean they‟re not clusters in the s tat is t ical sense.
The fact that they arise over t ime is in a way not so much of a
VSM 42
problem, at least in my view. And I think this rol l ing recrui tment is real ly
more the rule than the except ion in a lot of s tudies . I mean some people
manage to recrui t al l at once, but i t ‟s something that is— I guess the crucial
thing there is to sort of ask yourself the quest ion— I would go back to kind of
the inference model quest ion:
You know am I happy to just infer to this group—and this group
of sort of temporal clusters , i f you wil l—and I might very wel l be—and t reat
these things as i f they‟re just , you know, they‟re f ixed effects i f they have
any effects and not worry about the heterogenei ty of t reatment effects .
Now, as soon as I say that , I think of one example in which i t
real ly mat tered in the early schools where i t had a much different t reatment
effect than the later schools , and then you were lef t wi th the puzzle about
what to do . If you think in the sort of condi t ional inference sense, you could
talk about the average effect of the early and the late.
I‟m caricaturing your example a l i t t le bi t , but i t does correspond
to one wel l -known example to some of us anyway. You know as soon as
your— there is a sort of a price for , intel lectual price for thinking in terms of
f ixed effects . You know you ‟re talking about the average t reatment effect in
the clusters , in the blocks you happen to choose.
And what i f the blocks have very different t reatment effects?
Well , the convent ional answer is take the average in some way . Maybe you
weight i t ; maybe you just take the unweighted ave rage. And that ‟s “ the
t reatment effect .”
But i f “ the t reatment effect” i s made up of very different
VSM 43
individual effects , then i t ‟s kind of unsat isfying . You know no effect plus
real ly huge effect—no effect in one block, huge effect in another block, the
average is somewhere in between, doesn ‟ t seem l ike a very good descript ion
of the data overal l .
So I think, I don ‟ t know, from my point of view, Geoff , your
quest ion is real ly easy i f there are no block t reatment interact ions in that case
and real ly hard i f there are . And i f there are, then I think they ‟re going to be,
there‟s going to be an essent ial ly contested quest ion about what the r ight
analysis is al though one can describe, one can describe the data and say, look,
the early adopters found a whopping t reatment effect in this case; the late
adopters didn ‟ t f ind much . And there ‟s a real di fference between the two.
And in the absence—and this kind of i l lust rates a weakness of
experiments— i f you don ‟ t have anything else than just the experimental data
to sort out the heterogenei ty of those effects , then i t ‟s real ly hard to interpret
them.
And I think al l of us who do this kind of work, even though we
may be proselyt izers for experiments at one level , are also proselyt izers for
col lect ing enough other kinds of data that you can interpret the experimental
data . And I know I‟m preaching to the choir in you, Geoff , because I know
you do that rout inely, but I think that ‟s probably essent ial .
Ah, now Vivian may very wel l want to take issue with what I said
about , in answer to Henry‟s quest ion because we know she works in this f ield .
So—
MS. WONG: Actual ly I was going to int roduce, hopeful ly, maybe
VSM 44
another s imple quest ion.
MR. HEDGES: Uh-oh.
MS. WONG: I was wondering— let ‟s say in the case of a s imple
random assignment s tudy, do you have comments or thoughts about doing the
analysis with a regression -adjusted resul ts or sort of s imple t reatment minus
control di fference?
I guess in some cases where I see that there ‟s no obvious reason
that randomizat ion didn ‟ t work, that a lot of t imes people s t i l l present sort of
these covariate adjusted resul ts for experiments , and to me that seems l ike i t
could introduce more bias , but—
MR. HEDGES: Aah . Well , you know, there is work on that .
There is Freedman ‟s work on how adjust ing for covariates can introduce bias
in the randomized experiment . But i t goes away asymptot ical ly, and i t ‟s not
very big. I guess one— I guess I would say I‟m sort of more comfortable with
centered analyses so that you ‟re not adjust ing the t reatmen t effect , but you ‟re
just adjust ing variat ion within t reatment groups.
But I think i f al l you ‟re doing is just increasing precis ion, I ‟m not
unhappy with that , Freedman aside.
If the resul t i s sensi t ive to that , then I ‟m a l i t t le more worried .
Not just , you know, that p goes from one s ide of 05 to the other , but you
actual ly get something that looks l ike a qual i tat ively different answer, then
I‟d be very worried about , but I wouldn ‟ t expect to get that in a randomized
experiment .
So I do think i t ‟s—so I‟m general ly pret ty comfortable with
VSM 45
covariate adjustments because in the real world we need them to get enough
power to do experiments with feasible numbers of schools . So I think i f we
decide we ‟re not going to do that , then we ‟re hoping IES gets a much b igger
budget .
[Laughter . ]
MR. HEDGES: And gives a lot of i t to us . Okay . Go ahead,
Carol .
MS. O ‟DONNELL: I won ‟ t say who I am in case this is a s tupid
quest ion.
[Laughter . ]
MS. O ‟DONNELL: But I think i t might bui ld off of what she just
asked, but how do you know, under what condi t ions do you use a variable as a
blocking variable versus a covariate?
MR. HEDGES: You know, there ‟s an old and a new l i terature on
this , you know. David Cox and Leonard Feldt and folks l ike that wrote about
i t . If you have a real ly high correlat ion, and basical ly i t depends— the answer,
one answer is that i t depends on the form of the relat ionship between the
covariate and the outcome, and i f you ‟re convinced you know i t , and you
know i t ‟s l inear , or you can l inearize i t , i f you can ‟ t do that , then using i t as
a l inear covariate is probably not possible.
If you do know that , then there ‟s s t i l l a quest ion of whether
blocking or analysis of covariance is more eff icient , and i t ‟s a l i t t le— that ‟s a
l i t t le bi t t r icky, but I thin k the general answer is , and I think there ‟s at least
one person in the room who might dispute this , I think the general answer is
VSM 46
that i f you have a real ly high correlat ion, the use of the l inear covariate
probably gives you a s l ight power advantage, and there ‟s a point at which
blocking is compet i t ive.
And so the sort of s imple answer, one answer would be you t ry to
f igure out what the power would be under ei ther ci rcumstance and then pick
the one that has the best power i f you ‟re not worried about the p otent ial bias
int roduced by using covariates .
And i t wasn ‟ t a s tupid quest ion . It ‟s a class ic quest ion, I should
say, because I can think of three or four papers on i t . It becomes t r icky. But
one of the things that I guess I should also ment ion is that i t does—knowing
quest ions about f ixed versus random effects also can, you know, and
condi t ional versus uncondi t ional inferences, what are you inferr ing to can
also enter into this . And I answered that kind of caval ier ly not using my own
rules .
You know if you think of the values you ‟ve observed, the values
of the potent ial blocking variable you ‟ve observed as being sampled in some
sense, then that has implicat ions for the analysis of covariance because the
classic analysis of covariance would argue that thos e things are f ixed, and i f
you got them by sample, and you ‟re serious about that , and i t ‟s one thing to
take a sample and say, okay, this is the universe I care about , i t ‟s another
thing to say I took a sample and I think of i t as al lowing me to help
general ize to a larger universe . So depending on how you think about the
covariate, there may be implicat ions for analysis .
Well , looks l ike I‟ve exhausted you . Oh, no. Jack has got another
VSM 47
quest ion.
MR. MOSTOW: When you have mult iple observat ions per
individual—sorry— I‟m st i l l Jack Mostow— i t hasn ‟ t changed.
[Laughter . ]
MR. MOSTOW: But I might want to .
MR. HEDGES: It would have more interest ing i f i t had changed.
MR. MOSTOW: When you have mult iple observat ions per
individual , you can control for indivi dual differences by actual ly throwing in
ident i ty, s tudent ident i ty as a variable, and then that controls for everything
whether you know what i t was or not .
MR. HEDGES: Yeah.
MR. MOSTOW: But you may hear a giant sucking sound of al l
your power going away, or you can put in the covariates for the things that
you suspect might be important but you might be missing something . How do
you decide which?
MR. HEDGES: Well , now I ‟m get t ing into terr i tory where I know
people wil l disagree . In a sense, making, ident i fying, put t ing in a dummy
variable for each individual is very much in the t radi t ion of making the
individuals f ixed effects .
I mean that would be the language economists would use, and
then the quest ion becomes are you interested in this set of ind ividuals and is
between-individual variabi l i ty to be—do you think—well , the quest ion
becomes is this a sample of individuals , and you want to general ize to a
broader set of individuals or—and you want that to be part of the s tat is t ical
VSM 48
r igmarole—or do you want to think of this set of individuals as being a set of
individuals about whom I can learn things?
And i f you want to think of the individuals as being, as being a
sample of individuals that you hope your resul ts general ize to , and you want
s tat is t ical—you want the sampling theory general izat ion model to be the way
to do i t , then my answer would be you ought to t reat those individuals as
having random effects .
Now many—now I know a s tandard approach to this analysis is to
make them al l f ixed— i s to , in a sense, to make them fixed effects , but you
don ‟ t have a sampling theory warrant any more for saying this should apply to
other people from the universe from whom these people were sampled, and
you have to make the ex tra -s tat is t ical kind of argument , a me chanis t ic kind of
argument , for general izat ion.
Now that may not be ; that may not be too big a price to pay
because you get a lot by creat ing individuals as f ixed effects . You get a lot of
precis ion for es t imat ing what ‟s going on with them, but , of course , you don ‟ t
get something for nothing . You got al l that precis ion by defining away
between-people variabi l i ty as being f ixed.
And i t ‟s the same, and i t ‟s not any different than I got a bunch of
schools and, boy, there ‟s a lot of between-school variabi l i ty; maybe I should
just , you know, make the schools f ixed effects , and at that point , you get a lot
more precis ion, but at the cost is that you ‟ve defined away a lot of the natural
variabi l i ty, and the s tat is t ics don ‟ t a l low you to bring i t back in .
Now, i t may be that i f you ‟re t rying to do something l ike a proof
VSM 49
of concept , you probably want to do— i t may be exact ly the r ight thing to do.
If you want to show that something works someplace under good
condi t ions, then the f ixed effects approach seems l ike a good one to me. If
you want to show that this is going to work i f we scale i t up, that sounds l ike
a bad approach.
I‟m expect ing somebody to disagree with this , wi th that , but
maybe I‟ l l be lucky and we ‟ l l run out of t ime before they get a chance to calm
down enough to say something.
Yes? No? Okay. Well , you know, I‟m here for another three
minutes so you can—
[Laughter . ]
MR. HEDGES: You can ask quest ions i f you l ike . Well , okay.
I‟ l l put in an advert isement then i f we ‟ve got another few minutes . This is a
great— the IES Research Conference is real ly a great gathering, and i t i s
t remendously, wel l , grat i fying to me to see how many good people there are
doing serious educat ion scient i f ic work.
One of the other places where you can f ind lots of people w ho are
interested in doing good scient i f ic work in educat ion is the Society for
Research on Educat ional Effect iveness , a new scient i f ic society that was
formed a couple of years ago, and i t was formed by people l ike you . Well ,
people l ike me in part icular .
[Laughter . ]
MR. HEDGES: And the idea was to have something kind of l ike
the IES Research Conference, at least to have the same kind of populat ion as
VSM 50
the IES Research Conference . It ‟s open to anybody. I mean i t ‟s not IES
specif ic , but , you know, people who are interested in doing r igorous research
on educat ion, part icularly research that t r ies to sort out causal effects .
We‟re not a bunch of folks who just do experiments . There ‟s a lot
of folks who do longi tudinal s tudies , who do qual i tat ive work, who d o
measurement , who do s tat is t ics , but the meet ings—we‟ve had a couple of
nat ional meet ings so far . They‟ve been a lot of fun, for me anyway, and, you
know, I had to do a lot of adminis t rat ive detai ls so i f they were fun for me, I
think they were more fun for other folks .
I would urge you to think about joining the Society for Research
on Educat ional Effect iveness or i f not doing that , at least maybe you want to
come to one of our conferences . The next conference is going to be roughly a
year from now in ea rly March of 2010 . It wi l l be in Washington . You can
expect there to be a few hundred, a smal l few hundred, of individuals that are
in fact a lot l ike you . In fact , some of them are you . But some of you probably
aren ‟ t part of i t .
We have a journal , the Journal for Research on Educat ional
Ef fect iveness , and I think one of the things that has been impressive to me
about the group so far is that when I go to the conference, there are always
more things that I want to go to than there ‟s , you know, copies of m e to go to
them.
And so I f ind that I want to go to the methodological talks , but I
also want to go to a lot of substant ive talks because they ‟re fascinat ing, and I
also f ind that the people that I meet there are interest ing and fun, and I learn
VSM 51
from them, and I would commend i t to al l of you to at least think about .
You can f ind informat ion about us at
www.educat ionaleffect iveness .org, and or you can just sort of Google Society
for Research on Educat ional Effect iveness . And anyway, I think that is
another place where you can not only present your own work, but you can get
a chance to see high qual i ty, uniformly high qual i ty educat ional research in a
scient i f ic mold, and i t ‟s a conference that ‟s smal l enough that you can talk to
people in the hal l , and th at ‟s real ly part of what we t ry to do.
We t ry to create a program that gives opportuni ty for interact ion,
and we also do things l ike we provide food for people so that people don ‟ t
run off at luncht ime and so they get a chance to s tay together , and we usu al ly
have a few events that are yet another excuse for mingl ing.
So I would urge you to at least consider ei ther joining the society
or coming to our meet ing or both and maybe even thinking about present ing
your own work there .
We wil l look forward to s eeing as many of you as we can . I guess
i t ‟s okay for me to make that commercial .
[Laughter . ]
MR. HEDGES: And i f there is no other quest ion, the fel low in
the back was holding up a s ign that was threatening to cut me off , so perhaps
we should end.
[Applause.]
[Whereupon, at 2:45 p.m. , the panel session concluded.]