Understanding and misunderstanding randomized controlled trials

Angus Deaton and Nancy Cartwright

Princeton University, NBER, and University of Southern California
Durham University and UC San Diego

This version, October 2017

We acknowledge helpful discussions with many people over the several years this paper has been in preparation. We would particularly like to note comments from seminar participants at Princeton, Columbia, and Chicago, the CHESS research group at Durham, as well as discussions with Orley Ashenfelter, Anne Case, Nick Cowen, Hank Farber, Jim Heckman, Bo Honoré, Chuck Manski, and Julian Reiss. Ulrich Mueller had a major influence on shaping Section 1. We have benefited from generous comments on an earlier version by Christopher Adams, Tim Besley, Chris Blattman, Sylvain Chassang, Jishnu Das, Jean Drèze, William Easterly, Jonathan Fuller, Lars Hansen, Jeff Hammer, Glenn Harrison, Macartan Humphreys, Michal Kolesár, Helen Milner, Tamlyn Munslow, Suresh Naidu, Lant Pritchett, Dani Rodrik, Burt Singer, Richard Williams, Richard Zeckhauser, and Steve Ziliak. Cartwright’s research for this paper has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 667526 K4U), the Spencer Foundation, and the National Science Foundation (award 1632471). Deaton acknowledges financial support from the National Institute on Aging through the National Bureau of Economic Research, Grants 5R01AG040629-02 and P01AG05842-14 and through Princeton University’s Roybal Center, Grant P30 AG024928.


ABSTRACT

RCTs would be more useful if there were more realistic expectations of them and if their pitfalls were better recognized. For example, and contrary to many claims in the applied literature, randomization does not equalize everything but the treatment across treatments and controls, it does not automatically deliver a precise estimate of the average treatment effect (ATE), and it does not relieve us of the need to think about (observed or unobserved) confounders. Estimates apply to the trial sample only, sometimes a convenience sample, and usually selected; justification is required to extend them to other groups, including any population to which the trial sample belongs. Demanding “external validity” is unhelpful because it expects too much of an RCT while undervaluing its contribution. Statistical inference on ATEs involves hazards that are not always recognized. RCTs do indeed require minimal assumptions and can operate with little prior knowledge. This is an advantage when persuading distrustful audiences, but it is a disadvantage for cumulative scientific progress, where prior knowledge should be built upon and not discarded. RCTs can play a role in building scientific knowledge and useful predictions but they can only do so as part of a cumulative program, combining with other methods, including conceptual and theoretical development, to discover not “what works,” but “why things work”.


Introduction

Randomized controlled trials (RCTs) are widely visible in economics today and have been used in the subject at least since the 1960s (see Greenberg and Shroder (2004) for a compendium). It is often claimed that such trials can discover “what works” in economics, as well as in political science, education, and social policy. Among both researchers and the general public, RCTs are perceived to yield causal inferences and estimates of average treatment effects (ATEs) that are more reliable and more credible than those from any other empirical method. They are taken to be largely exempt from the myriad econometric problems that characterize observational studies, to require minimal substantive assumptions, little or no prior information, and to be largely independent of “expert” knowledge that is often regarded as manipulable, politically biased, or otherwise suspect.

There are now “What Works” centers using and recommending RCTs in a range of areas of social concern across Europe and the Anglophone world. These centers see RCTs as their preferred tool and indeed often prefer RCT evidence lexicographically. As one of many examples, the US Department of Education’s standard for “strong evidence of effectiveness” requires a “well-designed and implemented” RCT; no observational study can earn such a label. This “gold standard” claim about RCTs is less common in economics, but Imbens (2010, 407) writes that “randomized experiments do occupy a special place in the hierarchy of evidence, namely at the very top.” The Abdul Latif Jameel Poverty Action Lab (J-PAL), whose stated mission is “to reduce poverty by ensuring that policy is informed by scientific evidence”, advertises that its affiliated professors “conduct randomized evaluations to test and improve the effectiveness of programs and policies aimed at reducing poverty”, J-PAL (2017). The lead page of its website (echoed in the ‘Evaluation’ section) notes “843 ongoing and completed randomized evaluations in 80 countries” with no mention of any studies that are not randomized.

In medicine, the gold standard view has long been widespread, e.g. for drug trials by the FDA; a notable exception is the recent paper by Frieden (2017), ex-director of the U.S. Centers for Disease Control and Prevention, who lists key limitations of RCTs as well as a range of contexts where RCTs, even when feasible, are dominated by other methods.

We argue that any special status for RCTs is unwarranted. Which method is most likely to yield a good causal inference depends on what we are trying to discover as well as on what knowledge is already available. When little prior knowledge is available, no method is likely to yield well-supported conclusions. This paper is not a criticism of RCTs in and of themselves, let alone an attempt to identify good and bad studies. Instead, we will argue that, depending on what we want to discover, why we want to discover it, and what we already know, there will often be superior routes of investigation.

We present two sets of arguments. The first is an enquiry into the idea that ATEs estimated from RCTs are likely to be closer to the truth than those estimated in other ways. The second explores how to use the results of RCTs once we have them. In the first section, our discussion runs in familiar terms of bias and efficiency, or expected loss. None of this material is new, but we know of no similar treatment, and we wish to dispute many of the claims that are frequently made in the applied literature. Some routine misunderstandings are: (a) randomization ensures a fair trial by ensuring that, at least with high probability, treatment and control groups differ only in the treatment; (b) RCTs provide not only unbiased estimates of ATEs but also precise estimates; (c) statistical inference in RCTs, which requires only the simple comparison of means, is straightforward, so that standard significance tests are reliable.

Nothing we say in the paper should be taken as a general argument against RCTs; we are simply trying to challenge unjustifiable claims, and expose misunderstandings. We are not against RCTs, only magical thinking about them. The misunderstandings are important because we believe that they contribute to the common perception that RCTs always provide the strongest evidence for causality and for effectiveness.


In the second part of the paper, we discuss how to use the evidence from RCTs. The non-parametric and theory-free nature of RCTs, which is arguably an advantage in estimation, is often a disadvantage when we try to use the results outside of the context in which the results were obtained; credibility in estimation can lead to incredibility in use. Much of the literature, perhaps inspired by Campbell and Stanley’s (1963) famous “primacy of internal validity”, appears to believe that internal validity is not only necessary but almost sufficient to guarantee the usefulness of the estimates in different contexts. But you cannot know how to use trial results without first understanding how the results from RCTs relate to the knowledge that you already possess about the world, and much of this knowledge is obtained by other methods. Once the commitment has been made to seeing RCTs within this broader structure of knowledge and inference, and when they are designed to enhance it, they can be enormously useful, not just for warranting claims of effectiveness but for scientific progress more generally. Cumulative science is not advanced through magical thinking.

The literature on the precision of ATEs estimated from RCTs goes back to the very beginning. Gosset (writing as ‘Student’) never accepted Fisher’s arguments for randomization in agricultural field trials and argued convincingly that his own non-random designs for the placement of treatment and controls yielded more precise estimates of treatment effects (see Student (1938) and Ziliak (2014)). Gosset worked for Guinness where inefficiency meant lost revenue, so he had reasons to care, as should we. Fisher won the argument in the end, not because Gosset was wrong about efficiency, but because, unlike Gosset’s procedures, randomization provides a sound basis for statistical inference, and thus for judging whether an estimated ATE is different from zero by chance. Moreover, Fisher’s blocking procedures can limit the inefficiency from randomization (see Yates (1939)). Gosset’s reservations were echoed much later in Savage’s (1962) comment that a Bayesian should not choose the allocation of treatments and controls at random but in such a way that, given what else is known about the topic and the subjects, their placement reveals the most to the researcher. These issues about how to incorporate prior information into randomized trials are central to Section 1.
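A small simulation can illustrate the efficiency point. What follows is a minimal sketch under stated assumptions, not a reconstruction of Gosset’s, Fisher’s, or Yates’s actual designs: a hypothetical trial sample of 40 plots whose untreated yield depends linearly on one known cause, `x` (think of soil fertility), with a constant treatment effect of 1. It compares complete randomization with a matched-pair design, one simple form of blocking, in which plots are paired on `x` and a coin decides which member of each pair is treated. The function name `simulate` and all numerical values are illustrative.

```python
import numpy as np


def simulate(design, x, beta, gamma, n_reps, rng):
    """Difference-in-means estimates over repeated randomizations of one fixed
    trial sample. Outcomes follow a linear model with a single known cause x
    and a constant treatment effect beta (illustrative assumptions)."""
    n = len(x)
    y0 = gamma * x                         # untreated outcome
    y1 = y0 + beta                         # treated outcome
    pairs = np.argsort(x).reshape(-1, 2)   # units adjacent in x form a pair
    estimates = np.empty(n_reps)
    for r in range(n_reps):
        t = np.zeros(n, dtype=bool)
        if design == "complete":
            # complete randomization: any half of the sample may be treated
            t[rng.choice(n, size=n // 2, replace=False)] = True
        elif design == "paired":
            # blocking: within each pair, a coin picks the treated unit
            coin = rng.integers(0, 2, size=len(pairs))
            t[pairs[np.arange(len(pairs)), coin]] = True
        else:
            raise ValueError(f"unknown design: {design}")
        y = np.where(t, y1, y0)
        estimates[r] = y[t].mean() - y[~t].mean()
    return estimates


rng = np.random.default_rng(0)
x = rng.normal(size=40)   # hypothetical fertility of 40 plots (must be even)
complete = simulate("complete", x, beta=1.0, gamma=3.0, n_reps=5_000, rng=rng)
paired = simulate("paired", x, beta=1.0, gamma=3.0, n_reps=5_000, rng=rng)

print(f"complete: mean {complete.mean():.3f}, sd {complete.std():.3f}")
print(f"paired:   mean {paired.mean():.3f}, sd {paired.std():.3f}")
```

With these values both designs should average close to the true effect of 1 across re-randomizations, while the spread under pairing should be several times smaller. The gain comes entirely from prior knowledge of which cause to block on.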


In economics, the strengths and weaknesses of RCTs are well explored in the volumes by Hausman and Wise (1985) and by Garfinkel and Manski (1992); in the latter, the introduction by Garfinkel and Manski is a balanced summary of what randomized trials can and cannot do. The paper in that volume by Heckman (1992) raises many of the issues that he and his coauthors have explored in subsequent papers, see in particular Heckman and Smith (1995), and Heckman, Lalonde and Smith (1999) who focus on labor market experiments. Manski (2013) contains a good summary of both strengths and weaknesses.

There is also a more contested recent literature. On the one hand, there are procedures that take as fundamental the unrestricted individual treatment effects of individuals and seek non-parametric approaches to estimating their average. On the other hand, these procedures are contrasted with an approach that uses elements of economic theory to define parameters of interest and to identify magnitudes that are likely to be invariant to policy manipulation or across contexts, where invariance is defined in the sense of Hurwicz (1966). The introduction in Imbens and Wooldridge (2009) provides an eloquent defense of the treatment-effect formulation. It emphasizes the credibility that comes from a theory-free specification with almost unlimited heterogeneity in treatment effects. The introduction in Heckman and Vytlacil (2007) makes an equally eloquent case against, noting that the crucial ingredients of treatments in RCTs are often not clearly specified (so that we often do not know what the treatment really is) and that the treatment effects are hard to link to invariant parameters that would be useful elsewhere. Aspects of the same debate feature in Imbens (2010), Athey and Imbens (2017), Angrist and Pischke (2017), Heckman (2005, 2008, 2010) and Heckman and Urzua (2010).

Deaton (2010) complains about the use of instrumental variables, including randomization, as a substitute for thinking about and constructing models of economic development. He argues against the idea that using RCTs to evaluate projects to discover “what works” can ever yield a systematic body of scientific knowledge that can be used to reduce or eliminate poverty. That paper is an argument against the usefulness of the heterogeneous treatment approach. It argues that refusing to model heterogeneity, though avoiding assumptions, precludes the sort of cumulative research program that might yield useful policy. The paper’s claim that RCTs have no special claim to generate credible and useful knowledge was challenged by Imbens (2010); some of his arguments are answered below. Cartwright (2007) and Cartwright and Munro (2010) challenge any “gold standard” view of RCTs. Cartwright (2011, 2012, 2016) and Cartwright and Hardie (2012) focus on the question of how to use the results of RCTs and what we can learn when an experiment shows that some policy works somewhere. Section 2 pursues these issues in general and through case studies.

Section 1: Do RCTs give good estimates of Average Treatment Effects?

In this section, we explore how to estimate average treatment effects (ATEs) and the role of randomization. We note first that estimating ATEs is only one of many uses for the data generated by an RCT. We start from a trial sample, a collection of subjects that will be allocated randomly to either the treatment or control arm of the trial. This “sample” might be, but rarely is, a random sample from some population of interest. More frequently, it is selected in some way, for example restricted to those willing to participate, or is simply a convenience sample that is available to the trialists. Given random allocation to treatments and controls, the data from the trial allow the identification of two distributions, $F_1(Y_1)$ and $F_0(Y_0)$, of outcomes $Y_1$ and $Y_0$ in the treated and untreated cases within the trial sample. The estimated ATE is the difference in means of the two distributions and is the focus of much of the literature in social science and medicine. Yet policymakers and researchers may well be interested in other features of the two distributions. For example, if $Y$ is income, they may be interested in whether a treatment reduced income inequality, or in what it did to the 10th or 90th percentiles of the income distribution, even though different people occupy those percentiles in the treatment and control distributions (see Bitler et al. (2006) for an example in US welfare policy). Cancer trials standardly use the median difference in survival, which compares the times until half the patients have died in each arm. More comprehensively, policymakers may wish to compare expected utilities for treated and untreated under the two distributions and consider optimal expected-utility maximizing treatment rules conditional on the characteristics of subjects (see Manski (2004) and Manski and Tetenov (2016); Bhattacharya and Dupas (2012) contains an application). These uses are important, but we focus on ATEs here and do not consider these other uses of RCTs any further in this paper.
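To make the distinction concrete, here is a minimal sketch, using simulated data only, of how the difference in means and differences at particular percentiles can tell different stories. It assumes hypothetical lognormal incomes and a hypothetical transfer that is phased out as income rises. The percentile differences compare the two marginal distributions and, as the text stresses, are not effects on any particular person.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000  # per arm; large so that sampling noise is small

# Hypothetical incomes: independent draws from the same lognormal
# distribution, one sample for each arm of the trial.
income_control = rng.lognormal(mean=10.0, sigma=0.8, size=n)
base = rng.lognormal(mean=10.0, sigma=0.8, size=n)
# Hypothetical treatment: a transfer of 5,000 withdrawn at 25 cents per
# unit of pre-transfer income, so it vanishes above 20,000.
transfer = np.maximum(0.0, 5_000.0 - 0.25 * base)
income_treated = base + transfer

ate = income_treated.mean() - income_control.mean()
qte = {q: np.percentile(income_treated, q) - np.percentile(income_control, q)
       for q in (10, 50, 90)}

print(f"difference in means: {ate:,.0f}")
for q, d in qte.items():
    print(f"difference at the {q}th percentile: {d:,.0f}")
```

Under these assumptions the difference at the 10th percentile should be large and the difference at the 90th close to zero, a pattern the difference in means alone cannot reveal.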

1.1 What does randomization do?

A useful way to think about the estimation of treatment effects is to use a schematic linear causal model of the form:

$$Y_i = \beta_i T_i + \sum_{j=1}^{J} \gamma_j x_{ij} \qquad (1)$$

where $Y_i$ is the outcome for unit $i$, $T_i$ is a dichotomous (1, 0) treatment dummy indicating whether or not $i$ is treated, and $\beta_i$ is the individual treatment effect of the treatment on $i$. The $x$’s are the observed or unobserved other linear causes of the outcome, and we suppose that (1) captures a minimal set of causes of $Y_i$ sufficient to fix its value. $J$ may be (very) large. Because the heterogeneity of the individual treatment effects, $\beta_i$, is unrestricted, we allow the possibility that the treatment interacts with the $x$’s or other variables, so that the effects of $T$ can depend on any other variables. Note that we do not need $i$ subscripts on the $\gamma$’s that control the effects of the other causes; if their effects differ across individuals, we include the interactions of individual characteristics with the original $x$’s as new $x$’s. Given that the $x$’s can be unobservable, this is not restrictive.

Consider an experiment that aims to tell us something about the treatment effects; this might or might not use randomization. Either way, we can represent the treatment group as having $T_i = 1$ and the control group as having $T_i = 0$. Given the study (or trial) sample, subtracting the average outcomes among the controls from the average outcomes among the treatments, we get

$$\bar{Y}_1 - \bar{Y}_0 = \bar{\beta}_1 + \sum_{j=1}^{J} \gamma_j \left(\bar{x}_{1j} - \bar{x}_{0j}\right) = \bar{\beta}_1 + \left(\bar{S}_1 - \bar{S}_0\right) \qquad (2)$$

where bars denote means within the treatment group (subscript 1) or the control group (subscript 0). The first term on the far-right-hand side of (2), which is the ATE in the trial sample, is what we want, but the second term or error term, which is the sum of the net average balance of other causes across the two groups, will generally be non-zero and needs to be dealt with somehow. We get what we want when the means of all the other causes are identical in the two groups, or more precisely (and less onerously) when the sum of their net differences $\bar{S}_1 - \bar{S}_0$ is zero; this is the case of perfect balance. With perfect balance, the difference between the two means is exactly equal to the average of the treatment effects among the treated, so that we have the ultimate precision in that we know the truth in the trial sample, at least in this linear case. As always, the “truth” here refers to the trial sample, and it is always important to be aware that the trial sample may not be representative of the population that is ultimately of interest, including the population from which the trial sample comes; any such extension requires further argument.
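Equation (2) is an accounting identity that holds for any single allocation, randomized or not. The following minimal numerical check, with made-up values for the $x$’s, the $\gamma$’s, and heterogeneous $\beta_i$, builds outcomes from the linear model (1) and confirms that the difference in means equals the mean effect among the treated, $\bar{\beta}_1$, plus the imbalance term $\bar{S}_1 - \bar{S}_0$. All names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n, J = 200, 5
x = rng.normal(size=(n, J))                     # other causes x_ij
gamma = rng.normal(size=J)                      # effects gamma_j, common to all units
beta = rng.normal(loc=1.0, scale=0.5, size=n)   # individual effects beta_i

T = np.zeros(n, dtype=int)
T[rng.choice(n, size=n // 2, replace=False)] = 1   # one random allocation
Y = beta * T + x @ gamma                           # equation (1)

treated, control = T == 1, T == 0
diff_in_means = Y[treated].mean() - Y[control].mean()   # left-hand side of (2)
att = beta[treated].mean()                              # bar-beta_1
S = x @ gamma                                           # net effect of other causes
imbalance = S[treated].mean() - S[control].mean()       # bar-S_1 - bar-S_0

print(f"difference in means {diff_in_means:.3f} = "
      f"ATE among treated {att:.3f} + imbalance {imbalance:.3f}")
assert np.isclose(diff_in_means, att + imbalance)       # (2) holds exactly
```

Rerunning with a different seed changes the split between the two terms but never the identity; only the size of the imbalance term is at stake.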

How do we get balance, or something close to it? What, exactly, is the role of randomization? In a laboratory experiment, where there is usually much prior knowledge of the other causes, the experimenter has a good chance of controlling (or subtracting away the effects of) the other causes, aiming to ensure that the last term in (2) is close to zero. Failing such knowledge and control, an alternative is matching, which is frequently used in statistical, medical, and econometric work. For each subject, a match is found that is as close as possible on all suspected causes, so that, once again, the last term in (2) can be kept small. When we have a good idea of the causes, matching may also deliver a precise estimate. Of course, when there are unknown or unobservable causes that have important effects, neither laboratory control nor matching offers protection.
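The limits of matching can be seen in a simple simulated example. The sketch below assumes a hypothetical observational setting (not an RCT) in which units select into treatment on the basis of both a measured cause, `x_obs`, and an unmeasured one, `u`. It then applies one-to-one nearest-neighbour matching on `x_obs`, which is only one of many matching schemes. Matching removes most of the imbalance in `x_obs` but leaves the imbalance in `u`, and the matched estimate is pulled away from the true effect accordingly.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000
x_obs = rng.normal(size=n)   # a suspected cause that we can measure
u = rng.normal(size=n)       # a cause we cannot observe (hypothetical)

# Hypothetical self-selection: units with high x_obs or high u are more
# likely to end up treated, so treatment is NOT randomized here.
p_treat = 1.0 / (1.0 + np.exp(-(x_obs + u)))
T = rng.random(n) < p_treat
beta = 1.0                   # true treatment effect, the same for everyone
Y = beta * T + 2.0 * x_obs + 2.0 * u

treated = np.flatnonzero(T)
controls = np.flatnonzero(~T)

# One-to-one nearest-neighbour matching on x_obs, with replacement:
# for every treated unit, pick the control whose x_obs is closest.
dist = np.abs(x_obs[treated][:, None] - x_obs[controls][None, :])
match = controls[dist.argmin(axis=1)]

gap_x = np.mean(x_obs[treated] - x_obs[match])   # imbalance in the observed cause
gap_u = np.mean(u[treated] - u[match])           # imbalance in the unobserved cause
estimate = np.mean(Y[treated] - Y[match])        # matched estimate of beta

print(f"mean gap in x_obs: {gap_x:.3f}")
print(f"mean gap in u:     {gap_u:.3f}")
print(f"matched estimate:  {estimate:.3f} (true effect {beta})")
```

Matching on `u` as well would fix the problem here, but only because the simulation lets us see `u`; in practice that is exactly what cannot be done.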

What does randomization do? Since the treatments and controls come from the same underlying distribution, randomization guarantees, by construction, that the last term on the right in (2) is zero in expectation, subject to the caveat that no correlations of the $x$’s with $T$ are introduced post-randomization, for example by subjects not accepting their assignment. The expectation here is taken over repeated randomizations on the trial sample, each with its own allocation of treatments and controls. Assuming that our caveat holds, the last term in (2) will be zero when averaged over this infinite number of (entirely hypothetical) replications, and the average of the estimated ATEs will be the true ATE in the trial sample. So the difference in means, $\bar{Y}_1 - \bar{Y}_0$, delivers an unbiased estimate of the ATE among the treated in the trial sample, and it does so whether or not the causes are observed. Unbiasedness does not require us to know anything about the other causes, though it does require that they not change after randomization so as to make them correlated with the treatment, which is an important caveat to which we shall return. Of course, none of this is true in any one trial, where the difference in means will be equal to the average treatment effect among those treated plus the term that reflects the imbalance in the net effects of the other causes. We do not know the size of this error term, and there is nothing in randomization that limits its size; by chance, the randomization in our single trial can over-represent one or more important excluded causes in one arm over the other, in which case there will be a difference between the means of the two groups that is not caused by the treatment.
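The distinction between balance in expectation and balance in a single trial can be checked numerically. This minimal sketch keeps one hypothetical trial sample fixed (made-up values for the other causes and for heterogeneous effects $\beta_i$), re-randomizes it many times, and records the difference in means each time. The average over re-randomizations should sit close to the trial-sample ATE, which is the unbiasedness result, while a sizeable share of individual allocations miss it by a wide margin.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=(n, 3))                     # other causes, observed or not
gamma = np.array([2.0, -1.0, 0.5])              # illustrative effects of the other causes
beta = rng.normal(loc=1.0, scale=0.5, size=n)   # heterogeneous treatment effects
S = x @ gamma                                   # net effect of other causes, unit by unit
sample_ate = beta.mean()                        # the "truth" in this trial sample

n_reps = 20_000
estimates = np.empty(n_reps)
for r in range(n_reps):
    T = np.zeros(n, dtype=bool)
    T[rng.choice(n, size=n // 2, replace=False)] = True
    Y = beta * T + S                            # equation (1)
    estimates[r] = Y[T].mean() - Y[~T].mean()

errors = estimates - sample_ate
print(f"trial-sample ATE:            {sample_ate:.3f}")
print(f"mean over re-randomizations: {estimates.mean():.3f}")
print(f"single trials off by more than 0.5: {np.mean(np.abs(errors) > 0.5):.0%}")
```

Nothing in the loop uses knowledge of `x` or `gamma`; unbiasedness comes from the allocation mechanism alone, and so does the spread of the errors.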

The unbiasedness result can easily be compromised. In particular, the treatment must not be correlated with any other cause. Random assignment is designed to aid with this, but it is not sufficient if, for example, there is a lack of blinding so that individuals are aware of their assignment, or if those administering the treatment are so aware, and if that awareness triggers another cause. Similarly, researchers sometimes return to individuals who were randomized years before, so that there has been time for the subjects or others to learn their assignment or for other causes to be influenced by the assignment. This again opens up the possibility of unbalanced effects of causes other than the treatment we are interested in. We have already noted that unbiasedness refers to the trial sample, which may or may not be representative of the population of interest.

If we were to repeat the trial many times, the over-representation of the unbalanced causes will sometimes be in the treatments and sometimes in the controls. The imbalance will vary over replications of the trial, and although we cannot see this from our single trial, we should be able to capture its effects on our estimate of the ATE from an estimated standard error. This was Fisher’s innovation: not that randomization balanced other causes between treatments and controls but that, conditional on our caveat above, randomization provides the basis for calculating the size of the error. Getting the standard error and associated significance statements right is of the greatest importance. Given the absence of treatment-related post-randomization changes in other causes, randomization yields an unbiased estimate of the ATE in the trial sample as well as a sound method for measuring error of estimation in that sample; therein lies its virtue, not that it yields precise estimates through balance.
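The following minimal sketch, again with simulated and purely illustrative values, shows what randomization buys in terms of inference. From a single trial it computes the difference in means and the usual unequal-variance standard error (often called the Neyman estimator, which is conservative when effects vary across units). Because the simulation knows every unit’s outcomes, it can also compute the actual spread of the estimator over hypothetical re-randomizations for comparison. It ends with a Fisher-style randomization test of the sharp null that the treatment affects no one. The helper names `assign` and `diff_in_means` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 60
S = rng.normal(scale=2.0, size=n)              # net effect of other causes, unobserved
beta = rng.normal(loc=0.8, scale=0.3, size=n)  # heterogeneous effects (hypothetical)


def assign(rng):
    """Completely randomized allocation of half the trial sample to treatment."""
    T = np.zeros(n, dtype=bool)
    T[rng.choice(n, size=n // 2, replace=False)] = True
    return T


def diff_in_means(Y, T):
    return Y[T].mean() - Y[~T].mean()


# The single trial actually run: this is all the trialist gets to see.
T = assign(rng)
Y = beta * T + S
estimate = diff_in_means(Y, T)
se = np.sqrt(Y[T].var(ddof=1) / T.sum() + Y[~T].var(ddof=1) / (~T).sum())

# Simulation-only check (requires knowing beta and S): the actual spread
# of the estimator over hypothetical re-randomizations of the same sample.
draws = []
for _ in range(10_000):
    t = assign(rng)
    draws.append(diff_in_means(beta * t + S, t))
true_sd = np.std(draws)

# Randomization test of the sharp null that the treatment affects no one:
# under that null the observed Y would not change with the allocation,
# so we re-randomize the observed outcomes and compare.
null_draws = np.array([diff_in_means(Y, assign(rng)) for _ in range(10_000)])
p_value = np.mean(np.abs(null_draws) >= abs(estimate))

print(f"estimate {estimate:.3f}, estimated SE {se:.3f}, "
      f"simulated SD {true_sd:.3f}, randomization p-value {p_value:.3f}")
```

The estimated standard error tracks the simulated spread without any knowledge of `S`, which is Fisher’s point; neither number is small merely because the trial was randomized.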

1.2 Misunderstandings: claiming too much

Everything so far should be perfectly familiar, but exactly what randomization does is frequently lost in the practical literature. There is often confusion between perfect control, on the one hand (as in a laboratory experiment or perfect matching with no unobservable causes), and control in expectation on the other, which is what randomization contributes. If we knew enough about the problem to be able to control well, that is what we would do. Randomization is an alternative when we do not know enough, but is generally inferior to good control. We suspect that at least some of the popular and professional enthusiasm for RCTs, as well as the belief that they are precise by construction, comes from misunderstandings about balance. These misunderstandings are not so much among the trialists who will often give a correct account when pressed. They come from imprecise statements by trialists that are taken literally by the lay audience that the trialists are keen to reach.

Such a misunderstanding is well captured by a quote from the second edition of the online manual on impact evaluation jointly issued by the Inter-American Development Bank and the World Bank (the first, 2011 edition is similar):

“We can be confident that our estimated impact constitutes the true impact of the program, since we have eliminated all observed and unobserved factors that might otherwise plausibly explain the difference in outcomes.” Gertler, Martinez, Premand, Rawlings, and Vermeersch (2016, 69).


This statement is false, because it confuses actual balance in any single trial with balance in expectation over many (hypothetical) trials. If it were true, and if all factors were indeed controlled (and no imbalances were introduced post-randomization), the difference would be an exact measure of the average treatment effect among the treated in the trial population (at least in the absence of measurement error). We should not only be confident of our estimate but, as the quote says, we would know that it is the truth. Note that the statement contains no reference to sample size; we get the truth by virtue of balance, not from a large number of observations.

A similar quote comes from John List, one of the most imaginative and successful scholars who use RCTs:

“complications that are difficult to understand and control represent key reasons to conduct experiments, not a point of skepticism. This is because randomization acts as an instrumental variable, balancing unobservables across control and treatment groups.” Al-Ubaydli and List (2013) (italics in the original).

And from Dean Karlan, founder and President of Yale’s Innovations for Poverty Action, which runs development RCTs around the world:

“As in medical trials, we isolate the impact of an intervention by randomly assigning subjects to treatments and control groups. This makes it so that all those other factors which could influence the outcome are present in treatment and control, and thus any difference in outcome can be confidently attributed to the intervention.” Karlan, Goldberg and Copestake (2009)

And from the medical literature, from a distinguished psychiatrist who is deeply skeptical of the use of evidence from RCTs:

“The beauty of a randomized trial is that the researcher does not need to understand all the factors that influence outcomes. Say that an undiscovered genetic variation makes certain people unresponsive to medication. The randomizing process will ensure—or make it highly probable—that the arms of the trial contain equal numbers of subjects with that variation. The result will be a fair test.” (Kramer, 2016, p. 18)


Claims are even made that RCTs reveal knowledge without possibility of error. Judy Gueron, the long-time president of MDRC, which has been running RCTs on US government policy for 45 years, asks why federal and state officials were prepared to support randomization in spite of frequent difficulties and in spite of the availability of other methods and concludes that it was because “they wanted to learn the truth,” Gueron and Rolston (2013, 429). There are many statements of the form “We know that [project X] worked because it was evaluated with a randomized trial,” Dynarski (2015).

It is common to treat the ATE from an RCT as if it were the truth, not just in the trial sample but more generally. In economics, a famous example is Lalonde’s (1986) study of training programs, whose results were at odds with a number of previous non-randomized studies. The paper prompted a large-scale re-examination of the observational studies to try to bring them into line, though it now seems just as likely that the differences lie in the fact that the study results apply to different populations (Heckman, Lalonde, and Smith (1999)). In epidemiology, Davey-Smith and Ibrahim (2002) state that “observational studies propose, RCTs dispose”. A good example is the RCT of hormone replacement therapy (HRT) for post-menopausal women. HRT had previously been supported by positive results from a high-quality and long-running observational study, but the RCT was stopped in the face of excess deaths in the treatment group. The negative result of the RCT led to widespread abandonment of the therapy, which might have been a mistake (see Vandenbroucke (2009) and Frieden (2017)). Yet the medical and popular literature routinely states that the RCT was right and the earlier study wrong, simply because the earlier study was not randomized. The gold standard or “truth” view does harm when it undermines the obligation of science to reconcile RCT results with other evidence in a process of cumulative understanding.

The false belief in automatic precision suggests that we need pay no attention to the other causes in (1) or (2). Indeed, Gerber and Green (2012), in their standard text for RCTs in political science, write that running an RCT is “a research strategy that does not require, let alone measure, all potential confounders.” This is true if we are happy with estimates that are arbitrarily far from the truth, just so long as the errors cancel out over a series of imaginary experiments. In reality, the causality that is being attributed to the treatment might, in fact, be coming from an imbalance in some other cause in our particular trial; limiting this requires serious thought about possible confounders.

1.3 Sample size, balance, and precision

At the time of randomization and in the absence of post-randomization changes in other causes, a trial is more likely to be balanced when the sample size is large. As the sample size tends to infinity, the means of the $x$’s in the treatment and control groups will become arbitrarily close. Yet this is of little help in finite samples. As Fisher (1926) noted: “Most experimenters on carrying out a random assignment will be shocked to find how far from equally the plots distribute themselves,” quoted in Morgan and Rubin (2012). Even with very large sample sizes, if there is a large number of causes, balance on each cause may be infeasible. Even with just three causes with three values each, there are 27 cells to balance, and in most social and medical cases there will be more. Vandenbroucke (2004) notes that there are three billion base pairs in the human genome, many or all of which could be relevant prognostic factors for the biological outcome that we are seeking to influence. It is true, as (2) makes clear, that we do not need balance on each cause individually, only on their net effect, the term $\bar{S}_1 - \bar{S}_0$. But consider the human genome base pairs. Out of all of those billions, only one might be important, and if that one is unbalanced, the results of a single trial can be far from the truth. Statements about large samples guaranteeing balance are not useful without guidelines about how large is large enough, and such statements cannot be made without knowledge of other causes and how they affect outcomes. Of course, lack of balance in the net effect of either observables or non-observables in (2) does not compromise the inference in an RCT in the sense of obtaining a standard error for the unbiased ATE (see Senn (2013) for a particularly clear statement).
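How large is large enough depends on the causes themselves. As a minimal sketch, assume a single hypothetical binary cause carried by 10 percent of a fixed trial sample (a stand-in for, say, a rare genetic variant). The code below re-randomizes the sample many times and reports how far apart the carrier shares in the two arms typically are, for three sample sizes. The function name `imbalance_draws` and all values are illustrative.

```python
import numpy as np


def imbalance_draws(n, prevalence, n_reps, rng):
    """Difference in the share of carriers of a binary cause between the
    treatment and control arms, over repeated random allocations of one
    fixed trial sample of size n."""
    carrier = np.zeros(n, dtype=bool)
    carrier[: int(round(prevalence * n))] = True
    diffs = np.empty(n_reps)
    for r in range(n_reps):
        t = np.zeros(n, dtype=bool)
        t[rng.choice(n, size=n // 2, replace=False)] = True
        diffs[r] = carrier[t].mean() - carrier[~t].mean()
    return diffs


rng = np.random.default_rng(5)
results = {}
for n in (20, 200, 2_000):
    d = imbalance_draws(n, prevalence=0.1, n_reps=5_000, rng=rng)
    results[n] = d.std()
    print(f"n = {n:5d}: sd of imbalance {d.std():.3f}, "
          f"P(|imbalance| > 0.05) = {np.mean(np.abs(d) > 0.05):.2f}")
```

The typical imbalance shrinks roughly with the square root of the sample size but never vanishes in any one trial. How much a given imbalance moves the estimated ATE depends on how strongly the cause affects the outcome, which is exactly the knowledge the text says is needed.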

Having run an RCT, it makes good sense to examine any available covariates for balance between the treatments and controls; if we suspect that an observed variable x is a possible cause, and its means in the two groups are very different, we should treat our results with appropriate suspicion. In practice, trialists in economics (and in some other disciplines) usually carry out a statistical test for balance after randomization but before analysis, presumably with the aim of taking some appropriate action if balance fails. The first table of the paper typically presents the sample means of observable covariates (the observable x’s in (1) or interactive factors represented in β) for the control and treatment groups, together with their differences, and tests for whether or not they are significantly different from zero, either variable by variable, or jointly. These tests are appropriate for unbiasedness if we are concerned that the random number generator might have failed, or if we are worried that the randomization is undermined by non-blinded subjects who systematically undermine the allocation. Otherwise, unbiasedness is guaranteed by the randomization, whatever the tests show, and, as the next paragraph demonstrates, the test is not informative about the balance that would lead to precision.
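A minimal sketch of the balance table described above, assuming ten hypothetical, independent standard-normal covariates and a correctly executed randomization of 200 units. It re-randomizes one fixed trial sample many times and applies covariate-by-covariate t-tests at the nominal 5 percent level (critical value 1.96, a large-sample normal approximation we assume for simplicity). Because the null of equal expected means is true by construction, roughly 5 percent of individual tests reject, and with ten covariates a sizable share of allocations shows at least one “significant” imbalance; none of these rejections says anything about bias. Function names and parameters are our own.

```python
# Illustrative sketch; function names, data, and parameters are hypothetical.
import math
import random


def welch_t(a, b):
    """Two-sample t-statistic (mean of a minus mean of b) with unequal variances."""
    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((v - m) ** 2 for v in xs) / (len(xs) - 1)
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def balance_test_rates(n_per_arm=100, k=10, reps=400, crit=1.96, seed=1):
    """Re-randomize one fixed trial sample `reps` times and run a covariate-by-covariate
    balance test after each allocation. Returns (share of individual tests that reject,
    share of allocations in which at least one of the k tests rejects)."""
    rng = random.Random(seed)
    # Hypothetical trial sample: k independent standard-normal covariates per unit.
    units = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(2 * n_per_arm)]
    order = list(range(2 * n_per_arm))
    single = at_least_one = 0
    for _ in range(reps):
        rng.shuffle(order)  # a correctly executed randomization
        treated, control = order[:n_per_arm], order[n_per_arm:]
        rejections = sum(
            abs(welch_t([units[i][j] for i in treated], [units[i][j] for i in control])) > crit
            for j in range(k)
        )
        single += rejections
        at_least_one += rejections > 0
    return single / (reps * k), at_least_one / reps


single, any_fail = balance_test_rates()
print(f"per-covariate rejection rate: {single:.3f}; "
      f"allocations with at least one 'failure': {any_fail:.3f}")
```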

If we write $\mu_0$ and $\mu_1$ for the (vectors of) true means in the trial sample (i.e. the means over all possible randomizations) of the observed causes of Y in the control and treatment groups at the point of assignment, the null hypothesis is (presumably, as judged by the typical balance test) that the two vectors are identical, with the alternative being that they are not. But if the randomization has been correctly done, the null hypothesis is true by construction (see e.g. Altman (1985) and Senn (1994)), which may help explain why it so rarely fails in practice. As Begg (1990) notes, “(I)t is a test of a null hypothesis that is known to be true. Therefore, if the test turns out to be significant it is, by definition, a false positive.” This is, of course, consistent with Fisher’s comments about the plots in the field, which note that two samples of plots randomly drawn from the same field can look very unbalanced. Indeed, although we cannot “test” it in this way, we know that the null hypothesis is also true for the unobservable causes. Note the contrast with the statements quoted above claiming that RCTs guarantee balance on causes across treatment and control groups. Those statements refer to balance of causes at the point of assignment in any single trial, which is not guaranteed by randomization, whereas the balance tests are about the balance of causes at the point of assignment in expectation over many trials, which is guaranteed by randomization. The confusion is perhaps understandable, but it is a confusion nevertheless. Of course, it is always good practice to look for imbalances between observed covariates in any single trial using some more appropriate distance measure, for example the normalized difference in means (Imbens and Wooldridge (2009, equation 3)). Similarly, it would have been good practice for Fisher to abandon a randomization in which there were clear patterns in the (random) distribution of plots across the field, even though the treatment and control plots were random selections that, by construction, could not differ “significantly” using the standard (incorrect) balance test. Whether such imbalances should be seen as undermining the estimate of the ATE depends on our priors about which covariates are likely to be important, and how important, which is (not coincidentally) the same thought experiment that is routinely undertaken in observational studies when we worry about confounding.
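The difference between a balance test and a distance measure can be seen in a short sketch. The normalized difference below divides the difference in covariate means by $\sqrt{s_1^2 + s_0^2}$, which is our reading of Imbens and Wooldridge (2009, equation 3); the covariate values and function names are hypothetical. Replicating the same covariate pattern four times leaves the normalized difference roughly unchanged, while the t-statistic more than doubles, because the t-statistic tracks sampling precision rather than the size of the imbalance.

```python
# Illustrative sketch; function names and data are hypothetical.
import math
import statistics


def normalized_difference(x1, x0):
    """Difference in covariate means scaled by sqrt(s1^2 + s0^2); it does not grow with n."""
    return (statistics.mean(x1) - statistics.mean(x0)) / math.sqrt(
        statistics.variance(x1) + statistics.variance(x0))


def t_statistic(x1, x0):
    """Unequal-variance two-sample t-statistic for the same difference in means."""
    return (statistics.mean(x1) - statistics.mean(x0)) / math.sqrt(
        statistics.variance(x1) / len(x1) + statistics.variance(x0) / len(x0))


treated, control = [1, 2, 3, 4], [0, 1, 2, 3]  # hypothetical covariate values
for copies in (1, 4):  # same covariate pattern, four times the sample
    x1, x0 = treated * copies, control * copies
    print(copies, round(normalized_difference(x1, x0), 3), round(t_statistic(x1, x0), 3))
```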

One procedure to improve balance is to adapt the design before randomization, for example, by stratification. Fisher, who, as the quote above illustrates, was well aware of the loss of precision from randomization, argued for “blocking” (stratification) in agricultural trials or for using Latin Squares, both of which restrict the amount of imbalance. Stratification, to be useful, requires some prior understanding of the factors that are likely to be important, and so it takes us away from the “no knowledge required” or “no priors accepted” appeal of RCTs; it requires thinking about and measuring covariates. But as Scriven (1974, 103) notes: “(C)ause hunting, like lion hunting, is only likely to be successful if we have a considerable amount of relevant background knowledge”. Cartwright (1994, Chapter 2) puts it even more strongly, “no causes in, no causes out”. Stratification in RCTs, as in other forms of sampling, is a standard method for using background knowledge to increase the precision of an estimator. It has the further advantage that it allows for the exploration of different ATEs in different strata, which can be useful in adapting or transporting the results to other locations (see Section 2).
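Stratified (blocked) randomization can be sketched as randomizing separately within each stratum. The helper names, stratum labels, and outcomes below are hypothetical; in a stratum with an odd number of units this version simply assigns the extra unit to control, whereas a real design might randomize that choice as well. The second function computes stratum-specific ATEs and assumes every stratum contains at least one treated and one control unit.

```python
# Illustrative sketch; helper names, strata, and outcomes are hypothetical.
import random
from collections import defaultdict
from statistics import mean


def stratified_assignment(strata, seed=0):
    """Randomize separately within each stratum: in a stratum of size m, floor(m/2)
    randomly chosen units are treated (1) and the rest are controls (0)."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for unit, label in enumerate(strata):
        members[label].append(unit)
    treat = [0] * len(strata)
    for units in members.values():
        rng.shuffle(units)
        for unit in units[: len(units) // 2]:
            treat[unit] = 1
    return treat


def stratum_ates(y, treat, strata):
    """Difference between treated and control means within each stratum."""
    treated, control = defaultdict(list), defaultdict(list)
    for yi, ti, label in zip(y, treat, strata):
        (treated if ti else control)[label].append(yi)
    return {label: mean(treated[label]) - mean(control[label]) for label in treated}


strata = ["north"] * 6 + ["south"] * 4  # hypothetical strata labels
treat = stratified_assignment(strata)
# Hypothetical outcomes: the treatment effect is 5 in the north and 1 in the south.
y = [10 + (5 if s == "north" else 1) * t for s, t in zip(strata, treat)]
print(treat, stratum_ates(y, treat, strata))
```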

Stratification is not possible if there are too many covariates, or if each has many values, so that there are more cells than can be filled given the sample size. With five covariates, and ten values on each, and no priors to limit the structure, we would have 100,000 possible strata. Filling these is well beyond the sample sizes in most trials. An alternative that works more generally is to re-randomize. If the randomization gives an obvious imbalance on known covariates (treatment plots all on one side of the field, all of the treatment clinics in one region, too many rich and too few poor in the control group), we try again, and keep trying until we get a balance measured as a small enough distance between the means of the observed covariates in the two groups. Morgan and Rubin (2012) suggest that the Mahalanobis D-statistic be used as a criterion and use Fisher’s randomization inference (to be discussed further below) to calculate standard errors that take the re-randomization into account. An alternative, widely adopted in practice, is to adjust for covariates by running a regression (or covariance) analysis, with the outcome on the left-hand side and the treatment dummy and the covariates as explanatory variables, including possible interactions between covariates and treatment dummies. Freedman (2008) shows that the adjusted estimate of the ATE is biased in finite samples, with the bias depending on the correlation between the squared treatment effect and the covariates. Accepting some bias in exchange for greater precision will often make sense, though it certainly undermines any gold standard argument that relies on unbiasedness without consideration of precision.
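Re-randomization with a Mahalanobis criterion, in the spirit of Morgan and Rubin (2012), might look like the sketch below. It is a minimal standard-library illustration under stated assumptions: the statistic is $M = d'[S(1/n_1 + 1/n_0)]^{-1}d$, with $d$ the treated-minus-control difference in covariate means and $S$ the full-sample covariance matrix of the covariates; the acceptance threshold is a hypothetical value chosen for illustration; and the linear solver is a plain Gauss-Jordan routine suitable only for a handful of covariates. Function names and data are our own. As the text notes, any subsequent inference has to respect the same acceptance rule.

```python
# Illustrative sketch; function names, threshold, and data are hypothetical.
import random


def _solve(a, b):
    """Solve the small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_imbalance(x, treat):
    """M = d' [S (1/n1 + 1/n0)]^{-1} d for covariate rows x and a 0/1 allocation."""
    n, k = len(x), len(x[0])
    n1 = sum(treat)
    n0 = n - n1
    mu = [sum(row[j] for row in x) / n for j in range(k)]
    s = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in x) / (n - 1)
          for j in range(k)] for i in range(k)]
    d = [sum(row[j] for row, t in zip(x, treat) if t) / n1
         - sum(row[j] for row, t in zip(x, treat) if not t) / n0 for j in range(k)]
    scale = 1.0 / n1 + 1.0 / n0
    w = _solve([[v * scale for v in row] for row in s], d)
    return sum(di * wi for di, wi in zip(d, w))


def rerandomize(x, threshold, seed=0, max_tries=10_000):
    """Draw equal-sized random allocations until M <= threshold."""
    rng = random.Random(seed)
    base = [1] * (len(x) // 2) + [0] * (len(x) - len(x) // 2)
    for tries in range(1, max_tries + 1):
        treat = base[:]
        rng.shuffle(treat)
        m = mahalanobis_imbalance(x, treat)
        if m <= threshold:
            return treat, m, tries
    raise RuntimeError("no allocation met the threshold; consider relaxing it")


rng = random.Random(42)
x = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]  # two hypothetical covariates
treat, m, tries = rerandomize(x, threshold=1.0)  # threshold chosen for illustration only
print(f"accepted M = {m:.3f} after {tries} draw(s)")
```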

1.4 Should we randomize?

The tension between randomization and precision that goes back to Fisher, Gosset, and Savage has been reopened in recent papers by Kasy (2016), Banerjee, Chassang, and Snowberg (BCS) (2016) and Banerjee, Chassang, Montero, and Snowberg (BCMS) (2016).

The trade-off between bias and precision can be formalized in several ways, for example by specifying a loss or utility function that depends on how a user is affected by deviations of the estimate of the ATE from the truth and then choosing an estimator or an experimental design that minimizes expected loss or maximizes expected utility. As Savage (1962) noted, for a Bayesian, this involves allocating treatments and controls in “the specific layout that promised to tell him the most,” but without randomization. Of course, this requires serious and perhaps difficult thought about the mechanisms underlying the ATE, which randomization avoids. Savage also notes that several people with different priors may be involved in an investigation and that individual priors may be unreliable because of “vagueness and temptation to self-deception,” defects that randomization may alleviate, or at least evade. BCMS (2016) provide a proof of a Bayesian no-randomization theorem, and BCS (2016) provide an illustration of a school administrator who has long believed that school outcomes are determined, not by school quality, but by parental background, and who can learn the most by placing deprived children in (supposed) high-quality schools and privileged children in (supposed) low-quality schools, which is the kind of study setting that case study methodology is well attuned to. As BCS note, this allocation would not persuade those with different priors, and they propose randomization as a means of satisfying skeptical observers.

Several points are important. First, the anti-randomization theorem is not a justification of any non-randomized design, for example, one that allows selection on unobservables, but only of the optimal design that is most informative. According to Chalmers (2001) and Bothwell and Podolsky (2016), the development of randomization in medicine originated with Bradford-Hill, who used randomization in the first RCT in medicine (the streptomycin trial) because it prevented doctors from selecting patients on the basis of perceived need (or against perceived need, leaning over backward as it were), an argument recently echoed by Worrall (2007). Randomization serves this purpose, but so do other non-discretionary schemes; what is required is that hidden information should not be allowed to affect the allocation.

Second, the ideal rules by which units are allocated to treatment or control depend on the covariates and on the investigators’ priors about how the covariates affect the outcomes. This opens up all sorts of methods of inference that are long familiar to economists but that are excluded by pure randomization. For example, what philosophers call the hypothetico-deductive method works by using theory to make a prediction that can be taken to the data for potential falsification (as in the school example above). This is the way that physicists learn, as do economists when they use theory to derive predictions that can be tested against the data, perhaps in an RCT, but more frequently not. Some of the most fruitful research programs in economics have been generated by the puzzles that result when the data fail to match such theoretical predictions, such as the equity premium puzzle, various purchasing power parity puzzles, the Feldstein-Horioka puzzle, the consumption smoothness puzzle, the puzzle of calorie decline in the face of malnourishment and income growth, and many others.

Third, randomization, by running roughshod over prior information from theory and from covariates, is wasteful and even unethical when it unnecessarily exposes people, or unnecessarily many people, to possible harm in a risky experiment. Worrall (2008) documents the (extreme) case of ECMO, a new treatment for newborns with persistent pulmonary hypertension that was developed in the 1970s by intelligent and directed trial and error within a well-understood theory of the disease. In early experimentation by the inventors, mortality was reduced from 80 to 20 percent. The investigators felt compelled to conduct an RCT, albeit with an adaptive ‘play-the-winner’ design in which each success in an arm increased the probability of the next baby being assigned to that arm. One baby received conventional therapy and died; 11 received ECMO and lived. Even so, a standard randomized controlled trial was thought necessary. With a stopping rule of four deaths, four more babies (out of ten) died in the control group, and none of the nine who received ECMO died.
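The adaptive ‘play-the-winner’ idea can be sketched as a simple urn. This is a generic illustration, not a reconstruction of the actual ECMO protocol: each arm starts with one ball, each patient is assigned by drawing a ball at random, and each success adds a ball for the arm that succeeded, so that, as described above, a success raises the probability that the next patient is assigned to that arm. The function name and the survival probabilities are hypothetical.

```python
# Illustrative sketch; the function name and success probabilities are hypothetical.
import random


def play_the_winner(p_success, n_patients, seed=0):
    """Success-reinforced urn: every arm starts with one ball, each patient is
    assigned by drawing a ball at random, and each success adds a ball for the
    arm that succeeded."""
    rng = random.Random(seed)
    balls = {arm: 1 for arm in p_success}
    history = []
    for _ in range(n_patients):
        arms = list(balls)
        arm = rng.choices(arms, weights=[balls[a] for a in arms])[0]
        success = rng.random() < p_success[arm]  # hypothetical outcome model
        if success:
            balls[arm] += 1
        history.append((arm, success))
    return history


# Hypothetical survival probabilities, loosely echoing the 80 vs 20 percent mortality above.
history = play_the_winner({"ECMO": 0.8, "conventional": 0.2}, n_patients=12)
print([arm for arm, _ in history])
```

In this simple version failures leave the urn unchanged; other adaptive designs also add a ball to the opposite arm after a failure.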

Fourth, the non-random methods use prior information, which is why they do better than randomization. This is both an advantage and a disadvantage, depending on one’s perspective. If prior information is not widely accepted, or is seen as non-credible by those we are seeking to persuade, we will generate more credible estimates if we do not use those priors. Indeed, this is why BCS (2016) recommend randomized designs, including in medicine and in development economics. They develop a theory of an investigator who is facing an adversarial audience who will challenge any prior information and can even potentially veto results based on it (think of administrative agencies such as the FDA or journal referees). The experimenter trades off his or her own desire for precision (and preventing possible harm to subjects), which would require prior information, against the wishes of the audience, who wants nothing to do with those priors. Even then, the approval of the audience is only ex ante; once the fully randomized experiment has been done, nothing stops critics arguing that, in fact, the randomization did not offer a fair test because important other causes were not balanced. Among doctors who use RCTs, and especially meta-analysis, such arguments are (appropriately) common (see Kramer (2016)).

Today, when the public has come to question expert prior knowledge, RCTs will flourish. In cases where there is good reason to doubt the good faith of experimenters, as in many pharmaceutical trials, randomization will indeed be an appropriate response. But we believe such arguments are destructive for scientific endeavor (which is not the purpose of the FDA) and should be resisted as a general prescription in scientific research. Previous knowledge needs to be built on and incorporated into new knowledge, not discarded in the face of aggressive ignorance. The systematic refusal to use prior knowledge and the associated preference for RCTs are recipes for preventing cumulative scientific progress. In the end, it is also self-defeating. To quote Rodrik (2016), “the promise of RCTs as theory-free learning machines is a false one.”

1.5 Statistical inference in RCTs

The estimated ATE in a simple RCT is the difference in the means between the treatment and control groups. When covariates are allowed for, as in most RCTs in economics, the ATE is usually estimated from the coefficient on the treatment dummy in a regression that looks like (1), but with the heterogeneity in $\beta$ ignored. Modern work calculates standard errors allowing for the possibility that residual variances may be different in the treatment and control groups, usually by clustering the standard errors, which is equivalent to the familiar two-sample standard error in the case with no covariates. Statistical inference is done with t-values in the usual way. These procedures do not always give the right answer.
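For the case with no covariates, one version of this equivalence can be checked directly. The sketch below regresses a hypothetical outcome on a constant and a treatment dummy, computes the HC2 heteroskedasticity-robust standard error of the slope, and compares the result with the difference in means and the unequal-variance two-sample standard error. With a single dummy regressor the leverage of each treated unit is $1/n_1$ and of each control $1/n_0$, which is why HC2 reproduces the two-sample formula exactly; other robust variants such as HC0 or HC1 differ slightly in small samples. Function names and data are our own.

```python
# Illustrative sketch; function names and data are hypothetical.
import math
from statistics import mean, variance


def ols_dummy_hc2(y, t):
    """OLS of y on a constant and a 0/1 dummy t; returns the slope and its HC2
    heteroskedasticity-robust standard error (simple-regression formulas)."""
    n = len(y)
    tbar, ybar = mean(t), mean(y)
    sxx = sum((ti - tbar) ** 2 for ti in t)
    slope = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y)) / sxx
    intercept = ybar - slope * tbar
    meat = 0.0
    for ti, yi in zip(t, y):
        resid = yi - intercept - slope * ti
        leverage = 1.0 / n + (ti - tbar) ** 2 / sxx  # diagonal of the hat matrix
        meat += (ti - tbar) ** 2 * resid ** 2 / (1.0 - leverage)
    return slope, math.sqrt(meat) / sxx


def two_sample(y1, y0):
    """Difference in means and the unequal-variance two-sample standard error."""
    diff = mean(y1) - mean(y0)
    se = math.sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))
    return diff, se


y1, y0 = [3, 5, 7, 9], [1, 2, 2, 3, 7]  # hypothetical treated and control outcomes
print(ols_dummy_hc2(y1 + y0, [1] * len(y1) + [0] * len(y0)))
print(two_sample(y1, y0))
```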

Looking back at (1), the underlying objects of interest are the individual treatment effects $\beta_i$ for each of the individuals in the trial sample. Neither they nor their distribution $G(\beta)$ is identified from an RCT; because RCTs make so few assumptions (which, in many cases, is their strength), they can identify only the mean of the distribution. In many observational studies, researchers are prepared to make more assumptions on functional forms or on distributions, and for that price we are able to identify other quantities of interest. Without these assumptions, inferences must be based on the difference in the two means, a statistic that is sometimes ill-behaved, as we shall discuss below. This ill-behavior has nothing to do with RCTs, per se, but within RCTs, and their minimal assumptions, we cannot easily switch from the mean to some other quantity of interest.

Fisher proposed that statistical inference should be done using what has become known as “randomization” inference, a procedure that is as non-parametric as the RCT-based estimate of an ATE itself. To test the null hypothesis that $\beta_i = 0$ for all $i$, note that, under the null that the treatment has no effect on any individual, an estimated nonzero ATE must be a consequence of the particular random allocation that generated it. By tabulating all possible combinations of treatments and controls in our trial sample, and the ATE associated with each, we can calculate the exact distribution of the estimated ATE under the null. This allows us to calculate the probability of obtaining an estimate as large as our actual estimate when there are no effects of treatment. This randomization test requires a finite sample, but it will work for any sample size (see Imbens and Wooldridge (2009) for an excellent account of the procedure). Imbens (2010) argues that it is this randomization inference plus the unbiasedness of the ATE that provide the twin non-parametric pillars that support placing RCTs at the “very top” of the hierarchy of evidence.
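Fisher’s procedure can be sketched by brute-force enumeration. The function below assumes the sharp null that $\beta_i = 0$ for every unit, so that each unit’s outcome would have been the same under any allocation; it recomputes the difference in means for every allocation with the same number of treated units and reports the share at least as extreme (in absolute value) as the observed one. Enumeration is feasible only for small samples; with larger samples a random subset of allocations is typically drawn instead. The function name and data are hypothetical.

```python
# Illustrative sketch; the function name and data are hypothetical.
from itertools import combinations


def randomization_p_value(y, treat):
    """Exact two-sided p-value for Fisher's sharp null (no effect for any unit),
    using the absolute difference in means as the test statistic."""
    n, n1 = len(y), sum(treat)
    total = sum(y)

    def diff(treated_sum):
        return treated_sum / n1 - (total - treated_sum) / (n - n1)

    observed = diff(sum(yi for yi, ti in zip(y, treat) if ti))
    hits = count = 0
    for group in combinations(range(n), n1):  # every allocation with n1 treated units
        count += 1
        if abs(diff(sum(y[i] for i in group))) >= abs(observed) - 1e-12:
            hits += 1
    return hits / count


y = [5, 6, 7, 1, 2, 3]      # hypothetical outcomes
treat = [1, 1, 1, 0, 0, 0]  # observed allocation
print(randomization_p_value(y, treat))  # → 0.1 (2 of the 20 allocations are as extreme)
```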

Randomization inference can be used for null hypotheses that specify that all of the treatment effects are zero, as in the above example, but it cannot be used to test the hypothesis that the average treatment effect is zero, which will often be of interest. In agricultural trials, and in medicine, the stronger (sharp) hypothesis that the treatment has no effect whatever is often of interest. In many economic applications that involve money, such as welfare experiments or cost-benefit analyses, we are interested in whether the net effect of the treatment is positive or negative, and in these cases, randomization inference cannot be used. None of this argues against its wider use in the social sciences when appropriate.

In cases where randomization inference cannot be used, we must construct tests for the difference between two means. Standard procedures will often work well, but there are two potential pitfalls. One, the ‘Fisher-Behrens problem’, comes from the fact that, when the two samples have different variances (which we typically want to permit), the usual t-statistic does not have the t-distribution. The second problem, which is much harder to address, occurs when the distribution of treatment effects is not symmetric (Bahadur and Savage (1956)). Neither pitfall is specific to RCTs, but RCTs force us to work with means in estimating treatment effects and, with only a very few exceptions in the literature, social scientists who use RCTs appear to be unaware of the difficulties.

In the simple case of comparing two means in an RCT without covariates, inference is usually based on the two-sample t-statistic, which is computed by dividing the ATE by the estimated standard error whose square is given by

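$$\sigma^2 = \frac{(n_1 - 1)^{-1} \sum_{i \in 1} \left(Y_i - \bar{Y}_1\right)^2}{n_1} + \frac{(n_0 - 1)^{-1} \sum_{i \in 0} \left(Y_i - \bar{Y}_0\right)^2}{n_0} \qquad (3)$$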

where 0 refers to controls and 1 to treatments, so that there are $n_1$ treatments and $n_0$ controls, and $\bar{Y}_1$ and $\bar{Y}_0$ are the two means. As has long been known, the “t-statistic” based on (3) is not distributed as Student’s t if the two variances (treatment and control) are not identical, but has the Behrens–Fisher distribution. In extreme cases, when one of the variances is zero, the t-statistic has effective degrees of freedom equal to half the nominal degrees of freedom, so that the test statistic has thicker tails than allowed for, and there will be too many rejections when the null is true.
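Equation (3) and the degrees-of-freedom problem can be made concrete with the Welch–Satterthwaite approximation, a standard approximate correction for the Behrens–Fisher problem that we use here only for illustration. The sketch assumes two groups of 10 and hypothetical variances; with equal variances the approximation returns the nominal $n_1 + n_0 - 2 = 18$, and when the control variance is zero it returns $n_1 - 1 = 9$, half the nominal value, as in the extreme case described above. Function names are our own.

```python
# Illustrative sketch; function names and numbers are hypothetical.


def sigma2_eq3(y1, y0):
    """Squared standard error of the difference in means, as in equation (3)."""
    def term(ys):
        m = sum(ys) / len(ys)
        return sum((v - m) ** 2 for v in ys) / (len(ys) - 1) / len(ys)
    return term(y1) + term(y0)


def welch_df(var1, n1, var0, n0):
    """Welch-Satterthwaite approximation to the effective degrees of freedom."""
    a, b = var1 / n1, var0 / n0
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n0 - 1))


n1 = n0 = 10
print("nominal df:", n1 + n0 - 2)                            # 18
print("equal variances:", welch_df(4.0, n1, 4.0, n0))        # about 18
print("control variance zero:", welch_df(4.0, n1, 0.0, n0))  # about 9, half the nominal value
print("sigma^2 from (3):", sigma2_eq3([3, 5, 7, 9], [1, 2, 2, 3, 7]))
```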

Young (2016) argues that this problem is worse when the trial results are analyzed by regressing outcomes not only on the treatment dummy but also on additional controls and when using clustered or robust standard errors. When the design matrix is such that the maximal influence is large, so that for some observations outcomes have large influence on their own predicted values, there is a reduction in the effective degrees of freedom for the t-value(s) of the average treatment effect(s), leading to spurious findings of significance. Young looks at 2,003 regressions reported in 53 RCT papers in the American Economic Association journals and recalculates the significance of the estimates using randomization inference applied to the authors’ original data. In 30 to 40 percent of the estimated treatment effects in individual equations with coefficients that are reported as significant, he cannot reject the null of no effect for any observation; the fraction of spuriously significant results increases further when he simultaneously tests for all results in each paper. These spurious findings come in part from issues of multiple-hypothesis testing, both within regressions with several treatments and across regressions. Within regressions, treatments are largely orthogonal, but authors tend to emphasize significant t-values even when the corresponding F-tests are insignificant. Across equations, results are often strongly correlated, so that, at worst, different regressions are reporting variants of the same result, thus spuriously adding to the “kill count” of significant effects. At the same time, the pervasiveness of observations with high influence generates spurious significance on its own.
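The influence measure behind this argument is the leverage $h_{ii}$, the $i$th diagonal element of the hat matrix $X(X'X)^{-1}X'$. The sketch below computes leverages for a hypothetical design with a constant, a treatment dummy, and one covariate that has a single extreme value. The leverages always sum to the number of columns, and the extreme observation has leverage close to one, meaning that its own outcome nearly determines its own fitted value. The small Gauss-Jordan solver and the function names are our own, included only to keep the example self-contained.

```python
# Illustrative sketch; helper names and the design matrix are hypothetical.


def _solve(a, b):
    """Solve the small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(rows):
    """Diagonal of the hat matrix X (X'X)^{-1} X' for a design given as a list of rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    return [sum(a * b for a, b in zip(r, _solve(xtx, r))) for r in rows]


t = [i % 2 for i in range(10)]            # treatment dummy
c = [0.1 * i for i in range(9)] + [10.0]  # covariate with one extreme value
h = leverages([[1.0, ti, ci] for ti, ci in zip(t, c)])
print([round(v, 3) for v in h])
print("sum:", round(sum(h), 6), "max at index", h.index(max(h)))
```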

These issues are now being taken more seriously. In addition to Young (2016), Imbens and Kolesár (2016) provide practical advice for dealing with the Fisher-Behrens problem, and the best current practice tries to be careful about multiple hypothesis testing. Yet it remains the case that many of the results in the literature are spuriously significant.
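One standard way of being careful about multiple hypothesis testing is Holm’s step-down procedure, which controls the family-wise error rate. We use it here only to illustrate the general idea, not as a description of the tests in Young (2016). With the hypothetical p-values below, all four results are “significant” when taken one at a time, but only two survive the adjustment. The function name is our own.

```python
# Illustrative sketch; the function name and p-values are hypothetical.


def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure: True for each hypothesis rejected while
    controlling the family-wise error rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # once one test fails, all larger p-values are retained
        reject[i] = True
    return reject


p = [0.01, 0.04, 0.03, 0.005]   # hypothetical p-values from one paper
print([pi < 0.05 for pi in p])  # → [True, True, True, True]
print(holm_reject(p))           # → [True, False, False, True]
```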

Spurious significance also arises when the distribution of treatment effects contains outliers or, more generally, is not symmetric. Standard t-tests break down in distributions with enough skewness (see Lehmann and Romano (2005, pp. 466–8)). How difficult is it to maintain symmetry? And how badly is inference affected when the distribution of treatment effects is not symmetric? In economics, many trials have outcomes valued in money. Does an anti-poverty innovation, for example microfinance, increase the incomes of the participants? Income itself is not symmetrically distributed, and this might be true of the treatment effects too if there are a few people who are talented but credit-constrained entrepreneurs and who have treatment effects that are large and positive, while the vast majority of borrowers fritter away their loans, or at best make positive but modest profits. A recent summary of the literature is consistent with this (see Banerjee, Karlan, and Zinman (2015)). Another important example is expenditure on health care. Most people have zero expenditure in any given period, but among those who do incur expenditures, a few individuals spend huge amounts that account for a large share of the total. Indeed, in the famous Rand Health Experiment (see Manning, Newhouse et al. (1987, 1988)), there is a single very large outlier. The authors realize that the comparison of means across treatment arms is fragile, and, although they do not see their problem exactly as described here, they obtain their preferred estimates using a structural approach that is explicitly designed to model the skewness of expenditures.

In some cases, it will be appropriate to deal with outliers by trimming, transforming, or eliminating observations that have large effects on the estimates. But if the experiment is a project evaluation designed to estimate the net benefits of a policy, the elimination of genuine outliers, as in the Rand Health Experiment, will vitiate the analysis. It is precisely the outliers that make or break the program. Transformations, such as taking logarithms, may help to produce symmetry, but they change the nature of the question being asked; a cost-benefit analysis must be done in dollars, not log dollars.
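A small numerical sketch shows how transformation and trimming can change the answer to a cost-benefit question. The incomes below are hypothetical: three of four treated individuals lose, and one gains a great deal. In dollars the treatment raises mean income; in log dollars, or after trimming the largest observation, it appears to lower it.

```python
# Illustrative sketch; all incomes are hypothetical.
import math
from statistics import mean

control = [10.0, 10.0, 10.0, 10.0]  # hypothetical incomes, in dollars
treated = [1.0, 1.0, 1.0, 100.0]    # one large winner among mostly losers

effect_dollars = mean(treated) - mean(control)
effect_log_dollars = mean([math.log(v) for v in treated]) - mean([math.log(v) for v in control])
effect_trimmed = mean(sorted(treated)[:-1]) - mean(control)  # drop the largest treated value

print(effect_dollars)                # → 15.75
print(round(effect_log_dollars, 3))  # → -1.151
print(effect_trimmed)                # → -9.0
```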

We consider an example that illustrates what can happen in a realistic but simplified case; the full results are reported in the Appendix. We imagine a population of individuals, each with a treatment effect $\beta_i$. The parent population mean of the treatment effects is zero, but there is a long tail of positive values; we use a left-shifted lognormal distribution. We have a microfinance trial in mind, where there is a long positive tail of rare individuals who can do amazing things with credit while most people cannot use it effectively. A trial sample of 2n individuals is randomly drawn from the parent population and is randomly split between n treatments and n controls. Within each trial sample, whose true ATE will generally differ from zero because of the sampling, we run many RCTs and tabulate the values of the ATE for each.
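A sketch of this kind of simulation follows. It is not the code behind the Appendix: the lognormal scale parameter ($\sigma = 2$), the outcome model $Y_i = \beta_i T_i + u_i$ with $u_i \sim N(0, 1)$, the normal critical value of 1.96, and the numbers of replications are all our assumptions, so the rejection rates it prints should not be expected to match the figures reported in the text. For simplicity, each replication draws a fresh trial sample, so the sketch mixes the two sources of rejection distinguished in the text. It should nonetheless show the mechanism: the treatment effects come from a lognormal shifted left by its mean, so the parent ATE is exactly zero, yet the nominal 5 percent test rejects more often than it should in small samples. Function names are our own.

```python
# Illustrative sketch; function names and all parameter values are hypothetical.
import math
import random


def mean_var(xs):
    """Sample mean and (n - 1)-denominator sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((v - m) ** 2 for v in xs) / (len(xs) - 1)


def rejection_rate(n, reps=2000, sigma=2.0, noise_sd=1.0, crit=1.96, seed=0):
    """Share of simulated RCTs in which a t-test rejects ATE = 0, although the
    parent-population mean of the treatment effects is exactly zero."""
    rng = random.Random(seed)
    shift = math.exp(sigma ** 2 / 2)  # mean of lognormal(0, sigma); subtracting it centres beta at 0
    rejections = 0
    for _ in range(reps):
        # Trial sample of 2n units drawn from the parent population.
        beta = [rng.lognormvariate(0.0, sigma) - shift for _ in range(2 * n)]
        rng.shuffle(beta)  # random split: first n treated, last n controls
        # Assumed outcome model: Y_i = beta_i * T_i + u_i, u_i ~ N(0, noise_sd^2).
        y1 = [b + rng.gauss(0.0, noise_sd) for b in beta[:n]]
        y0 = [rng.gauss(0.0, noise_sd) for _ in beta[n:]]
        m1, v1 = mean_var(y1)
        m0, v0 = mean_var(y0)
        t = (m1 - m0) / math.sqrt(v1 / n + v0 / n)  # unequal-variance t, as in (3)
        rejections += abs(t) > crit
    return rejections / reps


for n in (25, 100, 500):
    print(n, rejection_rate(n, reps=400))
```

Controls’ treatment effects never enter their outcomes, which is why an outlier assigned to the control group leaves the outcomes undispersed.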

Using standard t-tests, the (true in the parent distribution) hypothesis that the ATE is zero is rejected between 14 percent ($n = 25$) and 6 percent ($n = 500$) of the time. These rejections come from two separate issues, both of which are relevant in practice: (a) that the ATE in the trial sample differs from the ATE in the parent population of interest, and (b) that the t-values are not distributed as t in the presence of outliers. The problem cases are when the trial sample happens to contain one or more outliers, something that is always a risk given the long positive tail of the parent distribution. When this happens, everything depends on whether the outlier is among the treatments or the controls; in effect, the outliers become the sample, reducing the effective number of degrees of freedom. In extreme cases, one of which is illustrated in Figure A.1, the distribution of estimated ATEs is bimodal, depending on the group to which the outlier is assigned. When the outlier is in the treatment group, the dispersion across outcomes is large, as is the estimated standard error, and so those outcomes rarely reject the null using the standard table of t-values. The over-rejections come from cases when the outlier is in the control group, the outcomes are not so dispersed, and the t-values can be large, negative, and significant. While these cases of bimodal distributions may not be common and depend on the existence of large outliers, they illustrate the process that generates the over-rejections and spurious significance. Note that there is no remedy through randomization inference here, given that our interest is in the hypothesis that the average treatment effect is zero.

Our reading of the literature on RCTs in development economics suggests that they are not exempt from these concerns. Many development trials are run on (sometimes very) small samples, they have treatment effects where asymmetry is hard to rule out (especially when the outcomes are in money), and they often give results that are puzzling, or at least not easily interpreted in terms of economic theory. Neither Banerjee and Duflo (2012) nor Karlan and Appel (2011), who cite many RCTs, raise concerns about misleading inference, implicitly treating all results as reliable. No doubt there are behaviors in the world that are inconsistent with standard economics, and some can be explained by standard biases in behavioral economics, but it would also be good to be suspicious of the significance tests before accepting that an unexpected finding is well-supported and that theory must be revised. Replication of results in different settings may be helpful, if the settings are the right kind of places (see our discussion in Section 2). Yet it hardly solves the problem, given that the asymmetry may be in the same direction in different settings, that it seems likely to be so in just those settings that are sufficiently like the original trial setting to be of use for inference about the population of interest, and that the “significant” t-values will show departures from the null in the same direction. This, then, replicates the spurious findings.

A summary

What do the arguments of this section imply about the importance of randomization and the interpretation that should be given to an estimated ATE from a randomized trial? First, we should be sure that an unbiased estimate of an ATE for the trial population is likely to be useful enough to warrant the costs of running the trial. Second, since randomization does not ensure orthogonality, care must be taken (e.g. by blinding) that there are no significant post-randomization correlates with the treatment. This is a well-known lesson, but many social and economic trials are not blinded, and insufficient defense is offered for the claim that unbiasedness is not undermined. Indeed, lack of blinding is not the only source of post-randomization bias. Treatments and controls may be handled in different places, or by differently trained practitioners, or at different times of day, and these differences can bring with them systematic differences in the other causes to which the two groups are exposed. These can, and should, be guarded against. But doing so requires an understanding of what these causally relevant factors might be. Third, the inference problems reviewed here cannot just be presumed away. When there is substantial heterogeneity, the ATE in the trial sample can be quite different from the ATE in the population of interest, even if the trial is randomly selected from that population; in practice, the relationship between the trial sample and the population is often obscure.

Beyond that, in many cases, the statistical inference will be fine, but serious attention should be given to the possibility that there are outliers in treatment effects, something that knowledge of the problem can suggest and where inspection of the marginal distributions of treatments and controls may be informative. For example, if both are symmetric, it seems unlikely (though certainly not impossible) that the treatment effects are highly skewed. Measures to deal with the Fisher-Behrens problem should be used, and randomization inference considered when appropriate to the hypothesis of interest.
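One inexpensive version of the inspection suggested above is to compute the sample skewness of the treatment and control outcomes. The sketch uses the moment coefficient of skewness on hypothetical data; the function name is our own. As the text cautions, roughly symmetric marginal distributions make highly skewed treatment effects less likely but do not rule them out, because the distribution of individual effects is never observed.

```python
# Illustrative sketch; the function name and data are hypothetical.
from statistics import mean, pstdev


def sample_skewness(xs):
    """Moment coefficient of skewness: the mean cubed standardized deviation."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


treated_outcomes = [1, 1, 1, 1, 10]  # hypothetical, with one large value
control_outcomes = [1, 2, 3, 4, 5]   # hypothetical, symmetric
print(sample_skewness(treated_outcomes), sample_skewness(control_outcomes))
```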


All of this can be regarded as recommendations for improvement to current practice, not a challenge to it. More fundamentally, we strongly contest the often-expressed idea that the ATE calculated from an RCT is automatically reliable, that randomization automatically controls for unobservables, or worst of all, that the calculated ATE is true. If, by chance, it is close to the truth, the truth we are referring to is the truth in the trial sample only. To make any inference beyond that requires an argument of the kind we consider in the next section. We have also argued that, depending on what we are trying to measure and what we want to use that measure for, there is no presumption that an RCT is the best means of estimating it. That too requires an argument, not a presumption.

Section 2: Using the results of randomized controlled trials

2.1 Introduction

Suppose we have estimated an ATE from a well-conducted RCT on a trial sample, and our standard error gives us reason to believe that the effect did not come about by chance. We thus have good warrant that the treatment causes the effect in our trial sample, up to the limits of statistical inference. What are such findings good for?

The literature in economics, as indeed in medicine and in social policy, has paid more attention to obtaining results than to considering what can be done with them. There is little theoretical or empirical work to guide us on how and for what purposes to use the findings of RCTs, such as the conditions under which the same results hold outside of the original settings, how they might be adapted for use elsewhere, or how they might be used for formulating, testing, understanding, or probing hypotheses beyond the immediate relation between the treatment and the outcome investigated in the study. Yet it cannot be that knowing how to use results is less important than knowing how to demonstrate them. Any chain of evidence is only as strong as its weakest link, so that a rigorously established effect whose applicability is justified by a loose declaration of simile warrants little. If trials are to be useful, we need paths to their use that are as carefully constructed as are the trials themselves.

The argument for the “primacy of internal validity” made by Shadish, Cook, and Campbell (2002) may be reasonable as a warning that bad RCTs are unlikely to generalize, but it is sometimes incorrectly taken to imply that results of an internally valid trial will automatically, or often, apply ‘as is’ elsewhere, or that this should be the default assumption, failing arguments to the contrary, as if a parameter, once well established, can be expected to be invariant across settings. An invariance assumption is often made in medicine, for example, where it is sometimes plausible that a particular procedure or drug works the same way everywhere (though see Horton (2000) for a strong dissent and Rothwell (2005) for examples on both sides of the question). We should also note the recent movement to ensure that testing of drugs includes women and minorities because members of those groups suppose that the results of trials on mostly healthy young white males do not apply to them.

2.2 Using results, transportability, and external validity

Suppose a trial has established a result in a specific setting. If ‘the same’ result holds elsewhere, it is said to have ‘external validity’. External validity may refer just to the transportability of the causal connection, or go further and require replication of the magnitude of the ATE. Either way, the result holds (everywhere, or widely, or in some specific elsewhere) or it does not.

This binary concept of external validity is often unhelpful because it asks the results of an RCT to satisfy a condition that is neither necessary nor sufficient for a trial to be useful, and so both overstates and understates their value. It directs us toward simple extrapolation (whether the same result holds elsewhere) or simple generalization (it holds universally or at least widely) and away from more complex but more useful applications of the results. The failure of external validity interpreted as simple generalization or extrapolation says little about the value of the trial.

First, there are several uses of RCTs that do not require transportability beyond the original context; we discuss these in the next subsection. Second, there are often good reasons to expect that the results from a well-conducted, informative, and potentially useful RCT will not apply elsewhere in any simple way. Without further understanding and analysis, even successful replication tells us little either for or against simple generalization, or in support of the conclusion that the next will work in the same way. Nor do failures of replication make the original result useless. We often learn much from coming to understand why replication failed, and we can use that knowledge in looking for how the factors that caused the original result might operate differently in different settings. Third, and particularly important for scientific progress, the RCT result can be incorporated into a network of evidence and hypotheses that test or explore claims that look very different from the results reported from the RCT. We shall give examples below of extremely useful RCTs that are not externally valid in the (usual) sense that their results do not hold elsewhere, whether in a specific target setting or in the more sweeping sense of holding everywhere.

Bertrand Russell’s chicken (Russell (1912)) provides an excellent example of the limitations to straightforward extrapolation from repeated successful replication. The bird infers, on repeated evidence, that when the farmer comes in the morning, he feeds her. The inference serves her well until Christmas morning, when he wrings her neck and serves her for dinner. Though this chicken did not base her inference on an RCT, had we constructed one for her, we would have obtained the same result that she did. Her problem was not her methodology, but rather that she did not understand the social and economic structure that gave rise to the causal relations that she observed.

So, establishing causality does nothing in and of itself to guarantee generalizability. Nor does the ability of an ideal RCT to eliminate bias from selection or from omitted variables mean that the resulting ATE from the trial sample will apply anywhere else. The issue is worth mentioning only because of the enormous weight that is currently attached in economics to the discovery and labeling of causal relations, a weight that is hard to justify for effects that may have only local applicability, what might be labeled ‘anecdotal causality’. The operation of a cause generally requires the presence of “support factors”, without which a cause that produces the targeted effect in one place, even though it may be present and have the capacity to operate elsewhere, will remain latent and inoperative. What Mackie (1974) called INUS causality (Insufficient but Non-redundant parts of a condition that is itself Unnecessary but Sufficient for a contribution to the outcome) is often the kind of causality we see. A standard example is a house burning down because the television
was left on, although televisions do not operate in this way without support factors, such as wiring faults, the presence of tinder, and so on. This is standard fare in epidemiology, which uses the term ‘causal pie’ to refer to a set of causes that are jointly but not separately sufficient for an effect.
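The causal-pie idea can be made concrete with a toy simulation. The sketch below is hypothetical throughout: it assumes a fire occurs only when the television is on and a wiring fault and tinder are both present, and it randomizes the television across simulated houses in two settings that differ only in how common wiring faults are. The function names and probabilities are illustrative, not taken from any study.

```python
import random

def house_burns(tv_on: bool, wiring_fault: bool, tinder: bool) -> bool:
    # The TV is an INUS condition here: insufficient on its own, but a
    # non-redundant part of the jointly sufficient set {TV on, fault, tinder}.
    # Other routes to a fire are deliberately left out of this toy model.
    return tv_on and wiring_fault and tinder

def trial_ate(p_fault: float, p_tinder: float, n: int = 100_000, seed: int = 0) -> float:
    """Randomize 'TV left on' across n simulated houses and return the
    difference in fire rates between TV-on and TV-off houses."""
    rng = random.Random(seed)
    fires = {True: 0, False: 0}
    houses = {True: 0, False: 0}
    for _ in range(n):
        tv_on = rng.random() < 0.5        # randomized treatment
        fault = rng.random() < p_fault    # support factor 1
        tinder = rng.random() < p_tinder  # support factor 2
        houses[tv_on] += 1
        fires[tv_on] += house_burns(tv_on, fault, tinder)
    return fires[True] / houses[True] - fires[False] / houses[False]

# Same cause and same mechanism, but a different prevalence of support factors.
print(round(trial_ate(p_fault=0.2, p_tinder=0.5), 3))  # close to 0.2 * 0.5 = 0.1
print(trial_ate(p_fault=0.0, p_tinder=0.5))            # 0.0: the cause stays latent
```

With no wiring faults the simulated trial finds no effect at all, even though the television has exactly the same capacity to cause fires in both settings.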

If we rewrite (1) in the form

$$Y_i = \beta_i T_i + \sum_{j=1}^{J} \gamma_j x_{ij} = \theta(w_i)\, T_i + \sum_{j=1}^{J} \gamma_j x_{ij} \tag{4}$$

where the function $\theta(\cdot)$ controls how a $k$-vector $w_i$ of $k$ ‘support factors’ affects individual $i$’s treatment effect $\beta_i$. The support factors may include some of the $x$’s. Since the ATE is the average of the $\beta_i$’s, two populations will have the same ATE if and only if they have the same average for the net effect of the support factors necessary for the treatment to work, i.e. for the quantity in front of $T_i$. These are, however, just the kind of factors that are likely to be differently distributed in different populations, and indeed we do generally find different ATEs in different economic (and other social policy) RCTs in different places, even in the cases where (unusually) they all point in the same direction.
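A short numerical sketch of this point, under an assumed functional form: $\theta(w) = 2 w_1 w_2$ with two independent binary support factors, so that $\beta_i = \theta(w_i)$ and the ATE is the population mean of $\theta(w_i)$. The form of $\theta$ and the probabilities are hypothetical choices made only for illustration.

```python
import numpy as np

def theta(w: np.ndarray) -> np.ndarray:
    # Hypothetical form of theta(.) in (4): the treatment adds 2 units to the
    # outcome only when both binary support factors are present.
    return 2.0 * w[:, 0] * w[:, 1]

def population_ate(p1: float, p2: float, n: int = 500_000, seed: int = 1) -> float:
    """Average of beta_i = theta(w_i) in a population where the two support
    factors are independent with P(w1 = 1) = p1 and P(w2 = 1) = p2."""
    rng = np.random.default_rng(seed)
    w = np.column_stack([rng.random(n) < p1, rng.random(n) < p2]).astype(float)
    return float(theta(w).mean())

ate_a = population_ate(0.8, 0.5)  # about 2 * 0.8 * 0.5 = 0.8
ate_b = population_ate(0.4, 0.5)  # about 0.4: same theta, fewer support factors
ate_c = population_ate(0.5, 0.8)  # about 0.8: different w distribution, same mean theta
print(round(ate_a, 2), round(ate_b, 2), round(ate_c, 2))
```

Populations A and B share $\theta$ but not the distribution of $w$, and their ATEs differ; A and C have different distributions of $w$ but the same mean of $\theta(w)$, and so the same ATE.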

Causal processes often require highly specialized economic, cultural, or social structures to enable them to work. Consider the Rube Goldberg machine that is rigged up so that flying a kite sharpens a pencil (Cartwright and Hardie (2012, 77)). The underlying structure affords a very specific form of (4) that will not describe causal processes elsewhere. Neither the same ATE nor the same qualitative causal relations can be expected to hold where the specific form for (4) is different. Indeed, we continually attempt to design systems that will generate causal relations that we like and that will rule out causal relations that we do not like. Health care systems are designed to prevent nurses and doctors from making errors; cars are designed so that drivers cannot start them in reverse; work schedules for pilots are designed so that they do not fly too many consecutive hours without rest, because alertness and performance are compromised.

As in the Rube Goldberg machine and in the design of cars and work schedules, the economic structure and equilibrium may differ in ways that support different kinds of causal relations and thus render a trial in one setting useless in another. For example, a trial that relies on providing incentives for personal promotion is of no use in a state in which a political system locks people into their social and economic positions. Cash transfers that are conditional on parents taking their children to clinics cannot improve child health in the absence of functioning clinics. Policies targeted at men may not work for women. We use a lever to toast our bread, but levers only operate to toast bread in a toaster; we cannot brown toast by pressing an accelerator, even if the principle of the lever is the same in both a toaster and a car. If we misunderstand the setting, if we do not understand why the treatment in our RCT works, we run the same risks as Russell’s chicken.

2.3 When RCTs speak for themselves: no transportability required

For some things we want to learn, an RCT is enough by itself. An RCT may provide a counterexample to a general theoretical proposition, either to the proposition itself (a simple refutation test) or to some consequence of it (a complex refutation test). An RCT may also confirm a prediction of a theory, and although this does not confirm the theory, it is evidence in its favor, especially if the prediction seems inherently unlikely in advance. This is all familiar territory, and there is nothing unique about an RCT; it is simply one among many possible testing procedures. Even when there is no theory, or very weak theory, an RCT, by demonstrating causality in some population, can be thought of as proof of concept, that the treatment is capable of working somewhere. This is one of the arguments for the importance of internal validity.

Nor is transportation called for when an RCT is used for evaluation, for example to satisfy donors that the project they funded achieved its aims in the population in which it was conducted. Even so, for such evaluations, say by the World Bank, to be global public goods requires arguments and guidelines that justify using the results in some way elsewhere; the global public good is not an automatic by-product of the Bank fulfilling its fiduciary responsibility. When the components of treatments change across studies, evaluations need not lead to cumulative knowledge. Or
as Heckman et al (1999, 1934) note, “the data produced from them [social experiments] are far from ideal for estimating the structural parameters of behavioral models. This makes it difficult to generalize findings across experiments or to use experiments to identify the policy-invariant structural parameters that are required for econometric policy evaluation.”

Of course, when we ask exactly what those invariant structural parameters are, whether they exist, and how they should be modeled, we open up major fault lines in modern applied economics. For example, we do not intend to endorse intertemporal dynamic models of behavior as the only way of recovering the parameters that we need. We also recognize that the usefulness of simple price theory is not as universally accepted as it once was. But the point remains that we need something, some regularity or some invariance, and that something can rarely be recovered by simply generalizing across trials.

A third non-problematic and important use of an RCT is when the parameter of interest is the ATE in a well-defined population from which the trial sample is itself a random sample. In this case the sample average treatment effect (SATE) is an unbiased estimator of the population average treatment effect (PATE) that, by assumption, is our target (see Imbens (2004) for these terms). We refer to this as the ‘public health’ case; as with many public health interventions, the target is the average, ‘population health’, not the health of individuals. One major (and widely recognized) danger of this use of RCTs is that scaling up from (even a random) sample to the population will not go through in any simple way if the outcomes of individuals or groups of individuals change the behavior of others, which will be common in economic examples but perhaps less so in health. There is also an issue of timing if time elapses between the trial and the implementation.
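The ‘public health’ case can be checked by simulation. The sketch assumes a hypothetical finite population with heterogeneous individual effects, draws simple random samples from it, randomizes half of each sample to treatment, and compares the average of the difference-in-means estimates with the PATE. The distributions are arbitrary choices, and the simulation deliberately has no spillovers between individuals, which is exactly the assumption that may fail when a program is scaled up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical finite population with heterogeneous individual effects beta_i.
N = 100_000
y0 = rng.normal(10.0, 2.0, N)   # outcome without treatment
beta = rng.gamma(2.0, 0.5, N)   # individual treatment effects (mean 1.0)
y1 = y0 + beta                  # outcome with treatment (no spillovers)
pate = beta.mean()              # population average treatment effect

def trial_estimate(n: int) -> float:
    """Draw a simple random sample of size n, randomize exactly half of it
    to treatment, and return the treated-minus-control difference in means."""
    sample = rng.choice(N, size=n, replace=False)
    treated = rng.permutation(n) < n // 2
    y = np.where(treated, y1[sample], y0[sample])
    return float(y[treated].mean() - y[~treated].mean())

estimates = np.array([trial_estimate(2_000) for _ in range(500)])
print(f"PATE = {pate:.3f}, mean of trial estimates = {estimates.mean():.3f}")
```

Individual trial estimates scatter around the PATE, but their average settles on it, which is what unbiasedness under random sampling and random assignment delivers.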

In economics, a ‘public-health-style’ example is the imposition of a commodity tax, where the total tax revenue is of interest and policymakers do not care who pays the tax. Indeed, theory can often identify a specific, well-defined quantity whose measurement is key for a policy (see Deaton and Ng (1998) for an example of what Chetty (2009) calls a “sufficient” statistic). In this case, the behavior of a random sample of individuals might well provide a good guide to the tax revenue that
can be expected. Another case comes from work on poverty programs where the sponsors are most concerned about the budget; we discuss these cases at the end of this Section. Even here, it is easy to imagine behavioral effects coming into play that drive a wedge between the trial and its full-scale implementation, for example if compliance is higher when the scheme is widely publicized, or if government agencies implement the scheme differently from trialists.

2.4 Transporting results laterally and globally

The program of RCTs in economics, as in other areas of social science, has the broader goal of finding out ‘what works’. At its most ambitious, this aims for universal reach, and the development economics literature frequently argues that “credible impact evaluations are global public goods in the sense that they can offer reliable guidance to international organizations, governments, donors, and nongovernmental organizations (NGOs) beyond national borders”, Duflo and Kremer (2008, 93). Sometimes the results of a single RCT are advocated as having wide applicability, with especially strong endorsement when there is at least one replication. For example, Kremer and Holla (2009, 3) use a Kenyan trial as the basis for a blanket statement without specifying context: “Provision of free school uniforms, for example, leads to 10%-15% reductions in teen pregnancy and dropout rates.” Duflo and Kremer (2008, 104), writing about another trial, are more cautious, citing two evaluations and restricting themselves to India: “One can be relatively confident about recommending the scaling-up of this program, at least in India, on the basis of these estimates, since the program was continued for a period of time, was evaluated in two different contexts, and has shown its ability to be rolled out on a large scale.” Even a number of replications does not provide a sound basis for inference. Without theory to support the projection of results, this is just induction by simple enumeration: swan 1 is white, swan 2 is white, ..., so all swans are white.

The problem of generalization extends beyond RCTs, both to ‘fully controlled’ laboratory experiments and to most non-experimental findings. Our argument here is that evidence from RCTs is not automatically simply generalizable, and that its superior internal validity, if and when it exists, does not provide it with any unique invariance across context. That transportation is far from automatic also
tells us why (even ideal) RCTs of similar interventions give different answers in different settings. Such differences do not necessarily reflect methodological failings and will hold across perfectly executed RCTs just as they do across observational studies.

Many advocates of RCTs understand that ‘what works’ needs to be qualified to ‘what works under which circumstances’ and try to say something about what those circumstances might be, for example, by replicating RCTs in different places and thinking intelligently about the differences in outcomes when they find them. Sometimes this is done in a systematic way, for example by having multiple treatments within the same trial so that it is possible to estimate a ‘response surface’ that links outcomes to various combinations of treatments (see Greenberg and Schroder (2004) or Shadish et al (2002)). For example, the RAND health experiment had multiple treatments, allowing investigation of how much health insurance increased expenditures under different circumstances. Some of the negative income tax experiments (NITs) in the 1960s and 1970s were designed to estimate response surfaces, with the number of treatments and controls in each arm optimized to maximize precision of estimated response functions subject to an overall cost limit (see Conlisk (1973)). Experiments on time-of-day pricing for electricity had a similar structure (see Aigner (1985)).

The experiments by MDRC (originally known as the Manpower Development Research Corporation) have also been analyzed across cities in an effort to link city features to the results of the RCTs within them (see Bloom, Hill, and Riccio (2005)). Unlike the RAND and NIT examples, these are ex post analyses of completed trials; the same is true of Vivalt (2015), who finds, for the collection of trials she studied, that development-related RCTs run by government agencies typically find smaller (standardized) effect sizes than RCTs run by academics or by NGOs. Bold et al (2013), who ran parallel RCTs on an intervention implemented either by an NGO or by the government of Kenya, found similar results there. Note that these analyses have a different purpose from meta-analyses that assume that different trials estimate the same parameter up to noise and average in order to increase precision.
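For contrast, the sketch below shows the kind of averaging a conventional fixed-effect meta-analysis performs: an inverse-variance weighted mean, which is appropriate only if every trial estimates the same parameter up to noise. It also computes Cochran’s Q and the I² statistic, standard diagnostics for whether that assumption is plausible. The four effect sizes and standard errors are made up for illustration and do not come from any of the studies cited.

```python
import math

def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling plus Cochran's Q and I^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Under a single common true effect, Q is roughly chi-squared with k - 1
    # degrees of freedom; values far above k - 1 suggest the trials are
    # estimating different things.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i_squared

# Hypothetical effect sizes and standard errors from four trials.
pooled, se, q, i2 = fixed_effect_pool([0.30, 0.05, 0.22, -0.02], [0.05, 0.04, 0.06, 0.05])
print(f"pooled = {pooled:.3f} (se {se:.3f}), Q = {q:.1f} on 3 df, I^2 = {i2:.2f}")
```

Here Q is large relative to its 3 degrees of freedom, so the precision apparently gained by pooling is misleading: the trials look as if they are estimating different quantities, which is where analyses of heterogeneity across sites begin.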

Although there are issues with all of these methods of investigating differences across trials, without some discipline it is too easy to come up with ‘just-so’ or fairy stories that account for differences. We risk a procedure that, if a result is replicated in full or in part in at least two places, puts that treatment into the ‘it works’ box and, if the result does not replicate, casually interprets the difference in a way that allows at least some of the findings to survive.

How can we do better than simple generalization and simple extrapolation? Many writers emphasize the role of theory in transporting and using the results of trials, and we shall discuss this in the next subsection. But statistical approaches are also widely used; these are designed to deal with the possibility that treatment effects vary systematically with other variables. Referring back to (4), it is clear that, supposing the same form of (4) obtains, if the distribution of the w values is the same in the new circumstances as in the old, the ATE in the original trial will hold in the new circumstances. In general, of course, this condition will not hold, nor do we have any obvious way of checking it unless we know what the support factors are in both places. One procedure to deal with interactions is post-experimental stratification, which parallels post-survey stratification in sample surveys. The trial is broken up into subgroups that have the same combination of known, observable w’s (age, race, gender, for example), then the ATEs within each of the subgroups are calculated, and then they are reassembled according to the configuration of w’s in the new context. This can be used to estimate the ATE in a new context, or to correct estimates to the parent population when the trial sample is not a random sample of the parent. Other methods can be used when there are too many w’s for stratification, for example by estimating the probability of each observation in the population being included in the trial sample as a function of the w’s, then weighting each observation by the inverse of these propensity scores. A good reference for these methods is Stuart et al (2011), or in economics, Angrist (2004) and Hotz, Imbens, and Mortimer (2005).
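Post-experimental stratification can be sketched in a few lines. The version below assumes a trial whose records carry a discrete, observed stratum label (a hypothetical age group), computes the treated-minus-control difference in means within each stratum, and reassembles those stratum ATEs using the stratum shares of the new context. The data are invented, and inverse-propensity weighting, the alternative for many w’s, is not shown.

```python
from collections import defaultdict

def stratum_ates(records):
    """records: (stratum, treated, outcome) tuples from a randomized trial.
    Returns the treated-minus-control difference in means within each stratum."""
    totals = defaultdict(lambda: {"t_sum": 0.0, "t_n": 0, "c_sum": 0.0, "c_n": 0})
    for stratum, treated, y in records:
        cell = totals[stratum]
        if treated:
            cell["t_sum"] += y
            cell["t_n"] += 1
        else:
            cell["c_sum"] += y
            cell["c_n"] += 1
    return {s: c["t_sum"] / c["t_n"] - c["c_sum"] / c["c_n"] for s, c in totals.items()}

def reweighted_ate(ates, target_shares):
    """Reassemble stratum ATEs using the stratum shares of the new context.
    Refuses to extrapolate to a stratum the trial never observed."""
    missing = set(target_shares) - set(ates)
    if missing:
        raise ValueError(f"trial has no information on strata: {sorted(missing)}")
    return sum(share * ates[s] for s, share in target_shares.items())

# Hypothetical trial with two age strata.
trial = [
    ("young", True, 5.0), ("young", True, 7.0), ("young", False, 3.0), ("young", False, 3.0),
    ("old", True, 2.0), ("old", True, 2.0), ("old", False, 1.0), ("old", False, 2.0),
]
ates = stratum_ates(trial)                               # young: 3.0, old: 0.5
print(ates)
print(reweighted_ate(ates, {"young": 0.5, "old": 0.5}))  # the trial's own mix
print(reweighted_ate(ates, {"young": 0.2, "old": 0.8}))  # an older target population
```

The `ValueError` branch is the code-level counterpart of the requirement that the variables used for reweighting be present in both the original and the new context.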

These methods are often not applicable, however. First, reweighting works only when the observable factors used for reweighting include all (and only) genuine interactive causes. Second, as with any form of reweighting, the variables used to
construct the weights must be present in both the original and the new context. For example, if we are to carry a result forward in time, we may not be able to extrapolate from a period of low inflation to a period of high inflation. As Hotz et al (2005) note, it will typically be necessary to rule out such ‘macro’ effects, whether over time or over locations. Third, it also depends on assuming that the same governing equation (4) covers the trial and the target population.
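The first limitation can be illustrated with a hypothetical simulation in which the individual effect depends on an observed age group and on an unobserved support factor u. Reweighting the trial’s age-specific effects recovers the target ATE when the target differs only in its age mix, but not when u is also less common there. The functional form and all probabilities are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(7)

def population(n: int, p_old: float, p_support: float):
    """Hypothetical population: an observed age group plus an unobserved
    support factor u, with individual effect beta_i = 1 + 2 * (young_i and u_i)."""
    old = rng.random(n) < p_old
    u = rng.random(n) < p_support
    beta = 1.0 + 2.0 * (~old & u)
    return old, beta

def age_reweighted_ate(trial_old, trial_beta, target_old) -> float:
    # Stratum ATEs from the trial (true effects are used for clarity; in a
    # real trial these would be within-stratum differences in means),
    # reweighted by the target population's age shares.
    ate_old = trial_beta[trial_old].mean()
    ate_young = trial_beta[~trial_old].mean()
    share_old = target_old.mean()
    return float(share_old * ate_old + (1 - share_old) * ate_young)

trial_old, trial_beta = population(200_000, p_old=0.3, p_support=0.5)
# Target 1 differs only in its age mix.
t1_old, t1_beta = population(200_000, p_old=0.7, p_support=0.5)
# Target 2 also has fewer (unobserved) support factors.
t2_old, t2_beta = population(200_000, p_old=0.7, p_support=0.1)

for name, (old, beta) in {"target 1": (t1_old, t1_beta), "target 2": (t2_old, t2_beta)}.items():
    est = age_reweighted_ate(trial_old, trial_beta, old)
    print(f"{name}: reweighted {est:.2f}, true {beta.mean():.2f}")
```

In target 2 the reweighted estimate stays near 1.3 while the true ATE is near 1.06: age is observed in both places, but it is not the only interactive cause.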

Pearl and Bareinboim (2011, 2014) and Bareinboim and Pearl (2013, 2014) provide strategies for inferring information about new populations from trial results that are more general than reweighting. They suppose we have available both causal information and probabilistic information for population A (e.g. the experimental one), while for population B (the target) we have only (some) probabilistic information, and also that we know that certain probabilistic and causal facts are shared between the two and certain ones are not. They offer theorems describing what causal conclusions about population B are thereby fixed. Their work underlines the fact that exactly what conclusions about one population can be supported by information about another depends on exactly what causal and probabilistic facts they have in common. But as Muller (2015) notes, this, like the problem with simple reweighting, takes us back to the situation that RCTs are designed to avoid, where we need to start from a complete and correct specification of the causal structure. RCTs can avoid this in estimation (which is one of their strengths, supporting their credibility), but the benefit vanishes as soon as we try to carry their results to a new context.

This discussion leads to a number of points. First, we cannot get to general claims by simple generalization; there is no warrant for the convenient assumption that the ATE estimated in a specific RCT is an invariant parameter, nor that the kinds of interventions and outcomes we measure in typical RCTs participate in general causal relations. While it is true that general causal claims exist (that gravitational masses attract each other, or that people respond to incentives), these use relatively abstract concepts and operate at a much higher level than the claims that can be reasonably inferred from a typical RCT.

Second, thoughtful pre-experimental stratification in RCTs (or, failing that, subgroup analysis) is likely to be valuable, because it can provide information that may be useful for generalization or transportation. For example, Kremer and Holla (2009) note that, in their trials, school attendance is surprisingly sensitive to small subsidies, which they suggest is because there are a large number of students and parents who are on the (financial) margin between attending and not attending school; if this is indeed the mechanism for their results, a good variable for stratification would be distance from the relevant cutoff. We also need to know that this same mechanism works in any new target setting.
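The marginal-student mechanism can be written as a simple threshold model, sketched here under hypothetical assumptions: each family has a net cost of attendance, the child attends when that cost is below the subsidy (zero without one), and the subsidy’s effect is therefore the share of families whose cost lies between zero and the subsidy. The two normal cost distributions are invented to represent settings with many and with few families near the margin.

```python
import numpy as np

rng = np.random.default_rng(3)

def attendance_effect(net_cost: np.ndarray, subsidy: float) -> float:
    """Hypothetical threshold model: a child attends when the family's net
    cost of attending is below the subsidy (below zero with no subsidy).
    Returns the rise in the attendance rate caused by the subsidy, i.e. the
    share of families with 0 <= net cost < subsidy."""
    return float(np.mean((net_cost >= 0) & (net_cost < subsidy)))

subsidy = 1.0
near_margin = rng.normal(0.5, 1.0, 200_000)  # many families close to the cutoff
far_from_it = rng.normal(4.0, 1.0, 200_000)  # few families close to the cutoff
print(round(attendance_effect(near_margin, subsidy), 3))  # roughly 0.38
print(round(attendance_effect(far_from_it, subsidy), 3))  # roughly 0.001
```

The same subsidy has a large effect in one setting and almost none in the other, which is why distance from the cutoff would be an informative stratifying variable if this is the mechanism at work.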

Third, we need to be explicit about causal structure, even if that means more model building and more (or different) assumptions than advocates of RCTs are often comfortable with. To be clear, modeling causal structure does not commit us to the elaborate and often incredible assumptions that characterize some structural modeling in economics, but there is no escape from thinking about the way things work: the why as well as the what.

Fourth, we will typically need to know more than the results of the RCT itself, for example about differences in social, economic, and cultural structures and about the joint distributions of causal variables, knowledge that will often only be available through observational studies. We will also need external information, both theoretical and empirical, to settle on an informative characterization of the population enrolled in the RCT, because how that population is described is commonly taken to be some indication of which other populations the results are likely to be exportable to. Many medical and psychological journals are explicit about this. For instance, the rules for submission recommended by the International Committee of Medical Journal Editors, ICMJE (2015, 14), insist that article abstracts “Clearly describe the selection of observational or experimental participants (healthy individuals or patients, including controls), including eligibility and exclusion criteria and a description of the source population.” An RCT is conducted on a specific trial sample, somehow drawn from a population of specific individuals. The results obtained are features of that sample, of those very individuals at that very time, not any other population
with any different individuals that might, for example, satisfy one of the infinite set of descriptions that the trial sample satisfies.

This same issue is confronted already in study design. Apart from special cases, like post hoc evaluation for payment-for-results, we are not especially concerned to learn about the very individuals enrolled in the trial. Most experiments are, and should be, conducted with an eye to what the results can help us learn about other populations. This cannot be done without substantial assumptions about what might and what might not be relevant to the production of the outcome studied. (For example, the ICMJE guidelines (2015, 14) go on to say: “Because the relevance of such variables as age, sex, or ethnicity is not always known at the time of study design, researchers should aim for inclusion of representative populations into all study types and at a minimum provide descriptive data for these and other relevant demographic variables.”) So both intelligent study design and responsible reporting of study results involve substantial background assumptions.

Of course, this is true for all studies. But RCTs require special conditions if they are to be conducted at all and especially if they are to be conducted successfully (for example, local agreements, compliant subjects, affordable administrators, multiple blinding, people competent to measure and record outcomes reliably, a setting where random allocation is morally and politically acceptable, and so on), whereas observational data are often more readily and widely available. In the case of RCTs, there is a danger that these kinds of considerations have too much influence on the choice of trial sample. This is especially worrisome where the features that the trial sample should have are not justified, made explicit, or subjected to serious critical review.

The need for observational knowledge is one of many reasons why it is counterproductive to insist that RCTs are the gold standard or that some categories of evidence should be prioritized over others; these strategies leave us helpless in using RCTs beyond their original context. The results of RCTs must be integrated with other knowledge, including the practical wisdom of policymakers, if they are to be usable outside the context in which they were constructed.

Contrary to much practice in medicine as well as in economics, conflicts between RCTs and observational results need to be explained, for example by reference to the different populations in each, a process that will sometimes yield important evidence, including on the range of applicability of the RCT results themselves. While the validity of the RCT will sometimes provide an understanding of why the observational study found a different answer, there is no basis (or excuse) for the common practice of dismissing the observational study simply because it was not an RCT and therefore must be invalid. It is a basic tenet of scientific advance that, as collective knowledge advances, new findings must be able to explain and be integrated with previous results, even results that are now thought to be invalid; methodological prejudice is not an explanation.

2.5 Using theory for generalization

Economists have been combining theory and randomized controlled trials since the early experiments. Orcutt and Orcutt (1968) laid out the inspiration for the income tax trials using a simple static theory of labor supply. According to this, people choose how to divide their time between work and leisure in an environment in which they receive a minimum $G$ if they do not work, and where they receive an additional amount $(1-t)w$ for each hour they work, where $w$ is the wage rate and $t$ is a tax rate. The trials assigned different combinations of $G$ and $t$ to different trial groups, so that the results traced out the labor supply function, allowing estimation of the parameters of preferences, which could then be used in a wide range of policy calculations, for example to raise revenue at minimum utility loss to workers.
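The logic of this design can be sketched with a hypothetical specification: Cobb-Douglas preferences over consumption and leisure, $\alpha \ln c + (1-\alpha)\ln(T-h)$, with budget $c = G + (1-t)wh$ and $0 \le h \le T$. Each $(G, t)$ arm yields an hours choice, and at an interior solution the leisure-demand equation can be inverted to recover $\alpha$. The preference form, the time endowment $T$, and all numbers are assumptions for illustration, not the specification used in the income tax experiments.

```python
from dataclasses import dataclass

@dataclass
class Arm:
    G: float  # guaranteed income if the person does not work
    t: float  # tax rate on earnings

def hours_worked(arm: Arm, wage: float, alpha: float, T: float = 100.0) -> float:
    """Hours maximizing alpha*ln(c) + (1 - alpha)*ln(T - h) subject to
    c = G + (1 - t)*wage*h and 0 <= h <= T (hypothetical Cobb-Douglas tastes)."""
    net_wage = (1.0 - arm.t) * wage
    full_income = arm.G + net_wage * T
    leisure = (1.0 - alpha) * full_income / net_wage
    return max(0.0, T - leisure)  # corner at h = 0 if desired leisure exceeds T

def recover_alpha(arm: Arm, wage: float, hours: float, T: float = 100.0) -> float:
    """Invert the leisure-demand equation; valid only at an interior solution."""
    net_wage = (1.0 - arm.t) * wage
    return 1.0 - (T - hours) * net_wage / (arm.G + net_wage * T)

arms = [Arm(0, 0.0), Arm(200, 0.3), Arm(400, 0.5), Arm(1000, 0.7)]
for arm in arms:
    h = hours_worked(arm, wage=10.0, alpha=0.6)
    if h > 0:
        print(arm, round(h, 1), round(recover_alpha(arm, 10.0, h), 3))
    else:
        print(arm, "corner: does not work, so alpha is only bounded")
```

The interior arms each return the same preference parameter, which could then be used for policy calculations outside the trial; the last arm pushes the person to a corner, where the data carry only inequality information about $\alpha$.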

Following these early trials, there has been a continuing tradition of using trial results, together with the baseline data collected for the trial, to fit structural models that are to be used more generally. Early examples include Moffitt (1979) on labor supply and Wise (1985) on housing; a more recent example is Heckman, Pinto, and Savelyev (2013) for the Perry pre-school program. Development economics examples include Attanasio, Meghir, and Santiago (2012), Attanasio et al (2015), Todd and Wolpin (2006), Wolpin (2013), and Duflo, Hanna, and Ryan (2012). These
structural models sometimes require formidable auxiliary assumptions on functional forms or the distributions of unobservables, but they have compensating advantages, including the ability to integrate theory and evidence, to make out-of-sample predictions, and to analyze welfare. The use of RCT evidence, in turn, allows the relaxation of at least some of the assumptions that are needed for identification. In this way, the structural models borrow credibility from the RCTs and in return help set the RCT results within a coherent framework. Without some such interpretation, the welfare implications of RCT results can be problematic; knowing how people in general (let alone just people in the trial population) respond to some policy is rarely enough to tell whether or not they are made better off (Harrison (2014a, b)). Traditional welfare economics draws a link from preferences to behavior, a link that is respected in structural work but often lost in the ‘what works’ literature, and without which we have no basis for inferring welfare from behavior. What works is not equivalent to what should be.

Light-touch theory can do much to interpret, to extend, and to use RCT results. In both the RAND Health Experiment and the negative income tax experiments, an immediate issue concerned the difference between short- and long-run responses; indeed, differences between immediate and ultimate effects occur in a wide range of RCTs. Both health and tax RCTs aimed to discover what would happen if consumers/workers were permanently faced with higher or lower prices/wages, but the trials could only run for a limited period. A temporarily high tax rate on earnings is effectively a ‘fire sale’ on leisure, so that the experiment provided an opportunity to take a vacation and make up the earnings later, an incentive that would be absent in a permanent scheme. How do we get from the short-run responses that come from the trial to the long-run responses that we want to know? Metcalf (1973) and Ashenfelter (1978) provided answers for the income tax experiments, as did Arrow (1975) for the RAND Health Experiment.

Arrow’s analysis illustrates how to use both structure and observational data to transport and adapt results from one setting to another. He models the health experiment as a two-period model in which the price of medical care is lowered in the first period only, and shows how to derive what we want, which is the response in
the first period if prices were lowered by the same proportion in both periods. The magnitude that we want is $S$, the compensated price derivative of medical care in period 1 in the face of identical increases in $p_1$ and $p_2$ in both periods 1 and 2. This is equal to $s_{11} + s_{12}$, the sum of the derivatives of period 1’s demand with respect to the two prices. The trial gives only $s_{11}$. But if we have post-trial data on medical services for both treatments and controls, we can infer $s_{21}$, the effect of the experimental price manipulation on post-experimental care. Choice theory, in the form of Slutsky symmetry, says that $s_{12} = s_{21}$ and so allows Arrow to infer $s_{12}$ and thus $S$. He contrasts this with Metcalf’s alternative solution, which makes different assumptions: that two-period preferences are intertemporally additive, in which case the long-run elasticity can be obtained from knowledge of the income elasticity of post-experimental medical care, which would have to come from an observational analysis. These two alternative approaches show how we can choose, based on our willingness to make assumptions and on the data that we have, a suitable combination of (elementary and transparent) theoretical assumptions and observational data in order to adapt and use trial results. Such analysis can also help design the original trial by clarifying what we need to know in order to use the results of a temporary treatment to estimate the permanent effects that we need. Ashenfelter provides a third solution, noting that the two-period model is formally identical to a two-person model, so that we can use information on two-person labor supply to tell us about the dynamics.
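Arrow’s argument can be checked numerically. The sketch below assumes a hypothetical three-good Cobb-Douglas expenditure function (medical care in period 1, medical care in period 2, and all other goods), obtains compensated demands by Shephard’s lemma, and estimates Slutsky terms by central differences. It computes $S$ from $s_{11}$ and $s_{21}$, as the trial plus post-trial data would allow, and compares it with the derivative obtained by raising $p_1$ and $p_2$ together. The expenditure function and shares are illustrative only; Arrow’s step relies on symmetry, not on this functional form.

```python
SHARES = (0.2, 0.3, 0.5)  # hypothetical: care in period 1, care in period 2, all else
UTILITY = 1.0

def expenditure(p):
    """Hypothetical Cobb-Douglas expenditure function e(p, u) = u * prod p_k ** a_k."""
    e = UTILITY
    for share, price in zip(SHARES, p):
        e *= price ** share
    return e

def compensated_demand(i, p):
    # Shephard's lemma for this expenditure function: h_i = a_i * e(p, u) / p_i.
    return SHARES[i] * expenditure(p) / p[i]

def slutsky(i, j, p, eps=1e-6):
    """Central-difference estimate of s_ij = d h_i / d p_j at prices p."""
    up, down = list(p), list(p)
    up[j] += eps
    down[j] -= eps
    return (compensated_demand(i, up) - compensated_demand(i, down)) / (2 * eps)

p = [1.0, 1.0, 1.0]
s11 = slutsky(0, 0, p)  # identified by the temporary, period-1-only price change
s21 = slutsky(1, 0, p)  # period-1 price on period-2 care: from post-trial follow-up
S_arrow = s11 + s21     # Arrow's route: substitute s21 for s12 (Slutsky symmetry)

# Direct check: move p1 and p2 together and differentiate period-1 demand.
eps = 1e-6
up, down = [1 + eps, 1 + eps, 1.0], [1 - eps, 1 - eps, 1.0]
S_direct = (compensated_demand(0, up) - compensated_demand(0, down)) / (2 * eps)
print(round(S_arrow, 6), round(S_direct, 6))  # both -0.1 for these shares
```

In this example $s_{11} = -0.16$ and $s_{12} = s_{21} = 0.06$, so $S = -0.10$: the permanent response is smaller than the temporary one the trial measures, because part of the temporary response is substitution between periods.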

Theory can often allow us to reclassify new or unknown situations as analogous to situations where we already have background knowledge. One frequently useful way of doing this is when the new policy can be recast as equivalent to a change in the budget constraint that respondents face. The consequences of a new policy may be easier to predict if we can reduce it to equivalent changes in income and prices, whose effects are often well understood and well-studied. Todd and Wolpin (2008) and Wolpin (2013) make this point and provide examples. In the labor supply case, an increase in the tax rate has the same effect as a decrease in the wage rate, so that we can rely on previous literature to predict what will happen when tax rates are changed. In the case of Mexico’s PROGRESA conditional cash
transfer program, Todd and Wolpin note that the subsidies paid to parents if their children go to school can be thought of as a combination of a reduction in children’s wages and an increase in parents’ income, which allows them to predict the results of the conditional cash experiment with limited additional assumptions. If this works, as it partially does in their analysis, the trial helps consolidate previous knowledge and contributes to an evolving body of theory and empirical, including trial, evidence.
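The reinterpretation can be verified on a stylized budget, sketched here under hypothetical assumptions: the child either works at wage $w$ or attends school, and a transfer $\pi$ is paid only on attendance. Consumption is then $Y + \pi$ with school and $Y + w$ with work, which is the same choice set as parental income $Y + \pi$ and a child wage of $w - \pi$ without any transfer. The log-utility household and its taste parameter are invented; the equivalence itself does not depend on them.

```python
import math

def prefers_school(parent_income, child_wage, subsidy=0.0, taste=0.3):
    """Hypothetical household: the child attends school if log consumption
    plus a taste for schooling beats log consumption with the child working."""
    c_school = parent_income + subsidy   # transfer paid only on attendance
    c_work = parent_income + child_wage  # child's earnings if working
    return math.log(c_school) + taste > math.log(c_work)

# A transfer pi is equivalent to raising parents' income by pi and cutting
# the child's wage by pi: both leave school at Y + pi and work at Y + w.
for Y in (50, 100, 200):
    for w in (20, 50, 80):
        for pi in (10, 30):
            assert prefers_school(Y, w, subsidy=pi) == prefers_school(Y + pi, w - pi)

print(prefers_school(100, 50), prefers_school(100, 50, subsidy=30))  # False True
```

Any household model that responds only to parental income and the child’s wage therefore predicts the same response to the transfer as to that income-and-wage change, which is what allows previously estimated responses to be used for prediction.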

The program of thinking about policy changes as equivalent to price and income changes has a long history in economics; much of rational choice theory can be so interpreted (see Deaton and Muellbauer (1980) for many examples). When this conversion is credible, and when a trial on some apparently unrelated topic can be modeled as equivalent to a change in prices and incomes, and when we can assume that people in different settings respond relevantly similarly to changes in prices and incomes, we have a ready-made framework for incorporating the trial results into previous knowledge, as well as for extending the trial results and using them elsewhere. Of course, all depends on the validity and credibility of the theory; people may not in fact treat a tax increase as a decrease in the price of leisure, and behavioral economics is full of examples where apparently equivalent stimuli generate non-equivalent outcomes. The embrace of behavioral economics by many of the current generation of trialists may account for their limited willingness to use conventional choice theory in this way. Unfortunately, behavioral economics does not yet offer a replacement for the general framework of choice theory that is so useful in this regard.

Theory can also help with the problem we raised of delineating the population to which the trial results immediately apply, and with thinking about moving from this population to populations of interest. Ashenfelter’s (1978) analysis is again a good illustration and predates much similar work in later literature. The income tax experiments offered participation in the trial to a random sample of the population of interest. Because there was no blinding and no compulsion, people who were randomized into the treatment group were free to choose to refuse treatment. As in many subsequent analyses, Ashenfelter supposes that people choose to
participate if it is in their interest to do so, depending on what has become known in the RCT and Instrumental Variables literature as their own idiosyncratic ‘gain’. The simple labor supply model gives an approximate condition: if the treatment increases the tax rate from $t_0$ to $t_1$ with an offsetting increase in $G$, then an individual assigned to the experimental group will decline to participate if

$$(t_1 - t_0)\, w_0 h_0 + \tfrac{1}{2}\, s_{00}\, (t_1 - t_0) > G_1 - G_0 \tag{5}$$

where subscript 1 refers to the treatment situation, 0 to the control, $h_0$ is hours worked, and $s_{00}$ is the (negative) utility-constant response of hours worked to the tax rate. If there is no substitution, the second term on the left-hand side is zero, and people will accept treatment if the increase in $G$ more than makes up for the increase in taxes payable, the ‘break-even’ condition. In consequence, those with higher earnings are less likely to accept treatment. Some better-off people with high substitution effects will also accept treatment if the opportunity to buy more cheap leisure is sufficient enticement.
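The break-even logic in the no-substitution case ($s_{00} = 0$ in (5)) can be sketched directly: a person assigned to treatment declines when $(t_1 - t_0) w_0 h_0$ exceeds $G_1 - G_0$, so acceptance is decided by pre-experimental earnings alone. The tax rates, guarantees, and earnings levels below are hypothetical, and the sketch deliberately omits the substitution term.

```python
def declines_treatment(earnings, t0, t1, G0, G1):
    """No-substitution special case of condition (5) (s00 = 0): decline when
    the extra tax on pre-experimental earnings w0*h0 exceeds the rise in G."""
    return (t1 - t0) * earnings > G1 - G0

# Hypothetical program parameters.
t0, t1, G0, G1 = 0.2, 0.5, 0.0, 3000.0
breakeven = (G1 - G0) / (t1 - t0)  # earnings at which the two effects cancel
for earnings in (5000, 9000, 12000, 20000):
    status = "declines" if declines_treatment(earnings, t0, t1, G0, G1) else "accepts"
    print(earnings, status)
print(f"break-even earnings: {breakeven:.0f}")
```

Those who accept are disproportionately below the break-even earnings level, so the group actually treated is a selected subset of those assigned to treatment.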

The selective acceptance of treatment limits the analyst’s ability to learn about the better-off or low-substitution people who decline treatment but who would have to accept it if the policy were implemented. Both the intention-to-treat estimator and the ‘as treated’ estimator that compares the treated and the untreated are affected, not just by the labor supply effects that the trial is designed to induce, but by the kind of selection effects that randomization is designed to eliminate. Of course, the analysis that leads to (5) can perhaps help us say something about this and help us adjust the trial estimates back to what we would like to know. Yet this is no easy matter because selection depends, not only on observables, such as pre-experimental earnings and hours worked, but on (much harder to observe) labor supply responses that likely vary across individuals. Paraphrasing Ashenfelter, we cannot estimate the effects of a permanent compulsory negative income tax program from a transitory voluntary trial without strong assumptions or additional evidence.
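
A toy calculation shows how selective take-up moves both estimators away from the effect of a compulsory program. The sketch is hypothetical throughout: four equally weighted types of people, an idealized trial in which each arm contains exactly one copy of each type (so there is no sampling noise), and invented outcomes in which better-off people, who also have larger effects, refuse treatment.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Person:
    y0: float      # outcome (say, hours worked) if untreated
    effect: float  # individual effect of treatment on the outcome
    accepts: bool  # takes up treatment if randomized into the treatment arm


def estimates(population):
    """Return (compulsory ATE, intention-to-treat, as-treated) under an
    idealized trial where both arms contain every type exactly once."""
    compulsory_ate = mean(p.effect for p in population)
    treatment_arm = [p.y0 + (p.effect if p.accepts else 0.0) for p in population]
    control_arm = [p.y0 for p in population]
    itt = mean(treatment_arm) - mean(control_arm)
    # 'As treated' compares those who took treatment with everyone who did not:
    # the whole control arm plus the refusers in the treatment arm.
    treated = [p.y0 + p.effect for p in population if p.accepts]
    untreated = control_arm + [p.y0 for p in population if not p.accepts]
    as_treated = mean(treated) - mean(untreated)
    return compulsory_ate, itt, as_treated


population = [
    Person(10.0, -2.0, True),   # low earners accept
    Person(12.0, -1.0, True),
    Person(30.0, -6.0, False),  # high earners decline
    Person(28.0, -5.0, False),
]
print(estimates(population))  # (-3.5, -0.75, -13.5)
```

Neither -0.75 nor -13.5 recovers the -3.5 that a compulsory program would produce in this made-up population; getting there requires modeling who refuses and why.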

Much of the modern literature, for example on training programs, wrestles with the issue of exactly who is represented by the RCT results, including not only
who participates in the first place but who leaves before the trial is completed (see again Heckman, LaLonde and Smith (1999)). As in the examples above, modeling attrition within a trial can yield estimates of behavioral responses that can be used to transport the findings to other settings (see Chan and Hamilton (2006), Chassang, Padró i Miguel, and Snowberg (2012) and Chassang et al (2015)). When people are allowed to reject their randomly assigned treatment according to their own (real or perceived) advantage, or to drop out of a trial on an estimate of the benefits and costs from doing so, we have come a long way from the random allocation in the standard conception of a randomized controlled trial. Moreover, the absence of blinding is common in social and economic RCTs, and while there are trials, such as welfare trials, that effectively compel people to accept their assignments, and some where the treatment is generous enough to do so, there are trials where subjects have much freedom and, in those cases, it is less than obvious to us what role, if any, randomization plays in warranting the results.

2.6 Scaling up: using the average for populations

Many RCTs are small-scale and local, for example in a few schools, clinics, or farms in a particular geographic, cultural, or socio-economic setting. If an intervention is successful, according to a cost-effectiveness criterion for example, it becomes a candidate for scaling up: applying the same intervention to a much larger area, often a whole country, or sometimes even beyond, as when some treatment is considered for all relevant World Bank projects. The fact that the intervention might work differently at scale has long been noted in the economics literature, e.g. Garfinkel and Manski (1992), Heckman (1992), and Moffitt (1992), and is recognized in the recent review by Banerjee and Duflo (2009). We want here to emphasize the pervasiveness of such effects as well as to note again that this should not be taken as an argument against using RCTs but only against the idea that effects at scale are likely to be the same as in the trial.

An example of what are often called ‘general equilibrium effects’ comes from agriculture. Suppose an RCT demonstrates that in the study population a new way of using fertilizer had a substantial positive effect on, say, cocoa yields, so that farmers who used the new methods saw increases in production and in incomes compared to those in the control group. If the procedure is scaled up to the whole country, or
to all cocoa farmers worldwide, the price will drop, and if the demand for cocoa is price inelastic (as is usually thought to be the case, at least in the short run), cocoa farmers’ incomes will fall. Indeed, the conventional wisdom for many crops is that farmers do best when the harvest is small, not large. Of course, these considerations might not be decisive in deciding whether or not to promote the innovation, and there may still be long-term gains if, for example, some farmers find something better to do than growing cocoa. But, in this case, the scaled-up effect is opposite in sign to the trial effect. The problem is not with the trial results, which can be usefully incorporated into a more comprehensive market model that incorporates the responses estimated by the trial. The problem arises only if we assume that the aggregate looks like the individual. That other ingredients of the aggregate model must come from observational studies should not be a criticism, even for those who favor RCTs; it is simply the price of doing serious analysis.
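
The sign reversal can be checked with a back-of-the-envelope calculation. The sketch below assumes constant-elasticity demand, $Q = A P^{-\varepsilon}$, and a hypothetical 20 percent yield gain; neither the functional form nor the numbers come from any cocoa study.

```python
def revenue_ratio_in_trial(yield_gain):
    """Revenue after/before for a trial farmer: a small trial leaves the
    market price unchanged, so revenue rises one-for-one with output."""
    return 1.0 + yield_gain


def revenue_ratio_at_scale(yield_gain, demand_elasticity):
    """Revenue after/before when every producer adopts, assuming
    constant-elasticity demand Q = A * P**(-demand_elasticity)."""
    quantity_ratio = 1.0 + yield_gain
    price_ratio = quantity_ratio ** (-1.0 / demand_elasticity)  # price falls to clear the market
    return price_ratio * quantity_ratio


print(revenue_ratio_in_trial(0.2))       # 1.2: trial farmers gain
print(revenue_ratio_at_scale(0.2, 0.5))  # about 0.833: inelastic demand, incomes fall
print(revenue_ratio_at_scale(0.2, 2.0))  # about 1.095: elastic demand, incomes rise
```

Under this assumed demand curve, aggregate revenue scales as output raised to the power $1 - 1/\varepsilon$, so any elasticity below one turns a positive trial effect into a negative effect at scale.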

There are many possible interventions that alter supply or demand whose effect, in aggregate, will change a price or a wage that is held constant in the original RCT. Education will change the supplies of skilled versus unskilled labor, with implications for relative wage rates. Conditional cash transfers increase the demand for (and perhaps supply of) schools and clinics, which will change prices or waiting lines, or both. There are interactions between people that will operate only at scale. Giving one child a voucher to go to private school might improve her future, but doing so for everyone can decrease the quality of education for those children who are left in the public schools (see the contrasting studies of Angrist et al (2002) and Hsieh and Urquiola (2002)). Educational or training programs may benefit those who are treated but harm those left behind; Crépon et al (2014) recognize the issue and show how to adapt an RCT to deal with it.

Scaling up can also disturb the political equilibrium. An exploitative government may not allow the mass transfer of money from abroad to a powerless segment of the population, though it may permit a small-scale RCT of cash transfers, perhaps even in the hope that a large-scale implementation will yield opportunities for predation. Provision of health care by foreign NGOs may be successful in trials, but have unintended negative consequences at scale because of general equilibrium
effects on the supply of health care personnel, or because it disturbs the nature of the contract between the people and a government that is using tax revenue to provide services. In India, the government spends large sums on food subsidies through a system (the Public Distribution System, or PDS) that is both corrupt and inefficient, with much of the grain that is procured failing to find its way to the intended beneficiaries. Localized RCTs on whether or not families are better off with cash transfers are not informative about how politicians would change the amount of the transfer if faced with unanticipated inflation, nor, at least as important, about whether the government could cut procurement from relatively wealthy and politically powerful farmers. Without a political and general equilibrium analysis, it is impossible to think about the effects of replacing food subsidies with cash transfers (see e.g. Basu (2010)).

Even in medicine, where biological interactions between people are less common than are social interactions in social science, interactions can be important. Infectious diseases are one well-known example, where immunization programs affect the dynamics of disease transmission through herd immunity (see Fine and Clarkson (1986) and Manski (2013, 52)). The social and economic setting also affects how drugs are actually used, and the same issues can arise; the distinction between efficacy and effectiveness in clinical trials is in part a recognition of this fact.

2.7 Drilling down: using the average for individuals

Just as there are issues with scaling up, it is not obvious how to use the results from RCTs at the level of individual units, even individual units that were included in the trial. A well-conducted RCT delivers an ATE for the trial population but, in general, that average does not apply to everyone. It is not true, for example, as argued in the American Medical Association’s “Users’ guide to the medical literature,” that “if the patient would have been enrolled in the study had she been there—that is she meets all of the inclusion criteria and doesn’t violate any of the exclusion criteria—there is little question that the results are applicable” (see Guyatt et al (1994, 60)). Even more misleading are the often-heard statements that an RCT with an average treatment effect insignificantly different from zero has shown that the treatment works for no one.
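
The last point is easy to see with potential outcomes. In the hypothetical population below, the treatment helps half the people and harms the other half by the same amount, so the ATE is zero even though the treatment changes the outcome of every single person; the numbers are invented for illustration.

```python
from statistics import mean

# Hypothetical potential outcomes (untreated, treated) for six people.
potential_outcomes = [
    (5.0, 7.0), (4.0, 6.0), (6.0, 8.0),  # helped by +2
    (5.0, 3.0), (7.0, 5.0), (6.0, 4.0),  # harmed by -2
]

individual_effects = [y1 - y0 for y0, y1 in potential_outcomes]
ate = mean(individual_effects)

print(individual_effects)                                 # [2.0, 2.0, 2.0, -2.0, -2.0, -2.0]
print(ate)                                                # 0.0
print(all(effect != 0 for effect in individual_effects))  # True
```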

These issues are familiar to physicians practicing evidence-based medicine, whose guidelines require “integrating individual clinical expertise with the best available external clinical evidence from systematic research,” Sackett et al (1996, 71). Exactly what this means is unclear; physicians know much more about their patients than is allowed for in the ATE from the RCT (though, once again, stratification in the trial is likely to be helpful) and they often have intuitive expertise from long practice that can help them identify features in a particular patient that may influence the effectiveness of a given treatment for that patient. But there is an odd balance struck here. These judgments are deemed admissible in discussion with the individual patient, but they don’t add up to evidence to be made publicly available, with the usual cautions about credibility, by the standards adopted by most EBM sites. It is also true that physicians can have prejudices and ‘knowledge’ that might be anything but. Clearly, there are situations where forcing practitioners to follow the average will do better, even for individual patients, and others where the opposite is true, Kahneman and Klein (2009).

Whether or not averages are useful to individuals raises the same issue throughout social science research. Imagine two schools, St Joseph’s and St Mary’s, both of which were included in an RCT of a classroom innovation. The innovation is successful on average, but should the schools adopt it? Should St Mary’s be influenced by a previous attempt in St Joseph’s that was judged a failure? Many would dismiss this experience as anecdotal and ask how St Joseph’s could have known that it was a failure without benefit of ‘rigorous’ evidence. Yet if St Mary’s is like St Joseph’s, with a similar mix of pupils, a similar curriculum, and similar academic standing, might not St Joseph’s experience be more relevant to what might happen at St Mary’s than is the positive average from the RCT? And might it not be a good idea for the teachers and governors of St Mary’s to go to St Joseph’s and find out what happened and why? They may be able to observe the mechanism of the failure, if such it was, and figure out whether the same problems would apply to them, or whether they might be able to adapt the innovation to make it work for them, perhaps even more successfully than the positive average in the trial.

Once again, these questions are unlikely to be easily answered in practice; but, as with transportability, there is no serious alternative to trying. Assuming that the average works for you will often be wrong, and it will at least sometimes be possible to do better. As in the medical case, the advice to individual schools often lacks specificity. For example, the U.S. Institute of Education Sciences has provided a “user-friendly” guide to practices supported by rigorous evidence, US Department of Education (2003). The advice, which is similar to recommendations in development economics, is that the intervention be demonstrated effective through well-designed RCTs in more than one site and that “the trials should demonstrate the intervention’s effectiveness in school settings similar to yours” (2003, 17). No operational definition of “similar” is provided.

2.8 Examples and illustrations from economics

Our arguments in this Section should not be controversial, yet we believe that they represent an approach that is different from most current practice. To document this and to fill out the arguments, we provide some examples. While these are occasionally critical, our purpose is constructive; indeed, we believe that misunderstandings about how to use RCTs have artificially limited their usefulness, as well as alienated some who would otherwise use them.

Conditional cash transfers (CCTs) are interventions that have been tested using RCTs (and other RCT-like methods) and are often cited as a leading example of how an evaluation with strong internal validity leads to a rapid spread of the policy, e.g. Angrist and Pischke (2010) among many others. Think through the causal chain that is required for CCTs to be successful: People must like money, they must like (or not object too much to) their children being educated and vaccinated, there must exist schools and clinics that are close enough and well enough staffed to do their job, and the government or agency that is running the scheme must care about the wellbeing of families and their children. That such conditions hold in a wide range of (although certainly not all) countries makes it unsurprising that CCTs ‘work’ in many replications, though they certainly will not work in places where the schools and clinics do not exist, e.g. Levy (2001), nor in places where people strongly oppose education or vaccination.

Similarly, given that the support factors will operate with different strengths and effectiveness in different places, it is also not surprising that the size of the ATE differs from place to place; for example, Vivalt’s AidGrade website lists 29 estimates from a range of countries of the standardized (divided by local standard deviation of the outcome) effects of CCTs on school attendance; all but four show the expected positive effect, and the range runs from −8 to +38 percentage points, Vivalt (2015). Even in this leading case, where we might reasonably conclude that CCTs ‘work’ in getting children into school, it would be hard to calculate credible cost-effectiveness numbers or to come to a general conclusion about whether CCTs are more or less cost-effective than other possible policies. Both costs and effect sizes can be expected to differ in new settings, just as they have in observed ones, making these predictions difficult.
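
One way to see why ATEs differ from site to site is a stylized model in which the CCT affects attendance only for households where every support factor in the causal chain is present. The sketch below assumes that model, a common effect of 0.2 (hypothetical) when all factors hold, and invented site compositions; the factor names are illustrative labels, and the result illustrates the reasoning rather than estimating anything.

```python
from statistics import mean

EFFECT_WHEN_SUPPORTED = 0.2  # hypothetical effect on attendance


def site_ate(households):
    """ATE at a site where the transfer works only if every support factor holds."""
    return mean(EFFECT_WHEN_SUPPORTED if all(h.values()) else 0.0 for h in households)


supported = {"likes_money": True, "accepts_schooling": True,
             "school_nearby": True, "agency_cares": True}
no_school = dict(supported, school_nearby=False)

site_a = [supported] * 4                    # every support factor present
site_b = [supported] * 2 + [no_school] * 2  # half the households lack a school

print(site_ate(site_a))  # 0.2
print(site_ate(site_b))  # 0.1
```

The treatment mechanism is identical at both sites; only the prevalence of one support factor differs, yet the ATE halves.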

The range of estimates illustrates that the simple view of external validity, that the ATE transports from one place to another, is not reasonable. AidGrade uses standardized measures of effect size, dividing by the standard deviation of the outcome at baseline, as does the major multi-country study by Banerjee et al (2015). But we might prefer measures that have an economic interpretation, such as additional months of schooling per $100 spent (for example if a donor is trying to decide where to spend; see below). Nutrition might be measured by height, or by the log of height. Even if the ATE by one measure carries across, it will only do so using another measure if the relationship between the two measures is the same in both situations. This is exactly the sort of thing that a formal analysis of transportability forces us to think about. (Note also that the ATE in the original RCT can differ depending on whether the outcome is measured in levels or in logs; it is easy to construct examples where the two ATEs have different signs.)
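
A two-person example makes the parenthetical point concrete. The potential outcomes below are invented: the treatment lowers one person’s outcome from 10 to 1 and raises the other’s from 10 to 50, so the ATE is positive in levels but negative in logs.

```python
import math
from statistics import mean

# Hypothetical potential outcomes (untreated, treated) for two people.
potential_outcomes = [(10.0, 1.0), (10.0, 50.0)]

ate_levels = mean(y1 - y0 for y0, y1 in potential_outcomes)
ate_logs = mean(math.log(y1) - math.log(y0) for y0, y1 in potential_outcomes)

print(ate_levels)          # 15.5
print(round(ate_logs, 3))  # -0.347
```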

Much of the economics literature, like the medical literature, works with the view of external validity that, unless there is evidence to the contrary, the direction and size of treatment effects can be transported from one place to another. The J-PAL website reports its findings under a general heading of policy relevance, subdivided by a selection of topics. Under each topic, there is a list of relevant RCTs from a range of different settings around the world. These are conveniently converted
into a common cost-effectiveness measure so that, for example, under “education”, subhead “student participation”, there are four studies from Africa: one in Madagascar on informing parents about the returns to education, and three from Kenya, on deworming, on school uniforms, and on merit scholarships. The units of measurement are additional years of student education per $100, and among these four studies, the average effects of spending $100 are 20.7 years, 13.9 years, 0.71 years and 0.27 years respectively. (Note that this is a different, and much superior, standardization from the effect size standardization discussed below.)

What can we conclude from such comparisons? For a philanthropic donor interested in education, and if marginal and average effects are the same, they might indicate that the best place to devote a marginal dollar is in Madagascar, where it would be used to inform parents about the value of education. This is certainly useful, but it is not as useful as statements that information or deworming programs are everywhere more cost-effective than programs involving school uniforms or scholarships, or if not everywhere, at least over some domain, and it is this second kind of comparison that would genuinely fulfill the promise of ‘finding out what works.’ But such comparisons only make sense if we can transport the results from one place to another, if the Kenyan results also hold in Madagascar, Mali, or Namibia, or some other list of places. J-PAL’s manual for cost-effectiveness, Dhaliwal et al (2012), explains in (entirely appropriate) detail how to handle variation in costs across sites, noting variable factors such as population density, prices, exchange rates, discount rates, inflation, and bulk discounts. But it gives short shrift to cross-site variation in the size of ATEs, which play an equal part in the calculation of cost-effectiveness. The manual briefly notes that diminishing returns (or the last-mile problem) might be important in theory but argues that the baseline levels of outcomes are likely to be similar in the pilot and replication areas, so that the ATE can be safely transported as is. All of this lacks a justification for transportability: some understanding of when results transport, when they do not, or, better still, how they should be modified to make them transportable.
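
That ATEs enter cost-effectiveness on an equal footing with costs can be seen from the ratio itself: additional years of education per $100 equals 100 × ATE / cost per child. The sketch below uses two hypothetical programs; it is not J-PAL’s calculation, and the numbers are invented to show that a change in the ATE alone can reverse a ranking even when costs transport perfectly.

```python
def years_per_100_dollars(ate_years_per_child, cost_per_child):
    """Cost-effectiveness: additional years of education bought with $100."""
    return 100.0 * ate_years_per_child / cost_per_child


# Hypothetical programs at the pilot site: (ATE in years per child, cost per child).
pilot = {"program_a": (0.10, 5.0), "program_b": (0.30, 10.0)}
# Same costs at a new site, but program_b's ATE is assumed to be half as large there.
new_site = {"program_a": (0.10, 5.0), "program_b": (0.15, 10.0)}


def ranking(site):
    """Program names ordered from most to least cost-effective."""
    return sorted(site, key=lambda name: years_per_100_dollars(*site[name]), reverse=True)


print(ranking(pilot))     # ['program_b', 'program_a']
print(ranking(new_site))  # ['program_a', 'program_b']
```

Careful handling of costs alone leaves the second ranking invisible; the ATE at the new site has to be argued for separately.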

One of the largest and most technically impressive of the development RCTs is by Banerjee et al (2015), which tests a “graduation” program designed to permanently lift extremely poor people from poverty by providing them with a gift of a productive asset (guinea pigs, (regular) pigs, sheep, goats, or chickens, depending on locale), training and support, and life-skills coaching, as well as support for consumption, saving, and health services. The idea is that this package of aid can help people break out of poverty traps in a way that would not be possible with one intervention at a time. Comparable versions of the program were tested in Ethiopia, Ghana, Honduras, India, Pakistan, and Peru and, excepting Honduras (where the chickens died), the authors find largely positive and persistent effects, with similar (standardized) effect sizes, for a range of outcomes (economic, mental and physical health, and female empowerment). One site apart, essentially everyone accepted their assignment. Replication of positive ATEs over such a wide range of places certainly provides proof of concept for such a scheme. Yet Bauchet, Morduch, and Ravi (2015) fail to replicate the result in South India, where the control group got access to much the same benefits, a problem that Heckman, Hohman, and Smith (2000) call “substitution bias.” Even so, the results are important because, although there is a longstanding interest in poverty traps, many economists have been skeptical of their existence or that they could be sprung by such aid-based policies. In this sense, the study is an important contribution to the theory of economic development; it tests a theoretical proposition and will (or should) change minds about it.

A number of difficulties remain. As the authors note, such trials cannot tell us which component of the treatment accounted for the results, or which might be dispensable (a much more expensive multifactorial trial would be required), though in practice the costliest component, the repeated visits for training and support, seems likely to be the first to be cut by cash-strapped politicians or administrators. And as noted, it is not clear what should count as (simple) replication in international comparisons; it is hard to think of the uses of standardized effect sizes, except to document that effects exist everywhere and that they are similarly large relative to local variation in such things.

The effect size (the ATE standardized by being expressed in numbers of standard deviations of the original outcome), though conveniently dimensionless, has little to recommend it. As with much of RCT practice, it strips out any economic content (no rates of return, or benefits minus costs), and it removes any discipline on what is being compared. Apples and oranges become immediately comparable, as do treatments whose inclusion in a meta-analysis is limited only by the imagination of the analysts in claiming similarity. Training programs for physical fitness can be pooled with training programs for welding, or marketing, or even obedience training for pets. In psychology, where the concept originated, this results in endless disputes about what should and should not be pooled in a meta-analysis. Goldberger and Manski (1995, 769) note that “standardization accomplishes nothing except to give quantities in noncomparable units the superficial appearance of being in comparable units. This accomplishment is worse than useless—it yields misleading inferences.” Beyond that, Simpson (2017) notes that restrictions on the trial sample, often good practice to reduce background noise and to help detect an effect, will reduce the baseline standard deviation and inflate the effect size. More generally, effect sizes are open to manipulation by exclusion rules. It makes no sense to claim replicability on the basis of effect sizes, let alone to use them to rank projects. Effect sizes are irrelevant for policymaking.
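
Simpson’s point can be reproduced in a few lines. The sketch assumes, hypothetically, a baseline outcome spread evenly from 0 to 100 and a treatment that raises the outcome by 3 units for everyone; restricting the sample to baseline values between 40 and 60 leaves the ATE unchanged but shrinks the baseline standard deviation, so the ‘effect size’ becomes nearly five times larger.

```python
from statistics import pstdev

ATE = 3.0  # hypothetical: the same raw effect in every subsample

full_baseline = [float(x) for x in range(0, 101)]                  # everyone
restricted_baseline = [x for x in full_baseline if 40 <= x <= 60]  # exclusion rule


def effect_size(ate, baseline):
    """ATE expressed in baseline standard deviations of the outcome."""
    return ate / pstdev(baseline)


print(round(effect_size(ATE, full_baseline), 3))        # 0.103
print(round(effect_size(ATE, restricted_baseline), 3))  # 0.495
```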

The graduation study can be taken as the closest to fulfilling the ‘finding out what works’ aim of the RCT movement in development. Yet it is silent on what is perhaps the crucial aspect for policy, which is that the trial was run in partnership with NGOs, whereas what we would like to know is whether it could be replicated by governments, including those governments that are incapable of getting doctors, nurses, and teachers to show up to clinics or schools, Chaudhury et al (2005), Banerjee, Deaton and Duflo (2004), or of regulating the quality of medical care in either the public or private sectors, Filmer, Hammer and Pritchett (2000) or Das and Hammer (2005). In fact, we already know a great deal about ‘what works.’ Vaccinations work, maternal and child health care services work, and classroom teaching works. Yet knowing this does not get those things done. Adding another program that works under ideal conditions is useful only where conditions are in fact ideal, in
which case it would likely be unnecessary. Finding out what works is not the magic key to economic development. Technical knowledge, though always worth having, requires suitable institutions and suitable incentives if it is to do any good.

A similar point is documented in the contrast between a successful trial that used cameras and threats of wage reductions to incentivize attendance of teachers in schools run by an NGO in Rajasthan in India, Duflo, Hanna, and Ryan (2012), and the subsequent failure of a follow-up program in the same state to tackle mass absenteeism of health workers, Banerjee, Duflo, and Glennerster (2008). In the schools, the cameras and timekeeping worked as intended, and teacher attendance increased. In the clinics, there was a short-run effect on nurse attendance, but it was quickly eliminated. (The ability of agents eventually to undermine policies that are initially effective is common enough and not easily handled within an RCT.) In both trials, there were incentives to improve attendance, and there were incentives to find a way to sabotage the monitoring and restore workers to their accustomed positions; the force of these incentives is a ‘high-level’ cause, like gravity, or the principle of the lever, that works in much the same way everywhere. For the clinics, some sabotage was direct (the smashing of cameras) and some was subtler, when government supervisors provided official, though specious, reasons for missing work. We can only conjecture why the causality was switched in the move from NGO to government; we suspect that working for a highly respected local NGO is a different contract from working for the government, where not showing up for work is widely (if informally) understood to be part of the deal. The incentive lever works when it is wired up right, as with the NGOs, but not when the wiring cuts it out, as with the government. Knowing ‘what works’ in the sense of the treatment effect on the trial population is of limited value without understanding the political and institutional environment in which it is set. This underlines the need to understand the underlying social, economic, and cultural structures (including the incentives and agency problems that inhibit service delivery) that are required to support the causal pathways that we should like to see at work.

Trials in economic development often take place in artificial environments. Drèze (2016) notes, based on extensive experience in India, “when a foreign agency
comes in with its heavy boots and suitcases of dollars to administer a ‘treatment,’ whether through a local NGO or government or whatever, there is a lot going on other than the treatment.” There is also the suspicion that a treatment that works does so because of the presence of the ‘treators,’ often from abroad, and may not do so with the people who will work it in practice.

There is also much to be learned from many years of economic trials in the United States, particularly from the work of MDRC, from the early income tax trials, as well as from the RAND Health Experiment. Following the income tax trials, MDRC has run many randomized trials since the 1970s, mostly for the Federal government but also for individual states and for Canada (see the thorough and informative account by Gueron and Rolston (2011) for the factual information underlying the following discussion). MDRC’s program, like that of J-PAL in development, is intended to find out ‘what works’ in the state and federal welfare programs. These programs are conditional cash transfers in which poor recipients are given cash provided they satisfy certain conditions such as work requirements or training, which are often the subject of the trial. What are the benefits and costs of various alternatives, both to the recipients and to the local and federal taxpayers? All of these programs are deeply politicized, with sharply different views over both facts and desirability. Many engaged in these disputes feel certain of what should be done and what its consequences will be, so that, by their lights, control groups are unethical because they deprive some people of what the advocates ‘know’ will be certain benefits. Given this, it is perhaps surprising that RCTs have become the accepted norm for this kind of policy evaluation in the US.

The reason owes much to political institutions, as well as to the common belief, explored in Section 1, that RCTs can reveal the truth. At the Federal level, prospective policies are vetted by the non-partisan Congressional Budget Office (CBO), which makes its own estimates of the budgetary implications of the program. Ideologues whose programs are scored poorly by the CBO have an incentive to support an RCT, not to convince themselves, but to convince opponents; once again, RCTs are valuable when your opponents do not share your prior. And control groups are
easier to put in place when there are insufficient funds to cover the whole population. There was also a widespread and largely uncritical belief that RCTs give the right answer, at least for the budgetary implications, which, rather than the wellbeing of the recipients, were often the primary concern; note that all of these trials are on poor people by rich people who are typically more concerned with cost than with the wellbeing of the poor, Greenberg, Schroder and Onstott (1999). MDRC’s trials could therefore be effective dispute-reconciliation mechanisms both for those who saw the need for evidence and for those who did not (except instrumentally). The outcome here fits with our ‘public health’ case; what the politicians need to know is not the outcomes for individuals, or even how the outcomes in one state might transport to another, but the average budgetary cost in a specific place, something that a good RCT conducted on a representative sample of the target population can deliver, at least in the absence of general equilibrium effects, timing effects, etc.

These RCTs by MDRC and other contractors have demonstrated both the feasibility of large-scale social trials, including the possibility of randomization in these settings (where many participants were hostile to the idea), and their usefulness to policymakers. They also seem to have changed beliefs, for example in favor of the desirability of work requirements as a condition of welfare, even among many who were originally opposed. There are also limitations; the trials appear to have had at best a minor influence on scientific thinking about behavior in labor markets and, in that sense, they are more about ‘plumbing’ than science, Duflo (2017). The results of similar programs have often been different across different sites, and there has to date been no firm understanding of why; indeed, the trials are not designed to reveal this, Moffitt (2004). Finally, and perhaps crucially for the potential contribution to economic science, there has been little success in understanding either the underlying structures or chains of causation, in spite of a determined effort from the beginning to open the black boxes.

The RAND health experiment, Manning et al (1975a,b), provides a different but equally instructive story, if only because its results have permeated the academic and policy discussions about health care ever since. It was originally designed to test whether more generous insurance causes people to use more medical care and, if so,
by how much. The incentive effects are hardly in doubt today; the immortality of the study comes rather from the fact that its multi-arm (response surface) design allowed the calculation of an elasticity for the study population: medical expenditures fell by 0.1 to 0.2 percent for every one percent increase in the copayment, an elasticity of −0.1 to −0.2. According to Aron-Dine, Einav, and Finkelstein (2013), it is this dimensionless and thus apparently transportable number that has been used ever since to discuss the design of health care policy; the elasticity has come to be treated as a universal constant. Ironically, they argue that the estimate cannot be replicated in recent studies, and it is even unclear that it is firmly based on the original evidence. This points, once again, to the central importance of transportability for the usefulness, both short and long term, of a trial. Here, the simple direct transportability of the result seems to have been largely illusory though, as we have argued, this does not mean that more complex constructions based on the results of the trial would not have done better.
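
Treating the RAND elasticity as a universal constant amounts to a calculation like the one below, which predicts spending under a new copayment from a constant-elasticity relation. It is a sketch of that practice, not of the RAND analysis: the baseline spending and copayment figures are hypothetical, and the constant-elasticity form is exactly the transportability assumption questioned above.

```python
def predicted_spending(base_spending, old_copay, new_copay, elasticity):
    """Constant-elasticity prediction: spending scales with (new/old copay)**elasticity."""
    return base_spending * (new_copay / old_copay) ** elasticity


# Hypothetical: doubling a copayment from 20 to 40 with baseline spending of 1000.
for elasticity in (-0.1, -0.2):
    print(elasticity, round(predicted_spending(1000.0, 20.0, 40.0, elasticity), 1))
# -0.1 933.0
# -0.2 870.6
```

The prediction is only as good as the assumption that the study population’s elasticity carries over to the new population and to a copayment change of this size.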

Conclusions

It is useful to respond to two challenges that are often put to us, one from medicine and one from social science. The medical challenge is, “If you are being prescribed a new drug, wouldn’t you want it to have been through an RCT?” The second (related) challenge is, “OK, you have highlighted some of the problems with RCTs, but other methods have all of those problems, plus problems of their own.” We believe that we have answered both of these in the paper but that it is helpful to recapitulate.

The medical challenge is about you, a specific person, so that one answer would be that you may be different from the average, and you are entitled to and ought to ask about theory and evidence about whether it will work for you. This would be in the form of a conversation between you and your physician, who knows a lot about you. You would want to know how this class of drug is supposed to work and whether that mechanism is likely to work for you. Is there any evidence from other patients, especially patients like you, with your condition and in your circumstances, or are there suggestions from theory? What scientific work has been done to identify what support factors matter for success with this kind of drug? If the only
information available is from the pharmaceutical company, an RCT might seem like a good idea. But even then, and although knowledge of the mean effect among some group is certainly of value, you might give little weight to an RCT whose participants are selected in the way they were selected in the trial, or where there is little information about whether the outcomes are relevant to you. Recall that many new drugs are prescribed ‘off-label’, for a purpose for which they were not tested, and beyond that, that many new drugs are administered in the absence of an RCT because you are actually being enrolled in one. For patients whose last chance is to participate in a trial of some new drug, this is exactly the sort of conversation you should have with your physician (followed by one asking her to reveal whether you are in the active arm, so that you can switch if not), and such conversations need to take place for all prescriptions that are new to you. In these conversations, the results of an RCT may have marginal value. If your physician tells you that she endorses evidence-based medicine, and that the drug will most likely work for you because an RCT has shown that ‘it works’, it is time to find a new physician.

The second challenge claims that other methods are always dominated by an RCT. This kind of challenge is not well formulated. Dominated for answering what question, and for what purposes? The chief advantage of the RCT is that it can, if well conducted, give an unbiased estimate of an ATE in a study (trial) sample and thus provide evidence that the treatment caused the outcome in some individuals in that sample. If that is what you want to know, and there is little background knowledge available, and the price is right, then an RCT may be the best choice. As to other questions, the RCT result can be part (but usually only a small part) of the defense of (a) a general claim, (b) a claim that the treatment will cause that outcome for some other individuals, or even (c) a claim about what the ATE will be in some other population. On their own, RCT results do little for these enterprises. The best overall package of research for tackling these questions, meaning the most cost-effective and the most likely to produce correct results, depends on what we know and on what different kinds of research will cost.
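The unbiasedness claimed for a well-conducted RCT is a property of the estimator averaged over all the ways the same trial sample could have been randomized; it says nothing about any single realized assignment, nor about populations other than the trial sample. A minimal sketch, assuming invented potential outcomes `y0` (untreated) and `y1` (treated) for a hypothetical eight-person trial sample, enumerates every assignment of four people to treatment and averages the difference in means:

```python
from fractions import Fraction
from itertools import combinations


def randomization_distribution(y0, y1, n_treated):
    """Difference-in-means estimates for every possible complete randomization.

    y0[i] and y1[i] are the (hypothetical) untreated and treated outcomes of
    person i; exact fractions avoid floating-point noise in the comparison.
    """
    units = range(len(y0))
    estimates = []
    for treated in combinations(units, n_treated):
        treated_set = set(treated)
        control = [i for i in units if i not in treated_set]
        mean_treated = Fraction(sum(y1[i] for i in treated), n_treated)
        mean_control = Fraction(sum(y0[i] for i in control), len(control))
        estimates.append(mean_treated - mean_control)
    return estimates


# Hypothetical potential outcomes; individual effects are 1, 4, 0, 0, 6, 0, 0, 1.
y0 = [3, 5, 2, 7, 4, 6, 1, 8]
y1 = [4, 9, 2, 7, 10, 6, 1, 9]

estimates = randomization_distribution(y0, y1, n_treated=4)
sample_ate = Fraction(sum(b - a for a, b in zip(y0, y1)), len(y0))
average_estimate = sum(estimates) / len(estimates)

print(sample_ate, average_estimate)      # 3/2 3/2
print(min(estimates), max(estimates))    # -11/4 23/4
```

All 70 assignments are equally likely, and their average equals the trial sample's ATE of 3/2, yet a single realized assignment can give anything from -11/4 to 23/4; and 3/2 is the ATE of these eight people only.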

There are examples where an RCT does better than an observational study, and these seem to be the cases that come to mind for defenders of RCTs. For example, regressions of whether people who get Medicaid do better or worse than people with private insurance are vitiated by gross differences in the other characteristics of the two populations. But it is a long step from that to saying that an RCT can solve the problem, let alone that it is the only way to solve the problem. Such a trial will not only be expensive per subject; it can also enroll only a selected and almost certainly unrepresentative study sample, it can be run only temporarily, and recruitment to the experiment will necessarily differ from recruitment to a scheme that is permanent and open to the full qualified population. None of this removes the blemishes of the observational study, but there are many methods of mitigating its difficulties, so that, in the end, an observational study with credible corrections and a more relevant and much larger study sample (today often the complete population of interest, through administrative records) may provide a better estimate. Everything has to be judged on a case-by-case basis. There is no rigorous argument for a lexicographic preference for RCTs.
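The comparison can be put in mean-squared-error terms: for the effect in the population of interest, an estimator's error combines bias and sampling variance, MSE = bias² + variance. A minimal sketch with entirely hypothetical numbers (they are not estimates from any Medicaid study) shows how an RCT that is unbiased for its own selected sample can lose to a slightly biased but much larger observational study:

```python
import math


def root_mse(bias, std_error):
    """Root mean squared error: sqrt(bias**2 + variance)."""
    return math.sqrt(bias ** 2 + std_error ** 2)


# All numbers are hypothetical, chosen only to illustrate the trade-off.
# Small RCT: unbiased for its own selected sample, but that sample's ATE is
# assumed to differ from the target population's ATE by 0.30.
rct = root_mse(bias=0.30, std_error=0.40)

# Large administrative-records study: assumed residual bias of 0.10 after
# corrections, with a small sampling error because the sample is very large.
observational = root_mse(bias=0.10, std_error=0.02)

print(f"RCT: {rct:.3f}  observational: {observational:.3f}")
# RCT: 0.500  observational: 0.102
```

Reverse the assumed numbers and the ranking reverses, which is the sense in which the choice has to be made case by case.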

There is also an important line of enquiry that goes not only beyond RCTs, but beyond the ‘method of differences’ that is common to RCTs, regressions, or any form of controlled or uncontrolled comparison. The hypothetico-deductive method confronts theory-based deductions with the data, either observational or experimental. As noted above, economists routinely use theory to tease out a new implication that can be taken to the data, and there are also good examples in medicine, such as Bleyer and Welch’s (2012) demonstration of the limited impact of mammography screening on breast cancer incidence, a topic where other methods have generated great controversy and little consensus.

RCTs are the ultimate in non-parametric estimation of average treatment effects in the trial samples because they make so few assumptions about heterogeneity, causal structure, choice of variables, and functional form. RCTs are often convenient ways to introduce experimenter-controlled variance (if you want to see what happens, then kick it and see; twist the lion’s tail), but note that many experiments, including many of the most important (and Nobel Prize-winning) experiments in economics, do not and did not use randomization; see Harrison (2013) and Svorencik (2015). But the credibility of the results, even internally, can be undermined by unbalanced covariates and by excessive heterogeneity in responses, especially when the distribution of effects is asymmetric, where inference on means can be hazardous. Ironically, the price of the credibility in RCTs is that we can only recover the mean of the distribution of treatment effects, and that only for the trial sample. Yet, in the presence of outliers, reliable inference on means is difficult. And randomization in and of itself does nothing unless the details are right; purposive selection into the experimental population, like purposive selection into and out of assignment, undermines inference in just the same way as does selection in observational studies. Lack of blinding, whether of participants, trialists, data collectors, or analysts, undermines inference, akin to a failure of exclusion restrictions in instrumental variable analysis.
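The point about unbalanced covariates can be made exact in the simplest case. If every unit has the same effect τ, the difference in means from one realized randomization equals τ plus the difference in mean untreated outcomes between the two arms, so any chance imbalance in a prognostic covariate passes straight into the estimate. A minimal sketch, assuming a homogeneous effect `tau` and a hypothetical covariate `x` (both invented for illustration):

```python
import random


def diff_in_means(values, treated):
    """Mean of values among the treated minus mean among the controls."""
    t = [v for v, d in zip(values, treated) if d]
    c = [v for v, d in zip(values, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)


random.seed(7)
n = 20
# Hypothetical prognostic covariate x drives the untreated outcome y0.
x = [random.gauss(0.0, 1.0) for _ in range(2 * n)]
y0 = [2.0 * xi + random.gauss(0.0, 0.5) for xi in x]
tau = 0.5  # assumed homogeneous treatment effect

treated_ids = set(random.sample(range(2 * n), n))  # one realized randomization
treated = [i in treated_ids for i in range(2 * n)]
y = [y0i + tau if d else y0i for y0i, d in zip(y0, treated)]

estimate = diff_in_means(y, treated)
imbalance_y0 = diff_in_means(y0, treated)  # chance imbalance in untreated outcomes
imbalance_x = diff_in_means(x, treated)    # chance imbalance in the covariate

# With a homogeneous effect, the estimation error equals the baseline imbalance.
print(f"estimate {estimate:.3f}, error {estimate - tau:.3f}, "
      f"y0 imbalance {imbalance_y0:.3f}, x imbalance {imbalance_x:.3f}")
```

Averaged over re-randomizations the imbalance is zero, which is all that unbiasedness promises; the trialist, however, sees only one realization.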

The lack of structure can be seriously disabling when we try to use RCT results outside of a few contexts, such as program evaluation, hypothesis testing, or establishing proof of concept. Beyond that, the results cannot be used to help make predictions beyond the trial sample without more structure, without more prior information, and without having some idea of what makes treatment effects vary from place to place or time to time. There is no option but to commit to some causal structure if we are to know how to use RCT evidence out of the original context. Simple generalization and simple extrapolation do not cut the mustard. This is true of any study, experimental or observational. But observational studies are familiar with, and routinely work with, the sort of assumptions that RCTs claim to avoid, so that if the aim is to use the empirical evidence beyond its original context, any credibility advantage that RCTs have in estimation is no longer operative. And because RCTs tell us so little about why results happen, they have a disadvantage relative to studies that use a wider range of prior information and data to help nail down mechanisms.
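One simple way of committing to causal structure is post-stratification: assume that a covariate x is the only thing that makes effects vary, take the effects within strata of x from the trial, and reweight them by the target population's distribution of x. The sketch below illustrates that one approach with invented numbers; it is not the propensity-score or transportability machinery of the papers cited in this paper, and the reweighted answer is only as good as the structural assumption stated in its comments.

```python
import math


def population_ate(stratum_effects, shares):
    """ATE implied by stratum-specific effects and a population's stratum shares."""
    assert math.isclose(sum(shares.values()), 1.0), "shares must sum to one"
    return sum(stratum_effects[s] * p for s, p in shares.items())


# Hypothetical stratum-specific effects, as if estimated within a trial.
# ASSUMPTION: x is the only moderator, and the effect within each stratum
# is the same in the trial population and in the target population.
stratum_effects = {"x=0": 1.0, "x=1": 3.0}
trial_shares = {"x=0": 0.2, "x=1": 0.8}
target_shares = {"x=0": 0.8, "x=1": 0.2}

trial_ate = population_ate(stratum_effects, trial_shares)    # about 2.6
target_ate = population_ate(stratum_effects, target_shares)  # about 1.4
print(round(trial_ate, 2), round(target_ate, 2))
```

The trial's own ATE (2.6) would be a poor guide to the target (1.4); the reweighting fixes this only if nothing other than x moderates the effect.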

Yet once that commitment has been made, RCT evidence can be extremely useful, pinning down part of a structure, helping to build stronger understanding and knowledge, and helping to assess welfare consequences. As our examples show, this can often be done without committing to the full complexity of what are often thought of as structural models. Yet without the structure that allows us to place RCT results in context, or to understand the mechanisms behind those results, not only can we not transport the finding that ‘it works’ elsewhere, but we cannot do the standard stuff of economics, which is to say whether the intervention is actually welfare-improving. Without knowing why things happen and why people do things, we run the risk of worthless casual (‘fairy story’) causal theorizing and have given up on one of the central tasks of economics.

We must back away from the refusal to theorize, from the exultation in our ability to handle unlimited heterogeneity, and actually SAY something. Perhaps paradoxically, unless we are prepared to make assumptions, and to say what we know, making statements that will be incredible to some, all the credibility of the RCT is for naught.

RCTs in economics on health, labor, and development have proven their worth in providing proofs of concept and in testing predictions that some policies must always work or can never work. But, as elsewhere in economics, we cannot find out why something works by simply demonstrating that it does work, no matter how often, and this leaves us uninformed as to whether the policy should be implemented. Beyond that, small-scale demonstration RCTs are not capable of telling us what would happen if these policies were implemented to scale, of capturing unintended consequences that typically cannot be included in the protocols, or of modeling what will happen if schemes are implemented differently than in the trial, for example by governments, whose motives and operating principles are different from those of the NGOs or academics who typically run trials. While it is true that abstract knowledge is always likely to be beneficial, successful policy depends on institutions and on politics, matters on which RCTs have little to say. The results of RCTs can and should feed into public debate about what should be done, but we are on dangerous ground when they are used, on grounds of their supposed epistemic superiority, to insulate policy from democratic processes.
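A toy calculation shows how a small demonstration trial can mislead about scale when there are displacement effects of the kind studied by Crépon et al. (2014). The model is entirely hypothetical: a fixed number of jobs, with treated job seekers hired ahead of the untreated. The trial finds a large treated-minus-control difference, while treating everyone leaves employment exactly where it was without the program.

```python
def employment_rates(n_people, n_jobs, n_treated):
    """Toy fixed-jobs labor market (hypothetical): treated job seekers are
    hired first, and whatever jobs are left go to the untreated."""
    assert 0 <= n_jobs <= n_people and 0 <= n_treated <= n_people
    hired_treated = min(n_treated, n_jobs)
    n_untreated = n_people - n_treated
    rate_treated = hired_treated / n_treated if n_treated else None
    rate_untreated = (n_jobs - hired_treated) / n_untreated if n_untreated else None
    return rate_treated, rate_untreated


_, baseline = employment_rates(1000, 400, 0)   # no program at all
t, c = employment_rates(1000, 400, 50)         # small trial: 50 of 1,000 treated
t_all, _ = employment_rates(1000, 400, 1000)   # program at full scale

print(f"no program: {baseline:.2f}")                                    # 0.40
print(f"trial: treated {t:.2f}, controls {c:.3f}, effect {t - c:.3f}")  # 1.00, 0.368, 0.632
print(f"at scale: {t_all:.2f}")                                         # 0.40
```

Nothing in the trial protocol reveals that the apparent 63-point gain is entirely a reshuffling of a fixed number of jobs.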

Citations

Aigner, Dennis J., 1985, “The residential electricity time-of-use pricing experiments. What have we learned?” in David A. Wise and Jerry A. Hausman, eds., Social experimentation, Chicago, IL. Chicago University Press for National Bureau of Economic Research, 11–54.

Al-Ubaydli, Omar, and John A. List, 2013, “On the generalizability of experimental results in economics,” in G. Frechette and A. Schotter, eds., Methods of modern experimental economics, Oxford University Press.

Altman, Douglas G., 1985, “Comparability of randomized groups,” Journal of the Royal Statistical Society, Series D (The Statistician), 34(1), Statistics in health, 125–36.

Angrist, Joshua D., 2004, “Treatment effect heterogeneity in theory and practice,” Economic Journal, 114, C52–C83.

Angrist, Joshua D., Eric Bettinger, Erik Bloom, Elizabeth King, and Michael Kremer, 2002, “Vouchers for private schooling in Colombia: evidence from a randomized natural experiment,” American Economic Review, 92(5), 1535–58.

Angrist, Joshua D. and Jörn-Steffen Pischke, 2010, “The credibility revolution in empirical economics: how better research design is taking the con out of econometrics,” Journal of Economic Perspectives, 24(2), 3–30.

Angrist, Joshua D. and Jörn-Steffen Pischke, 2017, “Undergraduate econometrics instruction: through our classes, darkly,” Journal of Economic Perspectives, 31(2), 125–44.

Aron-Dine, Aviva, Liran Einav, and Amy Finkelstein, 2013, “The RAND health insurance experiment, three decades later,” Journal of Economic Perspectives, 27(1), 197–222.

Arrow, Kenneth J., 1975, “Two notes on inferring long run behavior from social experiments,” Document No. P-5546, Santa Monica, CA. Rand Corporation.

Ashenfelter, Orley, 1978, “The labor supply response of wage earners,” in John L. Palmer and Joseph A. Pechman, eds., Welfare in rural areas: the North Carolina–Iowa Income Maintenance Experiment, Washington, DC. The Brookings Institution. 109–38.

Athey, Susan and Guido W. Imbens, 2017, “The state of applied econometrics: causality and policy evaluation,” Journal of Economic Perspectives, 31(2), 3–32.

Attanasio, Orazio, Costas Meghir, and Ana Santiago, 2012, “Education choices in Mexico: using a structural model and a randomized experiment to evaluate PROGRESA,” Review of Economic Studies, 79(1), 37–66.

Attanasio, Orazio, Sarah Cattan, Emla Fitzsimons, Costas Meghir, and Marta Rubio Codina, 2015, “Estimating the production function for human capital: results from a randomized controlled trial in Colombia,” London. Institute for Fiscal Studies, Working Paper no W15/06.

Bahadur, R. R., and Leonard J. Savage, 1956, “The non-existence of certain statistical procedures in nonparametric problems,” Annals of Mathematical Statistics, 27: 1115–22.

Banerjee, Abhijit, Sylvain Chassang, Sergio Montero, and Erik Snowberg, 2016, “A theory of experimenters,” processed, July 2016.

Banerjee, Abhijit, Sylvain Chassang, and Erik Snowberg, 2016, “Decision theoretic approaches to experiment design and external validity,” Cambridge, MA. NBER Working Paper no 22167, April.

Banerjee, Abhijit, Angus Deaton, and Esther Duflo, 2004, “Health care delivery in rural Rajasthan,” Economic and Political Weekly, 39(9), 944–9.

Banerjee, Abhijit and Esther Duflo, 2009, “The experimental approach to development economics,” Annual Review of Economics, 1, 151–78.

Banerjee, Abhijit and Esther Duflo, 2012, Poor economics: a radical rethinking of the way to fight global poverty, PublicAffairs.

Banerjee, Abhijit, Esther Duflo, Nathanael Goldberg, Dean Karlan, Robert Osei, William Parienté, Jeremy Shapiro, Bram Thuysbaert, and Christopher Udry, 2015, “A multifaceted program causes lasting progress for the very poor: evidence from six countries,” Science, 348(6236), 1260799.

Banerjee, Abhijit, Esther Duflo, and Rachel Glennerster, 2008, “Putting a band-aid on a corpse: incentives for nurses in the Indian public health care system,” Journal of the European Economic Association, 6(2–3), 487–500.

Banerjee, Abhijit V. and Ruimin He, 2003, “The World Bank of the future,” American Economic Review, 93(2), 39–44.

Banerjee, Abhijit, Dean Karlan, and Jonathan Zinman, 2015, “Six randomized evaluations of microcredit: introduction and further steps,” American Economic Journal: Applied Economics, 7(1), 1–21.

Bareinboim, Elias and Judea Pearl, 2013, “A general algorithm for deciding transportability of experimental results,” Journal of Causal Inference, 1(1), 107–34.

Bareinboim, Elias and Judea Pearl, 2014, “Transportability from multiple environments with limited experiments: completeness results,” in M. Welling, Z. Ghahramani, C. Cortes, and N. Lawrence, eds., Advances in Neural Information Processing Systems, 27 (NIPS Proceedings), 280–8.

Bauchet, Jonathan, Jonathan Morduch, and Shamika Ravi, 2015, “Failure vs displacement: why an innovative anti-poverty program showed no net impact in South India,” Journal of Development Economics, 116, 1–16.

Basu, Kaushik, 2010, “The economics of foodgrain management in India,” Ministry of Finance, Delhi. http://finmin.nic.in/workingpaper/Foodgrain.pdf

Begg, Colin B., 1990, “Significance tests of covariance imbalance in clinical trials,” Controlled Clinical Trials, 11(4), 223–5.

Bhattacharya, Debopam and Pascaline Dupas, 2012, “Inferring welfare maximizing treatment assignment under budget constraints,” Journal of Econometrics, 167(1), 168–96.

Bitler, Marianne P., Jonah B. Gelbach, and Hilary W. Hoynes, 2006, “What mean impacts miss: distributional effects of welfare reform experiments,” American Economic Review, 96(4), 988–1012.

Bleyer, Archie, and H. Gilbert Welch, 2012, “Effect of three decades of screening mammography on breast-cancer incidence,” New England Journal of Medicine, 367, 1998–2005.

Bloom, Howard S., Carolyn J. Hill, and James A. Riccio, 2005, “Modeling cross-site experimental differences to find out why program effectiveness varies,” in Howard S. Bloom, ed., Learning more from social experiments: evolving analytical approaches, New York, NY. Russell Sage.

Bold, Tessa, Mwangi Kimenyi, Germano Mwabu, Alice Ng’ang’a, and Justin Sandefur, 2013, “Scaling up what works: experimental evidence on external validity in Kenyan education,” Washington, DC. Center for Global Development, Working Paper 321.

Bothwell, Laura E. and Scott H. Podolsky, 2016, “The emergence of the randomized, controlled trial,” New England Journal of Medicine, 375(6), 501–4. doi:10.1056/NEJMp1604635

Campbell, D. T. and J. C. Stanley, 1963, Experimental and quasi-experimental designs for research. Chicago. Rand McNally.

Cartwright, Nancy, 1994, Nature’s capacities and their measurement. Oxford. Clarendon Press.

Cartwright, Nancy, 2007, “Are RCTs the gold standard?” Biosocieties, 2, 11–20.

Cartwright, Nancy, 2011, “A philosopher’s view of the long road from RCTs to effectiveness,” The Lancet, 377, 1400–01.

Cartwright, Nancy, 2012, “Presidential address: will this policy work for you? Predicting effectiveness better: how philosophy helps,” Philosophy of Science, 79, 973–89.

Cartwright, Nancy, 2016, “Where is the rigor when you need it?” in I. Marinovic, ed., Foundations and Trends in Accounting: special issue on causal inference in capital markets research, 10(2–4): 106–24.

Cartwright, Nancy and Jeremy Hardie, 2012, Evidence based policy: a practical guide to doing it better, Oxford. Oxford University Press.

Cartwright, Nancy and Eileen Munro, 2010, “The limitations of RCTs in predicting effectiveness,” Journal of Experimental Child Psychology, 16(2).

Chalmers, Iain, 2001, “Comparing like with like: some historical milestones in the evolution of methods to create unbiased comparison groups in therapeutic experiments,” International Journal of Epidemiology, 30, 1156–64.

Chan, Tat Y. and Barton H. Hamilton, 2006, “Learning, private information, and the economic evaluation of randomized experiments,” Journal of Political Economy, 114(6), 997–1040.

Chassang, Sylvain, Gerard Padró i Miguel, and Erik Snowberg, 2012, “Selective trials: a principal–agent approach to randomized controlled experiments,” American Economic Review, 102(4), 1279–1309.

Chassang, Sylvain, Erik Snowberg, Ben Seymour, and Cayley Bowles, 2015, “Accounting for behavior in treatment effects: new applications for blind trials,” PLoS One, 10(6), e0127227. doi:10.1371/journal.pone.0127227.

Chaudhury, Nazmul, Jeffrey Hammer, Michael Kremer, Karthik Muralidharan, and F. Halsey Rogers, 2005, “Missing in action: teacher and health worker absence in developing countries,” Journal of Economic Perspectives, 19(4), 91–116.

Chyn, Eric, 2016, “Moved to opportunity: the long-run effect of public housing demolition on labor market outcomes of children,” University of Michigan. http://www-personal.umich.edu/~ericchyn/Chyn_Moved_to_Opportunity.pdf

Chetty, Raj, 2009, “Sufficient statistics for welfare analysis: a bridge between structural and reduced-form methods,” Annual Review of Economics, 1, 451–87.

Conlisk, John, 1973, “Choice of response functional form in designing subsidy experiments,” Econometrica, 41(4), 643–56.

Crépon, Bruno, Esther Duflo, Marc Gurgand, Roland Rathelot, and Philippe Zamora, 2014, “Do labor market policies have displacement effects? Evidence from a clustered randomized experiment,” Quarterly Journal of Economics, 128(2), 531–80.

Das, Jishnu and Jeffrey Hammer, 2005, “Which doctor? Combining vignettes and item response to measure clinical competence,” Journal of Development Economics, 78, 348–83.

Davey-Smith, George, and Shah Ebrahim, 2002, “Data dredging, bias, or confounding,” British Medical Journal, 325, 1437–8.

Deaton, Angus, 2010, “Instruments, randomization, and learning about development,” Journal of Economic Literature, 48(2), 424–55.

Deaton, Angus and Nancy Cartwright, 2016, “Understanding and misunderstanding randomized controlled trials,” http://www.princeton.edu/~deaton/download.html?pdf=Deaton_Cartwright_RCTs_with_ABSTRACT_August_25.pdf

Deaton, Angus and John Muellbauer, 1980, Economics and consumer behavior, New York. Cambridge University Press.

Deaton, Angus and Serena Ng, 1998, “Parametric and nonparametric approaches to price and tax reform,” Journal of the American Statistical Association, 93(443), 900–9.

Dhaliwal, Iqbal, Esther Duflo, Rachel Glennerster, and Caitlin Tulloch, 2012, “Comparative cost-effectiveness analysis to inform policy in developing countries: a general framework with applications for education,” J-PAL, MIT, December 3rd. http://www.povertyactionlab.org/publication/cost-effectiveness

Drèze, Jean, 2016, Personal email communication.

Duflo, Esther, 2017, “The economist as plumber,” American Economic Review, 107(5), 1–26.

Duflo, Esther, Rema Hanna, and Stephen P. Ryan, 2012, “Incentives work: getting teachers to come to school,” American Economic Review, 102(4), 1241–78.

Duflo, Esther and Michael Kremer, 2008, “Use of randomization in the evaluation of development effectiveness,” in William Easterly, ed., Reinventing foreign aid. Washington, DC. Brookings, 93–120.

Dynarski, Susan, 2015, “Helping the poor in education: the power of a simple nudge,” New York Times, Jan 17, 2015.

Fine, Paul E. M. and Jacqueline A. Clarkson, 1986, “Individual versus public priorities in the determination of optimal vaccination policies,” American Journal of Epidemiology, 124(6), 1012–20.

Fisher, Ronald A., 1926, “The arrangement of field experiments,” Journal of the Ministry of Agriculture of Great Britain, 33, 503–13.

Filmer, Deon, Jeffrey Hammer, and Lant Pritchett, 2000, “Weak links in the chain: a diagnosis of health policy in poor countries,” World Bank Research Observer, 15(2), 199–204.

Freedman, David A., 2008, “On regression adjustments to experimental data,” Advances in Applied Mathematics, 40, 180–93.

Frieden, Thomas R., 2017, “Evidence for health decision making: beyond randomized, controlled trials,” New England Journal of Medicine, 377, 465–75.

Garfinkel, Irwin and Charles F. Manski, 1992, “Introduction,” in Irwin Garfinkel and Charles F. Manski, eds., Evaluating welfare and training programs, Cambridge, MA. Harvard University Press. 1–22.

Gerber, Alan S. and Donald P. Green, 2012, Field experiments, New York. Norton.

Gertler, Paul J., Sebastian Martinez, Patrick Premand, Laura B. Rawlings, and Christel M. J. Vermeersch, 2016, Impact evaluation in practice, 2nd Edition, Washington, DC. Inter-American Development Bank and World Bank.

Goldberger, Arthur S. and Charles F. Manski, 1995, “Review article: The Bell Curve by Herrnstein and Murray,” Journal of Economic Literature, 33(2), 762–76.

Greenberg, David and Mark Shroder, 2004, The digest of social experiments (3rd ed.), Washington, DC. Urban Institute Press.

Greenberg, David, Mark Shroder, and Matthew Onstott, 1999, “The social experiment market,” Journal of Economic Perspectives, 13(3), 157–72.

Gueron, Judith M. and Howard Rolston, 2013, Fighting for reliable evidence, New York, Russell Sage.

Guyatt, Gordon, David L. Sackett, and Deborah J. Cook for the Evidence-Based Medicine Working Group, 1994, “Users’ guides to the medical literature II: how to use an article about therapy or prevention. B. What were the results and will they help me in caring for my patients?” Journal of the American Medical Association, 271(1), 59–63.

Harrison, Glenn W., 2013, “Field experiments and methodological intolerance,” Journal of Economic Methodology, 20(2), 103–17.

Harrison, Glenn W., 2014, “Impact evaluation and welfare evaluation,” European Journal of Development Research, 26, 39–45.

Harrison, Glenn W., 2014, “Cautionary notes on the use of field experiments to address policy issues,” Oxford Review of Economic Policy, 30(4), 753–63.

Hausman, Jerry A. and David A. Wise, 1985, “Technical problems in social experimentation: cost versus ease of analysis,” in Jerry A. Hausman and David A. Wise, eds., Social Experimentation, Chicago, IL. Chicago University Press. 187–220.

Heckman, James J., 1992, “Randomization and social policy evaluation,” in Charles F. Manski and Irwin Garfinkel, eds., Evaluating welfare and training programs, Cambridge, MA. Harvard University Press. 547–70.

Heckman, James J., 1997, “Instrumental variables: a study of implicit behavioral assumptions used in making program evaluations,” Journal of Human Resources, 32(3), 441–62.

Heckman, James J., 2005, “The scientific model of causality,” Sociological Methodology, 35(1), 1–97.

Heckman, James J., 2008, “Econometric causality,” International Statistical Review, 76(1), 1–27.

Heckman, James J., 2010, “Building bridges between structural and program evaluation approaches to evaluating policy,” Journal of Economic Literature, 48(2), 356–98.

Heckman, James J., Neil Hohman, and Jeffrey Smith, with the assistance of Michael Khoo, 2000, “Substitution and dropout bias in social experiments: a study of an influential social experiment,” Quarterly Journal of Economics, 115(2), 651–94.

Heckman, James J., Robert J. Lalonde, and Jeffrey A. Smith, 1999, “The economics and econometrics of active labor markets,” Chapter 31 in Ashenfelter, Orley and David Card, eds., Handbook of labor economics, Amsterdam. North-Holland, 3(A), 1866–2097.

Heckman, James J., Rodrigo Pinto, and Peter Savelyev, 2013, “Understanding the mechanisms through which an influential early childhood program boosted adult outcomes,” American Economic Review, 103(6), 2052–86.

Heckman, James J. and Jeffrey Smith, 1995, “Assessing the case for social experiments,” Journal of Economic Perspectives, 9(2), 85–110.

Heckman, James J., Jeffrey Smith, and Nancy Clements, 1997, “Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts,” Review of Economic Studies, 64(4), 487–535.

Heckman, James J. and Sergio Urzúa, 2010, “Comparing IV with structural models: what simple IV can and cannot identify,” Journal of Econometrics, 156, 27–37.

Heckman, James J. and Edward Vytlacil, 2005, “Structural equations, treatment effects, and econometric policy evaluation,” Econometrica, 73(3), 669–738.

Heckman, James J. and Edward J. Vytlacil, 2007, “Econometric evaluation of social programs, Part 1: causal models, structural models, and econometric policy evaluation,” Chapter 70 in James J. Heckman and Edward E. Leamer, eds., Handbook of Econometrics, 6B, 4779–874.

Horton, Richard, 2000, “Common sense and figures: the rhetoric of validity in medicine: Bradford Hill memorial lecture 1999,” Statistics in Medicine, 19, 3149–64.

Hotz, V. Joseph, Guido W. Imbens, and Julie H. Mortimer, 2005, “Predicting the efficacy of future training programs using past experience at other locations,” Journal of Econometrics, 125, 241–70.

Hsieh, Chang-tai and Miguel Urquiola, 2006, “The effects of generalized school choice on achievement and stratification: evidence from Chile’s voucher program,” Journal of Public Economics, 90, 1477–1503.

Hurwicz, Leonid, 1966, “On the structural form of interdependent systems,” Studies in Logic and the Foundations of Mathematics, 44, 232–9.

Imbens, Guido W., 2004, “Nonparametric estimation of average treatment effects under exogeneity: a review,” Review of Economics and Statistics, 86(1), 4–29.

Imbens, Guido W., 2010, “Better LATE than nothing: some comments on Deaton (2009) and Heckman and Urzua,” Journal of Economic Literature, 48(2), 399–423.

Imbens, Guido W. and Joshua D. Angrist, 1994, “Identification and estimation of local average treatment effects,” Econometrica, 62(2), 467–75.

Imbens, Guido W. and Michal Kolesár, 2016, “Robust standard errors in small samples: some practical advice,” Review of Economics and Statistics, 98(4), 701–12.

Imbens, Guido W. and Jeffrey M. Wooldridge, 2009, “Recent developments in the econometrics of program evaluation,” Journal of Economic Literature, 47(1), 5–86.

International Committee of Medical Journal Editors, 2015, Recommendations for the conduct, reporting, editing, and publication of scholarly work in medical journals, http://www.icmje.org/icmje-recommendations.pdf (accessed August 20, 2016).

J-PAL, 2017, https://www.povertyactionlab.org/about-j-pal (accessed August 21, 2017).

Kahneman, Daniel and Gary Klein, 2009, “Conditions for intuitive expertise: a failure to disagree,” American Psychologist, 64(6), 515–26.

Karlan, Dean and Jacob Appel, 2011, More than good intentions: how a new economics is helping to solve global poverty, New York. Dutton.

Karlan, Dean, Nathanael Goldberg, and James Copestake, 2009, “Randomized controlled trials are the best way to measure impact of microfinance programs and improve microfinance product designs,” Enterprise Development and Microfinance, 20(3), 167–76.

Kasy, Maximilian, 2016, “Why experimenters might not want to randomize, and what they could do instead,” Political Analysis, 1–15. doi:10.1093/pan/mpw012

Kramer, Peter, 2016, Ordinarily well: the case for antidepressants, New York. Farrar, Straus, and Giroux.

Kremer, Michael and Alaka Holla, 2009, “Improving education in the developing world: what have we learned from randomized evaluations?” Annual Review of Economics, 1, 513–42.

Lalonde, Robert J., 1986, “Evaluating the econometric evaluations of training programs with experimental data,” American Economic Review, 76(4), 604–20.

Lehmann, Erich L. and Joseph P. Romano, 2005, Testing statistical hypotheses (third edition), New York. Springer.

Levy, Santiago, 2006, Progress against poverty: sustaining Mexico’s Progresa-Oportunidades program, Washington, DC. Brookings.

Mackie, John L., 1974, The cement of the universe: a study of causation, Oxford. Oxford University Press.

Manning, Willard G., Joseph P. Newhouse, Naihua Duan, Emmett Keeler, and Arleen Leibowitz, 1988a, “Health insurance and the demand for medical care: evidence from a randomized experiment,” American Economic Review, 77(3), 251–77.

Manning, Willard G., Joseph P. Newhouse, Naihua Duan, Emmett Keeler, Bernadette Benjamin, Arleen Leibowitz, M. Susan Marquis, and Jack Zwanziger, 1988b, Health insurance and the demand for medical care: evidence from a randomized experiment, Santa Monica, CA. RAND.

Manski, Charles F., 2004, “Treatment rules for heterogeneous populations,” Econometrica, 72(4), 1221–46.

Manski, Charles F., 2013, Public policy in an uncertain world: analysis and decisions, Cambridge, MA. Harvard University Press.

Manski, Charles F. and Aleksey Tetenov, 2016, “Sufficient trial size to inform clinical practice,” PNAS, 113(38), 10518–23.

Metcalf, Charles E., 1973, “Making inferences from controlled income maintenance experiments,” American Economic Review, 63(3), 478–83.

Moffitt, Robert, 1979, “The labor supply response in the Gary experiment,” Journal of Human Resources, 14(4), 477–87.

Moffitt, Robert, 1992, “Evaluation methods for program entry effects,” Chapter 6 in Charles Manski and Irwin Garfinkel, eds., Evaluating welfare and training programs, Cambridge, MA. Harvard University Press, 231–52.

Moffitt, Robert, 2004, “The role of randomized field trials in social science research: a perspective from evaluations of reforms of social welfare programs,” American Behavioral Scientist, 47(5), 506–40.

Morgan, Kari Lock and Donald B. Rubin, 2012, “Rerandomization to improve covariate balance in experiments,” Annals of Statistics, 40(2), 1263–82.

Muller, Seán M., 2015, “Causal interaction and external validity: obstacles to the policy relevance of randomized evaluations,” World Bank Economic Review, 29, S217–S225.

Orcutt, Guy H. and Alice G. Orcutt, 1968, “Incentive and disincentive experimentation for income maintenance policy purposes,” American Economic Review, 58(4), 754–72.

Pearl, Judea and Elias Bareinboim, 2011, “Transportability of causal and statistical relations: a formal approach,” Proceedings of the 25th AAAI Conference on Artificial Intelligence, AAAI Press, 247–54.

Pearl, Judea and Elias Bareinboim, 2014, “External validity: from do-calculus to transportability across populations,” Statistical Science, 29(4), 579–95.

Rodrik, Dani, 2006, personal email communication.

Rothwell, Peter M., 2005, “External validity of randomized controlled trials: ‘to whom do the results of the trial apply’,” Lancet, 365, 82–93.

Russell, Bertrand, 2008 [1912], The problems of philosophy, Rockville, MD. Arc Manor.

Sackett, David L., William M. C. Rosenberg, J. A. Muir Gray, R. Brian Haynes, and W. Scott Richardson, 1996, “Evidence based medicine: what it is and what it isn’t,” British Medical Journal, 312 (January 13), 71–2.

Savage, Leonard J., 1962, “Subjective probability and statistical practice,” in G. A. Barnard and D. R. Cox, eds., The Foundations of Statistical Inference, London. Methuen. 9–35.

Scriven, Michael, 1974, “Evaluation perspectives and procedures,” in W. James Popham, ed., Evaluation in education: current applications, Berkeley, CA. McCutchan Publishing Corporation.

Senn, Stephen, 1994, “Testing for baseline balance in clinical trials,” Statistics in Medicine, 13, 1715–26.

Senn, Stephen, 2013, “Seven myths of randomization in clinical trials,” Statistics in Medicine, 32, 1439–50.

Shadish, William R., Thomas D. Cook, and Donald T. Campbell, 2002, Experimental and quasi-experimental designs for generalized causal inference, Boston, MA. Houghton Mifflin.

Simpson, Adrian, 2017, “The misdirection of public policy: comparing and combining standardised effect sizes,” Journal of Educational Policy, 32(4), 450–66.

Stuart,ElizabethA.,StephenR.Cole,andCatharineP.Bradshaw,andPhilipJ.Leaf,2011,“Theuseofpropensityscorestoassessthegeneralizabilityofresultsfromrandomizedtrials,”JournaloftheRoyalStatisticalSocietyA,174(2),369–86.

Student(W.S.Gosset),1938,“Comparisonbetweenbalancedandrandomarrange-mentsoffieldplots,”Biometrika,29(3/4),363-78.

Svorencik,Andrej,2015,Theexperimentalturnineconomics:ahistoryofexperi-mentaleconomics,UtrechtSchoolofEconomics,DissertationSeries#29,http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2560026

Todd,PetraE.andKennethJ.Wolpin,2006,“Assessingtheimpactofaschoolsub-sidyprograminMexico:usingasocialexperimenttovalidateadynamicbehav-ioralmodelofchildschoolingandfertility,”AmericanEconomicReview,96(5),1384–1417.

Todd,PetraE.andKennethJ.Wolpin,2008,“Exanteevaluationofsocialprograms,”Annalesd’EconomieetdelaStatistique,91/92,263–91.

U.S.DepartmentofEducation,InstituteofEducationSciences,NationalCenterforEducationEvaluationandRegionalAssistance,2003,Identifyingandimplement-ingeducationalpracticessupportedbyrigorousevidence:auserfriendlyguide,Washington,DC.InstituteofEducationSciences.

Vandenbroucke,JanP.,2004,“Whenareobservationalstudiesascredibleasran-domizedcontrolledtrials?”TheLancet,363:1728–31.

Vandenbroucke,JanP..2009,“TheHRTcontroversy:observationalstudiesandRCTsfallinline,”TheLancet,373,1233-5.

Vivalt,Eva,2015,“Howmuchcanwegeneralizefromimpactevaluations?”NYU,un-published.http://evavivalt.com/wp-content/uploads/2014/10/Vivalt-JMP-10.27.14.pdf

White,Halbert,1980,“Aheteroskedasticity-consistentcovariancematrixestimatorandadirecttestforheteroskedasticity,”Econometrica,50(1),1–25.

Wise,DavidA.,1985,“Abehavioralmodelversusexperimentation:theeffectsofhousingsubsidiesonrent,”inP.BruckerandR.Pauly,eds.MethodsofOperationsResearch,50,VerlagAnonHain.441–89.

Wolpin,KennethI.,2013,Thelimitsofinferencewithouttheory,Camridge,MA.MITPress.

Worrall,John,2007,“Evidenceinmedicineandevidence-basedmedicine,”Philoso-phyCompass,2/6,981–1022.

Worrall,John,2008,“Evidenceandethicsinmedicine,”PerspectivesinBiologyandMedicine,51(3),418-31.

Yates, Frank, 1939, “The comparative advantages of systematic and randomized arrangements in the design of agricultural and biological experiments,” Biometrika, 30(3/4), 440–66.

Young, Alwyn, 2016, “Channeling Fisher: randomization tests and the statistical insignificance of seemingly significant experimental results,” London School of Economics, Working Paper, Feb.

Ziliak, Stephen T., 2014, “Balanced versus randomized field experiments in economics: why W. S. Gosset aka ‘Student’ matters,” Review of Behavioral Economics, 1, 167–208.


Appendix: Monte Carlo experiment for an RCT with outliers

In this illustrative example, there is a parent population each member of which has his or her own treatment effect; these are continuously distributed with a shifted lognormal distribution with zero mean, so that the population ATE is zero. The individual treatment effects β are distributed so that $\beta + e^{0.5} \sim \Lambda(0,1)$, for the standardized lognormal distribution Λ. In the absence of treatment, everyone in the sample records zero, so the estimated average treatment effect in any one trial is simply the mean outcome among the n treatments. For values of n equal to 25, 50, 100, 200, and 500, we draw from the parent population 100 trial samples, each of size 2n; with five values of n, this gives us 500 trial samples in all. Because of sampling, the true ATE in each trial sample will not be zero. For each of these 500 samples, we randomize into n controls and n treatments, estimate the ATE and its estimated t-value (using the standard two-sample t-value or, equivalently, by running a regression with robust t-values), and then repeat 1,000 times, so that we have 1,000 ATE estimates and t-values for each of the 500 trial samples. These allow us to assess the distribution of ATE estimates and their nominal t-values for each trial.
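The design can be written compactly as below. The symbols $\lambda_i$, $D_i$, $T$, $s_T$, and $s_C$ are our own labels for quantities described in the text, not notation taken from the paper. Because every control records zero, $s_C^2 = 0$, so the nominal t-value reduces to a one-sample t-statistic computed from the n treated effects; with equal arms, the pooled-variance form of the two-sample t gives the same denominator, $s_T/\sqrt{n}$.

```latex
\begin{aligned}
\beta_i &= \lambda_i - e^{1/2}, \qquad \lambda_i \sim \Lambda(0,1), \qquad
  E[\beta_i] = E[\lambda_i] - e^{1/2} = 0, \\
Y_i &= D_i\,\beta_i \qquad (D_i = 1 \text{ if unit } i \text{ is in the treatment group } T), \\
\widehat{\mathrm{ATE}} &= \bar{Y}_T - \bar{Y}_C = \frac{1}{n}\sum_{i \in T} \beta_i, \\
t &= \frac{\bar{Y}_T - \bar{Y}_C}{\sqrt{s_T^2/n + s_C^2/n}}
   = \frac{\sqrt{n}\,\widehat{\mathrm{ATE}}}{s_T}, \qquad \text{since } s_C^2 = 0.
\end{aligned}
```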

The results are shown in Table A1. Each row corresponds to a sample size. In each row, we show the results of 100,000 individual trials, composed of 1,000 replications on each of the 100 trial (experimental) samples. The columns are averaged over all 100,000 trials.
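The procedure just described can be sketched in pure Python. This is an illustrative sketch under our own assumptions, not the authors' code: the function names (`draw_effects`, `sample_var`, `one_randomization`, `simulate_row`) are hypothetical, the t-value uses the standard two-sample form, and a true null is counted as rejected when |t| exceeds 1.96, a normal-approximation critical value that the appendix does not specify.

```python
import math
import random

# Mean of the standard lognormal Λ(0,1); subtracting it centers the effects at zero.
SHIFT = math.exp(0.5)
# Assumption: normal-approximation critical value for a 5 percent two-sided test.
CRIT = 1.96


def draw_effects(size, rng):
    """Draw individual treatment effects beta with beta + e^0.5 ~ Λ(0,1)."""
    return [rng.lognormvariate(0.0, 1.0) - SHIFT for _ in range(size)]


def sample_var(xs):
    """Unbiased sample variance (divisor len(xs) - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def one_randomization(effects, rng):
    """Randomize 2n units into n treated and n controls; return (ATE estimate, t-value)."""
    n = len(effects) // 2
    order = list(range(len(effects)))
    rng.shuffle(order)
    treated = [effects[i] for i in order[:n]]  # a treated unit's outcome is its beta
    control = [0.0] * n                        # untreated outcomes are all zero
    ate = sum(treated) / n - sum(control) / n
    # Two-sample standard error; with equal arms the pooled form gives the same value.
    se = math.sqrt(sample_var(treated) / n + sample_var(control) / n)
    return ate, ate / se


def simulate_row(n, n_samples=100, n_reps=1000, seed=0):
    """Return (mean ATE estimate, mean t-value, percent of true nulls rejected)."""
    rng = random.Random(seed)
    ate_sum = t_sum = 0.0
    rejections = 0
    for _ in range(n_samples):
        effects = draw_effects(2 * n, rng)  # one trial sample of size 2n
        for _ in range(n_reps):             # re-randomize the same trial sample
            ate, t = one_randomization(effects, rng)
            ate_sum += ate
            t_sum += t
            rejections += abs(t) > CRIT
    total = n_samples * n_reps
    return ate_sum / total, t_sum / total, 100.0 * rejections / total
```

For example, `simulate_row(50)` runs the 100 × 1,000 design of the second row; in pure Python this is slow, so smaller values of `n_samples` and `n_reps` give a quicker look. Because the random draws differ, the output will not reproduce Table A1 digit for digit.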

Table A1: RCTs with skewed treatment effects

Sample size (n per arm) | Mean of ATE estimates | Mean of nominal t-values | Fraction null rejected (percent)
25 | 0.0268 | –0.4274 | 13.54
50 | 0.0266 | –0.2952 | 11.20
100 | –0.0018 | –0.2600 | 8.71
200 | 0.0184 | –0.1748 | 7.09
500 | –0.0024 | –0.1362 | 6.06

Note: 1,000 randomizations on each of 100 draws of the trial sample, randomly drawn from a lognormal distribution of treatment effects shifted to have a zero mean.


The last column shows the fraction of times that the null, which is true in the population, is rejected in the trial samples; it is our key result. When there are only 50 treatments and 50 controls (row 2), the (true) null is rejected 11.2 percent of the time, instead of the 5 percent that we would like, and would expect if we were unaware of the problem. When there are 500 units in each arm, the rejection rate is 6.06 percent, much closer to the nominal 5 percent.

Figure A1: Estimates of an ATE with an outlier in the trial sample

Figure A1 illustrates the estimated ATEs for an extreme trial sample from the simulations in the second row, with 100 observations in total; the histogram shows the 1,000 estimates of the ATE for that trial sample. This trial sample has a single large outlying treatment effect of 48.3; the mean (s.d.) of the other 99 observations is –0.51 (2.1). When the outlier is in the treatment group, we get the right-hand side of the figure; when it is in the control group, we get the left-hand side.
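The location of the two clusters can be checked with a little arithmetic, using the rounded figures given above. Because every control records zero, each estimate is the mean effect among the 50 treated units; if the outlier is treated, the other 49 treated units are a random subset of the remaining 99, whose mean is about –0.51. The sketch below (the names are ours, and the results are only approximate because the inputs are rounded) computes the expected center of each cluster and confirms that their midpoint equals the trial sample's own ATE, since the outlier is treated in exactly half of all possible randomizations.

```python
# Rounded figures reported in the text for the extreme trial sample of Figure A1.
OUTLIER = 48.3       # the single large treatment effect
OTHERS_MEAN = -0.51  # mean of the other 99 treatment effects
N_ARM = 50           # 50 treatments and 50 controls (second row of Table A1)


def expected_estimate(outlier_treated):
    """Expected ATE estimate given where the outlier lands.

    Controls all record zero, so the estimate is the mean effect among the treated;
    the non-outlier treated units are a random subset of the 99 others.
    """
    if outlier_treated:
        return (OUTLIER + (N_ARM - 1) * OTHERS_MEAN) / N_ARM
    return OTHERS_MEAN


right = expected_estimate(True)   # about 0.466: outlier in the treatment group
left = expected_estimate(False)   # outlier in the control group
# True ATE of this trial sample: mean of all 100 individual effects.
trial_sample_ate = (OUTLIER + 99 * OTHERS_MEAN) / 100
print(round(right, 4), round(left, 4), round((right + left) / 2, 4), round(trial_sample_ate, 4))
```

The averaging over the two equally likely placements is what makes the estimator unbiased for the trial-sample ATE, even though no single estimate lies near it.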

[Figure A1 image: histogram; vertical axis: Density; horizontal axis: 1,000 estimates of average treatment effect]