ORBITA: A case study in the analysis and reporting of clinical trials

Andrew Gelman, John Carlin and Brahmajee K Nallamothu

14 Mar 2018

Department of Statistics and Political Science, Columbia University, New York City, NY, United States (Andrew Gelman, professor); Clinical Epidemiology & Biostatistics, Murdoch Children’s Research Institute, Melbourne School of Population and Global Health and Department of Paediatrics, University of Melbourne, Melbourne, Australia (John Carlin, professor); Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, United States (Brahmajee K Nallamothu, professor)

Correspondence to: Brahmajee K Nallamothu

Acknowledgements: We thank Doug Helmreich for bringing this example to our attention, Shira Mitchell for helpful comments, and the Office of Naval Research, Defense Advanced Research Projects Agency, and the National Institutes of Health for partial support of this work.

Competing interests: Dr. Gelman and Dr. Carlin report no competing interests. Dr. Nallamothu is an interventional cardiologist and Editor-in-Chief of a journal of the American Heart Association but otherwise has no competing interests.

Word Count: 2085



1. Introduction

ORBITA (Objective Randomised Blinded Investigation With Optimal Medical Therapy of Angioplasty in Stable Angina) was a randomized clinical trial of approximately 200 patients in which half the patients received stents and half received a placebo procedure. Its summary finding was that stenting did not “increase exercise time by more than the effect of a placebo procedure”, with the mean difference in this primary outcome between treatment and control groups reported as 16.6 sec (95% confidence interval, −8.9 to +42.0 sec) and a p-value of 0.20.
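The reported p-value can be recovered from the published interval alone. The sketch below assumes the 95% interval was computed as estimate ± 1.96 × SE under a normal approximation (an assumption for illustration); the function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with math.erf to avoid dependencies."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def summarize_from_ci(estimate: float, lower: float, upper: float,
                      z_crit: float = 1.96):
    """Back out the SE, z-score and two-sided p-value from a 95% interval.

    Assumes a symmetric interval of the form estimate +/- z_crit * SE.
    """
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))
    return se, z, p_two_sided


# ORBITA primary outcome as published: 16.6 sec (95% CI -8.9 to +42.0 sec).
se, z, p = summarize_from_ci(16.6, -8.9, 42.0)
print(f"SE = {se:.1f} sec, z = {z:.2f}, p = {p:.2f}")
```

This gives a standard error of about 13 sec, z ≈ 1.28 and p ≈ 0.20, in line with the published p-value.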

In the New York Times, Kolata (2017) reported the finding as “unbelievable,” remarking that it “stunned leading cardiologists by countering decades of clinical experience.” Indeed, one of us (BKN) was quoted as being humbled by the finding, as many had expected a positive result. On the other hand, Kolata noted, “there have long been questions about [stents’] effectiveness.” At the very least, the willingness of doctors and patients to participate in a controlled trial with a placebo procedure suggests some degree of existing skepticism and clinical equipoise.

ORBITA was a landmark trial due to its innovative use of a placebo procedure. However, substantial questions remain even after ORBITA regarding the role of stenting in stable angina. It is a well-known statistical fallacy to take a result that is not statistically significant and report it as zero, as was essentially done here based on the p-value of 0.20 for the primary outcome. Had this comparison happened to produce a p-value of 0.04, would the headline have been “‘Believable’: Heart Stents Indeed Ease Chest Pain”?

The purpose of this paper is to take a closer look at the lack of statistical significance in ORBITA and the larger questions it raises about statistical analyses, statistically based versus clinical decision-making, and the reporting of clinical trials. This is important because a lot of certainty seems to be hanging on a small bit of data.

Because dichotomized thresholds are a big problem, in this paper we avoid discussing “statistical significance” except when addressing how results are or could be reported.

2. Statistical analysis of the ORBITA trial

Adjusting for baseline differences. In ORBITA, exercise time in a standardized treadmill test (the primary outcome in the preregistered design) increased on average by 28.4 sec in the treatment group compared to an increase of only 11.8 sec in the control group. As noted above, this difference of 16.6 sec was associated with a p-value greater than 0.05. Hence, following conventional rules of scientific reporting, it was treated as zero: an instance of the regrettably common statistical fallacy of presenting non-statistically-significant results as confirmation of the null hypothesis of no difference.

However, the estimate using gain in exercise time does not make full use of the data that were available on differences between the groups at baseline (Vickers and Altman, 2001; Harrell, 2017a). The treatment and placebo groups differed in their pre-treatment levels of exercise time, with mean values of 528.0 and 490.0 sec, respectively (Supplementary Table). This sort of chance difference is to be expected, since randomization ensures balance only in expectation, but it is important to adjust for this discrepancy in estimating the treatment effect. In the published paper, the adjustment was performed by simple subtraction of the pre-treatment values:

Gain in exercise time:

$$(y_{\text{post}} - y_{\text{pre}})_T - (y_{\text{post}} - y_{\text{pre}})_C, \tag{1}$$

But this over-corrects for differences in pre-test scores, because of the familiar phenomenon of “regression to the mean”: just from natural variation, we would expect patients with lower scores at baseline to improve, relative to the average, and patients with higher scores to regress downward.

The optimal linear estimate of the treatment effect is actually:

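Gain in exercise time:

$$(y_{\text{post}} - \beta\, y_{\text{pre}})_T - (y_{\text{post}} - \beta\, y_{\text{pre}})_C, \tag{2}$$

where $\beta$ is the coefficient of $y_{\text{pre}}$ in a least-squares regression of $y_{\text{post}}$ on $y_{\text{pre}}$, also controlling for the treatment indicator.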

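The estimate in (1) is a special case of the regression estimate (2) corresponding to β = 1. Given that the pre-test and post-test measurements have nearly identical variances, we can anticipate that the optimal β will be less than 1 (with equal variances, β equals the within-group correlation between pre-test and post-test). This reduces the correction for the difference in pre-test scores and, because the treatment group started higher, increases the estimated treatment effect while decreasing the standard error.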
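To make estimators (1) and (2) concrete, the sketch below applies them to the group means quoted in the text. Post-treatment means are reconstructed as baseline mean plus mean gain, which is exact for averages; β = 1 gives the gain score and β = 0.87 is the average used in Box 1. The second function gives the large-sample ratio of standard errors under a simplifying assumption not made in the original: exactly equal pre- and post-test variances with within-group correlation ρ. Function names are hypothetical.

```python
import math

# Group means from the text: baseline 528.0 (stent) and 490.0 (placebo) sec,
# mean gains 28.4 and 11.8 sec. Post means = baseline + gain (exact for means).
PRE_T, GAIN_T = 528.0, 28.4
PRE_C, GAIN_C = 490.0, 11.8
POST_T, POST_C = PRE_T + GAIN_T, PRE_C + GAIN_C


def adjusted_difference(beta: float) -> float:
    """Estimator (2) on group means; beta = 1 reproduces the gain score (1)."""
    return (POST_T - beta * PRE_T) - (POST_C - beta * PRE_C)


def se_ratio_ancova_vs_gain(rho: float) -> float:
    """Large-sample SE(regression estimate) / SE(gain-score estimate).

    Assumes equal pre/post variances sigma^2 and correlation rho:
    Var(post - pre) = 2 sigma^2 (1 - rho) and Var(post - rho*pre) =
    sigma^2 (1 - rho^2), so the SE ratio is sqrt((1 + rho) / 2) <= 1.
    """
    return math.sqrt((1.0 + rho) / 2.0)


for beta in (1.0, 0.87):
    print(f"beta = {beta:.2f}: estimated effect = {adjusted_difference(beta):.1f} sec")
print(f"SE ratio at rho = 0.88: {se_ratio_ancova_vs_gain(0.88):.3f}")
```

With β = 0.87 this gives about 21.5 sec, close to the 21.3 sec in Box 1 (the small gap presumably reflects rounding of β and of the group means), and an SE ratio of about 0.97, consistent with the slightly smaller standard error reported there.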

An adjusted analysis using the available information is explained in detail in Box 1. The p-value from this adjusted analysis is 0.09: as expected, lower than the p = 0.20 from the unadjusted analysis. What is relevant is not whether or not this new p-value has become “statistically significant” but rather that the reported p-value is subject to change based on alternative analyses.

Under some conventions for scientific reporting, and in some fields, a p-value of 0.09 is considered statistically significant. For example, in a recent social science experiment published in the Proceedings of the National Academy of Sciences, Sands (2017) presented a causal effect based on a p-value of less than 0.10, and this was enough for publication in a top journal and in the popular press. Vox mentioned that work uncritically, without any concern regarding significance levels (Resnick, 2017). By contrast, after ORBITA, Vox reported stents as a prime example of the “epidemic of unnecessary medical treatments” (Belluz, 2017).

These concerns deepen further when one considers how sensitive the ORBITA results were from a statistical standpoint. To better understand this, one can perform a simple bootstrap analysis, computing the results that would have been obtained from reanalyzing the data 1000 times, each time resampling patients from the existing experiment with replacement (Efron, 1979). As raw data were not available to us, we approximated this using the normal distribution based on the observed z-score of 1.7. The result was that, in 40% of the simulations, stents outperformed placebo with p-values less than 0.05. This is not to say that stents really are better on average than placebo in improving exercise time; the data also appear consistent with a null effect. The take-home point of this experiment is that the results could easily have gone “the other way” when reporting is forced into a binary classification of statistical significance.
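The normal approximation described above can be sketched in a few lines. Strictly, this draws replicate z-scores from N(1.7, 1) rather than resampling patients, so it is a parametric stand-in for the bootstrap; the seed, the function names and the 1.96 cutoff for a two-sided p < 0.05 are choices made for this illustration.

```python
import random
from statistics import NormalDist


def share_significant(z_obs: float, n_sims: int = 1000, seed: int = 2018,
                      z_crit: float = 1.96) -> float:
    """Fraction of replicate z-scores above z_crit (stents ahead, p < 0.05).

    Each replicate z-score is drawn as N(z_obs, 1): the observed estimate is
    treated as the truth, and the sampling SD of a z-score is 1.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_sims) if rng.gauss(z_obs, 1.0) > z_crit)
    return hits / n_sims


def share_significant_exact(z_obs: float, z_crit: float = 1.96) -> float:
    """Closed-form limit of the simulation: P(N(z_obs, 1) > z_crit)."""
    return 1.0 - NormalDist().cdf(z_crit - z_obs)


print(f"simulated: {share_significant(1.7):.2f}")
print(f"analytic:  {share_significant_exact(1.7):.3f}")
```

The closed-form value is about 0.40, the 40% quoted above; with 1000 draws the simulated share typically lands within a percentage point or two of it, depending on the seed.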

3. Statistically based versus clinical decision-making

In justifying their study design and sample size, Al-Lamee et al. (2017) wrote: “Evidence from placebo-controlled randomised controlled trials shows that single antianginal therapies provide improvements in exercise time of 48–55 sec … Given the previous evidence, ORBITA was conservatively designed to be able to detect an effect size of 30 sec.” The baseline-adjusted estimated effect of 21 sec, with a standard error of about 12 sec, is consistent with the “conservative” effect size of 30 sec given in the published article. So although the experimental results are consistent with a null effect, they are even more consistent with a small positive effect.
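One way to quantify “even more consistent”, assuming a normal likelihood centered on the baseline-adjusted estimate from Box 1 (21.3 sec, standard error 12.5 sec), is the likelihood ratio between a 30 sec effect and a null effect. This is an illustrative calculation, not part of the original analysis; the function name is hypothetical.

```python
import math


def normal_likelihood_ratio(estimate: float, se: float,
                            theta_a: float, theta_b: float) -> float:
    """L(theta_a) / L(theta_b) for a normal likelihood with the given estimate and SE."""
    za = (estimate - theta_a) / se
    zb = (estimate - theta_b) / se
    return math.exp(-0.5 * za**2 + 0.5 * zb**2)


# Baseline-adjusted estimate and standard error from Box 1.
ratio = normal_likelihood_ratio(21.3, 12.5, theta_a=30.0, theta_b=0.0)
print(f"L(30 sec) / L(0 sec) = {ratio:.2f}")
```

The ratio is about 3.4: under this approximation, the data are roughly three times as likely under a 30 sec effect as under no effect.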

One might ask, however, about the clinical significance of such a treatment effect, which we can discuss without reference to p-values or statistical significance. For simplicity, suppose we take the point estimate from the data at face value. How should we think about an increase in average exercise time of 21 sec? One way to conceptualize this is in terms of percentiles. The data show a pre-randomization distribution (averaging the treatment and control groups) with a mean of 509 sec and a standard deviation of 188 sec. Assuming a normal approximation, an increase in exercise time of 21 sec, from 509 to 530 sec, would take a patient from the 50th percentile to the 54th percentile of the distribution. Looked at that way, it would be hard to get excited about this effect size, even if it were a real population shift. Indeed, a recent study after ORBITA suggested, ironically, that such gains are possible during a treadmill test by simply playing music.
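The percentile calculation can be checked directly, treating the pooled baseline distribution (mean 509 sec, SD 188 sec) as normal, as the text does. The 50 sec shift is the hypothesized larger benefit discussed in the next paragraph; the function name is hypothetical.

```python
from statistics import NormalDist

# Pooled pre-randomization exercise time from the text, treated as normal.
baseline = NormalDist(mu=509.0, sigma=188.0)


def percentile_after_shift(shift_sec: float, start_sec: float = 509.0) -> float:
    """Percentile (0-100) of start_sec + shift_sec in the baseline distribution."""
    return 100.0 * baseline.cdf(start_sec + shift_sec)


for shift in (0.0, 21.0, 50.0):
    print(f"+{shift:.0f} sec -> {percentile_after_shift(shift):.0f}th percentile")
```

A 21 sec gain moves the median patient to about the 54th percentile, and even 50 sec reaches only about the 60th.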

Thus, the larger clinical question is how to balance the long-term benefits of stents against the risks of the procedure. It does not seem reasonable for a person to receive stents just for a potential benefit of 21 sec of exercise time on a standardized treadmill test, or even a hypothesized larger benefit of 50 sec, which would still represent only a 10% improvement for an average patient in this study. Yet maybe a 5% to 10% increase is consequential in this case, as it could improve quality of life for a patient. Perhaps this small gain in exercise time is associated with the need for fewer medications, fewer functional limitations, or greater mobility. If so, however, one might expect this gain to have been apparent in assessments of angina burden, and it was not.

A big concern here is that these patients were already doing pretty well on medications; that is, they already had a low symptom frequency before stenting. For example, angina frequency as measured by the Seattle Angina Questionnaire was 63.2 after optimizing medications and before stenting in the treatment group. This roughly translates as “monthly” angina (John Spertus, personal communication). How could a study with a follow-up of just 6 weeks be expected to improve an outcome that happens this infrequently? In fact, one of the great debates surrounding ORBITA concerns whom it enrolled: those who discount the trial suggest it enrolled patients who typically do not receive stents in routine practice, whereas those who believe ORBITA is a game-changer argue that these less symptomatic patients actually make up a large proportion of those receiving stents.

Finally, are stents really being given to patients with stable angina just to improve fitness or to reduce symptoms? Or is there a continued expectation that stents have long-term benefits for patients, despite earlier data from studies like the Clinical Outcomes Utilizing Revascularization and Aggressive Drug Evaluation (COURAGE) study (Boden et al., 2007)? This would seem to be the key question, in which case the short-term effects, or lack thereof, found in the ORBITA study are largely irrelevant to it. Other, larger trials, such as the International Study of Comparative Health Effectiveness With Medical and Invasive Approaches (ISCHEMIA; see https://clinicaltrials.gov/ct2/show/NCT01471522), are considering this more fundamental question but will not have a placebo procedure.

Evidence from ORBITA pointed toward consistent improvements in physiological measures of ischemia, with very large and highly statistically significant differences in endpoints such as fractional flow reserve and stress echo. There is thus little question that stents are making some physiological changes. As is often the case, the null hypothesis that these physical changes should make absolutely zero difference to any downstream clinical outcomes seems far-fetched. Thus, the sensible question to ask is “How large are the clinical differences observed, and are they worth it?” and not “How surprising is the observed mean difference under a [spurious] null hypothesis?”

4. Recommendations for statistical reporting of trials

The search for better medical care is an incremental process, with incomplete evidence accumulating over time. There is, unfortunately, a fundamental incompatibility between that core idea and the common practice, in both medical journals and the news media, of up-or-down reporting of individual studies based on statistical significance. We offer some recommendations to tackle this issue in Box 2.

In the design, evaluation, and reporting of experimental studies, there is a norm of focusing on the statistical significance of a primary outcome, described at times as “significantitis” or “dichotomania” (Greenland, 2017). This norm leads to an overreliance on phrases such as “We deemed a p value less than 0.05 to be significant,” which are common throughout the published literature. The resulting conclusions will frequently be fragile, because p-values are extremely noisy unless the underlying effect is huge. To their credit, the ORBITA authors themselves have recognized these critical issues (see online: https://twitter.com/ProfDFrancis/status/952008644018753536).
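To illustrate how noisy p-values are, suppose (as an assumption for this sketch) that the true z-score equals the ORBITA adjusted value of 1.7, so replicate z-scores follow N(1.7, 1). The central 80% of replications then spans a strikingly wide range of p-values. Function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score under the normal approximation."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def p_value_range(z_true: float, coverage: float = 0.80):
    """Central `coverage` range of two-sided p-values over replications.

    Assumes replicate z-scores follow N(z_true, 1). For positive z the p-value
    falls as z rises, so the upper z quantile gives the lower p bound. This
    mapping is only valid while both z quantiles stay positive.
    """
    tail = (1.0 - coverage) / 2.0
    z_low = z_true + STD_NORMAL.inv_cdf(tail)
    z_high = z_true + STD_NORMAL.inv_cdf(1.0 - tail)
    return two_sided_p(z_high), two_sided_p(z_low)


p_low, p_high = p_value_range(1.7)
print(f"central 80% of p-values: {p_low:.3f} to {p_high:.2f}")
```

Under these assumptions, 80% of exact replications would give p-values anywhere from about 0.003 to about 0.68.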

ORBITA was never meant to be definitive in a broad sense: it was designed to find a physiological effect of stenting on mean exercise time, without clarity on the clinical relevance of this outcome. Indeed, a likely reason the study was limited to this endpoint is that this is all that could have passed an ethics board, given the novelty of the placebo procedure in this setting. Further background on these topics from Darrel Francis, the senior author on the study, appears in Harrell (2017b). One certain impact of ORBITA is that bigger trials of stenting with placebo procedures, measuring a more meaningful set of outcomes, are now much more likely.

We don’t see any easy answers here: long-term outcomes would require a long-term study, after all, and clinical decisions need to be made right away, every day. But perhaps we can use our examination of this particular study and its reporting to suggest practical directions for improvement in heart treatment studies and in the design and reporting of clinical trials more generally.


References

Al-Lamee, R., Thompson, D., Dehbi, H. M., Sen, S., Tang, K., Davies, J., Keeble, T., Mielewczik, M., Kaprielian, R., Malik, I. S., Nijjer, S. S., Petraco, R., Cook, C., Ahmad, Y., Howard, J., Baker, C., Sharp, A., Gerber, R., Talwar, S., Assomull, R., Mayet, J., Wensel, R., Collier, D., Shun-Shin, M., Thom, S. A., Davies, J. E., and Francis, D. P. (2017). Percutaneous coronary intervention in stable angina (ORBITA): a double-blind, randomised controlled trial. Lancet. http://dx.doi.org/10.1016/S0140-6736(17)32714-9

Allison, D. B., Brown, A. W., George, B. J., and Kaiser, K. A. (2016). Reproducibility: A tragedy of errors. Nature 530, 27–29. doi:10.1038/530027a. PubMed PMID: 26842041; PubMed Central PMCID: PMC4831566.

American College of Cardiology (2017). ORBITA: First placebo-controlled randomized trial of PCI in CAD patients. ACC News, 2 Nov. http://www.acc.org/latest-in-cardiology/articles/2017/10/27/13/34/thurs-1150am-orbita-tct-2017

Belluz, J. (2017). Thousands of heart patients get stents that may do more harm than good. Vox.com, 6 Nov. https://www.vox.com/science-and-health/2017/11/3/16599072/stent-chest-pain-treatment-angina-not-effective

Bland, J. M., and Altman, D. G. (2015). Best (but oft forgotten) practices: Testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach. American Journal of Clinical Nutrition 102, 991–994. doi:10.3945/ajcn.115.119768. Epub 2015 Sep 9. PubMed PMID: 26354536.

Boden, W. E., O'Rourke, R. A., Teo, K. K., Hartigan, P. M., Maron, D. J., Kostuk, W. J., Knudtson, M., Dada, M., Casperson, P., Harris, C. L., Chaitman, B. R., Shaw, L., Gosselin, G., Nawaz, S., Title, L. M., Gau, G., Blaustein, A. S., Booth, D. C., Bates, E. R., Spertus, J. A., Berman, D. S., Mancini, G. B., and Weintraub, W. S.; COURAGE Trial Research Group (2007). Optimal medical therapy with or without PCI for stable coronary disease. New England Journal of Medicine 356, 1503–1516. Epub 2007 Mar 26.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1–26.

Gelman, A. (2004). Treatment effects in before-after data. In Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, ed. A. Gelman and X. L. Meng, chapter 18. New York: Wiley.

Gelman, A. (2018). The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. Personality and Social Psychology Bulletin 44, 16–23.

Gelman, A., and Carlin, J. B. (2014). Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9, 641–651.

Gelman, A., and Stern, H. S. (2006). The difference between “significant” and “not significant” is not itself statistically significant. American Statistician 60, 328–331.

Greenland, S. (2017). The need for cognitive science in methodology. American Journal of Epidemiology 186, 639–645.

Harrell, F. (2017a). Statistical errors in the medical literature. Statistical Thinking blog, 8 Apr. http://www.fharrell.com/2017/04/statistical-errors-in-medical-literature.html

Harrell, F. (2017b). Statistical criticism is easy; I need to remember that real people are involved. Statistical Thinking blog, 5 Nov. http://www.fharrell.com/2017/11/statistical-criticism-is-easy-i-need-to.html

Kolata, G. (2017). ‘Unbelievable’: Heart stents fail to ease chest pain. New York Times, 2 Nov. https://www.nytimes.com/2017/11/02/health/heart-disease-stents.html

Resnick, B. (2017). White fear of demographic change is a powerful psychological force. Vox.com, 28 Jan. https://www.vox.com/science-and-health/2017/1/26/14340542/white-fear-trump-psychology-minority-majority

Sands, M. L. (2017). Exposure to inequality affects support for redistribution. Proceedings of the National Academy of Sciences 114, 663–668.

Schulz, K. F., and Grimes, D. A. (2005). Sample size calculations in randomised trials: Mandatory and mystical. Lancet 365, 1348–1353.

Simmons, J., Nelson, L., and Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22, 1359–1366.

Vickers, A. J., and Altman, D. G. (2001). Analysing controlled trials with baseline and follow up measurements. British Medical Journal 323, 1123–1124.

Wasserstein, R. L., and Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. American Statistician 70, 129–133.


Supplementary Table. Summary data comparing stents to placebo, from Table 3 of Al-Lamee et al. (2017).


Box 1. Using the reported data summaries to obtain the analysis controlling for the pre-treatment measure

For each of the treatment and control groups, we are given the standard deviation of the pre-test measurements and the standard deviation of the post-test measurements. The standard deviation of their difference can be obtained by taking the width of the 95% confidence interval for the mean difference, dividing by 4 (approximately 2 × 1.96) to get the standard error of the difference, and then multiplying by $\sqrt{n}$ to get back to the standard deviation.

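Then we use the rule

$$\mathrm{sd}(y_{\text{post}} - y_{\text{pre}})^2 = \mathrm{sd}(y_{\text{pre}})^2 + \mathrm{sd}(y_{\text{post}})^2 - 2\rho\,\mathrm{sd}(y_{\text{pre}})\,\mathrm{sd}(y_{\text{post}})$$

and solve for ρ, the correlation between before and after measurements within each group. The result in this case is ρ = 0.88 within each group. We then convert the correlation to a regression coefficient of $y_{\text{post}}$ on $y_{\text{pre}}$ using the well-known formula $\beta = \rho\,\mathrm{sd}(y_{\text{post}})/\mathrm{sd}(y_{\text{pre}})$, which yields β = 0.88 for the treated group and β = 0.86 for the control group. If these two coefficients were much different from each other, we might want to consider an interaction model (Gelman, 2004), but here they are close enough that we simply take their average.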

We use the average, β = 0.87, in (2) and get an estimate for the adjusted mean difference of 21.3 sec (indeed, quite a bit higher than the reported difference in gain scores of 16.6 sec) with a standard error of 12.5 sec (very slightly lower than 12.7 sec, the standard error of the difference in gain scores) and a 95% CI of −3.2 to 45.8 sec. The estimate is not quite two standard errors away from zero: the z-score is 1.7, and the p-value is 0.09.
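The Box 1 steps can be written as small functions. The Supplementary Table values are not reproduced in this text, so the correlation step below uses hypothetical inputs (equal standard deviations of 180 sec, with a change SD chosen to imply ρ = 0.88); only the final summary uses the numbers reported above. The sketch divides the interval width by 2 × 1.96 where Box 1 rounds to 4, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def sd_from_ci(lower: float, upper: float, n: int, z_crit: float = 1.96) -> float:
    """SD of individual changes from a 95% CI for the mean change.

    width / (2 * z_crit) gives the standard error; multiplying by sqrt(n)
    converts the standard error back to a standard deviation.
    """
    return (upper - lower) / (2.0 * z_crit) * math.sqrt(n)


def rho_from_sds(sd_pre: float, sd_post: float, sd_diff: float) -> float:
    """Solve sd_diff^2 = sd_pre^2 + sd_post^2 - 2*rho*sd_pre*sd_post for rho."""
    return (sd_pre**2 + sd_post**2 - sd_diff**2) / (2.0 * sd_pre * sd_post)


def beta_from_rho(rho: float, sd_pre: float, sd_post: float) -> float:
    """Regression coefficient of y_post on y_pre: beta = rho * sd_post / sd_pre."""
    return rho * sd_post / sd_pre


def summarize(estimate: float, se: float, z_crit: float = 1.96):
    """95% CI, z-score and two-sided p-value under a normal approximation."""
    z = estimate / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return (estimate - z_crit * se, estimate + z_crit * se), z, p


# Hypothetical inputs (not the ORBITA summaries): equal SDs of 180 sec and a
# change SD chosen so that the implied correlation is 0.88.
sd_diff = 180.0 * math.sqrt(2.0 * (1.0 - 0.88))
rho = rho_from_sds(180.0, 180.0, sd_diff)
print(f"rho = {rho:.2f}, beta = {beta_from_rho(rho, 180.0, 180.0):.2f}")

# Adjusted estimate and standard error reported in Box 1.
(ci_lo, ci_hi), z, p = summarize(21.3, 12.5)
print(f"95% CI {ci_lo:.1f} to {ci_hi:.1f} sec, z = {z:.1f}, p = {p:.2f}")
```

The last line reproduces the interval −3.2 to 45.8 sec, z ≈ 1.7 and p ≈ 0.09.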


Box 2. Recommendations for Analyses and Reporting

Analyses
1. Baseline adjustment for differences: should be prespecified for the primary analysis where strong confounders, such as a baseline measure of the outcome, are available.
2. Be aware of the fragility of inferences. Fragility can be demonstrated using the sampling or posterior distribution, as estimated using mathematical modeling, bootstrap simulation, or Bayesian analysis.

Reporting
1. Avoid use of sharp thresholds for p-values, and thus eliminate the term “statistical significance” from the reporting of results.
2. Consider the full range (upper and lower ends) of interval estimates for important outcomes and their potential inclusion of clinically important differences.
3. Consider the potential for individual variability in responses (heterogeneity of treatment effects), not just mean differences.
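As one example of the Bayesian option in Analyses item 2 and of Reporting item 2, the sketch below assumes a flat prior, so the posterior for the mean effect is simply normal with the Box 1 estimate (21.3 sec) and standard error (12.5 sec). Both the prior and the thresholds (0, 30 and 50 sec, taken from the text) are illustrative choices; an informative prior would change the numbers, and the function name is hypothetical.

```python
from statistics import NormalDist

# Assumption: flat prior, so the posterior is N(21.3, 12.5^2) using the
# adjusted estimate and standard error from Box 1.
posterior = NormalDist(mu=21.3, sigma=12.5)


def prob_effect_exceeds(threshold_sec: float) -> float:
    """Posterior probability that the mean effect exceeds threshold_sec."""
    return 1.0 - posterior.cdf(threshold_sec)


for threshold in (0.0, 30.0, 50.0):
    print(f"P(effect > {threshold:.0f} sec) = {prob_effect_exceeds(threshold):.2f}")
```

Under these assumptions the probability of a positive effect is about 0.96, of an effect above the 30 sec design value about 0.24, and of one above 50 sec about 0.01.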