Paper submitted to the special issue "Curiosity, Imagination and Surprise" of Research in the History of Economic Thought and Methodology, January 2018

Re-Thinking Reproducibility as a Criterion for Research Quality

Sabina Leonelli
Exeter Centre for the Study of the Life Sciences & Department of Sociology, Philosophy and Anthropology, University of Exeter
s.leonelli@exeter.ac.uk

Abstract: A heated debate surrounds the significance of reproducibility as an indicator of research quality and reliability, with many commentators linking a "crisis of reproducibility" to the rise of fraudulent, careless and unreliable practices of knowledge production. Through the analysis of discourse and practices across research fields, I point out that reproducibility is not only interpreted in different ways, but also serves a variety of epistemic functions depending on the research at hand. Given such variation, I argue that the uncritical pursuit of reproducibility as an overarching epistemic value is misleading and potentially damaging to scientific advancement. Requirements for reproducibility, however they are interpreted, are one of many available means to secure reliable research outcomes. Furthermore, there are cases where the focus on enhancing reproducibility turns out not to foster high-quality research. Scientific communities and Open Science advocates should learn from inferential reasoning from irreproducible data, and promote incentives for all researchers to explicitly and publicly discuss (1) their methodological commitments, (2) the ways in which they learn from mistakes and problems in everyday practice, and (3) the strategies they use to choose which research component of any project needs to be preserved in the long term, and how.

Keywords: reproducibility; research methods; data practices; pluralism.

Source: philsci-archive.pitt.edu/14352/1/Reproducibility_2018_SL.pdf (2018-02-05)




Introduction: The Reproducibility Crisis

The reproducibility of research results is often flagged as a priority in scientific research, a fundamental form of validation for data and a major motivation for open data policies (Royal Society 2012, Science International 2015). The epistemic significance attributed to reproducibility has recently become poignant as a result of the failure of studies aiming to reproduce the results of well-established, published research in the fields of molecular biology (Ioannidis et al 2012), experimental psychology (Open Science Collaboration 2015; Baker 2015) and clinical medicine (Prinz et al 2011; Begley and Ellis 2012). The findings of these "reproducibility tests" have been published and commented upon in prominent international journals within science and beyond (e.g. as the main feature in a 2013 issue of The Economist), causing widespread worry and uproar. Failure to reproduce results was taken to indicate, at best, problems in the design and methods used by the researchers in question, and at worst, a fundamental lack of credibility for the knowledge thereby obtained – a position reminiscent of Karl Popper's pronouncement that "non-reproducible single occurrences are of no significance to science" (1959, 64). In 2015, Nature published a full special issue on the "challenges of irreproducible research" (http://www.nature.com/news/reproducibility-1.17552), which included a cartoon showing the ruins of the temple of "robust science" [figure 1].

Figure 1. From Nature, September 15, 2015. Reproduced by courtesy of Nature Publishing Group.


Many commentators pointed to the lack of reproducibility as a signal that, due to increasing administrative, social and financial pressures, scientists are becoming sloppier and eager to publish as soon as feasible, without taking adequate time and effort to verify the reliability of their results. This encourages the generation of low-quality research outcomes, which should not be relied upon as sources of knowledge. This so-called "crisis of reproducibility" has also been linked to: ineffectual practices of quality control and self-correction within journals, resulting in the failure to pick up serious errors and unreliable results at the refereeing stage (Allen et al, 2016); a general increase in the complexity and scale of experiments and related statistical analyses of results, with increasing specialization and division of labor resulting in lack of clarity around who is actually responsible for quality checks (Leonelli 2017a); widespread cognitive bias among researchers, who end up reinforcing each other's preexisting beliefs and setting up research plans and validation procedures accordingly (Bem, Utts and Johnson 2011); questionable uses of statistical techniques to smooth over bias and exclude uncomfortable results (e.g. the practices of p-hacking and selective reporting; Simmons, Nelson and Simonsohn 2011, Fanelli 2012); and lack of transparency and effective critical debate around research methods and data, which makes it impossible for researchers outside any given project to scrutinize its results (Science International 2015).

Among the potential consequences of the crisis, commentators list an increasing mistrust in the credibility of scientific findings by policy-makers and citizens, the squandering of private and public investments in the search for new technologies and medical treatments, and permanent damage to the integrity and ethos of the scientific enterprise (e.g. KNAW 2018). It is hardly possible to imagine higher stakes than these for the world of science. Its future existence and social role seem to hinge on the ability of researchers and scientific institutions to respond to the crisis, thus averting a complete loss of trust in scientific expertise by civil society.

I find many of the critical reflections on the challenges presented by the current scale, organization and methods of much experimental research to be timely and compelling, and I agree that traditional models of scientific publishing and public scrutiny are no longer fit for purpose within the increasingly globalized, costly, technology-dependent and multi-disciplinary landscape of contemporary research (Leonelli 2016, 2017a/b). However, in this paper I take issue with the widespread reference to reproducibility as an overarching epistemic value for science and a good proxy measure for the quality and reliability of research results. Reproducibility comes in a variety of forms geared to different methods and goals in science. While most commentators focus on the use of reproducibility as the best strategy to achieve inter-subjectively reliable outcomes, I shall argue that (1) such convergence of results can be obtained in the absence of reproduction, and (2) a different, yet crucial function of reproducibility consists in helping researchers to identify relevant variants in


the first place. My analysis is organized as follows. Taking inspiration from some of the scientific and philosophical discussions on reproducibility, I discuss how the meaning of this term can shift dramatically depending on what it is applied to, and how. I then review how such variability plays out in practice by identifying six broad types of research associated with different interpretations of reproducibility. I consider how this variation is linked to the degree of control that researchers are able and willing to exercise on their materials and on the environment in which their investigations take place, as well as to the extent to which they rely on statistical methods in assessing the validity of the evidence being produced. In studies where control over environmental variants is only partially achieved, for instance, reproduction resulting in different outcomes is perceived as highly valuable, since it can signal hitherto unknown sources of variation or define the scope of the hypothesis being tested. By contrast, in studies that are carried out in highly idiosyncratic environmental conditions and/or on perishable and rare samples which do not lend themselves to statistical analysis, it is the very uniqueness and irreproducibility of research conditions that makes the resulting data valuable as sources of evidence. In such cases, a focus on enhancing reproducibility turns out not to be the best way to foster high-quality, robust research outcomes. Rather, it is the well-informed analysis of how reliable and meaningful data are obtained through irreproducible research practices that increases the sophistication of research methods and of the ways in which they are documented and disseminated. This often includes an emphasis on the triangulation of results obtained through a variety of methods and under diverse circumstances – thereby taking convergence among findings coming from different sources as signaling the robustness and reliability of a particular inference.1

I conclude that the uncritical pursuit of a narrow interpretation of reproducibility as an overarching epistemic value for research, endowed with the power to demarcate science from non-science, is misleading and potentially damaging to scientific advancement. Reproducibility is not only interpreted in many different ways, but also serves a variety of epistemic functions depending on the field and type of research at hand. Requirements for reproducibility, however they are interpreted, are thus only one of the available means to secure reliable research outcomes. In the final section of the paper, I highlight the implications of this argument for scientific communities and Open Science advocates.

1 While maintaining a focus on repetition as a conclusive test, Allan Franklin (1986) discusses several other strategies for enhancing trust in experimental results. Chapman and Wylie (2016) provide an excellent discussion of triangulation as a key strategy for testing the robustness of inferential reasoning in the historical sciences.
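The p-hacking and selective reporting mentioned among the causes of the crisis can be made concrete with a short simulation. The sketch below is hypothetical and not drawn from any of the studies cited: it assumes a "null world" in which treatment and control are identical, and shows how testing many post-hoc subgroups and reporting the first significant one inflates the false-positive rate far beyond the nominal 5%.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def p_two_sided(t):
    """Two-sided p-value via the normal approximation (adequate for n = 50)."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

random.seed(1)
runs, n_subgroups, n = 200, 20, 50
hacked = 0
for _ in range(runs):
    # Null world: treatment and control come from the same distribution,
    # so every "significant" subgroup finding is spurious.
    for _ in range(n_subgroups):
        treated = [random.gauss(0, 1) for _ in range(n)]
        control = [random.gauss(0, 1) for _ in range(n)]
        if p_two_sided(welch_t(treated, control)) < 0.05:
            hacked += 1  # report the first subgroup that "works", drop the rest
            break

rate = hacked / runs
print(f"studies reporting a significant effect under the null: {rate:.0%}")
```

With 20 tries at a 0.05 threshold, roughly two thirds of such null studies end up with something "significant" to report, which is the mechanism Simmons, Nelson and Simonsohn (2011) describe.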


What Is Reproducibility?

Within the natural sciences, the philosophy of science and science studies alike, there is a wide variety of interpretations of the meaning and practical implications of the term 'reproducibility' – and of associated terms such as replicability and repeatability. This should not be surprising, given the vast array of goals and concerns related to reproducibility that exist across different fields and approaches to research. Within applied domains such as pharmacology and clinical medicine, reproducibility is widely regarded as a way to assess the safety of very specific knowledge claims and related products (such as drugs), and the circumstances in which such safety can be guaranteed. If an experiment fails to yield the same results when tried on different populations, for instance, reproducibility can be used as a tool to investigate the range of validity of the inferences being tested. Reproducibility can also be invoked for other goals, however, such as: the debugging of software from errors in computer science and informatics; the exploration of how and why methods tried on different materials, or on the same materials at different times or in different places, may yield different outcomes – as is often the case in the life sciences; or the investigation of the effect of researchers' own biases, assumptions and interests on the outcomes of research – a crucial issue particularly in experimental psychology and behavioural economics.2 As noted by Hans Radder:

"many philosophers assume, implicitly or explicitly, that successful experiments are or should be reproducible. However, since "experiment" is a general term for what in fact is a rather complex process, the precise meaning of this assumption is not clear. To clarify the notion of reproducibility we need to address the following question: reproducibility of what and by whom?" (Radder 1996, 16).

This question becomes ever more vexing when considering research that does not rely on experimental techniques but rather on observational data, including both quantitative approaches such as surveys and qualitative methods such as ethnography. This landscape can be very confusing to analysts who are trying to relate field-specific discussions of reproducibility to each other, particularly since commentators tend to use the same terms in different ways – and often to discuss different problems. Rather than attempting to review in detail this myriad of terminological differences, I focus on two central concerns that all commentators seem to agree

2 Cases of reproducibility failure in economics are less widely discussed, but are no less interesting. Julie Nelson re-analysed existing studies suggesting strong evidence for women being more risk-averse than men, and showed how they cherry-picked findings, used bogus statistical methods and visualized data in ways that suggested indefensible claims (for instance, she showed that as soon as you visualize the data showing wide variation among responses in both men and women, the effect diminishes and it becomes clear that risk-aversion is not easily correlated to gender differences between individuals; Nelson 2016).
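The statistical point behind Nelson's re-analysis can be illustrated with invented numbers (the scale, gap and spread below are hypothetical, not Nelson's data): a small difference in group means combined with wide within-group variation leaves the two distributions almost entirely overlapping, so group membership predicts little about any individual comparison.

```python
import random
import statistics

random.seed(0)

# Hypothetical risk-taking scores: a small gap in group means (0.4 points)
# against wide within-group variation (standard deviation 2.0).
group_m = [random.gauss(5.4, 2.0) for _ in range(1000)]
group_w = [random.gauss(5.0, 2.0) for _ in range(1000)]

# Standardised effect size (Cohen's d, using the known sd of 2.0):
# a "small" effect by conventional benchmarks.
d = (statistics.mean(group_m) - statistics.mean(group_w)) / 2.0

# Share of random pairings in which the member of the nominally
# "less risk-averse" group actually scores lower: near a coin flip.
reversed_share = sum(w > m for m, w in zip(group_m, group_w)) / 1000

print(f"Cohen's d = {d:.2f}; reversed pairs = {reversed_share:.0%}")
```

Plotting the two full distributions, rather than the bare means, makes this overlap visible at a glance, which is the visualisation move the footnote describes.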


on.3 One is the distinction between the ability to obtain the same results through the same methods, and the ability to obtain the same results through an array of different methods. Radder refers to the former as reproducibility and to the latter as replicability, thus associating the term reproducibility with the requirement to successfully repeat the whole of a research procedure in order to ascertain the reliability of its outcomes (Radder 1996, 2012). This in turn requires a very high degree of control over sources of variability that may affect not only the outcomes, but also the very performance of an investigation (such as, for instance, whether researchers can reliably expect a certain set of reactions from a given organism or material under investigation to a given environmental stimulus).

The second key concern is what is meant by 'the same outcomes'. What commentators and researchers view as outcomes can encompass objects and processes at different levels of abstraction and produced at different stages of research. These range from direct measurements generated by experimental apparatus (what Ian Hacking calls "marks" and many others refer to as "raw data", to signal the relative independence between the measurements and subsequent interpretations; Hacking 1995, Leonelli 2016) to patterns extracted from such measurements in the form of data models, causal generalisations derived from data analysis in the form of knowledge statements, or forms of intervention such as those used to create a new chemical substance or modify the structure of a cell. Whether outcomes are considered to be data, models, interventions or knowledge statements matters enormously to the interpretation of claims around reproducibility, since the reproduction of the same outcomes can be interpreted as the requirement to obtain precisely the same dataset multiple times or, more liberally, as being able to derive the same patterns and/or generalisations from that dataset multiple times and across different circumstances (even if the data themselves are not precisely the same).

Remarkably, despite the variable interpretations of the term that are immediately visible through such basic considerations, the framing of reproducibility used in top science journals today remains narrowly linked to a particular understanding of what good research should look like. Most typically, reproducibility is associated with experimental research methods that yield numerical outcomes. Existing scholarship in the philosophy of measurement and experimentation points to at least three reasons for this: (1) experimentation aims to achieve a measure of control over sources of variability that may affect the outcomes of research, and the higher the degree of control that researchers have over materials and set-up, the higher the chance that repeating experiments will yield the same outcomes (Radder 2012); (2) quantitative outcomes are easier to compare with each other and aggregate into patterns than other types of data, thus enabling what looks like a simple and immediate comparison between research outcomes (Porter 1995); (3) numbers lend themselves to computationally and statistically driven forms of analysis and validation, which are often assumed to lend a measure of mechanical objectivity to the evaluation of research outcomes (Daston and Galison 1992, Boumans 2015). As these authors have recognised and I discuss in the next section, however, this approach to scientific research is not the only one yielding useful and reliable results, nor should it be assumed to be. The interpretation of reproducibility associated with this mode of research works better for certain methods, stages and goals of inquiry than for others, thus proving to be inadequate as an overarching criterion for what reliable, high-quality research needs to look like.

3 An excellent review of the various ways in which the term replicability is used in the life sciences, with explicit reference to the distinctive phases of research projects, is provided by Shavit and Ellison (2017, 9-14).

Reproducibility in Scientific Practice

Here I reflect on how reproducibility is interpreted in relation to methodologies employed in research, the relation between these interpretations and the degree of control that researchers have on their environment, and the extent to which they rely on statistical inference. This is not meant to be an exhaustive list, nor is it meant to apply accurately to all existing instances of the examples being mentioned. Rather, it is meant to convey a sense of the patterns and methodological standards characterizing different parts of the scientific world, and the diversity of assumptions and practices that can be associated with the ideal of reproducibility in research.

1 Computational Reproducibility

Radder's definition of reproducibility is perhaps best matched by the practices of software development and validation used in computer science and informatics, where researchers focus on finding and resolving mistakes and bugs in data analysis by running the same data through a given set of algorithms over and over again. Here it is not only possible, but actually necessary to reduce any source of variation other than in the software or statistical tools being used to a minimum, so as to spot whatever error may have infiltrated the system. As discussed in a recent volume aimed to provide an overview of reproducibility techniques, a well-established way to capture this is the idea of "computational reproducibility": "A research project is computationally reproducible if a second investigator [..] can recreate the final reported results of the project, including key quantitative findings, tables, and figures, given only a set of files and written instructions." (Kitzes 2016, 12). A paper published in Science in 2011 usefully links this interpretation of reproducibility with the use of minimal standards for publishing meta-data within computational science. As shown in their graphical illustration of what they call the "reproducibility spectrum", captured in figure 2 below, this view of reproducibility does not involve replicating the circumstances of data production, but rather being able to obtain the same outcomes when running a given dataset through the same algorithms.


Figure 2. Reproducibility spectrum according to Peng (2011), redrawn by Michel Durinx.
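Computational reproducibility in Kitzes's sense can be checked mechanically. The sketch below is a minimal, hypothetical illustration (the dataset, analysis function and use of a hash fingerprint are all assumptions of this example, not drawn from the sources cited): a second run of the same analysis on the same files must yield byte-identical reported results.

```python
import hashlib
import json
import statistics

def analysis(dataset):
    """Stand-in for a full pipeline: everything that could vary (seeds,
    ordering, environment) is fixed, so the reported results are a pure
    function of the input files."""
    return {
        "n": len(dataset),
        "mean": round(statistics.mean(dataset), 6),
        "stdev": round(statistics.stdev(dataset), 6),
    }

def fingerprint(results):
    """Stable hash of the reported results; sort_keys pins the key order."""
    blob = json.dumps(results, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

data = [4.1, 3.9, 5.2, 4.7, 4.4]          # the shared input files
first_run = fingerprint(analysis(data))    # original investigator
second_run = fingerprint(analysis(data))   # independent re-analysis

print("computationally reproducible:", first_run == second_run)
```

Note that nothing here reproduces the circumstances of data production; only the path from the given dataset to the reported outcomes is repeated, which is exactly the scope of this interpretation.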

2 Direct Experimental Reproducibility: Standardised Experiments

The circumstances of data production are, by contrast, a primary concern for experimentalists, and indeed computational reproducibility is hardly applicable to experimental research, where one is unavoidably confronted with variation. At the same time, there is significant disagreement among experimental fields concerning what is seen as a desirable and realistic degree of standardisation and control over environmental variables. In clinical trials aimed at testing the safety and efficacy of given drugs vis-à-vis human populations, the degree of control and standardization being implemented is among the most impressive and tightly scrutinized within the biomedical realm. Experiments conducted in particle accelerators in physics are similarly controlled, the focus on one centralized experimental apparatus being particularly helpful in establishing a fixed framework within which experiments can be successfully repeated. The concept of reproducibility pursued in such settings is what I shall hereafter refer to as 'direct reproducibility': the ability to obtain the same results through the repeated application of the same research methods. This is as close to computational reproducibility as it can get within the experimental sciences, though nobody – particularly in clinical science – would expect a complete and exact match between the results of different runs of the same experiment. Rather, what is expected is a strong similarity in the datasets and a match in the patterns inferred from the data (KNAW 2018). This is typically evaluated through recourse to statistical inference, thus privileging statistics as a key validating tool for reasoning from evidence.4
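The contrast with computational reproducibility can be sketched numerically. In this hypothetical example (protocol, effect size and tolerance are all invented for illustration), two runs of the same standardised experiment produce different raw datasets, and direct reproducibility is judged by whether the inferred pattern, the size and sign of the treatment effect, agrees within sampling error.

```python
import math
import random
import statistics

def run_trial(seed, n=200):
    """Hypothetical standardised trial: identical protocol and population;
    only sampling noise differs between runs (true effect fixed at +1.0)."""
    rng = random.Random(seed)
    control = [rng.gauss(10.0, 1.5) for _ in range(n)]
    treated = [rng.gauss(11.0, 1.5) for _ in range(n)]
    return statistics.mean(treated) - statistics.mean(control)

effect_1 = run_trial(seed=101)
effect_2 = run_trial(seed=202)

# The raw datasets differ, but the inferred effects should agree within
# the sampling error of their difference (here a 3-sigma tolerance).
se_diff = math.sqrt(2) * math.sqrt(2 * 1.5 ** 2 / 200)
match = abs(effect_1 - effect_2) < 3 * se_diff

print(f"run 1: {effect_1:+.2f}, run 2: {effect_2:+.2f}, patterns match: {match}")
```

This is the sense in which statistical inference, rather than exact repetition of data, does the validating work in this mode of research.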

3 Scoping, Indirect and Hypothetical Reproducibility: Semi-Standardised Experiments

Most experiments are carried out under conditions that are less tightly controlled than clinical research. Such experiments are best described as semi-standardised: that is, where methods, set-up and materials used have been construed with ingenuity in order to yield very specific outcomes, and yet some significant parts of the set-up necessarily elude the controls set up by experimenters. Typically, it is those 'non-standardised' components that yield the most valuable epistemic insights for researchers, by generating confounding results and enabling the exploration of new phenomena.5 One example is research on model organisms, where researchers have spent decades engineering the organisms to conform to specific morphological, developmental and behavioural standards, and yet the behaviour and structure of the organisms remains in part unpredictable (a veritable 'sample of nature'; Leonelli 2013) and highly susceptible to subtle shifts in environmental circumstances, ranging from nutrition to lighting and temperature (Ankeny and Leonelli 2011, Reardon 2016).6 Another example is psychological experiments on social groups selected because they conform to given physical, social and behavioural criteria, and yet present unforeseen sources of variability of potential relevance to the outcomes being generated. A third notable case is that of brain scans and other types of brain imaging in neuroscience, which are often affected by imperceptible shifts in the subject's mood and metabolism despite the tight control exercised by researchers on external stimuli (Turner and De Haan 2017).

Within this type of research, it is common to find complaints about how contextual differences between laboratory settings, research objects and other environmental circumstances compromise the extent to which researchers can aim for reproducibility. And yet, many researchers working under these conditions do not aim for direct reproducibility. Some run experimental reproductions to spot sources of variation that may prove significant when interpreting the data at hand – for instance, in the case of results obtained on model organisms, when establishing the extent to which a given outcome can be reliably imputed to organisms beyond those originally used in the study. This is an interpretation that I will call scoping reproducibility.7 Others prefer yet other interpretations. One is indirect reproducibility, which focuses on obtaining similar results from the performance of different experiments (what Radder called replicability), and constitutes a useful validation tool to see whether results produced under variable circumstances converge or not. Another is what Felipe Romero calls hypothetical reproducibility (or 'conceptual replication' in psychologists' jargon; Romero 2017): the attempt to obtain outcomes that match those predicted as implications of previous findings, thereby confirming the reliability of the previous findings. Both these interpretations of reproducibility lean on the idea that convergence across multiple lines of evidence, even when they are produced in different ways, is a mark of reliable research (a 'tangle of support' in Chapman and Wylie's terms, 2016).

4 This is what Julian Reiss called the "experimental paradigm" (Reiss 2015).
5 Mary Morgan (2012a, 296) introduced the idea of confoundment as resulting from an experimental outcome challenging researchers' existing knowledge of the world, and thus leading to the discovery of new phenomena. This is contrasted to the surprise triggered by model experiments, where discovery concerns the world of the model, rather than the world itself.
6 A recent Nature article complained of the difficulties in controlling for environmental variability within studies on mice: "Mice are sensitive to minor changes in food, bedding and light exposure. It's no secret that therapies that look promising in mice rarely work in people. But too often, experimental treatments that succeed in one mouse population do not even work in other mice, suggesting that many rodent studies may be flawed from the start. Researchers rarely report on subtle environmental factors such as their mice's food, bedding or exposure to light; as a result, conditions vary widely across labs despite an enormous body of research showing that these factors can significantly affect the animals' biology. 'It's sort of surprising how many people are surprised by the extent of the variation' between mice that receive different care, says Cory Brayton, a pathologist at Johns Hopkins University in Baltimore, Maryland." (Reardon 2016: 264)
7 I thank Mary Morgan, Hans Radder and Stephan Güttinger for discussions on this point.
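Scoping reproducibility, using differences between reproductions to identify relevant variation, can be given a simple numerical form. The sketch below is entirely hypothetical (the labs, the diet factor and all numbers are invented): three labs repeat the "same" model-organism measurement, and the spread between labs, compared with the noise within each lab, flags an uncontrolled source of variation worth investigating rather than a mere "failure to replicate".

```python
import random
import statistics

random.seed(7)

# Hypothetical: an uncontrolled factor (diet, say) shifts each lab's baseline.
lab_shift = {"lab_A": 0.0, "lab_B": 1.5, "lab_C": -1.0}
results = {
    lab: [random.gauss(10.0 + shift, 0.3) for _ in range(30)]
    for lab, shift in lab_shift.items()
}

# Scoping use of reproduction: compare the spread *between* lab means
# with the measurement noise *within* each lab.
within_sd = statistics.mean(statistics.stdev(v) for v in results.values())
between_sd = statistics.stdev(statistics.mean(v) for v in results.values())

print(f"within-lab sd = {within_sd:.2f}, between-lab sd = {between_sd:.2f}")
print("uncontrolled source of variation suspected:", between_sd > 2 * within_sd)
```

The informative outcome here is precisely the disagreement between runs: it delimits the scope of the original finding and points to a variant (diet, in this invented case) that the next experimental design can control for.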

4 Reproducible Expertise: Non-Standard Experiments and Research on Rare Materials

There are also experiments where control over environmental variability is extremely limited, and standardisation very low. These are cases where experimenters are studying new objects or phenomena (new organisms for instance) and/or employing newly devised, unique instruments that are precisely tailored to the inquiry at hand.8 Researchers then focus less on controls and more on developing robust ways of evaluating the effects of their interventions and the relation between those effects and the experimental circumstances at the time in which data were collected. Direct or indirect reproducibility are not helpful concepts within this type of research. Notably, however, the idea of reproducibility does not completely disappear, with researchers emphasising the reproducibility of the expertise – the specific skills and interpretive abilities – underpinning the conduct of research over the reproducibility of the outcomes. I will refer to this interpretation of reproducibility as reproducible expertise, and define it as the expectation that any skilled experimenter working with the same methods and the same type of materials at that particular time and place would produce similar results.

Appeals to reproducible expertise are also characteristic of research on materials that are rare, unique, perishable and/or inaccessible, such as depletable samples stored in biobanks; unique specimens, such as specific botanical finds or archaeological remains; or materials that are hard or expensive to access, such as very costly strains of transgenic mice. These materials are not amenable to repeated investigation as required by the direct and indirect forms of reproducibility. This does not constitute an obstacle to using such materials for research, since the uniqueness and irreproducibility of the materials is arguably what makes the resulting data particularly useful as evidence. The onus of reproducibility shifts instead to the credibility and skills of the investigators entrusted with handling these materials. Apposite methodologies have been developed to cope with the impossibility of directly replicating the findings, including vetted access, cross-sample research and the centralisation of research in locations where several researchers can work together to check each other's work and ensure its reliability for those with no access to the same material sources.

8 Ed Ramsden, Rachel Ankeny, Nicole Nelson and I discussed models of organisms that include environmental features and are uniquely tailored to the specific goals of a given inquiry as 'situated models' (Ankeny et al 2014).


5 Reproducible Observation: Non-Experimental Case Description

A tremendous amount of research in the medical, historical and social sciences does not rest on experimentation, but rather on observational techniques such as surveys, descriptions and case reports documenting unique circumstances (Morgan 2012b). Such research is, again, not replicable in the sense of direct, indirect or even hypothetical reproducibility (Boumans 2015) and yet, as in the case of non-standard experiments, there is an emphasis on reproducibility of observation – the expectation being that any skilled researcher placed in the same time and place would pick out, if not the same data, at least similar patterns. In other words, one can learn to observe in very specific, reproducible ways. Examples are the practices of comparative multi-sited ethnography, where researchers are trained to observe similar phenomena across very different circumstances; structured interviewing, where researchers devise a relatively rigid framing for their interactions with informants; and diagnosis based on radiographs, resonance scans and other medical imaging techniques, where skilled observation by expert physicians is crucial to extracting meaningful and reliable information.

6 Irreproducible Research: Participant Observation

At the other end of the spectrum from computational reproducibility there is research where the idea of reproducibility has been rejected in favor of an embrace of the subjectivity and unavoidable context-dependence of research outcomes. Researchers working with highly idiosyncratic, situated findings are well aware that they cannot rely on reproducibility as an epistemic criterion for data quality and validity. They therefore devote considerable care to documenting data production processes and strategizing about data preservation and dissemination. In other words, they prioritize sophisticated strategies for enhancing the accountability of their methods and data management strategies, as well as the long-term preservation of the instruments, techniques and materials through which results were generated. The very fact that different observers have different viewpoints and produce different data and interpretations is here used as a starting point for assessing and validating research results. Ethnographic work in anthropology, for instance, has developed methods to account for the fact that data are likely to change depending on time, place, subjects as well as researchers' moods, experiences and interests. Key among such methods is the principle of reflexivity, which requires researchers to give as comprehensive a view of their personal circumstances during research as possible, so as to enable readers to evaluate how the focus of attention, emotional state and existing commitments of the researchers at the time of the investigation may have affected their results. Much can be learnt within the quantitative sciences from this approach, which explicitly and publicly discusses the methodological commitments and processual nature of the research (including the ways in which investigators


change their work in response to shifting circumstances, problems and mistakes in everyday practice), and makes a virtue out of the unavoidable variation among studies carried out at different times, by different groups and in different places.

Table 1. Synoptic view of types of research design/methods and related understanding of reproducibility discussed in section 3.

| Type of research | Example | Degree of control on environment | Reliance on statistics as inferential tool | Reproducible in which sense? |
|---|---|---|---|---|
| Software development | Computer engineering, informatics | Total | High | Computational R: obtain same results from the same data. |
| Standardised experiments | Clinical trials, environmental safety controls | Very high | High | Direct R: obtain same results from different runs of the same experiment. |
| Semi-standardised experiments | Behavioural economics, experimental psychology, research on model organisms | Limited | Variable | Scoping R: use differences in results to identify relevant variation. Indirect R: obtain same results from different experiments. Hypothetical R: corroborate results implied by previous findings. |
| Non-standard experiments & research based on rare, unique, perishable, inaccessible materials | Research on experimental organisms, archeology, paleontology, history | Low | Low | Reproducible Expertise: any skilled experimenter working with same methods and materials would produce similar results. |
| Non-experimental case description | Case reports in medicine, (types of) multi-sited ethnography | None | Low | Reproducible Observation: any skilled observer would pick out similar patterns. |
| Participant observation | Ethology, participant observation in anthropology | None | None | Irreproducible Observation: different observers are assumed to have different viewpoints and produce different data and interpretations. |

Beyond the Ideal of Direct Reproducibility

The idea that reproducibility, interpreted univocally as the reproduction of the same outcomes by the same methods, cannot work as an overarching criterion for evaluating the quality and reliability of research is by no means new. A well-known critic of this view is Harry Collins, whose seminal book Changing Order argued that the vast majority of experiments are never replicated, and even in the few cases where replication is attempted, it is not always successful (Collins 1985). It is nevertheless interesting to note that while critical of the use of reproducibility as a descriptive and evaluative tool, Collins is sympathetic to the use of reproducibility as a regulatory ideal for science: "even though replication is beset by problems, one must stick to it as the fundamental way of showing that scientific results are secure. Science is a matter of having the right aspirations even if they cannot be fulfilled" (Collins 2017).

I disagree with this view. Seeing reproducibility as a key aspiration for science brings us back to the Popperian view that reproducibility, at least heuristically, demarcates good from bad science. There are good reasons to resist the temptation to impute such decisive epistemic power to reproducibility. As I discussed, this ideal is applied very differently depending on research fields, materials and goals. Nevertheless, reproducibility as currently discussed in science and science policy is often linked to the expectation that researchers can and should be able to exercise a high level of control over the circumstances, environment and materials employed in a study. Insisting on this interpretation of reproducibility, particularly within funding and assessment structures, can push researchers to place less emphasis on carefully reporting the more idiosyncratic aspects of their research, and instead focus on producing general protocols that do not linger on the specific characteristics of their local situation. It can also be interpreted as incentivising researchers to focus less attention on the variation characterizing their results, and the extent to which such variation can affect the reliability and scope of their conclusions.

This situation carries significant epistemic risks. As reported by Nature, "researchers rarely report on subtle environmental factors such as their mice's food, bedding or exposure to light; as a result, conditions vary widely across labs despite an enormous body of research showing that these factors can significantly affect the animals' biology" (Reardon 2016). A contrasting and more productive attitude is to repeat experiments not as a way to reproduce results, but rather as a way to uncover sources of variation and explore their significance (what I already discussed under the label of 'scoping reproducibility'). Generally, the emphasis on a narrow interpretation of reproducibility is linked with a devaluing of the role of expertise and embodied knowledge in data production, processing and assessment. These are instead highly valued and well accounted for in qualitative research traditions that focus on the critical evaluation of data in light of the situatedness of human perspective and the local


nature of research methods and materials. This results in much useful work on the meta-data that should accompany and guide the analysis of any given dataset (e.g. Zahle 2018), and the insights that can be gained when obtaining diverse outcomes from the same research process. It also fosters research on triangulation techniques aiming to compare and integrate results coming from different traditions, locations, sources and methodologies, which in turn facilitates testing whether any given inference is robust in the face of different lines of evidence.

Given these issues, it is important to ask why direct reproducibility proves so attractive as an ideal to which research should aspire. An insightful explanation for the epistemic power attributed to this approach is provided by another critic of this view, John Norton. His skepticism is grounded partly in the observation that "a failure of replication may not impugn a credible experimental result; and a successful replication can fail to vindicate an incredible experimental result"; and partly in the view that whether or not an experiment may be viewed as successfully replicated is determined by whichever background facts researchers appeal to when evaluating the experiment and its results. As Norton notes, "commonly, these background facts do support successful replication as a good evidential guide and this has fostered the illusion of a deeper, exceptionless principle" (Norton 2015). In other words, researchers that pursue direct reproducibility have been so successful at engineering cohesive experimental systems that highly controlled experiments have come to exemplify the very best of research practices, in ways that do no justice to other research methods, and particularly to qualitative traditions.

Conclusion: The Epistemic Value of Irreproducible Research

We have seen how direct reproducibility works best in research environments characterized by a high standardization of methods and materials, a high degree of control over environmental variability, and reliable and relevant methods of statistical inference. It should not be surprising that research that strays from these conditions – such as exploratory, non-standard research carried out on unique samples and under highly variable environmental conditions – has trouble conforming to this interpretation of reproducibility, and I have suggested here that such conformity is neither fruitful nor desirable. In non-standard types of inquiry, researchers typically recognize that direct reproducibility cannot function as an epistemic criterion for research quality, and instead devote care and critical thinking to documenting data production processes, examining the variation among their materials and environmental conditions, and strategizing about data preservation and dissemination. Within qualitative research traditions, explicitly side-stepping the ideal of (direct) reproducibility has helped researchers to improve the reliability and accountability of their research practices and data. This is rarely acknowledged in scientific debates, but there are some interesting


exceptions, particularly as the reproducibility debate is reaching maturity and the initial polemical tones are being replaced by more sophisticated reasoning. A recent example is a piece written by the editor of the Science journals in January 2018, where he defends the steps taken within the journals to improve reproducibility, but also notes that

"Another approach to assess reproducibility involved an experimental program that attempted to replicate selected findings in cancer biology by groups not involved with the original studies (see https://elifesciences.org/collections/9b1e83d1/reproducibility-project-cancer-biology). Although some findings were largely reproduced, in at least one case (which was published in Science), the key finding was not. Yet, the initial results have been utilized and extended in published studies from several other laboratories. This case reinforces the notion that reproducibility, certainly in cancer biology, is quite nuanced, and considerable care must be taken in evaluating both initial reports and reported attempts at extension and replication. Clear description of experimental details is essential to facilitate these efforts. The increased use of preprint servers such as bioRxiv by the biological and biomedical communities may play a role in facilitating communication of successful and unsuccessful replication results." (Berg 2018)

This is an important step towards acknowledging that the circumstances of reproducibility differ across research fields. It falls short of an acknowledgement that reproducibility itself can come in different guises, and sometimes be instrumental to a different epistemic purpose or altogether irrelevant. Here again, the editorial equates science with experimentation, thereby excluding non-experimental sciences from consideration.

What is a credible alternative to the current tendency to rely on narrow forms of reproducibility as a fail-proof criterion for what should be trusted as good science? One option is to recognise the importance of reproducibility as a strategy to interrogate variation across results, rather than to validate the outcomes of research. Another option is to focus on the idea of convergence or triangulation of results, as discussed above. Perhaps the strongest advocate of that view in philosophy has been Hacking, when arguing that scientific results can be trusted when the elements of laboratory science are brought into mutual consistency and support, match each other and become "mutually self-vindicating" (Hacking 1992, 56).9 A third alternative is to tailor reproducibility requirements to the circumstances and goals of any specific project or area of research, while at the same time learning from situations where insisting on reproducible results is neither realistic nor significant to assessing the quality of a given study. Researchers who cannot rely on reproducibility as an epistemic

9 The elements that Hacking singles out include "(1) ideas: questions, background knowledge, systematic theory, topical hypotheses, and modeling of the apparatus; (2) things: target, source of modification, detectors, tools, and data generators; and (3) marks and the manipulation of marks: data, data assessment, data reduction, data analysis, and interpretation" (Hacking 1992, 56) – thus potentially embracing research beyond the experimental sciences.


criterion for quality and validity tend to devote more care and critical thinking to accounting for research methods and circumstances – for instance by documenting data production processes (including the mood and circumstances of individual researchers) and strategizing about the preservation and dissemination of related instruments, protocols and materials.

Indeed, many scientific communities and Open Science advocates focus their efforts on fostering public discussion of researchers' methodological commitments and everyday practices. In consultation with learned societies and experts in qualitative research, funding bodies and research organisations should mirror these efforts by providing researchers with incentives and guidelines to explicitly discuss not only their methods, but also the ways in which they learn from unexpected and incongruent findings. Furthermore, overt discussions should include strategies to preserve research components and materials in the longer term, particularly in situations where hard choices need to be made about which instrument, software, material and meta-data it is actually possible (and realistic) to store, and how. A frank discussion of the extent to which everyday research practice deviates from idealised reproducibility standards is not only possible, but crucial to the exercise of critical scrutiny and constant questioning of established knowledge that Popper himself viewed as markers of good science. The goal of such open exchange should not be to punish research approaches where computational or direct reproducibility is not possible or relevant, but rather to acknowledge the strengths and weaknesses of different ways of validating results, and learn as much as possible from the methodological precepts that guide different parts of science.

Acknowledgments

This paper – like much of my work over the last ten years – owes much to Mary Morgan's generous advice, and her ability to identify and probe the core and significance of arguments and questions. I also gratefully acknowledge insightful comments by Hans Radder, Stephan Güttinger, Niccolo Tempini, two anonymous referees, and participants in the conference "Curiosity, Imagination and Surprise" (Utrecht, September 2017), particularly the host Marcel Boumans. This research was funded by the European Research Council grant award 335925 ("The Epistemology of Data-Intensive Science"), the Australian Research Council Discovery Project "Organisms and Us" and the UK Economic and Social Research Council award ES/P011489/1.

References

Allison, D.B. et al (2016). A tragedy of errors. Nature, 530, 27–30.


Ankeny, R.A., Leonelli, S., Nelson, N. and Ramsden, E. (2014) Making Organisms Model Humans: Situated Models in Alcohol Research. Science in Context 27(3): 485-509.

Baker, L. (2015) Over Half of Psychology Studies Fail Reproducibility Test. Nature. doi:10.1038/nature.2015.18248

Begley, C.G., & Ellis, L.M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531–533. http://doi.org/10.1038/483531a

Bem, D.J., Utts, J., Johnson, W.O. (2011) Must Psychologists Change the Way They Analyze Their Data? Journal of Personality and Social Psychology 101(4): 716–719.

Berg, J. (2018) Progress on reproducibility. Science 359(6371): 9. DOI: 10.1126/science.aar8654

Boumans, M. (2015) Science outside the Lab. Oxford University Press.

Collins, H. (2017) in Justin Kitzes, Daniel Turek, Fatma Deniz (Eds.) The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. University of California Press.

Collins, H. (1985) Changing Order: Replication and Induction in Scientific Practice. London: Sage Publications.

Daston, L. and Galison, P. (1992) "The Image of Objectivity." Representations 40: 81-128.

Earp, B.D. and Trafimow, D. (2015) Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology 6(621): 1-11. doi:10.3389/fpsyg.2015.00621

Fanelli, D. (2012) "Negative Results are Disappearing from Most Disciplines and Countries." Scientometrics 90(3): 891–904.

Franklin, A. (1986) The Neglect of Experiment. Cambridge, UK: Cambridge University Press.

Hacking, I. (1983) Representing and Intervening. Cambridge, UK: Cambridge University Press.

Hacking, I. (1995) Introduction, in: J.Z. Buchwald (ed.) Scientific Practice: Theories and Stories of Doing Physics. Chicago: University of Chicago Press, 1-9.

Ioannidis et al (2010) Nature Genetics 41, on repeatability of published microarray gene expression analyses – did not detail software used.


Kitzes, J. (2017) in Justin Kitzes, Daniel Turek, Fatma Deniz (Eds.) The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. University of California Press.

KNAW (2018) Replication studies – Improving reproducibility in the empirical sciences. Amsterdam: KNAW.

Leonelli, S. (2016) Data-Centric Biology: A Philosophical Study. Chicago, IL: Chicago University Press.

Leonelli, S. (2017a) Global Data Quality Assessment and the Situated Nature of "Best" Research Practices in Biology. Data Science Journal 16(32): 1-11. DOI: https://doi.org/10.5334/dsj-2017-030

Leonelli, S. (2017b) Incentives and Rewards to Engage in Open Science Activities. Report for the Mutual Learning Exercise of the European Commission: Open Science – Altmetrics and Rewards. https://rio.jrc.ec.europa.eu/en/library/mutual-learning-exercise-open-science-%E2%80%93-altmetrics-and-rewards-incentives-and-rewards-engage

Leonelli, S. (2013) 'Model Organism'. In: Dubitzky, W., Wolkenhauer, O., Cho, K-H., Yokota, H. (Eds.) Encyclopaedia of Systems Biology. Springer.

Morgan, M.S. (2012a) The World in the Model: How Economists Work and Think. Cambridge, UK: Cambridge University Press.

Morgan, M.S. (2012b) Case Studies: One Observation or Many? Justification or Discovery? Philosophy of Science 79(5): 667-77.

Nelson, J. (2016) Not-So-Strong Evidence for Gender Differences in Risk Taking. Feminist Economics 22(2): 114-142.

Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349(6251): 1-8.

Peng, R.D. (2011) Reproducible Research in Computational Science. Science 334: 1226–1227.

Popper, K. (1959 [2002]) The Logic of Scientific Discovery. London: Routledge.

Porter, T. M. (1995) Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton, NJ: Princeton University Press.

Prinz, F., Schlange, T., & Asadullah, K. (2011) Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery 10(9): 712.

Pulverer, B. (2015) Reproducibility Blues. EMBO Journal. http://emboj.embopress.org/content/34/22/2721.full


Radder, H. (2012 [1984/1988]) The Material Realization of Science: From Habermas to Experimentation and Referential Realism, revised edition, with a new postscript. Dordrecht: Springer.

Radder, H. (1996) In and About the World: Philosophical Studies of Science and Technology. State University of New York Press.

Reardon, S. (2016) A mouse's house may ruin experiments: Environmental factors lie behind many irreproducible rodent experiments. Nature 530: 264.

Reiss, J. (2016) Causation, Evidence and Inference. London: Routledge.

Romero, F. (2017) Novelty vs Replicability: Virtues and Vices in the Reward System of Science. Philosophy of Science.

Royal Society (2012) Science as an Open Enterprise. Accessed January 14, 2018. http://royalsociety.org/policy/projects/science-public-enterprise/report/

Science International (2015) Open data in a big data world. Last accessed 19 January 2018. https://www.icsu.org/publications/open-data-in-a-big-data-world

Schmidt, S. (2009) Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology 13(2): 90-100.

Shavit, A. and Ellison, A.M. (2017) Towards a Taxonomy of Scientific Replication. In Shavit, A. and Ellison, A.M. (eds) Stepping into the Same River Twice: Replication in Biological Research. Yale University Press.

Simmons, J.P., Nelson, L.D. and Simonsohn, U. (2011) False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22(11): 1359–1366.

The Economist (2013) Trouble at the Lab. Last accessed 18 January 2018. http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble

Turner, R. and De Haan, D. (2017) Bridging the Gap between System and Cell: The Role of Ultra-High Field MRI in Human Neuroscience. In Progress in Brain Research, edited by Tara Mahfoud, Sam McLean, and Nikolas Rose, 233: 179–220. Vital Models. Elsevier. http://www.sciencedirect.com/science/article/pii/S0079612317300493

Zahle, J. (2018) Values and Data Collection in Social Research. Philosophy of Science 85(1): 144-163.