Use and Misuse of the Term Experiment in MSR Research Natalia Juristo University of Oulu & Technical University of Madrid PROMISE September 7 th 2016

Page 1: PROMISE keynote Juristo

Use and Misuse of the Term Experiment in MSR Research

Natalia Juristo

University of Oulu & Technical University of Madrid

PROMISE September 7th 2016

Page 2: PROMISE keynote Juristo

Motivation

- Today empiricism is everywhere in SE
  - This does not mean SE is empirically mature
  - Conducting empirical studies does not imply they are carried out and understood properly
- I focus here on a methodological issue in MSR research
  - The use of experiments in MSR

Page 3: PROMISE keynote Juristo

Motivation

- For several years I have been struggling to match MSR research with the more traditional SE empirical research (conducted over the last 35 years)
- Very often I was shocked to hear the term experiment applied (in MSR works) to empirical studies that I do not consider as such
- Today I discuss research we are conducting to clarify this issue

Page 4: PROMISE keynote Juristo

Collaboration

- This research has been conducted in collaboration with
  - Claudia Ayala
  - Xavier Franch
  - Burak Turhan

Page 5: PROMISE keynote Juristo

Evidence of Misuse

Page 6: PROMISE keynote Juristo

Small-scale Literature Review

- We conducted a literature review to double-check the use of the term experiment in MSR works
- 2015 MSR, ESEM and EMSE
  - MSR: 42 papers reviewed
  - ESEM: 36 papers
  - EMSE: 55 papers

Page 7: PROMISE keynote Juristo

Findings

Venue (2015) | Use of term experiment | MSR vs. traditional experiment | MSR use vs. misuse
ESEM | 30.5% (11 out of 36) | 72.72% MSR works (8 papers); 27.28% traditional experiments (3 papers) | Wrong use: 12.5%; Proper use: 87.5%
MSR | 42.8% (18 out of 42) | 100% MSR works (18 papers); 0% traditional experiments | Wrong use: 44.45%; Proper use: 55.55%
EMSE | 52.72% (29 out of 55) | 65.51% MSR works (19 papers); 34.48% traditional experiments (10 papers) | Wrong use: 52.63%; Proper use: 47.36%
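The shares in the findings can be recomputed from the paper counts reported on the slide. The sketch below is only an arithmetic check in Python; the dictionary layout and the helper name `pct` are illustrative assumptions, not part of the talk. Rounding to two decimals gives values that differ from the slide in the last digit in a few cells, which suggests the slide truncated rather than rounded.

```python
# Counts as reported on the slide, per venue:
# (papers using the term "experiment", papers reviewed,
#  MSR works among them, traditional experiments among them)
counts = {
    "ESEM": (11, 36, 8, 3),
    "MSR": (18, 42, 18, 0),
    "EMSE": (29, 55, 19, 10),
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


shares = {}
for venue, (using, reviewed, msr_works, traditional) in counts.items():
    shares[venue] = {
        "uses_term": pct(using, reviewed),         # share of reviewed papers using the term
        "msr_works": pct(msr_works, using),        # share of those that are MSR works
        "traditional": pct(traditional, using),    # share that are traditional experiments
    }
    print(venue, shares[venue])
```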

... Let me elaborate why the term is misused

Page 8: PROMISE keynote Juristo

What is an experiment

Page 9: PROMISE keynote Juristo

Experiment Definition

- Empirical procedure where key variables of a reality are manipulated to investigate the impact of such variations

Page 10: PROMISE keynote Juristo

What Makes an Experiment

- Manipulation of variables under study
  - Treatments must be assigned to experimental units
- Controlling potential confounding variables impacting results
  - Confounding is eliminated through random assignment of treatments to units

Page 11: PROMISE keynote Juristo

What Makes an Experiment: Intervention

- Experimentation
  - There is a purposeful intervention by researchers
  - Researchers allocate treatments to units
  - Experimental groups (exposure and unexposure) are determined by the researcher
- Observation
  - Researchers have a passive role and do not interfere with reality
  - Data are generated directly from reality and analyzed afterwards
  - Exposure status is not determined by the researcher

Page 12: PROMISE keynote Juristo

What Makes an Experiment: Randomization

- Experiments limit the potential for any confounding factors (biases) by randomly assigning one participant pool to a treatment and another participant pool to control or another treatment
- Random allocation of treatments to subjects minimizes the chance that the incidence of confounding (particularly unknown confounding) variables will differ between the two groups
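Random allocation as described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a procedure from the talk: the function name `randomly_assign`, the participant identifiers and the fixed seed are all assumptions made for the example.

```python
import random


def randomly_assign(units, seed=None):
    """Split units into a treatment and a control group by random allocation."""
    rng = random.Random(seed)      # seeded generator so the allocation is reproducible
    shuffled = list(units)         # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant identifiers, for illustration only.
participants = [f"P{i}" for i in range(1, 21)]
groups = randomly_assign(participants, seed=42)
print(groups)
```

The point of the design choice is that the researcher, through chance, decides who receives which treatment; nothing about a participant influences the group they end up in.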

Page 13: PROMISE keynote Juristo

What Makes an Experiment: Intervention + Randomization

- Intervention guarantees causality
  - Inspiring example
- In a quasi-experiment the allocation of treatment is not possible
  - Although run under controlled conditions
  - The case of psychology experiments
    - Personality traits

Page 14: PROMISE keynote Juristo

What Does Not Make an Experiment

- Randomization
- Comparison
- Analysis techniques

Page 15: PROMISE keynote Juristo

What Does Not Make an Experiment: Randomization

- Randomization is a strategy aiming to reduce confounding variables (bias)
  - It is mandatory in controlled experiments
  - It can be applied to other types of empirical studies
- Inspiring example
  - Randomization in surveys

Page 16: PROMISE keynote Juristo

What Does Not Make an Experiment: Comparison

- Comparing the impact of the values of a variable does not mean we will be able to reveal causality
- Comparing, in a set of data, units with different values of a variable neither makes the study an experiment nor allows differences to be traced back to treatments

Page 17: PROMISE keynote Juristo

What Does Not Make an Experiment: Analysis

- Analysis techniques do not differentiate experiments from other empirical studies
  - What allows causality to be revealed is not the type of analysis technique; it is the design of the study
- Applying to a set of data an analysis technique typically used in experiments neither makes the study an experiment nor detects causality

Page 18: PROMISE keynote Juristo

What Does Not Make an Experiment

- An MSR study
  - Applying ANOVA does not mean it is an experiment
  - Comparing pools of data differing in a variable's value does not imply it is an experiment
  - Even if MSR studies were randomized, they would not be experiments
- Design guarantees
  - The drop of bias and confounding variables
  - The differences observed in behavior are caused by treatments

Page 19: PROMISE keynote Juristo

Impact of Randomization and Design

Page 20: PROMISE keynote Juristo

Types of Experiments

- Without intervention
  - Natural environment
    - Natural experiments
- Intervention
  - Where?
    - Artificial controlled environment
      - Laboratory controlled experiments
    - Natural environment
      - Field experiments

Page 21: PROMISE keynote Juristo

Laboratory experiments
- Purposeful intervention
- Randomized allocation of treatments
- Artificial, highly controlled environment

Field experiments
- Purposeful intervention
- Randomized allocation of treatments
- Natural uncontrolled environment

Page 22: PROMISE keynote Juristo

Natural experiments
- No intervention
- In a natural uncontrolled environment

Page 23: PROMISE keynote Juristo

Mining Software Repositories

- MSR research
  - Outcomes (such as quality and productivity) are studied in large samples of past data to
    - Apply statistical methods to test hypotheses
    - Build machine learning and mining methods on past data into tools to support programming tasks
- The data stored in a repository have been obtained from reality (without intervention)
  - Therefore MSR works are observational studies
  - We could call them natural experiments but that term is misleading

Page 24: PROMISE keynote Juristo

MSR and Epidemiology

Page 25: PROMISE keynote Juristo

Empirical Studies in Medicine

[Figure: stages of empirical studies in medicine. Recoverable labels: Method Development; Laboratory Research or Pre-clinical; Non-Human Experiments; Field Research; Ill People; Ill & Healthy People; From 20-100 volunteers to 1-2M patients; Descriptive; Analytic; Retrospective; Prospective; Descriptive]

Page 26: PROMISE keynote Juristo

Empirical Studies in Medicine

- Analytical
  - Experimental
    - Clinical Trial
    - Field Trial
    - Group Trial
  - Observational
    - Cohort Studies (Prospective study; Follow-up study; Concurrent study; Incidence study; Longitudinal study)
    - Historical Cohort Studies
    - Case-Control Studies (Retrospective study; Case comparison study; Case history study; Case compeer study; Case referent study; Trohoc study)
- Descriptive
  - Individuals
    - Cross-Sectional Studies (Prevalence study; Disease frequency study; Morbidity survey; Health survey)
    - Case series
    - Single case
  - Population
    - Ecological Studies

Page 27: PROMISE keynote Juristo

(Prospective) Cohort Study

- Data are collected at regular intervals, over a period of time, from a group of people who do not have the disease, to see who develops it (new incidence)
- Cohort
  - Group of people who share a common characteristic within a defined period
  - e.g., are born, are exposed to a drug or vaccine or pollutant, or undergo a certain medical procedure
- Comparison group
  - The general population from which the cohort is drawn
  - Another cohort of persons thought to have had little or no exposure to the substance under investigation, but otherwise similar
  - SE: Projects/commits that have not applied the method under study
- Example
  - Does exposure to X (smoking) associate with outcome Y (lung cancer)?
  - Such a study would recruit a group of smokers and a group of non-smokers (the unexposed group), follow them for a set period of time, and note differences in the incidence of lung cancer between the groups at the end of this time
  - SE: A passive follow-up of projects/commits, collecting data at regular intervals and noting the quality/productivity they achieve

Page 28: PROMISE keynote Juristo

Retrospective Studies

- The researcher collects data from past records and does not follow patients up as in prospective studies
- All the events (exposure, latent period, and subsequent outcome, i.e. development of disease) have already occurred in the past
- Errors due to confounding and bias are more common in retrospective studies than in prospective studies

Page 29: PROMISE keynote Juristo

Retrospective Studies: Threats to Validity

- Some key data have not been measured
- Biases may affect the selection of controls
  - Selection bias
    - Only select patients with the necessary information
  - Misclassification or information bias as a result of the retrospective aspect
- Researchers cannot control exposure or outcome assessment but instead need to rely on others for accurate record keeping
  - It can be very difficult to make accurate comparisons between the exposed and the non-exposed

Page 30: PROMISE keynote Juristo

Retrospective Cohort Study

- Records of groups of individuals who are alike in many ways but differ by a certain characteristic are compared for a particular output
  - For example, female nurses who smoke and those who do not smoke
  - SE: Use of past data in a repository to compare a certain output of projects with characteristic A and without A
- The researcher collects data from past records and does not follow patients up as is the case with a prospective study

Page 31: PROMISE keynote Juristo

(Retrospective) Case-Control Study

- Records of individuals are divided into two groups differing in outcome (disease or not) and compared on the basis of some supposed causal attribute
  - Case-control studies select subjects based on their disease status (the effect)
  - Cohort studies select subjects based on their exposure status (the cause)
- SE: Select projects/commits with a certain level (i.e. quality value) and trace back certain project characteristics that are believed to contribute to quality
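The difference between the two selection strategies can be made concrete with a small Python sketch over repository-style records. Everything here is hypothetical: the `Project` record, the exposure `uses_code_review`, the outcome `high_quality` and the six records are invented for illustration and do not come from any study discussed in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    uses_code_review: bool   # hypothetical exposure (characteristic A)
    high_quality: bool       # hypothetical outcome


# Made-up records for illustration only; not data from a real repository.
projects = [
    Project("p1", True, True),
    Project("p2", True, True),
    Project("p3", True, False),
    Project("p4", False, True),
    Project("p5", False, False),
    Project("p6", False, False),
]


def rate(items, predicate):
    """Fraction of items for which predicate holds."""
    items = list(items)
    return sum(1 for item in items if predicate(item)) / len(items)


# Retrospective cohort: select by exposure (the cause), compare outcome rates.
exposed = [p for p in projects if p.uses_code_review]
unexposed = [p for p in projects if not p.uses_code_review]
cohort = (rate(exposed, lambda p: p.high_quality),
          rate(unexposed, lambda p: p.high_quality))

# Case-control: select by outcome (the effect), compare exposure rates.
cases = [p for p in projects if p.high_quality]
controls = [p for p in projects if not p.high_quality]
case_control = (rate(cases, lambda p: p.uses_code_review),
                rate(controls, lambda p: p.uses_code_review))

print("cohort outcome rates (exposed, unexposed):", cohort)
print("case-control exposure rates (cases, controls):", case_control)
```

In both variants the researcher only reads past records; nobody assigned code review to projects, so either comparison shows association rather than causation.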

Page 32: PROMISE keynote Juristo

Ecological Studies

- Units of analysis are populations
  - Comparison of groups rather than individuals
- Explores correlations between group-level exposure and outcomes

Page 33: PROMISE keynote Juristo

Hierarchies of Evidence

Page 34: PROMISE keynote Juristo

Hierarchy of Evidence

- It is critical to understand which empirical study you are conducting
  - To fully understand what the results are telling us
  - The type of results depends on the type of study!!!
- Evidence hierarchies reflect the relative authority of various types of empirical studies

Page 35: PROMISE keynote Juristo
Page 36: PROMISE keynote Juristo
Page 37: PROMISE keynote Juristo

Authority of Evidence

[Figure: hierarchy of study types. Recoverable labels, in extraction order: Field Experiments; Observational Analytical (Prospective; Retrospective); Observational Descriptive; Laboratory Experiments]

Page 38: PROMISE keynote Juristo

Psychology Hierarchy of Evidence

Page 39: PROMISE keynote Juristo

Two MSR examples

Page 40: PROMISE keynote Juristo

Example 1

- MSR'15
  - The Uniqueness of Changes: Characteristics and Applications
  - Ray, Nagappan, Bird, Nagappan, Zimmermann
- Why this paper
  - A very well written paper
  - Several empirical studies of different types about the same issue
  - Prominent MSR authors

Page 41: PROMISE keynote Juristo

Empirical Studies (Authors' terms)

- Topic
  - Some changes are unique while others are not
  - They propose a way to identify uniqueness of changes
- Empirical studies (in authors' terms)
  - Analysis of unique and non-unique changes properties
    - What is the extent of unique changes; Who introduces unique changes; Where do unique changes take place
  - Applications
    - Experiment for Risk Analysis
      - Check whether U file commits have a higher defect rate than NU file commits
      - Use Mann-Whitney test for the comparison
    - Recommendation systems
      - A system is embedded in the development environment to suggest changes to developers
      - Precision and recall of the recommendations are analyzed
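The comparison named on this slide can be illustrated with a minimal Python sketch of the Mann-Whitney U statistic. The defect counts below are invented for illustration and are not taken from the paper, and reading U and NU as unique and non-unique is an assumption. The function computes only the U statistic by pairwise counting, not a p-value; a statistics library would normally be used for the full test.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical defect counts per commit group; illustrative values only.
unique_defects = [3, 5, 4, 6]
non_unique_defects = [1, 2, 4, 2]

u_unique = mann_whitney_u(unique_defects, non_unique_defects)
u_non_unique = mann_whitney_u(non_unique_defects, unique_defects)
print(u_unique, u_non_unique)
```

The two statistics always sum to the number of pairs (here 4 x 4 = 16). Running such a test on mined data compares two pools of past observations; it does not by itself turn the study into an experiment.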

Page 42: PROMISE keynote Juristo

Type of Empirical Studies (Epidemiology terms)

- Analysis of unique and non-unique changes properties
  - What is the extent of unique changes; Who introduces unique changes; Where do unique changes take place
  - Ecological study
    - Descriptive; Use of population aggregated data
- Application: Experiment for Risk Analysis
  - Check whether U file commits have a higher defect rate than NU file commits
  - Retrospective cohort study
    - Comparison of past data
- Applications: Recommendation systems
  - A system is embedded in the development environment to suggest changes to developers; Precision and recall of the recommendations are analyzed
  - Prospective observational study; ecological?
    - But no comparison is made (e.g. of the quality/productivity of developments using the recommendations)
    - Could be conducted as Field Trial or (Prospective) Cohort study

Page 43: PROMISE keynote Juristo

Example 2

- ESEM'15
  - How to make best use of cross-company data for web effort estimation
  - Minku, Sarro, Mendes, Ferrucci
- Topic
  - Compares CC dataset versus WC dataset for web effort estimation
  - Compares Dycom against NN-filtering
    - Dycom: Framework for learning software effort estimation models for a company based on mapping CC models to the company's context
    - NN-filtering: Nearest Neighbor filtering to make CC estimations

Page 44: PROMISE keynote Juristo

Experiments in Effort Estimation Research

- Intervention
  - The two (effort estimation) techniques compared
  - Allocation of treatments to units?
    - Yes
    - Every project belonging to the test dataset is an experimental unit
    - Experimental groups are the test dataset estimated with one or the other technique
    - Typical AB designs; But could try others
- Control confounding variables through randomization?
  - No

Page 45: PROMISE keynote Juristo

Which Uses were Right

Venue (2015) | Use of term experiment | MSR vs. traditional experiment | MSR use vs. misuse
ESEM | 30.5% (11 out of 36) | 72.7% MSR works (8 papers); 27.3% traditional experiments (3 papers) | Observational: 12.5%; Data experiments: 87.5%
MSR | 42.8% (18 out of 42) | 100% MSR works (18 papers); 0% traditional experiments | Observational: 44.4%; Data experiments: 55.5%
EMSE | 52.72% (29 out of 55) | 65.5% MSR works (19 papers); 34.5% traditional experiments (10 papers) | Observational: 52.6%; Data experiments: 47.4%

Page 46: PROMISE keynote Juristo

Conclusions

Page 47: PROMISE keynote Juristo

Conclusions

- MSR is a research method by which several types of empirical studies can be conducted
- In any case most research is
  - Observational
  - Retrospective
    - Unless data is mined from development tools prospectively
- Therefore the evidence obtained is of lower quality than
  - Observational prospective studies
  - Field experimental studies
- MSR studies show correlation but it is hard to prove causation
- More powerful types of observational studies (Case-control; Cohort) could get better evidence
