PROMISE keynote Juristo


Use and Misuse of the Term Experiment in MSR Research

Natalia Juristo

University of Oulu & Technical University of Madrid

PROMISE September 7th 2016

Motivation

- Today empiricism is everywhere in SE
  - This does not mean SE is empirically mature
  - Conducting empirical studies does not imply they are carried out and understood properly
- I focus here on a methodological issue in MSR research
  - The use of experiments in MSR

Motivation

- For several years I have been struggling to match MSR research with the more traditional SE empirical research (conducted over the last 35 years)
- Very often I was shocked to hear MSR works call "experiment" empirical studies that I do not consider experiments
- Today I discuss research we are conducting to clarify this issue

Collaboration

- This research has been conducted in collaboration with
  - Claudia Ayala
  - Xavier Franch
  - Burak Turhan

Evidence of Misuse

Small-scale Literature Review

- We conducted a literature review to double-check the use of the term experiment in MSR works
- 2015 MSR, ESEM and EMSE
  - MSR: 42 papers reviewed
  - ESEM: 36 papers
  - EMSE: 55 papers

Findings

Venue (2015) | Use of term experiment | MSR vs. traditional experiment | MSR use vs. misuse
ESEM | 30.5% (11 out of 36) | 72.72% MSR works (8 papers); 27.28% traditional experiments (3 papers) | Wrong use: 12.5%; proper use: 87.5%
MSR | 42.8% (18 out of 42) | 100% MSR works (18 papers); 0% traditional experiments | Wrong use: 44.45%; proper use: 55.55%
EMSE | 52.72% (29 out of 55) | 65.51% MSR works (19 papers); 34.48% traditional experiments (10 papers) | Wrong use: 52.63%; proper use: 47.36%
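
The percentages in the findings table follow from the paper counts. The sketch below redoes that arithmetic in Python. The misuse counts (1, 8 and 10 papers) do not appear on the slide; they are inferred here from the reported percentages and should be read as an assumption.

```python
# Paper counts taken from the slide:
# venue -> (papers reviewed, papers using "experiment", MSR works, traditional experiments)
counts = {
    "ESEM": (36, 11, 8, 3),
    "MSR": (42, 18, 18, 0),
    "EMSE": (55, 29, 19, 10),
}

# ASSUMPTION: number of MSR works misusing the term, inferred from the
# reported "wrong use" percentages; not stated on the slide.
inferred_misuse = {"ESEM": 1, "MSR": 8, "EMSE": 10}


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


summary = {}
for venue, (reviewed, uses_term, msr_works, traditional) in counts.items():
    summary[venue] = {
        "uses_term": pct(uses_term, reviewed),
        "msr_works": pct(msr_works, uses_term),
        "traditional": pct(traditional, uses_term),
        "wrong_use": pct(inferred_misuse[venue], msr_works),
    }

for venue, row in summary.items():
    print(venue, row)
```

Some recomputed values differ from the slide in the last digit (for example 8 out of 11 rounds to 72.73%), which suggests the slide truncated rather than rounded.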

... Let me elaborate on why the term is misused

What Is an Experiment

Experiment Definition

- Empirical procedure in which key variables of a reality are manipulated to investigate the impact of such variations

What Makes an Experiment

- Manipulation of the variables under study
  - Treatments must be assigned to experimental units
- Control of potential confounding variables that affect the results
  - Confounding is eliminated through random assignment of treatments to units

What Makes an Experiment: Intervention

- Experimentation
  - There is a purposeful intervention by researchers
  - Researchers allocate treatments to units
  - Experimental groups (exposed and unexposed) are determined by the researcher
- Observation
  - Researchers have a passive role and do not interfere with reality
  - Data are generated directly from reality and analyzed afterwards
  - Exposure status is not determined by the researcher

What Makes an Experiment: Randomization

- Experiments limit the potential for confounding factors (biases) by randomly assigning one participant pool to a treatment and another participant pool to the control or to another treatment
- Random allocation of treatments to subjects minimizes the chance that the incidence of confounding (particularly unknown confounding) variables will differ between the two groups
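
The balancing effect of random allocation can be illustrated with a small simulation. This is a sketch with invented data: `experience` stands for a hypothetical confounder the analyst never observes, and the two allocation rules are illustrative assumptions, not taken from the talk.

```python
import random
import statistics


def confounder_gap(assign, n=10_000, seed=0):
    """Absolute difference in mean experience between treated and control units."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        experience = rng.gauss(5.0, 2.0)  # hypothetical hidden confounder
        group = treated if assign(experience, rng) else control
        group.append(experience)
    return abs(statistics.mean(treated) - statistics.mean(control))


def random_allocation(experience, rng):
    # The researcher flips a fair coin; allocation ignores the confounder.
    return rng.random() < 0.5


def self_selection(experience, rng):
    # No intervention: more experienced subjects pick the treatment themselves.
    return experience > 5.0


gap_randomized = confounder_gap(random_allocation)
gap_self_selected = confounder_gap(self_selection)
print(f"randomized: {gap_randomized:.2f}, self-selected: {gap_self_selected:.2f}")
```

Under random allocation the two groups end up with nearly the same mean experience, while under self-selection they differ by several units, so any later difference in outcome could be due to experience rather than to the treatment.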

What Makes an Experiment: Intervention + Randomization

- Intervention guarantees causality
  - Inspiring example
- In a quasi-experiment the allocation of treatments is not possible
  - Although it is run under controlled conditions
- The case of psychology experiments
  - Personality traits

What Does Not Make an Experiment

- Randomization
- Comparison
- Analysis techniques

What Does Not Make an Experiment: Randomization

- Randomization is a strategy that aims to reduce confounding variables (bias)
  - It is mandatory in controlled experiments
  - It can be applied to other types of empirical studies
- Inspiring example
  - Randomization in surveys

What Does Not Make an Experiment: Comparison

- Comparing the impact of different values of a variable does not mean we will be able to reveal causality
- Comparing units in a data set that have different values of a variable neither makes the study an experiment nor allows the differences to be traced back to treatments

What Does Not Make an Experiment: Analysis

- Analysis techniques do not differentiate experiments from other empirical studies
  - What allows causality to be revealed is not the type of analysis technique; it is the design of the study
- Applying to a data set an analysis technique typically used in experiments neither makes the study an experiment nor detects causality

What Does Not Make an Experiment

- An MSR study
  - Applying ANOVA does not mean it is an experiment
  - Comparing pools of data that differ in a variable's value does not imply it is an experiment
  - Even if MSR studies were randomized, they would not be experiments
- Design guarantees
  - The reduction of bias and confounding variables
  - That the differences observed in behavior are caused by treatments
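
The point that comparison alone does not reveal causality can be sketched with simulated data. Everything below is invented for illustration: `size` is a hypothetical confounder, the "technique" has no effect on defects by construction, and the only thing that changes between the two runs is the study design.

```python
import random
import statistics


def observed_difference(randomize, n=20_000, seed=1):
    """Mean defects of units using a technique minus mean defects of the rest.

    The technique has NO effect by construction: defects depend only on size.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        size = rng.uniform(0.0, 1.0)  # hypothetical confounder (module size)
        if randomize:
            uses_technique = rng.random() < 0.5   # researcher allocates at random
        else:
            uses_technique = rng.random() < size  # larger modules adopt it more often
        defects = 10.0 * size + rng.gauss(0.0, 1.0)
        (treated if uses_technique else control).append(defects)
    return statistics.mean(treated) - statistics.mean(control)


effect_observational = observed_difference(randomize=False)
effect_randomized = observed_difference(randomize=True)
print(f"observational: {effect_observational:.2f}, randomized: {effect_randomized:.2f}")
```

The same comparison of group means yields a clear difference in the observational run and roughly zero in the randomized run. Any test applied to the observational data (ANOVA or otherwise) would flag the difference as significant, yet it comes entirely from the confounder.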

Impact of Randomization and Design

Types of Experiments

- Without intervention
  - Natural environment
    - Natural experiments
- Intervention
  - Where?
    - Artificial controlled environment
      - Laboratory controlled experiments
    - Natural environment
      - Field experiments

Laboratory experiments: purposeful intervention; randomized allocation of treatments; artificial, highly controlled environment

Field experiments: purposeful intervention; randomized allocation of treatments; natural, uncontrolled environment

Natural experiments: no intervention; natural, uncontrolled environment
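
The three descriptions above can be read as a small decision rule. The function below is a sketch of that rule; its name and parameters are hypothetical, and mapping "intervention without random allocation" to a quasi-experiment is an assumption based on the earlier remark about quasi-experiments.

```python
def classify_study(intervention: bool, randomized: bool, natural_environment: bool) -> str:
    """Name a study type from its design properties (illustrative only)."""
    if not intervention:
        # Data come from reality without researcher interference.
        return "natural experiment (observational study)"
    if not randomized:
        # ASSUMPTION: intervention without random allocation of treatments.
        return "quasi-experiment"
    if natural_environment:
        return "field experiment"
    return "laboratory experiment"


# A repository-mining study: no intervention, data taken from real projects.
print(classify_study(intervention=False, randomized=False, natural_environment=True))
```

The order of the checks reflects the argument of the talk: intervention is tested first, because without it neither randomization nor the environment can turn a study into an experiment.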

Mining Software Repositories

- MSR research
  - Outcomes (such as quality and productivity) are studied in large samples of past data to
    - Apply statistical methods to test hypotheses
    - Build machine learning and mining methods on past data into tools that support programming tasks
- The data stored in a repository have been obtained from reality (without intervention)
  - Therefore MSR works are observational studies
  - We could call them natural experiments, but that term is misleading

MSR and Epidemiology

Empirical Studies in Medicine

[Figure: stages of empirical research in medicine. Labels: Method Development; Laboratory Research or Pre-clinical; Non-Human Experiments; Field Research (Ill People; Ill & Healthy People); from 20-100 volunteers to 1-2M patients; Descriptive; Analytic (Retrospective; Prospective)]

Empirical Studies in Medicine

- Analytical
  - Experimental
    - Clinical trial
    - Field trial
    - Group trial
  - Observational
    - Cohort studies (also called: prospective study; follow-up study; concurrent study; incidence study; longitudinal study)
    - Historical cohort studies
    - Case-control studies (also called: retrospective study; case comparison study; case history study; case compeer study; case referent study; trohoc study)
- Descriptive
  - Individuals
    - Cross-sectional studies (also called: prevalence study; disease frequency study; morbidity survey; health survey)
    - Case series
    - Single case
  - Population
    - Ecological studies

(Prospective) Cohort Study

- Data are collected at regular intervals, over a period of time, from a group of people who do not have the disease, to see who develops it (new incidence)
- Cohort
  - Group of people who share a common characteristic within a defined period
  - e.g., are born, are exposed to a drug, vaccine or pollutant, or undergo a certain medical procedure
- Comparison group
  - The general population from which the cohort is drawn
  - Another cohort of persons thought to have had little or no exposure to the substance under investigation, but otherwise similar
  - SE: Projects/commits that have not applied the method under study
- Example
  - Does exposure to X (smoking) associate with outcome Y (lung cancer)?
  - Such a study would recruit a group of smokers and a group of non-smokers (the unexposed group), follow them for a set period of time, and note differences in the incidence of lung cancer between the groups at the end of this time
  - SE: A passive follow-up of projects/commits, collecting data at regular intervals and noting the quality/productivity they achieve

Retrospective Studies

- The researcher collects data from past records and does not follow patients up, as is done in prospective studies
- All the events (exposure, latent period, and subsequent outcome, i.e. development of disease) have already occurred in the past
- Errors due to confounding and bias are more common in retrospective studies than in prospective studies

Retrospective Studies: Threats to Validity

- Some key data have not been measured
- Biases may affect the selection of controls
  - Selection bias
    - Only patients with the necessary information are selected
  - Misclassification or information bias as a result of the retrospective aspect
- Researchers cannot control exposure or outcome assessment and instead need to rely on others for accurate record keeping
  - It can be very difficult to make accurate comparisons between the exposed and the non-exposed

Retrospective Cohort Study

- Records of groups of individuals who are alike in many ways but differ in a certain characteristic are compared for a particular output
  - For example, female nurses who smoke and those who do not smoke
  - SE: Use of past data in a repository to compare a certain output of projects with characteristic A and without it (no-A)
- The researcher collects data from past records and does not follow patients up, as is the case with a prospective study

(Retrospective) Case-Control Study

- Records of individuals are divided into two groups that differ in outcome (disease or not) and are compared on the basis of some supposed causal attribute
  - Case-control studies select subjects based on their disease status (the effect)
  - Cohort studies select subjects based on their exposure status (the cause)
- SE: Select projects/commits with a certain level (e.g. of a quality value) and trace back certain project characteristics that are believed to contribute to quality
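
The difference between selecting by exposure (cohort) and selecting by outcome (case-control) can be sketched on a toy repository data set. The data, the `Project` fields and the choice of code review as the exposure are all hypothetical. The risk ratio and odds ratio are the standard epidemiological measures for these two designs; they are not named on the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    uses_code_review: bool  # exposure (hypothetical characteristic)
    high_defect: bool       # outcome (hypothetical quality level)


def make(exposed, outcome, count):
    return [Project(exposed, outcome)] * count


# Invented past data: 50 exposed and 50 unexposed projects.
projects = (make(True, True, 10) + make(True, False, 40)
            + make(False, True, 30) + make(False, False, 20))


def cohort_risk_ratio(data):
    """Cohort view: group by exposure (the cause), then compare outcome rates."""
    exposed = [p for p in data if p.uses_code_review]
    unexposed = [p for p in data if not p.uses_code_review]
    risk_exposed = sum(p.high_defect for p in exposed) / len(exposed)
    risk_unexposed = sum(p.high_defect for p in unexposed) / len(unexposed)
    return risk_exposed / risk_unexposed


def case_control_odds_ratio(data):
    """Case-control view: group by outcome (the effect), then look back at exposure."""
    cases = [p for p in data if p.high_defect]
    controls = [p for p in data if not p.high_defect]
    odds_cases = (sum(p.uses_code_review for p in cases)
                  / sum(not p.uses_code_review for p in cases))
    odds_controls = (sum(p.uses_code_review for p in controls)
                     / sum(not p.uses_code_review for p in controls))
    return odds_cases / odds_controls


risk_ratio = cohort_risk_ratio(projects)
odds_ratio = case_control_odds_ratio(projects)
print(f"risk ratio: {risk_ratio:.3f}, odds ratio: {odds_ratio:.3f}")
```

Both measures describe an association in past records. Neither design involves allocating code review to projects, so neither supports a causal claim on its own.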

Ecological Studies

- Units of analysis are populations
  - Comparison of groups rather than individuals
- Explores correlations between group-level exposure and outcomes

Hierarchies of Evidence

Hierarchy of Evidence

- It is critical to understand which type of empirical study you are conducting
  - To fully understand what the results are telling us
  - The type of results depends on the type of study!
- Evidence hierarchies reflect the relative authority of various types of empirical studies

Authority of Evidence

[Figure: hierarchy of evidence. Labels: Field Experiments; Observational Analytical (Prospective; Retrospective); Observational Descriptive; Laboratory Experiments]

Psychology Hierarchy of Evidence

Two MSR Examples

Example 1

- MSR'15
  - The Uniqueness of Changes: Characteristics and Applications
  - Ray, Nagappan, Bird, Nagappan, Zimmermann
- Why this paper
  - A very well written paper
  - Several empirical studies of different types about the same issue
  - Prominent MSR authors

Empirical Studies (Authors' terms)

- Topic
  - Some changes are unique while others are not
  - They propose a way to identify the uniqueness of changes
- Empirical studies (in the authors' terms)
  - Analysis of the properties of unique and non-unique changes
    - What is the extent of unique changes; who introduces unique changes; where do unique changes take place
  - Applications
    - Experiment for risk analysis
      - Check whether U file commits have a higher defect rate than NU file commits
      - Use the Mann-Whitney test for the comparison
    - Recommendation systems
      - A system is embedded in the development environment to suggest changes to developers
      - Precision and recall of the recommendations are analyzed
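
For readers unfamiliar with the Mann-Whitney test mentioned above, the sketch below computes its U statistic in plain Python on invented defect counts. The numbers are hypothetical and are not data from the paper. A real analysis would also need a p-value, for which a library routine such as `scipy.stats.mannwhitneyu` could be used.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical defect counts per file commit (not taken from the paper).
unique_defects = [3, 5, 4, 6]
non_unique_defects = [1, 2, 2, 4]

u_unique = mann_whitney_u(unique_defects, non_unique_defects)
u_non_unique = mann_whitney_u(non_unique_defects, unique_defects)
print(u_unique, u_non_unique)
```

The two U values always sum to the number of pairs (here 4 x 4 = 16). A lopsided split indicates that one group tends to have higher values, but as the talk argues, running this test on repository data does not turn the comparison into an experiment.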

Type of Empirical Studies (Epidemiology terms)

- Analysis of the properties of unique and non-unique changes
  - What is the extent of unique changes; who introduces unique changes; where do unique changes take place
  - Ecological study
    - Descriptive; use of population-aggregated data
- Application: Experiment for risk analysis
  - Check whether U file commits have a higher defect rate than NU file commits
  - Retrospective cohort study
    - Comparison of past data
- Applications: Recommendation systems
  - A system is embedded in the development environment to suggest changes to developers; precision and recall of the recommendations are analyzed
  - Prospective observational study; ecological?
    - But no comparison is made (i.e., of the quality/productivity of developments using the recommendations)
    - Could be conducted as a field trial or a (prospective) cohort study

Example 2

- ESEM'15
  - How to make best use of cross-company data for web effort estimation
  - Minku, Sarro, Mendes, Ferrucci
- Topic
  - Compares a CC data set versus a WC data set for web effort estimation
  - Compares Dycom against NN-filtering
    - Dycom: framework for learning software effort estimation models for a company based on mapping CC models to the company's context
    - NN-filtering: nearest-neighbor filtering to make CC estimations

Experiments in Effort Estimation Research

- Intervention
  - The two (effort estimation) techniques compared
- Allocation of treatments to units?
  - Yes
    - Every project belonging to the test data set is an experimental unit
    - Experimental groups are the test data set estimated with one or the other technique
    - Typical AB designs; but others could be tried
- Control of confounding variables through randomization?
  - No

Which Uses Were Right

Venue (2015) | Use of term experiment | MSR vs. traditional experiment | MSR use vs. misuse
ESEM | 30.5% (11 out of 36) | 72.7% MSR works (8 papers); 27.3% traditional experiments (3 papers) | Observational: 12.5%; data experiments: 87.5%
MSR | 42.8% (18 out of 42) | 100% MSR works (18 papers); 0% traditional experiments | Observational: 44.4%; data experiments: 55.5%
EMSE | 52.72% (29 out of 55) | 65.5% MSR works (19 papers); 34.5% traditional experiments (10 papers) | Observational: 52.6%; data experiments: 47.4%

Conclusions

- MSR is a research method by which several types of empirical studies can be conducted
- In any case, most MSR research is
  - Observational
  - Retrospective
    - Unless data are mined from development tools prospectively
- Therefore the evidence obtained is of lower quality than that of
  - Observational prospective studies
  - Field experimental studies
- It shows correlation, but it is hard to prove causation
- More powerful types of observational studies (case-control; cohort) could provide better evidence
