THE MYSTERIOUS CORRELATION
A detective story
Evelina Gabašová
@EVELGAB
CORRELATION IS NOT CAUSATION!
CORRELATION AND CAUSATION
EVERYONE LOVES A NICE CORRELATION!
CAUSAL LANGUAGE FOR CORRELATION DATA
THINGS TO CHECK
- correlation or causation (randomised study?)
- small sample size
- result by chance
- hidden cause (latent variable)
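The "small sample size" and "result by chance" checks can be illustrated with a short simulation. This is a minimal sketch on synthetic data: the sample sizes, the 0.5 threshold, the trial count and the seed are arbitrary assumptions, not values from the talk. It counts how often two variables that are independent by construction still show a strong-looking correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def share_of_strong_correlations(sample_size, trials, threshold, rng):
    """Fraction of trials in which two INDEPENDENT variables reach |r| > threshold."""
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson(xs, ys)) > threshold:
            hits += 1
    return hits / trials


rng = random.Random(0)  # fixed seed, chosen arbitrarily
small = share_of_strong_correlations(sample_size=10, trials=500, threshold=0.5, rng=rng)
large = share_of_strong_correlations(sample_size=200, trials=500, threshold=0.5, rng=rng)
print(f"n=10:  share with |r| > 0.5 = {small:.3f}")
print(f"n=200: share with |r| > 0.5 = {large:.3f}")
```

With only ten observations a noticeable share of purely random pairs clears the threshold, while with two hundred it practically never happens.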
DO STORKS DELIVER BABIES?
Source: Claude Covo-Farchi
Testing the hypothesis
Matthews, R. (2000), Storks Deliver Babies (p = 0.008). Teaching Statistics, 22: 36–38.
Storks and country area
Birth rate and country area
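The hidden-cause idea behind the stork example can be sketched with synthetic numbers. This is an illustration only: the data are generated here, not taken from Matthews (2000), and the variable names, noise levels and sample size are assumptions. Country area drives both the stork count and the birth count; once area is regressed out of both, the apparent link disappears.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """Residuals of a simple least-squares fit of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(1)  # fixed seed, chosen arbitrarily
n = 2000                # synthetic "countries"; far more than exist in reality
area = [rng.uniform(1, 10) for _ in range(n)]   # hypothetical country sizes
storks = [a + rng.gauss(0, 1) for a in area]    # depends on area only
births = [a + rng.gauss(0, 1) for a in area]    # depends on area only

raw = pearson(storks, births)
partial = pearson(residuals(storks, area), residuals(births, area))
print(f"storks vs births:                 r = {raw:.2f}")
print(f"storks vs births, area accounted: r = {partial:.2f}")
```

The second number is a partial correlation: it compares what is left of storks and births after the part explained by area has been removed.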
Predicting salary with linear regression
- Country
- Years of programming experience
- Tabs and spaces usage
- Developer type and language
- Level of formal education (e.g. bachelor's, master's, doctorate)
- Whether they contribute to open source
- Whether they program as a hobby
- Company size
WHAT HAPPENS IF WE REMOVE TABS AND SPACES?
Diving deeper into linear regression
- Full model with the information on tabs and spaces included
- Reduced model without the information on tabs and spaces
Coefficient of determination: how much variance in salary can the model explain
Model            Coefficient of determination (R²)   Adjusted (R²adj)
Full model       0.4008                              0.3892
Reduced model    0.3938                              0.3892
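For reference, the two quantities in the table follow the standard textbook definitions, sketched below. The function names and the example numbers are hypothetical, and the slides do not state the number of observations or predictors behind the reported values, so nothing here reproduces them.

```python
def r_squared(observed, predicted):
    """Share of the variance in `observed` that the predictions explain."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_observations, n_predictors):
    """Penalise R squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_observations - 1) / (n_observations - n_predictors - 1)


# Hypothetical example: same R squared, different model sizes.
for n_predictors in (5, 50):
    adj = adjusted_r_squared(0.40, n_observations=1000, n_predictors=n_predictors)
    print(f"{n_predictors} predictors: adjusted R squared = {adj:.4f}")
```

Plain R² cannot decrease when a predictor is added, which is why the adjusted value is the fairer one for comparing a full model against a reduced one.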
COLLINEARITY?
WHAT CHANGED IN THE REDUCED MODEL?
MORE SIGNIFICANT IN THE REDUCED MODEL
- Years of programming experience
- Contributing to open source
- PHP
OPEN SOURCE?
Open source contributors use spaces more than tabs
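When two predictors are correlated, a regression can credit one with an effect that belongs to the other. The sketch below uses synthetic data in which salary depends on open source contribution only, and contributors simply prefer spaces. All probabilities and effect sizes are made up for illustration, and it makes no claim about what the survey data actually show.

```python
import random


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slope_one_predictor(y, x):
    """Least-squares slope of y ~ x (intercept handled by centering)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def slopes_two_predictors(y, x1, x2):
    """Least-squares coefficients of y ~ x1 + x2 (intercept handled by centering)."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(2)  # fixed seed, chosen arbitrarily
n = 5000
open_source = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
# Hypothetical: contributors use spaces 80% of the time, everyone else 30%.
spaces = [1 if rng.random() < (0.8 if c else 0.3) else 0 for c in open_source]
# Hypothetical: salary (in thousands) depends on open source only, never on spaces.
salary = [50 + 10 * c + rng.gauss(0, 5) for c in open_source]

spaces_alone = slope_one_predictor(salary, spaces)
spaces_adjusted, open_source_adjusted = slopes_two_predictors(salary, spaces, open_source)
print(f"spaces coefficient, spaces only:          {spaces_alone:.2f}")
print(f"spaces coefficient, open source included: {spaces_adjusted:.2f}")
print(f"open source coefficient:                  {open_source_adjusted:.2f}")
```

On its own, the spaces indicator looks like it carries a salary premium. Once open source is in the model, that coefficient shrinks toward zero.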
POTENTIAL EXPLANATION?
Language effects?
LANGUAGE AND OPEN SOURCE?
TABS, SPACES, OPEN SOURCE & SALARY
How does it fit together?
Exploring salary distributions
BASED ON EXPERIENCE LEVEL
What's different for these users?
… more statistical testing
The importance of version control

Version control            Higher salary   Lower salary
Git                        168             660
I use some other system    17              30
Subversion                 4               47
Team Foundation Server     6               92
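One way to check whether version control system and salary group are related is a chi-squared test of independence on the counts in the table above. The following is a minimal sketch written for this purpose, not the code behind the talk. It computes the test statistic and degrees of freedom by hand and leaves the p-value to a statistics library.

```python
# Counts copied from the table above: (higher salary, lower salary).
observed = {
    "Git": (168, 660),
    "I use some other system": (17, 30),
    "Subversion": (4, 47),
    "Team Foundation Server": (6, 92),
}


def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom, expected_counts) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(rows, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


statistic, dof, expected = chi_squared_independence(observed)
print(f"X-squared = {statistic:.2f}, df = {dof}")
for name, (high, low), (exp_high, _) in zip(observed, observed.values(), expected):
    share = high / (high + low)
    print(f"{name}: {share:.1%} in the higher-salary group (expected count {exp_high:.1f})")
```

If SciPy is available, `scipy.stats.chi2_contingency` computes the same statistic together with a p-value.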
VERSION CONTROL AND TABS/SPACES
##
## Pearson's Chi-squared test
##
## data: .
## X-squared = 258.48, df = 18, p-value < 2.2e-16
Version control and salary
Git and Subversion
WHY IS VERSION CONTROL SO IMPORTANT?
github.com/evelinag/tabs-spaces-talk
BUT CAN WE TRUST THE DATA?
Salary distribution
WHAT'S WRONG?
MISSING DATA
Statistics of missing data
- missing completely at random
- missing at random
- missing not at random
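The difference between these mechanisms matters because only some of them bias a simple summary. The sketch below contrasts the two extremes on synthetic salaries. The distribution, the 30% and 80% drop rates and the cut-off of 60 are hypothetical choices for illustration, not properties of the survey.

```python
import random

rng = random.Random(3)  # fixed seed, chosen arbitrarily
# Hypothetical salaries in thousands; not the survey data.
salaries = [rng.gauss(50, 10) for _ in range(20000)]


def mean(values):
    return sum(values) / len(values)


# Missing completely at random: every answer has the same 30% chance of being dropped.
mcar = [s for s in salaries if rng.random() >= 0.3]

# Missing not at random: high earners (above 60) withhold their salary 80% of the time.
mnar = [s for s in salaries if not (s > 60 and rng.random() < 0.8)]

print(f"true mean:               {mean(salaries):.2f}")
print(f"mean, MCAR observations: {mean(mcar):.2f}")
print(f"mean, MNAR observations: {mean(mnar):.2f}")
```

Under MCAR the observed mean stays close to the true one. Under MNAR it is pulled down, because the missingness depends on the very value that is missing. Missing at random sits in between: missingness depends only on other observed variables, so it can be corrected by conditioning on them. It is left out to keep the sketch short.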
DATA TRAPS
- missing data
- wrong data
accidentally or deliberately
CAN YOU TRUST DATA?
INTERPRETATION
Machine learning as a service
EVELINA GABASOVA
Consulting data detective
@evelgab
evelinag.com