Mining Software Repositories: Using Humans to Better Software
Marat Akhin
15/06/2015
Marat Akhin Mining Software Repositories: Using Humans to Better Software 15/06/2015 1 / 18
What is MSR?
Mining software repositories
Understand empirical aspects of software development
Use the past to guide the future
MSR data
Historical data
Version control systems: CVS, SVN, Git, Mercurial
Bug trackers: Bugzilla, JIRA, YouTrack
Communication: e-mails, chat logs, wiki pages
Execution data
Execution traces
Deployment logs
Crash dumps
Source code data
Source code itself
MSR methods
Classification
aka Supervised learning
Clustering
aka Unsupervised learning
Statistical hypothesis testing
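As an illustration of hypothesis testing on repository data, here is a minimal sketch of an exact permutation test. The function name and the two sample groups (defect counts for reviewed and unreviewed files) are hypothetical and are not taken from any study mentioned in this talk.

```python
from itertools import combinations

def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = a + b
    total = sum(pooled)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    hits = 0
    count = 0
    # Enumerate every way to relabel len(a) of the pooled values as group "a".
    for idx in combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / count

# Hypothetical defect counts per file (illustrative only).
reviewed = [1, 2, 3]
unreviewed = [7, 8, 9]
p = permutation_p_value(reviewed, unreviewed)
```

A small p-value suggests the difference between the groups is unlikely to come from random labelling alone. With samples this small the test has little power, so real studies need far more data.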
MSR insights
Quality assurance
Architecture analysis
Bug prediction
Developer feedback
You-name-it!
Can we predict bugs?
Don’t code on Fridays 1
Eclipse/Mozilla repos / bug-trackers
Link bug fixes to source code changes
Find interesting correlations
1 Jacek Sliwerski, Thomas Zimmermann, and Andreas Zeller. When do changes induce fixes? (MSR’05)
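The linking step can be sketched roughly as follows. This is only an illustration: the commit records, the bug IDs, and the message pattern are hypothetical. A real study would also trace each fix back to the earlier change that introduced the bug, which this sketch does not attempt.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical sample data; real input would come from the VCS log and the bug tracker.
COMMITS = [
    {"id": "c1", "date": date(2015, 6, 5), "message": "Add caching layer"},
    {"id": "c2", "date": date(2015, 6, 8), "message": "Fix bug 1042: stale cache entry"},
    {"id": "c3", "date": date(2015, 6, 9), "message": "Refactor parser"},
    {"id": "c4", "date": date(2015, 6, 12), "message": "Fixed #977 and bug 1042 follow-up"},
]
FIXED_BUGS = {977, 1042}  # IDs the tracker reports as fixed (assumed)

# Assumed convention: messages mention bugs as "bug 123", "bug #123" or "#123".
BUG_REF = re.compile(r"(?:bug\s*#?|#)(\d+)", re.IGNORECASE)

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def link_fixes(commits, fixed_bugs):
    """Map commit id -> sorted list of fixed bug ids mentioned in its message."""
    links = {}
    for commit in commits:
        ids = {int(n) for n in BUG_REF.findall(commit["message"])} & fixed_bugs
        if ids:
            links[commit["id"]] = sorted(ids)
    return links

def commits_per_weekday(commits):
    """Count commits by the weekday they were made on."""
    return Counter(DAYS[c["date"].weekday()] for c in commits)

links = link_fixes(COMMITS, FIXED_BUGS)
by_day = commits_per_weekday(COMMITS)
```

Once fixes are linked to changes, correlations such as "which weekday produces the most fix-inducing changes" reduce to counting over the linked set.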
Reopened bugs stay 2
Eclipse / Apache / OpenOffice
Build decision trees by different criteria
Analyze the results
2 Emad Shihab et al. Studying re-opened bugs in open source software (ESE’12)
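To give a feel for how a decision tree picks its criteria, here is a minimal sketch of a single split chosen by Gini impurity. The features (comment count, days to fix), the labels, and the function names are hypothetical. The cited study builds full trees over many more factors.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold) of the best single split."""
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

# Hypothetical bug reports: (number of comments, days to fix); label 1 = re-opened.
ROWS = [(2, 1), (3, 2), (4, 30), (12, 3), (15, 40), (20, 5)]
LABELS = [0, 0, 0, 1, 1, 1]

split = best_split(ROWS, LABELS)
```

On this toy data the comment count separates the two classes cleanly, so the split lands on feature 0. A full tree would apply the same search recursively to each side.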
Code reviews: yay or nay?
More reviews == fewer bugs 3
Qt / ITK / VTK
Collect review metrics
Build regression models for bug prediction
3 Shane McIntosh et al. The impact of code review coverage and code review participation on software quality: a case study of the Qt, VTK, and ITK projects (MSR’14)
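A regression model of this kind can be sketched, in its simplest form, as a one-variable least-squares fit. The numbers below are made up for illustration, and a single predictor is an assumption made for brevity. The cited study fits models with many explanatory variables.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical components: share of changes that were reviewed vs. post-release defects.
review_coverage = [0.2, 0.4, 0.6, 0.8, 1.0]
defects = [9, 7, 6, 3, 2]

intercept, slope = fit_line(review_coverage, defects)

def predict(coverage):
    """Predicted defect count for a component with the given review coverage."""
    return intercept + slope * coverage
```

A negative slope on such data would match the slide's headline, but a fit on its own shows correlation and does not establish that reviews cause the drop.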
Code clones: what is that smell?
Clones are better than other code 4
Apache / Evolution / GIMP / Nautilus
Detect clones and link them to bugs
Analyze clone-to-bug ratio
4 Foyzur Rahman et al. Clones: what is that smell? (ESE’12)
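One way to read "clone-to-bug ratio" is to compare the share of buggy lines that sit in clones with the share of all code that is cloned. The sketch below follows that reading, which is an assumption and not necessarily the exact metric of the cited paper. The line sets are hypothetical.

```python
def clone_bug_ratio(cloned_lines, buggy_lines, total_lines):
    """Share of buggy lines inside clones, divided by the share of code that is cloned.

    A value below 1 means clones hold fewer buggy lines than their size would suggest.
    """
    buggy_in_clones = len(cloned_lines & buggy_lines)
    bug_share = buggy_in_clones / len(buggy_lines)
    clone_share = len(cloned_lines) / total_lines
    return bug_share / clone_share

# Hypothetical project of 100 lines: lines 0..19 are cloned, 10 lines were touched by bug fixes.
CLONED = set(range(20))
BUGGY = {5, 30, 31, 50, 60, 61, 70, 80, 90, 95}

ratio = clone_bug_ratio(CLONED, BUGGY, total_lines=100)
```

Here clones make up 20% of the code but contain only 10% of the buggy lines, so the ratio comes out below 1.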
What next?
More data to explore
OSS source code doubles every year
Active use of *aaS platforms
MSR has access to vast amounts of development data
More insights coming next week!