14
The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s University Canada Canada

The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

The Road Ahead for Mining Software Repositories

Ahmed E. HassanQueen’s University

CanadaCanada

Page 2: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Code Repos

SourceforgeGoogleCode

22

Field Logs

Source ControlCVS/SVN

Bugzilla Mailinglists

CrashRepos

Historical Repositories Runtime Repos

Page 3: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

• Transforms static record-keeping repositories to activerepositories

• Makes repos data actionable

Mining Software Repositories (MSR)

• Makes repos data actionableby uncovering hidden patterns and trends

3

MailinglistBugzilla Crashes

Field logs CVS/SVN

Page 4: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

MSR researchersanalyze and cross-link repositories

fixed bug

discussionsBuggy change &

Fixing change Field crashes

Bugzilla CVS/SVNMailinglist Crashes

Estimate fix effortMark duplicates

Suggest experts and fix

New Bug Report

Page 5: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

MSR researchersanalyze and cross-link repositories

fixed bug

discussionsBuggy change &

Fixing change Field crashes

Bugzilla CVS/SVNMailinglist Crashes

Suggest APIsWarn about risky code or bugs

Suggest locations to co-change

New Change

Page 6: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Supporting software understanding (NETBSD)

Conceptual (proposed) Concrete (reality)

6

Why? Who?When? Where?

Page 7: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Mining supports software understanding (NETBSD)

• Eight unexpected dependencies

• All except two dependencies existed since day one:

– Virtual Address Maintenance Pager

– Pager Hardware Translations

Auto-generatedfrom CVS repository

7

Which? vm_map_entry_create (in src/sys/vm/Attic/vm_map.c) depends on pager_map (in /src/sys/uvm/uvm_pager.c)

Who? cgd

When? 1993/04/09 15:54:59 Revision 1.2 of src/sys/vm/Attic/vm_map.c

Why?

from sean eric fagan: it seems to keep the vm system from deadlocking the system when it runs out of swap + physical memory. prevents the system from giving the last page(s) to anything but the referenced "processes" (especially important is the pager process, which should never have to wait for a free page).

Page 8: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Opportunities in the Road Ahead

Repository Extract AnalyzeAdopt Results

Show Value

• Going beyond code and bugs

• Taming the complexity of MSRTaming the complexity of MSR

• Showing the value of repositories

• Easing the adoption of MSR

Page 9: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Opportunities in the Road Ahead

Repository Extract AnalyzeAdopt Results

Show Value

Going beyond code and bugs MSR 2004-2008:

~80% of publications focus on code and bugs

• Explore non-structured data– Social aspects: emails and comments

9

– Social aspects: emails and comments• Link data between repos• Seek non-traditional repos

– Demonstrate the value of IDE interactions or build failures repos

• Understand the limitation of repos– Causation vs. Correlation

• Small number of committers in OS projects

Page 10: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Opportunities in the Road Ahead

Repository Extract AnalyzeAdopt Results

Show Value

• Simplify the extraction of high quality data

Taming the complexity of MSR

main() {int a;/*call

help*/helpInfo();

}

helpInfo() {errorString!

}main() {

int a;/*call

help*/h l I f ()

helpInfo(){int b;}main() {

int a;/*call

help*/h l I f ()

10

– Toolkits and extracted data (e.g. FLOSSMetrics) are needed– Heuristics should be empirically verified– Acknowledgement mechanism needed for extractors

• Deal with skew in repository data– Visualization can help spot skew– Guidelines and re-sampling/robust techniques are needed

• Improve the quality of repository data– Provide tools for annotation of repos data at creation

helpInfo();}

helpInfo();}

V1:Undefined func.(Link Error)

V2:Syntax error

V3:Valid code

Page 11: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Opportunities in the Road Ahead

Repository Extract AnalyzeAdopt Results

Show Value

• Simplify the extraction of high quality data

Taming the complexity of MSR

11

– Toolkits and extracted data (e.g. FLOSSMetrics) are needed– Heuristics should be empirically verified– Acknowledgement mechanism needed for extractors

• Deal with skew in repository data– Visualization can help spot skew– Guidelines and re-sampling/robust techniques are needed

• Improve the quality of repository data– Provide tools for annotation of repos data at creation

Page 12: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Opportunities in the Road Ahead

Repository Extract AnalyzeAdopt Results

Show Value

• Understand the needs of practitionersP di ti b d l

Showing the value of MSR

12

– Predicting buggy modules:• Buggy modules are well-known

– Predicting fault occurrences at module level is too coarse• Study the performance in practice

– Tools affecting the repos data• Show the practical benefits

– Statistical improvements not sufficient– Cost of maintenance should be evaluated

• Evaluate on non-open source systems

Page 13: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Opportunities in the Road Ahead

Repository Extract AnalyzeAdopt Results

Show Value

• Simplify access to techniques

Easing the adoption of MSR

13

– Integration into IDEs (HATARI, Hipikat, Myln, eRose)– A web service demonstration for an open source

project• A continuously updating MSR Challenge

• Help practitioners make decisions– MSR should aim to support not replace

practitioners

Page 14: The Road Ahead for Mining Software Repositoriesresearch.cs.queensu.ca/~ahmed/home/teaching/CISC880/F16/...The Road Ahead for Mining Software Repositories Ahmed E. Hassan Queen’s

Mining Software Repositories

14

http://msrconf.org