The Road Ahead for Mining Software Repositories
Ahmed E. Hassan, Queen’s University, Canada




  • Software repositories

    – Historical repositories: source control (CVS/SVN), Bugzilla, mailing lists
    – Runtime repositories: field logs, crash repositories
    – Code repositories: SourceForge, Google Code

  • Mining Software Repositories (MSR)

    • Transforms static record-keeping repositories into active repositories
    • Makes repository data actionable by uncovering hidden patterns and trends

    [Figure: mailing lists, Bugzilla, crashes, field logs, CVS/SVN]

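  • MSR researchers analyze and cross-link repositories

    [Figure: a fixed bug in Bugzilla linked to its discussions (mailing lists), the buggy and fixing changes (CVS/SVN), and the resulting field crashes (crash repositories)]

    For a new bug report, the cross-linked data can help:
    – Estimate the fix effort
    – Mark duplicates
    – Suggest experts and a fix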
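Cross-linking of this kind is often bootstrapped with a simple heuristic: scan each commit message for bug identifiers and link the commit to the matching bug report. The sketch below assumes a hypothetical `Commit` record and an assumed message convention ("bug 123", "bug #123", or "#123"); real projects use other conventions, so the pattern would need to be checked against the project at hand.

```python
import re
from dataclasses import dataclass


@dataclass
class Commit:
    # Hypothetical record; real data would come from a CVS/SVN log.
    rev: str
    message: str


# Assumed convention: "bug 123", "bug #123", or "#123" in the commit message.
# \bbug avoids matching words such as "debug".
BUG_ID = re.compile(r"(?:\bbug\s*#?\s*|#)(\d+)\b", re.IGNORECASE)


def link_commits_to_bugs(commits, known_bug_ids):
    """Map each known bug ID to the revisions whose messages mention it."""
    links = {}
    for c in commits:
        for match in BUG_ID.findall(c.message):
            bug_id = int(match)
            # Keep only IDs that exist in the bug tracker to cut false positives.
            if bug_id in known_bug_ids:
                links.setdefault(bug_id, []).append(c.rev)
    return links


if __name__ == "__main__":
    history = [
        Commit("1.2", "Fix bug 42: null check in parser"),
        Commit("1.3", "Refactor, see #42 and #7"),
        Commit("1.4", "Update copyright to 2008"),
    ]
    print(link_commits_to_bugs(history, known_bug_ids={7, 42}))
    # {42: ['1.2', '1.3'], 7: ['1.3']}
```

Filtering against the IDs actually present in Bugzilla is a cheap guard against numbers that merely look like bug references, such as years or version numbers.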

  • MSR researchers analyze and cross-link repositories

    [Figure: cross-linked Bugzilla, mailing list, CVS/SVN, and crash data (fixed bugs, discussions, buggy and fixing changes, field crashes)]

    For a new change, the cross-linked data can help:
    – Suggest APIs
    – Warn about risky code or bugs
    – Suggest locations to co-change
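Suggesting locations to co-change can be framed as association-rule mining over past commits, in the spirit of tools such as eRose: files that changed together before are suggested when one of them changes again. A minimal sketch, assuming each transaction is the set of files touched by one commit; the support and confidence thresholds are hypothetical defaults.

```python
from collections import Counter
from itertools import combinations


def count_co_changes(transactions):
    """Count commits per file and per ordered pair of files changed together."""
    file_count = Counter()
    pair_count = Counter()
    for files in transactions:
        files = set(files)
        file_count.update(files)
        for a, b in combinations(sorted(files), 2):
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1
    return file_count, pair_count


def suggest_co_changes(changed_file, transactions, min_support=2, min_confidence=0.5):
    """Suggest files that historically changed together with `changed_file`.

    support    = number of past commits touching both files
    confidence = support / number of past commits touching `changed_file`
    Returns (file, support, confidence) tuples, most confident first.
    """
    file_count, pair_count = count_co_changes(transactions)
    suggestions = []
    for (a, b), support in pair_count.items():
        if a != changed_file or support < min_support:
            continue
        confidence = support / file_count[changed_file]
        if confidence >= min_confidence:
            suggestions.append((b, support, confidence))
    # Highest confidence first; ties broken by support, then by name.
    suggestions.sort(key=lambda s: (-s[2], -s[1], s[0]))
    return suggestions


if __name__ == "__main__":
    # Hypothetical commit history: each set is the files touched by one commit.
    history = [
        {"vm_map.c", "uvm_pager.c"},
        {"vm_map.c", "uvm_pager.c", "Makefile"},
        {"vm_map.c", "README"},
        {"uvm_pager.c"},
    ]
    for name, support, confidence in suggest_co_changes("vm_map.c", history):
        print(f"{name}: support={support}, confidence={confidence:.2f}")
```

Raising `min_support` trades recall for fewer spurious suggestions from one-off commits, such as large imports that touch many unrelated files.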

  • Supporting software understanding (NetBSD)

    [Figure: conceptual (proposed) vs. concrete (reality) architecture]

    Why? Who? When? Where?

  • Mining supports software understanding (NetBSD)

    • Eight unexpected dependencies
    • All but two of these dependencies existed since day one; the two exceptions:
      – Virtual Address Maintenance → Pager
      – Pager → Hardware Translations

    Auto-generated from the CVS repository:

    Which? vm_map_entry_create (in src/sys/vm/Attic/vm_map.c) depends on pager_map (in /src/sys/uvm/uvm_pager.c)
    Who? cgd
    When? 1993/04/09 15:54:59, revision 1.2 of src/sys/vm/Attic/vm_map.c
    Why? from sean eric fagan: it seems to keep the vm system from deadlocking the system when it runs out of swap + physical memory. prevents the system from giving the last page(s) to anything but the referenced "processes" (especially important is the pager process, which should never have to wait for a free page).
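Answers like the Which/Who/When/Why above can be recovered by walking a file's revision history and reporting the first revision in which a dependency appears. The sketch below assumes the history has already been extracted into hypothetical `Revision` records that list the dependencies present at each revision; producing those records (parsing each CVS revision and resolving calls) is the hard part and is not shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    # Hypothetical record extracted from one CVS revision of a file.
    number: str
    author: str
    date: str
    message: str
    # Dependencies present in this revision, as (caller, callee) pairs.
    dependencies: set = field(default_factory=set)


def explain_dependency(history, dependency) -> Optional[dict]:
    """Return Which/Who/When/Why for the revision that first introduced
    `dependency`. `history` must be ordered from oldest to newest."""
    for rev in history:
        if dependency in rev.dependencies:
            return {
                "which": dependency,
                "who": rev.author,
                "when": f"{rev.date} (revision {rev.number})",
                "why": rev.message,
            }
    return None  # The dependency never appears in this history.


if __name__ == "__main__":
    dep = ("vm_map_entry_create", "pager_map")
    history = [
        # Hypothetical earlier revision without the dependency.
        Revision("1.1", "(hypothetical)", "(hypothetical)", "initial revision"),
        Revision("1.2", "cgd", "1993/04/09 15:54:59",
                 "from sean eric fagan: it seems to keep the vm system ...", {dep}),
    ]
    print(explain_dependency(history, dep))
```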

  • Opportunities in the Road Ahead

    [Figure: MSR process stages: Repository, Extract, Analyze, Adopt Results; Show Value]

    • Going beyond code and bugs
    • Taming the complexity of MSR
    • Showing the value of repositories
    • Easing the adoption of MSR

  • Opportunities in the Road Ahead: Going beyond code and bugs

    MSR 2004-2008: ~80% of publications focus on code and bugs

    • Explore non-structured data
      – Social aspects: emails and comments
    • Link data between repos
    • Seek non-traditional repos
      – Demonstrate the value of IDE-interaction and build-failure repos
    • Understand the limitations of repos
      – Causation vs. correlation
      – Small number of committers in open source projects
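One simple way to link unstructured data such as emails to the code is to look for known code identifiers (function or file names extracted from the source repository) in each message body. The sketch below is a deliberately naive heuristic with hypothetical inputs: exact, case-sensitive token matching. It will miss paraphrases and can produce false positives for identifiers that are also common words, so its accuracy would need to be checked before relying on it.

```python
import re

# An identifier-like token; dots are allowed so file names such as "vm_map.c"
# survive, while "/" splits "src/sys/vm/vm_map.c" into its parts.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def link_emails_to_code(emails, identifiers):
    """Map each email ID to the sorted code identifiers its body mentions."""
    known = set(identifiers)
    links = {}
    for email_id, body in emails.items():
        # Strip sentence-final dots so "vm_map.c." still matches "vm_map.c".
        tokens = {t.rstrip(".") for t in TOKEN.findall(body)}
        hits = sorted(tokens & known)
        if hits:
            links[email_id] = hits
    return links


if __name__ == "__main__":
    # Hypothetical mailing-list messages and identifiers taken from the code.
    emails = {
        "msg-1": "The deadlock is in vm_map_entry_create; see src/sys/vm/vm_map.c.",
        "msg-2": "Reminder: release planning call on Friday.",
    }
    identifiers = {"vm_map_entry_create", "pager_map", "vm_map.c"}
    print(link_emails_to_code(emails, identifiers))
```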

  • Opportunities in the Road Ahead: Taming the complexity of MSR

    • Simplify the extraction of high-quality data
      – Toolkits and extracted data (e.g., FLOSSMetrics) are needed
      – Heuristics should be empirically verified
      – An acknowledgement mechanism is needed for extractors
    • Deal with skew in repository data
      – Visualization can help spot skew
      – Guidelines and re-sampling/robust techniques are needed
    • Improve the quality of repository data
      – Provide tools for annotating repository data at creation time

    [Figure: three versions of the same file, each posing a different extraction problem]
    V1 (undefined function, link error):
        main() { int a; /* call help */ helpInfo(); }
    V2 (syntax error):
        helpInfo() { errorString! }
        main() { int a; /* call help */ helpInfo(); }
    V3 (valid code):
        helpInfo() { int b; }
        main() { int a; /* call help */ helpInfo(); }
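Because many historical snapshots do not compile (like V1 and V2), extractors that depend on a full compiler may fail on them or silently drop data. One workaround is a lightweight lexical extractor that tolerates broken code. The sketch below is such a heuristic, written only for illustration: it strips comments, then uses regular expressions to find C-style function definitions and calls. It does not handle macros, string literals, or nested calls, and its accuracy would have to be verified empirically against a real parser.

```python
import re

# Block and line comments are removed before matching.
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
# Heuristic: a name and parameter list followed by "{" is a definition...
DEFINITION = re.compile(r"\b([A-Za-z_]\w*)\s*\([^;{}]*\)\s*\{")
# ...and a name and argument list followed by ";" is a call.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\([^;{}]*\)\s*;")
KEYWORDS = {"if", "while", "for", "switch", "return", "sizeof"}


def extract_facts(source):
    """Return (defined functions, called functions, calls with no definition)."""
    code = COMMENT.sub(" ", source)
    defined = set(DEFINITION.findall(code)) - KEYWORDS
    called = set(CALL.findall(code)) - KEYWORDS
    return defined, called, called - defined


if __name__ == "__main__":
    versions = {
        "V1": "main() {int a;/*call help*/helpInfo();}",
        "V2": "helpInfo() {errorString!\n}main() {int a;/*call help*/helpInfo();}",
        "V3": "helpInfo(){int b;}main() {int a;/*call help*/helpInfo();}",
    }
    for label, source in versions.items():
        defined, called, undefined = extract_facts(source)
        print(label, sorted(defined), sorted(called), sorted(undefined))
```

On these inputs the extractor recovers the `main` to `helpInfo` call in all three versions and flags `helpInfo` as undefined only in V1, where a compiler-based extractor would have stopped at the link or syntax error.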

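Repository data such as lines changed per commit or bugs per file is typically heavily skewed, so a few extreme values can dominate a mean. A minimal sketch of the robust and re-sampling alternatives mentioned for dealing with skew, using only Python's standard library: report the median, summarize on a log scale, and attach a bootstrap confidence interval to the median. The numbers are hypothetical.

```python
import math
import random
import statistics


def summarize(values):
    """Contrast the skew-sensitive mean with more robust summaries."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Averaging on a log scale dampens the pull of a few huge values;
        # log1p keeps zero-valued entries finite.
        "mean_log1p": statistics.mean(math.log1p(v) for v in values),
    }


def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = medians[int(alpha / 2 * n_resamples)]
    upper = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical lines changed per commit, with one huge commit (e.g. an import).
    lines_changed = [1, 2, 2, 3, 3, 4, 5, 8, 12, 2500]
    print(summarize(lines_changed))  # mean 254 vs. median 3.5
    print(bootstrap_median_ci(lines_changed))
```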

  • Opportunities in the Road Ahead: Showing the value of MSR

    • Understand the needs of practitioners
      – Predicting buggy modules:
        • Buggy modules are already well known
        • Predicting fault occurrences at the module level is too coarse
    • Study the performance in practice
      – Tools can affect the repository data
    • Show the practical benefits
      – Statistical improvements are not sufficient
      – The cost of maintenance should be evaluated
    • Evaluate on non-open-source systems
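The point that statistical improvements are not sufficient can be made concrete with an effort-aware evaluation, a technique from the defect-prediction literature rather than one specified in the slides: rank modules by predicted risk and measure what fraction of the bugs a team would find after inspecting a fixed share of the code (here 20% of the lines of code). The module names, sizes, and scores below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    loc: int      # size in lines of code, a proxy for inspection cost
    bugs: int     # actual post-release bugs
    risk: float   # bug-proneness score predicted by some model


def recall_at_effort(modules, budget_fraction=0.2, effort_aware=True):
    """Fraction of all bugs found by inspecting modules in ranked order
    until `budget_fraction` of the total LOC has been inspected.

    effort_aware=True ranks by predicted risk per line of code;
    effort_aware=False ranks by raw predicted risk.
    """
    if effort_aware:
        ranked = sorted(modules, key=lambda m: m.risk / m.loc, reverse=True)
    else:
        ranked = sorted(modules, key=lambda m: m.risk, reverse=True)
    total_bugs = sum(m.bugs for m in modules)
    budget = budget_fraction * sum(m.loc for m in modules)
    inspected = found = 0
    for m in ranked:
        if inspected + m.loc > budget:
            break  # The next module no longer fits in the inspection budget.
        inspected += m.loc
        found += m.bugs
    return found / total_bugs if total_bugs else 0.0


if __name__ == "__main__":
    # Hypothetical modules: "kernel" is the large, well-known buggy module.
    modules = [
        Module("kernel", 5000, 6, 0.9),
        Module("parser", 500, 2, 0.6),
        Module("config", 300, 2, 0.5),
        Module("ui", 4200, 1, 0.2),
    ]
    print(recall_at_effort(modules, effort_aware=False))  # 0.0
    print(recall_at_effort(modules, effort_aware=True))   # 4 of 11 bugs
```

Ranking by raw risk puts the large, well-known buggy module first, which alone exceeds the budget; ranking by risk per line finds 4 of the 11 bugs for the same inspection cost.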

  • Opportunities in the Road Ahead: Easing the adoption of MSR

    • Simplify access to techniques
      – Integration into IDEs (HATARI, Hipikat, Mylyn, eRose)
      – A web service demonstration for an open source project
    • A continuously updating MSR Challenge
    • Help practitioners make decisions
      – MSR should aim to support, not replace, practitioners

  • Mining Software Repositories

    http://msrconf.org
