The Road Ahead for Mining Software Repositories
Ahmed E. Hassan
Queen’s University, Canada
Code Repos: Sourceforge, GoogleCode
Historical Repositories: Source Control (CVS/SVN), Bugzilla, Mailing lists
Runtime Repos: Field Logs, Crash Repos
Mining Software Repositories (MSR)
• Transforms static record-keeping repositories into active repositories
• Makes repos data actionable by uncovering hidden patterns and trends
MSR researchers analyze and cross-link repositories
Repositories: Bugzilla, CVS/SVN, Mailing list, Field logs, Crashes
Cross-links: fixed bug, discussions, buggy change & fixing change, field crashes
New Bug Report:
• Estimate fix effort
• Mark duplicates
• Suggest experts and fix
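One way to picture "mark duplicates" for a new bug report is a simple text-similarity check against existing reports. The sketch below is only an illustration under stated assumptions: it uses Jaccard similarity over word tokens, and the function names, sample reports, bug IDs, and the 0.5 threshold are all hypothetical rather than taken from any tool named in these slides.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicate_candidates(new_report, existing_reports, threshold=0.5):
    """Return (bug_id, score) pairs whose similarity reaches the threshold."""
    new_tokens = tokenize(new_report)
    scored = []
    for bug_id, text in existing_reports.items():
        score = jaccard(new_tokens, tokenize(text))
        if score >= threshold:
            scored.append((bug_id, score))
    # Most similar reports first; ties broken by bug id.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical bug database: id -> summary text.
existing = {
    101: "Crash when opening large file",
    102: "Typo in help dialog",
    103: "Application crash opening a large file",
}

candidates = find_duplicate_candidates("Crash opening large file", existing)
candidate_ids = [bug_id for bug_id, _ in candidates]
```

A real duplicate detector would likely need stemming, stop-word handling, and data from other repositories (for example crash traces); this sketch only shows the basic shape of the idea.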
New Change:
• Suggest APIs
• Warn about risky code or bugs
• Suggest locations to co-change
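The "suggest locations to co-change" idea can be sketched as follows, assuming the change history has already been extracted from source control as a list of change sets (the files committed together). The function names, the sample history, the file names other than vm_map.c, and the support threshold are hypothetical; this is a minimal illustration, not the method of any specific MSR tool.

```python
from collections import Counter
from itertools import combinations


def count_co_changes(change_sets):
    """Count how often each unordered pair of files changed together."""
    pair_counts = Counter()
    for files in change_sets:
        for a, b in combinations(sorted(set(files)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def suggest_co_changes(pair_counts, changed_file, min_support=2):
    """Return (file, count) for files that co-changed with changed_file
    at least min_support times."""
    suggestions = []
    for (a, b), count in pair_counts.items():
        if count < min_support:
            continue
        if a == changed_file:
            suggestions.append((b, count))
        elif b == changed_file:
            suggestions.append((a, count))
    # Most frequent co-change partners first; ties broken by name.
    return sorted(suggestions, key=lambda item: (-item[1], item[0]))


# Hypothetical change history: each entry is the set of files in one commit.
history = [
    {"vm_map.c", "vm_pager.c"},
    {"vm_map.c", "vm_pager.c", "README"},
    {"vm_map.c", "vm_fault.c"},
    {"README"},
]

pairs = count_co_changes(history)
suggestions = suggest_co_changes(pairs, "vm_map.c")
```

With this sample history, a developer changing vm_map.c would be pointed at vm_pager.c, because the two files changed together in two commits.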
Supporting software understanding (NETBSD)
Conceptual (proposed) vs. Concrete (reality)
Why? Who? When? Where?
Mining supports software understanding (NETBSD)
• Eight unexpected dependencies
• All except two dependencies existed since day one:
– Virtual Address Maintenance → Pager
– Pager → Hardware Translations
Auto-generated from CVS repository
Which? vm_map_entry_create (in src/sys/vm/Attic/vm_map.c) depends on pager_map (in /src/sys/uvm/uvm_pager.c)
Who? cgd
When? 1993/04/09 15:54:59 Revision 1.2 of src/sys/vm/Attic/vm_map.c
Why? from sean eric fagan: it seems to keep the vm system from deadlocking the system when it runs out of swap + physical memory. prevents the system from giving the last page(s) to anything but the referenced "processes" (especially important is the pager process, which should never have to wait for a free page).
Opportunities in the Road Ahead
Repository / Extract / Analyze / Results / Adopt / Show Value
• Going beyond code and bugs
• Taming the complexity of MSR
• Showing the value of repositories
• Easing the adoption of MSR
Going beyond code and bugs
MSR 2004-2008: ~80% of publications focus on code and bugs
• Explore non-structured data
– Social aspects: emails and comments
• Link data between repos
• Seek non-traditional repos
– Demonstrate the value of IDE interactions or build failures repos
• Understand the limitation of repos
– Causation vs. Correlation
• Small number of committers in OS projects
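Linking data between repos often starts with a heuristic. The sketch below links source-control commits to bug reports by scanning commit messages for bug references. It is a minimal illustration under stated assumptions: the regular expression, the function name, and the sample commits and bug IDs are hypothetical, and a heuristic like this would need empirical checking against real data before it is trusted.

```python
import re

# Heuristic pattern (an assumption): matches forms such as
# "bug 123", "Bug #123", "fixes #123", "fixed bug 123".
BUG_REF = re.compile(r"\b(?:bug|fix(?:es|ed)?)\s*(?:bug\s*)?#?(\d+)", re.IGNORECASE)


def link_commits_to_bugs(commits, known_bug_ids):
    """Map commit id -> sorted list of referenced bug ids.

    Candidate numbers are kept only if they exist in the bug tracker,
    which filters out some false positives such as years or typos.
    """
    links = {}
    for commit_id, message in commits:
        referenced = {int(number) for number in BUG_REF.findall(message)}
        matched = sorted(referenced & known_bug_ids)
        if matched:
            links[commit_id] = matched
    return links


# Hypothetical commit log and bug tracker contents.
commits = [
    ("r101", "Fix bug 1234: null pointer in parser"),
    ("r102", "Fixes #42 and refactors logging"),
    ("r103", "Update copyright year 2009"),
    ("r104", "Fixed bug #9999 (typo in id)"),
]
known_bug_ids = {42, 1234}

links = link_commits_to_bugs(commits, known_bug_ids)
```

Here r103 contains no bug reference and r104 refers to an ID that does not exist in the tracker, so neither is linked.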
Taming the complexity of MSR
• Simplify the extraction of high quality data
– Toolkits and extracted data (e.g. FLOSSMetrics) are needed
– Heuristics should be empirically verified
– Acknowledgement mechanism needed for extractors
• Deal with skew in repository data
– Visualization can help spot skew
– Guidelines and re-sampling/robust techniques are needed
• Improve the quality of repository data
– Provide tools for annotation of repos data at creation

V1: Undefined func. (Link Error)
    main() {
      int a;
      /*call help*/
      helpInfo();
    }

V2: Syntax error
    helpInfo() {
      errorString!
    }
    main() {
      int a;
      /*call help*/
      helpInfo();
    }

V3: Valid code
    helpInfo() {
      int b;
    }
    main() {
      int a;
      /*call help*/
      helpInfo();
    }
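To illustrate why skew calls for robust and re-sampling techniques, the sketch below compares the mean and median of a skewed metric and estimates a bootstrap percentile interval for the median. The data (changes per file), the function names, and the parameter defaults are hypothetical and chosen only for illustration.

```python
import random
import statistics


def summarize(values):
    """Return the mean and the (more robust) median of a metric."""
    return {"mean": statistics.mean(values), "median": statistics.median(values)}


def bootstrap_median_interval(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the median.

    Resamples the data with replacement, computes the median of each
    resample, and returns the alpha/2 and 1 - alpha/2 percentiles.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    medians = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    low = medians[int((alpha / 2) * n_resamples)]
    high = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Hypothetical, heavily skewed data: one file attracts most of the changes.
changes_per_file = [1, 1, 1, 2, 2, 2, 3, 3, 4, 250]

summary = summarize(changes_per_file)
low, high = bootstrap_median_interval(changes_per_file)
```

For this sample the mean (26.9) is dominated by a single file, while the median (2) describes the typical file, which is the kind of difference a visualization or a robust statistic would expose.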
Showing the value of MSR
• Understand the needs of practitioners
– Predicting buggy modules:
• Buggy modules are well-known
– Predicting fault occurrences at module level is too coarse
• Study the performance in practice
– Tools affecting the repos data
• Show the practical benefits
– Statistical improvements not sufficient
– Cost of maintenance should be evaluated
• Evaluate on non-open source systems
Easing the adoption of MSR
• Simplify access to techniques
– Integration into IDEs (HATARI, Hipikat, Mylyn, eRose)
– A web service demonstration for an open source project
• A continuously updating MSR Challenge
• Help practitioners make decisions
– MSR should aim to support, not replace, practitioners
Mining Software Repositories
http://msrconf.org