Open Source Java Bug Study: Understanding where help is needed
Tim Halloran
SSSG 6 Nov 2003
Carnegie Mellon University
Motivation—Why study open source Java bugs?
Technology: Chains of evidence (CoE)
  Extra-linguistic program assurance (Lock, Uniqueness)
  Bureaucratic (mechanical)
  Fluid Assurance Tool
Where is help needed
What can be assured
+impact
Question: Can this have a positive impact on practice?
Study defect reports on and code changes made (fixes) to widely deployed open source Java projects
Goal: determine, empirically, how useful CoE is (how common and “costly” are defects CoE could help prevent)
This talk
  Methodology
  Data collection
    Selected Java projects
    Tool data and limitations (and solutions)
  Variable creation
  Variable reduction
  Summary
  Questions & discussion
Focus today…
Methodology
(1) Bug Selection
  Data collection
  Variable creation
  Variable reduction/exploratory data analysis
  Sampling
(2) Expert Analysis
  Develop definitions (bureaucratic)
  Example…
  Expert judgment:
    Is bug/fix bureaucratic?
    Could CoE have helped?
    Semantic category?
    Do we understand bug/fix?
  Inter-rater reliability?
Results Analysis
Data collection: 3 Java projects investigated
  Ant (64 kSLOC): A Java-based build tool
  Struts (40 kSLOC): Framework for building Java web applications based on a variation of the classic MVC design paradigm
  Tomcat (65 kSLOC Java): The official reference implementation of Java Servlet and JavaServer Pages technologies (web server)
Selection: Widely used Java software (external validity?)
Data collection: Tool data used
Software Defect (“Bug”) Data
  Off-line copy of Apache Software Foundation (ASF) Bugzilla MySQL database
    Ant: 2,230 bugs (7-Sep-00 to 16-May-03)
    Struts: 1,473 bugs (19-Oct-00 to 16-May-03)
    Tomcat: 4,052 bugs (26-Aug-00 to 16-May-03)
Code Changes
  CVS commit logs
    Ant: 9,565 commits (13-Jan-00 to 4-Jun-03)
    Struts: 3,610 commits (31-May-00 to 4-Jun-03)
    Tomcat: 14,833 commits (10-Oct-99 to 4-Jun-03)
Data collection: Limitations of ASF tool data
Bugzilla Bugs
Commit Logs
Goal: Link each bug to the code changes made for it, to add code change information to bug information
Problems
  No link from bug to commits
  Informal links from commits to bugs
  Informal identity management
Problem: No link from bug to commits
Data examples
------------------ CVS commit log 1272 at 2001-02-01 15:37:28 by Nico Seessle ------------------
Fixed Bug #378.
ExecuteOn (and Apply) have a default-value of false for their parallel-attribute.
Problem: Informal links from commits to bugs
Problem: Informal identity management
Commit Email: [email protected]@[email protected]@daedelus.apache.org
Real name: Craig R. McClanahan
Bugzilla Id: [email protected]
Solution: 1st manual identity determination
Manual building of project committer identity
  99 individuals identified
  Used:
    ASF web pages
    Google, etc.
    Dates of actions
    Project mailing lists (headers noting real name)
Very manual, high confidence in links: an “Anchor” for linking bugs to commit logs
Solution: 2nd semi-automated linking of bugs to commits
Wrote Java code to assist linking CVS commits to individual Bugzilla bugs
  Extracts all numbers from CVS commit log
  Checks if number is a bug for the project
    Becomes set of possible bugs
  Checks if commit is within the duration of bug
  Checks if committer was “involved” with the bug
    Becomes inferred set of bugs
If extracted set matches inferred set then entry is made automatically; otherwise researcher shown all information and asked to correct the inferred set (if necessary)
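The linking steps above can be sketched in a few lines. This is a minimal illustration in Python, not the Java tool described on the slide: the `Bug` and `Commit` records, their field names, and the reading of "extracted set" as the set of possible bugs are all assumptions made for the example. The sample values come from the Struts bug 15799 example in this talk.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Bug:
    """Hypothetical stand-in for one Bugzilla record."""
    bug_id: int
    created: datetime
    closed: datetime
    people: set = field(default_factory=set)  # everyone "involved" with the bug


@dataclass
class Commit:
    """Hypothetical stand-in for one CVS commit log entry."""
    when: datetime
    committer: str
    message: str


def possible_bugs(commit, bugs_by_id):
    """Numbers found in the log text that are also bug ids for the project."""
    numbers = {int(n) for n in re.findall(r"\d+", commit.message)}
    return {n for n in numbers if n in bugs_by_id}


def inferred_bugs(commit, bugs_by_id):
    """Possible bugs whose lifetime covers the commit and that involve the committer."""
    inferred = set()
    for bug_id in possible_bugs(commit, bugs_by_id):
        bug = bugs_by_id[bug_id]
        in_lifetime = bug.created <= commit.when <= bug.closed
        if in_lifetime and commit.committer in bug.people:
            inferred.add(bug_id)
    return inferred


def link(commit, bugs_by_id):
    """Return (inferred set, needs_review); review is needed when the sets differ."""
    possible = possible_bugs(commit, bugs_by_id)
    inferred = inferred_bugs(commit, bugs_by_id)
    return inferred, possible != inferred


bugs = {
    15799: Bug(15799, datetime(2003, 1, 4, 15, 12, 17), datetime(2003, 2, 6, 0, 36, 48),
               {"David Morris", "James Mitchell", "James Turner", "Ted Husted", "Arron Bates"}),
}
commit = Commit(datetime(2003, 2, 5, 16, 26, 11), "Arron Bates",
                "Committed patch Bug15799, reported and patched by David Morris.")
print(link(commit, bugs))  # ({15799}, False)
```

When the second element is `True`, a tool of this kind would show the researcher the full bug and commit information instead of recording the link automatically.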
Example: Automatic Link
"struts" bug 15799 found : created 2003-01-04 15:12:17
(15799) Bugzilla description: Nested tags picks up wrong bean for values
(15799) 2003-01-05 22:13:43 David Morris 4 1.0 Beta 3 1.1 Beta 3
(15799) 2003-02-04 21:03:34 James Mitchell 4 1.1 Beta 3 Nightly Build
(15799) 2003-02-05 02:40:54 James Turner 15 [email protected]
(15799) 2003-02-05 03:36:34 Ted Husted 4 Nightly Build 1.1 Beta 3
(15799) 2003-02-06 00:36:48 Arron Bates 8 NEW RESOLVED
(15799) 2003-02-06 00:36:48 Arron Bates 11 FIXED
------------------ CVS commit log 27541 at 2003-02-05 16:26:11 by Arron Bates ------------------
Committed patch Bug15799, reported and patched by David Morris.
IDEA also told me to remove a redundant class cast ( ...a fashionable thing to do it seems :)
Inferred set [15799] = [15799]
No decision required by researcher
Example: Manual Link
"tomcat" bug 207 found : created 2000-10-28 11:58:02
(207) Bugzilla description: mod_jk.conf-auto is not generated when tomcat is started BugRat Report#319
Not adding bug 207 to inferred set [:log time after bug lifetime:comitter not in bug group]
"tomcat" bug 660 found : created 2001-02-21 03:04:15
(660) Bugzilla description: Bad context on Authentication Form Page
Not adding bug 660 to inferred set [:log time after bug lifetime:comitter not in bug group]
"tomcat" bug 371 found : created 2000-12-22 20:24:31
(371) Bugzilla description: Webdav status code 207 not present in core/LocalStrings.properties BugRat Report#660
------------------ CVS commit log 13662 at 2001-03-15 12:15:21 by Marc Saegesser ------------------
Added 207 result code for WEBDAV.
PR: 660/Bugzilla 371
Submitted by: [email protected] (David F. Sklar)
Inferred set [371]
Link bug ids (c to clear)[207, 660, 371] 371  (MANUAL INPUT)
Decision required by researcher: 207 is a result code (not a bug reference) and 660 is the id from the pre-Bugzilla Jakarta bug system
Noting and linking outside contribution: not done (yet)
Linking contribution by non-committers to bug fixes (or enhancements) between CVS and Bugzilla
  Often committers commit code changes contributed by non-committers
  No standard approach in CVS logs to indicate such a contribution (informal references to known contributors)
  Obscuring of email address (to fight SPAM) has hit open source logs
Linking contributor names to Bugzilla Ids would face same issues noted for committers
  Larger scale and less “context” to manually build up a case to link identity to identifiers
Example log text: Testcase submitted by: Martijn Kruithof <martijn at kruithof.xs4all.nl>
Variable creation: Narrowing bug focus
| Bugs   | Ant   | Struts | Tomcat | Total |
| total  | 2,230 | 1,474  | 4,052  | 7,756 |
| fixed  | 886   | 711    | 1,302  | 2,899 |
| w/java | 479   | 275    | 561    | 1,315 |
[Bar chart: total, fixed, and w/java bug counts for Ant, Struts, and Tomcat; y-axis 0 to 4,500]
total to fixed? fixed to w/java?
Examined 20 bugs (per project):
| Project | Lost | Doc | Proc | Fixed-NIR |
| Ant     | 1    | 13  | 3    | 3         |
| Struts  | 5    | 12  | 1    | 2         |
| Tomcat  | 9    | 1   | 2    | 8         |
| Total   | 15   | 26  | 6    | 13        |
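As a quick illustration of the "total to fixed" and "fixed to w/java" questions, the counts in the Bugs table for this slide can be turned into per-step retention rates. The sketch below only restates those counts; the function name `narrowing_rates` is made up for the example.

```python
# (total, fixed, w/java) per project, copied from the Bugs table on this slide.
COUNTS = {
    "Ant": (2230, 886, 479),
    "Struts": (1474, 711, 275),
    "Tomcat": (4052, 1302, 561),
}


def narrowing_rates(total, fixed, with_java):
    """Fraction of bugs kept at each narrowing step."""
    return fixed / total, with_java / fixed


for project, counts in COUNTS.items():
    to_fixed, to_java = narrowing_rates(*counts)
    print(f"{project}: total to fixed {to_fixed:.1%}, fixed to w/java {to_java:.1%}")
```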
Variable creation: Per-bug variables
| Variable | Description | Use/Transformation |
| Ptotal | Total # of people involved | Ppublic_LN = Log(Ptotal - Pkey + 1) |
| Pkey | # of project committers involved | Pcommit_BI (BI: 0 = no; 1 = yes) |
| Pasf | # of ASF members involved | Pasf_BI (any?) |
| Dtotal | Duration in days | |
| Dtotal_nonlater | Duration in days excluding any time in LATER status | Dtotal_nonlater_LN |
| STATUSchange | # of changes (any type) to bug | STATUSchange_BI (>2 total) |
| DUPcount | # of duplicate bugs reported to this one | DUPcount_BI (any duplicates) |
| COMMcount | # of comments posted | |
| COMMsize | Size in characters of all comments | COMMsize_LN |
| ATTACHcount | # of attachments (patches, images) posted | ATTACHcount_BI (any?) |
| ATTACHsize | Size in bytes of all attachments | |
| REOPENEDcount | # of times bug was reopened after being closed | |
| PRIORITY | Final programmer assigned priority (Low, Med, High, Other) | PRIORITY_BI (not Other) |
| PRIORITYchanges | # of times the priority of the bug was changed | |
| SEVERITY | Final programmer assigned severity (Enhancement, Minor, Normal, Major, Critical) | SEVERITY_BI (> Normal) |
| SEVERITYchanges | # of times the severity of the bug was changed | |
| JavaSLOCcount | # of lines of Java changed for bug fix | JavaSLOCcount |
| JavaCUcount | # of Java files (compilation units) changed for bug fix | JavaCUCount |
| JavaPKcount | # of Java packages changed for bug fix | JavaPKCount |
subsets
non-normal
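The transformations in the Use/Transformation column can be sketched as one function over a per-bug record. This is an illustration under stated assumptions: the dictionary layout is hypothetical, "Log" is taken to be the natural logarithm, and Pcommit_BI is read as "any committer involved". The `_LN` variables whose formula the slide does not spell out (Dtotal_nonlater_LN, COMMsize_LN) are left out rather than guessed.

```python
import math

# Severity levels in the order given on the slide, lowest to highest.
SEVERITY_ORDER = ["Enhancement", "Minor", "Normal", "Major", "Critical"]


def transform(bug):
    """Derive the log and binary-indicator variables from raw per-bug counts."""
    severity_rank = SEVERITY_ORDER.index(bug["SEVERITY"])
    return {
        # People involved who are not project committers, log-scaled.
        "Ppublic_LN": math.log(bug["Ptotal"] - bug["Pkey"] + 1),
        "Pcommit_BI": int(bug["Pkey"] > 0),          # assumed: any committer involved
        "Pasf_BI": int(bug["Pasf"] > 0),             # any ASF member involved
        "STATUSchange_BI": int(bug["STATUSchange"] > 2),
        "DUPcount_BI": int(bug["DUPcount"] > 0),     # any duplicates
        "ATTACHcount_BI": int(bug["ATTACHcount"] > 0),
        "PRIORITY_BI": int(bug["PRIORITY"] != "Other"),
        "SEVERITY_BI": int(severity_rank > SEVERITY_ORDER.index("Normal")),
    }


# Made-up record, for illustration only (not data from the study).
example = {
    "Ptotal": 3, "Pkey": 1, "Pasf": 0, "STATUSchange": 3, "DUPcount": 0,
    "ATTACHcount": 2, "PRIORITY": "Other", "SEVERITY": "Major",
}
print(transform(example))
```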
Variable reduction: (preliminary) Principal components analysis
Factor 1: Public interest
  Public_LN (0.7)
  COMMsize_LN (0.6)
  DUPcount_BI (0.6)
  STATUSchanges_BI (0.3)
Factor 2: Java code changed
  JavaCUchange (0.9)
  JavaPKchange (0.8)
  JavaSLOCchange (0.7)
Factor 3: Committer interest
  Pcommit_BI (0.9)
  Pasf_BI (-0.9)
Factor 4: Effort/Time
  Dtotal_nonLATER_LN (0.7)
  PRIORITY_BI (0.7)
  STATUSchanges_BI (0.6)
  SEVERITY_BI (-0.3)
| # of | committers | ASF members | either   |
| 0    | 449 bugs   | 651 bugs    | 2 bugs   |
| 1    | 643 bugs   | 633 bugs    | 901 bugs |
| 2    | 183 bugs   | 27 bugs     | 325 bugs |
| 3    | 29 bugs    | 4 bugs      | 65 bugs  |
| 4    | 11 bugs    |             | 17 bugs  |
| 5    |            |             | 5 bugs   |
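For readers unfamiliar with how loadings like those listed for each factor arise, here is a minimal sketch of extracting the first principal component from a correlation matrix by power iteration. It is not the analysis behind the slide: the talk does not say which package or rotation was used, the data below is synthetic, and all function names are made up for the example.

```python
import math


def standardize(columns):
    """Scale each column to mean 0 and population standard deviation 1."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        scaled.append([(x - mean) / sd for x in col])
    return scaled


def correlation_matrix(columns):
    """Pairwise correlations between the variables (one list per variable)."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component(matrix, iterations=200):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(m * x for m, x in zip(row, v)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v


def first_loadings(columns):
    """Loadings = eigenvector entries scaled by the square root of the eigenvalue."""
    eigenvalue, vector = first_component(correlation_matrix(columns))
    return eigenvalue, [x * math.sqrt(eigenvalue) for x in vector]


# Synthetic data: the first two variables move together, the third is unrelated.
data = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]]
eigenvalue, loadings = first_loadings(data)
print(eigenvalue, loadings)
```

Variables that load heavily on the same component are the ones grouped under a shared label, as with "Public interest" or "Java code changed" in the factor list.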
Summary
We have a reasonable set of “synthetic” measures of some of the important characteristics of bugs and their fixes
  How “costly” in several dimensions (time, public interest, etc.)
Next step: Identify, via expert judges, bugs for which CoE would have been effective
  Combination with results so far will provide some understanding of how
Questions & Discussion
Questions?
Issues:
  Approach to study
  Definitions: bureaucratic (mechanical) vs. functional program properties
  NetBeans data