Predicting the Severity of a Reported Bug
Ahmed Lamkanfi, Serge Demeyer (Ansymo) | Emanuel Giger (s.e.a.l.) | Bart Goethals (ADReM)
Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories, p.1-10
Severity of a bug is important
✓ Critical factor in deciding how soon it needs to be fixed, i.e. when prioritizing bugs
✓ Priority is business; severity is technical
✓ Severity varies:
➡ trivial, minor, normal, major, critical and blocker
➡ clear guidelines exist to classify the severity of bug reports
✓ Both a short and a longer description of the problem
✓ Bugs are grouped according to products and components
➡ e.g. plug-ins and bookmarks are components of the product Firefox
Can we accurately predict the severity of a reported bug by analyzing its textual descriptions?

Also the following questions:
✓ Potential indicators?
✓ Short versus long description?
✓ Per component versus cross-component?
Approach
We use text mining to classify bug reports
• Bayesian classifier: based on the probabilistic occurrence of words
• training and evaluation period
• in first instance, per component

Non-severe bugs (trivial, minor)
Severe bugs (major, critical, blocker)
Default (normal) ➡ undecided
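The Bayesian classifier above can be illustrated with a minimal multinomial Naive Bayes over bag-of-words features, using Laplace smoothing. This is a sketch under assumptions, not the paper's implementation; the training reports below are invented examples.

```python
# Minimal Naive Bayes text classifier sketch: scores each class by
# log P(class) + sum of log P(word | class), with add-one smoothing.
import math
from collections import Counter, defaultdict

def train(reports):
    """reports: list of (description, label) pairs."""
    word_counts = defaultdict(Counter)   # per-label word frequencies
    label_counts = Counter()             # how many reports per label
    vocab = set()
    for text, label in reports:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, word_counts, label_counts, vocab):
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # prior: log P(label)
        score = math.log(label_counts[label] / total)
        # likelihood with Laplace smoothing: (count + 1) / (total + |V|)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training reports, grouped as in the approach above
reports = [
    ("crash on startup segfault", "severe"),
    ("deadlock freeze hang ui", "severe"),
    ("typo in menu label", "non-severe"),
    ("wrong underline color titlebar", "non-severe"),
]
model = train(reports)
print(classify("application hang and freeze", *model))   # severe
```

In practice the descriptions would first be tokenized, stop-word filtered, and stemmed, which is consistent with the stemmed indicator terms shown in the results.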
Evaluation of the approach:
✓ precision and recall
Cases drawn from the open-source community:
✓ Mozilla, Eclipse and GNOME
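Precision and recall, as reported in the tables that follow, can be computed per class in the usual way; the labels below are invented for illustration.

```python
# Per-class precision and recall for the binary severe / non-severe task:
# precision = TP / (TP + FP), recall = TP / (TP + FN).
def precision_recall(actual, predicted, positive):
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

actual    = ["severe", "severe", "non-severe", "severe", "non-severe"]
predicted = ["severe", "non-severe", "non-severe", "severe", "severe"]
print(precision_recall(actual, predicted, "severe"))
```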
Results
How does the basic approach perform?
➡ per component and using short description
component             Non-severe            Severe
                      precision   recall    precision   recall
Mozilla: Layout       0.701       0.785     0.752       0.653
Mozilla: Bookmarks    0.692       0.703     0.698       0.687
Eclipse: UI           0.707       0.633     0.668       0.738
Eclipse: JDT-UI       0.653       0.714     0.685       0.621
GNOME: Calendar       0.828       0.783     0.794       0.837
GNOME: Contacts       0.767       0.706     0.728       0.785
What keywords are good indicators of severity?
(terms appear in stemmed form)

Component: Mozilla Firefox - General
  Non-severe: inconsist, favicon, credit, extra, consum, licens, underlin, typo, inspector, titlebar
  Severe:     fault, machin, reboot, reinstal, lockup, seemingli, perman, instantli, segfault, compil

Component: Eclipse JDT UI
  Non-severe: deprec, style, runnabl, system, cce, tvt35, whitespac, node, put, param
  Severe:     hang, freez, deadlock, thread, slow, anymor, memori, tick, jvm, adapt

Component: GNOME Mailer
  Non-severe: mnemon, outbox, typo, pad, follow, titl, high, acceler, decod, reflec
  Severe:     deadlock, sigsegv, relat, caus, snapshot, segment, core, unexpectedli, build, loop
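One way to surface indicator terms like those in the table is to rank each word by the smoothed log-ratio of its frequency in severe versus non-severe reports. This ranking method is an assumption for illustration (the actual analysis may differ), and the word counts below are invented.

```python
# Rank words by smoothed log-odds: positive means more typical of
# severe reports, negative means more typical of non-severe ones.
import math
from collections import Counter

severe = Counter("hang freez deadlock hang freez memori".split())
non_severe = Counter("typo titl typo underlin mnemon pad".split())

vocab = set(severe) | set(non_severe)
n_sev = sum(severe.values())
n_non = sum(non_severe.values())

def log_odds(word):
    # add-one smoothing so unseen words do not give infinite scores
    p_sev = (severe[word] + 1) / (n_sev + len(vocab))
    p_non = (non_severe[word] + 1) / (n_non + len(vocab))
    return math.log(p_sev / p_non)

ranked = sorted(vocab, key=log_odds, reverse=True)
print(ranked[:3])   # words most indicative of severe reports
```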
How does the approach perform when using the longer description?

component                 Non-severe            Severe
                          precision   recall    precision   recall
Mozilla: Layout           0.583       0.961     0.890       0.314
Mozilla: Bookmarks        0.536       0.963     0.820       0.166
Mozilla: Firefox general  0.578       0.948     0.856       0.308
Eclipse: UI               0.548       0.976     0.892       0.197
Eclipse: JDT-UI           0.547       0.973     0.881       0.195
Eclipse: JDT-Text         0.570       0.988     0.955       0.257
How does the approach perform when combining bugs from different components?

component   Non-severe            Severe
            precision   recall    precision   recall
Mozilla     0.704       0.750     0.733       0.685
Eclipse     0.693       0.553     0.628       0.755
GNOME       0.817       0.737     0.760       0.835
Much larger training set necessary
✓ ± 2000 reports instead of ± 500 per severity!
Conclusions
✓ It is possible to predict the severity of a reported bug
✓ Short description is the better source for predictions
✓ Cross-component approach works, but requires more training samples