Mining Software Repositories (MSR) 2010 presentation

Predicting the Severity of a Reported Bug

Ahmed Lamkanfi, Serge Demeyer (Ansymo) | Emanuel Giger (s.e.a.l.) | Bart Goethals (ADReM)

Proceedings of the 2010 7th IEEE Working Conference on Mining Software Repositories, p.1-10

Severity of a bug is important

✓ Critical factor in deciding how soon it needs to be fixed, i.e. when prioritizing bugs

Priority is business

Severity is technical

✓ Severity varies:

➡ trivial, minor, normal, major, critical and blocker

➡ clear guidelines exist to classify severity of bug reports

✓ Both a short and longer description of the problem

✓ Bugs are grouped according to products and components

➡ e.g. plug-ins and bookmarks are components of the product Firefox

Can we accurately predict the severity of a reported bug by analyzing its textual descriptions?

Also the following questions:

Potential indicators?

Short versus long description?

Per component versus cross-component?

Approach

We use text mining to classify bug reports

• Bayesian classifier: based on the probabilistic occurrence of words

• training and evaluation period

• in first instance, per component
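
A minimal sketch of a Bayesian text classifier of this kind, assuming a multinomial naive Bayes model with add-one smoothing over word counts. The slides do not give the exact variant, so the class `NaiveBayes`, the helper `tokenize`, the labels and the toy summaries are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) per class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy summaries, for illustration only.
train_texts = [
    "browser crash on startup",
    "deadlock when saving file",
    "segfault after reboot",
    "typo in preferences dialog",
    "favicon not shown in titlebar",
    "inconsistent underline style",
]
train_labels = ["severe", "severe", "severe",
                "non-severe", "non-severe", "non-severe"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("crash and deadlock on startup")
```

Training one such model per component corresponds to the "per component" setting. The model itself is unaware of components and only sees the reports it is given.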

Non-severe bugs (trivial, minor)

Severe bugs (major, critical, blocker)

Default (normal): undecided
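
The grouping above can be sketched as a small label-mapping step. This is only an illustration: it assumes that reports with the default severity "normal" are left out of the data set, which the "undecided" label suggests but the slides do not spell out. The function name `to_binary_label` and the sample reports are hypothetical.

```python
SEVERE = {"major", "critical", "blocker"}
NON_SEVERE = {"trivial", "minor"}


def to_binary_label(severity):
    """Map a severity level to 'severe' / 'non-severe', or None for 'normal'."""
    level = severity.strip().lower()
    if level in SEVERE:
        return "severe"
    if level in NON_SEVERE:
        return "non-severe"
    if level == "normal":
        return None  # default value: undecided, so leave it out (assumption)
    raise ValueError(f"unknown severity level: {severity!r}")


# Hypothetical reports as (summary, severity) pairs.
reports = [
    ("crash on startup", "critical"),
    ("typo in dialog", "trivial"),
    ("page renders oddly", "normal"),
]
labelled = [(summary, to_binary_label(sev)) for summary, sev in reports]
training_set = [(s, label) for s, label in labelled if label is not None]
```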

Evaluation of the approach:

✓ precision and recall
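
For reference, precision and recall for one class can be computed as below, using the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The function name `precision_recall` and the label lists are invented for illustration and are not taken from the study.

```python
def precision_recall(actual, predicted, target):
    """Return (precision, recall) for the class `target`."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == target and p == target)
    false_pos = sum(1 for a, p in pairs if a != target and p == target)
    false_neg = sum(1 for a, p in pairs if a == target and p != target)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Invented labels, for illustration only.
actual = ["severe", "severe", "severe", "non-severe", "non-severe"]
predicted = ["severe", "severe", "non-severe", "severe", "non-severe"]

severe_p, severe_r = precision_recall(actual, predicted, "severe")
non_severe_p, non_severe_r = precision_recall(actual, predicted, "non-severe")
```

Both values are reported per class, which is why the result tables carry a precision and a recall column for each of Non-severe and Severe.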

Cases drawn from the open-source community

✓ Mozilla, Eclipse and GNOME

Results

How does the basic approach perform?

➡ per component and using short description

                          Non-severe           Severe
Component                 precision  recall    precision  recall
Mozilla: Layout           0.701      0.785     0.752      0.653
Mozilla: Bookmarks        0.692      0.703     0.698      0.687
Eclipse: UI               0.707      0.633     0.668      0.738
Eclipse: JDT-UI           0.653      0.714     0.685      0.621
GNOME: Calendar           0.828      0.783     0.794      0.837
GNOME: Contacts           0.767      0.706     0.728      0.785

What keywords are good indicators of severity?

Mozilla Firefox - General

➡ Non-severe: inconsist, favicon, credit, extra, consum, licens, underlin, typo, inspector, titlebar

➡ Severe: fault, machin, reboot, reinstal, lockup, seemingli, perman, instantli, segfault, compil

Eclipse JDT UI

➡ Non-severe: deprec, style, runnabl, system, cce, tvt35, whitespac, node, put, param

➡ Severe: hang, freez, deadlock, thread, slow, anymor, memori, tick, jvm, adapt

GNOME Mailer

➡ Non-severe: mnemon, outbox, typo, pad, follow, titl, high, acceler, decod, reflec

➡ Severe: deadlock, sigsegv, relat, caus, snapshot, segment, core, unexpectedli, build, loop
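
The slides do not say how these indicator terms were ranked. One plausible way, sketched here purely as an assumption, is to order terms by the smoothed log-ratio of their per-class probabilities, which a naive Bayes model estimates anyway. The function `top_indicators` and the toy summaries are hypothetical, and the input is assumed to be already stemmed, as the listed terms appear to be.

```python
import math
from collections import Counter


def top_indicators(texts, labels, target, k=3):
    """Rank terms by smoothed log-ratio P(term | target) / P(term | other)."""
    in_target, in_other = Counter(), Counter()
    for text, label in zip(texts, labels):
        (in_target if label == target else in_other).update(text.lower().split())
    vocab = set(in_target) | set(in_other)
    n_target = sum(in_target.values()) + len(vocab)
    n_other = sum(in_other.values()) + len(vocab)

    def log_ratio(term):
        p_target = (in_target[term] + 1) / n_target
        p_other = (in_other[term] + 1) / n_other
        return math.log(p_target / p_other)

    # Highest ratio first; ties are broken alphabetically.
    return sorted(vocab, key=lambda t: (-log_ratio(t), t))[:k]


# Invented, already-stemmed toy summaries.
texts = [
    "deadlock on save",
    "deadlock and hang on open",
    "hang on startup",
    "typo on dialog",
    "typo and style issue",
]
labels = ["severe", "severe", "severe", "non-severe", "non-severe"]

severe_terms = top_indicators(texts, labels, "severe", k=2)
non_severe_terms = top_indicators(texts, labels, "non-severe", k=1)
```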

How does the approach perform when using the longer description?

                          Non-severe           Severe
Component                 precision  recall    precision  recall
Mozilla: Layout           0.583      0.961     0.890      0.314
Mozilla: Bookmarks        0.536      0.963     0.820      0.166
Mozilla: Firefox general  0.578      0.948     0.856      0.308
Eclipse: UI               0.548      0.976     0.892      0.197
Eclipse: JDT-UI           0.547      0.973     0.881      0.195
Eclipse: JDT-Text         0.570      0.988     0.955      0.257
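
To make the short versus long comparison easier to read, the Severe-class figures from the two tables can be folded into a single number. The slides report only precision and recall; the F1 score (harmonic mean of the two) used below is an addition for illustration, not a metric taken from the presentation. Only the four components that appear in both tables are included.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) for the Severe class, copied from the two tables.
severe_short = {
    "Mozilla: Layout": (0.752, 0.653),
    "Mozilla: Bookmarks": (0.698, 0.687),
    "Eclipse: UI": (0.668, 0.738),
    "Eclipse: JDT-UI": (0.685, 0.621),
}
severe_long = {
    "Mozilla: Layout": (0.890, 0.314),
    "Mozilla: Bookmarks": (0.820, 0.166),
    "Eclipse: UI": (0.892, 0.197),
    "Eclipse: JDT-UI": (0.881, 0.195),
}

for component in severe_short:
    short_f1 = f1(*severe_short[component])
    long_f1 = f1(*severe_long[component])
    print(f"{component}: short {short_f1:.3f}, long {long_f1:.3f}")
```

With the longer description, precision for severe bugs rises but recall drops sharply, so the combined score is lower for each of these components.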

How does the approach perform when combining bugs from different components?

                          Non-severe           Severe
Component                 precision  recall    precision  recall
Mozilla                   0.704      0.750     0.733      0.685
Eclipse                   0.693      0.553     0.628      0.755
GNOME                     0.817      0.737     0.760      0.835

Much larger training set necessary

✓ ± 2000 reports instead of ± 500 per severity!

Conclusions

✓ It is possible to predict the severity of a reported bug

✓ Short description is a better source for predictions

✓ Cross-component approach works, but requires more training samples
