Automatically Generated Patches as Debugging Aids: A Human Study (FSE 2014)

Yida Tao, Jindae Kim, Sunghun Kim

Dept. of CSE, The Hong Kong University of Science and Technology

Chang Xu

State Key Lab for Novel Software Technology, Nanjing University

• Promising research progress
  • ClearView [1]: Prevent all 10 Firefox exploits
  • GenProg [2]: Fix 55/105 real bugs

[1] Automatically Patching Errors in Deployed Software. Perkins et al. SOSP’09
[2] A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. Le Goues et al. ICSE’12

Automatic Program Repair

- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code

“It won't get your bug patched any quicker. You’ll just have shifted the coders' attention away from their own app's bugs, and onto the repair tool’s bugs.”

Automatic Program Repair

#what-could-possibly-go-wrong

• Blackbox repair

• Increasing maintenance cost

• Vulnerable to attack

- Slashdot discussion: http://science.slashdot.org/story/09/10/29/2248246/Fixing-Bugs-But-Bypassing-the-Source-Code
- A human study of patch maintainability. ISSTA’12
- Automatic patch generation learned from human-written patches. ICSE’13

#program-out-of-control

Use automatically generated patches as debugging aids

Our Human Study

• Investigate the usefulness of generated patches as debugging aids

• Discuss the impact of patch quality on debugging performance

• Explore practitioners’ feedback on adopting automatic program repair

Methodology

[Diagram: Participants debug Bugs; a Debugging aid is given to Participants]

Debugging aids:

• Low-quality generated patch

• High-quality generated patch

• Buggy method location

95 Participants

• Grad: 44 CS graduate students

• Engr: 28 industrial software engineers

• MTurk: 23 Amazon Mechanical Turk workers

44 Graduate students

• Between-group design

• Onsite setting
  • Eclipse IDE
  • Supervised session

14 students

15 students

Remote participants (28 Engr + 23 MTurk)

• Within-group design

• Online debugging system

Bug Selection Criteria

• Real bugs

• The bug has accepted patches written by developers

• Proper number of bugs

• The bug has generated patches with different quality

Automatic patch generation learned from human-written patches. Kim et al. ICSE’13

Auto-generated patch A (High-Quality Patch):

    for (int i=0; i<parenCount; i++) {
        SubString sub = (SubString)parens.get(i);
        if (sub != null) {
            args[i+1] = sub.toString();
        }
    }

Auto-generated patch B (Low-Quality Patch):

    for (int i=0; i<parenCount; i++) {
        SubString sub = (SubString)parens.get(i);
        args[parenCount+1] = new Integer(reImpl.leftContext.length);
    }

Patch quality: avg. ranking from 85 devs and students
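The effect of the null check in patch A can be reproduced in isolation. The sketch below is a hypothetical stand-in, not Rhino's code: `captures` plays the role of `parens`, plain strings replace `SubString`, and a `null` entry is assumed to model the case that triggers the bug. Under those assumptions, the guarded loop skips the `null` entry while the unguarded loop fails with a `NullPointerException`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the patched loop; names are illustrative, not Rhino's.
class NullGuardSketch {

    // Unguarded loop: throws NullPointerException when an entry is null.
    static Object[] copyUnguarded(List<String> captures) {
        Object[] args = new Object[captures.size() + 1];
        for (int i = 0; i < captures.size(); i++) {
            String sub = captures.get(i);
            args[i + 1] = sub.toString();
        }
        return args;
    }

    // Loop with the null check from patch A: null entries are skipped.
    static Object[] copyGuarded(List<String> captures) {
        Object[] args = new Object[captures.size() + 1];
        for (int i = 0; i < captures.size(); i++) {
            String sub = captures.get(i);
            if (sub != null) {
                args[i + 1] = sub.toString();
            }
        }
        return args;
    }

    public static void main(String[] args) {
        List<String> captures = Arrays.asList("a", null, "c");
        System.out.println(Arrays.toString(copyGuarded(captures)));
        try {
            copyUnguarded(captures);
        } catch (NullPointerException e) {
            System.out.println("unguarded loop failed on a null entry");
        }
    }
}
```

The guard only avoids the crash; whether leaving the slot empty is the right behavior is the kind of judgment the study asks participants to make.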

Participants submit 337 patches as their debugging outcome

# submitted patches w.r.t. debugging aid

• Location: 109

• LowQ: 112

• HighQ: 116

# submitted patches w.r.t. bugs

• Bug1: 66

• Bug2: 74

• Bug3: 59

• Bug4: 76

• Bug5: 62

Evaluation of debugging performance

Patch Correctness

• Passing test cases

• Matching the semantics of original accepted patches

• 3 evaluators

Debugging Time

• Eclipse Plug-in

• Website Timer

Multiple Regression Analysis

• Independent variables
  • Debugging aids
  • Bugs
  • Participant types
  • Programming experience

correctness = α0 + α1 ∙ x1 + α2 ∙ x2 + α3 ∙ x3 + α4 ∙ x4

debugging time = β0 + β1 ∙ x1 + β2 ∙ x2 + β3 ∙ x3 + β4 ∙ x4
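How a model of this shape is fitted can be sketched with ordinary least squares. The example assumes two made-up numeric predictors and made-up responses; the study's actual encoding of x1 to x4 (for instance, how the three debugging aids are coded) and the statistics software used are not given on the slides. The hypothetical `fit` method solves the normal equations (X'X)b = X'y by Gaussian elimination.

```java
// Minimal ordinary-least-squares sketch; data and class name are hypothetical.
class OlsSketch {

    // Returns [b0, b1, ..., bk] minimizing the squared error of y ~ b0 + sum(bj * xj).
    static double[] fit(double[][] x, double[] y) {
        int n = x.length;
        int k = x[0].length + 1;             // +1 for the intercept column
        double[][] a = new double[k][k + 1]; // augmented matrix [X'X | X'y]
        for (int r = 0; r < n; r++) {
            double[] row = new double[k];
            row[0] = 1.0;
            System.arraycopy(x[r], 0, row, 1, k - 1);
            for (int i = 0; i < k; i++) {
                for (int j = 0; j < k; j++) {
                    a[i][j] += row[i] * row[j];
                }
                a[i][k] += row[i] * y[r];
            }
        }
        // Gaussian elimination with partial pivoting.
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("predictors are collinear");
            }
            for (int r = col + 1; r < k; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= k; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[k];
        for (int i = k - 1; i >= 0; i--) {
            double s = a[i][k];
            for (int j = i + 1; j < k; j++) {
                s -= a[i][j] * beta[j];
            }
            beta[i] = s / a[i][i];
        }
        return beta;
    }

    public static void main(String[] args) {
        // Made-up observations with two predictors x1 and x2.
        double[][] x = {{0, 1}, {1, 1}, {0, 2}, {1, 2}, {0, 3}, {1, 3}};
        // Generated as y = 10 - 2*x1 + 5*x2, so the fit should recover those values.
        double[] y = {15, 13, 20, 18, 25, 23};
        double[] beta = fit(x, y);
        System.out.printf("b0=%.2f b1=%.2f b2=%.2f%n", beta[0], beta[1], beta[2]);
    }
}
```

The sign of a fitted coefficient says in which direction the outcome moves with that variable; judging significance additionally needs standard errors and p-values, which this sketch does not compute.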

Post-study Survey

• Helpfulness of debugging aids

• Difficulty of bugs

• Opinions on using generated patches as debugging aids

Measures: correctness, debugging time, survey feedback

Results

1. High-quality patches significantly improve debugging correctness

• Positive coefficient = 1.25

• p-value = 0.00 < 0.05

2. Low-quality patches can slightly undermine debugging correctness

• Negative coefficient = -0.55

• p-value = 0.09

[Chart: % of correct patches for Location, LowQ, HighQ; surviving value label: 48%]

3. High-quality patches are more useful for difficult bugs

• Bug1: Math-280

• Bug2: Rhino-114493

• Bug3: Rhino-192226

• Bug4: Rhino-217379

• Bug5: Rhino-76683

[Chart: Bug1 to Bug5 for Location, LowQ, HighQ; axis label: Bug Difficulty]

4. The type of debugging aid does not affect debugging time

[Chart: Debugging time (min) for Location, LowQ, HighQ]

5. Other factors’ impact on debugging performance

• Difficult bugs significantly slow down debugging

• Engr and MTurk are more likely to debug correctly

• Novices tend to benefit more from HighQ patches

Helpfulness of debugging aids

Rating scale: Very helpful, Helpful, Medium, Slightly Helpful, Not Helpful

Participants consider high-quality generated patches much more helpful than low-quality patches

• Mann-Whitney U test: p-value = 0.001
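For reference, the U statistic behind this test can be computed directly from ranks. The sketch below uses hypothetical ratings on a 1-to-5 scale (5 = very helpful), not the study's data, and assigns average ranks to ties. Turning U into a p-value (exact tables or a normal approximation) is left out.

```java
import java.util.Arrays;

// Mann-Whitney U statistic with average ranks for ties; ratings are hypothetical.
class MannWhitneySketch {

    // Returns U for sample a: the number of (a, b) pairs with a > b, counting ties as 0.5.
    static double uStatistic(double[] a, double[] b) {
        int n = a.length + b.length;
        double[] all = new double[n];
        System.arraycopy(a, 0, all, 0, a.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        double[] sorted = all.clone();
        Arrays.sort(sorted);

        double rankSumA = 0.0;
        for (double v : a) {
            // Find the first and last index of v in the pooled, sorted sample.
            int first = 0;
            while (sorted[first] < v) {
                first++;
            }
            int last = first;
            while (last + 1 < n && sorted[last + 1] == v) {
                last++;
            }
            // Average of the 1-based ranks first+1 .. last+1.
            rankSumA += (first + 1 + last + 1) / 2.0;
        }
        return rankSumA - a.length * (a.length + 1) / 2.0;
    }

    public static void main(String[] args) {
        double[] highQ = {5, 4, 4, 3}; // hypothetical ratings for high-quality patches
        double[] lowQ = {3, 2, 2, 1};  // hypothetical ratings for low-quality patches
        System.out.println("U(highQ) = " + uStatistic(highQ, lowQ));
        System.out.println("U(lowQ)  = " + uStatistic(lowQ, highQ));
    }
}
```

The two U values always sum to the product of the sample sizes, so a U for one group close to that product means its ratings are almost uniformly higher.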

Feedback

Quick starting point

• Point to the buggy area

• Brainstorm

Confusing, incomplete, misleading

• Wrong lead, especially for novices

• Require further refinement by humans

“They would seem to be useful in helping find various ideas around fixing the issue, even if the patch isn’t always correct on its own.”

“Generated patches would be good at recognizing obvious problems”

“…but may not recognize more involved defects.”

“Generated patches simplify the problem”

“…but they may over-simplify it by not addressing the root cause.”

“I would use generated patches as debugging aids, as they provide extra diagnostic information”

“…along with access to standard debugging tools.”

Threats to Validity

• Bugs and generated patches may not be representative

• Quality measure of generated patches may not generalize

• May not generalize to domain experts

• Possibility of blindly reusing generated patches
  • Remove patches that are submitted in less than 1 minute
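The time-based filter amounts to a threshold on recorded debugging time. A minimal sketch follows, in which the `Submission` type, its fields, and the sample values are hypothetical; only the 1-minute cutoff comes from the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter for suspiciously fast submissions; not the study's tooling.
class SubmissionFilterSketch {

    // Hypothetical record of one submitted patch and the time spent on it.
    static class Submission {
        final String participant;
        final long debuggingSeconds;

        Submission(String participant, long debuggingSeconds) {
            this.participant = participant;
            this.debuggingSeconds = debuggingSeconds;
        }
    }

    static final long MIN_SECONDS = 60; // 1-minute cutoff from the slide

    // Keeps only submissions that took at least the cutoff.
    static List<Submission> dropTooFast(List<Submission> all) {
        List<Submission> kept = new ArrayList<>();
        for (Submission s : all) {
            if (s.debuggingSeconds >= MIN_SECONDS) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Submission> all = new ArrayList<>();
        all.add(new Submission("p1", 45));  // under a minute: dropped
        all.add(new Submission("p2", 60));  // exactly a minute: kept
        all.add(new Submission("p3", 600)); // ten minutes: kept
        for (Submission s : dropTooFast(all)) {
            System.out.println(s.participant + " " + s.debuggingSeconds + "s");
        }
    }
}
```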

Takeaway

• Auto-generated patches can be useful as debugging aids
  • Participants fix bugs more correctly with auto-generated patches

• Quality control is required
  • Participants’ debugging correctness is compromised with low-quality generated patches

• Maximize the benefits
  • Difficult bugs
  • Novice developers
