Static and Adaptive Bug Fix Patterns

Jim Whitehead, Sung Kim, Kai Pan

University of California, Santa Cruz



Bug and Bug Fix Patterns?

• Are bugs and bug fixes random in their goal and structure, or do they exhibit patterns?

• We know some patterns exist, since existing pattern-oriented static analysis tools are able to detect some bugs

• Hypothesis: there are both project-specific and project-independent patterns that are detectable in bugs and bug fixes


Static and Adaptive Bug Fix Patterns

• Static: syntax-driven change patterns
► Example: changing an if condition expression
► Found by statically analyzing code to detect conformance to a pattern
► Horizontal: same pattern can be found in multiple projects

• Adaptive: memory-driven change patterns
► Example: frequent string literal changes
► Found by detecting a previous similar bug fix in a project-specific bug fix database, or “memory”
► Vertical: each pattern is specific to a given project


Promise of Bug and Fix Patterns

• If bugs exhibit detectable patterns, it would be possible to automatically detect bugs

• If there are common bug-to-fix mappings, it would be possible to supply a recommended fix for a detected bug

• Using bug fix patterns, it would be possible to see the frequency distribution of the patterns

► This would help show which kinds of patterns occur most frequently

• Broadly, such patterns would contribute to an improved understanding of maintenance activity


Talk Overview

• Terminology and Detection of Bug Fix Changes
• Static Bug Fix Patterns
• Adaptive Bug Fix Patterns
• Conclusions


Retrieving Bug Fix Changes

• Software projects today record their development history using Software Configuration Management (SCM) tools

• As developers make changes, they record a reason along with the change
► In the change log message

• When developers fix a bug in the software, they tend to record log messages with some variation of the words “fixed” or “bug”
► “Fixed null pointer bug”

• It is possible to mine the change history of a software project to uncover these bug-fix changes

• That is, we retrospectively recover those changes that developers have marked as containing a bug fix
► We assume they are not lying
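The keyword heuristic above can be sketched as a small classifier. This is a minimal illustration, not the authors' mining tool. It assumes that a case-insensitive, whole-word match on "fix", "fixes", "fixed", "fixing", "bug", or "bugs" is enough. The class name BugFixClassifier is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical keyword-based classifier for SCM log messages (illustrative sketch). */
final class BugFixClassifier {
    // Assumed keywords, matched as whole words in any case:
    // fix, fixes, fixed, fixing, bug, bugs.
    private static final Pattern FIX_KEYWORDS =
            Pattern.compile("\\b(fix(es|ed|ing)?|bugs?)\\b", Pattern.CASE_INSENSITIVE);

    private BugFixClassifier() {}

    /** Returns true if the log message appears to record a bug fix. */
    static boolean isBugFix(String logMessage) {
        return logMessage != null && FIX_KEYWORDS.matcher(logMessage).find();
    }
}
```

Whole-word matching prevents messages such as "debugging output" or "prefix handling" from being counted as fixes.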


Bug-introducing and bug-fix changes

[Figure: development history of foo.java. A software change introduces the bug (“bug-introducing”). Later, Bug #567 is entered into the issue tracking system (the bug is finally observed and recorded). Finally, a change with SCM log message “Bug #567 fixed” is the “bug fix”.]


Commits, Transactions & Configurations

[Figure: CVS file commits, the transactions they form, and the resulting configurations. Each transaction is labeled with its log message: “Added feature X”, “Fixed null ptr bug”, “Modified button text”, “Added feature Y”.]
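CVS records one commit per file, so file commits must be regrouped into transactions before a change can be analyzed as a unit. The slide does not say how this grouping is done. The sketch below assumes a common sliding-window heuristic: commits with the same author and log message, each within windowSeconds of the previous one, form one transaction. TransactionGrouper, FileCommit, and the window parameter are hypothetical. The code assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: regroups per-file CVS commits into transactions. */
final class TransactionGrouper {
    /** One per-file commit as recorded by CVS (hypothetical field set). */
    record FileCommit(String file, String author, String message, long timeSeconds) {}

    private TransactionGrouper() {}

    /**
     * A commit joins an existing transaction when it has the same author and log message
     * and lies within windowSeconds of that transaction's latest commit (sliding window);
     * otherwise it starts a new transaction.
     */
    static List<List<FileCommit>> groupIntoTransactions(List<FileCommit> commits, long windowSeconds) {
        List<FileCommit> sorted = new ArrayList<>(commits);
        sorted.sort(Comparator.comparingLong(FileCommit::timeSeconds));
        List<List<FileCommit>> transactions = new ArrayList<>();
        for (FileCommit c : sorted) {
            List<FileCommit> target = null;
            for (List<FileCommit> t : transactions) {
                FileCommit last = t.get(t.size() - 1);
                boolean sameChange = last.author().equals(c.author())
                        && last.message().equals(c.message());
                if (sameChange && c.timeSeconds() - last.timeSeconds() <= windowSeconds) {
                    target = t;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                transactions.add(target);
            }
            target.add(c);
        }
        return transactions;
    }
}
```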


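Hunks and Hunk Pairs

[Figure: revision n-1 (has bug hunks) compared with revision n (has fix hunks). Each hunk pair type: modification (deleted hunk paired with added hunk), addition (empty deleted hunk paired with added hunk), deletion (deleted hunk paired with empty added hunk).]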
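The three hunk pair types follow directly from which side of the pair is empty. Below is a minimal sketch, assuming each hunk is given as its list of source lines. HunkPairs and PairType are hypothetical names.

```java
import java.util.List;

/** Hypothetical classifier for the hunk pair types (illustrative sketch). */
final class HunkPairs {
    enum PairType { MODIFICATION, ADDITION, DELETION }

    private HunkPairs() {}

    /**
     * Classifies a hunk pair from its deleted hunk (lines of revision n-1, the bug hunk)
     * and its added hunk (lines of revision n, the fix hunk).
     */
    static PairType classify(List<String> deletedHunk, List<String> addedHunk) {
        boolean noDeleted = deletedHunk.isEmpty();
        boolean noAdded = addedHunk.isEmpty();
        if (noDeleted && noAdded) {
            throw new IllegalArgumentException("a hunk pair needs at least one non-empty hunk");
        }
        if (noDeleted) {
            return PairType.ADDITION;  // empty deleted hunk + added hunk
        }
        if (noAdded) {
            return PairType.DELETION;  // deleted hunk + empty added hunk
        }
        return PairType.MODIFICATION;  // deleted hunk + added hunk
    }
}
```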


Static Bug Fix Patterns


Static Bug Patterns

• Performed manual analysis of bug fix hunk pairs in Java programs
► Examined bug hunks and corresponding fix hunks
► Looked for syntax patterns of recurring changes
► Identified 27 static bug fix patterns in Java code


Example Pattern

• Method Call with Different Actual Parameter Values (MC-DAP)

► The bug fix changes the expression passed into one or more parameters of a method call

- tree.putClientProperty("JTree.lineStyle", "Horizontal");

+ tree.putClientProperty("JTree.lineStyle", "Angled");

- = bug revision

+ = fix revision
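Here is a rough single-line detector for MC-DAP, included only to illustrate the pattern's definition. The bug and fix lines must call the same (possibly qualified) method with the same number of arguments, and at least one argument expression must differ. The detector matches calls with a regular expression instead of parsing Java. McDapDetector is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical single-line MC-DAP detector (illustrative sketch, not a Java parser). */
final class McDapDetector {
    // "name(args);" where name may be qualified, e.g. "tree.putClientProperty(...);"
    private static final Pattern CALL =
            Pattern.compile("\\s*([\\w.]+)\\s*\\((.*)\\)\\s*;\\s*");

    private McDapDetector() {}

    /** True if both lines call the same method with equal arity but different arguments. */
    static boolean isMcDap(String bugLine, String fixLine) {
        Matcher bug = CALL.matcher(bugLine);
        Matcher fix = CALL.matcher(fixLine);
        if (!bug.matches() || !fix.matches() || !bug.group(1).equals(fix.group(1))) {
            return false;
        }
        List<String> bugArgs = splitArgs(bug.group(2));
        List<String> fixArgs = splitArgs(fix.group(2));
        return bugArgs.size() == fixArgs.size() && !bugArgs.equals(fixArgs);
    }

    /** Splits on top-level commas, skipping commas inside string literals or nested parentheses. */
    static List<String> splitArgs(String args) {
        List<String> out = new ArrayList<>();
        if (args.isBlank()) {
            return out;
        }
        StringBuilder cur = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < args.length(); i++) {
            char ch = args.charAt(i);
            if (inString) {
                if (ch == '\\' && i + 1 < args.length()) {
                    cur.append(ch).append(args.charAt(++i)); // keep escape sequence intact
                    continue;
                }
                if (ch == '"') {
                    inString = false;
                }
            } else if (ch == '"') {
                inString = true;
            } else if (ch == '(') {
                depth++;
            } else if (ch == ')') {
                depth--;
            } else if (ch == ',' && depth == 0) {
                out.add(cur.toString().trim());
                cur.setLength(0);
                continue;
            }
            cur.append(ch);
        }
        out.add(cur.toString().trim());
        return out;
    }
}
```

A pair with the same method name but a different argument count fails this check. It belongs instead to the separate "different number of parameters" pattern in the method call category.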


Static Bug Fix Pattern Categories

• Nine categories of static bug fix patterns
► If-related
► Method call
► Sequence
► Loop
► Assignment
► Switch
► Try
► Method declaration
► Class field


If Patterns

• Addition of precondition check
► Adds an if around existing statement(s)

• Addition of precondition check with jump
► Adds an if before statement(s), with a return/continue/break if the condition is met

• Addition of postcondition check
► Adds an if statement after an operation to check its results

• Removal of if predicate
► Removes the if surrounding statement(s)

• Addition of else branch
► Adds an else branch to an existing if statement

• Removal of else branch
► Removes the else branch from an existing if statement

• Change of if condition expression
► Modifies the conditional part of an if statement
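As a hypothetical illustration of "addition of precondition check with jump", the bug and fix revisions of an invented method might look like this. The fix inserts an if whose body returns before the statement that failed. Neither the method nor its names come from the analyzed projects.

```java
/** Hypothetical bug/fix pair illustrating "addition of precondition check with jump". */
final class PreconditionExample {
    private PreconditionExample() {}

    /** Bug revision: throws NullPointerException when name is null. */
    static int nameLengthBuggy(String name) {
        return name.trim().length();
    }

    /** Fix revision: an if with a jump (return) now guards the original statement. */
    static int nameLengthFixed(String name) {
        if (name == null) {
            return 0; // precondition check with jump, added by the fix
        }
        return name.trim().length();
    }
}
```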


Method Call Patterns

• Method call with different number of parameters or different types of parameters

► Same method name, but different number of parameters or types of parameters

► Change of method interface, or use of overloaded method

• Method call with different actual parameter values

• Change of class instance method call
► Fix code calls a different member method of a class instance


Sequence Patterns

• Addition of operations in an operation sequence of method calls to an object

► Many calls to the same object all in sequence – add one or more

• Removal of operations from an operation sequence of method calls to an object

► Many calls to the same object in sequence – remove one or more

• Addition of operations in a field setting sequence

• Removal of operations from a field setting sequence

• Addition or removal of method calls in a short construct body
► A short construct body is a short method (2 or 3 statements), or an if or while body that is short (2 or 3 statements)


Loop and Assignment Patterns

• Change of loop predicate
► Bug fix changes the loop condition of a loop statement

• Change of expression that modifies the loop variable
► Bug fix changes the expression that modifies the loop variable, or adds a statement that modifies the loop variable

• Change of assignment expression
► Bug fix changes the expression on the right-hand side of an assignment statement


Switch and Try Patterns

• Addition/removal of switch branch
► Bug fix adds/removes a case from a switch statement

• Addition/removal of try statement
► Bug fix adds a try/catch statement to enclose a section of code, or removes a try/catch statement

• Addition/removal of a catch block
► Bug fix adds a catch block to, or removes one from, an existing try statement


Method Declaration and Class Field Patterns

• Change of method declaration
► Change to the declared interface for a method

• Addition of method declaration
► Adding a new method to an existing class

• Removal of method declaration
► Removal of an existing method

• Addition of a class field
• Removal of a class field
• Change of class field declaration


Evolutionary Pattern Analysis

• How many bug fixes contain a pattern?

• How frequently do these patterns occur in actual bug fixes?

• Are pattern frequencies consistent across projects?

• Analyzed five Java open source project histories

• Ran bug fix pattern detector program over bug fix changes

Project    Revisions   Bug Fixes
ArgoUML    4,685       1,310
Columba    2,362       797
Eclipse    6,394       2,807
JEdit      1,190       557
Scarab     2,962       535


Pattern Coverage

• What percentage of bug fixes contain at least one pattern? (About half)

[Figure: pattern coverage by project: ArgoUML 44.8%, Columba 47.5%, Eclipse 52.6%, JEdit 56.4%, Scarab 45.8%]


Frequency of pattern categories

Category             ArgoUML  Columba  Eclipse  JEdit  Scarab
If-related           23.2%    20.0%    34.0%    30.5%  23.0%
Method call          30.3%    26.2%    26.5%    22.2%  33.9%
Sequence             9.7%     17.5%    6.5%     13.5%  9.5%
Loop                 1.9%     0.8%     2.2%     1.6%   1.4%
Assignment           8.6%     7.6%     6.4%     8.4%   7.4%
Switch               0.0%     0.3%     1.6%     0.6%   0.0%
Try                  1.0%     1.9%     2.6%     1.0%   1.4%
Method declaration   16.2%    17.2%    13.2%    13.4%  16.6%
Class field          7.6%     8.4%     7.0%     8.7%   6.7%


Cross project similarity

• Pearson correlation between the pattern frequencies of each pair of projects (p-value < 0.001)

• Projects have surprisingly similar pattern frequencies

           ArgoUML  Columba  Eclipse  JEdit  Scarab
ArgoUML    1        0.94     0.89     0.93   0.99
Columba    0.94     1        0.76     0.87   0.93
Eclipse    0.89     0.76     1        0.94   0.89
JEdit      0.93     0.87     0.94     1      0.92
Scarab     0.99     0.93     0.89     0.92   1
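For reference, here is a minimal Pearson correlation coefficient in Java. The slide does not say whether the coefficients above were computed over the 27 individual patterns or the nine categories, so this sketch does not claim to reproduce the table. The class name is hypothetical.

```java
/** Minimal Pearson correlation coefficient over two equal-length samples (sketch). */
final class Pearson {
    private Pearson() {}

    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        // Sum of co-deviations and squared deviations from the means.
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) {
            throw new IllegalArgumentException("correlation is undefined for a constant sample");
        }
        return cov / Math.sqrt(varX * varY);
    }
}
```

For example, calling Pearson.correlation on the ArgoUML category column (23.2, 30.3, 9.7, 1.9, 8.6, 0.0, 1.0, 16.2, 7.6) and the Scarab column (23.0, 33.9, 9.5, 1.4, 7.4, 0.0, 1.4, 16.6, 6.7) from the previous table gives a category-level coefficient. The p-value would need a separate significance test.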


Most frequent individual patterns

Pattern                                        ArgoUML  Columba  Eclipse  JEdit  Scarab
Method call with different actual parameters   24.0%    19.9%    18.0%    15.1%  26.1%
Change of if condition expression              10.9%    7.0%     18.7%    13.1%  11.0%

• Only two patterns consistently occur at over 10% frequency (the one exception is change of if condition expression in Columba, at 7.0%)


Diving into if conditionals

• What is causing if conditionals to be such a prevalent bug fix type? (no clear answer yet)

                                 ArgoUML  Eclipse  JEdit
Added condition clause           13.1%    20.8%    23.1%
Removed condition clause         11.5%    6.9%     11.2%
Added new variable               8.3%     14.3%    23.7%
Removed existing variable        12.0%    9.7%     15.1%
Increased number of operators    22.4%    22.3%    38.0%
Decreased number of operators    14.6%    15.1%    21.0%


Static Pattern Summary

• Can automatically detect 27 static bug fix patterns

• About 50% of all bug fix changes match at least one pattern

• If conditionals and method call parameter changes are the two most prevalent patterns

• Pattern frequencies are remarkably similar across analyzed projects


Adaptive Bug Fix Patterns


Project-Specific Bug Fix Patterns

• There are many bug fix patterns that are specific to an individual project, and may not match one of the static patterns

• Example from Eclipse project:
► JavaProject.java, transaction 2024 (“Fix for bug 28434”)

- if (requiredProjectRsc.exists() && requiredProjectRsc.isOpen()) {

+ if (JavaProject.hasJavaNature(requiredProjectRsc))

► DeltaProcessor.java, transaction 1945 (“Fix for bug 27499”)

- boolean isOpened=proj.isOpen();

- if (isOpened && this.hasJavaNature(proj))

+ if (JavaProject.hasJavaNature(proj))


Detecting Non-Static Patterns

• Detecting non-static patterns
► Saving exact code in bug and fix hunks doesn’t work, since there is rarely an exact match
► Need a method for abstracting changes to find patterns

• Approach
► Abstract code in each bug fix change
► Save abstracted bug and fix code in a database (the “bug fix memory”)
► Can search existing code to see if it matches a bug fix pattern
► Can suggest code to fix the bug


Adaptive Patterns

• Since the contents of the bug fix memory come from a specific project, the patterns it contains adapt to that project

• The set of known patterns changes over time, as information from new bug fixes is added.

• Can view the bug fix memory as a kind of online algorithm for learning project-specific bug fix patterns


Process for Abstracting Code

• Four-step process
► Raw component extraction
• Parse source code in a hunk, and burst out individual syntactic elements
► Normalization
• Substitute type names for variables, string literals, and constants (abstract to types)
► Information filtering
• Remove elements that are too common to yield project-specific patterns
► Diff filtering
• Remove code components that are common to bug and fix hunks, yielding only code unique to the change


Raw Component Extraction

• Step 1: Convert statements inside change hunks so they lie on a single line
► Eliminate whitespace
► Concatenate multi-line statements to one line
► Concatenate conditionals for complex statements (if, while, etc.) to one line

• Step 2: Extract raw components
► Component is a non-leaf node in the syntax tree of a single line
► Bursts out complex statements into constituent parts
• Each portion of a complex conditional is a separate component
► Additionally, separate out a method call and its parameters
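Step 1 can be sketched as a pass that joins a hunk's physical lines into one logical line per statement. The sketch makes three assumptions. First, a statement ends at a ';', '{', or '}' that lies outside parentheses and string literals. Second, each whitespace run is collapsed to one space rather than deleted, so that "int x" does not become "intx". Third, comments and char literals are ignored. StatementJoiner is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of step 1: one logical line per statement. */
final class StatementJoiner {
    private StatementJoiner() {}

    static List<String> toLogicalLines(List<String> hunkLines) {
        String text = String.join("\n", hunkLines);
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int depth = 0;              // parenthesis depth, so for-loop headers stay whole
        boolean inString = false;
        boolean pendingSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (inString) {
                cur.append(ch);     // string contents are kept verbatim
                if (ch == '\\' && i + 1 < text.length()) {
                    cur.append(text.charAt(++i));
                } else if (ch == '"') {
                    inString = false;
                }
                continue;
            }
            if (Character.isWhitespace(ch)) {
                pendingSpace = cur.length() > 0; // collapse runs; drop leading whitespace
                continue;
            }
            if (pendingSpace) {
                cur.append(' ');
                pendingSpace = false;
            }
            cur.append(ch);
            if (ch == '"') {
                inString = true;
            } else if (ch == '(') {
                depth++;
            } else if (ch == ')') {
                depth--;
            } else if (depth == 0 && (ch == ';' || ch == '{' || ch == '}')) {
                out.add(cur.toString()); // statement terminator at top level
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }
}
```

Applied to the initial code in the component extraction example, this yields five logical lines: one for the if header, one for each of the three statements, and one for the closing brace.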


Component Extraction Example

• Initial code

if (foo.flag >= 5 &&
    foo.ready()) {
  i=1;
  foo.create("example");
  initiate(5,bar);
}

• Extracted components

foo.flag
foo.flag >= 5
foo.ready()
foo.flag >= 5 && foo.ready()
if (foo.flag >= 5 && foo.ready())
i=1
"example"
foo.create() "example"
initiate(,) 5, bar

[Figure: syntax tree of the if statement, with nodes if, &&, >=, ., ., foo, flag, 5, foo, ready()]


Normalization

• To further improve the ability to match code, perform abstraction of instances to types
► Replace variable instance with its type
• Permits matching on type, rather than instance
• foo.flag >= 5 → Foo.flag >= 5 (type of foo is Foo)
► For literals, insert new component with type
• i=1 yields int=1 and int=int
► For method calls, replace each parameter with type of parameter
• Use “*” for unknown types (we only do a one-pass parse)
• initiate(,) 5, bar → initiate(,) int,* (type of bar is unknown)
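Here is a minimal sketch of normalization. It assumes the one-pass parse has already produced a map from variable names to declared types. Identifiers missing from that map become "*", and only integer and string literals are recognized. Normalizer and its methods are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of instance-to-type normalization. */
final class Normalizer {
    private Normalizer() {}

    /** Type of one argument token: a literal type, a known variable type, or "*" if unknown. */
    static String typeOf(String token, Map<String, String> varTypes) {
        String t = token.trim();
        if (t.matches("-?\\d+")) {
            return "int";
        }
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            return "String";
        }
        return varTypes.getOrDefault(t, "*");
    }

    /** Normalizes a call such as initiate(5,bar) to "initiate(,) int,*". */
    static String normalizeCall(String method, List<String> args, Map<String, String> varTypes) {
        if (args.isEmpty()) {
            return method + "()";
        }
        String slots = "(" + ",".repeat(args.size() - 1) + ")";
        String types = args.stream()
                .map(a -> typeOf(a, varTypes))
                .collect(Collectors.joining(","));
        return method + slots + " " + types;
    }

    /** Replaces a leading variable in a member access ("foo.flag >= 5") with its type, if known. */
    static String normalizeReceiver(String component, Map<String, String> varTypes) {
        int dot = component.indexOf('.');
        if (dot <= 0) {
            return component;
        }
        String type = varTypes.get(component.substring(0, dot));
        return type == null ? component : type + component.substring(dot);
    }
}
```

With foo mapped to Foo, this turns foo.flag >= 5 into Foo.flag >= 5, and the call initiate(5,bar) into initiate(,) int,*, matching the slide.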


Information Filtering Goal

• After normalization, resulting components are candidates for insertion into the database
► Problem: many commonly occurring statement types
• int=int
► Want to eliminate these, and others that don’t contribute unique information about bug fixes


Information Filtering Approach

• Assign an “information value” to component elements
► Value 2:
• method call, string literal longer than 8 chars
► Value 1:
• predicates for: if, do, while, for, as well as conditional expressions
• return, case, switch, synchronized, throw
• string literal, length 3-8 chars
• variable name, field name, class name, variable type
► Value 0:
• Everything else

• Information value for an entire component is the sum of its elemental information values

• We remove components with information value < 2
► int=1 (info value = 1), int=int (info value = 0)
► “example” (info value = 1), String (info value = 0)
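The scoring rule can be sketched as follows, assuming each component has already been split into elements tagged with a kind. Three further points are assumptions: the Kind values group the slide's categories, string length is measured without the quotes, and literals shorter than 3 characters count as "everything else". The class and method names are also hypothetical. The code assumes Java 16 or later.

```java
import java.util.List;

/** Hypothetical sketch of information-value filtering. */
final class InformationFilter {
    /** PREDICATE: if/do/while/for predicates and conditional expressions;
     *  KEYWORD: return, case, switch, synchronized, throw;
     *  IDENTIFIER: variable, field, or class name, or variable type. */
    enum Kind { METHOD_CALL, STRING_LITERAL, PREDICATE, KEYWORD, IDENTIFIER, OTHER }

    /** One element of a component; for string literals, text excludes the quotes. */
    record Element(Kind kind, String text) {}

    private InformationFilter() {}

    static int elementValue(Element e) {
        return switch (e.kind()) {
            case METHOD_CALL -> 2;
            case STRING_LITERAL -> {
                int len = e.text().length();
                yield len > 8 ? 2 : (len >= 3 ? 1 : 0);
            }
            case PREDICATE, KEYWORD, IDENTIFIER -> 1;
            case OTHER -> 0;
        };
    }

    /** A component's value is the sum of its elements' values. */
    static int componentValue(List<Element> component) {
        return component.stream().mapToInt(InformationFilter::elementValue).sum();
    }

    /** Components with information value below 2 are removed. */
    static boolean keep(List<Element> component) {
        return componentValue(component) >= 2;
    }
}
```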


Diff Filtering and Storing Memories

• As a final filtering step, keep only those components that are unique to either bug or fix hunks

► Components that appear in both hunks are eliminated, since they represent neither the bug nor its fix

• After diff filtering step, store all components into the database (“memory”)

► Components record their transaction, file name, bug or fix hunk, etc.

► Also store initial source code of bug and fix hunks
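Diff filtering amounts to two set differences. Below is a minimal sketch, assuming components are compared as normalized strings. DiffFilter is a hypothetical name, and the code assumes Java 16 or later.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of diff filtering over normalized components. */
final class DiffFilter {
    /** Components that survive diff filtering, split by the hunk they came from. */
    record Result(Set<String> bugOnly, Set<String> fixOnly) {}

    private DiffFilter() {}

    static Result filter(List<String> bugComponents, List<String> fixComponents) {
        Set<String> inBug = new LinkedHashSet<>(bugComponents);
        Set<String> inFix = new LinkedHashSet<>(fixComponents);
        Set<String> bugOnly = new LinkedHashSet<>(inBug);
        bugOnly.removeAll(inFix);   // drop components shared with the fix hunk
        Set<String> fixOnly = new LinkedHashSet<>(inFix);
        fixOnly.removeAll(inBug);   // drop components shared with the bug hunk
        return new Result(bugOnly, fixOnly);
    }
}
```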


Searching the Memory

• The memory database contains extracted adaptive bug and fix patterns for a given project

• Can use this memory to find code that matches bug code in the memory

• Use scenario
► Developer working in their favorite development environment
► Receives feedback when code they are developing matches a stored bug pattern
► Can also suggest potential fixes from stored bug fix code
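One way to picture the memory and its search is as an index from bug components to stored fix code. The sketch assumes that new code "matches" a stored bug pattern when it shares at least one component with it. The slides do not give the actual matching criterion, and BugFixMemory and Entry are hypothetical. The code assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a bug fix memory indexed by bug components. */
final class BugFixMemory {
    /** One stored change: its transaction, file name, and original fix hunk source. */
    record Entry(int transaction, String file, String fixSource) {}

    private final Map<String, List<Entry>> byBugComponent = new HashMap<>();

    /** Stores the diff-filtered bug components of one bug fix change. */
    void remember(Set<String> bugComponents, Entry entry) {
        for (String c : bugComponents) {
            byBugComponent.computeIfAbsent(c, k -> new ArrayList<>()).add(entry);
        }
    }

    /** Returns stored changes sharing at least one component with the new code (assumed rule). */
    Set<Entry> lookup(Set<String> codeComponents) {
        Set<Entry> hits = new LinkedHashSet<>();
        for (String c : codeComponents) {
            hits.addAll(byBugComponent.getOrDefault(c, List.of()));
        }
        return hits;
    }
}
```

Each hit carries the stored fix source, which is what a development environment could offer as a suggested fix.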


Evaluation

• We evaluated the memory to determine how well it captures new bug fix changes

► Specifically, we create a memory for transactions 1 to n-1
► At transaction n, for bug fix changes, we examine whether the bug hunks are found in the memory
• This is a “half hit”
► If found, we also examine whether the fix hunk is found too
• This is a “full hit”
► Examined the same 5 project histories as for static patterns
• ArgoUML, Columba, Eclipse, jEdit, Scarab

• This can be viewed as a proxy for how well the approach might work for bug and fix prediction
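The evaluation can be sketched as a replay over the transaction history that queries the memory before adding transaction n to it. The sketch makes two simplifications. First, a hunk counts as "found" when any of its components is already in the memory. Second, the fix side is checked against all stored fix components rather than against the matched entry. Both simplifications, the input format, and the class names are hypothetical. Unlike the slides' tally, a change with no added hunk is not counted as an automatic full hit here. Non-fix changes are tallied as well, since they produce the false positives shown on the following slides; for a non-fix change, its old and new hunks play the roles of bug and fix hunks. The code assumes Java 16 or later.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical replay of the half-hit / full-hit evaluation over a transaction history. */
final class MemoryEvaluation {
    /** Diff-filtered components of one transaction (hypothetical input format). */
    record Change(boolean isFix, Set<String> bugComponents, Set<String> fixComponents) {}

    /** Fix changes yield true positive hits; non-fix changes yield false positive hits. */
    record Tally(int fixChanges, int tpHalf, int tpFull, int nonFixChanges, int fpHalf, int fpFull) {}

    private MemoryEvaluation() {}

    static Tally replay(List<Change> history) {
        Set<String> bugMemory = new HashSet<>();
        Set<String> fixMemory = new HashSet<>();
        int fix = 0, tpHalf = 0, tpFull = 0, nonFix = 0, fpHalf = 0, fpFull = 0;
        for (Change c : history) {
            // At transaction n the memory holds only bug fixes from transactions 1 .. n-1.
            boolean half = !Collections.disjoint(c.bugComponents(), bugMemory);
            boolean full = half && !Collections.disjoint(c.fixComponents(), fixMemory);
            if (c.isFix()) {
                fix++;
                if (half) tpHalf++;
                if (full) tpFull++;
                // Only bug fix changes enter the memory, and only after being scored.
                bugMemory.addAll(c.bugComponents());
                fixMemory.addAll(c.fixComponents());
            } else {
                nonFix++;
                if (half) fpHalf++;
                if (full) fpFull++;
            }
        }
        return new Tally(fix, tpHalf, tpFull, nonFix, fpHalf, fpFull);
    }
}
```

Hit rates are then ratios such as tpHalf / fixChanges and fpHalf / nonFixChanges.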


True and False Positives

[Figure: memories are built from transactions 1 .. n-1. At transaction n, a fix change found in the memory is a true positive half hit; a non-fix change found in the memory is a false positive half hit.]


True Positive Hit Rates

[Figure: true positive hit rate by project (ArgoUML, Columba, Eclipse, jEdit, Scarab), showing full hit and half hit; x-axis: projects, y-axis: hit rate (%)]


False Positive Hit Rates

[Figure: false positive hit rate by project (ArgoUML, Columba, Eclipse, jEdit, Scarab), showing full hit and half hit; x-axis: projects, y-axis: hit rate (%)]


True Positive and False Positive Full Hit Rates

[Figure: TP full hit and FP full hit rates by project (ArgoUML, Columba, Eclipse, jEdit, Scarab); x-axis: projects, y-axis: hit rate (%)]


Adaptive Pattern Discussion

• Adaptive bug patterns work well
► Capture 19.3%-40.3% of bugs (half hits)
► But also capture many non-bug changes (20.8%-32.5%)
► The high full hit rate for non-fix changes could be due to changes with no added hunk
• Since there is no code to match in the database, we automatically count these as full hits (it might be better to ignore them)

• Adaptive patterns are more project specific than static patterns

► Better suited for presenting possible bug fixes


Patterns Overall

• If you were to examine all project transactions
► Not by time, grouping fix and non-fix changes together

• A fine-grain characterization of the kinds of changes made over the evolution of a software project?

[Figure: two groups of changes, Fix and Non-Fix, each characterized by Static and Adaptive patterns]


Conclusion

• It is now possible to reliably extract static and adaptive bug fix patterns from software project evolution data

• Static patterns are useful for characterizing bug fixes at a fine-grained syntactic level

• Adaptive patterns are useful for identifying potentially buggy code, and making bug fix recommendations at fine granularity