STAR: STACK TRACE BASED AUTOMATIC CRASH REPRODUCTION
PhD Thesis Defence
November 05, 2013
Ning Chen
Advisor: Sunghun Kim
Outline
1. Motivation & Related Work
2. Approaches of STAR
   1) Crash Precondition Computation
   2) Input Model Generation
   3) Test Input Generation
3. Evaluation Study
4. Challenges & Future Work
5. Contributions
Motivation
Failure reproduction is a difficult and time-consuming task, yet it is necessary for fixing the corresponding bug.
For example, https://issues.apache.org/jira/browse/COLLECTIONS-70 had not been fixed for five months due to difficulties in reproducing the bug.
After a test case was submitted, it was soon fixed with a comment:
“As always, a good test case makes all the difference.”
Problem Statement
The intention of this research is to propose a stack trace based automatic crash reproduction framework that is efficient and applicable to real-world object-oriented programs.
Sub-problem 1: Propose an efficient crash precondition computation approach that is applicable to non-trivial real-world programs.
Sub-problem 2: Propose a novel method sequence composition approach that can generate crash-reproducing test cases for object-oriented programs.
Contributions
- A study of the scalability challenge of automatic crash reproduction, with approaches to improve its efficiency.
- A study of the object creation challenge for reproducing object-oriented crashes, with a novel method sequence composition approach to address it.
- A novel framework, STAR, which combines the proposed approaches to achieve automatic crash reproduction using only the crash stack trace.
- A detailed empirical evaluation investigating the usefulness of STAR.
Related Work
Record-and-replay approaches: Jrapture (2000), BugNet (2005), ReCrash/ReCrashJ (2008), LEAP/LEAN (2010)
Post-failure-process approaches: Microsoft PSE (2004), IBM SnuggleBug (2009), XyLem (2009), ESD (2010), BugRedux (2012)
Record-and-replay Approaches
Approach:
- Monitoring phase: captures and stores runtime heap & stack objects.
- Test generation phase: generates tests that load the stored objects as parameters of the crashed methods.
(Diagram: Original Program Execution, store from heap & stack, Stored Objects, load as crashed method params, Recreated Test Case.)
Record-and-replay Approaches

Framework    | Instrumentation     | Data Collected             | Memory Overhead | Performance Overhead
Jrapture’00  | Required            | All interactions           | N/A             | N/A
BugNet’05    | Required / hardware | All inputs / executed code | N/A             | N/A
ReCrash’08   | Required            | Stack objects              | 7% – 90%        | 31% – 60%
LEAP’10      | Required            | SPE access / thread info   | N/A             | 7% – 600%

Limitations:
- Require up-front instrumentation or special hardware deployment.
- Collect client-side data, which may raise privacy concerns [Clause et al., 2010].
- Non-trivial memory and runtime overheads.
Post-failure-process Approaches
Perform analyses on crashes only after they have occurred.
Advantages:
- Usually do not record runtime data.
- Incur no or very little performance overhead.
Crash Explanation Approaches
Microsoft PSE [Manevich et al., 2004], IBM SnuggleBug [Chandra et al., 2009], XyLem [Nanda et al., 2009]
- Assist crash debugging by providing hints on the target crashes: potential crash traces and potential crash conditions.
- Cannot reproduce the target crashes.
Crash Reproduction Approaches
- Core dump-based approaches: Cdd [Leitner et al., 2009], RECORE [Roßler et al., 2013]
- Symbolic execution-based approaches: ESD [Zamfir et al., 2009], BugRedux [Jin et al., 2012]
These aim to reproduce crashes using only post-failure data, such as crash stack traces or the memory core dump at the time of the crash.
Crash Reproduction Approaches: Core Dump-based
E.g. Cdd [Leitner et al., 2009] and RECORE [Roßler et al., 2013]
Leverage the memory core dump, and even developer-written contracts, to guide the crash reproduction process.
Advantage: a higher chance of reproducing a crash, as more data is provided.
Limitations:
- Require not just the stack trace but the entire memory core dump at the time of the crash.
- Less applicable in practice, as memory core dumps are often unavailable.
Crash Reproduction Approaches: Symbolic Execution-based
E.g. ESD [Zamfir et al., 2009] and BugRedux [Jin et al., 2012]
Perform symbolic execution-based analysis to identify crash paths and generate crash-reproducing test cases.
Advantages:
- Use only the crash stack trace to achieve crash reproduction.
- No runtime overhead is incurred at the client side.
Limitations:
- Rely on forward symbolic execution to compute crash preconditions, which is less efficient.
- Cannot be fully optimized, due to the nature of forward symbolic execution.
- Cannot reproduce non-trivial crashes from object-oriented programs, due to the object-creation challenge.
STAR: Stack Trace based Automatic crash Reproduction
Advantages over existing approaches:

Approaches           | Limitations                                      | Advantages of STAR
Record-replay        | Data collection                                  | No runtime data collection
Record-replay        | Performance overhead                             | No performance overhead
Core dump-based      | Memory core dump and developer-written contracts | Crash stack trace only
Symbolic exec.-based | Lack of optimizations                            | Optimizations to greatly improve the crash reproduction process
Symbolic exec.-based | Lack of support for object-oriented programs     | Capable of reproducing non-trivial crashes for object-oriented programs
Overview of STAR
Inputs: stack trace and program.
1. Crash Precondition Computation, producing crash preconditions
2. Input Model Generation, producing crash models
3. Test Input Generation, producing test cases
Crash Precondition Computation
Crash Precondition: the conditions on the inputs at a method entry that can trigger the crash. It specifies in what memory state the crash can be reproduced.
Existing approaches such as ESD and BugRedux use forward symbolic execution to compute the crash preconditions: the program is executed in the same direction as normal executions, with inputs and variables represented as symbolic values instead of concrete values.
Limitations of forward symbolic execution:
- Not demand-driven: many paths unrelated to the crash must be executed.
- Limited optimization: it is difficult to perform optimizations using the crash information.
STAR instead performs a backward symbolic execution to compute the crash precondition: the program is executed from the crash location back to the method entry.
Advantages of backward symbolic execution:
- Demand-driven: only paths related to the crash are executed.
- Optimizations: optimizations can be performed using the crash information.
Backward Symbolic Execution
Given a program P, a crash location L, and the crash condition C at L, we execute P backward from L to a method entry, with C as the initial crash precondition.
The precondition is updated along the execution path according to the executed statements:
- E.g. int var3 = var1 + var2; means all occurrences of var3 are replaced by var1 + var2.
- E.g. if (var1 != null): coming from the true branch, var1 != null is added to the precondition; coming from the false branch, var1 == null is added.
The preconditions reached at method entries are saved as the final crash preconditions.
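The two update rules above can be sketched as a toy engine. This is a minimal sketch, assuming preconditions are kept as plain constraint strings (a real implementation would manipulate solver ASTs); the class `BackwardStep` and its method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class BackwardStep {
    // Assignment "lhs = rhs": replace every occurrence of lhs in the precondition.
    public static List<String> assign(List<String> pre, String lhs, String rhs) {
        List<String> out = new ArrayList<>();
        for (String c : pre) out.add(c.replace(lhs, "(" + rhs + ")"));
        return out;
    }

    // Branch "if (cond)": record cond when coming from the true branch,
    // or its negation when coming from the false branch.
    public static List<String> branch(List<String> pre, String cond, boolean fromTrueBranch) {
        List<String> out = new ArrayList<>(pre);
        out.add(fromTrueBranch ? cond : "!(" + cond + ")");
        return out;
    }
}
```

For the slide's first example, stepping backward over `int var3 = var1 + var2;` rewrites a constraint on var3 into one on var1 + var2.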
Backward Symbolic Execution Example
Crash: an ArrayIndexOutOfBoundsException (AIOBE) at `buffer[i] = 0;`. Executing backward toward the method entry:
- At the crash: precondition {buffer != null} and {i < 0 or i >= buffer.length}
- Through the true branch of `if (i < buffer.length)`: {i < buffer.length} is added.
- Through `int i = this.last`: i is replaced, giving {buffer != null}, {last < 0 or last >= buffer.length}, {last < buffer.length}
Challenge – Path Explosion
In the example flowchart, the crash at `buffer[i] = 0` (AIOBE) is reached through branches such as `isDebugging()` (true: `debugLog(…)`, false: `print(…)`) and `if (index >= buffer.length)` (true: `i = 0`, false: `i = index`), after `buffer = new int[16]`. Each conditional branch multiplies the number of paths the symbolic execution must explore.
Optimizations
STAR introduces three approaches to improve the crash precondition computation process:
- Static Path Reduction
- Heuristic Backtracking
- Early Detection of Inner Contradictions
Static Path Reduction
Observation: only a subset of the conditional branches and method calls contribute to the target crash.
- E.g. methods that perform runtime logging can be safely skipped.
- E.g. branches that do not modify crash-related variables can be safely skipped.
Optimization: STAR detects and skips branches or method calls that do not contribute to the target crash during symbolic execution.
Static Path Reduction
(Same example flowchart.) The method isDebugging() does not contribute to the crash, so it can be skipped.
Static Path Reduction
(Same example flowchart.) The conditional branch on isDebugging() does not contribute to the crash either, so it can be skipped as well.
Static Path Reduction
(Same example flowchart.) STAR can detect and skip over methods and branches that do not contribute to the crash.
Static Path Reduction
A conditional branch or a method call is contributive to the crash if:
- it can modify any stack location referenced in the current crash precondition formula, or
- it can modify any heap location referenced in the current crash precondition formula.
However, in backward execution, the actual heap locations may not be decidable until they are explicitly defined.
Static Path Reduction
For any reference whose heap location cannot be decided:
- compare whether the modified heap location and the reference have compatible data types, and
- compare whether the modified heap location and the reference have the same field name (arrays excepted).
If both criteria are satisfied, the heap locations are considered the same. In Java, the same heap location can only be accessed through the same field name, except for array fields.
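A minimal sketch of this contributiveness test, assuming heap/stack locations are identified by simple strings; `PathReduction` and both method names are hypothetical helpers, not STAR's actual API:

```java
import java.util.Set;

public class PathReduction {
    // A branch or call contributes if it may write any location the current
    // crash precondition references.
    public static boolean contributes(Set<String> mayWrite, Set<String> referenced) {
        for (String loc : mayWrite) {
            if (referenced.contains(loc)) return true;
        }
        return false;
    }

    // Fallback for undecided heap locations: a compatible type plus an
    // identical field name are treated as the same location (arrays are
    // excepted in the real analysis; that case is omitted in this sketch).
    public static boolean maySameLocation(String typeA, String fieldA,
                                          String typeB, String fieldB) {
        return typeA.equals(typeB) && fieldA.equals(fieldB);
    }
}
```

E.g. a logging call that writes only `Logger.level` would not contribute when the precondition references only `Buffer.buffer` and `Buffer.last`.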
Heuristic Backtracking
Observation: backtracking to the most recent branching point is often inefficient, as the contradiction is usually introduced much earlier.
Optimization: STAR efficiently backtracks to the most relevant branch, where the contradiction may still be avoided.
Heuristic Backtracking
(Same example flowchart.) An executed path is not satisfiable according to the SMT solver.
Heuristic Backtracking
(Same example flowchart.) Typical backtracking to the most recent branching point is not efficient.
Heuristic Backtracking
(Same example flowchart.) STAR can quickly backtrack to the most relevant branch, the one that assigned `i = index`.
Heuristic Backtracking
The unsatisfiable core of the last failed path condition: a subset of the path conditions that is still unsatisfiable by itself.
A branching point is considered relevant to the last unsatisfiable core, and is backtracked to, only if:
- a condition in the unsatisfiable core was added at this branch, or
- a variable's concrete value in the unsatisfiable core was decided at this branch, or
- a variable's actual heap location in the unsatisfiable core was decided at this branch.
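The relevance test can be sketched as a backward scan over recorded branching points. `Backtrack` is a hypothetical helper, and only the first of the three relevance criteria (conditions added at a branch) is modeled; a real engine would obtain the core from the SMT solver:

```java
import java.util.List;
import java.util.Set;

public class Backtrack {
    // Each branching point records the conditions it introduced. Scan from the
    // most recent branch backward and return the index of the first branch
    // that contributed a condition to the unsatisfiable core.
    public static int relevantBranch(List<Set<String>> branchConds, Set<String> unsatCore) {
        for (int i = branchConds.size() - 1; i >= 0; i--) {
            for (String cond : branchConds.get(i)) {
                if (unsatCore.contains(cond)) return i;
            }
        }
        return -1; // no relevant branch: the contradiction cannot be avoided
    }
}
```

This skips over intermediate branches (e.g. a logging branch) that never touched the conflicting conditions.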
Inner Contradiction Detection
(Same example flowchart.) STAR quickly discovers inner contradictions in the current precondition during execution.
Inner Contradiction Detection
(Same example flowchart.) After `i = index` and `buffer = new int[16]`, the current crash precondition contains {index < 0 or index >= 16} together with {index < 16}; the disjunct index >= 16 contradicts index < 16. STAR discovers such inner contradictions in the precondition during execution.
Other Details
- Loops and recursive calls: options for the maximum loop unrolling and the maximum recursive call depth.
- Call graph construction: users can specify a pointer analysis algorithm to use; there is an option for the maximum number of call targets.
- String operations: strings are treated as arrays of characters. Complex string operations and regular expressions are not supported; they would require more specialized constraint solvers such as Z3-str or HAMPI.
Input Model Generation
After computing the crash precondition, we need to compute a model (object state) that satisfies this precondition.
However, for one precondition there can be many satisfying models. E.g. for the precondition {ArrayList.size != 0}, an infinite number of models satisfy it.
Generating Feasible Input Models
Object Creation Challenge [Xiao et al., 2011]: not every model satisfying a precondition can actually be generated.
For the precondition ArrayList.size != 0, the input model ArrayList.size == -1 satisfies it, but such an object can never be created.
Therefore, we want input models whose objects are actually feasible to generate.
Generating Practical Input Models
For different input models, the difficulty of generating the corresponding objects can differ greatly:
- Model 1: ArrayList.size == 100, which requires calling add() 100 times.
- Model 2: ArrayList.size == 1, which requires calling add() once.
Therefore, we also want input models whose values are as close to the initial values as possible.
Class Information
STAR's input model generation approach can generate models that are both feasible and practical. It extracts and uses class semantic information to guide the input model generation process:
- the initial value of each class member field, and
- the potential value range of each numerical field, e.g. ArrayList.size >= 0.
Input Model Generation
Example: the crash precondition {ArrayList.size != 0} and the class information (value range: ArrayList.size >= 0; initial value: ArrayList.size starts from 0) are passed to the SMT solver, which produces a feasible and practical model: ArrayList.size == 1.
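Without an SMT solver at hand, the "closest to the initial value within the class-implied range" search can be sketched by brute force; `ModelPicker` is a hypothetical stand-in for the solver query, not STAR's implementation:

```java
import java.util.function.IntPredicate;

public class ModelPicker {
    // Return the value nearest to the field's initial value that lies within
    // the class-implied range and satisfies the crash precondition.
    public static int pick(int initial, int rangeMin, IntPredicate precondition) {
        for (int d = 0; d <= 1_000_000; d++) {
            int up = initial + d;
            int down = initial - d;
            if (up >= rangeMin && precondition.test(up)) return up;
            if (down >= rangeMin && precondition.test(down)) return down;
        }
        throw new IllegalStateException("no feasible model found");
    }
}
```

For the ArrayList example, pick(0, 0, size -> size != 0) yields 1, the feasible and practical model from the slide.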
Test Input Generation
Given a crash model, we need to generate test inputs that satisfy it.
However, generating object test inputs can be challenging [Xiao et al., 2011]:
- non-public fields are not assignable, and
- class invariants are easily broken if objects are generated using reflection.
What we need is a legitimate method sequence that creates and mutates an object to satisfy the target model (the target object state).
Existing test input generation approaches:
- Randomized techniques: Randoop [Pacheco et al., 2007]
- Dynamic analysis: Palulu [Artzi et al., 2009], Palus [Zhang et al., 2011]
- Codebase mining: MSeqGen [Thummalapenta et al., 2009]
These are not efficient because their input generation is not demand-driven, and some rely on existing code bases.
STAR proposes a novel demand-driven test input generation approach.
Summary Extraction: a forward symbolic execution obtains the summary of each method.
Summary Extraction
The summary of a method is the collection of the summaries of its individual paths.
The summary of a method path is a pair (pre, post), where:
- pre is the path condition, represented as a conjunction of constraints over the method inputs (heap locations read by the method), and
- post is the postcondition of the path, represented as a conjunction of constraints over the method outputs (heap locations written by the method); essentially, it is the final effect of this method path.
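As a data structure, a path summary is just a condition/effect pair. The `PathSummary` record below is a hypothetical sketch (constraints kept as strings rather than solver terms), populated with the two paths of the add-like method from the next slide:

```java
import java.util.List;

public class Summaries {
    // A path summary pairs a path condition over the method's inputs with the
    // path's effect on the method's outputs; a method summary is the list of
    // its path summaries.
    record PathSummary(List<String> condition, List<String> effect) {}

    // Hypothetical summary of an add(obj)-style method, mirroring the slides.
    static List<PathSummary> addSummary() {
        return List.of(
            new PathSummary(List.of("obj != null"),
                            List.of("list[size] = obj", "size' = size + 1")),
            new PathSummary(List.of("obj == null"),
                            List.of("throws Exception")));
    }
}
```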
We perform a forward symbolic execution on the target method. For an add-like method whose body is `if (obj != null) { list[size] = obj; size += 1; } else { throw new Exception(); }`:
- Path 1: path condition {obj != null}; path effect {list[size] = obj; size += 1}
- Path 2: path condition {obj == null}; path effect {throw new Exception()}
Method Sequence Deduction
STAR introduces a deductive-style approach to construct method sequences that can achieve the target object state.
Method Sequence Deduction
Given a target object state Φ_target and the path summaries of each method, the approach finds, by recursive deduction, a method sequence that produces an object satisfying Φ_target:
- The deductive engine selects a candidate method path and asks the constraint solver whether Φ_path ∧ Φ_target is satisfiable.
- If it is satisfiable, taking this path can achieve the target object state, and the object states required of the path's input parameters become new targets for recursive deduction.
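For the single-field Container example on the following slides, the deduction collapses to a simple backward walk; `Deduce` is a toy sketch under that assumption, not STAR's deductive engine:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Deduce {
    // Toy deduction over one integer field "size": walk backward from the
    // target state using add()'s path effect (size' = size + 1) until the
    // constructor's initial state (size == 0) closes the deduction.
    public static List<String> sequenceFor(int targetSize) {
        List<String> calls = new ArrayList<>();
        int size = targetSize;
        while (size > 0) {
            calls.add("add(obj)"); // requires pre-state size - 1 and obj != null
            size -= 1;
        }
        calls.add("new Container()"); // establishes size == 0
        Collections.reverse(calls);   // deduced backward, emitted forward
        return calls;
    }
}
```

E.g. a target of size == 10 deduces one constructor call followed by ten add(obj) calls.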
Example

public class Container {
    public Container();
    public void add(Object obj);
    public void remove(Object obj);
    public void clear();
}

Desired object state (input model): Container.size == 10
Example – Summary Extraction
- Container(): Path 1: condition TRUE, effect {size = 0}
- clear(): Path 1: condition TRUE, effect {remove all in list; size = 0}
- add(obj): Path 1: condition {obj != null}, effect {list[size] = obj; size += 1}; Path 2: condition {obj == null}, effect {throw an exception}
- remove(obj): Path 1: condition {obj in list}, effect {remove from list; size -= 1}; Path 2: condition {obj not in list}, no effect
Example – Sequence Deduction
Target object state: Container.size == 10.
- Can clear() produce the target state? No, not satisfiable.
- Can add() produce the target state? Yes, if this.size == 9 && obj != null.
The deductive engine selects add(obj); the new deduction target becomes Container.size == 9.
Example – Sequence Deduction
The deduction continues recursively:
- Target Container.size == 9: can add() produce it? Yes, if this.size == 8 && obj != null. Select add(obj).
- … and so on, down to target Container.size == 0: can Container() produce it? Yes, with no parameter requirement. Select Container().
Example – Final Sequence
Combine in reverse direction to form the whole sequence:

void sequence() {
    Container container = new Container();
    Object o1 = new Object();
    container.add(o1);
    … (10 times)
}
Other Details
- The forward symbolic execution in method summary extraction follows similar settings as precondition computation: e.g. loops and recursive calls are expanded only a limited number of times/depth, so the extracted path summaries cover at most the total number of method paths.
- The incompleteness of the method path summaries does not affect the precision of method sequence composition: generated method sequences are still correct, but some sequences may fail to be generated due to missing path summaries.
- Optimizations have been applied to reduce the number of methods and method paths to examine.
Evaluation
Research Questions
RQ1: For how many crashes can STAR compute the crash-triggering preconditions?
RQ2: How many crashes can STAR reproduce based on the crash-triggering preconditions?
RQ3: How many of STAR's crash reproductions are useful for revealing the actual cause of the crashes?
Evaluation Setup
Subjects:
- Apache Commons Collections (ACC): a data container library implementing additional data structures over the JDK. 60 kLOC.
- Ant (ANT): a Java build tool supporting a number of built-in and extension tasks, such as compiling, testing, and running Java applications. 100 kLOC.
- Log4j (LOG): a logging package for printing log output to different local and remote destinations. 20 kLOC.
Evaluation Setup
Crash report collection:
- Collected from the issue tracking system of each subject.
- Only confirmed and fixed crashes were collected.
- Crashes with no or incorrect stack trace information were discarded.
- Three major types of crashes: custom thrown exceptions, NPE, and AIOBE (together covering 80% of crashes [Nam et al., 2009]).
- 52 crashes were obtained from the three subjects.

Subject | # of Crashes | Versions       | Avg. Fix Time | Report Period
ACC     | 12           | 2.0 – 4.0      | 42 days       | Oct. 03 – Jun. 12
ANT     | 21           | 1.6.1 – 1.8.3  | 25 days       | Apr. 04 – Aug. 12
LOG     | 19           | 1.0.0 – 1.2.16 | 77 days       | Jan. 01 – Oct. 09
Our evaluation study uses the largest number of crashes compared to previous studies:

Framework | Number of Crashes
RECRASH   | 11
ESD       | 6
BugRedux  | 17
RECORE    | 7
STAR      | 52
Research Question 1
For how many crashes can STAR compute the crash preconditions?
- Without the optimization approaches.
- With the optimization approaches.
We applied STAR to compute the preconditions for each crash.
Research Question 1
Percentage of crashes whose preconditions were computed by STAR:

Subject | Without Optimizations | With Optimizations | Improvement
ACC     | 66.7%                 | 75.0%              | +8.3
ANT     | 14.3%                 | 71.4%              | +57.1
LOG     | 36.8%                 | 73.7%              | +36.9
Overall | 34.6%                 | 73.1%              | +38.5
Research Question 1
Average time to compute the crash preconditions, in seconds (the lower the better):

Subject | Without Optimizations | With Optimizations
ACC     | 18.5                  | 2.1
ANT     | 90.4                  | 4.9
LOG     | 55.1                  | 2.4
Overall | 59.3                  | 3.3
Research Question 1
Percentage of crashes whose preconditions were computed by STAR, broken down by optimization:

Optimization            | ACC  | ANT  | LOG  | Overall
No Optimization         | 66.7 | 14.3 | 36.8 | 34.6
Static Path Reduction   | 75.0 | 23.8 | 47.4 | 44.2
Heuristic Backtracking  | 66.7 | 23.8 | 36.8 | 38.5
Contradiction Detection | 66.7 | 14.3 | 42.1 | 36.5
All Optimizations       | 75.0 | 71.4 | 73.7 | 73.1
Research Question 1
STAR successfully computed crash preconditions for 38 (73.1%) of the 52 crashes.
STAR's optimization approaches significantly improved the overall result, by 20 crashes (38.5%).
Static path reduction is the most effective single optimization, but applying all three optimizations together achieves a much higher improvement.
Research Question 2
How many crashes can STAR reproduce based on the crash preconditions?
Criterion of Reproduction [ReCrash, 2008]: a crash is considered reproduced if the generated test case triggers the same type of exception at the same crash line.
We applied STAR to generate crash-reproducing test cases for each computed crash precondition.
Research Question 2
Overall crash reproductions achieved by STAR for each subject:

Subject | # of Crashes | # with Precondition | # Reproduced | Ratio (of Preconditions)
ACC     | 12           | 9                   | 8            | 66.7% (88.9%)
ANT     | 21           | 15                  | 12           | 57.1% (80.0%)
LOG     | 19           | 14                  | 11           | 57.9% (78.6%)
Total   | 52           | 38                  | 31           | 59.6% (81.6%)
Research Question 2
More statistics for STAR's test case generation process:

Subject | Avg. # of Objects | Avg. Candidate Methods | Min – Max Sequence Length | Avg. Sequence Length
ACC     | 1.5               | 35.5                   | 2 – 19                    | 9.4
ANT     | 1.4               | 11.7                   | 2 – 14                    | 6.2
LOG     | 1.5               | 21.8                   | 2 – 17                    | 8.1
Total   | 1.5               | 21.4                   | 2 – 19                    | 7.7
Research Question 3
The Criterion of Reproduction does not require a crash reproduction to match the complete stack trace: a partial match of only the top stack frames still counts as a valid reproduction of the target crash.
The root causes of more than 60% of crashes lie in the top three stack frames [Schroter et al., 2010], so reproducing the complete stack trace is not necessary to reveal the root cause of a crash.
Research Question 3
Drawbacks of the Criterion of Reproduction:
- The reproduced crash may not be the same crash.
- The reproduction may not be useful for revealing the crash-triggering bug.
(Figure: a reproduction matching only the top frames can miss the buggy frame.)
Research Question 3
How many of STAR's crash reproductions are useful for revealing the actual causes of the crashes?
Criterion of useful crash reproduction: a crash reproduction is considered useful if it triggers the same incorrect behavior at the buggy location and eventually causes the crash to reappear.
We manually examined the original and fixed versions of each program to identify the actual buggy location for each crash.
Research Question 3
Overall useful crash reproductions achieved by STAR for each subject:

Subject | # Reproduced | # Useful | Ratio (of Total)
ACC     | 8            | 7        | 87.5% (58.3%)
ANT     | 12           | 7        | 58.3% (33.3%)
LOG     | 11           | 8        | 72.7% (42.1%)
Total   | 31           | 22       | 71.0% (42.3%)
Comparison Study
We compared STAR with two different crash reproduction frameworks:
- Randoop: a feedback-directed test input generation framework capable of generating thousands of test inputs that may reproduce the target crashes. It was given a maximum of 1000 seconds to generate test cases (10 times STAR's budget), and we manually provided the crash-related class list to increase its chances.
- BugRedux: a state-of-the-art crash reproduction framework that computes crash preconditions and generates crash-reproducing test cases.
We applied both frameworks to the same set of crashes used in our evaluation.
Comparison Study
The number of crashes handled by the three approaches:

Framework | Preconditions | Reproductions | Useful
Randoop   | 0             | 12            | 8
BugRedux  | 18            | 10            | 7
STAR      | 38            | 31            | 22
Comparison Study
(Venn diagram: crashes reproduced by Randoop (12 crashes), BugRedux (10 crashes), and STAR; 5 crashes overlap.)
Comparison Study
STAR outperformed Randoop because:
- Randoop uses a randomized search technique to generate method sequences; it can generate many sequences, but the search is unguided.
- Given the large search space of real-world programs, the probability of generating crash-reproducing sequences is low.
STAR outperformed BugRedux because of:
- several effective optimizations that improve the efficiency of the crash precondition computation process, and
- a method sequence composition approach that can generate complex input objects satisfying the crash preconditions.
Case Study
https://issues.apache.org/jira/browse/collections-411
An IndexOutOfBoundsException could be raised in method ListOrderedMap.putAll() due to an incorrect index increment.
The bug was soon fixed by the developers, who added checks to ensure the index is incremented only in certain cases.

public void putAll(int index, Map map) {
    for (Map.Entry entry : map.entrySet()) {
        put(index, entry.getKey(), entry.getValue());
        ++index; // buggy increment
    }
}
Case Study
STAR was applied to generate a crash-reproducing test case for this crash. Surprisingly, it generated a test case that could crash both the original and the fixed (latest) version of the program.
We reported this potential issue discovered by STAR to the project developers (https://issues.apache.org/jira/browse/collections-474) and attached STAR's auto-generated test case to our bug report.
Case Study
The developers quickly confirmed that the original patch for bug ACC-411 was incomplete: it missed a corner case that can still crash the program.
Neither the developers nor the original bug reporter had identified this corner case in over a year, yet it took the developers only a few hours to confirm and fix the bug after STAR's test case demonstrated it.
The crash-reproducing test case generated by STAR was added by the developers to the official test suite of the Apache Commons Collections project: http://svn.apache.org/r1496168
Case Study
STAR is capable of identifying and reproducing crashes that are difficult even for experienced developers.
STAR can also be used to confirm the completeness of bug fixes: if a fix is incomplete, STAR may generate a crash-reproducing test case demonstrating the missing corner case.
Challenges & Future Work
Challenges
We manually examined each unreproduced crash to identify the major challenges of reproduction:
- Environment dependency (36.7%): file input, network input.
- SMT solver limitations (23.3%): complex string constraints (e.g. regular expressions), non-linear arithmetic.
- Concurrency & non-determinism (16.7%): some crashes are only reproducible non-deterministically or under concurrent execution.
- Path explosion (6.7%)
Future Work
- Improving reproducibility: support for environment simulation (e.g. file inputs); incorporating specialized SMT solvers, such as the string solver Z3-str.
- Automatic fault localization: existing fault localization approaches require both passing and failing test cases to locate faulty statements; STAR's ability to generate failing test cases can help automate the fault localization process.
- Crash reproduction for mobile applications: Android applications are similar to desktop Java programs in many aspects.
Conclusions
We proposed STAR, an automatic crash reproduction framework that uses only the stack trace.
- STAR successfully reproduced 31 (59.6%) of 52 real-world crashes from three non-trivial programs.
- The reproduced crashes can effectively help developers reveal the underlying crash-triggering bugs, or even identify unknown bugs.
- A comparison study demonstrates that STAR significantly outperforms existing crash reproduction approaches.
Thank You!
Appendix
Subject Sizes
Our evaluation study has one of the largest subject sizes compared to previous studies:

Framework | Subject Sizes (LOC) | Average Subject Size
RECRASH   | 200 – 86,000        | 47,000
ESD       | 100 – 100,000       | N/A
BugRedux  | 500 – 241,000       | 27,000
RECORE    | 68 – 62,000         | 35,000
STAR      | 20,000 – 100,000    | 60,000
Research Question 1
Average time to compute the crash preconditions, in seconds (the lower the better), broken down by optimization:

Optimization            | ACC  | ANT  | LOG  | Overall
No Optimization         | 18.5 | 90.4 | 55.1 | 59.3
Static Path Reduction   | 11.8 | 67.5 | 28.3 | 39.2
Heuristic Backtracking  | 15.9 | 74.8 | 47.8 | 50.0
Contradiction Detection | 13.8 | 86.8 | 48.2 | 54.3
All Optimizations       | 2.1  | 4.9  | 2.4  | 3.3
Comparison Study
Average time to reproduce crashes, in seconds (the lower the better), over only the common reproductions:

Framework | ACC | ANT  | LOG  | Overall
BugRedux  | 2.4 | 29.9 | 4.27 | 8.7
STAR      | 2.3 | 10.8 | 3.75 | 4.6
User Survey

Surveys Sent | Responses | Confirmed Correctness | Confirmed Usefulness
31           | 6 (19%)   | 5                     | 3

ACC-53: “The auto-generated test case would reproduce the bug. . . I think that having such a test case would have been useful.”
Comparison Study
Branch coverage (%) achieved by different test case generation approaches:

Approach         | ACC | JSAP | SAT4J
Sample Execution | 16  | 30   | 12
Randoop          | 29  | 40   | 20
Palulu           | 19  | 58   | 22
RecGen           | 22  | 54   | 0
Palus            | 29  | 61   | 36
STAR             | 69  | 74   | 54