STAR: STACK TRACE BASED AUTOMATIC CRASH REPRODUCTION
PhD Thesis Defence
November 05, 2013
Ning Chen
Advisor: Sunghun Kim
Outline
1. Motivation & Related Work
2. Approaches of STAR
   1) Crash Precondition Computation
   2) Input Model Generation
   3) Test Input Generation
3. Evaluation Study
4. Challenges & Future Work
5. Contributions
Motivation
Failure reproduction is a difficult and time-consuming task, yet it is necessary for fixing the corresponding bug.
For example, https://issues.apache.org/jira/browse/COLLECTIONS-70 had not been fixed for five months due to difficulties in reproducing the bug.
After a test case was submitted, it was soon fixed with a comment:
“As always, a good test case makes all the difference.”
Problem Statement
The intention of this research is to propose a stack trace based automatic crash reproduction framework that is efficient and applicable to real-world object-oriented programs.
Sub-problem 1: Propose an efficient crash precondition computation approach that is applicable to non-trivial real-world programs.
Sub-problem 2: Propose a novel method sequence composition approach that can generate crash-reproducing test cases for object-oriented programs.
Contributions
- A study of the scalability challenge of automatic crash reproduction, with approaches to improve its efficiency.
- A study of the object creation challenge for reproducing object-oriented crashes, with a novel method sequence composition approach to address it.
- A novel framework, STAR, which combines the proposed approaches to achieve automatic crash reproduction using only the crash stack trace.
- A detailed empirical evaluation investigating the usefulness of STAR.
Related Work
Record-and-replay approaches: Jrapture (2000), BugNet (2005), ReCrash/ReCrashJ (2008), LEAP/LEAN (2010)
Post-failure-process approaches: Microsoft PSE (2004), IBM SnuggleBug (2009), XyLem (2009), ESD (2010), BugRedux (2012)
Record-and-replay Approaches
Approach:
- Monitoring phase: captures and stores runtime heap & stack objects.
- Test generation phase: generates tests that load the stored objects as parameters of the crashed methods.
(Diagram: Original Program Execution, store from heap & stack, Stored Objects, load as crashed method params, Recreated Test Case.)
Record-and-replay Approaches

Framework    | Instrumentation     | Data Collected             | Memory Overhead | Performance Overhead
Jrapture’00  | Required            | All interactions           | N/A             | N/A
BugNet’05    | Required / hardware | All inputs / executed code | N/A             | N/A
ReCrash’08   | Required            | Stack objects              | 7% – 90%        | 31% – 60%
LEAP’10      | Required            | SPE access / thread info   | N/A             | 7% – 600%

Limitations:
- Require up-front instrumentation or special hardware deployment.
- Collect client-side data, which may raise privacy concerns [Clause et al., 2010].
- Non-trivial memory and runtime overheads.
Post-failure-process Approaches
Perform analyses on crashes only after they have occurred.
Advantages:
- Usually do not record runtime data.
- Incur no or very little performance overhead.
Crash Explanation Approaches
Microsoft PSE [Manevich et al., 2004], IBM SnuggleBug [Chandra et al., 2009], XyLem [Nanda et al., 2009]
- Assist crash debugging by providing hints on the target crashes: potential crash traces and potential crash conditions.
- Cannot reproduce the target crashes.
Crash Reproduction Approaches
- Core dump-based approaches: Cdd [Leitner et al., 2009], RECORE [Roßler et al., 2013]
- Symbolic execution-based approaches: ESD [Zamfir et al., 2009], BugRedux [Jin et al., 2012]
These aim to reproduce crashes using only post-failure data, such as crash stack traces or the memory core dump at the time of the crash.
Crash Reproduction Approaches: Core Dump-based
E.g. Cdd [Leitner et al., 2009] and RECORE [Roßler et al., 2013]
Leverage the memory core dump, and even developer-written contracts, to guide the crash reproduction process.
Advantage: a higher chance of reproducing a crash, as more data is provided.
Limitations:
- Require not just the stack trace but the entire memory core dump at the time of the crash.
- Less applicable in practice, as memory core dumps are often unavailable.
Crash Reproduction Approaches: Symbolic Execution-based
E.g. ESD [Zamfir et al., 2009] and BugRedux [Jin et al., 2012]
Perform symbolic execution-based analysis to identify crash paths and generate crash-reproducing test cases.
Advantages:
- Use only the crash stack trace to achieve crash reproduction.
- No runtime overhead is incurred at the client side.
Limitations:
- Rely on forward symbolic execution to compute crash preconditions, which is less efficient.
- Cannot be fully optimized, due to the nature of forward symbolic execution.
- Cannot reproduce non-trivial crashes from object-oriented programs, due to the object-creation challenge.
STAR: Stack Trace based Automatic crash Reproduction
Advantages over existing approaches:

Approaches           | Limitations                                      | Advantages of STAR
Record-replay        | Data collection                                  | No runtime data collection
Record-replay        | Performance overhead                             | No performance overhead
Core dump-based      | Memory core dump and developer-written contracts | Crash stack trace only
Symbolic exec.-based | Lack of optimizations                            | Optimizations to greatly improve the crash reproduction process
Symbolic exec.-based | Lack of support for object-oriented programs     | Capable of reproducing non-trivial crashes for object-oriented programs
Overview of STAR
Inputs: stack trace and program.
1. Crash Precondition Computation, producing crash preconditions
2. Input Model Generation, producing crash models
3. Test Input Generation, producing test cases
Crash Precondition Computation
Crash Precondition: the conditions on the inputs at a method entry that can trigger the crash. It specifies in what memory state the crash can be reproduced.
Existing approaches such as ESD and BugRedux use forward symbolic execution to compute the crash preconditions: the program is executed in the same direction as normal executions, with inputs and variables represented as symbolic values instead of concrete values.
Limitations of forward symbolic execution:
- Not demand-driven: many paths unrelated to the crash must be executed.
- Limited optimization: it is difficult to perform optimizations using the crash information.
STAR instead performs a backward symbolic execution to compute the crash precondition: the program is executed from the crash location back to the method entry.
Advantages of backward symbolic execution:
- Demand-driven: only paths related to the crash are executed.
- Optimizations: optimizations can be performed using the crash information.
Backward Symbolic Execution
Given a program P, a crash location L, and the crash condition C at L, we execute P backward from L to a method entry, with C as the initial crash precondition.
The precondition is updated along the execution path according to the executed statements:
- E.g. int var3 = var1 + var2; means all occurrences of var3 are replaced by var1 + var2.
- E.g. if (var1 != null): coming from the true branch, var1 != null is added to the precondition; coming from the false branch, var1 == null is added.
The preconditions reached at method entries are saved as the final crash preconditions.
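The two update rules above can be sketched as a toy engine. This is a minimal sketch, assuming preconditions are kept as plain constraint strings (a real implementation would manipulate solver ASTs); the class `BackwardStep` and its method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class BackwardStep {
    // Assignment "lhs = rhs": replace every occurrence of lhs in the precondition.
    public static List<String> assign(List<String> pre, String lhs, String rhs) {
        List<String> out = new ArrayList<>();
        for (String c : pre) out.add(c.replace(lhs, "(" + rhs + ")"));
        return out;
    }

    // Branch "if (cond)": record cond when coming from the true branch,
    // or its negation when coming from the false branch.
    public static List<String> branch(List<String> pre, String cond, boolean fromTrueBranch) {
        List<String> out = new ArrayList<>(pre);
        out.add(fromTrueBranch ? cond : "!(" + cond + ")");
        return out;
    }
}
```

For the slide's first example, stepping backward over `int var3 = var1 + var2;` rewrites a constraint on var3 into one on var1 + var2.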
Backward Symbolic Execution Example
Crash: an ArrayIndexOutOfBoundsException (AIOBE) at `buffer[i] = 0;`. Executing backward toward the method entry:
- At the crash: precondition {buffer != null} and {i < 0 or i >= buffer.length}
- Through the true branch of `if (i < buffer.length)`: {i < buffer.length} is added.
- Through `int i = this.last`: i is replaced, giving {buffer != null}, {last < 0 or last >= buffer.length}, {last < buffer.length}
Challenge – Path Explosion
In the example flowchart, the crash at `buffer[i] = 0` (AIOBE) is reached through branches such as `isDebugging()` (true: `debugLog(…)`, false: `print(…)`) and `if (index >= buffer.length)` (true: `i = 0`, false: `i = index`), after `buffer = new int[16]`. Each conditional branch multiplies the number of paths the symbolic execution must explore.
Optimizations
STAR introduces three approaches to improve the crash precondition computation process:
- Static Path Reduction
- Heuristic Backtracking
- Early Detection of Inner Contradictions
Static Path Reduction
Observation: only a subset of the conditional branches and method calls contribute to the target crash.
- E.g. methods that perform runtime logging can be safely skipped.
- E.g. branches that do not modify crash-related variables can be safely skipped.
Optimization: STAR detects and skips branches or method calls that do not contribute to the target crash during symbolic execution.
Static Path Reduction
(Same example flowchart.) The method isDebugging() does not contribute to the crash, so it can be skipped.
Static Path Reduction
(Same example flowchart.) The conditional branch on isDebugging() does not contribute to the crash either, so it can be skipped as well.
Static Path Reduction
(Same example flowchart.) STAR can detect and skip over methods and branches that do not contribute to the crash.
Static Path Reduction
A conditional branch or a method call is contributive to the crash if:
- it can modify any stack location referenced in the current crash precondition formula, or
- it can modify any heap location referenced in the current crash precondition formula.
However, in backward execution, the actual heap locations may not be decidable until they are explicitly defined.
Static Path Reduction
For any reference whose heap location cannot be decided:
- compare whether the modified heap location and the reference have compatible data types, and
- compare whether the modified heap location and the reference have the same field name (arrays excepted).
If both criteria are satisfied, the heap locations are considered the same. In Java, the same heap location can only be accessed through the same field name, except for array fields.
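A minimal sketch of this contributiveness test, assuming heap/stack locations are identified by simple strings; `PathReduction` and both method names are hypothetical helpers, not STAR's actual API:

```java
import java.util.Set;

public class PathReduction {
    // A branch or call contributes if it may write any location the current
    // crash precondition references.
    public static boolean contributes(Set<String> mayWrite, Set<String> referenced) {
        for (String loc : mayWrite) {
            if (referenced.contains(loc)) return true;
        }
        return false;
    }

    // Fallback for undecided heap locations: a compatible type plus an
    // identical field name are treated as the same location (arrays are
    // excepted in the real analysis; that case is omitted in this sketch).
    public static boolean maySameLocation(String typeA, String fieldA,
                                          String typeB, String fieldB) {
        return typeA.equals(typeB) && fieldA.equals(fieldB);
    }
}
```

E.g. a logging call that writes only `Logger.level` would not contribute when the precondition references only `Buffer.buffer` and `Buffer.last`.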
Heuristic Backtracking
Observation: backtracking to the most recent branching point is often inefficient, as the contradiction is usually introduced much earlier.
Optimization: STAR efficiently backtracks to the most relevant branch, where the contradiction may still be avoided.
Heuristic Backtracking
(Same example flowchart.) An executed path is not satisfiable according to the SMT solver.
Heuristic Backtracking
(Same example flowchart.) Typical backtracking to the most recent branching point is not efficient.
Heuristic Backtracking
(Same example flowchart.) STAR can quickly backtrack to the most relevant branch, the one that assigned `i = index`.
Heuristic Backtracking
The unsatisfiable core of the last failed path condition: a subset of the path conditions that is still unsatisfiable by itself.
A branching point is considered relevant to the last unsatisfiable core, and is backtracked to, only if:
- a condition in the unsatisfiable core was added at this branch, or
- a variable's concrete value in the unsatisfiable core was decided at this branch, or
- a variable's actual heap location in the unsatisfiable core was decided at this branch.
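The relevance test can be sketched as a backward scan over recorded branching points. `Backtrack` is a hypothetical helper, and only the first of the three relevance criteria (conditions added at a branch) is modeled; a real engine would obtain the core from the SMT solver:

```java
import java.util.List;
import java.util.Set;

public class Backtrack {
    // Each branching point records the conditions it introduced. Scan from the
    // most recent branch backward and return the index of the first branch
    // that contributed a condition to the unsatisfiable core.
    public static int relevantBranch(List<Set<String>> branchConds, Set<String> unsatCore) {
        for (int i = branchConds.size() - 1; i >= 0; i--) {
            for (String cond : branchConds.get(i)) {
                if (unsatCore.contains(cond)) return i;
            }
        }
        return -1; // no relevant branch: the contradiction cannot be avoided
    }
}
```

This skips over intermediate branches (e.g. a logging branch) that never touched the conflicting conditions.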
Inner Contradiction Detection
(Same example flowchart.) STAR quickly discovers inner contradictions in the current precondition during execution.
Inner Contradiction Detection
(Same example flowchart.) After `i = index` and `buffer = new int[16]`, the current crash precondition contains {index < 0 or index >= 16} together with {index < 16}; the disjunct index >= 16 contradicts index < 16. STAR discovers such inner contradictions in the precondition during execution.
Other Details
- Loops and recursive calls: options for the maximum loop unrolling and the maximum recursive call depth.
- Call graph construction: users can specify a pointer analysis algorithm to use; there is an option for the maximum number of call targets.
- String operations: strings are treated as arrays of characters. Complex string operations and regular expressions are not supported; they would require more specialized constraint solvers such as Z3-str or HAMPI.
Input Model Generation
After computing the crash precondition, we need to compute a model (object state) that satisfies this precondition.
However, for one precondition there can be many satisfying models. E.g. for the precondition {ArrayList.size != 0}, an infinite number of models satisfy it.
Generating Feasible Input Models
Object Creation Challenge [Xiao et al., 2011]: not every model satisfying a precondition can actually be generated.
For the precondition ArrayList.size != 0, the input model ArrayList.size == -1 satisfies it, but such an object can never be created.
Therefore, we want input models whose objects are actually feasible to generate.
Generating Practical Input Models
For different input models, the difficulty of generating the corresponding objects can differ greatly:
- Model 1: ArrayList.size == 100, which requires calling add() 100 times.
- Model 2: ArrayList.size == 1, which requires calling add() once.
Therefore, we also want input models whose values are as close to the initial values as possible.
Class Information
STAR's input model generation approach can generate models that are both feasible and practical. It extracts and uses class semantic information to guide the input model generation process:
- the initial value of each class member field, and
- the potential value range of each numerical field, e.g. ArrayList.size >= 0.
Input Model Generation
Example: the crash precondition {ArrayList.size != 0} and the class information (value range: ArrayList.size >= 0; initial value: ArrayList.size starts from 0) are passed to the SMT solver, which produces a feasible and practical model: ArrayList.size == 1.
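Without an SMT solver at hand, the "closest to the initial value within the class-implied range" search can be sketched by brute force; `ModelPicker` is a hypothetical stand-in for the solver query, not STAR's implementation:

```java
import java.util.function.IntPredicate;

public class ModelPicker {
    // Return the value nearest to the field's initial value that lies within
    // the class-implied range and satisfies the crash precondition.
    public static int pick(int initial, int rangeMin, IntPredicate precondition) {
        for (int d = 0; d <= 1_000_000; d++) {
            int up = initial + d;
            int down = initial - d;
            if (up >= rangeMin && precondition.test(up)) return up;
            if (down >= rangeMin && precondition.test(down)) return down;
        }
        throw new IllegalStateException("no feasible model found");
    }
}
```

For the ArrayList example, pick(0, 0, size -> size != 0) yields 1, the feasible and practical model from the slide.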
Test Input Generation
Given a crash model, we need to generate test inputs that satisfy it.
However, generating object test inputs can be challenging [Xiao et al., 2011]:
- non-public fields are not assignable, and
- class invariants are easily broken if objects are generated using reflection.
What we need is a legitimate method sequence that creates and mutates an object to satisfy the target model (the target object state).
Existing test input generation approaches:
- Randomized techniques: Randoop [Pacheco et al., 2007]
- Dynamic analysis: Palulu [Artzi et al., 2009], Palus [Zhang et al., 2011]
- Codebase mining: MSeqGen [Thummalapenta et al., 2009]
These are not efficient because their input generation is not demand-driven, and some rely on existing code bases.
STAR proposes a novel demand-driven test input generation approach.
Summary Extraction: a forward symbolic execution obtains the summary of each method.
Summary Extraction
The summary of a method is the collection of the summaries of its individual paths.
The summary of a method path is a pair (pre, post), where:
- pre is the path condition, represented as a conjunction of constraints over the method inputs (heap locations read by the method), and
- post is the postcondition of the path, represented as a conjunction of constraints over the method outputs (heap locations written by the method); essentially, it is the final effect of this method path.
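As a data structure, a path summary is just a condition/effect pair. The `PathSummary` record below is a hypothetical sketch (constraints kept as strings rather than solver terms), populated with the two paths of the add-like method from the next slide:

```java
import java.util.List;

public class Summaries {
    // A path summary pairs a path condition over the method's inputs with the
    // path's effect on the method's outputs; a method summary is the list of
    // its path summaries.
    record PathSummary(List<String> condition, List<String> effect) {}

    // Hypothetical summary of an add(obj)-style method, mirroring the slides.
    static List<PathSummary> addSummary() {
        return List.of(
            new PathSummary(List.of("obj != null"),
                            List.of("list[size] = obj", "size' = size + 1")),
            new PathSummary(List.of("obj == null"),
                            List.of("throws Exception")));
    }
}
```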
We perform a forward symbolic execution on the target method. For an add-like method whose body is `if (obj != null) { list[size] = obj; size += 1; } else { throw new Exception(); }`:
- Path 1: path condition {obj != null}; path effect {list[size] = obj; size += 1}
- Path 2: path condition {obj == null}; path effect {throw new Exception()}
Method Sequence Deduction
STAR introduces a deductive-style approach to construct method sequences that can achieve the target object state.
Method Sequence Deduction
Given a target object state Φ_target and the path summaries of each method, the approach finds, by recursive deduction, a method sequence that produces an object satisfying Φ_target:
- The deductive engine selects a candidate method path and asks the constraint solver whether Φ_path ∧ Φ_target is satisfiable.
- If it is satisfiable, taking this path can achieve the target object state, and the object states required of the path's input parameters become new targets for recursive deduction.
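For the single-field Container example on the following slides, the deduction collapses to a simple backward walk; `Deduce` is a toy sketch under that assumption, not STAR's deductive engine:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Deduce {
    // Toy deduction over one integer field "size": walk backward from the
    // target state using add()'s path effect (size' = size + 1) until the
    // constructor's initial state (size == 0) closes the deduction.
    public static List<String> sequenceFor(int targetSize) {
        List<String> calls = new ArrayList<>();
        int size = targetSize;
        while (size > 0) {
            calls.add("add(obj)"); // requires pre-state size - 1 and obj != null
            size -= 1;
        }
        calls.add("new Container()"); // establishes size == 0
        Collections.reverse(calls);   // deduced backward, emitted forward
        return calls;
    }
}
```

E.g. a target of size == 10 deduces one constructor call followed by ten add(obj) calls.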
Example

public class Container {
    public Container();
    public void add(Object obj);
    public void remove(Object obj);
    public void clear();
}

Desired object state (input model): Container.size == 10
Example – Summary Extraction
- Container(): Path 1: condition TRUE, effect {size = 0}
- clear(): Path 1: condition TRUE, effect {remove all in list; size = 0}
- add(obj): Path 1: condition {obj != null}, effect {list[size] = obj; size += 1}; Path 2: condition {obj == null}, effect {throw an exception}
- remove(obj): Path 1: condition {obj in list}, effect {remove from list; size -= 1}; Path 2: condition {obj not in list}, no effect
Example – Sequence Deduction
Target object state: Container.size == 10.
- Can clear() produce the target state? No, not satisfiable.
- Can add() produce the target state? Yes, if this.size == 9 && obj != null.
The deductive engine selects add(obj); the new deduction target becomes Container.size == 9.
Example – Sequence Deduction
The deduction continues recursively:
- Target Container.size == 9: can add() produce it? Yes, if this.size == 8 && obj != null. Select add(obj).
- … and so on, down to target Container.size == 0: can Container() produce it? Yes, with no parameter requirement. Select Container().
Example – Final Sequence
Combine in reverse direction to form the whole sequence:

void sequence() {
    Container container = new Container();
    Object o1 = new Object();
    container.add(o1);
    … (10 times)
}
Other Details
- The forward symbolic execution in method summary extraction follows similar settings as precondition computation: e.g. loops and recursive calls are expanded only a limited number of times/depth, so the extracted path summaries cover at most the total number of method paths.
- The incompleteness of the method path summaries does not affect the precision of method sequence composition: generated method sequences are still correct, but some sequences may fail to be generated due to missing path summaries.
- Optimizations have been applied to reduce the number of methods and method paths to examine.
Evaluation
Research Questions
RQ1: For how many crashes can STAR compute the crash-triggering preconditions?
RQ2: How many crashes can STAR reproduce based on the crash-triggering preconditions?
RQ3: How many of STAR's crash reproductions are useful for revealing the actual cause of the crashes?
Evaluation Setup
Subjects:
- Apache Commons Collections (ACC): a data container library implementing additional data structures over the JDK. 60 kLOC.
- Ant (ANT): a Java build tool supporting a number of built-in and extension tasks, such as compiling, testing, and running Java applications. 100 kLOC.
- Log4j (LOG): a logging package for printing log output to different local and remote destinations. 20 kLOC.
Evaluation Setup
Crash report collection:
- Collected from the issue tracking system of each subject.
- Only confirmed and fixed crashes were collected.
- Crashes with no or incorrect stack trace information were discarded.
- Three major types of crashes: custom thrown exceptions, NPE, and AIOBE (together covering 80% of crashes [Nam et al., 2009]).
- 52 crashes were obtained from the three subjects.

Subject | # of Crashes | Versions       | Avg. Fix Time | Report Period
ACC     | 12           | 2.0 – 4.0      | 42 days       | Oct. 03 – Jun. 12
ANT     | 21           | 1.6.1 – 1.8.3  | 25 days       | Apr. 04 – Aug. 12
LOG     | 19           | 1.0.0 – 1.2.16 | 77 days       | Jan. 01 – Oct. 09
Our evaluation study uses the largest number of crashes compared to previous studies:

Framework | Number of Crashes
RECRASH   | 11
ESD       | 6
BugRedux  | 17
RECORE    | 7
STAR      | 52
Research Question 1
For how many crashes can STAR compute the crash preconditions?
- Without the optimization approaches.
- With the optimization approaches.
We applied STAR to compute the preconditions for each crash.
Research Question 1
Percentage of crashes whose preconditions were computed by STAR:

Subject | Without Optimizations | With Optimizations | Improvement
ACC     | 66.7%                 | 75.0%              | +8.3
ANT     | 14.3%                 | 71.4%              | +57.1
LOG     | 36.8%                 | 73.7%              | +36.9
Overall | 34.6%                 | 73.1%              | +38.5
Research Question 1
Average time to compute the crash preconditions, in seconds (the lower the better):

Subject | Without Optimizations | With Optimizations
ACC     | 18.5                  | 2.1
ANT     | 90.4                  | 4.9
LOG     | 55.1                  | 2.4
Overall | 59.3                  | 3.3
Research Question 1
Percentage of crashes whose preconditions were computed by STAR, broken down by optimization:

Optimization            | ACC  | ANT  | LOG  | Overall
No Optimization         | 66.7 | 14.3 | 36.8 | 34.6
Static Path Reduction   | 75.0 | 23.8 | 47.4 | 44.2
Heuristic Backtracking  | 66.7 | 23.8 | 36.8 | 38.5
Contradiction Detection | 66.7 | 14.3 | 42.1 | 36.5
All Optimizations       | 75.0 | 71.4 | 73.7 | 73.1
Research Question 1
STAR successfully computed crash preconditions for 38 (73.1%) of the 52 crashes.
STAR's optimization approaches significantly improved the overall result, by 20 crashes (38.5%).
Static path reduction is the most effective single optimization, but applying all three optimizations together achieves a much higher improvement.
Research Question 2
How many crashes can STAR reproduce based on the crash preconditions?
Criterion of Reproduction [ReCrash, 2008]: a crash is considered reproduced if the generated test case triggers the same type of exception at the same crash line.
We applied STAR to generate crash-reproducing test cases for each computed crash precondition.
Research Question 2
Overall crash reproductions achieved by STAR for each subject:

Subject | # of Crashes | # with Precondition | # Reproduced | Ratio (of Preconditions)
ACC     | 12           | 9                   | 8            | 66.7% (88.9%)
ANT     | 21           | 15                  | 12           | 57.1% (80.0%)
LOG     | 19           | 14                  | 11           | 57.9% (78.6%)
Total   | 52           | 38                  | 31           | 59.6% (81.6%)
Research Question 2
More statistics for STAR's test case generation process:

Subject | Avg. # of Objects | Avg. Candidate Methods | Min – Max Sequence Length | Avg. Sequence Length
ACC     | 1.5               | 35.5                   | 2 – 19                    | 9.4
ANT     | 1.4               | 11.7                   | 2 – 14                    | 6.2
LOG     | 1.5               | 21.8                   | 2 – 17                    | 8.1
Total   | 1.5               | 21.4                   | 2 – 19                    | 7.7
Research Question 3
The Criterion of Reproduction does not require a crash reproduction to match the complete stack trace: a partial match of only the top stack frames still counts as a valid reproduction of the target crash.
The root causes of more than 60% of crashes lie in the top three stack frames [Schroter et al., 2010], so reproducing the complete stack trace is not necessary to reveal the root cause of a crash.
Research Question 3
Drawbacks of the Criterion of Reproduction:
- The reproduced crash may not be the same crash.
- The reproduction may not be useful for revealing the crash-triggering bug.
(Figure: a reproduction matching only the top frames can miss the buggy frame.)
Research Question 3
How many of STAR's crash reproductions are useful for revealing the actual causes of the crashes?
Criterion of useful crash reproduction: a crash reproduction is considered useful if it triggers the same incorrect behavior at the buggy location and eventually causes the crash to reappear.
We manually examined the original and fixed versions of each program to identify the actual buggy location for each crash.
Research Question 3
Overall useful crash reproductions achieved by STAR for each subject:

Subject | # Reproduced | # Useful | Ratio (of Total)
ACC     | 8            | 7        | 87.5% (58.3%)
ANT     | 12           | 7        | 58.3% (33.3%)
LOG     | 11           | 8        | 72.7% (42.1%)
Total   | 31           | 22       | 71.0% (42.3%)
Comparison Study
We compared STAR with two different crash reproduction frameworks:
- Randoop: a feedback-directed test input generation framework capable of generating thousands of test inputs that may reproduce the target crashes. It was given a maximum of 1000 seconds to generate test cases (10 times STAR's budget), and we manually provided the crash-related class list to increase its chances.
- BugRedux: a state-of-the-art crash reproduction framework that computes crash preconditions and generates crash-reproducing test cases.
We applied both frameworks to the same set of crashes used in our evaluation.
Comparison Study
The number of crashes handled by the three approaches:

Framework | Preconditions | Reproductions | Useful
Randoop   | 0             | 12            | 8
BugRedux  | 18            | 10            | 7
STAR      | 38            | 31            | 22
Comparison Study
(Venn diagram: crashes reproduced by Randoop (12 crashes), BugRedux (10 crashes), and STAR; 5 crashes overlap.)
Comparison Study
STAR outperformed Randoop because:
- Randoop uses a randomized search technique to generate method sequences; it can generate many sequences, but the search is unguided.
- Given the large search space of real-world programs, the probability of generating crash-reproducing sequences is low.
STAR outperformed BugRedux because of:
- several effective optimizations that improve the efficiency of the crash precondition computation process, and
- a method sequence composition approach that can generate complex input objects satisfying the crash preconditions.
Case Study
https://issues.apache.org/jira/browse/collections-411
An IndexOutOfBoundsException could be raised in method ListOrderedMap.putAll() due to an incorrect index increment.
The bug was soon fixed by the developers, who added checks to ensure the index is incremented only in certain cases.

public void putAll(int index, Map map) {
    for (Map.Entry entry : map.entrySet()) {
        put(index, entry.getKey(), entry.getValue());
        ++index; // buggy increment
    }
}
Case Study
STAR was applied to generate a crash-reproducing test case for this crash. Surprisingly, it generated a test case that could crash both the original and the fixed (latest) version of the program.
We reported this potential issue discovered by STAR to the project developers (https://issues.apache.org/jira/browse/collections-474) and attached STAR's auto-generated test case to our bug report.
Case Study
The developers quickly confirmed that the original patch for bug ACC-411 was incomplete: it missed a corner case that can still crash the program.
Neither the developers nor the original bug reporter had identified this corner case in over a year, yet it took the developers only a few hours to confirm and fix the bug after STAR's test case demonstrated it.
The crash-reproducing test case generated by STAR was added by the developers to the official test suite of the Apache Commons Collections project: http://svn.apache.org/r1496168
Case Study
STAR is capable of identifying and reproducing crashes that are difficult even for experienced developers.
STAR can also be used to confirm the completeness of bug fixes: if a fix is incomplete, STAR may generate a crash-reproducing test case demonstrating the missing corner case.
Challenges & Future Work
Challenges
We manually examined each unreproduced crash to identify the major challenges of reproduction:
- Environment dependency (36.7%): file input, network input.
- SMT solver limitations (23.3%): complex string constraints (e.g. regular expressions), non-linear arithmetic.
- Concurrency & non-determinism (16.7%): some crashes are only reproducible non-deterministically or under concurrent execution.
- Path explosion (6.7%)
Future Work
- Improving reproducibility: support for environment simulation (e.g. file inputs); incorporating specialized SMT solvers, such as the string solver Z3-str.
- Automatic fault localization: existing fault localization approaches require both passing and failing test cases to locate faulty statements; STAR's ability to generate failing test cases can help automate the fault localization process.
- Crash reproduction for mobile applications: Android applications are similar to desktop Java programs in many aspects.
Conclusions
We proposed STAR, an automatic crash reproduction framework that uses only the stack trace.
- STAR successfully reproduced 31 (59.6%) of 52 real-world crashes from three non-trivial programs.
- The reproduced crashes can effectively help developers reveal the underlying crash-triggering bugs, or even identify unknown bugs.
- A comparison study demonstrates that STAR significantly outperforms existing crash reproduction approaches.
Thank You!
Appendix
Subject Sizes
Our evaluation study has one of the largest subject sizes compared to previous studies:

Framework | Subject Sizes (LOC) | Average Subject Size
RECRASH   | 200 – 86,000        | 47,000
ESD       | 100 – 100,000       | N/A
BugRedux  | 500 – 241,000       | 27,000
RECORE    | 68 – 62,000         | 35,000
STAR      | 20,000 – 100,000    | 60,000
Research Question 1
Average time to compute the crash preconditions, in seconds (the lower the better), broken down by optimization:

Optimization            | ACC  | ANT  | LOG  | Overall
No Optimization         | 18.5 | 90.4 | 55.1 | 59.3
Static Path Reduction   | 11.8 | 67.5 | 28.3 | 39.2
Heuristic Backtracking  | 15.9 | 74.8 | 47.8 | 50.0
Contradiction Detection | 13.8 | 86.8 | 48.2 | 54.3
All Optimizations       | 2.1  | 4.9  | 2.4  | 3.3
Comparison Study
Average time to reproduce crashes, in seconds (the lower the better), over only the common reproductions:

Framework | ACC | ANT  | LOG  | Overall
BugRedux  | 2.4 | 29.9 | 4.27 | 8.7
STAR      | 2.3 | 10.8 | 3.75 | 4.6
User Survey

Surveys Sent | Responses | Confirmed Correctness | Confirmed Usefulness
31           | 6 (19%)   | 5                     | 3

ACC-53: “The auto-generated test case would reproduce the bug. . . I think that having such a test case would have been useful.”
Comparison Study
Branch coverage (%) achieved by different test case generation approaches:

Approach         | ACC | JSAP | SAT4J
Sample Execution | 16  | 30   | 12
Randoop          | 29  | 40   | 20
Palulu           | 19  | 58   | 22
RecGen           | 22  | 54   | 0
Palus            | 29  | 61   | 36
STAR             | 69  | 74   | 54