Recent Work, in Two Acts

Carlos Pacheco

8/15/2008

Agenda

1. Deconstructing Randoop

2. Mutation-based test generation

Deconstructing Randoop

deconstruct
verb [trans.] analyze (a text or a linguistic or conceptual system), typically in order to expose its hidden internal assumptions and contradictions and subvert its apparent significance or unity.

deconstruct (alt.)
verb [trans.] analyze (a tool, algorithm or software system), typically in order to expose its hidden internal assumptions and components and evaluate its apparent significance or unity.

Goals

• Identify Randoop's key, separable ideas

• Determine their individual effectiveness

• Determine their combination's effectiveness

Randoop: feedback-directed random test generator

[Diagram: classes under test + properties to check → Randoop → failing test cases]

Classes under test (example): java.util.Collections, java.util.ArrayList, java.util.TreeSet, java.util.LinkedList, ...

Properties to check (example): reflexivity of equality: ∀ o != null : o.equals(o) == true

Failing test case (example):

public void test() {
  Object o = new Object();
  ArrayList a = new ArrayList();
  a.add(o);
  TreeSet ts = new TreeSet(a);
  Set us = Collections.unmodifiableSet(ts);

  // Fails at runtime.
  assertTrue(us.equals(us));
}

Feedback-directed random test generation

1. Seed component set: components = { ... } (for example: int i = 0; boolean b = false;)

2. Do until time limit expires:
   a. Create a new sequence
      i. Randomly pick a method call m(T1...Tk)/Tret
      ii. For each input parameter of type Ti, randomly pick a sequence Si from the components that constructs an object vi of type Ti
      iii. Create new sequence Snew = S1; ... ; Sk ; Tret vnew = m(v1...vk);
      iv. If Snew was previously created (lexically), go to i
   b. Classify the new sequence Snew
      i. May discard, output as test case, or add to components

Classifying a sequence

[Flowchart]
start → execute and check properties → property violated?
  yes → minimize sequence → contract-violating test case
  no → exception thrown?
    yes → discard sequence
    no → add to component set
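The generation loop and the classification step can be sketched in Java on a toy model. Everything in this sketch is an assumption made for illustration and is not Randoop's code: a "sequence" is an integer expression rather than real method calls, the names FeedbackSketch, Seq and Op are hypothetical, a step budget stands in for the time limit, the property is a toy predicate on the result, and minimization is omitted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical toy model of feedback-directed generation (not Randoop's implementation).
class FeedbackSketch {
    static final class Seq {
        final String text;           // lexical form, used to detect duplicates
        final Supplier<Object> run;  // re-executes the sequence and returns its last value
        Seq(String text, Supplier<Object> run) { this.text = text; this.run = run; }
    }

    static final class Op {
        final String name; final int arity; final Function<Object[], Object> body;
        Op(String name, int arity, Function<Object[], Object> body) {
            this.name = name; this.arity = arity; this.body = body;
        }
    }

    final List<Seq> components = new ArrayList<>();
    final List<Seq> failing = new ArrayList<>();
    final Set<String> seen = new HashSet<>();

    // seeds must be non-empty so that every parameter can be filled.
    void generate(List<Seq> seeds, List<Op> ops, Predicate<Object> property, int steps, long seed) {
        Random rnd = new Random(seed);
        components.addAll(seeds);                              // 1. seed the component set
        for (int n = 0; n < steps; n++) {                      // 2. step budget instead of a time limit
            Op m = ops.get(rnd.nextInt(ops.size()));           // i. pick a method at random
            Seq[] in = new Seq[m.arity];
            List<String> args = new ArrayList<>();
            for (int i = 0; i < m.arity; i++) {                // ii. pick an input sequence per parameter
                in[i] = components.get(rnd.nextInt(components.size()));
                args.add(in[i].text);
            }
            String text = m.name + "(" + String.join(", ", args) + ")";
            if (!seen.add(text)) continue;                     // iv. lexical duplicate: pick again
            Seq s = new Seq(text, () -> {                      // iii. new sequence = inputs, then the call
                Object[] vals = new Object[in.length];
                for (int i = 0; i < in.length; i++) vals[i] = in[i].run.get();
                return m.body.apply(vals);
            });
            classify(s, property);                             // b. classify the new sequence
        }
    }

    void classify(Seq s, Predicate<Object> property) {
        Object value;
        try {
            value = s.run.get();
        } catch (RuntimeException e) {
            return;                                            // exception thrown: discard
        }
        if (!property.test(value)) failing.add(s);             // property violated: output as test case
        else components.add(s);                                // otherwise: reusable component
    }

    static FeedbackSketch demo() {
        List<Seq> seeds = new ArrayList<>();
        seeds.add(new Seq("0", () -> 0));
        seeds.add(new Seq("100", () -> 100));
        List<Op> ops = new ArrayList<>();
        ops.add(new Op("inc", 1, a -> ((Integer) a[0]) + 1));
        ops.add(new Op("div", 2, a -> ((Integer) a[0]) / ((Integer) a[1])));  // throws on division by zero
        FeedbackSketch g = new FeedbackSketch();
        g.generate(seeds, ops, v -> ((Integer) v) != 3, 300, 42L);            // toy property: result is never 3
        return g;
    }
}
```

In this sketch a sequence that throws never enters the component set, so later sequences are built only from inputs that executed normally; that is the feedback the slides refer to.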

Prior evaluation

• Compared with other techniques
  – Model checking, symbolic execution, traditional random testing

• On collection classes (lists, sets, maps, etc.)
  – Randoop achieved equal or higher code coverage in less time

• On a large benchmark of programs (750KLOC)
  – Randoop revealed more errors

Randoop's two key ideas

1. Create method sequences incrementally (component set)

2. Use runtime information to guide generation


What makes it work?

• Component set?
• Runtime feedback?
• Both... Or neither?

Four techniques

                      use feedback?
                      yes                    no
use components?  yes  Randoop                Randoop without feedback
                 no   naive with feedback    naive

Naive sequence generation

• To generate one sequence:
  1. Start from the empty sequence S
  2. Select an enabled method at random
  3. Select input to the method from S
  4. Extend S with the new method call, go back to 2

• A method is enabled if S declares objects that can serve as its receiver and arguments
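A minimal Java sketch of the naive generator follows, under stated assumptions: methods are described only by name and type names (the hypothetical Method class), the receiver is modeled as the first parameter, and the output is the text of the sequence rather than executable calls. It illustrates the steps above and is not the implementation used in the evaluation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of naive sequence generation over string-typed method descriptors.
class NaiveSketch {
    static final class Method {
        final String name; final String ret; final String[] params;
        Method(String name, String ret, String... params) {
            this.name = name; this.ret = ret; this.params = params;
        }
    }

    // A method is enabled if S declares a variable for every parameter type.
    static boolean enabled(Method m, Map<String, List<String>> varsByType) {
        for (String p : m.params) {
            if (!varsByType.containsKey(p)) return false;
        }
        return true;
    }

    static List<String> generate(List<Method> methods, int maxCalls, long seed) {
        Random rnd = new Random(seed);
        List<String> s = new ArrayList<>();                        // 1. empty sequence S
        Map<String, List<String>> varsByType = new HashMap<>();    // variables declared by S, by type
        for (int n = 0; n < maxCalls; n++) {
            List<Method> candidates = new ArrayList<>();
            for (Method c : methods) {
                if (enabled(c, varsByType)) candidates.add(c);
            }
            if (candidates.isEmpty()) break;
            Method m = candidates.get(rnd.nextInt(candidates.size()));  // 2. random enabled method
            List<String> args = new ArrayList<>();
            for (String p : m.params) {                                 // 3. inputs come from S itself
                List<String> vars = varsByType.get(p);
                args.add(vars.get(rnd.nextInt(vars.size())));
            }
            String v = "var" + n;                                       // 4. extend S with the call
            s.add(m.ret + " " + v + " = " + m.name + "(" + String.join(", ", args) + ");");
            varsByType.computeIfAbsent(m.ret, k -> new ArrayList<>()).add(v);
        }
        return s;
    }

    static List<String> demo() {
        List<Method> methods = new ArrayList<>();
        methods.add(new Method("newList", "List"));
        methods.add(new Method("size", "int", "List"));
        methods.add(new Method("get", "Object", "List", "int"));
        return generate(methods, 5, 7L);
    }
}
```

With these three descriptors only newList is enabled at the start, so every generated sequence begins by creating a list; which calls follow depends on the random seed.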

Naive generation with feedback

• Extend new sequence with method call
• Execute method call, check properties
• If exception/failure, go back one step
  – Remove last method call
  – Attempt different extension

Randoop without feedback

Add every new sequence to component set, regardless of its execution result.

Review: four techniques

                      use feedback?
                      yes                    no
use components?  yes  Randoop                Randoop without feedback
                 no   naive with feedback    naive

Evaluation

• Apply the four techniques to a set of libraries
• Compare
  – coverage
  – errors revealed

Libraries

library      members  LOC
chain        189      8K
logging      136      4K
javax        90       14K
prims        990      6K
collections  415      39K
jelly        469      14K
utilmde      577      13K
collext      2114     61K
math         687      21K

Input space size: distinct input sequences of length...

library      1     2     3            4            5
chain        28    1.3K  97K          10M          1B
logging      35    1.6K  112K         10.7M        1B
javax        38    2.2K  167K         15M          1.3B
prims        372   154K  63M          26B          1 x 10^12
collections  1.6K  2.7M  4.6B         7.8 x 10^12  1.3 x 10^16
jelly        910   1.5M  3.5B         8.1 x 10^12  1.8 x 10^16
utilmde      2.8K  9.2M  30B          3.0 x 10^14  3.0 x 10^18
collext      6.9K  49M   343M         2.4 x 10^15  1.7 x 10^19
math         25K   623M  1.5 x 10^13  3.8 x 10^17  9.6 x 10^21

Input

For each library:
  – All public members in library
  – Sequence limit: 50 calls
  – Small set of primitives (0, -1, 100, 'a', etc.)

Other details

• Stopping criterion
  – Coverage does not increase after 100 seconds

• Five properties
  – Equals symmetric, equals reflexive, equals to null returns false, equals-hashcode, no NPEs

• Engineering fairness
  – Optimized all four techniques to make sequence construction efficient
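The five properties can be written as a small checker over a pair of objects. This is a sketch under assumptions: the class and method names (ContractSketch, violations, NeverEqual) are hypothetical, and "no NPEs" is approximated by catching a NullPointerException raised while the other four checks run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for the five properties listed above.
class ContractSketch {
    // Returns the names of the properties violated by the pair (a, b); both must be non-null.
    static List<String> violations(Object a, Object b) {
        List<String> out = new ArrayList<>();
        try {
            if (!a.equals(a)) out.add("equals reflexive");
            if (a.equals(b) != b.equals(a)) out.add("equals symmetric");
            if (a.equals(null)) out.add("equals to null returns false");
            if (a.equals(b) && a.hashCode() != b.hashCode()) out.add("equals-hashcode");
        } catch (NullPointerException e) {
            out.add("no NPEs");
        }
        return out;
    }

    // Deliberately buggy class for the demonstration: equals is never true.
    static final class NeverEqual {
        @Override public boolean equals(Object o) { return false; }
        @Override public int hashCode() { return 1; }
    }
}
```

Calling violations with two equal strings reports nothing, while two NeverEqual instances are reported as violating reflexivity.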

Output

• Failing test cases

• One test per (violating method,property) pair

• Ongoing: manually inspecting all failures

Failures

library      naive  Randoop w/o feedback  naive w/ feedback  Randoop
chain        24     0                     0                  13
logging      19     0                     0                  12
javax        0      0                     0                  0
prims        10     0                     13                 16
collections  21     0                     20                 15
jelly        57     0                     0                  80
utilmde      1      2                     0                  2
collext      50     3                     14                 85
math         64     8                     2                  10
TOTAL        246    13                    49                 233

Failure kinds

kind   naive  Randoop w/o feedback  naive w/ feedback  Randoop
NPEs   218    13                    0                  176
Other  28     0                     49                 57
TOTAL  246    13                    49                 233

Coverage achieved

[Bar chart: coverage achieved (scale 0 to 0.9) per library (javax, chain, jelly, collext, logging, util, collections, prims, math) for Randoop without feedback, naive with feedback, naive, and Randoop]

Coverage vs. time

[Plot: coverage (y axis) against time (x axis) for Randoop and for another technique ("Other"); the times tother and tRandoop are marked on the time axis]

tRandoop / tother

library      naive  Randoop w/o feedback  naive w/ feedback
chain        0.1    1                     0.01
logging      n/a    1                     0.15
javax        0.67   1                     0.12
prims        0.03   0.15                  0.01
collections  0.03   0.22                  0.01
jelly        0.03   1                     0.01
utilmde      0.001  0.11                  0.005
collext      0.04   0.19                  0.06
math         0.01   0.13                  0.01

Conclusion

• Randoop:
  – High coverage very quickly
  – More "serious" failures

• Naive:
  – Good coverage, slower/less than Randoop
  – More NPE failures

• Other techniques
  – Not as effective

Mutation-based generation

Carlos Pacheco, Jeff Perkins

Motivation

• Randoop
  – Achieves reasonable coverage
  – Hits a coverage plateau

• Can we push the coverage plateau up?

[Plot: coverage (y axis) against time (x axis), with curves labeled Randoop and Goal]

Idea

• Follow random generation with systematic mutation of method sequences
  – null
  – unrelated types
  – related types (super, subclasses)
  – aliasing
  – structurally-equivalent objects

Mutation via dataflow tracking

1. When coverage plateaus, stop random generation
2. Identify frontier branches
3. For each frontier branch:
   a) Select candidate sequences (that reach frontier branches)
   b) Track the variables whose data flows into the branch condition
   c) Systematically mutate the variables

Example

Candidate sequence:

  int var1 = 5;
  BinTree var2 = new BinTree(var1);
  int var3 = 2;
  var2.add(var3);
  int var4 = 6;
  var2.remove(var4);

Frontier branch:

  class BinTree {
    public boolean remove(int x) {
      . . .
      if (current.value == x)
      . . .
    }
  }

Runtime analysis:

  relevant variables: var3 and var4
  var3 was compared to 6
  var4 was compared to 2

Strategy:

  Modify every relevant variable to take on each compared value

Runtime analysis

• Determine data flow at frontier branch
  1. Tag each variable's runtime value on creation
  2. On each operation, create a tree with the operation as the root and operands as branches
  3. From branch tree, determine
     – relevant variables
     – values that each variable was compared to

• Could also track control flow
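The tagging idea can be illustrated with a simplified Java sketch. It is an assumption-laden stand-in, not the instrumentation described in these slides: values are wrapped by hand in a hypothetical Tagged class, each value carries a flat set of source variable names instead of an operation tree, and only integer equality at a branch is recorded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: tag values with their source variables and log branch comparisons.
class TagSketch {
    static final class Tagged {
        final int value;
        final Set<String> sources;  // variables whose data flowed into this value
        Tagged(int value, Set<String> sources) { this.value = value; this.sources = sources; }

        // Tag a variable's value on creation.
        static Tagged of(String variable, int value) {
            Set<String> s = new TreeSet<>();
            s.add(variable);
            return new Tagged(value, s);
        }

        // An operation merges the tags of its operands.
        Tagged plus(Tagged other) {
            Set<String> merged = new TreeSet<>(sources);
            merged.addAll(other.sources);
            return new Tagged(value + other.value, merged);
        }
    }

    // variable name -> values it was compared to at the branch
    final Map<String, Set<Integer>> comparedTo = new HashMap<>();

    // Instrumented form of "a == b" at a branch: record the comparison, then evaluate it.
    boolean branchEquals(Tagged a, Tagged b) {
        for (String v : a.sources) comparedTo.computeIfAbsent(v, k -> new TreeSet<>()).add(b.value);
        for (String v : b.sources) comparedTo.computeIfAbsent(v, k -> new TreeSet<>()).add(a.value);
        return a.value == b.value;
    }
}
```

Tagging var3 = 2 and var4 = 6 and comparing them yields the same kind of report as the earlier BinTree example: var3 was compared to 6 and var4 was compared to 2.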

Sequence mutation strategies

• Primitive variables
  – For each primitive variable x: set x to compared values +/- {0, 1, 10, 100}

• Reference variables
  – Given two variables x and y (of the same type):
    • Replace uses of x by y (alias)
    • Make x and y structurally equivalent (copy)
    • Make one null, the other non-null
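The primitive strategy amounts to enumerating a small set of candidate values per variable. A minimal sketch, assuming int variables and ignoring overflow (the class and method names are hypothetical):

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the primitive mutation strategy for int variables.
class PrimitiveMutationSketch {
    static final int[] OFFSETS = {0, 1, 10, 100};

    // For each value the variable was compared to, propose value + d and value - d.
    static Set<Integer> candidates(int... comparedValues) {
        Set<Integer> out = new TreeSet<>();
        for (int v : comparedValues) {
            for (int d : OFFSETS) {
                out.add(v + d);  // overflow is ignored in this sketch
                out.add(v - d);
            }
        }
        return out;
    }
}
```

For a variable that was compared to 6, candidates(6) returns [-94, -4, 5, 6, 7, 16, 106]; each candidate gives one mutated sequence to execute.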

Example 2

Candidate sequence:

  int var0 = 100;
  int var1 = -1;
  List var2 = nCopies(var0, var1);
  shuffle(var2);

Frontier branch:

  public int next(int n) {
    . . .
    if ((n & -n) == n) // i.e. n is a power of 2
    . . .
  }

Runtime analysis:

  Relevant variables: var0, var1
  var0 was compared to 4, 100

Winning strategy:

  set var0 to 4
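The branch condition itself is easy to check in isolation. The sketch below only evaluates the expression from the frontier branch (the class and method names are hypothetical); for positive n it is true exactly when n has a single bit set.

```java
// Hypothetical helper that evaluates the frontier branch condition for a given n.
class PowerOfTwoSketch {
    static boolean takesBranch(int n) {
        return (n & -n) == n;  // n & -n isolates the lowest set bit of n
    }
}
```

For n = 100 the lowest set bit is 4, so the condition is false; for n = 4 it is true.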

Example 3

Candidate sequence:

  ArrayList var0 = new ArrayList();
  int var1 = 0;
  String var2 = "a";
  var0.add(var1, var2);
  int var4 = 1;
  String var5 = "a";
  var0.add(var4, var5);
  long var7 = 100;
  boolean var8 = var0.add(var7);
  int var9 = 0;
  short var10 = 0;
  Object var11 = var0.set(var9, var10);
  String var12 = "b";
  boolean var13 = var0.remove(var12);
  double var14 = 0.0;
  int var15 = var0.lastIndexOf(var14);

Frontier branch:

  public int lastIndexOf(Object elem) {
    . . .
    for (int i = size-1; i >= 0; i--) {
      if (elem.equals(elementData[i]))
      . . .
  }

Runtime analysis:

  Relevant variables: var1, var9, var10, var14

Winning strategy:

  Replace uses of var14 with var7
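The effect of this mutation can be reproduced with java.util.ArrayList directly. The sketch rebuilds the list from the candidate sequence and passes either the original argument (0.0) or the aliased one (100L) to lastIndexOf; the wrapper class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that replays the candidate sequence with a chosen lastIndexOf argument.
class LastIndexOfSketch {
    static int run(Object elem) {
        List<Object> list = new ArrayList<>();
        list.add(0, "a");
        list.add(1, "a");
        list.add(100L);          // var7, boxed to Long
        list.set(0, (short) 0);  // var10, boxed to Short
        list.remove("b");        // no such element, list unchanged
        return list.lastIndexOf(elem);
    }
}
```

With the original argument the Double 0.0 equals no element (the Short 0 has a different boxed type), so the equals test is never true. With the Long 100 it matches the element added from var7, and the true side of the branch is taken.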

Coverage-directed sequence mutation

• Randoop covered 933 of 2064 branches
  – 163 frontier branches
  – Dataflow information was found for 29 frontier branches

• Mutation strategies were able to cover 19 of those branches

Dataflow implementation

• Instrument Java class files as they are loaded
• Maintain tags for each runtime value
• When two values interact, merge their tags
• Create summaries for JDK methods
