Delta Debugging
Koushik Sen, EECS, UC Berkeley
<td align=left valign=top><SELECT NAME="op sys" MULTIPLE SIZE=7><OPTION VALUE="All">All<OPTION VALUE="Windows 3.1">Windows 3.1<OPTION VALUE="Windows 95">Windows 95<OPTION VALUE="Windows 98">Windows 98<OPTION VALUE="Windows ME">Windows ME<OPTION VALUE="Windows 2000">Windows 2000<OPTION VALUE="Windows NT">Windows NT<OPTION VALUE="Mac System 7">Mac System 7<OPTION VALUE="Mac System 7.5">Mac System 7.5<OPTION VALUE="Mac System 7.6.1">Mac System 7.6.1<OPTION VALUE="Mac System 8.0">Mac System 8.0<OPTION VALUE="Mac System 8.5">Mac System 8.5<OPTION VALUE="Mac System 8.6">Mac System 8.6<OPTION VALUE="Mac System 9.x">Mac System 9.x<OPTION VALUE="MacOS X">MacOS X<OPTION VALUE="Linux">Linux<OPTION VALUE="BSDI">BSDI<OPTION VALUE="FreeBSD">FreeBSD<OPTION VALUE="NetBSD">NetBSD<OPTION VALUE="OpenBSD">OpenBSD<OPTION VALUE="AIX">AIX<OPTION VALUE="BeOS">BeOS<OPTION VALUE="HP-UX">HP-UX<OPTION VALUE="IRIX">IRIX<OPTION VALUE="Neutrino">Neutrino<OPTION VALUE="OpenVMS">OpenVMS<OPTION VALUE="OS/2">OS/2<OPTION VALUE="OSF/1">OSF/1<OPTION VALUE="Solaris">Solaris<OPTION VALUE="SunOS">SunOS<OPTION VALUE="other">other</SELECT></td><td align=left valign=top><SELECT NAME="priority" MULTIPLE SIZE=7><OPTION VALUE="--">--<OPTION VALUE="P1">P1<OPTION VALUE="P2">P2<OPTION VALUE="P3">P3<OPTION VALUE="P4">P4<OPTION VALUE="P5">P5</SELECT></td><td align=left valign=top><SELECT NAME="bug severity" MULTIPLE SIZE=7><OPTION VALUE="blocker">blocker<OPTION VALUE="critical">critical<OPTION VALUE="major">major<OPTION VALUE="normal">normal<OPTION VALUE="minor">minor<OPTION VALUE="trivial">trivial<OPTION VALUE="enhancement">enhancement</SELECT></tr></table>
How do we go from this
File Print Segmentation Fault
into this
<SELECT>
File Print Segmentation Fault
Simplification
• Once one has reproduced a problem, one must find out what’s relevant:
– Does the problem really depend on 10,000 lines of input?
– Does the failure really require this exact schedule?
– Do we need this sequence of calls?
Broken computer
• How do you identify the problem?
Why Simplify?
• Ease of communication. A simplified test case is easier to communicate.
• Easier debugging. Smaller test cases result in smaller states and shorter executions.
• Identify duplicates. Simplified test cases subsume several duplicates.
Motivation
• The Mozilla open-source web browser project receives several dozen bug reports a day.
• Each bug report has to be simplified
– Eliminate all details irrelevant to producing the failure
  • To facilitate debugging
  • To make sure it does not replicate a similar bug report
• In July 1999, Bugzilla listed more than 370 open bug reports for Mozilla.
– These were not even simplified
– Mozilla engineers were overwhelmed with work
– They created the Mozilla BugAThon: a call for volunteers to process bug reports
Your Solution
• How do you solve these problems?
• Binary search
– Cut the test case in half
– Iterate
• Brilliant idea: Why not automate this?
Binary Search
• Proceed by binary search. Throw away half the input and see if the output is still wrong.
• If not, go back to the previous state and discard the other half of the input.
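A minimal sketch of this halving loop in Python. It assumes the input is a list of lines and that `fails` is a hypothetical oracle reporting whether the program still misbehaves on a given input. Neither name comes from the slides.

```python
def halve(lines, fails):
    """Repeatedly keep whichever half of `lines` still triggers the failure."""
    while len(lines) > 1:
        mid = len(lines) // 2
        first, second = lines[:mid], lines[mid:]
        if fails(first):
            lines = first       # the failure survives in the first half
        elif fails(second):
            lines = second      # go back and discard the first half instead
        else:
            break               # neither half fails alone: binary search is stuck
    return lines


# Hypothetical oracle: the "browser" crashes on any page containing <SELECT
def crashes(lines):
    return any("<SELECT" in line for line in lines)


page = ["<td>", '<SELECT NAME="priority" MULTIPLE SIZE=7>', "</td>", "</tr>"]
print(halve(page, crashes))
```

The `else` branch is the case plain binary search cannot handle. It is the reason for changing the granularity of the search.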
Simplified input
<td align=left valign=top><SELECT NAME="op sys" MULTIPLE SIZE=7><OPTION VALUE="All">All<OPTION VALUE="Windows 3.1">Windows 3.1<OPTION VALUE="Windows 95">Windows 95<OPTION VALUE="Windows 98">Windows 98<OPTION VALUE="Windows ME">Windows ME<OPTION VALUE="Windows 2000">Windows 2000<OPTION VALUE="Windows NT">Windows NT<OPTION VALUE="Mac System 7">Mac System 7<OPTION VALUE="Mac System 7.5">Mac System 7.5<OPTION VALUE="Mac System 7.6.1">Mac System 7.6.1<OPTION VALUE="Mac System 8.0">Mac System 8.0<OPTION VALUE="Mac System 8.5">Mac System 8.5<OPTION VALUE="Mac System 8.6">Mac System 8.6<OPTION VALUE="Mac System 9.x">Mac System 9.x<OPTION VALUE="MacOS X">MacOS X<OPTION VALUE="Linux">Linux<OPTION VALUE="BSDI">BSDI<OPTION VALUE="FreeBSD">FreeBSD<OPTION VALUE="NetBSD">NetBSD<OPTION VALUE="OpenBSD">OpenBSD<OPTION VALUE="AIX">AIX<OPTION VALUE="BeOS">BeOS<OPTION VALUE="HP-UX">HP-UX<OPTION VALUE="IRIX">IRIX<OPTION VALUE="Neutrino">Neutrino<OPTION VALUE="OpenVMS">OpenVMS<OPTION VALUE="OS/2">OS/2<OPTION VALUE="OSF/1">OSF/1<OPTION VALUE="Solaris">Solaris<OPTION VALUE="SunOS">SunOS<OPTION VALUE="other">other</SELECT></td><td align=left valign=top><SELECT NAME="priority" MULTIPLE SIZE=7><OPTION VALUE="--">--<OPTION VALUE="P1">P1<OPTION VALUE="P2">P2<OPTION VALUE="P3">P3<OPTION VALUE="P4">P4<OPTION VALUE="P5">P5</SELECT></td><td align=left valign=top><SELECT NAME="bug severity" MULTIPLE SIZE=7><OPTION VALUE="blocker">blocker<OPTION VALUE="critical">critical<OPTION VALUE="major">major<OPTION VALUE="normal">normal<OPTION VALUE="minor">minor<OPTION VALUE="trivial">trivial<OPTION VALUE="enhancement">enhancement</SELECT></tr></table>
Complex Input
File Print Segmentation Fault
Simplified Input
• <SELECT NAME="priority" MULTIPLE SIZE=7>
• Simplified from 896 lines to one single line
• Required 12 tests only
Binary Search
• <SELECT NAME="priority" MULTIPLE SIZE=7>
What do we do if both halves pass?
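Change Granularity
• Increasing the granularity of the test case (more, smaller pieces):
– Failure of the test case: more chances
– Progress of the search: slower, reduced by amount < ½
• Decreasing the granularity of the test case (fewer, larger pieces):
– Failure of the test case: fewer chances
– Progress of the search: faster, reduced by amount > ½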
General DD Algorithm
Basic idea:
1. Start with few & large changes first
2. If all alternatives pass or are unresolved, apply more & smaller changes.
[Figure: the change set split first into two subsets $\Delta_1, \Delta_2$, then into eight subsets $\Delta_1, \ldots, \Delta_8$]
Inputs and Failures
• Let $E$ be the set of possible inputs
• $r_P \in E$ corresponds to an input that passes
• $r_F \in E$ corresponds to an input that fails
Binary Search
• <SELECT NAME="priority" MULTIPLE SIZE=7>
Example of $r_F$ and $r_P$?
$r_P$ = <SELECT NALE SIZE=7>
$r_F$ = <SELECT NAME="priori
Changes
• We can go from one input to another by changes
• A change is a mapping $\delta : E \to E$ which takes one input and changes it to another input
$r_1$ = <SELECT NAty" MULTIPLE SIZE=7>
$\delta'$ = insert ME="priori at appropriate position
What is $\delta'(r_1)$?
<SELECT NAME="priority" MULTIPLE SIZE=7>
Decomposing Changes
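• A change $\delta$ can be decomposed into a number of elementary changes $\delta_1, \delta_2, \ldots, \delta_n$ where $\delta = \delta_1 \circ \delta_2 \circ \cdots \circ \delta_n$
– where $(\delta_i \circ \delta_j)(r) = \delta_i(\delta_j(r))$
• For example, deleting a part of the input file can be decomposed into deleting characters one by one from the input file
– another way to say it: by composing deletions of single characters we can get a change that deletes part of the input file
$\delta'$ = insert ME="priori
$\delta' = \delta_1 \circ \delta_2 \circ \delta_3 \circ \delta_4 \circ \delta_5 \circ \delta_6 \circ \delta_7 \circ \delta_8 \circ \delta_9 \circ \delta_{10}$
$\delta_1$ = insert M, $\delta_2$ = insert E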
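The decomposition can be sketched in Python as function composition. The helpers `insert_at` and `compose` are hypothetical names, and inserting every character at index 10 is an assumption chosen so that the right-to-left application order of the composition rebuilds the text in the correct order.

```python
def insert_at(pos, text):
    """Return a change: a function mapping one input string to another."""
    return lambda r: r[:pos] + text + r[pos:]


def compose(*changes):
    """(d1 o d2 o ... o dn)(r) = d1(d2(...dn(r)...)): the last change applies first."""
    def composed(r):
        for change in reversed(changes):
            r = change(r)
        return r
    return composed


r1 = '<SELECT NAty" MULTIPLE SIZE=7>'

# Ten elementary changes, one per character of ME="priori.
# Each inserts at index 10 (right after "<SELECT NA"); since the composition
# applies the last change first, the characters end up in the right order.
elementary = [insert_at(10, ch) for ch in 'ME="priori']
delta_prime = compose(*elementary)

print(delta_prime(r1))
```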
Summary
• We have an input with failure: $r_F$
• We have an input without failure: $r_P$
• We have a set of changes $c_F = \{\delta_1, \delta_2, \ldots, \delta_n\}$ such that $r_F = (\delta_1 \circ \delta_2 \circ \cdots \circ \delta_n)(r_P)$, where $r_F$ is a run with failure
• Each subset $c$ of $c_F$ is a test case
Testing Test Cases
• Given a test case $c$, we would like to know if the input generated by changing $r_P$ by the changes in $c$ is an input that causes a failure
• We define a function
$\mathit{test} : \mathrm{Powerset}(c_F) \to \{P, F, ?\}$
such that, given $c = \{\delta_1, \delta_2, \ldots, \delta_m\} \subseteq c_F$,
$\mathit{test}(c) = F$ iff $(\delta_1 \circ \delta_2 \circ \cdots \circ \delta_m)(r_P)$ is a failing input
Minimizing Test Cases
• Now the question is: Can we find the minimal test case $c$ such that $\mathit{test}(c) = F$?
• A test case $c \subseteq c_F$ is called the global minimum of $c_F$ if for all $c' \subseteq c_F$, $|c'| < |c| \Rightarrow \mathit{test}(c') \neq F$
• Global minimum is the smallest set of changes which will make the program fail
• Finding the global minimum may require us to perform an exponential number of tests
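To see where the exponential cost comes from, here is a brute-force sketch that tries subsets in order of increasing size. The function name, the `(index, character)` change model and the example oracle are assumptions for illustration only. In the worst case the loop visits all $2^{|c_F|}$ subsets.

```python
from itertools import combinations


def global_minimum(c_f, test):
    """Return a smallest failing subset of c_f by exhaustive search, or None."""
    for size in range(len(c_f) + 1):
        for subset in combinations(c_f, size):
            if test(subset) == "F":
                return list(subset)   # first hit at the smallest size
    return None


def example_test(c):
    """Hypothetical oracle: the input fails if it contains <SELECT."""
    text = "".join(ch for _, ch in c)
    return "F" if "<SELECT" in text else "P"


c_f = list(enumerate("a<SELECT>b"))
smallest = global_minimum(c_f, example_test)
print("".join(ch for _, ch in smallest))
```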
Search for 1-minimal Input
• Different problem formulation
– Find a set of changes that cause the failure, but removing any change causes the failure to go away
• This is 1-minimality
Minimizing Test Cases
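• A test case $c \subseteq c_F$ is called a local minimum of $c_F$ if for all $c' \subset c$, $\mathit{test}(c') \neq F$
• A test case $c \subseteq c_F$ is n-minimal if for all $c' \subset c$, $|c| - |c'| \le n \Rightarrow \mathit{test}(c') \neq F$
• A test case is 1-minimal if for all $\delta_i \in c$, $\mathit{test}(c - \{\delta_i\}) \neq F$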
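These definitions translate fairly directly into a checker. The sketch below assumes test cases are lists of changes and that `test` returns `"F"` for a failing input. The function names and the character-based example oracle are hypothetical.

```python
from itertools import combinations


def is_n_minimal(c, test, n):
    """True if removing any 1..n changes from c never yields a failing test case."""
    c = list(c)
    for k in range(1, min(n, len(c)) + 1):
        for dropped in combinations(range(len(c)), k):
            smaller = [d for i, d in enumerate(c) if i not in dropped]
            if test(smaller) == "F":
                return False
    return True


def is_one_minimal(c, test):
    """1-minimality: no single change can be removed without losing the failure."""
    return is_n_minimal(c, test, 1)


def example_test(chars):
    """Hypothetical oracle: the input fails if it contains <SELECT."""
    return "F" if "<SELECT" in "".join(chars) else "P"


print(is_one_minimal(list("<SELECT"), example_test))
print(is_one_minimal(list("<SELECT>"), example_test))
```

Checking n-minimality for large n quickly becomes expensive, since the number of subsets to try grows combinatorially. This is why the slides settle for 1-minimality.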
Naïve Algorithm
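• To find a 1-minimal subset of $c$, simply
– Remove one element $\delta$ from $c$
– If $\mathit{test}(c - \{\delta\}) = F$, recurse with the smaller set
– If $\mathit{test}(c - \{\delta\}) \neq F$ for every $\delta \in c$, then $c$ is 1-minimal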
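A direct, unoptimized sketch of this naïve procedure in Python. It assumes `c` is a list of changes and `test` returns `"F"` on failure. The names and the example oracle are illustrative, not from the slides.

```python
def naive_minimize(c, test):
    """Remove one change at a time until no single removal keeps the failure."""
    c = list(c)
    for i in range(len(c)):
        smaller = c[:i] + c[i + 1:]
        if test(smaller) == "F":
            # Still fails without change i: recurse on the smaller set.
            return naive_minimize(smaller, test)
    return c   # every single removal made the failure disappear: 1-minimal


def example_test(c):
    """Hypothetical oracle: the input fails if it contains <SELECT."""
    text = "".join(ch for _, ch in c)
    return "F" if "<SELECT" in text else "P"


c_f = list(enumerate('<SELECT NAME="priority" MULTIPLE SIZE=7>'))
minimal = naive_minimize(c_f, example_test)
print("".join(ch for _, ch in minimal))
```

The recursion depth grows with the number of changes, so for long inputs an iterative loop would be the safer choice in Python.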
Analysis
• In the worst case,
– We remove one element from the set per iteration
– After trying every other element
• Work is potentially $N + (N-1) + (N-2) + \cdots$
• This is $O(N^2)$
Work Smarter, Not Harder
• We can often do better
• Silly to start out removing 1 element at a time
– Try dividing change set in 2 initially
– Increase # of subsets if we can’t make progress
– If we get lucky, search will converge quickly
Minimization Algorithm
• The delta debugging algorithm finds a 1-minimal test case
• It partitions the set $c_F$ into $\Delta_1, \Delta_2, \ldots, \Delta_n$
– $\Delta_1, \Delta_2, \ldots, \Delta_n$ are pairwise disjoint
– $c_F = \Delta_1 \cup \Delta_2 \cup \cdots \cup \Delta_n$
• Define the complement of $\Delta_i$ as $\nabla_i = c_F - \Delta_i$
• Start with $n = 2$
• Test each test case defined by the partition and their complements
• Reduce the test case if a smaller failure-inducing set is found
– otherwise refine the partition, i.e. $n = 2n$
Each Step of the Minimization Algorithm
Let $n = 2$
(A) Start with $\Delta$ as test set
(B) Test each $\Delta_1, \Delta_2, \ldots, \Delta_n$ and each $\nabla_1, \nabla_2, \ldots, \nabla_n$
(C) There are four possible outcomes
1. Some $\Delta_i$ causes failure
– Go to step (A) with $\Delta = \Delta_i$ and $n = 2$
2. Some $\nabla_i$ causes failure
– Go to step (A) with $\Delta = \nabla_i$ and $n = n - 1$
3. No test causes failure
– Increase granularity: Go to (A) with $\Delta = \Delta$ and $n = 2n$
4. The granularity can no longer be increased
– Done, found the 1-minimal subset
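The four outcomes can be sketched as a loop in Python. This is one possible reading of the steps, not a reference implementation. It assumes the test set starts as the full change set, that changes are kept in a list, and that `test` returns `"F"` on failure. The helper `split`, the cap of $n$ at the current set size, and the example oracle are all assumptions.

```python
def split(seq, n):
    """Split seq into n contiguous, non-empty chunks of near-equal size."""
    k, m = divmod(len(seq), n)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def ddmin(changes, test):
    """Reduce a failing list of changes to a 1-minimal failing sublist."""
    changes = list(changes)
    if test(changes) != "F":
        raise ValueError("the full change set must fail")
    n = 2
    while len(changes) >= 2:
        subsets = split(changes, n)
        reduced = False
        for s in subsets:                       # outcome 1: some subset fails
            if test(s) == "F":
                changes, n, reduced = s, 2, True
                break
        if not reduced and n > 2:               # outcome 2: some complement fails
            for i in range(n):                  # (for n == 2 complements are the subsets)
                comp = [d for j, s in enumerate(subsets) if j != i for d in s]
                if test(comp) == "F":
                    changes, n, reduced = comp, n - 1, True
                    break
        if not reduced:
            if n >= len(changes):               # outcome 4: cannot refine further
                break
            n = min(2 * n, len(changes))        # outcome 3: increase granularity
    return changes


def example_test(c):
    """Hypothetical oracle: the input fails if it contains <SELECT."""
    return "F" if "<SELECT" in "".join(ch for _, ch in c) else "P"


c_f = list(enumerate('<SELECT NAME="priority" MULTIPLE SIZE=7>'))
print("".join(ch for _, ch in ddmin(c_f, example_test)))
```

A real implementation would usually cache test results, since the same subset can be tested more than once as the partition is refined.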
Analysis
• Worst case is still quadratic
• Subdivide until each set is of size 1
– Reduced to the naïve algorithm
• Good news
– For a single failure-inducing change, converges in $\log N$ steps
– Binary search again
The GNU C Compiler
• This program (bug.c) causes GCC 2.95.2 to crash when optimization is enabled
• We would like to minimize this program in order to file a bug report
• In the case of GCC, a passing program run is the empty input
• For the sake of simplicity, we model change as the insertion of a single character
– $r_P$ is running GCC with an empty input
– $r_F$ means running GCC with bug.c
– each change $\delta_i$ inserts the $i$th character of bug.c
#define SIZE 20

double mult(double z[], int n)
{
    int i, j;

    i = 0;
    for (j = 0; j < n; j++) {
        i = i + j + 1;
        z[i] = z[i] * (z[0] + 1.0);
    }
    return z[n];
}

void copy(double to[], double from[], int count)
{
    int n = (count + 7) / 8;
    switch (count % 8) do {
        case 0: *to++ = *from++;
        case 7: *to++ = *from++;
        case 6: *to++ = *from++;
        case 5: *to++ = *from++;
        case 4: *to++ = *from++;
        case 3: *to++ = *from++;
        case 2: *to++ = *from++;
        case 1: *to++ = *from++;
    } while (--n > 0);
    return mult(to, 2);
}

int main(int argc, char *argv[])
{
    double x[SIZE], y[SIZE];
    double *px = x;

    while (px < x + SIZE)
        *px++ = (px - x) * (SIZE + 1.0);
    return copy(y, x, SIZE);
}
The GNU C Compiler
• The test procedure would
– create the appropriate subset of bug.c
– feed it to GCC
– return $F$ iff GCC had crashed, and $P$ otherwise
[Figure: size of the test case shrinking during minimization: 755, 377, 188, 77 characters]
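A hedged sketch of such a test procedure in Python. The compiler command is passed in by the caller. Something like `["gcc", "-O", "-c"]` is an assumption, as is detecting a crash by looking for the text `internal compiler error` on standard error. Older GCC releases may word the message differently, so the marker is a parameter.

```python
import os
import subprocess
import tempfile


def make_gcc_test(command, crash_marker="internal compiler error"):
    """Build a test function that feeds a subset of bug.c to a compiler command.

    `command` is the argument list to run, e.g. ["gcc", "-O", "-c"] (assumed);
    the path of the temporary source file is appended to it.
    """
    def test(chars):
        source = "".join(chars)                 # the chosen subset of bug.c, in order
        with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
            f.write(source)
            path = f.name
        try:
            result = subprocess.run(command + [path],
                                    capture_output=True, text=True)
        finally:
            os.unlink(path)                     # always clean up the temporary file
        # Ordinary compile errors count as passing: only a compiler crash is "F".
        return "F" if crash_marker in result.stderr else "P"
    return test
```

Depending on the command used, the compiler may also leave an object file in the working directory, which a fuller version would redirect or delete.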
The GNU C Compiler
• The minimized code is
t(double z[],int n){int i,j;for(;;){i=i+j+1;z[i]=z[i]*(z[0]+0);}return z[n];}
• The test case is 1-minimal
– No single character can be removed without removing the failure
– Even every superfluous whitespace has been removed
– The function name has shrunk from mult to a single t
– This program actually has a semantic error (infinite loop), but GCC still isn't supposed to crash
• So where could the bug be?
– We already know it is related to optimization
– If we remove the -O option to turn off optimization, the failure disappears
The GNU C Compiler
• The GCC documentation lists 31 options to control optimization on Linux
• It turns out that applying all of these options causes the failure to disappear
– Some option(s) prevent the failure
-ffloat-store -fno-default-inline -fno-defer-pop
-fforce-mem -fforce-addr -fomit-frame-pointer
-fno-inline -finline-functions -fkeep-inline-functions
-fkeep-static-consts -fno-function-cse -ffast-math
-fstrength-reduce -fthread-jumps -fcse-follow-jumps
-fcse-skip-blocks -frerun-cse-after-loop -frerun-loop-opt
-fgcse -fexpensive-optimizations -fschedule-insns
-fschedule-insns2 -ffunction-sections -fdata-sections
-fcaller-saves -funroll-loops -funroll-all-loops
-fmove-all-movables -freduce-all-givs -fno-peephole
-fstrict-aliasing
The GNU C Compiler
• We can use test case minimization in order to find the preventing option(s)
– Each $\delta_i$ stands for removing a GCC option
– Having all $\delta_i$ applied means to run GCC with no option (failing)
– Having no $\delta_i$ applied means to run GCC with all options (passing)
• After seven tests, the single option -ffast-math is found which prevents the failure
– Unfortunately, it is a bad candidate for a workaround because it may alter the semantics of the program
– Thus, we remove -ffast-math from the list of options and make another run
– Again after seven tests, it turns out that -fforce-addr also prevents the failure
– Further examination shows that no other option prevents the failure
The GNU C Compiler
• So, this is what we can send to the GCC maintainers
– The minimal test case
– “The failure only occurs with optimization”
– “-ffast-math and -fforce-addr prevent the failure”
Case Studies
• Reducing a Mozilla bug to
– Print the HTML file containing <SELECT>
• Minimizing fuzz input
– Fuzzing: take a program, send it randomly generated input, and see if it crashes
– Used delta debugging to minimize fuzz input
Another Application
• Yesterday, my program worked. Today, it does not. Why?
– The new release 4.17 of GDB changed 178,000 lines
– It no longer integrated properly with DDD (a graphical front-end)
– How to isolate the change that caused the failure?
Summary
• Delta Debugging is a technique, not a tool
• Bad News:
– Probably must be reimplemented for each significant system
– To exploit knowledge of changes
• Good News:
– Relatively simple algorithm, significant payoff
– It’s worth reimplementing