http://genprog.cs.virginia.edu 1
REPRESENTATIONS AND OPERATORS FOR IMPROVING EVOLUTIONARY SOFTWARE REPAIR
Claire Le Goues
Westley Weimer
Stephanie Forrest
Claire Le Goues, GECCO 2012 2
PROBLEM: BUGGY SOFTWARE
“Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.”
– Mozilla Developer, 2005
Annual cost of software errors in the US: $59.5 billion (0.6% of GDP).
Average time to fix a security-critical error: 28 days.
90%: Maintenance
10%: Everything Else
APPROACH: EVOLUTIONARY COMPUTATION
Genetic Programming
Search
Input: source code, specification
Output: repaired version of the program
SEARCH SPACE
PROGRAM REPAIR
Learning: patches or repaired programs
Population: 40
Iterations: 10
Max variant size: unbounded
Largest benchmark program: 2.8 million lines of C code
OTHER GP PROBLEMS (GECCO GP-TRACK BEST PAPERS)
Learning: expression trees or lists
Population: 64 – 2500
Iterations: 50 – 10000
Max variant size: 16 operations, 48 operations, 17 levels, 11 levels
SEARCH SPACE
EC-based repair starts with a large genome.
The starting individual is mostly correct.
OUR GOAL
IN-DEPTH STUDY OF REPRESENTATION AND OPERATORS FOR EVOLUTIONARY PROGRAM REPAIR.
[Figure: search loop with stages INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT, and CROSSOVER, MUTATE, SELECT]
[Figure: the input program drawn as a graph of 12 numbered statements]
REPRESENTATION
Input: [figure: program with numbered statements 1 to 5]
Patch: Delete(3), Replace(3,5)
AST/WP: [figure: modified program with statements 1, 2, 4, 5, 5’]
Example edits: Delete(3) Insert(2,4) Replace(5,1) Insert(5,4) Insert(3,3) Delete(4) …
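The patch representation can be illustrated with a minimal sketch. It assumes that a variant is stored as an ordered list of edits over numbered statements, that Insert(a,b) places a copy of statement b after statement a, that Replace(a,b) overwrites statement a with a copy of statement b, and that copies are always taken from the original program. The class names and apply_patch are hypothetical, not GenProg's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Delete:
    target: int  # statement id to remove


@dataclass(frozen=True)
class Insert:
    target: int  # statement id to insert after (assumed meaning)
    source: int  # statement id whose original text is copied


@dataclass(frozen=True)
class Replace:
    target: int  # statement id to overwrite
    source: int  # statement id whose original text is copied


Edit = Union[Delete, Insert, Replace]


def apply_patch(program: Dict[int, str], patch: List[Edit]) -> List[str]:
    """Apply the edits in order and return the resulting statement sequence."""
    # Each original statement id owns a slot holding zero or more statements.
    slots: Dict[int, List[str]] = {sid: [text] for sid, text in program.items()}
    for edit in patch:
        if isinstance(edit, Delete):
            slots[edit.target] = []
        elif isinstance(edit, Insert):
            slots[edit.target] = slots[edit.target] + [program[edit.source]]
        elif isinstance(edit, Replace):
            slots[edit.target] = [program[edit.source]]
        else:
            raise TypeError(f"unknown edit: {edit!r}")
    # Flatten the slots in statement-id order; the original program is untouched.
    return [stmt for sid in sorted(slots) for stmt in slots[sid]]


program = {1: "s1", 2: "s2", 3: "s3", 4: "s4", 5: "s5"}
patch = [Delete(3), Replace(3, 5)]
print(apply_patch(program, patch))  # ['s1', 's2', 's5', 's4', 's5']
```

Under these assumptions the genome is only as long as the edit list, regardless of program size, which is the practical difference from carrying a full copy of the program per variant.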
GENETIC OPERATORS
Mutation operators (swap, insert, delete):
• Manipulate only existing genetic material.
• Semantic checking improves probability that mutation is viable.
Crossover:
• One-point: on the weighted path or edit list.
• Patch-subset: uniform, on the edit list.
Aside: mutation operator selection matters!
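The two crossover operators on edit lists can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: one-point crossover is taken to pick an independent cut point in each parent and swap tails, and patch-subset crossover is taken to pool both parents' edits and keep each edit with probability 0.5 for each child. The function names and keep_prob parameter are hypothetical.

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def one_point_crossover(
    a: List[T], b: List[T], rng: random.Random
) -> Tuple[List[T], List[T]]:
    """Swap the tails of two edit lists at independently chosen cut points."""
    cut_a = rng.randint(0, len(a))  # randint is inclusive on both ends
    cut_b = rng.randint(0, len(b))
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]


def patch_subset_crossover(
    a: List[T], b: List[T], rng: random.Random, keep_prob: float = 0.5
) -> Tuple[List[T], List[T]]:
    """Build each child as a uniform random subset of the pooled edits."""
    pool = list(a) + list(b)
    child1 = [edit for edit in pool if rng.random() < keep_prob]
    child2 = [edit for edit in pool if rng.random() < keep_prob]
    return child1, child2


rng = random.Random(0)
parent_a = ["Delete(3)", "Insert(2,4)", "Replace(5,1)"]
parent_b = ["Insert(5,4)", "Insert(3,3)"]
print(one_point_crossover(parent_a, parent_b, rng))
print(patch_subset_crossover(parent_a, parent_b, rng))
```

One-point crossover conserves the total number of edits across the two children, while the subset variant can shrink or grow a child's edit list.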
[Figure: the input program's 12 numbered statements, shaded by fault localization]
Legend: Likely faulty: high change probability. Maybe faulty: low change probability. Not faulty: not changed.
Default: 10 : 1 ratio
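A minimal sketch of how a 10 : 1 weighting could drive the choice of mutation target. It assumes that "likely faulty" means a statement is executed only by failing tests, "maybe faulty" means it is also executed by passing tests, and "not faulty" means no failing test reaches it. The function names and the coverage sets in the example are hypothetical.

```python
import random
from typing import Dict, Iterable, Set


def statement_weights(
    statements: Iterable[int],
    failing_cov: Set[int],
    passing_cov: Set[int],
    only_failing: float = 10.0,
    also_passing: float = 1.0,
) -> Dict[int, float]:
    """Assign each statement a mutation weight from test coverage."""
    weights: Dict[int, float] = {}
    for stmt in statements:
        if stmt not in failing_cov:
            weights[stmt] = 0.0  # never reached by a failing test: not changed
        elif stmt in passing_cov:
            weights[stmt] = also_passing  # maybe faulty: low change probability
        else:
            weights[stmt] = only_failing  # likely faulty: high change probability
    return weights


def pick_mutation_target(weights: Dict[int, float], rng: random.Random) -> int:
    """Sample one statement id with probability proportional to its weight."""
    stmts = list(weights)
    return rng.choices(stmts, weights=[weights[s] for s in stmts], k=1)[0]


# Toy coverage data for a 12-statement program (illustrative only).
failing = {1, 2, 5, 6, 8}
passing = {1, 2, 3, 4}
w = statement_weights(range(1, 13), failing, passing)
print(pick_mutation_target(w, random.Random(0)))
```

Changing only_failing and also_passing is enough to try other ratios, such as equal weighting.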
EXPERIMENTAL SETUP
Benchmarks: 105 bugs in 8 real-world programs.¹
• 5 million lines of C code, 10,000 test cases.
• Bugs correspond to human-written repairs for regression test failures.
Default parameters, for comparison:
• Patch representation.
• Mutation operators selected with equal random probability. 1 mutation, 1 crossover/individual/iteration.
• Population size: 40. 10 iterations or 12 wall-clock hours, whichever comes first. Tournament size: 2.
¹ Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer, “A Systematic Study of Automated Program Repair: Fixing 55 out of 105 bugs for $8 Each.” International Conference on Software Engineering (ICSE), 2012, pp. 3 – 13.
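The setup names a tournament size of 2 and a population of 40 without spelling out the selection step. A minimal sketch of tournament selection, assuming contestants are drawn without replacement and the higher fitness wins; the function names are hypothetical.

```python
import random
from typing import Callable, List, TypeVar

V = TypeVar("V")


def tournament_select(
    population: List[V],
    fitness: Callable[[V], float],
    rng: random.Random,
    tournament_size: int = 2,
) -> V:
    """Draw tournament_size distinct variants and return the fittest one."""
    contestants = rng.sample(population, tournament_size)
    return max(contestants, key=fitness)


def select_next_generation(
    population: List[V],
    fitness: Callable[[V], float],
    rng: random.Random,
    size: int = 40,
    tournament_size: int = 2,
) -> List[V]:
    """Run one tournament per slot in the next generation."""
    return [
        tournament_select(population, fitness, rng, tournament_size)
        for _ in range(size)
    ]


# Toy population: variants are integers and fitness is the value itself.
population = list(range(40))
next_gen = select_next_generation(population, float, random.Random(0))
print(len(next_gen))  # 40
```

With a tournament size of 2 the selection pressure is mild: a weak variant survives whenever it is paired with an even weaker one.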
EXPERIMENTAL SETUP
55/105 bugs repaired using default parameters.
Some bugs are more difficult to repair than others!
• Easy: 100% success rate on default parameters.
• Medium: 50 – 100% success rate on default parameters.
• Hard: 1 – 50% success rate on default parameters.
• Unfixed: 0% success.
Metrics:
• # fitness evaluations to a repair.
• GP success rate.
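The difficulty buckets can be written as a small function. The slide leaves the boundaries ambiguous, so this sketch assumes Medium covers success rates from 50% up to but not including 100%, and Hard covers anything above 0% and below 50%. The function name is hypothetical.

```python
def difficulty(success_rate: float) -> str:
    """Bucket a bug by its GP success rate under default parameters."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate == 1.0:
        return "Easy"
    if success_rate >= 0.5:  # assumed: 50% itself counts as Medium
        return "Medium"
    if success_rate > 0.0:
        return "Hard"
    return "Unfixed"


for rate in (1.0, 0.78, 0.24, 0.0):
    print(rate, difficulty(rate))
```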
RESEARCH QUESTIONS
Representation:
• Which representation choice gives better results?
• Which representation features contribute most to success?
Crossover: Which crossover operator is best?
Operators:
• Which operators contribute the most to success?
• How should they be selected?
Search space: How should the representation weight program statements to best define the search space?
REPRESENTATION: RESULTS
Procedure:
Compare AST/WP to PATCH on original benchmarks with default parameters.
For both representations, test effectiveness of:
1. Crossover.
2. Semantic check.
Results:
1. Patch outperforms AST/WP (14 – 30%).
2. Semantic check strongly influences success rate of both representations.
3. Crossover also improves results.
CROSSOVER: RESULTS
Crossover Operator     Success Rate   Fitness evaluations to repair
None                   54.4%          82.43
Default/“Uniform”      61.1%          163.05
One-Point/AST-WP       63.7%          114.12
One-Point/Patch        65.2%          118.20
SEARCH SPACE: SETUP
Hypothesis: statements executed only by the failing test case(s) should be mutated more often than those also executed by the passing test cases.
Procedure: examine that ratio in actual repairs.
Result:
Expected: 10 : 1 vs. Actual: 1 : 1.85
SEARCH SPACE: REPAIR TIME
[Bar chart: # fitness evaluations to repair, by search difficulty]
Search difficulty   Default   Realistic   Equal
Easy                34.3      49.1        36.3
Medium              93.6      75.7        67.0
Hard                103.1     27.1        57.7
All                 66.0      59.5        58.6
SEARCH SPACE: SUCCESS RATE
[Bar chart: GP success rate, by search difficulty]
Search difficulty   Default   Realistic   Equal
Easy                1.00      0.93        0.92
Medium              0.78      0.85        0.90
Hard                0.24      0.23        0.33
0%                  0.00      0.23        0.19
All                 0.62      0.69        0.70
CONCLUSIONS/DISCUSSION
This EC problem is atypical; atypical problems warrant study.
We studied representation and operators for EC-based bug repair. These choices matter, especially for difficult bugs.
• Incorporating all recommendations, GenProg repairs 5 new bugs; repair time decreases by 17–43% on difficult bugs.
We don’t know why some of these things are true, but:
• We now have lots of interesting data to dig into!
• We are currently (as we sit here) doing more and bigger runs with the new parameters on the as-yet unfixed bugs.
PLEASE ASK QUESTIONS
REPRESENTATION BENCHMARKS
Program      LOC     Tests   Bug              Description
gcd          22      6       infinite loop    Example
uniq-utx     1146    6       segfault         Text processing
look-utx     1169    6       segfault         Dictionary lookup
look-svr     1363    6       infinite loop    Dictionary lookup
units-svr    1504    6       segfault         Metric conversion
deroff-utx   2236    6       segfault         Document processing
nullhttpd    5575    7       buffer exploit   Webserver
indent       9906    6       infinite loop    Source code processing
flex         18775   6       segfault         Lexical analyzer generator
atris        21553   3       buffer exploit   Graphical tetris game
Total        63249
SEARCH SPACE BENCHMARKS
Program     LOC         Tests    Bugs   Description
fbc         97,000      773      3      Language (legacy)
gmp         145,000     146      2      Multiple precision math
gzip        491,000     12       5      Data compression
libtiff     77,000      78       24     Image manipulation
lighttpd    62,000      295      9      Web server
php         1,046,000   8,471    44     Language (web)
python      407,000     355      11     Language (general)
wireshark   2,814,000   63       7      Network packet analyzer
Total       5,139,000   10,193   105
55/105 bugs repaired using default parameters.