Efficient Systematic Testing for
Dynamically Updatable Software
Christopher M. Hayden, Eric A. Hardisty, Michael Hicks, Jeffrey S. Foster
University of Maryland, College Park
Dynamic Software Updating (DSU)
Performing updates to software at runtime has clear benefits:
  Increased software availability
  No need to terminate active connections / computation
… but can we trust updated software?
  Critical to ensure updates are safe
Our Contributions
Verification of DSU through testing:
  Testing Procedure
  Test Minimization Algorithm
Experiments:
  Effectiveness of Minimization
  Empirical Study of Update Safety / Safety Checks
DSU Safety
DSU creates the opportunity for new sources of bugs:
  Faulty state transformation
  Unsafe update timing
Safety checks restrict when updates may be applied:
  Activeness Safety (AS)
  Con-freeness Safety (CFS)
Activeness Safety (AS)
AS prevents updates to active code
In this example, no patch updating main or foo is allowed:
main() {
  foo();
  …
  baz();
}
foo() {
  …
  bar();
}
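The AS rule can be sketched as a check of the patched functions against the current call stack. This is a minimal illustration, not Ginseng's implementation: the function name `activeness_safe` and the representation of the stack and patch as arrays of function names are assumptions made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` appears in the array `names` of length `n`. */
static bool contains(const char *const *names, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0) return true;
    }
    return false;
}

/* Hypothetical check: a patch is activeness-safe at an update point only if
 * none of the functions it replaces has a frame on the call stack. */
bool activeness_safe(const char *const *stack, size_t stack_len,
                     const char *const *patched, size_t patched_len) {
    assert(stack != NULL || stack_len == 0);
    assert(patched != NULL || patched_len == 0);
    for (size_t i = 0; i < patched_len; i++) {
        if (contains(stack, stack_len, patched[i])) return false;
    }
    return true;
}
```

With the stack {main, foo} from the example, a patch that replaces foo is rejected, while a patch that replaces only bar and baz is accepted.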
Con-freeness Safety (CFS)
CFS (Stoyle et al. '05) allows updates to active code only when type safety can be ensured
In this example, no patch updating the signature of baz or bar is allowed:
main() {
  foo();
  …
  baz();
}
foo() {
  …
  bar();
}
DSU Testing
Safety checks offer limited guarantees:
  CFS and AS ensure type-safe execution
  AS ensures that execution never returns to old code following an update
  Neither of these properties ensures safe update timing
We propose testing to verify the correctness of allowed update points:
  Use the application's existing suite of system tests
  Ensure that updating anywhere during the execution of those tests results in an execution that passes the test
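Testing Procedure
Approach:
  Instrument the application to trace update points
  Execute the system test and gather an initial trace
  For each update point in the initial trace, perform an update test: force an update at that point while executing the system test
[Figure: an initial trace running from "Trace Start" through its potential update points; the initial run passes (✔), and each update test either passes (✔) or fails (✘)]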
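The procedure above can be sketched as a driver loop. This is only an illustration under stated assumptions: the hook type `run_test_fn`, the driver `run_update_tests`, and the stand-in `demo_run` are hypothetical names, and a real harness would launch the instrumented application rather than call a stub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical harness hook: runs the system test once, forcing the update
 * at the update point with the given index, and reports whether the test
 * still passed. A negative index means "run without updating". */
typedef bool (*run_test_fn)(int update_point, void *ctx);

/* Runs one update test per update point seen in the initial trace and
 * records each verdict in passed[0..num_points-1]. Returns the number of
 * failing update tests, or -1 if the run without an update fails. */
int run_update_tests(run_test_fn run, void *ctx, size_t num_points, bool *passed) {
    assert(run != NULL);
    assert(passed != NULL || num_points == 0);
    if (!run(-1, ctx)) return -1;          /* the initial run must pass */
    int failures = 0;
    for (size_t i = 0; i < num_points; i++) {
        passed[i] = run((int)i, ctx);      /* force the update at point i */
        if (!passed[i]) failures++;
    }
    return failures;
}

/* Stand-in hook for demonstration only: ctx points to the index of the one
 * update point at which the imaginary patched program misbehaves. */
bool demo_run(int update_point, void *ctx) {
    const int *bad_point = ctx;
    return update_point != *bad_point;
}
```

Calling `run_update_tests(demo_run, &bad, 3, passed)` with `bad` set to 1 reports one failure and leaves `passed` as pass, fail, pass.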
Update Test Minimization
Program traces may have thousands or millions of update points
Many update tests have the same behavior for a given patch, so we can eliminate redundant tests
Version 0:
void main() {
  foo();
  bar();
  baz();
}
Patch A: baz() {…}
  All update points yield the same behavior
Patch B: foo() {…} bar() {…} baz() {…}
  All update points yield distinct behavior
Minimization Algorithm
Execution events are traced if they have the potential to conflict with a patch
  An event conflicts with a patch p if applying p before the event might produce a different result than applying p after the event
  Examples: function calls, global variable accesses
Trace the execution of a test T on P0
Iterate through the trace, noting the last update point each time we reach a conflicting trace element
Run only the identified update tests Tnp
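The trace scan described above can be sketched as follows. The trace representation (`trace_elem`, with a precomputed `conflicts` flag per event) and the function name `minimize` are assumptions made for illustration; deciding whether an event conflicts with a given patch is left outside the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { UPDATE_POINT, EVENT } kind_t;

typedef struct {
    kind_t kind;
    bool conflicts;   /* for EVENT only: does it conflict with the patch? */
} trace_elem;

/* Scans the trace and, at each conflicting event, selects the most recent
 * update point that has not been selected yet. Writes the trace indices of
 * the selected update points into out (capacity at least n) and returns
 * how many were selected. */
size_t minimize(const trace_elem *trace, size_t n, size_t *out) {
    assert(trace != NULL || n == 0);
    assert(out != NULL || n == 0);
    size_t count = 0;
    bool have_point = false;   /* is there an unselected update point pending? */
    size_t last_point = 0;
    for (size_t i = 0; i < n; i++) {
        if (trace[i].kind == UPDATE_POINT) {
            last_point = i;
            have_point = true;
        } else if (trace[i].conflicts && have_point) {
            out[count++] = last_point;
            have_point = false;   /* do not select the same point twice */
        }
    }
    return count;
}
```

For a trace with an update point before each call to foo, bar, and baz, a patch that conflicts only with the call to baz selects a single update point, while a patch that conflicts with all three calls selects all three.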
Experimental Results
Experimental Setup
Built testing infrastructure on top of the Ginseng DSU system (Neamtiu et al.):
  Modified to support tracing and updating at pre-selected update points
  Inserted explicit update points before each function call to approximate more liberal systems
  Disabled safety checking (CFS) for experiments
Tested 3 years of patches to OpenSSH and vsftpd (only OpenSSH results are reported in this talk)
Program Modifications
foo() {
  while (1) {            // main loop
    update();
    extract {
      ...                // main loop body
    }
  }
  extract {
    ...                  // after main loop
  }
}
Identify long-running loops
Add a manually selected update point
Perform loop body extraction
Perform continuation extraction
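As a rough picture of what the two extractions produce, the loop body and the code after the loop can each be moved into a function of their own, so that a later iteration or the continuation can be served by updated code. The sketch below is an assumption-laden illustration in plain C, not Ginseng output: `loop_state`, `loop_body`, `after_loop`, and the stub `update` (which only counts applied updates) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* State shared between iterations is passed explicitly to the extracted code. */
typedef struct { int processed; } loop_state;

int update_requested = 0;   /* set by the test harness to request an update */
int updates_applied = 0;    /* counts updates taken at the update point */

/* Stub update point: a real DSU runtime would install new code here. */
void update(void) {
    if (update_requested) {
        update_requested = 0;
        updates_applied++;
    }
}

/* Extracted main loop body; returns 0 when the loop should stop. */
int loop_body(loop_state *s) {
    assert(s != NULL);
    s->processed++;
    return s->processed < 3;
}

/* Extracted continuation: the code that followed the main loop. */
int after_loop(loop_state *s) {
    assert(s != NULL);
    return s->processed;
}

int foo(void) {
    loop_state s = {0};
    while (1) {                       /* main loop */
        update();                     /* manually selected update point */
        if (!loop_body(&s)) break;
    }
    return after_loop(&s);
}
```

Because the body is reached through a call on every iteration, an update taken at `update()` can affect the very next iteration instead of waiting for `foo` to return.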
Experiments: Update Test Suite
How many update tests must be run to test real-world updates to real-world applications?
How effective is minimization at eliminating redundant tests?
Update Test Suite Size: OpenSSH
       Δ to next version   Reduction
#      Sig   Fun   Type    All Points                    Activeness-Safe Points
0        3    98      5    580,871 → 31,791 (95%)        35,314 → 3,027 (91%)
1        0     6      0    705,322 → 1,795 (~100%)       587,578 → 1,717 (~100%)
2        5   238     11    638,720 → 63,011 (90%)        20,902 → 2,353 (89%)
3        0    18      0    772,198 → 4,324 (99%)         638,803 → 3,775 (99%)
4       13   172     10    773,086 → 27,399 (96%)        21,343 → 1,564 (93%)
5        0    24      1    878,235 → 17,398 (98%)        111,950 → 1,723 (98%)
6        6   257     10    879,668 → 47,092 (95%)        44,278 → 2,139 (95%)
7        4   179     12    918,717 → 89,601 (90%)        100,854 → 4,141 (96%)
8        0    72      3    973,364 → 34,293 (96%)        61,724 → 2,070 (97%)
9       10   157      7    933,514 → 52,356 (94%)        61,051 → 2,891 (95%)
Total                      8,053,695 → 369,060 (95%)     1,683,797 → 25,400 (98%)
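The percentages are consistent with reading each entry as the fraction of update tests that minimization eliminates. As a check on the Total row, with N_all and N_min used here as labels for the number of update tests before and after minimization:

```latex
\[
\text{reduction} = 1 - \frac{N_{\min}}{N_{\text{all}}}, \qquad
1 - \frac{369{,}060}{8{,}053{,}695} \approx 0.954 \;(95\%), \qquad
1 - \frac{25{,}400}{1{,}683{,}797} \approx 0.985 \;(98\%)
\]
```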
Empirical Study of Update Safety
How many failures occur when applying updates arbitrarily?
How many failures occur when applying updates subject only to the AS and CFS safety checks?
Safety: OpenSSH
         Δ to next version   All Points            CFS Points           AS Points
Update   Sig   Fun   Type    Failed    Total       Failed   Total       Failed   Total
0          3    98      5    19,715    580,871     0        68,044      0        35,314
1          0     6      0    0         705,322     0        705,322     0        587,578
2          5   238     11    306,965   638,720     1,688    75,307      4        20,902
3          0    18      0    0         772,198     0        772,198     0        638,803
4         13   172     10    565,681   773,086     609      110,633     380      21,343
5          0    24      1    10,703    878,235     0        130,000     0        111,950
6          6   257     10    163,333   879,668     44,461   96,183      110      44,278
7          4   179     12    11,380    918,717     1        80,070      1        100,854
8          0    72      3    3         973,364     0        261,885     0        61,724
9         10   157      7    357,919   933,514     24       121,337     0        61,051
Total                        1,435,699 8,053,695   46,783   2,420,979   495      1,683,797
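Dividing the Failed totals by the corresponding Total counts gives the overall share of update tests that fail under each policy (all points, CFS points, AS points, in that order):

```latex
\[
\frac{1{,}435{,}699}{8{,}053{,}695} \approx 17.8\%, \qquad
\frac{46{,}783}{2{,}420{,}979} \approx 1.9\%, \qquad
\frac{495}{1{,}683{,}797} \approx 0.03\%
\]
```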
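Unsafe Timing: Version Inconsistency (vsftpd)
Version 0:
void handle_upload_common() {
  ret = do_file_recv();
}
int do_file_recv() {
  …  // receive file
  if (ret == SUCCESS)
    write(226, "OK.");
  return ret;
}
Version 1 (patch):
void handle_upload_common() {
  ret = do_file_recv();
  if (ret == SUCCESS)
    write(226, "OK.");
}
int do_file_recv() {
  …  // receive file
  return ret;
}
If the update takes effect after Version 0 of handle_upload_common has started but before it calls do_file_recv, the old caller runs the new callee and the 226 reply is never written.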
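Unsafe Timing: Version Inconsistency
Version 0:
void foo() {
  bar();
  …
  baz();
}
void bar() { … }
void baz() { dig(); … }
Version 1 (patch):
void foo() {
  bar();
  …
  baz();
}
void bar() { dig(); … }
void baz() { … }
If the update is applied between the calls to bar() and baz(), the old bar and the new baz both run, so dig() is never called.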
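The foo/bar/baz interleaving can be simulated in a few lines. This is a toy model, not a DSU system: function pointers stand in for updatable function bindings, and `apply_patch`, `run_foo`, and the `update_at` parameter are names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

static int dig_calls;
static void dig(void) { dig_calls++; }

/* Version 0: only baz calls dig. Version 1: only bar calls dig. */
static void bar_v0(void) {}
static void baz_v0(void) { dig(); }
static void bar_v1(void) { dig(); }
static void baz_v1(void) {}

/* Current bindings; calls through them pick up the patch once applied. */
static void (*bar)(void);
static void (*baz)(void);

static void apply_patch(void) { bar = bar_v1; baz = baz_v1; }

/* Runs foo's body starting from Version 0 and returns how often dig ran.
 * update_at: 0 = patch before bar(), 1 = patch between bar() and baz(),
 * any other value = no patch. */
int run_foo(int update_at) {
    dig_calls = 0;
    bar = bar_v0;
    baz = baz_v0;
    assert(bar != NULL && baz != NULL);
    if (update_at == 0) apply_patch();
    bar();
    if (update_at == 1) apply_patch();
    baz();
    return dig_calls;
}
```

Running entirely in either version calls dig exactly once, but patching between the two calls yields zero calls, which is the version inconsistency the slide illustrates.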
Manually Selected Update Points
             Δ to next version                         Safety
#    Tests   Sig   Fun   Type    Reduction             Failed   Total
0    75        3    98      5    566 → 566 (0%)        0        566
1    75        0     6      0    630 → 592 (6%)        0        630
2    76        5   238     11    568 → 568 (0%)        0        568
3    91        0    18      0    783 → 770 (2%)        0        783
4    91       13   172     10    782 → 782 (0%)        0        782
5    104       0    24      1    860 → 841 (2%)        0        860
6    104       6   257     10    859 → 859 (0%)        0        859
7    104       4   179     12    850 → 850 (0%)        0        850
8    105       0    72      3    868 → 823 (5%)        0        868
9    104      10   157      7    833 → 833 (0%)        0        833
Total                            7,599 → 7,484 (2%)    0        7,599
Summary
We have argued that verification is necessary to prevent unsafe updates
  Provided empirical evidence that AS/CFS cannot prevent all unsafe updates
We have presented an approach for testing dynamic updates
We have presented and evaluated a minimization strategy to make update testing more practical
Discussion Questions
Given that AS cannot ensure correctness (both in theory and in practice), should DSU implementations continue to rely on it?
What standards for verification should be required of DSU system benchmarks?
Are there other assumptions of DSU that are appropriate for empirical evaluation?