Efficient Systematic Testing for Dynamically Updatable Software

Christopher M. Hayden, Eric A. Hardisty, Michael Hicks, Jeffrey S. Foster

University of Maryland, College Park

Dynamic Software Updating (DSU)

Performing updates to software at runtime has clear benefits:
  Increased software availability
  No need to terminate active connections / computation

… but can we trust updated software?
  Critical to ensure updates are safe

Our Contributions

Verification of DSU through testing:
  Testing Procedure
  Test Minimization Algorithm

Experiments:
  Effectiveness of Minimization
  Empirical study of Update Safety / Safety Checks

DSU Safety

DSU creates the opportunity for new sources of bugs:
  Faulty state transformation
  Unsafe update timing

Safety Checks restrict when updates may be applied:
  Activeness Safety
  Con-freeness Safety

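Activeness Safety (AS)

AS prevents updates to active code, that is, to functions that are currently executing

In this example, with execution inside foo, both main and foo are active, so no patch updating main or foo is allowed:

  main() {
    foo();
    …
    baz();
  }

  foo() {
    …
    bar();
  }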
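The activeness check can be sketched in C as a test of whether any function on the current call stack is among those the patch updates. This is a minimal illustration only: the name `activeness_safe`, the string-based stack representation, and the assumption that execution is inside `foo` (so the stack holds `main` and `foo`) are hypothetical and not taken from any DSU system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` appears in the array `names` of length `n`. */
static bool contains(const char *const *names, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], name) == 0)
            return true;
    }
    return false;
}

/*
 * Hypothetical activeness-safety check: the update is allowed only if
 * no function currently on the call stack is updated by the patch.
 */
bool activeness_safe(const char *const *stack, size_t depth,
                     const char *const *updated, size_t n_updated)
{
    assert(stack != NULL || depth == 0);
    for (size_t i = 0; i < depth; i++) {
        if (contains(updated, n_updated, stack[i]))
            return false;   /* the patch touches an active function */
    }
    return true;
}
```

With the stack `{ "main", "foo" }`, a patch that updates `foo` or `main` is rejected, while a patch that updates only `bar` or `baz` is accepted.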

Con-freeness Safety (CFS)

CFS (Stoyle et al. '05) allows updates to active code only when type safety can be ensured

In this example, no patch updating the signature of baz or bar is allowed:

  main() {
    foo();
    …
    baz();
  }

  foo() {
    …
    bar();
  }
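For function signatures only, the restriction in this example can be approximated by the C sketch below. It is a deliberate simplification: the actual CFS analysis reasons about types used by the remaining old code, whereas here the hypothetical `pending_calls` array simply lists the functions that old code still on the stack may go on to call (`bar` and `baz` in the example). The names `patch_entry` and `con_free` are likewise invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One entry of a hypothetical patch description. */
typedef struct {
    const char *function;     /* function the patch replaces */
    bool signature_changed;   /* true if its type signature differs */
} patch_entry;

/*
 * Simplified con-freeness check (an assumption, not the real analysis).
 * The update is rejected if the patch changes the signature of any
 * function that old, still-active code may call after the update point,
 * because those old call sites would no longer be type-correct.
 */
bool con_free(const char *const *pending_calls, size_t n_pending,
              const patch_entry *patch, size_t n_patch)
{
    assert(patch != NULL || n_patch == 0);
    for (size_t i = 0; i < n_patch; i++) {
        if (!patch[i].signature_changed)
            continue;   /* body-only change: old callers stay type-safe */
        for (size_t j = 0; j < n_pending; j++) {
            if (strcmp(patch[i].function, pending_calls[j]) == 0)
                return false;
        }
    }
    return true;
}
```

Unlike the activeness check, this sketch accepts a patch that changes only the bodies of `main` and `foo`, but rejects one that changes the signature of `bar` or `baz`.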

DSU Testing

Safety Checks offer limited guarantees:
  CFS and AS ensure type-safe execution
  AS ensures that you never return to old code following an update
  Neither of these properties ensures safe update timing

We propose testing to verify the correctness of allowed update points:
  Use the existing suite of application system tests
  Ensure that updating anywhere during the execution of those tests results in an execution that passes the test

Testing Procedure

Approach:
  Instrument application to trace update points
  Execute system test and gather initial trace
  For each update point in the initial trace, perform an update test: force an update at that point while executing the system test

[Figure: an execution trace beginning at "Trace Start", with its potential update points marked]

[Figure: the initial trace passes (✔); the update tests derived from it pass or fail individually (✔ ✘ ✔)]
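The update-test loop of the testing procedure can be sketched in C as follows. The callback type `update_test_fn`, the function `run_update_tests`, and the stand-in test `fails_at_one_point` are hypothetical names; how the system test is actually executed and how the update is forced are left to the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical callback: runs the system test once, forcing an update at
 * `update_point`, and reports whether the test still passes.
 */
typedef bool (*update_test_fn)(int update_point, void *ctx);

/*
 * Runs one update test per update point recorded in the initial trace.
 * passed[i] records the outcome for update_points[i]; the return value
 * is the number of failing update tests.
 */
size_t run_update_tests(const int *update_points, size_t n,
                        update_test_fn run_test, void *ctx, bool *passed)
{
    assert(run_test != NULL);
    size_t failures = 0;
    for (size_t i = 0; i < n; i++) {
        passed[i] = run_test(update_points[i], ctx);
        if (!passed[i])
            failures++;
    }
    return failures;
}

/* Stand-in test for illustration: fails only at the point stored in ctx. */
bool fails_at_one_point(int update_point, void *ctx)
{
    return update_point != *(const int *)ctx;
}
```

Calling `run_update_tests` with three update points and a stand-in test that fails at the second one yields the pass, fail, pass pattern and a failure count of one.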

Update Test Minimization

Program traces may have thousands or millions of update points

Many update tests have the same behavior for a given patch, so we can eliminate redundant tests

Version 0:

  void main() {
    foo();
    bar();
    baz();
  }

Patch A: baz() {…}
  All update points yield same behavior

Patch B: foo() {…} bar() {…} baz() {…}
  All update points yield distinct behavior

Minimization Algorithm

Execution events are traced if they have the potential to conflict with a patch
  An event conflicts with a patch p if applying p before the event might produce a different result than applying p after the event
  Example: function calls, global variable accesses

Trace the execution of a test T on P0

Iterate through the trace, noting the last update point each time we reach a conflicting trace element

Run only the identified update tests Tnp
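The trace walk described above can be sketched in C as follows. The data layout is an assumption made for illustration: `trace_event`, `patch`, `conflicts`, and `minimize` are hypothetical names, functions and globals are identified by integer ids drawn from one shared id space, and an event is treated as conflicting exactly when the patch changes the function or global it touches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EV_UPDATE_POINT, EV_CALL, EV_GLOBAL_ACCESS } event_kind;

typedef struct {
    event_kind kind;
    int id;   /* update point number, or id of the function/global used */
} trace_event;

/* Hypothetical patch description: ids of the functions/globals it changes. */
typedef struct {
    const int *changed;
    size_t n_changed;
} patch;

/* An event conflicts if it uses something the patch changes. */
static bool conflicts(const trace_event *ev, const patch *p)
{
    if (ev->kind == EV_UPDATE_POINT)
        return false;
    for (size_t i = 0; i < p->n_changed; i++) {
        if (p->changed[i] == ev->id)
            return true;
    }
    return false;
}

/*
 * Walks the trace and, at each conflicting event, selects the most recent
 * update point (at most once per point). `selected` must have room for one
 * entry per update point in the trace. Returns the number selected.
 */
size_t minimize(const trace_event *trace, size_t n, const patch *p,
                int *selected)
{
    assert(trace != NULL && p != NULL && selected != NULL);
    size_t count = 0;
    bool pending = false;   /* last update point seen but not yet selected */
    int last = 0;
    for (size_t i = 0; i < n; i++) {
        if (trace[i].kind == EV_UPDATE_POINT) {
            last = trace[i].id;
            pending = true;
        } else if (pending && conflicts(&trace[i], p)) {
            selected[count++] = last;
            pending = false;
        }
    }
    return count;
}
```

For a trace with an update point before each of three calls to `foo`, `bar`, and `baz`, a patch that changes only `baz` selects a single update point (the one just before the call to `baz`), while a patch that changes all three functions selects all three.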

Experimental Results

Experimental Setup

Built the testing infrastructure on top of the Ginseng DSU system (Neamtiu et al.):
  Modified to support tracing and updating at pre-selected update points
  Inserted explicit update points before each function call to approximate more liberal systems
  Disabled safety checking (CFS) for experiments

Tested 3 years of patches to OpenSSH and vsftpd (only OpenSSH results are reported in this talk)

Program Modifications

Identify long-running loops
Add a manually selected update point
Perform loop body extraction
Perform continuation extraction

  foo() {
    while (1) {      // main loop
      update();
      extract {
        ...          // main loop body
      }
    }
    extract {
      ...            // after main loop
    }
  }

Experiments: Update Test Suite

How many update tests must be run to test real-world updates to real-world applications?

How effective is minimization at eliminating redundant tests?

Update Test Suite Size: OpenSSH

         Δ to next version    Reduction
#        Sig    Fun   Type    All Points                   Activeness-Safe Points
0          3     98      5    580,871 → 31,791 (95%)       35,314 → 3,027 (91%)
1          0      6      0    705,322 → 1,795 (~100%)      587,578 → 1,717 (~100%)
2          5    238     11    638,720 → 63,011 (90%)       20,902 → 2,353 (89%)
3          0     18      0    772,198 → 4,324 (99%)        638,803 → 3,775 (99%)
4         13    172     10    773,086 → 27,399 (96%)       21,343 → 1,564 (93%)
5          0     24      1    878,235 → 17,398 (98%)       111,950 → 1,723 (98%)
6          6    257     10    879,668 → 47,092 (95%)       44,278 → 2,139 (95%)
7          4    179     12    918,717 → 89,601 (90%)       100,854 → 4,141 (96%)
8          0     72      3    973,364 → 34,293 (96%)       61,724 → 2,070 (97%)
9         10    157      7    933,514 → 52,356 (94%)       61,051 → 2,891 (95%)
Total                         8,053,695 → 369,060 (95%)    1,683,797 → 25,400 (98%)
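The reduction percentages in the table are consistent with the fraction of update tests eliminated by minimization. This reading is inferred from the numbers rather than stated on the slide. Worked for version 0 and for the total over all points:

```latex
\[
\text{reduction} = 1 - \frac{\text{update tests after minimization}}{\text{update tests before minimization}}
\]
\[
1 - \frac{31{,}791}{580{,}871} \approx 0.9453 \approx 95\%,
\qquad
1 - \frac{369{,}060}{8{,}053{,}695} \approx 0.9542 \approx 95\%
\]
```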

Empirical Study of Update Safety

How many failures occur when applying updates arbitrarily?

How many failures occur when applying updates subject only to the AS and CFS safety checks?

Safety: OpenSSH

          Δ to next version    All Points                CFS Points               AS Points
Update    Sig    Fun   Type    Failed       Total        Failed     Total         Failed    Total
0           3     98      5       19,715      580,871         0        68,044         0       35,314
1           0      6      0            0      705,322         0       705,322         0      587,578
2           5    238     11      306,965      638,720     1,688        75,307         4       20,902
3           0     18      0            0      772,198         0       772,198         0      638,803
4          13    172     10      565,681      773,086       609       110,633       380       21,343
5           0     24      1       10,703      878,235         0       130,000         0      111,950
6           6    257     10      163,333      879,668    44,461        96,183       110       44,278
7           4    179     12       11,380      918,717         1        80,070         1      100,854
8           0     72      3            3      973,364         0       261,885         0       61,724
9          10    157      7      357,919      933,514        24       121,337         0       61,051
Total                          1,435,699    8,053,695    46,783     2,420,979       495    1,683,797
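As a rough summary of the Total row, the overall failure rate under each policy is the number of failed update tests divided by the number of update tests run. The percentages below are computed here from the totals in the table and do not appear on the slide:

```latex
\[
\text{All points: } \frac{1{,}435{,}699}{8{,}053{,}695} \approx 17.8\%,
\qquad
\text{CFS points: } \frac{46{,}783}{2{,}420{,}979} \approx 1.9\%,
\qquad
\text{AS points: } \frac{495}{1{,}683{,}797} \approx 0.03\%
\]
```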

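Unsafe Timing: Version Inconsistency (vsftpd)

Version 0:

  void handle_upload_common() {
    ret = do_file_recv();
  }

  int do_file_recv() {
    …   // receive file
    if (ret == SUCCESS)
      write(226, "OK.");
    return ret;
  }

Version 1 (patch):

  void handle_upload_common() {
    ret = do_file_recv();
    if (ret == SUCCESS)
      write(226, "OK.");
  }

  int do_file_recv() {
    …   // receive file
    return ret;
  }

For example, if the update takes effect while the old handle_upload_common is running but before it calls do_file_recv, the new do_file_recv runs and the 226 reply is never written.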

Unsafe Timing: Version Inconsistency

Version 0:

  void foo() {
    bar();
    …
    baz();
  }

  void bar() { … }

  void baz() { dig(); … }

Version 1 (patch):

  void foo() {
    bar();
    …
    baz();
  }

  void bar() { dig(); … }

  void baz() { … }
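The generic example can be simulated in plain C to make the inconsistency concrete. This is only a sketch: function pointers stand in for a DSU system's rebinding of function names, and `run_foo`, `apply_patch`, and the `update_timing` values are hypothetical names. Note that foo itself is identical in both versions, so the patch does not update it.

```c
#include <assert.h>

static int dig_calls;                 /* how many times dig() has run */
static void dig(void) { dig_calls++; }

/* Version 0: baz calls dig. */
static void bar_v0(void) { }
static void baz_v0(void) { dig(); }

/* Version 1 (patch): the call to dig moves from baz into bar. */
static void bar_v1(void) { dig(); }
static void baz_v1(void) { }

/* Function pointers stand in for the DSU system's rebinding of names. */
static void (*bar)(void);
static void (*baz)(void);

static void apply_patch(void) { bar = bar_v1; baz = baz_v1; }

typedef enum { NO_UPDATE, BEFORE_BAR, BETWEEN_BAR_AND_BAZ } update_timing;

/*
 * Runs the body of foo with the patch applied at `when` and returns the
 * number of calls to dig. Either version on its own calls dig exactly once.
 */
int run_foo(update_timing when)
{
    assert(when == NO_UPDATE || when == BEFORE_BAR ||
           when == BETWEEN_BAR_AND_BAZ);
    dig_calls = 0;
    bar = bar_v0;
    baz = baz_v0;
    if (when == BEFORE_BAR)
        apply_patch();
    bar();
    if (when == BETWEEN_BAR_AND_BAZ)
        apply_patch();
    baz();
    return dig_calls;
}
```

Running without an update, or with the update applied before the call to bar, calls dig once. Applying the update between the two calls runs the old bar and the new baz, so dig is never called.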

Manually Selected Update Points

               Δ to next version                            Safety
#     Tests    Sig    Fun   Type    Reduction               Failed   Total
0        75      3     98      5    566 → 566 (0%)               0     566
1        75      0      6      0    630 → 592 (6%)               0     630
2        76      5    238     11    568 → 568 (0%)               0     568
3        91      0     18      0    783 → 770 (2%)               0     783
4        91     13    172     10    782 → 782 (0%)               0     782
5       104      0     24      1    860 → 841 (2%)               0     860
6       104      6    257     10    859 → 859 (0%)               0     859
7       104      4    179     12    850 → 850 (0%)               0     850
8       105      0     72      3    868 → 823 (5%)               0     868
9       104     10    157      7    833 → 833 (0%)               0     833
Total                               7,599 → 7,484 (2%)           0   7,599

Summary

We have argued that verification is necessary to prevent unsafe updates
  Provided empirical evidence that AS/CFS cannot prevent all unsafe updates

We have presented an approach for testing dynamic updates

We have presented and evaluated a minimization strategy to make update testing more practical

Discussion Questions

Given that AS cannot ensure correctness (both in theory and in practice), should DSU implementations continue to rely on it?

What standards for verification should be required of DSU system benchmarks?

Are there other assumptions of DSU that are appropriate for empirical evaluation?