Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

James Clause and Alessandro Orso
College of Computing
Georgia Institute of Technology

Supported in part by: NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech

Automated Debugging

Code-centric

• Gupta and colleagues ’05
• Jones and colleagues ’02
• Korel and Laski ’88
• Liblit and colleagues ’05
• Nainar and colleagues ’07
• Renieris and Reiss ’03
• Seward and Nethercote ’05
• Tucek and colleagues ’07
• Weiser ’81
• Zhang and colleagues ’05
• Zhang and colleagues ’06
• ...

What about inputs which cause the failure?

Data-centric Techniques

• Chan and Lakhotia ’98
• Zeller and Hildebrandt ’02
• Misherghi and Su ’06

Delta Debugging

Requires:
1. Multiple executions
2. Large amounts of manual effort (oracle creation, setup)

Penumbra

Requires:
1. Single execution
2. Reduced manual effort

Comparable performance

Intuition and Terminology

Failure-revealing input vector

Failure-relevant subset (inputs which are useful for investigating the failure)

Approximate failure-relevant subsets by identifying inputs that reach the failure along program dependencies.

Motivating Example

fileinfo

    int main(int argc, char **argv) {
 1.   int verbose, i, total_size = 0;
 2.   struct stat buf;
 3.   verbose = atoi(argv[1]);
 4.   for(i = 2; i < argc; i++) {
 5.     int fd = open(argv[i], O_RDONLY);
 6.     fstat(fd, &buf);
 7.     char *out = malloc(60);
 8.     sprintf(out, "%d", buf.st_size);
 9.     if(verbose) {
10.       char *pview = malloc(51);
11.       read(fd, pview, 50);
12.       pview[50] = '\0';
13.       strcat(out, pview);
14.     }
15.     printf("%s: %s\n", argv[i], out);
16.     total_size += buf.st_size;
17.   }
18.   printf("total: %d\n", total_size);
    }

Input vector:
• Command line arguments (flag, list of file names)
• File statistics, for each file (size, last modified date, ...)
• File contents, for each file (first 50 characters)

Failure in fileinfo: out overflows at statement 13 (strcat) when all of the following hold:
• buf.st_size ≥ 1GB
• verbose is true
• read 50 characters

With a size of 1GB or more, statement 8 writes at least 10 digits into out. Appending 50 characters plus the terminating '\0' then needs at least 61 bytes, one more than the 60 bytes allocated at statement 7.
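The byte arithmetic behind these three conditions can be checked with a small sketch. The helper names `bytes_needed_for_out` and `overflows_out` are hypothetical and not part of fileinfo or Penumbra; they only model how many bytes statements 8 and 13 would need in the 60-byte buffer `out`, without performing the unsafe writes.

```c
#include <stdio.h>

enum { OUT_CAPACITY = 60 };  /* malloc(60) at statement 7 */

/* Bytes that statements 8 and 13 would need in `out`: the decimal
 * digits of the file size, the preview characters (only when verbose
 * is set), and the terminating '\0'. */
static int bytes_needed_for_out(long long st_size, int verbose, int preview_len) {
    /* snprintf with a NULL buffer and size 0 returns the number of
     * characters that would be written, excluding the terminator. */
    int digits = snprintf(NULL, 0, "%lld", st_size);
    return digits + (verbose ? preview_len : 0) + 1;
}

/* Nonzero when the modeled writes would exceed the 60-byte buffer. */
static int overflows_out(long long st_size, int verbose, int preview_len) {
    return bytes_needed_for_out(st_size, verbose, preview_len) > OUT_CAPACITY;
}
```

Calling `overflows_out(1000000000LL, 1, 50)` models a 10-digit size with the verbose flag set and a full 50-character preview. Changing any one of the three arguments (a 9-digit size, `verbose` equal to 0, or a 49-character preview) brings the total back to 60 bytes or fewer.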

Observations from the fileinfo example:

1. Many more inputs than lines of code.
2. Understanding the failure requires tracing interactions between inputs from multiple sources.
3. Only a small percentage of all inputs are relevant for the failure.

Penumbra Overview

Example run: fileinfo on three files, Foo (512B), Bar (1KB), and Baz (1.5GB).
Output: foo: 512 ... bar: 1024 ... baz: 150... total: 150...

Relevant context:
1. When the failure occurs.
2. Which data are involved in the failure.

For fileinfo: 13. strcat(out, pview);
In general, the relevant context is chosen using traditional debugging methods.

Steps:
1. Taint inputs (taint marks 1 2 3 4 5 6 7 8 9 0)
2. Propagate taint marks
3. Identify relevant inputs (taint marks 0 8 9)

Inputs identified as relevant:
• verbose is true
• read 50 characters
• buf.st_size ≥ 1GB

Outline

• Penumbra approach
  1. Tainting inputs
  2. Propagating taint marks
  3. Identifying relevant inputs
• Evaluation
• Conclusions and future work

1: Tainting Inputs

Assign a taint mark to each input as it enters the application.

Per-byte: Assign a unique taint mark to each byte (read from files).
• Precise identification
• Unnecessarily expensive

Per-entity: Assign the same taint mark to related bytes (argv, argc, fstat, ...).
• Maintains per-byte precision
• Increases scalability

Domain specific: Assign taint marks based on user-provided information.
• Maintains per-byte precision
• Further increases scalability

When a taint mark is assigned to an input, log the input’s value and where the input was read from.
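A minimal sketch of per-byte and per-entity tainting with logging, assuming taint marks are small integers and the log is a fixed-size array. All names here (`taint_state`, `taint_log_entry`, `new_mark`, `taint_per_byte`, `taint_per_entity`) and the log layout are hypothetical and do not come from Penumbra or Dytan.

```c
#include <stddef.h>
#include <stdio.h>

enum { MAX_MARKS = 64, MAX_SOURCE = 32 };

/* One log record per taint mark: where the input was read from and
 * its value. (Hypothetical layout.) */
struct taint_log_entry {
    char source[MAX_SOURCE];     /* e.g. "file:foo" or "argv[1]" */
    size_t offset;               /* first byte covered by this mark */
    size_t length;               /* number of bytes covered */
    const unsigned char *value;  /* points into the caller's buffer */
};

struct taint_state {
    struct taint_log_entry log[MAX_MARKS];
    int next_mark;               /* marks are handed out as 0, 1, 2, ... */
};

/* Allocate a fresh mark and log the input it stands for.
 * Returns the mark, or -1 when the table is full. */
static int new_mark(struct taint_state *ts, const char *source,
                    const unsigned char *data, size_t offset, size_t length) {
    if (ts->next_mark >= MAX_MARKS) return -1;
    struct taint_log_entry *e = &ts->log[ts->next_mark];
    snprintf(e->source, sizeof e->source, "%s", source);
    e->offset = offset;
    e->length = length;
    e->value = data + offset;
    return ts->next_mark++;
}

/* Per-byte: every byte gets its own mark; marks[i] is the mark of byte i. */
static int taint_per_byte(struct taint_state *ts, const char *source,
                          const unsigned char *data, size_t n, int *marks) {
    for (size_t i = 0; i < n; i++) {
        marks[i] = new_mark(ts, source, data, i, 1);
        if (marks[i] < 0) return -1;
    }
    return 0;
}

/* Per-entity: all n related bytes share a single mark. */
static int taint_per_entity(struct taint_state *ts, const char *source,
                            const unsigned char *data, size_t n, int *marks) {
    int m = new_mark(ts, source, data, 0, n);
    if (m < 0) return -1;
    for (size_t i = 0; i < n; i++) marks[i] = m;
    return 0;
}
```

The state must start zero-initialized (for example `struct taint_state ts = {0};`), and the logged `value` pointers are only valid while the caller's buffers are alive. Per-entity tainting uses one log entry where per-byte tainting uses n, which is the scalability difference the slide refers to.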

2: Propagating Taint Marks

Data-flow Propagation (DF): Taint marks flow along only data dependencies.

  C = A + B;

  A carries taint mark 1 and B carries taint mark 2, so C receives taint marks 1 and 2.

Data- and control-flow Propagation (DF + CF): Taint marks flow along data and control dependencies.

  if(X) { C = A + B; }

  A carries taint mark 1, B carries taint mark 2, and X carries taint mark 3, so C receives taint marks 1, 2, and 3.

The effectiveness of each option depends on the particular failure.
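The two propagation policies can be sketched with taint sets stored as bit masks. This is an illustrative model under the assumption of at most 64 taint marks. The names `taint_set`, `mark`, `propagate_df`, and `propagate_df_cf` are hypothetical and are not Dytan's API.

```c
#include <stdint.h>

/* A set of taint marks; bit k set means mark k is present (0 <= k < 64). */
typedef uint64_t taint_set;

/* The singleton set containing only mark k. */
static taint_set mark(int k) {
    return (taint_set)1 << k;
}

/* DF: the result of C = A + B carries the union of the operands' marks. */
static taint_set propagate_df(taint_set a, taint_set b) {
    return a | b;
}

/* DF + CF: the result additionally carries the marks of the condition
 * that controls the assignment, as in if(X) { C = A + B; }. */
static taint_set propagate_df_cf(taint_set a, taint_set b, taint_set control) {
    return a | b | control;
}
```

With A tainted by mark 1, B by mark 2, and X by mark 3, `propagate_df(mark(1), mark(2))` yields the set {1, 2}, and `propagate_df_cf(mark(1), mark(2), mark(3))` yields {1, 2, 3}, matching the slide's example.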

3: Identifying Relevant Inputs

1. Relevant context indicates which data is involved in the considered failure.
2. Identify which taint marks are associated with the data indicated by the relevant context.
3. Use recorded logs to reconstruct inputs that are identified by the taint marks.
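Steps 2 and 3 can be sketched as a lookup from the taint marks found at the relevant context back into the input log. The sketch assumes taint sets are 64-bit masks and that the log is an array indexed by taint mark. `input_record` and `relevant_inputs` are hypothetical names, not part of the prototype.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t taint_set;  /* bit k set means taint mark k is present */

/* Log record written when mark k was assigned (hypothetical layout). */
struct input_record {
    const char *source;  /* where the input was read from */
    size_t offset;       /* position of the input within that source */
};

/* Collect the records of all marks present in `at_context`, the taint
 * set of the data named by the relevant context. Stores at most
 * `out_cap` pointers in `out` and returns how many were stored. */
static size_t relevant_inputs(taint_set at_context,
                              const struct input_record *records,
                              size_t n_marks,
                              const struct input_record **out,
                              size_t out_cap) {
    size_t count = 0;
    for (size_t k = 0; k < n_marks && k < 64 && count < out_cap; k++) {
        if ((at_context >> k) & 1u) {
            out[count++] = &records[k];
        }
    }
    return count;
}
```

For a log of ten records and a context tainted by marks 0, 8, and 9, the function returns 3 and `out` points at records 0, 8, and 9, which can then be shown to the developer as the failure-relevant inputs.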

Prototype Implementation

Trace generator: executable + input vector, produces a trace
Trace processor: trace + relevant context, produces input subset (DF) and input subset (DF + CF)

Implemented using Dytan, a generic x86 tainting framework developed in previous work [Clause and Orso 2007].

Evaluation

Study 1: Effectiveness for debugging real failures
Study 2: Comparison with Delta Debugging

Subjects:

Application       KLoC    Fault location
bc 1.06           10.5    more_arrays : 177
gzip 1.24          6.3    get_istat : 828
ncompress 4.24     1.4    comprexx : 896
pine 4.44        239.1    rfc822_cat : 260
squid 2.3         69.9    ftpBuildTitleUrl : 1024

We selected a failure-revealing input vector for each subject.

Data Generation

Penumbra
• Setup (manual): Choose a relevant context.
  • Location: statement where the failure occurs.
  • Data: any data read by that statement.
• Execution (automated): Use prototype tool to identify failure-relevant inputs (DF and DF + CF).

Delta Debugging
• Setup (manual): Create an automated oracle.
  • Use gdb to inspect stack trace and program data.
  • One second timeout to prevent incorrect results.
• Execution (automated): Use the standard Delta Debugging implementation to minimize inputs.

Study 1: Effectiveness

Is the information that Penumbra provides helpful for debugging real failures?

Study 1 Results: gzip & ncompress

Crash when a file name is longer than 1,024 characters.

Input vector: ./gzip foo bar [long file name], plus the contents and attributes of each file.

# Inputs: 10,000,056
# Relevant (DF): 1
# Relevant (DF + CF): 3

Page 89: Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

Study 1 Results: pine

Crash when a “from” field contains 22 or more double quote characters.

...
From clause@boar Tue Feb 20 11:49:53 2007
Return-Path: <clause@boar>
X-Original-To: clause
Delivered-To: clause@boar
Received: by boar (Postfix, from userid 1000) id 88EDD1724523; Tue, 20 Feb 2007 11:49:53 -0500 (EST)
To: clause@boar
Subject: test
Message-Id: <20070220164953.88EDD1724523@boar>
Date: Tue, 20 Feb 2007 11:49:53 -0500 (EST)
From: "\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\""@host.fubar
X-IMAPbase: 1172160370 390
Status: O
X-Status:
X-Keywords:
X-UID: 5...

[Callout: zoomed view of the escaped double quotes (\") in the From field.]

# Inputs: 15,103,766

# Relevant (DF): 26

# Relevant (DF + CF): 15,100,344

Page 92: Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

Study 1: Conclusions

1. Data-flow propagation is always effective; data- and control-flow propagation is sometimes effective.

➡ Use data-flow first; then, if necessary, use control-flow.

2. Inputs identified by Penumbra correspond to the failure conditions.

➡ Our technique is effective in assisting the debugging of real failures.
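To illustrate why adding control-flow propagation can enlarge the set of relevant inputs, here is a toy sketch of mark propagation. It is not Penumbra's implementation: the `Tainted` class, the `relevant_inputs` function, and the three-input program (`mode`, `length`, `padding`) are all hypothetical. Each input receives a unique mark, data flow unions the marks of an operation's operands, and control flow additionally adds the marks of the branch condition that governs the assignment.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Tainted:
    value: int
    marks: FrozenSet[int]  # identifiers of the inputs that influenced value


def relevant_inputs(raw: List[int], use_control_flow: bool) -> FrozenSet[int]:
    """Return the marks on the value that reaches the (toy) failure point.

    `raw` must hold exactly three integers: mode, length, padding.
    """
    # Input i is tagged with the unique mark {i}.
    mode, length, padding = (
        Tainted(v, frozenset({i})) for i, v in enumerate(raw)
    )
    # Under DF + CF, values assigned inside a branch also inherit the marks
    # of the condition that selected that branch.
    branch_marks = mode.marks if use_control_flow else frozenset()
    if mode.value > 0:
        size = Tainted(
            length.value + padding.value,
            length.marks | padding.marks | branch_marks,
        )
    else:
        size = Tainted(length.value, length.marks | branch_marks)
    # `size` plays the role of the relevant context chosen at the failure.
    return size.marks


if __name__ == "__main__":
    print(sorted(relevant_inputs([1, 5, 7], use_control_flow=False)))
    print(sorted(relevant_inputs([1, 5, 7], use_control_flow=True)))
```

In this toy program the data-flow-only run reports just the inputs that were combined into `size`, while the control-flow variant also reports the input that chose the branch. When many values sit behind input-dependent branches, this effect could account for much larger relevant sets.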

Page 93: Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

Study 2: Comparison with Delta Debugging

RQ1: How much manual effort does each technique require?

RQ2: How long does it take to fix a considered failure given the information provided by each technique?

Page 98: Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

RQ1: Manual effort

Use setup-time as a proxy for manual (developer) effort.

[Bar chart: setup time (s) for Penumbra and Delta Debugging on bc, gzip, ncompress, pine, and squid. Bar labels as extracted: 5,400; 12,600; 1,800; 1,800; the remaining labels are fused as "1259731470163".]

Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).

Page 102: Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

RQ2: Debugging Effort

Use number of relevant inputs as a proxy for debugging effort.

Subject      Penumbra (DF)   Penumbra (DF + CF)   Delta Debugging
bc           209             743                  285
gzip         1               3                    1
ncompress    1               3                    1
pine         26              15,100,344           90
squid        89              2,056                —

• Penumbra (DF) is comparable to (slightly better than) Delta Debugging.

• Penumbra (DF + CF) is likely less effective for bc, pine, and squid.
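To put these counts in proportion, the short calculation below relates the relevant-input counts to the total input counts reported in the Study 1 results. Totals appear on the slides only for gzip and pine, so only those two subjects are included. The dictionary and function names are chosen for this example.

```python
# Totals and relevant-input counts as reported on the Study 1 result slides.
TOTAL_INPUTS = {"gzip": 10_000_056, "pine": 15_103_766}
RELEVANT = {
    "gzip": {"DF": 1, "DF + CF": 3},
    "pine": {"DF": 26, "DF + CF": 15_100_344},
}


def relevant_fraction(subject: str, mode: str) -> float:
    """Fraction of all inputs that the given propagation mode flags as relevant."""
    return RELEVANT[subject][mode] / TOTAL_INPUTS[subject]


if __name__ == "__main__":
    for subject, modes in RELEVANT.items():
        for mode in modes:
            print(f"{subject:5s} {mode:8s} {relevant_fraction(subject, mode):.7%}")
```

For pine, data-flow propagation flags only a tiny fraction of the inputs, whereas data- and control-flow propagation flags nearly all of them, which is consistent with the recommendation to try data-flow first.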

Page 103: Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

Conclusions & Future Work

• Novel technique for identifying failure-relevant inputs.

• Overcomes limitations of existing approaches:

    • Single execution

    • Minimal manual effort

    • Comparable effectiveness

• Future work: combine Penumbra with existing code-centric techniques.