Penumbra: Automatically Identifying Failure-Relevant Inputs (ISSTA 2009)

James Clause and Alessandro Orso

College of Computing, Georgia Institute of Technology

Supported in part by NSF awards CCF-0725202 and CCF-0541080 to Georgia Tech

Automated Debugging

• Gupta and colleagues ’05
• Jones and colleagues ’02
• Korel and Laski ’88
• Liblit and colleagues ’05
• Nainar and colleagues ’07
• Renieris and Reiss ’03
• Seward and Nethercote ’05
• Tucek and colleagues ’07
• Weiser ’81
• Zhang and colleagues ’05
• Zhang and colleagues ’06
• ...

Code-centric


What about the inputs that cause the failure?

Data-centric Techniques

• Chan and Lakhotia ’98
• Zeller and Hildebrandt ’02
• Misherghi and Su ’06

Delta Debugging

Requires:
1. Multiple executions
2. Large amounts of manual effort (oracle creation, setup)





Penumbra

Requires:
1. Single execution
2. Reduced manual effort

Comparable performance

Intuition and Terminology

Failure-revealing input vector



Failure-relevant subset (inputs that are useful for investigating the failure)

Approximate failure-relevant subsets by identifying inputs that reach the failure along program dependencies.

Motivating Example

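/* fileinfo: for each file named on the command line, print its size and,
   if the verbose flag is set, a preview of its first 50 characters.
   Statement numbers from the slide are kept as comments. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
/*  1 */ int verbose, i, total_size = 0;
/*  2 */ struct stat buf;
/*  3 */ verbose = atoi(argv[1]);
/*  4 */ for(i = 2; i < argc; i++) {
/*  5 */   int fd = open(argv[i], O_RDONLY);
/*  6 */   fstat(fd, &buf);
/*  7 */   char *out = malloc(60);
/*  8 */   sprintf(out, "%d", buf.st_size);
/*  9 */   if(verbose) {
/* 10 */     char *pview = malloc(51);
/* 11 */     read(fd, pview, 50);
/* 12 */     pview[50] = '\0';
/* 13 */     strcat(out, pview);
/* 14 */   }
/* 15 */   printf("%s: %s\n", argv[i], out);
/* 16 */   total_size += buf.st_size;
/* 17 */ }
/* 18 */ printf("total: %d\n", total_size);
}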

fileinfo

Inputs to fileinfo (together, the input vector):

• Command line arguments (flag, list of file names)
• File statistics, for each file (size, last modified date, ...)
• File contents, for each file (first 50 characters)
Motivating Example

Overflow of out, which occurs when:

• buf.st_size ≥ 1GB
• verbose is true
• read 50 characters
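To see why all three conditions are needed, the following sketch counts the bytes that statements 8 and 13 place into out. The helper names (decimal_digits, bytes_needed, overflows_out) and the constants are illustrative and not part of fileinfo; the arithmetic assumes the size is printed in full as a non-negative decimal number.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the example: malloc(60) for out, 50 preview bytes. */
enum { OUT_CAPACITY = 60, PREVIEW_LEN = 50 };

/* Number of decimal digits needed to print a non-negative size. */
size_t decimal_digits(long long size) {
    assert(size >= 0);
    size_t digits = 1;
    while (size >= 10) {
        size /= 10;
        digits++;
    }
    return digits;
}

/* Bytes written into out: the digits of the size, the preview
   (only when verbose), and the terminating NUL. */
size_t bytes_needed(long long size, int verbose, size_t preview_len) {
    size_t n = decimal_digits(size);
    if (verbose) {
        n += preview_len;
    }
    return n + 1;
}

/* Nonzero when the bytes written exceed the capacity of out. */
int overflows_out(long long size, int verbose, size_t preview_len) {
    return bytes_needed(size, verbose, preview_len) > (size_t)OUT_CAPACITY;
}
```

With a 1.5GB file (1610612736 bytes, ten digits), verbose set, and a full 50-character preview, the sketch counts 61 bytes for a 60-byte buffer. Dropping any one of the three conditions brings the total to 60 bytes or fewer.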

1. Many more inputs than lines of code.
2. Understanding the failure requires tracing interactions between inputs from multiple sources.
3. Only a small percentage of all inputs are relevant for the failure.

Penumbra Overview

fileinfo is run on three files: Foo (512B), Bar (1KB), and Baz (1.5GB).

Output: foo: 512 ... bar: 1024 ... baz: 150... total: 150...

Relevant context:
1. When the failure occurs.
2. Which data are involved in the failure.

For fileinfo, the relevant context is statement 13: strcat(out, pview);

In general, it is chosen using traditional debugging methods.

Penumbra works in three steps:

1. Taint inputs (taint marks 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 in the figure).
2. Propagate taint marks.
3. Identify relevant inputs (taint marks 0, 8, 9 in the figure):
   • verbose is true
   • read 50 characters
   • buf.st_size ≥ 1GB

Outline

• Penumbra approach
  1. Tainting inputs
  2. Propagating taint marks
  3. Identifying relevant inputs
• Evaluation
• Conclusions and future work

1: Tainting Inputs

Assign a taint mark to each input as it enters the application.

• Per-byte: assign a unique taint mark to each byte (read from files).
• Per-entity: assign the same taint mark to related bytes (argv, argc, fstat, ...).
• Domain specific: assign taint marks based on user-provided information.
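The difference between per-byte and per-entity tainting can be sketched with a small shadow array that holds one taint mark per tracked byte. Everything here (taint_state, taint_per_byte, taint_per_entity, the fixed MAX_BYTES size, and the convention that 0 means untainted) is an assumption made for illustration, not the prototype's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int taint_mark;      /* 0 means "untainted" in this sketch */

enum { MAX_BYTES = 256 };             /* hypothetical size of the tracked region */

typedef struct {
    taint_mark shadow[MAX_BYTES];     /* one mark slot per tracked byte */
    taint_mark next_mark;             /* next unused taint mark */
} taint_state;

/* Clear all marks and start numbering marks at 1. */
void taint_init(taint_state *ts) {
    for (size_t i = 0; i < (size_t)MAX_BYTES; i++) {
        ts->shadow[i] = 0;
    }
    ts->next_mark = 1;
}

/* Per-byte: every byte in [start, start + len) gets its own fresh mark. */
void taint_per_byte(taint_state *ts, size_t start, size_t len) {
    assert(start <= (size_t)MAX_BYTES && len <= (size_t)MAX_BYTES - start);
    for (size_t i = 0; i < len; i++) {
        ts->shadow[start + i] = ts->next_mark++;
    }
}

/* Per-entity: all bytes in [start, start + len) share one fresh mark,
   which is returned to the caller. */
taint_mark taint_per_entity(taint_state *ts, size_t start, size_t len) {
    assert(start <= (size_t)MAX_BYTES && len <= (size_t)MAX_BYTES - start);
    taint_mark m = ts->next_mark++;
    for (size_t i = 0; i < len; i++) {
        ts->shadow[start + i] = m;
    }
    return m;
}
```

In this sketch a four-byte entity consumes one taint mark, whereas four bytes tainted individually consume four, which is the trade-off between precision and the number of marks to track.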


[Figure sequence; surviving captions: Precise identification. Unnecessarily expensive. Maintains per-byte precision. Increases scalability. Further increases scalability.]

When a taint mark is assigned to an input, log the input’s value and where the input was read from.
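A minimal sketch of such a log, assuming a fixed-size table indexed by taint mark. The record layout (source label, offset, raw value bytes) and the names taint_log, log_input, and log_lookup are hypothetical; the slides do not describe the prototype's log format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_MARKS = 64, MAX_VALUE = 32, MAX_SOURCE = 32 };  /* hypothetical limits */

typedef struct {
    int used;                         /* nonzero once the mark has been logged */
    char source[MAX_SOURCE];          /* where the input was read from */
    size_t offset;                    /* position within that source */
    unsigned char value[MAX_VALUE];   /* the input's value */
    size_t value_len;
} taint_log_entry;

typedef struct {
    taint_log_entry entries[MAX_MARKS];
} taint_log;

void log_init(taint_log *tlog) {
    assert(tlog != NULL);
    memset(tlog, 0, sizeof *tlog);
}

/* Record the value and origin of the input that received `mark`.
   Returns 0 on success, -1 if the record does not fit in the table. */
int log_input(taint_log *tlog, unsigned mark, const char *source,
              size_t offset, const void *value, size_t value_len) {
    assert(tlog != NULL && source != NULL && value != NULL);
    if (mark >= MAX_MARKS || value_len > MAX_VALUE ||
        strlen(source) >= MAX_SOURCE) {
        return -1;
    }
    taint_log_entry *e = &tlog->entries[mark];
    e->used = 1;
    strcpy(e->source, source);        /* length checked above */
    e->offset = offset;
    memcpy(e->value, value, value_len);
    e->value_len = value_len;
    return 0;
}

/* Look up the record for `mark`, or NULL if none was logged. */
const taint_log_entry *log_lookup(const taint_log *tlog, unsigned mark) {
    assert(tlog != NULL);
    if (mark >= MAX_MARKS || !tlog->entries[mark].used) {
        return NULL;
    }
    return &tlog->entries[mark];
}
```

Keeping the log keyed by taint mark means that, once the relevant marks are known, the corresponding inputs can be recovered by direct lookup.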







2: Propagating Taint Marks

• Data-flow propagation (DF): taint marks flow along only data dependencies.
• Data- and control-flow propagation (DF + CF): taint marks flow along data and control dependencies.

Example (DF): C = A + B;
If A carries taint mark 1 and B carries taint mark 2, C receives taint marks 1 and 2.

Example (DF + CF): if(X) { C = A + B; }
If X additionally carries taint mark 3, C receives taint marks 1, 2, and 3.

The effectiveness of each option depends on the particular failure.
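The two propagation policies can be sketched with taint sets stored as bitmasks. The representation (one bit per taint mark, at most 32 marks) and the names taint_set, single_mark, and propagate_binary are assumptions for illustration; they are not Dytan's API.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t taint_set;   /* bit k set means taint mark k is present */

typedef enum { PROPAGATE_DF, PROPAGATE_DF_CF } propagation_mode;

/* Taint set containing only mark k (marks 0..31 in this sketch). */
taint_set single_mark(unsigned k) {
    assert(k < 32);
    return (taint_set)1u << k;
}

/* Taint of the result of a binary operation such as C = A + B that
   executes under a branch whose condition carries the set `control`.
   Pass 0 for `control` when the statement is not guarded by a branch. */
taint_set propagate_binary(propagation_mode mode, taint_set a, taint_set b,
                           taint_set control) {
    taint_set result = a | b;         /* data dependencies: operands */
    if (mode == PROPAGATE_DF_CF) {
        result |= control;            /* control dependency: branch condition */
    }
    return result;
}
```

For if(X) { C = A + B; } with marks 1, 2, and 3 on A, B, and X, the DF mode yields marks 1 and 2 on C, while the DF + CF mode yields marks 1, 2, and 3.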



3: Identifying Relevant Inputs

1. The relevant context indicates which data are involved in the considered failure.
2. Identify which taint marks are associated with the data indicated by the relevant context.
3. Use the recorded logs to reconstruct the inputs identified by those taint marks.
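Steps 2 and 3 can be sketched as a union over the taint sets of the context's data followed by a lookup in the input log. The bitmask representation, the input_record layout, and the function names are hypothetical, and the sketch assumes the log is an array indexed by taint mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t taint_set;   /* bit k set means taint mark k is present */

/* Logged description of the input that received a given taint mark. */
typedef struct {
    const char *source;       /* where the input was read from */
    const char *value;        /* the input's value */
} input_record;

/* Step 2: union the taint sets of every location named by the relevant
   context. `context_data` holds indices into the shadow array. */
taint_set marks_at_context(const taint_set *shadow,
                           const size_t *context_data, size_t n) {
    taint_set marks = 0;
    for (size_t i = 0; i < n; i++) {
        marks |= shadow[context_data[i]];
    }
    return marks;
}

/* Step 3: collect the log records of the marks in `marks`, in increasing
   mark order. Returns the number of records written to `out`. */
size_t relevant_inputs(taint_set marks, const input_record *records,
                       size_t records_len, const input_record **out,
                       size_t out_cap) {
    size_t count = 0;
    for (unsigned k = 0; k < 32 && k < records_len; k++) {
        if ((marks >> k) & 1u) {
            assert(count < out_cap);
            out[count++] = &records[k];
        }
    }
    return count;
}
```

Because the union is computed only over the data named by the relevant context, inputs whose taint marks never reach that data are excluded from the result.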


Prototype Implementation

Pipeline (from the figure): executable + input vector → Trace generator → trace; trace + relevant context → Trace processor → input subset (DF) and input subset (DF + CF).

Implemented using Dytan, a generic x86 tainting framework developed in previous work [Clause and Orso 2007].

Evaluation

Study 1: Effectiveness for debugging real failures
Study 2: Comparison with Delta Debugging


Subjects:

Application   Version   KLoC    Fault location
bc            1.06      10.5    more_arrays : 177
gzip          1.24      6.3     get_istat : 828
ncompress     4.24      1.4     comprexx : 896
pine          4.44      239.1   rfc822_cat : 260
squid         2.3       69.9    ftpBuildTitleUrl : 1024

We selected a failure-revealing input vector for each subject.

Data Generation

Penumbra
• Setup (manual): choose a relevant context.
  • Location: statement where the failure occurs.
  • Data: any data read by that statement.
• Execution (automated): use the prototype tool to identify failure-relevant inputs (DF and DF + CF).

Delta Debugging
• Setup (manual): create an automated oracle.
  • Use gdb to inspect the stack trace and program data.
  • One-second timeout to prevent incorrect results.
• Execution (automated): use the standard Delta Debugging implementation to minimize inputs.

Study 1: Effectiveness

Is the information that Penumbra provides helpful for debugging real failures?

Study 1 Results: gzip & ncompress

Crash when a file name is longer than 1,024 characters.

Input vector (from the figure): ./gzip foo bar [long file name], plus the contents and attributes of each file.

# Inputs: 10,000,056
# Relevant (DF): 1
# Relevant (DF + CF): 3

Study 1 Results: pine

Crash when a “from” field contains 22 or more double quote characters.

...
From clause@boar Tue Feb 20 11:49:53 2007
Return-Path: <clause@boar>
X-Original-To: clause
Delivered-To: clause@boar
Received: by boar (Postfix, from userid 1000) id 88EDD1724523; Tue, 20 Feb 2007 11:49:53 -0500 (EST)
To: clause@boar
Subject: test
Message-Id: <20070220164953.88EDD1724523@boar>
Date: Tue, 20 Feb 2007 11:49:53 -0500 (EST)
From: "\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\"\""@host.fubar
X-IMAPbase: 1172160370 390
Status: O
X-Status:
X-Keywords:
X-UID: 5
...

# Inputs: 15,103,766
# Relevant (DF): 26
# Relevant (DF + CF): 15,100,344


Study 1: Conclusions

1. Data-flow propagation is always effective; data- and control-flow propagation is sometimes effective.
   ➡ Use data-flow first; then, if necessary, use control-flow.
2. Inputs identified by Penumbra correspond to the failure conditions.
   ➡ Our technique is effective in assisting the debugging of real failures.

Study 2: Comparison with Delta Debugging

RQ1: How much manual effort does each technique require?

RQ2: How long does it take to fix a considered failure given the information provided by each technique?

RQ1: Manual effort

Use setup time as a proxy for manual (developer) effort.

[Figure: bar chart of setup time (s) for Penumbra and Delta Debugging on gzip, ncompress, bc, pine, and squid.]

Penumbra requires considerably less setup time than Delta Debugging (although more time overall for gzip and ncompress).

RQ2: Debugging Effort

Use number of relevant inputs as a proxy for debugging effort.

Subject      Penumbra (DF)   Penumbra (DF + CF)   Delta Debugging
bc           209             743                  285
gzip         1               3                    1
ncompress    1               3                    1
pine         26              15,100,344           90
squid        89              2,056                —

• Penumbra (DF) is comparable to (slightly better than) Delta Debugging.
• Penumbra (DF + CF) is likely less effective for bc, pine, and squid.
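To put the counts in proportion to the size of each input vector, the following sketch computes the percentage of inputs reported as relevant. The function name relevant_percent is illustrative; the totals it is applied to below are the # Inputs figures from the Study 1 slides.

```c
#include <assert.h>

/* Percentage of the input vector that a technique reports as relevant. */
double relevant_percent(unsigned long relevant, unsigned long total) {
    assert(total > 0 && relevant <= total);
    return 100.0 * (double)relevant / (double)total;
}
```

For pine, DF reports 26 of 15,103,766 inputs (well under 0.001%), while DF + CF reports 15,100,344 of them (over 99.9%), which leaves almost nothing filtered out.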

Conclusions & Future Work

• Novel technique for identifying failure-relevant inputs.

• Overcomes limitations of existing approaches:
  • Single execution
  • Minimal manual effort
  • Comparable effectiveness

• Future work: combine Penumbra with existing code-centric techniques.