Mining Specifications Glenn Ammons, Dept. Computer Science University of Wisconsin Rastislav Bodik, Computer Science Division University of California, Berkeley James R. Larus, Microsoft Research POPL 2002

Page 1: Mining Specifications

Mining Specifications

Glenn Ammons, Dept. Computer Science University of Wisconsin

Rastislav Bodik, Computer Science Division University of California, Berkeley

James R. Larus, Microsoft Research

POPL 2002

Page 2: Mining Specifications

Motivation

Formal verification is a promising alternative to software testing

But

Verifiers will be of little use without enough correctness specifications to be verified

Page 3: Mining Specifications

The Assumption

Common behavior is (often) correct behavior.

If we can identify common behavior we can produce correct specifications, even from programs that contain errors.

Page 4: Mining Specifications

A Program Using socket API

int s = socket(AF_INET, SOCK_STREAM, 0);
…
bind(s, &serv_addr, sizeof(serv_addr));
…
listen(s, 5);
…
while (1) {
    int ns = accept(s, &addr, &len);
    if (ns < 0) break;
    do {
        read(ns, buffer, 255);
        …
        write(ns, buffer, size);
        if (cond1) return;   /* returns without closing ns or s */
    } while (cond2);
    close(ns);
}
close(s);

Page 5: Mining Specifications

An Example Trace

1 socket(domain = 2, type = 1, proto = 0, return = 7)

2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)

3 listen(so = 7, backlog = 5, return = 0)

4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)

5 read(fd = 8, buf = 0x400320, len = 255, return = 12)

6 write(fd = 8, buf = 0x400320, len = 12, return = 12)

7 read(fd = 8, buf = 0x400320, len = 255, return = 7)

8 write(fd = 8, buf = 0x400320, len = 7, return = 7)

9 close(fd = 8, return = 0)

10 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 10)

11 read(fd = 10, buf = 0x400320, len = 255, return = 13)

12 write(fd = 10, buf = 0x400320, len = 13, return = 13)

13 close(fd = 10, return = 0)

14 close(fd = 7, return = 0)

Page 6: Mining Specifications

Design Decisions

1. Learn from traces, not from source.

• Traces contain fewer bugs.

2. Take a “vote” on what the common program behavior is.

• The high-probability core encodes the frequently followed protocol.

Page 7: Mining Specifications

Mining System

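[Figure: the mining system pipeline. Program → Tracer → Instrumented program; Instrumented program + Test inputs → Run → Traces; Traces → Flow dependence annotator → Annotated traces; Annotated traces + Scenario seed → Scenario extractor → Abstract scenario strings; Abstract scenario strings → Automaton learner → Specifications.]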

Page 8: Mining Specifications

The (unsolvable) Problem

• I - the set of all traces of interaction with an API or ADT.

• C ⊆ I - the set of all correct traces of interaction.

• T - an unlabelled training set of interaction traces.

Find an automaton A that generates exactly the traces in C.

Page 9: Mining Specifications

Restriction 1

• C must be a regular language.

– Model checkers require finite-state specifications.

– Algorithms for learning finite-state automata are relatively well developed.

Page 10: Mining Specifications

Interaction Scenarios

LinkedList(n)

Trace (calls only):
malloc, malloc, …, malloc, free, …, free, free

Trace with the object each call returns or receives:
malloc(return = O1), malloc(return = O2), …, malloc(return = On), free(p = On), …, free(p = O2), free(p = O1)

Scenarios, one per object:
O1: malloc(return = O1), free(p = O1)
O2: malloc(return = O2), free(p = O2)
…
On: malloc(return = On), free(p = On)

Scenarios after renaming every object to Ostd:
O1: malloc(return = Ostd), free(p = Ostd)
O2: malloc(return = Ostd), free(p = Ostd)
…
On: malloc(return = Ostd), free(p = Ostd)
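The decomposition in this figure can be sketched in a few lines of Python. This is only an illustration: the function names `split_by_object` and `standardize` and the `(call, object)` tuple layout are assumptions made for the example, not part of the original system.

```python
def split_by_object(trace):
    """Group (call, obj) events into one scenario per object, in first-seen order."""
    scenarios = {}
    for call, obj in trace:
        scenarios.setdefault(obj, []).append(call)
    return scenarios


def standardize(scenarios, std_name="Ostd"):
    """Rename every scenario's object to a single standard name."""
    return [[(call, std_name) for call in calls] for calls in scenarios.values()]


# LinkedList(3): allocate three nodes, then free them in reverse order.
trace = [("malloc", "O1"), ("malloc", "O2"), ("malloc", "O3"),
         ("free", "O3"), ("free", "O2"), ("free", "O1")]

per_object = split_by_object(trace)
standard = standardize(per_object)
```

The interleaved trace grows with n, whereas every per-object scenario reduces to the same two-interaction string, which is what makes a finite-state description plausible.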

Page 11: Mining Specifications

The Problem – Take 2

• IS - the set of all interaction scenarios with an API or ADT that manipulate no more than k data objects.

• CS ⊆ IS - the regular set of all correct scenarios.

• TS - an unlabelled training set of interaction scenarios from IS.

Find a finite-state automaton AS that generates exactly the scenarios in CS.

Page 12: Mining Specifications

Restriction 2 - Linking TS and CS

Let TS = c0, c1, … be an infinite sequence of elements from CS in which each element of CS occurs at least once.

For each n ≥ 0: c0, c1, …, cn → ASn (the learner maps each finite prefix to an automaton).

For some N ≥ 0, ASN generates exactly the scenarios in CS, and ASn = ASN for all n ≥ N.

Then AS0, AS1, … identifies CS in the limit.

Page 13: Mining Specifications

The Probabilistic Approach

• IS – as before.

• M – a target PFSA (probabilistic finite-state automaton), and PM – the distribution over IS that M generates.

“Efficiently” find a PFSA M’ such that its distribution PM’ is an ε-good approximation of PM.
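To make "the distribution that M generates" concrete, here is a minimal Python sketch of a PFSA assigning a probability to a scenario string. It assumes a deterministic PFSA stored as a dictionary; the layout, the name `string_probability`, and the toy automaton are all hypothetical and chosen only for illustration.

```python
def string_probability(pfsa, symbols):
    """Probability that the PFSA emits exactly `symbols` and then stops."""
    state = pfsa["start"]
    prob = 1.0
    for sym in symbols:
        edge = pfsa["edges"].get((state, sym))
        if edge is None:          # no such transition: the string is impossible
            return 0.0
        state, p = edge
        prob *= p
    return prob * pfsa["stop"].get(state, 0.0)


# Hypothetical PFSA: one malloc, optional extra mallocs, then one free.
toy = {
    "start": "q0",
    "edges": {
        ("q0", "malloc"): ("q1", 1.0),
        ("q1", "free"): ("q2", 0.9),
        ("q1", "malloc"): ("q1", 0.1),
    },
    "stop": {"q2": 1.0},          # stopping probability per state
}

p_common = string_probability(toy, ["malloc", "free"])
p_rare = string_probability(toy, ["malloc", "malloc", "free"])
```

In this toy automaton the common string receives probability 0.9 and the rarer variant about 0.09, which is the kind of gap a learner could use to separate frequent behavior from infrequent behavior.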

Page 14: Mining Specifications

Mining System

[Figure: the mining system pipeline from Page 7, repeated.]

Page 15: Mining Specifications

Tracer

1. C stdio replacement (requires recompilation)
2. Executable editing

1 socket(domain = 2, type = 1, proto = 0, return = 7)
2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)
3 listen(so = 7, backlog = 5, return = 0)
4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)

Skeleton: interaction(attribute0, …, attributen)
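As a rough illustration of the skeleton, the following Python sketch parses one textual trace line into an interaction name and its attributes. It assumes traces are stored as text in exactly the form shown on the slide (without the leading event number); the name `parse_event` is hypothetical.

```python
import re

# name(attr = value, attr = value, ...)
EVENT = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def parse_event(line):
    """Parse 'name(attr = value, ...)' into (name, {attr: value}); values stay strings."""
    match = EVENT.match(line)
    if match is None:
        raise ValueError(f"not an interaction: {line!r}")
    name, body = match.groups()
    attrs = {}
    for part in body.split(","):
        if part.strip():
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return name, attrs


event = parse_event("accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)")
```

Values are kept as strings so that addresses such as 0x400200 and descriptors such as 8 can be compared for equality without deciding how to interpret them.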

Page 16: Mining Specifications

Flow Dependence

[Figure: Traces → Dependence analysis → Untyped trace with dependencies → Type inference → Annotated traces.]

Page 17: Mining Specifications

Dependence Analysis

• Takes a list of attributes that define or use objects (manually created).

• Creates a flow dependence between users and definers.

Definers: socket.return, bind.so, listen.so, accept.return, close.fd

Users: bind.so, listen.so, accept.so, read.fd, write.fd, close.fd
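A minimal Python sketch of this step follows. It assumes that a use depends on the most recent earlier definition of the same value, which is one plausible reading of "flow dependence" rather than a rule stated on the slide. The definer and user lists are copied from the slide; `flow_dependences` and the trace layout are hypothetical.

```python
DEFINERS = {"socket.return", "bind.so", "listen.so", "accept.return", "close.fd"}
USERS = {"bind.so", "listen.so", "accept.so", "read.fd", "write.fd", "close.fd"}


def flow_dependences(trace):
    """Return (definer_index, user_index) pairs, using 0-based event indices."""
    last_def = {}   # value -> index of the event that most recently defined it
    deps = []
    for i, (name, attrs) in enumerate(trace):
        # Link uses first, so an attribute that both uses and defines a value
        # depends on the previous definer rather than on itself.
        for attr, value in attrs.items():
            if f"{name}.{attr}" in USERS and value in last_def:
                deps.append((last_def[value], i))
        for attr, value in attrs.items():
            if f"{name}.{attr}" in DEFINERS:
                last_def[value] = i
    return deps


trace = [
    ("socket", {"domain": 2, "type": 1, "proto": 0, "return": 7}),
    ("bind", {"so": 7, "addr": 0x400120, "addr_len": 6, "return": 0}),
    ("listen", {"so": 7, "backlog": 5, "return": 0}),
    ("accept", {"so": 7, "addr": 0x400200, "addr_len": 0x400240, "return": 8}),
    ("read", {"fd": 8, "buf": 0x400320, "len": 255, "return": 12}),
    ("close", {"fd": 8, "return": 0}),
]
deps = flow_dependences(trace)
```

Attributes outside the two lists, such as bind.return, are ignored even when their values happen to coincide with a descriptor.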

Page 18: Mining Specifications

Type Inference

If there exists a flow dependency between two attributes then typing gives these attributes the same type.

Type(socket.return)=T0

Type(bind.so)=T0

Type(listen.so)=T0

Type(accept.so)=T0

Type(accept.return)=T0

Type(read.fd)=T0

Type(write.fd)=T0

Type(close.fd)=T0
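The typing rule can be sketched with a union-find structure in Python. This is an assumed implementation strategy, not necessarily the one used by the authors; `infer_types` is a hypothetical name, and the attribute pairs below are written by hand from the example socket trace.

```python
def infer_types(attribute_pairs):
    """Give attributes joined by a flow dependence the same type name."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for definer, user in attribute_pairs:
        parent[find(definer)] = find(user)

    names, types = {}, {}
    for attr in list(parent):
        root = find(attr)
        names.setdefault(root, f"T{len(names)}")
        types[attr] = names[root]
    return types


# (definer attribute, user attribute) pairs observed in the example trace.
pairs = [
    ("socket.return", "bind.so"),
    ("bind.so", "listen.so"),
    ("listen.so", "accept.so"),
    ("accept.return", "read.fd"),
    ("accept.return", "write.fd"),
    ("accept.return", "close.fd"),
    ("listen.so", "close.fd"),      # close(fd = 7) at the end of the trace
]
types = infer_types(pairs)
```

The last pair is what merges the listening socket and the accepted connection into one type: close.fd is used with both, so all eight attributes end up as T0, matching the slide.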

Page 19: Mining Specifications

Scenario Extraction

[Figure: Annotated traces + Scenario seeds → Extraction → Scenarios → Simplification → Simplified scenarios → Standardization → Abstract scenario strings.]

Page 20: Mining Specifications

Extraction

• A scenario is a set of interactions related by flow dependences.

1 socket(domain = 2, type = 1, proto = 0, return = 7)

2 bind(so = 7, addr = 0x400120, addr_len = 6, return = 0)

3 listen(so = 7, backlog = 5, return = 0)

4 accept(so = 7, addr = 0x400200, addr_len = 0x400240, return = 8)

5 read(fd = 8, buf = 0x400320, len = 255, return = 12)

6 write(fd = 8, buf = 0x400320, len = 12, return = 12)

7 read(fd = 8, buf = 0x400320, len = 255, return = 7)

8 write(fd = 8, buf = 0x400320, len = 7, return = 7)

9 close(fd = 8, return = 0)
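One way to read this definition is sketched below in Python: starting from a seed interaction, collect everything the seed transitively depends on and everything that transitively depends on it. That reading is an assumption, as are the name `extract_scenario` and the hand-written dependence list, which uses the event numbers of the 14-line example trace.

```python
def extract_scenario(seed, deps):
    """Events the seed depends on, or that depend on the seed, transitively."""
    def reach(start, edges):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    forward, backward = {}, {}
    for definer, user in deps:
        forward.setdefault(definer, []).append(user)
        backward.setdefault(user, []).append(definer)
    return sorted({seed} | reach(seed, forward) | reach(seed, backward))


# (definer event, user event) pairs for the 14-event socket trace.
deps = [(1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9),
        (3, 10), (10, 11), (10, 12), (10, 13), (3, 14)]

first = extract_scenario(4, deps)     # seed: the first accept
second = extract_scenario(10, deps)   # seed: the second accept
```

With the first accept as seed this yields events 1 through 9, the scenario shown on the slide; the second accept and the final close(fd = 7) are siblings of the seed rather than ancestors or descendants, so they are left out.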

Page 21: Mining Specifications

Simplification

Eliminate all interaction attributes that do not carry a flow dependence.

1 socket(return = 7)

2 bind(so = 7)

3 listen(so = 7)

4 accept(so = 7, return = 8) [seed]

5 read(fd = 8)

6 write(fd = 8)

7 read(fd = 8)

8 write(fd = 8)

9 close(fd = 8)
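A short Python sketch of simplification, assuming the flow-carrying attributes are exactly the definer and user attributes listed for the dependence analysis. The name `simplify` and the event layout are hypothetical.

```python
# Attributes that can carry a flow dependence (definers and users combined).
FLOW_ATTRIBUTES = {
    "socket.return", "bind.so", "listen.so", "accept.so",
    "accept.return", "read.fd", "write.fd", "close.fd",
}


def simplify(scenario):
    """Keep only the attributes that can carry a flow dependence."""
    return [
        (name, {a: v for a, v in attrs.items() if f"{name}.{a}" in FLOW_ATTRIBUTES})
        for name, attrs in scenario
    ]


scenario = [
    ("socket", {"domain": 2, "type": 1, "proto": 0, "return": 7}),
    ("accept", {"so": 7, "addr": 0x400200, "addr_len": 0x400240, "return": 8}),
    ("read", {"fd": 8, "buf": 0x400320, "len": 255, "return": 12}),
]
simplified = simplify(scenario)
```

Buffer addresses, lengths, and status codes disappear, so two scenarios that differ only in those details become identical after this step.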

Page 22: Mining Specifications

Standardization

1. Naming: replaces attribute values with symbolic variables.
2. Reordering

1 socket(return = x0:T0)  (A)
2 bind(so = x0:T0)  (B)
3 listen(so = x0:T0)  (C)
4 accept(so = x0:T0, return = x1:T0) [seed]  (D)
5 read(fd = x1:T0)  (E)
7 read(fd = x1:T0)  (E)
6 write(fd = x1:T0)  (F)
8 write(fd = x1:T0)  (F)
9 close(fd = x1:T0)  (G)
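The two standardization steps and the final letter assignment can be sketched in Python as follows. The reordering rule here, sorting each run of use-only interactions by name, is a simplified stand-in chosen because it reproduces the order on the slide; it is not claimed to be the original algorithm. All function names are hypothetical, and types are omitted because every attribute in this example has type T0.

```python
DEFINERS = {"socket.return", "bind.so", "listen.so", "accept.return", "close.fd"}


def rename(scenario):
    """Naming: replace concrete values with x0, x1, ... in order of first appearance."""
    names, out = {}, []
    for name, attrs in scenario:
        new_attrs = {}
        for attr, value in attrs.items():
            names.setdefault(value, f"x{len(names)}")
            new_attrs[attr] = names[value]
        out.append((name, new_attrs))
    return out


def reorder(scenario):
    """Reordering (simplified): sort each run of use-only interactions by name."""
    out, run = [], []
    for name, attrs in scenario:
        if any(f"{name}.{a}" in DEFINERS for a in attrs):
            out.extend(sorted(run, key=lambda e: e[0]))
            run = []
            out.append((name, attrs))
        else:
            run.append((name, attrs))
    out.extend(sorted(run, key=lambda e: e[0]))
    return out


def to_letters(scenario):
    """Map each distinct interaction to a letter, in order of first appearance."""
    letters, result = {}, []
    for name, attrs in scenario:
        key = (name, tuple(sorted(attrs.items())))
        letters.setdefault(key, chr(ord("A") + len(letters)))
        result.append(letters[key])
    return "".join(result)


simplified = [
    ("socket", {"return": 7}), ("bind", {"so": 7}), ("listen", {"so": 7}),
    ("accept", {"so": 7, "return": 8}),
    ("read", {"fd": 8}), ("write", {"fd": 8}),
    ("read", {"fd": 8}), ("write", {"fd": 8}),
    ("close", {"fd": 8}),
]
standard = reorder(rename(simplified))
abstract = to_letters(standard)
```

For the example scenario this produces the string ABCDEEFFG, the same letters (A) through (G) attached to the interactions on the slide.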

Page 23: Mining Specifications

Automaton Learning

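1. An OTS (off-the-shelf) learner learns a PFSA.
2. A corer removes infrequently traversed edges and converts the PFSA into an NFA.

[Figure: example automaton with a start state and a final state; three edges are labeled 10000 and four edges are labeled 5.]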
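The coring idea can be illustrated with a simple count threshold in Python. This is a deliberately simplified sketch: the original corer may use a different criterion, and the automaton below is hypothetical, borrowing only the edge weights 10000 and 5 from the figure. The name `core` and the edge layout are assumptions.

```python
def core(edges, min_count):
    """Drop edges traversed fewer than min_count times; return unweighted NFA edges."""
    return {edge for edge, count in edges.items() if count >= min_count}


# Hypothetical PFSA edges: (source, label, target) -> traversal count.
counted = {
    ("start", "a", "q1"): 10000,
    ("q1", "b", "q2"): 10000,
    ("q2", "c", "final"): 10000,
    ("start", "x", "q3"): 5,
    ("q3", "y", "q1"): 5,
    ("q1", "z", "q4"): 5,
    ("q4", "w", "final"): 5,
}
nfa = core(counted, min_count=100)
```

Only the heavily traversed path from start to final survives, which is the "vote" on common behavior expressed as an automaton.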

Page 24: Mining Specifications

Specification Automaton for the Socket Protocol

[Figure: automaton whose edges carry the following labels.]

socket(return = x)

bind(so = x)

listen(so = x)

accept(so = x, return = y)

read(fd = y)

write(fd = y)

close(fd = x)

close(fd = y)
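To show how such a specification might be used, here is a Python sketch that checks a standardized scenario against an NFA. The edge labels come from the slide, but the state names and the topology (a straight chain with a read/write self-loop) are assumptions, since the figure's layout did not survive extraction. `accepts` is a hypothetical name for a standard NFA simulation.

```python
def accepts(nfa, start, finals, labels):
    """NFA simulation: track the set of states reachable after each label."""
    current = {start}
    for label in labels:
        current = {dst for (src, lab, dst) in nfa if src in current and lab == label}
        if not current:
            return False
    return bool(current & finals)


# Assumed topology; only the labels are taken from the slide.
SPEC = {
    ("s0", "socket(return = x)", "s1"),
    ("s1", "bind(so = x)", "s2"),
    ("s2", "listen(so = x)", "s3"),
    ("s3", "accept(so = x, return = y)", "s4"),
    ("s4", "read(fd = y)", "s4"),
    ("s4", "write(fd = y)", "s4"),
    ("s4", "close(fd = y)", "s5"),
    ("s5", "close(fd = x)", "s6"),
}

good = ["socket(return = x)", "bind(so = x)", "listen(so = x)",
        "accept(so = x, return = y)", "read(fd = y)", "write(fd = y)",
        "close(fd = y)", "close(fd = x)"]
leaky = good[:6]                       # stops before either close
ok = accepts(SPEC, "s0", {"s6"}, good)
bad = accepts(SPEC, "s0", {"s6"}, leaky)
```

Under this assumed automaton, a scenario that ends without closing the descriptors is rejected, which is the kind of behavior the early return in the example server program would produce.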

Page 25: Mining Specifications

Experimental Results

• Analyzed traces from programs that use the Xlib and X Toolkit Intrinsics libraries for the X11 windowing system.

• Traces were generated manually.

• Compared the mined specification to the rules in the Inter-Client Communication Conventions Manual (ICCCM).

Page 26: Mining Specifications

Experimental Results

• A small and buggy training set prevented the miner from discovering the rule.

• Solution: an expert chooses correct traces as the training set.

Page 27: Mining Specifications

Benefits

• Exploits the massive effort by programmers that is reflected in the code (and nowhere else).

• Offers convenience and insights. It is easier to approve a mined formal specification than to write one.

Page 28: Mining Specifications

Conclusion

• Introduced a (semi) automatic machine-learning approach for discovering formal specifications.

• Reduced the problem to learning regular languages.

• Initial experience is promising.