Strauss: A Specification Miner Glenn Ammons Department of Computer Sciences University of...

Strauss: A Specification Miner

Glenn Ammons
Department of Computer Sciences
University of Wisconsin-Madison

2

How should programs behave?

• Strauss mines temporal specifications
  • for example, always call free after malloc
• Why?
  • To debug
  • To verify
  • To test
  • To modify
  • To understand
• Specs constrain programs
  • like structured programming, types, etc.

4

Strauss output

A specification says what programs should do:
• What order of calls?
• What values in calls?

For all calls Y := accept(sock = X):

[Figure: automaton from start to end over the calls X := socket(), Y := accept(sock = X), read(sock = Y), write(sock = Y), and close(sock = Y)]
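A specification of this kind can be read as an automaton over calls. The sketch below is illustrative only: the transition table `SPEC` is an assumption (the exact edges of the slide's figure are not reproduced here), and `accepts` is a hypothetical helper name, not part of Strauss.

```python
# Hypothetical transition table: state -> {call: set of next states}.
# The edges are an assumption, not a copy of the slide's automaton.
SPEC = {
    "start": {"X := socket()": {"s1"}},
    "s1": {"Y := accept(sock = X)": {"s2"}},
    "s2": {
        "read(sock = Y)": {"s2"},
        "write(sock = Y)": {"s2"},
        "close(sock = Y)": {"end"},
    },
}


def accepts(spec, calls, start="start", final="end"):
    """Return True if the NFA can consume every call and stop in `final`."""
    states = {start}
    for call in calls:
        # Follow every transition labeled `call` from every current state.
        states = {nxt for s in states for nxt in spec.get(s, {}).get(call, ())}
        if not states:
            return False
    return final in states


good = ["X := socket()", "Y := accept(sock = X)", "read(sock = Y)", "close(sock = Y)"]
print(accepts(SPEC, good))
```

Tracking a set of states keeps the check valid even if the table is nondeterministic, which matters because the mined automata are NFAs.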

5

Strauss inputs

• Traces of client programs that use a library through an interface (e.g., sockets):
  ...socket(domain = 2, type = 1, proto = 0, return = 7)...
  written on later slides as
  ...7 := socket(domain = 2, type = 1, proto = 0)...
• Scenario criterion: accept
• State transition model (STM):
  def socket()
  close(def use sock)
  def accept(use sock)
  read(use sock)
  write(use sock)
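The two notations for a traced call carry the same information. As a minimal sketch, assuming trace records are text of the form `name(key = value, ..., return = value)` (the record syntax and the helper names `parse_call` and `show` are assumptions for illustration), one can convert between them like this:

```python
import re

# Assumed record shape: name(key = value, ..., return = value)
RECORD = re.compile(r"(\w+)\((.*)\)")


def parse_call(record):
    """Split 'f(a = 1, return = 7)' into (name, args dict, return value)."""
    match = RECORD.fullmatch(record.strip())
    if match is None:
        raise ValueError(f"not a call record: {record!r}")
    name, body = match.groups()
    args = {}
    for field in filter(None, (f.strip() for f in body.split(","))):
        key, value = (part.strip() for part in field.split("=", 1))
        args[key] = value
    ret = args.pop("return", None)  # the return value is stored like an argument
    return name, args, ret


def show(name, args, ret):
    """Render a parsed call in the 'ret := f(args)' notation."""
    inner = ", ".join(f"{k} = {v}" for k, v in args.items())
    call = f"{name}({inner})"
    return call if ret is None else f"{ret} := {call}"


print(show(*parse_call("socket(domain = 2, type = 1, proto = 0, return = 7)")))
```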

7

My contributions

• A dynamic approach to mining
  • analyze communication, not source code
• A tool for mining specifications
  • practical: scalable and mostly automatic
  • temporal specs may refer to multiple data values
  • requires little info about code
• New method for debugging specifications
  • Birds of a feather flock together (concept analysis)
• Found specifications and bugs

8

Summary of results

• Mined 17 specs. for X11
  • example: XInternAtom should only be called during initialization, not in the event loop.
• Found 199 bugs
  • In widely distributed programs
    • wish (Tk), xpdf, xpilot, ...
  • Included races, leaks, logic errors, and performance bugs
  • By verifying dynamically (on traces)

9

Related work: obtaining specs.

• By hand
  • From programmers
    • Types
    • Specification languages and annotations
  • From analysts, testers
    • Abstraction tools (Bandera [Dwyer et al.])
• With specification mining

10

Comparing spec. miners

• Strauss
  • Specs: temporal, multiple objects, arbitrary NFAs
  • Mining: from traces, local, no code necessary
  • Goal: specs. for interfaces
• Other work
  • Concurrent work
    • FSMs for Java objects [Whaley, Martin, and Lam]
    • Simple templates [Metacompilation]
    • Loop invariants [ESC/Java]
  • Earlier work
    • Arithmetic properties [Daikon]
    • Cliches [Programmer’s Apprentice]
    • Communicating processes [Verisoft]

11

Overview of Strauss

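[Figure: the Strauss pipeline, in three phases: Preprocess, Mine, Debug]

• Programs and interface -> Tracer -> instrumented programs
• Instrumented programs -> run on test inputs -> training traces
• Training traces, STM, and scenario criterion -> Scenario extractor -> scenarios
• Scenarios -> Learner -> unvalidated specification
• Unvalidated specification and developer’s judgement -> Cable -> specification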

12

Trace + STM = Semantic trace

Trace

1 7 := socket(domain = 2, type = 1, proto = 0)
2 0x100 := malloc(size = 23)
3 8 := accept(sock = 7, addr = 0x40, addr_len = 0x50)
4 23 := write(sock = 8, buf = 0x100, len = 23)
5 0 := close(sock = 8)
...

State transition model (STM)

def socket()
close(def use sock)
def accept(use sock)
read(use sock)
write(use sock)

Semantic trace

Vars: v7, v8

1 v7 := socket()
2 skip
3 v8 := accept(sock = v7)
4 write(sock = v8)
5 v8 := close(sock = v8)
...
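The translation from trace to semantic trace can be sketched in a few lines. This is an illustration under stated assumptions, not Strauss's implementation: the dictionary encoding of the STM (`ret_def`, `attrs`) and the function name `semantic_event` are hypothetical, and only the first defined variable is shown on the left of `:=`.

```python
# Assumed encoding of the slide's STM: does the call define its return
# value (ret_def), and which attributes are kept and how (use / def use).
STM = {
    "socket": {"ret_def": True, "attrs": {}},
    "close": {"ret_def": False, "attrs": {"sock": {"def", "use"}}},
    "accept": {"ret_def": True, "attrs": {"sock": {"use"}}},
    "read": {"ret_def": False, "attrs": {"sock": {"use"}}},
    "write": {"ret_def": False, "attrs": {"sock": {"use"}}},
}

# The five traced calls from the slide as (return value, name, arguments).
TRACE = [
    ("7", "socket", {"domain": "2", "type": "1", "proto": "0"}),
    ("0x100", "malloc", {"size": "23"}),
    ("8", "accept", {"sock": "7", "addr": "0x40", "addr_len": "0x50"}),
    ("23", "write", {"sock": "8", "buf": "0x100", "len": "23"}),
    ("0", "close", {"sock": "8"}),
]


def semantic_event(stm, ret, name, args):
    """Translate one traced call; calls outside the STM become 'skip'."""
    rule = stm.get(name)
    if rule is None:
        return "skip"
    # Keep only attributes named in the STM and turn values into variables.
    kept = {a: f"v{args[a]}" for a in rule["attrs"] if a in args}
    inner = ", ".join(f"{a} = {v}" for a, v in kept.items())
    call = f"{name}({inner})"
    defined = [f"v{ret}"] if rule["ret_def"] else []
    defined += [kept[a] for a, modes in rule["attrs"].items()
                if "def" in modes and a in kept]
    return f"{defined[0]} := {call}" if defined else call


for number, event in enumerate(TRACE, start=1):
    print(number, semantic_event(STM, *event))
```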


15

A different STM

Trace

1 7 := socket(domain = 2, type = 1, proto = 0)
2 0x100 := malloc(size = 23)
3 8 := accept(sock = 7, addr = 0x40, addr_len = 0x50)
4 23 := write(sock = 8, buf = 0x100, len = 23)
5 0 := close(sock = 8)
...

State transition model (STM)

def malloc()
free(def p)
read(use buf)
write(use buf)

Semantic trace

Var: v0x100

1 skip
2 v0x100 := malloc()
3 skip
4 write(buf = v0x100)
5 skip
...


17

From dependences to scenarios

1. Pick a seed
2. Slice forwards
3. Slice backwards
4. Chop each pair


22

Extracting scenarios: example

Steps: pick a seed, slice forwards, slice backwards, chop each pair.

Vars: v7, v8

1 v7 := socket()
2 v8 := accept(sock = v7)
3 write(sock = v8)
4 v8 := close(sock = v8)
5 v7 := close(sock = v7)
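The four steps can be sketched on the example above. Two points are assumptions of this sketch rather than statements from the slides: two events are treated as dependent when they share a variable and at least one of them defines it, and the chop of a pair (b, f) is taken to be the events that lie on a dependence path from b to f. The names `dependences`, `reach`, and `scenario` are hypothetical.

```python
def dependences(events):
    """events[i] = (defs, uses). Edge (i, j), i < j, when the two events
    share a variable and at least one of them defines it (assumption)."""
    return {
        (i, j)
        for j, (dj, uj) in enumerate(events)
        for i, (di, ui) in enumerate(events[:j])
        if di & (dj | uj) or ui & dj
    }


def reach(start, edges, forwards=True):
    """Forward or backward slice: every event reachable from `start`."""
    seen, work = {start}, [start]
    while work:
        node = work.pop()
        for a, b in edges:
            src, dst = (a, b) if forwards else (b, a)
            if src == node and dst not in seen:
                seen.add(dst)
                work.append(dst)
    return seen


def scenario(events, seed):
    """Slice forwards and backwards from the seed, then chop each pair."""
    edges = dependences(events)
    fwd = reach(seed, edges, True)
    bwd = reach(seed, edges, False)
    result = fwd | bwd
    for b in bwd:
        for f in fwd:
            result |= reach(b, edges, True) & reach(f, edges, False)
    return sorted(result)


# The slide's semantic trace (0-based), plus one unrelated socket() call.
EVENTS = [
    ({"v7"}, set()),    # v7 := socket()
    ({"v8"}, {"v7"}),   # v8 := accept(sock = v7)
    (set(), {"v8"}),    # write(sock = v8)
    ({"v8"}, {"v8"}),   # v8 := close(sock = v8)
    ({"v7"}, {"v7"}),   # v7 := close(sock = v7)
    ({"v9"}, set()),    # v9 := socket(), unrelated to the seed
]
print(scenario(EVENTS, seed=1))
```

With the accept call as seed, the sketch collects the five related events and leaves out the unrelated socket() call.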


28

Overview of learning

[Figure: scenarios (dep. graphs) -> standardize -> strings (e.g., ACEGB) -> learn NFA -> NFA]

Scenario acceptor (SA): scenario -> standardize -> string -> NFA -> yes or no

Specification = (SA, STM, scenario criterion)

29

Standardizing scenarios

Two scenarios as extracted:

Scenario 1:
v7 := socket()
v8 := accept(sock = v7)
write(sock = v8)
read(sock = v8)
v8 := close(sock = v8)
v7 := close(sock = v7)

Scenario 2:
v2 := socket()
v3 := accept(sock = v2)
v2 := close(sock = v2)
read(sock = v3)
write(sock = v3)
v3 := close(sock = v3)

After putting the events in a standard order, scenario 1 becomes:

v7 := socket()
v8 := accept(sock = v7)
v7 := close(sock = v7)
read(sock = v8)
write(sock = v8)
v8 := close(sock = v8)

After renaming variables, both scenarios become:

X := socket()
Y := accept(sock = X)
X := close(sock = X)
read(sock = Y)
write(sock = Y)
Y := close(sock = Y)

Standard scenario = string

A  X := socket()
B  Y := accept(sock = X)
C  X := close(sock = X)
D  read(sock = Y)
E  write(sock = Y)
F  Y := close(sock = Y)
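The renaming and string-encoding steps can be sketched as follows. This is a simplified illustration: it assumes the events are already in standard order (the reordering step is not implemented), it renames variables by order of first appearance, and the helper names `rename` and `encode` are hypothetical.

```python
import re
from string import ascii_uppercase


def rename(scenario, names="XYZ"):
    """Rename trace variables (v7, v8, ...) by order of first appearance.
    Supports at most len(names) variables in this sketch."""
    mapping = {}

    def sub(match):
        var = match.group(0)
        if var not in mapping:
            mapping[var] = names[len(mapping)]
        return mapping[var]

    return [re.sub(r"\bv\w+", sub, event) for event in scenario]


def encode(scenarios):
    """Map each distinct renamed event to a letter; one string per scenario."""
    letters = {}
    strings = []
    for scen in scenarios:
        word = ""
        for event in rename(scen):
            if event not in letters:
                letters[event] = ascii_uppercase[len(letters)]
            word += letters[event]
        strings.append(word)
    return strings, letters


S1 = ["v7 := socket()", "v8 := accept(sock = v7)", "v7 := close(sock = v7)",
      "read(sock = v8)", "write(sock = v8)", "v8 := close(sock = v8)"]
S2 = ["v2 := socket()", "v3 := accept(sock = v2)", "v2 := close(sock = v2)",
      "read(sock = v3)", "write(sock = v3)", "v3 := close(sock = v3)"]
strings, letters = encode([S1, S2])
print(strings)
```

Both scenarios map to the same string, which is what lets the learner treat them as one example.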


34

Raman’s PFSA algorithm

1. Build a weighted retrieval tree
2. Merge similar states

[Figure sequence: a weighted retrieval tree over socket, accept, read, and write calls, with edge weights such as 100, 95, 70, 25, and 5, is reduced step by step by merging similar states until a small automaton ending in "end" remains.]
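Step 1 of the algorithm, building a weighted retrieval tree, can be sketched as a trie whose edges count how many input strings pass through them. The sketch covers step 1 only; the state-merging criterion is not given on the slides and is not reproduced here. The name `build_tree` and the dictionary layout are assumptions for illustration.

```python
def build_tree(strings):
    """Return a trie as {state: {symbol: [next_state, count]}}.
    State 0 is the root; count is the number of strings using the edge."""
    tree = {0: {}}
    for word in strings:
        state = 0
        for symbol in word:
            edge = tree[state].get(symbol)
            if edge is None:
                # Allocate a fresh state for a prefix not seen before.
                edge = tree[state][symbol] = [len(tree), 0]
                tree[edge[0]] = {}
            edge[1] += 1
            state = edge[0]
    return tree


tree = build_tree(["AB", "AB", "AB", "AC"])
for state, edges in tree.items():
    print(state, edges)
```

The counts play the role of the edge weights in the figure: frequent paths get large weights, and rare ones stand out as candidates for closer inspection.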


41

Overview of Debugging

[Figure: the specification and its scenarios go to Cable, which yields classified scenarios; the Learner turns the classified scenarios into the debugged specification.]

42

Classifying scenarios

[Figure sequence: scenarios labeled b0, b1, b2, g0, g1, and G0 are progressively separated into classes.]

45

Concept analysis: mammals

            4-legged  hairy  smart  marine  thumbed
cats        yes       yes    -      -       -
dogs        yes       yes    -      -       -
dolphins    -         -      yes    yes     -
gibbons     -         yes    yes    -       yes
humans      -         -      yes    -       yes
whales      -         -      yes    yes     -

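Concept analysis groups objects by the attributes they share. As a minimal sketch, assuming the standard reading of the mammals table (the cell placement is reconstructed, and the function name `concepts` is hypothetical), all concepts can be enumerated by closing the attribute sets under intersection:

```python
# Object -> attributes, following the classic mammals example.
MAMMALS = {
    "cats": {"4-legged", "hairy"},
    "dogs": {"4-legged", "hairy"},
    "dolphins": {"smart", "marine"},
    "gibbons": {"hairy", "smart", "thumbed"},
    "humans": {"smart", "thumbed"},
    "whales": {"smart", "marine"},
}


def concepts(context):
    """Return {intent: extent} for every concept of the context.
    Intents are all intersections of object attribute sets, plus the
    full attribute set; the extent is every object having the intent."""
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {frozenset(attrs) & intent for intent in intents}
    return {
        intent: frozenset(o for o, a in context.items() if intent <= a)
        for intent in intents
    }


for intent, extent in sorted(concepts(MAMMALS).items(), key=lambda kv: len(kv[0])):
    print(sorted(intent), "->", sorted(extent))
```

This brute-force closure is fine for a six-row table; larger contexts would call for an incremental lattice-construction algorithm.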

47

Cable method: good pets

1. Start at the top concept {}: cats, gibbons, dogs, dolphins, humans, whales
2. Move to the concept {hairy}: cats, gibbons, dogs
3. Move to the concept {4-legged, hairy}: cats, dogs
4. Label that concept good. Good pets: cats, dogs
5. Move to the concept {smart}: gibbons, dolphins, humans, whales
6. Label that concept bad. Bad pets: gibbons, dolphins, humans, whales
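The labeling step of the pets walk-through can be sketched as follows, assuming that labeling a concept assigns the label to every still-unlabeled object in its extent. That propagation rule and the name `label_concept` are assumptions of this sketch, not a description of Cable's interface.

```python
PETS = {
    "cats": {"4-legged", "hairy"},
    "dogs": {"4-legged", "hairy"},
    "dolphins": {"smart", "marine"},
    "gibbons": {"hairy", "smart", "thumbed"},
    "humans": {"smart", "thumbed"},
    "whales": {"smart", "marine"},
}


def label_concept(context, intent, label, labels):
    """Give `label` to each unlabeled object that has every attribute in `intent`."""
    for obj, attrs in context.items():
        if set(intent) <= attrs and obj not in labels:
            labels[obj] = label
    return labels


labels = {}
label_concept(PETS, {"4-legged", "hairy"}, "good", labels)
label_concept(PETS, {"smart"}, "bad", labels)
print(labels)
```

Labeling {hairy} as good would also have caught gibbons, which is why the walk-through descends to {4-legged, hairy} before labeling.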

54

Concept analysis: scenarios

Attributes: takes transition 0, takes transition 1, takes transition 2, takes transition 3, takes transition 4

Scen. 0 yes yes

Scen. 1 yes yes

Scen. 2 yes yes

Scen. 3 yes yes yes

Scen. 4 yes yes

Scen. 5 yes yes

55

Cable method: good scenarios

1. Start at the top concept {}: scen. 0, scen. 1, scen. 2, scen. 3, scen. 4, scen. 5
2. Move to the concept {takes transition 1}: scen. 0, scen. 1, scen. 2
3. Move to the concept {takes transition 0, takes transition 1}: scen. 0, scen. 2
4. Label that concept good. Good scenarios: 0, 2
5. Move to the remaining scenarios, shown with takes transition 2, takes transition 3, and takes transition 4: scen. 1, scen. 3, scen. 4, scen. 5
6. Label them bad. Bad scenarios: 1, 3, 4, 5

62

Experimental results

• Where to find bugs?
  • in programs (static verification)?
  • or in traces (dynamic verification)?
• Specifications and bugs

63

How Strauss verifies a spec

[Figure: traces -> extract scenarios -> scenarios (dep. graphs) -> scenario acceptor -> yes or no]

Traces:
...socket(domain = 2, type = 1, proto = 0, return = 7)...

Positive specifications: accept a trace iff every seed has a scenario that the SA accepts.

Negative specifications: accept a trace iff every seed has no scenario that the SA accepts.
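The two acceptance rules translate directly into code. In this sketch the scenario acceptor is any function from a scenario string to a boolean; the name `verify` and the dictionary from seeds to scenarios are assumptions for illustration.

```python
def verify(scenarios_by_seed, sa_accepts, positive=True):
    """Check one trace. `scenarios_by_seed` maps each seed to the scenarios
    extracted for it; `sa_accepts` is the scenario acceptor.
    Positive spec: every seed needs at least one accepted scenario.
    Negative spec: no seed may have an accepted scenario."""
    for scenarios in scenarios_by_seed.values():
        some_accepted = any(sa_accepts(s) for s in scenarios)
        if some_accepted != positive:
            return False
    return True


def toy_sa(scenario):
    """Toy acceptor, standing in for the learned NFA."""
    return scenario == "ABF"


print(verify({"seed 1": ["ABF", "AB"], "seed 2": ["ABF"]}, toy_sa))
print(verify({"seed 1": ["AB"]}, toy_sa, positive=False))
```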

64

Experimental results

• 90 traces
• 72 programs
• 7 “book” rules
• 10 “graph” rules

Rule             STM  States  Edges  Bugs
PrsAccelTbl      9    3       10     1
PrsTransTbl      5    3       4      0
Quarks           4    63      93     0
RegionsAlloc     4    8       14     16
RegionsBig       33   370     604    12
RmvTimeOut       3    3       3      0
XFreeGC          3    10      13     44
XGContextFromGC  4    22      36     44
XGetSelOwner     3    5       6      9
XInternAtom      17   3       11     42
XPutImage        9    3       7      2+2
XSaveContext     8    989     1519   33
XSetFont         4    20      37     44+1
XSetSelOwner     16   3       8      7
XtFree           4    95      171    45
XtOwnSel         16   4       23     1
XtRealizeProc    4    22      36     0

65

Bugs found

1. X11 selection protocol specs.:
   • Found 17 bugs total.
   • English spec is buggy!
2. Performance: XInternAtom should not be called in the event loop
   • 42 buggy programs (out of 72); degree varied
3. XPutImage spec. (more on this later)
   • 2 different bugs! But, also 2 false positives.
4. Other bugs:
   • 138 total (resource errors)

66

Cable was useful

Cost of labeling = concepts visited + concepts labeled

Specification    Base  Optimal  Expert
PrsAccelTbl      18    4        8
PrsTransTbl      6     2        2
Quarks           16    2        2
RegionsAlloc     20    8        16
RegionsBig       540   X        149
RmvTimeOut       4     2        2
XFreeGC          20    6        12
XGContextFromGC  50    14       24
XGetSelOwner     4     4        5
XInternAtom      20    4        5
XPutImage        12    6        10
XSaveContext     184   X        42
XSetFont         56    X        53
XSetSelOwner     12    4        5
XtFree           224   X        28
XtOwnSel         14    4        5
XtRealizeProc    76    12       13

67

XSetSelectionOwner

English: Pass XSetSelectionOwner the timestamp from the last event.

For all calls XSetSelectionOwner(time = X):

[Figure: automaton over the calls below]

(event.time = X) := XNextEvent()
XFilterEvent(event.time = X)
XtDispatchEvent(event.time = X)
(event.time = X) := XCheckWindowEvent()
cb_XtActionProc(event.time = X)

XSetSelectionOwner(time = X)

68

XInternAtom

English: Don’t call XInternAtom in the event loop.

For no calls XInternAtom(display = X):

[Figure: automaton over the calls below]

cb_XtEventHandler(event.display = X)
cb_XtActionProc(event.display = X)
(event.display = X) := XtAppNextEvent()
(event.display = X) := XNextEvent()
XFilterEvent(event.display = X)
XtCallActionProc(event.display = X)
(event.display = X) := XWindowEvent()
XtDispatchEvent(event.display = X)
(event.display = X) := XCheckWindowEvent()

XInternAtom(display = X)

69

XPutImage

English: The image and GC passed to XPutImage must have been created on the same display.

For all calls XPutImage(display = X, image = Y, gc = Z):

Z := XCreateGC(display = X)
Y := XCreateImage(display = X)
Y := XShmCreateImage(display = X)
XPutImage(display = X, image = Y, gc = Z)
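A rule like this can be checked directly on a trace. The sketch below is a hand-written checker for this one rule, not the mined automaton: the trace encoding as `(return value, call name, arguments)` tuples, the attribute names, and the function name `check_xputimage` are assumptions for illustration.

```python
# Calls that create an object, and the kind of object they create.
CREATORS = {"XCreateGC": "gc", "XCreateImage": "image", "XShmCreateImage": "image"}


def check_xputimage(trace):
    """trace: list of (ret, name, args). Return the indices of XPutImage
    calls whose image or GC was not created on the call's display."""
    made_on = {}  # (kind, object) -> display the object was created on
    bad = []
    for i, (ret, name, args) in enumerate(trace):
        if name in CREATORS:
            made_on[(CREATORS[name], ret)] = args["display"]
        elif name == "XPutImage":
            display = args["display"]
            if (made_on.get(("image", args["image"])) != display
                    or made_on.get(("gc", args["gc"])) != display):
                bad.append(i)
    return bad


TRACE_X = [
    ("gc1", "XCreateGC", {"display": "d1"}),
    ("img1", "XCreateImage", {"display": "d1"}),
    ("gc2", "XCreateGC", {"display": "d2"}),
    (None, "XPutImage", {"display": "d1", "image": "img1", "gc": "gc1"}),
    (None, "XPutImage", {"display": "d1", "image": "img1", "gc": "gc2"}),
]
print(check_xputimage(TRACE_X))
```

The second XPutImage call is flagged because its GC was created on a different display.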

70

Conclusions

• Strauss is
  • local
  • scalable
  • dynamic
• Strauss finds
  • real specs.
  • real bugs
• Does Strauss
  • generalize well?
  • work for others?
  • do too much or too little?

71

Suggestions for future work

• Increasing automation
  • Exploiting patterns
  • Exploiting labels
• Beyond mining from traces
  • Static; hybrids (profile-based)
• More expressive models
  • Structured heaps; arithmetic properties
• Other domains
  • Program understanding; testing

72

My publications

• On Strauss
  • Debugging temporal specifications with concept analysis. With Rastislav Bodik, James R. Larus, and David Mandelin. PLDI03.
  • Mining specifications. With Rastislav Bodik and James R. Larus. POPL02.
• Path profiling
  • Improving data-flow analysis with path profiles. With James R. Larus. PLDI98. Selected for 20 Years of PLDI (1979-1999).
  • Exploiting hardware performance counters with flow and context sensitive profiling. With Tom Ball and James R. Larus. PLDI97.

73

End of talk


75

Overview of Strauss

[Figure: trace programs (with PDGs) -> traces and STM -> apply STM -> extract scenarios (using a seed pattern) -> scenarios -> classify and learn -> specification]

Specification: automaton from start to end over the calls

X := socket()
Y := accept(so = X)
read(so = Y)
write(so = Y)
close(so = Y)
close(so = X)

