Exploring the Solar System: 23 spacecraft and 10 science instruments
[mission names: OPPORTUNITY, SPITZER, EPOXI/DEEP IMPACT, MRO, CLOUDSAT, DAWN, JASON 2, KEPLER, WISE, AQUARIUS, JUNO, GRAIL, MSL, GRACE, JASON 1, MARS ODYSSEY, ACRIMSAT, STARDUST, CASSINI, VOYAGER 1, VOYAGER 2, GALEX]
INSTRUMENTS
Earth Science
• ASTER
• MISR
• TES
• MLS
• AIRS
Planetary
• MIRO
• Diviner
• MARSIS
Astrophysics
• Herschel
• Planck
model checking
static analysis
formal software verification
• after some ~30 years of development, it is still rarely used on industrial software
– primary reasons:
• it is (perceived to be) too difficult
• it takes too long (months to years)
• even in safety-critical applications, software verification is often restricted to the verification of models of the software, instead of the software itself
• goal:
– make software verification as simple as testing and as fast as compilation
verification of the PathStar switch (flashback to 1999)
• a commercial data/phone switch designed in Bell Labs research (for Lucent Technologies)
• newly written code for the core call processing engine
• the first commercial call processing code that was formally verified
– 20 versions of the code were verified with model checking during development 1998-2000, with a fully automated procedure
traditional hurdles:
feature interaction
feature breakage
concurrency problems
race conditions
deadlock scenarios
non-compliance with legal requirements, etc.
software structure
PathStar code (C): basic call processing, plus a long list of features (call waiting, call forwarding, three-way calling, etc., etc.)
the call processing control kernel is ~30 KLOC, about ~10% of the PathStar code
complex feature precedence relations
detecting undesired feature interaction is a serious problem
the verification was automated in 5 steps:
1. extract the call processing code from PathStar
2. define an abstraction map
3. define the context (environment model)
4. define the feature requirements
5. run Spin; property violations are returned as bug reports (message sequence charts)
1. code conversion...
@dial:  switch(op) {
        default:        /* unexpected input */
                goto error;
        case Crdtmf:    /* digit collector ready */
                x->drv->progress(x, Tdial);
                time = MSEC(16000);     /* set timer */
@:      switch(op) {
        default:        /* unexpected input */
                goto error;
        case Crconn:
                goto B@1b;
        case Cronhook:  /* caller hangs up */
                x->drv->disconnect(x);
@:      if(op != Crconn && op != Crdis)
                goto Aidle;
        ...etc...
implied state machine:
[diagram: control states dial, dial1, dial2, and error; state changes labeled with the inputs Crdtmf, Crconn, Cronhook, and else]
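The implied state machine can be written down as a transition table. This is only an illustrative encoding: the state and event names follow the code fragment above, "B1b" stands in for the B@1b target label, and "cont" marks the fall-through at dial2 whose target is elided in the fragment.

```python
# Hypothetical encoding of the state machine implied by the code fragment
# above: control states are keys, input events label the transitions, and
# "else" is the default transition.
FSM = {
    "dial":  {"Crdtmf": "dial1", "else": "error"},
    "dial1": {"Crconn": "B1b", "Cronhook": "dial2", "else": "error"},
    # at dial2, Crconn/Crdis fall through (target elided in the fragment)
    "dial2": {"Crconn": "cont", "Crdis": "cont", "else": "Aidle"},
}

def run(state, events):
    """Follow the transitions for a sequence of input events."""
    for ev in events:
        table = FSM.get(state, {})
        state = table.get(ev, table.get("else", state))
    return state
```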
2. defining abstractions
• to verify the code we convert it into an automaton: a labeled transition system
– the labels (transitions) are the basic statements from C
• each statement is converted via an abstraction, encoded as a lookup table that supports three possible conversions:
– relevant: keep (~60%)
– partially relevant: map/abstract (~10%)
– irrelevant to the requirements: hide (~30%)
• the generated transition system is then checked against the formal requirements (in linear temporal logic) with the model checker
– the program and the negated requirements are converted into ω-automata, and the model checker computes their intersection
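A minimal sketch of such an abstraction lookup table. The statement texts, the `set_timer()` label, and the `log_billing_record` entry are illustrative; the actual PathStar table is not shown in the slides.

```python
# Sketch of an abstraction lookup table: each C statement is classified as
# relevant (keep verbatim), partially relevant (map to an abstract action),
# or irrelevant to the requirements (hide).
ABSTRACTION = {
    # relevant: keep the statement verbatim (~60%)
    "x->drv->progress(x, Tdial)": ("keep", "x->drv->progress(x, Tdial)"),
    # partially relevant: map to an abstract action (~10%)
    "time = MSEC(16000)":         ("map", "set_timer()"),
    # irrelevant to the requirements: hide (~30%)
    "log_billing_record(x)":      ("hide", None),
}

def abstract(stmt):
    """Return the transition label for a C statement, or None to hide it."""
    kind, label = ABSTRACTION.get(stmt, ("keep", stmt))
    return label if kind != "hide" else None
```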
3. defining the context
[diagram: the C code, the abstraction map, and an environment model feed model extraction; the resulting Spin model is checked against a database of feature requirements, with violations sent to bug reporting]
4. defining feature requirements

a sample property:
"always when the subscriber goes offhook, a dialtone is generated"

failure to satisfy this requirement, in Linear Temporal Logic, is written:
    <> (offhook /\ X (!dialtone U onhook))
read: <> eventually the subscriber goes offhook, /\ and X thereafter no dialtone is generated U until the next onhook

the formula is mechanically converted into an ω-automaton
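The violation pattern can also be read as a check over a finite event trace. The checker below only mirrors the informal reading above (it is not how Spin works; Spin builds ω-automata and computes their intersection):

```python
# Hypothetical trace checker for the violation pattern
#   <> (offhook /\ X (!dialtone U onhook))
# i.e. eventually the subscriber goes offhook, and no dialtone is
# generated before the next onhook.
def violates(trace):
    for i, ev in enumerate(trace):
        if ev != "offhook":
            continue
        for later in trace[i + 1:]:
            if later == "dialtone":
                break          # dialtone arrived in time: this offhook is fine
            if later == "onhook":
                return True    # onhook reached with no dialtone: violation
    return False
```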
5. verification
[diagram: the LTL requirement is logically negated; the C code, the abstraction map, and the environment model feed the model extractor; the model checker intersects the extracted model with the negated requirement and reports each violation (e.g., "no dialtone generated") as a bug report]
hardware support (1999)
client/server sockets code
scripts
iterative search refinement
each verification task is run multiple times, with increasing accuracy: t=5 min., t=15 min., t=40 min.
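The refinement loop can be sketched as a simple schedule. `checker` is a hypothetical stand-in for one model-checking run limited to a time budget; the default budgets follow the slide (5, 15, 40 minutes).

```python
# Sketch of iterative search refinement: every verification task is run
# repeatedly with a growing time budget (and hence search accuracy), so
# the cheap, shallow runs report the easy bugs first.
def refine(tasks, checker, budgets=(5, 15, 40)):
    reports = []
    for budget in budgets:          # cheapest, least accurate runs first
        for task in tasks:
            reports.extend(checker(task, budget))
    return reports
```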
performance (1999): "bugs per minute"
[chart: percent of bugs reported (25/50/75/100, i.e. 15/30/50/60 bugs) vs. minutes since start of check (10/20/30/40)]
first bug report in 2 minutes
15 bug reports in 3 minutes: 25% of all bugs
50% of all bugs reported in 8 minutes
that was 1999, can we do better now?
1999:
• 16 networked computers running the plan9 operating system
– 500 MHz clockspeed (~8 GHz equiv)
– 16x128 Mbyte of RAM (~2 GB equiv)
2012:
• 32-core off-the-shelf system running standard Ubuntu Linux (~$4K USD)
– 2.5 GHz clockspeed (~80 GHz equiv)
– 64 Gbyte of shared RAM
difference: approx. 10x faster, and 32x more RAM
does this change the usefulness of the approach?
performance in 2012: "bugs per second"
[chart: number of bugs found vs. number of seconds since start of check; the 1999 setup (16 PCs, 128 MB per PC, 500 MHz, Plan9 OS) takes ~10 min, the 2012 setup (32-core PC, 64 GB RAM, 2.5 GHz per core, Ubuntu Linux) ~10 seconds]
11 bug reports after 1 second
50% of all bugs in 7 seconds (38 bugs)
side-by-side comparison
1999 (16 CPU networked system):
• 25% of all bugs reported in 180 seconds (15 bugs)
• 50% of all bugs reported in 480 seconds (30 bugs)
2012 (1 desktop PC):
• 15% of all bugs reported in 1 second (11 bugs)
• 50% of all bugs reported in 7 seconds (38 bugs)
generalization: swarm verification
• goal: leverage the availability of large numbers of CPUs and/or CPU cores
– if an application is too large to verify exhaustively, we can define a swarm of verifiers that each tries to check a randomly different part of the code
• using different hash polynomials in bitstate hashing, and different numbers of polynomials
• using different search algorithms and/or search orders
– use iterative search refinement to dramatically speed up error reporting ("bugs per second")
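The diversification can be sketched as job generation. The -D flags are the Spin compile-time search modes listed in the swarm configuration file later in the deck; the round-robin assignment scheme here is only illustrative.

```python
# Sketch of how a swarm diversifies its members: each verifier gets a
# different search mode and a different number k of bitstate hash
# functions, so each explores a different part of the state space.
MODES = [
    "-DBITSTATE",                    # standard dfs
    "-DBITSTATE -DREVERSE",          # reversed process ordering
    "-DBITSTATE -DT_REVERSE",        # reversed transition ordering
    "-DBITSTATE -DP_RAND",           # randomized process ordering
    "-DBITSTATE -DT_RAND",           # randomized transition ordering
    "-DBITSTATE -DP_RAND -DT_RAND",  # both randomized
]

def swarm_jobs(n_cpus, k_range=(1, 4)):
    """Assign one (mode, k) pair to each of n_cpus verifiers."""
    lo, hi = k_range
    jobs = []
    for cpu in range(n_cpus):
        mode = MODES[cpu % len(MODES)]
        k = lo + cpu % (hi - lo + 1)   # nr of hash polynomials, lo..hi
        jobs.append((cpu, mode, k))
    return jobs
```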
swarm search
spin front-end: http://spinroot.com/swarm/

$ swarm -F config.lib -c6 > script
swarm: 456 runs, avg time per cpu 3599.2 sec
$ sh ./script

swarm configuration file:

# range
k 1 4           # min and max nr of hash functions

# limits
depth 10000     # max search depth
cpus 128        # nr available cpus
memory 64MB     # max memory to be used; recognizes MB,GB
time 1h         # max time to be used; h=hr, m=min, s=sec
vector 512      # bytes per state, used for estimates
speed 250000    # states per second processed
file model.pml  # the spin model

# compilation options (each line defines a search mode)
-DBITSTATE                    # standard dfs
-DBITSTATE -DREVERSE          # reversed process ordering
-DBITSTATE -DT_REVERSE        # reversed transition ordering
-DBITSTATE -DP_RAND           # randomized process ordering
-DBITSTATE -DT_RAND           # randomized transition ordering
-DBITSTATE -DP_RAND -DT_RAND  # both

# runtime options
-c1 -x -n
the user specifies:
1. # cpus to use
2. mem / cpu
3. maximum time

many small jobs do the work of one large job, but much faster and in less memory
[chart: states reached vs. #processes (log scale); swarm search compared with linear scaling]
100% coverage with a swarm of 100 using 0.06% of RAM each (8 MB), compared to a single exhaustive run (13 GB)
(DEOS O/S model)
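The memory claim on this slide can be checked with a little arithmetic (assuming 1 GB = 1024 MB):

```python
# Checking the memory figures from the DEOS example: a swarm of 100
# verifiers at 8 MB each, vs. one exhaustive run at 13 GB.
exhaustive_mb = 13 * 1024                 # 13 GB for the exhaustive run
per_member_mb = 8                         # bitstate RAM per swarm member
fraction = per_member_mb / exhaustive_mb  # ~0.0006, i.e. ~0.06% of the RAM
total_swarm_mb = 100 * per_member_mb      # 800 MB for the whole swarm
```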
tools mentioned:
• Spin: http://spinroot.com (1989)
  o parallel depth-first search added 2007
  o parallel breadth-first search added 2012
• Modex: http://spinroot.com/modex (1999)
• Swarm: http://spinroot.com/swarm (2008)
thank you!