Compiler Assisted Software Verification Using Plug-Ins Radu Grosu SUNY at Stony Brook

Joint work with

S. Callanan, X. Huang, S. A. Smolka and E. Zadok

System-Software

• Difficult to develop & maintain:

– Concurrent and distributed (OS, ES, middleware),

– Complicated by data structures that improve performance (locks, reference counts, ...),

– Mostly written in the C programming language.

• Has to be high-confidence:

– Provides the critical infrastructure for all applications,

– Failures are very costly (business, reputation),

– Has to protect against cyber-attacks.

What is High-Confidence?

Ability to guarantee that $S \models \varphi$: system-software S satisfies temporal-property φ.

• Safety: something bad never happens

• Liveness: something good eventually happens

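Checking for High-Confidence (in-principle)

• Every LTL formula $\varphi$ can be translated to an FSA over infinite executions $B_\varphi$ (a looping program) such that $L(\varphi) = L(B_\varphi)$.

• Automata-theoretic approach (infinite behaviors):

$S \models \varphi$ iff $L(B_S) \subseteq L(B_\varphi)$ iff $L(B_S \times B_{\neg\varphi}) = \emptyset$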

• Checking non-emptiness is equivalent to finding a reachable accepting cycle (lasso, faulty PHASE!).

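Checking for High-Confidence (in-principle)

[Figure: verification flow. The LTL property (LTL-P) is translated into a Büchi automaton (BA) and the system S into $B_S$; the Instrumenter builds the product $B_S \times B_{\neg\varphi}$; the Execution Engine explores it and reports either "all lassos non-accepting" or an accepting lasso L.]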

Checking for High-Confidence (in-practice)

• Combine static & runtime verification techniques:

– Abstract interpretation (sequential infinite-state programs),

– Model checking (concurrent finite-state programs),

– Runtime analysis (sequential program optimization).

• Make scalability a priority:

– Open source compiler technology has started to mature,

– Apply techniques to source code rather than models,

• Models can be obtained by abstraction-refinement techniques,

– Probabilistic techniques trade off precision against effort.

GCC Compiler

• Early stages: a modest C compiler.

- Translation: source code translated directly to RTL.

- Optimization: at low RTL level.

- High level information lost: calls, structures, fields, etc.

• Nowadays: a full-blown, multi-language compiler generating code for more than 30 architectures.

- Input: C, C++, Objective-C, Fortran, Java and Ada.

- Tree-SSA: added GENERIC, GIMPLE and SSA ILs.

- Optimization: at GENERIC, GIMPLE, SSA and RTL levels.

- Verification: Tree-SSA API suitable for verification, too.

GCC Compilation Process

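[Figure: GCC compilation pipeline. C, C++ and Java files pass through their respective parsers to a parse tree; Genericize produces the GENERIC AST; Gimplify produces the GIMPLE AST; Build CFG produces the SSA/GIMPLE CFG; the rest of compilation produces RTL code; code generation produces object code.]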

GCC Compilation Process

[Figure: the same compilation pipeline, extended with a Plug-In that connects to GCC through an API.]

Plug-In Support

GCC & Builder modified to load plug-ins that:

• Analyze or modify the GCC representation,

• Have access to the internal APIs of GCC,

• Are developed independently of GCC,

• Require no GCC recompilation.

C Program and its GIMPLE IL

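Source C program:

    int main() {
        int a, b, c;
        a = 5;
        b = a + 10;
        c = a + foo(a, b);
        if (a > c)
            c = b++ / a + b * a;
        bar(a, b, c);
    }

Gimplify

GIMPLE IL (temporaries T1 to T4 introduced by gimplification):

    int main() {
        int a, b, c;
        int T1, T2, T3, T4;
        a = 5;
        b = a + 10;
        T1 = foo(a, b);
        T2 = a + T1;
        c = T2;
        if (a <= T2) goto fi;   /* skip the body unless a > c */
        T3 = b / a;
        T4 = b * a;
        c = T3 + T4;
        b = b + 1;
    fi:
        bar(a, b, c);
    }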

Associated GIMPLE CFG

[Figure: CFG of main with Entry and Exit nodes and three basic blocks A, B and C; each statement is shown with its expression tree, and the FUNCTION DECL node lists the int variables a, b, c, T1, T2, T3, T4.]

Block A: a = 5; b = a + 10; T1 = foo(a,b); T2 = a + T1; c = T2; if (a > T2) goto B; (true: B, false: C)

Block B: T3 = b / a; T4 = b * a; c = T3 + T4; b = b + 1;

Block C: bar(a,b,c); return;

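Checking for High-Confidence (in-practice)

[Figure: tool flow inside GCC. Gimplify turns the source S into the CFG $B_S$; Instrument combines it with the LTL property (LTL-P) into the product CFG $B_S \times B_{\neg\varphi}$; the product is either checked statically by the GAM verifier, or passed through the rest of GCC compilation and the linker and checked at runtime (Dispatcher, HWM).]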

GSRV Platform

• GSRV suite:

– Static and runtime verification tools we are developing for GCC.

• General purpose (plug-ins):

– Verbose-dump: recursively traverses and prints the CFG,

– Intra/inter-procedural slicer: in progress,

– Code instrumenter: constructs the product machine.

• Static verification tools (plug-ins):

– Symbolic (BDD) execution engine: for boolean C-programs,

– GAM: CFG-GIMPLE abstract machine,

– Monte Carlo MC: statistical algorithm for LTL-MC.

• Runtime verification tools (static libraries):

– Dispatcher: catches and dispatches events to RV,

– Monte Carlo RV: statistical algorithm for LTL-RV.

Instrumentation Plug-Ins

• Ref-Counts: detects misuse of reference counts

– Instruments: inc(rc), dec(rc),

– Checks: st-inv (rc ≥ 0), tr-inv (|rc′ − rc| = 1), leak-inv (rc > 0 ~> rc = 0),

– Maintains: a list of reference counts and their container type.

• Malloc: detects allocation bugs at runtime

– Instruments: malloc() and free() function calls,

– Checks sequences: free()free(), $free() and malloc()$,

– Maintains: a list of existing allocations.

Instrumentation Plug-Ins

• Bounds: checks for invalid memory access

– Instruments: malloc(), free() and f(a),

– Checks: accesses to non-allocated areas,

– Maintains: heap, stack and text allocations,

– Higher accuracy than ElectricFence-like libraries.

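Monte Carlo Approach

[Figure: the LTL property and the computation tree (CT) whose branches are lassos, annotated with the recurrence diameter; each branch is chosen by flipping a k-sided coin.]

Explore N(ε, δ) independent lassos in the CT, where ε is the error margin and δ the confidence ratio.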

• Lasso sampling reduces overhead:

- Static verification: Reduces the space overhead

- Runtime verification: Dynamically adjusts sampling rate

• Lasso sampling weakened for RV:

- Reference counts: From zero up and back to zero.

Taking N(ε, δ) Independent Lassos (error margin ε and confidence ratio δ)

Geometric Random Variable

• Value of geometric RV X with parameter p:

– Number of independent samples until the first success.

• Probability mass function, with $q = 1 - p$:

– $p(N) = P[X = N] = q^{N-1}\,p$

• Cumulative distribution function:

– $F(N) = P[X \le N] = \sum_{i \le N} p(i) = 1 - q^N = 1 - (1-p)^N$

How Many Lassos?

• Requiring $1 - (1-p)^N = 1 - \delta$ yields:

$N = \ln(\delta) / \ln(1-p)$

• Lower bound on number of trials N needed to achieve success with confidence ratio δ.
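For reference, the bound follows from the cumulative distribution function in three steps; the numeric values at the end are an illustrative choice, not taken from the talk.

```latex
\begin{align*}
1 - (1-p)^N &= 1 - \delta \\
(1-p)^N &= \delta \\
N \,\ln(1-p) &= \ln \delta \\
N &= \frac{\ln \delta}{\ln(1-p)}
\end{align*}
% Example: p = 0.1, \delta = 0.1 gives
% N = \ln(0.1) / \ln(0.9) \approx 21.85, so 22 trials suffice.
```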

What If p Unknown?

• Requiring $p \ge \varepsilon$ yields:

$M = \ln(\delta) / \ln(1-\varepsilon) \;\ge\; N = \ln(\delta) / \ln(1-p)$

and therefore $P[X \le M] \ge 1 - \delta$

• Lower bound on number of trials M needed to achieve success with confidence ratio δ and error margin ε.

Statistical Hypothesis Testing

• Null hypothesis $H_0$: $p \ge \varepsilon$

• Alternative hypothesis $H_1$: $p < \varepsilon$

• If no success after M trials, then reject $H_0$

– In RV: adjust sampling rate.

• Type I error: $\alpha = P[X > M \mid H_0] \le \delta$

• Since: $P[X \le M \mid H_0] \ge 1 - \delta$

Model Checking Results

• TCAS:

– Safe/best/optimal advisory selection,

– No/avoid-unnecessary crossing.

• Dining Philosophers:

– (Un)Symmetric and (Un)Fair versions

• Needham-Schroeder Protocol:

– Quite sophisticated C implementation.

Runtime Verification (Reference Counts)

• Check Linux file system cache objects

– inodes: on-disk files

– dentries: namespace nodes

• Optionally, log all events

• Simple per-category sampling policy

– Initially: sample all objects

– Hypothesize: $\varepsilon > 10^{-5}$ and $\delta = 10^{-5}$

– Stop sampling: if hypothesis is false.

RV of RC: Results

[Plot: time (seconds, 0 to 120) against run number (0 to 25). Annotated overheads: logging ~10x; ~3x; 1.33x.]

Results

[Plot: time (seconds, 0 to 120) against run number (0 to 25). Annotated overheads: checking ~2x; 1.33x; 1.1x.]

Ongoing and Future Work

• Static Verification: open source software MC for GCC

– Abstraction/refinement/interpolation techniques,

– Directed MC combined with Monte-Carlo MC:

• Linked GAM with CVC Lite.

• Runtime Verification: open source software RV for GCC

– Develop: new plug-ins & a property (monitoring) language

– Explore: novel sampling techniques, e.g. based on phases

– Apply: Monte Carlo Decision Processes for optimal sampling.

Ongoing Instrumentation Plug-Ins

• CFG-duplicator: replicates each function’s CFG

– Splits each basic block into two parts:

• Uninstrumented block: no change (except labels)

• Instrumented block: instrumentation applied

– Inserts selectors (if statements) before each pair

– Block instrumentation can be toggled at run-time

• Multi-core: moves checking code into a separate thread

– Puts relevant information into a shared buffer

– Shadow thread reads and parses information

– Low latency: 65 cycles between cores on 1.65GHz Power5

Future Instrumentation Plug-Ins

• FE-tracer: records function calls and parameters

– Can be easily applied to both user and kernel code

– Provides valuable trace information to guide debugging

• DS-access-logger: records what data went where

– Faster than trap-based methods: no context switches

– We can exploit type information to provide visual representations of data structures and their links

• Thread-DL-detector: detects circular dependencies

– Extracts the loop conditions for each loop

– Finds variables that would be written if the loop exited

– If two threads are blocking on each other, flags a deadlock
