Compiler Assisted Software Verification
Using Plug-Ins
Radu Grosu SUNY at Stony Brook
Joint work with
S. Callanan, X. Huang, S. A. Smolka and E. Zadok
System-Software
• Difficult to develop & maintain:
– Concurrent and distributed (OS, ES, middleware),
– Complicated by data structures that improve performance (locks, reference counts, ...),
– Mostly written in the C programming language.
• Has to be high-confidence:
– Provides the critical infrastructure for all applications,
– Failures are very costly (business, reputation),
– Has to protect against cyber-attacks.
What is High-Confidence?
Ability to guarantee that $S \models \varphi$, i.e., system software S satisfies temporal property φ:
• Safety: something bad never happens
• Liveness: something good eventually happens
Checking for High-Confidence (in principle)
• Every LTL formula $\varphi$ can be translated to a Büchi automaton $B_\varphi$, an FSA over infinite executions (a looping program), such that $L(\varphi) = L(B_\varphi)$.
• Automata-theoretic approach (infinite behaviors):
$S \models \varphi$ iff $L(B_S) \subseteq L(B_\varphi)$ iff $L(B_S \times B_{\neg\varphi}) = \emptyset$
• Checking non-emptiness is equivalent to finding a reachable accepting cycle (lasso, faulty PHASE!).
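The search for a reachable accepting cycle can be illustrated with the classic nested depth-first search. This is a minimal sketch of a standard technique, not necessarily the engine used in this work. The `Automaton` type, its adjacency-matrix layout and the `MAX_STATES` bound are assumptions made for the example.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_STATES 8   /* assumed bound for this toy example */

/* Hypothetical explicit-state product automaton. */
typedef struct {
    int  n;                              /* number of states            */
    bool edge[MAX_STATES][MAX_STATES];   /* edge[u][v]: transition u->v */
    bool accepting[MAX_STATES];          /* Buchi-accepting states      */
} Automaton;

static bool outer_seen[MAX_STATES];
static bool inner_seen[MAX_STATES];

/* Inner search: look for a path from s back to the accepting seed. */
static bool inner_dfs(const Automaton *a, int s, int seed) {
    inner_seen[s] = true;
    for (int t = 0; t < a->n; t++) {
        if (!a->edge[s][t]) continue;
        if (t == seed) return true;      /* closed a cycle through seed */
        if (!inner_seen[t] && inner_dfs(a, t, seed)) return true;
    }
    return false;
}

/* Outer search: visit reachable states; in post-order, start an inner
   search from every accepting state. */
static bool outer_dfs(const Automaton *a, int s) {
    outer_seen[s] = true;
    for (int t = 0; t < a->n; t++)
        if (a->edge[s][t] && !outer_seen[t] && outer_dfs(a, t))
            return true;
    return a->accepting[s] && inner_dfs(a, s, s);
}

/* True iff an accepting lasso is reachable from init. */
bool has_accepting_lasso(const Automaton *a, int init) {
    memset(outer_seen, 0, sizeof outer_seen);
    memset(inner_seen, 0, sizeof inner_seen);
    return outer_dfs(a, init);
}
```

For the three-state automaton 0 -> 1 -> 2 -> 1, the function reports a lasso when state 2 is accepting and none when only state 0 is accepting, because state 0 lies on no cycle.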
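Checking for High-Confidence (in principle)
[Figure: the LTL property (LTL-P) is translated to a Büchi automaton; the Instrumenter (Product) combines it with $B_S$ into $B_S \times B_{\neg\varphi}$; the Execution Engine reports either "All lassos non-accepting" or "Accepting lasso L".]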
Checking for High-Confidence (in practice)
• Combine static & runtime verification techniques:
– Abstract interpretation (sequential IS programs),
– Model checking (concurrent FS programs),
– Runtime analysis (sequential program optimization).
• Make scalability a priority:
– Open source compiler technology has started to mature,
– Apply techniques to source code rather than models,
  • Models can be obtained by abstraction-refinement techniques,
– Probabilistic techniques trade off precision against effort.
GCC Compiler
• Early stages: a modest C compiler.
- Translation: source code translated directly to RTL.
- Optimization: at low RTL level.
- High-level information lost: calls, structures, fields, etc.
• Nowadays: full-blown, multi-language compiler generating code for more than 30 architectures.
- Input: C, C++, Objective-C, Fortran, Java and Ada.
- Tree-SSA: added GENERIC, GIMPLE and SSA ILs.
- Optimization: at GENERIC, GIMPLE, SSA and RTL levels.
- Verification: Tree-SSA API suitable for verification, too.
GCC Compilation Process
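[Figure: C, C++ and Java files pass through their language-specific parsers to parse trees; Genericize produces the GENERIC AST; Gimplify produces the GIMPLE AST; Build CFG produces the SSA/GIMPLE CFG; the rest of compilation produces RTL code; code generation produces object code.]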
GCC Compilation Process
[Figure: the compilation pipeline again, from source files through parse tree, GENERIC AST, GIMPLE AST and SSA/GIMPLE CFG to RTL and object code, now with a plug-in attached through an API.]
Plug-In Support
GCC & Builder modified to load plug-ins that:
• Analyze or modify the GCC representation,
• Have access to the internal APIs of GCC,
• Are developed independently of GCC,
• Require no GCC recompilation.
C Program and its GIMPLE IL
int main() {
    int a, b, c;
    a = 5;
    b = a + 10;
    c = a + foo(a, b);
    if (a > c)
        c = b++ / a + b * a;
    bar(a, b, c);
}
After Gimplify:
int main() {
    int a, b, c;
    int T1, T2, T3, T4;
    a = 5;
    b = a + 10;
    T1 = foo(a, b);
    T2 = a + T1;
    c = T2;
    if (a <= T2) goto fi;   /* skip the body unless a > c */
    T3 = b / a;
    T4 = b * a;
    c = T3 + T4;
    b = b + 1;
fi:
    bar(a, b, c);
}
Associated GIMPLE CFG
[Figure: GIMPLE CFG of main. A FUNCTION DECL node lists the int variables a, b, c, T1, T2, T3 and T4; each statement is stored as an expression tree.]
Entry → A
A: a = 5; b = a + 10; T1 = foo(a,b); T2 = a + T1; c = T2; if (a > T2) goto B;   (true → B, false → C)
B: T3 = b / a; T4 = b * a; c = T3 + T4; b = b + 1;   (→ C)
C: bar(a,b,c); return;   (→ Exit)
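A CFG of this shape can be modelled with a very small data structure and printed by a recursive traversal. The sketch below is illustrative only: `BasicBlock`, `MAX_SUCCS` and `dump_cfg` are hypothetical names and do not correspond to GCC's internal CFG API.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_SUCCS 2   /* a block ends in at most a two-way branch */

/* Hypothetical basic block: a label, its statements as text, and
   up to two successor blocks. */
typedef struct BasicBlock {
    const char        *name;
    const char        *stmts;
    struct BasicBlock *succ[MAX_SUCCS];
    bool               visited;
} BasicBlock;

/* Depth-first traversal that prints every reachable block once and
   returns the number of blocks printed by this call. */
int dump_cfg(BasicBlock *bb, FILE *out) {
    if (bb == NULL || bb->visited) return 0;
    bb->visited = true;
    int count = 1;
    fprintf(out, "%s: %s\n", bb->name, bb->stmts);
    for (int i = 0; i < MAX_SUCCS; i++)
        if (bb->succ[i])
            fprintf(out, "  %s -> %s\n", bb->name, bb->succ[i]->name);
    for (int i = 0; i < MAX_SUCCS; i++)
        count += dump_cfg(bb->succ[i], out);
    return count;
}
```

Building blocks A, B and C with edges A -> B, A -> C and B -> C and calling `dump_cfg(&A, stdout)` prints each block once and returns 3.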
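Checking for High-Confidence (in practice)
[Figure: GCC gimplifies the source S into its CFG ($B_S$); the Instrument plug-in combines the CFG with the LTL property (LTL-P) into the product $B_S \times B_{\neg\varphi}$. Static path: Verifier (GAM). Runtime path: rest of GCC compilation, Linker, Dispatcher (HWM).]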
GSRV Platform
• GSRV suite:
– Static and runtime verification tools we are developing for GCC.
• General purpose (plug-ins):
– Verbose-dump: recursively traverses and prints the CFG,
– Intra/inter-procedural slicer: in progress,
– Code instrumenter: constructs the product machine.
• Static verification tools (plug-ins):
– Symbolic (BDD) execution engine: for boolean C-programs,
– GAM: CFG-GIMPLE abstract machine,
– Monte Carlo MC: statistical algorithm for LTL-MC.
• Runtime verification tools (static libraries):
– Dispatcher: catches and dispatches events to RV,
– Monte Carlo RV: statistical algorithm for LTL-RV.
Instrumentation Plug-Ins
• Ref-Counts: detects misuse of reference counts
– Instruments: inc(rc), dec(rc),
– Checks: st-inv (rc ≥ 0), tr-inv (|rc′ − rc| = 1), leak-inv (rc > 0 ~> rc = 0),
– Maintains: a list of reference counts and their container type.
• Malloc: detects allocation bugs at runtime
– Instruments: malloc() and free() function calls,
– Checks sequences: free() free(), $ free() and malloc() $,
– Maintains: a list of existing allocations.
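The three Malloc sequences can be read as a double free, a free with no preceding malloc, and a malloc that is never freed. A minimal monitor for these checks might look as follows. All names (`mm_on_malloc`, `mm_on_free`, `mm_leaks_at_exit`, the fixed-size table) are hypothetical, and the real plug-in's bookkeeping is not described in the slides.

```c
#include <stddef.h>

#define MAX_ALLOCS 64   /* assumed table size for the sketch */

typedef enum {
    MM_OK, MM_DOUBLE_FREE, MM_FREE_BEFORE_MALLOC, MM_TABLE_FULL
} MmStatus;

typedef struct { const void *ptr; int live; } AllocRecord;

static AllocRecord mm_table[MAX_ALLOCS];
static int mm_len = 0;

static AllocRecord *mm_find(const void *p) {
    for (int i = 0; i < mm_len; i++)
        if (mm_table[i].ptr == p) return &mm_table[i];
    return NULL;
}

/* Called by instrumentation after malloc() returned p. */
MmStatus mm_on_malloc(const void *p) {
    AllocRecord *r = mm_find(p);
    if (r) { r->live = 1; return MM_OK; }   /* address reused after a free */
    if (mm_len == MAX_ALLOCS) return MM_TABLE_FULL;
    mm_table[mm_len++] = (AllocRecord){ p, 1 };
    return MM_OK;
}

/* Called by instrumentation before free(p). */
MmStatus mm_on_free(const void *p) {
    if (p == NULL) return MM_OK;            /* free(NULL) is legal */
    AllocRecord *r = mm_find(p);
    if (!r) return MM_FREE_BEFORE_MALLOC;   /* free with no malloc */
    if (!r->live) return MM_DOUBLE_FREE;    /* free followed by free */
    r->live = 0;
    return MM_OK;
}

/* Number of allocations still live at the end of the trace. */
int mm_leaks_at_exit(void) {
    int n = 0;
    for (int i = 0; i < mm_len; i++) n += mm_table[i].live;
    return n;
}
```

The monitor only records addresses, so it can be exercised with any distinct pointers without touching the real allocator.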
Instrumentation Plug-Ins
• Bounds: checks for invalid memory access
– Instruments: malloc(), free() and f(a),
– Checks: accesses to non-allocated areas,
– Maintains: heap, stack and text allocations,
– Higher accuracy than ElectricFence-like libraries.
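The core of such a bounds check is a lookup of the accessed range in the list of known allocations. The following sketch assumes a flat array of regions; `Region` and `access_ok` are hypothetical names, and a production checker would likely use a faster lookup structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one known allocation (heap, stack or text). */
typedef struct { uintptr_t base; size_t size; } Region;

/* True iff the byte range [addr, addr + len) lies entirely inside one
   recorded region. The comparison is written to avoid overflow. */
bool access_ok(const Region *regions, size_t n, const void *addr, size_t len) {
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < n; i++) {
        const Region *r = &regions[i];
        if (a >= r->base && len <= r->size && a - r->base <= r->size - len)
            return true;
    }
    return false;
}
```

With a single 16-byte region, a 4-byte access at offset 12 is accepted, while a 5-byte access at offset 12 or any access at offset 16 is rejected.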
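Monte Carlo Approach
• Explore $N(\epsilon,\delta)$ independent lassos in the computation tree (CT), for error margin $\epsilon$ and confidence ratio $\delta$.
[Figure: computation tree (CT) of lassos for an LTL property, with depth bounded by the recurrence diameter; at each state the next successor is chosen by flipping a k-sided coin.]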
• Lasso sampling reduces overhead:
- Static verification: Reduces the space overhead
- Runtime verification: Dynamically adjusts sampling rate
• Lasso sampling weakened for RV:
- Reference counts: From zero up and back to zero.
Taking $N(\epsilon,\delta)$ Independent Lassos (error margin $\epsilon$ and confidence ratio $\delta$)
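Drawing one lasso amounts to a random walk that stops as soon as a state repeats. The lasso is accepting if an accepting state lies on its cycle. The sketch below assumes an explicit successor table; the `Buchi` type, the array bounds and the small linear congruential generator are illustrative choices and not part of the original tool.

```c
#include <stdbool.h>

#define MAX_STATES 16
#define MAX_SUCCS  4

/* Hypothetical explicit-state Buchi automaton with successor lists. */
typedef struct {
    int  n;
    int  nsucc[MAX_STATES];
    int  succ[MAX_STATES][MAX_SUCCS];
    bool accepting[MAX_STATES];
} Buchi;

/* Tiny LCG so the sketch is deterministic for a given seed. */
static unsigned lcg_next(unsigned *state) {
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Random walk from init until a state repeats. Returns true iff the
   resulting lasso is accepting, i.e. the most recent accepting state
   was visited at or after the first visit of the repeated state. */
bool sample_lasso(const Buchi *a, int init, unsigned *seed) {
    int first_visit[MAX_STATES];
    for (int i = 0; i < MAX_STATES; i++) first_visit[i] = -1;
    int last_accepting = -1;
    int s = init;
    for (int step = 0; ; step++) {
        if (first_visit[s] >= 0)                 /* cycle closed at s */
            return first_visit[s] <= last_accepting;
        first_visit[s] = step;
        if (a->accepting[s]) last_accepting = step;
        if (a->nsucc[s] == 0) return false;      /* dead end: no infinite run */
        /* the "k-sided coin": pick one of the k successors at random */
        s = a->succ[s][lcg_next(seed) % (unsigned)a->nsucc[s]];
    }
}
```

Because the walk stores only the states of the current lasso, its memory use is bounded by the length of the longest lasso rather than by the size of the state space.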
Geometric Random Variable
• Value of geometric RV X with parameter p:
– No. of independent samples until success.
• Probability mass function:
– $p(N) = P[X = N] = q^{N-1} p$, with $q = 1 - p$
• Cumulative distribution function:
– $F(N) = P[X \le N] = \sum_{i \le N} p(i) = 1 - q^N = 1 - (1 - p)^N$
How Many Lassos?
• Requiring $1 - (1-p)^N = 1 - \delta$ yields:
$N = \ln(\delta) / \ln(1-p)$
• Lower bound on number of trials N needed to achieve success with confidence ratio δ.
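Spelled out, the bound is obtained by solving the CDF requirement for N. The numeric values of p and δ below are chosen only for illustration and do not come from the slides.

```latex
\begin{align*}
1 - (1-p)^N = 1 - \delta
  &\iff (1-p)^N = \delta \\
  &\iff N \ln(1-p) = \ln \delta \\
  &\iff N = \frac{\ln \delta}{\ln(1-p)} .
\end{align*}
% Illustration: p = 0.5 and \delta = 0.01 give
% N = \ln(0.01) / \ln(0.5) \approx 6.64, so 7 samples suffice.
```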
What If p Unknown?
• Requiring $p \ge \epsilon$ yields:
$M = \ln(\delta) / \ln(1-\epsilon) \ge N = \ln(\delta) / \ln(1-p)$
and therefore $P[X \le M] \ge 1 - \delta$
• Lower bound on number of trials M needed to achieve success with confidence ratio δ and error margin ε.
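The sample count M can also be computed directly from its definition as the smallest integer with $(1-\epsilon)^M \le \delta$. The helper below is a sketch with a hypothetical name; it uses a loop instead of the closed form so that it needs no math library, at the cost of roughly $\ln(1/\delta)/\epsilon$ iterations for small ε.

```c
/* Smallest M such that M independent samples, each succeeding with
   probability at least eps, all fail with probability at most delta.
   Equivalent to ceil(ln(delta) / ln(1 - eps)) up to rounding.
   Returns -1 for arguments outside 0 < eps <= 1, 0 < delta < 1. */
long samples_needed(double eps, double delta) {
    if (!(eps > 0.0 && eps <= 1.0) || !(delta > 0.0 && delta < 1.0))
        return -1;
    long   m    = 0;
    double miss = 1.0;            /* (1 - eps)^m: all m samples fail */
    while (miss > delta) {
        miss *= 1.0 - eps;
        m++;
    }
    return m;
}
```

For example, `samples_needed(0.5, 0.01)` returns 7, matching $\ln(0.01)/\ln(0.5) \approx 6.64$ rounded up.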
Statistical Hypothesis Testing
• Null hypothesis $H_0$: $p \ge \epsilon$
• Alternative hypothesis $H_1$: $p < \epsilon$
• If no success after M trials, then reject $H_0$
– In RV: adjust sampling rate.
• Type I error: $\alpha = P[X > M \mid H_0] < \delta$
• Since: $P[X \le M \mid H_0] \ge 1 - \delta$
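The test can be phrased as a loop that draws at most M samples and stops at the first success. In this sketch the sampler is passed in as a callback; `Sampler`, `McResult`, `monte_carlo_check` and the stub `succeed_on_kth` are hypothetical names introduced for the example.

```c
#include <stdbool.h>

/* A sampler returns true when the drawn sample is a success
   (for model checking: an accepting lasso, i.e. a counterexample). */
typedef bool (*Sampler)(void *ctx);

typedef struct {
    bool found;    /* a success was observed                    */
    long trials;   /* number of samples drawn before stopping   */
} McResult;

/* Draw up to M samples. If none succeeds, H0 (p >= eps) is rejected;
   the chance of wrongly rejecting it is below delta when M is chosen
   as ln(delta) / ln(1 - eps). */
McResult monte_carlo_check(Sampler sample, void *ctx, long M) {
    for (long i = 1; i <= M; i++)
        if (sample(ctx))
            return (McResult){ true, i };
    return (McResult){ false, M };
}

/* Stub sampler for demonstration only: succeeds on the k-th call. */
typedef struct { long calls; long k; } KthCtx;

bool succeed_on_kth(void *ctx) {
    KthCtx *c = ctx;
    return ++c->calls == c->k;
}
```

With the stub configured to succeed on the third call and M = 10, the loop stops after 3 trials; if the stub would only succeed on call 50, all 10 trials fail and the null hypothesis is rejected.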
Model Checking Results
• TCAS:
– Safe/best/optimal advisory selection,
– No/avoid-unnecessary crossing.
• Dining Philosophers:
– (Un)Symmetric and (Un)Fair versions
• Needham-Schroeder Protocol:
– Quite sophisticated C implementation.
Runtime Verification (Reference Counts)
• Check Linux file system cache objects
– inodes: on-disk files
– dentries: namespace nodes
• Optionally, log all events
• Simple per-category sampling policy
– Initially: sample all objects
– Hypothesize: $\epsilon > 10^{-5}$ and $\delta = 10^{-5}$
– Stop sampling: if hypothesis is false.
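The reference-count invariants checked for each sampled object (rc ≥ 0, every change is exactly ±1, and a positive count eventually returns to zero) can be monitored with a few lines of state. This is a sketch under assumptions: `RcMonitor`, `rc_observe` and `rc_leaked` are hypothetical names, and the liveness property is approximated by inspecting the count at the end of a finite trace.

```c
typedef enum { RC_OK, RC_NEGATIVE, RC_BAD_STEP } RcStatus;

/* Hypothetical per-object monitor: remembers the last observed count. */
typedef struct { long rc; } RcMonitor;

/* Feed the monitor the new value of the reference count. */
RcStatus rc_observe(RcMonitor *m, long new_rc) {
    long step = new_rc - m->rc;
    m->rc = new_rc;
    if (new_rc < 0) return RC_NEGATIVE;               /* st-inv: rc >= 0     */
    if (step != 1 && step != -1) return RC_BAD_STEP;  /* tr-inv: |rc'-rc| = 1 */
    return RC_OK;
}

/* leak-inv on a finite trace: the count must be back at zero when the
   object is no longer used. Nonzero result means a suspected leak. */
int rc_leaked(const RcMonitor *m) {
    return m->rc > 0;
}
```

A trace 0, 1, 2, 1, 0 passes all three checks, while a jump from 1 to 3 is flagged as `RC_BAD_STEP` and a trace that ends at 2 is reported as a leak.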
RV of RC: Results
[Figure: time (seconds, 0 to 120) versus run number (0 to 25). Annotated overheads: logging ~10x, ~3x and 1.33x.]
Results
[Figure: time (seconds, 0 to 120) versus run number (0 to 25). Annotated overheads: checking ~2x, 1.33x and 1.1x.]
Ongoing and Future Work
• Static Verification: open source software MC for GCC
– Abstraction/refinement/interpolation techniques,
– Directed MC combined with Monte-Carlo MC:
• Linked GAM with CVS Light.
• Runtime Verification: open source software RV for GCC
– Develop: new plug-ins & a property (monitoring) language
– Explore: novel sampling techniques, e.g. based on phases
– Apply: Monte Carlo Decision Processes for optimal sampling.
Ongoing Instrumentation Plug-Ins
• CFG-duplicator: replicates each function’s CFG
– Splits each basic block into two parts:
  • Uninstrumented block: no change (except labels)
  • Instrumented block: instrumentation applied
– Inserts selectors (if statements) before each pair
– Block instrumentation can be toggled at run-time
• Multi-core: moves checking code into a separate thread
– Puts relevant information into a shared buffer
– Shadow thread reads and parses information
– Low latency: 65 cycles between cores on 1.65GHz Power5
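At source level, the effect of the CFG-duplicator on one basic block can be pictured as two copies of the block guarded by a selector. The sketch below is hand-written for illustration; the plug-in itself works on GIMPLE, and the names `instrumentation_on`, `log_event` and `add_one` are hypothetical.

```c
#include <stdbool.h>

static bool instrumentation_on = true;  /* run-time toggle read by selectors */
static long events_logged = 0;

/* Stand-in for real instrumentation code. */
static void log_event(void) { events_logged++; }

/* One duplicated block: the selector picks a copy on every execution. */
int add_one(int x) {
    if (instrumentation_on) {
        /* instrumented copy */
        log_event();
        return x + 1;
    } else {
        /* uninstrumented copy: original code only */
        return x + 1;
    }
}

void set_instrumentation(bool on) { instrumentation_on = on; }
long logged_events(void)          { return events_logged; }
```

Both copies compute the same result; only the instrumented copy pays for the event. A sampling policy can therefore switch checking off without recompiling or restarting the program.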
Future Instrumentation Plug-Ins
• FE-tracer: records function calls and parameters
– Can be easily applied to both user and kernel code
– Provides valuable trace information to guide debugging
• DS-access-logger: records what data went where
– Faster than trap-based methods: no context switches
– We can exploit type information to provide visual representations of data structures and their links
• Thread-DL-detector: detects circular dependencies– Extracts the loop conditions for each loop– Finds variables that would be written if the loop exited– If two threads are blocking on each other, flags a deadlock