Static Analysis of C/C++ Programs
György Orbán, @gyorb
Gábor Horváth, @Xazax-hun
Static Analysis
Analyze source code without running it
● Find bugs
● Find optimization opportunities
● Transform code
● Verify
● Visualize
● Metrics
● ...
Static Analysis
● Answer questions about the code
● Terminates?
● Maximum heap size?
● What is the output for a given input?
● Can this pointer be null?
● We have similar questions all the time during programming.
We are doing static analysis!
Halting Problem
● Does a computer program terminate on a given input?
● Undecidable problem
● Rice's theorem from '53:
● Every non-trivial question about the behavior of a program is undecidable
int x = 5;
if (program p halts on input i)
  ++x;
Deciding the final value of x would require deciding whether p halts.
Approximations
● We cannot solve undecidable problems
● We can do approximations, apply heuristics that work well in practice
● False positives (false reports)
● False negatives (unreported defects)
● For verification: reduce false negatives (to 0)
● For industrial use: reduce false positives
Static analysis methods
1) Textual Pattern Matching – CppCheck
2) AST Matchers/Walkers – CppCheck, Clang Tidy
3) Flow Sensitive Algorithms – Compiler warnings
4) Path Sensitive Algorithms – Clang Static Analyzer
3) and 4) are Abstract Interpretation techniques
Interpret the program, but instead of using the actual semantics, define an abstract one
1. Textual patterns
● Transform the code to canonical form
● Match pattern on the transformed form
Token::Match(tok, "/ 0");
int a = 5 / 0;
int b = getValue();
if (b == 4) {
  ...
  int c = 5 / (b - 4);
}
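The matching step above can be sketched in a few lines. This is a minimal illustration, not Cppcheck's implementation: the helper names `tokenize` and `hasLiteralDivisionByZero` are hypothetical, and the only canonicalization performed is dropping whitespace. It shows why the literal `5 / 0` is found while `5 / (b - 4)` is not.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split source text into identifier/number tokens and single-character
// punctuation tokens. Whitespace is dropped; that is the only
// canonicalization this sketch performs.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalnum(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// Return true if the token stream contains "/" immediately followed by "0".
bool hasLiteralDivisionByZero(const std::string& src) {
    const std::vector<std::string> tokens = tokenize(src);
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        if (tokens[i] == "/" && tokens[i + 1] == "0") {
            return true;
        }
    }
    return false;
}
```

The pattern has no notion of values, so the division guarded by `b == 4` goes unnoticed. This toy tokenizer would also wrongly flag `1 / 0.5`, because it splits `0.5` into three tokens.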
2. AST matchers
`-FunctionDecl main 'int (void)'
  `-CompoundStmt 0x5e40ea0 <col:11, line:6:1>
    |-DeclStmt
    | `-VarDecl used d 'double *'
    |-BinaryOperator 'double *' lvalue '='
    | |-DeclRefExpr 'double *' lvalue Var 0x5e2c5a0 'd' 'double *'
    | `-CStyleCastExpr 'double *' <BitCast>
    |   `-CallExpr 'void *'
    |     |-ImplicitCastExpr 'void *(*)(int) throw()' <FunctionToPointerDecay>
    |     | `-DeclRefExpr 'void *(int) throw()' lvalue Function 0x5dea360 'malloc' 'void *(int) throw()'
    |     `-ImplicitCastExpr 'int' <IntegralCast>
    |       `-UnaryExprOrTypeTraitExpr 'unsigned long' sizeof
    |         `-ParenExpr 'double *' lvalue
    |           `-DeclRefExpr 'double *' lvalue Var 0x5e2c5a0 'd' 'double *'
    `-ReturnStmt
      `-IntegerLiteral 'int' 1
#include <stdlib.h>
int main() {
  double *d;
  d = (double*)malloc(sizeof(d));
  return 1;
}
● Type information is available
● Typedefs, macros, overloading, template specializations are all resolved
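To make the idea of matching on a typed tree concrete, here is a small sketch over a toy AST. The `Node` struct and all function names are hypothetical and far simpler than Clang's real node classes. It assumes the point of the example above is the suspicious `malloc(sizeof(d))`, where `d` is a pointer.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified AST node; Clang's real classes differ.
struct Node {
    std::string kind;  // e.g. "CallExpr", "SizeofExpr", "DeclRefExpr"
    std::string name;  // callee or variable name, empty if not applicable
    std::string type;  // type of the expression, e.g. "double *"
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> makeNode(std::string kind, std::string name,
                               std::string type) {
    auto node = std::make_unique<Node>();
    node->kind = std::move(kind);
    node->name = std::move(name);
    node->type = std::move(type);
    return node;
}

bool isPointerType(const std::string& type) {
    return !type.empty() && type.back() == '*';
}

// True if `node` or a descendant is a sizeof whose operand has pointer type.
bool containsSizeofPointer(const Node& node) {
    if (node.kind == "SizeofExpr" && !node.children.empty() &&
        isPointerType(node.children[0]->type)) {
        return true;
    }
    for (const auto& child : node.children) {
        if (containsSizeofPointer(*child)) return true;
    }
    return false;
}

// Count malloc calls whose argument subtree contains sizeof(pointer).
int countSuspiciousMallocs(const Node& node) {
    int count = 0;
    if (node.kind == "CallExpr" && node.name == "malloc" &&
        containsSizeofPointer(node)) {
        ++count;
    }
    for (const auto& child : node.children) {
        count += countSuspiciousMallocs(*child);
    }
    return count;
}

// Build the toy tree for malloc(sizeof(d)) with `d` of the given type.
std::unique_ptr<Node> makeMallocOfSizeof(const std::string& operandType) {
    auto sizeofExpr = makeNode("SizeofExpr", "", "unsigned long");
    sizeofExpr->children.push_back(makeNode("DeclRefExpr", "d", operandType));
    auto call = makeNode("CallExpr", "malloc", "void *");
    call->children.push_back(std::move(sizeofExpr));
    return call;
}
```

Because the check reads the operand's type from the tree, it still works when the pointer type hides behind a typedef in the source text, which a textual pattern could not see.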
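3. Flow sensitive
int x = 0;
if (z) {
  x = 5;
}
int y = 3;
if (z) {
  y = 10 / x;
}
[Figure: control flow graph annotated with the tracked value of x. After the declaration: x = 0. After the first if: x = 5 on the taken branch, x = 0 on the other. After the merge and at the division: x = 0, reported as "Division by zero".]
● Walk on the control flow graph
● What to do on merge points?
False positive! The division only runs when z is true, and then x is 5.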
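3. Flow sensitive
int x = 0;
if (z) {
  x = 5;
}
int y = 3;
if (!z) {
  y = 10 / x;
}
[Figure: the same control flow graph with a conservative merge. After the declaration: x = 0. After the first if: x = 5 on the taken branch, x = 0 on the other. After the merge and at the division: x = Unknown, so nothing is reported.]
● Polynomial but imprecise
● Warnings are usually implemented this way, using the conservative approach
False negative! The division only runs when z is false, and then x is still 0.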
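The conservative treatment of merge points can be sketched with a tiny abstract domain: a value is either a known constant or Unknown, and two differing values join to Unknown. The type and function names here are illustrative assumptions, not taken from any compiler.

```cpp
#include <optional>

// Abstract value of one integer variable: a known constant or Unknown.
struct AbstractInt {
    std::optional<int> constant;  // empty means Unknown
    bool isUnknown() const { return !constant.has_value(); }
};

AbstractInt known(int v) { return AbstractInt{v}; }
AbstractInt unknown() { return AbstractInt{std::nullopt}; }

// Join two abstract values at a merge point of the control flow graph.
// Equal constants survive; anything else degrades to Unknown.
AbstractInt join(const AbstractInt& a, const AbstractInt& b) {
    if (!a.isUnknown() && !b.isUnknown() && *a.constant == *b.constant) {
        return a;
    }
    return unknown();
}

// Conservative check: warn only if the divisor is definitely zero.
bool definitelyDividesByZero(const AbstractInt& divisor) {
    return !divisor.isUnknown() && *divisor.constant == 0;
}

// Abstract walk over the example: returns the value of x after the merge.
AbstractInt divisorAfterMerge() {
    const AbstractInt x = known(0);       // int x = 0;
    const AbstractInt xTaken = known(5);  // if (z) { x = 5; }
    const AbstractInt xSkipped = x;       // branch not taken
    return join(xTaken, xSkipped);        // merge point: 5 vs 0
}
```

Since `join(5, 0)` is Unknown, `definitelyDividesByZero` stays silent for the division under `if (!z)`, which is the false negative from the slide. The analysis visits each node a bounded number of times, which is where the polynomial cost comes from.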
4. Path sensitive
● Path sensitive walk over the Control Flow Graph
● Simulated execution of the program
● Try to consider every possible path
● Often used along with symbolic values
  – In this case it is called symbolic execution
  – Constraints are calculated for symbols
  – Infeasible paths are pruned
● Exponential time in the number of branches
● It is possible to have reasonable performance though
Symbolic Execution
#include <stdlib.h>
void test(int b) {
  int a, c;
  switch (b) {
  case 1:
    a = b / 0;
    break;
  case 4:
    c = b - 4;
    a = b / c;
    break;
  }
}
[Figure: exploded graph of test(). The root state is b: $b. switch(b) forks into three states with the constraints $b=[1,1], $b=[4,4] and $b=[MIN_INT,0],[2,3],[5,MAX_INT]. On the $b=[1,1] path the next statement is a = b/0. On the $b=[4,4] path, c = b-4 gives c=$b-4, that is c=0, and a = b/c evaluates a=$b/$c: Division by zero.]
Nodes are immutable program states
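The constraint bookkeeping on one path can be sketched with closed intervals. This is a simplified illustration and not the Clang Static Analyzer's data structures: `Range`, `minusConstant` and `case4BodyDividesByZero` are hypothetical names, and only the `case 4` body of the example is modeled.

```cpp
// Closed integer interval constraining a symbolic value such as $b.
struct Range {
    long lo;
    long hi;
    bool isExactly(long v) const { return lo == v && hi == v; }
};

// Interval subtraction of a constant: [lo, hi] - k = [lo - k, hi - k].
Range minusConstant(const Range& r, long k) {
    return Range{r.lo - k, r.hi - k};
}

// Symbolically execute `c = b - 4; a = b / c;` on a path where the
// switch has already constrained $b to `bConstraint`.
// Returns true if the divisor c is certainly zero on that path.
bool case4BodyDividesByZero(const Range& bConstraint) {
    const Range c = minusConstant(bConstraint, 4);  // c = $b - 4
    return c.isExactly(0);                          // a = $b / $c
}
```

Each path carries its own constraint, so the state on the `$b=[4,4]` path never mixes with the others. That avoids the merge problem of the flow sensitive approach, at the price of one state per path.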
Context sensitivity
int getNull(int a) {
  return a ? 0 : 1;
}
void test(int b) {
  int a;
  switch (b) {
  ...
  case 5:
    a = b / getNull(b);  // Division by zero: called with a == 5, returns 0
    break;
  ...
  }
}
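As a loose analogy only, compile-time evaluation shows what it means to evaluate a callee with the argument known at the call site. This is not how the analyzer works internally. The function body is the one from the slide; marking it `constexpr` is an addition for the sake of the illustration.

```cpp
// Same body as on the slide, marked constexpr so the compiler can
// evaluate a call whose argument is known.
constexpr int getNull(int a) { return a ? 0 : 1; }

// With the call-site argument 5, the result is known to be 0, so
// b / getNull(b) divides by zero on the `case 5` path.
static_assert(getNull(5) == 0, "getNull(5) must evaluate to 0");

// Looking at getNull in isolation gives no single answer: another
// argument yields a different result.
static_assert(getNull(0) == 1, "getNull(0) must evaluate to 1");
```

A context insensitive analysis would have to summarize `getNull` as "returns 0 or 1", which is not enough to report the division with certainty.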
Cross translation unit (TU) analysis
● What happens when the definition of a function is in a separate translation unit?
void f(int *x);
void g(int *x) {
  if (*x > 0) {
    f(x);  // The pointed value is known to be positive.
    f(x);  // The pointed value is unknown.
  }
}
Another TU:
void f(int *x) {
  *x = (*x);
}
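One way to picture why the second call sees an unknown value is to model a call to a function without a visible body as invalidating everything reachable through its pointer argument. The sketch below assumes that model; `Sign`, `State`, `callOpaqueFunction` and `analyzeG` are hypothetical names.

```cpp
#include <map>
#include <string>

// What the analysis knows about a pointed-to value.
enum class Sign { Positive, Unknown };

// Analysis state: maps an lvalue expression such as "*x" to its fact.
using State = std::map<std::string, Sign>;

// Model a call to a function whose body is not visible in this TU:
// the value reachable through the pointer argument becomes Unknown.
void callOpaqueFunction(State& state, const std::string& pointee) {
    state[pointee] = Sign::Unknown;
}

struct CallSiteFacts {
    Sign beforeFirstCall;
    Sign beforeSecondCall;
};

// Walk the body of g() from the slide and record what is known about
// *x right before each call to f.
CallSiteFacts analyzeG() {
    State state;
    state["*x"] = Sign::Positive;  // inside if (*x > 0)

    CallSiteFacts facts{};
    facts.beforeFirstCall = state["*x"];
    callOpaqueFunction(state, "*x");  // first f(x)
    facts.beforeSecondCall = state["*x"];
    return facts;
}
```

If the body of `f` were available to the analysis, the invalidation could be replaced by the callee's actual effect on `*x`.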
Clang tools
● Clang Static Analyzer
  – Symbolic execution
  – Context sensitive
  – No cross translation unit analysis (yet?)
● Compiler warnings
  – Flow sensitive algorithms
● Clang Tidy
  – AST Matching
  – Source Rewrites
● Clang Format
  – Works on tokens
  – Only formatting
CodeChecker
● Bug management tool for Clang Static Analyzer and Clang-tidy
● Supported platforms
  – Linux
  – OSX (ongoing)
● Available here:
https://github.com/Ericsson/codechecker
CodeChecker
● Support multiple analyzers
  – Clang Static Analyzer
  – Clang-tidy
● Store results (SQLite/PostgreSQL)
● Compare analysis runs (new/resolved bugs)
● Filter bug reports (severity, filename …)
● Suppress false positives
● ...
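Comparing two analysis runs boils down to two set differences over report identities. The sketch below assumes each report can be reduced to a stable string identifier. How CodeChecker actually identifies and stores reports is not described in the slides, so `ReportId`, `RunDiff` and `compareRuns` are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical report identity, e.g. a hash of file, checker and context.
using ReportId = std::string;

struct RunDiff {
    std::set<ReportId> newReports;       // in current, not in baseline
    std::set<ReportId> resolvedReports;  // in baseline, not in current
};

// Compare a baseline run with a later run of the same project.
RunDiff compareRuns(const std::set<ReportId>& baseline,
                    const std::set<ReportId>& current) {
    RunDiff diff;
    std::set_difference(current.begin(), current.end(),
                        baseline.begin(), baseline.end(),
                        std::inserter(diff.newReports,
                                      diff.newReports.end()));
    std::set_difference(baseline.begin(), baseline.end(),
                        current.begin(), current.end(),
                        std::inserter(diff.resolvedReports,
                                      diff.resolvedReports.end()));
    return diff;
}
```

The quality of such a comparison depends on the identifier staying stable across unrelated edits. An identifier based on raw line numbers, for example, would report moved code as one resolved and one new bug.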
Static Analysis Process
[Figure: CodeChecker architecture. Components shown: build system (make, ...) with build logger; CMake with compile command JSON; Analysis Manager (process configurations & analysis results); analyzer/checker/build configurations; static analyzers (ClangSA, Clang-tidy); Report manager (store/compare/filter); SQLite/PostgreSQL; Thrift API (store/view); web client, command line client, new clients.]