Static Analysis of C/C++ Programs

György Orbán @gyorb

Gábor Horváth @Xazax-hun

Static Analysis

Analyze source code without running it

● Find bugs

● Find optimization opportunities

● Transform code

● Verify

● Visualize

● Metrics

● ...

Static Analysis

● Answer questions about the code

● Terminates?

● Maximum heap size?

● What is the output for a given input?

● Can this pointer be null?

● We have similar questions all the time during programming.

We are doing static analysis!

Halting Problem

● Does a computer program terminate on a given input?

● Undecidable problem

● Rice's theorem from '53:

● Every non-trivial question about the behavior of a program is undecidable

int x = 5;
if (program p halts on input i)
  ++x;

Approximations

● We cannot solve undecidable problems

● We can make approximations and apply heuristics that work well in practice

● False positives (false reports)

● False negatives (unreported defects)

● For verification: reduce false negatives (to 0)

● For industrial use: reduce false positives

Static analysis methods

1) Textual Pattern Matching – CppCheck

2) AST Matchers/Walkers – CppCheck, Clang Tidy

3) Flow Sensitive Algorithms – Compiler warnings

4) Path Sensitive Algorithms – Clang Static Analyzer

3) and 4) are Abstract Interpretation techniques

Interpret the program, but instead of using the actual semantics, define an abstract one

1. Textual patterns

● Transform the code to a canonical form

● Match patterns against the transformed code

Token::Match(tok, "/ 0");

int a = 5 / 0;
int b = getValue();
if (b == 4) {
  ...
  int c = 5 / (b - 4);
}
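The idea of matching a textual pattern on a canonical token stream can be sketched as follows. This is only an illustration, not how CppCheck is implemented: the functions `tokenize`, `matchesAnywhere` and `hasLiteralDivisionByZero` are hypothetical names, and the "canonical form" here is nothing more than whitespace-free tokens.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split source text into identifier/number tokens and single-character
// punctuation tokens. Dropping whitespace is the (assumed) canonical form.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char ch = static_cast<unsigned char>(src[i]);
        if (std::isspace(ch)) {
            ++i;
        } else if (std::isalnum(ch) || ch == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// Return true if the token sequence `pattern` occurs anywhere in `tokens`.
bool matchesAnywhere(const std::vector<std::string>& tokens,
                     const std::vector<std::string>& pattern) {
    if (pattern.empty() || tokens.size() < pattern.size()) return false;
    for (std::size_t i = 0; i + pattern.size() <= tokens.size(); ++i) {
        bool ok = true;
        for (std::size_t j = 0; j < pattern.size(); ++j) {
            if (tokens[i + j] != pattern[j]) { ok = false; break; }
        }
        if (ok) return true;
    }
    return false;
}

// Textual check: a "/" token directly followed by a "0" token.
bool hasLiteralDivisionByZero(const std::string& src) {
    return matchesAnywhere(tokenize(src), {"/", "0"});
}
```

The pattern finds `5 / 0` regardless of spacing, but it cannot see that `5 / (b - 4)` divides by zero when `b == 4`, which is the limitation the slide example points at.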

2. AST matchers

`-FunctionDecl main 'int (void)'
  `-CompoundStmt 0x5e40ea0 <col:11, line:6:1>
    |-DeclStmt
    | `-VarDecl used d 'double *'
    |-BinaryOperator 'double *' lvalue '='
    | |-DeclRefExpr 'double *' lvalue Var 0x5e2c5a0 'd' 'double *'
    | `-CStyleCastExpr 'double *' <BitCast>
    |   `-CallExpr 'void *'
    |     |-ImplicitCastExpr 'void *(*)(int) throw()' <FunctionToPointerDecay>
    |     | `-DeclRefExpr 'void *(int) throw()' lvalue Function 0x5dea360 'malloc' 'void *(int) throw()'
    |     `-ImplicitCastExpr 'int' <IntegralCast>
    |       `-UnaryExprOrTypeTraitExpr 'unsigned long' sizeof
    |         `-ParenExpr 'double *' lvalue
    |           `-DeclRefExpr 'double *' lvalue Var 0x5e2c5a0 'd' 'double *'
    `-ReturnStmt
      `-IntegerLiteral 'int' 1

#include <stdlib.h>
int main() {
  double *d;
  d = (double*)malloc(sizeof(d));  // sizeof(d) is the size of the pointer, not of a double
  return 1;
}

● Type information is available

● Typedefs, macros, overloading, template specializations are all resolved
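Because the AST carries type information, a tree walker can look for `sizeof` applied to a pointer inside a `malloc` call. The sketch below uses a hypothetical, heavily simplified `Node` type and string-based types; it is not Clang's AST or the Clang AST matcher API, only an illustration of walking a typed tree.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified AST node; this is not Clang's API.
struct Node {
    std::string kind;  // "CallExpr", "SizeofExpr", "DeclRefExpr", ...
    std::string name;  // callee or variable name, may be empty
    std::string type;  // type of the expression, e.g. "double *"
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> makeNode(std::string kind, std::string name,
                               std::string type) {
    auto node = std::make_unique<Node>();
    node->kind = std::move(kind);
    node->name = std::move(name);
    node->type = std::move(type);
    return node;
}

bool isPointerType(const std::string& type) {
    return !type.empty() && type.back() == '*';
}

// True if the subtree rooted at `node` applies sizeof to a pointer operand.
bool containsSizeofOfPointer(const Node& node) {
    if (node.kind == "SizeofExpr" && !node.children.empty() &&
        isPointerType(node.children.front()->type)) {
        return true;
    }
    for (const auto& child : node.children) {
        if (containsSizeofOfPointer(*child)) return true;
    }
    return false;
}

// Walk the tree and count malloc calls whose arguments contain sizeof(pointer).
int countSuspiciousMallocs(const Node& node) {
    int count = 0;
    if (node.kind == "CallExpr" && node.name == "malloc" &&
        containsSizeofOfPointer(node)) {
        ++count;
    }
    for (const auto& child : node.children) {
        count += countSuspiciousMallocs(*child);
    }
    return count;
}

// Build the tree for malloc(sizeof(d)), where d has the given type.
std::unique_ptr<Node> buildMallocOfSizeof(const std::string& operandType) {
    auto operand = makeNode("DeclRefExpr", "d", operandType);
    auto sizeofExpr = makeNode("SizeofExpr", "", "unsigned long");
    sizeofExpr->children.push_back(std::move(operand));
    auto call = makeNode("CallExpr", "malloc", "void *");
    call->children.push_back(std::move(sizeofExpr));
    return call;
}
```

With `d` declared as `double *` the walker flags the call; with `d` declared as `double` it does not. A purely textual pattern could not tell the two apart.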

3. Flow sensitive

int x = 0;

if (z) {

    x = 5;

}

int y = 3;

if (z) {

    y = 10 / x;

}

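CFG annotations: x = 0 after the initialization; x = 5 and x = 0 on the two branches of the first if; x = 0 after the merge point and inside the second if, where the analysis reports a division by zero.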

● Walk on the control flow graph

● What to do on merge points?

False positive!
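One possible answer to the merge-point question is to keep every value that may reach the merge. The sketch below assumes that strategy (a set of possible values per variable, with hypothetical names `joinSets` and `mayBeZero`) and shows why it yields a false positive on the example: the analysis forgets that both branches depend on the same `z`.

```cpp
#include <set>

using ValueSet = std::set<int>;  // possible values of x at a program point

// Merge point: x may hold any value coming from either predecessor.
ValueSet joinSets(const ValueSet& a, const ValueSet& b) {
    ValueSet result = a;
    result.insert(b.begin(), b.end());
    return result;
}

// Checker: report when the divisor may be zero.
bool mayBeZero(const ValueSet& divisor) {
    return divisor.count(0) > 0;
}

// Abstract run of the example with both conditions being `z`.
// The analysis does not track the value of z.
bool setAnalysisWarns() {
    ValueSet x = {0};                   // int x = 0;
    ValueSet thenBranch = {5};          // if (z) { x = 5; }
    ValueSet elseBranch = x;            // branch not taken
    x = joinSets(thenBranch, elseBranch);  // merge: {0, 5}
    return mayBeZero(x);                // if (z) { y = 10 / x; }
}

// Concrete run for comparison: does the program really divide by zero?
bool concreteRunIfZ(bool z) {
    int x = 0;
    if (z) x = 5;
    if (z) return x == 0;  // the division is only reached when z is true
    return false;
}
```

The abstract run warns, while no concrete value of `z` reaches the division with `x == 0`.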

3. Flow sensitive

int x = 0;

if (z) {

    x = 5;

}

int y = 3;

if (!z) {

    y = 10 / x;

}

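CFG annotations: x = 0 after the initialization; x = 5 and x = 0 on the two branches of the first if; x = Unknown after the merge point and on both branches of the second if, so no division by zero is reported.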

● Polynomial but imprecise

● Warnings are usually implemented this way, using the conservative approach

False negative!
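The conservative merge can be sketched as a small constant-propagation domain: a value is either a known constant or Unknown, and differing constants merge to Unknown. The type `AbstractInt` and the function names are assumptions made for this illustration, not taken from any compiler.

```cpp
// Abstract value: a known constant, or Unknown when `known` is false.
struct AbstractInt {
    bool known;
    int value;
};

AbstractInt constant(int v) { return {true, v}; }
AbstractInt unknown() { return {false, 0}; }

// Merge point: keep the constant only if both predecessors agree.
AbstractInt joinConstants(AbstractInt a, AbstractInt b) {
    if (a.known && b.known && a.value == b.value) return a;
    return unknown();
}

// Conservative checker: warn only when the divisor is definitely zero.
bool definitelyZero(AbstractInt v) {
    return v.known && v.value == 0;
}

// Abstract run of the example: if (z) x = 5; ... if (!z) y = 10 / x;
bool conservativeAnalysisWarns() {
    AbstractInt x = constant(0);          // int x = 0;
    AbstractInt thenBranch = constant(5); // if (z) { x = 5; }
    AbstractInt elseBranch = x;           // branch not taken
    x = joinConstants(thenBranch, elseBranch);  // merge: Unknown
    return definitelyZero(x);             // if (!z) { y = 10 / x; }
}

// Concrete run for comparison: does the program really divide by zero?
bool concreteRunIfNotZ(bool z) {
    int x = 0;
    if (z) x = 5;
    if (!z) return x == 0;  // the division is only reached when z is false
    return false;
}
```

Here the analysis stays silent although the concrete program divides by zero whenever `z` is false.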

4. Path sensitive

● Path sensitive walk over the Control Flow Graph

● Simulated execution of the program

● Try to consider every possible path

● Often used along with symbolic values

● In this case it is called symbolic execution

● Constraints are calculated for symbols

● False (infeasible) paths are pruned

● Exponential time in the number of branches

● It is possible to have reasonable performance though
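A minimal sketch of the path-sensitive idea, applied to the two-branch example from the flow-sensitive slides: enumerate every combination of branch decisions, record what each decision implies about `z`, and prune combinations that contradict each other. The struct `PathResult` and the function `explorePaths` are hypothetical names; a real analyzer works on a CFG, not on a hard-coded program shape.

```cpp
// Summary of exploring all paths through:
//   int x = 0; if (c1) x = 5; if (c2) y = 10 / x;
// where c1 and c2 are each either `z` or `!z`.
struct PathResult {
    int feasiblePaths;
    int prunedPaths;
    bool divisionByZero;
};

PathResult explorePaths(bool firstNegated, bool secondNegated) {
    PathResult result{0, 0, false};
    // Two branches give 2 * 2 = 4 candidate paths (2^n for n branches).
    for (int takeFirst = 0; takeFirst <= 1; ++takeFirst) {
        for (int takeSecond = 0; takeSecond <= 1; ++takeSecond) {
            // Value of z implied by each branch decision.
            bool zFromFirst = (takeFirst != 0) != firstNegated;
            bool zFromSecond = (takeSecond != 0) != secondNegated;
            if (zFromFirst != zFromSecond) {
                ++result.prunedPaths;  // contradictory constraints on z
                continue;
            }
            ++result.feasiblePaths;
            int x = takeFirst ? 5 : 0;
            if (takeSecond && x == 0) result.divisionByZero = true;
        }
    }
    return result;
}
```

With both conditions `z`, the only paths reaching the division have `x == 5`, so nothing is reported; with the second condition `!z`, the feasible path that skips the assignment reaches the division with `x == 0`.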

Symbolic Execution

#include <stdlib.h>
void test(int b) {
  int a, c;
  switch (b) {
    case 1:
      a = b / 0;
      break;
    case 4:
      c = b - 4;
      a = b / c;
      break;
  }
}

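Exploded graph for test(): the entry node holds b: $b. switch(b) splits it into three paths with the constraints $b=[1,1], $b=[4,4], and $b=[MIN_INT,0],[2,3],[5,MAX_INT]. The $b=[1,1] path executes a = b/0. On the case 4 path, c = b-4 gives the state b: $b, c: 0 with $b=[4,4] (c=$b-4, so c=0), and a = b/c (a=$b/$c) is reported as Division by zero.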

Nodes are immutable program states
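The range constraints in the graph can be sketched with a small interval-set type. The names `Range`, `RangeSet`, `assumeEqual`, `assumeNotEqual`, `subtractConstant` and `isDefinitelyZero` are assumptions for this illustration and are not the Clang Static Analyzer's constraint manager.

```cpp
#include <climits>
#include <vector>

// Closed interval [lo, hi]; long long avoids overflow at the int limits.
struct Range {
    long long lo;
    long long hi;
};

// Constraint on one symbol such as $b: a union of disjoint ranges.
// An empty set means the path is infeasible.
using RangeSet = std::vector<Range>;

// Taking `case value:` constrains the symbol to exactly that value.
RangeSet assumeEqual(const RangeSet& ranges, long long value) {
    for (const Range& r : ranges) {
        if (r.lo <= value && value <= r.hi) {
            return RangeSet{Range{value, value}};
        }
    }
    return RangeSet{};
}

// Not taking `case value:` removes that value from the ranges.
RangeSet assumeNotEqual(const RangeSet& ranges, long long value) {
    RangeSet result;
    for (const Range& r : ranges) {
        if (value < r.lo || value > r.hi) {
            result.push_back(r);
            continue;
        }
        if (r.lo < value) result.push_back(Range{r.lo, value - 1});
        if (value < r.hi) result.push_back(Range{value + 1, r.hi});
    }
    return result;
}

// Ranges of the expression (symbol - k).
RangeSet subtractConstant(const RangeSet& ranges, long long k) {
    RangeSet result;
    for (const Range& r : ranges) {
        result.push_back(Range{r.lo - k, r.hi - k});
    }
    return result;
}

// True if the only possible value is zero.
bool isDefinitelyZero(const RangeSet& ranges) {
    if (ranges.empty()) return false;
    for (const Range& r : ranges) {
        if (r.lo != 0 || r.hi != 0) return false;
    }
    return true;
}
```

Starting from the full `int` range for `$b`, the `case 4` path constrains `$b` to `[4,4]`, so `$b - 4` is `[0,0]` and the division is reported; excluding 1 and 4 leaves the three ranges of the default path.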

Context sensitivity

int getNull(int a) {
  return a ? 0 : 1;
}

void test(int b) {
  int a;
  switch (b) {
    ...
    case 5:
      a = b / getNull(b);
      break;
    ...
  };
}

Division by Zero

Called with a == 5, returns 0
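The difference the calling context makes can be sketched by computing the possible return values of `getNull` with and without knowledge of the argument. The helper names below are hypothetical; the sketch hard-codes the semantics of `getNull` rather than analyzing its body.

```cpp
#include <set>

// Possible return values of getNull(a), which returns a ? 0 : 1.
// argKnown == false models an analysis that ignores the calling context.
std::set<int> getNullReturns(bool argKnown, int arg) {
    if (!argKnown) return std::set<int>{0, 1};
    return std::set<int>{arg ? 0 : 1};
}

// True if the divisor can only be zero.
bool divisorDefinitelyZero(const std::set<int>& values) {
    return values.size() == 1 && *values.begin() == 0;
}
```

In the `case 5` context the argument is known to be 5, so the only possible return value is 0 and the division can be reported with certainty; without the context the result is either 0 or 1.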

Cross translation unit (TU) analysis

● What happens when the definition of a function is in a separate translation unit?

void f(int *x);

void g(int *x) {

    if (*x > 0) {

        f(x);

        f(x);

    }

}

At the first call to f: the pointed value is known to be positive.

At the second call to f: the pointed value is unknown.

Another TU:

void f(int *x) {

    *x = -(*x);

}
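What the analyzer knows at the second call can be sketched with a sign domain for the pointed value. When the body of `f` is not visible, the call has to invalidate the value; when it is visible, the negation can be modeled. The enum `Sign` and the function names are assumptions for this illustration.

```cpp
// Abstract sign of the value pointed to by x.
enum class Sign { Negative, Zero, Positive, Unknown };

// Effect of *x = -(*x) on the sign (overflow is ignored in this sketch).
Sign negateSign(Sign s) {
    switch (s) {
        case Sign::Negative: return Sign::Positive;
        case Sign::Positive: return Sign::Negative;
        case Sign::Zero:     return Sign::Zero;
        case Sign::Unknown:  return Sign::Unknown;
    }
    return Sign::Unknown;
}

// Sign of *x after one call to f(x).
// bodyVisible == false models analyzing a single translation unit:
// the callee may write anything through the pointer.
Sign afterCallToF(Sign before, bool bodyVisible) {
    return bodyVisible ? negateSign(before) : Sign::Unknown;
}

// Sign of *x at the second call in g(), where *x > 0 holds on entry
// to the if body.
Sign signAtSecondCall(bool bodyVisible) {
    return afterCallToF(Sign::Positive, bodyVisible);
}
```

With only the declaration of `f` available the second call sees an unknown value; with the definition from the other TU available it sees a negative one.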

Clang tools

● Clang Static Analyzer

– Symbolic execution

– Context sensitive

– No cross translation unit analysis (yet?)

● Compiler warnings

– Flow sensitive algorithms

● Clang Tidy

– AST Matching

– Source Rewrites

● Clang Format

– Works on tokens

– Only formatting

CodeChecker

● Bug management tool for Clang Static Analyzer and Clang-tidy

● Supported architectures

– Linux

– OSX (ongoing)

● Available here:

https://github.com/Ericsson/codechecker

CodeChecker

● Support multiple analyzers

– Clang Static Analyzer

– Clang-tidy

● Store results (SQLite/PostgreSQL)

● Compare analysis runs (new/resolved bugs)

● Filter bug reports (severity, filename …)

● Suppress false positives

● ...
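Comparing two analysis runs can be pictured as a set difference over report identifiers. The sketch below is not CodeChecker's implementation; it assumes that every report has some stable string identifier, and `RunDiff` and `compareRuns` are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Result of comparing two runs, by report identifier.
struct RunDiff {
    std::vector<std::string> newBugs;       // only in the latest run
    std::vector<std::string> resolvedBugs;  // only in the baseline run
};

// Assumption: each report has a stable identifier that survives
// unrelated code changes (for example a hash of the bug path).
RunDiff compareRuns(const std::set<std::string>& baseline,
                    const std::set<std::string>& latest) {
    RunDiff diff;
    std::set_difference(latest.begin(), latest.end(),
                        baseline.begin(), baseline.end(),
                        std::back_inserter(diff.newBugs));
    std::set_difference(baseline.begin(), baseline.end(),
                        latest.begin(), latest.end(),
                        std::back_inserter(diff.resolvedBugs));
    return diff;
}
```

Reports present in both runs appear in neither list, which is what makes it practical to show only the newly introduced findings.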

Static Analysis Process (architecture diagram)

● Input: build system (make, ...) through the build logger, or CMake; both lead to a compile command JSON

● CodeChecker analysis manager: processes configurations and analysis results

● Static analyzers: ClangSA, Clang-tidy, driven by analyzer/checker/build configurations

● Report manager: store/compare/filter, backed by SQLite/PostgreSQL

● Thrift API (store/view): web client, command line client, new clients

CodeChecker DEMO

Let's implement a checker!