Elsa/Oink/Cqual++: Open-Source Static Analysis for C++ Scott McPeak Daniel Wilkerson work with Rob Johnson CodeCon 2006


Elsa/Oink/Cqual++: Open-Source Static Analysis for C++

Scott McPeak Daniel Wilkerson

work with Rob Johnson

CodeCon 2006


Goals

• Build extensible infrastructure to

• Find certain categories of bugs
  – Exhaustively, within some constraints

• At compile time

• In real-world C and C++ programs

• Using composable analyses


Components

• Elkhound: Generalized LR Parser Generator

• Elsa: C++ Parser

• Oink: Whole-program dataflow

• Cqual++: Type qualifier analysis


Elkhound: GLR Parser Generator

• GLR eliminates the pain of LALR(1)
  – Unbounded lookahead
  – Allows ambiguous grammars!

• 10x faster than other GLR implementations
  – Novel combination of GLR and LALR(1)

• User-defined disambiguation
  – Early: during parsing
  – Late: after generating AST w/ambiguities



Example: ‘>’ ambiguity

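new C < 3 > + 4 > + 5 ;

Correct: the Type is C < 3 >, its template argument is the Expr 3, and "+ 4 > + 5" continues the enclosing expression.

Incorrect: the Type is C < 3 > + 4 >, whose template argument would be the Expr 3 > + 4. That argument contains an unparenthesized ‘>’ symbol.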


Example: Type vs. Variable

• In C & C++, sometimes hard to tell whether a name refers to a type or a variable

(a) & (b)   parsed as   Expr & Expr   (bitwise AND of two variables)

or

(a) & (b)   parsed as   (Type) Expr   (cast of &(b) to type a)


int a;                 // hidden
class C {
  int f(int b) { return (a) & (b); }
  typedef int a;       // visible
};
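
The two readings of (a) & (b) can be compiled side by side. This is a minimal sketch: the namespaces as_variable and as_type are hypothetical and not on the slide, and std::intptr_t stands in for the slide's int so that the pointer-to-integer cast fits on 64-bit targets.

```cpp
#include <cassert>
#include <cstdint>

namespace as_variable {
int a = 6;                          // here `a` names a variable
int f(int b) { return (a) & (b); }  // Expr & Expr: bitwise AND
}  // namespace as_variable

namespace as_type {
typedef std::intptr_t a;            // here `a` names a type
a f(int& b) { return (a) & (b); }   // (Type) Expr: cast of &b to a
}  // namespace as_type
```

The function bodies are token-for-token identical; only the declaration of a decides which parse is valid, which is why the parser alone cannot settle the question.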


Elsa: Extensible C++ Front-end

• Parses ANSI C++ with GNU extensions

• Uses GLR to handle the ambiguities

• Extensible components:
  – flex lexer
  – Elkhound parser
  – AST defined with custom tool
  – Type checker


The Elsa Block Diagram

preproc’d source -> Lexer -> token stream -> Parser -> possibly ambiguous AST -> Type Checker -> annotated unambiguous AST -> Post Process -> final AST

No lexer feedback hack!


Extending the Syntax

• ANSI or GNU? Both!
  – Declarative language
  – Extend simply by concatenating

ANSI Base:

nonterm ConditionalExp {
  -> Exp {...}
  -> Exp "?" Exp ":" Exp {...}
}

GNU Extension:

nonterm ConditionalExp {
  -> Exp "?" ":" Exp {...}
}


Declarative Abstract Syntax

class Statement (SourceLoc loc) {
  -> S_compound(ASTList<Statement> stmts);
  -> S_if(Condition cond, Statement thenBranch, Statement elseBranch);
  -> S_while(Condition cond, Statement body);
  // ...
}

Callouts:
• superclass name: Statement
• superclass ctor parameter: SourceLoc loc
• subclass names: S_compound, S_if, S_while
• subclass ctor parameter: e.g. Condition cond
• subclass ctor list parameter: ASTList<Statement> stmts


Extending the Abstract Syntax

• ANSI or GNU? Both!
  – Declarative language
  – Extend simply by concatenating

ANSI Base:

class Statement {
  -> S_decl(Declaration decl);
  -> S_expr(Expression expr);
  -> S_if(...);
  -> S_for(...);
}

GNU Extension (GNU nested functions):

class Statement {
  -> S_function(Function f);
}


Semantic Analysis

• Disambiguate

• Compute types

• Resolve overloading

• Insert implicit conversions

• Instantiate templates


Disambiguation

Ambiguous syntax example: return (x)(y);

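Figure: ambiguous AST for the statement above. The expr child of S_return has two alternatives joined by an ambiguity link:

• E_cast, with type TypeId x and expr E_variable y
• E_funCall, with func E_variable x and arg E_variable y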


Lowered Output: Simplified C++

• Original or Lowered output can be printed

• Lowering always done:
  – Templates are instantiated
  – Implicit type conversions inserted

• Lowering optionally done:
  – Implicit member functions created
  – Implicit ctor/dtor calls inserted


C++ or XML, In and Out

Figure: Elsa reads C++ or XML and writes C++ or XML.

First pass renders to a canonical form. Serialization commutes with lowering.


Cqual++: Dataflow

• Dataflow Analysis on Type Qualifiers

• Successor to Cqual: Jeff Foster, Alex Aiken

char $tainted *getenv();
void printf(char $untainted *fmt, ...);

int main() {
  char *x = getenv("foo");
  printf(x);   // $tainted data reaches a $untainted parameter
}
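
One way to picture this check is as reachability in a flow graph: each assignment or argument pass adds an edge, and a warning is due when a $tainted source can reach a $untainted sink. The sketch below is illustrative only. FlowGraph, addFlow, reaches, and the node names are hypothetical and are not Cqual++'s actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical flow graph: one node per qualified value, one edge per
// assignment or argument pass. Not Cqual++'s real representation.
class FlowGraph {
public:
  explicit FlowGraph(std::size_t nodeCount) : succ_(nodeCount) {}

  // Record that data flows from node `from` to node `to`.
  void addFlow(std::size_t from, std::size_t to) {
    assert(from < succ_.size() && to < succ_.size());
    succ_[from].push_back(to);
  }

  // Breadth-first search: can data at `source` reach `sink`?
  bool reaches(std::size_t source, std::size_t sink) const {
    assert(source < succ_.size() && sink < succ_.size());
    std::vector<bool> seen(succ_.size(), false);
    std::queue<std::size_t> work;
    seen[source] = true;
    work.push(source);
    while (!work.empty()) {
      std::size_t n = work.front();
      work.pop();
      if (n == sink) return true;
      for (std::size_t next : succ_[n]) {
        if (!seen[next]) {
          seen[next] = true;
          work.push(next);
        }
      }
    }
    return false;
  }

private:
  std::vector<std::vector<std::size_t>> succ_;
};

// Nodes for the slide's example (names are assumptions of this sketch).
enum : std::size_t { GETENV_RESULT, X, PRINTF_FMT, NODE_COUNT };

// Build the graph for: char *x = getenv("foo"); printf(x);
inline FlowGraph slideExample() {
  FlowGraph g(NODE_COUNT);
  g.addFlow(GETENV_RESULT, X);  // char *x = getenv("foo");
  g.addFlow(X, PRINTF_FMT);     // printf(x);
  return g;
}
```

Calling slideExample().reaches(GETENV_RESULT, PRINTF_FMT) answers whether the tainted result of getenv() can arrive at printf's format parameter. Only the two endpoints carry annotations; x needs none.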


Feature: Polymorphic Dataflow

int f(int x) { return x; }

int main() {
  int $tainted t = ...;
  int a = f(t);
  int $untainted u = f(3);
}
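
The difference polymorphism makes can be sketched with two small edge lists. In the monomorphic version both calls share one parameter node and one result node for f, so taint from t leaks to u. In the polymorphic version each call site gets its own copy. The node names and the reaches helper are hypothetical, chosen for this sketch rather than taken from Cqual++.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // data flows from first to second

// Naive reachability over an edge list; fine for a handful of nodes.
bool reaches(const std::vector<Edge>& edges, int source, int sink) {
  std::set<int> seen{source};
  bool grew = true;
  while (grew) {
    grew = false;
    for (const Edge& e : edges) {
      if (seen.count(e.first) && seen.insert(e.second).second) grew = true;
    }
  }
  return seen.count(sink) > 0;
}

// Node numbering is hypothetical, chosen only for this sketch.
enum {
  T, A, U, LIT3,
  F_X, F_RET,              // shared nodes for f (monomorphic)
  F_X_CALL1, F_RET_CALL1,  // copy of f for the call f(t)
  F_X_CALL2, F_RET_CALL2   // copy of f for the call f(3)
};

// Monomorphic: both calls share the single F_X / F_RET pair.
std::vector<Edge> monomorphic() {
  return {{F_X, F_RET},              // return x;
          {T, F_X}, {F_RET, A},      // int a = f(t);
          {LIT3, F_X}, {F_RET, U}};  // int $untainted u = f(3);
}

// Polymorphic: each call site gets its own copy of f's parameter and result.
std::vector<Edge> polymorphic() {
  return {{F_X_CALL1, F_RET_CALL1}, {T, F_X_CALL1}, {F_RET_CALL1, A},
          {F_X_CALL2, F_RET_CALL2}, {LIT3, F_X_CALL2}, {F_RET_CALL2, U}};
}
```

With monomorphic(), T reaches U, which would be a false positive for this program. With polymorphic(), T reaches A but not U.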


Feature: “Funky Qualifiers”: Fake Function Bodies

char $_1_2 *strcat(char $_1_2 *dest, const char $_1 *src);

int main() {
  char $tainted *x;
  char $untainted *y;
  strcat(y, x);
}

{1} ⊆ {1,2}
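
The subset annotation on this slide suggests a simple rule: a position labelled $_1 flows into a position labelled $_1_2 because the index set {1} is contained in {1,2}. The sketch below encodes that reading. It is an interpretation of the slide, not a statement of how Cqual++ implements it, and IndexSet, flowsTo, and strcatSrcReachesDest are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// A "funky" qualifier such as $_1_2 is modelled here by its index set {1, 2}.
using IndexSet = std::set<int>;

// Assumed rule, read off the slide's subset annotation: data flows from a
// position labelled `from` to one labelled `to` when `from` is a subset of `to`.
bool flowsTo(const IndexSet& from, const IndexSet& to) {
  return std::includes(to.begin(), to.end(), from.begin(), from.end());
}

// Does calling strcat(dest, src) let src's data reach dest?
// Signature from the slide: dest is $_1_2, src is $_1.
bool strcatSrcReachesDest() {
  const IndexSet src{1};
  const IndexSet dest{1, 2};
  assert(!flowsTo(dest, src));  // nothing flows back from dest into src
  return flowsTo(src, dest);
}
```

Under this rule the call strcat(y, x) on the slide sends the $tainted contents of x into the $untainted buffer y, which is the flow the annotated prototype is meant to expose without analyzing strcat's body.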


Feature: Separate Compilation for Scalability

• “Compile” each file to a dataflow graph
  – only flow behavior between external symbols matters
  – compress by finding smaller graph with same flow behavior; typically saves factor of 12

• “Link” each graph
  – AST is gone at linking so we save even more space
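
A simple way to see why only external symbols matter: replace each per-file graph with a summary that has one edge between two external nodes wherever the original graph had a path. The sketch below does exactly that. It is one straightforward compression, not necessarily the one Oink uses, and reachableFrom and summarize are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // data flows from first to second

// All nodes reachable from `start` by following one or more edges.
std::set<int> reachableFrom(const std::vector<Edge>& edges, int start) {
  assert(start >= 0);  // node ids are non-negative in this sketch
  std::set<int> seen;
  std::vector<int> work{start};
  while (!work.empty()) {
    int n = work.back();
    work.pop_back();
    for (const Edge& e : edges) {
      if (e.first == n && seen.insert(e.second).second) work.push_back(e.second);
    }
  }
  return seen;
}

// Keep only external nodes, with an edge wherever the original had a path.
std::set<Edge> summarize(const std::vector<Edge>& edges,
                         const std::set<int>& externals) {
  std::set<Edge> summary;
  for (int src : externals) {
    for (int dst : reachableFrom(edges, src)) {
      if (externals.count(dst) && dst != src) summary.insert(Edge(src, dst));
    }
  }
  return summary;
}
```

For a file whose graph is 0 -> 1 -> 2 -> 3 plus 1 -> 4, with only 0 and 3 external, the summary is the single edge 0 -> 3. The internal nodes and the dead-end branch disappear, and linking then only has to join summaries.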


Non-Feature: Cqual++ Is Not Flow-Sensitive

q = p;
... time passes ...
p->s = read_from_network();
use_in_untrusting_way(p->s);

// does p == q still??
q->s = "innocuous";
use_in_trusting_way(p->s);   // $tainted??
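
The contrast can be sketched with a toy model of straight-line code. A flow-insensitive analysis ignores statement order, so one tainted store taints the location for the whole program. A flow-sensitive analysis would look at the last store only, which requires knowing that q still aliases p. The model below is an illustration under that aliasing assumption; Assign, taintedAnywhere, and taintedAtEnd are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Assign {
  std::string location;  // abstract location written, e.g. "p->s"
  bool tainted;          // whether the stored value is $tainted
};

// Flow-insensitive view: statement order is ignored, so a location is
// tainted if ANY assignment to it stores tainted data.
bool taintedAnywhere(const std::vector<Assign>& stmts, const std::string& loc) {
  for (const Assign& a : stmts) {
    if (a.location == loc && a.tainted) return true;
  }
  return false;
}

// Flow-sensitive view for straight-line code: only the LAST assignment counts.
bool taintedAtEnd(const std::vector<Assign>& stmts, const std::string& loc) {
  bool result = false;
  for (const Assign& a : stmts) {
    if (a.location == loc) result = a.tainted;
  }
  return result;
}

// The slide's two stores, assuming q still aliases p so both write "p->s".
std::vector<Assign> slideStatements() {
  return {{"p->s", true},    // p->s = read_from_network();
          {"p->s", false}};  // q->s = "innocuous";
}
```

Here taintedAnywhere reports the final use as tainted while taintedAtEnd does not. The flow-insensitive answer is the conservative one: it stays sound even if the aliasing assumption turns out to be false.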


What Exactly Is ‘Data-Flow’?

char *launderString(char *in) {
  int len = strlen(in);
  char *out = malloc(len+1);
  for (int i=0; i<len; ++i) {
    out[i] = 0;
    for (int j=0; j<8; ++j)
      if (in[i] & (1<<j)) out[i] |= (1<<j);
  }
  out[len] = '\0';
  return out;
}
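
A runnable C++ adaptation makes the point concrete: the output equals the input byte for byte, yet no value is ever assigned from in to out. Each bit is only tested in a branch. The explicit malloc cast, the allocation check, and the helper launderPreservesContent are additions for this sketch and are not on the slide.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// The slide's function, adapted to compile as C++ (explicit malloc cast,
// allocation check). The caller owns the returned buffer and must free() it.
char *launderString(const char *in) {
  std::size_t len = std::strlen(in);
  char *out = static_cast<char *>(std::malloc(len + 1));
  assert(out != nullptr);
  for (std::size_t i = 0; i < len; ++i) {
    out[i] = 0;
    for (int j = 0; j < 8; ++j) {
      // No value is assigned from in[] to out[]; the bit is only tested.
      if (in[i] & (1 << j)) out[i] |= (1 << j);
    }
  }
  out[len] = '\0';
  return out;
}

// Convenience check: does laundering reproduce the input byte for byte?
bool launderPreservesContent(const char *text) {
  char *copy = launderString(text);
  bool same = std::strcmp(copy, text) == 0;
  std::free(copy);
  return same;
}
```

An analysis that tracks only assignments sees constants flowing into out, so whether the result counts as tainted depends on whether control dependence is included in the definition of data flow.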


Application: Finding Format-String Vulnerabilities

• printf() is an interpreter

• the format string is a program
  – %n stores the number of bytes written so far into the memory pointed to by the arg
  – ex: printf("stuff%n", p) means *p = 5

• if no argument p, printf() writes through some pointer on the stack
  – do not allow untrusted data in first arg to printf
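
The usual remedy is to pass untrusted text as data to a fixed format, never as the format itself. The sketch below shows the safe form only; the unsafe form, printf(untrusted), is deliberately not executed. renderSafely is a hypothetical helper, and the 256-byte buffer is an arbitrary choice that truncates longer input.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Render untrusted text by passing it as DATA to a fixed "%s" format,
// so any '%' directives inside it are never interpreted.
// Input longer than 255 bytes is truncated by snprintf.
std::string renderSafely(const std::string& untrusted) {
  char buf[256];
  int n = std::snprintf(buf, sizeof buf, "%s", untrusted.c_str());
  assert(n >= 0);  // snprintf reports an encoding error as a negative value
  return std::string(buf);
}
```

Given the input "%n%s%x", renderSafely returns the same characters unchanged, because the format string that printf-style functions interpret is the constant "%s". In qualifier terms, the constant format is $untainted and the untrusted text only ever appears as an argument.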


Application: Finding User-Kernel Vulnerabilities

• Kernel must check user pointers are valid
  – must point to memory mapped into user process’s address space
  – otherwise could manipulate the kernel data

• This is also a dataflow/taint analysis


Rob’s Cqual Linux User-Kernel Results

• 2.4.20, full config, 7 bugs, 275 false pos.

• 2.4.23, full config, 6 bugs, 264 false pos.

• including other trials on same kernels:
  – found 17 different security vulnerabilities
  – found bugs missed by other tools and manually
  – all but one bug confirmed exploitable
  – significant “bug churn” across kernel versions


Linus’s “Sparse” Tool for User-Kernel Vulnerabilities

• Linus also has a tool using type qualifiers
  – it requires manual annotation of every var

• In contrast, Cqual++ infers the qualifiers
  – only sources and sinks need be annotated
  – and any “sanitizer” functions

• Linus says this “is not the C way”
  – ok, he can write all the annotations


Future Application: Finding Character-Set Confusions

• Microsoft confusing ASCII and UCS2

• Mozilla has 20-ish different character sets

• they should only flow together through conversion functions

• if array sizes differ, confusions can be a security hole too


Oink Vision: Composable Analysis Tools

• Compilers refuse to compile bugs
  – well, some classes of bugs
  – and you may have to wait until tomorrow morning to find out

• Correctness analysis is expected as part of any compiler toolchain

• The analyses are composable and extensible