Guarding Vulnerable Code: Module 1: Sanitization · 21 Type confusion detection* A static cast is checked only at compile time – Fast but no runtime guarantees Dynamic casts are

1

Guarding Vulnerable Code: Module 1: Sanitization

Mathias Payer, Purdue University
http://hexhive.github.io

2

Vulnerabilities everywhere?

3

Common Languages: TIOBE’18

Jul 2018   Jul 2017   Change   Language      Ratings   Change
1          1                   Java          16.139%   +2.37%
2          2                   C             14.662%   +7.34%
3          3                   C++           7.615%    +2.04%
4          4                   Python        6.361%    +2.82%
5          7          +        VB .NET       4.247%    +1.20%
6          5          -        C#            3.795%    +0.28%
7          6          -        PHP           2.832%    -0.26%
8          8                   JavaScript    2.831%    +0.22%
9          -          ++       SQL           2.334%    +2.33%
10         18         ++       Objective-C   1.453%    -0.44%

4

Software is highly complex

● Google Chrome: 76 MLoC
● Gnome: 9 MLoC
● Xorg: 1 MLoC
● glibc: 2 MLoC
● Linux kernel: 17 MLoC

Low-level languages (C/C++) trade type safety and memory safety for performance

5

Defense: Testing vs. Mitigations

Mitigations
● Stop exploitation
● Always on
● Low overhead

Software Testing
● Discover bugs
● Development tool
● Result oriented

6

Memory Corruption

7

Memory error: invalid dereference

Dangling pointer (temporal):

free(foo);
*foo = 23;

Out-of-bounds pointer (spatial):

char foo[40];
foo[42] = 23;

Violation iff: pointer is read, written, or freed
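The two error classes can be told apart by what metadata a checker needs: bounds for spatial errors, liveness for temporal ones. The following is a minimal sketch under that assumption; `Allocation`, `check_access`, and `release` are hypothetical names for illustration and do not correspond to any sanitizer's API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical allocation record: carries the bounds and liveness of an object.
// Real sanitizers keep such metadata elsewhere (for example in shadow memory).
struct Allocation {
    std::vector<char> bytes;  // backing storage, defines the valid bounds
    bool live;                // cleared once the object is "freed"
    explicit Allocation(std::size_t n) : bytes(n), live(true) {}
};

enum class Access { Ok, OutOfBounds, UseAfterFree };

// Classify a one-byte access at `offset` into allocation `a`.
Access check_access(const Allocation& a, std::ptrdiff_t offset) {
    if (!a.live) return Access::UseAfterFree;  // temporal violation
    if (offset < 0 || static_cast<std::size_t>(offset) >= a.bytes.size())
        return Access::OutOfBounds;            // spatial violation
    return Access::Ok;
}

// Models free(): the storage is no longer valid to access.
void release(Allocation& a) { a.live = false; }

// Mirrors the two snippets on the slide.
void run_memory_example() {
    Allocation foo(40);                                     // char foo[40];
    assert(check_access(foo, 39) == Access::Ok);
    assert(check_access(foo, 42) == Access::OutOfBounds);   // foo[42] = 23;
    release(foo);                                           // free(foo);
    assert(check_access(foo, 0) == Access::UseAfterFree);   // *foo = 23;
}
```

Note that the check runs at the access, not when the pointer is created: an out-of-bounds or dangling pointer only becomes a violation once it is used.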

8

Type Confusion

9

Type confusion through downcasts

[Figure: class hierarchy. Greeter and Exec both derive from Base.]

Greeter *g = new Greeter();
Base *b = static_cast<Base*>(g);   // √ upcast
Exec *e = static_cast<Exec*>(b);   // X downcast to the wrong type

10

C++ casting operations

● static_cast<ToClass>(Object)

– Compile time check

– No runtime type information

● dynamic_cast<ToClass>(Object)

– Runtime check

– Requires Runtime Type Information (RTTI)

– Not used in performance critical code
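To make the difference concrete, here is a small sketch of the runtime check that `dynamic_cast` performs. The class names follow the Base/Greeter/Exec hierarchy used in these slides; the virtual destructor is an assumption added so that `Base` is polymorphic, which `dynamic_cast` requires.

```cpp
#include <cassert>

// Base must be polymorphic (have at least one virtual function) for dynamic_cast.
struct Base { virtual ~Base() = default; };
struct Greeter : Base {};
struct Exec : Base {};

// Returns true only if `b` really points to a Greeter (or a class derived from it).
bool is_greeter(Base* b) {
    return dynamic_cast<Greeter*>(b) != nullptr;
}

void run_cast_example() {
    Greeter g;
    Exec e;
    Base* b1 = &g;
    Base* b2 = &e;
    assert(is_greeter(b1));   // the object is a Greeter: cast succeeds
    assert(!is_greeter(b2));  // the object is an Exec: cast yields nullptr
    // static_cast<Greeter*>(b2) would also compile, but nothing is checked
    // at runtime and using the result is undefined behavior.
}
```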

11

Static cast

movq -24(%rbp), %rax   # Load pointer
                       # Type “check”
movq %rax, -40(%rbp)   # Store pointer

Base *b = …;
a = static_cast<Greeter*>(b);

12

Dynamic cast (O2)

leaq _ZTI7Greeter(%rip), %rdx
leaq _ZTI4Base(%rip), %rsi
xorl %ecx, %ecx
movq %rbp, %rdi            # Load pointer
call __dynamic_cast@PLT    # Type check

Base *b = …;
a = dynamic_cast<Greeter*>(b);

13

Type confusion

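class Base {
 public:
  int x;
};
class Greeter: public Base {
 public:
  int y;
  virtual void Hi();
};
…
Base *Bptr = new Base();
Greeter *Gptr;
Gptr = static_cast<Greeter*>(Bptr); // Type confusion
Gptr->y = 0x43;                     // Memory safety violation!
Gptr->Hi();                         // Control-flow hijacking

[Figure: object layouts. A Base object (B) holds only x; a Greeter object (G) additionally holds vtable* and y. Bptr points to a Base object, so the vtable* and y fields seen through Gptr lie outside the allocated object.]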

14

Type Confusion Demo

15

C++ virtual dispatch

class Base { … };
class Exec: public Base {
 public:
  virtual void exec(const char *prg) { system(prg); }
};
class Greeter: public Base {
 public:
  virtual void sayHi(const char *str) { std::cout << str << std::endl; }
};
Greeter *greeter = new Greeter();
greeter->sayHi("Oh, hello there!");

[Figure: class hierarchy. Greeter and Exec both derive from Base.]

16

Simple exploitation demo

int main() {
  Base *b1 = new Greeter();
  Base *b2 = new Exec();
  Greeter *g;

  g = static_cast<Greeter*>(b1);
  g->sayHi("Greeter says hi!");   // g[0][0](str);

  g = static_cast<Greeter*>(b2);
  g->sayHi("/usr/bin/xcalc");     // g[0][0](str);

  delete b1;
  delete b2;
  return 0;
}

[Figure: the vtable* of b1 points to GreeterT and the vtable* of b2 points to ExecT.]
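Why the second call ends up in the wrong function can be modeled without any undefined behavior by spelling out the dispatch by hand. The sketch below assumes a simplified object layout (one table pointer per object, one slot per virtual function); `Object`, `call_slot0`, and the recording variable `last_call` are hypothetical, and the two functions only record their argument instead of printing or calling `system()`.

```cpp
#include <cassert>
#include <string>

// Records which "method" ran; stands in for std::cout and system().
static std::string last_call;

using Method = void (*)(const char*);

void greeter_sayHi(const char* s) { last_call = std::string("sayHi:") + s; }
void exec_exec(const char* s)     { last_call = std::string("exec:") + s; }

// One table per class, one slot per virtual function.
static const Method GreeterT[] = { greeter_sayHi };
static const Method ExecT[]    = { exec_exec };

// Simplified object: it starts with a pointer to its class's table.
struct Object { const Method* vtable; };

// Roughly what is emitted for g->sayHi(str), i.e. g[0][0](str):
// load the table pointer, load slot 0, call it. No type is consulted.
void call_slot0(const Object* obj, const char* str) { obj->vtable[0](str); }

void run_dispatch_example() {
    Object b1{GreeterT};
    Object b2{ExecT};
    call_slot0(&b1, "Greeter says hi!");
    assert(last_call == "sayHi:Greeter says hi!");
    call_slot0(&b2, "/usr/bin/xcalc");   // same call site, different table
    assert(last_call == "exec:/usr/bin/xcalc");
}
```

The call site is identical in both cases; only the table the object points to decides which function runs, which is why a wrongly typed pointer is enough to redirect control flow.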

17

Sanitization

18

Problem: broken abstractions?

C/C++

void log(int a) {
  printf("Log: ");
  printf("%d", a);
}
void (*fun)(int) = &log;
void init() {
  fun(15);
}

ASM

log:
  ...
fun:
  .quad log
init:
  ...
  movl $15, %edi
  movq fun(%rip), %rax
  call *%rax

19

LLVM Sanitization

● Test cases detect bugs through assertions, segmentation faults, traps, exceptions

● Enforce stronger policies during testing!
– Address Sanitizer: memory safety

– Leak Sanitizer: memory leaks

– Memory Sanitizer: uninitialized memory

– UBSan: undefined behavior

– Thread Sanitizer: data races

– HexVASAN: variadic argument checker

– HexType: type safety

20

Type Safety

21

Type confusion detection*

● A static cast is checked only at compile time
– Fast but no runtime guarantees

● Dynamic casts are checked at runtime
– High overhead, limited to polymorphic classes

● HexType design:
– Conceptually check all casts dynamically

– Aggressively optimize design and implementation

* TypeSanitizer: Practical Type Confusion Detection. Istvan Haller, Yuseok Jeon, Hui Peng, Mathias Payer, Herbert Bos, Cristiano Giuffrida, Erik van der Kouwe. In CCS'16
* HexType: Efficient Detection of Type Confusion Errors for C++. Yuseok Jeon, Priyam Biswas, Scott A. Carr, Byoungyoung Lee, and Mathias Payer. In CCS'17

22

Making type checks explicit

● Enforce runtime check at all cast sites
– static_cast<ToClass>(Object)

– dynamic_cast<ToClass>(Object)

– reinterpret_cast<ToClass>(Object)

– (ToClass)(Object)

● Build global type hierarchy
● Keep track of the allocation type of each object

– Must instrument all forms of allocation

– Requires disjoint metadata
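The three ingredients above (a global type hierarchy, per-object allocation types kept in disjoint metadata, and a check at each cast site) can be sketched as follows. This is an illustration only, not HexType's implementation: `TypeRuntime` and its methods are hypothetical names, types are identified by strings, and single inheritance without cycles is assumed.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical runtime state, kept apart from the objects it describes.
struct TypeRuntime {
    std::unordered_map<std::string, std::string> parent;      // class -> base class
    std::unordered_map<const void*, std::string> alloc_type;  // object -> allocated type

    void add_class(const std::string& cls, const std::string& base) { parent[cls] = base; }

    // Every form of allocation and deallocation must call these hooks.
    void on_alloc(const void* obj, const std::string& type) { alloc_type[obj] = type; }
    void on_free(const void* obj) { alloc_type.erase(obj); }

    // A cast to `target` is valid if the allocated type is `target`
    // or (transitively) derives from it.
    bool check_cast(const void* obj, const std::string& target) const {
        auto it = alloc_type.find(obj);
        if (it == alloc_type.end()) return true;  // untracked object: cannot judge
        std::string t = it->second;
        while (true) {
            if (t == target) return true;
            auto p = parent.find(t);
            if (p == parent.end()) return false;  // reached a root without a match
            t = p->second;
        }
    }
};

void run_type_check_example() {
    TypeRuntime rt;
    rt.add_class("Greeter", "Base");
    rt.add_class("Exec", "Base");
    int g_obj = 0, e_obj = 0;  // stand-ins for real objects; only addresses matter
    rt.on_alloc(&g_obj, "Greeter");
    rt.on_alloc(&e_obj, "Exec");
    assert(rt.check_cast(&g_obj, "Base"));      // upcast: fine
    assert(rt.check_cast(&g_obj, "Greeter"));   // matching downcast: fine
    assert(!rt.check_cast(&e_obj, "Greeter"));  // wrong downcast: type confusion
}
```

Keeping the metadata in a side table means object layouts stay unchanged, at the cost of a lookup per checked cast.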

23

HexType: design

[Figure: HexType pipeline. Source code → Clang (type hierarchy information) → LLVM pass (instrumentation for type casting verification) → link with the HexType runtime library → HexType binary.]

24

HexType: aggressive optimization

● Limit tracing to unsafe types
– Remove tracing of types that are never cast

● Limit checking to unsafe casts
– Remove statically verifiable casts

● No more RTTI for dynamic casts
– Replace dynamic casts with fast lookup

25

Demo Time!

26

HexType coverage

27

Newly discovered bugs

● Discovered seven new vulnerabilities:

[Figure: class hierarchies involved in the type confusions. Apache Xerces C++: DOMNode, DOMCharacterData, DOMElement, DOMElementImpl, DOMText, DOMTextImpl. Qt base library: QMapNodeBase, QMapNode.]

28

Sanitizer Summary: Type Safety

● Type confusion fundamental in today’s exploits
● Existing sanitizers are incomplete, partial, slow

● HexType
– (Almost) full coverage (2-6x increase)

– Reasonable overhead (SPEC CPU: 0-32x improvement, Firefox: 0-0.5x slowdown)

– Future work: remaining coverage, optimizations

29

T-Fuzz

30

Fuzzing Challenges

● Challenges
– Shallow coverage

– Hard to find “deep” bugs

● Root cause
– Fuzzer-generated inputs cannot bypass complex sanity checks in the target program

– Existing work limits itself to input generation

[Figure: control flow from start to end through check1, check2, and check3. Shallow code paths lie before the checks; the bug sits on the deep code paths behind them.]

31

T-Fuzz: Fuzz the Program!

● Option 1: generate input to bypass checks by heavy-weight program analysis techniques
– Driller (concolic analysis)

– VUzzer (dynamic taint analysis)

● Our idea: remove program’s sanity checks
– Checks filter orthogonal input, e.g., magic values, checksums, or hashes (Non-Critical Check, NCC)

– Insight: removing NCCs is safe

if (strncmp(hdr, "ELF", 3) == 0) {
  // main program logic
} else {
  error();
}
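The effect of removing such a check can be shown at source level. The sketch below is an assumption-laden illustration: T-Fuzz works on binaries, whereas here the negation is written by hand, and the functions `original` and `transformed` are hypothetical and simply report which branch an input reaches.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Original program: only inputs carrying the magic value reach the main logic.
std::string original(const char* hdr) {
    if (std::strncmp(hdr, "ELF", 3) == 0) return "main logic";
    return "error";
}

// Transformed program: the same check with its condition negated, so inputs
// the fuzzer already produces (without the magic value) reach the main logic.
std::string transformed(const char* hdr) {
    if (std::strncmp(hdr, "ELF", 3) != 0) return "main logic";
    return "error";
}

void run_ncc_example() {
    assert(original("AAAA") == "error");          // fuzzer input is filtered out
    assert(transformed("AAAA") == "main logic");  // now passes the negated check
    assert(original("ELF!") == "main logic");     // a valid header still works originally
}
```

A crash found in the transformed program is only a candidate: it still has to be checked whether some input with a valid header triggers the same bug in the original program.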

32

Design and Implementation

● Fuzzer generates inputs
● When “stuck”
– Detect NCCs*

● Transform program
● Verify crashes

*Approximation of NCCs: edges in the CFG connecting covered/uncovered nodes

[Figure: T-Fuzz workflow. The fuzzer (e.g., AFL) produces inputs and crashing inputs; the program transformer turns inputs into transformed programs for the fuzzer; the crash analyzer separates crashing inputs into bug reports and false positives.]

33

Detecting NCCs

● Approximate NCCs as edges connecting covered and uncovered nodes in CFG
– Over-approximation, may contain false positives

– Lightweight and simple to implement

[Figure: CFG with covered nodes, uncovered nodes, and the NCC candidate edges between them.]
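The approximation amounts to a single pass over the CFG edges. A minimal sketch, assuming the CFG is given as a list of directed edges between integer node IDs and coverage as a set of node IDs (both representations are assumptions for illustration):

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (source node, target node)

// NCC candidates: edges that leave a covered node and enter an uncovered one.
std::vector<Edge> ncc_candidates(const std::vector<Edge>& cfg_edges,
                                 const std::set<int>& covered) {
    std::vector<Edge> out;
    for (const Edge& e : cfg_edges) {
        bool src_covered = covered.count(e.first) > 0;
        bool dst_covered = covered.count(e.second) > 0;
        if (src_covered && !dst_covered) out.push_back(e);
    }
    return out;
}

void run_ncc_detection_example() {
    // 0: start, 1: check1, 2: error/end, 3: check2, 4: bug
    std::vector<Edge> cfg = {{0, 1}, {1, 2}, {1, 3}, {3, 2}, {3, 4}};
    std::set<int> covered = {0, 1, 2};  // the fuzzer never got past check1
    std::vector<Edge> c = ncc_candidates(cfg, covered);
    assert(c.size() == 1);
    assert(c[0] == Edge(1, 3));  // the branch of check1 that was never taken
}
```

Edges between two uncovered nodes (such as 3 → 4) are not reported yet; they become candidates only after the fuzzer reaches node 3 in a transformed program.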

34

Program Transformation

● Our approach: negate NCCs
– Simple: static binary rewriting

– Zero runtime overhead in resulting target program

– Unchanged CFG

– Trace in transformed program maps to original program

– Path constraints of original program can be recovered

[Figure: two control-flow graphs from start to end. The original check A == B becomes the negated check A != B; the true and false branches and the rest of the graph stay in place.]
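On x86, negating a check can be a one-bit patch, which is consistent with the zero-overhead and unchanged-CFG properties listed above. The sketch below relies on the x86 encoding of conditional jumps (short form: opcodes 0x70 to 0x7F; near form: 0x0F followed by 0x80 to 0x8F; the lowest opcode bit selects the negated condition, e.g. JE 0x74 and JNE 0x75). The function `negate_jcc` is a hypothetical helper, not part of T-Fuzz, and it assumes the caller already knows that `offset` is the start of an instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Negate the x86 conditional jump starting at `offset`, if there is one.
// Returns false and leaves the code untouched otherwise.
bool negate_jcc(std::vector<std::uint8_t>& code, std::size_t offset) {
    if (offset >= code.size()) return false;
    std::uint8_t op = code[offset];
    if (op >= 0x70 && op <= 0x7F) {              // short Jcc rel8
        code[offset] = static_cast<std::uint8_t>(op ^ 1);
        return true;
    }
    if (op == 0x0F && offset + 1 < code.size()) {
        std::uint8_t op2 = code[offset + 1];
        if (op2 >= 0x80 && op2 <= 0x8F) {        // near Jcc rel32
            code[offset + 1] = static_cast<std::uint8_t>(op2 ^ 1);
            return true;
        }
    }
    return false;
}

void run_negate_example() {
    std::vector<std::uint8_t> je_short = {0x74, 0x05};  // je +5
    assert(negate_jcc(je_short, 0));
    assert(je_short[0] == 0x75);                        // jne +5
    std::vector<std::uint8_t> je_near = {0x0F, 0x84, 0x10, 0x00, 0x00, 0x00};
    assert(negate_jcc(je_near, 0));
    assert(je_near[1] == 0x85);                         // jne near
    std::vector<std::uint8_t> nop = {0x90};
    assert(!negate_jcc(nop, 0));                        // not a conditional jump
}
```

Because the instruction keeps its length and its target, every address in the transformed binary maps directly back to the original one.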

35

Comparison to Symbolic Execution

● Explores all code paths, tracks constraints

● Path explosion, e.g., loops
● Each branch doubles the number of code paths
● High resource requirements
● Theoretically beautiful, limited scalability

[Figure: execution tree fanning out into (Path 1, constraint set 1) … (Path n, constraint set n).]
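The doubling can be made tangible with a short calculation. The sketch assumes the simplest case of sequential, independent two-way branches; `count_paths` is a hypothetical helper, and real programs with loops grow even less predictably.

```cpp
#include <cassert>
#include <cstdint>

// Number of distinct paths through `branches` sequential two-way branches.
// Valid for branches < 64; larger values overflow the 64-bit result.
std::uint64_t count_paths(unsigned branches) {
    if (branches == 0) return 1;           // straight-line code: one path
    return 2 * count_paths(branches - 1);  // each branch: true side + false side
}

void run_path_example() {
    assert(count_paths(1) == 2);
    assert(count_paths(10) == 1024);
    assert(count_paths(30) == 1073741824ULL);  // over a billion paths for 30 branches
}
```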

36

Comparison to Concolic Execution

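● Guided by concrete inputs
● Follows single code path, collects constraints for new code paths

● Reduced resource requirements

● Still an exponential number of paths to explore!

[Figure: execution tree in which a concrete input follows one path; at a branch, condition C1 is on the followed path and Not C1 leads to a new path.]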

37

Comparison to Driller (Fuzz & CE)

● Fuzzing until coverage wall
● When fuzzing gets “stuck”, concolic execution explores new code paths using fuzzer-generated inputs

● Limitations
– “SE & constraint solving” slows down fuzzing

– Not able to bypass “hard” checks

[Figure: Driller loop. The fuzzer mutates inputs and runs the target program to find crashes; SE & constraint solving feeds new inputs back to the fuzzer.]

38

T-Fuzz: fuzz first, solve only crashes

● Fuzzing/SE decoupled
● SE only applied to detected crashes
● For “hard” checks, T-Fuzz detects the guarded bug, but cannot verify it

[Figure: T-Fuzz in action. Inside T-Fuzz, the fuzzer and program transformation operate on the program; only the resulting crashes are passed to SE & constraint solving.]

39

Evaluation

● Implementation
– Fuzzer: shellphish fuzzer (Python wrapper of AFL)

– Program Transformer: angr tracer, radare2

– Crash Analyzer: 2k LoC Python hackery

● Evaluation
– DARPA CGC dataset

– LAVA-M dataset

– 4 real-world programs

40

DARPA CGC Dataset

● Improvement over Driller/AFL: 55 (45%) / 61 (58%)

● Driller outperforms T-Fuzz on 10 bugs
– 3 due to false crashes (L1)

– 7 due to transformation explosion (L2)

Method             # bugs
AFL                105
Driller            121
T-Fuzz             166
Driller - AFL      16
T-Fuzz - AFL       61
T-Fuzz - Driller   55
Driller - T-Fuzz   10

[Figure: Venn diagram of the bugs found by AFL (105), Driller (121), and T-Fuzz (166), with region labels 10, 6, and 55.]
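As a consistency check on these numbers, the percentages appear to be the additional bugs relative to each baseline's total, and the set differences add up to the T-Fuzz total. This reading of the percentages is an assumption; the slide does not state how they were computed.

```latex
\begin{align*}
\frac{|\text{T-Fuzz} \setminus \text{Driller}|}{|\text{Driller}|} &= \frac{55}{121} \approx 45\% \\
\frac{|\text{T-Fuzz} \setminus \text{AFL}|}{|\text{AFL}|} &= \frac{61}{105} \approx 58\% \\
|\text{T-Fuzz}| &= |\text{Driller}| - |\text{Driller} \setminus \text{T-Fuzz}| + |\text{T-Fuzz} \setminus \text{Driller}| = 121 - 10 + 55 = 166
\end{align*}
```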

41

LAVA-M Dataset

● T-Fuzz outperforms VUzzer and Steelix for “hard” checks

● T-Fuzz defeated by Steelix due to transformation explosion in who, but still found more bugs than VUzzer

● T-Fuzz found 1 unintended bug in who

Program   # of bugs   VUzzer   Steelix   T-Fuzz
base64    44          17       43        43
uniq      28          27       24        26
md5sum    57          1        28        49
who       2136        50       194       95*

42

Evaluation on Real Programs

● Time budget: 24 hours
– T-Fuzz triggers more crashes than AFL

– T-Fuzz found 3 new bugs in latest versions of ImageMagick and libpoppler (marked by *)

Program + library                 AFL   T-Fuzz
pngfix + libpng (1.7.0)           0     11
tiffinfo + libtiff (3.8.2)        53    124
magick + ImageMagick (7.0.7)      0     2*
pdftohtml + libpoppler (0.62.0)   0     1*

43

T-Fuzz Summary

● Fuzzers hit coverage wall, no “deep” bugs
– T-Fuzz mutates both input and target program

– T-Fuzz improves over Driller/AFL by 45%/58%

– T-Fuzz triggers bugs guarded by “hard” checks

– New bugs: 1 in LAVA-M, 3 in real-world programs

44

Conclusion

45

Conclusion

● Goal: Protect systems despite vulnerabilities
● Sanitization finds bugs during testing
– HexType brings type safety to C++

– T-Fuzz explores deep program paths

● Combine sanitization and fuzzing for best results

Thank you! Questions?
Source: https://hexhive.github.io

46

Word Cloud

Source: https://hexhive.github.io
