Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
1
Guarding Vulnerable Code:Module 1: Sanitization
Mathias Payer, Purdue Universityhttp://hexhive.github.io
2
Vulnerabilities everywhere?
3
Common Languages: TIOBE’18
Jul 2018 Jul 2017 Change Language Ratings Change
1 1 Java 16.139% +2.37%
2 2 C 14.662% +7.34%
3 3 C++ 7.615% +2.04%
4 4 Python 6.361% +2.82%
5 7 + VB .NET 4.247% +1.20%
6 5 - C# 3.795% +0.28%
7 6 - PHP 2.832% -0.26%
8 8 JavaScript 2.831% +0.22%
9 - ++ SQL 2.334% +2.33%
10 18 ++ Objective-C 1.453% -0.44%
4
Software is highly complex
Google Chrome: 76 MLoCGnome: 9 MLoCXorg: 1 MLoCglibc: 2 MLoCLinux kernel: 17 MLoC
Low-level languages (C/C++) trade type safety and memory safety for performance
5
Defense: Testing vs. Mitigations
Mitigations● Stop exploitation● Always on● Low overhead
Software Testing● Discover bugs● Development tool● Result oriented
6
Memory Corruption
7
Memory error: invalid dereference
Dangling pointer:(temporal)
Out-of-bounds pointer:(spatial)
Violation iff: pointer is read, written, or freed
char foo[40];foo[42] = 23;
free(foo);*foo = 23;
8
Type Confusion
9
Type confusion through downcasts
Base
Greeter Exec
Greeter *g = new Greeter();Base *b = static_cast<Base*>(g);Exec *e = static_cast<Exec*>(b);
√X
10
C++ casting operations
● static_cast<ToClass>(Object)
– Compile time check
– No runtime type information
● dynamic_cast<ToClass>(Object)
– Runtime check
– Requires Runtime Type Information (RTTI)
– Not used in performance critical code
11
Static cast
movq -24(%rbp), %rax # Load pointer# Type “check”
movq %rax, -40(%rbp) # Store pointer
Base *b = …;a = static_cast<Greeter*>(b);
12
Dynamic cast (O2)
leaq _ZTI7Greeter(%rip), %rdxleaq _ZTI4Base(%rip), %rsixorl %ecx, %ecxmovq %rbp, %rdi # Load pointercall __dynamic_cast@PLT # Type check
Base *b = …;a = dynamic_cast<Greeter*>(b);
13
Type confusion
class Base {int x;
};class Greeter: Base {
int y;virtual void Hi();
};…Base *Bptr = new Base();Greeter *Gptr;Gptr = static_cast<Greeter*>Gptr; // Type ConfGptr->y = 0x43; // Memory safety violation!Gptr->Hi(); // Control-flow hijacking
x
vtable*
y
B G
xBptr
Gptrvtable*?
y?
14
Type Confusion Demo
15
C++ virtual dispatch
class Base { … };class Exec: public Base { public: virtual void exec(char *prg) { system(prg); }};class Greeter: public Base { public: virtual void sayHi(char *str) { std::cout << str << std::endl; }};Greeter *greeter = new Greeter();greeter->sayHi("Oh, hello there!");
Base
Greater Exec
16
Simple exploitation demo
int main() { Base *b1 = new Greeter(); Base *b2 = new Exec(); Greeter *g;
g = static_cast<Greeter*>(b1); g->sayHi("Greeter says hi!");
g = static_cast<Greeter*>(b2); g->sayHi("/usr/bin/xcalc");
delete b1; delete b2; return 0;}
vtable*b1
vtable*b2
GreeterT
ExecT
// g[0][0](str);
// g[0][0](str);
17
Sanitization
18
Problem: broken abstractions?
C/C++ void log(int a) { printf("Log: "); printf("%d", a);}void (*fun)(int) = &log;void init() { fun(15);}
ASMlog: ... fun: .quad loginit: ... movl $15, %edi movq fun(%rip), %rax call *%rax
19
LLVM Sanitization
● Test cases detect bugs through assertions, segmentation faults, traps, exceptions
● Enforce stronger policies during testing!– Address Sanitizer: memory safety
– Leak Sanitizer: memory leaks
– Memory Sanitizer: uninitialized memory
– UBSan: undefined behavior
– Thread Sanitizer: data races
– HexVASAN: variadic argument checker
– HexType: type safety
20
Type Safety
21
Type confusion detection*
● A static cast is checked only at compile time– Fast but no runtime guarantees
● Dynamic casts are checked at runtime– High overhead, limited to polymorphic classes
● HexType design: – Conceptually check all casts dynamically
– Aggressively optimize design and implementation
* TypeSanitizer: Practical Type Confusion Detection. Istvan Haller, Yuseok Jeon, Hui Peng, Mathias Payer, Herbert Bos, Cristiano Giuffrida, Erik van der Kouwe. In CCS'16* HexType: Efficient Detection of Type Confusion Errors for C++. Yuseok Jeon, Priyam Biswas, Scott A. Carr, Byoungyoung Lee, and Mathias Payer. In CCS'17
22
Making type checks explicit
● Enforce runtime check at all cast sites– static_cast<ToClass>(Object)
– dynamic_cast<ToClass>(Object)
– reinterpret_cast<ToClass>(Object)
– (ToClass)(Object)
● Build global type hierarchy● Keep track of the allocation type of each object
– Must instrument all forms of allocation
– Requires disjoint metadata
23
HexType: design
Instrumentation(Type casting verification)
Source code
LLVM Pass
Clang
Type Hierarchy Information
HexTypeRuntimeLibrary
HexTypeBinary
Link
24
HexType: aggressive optimization
● Limit tracing to unsafe types– Remove tracing of types that are never cast
● Limit checking to unsafe casts– Remove statically verifiable casts
● No more RTTI for dynamic casts– Replace dynamic casts with fast lookup
25
Demo Time!
26
HexType coverage
27
Newly discovered bugs
● Discovered seven new vulnerabilities:
Apache Xerces C++ Qt base library
DOMNode
DOMCharacter
Data
DOMElement
DOMElementImpl
DOMText
DOMTextImpl
TypeConfusion!
QMapNodeBase
QMapNode
28
Sanitizer Summary: Type Safety
● Type confusion fundamental in today’s exploits● Existing sanitizers are incomplete, partial, slow
● HexType – (Almost) full coverage (2-6x increase)
– Reasonable overhead (SPEC CPU: 0-32x improvement, Firefox: 0-0.5x slowdown)
– Future work: remaining coverage, optimizations
29
T-Fuzz
30
Fuzzing Challenges
● Challenges– Shallow coverage
– Hard to find “deep” bugs
● Root cause– Fuzzer-generated inputs
cannot bypass complex sanity checks in the target program
– Existing work limits itself to input generation
start
end
check1
check2
check3
bug
Shallow code pathsShallow code paths
Deep code pathsDeep code paths
31
T-Fuzz: Fuzz the Program!
● Option 1: generate input to bypass checks by heavy-weight program analysis techniques– Driller (concolic analysis)
– VUzzer (dynamic taint analysis)
● Our idea: remove program’s sanity checks– Checks filter orthogonal input, e.g., magic values,
checksum, or hashes (Non-Critical Check, NCC)
– Insight: removing NCCs is safe
if (strncmp(hdr, “ELF", 3) == 0) { // main program logic} else { error(); }
32
Design and Implementation
● Fuzzer generates inputs● When “stuck”
– Detect NCCs*
● Transform program● Verify crashes
*Approximation of NCCs: edged in the CFG connecting covered/uncovered nodes
Fuzzer(e.g. AFL)
Program Transformer
Crash Analyzer
Bug Reports
False Positives
Crashing inputs
Inputs
Transformed Programs
33
Detecting NCC’s
● Approximate NCCs as edges connecting covered and uncovered nodes in CFG– Over approximate, may
contain false positives
– Lightweight and simple to implement 33
Covered Node
Uncovered Node
NCC Candidates
34
Program Transformation
● Our approach: negate NCCs– Simple: static binary rewriting
– Zero runtime overhead in resulting target program
– Unchanged CFG
– Trace in transformed program maps to original program
– Path constraints of original program can be recovered
34
start
end
A == B
True branch False branch
start
end
A != B
True branch False branch
Negated Check
35
Comparison to Symbolic Executoion
● Explores all code paths, tracks constraints
● Path explosion, e.g., loops● Each branch doubles the
number of code paths● Resource requirement● Theoretically beautiful,
limited scalability
... ... ... ...
...
( Path1, constraint set1)
( Pathn, constraint setn)
...
36
Comparison to Concolic Execution
● Guided by concrete inputs● Follows single code path,
collects constraints for new code paths
● Reduced resource requirements
● Still an exponential number of paths to explore!
... ... ... ...
...
input
C1
Not C1
37
Comparison to Driller (Fuzz & CE)
● Fuzzing until coverage wall● When fuzzing gets “stuck”,
concolic execution explores new code paths using fuzzer generated inputs
● Limitations– “SE & constraints solving” slows
down fuzzing
– Not able to bypass “hard” checks
Fuzzer
Inputs
mutating
target program
Crashes
SE & constraint solving
38
T-Fuzz: fuzz first, solve only crashes
● Fuzzing/SE decoupled● SE only applied to
detected crashes● For “hard” checks,
T-Fuzz detects the guarded bug, butcannot verify it
T-Fuzz
Fuzzer
program
Crashes
Program Transformation
T-Fuzz in action
SE & constraints solving
39
Evaluation
● Implementation– Fuzzer: shellphish fuzzer (python wrapper of AFL)
– Program Transformer: angr tracer, radare2
– Crash Analyzer: 2k LoC Python hackery
● Evaluation– DARPA CGC dataset
– LAVA-M dataset
– 4 real-world programs
40
DARPA CGC Dataset
● Improvement over Driller/AFL: 55 (45%) / 61 (58%)
● Driller outperforms T-Fuzz– 3 due to false crashes (L1)
– 7 due to transformation explosion (L2)
Method # bugs
AFL 105
Driller 121
T-Fuzz 166
Driller - AFL 16
T-Fuzz - AFL 61
T-Fuzz - Driller 55
Driller - T-Fuzz 10AFL (105)
T-Fuzz (166)Driller
(121)
10
6
55
41
LAVA-M Dataset
● T-Fuzz outperforms VUzzer and Steelix for “hard” checks
● T-Fuzz defeated by Steelix due to transformation explosion in who, but still found more bugs than VUzzer
● T-Fuzz found 1 unintended bug in who
Program # of bugs VUzzer Steelix T-Fuzz
base64 44 17 43 43
unique 28 27 24 26
md5sum 57 1 28 49
who 2136 50 194 95*
42
Evaluation on Real Programs
● Time budget: 24 hours– T-Fuzz triggers more crashes than AFL
– T-Fuzz found 3 new bugs in latest versions of ImageMagick and libpoppler (marked by *)
Program + library AFL T-Fuzz
pngfix + libpng (1.7.0) 0 11
tiffinfo + libtiff (3.8.2) 53 124
magick + ImageMagicK (7.0.7)
0 2*
pdftohtml + libpoppler (0.62.0)
0 1*
43
T-Fuzz Summary
● Fuzzers hit coverage wall, no “deep” bugs– T-Fuzz mutates both input and target program
– T-Fuzz improves over Driller/AFL by 45%/58%
– T-Fuzz triggeres bugs guarded by “hard” checks
– New bugs: 1 in LAVA-M, 3 in real-world programs
44
Conclusion
45
Conclusion
● Goal: Protect systems despite vulnerabilities● Sanitization finds bugs during testing
– HexType brings type safety to C++
– T-Fuzz explores deep program paths
● Combine sanitization and fuzzing for best results
Thank you! Questions?Source: https://hexhive.github.io