112
Security through Multi-Layer Diversity Meng Xu (Qualifying Examination Presentation) 1

Security through Multi-Layer Diversity

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Security through Multi-Layer Diversity

Security through Multi-Layer Diversity

Meng Xu

(Qualifying Examination Presentation)

1

Page 2: Security through Multi-Layer Diversity

Bringing Diversity to Computing Monoculture

2

• Current computing monoculture leaves our infrastructure vulnerable to massive and rapid attacks.

• Knowing that victim systems run on a specific software stack, an attacker can compromise them deterministically.

Page 3: Security through Multi-Layer Diversity

3

Page 4: Security through Multi-Layer Diversity

4

Page 5: Security through Multi-Layer Diversity

Response from Security Community

5

• W⊕R, ASLR, CFI, CPI, MPX

• Softbound, CETS

• Address Sanitizer, Memory Sanitizer, Thread Sanitizer

• ……

Page 6: Security through Multi-Layer Diversity

Limitations of Existing Schemes

Widely-deployed security schemes: W⊕R, ASLR, CFI→ Not hard to by-pass

6

Page 7: Security through Multi-Layer Diversity

Limitations of Existing Schemes

Widely-deployed security schemes: W⊕R, ASLR, CFI→ Not hard to by-pass

More sophisticated schemes: LLVM sanitizers→ Offer protection against only specific vulnerabilities→ Refuse to be combined due to conflicts in design

7

Page 8: Security through Multi-Layer Diversity

Limitations of Existing Schemes

Widely-deployed security schemes: W⊕R, ASLR, CFI→ Not hard to by-pass

More sophisticated schemes: LLVM sanitizers→ Offer protection against only specific vulnerabilities→ Refuse to be combined due to conflicts in design

Accumulated overhead: Softbound + CETS→ 110% slowdown

8

Page 9: Security through Multi-Layer Diversity

A Biological Inspiration

9

Even the deadliest virus cannot kill all species because of gene diversity

Page 10: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

10

Page 11: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

Virtualization

Synchronize Execution & Consolidate Outputs

Input

Output

Variant 1 Variant 2 Variant 3

11

Page 12: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

Virtualization

Synchronize Execution & Consolidate Outputs

Input (benign)

Output (consensus)

Variant 1 Variant 2 Variant 3

12

Page 13: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

Virtualization

Synchronize Execution & Consolidate Outputs

Input (malicious)

No output (divergence)

Variant 1 Variant 2 Variant 3

13

Page 14: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

Virtualization

Synchronize Execution & Consolidate Outputs

Input (malicious)

No output (divergence)

Variant 1 Variant 2 Variant 3

14

An attacker has to simultaneously compromise all variants in order to to

compromise the whole system

Page 15: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

15

Zend

Linux Platform

Implementation

Process

Page 16: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

Virtualization

Synchronize Execution & Consolidate Outputs

Input

Output

Variant 1 Variant 2 Variant 3

16

Zend

Linux

Process

Linux Linux Linux

Zend Zend Zend

ASan MSan UBSan

Page 17: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

Virtualization

Synchronize Execution & Consolidate Outputs

Input

Output

Variant 1 Variant 2 Variant 3

17

Zend

Linux

Implementation

Linux Linux Linux

Zend HHVM JPHP

ASan MSan UBSan

Page 18: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

Virtualization

Synchronize Execution & Consolidate Outputs

Input

Output

Variant 1 Variant 2 Variant 3

18

Zend

Linux Platform Linux Windows MacOS

Zend HHVM JPHP

ASan MSan UBSan

Page 19: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

Virtualization

Synchronize Execution & Consolidate Outputs

Input

Output

Variant 1 Variant 2 Variant 3

19

Zend

Linux Linux Windows MacOS

Zend HHVM JPHP

ASan MSan UBSanBunshin (ATC’17)

PlatPal (Security’17)

Future work

Page 20: Security through Multi-Layer Diversity

Bunshin: Compositing Security Mechanisms through Diversification

Meng Xu, Kangjie Lu, Taesoo Kim, Wenke Lee

Georgia Tech

20

Presented at the 2017 USENIX Annual Technical Conference (ATC’17)

Page 21: Security through Multi-Layer Diversity

Battle against Memory Errors

Protect dangerous operation using sanity checks:→ Auto-applied at compile time

21

void foo(T *a) {*a = 0x1234;

}

void foo(T *a) {if(!is_valid_address(a) {

report_and_abort();}*a = 0x1234;

}

Sanitize

Page 22: Security through Multi-Layer Diversity

Battle against Memory Errors

22

Memory Error Main Causes Defenses

Out-of-bound read/write

Lack of length check

SoftboundAddressSanitizer

Integer overflow

Format string bug

Bad type casting

Use-after-freeDangling pointer CETS

AddressSanitizerDouble free

Uninitialized read

Lack of initialization

MemorySanitizerData structure alignment

Subword copying

Undefined behaviors

Divide-by-zero

UndefinedBehaviorSanitizerPointer misalignment

Null-pointer dereference

Page 23: Security through Multi-Layer Diversity

Comprehensive Protection with Bunshin

• Accumulated execution slowdown

• Example: Softbound + CETS → 110% slowdown

• Bunshin: Reduce to 60% or 40% (depends on the config)

• Implementation conflicts

• Example: AddressSanitizer and MemorySanitizer

• Bunshin: Seamlessly enforce conflicting sanitizers

23

Page 24: Security through Multi-Layer Diversity

Challenges for Bunshin

24

• How to generate these variants?

• What properties they should have?

• How to make them appear as one to outsiders?

• What is a “behavior” and what is a divergence?

• What if the sanitizers introduces new behaviors?

• Multi-threading support?

Page 25: Security through Multi-Layer Diversity

Variant Generation Principles

• Check distribution

• Sanitizer distribution

25

Page 26: Security through Multi-Layer Diversity

Check Distribution

26

Virtualization

Synchronize Execution & Consolidate Outputs

Input

Output

Variant 1 Variant 2 Variant 3Program

Input

Output

Partition 1

Partition 2

Partition 3

Partition 1

Partition 2

Partition 3

Page 27: Security through Multi-Layer Diversity

Sanitizer Distribution

27

Virtualization

Synchronize Execution & Consolidate Outputs

Input

Output

Variant 1 Variant 2 Variant 3Program

Input

Output

A D D R E S S

M E MO R Y

U N D E F

A D D R E S S

M E MO R Y

U N D E F

Page 28: Security through Multi-Layer Diversity

Cost Profiling

• Calculate the slowdown caused by the sanity checks

void foo(T *a) {timing_start();if(!is_valid_address(a) {

report_and_abort();}*a = 0x1234;timing_end();

}

void foo(T *a) {timing_start();*a = 0x1234;timing_end();

}

28

Page 29: Security through Multi-Layer Diversity

Cost Distribution

• Equally distribute overhead to variants so that they execute at the same speed

29

17%

28%

35%

20%

Foo

Bar

Baz

Qux

17%

35%

Foo

Baz

28%

20%

Bar

Qux

Variant 1(52% overhead)

Variant 2(48% overhead)

Page 30: Security through Multi-Layer Diversity

Variant Generation Process

30

Costsprofiling

Securitymechanisms

Variantcompiling

Variantgenerator

Source code

VariantsOverhead

distribution(e.g., ASan, MSan, UBSan)

opt.

opt.

w/ ASanw/ UBSan

w/ MSan w/ ASan

...

full

selective

...

Page 31: Security through Multi-Layer Diversity

System Call Synchronization

31

Userspace

Kernel

Leader Follower 1 Follower 2

Partition 1

Partition 2

Partition 3

sync slot

Syscall number

Arguments

Execution result

Page 32: Security through Multi-Layer Diversity

System Call Synchronization

32

Userspace

Kernel

Leader Follower 1 Follower 2

Partition 1

Partition 2

Partition 3

Syscall number

Arguments

Execution result

sync slot

 ① Leader enters syscall

Page 33: Security through Multi-Layer Diversity

System Call Synchronization

33

Userspace

Kernel

Leader Follower 1 Follower 2

Partition 1

Partition 2

Partition 3

Syscall number

Arguments

Execution result

sync slot

 ② Followers enter syscall

Page 34: Security through Multi-Layer Diversity

System Call Synchronization

34

Userspace

Kernel

Leader Follower 1 Follower 2

Partition 1

Partition 2

Partition 3

Syscall number

Arguments

Execution result

sync slot

③ Kernel execute the syscall only once

Page 35: Security through Multi-Layer Diversity

System Call Synchronization

35

Userspace

Kernel

Leader Follower 1 Follower 2

Partition 1

Partition 2

Partition 3

Syscall number

Arguments

Execution result

sync slot

④ Leader fetches syscall result ④ Followers fetch syscall result

Page 36: Security through Multi-Layer Diversity

Strict and Selective Lockstep

36

Userspace

Kernel

Leader Follower 1 Follower 2

Partition 1

Partition 2

Partition 3

sync ring buffer

Leader writes at the next available slot

Followers read attheir own speed

Page 37: Security through Multi-Layer Diversity

Strict and Selective Lockstep

37

Userspace

Kernel

Leader Follower 1 Follower 2

Partition 1

Partition 2

Partition 3

sync ring buffer

Always strictly synchronized for “write” related system calls

Page 38: Security through Multi-Layer Diversity

Multi-threading Support

38

Before fork

After fork

Leader Follower 1 Follower 2

OriginalExecution group

NewExecution group

New ring buffer

Page 39: Security through Multi-Layer Diversity

Multi-threading Support

39

Before fork

After fork

Leader Follower 1 Follower 2

OriginalExecution group

NewExecution group

New ring buffer

Works if there is no interleaving

between threads

Page 40: Security through Multi-Layer Diversity

Multi-threading Support

40

Leader Follower 1 Follower 2

Userspace

Kernel

Total order of lock acquisition and releases

Record Enforce Enforce

Page 41: Security through Multi-Layer Diversity

Multi-threading Support

41

Leader Follower 1 Follower 2

Userspace

Kernel

Total order of lock acquisition and releases

Record Enforce EnforceWorks under

weak determinism(data race-free programs)

Implementation specific(pthread APIs only)

Page 42: Security through Multi-Layer Diversity

Evaluate Bunshin

42

• Robustness and Security

• Efficiency and Scalability

• Protection Distribution Case Studies

Page 43: Security through Multi-Layer Diversity

Robustness

43

Benchmark Single/Multi-thread Featuer Pass ?

SPEC CPU2006 Single

CPU IntensiveSPLASH-2x Multi

PARSEC Multi 6 out of 13

lighttpd Single

I/O Intensive

nginx Multi

python, php Single Interpreter

Page 44: Security through Multi-Layer Diversity

Security

• RIPE Benchmark

• Real-world CVEs

44

Config Succeed Probabilistic Failed Not possible

Default 114 16 720 2990

AddressSanitizer 8 0 842 2990

Bunshin 8 0 842 2990

Config CVE Exploits Sanitizer Detect

nginx-1.4.0 2013-2028 Blind ROP AddressSanitizer

cpython-2.7.10 2016-5636 Integer overflow AddressSanitizer

php-5.6.6 2015-4602 Type confusion AddressSanitizer

openssl-1.0.1a 2014-0160 Heartbleed AddressSanitizer

httpd-2.4.10 2014-3581 Null dereference UndefinedBehaviorSanitizer

Page 45: Security through Multi-Layer Diversity

Performance

Benchmark Items Strict-Lockstep Selective-Lockstep

SPEC CPU2006(19 Programs)

Max 17.5% 14.7%

Min 1.6% 1.0%

Ave 8.6% 5.6%

SPLASH-2X / PARSEC(19 Programs)

Max 21.4% 18.9%

Min 10.7% 6.6%

Ave 16.6% 14.5%

lighttpd1MB File Request Ave 1.44% 1.21%

nginx1MB File Request Ave 1.71% 1.41%

45

Page 46: Security through Multi-Layer Diversity

Performance Highlights

• Low overhead (5% - 16%) for standard benchmarks

• Negligible overhead (<= 2%) for server programs

• Extra cost of ensuring weak determinism is 8%

• Selective-lockstep saves around 3% overhead

46

Page 47: Security through Multi-Layer Diversity

Scalability - Number of Variants

47

Syn

c O

verh

ead

(%)

Number of variants2 4 6 8

0 0.5

6.6

11.4

1.7

11.2

17.2

37.6

0.64.4

10.5

20.9

Ave Max Min

Page 48: Security through Multi-Layer Diversity

Scalability - Number of Variants

48

Syn

c O

verh

ead

(%)

Number of variants2 4 6 8

0 0.5

6.6

11.4

1.7

11.2

17.2

37.6

0.64.4

10.5

20.9

Ave Max Min

The number of variants Bunshin can support with a reasonable overhead depends on machine configurations

and program characteristics.

Page 49: Security through Multi-Layer Diversity

Scalability - System Load

49

Syn

c O

verh

ead

(%)

Number of variants2% 50% 99%

0.20.8

1.9

6.4

9.7

13

2.2

4.8

6.6

Ave Max Min

Page 50: Security through Multi-Layer Diversity

Scalability - System Load

50

Syn

c O

verh

ead

(%)

Number of variants2% 50% 99%

0.20.8

1.9

6.4

9.7

13

2.2

4.8

6.6

Ave Max Min

Bunshin works well in all levels of system load (i.e., Bunshin does not require exclusive cores)

Page 51: Security through Multi-Layer Diversity

Check Distribution - ASan

51

Ove

rhea

d (%

)

Whole V1 V2 V3 Bunshin

43.137.234.934.8

107

Ove

rhea

d (%

)

Whole V1 V2 Bunshin

65.66357.4

107

Page 52: Security through Multi-Layer Diversity

Sanitizer Distribution - UBSan

52

Ove

rhea

d (%

)

Whole V1 V2 V3 Bunshin

94.58878.777.2

228

Ove

rhea

d (%

)

Whole V1 V2 Bunshin

129125124

228

Page 53: Security through Multi-Layer Diversity

Unifying LLVM Sanitizers

53

Ove

rhea

d (%

)

gobmk povray h264ref average

177

208

248

165 172

207189

141 148

191

246

158

98.9112

205

116

ASan MSan UBSan Bunshin

Page 54: Security through Multi-Layer Diversity

Ove

rhea

d (%

)

gobmk povray h264ref average

177

208

248

165 172

207189

141 148

191

246

158

98.9112

205

116

ASan MSan UBSan Bunshin

Unifying LLVM Sanitizers

54

With an average of 5% more slowdown, Bunshin can seamlessly unify all three

LLVM sanitizers

Page 55: Security through Multi-Layer Diversity

Limitations and Future Work

• Finer-grained check distribution

• Sanitizer integration

• Record-and-replay

55

Page 56: Security through Multi-Layer Diversity

Conclusion

• It is feasible to achieve both comprehensive protection and high throughput with an N-version system

• Bunshin is effective in reducing slowdown caused by sanitizers

• 107% → 47.1% for ASan, 228% → 94.5% for UBSan

• Bunshin can seamlessly unify three LLVM sanitizers with 5% extra slowdown

https://github.com/sslab-gatech/bunshin(Source code will be released soon)

56

Page 57: Security through Multi-Layer Diversity

Enhance System Security Through Diversity

Software Stack

Input

Output

Virtualization

Synchronize Execution & Consolidate Outputs

Input

Output

Variant 1 Variant 2 Variant 3

57

Zend

Linux Linux Windows MacOS

Zend HHVM JPHP

ASan MSan UBSanBunshin (ATC’17)

PlatPal (Security’17)

Future work

Page 58: Security through Multi-Layer Diversity

PlatPal: Detecting Malicious Documents with Platform Diversity

Meng Xu and Taesoo Kim

Georgia Tech

58

Presented at the 2017 USENIX Security Symposium (Security’17)

Page 59: Security through Multi-Layer Diversity

Malicious Documents On the Rise

59

Page 60: Security through Multi-Layer Diversity

60

Page 61: Security through Multi-Layer Diversity

61

Page 62: Security through Multi-Layer Diversity

62

Adobe Components Exploited

Element parser

JavaScript engine

Font manager

System dependencies

137 CVEs in 2015

227 CVEs in 2016

Page 63: Security through Multi-Layer Diversity

63

Maldoc Formula

Flexibility of doc spec

A large attack surface

Less caution from users

More opportunities to profit

Page 64: Security through Multi-Layer Diversity

Battle against Maldoc - A Survey

64

Category Focus Work Year Detection

Static

JavaScript PJScan 2011 Lexical analysis

JavaScript Vatamanu et al. 2012 Token clustering

JavaScript Lux0r 2014 API reference classification

JavaScript MPScan 2013 Shellcode and opcode sig

Metadata PDF Malware Slayer 2012 Linearized object path

Metadata Srndic et al. 2013 Hierarchical structure

Metadata PDFrate 2012 Content meta-features

Both Maiorca et al. 2016 Many heuristics combined

Dynamic

JavaScript MDScan 2011 Shellcode and opcode sig

JavaScript PDF Scrutinizer 2012 Known attack patterns

JavaScript ShellOS 2011 Memory access patterns

JavaScript Liu et al. 2014 Common attack behaviors

Memory CWXDetector 2012 Violation of invariants

Page 65: Security through Multi-Layer Diversity

Reliance on External PDF Parser

65

Category Focus Work Year Detection External Parser ?

Static

JavaScript PJScan 2011 Lexical analysis Yes

JavaScript Vatamanu et al. 2012 Token clustering Yes

JavaScript Lux0r 2014 API reference classification Yes

JavaScript MPScan 2013 Shellcode and opcode sig No

Metadata PDF Malware Slayer 2012 Linearized object path Yes

Metadata Srndic et al. 2013 Hierarchical structure Yes

Metadata PDFrate 2012 Content meta-features Yes

Both Maiorca et al. 2016 Many heuristics combined Yes

Dynamic

JavaScript MDScan 2011 Shellcode and opcode sig Yes

JavaScript PDF Scrutinizer 2012 Known attack patterns Yes

JavaScript ShellOS 2011 Memory access patterns Yes

JavaScript Liu et al. 2014 Common attack behaviors No

Memory CWXDetector 2012 Violation of invariants No

Page 66: Security through Multi-Layer Diversity

Category Focus Work Year Detection External Parser ?

Static

JavaScript PJScan 2011 Lexical analysis Yes

JavaScript Vatamanu et al. 2012 Token clustering Yes

JavaScript Lux0r 2014 API reference classification Yes

JavaScript MPScan 2013 Shellcode and opcode sig No

Metadata PDF Malware Slayer 2012 Linearized object path Yes

Metadata Srndic et al. 2013 Hierarchical structure Yes

Metadata PDFrate 2012 Content meta-features Yes

Both Maiorca et al. 2016 Many heuristics combined Yes

Dynamic

JavaScript MDScan 2011 Shellcode and opcode sig Yes

JavaScript PDF Scrutinizer 2012 Known attack patterns Yes

JavaScript ShellOS 2011 Memory access patterns Yes

JavaScript Liu et al. 2014 Common attack behaviors No

Memory CWXDetector 2012 Violation of invariants No

Reliance on External PDF Parser

66

Parser-confusion attacks(Carmony et al., NDSS’16)

Page 67: Security through Multi-Layer Diversity

Reliance on Machine Learning

67

Category Focus Work Year Detection Machine Learning ?

Static

JavaScript PJScan 2011 Lexical analysis Yes

JavaScript Vatamanu et al. 2012 Token clustering Yes

JavaScript Lux0r 2014 API reference classification Yes

JavaScript MPScan 2013 Shellcode and opcode sig No

Metadata PDF Malware Slayer 2012 Linearized object path Yes

Metadata Srndic et al. 2013 Hierarchical structure Yes

Metadata PDFrate 2012 Content meta-features Yes

Both Maiorca et al. 2016 Many heuristics combined Yes

Dynamic

JavaScript MDScan 2011 Shellcode and opcode sig No

JavaScript PDF Scrutinizer 2012 Known attack patterns No

JavaScript ShellOS 2011 Memory access patterns No

JavaScript Liu et al. 2014 Common attack behaviors No

Memory CWXDetector 2012 Violation of invariants No

Page 68: Security through Multi-Layer Diversity

Reliance on Machine Learning

68

Category Focus Work Year Detection Machine Learning ?

Static

JavaScript PJScan 2011 Lexical analysis Yes

JavaScript Vatamanu et al. 2012 Token clustering Yes

JavaScript Lux0r 2014 API reference classification Yes

JavaScript MPScan 2013 Shellcode and opcode sig No

Metadata PDF Malware Slayer 2012 Linearized object path Yes

Metadata Srndic et al. 2013 Hierarchical structure Yes

Metadata PDFrate 2012 Content meta-features Yes

Both Maiorca et al. 2016 Many heuristics combined Yes

Dynamic

JavaScript MDScan 2011 Shellcode and opcode sig No

JavaScript PDF Scrutinizer 2012 Known attack patterns No

JavaScript ShellOS 2011 Memory access patterns No

JavaScript Liu et al. 2014 Common attack behaviors No

Memory CWXDetector 2012 Violation of invariants No

Automatic classifier evasions(Xu et al., NDSS’16)

Page 69: Security through Multi-Layer Diversity

Reliance on Known Attacks

69

Category Focus Work Year Detection Known Attacks ?

Static

JavaScript PJScan 2011 Lexical analysis Yes

JavaScript Vatamanu et al. 2012 Token clustering Yes

JavaScript Lux0r 2014 API reference classification Yes

JavaScript MPScan 2013 Shellcode and opcode sig Yes

Metadata PDF Malware Slayer 2012 Linearized object path Yes

Metadata Srndic et al. 2013 Hierarchical structure Yes

Metadata PDFrate 2012 Content meta-features Yes

Both Maiorca et al. 2016 Many heuristics combined Yes

Dynamic

JavaScript MDScan 2011 Shellcode and opcode sig Yes

JavaScript PDF Scrutinizer 2012 Known attack patterns Yes

JavaScript ShellOS 2011 Memory access patterns Yes

JavaScript Liu et al. 2014 Common attack behaviors Yes

Memory CWXDetector 2012 Violation of invariants No

Page 70: Security through Multi-Layer Diversity

Reliance on Known Attacks

70

Category Focus Work Year Detection Known Attacks ?

Static

JavaScript PJScan 2011 Lexical analysis Yes

JavaScript Vatamanu et al. 2012 Token clustering Yes

JavaScript Lux0r 2014 API reference classification Yes

JavaScript MPScan 2013 Shellcode and opcode sig Yes

Metadata PDF Malware Slayer 2012 Linearized object path Yes

Metadata Srndic et al. 2013 Hierarchical structure Yes

Metadata PDFrate 2012 Content meta-features Yes

Both Maiorca et al. 2016 Many heuristics combined Yes

Dynamic

JavaScript MDScan 2011 Shellcode and opcode sig Yes

JavaScript PDF Scrutinizer 2012 Known attack patterns Yes

JavaScript ShellOS 2011 Memory access patterns Yes

JavaScript Liu et al. 2014 Common attack behaviors Yes

Memory CWXDetector 2012 Violation of invariants No

How about zero-day attacks ?

Page 71: Security through Multi-Layer Diversity

Reliance on Detectable Discrepancy (between benign and malicious docs)

71

Category Focus Work Year Detection Discrepancy ?

Static

JavaScript PJScan 2011 Lexical analysis Yes

JavaScript Vatamanu et al. 2012 Token clustering Yes

JavaScript Lux0r 2014 API reference classification Yes

JavaScript MPScan 2013 Shellcode and opcode sig No

Metadata PDF Malware Slayer 2012 Linearized object path Yes

Metadata Srndic et al. 2013 Hierarchical structure Yes

Metadata PDFrate 2012 Content meta-features Yes

Both Maiorca et al. 2016 Many heuristics combined Yes

Dynamic

JavaScript MDScan 2011 Shellcode and opcode sig No

JavaScript PDF Scrutinizer 2012 Known attack patterns No

JavaScript ShellOS 2011 Memory access patterns Yes

JavaScript Liu et al. 2014 Common attack behaviors Yes

Memory CWXDetector 2012 Violation of invariants No

Page 72: Security through Multi-Layer Diversity

Reliance on Detectable Discrepancy (between benign and malicious docs)

72

Category Focus Work Year Detection Discrepancy ?

Static

JavaScript PJScan 2011 Lexical analysis Yes

JavaScript Vatamanu et al. 2012 Token clustering Yes

JavaScript Lux0r 2014 API reference classification Yes

JavaScript MPScan 2013 Shellcode and opcode sig No

Metadata PDF Malware Slayer 2012 Linearized object path Yes

Metadata Srndic et al. 2013 Hierarchical structure Yes

Metadata PDFrate 2012 Content meta-features Yes

Both Maiorca et al. 2016 Many heuristics combined Yes

Dynamic

JavaScript MDScan 2011 Shellcode and opcode sig No

JavaScript PDF Scrutinizer 2012 Known attack patterns No

JavaScript ShellOS 2011 Memory access patterns Yes

JavaScript Liu et al. 2014 Common attack behaviors Yes

Memory CWXDetector 2012 Violation of invariants No

Mimicry and reverse mimicry attacks(Srndic et al., Oakland’14 and Maiorca et al, AsiaCCS’13)

Page 73: Security through Multi-Layer Diversity

Prior works rely on

• External PDF parsers

• Machine learning

• Known attack signatures

• Detectable discrepancy

73

Highlights of the Survey

Parser-confusion attacks

Automatic classifier evasion

Zero-day attacks

Mimicry and reverse mimicry

Page 74: Security through Multi-Layer Diversity

Prior works rely on

• External PDF parsers

• Machine learning

• Known attack signatures

• Detectable discrepancy

74

Motivations for PlatPal

What PlatPal aims to achieve

• Using Adobe’s parser

• Using only simple heuristics

• Capable to detect zero-days

• Do not assume discrepancy

• Complementary to prior works

Page 75: Security through Multi-Layer Diversity

Prior works rely on

• External PDF parsers

• Machine learning

• Known attack signatures

• Detectable discrepancy

75

Motivations for PlatPal

What PlatPal aims to achieve

• Using Adobe’s parser

• Using only simple heuristics

• Capable to detect zero-days

• Do not assume discrepancy

• Complementary to prior works

Page 76: Security through Multi-Layer Diversity

Prior works rely on

• External PDF parsers

• Machine learning

• Known attack signatures

• Detectable discrepancy

76

Motivations for PlatPal

What PlatPal aims to achieve

• Using Adobe’s parser

• Using only simple heuristics

• Capable to detect zero-days

• Do not assume discrepancy

• Complementary to prior works

Page 77: Security through Multi-Layer Diversity

Prior works rely on

• External PDF parsers

• Machine learning

• Known attack signatures

• Detectable discrepancy

77

Motivations for PlatPal

What PlatPal aims to achieve

• Using Adobe’s parser

• Using only simple heuristics

• Capable to detect zero-days

• Do not assume discrepancy

• Complementary to prior works

Page 78: Security through Multi-Layer Diversity

Prior works rely on

• External PDF parsers

• Machine learning

• Known attack signatures

• Detectable discrepancy

78

Motivations for PlatPal

What PlatPal aims to achieve

• Using Adobe’s parser

• Using only simple heuristics

• Capable to detect zero-days

• Do not assume discrepancy

• Complementary to prior works

Page 79: Security through Multi-Layer Diversity

Prior works rely on

• External PDF parsers

• Machine learning

• Known attack signatures

• Detectable discrepancy

79

Motivations for PlatPal

What PlatPal aims to achieve

• Using Adobe’s parser

• Using only simple heuristics

• Capable to detect zero-days

• Do not assume discrepancy

• Complementary to prior works

Page 80: Security through Multi-Layer Diversity

A Motivating Example

• A CVE-2013-2729 PoC against Adobe Reader 10.1.4

SHA-1: 74543610d9908698cb0b4bfcc73fc007bfeb6d84

80

Page 81: Security through Multi-Layer Diversity

81

Page 82: Security through Multi-Layer Diversity

82

Page 83: Security through Multi-Layer Diversity

Platform Diversity as A Heuristic

83

When the same document is opened across different platforms:

• A benign document “behaves” the same

• A malicious document “behaves” differently

Page 84: Security through Multi-Layer Diversity

Questions for PlatPal

84

• What is a “behavior” ?

• What is a divergence ?

• How to trace them ?

• How to compare them ?

Page 85: Security through Multi-Layer Diversity

PlatPal Basic Setup

85

Windows Host

Virtual Machine 1

Adobe Reader

MacOS Host

Virtual Machine 2

Adobe Reader

?

Page 86: Security through Multi-Layer Diversity

PlatPal Dual-Level Tracing

86

Virtual Machine 1

Adobe Reader

Internal Tracer

Virtual Machine 2

Adobe Reader

Internal Tracer

?

Windows Host MacOS Host

Traces of PDFprocessing

Page 87: Security through Multi-Layer Diversity

PlatPal Dual-Level Tracing

87

Virtual Machine 1

Adobe Reader

Internal Tracer

Syscalls

External Tracer

Virtual Machine 2

Adobe Reader

Internal Tracer

Syscalls

External Tracer

?

Windows Host MacOS Host

Impacts on host platform

Traces of PDFprocessing

Page 88: Security through Multi-Layer Diversity

PlatPal Internal Tracer

88

Adobe Reader

Internal Tracer

COS object parsing

PD tree construction

Script execution

Other actions

Element rendering

• Implemented as an Adobe Reader plugin.

• Hooks critical functions and callbacks during the PDF processing lifecycle.

• Very fast and stable across Adobe Reader versions.

Page 89: Security through Multi-Layer Diversity

PlatPal External Tracer

89

Virtual Machine

Adobe Reader

Syscalls

External Tracer

Host Platform

Filesystem Operations

Network Activities

Program Executions

Normal Exit or Crash

• Implemented based on NtTrace (for Windows) and Dtrace (for MacOS).

• Resembles high-level system impacts in the same manner as Cuckoo guest agent.

• Starts tracing only after the document is loaded into Adobe Reader.

Page 90: Security through Multi-Layer Diversity

PlatPal Automated Workflow

90

Windows VM

Restore Clean Snapshot

Launch Adobe Reader

Attach External Tracer

Open PDF

Drive PDF by Internal Tracer

Dump Traces

Restore Clean Snapshot

Launch Adobe Reader

Attach External Tracer

Open PDF

Drive PDF by Internal Tracer

Dump Traces

MacOS VMCompare Traces

PlatPal <file-to-check>

Page 91: Security through Multi-Layer Diversity

Evaluate PlatPal

91

• Robustness against benign samples

A benign document “behaves” the same ?

• Effectiveness against malicious samples

A malicious document “behaves” differently ?

• Speed and resource usages

Page 92: Security through Multi-Layer Diversity

Robustness

92

Sample Type Number of Samples Divergence Detected ? (i.e., False Positive)

Plain PDF 966 No

Embedded fonts 34 No

JavaScript code 32 No

AcroForm 17 No

3D objects 2 No

• 1000 samples from Google search.

• 30 samples that use advanced features in PDF standards from PDF learning sites.

Page 93: Security through Multi-Layer Diversity

Effectiveness

• 320 malicious samples from VirusTotal with CVE labels.

• Restricted to analyze CVEs published after 2013.

• Use the most recent version of Adobe Reader when the CVE is published.

93

Page 94: Security through Multi-Layer Diversity

Effectiveness

Analysis Results of 320 Maldoc Samples

65%11%

24%

No DivergenceBoth CrashDivergence

94

Page 95: Security through Multi-Layer Diversity

Effectiveness

Analysis Results of 320 Maldoc Samples

65%11%

24%

No Divergence

Breakdown of 77 potentially false positives

26%

3%

25%

47%

Targets old versionsMis-classified by AV vendorNo malicious activity trigerredUnknown

95

Page 96: Security through Multi-Layer Diversity

Time and Resource Usages

Average Analysis Time Breakdown (unit. Seconds)

Item Windows MacOS

Snapshot restore 9.7 12.6

Document parsing 0.5 0.6

Script execution 10.5 5.1

Element rendering 7.3 6.2

Total 23.7 22.1

Resource Usages

• 2GB memory per running virtual machine.

• 60GB disk space for Windows and MacOS snapshots that each corresponds to one of the 6 Adobe Readers versions.

96

Page 97: Security through Multi-Layer Diversity

Evaluation Highlights

• Confirms our fundamental assumption in general:

benign document “behaves” the same

malicious document “behaves” differently

• PlatPal is subject to the pitfalls of dynamic analysis

i.e., prepare the environment to lure the malicious behaviors

• Incurs reasonable analysis time to make PlatPal practical

97

Page 98: Security through Multi-Layer Diversity

Further Analysis

• What could be the root causes of these divergences?

98

Page 99: Security through Multi-Layer Diversity

Diversified Factors across Platforms

99

Category Factor Windows MacOS

Shellcode Creation

Memory Management

Platform Features

Page 100: Security through Multi-Layer Diversity

Diversified Factors across Platforms

100

Category Factor Windows MacOS

Shellcode Creation

Syscall semantics Both the syscall number and the register set used to hold syscall arguments are different

Calling convention rcx, rdx, r8 for first 3 args rdi, rsi, rdx for first 3 args

Library dependencies e.g., LoadLibraryA e.g. dlopen

Memory Management

Platform Features

Page 101: Security through Multi-Layer Diversity

Diversified Factors across Platforms

101

Category Factor Windows MacOS

Shellcode Creation

Syscall semantics Both the syscall number and the register set used to hold syscall arguments are different

Calling convention rcx, rdx, r8 for first 3 args rdi, rsi, rdx for first 3 args

Library dependencies e.g., LoadLibraryA e.g. dlopen

Memory Management

Memory layout Offset from attack point (e.g., overflowed buffer) to target address (e.g., vtable entries) are different

Heap management Segment heap Magazine malloc

Platform Features

Page 102: Security through Multi-Layer Diversity

Diversified Factors across Platforms

102

Category Factor Windows MacOS

Shellcode Creation

Syscall semantics Both the syscall number and the register set used to hold syscall arguments are different

Calling convention rcx, rdx, r8 for first 3 args rdi, rsi, rdx for first 3 args

Library dependencies e.g., LoadLibraryA e.g. dlopen

Memory Management

Memory layout Offset from attack point (e.g., overflowed buffer) to target address (e.g., vtable entries) are different

Heap management Segment heap Magazine malloc

Platform Features

Executable format COM, PE, NE Mach-O

Filesystem semantics \ as separator, prefixed drive letter C:\

/ as separator,no prefixed drive letter

Config and info hub registry proc

Expected programs MS Office, IE, etc Safari, etc

Page 103: Security through Multi-Layer Diversity

Back to The Motivating Example

103

1. Allocate 1000 300-bytes chunks

2. Free 1 in every 10

3. Load a 300-byte malicious BMP image

4. Corrupt heap metadata due to a buffer overflow

5. Free BMP image, but what is actually freed is slot 9

6. A vtable of 300-byte is allocated on slot 9, which is attacker controlled

Page 104: Security through Multi-Layer Diversity

Another Case Study

104

CVE-2014-0521 PoC Example

Page 105: Security through Multi-Layer Diversity

Bypass PlatPal ?

105

An attacker has to simultaneously compromise all platforms in order to

bypass PlatPal.

Page 106: Security through Multi-Layer Diversity

Limitations of PlatPal

• User-interaction driven attacks

• Social engineering attacks

e.g., fake password prompt

• Other none-determinism to cause divergences

e.g., JavaScript gettime or RNG functions

106

Page 107: Security through Multi-Layer Diversity

Potential Deployment of PlatPal

• Not suitable for on-device analysis.

• Best suited for cloud storage providers which can scan for maldocs among existing files or new uploads.

• Also fits the model of online malware scanning services like VirusTotal.

• As a complementary scheme, PlatPal can be integrated with prior works to provide better prediction accuracy.

107

Page 108: Security through Multi-Layer Diversity

Conclusion

• It is feasible to harvest platform diversity for malicious document detection.

• PlatPal raises no false alarms in benign samples and detects a variety of behavioral discrepancies in malicious samples.

• PlatPal is scalable with various ways to deploy and integrate.

https://github.com/sslab-gatech/platpal(Source code will be released soon)

108

Page 109: Security through Multi-Layer Diversity

Future Works on Diversity Framework

• Implementation diversity

• Case study: PHP interpreters: Zend vs HHVM

• Integration with fuzzing

• Divergence as an indicator of exception, in addition to crashes and failed assertions

• Integration with symbolic execution

• Test whether two functionally similar modules enforce the same sequence and types of checks

109

Page 110: Security through Multi-Layer Diversity

Future Works on Diversity Framework

• Implementation diversity

• Case study: PHP interpreters: Zend vs HHVM

• Integration with fuzzing

• Divergence as an indicator of exception, in addition to crashes and failed assertions

• Integration with symbolic execution

• Test whether two functionally similar modules enforce the same sequence and types of checks

110

Page 111: Security through Multi-Layer Diversity

Future Works on Diversity Framework

• Implementation diversity

• Case study: PHP interpreters: Zend vs HHVM

• Integration with fuzzing

• Divergence as an indicator of exception, in addition to crashes and failed assertions

• Integration with symbolic execution

• Test whether two functionally similar modules enforce the same sequence and types of checks

111

Page 112: Security through Multi-Layer Diversity

Publications

1. Checking Open-Source License Violation and 1-day Security Risk at Large Scale Ruian Duan, Ashish Bijlani, Meng Xu, Taesoo Kim, and Wenke Lee In Proceedings of the 24th ACM Conference on Computer and Communications Security (CCS'17)

2. PlatPal: Detecting Malicious Documents with Platform Diversity Meng Xu, and Taesoo Kim In Proceedings of the 26th USENIX Security Symposium (Security'17) 

3. Bunshin: Compositing Security Mechanisms through Diversification Meng Xu, Kangjie Lu, Taesoo Kim, and Wenke Lee In Proceedings of the 2017 USENIX Annual Technical Conference (ATC'17) 

4. Toward Engineering a Secure Android Ecosystem: A Survey of Existing Techniques Meng Xu, Chengyu Song, Yang ji, Ming-Wei Shih, Kangjie Lu, Cong Zheng, Ruian Duan, Yeongjin Jang, Byoungyoung Lee, Chenxiong Qian, Sangho Lee, and Taesoo Kim In ACM Computing Surveys (CSUR) Volume 49, Issue 2, August 2016

5. UCognito: Private Browsing without Tears Meng Xu, Yeongjin Jang, Xinyu Xing, Taesoo Kim, and Wenke Lee. In Proceedings of the 22nd ACM Conference on Computer and Communications Security (CCS'15) 

112