Unleashing Mayhem on Binary Code

Unleashing Mayhem on Binary Code

Sang Kil ChaThanassis Avgerinos

Alexandre RebertDavid Brumley

Carnegie Mellon University

2

Automatic Exploit Generation Challenge

Automatically Find Bugs & Generate Exploits

AEG

Program Exploits

I = input();if (I < 42) vuln();else safe();

3



Explore Program

4

Ghostscript v8.62 Bugint outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } } default: … } …

Reading user input from command line

Buffer overflow

CVE-2009-4270

5

Multiple Pathsint outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } } default: … } …

ManyBranches!

6



Transfer Control to Attacker Code

(exec “/bin/sh”)

7

Generating Exploitsint outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } } default: … } …

outp

rint

f

…

fmt

ret addr

count

args

bufuser

inpu

t

mai

n

esp

8

Generating Exploitsint outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) { switch ( arg[0] ) { case ‘-’: { switch ( arg[1] ) { case 0: … default: outprintf( “unknown switch %s\n”, arg[1] ); } } default: … } …

Read Return Address from Stack Pointer (esp)

8

outp

rint

f

…

fmt

ret addr

count

args

bufuser

inpu

t

mai

n

esp

Control Hijack Possible

9Source

int main( int argc, char* argv[] ){ const char *arg; while( (arg = *argv++) != 0 ) {…Executables (Binary)

01010010101010100101010010101010100101010101010101000100001000101001001001001000000010100010010101010010101001001010101001010101001010000110010101010111011001010101010101010100101010111110100101010101010101001010101010101010101010

Unleashing MayhemAutomatically Find Bugs & Generate Exploits

for Executables

10

Demo

11

if x < 100

if x*x = 0xffffffff

x = input()

How Mayhem Works:Symbolic Execution

if x > 42

vuln()

x can be anything

x > 42

(x > 42) ∧ (x*x != 0xffffffff)

(x > 42) ∧ (x*x != 0xffffffff)

∧ (x >= 100)

f t

f t

f t

12

f t

f t

f t

x = input()


if x > 42

if x*x = 0xffffffff

vuln()

x can be anything

x > 42

(x > 42) ∧ (x*x == 0xffffffff)

if x < 100

13

f t

f t

f t

x = input()

if x > 42

if x*x = 0xffffffff

vuln()

Path Predicate = Π

x can be anything

x > 42

(x > 42) ∧ (x*x == 0xffffffff)Π =

if x < 100

14

f t

f t

f t

x = input()


if x > 42

if x*x = 0xffffffff

vuln()

x can be anything

x > 42

(x > 42) ∧ (x*x == 0xffffffff)

ViolatesSafety Policyif x < 100

15

int outprintf( const char *fmt, … ){ int count; char buf[1024]; va_list args; va_start( args, fmt ); count = vsprintf( buf, fmt, args ); outwrite( buf, count ); // print out}

Safety Policy in Mayhem

outp

rint

f

…

fmt

ret addr

count

args

bufuser

inpu

t

mai

n

esp

Return to user-controlled address

EIP not affected by user input

Instruction Pointer (EIP) level:

16

Exploit Generation

Π∧

input[0-31] = attack code∧

input[1038-1042] = attack code address

Exploit is an input that satisfies the predicate:

Exploit PredicateCan transfer

control to attack code?

Can position attack code?

17

Challenges

Symbolic Execution Exploit Generation

Efficient Resource Management

Symbolic IndexChallenge

Hybrid ExecutionIndex-based Memory

Model

18

Challenge 1: Resource Management inSymbolic Execution

19

Current Resource Management in Symbolic Execution

Online Symbolic Execution

Offline Symbolic Execution

(a.k.a. Concolic)

20

Offline ExecutionOne pathat a time Method 1:

Re-run from scratch Inefficient⟹

Re-executedevery time

21

Online Execution

Method 2:Stop forking

Miss paths⟹

Method 3: Snapshot process

Huge disk ⟹image

Hit Resource Cap

Fork at branches

22

Mayhem: Hybrid Execution

Our Method:Don’t snapshot state; use path predicate to recreate state

9.4M 500K

Hit Resource Cap

Fork at branches

Ghostscript 8.62

“Checkpoint”

23

Hybrid Execution

Manage #executorsin memory within resource cap✓

Minimize duplicated work✓

Lightweight checkpoints✓

24

Challenge 2: Symbolic Indices

25

Symbolic Indices

x = user_input();y = mem[x];assert (y == 42);

x can be anything

Which memory cell contains 42?

232 cells to check

Memory0 232 -1

26

One Cause: Overwritten Pointers

42

mem[0x11223344]

mem[input]

…

arg

ret addr

ptr

buf

user

inpu

t

… assert(*ptr==42); return;

ptr address 11223344ptr = 0x11223344

27

Another Cause: Table LookupsTable lookups in standard APIs:• Parsing: sscanf, vfprintf, etc.• Character test: isspace, isalpha, etc.• Conversion: toupper, tolower, mbtowc, etc.• …

28

Method 1: Concretization

Over-constrained• Misses 40% of exploits in our experiments

Π∧ mem[x] = 42 ∧ ’Π

Π ∧ x = 17∧ mem[x] = 42 ∧ ’Π

✓ Solvable✗ Exploits

29

Method 2: Fully Symbolic

Π ∧ mem[x] = 42 ∧ ’Π

✗ Solvable✓ Exploits

Π ∧ mem[x] = 42 ∧ mem[0] = v0 ∧…∧ mem[232-1] = v232-1

∧ ’Π

30

Our ObservationPath predicate (Π)constrains rangeof symbolic memoryaccesses

y = mem[x]

f t

x <= 42

x can be anything

f

t

x >= 50

Use symbolic execution state to:Step 1: Bound memory addresses referencedStep 2: Make search tree for memory address values

42 < x < 50Π

31

Step 1 — Find Boundsmem[ x & 0xff ]

1. Value Set Analysis1 provides initial bounds• Over-approximation

2. Query solver to refine bounds

Lowerbound = 0, Upperbound = 0xff

[1] Balakrishnan et al., Analyzing memory accesses in x86 executables, ICCC 2004

32

Step 2 — Index Search Tree Construction

y = mem[x]if x = 1 then y = 10

Index

MemoryValue

1012

22

20

if x = 2 then y = 12



ite( x < 3, left, right )

ite( x < 2, left, right )

33

Fully Symbolic vs.Index-based Memory Modeling

Fully Symbolic Index-based Piecewise Opt.0

5000

10000Time Timeout atphttpd

v0.4b

34

Index Search Tree Optimization:Piecewise Linear Approximation

y = 2*x + 10

y = - 2*x + 28

Index

MemoryValue

35

Piecewise Linear Approximation

Fully Symbolic Index-based Piecewise Opt.0

5000

10000Time

2x faster

atphttpd v0.4b

36

Exploit Generation

37

a2ps

aeon

aspell

atphttpd

freeradius

ghostscript

glftpd

gnugol

htget

htpasswd

iwconfig

mbse-bbs

nCompress

orzHttpd

psUtils

rsync

sharutils

socat

squirrel mail

tipxd

xgalaga

xtokkaetama

coolplayer

destiny

dizzy

galan

gsplayer

muse

soritong

1 10 100 1000 10000 100000

Linux

(22)

Windows

(7)

38

a2ps

aeon

aspell

atphttpd

freeradius

ghostscript

glftpd

gnugol

htget

htpasswd

iwconfig

mbse-bbs

nCompress

orzHttpd

psUtils

rsync

sharutils

socat

squirrel mail

tipxd

xgalaga

xtokkaetama

coolplayer

destiny

dizzy

galan

gsplayer

muse

soritong

1 10 100 1000 10000 100000

2 Unknown Bugs:FreeRadius,GnuGol

39

Limitations• We do not claim to find all exploitable bugs

• Given an exploitable bug, we do not guarantee we will always find an exploit

• Lots of room for improving symbolic execution, generating other types of exploits (e.g., info leaks), etc.

• We do not consider defenses, which may defend against otherwise exploitable bugs– Q [Schwartz et al., USENIX 2011]But Every Report is Actionable

40

Related Work• APEG [Brumley et al., IEEE S&P 2008]

– Uses patch to locate bug, no shellcode executed

• Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities

[Heelan, MS Thesis, U. of Oxford 2009]– Creates control flow hijack from crashing input

• AEG [Avgerinos et al., NDSS 2011]

– Find and generate exploits from source code

• BitBlaze, KLEE, Sage, S2E, etc.– Symbolic execution frameworks

41

Conclusion• Mayhem automatically generated 29 exploits

against Windows and Linux programs

• Hybrid Execution– Efficient resource management for symbolic

execution

• Index-based Memory Modeling– Handle symbolic memory in real-world

applications

42

Thank You• Our shepherd: Cristian Cadar• Anonymous reviewers• Maverick Woo, Spencer Whitman

43

Q&ASang Kil Cha ([email protected])http://www.ece.cmu.edu/~sangkilc/

mailto:[email protected]

http://www.ece.cmu.edu/~sangkilc/

Documents

Unleashing Mayhem on Binary Code