Paradyn Project
Paradyn / Dyninst Week
Madison, Wisconsin
April 12-14, 2010
Binary Concolic Execution for Automatic Exploit Generation
Todd Frederick
Vulnerabilities are everywhere…
2Binary Concolic Execution
rtm
Robert Morris
An exploit
DD8F2F736800DD8F2F62696ED05E5ADD00DD00DD5ADD03D05E5CBC3B
shell#
Finger Server
1987
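The hex string on this slide is easier to read once decoded. The following is a minimal Python sketch, not part of the original talk; the helper name `printable_runs` is hypothetical. It extracts runs of printable ASCII from the payload, which shows that the bytes embed the strings `/sh` and `/bin`.

```python
# The hex string exactly as printed on the slide.
EXPLOIT_HEX = "DD8F2F736800DD8F2F62696ED05E5ADD00DD00DD5ADD03D05E5CBC3B"


def printable_runs(data, min_len=3):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    runs, current = [], []
    for byte in data:
        if 0x20 <= byte < 0x7F:
            current.append(chr(byte))
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs


payload = bytes.fromhex(EXPLOIT_HEX)
strings = printable_runs(payload)
print(len(payload), strings)
```

The remaining bytes are machine instructions; this sketch does not try to disassemble them.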
The problem: exploiting vulnerable code
o Find an exploit state in a program
  o Use a known existing vulnerability
o Previous work automatically finds vulnerable states [Giffin, Jha, Miller 2006]
The problem
o Find input that drives the program down a path to the exploit state
  o Analyze program control flow
  o Walk through the program, finding inputs to reach the current point
o Explore paths in the program to reach the vulnerability
[Diagram: normal input, exploit, Program]
Running example
Assume we know of a vulnerability
Program
login: good
password:
login: bad
Using backdoor!
Working with binary code
Program (disassembly):
8048282: lea 0x4(%esp),%ecx
8048286: and $0xfffffff0,%esp
8048289: pushl 0xfffffffc(%ecx)
804828c: push %ebp
804828d: mov %esp,%ebp
804828f: push %ebx
8048290: push %ecx
8048291: sub $0x10,%esp
8048294: call 8048210 <prompt>
8048299: mov $0x3,%eax
804829e: mov $0x0,%ebx
80482a3: mov $0x80bd884,%ecx
80482a8: mov $0x10,%edx
80482ad: int $0x80
80482af: mov %eax,0xfffffff0(%ebp)
80482b2: movzbl 0x80bd886,%eax
80482b9: movsbl %al,%edx
80482bc: movzbl 0x80bd884,%eax
80482c3: movsbl %al,%eax
80482c6: mov %edx,%ecx
80482c8: sub %eax,%ecx
80482ca: mov %ecx,%eax
80482cc: cmp $0x2,%eax
80482cf: jne 8048302 <main+0x80>
80482d1: movzbl 0x80bd886,%eax
80482d8: movsbl %al,%edx
80482db: movzbl 0x80bd885,%eax
80482e2: movsbl %al,%eax
80482e5: mov %edx,%ecx
80482e7: sub %eax,%ecx
80482e9: mov %ecx,%eax
80482eb: cmp $0x3,%eax
80482ee: jne 8048302 <main+0x80>
80482f0: movzbl 0x80bd886,%eax
80482f7: cmp $0x64,%al
80482f9: jne 8048302 <main+0x80>
80482fb: call 804825c <backdoor>
8048300: jmp 8048307 <main+0x85>
8048302: call 8048236 <login>
8048307: mov $0x1,%eax
804830c: mov $0x0,%ebx
8048311: int $0x80
8048313: mov %eax,0xfffffff4(%ebp)
8048316: mov $0x0,%eax
804831b: add $0x10,%esp
804831e: pop %ecx
804831f: pop %ebx
8048320: pop %ebp
8048321: lea 0xfffffffc(%ecx),%esp
8048324: ret
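The three comparisons in the disassembly can be read as a small decision procedure. The following Python sketch models it; this is an illustration, not the original program. The names `run_program` and `to_signed_byte` are hypothetical, and zero padding of short input is an assumption about the buffer's initial contents.

```python
def to_signed_byte(value):
    """Mimic movsbl: reinterpret an unsigned byte as a signed 8-bit value."""
    return value - 256 if value >= 128 else value


def run_program(data):
    """Return which routine the modeled main() calls for the given input."""
    # The read system call requests at most 0x10 bytes into the buffer.
    # Padding short input with zero bytes is an assumption of this sketch.
    buf = data[:16].ljust(3, b"\x00")
    in0, in1, in2 = (to_signed_byte(b) for b in buf[:3])
    if in2 - in0 != 2:       # cmp $0x2,%eax ; jne 8048302
        return "login"
    if in2 - in1 != 3:       # cmp $0x3,%eax ; jne 8048302
        return "login"
    if buf[2] != 0x64:       # cmp $0x64,%al ; jne 8048302
        return "login"
    return "backdoor"        # call 804825c <backdoor>


for text in (b"good", b"egg", b"port", b"bad"):
    print(text, run_program(text))
```

Only the input `bad` reaches the backdoor, which matches the running example.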
Conceptual approach
Symbolic Execution
o Run program, tracking variables as expressions instead of actual (concrete) values
o Collect expressions along the current path
o Find concrete input to satisfy these expressions
[Diagram: Program, Symbolic Executor, Path Conditions, Solver, Generated Input]
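The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the `Expr` class, `lift`, and the brute-force `solve` are hypothetical names introduced here, and the exhaustive search only stands in for a real constraint solver.

```python
from itertools import product


class Expr:
    """A tiny expression tree; leaves are input bytes or constants."""

    def __init__(self, op, *args):
        self.op, self.args = op, args

    def __sub__(self, other):
        return Expr("-", self, lift(other))

    def equals(self, other):
        return Expr("==", self, lift(other))

    def evaluate(self, inputs):
        if self.op == "input":
            return inputs[self.args[0]]
        if self.op == "const":
            return self.args[0]
        left, right = (arg.evaluate(inputs) for arg in self.args)
        return left - right if self.op == "-" else left == right

    def __repr__(self):
        if self.op == "input":
            return "input[%d]" % self.args[0]
        if self.op == "const":
            return str(self.args[0])
        return "(%r %s %r)" % (self.args[0], self.op, self.args[1])


def lift(value):
    """Wrap plain integers as constant expressions."""
    return value if isinstance(value, Expr) else Expr("const", value)


def solve(conditions, length=3, alphabet=b"abcdefg"):
    """Brute-force stand-in for a solver: first input meeting all conditions."""
    for candidate in product(alphabet, repeat=length):
        if all(cond.evaluate(candidate) for cond in conditions):
            return bytes(candidate)
    return None


inp = [Expr("input", i) for i in range(3)]
difference = inp[2] - inp[0]              # tracked as an expression, not a number
path_condition = [difference.equals(2)]   # collected along the current path
generated = solve(path_condition)         # concrete input satisfying the path
print(difference, generated)
```

The variable `difference` holds a tree rather than a value, so the same expression can later be evaluated against any candidate input.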
Conceptual approach
o Exponential number of paths
o Limit and prioritize the paths we will explore
[Diagram: Program, Symbolic Executor, Path Conditions, Path Selector, Solver, Generated Input]
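To make "limit and prioritize" concrete, here is a hedged Python sketch of a best-first exploration with a fixed budget. It is one possible reading of the slide, not the talk's path selector; `count_paths`, `explore`, and the scoring function are hypothetical.

```python
import heapq
from itertools import count


def count_paths(num_branches):
    """With independent two-way branches, the number of paths doubles each time."""
    return 2 ** num_branches


def explore(num_branches, budget, score):
    """Best-first search over branch-decision prefixes, bounded by budget.

    Lower scores are explored first; each pop from the queue costs one unit.
    """
    tie = count()                          # tie-breaker keeps heap entries comparable
    heap = [(score(()), next(tie), ())]
    complete = []
    while heap and budget > 0:
        _, _, prefix = heapq.heappop(heap)
        budget -= 1
        if len(prefix) == num_branches:
            complete.append(prefix)
            continue
        for decision in (True, False):
            child = prefix + (decision,)
            heapq.heappush(heap, (score(child), next(tie), child))
    return complete


# Prefer prefixes with fewer False decisions (a made-up priority).
best_first = explore(3, budget=4, score=lambda prefix: prefix.count(False))
print(count_paths(3), count_paths(20), best_first)
```

With a budget of four steps the search reaches exactly one complete path out of eight, which is the trade-off the slide describes.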
Traditional symbolic execution
read_input()
if( input[2]-input[0] == 2 )
  if( input[2]-input[1] == 3 )
    if( input[2] == 'd' )
      backdoor()
    else login()
  else login()
else login()
Traditional symbolic execution
Symbolic Memory
buffer: input[0],input[1],input[2]
Traditional symbolic execution
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] != 2
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2
Traditional symbolic execution
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] != 3
Traditional symbolic execution
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3 && input[2] == 'd'
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3 && input[2] != 'd'
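The walk-through above forks a new symbolic state at every branch. A minimal Python sketch of that forking follows, assuming the three tests of the running example; `BRANCHES`, `fork_all_paths`, and the brute-force `solve` are hypothetical stand-ins, not the talk's implementation.

```python
from itertools import product

# Each branch: (text of the condition, predicate over concrete input bytes).
BRANCHES = [
    ("input[2]-input[0] == 2", lambda b: b[2] - b[0] == 2),
    ("input[2]-input[1] == 3", lambda b: b[2] - b[1] == 3),
    ("input[2] == 'd'", lambda b: b[2] == ord("d")),
]


def fork_all_paths():
    """Fork at every branch; return (path condition, function reached) pairs."""
    finished = []
    worklist = [([], 0)]                   # (terms so far, index of next branch)
    while worklist:
        terms, index = worklist.pop()
        if index == len(BRANCHES):
            finished.append((terms, "backdoor"))
            continue
        text, predicate = BRANCHES[index]
        # False side: any failed test ends the path in login().
        finished.append((terms + [(text, predicate, False)], "login"))
        # True side: keep going to the next test.
        worklist.append((terms + [(text, predicate, True)], index + 1))
    return finished


def solve(terms, alphabet=range(ord("a"), ord("z") + 1)):
    """Brute-force stand-in for a solver over three lowercase letters."""
    for candidate in product(alphabet, repeat=3):
        if all(pred(candidate) == wanted for _, pred, wanted in terms):
            return bytes(candidate)
    return None


paths = fork_all_paths()
exploit_inputs = [solve(terms) for terms, target in paths if target == "backdoor"]
print(len(paths), exploit_inputs)
```

Every state produced by a fork has to be kept until it is finished, which is the cost that the next slide raises.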
Problems with symbolic execution
• Must maintain exponentially many symbolic states
• Expressions may be difficult or infeasible to solve
Solution: Run program concretely and symbolically
Concrete Execution + Symbolic Execution = Concolic Execution
Concolic execution overview
o Symbolic execution follows concrete path
o Some expressions use concrete values
[Diagram: Input, Program, Concrete Executor, Instructions, Symbolic Executor, Path Conditions, Path Selector, Solver, Generated Input]
Concolic execution
• Advantages
  • Track less state in parallel by following a single path at a time
  • Simplify expressions by substituting concrete values for difficult subexpressions
• Disadvantage
  • Concrete values only hold for a specific set of concrete inputs, so mixing concrete values and expressions may produce inaccurate expressions
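The simplification advantage can be illustrated with a branch whose condition contains a sub-expression that is hard to invert. The Python sketch below is an assumption-laden example and not from the talk: `hard` uses SHA-256 purely as a stand-in for a difficult sub-expression, and `takes_branch` and `concretize_and_solve` are hypothetical names.

```python
import hashlib


def hard(byte):
    """Stand-in for a sub-expression a solver cannot easily invert."""
    return hashlib.sha256(bytes([byte])).digest()[0]


def takes_branch(data):
    """The branch under test: hard(input[0]) == input[1]."""
    return hard(data[0]) == data[1]


def concretize_and_solve(concrete_input):
    """Replace hard(input[0]) by its observed value, then solve for input[1]."""
    observed = hard(concrete_input[0])     # concrete value seen during the run
    # Simplified condition: observed == input[1], with input[0] held fixed.
    return bytes([concrete_input[0], observed])


seed = b"ab"
generated = concretize_and_solve(seed)
print(takes_branch(seed), takes_branch(generated))
```

The simplified condition is only trustworthy while `input[0]` keeps its concrete value; if a later constraint changes that byte, the substituted constant may no longer match, which is the disadvantage listed above.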
Concolic execution example
Input: good
read_input()
if( input[2]-input[0] == 2 )
  if( input[2]-input[1] == 3 )
    if( input[2] == 'd' )
      backdoor()
    else login()
  else login()
else login()
Symbolic Memory
buffer:
Concrete Memory
buffer:
Concolic execution example
Input: good
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: g,o,o,d
Concolic execution example
Input: good
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] != 2
Concrete Memory
buffer: g,o,o,d
Concolic execution example
Input: good
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2
Generated Input: egg
Concrete Memory
buffer: g,o,o,d
Concolic execution example
Input: egg
Symbolic Memory
buffer:
Concrete Memory
buffer:
Concolic execution example
Input: egg
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: e,g,g
Concolic execution example
Input: egg
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2
Concrete Memory
buffer: e,g,g
Concolic execution example
Input: egg
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] != 3
Concrete Memory
buffer: e,g,g
Concolic execution example
Input: egg
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3
Generated Input: port
Concrete Memory
buffer: e,g,g
Concolic execution example
Input: port
Symbolic Memory
buffer:
Concrete Memory
buffer:
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: p,o,r,t
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2
Concrete Memory
buffer: p,o,r,t
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3
Concrete Memory
buffer: p,o,r,t
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3 && input[2] != 'd'
Concrete Memory
buffer: p,o,r,t
Concolic execution example
Input: port
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3 && input[2] == 'd'
Generated Input: bad
Concrete Memory
buffer: p,o,r,t
Concolic execution example
Input: bad
Symbolic Memory
buffer:
Concrete Memory
buffer:
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Concrete Memory
buffer: b,a,d
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2
Concrete Memory
buffer: b,a,d
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3
Concrete Memory
buffer: b,a,d
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3 && input[2] == 'd'
Concrete Memory
buffer: b,a,d
Concolic execution example
Input: bad
Symbolic Memory
buffer: input[0],input[1],input[2]
Path Condition
input[2]-input[0] == 2 && input[2]-input[1] == 3 && input[2] == 'd'
Success
Concrete Memory
buffer: b,a,d
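The whole example can be condensed into a short loop: run concretely, record the branch outcomes, negate the last term, solve, and rerun. The Python sketch below is a toy under stated assumptions. `BRANCHES`, `concrete_run`, `solve`, and `concolic_search` are hypothetical names, and the brute-force search stands in for a real solver, so the intermediate inputs it generates differ from the `egg` and `port` shown on the slides.

```python
from itertools import product

# The three tests of the running example, in program order.
BRANCHES = [
    ("input[2]-input[0] == 2", lambda b: b[2] - b[0] == 2),
    ("input[2]-input[1] == 3", lambda b: b[2] - b[1] == 3),
    ("input[2] == 'd'", lambda b: b[2] == ord("d")),
]


def concrete_run(data):
    """Run the example concretely; return (branch outcomes, function reached)."""
    outcomes = []
    for _, predicate in BRANCHES:
        taken = predicate(data)
        outcomes.append(taken)
        if not taken:
            return outcomes, "login"
    return outcomes, "backdoor"


def solve(wanted, alphabet=range(ord("a"), ord("z") + 1)):
    """Brute-force stand-in for a solver over three lowercase letters."""
    for candidate in product(alphabet, repeat=3):
        if all(BRANCHES[i][1](candidate) == w for i, w in enumerate(wanted)):
            return bytes(candidate)
    return None


def concolic_search(seed, max_runs=10):
    """Follow the concrete path, negate its last term, solve, and rerun."""
    data, history = seed, []
    for _ in range(max_runs):
        outcomes, reached = concrete_run(data)
        history.append((data, reached))
        if reached == "backdoor":
            return history
        # Keep the path prefix and flip the final (failed) branch.
        wanted = outcomes[:-1] + [not outcomes[-1]]
        data = solve(wanted)
        if data is None:
            break
    return history


history = concolic_search(b"good")
for data, reached in history:
    print(data, reached)
```

Negating only the last term is enough here because every failed test ends the path; a program with branches on both sides would need a path selector to choose among several candidates.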
Inaccurate expressions
• Some variables depend on input
• Replacing these variables with concrete values may yield inaccurate expressions
• Solving an inaccurate path condition may produce input that does not take the desired path
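A small Python sketch can show how such a divergence arises and how it can be detected by rerunning the generated input. Everything here is hypothetical: `lookup` is a made-up input-dependent computation that the symbolic side chooses not to model, and `run` is a two-branch toy program.

```python
def lookup(byte):
    """Input-dependent value that the symbolic side replaces with a constant."""
    return (byte * 7 + 3) % 256


def run(data):
    """Concrete run; returns the branch outcomes along the path taken."""
    path = [lookup(data[0]) == data[1]]          # branch 1
    if path[0]:
        path.append(data[0] == ord("z"))         # branch 2
    return path


seed = bytes([ord("a"), lookup(ord("a"))])
observed_path = run(seed)                        # branch 1 taken, branch 2 not

# Concrete substitution: lookup(input[0]) is replaced by the value seen in the run.
k = lookup(seed[0])
# Inaccurate path condition: (k == input[1]) && (input[0] == 'z')
generated = bytes([ord("z"), k])

desired_path = [True, True]
actual_path = run(generated)
diverged = actual_path != desired_path
print(observed_path, actual_path, diverged)
```

Changing `input[0]` to satisfy the second term invalidates the constant substituted in the first term, so the new run fails at branch 1 instead of reaching branch 2.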
Concolic execution system design
[Diagram: Input, Program, Concrete Executor, Instructions, Symbolic Executor, Path Conditions, Path Selector, Solver, Generated Input]
Concolic execution system design
[Diagram: Input, Program, Concrete Executor (Dyninst, ProcControlAPI), Instructions, Symbolic Executor (SymEval), Path Conditions, Path Selector, Solver (STP), Generated Input]
Concrete execution components
Concrete Executor
• Redirects program input
• Reads actual values of instruction operands
• Tracks path taken
Dyninst
• Assists with static analysis
ProcControlAPI
• Runs program using single-stepping or breakpoints
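The concrete executor's three duties (feeding input, reading operand values, tracking the path) can be mimicked on a modeled program. The Python sketch below is only an analogy: it does not use Dyninst or ProcControlAPI, and `ConcreteExecutor`, `branch`, and `modeled_main` are hypothetical names. The `branch` method plays the role of a handler invoked at each conditional jump.

```python
class ConcreteExecutor:
    """Toy stand-in: feeds input to a modeled program and logs each branch."""

    def __init__(self, program):
        self.program = program
        self.trace = []            # (branch address, operand values, jump taken?)

    def branch(self, address, left, right):
        """Called at each conditional jump, like a breakpoint handler."""
        taken = left != right      # jne jumps when the operands differ
        self.trace.append((address, (left, right), taken))
        return taken

    def run(self, data):
        """Redirect `data` into the program and record a fresh trace."""
        self.trace = []
        return self.program(self, data)


def modeled_main(executor, data):
    """Model of the three jne instructions in the running example."""
    in0, in1, in2 = data[0], data[1], data[2]
    if executor.branch(0x80482CF, in2 - in0, 2):
        return "login"
    if executor.branch(0x80482EE, in2 - in1, 3):
        return "login"
    if executor.branch(0x80482F9, in2, 0x64):
        return "login"
    return "backdoor"


executor = ConcreteExecutor(modeled_main)
result = executor.run(b"egg")
print(result, executor.trace)
```

The recorded trace holds exactly what a symbolic executor needs from the concrete side: which branches were reached, with what operand values, and which way each went.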
Symbolic execution components
Symbolic Executor
• Symbolic memory
• Identify input
• Update symbolic memory
• Extract conditional predicates
SymEval
• Represents instruction semantics as ASTs
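To illustrate what "instruction semantics as ASTs" can mean, the Python sketch below evaluates one compare-and-branch sequence from the running example over symbolic registers and extracts the branch predicate. It is a hand-written toy, not SymEval's API: the tuple-based AST, `symbolic_step`, and `render` are hypothetical, and the sign extension performed by movzbl and movsbl is deliberately ignored.

```python
def symbolic_step(state, instruction):
    """Update symbolic registers for one instruction; return a predicate or None."""
    opcode, *operands = instruction
    if opcode == "mov":                    # mov src,dst (AT&T operand order)
        src, dst = operands
        state[dst] = state[src]
    elif opcode == "sub":                  # sub src,dst means dst = dst - src
        src, dst = operands
        state[dst] = ("sub", state[dst], state[src])
    elif opcode == "cmp":                  # cmp $imm,reg records the comparison
        imm, reg = operands
        state["flags"] = ("eq", state[reg], ("const", imm))
    elif opcode == "jne":                  # predicate under which the jump is taken
        return ("not", state["flags"])
    return None


def render(node):
    """Print an AST in the notation used on the slides."""
    kind = node[0]
    if kind == "input":
        return "input[%d]" % node[1]
    if kind == "const":
        return str(node[1])
    if kind == "sub":
        return "%s-%s" % (render(node[1]), render(node[2]))
    if kind == "eq":
        return "%s == %s" % (render(node[1]), render(node[2]))
    if kind == "not":
        return "!(%s)" % render(node[1])
    raise ValueError(kind)


# Symbolic memory after the loads: edx holds input[2], eax holds input[0].
state = {"edx": ("input", 2), "eax": ("input", 0)}
block = [
    ("mov", "edx", "ecx"),
    ("sub", "eax", "ecx"),
    ("mov", "ecx", "eax"),
    ("cmp", 2, "eax"),
    ("jne", 0x8048302),
]
predicate = None
for instruction in block:
    predicate = symbolic_step(state, instruction) or predicate
print(render(predicate))
```

The extracted predicate is the condition for jumping to `login`; its negation is the first term of the path condition that leads toward the backdoor.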
Path searching components
STP (Solver)
• Designed for program analysis applications
• Handles bit-vector data types
Path Conditions
• One term for each branch taken
Path Selector
• Decides where to branch off from current path
• Is a depth-first search for now
• Other strategies will use static CFG analysis
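With one term per branch taken, a depth-first selector can be sketched as "negate the deepest term that has not been tried yet". The Python generator below is an assumed reading of the slide, not the talk's code; `depth_first_targets` and the `explored` set are hypothetical.

```python
def depth_first_targets(path, explored):
    """Yield prefixes of `path` with their last term negated, deepest first.

    `path` is a sequence of branch outcomes (one per branch taken) and
    `explored` holds targets that were already handed to the solver.
    """
    for depth in range(len(path) - 1, -1, -1):
        target = tuple(path[:depth]) + (not path[depth],)
        if target not in explored:
            yield target


# Path of input "egg": first test passes, second test fails.
current_path = (True, False)
fresh = list(depth_first_targets(current_path, explored=set()))
after_one = list(depth_first_targets(current_path, explored={(True, True)}))
print(fresh, after_one)
```

Each yielded target corresponds to a path condition for the solver: the kept prefix plus the negated term.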
Previous Work in Binary Concolic Execution
• IDS signature generation [Song, et al. 2008]
  • Combined exploit strings to create signatures
  • Required an initial exploit, or a patch for the vulnerability
• Program testing [Godefroid, et al. 2008]
  • Created test cases with maximum code coverage in mind
  • Used instruction-level tracing for concrete execution
Potential Benefits of our Approach
• Our approach will be capable of finding the initial exploit
• We will do concrete execution with instrumentation, which gives us the flexibility to instrument selectively
• We plan to develop smarter path selection techniques using static control flow analysis
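The talk leaves the planned path selection technique open. One possible heuristic, shown here purely as an assumption, is to prefer the branch successor that is statically closest to the vulnerable block. The Python sketch below builds a control flow graph by hand from the running example's basic block addresses (call edges ignored); `CFG`, `distances_to`, and `preferred_successor` are hypothetical names.

```python
from collections import deque

# Basic blocks of main(), keyed by start address, with their successors.
CFG = {
    0x8048282: [0x8048302, 0x80482D1],   # ends in jne at 80482cf
    0x80482D1: [0x8048302, 0x80482F0],   # ends in jne at 80482ee
    0x80482F0: [0x8048302, 0x80482FB],   # ends in jne at 80482f9
    0x80482FB: [0x8048307],              # call backdoor; jmp 8048307
    0x8048302: [0x8048307],              # call login
    0x8048307: [],                       # exit
}


def distances_to(target, cfg):
    """Breadth-first search over reversed edges: block -> hops to target."""
    reverse = {block: [] for block in cfg}
    for block, successors in cfg.items():
        for successor in successors:
            reverse[successor].append(block)
    distance = {target: 0}
    queue = deque([target])
    while queue:
        block = queue.popleft()
        for predecessor in reverse[block]:
            if predecessor not in distance:
                distance[predecessor] = distance[block] + 1
                queue.append(predecessor)
    return distance


def preferred_successor(block, cfg, distance):
    """Pick the successor closest to the target, or None if none can reach it."""
    reachable = [s for s in cfg[block] if s in distance]
    return min(reachable, key=distance.get) if reachable else None


distance = distances_to(0x80482FB, CFG)
choice = preferred_successor(0x80482D1, CFG, distance)
print({hex(block): hops for block, hops in distance.items()}, hex(choice))
```

Blocks that cannot reach the target, such as the one calling `login`, never appear in `distance`, so a selector using this heuristic would not spend solver calls on them.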
Status
• Concrete execution partially implemented using ProcControlAPI
  • Using standard input
  • Will support network and environment as inputs
• Symbolic execution and path selection not implemented yet
• Driving development of SymEval
  • Instruction semantics
  • AST simplification
Conclusion
Program
Exploit
Finding the first exploit with binary concolic execution using instrumentation

mov %edx,%ecx
sub %eax,%ecx
mov %ecx,%eax
cmp $0x2,%eax
jne 8048302

mov %edx,%ecx
sub %eax,%ecx
mov %ecx,%eax
cmp $0x3,%eax
jne 8048302

movzbl 0x80bd886,%eax
cmp $0x64,%al
jne 8048302
call 804825c
input[2] == 'd'