ADLab
Kevin Coogan, Gen Lu, Saumya Debray
Department of Computer Science, University of Arizona
Presenter: 張逸文
Deobfuscation of Virtualization-Obfuscated Software
Introduction (2/4)
Virtualization obfuscators: VMProtect, Code Virtualizer
{
VIRTUALIZER_START
your code
VIRTUALIZER_END
}
Introduction (3/4)
Virtualization-obfuscated programs are resistant to static and dynamic analysis techniques
  The executed code reveals only the structure and logic of the byte-code interpreter
  Randomness in the VM
Outside-in approach
  Reverse engineer the VM interpreter
  Use it to work out the individual byte code instructions
  Recover the logic of the byte code program
  Works only if the structure of the interpreter meets certain requirements
Introduction (4/4)
Programs interact with the system through system calls
Identify the instructions that interact with the system
Do not recover the original instructions
Capture the behavior of the code
General: usable in a wide range of settings
Deobfuscation
Static analysis vs. dynamic trace
Identify instructions that are known to be part of the original code
Uses no information about the specific structure of the interpreter
Deobfuscation
Overall approach:
1. Tracing tool: produces a low-level execution trace
2. Identify system calls and their arguments (using a database)
3. Instruction trace: mark the relevant instructions
4. Build a subtrace: the relevant subtrace
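Steps 2 and 4 can be sketched as follows. This is a minimal illustration only, not the authors' tool: the `TraceEntry` record, the `SYSCALL_ARGS` table (standing in for the database of system calls and their arguments), and both function names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceEntry:
    """One executed instruction in a hypothetical low-level trace."""
    index: int                     # position in the trace
    mnemonic: str                  # e.g. "mov" or "int 0x80"
    defs: frozenset                # locations written by the instruction
    uses: frozenset                # locations read by the instruction
    syscall: Optional[str] = None  # system call name, if this entry makes one


# Hypothetical stand-in for the system call database: name -> argument locations.
SYSCALL_ARGS = {"write": ("ebx", "ecx", "edx")}


def find_syscalls(trace):
    """Step 2: list (trace index, argument locations) for each known system call."""
    return [(e.index, SYSCALL_ARGS[e.syscall])
            for e in trace if e.syscall in SYSCALL_ARGS]


def build_subtrace(trace, relevant_indices):
    """Step 4: keep only the entries marked relevant, in execution order."""
    return [e for e in trace if e.index in relevant_indices]


trace = [
    TraceEntry(0, "mov", frozenset({"ebx"}), frozenset()),
    TraceEntry(1, "int 0x80", frozenset({"eax"}),
               frozenset({"ebx", "ecx", "edx"}), "write"),
]
syscalls = find_syscalls(trace)
subtrace = build_subtrace(trace, {1})
```

The argument locations returned by `find_syscalls` are the natural starting point for step 3, which decides which earlier instructions are relevant.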
Deobfuscation
Value-based Dependence Analysis
Does not recover the original code
The process of deobfuscation must be semantics-preserving
Identify instructions that affect the values of the arguments to system calls
Slicing algorithms: control-dependent
Data dependencies
Use-definition chains: link instructions that use a variable to the instruction that defines it
Problem:
Deobfuscation
Value-based dependence
if (I defines a location l ∈ S) {
    I is marked as relevant;
    l is removed from S;
    the set of locations used by I is added to S;
}
Problem: a pointer to a structure
I uses some locations l1, l2, …, ld
if (I uses li ∈ P to define ld)
    ld is added to P
if (li accesses a memory location)
    [li] is added to M
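The basic marking rule can be sketched as a backward pass over the trace, starting from a system call. This is a minimal sketch under stated assumptions: each trace entry is a hypothetical `(defs, uses)` pair of location sets, `S` starts as the system call's argument locations, and the pointer sets P and M for the structure-pointer problem are left out.

```python
def relevant_instructions(trace, syscall_index, arg_locations):
    """Walk backward from a system call and mark the instructions that
    define a location whose value flows into the call's arguments."""
    S = set(arg_locations)   # locations whose defining instruction is still wanted
    relevant = []
    for i in range(syscall_index - 1, -1, -1):
        defs, uses = trace[i]
        hit = S & set(defs)
        if hit:              # I defines a location l in S
            relevant.append(i)   # I is marked as relevant
            S -= hit             # l is removed from S
            S |= set(uses)       # locations used by I are added to S
    return sorted(relevant)


# Hypothetical trace: each entry is (locations defined, locations used).
trace = [
    ({"eax"}, set()),     # 0: mov eax, 4
    ({"esi"}, set()),     # 1: mov esi, 7   (interpreter bookkeeping)
    ({"ebx"}, {"eax"}),   # 2: mov ebx, eax
    ({"esi"}, {"esi"}),   # 3: add esi, 1   (interpreter bookkeeping)
    (set(), {"ebx"}),     # 4: system call taking its argument in ebx
]
marked = relevant_instructions(trace, 4, {"ebx"})
```

Here `marked` contains instructions 0 and 2, which produce the value passed in `ebx`, while the two instructions that only touch `esi` are not marked.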
Deobfuscation
Relevant Conditional Control Flow
Value-based dependence analysis doesn't identify the associated control flow instructions
How conditional control flow occurs
  IA-32 architecture: setting the condition code flags in the eflags register
  Not so simple!
  Examine the target address
Equational Reasoning System: translate each instruction in the dynamic trace into an equivalent set of equations
Deobfuscation
Equational Reasoning System
Identifies conditional dependencies
Each left-hand-side variable in an equation is numbered by the order in which its instruction appears
Each right-hand-side variable is numbered by the instruction that defined it
Example 1.
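As an independent illustration of the numbering rule (not the slide's own Example 1), the sketch below turns a tiny trace into numbered equations. The `(dst, op, srcs)` instruction format, the `name_n` subscript notation, and the use of subscript 0 for locations not yet defined in the trace are all assumptions made for this example.

```python
def to_equations(trace):
    """Translate (dst, op, srcs) instructions into numbered equations.
    The left-hand side gets the instruction's own position in the trace;
    each right-hand-side location gets the number of the instruction that
    most recently defined it (0 if it was never defined in this trace)."""
    last_def = {}   # location -> number of the instruction that last defined it
    equations = []
    for n, (dst, op, srcs) in enumerate(trace, start=1):
        rhs = [f"{s}_{last_def.get(s, 0)}" if isinstance(s, str) else str(s)
               for s in srcs]
        equations.append(f"{dst}_{n} = " + f" {op} ".join(rhs))
        last_def[dst] = n
    return equations


# Hypothetical three-instruction trace; integer sources are constants.
trace = [
    ("eax", "mov", [5]),             # mov eax, 5
    ("ebx", "+", ["eax", 3]),        # ebx = eax + 3
    ("eax", "+", ["eax", "ebx"]),    # eax = eax + ebx
]
equations = to_equations(trace)
for line in equations:
    print(line)
```

Because every definition gets a fresh number, the third equation refers unambiguously to the value of `eax` produced by instruction 1 rather than to its own result.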
Deobfuscation
Relevant Call-Return Control Flow
Identifying functions: the behavior of calls and returns
Knowing how they work allows one to use them for other purposes
Behavior of Function Calls and Returns
Deobfuscation
Identification Approach
Call: a code address is saved at the call site
Return: the saved address is used for a control transfer at the return point
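The behavioral definition above can be sketched without looking at opcodes at all. This is a rough illustration under stated assumptions, not the authors' algorithm: trace entries are hypothetical dicts with optional `stores` (memory location to value written), `target` (control transfer destination), and `target_from` (memory location the destination was read from), and `code_range` is an assumed range of code addresses.

```python
def match_calls_and_returns(trace, code_range):
    """Pair each instruction that saves a code address with the later
    control transfer that jumps to that saved address."""
    saved = {}   # memory location -> (index of the saving instruction, saved address)
    pairs = []
    for i, ins in enumerate(trace):
        # Return-like: the transfer target was read from a location that
        # still holds a previously saved code address.
        src = ins.get("target_from")
        if src in saved and saved[src][1] == ins.get("target"):
            pairs.append((saved.pop(src)[0], i))
        # Call-like: a code address is written to memory. Any other write
        # to the same location overwrites the saved address.
        for loc, val in ins.get("stores", {}).items():
            if val in code_range:
                saved[loc] = (i, val)
            else:
                saved.pop(loc, None)
    return pairs


code_range = range(0x1000, 0x3000)   # assumed extent of the code section
trace = [
    {"stores": {0x7FF0: 0x1005}, "target": 0x2000},    # saves 0x1005, transfers control
    {"stores": {0x7FE8: 42}},                          # ordinary data store
    {"target": 0x1005, "target_from": 0x7FF0},         # transfers to the saved address
]
pairs = match_calls_and_returns(trace, code_range)
```

Matching on behavior rather than on `call` and `ret` opcodes is the point of the sketch: the same pairing is found if the save and the transfer are implemented with other instructions.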
Experimental Evaluation
Experimental Methodology
Compile the original source code
Generate an original dynamic trace
Build an original subtrace
Apply the virtualization-obfuscation technique
Generate an obfuscated dynamic trace
Build a relevant subtrace of the obfuscated trace
Match the obfuscated subtrace to the original subtrace and produce scores
Calculate the relevance score and the obfuscation score
Related Work
Deobfuscation of code obfuscated via virtualization obfuscators
  Rolles, Sharif, Falliere
Programming language community
  Partial evaluation
Conclusions
Virtualization-obfuscated programs are difficult to reverse engineer
We present a different approach to identifying the flow of values to system call instructions