Design of x86 Emulator for Generic Unpacking, Chandra Prakash (chandrap@sunbelt-software.com)

Page 1: Design of x86 Emulator for Generic Unpacking

Design of x86 Emulator for Generic Unpacking

Chandra Prakash (chandrap@sunbelt-software.com)

Page 2: Design of x86 Emulator for Generic Unpacking

The problem

A large number of detections are still based on a static signature, e.g., MD5 or CRC32.

Malware has evolved to evade signature-based detection through the use of packers.

Page 3: Design of x86 Emulator for Generic Unpacking

The problem, contd…

It is possible to write custom unpacking routines for each packer

Cryptanalysis or X-Ray can also be used

But the number of packers and the variations within each packer type are too many, e.g., the current version range for UPX is 1.x–3.x and for FSG is 1.x–2.x

Moreover, there can be recursive layers of packing

Page 4: Design of x86 Emulator for Generic Unpacking

A Solution - Emulation

Due to the nature of the problem, a general-purpose solution is desirable

Emulation provides a “fairly” general-purpose solution, which leads to the term Generic Unpacking

Page 5: Design of x86 Emulator for Generic Unpacking

What is Emulation?

The Wikipedia definition is pretty clear:

“An emulator duplicates (provides an emulation of) the functions of one system using a different system, so that the second system behaves like (and appears to be) the first system. This focus on exact reproduction of external behavior is in contrast to simulation, which can concern an abstract model of the system being simulated, often considering internal state.”

Page 6: Design of x86 Emulator for Generic Unpacking

Emulation – where else is it used?

Supporting cross-platform applications

Controlled and secure execution of untrusted applications

And, of course, dynamic behavioral analysis of malware and packed malware detection via generic unpacking

Etc.

Page 7: Design of x86 Emulator for Generic Unpacking

Emulation – to what degree?

Full emulation – Emulate everything: the application as well as the operating system. E.g., VMware and Virtual PC

Application only – Emulate the application-level instruction set and the system call interface. E.g., WoW64, Win32 emulation on 64-bit Windows

Our emulator for Generic Unpacking is application only

Page 8: Design of x86 Emulator for Generic Unpacking

Emulator Components

A software implementation of the subset of hardware, operating system and application environment needed for running an application.

The hardware components include: the CPU, registers, interrupt vector table.

The operating system components include: PE loader, virtual memory manager, structured exception handling (SEH).

The application environment includes: input parameter and environment variable support, heap, stack, process environment block (PEB), thread information block (TIB), function hooks for spoofing execution references into system DLL(s).

Page 9: Design of x86 Emulator for Generic Unpacking

Emulator Components (class diagram)

Classes shown in the diagram (+ marks an operation, - marks an attribute):

System: +initialize(), +loadPEImage(), +createProcess()

Process: +run()

Thread

X86CPU: +fetch(), +executeOneInstruction()

MemoryManager: +readByte(), +writeByte(), +virtualAlloc(), +virtualFree(), +virtualProtect()

MemoryRegion: +readByte(), +writeByte(); -base, -size

MemoryBlockDescriptor: +readByte(), +writeByte(), +generateAccessViolationException(); -allocationType, -protectionType, -startPage, -pageCount

PELoader: +parseImage(), +loadImage()

Stack: +resize(); -top, -bottom

Heap: +heapCreate(), +heapDestroy()

HookList: +addHook(), +findHook()

SEHHandler

Structs: X86Registers, PEB, PEB_LDR_DATA, TIB

Associations between the classes are one-to-one (1 to 1) or one-to-many (1 to *).

Page 10: Design of x86 Emulator for Generic Unpacking

Emu Components - PE Loader

The very first step in a target’s emulation

Create a memory-mapped image as per the Windows PE specification

Calculate the virtual mapped size

Allocate a contiguous buffer based on the virtual mapped size, then copy the PE headers and section data into aligned sections

Fix imports from the primary module

Fix relocations
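The size calculation and copy steps above can be sketched in Python as follows. This is a minimal sketch, not the emulator's actual loader: it assumes the section headers have already been parsed into simple records, and the names `Section`, `align_up`, `virtual_mapped_size` and `map_image` are hypothetical. Import and relocation fixing are left out.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical, already-parsed subset of a PE section header."""
    virtual_address: int  # RVA at which the section is mapped
    virtual_size: int     # size of the section in memory
    raw_offset: int       # file offset of the section data
    raw_size: int         # size of the section data on disk


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def virtual_mapped_size(size_of_headers, sections, section_alignment):
    """Size of the contiguous buffer needed to hold the mapped image."""
    end = align_up(size_of_headers, section_alignment)
    for s in sections:
        span = max(s.virtual_size, s.raw_size)
        end = max(end, align_up(s.virtual_address + span, section_alignment))
    return end


def map_image(file_bytes, size_of_headers, sections, section_alignment=0x1000):
    """Copy headers and section data to their aligned virtual offsets."""
    size = virtual_mapped_size(size_of_headers, sections, section_alignment)
    image = bytearray(size)  # zero-filled, like freshly committed pages
    headers = file_bytes[:size_of_headers]
    image[:len(headers)] = headers
    for s in sections:
        data = file_bytes[s.raw_offset:s.raw_offset + s.raw_size]
        image[s.virtual_address:s.virtual_address + len(data)] = data
    return image
```

A real loader would also have to tolerate malformed headers, as the later examples in this deck point out.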

Page 11: Design of x86 Emulator for Generic Unpacking

Emu Components - Registers

There are eight 32-bit general purpose registers (EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI)

Six 16-bit segment registers (CS, SS, DS, ES, FS, GS)

DR0-DR3, DR6, DR7 hardware debug registers

EFLAGS and EIP registers

It is an added benefit to also support FPU instructions and extensions to the x86 architecture, such as MMX, SSE, SSE2, SSE3[10] and 3DNow! instructions

Page 12: Design of x86 Emulator for Generic Unpacking

Emu Components - CPU

Fetch instructions from the virtual memory address space of the target

Decode the instruction: find the instruction type and get its operands

Execute the instruction: calculate the results and store them

Move on to the next instruction as indicated by EIP
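The fetch, decode and execute cycle can be illustrated with a toy interpreter. The sketch below is an assumption-laden illustration, not the emulator described in these slides: `TinyCPU` and `run` are hypothetical names, only five one-byte opcodes are handled, EFLAGS is not modelled, and RETN simply ends the run instead of popping a return address.

```python
import struct

# Register numbers as encoded in x86 opcodes (0 = EAX ... 7 = EDI).
REG_NAMES = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


class TinyCPU:
    def __init__(self, memory: bytes, entry: int = 0):
        self.mem = memory
        self.eip = entry
        self.regs = dict.fromkeys(REG_NAMES, 0)

    def fetch(self, count: int) -> bytes:
        """Read count bytes at EIP and advance EIP past them."""
        data = bytes(self.mem[self.eip:self.eip + count])
        if len(data) != count:
            raise MemoryError(f"fetch past end of memory at {self.eip:#x}")
        self.eip += count
        return data

    def execute_one_instruction(self) -> bool:
        """Decode and execute one instruction; return False to stop."""
        opcode = self.fetch(1)[0]
        if opcode == 0x90:                       # NOP
            pass
        elif 0x40 <= opcode <= 0x47:             # INC r32 (flags ignored)
            name = REG_NAMES[opcode - 0x40]
            self.regs[name] = (self.regs[name] + 1) & 0xFFFFFFFF
        elif 0xB8 <= opcode <= 0xBF:             # MOV r32, imm32
            (imm,) = struct.unpack("<I", self.fetch(4))
            self.regs[REG_NAMES[opcode - 0xB8]] = imm
        elif opcode == 0xEB:                     # JMP rel8
            (rel,) = struct.unpack("<b", self.fetch(1))
            self.eip = (self.eip + rel) & 0xFFFFFFFF
        elif opcode == 0xC3:                     # RETN: treated as "stop" here
            return False
        else:
            raise NotImplementedError(f"un-emulated opcode {opcode:#04x}")
        return True


def run(cpu: TinyCPU) -> None:
    """Keep executing until an instruction asks to stop."""
    while cpu.execute_one_instruction():
        pass


# MOV EAX,5 / INC EAX / JMP +1 / INC EBX (skipped) / NOP / RETN
program = bytes([0xB8, 5, 0, 0, 0, 0x40, 0xEB, 0x01, 0x43, 0x90, 0xC3])
cpu = TinyCPU(program)
run(cpu)
```

Raising on an unknown opcode corresponds to the "un-emulated or illegal instruction" stop condition listed later.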

Page 13: Design of x86 Emulator for Generic Unpacking

Emu Components – Interrupt Handling

INT N generates an interrupt, with N in the range 0-255

Execution of INT N results in a software exception in the application

From user mode only a subset of these is allowed; all others result in an access violation exception

Page 14: Design of x86 Emulator for Generic Unpacking

Emu Components – Interrupt Handling…contd

User mode exceptions for INT N as noted on Windows XP-SP2

    Interrupt Number (N, hex)    Exception thrown
    3, 2d                        Breakpoint
    4                            Integer Overflow
    2a, 2b, 2c, 2e               None
    All others                   Access Violation
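The table lends itself to a simple lookup. A minimal Python sketch follows, assuming the emulator represents exceptions by name; `exception_for_int` and the two constants are hypothetical names, and the mapping is only what the slide reports for Windows XP SP2.

```python
# Interrupt numbers that raise a specific exception in user mode.
INT_EXCEPTIONS = {
    0x03: "Breakpoint",
    0x2D: "Breakpoint",
    0x04: "Integer Overflow",
}

# Interrupt numbers that raise no exception at all.
NO_EXCEPTION = {0x2A, 0x2B, 0x2C, 0x2E}


def exception_for_int(n: int):
    """Return the exception name for INT n, or None if none is raised."""
    if not 0 <= n <= 0xFF:
        raise ValueError("interrupt number must be in the range 0-255")
    if n in NO_EXCEPTION:
        return None
    return INT_EXCEPTIONS.get(n, "Access Violation")
```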

Page 15: Design of x86 Emulator for Generic Unpacking

Emu Components – Virtual Memory Manager

Manages the virtual memory used by the target at the very lowest level

Maintains memory regions

Each region consists of a contiguous sequence of pages, e.g., the PE image region

Each page has its own allocation and protection characteristics

Allocation types include reserved, committed and free

Protection types include read, write, execute etc.

An access violation is generated when a memory reference is not compatible with the allocation and protection type of the region
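A minimal Python sketch of such a memory manager is shown below. It is an illustration under simplifying assumptions: every allocated page is treated as committed and every unmapped address as free, region bases and `virtual_protect` addresses are assumed page-aligned, overlapping allocations are not checked, and the flag values and class names are hypothetical rather than taken from the emulator.

```python
PAGE_SIZE = 0x1000
READ, WRITE, EXECUTE = 0x1, 0x2, 0x4  # hypothetical protection flags


class AccessViolation(Exception):
    """Raised when a reference is incompatible with the page state."""


class MemoryRegion:
    """A contiguous run of committed pages with per-page protection."""

    def __init__(self, base: int, page_count: int, protection: int):
        self.base = base
        self.data = bytearray(page_count * PAGE_SIZE)
        self.protection = [protection] * page_count

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + len(self.data)


class MemoryManager:
    def __init__(self):
        self.regions = []

    def virtual_alloc(self, base: int, size: int, protection: int) -> int:
        pages = (size + PAGE_SIZE - 1) // PAGE_SIZE
        self.regions.append(MemoryRegion(base, pages, protection))
        return base

    def _find(self, address: int) -> MemoryRegion:
        for region in self.regions:
            if region.contains(address):
                return region
        raise AccessViolation(f"unmapped address {address:#010x}")

    def _check(self, address: int, access: int):
        region = self._find(address)
        offset = address - region.base
        if not region.protection[offset // PAGE_SIZE] & access:
            raise AccessViolation(f"protection fault at {address:#010x}")
        return region, offset

    def virtual_protect(self, address: int, size: int, protection: int) -> None:
        for page_address in range(address, address + size, PAGE_SIZE):
            region = self._find(page_address)
            index = (page_address - region.base) // PAGE_SIZE
            region.protection[index] = protection

    def read_byte(self, address: int) -> int:
        region, offset = self._check(address, READ)
        return region.data[offset]

    def write_byte(self, address: int, value: int) -> None:
        region, offset = self._check(address, WRITE)
        region.data[offset] = value & 0xFF
```

An instruction fetch would go through the same check with `EXECUTE` as the requested access.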

Page 16: Design of x86 Emulator for Generic Unpacking

SEH handling

Most commonly used to obfuscate the execution path by deliberate generation and handling of software exceptions.

Typically used instructions are:

Single step (INT1) and breakpoint (INT3) instructions

Arithmetic divide or integer overflow exceptions that are generated by the DIV/IDIV and INTO instructions.

Page 17: Design of x86 Emulator for Generic Unpacking

Stack

The stack is a contiguous memory region that serves, among other things, as a memory work area for parameters passed in function calls and for the SEH chain.

There exists one stack for each thread.

It is implemented in an inverted manner so that it grows in the direction of decreasing memory addresses.

The stack parameters, e.g., base, limit and address of the top level exception handler frame, should be appropriately set in the TIB

Page 18: Design of x86 Emulator for Generic Unpacking

Heap

The heap enables efficient memory allocations at a much finer granularity than the page-granular allocations of the VirtualAlloc call.

To support Win32 heap related calls made by the target, e.g., HeapAlloc, HeapFree, etc., a simulation of them needs to be provided.

The heap is implemented as a wrapper around page-granular memory allocation calls.

Page 19: Design of x86 Emulator for Generic Unpacking

Thread Information Block (TIB)

For each thread there is a TIB structure stored at the address indicated by FS:[18h].

    +0x000 ExceptionList        : Ptr32 _EXCEPTION_REGISTRATION_RECORD
    +0x004 StackBase            : Ptr32 Void
    +0x008 StackLimit           : Ptr32 Void
    +0x00c SubSystemTib         : Ptr32 Void
    +0x010 FiberData            : Ptr32 Void
    +0x010 Version              : Uint4B
    +0x014 ArbitraryUserPointer : Ptr32 Void
    +0x018 Self                 : Ptr32 _NT_TIB

The first field, ExceptionList, contains the address of the top level exception handler frame, represented by an EXCEPTION_REGISTRATION_RECORD structure.

StackBase and StackLimit contain the upper bound and lower bound of the thread’s stack, respectively.

The address of the PEB can be obtained as FS:[30h]
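The layout above can be serialized directly into the emulated FS segment. The following is a minimal Python sketch, assuming 32-bit little-endian fields; `build_tib` and `read_fs_dword` are hypothetical helper names, and the addresses in the usage lines are illustrative values only. The default `exception_list` of 0xFFFFFFFF is the conventional end-of-chain marker for an empty SEH chain.

```python
import struct

TIB_SIZE = 0x34  # just large enough to include the PEB pointer at 0x30


def build_tib(tib_address, stack_base, stack_limit, peb_address,
              exception_list=0xFFFFFFFF):
    """Return the first 0x34 bytes of a TIB as the target would see them."""
    tib = bytearray(TIB_SIZE)
    struct.pack_into(
        "<7I", tib, 0x00,
        exception_list,   # +0x00 ExceptionList
        stack_base,       # +0x04 StackBase (highest stack address)
        stack_limit,      # +0x08 StackLimit (lowest stack address)
        0,                # +0x0c SubSystemTib
        0,                # +0x10 FiberData / Version
        0,                # +0x14 ArbitraryUserPointer
        tib_address,      # +0x18 Self
    )
    struct.pack_into("<I", tib, 0x30, peb_address)  # +0x30 PEB pointer
    return bytes(tib)


def read_fs_dword(tib: bytes, offset: int) -> int:
    """Emulate a DWORD read from FS:[offset]."""
    return struct.unpack_from("<I", tib, offset)[0]


tib = build_tib(0x7FFDE000, 0x00130000, 0x0012D000, 0x7FFDF000)
self_pointer = read_fs_dword(tib, 0x18)
peb_pointer = read_fs_dword(tib, 0x30)
```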

Page 20: Design of x86 Emulator for Generic Unpacking

Process Environment Block (PEB)

For each user mode process there is one PEB

Some of the important fields accessed by malware are: BeingDebugged, ImageBaseAddress, and InLoadOrderModuleList, InMemoryOrderModuleList and InInitializationOrderModuleList of PEB_LDR_DATA

The IsDebuggerPresent Win32 API simply returns the value in the BeingDebugged field of the PEB. This is used by malware to detect a debugger’s presence as one of the anti-debugging tricks

    +0x002 BeingDebugged : UChar // in PEB

The sorted list of modules is maintained in three different LIST_ENTRY type data structures in PEB_LDR_DATA

    +0x00c InLoadOrderModuleList           : _LIST_ENTRY
    +0x014 InMemoryOrderModuleList         : _LIST_ENTRY
    +0x01c InInitializationOrderModuleList : _LIST_ENTRY

Page 21: Design of x86 Emulator for Generic Unpacking

Function hooks

In an application-only emulator, any system call made by malware into a dependent system module like kernel32.dll is intercepted and a corresponding spoofed implementation is provided

Some of the functions include: LoadLibraryA/W, GetProcAddress, GetModuleHandleA/W, VirtualAlloc, VirtualFree, HeapAlloc, HeapFree, GetVersionExA/W etc.

A default un-emulated function hook should also be provided that gets called when an un-implemented import function is encountered
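A hook list with a default handler can be sketched as below. This is a minimal Python sketch, not the emulator's `HookList`: the `call` method, the `UnemulatedCall` exception, the `FAKE_MODULES` table and its base address are all assumptions made for illustration, and the spoofed `GetModuleHandleA` takes a Python string rather than reading one from emulated memory.

```python
class UnemulatedCall(Exception):
    """Raised by the default hook for an import with no spoofed version."""


def default_hook(module, function, *args):
    raise UnemulatedCall(f"{module}!{function} is not emulated")


class HookList:
    def __init__(self, default=default_hook):
        self._hooks = {}
        self._default = default

    def add_hook(self, module, function, handler):
        # Module names are case-insensitive on Windows; function names are not.
        self._hooks[(module.lower(), function)] = handler

    def find_hook(self, module, function):
        return self._hooks.get((module.lower(), function))

    def call(self, module, function, *args):
        handler = self.find_hook(module, function)
        if handler is None:
            return self._default(module, function, *args)
        return handler(*args)


# Hypothetical spoofed module table; the base address is illustrative.
FAKE_MODULES = {"kernel32.dll": 0x77E00000}

hooks = HookList()
hooks.add_hook("kernel32.dll", "GetModuleHandleA",
               lambda name: FAKE_MODULES.get(name.lower(), 0))
```

Letting the default hook raise gives the run loop a natural place to apply the "un-emulated system call" stop condition.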

Page 22: Design of x86 Emulator for Generic Unpacking

Stop Conditions

Ideally the emulator should be stopped at the OEP

Finding the exact OEP in a generic way is non-trivial

Typical conditions other than the target-initiated explicit termination are:

Encountering an un-emulated system call in a dependent module.

Unhandled exception for which no SEH handler was found. Some of these exceptions include invalid memory read, write, execute, divide by zero, integer overflow.

Page 23: Design of x86 Emulator for Generic Unpacking

Stop Conditions…Contd

Encountering an un-emulated or illegal instruction.

A configured timeout.

Maximum number of instructions being reached.

Attempt to load a DLL that could not be located.

Too many DLLs being loaded by the target via explicit module loading.
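Several of these stop conditions can be enforced by the loop that drives the CPU. The sketch below is a hedged illustration in Python: `StopReason`, `run` and the `step` callback contract are hypothetical, and the default limits are arbitrary placeholders rather than values used by the emulator.

```python
import time
from enum import Enum, auto


class StopReason(Enum):
    TARGET_EXITED = auto()
    UNEMULATED_CALL = auto()
    UNHANDLED_EXCEPTION = auto()
    ILLEGAL_INSTRUCTION = auto()
    TIMEOUT = auto()
    INSTRUCTION_LIMIT = auto()


def run(step, max_instructions=1_000_000, timeout_seconds=5.0):
    """Drive the emulator until a stop condition is met.

    step() executes one instruction and returns None to continue or a
    StopReason to stop. Returns (reason, instructions_executed).
    """
    deadline = time.monotonic() + timeout_seconds
    executed = 0
    while True:
        if executed >= max_instructions:
            return StopReason.INSTRUCTION_LIMIT, executed
        if time.monotonic() >= deadline:
            return StopReason.TIMEOUT, executed
        reason = step()
        executed += 1
        if reason is not None:
            return reason, executed
```

Checking the clock on every instruction is simple but costly; a production loop would more likely check it every few thousand instructions.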

Page 24: Design of x86 Emulator for Generic Unpacking

Emulator fine tuning due to malware unique characteristics

Practical constraints due to performance optimizations and undocumented features allow only a limited implementation of the emulator.

Once the core emulator system is ready, developing a robust emulator is an iterative process driven by minor fine tuning for the unique characteristics of supported packers and the symptoms exhibited by the malware test-bed.

The examples that follow describe some of the cases experienced with malware samples that led to the improvement of our emulator.

The cases described in these examples are in no way complete!

Page 25: Design of x86 Emulator for Generic Unpacking

Example 1 – Setting Initial Stack

    0041C25A CALL 0041C25F
    0041C25F PUSH EBP
    0041C260 MOV EBX,DWORD PTR SS:[ESP+8]
    0041C264 MOV EBP,DWORD PTR SS:[ESP+4]
    0041C268 SUB DWORD PTR SS:[ESP+4],1A4AF

At address 0041C260, the MOV instruction references an address ([ESP+8]) at the top of the initial stack.

This address is the return address after the CALL instruction in kernel32.dll that “calls” the malware entry point.

The return address actually ends up calling ExitProcess.

Page 26: Design of x86 Emulator for Generic Unpacking

Example 2 – Module load address alignment

    004A1584 MOV EBX,DWORD PTR SS:[ESP+24]   ; EBX=77E8141A
    004A1588 AND EBX,FFE00000                ; EBX=77E00000
    . . .
    004A16C4 ADD EBX,10000
    004A16CA JE SHORT 004A16F7
    004A16CC CMP WORD PTR DS:[EBX],5A4D
    004A16D1 JNZ SHORT 004A16C4

At 004A16C4 the EBX value is incremented by the system allocation granularity (10000h, i.e. 64 KB).

At 004A16CC the WORD located at the address in EBX is compared with 5A4D (ASCII ‘MZ’), which is the startup marker for a PE image.

If the startup marker is not found at the address pointed to by EBX, execution loops back to location 004A16C4 and the next candidate address is tried. The emulator must therefore load modules at addresses aligned to the allocation granularity.
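The visible part of this scan loop can be restated in Python to show what the emulator's memory layout has to satisfy. This is a sketch of the listing's logic only: `find_module_base`, the `read_word` callback and the iteration limit are hypothetical, and the elided instructions between 004A1588 and 004A16C4 are not modelled.

```python
ALLOCATION_GRANULARITY = 0x10000  # the constant added at 004A16C4
MZ_MARKER = 0x5A4D                # 'MZ' read as a little-endian WORD


def find_module_base(read_word, address_in_module,
                     mask=0xFFE00000, limit=0x100):
    """Step through 64 KB-aligned addresses until an 'MZ' header is found.

    read_word(address) returns the 16-bit value at address. Returns the
    matching address, or None if EBX wraps to zero or the limit is hit.
    """
    candidate = address_in_module & mask                 # AND EBX,FFE00000
    for _ in range(limit):
        candidate = (candidate + ALLOCATION_GRANULARITY) & 0xFFFFFFFF
        if candidate == 0:                               # JE taken
            return None
        if read_word(candidate) == MZ_MARKER:            # CMP / JNZ
            return candidate
    return None


# Illustrative memory: a module header placed at a 64 KB-aligned address.
module_base = 0x77E60000
found = find_module_base(
    lambda address: MZ_MARKER if address == module_base else 0,
    0x77E8141A,
)
```

If the emulator placed the module at an address that is not a multiple of 64 KB, the loop would step past it and never see the marker.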

Page 27: Design of x86 Emulator for Generic Unpacking

Example 3 – Startup Register Values

    31428200 PUSH ED01C390
    31428205 MOV EAX,ESP
    31428207 CALL EAX
    0012FFC0 NOP
    0012FFC1 RETN
    31428209 XCHG EAX,EBX                    ; EAX=7FFDF000, EBX=0012FFC0
    3142820A POP EBX

At 31428209 EBX is referenced; its startup value is equal to the PEB address of the program

Page 28: Design of x86 Emulator for Generic Unpacking

Example 4 – Handling DLL emulation

For the correct emulation of a DLL, before the entry point function DllMain gets called, its input parameters must be set on the stack as in Windows.

    BOOL WINAPI DllMain(
        HINSTANCE hinstDLL,
        DWORD fdwReason,
        LPVOID lpvReserved);

Page 29: Design of x86 Emulator for Generic Unpacking

Example 5 – Setting register values before calling SEH handler

    0048C093 CMP AL,4
    0048C095 JNZ SHORT 0048C09B
    0048C097 NOP
    0048C098 NOP
    0048C099 RETN

The SEH handler’s second instruction at 0048C095 is a conditional jump that depends on the value in AL.

In real Windows, EAX is set to zero just before the SEH handler gets control.

Therefore, before the SEH handler gets control, EAX and the other registers should be set up as they are set in Windows.

Page 30: Design of x86 Emulator for Generic Unpacking

Example 6 – Setting top level exception handler in SEH

    004141EF SUB EDX,EDX
    004141F1 MOV EAX,DWORD PTR FS:[EDX]
    004141F4 MOV ESP,DWORD PTR DS:[EAX]
    004141F6 POP DWORD PTR FS:[EDX]
    004141F9 POP EAX
    004141FA POP EBP
    004141FB RETN

Windows also registers another handler on top before the application handler gets control

The malware had already placed a return address on the stack that gets executed after the RETN at 004141FB

At 004141F4 it skips over the top level SEH handler and positions ESP at the SEH frame for this handler

At 004141F6 the two top SEH handlers are torn down, and after 004141FB execution resumes at the location specified by ESP, which was last updated at 004141FA

Page 31: Design of x86 Emulator for Generic Unpacking

Example 7 – Check for BeingDebugged field in PEB

    3142821B MOV EAX, DWORD PTR FS:[18]
    31428220 MOV EAX, DWORD PTR DS:[EAX+30]
    31428223 MOVZX EAX, BYTE PTR DS:[EAX+2]
    31428227 CMP EAX, 0
    3142822A JNZ SHORT 3142826E
    3142822C CALL 31428231
    31428231 POP EBP

At 3142821B the address of the TIB is obtained, which is used to get the address of the PEB at 31428220

At 31428223 the BeingDebugged field of the PEB is checked to evaluate the condition of the branch instruction at 3142822A
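The three loads in this listing only work if the emulator populates the TIB Self pointer, the PEB pointer and the BeingDebugged byte consistently. A minimal Python sketch of the pointer chain follows; `SparseMemory` and `being_debugged` are hypothetical names, the addresses are illustrative, and the FS segment base is assumed to equal the TIB address.

```python
import struct


class SparseMemory:
    """Byte-addressable memory backed by a dict; unset bytes read as 0."""

    def __init__(self):
        self._bytes = {}

    def write(self, address: int, data: bytes) -> None:
        for i, value in enumerate(data):
            self._bytes[address + i] = value

    def read_byte(self, address: int) -> int:
        return self._bytes.get(address, 0)

    def read_dword(self, address: int) -> int:
        raw = bytes(self.read_byte(address + i) for i in range(4))
        return struct.unpack("<I", raw)[0]


def being_debugged(mem: SparseMemory, fs_base: int) -> int:
    tib = mem.read_dword(fs_base + 0x18)  # MOV EAX, DWORD PTR FS:[18]
    peb = mem.read_dword(tib + 0x30)      # MOV EAX, DWORD PTR DS:[EAX+30]
    return mem.read_byte(peb + 0x02)      # MOVZX EAX, BYTE PTR DS:[EAX+2]


TIB_ADDRESS, PEB_ADDRESS = 0x7FFDE000, 0x7FFDF000  # illustrative values
mem = SparseMemory()
mem.write(TIB_ADDRESS + 0x18, struct.pack("<I", TIB_ADDRESS))  # TIB.Self
mem.write(TIB_ADDRESS + 0x30, struct.pack("<I", PEB_ADDRESS))  # PEB pointer
mem.write(PEB_ADDRESS + 0x02, b"\x00")                         # BeingDebugged
flag = being_debugged(mem, TIB_ADDRESS)
```

With the byte left at zero, the JNZ at 3142822A is not taken and the sample continues as it would outside a debugger.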

Page 32: Design of x86 Emulator for Generic Unpacking

Example 8 – Check for loader lists in PEB

    0044D0A5 MOV EAX,DWORD PTR FS:[30]
    0044D0AB TEST EAX,EAX
    0044D0AD JS SHORT 0044D0BB
    0044D0AF MOV EAX,DWORD PTR DS:[EAX+C]
    0044D0B2 MOV ESI,DWORD PTR DS:[EAX+1C]
    0044D0B5 LODS DWORD PTR DS:[ESI]

At 0044D0A5 the PEB is referenced.

At 0044D0AF, 0044D0B2 and 0044D0B5, PEB_LDR_DATA, InInitializationOrderModuleList and InInitializationOrderModuleList.Flink respectively are referenced

The malware happens to be referencing the kernel32.dll load information in its dependent module list sorted by initialization order

Page 33: Design of x86 Emulator for Generic Unpacking

Example 9 – Reference to Thread Local Storage

    004033FA MOV EAX,DWORD PTR DS:[4503D4]
    00403400 TEST CL,CL
    00403402 JNZ SHORT 0040341A
    00403404 MOV EDX,DWORD PTR FS:[2C]
    0040340B MOV EAX,DWORD PTR DS:[EDX+EAX*4]
    0040340E RETN

At 00403404 the address of the beginning of the thread local storage pointer array, held in FS:[2Ch], is copied into EDX

The next instruction at 0040340B returns in EAX the value of the TLS pointer indexed by the previous value in EAX

Page 34: Design of x86 Emulator for Generic Unpacking

Example 10 – Normalizing malformed PEs in loader

All Win32 PE executables are expected to follow the PE format specification in the strictest sense

Yet many malware samples do not conform to these formal guidelines and are still allowed to run by the Windows loader.

In general, the emulator should load a malware sample as long as the Windows loader accepts it, by relaxing constraints on these kinds of aberrations

Page 35: Design of x86 Emulator for Generic Unpacking

Example 10 – Normalizing malformed PEs in loader…contd

    Structure          Field                    Value
    DOS Header         e_lfanew                 0x10
    Optional Header    SizeOfCode               0x4c454e52
    Optional Header    SizeOfInitializedData    0x442e3233
    Optional Header    AddressOfEntryPoint      0x11a4
    Section Header     PointerToRawData         0x10

Page 36: Design of x86 Emulator for Generic Unpacking

Emulator Performance Optimizations

In plain emulation, instructions are executed in software

Plain emulation is hundreds of times slower than native execution

It is not well suited for malware that requires emulation of hundreds of millions of instructions

Page 37: Design of x86 Emulator for Generic Unpacking

Emulator Performance Optimizations – Dynamic Binary Translation (DBT)

Frequently executed instructions, e.g., a decryption loop, are translated into native instructions

Once the same set of instructions has been executed more often than a certain threshold, its translated counterpart is executed instead

DBT is only about ten times slower than native execution

Page 38: Design of x86 Emulator for Generic Unpacking

Some more DBT details

Code is partitioned into a sequence of Basic Blocks (BB)

Each BB is self-contained and does not contain any branch instructions

For each BB, a corresponding translation into native instructions is obtained

There is a performance hit at the time of translation, but that is a one-time cost
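The threshold-and-cache logic behind DBT can be sketched independently of the actual code generation. In the Python sketch below, real native translation is replaced by a caller-supplied `translate` function that returns a Python callable, purely to show the control flow; `BlockCache`, `HOT_THRESHOLD` and both callbacks are hypothetical. Invalidating translated blocks when the target overwrites its own code, which unpackers routinely do, is deliberately left out.

```python
HOT_THRESHOLD = 3  # hypothetical: executions before a block is translated


class BlockCache:
    def __init__(self, interpret, translate, threshold=HOT_THRESHOLD):
        self.interpret = interpret    # slow path: interpret(address, state)
        self.translate = translate    # translate(address) -> callable(state)
        self.threshold = threshold
        self.counts = {}              # block address -> interpreted runs
        self.translated = {}          # block address -> translated block

    def execute_block(self, address, state):
        fast = self.translated.get(address)
        if fast is not None:
            return fast(state)        # hot block: run the translation
        count = self.counts.get(address, 0) + 1
        self.counts[address] = count
        if count >= self.threshold:
            # One-time translation cost, paid when the block becomes hot.
            self.translated[address] = self.translate(address)
        return self.interpret(address, state)


def demo():
    """Run one block five times and report which path handled each run."""
    stats = {"interpreted": 0, "translated": 0, "translations": 0}

    def interpret(address, state):
        state["interpreted"] += 1

    def translate(address):
        stats["translations"] += 1

        def block(state):
            state["translated"] += 1
        return block

    cache = BlockCache(interpret, translate)
    for _ in range(5):
        cache.execute_block(0x401000, stats)
    return stats
```

In this sketch the run that crosses the threshold is still interpreted, so the translated block is first used on the following run.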

Page 39: Design of x86 Emulator for Generic Unpacking

Page fault handler based unpacking

All memory writes from a packed program are monitored from the kernel until an execute is issued in one of the modified, monitored memory regions

The page fault handler based unpacking system yields the maximum speed improvement, as the malware is allowed to run natively on the host machine and in that sense does not require any kind of emulation.

But its implementation is discouraged, as it requires unconventional modification of the page fault interrupt handler in the kernel and may not even work on 64-bit Vista because of PatchGuard protection.

Page 40: Design of x86 Emulator for Generic Unpacking

Thank You