SRE Basics 1
SRE Basics
SRE Basics 2
In this Section…
We briefly cover the following topics
o Assembly code
o Virtual machine/Java bytecode
o Windows PE file format
SRE Basics 3
Assembly Code
SRE Basics 4
High Level Languages
First, high level languages…
Ancient high level languages
o Basic --- little structure
o FORTRAN --- limited structure
o C --- “structured” language
C was designed to deal with complexity
o OO languages take this one step further
Above languages considered primitive today
SRE Basics 5
High Level Languages
Object oriented (OO) languages
o “Object” groups code and data together
o Considered best way to handle complexity (at least for now…)
Important OO ideas include
o Encapsulation, inheritance, polymorphism
SRE Basics 6
High Level Languages
Program must deal with code and data
Data
o Variables, data structures, files, etc.
Code
o Reverser must study control flow
o Conditionals, switches, loops, etc.
SRE Basics 7
High Level Languages
High level languages --- different users want different things
o Goes back (at least) to C vs FORTRAN
Today, major tradeoff is between simplicity and flexibility
o Simplicity --- easy to write short program to do exactly what you want (e.g., C)
o Flexibility --- language has it all (e.g., Java)
SRE Basics 8
High Level Languages
Some languages compiled into native code
o exe is specific to the hardware
o C, C++, FORTRAN, etc.
Other languages “compiled” into “code”, which is interpreted by a virtual machine
o Java, C#
o Often possible to make compiled version
For reverser, this distinction is far more important than OO or not
SRE Basics 9
Intro to Assembly
At the lowest level, machine binary
Assembly code lives between binary and high level languages
When reversing native code, we must deal with assembly code
o Why assembly code?
o Why not “reverse” binary to, say, C?
SRE Basics 10
Intro to Assembly
Reverser would like to deal with high level, but is stuck with low level
Ideally, want to create mental “link” from low level to high level
o Easier for code written in C
o Harder for OO code, such as C++
o Why?
SRE Basics 11
Intro to Assembly
Perhaps biggest difference at assembly level is dealing with data
o High level languages hide lots and lots of details on data manipulations
o For example, loading and storing
Also, low level instructions are primitive
o Each instruction does not do very much
SRE Basics 12
Intro to Assembly
Consider the following simple C program
Simple, but far higher level than assembly code
int multiply(int x, int y)
{
int z;
z = x * y;
return z;
}
SRE Basics 13
Intro to Assembly
In assembly code…
1. Store state before entering function
2. Allocate memory for z
3. Load x and y into registers
4. Multiply x by y and store result in register
5. Copy result back to memory for z (optional)
6. Restore state that was stored in 1.
7. Return z
int multiply(int x, int y)
{
int z;
z = x * y;
return z;
}
SRE Basics 14
Intro to Assembly
Why are things so complicated at low level?
It’s all about efficiency!
Reading memory and storing are slow
No single asm instruction to read memory, operate on it, and store result
o But this is common in high level languages
SRE Basics 15
Intro to Assembly
Registers --- “local” processor memory
o So don’t have to read and write RAM
Stack --- “scratch paper” (in RAM)
o Holds register values, local variables, function parameters and return values
o E.g., storage for “z” in multiply example
Heap --- dynamic, variable-sized data
Data section --- e.g., string constants
Control flow --- high level “if” or “while” are much more complex at low level
SRE Basics 16
Registers
Registers used in most instructions
Specifics here deal with “IA-32”
o Intel Architecture, 32-bit
o Used in “Wintel” machines
o We use IA-32 notation
o AT&T notation also exists
Eight 32-bit registers (next slide)
o All 8 start with “E”
o Also several system registers
SRE Basics 17
Registers
EAX, EBX, EDX --- generic, used for int, Boolean, …, memory operations
ECX --- generic, used as counter
ESI/EDI --- generic, source/destination pointers when copying memory
o SI == source index, DI == destination index
EBP --- generic, stack “base” pointer
o Usually, stack position after return address
ESP --- stack pointer
o Current stack frame lies between ESP and EBP
SRE Basics 18
Flags
EFLAGS --- special register
o Status flags updated by various operations to “record” outcomes
o System flags too, but we don’t care about them
Flags are basic tool for conditionals
For example, a TEST followed by a jump instruction
o TEST sets various flags, jump determines action to take, based on those flags
SRE Basics 19
Instruction Format
Most instructions consist of…
o Opcode --- the “instruction”
o One or two operands --- “parameter(s)”
Operands (parameters) are data
Operands come in 3 flavors
o Register name --- for example, EAX
o Immediate --- e.g., hard-coded constant
o Memory address --- enclosed in [brackets]
SRE Basics 20
Operand Examples
EAX
o Read from (or write to) EAX register, depending on opcode
0x30004040
o Immediate --- number is embedded in code
o Usually a constant in high-level code
[0x4000349e]
o This is a memory address
o Could be a global variable in high level code
SRE Basics 21
Basic Instructions
We cover a few common instructions
o First we give general format
o Later, we give a few simple examples
There are lots of assembly instructions
But, most assembly code uses only a few
o About 14 assembly instructions account for more than 90% of all code
SRE Basics 22
Opcode Counts
Typical opcode counts, “normal” code
[Figure: chart of opcode counts; image not available]
SRE Basics 23
Opcode Counts
Opcode counts, typical virus code
[Figure: chart of opcode counts; image not available]
SRE Basics 24
Instructions
We consider the following operations
o Moving data
o Arithmetic
o Comparisons
o Conditional branches
o Function calls
SRE Basics 25
Moving Data
MOV is the most popular opcode
2 operands, destination and source:
o MOV DestOperand, SourceOperand
Note the order
o Destination first, source second
SRE Basics 26
Arithmetic
Six integer arithmetic operations
o ADD, SUB, MUL, DIV, IMUL, IDIV
Many variations based on operands
o ADD Op1, Op2 ; add, store result in Op1
o SUB Op1, Op2 ; sub Op2 from Op1 --> Op1
o MUL Op ; mul Op by EAX ---> EDX:EAX
o DIV Op ; div EDX:EAX by Op, quotient ---> EAX, remainder ---> EDX
o IMUL, IDIV --- like MUL and DIV, but signed
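The EDX:EAX convention for MUL and DIV can be modeled in a few lines of C. This is only a sketch of the unsigned one-operand forms described above. The RegPair type and the emulate_mul and emulate_div names are made up for illustration and are not part of any real API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the EDX:EAX register pair (illustration only). */
typedef struct {
    uint32_t eax; /* low 32 bits */
    uint32_t edx; /* high 32 bits */
} RegPair;

/* MUL Op: unsigned EAX * Op; the 64-bit product is split across EDX:EAX. */
RegPair emulate_mul(uint32_t eax, uint32_t op)
{
    uint64_t product = (uint64_t)eax * (uint64_t)op;
    RegPair r;
    r.eax = (uint32_t)(product & 0xFFFFFFFFu);
    r.edx = (uint32_t)(product >> 32);
    return r;
}

/* DIV Op: unsigned EDX:EAX / Op; quotient goes to EAX, remainder to EDX.
   On a real CPU a zero divisor or a quotient wider than 32 bits raises a
   divide error; this sketch only guards against the zero divisor and
   truncates an oversized quotient. */
RegPair emulate_div(RegPair dividend, uint32_t op)
{
    assert(op != 0);
    uint64_t n = ((uint64_t)dividend.edx << 32) | dividend.eax;
    RegPair r;
    r.eax = (uint32_t)(n / op);
    r.edx = (uint32_t)(n % op);
    return r;
}
```

For a reverser, the practical point is that a MUL silently overwrites EDX, so a value that was in EDX before the multiply is gone afterwards.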
SRE Basics 27
Comparisons
CMP opcode has 2 operands
o CMP Operand1, Operand2
Subtracts Operand2 from Operand1
Result “stored” in flag bits
o If 0 then ZF flag is set
o Other flags can be used to tell which is greater, depending on signed or unsigned
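A rough C model of how CMP sets the status flags, and how a few Jcc conditions read them, may make the signed versus unsigned distinction concrete. The Flags struct and function names are assumptions for this sketch. Only four of the status flags are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the EFLAGS status flags (illustration only). */
typedef struct {
    bool zf; /* zero flag: result was 0, operands equal */
    bool cf; /* carry flag: unsigned borrow, Operand1 < Operand2 */
    bool sf; /* sign flag: top bit of the result */
    bool of; /* overflow flag: signed overflow in the subtraction */
} Flags;

/* CMP Operand1, Operand2: subtract, keep only the flags. */
Flags emulate_cmp(uint32_t op1, uint32_t op2)
{
    uint32_t result = op1 - op2; /* wraps modulo 2^32, like the CPU */
    Flags f;
    f.zf = (result == 0);
    f.cf = (op1 < op2);
    f.sf = (result >> 31) != 0;
    /* Signed overflow: operand signs differ and the result sign differs from op1. */
    f.of = (((op1 ^ op2) & (op1 ^ result)) >> 31) != 0;
    return f;
}

bool jne_taken(Flags f) { return !f.zf; }       /* JNE/JNZ: not equal */
bool jb_taken(Flags f)  { return f.cf; }        /* JB: unsigned "below" */
bool jl_taken(Flags f)  { return f.sf != f.of; } /* JL: signed "less" */
```

Comparing 1 with 0xFFFFFFFF shows the difference: JB is taken (1 is below 4294967295 unsigned), while JL is not (1 is greater than -1 signed).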
SRE Basics 28
Conditional Branches
Conditional branches use “Jcc” family of instructions (je, jne, jz, jnz, etc.)
Format is
o Jcc TargetAddress
If Jcc true, goto TargetAddress
o Otherwise, what happens?
SRE Basics 29
Function Calls
Use CALL and RET
o CALL FunctionAddress
……
o RET ; pops return address
RET can be told to increment ESP
o Need to reset stack pointer
o Why?
SRE Basics 30
Examples
What does this do?
Compares value in EBX with constant
Jumps to specified address if operands are not same
o Note: JNE and JNZ are same instruction
cmp ebx,0xf020
jnz 10026509
SRE Basics 31
Examples
What does this do?
First, read the value at memory address ECX + 0x5b0 and put it in EDI
Next, read the value at memory address ECX + 0x5b4 and put it in EBX
o ECX itself is not modified
o Note that ECX points to some data structure
Finally, EDI = EDI * EBX
o Note there are different forms of IMUL
mov edi,[ecx+0x5b0]
mov ebx,[ecx+0x5b4]
imul edi,ebx
SRE Basics 32
Examples
What does this do?
PUSH four register values
PUSH something related to stack ptr
o Probably, parameter or local variable
o Would need to look at more code to decide
o Note “dword ptr” is effectively a cast
CALL a function
push eax
push edi
push ebx
push esi
push dword ptr [esp+0x24]
call 0x10026eeb
SRE Basics 33
Examples
What does this do?
Maybe “data structure in an array”
In the cmp line
o ECX --- gets base pointer
o EAX --- current offset into the array
o Add 4 to get specific member of structure
mov eax, dword ptr [ebp - 0x20]
shl eax, 4
mov ecx, dword ptr [ebp - 0x24]
cmp dword ptr [eax+ecx+4], 0
call 0x10026eeb
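One plausible high-level source for the mov/shl/mov/cmp pattern is indexing an array of 16-byte structures and testing a field at offset 4. The Record layout and field names below are guesses for illustration; the disassembly only tells us the element size (shl eax, 4 multiplies the index by 16) and the field offset (+4).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte record; only the size and the offset-4 field
   are implied by the disassembly, the rest is invented. */
typedef struct {
    uint32_t id;     /* offset 0 */
    uint32_t status; /* offset 4: the field compared against 0 */
    uint32_t a;      /* offset 8 */
    uint32_t b;      /* offset 12 */
} Record;

/* What the programmer probably wrote. */
int status_is_zero(const Record *base, uint32_t index)
{
    return base[index].status == 0;
}

/* The same computation spelled out the way the disassembly does it. */
int status_is_zero_lowlevel(const unsigned char *base, uint32_t index)
{
    uint32_t offset = index << 4;                    /* shl eax, 4 */
    uint32_t value;
    memcpy(&value, base + offset + 4, sizeof value); /* [eax+ecx+4] */
    return value == 0;                               /* cmp ..., 0 */
}
```

Both functions return the same answer for the same array, which is the mental “link” from low level to high level that the reverser is trying to build.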
SRE Basics 34
Examples
AT&T syntax
pushl $14
pushl $helloWorld
pushl $1
movl $4, %eax
pushl %eax
int $0x80
addl $16, %esp
pushl $0
movl $1, %eax
pushl %eax
int $0x80
SRE Basics 35
Compilation
Converts high level representation of code to binary
Front end --- lexical analysis
o Verify syntax, etc.
Intermediate representation
Optimization
o Improve structure, eliminate redundancy, …
SRE Basics 36
Compilation
Back end --- generates the actual code
o Instruction selection
o Register allocation
o Instruction scheduling --- pipelining, parallelism
Back end process might make disassembly hard to read
o Optimization too
Each compiler has its own quirks
o Can you automatically determine compiler?
SRE Basics 37
Virtual Machines & Bytecode
SRE Basics 38
Virtual Machines
Some languages instead generate intermediate bytecode
Bytecode runs in a virtual machine
o Virtual machine is a program that (historically) interprets bytecode
o Translates bytecode for the hardware
Bytecode analogous to assembly code
SRE Basics 39
Virtual Machines
Advantages?
o Hardware independent
Disadvantages?
o Slow
Today, usually just-in-time compilers instead of interpreters
o Compile snippets of bytecode into native code as needed
SRE Basics 40
Reversing Bytecode
Reversing bytecode is easy
o Unless special precautions are taken
o Even then, easier than native code
Bytecode usually contains lots of metadata
o Possible to reconstruct highly accurate high level language
Bytecode can be obfuscated
o In worst case, reverser must learn bytecode
o But bytecode is easier than native code
SRE Basics 41
Windows PE Files
SRE Basics 42
Windows PE File Format
Designed to be standard executable file format for all versions of OS…
o …on all supported processors
Only small changes since PE format was introduced
o E.g., support for 64-bit Windows
SRE Basics 43
Windows PE Files
Trivia
o Q: What’s the difference between exe and dll?
o A: Not much --- one bit differs in PE files
o Q: What is size of smallest possible PE file?
o A: 133 bytes
PE file on disk is a file
o Once loaded into memory, it’s a module
o File is mapped to module
o Address where module begins is HMODULE
o PE file may not all be mapped to module
SRE Basics 44
Windows PE Files
WINNT.H is final word on what PE file looks like
Tools to examine PE files
o Dumpbin (Visual Studio)
o Depends
o PE Browse Professional (in spite of its name, it’s free)
o PEDUMP (by author of article)
SRE Basics 45
PE File Sections
Each section is “chunk of code or data that logically belongs together”
o For example, all import tables in one section
Code is in .text section
o Code is code, but many types of data
Data examples
o Program data (e.g., .rdata for read-only)
o API import/export tables
o Resources, relocation info, etc.
Can specify section names in C++ source
SRE Basics 46
PE File Sections
When mapped, module starts on a page boundary
Linker can be told to merge sections
o E.g., to merge .text and .rdata:
o /MERGE:.rdata=.text
o Some sections commonly merged
o Some sections cannot be merged
SRE Basics 47
Relative Virtual Addresses
Exe file specifies in-memory addresses
PE file specifies preferred load location
o But DLL can actually load just about anywhere
So, PE specifies addresses in a way that is independent of where it loads
o No hardcoded addresses in PE
o Instead, Relative Virtual Addresses (RVAs)
o RVA is an offset relative to where PE is loaded
SRE Basics 48
Relative Virtual Addresses
To find actual memory location, add RVA to the actual load address
For example, suppose
o Exe file is loaded at 0x400000
o And RVA of the code (.text) section is 0x1000
o Then code (.text) starts at 0x401000
In Windows terminology, actual address is known as Virtual Address (VA)
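The RVA arithmetic is just an addition, but writing it down avoids mixing up the two directions. A minimal sketch for the 32-bit case; the function names are invented for illustration.

```c
#include <stdint.h>

/* VA = actual load address + RVA (32-bit case). */
uint32_t rva_to_va(uint32_t load_address, uint32_t rva)
{
    return load_address + rva;
}

/* Going the other way: RVA = VA - actual load address. */
uint32_t va_to_rva(uint32_t load_address, uint32_t va)
{
    return va - load_address;
}
```

With the numbers from the example, rva_to_va(0x400000, 0x1000) gives 0x401000.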
SRE Basics 49
Data Directory
There are many data structures within exe
o For efficiency, must be loaded quickly
o E.g., imports, exports, resources, base relocations, etc.
DataDirectory
o Array of 16 data structures
o #define IMAGE_DIRECTORY_ENTRY_xxx defines array indexes (0 to 15)
SRE Basics 50
Importing Functions
To use code or data from another DLL, must import it
When PE file loads, Windows loader locates imported functions/data
o Usually automatic, when program first starts
o Imported DLLs may import others
o For example, any program created with Visual C++ imports KERNEL32.DLL…
o …and KERNEL32.DLL imports from NTDLL.DLL
SRE Basics 51
Importing Functions
Each PE has Import Address Table (IAT)
o IAT contains arrays of function pointers
o One array per imported DLL
Each imported API has spot in IAT
o The only place where API address stored
o So, all calls to API go thru one function ptr
o E.g., CALL DWORD PTR [0x00405030]
o But, by default it’s a little more complex…
SRE Basics 52
PE File Structure
Next slides describe PE file structure
Note that all of these data structures defined in WINNT.H
Usually, 32-bit and 64-bit versions
For example,
o IMAGE_NT_HEADERS32
o IMAGE_NT_HEADERS64
o Identical except for widened fields for 64-bit
SRE Basics 53
MS-DOS Header
Every PE begins with small MS-DOS exe
o Prints message saying Windows required
MS-DOS Header
o IMAGE_DOS_HEADER
o 2 “important” values
o e_lfanew --- file offset of PE header
o e_magic --- 0x5A4D, “MZ” in ASCII… Why MZ?
SRE Basics 54
IMAGE_NT_HEADERS Header
Primary location for PE specifics
Location in file given by e_lfanew
One version for 32-bit exes and another for 64-bit exes
o Only minor differences between them
o Single bit specifies 32-bit or 64-bit
SRE Basics 55
IMAGE_NT_HEADERS Header
Has 3 fields
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
In valid PE, Signature is 0x00004550
o In ASCII, this is “PE\0\0” (the letters P and E followed by two zero bytes)
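Putting e_magic, e_lfanew and Signature together, a tool can locate the PE header in a raw file buffer roughly as follows. This is a sketch that reads bytes by hand instead of using the WINNT.H structs. The function names are invented, and the only layout facts assumed are that IMAGE_DOS_HEADER is 64 bytes long with e_lfanew as its last field, at offset 0x3C, and that all values are little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers, so the sketch does not depend on host byte order. */
static uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset of the PE header, or -1 if the buffer does not
   look like a PE file. */
long find_pe_header(const unsigned char *file, size_t size)
{
    assert(file != NULL);
    if (size < 0x40)
        return -1;                        /* too small for the DOS header */
    if (read_u16_le(file) != 0x5A4D)
        return -1;                        /* e_magic is not "MZ" */
    uint32_t e_lfanew = read_u32_le(file + 0x3C);
    if (e_lfanew > size || size - e_lfanew < 4)
        return -1;                        /* offset points outside the file */
    if (read_u32_le(file + e_lfanew) != 0x00004550)
        return -1;                        /* Signature is not "PE\0\0" */
    return (long)e_lfanew;
}
```

The bounds check on e_lfanew matters in practice: the value comes straight from the file, and malformed or hostile files can point it anywhere.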
SRE Basics 56
IMAGE_NT_HEADERS Header
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
IMAGE_FILE_HEADER predates PE
o Struct containing basic info about file
o Most important info is size of “optional data” that follows (not really optional)
SRE Basics 57
IMAGE_NT_HEADERS Header
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
IMAGE_OPTIONAL_HEADER
o DataDirectory array (at end) is “address book” of important locations in exe
o Each entry contains RVA and size of data
SRE Basics 58
PE Sections
Recall, section is “chunk of code or data that logically belongs together”
For example
o All data for exe’s import tables are in one section
SRE Basics 59
Section Table
Section table contains array of IMAGE_SECTION_HEADER structs
An IMAGE_SECTION_HEADER has info about associated section
o Location, length, and characteristics
o Number of such headers given by field: IMAGE_NT_HEADERS.FileHeader.NumberOfSections
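A common use of the section table when examining a file on disk is to translate an RVA into a file offset: find the section whose in-memory range contains the RVA, then add the distance from the section start to that section's position in the file. The sketch below uses a simplified, hypothetical SectionInfo struct, not the real IMAGE_SECTION_HEADER, and the section values in the usage note are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for IMAGE_SECTION_HEADER (illustration only). */
typedef struct {
    const char *name;
    uint32_t virtual_address;     /* RVA where the section starts in memory */
    uint32_t virtual_size;        /* size of the section in memory */
    uint32_t pointer_to_raw_data; /* file offset where the section data starts */
} SectionInfo;

/* Returns 1 and stores the file offset if some section contains the RVA,
   otherwise returns 0 and leaves *offset_out untouched. */
int rva_to_file_offset(const SectionInfo *sections, size_t count,
                       uint32_t rva, uint32_t *offset_out)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;
        if (rva >= start && rva - start < sections[i].virtual_size) {
            *offset_out = sections[i].pointer_to_raw_data + (rva - start);
            return 1;
        }
    }
    return 0;
}
```

For instance, with a made-up .rdata section at RVA 0x2000 stored at file offset 0xC00, RVA 0x2010 maps to file offset 0xC10.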
SRE Basics 60
Alignment of Sections
Visual Studio 6.0
o 4KB sections by default
Visual Studio .NET
o 4KB by default, except that small files use 0x200-byte alignment
o Also, .NET spec requires 8KB in-memory alignment (for IA-64 compatibility)
SRE Basics 61
PE Sections
So far, overview of PE file format
Now, look inside important sections…
o …and some data structures within sections
Then we finish with look at PEDUMP
o Recall there are other similar utilities
SRE Basics 62
Section Names
.text --- The default code section.
.data --- The default read/write data section. Global variables typically go here.
.rdata --- The default read-only data section. String literals and C++/COM vtables are examples of items put into .rdata.
SRE Basics 63
Section Names
.idata --- The imports table. It has become common practice (explicitly, or via linker default behavior) to merge .idata into another section, typically .rdata. By default, the linker only merges the .idata section into another section when creating a release mode exe.
.edata --- The exports table. When creating an executable that exports APIs or data, the linker creates an .EXP file which contains an .edata section that's added into the final executable. Like the .idata section, the .edata section is often found merged into the .text or .rdata sections.
SRE Basics 64
Section Names
.rsrc --- The resources. This section is read-only. However, it should not be renamed and should not be merged into other sections.
.bss --- Uninitialized data. Rarely found in exes created with recent linkers. Instead, the VirtualSize of the exe's .data section is expanded to make room for uninitialized data.
.crt --- Data added for supporting the C++ runtime (CRT). A good example is the function pointers that are used to call the constructors and destructors of static C++ objects.
SRE Basics 65
Section Names
.tls --- Data for supporting thread local storage variables declared with __declspec(thread). This includes the initial value of the data, as well as additional variables needed by the runtime.
.reloc --- Base relocations in an exe. Base relocations are generally only needed for DLLs and not EXEs. In release mode, the linker doesn't emit base relocations for EXE files. Relocations can be removed when linking with the /FIXED switch.
.sdata --- "Short" read/write data that can be addressed relative to the global pointer. Used for IA-64 and other architectures that use a global pointer register. Regular-sized global variables on the IA-64 will go in this section.
SRE Basics 66
Section Names
.srdata --- "Short" read-only data that can be addressed relative to the global pointer. Used on the IA-64 and other architectures that use a global pointer register.
.pdata --- The exception table. Contains an array of IMAGE_RUNTIME_FUNCTION_ENTRY structs, CPU-specific. Pointed to by IMAGE_DIRECTORY_ENTRY_EXCEPTION slot in the DataDirectory. Used for architectures with table-based exception handling, such as the IA-64. The only architecture that doesn't use table-based exception handling is the x86.
.didat --- Delayload import data. Found in exes built in nonrelease mode. In release mode, the delayload data is merged into another section.
SRE Basics 67
Exports Section
Exe may export code or data
o Makes it available to other exes
o Refer to an exported thing as a symbol
At minimum, to export symbol, must specify its address in defined way
o Keyword ORDINAL tells linker to use numbers, not names, for symbols
o After all, names just a convenience for coders
SRE Basics 68
IMAGE_EXPORT_DIRECTORY
Points to 3 arrays
o And a table of ASCII strings containing symbol names
Only required array is Export Address Table (EAT)
o Array of function pointers
o Addresses of exported functions
o Export ordinal is an index into this array
SRE Basics 69
IMAGE_EXPORT_DIRECTORY
Structure example
[Figure: IMAGE_EXPORT_DIRECTORY structure diagram; image not available]
SRE Basics 70
Example exports table:
Name: KERNEL32.dll
Characteristics: 00000000
TimeDateStamp: 3B7DDFD8 -> Fri Aug 17 23:24:08 2001
Version: 0.00
Ordinal base: 00000001
# of functions: 000003A0
# of Names: 000003A0
Entry Pt Ordn Name
00012ADA 1 ActivateActCtx
000082C2 2 AddAtomA
•••remainder of exports omitted
SRE Basics 71
Example
Suppose we call GetProcAddress on AddAtomA API
o System locates KERNEL32’s IMAGE_EXPORT_DIRECTORY
o Gets start address of Export Names Table (ENT)
o It finds there are 0x3A0 entries in ENT
o Does binary search for AddAtomA
o Suppose AddAtomA is 2nd entry…
o …loader reads 2nd value from export ordinal table
SRE Basics 72
Example (Continued)
Call GetProcAddress on AddAtomA API
o … AddAtomA has export ordinal 2
o Use this as index into EAT (taking into account base field value)
o Finds AddAtomA has RVA of 0x82C2
o Add 0x82C2 to load address of KERNEL32 to get actual address of AddAtomA
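The lookup steps in this example can be sketched in C over simplified in-memory tables. Everything here is an assumption made for illustration: the ExportTable struct is not the real IMAGE_EXPORT_DIRECTORY, the names are plain C strings rather than RVAs, and lookup_export is a made-up name, not how GetProcAddress is actually implemented. The sketch stores export ordinals as the example describes them and subtracts the ordinal base; the on-disk ordinal table is generally described as holding the index with the base already removed.

```c
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical view of an export table (illustration only). */
typedef struct {
    uint32_t ordinal_base;         /* "Ordinal base" field */
    uint32_t number_of_functions;  /* entries in the EAT */
    uint32_t number_of_names;      /* entries in the ENT and ordinal table */
    const uint32_t *eat;           /* Export Address Table: RVAs */
    const char *const *names;      /* Export Names Table, sorted by name */
    const uint16_t *name_ordinals; /* export ordinal for each name */
} ExportTable;

/* Returns the address of the named export, or 0 if it is not found. */
uint32_t lookup_export(const ExportTable *t, uint32_t load_address,
                       const char *name)
{
    uint32_t lo = 0, hi = t->number_of_names;
    while (lo < hi) {                          /* binary search of the ENT */
        uint32_t mid = lo + (hi - lo) / 2;
        int c = strcmp(name, t->names[mid]);
        if (c == 0) {
            uint32_t ordinal = t->name_ordinals[mid];
            if (ordinal < t->ordinal_base)
                return 0;
            uint32_t index = ordinal - t->ordinal_base; /* account for base */
            if (index >= t->number_of_functions)
                return 0;
            return load_address + t->eat[index];        /* RVA -> address */
        }
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}
```

With the two entries shown in the example exports table (ordinal base 1, AddAtomA at ordinal 2 with RVA 0x82C2), the lookup returns the load address of KERNEL32 plus 0x82C2.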
SRE Basics 73
Export Forwarding
Can forward export to another DLL
o That is, must find it at “forward” address
Example
o KERNEL32 HeapAlloc function forwarded to RtlAllocHeap function exported by NTDLL
o In EXPORTS section of KERNEL32, find
EXPORTS
…
HeapAlloc = NTDLL.RtlAllocHeap
SRE Basics 74
Imports Section
Importing is opposite of exporting
IMAGE_IMPORT_DESCRIPTOR
o Points to 2 essentially identical arrays
o Import Address Table & Import Name Table
IAT and INT
o Contain ordinal, address, forwarding info
o After binding, IAT rewritten, INT retains original (pre-binding) info
o Binding discussed next…
SRE Basics 75
Imports Section
Example
o Importing APIs from USER32.DLL
[Figure: import tables for USER32.DLL; image not available]
SRE Basics 76
Binding
Binding means IAT overwritten with actual addresses
o VAs overwrite RVAs
Why do this?
o Increased efficiency
Loader checks whether binding valid
SRE Basics 77
Delayload Data
Hybrid between implicit & explicit importing
Not an OS issue
o A linker issue, at runtime
There is IAT and INT for the DLL
o Identical to regular IAT and INT
o But read by runtime library code instead of OS
Benefit? Calls then go directly to API…
SRE Basics 78
Resources Section
For resources such as…
o icons, bitmaps, dialogs, etc.
Most complicated section to navigate
Organized like a file system…
SRE Basics 79
Base Relocations
Executable has many memory addresses
As mentioned, PE file specifies preferred memory address to load the module
o ImageBase field in IMAGE_OPTIONAL_HEADER
If DLL loaded elsewhere, all addresses will be incorrect
o Base relocations tell loader all locations that need to be modified
o Note that this is extra work for the loader
What about EXE, which is not a DLL?
SRE Basics 80
Base Relocation Example
Consider the following line of code
00401020: 8B 0D 34 D4 40 00 mov ecx,dword ptr [0x0040D434]
Note that “8B 0D” specifies opcode
o Also note the address 0x0040D434
Suppose preferred load is at 0x00400000
If it loads at that address, it runs as-is
Suppose instead it loads at 0x00500000
Then code above needs to change to
8B 0D 34 D4 50 00 mov ecx,dword ptr [0x0050D434]
SRE Basics 81
Base Relocation Example
If not loaded at preferred address, then loader computes delta
For example on previous slide…
o delta = 0x00500000 - 0x00400000
o So, delta is 0x00100000
Also, there would be base relocation specifying location of the address operand (0x00401022, i.e., 2 bytes past the start of the instruction at 0x00401020)
o Loader modifies address located here by delta
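The fix-up itself is a 32-bit add performed in place on the bytes of the image. The sketch below applies it to the six instruction bytes from the example. The address operand begins two bytes into the instruction, after the 8B 0D opcode bytes. The apply_relocation name and the flat byte-buffer model are assumptions for illustration; the real loader walks the relocation blocks in the .reloc section to find each location.

```c
#include <stddef.h>
#include <stdint.h>

/* Add delta to the little-endian 32-bit address stored at image[offset]. */
void apply_relocation(unsigned char *image, size_t offset, uint32_t delta)
{
    uint32_t value = (uint32_t)image[offset] |
                     ((uint32_t)image[offset + 1] << 8) |
                     ((uint32_t)image[offset + 2] << 16) |
                     ((uint32_t)image[offset + 3] << 24);
    value += delta; /* actual load address minus preferred load address */
    image[offset]     = (unsigned char)(value & 0xFF);
    image[offset + 1] = (unsigned char)((value >> 8) & 0xFF);
    image[offset + 2] = (unsigned char)((value >> 16) & 0xFF);
    image[offset + 3] = (unsigned char)((value >> 24) & 0xFF);
}
```

Applied at offset 2 with delta 0x00100000, the bytes 8B 0D 34 D4 40 00 become 8B 0D 34 D4 50 00; the opcode bytes are untouched and only the embedded address changes.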
SRE Basics 82
Debug Directory
Contains debug info
Not required to run the program
o But useful for development
Can be multiple forms of debug info
o Most common is PDB file
SRE Basics 83
.NET Header
.NET executables are PE files
However, code/data is minimal
Purpose of PE is simply to get .NET-specific info into memory
o Metadata, intermediate language (IL)
o MSCOREE.DLL at start of a .NET process
o This dll “takes charge” and uses metadata and IL from executable
o So PE has stub to get MSCOREE.DLL going
SRE Basics 84
TLS Initialization
Thread Local Storage (TLS)
o .tls section for thread local variables
New threads initialized using .tls data
Presence of TLS data indicated by nonzero IMAGE_DIRECTORY_ENTRY_TLS in DataDirectory
o Points to IMAGE_TLS_DIRECTORY struct
o Contains virtual addresses, VAs (not RVAs)
o The actual struct is in .rdata, not in .tls
SRE Basics 85
Program Exception Data
x86 architecture uses frame-based exception handling
o A fairly complex way to handle exceptions
IA-64 and others use table-based approach
o Table containing info about every function that might be affected by exception unwinding
o Table entry includes start and end addresses, how and where exception to be handled
o When exception occurs, search thru table…
SRE Basics 86
PEDUMP
Tools for analyzing PE files
o Dumpbin (Visual Studio)
o Depends
o PE Browse Professional (in spite of its name, it’s free)
o PEDUMP (by author of article)