Binary-level program analysis: A discussion of x86-64
Gang Tan
CSE 597, Spring 2019
Penn State University

* These slides follow Sec 3.13 of the book CSAPP, “Computer Systems: A Programmer’s Perspective”; figures and slides are borrowed/adapted from that book

Binary‐level program analysis: A discussion of x86‐64gxt29/teaching/cse597s19/slides/03x86-64.pdf · •16 general‐purpose registers; each 64‐bit long •Calling conventions



Intel’s 64‐Bit History

• 2001: Intel Attempts Radical Shift from IA32 to IA64
  – Totally different architecture (Itanium)
  – Executes IA32 code only as legacy
  – Performance disappointing
• 2003: AMD Steps in with Evolutionary Solution
  – x86-64 (now called “AMD64”)
• Intel Felt Obligated to Focus on IA64
  – Hard to admit mistake or that AMD is better
• 2004: Intel Announces EM64T extension to IA32
  – Extended Memory 64-bit Technology
  – Almost identical to x86-64!
• All but low-end x86 processors support x86-64
  – But, lots of code still runs in 32-bit mode


Overview of x86‐64

• Pointers and long integers are 64 bits long
  – Integer arithmetic operations support 8, 16, 32, and 64 bits
• 16 general-purpose registers; each 64 bits long
• Calling conventions pass more parameters via registers
  – System V AMD64 ABI: passes the first 6 integer or pointer parameters in registers
  – As a result, some procedures do not need to access the stack at all
• Conditional operations are implemented using conditional move instructions when possible
  – Avoids branch mispredictions, so often better performance than using branches
• Floating-point operations are implemented using the register-oriented instruction set in SSE version 2
  – Rather than the stack-based approach in IA32


x86‐64 Data Types


Fig 3.34 of CSAPP


16 64‐bit GP Registers

Fig 3.35 of CSAPP


Instruction Operands

• Similar to IA32
  – Except that the base and index registers must use the 64-bit r-version of registers (e.g., rax rather than eax)
• In addition, PC-relative addressing
  – “add rax, 0x200ad1[rip]” accesses mem at address rip+0x200ad1, where rip holds the address of the next instruction


Function Calling: Argument Passing

• The following slides assume the System V AMD64 ABI
• Arguments (up to the first six) are passed to procedures via registers
  – This reduces the overhead of storing and retrieving values on the stack
• callq stores a 64-bit return address on the stack.


Example of Argument Passing

long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}

* Example from https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/


Function Calling: Stack Frame

• A function may not require a stack frame, if
  – all local variables can be held in registers, and
  – it has no array/structure local variables, and
  – no address-of operator (&) is used on local variables, and
  – it does not call another function that requires argument passing on the stack, and
  – it does not need to save any callee-save regs


Function Calling: Red‐Zone Optimization

• Red-zone optimization for leaf functions (functions that do not call other functions)
  – The 128 bytes below rsp can be used by a leaf function without stack allocation
  – The red zone will not be asynchronously clobbered by signal or interrupt handlers, so a leaf function can use it for scratch data


Function Calling: the Base Pointer Optimization

• Two options for functions that need a stack frame
• Option 1: the traditional approach (default for gcc without optimizations)
  – Function prologue: save the base pointer; create the new base pointer
  – Function body: references to stack locations are made relative to the base pointer
  – Function epilogue: restore the base pointer
• Option 2: faster (default for gcc with optimizations)
  – Do not save/restore the base pointer; rbp is used as a GP register
  – References to stack locations are made relative to the stack pointer
  – Stack allocation happens at the beginning; rsp remains at a fixed position for the duration of the call


Example

C source code

long int simple_l(long int *xp, long int y)
{
    long int t = *xp + y;
    *xp = t;
    return t;
}


Example

Optimized x86-32 Assembly

simple_l:
  pushl %ebp             # Save frame pointer
  movl %esp, %ebp        # New frame pointer
  movl 8(%ebp), %edx     # Retrieve xp
  movl 12(%ebp), %eax    # Retrieve y
  addl (%edx), %eax      # Add *xp to get t
  movl %eax, (%edx)      # Store t at xp
  popl %ebp              # Restore frame pointer
  ret


Example

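Optimized x86-64 Assembly

simple_l:
  movq %rsi, %rax        # Copy y
  addq (%rdi), %rax      # Add *xp to get t
  movq %rax, (%rdi)      # Store t at xp
  ret

Unoptimized x86-64 Assembly

simple_l:
  pushq %rbp             # Save frame pointer
  movq %rsp, %rbp        # New frame pointer
  movq %rdi, -24(%rbp)   # Spill xp below rsp (red zone)
  movq %rsi, -32(%rbp)   # Spill y below rsp (red zone)
  movq -24(%rbp), %rax   # Reload xp
  movq (%rax), %rax      # Load *xp
  addq -32(%rbp), %rax   # Add y to get t
  movq %rax, -8(%rbp)    # Store t in its stack slot
  movq -24(%rbp), %rax   # Reload xp
  movq -8(%rbp), %rdx    # Reload t
  movq %rdx, (%rax)      # Store t at xp
  movq -8(%rbp), %rax    # Return value is t
  leave                  # Restore rsp and rbp
  ret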


Function Calling: Caller/Callee‐Save Registers

• Callee-saved regs: rbx, rbp, and r12 to r15
• Caller-saved regs: r10 and r11, plus rax and the six argument registers (rdi, rsi, rdx, rcx, r8, r9)


x86‐64 Assembly Code Example

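C source code

long plus(long x, long y);

void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);
    *dest = t;
}

Optimized x86-64 Assembly

sumstore:
  pushq %rbx             # Save callee-saved rbx
  movq %rdx, %rbx        # Keep dest in rbx so it survives the call
  call plus              # rax = plus(x, y); x and y are already in rdi and rsi
  movq %rax, (%rbx)      # *dest = t
  popq %rbx              # Restore rbx
  ret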