CS 105: Tour of Black Holes of Computing
Machine-Level Programming: x86-64

Topics
  Registers
  Stack
  Function Calls
  Local Storage

X86-64.ppt
– 2 – 105
Interesting Features of x86-64
  Pointers and long integers are 64 bits
  Arithmetic ops support 8-, 16-, 32-, and 64-bit ints
  Set of general-purpose registers is 16
  Full backward compatibility with 32-bit x86 (IA32) code
  Conditional ops are implemented using conditional move instructions when possible, which is often faster than branching
  Much of program state held in registers
    Up to 6 arguments passed via registers (not stack)
    Some procedures do not need to access the stack at all
  Floating-point ops implemented using register-oriented instructions rather than stack-based ones
  With the move to x86-64, gcc was updated to take advantage of many architecture changes
– 3 – 105
Data Representations: IA32 + x86-64

Sizes of C Objects (in Bytes)

  C Data Type    Typical 32-bit   Intel IA32   x86-64
  unsigned       4                4            4
  int            4                4            4
  long int       4                4            8
  char           1                1            1
  short          2                2            2
  float          4                4            4
  double         8                8            8
  long double    8                10/12        16
  char *         4                4            8
    (or any other pointer)

x86-64: 64-bit long ints and 64-bit pointers
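The x86-64 column of the table can be checked on a given machine with a few lines of C. The following is a minimal sketch, not part of the original slides; the macro PRINT_SIZE and the function print_type_sizes are hypothetical names chosen for illustration, and the numbers printed depend on the compiler and target.

```c
#include <assert.h>
#include <stdio.h>

/* Print one table row: the type's name and its size in bytes.
   PRINT_SIZE is a hypothetical helper, not from the slides. */
#define PRINT_SIZE(type) printf("%-12s %zu\n", #type, sizeof(type))

/* Prints the size of each type listed in the table; returns the row count. */
int print_type_sizes(void)
{
    PRINT_SIZE(unsigned);
    PRINT_SIZE(int);
    PRINT_SIZE(long int);
    PRINT_SIZE(char);
    PRINT_SIZE(short);
    PRINT_SIZE(float);
    PRINT_SIZE(double);
    PRINT_SIZE(long double);
    PRINT_SIZE(char *);

    /* The only size the C standard fixes outright. */
    assert(sizeof(char) == 1);
    return 9;
}
```

Calling print_type_sizes() from main() on a typical x86-64 Linux build should show the 8-byte long int and pointer sizes from the last column; a 32-bit build would show 4 for both.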
– 4 – 105
x86-64 Integer Registers - 16

  64-bit   32-bit      64-bit   32-bit
  %rax     %eax        %r8      %r8d
  %rbx     %ebx        %r9      %r9d
  %rcx     %ecx        %r10     %r10d
  %rdx     %edx        %r11     %r11d
  %rsi     %esi        %r12     %r12d
  %rdi     %edi        %r13     %r13d
  %rsp     %esp        %r14     %r14d
  %rbp     %ebp        %r15     %r15d

  Extend existing registers. Add 8 new ones.
  Word (16-bit) and byte (8-bit) registers are still available; for %rax, %rbx, %rcx, and %rdx both low-order bytes can be addressed individually (e.g. %al and %ah).
  Make %ebp/%rbp general purpose. Note the 32-bit names.
– 5 – 105
Instructions
  Long word l (4 bytes) ↔ Quad word q (8 bytes)
  New instructions:
    movl → movq
    addl → addq
    sall → salq
    etc.
  32-bit instructions that generate 32-bit results
    Set higher-order bits of destination register to 0
    Example: addl
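The difference between long-word and quad-word arithmetic can be seen from C by using fixed-width types. This is a minimal sketch with hypothetical function names; a compiler will typically translate the first function with addl and the second with addq, but that choice is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit add: the sum wraps modulo 2^32, and widening the unsigned
   result to 64 bits leaves the upper 32 bits zero. */
uint64_t add_long_word(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;
    return sum;
}

/* 64-bit add: the carry out of bit 31 is kept. */
uint64_t add_quad_word(uint64_t a, uint64_t b)
{
    return a + b;
}

/* Small self-check showing the two behaviors side by side. */
void check_widths(void)
{
    assert(add_long_word(0xFFFFFFFFu, 1u) == 0);
    assert(add_quad_word(0xFFFFFFFFu, 1u) == 0x100000000ULL);
}
```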
– 6 – 105
Swap in 32-bit Mode

  void swap(int *xp, int *yp)
  {
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
  }

  swap:
    # Setup
    pushl %ebp
    movl  %esp,%ebp
    pushl %ebx

    # Body
    movl  12(%ebp),%ecx
    movl  8(%ebp),%edx
    movl  (%ecx),%eax
    movl  (%edx),%ebx
    movl  %eax,(%edx)
    movl  %ebx,(%ecx)

    # Finish
    movl  -4(%ebp),%ebx
    movl  %ebp,%esp
    popl  %ebp
    ret
– 7 – 105
Swap in 64-bit Mode

  Operands passed in registers (why useful?)
    First (xp) in %rdi, second (yp) in %rsi
    64-bit pointers
  No stack operations required, actually no stack frame
  32-bit data
    Data held in registers %eax and %edx
    movl operation

  void swap(int *xp, int *yp)
  {
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
  }

  swap:
    movl  (%rdi), %edx
    movl  (%rsi), %eax
    movl  %eax, (%rdi)
    movl  %edx, (%rsi)
    retq
– 8 – 105
x86-64 Integer Registers, 6 used for arguments

  Register   Use
  %rax       Return value
  %rbx       Callee saved
  %rcx       Argument #4
  %rdx       Argument #3
  %rsi       Argument #2
  %rdi       Argument #1
  %rsp       Stack pointer
  %rbp       Callee saved
  %r8        Argument #5
  %r9        Argument #6
  %r10       Caller saved
  %r11       Caller saved; used for linking
  %r12       C: Callee saved
  %r13       Callee saved
  %r14       Callee saved
  %r15       Callee saved
– 9 – 105
Interesting Features of Stack Frame
  Allocate entire frame at once
    All stack accesses can be relative to %rsp
    Do so by decrementing the stack pointer
    Can delay allocation, since it is safe to temporarily use the red zone (the 128 bytes below the stack pointer)
  Simple deallocation
    Increment the stack pointer
    No base/frame pointer needed
– 10 – 105
x86-64 Registers
  Arguments passed to functions via registers
    If there are more than 6 integral parameters, pass the rest on the stack
    These registers can be used as caller-saved as well
  All references to stack frame via stack pointer
    Eliminates need to update %ebp/%rbp
  Other registers
    %r12-%r15, %rbx, %rbp are callee saved
    2 or 3 have special uses
      %rax holds the return value for a function
      %rsp is the stack pointer
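To make the six-register rule concrete, here is a small sketch of a function with eight integral parameters. The function name sum8 is hypothetical; the register assignments in the comments follow the argument order listed in the register table (%rdi, %rsi, %rdx, %rcx, %r8, %r9) and assume the usual Linux x86-64 calling convention.

```c
#include <assert.h>

/* Eight integral parameters: the first six travel in registers,
   the remaining two are passed on the stack by the caller. */
long sum8(long a,   /* %rdi: argument #1 */
          long b,   /* %rsi: argument #2 */
          long c,   /* %rdx: argument #3 */
          long d,   /* %rcx: argument #4 */
          long e,   /* %r8:  argument #5 */
          long f,   /* %r9:  argument #6 */
          long g,   /* stack */
          long h)   /* stack */
{
    return a + b + c + d + e + f + g + h;   /* result returned in %rax */
}

/* Small self-check; the caller pushes only the last two arguments. */
void check_sum8(void)
{
    assert(sum8(1, 2, 3, 4, 5, 6, 7, 8) == 36);
}
```

Compiling with gcc -S and reading the caller's code is one way to confirm where each argument is placed on a particular toolchain.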
– 12 – 105
Reading Condition Codes: x86-64

  int gt (long x, long y)
  {
    return x > y;
  }

  long lgt (long x, long y)
  {
    return x > y;
  }

  Body (same for both):
    xorl %eax, %eax    # eax = 0
    cmpq %rsi, %rdi    # Compare x and y
    setg %al           # al = x > y

  SetX instructions: set a single byte based on a combination of condition codes
    Does not alter the remaining 7 bytes of the register
  Are the upper 32 bits of %rax zero?
    Yes: 32-bit instructions set the high-order 32 bits to 0!
– 14 – 105
Conditionals: x86-64

  Conditional move instruction
    cmovC src, dest
    Move value from src to dest if condition C holds
    Can be more efficient than conditional branching (simple control flow)
    But overhead: both branches are evaluated

  int absdiff(int x, int y)
  {
    int result;
    if (x > y) {
      result = x-y;
    } else {
      result = y-x;
    }
    return result;
  }

  absdiff:                  # x in %edi, y in %esi
    movl   %edi, %eax       # eax = x
    movl   %esi, %edx       # edx = y
    subl   %esi, %eax       # eax = x-y
    subl   %edi, %edx       # edx = y-x
    cmpl   %esi, %edi       # x:y
    cmovle %edx, %eax       # eax = edx if <=
    ret
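The "both branches are evaluated" point can be mirrored in C by computing both differences first and selecting one afterwards. This is a sketch with a hypothetical function name, absdiff_select; it assumes x-y and y-x do not overflow, and whether the compiler actually emits cmovle for it depends on the compiler and optimization level.

```c
#include <assert.h>

/* Same result as absdiff, written the way the cmov version works:
   evaluate both candidate results, then pick one with a condition. */
int absdiff_select(int x, int y)
{
    int diff_xy = x - y;    /* always computed, like subl %esi, %eax */
    int diff_yx = y - x;    /* always computed, like subl %edi, %edx */
    return (x > y) ? diff_xy : diff_yx;   /* selection, like cmovle */
}

/* Small self-check covering x > y, x < y, and x == y. */
void check_absdiff(void)
{
    assert(absdiff_select(5, 3) == 2);
    assert(absdiff_select(3, 5) == 2);
    assert(absdiff_select(4, 4) == 0);
}
```

This form is only a good trade when both sides are cheap and free of side effects, which is why the slide notes that conditional moves are used "when possible".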
– 15 – 105
x86-64 Procedure Summary
  Heavy use of registers
    Parameter passing
    More temporaries since more registers
  Minimal use of stack
    Sometimes none
    Allocate/deallocate entire block
  Many tricky optimizations
    What kind of stack frame to use
    Calling with jump
    Various allocation techniques
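"Calling with jump" refers to replacing a call that is the last action of a procedure with a plain jump. The sketch below uses hypothetical functions, scale and scale_next, to show the shape of code where this applies; an optimizing compiler may emit jmp scale instead of call scale followed by ret, but nothing in C guarantees it.

```c
#include <assert.h>

/* Callee: returns its result in %rax as usual. */
long scale(long x)
{
    return 2 * x + 1;
}

/* The call to scale() is the last thing scale_next() does and its
   result is returned unchanged, so no work remains after the call.
   The compiler can therefore jump to scale and let scale's own ret
   return directly to scale_next's caller. */
long scale_next(long x)
{
    return scale(x + 1);
}

/* Small self-check: scale_next(3) is scale(4), which is 9. */
void check_tail_call(void)
{
    assert(scale_next(3) == 9);
}
```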
– 16 – 105
IA32 Example Addresses

  $esp             0xffffbcd0
  p3               0x65586008
  p1               0x55585008
  p4               0x1904a110
  p2               0x1904a008
  &p2              0x18049760
  beyond           0x08049744
  big_array        0x18049780
  huge_array       0x08049760
  main()           0x080483c6
  useless()        0x08049744
  final malloc()   0x006be166

  address range ~2^32

  [Figure: memory layout, not drawn to scale. Address markers FF, 80, 08, 00 from top to bottom; regions Stack (top), Heap, Data and Text (bottom).]

  malloc() is dynamically linked; its address is determined at runtime
– 17 – 105
x86-64 Example Addresses

  $rsp             0x7ffffff8d1f8
  p3               0x2aaabaadd010
  p1               0x2aaaaaadc010
  p4               0x000011501120
  p2               0x000011501010
  &p2              0x000010500a60
  beyond           0x000000500a44
  big_array        0x000010500a80
  huge_array       0x000000500a50
  main()           0x000000400510
  useless()        0x000000400500
  final malloc()   0x00386ae6a170

  address range ~2^47

  [Figure: memory layout, not drawn to scale. Address markers 00007F, 000030, 000000 from top to bottom; regions Stack (top), Heap, Data and Text (bottom).]

  malloc() is dynamically linked; its address is determined at runtime
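A listing like the two address tables can be produced by a short program that prints the addresses of a local variable, some heap blocks, globals, and functions. The slides do not show the program that generated these numbers, so the following is only a sketch: the names mirror the labels in the tables, but the array sizes, allocation sizes, and the function show_addresses are assumptions, and the printed addresses will differ from run to run and system to system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

char big_array[1 << 16];    /* global data; size is an assumption */
char huge_array[1 << 20];   /* global data; size is an assumption */
int beyond;
char *p1, *p2, *p3, *p4;

int useless(void) { return 0; }

/* Prints one address per memory region; returns the number of rows. */
int show_addresses(void)
{
    int local = 0;                      /* lives on the stack */

    p1 = malloc(1 << 20);               /* allocation sizes are assumptions */
    p2 = malloc(1 << 8);
    p3 = malloc(1 << 20);
    p4 = malloc(1 << 8);
    assert(p1 && p2 && p3 && p4);       /* demo only: stop if malloc failed */

    printf("stack (local)  %p\n", (void *)&local);
    printf("p3             %p\n", (void *)p3);
    printf("p1             %p\n", (void *)p1);
    printf("p4             %p\n", (void *)p4);
    printf("p2             %p\n", (void *)p2);
    printf("&p2            %p\n", (void *)&p2);
    printf("beyond         %p\n", (void *)&beyond);
    printf("big_array      %p\n", (void *)big_array);
    printf("huge_array     %p\n", (void *)huge_array);
    /* Function pointers go through uintptr_t to avoid a direct
       function-to-object pointer cast. */
    printf("useless()      %p\n", (void *)(uintptr_t)useless);
    printf("malloc()       %p\n", (void *)(uintptr_t)malloc);

    free(p1);
    free(p2);
    free(p3);
    free(p4);
    return 11;
}
```

On a typical Linux system the local variable prints near the top of the user address range, the functions and globals near the bottom, and the heap blocks in between, matching the ordering in the tables.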
– 18 – 105
x86-64 Summary
  Pointers and long integers are 64 bits
  16 registers, with 64-, 32-, 16-, and 8-bit uses
  Minimal use of stack
    Parameters passed in registers
    Allocate/deallocate entire block for a procedure
    Can use the 128 bytes past (beyond) the stack pointer (red zone)
      Can use the stack without changing the stack pointer
    No frame pointer
  Many tricky optimizations
    What kind of stack frame to use
    Calling with jump
    Various allocation techniques
    Compilers need to be smarter
– 19 – 105
x86-64 Locals in the Red Zone

  Avoiding stack pointer change
    Can hold all information within a small window beyond the stack pointer

  /* Swap, using local array */
  void swap_a(long *xp, long *yp)
  {
    volatile long loc[2];
    loc[0] = *xp;
    loc[1] = *yp;
    *xp = loc[1];
    *yp = loc[0];
  }

  swap_a:
    movq (%rdi), %rax
    movq %rax, -24(%rsp)
    movq (%rsi), %rax
    movq %rax, -16(%rsp)
    movq -16(%rsp), %rax
    movq %rax, (%rdi)
    movq -24(%rsp), %rax
    movq %rax, (%rsi)
    ret

  Stack layout (offsets relative to %rsp):
    %rsp    rtn Ptr
    −8      unused
    −16     loc[1]
    −24     loc[0]
– 20 – 105
x86-64 Stack Frame Example

  Keeps values of a and i in callee-saved registers
  Must set up a stack frame to save these registers

  long sum = 0;
  /* Swap a[i] & a[i+1] */
  void swap_ele_su(long a[], int i)
  {
    swap(&a[i], &a[i+1]);
    sum += a[i];
  }

  swap_ele_su:
    movq   %rbx, -16(%rsp)
    movslq %esi,%rbx
    movq   %r12, -8(%rsp)
    movq   %rdi, %r12
    leaq   (%rdi,%rbx,8), %rdi
    subq   $16, %rsp
    leaq   8(%rdi), %rsi
    call   swap
    movq   (%r12,%rbx,8), %rax
    addq   %rax, sum(%rip)
    movq   (%rsp), %rbx
    movq   8(%rsp), %r12
    addq   $16, %rsp
    ret
– 21 – 105
Understanding x86-64 Stack Frame

  swap_ele_su:
    movq   %rbx, -16(%rsp)        # Save %rbx
    movslq %esi,%rbx              # Extend & save i
    movq   %r12, -8(%rsp)         # Save %r12
    movq   %rdi, %r12             # Save a (array pointer)
    leaq   (%rdi,%rbx,8), %rdi    # Create &a[i]
    subq   $16, %rsp              # Allocate stack frame
    leaq   8(%rdi), %rsi          # &a[i+1]
    call   swap                   # swap()
    movq   (%r12,%rbx,8), %rax    # a[i]
    addq   %rax, sum(%rip)        # sum += a[i]
    movq   (%rsp), %rbx           # Restore %rbx
    movq   8(%rsp), %r12          # Restore %r12
    addq   $16, %rsp              # Deallocate stack frame
    ret
– 22 – 105
Understanding x86-64 Stack Frame (stack layout for swap_ele_su)

  Before subq $16, %rsp (registers saved in the red zone):
    %rsp    rtn addr
    −8      %r12
    −16     %rbx

  After subq $16, %rsp (frame allocated):
    +16     rtn addr
    +8      %r12
    %rsp    %rbx
– 23 – 105
Interesting Features of Stack Frame
  Allocate entire frame at once
    All stack accesses can be relative to %rsp
    Do so by decrementing the stack pointer
    Can delay allocation, since it is safe to temporarily use the red zone
  Simple deallocation
    Increment the stack pointer
    No base/frame pointer needed