
Assembler - ULisboa · Intel and AT&T Syntax • Fonte GNU assembler manual (80386 dependent features) • We will use intelsyntax – Intel • Registos: eax


Assembler

IA32‐x86

Paulo Lopes


Intel and AT&T Syntax

• Source: GNU assembler manual (80386 dependent features)

• We will use Intel syntax
– Intel
• Registers: eax
• Constants: 4
• Operand order: dest, source
– AT&T
• Registers: %eax
• Constants: $4
• Operand order: source, dest


Intel and AT&T Syntax

– Intel
• Operand size: ‘byte ptr’, ‘word ptr’, ‘dword ptr’ and ‘qword ptr’ in front of the memory operand
• Far calls and jumps: call/jmp far section:offset
– AT&T
• Operand size: ‘b’, ‘w’, ‘l’ and ‘q’ suffix at the end of the instruction mnemonic
• Far calls and jumps: lcall/ljmp $section, $offset


Registers

• The 8 32-bit registers
– eax (the accumulator)
– ebx, ecx, edx, edi, esi
– ebp (the frame pointer)
– esp (the stack pointer)

• Their low 16-bit halves
– ax (the accumulator)
– bx, cx, dx, di, si
– bp (the frame pointer)
– sp (the stack pointer)


Registers

• The 8 8-bit registers
– ah, al, bh, bl, ch, cl, dh, dl
– (high and low halves of ax, bx, cx, dx)

• The 6 section (segment) registers
– cs (code section): CS:IP points to the next instruction
– ds (data section): string instructions copy DS:SI -> ES:DI
– ss (stack section): SS:SP points to the top of the stack
– es, fs, and gs

• Other registers
– Control, debug and test registers
– Floating-point registers
– MMX and SSE registers
– AMD64 registers
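
The way the 8-, 16- and 32-bit register names overlap can be modelled in C. This is only an illustrative sketch: it treats eax as a plain uint32_t, and the helper names (low16, low8, high8, write_low8, write_high8) are invented for this example.

```c
#include <stdint.h>

/* Reading the sub-registers of a 32-bit register such as eax. */
uint16_t low16(uint32_t e) { return (uint16_t)(e & 0xFFFFu); }       /* ax */
uint8_t  low8(uint32_t e)  { return (uint8_t)(e & 0xFFu); }          /* al */
uint8_t  high8(uint32_t e) { return (uint8_t)((e >> 8) & 0xFFu); }   /* ah */

/* Writing al replaces bits 0-7 only; the rest of eax is left untouched. */
uint32_t write_low8(uint32_t e, uint8_t v) {
    return (e & 0xFFFFFF00u) | v;
}

/* Writing ah replaces bits 8-15 only. */
uint32_t write_high8(uint32_t e, uint8_t v) {
    return (e & 0xFFFF00FFu) | ((uint32_t)v << 8);
}
```

With eax = 0x12345678, ax reads as 0x5678, ah as 0x56 and al as 0x78; writing 0xAA to al leaves eax = 0x123456AA.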


IA32‐X86 Execution Environment


FLAGS Register


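Instruction prefixes

• Section override prefixes: ‘cs’, ‘ds’, ‘ss’, ‘es’, ‘fs’, ‘gs’

• Operand/address size prefixes: ‘data16’, ‘addr16’, ‘data32’ and ‘addr32’

• Lock: locks the bus so that the memory access of the following instruction is atomic (the GNU manual describes it as inhibiting interrupts during that instruction)

• Wait: wait for the coprocessor (should not be needed)

• ‘rep’, ‘repe’, and ‘repne’: repeat the following string instruction ecx times (‘repe’ and ‘repne’ can also stop early, depending on the zero flag)

• ‘rex’ family: x86-64 extensions to the i386 instruction set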


Memory references

• section:[base + index*scale + disp]
– Base: 32-bit register
– Index: 32-bit register
– Scale: 1, 2, 4 or 8 (default = 1)
– Section (optional): overrides the default section register
• The section register selects the segment whose base address is added to the offset
• In x86-64, sections are not used (apart from fs and gs)
– Disp: constant displacement
– x86-64: [rip + 1234] (relative to the PC)

• Examples:
mov ax, [ebp - 4]
mov ax, [foo + eax*4]
mov ax, [foo]
mov ax, gs:foo
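
The address formula base + index*scale + disp can be checked with a small C model. This is a sketch under simplifying assumptions: it ignores the section base (flat memory model), and the function name effective_address is made up for the illustration.

```c
#include <stdint.h>

/*
 * Computes base + index*scale + disp the way a 32-bit address is formed:
 * the arithmetic wraps around modulo 2^32. The scale is expected to be
 * 1, 2, 4 or 8; a missing base or index is passed as 0.
 */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp) {
    return base + index * scale + (uint32_t)disp;
}
```

For instance, [ebp - 4] with ebp = 0x1000 gives 0xFFC, and [foo + eax*4] with foo at 0x8000 (passed as the displacement) and eax = 5 gives 0x8014.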


Memory references

• BYTE PTR, WORD PTR, and DWORD PTR

• In some cases the size of the operands in an instruction is ambiguous. Example:
– mov [ebx], 2

• Solution: state the size explicitly
– mov dword ptr [ebx], 2
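
Why the size matters can be seen in a C model of memory. The sketch below is illustrative only: store_byte and store_dword stand in for "mov byte ptr [ebx], 2" and "mov dword ptr [ebx], 2", and count_changed is a hypothetical helper that counts the bytes that were overwritten.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "mov byte ptr [p], v": exactly one byte is written. */
void store_byte(uint8_t *p, uint8_t v) {
    p[0] = v;
}

/* Model of "mov dword ptr [p], v": four bytes, least significant first
 * (x86 is little-endian). */
void store_dword(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Counts the bytes of buf that no longer hold the fill value. */
size_t count_changed(const uint8_t *buf, size_t n, uint8_t fill) {
    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] != fill) {
            changed++;
        }
    }
    return changed;
}
```

Starting from a buffer filled with 0xFF, the byte store changes one byte while the dword store changes four (2, 0, 0, 0), which is why the assembler cannot guess the size from the constant 2 alone.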


Near and Far Pointers (32bit mode)


Jumps

• Jump instructions are always optimized to use the smallest possible displacement.
– In AT&T syntax, absolute (as opposed to PC-relative) call and jump operands must be prefixed with ‘*’; in Intel syntax they are undelimited.

• The ‘jcxz’, ‘jecxz’, ‘loop’, ‘loopz’, ‘loope’, ‘loopnz’ and ‘loopne’ instructions only come with byte displacements.
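
A byte displacement is a signed 8-bit value counted from the address of the instruction that follows the jump, so such a jump can only reach targets about 128 bytes away. The following C sketch checks that condition; the function name fits_rel8 and its parameters are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a jump located just before next_insn_addr can reach
 * target with a one-byte signed displacement (-128 .. +127).
 */
bool fits_rel8(uint32_t next_insn_addr, uint32_t target) {
    int64_t disp = (int64_t)target - (int64_t)next_insn_addr;
    return disp >= -128 && disp <= 127;
}
```

When the target of a ‘loop’ or ‘jecxz’ falls outside this range, the code has to be restructured, for example by jumping to a nearby ‘jmp’ that covers the longer distance.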


Numbers

• Floating point constructors are ‘.float’ or ‘.single’, ‘.double’, and ‘.tfloat’ for the 32-, 64-, and 80-bit formats.
– They correspond to the instruction mnemonic suffixes ‘s’, ‘l’, and ‘t’

• Integer constructors are ‘.word’, ‘.long’ or ‘.int’, and ‘.quad’ for the 16-, 32-, and 64-bit integer formats.
– They correspond to the instruction mnemonic suffixes ‘s’ (single), ‘l’ (long), and ‘q’ (quad)


Specifying CPU Architecture

• .arch cpu_type
– ‘i8086’ ‘i186’ ‘i286’ ‘i386’ ‘i486’ ‘i586’ ‘i686’ ‘pentium’ ‘pentiumpro’ ‘pentium4’ ‘k6’ ‘athlon’ ‘sledgehammer’

• Optional modifier: ‘jumps’ or ‘nojumps’
– ‘jumps’: enables jump promotion (to far jumps)


AS Assembler


Syntax

• Comments
– /* comment */
– Line comment
• #

• Symbols
– Formed by letters, digits and ‘_’, ‘.’ and ‘$’

• Statements
– One statement per line

• Constants
– .byte 74, 0112, 092, 0x4A, 0X4a, 'J, '\J # All the same value.
– .ascii "Ring the bell\7" # A string constant.
– .octa 0x123456789abcdef0123456789ABCDEF0 # A bignum.
– .float 0f-314159265358979E-40 # - pi, a flonum.


Constants

• Character constants
– .byte 'J

• Strings
– .ascii "Ring the bell\7" # A string constant.
– Special characters
• \x

• Integers
– Binary: ‘0b0100111’ or ‘0B0100111’
– Octal: ‘01234567’ (starts with 0)
– Decimal: 123456789
– Hexadecimal: ‘0x45ab4f’ or ‘0X45AB4F’

• Floating point: 0f-314.15E-2
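
The integer prefixes can be illustrated with a short C routine that converts such a literal to a number. This is a sketch, not the assembler's own parser: parse_as_integer is a hypothetical name, negative numbers and error handling are left out, and it relies on strtol accepting the "0x" and leading-"0" conventions when the base is 0.

```c
#include <stdlib.h>

/*
 * Converts an integer literal written with the prefixes used by AS:
 * "0b"/"0B" binary, "0x"/"0X" hexadecimal, leading "0" octal,
 * anything else decimal.
 */
long parse_as_integer(const char *text) {
    if (text[0] == '0' && (text[1] == 'b' || text[1] == 'B')) {
        /* Binary: strtol has no portable "0b" support, so skip the prefix. */
        return strtol(text + 2, NULL, 2);
    }
    /* Base 0 lets strtol pick hexadecimal, octal or decimal from the prefix. */
    return strtol(text, NULL, 0);
}
```

Under these assumptions "74", "0112" and "0x4A" all yield 74, the character code of 'J', and "0b0100111" yields 39.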


Sections

• text, data and bss sections
– .text
• Code
– .data
• Initialized data
– .bss
• Local (uninitialized) data; the whole section is zeroed at startup

• .section "section"

• Subsections
– .text 0
– .text 1
– etc.


Symbols

• Labels
– label_1: statement
– Dot (‘.’)
• The current address AS is assembling into

• Expressions
– Operators (grouped from highest to lowest precedence)
• * Multiplication, / Division, % Remainder, << Shift Left, >> Shift Right
• | Bitwise Inclusive Or, & Bitwise And, ^ Bitwise Exclusive Or, ! Bitwise Or Not
• + Addition, - Subtraction, == Is Equal To, <> Is Not Equal To, < Is Less Than, > Is Greater Than, >= Is Greater Than Or Equal To, <= Is Less Than Or Equal To
• && Logical And, || Logical Or


Directives

• .ascii "string" ...
• .asciz or .string "string" ... (each string is followed by a zero byte)

• .byte
• .word
• .long or .int
• .quad

• .equ or .set symbol, expression


Directives

• .global symbol
• .if, .elseif, .else, .endif
• .data subsection
• .text subsection
• .section name

• .fill repeat, size, value
– Size = 1 to 8 bytes (byte to quad word)

• .skip or .space size, fill
– Size in bytes


Directives

.macro name arg1=def1, arg2=def2
statement \arg1 \arg2
.endm

Using the macro:
name arg1=arg1v, arg2=arg2v
or
name arg1v, arg2v

Macros are like inline functions. Example:
.macro Add5 arg1
ADD \arg1, 5
.endm


Instruction listing


Instruction format

• An instruction is a series of fields, most of them optional (prefixes, opcode, ModR/M, SIB, displacement, immediate), so instructions have variable length.



Prefixes

• Group 1
– Lock and repeat prefixes

• Group 2
– Segment override prefixes:
• CS, SS, DS, ES, FS, GS
– Branch hints:
• Branch not taken
• Branch taken

• Group 3
– Operand-size override prefix

• Group 4
– Address-size override prefix


ModR/M and SIB Bytes

• The mod field combines with the r/m field to form 32 possible values: eight registers and 24 addressing modes.

• The reg/opcode field typically specifies a register number (for some instructions it carries extra opcode bits instead)
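
Splitting a ModR/M byte into its fields is a matter of shifts and masks. The sketch below assumes the layout given in the Intel manuals (mod in bits 7-6, reg/opcode in bits 5-3, r/m in bits 2-0); the struct and function names are invented for this example.

```c
#include <stdint.h>

/* The three fields of a ModR/M byte. */
struct modrm {
    uint8_t mod;   /* bits 7-6: addressing mode, 3 means "r/m is a register" */
    uint8_t reg;   /* bits 5-3: register number or opcode extension */
    uint8_t rm;    /* bits 2-0: register or memory operand selector */
};

/* Extracts the mod, reg/opcode and r/m fields from one byte. */
struct modrm decode_modrm(uint8_t byte) {
    struct modrm fields;
    fields.mod = (uint8_t)((byte >> 6) & 0x3u);
    fields.reg = (uint8_t)((byte >> 3) & 0x7u);
    fields.rm  = (uint8_t)(byte & 0x7u);
    return fields;
}
```

The byte 0xD8 (binary 11 011 000) decodes to mod = 3, reg = 3, r/m = 0, and 0x45 (binary 01 000 101) to mod = 1, reg = 0, r/m = 5. The 2-bit mod and 3-bit r/m fields together give the 32 combinations mentioned above.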


Instruction listing

ADC Add with carry

ADD Add

AND Logical AND

CALL Call procedure

CBW, CWD, CDQ Convert size

CLC Clear carry flag

CLD Clear direction flag

CLI Clear interrupt flag

CMC Complement carry flag

CMP Compare operands (D-S)

CMPSB, CMPSW, CMPSD, CMPSQ Compare B, W, D, Q in memory ([DS:ESI] - [ES:EDI])

DEC Decrement by 1

DIV Unsigned divide

HLT Enter halt state

IDIV Signed divide

IMUL Signed multiply

IN Input from port

INC Increment by 1

INT Call to interrupt

IRET Return from interrupt

Jxx (JA, JAE, JB, JBE, JC, JCXZ, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ) Conditional jumps: above/below (unsigned comparison), greater/less (signed comparison), etc.

JMP Jump

LAHF Load flags into AH register

LEA Load Effective Address


Instruction listing

LDS, LES, LFS, LGS, LSS Load far pointer into Segment:DST (segment register and destination register)

LOCK Assert BUS LOCK# signal

LODSB, LODSW, LODSD, LODSQ Load B, W, D, Q (for strings): EAX <- [DS:ESI]

LOOP/LOOPx Loop control

MOV Move

MOVSB, MOVSW, MOVSD, MOVSQ Move B, W, D, Q from string to string: [ES:EDI] <- [DS:ESI]

MUL Unsigned multiply

NEG Two's complement negation

NOP No operation

NOT Negate the operand, logical NOT

OR Logical OR

OUT Output to port

POP Pop data from stack

POPF Pop data into flags register

PUSH Push data onto stack

PUSHF Push flags onto stack

RCL Rotate left (with carry)

RCR Rotate right (with carry)

REPxx Repeat CMPS/MOVS/SCAS/STOS (this is a prefix)

RET, RETN, RETF Return from procedure (near and far)

ROL Rotate left

ROR Rotate right


Instruction listings

SAHF Store AH into flags

SAL Shift Arithmetically left

SAR Shift Arithmetically right

SBB Subtraction with borrow

SCASB, SCASW, SCASD, SCASQ Compare B, W, D, Q string: EAX - [ES:EDI]

SHL Shift left (unsigned shift left)

SHR Shift right (unsigned shift right)

STC Set carry flag

STD Set direction flag

STI Set interrupt flag

STOSB, STOSW, STOSD, STOSQ Store B, W, D, Q in string: [ES:EDI] <- EAX

SUB Subtraction: D=D‐S

TEST Logical compare (AND)

XCHG Exchange data

XOR Exclusive OR
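
The string instructions in this listing, combined with the REP prefix, behave like small loops over memory. As an illustration, here is a C model of "rep movsb". It is a sketch under simplifying assumptions: memory is a flat byte array (section bases are ignored), registers are plain fields of a hypothetical struct string_regs, and df stands for the direction flag set by STD and cleared by CLD.

```c
#include <stdint.h>

/* The registers used by the string instructions, as offsets into mem. */
struct string_regs {
    uint32_t esi;   /* source offset */
    uint32_t edi;   /* destination offset */
    uint32_t ecx;   /* repeat count */
    int df;         /* direction flag: 0 = forward, 1 = backward */
};

/* Model of "rep movsb": copy ecx bytes from [esi] to [edi]. */
void rep_movsb(uint8_t *mem, struct string_regs *r) {
    /* Unsigned wrap-around makes "+ (uint32_t)-1" behave as "- 1". */
    uint32_t step = r->df ? (uint32_t)-1 : 1u;
    while (r->ecx != 0) {
        mem[r->edi] = mem[r->esi];   /* movsb: one byte */
        r->esi += step;
        r->edi += step;
        r->ecx--;                    /* rep: count down to zero */
    }
}
```

After the loop ecx is 0 and esi and edi point just past the copied block (or just before it when df is 1).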

• Further reference: Intel manuals
– Intel® 64 and IA-32 Architectures Software Developer's Manual
– Volume 1: Basic Architecture, chapter 5, Instruction Set Summary
– Volume 2: Instruction Set Reference


Using AS


Hello World

.intel_syntax noprefix

.data                              # section declaration
msg:  .ascii  "Hello, world!\n"    # our dear string
len = . - msg                      # string length

.text                              # section declaration

# we must export the entry point
.global _start

_start:
# write our string to stdout
mov  edx, offset len   # 3rd arg: message length
mov  ecx, offset msg   # 2nd arg: pointer to message
mov  ebx, 1            # 1st arg: file handle (stdout)
mov  eax, 4            # system call number (sys_write)
int  0x80              # call kernel

# and exit
mov  ebx, 0            # 1st arg: exit code
mov  eax, 1            # system call number (sys_exit)
int  0x80              # call kernel


Invoking

• Assembling:
• as -g -o hello.o hello.s
– Option -o: specifies the object file "hello.o"
– From the source file hello.s
– Option -g: generates debug information

• Linking:
• ld -o hello hello.o
– Option -o: specifies the executable file "hello"
– From the object file "hello.o"
– Option -s: strips the symbolic information

• On a 64-bit Linux host, this 32-bit code needs "as --32" and "ld -m elf_i386"


Running and debugging

• Running
– Just type (./ specifies the current directory)
>> ./hello

• Debugging
– Use "kdbg", the KDE front end for the GNU debugger (gdb)
– Type
>> kdbg hello


System Calls

• System call 3: read
– Read from a file descriptor (fs/read_write.c)
– ssize_t read(int fd, void *buf, size_t count);

• System call 4: write
– Write to a file descriptor (fs/read_write.c)
– ssize_t write(int fd, const void *buf, size_t count);

• Parameters are passed in registers
– EBX(1), ECX(2), EDX(3), ESI(4), EDI(5), EBP(6)

• EAX contains the system call number
• The kernel is accessed through int 0x80

mov  edx, offset len   # third parameter (string length)
mov  ecx, offset msg   # second parameter (string pointer)
mov  ebx, 1            # first parameter (stdout)
mov  eax, 4            # system call number (sys_write)
int  0x80              # call kernel
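
For comparison, the same write call can be made from C through the library wrapper whose prototype is shown above. This is a minimal sketch: say_hello is a name chosen for the example, and the wrapper, not the program, loads the registers and enters the kernel.

```c
#include <unistd.h>

/*
 * Writes the greeting to standard output (file descriptor 1) and returns
 * what write() returned: the number of bytes written, or -1 on error.
 */
ssize_t say_hello(void) {
    static const char msg[] = "Hello, world!\n";
    /* sizeof msg counts the terminating zero byte, which is not sent. */
    return write(STDOUT_FILENO, msg, sizeof msg - 1);
}
```

The three arguments map onto EBX, ECX and EDX in the assembler version: file descriptor, buffer pointer and length.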


System Calls

• 162: nanosleep
– int nanosleep(const struct timespec *req, struct timespec *rem);

struct timespec:
time_t (long) seconds;
long nanoseconds;

In assembler:
Time:
.long  # seconds
.long  # nanoseconds

EAX = 162
ARG1: EBX
ARG2: ECX
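
The C side of this call looks as follows. It is a sketch only: sleep_for is a hypothetical helper, the remaining-time argument is passed as NULL, and the feature-test macro is there because some strict compiler modes hide nanosleep without it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* ask <time.h> to declare nanosleep */
#endif
#include <time.h>

/*
 * Sleeps for the given time. Returns 0 on success and -1 if the call
 * failed or was interrupted by a signal.
 */
int sleep_for(long seconds, long nanoseconds) {
    struct timespec req;
    req.tv_sec = seconds;        /* first field: seconds */
    req.tv_nsec = nanoseconds;   /* second field: 0 .. 999999999 */
    return nanosleep(&req, NULL);
}
```

In the assembler version the address of the two-field Time structure goes in EBX and the address for the remaining time (or 0) in ECX.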


Mixing C with assembler

• C calling convention (caller)
– Push the arguments onto the stack in reverse order (last argument first)
• This means that the first argument ends up on top of the stack
– Call the function
– Free the arguments
• Compilers reserve stack space for locals and arguments in advance, to avoid pushes and pops

• Called function:
– Optional for us: define a new stack frame
• Push the old bp
• bp = sp
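
The caller's side of this convention can be simulated in C with an array that plays the role of the stack. Everything here is a model made up for the illustration (struct stack_model, push_args, arg_at, free_args); a real call additionally pushes the return address on top of the arguments.

```c
#include <stdint.h>

#define STACK_WORDS 64

/* A downward-growing stack of 32-bit words; sp indexes the top element. */
struct stack_model {
    uint32_t words[STACK_WORDS];
    int sp;
};

void stack_init(struct stack_model *s) {
    s->sp = STACK_WORDS;              /* empty stack */
}

void push(struct stack_model *s, uint32_t value) {
    s->words[--s->sp] = value;        /* like "push": decrement, then store */
}

/* Caller: push the arguments in reverse order, last argument first. */
void push_args(struct stack_model *s, const uint32_t *args, int count) {
    for (int i = count - 1; i >= 0; i--) {
        push(s, args[i]);
    }
}

/* Argument i (counting from 0) sits i words above the top of the stack. */
uint32_t arg_at(const struct stack_model *s, int i) {
    return s->words[s->sp + i];
}

/* Caller cleanup, the model's version of "add esp, 4*count". */
void free_args(struct stack_model *s, int count) {
    s->sp += count;
}
```

Pushing the arguments (10, 20, 30) leaves 10 on top of the stack, which is why the called function finds its first argument at the lowest address.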


Mixing C with assembler

• Compiling
– gcc -o executable_file file.c file.s

• Linking with a library (ncurses): -l
– gcc -o executable_file file.s -lncurses

• Entry point is now
– "main" and not "_start"
