Assembly Language
http://iescobar.com
MSc. Ivan A. Escobar Broitman
January-May 2012
CHAPTER 1
Introduction
Microprocessor
Silicon chip that contains a central processing unit (CPU).
The “Brain” of all personal computers, most workstations, and a great number of digital devices.
In charge of program execution. It can be RISC or CISC.
Bus Connections
CPU Memory I/O
Address Bus
Data Bus
Control Bus
Bus Connections (continued)
A processor communicates with the system’s memory and I/O circuits by means of signals that travel through a set of wires or connections known as buses.
• Address Bus: holds the memory address that will be accessed.
• Data Bus: holds the piece of data to read or write.
• Control Bus: indicates the operation to be performed (read or write).
CPU Instructions
Each instruction has:
• an opcode (operation code), which indicates the operation to perform.
• zero or more operands, which may be registers, constants or memory locations.
Fetch-Execute Cycle
Fetch:
1. Fetch an instruction from memory.
2. Decode the instruction to determine the operation.
3. Fetch data from memory if necessary.
Execute:
4. Perform the operation on the data.
5. Store the result in memory if needed.
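The five steps can be sketched as a loop over a toy machine. This is only an illustration: the opcodes LOAD, ADD, STORE and HALT, the single accumulator, and the dictionary used as memory are assumptions made for the sketch, not part of any real instruction set.

```python
def run(program, memory):
    """Run a list of (opcode, operand) pairs; return the final accumulator."""
    acc = 0   # accumulator register
    pc = 0    # program counter: index of the next instruction
    while pc < len(program):
        opcode, operand = program[pc]   # 1. fetch the instruction
        pc += 1
        if opcode == "LOAD":            # 2. decode; 3. fetch data from memory
            acc = memory[operand]
        elif opcode == "ADD":           # 4. perform the operation on the data
            acc += memory[operand]
        elif opcode == "STORE":         # 5. store the result in memory
            memory[operand] = acc
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc


memory = {0: 2, 1: 3, 2: 0}
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
result = run(program, memory)
print(result, memory[2])  # 5 5
```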
RISC: Reduced Instruction Set Computer
Microprocessor that uses a relatively small number of simple but fast instructions.
Cheaper to design and produce because it requires fewer transistors.
Mainly used in workstations.
CISC: Complex Instruction Set Computer
Microprocessor that uses a large number of complex (specialized) instructions.
Mainly used in Intel’s x86 architecture.
Programming Languages
Hardware
Machine Code
Assembly Language
High Level Language
Machine Code
Lowest level programming language.
Each CPU instruction is represented as an opcode, which is an unsigned integer number.
Only language that the computer really understands.
Difficult to understand by human beings.
Machine Code Example
The opcode for adding one to the accumulator in the Intel x86 is:
01000000b
or
0x40
Assembly Language
Same instruction set as machine code.
Each opcode is replaced by a symbolic name.
Less cryptic for human beings.
Assembly Language Example
The Intel x86 assembly language instruction that adds one to the accumulator is:
inc eax
Assembler
In order to execute a program written in assembly language, it first has to be translated to machine code using a special program called an assembler.
inc eax → Assembler → 0x40
High Level Language
Its instructions are less primitive than those of assembly language and machine code.
Program text is much more like natural language.
Easier to understand by human beings.
Examples: FORTRAN, LISP, COBOL, BASIC and C.
Compiler
A program written in a high level language may be translated to machine code using a compiler.
High level language:
    if(x == 0) x = x + 5;
→ Compiler →
    cmp esi, 0
    jne .L1
    add esi, 5
.L1:
→ Assembler →
    0x81FE00000000
    0x7506
    0x81C605000000
Interpreter
An interpreter translates a high level language program to an intermediate form that is subsequently executed by a virtual machine.
[Diagram: IF X = 0 THEN X = X + 5 → Translator → Intermediate Form → Virtual Machine]
Assembly Language Advantages
• Program execution speed.
• Executable code size.
• “Bare bones” programming:
  - special instructions (FPU, MMX)
  - I/O ports
  - special CPU modes of operation
Assembly Language Disadvantages
• Error prone.
• Long and tedious to write.
• Difficult to understand and modify.
• Strongly tied to a specific computer architecture.
Commonly Used Assembly Language Applications
• Operating Systems
• Device Drivers
• Communication Software
• Real Time Systems
• Embedded Systems
• Graphics
Reasons for Studying Assembly Language
To understand some of the low level details of how a real computer operates.
To get to know some technologies that can only be adequately understood using assembly language.
To obtain a better appreciation of the inner-workings of a compiler.
What’s next?
[Diagram: course sequence for Computer Science (ISC) and Computer Engineering (ISE): Programming Languages Course, Assembly Language Course, Microprocessors Course]
CHAPTER 2
The Intel x86 Architecture
Moore’s Law
In 1965, Gordon Moore, who later co-founded Intel, made the following observation:
Approximately every 18 months, microchips double their power, while their cost stays roughly the same.
Moore’s Law (continued)
[Figure: transistors per Intel processor by year. Vertical axis: transistors (0 to 10M); horizontal axis: year (1970 to 2000). Processors plotted: 4004, 8080, 8086, 80286, 80386, 80486, P5, P6, P7]
4004 (1971)
• First microprocessor.
• Built by Intel for Busicom calculators.
• 4-bit registers.
• 108 kHz.
• 2,300 transistors.
• 640 bytes of memory.
8080 (1974)
• Used in the MITS Altair 8800, the first commercial personal computer.
• 8-bit registers.
• 16-bit address bus.
• 2 MHz.
• 6,000 transistors.
• 64 KB of memory.
8086/8088 (1978)
• Used in the original IBM PC (8088).
• Intel’s first 16-bit microprocessor.
• 20-bit address bus.
• 16-bit (8086) and 8-bit (8088) data bus.
• 4.77+ MHz.
• 29,000 transistors.
• Addressable memory: 1 MB.
80286 (1982)
• Used in the original IBM PC/AT.
• 24-bit address bus.
• 16-bit data bus.
• 6+ MHz.
• 134,000 transistors.
• Multitasking, protected mode and virtual memory.
• Addressable memory: 16 MB.
80386 (1985)
• 32-bit registers.
• 32-bit address bus.
• 32-bit data bus.
• Pipelining.
• 16+ MHz.
• 275,000 transistors.
• Addressable memory: 4 GB.
P4: 80486 (1989)
• Better execution speed.
• Integrated floating point unit (FPU).
• 8 KB L1 cache.
• 25+ MHz.
• 1,200,000 transistors.
• Addressable memory: 4 GB.
P5: Pentium (1993)
• 64-bit data bus.
• 8 KB L1 cache for data and 8 KB for code.
• Dual pipeline for integer operations.
• 60+ MHz.
• 3,100,000 transistors.
• Addressable memory: 4 GB.
P6: Pentium Pro (1995)
• 36-bit address bus.
• 256 KB L2 cache.
• Superpipelining.
• Speculative and out-of-order execution.
• 150+ MHz.
• 5,500,000 transistors.
• Addressable memory: 64 GB.
P55C: Pentium MMX (1997)
• Classic Pentium with MMX technology: 64-bit SIMD multimedia and communication extensions.
• 16 KB L1 cache for data and 16 KB for code.
• 166+ MHz.
• 4,500,000 transistors.
• Addressable memory: 4 GB.
Klamath: Pentium II (1997)
• Pentium Pro with MMX technology.
• 16 KB L1 cache for data and 16 KB for code.
• 512 KB L2 cache.
• 233+ MHz.
• 7,500,000 transistors.
• Addressable memory: 64 GB.
New P6 processors
• Pentium II Xeon (“Pentium II on steroids”): L2 cache runs at full processor speed. Designed for the computer server market.
• Celeron (“the Castrated One”): Pentium II with no L2 cache. Designed for the sub-$1,000 PC market.
Katmai: Pentium III (1999)
• Pentium II with a 128-bit SIMD, floating-point-oriented extension to the MMX technology.
• Processor serial number in order to “enhance security”.
• 450+ MHz.
• Addressable memory: 64 GB.
Pentium 4 (2000)
• 0.18-micron process.
• 42 million transistors on a single chip.
• 1.4 to 3.0 GHz.
• Bus speed: 400 MHz.
Merced: Itanium (2000)
• Intel Architecture-64 (IA-64).
• Developed jointly by Intel and Hewlett-Packard.
• Hardware x86 emulation.
• Not RISC or CISC, but EPIC (Explicitly Parallel Instruction Computing).
• 600 MHz and 1,000 MHz.
• Tens of millions of transistors.
x86 Basic Structure
[Block diagram: Bus Interface (to RAM), Code Cache, Data Cache, Decode & Prefetch Unit, Branch Predictor, Execution Unit, Integer ALU, Registers, Floating Point Unit]
x86 Basic Structure (continued)
Execution unit: two parallel integer pipelines enable the CPU to read, interpret, execute and dispatch two instructions simultaneously.
Branch Predictor: The branch prediction unit tries to guess which sequence will be executed each time the program contains a conditional jump, so that the Prefetch and Decode Unit can get the instructions ready in advance.
x86 Basic Structure (continued)
Floating Point Unit: Third execution unit, where non-integer calculations are performed.
Primary Cache: Two on-chip caches, one for code and one for data, are far quicker than the external memory.
Bus Interface: This brings a mixture of code and data into the CPU, separates the two ready for use, and then recombines them and sends them back out.
x86 Modes of Operation
The operating mode determines which instructions and architectural features are accessible.
The Intel Architecture supports three operating modes:
• Real Mode
• Protected Mode
• Virtual-8086 Mode
Real Mode
• Mode in which all x86 processors boot.
• The CPU works like a very fast 8086.
• Can only access up to 1 MB of memory.
• Only one task is executed at a time.
Real Mode
In real-address mode, the IA-32 processor can access 1 MB of memory using 20-bit addresses in the range 0 to FFFFF hex. The basic problem that Intel engineers had to solve was that the original 8086 processor had only 16-bit registers, so it was impossible to directly represent a 20-bit address.
They came up with a scheme known as segmented memory. All memory is divided into 64 KB units called segments, as shown in the figure:
[Figure: real mode memory divided into segments]
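Real Mode
An analogy might be a large building: segments are floors, and an offset is a room on that floor.
Example: 8000:0250 represents offset 0250h in segment 8000h. A segment always starts at a multiple of 10h, so the last hexadecimal zero of its starting address (80000h) is dropped when it is stored in a 16-bit segment register.
To calculate the linear address (all values in hexadecimal): segment × 10h + offset
8000h × 10h + 0250h = 80250h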
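The segment:offset arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name linear_address is made up for the illustration, and the 20-bit mask assumes the wrap-around behaviour of the original 8086, which has only 20 address lines.

```python
def linear_address(segment, offset):
    """Return the 20-bit real-mode linear address for segment:offset."""
    # Shifting left by 4 bits multiplies the segment by 16 (10h).
    # The mask keeps 20 bits (assumed 8086-style wrap-around).
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(linear_address(0x8000, 0x0250)))  # 0x80250
```

Note that different segment:offset pairs can name the same byte: 8025:0000 also yields 80250h.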
Real Mode
A typical program has three segments:
• Code (CS)
• Data (DS)
• Stack (SS)
Protected Mode
• Allows multitasking.
• Each program has its own memory, protected from other programs.
• Extended memory: more than 1 MB of memory available.
• Supports virtual memory.
Protected Mode
When a processor is running in protected mode, each program can address up to 4 GB of memory.
It uses the flat memory model.
It only requires a 32-bit integer to hold the address of any instruction or variable.
Protected Mode
A typical program has three segments:
• Code (CS)
• Data (DS)
• Stack (SS)
Virtual-8086 Mode
Allows simultaneous execution of two or more programs designed to work in real mode, each program having up to 1 MB of independent memory.
Registers
A register is a special high-speed storage area within the CPU.
The x86 processors have several registers available for the application programmer, grouped as follows:
• General-purpose data registers.
• Segment registers.
• Status and control registers (EIP and EFLAGS registers).
General-Purpose Data Registers
These eight 32-bit registers are available for holding the following data items:
• Integer operands for logical and arithmetic operations.
• Pointers (memory addresses).
General-Purpose Data Registers (continued)
32-bit (bits 31-0)   16-bit (bits 15-0)   High byte (bits 15-8)   Low byte (bits 7-0)   Name
eax                  ax                   ah                      al                    Accumulator
ebx                  bx                   bh                      bl                    Base
ecx                  cx                   ch                      cl                    Count
edx                  dx                   dh                      dl                    Data
General-Purpose Data Registers (continued)
32-bit (bits 31-0)   16-bit (bits 15-0)   Name
esp                  sp                   Stack Pointer
ebp                  bp                   Base Pointer
esi                  si                   Source Index
edi                  di                   Destination Index
Segment Registers
The six segment registers hold 16-bit segment selectors.
A segment selector points to a special structure in memory called a segment descriptor. Several segment descriptors are grouped together into a descriptor table.
A segment descriptor contains addressing and control information which is used to control how a 32-bit linear address is generated.
Segment Registers (continued)
Register (bits 15-0)   Name
cs                     Code Segment
ds                     Data Segment
es                     Extra Segment
fs                     Extra Segment
gs                     Extra Segment
ss                     Stack Segment
Segment Registers (continued)
[Diagram: the segment selector held in a segment register points to one of the segment descriptors of a descriptor table in memory]
Segment information:
• Base address
• Size
• Privilege level:
  - private OS function
  - OS service
  - device driver
  - application program
• Type:
  - read-only
  - read/write
  - execute-only
  - execute/read
Instruction Pointer Register
The instruction pointer (EIP) is a 32-bit register that contains the offset in the current code segment of the next instruction to be executed.
eip (bits 31-0): Instruction Pointer
Instruction Pointer Register (continued)
It is advanced from one instruction boundary to the next in straight-line code or it is moved ahead or backwards by a number of instructions when executing flow control instructions such as jumps or subroutine calls.
It cannot be accessed directly by software.
Flags Register
This 32-bit register is a collection of individual status and control bits called flags.
Each flag is usually manipulated independently and not as a set.
Flags Register (continued)
Flag   Name             Bit in eflags
CF     carry flag       0
PF     parity flag      2
AF     auxiliary flag   4
ZF     zero flag        6
SF     sign flag        7
DF     direction flag   10
OF     overflow flag    11
Flags Register (continued)
• Carry Flag: set if the result of an arithmetic operation involving unsigned numbers overflows.
• Overflow Flag: set if the result of an arithmetic operation involving signed numbers overflows.
• Sign Flag: set if the result of an arithmetic or logical operation is negative.
• Zero Flag: set if the result of an arithmetic or logical operation is zero.
Flags Register (continued)
• Parity Flag: set if the result of an arithmetic or logical operation has an even number of 1 bits in its 8 least significant bits.
• Auxiliary Flag: set if the result of an arithmetic operation has a carry out from the low-order nibble. Used in binary-coded decimal (BCD) operations.
• Direction Flag: explicitly set or cleared by the programmer in order to modify the behavior of some special string operations.
Memory Organization
The memory that the processor addresses on its bus is called physical memory.
Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address.
Memory Organization (continued)
The physical address space ranges from zero to a maximum of 2^32 - 1 (a 4 GB address space).
When employing the processor’s memory management facilities, programs DO NOT directly address physical memory. Instead, they access memory using a memory model.
Flat Memory Model
Memory appears to a program as a single, continuous address space, called a linear address space. All code and data are contained in this address space.
Flat Memory Model (continued)
[Diagram: linear address space running from 0x00000000 to 0xFFFFFFFF]
The linear address space is byte addressable, with addresses running contiguously from 0 to 2^32 - 1.
Paging
The x86 supports translation of linear (virtual) addresses into physical addresses through paging.
Special tables map portions of the virtual addresses into physical memory locations.
Physical memory is divided into page frames, each 4 KB in size.
The operating system copies a certain number of pages from your storage device to main memory.
Paging (continued)
[Diagram: virtual memory address space mapped onto physical memory and the disk drive]
When a program needs a page that is not in main memory, the operating system copies the required page into memory and copies another page back to the disk.
Each time a page is needed that is not currently in memory, a page fault occurs.
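With 4 KB pages, a linear address splits into a page number (upper 20 bits) and an offset within the page (lower 12 bits). The following is a simplified sketch: the single-level page_table dictionary is a hypothetical stand-in for the processor's real multi-level tables, and a missing entry stands in for a page fault.

```python
PAGE_SIZE = 4096  # 4 KB pages


def split_linear_address(addr):
    """Return (page_number, offset_within_page) for a linear address."""
    return addr // PAGE_SIZE, addr % PAGE_SIZE


def translate(addr, page_table):
    """Map a linear address to a physical one using a toy page table.

    page_table is a hypothetical dict {page_number: frame_number}.
    """
    page, offset = split_linear_address(addr)
    if page not in page_table:
        # Here the operating system would load the page from disk.
        raise LookupError(f"page fault on page {page:#x}")
    return page_table[page] * PAGE_SIZE + offset


page_table = {0x12345: 0x00042}
print(hex(translate(0x12345678, page_table)))  # 0x42678
```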
Generating a Physical Address
Logical address: 16-bit selector and 32-bit offset
→ the selector picks a segment descriptor, whose base address is added to the 32-bit offset
→ 32-bit linear address
→ Paging
→ 32-bit physical address
32-bit Offset
32-bit offset = base register + (index register × scale factor) + displacement
• 32-bit base register: eax, ebx, ecx, edx, esi, edi, ebp, esp
• 32-bit index register: eax, ebx, ecx, edx, esi, edi, ebp
• Scale factor: 1, 2, 4, 8
• Displacement: 8-bit or 32-bit
32-bit Offset Example
MOV EAX, [ESI + ECX * 4 + 12]
• Base register: ESI
• Index register: ECX
• Scale factor: 4
• Displacement: 12
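The offset computation for this addressing mode can be modelled directly. This is a sketch only: the function name effective_address and the register contents used below are assumptions chosen for the illustration.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Compute base + index * scale + displacement as a 32-bit offset."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    # The result wraps around at 32 bits.
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Hypothetical register contents for MOV EAX, [ESI + ECX * 4 + 12]
esi, ecx = 0x1000, 3
print(hex(effective_address(base=esi, index=ecx, scale=4, displacement=12)))  # 0x1018
```

A scale factor of 4 with a displacement is a natural fit for indexing an array of 32-bit elements that sits 12 bytes past a base pointer.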
Byte Order
When a value is stored in memory in multiple bytes, two distinct byte orders may be used:
• Big-endian ("big end" first)
• Little-endian ("little end" first)
Byte Order (continued)
In big-endian architectures, the most significant byte is stored at the lowest address. In little-endian architectures, the least significant byte is stored at the lowest address.
The terms big-endian and little-endian are derived from the Lilliputians of Jonathan Swift's Gulliver's Travels, whose major political issue was whether soft-boiled eggs should be opened on the big side or the little side.
Byte Order (continued)
Intel x86 and DEC VAX systems store multibyte values in little-endian order.
HP, IBM and Motorola 68K systems store multibyte values in big-endian order.
The PowerPC is a bi-endian processor: it supports both big-endian and little-endian byte ordering.
Byte Order Example
The byte ordering for the number 1025 stored in 4 bytes is:
1025 = 00000000 00000000 00000100 00000001b
Address   little-endian   big-endian
00        00000001b       00000000b
01        00000100b       00000000b
02        00000000b       00000100b
03        00000000b       00000001b
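The same layout can be reproduced with Python's built-in int.to_bytes, which takes the byte order as an argument. This is offered as a quick check of the table, not as part of the original material; index 0 of each list corresponds to address 00.

```python
value = 1025

little = value.to_bytes(4, "little")  # least significant byte first
big = value.to_bytes(4, "big")        # most significant byte first

print(list(little))  # [1, 4, 0, 0]
print(list(big))     # [0, 0, 4, 1]

# Reading the bytes back with the matching order recovers the value.
print(int.from_bytes(little, "little"), int.from_bytes(big, "big"))  # 1025 1025
```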
CHAPTER 3
The Linux Operating System
Operating System
Software that makes hardware usable.
Manages such things as: memory, screen display, keyboard input, disk files and printer output.
[Diagram: layers from top to bottom: User, Application Programs, Operating System, Hardware]
UNIX
Operating system developed at Bell Labs in the early 1970s by Ken Thompson and Dennis Ritchie.
One of the first operating systems to be written in a high-level programming language, namely C.
UNIX (continued)
The name UNIX was intended as a pun on a previous OS called MULTICS (and was written UNICS at first: UNiplexed Information and Computing System).
Leading operating system for workstations
Linux
Free UNIX-type operating system originally created by Linus Torvalds at the University of Helsinki in Finland.
Developed under the GNU General Public License, the source code for Linux is freely available to everyone.
Linux (continued)
Linux is an independent POSIX (Portable Operating System Interface for UNIX) implementation and includes: multitasking, multi-user, multiprocessing, virtual memory, shared libraries and TCP/IP networking.
Currently implemented on a wide range of platforms, including: x86, Alpha, SPARC, 68K and PowerPC.
GNU Project
Short for GNU's Not UNIX.
A UNIX-compatible software system developed by the Free Software Foundation (FSF).
The philosophy behind GNU is to produce software that is non-proprietary. Anyone can download, modify and redistribute GNU software. The only restriction is that they cannot limit further redistribution.
The GNU project was started in 1983 by Richard Stallman at MIT.
POSIX
Acronym for Portable Operating System Interface for UNIX.
Set of IEEE and ISO standards that define an interface between programs and operating systems.
Supported by most UNIX systems and Windows NT.
Multitasking
The ability to execute more than one task (program) at the same time.
The CPU switches from one program to another so quickly that it gives the appearance of executing all of the programs at the same time.
Multitasking (continued)
There are two basic types of multitasking:
• Preemptive multitasking: the operating system assigns CPU time slices to each program.
• Cooperative multitasking: each program can control the CPU for as long as it needs it. If a program is not using the CPU, however, it can allow another program to use it temporarily.
Linux supports preemptive multitasking.
Multi-user
Computer systems that support two or more simultaneous users.
All mainframes and minicomputers and most workstations are multi-user systems.
Multiprocessing
Since version 2.0, Linux has the ability to run in multiprocessor architectures.
The OS can distribute several applications in true parallel fashion across several CPUs.
Virtual Memory
If it’s there and you can see it, it’s real.
If it’s not there and you can see it, it’s virtual.
If it’s there and you can’t see it, it’s transparent.
If it’s not there and you can’t see it, you erased it!
IBM poster explaining virtual memory, circa 1978.
Virtual Memory (continued)
Technique that increases the amount of apparent memory available on a system.
A swap space is an area on disk in which the OS stores images of running programs when memory is tight.
The Linux virtual memory system uses a swap space to implement paging.
Shared Libraries
A library is a collection of precompiled routines that a program can use.
In a static library, all library functions that a program requires are made part of an executable, which can make it rather large.
In a shared library, function code is not directly included in an executable file. Instead, the OS dynamically links a running program to the required routines contained in the shared library.
Shared Libraries (continued)
Shared libraries have two important advantages:
• Small executable files.
• Several programs running at the same time can share a single copy of the library code.
TCP/IP Networking
Acronym for Transmission Control Protocol/Internet Protocol.
Consists of a suite of communications protocols used to connect hosts on the Internet.
Allows services such as: e-mail, telnet, ftp and http.
CHAPTER 4
The Netwide Assembly Language
nasm: The Netwide Assembler
Free and portable x86 assembler originally developed by Simon Tatham and Julian Hall.
It supports a range of object file formats, including Linux ELF, NetBSD/FreeBSD, COFF, Microsoft 16-bit OBJ and Win32.
Development Cycle
editor → assembly language file (*.asm) → nasm → object file (*.o) → ld (linker) → ELF executable file
ld: The Linker
An object file isn’t directly executable; it first needs to be fed into a linker (also known as link-loader or link-editor).
The linker does the following tasks:
• identifies the initial program entry point (_start label)
• binds symbolic references to memory addresses
• unites all the object and library files
• produces an executable ELF file
ELF File
The Executable and Linkable Format was designed by the UNIX System Laboratories.
Used by contemporary Linux implementations as its standard executable file format.
Supports shared libraries (dynamic linking).
a.out File
a.out is the default file name given to executable files by UNIX linkers.
It means “assembler output”, in spite of being linker output!
On the PDP-7 computer, there was no linker. Executable programs were created directly by the assembler. The name stuck, even when linkers started to appear on newer machines.
Building a Program
Editing:
$ vi test.asm
$ ls
test.asm
Assembly:
$ nasm -f elf test.asm
$ ls
test.asm test.o
Linking:
$ ld -s -o test test.o
$ ls
test test.asm test.o
Execution:
$ ./test
Linux-NASM Program Skeleton
bits 32                 ; -- 32 bit program
section .data           ; -- Start data segment
        ; put initialized data here
section .bss            ; -- Start bss segment
        ; put non-initialized data here
section .text           ; -- Start code segment
        global _start   ; -- Export "_start" label
_start:                 ; -- Define "_start" label
        ; put program code here
        mov eax, 1      ; -- Exit system call
        mov ebx, 0      ; exit code #0
        int 0x80
Segments
A segment on UNIX is a section of related content in a binary.
ELF files have three segments:
• TEXT for storing code
• DATA for storing initialized data
• BSS for non-initialized data
NASM Source Code
Every NASM program source line has the following four fields:
label: instruction operands ; comment
Every field is optional.
The number of operands depends on the instruction.
Instructions
Mnemonics that represent x86 opcodes.
Generate code that produces actions at run time.
Pseudo-Instructions
Not real x86 instructions (they don’t produce any actions at run time).
Are used in the instruction field because that’s the most convenient place to put them.
Directives
Statements that allow us to control how a program is assembled.
They only work at assembly time (they don’t directly produce any machine code).
bits Directive
Specifies whether NASM must produce code that will run in 16-bit or 32-bit mode.
ELF files only support 32-bit mode:
bits 32
May be omitted for ELF files.
section .data Directive
States the beginning of the initialized data segment.
An image of this segment’s data is physically stored in the executable file.
This segment contains read/write data.
Pseudo-Instructions for the Data Segment
Pseudo-Instruction   Meaning              Size (bits)
db                   Define byte          8
dw                   Define word          16
dd                   Define double word   32
dq                   Define quadword      64
dt                   Define ten bytes     80
section .bss Directive
States the beginning of the non-initialized data segment.
Only the size of the data is stored in the executable file. Once the program is loaded into memory, all the data in this section is set to zero.
This segment contains read/write data.
BSS means “Block Started by Symbol”, a pseudo-instruction from the old IBM 704 assembler, carried over into UNIX.
Pseudo-Instructions for the BSS Segment
Pseudo-Instruction   Meaning               Size (bits)
resb                 Reserve byte          8
resw                 Reserve word          16
resd                 Reserve double word   32
resq                 Reserve quadword      64
rest                 Reserve ten bytes     80
section .text Directive
States the beginning of the segment that contains the program’s executable instructions.
This segment is read-only.
System Calls
Processes access kernel facilities via the system call interface.
System calls are the only way a program can communicate with the outside world.
In assembly language, interrupt 0x80 is used to make system calls.
System calls (continued)
[Diagram: Process → system calls (INT 0x80) → Linux Kernel → I/O Devices (display, keyboard, mouse, disks, printer, etc.)]
sys_exit
Terminate current process, return exit code to caller.
EAX   1
EBX   exit code
sys_read
Read a number of bytes from a given input device.
EAX   3
EBX   file descriptor (0 = stdin)
ECX   buffer address
EDX   number of bytes to read
INT 0x80
sys_write
Write a number of bytes to a given output device.
EAX   4
EBX   file descriptor (1 = stdout)
ECX   buffer address
EDX   number of bytes to write
INT 0x80
CHAPTER 5
x86 Integer Instructions
Condition Codes
Suffix   Meaning                  Flags
O        Overflow                 OF=1
NO       No Overflow              OF=0
C        Carry                    CF=1
B        Below                    CF=1
NAE      Not Above nor Equal      CF=1
NC       No Carry                 CF=0
NB       Not Below                CF=0
AE       Above or Equal           CF=0
Z        Zero                     ZF=1
E        Equal                    ZF=1
NZ       Not Zero                 ZF=0
NE       Not Equal                ZF=0
BE       Below or Equal           CF=1 OR ZF=1
NA       Not Above                CF=1 OR ZF=1
A        Above                    CF=0 AND ZF=0
NBE      Not Below nor Equal      CF=0 AND ZF=0
S        Sign                     SF=1
NS       Not Sign                 SF=0
P        Parity                   PF=1
PE       Parity Even              PF=1
NP       Not Parity               PF=0
PO       Parity Odd               PF=0
L        Less                     SF<>OF
NGE      Not Greater nor Equal    SF<>OF
GE       Greater or Equal         SF=OF
NL       Not Less                 SF=OF
LE       Less or Equal            ZF=1 OR SF<>OF
NG       Not Greater              ZF=1 OR SF<>OF
G        Greater                  ZF=0 AND SF=OF
NLE      Not Less nor Equal       ZF=0 AND SF=OF
Condition Codes (continued)
Above and Below are used for unsigned integer comparisons.
Greater and Less are used for signed integer comparisons.
Flow Control Instructions
JMP Jcc CALL RET
JMP: jump
Syntax: JMP dest
Operation (absolute jump): EIP ← dest
Operation (relative jump): EIP ← EIP + dest
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
Unconditional Jumps
Syntax: jmp label
There are two types of jumps:
• Intersegment
• Intrasegment
The address can be in a register, a variable or a label.
Unconditional Jumps
Example:
Start:  mov ax, 0
        inc ax
        jmp Start
Jcc: short jump conditional
Syntax: Jcc dest
Operation: if(cc) EIP ← EIP + dest endif
Notes: cc is any of the condition codes. dest must be within a signed 8-bit range (-128 to 127).
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
Jcc: near jump conditional
Syntax: Jcc NEAR dest
Operation: if(cc) EIP ← EIP + dest endif
Notes: cc is any of the condition codes. dest must be within a signed 32-bit range.
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
Conditional Jumps
Dependent on condition codes.
Example: JZ jumps if the zero flag is set.
Conditional Codes
Example: code the following routine using assembly language instructions.
Add a value to x;
If x < 0 Then
    … (body for negative condition)
Else if x = 0 Then
    … (body for zero condition)
Else
    … (body for positive condition)
End if
Conditional Codes
Solution:
        add x, eax        ; add a value to x
        jns elseIfZero    ; jump if x is not negative
        …                 ; code for negative condition
        jmp endCheck
elseIfZero:
        jnz elsePos       ; jump if x is not zero
        …                 ; code for zero condition
        jmp endCheck
elsePos:
        …                 ; code for positive condition
endCheck:
Comparing Instructions
CMP op1, op2
This instruction works like a SUB instruction: it calculates op1 – op2, but it does not modify the operands; it only modifies the flags register.
The resulting flag values are then used to decide the outcome of the comparison.
We have to decide whether or not the sign of the operands matters, that is, whether the comparison is signed or unsigned.
Compare Examples
OP1   OP2   OP1-OP2   Flags                        Signed      Unsigned
3B    3B    00        CF=OF=SF=0, ZF=1             OP1 = OP2   OP1 = OP2
3B    15    26        CF=OF=SF=ZF=0                OP1 > OP2   OP1 > OP2
15    F6    1F        CF=1 (borrow), SF=OF=ZF=0    OP1 > OP2   OP1 < OP2
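The flag values in these examples can be reproduced with a small model of an 8-bit CMP. This is a sketch under stated assumptions: the helper name cmp_flags is invented for the illustration, only CF, ZF, SF and OF are modelled, and the operand width defaults to 8 bits to match the examples.

```python
def cmp_flags(op1, op2, bits=8):
    """Model CMP op1, op2: compute op1 - op2 and return result plus flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    result = (op1 - op2) & mask
    return {
        "result": result,
        "CF": int((op1 & mask) < (op2 & mask)),   # borrow (unsigned)
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        # Signed overflow: operands differ in sign and the result's
        # sign differs from op1's sign.
        "OF": int(bool((op1 ^ op2) & (op1 ^ result) & sign)),
    }


flags = cmp_flags(0x15, 0xF6)
below = flags["CF"] == 1             # unsigned: op1 < op2 (condition B)
less = flags["SF"] != flags["OF"]    # signed:   op1 < op2 (condition L)
print(hex(flags["result"]), below, less)  # 0x1f True False
```

The last line shows why both families of condition codes exist: 15h is below F6h as an unsigned value, but greater than it as a signed value (F6h is -10).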
Compare Examples
Legal Examples
Cmp eax, 356
cmp value, 03dh
Cmp bh, ‘$’
Illegal examples
Cmp 1000, total
Compare Programming Ex.
Code the following routine in assembly language.
If val < 10
Then
add 1 to xcount;
Else
add 1 to ycount;
End if;
Compare Programming Ex
Solution:
Cmp ebx, 10 ;value < 10
Jnl Elsey
Inc xcount ;add 1 to xcount
Jmp endVal
Elsey:
Inc ycount ;add 1 to ycount
endVal:
Programming Ex #2
Code the following routine in assembly language:
If (total >= 100) or (count = 10)
Then
add value to total;
End if
Programming Ex2 Solution
Cmp total, 100
Jge addValue
Cmp cx, 10
Jne endAddCheck
addValue:
Mov ebx, value
Add total, ebx
endAddCheck:
While Loops
While continuation condition loop
…{ body}
end while;
The continuation condition is a boolean expression.
While Loop Exercise
Design an assembly language module to implement the following high level language instructions.
While (sum < 1000) loop
…{body increment sum}
End while;
While Loops Exercise 2
Design an assembly language module to implement the following high level language instructions.
X:=1twoTox:=1;While twoTox</number
multiply twoTox by 2;End while;Substract 1 from x;
Homework
CALL: call subroutine
Syntax: CALL dest
Operation (absolute call):
ESP ← ESP - 4
[ESP] ← EIP
EIP ← dest
Operation (relative call):
ESP ← ESP - 4
[ESP] ← EIP
EIP ← EIP + dest
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
RET: return from subroutine
Syntax: RET
Operation:
EIP ← [ESP]
ESP ← ESP + 4
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
Data Transfer Instructions
MOV CMOVcc SETcc XCHG XLATB
PUSH POP PUSHF POPF PUSHA POPA
MOV: move data
Syntax: MOV dest, orig
Operation: dest ← orig
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
CMOVcc: conditional move
Syntax: CMOVcc dest, orig
Operation: if(cc) dest ← orig endif
Notes: cc is any of the condition codes.
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
SETcc: set conditional
Syntax: SETcc dest
Operation: if(cc) dest ← 1 else dest ← 0 endif
Notes: cc is any of the condition codes.
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
XCHG: exchange data
Syntax: XCHG op1, op2
Operation:
temp ← op1
op1 ← op2
op2 ← temp
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
XLATB: translate byte
Syntax: XLATB
Operation: AL ← [EBX + AL]
Notes: AL is treated as an unsigned byte.
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
PUSH: push data on stack
Syntax: PUSH op
Operation:
ESP ← ESP - 4
[ESP] ← op
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
POP: pop data from stack
Syntax: POP dest
Operation:
dest ← [ESP]
ESP ← ESP + 4
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
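The PUSH and POP operations mirror each other: the stack grows toward lower addresses, and ESP always points at the most recently pushed value. A minimal model, in which the class name Stack, the initial ESP value and the dictionary standing in for memory are all assumptions made for the illustration:

```python
class Stack:
    """Toy model of the x86 stack with 32-bit (4-byte) entries."""

    def __init__(self, esp=0x1000):
        self.esp = esp      # stack pointer
        self.memory = {}    # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                               # ESP <- ESP - 4
        self.memory[self.esp] = value & 0xFFFFFFFF  # [ESP] <- op

    def pop(self):
        value = self.memory[self.esp]               # dest <- [ESP]
        self.esp += 4                               # ESP <- ESP + 4
        return value


stack = Stack()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # 0xff8
print(hex(stack.pop()))  # 0x22222222 (last in, first out)
print(hex(stack.pop()))  # 0x11111111
print(hex(stack.esp))    # 0x1000
```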
PUSHF: push flags register
Syntax: PUSHF
Operation:
ESP ← ESP - 4
[ESP] ← EFLAGS
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
POPF: pop flags register
Syntax: POPF
Operation:
EFLAGS ← [ESP]
ESP ← ESP + 4
Flags modified: OF, DF, SF, ZF, AF, PF, CF.
PUSHA: push all registers
Syntax: PUSHA
Operation:
temp ← ESP
ESP ← ESP - 0x20
[ESP + 0x1C] ← EAX
[ESP + 0x18] ← ECX
[ESP + 0x14] ← EDX
[ESP + 0x10] ← EBX
[ESP + 0x0C] ← temp
[ESP + 0x08] ← EBP
[ESP + 0x04] ← ESI
[ESP + 0x00] ← EDI
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
POPA: pop all registers
Syntax: POPA
Operation:
EDI ← [ESP + 0x00]
ESI ← [ESP + 0x04]
EBP ← [ESP + 0x08]
EBX ← [ESP + 0x10]
EDX ← [ESP + 0x14]
ECX ← [ESP + 0x18]
EAX ← [ESP + 0x1C]
ESP ← ESP + 0x20
Flags: none affected (OF, DF, SF, ZF, AF, PF and CF are unchanged).
176
Flow Control Instructions
JMP Jcc CALL RET
177
JMP: jump
Syntax: JMP dest
Operation (absolute jump):EIP dest
Operation (relative jump):EIP EIP + dest
-
of
-
df
-
sf
-
zf
-
af
-
pf
-
cf
178
Jcc: short jump conditional
Syntax: Jcc dest
Operation:if(cc) EIP EIP + destendif
Notes: cc is any of the condition codes. dest must be within a signed 8-bit range (-128 to 127). -
of
-
df
-
sf
-
zf
-
af
-
pf
-
cf
179
Jcc: near jump conditional
Syntax: Jcc NEAR dest
Operation:
    if (cc) EIP ← EIP + dest
    endif
Notes: cc is any of the condition codes. dest must be within a signed 32-bit range.
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
180
CALL: call subroutine
Syntax: CALL dest
Operation (absolute call):
    ESP ← ESP - 4
    [ESP] ← EIP
    EIP ← dest
Operation (relative call):
    ESP ← ESP - 4
    [ESP] ← EIP
    EIP ← EIP + dest
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
181
RET: return from subroutine
Syntax: RET
Operation:
    EIP ← [ESP]
    ESP ← ESP + 4
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
182
Arithmetic Instructions
CLC STC CMC ADD ADC INC SUB SBB DEC NEG
CMP MUL IMUL DIV IDIV CBW CWD CDQ CWDE MOVSX MOVZX
183
CLC: clear carry flag
Syntax: CLC
Operation:
    CF ← 0
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   0
184
STC: set carry flag
Syntax: STC
Operation:
    CF ← 1
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   1
185
CMC: complement carry flag
Syntax: CMC
Operation:
    CF ← ~CF
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   X
186
ADD: add integers
Syntax: ADD dest, orig
Operation:
    dest ← dest + orig
Flags:  of  df  sf  zf  af  pf  cf
        X   -   X   X   X   X   X
187
ADD examples
AX: 0075  CX: 01A2
add ax, cx
Results: AX: 0217  CX: 01A2  SF=ZF=CF=OF=0
188
ADD examples
AX: 77AC CX: 4B35
add ax, cx
Results: AX: C2E1  CX: 4B35  SF=OF=1; ZF=CF=0
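
The flag values in the two ADD examples can be reproduced with a small C model. This is only a sketch: the type add16_result and the function add16 are hypothetical names introduced here, and only the four flags quoted in the examples are modelled.

```c
#include <stdint.h>

/* Hypothetical helper type: the 16-bit result plus CF, OF, SF and ZF. */
typedef struct {
    uint16_t result;
    int cf, of, sf, zf;
} add16_result;

/* Model of a 16-bit ADD: dest + orig, with the flags derived from the sum. */
add16_result add16(uint16_t dest, uint16_t orig)
{
    add16_result r;
    uint32_t wide = (uint32_t)dest + (uint32_t)orig;  /* keeps the carry in bit 16 */

    r.result = (uint16_t)wide;
    r.cf = (wide >> 16) & 1;          /* unsigned carry out of bit 15 */
    r.sf = (r.result >> 15) & 1;      /* copy of the sign bit */
    r.zf = (r.result == 0);
    /* signed overflow: both operands have the same sign, the result a different one */
    r.of = ((~(dest ^ orig) & (dest ^ r.result)) >> 15) & 1;
    return r;
}
```

Calling add16(0x77AC, 0x4B35) gives C2E1 with SF=OF=1 and CF=0: two positive values add up to a bit pattern that reads as negative, which is exactly what OF reports.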
189
ADC: add with carry
Syntax: ADC dest, orig
Operation:
    dest ← dest + orig + CF
Flags:  of  df  sf  zf  af  pf  cf
        X   -   X   X   X   X   X
190
INC: increment integer
Syntax: INC dest
Operation:
    dest ← dest + 1
Flags:  of  df  sf  zf  af  pf  cf
        X   -   X   X   X   X   -
191
INC examples
ECX: 00 00 01 A2
inc ecx Results: ECX= 00 00 01 A3 SF=ZF=OF=0
192
INC examples
EDX: 7F FF FF FF
inc edx
Results: EDX: 80 00 00 00  SF=OF=1; ZF=0
193
SUB: subtract integers
Syntax: SUB dest, orig
Operation:
    dest ← dest - orig
Flags:  of  df  sf  zf  af  pf  cf
        X   -   X   X   X   X   X
194
SUB examples
EAX: 00 00 00 75 ECX: 00 00 01 A2
sub eax, ecx
Results: EAX: FF FF FE D3  ECX: 00 00 01 A2  SF=CF=1; ZF=OF=0 (CF is set because the unsigned subtraction needs a borrow)
195
SUB examples
DX: FF 20  Word at Value: FF 20
sub dx, Value
Results: DX: 00 00  Value: FF 20  ZF=1, PF=1; the rest are zero.
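
The borrow behaviour of SUB can be checked with a short C model. This is a sketch under the usual x86 rule that CF is set when the unsigned minuend is smaller than the subtrahend; sub32_result and sub32 are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: the 32-bit result plus CF, OF, SF and ZF. */
typedef struct {
    uint32_t result;
    int cf, of, sf, zf;
} sub32_result;

/* Model of a 32-bit SUB: dest - orig, with the flags derived from the operands. */
sub32_result sub32(uint32_t dest, uint32_t orig)
{
    sub32_result r;

    r.result = dest - orig;               /* wraps modulo 2^32 */
    r.cf = (dest < orig);                 /* a borrow was needed */
    r.sf = (r.result >> 31) & 1;
    r.zf = (r.result == 0);
    /* signed overflow: operands differ in sign and the result's sign differs from dest */
    r.of = (((dest ^ orig) & (dest ^ r.result)) >> 31) & 1;
    return r;
}
```

For 0x75 - 0x1A2 the model returns FF FF FE D3 with SF=1 and CF=1; subtracting a value from itself returns zero with ZF=1 and CF=0.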
196
SBB: subtract with borrow
Syntax: SBB dest, orig
Operation:
    dest ← dest - orig - CF
Flags:  of  df  sf  zf  af  pf  cf
        X   -   X   X   X   X   X
197
DEC: decrement integer
Syntax: DEC dest
Operation:
    dest ← dest - 1
Flags:  of  df  sf  zf  af  pf  cf
        X   -   X   X   X   X   -
198
DEC examples
BX: 00 01
dec bx Results: BX: 00 00 ZF=1; SF=OF=0
199
DEC examples
AL: F5
dec al Results: AL: F4 SF=1; OF=ZF=0
200
NEG: negate
Syntax: NEG dest
Operation:
    dest ← -dest
Notes: Sets CF, unless dest is zero, in which case CF is cleared.
Flags:  of  df  sf  zf  af  pf  cf
        X   -   X   X   X   X   X
201
NEG examples
BX: 01 A2
neg bx Results: BX: FE 5E SF=1; ZF=0
202
NEG examples
DH: F5
neg dh Results: DH:0B SF=ZF=0
203
NEG examples
EAX: 00 00 00 00
neg eax Results: EAX: 00 00 00 00 SF=0; ZF=1
204
CMP: compare integers
Syntax: CMP op1, op2
Operation:
    NULL ← op1 - op2
Flags:  of  df  sf  zf  af  pf  cf
        X   -   X   X   X   X   X
205
MUL: unsigned integer multiply
Syntax: MUL orig
Operation:
    case (size(orig))
        8:  AX ← AL * orig
        16: DX:AX ← AX * orig
        32: EDX:EAX ← EAX * orig
    endcase
Notes: CF and OF are cleared if the high-order half of the result is zero; otherwise they are set. orig cannot be an immediate value.
Flags:  of  df  sf  zf  af  pf  cf
        X   -   ?   ?   ?   ?   X
206
MUL examples
AX: 00 05 BX: 00 02 DX: ?? ??
mul bx Results: DX: 00 00 AX: 00 0A CF=OF=0
207
MUL examples
AL: 05 Byte at Factor: FF
mul Factor Results: AX: 04 FB CF=OF=1
208
IMUL: signed integer multiply
Syntax #1: IMUL orig
Operation:
    case (size(orig))
        8:  AX ← AL * orig
        16: DX:AX ← AX * orig
        32: EDX:EAX ← EAX * orig
    endcase
209
IMUL examples
AX: 00 05 BX: 00 02 DX: ?? ??
imul bx
Results: DX: 00 00  AX: 00 0A  CF=OF=0
210
IMUL examples
AL: 05 Byte at Factor: FF
imul Factor
Results: AX: FF FB  CF=OF=0 (as a signed byte FF is -1, so the product is -5, which still fits in AL)
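
The difference between MUL and IMUL on the same bit patterns can be sketched in C. The function names below are hypothetical; the casts assume the usual two's-complement conversion that GCC performs when narrowing to a signed type.

```c
#include <stdint.h>

/* 8-bit MUL: AX <- AL * orig, both operands read as unsigned bytes. */
uint16_t mul8(uint8_t al, uint8_t orig)
{
    return (uint16_t)(al * orig);
}

/* 8-bit IMUL: AX <- AL * orig, both operands read as signed bytes. */
uint16_t imul8(uint8_t al, uint8_t orig)
{
    int16_t product = (int16_t)((int8_t)al * (int8_t)orig);
    return (uint16_t)product;
}

/* CF/OF after MUL: set when the high half (AH) is not zero. */
int mul8_carry(uint16_t ax)
{
    return (ax >> 8) != 0;
}

/* CF/OF after IMUL: set when AX is not the sign extension of AL. */
int imul8_carry(uint16_t ax)
{
    return (int16_t)ax != (int8_t)(ax & 0xFF);
}
```

With AL = 05 and a factor of FF, mul8 yields 04 FB (5 * 255) with CF=OF=1, while imul8 yields FF FB (5 * -1) with CF=OF=0.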
211
IMUL: signed integer multiply (continued)
Syntax #2: IMUL dest, orig
Operation:
    dest ← dest * orig
Flags:  of  df  sf  zf  af  pf  cf
        X   -   ?   ?   ?   ?   X
212
IMUL examples
EBX: 00 00 00 0A
imul ebx, 10 *Note source may be immediate
Results: EBX: 00 00 00 64 CF=OF=0
213
IMUL: signed integer multiply (continued)
Syntax #3: IMUL dest, orig, const
Operation:
    dest ← orig * const
Notes: CF and OF are cleared if the full product fits in dest without losing significant bits; otherwise they are set.
Flags:  of  df  sf  zf  af  pf  cf
        X   -   ?   ?   ?   ?   X
214
IMUL examples
Word at Value: 08F2 BX: ?? ??
imul bx, Value, 1000 Results: BX: F1 50 CF=OF=1
215
DIV: unsigned integer divide
Syntax: DIV orig
Operation:
    case (size(orig))
        8:  AL ← AX / orig
            AH ← AX % orig
        16: AX ← DX:AX / orig
            DX ← DX:AX % orig
        32: EAX ← EDX:EAX / orig
            EDX ← EDX:EAX % orig
    endcase
Flags:  of  df  sf  zf  af  pf  cf
        ?   -   ?   ?   ?   ?   ?
216
DIV
Source (divisor)   Other operand (dividend)   Quotient   Remainder
byte               AX                         AL         AH
word               DX:AX                      AX         DX
double word        EDX:EAX                    EAX        EDX
217
DIV examples
EDX: 00 00 00 00 (100/13) EAX: 00 00 00 64 EBX: 00 00 00 0D
div ebx Results: EDX: 00 00 00 09 EAX: 00 00 00 07
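
The 32-bit case of DIV can be modelled in C as a 64-bit by 32-bit division. This is a sketch: div32_result and div32 are hypothetical names, and the -1 return value merely stands in for the divide error exception the processor raises when the divisor is zero or the quotient does not fit in EAX.

```c
#include <stdint.h>

/* Hypothetical helper type: EAX (quotient) and EDX (remainder) after DIV. */
typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} div32_result;

/* Model of "div orig" with a 32-bit operand; returns 0 on success, -1 on divide error. */
int div32(uint32_t edx, uint32_t eax, uint32_t orig, div32_result *out)
{
    uint64_t dividend = ((uint64_t)edx << 32) | eax;   /* EDX:EAX */

    if (orig == 0 || dividend / orig > UINT32_MAX)
        return -1;                                     /* quotient would not fit in EAX */
    out->quotient = (uint32_t)(dividend / orig);
    out->remainder = (uint32_t)(dividend % orig);
    return 0;
}
```

For EDX:EAX = 100 and a divisor of 13 the model returns quotient 7 and remainder 9. It also shows why EDX is normally cleared before an unsigned division: a stale non-zero EDX makes the dividend much larger and can trigger the error path.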
218
IDIV: signed integer divide
Syntax: IDIV orig
Operation:
    case (size(orig))
        8:  AL ← AX / orig
            AH ← AX % orig
        16: AX ← DX:AX / orig
            DX ← DX:AX % orig
        32: EAX ← EDX:EAX / orig
            EDX ← EDX:EAX % orig
    endcase
Flags:  of  df  sf  zf  af  pf  cf
        ?   -   ?   ?   ?   ?   ?
219
CBW: convert byte to word
Syntax: CBW
Operation:
    AX ← SignExtend(AL)
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
220
CBW examples
AL: 53
cbw Results: AX: 0053
221
CBW examples
AL: C6
cbw Results: AX: FF C6
222
CWD: convert word to dword
Syntax: CWD
Operation:
    DX:AX ← SignExtend(AX)
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
223
CWD example
AX: 07 0D DX: ?? ??
cwd
Results: DX: 00 00 AX: 07 0D
224
CDQ: convert dword to qword
Syntax: CDQ
Operation:
    EDX:EAX ← SignExtend(EAX)
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
225
CDQ example
EAX: FF FF FA 13 EDX: ?? ?? ?? ??
cdq Results: EDX: FF FF FF FF EAX: FF FF FA 13
226
CWDE: convert word to dword extended
Syntax: CWDE
Operation:
    EAX ← SignExtend(AX)
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
227
CWDE example
AX: FF 2A
cwde
Results: EAX: FF FF FF 2A
228
MOVSX: move data with sign extend
Syntax: MOVSX dest, orig
Operation:
    dest ← SignExtend(orig)
Notes: orig must be smaller than dest.
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
229
MOVSX examples
Word at value: 07 0D
movsx ecx, value Results: ECX: 00 00 07 0D
230
MOVSX examples
Word at value: F7 0D
movsx ecx, value
Results: ECX: FF FF F7 0D
231
MOVZX: move data with zero extend
Syntax: MOVZX dest, orig
Operation:
    dest ← ZeroExtend(orig)
Notes: orig must be smaller than dest.
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
232
MOVZX examples
Word at value: 07 0D
movzx ecx, value Results: ECX: 00 00 07 0D
233
MOVZX examples
Word at value: F7 0D
movzx ecx, value
Results: ECX: 00 00 F7 0D
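
In C, sign extension and zero extension correspond to converting through a signed or an unsigned type. The sketch below mirrors the CBW, CWDE, CDQ, MOVSX and MOVZX examples; the function names are hypothetical and the casts assume two's-complement narrowing, as GCC provides.

```c
#include <stdint.h>

/* CBW: AX <- SignExtend(AL) */
uint16_t cbw(uint8_t al)
{
    return (uint16_t)(int16_t)(int8_t)al;
}

/* CWDE (or MOVSX from a word): EAX <- SignExtend(AX) */
uint32_t cwde(uint16_t ax)
{
    return (uint32_t)(int32_t)(int16_t)ax;
}

/* MOVZX from a word: the upper 16 bits are always zero. */
uint32_t movzx16(uint16_t value)
{
    return (uint32_t)value;
}

/* CDQ: EDX receives 32 copies of the sign bit of EAX. */
uint32_t cdq_edx(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}
```

The same word F7 0D therefore becomes FF FF F7 0D when sign-extended and 00 00 F7 0D when zero-extended.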
234
Logical and Bitwise Instructions
AND OR XOR NOT TEST
SHL SHR SAR ROL ROR RCL RCR
235
AND: bitwise and
Syntax: AND dest, orig
Operation:
    dest ← dest & orig
Notes: 0 & 0 = 0, 0 & 1 = 0, 1 & 0 = 0, 1 & 1 = 1
Flags:  of  df  sf  zf  af  pf  cf
        0   -   X   X   ?   X   0
236
OR: bitwise or
Syntax: OR dest, orig
Operation:
    dest ← dest | orig
Notes: 0 | 0 = 0, 0 | 1 = 1, 1 | 0 = 1, 1 | 1 = 1
Flags:  of  df  sf  zf  af  pf  cf
        0   -   X   X   ?   X   0
237
XOR: bitwise xor
Syntax: XOR dest, orig
Operation:
    dest ← dest ^ orig
Notes: 0 ^ 0 = 0, 0 ^ 1 = 1, 1 ^ 0 = 1, 1 ^ 1 = 0
Flags:  of  df  sf  zf  af  pf  cf
        0   -   X   X   ?   X   0
238
NOT: bitwise not
Syntax: NOT dest
Operation:
    dest ← ~dest
Notes: ~0 = 1, ~1 = 0. Unlike the other logical instructions, NOT does not modify any flag.
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
239
TEST: test bits
Syntax: TEST op1, op2
Operation:
    NULL ← op1 & op2
Flags:  of  df  sf  zf  af  pf  cf
        0   -   X   X   ?   X   0
240
SHL: shift left
Syntax: SHL dest, count
Operation:
    cf ← [msb ... lsb] ← 0
Flags:  of  df  sf  zf  af  pf  cf
        ?   -   X   X   ?   X   X
241
SHR: shift right
Syntax: SHR dest, count
Operation:
    0 → [msb ... lsb] → cf
Flags:  of  df  sf  zf  af  pf  cf
        ?   -   X   X   ?   X   X
242
SAR: shift arithmetic right
Syntax: SAR dest, count
Operation:
    msb → [msb ... lsb] → cf   (the sign bit is replicated)
Flags:  of  df  sf  zf  af  pf  cf
        ?   -   X   X   ?   X   X
243
ROL: rotate left
Syntax: ROL dest, count
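Operation:
    cf ← [msb ... lsb] ← old msb   (bits leaving the msb re-enter at the lsb)
Notes: rotates modify only CF and OF.
Flags:  of  df  sf  zf  af  pf  cf
        ?   -   -   -   -   -   X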
244
ROR: rotate right
Syntax: ROR dest, count
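Operation:
    old lsb → [msb ... lsb] → cf   (bits leaving the lsb re-enter at the msb)
Notes: rotates modify only CF and OF.
Flags:  of  df  sf  zf  af  pf  cf
        ?   -   -   -   -   -   X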
245
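RCL: rotate through carry left
Syntax: RCL dest, count
Operation:
    cf ← [msb ... lsb] ← old cf   (cf acts as an extra bit of the rotation)
Notes: rotates modify only CF and OF.
Flags:  of  df  sf  zf  af  pf  cf
        ?   -   -   -   -   -   X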
246
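RCR: rotate through carry right
Syntax: RCR dest, count
Operation:
    old cf → [msb ... lsb] → cf   (cf acts as an extra bit of the rotation)
Notes: rotates modify only CF and OF.
Flags:  of  df  sf  zf  af  pf  cf
        ?   -   -   -   -   -   X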
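
C has shift operators but no rotate operators, so the rotate and arithmetic-shift behaviour has to be spelled out. The helpers below are a sketch with hypothetical names; the count is assumed to be between 1 and 31, and rcl32_once models a single-bit RCL with the carry flag passed in and out through a pointer.

```c
#include <stdint.h>

/* ROL: bits leaving the msb re-enter at the lsb (count in 1..31). */
uint32_t rol32(uint32_t value, unsigned count)
{
    return (value << count) | (value >> (32 - count));
}

/* ROR: bits leaving the lsb re-enter at the msb (count in 1..31). */
uint32_t ror32(uint32_t value, unsigned count)
{
    return (value >> count) | (value << (32 - count));
}

/* SAR: shift right while replicating the sign bit (count in 1..31). */
uint32_t sar32(uint32_t value, unsigned count)
{
    uint32_t shifted = value >> count;

    if (value & 0x80000000u)
        shifted |= ~(0xFFFFFFFFu >> count);   /* fill the vacated bits with ones */
    return shifted;
}

/* RCL by one bit: the old CF enters at the lsb, the old msb becomes the new CF. */
uint32_t rcl32_once(uint32_t value, int *cf)
{
    int carry_out = (value >> 31) & 1;

    value = (value << 1) | (uint32_t)(*cf & 1);
    *cf = carry_out;
    return value;
}
```

Compare sar32(0x80000000, 4), which gives F8 00 00 00, with a plain logical shift of the same value, which gives 08 00 00 00: SAR preserves the sign and so behaves like a signed division by a power of two.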
247
String Instructions
CLD STD REP STOSB REP STOSW REP STOSD
REP MOVSB REP MOVSW REP MOVSD
248
CLD: clear direction flag
Syntax: CLD
Operation:
    DF ← 0
Flags:  of  df  sf  zf  af  pf  cf
        -   0   -   -   -   -   -
249
STD: set direction flag
Syntax: STD
Operation:
    DF ← 1
Flags:  of  df  sf  zf  af  pf  cf
        -   1   -   -   -   -   -
250
REP STOSB: repeat store string byte
Syntax: REP STOSB
Operation:
    while (ECX <> 0)
        [EDI] ← AL
        if (DF = 0) EDI ← EDI + 1
        else EDI ← EDI - 1
        endif
        ECX ← ECX - 1
    endwhile
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
251
REP STOSW: repeat store string word
Syntax: REP STOSW
Operation:
    while (ECX <> 0)
        [EDI] ← AX
        if (DF = 0) EDI ← EDI + 2
        else EDI ← EDI - 2
        endif
        ECX ← ECX - 1
    endwhile
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
252
REP STOSD: repeat store string dword
Syntax: REP STOSD
Operation:
    while (ECX <> 0)
        [EDI] ← EAX
        if (DF = 0) EDI ← EDI + 4
        else EDI ← EDI - 4
        endif
        ECX ← ECX - 1
    endwhile
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
253
REP MOVSB: repeat move string byte
Syntax: REP MOVSB
Operation:
    while (ECX <> 0)
        BYTE [EDI] ← BYTE [ESI]
        if (DF = 0)
            ESI ← ESI + 1
            EDI ← EDI + 1
        else
            ESI ← ESI - 1
            EDI ← EDI - 1
        endif
        ECX ← ECX - 1
    endwhile
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
254
REP MOVSW: repeat move string word
Syntax: REP MOVSW
Operation:
    while (ECX <> 0)
        WORD [EDI] ← WORD [ESI]
        if (DF = 0)
            ESI ← ESI + 2
            EDI ← EDI + 2
        else
            ESI ← ESI - 2
            EDI ← EDI - 2
        endif
        ECX ← ECX - 1
    endwhile
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
255
REP MOVSD: repeat move string dword
Syntax: REP MOVSD
Operation:
    while (ECX <> 0)
        DWORD [EDI] ← DWORD [ESI]
        if (DF = 0)
            ESI ← ESI + 4
            EDI ← EDI + 4
        else
            ESI ← ESI - 4
            EDI ← EDI - 4
        endif
        ECX ← ECX - 1
    endwhile
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
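
The string instructions are essentially hardware versions of memset and memcpy with a selectable direction. The C sketch below follows the REP STOSB and REP MOVSB pseudocode step by step; the function names and the df parameter (standing in for the direction flag) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* REP STOSB: store al at edi, step edi according to df, repeat ecx times.
   Returns the final value of edi. */
uint8_t *rep_stosb(uint8_t *edi, uint8_t al, size_t ecx, int df)
{
    while (ecx != 0) {
        *edi = al;
        edi += df ? -1 : 1;    /* DF = 0 moves forward, DF = 1 moves backward */
        ecx--;
    }
    return edi;
}

/* REP MOVSB: copy ecx bytes from esi to edi in the direction given by df. */
void rep_movsb(uint8_t *edi, const uint8_t *esi, size_t ecx, int df)
{
    while (ecx != 0) {
        *edi = *esi;
        if (df) {
            esi--;
            edi--;
        } else {
            esi++;
            edi++;
        }
        ecx--;
    }
}
```

With df set, the pointers must start at the last element of each buffer rather than the first, which is why code that sets the direction flag is expected to clear it again before calling other routines.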
CHAPTER 6
Mixing C and Assembly Language
257
Modularization
Most programs consist of a number of separate parts, called modules.
Source modules are separately edited and compiled or assembled in order to produce the corresponding object modules.
All the object modules are linked together to produce an executable program.
258
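Modularization (continued)
[Figure: build pipeline. nasm assembles each source module (*.asm) into an object module (*.o); gcc compiles each source module (*.c) into an object module (*.o); ld (the linker) combines the object modules with the start file crt0.o and the standard C library to produce the ELF executable file.]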
259
Exporting & Importing Names in Assembly Language
Any assembly language label may be exported to other modules using the global directive.
260
Exporting & Importing Names in Assembly Language (continued)
The global directive must appear before the definition of the corresponding symbol.
If a module exports a certain label, any other module may import it.
To import a label, the extern directive must be used.
A label cannot be both defined and declared extern in the same module.
261
Assembly Export/Import Example

module1.asm:

    bits 32
    section .data
        global alpha            ; export alpha
        extern beta             ; import beta from module2
    alpha   dd 500
    section .text
        global _start
        extern func             ; import func from module2
    _start:
        inc dword [alpha]
        inc byte [beta]
        call func
        mov eax, 1              ; Linux system call 1: exit
        mov ebx, 0              ; exit status 0
        int 0x80

module2.asm:

    bits 32
    section .data
        global beta             ; export beta
        extern alpha            ; import alpha from module1
    beta    db 10
    section .text
        global func             ; export func
    func:
        xor eax, eax
        mov al, [beta]
        add [alpha], eax
        ret
262
Assembly Export/Import Example (continued)
Building the program:
    $ nasm -f elf module1.asm
    $ nasm -f elf module2.asm
    $ ld -s module1.o module2.o -o program
    $ ls
    module1.asm  module2.asm
    module1.o    module2.o
    program
263
Exporting & Importing Names in ANSI C
By default, all function names and global variables are exportable to other modules.
If a name should be kept local to a module, it must be declared static.
264
Exporting & Importing Names in ANSI C (continued)
To indicate that a name may be defined in some other module, the extern modifier must be used in the variable or function prototype declaration.
The extern modifier is optional in function prototype declarations.
It is not an error to declare a name extern and to have it defined as well in the same module.
265
ANSI C Export/Import Example
    int x;          /* defines an exportable variable */
    static int y;   /* defines a local module variable */

    /* import x if not defined in this module */
    extern int x;

    /* import h if not defined in this module */
    extern int h(int, int);

    int f(int a, int b)     /* defines an exportable function */
    {
        return a + b;
    }

    static int g(int c)     /* defines a local module function */
    {
        return c + c;
    }
266
x86 and GCC Data types
GCC Data Type   Size in bytes   Assembly Language Equivalent
char            1               byte
short           2               word
int             4               dword
long            4               dword
long long       8               qword
float           4               dword
double          8               qword
long double     10              tword
void *          4               dword
267
Register Usage
Functions return their values in the following registers:
    AL for char
    AX for short
    EAX for int, long and void *
    EDX:EAX for long long
    ST0 for floating point
268
Register Usage (continued)
Registers EAX, ECX, EDX (not EBX) may be changed by the function; all other registers must be saved and restored.
Flags may be changed by the procedure with the following restriction: The direction flag is 0 by default. The direction flag may be set temporarily, but must be cleared before any call or return.
269
Passing Parameters
The parameters received by a C function, or a C-callable assembly language subroutine, are passed through the stack.
Parameters are pushed onto the stack in reverse order, that is, from right to left. This means that the first parameter is always the nearest to the top of the stack.
270
Passing Parameters (continued)
After the parameters are pushed into the stack, a CALL instruction to the desired function or subroutine is executed.
When the function or subroutine returns, the parameters are still on the stack and must be removed by the caller. This may be done using POP instructions or by adjusting the ESP register directly with an ADD instruction.
271
Subroutine Prologue
The first two instructions in a C-callable subroutine that receives arguments should be:
push ebp
mov ebp, esp
This saves the EBP value, so that it can now point to the current top of stack.
272
Subroutine Prologue (continued)
After this prologue, the stack has the following layout:
    EBP, ESP ->  Original value of EBP
    EBP+4    ->  CALL return address
    EBP+8    ->  Subroutine parameters (first parameter)
    ...
    EBP+n    ->  Subroutine parameters (last parameter)
273
Subroutine Epilogue
In order to undo the subroutine prologue, the following instructions must be the last in a C-callable subroutine:
pop ebp
ret
CHAPTER 8
Floating Point Instructions
275
FPU: Floating Point Unit
The FPU (Intel x87) is used for mathematical computations that require floating point numbers.
Uses IEEE 754 standard for floating point numbers.
Works in parallel together with the other x86 units.
276
FPU Registers
CPU and FPU have a separate set of registers, mutually inaccessible.
FPU has a stack of eight 80-bit registers.
The register at the top of the stack is called ST0, the one below is ST1 and so on.
All values in the FPU registers are stored as real extended numbers (80-bit).
All computations take place using this precision.
277
FPU Registers (continued)
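[Figure: register stack st7, st6, st5, st4, st3, st2, st1, st0 (top of stack). Each 80-bit register holds the sign in bit 79, the exponent in bits 78-64 and the mantissa in bits 63-0.]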
278
x87 Data Types
The values contained in the FPU registers may be converted to and from the following data types:

x87 Data Type             Number of Bytes   NASM Type   ANSI C Type
word integer              2                 word        short
short integer             4                 dword       int
long integer              8                 qword       long long
packed BCD integer        10                tword       not available
single precision real     4                 dword       float
double precision real     8                 qword       double
extended precision real   10                tword       long double

The long long type is a GCC extension to ANSI C.
279
FPU Operations
Most FPU operations involve pushing and popping values to and from the register stack.
When a value is pushed to the stack, register ST0 becomes ST1, ST1 becomes ST2 and so on, thus making space in ST0 for the pushed value.
280
FPU Operations (continued)
The opposite occurs when the stack is popped: ST1 becomes ST0, ST2 becomes ST1 and so on.
Instructions that refer to memory usually require a size prefix: word, dword, qword or tword.
281
Using FPU Instructions
1. Reset FPU (FINIT).
2. Copy data from memory into FPU registers.
3. Process data.
4. Copy data from FPU registers back into memory.
282
Types of FPU Operations
Real Transfers
Integer Transfers
Packed BCD Transfers
Loading Constants
Addition
Normal Subtraction
Reversed Subtraction
Multiplication
Normal Division
Reversed Division
Transcendental Instructions
Comparisons
Miscellaneous Operations
283
Types of FPU Operations (continued)
Descriptions of most FPU operations can be found in the FPU Operation Tables.
CHAPTER 9
SIMD Instructions
285
Data Transfer Instructions
MOVD MOVQ
286
MOVD: move dword
Syntax: MOVD dest, orig
Operation:dest orig
Notes: dest and orig may be MMX registers, memory locations or 32-bit integer registers. When the destination operand is an MMX register, the 32-bit source value is written to the low-order 32 bits of the 64-bit MMX register and zero-extended to 64 bits. When the source operand is an MMX register, the low-order 32 bits of the MMX register are written to the 32-bit integer register or 32-bit memory location selected with the destination operand.
287
MOVQ: move qword
Syntax: MOVQ dest, orig
Operation:dest orig
Notes: orig and dest can be either an MMX register or a memory location; however, data cannot be transferred from one memory location to another memory location.
288
Arithmetic Instructions
PADDB PADDW PADDD PADDSB PADDSW PADDUSB PADDUSW
PSUBB PSUBW PSUBD PSUBSB PSUBSW PSUBUSB PSUBUSW
289
Arithmetic Instructions (continued)
PMULLW PMULHW PMADDWD
290
Data Range Limits for Saturation
                 Lower Limit               Upper Limit
Data Type        Decimal    Hexadecimal    Decimal    Hexadecimal
Signed Byte      -128       0x80           127        0x7F
Signed Word      -32,768    0x8000         32,767     0x7FFF
Unsigned Byte    0          0x00           255        0xFF
Unsigned Word    0          0x0000         65,535     0xFFFF
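
Saturation means that a result outside these limits is clamped to the nearest limit instead of wrapping around. The C sketch below contrasts the three behaviours for a single byte lane; the function names are hypothetical, and the MMX instructions apply the same rule to every lane of the register at once.

```c
#include <stdint.h>

/* Signed saturated byte addition: clamp to -128..127. */
int8_t adds_i8(int8_t a, int8_t b)
{
    int sum = a + b;

    if (sum > 127)
        return 127;
    if (sum < -128)
        return -128;
    return (int8_t)sum;
}

/* Unsigned saturated byte addition: clamp to 0..255. */
uint8_t addus_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;

    return sum > 255 ? 255 : (uint8_t)sum;
}

/* Truncated (wrap-around) byte addition: keep only the low 8 bits. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

For example 200 + 100 gives 255 with unsigned saturation but 44 with truncation, which is why saturating arithmetic is preferred for pixel and audio data.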
291
PADDB: packed truncated byte addition
Syntax: PADDB dest, orig
Operation: dest[i] ← dest[i] + orig[i] for each of the 8 packed bytes (result truncated)
292
PADDW: packed truncated word addition
Syntax: PADDW dest, orig
Operation: dest[i] ← dest[i] + orig[i] for each of the 4 packed words (result truncated)
293
PADDD: packed truncated dword addition
Syntax: PADDD dest, orig
Operation: dest[i] ← dest[i] + orig[i] for each of the 2 packed dwords (result truncated)
294
PADDSB: packed signed saturated byte addition
Syntax: PADDSB dest, orig
Operation: dest[i] ← dest[i] + orig[i] for each of the 8 packed bytes (signed saturation)
295
PADDSW: packed signed saturated word addition
Syntax: PADDSW dest, orig
Operation: dest[i] ← dest[i] + orig[i] for each of the 4 packed words (signed saturation)
296
PADDUSB: packed unsigned saturated byte addition
Syntax: PADDUSB dest, orig
Operation: dest[i] ← dest[i] + orig[i] for each of the 8 packed bytes (unsigned saturation)
297
PADDUSW: packed unsigned saturated word addition
Syntax: PADDUSW dest, orig
Operation: dest[i] ← dest[i] + orig[i] for each of the 4 packed words (unsigned saturation)
298
PSUBB: packed truncated byte subtraction
Syntax: PSUBB dest, orig
Operation: dest[i] ← dest[i] - orig[i] for each of the 8 packed bytes (result truncated)
299
PSUBW: packed truncated word subtraction
Syntax: PSUBW dest, orig
Operation: dest[i] ← dest[i] - orig[i] for each of the 4 packed words (result truncated)
300
PSUBD: packed truncated dword subtraction
Syntax: PSUBD dest, orig
Operation: dest[i] ← dest[i] - orig[i] for each of the 2 packed dwords (result truncated)
301
PSUBSB: packed signed saturated byte subtraction
Syntax: PSUBSB dest, orig
Operation: dest[i] ← dest[i] - orig[i] for each of the 8 packed bytes (signed saturation)
302
PSUBSW: packed signed saturated word subtraction
Syntax: PSUBSW dest, orig
Operation: dest[i] ← dest[i] - orig[i] for each of the 4 packed words (signed saturation)
303
PSUBUSB: packed unsigned saturated byte subtraction
Syntax: PSUBUSB dest, orig
Operation: dest[i] ← dest[i] - orig[i] for each of the 8 packed bytes (unsigned saturation)
304
PSUBUSW: packed unsigned saturated word subtraction
Syntax: PSUBUSW dest, orig
Operation: dest[i] ← dest[i] - orig[i] for each of the 4 packed words (unsigned saturation)
305
PMULLW: packed multiply low word (signed)
Syntax: PMULLW dest, orig
Operation: dest[i] ← low-order 16 bits of (dest[i] * orig[i]) for each of the 4 packed words
306
PMULHW: packed multiply high word (signed)
Syntax: PMULHW dest, orig
Operation: dest[i] ← high-order 16 bits of (dest[i] * orig[i]) for each of the 4 packed words
307
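PMADDWD: packed multiply and add (signed)
Syntax: PMADDWD dest, orig
Operation: the 4 packed words of dest are multiplied by the corresponding words of orig, and each pair of adjacent products is added to form one of the 2 packed dwords of dest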
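
PMADDWD is the building block of a dot product: four multiplications and two additions in a single instruction. A C sketch of its effect follows; the function name and the array-based interface are assumptions, with index 0 standing for the lowest-order element of the register.

```c
#include <stdint.h>

/* Model of PMADDWD: four signed word products, added pairwise into two dwords.
   The single corner case where all four inputs of a pair are -32768 is not modelled. */
void pmaddwd(const int16_t dest[4], const int16_t orig[4], int32_t out[2])
{
    out[0] = (int32_t)dest[0] * orig[0] + (int32_t)dest[1] * orig[1];
    out[1] = (int32_t)dest[2] * orig[2] + (int32_t)dest[3] * orig[3];
}
```

For dest = {1, 2, 3, 4} and orig = {5, 6, 7, 8} the two results are 17 and 53; adding them gives 70, the dot product of the two vectors.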
308
Logical Instructions
PAND POR PXOR PANDN
309
PAND: bitwise qword and
Syntax: PAND dest, orig
Operation: dest ← dest & orig (all 64 bits)
310
POR: bitwise qword or
Syntax: POR dest, orig
Operation: dest ← dest | orig (all 64 bits)
311
PXOR: bitwise qword xor
Syntax: PXOR dest, orig
Operation: dest ← dest ^ orig (all 64 bits)
312
PANDN: bitwise qword and/not
Syntax: PANDN dest, orig
Operation: dest ← (~dest) & orig (all 64 bits)
313
Shift Instructions
PSLLW PSLLD PSLLQ PSRLW PSRLD PSRLQ PSRAW PSRAD
314
PSLLW: packed word logical shift left
Syntax: PSLLW dest, orig
Operation: dest[i] ← dest[i] << orig for each of the 4 packed words
315
PSLLD: packed dword logical shift left
Syntax: PSLLD dest, orig
Operation: dest[i] ← dest[i] << orig for each of the 2 packed dwords
316
PSLLQ: packed qword logical shift left
Syntax: PSLLQ dest, orig
Operation: dest ← dest << orig (whole qword)
317
PSRLW: packed word logical (unsigned) shift right
Syntax: PSRLW dest, orig
Operation: dest[i] ← dest[i] >> orig for each of the 4 packed words (zeros shifted in)
318
PSRLD: packed dword logical (unsigned) shift right
Syntax: PSRLD dest, orig
Operation: dest[i] ← dest[i] >> orig for each of the 2 packed dwords (zeros shifted in)
319
PSRLQ: packed qword logical (unsigned) shift right
Syntax: PSRLQ dest, orig
Operation: dest ← dest >> orig (whole qword, zeros shifted in)
320
PSRAW: packed word arithmetic (signed) shift right
Syntax: PSRAW dest, orig
Operation: dest[i] ← dest[i] >> orig for each of the 4 packed words (sign bit replicated)
321
PSRAD: packed dword arithmetic (signed) shift right
Syntax: PSRAD dest, orig
Operation: dest[i] ← dest[i] >> orig for each of the 2 packed dwords (sign bit replicated)
322
Comparison Instructions
PCMPEQB PCMPEQW PCMPEQD PCMPGTB PCMPGTW PCMPGTD
323
PCMPEQB: packed compare for equal bytes
Syntax: PCMPEQB dest, orig
Operation: dest[i] ← (dest[i] == orig[i]) for each of the 8 packed bytes
All ones if true, all zeros if false.
324
PCMPEQW: packed compare for equal words
Syntax: PCMPEQW dest, orig
Operation: dest[i] ← (dest[i] == orig[i]) for each of the 4 packed words
All ones if true, all zeros if false.
325
PCMPEQD: packed compare for equal dwords
Syntax: PCMPEQD dest, orig
Operation: dest[i] ← (dest[i] == orig[i]) for each of the 2 packed dwords
All ones if true, all zeros if false.
326
PCMPGTB: packed compare for greater than bytes (signed)
Syntax: PCMPGTB dest, orig
Operation: dest[i] ← (dest[i] > orig[i]) for each of the 8 packed bytes
All ones if true, all zeros if false.
327
PCMPGTW: packed compare for greater than words (signed)
Syntax: PCMPGTW dest, orig
Operation: dest[i] ← (dest[i] > orig[i]) for each of the 4 packed words
All ones if true, all zeros if false.
328
PCMPGTD: packed compare for greater than dwords (signed)
Syntax: PCMPGTD dest, orig
Operation: dest[i] ← (dest[i] > orig[i]) for each of the 2 packed dwords
All ones if true, all zeros if false.
329
Conversion Instructions
PACKSSWB PACKSSDW PACKUSWB PUNPCKLBW PUNPCKLWD PUNPCKLDQ
PUNPCKHBW PUNPCKHWD PUNPCKHDQ
330
PACKSSWB: pack words into bytes with signed saturation
Syntax: PACKSSWB dest, orig
Operation: [diagram: the packed words of dest and orig are narrowed to bytes with signed saturation and packed into dest]
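
Since the diagram for this slide is not reproduced here, a C sketch of PACKSSWB may help. It assumes the layout documented for the instruction: the four words of dest supply the low four bytes of the result and the four words of orig supply the high four bytes. The function names and the array interface are hypothetical, with index 0 as the lowest-order element.

```c
#include <stdint.h>

/* Narrow one signed word to a signed byte, clamping to -128..127. */
static int8_t saturate_i8(int16_t word)
{
    if (word > 127)
        return 127;
    if (word < -128)
        return -128;
    return (int8_t)word;
}

/* Model of PACKSSWB: out[0..3] come from dest, out[4..7] come from orig. */
void packsswb(const int16_t dest[4], const int16_t orig[4], int8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i] = saturate_i8(dest[i]);
        out[i + 4] = saturate_i8(orig[i]);
    }
}
```

PACKSSDW and PACKUSWB follow the same pattern with different element sizes and saturation limits.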
331
PACKSSDW: pack dwords into words with signed saturation
Syntax: PACKSSDW dest, orig
Operation: [diagram: the packed dwords of dest and orig are narrowed to words with signed saturation and packed into dest]
332
PACKUSWB: pack words into bytes with unsigned saturation
Syntax: PACKUSWB dest, orig
Operation: [diagram: the packed words of dest and orig are narrowed to bytes with unsigned saturation and packed into dest]
333
PUNPCKLBW: unpack low packed bytes
Syntax: PUNPCKLBW dest, orig
Operation: [diagram: the low-order bytes of dest and orig are interleaved into dest]
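
The unpack instructions interleave elements from the two operands. The C sketch below models PUNPCKLBW under the documented layout, where result bytes alternate between dest and orig starting with the lowest byte of dest; the function name and the array interface are hypothetical, with index 0 as the lowest-order byte.

```c
#include <stdint.h>

/* Model of PUNPCKLBW: interleave the 4 low bytes of dest and orig. */
void punpcklbw(const uint8_t dest[8], const uint8_t orig[8], uint8_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[2 * i] = dest[i];        /* even positions come from dest */
        out[2 * i + 1] = orig[i];    /* odd positions come from orig */
    }
}
```

A common use is zero extension: when orig is all zeros, each low byte of dest ends up as the low half of a 16-bit word whose high half is zero.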
334
PUNPCKLWD: unpack low packed words
Syntax: PUNPCKLWD dest, orig
Operation: [diagram: the low-order words of dest and orig are interleaved into dest]
335
PUNPCKLDQ: unpack low packed dwords
Syntax: PUNPCKLDQ dest, orig
Operation: [diagram: the low-order dwords of dest and orig are combined into dest]
336
PUNPCKHBW: unpack high packed bytes
Syntax: PUNPCKHBW dest, orig
Operation: [diagram: the high-order bytes of dest and orig are interleaved into dest]
337
PUNPCKHWD: unpack high packed words
Syntax: PUNPCKHWD dest, orig
Operation: [diagram: the high-order words of dest and orig are interleaved into dest]
338
PUNPCKHDQ: unpack high packed dwords
Syntax: PUNPCKHDQ dest, orig
Operation: [diagram: the high-order dwords of dest and orig are combined into dest]
339
Empty MMX State Instruction
EMMS
340
EMMS: empty MMX state
Syntax: EMMS
Notes: Should be used at the end of a sequence of MMX instructions in order to allow subsequent FPU instructions.
CHAPTER 10
Interrupt Handling
342
Interrupting Program Execution
An interrupt is an asynchronous event that is typically triggered by hardware (I/O device).
An exception is a synchronous event that is generated when the processor detects one or more predefined conditions while executing an instruction.
343
Interrupting Program Execution (continued)
When an interrupt or exception is signaled, the processor halts execution of the current task and switches to a handler procedure that has been written specifically to handle the interrupt or exception condition.
344
Interrupting Program Execution (continued)
The processor accesses the handler procedure through an entry in the interrupt descriptor table (IDT).
When the handler has completed handling the interrupt or exception, program control is returned to the interrupted task.
345
Interrupt Descriptor Table
The IDT comprises up to 256 8-byte gate descriptors.
A gate is the mechanism that allows a task to execute code in a different privilege level.
Each gate descriptor contains the segment selector, offset and privilege level of its corresponding handler procedure.
The address and size of the IDT are stored in the 48-bit Interrupt Descriptor Table Register (IDTR).
346
[Figure: the Interrupt Descriptor Table Register (IDTR) holds the IDT base address (32 bits) in bits 47-16 and the IDT limit in bits 15-0. The IDT may begin at any address in physical memory and consists of 8-byte descriptors: Gate for Interrupt 0, Gate for Interrupt 1, ..., Gate for Interrupt n. Each gate leads to the handler procedure code for its interrupt.]
347
SIDT: store IDTR
Syntax: SIDT dest
Operation:
    dest ← IDTR
Flags:  of  df  sf  zf  af  pf  cf
        -   -   -   -   -   -   -
348
Hardware Interrupts
The x86 processor has two pins that can be attached to external interrupt-generating devices. These pins, or input lines, are:
    INTR  Maskable interrupts
    NMI   Nonmaskable interrupts
349
Interrupt Flag
The interrupt flag IF is contained in the EFLAGS register.
The INTR input line may be enabled or disabled through software (running in the correct privileged level) with the use of the STI (set IF) and CLI (clear IF) instructions. This means that INTR may be masked (disabled).
The NMI input line is nonmaskable, which means it may not be disabled.
350
The PIC 8259
The 8259 Programmable Interrupt Controller (PIC) chip accepts interrupts from up to eight different devices. If any one of the devices requests service, the 8259 will toggle the CPU’s INTR input line and pass an interrupt vector number to the CPU’s data bus.
Several PICs may be cascaded in order to support up to 64 different devices.
351
The PIC 8259 (continued)
A typical PC uses two PICs to provide 15 interrupt inputs: seven directly on the master PIC, whose remaining input is driven by the slave PIC, plus the eight inputs of the slave PIC.
In modern motherboards, the 8259 is usually incorporated into a larger chip as part of the chipset.
352
PIC and CPU Connections
[Figure: the x86 CPU is connected to the master 8259 PIC through the data bus (D0-D7) and the INTR line. The master PIC has inputs IRQ0-IRQ7; the slave 8259 PIC, with its own inputs IRQ0-IRQ7, is connected to one of the master's inputs.]
353
PIC Inputs for a PC (Real Mode)

8259 Pin    Vector Number   Device
IRQ 0       0x08            Timer chip
IRQ 1       0x09            Keyboard
IRQ 2       0x0A            Cascade for slave controller (IRQ 8-15)
IRQ 3       0x0B            Serial port 2
IRQ 4       0x0C            Serial port 1
IRQ 5       0x0D            Parallel port 2 in AT, reserved in PS/2 systems
IRQ 6       0x0E            Diskette drive
IRQ 7       0x0F            Parallel port 1
IRQ 8/0     0x70            Real-time clock
IRQ 9/1     0x71            CGA vertical retrace (and other IRQ 2 devices)
IRQ 10/2    0x72            Reserved
IRQ 11/3    0x73            Reserved
IRQ 12/4    0x74            Reserved in AT, auxiliary device on PS/2 systems
IRQ 13/5    0x75            FPU interrupt
IRQ 14/6    0x76            Hard disk controller
IRQ 15/7    0x77            Reserved
354
Interrupts and Exceptions (Protected Mode)

Vector Number   Description            Source
0               Divide error           DIV and IDIV instructions
1               Debug                  Any code or data reference
2               NMI interrupt          Nonmaskable external interrupt
3               Breakpoint             INT 3 instruction
4               Overflow               INTO instruction
5               Bound range exceeded   BOUND instruction
6               Invalid opcode         UD2 instruction or reserved opcode
7               Device not available   No math coprocessor
8               Double fault           Any instruction that can generate an exception, an NMI, or an INTR.
9               Reserved
10              Invalid TSS            Task switch or TSS access.
355
Interrupts and Exceptions (continued)

Vector Number   Description            Source
11              Segment Not Present    Loading segment registers or accessing system segments.
12              Stack Segment Fault    Stack operations and SS register loads.
13              General Protection     Any memory reference and other protection checks.
14              Page Fault             Any memory reference.
15              Reserved
16              Floating-Point Error   Floating-point or WAIT/FWAIT instruction.
17              Alignment Check        Any data reference in memory.
18              Machine Check          Model dependent.
19-31           Reserved
32-255          Maskable Interrupts    External interrupt from INTR pin or INT n instruction.
356
Signals
Linux traps all interrupts and exceptions that are generated by the system.
Under some circumstances, the operating system will send a signal to a running process informing it that an exceptional situation has occurred.
357
Signals (continued)
Some signals report errors such as references to invalid memory addresses; others report asynchronous events, such as disconnection of a phone line.
358
Hardware Interrupts & Signals
[Figure: x86 CPU, 8259 PIC, OS kernel and process, with the following numbered steps]
1. A device generates a hardware interrupt.
2. The CPU calls the handler procedure provided by the OS kernel.
3. If required, the OS kernel sends a signal to a process.
359
Software Exceptions & Signals
[Figure: x86 CPU, OS kernel and process, with the following numbered steps]
1. The process generates a software exception.
2. The CPU calls the handler procedure provided by the OS kernel.
3. The OS kernel sends a signal to the offending process.
360
Signal Handling
A programmer may arrange for a particular signal to be ignored or to be processed by a special piece of code called a signal handler.
361
Signal Handling (continued)
In the latter case, the process that receives the signal suspends its current flow of control, executes the signal handler, and then resumes the original flow of control when the signal handler finishes.
362
Predefined Signals
There are 31 different signals defined for UNIX.
A programmer may choose one of the following actions for a particular signal:
    Trigger a user-supplied signal handler
    Trigger the default kernel-supplied handler
    Ignore it
363
Default Signal Handlers
DUMP: terminate the process and generate a core (memory) image file
QUIT: terminate the process without generating a core image file
IGNORE: ignore and discard the signal
SUSPEND: suspend the process
364
List of Signals
Macro     Signal Number   Default Action   Description
SIGHUP    1               quit             Hangup
SIGINT    2               quit             Interrupt
SIGQUIT   3               dump             Quit
SIGILL    4               dump             Illegal instruction
SIGTRAP   5               dump             Trace trap (for debugging)
SIGIOT    6               dump             IO Trap instruction
SIGBUS    7               dump             Bus error
SIGFPE    8               dump             Floating Point Exception
SIGKILL   9               quit             Kill (cannot be caught, blocked or ignored)
SIGUSR1   10              quit             User defined signal 1
365
List of Signals (continued)
Macro     Signal Number   Default Action   Description
SIGSEGV   11              dump             Segmentation violation
SIGUSR2   12              quit             User defined signal 2
SIGPIPE   13              quit             Write on a pipe with no one to read it
SIGALRM   14              quit             Alarm clock
SIGTERM   15              quit             Software termination signal
SIGCHLD   17              ignore           Child status has changed
SIGCONT   18              ignore           Continue after stop
SIGSTOP   19              suspend          Stop (cannot be caught, blocked or ignored)
SIGTSTP   20              suspend          Stop signal generated from keyboard
366
List of Signals (continued)
Macro       Signal Number   Default Action   Description
SIGTTIN     21              suspend          Background read attempted from control terminal
SIGTTOU     22              suspend          Background write attempted to control terminal
SIGURG      23              ignore           Urgent condition present on socket
SIGXCPU     24              quit             CPU time limit exceeded
SIGXFSZ     25              quit             File size limit exceeded
SIGVTALRM   26              quit             Virtual time alarm
SIGPROF     27              quit             Profiling timer alarm
SIGWINCH    28              ignore           Window size changed
SIGLOST     29              quit             Resource lost
367
Setting a Signal Handler
The signal system call allows a process to specify the action that it will take when a particular signal is received.
368
Setting a Signal Handler (continued)
It takes two parameters (from left to right):
1. The code number of the signal to be reprogrammed.
2. The address of a user-defined function, which will be executed when the specified signal arrives, or zero (SIG_DFL) to use the default handler, or one (SIG_IGN) to ignore the signal.
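
From C, the same two parameters are passed to the standard library function signal. The sketch below installs a handler for SIGINT, sends the signal to the current process with raise, and then restores the default action; the names got_signal, on_interrupt and demo_signal are hypothetical and exist only for this illustration.

```c
#include <signal.h>

/* Set by the handler; sig_atomic_t is the type that is safe to write from a handler. */
static volatile sig_atomic_t got_signal = 0;

/* User-defined signal handler: receives the number of the signal that arrived. */
static void on_interrupt(int signum)
{
    (void)signum;
    got_signal = 1;
}

/* Installs the handler, raises SIGINT, and reports whether the handler ran.
   Returns 1 if the handler ran, 0 if it did not, -1 if it could not be installed. */
int demo_signal(void)
{
    if (signal(SIGINT, on_interrupt) == SIG_ERR)
        return -1;
    raise(SIGINT);               /* deliver the signal to this process */
    signal(SIGINT, SIG_DFL);     /* restore the default handler */
    return got_signal;
}
```

Passing SIG_IGN instead of on_interrupt would make the process ignore the signal. Because the default action for SIGINT is to terminate the process, the handler is what lets the program continue after raise.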