IA32 programming for Linux Concepts and requirements for writing Linux assembly language programs...

Preview:

Citation preview

IA32 programming for Linux

Concepts and requirements for writing Linux assembly language

programs for Pentium CPUs

A source program’s format

• Source-file: a pure ASCII-character textfile

• Is created using a text-editor (such as ‘vi’)

• You cannot use a ‘word processor’ (why?)

• Program consists of series of ‘statements’

• Each program-statement fits on one line

• Program-statements all have same layout

• Design in 1950s was for IBM punch-cards

Statement Layout (1950s)

• Each ‘statement’ was comprised of four ‘fields’• Fields appear in a prescribed left-to-right order• These four fields were named (in order):

-- the ‘label’ field-- the ‘opcode’ field-- the ‘operand’ field-- the ‘comment’ field

• In many cases some fields could be left blank• Extreme case (very useful): whole line is blank!

The ‘as’ program

• The ‘assembler’ is a computer program

• It accepts a specified text-file as its input

• It must be able to ‘parse’ each statement

• It can produce onscreen ‘error messages’

• It can generate an ELF-format output file

• (That file is known as an ‘object module’)

• It can also generate a ‘listing file’ (optional)

The ‘label’ field

• A label is a ‘symbol’ followed by a colon (‘:’)• The programmer invents his own ‘symbols’• Symbols can use letters and digits, plus a very

small number of ‘special’ characters ( ‘.’, ‘_’, ‘$’ )• A ‘symbol’ is allowed to be of arbitrarily length • The Linux assembler (‘as’) was designed for

translating source-text produced by a high-level language compiler (such as ‘cc’)

• But humans can also write such files directly

The ‘opcode’ field

• Opcodes are predefined symbols that are recognized by the GNU assembler

• There are two categories of ‘opcodes’ (called ‘instructions’ and ‘directives’)

• ‘Instructions’ represent operations that the CPU is able to perform (e.g., ‘add’, ‘inc’)

• ‘Directives’ are commands that guide the work of the assembler (e.g., ‘.globl’, ‘.int’)

Instructions vs Directives

• Each ‘instruction’ gets translated by ‘as’ into a machine-language statement that will be fetched and executed by the CPU when the program runs (i.e., at ‘runtime’)

• Each ‘directive’ modifies the behavior of the assembler (i.e., at ‘assembly time’)

• With GNU assembly language, they are easy to distinguish: directives begin with ‘.’

A list of the Pentium opcodes

• An ‘official’ list of the instruction codes can be found in Intel’s programmer manuals:

http://developer.intel.com

• But it’s three volumes, nearly 1000 pages (it describes ‘everything’ about Pentiums)

• An ‘unofficial’ list of (most) Intel instruction codes can fit on one sheet, front and back:

http://www.jegerlehner/intel/

The AT&T syntax

• The GNU assembler uses AT&T syntax (instead of official Intel/Microsoft syntax) so the opcode names differ slightly from names that you will see on those lists:

Intel-syntax AT&T-syntax--------------- ---------------------- ADD addb/addw/addl INC incb/incw/incl CMP

cmpb/cmpw/cmpl

The UNIX culture

• Linux is intended to be a version of UNIX (so that UNIX-trained users already know Linux)

• UNIX was developed at AT&T (in early 1970s) and AT&T’s computers were built by DEC, thus UNIX users learned DEC’s assembley language

• Intel was early ally of DEC’s competitor, IBM, which deliberately used ‘incompatible’ designs

• Also: an ‘East Coast’ versus ‘West Coast’ thing (California, versus New York and New Jersey)

Bytes, Words, Longwords

• CPU Instructions usually operate on data-items• Only certain sizes of data are supported:

BYTE: one byte consists of 8 bits

WORD: consists of two bytes (16 bits)

LONGWORD: uses four bytes (32 bits)• With AT&T’s syntax, an instruction’s name also

incorporates its effective data-size (as a suffix) • With Intel syntax, data-size usually isn’t explicit,

but is inferred by context (i.e., from operands)

The ‘operand’ field

• Operands can be of several types:

-- a CPU register may hold the datum

-- a memory location may hold the datum

-- an instruction can have ‘built-in’ data

-- frequently there are multiple data-items

-- and sometimes there are no data-items• An instruction’s operands usually are ‘explicit’,

but in a few cases they also could be ‘implicit’

Examples of operands

• Some instruction that have two operands:movl %ebx, %ecxaddl $4, %esp

• Some instructions that have one operand:incl %eaxpushl $fmt

• An instruction that lacks explicit operands:ret

The ‘comment’ field

• An assembly language program often can be hard for a human being to understand

• Even a program’s author may not be able to recall his programming idea after awhile

• So programmer ‘comments’ can be vital

• A comments begin with the ‘#’ character

• The assembler disregards all comments (but they will appear in program listings)

‘Directives’

• Sometimes called ‘pseudo-instructions’

• They tell the assembler what to do

• The assembler will recognize them

• Their names begin with a dot (‘.’)

• Examples: ‘.section’, ‘.global’, ‘.int,’ …

• The names of valid directives appears in the table-of-contents of the GNU manual

New program example

• Let’s look at a demo program (‘squares.s’)• It prints out a mathematical table showing some

numbers and their squares• But it doesn’t use any multiplications!• It uses an algorithm based on algebra:

(n+1)2 - n2 = n + n + 1

If you already know the square of a given number n , you can get the square of the

next number n+1 by just doing additions

Visualizing the algorithm idean

n

(n + 1)2 = n2 + 2n + 1

A program with a ‘loop’

• Here’s our program idea (expressed in C)int num = 1, val = 1;do {

printf( “ %d %d \n”, num, val );val += num + num + 1;num += 1;}

while ( num <= 20 );

Some new ‘directives’

• ‘.equ’ – equates a symbol to a value:.equ MAX, 20

• ‘.globl’ – just an alternative for ‘.global’:.globl main

Some new ‘instructions’

• ‘inc’ – adds one to the specified operand:incl arg

• ‘cmp’ – compares two specified operands:cmpl $max, arg

• ‘jle’ – jump (to a specified instruction) if condition ‘less than or equal to’ is true:

jle again

Comparisons can be ‘tricky’

• It’s easy to get confused by AT&T syntax:

mov $5, %eax

while: inc %eax

cmp $5, %eax

jle while

(e.g., will this loop ever finish executing?)

• REMEMBER: ‘compare’ means ‘subtract’

The FLAGS register

OF

DF

IF

TF

SF

ZF

0AF

0PF

1CF

Legend: ZF = Zero FlagSF = Sign FlagCF = Carry FlagPF = Parity FlagOF = Overflow FlagAF = Auxiliary Flag

In-class exercise #1

• How would you modify the source-code for the ‘squares’ program so that it prints out a larger table (i.e., more than 20 lines)?

• How many squares can you display on the screen before your program starts to show ‘wrong’ entries?

In-class exercise #2

• Can you write a program that prints out a table showing powers of 2 (it’s useful for computer science students to keep handy)

• Can you see how to do it without using any ‘multiply’ operations – just additions?

• Hint: study the ‘squares.s’ source-code

• Then write your own ‘powers.s’ solution

• Turn in printouts (source and its output)

Recommended