Upload
lenard-tate
View
214
Download
0
Embed Size (px)
Citation preview
1
CPE 471
Assemblers
CPE 471 2
Assembly Instructions Four parts to an instruction:
Label(s) Operation Operands Comments
We are using a flexible format Instruction is still on one line (except label) Spaces and tabs can appear between “tokens”
Alternative: fixed format Label in column 1; Op in column 16; Operands in col
24, etc.
CPE 471 3
Assembly Instructions Label definition:
Symbolic name for instructions address Clarifies branching to an address and data's
address Often severely restricted in length
Starts anywhere before operation Contains letters (a-zA-Z), digits, ‘$’, ‘_’, ‘.’
CPE 471 4
Assembly Instructions Operation field
Mnemonic for an instruction (ADD, BRA) or Mnemonic for a pseudo-instruction (DC, EQU)
Operand field Addresses and registers used by instruction In other words, what to add, where to branch
Comment field: no affect on assembler Used for documentation.
CPE 471 5
What's an Assembler Translation:
Source program translated into target Source and target define levels Source is not directly executed (what is that?)
Target or object file is executed or translated later Assembly language means the source is a
symbolic representation of machine language
When the source is higher level the translator is usually called a compiler
CPE 471 6
Advantages of Assembly Compared to machine code
Easier to remember (HLT vs 26577) (symbolic)
Similarly for addresses in program (symbolic) BGT loop1 vs BGT 6210
Over high-level language Access to full capabilities of the machine
testing overflow flag, test & set (atomic), registers Speed
CPE 471 7
Advantages of Assembly Often a holdover from when machines were
expensive and people where cheap Systems programming often done in a language
like C Old myth: "if a program will be used a lot, it
should (for efficiency) be written in assembly" Hard to write (10 lines/day independent of
language) Hard to read: high maintenance, high turnover Reality: good compilers, fast machines
CPE 471 8
Modern Approach
Write in high level language Analyze to find where time spent Invariably, it's a tiny part of the code Improve that tiny part (perhaps with assembly) Problem oriented language allows high level
insights Algorithmic insights save tremendously
Assembly programmer immersed in bit-twiddling (penny wise, pound foolish)
CPE 471 9
Types of Assemblers Assemblers can be classified according to the
number of passes (scanning the source file) to: One-pass assembler: some restrictions are imposed on
the forward referencing such as eliminating forward references to data.
Two-pass assembler: the only restriction on forward referencing is in the symbol definition, i.e., all assembler directives (e.g. EQU and ORG) that define symbols can only use symbols that are previously defined.
Multi-pass assembler, restrictions are made on the level of nesting in forward referencing.
CPE 471 10
Assembler Tasks Parse assembly instructions
Check for syntax Tokenize string
Assigns addresses to instructions Maintains location counter LC = eventual location in memory of this instruction
Generate machine code Evaluation mnemonic instructions (replace w/opcode) Evaluate operand sub-field (replace symbols with
value)
CPE 471 11
Assembler Tasks Concatenate to form instructions
Process pseudo-ops generate header record evaluate DC, DS, … etc
Write output object file Nothing here seems all that hard!
CPE 471 12
Example: First Attempt Read each input line and generate machine code
Associate symbol with location counter Lookup mnemonic and get opcode Generate instruction
Example:
CPE 471 13
ExampleTest ORIG $0100A EQU 16Begin LD R0,N
LD R1, #AST R0, ANSHLT
N DC 13ANS DS 1
END Begin
CPE 471 14
Data Structures Location counter (LocCounter) Search a table (OpTable) for mnemonic
Get opcode Prepare to handle arguments (group like
instructions together?) Translate arguments
Lookup/add symbol names (SymTable) and replace with location Watch out for relative offsets
CPE 471 15
Location Counter
Eventual address of instruction Initialized with 0 Increment with each instruction (see
OpTable). Always two for our machine
CPE 471 16
2-pass Assembler Solution: 2-pass assembler
Pass #1: identify and define all labels As location of each label is determined, save in
SymTable Requires some knowledge of instructions
Pass #2: generate object code Whenever a label is used, replace with value
saved in table
CPE 471 17
Pass I: Determine length of machine
instructions. Keep track of Location Counter (LC). Generate a table of symbols for pass 2. Process some pseudo operations. Remember literals.
CPE 471 18
Pass II Look up value of symbols. Generate machine code for instructions. Generate data for DS, DC, and literals. Process pseudo operations.
CPE 471 19
Pass 1 & 2 Communication
Scan text file twice Save symbol locations first pass, then plug in
2nd Simpler Disadvantage: slow
6,800 instructions medium size file 52,000 instructions large file
CPE 471 20
Big Picture: Labs 1 & 2Assembly
File
Object File
Linker
Assembler
Disassembler
Lab 2
Lab 1
Lab 3
Executable File
CPE 471 21
Table Driven Software Many times software is very repetitive
Use functions! Many times the information processes is
repetitive Use loops
Many times the information to write the code is repetitive but static: Use tables
CPE 471 22
Table Driven Software Easier to modify: add new entries, change
existing ones, well centralized Code is easier, eventually, to understand Works if there are not many exceptions
CPE 471 23
Machine OP Table
This table is static (unchanging) For our machine:
All opcodes are 6 bits Instructions size is one or two words. Formats do differ
MnemonicName
Opcode Instruction Size
Instruction Format
CPE 471 24
Machine Op Table Variable length instructions:
"Branch relative": PC <- PC + operand Near: operand is 9 bits, far operand is 16 bits.
Varying formats
Fixed format makes parsing simple
CPE 471 25
Machine Op Table Fields might include:
Name: “add” Type: ALU opcode: 000000 Size: 2 or 4
One entry for each instruction
CPE 471 26
Symbol Table Pass 1:
Each symbol is defined. Every time a new symbol seen (i.E., In a label) put it in the symbol table
If already there, error
CPE 471 27
Symbol Table (Pass #1) Each symbol gives a value
Explicit: A EQU 16 Easy: just put operand in table Implicit: N DC 20
Must know address of instruction Therefore, keep track of addresses as program is
scanned Use location counter (LC)
CPE 471 28
Symbol Table (Pass #2) Symbols in operand replaced with value
Look up symbol in symbol table If there, replace with value Else, ERROR
CPE 471 29
Literals Implicit allocation & initialization of memory Allows us to put the value itself right in the
instruction Prefice with “=“ Example: LD D3,=#16
This means: allocate storage at end of program initialize this storage with the value #16 use this address in the instruction
CPE 471 30
Literal Table Pass #1
literals are identified and placed in the table “name”, “value”, and “size” fields updated duplicates can be eliminated
To complete pass #1: literals are added to the end of the program “address” field can now be calculated
Pass #2: literals in instructions are replaced with the __________ field from the Literal Table what if the literal is not in the table?
CPE 471 31
Pseudo Operations Unlike operations, typically do not have
machine instruction (opcode) equivalents Give information to the assembler itself.
Another term: assembler directive Not intended to implement higher level
control structures (if-then, loops) Uses: segment definition, symbol definition,
memory initialization, storage allocation Low level bookeeping easily done by machine
CPE 471 32
Psuedo-op Table
Also a static table Some lengths are 0, some 1, others?
CPE 471 33
Object File Header record: contains information on program
length, symbols that other modules may be looking for, and the name of this module.
Format:
1 H for header2-5 Program length in HEX + a space6-80 List of entry names and locations
Entry name (followed by a space).Entry location in HEX 4 columns + a
space The first entry should be the first ORG.
CPE 471 34
Object File II Text (type T): contains the hex equivalent of the
machine instruction. Format:
1 T for text2-5 Address in HEX where this text should be placed in
memory.
6-80 Instructions and/or data in HEX. Each word (byte) should be followed by an allocation
character to determine the type: S (Absolute – Does not need modification), R (Relative:
needs relocation), and X (external).
CPE 471 35
Object File III End (type E): indicates the end of the
module and where the execution would begin. Format: Column
1 E for END2-5 Execution start address in
HEX
CPE 471 36
Information Flow Pass 1/2
CPE 471 37
Two Pass Assembler:Limitations
Q: Does our 2-pass approach solve all forward-reference problems?
A: no! Something is still broken…
CPE 471 38
Forward Reference Restriction
To avoid this trouble, impose a restriction: -------------
What about DS? Is a forward symbol allowed as the operand?
Y EQU 0 X EQU Y
X EQU Y Y EQU 0
Consider:
CPE 471 39
Containers What does this mean We can't just generate a machine instruction
from each assembly instruction -- must save info
We need to start using some data structures! Table means any data structure or
container
CPE 471 40
Absolute Programs Programmer decides a priori where
program will reside e.g., Prog ORIG $3176
But memory is shared concurrent users, concurrent jobs/processes several programs loaded simultaneously
CPE 471 41
Absolute Programs: Limitation
CPE 471 42
Absolute Programs: Limitation II
Would like the loading to be flexible decide at load time where it goes!
(not at ____________ time) this decision is made by __________________
What the programmer wants:“find a free slot in memory that is big enough
to fit this program”
CPE 471 43
Motivating Relocation -Example
Prog ORIG 0X DC.W YStart LD.L D1,X
ST.L D1,YHLT
Y DS.W 1END Start
•In Memory, this program appears as:
0000
0002
0004
0006
0008
000A
000C
CPE 471 44
Motivating Relocation -Example
* One slight changeProg ORIG $0100X DC.W YStart LD.L R1,X
ST.L R1,YHLT
Y DS.L 1END Start
•In Memory, this program appears as:
0100
0102
0104
0106
0108
010A
010C
CPE 471 45
Relocation The loader must update some (parts of)
text records, but not all after load address has been determined
The assembler does 2 things: assemble with a load address tell the loader which parts will need to be
updated
CPE 471 46
Modification Records One approach: define a new record type that
identifies the address to be modified We could add the following record to the object
file: M address e.g. M 0004 Also need to indicate size of quantity being
relocated: Two sizes: 9 bit pgoffset and 16 bit full word
M0000_16 M0001_9 M0002_9
One disadvantage of this approach: -
CPE 471 47
Alternative: Bit Masks Use 1 bit / memory cell (or byte)
bit value is 0 means no relocation necessary bit value is 1 means relocation necessary
Size of relocation data independent of number of records needing modification but more densely packed (1 bit / text record)
Hard to read (debug, grade,…)
CPE 471 48
Kinds of Data Our machine has two flavors of data:
relative (to the load address) absolute
• The first must be modified, the second not
• Let’s look at how these kinds arise…
CPE 471 49
Symbols Some are relative:
e.g., Some are absolute:
e.g., Symbol Table
CPE 471 50
Searching
We must associate a name with a value. Example: A symbol table is a collection of <key, value> pairs:
Search is given a key, return the corresponding value.
Very important for assemblers. Every line has an instruction (look up in MOT or POT). Lots of symbols (or literals) used. 50% of time searching tables.
CPE 471 51
Searching
Cover several methods: Linear Binary Hash (best for assembler)
For each method, lots of analysis (i.E. Theory and mathematics) about how long they take and why
CPE 471 52
Linear Search: Performance
For a table of length N, how long to insert? Constant time: O(1) (eg. fast)
Why? Insert at beginning or end How long to find?
Best case: 1 (i.e. At first location) Worst case: O(N) (i.e. At last location) average case: (N+1)/2 = O(N). How?
CPE 471 53
Linear Search: Performance
One way to lower average case is to change the probability: put most likely matches at beginning
All this analysis assumes name is in the table If not present, time is N. Average case
should consider this too What does this mean? Time to find one item
is linear (double size, twice as long). N
CPE 471 54
Binary Search
This algorithm requires: Sorted table There is a "<" operator Constant time access to arbitrary keys
Note: each iteration divides the problem in half
How long? Best case: 1. Worst case lg N
CPE 471 55
Compromise
In the assembler, we do all insertions then the bulk of the searching So insert as a linear table: O(1) Then sort: O(N lg N) Then binary search the sorted table
When do we detect duplicates Insertion, or After sort
CPE 471 56
Hash Tables
Combines advantage of binary search (fast retrieval) and advantage of linear search (fast insertion)
Basic idea: we want to use the key as an index in an array so it only takes one step to find something: table["key"]
Problem: strings (or any other data type) are not integers. If we gave strings an ordinal value, way too big 128^8 =72 quadrillion
CPE 471 57
Hash Tables
Solution: Use a function that transforms our data into an integer. This is called the hash function, h:
h: K --> Z(m)
where Z(n) = {1,2,... m};m = table size;K = set of possible keys
An ideal function generates a unique integer for each key (problem: 72Quadrillion to 4billion integers)
CPE 471 58
Hash Table
To insert <key, value>, we want to do: table[h (key)] = value
To find key: return table[h (key)];
Since |K| >> m what we look for is a hash function h that returns a uniform distribution
CPE 471 59
Hash Functions
We want to design h so that small differences in key makes a big difference in h(key).
H is like a deterministic random number generator. Why? Digital signatures (md5, pgp) sometimes called
hash function
Need to find the key after insertion
CPE 471 60
Hash Function Example
h(key) = (sum of letter values) mod 50 + 1 h("magenta") = (13+1+7+5+14+20+1) mod
50 + 1 = (61 mod 50) + 1 = 12
But two different keys can get mapped to the same value:
h("rub") = 18+21+2 = 41; h("dream") = 4+18+5+1+13 = 41
This is called a collision.
CPE 471 61
Dealing With Collisions
Go to next open cell. (X2,X3,X4 hash to same value)
Problem: leads to clustering:
Rehash with another, different function: Cost: computation, some clustering
X1 X2X3X4
CPE 471 62
Dealing With Collisions Keep a linked list
X2
X3
X1
X4
CPE 471 63
Hash Tables: Complexity
How long does insertion take (w/ chaining) Number of attempts depends on how full
the table is - 1st entry never collides Assume there are n elements in table (size is
m) Average time for an unsuccessful search:
Average time to search to end of a list Average list length: r = n/m Including hash and indexing: O(1 + r)
CPE 471 64
Expressions Most assemblers permit use of
expressions Used as instruction operands
in machine op’s and pseudo op’s Typically simple mathematical forms only Two parts:
operators (+, -, *, /) individual terms
CPE 471 65
• Individual terms may be: constants (e.g., #4, ‘A’, $3F) user-defined symbols (e.g., X, Buff) special terms (e.g., * for LC)
Examples
Buff DS 4
LD D2, Buff
LD D2, Buff+1
Buff DS 4
Bend EQU *
Len EQU Bend - Buff
CPE 471 66
Relocation Expressions are evaluated at ____________
(not entirely true… as we’ll see later) Expressions can be relative or absolute Intuition:
the value of an absolute expression _________
__________ with program relocation What are the “rules” for well-formed
expressions?
CPE 471 67
Absolute Expressions An expression is absolute iff:
1. contains only absolute terms, OR2. contains relative terms provided:
i) they occur in pairs, ANDii) terms in each pair have opposite sign,
ANDiii) relative terms do not enter in * or /
Examples of 1: 3+4+x2 2*X where:
Examples of 2: Buff2-Buf
CPE 471 68
Relative Expressions An expression is relative iff: 1. all relative terms can be paired as
above, except one, AND2. that remaining unpaired term is positive, AND
3. relative terms do not enter into * or / Examples:
Buff+6